Much Ado About Very Little
Guest post by Zeke Hausfather and Steve Mosher
E.M. Smith has claimed (see the full post here: Summary Report on v1 vs v3 GHCN) to find numerous differences between GHCN version 1 and version 3, differences that, in his words, constitute “a degree of shift of the input data of roughly the same order of scale as the reputed Global Warming”. His analysis is flawed, however, as the raw data in GHCN v1 and v3 are nearly identical, and trends in the globally gridded raw data for both are effectively the same as those found in the published NCDC and GISTemp land records.
Figure 1: Comparison of station-months of data over time between GHCN v1 and GHCN v3.
First, a little background on the Global Historical Climatology Network (GHCN). GHCN was created in the late 1980s after a large effort by the World Meteorological Organization (WMO) to collect all available temperature data from member countries. Many of these were in the form of logbooks or other non-digital records (this being the 1980s), and many man-hours were required to process them into a digital form.
Meanwhile, the WMO set up a process to automate the submission of data going forward, setting up a network of around 1,200 geographically distributed stations that would provide monthly updates via CLIMAT reports. Periodically NCDC undertakes efforts to collect more historical monthly data not submitted via CLIMAT reports, and more recently has set up a daily product with automated updates from tens of thousands of stations (GHCN-Daily). This structure of GHCN as a periodically updated retroactive compilation with a subset of automatically reporting stations has in the past led to some confusion over “station die-offs”.
GHCN has gone through three major iterations. V1 was released in 1992 and included around 6,000 stations with only mean temperatures available and no adjustments or homogenization. Version 2 was released in 1997 and added a number of new stations, minimum and maximum temperatures, and manually homogenized data. V3 was released last year and added many new stations (both in the distant past and post-1992, where Version 2 showed a sharp drop-off in available records), and switched the homogenization process to the Menne and Williams Pairwise Homogenization Algorithm (PHA) previously used in USHCN. Figure 1, above, shows the number of station records available for each month in GHCN v1 and v3.
We can perform a number of tests to see if GHCN v1 and v3 differ. The simplest one is to compare the observations in both data files for the same stations. This is somewhat complicated by the fact that station identity numbers changed between v1 and v3, and we have been unable to locate a translation table between the two. We can, however, match stations between the two sets using their latitude and longitude coordinates. This gives us 1,267,763 station-months of data whose stations match between the two sets with a precision of two decimal places.
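The coordinate matching described above can be sketched as follows. This is a minimal illustration in Python, not the STATA code used for the post; the function names, the dictionary layout, and the sample station IDs are all hypothetical.

```python
from collections import defaultdict

def coord_key(lat, lon, places=2):
    """Round coordinates so that stations can be compared at a fixed precision."""
    return (round(lat, places), round(lon, places))

def match_stations(v1_stations, v3_stations, places=2):
    """Pair station IDs whose rounded lat/lon agree.

    Both arguments are dicts of {station_id: (lat, lon)} (a hypothetical layout).
    Returns a list of (v1_id, v3_id) pairs.
    """
    v3_index = defaultdict(list)
    for sid, (lat, lon) in v3_stations.items():
        v3_index[coord_key(lat, lon, places)].append(sid)

    pairs = []
    for sid, (lat, lon) in v1_stations.items():
        for other in v3_index.get(coord_key(lat, lon, places), []):
            pairs.append((sid, other))
    return pairs

# Made-up example: one station agrees to two decimals, one does not.
v1 = {"A1": (48.42, -123.37), "A2": (10.00, 20.00)}
v3 = {"B9": (48.42, -123.37), "B7": (10.05, 20.00)}
print(match_stations(v1, v3))
```

Matching on rounded coordinates can miss stations whose lat/lon were refined between versions, and it can pair distinct stations that share a location, which is one reason the matched differences are not expected to be exactly zero.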
When we calculate the difference between the two sets and plot the distribution, we get Figure 2, below:
Figure 2: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon.
The vast majority of observations are identical between GHCN v1 and v3. If we exclude identical observations and just look at the distribution of non-zero differences, we get Figure 3:
Figure 3: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon, excluding cases of zero difference.
This shows that while the raw data in GHCN v1 and v3 are not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements in the same location (called imods in GHCN version 2), by updates to the data from various national MET offices, or by refinements in station lat/lon over time.
Another way to test if GHCN v1 and GHCN v3 differ is to convert the data of each into anomalies (with baseline years of 1960-1989 chosen to maximize overlap in the common anomaly period), assign each to a 5 by 5 lat/lon grid cell, average anomalies in each grid cell, and create a land-area weighted global temperature estimate. This is similar to the method that NCDC uses in their reconstruction.
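The gridding steps above can be sketched roughly as below. This is an illustrative Python outline under stated assumptions, not the code behind the figures: the data layout and function names are hypothetical, and cells are weighted by the cosine of their central latitude as a simple stand-in for true land-area weights, which would require a land mask.

```python
import math
from collections import defaultdict

def monthly_anomalies(series, base_start=1960, base_end=1989):
    """series: {(year, month): temp_C}. Subtract each month's baseline mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (year, month), t in series.items():
        if base_start <= year <= base_end:
            sums[month] += t
            counts[month] += 1
    # Months with no baseline data are dropped (a simplification).
    return {(y, m): t - sums[m] / counts[m]
            for (y, m), t in series.items() if counts[m]}

def grid_cell(lat, lon, size=5):
    """Index of the size-by-size degree cell containing the point."""
    return (math.floor(lat / size), math.floor(lon / size))

def global_mean(stations, size=5):
    """stations: list of (lat, lon, anomalies). Returns {(year, month): mean}."""
    cell_vals = defaultdict(list)
    for lat, lon, anoms in stations:
        cell = grid_cell(lat, lon, size)
        for ym, a in anoms.items():
            cell_vals[(ym, cell)].append(a)

    num, den = defaultdict(float), defaultdict(float)
    for (ym, cell), vals in cell_vals.items():
        centre_lat = (cell[0] + 0.5) * size
        w = math.cos(math.radians(centre_lat))  # assumed area weight
        num[ym] += w * sum(vals) / len(vals)    # average within the cell first
        den[ym] += w
    return {ym: num[ym] / den[ym] for ym in num}
```

Averaging within each cell before weighting keeps densely sampled regions from dominating the global estimate.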
Figure 4: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies. Note that GHCN v1 ends in 1990 because that is the last year of available data.
When we do this for both GHCN v1 and GHCN v3 raw data, we get the figure above. While we would expect some differences simply because GHCN v3 includes a number of stations not included in GHCN v1, the similarities are pretty remarkable. Over the century scale the trends in the two are nearly identical. This differs significantly from the picture painted by E.M. Smith; indeed, instead of the shift in input data being equivalent to 50% of the trend, as he suggests, we see that differences amount to a mere 1.5% difference in trend.
Now, astute skeptics might agree with me that the raw data files are, if not identical, overwhelmingly similar, but point out that there is one difference I did not address: GHCN v1 had only raw data with no adjustments, while GHCN v3 has both adjusted and raw versions. Perhaps the warming that E.M. Smith attributed to changes in input data might in fact be due to changes in adjustment method?
This is not the case, as GHCN v3 adjustments have little impact on the global-scale trend vis-à-vis the raw data. We can see this in Figure 5 below, where both GHCN v1 and GHCN v3 are compared to published NCDC and GISTemp land records:
Figure 5: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies with NCDC and GISTemp published land reconstructions.
If we look at the trends over the 1880-1990 period, we find that both GHCN v1 and GHCN v3 are quite similar, and lie between the trends shown in GISTemp and NCDC records.
1880-1990 trends
GHCN v1 raw: 0.04845 C (0.03661 to 0.06024)
GHCN v3 raw: 0.04919 C (0.03737 to 0.06100)
NCDC adjusted: 0.05394 C (0.04418 to 0.06370)
GISTemp adjusted: 0.04676 C (0.03620 to 0.05731)
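As a quick check on the arithmetic, the 1.5% figure quoted earlier follows directly from the two raw trends listed above. The sketch below, in Python with hypothetical helper names, computes that relative difference, checks that the two quoted confidence intervals overlap, and includes a plain least-squares slope of the kind that would produce such trends from an annual series.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def percent_difference(a, b):
    """Difference of b relative to a, in percent."""
    return 100.0 * (b - a) / a

def intervals_overlap(lo1, hi1, lo2, hi2):
    """True if the two closed intervals share any point."""
    return lo1 <= hi2 and lo2 <= hi1

# Values copied from the trend list above.
v1_trend, v1_ci = 0.04845, (0.03661, 0.06024)
v3_trend, v3_ci = 0.04919, (0.03737, 0.06100)

print(round(percent_difference(v1_trend, v3_trend), 1))  # 1.5
print(intervals_overlap(*v1_ci, *v3_ci))                 # True
```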
This analysis should make it abundantly clear that the change in raw input data (if any) between GHCN version 1 and GHCN version 3 had little to no effect on global temperature trends. The exact cause of Smith’s mistaken conclusion is unknown; however, a review of his code does indicate a few areas that seem problematic. They are:
1. An apparent reliance on station IDs to match stations. Station IDs can differ between versions of GHCN.
2. Use of First Differences. Smith uses first differences; however, he has made idiosyncratic changes to the method, especially in cases where there are temporal lacunae in the data. The method, which was formerly used by NCDC, has known issues and biases, detailed by Jeff Id. Smith’s implementation and his method of handling gaps in the data are unproven and may be the cause.
3. It’s unclear from the code which version of GHCN v3 Smith used.
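To make the second point concrete, here is a small Python sketch of first differences for a single station series with a gap. It is a hypothetical illustration, not Smith's code or NCDC's: the classical form is assumed to restart after missing values, while the `bridge_gaps` option is an assumed variant that takes the difference across the gap from the last valid value.

```python
def first_differences(values, bridge_gaps=False):
    """values: temperatures in time order, with None for missing data.

    Returns a list of the same length holding the change since the previous
    usable value, or None where no difference is defined.
    """
    diffs = [None] * len(values)
    last_valid = None
    for i, v in enumerate(values):
        if v is None:
            if not bridge_gaps:
                last_valid = None  # classical form: restart after a gap
            continue
        if last_valid is not None:
            diffs[i] = v - last_valid
        last_valid = v
    return diffs

series = [1.0, 2.0, None, 4.0, 5.0]
print(first_differences(series))                    # [None, 1.0, None, None, 1.0]
print(first_differences(series, bridge_gaps=True))  # [None, 1.0, None, 2.0, 1.0]
```

In this toy series the classical differences sum to 2.0 while the bridged ones sum to 4.0, so how gaps are handled can change the accumulated result for a single station; whether that matters once many stations are averaged is an empirical question.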
STATA code and data used in creating the figures in this post can be found here: https://www.dropbox.com/sh/b9rz83cu7ds9lq8/IKUGoHk5qc
Playing around with it is strongly encouraged for those interested.
Carrick
“Steven, as any astute observer will notice, BEST uses adjusted rather that raw data.”
ya I didnt think you had made that claim and dont know how amino got that notion.
Ideally, I’m hoping that people learn to apply a better terminology than raw data.
the surface temperature folks are adopting the level 0, level 1, etc terminology.
where level 0 is the “first report” the actual written form if it exists.
here sunshine. contemplate this
sunshine
“And if people do think I tried to hi-jack the thread, I apologize, but my first post on this thread did start with: “(Moderator — feel free to snip, but I think this is relevant)”
so basically you knew that this thread was about a comparison between Ghcn v1 and Ghcn V3
and you figured that your post was off topic but you’d try to sneak in something that
you’ve tried to get away with before… posting data that you know is ESTIMATED and not being straightforward about your source.
Nice Bruce. I also like the way you explained to people that the first time you tried this
I busted you for using data that was not the QC data.
EM
“Oh, and on Steven’s assertion that the StaionIDs all changed… ”
Once again, please find the quote where I said they all changed? You won’t find it. Because they dont all change. But enough of them change that you cannot use it as a reliable method for station disambiguation or station identification.
vukcevic says:
June 23, 2012 at 1:18 pm
steven mosher says: June 23, 2012 at 7:17 am
You can expect some updates to that Sante fe chart in the coming months. I suspect folks who do spectral analysis will be very interested.
I will look forward to your results.
I suggest to separate two hemispheres, South is less volatile, ocean inertia and CPC flywheel effect, North is affected by gyres and more in sympathy with the GMF
Till then this is what I get
http://www.vukcevic.talktalk.net/NH+SH.htm
When done I’ll email you magnetic data, so you can have some fun with it
##########################################
I’ll see what I can do to separate the data for you.
since you seem keen on making new discoveries and sharing your work
I’ll put that on the top of my list.
Sunshine
So that everyone can understand your question. You come onto a thread that is about Ghcn version 1 ( which is used by no one ) and Ghcnv3
and ask me
“But I think I had a legitimate question as to where a -13.7C data point for Malahat came from in the BEST data.”
Well, lets see. As I pointed out to others there are many sources that BEST uses and those sources have sources
IN the post you link to, you claim to have looked at two BEST data sets.
The “single value” dataset and the QC dataset
http://sunshinehours.wordpress.com/2012/03/13/auditing-the-latest-best-and-ec-data-for-malahat/
The -13.7C figure is one you found in the single value dataset.
As I explained to you before when you MISTOOK this data as the QC data, the single value dataset is the first dataset after the merge. That is, all the sources are combined into a merged dataset and then duplicates are identified and you end up with a dataset that
has a single value for every station month and a source for that data. Hence the name
“Single value”
So, your question is how did this data point get into the single valued dataset?
Answer: the data point was present in one of the sources of data.
And your point, that you jacked this thread around for was what? BEST reads in all the sources
and then compiles a final QC dataset. You pick out a dataset that hasnt been QC’d
and ask me what the source is for one month of data?
And you and the moderators and all the other readers think that this is a relevant question?
Now, I’ve taken my own free time to write software that allows you and anyone else to answer this question, you think that this answer will somehow be relevant to the work that EM smith and Zeke and I did on GHCN version 1 and 3? Say what?
relevant how? and then you beat me up because I wont do the work for you?
So, not only do I have to write the software to allow you to do it for yourself, that is not good enough? you want me to fetch that answer for you?
Good god, where is Willis the thread nanny to tell me where my responsibilities start and end.
Download my Code.
download the data
use the command readSources()
find the stations you want..
Here are the potential sources. That particular Month
7973 1 2002.792 3 22 0 0 0 0 0 0 0 0 0 0
7973 1 2002.875 3 22 0 0 0 0 0 0 0 0 0 0
7973 1 2002.958 3 22
The source is source 22.
Now you can read the source descriptions. I wrote a function for that as well.
here is what it reads.. all the sources.
can you find source 22?
1: US First Order Summary of the Day
2: US Cooperative Summary of the Day
3: Global Historical Climatology Network – Daily
4: Global Summary of the Day
5: Original Manuscript (from USSOD)
11: MAPSO (from USSOD)
13: Unknown / Other (from USSOD)
14: ASOS (from USSOD)
15: US Cooperative Summary of the Day (from GHCN)
17: US Preliminary Cooperative Summary of the Day, keyed from paper (from GHCN)
18: CDMP Cooperative Summary of the Day (from GHCN)
19: ASOS, since 2006 (from GHCN)
20: ASOS, 2000-2005 (from GHCN)
21: US Fort Data (from GHCN)
22: GCOS or other offical Government Data (from GHCN)
23: High Plains Regional Climate Center (from GHCN)
24: International Collection, personal contacts (from GHCN)
25: Monthly METAR Extract (from GHCN)
26: Quarantined African Data (from GHCN)
27: NCDC Reference Network / USHCN (from GHCN)
28: Global Summary of the Day (from GHCN)
29: US First Order Summary of the Day (from GHCN)
34: Scientific Committee on Antarctic Research
35: Hadley Centre Data Release
36: US Cooperative Summary of the Month
37: US Historical Climatology Network – Monthly
38: World Monthly Surface Station Climatology
42: Australian data from Australian Bureau of Met (from GHCN)
44: Ukraine update (from GHCN)
46: NCDC: US Cooperative Summary of the Day
47: NCDC: US First Order Summary of the Day
49: NCDC: CDMP Cooperative Summary of the Day
50: NCDC: Undocumented Summary of the Day
51: NCDC: US Cooperative Summary of the Day – Preliminary
52: NCDC: RCC-Preliminary Summary of the Day
53: GSN Monthly Data
54: Monthly Climatic Data of the World
55: GCOS Monthly CLIMAT Summaries
56: Global Historical Climatology Network – Monthly v3
57: Monthly Climatic Data of the World – Preliminary (from GHCN3)
58: GHCN-M v2 – Single Valued Series (from GHCN3)
59: UK Met Office (from GHCN3)
60: Monthly Climatic Data of the World – Final
62: CLIMAT / non-MCDW (from GHCN3)
63: USHCN v2 (from GHCN3)
64: World Weather Records (from GHCN3)
65: GHCN-M v2 multiple series 0 (from GHCN3)
66: GHCN-M v2 multiple series 1 (from GHCN3)
67: GHCN-M v2 multiple series 2 (from GHCN3)
68: GHCN-M v2 multiple series 3 (from GHCN3)
69: GHCN-M v2 multiple series 4 (from GHCN3)
70: GHCN-M v2 multiple series 5 (from GHCN3)
71: GHCN-M v2 multiple series 6 (from GHCN3)
72: GHCN-M v2 multiple series 7 (from GHCN3)
73: GHCN-M v2 multiple series 8 (from GHCN3)
76: World Weather Records
77: Colonial Archive
79: Colonial Era Archive (from GHCN3)
81: USSOD-C transmitted (from GHCN)
82: USSOD-C paper forms (from GHCN)
83: European Climate Assessment (from GHCN)
And the source data is here. on NCDC site
ftp://ftp.ncdc.noaa.gov/pub/data/gcos/
Now, that phi has got his links to countries other than the US that have done TOBS adjustments and now that you know the source of december 2002 for malahat BC
do you have any intelligent questions?
I would have much to answer, but given my difficulty with English it is beyond my abilities.
Willis, thank you again for your comments, they seem to me quite appropriate.
I will be short by taking only two points that seem important to me in connection with this thread.
1. Steven Mosher claimed that TObs were the main adjustment made. Despite the references, that remains an unproven assertion outside the US. That seems important in connection with this thread because TObs adjustments are the least problematic and, in my interpretation, the divergence revealed by EM Smith is driven primarily by station moves.
2. Implicit homogenization in BEST is related to the segment adjustments. It is true that one can disable the scalpel, but as I said earlier, the NMS aggregate the existing segments; disabling the scalpel only prevents part of the implicit homogenization, and probably the least decisive part.
The benefit of posting links is that readers can read and understand the reason for thinking in a certain way.
If I were to google TOBS etc and read 10 sources, I still may have not found the correct source/paper that caused a particular comment or thought process.
A link to paper X, 2001 would immediately put all readers of this site in sync with the writer.
It may encourage others to reply with “have you read paper Y, 2009, which contradicts paper X…”
In the race to publish papers and blogs, it is extremely helpful if people are told what source has informed opinions.
As Mosher has his panties in a bunch over my minor variation on First Differences, instead of answering comments here this evening, I re-did the anomaly code to do Classical First Differences and then re-ran it on both v1 and v3 “all data”. Calculated the difference between them, and plotted it on the same chart with my dP or dT method. ( i.e the ‘bridge the gap’ method).
Unfortunately, the difference was not as large as I had hoped. I was expecting more induced error in Classical FD from the gratuitous resets on data dropouts. Either there are fewer of them than I thought, or the average error tends to average out more than I expected ( i.e. random rather than systematic). In any case, not much difference between the two methods.
In recent years, near zero, increasing to about 0.05 C for most of it. At about 1850 (the earliest data used that I’ve found for various climate codes – that being Hadley – GIStemp uses 1880) the difference has expanded to about 0.12 C. I still hope the method can be shown more accurate (and superior) on smaller sets of data, or those with larger dropouts. But at least this ought to answer any doubts about dramatic impact from the change.
Chart of comparison here:
http://chiefio.files.wordpress.com/2012/06/classic-fd-vs-dt-or-dp.png
I’ll come back tomorrow and try to catch up on any comments here, then.
Steven Mosher, you said
“willis,……..You yourself are well enough read in this field ( CET and Armagh) to know that the US is not the only record that does TOBS adjustments.”
I dont think that the CET has a specific TOBs adjustment in it, Parker says
‘Manley took considerable care to compensate his monthly series for changes in observation time. Much of this compensation was implicit, through his use of overlaps between stations to make adjustments for changes of site’
http://www.metoffice.gov.uk/hadobs/hadcet/Parker_etalIJOC1992_dailyCET.pdf
Pamela Gray says: “I think station dropout has to do with abandoned stations in less populated areas of the US. “
An interesting speculation. It would be nice if you would study it and try to quantify the effect.
sunshinehours1 says: “Like the average elevation dropping 46 meters from 1940 to 2000?”
If the new lower stations are treated as new stations, this will have no effect because the trends are computed on the anomalies.
If the new lower locations are merged with the data of an older long station, you would remove this effect by homogenization. Either when the change is made, by performing several years of parallel measurements, as the WMO advises, or if this was not possible you can do this afterwards by statistical homogenization. This community is funny, one half is against homogenization and wants to use raw data and the other half points to problems that are in the raw data, which you could remove by homogenization. Whatever the climatologist does, it is wrong.
http://en.wikipedia.org/wiki/Lernaean_Hydra
DocMartyn says: “Well if you make sure all the cooling stations left are inside very closely spaced clusters of warming stations and make sure that the ones you remove are near by the many voids, you make the voids warmer. …you have a smoking gun.”
Yes you could make the effect of non-random station drop out stronger by also taking the density of stations into account. But please quantify this effect before you call it a smoking gun. Currently, I would call it an incense stick, at best.
I went to Steve McIntyre’s site and did a search on “Berkeley best”. I hadn’t been to his site in quite a while. I was just curious. He sees problems with BEST also.
http://climateaudit.org/?s=berkeley+best
Carrick
If you want to avoid giving the impression that you think BEST and GISTemp are true to real world temps you should word things more carefully than this:
“BEST and GISTEMP get very findings, since they have the largest geographical coverage”
If you think BEST and GISTemp have a bias then come out and say it. Don’t act like you think both can be possible.
I have mentioned this before to Steven Mosher on other sites, he keeps talking about QC datasets and mentions BEST in the same breath.
Perhaps he can explain why so many of their northern hemisphere stations have as high or higher Average Temperatures in Mid Winter (Jan/Feb/Mar) as those in mid summer (June/July/Aug).
I do not know if any of the other QC datasets suffer with the same problem, but I know for certain that BEST does.
Quality Control that cannot identify the Improbable and maybe impossible is not fit for use.
Carrick says:
June 24, 2012 at 10:21 pm
“I will claim I never said “BEST is raw untouched data”,”
And I never said you said it either. I said you gave the impression of it. You changed what I said. Now can you see your bias?
More of your dancing in circles.
steven mosher says:
June 24, 2012 at 10:56 pm
“Let me see if I can help”
You were not able to help. You went off on a tangent. You digressed. You did not address what I actually said.
I said Carrick gave the impression. I did not say a Google search to find what BEST really is gave the impression. You and Carrick are turning what I said into something I did not say. Odd things happening here.
steven mosher and Carrick
here is what I said:
http://wattsupwiththat.com/2012/06/22/comparing-ghcn-v1-and-v3/#comment-1017458
steven mosher says:
June 23, 2012 at 6:20 am
“Tobs is the single largest adjustment made to most records. It happens to be a warming adjustment.”
How convenient.
Victor: “If the new lower stations are treated as new stations …”
Or they just stopped using the data from some higher elevation stations.
We don’t know. Maybe the people who are creating the GAT should mention it and explain what happened.
Mosher, I knew I was comparing SV to QC. I said so on my blog back in March:
“I am comparing the BEST SV (Single Valued) data to the BEST QC (Quality Controlled) data.”
http://sunshinehours.wordpress.com/2012/03/13/auditing-the-latest-best-and-ec-data-for-malahat/
Mosher: “Answer: the data point was present in one of the sources of data.”
Why? It bears no relation to reality.
Mosher: “but generally over space and time the planet has been warming.”
… and cooling and warming and cooling …
Then why do you change the topic when we try and find out which parts are cooling now and why?
I want to apologize to folks for not actively participating in this thread earlier. I was away at a retreat this weekend, and (unbeknownst to me beforehand) they did not have working wifi… Steve has done a pretty good job addressing concerns, but here are responses to some outstanding comments:
Wayne,
While there is a slightly positive mean bias in the differences, it is relatively small, as upwards of 90% of the differences are zero. Any systemic bias between the two sets would show up in Fig. 4, but as you can see the difference in century-scale trends is only ~1.5% and well within the error bars for each.
I’m rather swamped this week, but I’ll see if I can do an analysis of differences between the two over time. I’m also trying to get a station_id conversion from NCDC so I’m not just trying to match based on lat/lon.
Bill Illis,
The differences between NCDC’s record in GHCN v2 and v3 relate primarily to the adjustments. This post (and analysis) deals solely with unadjusted data.
For folks rehashing the station dropout debate, there are a number of data points that should be reassuring:
1) As we discovered when we first examined this, stations that dropped out of GHCN v2 around 1992 had a slightly higher trend pre-1992 than stations that did not drop out (due in part to better sampling of higher latitudes). This is the opposite of what you would expect to see if cooling stations were purposefully dropped.
2) GHCN v3 added in a bunch of new data post-1992 but the record over that period did not change appreciably.
3) Alternative datasets with much greater coverage over that period that experience no decrease in station count (GSOD/ISH, GHCN Daily, Berkeley) show effectively the same results as GHCN.
Zeke: “As we discovered when we first examined this, stations that dropped out of GHCN v2 around 1992 had a slightly higher trend pre-1992 ”
What do you mean by trend? 1950-1992?
John Doe,
Anyway, this is wrong. TObs isn’t the largest single adjustment. Steven Mosher gave a link that contradicts it: http://sciencelinks.jp/j-east/display.php?id=000020000500A0108818
According to the summary, if you evaluate the effect on trends in average temperatures, you see that this should be about 0.1 ° C over the twentieth century. This is absolutely not the dominant cause.