Comparing GHCN V1 and V3

Much Ado About Very Little

Guest post by Zeke Hausfather and Steve Mosher

E.M. Smith has claimed (see his full post here: Summary Report on v1 vs v3 GHCN) to have found numerous differences between GHCN version 1 and version 3, differences that, in his words, constitute “a degree of shift of the input data of roughly the same order of scale as the reputed Global Warming”. His analysis is flawed, however: the raw data in GHCN v1 and v3 are nearly identical, and trends in the globally gridded raw data for both are effectively the same as those found in the published NCDC and GISTemp land records.


Figure 1: Comparison of station-months of data over time between GHCN v1 and GHCN v3.

First, a little background on the Global Historical Climatology Network (GHCN). GHCN was created in the late 1980s after a large effort by the World Meteorological Organization (WMO) to collect all available temperature data from member countries. Many of these were in the form of logbooks or other non-digital records (this being the 1980s), and many man-hours were required to process them into a digital form.

Meanwhile, the WMO set up a process to automate the submission of data going forward, setting up a network of around 1,200 geographically distributed stations that would provide monthly updates via CLIMAT reports. Periodically NCDC undertakes efforts to collect more historical monthly data not submitted via CLIMAT reports, and more recently has set up a daily product with automated updates from tens of thousands of stations (GHCN-Daily). This structure of GHCN as a periodically updated retroactive compilation with a subset of automatically reporting stations has in the past led to some confusion over “station die-offs”.

GHCN has gone through three major iterations. V1 was released in 1992 and included around 6,000 stations, with only mean temperatures available and no adjustments or homogenization. V2 was released in 1997 and added a number of new stations, minimum and maximum temperatures, and manually homogenized data. V3 was released last year and added many new stations (both in the distant past and post-1992, where V2 showed a sharp drop-off in available records), and switched the homogenization process to the Menne and Williams Pairwise Homogenization Algorithm (PHA) previously used in USHCN. Figure 1, above, shows the number of station records available for each month in GHCN v1 and v3.

We can perform a number of tests to see whether GHCN v1 and v3 differ. The simplest is to compare the observations in both data files for the same stations. This is somewhat complicated by the fact that station identity numbers have changed between v1 and v3, and we have been unable to locate a translation table between the two. We can, however, match stations between the two sets using their latitude and longitude coordinates. This gives us 1,267,763 station-months of data matched between the two sets with a precision of two decimal places.
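(The actual analysis was done in STATA, linked at the end of the post. Purely as an illustration, the coordinate-matching step might look like the following Python/pandas sketch; the data frames, column names, and values here are hypothetical, not the real GHCN file layout.)

```python
import pandas as pd

# Hypothetical toy tables standing in for GHCN v1 and v3 monthly records.
v1 = pd.DataFrame({
    "lat": [48.35, 51.12, -33.87],
    "lon": [11.78, 17.04, 151.21],
    "year": [1951, 1951, 1951],
    "month": [1, 1, 1],
    "temp_v1": [-2.1, -3.4, 22.8],
})
v3 = pd.DataFrame({
    "lat": [48.35, 51.12, -12.46],
    "lon": [11.78, 17.04, 130.84],
    "year": [1951, 1951, 1951],
    "month": [1, 1, 1],
    "temp_v3": [-2.1, -3.3, 28.9],
})

# Round coordinates to two decimal places and join on (lat, lon, year, month),
# mirroring the two-decimal matching precision described in the text.
for df in (v1, v3):
    df["lat"] = df["lat"].round(2)
    df["lon"] = df["lon"].round(2)

matched = v1.merge(v3, on=["lat", "lon", "year", "month"], how="inner")
matched["diff"] = matched["temp_v3"] - matched["temp_v1"]
print(matched[["lat", "lon", "diff"]])
```

Stations present in only one version simply drop out of the inner join, which is why the matched count (1,267,763 station-months) is smaller than either file.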

When we calculate the difference between the two sets and plot the distribution, we get Figure 2, below:


Figure 2: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon.

The vast majority of observations are identical between GHCN v1 and v3. If we exclude identical observations and just look at the distribution of non-zero differences, we get Figure 3:


Figure 3: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon, excluding cases of zero difference.

This shows that while the raw data in GHCN v1 and v3 are not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements at the same location (called imods in GHCN version 2), by updates to the data from various national met offices, or by refinements in station lat/lon over time.

Another way to test if GHCN v1 and GHCN v3 differ is to convert the data of each into anomalies (with baseline years of 1960-1989 chosen to maximize overlap in the common anomaly period), assign each to a 5 by 5 lat/lon grid cell, average anomalies in each grid cell, and create a land-area weighted global temperature estimate. This is similar to the method that NCDC uses in their reconstruction.
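As a sketch of this gridding procedure (not the code actually used, which is linked at the end of the post), the steps might look like this in Python, assuming a tidy station table with illustrative column names and using cosine-of-latitude weights as a simple stand-in for land-area weighting:

```python
import numpy as np
import pandas as pd

def global_land_anomaly(df, base=(1960, 1989), cell=5.0):
    """Gridded global mean anomaly from station data.

    df columns (illustrative names, not the real GHCN layout):
    station, lat, lon, year, month, temp.
    """
    # 1. Anomalies: subtract each station's monthly mean over the baseline.
    in_base = df["year"].between(*base)
    clim = df[in_base].groupby(["station", "month"])["temp"].mean().rename("clim")
    df = df.join(clim, on=["station", "month"])
    df["anom"] = df["temp"] - df["clim"]

    # 2. Assign each station to a cell x cell degree grid box.
    df["glat"] = (np.floor(df["lat"] / cell) + 0.5) * cell  # box-center latitude
    df["glon"] = (np.floor(df["lon"] / cell) + 0.5) * cell

    # 3. Average anomalies within each box, then weight boxes by cos(latitude),
    #    proportional to box area, for the global mean.
    boxes = df.groupby(["year", "glat", "glon"], as_index=False)["anom"].mean()
    boxes["w"] = np.cos(np.radians(boxes["glat"]))
    boxes["wa"] = boxes["anom"] * boxes["w"]
    agg = boxes.groupby("year")[["wa", "w"]].sum()
    return agg["wa"] / agg["w"]

# Tiny synthetic demo (not GHCN data): two stations sharing a 0.1 C/yr trend.
demo = pd.DataFrame({
    "station": ["A"] * 31 + ["B"] * 31,
    "lat": [2.0] * 31 + [62.0] * 31,
    "lon": [2.0] * 62,
    "year": list(range(1960, 1991)) * 2,
    "month": [1] * 62,
    "temp": [10 + 0.1 * k for k in range(31)] + [0.1 * k for k in range(31)],
})
print(global_land_anomaly(demo).loc[1990])  # ~1.55, the common anomaly
```

Using a common baseline period with good overlap (here 1960-1989) matters: stations with different absolute temperatures contribute comparable anomalies, so changes in station composition do not masquerade as trends.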


Figure 4: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies. Note that GHCN v1 ends in 1990 because that is the last year of available data.

When we do this for both GHCN v1 and GHCN v3 raw data, we get the figure above. While we would expect some differences simply because GHCN v3 includes a number of stations not included in GHCN v1, the similarities are remarkable. On the century scale the trends in the two are nearly identical. This differs significantly from the picture painted by E.M. Smith: instead of the shift in input data being equivalent to 50% of the trend, as he suggests, the difference amounts to a mere 1.5% of the trend.

Now, astute skeptics might agree with us that the raw data files are, if not identical, overwhelmingly similar, but point out that there is one difference we did not address: GHCN v1 had only raw data with no adjustments, while GHCN v3 has both adjusted and raw versions. Perhaps the warming that E.M. Smith attributed to changes in input data might in fact be due to changes in adjustment method?

This is not the case, as GHCN v3 adjustments have little impact on the global-scale trend vis-à-vis the raw data. We can see this in Figure 5 below, where both GHCN v1 and GHCN v3 are compared to published NCDC and GISTemp land records:


Figure 5: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies with NCDC and GISTemp published land reconstructions.

If we look at the trends over the 1880-1990 period, we find that both GHCN v1 and GHCN v3 are quite similar, and lie between the trends shown in GISTemp and NCDC records.

1880-1990 trends

GHCN v1 raw: 0.04845 C (0.03661 to 0.06024)

GHCN v3 raw: 0.04919 C (0.03737 to 0.06100)

NCDC adjusted: 0.05394 C (0.04418 to 0.06370)

GISTemp adjusted: 0.04676 C (0.03620 to 0.05731)
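The ranges quoted above appear to be confidence intervals on the fitted slopes. For readers who want to reproduce this kind of estimate, here is a minimal sketch (numpy only, with a synthetic series standing in for the gridded anomalies, not the actual GHCN data):

```python
import numpy as np

def trend_with_ci(years, anom, tcrit=1.98):
    """OLS slope of anomaly on year, with an approximate 95% confidence
    interval from the slope's standard error. tcrit ~ 1.98 is the two-sided
    95% t quantile for ~100 degrees of freedom; no correction is made for
    autocorrelation, which would widen the interval."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(anom, dtype=float)
    xm = x - x.mean()
    slope = (xm * y).sum() / (xm ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    se = np.sqrt((resid ** 2).sum() / (x.size - 2) / (xm ** 2).sum())
    return slope, slope - tcrit * se, slope + tcrit * se

# Illustrative annual series over 1880-1990 with a built-in 0.005/yr trend.
years = np.arange(1880, 1991)
anom = 0.005 * (years - 1935) + 0.1 * np.sin(years / 3.0)
slope, lo, hi = trend_with_ci(years, anom)
print(f"trend: {slope:.5f} per year ({lo:.5f} to {hi:.5f})")
```

Note that whether the intervals of two series overlap (as all four above do) is a quick sanity check, not a formal test, since the series are highly correlated with one another.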

This analysis should make it abundantly clear that the change in raw input data (if any) between GHCN version 1 and GHCN version 3 had little to no effect on global temperature trends. The exact cause of Smith’s mistaken conclusion is unknown; however, a review of his code does indicate a few areas that seem problematic. They are:

1. An apparent reliance on station IDs to match stations. Station IDs can differ between versions of GHCN.

2. Use of first differences. Smith uses the first-difference method, but he has made idiosyncratic changes to it, especially in cases where there are temporal lacunae in the data. The method, formerly used by NCDC, has known issues and biases, detailed by Jeff Id. Smith’s implementation, and his handling of gaps in the data in particular, is unproven and may be the cause.

3. It is unclear from the code which version of GHCN v3 Smith used.
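For context on point 2, the reference first-difference method works roughly as follows: take each station's year-to-year temperature differences, average those differences across stations, and cumulatively sum the result. A minimal sketch of the standard gap handling (our reading of the method, not Smith's code):

```python
import numpy as np

def first_difference_series(data):
    """Composite temperature series via first differences.

    data: 2-D array (stations x years) with np.nan for missing values.
    A difference that spans a missing value is itself missing; bridging a
    gap with the difference across it (one reading of the idiosyncratic
    variant) injects the whole gap's change into a single year and can
    bias the composite."""
    d = np.diff(data, axis=1)               # nan if either endpoint is missing
    mean_d = np.nanmean(d, axis=0)          # average step across stations
    mean_d = np.where(np.isnan(mean_d), 0.0, mean_d)  # no valid pairs: flat
    return np.concatenate(([0.0], np.cumsum(mean_d)))

# Two stations with the same 0.1/yr trend; one has a one-year gap.
demo = np.array([
    [10.0, 10.1, 10.2, 10.3, 10.4],
    [20.0, 20.1, np.nan, 20.3, 20.4],
])
print(first_difference_series(demo))  # recovers the common trend
```

Because the final series is a cumulative sum, any systematic error in how gaps are bridged accumulates over the record rather than averaging out, which is why implementation details here matter so much.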

STATA code and data used in creating the figures in this post can be found here: https://www.dropbox.com/sh/b9rz83cu7ds9lq8/IKUGoHk5qc

Playing around with it is strongly encouraged for those interested.

June 24, 2012 9:53 am

Carrick’s probably gone for a beer – non carbonated – with Mosher and Zeke.

Venter
June 24, 2012 9:54 am

Chiefio,
Yes, I’m up against FDA and TGA and EMEA/EDQM all the time, and work on filing DMFs and CTDs for obtaining approvals. Every bit of data and every experiment, whether positive or negative, has to be recorded and documented, and no “in-filling” is allowed. Every small change has to be documented and DMFs updated constantly. Every single move and process has to be validated independently, quite rigorously. That’s what makes me laugh when so-called educated gents come and spout BS about the way GHCN and other temperature data are recorded and handled, or I would say mishandled. What they are condoning would be called out as fraudulent practice in every hard scientific field.

Pamela Gray
June 24, 2012 10:45 am

Carrick, love marble graphs. However, that graph needs to be viewed using unhomogenized raw sensor data, and correlated with ENSO oscillations over time spans defined as El Nino, La Nina, and Neutral, and probably with multi-variate parameters, i.e. PDO, AO and AMO oscillations. Then turned into a movie with the marbles changing colors as they cool or warm year by year, with the background color scheme changing according to analogue ENSO years. In addition, each station listed needs to be given a numerical error-bar value, related to its degradation/equipment changes over time, as rollover popups.

Editor
June 24, 2012 10:48 am

Carrick says:
June 23, 2012 at 10:40 pm

Willis:

Heck, contrary to your usual practice, you even actually answered a few questions

Serious question here:
How many questions are they required to answer? I don’t expect d****bags deserve any answer at all, for example.

So this is meant as a serious question, Willis? How clueless does a person need to be before we are allowed to blow them off?

Asking someone for a citation to their claim is not a “clueless question”, Carrick. It is everyday scientific practice.
How many requests for citations do they need to answer? Well … about as many as the number of uncited claims that they make. In this particular example, Steven Mosher claimed that the TOBS adjustments didn’t just apply to the US. Citing his authority for this would have been a trivial thing to do, but instead he wants to play “Go Fish” …
I’ve played that game, Carrick, and here is how it turns out all too often. Someone refuses to answer my request for a citation, so I go to look. I find something, and return to discuss it. The person who told me to go fish says ‘No, that’s not the citation I was thinking of’, so we’re back to square one. So I refuse to play it any more. Mosher is one of the larger offenders in this area.
One of the things that makes science work is transparency. Part of that transparency is to cite your sources for your claims. That’s why there are long lists of references at the bottom of scientific papers. If you do not do so, people are well justified in asking for those citations. If you refuse to provide them upon request, I’ve gotten to the stage where I just point and laugh.
Finally, as to whether “douchebags” deserve any answer, I just re-read the comments from the person “phi” asking Mosh the question. I find nothing to indicate that he is a troll, a nit-picker, or a “douchebag”. But even if he were, HE’S JUST ASKING FOR A FREAKIN’ CITATION TO A SCIENTIFIC CLAIM.
So yes, he definitely deserves an answer, even if he is just the janitor.
w.

E.M.Smith
Editor
June 24, 2012 11:16 am

Interesting set of comments since I was last here.
As I need to get ready for a visit to a local church, I’m not going to do my usual canonical response.
Looks like Carrick tossed some smears, flung some “poo insults”, and when he saw “no joy” packed up and left. Fine with me. Never saw much reason to indulge in “poo fling contests”, as things are not more pleasant nor cleaner at the end. (Better to keep just enough distance and watch what the wind does 😉)
Attempts at distraction to “The Radiative Model” again, too. I’m bored with argument of the form:
“If we ignore conduction, evaporation, convection, and condensation: radiation dominates!”
The world is a spherical heat pipe with water as the working fluid. Radiation is irrelevant below the top of the atmosphere. At those levels, added CO2 causes more heat loss.
http://wattsupwiththat.com/2012/06/19/a-demonstration-of-negative-climate-sensitivity/
http://chiefio.wordpress.com/2011/07/11/spherical-heat-pipe-earth/
Per GIStemp:
Having been through the code and having it running on LINUX; it is just a giant “pot stir” on top of GHCN. Does yet more smearing and blending and yet more changing of data. Then does an “anomaly creation” for grid / boxes at the very last step that compares fictional value to fictional value. Don’t see much merit in it at all.
Oh, and created by a guy who testified that it was a GOOD thing to break the law if your cause was good enough… Yeah, I’ll trust that Moral Compass to not break the ethics of science when he thinks he is saving the world… /sarcoff>;
Per Church:
Just so folks know… I’m not a strongly religious type (though I have my moments and did pick up a D.D. at one point.) The spouse is. Mostly I find religion and comparative religion very interesting and the historical record in books like the Bible rather accurate (as recent archeology digs have shown). So we “collect churches” from time to time going to different ones just to see how each does things. (Some are pretty strange, others fun, some somber, others songs and play… and a few just flat out confusing and alien in foreign languages.) It can be a fun hobby and gets me away from computers and climate for a while…
@Verity:
Oh Dear! I did swap TonyB for Kevin… My apologies! What can I say? It had been a long day…
I’ll check back in this evening.

phi
June 24, 2012 12:13 pm

Willis Eschenbach,
Thank you.
But I would have said:
“So yes, he definitely deserves an answer, ESPECIALLY if he is just the janitor.”

June 24, 2012 12:54 pm

Pamela Gray says: “Victor, I read your blog post. Very interesting. What are your thoughts regarding non-random station dropout …”
Thank you Pamela Gray. I have studied homogenisation, therefore I feel qualified to make a statement about that.
I did not study non-random station dropout. Intuitively, I would be very much surprised if it were possible to change the trends by removing a small fraction of the climatological stations. After homogenisation, the trends in neighbouring stations are quite consistent. Thus removing one will probably not change the average regional trend much. Do you have any study that indicates that such a thing is possible? You could study it by taking the homogenised GHCN data, removing the x percent of stations with the smallest trend, and seeing what happens. I would expect you get a larger effect if you do this with the homogenised data than with the raw data (and then homogenise it, of course).
For the US there is a small climate reference network, set up at pristine locations. The series is still only a few years long, but Matthew Menne and Claude Williams (NOAA) found that the trends of this reference network, averaged over the US, matched those of the homogenised data of the full network. That is an indication that dropout is not a serious problem for the trends in the average temperature over a large region.
For the dropout to be non-random, you would need a conspiracy of hundreds of people in all countries of the Earth. Do you really see this as realistic? As a comment to the original E.M. Smith post, I already explained why I do not believe in a conspiracy. Unfortunately this comment was censored.
For the global mean, I do not expect many problems, but station dropout is a problem for studying the regional climate and changes in extreme weather. Just at the moment, the Czech Republic is closing down a third of its stations to balance the national budget. Which stations will be closed is normally decided by the meteorologists and the financial department. Climatologists are not the powerful elites you guys seem to think they are. If you can make some noise and are able to fight the station dieback, you will have the climatologists at your side. Maybe we could try to get UNESCO on our side; the climatological network is part of our human heritage.
E.M. Smith wrote: “For homogenizing, the techniques vary, but largely use the same kind of “average a bunch” to get a value. ”
Please educate yourself. Your description is completely wrong.
http://variable-variability.blogspot.com/2012/01/homogenization-of-monthly-and-annual.html

phlogiston
June 24, 2012 1:19 pm

Should we not just scrap the corrupt thermometer record and look for reliable recent proxies (not tree rings)? Or make a rocket big enough to put up a satellite which peeks at Earth out of a lead box, so its electronics and CCDs don’t get roasted by solar and cosmic rays?

Pamela Gray
June 24, 2012 3:06 pm

I am speculating. I think station drop-out and change could very well be non-random in terms of ENSO patterns but not because of a conspiracy. I think station dropout has to do with abandoned stations in less populated areas of the US. If you look at ENSO analogue years, you will find that areas that are highly sensitive to ENSO multi-decadal changes also happen to be in low-population areas. There is potential there for abandoned station drop-out removing sensors that would have otherwise recorded the large decadal ups and downs of ENSO effects on temperature.

DocMartyn
June 24, 2012 3:17 pm

“Victor Venema
I did not study non-random station dropout. Intuitively I would be very much surprised if it would be possible to change the trends by removing a small fraction of the climatological stations. After homogenisation, the trends in neighbouring stations are quite consistent. Thus removing one will probably not change the average regional trend much”
Well, if you make sure all the cooling stations left are inside very closely spaced clusters of warming stations, and make sure that the ones you remove are near the many voids, you make the voids warmer.
You then make sure all the warming stations removed are inside very closely spaced clusters of warming stations, and make sure that the ones you keep are near the many voids; you make the voids even warmer.
It is placing the warming stations, and removing the cooling stations, adjacent to void regions which will make all the difference.
If you find a cooling station all alone, a long distance from everyone else, get rid of it (intuitively ‘knowing’ it’s crap), and the created void is averaged to the surrounding stations.
If you find a rapidly warming station all alone, a long distance from everyone else, keep it (intuitively ‘knowing’ it’s good), and the non-void is averaged to the surrounding stations.
You can look for selection bias quite quickly: just look at the nearest four station distances in the dropped data. If the distance of the dropped colder stations is bigger than in the warming stations, you have a smoking gun.

Carrick
June 24, 2012 3:22 pm

Willis, thanks for the response. I would phrase it slightly differently, whether a person “deserves” an answer depends a bit on how many times he’s been told the same information. Wouldn’t you agree?
I am not myself loath to give out references to literature I think is pertinent, so generally I do agree with your sentiment.

Carrick
June 24, 2012 3:31 pm

sunshine:

What you actually did is cherry picked data that suited your paradigm. Then you said if anyone doesn’t agree with you they are biased. Then you left.

… because of “Kitchen remodel in progress”, as I mentioned. But nice job of being a completely dishonest sleaze in your description of what transpired.
As for the other, I didn’t cherry-pick the data; I chose it based on objective criteria, to minimize homogenization issues among other reasons. This goes under the rubric of “quality control”, and the criteria and methodology I used are both credible and in general use in other branches of science.
You don’t like what you see because it doesn’t give the answer you like. But regardless, you’re wrong about the claim that 1/3 of Mueller’s data (really GHCN) had negative trends, and so was he, though as I mentioned you didn’t understand the caveats associated with that claim. You were uncritical of reports that favored your prior beliefs, which makes you credulous, not a skeptic at all, and now you aren’t able to honestly admit error when it’s pointed out to you.
As to the last, when I said people were biased, I meant everybody is biased, and that includes me; we know this in science, and we account for it in the process of the scientific method.
Cheers… Lunch/dinner stop from tiling. Then back to the kitchen.
[Moderator’s Note: Rare agreement with Carrick: some people (probably not moderators) have lives outside of commenting on WUWT. Carrick, good luck with the tiling! If I try that myself, I now know where to go for advice. -REP]

gallopingcamel
June 24, 2012 3:34 pm

Sadly, this discussion is way above my pay grade.
However, it is inspiring to see a debate carried out in such depth with minimal name-calling or appeals to authority. Even the usually grumpy Steve has been mostly civil. It probably means that there is some mutual respect among the principals (Steve, Zeke and Chiefio) sadly lacking in most discussions of climate change issues.

davidmhoffer
June 24, 2012 3:44 pm

Pamela Gray;
I didn’t study station drop-out per se, but a few years ago I pulled apart NASA/GISS in considerable detail to compare the trends of grid cells with continuous data to those that “came and went”. I fully expected that the increase in grid cells with data, and the subsequent decline, would account for some portion of the temperature trend. I proved myself wrong (to my chagrin).

June 24, 2012 4:10 pm

1) The paragraph Carrick claims I wrote was not written by me. It was written by Amino Acids.
Maybe Carrick and Editor should check …
Victor: “For the dropout to be non-random, you would need a conspiracy of hundreds of people in all countries of the Earth. ”
Like the average elevation dropping 46 meters from 1940 to 2000?

June 24, 2012 4:12 pm

To Carrick and Moderator :
Amino Acids wrote the paragraph attributed to me.
But calling me a dishonest sleaze is definitely consistent with Carrick’s modus operandi of verbally bullying people he is losing arguments to.

Manfred
June 24, 2012 4:16 pm

wayne says:
June 23, 2012 at 5:49 am
In Figure 3: http://wattsupwiththat.files.wordpress.com/2012/06/clip_image006.png
“This shows that while the raw data in GHCN v1 and v3 is not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements in the same location (called imods in GHCN version 2), by updates to the data from various national MET offices, or by refinements in station lat/lon over time.”
Zeke, that is not a correct statement above, “there is little bias”. I performed a separation of the bars right of zero from the bars on the left of zero and did an exact pixel count of each of the two portions.
To the right of zero (warmer) there are 9,222 pixels contained within the bars, and to the left of zero (cooler) there are 6,834 pixels. That makes the warm-side adjustments 135% of those on the cooler side. Now I do not count that as “basically the same” or “insignificant”. Do you? Really?
—————————————————————-
Hi, Mosher, I find it a bit distressing to see you again ignore a clear and very easy to understand issue with your analysis. I don’t think by doing this you are up to the standards of this website.

June 24, 2012 4:41 pm

steven mosher says:
Here are some examples: Tmin is reported as being greater than Tmax; Tmax is reported as being less than Tmin; temperatures of +15000C being reported; temperatures of -200C being reported. There are scads of errors like this: data items being repeated over and over again. In a recent case where I was looking at heat wave data, we found one station reporting freezing temperatures. When people die in July in the midwest and a station’s “raw data” says that it is sub zero, I have a choice: believe the doctor who said they died of heat stroke, or believe the raw data of a temperature station.

To me, this speaks to the unreliability of the record more than anything else. How can unreliable data be used to make reliable models?
BTW – It’s an honest question, from a non-scientist who’s trying to understand, surely that’s a worthwhile question to answer?

Carrick
June 24, 2012 4:57 pm

sunshine, I apologize for the misattribution.

June 24, 2012 5:07 pm

I wonder if homogenization of data or some other algorithm has removed some of the more extreme negative anomalies from more recent data, because even in the NOAA data low extremes seem to have stopped.
Here is a pattern I find in many US states and some of the western provinces. I’ll use Washington States as an example.
Before 1987, there were 11 months with an anomaly of more than -10F (two of which were over -15F). After 1987 there were none. The number of large positive anomalies did not go up. And those never went over 8.
http://sunshinehours.wordpress.com/2012/06/24/usanoaa-5-year-averages-plotted-using-all-montly-anomalies/
BC is similar:
http://sunshinehours.wordpress.com/2012/06/24/british-columbia-environment-canada-5-year-averages-plotted-using-all-monthly-anomalies/
Or maybe large negative anomalies just stopped occurring.

Gene
June 24, 2012 5:17 pm

, re: thermometer bulb shrinkage
It is news to me, although it does sound plausible. The only data point I can throw at this issue is this: the oldest mercury thermometer I possess is Jenaer Normalglas 18/III made in 1940, -38 .. +46C, graduated to 0.2C. I just compared it across its entire range to an electronic instrument with a K-type thermocouple and 0.1C precision. Their readings are identical to +/-0.2C, with no perceptible bias.

June 24, 2012 5:36 pm

TonyG says:
June 24, 2012 at 4:41 pm (Edit)
steven mosher says:
Here are some examples: Tmin is reported as being greater than Tmax; Tmax is reported as being less than Tmin; temperatures of +15000C being reported; temperatures of -200C being reported. There are scads of errors like this: data items being repeated over and over again. In a recent case where I was looking at heat wave data, we found one station reporting freezing temperatures. When people die in July in the midwest and a station’s “raw data” says that it is sub zero, I have a choice: believe the doctor who said they died of heat stroke, or believe the raw data of a temperature station.
To me, this speaks to the unreliability of the record more than anything else. How can unreliable data be used to make reliable models?
BTW – It’s an honest question, from a non-scientist who’s trying to understand, surely that’s a worthwhile question to answer?
#############################################
A couple of points.
1. The frequency of these types of errors is low.
2. Even with these errors left in, you get roughly the same answer.
3. This data is not used to build models. Models are built from first principles, not data.

DR
June 24, 2012 5:43 pm

3. This data is not used to build models. Models are built from first principles, not data

Actually, very little is built from first principles.

June 24, 2012 5:43 pm

Gene,
Having calibrated hundreds of thermometers [mercury & alcohol], and B, J, R, K, and S-type thermocouples, etc., plus PRTs and just about every other electronic temperature measuring device invented, I can assure you that a good mercury thermometer is more accurate, linear, repeatable and reliable than the others. If bulb deformation happens, I’ve seen no evidence of it. We would often use a known accurate mercury thermometer to do a quick verification of an electronic thermometer; but not vice-versa. Hysteresis is more of a problem in electronic instruments than in mercury thermometers.
WUWT had an article on thermometer calibration a year or two ago, which I can’t locate for some reason. But it was taken from this blog, which has lots of good info, and it shows how complex even calibrating a simple mercury thermometer can be.
That is one reason why I place great reliance on the Central England Temperature [CET] record, which is based on a mercury thermometer and covers the past several hundred years. It shows the planet emerging from the LIA along the same long term trend line with no acceleration of temperatures, even though there has been about a 40% rise in CO2 during that time. That is extremely strong evidence that the claimed warming effect of CO2 is greatly exaggerated.

June 24, 2012 6:01 pm

Carrick
You did cherry pick data.
You told me to compare some land data to some other land data. You did not include ocean temperature. And ocean temperature is where GISTemp fails.
To pick only Mueller’s and Hansen’s work and say they are correct, but not other data sets, shows a clear bias. Why can’t you see that? Would it be because of your own bias? You cannot see you are cherry-picking. These arguments over which data set is valid always go in these same circles.
Also, name calling should have no place in any discussion other than elementary school yards.
And, apparently, you aren’t really gone.
