Comparing GHCN V1 and V3

Much Ado About Very Little

Guest post by Zeke Hausfather and Steve Mosher

E.M. Smith has claimed (see the full post: Summary Report on v1 vs v3 GHCN) to have found numerous differences between GHCN version 1 and version 3, differences that, in his words, constitute “a degree of shift of the input data of roughly the same order of scale as the reputed Global Warming”. His analysis is flawed, however, as the raw data in GHCN v1 and v3 are nearly identical, and trends in the globally gridded raw data for both are effectively the same as those found in the published NCDC and GISTemp land records.


Figure 1: Comparison of station-months of data over time between GHCN v1 and GHCN v3.

First, a little background on the Global Historical Climatology Network (GHCN). GHCN was created in the late 1980s after a large effort by the World Meteorological Organization (WMO) to collect all available temperature data from member countries. Many of these were in the form of logbooks or other non-digital records (this being the 1980s), and many man-hours were required to process them into a digital form.

Meanwhile, the WMO set up a process to automate the submission of data going forward, setting up a network of around 1,200 geographically distributed stations that would provide monthly updates via CLIMAT reports. Periodically NCDC undertakes efforts to collect more historical monthly data not submitted via CLIMAT reports, and more recently has set up a daily product with automated updates from tens of thousands of stations (GHCN-Daily). This structure of GHCN as a periodically updated retroactive compilation with a subset of automatically reporting stations has in the past led to some confusion over “station die-offs”.

GHCN has gone through three major iterations. V1 was released in 1992 and included around 6,000 stations with only mean temperatures available and no adjustments or homogenization. Version 2 was released in 1997 and added a number of new stations, minimum and maximum temperatures, and manually homogenized data. V3 was released last year and added many new stations (both in the distant past and post-1992, where Version 2 showed a sharp drop-off in available records), and switched the homogenization process to the Menne and Williams Pairwise Homogenization Algorithm (PHA) previously used in USHCN. Figure 1, above, shows the number of station records available for each month in GHCN v1 and v3.

We can perform a number of tests to see if GHCN v1 and v3 differ. The simplest one is to compare the observations in both data files for the same stations. This is somewhat complicated by the fact that station identity numbers changed between v1 and v3, and we have been unable to locate a translation table between the two. We can, however, match stations between the two sets using their latitude and longitude coordinates. This gives us 1,267,763 station-months of data whose stations match between the two sets with a precision of two decimal places.
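For readers who want to try the matching step themselves, here is a minimal Python sketch of pairing stations by rounded coordinates. It is not the STATA code used for this post; the station IDs are made up, and the rule of skipping coordinates shared by more than one station is an assumption added to avoid ambiguous matches.

```python
from collections import defaultdict


def coord_key(lat, lon, places=2):
    """Round coordinates so the same site compares equal across versions."""
    return (round(lat, places), round(lon, places))


def match_stations(v1_stations, v3_stations, places=2):
    """Pair v1 and v3 station IDs that share the same rounded lat/lon.

    Each argument maps a station ID to a (lat, lon) tuple.
    Assumption: a coordinate used by more than one station in either
    version is skipped, because the pairing would be ambiguous.
    """
    v1_by_key = defaultdict(list)
    for sid, (lat, lon) in v1_stations.items():
        v1_by_key[coord_key(lat, lon, places)].append(sid)

    v3_by_key = defaultdict(list)
    for sid, (lat, lon) in v3_stations.items():
        v3_by_key[coord_key(lat, lon, places)].append(sid)

    matches = {}
    for key, v1_ids in v1_by_key.items():
        v3_ids = v3_by_key.get(key, [])
        if len(v1_ids) == 1 and len(v3_ids) == 1:
            matches[v1_ids[0]] = v3_ids[0]
    return matches


# Hypothetical station IDs and coordinates, for illustration only.
v1 = {"A1": (51.48, -0.45), "A2": (40.71, -74.01)}
v3 = {"X9": (51.48, -0.45), "X7": (40.78, -73.97)}
print(match_stations(v1, v3))
```

Only the first pair shares coordinates at two decimal places, so only that pair is matched. Stations whose coordinates were refined between versions will be missed by this approach.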

When we calculate the difference between the two sets and plot the distribution, we get Figure 2, below:


Figure 2: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon.

The vast majority of observations are identical between GHCN v1 and v3. If we exclude identical observations and just look at the distribution of non-zero differences, we get Figure 3:


Figure 3: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon, excluding cases of zero difference.

This shows that while the raw data in GHCN v1 and v3 is not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements in the same location (called imods in GHCN version 2), by updates to the data from various national MET offices, or by refinements in station lat/lon over time.
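A rough sketch of the comparison behind Figures 2 and 3, again in Python rather than the STATA used for the post. The record keys, the sample values, and the sign convention (v3 minus v1) are assumptions for illustration.

```python
from statistics import mean


def differences(v1_obs, v3_obs):
    """Return v3 - v1 for every (station, year, month) key found in both."""
    common = v1_obs.keys() & v3_obs.keys()
    return [v3_obs[k] - v1_obs[k] for k in sorted(common)]


def summarize(diffs, tol=1e-9):
    """Share of identical observations and mean of the non-zero differences."""
    nonzero = [d for d in diffs if abs(d) > tol]
    share = (len(diffs) - len(nonzero)) / len(diffs) if diffs else float("nan")
    return {
        "n": len(diffs),
        "share_identical": share,
        "mean_nonzero": mean(nonzero) if nonzero else 0.0,
    }


# Hypothetical matched observations in degrees C.
v1_obs = {("S", 1950, 1): 2.1, ("S", 1950, 2): 3.0,
          ("S", 1950, 3): 5.5, ("S", 1950, 4): 9.0}
v3_obs = {("S", 1950, 1): 2.1, ("S", 1950, 2): 3.2,
          ("S", 1950, 3): 5.3, ("S", 1950, 4): 9.0}
stats = summarize(differences(v1_obs, v3_obs))
print(stats["n"], stats["share_identical"])
```

In this toy case half the observations are identical and the two non-zero differences cancel, which is the pattern "little bias in the mean" describes: individual records can differ without shifting the average.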

Another way to test if GHCN v1 and GHCN v3 differ is to convert the data of each into anomalies (with baseline years of 1960-1989 chosen to maximize overlap in the common anomaly period), assign each station to a 5° by 5° lat/lon grid cell, average the anomalies in each grid cell, and create a land-area weighted global temperature estimate. This is similar to the method that NCDC uses in their reconstruction.
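The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind Figure 4: the record layout is invented, and cells are weighted by the cosine of their center latitude as a simple stand-in for the land-area weights described above.

```python
import math
from collections import defaultdict

BASE_START, BASE_END = 1960, 1989   # baseline period named in the post
CELL = 5.0                          # grid cell size in degrees
N_ROWS, N_COLS = int(180 / CELL), int(360 / CELL)


def station_baselines(records):
    """Mean temperature per (station, month) over the baseline years."""
    sums = defaultdict(lambda: [0.0, 0])
    for sid, lat, lon, year, month, temp in records:
        if BASE_START <= year <= BASE_END:
            acc = sums[(sid, month)]
            acc[0] += temp
            acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def cell_of(lat, lon):
    """Row and column of the grid cell containing the point."""
    row = min(int((lat + 90.0) // CELL), N_ROWS - 1)
    col = min(int((lon + 180.0) // CELL), N_COLS - 1)
    return row, col


def cell_weight(cell):
    """Assumed weight: cosine of the cell's center latitude."""
    center_lat = -90.0 + (cell[0] + 0.5) * CELL
    return math.cos(math.radians(center_lat))


def global_anomalies(records):
    """Weighted mean of cell-average anomalies for each (year, month)."""
    base = station_baselines(records)
    cells = defaultdict(list)
    for sid, lat, lon, year, month, temp in records:
        b = base.get((sid, month))
        if b is not None:  # stations with no baseline data are dropped
            cells[(year, month, cell_of(lat, lon))].append(temp - b)
    num, den = defaultdict(float), defaultdict(float)
    for (year, month, cell), vals in cells.items():
        w = cell_weight(cell)
        num[(year, month)] += w * sum(vals) / len(vals)
        den[(year, month)] += w
    return {ym: num[ym] / den[ym] for ym in num}


# Hypothetical records: (station, lat, lon, year, month, temp in C).
records = [
    ("A", 52.0, 1.0, 1960, 1, 1.0), ("A", 52.0, 1.0, 1970, 1, 3.0),
    ("B", 53.0, 2.0, 1960, 1, 3.0), ("B", 53.0, 2.0, 1970, 1, 5.0),
    ("C", 2.0, 1.0, 1960, 1, 10.0), ("C", 2.0, 1.0, 1970, 1, 10.0),
]
result = global_anomalies(records)
print(sorted(result))
```

Stations A and B share a grid cell, so they are averaged together before the cell is combined with the tropical cell holding station C. Averaging within cells first keeps densely sampled regions from dominating the global mean.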


Figure 4: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies. Note that GHCN v1 ends in 1990 because that is the last year of available data.

When we do this for both GHCN v1 and GHCN v3 raw data, we get the figure above. While we would expect some differences simply because GHCN v3 includes a number of stations not included in GHCN v1, the similarities are pretty remarkable. Over the century scale the trends in the two are nearly identical. This differs significantly from the picture painted by E.M. Smith; indeed, instead of the shift in input data being equivalent to 50% of the trend, as he suggests, we see that the differences amount to a mere 1.5% of the trend.

Now, astute skeptics might agree with me that the raw data files are, if not identical, overwhelmingly similar but point out that there is one difference I did not address: GHCN v1 had only raw data with no adjustments, while GHCN v3 has both adjusted and raw versions. Perhaps the warming that E.M. Smith attributed to changes in input data might in fact be due to changes in adjustment method?

This is not the case, as GHCN v3 adjustments have little impact on the global-scale trend vis-à-vis the raw data. We can see this in Figure 5 below, where both GHCN v1 and GHCN v3 are compared to published NCDC and GISTemp land records:


Figure 5: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies with NCDC and GISTemp published land reconstructions.

If we look at the trends over the 1880-1990 period, we find that both GHCN v1 and GHCN v3 are quite similar, and lie between the trends shown in GISTemp and NCDC records.

1880-1990 trends

GHCN v1 raw: 0.04845 C (0.03661 to 0.06024)

GHCN v3 raw: 0.04919 C (0.03737 to 0.06100)

NCDC adjusted: 0.05394 C (0.04418 to 0.06370)

GISTemp adjusted: 0.04676 C (0.03620 to 0.05731)
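As a quick check on the numbers above, the sketch below shows one way a trend with an approximate 95% interval could be computed, and works out the relative difference between the v1 and v3 raw trends as listed. It uses ordinary least squares with a plain normal approximation (no correction for autocorrelation); whether that matches the STATA regression used for the post is an assumption.

```python
import math


def ols_trend(years, values):
    """Least-squares slope with an approximate 95% interval (slope, low, high)."""
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope - 1.96 * se, slope + 1.96 * se


def relative_difference_pct(a, b):
    """Difference of b from a, as a percentage of a."""
    return (b - a) / a * 100.0


# Trend values copied from the list above.
V1_RAW, V3_RAW = 0.04845, 0.04919
print(round(relative_difference_pct(V1_RAW, V3_RAW), 2))  # 1.53
```

The v3 raw trend is about 1.5% larger than the v1 raw trend, in line with the figure quoted earlier, and each trend sits comfortably inside the other's interval.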

This analysis should make it abundantly clear that the change in raw input data (if any) between GHCN version 1 and GHCN version 3 had little to no effect on global temperature trends. The exact cause of Smith’s mistaken conclusion is unknown; however, a review of his code does indicate a few areas that seem problematic. They are:

1. An apparent reliance on station IDs to match stations. Station IDs can differ between versions of GHCN.

2. Use of First Differences. Smith uses first differences, but he has made idiosyncratic changes to the method, especially in cases where there are temporal lacunae in the data. The method, which was formerly used by NCDC, has known issues and biases – detailed by Jeff Id. Smith’s implementation and his method of handling gaps in the data are unproven and may be the cause.

3. It’s unclear from the code which version of GHCN v3 Smith used.
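For readers unfamiliar with the first difference method mentioned in item 2, here is a minimal Python sketch of one common reading of it: take year-to-year changes at each station, average those changes across stations, and accumulate them. It is neither Smith's implementation nor NCDC's; in particular, the choice to drop any difference that spans a missing year is an assumption, and it is exactly the kind of gap-handling decision that can change the result.

```python
from collections import defaultdict


def first_differences(series):
    """Year-to-year changes, kept only where both adjacent years exist."""
    return {year: value - series[year - 1]
            for year, value in series.items()
            if (year - 1) in series}


def combine(stations):
    """Average first differences across stations, then accumulate them."""
    by_year = defaultdict(list)
    for series in stations.values():
        for year, diff in first_differences(series).items():
            by_year[year].append(diff)
    total, combined = 0.0, {}
    for year in sorted(by_year):
        total += sum(by_year[year]) / len(by_year[year])
        combined[year] = total
    return combined


# Hypothetical annual means in C; station A is missing 1952.
stations = {
    "A": {1950: 10.0, 1951: 10.5, 1953: 11.0, 1954: 11.5},
    "B": {1950: 5.0, 1951: 5.5, 1952: 6.0, 1953: 6.5, 1954: 7.0},
}
print(combine(stations))
```

Because of the gap, station A contributes nothing for 1952 or 1953, so those years rest on station B alone. With noisy real data, a different rule for bridging gaps would give a different cumulative series.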

STATA code and data used in creating the figures in this post can be found here: https://www.dropbox.com/sh/b9rz83cu7ds9lq8/IKUGoHk5qc

Playing around with it is strongly encouraged for those interested.

June 24, 2012 1:42 am

Louis Hooffstetter says: “I’ve often wondered how and why temperatures are adjusted in the first place, and whether or not the adjustments are scientifically valid. If this has been adequately discussed somewhere, can someone direct me to it? If not, Steve, is this something you might consider posting here at WUWT?”
A good review is Peterson et al.: Homogeneity adjustments of in situ atmospheric climate data: A review, Int. J. Climatol., 18, 1493–1517, 1998.
http://onlinelibrary.wiley.com/doi/10.1002/%28SICI%291097-0088%2819981115%2918:13%3C1493::AID-JOC329%3E3.0.CO;2-T/abstract
I recently published a blind validation of the most-used and most advanced homogenisation algorithms. This article also includes the references of the articles describing these algorithms in detail.
http://www.clim-past.net/8/89/2012/cp-8-89-2012.html
To accompany this article, I wrote a blog post with a (hopefully) more easy to read introduction on the main reasons for inhomogeneities in the historical climate record and the main ideas behind the homogenisation algorithms:
http://variable-variability.blogspot.com/2012/01/homogenization-of-monthly-and-annual.html
I hope these links help you find your way into the scientific literature.

phi
June 24, 2012 1:57 am

Carrick,
“When phi argued with Mosher, one of the authors of BEST’s code, over BEST’s capabilities, that was the funniest moment on the thread for me.”
The funniest interventions could be yours. We spoke with Steven Mosher about the ability to disable the implicit homogenization in BEST. This implicit homogenization is the result of the segment adjustments. If you disable this setting, there are simply no results.
“Right up there with phi claiming that tree rings make better thermometers than real ones.”
Yes, this is the case, proven for tree-ring densities in the medium term (10-100 years). You still have a lot to learn.

Editor
June 24, 2012 2:04 am

E.M.Smith says:
June 23, 2012 at 10:40 pm
“Verity Jones at Digging In The Clay has a very nice set of postings showing warming vs cooling stations and changes over time. She and TonyB even have a nice data base interface on line last time I looked.”
Thanks for the plug – actually it is KevinUK who is the database and mapping expert, not Tonyb.
Original post – http://diggingintheclay.wordpress.com/2010/01/18/mapping-global-warming/
Update – http://diggingintheclay.wordpress.com/2010/10/08/kml-maps-slideshow/
I know Kevin has done more recent work putting all this on Google Maps http://www.climateapplications.com/MapsNCDC2.asp but the data for the USA hasn’t been completed yet and we’ve not written anything up on the blog – too busy with the day jobs.

June 24, 2012 3:24 am

Well, clearly Mosher & Co have done a favor by showcasing how scientists are mostly clueless about data management. EMS’s analysis of the whole mess is clear and obvious. He has clearly shown GiGo in action.
The replicating temperature smearing is beyond belief. In fact all numbers that show any form of ‘global’ temperature are actually 20% of the world’s measured temperatures (of doubtful quality themselves) smeared out over the entire globe.
It beggars belief. I once had to write a model calculating the dBA of every train at any given time, at any given location, at any given height and at any given distance over the entire national railroad track.
The data used in Climate ‘Science’ would be akin to me putting the noise profile of standard track and an intercity commuter train in a database and using them to calculate the noise of a high-speed train traveling at 200 miles per hour.

June 24, 2012 4:47 am

Carrick
Speaking of bias………
There’s been talk of GISS in this thread. Do you see any bias in data handling in these?
Does GISTemp change? Part 1 (6:53 min)

Does GISTemp change? Part 2 (11:09 min)

June 24, 2012 6:36 am

Don’t let Carrick fool you with his non-scientific intimidation tactics. Him and Mosher are good ‘ol boy buddies. As we’ve seen in Climate Science countless times, Warmer tribalism will trump objective scrutiny every time.
Andrew

June 24, 2012 7:01 am

Carrick: “As for Smith, he wears his bias on his sleeve”
Mosher clearly has stated numerous times he believes if CO2 has increased it must have warmed the earth. Therefore he works hard to find some magical formula that proves crap data proves the earth is warming.
He has no interest in the third of stations even Mueller admitted were cooling.
He has no interest in bright sunshine data which HAS changed up and down since 1900.
And he certainly has no interest in anyone criticizing his “proof”.
He is like an alchemist insisting that one day, with the right code, he can turn crap data into gold.

June 24, 2012 7:01 am

Mosher and Zeke? Crickets…

Mariana Britez
June 24, 2012 7:09 am

So Mosher was involved with the BEST project. Now I’m 100% convinced it’s C***. No wonder the guy is turning warmist. I would say stick to investigations of Gleick etc.; you’re really good at that stuff LOL

A C Osborn
June 24, 2012 7:39 am

Verity Jones says: but the data for the USA hasn’t been completed yet and we’ve not written anything up on the blog – too busy with the day jobs.
You need to spend some of the $Millions that BIG OIL has been paying you all these years and give up the day job.
Sarc off/

Carrick
June 24, 2012 7:40 am

Bad Andrew, how am I intimidating anybody? I just call BS when I see it. I can’t say that I’m more than an acquaintance of Steven’s. Hardly some old buddy system, and if there’s anybody anti-intellectual playing games with the truth, it’s you for making these wild claims.
With this sort of analysis it is much easier to screw up and get a result like Smith’s than to screw up and find, as Zeke and Steven do, that it “doesn’t make much difference”. Seems like the gauntlet is thrown for Smith to put up or shut up. We have counter evidence; the ball is in his court to explain how code that has been heavily regression tested, like Zeke & Steven’s, is wrong, and something he slung together to prove something he already knew is right.
I’ll note that predictably phi is still trying to argue how the BEST software functions (while being barely software literate) and still claiming that MXD (with its much smaller geographical coverage and non-uniform response to temperature) is a better representation of temperature than thermometers can provide.
Amino Acids, think about the problem this way. Look at the geographic distribution of warming, then think about what happens to your global mean trend when you add in more stations at northern climes. When you adjust for differences in the “land-only” algorithms, BEST and GISTEMP get very similar findings, since they have the largest geographical coverage, so this is believable. If you want to do an “apples to apples” comparison, look at the 40-50° N zonal average, land only: how does this compare across algorithms? Does it make a flip of a difference?
But my question is really for Willis. How much engaging, in his opinion, is required with people like bad andrew, who have such strong confirmation biases that they apply completely different standards to people who feed them what they want to hear than to people who don’t, and for whom anybody who raises doubt about their beliefs is blown off as a “true believer” in any case?
What’s to be gained with engagement here? I do think “do your own homework” is a reasonable retort to people who aren’t going to be touched by reason.

Pamela Gray
June 24, 2012 7:56 am

Victor, I read your blog post. Very interesting. What are your thoughts regarding non-random station dropout that may have over-emphasized ENSO-related geographic decadal oscillations? Would that not bias the raw data? Remember that these oscillations make some areas colder and some areas warmer, depending on the ENSO decadal pattern we are in. These patterns also drive changes in day versus night highs and lows, sunshine days, early versus late onset of seasonal temperature and precipitation changes, etc. If the conflation of decadal ENSO patterns with station dropout is a source of inhomogeneity, it would be a big one, would you agree?

June 24, 2012 8:01 am

“people who have such strong confirmation biases like bad andrew”
Carrick, you just made an unfounded accusation. You don’t know what my biases are, if I have any. You can’t know. On the other hand, for years I’ve seen you and Mosher defend each other’s position in blog comments while calling people with other opinions d-bags. Typical Birds of a Warmer Feather, is what the evidence indicates,
Andrew

Carrick
June 24, 2012 8:08 am

sunshine:

Mosher clearly has stated numerous times he believes if CO2 has increased it must have warmed the earth. Therefore he works hard to find some magical formula that proves crap data proves the earth is warming.

Well that’s what you believe so I can see why you’d believe everybody else thinks like you.
Mosher like any rational person with science background understands that there is a direct forcing from CO2 that causes warming. His view is accepted by any skeptic I know with science training, including Jeff Id (hardly a froth at the mouth global warmer).
People who don’t understand radiative physics AT ALL can choose to deny it, but absence of knowledge is not the same as knowledge of absence…. and in any case it’s been demonstrated beyond any reasonable doubt, to the point where the very strong critic of the IPCC Steve Fitzgerald makes a living off of selling devices that utilize the exact same physics that tends to cause climate to warm as CO2 is increased. I don’t know of a stronger proof for an effect than “it works and it is a viable economic product.”
Does that mean that CO2 causes warming? Probably, but the direct effect is only about 1°C/doubling of CO2. Does it demonstrate that warming is substantive enough that we need to change our global economies to mediate it? No. I think Mosher has said similar things too. Has the IPCC nailed the most likely sensitivity? Probably not, many of us think they are quoting values that are too high including questionable studies on sensitivity to increase the range of uncertainty.

He has no interest in the third of stations even Mueller admitted were cooling.

More confirmation bias, unskeptical thinking on your part. The marble diagram of the US shown by Mueller’s group is badly flawed and misleading. But you accept it uncritically because it feeds a story you want to hear.
Here’s Mueller’s US only figure done right. (Red is warming blue is cooling.) This is 1/3 of the stations, if by stations we mean stations that operated over the entire 1940-2010 period. Taken straight from the same data set Mueller’s group used to produce their figure (to be fair to them, it has been misconstrued by people like you who have used it to interpret something different than what they meant. The figure I produced on the other hand is meant to allow you to make the comparison that you wanted to make.)
And here’s a histogram of trends for land stations both for US only and for global.
We learn three things from this: First, climate has noise (who knew?). Second, most stations globally have shown warming in the sense that a scientist would use the word, namely their trend in temperature is positive from 1940-2010. (That is, we don’t look at one point at the end and one point at the beginning to determine if a noisy series is exhibiting warming, we look at the regressed slope of the data.) Third, confirmation bias is a dangerous thing, and you need to apply at least as much critical thinking to data and analysis that support you as you do to data and analysis that disagree with your views.

Pamela Gray
June 24, 2012 8:15 am

What concerns me about confirmation bias is the speed at which climate scientists were convinced to study CO2-related issues while still not having completed all that was needed to research natural drivers, and certainly not all that was needed to research the quality of the multiple sets of temperature data, be they proxies or sensors.
From what I can see at my armchair, AGW scientists were bred and funded from a bias point of view, even though these same scientists may claim they have no bias. If this bias were not the case, we would be seeing a lot more articles from them reporting on their studies of natural drivers, much of which is not clearly understood and admittedly poorly represented in “models”.

Carrick
June 24, 2012 8:21 am

Bad Andrew:

Carrick, you just made an unfounded accusation

Straight from the mouth of the guy who just made unfounded accusations.

don’t know what my biases are, if I have any.

Au contraire, I doubt anybody who has seen your writing is unaware of your biases. I doubt many will mistake you for Gavin Schmidt for example.
As to “If I have any”? Really??? You’re an android now and not a human??? Fascinating.
Humans have biases, it’s how we function cognitively, and it’s why science is designed with the notion in mind that we have biases and has to be self-correcting against it.

On the other hand, for years I’ve seen you and Mosher defend each other’s position in blog comments while calling people with other opinions d-bags.

OK, give us a link where Mosher and I defended each other while calling people with other opinions “d-bags”. In those exact terms.
Truthfully I wasn’t even thinking of this thread when I wrote, but of times when I haven’t engaged people whose opinions I disagree with.
You made the accusation, seems like it’s your responsibility to prove it or withdraw it. Since I know you can’t, I’ll say in advance that your comment in a nutshell demonstrates the types of anti-intellectual games you personally engage in. Pseudo-intellectual arguments followed by blanket, unsupported (and unsupportable) accusations.

June 24, 2012 8:39 am

To Mosher, Zeke and Carrick, for example… All I would say is that in the 50 years I have been around, the temperature isn’t any warmer than I remember it being when I was 5 years old. It is not as warm as the lovely long hot summers of the 30’s when my Mum and Dad were enjoying their youth. Sea level is the same as it has always been. But apparently CO2 is much higher than when I was a lad. Not that I have noticed…
Sorry to all you lukewarmers and warm-mongers – but I don’t see a problem. And I don’t care how you play with the figures.

June 24, 2012 8:48 am

“Truthfully I wasn’t even thinking of this thread when I wrote”
Evidence that you aren’t the greatest thinker, either. 😉
Andrew

June 24, 2012 8:48 am

Carrick
You can do all sorts of things to make it look like nothing biased is happening with GISTemp. You are showing the bias you think others are showing. You do the thing you accuse others of doing. I’ve seen your type of arguments so many times. You want to say anything that doesn’t show the results you are looking for is wrong and whatever shows what you are looking for is right.
There is no way GISTemp is a true record of temperature on earth.
You also have to take into account the head of the department handling GISTemp is an environmental activist. He says he is himself. If you truly wanted unbiased data then you’ll have to end all appearance of bias. In other words, you’ll have to look at data that is not handled by an environmental activist. Even if you believe his data is unbiased you still cannot reference it since doing that could give the appearance you want data handled by an activist.
There’s other data sets. You’re better off not referencing BEST and GISTemp. Use the others.

June 24, 2012 8:59 am

Carrick
funny how you’re trying to guide the argument away from obvious issues. Like, for example, how GISTemp data does not use ARGO buoys anymore. It looks like the environmental activist running GISS did not like the cooling trend shown in ARGO buoys—even though ARGO has the best coverage of oceans. He dropped ARGO and went to an inferior data set.
And even if that wasn’t the reason the environmental activist did drop ARGO it still can, legitimately, be said that’s why he did because that’s the appearance of why he did it. So if you want to give the appearance you are not biased it’s the better part of wisdom to steer away from GISTemp and not defend it.

Bill Illis
June 24, 2012 9:06 am

Here are the changes made to the Land Temperature Record by the NCDC from Version 2 to the current Version 3 they are using.
They cooled the 1930s by around -0.1C and warmed the recent months by +0.05C.
http://img18.imageshack.us/img18/8978/changencdclandv2tov3.png
This is clearly a systematic change versus random homogeneity adjustments.
Now what were the changes from Version 1 to Version 2? Where is the original Version 1?
And I don’t understand how this systematic change does not show up in Zeke’s charts. Is there a difference between GHCN and what the NCDC eventually uses as the actual reported temperature?
Here is the data.
ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/monthly.land.90S.90N.df_1901-2000mean.dat
ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/usingGHCNMv2/monthly.land.90S.90N.df_1901-2000mean.dat

Carrick
June 24, 2012 9:09 am

And I’m out of here. Not because it hasn’t been real. Kitchen remodel in progress. I may or may not get back here later. One can state one’s views sometimes, present evidence for them, and move on.
Cheers.

June 24, 2012 9:12 am

Carrick, I didn’t misconstrue anything. I downloaded the data and mapped it.
http://sunshinehours.wordpress.com/2012/03/18/cooling-weather-stations-by-decade-from-1880-to-2000/
“That is, we don’t look at one point at the end and one point at the beginning to determine if a noisy series is exhibiting warming,”
I look at 5 year averages. Many US States are cooler over the last 5 years than periods as far back as the early 1900s.
http://sunshinehours.wordpress.com/2012/06/24/usanoaa-5-year-averages-plotted-using-all-montly-anomalies/
The climate cycle is up and down. Plotting trends using annual anomalies from as many stations as possible HIDES the climate signal.
The climate signal is up and down and up and down.

June 24, 2012 9:30 am

sunshinehours1
Carrick is (I hate to use the term because it’s used so much, but it does apply) cherry picking data.

June 24, 2012 9:33 am

Carrick says:
June 24, 2012 at 9:09 am
“One can state one’s views sometimes, present evidence for them, and move on.”
What you actually did is cherry picked data that suited your paradigm. Then you said if anyone doesn’t agree with you they are biased. Then you left.
