Comparing GHCN V1 and V3

Much Ado About Very Little

Guest post by Zeke Hausfather and Steve Mosher

E.M. Smith has claimed (see his full post here: Summary Report on v1 vs v3 GHCN) to find numerous differences between GHCN version 1 and version 3, differences that, in his words, constitute “a degree of shift of the input data of roughly the same order of scale as the reputed Global Warming”. His analysis is flawed, however: the raw data in GHCN v1 and v3 are nearly identical, and trends in the globally gridded raw data for both are effectively the same as those found in the published NCDC and GISTemp land records.


Figure 1: Comparison of station-months of data over time between GHCN v1 and GHCN v3.

First, a little background on the Global Historical Climatology Network (GHCN). GHCN was created in the late 1980s after a large effort by the World Meteorological Organization (WMO) to collect all available temperature data from member countries. Many of these were in the form of logbooks or other non-digital records (this being the 1980s), and many man-hours were required to process them into a digital form.

Meanwhile, the WMO set up a process to automate the submission of data going forward, setting up a network of around 1,200 geographically distributed stations that would provide monthly updates via CLIMAT reports. Periodically NCDC undertakes efforts to collect more historical monthly data not submitted via CLIMAT reports, and more recently has set up a daily product with automated updates from tens of thousands of stations (GHCN-Daily). This structure of GHCN as a periodically updated retroactive compilation with a subset of automatically reporting stations has in the past led to some confusion over “station die-offs”.

GHCN has gone through three major iterations. V1 was released in 1992 and included around 6,000 stations, with only mean temperatures available and no adjustments or homogenization. V2 was released in 1997 and added a number of new stations, minimum and maximum temperatures, and manually homogenized data. V3 was released last year and added many more stations (both in the distant past and post-1992, where V2 showed a sharp drop-off in available records), and switched the homogenization process to the Menne and Williams Pairwise Homogenization Algorithm (PHA) previously used in USHCN. Figure 1, above, shows the number of station records available for each month in GHCN v1 and v3.

We can perform a number of tests to see if GHCN v1 and v3 differ. The simplest is to compare the observations in both data files for the same stations. This is complicated by the fact that station identification numbers changed between v1 and v3, and we have been unable to locate a translation table between the two. We can, however, match stations between the two sets using their latitude and longitude coordinates. This gives us 1,267,763 station-months of data from stations whose coordinates match between the two sets to a precision of two decimal places.
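The coordinate-matching step described above can be sketched with a dictionary keyed on rounded lat/lon. Everything below (the inventory layout, the station IDs) is made up for illustration; the real GHCN inventory files are fixed-width text with a different structure.

```python
# Sketch of matching stations across two GHCN inventories by rounded
# latitude/longitude. Each inventory is a list of (station_id, lat, lon)
# tuples -- an illustrative format, not the actual GHCN files.

def match_by_coords(inv_v1, inv_v3, decimals=2):
    """Return (v1_id, v3_id) pairs whose coordinates agree
    to `decimals` decimal places."""
    def key(lat, lon):
        return (round(lat, decimals), round(lon, decimals))

    v3_lookup = {key(lat, lon): sid for sid, lat, lon in inv_v3}
    pairs = []
    for sid, lat, lon in inv_v1:
        v3_id = v3_lookup.get(key(lat, lon))
        if v3_id is not None:
            pairs.append((sid, v3_id))
    return pairs

# Example with made-up inventories:
v1 = [("1001", 40.71, -74.01), ("1002", 51.48, 0.00)]
v3 = [("61701001000", 40.71, -74.01), ("61702002000", 35.68, 139.69)]
print(match_by_coords(v1, v3))  # [('1001', '61701001000')]
```

One caveat this sketch shares with the real analysis: if two distinct stations sit at the same rounded coordinates, the match is ambiguous, which is one reason only a subset of station-months pairs up cleanly.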

When we calculate the difference between the two sets and plot the distribution, we get Figure 2, below:


Figure 2: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon.

The vast majority of observations are identical between GHCN v1 and v3. If we exclude identical observations and just look at the distribution of non-zero differences, we get Figure 3:


Figure 3: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon, excluding cases of zero difference.

This shows that while the raw data in GHCN v1 and v3 is not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements in the same location (called imods in GHCN version 2), by updates to the data from various national MET offices, or by refinements in station lat/lon over time.

Another way to test if GHCN v1 and GHCN v3 differ is to convert the data of each into anomalies (with baseline years of 1960-1989 chosen to maximize overlap in the common anomaly period), assign each to a 5 by 5 lat/lon grid cell, average anomalies in each grid cell, and create a land-area weighted global temperature estimate. This is similar to the method that NCDC uses in their reconstruction.
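The gridding step above can be sketched as follows. This is a minimal version under stated assumptions: anomalies are already computed per station-month, and cos(latitude) of the cell center stands in for the land-area weight. The actual NCDC procedure differs in detail.

```python
import math

# Sketch of the gridded-anomaly step: average station anomalies within
# 5x5 degree cells, then combine cells weighted by cos(latitude) as a
# proxy for cell area. `station_anoms` maps (lat, lon) -> anomaly for
# one month; all data here are illustrative.

def grid_cell(lat, lon, size=5.0):
    """Index of the size-by-size degree cell containing (lat, lon)."""
    return (math.floor(lat / size), math.floor(lon / size))

def gridded_mean(station_anoms, size=5.0):
    cells = {}
    for (lat, lon), anom in station_anoms.items():
        cells.setdefault(grid_cell(lat, lon, size), []).append(anom)
    total_w = total = 0.0
    for (i, _), anoms in cells.items():
        # weight each cell by the cosine of its central latitude
        lat_center = (i + 0.5) * size
        w = math.cos(math.radians(lat_center))
        total += w * sum(anoms) / len(anoms)
        total_w += w
    return total / total_w

# Two stations in one mid-latitude cell, one in a tropical cell:
anoms = {(42.0, 3.0): 0.5, (43.0, 4.0): 0.7, (12.0, 100.0): 0.2}
print(round(gridded_mean(anoms), 3))
```

Averaging within cells first prevents densely sampled regions (e.g. the US or Europe) from dominating the global mean, which is the point of gridding before averaging.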


Figure 4: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies. Note that GHCN v1 ends in 1990 because that is the last year of available data.

When we do this for both GHCN v1 and GHCN v3 raw data, we get the figure above. While we would expect some differences simply because GHCN v3 includes a number of stations not in GHCN v1, the similarities are remarkable. At the century scale the trends in the two are nearly identical. This differs significantly from the picture painted by E.M. Smith: instead of the shift in input data being equivalent to 50% of the trend, as he suggests, we see a mere 1.5% difference in trend.

Now, astute skeptics might agree with us that the raw data files are, if not identical, overwhelmingly similar, but point out that there is one difference we have not yet addressed: GHCN v1 had only raw data with no adjustments, while GHCN v3 has both adjusted and raw versions. Perhaps the warming that E.M. Smith attributed to changes in input data is in fact due to changes in adjustment method?

This is not the case, as GHCN v3 adjustments have little impact on the global-scale trend vis-à-vis the raw data. We can see this in Figure 5 below, where both GHCN v1 and GHCN v3 are compared to published NCDC and GISTemp land records:


Figure 5: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies with NCDC and GISTemp published land reconstructions.

If we look at the trends over the 1880-1990 period, we find that both GHCN v1 and GHCN v3 are quite similar, and lie between the trends shown in GISTemp and NCDC records.

1880-1990 trends:

GHCN v1 raw: 0.04845 C (0.03661 to 0.06024)
GHCN v3 raw: 0.04919 C (0.03737 to 0.06100)
NCDC adjusted: 0.05394 C (0.04418 to 0.06370)
GISTemp adjusted: 0.04676 C (0.03620 to 0.05731)

This analysis should make it abundantly clear that the change in raw input data (if any) between GHCN version 1 and GHCN version 3 had little to no effect on global temperature trends. The exact cause of Smith’s mistaken conclusion is unknown; however, a review of his code does point to a few areas that seem problematic. They are:

1. An apparent reliance on station IDs to match stations. Station IDs can differ between versions of GHCN.

2. Use of first differences. Smith uses the first-difference method, but he has made idiosyncratic changes to it, especially in cases where there are temporal lacunae in the data. The method, which NCDC formerly used, has known issues and biases detailed by Jeff Id. Smith’s implementation and his method of handling gaps in the data are unproven and may be the cause.

3. It’s unclear from the code which version of GHCN v3 Smith used.
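For reference, the first-difference method at issue in point 2 can be sketched as follows. This is an illustration of the conventional method (with the standard gap handling, where no difference is accumulated across a missing year), not Smith's variant or NCDC's actual code; all data are made up.

```python
# Sketch of the standard first-difference method (FDM): reduce each
# station series to year-over-year differences, average the differences
# across stations, then cumulatively sum the average into an
# anomaly-like composite. Gaps (None values) simply yield no difference
# for the years they touch.

def first_differences(series):
    """Year-over-year differences; None where either year is missing."""
    return [
        None if a is None or b is None else b - a
        for a, b in zip(series, series[1:])
    ]

def fdm_composite(stations):
    """Average first differences across stations, then integrate.
    Assumes all station series cover the same years (same length)."""
    diffs = [first_differences(s) for s in stations]
    out, level = [0.0], 0.0
    for yr in range(len(diffs[0])):
        vals = [d[yr] for d in diffs if d[yr] is not None]
        # if no station reports a difference this year, carry the level
        level += sum(vals) / len(vals) if vals else 0.0
        out.append(level)
    return out

stations = [
    [10.0, 10.2, None, 10.1, 10.4],   # has a gap in year 3
    [15.0, 15.1, 15.3, 15.2, 15.5],
]
print(fdm_composite(stations))
```

Jeff Id's critique (linked in the comments below the post) concerned this conventional form; a variant that bridges or re-anchors across gaps differently can accumulate spurious trend, which is why gap handling matters.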

STATA code and data used in creating the figures in this post can be found here: https://www.dropbox.com/sh/b9rz83cu7ds9lq8/IKUGoHk5qc

Playing around with it is strongly encouraged for those interested.



275 Comments
June 23, 2012 12:05 am

3. From the last thread:
E.M.Smith says:
June 22, 2012 at 1:19 am
Stokes:
I use ghcn v3 unadjusted.

June 23, 2012 12:47 am

One actually needs the name of the dataset. and actually the code that downloads and reads it in.

June 23, 2012 2:13 am

“effectively the same” is not good enough. Again, global warming is founded upon 1/10ths of a degree. It is not founded upon large amounts of whole integers—i.e., it’s barely perceptible, especially to the untrained eye.
But more than that, using strictly “anomalies” isn’t good enough either because “global warming” scientists can be tricky with anomalies.
Have a look for yourself at these two videos. You’ll see there’s lots of play room available in actual temperature when looking only at anomalies of two, or more, data sets:
How ClimateGate scientists do the anomaly trick, PART 1

How ClimateGate scientists do the anomaly trick, PART 2

phi
June 23, 2012 2:24 am

A central feature in these comparisons is constituted by adjustments and by choices that are made to integrate or not in the same data series segments of the same station.
National offices generally choose to form the longest possible series and homogenize them. I believe GHCN preserves the segmentation. This means that the reconstructions performed based on the GHCN data run slightly on the principle adopted by BEST. Segments are homogenized de facto at the stage where all is averaged (in the cases presented here, within cells). Quantifying the actual adjustments can be made only if the series of stations were previously merged so according to the methodology of NMS. The magnitude of the actual adjustments are remarkably stable and it is about 0.5 ° C for the twentieth century.

Richard T. Fowler
June 23, 2012 2:24 am

“Smith’s implementation and his method of handling gaps in the data is unproven and may be the cause. ”
“3. It’s unclear from the code which version of GHCN V3 that Smith used. ”
These two statements appear to contradict each other. If the code is available, how can Smith’s “implementation and his method of handling gaps in the data” be unproven?
Zeke or Steve, would you care to elaborate? Thank you.
RTF

June 23, 2012 2:44 am

Nice curve ball Steve. What is it in the data dance world; three strikes and you’re out?
The frequency bar bell charts look to heavily favor positive anomalies in both charts. Looks to be warmed up temps well outnumber cooler mods. Any chance the cooler mods are before 1970 while those positive adjustments tend towards the end on the 20th century and the beginning of the 21st? Of course, you are avoiding showing the changes by year.
The anomaly spatially gridded line comparison charts, nice but why did you have to force the data through a grid blender first?

“…When we do this for both GHCN v1 and GHCN v3 raw data, we get the figure above. While we would expect some differences simply because GHCN v3 includes a number of stations not included in GHCN v1,…”

As I understand your gridded database, you are knowingly comparing apples to oranges and then you follow that little twist of illogic with.

“the similarities are pretty remarkable”

I must say, that last little tidbit just might be the truest thing you’ve posted. And you are brazen enough to say

“…3. It’s unclear from the code which version of GHCN V3 that Smith used…”

You’re out!

mfo
June 23, 2012 3:21 am

Saturday morning. What a time to post this response to EM :o(
The First Difference Method in comparison with others was written about by Hu McCulloch in 2010 at Climate Audit in response to an essay about calculating global temperature by Zeke Hausfather and Steven Mosher at WUWT.
http://climateaudit.org/2010/08/19/the-first-difference-method/
http://wattsupwiththat.com/2010/07/13/calculating-global-temperature/

Paul in Sweden
June 23, 2012 3:23 am

E.M. Smith, Zeke Hausfather and Steve Mosher & all of you other highly talented individuals with your own fine web sites that grind this data up – we all know who you are :),
There is a lot of work and a great deal of expenditure of time and finances going on refining the major global temperature databases for the purpose of establishing a global mean temperature trend. I imagine the same amount of resources could be dedicated towards refining various global databases regarding precipitation, wind speed, polar ice extent, sea level, barometric pressure or UserID passwd for the purpose of establishing a global mean average trend.
How do we justify the financial and resource allocations dedicated to generating and refining these global means?
I cannot fathom a practical purpose for planetary mean averages unless we are in the field of astronomy. Here on earth global mean averages for specific metrics regarding climate have no practical value(unless we are solely trying to begin to validate databases).
Regional data by zone for the purpose of agricultural and civic planning are all that I see as valuable. Errors distributed throughout entire global databases in an even manner give me little solace.

david_in_ct
June 23, 2012 4:18 am

so since you have all the data why don t u do exactly what smith did and see if u get the same plots, instead of producing a different analysis. his main point is that the past was cooled relative to the present. why not take all the station differences that u found and bin them by year, then plot a running sum of the average of the differences year by year. if he is correct that graph will be a u shape. if the graph is flat as it should be then maybe he/you can find the differences in the data/code that each of u has used.

Geoff Sherrington
June 23, 2012 5:01 am

Did this Australian comment from Blair Trewin of the BoM become incorporated in any international data set? Consequences?
> Up until 1994 CLIMAT mean temperatures for Australia used (Tx+Tn)/2. In
> 1994, apparently as part of a shift to generating CLIMAT messages
> automatically from what was then the new database (previously they were
> calculated on-station), a change was made to calculating as the mean of
> all available three-hourly observations (apparently without regard to
> data completeness, which made for some interesting results in a couple
> of months when one station wasn’t staffed overnight).
>
> What was supposed to happen (once we noticed this problem in 2003 or
> thereabouts) was that we were going to revert to (tx+Tn)/2, for
> historical consistency, and resend values from the 1994-2003 period. I
> have, however, discovered that the reversion never happened.
>
> In a 2004 paper I found that using the mean of all three-hourly
> observations rather than (Tx+Tn)/2 produced a bias of approximately
> -0.15 C in mean temperatures averaged over Australia (at individual
> stations the bias is quite station-specific, being a function of the
> position of stations (and local sunrise/sunset times) within their time
> zone.

Louis Hooffstetter
June 23, 2012 5:35 am

Informative post – thanks.
I’ve often wondered how and why temperatures are adjusted in the first place, and whether or not the adjustments are scientifically valid. If this has been adequately discussed somewhere, can someone direct me to it? If not, Steve, is this something you might consider posting here at WUWT?

wayne
June 23, 2012 5:49 am

In Figure 3: http://wattsupwiththat.files.wordpress.com/2012/06/clip_image006.png

“This shows that while the raw data in GHCN v1 and v3 is not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements in the same location (called imods in GHCN version 2), by updates to the data from various national MET offices, or by refinements in station lat/lon over time.”

Zeke, that is not a correct statement above, “there is little bias”. I performed a separation of the bars right of zero from the bars on the left of zero and did an exact pixel count of each of the two portions.
To the right of zero (warmer) there are 9,222 pixels contained within the bars and on the left of zero (cooler) there are 6,834 pixels of area within. That makes the warm side adjustments 135% of those to the cooler side. Now I do not count that as “basically the same” or “insignificant”. Do you? Really?
It seems your analysis has a bias to warm itself, ignoring the actual data presented. The warm side *has* been skewed as E.M. was pointing out. The overlying bias is always a skew to warmer temperatures, always, I have yet in three years to see one to the contrary, and that is how everyone deems this as junk science. To some, a softer term, cargo cult science.

June 23, 2012 5:54 am

“National offices generally choose to form the longest possible series and homogenize them. I believe GHCN preserves the segmentation. This means that the reconstructions performed based on the GHCN data run slightly on the principle adopted by BEST.”
The Berkeley Earth Method does not preserve segmentations, quite the opposite: it segments time series into smaller components.

phi
June 23, 2012 6:06 am

Steven Mosher,
“The Berkeley Earth Method does not preserve segmentations, quite the opposite. It segments time series into smaller components,”
What is the opposite of BEST is the NMS methodology which aggregates segments before homogenizing. What you did with the GHCN series is between these two extremes. In fact, you’re closer to BEST because segmentation present in GHCN generally corresponds to stations moves and it is these particular discontinuities which are biased.

June 23, 2012 6:14 am

“Richard T. Fowler says:
June 23, 2012 at 2:24 am (Edit)
“Smith’s implementation and his method of handling gaps in the data is unproven and may be the cause. ”
“3. It’s unclear from the code which version of GHCN V3 that Smith used. ”
These two statements appear to contradict each other. If the code is available, how can Smith’s “implementation and his method of handling gaps in the data” be unproven?
Zeke or Steve, would you care to elaborate? Thank you.”
Sure. In EM’s post on his method he describes his method of handling gaps in the record in words. His description is not very clear, but it is clear that he doesnt follow the standard approach used in FDM which is to reset the offset to 0. And in his post it wasnt clear what exact file he downloads. For example, if you read turnkey code by Mcintyre you can actually see which file is downloaded because their is an explict download.file() command. In what I could find of Smith’s it wasnt clear.

June 23, 2012 6:20 am

Louis Hooffstetter says:
June 23, 2012 at 5:35 am (Edit)
Informative post – thanks.
I’ve often wondered how and why temperatures are adjusted in the first place, and whether or not the adjustments are scientifically valid. If this has been adequately discussed somewhere, can someone direct me to it? If not, Steve, is this something you might consider posting here at WUWT?
#################
Sure. back in 2007 I started as a skeptic of adjustments. After plowing through piles of raw and adjusted data and the code to do adjustments. I conclude
A. Raw data has errors in it
B. These errors are evident to anyone who takes the time to look.
C. these errors have known causes and can be corrected or accounted for
The most important adjustment is TOBS. We dedicated a thread to it on Climate audit.
Tobs is the single largest adjustment made to most records. It happens to be a warming adjustment.

June 23, 2012 6:32 am

atheok
My guess is that you did not look at EM smiths code.
http://chiefio.wordpress.com/2012/06/08/ghcn-v1-vs-v3-some-code/
When you look through that Fortran and the scripts.. well, perhaps you can help me and figure
out which file he downloaded. Look for a reference in that code that shows he downloaded
this file
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/ghcnm.tavg.latest.qcu.tar.gz
basically, if somebody asks me why someone comes to wrong conclusions it could be
the wrong data or the wrong method. basic forensics work.
Wrong data can be
1. wrong file
2. read in wrong
3. formated wrong
Wrong method can be a lot of things. So, basically, I suggest starting at step zero when trying to figure these things out. Perhaps your fortran is better than mine and you can find the line in hat code that shows what file he downloads. Its a simple check,

June 23, 2012 6:38 am

“A. Raw data has errors in it”
Steven Mosher, elsewhere over the years you have claimed there is no raw data.
So 2 questions:
1. Is there raw data or not?
2. How did you come to determine there were errors in it? Data is normally just data. The error occurs in the way it’s handled. Care to explain?
Andrew

June 23, 2012 6:41 am

“so since you have all the data why don t u do exactly what smith did and see if u get the same plots, instead of producing a different analysis. his main point is that the past was cooled relative to the present. why not take all the station differences that u found and bin them by year, then plot a running sum of the average of the differences year by year. if he is correct that graph will be a u shape. if the graph is flat as it should be then maybe he/you can find the differences in the data/code that each of u has used.”
Zeke has provided the code he used to do this analysis. So, you are free to go do that. If you dont like that code, you can go use the R packages that I maintain. Everything can be freely downloaded from the CRAN repository. The package is called RghcnV3.
My preference is to avoid GHCN V3 altogether, and work with raw daily data. You get the same answers that we posted here for monthly data and avoid all the confusion and controversy surrounding GHCN V1,V2 and V3. That dataset has 26,000 stations ( actually 80K when you start )

phi
June 23, 2012 6:42 am

Steven Mosher,
“A. Raw data has errors in it
B. These errors are evident to anyone who takes the time to look.
C. these errors have known causes and can be corrected or accounted for”
Corrected errors are discontinuities. The main discontinuities that cause bias are those related to stations moves. They should not be regarded as errors but as corrections of increasing perturbations since the 1920s.
“The most important adjustment is TOBS. We dedicated a thread to it on Climate audit.”
Only valid for US.

June 23, 2012 6:48 am

mfo
Yes, you will find in the past that I used to be HUGE FAN of the first difference method.
read through that climate audit post. Skeptic Jeff Id, convinced believers Hu and Steve
that First differences was fatally flawed. EM did not get the memo.
That is how things work. I was convinced that First differences would solve all our problems.
I was wrong. Jeff Id made a great case and everybody with any statistical sense moved on to methods exactly like those created by Roman M and JeffId. That list includes: Tamino, Nick Stokes and Berkeley Earth. See Hu’s final comment:
“Update 8/29 Just for the record, as noted below at http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240064, Jeff Id has convinced me that while FDM solves one problem, it just creates other problems, and hence is not the way to go.
Instead, one should use RomanM’s “Plan B” — see
http://statpad.wordpress.com/2010/02/19/combining-stations-plan-b/, http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240129 , with appropriate covariance weighting — see http://climateaudit.org/2010/08/26/kriging-on-a-geoid/ .”

June 23, 2012 6:51 am

Phi.
Interesting that you think Tobs only applies to the US. It doesn’t.
With regard to station moves, I prefer the BEST methodology. although in practice we know that explicit adjustments give the same result.

June 23, 2012 6:59 am

Andrew
So 2 questions:
1. Is there raw data or not?
2. How did you come to determine there were errors in it? Data is normally just data. The error occurs in the way it’s handled. Care to explain?
Andrew
###############
Philosophically there is no raw data. Practically, what we have is what you could call
“first report” So, I’m using “raw” in the sense that most of you do.
2. How do you determine that there are errors in the data? Good question.
Here are some examples: Tmin is reported as being greater than Tmax, Tmax is reported as being less than Tmin, temperatures of +15000C being reported, temperatures of -200C
being reported. There are scads of errors like this. data items being repeated over and over again. In a recent case where I was looking at heat wave data we found one station reporting freezing temperatures. When people die in July in the midwest and a stations “raw data” says that it is sub zero, I have a choice: believe the doctor who said they died of heat stroke or believe the raw data of a temperature station. hmm. Tougher examples are subtle changes like
a) station moves
b) instrument changes
c) time of observation change
d) and toughest of all gradual changes over time to the enviroment

pouncer
June 23, 2012 7:00 am

Hi Steve,
Does this analysis address the point of “fitness for purpose”? The purpose of all such historic reviews, as I understand it, is to proximate the changes in black body model atmospheric temperatures for use in a (changing) radiation budget. The “simple physics” is simple. Measuring the data is more complicated.
Chiefio claims differences over time are of comparable size (a) between versions of the data set, (b) as “splice” and other artifacts of measuring methods, (c) deliberate adjustments intended to compensate for the data artifacts, and (d) actual physical measures.
If the real difference over a century is under two degrees and the variations for versions,data artifacts, and adjustments distort measurement of that difference, how can that difference be claimed to decimal point accuracy? (Precision, I grant, from the large number of measurements. But Chiefio’s point that the various sources of noise are NOT random and therefore can NOT be assumed to cancel is, as far as I can tell, not explicitly addressed.) If the intended purpose does require that level of accuracy and if the measurement does not provide it, can the data set be said to be useful for that purpose? (Useful for many other purposed, including those for which it was originally gathered, don’t seem to me to be germaine.)
I see your analysis as a claim that the differences make little difference. I agree. But we are talking about very little differences in the whole picture.

phi
June 23, 2012 7:08 am

Steven Mosher,
“Interesting that you think Tobs only applies to the US. It doesn’t.”
If you say this is that you have a case in mind. Have you a reference?
“With regard to station moves, I prefer the BEST methodology.”
It has the disadvantage of not allowing to assess the magnitude of the adjustments.
“although in practice we know that explicit adjustments give the same result.”
Yes, explicitly or implicitly all global temperatures curves are homogenized.
