Comparing GHCN V1 and V3

Much Ado About Very Little

Guest post by Zeke Hausfather and Steve Mosher

E.M. Smith has claimed (see full post here: Summary Report on v1 vs v3 GHCN) to find numerous differences between GHCN version 1 and version 3, differences that, in his words, constitute “a degree of shift of the input data of roughly the same order of scale as the reputed Global Warming”. His analysis is flawed, however, as the raw data in GHCN v1 and v3 are nearly identical, and trends in the globally gridded raw data for both are effectively the same as those found in the published NCDC and GISTemp land records.


Figure 1: Comparison of station-months of data over time between GHCN v1 and GHCN v3.

First, a little background on the Global Historical Climatology Network (GHCN). GHCN was created in the late 1980s after a large effort by the World Meteorological Organization (WMO) to collect all available temperature data from member countries. Many of these were in the form of logbooks or other non-digital records (this being the 1980s), and many man-hours were required to process them into a digital form.

Meanwhile, the WMO set up a process to automate the submission of data going forward, setting up a network of around 1,200 geographically distributed stations that would provide monthly updates via CLIMAT reports. Periodically NCDC undertakes efforts to collect more historical monthly data not submitted via CLIMAT reports, and more recently has set up a daily product with automated updates from tens of thousands of stations (GHCN-Daily). This structure of GHCN as a periodically updated retroactive compilation with a subset of automatically reporting stations has in the past led to some confusion over “station die-offs”.

GHCN has gone through three major iterations. V1 was released in 1992 and included around 6,000 stations with only mean temperatures available and no adjustments or homogenization. Version 2 was released in 1997 and added a number of new stations, minimum and maximum temperatures, and manually homogenized data. V3 was released last year and added many new stations (both in the distant past and post-1992, where Version 2 showed a sharp drop-off in available records), and switched the homogenization process to the Menne and Williams Pairwise Homogenization Algorithm (PHA) previously used in USHCN. Figure 1, above, shows the number of station records available for each month in GHCN v1 and v3.

We can perform a number of tests to see if GHCN v1 and v3 differ. The simplest is to compare the observations in both data files for the same stations. This is somewhat complicated by the fact that station identification numbers changed between v1 and v3, and we have been unable to locate a translation table between the two. We can, however, match stations between the two sets using their latitude and longitude coordinates. This gives us 1,267,763 station-months of data whose stations match between the two sets to a precision of two decimal places.
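For readers who want to reproduce this matching step, here is a minimal Python sketch of the idea (the STATA code we actually used is linked at the end of the post; the file and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical inputs: one row per station-month with station coordinates attached.
# The real GHCN files have a different layout; this only illustrates the logic.
v1 = pd.read_csv("ghcn_v1_monthly.csv")   # columns: lat, lon, year, month, temp
v3 = pd.read_csv("ghcn_v3_monthly.csv")   # columns: lat, lon, year, month, temp

# Station IDs changed between versions, so match on coordinates
# rounded to two decimal places instead.
for df in (v1, v3):
    df["lat2"] = df["lat"].round(2)
    df["lon2"] = df["lon"].round(2)

matched = v1.merge(v3, on=["lat2", "lon2", "year", "month"], suffixes=("_v1", "_v3"))

# Difference for each matched station-month (the quantity shown in Figures 2 and 3).
matched["diff"] = matched["temp_v3"] - matched["temp_v1"]
print(len(matched), "matched station-months")
print((matched["diff"] == 0).mean(), "fraction of observations identical")
```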

When we calculate the difference between the two sets and plot the distribution, we get Figure 2, below:


Figure 2: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon.

The vast majority of observations are identical between GHCN v1 and v3. If we exclude identical observations and just look at the distribution of non-zero differences, we get Figure 3:


Figure 3: Difference between GHCN v1 and GHCN v3 records matched by station lat/lon, excluding cases of zero difference.

This shows that while the raw data in GHCN v1 and v3 are not identical (at least via this method of station matching), there is little bias in the mean. Differences between the two might be explained by the resolution of duplicate measurements at the same location (called imods in GHCN version 2), by updates to the data from various national met offices, or by refinements in station lat/lon over time.

Another way to test if GHCN v1 and GHCN v3 differ is to convert the data of each into anomalies (with baseline years of 1960-1989 chosen to maximize overlap in the common anomaly period), assign each to a 5 by 5 lat/lon grid cell, average anomalies in each grid cell, and create a land-area weighted global temperature estimate. This is similar to the method that NCDC uses in their reconstruction.
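A simplified sketch of that procedure, again in Python rather than the STATA code linked at the end of the post (it glosses over details such as requiring a minimum number of baseline years per station):

```python
import numpy as np
import pandas as pd

def gridded_global_series(df):
    """df: one row per station-month with columns station, lat, lon, year, month, temp.
    Returns an annual, land-area-weighted global mean anomaly series (illustrative only)."""
    # 1. Anomalies relative to each station's own 1960-1989 mean for that calendar month.
    base = (df[df["year"].between(1960, 1989)]
              .groupby(["station", "month"])["temp"].mean()
              .rename("baseline"))
    df = df.join(base, on=["station", "month"])
    df["anom"] = df["temp"] - df["baseline"]

    # 2. Assign each station to a 5x5 degree grid cell and average anomalies per cell.
    df["glat"] = 5.0 * np.floor(df["lat"] / 5.0) + 2.5
    df["glon"] = 5.0 * np.floor(df["lon"] / 5.0) + 2.5
    cells = (df.groupby(["year", "glat", "glon"], as_index=False)["anom"].mean()
               .dropna(subset=["anom"]))

    # 3. Weight each cell by the cosine of its central latitude (a proxy for area)
    #    and average over all cells with data in each year.
    cells["w"] = np.cos(np.radians(cells["glat"]))
    return cells.groupby("year").apply(lambda g: np.average(g["anom"], weights=g["w"]))
```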


Figure 4: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies. Note that GHCN v1 ends in 1990 because that is the last year of available data.

When we do this for both GHCN v1 and GHCN v3 raw data, we get the figure above. While we would expect some differences simply because GHCN v3 includes a number of stations not included in GHCN v1, the similarities are pretty remarkable. On the century scale the trends in the two are nearly identical. This differs significantly from the picture painted by E.M. Smith: instead of the shift in input data being equivalent to 50% of the trend, as he suggests, we find the difference in trend between the two to be a mere 1.5%.

Now, astute skeptics might agree with us that the raw data files are, if not identical, overwhelmingly similar, but point out that there is one difference we did not address: GHCN v1 had only raw data with no adjustments, while GHCN v3 has both adjusted and raw versions. Perhaps the warming that E.M. Smith attributed to changes in input data might in fact be due to changes in adjustment method?

This is not the case, as GHCN v3 adjustments have little impact on the global-scale trend vis-à-vis the raw data. We can see this in Figure 5 below, where both GHCN v1 and GHCN v3 are compared to published NCDC and GISTemp land records:


Figure 5: Comparison of GHCN v1 and GHCN v3 spatially gridded anomalies with NCDC and GISTemp published land reconstructions.

If we look at the trends over the 1880-1990 period, we find that both GHCN v1 and GHCN v3 are quite similar, and lie between the trends shown in GISTemp and NCDC records.

1880-1990 trends

GHCN v1 raw: 0.04845 C (0.03661 to 0.06024)

GHCN v3 raw: 0.04919 C (0.03737 to 0.06100)

NCDC adjusted: 0.05394 C (0.04418 to 0.06370)

GISTemp adjusted: 0.04676 C (0.03620 to 0.05731)
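For those who want to check the arithmetic, here is a minimal sketch of one way to compute such a trend and an approximate 95% interval from an annual anomaly series (illustrative Python, not the exact STATA call we used; the slope here is per year, so rescale as needed):

```python
import numpy as np
from scipy import stats

def trend_with_ci(years, anoms, start=1880, end=1990):
    """Ordinary least-squares trend over [start, end] with an approximate 95% interval."""
    years = np.asarray(years, dtype=float)
    anoms = np.asarray(anoms, dtype=float)
    keep = (years >= start) & (years <= end) & ~np.isnan(anoms)
    fit = stats.linregress(years[keep], anoms[keep])
    half_width = 1.96 * fit.stderr          # stderr is the standard error of the slope
    return fit.slope, fit.slope - half_width, fit.slope + half_width
```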

This analysis should make it abundantly clear that the change in raw input data (if any) between GHCN version 1 and GHCN version 3 had little to no effect on global temperature trends. The exact cause of Smith’s mistaken conclusion is unknown; however, a review of his code does indicate a few areas that seem problematic. They are:

1. An apparent reliance on station IDs to match stations. Station IDs can differ between versions of GHCN.

2. Use of first differences. Smith uses a first-difference approach; however, he has made idiosyncratic changes to the method, especially in cases where there are temporal lacunae in the data. The method, which NCDC formerly used, has known issues and biases, detailed by Jeff Id. Smith’s implementation, and in particular his handling of gaps in the data, is unproven and may be the cause of his result. (A sketch of the standard first-difference method follows this list.)

3. It is unclear from the code which version of GHCN v3 Smith used.
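For reference, here is a minimal sketch of the standard first-difference approach as usually described (this is not Smith's modified version and not NCDC's code; how the gaps are treated is exactly where implementations diverge):

```python
import numpy as np

def first_differences(temps):
    """temps: one station's values for a single calendar month, ordered by year,
    with np.nan where a year is missing. Any difference touching a missing year
    is NaN; whether to reset, drop, or bridge at that point is the contested choice."""
    temps = np.asarray(temps, dtype=float)
    return temps[1:] - temps[:-1]

# Toy usage: three stations, five years, one gap in the first station.
station_temps = np.array([
    [10.0, 10.2, np.nan, 10.4, 10.6],
    [ 9.5,  9.6,  9.8,   9.9, 10.0],
    [11.0, 11.1, 11.2,  11.3, 11.5],
])
diffs = np.array([first_differences(t) for t in station_temps])

# Classical use of the method: average the difference series across stations for
# each year, then cumulatively sum the averages to recover a regional series.
regional = np.nancumsum(np.nanmean(diffs, axis=0))
```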

STATA code and data used in creating the figures in this post can be found here: https://www.dropbox.com/sh/b9rz83cu7ds9lq8/IKUGoHk5qc

Playing around with it is strongly encouraged for those interested.

Comments
June 23, 2012 2:27 pm

steven mosher says:
Interesting that you think Tobs only applies to the US.

Phi said the adjustment only applies to the USA. What adjustment method is used is determined by how and when thermometer reading practices changed and when automation occurred.
These vary by country, and hence the adjustment (method) needed also varies by country. Karl’s adjustment method is specific to the USA. If it is used elsewhere, then that is questionable.

E.M.Smith
Editor
June 23, 2012 2:38 pm

@Pamela Gray:
The stations are not at all the same collection. Some are the same, others not. Station counts are quite different (so stations must be different). I provide a “count” of individual station records used in making any given ‘report’ of anomalies from which I make the graphs. You can find those counts in the individual reports that are posted in the individual examinations of regions and areas here:
http://chiefio.wordpress.com/v1vsv3/
In particular, for the set as a global whole here:
http://chiefio.wordpress.com/2012/06/01/ghcn-v1-vs-v3-1990-to-date-anomalies/
Where the 1990 ‘count’ for v1 is 3400 (down from 3503 in 1989) and the v3 ‘count’ is 4703 in 1990 (down from 4929 in 1989). So, of necessity, there are at minimum 1303 “different” stations in v3 than in v1. (There will be more, as station changes do not show up in the broad ‘count’. There will also be instrument changes that do not show up as either a count or a station change, that were flagged with ‘duplicate number’ in v2 but are now lost in the homogenizing of v3).
It is that constantly mutating content of the thermometer list that is, IMHO, the biggest problem. Any chemist who has done calorimetry will tell you that screwing around constantly changing the thermometers gives a load of error and splice artifacts that simply can not be reliably removed. And make no mistake about it, what the climate codes claim to do is use past temperature records to do a calorimetry on the Earth, showing net heat gain. (Just without accounting for mass, mass flow, phase changes, and thermometer changes…. Yeah, that bad…)
My comparison looks for how those changes can have impact on the results. This critique says if they look at small enough batches, they don’t change much. But the climate codes smear the data all over and only ‘batch it’ in the end steps; so IMHO the method used here will fail to find that problem in the data.
@sunshinehours1:
There is an accepted fallacy in ‘climate science’ circles. That fallacy is that if you average things enough, you can get any precision you like. (Yes, an average can have far greater precision than the individual data items that go into it… but that’s not the point… read on…) It is true that if you average a bunch of readings of some value, the random error in that sample will be reduced. It is not true that the systematic error will be reduced.
An example:
You have a Liquid In Glass (LIG) thermometer read by a human. It is at about 4 feet off the ground, most folks are taller than that. So they look at the meniscus in the glass, find it above a whole degree mark (call it 95.4 F) and dutifully put on the record / report “95 F” (they only recorded whole degrees F in the USA for a very long time. Historically the directions even said that if you missed a value, you could just make one up; but once I linked to those directions at NOAA, they rapidly evaporated… One can only wonder why…)
Now there are two ways to illustrate the potential for error here. One is to say you change to a short person reading the thermometer, so they see the meniscus at 95.6 and report 96 F on the form. That one is pretty clear. You could easily have a series that is prone to that shifting when staff has turnover. (Or new training happens on how to look at the meniscus straight on).
The other is more interesting, IMHO. Say the station is now replaced with an electronic gizmo that reports “95.4”. You now have 0.4 F of “warming” that comes just from change of process (how temperatures are reported). As you can not go back and “fix” the past records to be greater precision than exists, this structural bias will just be there forever. So, say, your people regularly just looked at the meniscus and reported the last line it crossed. On average, the new electronic thing will be 1/2 F higher on what it reports. (Note that I am NOT saying the directions to the people were to do that, only that it is a very common thing for folks to do. Truncate instead of round.)
The more important point here is just that you do not know if that kind of structural error is in the data. It might be. It might not be. So averaging can remove random error (one person rounds up, the next rounds down) but can not remove a structural bias where most folks truncate because it is easy and a few round.
Nor can it remove the error from an old LIG that was regularly 1/2 F low being replaced by a new electronic one that is ‘spot on’, so you get a 1/2 F “lift” to the data. As the prior data were reported in whole degrees F, you can not find that “lift” by inspection of the data.
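A toy simulation of that point, with made-up numbers, just to show that averaging beats down random rounding error but leaves a systematic truncation bias untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
true_temps = rng.uniform(40.0, 100.0, size=100_000)   # hypothetical "true" values, deg F

rounded   = np.round(true_temps)    # observer rounds to the nearest whole degree
truncated = np.floor(true_temps)    # observer truncates to the last whole degree crossed

print(np.mean(rounded - true_temps))    # ~0: the random part of the error averages out
print(np.mean(truncated - true_temps))  # ~ -0.5: the structural bias survives any averaging
```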
IMHO, a lot of the ‘step function warmer’ that happens at the 1987-1990 area is from exactly those kinds of structural errors in the “splice” when the “duplicate number” on the segments changed showing a change of process and / or instruments.
Also, IMHO, anyone who reports or uses temperatures to more than whole degrees F is showing that they either do not understand the problem, or are deliberately choosing to ignore it. ( I occasionally choose to ignore it, as it mostly just gets the True Believers In Over Averaging False Precision tossing rocks at me; but I occasionally point it out…)
So for me, that BEST puts values out to 3 decimal places in their data set tells me immediately that they are making some fundamental errors around the nature of averages, precision, and the false precision from expecting averages to remove all error instead of just random error.
(IFF they put a disclaimer on the set somewhere that they know the precision is false precision but are passing through the results of calculations for others to have the exact result, that is a reasonable thing to do. Saying “Only the whole degrees are trusted, the 1/10ths are good to about 1/2 C, and the rest is false precision but there for you to deal with” is a more flexible way of providing such a data product, and is fine. Saying “that 1/1000 place is accurate” is just wrong and indicates lack of understanding. So it really depends on the data set notes.)
FWIW, the point you illustrate is part of why I think the data are “an issue”. So much of it wanders back and forth by 1/2 C / 1 F type values and is riddled with structural errors that you just can’t really say anything about temperatures in the fractional degree range of a large average. But the True Believers will vociferously defend their divination powers to all sorts of precision…
@DocMartyn:
Nice point… WHICH station matters quite a bit when things start ‘reaching’ for data up to 1200 km away and using ‘selected’ corrections. There is one station in the middle of Europe that is key: GIStemp’s UHI adjustments are largely based on it. The code even puts in a ‘special’ set of data for that station so it becomes the ‘longest record’ in the area (it looks for ‘longest’ to determine which station gets priority in adjusting the others…)
So there are key stations with far more power than others.
Per “Fit For Purpose”:
I probably ought to have made it more clear that specifically the purpose I see as “the issue” is feeding into “Serial Averager” and “Serial Homogenizer” codes like GIStemp and expecting to get anything reasonable out.
In particular, that codes which give individual stations and individual data items ‘reach’ out to large distances to modify the contents of that record or those areas. The potential for hidden “splice artifact” like effects from picking up those higher end data and giving them long reach into other records is just too problematic.
I suppose my ‘critique of this critique’ mostly comes down to that point: I’m looking at what the data are used for in the AGW world (feeding GIStemp, NCDC Adjusted, Hadley) that do those kinds of ‘spread the wealth’ process and find the data unfit for that kind of process. This critique finds it suited for use inside small boxes. So? Not the point at all…
Dawg & Climatebeagle:
They do like to periodically shuffle the deck and make it hard to do comparisons…. It was that violation of the standards in things like accounting (where I have more experience) that first got me a bit ruffled. It’s just really dodgy technique, at minimum. While it makes me think of constantly moving walnut shells and peas; I can’t prove it is anything other than gratuitous change from lack of caring.
For some changes, like the Country Code map, it makes more sense. Countries come and countries go. But even there, they could have kept the unchanged group, well, unchanged. Instead a complete renumber was done. Near as I can tell they just picked a map and started assigning numbers in sequence each time. Not very sophisticated.
So “malice” or “stupidity”… heck of a choice.
And I notice in a further comment that Steven still hasn’t noticed that I don’t DO station matching on WMO number. ( I don’t do it at all. I compare groups of stations in matched country codes, or in the first digit of it, Region. I don’t expect that a North American station will suddenly wander off to Asia…. ) Nothing like having someone all in a tizzy about you doing wrongly something you don’t do at all. No idea how to answer that. It’s a “Have you stopped beating your wife?” issue structurally. “I never did it at all” is the only thing to say…
:
BINGO! again… As Steven points out, there are a very large number of flags for various degrees of flaky. Guess why they need them 😉
IMHO so much of the manipulation done to the data between collection and end use is done that the end result is full of all sorts of “pretty values” but they have lost usable meaning.
At one time I started listing all the steps of changes and couldn’t get them all listed. Even the “base data” are full of “estimated” and “interpolated” and other kinds of “not really data” flags.
But that’s for another day…
R:
Good point…
@Ripper:
Ah, the light dawns!
That kind of ‘selective loss and infill’ is what I think is at the heart of how GIStemp and related do a ‘splice artifacty’ smearing process (across long distances…) and get the effect they do.
That is exactly why I will ONLY compare a thermometer with itself, and only within a given month series and span dropouts without a reset. It breaks that kind of “delete and infill” effect. That is why it is important to deal with the dropouts in a way that negates them.
One of the things that I first ran into looking at v2 and GIStemp was the way some locations seemed to have dropouts just at convenient times that would be ‘filled in’ from nearby places, where the record was more suitable, by ‘warming by splicing’. That is especially visible in the Pacific where several islands that are “dead flat” with one thermometer had their data suddenly end, and the nearest place (Tonga or Fiji I think…) had something like 6 thermometers that changed over time (so splice opportunities) and when spliced gave a nice warming trend. So that “trend” gets spread into the ‘grid box’ of the nearby island and compared to its past flat, and presto, instant warming grid / box.
Congratulations on spotting it too!
And that, BTW, is why it is important to compare OUTSIDE the prescription of small grid/boxes and why I made a system that lets me do those kinds of ‘variable areas’… (And why the method used in this critique can not find that ‘issue’…)
@Pamela Gray:
Nice idea… Hmmm… someone has been thinking 😉
Also useful is to find those islands with data truncation and get current data and see if they have continued their flat trend (or even just if the present Wunderground value matches the historical reports, modulo the Airport Heat Island…)
@dp:
Another person ‘gets it’ about the nature of this critique 😉
I’d only add that the method for comparison ought to also have the goal of showing how not-doing infill on dropouts and how examination of data spreading could present “issues” in the typical climate codes that do in-fill, homogenizing, and Reference Station Method type spreading…
@Anthony:
I’m fine with the “put it up when you get it”. No worries.
“Reality just is. -E.M.Smith” and the comparison will show what it shows, be I looking or not. Similarly, my comparison shows what it shows. It is the answer to “what do they show?” that is interesting, not exactly when they show it…
As I’ve now caught up with my first response to comments here, I’m taking a lunch break, then I’ll come back and look at what is in subsequent comment.

June 23, 2012 2:52 pm

phi says:
June 23, 2012 at 12:21 pm
Chas,
“Thermometer bulbs shrink over time”
It’s an interesting point, I addressed it there:
http://rankexploits.com/musings/2012/a-surprising-validation-of-ushcn-adjustments/#comment-95708

phi, I would submit there are other factors not addressed regarding accuracy over time (re: ‘glass thermometer bulb shrinkage’) besides “thermometers which suffered from a slow contraction of the mercury containers.” and this is in the category of the ‘shrinkage-expansion’ hysteresis glass exhibits, to wit I quote the following from here:

The next most significant cause of error comes from the glass, a substance with complex mechanical properties. Like mercury, it expands rapidly on heating but does not contract immediately on cooling. This produces a hysteresis which, for a good glass, is about 0.1% of the temperature change.
A good, well-annealed thermometer glass will relax back over a period of days. An ordinary glass might never recover its shape. Besides this hysteresis, the glass bulb undergoes a secular contraction over time; that is, the thermometer reading increases with age, but fortunately the effect is slow and calibration checks at the ice point, 0°C, can correct for it.

The chief cause of inaccuracy the above reference cites is: “that not all of the liquid is at the required temperature due to its necessary presence in the stem. Thus, the thermometer is also sensitive to the stem temperature.” (An air-measuring thermometer obviously would be ‘immersed’ in the air it is intended to measure.)
Unfortunately, Chas’ linked material does not address ‘shrinkage’; it may have been directly on pg 95 which did not show in the preview I was allowed by Google.

June 23, 2012 3:13 pm

It’s late and my brain fuzzes. But my every instinct is that EMS is right and Steve and Zeke are not even beginning to investigate what EMS is saying.
My instincts in this are based on my past assemblage of several different formidable pieces of work, not least being the excellent work of John Daly, and that of Illarionov, videos reposted above by Amino Acids in Meteorites. Each project in my assemblage showed serious evidence of unquantified UHI. Logic says that this unquantified UHI has wrecked the “homogenization” from the start – to say nothing of dropped thermometers and smearing results over unjustified large areas especially polar.

phi
June 23, 2012 3:18 pm

Chas, _Jim,
“these bulb contractions do not seem to be as large as those you mention, but 0.1 C in the first 4 years…”
In fact, my interpretation is: discontinuities of raw data are overwhelmingly coolings. It is known that this is mainly related to stations moves. It’s very annoying for climatologists because it can be explained only by significant perturbations by urbanization which are not corrected. Therefore, they look to other possible causes (glass contraction, change of hours of observations etc..). It is likely that these effects actually exist but their importance is clearly overvalued. Anyway, in the case of glass contraction, the break is generally corrected but not the slow warming preceding. Still a little boost to anthropogenic warming.
Philip Bradley,
“These vary by country and hence the adjustment (method) needed also varies by country.”
Steven Mosher claimed that most adjustments came from TObs, not only in the US. TObs adjustments are generally made in all countries but they are weak and the problem is usually totally different. In this regard, the US is in fact a special case. Very curious.

June 23, 2012 3:18 pm

I’m with Paul in Sweden regarding all the homogenizing and correcting and extrapolating into a globe of gridboxes to arrive at some highly doubtful global average temperature from which we can integrate grid boxes of anomalies to show Global Warming. If it’s significant Global Warming you are trying to show, you could choose an area or cluster of areas of the globe where there are abundant thermometers – 10 to 20% of the globe should be enough – and forget about the oceans (yeah I know 75% or so yadda yadda). If we are going to warm significantly, this selected area will show this. If it is unequivocal after 40 years (the time we have already been worrying about AGW) then it will become obvious. Here, I consider hundredths, tenths of a degree as insignificant and even 1 degree over a century also to be insignificant – which is our experience so far. If you have another purpose (whatever that might be) for knowing if the temperature on average has increased several hundredths of a degree to a couple of tenths of a degree over a decade, then by all means carry on. I believe the effort and expenditure made to date to have been hugely misspent resources (all in probably approaching a trillion bucks – research and windmills and government taxation and policies) if it is for determining if we are all going to fry real soon – we could have done something about people who have already died of things we could have fixed with the cash. We’re not like frogs being slowly heated up to boiling without them noticing.

June 23, 2012 3:37 pm

Rather than “unquantified” I should have said “insufficiently-quantified”.

June 23, 2012 3:42 pm

phi, the real issue with the TOB adjustment is that Karl used an estimating method for the time of observation, one that even he accepts results in an adjustment that could be wrong by as much as 25%, when the time of observation was recorded and is available on the paper records. A method using the recorded time of observation rather than an estimate of the time would give us a more accurate value for the adjustment needed.
I assume this was originally done to save money, and then, as is commonly the case in climate science, they doggedly stick to a method that gives them the result they want.

E.M.Smith
Editor
June 23, 2012 3:50 pm

@Willis:
Nice to know I’m not the only one to notice…
@Paddylol:
Then you ought to be really interested in the way codes like GIStemp submit every record to a variety of ‘in-filling’ and homogenizing and RSM based ‘spreading around’… There are giant dropouts in the data where it just fills in the grid / box (like Indonesia during a war or two…)
Exactly why I don’t do that and why I did a version of FD that bridges the dropout.
@JT:
There is no ‘raw data’ in this discussion. By definition, the monthly averages are a computed thing. Even if you find daily values, they have typically been QA fixed and sometimes homogenized.
With that said, the next problem is that the data are temperatures, so range from minus a lot to very hot. Plot it all up and you get a wide band of mush that is a narrow point at the start of time (one thermometer in Europe) and gets wider and warmer (and flatter) over time as the Equatorial Zones get added. A trend line through it will mostly show the discovery of the rest of the world by Europeans… 😉
If you make them “anomalies”, then you must answer “vs what?”. That’s what I did, and I ONLY compared any one data item to the same thermometer (so what StationID it has makes no difference; it is only compared to itself regardless of number), and I only compare within the same month (so ‘like to like’ in time as well). I can likely make a plot through that for you “in a while”. If I do I’ll put it up at my site and add a comment here.
I did a general ‘bulk comparison’ of v1, v2, and v3 actual temperatures that shows a little bit of how the data flatten over time as more coverage comes to the more stable tropics. Not what you want, but a hint in that direction.
http://chiefio.wordpress.com/2012/05/24/ghcn-v1-v2-v3-all-data/
Mostly it shows “summer months” getting a tiny bit less hot (as more Southern Hemisphere and equatorial data enter the series) and the “winter months” getting a bit warmer.
It’s an interesting chart, but not useful for climatology, just for seeing how the composition of the data set changes over time.
@Phi:
I can’t say there is “NO” such set available ( I’ve not done an extensive search) but I can say that GHCN v3 comes pre-homogenized and with the ‘splices’ between “Duplicate Numbers” built in…
T. Fowler:
You are most welcome. Just glad it’s of use to someone.
@vukcevic :
I really enjoy the correlations you find and the graphs you make. Don’t know exactly what to make of them (which makes what do what…) but it hints at something really interesting lurking out there in causality land…
@Wayne:
All the figures here are the product of the poster. My charts and figures are on my site.
@Amino Acids in Meteorites:
Thanks! Very nicely illustrated the problem…
:
A very nice example of “structural errors” that will NOT be averaged away…
FWIW, I think that in many cases the early LIG thermometer stations were just set up and then you didn’t touch the thing for a long time. In some cases, the same LIG thermometer was used for decades (centuries?) especially in some classical / historical cases. It is one of those dangling “loose ends” where each “thermometer” is really a variety of instruments that vary over time and may be one long lived instrument or an endless series of “splices” of different instruments depending on location. ( I’d expect the Stevenson Screens in places with annual hurricane / cyclone visits were replaced far more often than the one on the wall of an Observatory at a University…) So each “record” will be idiosyncratic with respect to instrument change and calibration issues over time…
But some folks are sure it will all just ‘average out’ and give 0.001 of precision 😉
With this, I think I’ve caught up with comments and can actually go visit my own blog for once 😉

E.M.Smith
Editor
June 23, 2012 4:01 pm

Gak! Spoke too soon…
Hello Lucy!
@Lucy Skywalker:
Thanks for the endorsement via instinct 😉
It’s that ‘assembly’ process that’s the issue. This critique avoids the ‘assembly’, so doesn’t find the issue (thus is not really a critique of a method that sets out to do that and does find it. IMHO)
@Phi:
Yup. The devil is in the splices (of various kinds).
Pearse:
That’s why I do the “by region” graphs (and eventually the ‘by country’ graphs) that show wildly divergent changes by region and by country. It just hollers “not CO2” and clearly indicates “data artifact” and “local land use” issues.
And very much in agreement on the incredible waste of resources in the AGW “issue”.
Heck, with the $5 Million “wet kiss” to Mann for surviving a whitewashing one could provide a rocket stove to just about everyone in Madagascar and both save their forest from further destruction for fuel wood AND save the eyesight of huge numbers of women.
Saw on the news that $2 Billion of US dollars were being pledged in Rio for more UN Climate Boondoggles. The amount of real good that could be done with that is so great, and the sheer waste of it there so pathetic…

June 23, 2012 4:58 pm

Here’s another short video (34 seconds) showing where GISS does not have stations taking actual temperature data. It will be the black areas on the globe. In these areas they use the areas around the black holes to do estimates of what the temperature would be in the black holes. And as EM Smith is pointing out, these estimates, in some cases, are estimates of estimates of farther-away stations.
Maybe we can pass the hat and send some money to NASA so they can buy temp stations for these black hole areas. 😉

ferd berple
June 23, 2012 5:19 pm

Effectively the raw data points are randomly distributed. Gridding has removed the randomness and in the process changed the mean and deviation of the data set. This is a form of selection bias. Similar to what is done with the tree ring circus.

ferd berple
June 23, 2012 5:25 pm

Saw on the news that $2 Billion of US dollars were being pledged in Rio for more UN Climate Boondoggles. The amount of real good that could be done with that is so great, and the sheer waste of it there so pathetic…
==================
Look on the bright side. It wasn’t the $100 billion Obama pledged 3 years ago.
Unfortunately the sad fact is that this money has to come from other programs that actually could save lives. Instead tens of thousands of people die every month from preventable causes, as money that could have been used to save them is squandered on politically correct climate science repackaged as sustainable development.
Rather than save tens of thousands of people every month today, politicians and scientists do nothing and pledge our money to save tens of thousands of people in the distant future. By which time the money will be long gone and no one will be saved.
The real problem is that these people are totally ineffective at anything more than lining their own pockets and proclaiming to the heavens how righteous they are.

ferd berple
June 23, 2012 5:47 pm

Gridding to fill in missing data is statistical nonsense. It assumes that the missing points are the average of the surrounding data, without any basis in fact. You are much better off working with the raw data, missing points and all, for statistical analysis than you are with the processed data.
Want to test this for yourself? Take a non linear series like the Fibonacci series. Randomly delete points from the series. Do a statistical analysis on the result. Now fill in the missing data points with averages of the surrounding points. Repeat your statistical analysis. In every case the results will be more representative of the original series using the data before you filled in the missing points.
This is the nonsense of gridding. It fills in data based on an assumption that is less accurate than the data before you filled it in. You need to know the relationship over time for how the missing points interact with their neighbors before you can fill them in. Nothing says this will be an average. Thus, if you use an average you are making the data less accurate, not more accurate.
Any time you hear the word gridding, put your hand over your wallet.

Paul in Sweden
June 23, 2012 6:03 pm

EM Smith, Thank you for taking the time for a reply, your volunteer service as well as the volunteer service of so many others has not only been noticed but has made a difference.
“FWIW, the very notion of a Global Average Temperature is based in a fundamental error of Philosophy of Science. It is an intrinsic or intensive property. You simply can not average two temperatures from different things and have any meaning in the result. It is an obscure, but vitally important point; that is consistently ignored by the entire Climate Panic Industry…”
Agreed.
Moving on to data quality:
If we were moving a top level general ledger commercial banking system from one set of data centers to another set of data centers and independent auditors were complaining that accounts were inexplicably off from the present system to the old one and the response from internal auditors at an op meeting was:
“This shows that while the raw data in [GHCN v1] old general ledger and [v3] new general ledger is not identical (at least via this method of station matching), there is little bias in the mean.”
There would be a seemingly long pause followed by language that should not be repeated and a two fold scramble to identify the individual regional discrepancies, quantify them, put teams on them to estimate reconciliation time, all the while other teams would be evaluating the prudence of initiating the meticulously planned fallback and restore plan. Not for nothin’ but I can tell you that versions of that scenario happen at all of the major financial institutions several times a day in order to produce a close of day database. This is done because major financial decisions must be based on quality data and if the books are off heads roll and people are JAILED. The concept that Asia has multiple accounts inexplicably down 40 percent but Europe inexplicably has multiple accounts up 40 percent and somehow this is OK because “there is little bias in the mean” is something that just does not fly.
Chiefio, I totally agree with your “a set of data that are not ‘fit for purpose’” statement but others do not seem to be very concerned. I do not know, it is not like someone would actually spend hundreds of billions of dollars, task 10s of thousands of people in multiple countries, enact punitive laws and turn whole governments & economies upside down based on a decision that even included this irreconcilable data. Right? Surely, they can definitively empirically isolate, quantify each of the first order known(natural & anthropogenic) climate forcings. Right?
With regard to GHCN v1 & v3, the data needs to be reconciled. There are climate scientists out there who actually do climate science that contributes towards regional agricultural and civic planning in addition to the advancement of our general understanding of climate science.
‘There is little bias in the mean’ does not change the fact that the data is corrupted, all regional products cannot be validated. The data is all well and good for academics and MMORPGs on taxpayer supplied super computers, but would never standup to commercial and regulatory standards. It is useless for practical applications and should not be included in any policy decision.
Our historic climate records are important.

ferd berple
June 23, 2012 6:29 pm

One of the most powerful concepts in information processing is the NULL. Different from Zero, NULL means we don’t know the value in this cell. Zero means we know the value and it is 0. You can always spot the rookies in data processing; they forever try to replace NULL with Zero or some other value, because they don’t know how to deal with it.
Gridding is no different. A rookie mistake. Rather than tell the statistics that you don’t know what value should be in the cell, you are telling a statistical lie. You are saying you know the value, and it is X. As a result every statistical analysis you perform downstream will underestimate the error. Your data will appear more accurate than it actually is, because you have not been truthful about your data.
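A one-line illustration of the difference, with NaN standing in for NULL:

```python
import numpy as np

with_null = np.array([10.0, 12.0, np.nan])   # "we don't know the third value"
with_zero = np.array([10.0, 12.0, 0.0])      # the rookie substitution of zero

print(np.nanmean(with_null))   # 11.0 -- the unknown value is simply excluded
print(np.mean(with_zero))      # 7.33 -- the invented zero drags the mean down
```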

KenB
June 23, 2012 6:30 pm

A most interesting post; it’s way past time for Steve Mosher to engage and demonstrate, and thanks for doing so, please continue. Chiefio, your forensic approach to checking and re-checking is needed, even if only to break down the issues so that they can be understood by all, rather than just serve an ivory tower of convinced individuals who then dictate we should take their groupthink as reality.
Otherwise this sort of data manipulation and declaration, only promotes a monetary search for a new Godlike world temperature better or BEST than all the others “trust us” and in the end what have we got? Problem solved? Fixed? Not likely, but send more money to fix the unfixable, i.e. tax till it hurts!

Venter
June 23, 2012 8:06 pm

Chiefio,
Thanks for an excellent set of replies, laying down clearly what you said, making it easy for everyone to understand. Straightaway on that score, you’ve shown more science, data, facts and humility compared to the “go check for yourselves” or “go study more” bullshit espoused by Mosher.
And lastly, thanks for showing that in typical climate science fashion, Mosher set up a strawman and pretended to demolish it, completely missing the gist of what your initial post said.
I work in the healthcare field, handling clinical trials being one of my responsibilities. If I use or present data like this GHCN data to prove claims and get FDA approvals in healthcare, I’ll be up before the beak in half a jiffy, charged with data manipulation and fraud.
And we have “ex-Phurts” here who are smarter than everybody else by their own self-conferred status, defending such data. It’s a joke. These “ex-phurts” would be turfed out of the gate in any professional industry that pays people to be efficient.

David Falkner
June 23, 2012 8:35 pm

I seem to remember a graph in Berkeley’s original release that showed many stations in CONUS cooling and many warming. Apparently, averaging them together gives a warming signal. But does that really reflect reality?

David Falkner
June 23, 2012 8:45 pm

Mosher & Stokes,
Comparing trends does lend some semblance of proof, but trends can match for different reasons. GISS and GHCN can match trends for entirely different reasons. Does that make them right?

E.M.Smith
Editor
June 23, 2012 9:25 pm

@JT:
I computed all the anomalies, put them as a csv file, and loaded it into OpenOffice as a spreadsheet.
All that is fine.
Then I asked it to make a chart, and it crashes.
It would seem that a 24 MB file is too big to chart 😉
I can do a graph of the AVERAGE anomalies in any given year, but the whole data set as anomalies (or as temperatures) is just too big for OO.

E.M.Smith
Editor
June 23, 2012 10:40 pm

@Ferd Berple:
As a side note, one of my college dorm roomies was named Fred. We all called him Ferd (which I think he applied to himself). I have fond associations with “Ferd”… 😉
The explanation given for the infilling behaviour is a paper done by Hansen, IIRC, that justifies the Reference Station Method. I’ve read the paper. It basically tests a limited set of stations in a short period of time and shows that a reasonable prediction of one can be made from another up to 1200 km away.
This is then used as justification for filling in any missing temperature from ANY thermometer up to 1200 km regardless of relationship changes over time. AND doing it recursively.
So if the comparison period is when the PDO is in the cold phase, cold phase relationships can be used to fill in data when we are in the hot phase. (And vice versa).
Now think about that a minute. During the warm phase we had a very flat jet stream. West coast and East coast both neutral warm. Now in the cold phase, the west coast is quite cold, and the east coast is having tropical air pulled up over it…. Yet the RSM says you can use the former relationship to fill in missing data during the later relationship. Think that has opportunities for “fill in” and “homogenizing” and “UHI Correction” (all based on RSM) to enable “dropouts” to have “selective influence”?
The other one I like is that here in California, we have the interesting inverse relationship between San Francisco and the Central Valley during the summer. When the Central Valley gets very hot, air rises, and pulls cooling fog over S.F. During the winter, the cold just flows over everywhere, but SF is usually warmer than inland via water moderation. So cold: SF warmer. Hot: SF colder.
Now the average of that activity can be used to fill in missing data. Even though either regime might or might not be present at any given time. (Often it’s a 3 day oscillator during the summer. One HOT day, then the air starts to move. Day two still hot, but with a breeze. Day three, cooling air arrives. (Day four it kind of halts and then heat starts to build again…) )
So you get a value that is “the average” but during non-average times…
Then that method gets used recursively. First for infilling. Then for UHI “adjustment”. Then for “grid / box” filling in so you can make anomalies… No paper ever justifies serial use. It is not a peer reviewed behaviour…
IMHO, the RSM needs to be subjected to validation testing in multiple geographies (where I suspect it will fail in some) and in multiple long duration regime change times (so hot vs cold PDO or AMO or Indian Ocean Dipole or AO or …) and as a recursive use; and shown to be invalid in those cases. That would invalidate most of the ‘climate codes’ IMHO. and likely the GHCN now too.
@Paul in Sweden:
You are most welcome. I’d had other plans for what I was going to do today, so it’s nice to know that changing them was of benefit…
Early in life I was the “Night Auditor” at a hotel. I had to close the books each night. If the tabulating machine was off by 1 penny from the books, I could not close. (On one occasion I found a discarded ledger card in the trash for $26.10 (IIRC). It was the amount by which the cash register was off from the posting machine (tabulator). That was a ‘single queen sized bed room’ then. I’d recognized the value, figured someone blew a posting and didn’t put it in the errata book, and dug through the trash until I found the torn up card evidence.
It is things like that which make me cringe looking at what can be done in “climate science” …
I did computer book keeping for companies, including transitioning systems. Didn’t always have to be ‘to the penny’, but pretty darned close…
Then I was manager of QA for a compiler company. Talk about hard core… Think anyone would be happy with “Well, the math suite has some jitter in the low order half of the digits, but the mean result calculated is close”… Or “Well, when we use the Float package it does OK, but using the integer package doesn’t work as well, so don’t use it on that data. Didn’t you get the memo?…” (Yes, I bite my tongue a lot around ‘climate scientists’…)
I’ve also done “Qualified Installations” for a drug company for FDA compliance. I can state with absolute certainty that the data processing and archival process applied to the GHCN would FAIL the FDA requirements for even the most trivial drug approval. Even a new aspirin coating.
Maybe I’m just expecting too much… ( Then I remind myself folks are wanting to play “Bet The Global Economy” based on it …)
@Ferd Berple:
Hadn’t thought about it that way… Yes, NULL is your friend. (Must have been spending too much time in GHCN land where -9999 is the missing data flag 😉
The basic problem, IMHO, began back about 1970 when Hansen first started trying to do the whole GIStemp thing. At that point, I think they realized that the spacial coverage was just crap and the data had so many dropouts that it was useless. Then they had to confront the data quality issues and the horrible precision in the recording of the data.
If you look at the history of the “science” involved (and read the code with an eye to era of FORTRAN to date it 😉 You can see the “fixes” being layered on… At the end, they got a number, but were foolish enough to believe it…
So First Differences does this “reset” on a data drop out. There are so many dropouts that lots of places just make crap results on a FD run. (That’s WHY I made the change I did to span dropouts. The thesis being that “even if I’m missing 3 January values in a row, if THIS January is 1/2 C lower, it’s lower, and that is a usable fact.” The other folks ran off to this “baseline” method averaging a couple of decade values and then using that average to stand in for ‘normal’. All prior to having anomalies, so ‘has issues’ in terms of intrinsic properties…
Then the RSM gets invented to try and “fill in” some of the missing bits. Eventually when they realize the geographic coverage is crap, they use RSM to smear what thermometers they do have over 1200 km away into other grid boxes. Now they can do anomalies for each grid box. Never mind that most of them are entirely void of any actual data.
So in my approach, I deliberately avoid all of those behaviours. A thermometer is ONLY compared to itself. Dropouts are seamlessly bridged without making up any values at all and with preservation of the actual changes measured over time. NO averaging of a temperature is ever done (only anomalies are averaged). The data are never stretched into empty “grids”. Instead I can take a “cut” of the data based on the Country Code (first 3 digits of the StationID), or the Region (continent, the first digit of Country Code), or even parts of the WMO number if desired (the actual station ID, the later digits of the StationID). This lets you say “Give me the data you have for a place that has data”, but doesn’t try to change where it covers. Basically, every step I saw that smelled like a ‘kludge’ to me in GIStemp, I replaced with something that was not a kludge. (Except that I left in the “splice artifact” character that comes from blending records in the final report step and I left in the FD tendency to be sensitive to the first data item in a series – for reasons I’ve explained above, wanting to measure the splice risk).
Then things get a bit speculative. I would speculate that during this process, folks noticed that by shifting things around they could select the result. It isn’t a large leap from “No matter what I do, the result is sensitive to how I select stations” to “Gee, I can select stations to get the result I want.” No, no proof. But the pattern of data dropouts, station dropouts, and the fact that we know of three nations where the local Met Office says “Hang on, use ALL the data and we get a cooler result” does look mighty suspicious…
Oh, and on Steven’s assertion that the StationIDs all changed… if he looks a bit closer he may find that the Country Code part is all different, but the Region is the same, and that the WMO number mostly is the same modulo an added ‘zero’ in a sub-station field and the lack of a Duplicate Number. Yes, some have changed WMO numbers, but not many. More distressing is that some with the same WMO number have different LAT and LONG values. So that matching on LAT/LONG may not be so perfect either… I didn’t mention it before because it isn’t a very big deal. But my first ‘cut’ at this had me spending about a week trying to build a StationID map (that’s about 3/4 done) and noticing all the patterns. I looked at matching on LAT/LONG and it still has some errors in the match. That’s why I don’t even try to match on StationID, just let you select subsets by range. (So 5 gets you Oceania – Pacific Islands. 513 in v1 or 501 in v3 gets you Australia. You can go to things like ^501[1-4] to get a particular subset of WMO numbers (that tend strongly to be grouped by geography on the first digit or two). I suspect I know way more about the structure of StationID than any person ought 😉
This technique lets you grab collections of stations from “areas of interest” and compare “group to group”. So ^513[1-2] from v1 will tend to get stations from the same sub-geography of Australia in v1 as ^501[1-2] will get from v3. It is a nice little technique for comparing, say “Stations around Tahiti” to “Tahiti”. Very very useful for inspecting things like “IF I take out this one group of stations with a load of thermometer change, what does the rest of the region look like? It isn’t a problem, it is a giant advantage and avoids the grid / box trap.
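A minimal sketch of that kind of prefix selection (illustrative Python, not my actual scripts; the station IDs are hypothetical):

```python
import re

def select_stations(station_ids, pattern):
    """Return the station IDs whose leading digits match the pattern,
    e.g. '^501[1-2]' for a sub-geography of Australia in v3."""
    rx = re.compile(pattern)
    return [sid for sid in station_ids if rx.match(sid)]

# Hypothetical IDs: first digit = Region, first three = Country Code, the rest WMO etc.
ids = ["50112345000", "50134567000", "50198765000", "60212345000"]
print(select_stations(ids, "^501[1-3]"))   # keeps the first two entries
```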
@KenB:
Thanks! Sometimes I wonder… Like, I figure Steven must “have a life” and is likely having a nice Saturday Night out. Me? I’m trying to put a plot line through 24 MB of data points :-0
@David Falkner:
Verity Jones at Digging In The Clay has a very nice set of postings showing warming vs cooling stations and changes over time. She and TonyB even have a nice data base interface on line last time I looked. WELL worth time spent there:
http://diggingintheclay.wordpress.com/
Then click on the “sister website” graph on the right hand side for the database plotting tool.
So we have stations warming and stations cooling and different regions going in different directions. And an average of all that means???? Yeah, me too 😉
@All:
I may check in again a bit later, but I’m more likely to go to bed soon. It’s been a long day…

Carrick
June 23, 2012 10:40 pm

Willis:

Heck, contrary to your usual practice, you even actually answered a few questions

Serious question here:
How many questions are they required to answer? I don’t expect d****bags deserve any answer at all, for example.
As for Smith, he wears his bias on his sleeve; some independent statistical analysis that will turn out to be (NOT), and whatever else, is unsurprisingly wrong (bias does that to you, it makes very smart people stupid and prone to confirmation bias).
OTOH, people like phi and Pamela Gray need to quit leaning on other people and do their own homework, that’s a fact, especially if they are going to frequently comment on certain topics. When phi argued with Mosher, one of the authors of BEST’s code, over BEST’s capabilities, that was the funniest moment on the thread for me. Right up there with phi claiming that tree rings make better thermometers than real ones. There’s a difference between skepticism and boneheadedness, enough said there.
So this is meant as a serious question, Willis? How clueless does a person need to be before we are allowed to blow them off?
I have a very low admitted threshold here, hence I don’t do front page blogging. Not at this point in my life, not while working up to 80 hours a week on my own, very engaging and satisfying, research. Noobs just don’t interest me much, especially when they are chock-full of their own “answers” already.

E.M.Smith
Editor
June 23, 2012 11:04 pm

@Venter:
Just noticed your comment.
Thanks! I try… Compulsive Service Personality Disorder 😉
I’ve had a long time goal of trying to speak clearly and generically about complex things. In my opinion, anyone can “get it” about complex and technical things if they are described in clear terms. I’ve not found any concept that was so abstract that it required jargon. Jargon can be faster, and I’ve used it sometimes when needed for speed or precision; but generally just thinking clearly about things for a few minutes can come up with something more understandable.
I’m also modestly intolerant of snobs and folks who like to play “Gotcha!”… Which kind of makes me not want to be like them 😉
So you’ve had experience with the FDA, eh? Painful, huh! 😎 It amazes me how many fields have standards that are just incredibly higher than “climate science”, yet the practitioners of it seem to think we are being petty for expecting things like, oh, a Golden Master date on a key data file, or revision control, or archives with revisions, or benchmark suites or regression testing suites or… All just standard SOP in most fields.
But don’t be too hard on Mosher. He’s been hanging out with Warmers and I think it has slowly been reducing his ‘lukewarmer’ independent thinking. He’s just become convinced that “If you just do it exactly the way THEY do it, it works just like they say!”…
His approach to doing the “testing” / critique would be valid if my goal was to make a One True Global Average Temperature like everyone else. It would be valid if I was doing “by station” compares. IMHO, all that happened was he ran to ‘defense’ assuming I was “doing what everyone does” before actually looking at what I said, what I do, and why I’m doing it. But that takes time. I’d guess about 2 days for someone with his skill level (if he already knows FORTRAN it would help) and I think he just didn’t want to put in that time. It is a common thing for folks to do when they are very close to an issue and someone comes at it from a new direction.
That he had trouble understanding my description of handling dropouts tells me he was distracted or not putting much time into it. How hard is it, really, to understand “On a drop out, do nothing and proceed to the next valid data item.”? Just ‘span the gap’ 😉 But it was ‘different’, so ‘not what everyone else does’, so ‘an issue’… I figure he was just up late making the graphs and writing code for this posting…

June 24, 2012 1:26 am

I reckon this issue is now ripe for the lads at Climate Audit.
Thank you Steve and Zeke for making us all sit up and think these issues through more precisely.
Like showing the HS is an unavoidable statistical result of the Team’s method of selection, I suspect CA would show that the warming is an unavoidable statistical result of dropping stations etc. But I am open to disproof.
Carrick?