E.M. Smith over at the blog Musings from the Chiefio earlier this month posted an analysis comparing versions 1 and 3 of the GHCN (Global Historical Climate Network) data set. WUWT readers may remember a discussion about GHCN version 3 here. He described why the GHCN data set is important:
There are folks who will assert that there are several sets of data, each independent and each showing the same thing, warming on the order of 1/2 C to 1 C. The Hadley CRUtemp, NASA GIStemp, and NCDC. Yet each of these is, in reality, a ‘variation on a theme’ in the processing done to the single global data set, the GHCN. If that data has an inherent bias in it, by accident or by design, that bias will be reflected in each of the products that do variations on how to adjust that data for various things like population growth ( UHI or Urban Heat Island effect) or for the frequent loss of data in some areas (or loss of whole masses of thermometer records, sometimes the majority all at once).
He goes on to discuss the relative lack of methodological analysis and discussion of the data set and the socio-economic consequences of relying on it. He then poses an interesting question:
What if “the story” of Global Warming were in fact, just that? A story? Based on a set of data that are not “fit for purpose” and simply, despite the best efforts possible, can not be “cleaned up enough” to remove shifts of trend and “warming” from data set changes, of a size sufficient to account for all “Global Warming”; yet known not to be caused by Carbon Dioxide, but rather by the way in which the data are gathered and tabulated?…
…Suppose there were a simple way to view a historical change of the data that is of the same scale as the reputed “Global Warming” but was clearly caused simply by changes of processing of that data.
Suppose this were demonstrable for the GHCN data on which all of NCDC, GISS with GIStemp, and Hadley CRU with HadCRUT depend? Suppose the nature of the change were such that it is highly likely to escape complete removal in the kinds of processing done by those temperature series processing programs?….
He then discusses how to examine the question:
…we will look at how the data change between Version 1 and Version 3 by using the same method on both sets of data. As the Version 1 data end in 1990, the Version 3 data will also be truncated at that point in time. In this way we will be looking at the same period of time, for the same GHCN data set. Just two different versions with somewhat different thermometer records being in and out, of each. Basically, these are supposedly the same places and the same history, so any changes are a result of the thermometer selection done on the set and the differences in how the data were processed or adjusted. The expectation would be that they ought to show fairly similar trends of warming or cooling for any given place. To the extent the two sets diverge, it argues for data processing being the factor we are measuring, not real changes in the global climate..The method used is a variation on a Peer Reviewed method called “First Differences”…
…The code I used to make these audit graphs avoid making splice artifacts in the creation of the “anomaly records” for each thermometer history. Any given thermometer is compared only to itself, so there is little opportunity for a splice artifact in making the anomalies. It then averages those anomalies together for variable sized regions….
What Is Found
What is found is a degree of “shift” of the input data of roughly the same order of scale as the reputed Global Warming.
The inevitable conclusion of this is that we are depending on the various climate codes to be nearly 100% perfect in removing this warming shift, of being insensitive to it, for the assertions about global warming to be real.
Simple changes of composition of the GHCN data set between Version 1 and Version 3 can account for the observed “Global Warming”; and the assertion that those biases in the adjustments are valid, or are adequately removed via the various codes are just that: Assertions….
Smith then walks the reader through a series of comparisons, both global and regional and comes to the conclusion:
Looking at the GHCN data set as it stands today, I’d hold it “not fit for purpose” even just for forecasting crop planting weather. I certainly would not play “Bet The Economy” on it. I also would not bet my reputation and my career on the infallibility of a handful of Global Warming researchers whose income depends on finding global warming; and on a similar handful of computer programmers who’s code has not been benchmarked nor subjected to a validation suite. If we can do it for a new aspirin, can’t we do it for the U.S. Economy writ large?
The article is somewhat technical but well worth the read and can be found here.
h/t to commenters aashfield, Ian W, and rilfeld