A new must-read paper: McKitrick on GHCN and the quality of climate data

This new paper by Dr. Ross McKitrick of the University of Guelph is a comprehensive review of the GHCN surface and sea temperature data set. Unlike many papers (such as the phytoplankton paper in Nature), complete code is made available right from the start, and the data is freely available.

There is a lot here that goes hand in hand with what we have been saying on WUWT and other climate science blogs for months, and this is just a preview of the entire paper. The graph below caught my eye, because it tells one part of the GHCN story well.

Figure 1-7: GHCN mean latitude of monitoring stations. Data are grouped by latitude band and the bands are weighted by geographical area. Data source: GHCN. See Appendix for calculation details.

1.2.3. Growing bias toward lower latitudes

The decline in sample has not been spatially uniform. GHCN has progressively lost more and more high latitude sites (i.e. towards the poles) in favour of lower-latitude sites. Other things being equal, this implies less and less data are drawn from remote, cold regions and more from inhabited, warmer regions. As shown in Figure 1-7, mean latitude declined as more stations were added during the 20th century.
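For readers curious how a chart like Figure 1-7 can be built, here is a minimal sketch of an area-weighted mean-latitude calculation over a station inventory. It only illustrates the approach described in the caption, under assumed column names and a 30-degree band width of my choosing; it is not McKitrick’s actual code (his Appendix has the real calculation details).

```python
import numpy as np
import pandas as pd

# Hypothetical station inventory: one row per station per reporting year.
# Column names are illustrative only, not the real GHCN inventory format.
stations = pd.DataFrame({
    "year":     [1950, 1950, 1950, 1990, 1990],
    "latitude": [64.8, 45.0, 10.2, 44.9,  9.8],
})

def area_weighted_mean_latitude(df, band_width=30.0):
    """Group stations into latitude bands, then weight each occupied band by
    the fraction of the Earth's surface it covers (difference of sines of
    its bounding latitudes) and average the bands' mean latitudes."""
    edges = np.arange(-90.0, 90.0 + band_width, band_width)
    df = df.assign(band=pd.cut(df["latitude"], bins=edges))

    out = {}
    for year, grp in df.groupby("year"):
        band_means = grp.groupby("band", observed=True)["latitude"].mean()
        weights = [np.sin(np.radians(b.right)) - np.sin(np.radians(b.left))
                   for b in band_means.index]
        out[year] = np.average(band_means.values, weights=weights)
    return pd.Series(out)

print(area_weighted_mean_latitude(stations))
```

The area weighting matters because a degree of latitude covers far more surface area near the equator than near the poles, so a simple unweighted station average would overstate the influence of high-latitude sites.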

Here’s another interesting paragraph:

2.4. Conclusion re. dependence on GHCN

All three major gridded global temperature anomaly products rely exclusively or nearly exclusively on the GHCN archive. Several conclusions follow.

  • They are not independent as regards their input data.
  • Only if their data processing methods are fundamentally independent can the three series be considered to have any independence at all. Section 4 will show that the data processing methods do not appear to change the end results by much, given the input data.
  • Problems with GHCN, such as sampling discontinuities and contamination from urbanization and other forms of land use change, will therefore affect CRU, GISS, and NOAA. Decreasing quality of GHCN data over time implies decreasing quality of CRU, GISS and NOAA data products, and increased reliance on estimated adjustments to rectify climate observations.

From the summary: The quality of data over land, namely the raw temperature data in GHCN, depends on the validity of adjustments for known problems due to urbanization and land-use change. The adequacy of these adjustments has been tested in three different ways, with two of the three finding evidence that they do not suffice to remove warming biases.

The overall conclusion of this report is that there are serious quality problems in the surface temperature data sets that call into question whether the global temperature history, especially over land, can be considered both continuous and precise. Users should be aware of these limitations, especially in policy sensitive applications.

Read the entire preview paper here (PDF); it is well worth your time.

h/t to E.M. Smith

164 Comments
David Ball
August 3, 2010 7:42 am

Readers of WUWT have been familiar with a lot of this information for quite a while now. Thanks to Ross McKitrick for a clear, concise academic paper on the subject. One should always examine the data source for problems before jumping to the erroneous conclusion that the planet is catastrophically warming, unless it is catastrophic warming that you want to show. Anthony, your instincts on this have been correct. E.M. Smith’s as well. Now on to the public, if we can be heard above the alarmists’ shouting and an agenda-driven media. A herculean task.

theduke
August 3, 2010 7:43 am

I haven’t read the preview paper (will later), but the point about lack of continuity has always bothered me about surface temperature data. How does one realistically compare temperature readings in London in 1890 vis-à-vis 2010? There must be a huge number of adjustments that would have to be made, in both up and down directions, and how does one ever know that all the required adjustments have been made? Human changes to the environment are dramatic over a hundred-plus-year span, and that includes the outlying areas. I can see no way to properly compute and/or account accurately for these changes.
I suppose I’m stating the obvious, but every time I see the inevitable headlines recounting how the present decade is the warmest ever, I laugh at the certitude and the arrogance of so-called “climate science.”

David Ball
August 3, 2010 7:48 am

John Finn says:
August 3, 2010 at 6:36 am
You haven’t been paying attention in class, have you?

Pascvaks
August 3, 2010 7:53 am

For those doing climate studies and environmental assessments, McKitrick is saying: “Stop! Look! Listen! Never Assume Anything!” And for everyone else, he is saying the same thing. For some strange reason mankind has been assuming more and more, indeed, so much more than ever before. Not only is Our Giant Civilization of Cards growing by leaps and bounds, but the cards are getting thinner and thinner; some of them we can even see through.

MattN
August 3, 2010 7:54 am

Page 16, Figure 1-10: Wow….

Bill Illis
August 3, 2010 7:54 am

Great paper and resource. At one point, Dr. McKitrick is referencing the anticipated changes in HadSST2 and ocean SSTs to take into account the 1946 blip.
There is a draft paper in preparation which is already being cited – [Reassessing Biases and Other Uncertainties in Sea Surface Temperature Observations since 1850, 2009, Kennedy, J.J. et al – in preparation]. I’m not sure of the status of this paper but it seems that Climategate and Tom Wigley’s email has delayed the effort.
The draft HadSST3 changes are outlined/referenced in the following paper which is itself not peer-reviewed I believe (but it already has 8 citations) and is produced by the Met Office Hadley Centre, the WMO and a long list of other ocean SST experts. Two versions available.
https://abstracts.congrex.com/scripts/jmevent/abstracts/FCXNL-09A02a-1662927-1-Rayneretal_OceanObs09_draft4.pdf
http://rainbow.ldeo.columbia.edu/~alexeyk/Papers/Rayner_etal2009ip.pdf
The new HadSST3 would reduce the pre-1941 data by -0.1C, increase the post-1946 data by +0.2C and increase the post-2001 data by about +0.1C (the post-2001 increase is hard to explain unless …). This is shown in the following chart – top panel – HadSST2 is red – proposed HadSST3 is green.
There are two versions produced in the two different versions of the paper but I think the second one below just artificially reduced the base period so that the changes are more/less? visible. [The base period is supposed to be 1961 to 1990 so the second chart below is too low].
http://img16.imageshack.us/img16/1205/hadsst3.png
http://a.imageshack.us/img832/3174/newesthadsst3.png
As Dr. McKitrick expected, the new line would be mostly flat from 1940 to 1990.

August 3, 2010 7:57 am

Having pointed out the problems in an authoritative and comprehensive way, it remains to be seen whether something is done about it. Therein lies the rub. I predict far more effort will be put into defense than corrective action.

Roddy Campbell
August 3, 2010 8:19 am

I’m disappointed it’s being published by the GWPF. Will lessen credibility and impact I fear.

Steven mosher
August 3, 2010 8:23 am

Ditto on Carrick’s comments.
Ross’s comments on the changes in sampling are but a first step.
Nobody who uses GHCN data uses all 7280 stations. So for starters you can’t look at the entire sample of GHCN and draw any substantive conclusion. For example, Zeke and I take in all 7280 stations and then we do a preliminary screen. The first screen is to reduce the total number of stations to those that have at least 15 years of full data within the 1961-1990 period. That screen drops a couple of thousand stations, so of the original 7280 stations you end up using about 4900 of them. So you really have to study THAT distribution and how it changes over time.
(That’s pretty easy; I can probably whip something up, or Ron B can.)
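[For illustration only, a minimal sketch of that kind of baseline screen, assuming monthly records are already in a pandas table with columns station_id, year, month and temp_c. The layout and names are assumptions, not the actual Zeke/Mosher code.]

```python
import pandas as pd

def baseline_screen(monthly, start=1961, end=1990, min_full_years=15):
    """Keep only stations with at least `min_full_years` complete years
    (all 12 months reported) inside the baseline period."""
    base = monthly[(monthly["year"] >= start) & (monthly["year"] <= end)]
    # Months reported per station-year, then complete years per station.
    months_per_year = base.groupby(["station_id", "year"])["temp_c"].count()
    full_years = (months_per_year == 12).groupby("station_id").sum()
    keep = full_years[full_years >= min_full_years].index
    return monthly[monthly["station_id"].isin(keep)]

# Usage (hypothetical): screened = baseline_screen(raw_ghcn_monthly)
```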
WRT altitude: that’s a non-issue. When you transform to anomalies you are correcting for this effect. It’s NOT the temperature at a station, it’s the CHANGE WRT its mean.
WRT latitude: to test for the sensitivity to station loss I’ve done resampling tests.
Basically each station gets assigned to a grid cell (3 degrees by 3 degrees). When you do that with 4900 stations you get some cells with one station and other cells with as many as 36. For example, you might get 36 stations in the cell 120-123, 60-57 (lon/lat). What I did was randomly pick only one station per cell, and did that repeatedly. The result? No difference. Basically the same answer comes up if you use fewer stations. Again, what you have to look at is the distribution of cells covered, not stations. And to further Carrick’s point, losing coverage at high latitude (per cell) DECREASES the trend. High latitude is COLDER in temperature, but the CHANGE in anomaly is higher. So, for example, if the equator changes at zero degrees per century, the poles will be increasing. Reducing your sample at high latitudes depresses the trend. To see this you merely have to look at the long-record trends by latitude. The highest-trending cells are poleward.
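[Again purely as an illustration, a minimal sketch of the one-station-per-cell resampling test described above, assuming the station series have already been converted to anomalies. The 3° × 3° binning and the random draw mirror the description; the data layout, column names and trend fit are my assumptions, not the actual code used.]

```python
import numpy as np
import pandas as pd

def assign_cell(meta, cell_deg=3.0):
    """Tag each station with a 3x3 degree grid-cell id built from lon/lat."""
    meta = meta.copy()
    meta["cell"] = (np.floor(meta["lon"] / cell_deg).astype(int).astype(str) + "_" +
                    np.floor(meta["lat"] / cell_deg).astype(int).astype(str))
    return meta

def one_station_per_cell_trends(anoms, meta, n_draws=100, seed=0):
    """Repeatedly keep one randomly chosen station per grid cell, average the
    kept stations' anomalies by year, and fit a linear trend for each draw.
    `anoms`: columns station_id, year, anomaly.  `meta`: station_id, lat, lon."""
    rng = np.random.default_rng(seed)
    meta = assign_cell(meta)
    trends = []
    for _ in range(n_draws):
        picked = meta.groupby("cell")["station_id"].apply(
            lambda s: s.iloc[rng.integers(len(s))])
        annual = (anoms[anoms["station_id"].isin(picked.values)]
                  .groupby("year")["anomaly"].mean())
        trends.append(np.polyfit(annual.index.values, annual.values, 1)[0])
    return np.array(trends)

# Usage (hypothetical):
# trends = one_station_per_cell_trends(station_anoms, station_meta)
# print(trends.mean(), trends.std())
```

The spread of the resampled trends gives a rough sense of how sensitive the result is to which station in each cell happens to report.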
WRT GHCN adjustments: I don’t use the GHCN adjustments. The file in question has a few minor issues that Zeke has noted, and some other issues that I’ve identified but haven’t made public.
Also, the metadata in the GHCN inventory is stale. Ron’s got some updates for that.
Anyway, maybe if I get some time I’ll update in a post. New code drop as well.

James P
August 3, 2010 8:26 am

Dr Lurtz
I think the short answer is that satellites don’t measure temperatures at ground level. Indeed, exactly where they are measuring temperature is something of a variable, as it depends on wavelengths, cloud cover, stratification and so on. The upshot is that a properly sited thermometer on the ground is still the standard reference, or would be if climatologists didn’t keep trying to compensate for the ones that ended up in car parks, aircraft runways or next to air-con outlets…

Steven mosher
August 3, 2010 8:27 am

Ditto on Nick Stokes.
GHCN Adjusted is NOT used by CRU or GISS. Given the little quirks I’ve found in the file I wouldn’t use it (records failing to match, duplicate records, no clear provenance for the adjustments).

Carrick
August 3, 2010 8:34 am

Kevin:

I believe most people here understand what you say, but the point is also that larger portions of the Earth, especially those considered to be most important to the detection of warming, are making do with fewer actual measurements. This does make the results more dependent on errors in measurement and adjustment, wouldn’t you say?

I agree with you, and actually this is my biggest gripe with the temperature reconstructions: very few of them make any attempt at all to generate realistic error bars. This is experimental data, and the mean (and metrics derived from it) is meaningless without a statement of the uncertainty in the measurement. Of course the opposite is also true: skeptics are wont to point out innumerable warts in the surface data sets without ever sitting down and demonstrating whether they would amount to a hill of beans, numerically.
Fewer “actual measurements” may increase the error bars somewhat (though not by much; as you note, I have included error bars in my figure), but it wouldn’t explain a large systematic effect with latitude. If you think it does, the onus is on you as the critic to demonstrate how a sparsity of stations could explain this temperature trend.
In any case, you need to consider that a large land-surface latitudinal effect is expected from any sort of warming (at least one component of this is glacial retreat, which amplifies the warming through the reduced high-latitude albedo). So we have in this case, agreement of data with basic physics.
If you wanted to argue that the data don’t support an increasing trend with latitude, you have some pretty steep stairs to climb. Simply suggesting that “it might matter” doesn’t hold any weight. I’m out of pocket the rest of the day, so please don’t get offended if I can’t respond to any comments.
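[As one hedged illustration of the kind of error bar Carrick is asking for above: a per-year standard error of the mean across grid-cell anomalies, which captures sampling spread but not systematic measurement or adjustment error. The data layout is assumed; this is not Carrick’s actual calculation.]

```python
import numpy as np
import pandas as pd

def yearly_mean_with_se(cell_anoms):
    """Global yearly mean anomaly with a naive standard error across grid cells.
    `cell_anoms`: columns year, cell, anomaly (one cell-average anomaly per row).
    Captures only sampling spread, not systematic biases or adjustment error."""
    g = cell_anoms.groupby("year")["anomaly"]
    return pd.DataFrame({
        "mean": g.mean(),
        "se": g.std(ddof=1) / np.sqrt(g.count()),
    })

# Usage (hypothetical): summary = yearly_mean_with_se(gridded_anomalies)
```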

Jason Calley
August 3, 2010 8:35 am

Possible typo in the paper:
Page 36 mentions an error of 0.006 degrees per decade, page 37 mentions 0.006 degrees per century. Two different references, so they may very well both be accurate reporting, but this is the sort of thing that might easily be a typo.

August 3, 2010 8:37 am

Nick Stokes says:
August 3, 2010 at 7:35 am

JamesS says: August 3, 2010 at 6:27 am
“I wish someone would explain why “adjustments for known problems due to urbanization and land-use change” are made to the raw data from a station. If the point of the exercise is to measure the temperature at a given spot, then why would one change the values based on the above parameters?”

GHCN publishes unadjusted figures, and that’s what the main indices use (though they may themselves adjust). But you’re right – adjusted figures do not give a better result for a particular location. The thing is, when you use temperatures to compile a global estimate, then the station temp is taken to be representative of an area (or more correctly, its anomaly is representative). So the corrections are to respond to known issues which would make it less representative.

I’ve been in the software and database development field for 27 years, so I know a little bit about data and analyzing same. Perhaps the problem here is a more basic one than climate scientists will admit: there isn’t enough data to derive a global average temp.
Where’s the shame in admitting as much? “Well, we’d like to come up with an average global temperature, but we just don’t have enough stations in enough places to really do that, so until we do, we’ll just publish data for the regions we do know about.” Even 250-km smoothing is too much when one considers that temperatures can vary from, say, Frederick, Maryland, USA to Annapolis, Maryland, USA, by three or four degrees C on any given day.
Climate science has fallen in love with crunching numbers and has lost sight of what those numbers are supposed to represent: reality. If you’ve only got five or six stations apiece in Siberia/Africa/South America/wherever, there is no way you know — or can derive — a decent average temp. You’d just be making up the numbers.

David Ball
August 3, 2010 8:40 am

Roddy Campbell says:
August 3, 2010 at 8:19 am
I’m disappointed it’s being published by the GWPF. Will lessen credibility and impact I fear.
I will say again, where the f**k else are they allowed to publish if they do not toe the party line?!?! The skeptics have been handcuffed, so acquire a series of linked vertebrae and promote the paper as though it had been published in a “respectable” (using the term very loosely) journal. C’mon people!!

Vorlath
August 3, 2010 8:50 am

I’m going through the paper and it is extremely well written.
The airport percentage graph is very striking. It’s amazing how much the trend matches the warming trend. Not saying this is where the warming comes from or anything. I’m just saying it’s amazing how often the selection, monitoring and reporting of temperature (but NOT the temperature itself) seems to match the warming trend (temperature). After a while, one has to wonder if this is just coincidence or if there is actually correlation and causation.

a dood
August 3, 2010 9:03 am

I’m confused by Figure 1-6: Percent GHCN stations located at airports, 1890-2009, page 12. It’s showing that in 1890, 25% of temperature data came from airports. I’m just wondering… why there were airports in 1890, 13 years before airplanes were invented?

August 3, 2010 9:20 am

Those who say that altitude doesn’t matter because the anomaly takes care of it are dead wrong; they miss the point entirely. What is that point? Simple: according to the models, the mountains (higher altitude) are supposed to warm at a faster rate than low-lying areas (lower altitude) [Snyder 2002].
If the mountains do not warm at a rate higher than the valleys, then the models are wrong. In repeated studies it has been found that the mountains do not warm at a rate faster than the valleys; as a matter of fact, the valleys warm faster than the mountains. The most famous (IMO) paper on this is the Christy et al. 2006 paper published in the Journal of Climate (http://www.openmarket.org/wp-content/uploads/2009/08/2006_christynrg_ca.pdf).
Interestingly enough, this paper came out in 2006, and the 4 stations in the GHCN dataset that are also in the Christy paper all of a sudden have no data after two months into 2006, but at the same time still have data in the USHCN dataset up into 2009. What’s the excuse this time, that NCDC couldn’t give the report to themselves?
So yes, altitude does matter, and the anomaly doesn’t save you, because the trends they show are wrong according to the models.
Here is the list of papers that Dr. Christy references in a presentation on this in 2010
(4:17 mark http://www.youtube.com/watch?v=UcGgLoPpbBw )
Christy 2002; Christy et al 2006, 2007, 2009; Pielke Sr. et al 2008 and Walters 2007
As a matter of fact, these studies show that TMean is a poor representation of how much “warming” may or may not be due to CO2:

As a culmination of several papers and years of work, Christy et al. 2009 demonstrates that popular surface datasets overstate the warming that is assumed to be greenhouse related for two reasons. First, these datasets use only stations that are electronically (i.e. easily) available, which means the unused, vast majority of stations (usually more rural and more representative of actual trends but harder to find) are not included. Secondly, these popular datasets use the daily mean surface temperature (TMean) which is the average of the daytime high (TMax) and nighttime low (TMin). In this study (and its predecessors, Christy 2002, Christy et al. 2006, Pielke Sr. et al. 2008, Walters et al. 2007 and others) we show that TMin is seriously impacted by surface development, and thus its rise is not an indicator of greenhouse gas forcing.

From his 2009 EPA submission http://icecap.us/images/uploads/EPA_ChristyJR_Response_2.pdf
So what do you do when data goes against your narrative?
Well, as shown by climatologists in paleoclimate studies, we chop it off so it doesn’t cause policymakers to ask awkward questions. In the surface data sets we stop using the high-altitude ones.

James Sexton
August 3, 2010 9:24 am

Dr. Lurtz says:
August 3, 2010 at 6:27 am
Please help me!!
Didn’t the World spend billions of dollars on new weather satellites in the last decade?
1) Wasn’t one of the purposes of the satellites to measure land and sea temperatures????
2) If those temperature are accurate, why do we need ground (water) based stations???
3) Where are the satellite temperature records?? Are there any??? Who keeps them??
4) If we are so advanced in measurement technology, why don’t we have “complete” global temperature records via space???
WUWT????
I’m not an expert in the Sat. temp readings, but I’ll try to help.
Yes, we’ve spent billions on sat. temp measurements. They appear to be accurate, but I have questions about them. The satellites don’t measure the ground temperatures; the post from James P (August 3, 2010 at 8:26 am) is about as good an explanation as I could give. Currently, RSS-MSU and UAH are the two most prominent sat temp collecting groups that I know of. Their data is updated on a regular basis. The way I get to the raw data from the various climate sources is to first go to the interactive graphs section at the woodfortrees web site. For example, here: http://www.woodfortrees.org/plot/rss . At the bottom of the page, there is a link to the raw data. You’ll find a text file with web addresses at the top of the page; go to them for more detailed analysis of the raw data.
The reason we don’t simply use sat data is because the data are different from thermometer readings on the ground and at sea. It would be an apples-to-oranges comparison. Even if we did use them, we have relatively no historical data from sat temps. We can’t compare a mercury reading from 1900 to a sat reading in 2010. I believe 1979 was the first year we started collecting sat. temp readings, but even then we’d have continuity issues with the way the satellites were calibrated.
At any rate, there have been a few posts relating to a couple of your questions. You should search the past postings here. For additional information, Drs. Spencer and Christy maintain their own websites, http://www.drroyspencer.com/ and http://www.atmos.uah.edu/atmos/christy.html respectively. Both seem very open to communication.
Hope that helps.

James Sexton
August 3, 2010 9:28 am

a dood says:
August 3, 2010 at 9:03 am
“I’m confused by Figure 1-6: Percent GHCN stations located at airports, 1890-2009, page 12. It’s showing that in 1890, 25% of temperature data came from airports. I’m just wondering… why there were airports in 1890, 13 years before airplanes were invented?”
Or maybe they built airports where they were measuring the temps. I’m not sure; hopefully Ross will answer. But it seems entirely plausible that this would be the explanation. Without airplanes and concrete etc., those would be ideal places to measure temps.

ML
August 3, 2010 9:45 am

Vince Whirlwind says:
August 3, 2010 at 1:08 am
You’re right, very interesting:
“Section 4 will show that the data processing methods do not appear to change the end results by much, given the input data. ”
That’s the funniest thing I’ve read for a while. Thank god we have professors of economics to explain science to us.
====================
I was laughing too (reading your post). It is going to be even funnier when, instead of a professor of economics, the plumbers (professionals) start explaining the science to you in very plain language.

James Sexton
August 3, 2010 9:47 am

Heh, love my instincts! For those wondering about airports in 1890, click on the source link. http://chiefio.wordpress.com/2009/12/08/ncdc-ghcn-airports-by-year-by-latitude/ There, you’ll find this explanation:
“..This is a bit hobbled by the primitive data structure of the “station inventory” file. It only stores an “Airstation” flag for the current state. Because of this, any given location that was an open field in 1890 but became an airport in 1970 will show up as an airport in 1890. Basically, any trend to “more airports” is understated. Many of the early “airports” are likely old military army fields that eventually got an airport added in later years.”

Bernie
August 3, 2010 10:21 am

Ross has left an email address if you have found typos, etc. He might find it more helpful than going through all the above comments – as good as they are. 😉

Gail Combs
August 3, 2010 10:25 am

Huth says:
August 3, 2010 at 3:35 am
thank you, Rich Matarese, I was just going to ask. I don’t understand all these abbreviated initial wotsits….
_________________________________________________________
Dr. McKitrick may want to add a glossary. Otherwise use the WUWT glossary (on the tool bar) here:
http://wattsupwiththat.com/glossary/
Otherwise I found the paper very readable, except perhaps the last few pages, which get into more technical detail. Non-scientists using the glossary should not have too much trouble understanding this paper.
Congratulations Dr. McKitrick on a job well done.

Gail Combs
August 3, 2010 10:26 am

Bob(Sceptical Redcoat) says:
August 3, 2010 at 7:39 am
I suspect that all scientists, good and bad, consider themselves to be experts at using computers to analyse data. I also suspect that this is seldom the case. Thus, all scientific papers that rely extensively on computers to manipulate and interpret data should have the analysis reviewed by professional computer experts, such as Professor Ross McKitrick, not by other scientists.
________________________________________________________________
AND a statistician.