Would You Like Your Temperature Data Homogenized, or Pasteurized?

A Smoldering Gun From Nashville, TN

Guest post by Basil Copeland

The hits just keep on coming. About the same time that Willis Eschenbach revealed “The Smoking Gun at Darwin Zero,” the UK’s Met Office released a “subset” of the HadCRUT3 data set used to monitor global temperatures. I grabbed a copy of “the subset” and then began looking for a location near me (I live in central Arkansas) that had a long and generally complete station record that I could compare to a “homogenized” set of data for the same station from the GISTemp data set. I quickly, and more or less randomly, decided to take a closer look at the data for Nashville, TN. In the HadCRUT3 subset, this is “72730” in the folder “72.” A direct link to the homogenized GISTemp data used is here. The first task was transforming the data from rows to columns (see the end of the post for a “bleg” about this).
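For anyone wanting to replicate that step, here is a minimal Python sketch of the row-to-column transformation. The layout it assumes (one row per year starting with a four-digit year, twelve monthly values, and 999.9 as the missing-value flag) is my reading of the file, not a documented spec.

```python
def rows_to_column(path, missing=999.9):
    """Flatten a rows-of-years temperature file into a single
    (year, month, value) column, in time order. Assumes data lines
    start with a 4-digit year followed by 12 monthly temperatures;
    999.9 flags missing values (both are assumptions)."""
    series = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 13 or not parts[0].isdigit():
                continue  # skip header and metadata lines
            year = int(parts[0])
            for month, tok in enumerate(parts[1:13], start=1):
                try:
                    val = float(tok)
                except ValueError:
                    continue
                if val != missing:
                    series.append((year, month, val))
    return series
```

With the data in column form, the first thing I did was plot the differences between the two series: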

[Figure: differences between the GISTemp and HadCRUT3 series for Nashville, TN (click to enlarge)]

The GISTemp homogeneity adjustment looks a little hockey-stickish, and induces an upward trend by reducing older historical temperatures more than recent historical temperatures. This has the effect of turning what is a negative trend in the HadCRUT3 data into a positive trend in the GISTemp version:

[Figure: linear trends in the HadCRUT3 (negative) and GISTemp (positive) versions of the Nashville, TN record (click to enlarge)]

So what appears to be a general cooling trend over the past ~130 years at this location, using the unadjusted HadCRUT3 data, becomes a warming trend when the homogeneity adjustment is applied.

“There is nothing to see here, move along.” I do not buy that. Whether or not the homogeneity adjustment is warranted, it has an effect that calls into question just how much the earth has in fact warmed over the past 120-150 years (the period covered, roughly, by GISTemp and HadCRUT3). There has to be a better, more “robust” way of measuring temperature trends, one that is not so sensitive that homogenization turns negative trends into positive ones (which we’ve now seen happen twice, first with Darwin Zero, and now here with Nashville). I believe there is.

Temperature Data: Pasteurized versus Homogenized

In a recent series of posts, here, here, and with Anthony here, I’ve been promoting a method of analyzing temperature data that reveals the full range of natural climate variability. Metaphorically, this strikes me as trying to make a case for “pasteurizing” the data, rather than “homogenizing” it. In homogenization, the object is to “mix things up” so that it is “the same throughout.” When milk is homogenized, this prevents the cream from rising to the top, thus preventing us from seeing the “natural variability” that is in milk. But with temperature data, I want very much to see the natural variability in the data. And I cannot see that with linear trends fitted through homogenized data. It may be a hokey analogy, but I want my data pasteurized – as clean as it can be – but not homogenized so that I cannot see the true and full range of natural climate variability.

I believe that the only way to truly do this is by analyzing, or studying, how differences in the temperature data vary over time. And they do not simply vary in a constant direction. As everybody knows, temperatures sometimes trend upwards, and at other times downward. Studying how the differences vary over time allows us to see this far more clearly than simply fitting trend lines to undifferenced data. In fact, it can prevent us from reaching the wrong conclusion, as in fitting a positive trend when the real trend has been negative. To demonstrate this, here is a plot of monthly seasonal differences for the GISTemp version of the Nashville, TN data set:

[Figure: monthly seasonal differences and smoothed trend, GISTemp Nashville, TN (click to enlarge)]

Pay close attention as I describe what we’re seeing here. First, “sd” means “seasonal differences” (not “standard deviation”). That is, it is the year-to-year variation in each monthly observation, for example October 2009 compared to October 2008. Next, the “trend” is the result of smoothing with the Hodrick-Prescott filter (lambda = 14,400). The type of smoothing here is not as critical as the decision to smooth the seasonal differences. If a reader prefers a different smoothing algorithm, have at it. Just make sure you apply it to the seasonal differences, and that it does not change the overall mean of the series. That is, the mean of the seasonal differences for GISTemp’s Nashville, TN data set is -0.012647, whether smoothed or not. The smoothing simply helps us to see, a little more clearly, the regularity of warming and cooling trends over time. Now note clearly the sign of the mean seasonal difference: it is negative. Even in the GISTemp series, Nashville, TN has spent more time cooling (imagine periods where the blue line in the chart above is below zero) than it has warming over the last ~130 years.
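For readers who want to reproduce this, here is a minimal sketch of the seasonal differencing and the Hodrick-Prescott smoothing, using the hpfilter routine from statsmodels. The monthly input series is assumed to be in hand already as a single column of floats (e.g. pulled out of the reshaping sketch above).

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def seasonal_differences(values, period=12):
    """Year-over-year change in each monthly observation,
    e.g. October 2009 minus October 2008."""
    x = np.asarray(values, dtype=float)
    return x[period:] - x[:-period]

sd = seasonal_differences(monthly)        # 'monthly' is the station series (assumed)
cycle, trend = hpfilter(sd, lamb=14400)   # lambda = 14,400 for monthly data

# The HP "cycle" component sums to zero, so smoothing preserves the mean:
assert np.isclose(sd.mean(), trend.mean())
print("mean seasonal difference:", round(sd.mean(), 6))
```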

How can that be? Well, the method of analyzing differences is less sensitive, i.e. more “robust,” than fitting trend lines through the undifferenced data. “Step” type adjustments of the kind we see in homogeneity adjustments affect only a single data point in the differenced series, but affect every data point (before or after the step) in the undifferenced series.
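A toy demonstration of the point, using first differences for brevity (the same logic holds for 12-month seasonal differences):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=240)        # 20 years of synthetic monthly anomalies
adj = x.copy()
adj[120:] += 0.5                # a one-time "homogeneity" step adjustment

d_x, d_adj = np.diff(x), np.diff(adj)
print(np.sum(~np.isclose(adj, x)))              # 120 points move in the levels
print(np.flatnonzero(~np.isclose(d_adj, d_x)))  # [119]: one point in the diffs
```

We can see the effect of the GISTemp homogeneity adjustments here by comparing the previous figure with the following: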

[Figure: monthly seasonal differences and smoothed trend, HadCRUT3 Nashville, TN (click to enlarge)]

Here, in the HadCRUT3 series, the mean seasonal difference is more negative: -0.014863 versus -0.012647. The GISTemp adjustment increases the average seasonal difference by 0.002216, making it less negative, but not enough that the result becomes positive. In both cases we still come to the conclusion that “on the average” monthly seasonal differences in temperatures in Nashville have been negative over the last ~130 years.

An Important Caveat

So have we actually shown that, at least for Nashville, TN, there has been no net warming over the past ~130 years? No, not necessarily. The average monthly seasonal difference has indeed been negative over the past 130 years. But it may have been becoming “less negative.” Since I have more confidence, at this point, in the integrity of the HadCRUT3 data than in the GISTemp data, I’ll discuss this solely in the context of the HadCRUT3 data. In both the “original data” and in the blue “trend” shown in the above figure, there is a slight upward trend over the past ~130 years:

[Figure: linear fit through the smoothed seasonal differences, HadCRUT3 Nashville, TN (click to enlarge)]

Here, I’m only showing the fit relative to the smoothed (trend) data. (It is, however, exactly the same as the fit to the original, or unsmoothed, data.) Whereas the average seasonal difference for the HadCRUT3 data was -0.014863, from the fit through the data it was only -0.007714 at the end of the series (October 2009). Still cooling, but less so, and in that sense one could argue that there has been some “warming.” And overall, i.e. if a similar kind of analysis is applied to all of the stations in the HadCRUT3 data set (or “subset”), I will not be surprised if there is some evidence of warming. But that has never really been the issue. The issue has always been (a) how much warming, and (b) where has it come from?
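That parenthetical is not hand-waving: the HP cycle sums to zero and is orthogonal to a linear time index, so an ordinary least-squares line through the trend has exactly the same coefficients as one through the unsmoothed differences. A sketch, reusing sd and trend from the earlier snippet:

```python
t = np.arange(len(sd))
slope, intercept = np.polyfit(t, trend, 1)  # identical to np.polyfit(t, sd, 1)
print("fitted seasonal difference at end of series:",
      round(slope * t[-1] + intercept, 6))
```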

I suggest that the above chart showing the fit through the smooth helps define the challenges we face in these issues. First, the light gray line depicts the range of natural climate variability on decadal time scales. This much (and it accounts for most of the variation in the data) is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases. If there is any anthropogenic impact here, it is in the blue line, which is in effect a trend in the trend. But even that is far from certain, for before we can conclude that, we have to rule out natural climate variability on centennial time scales. And we simply cannot do that with the instrumental temperature record, because it isn’t long enough. I hate to admit that, because it means either that we accept the depth of our ignorance here, or we look for answers in proxy data. And we’ve seen the mess that has been made of things in trying to rely on proxy data. I think we have to accept the depth of our ignorance, for now, and admit that we do not really have a clue about what might have caused the kind of upward drift we see in the blue trend line in the preceding figure. Of course, that means putting a hold on any radical socioeconomic transformations based on the notion that we know what in truth we do not know.

203 Comments
J. Peden
December 12, 2009 10:32 am

Now we are not even sure if there HAS been any warming at all! This could be the biggest scientific fiasco of modern times.
What the elite Climate Scientists are measuring seems to me to be little more than an artifact of how they are measuring it. So it doesn’t look good for their case that what they are doing has much to do with the real world.

Kevin Kilty
December 12, 2009 10:35 am

If one wishes to test the hypothesis that UHI has affected the trends, then follow the lead of epidemiology and make a dose-response “curve”. For instance, sort the raw data into strata based on population growth (probably local energy usage is important also) and calculate the average ΔT for each stratum. If one wishes to detect whether siting issues are relevant, then make a dose-response curve against station ranking 1-5, and so forth.
Just using rural stations may be probative in this UHI regard, but Evan has said that rural stations have worse siting problems, and probably have worse maintenance problems, than do urban stations.
Now if one wishes to quantify the UHI effect the dose-response curve is not much use in my opinion, and the UHI project that Anthony has on his “Projects” page is pertinent. It allows one to calculate an apparent ∂T/∂t (i.e. an apparent global warming value) from v·∂T/∂x.
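A minimal sketch of the stratification Kevin describes, assuming a hypothetical per-station table; the column names pop_growth and trend_per_decade are invented for illustration:

```python
import pandas as pd

def dose_response(stations, dose="pop_growth", response="trend_per_decade", bins=5):
    """Average temperature trend per quantile stratum of the 'dose' variable.
    A flat curve across strata would argue against contamination from the
    dose (e.g. UHI); a monotone rise would suggest the opposite."""
    strata = pd.qcut(stations[dose], q=bins)
    return stations.groupby(strata, observed=True)[response].agg(["mean", "count"])
```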

Kevin Kilty
December 12, 2009 10:46 am

Alec Rawls (09:14:50) :
According to NOAA, the largest adjustment in the USHCN data (the US part of the Global Historical Climate Network, which is the basis for GISTemp) is time-of-day adjustments. These adjustments also have the strongest hockey stick shape, especially since the 1950s.

Thanks for the references. I was looking at the Illinois data on the blink comparator that Mike Macmillan posted yesterday, and find the adjustments to be bizarre just the same. Griggsville, Illinois for example has data that were shifted whole-hog by a decade into the past. Were the older data recorded in the wrong decade during the analog to digital conversion? It’s possible–I’ve seen horrible data “busts” in digital elevation data for example.
Here is what the comparisons do for some other stations.
Morrison: The reverse correction of almost all other records. 1930s shifted positive by about 1 C.
Olney: 1930s adjusted cooler by 2C. Now very easy to make the current decade the warmest. Some of the oldest data is adjusted downward by 3C and a little of the most recent is adjusted upward by 0.5. The pivot point is about 2000.
Sparta: Pivot is about 1980. Oldest data vanishes.
Decatur: Pivot at 1970.
Aledo: 1985 and 1998 left alone, everything else is adjusted, and while the general adjustment prior to 1998 is downward and that after 1998 is upward, the adjustments seesaw throughout.
Almost every data value is adjusted in some way or another. Is this what we should expect of raw data? These don’t look like any sort of smooth adjustments…

Kevin Kilty
December 12, 2009 11:09 am

boballab (23:07:48) :
Anyone that can make sense of this let me know:

Missing data are difficult to deal with when they occur in a time-series with a strong trend or cycle. So, for instance, in the S-O-N season the N value will always be colder (Northern latitudes here) than the average of S and O, so you cannot figure a seasonal average from S and O alone, but have to have some algorithm for replacing the missing N. Apparently, from what you describe, a missing O is no big deal as they just average S and N, and this makes some sense, as O is generally intermediate to S and N in temperature. I have no idea how they handle missing terms at the front or back end of these seasons. I have had lots of my inquiries to NOAA and related agencies go unanswered, but you might ask them what it is they do.
By the way, missing data are called “censored” values in the world of statistics, and isn’t that an ironic term considering the last month or so?
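One plausible way to fill a missing month, sketched in Python. This is a guess at the kind of algorithm involved, not NOAA’s documented method: assume the missing month shares the mean anomaly of the months that are present.

```python
import numpy as np

def season_mean(vals, clim):
    """Season average with one month missing (None), filled by giving the
    missing month the mean anomaly of the months present. 'clim' holds
    the long-term means for the same three months."""
    anoms = [v - c for v, c in zip(vals, clim) if v is not None]
    mean_anom = float(np.mean(anoms))
    filled = [c + mean_anom if v is None else v for v, c in zip(vals, clim)]
    return float(np.mean(filled))

# Example: November missing from a S-O-N season
print(season_mean([15.2, 9.1, None], clim=[14.0, 8.5, 2.0]))  # -> ~9.07
```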

JamesJ003
December 12, 2009 11:31 am

Kevin Kilty 10:35:00
It has….McKitrick and Michaels 2007
http://www.uoguelph.ca/~rmckitri/research/jgr07/jgr07.html

Kevin Kilty
December 12, 2009 11:38 am

ScottR (10:31:14) :

I would sure like a map to all of this. Maybe a little flow chart showing the Global Warming creation process.

This game has become so complicated that a “program” would be helpful. Unfortunately the flow-chart might end up looking like those diagrams that Mort Sahl made of the Kennedy Administration, and only be good for comic effect.

Geo
December 12, 2009 11:54 am

Mark (22:14:25) :
Having spent a few weekends driving around upper Minnesota and the two Dakotas for surfacestations.org, I have to agree with Evan. Maybe in the Stevenson Screen days it would have been different, but the power/trenching requirements of the MMTS units seem to have done a darn thorough job of significantly biasing the rural stations too.

NickB.
December 12, 2009 12:04 pm

Phil A,
You can expect that anyone who calls people “deniers” probably isn’t a disinterested neutral party. I think the gentleman in question overreaches with his summation that this analysis effectively debunks any concerns about adjustments in the surface temperature record.
This may show that, in total, the CRU is relatively even-handed with their adjustments once they get the data. It does not verify anything end to end, and (maybe someone can confirm/deny) is the CRU “raw” data really raw?

Kevin Kilty
December 12, 2009 12:05 pm

JamesJ003 (11:31:38) :
Kevin Kilty 10:35:00
It has….McKitrick and Michaels 2007
http://www.uoguelph.ca/~rmckitri/research/jgr07/jgr07.html

Thanks. That is quite an interesting paper, and it does provide statistical evidence of a sizable effect from UHI. However, it does not eliminate a need for a study along the lines of the UHI project that Anthony suggests, which would be more like a direct measurement.
Also, you will note that in several sections they state: “If done correctly, temperature trends in climate data should be uncorrelated with socioeconomic variables that determine these extraneous factors.”
If we are talking about identifying high-quality data, then insistence on no correlation is almost assuredly not correct. I think that temperature measurement and adjustment, if done right, should show some correlation of this sort: independently of the effect of CO2, the energy we use is dissipated as heat and ought to be apparent in reliable temperature readings.

Randy
December 12, 2009 12:13 pm

Let me see if I have this correct, based on what I think you did with the data. If I were to sum up all the seasonal differences for January over the 100 or so years, then I would see how much the temperature changed in the month of January over that whole period (the sum telescopes down to the last January minus the first). That might be as instructive as just knowing the deltas from year to year.

December 12, 2009 12:14 pm

I read all the responses last night, but there’s just too many to get thru today.
The point that I’d like to make is that everyone is looking at the temperature data & finding complete fabrication. Imagine what they could have done with the grid matrix. For them it’s the mother lode!
Think about it, for them to get the hockey stick they want, they need the temperature of the grid times the grid ‘volume’/’area.’ Wouldn’t it be easier to fudge that stuff? There is never a discussion of the grid.

photon without a Higgs
December 12, 2009 12:15 pm

TonyB (08:36:05) :
Basil (06:44:43):
A bigger problem is the stations that have been dropped from the data

Phil A
December 12, 2009 12:19 pm

“Phil A,
You can expect that anyone who calls people “deniers” probably isn’t a disinterested neutral party. I think the gentleman in question overreaches with his summation that this analysis effectively debunks any concerns about adjustments in the surface temperature record. ” – NickB
Agree entirely about the “deniers” point – as soon as I saw that I knew we were probably not dealing with an open mind.
Having thought about it, let’s put it this way. Say we looked at a few criminal trials and found suspicion of jury tampering. Would a statistical analysis showing overall conviction rates had not been affected mean that the identified issues simply didn’t matter? Because that’s what he’s saying.

JJ
December 12, 2009 12:48 pm

Scott R,
“Let me get this straight: HadCRUT3 is based on real temperature measurements (with likely UHI effects), but has “corrections” and “homogenizations” in it already ”
Yes. The HadCrut3 data that have been released are homogenized. We don’t know by what method(s), because Phlim Phlam Phil won’t tell anyone. But they are homogenized. They likely do not include any significant correction for UHI, because Phil doesn’t believe in it.
“— probably including extension of measured data into unmeasured areas of the earth, correct?”
No. Extending measured data into unmeasured areas occurs during the gridding process. The homogenization process may extend measured data into measured areas, i.e. adjust some station’s data based on other stations’ data.
“Does GISTemp use HadCRUT3 and make more corrections to it?”
No. Despite what you may have read above, that is not how GISTemp is calculated.
“Or do GISTemp and HadCRUT share a root data set?”
Good question. It now appears that CRU draws heavily from GHCN, as does GISS. We don’t know what other elfin magic CRU adds, though. And even if both draw from GHCN, they undoubtedly use a different mix of stations and a different mix of the data from those stations. Until the Motley CRU release their station list and detailed methods we won’t know, but it is unlikely that the two start from exactly the same ‘root data set’.
“I would sure like a map to all of this. Maybe a little flow chart showing the Global Warming creation process.”
I agree. A cogent explanation of the two primary ‘global warming’ temperature estimates (GISS and CRU) and how they are derived from GHCN and/or other sources would be useful. It would document the interrelatedness of these ‘independent’ measurements, as well as catalog the holes in the public side of the process that the FOIA requests have been trying to fill.
One of these guys with a website ought to do this. Perhaps someone already has …

NickB.
December 12, 2009 12:51 pm

Plato Says (09:15:47) :
Quite brilliant fun and useful post – build your own climate change model
http://iowahawk.typepad.com/iowahawk/2009/12/fables-of-the-reconstruction.html
_____________________________________________
Wow, ty for the link!

Mac
December 12, 2009 1:05 pm

Seems to me that if it’s “global” warming then I should be able to find a rural site with a long history that is well sited, pull its data, and see the trend. If it’s truly global then the trend should always show up. Take one from each hemisphere if you really feel a need for multiple samples.

steven mosher
December 12, 2009 1:34 pm

Basil,
You note that you used “homogenized data.” Since Nashville is an urban site, GISS will adjust the record using surrounding rural sites. CRU make no such adjustment for urban effects; instead CRU increase the error bands around the temperature signal to include the presumed effect.
You should look at the station “raw” data from GISS, then also at the data after combining stations if you want to get a complete picture.
Just a hint

steven mosher
December 12, 2009 1:41 pm

ScottR (10:31:14) :
“Let me get this straight: HadCRUT3 is based on real temperature measurements (with likely UHI effects), but has “corrections” and “homogenizations” in it already — probably including extension of measured data into unmeasured areas of the earth, correct?
Does GISTemp use HadCRUT3 and make more corrections to it? Or do GISTemp and HadCRUT share a root data set?
I would sure like a map to all of this. Maybe a little flow chart showing the Global Warming creation process.”
No. I know it’s confusing, but here is the situation.
GISS: use GHCN “raw” data. They make adjustments to Urban stations based on nearby Rural stations. These “adjustments” are all over the map, sometimes warming urban stations, sometimes cooling them. St. Mac has covered this. After adjustment Hansen claims his data shows the same trends as a RURAL-ONLY series. Problem? Rural isn’t Rural… see the surfacestations project.
CRU: CRU (it would appear) also use GHCN (plus others). They make NO ADJUSTMENT. They argue (citing Peterson 2003 and Parker’s paper) that the impact of UHI is SLIGHT. Jones (see the Climategate mails in the Jan 07 time period) clarifies this for Susan Solomon prior to her Paris briefing. According to Jones (1990) the urban effect is on the order of 0.05C per century. CRU handle this NOT BY ADJUSTING, but by INCREASING the error bands around the dataline.
Clear? I thought not.

Peter Sinclair
December 12, 2009 2:07 pm

The site is doing a fantastic job. Forgive me if I push for guidance on several related issues. It would help me to take the battle forwards.
The debate is now addressing raw data and historical data (the same?), homogenised data, and now manipulated data. I’m very concerned that so-called raw data has been changed, and I’m not clear whether the original is still available or not.
I’m also vaguely aware that the default of increased warming could be a process consequence rather than a conspiracy. I think a lot of us have strong views on that and eagerly await a conclusion. Either way, if it changes cooling or flat into warming, then at what point do we give data to the press?
It would be very useful for those of us not in the business to understand the relationships between GISS, NOAA, CRU and Hadley (UK Met Office) and any other major players. In particular, who has the raw data, which agencies manipulate it, who publishes what, etc. Is their homogenisation legitimate or misguided or corrupt?
For example, I am particularly annoyed by UK Met Office propaganda, but I don’t know how many data sources they use and whether the data is raw or homogenised when it reaches them. I don’t know whether they are the guilty party or the messenger. I want to hammer them but I don’t have the necessary knowledge. I’m sure that others are in this situation.
An informed overview of this lot would be great.

Pamela Gray
December 12, 2009 2:50 pm

Anthony, we here in Wallowa County are as rural as you can get. And we don’t appreciate you degrading our BBQs and silver upturned boats next to the temperature house. Plus, it is DAMNED cold in the winter and a lil’ heat by the Stevenson screen helps us navigate the daily thermometer check. After all, 15 minutes in this windchill and we freeze our ears and fingers. However, I will admit that the mug of hot Tom and Jerry helps a bit. I will remind the folks that monitor the two stations we have here to take the mug o’ T&J with them instead of firing up the BBQ or using the blowtorch so’s they can see the frosted glass.
Now, for the really important stuff. My Jeep windshield frosted on the INSIDE (not fogged…it fricken froze!) this morning. Anyone know of an ice scraper shaped for the inside of the window?

rbateman
December 12, 2009 2:51 pm

I prefer my data raw, straight from the observer’s handwritten notes.

Tenuc
December 12, 2009 3:12 pm

My own feeling, after looking at some of the actual raw data station records and reading the Climategate documents, is that the real problem CRU/GISS have is that the raw thermometer data they gathered wasn’t of sufficiently good quality for them to be able to produce meaningful anomaly graphs, and they had to ‘fudge’ it as best they could. There were also too many problems with dissonance in the various temperature proxy series for this to be much use either.
Satellite data is much better in terms of continuity and spatial coverage, but even after it had been ‘calibrated’ to the obviously flawed thermometer data, it didn’t show the trend they believed was happening. So perhaps we shouldn’t be surprised at seeing station deletions and adjustments made to the pre-satellite data, providing the +ve global trends needed to maintain support for the CAGW myth as our cooling Earth failed to co-operate.
Once you start going down the slippery slope, it’s impossible to stop.

Gary Palmgren
December 12, 2009 3:53 pm

I would prefer my data raw please.
If we had ten thousand stations recording the minimum and maximum temperatures every day for 150 years, this amounts to 1,095,000,000 temperature readings, or about one gigabyte at a byte per reading. Assuming a lot of extra information is added on the site location and other details, this could blow up to perhaps 100 gigabytes. This would hardly strain a modern PC. Just how expensive is it to make such data available on the web? Consider the data streams that astronomers need to deal with. I spent a whole minute searching and found this:
Catalina Sky Survey…. Thanks to the $890,000 NSF grant awarded this month, the CRTS team soon will construct a web site that will make roughly 10 terabytes of data taken by the CSS over the past 5 years — as well as all new CSS data that continues to stream in — available over the Internet to astronomers worldwide, professional and amateur.
How many billions have gone to fund climate research?
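A quick check of the arithmetic (the one-gigabyte figure implicitly assumes about one byte per reading):

```python
readings = 10_000 * 2 * 365 * 150   # stations x (min, max) x days/yr x years
print(f"{readings:,}")              # 1,095,000,000
print(readings / 1e9, "GB at one byte per reading")
```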

J. Peden
December 12, 2009 4:13 pm

Thom (09:05:19) :
What do you make of this?
http://www.informationisbeautiful.net/2009/the-climate-deniers-vs-the-consensus/

All you have to do is look at the “climate deniers vs the consensus” title to see that the author likely has no idea what s/he is talking about, because both sides would likely be wrong, or at least would most likely have positions that have nothing to do with science. But, then, what are “climate deniers” anyway? I haven’t even heard of that one!
Going to the site, the author admits having no knowledge at all about climate science, praises Gavin Schmidt and realclimate (where openness is absent), and doesn’t understand the basic argument of many sceptics: that the ipcc and its elite Climate Scientists are not doing real Science, where you have to first prove or back up your hypotheses by following the Scientific Method, which involves ‘showing your work’ by making it (code, data, ‘materials and methods’) very accessible to anyone interested, and in a timely way.
Otherwise, there are no scientific conclusions to either defend, promote, or criticize in the first place. It gets even worse from there for the “consensus”, pro-AGW position, which has been demonstrated well by Steve McIntyre, Richard Lindzen and many others, completely apart from what the leaked emails reveal.
I’m afraid that the author of the linked blog has no idea of what science is. S/he’s trying, but completely off the mark.

Basil
Editor
December 12, 2009 4:15 pm

steven mosher (13:34:23) :
Basil,
You note that you used “homogenized data.” Since Nashville is an urban site, GISS will adjust the record using surrounding rural sites. CRU make no such adjustment for urban effects; instead CRU increase the error bands around the temperature signal to include the presumed effect.
You should look at the station “raw” data from GISS, then also at the data after combining stations if you want to get a complete picture.
Just a hint

This, and your comment that followed, were insightful. Thanks for them.
Let me see if I understand. The GISS adjustment is to make Nashville “rural.” And it does this by making it appear that Nashville has warmed more than it actually has? Shouldn’t the adjustment be doing just the opposite? If the “unadjusted” Nashville trend was already sloping downward, any adjustment to remove UHI should have increased the downward slope, not turn it into a positive slope! It is almost as if the adjustment, rather than removing UHI, has artificially enhanced it.
Now maybe this is a “one-off” example, but it certainly suggests that there are situations where the UHI adjustment produces bizarre results. Of course, I’ve said several times now that I don’t think we should be trying to remove UHI from the “global temperature” metric. If I have a fever, I want to know it, I do not want a thermometer that has been hacked so that it does not measure fevers. This is just a case of being too clever by half.