Would You Like Your Temperature Data Homogenized, or Pasteurized?

A Smoldering Gun From Nashville, TN

Guest post by Basil Copeland

The hits just keep on coming. About the same time that Willis Eschenbach revealed “The Smoking Gun at Darwin Zero,” the UK’s Met Office released a “subset” of the HadCRUT3 data set used to monitor global temperatures. I grabbed a copy of “the subset” and then began looking for a location near me (I live in central Arkansas) with a long and generally complete station record that I could compare to a “homogenized” set of data for the same station from the GISTemp data set. I quickly, and more or less randomly, decided to take a closer look at the data for Nashville, TN. In the HadCRUT3 subset, this is “72730” in the folder “72.” A direct link to the homogenized GISTemp data used is here. The first task was transforming the row data to column data (see the end of the post for a “bleg” about this).
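For anyone facing the same reshaping chore, here is a minimal sketch of one way to do it in Python with pandas. The layout it assumes – whitespace-delimited, one row per year (YEAR, JAN…DEC, trailing summary columns), 999.9 as the missing flag, one header row – is an assumption on my part, so check it against the actual files; the function name is mine as well.

```python
import numpy as np
import pandas as pd

def rows_to_series(path, missing=999.9, skiprows=1):
    """Reshape a one-row-per-year station file into a monthly series.
    Delimiter, missing flag, and header count are assumptions; verify
    them against the actual GISTemp/HadCRUT3 file before trusting this."""
    df = pd.read_csv(path, sep=r"\s+", skiprows=skiprows, header=None)
    monthly = df.iloc[:, 1:13].replace(missing, np.nan)   # keep JAN..DEC only
    start = int(df.iloc[0, 0])                            # first year in file
    idx = pd.period_range(f"{start}-01", periods=12 * len(df), freq="M")
    return pd.Series(monthly.to_numpy().ravel(), index=idx)
```

With both series in column form, the first thing I did was plot the differences between the two: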

[Figure: GISTemp minus HadCRUT3, monthly differences for Nashville, TN]

The GISTemp homogeneity adjustment looks a little hockey-stickish, and induces an upward trend by reducing older historical temperatures more than recent ones. This has the effect of turning what is a negative trend in the HadCRUT3 data into a positive trend in the GISTemp version:

[Figure: linear trends fitted to the HadCRUT3 and GISTemp series, Nashville, TN]

So what would appear to be a general cooling trend over the past ~130 years at this location in the unadjusted HadCRUT3 data becomes a warming trend when the homogeneity adjustment is applied.
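To make the sign flip concrete, here is a minimal sketch of the trend comparison. It assumes hadcrut and gistemp are monthly series built with the reshaping sketch above; both names are placeholders of mine, not anything in the released files.

```python
import numpy as np

def ols_trend(series):
    """Least-squares slope of a monthly series, in degrees per decade,
    skipping missing months."""
    y = series.to_numpy(dtype=float)
    t = np.arange(len(y)) / 120.0          # months -> decades
    ok = ~np.isnan(y)
    return np.polyfit(t[ok], y[ok], 1)[0]

# print(ols_trend(hadcrut))   # negative for the unadjusted series
# print(ols_trend(gistemp))   # positive once the homogenization is in
```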

“There is nothing to see here, move along.” I do not buy that. Whether or not the homogeneity adjustment is warranted, it has an effect that calls into question just how much the earth has in fact warmed over the past 120-150 years (the period covered, roughly, by GISTemp and HadCRUT3). There has to be a better, more “robust” way of measuring temperature trends, one that is not so sensitive that it turns negative trends into positive trends (which we’ve now seen it do twice, first with Darwin Zero, and now here with Nashville). I believe there is.

Temperature Data: Pasteurized versus Homogenized

In a recent series of posts, here, here, and with Anthony here, I’ve been promoting a method of analyzing temperature data that reveals the full range of natural climate variability. Metaphorically, this strikes me as trying to make a case for “pasteurizing” the data, rather than “homogenizing” it. In homogenization, the object is to “mix things up” so that it is “the same throughout.” When milk is homogenized, this prevents the cream from rising to the top, thus preventing us from seeing the “natural variability” that is in milk. But with temperature data, I want very much to see the natural variability in the data. And I cannot see that with linear trends fitted through homogenized data. It may be a hokey analogy, but I want my data pasteurized – as clean as it can be – but not homogenized so that I cannot see the true and full range of natural climate variability.

I believe that the only way to truly do this is by analyzing how differences in the temperature data vary over time. And they do not simply vary in a constant direction: as everybody knows, temperatures sometimes trend upward, and at other times downward. Studying the differences lets us see this far more clearly than simply fitting trend lines to undifferenced data. In fact, it can keep us from reaching the wrong conclusion, such as fitting a positive trend when the real trend has been negative. To demonstrate this, here is a plot of monthly seasonal differences for the GISTemp version of the Nashville, TN data set:

[Figure: monthly seasonal differences (sd) and smoothed trend, GISTemp Nashville, TN]

Pay close attention as I describe what we’re seeing here. First, “sd” means “seasonal differences” (not “standard deviation”). That is, it is the year-to-year variation in each monthly observation, for example October 2009 compared to October 2008. Next, the “trend” is the result of Hodrick-Prescott smoothing (lambda = 14,400). The type of smoothing here is not as critical as the decision to smooth the seasonal differences. If a reader prefers a different smoothing algorithm, have at it. Just make sure you apply it to the seasonal differences, and that it does not change the overall mean of the series: the mean of the seasonal differences for GISTemp’s Nashville, TN data set is -0.012647, whether smoothed or not. The smoothing simply helps us to see, a little more clearly, the regularity of warming and cooling trends over time. Now note clearly the sign of the mean seasonal difference: it is negative. Even in the GISTemp series, Nashville, TN has spent more time cooling (imagine here the periods where the blue line in the chart above is below zero) than warming over the last ~130 years.
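For readers who want to reproduce this, here is a minimal sketch using the Hodrick-Prescott filter from statsmodels (series is a placeholder for one of the monthly series built earlier). A convenient property of the HP filter is that the trend it returns has exactly the same mean as the input, which is why the mean seasonal difference is unchanged by the smoothing:

```python
import statsmodels.api as sm

# Seasonal (year-on-year) differences: October 2009 minus October 2008, etc.
sd = series.diff(12).dropna()

# Hodrick-Prescott smoothing; lambda = 14,400 is a common choice for
# monthly data. hpfilter returns the (cycle, trend) decomposition.
cycle, trend = sm.tsa.filters.hpfilter(sd, lamb=14400)

print(sd.mean(), trend.mean())   # identical: the HP trend preserves the mean
```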

How can that be? Well, the method of analyzing differences is less sensitive – i.e., more “robust” – than fitting trend lines through the undifferenced data. “Step” type adjustments like the homogeneity adjustments we see here affect only a single data point in the differenced series, but affect every data point (before or after the point at which they are applied) in the undifferenced series. We can see the effect of the GISTemp homogeneity adjustments by comparing the previous figure with the following:

[Figure: monthly seasonal differences (sd) and smoothed trend, HadCRUT3 Nashville, TN]

Here, in the HadCRUT3 series, the mean seasonal difference is more negative: -0.014863 versus -0.012647. The GISTemp adjustment increases the average seasonal difference by 0.002216, making it less negative, but not enough for the result to become positive. In both cases we still come to the conclusion that, on average, monthly seasonal differences in temperatures in Nashville have been negative over the last ~130 years.
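The point about step adjustments is easy to verify numerically. A toy sketch (made-up numbers, purely illustrative): a step that shifts the first 50 of 100 values changes all 50 of them in levels, but only a single value in the first differences:

```python
import numpy as np

levels = np.zeros(100)
adjusted = levels.copy()
adjusted[:50] -= 0.5   # step-type adjustment: cool everything before the break

n_levels = np.count_nonzero(adjusted != levels)                    # 50 points
n_diffs = np.count_nonzero(np.diff(adjusted) != np.diff(levels))   # 1 point
print(n_levels, n_diffs)
```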

An Important Caveat

So have we actually shown that, at least for Nashville, TN, there has been no net warming over the past ~130 years? No, not necessarily. The average monthly seasonal difference has indeed been negative over the past 130 years, but it may have been becoming “less negative.” Since I have more confidence, at this point, in the integrity of the HadCRUT3 data than in the GISTemp data, I’ll discuss this solely in the context of the HadCRUT3 data. In both the “original data” and in the blue “trend” shown in the figure above, there is a slight upward trend over the past ~130 years:

[Figure: linear fit through the smoothed seasonal differences, HadCRUT3 Nashville, TN]

Here, I’m only showing the fit relative to the smoothed (trend) data. (It is, however, exactly the same as the fit to the original, or unsmoothed, data.) Whereas the average seasonal difference for the HadCRUT3 data was -0.014863, the fit through the data gives only -0.007714 at the end of the series (October 2009). Still cooling, but less so, and in that sense one could argue that there has been some “warming.” And overall – i.e., if a similar kind of analysis is applied to all of the stations in the HadCRUT3 data set (or “subset”) – I will not be surprised if there is some evidence of warming. But that has never really been the issue. The issue has always been (a) how much warming, and (b) where has it come from?
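For completeness, a minimal sketch of that last step: fit a least-squares line through the HP trend from the earlier sketch and read off its value at the end of the series (trend is again my placeholder name, and the quoted figures are for the HadCRUT3 series):

```python
import numpy as np

t = np.arange(len(trend))
slope, intercept = np.polyfit(t, trend.to_numpy(), 1)
end_fit = intercept + slope * (len(trend) - 1)   # fitted value at Oct 2009

print(trend.mean(), end_fit)   # about -0.0149 on average vs. -0.0077 at the end
```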

I suggest that the above chart showing the fit through the smooth helps define the challenges we face in these issues. First, the light gray line depicts the range of natural climate variability on decadal time scales. This much – and it is a very large share of the variation in the data – is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases. If there is any anthropogenic impact here, it is in the blue line, which is in effect a trend in the trend. But even that is far from certain, for before we can conclude that, we have to rule out natural climate variability on centennial time scales. And we simply cannot do that with the instrumental temperature record, because it isn’t long enough. I hate to admit that, because it means either that we accept the depth of our ignorance here, or we look for answers in proxy data. And we’ve seen the mess that has been made of things in trying to rely on proxy data. I think we have to accept the depth of our ignorance, for now, and admit that we do not really have a clue about what might have caused the kind of upward drift we see in the blue trend line in the preceding figure. Of course, that means putting a hold on any radical socioeconomic transformations based on the notion that we know what in truth we do not know.

Mark
December 11, 2009 10:14 pm

Why doesn’t somebody get the raw unadjusted data for the world and plot just the rural data to see if there is warming or not? Urban area data would naturally seem to me to rise over time as those areas grow larger (and hence get more cars, electrical appliances, roads, buildings, people, etc).
In my view, if the raw unadjusted rural data shows no warming, then CO2 isn’t working.
[REPLY – Even the rural stations are horribly sited. When I last totted it up, the CRN site rating was even worse for rural stations than urban (rural/urban as defined by USHCN1). Even so, the average urban station warmed 0.5C/century more than the average rural station. 9% of USHCN1 stations are classified as urban, 17% as suburban, and the rest, rural. ~ Evan]

Ian George
December 11, 2009 10:18 pm

There seems to be a discrepancy in the BOM records between the raw data and the anomaly graphs in their Australian high-quality climate site data. A blogger on Andrew Bolt’s site noticed that when the mean temp for the Cape Otway Lighthouse station was calculated from the raw data, it was not reflected in the anomaly map.
I checked Yamba Pilot Station and found a similar discrepancy straight away.
1915 had a max av temp of 23.6C and a min av temp of 15.9C.

2008 had a max av temp of 23.6C and a min av temp of 15.5C.

Clearly, 1915 has a slightly higher mean av temp than 2008.
Yet the anomaly graph shows 2008 higher than 1915 by 0.2C. Eh! It should be the other way around.
These discrepancies (which also show up in Cape Otway) give a false impression that the recent warming is greater than it really is. There must be many examples of this (NZ, Darwin, Arctic stations, etc).


Anomaly data at :-

http://reg.bom.gov.au/cgi-bin/climate/hqsites/site_data.cgi?variable=maxT&area=aus&station=058012&period=annual&dtype=anom&ave_yr=10
Raw data (max temps) at:-
http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=36&p_display_type=dataFile&p_startYear=&p_stn_num=058012
and min temps at:-
http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=38&p_display_type=dataFile&p_startYear=&p_stn_num=058012
Seems like the service just went down due to maintenance problems. Should be back on line – hopefully all the same.

David
December 11, 2009 10:20 pm

I’ll issue a gripe here. Annualizing the temperatures. Why? There are four seasons every year. If you want to make the data less choppy, fine, but at least keep it in its natural rhythm. Do the Grand Poobahs have a reason for imposing the calendar’s will on the data?

jorgekafkazar
December 11, 2009 10:22 pm

Pasteurized or homogenized, the GISTemp data is cheesy.

David
December 11, 2009 10:25 pm

Wait, is that what you did in the graph with sd Basil? I read fast and assumed the standard deviation.

TerryBixler
December 11, 2009 10:29 pm

Headline: “Nashville avoids global warming by using unadulterated data.” I wonder how many other locations could avoid global warming by using unadulterated data. Why, indeed – maybe we could start a national trend. Maybe we could introduce a bill in Congress to stop global warming by using unadulterated data. No, that would not work – where is the tax in that?
Great work, and thank you for your outstanding efforts.

Doug in Seattle
December 11, 2009 10:35 pm

Basil, I felt a very slight breeze above me as I read this. Definitely above my level of understanding. Hopefully we have some other readers who understand your data torture better than I do. I do, however, relate to your closing remarks about the level of our collective ignorance regarding attribution.
Thanks . . . I think.

December 11, 2009 10:44 pm

Looking at the differences between the GISTemp and HadCRUT3 data (first graph), you can clearly see similar ‘stepwise’ decreases and increases as we have seen at Darwin. There is also the clear ‘reversal’ of the changes made in the early 60’s. That is significant, I think, since isn’t that when the tree-ring proxies were cut off, and thus these are the data ‘spliced’ on to the end of that data?
It would appear that ‘hiding the decline’ is not enough, they had to provide some artificial ‘forcing’ upward of the remaining data. I have to say that I now reject any claims of any Global Warming until all the data have been independently verified. As discussed well here, another method of seeking trends is probably in order too.
The trouble is that even sceptics admit ‘there has been warming’ but we are not certain if it is because of CO2. Now we are not even sure if there HAS been any warming at all! This could be the biggest scientific fiasco of modern times.

MikeC
December 11, 2009 10:47 pm

You guys are on the right track, but what you have not done is gone into the scientific literature and looked at exactly how the adjustments are made… you will, I guarantee, find this whole temperature monitoring and reporting business bizarre

SABR Matt
December 11, 2009 10:49 pm

This is a creative idea…but…I fear you’re going to get a lot of misleading answers if there are large jumps in the dataset that aren’t the result of changes in the recording station and that aren’t the result of climate variability. If you gradually cool for 75 years and then rapidly warm for 30, you’re going to do more seasonal cooling than seasonal warming…that’s the argument the AGW crowd will use at any rate.
I think the better approach is to make real adjustments to the data instead of statistically invented ones. Don’t look at station data as though it’s numbers on a map…look at it as though it’s a thermometer in a park. Take the raw data…adjust it for the urban heat island effect (which can be fairly accurately parameterized with a spatial correlation between population density and spatial climate temperature anomaly), adjust to account for real moves in the station instrument, adjust to account for anything you can actually document. Don’t smear the data like GISTEMP does…don’t apply arbitrary adjustments to the homogenized data the way HadCRUT3 does. Adjust how you think is appropriate and thoroughly document why you’re adjusting.

April E. Coggins
December 11, 2009 10:49 pm

There is something wrong with our local reported temps. Is there a reasonable explanation of why the WeatherChannel and Weather Underground are reporting temps well above actual temps in my town? Our GISS station has been shut down, though it was well located and well manned. Today I have noticed that the temps reported at our airport are well above the real temps. Not just a little but over ten degrees difference. We supposedly reached over 42F but we never saw above 26F. I wonder because if our GISS is shut down, where are our temps being reported and who is recording them? Do I sound paranoid? Yes, because the politicians are relying on a warming earth.
[REPLY – No surprise here. Airports typically show a much faster warming over the last 30 years than non-APs, for a variety of reasons: Increased air traffic, urban encroachment, severe HO-83 equipment issues, etc. Even the best-sited systems in rural APs are seriously affected. ~ Evan]

December 11, 2009 10:53 pm

There’s a few things that emerge for me out of these recent postings about temperature datasets. These are all great posts by the way.
The 1880’s are coming out every bit as warm as the 1940’s and the current period, suggesting a 60 year cycle. This 60 year cycle doesn’t really correlate with sunspot activity, or the lunar cycle. ENSO and the PDO seem to be effects, not causes.
The fulcrum point for the warming seems to be around the 1980’s, in that years prior to that are adjusted downwards and years after slightly upwards, and I wonder whether this is because satellite data came onstream and kept them honest subsequently.
We’re seeing a huge amount of interest in this, with people going off and looking into this data for themselves, for the first time in a lot of cases. But I worry that this is going to dissipate our efforts.
I have just downloaded the Daily data from GHCN (Please tell me that this data is truly raw, otherwise I will have downloaded 1.7GB of nothing) with a view to better learning R, and seeing what it shows for my country, Ireland.
Different datasets and methodologies are being used, meaning the warm-mongers can say “Ah yes, but we adjust for this in such and such a way using internationally recognised practices, and anyway, your methodologies haven’t been peer-reviewed”. And we’re going to keep bumping up against this every time a new dataset is issued.
I’ve seen lots of complaints about the various ways the raw data is adjusted, but I think if we are going to be taken seriously we need to come up with our own one way of adjusting the data, and then get the wider community to help out with the actual implementation.
Willis Essenbach has recently acquired the surfacetemps.org domain name. I’ve offered to help, as I’m sure many others have too, including the heavyweights. I would suggest that we put our shoulders behind this effort and put up a genuine alternative to the current crop of temperature analyses.

Nigel S
December 11, 2009 10:56 pm

A relevant hit from the late great Benny Hill
She said she’d like to bathe in milk
He said alright sweetheart
And when he finished work one night
He loaded up the cart
He said you wanted pasturised
Coz pasturised is best
She says Ernie I’ll be happy
If it comes up to me chest
And that tickled old Ernie (Ernie)
And he drove the fastest milkcart in the west

December 11, 2009 10:56 pm

Apologies, misspelling of Willis Eschenbach

David A. Burack
December 11, 2009 11:03 pm

“I think we have to accept the depth of our ignorance, for now, and admit that we do not really have a clue… ”
That is a proper attitude toward what is without a doubt one of the most complex natural systems one could choose to study. Had that attitude been maintained by all of the scientists interested in GCC, or at least those instrumental in setting the terms of the debate, it would have maintained the reputation of climatology as a science properly humble in the face of awesome forces and bewilderingly complex interactions.
Now, because of plainly revealed human error–to apply a generous characterization to plainly revealed behavior–this most vital of natural systems seems likely to revert to mean: a scientific backwater, ‘though as ever an immensely interesting one, whilst a reliable and universally accepted record is accumulated, commensurate with the immense periods and geography of global climate and with the importance of the immense human consequences of modifying its dynamics.

Michael
December 11, 2009 11:06 pm

Huffington Post: Anatomy Of The Tea Party Movement
http://www.huffingtonpost
The modern day Tea party movement was started by the Ron Paul movement.
Rachel Maddow: Tea Parties and Ron Paul
http://www.youtube.com/wa
Can people post comments over there and set them straight? I’m banned over there. *****************************************************************
*****************************************************************
Want to hear a true story?
On December 12th, 2009 michaelwise says:
Want to hear a true story that happened to me over the past two days?
Yesterday I posted this on my WUWT blog.
http://wattsupwiththat.co
“Michael (16:44:43) :
I scan the comments to the articles on a daily basis at The Huffington Post et al, to measure what I call the “Mass Brainwash Index”, that publication being one of the best places to get accurate results from the populace. Six months ago my index was at a reading of 9.5, 10 being the most brainwashed and 0 being the least. Today my Index has fallen to a reading of 7.5. Something dramatic is happening to the psyche of the American population.”
Today I post this concerning Huffington post;
“Michael (01:15:38) :
Top story on Huffington Posts Green Tab has this as the first comment about The Copenhagen Summit. Is somebody handing out brains over there?
“Mogamboguru I’m a Fan of Mogamboguru I’m a fan of this user 328 fans permalink
” An Incredibly Expensive F o l l y ”
“Why Failure in Copenhagen Would Be a Success”
CO2 Emissions Cuts Will Cost More than Climate Change Itself
Based on conventional estimates, this ambitious program would avert much of the damage of global warming, expected to be worth somewhere around €2 trillion a year by 2100. However, Tol concludes that a tax at this level could reduce world GDP by a staggering 12.9% in 2100 — the equivalent of €27 trillion a year.
.
It is, in fact, an optimistic cost estimate. It assumes that politicians everywhere in the world would, at all times, make the most effective, efficient choices possible to reduce carbon emissions, wasting no money whatsoever. Dump that far-fetched assumption, and the cost could easily be 10 or 100 times higher.
To put this in the starkest of terms: Drastic carbon cuts would hurt much more than climate change itself. Cutting carbon is extremely expensive, especially in the short-term, because the alternatives to fossil fuels are few and costly. Without feasible alternatives to carbon use, we will just hurt growth.
Secondly, we can also see that the approach is politically flawed, because of the simple fact that different countries have very different goals and all nations will find it hard to cut emissions at great cost domestically, to help the rest of the world a little in a hundred years.”
Me;
Yes, Virginia, there is a Santa Claus.
**************************************
Later today I put up this topic on Daily Paul
http://www.dailypaul.com/
For those at the WUWT blog, the quotes are in the one with the Kid and the one with John Coleman.
True Story.
I don’t know? Maybe I can create a bridge between the WUWT blog and the Daily Paul Blog, a Political blog and a Science blog, to bookmark these threads, and keep the conversation going between these two blogs. The conversation that would come out would be most fascinating.
I think I just invented “Cross Blog Debate”. This is my idea. Please give me credit for it. Thanks.
******************************************************************
Now if we could just get comments to appear on both blogs talking about the same subject at the same time, linking the two in the blogosphere?
Wouldn’t that be a hoot.
*********************************************************************
“Want to Join the Debate” is another invention of mine, a web site managing all the debates all over the world. If you don’t mind me laying claim to it right here right now.
It is a website that brings all the Blogosphere together.
It connects blogs talking about similar subjects together. Tiers of competence are created, voted on by their peers. May the best blog debate WIN!
**********************************************************************
We can track the comment to the millisecond to see who got there first.

boballab
December 11, 2009 11:07 pm

Anyone who can make sense of this, let me know:
I was going through the “raw” data from GISS for State College, PA, when I came across a figure I can’t see how they came up with. First, background for those that don’t know: when you get the data from GISS there are two ways to get it, one by a Postscript file, the other by looking at their txt file and pasting it over.
From there you see the data strung out with headers for each month, followed by the four seasonal means, starting with Dec of the previous year and going forward. So you take the three months, add them together, then divide by 3, and you get the seasonal mean. They add the four means together and divide by 4 to get the annual mean. Now I decided to check how they computed the seasonal mean when they had no data for a month, and this is where things went into the twilight zone.
So there I am looking at 1973 and 1974, both of which had no data for the month of November. 1973 had 17.9C for Sept and 12.9C for Oct, and an S-O-N of 12.3. So then I go and look at 1974: for Sept they had 15.9C and for Oct 9.5C, and an S-O-N of 9.6C. Now can anyone explain how you come up with one seasonal average lower than all inputs, and another only 0.1C above the lowest input?
To my poor tired brain it should be an average of 15.4C for ’73 and 12.7C for ’74.
Then in 1975 they don’t have data for Oct, and when I used the Sept and Nov data provided, what I calculated matches the S-O-N listed of 11.9C. However, when I go to 1984 they don’t have data for Sept, and when I calculate the S-O-N it doesn’t match the data file. My calc: 8.5C; what they got is 11.6C.
To me it looks like USHCN v2 has a bug in it: when the data is missing from the center of the seasonal mean calculation it gives a funky reading, but when it is on either side (i.e., the S or the N in S-O-N) it works fine.
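For what it’s worth, a quick sketch of the naive check being described – just averaging whichever months are present (the figures are the ones quoted above):

```python
# Naive S-O-N mean: average whatever months are actually present.
def son_mean(*months):
    present = [m for m in months if m is not None]
    return round(sum(present) / len(present), 1)

print(son_mean(17.9, 12.9, None))   # 1973, Nov missing -> 15.4 (file says 12.3)
print(son_mean(15.9, 9.5, None))    # 1974, Nov missing -> 12.7 (file says 9.6)
```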

George Turner
December 11, 2009 11:10 pm

Have you tried attributing Nashville’s earlier, higher temperatures to the friction caused by Reconstruction? A link between global warming and institutionalized racism and sectional North/South bigotry would be a win/win in guiltophile circles.
On a serious note, Nashville has grown enormously and I’d expect more UHI on at least some level. Has this already been factored in?
I tried pulling up data for Kentucky but it seems to be a bit scrambled. Mt. Vernon kept pulling up old records for Munfordville, as if some links had been messed up.
This whole thing has me quite worried because if the IPCC claims are true, in a hundred years Kentucky will be as warm as Tennessee, where human life is impossible due to the incredibly high temperatures.

JJ
December 11, 2009 11:17 pm

“So what would appear to be a general cooling trend over the past ~130 years at this location when using the unadjusted HadCRUT3 data, becomes a warming trend when the homogeneity adjustment is supplied.”
This is not correct.
The HadCRUT3 data you are using are not raw data. They are homogenized data.
The GISS homogenization was not applied to HadCRUT3 to arrive at the GISS homogenized series.
If you want to ascertain the effect of the GISS homogenization, you need to compare to raw data, or back it out from the GISS code.

December 11, 2009 11:20 pm

Excellent post, Basil!
Another nail in the coffin of AGW – well done!

Paul Z.
December 11, 2009 11:21 pm

What really bothers me is that Tiger Woods gets crucified for liking sex and women, while the UN and governments of the world get off scot free perpetuating this massive AGW swindle that will cost taxpayers trillions of dollars, even when blatant examples of their fraud are being uncovered daily. So much for journalism. Where is the investigative reporting into this serious matter that will affect millions of lives? Where is the peer-review in the scientific process of climate change? No, instead Tiger Woods gets front page coverage in the News, Sports AND Entertainment sections of all the mainstream media channels. Something is very wrong; a revolution may be required to get rid of these climate change hucksters.

David A. Burack
December 11, 2009 11:22 pm

I wish that I’d excised a couple of instances of “immensely” and one “plainly revealed.”
I want to add one thought about “universally accepted,” since I know that is open to challenge, and a question. Imagine the state of simple astronomical science of the (Copernican) solar system if there had been, say, two different databases showing the distances and times of revolution of the planets. The astronomical “community” would have called a halt to any further prognostication until the predictive ability of one theory or another, or improvement in measurement techniques, purged the faulty data set(s) from the debate, or rectified them. Why did that not happen in this community?

LarryOldtimer
December 11, 2009 11:28 pm

Great work!

L
December 11, 2009 11:32 pm

I’d like to comment, but I’m speechless; great article.

Mike Fox
December 11, 2009 11:35 pm

This post, Willis Eschenbach’s “Smoking Gun”, and Anthony’s on raw/partially cooked/stewed data got me thinking.
Just as with the old “SETI at Home” project, couldn’t we all engage in some “distributed computing” using raw data, if any still exists, and then have it compiled in a central location?
I’m thinking along the lines of Anthony’s surfacestations.org project.
How about if each of us who was a surfacestations surveyor would take the “raw” data from the stations we surveyed, crank it through formulas like Willis and Basil did in their posts, and then upload the results to supplement our surfacestations.org surveys?
Now, I’m not educated in the right field to figure out how to devise the proper analytical tools (I’m a retired lawyer, not a scientist or statistician), but I bet someone could come up with an Excel script (or something) that each one of us could download that would create the right kind of spreadsheet for us to plug the raw data into. We then could let our computers do the work, draw the graphs, and then upload it for all to see.
It seems that there’s a lot of interest and energy out here in WUWT reader land. Analyzing all the HCN stations would be too big a project for folks like Willis Eschenbach, Basil Copeland, Anthony, et al., to take on all by themselves. But if analysis was handed off to all the readers of WUWT, surfacestation.org surveyors, and the like, there might be quite the artillery battery of smoking guns.
Am I making sense here? Is this in any way feasible?
Regards,
Mike
