Would You Like Your Temperature Data Homogenized, or Pasteurized?

A Smoldering Gun From Nashville, TN

Guest post by Basil Copeland

The hits just keep on coming. About the same time that Willis Eschenbach revealed “The Smoking Gun at Darwin Zero,” the UK’s Met Office released a “subset” of the HadCRUT3 data set used to monitor global temperatures. I grabbed a copy of “the subset” and then began looking for a location near me (I live in central Arkansas) that had a long and generally complete station record that I could compare to a “homogenized” set of data for the same station from the GISTemp data set. I quickly, and more or less randomly, decided to take a closer look at the data for Nashville, TN. In the HadCRUT3 subset, this is “72730” in the folder “72.” A direct link to the homogenized GISTemp data used is here. After transforming the row data to column data (see the end of the post for a “bleg” about this), the first thing I did was plot the differences between the two series:

[Figure 1 – differences between the homogenized GISTemp and unadjusted HadCRUT3 series for Nashville, TN. Click to enlarge.]

The GISTemp homogeneity adjustment looks a little hockey-stickish, and induces an upward trend by reducing older historical temperatures more than recent historical temperatures. This has the effect of turning what is a negative trend in the HadCRUT3 data into a positive trend in the GISTemp version:

[Figure 2 – linear trends through the unadjusted HadCRUT3 and homogenized GISTemp Nashville series. Click to enlarge.]

So what appears to be a general cooling trend over the past ~130 years at this location in the unadjusted HadCRUT3 data becomes a warming trend once the homogeneity adjustment is applied.
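For readers who want to reproduce this comparison, here is a minimal sketch in Python. It assumes the two Nashville series have already been reshaped into one monthly value per row (see the “bleg” at the end of the post); the file names are placeholders, not the names used in the released archives.

```python
# Sketch only: compare the linear trend in the unadjusted and homogenized
# monthly series for a single station. File names and column layout are
# assumptions; the released data must first be reshaped to one value per row.
import numpy as np
import pandas as pd

def monthly_trend(csv_path):
    """Return the OLS trend (degrees C per decade) of a one-column monthly series."""
    temps = pd.read_csv(csv_path, header=None).iloc[:, 0].astype(float).values
    months = np.arange(len(temps))
    slope, _intercept = np.polyfit(months, temps, 1)   # degrees C per month
    return slope * 120.0                               # degrees C per decade

# Hypothetical file names for the two Nashville series (HadCRUT3 subset vs GISTemp):
print("HadCRUT3 trend:", monthly_trend("nashville_hadcrut3.csv"))
print("GISTemp  trend:", monthly_trend("nashville_gistemp.csv"))
```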

“There is nothing to see here, move along.” I do not buy that. Whether or not the homogeneity adjustment is warranted, it has an effect that calls into question just how much the earth has in fact warmed over the past 120-150 years (the period covered, roughly, by GISTemp and HadCRUT3). There has to be a better, more “robust” way of measuring temperature trends, one that is not so sensitive that it turns negative trends into positive trends (which we’ve seen it do twice now, first with Darwin Zero, and now here with Nashville). I believe there is.

Temperature Data: Pasteurized versus Homogenized

In a recent series of posts, here, here, and with Anthony here, I’ve been promoting a method of analyzing temperature data that reveals the full range of natural climate variability. Metaphorically, this strikes me as trying to make a case for “pasteurizing” the data, rather than “homogenizing” it. In homogenization, the object is to “mix things up” so that it is “the same throughout.” When milk is homogenized, this prevents the cream from rising to the top, thus preventing us from seeing the “natural variability” that is in milk. But with temperature data, I want very much to see the natural variability in the data. And I cannot see that with linear trends fitted through homogenized data. It may be a hokey analogy, but I want my data pasteurized – as clean as it can be – but not homogenized so that I cannot see the true and full range of natural climate variability.

I believe that the only way to truly do this is by analyzing, or studying, how differences in the temperature data vary over time. And they do not simply vary in a constant direction. As everybody knows, temperatures sometimes trend upwards, and at other times downward. Studying how the differences vary over time allows us to see this far more clearly than simply fitting trend lines to undifferenced data. In fact, it can prevent us from reaching the wrong conclusion, such as fitting a positive trend when the real trend has been negative. To demonstrate this, here is a plot of monthly seasonal differences for the GISTemp version of the Nashville, TN data set:

[Figure 3 – monthly seasonal differences for the GISTemp Nashville, TN series, with Hodrick-Prescott trend. Click to enlarge.]

Pay close attention as I describe what we’re seeing here. First, “sd” means “seasonal differences” (not “standard deviation”). That is, it is the year-to-year variation in each monthly observation, for example October 2009 compared to October 2008. Next, the “trend” is the result of Hodrick-Prescott smoothing (lambda = 14,400). The type of smoothing here is not as critical as the decision to smooth the seasonal differences. If a reader prefers a different smoothing algorithm, have at it. Just make sure you apply it to the seasonal differences, and that it does not change the overall mean of the series. That is, the mean of the seasonal differences for GISTemp’s Nashville, TN data set is -0.012647, whether smoothed or not. The smoothing simply helps us to see, a little more clearly, the regularity of warming and cooling trends over time. Now note clearly the sign of the mean seasonal difference: it is negative. Even in the GISTemp series, Nashville, TN has spent more time cooling (imagine here periods where the blue line in the chart above is below zero) than warming over the last ~130 years.
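To make the method concrete, here is a minimal sketch of the seasonal-differencing and smoothing step, assuming the station record is already a single column of monthly values. It uses the Hodrick-Prescott filter from statsmodels with the lambda of 14,400 quoted above; the synthetic series at the bottom is only a stand-in for the real Nashville data.

```python
# Minimal sketch: lag-12 (seasonal) differences of a monthly series and their HP trend.
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

def seasonal_diff_trend(temps, lamb=14400):
    """Year-over-year (lag-12) differences and their Hodrick-Prescott trend."""
    temps = pd.Series(temps, dtype=float)
    sd = temps.diff(12).dropna()              # e.g. October 2009 minus October 2008
    cycle, trend = hpfilter(sd, lamb=lamb)    # HP filter: sd = cycle + trend
    return sd, trend

# Synthetic stand-in for a real station record (50 years of monthly values):
rng = np.random.default_rng(0)
months = np.arange(600)
fake_monthly = 10 + 8 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.5, 600)

sd, trend = seasonal_diff_trend(fake_monthly)
print("mean seasonal difference:", sd.mean())      # the statistic discussed in the post
print("mean of the HP trend    :", trend.mean())   # smoothing leaves the mean essentially unchanged
```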

How can that be? Well, the method of analyzing differences is less sensitive – i.e., more “robust” – than fitting trend lines through the undifferenced data. “Step” type adjustments, such as the homogeneity adjustments, affect only a single data point in the differenced series, but they affect every data point before (or after) the point at which they are applied in the undifferenced series. We can see the effect of the GISTemp homogeneity adjustments here by comparing the previous figure with the following:

[Figure 4 – monthly seasonal differences for the HadCRUT3 Nashville, TN series, with Hodrick-Prescott trend. Click to enlarge.]

Here, in the HadCRUT3 series, the mean seasonal difference is more negative: -0.014863 versus -0.012647. The GISTemp adjustment increases the average seasonal difference by 0.002216, making it less negative, but not enough to make the result positive. In both cases we still come to the conclusion that, on the average, monthly seasonal differences in temperatures in Nashville have been negative over the last ~130 years.
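To illustrate the robustness point above, here is a small synthetic demonstration (not the Nashville data): a single step adjustment manufactures a trend in the level series, but alters exactly one point of the differenced series.

```python
# Synthetic illustration: a step adjustment vs. level and differenced series.
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(240)
series = rng.normal(0.0, 0.3, 240)     # 240 "months" of a flat, trendless series

# Apply a single step adjustment: cool everything before the midpoint by 0.5
adjusted = series.copy()
adjusted[:120] -= 0.5

# The step manufactures a warming trend in the level (undifferenced) data...
trend_raw = np.polyfit(x, series, 1)[0] * 120      # per "decade", treating the index as monthly
trend_adj = np.polyfit(x, adjusted, 1)[0] * 120
print("trend before step adjustment:", round(trend_raw, 4))
print("trend after  step adjustment:", round(trend_adj, 4))

# ...but changes exactly one point in the differenced series
d_raw, d_adj = np.diff(series), np.diff(adjusted)
print("differenced points changed:", int(np.sum(~np.isclose(d_raw, d_adj))))
```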

An Important Caveat

So have we actually shown that, at least for Nashville, TN, there has been no net warming over the past ~130 years? No, not necessarily. The average monthly seasonal difference has indeed been negative over the past 130 years. But it may have been becoming “less negative.” Since, at this point, I have more confidence in the integrity of the HadCRUT3 data than in the GISTemp data, I’ll discuss this solely in the context of the HadCRUT3 data. In both the “original data” and in the blue “trend” shown in the above figure, there is a slight upward trend over the past ~130 years:

[Figure 5 – linear fit through the HP-smoothed seasonal differences, HadCRUT3 Nashville, TN. Click to enlarge.]

Here, I’m only showing the fit relative to the smoothed (trend) data. (It is, however, exactly the same as the fit to the original, or unsmoothed, data.) Whereas the average seasonal difference for the HadCRUT3 data was -0.014863, the fit through the data gives only -0.007714 at the end of the series (October 2009). Still cooling, but less so, and in that sense one could argue that there has been some “warming.” And overall – i.e., if a similar kind of analysis is applied to all of the stations in the HadCRUT3 data set (or “subset”) – I will not be surprised if there is some evidence of warming. But that has never really been the issue. The issue has always been (a) how much warming, and (b) where has it come from?
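Here is a minimal sketch of that “fit through the smooth”: an ordinary least squares line through the HP-smoothed seasonal differences, evaluated at the last observation. The array below is a synthetic stand-in; the -0.014863 and -0.007714 figures quoted above come from the actual HadCRUT3 Nashville series, not from this example.

```python
# Sketch of the "trend in the trend": OLS through the smoothed seasonal differences.
import numpy as np

def trend_in_trend(smoothed_sd):
    """Return the mean of the smoothed seasonal differences and the OLS fitted value at the end."""
    smoothed_sd = np.asarray(smoothed_sd, dtype=float)
    t = np.arange(len(smoothed_sd))
    slope, intercept = np.polyfit(t, smoothed_sd, 1)
    return smoothed_sd.mean(), slope * t[-1] + intercept

# Synthetic stand-in for the HP trend of the seasonal differences:
t = np.arange(1500)
fake_trend = -0.02 + 0.00001 * t + 0.05 * np.sin(t / 40.0)

mean_sd, end_fit = trend_in_trend(fake_trend)
print("mean seasonal difference      :", round(mean_sd, 6))
print("fitted value at end of series :", round(end_fit, 6))
```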

I suggest that the above chart showing the fit through the smooth helps define the challenges we face on these issues. First, the light gray line depicts the range of natural climate variability on decadal time scales. This much of the variation – and it is a great deal of it – is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases. If there is any anthropogenic impact here, it is in the blue line, which is in effect a trend in the trend. But even that is far from certain, for before we can conclude that, we have to rule out natural climate variability on centennial time scales. And we simply cannot do that with the instrumental temperature record, because it isn’t long enough. I hate to admit that, because it means either that we accept the depth of our ignorance here, or we look for answers in proxy data. And we’ve seen the mess that has been made of things in trying to rely on proxy data. I think we have to accept the depth of our ignorance, for now, and admit that we do not really have a clue about what might have caused the kind of upward drift we see in the blue trend line in the preceding figure. Of course, that means putting a hold on any radical socioeconomic transformations based on the notion that we know what in truth we do not know.

203 Comments
December 12, 2009 3:15 am

As each new bit of data pops up and shows that it’s been tampered with, I’m increasingly concerned at the scale of what’s going on.
I thought it was primarily about clique behaviour and group-think acceptance of an argument which had gained popular traction. I really didn’t think that ANYONE would fiddle with the actual temperature records.
Making up models to suit their PowerPoint slides and selective use of start points is one thing – but to actually rewrite history is simply stunning. And so stupid – how did they expect to get away with this?
They do say that if you’re going to tell a lie, tell a big one – but this?!?
It’s like Dr Shipman and his hundreds of victims or 9/11 – no one would have believed it was real.

Malaga View
December 12, 2009 3:22 am

Wonderful research and analysis….
So the averages have been “cooked”… nothing can be taken at face value… we have to go back to basics… look for the daily maximum and minimum temperatures to see the real trends… the averages hide way too much information on a local, regional and global scale…
Just goes to prove “Who controls the past controls the future”….
And we now know “Who controls the present controls the past”…

Lulo
December 12, 2009 3:39 am

Admittedly slightly OT, but you’ll love this editorial in the Calgary Herald.
http://www.calgaryherald.com/technology/Real+scientists+should+care+more+about+fraud/2333666/story.html

NickB.
December 12, 2009 3:46 am

Alexander,
Welcome!

December 12, 2009 3:56 am

“But even that is far from certain, for before we can conclude that, we have to rule out natural climate variability on centennial time scales. And we simply cannot do that with the instrumental temperature record, because it isn’t long enough.”
I’d like to see this derivative analysis (“trend of a trend”) for the dozen or two continuous records that do go back to the 1700s or even 1600s (Central England), a century or more longer than the 1881 GISS cut-off date. What equation do I use in Excel to calculate it?
Fahrenheit’s scale of 1724 made thermometers fairly accurate. He set salted ice water at 0° and healthy body temperature (of a horse) at 100°. When the variation in the boiling point of water with altitude was understood, the scale was altered so that water boiled at 212° (half the degrees in a full circle above ice water at 32°). Thus by 1800, or even 1750, thermometers were fairly good. Even if absolute values were off for a given thermometer, even the original body temperature scale should be quite good at measuring changes.
From Wikipedia: “An individual’s body temperature typically changes by about 0.5 °C (0.9 °F) between its highest and lowest points each day.”
My temperature with a drugstore digital thermometer under my tongue came out as 98.1° F instead of the standard value of 98.6° F. Body temperature isn’t good enough for climatology. But if I’m off by 1° F at the high point of my 0-100° F calibration, then the length of my degree is only off by 1%, so I can still measure temperature changes quite accurately.
For those interested, these two plotting sites allow listings in terms of how old records are:
http://rimfrost.no
http://www.appinsys.com/GlobalWarming/climate.aspx
And this site has collected many plots as images: http://climatereason.com
Most old records are quite linear and show no clear AGW signal even with urban heating not subtracted out. A few (especially De Bilt and Uppsala) do show a recent upswing. And a few (e.g. Paris) show linear trends that alter at some point to a different slope.
I’ll post my own long-record plots, made merely as appealing illustrations rather than analytical figures. They are nevertheless accurate, aside from a couple of sadly truncated records having had 2-5 years of the very latest data tacked on from GISS, after adjusting it to match the last available year in the super-long data set (New York is one, and Minneapolis is one I’d still like to treat this way):
http://i47.tinypic.com/2zgt4ly.jpg
http://i45.tinypic.com/125rs3m.jpg

J.Hansford
December 12, 2009 4:06 am

“Would You Like Your Temperature Data Homogenized, or Pasteurized?”
Well I think they thought they could homogenize it and sneak it Past-our-eyes….. 😉

Nigel S
December 12, 2009 4:24 am

MattB (02:11:30)
Glad you liked it but things are pretty secure humo(u)rwise over there. Have you ever watched the Simpsons? Possibly the most brilliant thing on TV ever.

PC
December 12, 2009 4:27 am

I am having trouble understanding why observed temperatures over the last 100 years show no indications of warming…not even the hint of a hockey stick. The Australian Bureau of Meteorology allows you to look at the mean maximum and mean minimum temperatures for any year, and to compare the plots of any 2 years.
If you look at the records, the temperatures in say 1890 show little or no difference with 2009…in some cases 1890 would be slightly higher. This is true for both city and country.
Glen Innes (country town) and Sydney Observatory (major city) are two good examples.
And yet, the temperature chart for the state of New South Wales (both Glen Innes and Sydney are in New South Wales) shows a pronounced “hockey stick” warming since the 1970s.
How can this be?

December 12, 2009 4:32 am

There is no way to make heads or tails of the temperature record. And with each passing day, isn’t that the point? They have bent, spindled and mutilated the surface temperature data so they can present any fairy tale they choose. It’s really disgusting that this would then show up as science.
It’s going to take years to try and sort it all out, if it’s even possible to do so.
We need first a world version of SurfaceStations.org to even get started.
And then some agreed upon rules.

December 12, 2009 4:33 am

I have just downloaded the CRU data and was looking through the UK data. Many of the stations only have data over a limited period. For example the CRU data for Oxford UK contains data from 1900 to 1980, yet the Met Office site for Oxford covers 1853 to 2009. Why is the CRU data so truncated I wonder?

Ronaldo
December 12, 2009 4:35 am

The HadCRUT3 data set is not the raw data. Here is how the Met Office describe the data.
http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
“The data that we are providing is the database used to produce the global temperature series. Some of these data are the original underlying observations and some are observations adjusted to account for non climatic influences, for example changes in observations methods”
What changes have been made?

Hank Henry
December 12, 2009 4:49 am

not only homogenized but fortified with vitamins.

H.R.
December 12, 2009 4:57 am

JustPassing (02:38:55) :
“I think I’ll send the CRU at East Anglia a nice box of fudge for Xmas.”
Bingo! If your lead was followed… a couple of thousand boxes of fudge arriving at CRU would certainly make a point.

December 12, 2009 5:05 am

Instead of just picking one station here and there, why not look at the effect of the adjustments overall? This is what Giorgio Gilestro did for the GHCN set that Willis analysed for Darwin alone. He shows the distribution of the effects of the adjustment on trend. It looks quite symmetric. Adjustments are just as likely to cool as to heat.

Johnny Bombenhagel
December 12, 2009 5:08 am

It’s a Climategate Christmas:

Hahaha!

rbateman
December 12, 2009 5:09 am

DeNihilist (00:01:26) :
Yes, it brings back memories.
And today’s weather is not at all unlike 1970’s.
Full circle.

Alec, a.k.a Daffy Duck
December 12, 2009 5:10 am

“Would You Like Your Temperature Data Homogenized, or Pasteurized?” ??????
Just before I clicked here I was reading the emails…I searched and did not find that the following had ever been posted here:
“One way would be to note that the temperature amplitude (1000 – 1950)
for each is ~1.5°C. Thus you could conclude that hemispheric/global
climate varied by over a degree Celcius (although with regional
differences)
Another way would be to average the records. The resulting temperature
amplitude would be smaller because extremes would cancel since
variability is large and each region’s extremes occur at different
times.
Thus, if people simply looked at several records they would get the
impression that temperature variations were large, ~1.5°C. Imagine
their surprise when they see that the ensemble averages you publish
have much smaller amplitude. ”
http://climate-gate.org/email.php?eid=219&keyword=their%20surprise%20when%20they%20see%20that%20the%20ensemble%20averages%20you%20publish
……..
hmmm… Imagine their surprise!!!!

December 12, 2009 5:13 am

“…for example changes in observations methods”
Would that include “viewing objectively”…?

Anand Rajan KD
December 12, 2009 5:18 am

I have a question:
Why study the temperature at all? Temperature is a weather phenomenon – to begin with. It flickers and swings too much and thus requires mathematical ‘handling’ to make sense of. The warmists will always set up filters that warm up things because there is no baseline climate measurement paradigm to begin with. Using temperature seems to have its roots in the use of the phrase ‘global warming’.
Why not study only proxies? Arriving at a new compromise proxy that all agree on shouldn’t be contentious and it will have the added advantage of both camps not knowing what way things will trend (for both past and future) when the analysis is performed. Drastic political action should be taken only if the proxy satisfies predetermined conditions (which must still be subject to revision with improving scientific understanding).
Mann chopped off Briffa’s post-1960 part because it trended downwards w.r.t. the temperatures. In reality, it is a basic fallacy to do so, given that tree ring reconstructions are the more long-term secular record of the two. Questioning the longer tree record (with all its inherent problems) against a much shorter thermometer record (which has its own problems, no end) doesn’t sound right.
Throw the temperature blade away and see if the secular proxy still bends only once to form a hockey stick. If it doesn’t – no unprecedented warming.

December 12, 2009 5:23 am

Raw, al dente or stewed?

Basil
Editor
December 12, 2009 5:26 am

ScottA (02:14:06) :
Did I miss the bleg for transforming row to column data?

Yes, you did! I had to go in to work last night, and in my haste to send this to Anthony, I forgot to add that. So let’s start with that, and I’ll have comments on some of the other posts later.
Have any ideas on how to do this? Years ago, I programmed in REXX, a great language for string processing, and it would be a snap to write up a script to do this in REXX. I suppose it wouldn’t be too hard to do in any language, but I haven’t programmed in years. I guess I was wondering if anyone had a suggestion for the easiest way to code something up to do this. All of the HadCRUT3 data is in this form, as is the GISTemp data. But I need it in straight column format for reading by my stat routines. What I did for this post was manually cut and paste it into Excel using the “transpose” option.
I suspect a true linux/unix geek could code up a bash script to do it as well, and I have access to linux machines to run code on if somebody were to give me a bash script that would do it.
Ignoring differences between HadCRUT3 and GISTemp in the header lines before the data of interest start — those lines just need to be deleted — when we get to the data, it is of the format
YYYY xxxx xxxx …. xxxx
where YYYY is the year, and the xxxx are monthly figures (except for GISTemp, which has a bunch of averages after the December figure). In either case, the object is to parse the line into “words” and write the 2nd through 13th word out line by line to a new file. (Well, I’d need to probably write out the first YYYY, maybe as a comment, at the start of the new file, since the data do not all start at the same year.) I realize this is a very trivial exercise…to anyone who codes daily.
So the bleg was just to ask for some ideas about how to do this. Lots of bright and capable people read this blog. Paul Clark (WoodForTrees) has to do this to read the GISS data for his web site, so he probably has some C++ code that does it, but it would have to be modified some for HadCRUT and the GISS station data. And I do not have a C++ IDE/compiler installed anyway.
Anyway, that’s the issue. With the release of the HadCRUT3 “subset,” there is a lot of data that would be interesting to look at in detail. But with the tools that I use, I need to come up with a more efficient way to handle this row to column transformation.
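One possible answer to the “bleg” above is a short Python script (in place of the REXX or bash options mentioned) along these lines. The file names are placeholders, and it assumes the header lines have already been deleted, as described in the comment.

```python
# Sketch: turn rows of "YYYY jan feb ... dec [extras]" into one monthly value per line.
import sys

def rows_to_column(infile, outfile):
    with open(infile) as fin, open(outfile, "w") as fout:
        first_year_written = False
        for line in fin:
            words = line.split()
            if len(words) < 13 or not words[0].isdigit():
                continue                              # skip anything that is not a data row
            if not first_year_written:
                fout.write("# start year: %s\n" % words[0])
                first_year_written = True
            for value in words[1:13]:                 # the 12 monthly values; ignore any trailing averages
                fout.write(value + "\n")

if __name__ == "__main__":
    # e.g. python rows2col.py 72730.txt 72730_column.txt
    rows_to_column(sys.argv[1], sys.argv[2])
```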

Basil
Editor
December 12, 2009 5:38 am

ralph (02:15:28) :
Do I spy an inverse hockey-stick developing in one of the graphs above?
http://wattsupwiththat.files.wordpress.com/2009/12/nashville-figure5.png

That’s an “artifact” of HP smoothing. All smoothing methods have issues with handling of data at the ends of the series. With some smooths, the series is padded with extra data near the ends, but one has to make assumptions about that extra data. HP doesn’t do this. It is a least squares approach, and it produces an unbiased smooth, in the sense that a linear regression through the smooth has exactly the same fit as a linear regression through the unsmoothed data.
Looking at the figure, we’re likely at the bottom of a cycle in the data. As the trend turns back up, the integration of new data into the smooth will make the turning point higher, removing the “blade” that reminds you of a hockey stick. That said, it still may be that the depth of this latest cooling episode will turn out to be more extreme than previous episodes. But the main point is that with HP smoothing of this type of data, you cannot read much of anything into the half of a cycle that begins and ends the series.
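The “unbiased smooth” property described above can be checked numerically on synthetic data: an OLS line through the HP trend has the same slope and intercept as an OLS line through the raw series, because the HP cycle is orthogonal to constants and linear trends. A minimal sketch, using the same lambda as in the post:

```python
# Numerical check: OLS fit through the HP trend equals OLS fit through the raw data.
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(2)
n = 1200
y = np.cumsum(rng.normal(0, 0.2, n)) + 0.001 * np.arange(n)   # noisy, drifting synthetic series

cycle, hp_trend = hpfilter(y, lamb=14400)

t = np.arange(n)
fit_raw = np.polyfit(t, y, 1)
fit_smooth = np.polyfit(t, hp_trend, 1)
print("OLS slope through raw data :", fit_raw[0])
print("OLS slope through HP smooth:", fit_smooth[0])   # identical up to floating-point error
```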

bill
December 12, 2009 5:44 am

From another thread, this seems an extremely important analysis:
http://www.gilestro.tk/2009/lots-of-smoke-hardly-any-gun-do-climatologists-falsify-data/
The adjustments over the whole GHCN series add up to
NO BIAS
A selection of random long series stations GISS and GISS homogenised compared
http://img11.imageshack.us/img11/7440/gissrawtemps.jpg

Basil
Editor
December 12, 2009 5:45 am

David (22:20:43) :
I’ll issue a gripe here. Annualizing the temperatures. Why? There are four seasons every year. If you want to make the data less choppy, fine, but at least keep it in its natural rhythm. Do the Grand Poobahs have a reason for imposing the calendar’s will on the data?

While the rate of growth is annual, the seasonality is all still there, month by month. The monthly “natural rhythm” is present in the gray data labeled “(original data)” underlying the blue “(trend)”. But the blue trend is a “natural rhythm” as well, just on a time scale of a decade or so.

Eric Gamberg
December 12, 2009 5:55 am

Mike Fox (23:35:01) :
Mike, take a look at the plots at these two stations in the surfacestations database:
http://gallery.surfacestations.org/main.php?g2_itemId=1557
http://gallery.surfacestations.org/main.php?g2_itemId=27418&g2_imageViewsIndex=1