Rewriting History, Time and Time Again

A guest post by John Goetz

In February I wrote a post asking How much Estimation is too much Estimation? I pointed out that a large number of station records contained estimates for the annual average. Furthermore, the number of stations used to calculate the annual average had been dropping precipitously for the past 20 years. One was left to wonder just how accurate the reported global average really was and how meaningful rankings of the warmest years had become.

One question that popped into my mind back then was whether or not – with all of the estimation going on – the historical record was static. One could reasonably expect that the record is static. After all, once an estimate for a given year is calculated there is no reason to change it, correct? That would be true if your estimate did not rely on new data added to the record, in particular temperatures collected at a future date. But in the case of GISStemp, this is exactly what is done.

Last September I noted that an estimate of a seasonal or quarterly temperature when one month is missing from the record depends heavily on averages for all three months in that quarter. This can be expressed by the following equation, where ${m}_{a}, {m}_{b}, {m}_{c}$ are the months in the quarter (in no particular order) and one of the three months ${m}_{a}$ is missing:

${T}_{q,n} = \frac{1}{3}{\overline{T}}_{{m}_{a},N} + \frac{1}{2}\left({T}_{{m}_{b},n} + {T}_{{m}_{c},n}\right) - \frac{1}{6}\left({\overline{T}}_{{m}_{b},N} + {\overline{T}}_{{m}_{c},N}\right)$

In the above, T is temperature, q is the given quarter, n is the given year, and N is all years of the record.

One can readily see that as new temperatures are added to the record, the average monthly temperatures will change. Because those average monthly temperatures change, the estimated quarterly temperatures will change, as will the estimated annual averages.
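
To make the mechanism concrete, here is a minimal Python sketch of the equation above. The station values, the Sep-Oct-Nov quarter, and the name `estimate_quarter` are all invented for illustration; none of this is taken from the GISS code. Algebraically, the equation is the same as filling in the missing month with its long-term mean plus the average anomaly of the two known months, and then averaging the three months.

```python
from statistics import mean


def estimate_quarter(record, year, missing, present):
    """Estimate the quarterly mean for `year` when month `missing` has no value.

    `record` maps a month name to {year: temperature}. The long-term means
    are taken over every year currently in the record (N in the equation).
    """
    month_b, month_c = present
    mean_a = mean(record[missing].values())   # long-term mean of the missing month
    mean_b = mean(record[month_b].values())
    mean_c = mean(record[month_c].values())
    t_b = record[month_b][year]               # observed values in the given year
    t_c = record[month_c][year]
    return mean_a / 3 + (t_b + t_c) / 2 - (mean_b + mean_c) / 6


# Hypothetical station record: September 1990 is missing.
record = {
    "sep": {1988: 15.0, 1989: 16.0},
    "oct": {1988: 10.0, 1989: 11.0, 1990: 12.0},
    "nov": {1988: 5.0, 1989: 6.0, 1990: 7.0},
}

before = estimate_quarter(record, 1990, "sep", ("oct", "nov"))

# A new year of data arrives. Nothing about 1990 itself has been re-measured.
record["sep"][1991] = 18.0
record["oct"][1991] = 13.0
record["nov"][1991] = 8.0

after = estimate_quarter(record, 1990, "sep", ("oct", "nov"))

print(round(before, 2), round(after, 2))  # about 11.83, then about 11.94
```

In this made-up case the autumn 1990 estimate moves by roughly a tenth of a degree solely because 1991 was appended to the record.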

Interestingly, application of the “bias method” used to combine a station’s scribal records can have a ripple effect all the way back to the beginning of a station’s history. This is because the first annual average in every scribal record is estimated, and the bias method relies on the overlap between all years of record, estimated or not. Recall that annual averages are calculated from December of the prior year through November of the current year. However, all scribal records begin in January (well, I have not found one that does not begin in January), so that first winter average is estimated due to the missing December value. Thus, with the bias method, at least one of the two records contains estimated annual values.

Of course, it is fair to ask whether or not this ultimately has any effect on the global annual averages reported by GISS. One does not have to look very hard to find out that the answer is “yes”.

On March 29 I downloaded the GLB.Ts.txt file from GISS and compared it to a copy I had from late August 2007. I was surprised to find several hundred differences in monthly temperature. Intrigued, I decided to take a trip back in time via the “Way Back Machine”.

Here I found 32 versions of GLB.Ts.txt going back to September 24, 2005. I was a bit disappointed the record did not go back further, but was later surprised at how many historical changes can occur in a brief 2 1/2 years. The first thing I did was eliminate versions where no changes to the data were made. I then compared the number of monthly differences between the remaining sequential records and built the following table. Here I show the “Prior” record compared to the next sequential record (referred to as “Current”). The number of changes made to the monthly record between Prior and Current is shown in the “Updates” column (this column does not count additions to the record – only changes to existing data are counted). The number of valid months contained in the Prior record is in the “Months” column. “Change” is simply the percent Updates made to Months.
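
The bookkeeping behind the Updates, Months and Change columns can be sketched roughly as follows. This is an illustration only, not the script actually used for the table: the function name `compare_versions`, the dictionary layout, and every value except the Aug 2006 pair are assumptions. Anomalies are held as integer hundredths of a degree so that equality tests are exact.

```python
def compare_versions(prior, current):
    """Count changes to existing monthly values between two file versions.

    `prior` and `current` map (year, month) to an anomaly in hundredths of a
    degree C, or None where the file has no valid value (assumed layout).
    Returns (updates, months, change_percent).
    """
    # "Months": valid monthly values present in the Prior record.
    valid = {key: value for key, value in prior.items() if value is not None}

    # "Updates": existing values that differ in Current. Months that are new
    # in Current are additions and are deliberately not counted.
    updates = sum(
        1
        for key, value in valid.items()
        if current.get(key) is not None and current[key] != value
    )

    months = len(valid)
    change = 100.0 * updates / months if months else 0.0
    return updates, months, change


# Hypothetical excerpt; only Aug 2006 (+0.43 to +0.70) comes from the post.
prior = {(2006, 7): 50, (2006, 8): 43, (2006, 9): 60, (2006, 10): None}
current = {(2006, 7): 50, (2006, 8): 70, (2006, 9): 60, (2006, 10): 55, (2006, 11): 58}

updates, months, change = compare_versions(prior, current)
print(updates, months, round(change, 1))  # 1 update out of 3 valid months
```

Running this over each consecutive pair of archived files would yield one table row per pair.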

On average 20% of the historical record was modified 16 times in the last 2 1/2 years. The largest single jump was 0.27 C. This occurred between the Oct 13, 2006 and Jan 15, 2007 records when Aug 2006 changed from an anomaly of +0.43 C to +0.70 C, a change of nearly 63%.

Wow.

The next question I had was “how often are the months within specific years modified?” As can be seen in the next chart, a surprising number of the earliest monthly averages are modified time and again.

I was surprised at how much of the pre-Y2K temperature record changed! My personal favorite change was between the August 16, 2007 file and the March 29, 2008 file. Suddenly, in the later file, the J-D annual temperature for 1880 could now be calculated. In all previous versions the temperature could not be determined.

But some will want to know only how this process affects the rankings for the top 10 warmest years. Because the history goes back to the middle of 2005, I explored this question only for the years before 2005. While the overall ranking from top to bottom does change from one record to the other, the top 10 prior to 2005 does not change much. However, the top two do exchange position frequently, as can be seen from the following table:

I will note that the overall trend in changes between now and Sep. 24, 2005 is very close to zero. If one compares the latest file with the one from Sep. 24, 2005, it can be seen that the earliest and latest years are adjusted lower today than in 2005, while the middle years are adjusted higher. However, this is purely coincidence. If one compares the file from Aug. 2007 with the latest file, it appears the earliest temperatures have been adjusted downward, leading to an overall upward trend. Surely other comparisons will yield a downward trend. It is by pure chance that we have selected two endpoint datasets that appear to have no effect on the trend.

In the meantime, will the real historical record please stand up?

35 Comments

Mike Bryant

April 8, 2008 1:08 pm

George Orwell would be proud. He’d also be proud of this:
BBC changes global warming article title…

mark wagner

April 8, 2008 1:32 pm

Hansenize. Verb. To make a correction to historical records in such a way that the correction has the effect of increasing the error that the correction purports to correct.
Serious question: At what point does this become fraud?

Retired Engineer

April 8, 2008 2:01 pm

At what point do we say we really don’t know what’s going on? When does the error exceed the claimed trend? The station survey raises questions about their observed trends. “Urbanization causes local temperatures to rise.” How useful is that?
Until now, this was mostly an academic debate. When the government imposes restrictions with substantial economic impact, it takes on a far greater meaning.
Worse, the mandated shift to biofuels, particularly ethanol, has caused food prices to rise, and a shortage of hops.
Which raises the price of beer. Now that’s serious.

edaniel

April 8, 2008 2:12 pm

I prefer to call these changes to measured data modifications. Corrections and adjustments have meanings that imply some sort of valid reference is employed in the process.

MattN

April 8, 2008 2:15 pm

Actually I think “Hansenize” would be a transitive verb of “to hansen”.
“He hansenized the data so that his boss would think his project was on track”.
The hansenalization of climate continues….

Robert Coté

April 8, 2008 2:29 pm

Retired Engineer (14:01:58) :
At what point do we say we really don’t know what’s going on? When does the error exceed the claimed trend?
Tough question. The speed of light, supposedly a constant has been measured for centuries. Over that time the measured value has greatly exceeded the maximum cumulative error for the various methods. Just like Einstein’s incorrect claim that gravity fields and acceleration fields were identical perhaps temperature readings in the past are not as permanent as we would like.
That said, our genial host has made a clear case that there exist asymmetrical biases in forward calculations of empty fields (missing data).
A shortage of hops? And here I was previously unable to put into words my general unease with the dangers of actually addressing the theory of AGW. This is serious.

Ric Werme

Editor

April 8, 2008 2:40 pm

Oh come on. What you’re saying is that we won’t know what the past temperatures were until we have all the future ones. That means all the papers on past temperatures may now be wrong or may become wrong in the future. Or may become right again. I think there’s a word for this, and it’s not Science.
You note there was a cooling trend in the 1998 & 2002 temps. Well, of course. The data recorded since the hottest years on record will be cooler, so the bias adjustment will have to make things cooler. Note also that given the warming trend over the last few decades, most old temperatures should be getting adjusted upward. So things have “actually” gotten hotter faster than we thought. Panic now before it’s too late! Oh no – it’s too late already. What comes after panic?
Pet peeve time. (1) There are several references to the only science that is good is that which has been published after peer review. This argument has been used to denigrate the substantial efforts of professional scientists, amateur scientists, and I suppose even non-scientists. (2) One thing I can’t stand about sports talk radio is people who vehemently complain about flubbed plays in various games. I figure I can’t do better than the athletes involved, so I’m not going to complain about them.(*) (1+2) I’m a software engineer, I’m not a scientist. I could be a scientist, and I could be a good one. And if I used this sort of estimation I’d hope my software engineer persona would excoriate me because he could do a lot better.
(*) There are a couple exceptions, one person who picked a _very_ bad time to let a ground ball roll between his legs threw out the first pitch at the Red Sox home opener. That may have been the only play I could have handled better. However, he should have been forgiven long ago. Well, at least by late 2004. See http://wermenh.com/runnings.html

Tom Johnson

April 8, 2008 3:08 pm

I very much wish you could get into contact with statistician William M. Briggs (wmbriggs.com/blog/).
He has written a remarkable paper which attempts to quantify the uncertainty involved in doing weather analysis and prediction. The paper is “Quantifying Uncertainty in AGW”.
The abstract follows:
A month does not go by without some new study appearing in a peer-reviewed journal which purports to demonstrate some ill effect that will be caused by global warming. The effects are conditional on global warming being true, which is itself not certain, and which must be categorized and bounded. Evidence for global warming is in two parts: observations and explanations of those observations, both of which must be faithful, accurate, and useful in predicting new observations. To be such, the observations have to be of the right kind, the locations and timing where and when they were taken should be ideal, and the measurement error should be negligible. The physics of our explanations, both of motion and e.g. heat, must be accurate, the algorithms used to solve and approximate the physics inside software must be good, chaos on the time scale of predictions must be unimportant, and there must be no experimenter effect. None of these categories is certain. As an exercise, bounds are estimated for their certainty and for the unconditional certainty in ill effects. Doing so shows that we are more certain than we should be.
REPLY: I have his email, I’ll give him a shout.

Doug

April 8, 2008 4:01 pm

well the enviroevangelists have an agenda – truth should not get in their way
great article in our news today;
climate change is based on “over-certainty in the absence of convincing argument and data” and “over-reliance on computer models”.
http://www.theaustralian.news.com.au/story/0,25197,23509775-2702,00.html

Philip_B

April 8, 2008 4:18 pm

The very considerable irony in these estimates (of missing data) is that were Global Warming really global, they wouldn’t be needed.
Let me explain.
If the warming was occurring in most places by about the same amount, then any sufficiently large set of measurements over time would clearly show the amount of warming (over that period of time). Whether they were taken at one location or another would be irrelevant. I.e. the trend should show up in a random sample of temperatures over time (assuming no daily or seasonal bias).
Of course, Hansen knows that local and regional effects dominate temperatures, so it is necessary to keep measuring at the same locations. As anyone who has followed Anthony’s Surface Stations project will know, even a small change at a location can affect temperatures by as much or more than the claimed global warming.
Even then, missing data wouldn’t matter if you were interested in measuring a global effect, because the global effect should show up across the average of all sites and even large numbers of gaps in the records at individual sites shouldn’t matter statistically.
It is precisely because the temperature record at small numbers of individual sites can significantly affect the global temperature record that missing data matters.
So there you have it. The estimates for the missing data are needed in order to show the amount of global warming, because the warming isn’t global.

Mike Bryant

April 8, 2008 4:19 pm

I have a message for NASA:
STOP… PLEASE STOP… How many times do you have to keep changing the data?
Please stop NASA, and then put it back the way it was.
Thanks for listening.
Mike

Evan Jones

Editor

April 8, 2008 4:51 pm

Serious question: At what point does this become fraud?
Pish-Posh!
Not until policy costing trillions of dollars and seriously adversely affecting billions of lives . . . hang on . . .

Evan Jones

Editor

April 8, 2008 5:13 pm

Hullo, John. Great post!

Fernando Mafili

April 8, 2008 5:42 pm

Anthony :
When this story end. ….If…….
we must have missed the turnoff.
Fantasy

jeez

April 8, 2008 5:58 pm

Hey, not to pat myself on the back or anything, but you could update that post to reflect that you now have access to data going back to early 2001 thanks to some random guesswork on my part. One of my only talents, I’m good at making random guesses.
Maybe I should use that talent and be contributing to data adjustment algorithms?
REPLY: Jeez, I’m unclear as to what you are referring to, but maybe John Goetz can chime in. Whatever you did, thanks!

John Goetz

Editor

April 8, 2008 5:59 pm

First, let me say that I personally don’t think fraud of any sort is involved. I think this is just an unexpected consequence of the mathematics involved. Hansen has made an honest attempt to fill in missing data, and reading his papers I probably would have come up with a similar method initially. I don’t think anyone reading this forum can come up with a method for estimating missing temperatures that could not legitimately be picked apart by others. However, now that we know and can see the flaws, the question is can we find a better method?
To expand, I posted the following comment on CA:
Obviously no method is perfect. Initially I thought a method like Hu’s would suffice, but it assumes climate behaves normally. We know it does not.
Example: Where I live, November of 2006 was normal, December 2006 was way above average, and January 2007 was slightly below normal. If I were missing the December 2006 data point, just about any scheme I come up with will grossly under-estimate that temperature.
Step forward one year later. November was slightly above normal, December was slightly below normal, and January was even colder than the previous year. If I were missing December of 2007 but had both adjacent months as well as all three corresponding months from the previous year, I would over-estimate the December 2007 temperature.
Note however that the method GISS uses does not use adjacent months – it uses seasonal months. The autumn season is Sep-Oct-Nov. If Nov is missing it is derived from Sep-Oct. You and I might think it more reasonable to derive it from Oct-Dec. Likewise for a missing Sep. Only Oct would be derived from adjacent months. In fact, only four of the twelve months (the middle month of each season) would be derived from adjacent months. If an entire season is missing, the relationship gets really strange.
Estimation of values during the intermediate steps of determining an average global temperature should be avoided. This is particularly true of the earliest steps, because the uncertainty of that estimation only grows as more and more results are derived from it.
In this case, estimation begins very early in the process. One of the first steps is to take each scribal record and estimate seasonal averages when months are missing. Then missing annual averages are estimated if any seasonal averages are estimated or missing. These scribal records are then combined with other scribal records to form a station record, and the consequence of the estimation ripples throughout the resulting record. GISS then applies a homogeneity adjustment to this data containing significant numbers of estimates. After that GISS goes through gridcell calculations and then the final calculation of the annual global temperature.
The only purpose of the estimation seems to be to calculate an annual average for each scribal record prior to combining, adjusting, gridding and averaging. I propose that annual averages are not important as intermediate steps. Instead of calculating an annual global average temperature, calculate a monthly average temperature. If a data point is missing, then it is not adjusted, gridded or averaged. It simply is not included. There should be enough other data points from around the world to calculate an average temperature with some degree of certainty (of course, it is desirable that this degree of certainty be published).

Mike Bryant

April 8, 2008 6:08 pm

Historical records are called historical for a reason. Leave them alone.

Philip_B

April 8, 2008 6:31 pm

If a data point is missing, then it is not adjusted, gridded or averaged. It simply is not included. There should be enough other data points from around the world to calculate an average temperature with some degree of certainty
John G, that was essentially my point.
The $64K question is, does the estimated data affect the global temperature average and trend?
I await the average and trend calculated with and without the estimated data with considerable interest.

John Goetz

Editor

April 8, 2008 6:52 pm

Philip_B, I don’t know with certainty if the estimated data affect the global temperature average and trend. I suspect they do, because I doubt the answer would be identical to the one we have now. But the result may be a greater trend upward rather than a lesser trend. We just don’t know.

pdm

April 8, 2008 6:59 pm

in the end, doesn’t GISS match the other temperature records? It may be tortuous, but it seems to work.

Evan Jones

Editor

April 8, 2008 7:44 pm

http://botd.wordpress.com/top-posts/
Check it out, John. You made top 100 WordPress (#60).

Mike Bryant

April 8, 2008 8:31 pm

#57 even

Evan Jones

Editor

April 8, 2008 9:34 pm

Movin’ up!

John Goetz

Editor

April 9, 2008 4:18 am

pdm, what do you mean by “match”? The data are not identical.

Dee Norris

April 9, 2008 4:56 am

#32 now
