Guest post by John Goetz

Cross posted from Climate Audit

Earlier this year I did a post on the amount of estimation done to the GHCN temperature record by GISS before generating zonal and global averages. A graphic I posted compared the amount of real temperature data with the amount of estimation over time. To read the graphic, consider 2000 as an example. As of February 7, 2008 there were 3159 station records in the GHCN data with an entry for the year 2000. Of those station records, 62% were complete and an annual average could be fully calculated. Another 29% were incomplete, but contained enough monthly data that the GISS estimation method kicked in. The final 9% were so incomplete that no estimation could be done.

What I did not explore at the time, and would like to look at more closely here, is the accuracy of the estimation. One would hope with so much infilling going on that the accuracy would be rather high (I will leave the determination of “high accuracy” for a later time). Because I did not have real data to compare with the GISS estimations, I took another approach. I used the GISS method to estimate real temperature data as if that data were missing.

Recall that GISS never explicitly estimates missing monthly temperatures. What they do is estimate seasonal averages when one monthly temperature is missing but the other two are present. Similarly, an annual temperature can be estimated when one seasonal value is missing but the other three are present. Using this methodology GISS can estimate an annual temperature when as many as six monthly values are missing.
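The two rules above can be sketched as a small check. This is a minimal illustration and not GISS code: the function names, the season labels, and the simplification of treating December as part of the same twelve-month block as the following January and February are all assumptions made here.

```python
from itertools import combinations

# Meteorological seasons; December is grouped with the following Jan/Feb.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def season_estimable(present, months):
    """A season is usable when at least two of its three months are present."""
    return sum(m in present for m in months) >= 2


def annual_estimable(present):
    """An annual value is usable when at least three of the four seasons are usable."""
    usable = sum(season_estimable(present, months) for months in SEASONS.values())
    return usable >= 3


def max_missing_months():
    """Largest number of missing months that still allows an annual estimate."""
    all_months = range(1, 13)
    for n_missing in range(12, -1, -1):
        for missing in combinations(all_months, n_missing):
            if annual_estimable(set(all_months) - set(missing)):
                return n_missing
    return None


print(max_missing_months())  # 6
```

The worst case that still yields an annual value is one entirely missing season plus one missing month in each of the other three, which is where the figure of six comes from.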

While no explicit monthly estimate is recorded by GISS, it certainly can be derived from the seasonal estimate. I have shown several times a one-line equation that exactly reproduces the GISS seasonal estimate. Leaving a subsequent derivation as an exercise for the reader, the implied monthly estimate can be found from that equation and is expressed as follows:
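As a reconstruction rather than the original figure: assuming the GISS rule amounts to giving the missing month the mean anomaly of the two months that are present, and writing $A$ for the missing month and $B$, $C$ for the other two months of the season (the hat and overbar notation is introduced here), the implied estimate would be:

```latex
\hat{A} = \bar{A} + \frac{(B - \bar{B}) + (C - \bar{C})}{2},
\qquad
\hat{S} = \frac{\hat{A} + B + C}{3}
```

Here $\hat{S}$ is the corresponding seasonal estimate.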

where the average values for A, B, and C are calculated from all valid entries for the given month in a particular station record.
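A minimal sketch of that calculation follows. It assumes the missing month A is assigned the mean anomaly of the two present months B and C, each anomaly being measured against that month's station average; the function names and the sample numbers are hypothetical and are not taken from any station record.

```python
from statistics import mean


def implied_monthly_estimate(a_history, b_history, c_history, b_value, c_value):
    """Estimate a missing month A from the two present months B and C.

    Assumption: the missing month is given the mean anomaly of B and C,
    where each anomaly is measured against that month's station average.
    """
    a_bar = mean(a_history)  # average of all valid A entries in the record
    b_bar = mean(b_history)
    c_bar = mean(c_history)
    return a_bar + ((b_value - b_bar) + (c_value - c_bar)) / 2


def seasonal_estimate(a_estimate, b_value, c_value):
    """Seasonal average using the estimated month in place of the missing one."""
    return (a_estimate + b_value + c_value) / 3


# Hypothetical record: B and C are each 1.0 C above their averages,
# so A is estimated at 1.0 C above its own average of 2.0 C.
a_hat = implied_monthly_estimate([1.0, 3.0], [-2.0, 0.0], [-1.0, 1.0], 0.0, 1.0)
print(a_hat)  # 3.0
```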

Now to test the estimation accuracy. In Connecticut, December 2006 was warmer than normal, but February 2007 was colder than normal. Looking at the records for Hartford, CT, we see the following monthly and seasonal temperatures:

Dec 2006: 3.3

Jan 2007: -0.3

Feb 2007: -4.6

DJF: -0.5

If the December 2006 record were missing from Hartford, GISS would estimate a value of -0.7 C, which would yield a seasonal average of -1.9 C. Similarly, if February 2007 were missing, GISS would estimate it at 1.7 C and produce a seasonal average of 1.6 C. That’s a 4.0 degree miss for Dec, a 6.3 degree miss for February, and a 3.5 degree swing at the seasonal level.
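The arithmetic behind those figures can be checked directly from the quoted values. The sketch below does not compute the GISS estimates themselves (that needs Hartford's long-term monthly averages, which are not listed here); it only takes the two quoted estimates as given. The variable names are illustrative.

```python
# Hartford, CT values quoted in the text (degrees C).
dec, jan, feb = 3.3, -0.3, -4.6
dec_estimate, feb_estimate = -0.7, 1.7  # GISS-style estimates quoted in the text


def seasonal_mean(a, b, c):
    return (a + b + c) / 3


actual_djf = round(seasonal_mean(dec, jan, feb), 1)
djf_if_dec_missing = round(seasonal_mean(dec_estimate, jan, feb), 1)
djf_if_feb_missing = round(seasonal_mean(dec, jan, feb_estimate), 1)

dec_miss = round(abs(dec_estimate - dec), 1)
feb_miss = round(abs(feb_estimate - feb), 1)
# The swing is taken between the one-decimal seasonal values, as in the text.
seasonal_swing = round(djf_if_feb_missing - djf_if_dec_missing, 1)

print(actual_djf, djf_if_dec_missing, djf_if_feb_missing)  # -0.5 -1.9 1.6
print(dec_miss, feb_miss, seasonal_swing)                  # 4.0 6.3 3.5
```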

The winter of 06-07 in Connecticut was a bit of an oddball. I really wanted to know what the typical error looked like. To do that, I performed the same calculation on all GHCN v2.mean records.

A real monthly value can be compared against its GISS estimate only when all three monthly values in the season are available. In my copy of GHCN v2.mean, there are approximately 6.25 million monthly values that meet that requirement. I went through each of the monthly values and simulated a GISS estimate, and from that estimate I subtracted the actual value to produce a delta temperature. A positive delta means that GISS would over-estimate the temperature and a negative delta means GISS would under-estimate the temperature.
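The procedure just described might look roughly like the following for a single station record. This is a sketch under stated assumptions, not the code used for the post: it assumes the simulated estimate gives the removed month the mean anomaly of the other two months in its season, and the record layout (a dict keyed by year and month) and all names are hypothetical.

```python
from statistics import mean

SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def monthly_means(record):
    """Average of all valid entries for each calendar month in one station record."""
    by_month = {}
    for (_, month), temp in record.items():
        by_month.setdefault(month, []).append(temp)
    return {month: mean(temps) for month, temps in by_month.items()}


def simulate_deltas(record):
    """Return estimate-minus-actual deltas for every month in a complete season.

    `record` maps (year, month) -> monthly mean temperature in degrees C.
    """
    means = monthly_means(record)
    years = [year for (year, _) in record]
    deltas = []
    # A season is labelled by the year of its last month; December belongs
    # to the previous calendar year.
    for year in range(min(years), max(years) + 2):
        for months in SEASONS.values():
            keys = [(year - 1 if m == 12 else year, m) for m in months]
            if not all(key in record for key in keys):
                continue  # only complete seasons can be tested
            anomalies = [record[key] - means[key[1]] for key in keys]
            for i, key in enumerate(keys):
                others = [a for j, a in enumerate(anomalies) if j != i]
                estimate = means[key[1]] + mean(others)
                deltas.append(estimate - record[key])
    return deltas


# Hypothetical two-summer record.
toy = {
    (2000, 6): 20.0, (2000, 7): 22.0, (2000, 8): 21.0,
    (2001, 6): 22.0, (2001, 7): 22.0, (2001, 8): 23.0,
}
print(simulate_deltas(toy))  # [0.5, -1.0, 0.5, -0.5, 1.0, -0.5]
```

A positive entry means the simulated estimate is warmer than the real value, matching the sign convention above.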

Following is a histogram of the delta values collected. The x-axis is the value of the delta in degrees C. The y-axis is the percentage of records that had the specified delta value.

The fact that the simulation histogram looks like a normal distribution should not be surprising. This comes about because I need all three months in a season in order to simulate an estimate and a resulting delta. Recall that in the Hartford example above a large delta for December was followed by a similarly large delta for February, but of the opposite sign. Given the enormous sample size, the small differences in magnitude eventually even out.

The above distribution tells us the probability that the GISS estimate will miss the actual value by a specific amount. Zooming in on the distribution, we see GISS should get it exactly right just over 3% of the time:

Following is a table of absolute values and their corresponding probabilities, through a delta value of 2.9 degrees:

Referring to the table, the probability GISS will create an estimate within 0.4 C of the actual value is 26.7%. A value between 0.5 C and 0.9 C has a 22.2% probability of occurring. Similarly, 1.0 C to 1.9 C is 26.5%, and 2.0 C to 2.9 C is 12.7%. There is about a 12% probability that the GISS estimate will be off by 3.0 C or more.
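Adding up the quoted bands gives the cumulative picture. The snippet below uses only the percentages stated above; the dictionary layout and variable names are illustrative.

```python
# Probabilities (percent) quoted in the text, keyed by absolute miss in degrees C.
bands = {
    (0.0, 0.4): 26.7,
    (0.5, 0.9): 22.2,
    (1.0, 1.9): 26.5,
    (2.0, 2.9): 12.7,
}

within_one_degree = round(bands[(0.0, 0.4)] + bands[(0.5, 0.9)], 1)
within_three_degrees = round(sum(bands.values()), 1)
three_or_more = round(100.0 - within_three_degrees, 1)

print(within_one_degree)     # 48.9
print(within_three_degrees)  # 88.1
print(three_or_more)         # 11.9
```

So slightly under half of the simulated estimates land less than 1.0 C from the real value, and the remaining 11.9% is the "about 12%" that miss by 3.0 C or more.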

Note that the estimation method as it stands **does not** introduce a bias into the station record. But it **does** introduce a sizable uncertainty.

So what this is saying is that there is roughly a 50/50 chance that the adjustment is within 1C of the actual value? That’s a pretty broad range.

How can anyone trust the data with this type of interpolation? This makes the 95% CI error bars huge for these estimated data points. What type of science is this? I would expect better from these professionals.

I have a question though, in looking through the overall data set do you see a bias in application in one direction or another? It would be interesting if the adjustments were only one way by selection. I find that when interpolation is performed many times a selection occurs within the generating code to guard against “unrealistic” results. This filtering application is ripe for bias.

This is great work in reverse engineering the equations, and in evaluating the error from a real data set.

Keep up the good work.

Richard

I wonder whether those estimated values are above or below the raw mean.

The interpolations and estimates have no statistical or scientific basis.

Having said that, an algorithm without bias (as documented by John above) for the purpose of computing an overall global or regional temperature average will have no effect on the result. This should be easy to confirm by computing an average just using the data without the interpolations.

Another way of looking at this is to think of the interpolated data as part of the computation and these data are of no significance, assuming the computation is correct.

Which is not to say there aren’t biases in there. For example, missing data may be more prevalent in (very) cold weather or at stations that are remote from human settlement and hence hard to get to in adverse weather or particular seasons.

An analysis of where the data is missing would be interesting. Given the amount of missing data has increased substantially in recent years, evidence of trends in missing data would be even more interesting, as this would indicate a systematic bias.

Hell, my Ouija board can do better than that… ;)

I would guess that in England the first week of June would, on average, be cooler than the second week and the third and fourth would increase in average temperature; whereas the four weeks of December probably follow the opposite pattern, on average. It seems unlikely, to me, that missing periods of more than a week or two could ever be filled with any degree of accuracy.

This “summer” in London is a good example of the problem. The change in average (daytime) temperature week by week has been huge. If we look at average daytime temperatures for single weeks the swing has been even more noticeable; last week Monday was cold, Tuesday, Thursday and Friday were cool whereas Wednesday (my golf day) was hot. If Wednesday were missing the average would be well below the true average; if Monday were missing the average would be too high.

Maybe these things average themselves out over time, but is it known whether GISS apply different methods of estimating missing periods depending on the time of year?

If the GISS guesstimates wrong, how can this not cause a bias?

Or do we simply assume that the bad guesses will cancel out, given a sufficiently large sample size?

If I understand, GISS creates an average annual temperature from 4 seasonal averages. Even if no data were missing, that method results in a statistic that is almost meaningless. Average annual temperature? A hot summer and a cold winter yield the same result as a cool summer and a warm winter. Are two such years the same? (To a farmer they very much are not).

I have never been comfortable with average daily, weekly, or monthly temperature for the same reason, or with any method that picks an instantaneous high and an instantaneous low and averages them. Temperature itself is not the same as heat and is a poor substitute measure of heat energy, which is better expressed as temperature over a time period.

In other words, the amount of heat-work is the area under the time series temperature curve, not the maxima and minima. The approximation (substitution) of maxima and minima for the area under the curve might be reasonable for short time periods, assuming the changes followed straight lines. But for any lengthy time period, the substitution is wanting in information. For a crude example, a curve made up of a series of m’s (mmmm) and a curve made up of u’s (uuuuu) may have the same maxima and minima, but the areas under the curves would be different.
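The "m" versus "u" example can be made concrete with a toy calculation. The two sample series below are hypothetical, and the mean of evenly spaced samples is used here as a simple stand-in for the area under the curve divided by the elapsed time.

```python
def midrange(samples):
    """The (max + min) / 2 shortcut."""
    return (max(samples) + min(samples)) / 2


def time_average(samples):
    """Mean of evenly spaced samples, a simple stand-in for the area under the curve."""
    return sum(samples) / len(samples)


# Two hypothetical days sampled at equal intervals (degrees C).
mostly_warm = [10.0, 20.0, 20.0, 20.0, 20.0, 10.0]  # "m"-shaped
mostly_cool = [20.0, 10.0, 10.0, 10.0, 10.0, 20.0]  # "u"-shaped

print(midrange(mostly_warm), midrange(mostly_cool))  # 15.0 15.0
print(time_average(mostly_warm) > time_average(mostly_cool))  # True
```

Both series share the same maximum and minimum, so the shortcut cannot tell them apart, while the time averages differ by several degrees.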

Averaging maxima and minima over long periods, such as using a single point temperature for an entire month, ignores most of the data and all of the integral heat-work. Global warming (or cooling) is about heat, after all, and not temperature per se. Am I wrong? Please educate me.

It does not cause a bias because it is wrong on the positive side just as much and just as often as it is wrong on the negative side. You are correct, they cancel out.

I don’t understand why they need seasonal values? Why not just use monthly values?

Eventually to get a mean you would have to weight different values, since there are probably more temp sites in NA vs everywhere else, etc. We could argue those weights forever.

Did I see a plot of GISS corrections showing a steady positive bias? Interesting.

Can these adjustments be shown over time? I’m curious to see if there is any bias towards (intentional or not) earlier years vs later years.

We don’t assume, we KNOW that the “bad guesses” will cancel out, because the first graph shown on this page is symmetrical about the 0 value. e.g. The chance of a +1.5C error is the same as a -1.5C error.

For the guesstimating to somehow introduce a bias, there would need to be something strange going on. There may be, but we have no evidence of that here (beyond the fact that GISS shows faster global temperature rises than most other measures).

GISS for July is 0.51, a marked leap back up: http://www.woodfortrees.org/plot/gistemp/from:2007

and also PDO index, which is still dropping;

http://www.woodfortrees.org/plot/jisao-pdo/from:2007

GISS executive officers wouldn’t ‘lose’ readings unfavorable to their beliefs, thus introducing a bias, would they??

Given the presence of the satellites, they wouldn’t be able to get away with that for long, but what about older historical data? Is anyone capturing the older pre ’79 data for posterity, before bits of it get ‘lost’?

A normal distribution with a mean of zero looks like a pretty good measurement process to me. Sure you have uncertainty, but that is normal. Quarterly, or annual figures should be pretty accurate.

Also, if I understand this correctly, this is the histogram for an individual station, and if each station is assumed to have an independent error, then this suggests the global measure will be very accurate indeed, as all the individual errors will cancel out.

Reply by John Goetz: Actually, the histogram is for all stations and all years in the GHCN record (v2.mean).

You need to be a little careful with the shape of the histogram. The “simulation” I performed required that all three months be available in a specific season for a specific station in order to calculate an estimate and compare it to the real value. For example, if summer 1957 was being tested, I needed June, July and August. If August were missing, the GISS algorithm would not be able to estimate June or July, and I would not have a real August to look at either.

With all three months available, I now force symmetry into the result. If my estimate for August is higher than the actual, the combination of June and July must be lower than the actual by an equivalent amount such that the average of the three predicted values is the same as the average of the three real values.
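That constraint can be written out. The lines below assume each simulated estimate gives the removed month the mean anomaly of the other two months; the primes denote anomalies from the station's monthly averages and $\delta$ is the estimate minus the actual value (this notation is introduced here for illustration).

```latex
\begin{aligned}
\delta_A &= \tfrac{1}{2}(B' + C') - A', \qquad
\delta_B = \tfrac{1}{2}(A' + C') - B', \qquad
\delta_C = \tfrac{1}{2}(A' + B') - C' \\
\delta_A + \delta_B + \delta_C &= (A' + B' + C') - (A' + B' + C') = 0
\end{aligned}
```

Under that assumption the three deltas of any complete season sum to zero, so an over-estimate of one month is offset by the other two.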

In the actual application of the GISS algorithm, at most one month in a season can be estimated, so symmetry in practice is not guaranteed. In fact some months in some years are estimated far more frequently than others.

What the distribution really tells us is 1) how accurate or inaccurate the GISS estimate is and 2) the probability that a specific value will be estimated.

The fascinating thing to me is that these estimates and adjustments get called “data” by climatologists.

As a layman the problem I have always had with the method of using “average temperatures” is that they want to find a single temperature rather than a range of temperatures. For example, when using models to try to predict the path of a hurricane, many “predicted” tracks are plotted (known as spaghetti lines) and a cone of probability is used to encompass an area that includes most of these lines. A center line is drawn in that cone but the hurricane can move anywhere within that cone. Any spaghetti lines outside the cone of probability are there to take note of but are not likely tracks of the hurricane. All of us living on the coast of Florida are familiar with what this means. Experts can now predict landfall of the eyewall within 50 miles 24 hours out but everyone within the cone of probability gets ready just in case. If this type of method was used on temperature, we would have a range of normal temperatures that would be more meaningful. Anything within that range would be considered normal and the nitpicking over a month being 0.2 degrees above or below a specific temperature would be eliminated. In southwest Florida the summer daily high temperature can range from 84 to 94 degrees on any given day depending on weather conditions. Now if, over time, the whole range shifts, then one could say there was warming or cooling. It seems more sensible to me.

“Sure you have uncertainty, but that is normal. Quarterly, or annual figures should be pretty accurate.”

Like: The global average temperature is 56.781 degrees F +/- 4.890 degrees at 3 standard deviations? Accurate, but hardly the implied precision. Why is it not quoted thus? And this is just the standard error, no? We also have experimental error.

Tom in Florida,

Anything that makes as much sense as what you propose, will be immediately discarded.

Mike

What happens to these numbers after the averaging described above? If this data set is the end point, then as SpecialEd pointed out, the errors are likely to cancel out.

However, if there is another correction process involved in generating regional/zonal/global averages (and here I am thinking of Steve McIntyre’s discussion of Mr. Hansen’s rather byzantine “rural” station correction program) are we sure that all errors are treated equally? My sense is that the wider the variance in the input set the more the correction algorithms get to decide what the “true” figures are.

Basically they are gathering bits of data here and there, and then making up the rest.

If I had done this with a physics experiment back in high school, what grade would I deserve?

Their climate data is accurate, plus-or-minus a season.

“Anything within that range would be considered normal and the nitpicking over a month being 0.2 degrees above or below a specific temperature would be eliminated.”

But if we did that, then we wouldn’t be able to panic when one year was 0.1 degrees warmer than another year.

Oh, and errors with a mean of zero only “cancel out” when computing the time average of the data, i.e. the average global temperature over an infinite length of time. So as the time scale gets very large, the computed global average temperature over that range of time should get very close to the actual average. It doesn’t mean the errors “cancel out” for the sake of computing short-term trend lines. If we could get some kind of estimate on what kind of time period it would take to reduce our error bars to something small, like .1 C, say 10 to 30 years, it would be interesting to see what a graph of running mean would look like. To my knowledge, no one’s really looking at the graph convolved with a characteristic function.
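A rough way to put numbers on that question is the standard error of a mean. The sketch below assumes the errors are independent with zero mean, which real estimation errors may not be, and the spread of 2.0 C is purely a hypothetical value chosen for illustration, not a figure from the histogram.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean of n independent zero-mean errors."""
    return sigma / math.sqrt(n)


def samples_needed(sigma, target):
    """Smallest n for which sigma / sqrt(n) is at most target."""
    # round() guards against floating-point fuzz before taking the ceiling.
    return math.ceil(round((sigma / target) ** 2, 9))


hypothetical_sigma = 2.0  # assumed spread of a single estimate, degrees C
print(samples_needed(hypothetical_sigma, 0.1))  # 400
```

With those assumptions it takes 400 independent estimates to bring the error bar down to 0.1 C, which shows how slowly averaging shrinks a wide distribution.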

Shades of Dan Rather! Forged but accurate.

Excerpt from http://www.giss.nasa.gov/about/: Current research, under the direction of Dr. James Hansen, emphasizes a broad study of Global Change, which is an interdisciplinary initiative addressing natural and man-made changes in our environment that occur on various time scales (from one-time forcings such as volcanic explosions, to seasonal/annual effects such as El Niño, and on up to the millennia of ice ages) and affect the habitability of our planet. Program areas at GISS may be roughly divided into the categories of climate forcings, climate impacts, model development, Earth observations, planetary atmospheres, paleoclimate, radiation, atmospheric chemistry, and astrophysics and other disciplines. However, due to the interconnections between these topics, most GISS personnel are engaged in research in several of these areas.

A key objective of GISS research is prediction of atmospheric and climate changes in the 21st century. The research combines analysis of comprehensive global datasets, derived mainly from spacecraft observations, with global models of atmospheric, land surface, and oceanic processes. Study of past climate change on Earth and of other planetary atmospheres serves as a useful tool in assessing our general understanding of the atmosphere and its evolution.

Is there a reason NASA is doing this work instead of NOAA? This is work NOAA is chartered to do.

With Dr. Hansen in charge, I suspect a biased “end justifies the means” political agenda.

Instances such as this amplify the usefulness of the Finagle Constant, the Bougerre Factor and the Diddle Coefficient in correcting faulty and/or missing data to obtain the projected result.

Tom in Fl has it correct. Most recent studies seem to indicate that our climate is within range of normal variables. If you remove the area of variability and use a thin line you can expound more on the gravity of the situation and have an entirely different view.

1. this is normal in the context of historical events.

2. this is much different from our historic record.

I think that we must use all data and show our average temps with percentage of variability just as Tom described in his example. Anything else just begs to be exploited by alarmists of either warming or cooling.

Just my 2 cents,

Bill Derryberry

@John Goetz who said “With all three months available, I now force symmetry into the result.”

While that is true, the symmetrical graph ‘proves’ that the algorithm is not inherently biased (although one could deduce that without the graph). But yes, it is nice to have a measure of the inaccuracy of the GISS estimation method :-)

BTW, it would actually be very difficult to systematically attempt to “trick” the GISS estimation algorithm into producing higher & higher temperatures, so I really don’t see there could be any (un)intentional biasing going on. Seems far more likely that the urban heat effect is simply not being sufficiently accounted for….?

I’m not sure that some of the readers understand what John did here. He is using actual data to estimate how much likely error there is in the estimated monthly temperatures used by GISS. We can’t know how wrong the actual guesses were. We can’t.

First, he showed that the averaging method GISS uses can produce a guesstimate that is significantly off. Look at a known Dec value and a known Feb value. Use the GISS method to guesstimate what Jan value would be (if it were not available, this guesstimate would be included in the GISS data).

But John only used data where we know Jan’s actual value. How does the hypothetical guesstimate (using GISS methodology) stack up with reality?

Answer — not good.

John Goetz; thanks, that makes sense. So would an even more useful measure of accuracy be a sampling distribution for a single month, rather than a single station? Or you could estimate the standard errors for a month through Monte Carlo analysis?

Can somebody post a URL where I can get my hands on the actual station temperature data, not the gridded data? And the 1951..1980 normals that GISS compares against?

This I found interesting when comparing warmest and coolest averages per month for a couple stations here in MN against what GISS has for those months.

The GISS figure is to the right of the year with the difference in ().

Pine River Dam, MN (216547)

Jan: 23.7F/-4.6C (2006) -4.5 (+0.1)

-10.7F/-23.7C (1912) -25.4 (-1.7)

Feb: 28.5F/-1.9C (1998) -1.9 (0.0)

-7.3F/-21.8C (1936) -23.1 (-1.3)

Mar: 37.8F/3.2C (1910) 2.9 (-0.3)

10.2F/-12.1C (1899) -12.5 (-0.4)

Apr: 51.9F/11.1C (1915) 10.9 (-0.2)

31.0F/-0.6C (1950) -0.7 (-0.1)

May: 65.2F/18.4C (1977) 17.9 (-0.5)

43.1F/6.2C (1907) 5.8 (-0.4)

Jun: 70.9F/21.6C (1933) 21.8 (+0.2)

57.2F/14.0C (1945) 14.1 (+0.1)

Jul: 75.1F/23.9C (1916) 23.9 (0.0)

61.7F/16.5C (1992) 16.5 (0.0)

Aug: 74.1F/23.4C (1983) 23.4 (0.0)

60.4F/15.8C (1927) 15.9 (+0.1)

Sep: 62.9F/17.2C (1906) 17.0 (-0.2)

50.1F/10.1C (1965) 9.6 (-0.5)

Oct: 56.6F/13.7C (1963) 12.9 (-0.8)

30.2F/-1.0C (1925) -1.5 (-0.5)

Nov: 40.9F/4.9C (2001) 4.9 (0.0)

17.4F/-8.1C (1911) -8.4 (-0.3)

Dec: 25.4F/-3.7C (1931) -5.2 (-1.5)

1.3F/-17.1C (1927) -18.6 (-1.5)

Obviously GISS cooled the past a bit, but they did it by cooling the colder months of the year and recent data stays about the same. Let’s do one more.

Cloquet, MN (211630)

Jan: 24.7F/-4.1C (2006) -4.1 (0.0)

-8.7F/-22.6C (1912) -22.7 (-0.1)

Feb: 29.7F/-1.3C (1998) -1.3 (0.0)

-4.4F/-20.2C (1936) -20.3 (-0.1)

Mar: 35.0F/1.7C (2000) 1.7 (0.0)

14.3F/-9.8C (1923) -9.8 (0.0)

Apr: 48.1F/8.9C (1987) 8.9 (0.0)

30.7F/-0.7C (1950) -0.7 (0.0)

May: 59.6F/15.3C (1977) 15.3 (0.0)

45.4F/7.4C (1915) 8.0 (+0.6)

Jun: 67.7F/19.8C (1933) 19.8 (0.0)

54.3F/12.4C (1915) 13.1 (+0.7)

Jul: 72.3F/22.4C (1921) 23.2 (+0.8)

60.0F/15.6C (1915) 16.3 (+0.7)

Aug: 70.1F/21.2C (1983) 21.1 (-0.1)

57.8F/14.3C (1912) 15.1 (+0.8)

Sep: 61.6F/16.4C (2004) 16.4 (0.0)

48.2F/9.0C (1918) 9.5 (+0.5)

Oct: 54.6F/12.6C (1963) 12.5 (-0.1)

33.1F/0.6C (1917) 1.1 (+0.5)

Nov: 41.1F/5.1C (2001) 5.1 (0.0)

19.8F/-6.8C (1911) -6.3 (+0.5)

Dec: 26.2F/-3.2C (1913) -3.3 (-0.1)

1.6F/-16.9C (1983) -16.9 (0.0)

We see with this station that it was warmed in the past and that warming was done mostly in the warmer months. I know this station was warmed because you can easily compare the chart listed with GISS and the chart posted with the survey done at surfacestations.org before the 2007 adjustments. The Y-axes of the charts are 1 degC different (warmer after the adjustments).

On these two stations, some years are there more than once for each station monthly record. For example, 1915 (May, Jun, Jul) in Cloquet was cold but adjusted up pretty good. 1912 (Jan) and 1912 (Aug) are adjusted differently: a slight tick down in Jan, but up in Aug.

Be nice to compare all months in a station record to what GISS has, but this gives some idea. The pros are going bald over trying to figure what they’ve done.

Warmest and coolest averages for each month came from here.

http://mrcc.sws.uiuc.edu/INTERACT/mwclimate_data_calendars_1.jsp