GISS Step 1: Does it influence the trend?

Guest post by John Goetz

The GISStemp Step 1 code combines “scribal records” (multiple temperature records collected at presumably the same station) into a single, continuous record. There are multiple detailed posts on Climate Audit (including this one) that describe the Step 1 process, known affectionately as The Bias Method.

On the surface it seems like a reasonable concept, and the description of the algorithm in HL87 makes complete sense. In simple terms, HL87 says that:

  1. The longest available record is compared with the next longest record, and the period of overlap between the two records is identified.
  2. The average temperature during the period of overlap is calculated for each record.
  3. The difference between the average temperature for the longer record and the shorter record is calculated, and that difference (a bias) is added to all temperatures of the shorter record to bring it in line with the longer record.
  4. The two records can now be combined as one, and the process repeats for additional records.
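The four steps above can be sketched in a few lines of Python. This is only an illustration of the description given here, not the GISStemp code: the function name `combine_records`, the `(year, month)` keys, and the choice to keep the longer record's value wherever both records have one are all assumptions made for the example.

```python
from statistics import mean

def combine_records(longer, shorter):
    """Combine two records keyed by (year, month) using the bias step.

    Hypothetical helper: the merge rule (keep the longer record's value
    where both exist) is an assumption, not taken from HL87 or GISStemp.
    """
    # Step 1: identify the period of overlap.
    overlap = sorted(set(longer) & set(shorter))
    if not overlap:
        raise ValueError("records do not overlap")
    # Steps 2 and 3: difference of the overlap averages is the bias.
    bias = mean(longer[k] for k in overlap) - mean(shorter[k] for k in overlap)
    # Step 3: shift every value of the shorter record by the bias.
    combined = {k: v + bias for k, v in shorter.items()}
    # Step 4: merge, preferring the longer record where both have data.
    combined.update(longer)
    return combined, bias

# Made-up monthly values, overlapping in February and March 1987.
longer = {(1987, 1): 10.0, (1987, 2): 12.0, (1987, 3): 14.0}
shorter = {(1987, 2): 11.5, (1987, 3): 13.5, (1987, 4): 15.0}
combined, bias = combine_records(longer, shorter)
```

In this made-up case the shorter record reads half a degree low during the overlap, so its April value is raised by that amount before the records are joined.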

In looking at numerous stations with multiple records, more often than not the temperatures during the period of overlap are identical, so one would expect the bias to be zero. However, we often see a slight bias existing in the GISS results for such stations, and over the course of combining multiple records, that bias can be several tenths of a degree.

This was one of Steve McIntyre’s many puzzles, and we eventually figured out why we were getting bias when two records with identical overlap periods were combined: GISStemp estimates the averages during the overlap period.

GISStemp does not take the monthly data during the overlap period and simply average it. Instead, it calculates seasonal averages from monthly averages (for example, winter is Dec-Jan-Feb), and then it calculates annual averages from the four seasonal averages. If a single monthly average is missing, the seasonal average is estimated. This estimate is based on historical data found in the individual scribal record. If two records are missing the same data point (say, March 1989), but one record covers 1900 – 1990 and the other 1987 – 2009, they will each produce a different estimate for March, 1989.  All other data points might match during the period of overlap, but a bias will be introduced nonetheless.
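A small sketch may help show why the two estimates differ. It assumes, purely for illustration, that a missing month is filled with that record's own long-term mean for the same calendar month; the actual GISStemp estimation rule may differ, and the function names and temperatures below are made up.

```python
from statistics import mean

def monthly_climatology(record, month):
    """Mean of all available values for one calendar month in one record."""
    return mean(v for (y, m), v in record.items() if m == month)

def seasonal_mean(record, year, months):
    """Average a season, filling a missing month from the record's own history.

    The fill rule is an assumption made for this example.
    """
    values = []
    for m in months:
        if (year, m) in record:
            values.append(record[(year, m)])
        else:
            values.append(monthly_climatology(record, m))
    return mean(values)

# Both records lack March 1989 and agree on every month they share.
# The older record also carries a cold March from 1900.
old = {(1900, 3): 2.0, (1987, 3): 6.0, (1988, 3): 6.0,
       (1989, 4): 10.0, (1989, 5): 14.0}
new = {(1987, 3): 6.0, (1988, 3): 6.0,
       (1989, 4): 10.0, (1989, 5): 14.0}

old_spring = seasonal_mean(old, 1989, [3, 4, 5])
new_spring = seasonal_mean(new, 1989, [3, 4, 5])
```

Here the older record produces a cooler spring 1989 than the newer one, even though every month present in both records is identical.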

The GISS algorithm forces at least one estimation to always occur. The records used begin with January data, but the winter season includes the previous December. That December datapoint is always missing from the first year of a scribal record, which means the first winter season and the first annual temperature in each scribal record are estimated. Thus, if two records overlap from January 1987 through December 1990 (a common occurrence), and all overlapping temperatures are identical, a bias will be applied because the 1987 annual temperature for the newer record will be estimated.
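The forced estimate can be checked with a short sketch. Only the winter definition (Dec-Jan-Feb) comes from the text; the other three season definitions, the `SEASONS` table, and the helper name are assumptions for illustration.

```python
# Each season lists (year offset, month); winter reaches back to the
# previous December. Spring, summer and autumn definitions are assumed.
SEASONS = {
    "winter": [(-1, 12), (0, 1), (0, 2)],
    "spring": [(0, 3), (0, 4), (0, 5)],
    "summer": [(0, 6), (0, 7), (0, 8)],
    "autumn": [(0, 9), (0, 10), (0, 11)],
}

def seasons_needing_estimate(record, year):
    """Return the seasons of `year` that have at least one missing month."""
    needing = []
    for name, months in SEASONS.items():
        if any((year + dy, m) not in record for dy, m in months):
            needing.append(name)
    return needing

# A complete record running from January 1987 through December 1990.
record = {(y, m): 0.0 for y in range(1987, 1991) for m in range(1, 13)}

first_year = seasons_needing_estimate(record, 1987)
second_year = seasons_needing_estimate(record, 1988)
```

Even with no gaps at all in the record, winter 1987 comes back as needing an estimate because December 1986 lies outside the record, while 1988 needs none.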

Obviously, the bias could go either way: it could warm or cool the older records. With a large enough sample size, one would expect the average bias to be near zero. So what does the average bias really look like? Using the GISStemp logs from June, 2009, the average bias on a yearly basis across 7006 scribal records was:

[Figure: BiasAdjustment]

94 Comments
David Ermer
July 22, 2009 7:02 pm

Since the temperatures in the past were colder than they are today, this all makes sense.(?)
Reply: No … the temperatures in the past for stations with multiple records have been cooled by an average additional 0.08C. – John

Filipe
July 22, 2009 7:11 pm

The net effect is a tenth of a degree in more than 100 years, that’s not much. The “bias” goes to about zero after ~1990. Are you sure the curve isn’t simply due to rounding effects? That’s the kind of curve I’d expect if we had gained an extra digit (going from half a tenth accuracy to half a hundredth).
Not that I don’t find strange the other aspect you talk about, that need to “always estimate.” Weird, we’re talking about really bad programming skills there, that is the kind of thing that could be easily avoided.

Sam Vilain
July 22, 2009 7:12 pm

Perhaps you can also explain the relevance to the data series, given that the average adjustment is < 0.1⁰C ?
Reply: I forget what the increase in global temperatures is purported to be since 1880, but I believe it is somewhere in the neighborhood of 0.8 C. Roughly 0.08 C – or 10% – seems to be due to the process of combining scribal records, and nothing more. You decide the significance of this single process step (one of many). – John

Allen63
July 22, 2009 7:20 pm

I guess I’m confused because I have not studied more than a couple examples of original scribal data sheets.
Why would there frequently be two or more overlapping scribal records from the same station? Were two or more people reading the temperatures at different times on the same days and writing them down on separate lists?

Basil
Editor
July 22, 2009 7:22 pm

John, you’ve explained this well, and it certainly seems like a flawed method, but how much of the trend since 1880 does it actually account for? Just using some rough numbers, since 1880, the trend line rise in temps has been about 0.75° C. Looking at your chart, it looks like the trend line rise accounted for by this bias is about 0.10° C. Am I right (about the 0.10° C)? That would make the bias account for about 13% of the total supposed warming.
Does that sound like an in the ballpark estimate of the significance of this?

John F. Hultquist
July 22, 2009 7:24 pm

I can understand why a researcher (data technician) might start down this path focusing on the desire for long-running series of temperature data. At some point that person or group should have stopped and asked a few questions of the sort: “What are we doing to the data?” or “What are the alternatives?” or “How do different time periods regarding a lengthy warming or cooling influence the outcomes?”
If they did these things it seems they chose a method that gave them a warming bias. If they did not do these things, then shame on them!
Are there no “facts” to work with in climate research?

Boris Gimbarzevsky
July 22, 2009 7:26 pm

It should be easy enough to test if this method always produces a positive bias, and that would be to create a set of artificial records in which the values were drawn at random from 3 separate distributions: one where the mean trend was decreasing, another in which there was no trend and one where the mean trend was upward. This would be a simple enough program to set up and run a few thousand iterations for each scenario. If the method used to combine records is unbiased then the final dataset should not differ significantly from the original dataset for which we have the advantage of knowing what the generating function is.

SOYLENT GREEN
July 22, 2009 7:43 pm

Anthony, this is a bit OT. But don’t you know these guys?
http://www.ametsoc.org/policy/2009geoengineeringclimate_amsstatement.html
I know these guys too. They had a secret island full of hot compliant babes in “Our Man Flint” where they CONTROLLED THE WEATHER!!!!

Nelson
July 22, 2009 7:46 pm

Perhaps more importantly, since the adjustment was essentially flat until 1940, then went almost straight up, all of the bias occurred between the previous warm period of the mid-30s and the mid-90s.
The net result is that the recent warming thru 1998 looks more dramatic relative to the 30s by the 0.07⁰C introduced via the adjustment process.

Mac
July 22, 2009 7:48 pm

10% increase just from the process? If a business was fudging its books by 10% they would be dragged into a courtroom and prosecuted.
Off topic: http://scienceandpublicpolicy.org/images/stories/papers/originals/climate_money.pdf
Tracked this back from Drudge. Basically it’s a report on all the money spent by the gov’t on hyping and researching climate change. Anthony Watts and Steve McIntyre are mentioned several times for their volunteer work. I wonder how the graph on page 3 compares to the recent rise in temperature.

July 22, 2009 8:07 pm

The net effect is a tenth of degree in more than 100 years, that’s not much.
====
Ah but Filipe, the entire temperature increase over the past 100 years was less than 4/10 of one degree.
Are you telling me that an artificial bias in an artificial database deliberately inserted into the record by GISS has created 1/3 of the entire “global warming” that the most extreme of the AGW extremists can actually find?

David
July 22, 2009 8:13 pm

Ten percent is significant.

Harold Vance
July 22, 2009 8:15 pm

I’ll take the +0.01 Year 2020 leaps for a buck each.
GISS: Cooling the past to bring you a warmer and fuzzier future.

j.pickens
July 22, 2009 8:17 pm

The important thing to note is that the creators of this scribal data averaging/estimating technique are either unaware of this bias, or aware of it.
If they are unaware of it, are they making efforts to correct their system?
If they are aware of it, why haven’t they already corrected their system?
Either way, it does not look good.
Why do all these biases seem to accrue to the older-colder, newer-hotter side of the ledger?

steven mosher
July 22, 2009 8:18 pm

Interesting. It might be interesting to pass this on to the guys doing the clear climate code project. Step one could be rewritten to change the method of station combining and we could get a little bit closer to something that is accurate. As for the size of the bias, every little bit of improved accuracy counts. Kudos for your continued hard work on this wretched piece of code.

Bob D
July 22, 2009 8:33 pm

Nice work, John. As you say, 10% of the claimed warming may just be a result of poor programming. Another reason not to trust GISS temperatures, in my view.
Ignoring the red trend line, there seem to be three distinct steps, at -0.07, -0.03 and just below zero. It would be interesting to drill down and see what happened at the step changes around 1940-52 and again at 1994-ish.

Bruce
July 22, 2009 8:43 pm

A few steps with bias. Drop all the rural records. Only use urban weathers stations and airports, pretend UHI is accounted for and voila … fabricated warming.

Richard Sharpe
July 22, 2009 8:44 pm

I don’t know about GissTemp whatever, but something is up with the weather in the San Francisco Bay Area.
We are having very cold nights and cool days in the middle of summer.

anna v
July 22, 2009 8:45 pm

“Reply: I forget what the increase in global temperatures is purported to be since 1880, but I believe it is somewhere in the neighborhood of 0.8 C. Roughly 0.08 C – or 10% – seems to be due to the process of combining scribal records, and nothing more. You decide the significance of this single process step (one of many). – John”
Every little bit helps, like the money in the church collection.
I presume these numbers are further corrected for UHI and then used for corrections over 1000kms? Once on the slippery road it keeps slipping :).

Filipe
July 22, 2009 8:45 pm

Just to clarify my point. Consider you have a large set of points randomly distributed according to a uniform distribution between 0 and 10. The “true” average of these points is 5. Consider now that all the points are truncated to integers. The average of the truncated points is 4.5. If the points are instead truncated to the first decimal place, then the average is 4.95 and so on.
In a system with truncation, and with accuracy increasing with time, even with a true flat slope, one would get a positive slope just from the truncation. But I’m not sure this applies here, are these measures considered as true rounding or simple truncation?

Richard
July 22, 2009 8:47 pm

The GISS record is highly suspect. I said so on a warmists blog and have now been banned from there. They do not tolerate dissent.
Why is there such a big trend difference between the land-based temperatures of GISS and Hadley compared with the satellite data over the same time period? Does anyone know?
Snowman if you come here and read this – hi. We could chat here. I’ve been banned at the other place

Richard
July 22, 2009 8:55 pm

Filipe – “The net effect is a tenth of degree in more than 100 years, that’s not much.” The total warming over this period is 6 tenths of a degree, so a bias of one tenth of a degree would be significant?

AnonyMoose
July 22, 2009 9:15 pm

Oh, my. A review can’t even get through step 1 without finding an error. That slightly changes my opinion of the chances of other errors being present.
I think the obvious question to ask is: “How many times has this procedure been reviewed?”
These people have been using this procedure for years, they should have been examining it often. Why didn’t everyone examining the procedure find this problem when they started looking at the process?
I’ve read of scientists who hired an outsider, trained them to use a copy of their equipment, and had them study their original material to see if the outsider made the same discovery they did. Why aren’t these scientists having people examine their equipment regularly?

Scott Gibson
July 22, 2009 9:46 pm

Filipe, you mention the possibility of rounding effects… I live in the Arizona desert, and every day the temperature is rounded up at the end of the day. The result is that certain temperatures are shown for the high and low until the evening when it begins to cool, then the high rises at least one degree later at night. I figure they justify that as the actual temperature has to be at least a fraction higher, and therefore is rounded up, always. If you don’t believe me, watch the daily NOAA temperatures for Tucson.
I wouldn’t be surprised to find this kind of bias in the GISS too.
When I was younger, people said the weather service reported lower than actual temperatures during the summer so as to not scare off tourists, though I can’t vouch for it, and it may have been an urban myth.

Fluffy Clouds (Tim L)
July 22, 2009 9:49 pm

Amazing!!!!!!!!!!
no snip’s needed for shortness…. LOL
” it’s better to be snipped than band for life! ”
I like this place.
A.W. keep moving on that publishing of the stations.
10% added to 20% parking lot error could be a bunch
nite nite
