GISS Step 1: Does it influence the trend?

Guest post by John Goetz

The GISStemp Step 1 code combines “scribal records” (multiple temperature records collected at presumably the same station) into a single, continuous record. There are multiple detailed posts on Climate Audit (including this one) that describe the Step 1 process, known affectionately as The Bias Method.

On the surface this seems like a reasonable concept, and in reading HL87 the description of the algorithm makes complete sense. In simple terms, HL87 says that:

  1. The longest available record is compared with the next longest record, and the period of overlap between the two records is identified.
  2. The average temperature during the period of overlap is calculated for each station.
  3. The difference between the average temperature for the longer station and shorter station is calculated, and that difference (a bias) is added to all temperatures of the shorter station to bias it – bringing it in line with the longer station.
  4. The two records can now be combined as one, and the process repeats for additional records (a rough sketch of these steps follows below).
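To make the procedure concrete, here is a rough sketch of steps 1 through 4 in Python. This is my illustration only, not the actual GISStemp code (which, as described below, works on seasonal and annual averages rather than raw monthly values); the record structure and names are made up for the example.

```python
# Rough illustration of steps 1-4 above; not the actual GISStemp code.
# The records are hypothetical dicts mapping (year, month) -> temperature in C.

def combine_records(longer, shorter):
    overlap = set(longer) & set(shorter)                   # step 1: period of overlap
    if not overlap:
        return dict(longer)                                # nothing to align against
    mean_longer = sum(longer[k] for k in overlap) / len(overlap)    # step 2
    mean_shorter = sum(shorter[k] for k in overlap) / len(overlap)  # step 2
    bias = mean_longer - mean_shorter                      # step 3: the "bias"
    combined = {k: t + bias for k, t in shorter.items()}   # shift the shorter record
    combined.update(longer)                                # step 4: merge; the longer
    return combined                                        # record wins where both exist
```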

In looking at numerous stations with multiple records, more often than not the temperatures during the period of overlap are identical, so one would expect the bias to be zero. However, we often see a slight bias existing in the GISS results for such stations, and over the course of combining multiple records, that bias can be several tenths of a degree.

This was one of Steve McIntyre’s many puzzles, and we eventually figured out why we were getting bias when two records with identical overlap periods were combined: GISStemp estimates the averages during the overlap period.

GISStemp does not take the monthly data during the overlap period and simply average it. Instead, it calculates seasonal averages from monthly averages (for example, winter is Dec-Jan-Feb), and then it calculates annual averages from the four seasonal averages. If a single monthly average is missing, the seasonal average is estimated. This estimate is based on historical data found in the individual scribal record. If two records are missing the same data point (say, March 1989), but one record covers 1900 – 1990 and the other 1987 – 2009, they will each produce a different estimate for March, 1989.  All other data points might match during the period of overlap, but a bias will be introduced nonetheless.
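Here is a minimal sketch of that seasonal-then-annual averaging. Again this is my own illustration, not the GISS code: the estimate for a missing month is simplified here to the record's own long-term mean for that month, whereas GISS uses a more elaborate estimate, but both draw on the individual record's history, so two records missing the same month will generally fill it with different values.

```python
# Sketch of the seasonal-then-annual averaging described above. A record is a
# hypothetical dict mapping (year, month) -> monthly mean in C. A missing month
# is filled from the record's own long-term mean for that month (a simplified
# stand-in for the GISS estimate, which is likewise based on the record's own
# history), so two records missing the same month fill it differently.

SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}

def month_climatology(record, month):
    vals = [t for (y, m), t in record.items() if m == month]
    return sum(vals) / len(vals) if vals else None

def seasonal_mean(record, year, months):
    temps = []
    for m in months:
        y = year - 1 if m == 12 else year        # winter uses the *previous* December
        t = record.get((y, m))
        if t is None:
            t = month_climatology(record, m)     # estimate the missing month
        if t is None:
            return None
        temps.append(t)
    return sum(temps) / len(temps)

def annual_mean(record, year):
    seasons = [seasonal_mean(record, year, months) for months in SEASONS.values()]
    return sum(seasons) / 4.0 if None not in seasons else None
```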

The GISS algorithm forces at least one estimation to always occur. The records used begin with January data, but the winter season includes the previous December. That December datapoint is always missing from the first year of a scribal record, which means the first winter season and first annual temperature in each scribal record are estimated. Thus, if two stations overlap from January 1987 through December 1990 (a common occurrence), and all overlapping temperatures are identical, a bias will be applied because the 1987 annual temperature for the newer record will be estimated.
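Here is a hypothetical demonstration of that point, reusing the annual_mean sketch above. The two records agree exactly for every shared month from January 1987 through December 1990, yet their 1987 annual means differ, because the newer record must estimate December 1986.

```python
# Hypothetical demonstration, reusing the annual_mean sketch above. Both records
# contain identical values for every shared month (Jan 1987 - Dec 1990), but
# only the older record contains December 1986.
import random
random.seed(0)

older = {(y, m): 10.0 + random.uniform(-1.0, 1.0)
         for y in range(1900, 1991) for m in range(1, 13)}
newer = {(y, m): older.get((y, m), 10.0 + random.uniform(-1.0, 1.0))
         for y in range(1987, 2010) for m in range(1, 13)}

# The 1987 annual means differ anyway, because the newer record has to estimate
# its missing December 1986, and that estimate comes from its own later years:
print(annual_mean(older, 1987), annual_mean(newer, 1987))
```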

Obviously, the bias could go either way: it could warm or cool the older records. With a large enough sample size, one would expect the average bias to be near zero. So what does the average bias really look like? Using the GISStemp logs from June, 2009, the average bias on a yearly basis across 7006 scribal records was:

[Figure: BiasAdjustment chart, showing the average bias on a yearly basis across the 7006 scribal records]

94 Comments
Larry
July 23, 2009 8:25 am

This is an interesting piece, another reason not to trust anything coming out of GISS. I wonder if John Goetz is prepared to do a full posting on all of the other steps of temperature analysis, or have I missed out on these other steps in previous posts?

George E. Smith
July 23, 2009 8:47 am

Well I have a couple of questions. (1) Why do such multiple records covering the same time frame even exist? Do these measuring stations take temperature readings for each day that become a solid permanent record of a part of the universe; or don’t they; how do multiple versions of the same station data come into existence; and why do they exist for so many stations that a special algorithm has to be dreamed up to handle those cases?
My second question: How does planet earth know about our 12 month system of calendar notation, so that it can react to that, and do the correct different things for Spring, Summer, Autumn, and Winter seasons? Does the planet know that not all months are the same length?
Why not simply report the data, as one continuous stream of information updated daily? Do you know of any newspaper media that print news information that is classified as being spring, summer, autumn or winter news, and handled differently quarter by quarter?
The whole methodology sounds to me like a three dollar bill; meaningless pigeon holing of groups of data, to create an illusion that more information can be extracted than there is in the original daily records, that were actually noted down from the thermometers for each day.
In a normal experimental methodology, where multiple readings of some variable are made within a small time frame, the separate readings are simply averaged to obtain a presumably more likely value for the variable.
This “Bias” hocus pocus, sounds like witchcraft to me.

John F. Hultquist
July 23, 2009 9:01 am

Ron de Haan (06:17:26) : You wrote about Pacific Northwest Snow Pack – the True Story
Gregoire’s Washington is run by her and her Democratic colleagues from the wet side of the State. There are occasional calls to separate the dry side and form a new state. Meanwhile, official State policy is completely C-AGW. Snow pack and runoff records are as easily manipulated as temperature. Although T-max this week in Eastern Washington is near 100 F (38 C) this is not unusual for late July, and irrigators have their full allotment from the reservoirs and snowmelt on the east slopes of the Cascade Mountains.

lulo
July 23, 2009 9:05 am

I wonder how they will correct for this:
FOR ROCHESTER…AVERAGE JULY TEMP 70.7.
COOLEST JULYS (BACK TO 1871)…
2009… 64.3 (THRU 7/22)
1884… 65.4
1992… 66.6
1891… 67.1
2000… 67.1
FOR BUFFALO…AVERAGE JULY TEMP 70.8.
COOLEST JULYS (AIRPORT DATA BACK TO 1943)…
2009… 65.4 (THRU 7/22)
1992… 66.8
1956… 67.6
2000… 67.6
1976… 67.8
COOLEST JULYS (INCLUDING DOWNTOWN DATA BACK TO 1871)…
1884… 65.2
1891… 65.3
2009… 65.4 (THRU 7/22)
1920… 66.1
1883… 66.8
1992… 66.8

Tim Clark
July 23, 2009 10:03 am

The increase circa 1994-5 is obviously a step change, implying either equipment or data processing failure.
An Inquirer (06:37:18) :
I am intrigued by your observation that multiple scribal records is overwhelmingly a non-U.S. issue;

As am I.
Keep up the good work, John and EM.

Rod Smith
July 23, 2009 10:05 am

George E. Smith (08:47:45) :
“The whole methodology sounds to me like a three dollar bill; meaningless pigeon holing of groups of data, to create an illusion that more information can be extracted than there is in the original daily records, that were actually noted down from the thermometers for each day.”
I get the distinct impression that this could be a form of, “Now keep your eye on the cup with the ball under it.”
In other words, a lot of shuffling to make certain that we get the results we want, while hiding mostly pointless processing complexities that add enough non-transparent bias to prove our point.

Jim
July 23, 2009 10:06 am

George E. Smith (08:47:45) : Obviously you miss the point. Warmists are statistical experts who have taken the branch of mathematics we call Statistics into the parallel universes. Their quantum-like techniques can only be understood by the Inner Circle of the Hockey Team – or by anyone who uses LSD. A simple average is so far beneath them that they have forgotten what it is.

Ron de Haan
July 23, 2009 11:08 am

Have a look at the latest low temperature records updated for the week ending 21 July.
http://globalfreeze.wordpress.com/2009/07/22/usa-chills-1077-lowest-max-temps-and-856-low-temps-for-week-ending-tue-21-july/

JEM
July 23, 2009 1:44 pm

What’s the old line about not attributing to malice what can more easily be explained by incompetence?
I’m no scientist – as a coder I’m a craftsman, a technician – but when I look around at GIStemp – the design of the whole thing, as well as some of the code (I haven’t yet gone through it exhaustively) – what I see is the product of someone who was learning as he went, the coding as well as probably the analysis.
I’m no Hansen fan, and none of this excuses Hansen’s failure to recognize the limitations and inconsistencies of the end result, but it explains why it is what it is.
Given the current state of the art one could staple together the hardware for a database of all the available time-series data sets, tagged with each model’s or researcher’s adjustments, fills, averages, etc. for about the cost of a decent flat-screen TV. Instead we get models that recrunch and revise history every time they’re run.

Nogw
July 23, 2009 2:24 pm

Ron de Haan (11:08:17) : Then we can say “it” has begun. The new LIA is on.

Pamela Gray
July 23, 2009 3:01 pm

Ron de Haan, notice the dot spread on the Hamweather map? That is more likely due to fewer stations, not because of fewer records.

timetochooseagain
July 23, 2009 6:57 pm

JEM (13:44:47) : It’s called Hanlon’s razor and I always try to remind people to apply it. Good to see someone else do it!
Just eyeballing, but maybe the sudden increase late in the record is related to the massive drop in the number of stations toward the end of the record? Lots of people asked about that. It doesn’t look totally correlated to the number of stations, but I imagine that the effect would be very non-linear and not constant anyway. I also see a possible effect of the more steady ramp-up of coverage by percent area:
http://data.giss.nasa.gov/gistemp/station_data/stations.gif

AnonyMoose
July 23, 2009 9:56 pm

George E. Smith (08:47:45) :
Well I have a couple of questions. (1) Why do such multiple records covering the same time frame even exist

The obvious situation is when equipment is replaced. Run the new and old together for a while.
Goetz and Smith – I thank you for what you’re doing, and I’m fully familiar with what you’re wrestling with. I’ve had to untangle business logic which was spread through odd corners and buried in routine procedures. It’s fun watching the lightbulbs go on when you explain how their own business is working, so they can see what can be adjusted. Of course, it’s different when all the audience and participants are friendly and helpful, rather than part of it wanting to be left alone.

tty
July 24, 2009 4:03 am

john (07:21:37) :
A lot of GHCN records end in 1990 and are replaced by MCDW records that usually start in 1987, so it is only to be expected that very strange things happen around that time.

E.M.Smith
Editor
July 25, 2009 7:34 pm

For those who have asked “how can there be two records from one station?”:
I believe that this is an artifact of the “odd choice” GIStemp makes to merge the GHCN and USHCN data rather than just “picking one”. So some of the same raw station data is processed one way into GHCN and another way into USHCN, then GIStemp “unadjusts” some of the records, throws some away entirely, and merges the resultant data. This can give two different records for the same station with different histories of “adjustment”.
What’s worse is that USHCN data may be kept unchanged, or GHCN data may be kept unchanged, or one may be used to “un-adjust” the other if both series exist for a site. Then the whole thing is mushed together into “one” dataset. This is what gets handed to STEP1 that then “blends the records” together. So you may well have disjoint periods of time with disjoint adjustment histories, all glued together (and with “gaps” filled in by simply making up missing data by guessing via the “reference station method”). So at the end of this, you really have at least 4 different things blended together and called “one data series” (and “smoothed”):
GHCN, USHCN, “Un-adjusted” hybrid, fabricated via reference station method.
Yes, that’s what it does…
Is that a valid technique? Who knows…

July 26, 2009 9:15 am

Comes back to the “accurate but corrupted” data (er, corrected data) that is used to “calibrate” the circulation models to the GIS “as-analyzed” temperatures between 1970 and 1998, doesn’t it?

E.M.Smith
Editor
July 27, 2009 3:46 am

Well, at long last I have a contribution based on the work porting GIStemp. I can now run it up to the “add sea surface anomaly maps” stage, and this means I can inspect the intermediate data for interesting trends. (The STEP4_5 part will take a bit longer. I’ve figured out that SBBX.HadR2 is in “bigendian” format and PCs are littleendian, so I have a data conversion to work out…).
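(For anyone curious about the byte-order issue, here is a generic sketch of reading big-endian 32-bit floats on a little-endian machine in Python. It is illustrative only and not the actual SBBX.HadR2 record layout, which has its own Fortran record structure.)

```python
# Generic illustration of reading big-endian data on a little-endian machine.
# This is NOT the actual SBBX.HadR2 record layout, just the byte-order idea.
import struct

def read_bigendian_floats(path, count):
    """Read `count` consecutive big-endian 32-bit floats from a binary file."""
    with open(path, "rb") as f:
        raw = f.read(4 * count)
    return struct.unpack(">%df" % count, raw)  # ">" forces big-endian decoding
```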
Ok, what have I found in steps 0, 1, 2, …? Plenty. First off, though, I needed a “benchmark” to measure against. I decided to just use the canonical GHCN data set. This is what all the other bits get glued onto, so I wondered, what happens, step by step, as bits get blended into the sausage? I also wondered about the odd “seasonal” anomaly design, and wanted a simple year by year measure.
So my benchmark is just the GHCN monthly averages, summed for each month of the year, cross footed to an annual “Global Average Temperature”, and then a final GAT for ALL TIME is calculated by averaging those yearly GATs.
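(A minimal sketch of that benchmark arithmetic, purely for illustration: it assumes the GHCN monthly data has already been parsed into a hypothetical mapping of (station, year) to a list of 12 monthly means, and it is not the actual ported code.)

```python
# Illustration only of the benchmark arithmetic described above. `data` is a
# hypothetical dict mapping (station_id, year) to a list of 12 monthly means
# in C, with None marking a missing month.
from collections import defaultdict

def yearly_gats(data):
    by_year = defaultdict(lambda: [[] for _ in range(12)])
    for (_station, year), months in data.items():
        for i, t in enumerate(months):
            if t is not None:
                by_year[year][i].append(t)
    gats = {}
    for year, columns in by_year.items():
        monthly = [sum(col) / len(col) for col in columns if col]  # per-month mean over stations
        if len(monthly) == 12:                   # keep only years with all 12 months
            gats[year] = sum(monthly) / 12.0     # "cross foot" to a yearly GAT
    return gats

def all_time_gat(data):
    gats = yearly_gats(data)
    return sum(gats.values()) / len(gats)        # average of the yearly GATs
```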
Now, there are a couple of caveats, not the least of which is that this is Beta code. I’ve cobbled together these tools on 5 hours sleep a night for the last few days (It’s called a “coding frenzy” in the biz… programmers know what I’m talking about… you don’t dare stop till it’s done…) So I’ve done nearly NO Quality Control and have not had a Code Review yet (though I’ve lined up a friend with 30+ years of high end work, currently doing robotics, to review my stuff. He started tonight.) I’m fairly certain that some of these numbers will change a bit as I find little edge cases where some record was left out of the addition…
Second is that I don’t try to answer the question “Is this change to the data valid?” I’m just asking “What is the degree of change?” These may be valid changes.
And third, I have not fully vetted the input data sets. Some of them came with the source code, some from the GIS web site, etc. There is a small possibility that I might not have the newest or best input data. I think this is valid data, but final results may be a smidgeon different if a newer data set shows up.
Ok enough tush cover: What did I find already?!
First up, the “GLOBAL” temperature shows a pronounced seasonal trend. This is a record from after STEP1, just before the zonalizing:
GAT in year : 1971 3.60 6.20 8.20 12.90 16.50 19.30 20.90 20.70 17.90 13.90 9.50 5.60 14.10
The first number is the year, then 12 monthly averages, then the final number is the global average. The fact that the 100ths place is always a 0 is a direct result of their using C in tenths at this stage. It is “False Precision” in my print format.
It seems a bit “odd” to me that the “Globe” would be 17C colder in January than it is in July. Does it not have hemispheres that balance each other out? In fairness, the sea temps are added in in STEP4_5 and the SH is mostly sea. But it’s pretty clear that the “Global” record is not very global at the half way point in GIStemp.
Next is from GHCN, to GHCN with additions (Antarctic, Hohenp…., etc.) and with the pre-1880’s tossed out and the first round of the Reference Station Method. The third record is as the data leaves STEP1 with its magic sauce. These are the totals over all years in the data set. (The individual year trends are still being analyzed – i.e. I need to get some sleep 😉 )
2.6 3.9 7.3 11.8 15.8 18.9 20.7 20.3 17.4 13.1 7.9 3.9 11.97
2.6 3.8 7.3 11.7 15.6 18.7 20.5 20.0 17.2 13.0 7.9 3.9 11.85
3.2 4.5 7.9 12.1 15.9 19.0 20.9 20.5 17.7 13.5 8.5 4.5 12.35
It is pretty clear from inspection of these three that the temperature is raised by GIStemp. It’s also pretty clear that STEP0 does not do much of it (in fact, some data points go down – Adding the Antarctic can do that!). The “cooking” only really starts with STEP1.
The big surprise for me was not the 0.38 C rise in the Total GAT (far right) but the way that winters get warmed up! July and August hardly change (0.2 and 0.3 respectively), yet January has a full 0.6 C rise, as do November, December, February, and March.
So GIStemp thinks it’s getting warmer, but only in the winter! I can live with that! At this point I think it’s mostly in the data, but further dredging around is needed to confirm that. The code as written seems to have a small bias spread over all months, at least as I read it, so I’m at a loss for the asymmetry of winter. Perhaps it’s buried in the Python of Step1 that I’m still learning to read…
Finally, a brief word on trends over the years. The GIStemp numbers are, er, odd. I have to do more work on them, but there are some trends that I just do not find credible. For example, the 1776 record (that is very representative of that block of time) in GHCN is:
GAT/year: 1776 -1.40 2.30 4.20 7.20 12.10 18.20 19.70 19.30 15.60 9.50 3.00 -0.40 9.89
The 2008 record is:
GAT/year: 2008 8.30 8.30 11.10 14.60 17.60 19.90 20.90 20.90 18.80 15.50 11.00 8.80 15.90
Notice that last, whole year global number? We’re already 6 C warmer!
Now look at the post step1 record for 1881:
GAT in year : 1881 3.50 4.10 6.40 10.90 15.30 18.20 20.20 19.80 17.20 11.80 6.40 3.40 11.43
According to this, we’ve warmed up 4.5C since 1881 and the 1971 record above was a full 2.7C warmer than 1881. But I thought we were freezing in 1971 and a new ice age was forecast?!
Now take a look at January. No change from 1881 to 1971 (well, 0.1c) but February was up 2.1C, March 1.8C, December 2.2C. And the delta to 2008 is a whopping 4.8C in January and 5.4C in December, but July is almost identical. By definition, picking one year to compare to another is a bit of a cherry pick, even though these were more or less randomly picked. (There are “better” and “worse”: 1894 was MINUS 2.4c in January). But even with that, the “globe” seems to have gotten much much warmer during the Northern Hemisphere winters.
Somehow I suspect we’re seeing a mix of: exit from the LIA in a record that is mostly focused on N. America and Europe; any AGW being substantially in winter in the N.H. and not really doing much for summer heat (if anything); and potentially some kind of bias in the code or temperature recording system that has been warming winter thermometers (heated buildings nearby, car exhausts, huge UHI from massive winter fuel use today vs a few wood fires 100+ years ago).
I’ve seen nothing in the AGW thesis that would explain these patterns in the data. Certainly not any “runaway greenhouse” effect. The summers are just fine…
So I’m going to dredge through the buckets of “stuff” my new toy is spitting out, and spend a while thinking about what would be a good article to make from this… and do a bit of a code review to make sure I’ve got it right. In the mean time, enjoy your balmy winters 😉
(And if Anthony would like a copy of the ported GIStemp to play with, well, “Will Program for Beer!” 😉 )

E.M.Smith
Editor
July 27, 2009 3:59 am

Hmmm…. A bit further pondering….
Does anyone have a graph of S.H. thermometer growth over time? It would be a bit of a “hoot” if the “Global Warming” all came down to more thermometers being put in The Empire in Africa, Australia, et al., and then to the Soviet Union dropping Siberia out in large part…
Could GW all just be where in the world is Carmen Sandiego’s Thermometer?
😎