Calculating global temperature

I’m happy to present this essay created from both sides of the aisle, courtesy of the two gentlemen below. Be sure to see the conclusion. I present their essay below with only a few small edits for spelling, format, and readability. Plus an image, a snapshot of global temperatures.  – Anthony

http://veimages.gsfc.nasa.gov/16467/temperature_airs_200304.jpg
Image: NASA. The Atmospheric Infrared Sounder (AIRS) instrument aboard NASA’s Aqua satellite senses temperature using infrared wavelengths. This image shows the temperature of the Earth’s surface, or of the clouds covering it, for the month of April 2003.

By Zeke Hausfather and Steven Mosher

People have a variety of questions about the calculation of a global temperature index, ranging from the selection of data and the adjustments made to that data, to the actual calculation of the average. Some even question whether the measure makes any sense at all. It’s not possible to address all these questions in one short piece, but some of them can be addressed and reasonably settled. In particular, we are in a position to answer the questions about potential biases in the selection of data and in how those data are averaged.

Before the discussion can move on to the important matters of adjustments to data or, for example, UHI issues in the source data, it is important to settle some answerable questions first. Namely: do the averaging methods used by GISS, CRU and NCDC bias the result? There are a variety of methods for averaging spatial data; do the methods selected and implemented by the big three bias the result?

There has been a trend of late among climate bloggers on both sides of the divide to develop their own global temperature reconstructions. These have ranged from simple land reconstructions using GHCN data (either v2.mean unadjusted data or v2.mean_adj data) to full land/ocean reconstructions and experiments with alternative datasets (GSOD, WMSSC, ISH).

Bloggers and researchers who have developed reconstructions so far this year include:

Roy Spencer

Jeff Id

Steven Mosher

Zeke Hausfather

Tamino

Chad

Nick Stokes

Residual Analysis

And, just recently, the Muir Russell report

What is interesting is that the results from all these reconstructions are quite similar, despite differences in methodologies and source data. All are also quite comparable to the “big three” published global land temperature indices: NCDC, GISTemp, and CRUTEM.

[Fig 1]

The task of calculating global land temperatures is actually relatively simple, and the differences between reconstructions can be distilled down to a small number of choices:

1. Choose a land temperature series.

Ones analyzed so far include GHCN (raw and adjusted), WMSSC, GISS Step 0, ISH, GSOD, and USHCN (raw, time-of-observation adjusted, and F52 fully adjusted). Most reconstructions to date have chosen to focus on raw datasets, and all give similar results.

[Fig 2]

It’s worth noting that most of these datasets have some overlap. GHCN and WMSSC both include many (but not all) of the same stations. GISS Step 0 includes all GHCN stations in addition to USHCN stations and a selection of stations from Antarctica. ISH and GSOD have quite a bit of overlap, and include hourly/daily data from a number of GHCN stations (though they have many, many more station records than GHCN in the last 30 years).

2. Choosing a station combination method and a normalization method.

GHCN in particular contains a number of duplicate records (dups) and multiple station records (imods) associated with a single wmo_id. Records can be combined at a single location and/or grid cell and converted into anomalies through the Reference Station Method (RSM), the Common Anomalies Method (CAM), the First Differences Method (FDM), or the Least Squares Method (LSM) developed by Tamino and Roman M. Depending on the method chosen, you may be able to use more stations with short records, or end up discarding station records that do not have coverage in a chosen baseline period. Different reconstructions have mainly made use of CAM (Zeke, Mosher, NCDC) or LSM (Chad, Jeff Id/Roman M, Nick Stokes, Tamino). The choice between the two does not appear to have a significant effect on results, though more work could be done using the same model and varying only the combination method.

[Fig 3]
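To make the combining step concrete, here is a minimal Python sketch of a CAM-style workflow, assuming each station record is a plain dict mapping (year, month) to a monthly mean temperature. All names here are illustrative; this is not code from any of the reconstructions listed above.

```python
# Minimal sketch of the Common Anomalies Method (CAM). Assumes each station
# record is a dict mapping (year, month) -> monthly mean temperature (deg C).
# Illustrative only; not taken from any of the released reconstruction codes.
from collections import defaultdict

BASE_START, BASE_END = 1961, 1990  # common anomaly (baseline) period

def monthly_climatology(record):
    """Mean temperature for each calendar month over the baseline period."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (year, month), temp in record.items():
        if BASE_START <= year <= BASE_END:
            sums[month] += temp
            counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

def to_anomalies(record):
    """Convert absolute temperatures to anomalies from the baseline climatology.
    Months with no baseline coverage are dropped -- the cost of CAM noted above."""
    clim = monthly_climatology(record)
    return {(y, m): t - clim[m] for (y, m), t in record.items() if m in clim}

def combine_duplicates(records):
    """Average several anomaly records (e.g. GHCN dups/imods at one wmo_id)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        for key, anom in rec.items():
            sums[key] += anom
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```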

3. Choosing an anomaly period.

The choice of the anomaly period is particularly important for reconstructions using CAM, as it will determine the number of usable records. A period that is too short can also produce odd behavior in the anomalies, but in general the choice makes little difference to the results. In the figure that follows, Mosher shows the difference between picking an anomaly period as CRU does (1961-1990) and picking the anomaly period that maximizes the number of monthly reports in a 30-year window, which turns out to be 1953-1982. No other 30-year period in GHCN has more station reports. This refinement, however, has no appreciable impact.

[Fig 4]
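Finding the baseline that keeps the most data is easy to express in code. The sketch below, which assumes the same dict-of-(year, month) station records as the previous sketch and an illustrative year range, simply counts station-month reports in every candidate 30-year window.

```python
# Illustrative scan over candidate 30-year baseline windows, counting how many
# station-month reports fall inside each. The article states that for GHCN the
# maximum falls at 1953-1982; the year range below is just an example.
def best_anomaly_period(station_records, first_year=1880, last_year=2010, length=30):
    best_window, best_count = None, -1
    for start in range(first_year, last_year - length + 2):
        end = start + length - 1
        count = sum(
            1
            for record in station_records
            for (year, _month) in record
            if start <= year <= end
        )
        if count > best_count:
            best_window, best_count = (start, end), count
    return best_window
```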

4. Gridding methods.

Most global reconstructions use 5×5 degree grid cells to ensure good spatial coverage of the globe. GISTemp uses a rather different method of equal-area grid cells. However, the choice between the two does not seem to make a large difference, as GISTemp’s land record can be reasonably well replicated using 5×5 grid cells. Smaller grid cells can improve regional anomalies, but will often introduce spatial bias in the results, as there will be large missing areas during periods or in locations where station coverage is limited. For the most part, the choice is not that important unless you choose extremely large or small grid cells. In the figure that follows, Mosher shows that selecting a smaller grid does not impact the global average or the trend over time. In his implementation there is no averaging or extrapolation over missing grid cells: all the stations within a grid cell are averaged, and then the entire globe is averaged. Missing cells are not imputed with any values.

[Fig 5]
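Here is a minimal sketch of 5×5 degree gridding with cosine-latitude area weighting, assuming the input for a single month is a list of (lat, lon, anomaly) tuples. It follows the convention described above for Mosher’s implementation, namely that empty cells are simply left out rather than infilled, but the code itself is only an illustration.

```python
# Minimal sketch of 5x5-degree gridding with cosine-latitude area weighting.
# `stations` is a list of (lat, lon, anomaly) tuples for one month. Empty cells
# are simply omitted (no infilling). Illustrative names throughout.
import math
from collections import defaultdict

def grid_cell(lat, lon, size=5.0):
    """Index of the grid cell containing (lat, lon); the north pole is folded
    into the topmost row so cell centres stay inside [-90, 90]."""
    ilat = min(int(math.floor(lat / size)), int(90 / size) - 1)
    ilon = int(math.floor(lon / size))
    return (ilat, ilon)

def global_mean_anomaly(stations, size=5.0):
    # 1. Average all station anomalies within each covered cell.
    sums, counts = defaultdict(float), defaultdict(int)
    for lat, lon, anom in stations:
        cell = grid_cell(lat, lon, size)
        sums[cell] += anom
        counts[cell] += 1
    # 2. Area-weight covered cells by the cosine of the cell-centre latitude.
    weighted_sum = total_weight = 0.0
    for cell, total in sums.items():
        centre_lat = (cell[0] + 0.5) * size
        weight = math.cos(math.radians(centre_lat))
        weighted_sum += weight * (total / counts[cell])
        total_weight += weight
    return weighted_sum / total_weight if total_weight else float("nan")
```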

5. Using a land mask.

Some reconstructions (Chad, Mosh, Zeke, NCDC) use a land mask to weight each grid cell by its respective land area. The land mask determines how much of a given cell (say 5×5) is actually land; a cell on a coast, for example, may be only partly land, and the land mask corrects for this. The percent of land in a cell is constructed from a 1 km by 1 km dataset. The net effect of land masking is to increase the trend, especially in the last decade. This factor is the main reason why recent reconstructions by Jeff Id/Roman M and Nick Stokes are a bit lower than those by Chad, Mosh, and Zeke.

[Fig 6]
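A land mask changes only the weighting step. In the hedged sketch below, each covered cell’s area weight is multiplied by the fraction of the cell that is land; the land_fraction table is assumed to have been computed offline from a high-resolution land/water dataset, and the names are again illustrative.

```python
# Sketch of land masking: each cell's cosine-latitude weight is scaled by the
# fraction of the cell that is land. `cell_anomalies` is {(ilat, ilon): anomaly}
# for one month; `land_fraction` maps the same keys to values in [0, 1].
import math

def land_masked_mean(cell_anomalies, land_fraction, size=5.0):
    weighted_sum = total_weight = 0.0
    for cell, anom in cell_anomalies.items():
        centre_lat = (cell[0] + 0.5) * size
        weight = math.cos(math.radians(centre_lat)) * land_fraction.get(cell, 0.0)
        weighted_sum += weight * anom
        total_weight += weight
    return weighted_sum / total_weight if total_weight else float("nan")
```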

6. Zonal weighting.

Some reconstructions (GISTemp, CRUTEM) do not simply calculate the land anomaly as the size-weighted average of all grid cells covered. Rather, they calculate anomalies for different regions of the globe (each hemisphere for CRUTEM, 90°N to 23.6°N, 23.6°N to 23.6°S and 23.6°S to 90°S for GISTemp) and create a global land temp as the weighted average of each zone (weightings 0.3, 0.4 and 0.3, respectively for GISTemp, 0.68 × NH + 0.32 × SH for CRUTEM). In both cases, this zonal weighting results in a lower land temp record, as it gives a larger weight to the slower warming Southern Hemisphere.

[Fig 7]
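The zonal step can be sketched as follows: average each latitude band separately, then combine the bands with the fixed 0.3/0.4/0.3 weights described for GISTemp. The cell-level anomalies are assumed to come from a gridding step like the one sketched earlier; the real GISTemp code differs in many details.

```python
# Rough sketch of GISTemp-style zonal weighting: band means are computed from
# the gridded cell anomalies, then blended with fixed zone weights. Zones with
# no data in a given month are simply skipped here (a simplification).
import math

ZONES = [  # (southern edge, northern edge, zone weight)
    (23.6, 90.0, 0.3),    # northern extratropics
    (-23.6, 23.6, 0.4),   # tropics
    (-90.0, -23.6, 0.3),  # southern extratropics
]

def zonal_weighted_mean(cell_anomalies, size=5.0):
    """cell_anomalies: {(ilat, ilon): anomaly} for one month."""
    global_anom = 0.0
    for south, north, zone_weight in ZONES:
        band_weight = band_sum = 0.0
        for cell, anom in cell_anomalies.items():
            centre_lat = (cell[0] + 0.5) * size
            if south <= centre_lat < north:
                w = math.cos(math.radians(centre_lat))
                band_sum += w * anom
                band_weight += w
        if band_weight:
            global_anom += zone_weight * (band_sum / band_weight)
    return global_anom
```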

These steps will get you a reasonably good global land record. For more technical details, look at any of the many different models that have been publicly released:

http://noconsensus.wordpress.com/2010/03/25/thermal-hammer-part-deux/
http://residualanalysis.blogspot.com/2010/03/ghcn-processor-11.html
http://rankexploits.com/musings/2010/a-simple-model-for-spatially-weighted-temp-analysis/
http://drop.io/treesfortheforest
http://moyhu.blogspot.com/2010/04/v14-with-maps-conjugate-gradients.html

7. Adding in ocean temperatures.

The major decisions involved in turning a land reconstruction into a land/ocean reconstruction are choosing an SST series (HadSST2, HadISST/Reynolds, and ERSST have been explored so far: http://rankexploits.com/musings/2010/replication/), gridding and anomalizing the series chosen, and creating a combined land/ocean temperature record as a weighted combination of the two. This is generally done by: global temp = 0.708 × ocean temp + 0.292 × land temp.

[Fig 8]
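Once the land and ocean anomaly series are aligned, the blend itself is a one-liner. The sketch below just applies the 0.708/0.292 weights quoted above, which roughly correspond to the ocean and land shares of the Earth’s surface.

```python
# Blend aligned monthly land and ocean anomaly series using the fixed weights
# quoted in the text (0.708 ocean, 0.292 land).
OCEAN_WEIGHT, LAND_WEIGHT = 0.708, 0.292

def combine_land_ocean(land_anoms, ocean_anoms):
    return [
        OCEAN_WEIGHT * ocean + LAND_WEIGHT * land
        for land, ocean in zip(land_anoms, ocean_anoms)
    ]
```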

8. Interpolation.

Most reconstructions only cover 5×5 grid cells with one or more stations reporting for any given month. This means that any areas without station coverage in a given month are implicitly assumed to have the global mean anomaly. This is arguably problematic, as high-latitude regions tend to have the poorest coverage and are generally warming faster than the global average.

GISTemp takes a somewhat different approach, assigning a temperature anomaly to all missing grid boxes located within 1200 km of one or more stations that do have defined temperature anomalies. They justify this on the grounds that “temperature anomaly patterns tend to be large scale, especially at middle and high latitudes.” Because GISTemp excludes SST readings from areas with sea ice cover, this leads to the extrapolation of land anomalies to ocean areas, particularly in the Arctic. The net effect of interpolation on the resulting GISTemp record is small but not insignificant, particularly in recent years. Indeed, the effect of interpolation is the main reason why GISTemp shows somewhat different trends from HadCRUT and NCDC over the past decade.

[Fig 9]
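To make the idea concrete, here is a deliberately simplified sketch of 1200 km infilling: an empty cell takes the plain average of the anomalies of covered cells whose centres lie within 1200 km. GISTemp’s actual scheme is distance-weighted and works from station data rather than cell means, so treat this strictly as an illustration.

```python
# Simplified sketch of 1200 km infilling: empty cells take the unweighted mean
# of covered cells within 1200 km (great-circle distance between cell centres).
# GISTemp's real scheme is distance-weighted; this is illustrative only.
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def infill(cell_anomalies, all_cells, size=5.0, radius_km=1200.0):
    """Return a copy of cell_anomalies with empty cells filled from neighbours."""
    def centre(cell):
        return ((cell[0] + 0.5) * size, (cell[1] + 0.5) * size)

    filled = dict(cell_anomalies)
    for cell in all_cells:
        if cell in filled:
            continue
        lat, lon = centre(cell)
        neighbours = [
            anom for other, anom in cell_anomalies.items()
            if great_circle_km(lat, lon, *centre(other)) <= radius_km
        ]
        if neighbours:
            filled[cell] = sum(neighbours) / len(neighbours)
    return filled
```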

9. Conclusion

As noted above, there are many questions about the calculation of a global temperature index. However, some of those questions can be fairly answered, and have been fairly answered, by a variety of experienced citizen researchers from all sides of the debate. The approaches used by GISS, CRU and NCDC do not bias the result in any way that would erase the warming we have seen since 1880. To be sure, there are minor differences that depend upon the exact choices one makes (choices of ocean data sets, land data sets, rules for including stations, rules for gridding, area-weighting approaches), but all of these differences are minor when compared to the warming we see.

That suggests a turn in the discussion to the matters which have not been as thoroughly investigated by independent citizen researchers on all sides: the question of data adjustments, the question of metadata accuracy and, finally, the question of UHI. Now, however, the community on all sides of the debate has a set of tools to address these questions.

194 Comments
pat
July 13, 2010 3:33 pm

I have an idea. Let’s actually read the thermometer. Report what was read, and identify the location and environment. No more homogenization, models, adjustments, proxies, and tinkering with the past.

July 13, 2010 3:35 pm

George E. Smith,
Gridding is the method used to spatially weight temperature measurements. For example, if you have 1000 stations in the U.S., and 1000 stations in the rest of the world, simply averaging the anomalies from all the stations would result in the U.S. temperatures having the same weight in the resulting global anomaly as the rest of the world combined. To avoid over-weighting oversampled areas, reconstructions generally assign each station to a 5×5 lat/lon grid cell, calculate the anomaly of the entire grid cell as the average of the anomalies of all stations contained therein (using a common anomaly period), and calculate the global anomaly as the area-weighted average of all covered grid cells for a given month.
I go into a bit more detail about the specific methods in my post at Lucia’s place, but Jeff Id, Mosh, myself, NCDC, HadCRUT, etc. all use this same method (GISTemp uses something slightly different, with equal-sized grids instead of 5×5 cells).
http://rankexploits.com/musings/2010/a-simple-model-for-spatially-weighted-temp-analysis/

DirkH
July 13, 2010 3:39 pm

DirkH says:
“He compensates the death of thermometers […]”
I’m referring to E.M. Smith’s famous phrase that describes the population decline of thermometers worldwide from 6000 to 1500, of course. But the constant set method also makes sure you eliminate artefacts from stations that stop reporting for a while to restart later.

crosspatch
July 13, 2010 3:39 pm

But how easy is that to correct for? (meaning UHI)

I would say “exceedingly difficult” because it impacts different stations in different ways and there is no one-size-fits-all algorithm to use for every single station. You can’t have simply a binary “urban/rural” rule because much depends on rate of urbanization. Even the predominant wind direction can play a role. Agricultural uses can change temperatures, too. An area that had been desert suddenly experiencing irrigation might show a higher average temperature with slightly cooler days but much warmer nights.
If someone were to task me with making a global land surface temperature estimate, I would probably begin with either establishing new or selecting existing stations that are quite far from developed areas, just the opposite of the approach we seem to have from the Big Three where remote stations have been dropped over the years in favor of stations in more populated areas.

July 13, 2010 3:45 pm

George E Smith
“then there’s that 1961 to 1990 base period for all those anomaly graphs; what is that all about. If they didn’t measure the correct global temperature during the base period; then of course the anomalies don’t have any real reference either.”
It’s an important fact about anomalies that they aren’t calculated relative to a global temperature – each is calculated relative to the local mean over the period. They just use the same period. But the LSM methods that are mentioned don’t use a base period at all. The LS method calculates the correct offset. That’s one of the points of the article – both ways give the same result.

carrot eater
July 13, 2010 3:48 pm

Mosh
“I use an entirely different method for combining duplicates in GHCN than Hansen does.”
Had to chuckle at that one. For all the methodology choices that turn out not to matter much, that’s got to be one of the least consequential. No?
Zeke,
I like how Broberg came to see why you have to use anomalies, in this sort of exercise.
http://rhinohide.wordpress.com/2010/07/10/trb-0-01-ghcn-simple-mean-average/#comment-602

DirkH
July 13, 2010 3:50 pm

Steven Mosher says:
July 13, 2010 at 3:43 pm
“[…]6. My baseline period is not cherry picked. I pick the period with the most
stations reporting. That turns out to be 1953-82. [….]”
And right after 82, a steep temp rise (and declining thermometer population).
Not accusing anyone of anything, just saying.

John from CA
July 13, 2010 3:54 pm

Thanks Zeke and Steven – great article!

Malcolm Miller
July 13, 2010 3:56 pm

This seems to me all a terrible waste of time and effort. We don’t know how to measure the surface temperature of the whole planet (always changing with night, day, weather, seasons, etc) from here. Maybe it could be measured from space. But that would give us no information about what it was in the past, or how it might have changed in 100 or 200 or 500 years (all very tiny intervals in terms of geological time!). So what is the ‘temperature of the planet’ and where do you put the thermometer? It seems to me that the present records and readings are so suspect and so inaccurate (those Stevenson screens!) that they are useless and don’t represent valid data.

Tommy
July 13, 2010 3:58 pm

What if one were to take the topographical data of each cell, and graph the amount of land at various averaged altitudes, and compare the # of stations at those altitudes. Would certain altitudes be over-represented in the cell’s average, and would it matter?
Just thinking out loud…

July 13, 2010 4:00 pm

Dirk H: Here’s a guy who did a very simple analysis of raw data who comes to the conclusion that there is no discernible trend:
That guy freely admits that he did no geographic weighting. GHCN has a high percentage of US stations – and a low percentage of Southern Hemisphere stations. By taking simple averages of the data, you don’t get an ‘even’ input from each region of the world. So his method is flawed if used as a “global” analysis.
You might enjoy watching how I am adding geographic information into my “global gridded anomaly” code.
http://rhinohide.wordpress.com/category/trb/

sky
July 13, 2010 4:02 pm

It should come as no surprise that, if all the station data commonly shared by all is processed by somewhat different procedures, then the end-products will differ only slightly. But the crucial question is what the end-product tells us about temperature variations beyond the immediate–and usually changing–environment at the station sites. In other words, are the “global” time-series produced truly indicative of the Earth’s entire continental surface?
Lack of any credible century-long station data from much of equatorial Africa and several other major regions, as well as the propensity for virtually all station sites to be affected by human activity, leaves that question much unsettled. And the whole enterprise of stitching together continuous anomaly series from station records that are spatially inhomogeneous and temporally discontinuous needs to be examined far more critically. If we wish to discern climate variations over a century measured in tenths of degrees, the clerical approach of indiscriminately using ALL the station data will not provide a reliable answer. Long experience with world-wide station records and Surface Marine Observations made by ships of opportunity strongly argues for an uncertainty level that is as large as the climate “signal.” Only the most thoroughly vetted station records tell us anything meaningful about that signal. The rest should be ignored.

July 13, 2010 4:11 pm

sky: Only the most thoroughly vetted station records tell us anything meaningful about that signal.
While this selection is not ‘thoroughly vetted’, Nick Stokes took a stab at a global analysis using just GHCN stations that were current, at least 90 years long, and flagged as ‘rural’.
Stokes: Just 60 Stations?
http://moyhu.blogspot.com/2010/05/just-60-stations.html
REPLY: Yes he did, and one of the most interesting things Chiefio found was that of the stations in GHCN that are tagged “rural”, 100 of them are actually at small rural airports. Between ASOS issues and issues like we’ve found at Carefree Skypark (a small “rural” airport at one time, more on this station tomorrow) it does not instill a lot of confidence in the data quality. – Anthony

1DandyTroll
July 13, 2010 4:13 pm

I think it’s all bull crap when different methods result in different results.
Use the simplest method et voila you get the proper result of what your equipment, which turns out to usually be reality, is showing you. The only thing then is the context.
People who screw around with the data, trying to process it and mold it (into better data?) to fit their belief of what reality is supposed to be, are just doing that. The data only need to change if your equipment is faulty, otherwise it’s supposed to function perfectly (usually then the change of data consists of actually making an exchange of the faulty equipment for a functional apparatus to get proper data.)
The data is the data, what might need processing and adjusting, molding even, is the context to explain the data. It’s not exactly the temperature data between 1900 and 2000 from New York, Paris, or Rome, that needed processing and adjusting, but the context that needed updating, i.e. the explanation of the heat island effect with perfectly natural population growth. Unnatural growth would mayhap have been just the building of one sky scraper to house everybody, oh and of course painted white like the Chilean mountains (actually that would make sense to paint risers white, but mountains not since that’d constitute anthropogenic climate change, well locally anyways, for the “local people”, and critters and what not.)
But of course if the alarmist believers put everything into a proper context, would they be alarmist believers then?

Bill DiPuccio
July 13, 2010 4:14 pm

This is a well written overview of the problem by Steve Mosher!
However, the iconic status of global average surface temperature and its actual utility to the climate debate remain dubious. I think this point should continue to be pressed. Ocean heat content (though not without problems) is a much better metric.
Consider: Vast regions can undergo significant climate changes with little net effect on global average temperature. If region A increases 2 degrees, but region B decreases 2 degrees, the net difference is zero degrees.
Moreover, changes in humidity (latent heat) do not register either. But if we must have a global average temp, a better metric would be moist enthalpy, viz. equivalent potential temp (theta-e). This would bring us closer to measuring the actual heat present in the planetary boundary layer.

carrot eater
July 13, 2010 4:14 pm

I can’t make head or tail of what was actually done in Dirk’s link.
The ending implies strongly that the analysis only covers one country, perhaps New Zealand. In which case, I don’t know where the ‘GISS’ series came from, but that’s another matter.
It’s also very unclear how the station combining was done.
But if people want to see a constant-station sample, those have been done as well. Zeke’s done it; I’m sure he can find the link to his own graph somewhere.

RomanM
July 13, 2010 4:23 pm

CE

Mosh
“I use an entirely different method for combining duplicates in GHCN than Hansen does.”
Had to chuckle at that one. For all the methodology choices that turn out not to matter much, that’s got to be one of the least consequential. No?

Have you bothered to look at the data itself at all? A preliminary glance indicates that the quality control has been somewhat lax. In some cases, the data from different versions of the same station is not even close either in value or in pattern. Simple averaging doesn’t cut it.
Also, despite your constant nattering about the “need” for anomalies, this is not really the case when there are overlaps in station records. The LSM can handle that without resorting to referencing fixed periods of time (which can actually introduce seasonal distortion at other times in the record). Anomalising for comparison purposes can be done on the final result with greater accuracy.

July 13, 2010 4:29 pm

If you were to start from scratch to measure average global temperature, how would you do it? As an engineer, I would identify different environment types, such as urban, suburban, woodland, farmland, desert, marsh, lake, etc. For each grid we would need to determine what percentage consists of these environment types, place temperature recording instrumentation in each area and calculate a weighted average for each grid. Not having done this initially, the next best thing is to put out instrumentation now and correlate it with existing instrumentation. Historical land use patterns could then be used to adjust historical temperature data. Statistically adjusted data, even though the methodology has passed peer (i.e. pal) review, does not pass the smell test.

cicero
July 13, 2010 4:30 pm

It would seem that once the UHI effect is removed from the temperature record, solar activity correlates pretty well with unbiased surface temperatures.
The paper, ‘Can solar variability explain global warming since 1970?’ (Solanki and Krivova 2003, http://www.mps.mpg.de/homes/natalie/PAPERS/warming.pdf) concluded that, “The solar indicators correlate well with the temperature record prior to 1970…” but that “the Sun has contributed less than 30% of the global warming since 1970”. The authors decided that the difference has to be due to human-induced warming, but there seems to be another obvious possibility…
Their conclusion was based on the surface temp dataset they obtained from CRU, which contains the UHI bias. I looked at just a dozen randomly chosen rural station plots throughout North America from the GISS site (checking against surfacestations.org – which is a great site Anthony! – and Google Earth to be sure there were no nearby heat sources) and could see what appears to be a good correlation between these graphs and the plotted solar activity post-1970 from the Solanki paper.

AMac
July 13, 2010 4:33 pm

sky (July 13, 2010 at 4:02 pm) —
> But the crucial question is … are the “global” time-series produced truly indicative of the Earth’s entire continental surface?
The great thing about this work is that citizen-scientists from across the Great Divide have taken the time to tackle a tool-building question. If that methods question can be answered, then citizen-scientists can use these methods to look at more interesting problems.
One possibility was that models built by folks subscribing to AGW would show a lot of warming, while models built by those skeptical of AGW would show modest or no warming.
Then we’d say that (1) at least some of the models appear to be biased and unreliable, or (2) none of the models are any good.
But that didn’t happen. E.g. skeptic Jeff Id’s method produced a warming trend that’s on the high side, slightly. “That’s what it is,” he said.
With added confidence that the models are reproducible and not biased by the preconceptions of the modelers, it should be possible to go forward to the contentious issues, like the ones you raise. Scientific progress!

carrot eater
July 13, 2010 4:44 pm

RomanM
Yet in many cases, the duplicates are pretty much the same where they overlap.
And what ‘nattering’? When I say ‘anomaly’, I include what the RSM and LSM do. The moment you use offsets, you have relative temperatures, not absolutes. The terminology ‘anomaly’ is certainly not limited to the CAM. The CAM just introduces a fixed baseline at the time of combining, that’s all.

carrot eater
July 13, 2010 4:46 pm

AMac,
To avoid confusion, it might be better to not refer to the above work as models.

Mac the Knife
July 13, 2010 4:47 pm

Zeke Hausfather and Steven Mosher:
A most excellent post! Thank you very much!!!
Anthony and Charles The Moderator:
You guys Rock! Rock On, Numero Uno!!!!!!
Carrot eater:
There is no serious dispute that the planet has been warming in fits and starts since the last major ice age and, closer in time, since the Little Ice Age. A most reasonable dispute ensues when claims are made that man-made emissions of CO2 are the direct cause of any of this extended period of global warming.
Nature plays enough ‘tricks’ to keep us all guessing at correlations, without dubious statistical manipulations such as ‘Mike’s nature trick’, as in “I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd (sic) from 1961 for Keith’s to hide the decline”. When faced with potential malfeasance by rabid advocates of the man-made global warming hypothesis, it should come as no one’s surprise that reasoning minds conclude some reported global warming is indeed “an artifact of adjustments”.
Now… where is the link to that catchy tune “Hide the Decline!”?