I’m happy to present this essay created from both sides of the aisle, courtesy of the two gentlemen below. Be sure to see the conclusion. I present their essay below with only a few small edits for spelling, format, and readability. Plus an image, a snapshot of global temperatures. – Anthony

By Zeke Hausfather and Steven Mosher
There are a variety of questions that people have about the calculation of a global temperature index, ranging from the selection of data and the adjustments made to it, to the actual calculation of the average. For some there is even a question of whether the measure makes any sense at all. It’s not possible to address all these questions in one short piece, but some of them can be addressed and reasonably settled. In particular, we are in a position to answer the question about potential biases in the selection of data and biases in how that data is averaged.
Before the discussion can move on to the important matters of data adjustments or, for example, UHI issues in the source data, some answerable questions should be settled first. Namely: there are a variety of methods for averaging spatial data, so do the methods selected and implemented by the big three — GISS, CRU, and NCDC — bias the result?
There has been a trend of late among climate bloggers on both sides of the divide to develop their own global temperature reconstructions. These have ranged from simple land reconstructions using GHCN data
(either v2.mean unadjusted data or v2.mean_adj data) to full land/ocean reconstructions and experiments with alternative datasets (GSOD, WMSSC, ISH).
Bloggers and researchers who have developed reconstructions so far this year include:
Steven Mosher
Zeke Hausfather
Chad
Jeff Id and Roman M
Nick Stokes
Tamino
And, just recently, the Muir Russell report
What is interesting is that the results from all these reconstructions are quite similar, despite differences in methodologies and source data. All are also quite comparable to the “big three” published global land temperature indices: NCDC, GISTemp, and CRUTEM.
[Fig 1]
The task of calculating global land temperatures is actually relatively simple, and the differences between reconstructions can be distilled down to a small number of choices:
1. Choose a land temperature series.
Ones analyzed so far include GHCN (raw and adjusted), WMSSC, GISS Step 0, ISH, GSOD, and USHCN (raw, time-of-observation adjusted, and F52 fully adjusted). Most reconstructions to date have chosen to focus on raw datasets, and all give similar results.
[Fig 2]
It’s worth noting that most of these datasets have some overlap. GHCN and WMSSC both include many (but not all) of the same stations. GISS Step 0 includes all GHCN stations in addition to USHCN stations and a selection of stations from Antarctica. ISH and GSOD have quite a bit of overlap, and include hourly/daily data from a number of GHCN stations (though they have many, many more station records than GHCN in the last 30 years).
2. Choosing a station combination method and a normalization method.
GHCN in particular contains a number of duplicate records (dups) and multiple station records (imods) associated with a single wmo_id. Records can be combined at a single location and/or grid cell and converted into anomalies through the Reference Station Method (RSM), the Common Anomalies Method (CAM), the First Differences Method (FDM), or the Least Squares Method (LSM) developed by Tamino and Roman M. Depending on the method chosen, you may be able to use more stations with short records, or end up discarding station records that do not have coverage in a chosen baseline period. Different reconstructions have mainly made use of CAM (Zeke, Mosher, NCDC) or LSM (Chad, Jeff Id/Roman M, Nick Stokes, Tamino). The choice between the two does not appear to have a significant effect on results, though more work could be done using the same model and varying only the combination method.
[Fig 3]
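For readers who want to see the mechanics, here is a minimal sketch of the Common Anomalies Method in Python. The data structure, base period, and minimum-coverage threshold are illustrative assumptions, not any particular reconstruction’s actual code:

```python
import numpy as np

def cam_anomalies(monthly, base_start=1961, base_end=1990, min_years=15):
    """Common Anomalies Method (CAM), minimal sketch.

    monthly: dict mapping (year, month) -> mean temperature in deg C
    Returns a dict mapping (year, month) -> anomaly relative to the
    record's own base-period mean for that calendar month.
    """
    anomalies = {}
    for m in range(1, 13):
        # This calendar month's values inside the base period.
        base = [t for (y, mo), t in monthly.items()
                if mo == m and base_start <= y <= base_end]
        if len(base) < min_years:
            continue  # not enough baseline coverage: month is dropped
        clim = float(np.mean(base))
        for (y, mo), t in monthly.items():
            if mo == m:
                anomalies[(y, mo)] = t - clim
    return anomalies
```

Records without sufficient base-period coverage are simply dropped, which is exactly the cost of CAM relative to LSM noted above.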
3. Choosing an anomaly period.
The choice of the anomaly period is particularly important for reconstructions using CAM, as it will determine the number of usable records. The anomaly period can also produce odd behavior in the anomalies if it is too short, but in general the choice makes little difference to the results. In the figure that follows Mosher shows the difference between picking an anomaly period like CRU does, 1961-1990, and picking an anomaly period that maximizes the number of monthly reports in a 30-year period. The period that maximizes the number of monthly reports over a 30-year period turns out to be 1952-1983 (Mosher). No other 30-year period in GHCN has more station reports. This refinement, however, has no appreciable impact.
[Fig 4]
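Finding the report-maximizing period is a simple sliding-window count. A sketch, assuming you have already tallied the total number of monthly station reports in each year of GHCN (the input arrays are hypothetical):

```python
import numpy as np

def best_base_period(years, reports_per_year, window=30):
    """Return the contiguous `window`-year span with the most reports.

    years: sequence of year labels, e.g. [1880, 1881, ...]
    reports_per_year: matching counts of monthly station reports
    """
    counts = np.asarray(reports_per_year, dtype=float)
    # Sum of every contiguous window via convolution with a box kernel.
    window_sums = np.convolve(counts, np.ones(window), mode="valid")
    i = int(np.argmax(window_sums))
    return years[i], years[i + window - 1]
```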
4. Gridding methods.
Most global reconstructions use 5×5 grid cells to ensure good spatial coverage of the globe. GISTemp uses a rather different method of equal-area grid cells. However, the choice between the two methods does not seem to make a large difference, as GISTemp’s land record can be reasonably well replicated using 5×5 grid cells. Smaller grid cells can improve regional anomalies, but will often introduce spatial bias in the results, as there will be large missing areas during periods or in locations where station coverage is limited. For the most part, the choice is not that important, unless you choose extremely large or small grid cells. In the figure that follows Mosher shows that selecting a smaller grid does not impact the global average or the trend over time. In his implementation there is no averaging or extrapolation over missing grid cells. All the stations within a grid cell are averaged and then the entire globe is averaged. Missing cells are not imputed with any values.
[Fig 5]
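A minimal sketch of that no-imputation gridding approach: bin one month’s station anomalies into 5×5 cells, average within cells, then average the cells with cosine-of-latitude area weights (the cosine weighting is the standard area correction for lat/lon grids; the names and structures here are illustrative):

```python
import numpy as np
from collections import defaultdict

def gridded_mean(stations, cell_deg=5.0):
    """Area-weighted mean of gridded station anomalies for one month.

    stations: iterable of (lat, lon, anomaly) tuples
    Cells with no stations are simply skipped -- nothing is imputed.
    """
    cells = defaultdict(list)
    for lat, lon, anom in stations:
        i = int((lat + 90.0) // cell_deg)   # latitude band index
        j = int((lon + 180.0) // cell_deg)  # longitude band index
        cells[(i, j)].append(anom)

    num = den = 0.0
    for (i, j), vals in cells.items():
        lat_c = -90.0 + (i + 0.5) * cell_deg
        w = np.cos(np.radians(lat_c))  # cell area shrinks toward the poles
        num += w * np.mean(vals)
        den += w
    return num / den if den else float("nan")
```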
5. Using a land mask.
Some reconstructions (Chad, Mosh, Zeke, NCDC) use a land mask to weight each grid cell by its respective land area. The land mask determines how much of a given cell (say 5×5) is actually land. A cell on a coast, thus, could have only a portion of land in it. The land mask corrects for this. The percent of land in a cell is constructed from a 1 km by 1 km dataset. The net effect of land masking is to increase the trend, especially in the last decade. This factor is the main reason why recent reconstructions by Jeff Id/Roman M and Nick Stokes are a bit lower than those by Chad, Mosh, and Zeke.
[Fig 6]
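Land masking only changes the weight each cell receives in the final average. A sketch under the same illustrative structures, where each cell’s land fraction is assumed to come from a high-resolution (e.g. 1 km × 1 km) land/water grid:

```python
import numpy as np

def land_masked_mean(cell_anoms, cell_lats, land_fracs):
    """Mean over grid cells weighted by area times land fraction.

    cell_anoms: anomaly for each cell with data
    cell_lats:  central latitude of each cell
    land_fracs: fraction of each cell that is land, in [0, 1]
    A coastal cell that is half ocean counts half as much as a
    fully land-locked cell at the same latitude.
    """
    w = np.cos(np.radians(np.asarray(cell_lats))) * np.asarray(land_fracs)
    return float(np.sum(w * np.asarray(cell_anoms)) / np.sum(w))
```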
6. Zonal weighting.
Some reconstructions (GISTemp, CRUTEM) do not simply calculate the land anomaly as the size-weighted average of all grid cells covered. Rather, they calculate anomalies for different regions of the globe (each hemisphere for CRUTEM, 90°N to 23.6°N, 23.6°N to 23.6°S and 23.6°S to 90°S for GISTemp) and create a global land temp as the weighted average of each zone (weightings 0.3, 0.4 and 0.3, respectively for GISTemp, 0.68 × NH + 0.32 × SH for CRUTEM). In both cases, this zonal weighting results in a lower land temp record, as it gives a larger weight to the slower warming Southern Hemisphere.
[Fig 7]
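In code, the zonal step is nothing more than a fixed-weight combination of separately computed zone averages, using the weights quoted above:

```python
def gistemp_style_global(north, tropics, south):
    """Combine zone means (90N-23.6N, 23.6N-23.6S, 23.6S-90S)
    with the 0.3 / 0.4 / 0.3 weights described above."""
    return 0.3 * north + 0.4 * tropics + 0.3 * south

def crutem_style_global(nh, sh):
    """Combine hemispheric means with the 0.68 / 0.32 weights."""
    return 0.68 * nh + 0.32 * sh
```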
These steps will get you a reasonably good global land record. For more technical details, look at any of the many different models that have been publicly released:
http://noconsensus.wordpress.com/2010/03/25/thermal-hammer-part-deux/
http://residualanalysis.blogspot.com/2010/03/ghcn-processor-11.html
http://rankexploits.com/musings/2010/a-simple-model-for-spatially-weighted-temp-analysis/
http://drop.io/treesfortheforest
http://moyhu.blogspot.com/2010/04/v14-with-maps-conjugate-gradients.html
7. Adding in ocean temperatures.
The major decisions involved in turning a land reconstruction into a land/ocean reconstruction are choosing an SST series (HadSST2, HadISST/Reynolds, and ERSST have been explored so far: http://rankexploits.com/musings/2010/replication/), gridding and anomalizing the series chosen, and creating a combined land/ocean temp record as a weighted combination of the two. This is generally done by: global temp = 0.708 × ocean temp + 0.292 × land temp.
[Fig 8]
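The final combination is a one-liner; the 0.708/0.292 split is simply the ocean/land share of the Earth’s surface:

```python
def combine_land_ocean(land, ocean, ocean_frac=0.708):
    """Weighted land/ocean combination:
    global = 0.708 * ocean + 0.292 * land."""
    return ocean_frac * ocean + (1.0 - ocean_frac) * land
```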
8. Interpolation.
Most reconstructions only cover 5×5 grid cells with one or more stations reporting in a given month. This means that any areas without station coverage in a given month are implicitly assumed to have the global mean anomaly. This is arguably problematic, as high-latitude regions tend to have the poorest coverage and are generally warming faster than the global average.
GISTemp takes a somewhat different approach, assigning a temperature anomaly to all missing grid boxes located within 1200 km of one or more stations that do have defined temperature anomalies. They rationalize this based on the fact that “temperature anomaly patterns tend to be large scale, especially at middle and high latitudes.” Because GISTemp excludes SST readings from areas with sea ice cover, this leads to the extrapolation of land anomalies to ocean areas, particularly in the Arctic. The net effect of interpolation on the resulting GISTemp record is small but not insignificant, particularly in recent years. Indeed, the effect of interpolation is the main reason why GISTemp shows somewhat different trends from HadCRUT and NCDC over the past decade.
[Fig 9]
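A sketch of distance-limited infilling in the spirit of the GISTemp step just described. The linear taper of station weight from 1 at zero distance to 0 at 1200 km follows GISTemp’s published description; the data structures here are illustrative:

```python
import numpy as np

R_EARTH_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    c = (np.sin(p1) * np.sin(p2)
         + np.cos(p1) * np.cos(p2) * np.cos(dlon))
    return R_EARTH_KM * np.arccos(np.clip(c, -1.0, 1.0))

def infill_cell(cell_lat, cell_lon, stations, radius_km=1200.0):
    """Anomaly for a station-free cell from stations within radius_km,
    weighted linearly from 1 at the cell center to 0 at the radius.
    Returns None if no station is in range (the cell stays empty)."""
    num = den = 0.0
    for lat, lon, anom in stations:
        d = great_circle_km(cell_lat, cell_lon, lat, lon)
        if d < radius_km:
            w = 1.0 - d / radius_km
            num += w * anom
            den += w
    return num / den if den else None
```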
9. Conclusion
As noted above there are many questions about the calculation of a global temperature index. However, some of those questions can be fairly answered, and have been, by a variety of experienced citizen researchers from all sides of the debate. The approaches used by GISS, CRU, and NCDC do not bias the result in any way that would erase the warming we have seen since 1880. To be sure, there are minor differences that depend upon the exact choices one makes: choices of ocean data sets, land data sets, rules for including stations, rules for gridding, area-weighting approaches. But all of these differences are minor when compared to the warming we see.
That suggests a turn in the discussion to the matters which have not been as thoroughly investigated by independent citizen researchers on all sides: data adjustments, metadata accuracy, and finally UHI. Now, however, the community on all sides of the debate has a set of tools to address these questions.

Hitherto, it had been my understanding that the base period was selected over a span of continuous years then set to zero.
Illustration http://www.geoffstuff.com/global-temperature-anomalies-1800-2006.png
Source http://www.globalissues.org/article/233/climate-change-and-global-warming-introduction
which is a version of GISS land + sea in a 2009 report. (Chosen at random, not because of any attribute).
Now I realise that my assumption of setting to zero might be wrong. Although the material quoted here eyeballs to zero temp from 1951-1980, none of the graphs in the article here on WUWT seem to average to zero over the base period. Please, can someone clarify the procedure? In an ideal world, the “anomaly” (a method that I dislike personally) should be comparable as one moves between grid cells, hemispheres, land/sea, etc.
It seems that stations are being adjusted, dropped out and perhaps added to the base period from time to time. Thus, if correct, the “anomaly” is a bit ephemeral, changing in value (K) each time the base period is altered.
So this is a call to stay closer to real physics, by using at least actual degrees C. Graphics can be used to cut out the dead area at the base of the graph in hot places.
Addendum: Is each author above aware of the probability that reconstructions as shown are based on country information that is, for some countries, already adjusted?
For example, I do not think I have ever seen a graph of Australia land temperatures that is based on raw data as collected by the observers. Also, the Australian record is undergoing continuous revision as more metadata are being extended back in time.
In short, I think that you are not dealing with a stationary historical data set.
CE
Then you don’t understand these other methods very well.
Anomalies in the usual sense calculate a temperature record relative to itself during a specific time period. The results you end up with are relative values.
On the other hand, in LSM, no temperature series is ever compared to itself and at the end of the process you end up with a non-anomalised result.
The two approaches really do not have much in common either in technique or in final result form.
“pat says:
July 13, 2010 at 3:33 pm
I have an idea. Let’s actually read the thermometer. Report what was read, and identify the location and environment. No more homogenization, models, adjustments, proxies, and tinkering with the past.”
I totally agree. When you look at the raw data from well-sited rural thermometers just about anywhere and do not find the warming these complicated treatments appear to show, then maybe there is something deeply flawed in the concept of a global temperature. We do not have the coverage and info to do this when we have to extrapolate so extensively. And the adjustments for UHI are usually wrongly signed, queering the sites.
When 5 rural stations in the US show an average in which the 1930s were the warmest and 1990s not even equal to 1953 and then the bigger averages show the 90s to be warmer, obviously there must be a significant flaw. Too many local, raw, rural sites seriously disagree.
So once again the accuracy and precision of the data is still not addressed.
Sorry, I still fail to see the significance of reproducing the same results over and over without investigating the quality of measurement at each individual station and also the inclusion of land use change (which alters the climate over time i.e. boundary layer ) and UHI. It would seem those are the most important factors that need to be ironed out.
I enjoyed the article – the overview was interesting, if arcane.
You guys make enough stats to start shipping bubble gum cards of the major players…lol.
Do you suppose they’ll be worth something one day?
Thanks everyone who participated in producing these reconstructions because it clearly took a lot of time and effort.
I guess we can conclude then that the methodology of constructing a properly-weighted land temperature record does not vary much between the various choices of reasonable methodologies.
Land temperatures in the GHCN dataset have increased about 0.9C and the Land/Ocean temperature series has increased about 0.55C since 1900.
There are some other adjustments done that have GISTemp at +0.70C (Land) and +0.64C (Land and Ocean) since 1900 and Hadcrut3 at +0.966C (Land) and +0.703C (Land and Ocean) since 1900.
This then leaves a number of questions remaining:
1. How Raw is the Raw data in the GHCN dataset?
2. What adjustments are done to have GISTemp and Hadcrut3 higher (and lower) than the reconstruction numbers?
3. Can we pick a better series of high-quality rural stations that have consistently reported over the whole period, so we can avoid UHI and station-selection biases (continuing to use rising stations and discarding declining stations) in the GHCN dataset?
“”” Nick Stokes says:
July 13, 2010 at 3:45 pm
George E Smith
“then there’s that 1961 to 1990 base period for all those anomaly graphs; what is that all about. If they didn’t measure the correct global temperature during the base period; then of course the anomalies don’t have any real reference either.”
It’s an important fact about anomalies that they aren’t calculated relative to a global temperature – each is calculated relative to the local mean over the period. They just use the same period. But the LSM methods that are mentioned don’t use a base period at all. The LS method calculates the correct offset. That’s one of the points of the article – both ways give the same result. “””
Nick I don’t disagree with anything you said. But the Title of this Article was “Calculating the Global Temperature.” When in fact the methodology has nothing whatsoever to do with the global temperature. It is simply the result of manipulations of some given and quite arbitrary set of thermometers somewhere; compared against themselves each to each; and all of that can take place even if the planet earth was nowhere to be found.
The point is that NOWHERE in this process can the result be connected to the planet to “Calculate the Global Temperature.” It simply calculates the variations of some quite arbitrary set of thermometers from themselves.
The min/max daily temperature reading fails the Nyquist sampling criterion; and the spatial distribution of the set of thermometers also fails the Nyquist test, and by orders of magnitude; so there’s no way that any recovered average can be correct because of the aliasing noise; and the result is meaningless anyway.
As I have said many times GISStemp calculates GISStemp; and nothing else; same goes for HADcrud.
And even if one did actually measure the true average Temperature of the globe; it is not related in any way to the energy flows; so it tells us nothing about the stability of earth’s energy balance.
Again basically all of these land datasets show one thing –
that with growth of airports, plane traffic, and UHI effects the temperature readings are higher.
Concerning UHI there is Dr Spencer’s analysis which suggests it can be quite large even for quite low population density areas in the US.
http://www.drroyspencer.com/2010/03/direct-evidence-that-most-u-s-warming-since-1973-could-be-spurious/
However, the really interesting study to my mind is by Edward Long, who chose apparently good quality rural stations in 48 US states, and found that the raw data showed a warming of only 0.11 C/century. A key to Anthony’s concern is that he eyeballed the station sites “using a GPS map application”, which probably means no airport UHI issues.
Now how can CO2 avoid warming rural sites if it is the main cause of warming? At 0.11 C/century it would take a cool 1800 years or so to reach IPCC’s arbitrary 2 C limit.
Edward Long’s study can be found here:
http://icecap.us/index.php/go/joes-blog/a_pending_american_temperaturegate/
The PDF report is linked at this site and has a listing of the station ID’s.
If we plotted the yearly mean temp instead of the anomaly, set the bottom of the graph at ZERO, and set the top at 2x the mean, then we’d see how this Global Temp scare is making a mountain out of a molehill.
RomanM
Don’t worry, I understand all the methods just fine. To me, an ‘anomaly’ is simply a relative temperature. That’s the key part. Whether it’s relative to its own average (CAM) or something else (as with the intermediate calculations in RSM, LSM) is not that important, when explaining to people the principle of why you use relatives, instead of just averaging together absolutes.
Mac the knife,
You may think that is not under dispute, but look around. You will see lots of commenters and bloggers say or imply that global warming only really shows up at the advertised magnitude when GISS or NOAA or CRU add in their adjustments. I don’t know how widely held this confusion is, but it’s been out there.
So someone can delete all of your posts lambasting GISSTEMP after all?
If Not Then Goto FacePalmLand.
I also question: How Raw is Raw? And I can’t see how vast ocean expanse temps with no records in existence can be determined from measurements on the irregular land masses. And then there is this:
http://icecap.us/images/uploads/DrKeen2.jpg
I’m fully aware this is only a region, yet it must be used to determine a larger area.
And let’s not forget all references to this: (et al in the arctic, e.g. Lucy Skywalker’s post):
http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=431042500000&data_set=1&num_neighbors=1
and this:
[Wibjorn Karlen] In attempts to reconstruct the temperature I find an increase from the early 1900s to ca 1935, a trend down until the mid 1970s and so another increase to about the same temperature level as in the late 1930s.
A distinct warming to a temperature about 0.5 deg C above the level 1940 is reported in the IPCC diagrams. I have been searching for this recent increase, which is very important for the discussion about a possible human influence on climate, but I have basically failed to find an increase above the late 1930s.
See here:
http://wattsupwiththat.com/2009/11/29/when-results-go-bad/
??????????
Firstly, a HUGE pat on the back to Anthony, for having the kohannas to post this! Thanx Anthony!
Second a huge thanx to all the citizen scientists who have spent their own precious time on these temp constructions. Because of you, we can now move on to the next questions. My favourite being where is this warmth from, min or max? BRAVO!
And finally, thanx to all the respondents for being civil and asking questions about the article, not about the authors! The more we have of this type of exchange, the better chance we have of deciding as to whether or not we are in real trouble, a bit of trouble or no trouble.
Zeke and Mosh, would it be advantageous to maybe do a dual post on your next step, one here and maybe one at Tamino’s or Gavin’s?
Kudos to all!
“”” Zeke Hausfather says:
July 13, 2010 at 3:39 pm
George E. Smith,
Lucia gave an excellent explanation of why anomalies are more useful than absolute temps awhile back: “””
Maybe anomalies are “more useful than absolute temps”; I’ll even stipulate that although I can’t imagine what for. That doesn’t change the fact that anomalies have nothing whatsoever to do with Calculating the Global Temperature.
The GISStemp process, and the HADcrud process calculate GISStemp and HADcrud respectively; and nothing else. They have no connection to the mean global temperature of the planet; which in turn has no connection to the energy balance of the earth energy budget. The thermal processes over different terrains are all different; and none of them are simply related to even the local Temperature; let alone to any global average temperature; so the whole process is an exercise in self delusion.
Might as well average the telephone numbers in your local phone directory; it is meaningless, unless the average happens to be your phone number; it might not even be a valid phone number; but it still is the average of a quite arbitrary set of numbers.
I quite understand the concept of scraping all the numbers off a thermometer, so they have no absolute calibration; and simply referencing the mark at any time to some other place it once was. But what a complete and utter waste of time and effort that is.
And the results if they show any trends at all, simply show the trend of a particular algorithm applied to a particular set of quite arbitrary and meaningless numbers.
It’s like holding your arms out straight in front of you and then reporting:- ” see there are my fingers right out there on the end of my hands; which is about where they always have been !”
Good paper, Anthony!
I admire the scientists who post here, but a major simple fact seems to be absent in all the presentations. You only have to review the snapshot under Anthony’s starting comment, and ask yourself why the equatorial temperatures show as cool while the adjacent areas are quite warm.
I find the complete disregard of relative humidity most strange. Is it solely because we cannot measure it remotely? The global land temperature graph is in accordance with my memories as a 70+ year old living in tropical Australia and PNG. The period around the 1940s was a time of major drought that changed to major flooding years from the mid 1950s to the mid 1970s. Since then we have gone into a serious drought cycle that we may be coming out of in the past few years.
Given that people around the world’s developed countries are now demanding the availability of huge amounts of energy for their enjoyment in its various forms, it’s not surprising that the UHI effect is so significant.
I’m sure that the scientists are well aware of the impact of relative humidity.
They were, when I was a boy. Why not now?
For all those asking, how raw is raw:
In the case of unadjusted GHCN and USHCN: these are monthly means. They are often the means of the daily max and min temperatures, but in some countries other averages may be used. So somebody somewhere calculated those means. But there is no attempt to correct for TOB, UHI, station moves, changing conditions at the site, changing instrumentation, etc. in the unadjusted source files.
NOAA does attempt such adjustments, but they put the results in other files (adj for GHCN, TOB and F52 for USHCN). The individual countries often also do their own adjustments for their own stations, but these should be kept separate from what gets sent to NOAA. You can check that by checking against CLIMAT reports for recent years.
Oh, and R. Gates, in all sincerity, would you please contribute a post about the arctic? Anthony has already said that he would most likely post it. And this is the way science grows, by looking at all sides of the question.
Bill Illis,
“1. How Raw is the Raw data in the GHCN dataset?”
Quite raw, at least in current practice. It comes straight from the CLIMAT forms submitted by the various Mets. You can see these at OGIMET.
“2. What adjustments are done to have GISTemp and Hadcrut3 higher (and lower) than the reconstruction numbers?”
Gistemp code is available. But the message from the linked articles is that the adjusted GISS and HADcrut are not noticeably different from the raw GHCN results.
“3. Can we pick a better series of high-quality rural stations that have consistently reported over the whole period, so we can avoid UHI and station-selection biases (continuing to use rising stations and discarding declining stations) in the GHCN dataset?”
That was the criterion I used in the 60 stations exercise. Rural, 90 years of record, and still reporting in 2008. The exercise was intended just to show what a rather random choice could do – if I were doing it again, I would modify the weighting.
George Smith, that’s quite an astonishing nihilist (and paranoid) vision of reality. I never thought I’d see that kind of thing. Even here.
Steven Mosher says: “Well, the results are consistent with and confirm the theory of GHG warming, espoused BEFORE this data was collected. They dont prove the theory, no theory is proven.”
“Consistent with,” perhaps. CO2 went up and temperatures went up. This does not prove causation. There have been periods in the past where temperatures went down and CO2 remained high.
But “confirm the theory of GHG warming?” How did you reach that conclusion?
The fact that the theory was espoused before the data was collected is irrelevant and “confirms” nothing. It may be necessary, but is far from sufficient. Correlation is not causation. If I had a theory that wind is caused by trees wiggling their branches, and data subsequently showed that when it’s windy, tree branches are always waving, then according to your logic [or what I think is your logic], this would “confirm” my earlier theory of dendroanemosicity.
BillyBob: You asked, “…is there a reliable record for min/max anywhere?”
There is an absolute land surface temperature dataset from 1948 to present. It’s identified as the “CPC GHCN/CAMS t2m analysis (land)” and is available through the KNMI Climate Explorer in three different resolutions:
http://climexp.knmi.nl/selectfield_obs.cgi?someone@somewhere
The dataset is discussed in Fan and van den Dool (2007) “A global monthly land surface air temperature analysis for 1948–present” (24mb):
ftp://ftp.cpc.ncep.noaa.gov/wd51yf/GHCN_CAMS/2007JD008470.pdf
I used it in one post:
http://bobtisdale.blogspot.com/2010/03/absolute-land-surface-temperature.html
And in that post, I plotted annual maximum, average, and minimum global land surface temperatures:
http://i43.tinypic.com/25qr8yo.png
Here are only stations with 100 year records, for those interested: http://i81.photobucket.com/albums/j237/hausfath/Picture135.png
One of the nice things of having all of these tools out there is that folks can look at any particular subset of station they desire, be it dark rural non-airport stations with 100+ year records or anything else.
For those asking about the rawness of the data, GHCN v2.mean comes directly from CLIMAT reports submitted by national MET offices, with some very basic QA (they keep a file with all the QA rejects if you want to check). We also show reconstructions with two alternative datasets (WMSSC and GSOD). The latter (GSOD) uses mostly stations not included in GHCN and is constructed from raw daily readings.
REPLY: Do you have a list of those stations by name and ID Zeke? I’d like to have a look at them. – Anthony
Just a note to look again at the charts at the top of the post and note the label “GSOD.” It’s not all about GHCN data.
GSOD is an alternate data set – it is not GHCN. Whereas GHCN is compiled from CLIMAT reports submitted by various National Meteorology Centers, GSOD is the daily summary of near-real-time (synoptic) data gathered hourly to every several hours, depending on the source. This ~hourly data is used for the ISH/ISD data set (used by Dr Spencer). GSOD is a separate data stream, although many of the same stations will appear in both GHCN and GSOD. However, GSOD includes thousands of stations not included in GHCN after 1972 – although there are many fewer stations before 1972.