Art courtesy Dave Stephens
Foreword by Anthony Watts: This article, written by the two Jeffs (Jeff C and Jeff Id), is one of the more technically complex essays ever presented on WUWT. It has been several days in the making. One of the goals I have with WUWT is to make sometimes difficult-to-understand science understandable to a wider audience. In this case the statistical analysis is rather difficult for the layman to comprehend, but I asked for (and got) an essay that was explained in terms I think many can grasp and understand. That being said, it is a long article, and you may have to read it more than once to fully grasp what has been presented here. Steve McIntyre of Climate Audit laid much of the groundwork for this essay, and from his work as well as this essay, it is becoming clearer that Steig et al. (see “Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year”, Nature, Jan 22, 2009) isn’t holding up well to rigorous tests. Unfortunately, Steig’s office has so far deferred several requests to provide the complete data sets needed to replicate and test his paper. Dr. Steig has left on a trip to Antarctica, and the remaining data is not “expected” to be available until his return.
To help lay readers understand the terminology used, here is a mini-glossary in advance:
RegEM – Regularized Expectation Maximization
PCA – Principal Components Analysis
PC – Principal Components
AWS – Automatic Weather Stations
One of the more difficult concepts is RegEM, an algorithm developed by Tapio Schneider in 2001. It’s a form of expectation maximization (EM), which is a common and well understood method for infilling missing data. As we’ve previously noted on WUWT, many of the weather stations used in the Steig et al. study had issues with being buried by snow, causing significant data gaps in the Antarctic record, and in some burial cases stations have been accidentally lost or confused with others at different latitudes and longitudes. Then of course there is the problem of coming up with trends for the entire Antarctic continent when most of the weather station data is from the periphery and the peninsula, with very little data from the interior.
Expectation Maximization is a method which uses a normal distribution to compute the best probability of fit to a missing piece of data. Regularization is required when so much data is missing that the EM method won’t solve. That makes it a statistically dangerous technique to use, and as Kevin Trenberth, climate analysis chief at the National Center for Atmospheric Research, said in an e-mail: “It is hard to make data where none exist.” (Source: MSNBC article) It is also valuable to note that one of the co-authors of Steig et al., Dr. Michael Mann, dabbles quite a bit in RegEM in this preparatory paper to Mann et al. 2008 “Return of the Hockey Stick”.
For those that prefer to print and read, I’ve made a PDF file of this article available here.
Introduction
This article is an attempt to describe some of the early results from the Antarctic reconstruction recently published on the cover of Nature, which demonstrated a warming trend in the Antarctic since 1957. Actual surface temperatures in the Antarctic are hard to come by, with only about 30 stations prior to 1980, recorded through tedious and difficult efforts by scientists in the region. In the 1980s more stations were added, including some automatic weather stations (AWS) which sit in remote areas and report the temperature information automatically. Unfortunately, due to the harsh conditions in the region, many of these stations have gaps in their records or very short reporting times (a few years in some cases). Very few stations are located in the interior of the Antarctic, leaving the trend for the central portion of the continent relatively unknown. The location of the stations is shown on the map below.
In addition to the stations there are satellite data from an infrared surface temperature measurement which records the temperature of the actual emission from the surface of the ice/ground in the Antarctic. This is different from the microwave absorption measurements as made from UAH/RSS data which measure temperatures in a thickness of the atmosphere. This dataset didn’t start until 1982.
Steig 09 is an attempt to reconstruct the continent-wide temperatures using a combination of measurements from the surface stations shown above and the post-1982 satellite data. The complex math behind the paper is an attempt to ‘paste’ the 30ish pre-1982 real surface station measurements onto 5509 individual gridcells from the satellite data. An engineer or vision system designer could use several straightforward methods which would ensure reasonable distribution of the trends across the grid based on a huge variety of area weighting algorithms. The accuracy of any of the methods would depend on the amount of data available. These well understood methods were ignored in Steig 09 in favor of RegEM.
The use of Principal Component Analysis in the reconstruction
Steig 09 presents the satellite reconstructions as the trend and also provides an AWS reconstruction as verification of the satellite data rather than a separate stand alone result presumably due to the sparseness of the actual data. An algorithm called RegEM was used for infilling the missing data. Missing data includes pre 1982 for satellites and all years for the very sparse AWS data. While Dr. Steig has provided the reconstructions to the public, he has declined to provide any of the satellite, station or AWS temperature measurements used as inputs to the RegEM algorithm. Since the station and AWS measurements were available through other sources, this paper focuses on the AWS reconstruction.
Without getting into the detail of PCA analysis, the algorithm uses covariance to assign weighting of a pattern in the data and does not have any input whatsoever for actual station location. In other words, the algorithm has no knowledge of the distance between stations and must infill missing data based solely on the correlation with other data sets. This means there is a possibility that with improper or incomplete checks, a trend from the peninsula on the west coast could be applied all the way to the east. The only control is the correlation of one temperature measurement to another.
If you were an engineer concerned with the quality of your result, you would recognize the possibility of accidental mismatch and do a reasonable amount of checking to ensure that the stations were properly assigned after infilling. Steig et al. described no attempts to check this basic potential problem with RegEM analysis. This paper will describe a simple method we used to determine that the AWS reconstruction is rife with spurious correlations (i.e. correlations that appear real but really aren’t) attributable to the methods used by Dr. Steig. These spurious correlations can take a localized climatic pattern and “smear” it over a large region that lacks adequate data of its own.
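To make the point about covariance-only infilling concrete, here is a minimal sketch of plain EM infilling for two series, one complete and one with gaps. It is an illustration under stated assumptions, not the RegEM code used in Steig 09: it is ordinary (unregularized) EM for a bivariate normal, the function name `em_infill` is hypothetical, and the numbers are invented. Note that nothing in the function knows where either station is located; the gaps are filled purely from the estimated covariance.

```python
# Toy EM infilling: x is a complete series, y has gaps marked with None.
# Assumption: plain EM for a bivariate normal, NOT the regularized RegEM variant.

def em_infill(x, y, iterations=200):
    n = len(x)
    observed = [i for i in range(n) if y[i] is not None]

    # x is fully observed, so its mean and variance never change.
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n

    # Starting guesses use the observed y values only.
    mu_y = sum(y[i] for i in observed) / len(observed)
    s_yy = sum((y[i] - mu_y) ** 2 for i in observed) / len(observed)
    s_xy = 0.0

    for _ in range(iterations):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy  # variance of y not explained by x

        # E-step: expected y and expected y^2 at every time step.
        ey = [y[i] if y[i] is not None else mu_y + beta * (x[i] - mu_x)
              for i in range(n)]
        ey2 = [ey[i] ** 2 + (resid_var if y[i] is None else 0.0)
               for i in range(n)]

        # M-step: re-estimate mean, variance and covariance.
        mu_y = sum(ey) / n
        s_yy = sum(ey2) / n - mu_y ** 2
        s_xy = sum(x[i] * ey[i] for i in range(n)) / n - mu_x * mu_y

    beta = s_xy / s_xx
    filled = [y[i] if y[i] is not None else mu_y + beta * (x[i] - mu_x)
              for i in range(n)]
    return filled, beta


# Invented anomalies for illustration only.
x = [-2.0, -1.0, 0.5, 1.5, 2.0, -0.5, 1.0, 0.0]
y = [-1.9, -1.2, 0.7, None, None, -0.4, None, 0.1]
filled, beta = em_infill(x, y)
print(filled)
```

With this simple missing-data pattern, the EM estimate settles on the same line you would get by regressing y on x over the time steps where both exist. The only thing that ties the two series together is how well they correlate.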
Now is where it becomes a little tricky. RegEM uses a reduced information dataset to infill the missing values. The dataset is reduced by Principal Component Analysis (PCA) replacing each trend with a similar looking one which is used for covariance analysis. Think of it like a data compression algorithm for a picture which uses less computer memory than the actual but results in a fuzzier image for higher compression levels.
While the second image is still visible, the actual data used to represent the image is reduced considerably. This will work fine for pictures with reasonable compression, but the data from some pixels has blended into others. Steig 09 uses 3 trends to represent all of the data in the Antarctic. In its full complexity, using 3 PCs is analogous to representing not just a picture but actually a movie of the Antarctic with three color ‘trends’, where the color of each pixel changes according to different weights of the same red, green and blue color trends (PCs). With enough PCs the movie could be replicated perfectly with no loss. Here’s an important quote from the paper.
“We therefore used the RegEM algorithm with a cut-off parameter K=3. A disadvantage of excluding higher-order terms (k>3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this tradeoff because the Peninsula is already the best-observed region of the Antarctic.”

Above: a graph from Steve McIntyre of ClimateAudit where he demonstrates how “K=3 was in fact a fortuitous choice, as this proved to yield the maximum AWS trend, something that will, I’m sure, astonish most CA readers.”
K=3 means only 3 trends were used; the ‘lack of captured variance’ is an acknowledgement and acceptance of the fuzziness of the image. It’s easy to imagine that it would be difficult to represent a complex movie of Antarctic temperature from 1957 to 2006 with any sharpness using the same 3 color trends reweighted for every pixel. In the satellite version of the Antarctic movie the three trends look like this.
Note that the sudden step in the 3rd trend would cause a jump in the ‘temperature’ of the entire movie. This represents the temperature change between the pre 1982 recreated data and the after 1982 real data in the satellite reconstruction. This is a strong yet overlooked hint that something may not be right with the result.
In the case of the AWS reconstruction we have only 63 AWS stations to make the movie screen, and the trends of 42 surface stations are used to infill the missing data. If the data from one surface station is copied to the wrong AWS stations, the average will overweight some trends and underweight others. So the question becomes, is the compression level too high?
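The compression analogy can be sketched in a few lines. The example below is a toy under stated assumptions, not the Steig 09 procedure: four invented "stations" are built from two independent patterns, the leading spatial patterns are found by simple power iteration (a stand-in for a proper PCA routine), and the data are rebuilt from only the first k patterns. The helper names `leading_pcs` and `reconstruct` are hypothetical.

```python
import math

def center(series):
    m = sum(series) / len(series)
    return [v - m for v in series]

def corr(a, b):
    a, b = center(a), center(b)
    num = sum(p * q for p, q in zip(a, b))
    return num / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))

def leading_pcs(stations, k, iterations=500):
    """Return the k leading eigenvectors of the station covariance matrix."""
    data = [center(s) for s in stations]
    n, t = len(data), len(data[0])
    cov = [[sum(data[i][s] * data[j][s] for s in range(t)) / t
            for j in range(n)] for i in range(n)]
    patterns = []
    for _component in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary starting vector
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(n) for j in range(n))
        patterns.append(v)
        # Deflate: remove the pattern just found before looking for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(n)]
               for i in range(n)]
    return patterns

def reconstruct(stations, k):
    """Rebuild every station series from only the k leading patterns."""
    data = [center(s) for s in stations]
    n, t = len(data), len(data[0])
    recon = [[0.0] * t for _ in range(n)]
    for v in leading_pcs(stations, k):
        scores = [sum(v[i] * data[i][s] for i in range(n)) for s in range(t)]
        for i in range(n):
            for s in range(t):
                recon[i][s] += v[i] * scores[s]
    return recon

# Invented data: stations A and B follow pattern 1, C and D mostly pattern 2.
T = 40
p1 = [math.sin(2 * math.pi * t / T) for t in range(T)]
p2 = [math.sin(2 * math.pi * 3 * t / T) for t in range(T)]
stations = [
    [2.0 * a for a in p1],                         # A
    [1.8 * a for a in p1],                         # B
    [0.1 * a + 0.5 * b for a, b in zip(p1, p2)],   # C
    [0.1 * a + 0.4 * b for a, b in zip(p1, p2)],   # D
]

r_true = corr(stations[0], stations[2])
one_pc, two_pc = reconstruct(stations, 1), reconstruct(stations, 2)
r_k1 = corr(one_pc[0], one_pc[2])
r_k2 = corr(two_pc[0], two_pc[2])
print(r_true, r_k1, r_k2)
```

In this toy, stations A and C are only weakly related in the raw data. Rebuilt from a single pattern they march in lockstep, because every rebuilt series is just a rescaled copy of that one pattern. Keeping the second pattern restores the original weak relationship. That is the sense in which too few PCs can manufacture correlations.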
The problems that arise when using too few principal components
Fortunately, we’re here to help in this matter. Steve McIntyre again provided the answer with a simple plot of the actual surface station data correlation with distance. This plot compares the similarity (correlation) of each temperature station with each of the 41 other manual surface stations against the distance between them. A correlation of 1 means the data from one station moves in perfect lockstep with the other. Because A -> B correlation isn’t a perfect match for B->A there are 42*42 separate points in the graph. This first scatter plot is from measured temperature data prior to any infilling of missing measurements. Station to station distance is shown on the X axis. The correlation coefficient is shown on the Y axis.
Since this plot above represents the only real data we have existing back to 1957, it demonstrates the expected ‘natural’ spatial relationship from any properly controlled RegEM analysis. The correlation drops with distance which we would expect because temps from stations thousands of miles away should be less related than those next to each other. (Note that there are a few stations that show a positive correlation beyond 6000 km. These are entirely from non-continental northern islands inexplicably used by Steig in the reconstruction. No continental stations exhibit positive correlations at these distances.) If RegEM works, the reconstructed RegEM imputed (infilled) data correlation vs. distance should have a very similar pattern to the real data. Here’s a graph of the AWS reconstruction with infilled temperature values.
Compare this plot with the previous plot from actual measured temperatures. The infilled AWS reconstruction has no clearly evident pattern of decay over distance. In fact, many of the stations show a correlation of close to 1 with stations 3000 km away! The measured station data is our best indicator of true Antarctic trends and it shows no sign that these long distance correlations occur. Of course, common sense should also make one suspicious of these long distance correlations, as they would be comparable to data that indicated Los Angeles and Chicago had closely correlated climate.
It was earlier mentioned that the use of 3 PCs was analogous to the loss of detail that occurs in data compression. Since the AWS input data is available, it is possible to regenerate the AWS reconstruction using a higher number of PCs. It stood to reason that spurious correlations could be reduced by retaining the spatial detail lost in the 3 PC reconstruction. Using RegEM, we generated a new AWS reconstruction using the same input data but with 7 PCs. The distance correlations are shown in the plot below.
Note the dramatic improvement over that shown in the previous plot. The correlation decay with distance so clearly seen in the measured station temperature data has returned. While the cone of the RegEM data is slightly wider than the ‘real’ surface station data, the counterintuitive long distance correlations seen in the Steig reconstruction have completely disappeared. It seems clear that limiting the reconstruction to 3 PCs resulted in numerous spurious correlations when infilling missing station data.
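For readers who want to try the distance-correlation check on their own data, the sketch below shows one way it could be computed. It is not the script behind the plots in this article: the station names, coordinates and anomalies are invented, the function names are hypothetical, distance is the great-circle distance on a sphere of radius 6371 km, and each pair of stations is counted once using only the time steps where both have data.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def pearson_overlap(a, b, min_overlap=3):
    """Correlation over the time steps where both series have data."""
    pairs = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
    if len(pairs) < min_overlap:
        return None
    ma = sum(p for p, _ in pairs) / len(pairs)
    mb = sum(q for _, q in pairs) / len(pairs)
    saa = sum((p - ma) ** 2 for p, _ in pairs)
    sbb = sum((q - mb) ** 2 for _, q in pairs)
    if saa == 0 or sbb == 0:
        return None
    sab = sum((p - ma) * (q - mb) for p, q in pairs)
    return sab / math.sqrt(saa * sbb)

def correlation_vs_distance(stations):
    """stations maps name -> (lat, lon, series); returns sorted (km, r) pairs."""
    names = sorted(stations)
    points = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            lat_a, lon_a, series_a = stations[a]
            lat_b, lon_b, series_b = stations[b]
            r = pearson_overlap(series_a, series_b)
            if r is not None:
                points.append((great_circle_km(lat_a, lon_a, lat_b, lon_b), r))
    return sorted(points)

# Invented stations and anomalies, for illustration only (None marks a gap).
stations = {
    "near_1": (-77.0, 166.0, [0.1, 0.4, -0.2, None, 0.3, -0.5]),
    "near_2": (-78.0, 168.0, [0.2, 0.5, -0.1, 0.0, None, -0.4]),
    "far_1": (-65.0, -64.0, [-0.3, 0.1, 0.4, -0.2, 0.0, 0.2]),
}
points = correlation_vs_distance(stations)
for dist, r in points:
    print(round(dist), round(r, 2))
```

Plotting the resulting pairs with distance on the X axis and correlation on the Y axis gives the kind of scatter plot discussed above. Running the same function on measured data and on infilled data makes the comparison direct.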
Using only 3 principal components distorts temperature trends
If Antarctica had uniform temperature trends across the continent, the spurious correlations might not have a large impact on the overall reconstruction. Individual sites may have some errors, but the overall trend would be reasonably close. However, Antarctica is anything but uniform. The spurious correlations can allow unique climatic trends from a localized region to be spread over a larger area, particularly if an area lacks detailed climate records of its own. Our conclusion is that this is exactly what is happening with the Steig AWS reconstruction.
Consider the case of the Antarctic Peninsula:
- The peninsula is geographically isolated from the rest of the continent
- The peninsula is less than 5% of the total continental land mass
- The peninsula is known to be warming at a rate much higher than anywhere else in Antarctica
- The peninsula is bordered by a vast area known as West Antarctica that has extremely limited temperature records of its own
- 15 of the 42 temperature surface stations (35%) used in the reconstruction are located on the peninsula
If the Steig AWS reconstruction were properly correlating the peninsula stations’ temperature measurements to the AWS sites, you would expect to see the highest rates of warming at the peninsula extremes. This is the pattern seen in the measured station data. The plot below shows the temperature trends for the reconstructed AWS sites for the period of 1980 to 2006. This time frame has been selected as this is the period when AWS data exists. Prior to 1980, 100% of AWS reconstructed data is artificial (i.e. infilled by RegEM).
Note how warming extends beyond the peninsula extremes down toward West Antarctica and the South Pole. Also note the relatively moderate cooling in the vicinity of the Ross Ice Shelf (bottom of the plot). The warming once thought to be limited to the peninsula appears to have spread. This “smearing” of the peninsula warming has also moderated the cooling of the Ross Ice Shelf AWS measurements. These are both artifacts of limiting the reconstruction to 3 PCs.
Now compare the above plot to the new AWS reconstruction using 7 PCs.
The difference is striking. The peninsula has become warmer and warming is largely limited to its confines. West Antarctica and the Ross Ice Shelf area have become noticeably cooler. This agrees with the commonly-held belief prior to Steig’s paper that the peninsula is warming while the rest of Antarctica is not.
Temperature trends using more traditional methods
In providing a continental trend for Antarctic warming, Steig used a simple average of the 63 AWS reconstructed time series. As can be seen in the plots above, the AWS stations are heavily weighted toward the peninsula and the Ross Ice Shelf area. Steig’s simple average is shown below. The linear trend for 1957 through 2006 is +0.14 deg C/decade. It is worth noting that if the time frame is limited to 1980 to 2006 (the period of actual AWS measurements), the trend changes to cooling, -0.06 deg C/decade.
We used a gridding methodology to weight the AWS reconstructions in proportion to the area they represent. Using Steig’s method, 3 stations on the peninsula covering 5% of the continent’s area would have the same weighting as three interior stations spread over 30% of the continent’s area. The gridding method we used is comparable to that utilized in other temperature reconstructions such as James Hansen’s GISTEMP. The gridcell map used for the weighted 7 PC reconstruction is shown here.
Cells with a single letter contain one or more AWS temperature stations. If more than one AWS falls within a gridcell, the results were averaged and assigned to that cell. Cells with multiple letters had no AWS within them, but had three or more contiguous cells containing AWS stations. Imputed temperature time series were assigned to these cells based on the average of the neighboring cells. Temperature trends were calculated both with and without the imputed cells. The reconstruction trend using 7 PCs and a weighted station average follows.
The trend has decreased to 0.08 deg C/decade. Although it is not readily apparent in this plot, from 1980 to 2006 the temperature profile has a pronounced negative trend.
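The difference between a simple station average and an area-weighted average is easy to see with a toy calculation. The sketch below is an illustration only: the cell names, areas and trends are invented, the function names are hypothetical, and the gridding is reduced to "average the stations in each cell, then weight each cell by its area". It does not reproduce the gridcell map used above.

```python
def simple_average(stations):
    """Every station counts equally, wherever it sits."""
    return sum(s["trend"] for s in stations) / len(stations)

def gridded_average(stations, cell_area):
    """Average stations within each cell, then weight cells by area."""
    by_cell = {}
    for s in stations:
        by_cell.setdefault(s["cell"], []).append(s["trend"])
    total_area = sum(cell_area[c] for c in by_cell)
    weighted = sum(cell_area[c] * sum(v) / len(v) for c, v in by_cell.items())
    return weighted / total_area

# Invented numbers: three stations crowded into one small peninsula cell,
# three interior stations each alone in a larger cell (trends in deg C/decade).
cell_area = {"P": 5.0, "I1": 10.0, "I2": 10.0, "I3": 10.0}
stations = [
    {"cell": "P", "trend": 0.4},
    {"cell": "P", "trend": 0.5},
    {"cell": "P", "trend": 0.3},
    {"cell": "I1", "trend": -0.1},
    {"cell": "I2", "trend": -0.2},
    {"cell": "I3", "trend": 0.0},
]

simple = simple_average(stations)
gridded = gridded_average(stations, cell_area)
print(round(simple, 3), round(gridded, 3))
```

In this made-up case the simple average is warming, because half the stations sit in the small warm cell, while the area-weighted average is slightly cooling. The real effect depends entirely on where the stations are and what they show.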
Temporal smearing problems caused by too few PCs?
The temperature trends using the various reconstruction methods are shown in the table below. We have broken the trends down into three time periods: 1957 to 2006, 1957 to 1979, and 1980 to 2006. The time frames are not arbitrarily chosen, but mark an important distinction in the AWS reconstructions. There is no AWS data prior to 1980. In the 1957 to 1979 time frame, every single temperature point is a product of the RegEM algorithm. In the 1980 to 2006 time frame, AWS data exists (albeit quite spotty at times) and RegEM leaves the existing data intact while infilling the missing data.
We highlight this distinction as limiting the reconstruction to 3 PCs has an additional pernicious effect beyond spatial smearing of the peninsula warming. In the table below, note the balance between the trends of the 1957 to 1979 era vs. that of the 1980 to 2006 era. In Steig’s 3 PC reconstruction, moderate warming that happened prior to 1980 is more balanced with slight cooling that happened post 1980. In the new 7 PC reconstruction, the early era had dramatic warming, the later era had strong cooling. It is believed that the 7 PC reconstruction more accurately reflects the true trends for the reasons stated earlier in this paper. However, the mechanism for this temporal smearing of trends is not fully understood and is under investigation. It does appear to be clear that limiting the selection to three principal components causes warming that is largely constrained to a pre-1980 time frame to appear more continuous and evenly distributed over the entire temperature record.
| Reconstruction | 1957 to 2006 trend | 1957 to 1979 trend (pre-AWS) | 1980 to 2006 trend (AWS era) |
|---|---|---|---|
| Steig 3 PC | +0.14 deg C/decade | +0.17 deg C/decade | -0.06 deg C/decade |
| New 7 PC | +0.11 deg C/decade | +0.25 deg C/decade | -0.20 deg C/decade |
| New 7 PC weighted | +0.09 deg C/decade | +0.22 deg C/decade | -0.20 deg C/decade |
| New 7 PC wgtd imputed cells | +0.08 deg C/decade | +0.22 deg C/decade | -0.21 deg C/decade |
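The trends in the table above are linear fits over each period, expressed per decade. The sketch below shows how such a figure could be computed with an ordinary least-squares line. It is a minimal illustration, not the code used for the table: the function name `trend_per_decade` is hypothetical and the annual series is synthetic (a steady rise through 1979 followed by a gentler decline), chosen only so the arithmetic is easy to follow.

```python
def trend_per_decade(years, values, start, end):
    """Least-squares slope over start..end inclusive, in units per decade."""
    pts = [(y, v) for y, v in zip(years, values) if start <= y <= end]
    n = len(pts)
    mean_y = sum(y for y, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxy = sum((y - mean_y) * (v - mean_v) for y, v in pts)
    sxx = sum((y - mean_y) ** 2 for y, _ in pts)
    return 10.0 * sxy / sxx

# Synthetic anomalies, NOT reconstruction output:
# +0.03 deg C/yr through 1979, then -0.01 deg C/yr afterwards.
years = list(range(1957, 2007))
values = [0.03 * (y - 1957) if y <= 1979 else 0.66 - 0.01 * (y - 1979)
          for y in years]

full = trend_per_decade(years, values, 1957, 2006)
early = trend_per_decade(years, values, 1957, 1979)
late = trend_per_decade(years, values, 1980, 2006)
print(round(full, 3), round(early, 3), round(late, 3))
```

Note that the full-period trend is not the average of the two sub-period trends. A fit over the whole record also sees the difference in level between the early and late years, so a warming era followed by a cooling era can still leave a positive overall trend.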
Conclusion
The AWS trends on which this incredibly long post is based were used only as verification of the satellite data. The statistics used for verification are another subject entirely. Where Steig 09 falls short in the verification is that RegEM was inappropriately applying area weighting to individual temperature stations. The trends from the AWS reconstruction clearly have blended into distant stations, creating an artificially high warming result. The RegEM methodology also appears to have blended warming that occurred decades ago into more recent years to present a misleading picture of continuous warming. It should also be noted that every attempt made to restore detail to the reconstruction or weight station data resulted in reduced warming and increased cooling in recent years. None of these methods resulted in more warming than that shown by Steig.
We don’t yet have the satellite data (Steig has not provided it) so the argument will be:
“Silly Jeffs, you haven’t shown anything; the AWS wasn’t the conclusion, it was the confirmation.”
To that we reply with an interesting distance correlation graph of the satellite reconstruction (also from only 3 PCs). The conclusion has the exact same problem as the confirmation. Stay tuned.
(Graph originally calculated by Steve McIntyre)

Well done. A nice piece of analysis.
Many thanks for this very nice article.
[snip ad hom]
http://www.nytimes.com/2009/03/01/science/earth/01treaty.html?_r=1&partner=rss&emc=rss
What are we going to do about that?
Jeff C and Jeff Id, what are the results if some of the data are removed; say 5 or 10 % randomly chosen? Shouldn’t the results be ‘about’ the same, where ‘about’ is kind of fuzzy?
A WAG on my part and very likely not useful.
A really great post/article, and not *that* long when compared to some RC or Air Vent posts 🙂
Many probably have already been tracking this discussion here and at CA and elsewhere. What strikes me, besides the very clever use of a mass of different statistical and graphical tools is the quality and tone of the debate. The commitment to transparency of methods and the sharing of code stands in sharp contrast to the obfuscation of Steig et al and Gavin who is apparently their surrogate. I am sure they are following this work closely and are probably somewhat unnerved by the findings, the level of effort, the sheer horsepower that is being targeted on this topic.
Congratulations to the Jeff’s for a clear exposition, Anthony for making it available and all those other contributors here, at CA and elsewhere. The satellite data will come in time and with it red faces among the “professionals”.
Outstanding contribution! Very concrete, clear, and, as far as it goes, convincing. Now, the opposition has the task, should they accept it, to refute by means of an equally clear exposition.
We need more articles with this level of exposition, not less. The blogs are full of “unsubstantiated opinions” . Back and forth discussions with this level of “substantiation” could lead to genuine improvements in public understanding of AGW issues.
[snip off topic – trying to keep this thread centered]
Jeff and Jeff, thanks for this very interesting analysis. I am interested in the trends calculated using the 7 PC weighted and weight/imputed cells. These give similar positive trends for 1957-1979, and similar negative trends for 1980-2006. The positive and negative trends have almost identical absolute values of 0.21 C/decade. Yet the total trend for 1957-2006 comes out at +0.08 C/decade. Is this a consequence of the error bars of the resulting trends? End effects in the data? Or something else?
Now to this post –
Thanks for the big effort on our behalf in presenting this in an understandable way. I look forward to reading it later this evening.
This really should be featured at the NIPCC Convention in NY to further underscore the sloppiness behind the global warming movement.
The sloppiness just further confirms the thin ice the AGW theory is based on.
“Juraj V, the new improved Harry was used.”
Thanks. I think it is worth publicizing in some official way.
Thanks Anthony, Jeff, and Jeff. I will be able to understand a bit more as I ask my brain to form new neurons for this science. The old neurons and networks have been greatly taxed by trying to wrap themselves around the depth of highly organized [snip] by people/organizations/publications I used to trust. [snip]
By the way, why exactly 7 PC?
Why not use as many PCs as possible to get the best fit possible? I.e. why is more not better, in this particular case? For example, McIntyre’s plot seems to indicate that 32 would make a significant difference to the result.
If there is an optimum (or merely correct) number of PC to use, how is it “objectively” determined?
UAH shows a downwards trend in Antarctica of nearly 1C/century over the last 30 years. Nobody should be using sparse ground based data there. Antarctica is the same size as the US, and I’m pretty sure that you can’t accurately interpolate the temperature in western Kansas from thermometers in downtown Houston and Phoenix.
http://spreadsheets.google.com/pub?key=pj0h2MODqj3gMXQwEtd2uXg&oid=7&output=image
It is a well known fact that temperatures along Chile and the Antarctic Peninsula have never corresponded to temperatures at the same latitudes in the northern hemisphere, because of the great extent of the Pacific Ocean. As always, GWrs choose the warmest month of the year on the Antarctic Peninsula, February (temperatures reach up to 2°C above zero), to issue their “convenient” studies.
And remember:
http://wattsupwiththat.com/2008/01/22/surprise-theres-an-active-volcano-under-antarctic-ice/
That mountain chain, which is a prolongation of the South American Andes, is active again, as shown by the continuous eruption of the Chaiten volcano in Chile.
Thanks a million Jeff & Jeff. So lucid, so refreshing to get first-rate science again, finally confirming what I’d suspected in reading CA where it was sometimes like trying to read Chinese (no offence meant!).
Now here’s a paper for publication… and… go for another first, online peer-review? or rather, Craig Loehle’s already done that hasn’t he?
Now Antarctica looks like what it used to look like in 2004 (my page to help folk grasp Polar realities, all the way up to Steig), it fits what Svensmark would predict, both before and after 1980. With this, another detail comes to mind. I’d expect the mid-continent areas to show colder cold, and more fluctuation, than all coastal stations. Could that fit too? Finally: any instrument siting data issues (UHI sort of)? Have such been checked? especially with so few sites.
Two Jeffs. Thank you for this summary. We have a local “letter to the editor” writer who is convinced the Antarctic will soon be gone. Good ammo.
Pierre Gosselin: RE: the DC protest … the forecast is hilarious:
http://www.wunderground.com/cgi-bin/findweather/getForecast?query=washington%20DC%20&wuSelect=WEATHER
And remember too: Along those mountains we pay a gas tax of about 50%. You were used to paying up to $4 per gallon last year, so it would be advisable ( :)), the sooner the better (in order not to get accustomed to lower prices), to establish such a tax. (I guess this is what is behind the green agenda and, of course, Hansen´s “history march”).
@Adam Gallon (02:13:22) :
“Let me see if I’ve got this right.
1) Everyone can agree that the Antarctic has warmed between 1957 & 2006, the amount is the question?.
2) That warming is confined to the 1957 – 1979 period, the amount is near enough the same no matter what methodology is applied to calculate it?.
3) 1980 – 2006 shows a cooling, again the amount is the question?.”
The fourth picture shows that taking 3 or even 7 PCs still introduces a warming bias.
Taking all PCs (why should anyone not use all the information?) should further lower the trends.
In Canada, in the province of Quebec, the Government of Quebec must feed white-tailed deer for a second year because there is too much snow. The news in French: http://droitemonde.blogspot.com/2009/03/une-autre-mauvaise-nouvelle-pour-nos.html
[snip – off topic, trying to keep this thread centered ]
QUESTION(s)
(1) What result would they get for only the regions where sensors exist, without any statistical reconstruction? It would seem that if they warmed, or didn’t, in the same way the derived changes did, that might give us some idea whether the reconstruction was at least plausible. I mean, if they don’t show any change, or very little, and the generated data do, I would be very suspicious, as I would if the proportion of change inferred was much greater than that observed.
(2) Is there anything like a control, like performing that hat trick using US monitoring stations selected to border a large region of the US to model the trends within the border, then compare the computed result with what actually happened?
Thanks for pulling this together. I have a reasonable statistical background so I understand a good deal of this argument. What I do not see discussed is a presentation of the Null Hypothesis (H0) and an analysis of whether or not the data/model supports or refutes the hypothesis at a statistically significant level.
For the Antarctic, the Null Hypothesis should be something like this: “Over the time frame where temperature data is available, there is no discernable trend in temperature change.” This would allow testing the alternative hypothesis that there is a trend in the data statistically different from “0”. That trend could be up or down. Statistical significance of the alternative hypothesis would determine if the null hypothesis would be rejected.
Simply looking at the available data presented in this post leads one to believe that the null hypothesis cannot be refuted by the data. Perhaps this is true for both the Steig data and the McIntyre data.
Another issue that could be discussed is the use of models of data to identify outliers. Thus a reasonable model of the Antarctic data could identify either data within a station that does not fit modeled trends or identification of stations themselves that appear to be outside of modeled trends. Investigation of outlier data can lead to interesting conclusions. The simple case would be recalibration of an instrument or resiting to a better location. The complex case would be discovery of an unexpected fact.
Finally, as George E. P. Box is quoted to have said, “All models are wrong, but some are useful.” Thanks to diligent efforts by McIntyre and many others, I believe we can say that the Steig model is both wrong and not useful.
What are we going to do about that?
Judging by past experience, we will sign it and abide by it to about an 80% level. The RoW will sign it and blithely ignore it. The news you will read about it will be about how the US is 20% in violation.
OTOH, I’d rather the RoW ignore it than abide by it because if they do not ignore it, millions of babies will starve, which is a result I do not favor most days.
[thanks, noted, but off topic. trying to keep this thread centered ]