Foreword by Anthony Watts: This article, written by the two Jeffs (Jeff C and Jeff Id), is one of the more technically complex essays ever presented on WUWT. It has been several days in the making. One of the goals I have with WUWT is to make sometimes difficult-to-understand science understandable to a wider audience. In this case the statistical analysis is rather difficult for the layman to comprehend, but I asked for (and got) an essay that was explained in terms I think many can grasp and understand. That being said, it is a long article, and you may have to read it more than once to fully grasp what has been presented here. Steve McIntyre of Climate Audit laid much of the groundwork for this essay, and from his work as well as this essay, it is becoming clearer that Steig et al (see “Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year”, Nature, Jan 22, 2009) isn’t holding up well to rigorous tests, as demonstrated by McIntyre as well as in the essay below. Unfortunately, despite several requests, Steig’s office has so far deferred providing the complete data sets needed to replicate and test his paper. Steig has left on a trip to Antarctica, and the remaining data is not “expected” to be available until his return.
To help layman readers understand the terminology used, here is a mini-glossary in advance:
RegEM – Regularized Expectation Maximization
PCA – Principal Components Analysis
PC – Principal Components
AWS – Automatic Weather Stations
One of the more difficult concepts is RegEM, an algorithm developed by Tapio Schneider in 2001. It’s a form of expectation maximization (EM), which is a common and well understood method for infilling missing data. As we’ve previously noted on WUWT, many of the weather stations used in the Steig et al study had issues with being buried by snow, causing significant data gaps in the Antarctic record, and in some burial cases stations have been accidentally lost or confused with others at different latitudes and longitudes. Then of course there is the problem of coming up with trends for the entire Antarctic continent when most of the weather station data is from the periphery and the peninsula, with very little data from the interior.
Expectation Maximization is a method which uses a normal distribution to compute the best probability of fit to a missing piece of data. Regularization is required when so much data is missing that the EM method cannot solve the problem on its own. That makes it a statistically dangerous technique to use and as Kevin Trenberth, climate analysis chief at the National Center for Atmospheric Research, said in an e-mail: “It is hard to make data where none exist.” (Source: MSNBC article) It is also valuable to note that one of the co-authors of Steig et al, Dr. Michael Mann, dabbles quite a bit in RegEM in this preparatory paper to Mann et al 2008 “Return of the Hockey Stick”.
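For readers who want to see the idea of EM-style infilling in concrete form, here is a minimal Python sketch. It is not Schneider's RegEM code and makes no claim to match it: the function name `em_infill`, the `ridge` parameter and the synthetic two-station data are all assumptions made for illustration, and the covariance update is deliberately simplified.

```python
import numpy as np

def em_infill(X, ridge=0.0, n_iter=50):
    """Fill NaNs with conditional means under a multivariate normal model.

    Simplified sketch: the covariance update omits the conditional-covariance
    correction of a full EM step. `ridge` is a hypothetical stand-in for
    regularization, added to the diagonal before solving.
    Expects at least two columns (stations).
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each gap with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        cov = np.cov(filled, rowvar=False)
        for i in range(X.shape[0]):
            m = missing[i]
            if not m.any() or m.all():
                continue  # nothing to fill, or nothing to predict from
            o = ~m
            A = cov[np.ix_(o, o)] + ridge * np.eye(int(o.sum()))
            B = cov[np.ix_(o, m)]
            coef = np.linalg.solve(A, B)
            # Conditional mean of the missing values given the observed ones.
            filled[i, m] = mu[m] + (filled[i, o] - mu[o]) @ coef
    return filled

# Synthetic demo: station 2 closely tracks station 1, then loses 40 readings.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
complete = np.column_stack([x1, x2])
gappy = complete.copy()
gappy[:40, 1] = np.nan

infilled = em_infill(gappy)
rmse_em = np.sqrt(np.mean((infilled[:40, 1] - complete[:40, 1]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(gappy[:, 1]) - complete[:40, 1]) ** 2))
print("RMSE with EM-style infill:", rmse_em)
print("RMSE with plain mean fill:", rmse_mean)
```

The infill works well here only because the two synthetic stations really are related. The method has no way of knowing whether a strong correlation is physically meaningful.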
For those that prefer to print and read, I’ve made a PDF file of this article available here.
This article is an attempt to describe some of the early results from the Antarctic reconstruction recently published on the cover of Nature, which demonstrated a warming trend in the Antarctic since 1957. Actual surface temperatures in the Antarctic are hard to come by, with only about 30 stations prior to 1980, recorded through tedious and difficult efforts by scientists in the region. In the 1980s more stations were added, including some automatic weather stations (AWS) which sit in remote areas and report the temperature information automatically. Unfortunately, due to the harsh conditions in the region, many of these stations have gaps in their records or very short reporting times (a few years in some cases). Very few stations are located in the interior of the Antarctic, leaving the trend for the central portion of the continent relatively unknown. The location of the stations is shown on the map below.
In addition to the stations there are satellite data from an infrared surface temperature measurement which records the temperature of the actual emission from the surface of the ice/ground in the Antarctic. This is different from the microwave absorption measurements as made from UAH/RSS data which measure temperatures in a thickness of the atmosphere. This dataset didn’t start until 1982.
Steig 09 is an attempt to reconstruct the continent-wide temperatures using a combination of measurements from the surface stations shown above and the post-1982 satellite data. The complex math behind the paper is an attempt to ‘paste’ the 30ish pre-1982 real surface station measurements onto 5509 individual gridcells from the satellite data. An engineer or vision system designer could use several straightforward methods which would ensure reasonable distribution of the trends across the grid, based on a huge variety of area weighting algorithms; the accuracy of any of the methods would depend on the amount of data available. These well understood methods were ignored in Steig 09 in favor of RegEM.
The use of Principal Component Analysis in the reconstruction
Steig 09 presents the satellite reconstructions as the trend and also provides an AWS reconstruction as verification of the satellite data, rather than as a separate standalone result, presumably due to the sparseness of the actual data. An algorithm called RegEM was used for infilling the missing data. Missing data includes pre-1982 for satellites and all years for the very sparse AWS data. While Dr. Steig has provided the reconstructions to the public, he has declined to provide any of the satellite, station or AWS temperature measurements used as inputs to the RegEM algorithm. Since the station and AWS measurements were available through other sources, this paper focuses on the AWS reconstruction.
Without getting into the detail of PCA analysis, the algorithm uses covariance to assign weighting of a pattern in the data and does not have any input whatsoever for actual station location. In other words, the algorithm has no knowledge of the distance between stations and must infill missing data based solely on the correlation with other data sets. This means there is a possibility that with improper or incomplete checks, a trend from the peninsula on the west coast could be applied all the way to the east. The only control is the correlation of one temperature measurement to another.
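A small Python sketch can make the point about location concrete. It uses a plain SVD on synthetic numbers, not the RegEM code, and the helper name `pc_strengths` is hypothetical. The function receives only the data matrix; there is nowhere to pass coordinates, and shuffling which record belongs to which site leaves the strength of every principal component unchanged.

```python
import numpy as np

def pc_strengths(data):
    """Singular values of the centered (n_times, n_stations) data matrix.

    Note the signature: temperatures only, no latitude or longitude.
    """
    centered = data - data.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False)

rng = np.random.default_rng(0)
data = rng.normal(size=(120, 6))   # synthetic: 120 months, 6 stations

# Reassign every record to a different site by shuffling the columns.
order = rng.permutation(data.shape[1])
shuffled = data[:, order]

s_original = pc_strengths(data)
s_shuffled = pc_strengths(shuffled)
print(np.allclose(s_original, s_shuffled))
```

Only the station loadings get reordered by the shuffle. Any check on whether the resulting pattern makes geographic sense has to be done separately by the analyst.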
If you were an engineer concerned with the quality of your result, you would recognize the possibility of accidental mismatch and do a reasonable amount of checking to ensure that the stations were properly assigned after infilling. Steig et al. described no attempts to check this basic potential problem with RegEM analysis. This paper will describe a simple method we used to determine that the AWS reconstruction is rife with spurious correlations (i.e. they appear real but really aren’t) attributable to the methods used by Dr. Steig. These spurious correlations can take a localized climatic pattern and “smear” it over a large region that lacks adequate data of its own.
Now is where it becomes a little tricky. RegEM uses a reduced information dataset to infill the missing values. The dataset is reduced by Principal Component Analysis (PCA) replacing each trend with a similar looking one which is used for covariance analysis. Think of it like a data compression algorithm for a picture which uses less computer memory than the actual but results in a fuzzier image for higher compression levels.
While the second image is still visible, the actual data used to represent the image is reduced considerably. This will work fine for pictures with reasonable compression, but the data from some pixels has blended into others. Steig 09 uses 3 trends to represent all of the data in the Antarctic. In its full complexity, using 3 PCs is analogous to representing not just a picture but actually a movie of the Antarctic with three color ‘trends’, where the color of each pixel changes according to different weights of the same red, green and blue color trends (PCs). With enough PCs the movie could be replicated perfectly with no loss. Here’s an important quote from the paper.
“We therefore used the RegEM algorithm with a cut-off parameter K=3. A disadvantage of excluding higher-order terms (k>3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this tradeoff because the Peninsula is already the best-observed region of the Antarctic.”
Above: a graph from Steve McIntyre of ClimateAudit where he demonstrates how “K=3 was in fact a fortuitous choice, as this proved to yield the maximum AWS trend, something that will, I’m sure, astonish most CA readers.“
K=3 means only 3 trends were used; the ‘lack of captured variance’ is an acknowledgement and acceptance of the fuzziness of the image. It’s easy to imagine that it would be difficult to represent a complex movie of Antarctic temperature from 1957 to 2006 with any sharpness using the same 3 color trends reweighted for every pixel. In the satellite version of the Antarctic movie the three trends look like this.
Note that the sudden step in the 3rd trend would cause a jump in the ‘temperature’ of the entire movie. This represents the temperature change between the pre-1982 recreated data and the post-1982 real data in the satellite reconstruction. This is a strong yet overlooked hint that something may not be right with the result.
In the case of the AWS reconstruction we have only 63 AWS stations to make the movie screen, by which the trends of 42 surface station points are used to infill the remaining data. If the data from one surface station is copied to the wrong AWS stations the average will overweight and underweight some trends. So the question becomes, is the compression level too high?
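The compression analogy can be tried out directly. The sketch below is an illustration under stated assumptions, not the reconstruction itself: it applies a plain truncated SVD to a synthetic 600-month by 63-column matrix of random numbers (the function names and the data are made up for the example) and reports how much variance survives when only k PCs are kept.

```python
import numpy as np

def truncate_to_k_pcs(data, k):
    """Rebuild the data matrix from its first k principal components."""
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k] + mean

def retained_variance(data, k):
    """Fraction of total variance captured by the first k PCs."""
    s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

rng = np.random.default_rng(1)
data = rng.normal(size=(600, 63))   # synthetic stand-in for 63 monthly series

for k in (3, 7, 63):
    worst_error = np.abs(data - truncate_to_k_pcs(data, k)).max()
    print(k, "PCs: variance kept =", round(retained_variance(data, k), 3),
          " worst pixel error =", worst_error)
```

With all 63 PCs the "movie" is reproduced to within floating-point rounding; with fewer, detail is discarded. Real temperature data are far more structured than random numbers, so the actual fractions retained by 3 or 7 PCs would differ from what this toy prints.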
The problems that arise when using too few principal components
Fortunately, we’re here to help in this matter. Steve McIntyre again provided the answer with a simple plot of the actual surface station data correlation with distance. This plot compares the similarity (‘correlation’) of each temperature station with each of the 41 other manual surface stations against the distance between them. A correlation of 1 means the data from one station moves exactly in step with the other. Because A -> B correlation isn’t a perfect match for B -> A, there are 42*42 separate points in the graph. This first scatter plot is from measured temperature data prior to any infilling of missing measurements. Station to station distance is shown on the X axis. The correlation coefficient is shown on the Y axis.
Since this plot above represents the only real data we have existing back to 1957, it demonstrates the expected ‘natural’ spatial relationship from any properly controlled RegEM analysis. The correlation drops with distance which we would expect because temps from stations thousands of miles away should be less related than those next to each other. (Note that there are a few stations that show a positive correlation beyond 6000 km. These are entirely from non-continental northern islands inexplicably used by Steig in the reconstruction. No continental stations exhibit positive correlations at these distances.) If RegEM works, the reconstructed RegEM imputed (infilled) data correlation vs. distance should have a very similar pattern to the real data. Here’s a graph of the AWS reconstruction with infilled temperature values.
Compare this plot with the previous plot from actual measured temperatures. The infilled AWS reconstruction has no clearly evident pattern of decay over distance. In fact, many of the stations show a correlation of close to 1 with stations 3000 km away! The measured station data is our best indicator of true Antarctic trends and it shows no sign that these long distance correlations occur. Of course, common sense should also make one suspicious of these long distance correlations, as they would be comparable to data that indicated Los Angeles and Chicago had closely correlated climate.
It was earlier mentioned that the use of 3 PCs was analogous to the loss of detail that occurs in data compression. Since the AWS input data is available, it is possible to regenerate the AWS reconstruction using a higher number of PCs. It stood to reason that spurious correlations could be reduced by retaining the spatial detail lost in the 3 PC reconstruction. Using RegEM, we generated a new AWS reconstruction using the same input data but with 7 PCs. The distance correlations are shown in the plot below.
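For anyone wishing to build this kind of correlation-versus-distance scatter from their own station files, a minimal Python sketch follows. The function names, the minimum-overlap threshold and the toy coordinates are assumptions for illustration. It is not the script used to make the plots here, and it lists each station pair once rather than twice.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def correlation_vs_distance(temps, lats, lons, min_overlap=3):
    """Return (distance_km, correlation) for every pair of stations.

    temps: (n_times, n_stations) array with NaN where a month is missing.
    Only months where both stations reported are used for each pair.
    """
    pairs = []
    n_stations = temps.shape[1]
    for i in range(n_stations):
        for j in range(i + 1, n_stations):
            both = ~np.isnan(temps[:, i]) & ~np.isnan(temps[:, j])
            if both.sum() < min_overlap:
                continue  # too little shared data to say anything
            r = np.corrcoef(temps[both, i], temps[both, j])[0, 1]
            d = great_circle_km(lats[i], lons[i], lats[j], lons[j])
            pairs.append((float(d), float(r)))
    return pairs

# Toy check: two made-up stations that move in lockstep, with one gap.
a = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
toy = np.column_stack([a, 2.0 * a + 1.0])
toy[1, 0] = np.nan
toy_pairs = correlation_vs_distance(toy, [-90.0, -80.0], [0.0, 0.0])
print(toy_pairs)
```

Plotting the first element of each pair on the X axis and the second on the Y axis gives the scatter described above.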
Note the dramatic improvement over that shown in the previous plot. The correlation decay with distance so clearly seen in the measured station temperature data has returned. While the cone of the RegEM data is slightly wider than the ‘real’ surface station data, the counterintuitive long distance correlations seen in the Steig reconstruction have completely disappeared. It seems clear that limiting the reconstruction to 3 PCs resulted in numerous spurious correlations when infilling missing station data.
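The mechanism can be illustrated on made-up data. The sketch below is not RegEM and uses no Antarctic measurements. It assumes 40 synthetic stations strung along a 6000 km line whose true correlation decays exponentially with distance (an 800 km scale chosen arbitrarily), then compares the average size of correlations between stations more than 3000 km apart before and after squeezing the data through a small number of PCs.

```python
import numpy as np

rng = np.random.default_rng(42)
n_times, n_stations = 600, 40

# Hypothetical station positions along a line, in km.
pos_km = np.sort(rng.uniform(0.0, 6000.0, n_stations))
dist = np.abs(pos_km[:, None] - pos_km[None, :])

# Assumed "true" behaviour: correlation falls off with distance.
true_cov = np.exp(-dist / 800.0)
data = rng.multivariate_normal(np.zeros(n_stations), true_cov, size=n_times)

def truncate_to_k_pcs(x, k):
    """Rebuild the data from its first k principal components only."""
    mean = x.mean(axis=0)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k] + mean

def mean_abs_far_corr(x, min_km=3000.0):
    """Average |correlation| over station pairs farther apart than min_km."""
    corr = np.corrcoef(x, rowvar=False)
    far = np.triu(dist > min_km, k=1)
    return float(np.abs(corr[far]).mean())

far_full = mean_abs_far_corr(data)
far_k3 = mean_abs_far_corr(truncate_to_k_pcs(data, 3))
far_k7 = mean_abs_far_corr(truncate_to_k_pcs(data, 7))
print("full data:", far_full)
print("3 PCs    :", far_k3)
print("7 PCs    :", far_k7)
```

When every station is rebuilt from the same three patterns, distant stations have little choice but to look alike (or like mirror images), whatever the underlying data said. This toy only shows the direction of the effect; the size of the effect in the actual reconstruction is what the plots above address.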
Using only 3 principal components distorts temperature trends
If Antarctica had uniform temperature trends across the continent, the spurious correlations might not have a large impact in the overall reconstruction. Individual sites may have some errors, but the overall trend would be reasonably close. However, Antarctica is anything but uniform. The spurious correlations can allow unique climatic trends from a localized region to be spread over a larger area, particularly if an area lacks detailed climate records of its own. It is our conclusion that this is exactly what is happening with the Steig AWS reconstruction.
Consider the case of the Antarctic Peninsula:
- The peninsula is geographically isolated from the rest of the continent
- The peninsula is less than 5% of the total continental land mass
- The peninsula is known to be warming at a rate much higher than anywhere else in Antarctica
- The peninsula is bordered by a vast area known as West Antarctica that has extremely limited temperature records of its own
- 15 of the 42 temperature surface stations (35%) used in the reconstruction are located on the peninsula
If the Steig AWS reconstruction were properly correlating the peninsula stations’ temperature measurements to the AWS sites, you would expect to see the highest rates of warming at the peninsula extremes. This is the pattern seen in the measured station data. The plot below shows the temperature trends for the reconstructed AWS sites for the period of 1980 to 2006. This time frame has been selected as this is the period when AWS data exists. Prior to 1980, 100% of AWS reconstructed data is artificial (i.e. infilled by RegEM).
Note how warming extends beyond the peninsula extremes down toward West Antarctica and the South Pole. Also note the relatively moderate cooling in the vicinity of the Ross Ice Shelf (bottom of the plot). The warming once thought to be limited to the peninsula appears to have spread. This “smearing” of the peninsula warming has also moderated the cooling of the Ross Ice Shelf AWS measurements. These are both artifacts of limiting the reconstruction to 3 PCs.
Now compare the above plot to the new AWS reconstruction using 7 PCs.
The difference is striking. The peninsula has become warmer and warming is largely limited to its confines. West Antarctica and the Ross Ice Shelf area have become noticeably cooler. This agrees with the commonly-held belief prior to Steig’s paper that the peninsula is warming, the rest of Antarctica is not.
Temperature trends using more traditional methods
In providing a continental trend for Antarctic warming, Steig used a simple average of the 63 AWS reconstructed time series. As can be seen in the plots above, the AWS stations are heavily weighted toward the peninsula and the Ross Ice Shelf area. Steig’s simple average is shown below. The linear trend for 1957 through 2006 is +0.14 deg C/decade. It is worth noting that if the time frame is limited to 1980 to 2006 (the period of actual AWS measurements), the trend changes to cooling, -0.06 deg C/decade.
We used a gridding methodology to weight the AWS reconstructions in proportion to the area they represent. Using Steig’s method, 3 stations on the peninsula covering 5% of the continent’s area would have the same weighting as three interior stations spread over 30% of the continent’s area. The gridding method we used is comparable to that utilized in other temperature reconstructions such as James Hansen’s GISTEMP. The gridcell map used for the weighted 7 PC reconstruction is shown here.
Cells with a single letter contain one or more AWS temperature stations. If more than one AWS falls within a gridcell, the results were averaged and assigned to that cell. Cells with multiple letters had no AWS within them, but had three or more contiguous cells containing AWS stations. Imputed temperature time series were assigned to these cells based on the average of the neighboring cells. Temperature trends were calculated both with and without the imputed cells. The reconstruction trend using 7 PCs and a weighted station average follows.
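Trends quoted in deg C/decade are simply the slope of a straight-line fit multiplied by ten. The Python sketch below shows one way to compute such a figure for a whole record and for a sub-period. The function names are hypothetical and the series is a made-up one with a known answer (warming at 0.2 deg C/decade to 1980, then cooling at 0.1 deg C/decade), not the reconstruction data.

```python
import numpy as np

def trend_per_decade(monthly_values, start_year):
    """Least-squares linear trend of a monthly series, in units per decade."""
    y = np.asarray(monthly_values, dtype=float)
    t_years = start_year + np.arange(y.size) / 12.0
    ok = ~np.isnan(y)
    # Center the time axis to keep the fit well conditioned.
    t_centered = t_years[ok] - t_years[ok].mean()
    slope_per_year = np.polyfit(t_centered, y[ok], 1)[0]
    return 10.0 * slope_per_year

def period_trend(monthly_values, start_year, first_year, last_year):
    """Trend over the calendar years first_year..last_year inclusive."""
    y = np.asarray(monthly_values, dtype=float)
    t_years = start_year + np.arange(y.size) / 12.0
    keep = (t_years >= first_year) & (t_years < last_year + 1)
    return trend_per_decade(y[keep], first_year)

# Made-up monthly series, January 1957 through December 2006.
t = 1957 + np.arange(600) / 12.0
series = np.where(t < 1980, 0.02 * (t - 1957), 0.02 * 23 - 0.01 * (t - 1980))

early = period_trend(series, 1957, 1957, 1979)
late = period_trend(series, 1957, 1980, 2006)
whole = trend_per_decade(series, 1957)
print("1957 to 1979:", early)
print("1980 to 2006:", late)
print("1957 to 2006:", whole)
```

A continental "simple average" in this scheme would be the row-wise mean of the station columns, fed into the same function. The full-period figure for the toy series comes out positive even though the last 27 years cool, which is why the choice of period matters when comparing trends.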
The trend has decreased to 0.08 deg C/decade. Although it is not readily apparent in this plot, from 1980 to 2006 the temperature profile has a pronounced negative trend.
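The difference between a simple station average and a gridded average can be sketched in a few lines of Python. This is an illustration only: the 5 degree by 30 degree latitude-longitude cells, the function name and the toy stations are assumptions, not the gridcell layout shown in the map above, and the sketch does not impute values for empty cells.

```python
import numpy as np

def gridded_average(recon, lats, lons, lat_step=5.0, lon_step=30.0):
    """Area-weighted mean of station series after averaging within grid cells.

    recon: (n_times, n_stations) array; lats and lons in degrees.
    Stations sharing a cell are averaged first, then each occupied cell is
    weighted by its surface area.
    """
    lat_bin = np.floor(np.asarray(lats) / lat_step).astype(int)
    lon_bin = np.floor((np.asarray(lons) % 360.0) / lon_step).astype(int)

    cells = {}
    for idx, key in enumerate(zip(lat_bin.tolist(), lon_bin.tolist())):
        cells.setdefault(key, []).append(idx)

    series, weights = [], []
    for (lb, _), members in cells.items():
        series.append(recon[:, members].mean(axis=1))
        # Area of a lat-lon cell is proportional to the difference in
        # sin(latitude) between its top and bottom edges.
        lo = np.radians(lb * lat_step)
        hi = np.radians((lb + 1) * lat_step)
        weights.append(abs(np.sin(hi) - np.sin(lo)))
    return np.average(np.array(series), axis=0, weights=np.array(weights))

# Toy case: three clustered stations reading 1.0, one isolated station at 0.0.
toy = np.column_stack([np.ones(12), np.ones(12), np.ones(12), np.zeros(12)])
toy_lats = [-66.0, -67.0, -68.0, -67.0]
toy_lons = [300.0, 301.0, 302.0, 100.0]

simple = toy.mean(axis=1)
gridded = gridded_average(toy, toy_lats, toy_lons)
print("simple average :", simple[0])
print("gridded average:", gridded[0])
```

In the toy case the three clustered stations count three times in the simple average but only once after gridding, which is the effect described for the peninsula stations.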
Temporal smearing problems caused by too few PCs?
The temperature trends using the various reconstruction methods are shown in the table below. We have broken the trends down into three time periods: 1957 to 2006, 1957 to 1979, and 1980 to 2006. The time frames are not arbitrarily chosen, but mark an important distinction in the AWS reconstructions. There is no AWS data prior to 1980. In the 1957 to 1979 time frame, every single temperature point is a product of the RegEM algorithm. In the 1980 to 2006 time frame, AWS data exists (albeit quite spotty at times) and RegEM leaves the existing data intact while infilling the missing data.
We highlight this distinction as limiting the reconstruction to 3 PCs has an additional pernicious effect beyond spatial smearing of the peninsula warming. In the table below, note the balance between the trends of the 1957 to 1979 era vs. that of the 1980 to 2006 era. In Steig’s 3 PC reconstruction, moderate warming that happened prior to 1980 is more balanced with slight cooling that happened post 1980. In the new 7 PC reconstruction, the early era had dramatic warming, the later era had strong cooling. It is believed that the 7 PC reconstruction more accurately reflects the true trends for the reasons stated earlier in this paper. However, the mechanism for this temporal smearing of trends is not fully understood and is under investigation. It does appear to be clear that limiting the selection to three principal components causes warming that is largely constrained to a pre-1980 time frame to appear more continuous and evenly distributed over the entire temperature record.
| Reconstruction | 1957 to 2006 trend | 1957 to 1979 trend (pre-AWS) | 1980 to 2006 trend (AWS era) |
|---|---|---|---|
| Steig 3 PC | +0.14 deg C/decade | +0.17 deg C/decade | -0.06 deg C/decade |
| New 7 PC | +0.11 deg C/decade | +0.25 deg C/decade | -0.20 deg C/decade |
| New 7 PC weighted | +0.09 deg C/decade | +0.22 deg C/decade | -0.20 deg C/decade |
| New 7 PC wgtd imputed cells | +0.08 deg C/decade | +0.22 deg C/decade | -0.21 deg C/decade |
The AWS trends which this incredibly long post was created from were used only as verification of the satellite data. The statistics used for verification are another subject entirely. Where Steig09 falls short in the verification is that RegEM was inappropriately applying area weighting to individual temperature stations. The trends from the AWS reconstruction clearly have blended into distant stations creating an artificially high warming result. The RegEM methodology also appears to have blended warming that occurred decades ago into more recent years to present a misleading picture of continuous warming. It should also be noted that every attempt made to restore detail to the reconstruction or weight station data resulted in reduced warming and increased cooling in recent years. None of these methods resulted in more warming than that shown by Steig.
We don’t yet have the satellite data (Steig has not provided it) so the argument will be:
“Silly Jeffs, you haven’t shown anything; the AWS wasn’t the conclusion, it was the confirmation.”
To that we reply with an interesting distance correlation graph of the satellite reconstruction (also from only 3 PCs). The conclusion has the exact same problem as the confirmation. Stay tuned.
(Graph originally calculated by Steve McIntyre)