From the Air Vent, reposted by invitation
Posted by Jeff Id on February 15, 2009
Guest post by Jeff C
Jeff C has done an interesting and impressive recalculation of the automatic weather station (AWS) reconstruction in Steig 09, currently on the cover of Nature.
Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year
Jeff C is an engineer who realized that the data was not weighted according to location in the original paper. He has taken the time to come up with a reasonable regridding method which more appropriately weights individual temperature stations across the Antarctic. It’s amazing that a simple, reasonable gridding of temperature stations can make so much difference to the final result.
———————-
Jeff Id’s AWS reconstructions using his implementation of RegEM are reasonably close to the Steig reconstructions. The latest difference plot between his reconstruction and Steig’s is quite impressive. Removing two sites from his reconstruction that were erroneously included in initial attempts (Gough and Marion) gives us this chart:
It is clear Jeff is very close as the plot above has virtually zero slope and the “noise level” is typically within +/- 0.3 deg C except for a few outliers (that’s the Racer Rock anomaly at the far right as we are using the original data). Although not quite fully there, it is clear Jeff has the fundamentals correct as to how Steig used the occupied station and AWS data with RegEM.
I duplicated Jeff’s results using his code and began to experiment with RegEM. As I became more familiar, it dawned on me that RegEM had no way of knowing the physical location of the temperature measurements. RegEM does not know or use the latitude and longitude of the stations when infilling, as that information is never provided to it. There is no “distance weighting” as is typically understood as RegEM has no idea how close or how far the occupied stations (the predictor) are from each other, or from the AWS sites (the predictand). Steig alludes to this in the paper on page 2:
“Unlike simple distance-weighting or similar calculations, application of RegEM takes into account temporal changes in the spatial covariance pattern, which depend on the relative importance of differing influences on Antarctic temperature at a given time.”
I’m an engineer, not a statistician so I’m not sure exactly what that means, but it sounds like hand-waving and a subtle admission there is no distance weighting. He might be saying that RegEM can draw conclusions based on the similarity in the temperature trend patterns from site to site, but that is about it. If I’ve got that wrong, I would welcome an explanation.
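To make the "no distance weighting" point concrete, here is a toy infilling routine. It is emphatically not RegEM: it is a single-predictor least-squares fit that I am swapping in as a stand-in, and the function name and the sample numbers are made up for illustration. What it shares with RegEM, as described above, is the interface: it sees only the temperature series, never a latitude or longitude.

```python
def infill_by_regression(target, predictor):
    """Fill gaps (None) in `target` from `predictor` using a straight-line fit.

    The fit uses only months where both series have data. No station
    coordinates are passed in, so distance cannot influence the result.
    """
    pairs = [(x, y) for x, y in zip(predictor, target)
             if x is not None and y is not None]
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    filled = []
    for x, y in zip(predictor, target):
        if y is not None:
            filled.append(y)                      # keep measured values
        elif x is not None:
            filled.append(intercept + slope * x)  # estimate from predictor
        else:
            filled.append(None)                   # nothing to estimate from
    return filled


# Made-up anomalies: an "occupied station" and a gappy "AWS" record.
occupied = [0.0, 1.0, 2.0, 3.0, 4.0]
aws = [None, 2.0, 4.0, 6.0, None]
filled_aws = infill_by_regression(aws, occupied)
```

Moving the hypothetical occupied station 2,000 miles away would change nothing here, because only the similarity of the two series enters the calculation.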
I plotted out the locations of the 42 occupied stations used in the reconstruction below. Note the clustering of stations on the Antarctic Peninsula. This is important because the peninsula is known to be warming, yet only constitutes a small percentage of the overall land mass (less than 5%). Despite this, 15 of the 42 occupied stations used in the reconstruction are on the peninsula.
Location of 42 occupied stations that form the READER temperature dataset (per Steig 2009 Supplemental Information). Note clustering of locations at northern extremes of the Antarctic Peninsula.
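A quick back-of-the-envelope sketch shows why the clustering matters. The station counts and the 5% area figure come from the paragraph above; the two trend values are assumptions picked purely for illustration, and RegEM is of course not a simple average. The sketch only contrasts counting stations equally with counting land area equally.

```python
# All trend values used below are ASSUMED for illustration; they are not READER data.
N_STATIONS = 42         # occupied stations (from the post)
N_PENINSULA = 15        # stations on the peninsula (from the post)
PENINSULA_AREA = 0.05   # peninsula share of land mass, upper bound (from the post)


def station_count_mean(peninsula_trend, elsewhere_trend):
    """Every station counts equally, wherever it sits."""
    n_elsewhere = N_STATIONS - N_PENINSULA
    total = N_PENINSULA * peninsula_trend + n_elsewhere * elsewhere_trend
    return total / N_STATIONS


def area_weighted_mean(peninsula_trend, elsewhere_trend):
    """Each region counts in proportion to the land it covers."""
    return (PENINSULA_AREA * peninsula_trend
            + (1 - PENINSULA_AREA) * elsewhere_trend)


# Assumed trends in deg C per decade: warming peninsula, flat elsewhere.
by_station = station_count_mean(0.5, 0.0)
by_area = area_weighted_mean(0.5, 0.0)
print(f"by station count: {by_station:.3f}, by area: {by_area:.3f}")
```

With these made-up inputs the station-count average is roughly seven times the area-weighted one, simply because 15 of the 42 stations sit on a sliver of the continent.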
I decided to see what would happen if I applied some distance weighting to the data prior to running it through RegEM.
DISCLAIMER: I am not stating or implying that my reconstruction is the “correct” way to do it. I’m not claiming my results are any more accurate than that done by Steig. The point of this exercise is to show that RegEM does, in fact, care about the sparseness, location and weighting of the occupied station data.
I decided to carve up Antarctica into a series of grid cells. I used a triangular lattice and experimented with various cell diameters and lattice rotations. The goal was to have as many cells as possible containing occupied stations, but also to have as high a percentage of the cells as possible contain at least one occupied station. I ended up with a cell diameter of about 550 miles with the layout below.
Gridcells used for averaging and weighting. Cell diameter is approximately 550 miles. Value in parenthesis is number of occupied stations in cell. Note that cell C (northern peninsula extreme) contains 11 occupied stations, far more than other cells. Cells without letters have no occupied stations.
I sorted the occupied station data (converted to anomalies by Jeff Id’s code) into groups that corresponded to each gridcell location. If a gridcell had more than one station, I averaged the results into a single series and assigned it to the gridcell. Unfortunately, 14 of the 36 gridcells had no occupied station within them. Most of these gridcells were in the interior of the continent and covered a large percentage of the land mass. Since manufacturing data is all the rage these days, I decided to assign a temperature series to these grid cells based on the average of neighboring grid cells. The goal was to use the available temperature data to spread observed temperature trends across equal areas. For example, 17 stations on the peninsula in three grid cells would have three inputs to RegEM. Likewise, two stations in the interior over three grid cells would have three inputs to RegEM. The plot below shows my methodology.
Shaded cells with single letter contain occupied stations. Cells with two or more letters have no occupied stations but have temperature records derived from average of adjacent cells (cell letters describe cells used for derivation). Cells with derived records must have three adjacent or two non-contiguous adjacent cells with occupied stations or they are left unfilled.
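The averaging and infilling steps described above can be sketched roughly as follows. This is a simplified illustration, not the code actually used: the function names, the dictionary layout and the sample data are all hypothetical, and the fill rule is reduced to a plain minimum count of occupied neighbors (the contiguity condition in the caption is not checked).

```python
from statistics import mean


def average_series(series_list):
    """Average several anomaly series month by month, skipping gaps (None)."""
    n_months = len(series_list[0])
    averaged = []
    for t in range(n_months):
        values = [s[t] for s in series_list if s[t] is not None]
        averaged.append(mean(values) if values else None)
    return averaged


def build_gridcells(stations, cell_of, neighbors, min_neighbors=2):
    """Return one series per gridcell.

    stations:  {station name: anomaly series}
    cell_of:   {station name: gridcell id}
    neighbors: {gridcell id: list of adjacent gridcell ids}
    Cells with stations get the average of their stations. Empty cells get
    the average of adjacent cells that contain stations, provided at least
    `min_neighbors` such cells exist; otherwise they are left unfilled.
    """
    grouped = {}
    for name, series in stations.items():
        grouped.setdefault(cell_of[name], []).append(series)
    observed = {cell: average_series(group) for cell, group in grouped.items()}

    result = dict(observed)
    for cell, adjacent in neighbors.items():
        if cell in observed:
            continue
        donors = [observed[a] for a in adjacent if a in observed]
        if len(donors) >= min_neighbors:
            result[cell] = average_series(donors)
    return result


# Hypothetical three-month anomaly series.
stations = {
    "s1": [1.0, 2.0, None],
    "s2": [3.0, None, None],
    "s3": [0.0, 0.0, 0.0],
    "s4": [2.0, 2.0, 2.0],
}
cell_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
neighbors = {"A": ["B", "X"], "B": ["A", "X"], "C": ["Y"],
             "X": ["A", "B"], "Y": ["C"]}
cells = build_gridcells(stations, cell_of, neighbors)
```

In this toy layout cell X borrows from A and B, while cell Y has only one occupied neighbor and stays unfilled, mirroring the two cells left empty in the real grid.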
I ended up with 34 gridcell temperature series. Two of the grid cells I left unfilled as I did not think there was adequate information from the adjacent gridcells to justify infilling. Once complete, I ran the 34 occupied station gridcell series through RegEM along with the 63 AWS series. The same methodology was used as in Jeff Id’s AWS reconstruction except the 42 station series were replaced by the 34 gridcell series.
For comparison, here is Steig’s AWS reconstruction:
Calculated monthly means of 63 AWS reconstructions using aws_recon.txt from Steig website. Trend is +0.138 deg C. per decade using full 1957-2006 reconstruction record. Steig 2009 states continent-wide trend is +0.12 deg C. per decade for satellite reconstruction. AWS reconstruction trend is said to be similar.
And here is my gridcell reconstruction using Jeff Id’s implementation of RegEM:
Calculated monthly means of 63 AWS reconstructions using Jeff Id RegEM implementation and averaged grid cell approach. Trend is +0.069 deg C. per decade using full 1957-2006 reconstruction record.
Although the plots are similar, the gridcell reconstruction trend is about half of that seen in the Steig reconstruction. Note that most warming occurred prior to 1972.
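For readers who want to check trend figures like these, here is a minimal sketch of a per-decade trend calculation. I am assuming an ordinary least-squares line fitted to the monthly means; the function name and the synthetic test series are hypothetical, and nothing here reads the actual reconstruction files.

```python
def trend_per_decade(monthly_values, months_per_year=12):
    """Ordinary least-squares slope of a monthly series, in units per decade."""
    n = len(monthly_values)
    years = [i / months_per_year for i in range(n)]  # elapsed time in years
    t_mean = sum(years) / n
    y_mean = sum(monthly_values) / n
    sxx = sum((t - t_mean) ** 2 for t in years)
    sxy = sum((t - t_mean) * (y - y_mean)
              for t, y in zip(years, monthly_values))
    return 10.0 * sxy / sxx  # slope per year times ten


# Synthetic check: 600 months (1957-2006) rising 0.0069 deg C per year.
synthetic = [0.0069 * (i / 12) for i in range(600)]
synthetic_trend = trend_per_decade(synthetic)

# Ratio of the two reported trends, taken from the captions above.
ratio = 0.069 / 0.138
```

The ratio of the two reported trends comes out at 0.5, which is where "about half" comes from.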
Again, I’m not trying to say this is the correct reconstruction or that this is any more valid than that done by Steig. In fact, beyond the peninsula and coast, data is so sparse that I doubt any reconstruction is accurate. This is simply to demonstrate that RegEM doesn’t realize that 40% of the occupied station data came from less than 5% of the land mass when it does its infilling. Because of this, the results can be affected by changing the spatial distribution of the predictor data (i.e. occupied stations).
At least, we know it’s quite hot down there (relatively of course) for the moment
http://nsidc.org/data/seaice_index/images//daily_images/S_timeseries.png
the Arctic, for its part, is evaporating
http://nsidc.org/data/seaice_index/images/daily_images/N_timeseries.png
unless there’s a serious problem with NSIDC.
REPLY: “quite hot”? “Flanagan” that’s the most untrue and misleading comment ever made on this blog. Rephrase it please. -And be careful citing NSIDC, there’s some problems- Anthony Watts
Jeff Id (08:58:24)
So far the findings in Surface Temperature Reconstructions for the Last 2,000 Years by the National Research Council have been good enough for me. I have not looked into Mann08 or the ‘Composite and Scale’ method. I have bookmarked your post, but that is all I can promise; not sure whether I will find the time to look into it. (Yeah, I know, lame retreat 😉 )
One obvious question pops up right away: does your linked entry on CPS entail separate training and verification periods?
Jeff Id:
As a mere layperson statistician, I read this just after I was developing the sneaking suspicion that Steig was manipulating/”adjusting” the Satellite data, too, as well as having misled RegEM concerning the occupied and AWS data – the adjusted Satellite results and recon methods which he hasn’t released! Which means to me that there is very likely something wrong with his adjusted results and conclusions, or else that they don’t really show what he says and wants them to show.
Thanks for your and Jeff C’s work and explanations, which have really helped me battle through just exactly what is going on in “Climate Science” reconstructions. It’s stunning.
When I wrote this up, it was for Jeff Id’s site where most of the readers and commenters have closely followed the deconstruction of the paper’s methodology. Because of this, I didn’t provide a top-level overview to set the stage for my analysis. Now that Anthony has picked it up, let me clarify a few things to put this post in context. It might help answer some of the criticisms.
Steig provided two reconstructions, the satellite recon and the AWS recon (more on the meaning of AWS in a moment). He claims both use comparable methodology and provide similar results. In the paper, he shows the smoothed trends for both recons and they do look very similar.
The satellite recon uses two data sets, the temperature data from occupied stations (measurements from manned scientific colonies in Antarctica) and the measured temperature data from the AVHRR satellite. Since the AVHRR satellite uses IR imaging, cloud cover over Antarctica corrupts large portions of the measured satellite data. These data points are removed using a technique known as cloud masking. Both data sets are then processed using the RegEM algorithm to infill missing data (both before the satellite launch and during cloud masking intervals). RegEM outputs the satellite reconstruction with the missing data infilled.
The AWS recon uses a similar methodology but using temperature data from the occupied stations and from automated weather stations (AWS). The AWS data series are from 63 unmanned stations that are deployed throughout Antarctica. The stations are left in remote portions of the continent and upload the temperature measurements through a satellite link. Since the AWS are in a hostile environment with no maintenance for prolonged periods, they are prone to breakdowns (being covered with snow, electronic malfunctions, etc.). The breakdowns cause large gaps in the AWS records (analogous to the cloud-masking gaps in the satellite records). Also, there is no AWS data prior to 1980. RegEM is used with both the occupied station data and the AWS data to infill the gaps and create a reconstruction prior to 1980. This is a separate and different reconstruction from the satellite reconstruction.
The problem with the reconstruction forensics is that Steig did not release the input data (occupied station temperatures, AWS temperatures, and satellite temperatures) to the public. The occupied station and AWS data are nevertheless available to the public from Steig’s source, the British Antarctic Survey (BAS). The cloud-masked satellite data is not available. Because of this, most of the forensic work has focused on the AWS reconstruction which does not use *any* satellite data. I would love to work with the satellite reconstruction, but Dr. Steig has not seen fit to make the RegEM satellite input data available.
My point of the above post was to demonstrate that distance weighting the occupied station data into the RegEM algorithm significantly changes the result of the AWS reconstruction. Since the satellite reconstruction is said to use similar methodology (by Steig himself in the paper and SI) presumably its reconstruction would also change significantly.
I’ll be happy to answer any substantive questions, but charges of “hack job analysis” without understanding the fundamentals of Steig’s work will be ignored.
Jeff: excellent work! I’m an engineer involved in resource extraction who works with geological models. If any company tried to list reserves on a publicly traded company using the Steig methodology of averaging everything in the area without distance weighting they would at the least have a stop trade issued on their stock, and at most might be looking at jail time for fraud.
Matt N: let’s flesh out your example a little bit more. Let’s say you want to determine your 2 weeks average fuel use, and you’re a little paranoid so you fill up every day to make sure you never run out of gas. Monday to Friday your 30 mile one-way commute takes a couple of hours because of gridlock, and you buy 6 gallons (10 miles/gallon). On each of the two weekends you travel to visit your parents 300 miles away, traveling there on Saturday and back on Sunday, and again use 6 gallons on the wide-open freeways (50 miles/gallon). What is your mileage? If you average on a daily basis then you’d assume your vehicle is terrible. You have to determine what the appropriate use is and make sure you’re comparing with other vehicles that are used the same way.
Richard Sharpe (10:26:48) :
I’m just a layman here, but it would seem that trying to deal with the fact that there is a concentration of measurements in a small area in Antarctica that is warming would be important.
Is that just not so?
Mann made science seems to be a very corrupting influence in climate science.
I see Mann as the equivalent of one of those eugenicists who carefully compiled stats of how facial eye placement was a direct indicator of racial intelligence.
Congratulations on the quality of (most) of the scientific discussion presented above – it is very heartening to read.
I would like to pick up on a suggestion made much earlier in the postings by BC, Bruce, Robert Wood and Jim F, that the Antarctic Peninsula data are somehow suspect because this is a “volcanically active area”.
In asserting this they are clearly implying that the surface temperature observations from the Peninsula have ( or may have) been influenced by volcanic activity.
Folks, be careful here!
While there are well-documented examples of the impact of volcanic eruptions (e.g. Pinatubo) on short/medium-term local and global temperatures, there is no convincing evidence of the impact of “regional volcanism” on local surface temperatures (let alone on temperature trends).
In a region of active volcanism there will certainly be local areas of high geothermal gradient/heat flow. Such heat flow may or may not increase/decrease over short time periods. This might possibly influence the surface temperature readings made AT individual observation sites (you may perhaps expect some obvious signs of volcanic/geothermal activity at the observation site if that was the case). But it is drawing a fairly long bow.
Remember, we are looking for trends in surface temperature. We would need not just an association between any individual site and volcanism, but an increase in that volcanic activity also, to render the observations unreliable.
I’ve not seen any credible evidence for increased volcanic activity in the Antarctic Peninsula over the period in question.
In a broader sense, the hypothesis that local/regional activity may have influenced surface temperatures at any given site is worth considering, but only if examined along with all other possible sources of variability and un-reliability in the observations (e.g. measurement errors, biases, clustering, external factors, local variability, etc) which may impact on the intended end use of the data.
Bad reporting of a paper from Nature Geoscience last year may be partly responsible for giving this idea credence. The original article postulates that sub-ice volcanic activity may have an impact on ice flow dynamics, and influence ice sheet stability. The authors certainly do not propose that there has been an increase of volcanic activity during the period of supposed temperature increase.
(see Hugh F. J. Corr & David G. Vaughan. A recent volcanic eruption beneath the West Antarctic ice sheet. Nature Geoscience 1, 122 – 125 (2008).)
A good example of the woolly reportage about this article can be found at:
http://www.tgdaily.com/content/view/41171/117
BC, Bruce, Robert Wood and Jim F – don’t be guilty of making unsupported assertions and be cautious of inference. You can’t argue against dodgy science by using dodgy science.
Here’s a blink of the two temperature reconstructions –
http://i39.tinypic.com/2cx6rdi.jpg
Looks like a classic GISS homogenization with the older temps dropping.
The Jeffs’ work is interesting to me, but I doubt it will reveal anything particularly exciting. Since the error range of the AWS reconstruction is over 50% it will be kinda hard to make a point one way or the other. The 5509 data points for the satellite data should be a better example of when RegEM can be employed and its limitations.
It might be interesting to select an area where surface temperature records have gaps and apply RegEM. Like Canada that lost so many sites.
@ Mike Stewart (05:36:39) :
I certainly did not intend to suggest that volcanism is contributing to local surface temperatures in Antarctica. My main point was how one might use geostatistical techniques with temperature readings in widely separated locations to “infer” temperatures over the entirety of the area under concern.
Additionally, I noted in an abbreviated way that East and West Antarctica are two different beasts, in terms of geology, oceanography, topography, temperature regimes and who knows what else. In geostatistics, that probably negates, or else tremendously complicates, using measured temperatures from the two areas to estimate temperatures across the combined area.
Ultimately (but unstated earlier) my point is that the two areas have so many differences – especially the one key to these discussions – recorded temperatures/temperature trends – that the two areas should be considered separately. A geologist ought to consider and speak to this (multiple working hypotheses, at a minimum) because they may fundamentally change the analysis. Steig, disappointingly, does not do so.
Thanks for your comments.
What is obvious is . . . figures don’t lie, but liars do figure.