From the Air Vent, reposted by invitation
Posted by Jeff Id on February 15, 2009
Guest post by Jeff C
Jeff C has done an interesting and impressive recalculation of the automatic weather station (AWS) reconstruction from Steig 09, currently on the cover of Nature:
Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year
Jeff C is an engineer who realized that the data was not weighted according to location in the original paper. He has taken the time to come up with a reasonable regridding method which more appropriately weights individual temperature stations across the Antarctic. It’s amazing that a simple, reasonable gridding of temperature stations can make so much difference to the final result.
———————-
Jeff Id’s AWS reconstructions using his implementation of RegEM are reasonably close to the Steig reconstructions. The latest difference plot between his reconstruction and Steig’s is quite impressive. Removing two sites from his reconstruction that were erroneously included in initial attempts (Gough and Marion) gives us this chart:
It is clear Jeff is very close as the plot above has virtually zero slope and the “noise level” is typically within +/- 0.3 deg C except for a few outliers (that’s the Racer Rock anomaly at the far right as we are using the original data). Although not quite fully there, it is clear Jeff has the fundamentals correct as to how Steig used the occupied station and AWS data with RegEM.
I duplicated Jeff’s results using his code and began to experiment with RegEM. As I became more familiar, it dawned on me that RegEM had no way of knowing the physical location of the temperature measurements. RegEM does not know or use the latitude and longitude of the stations when infilling, as that information is never provided to it. There is no “distance weighting” as is typically understood as RegEM has no idea how close or how far the occupied stations (the predictor) are from each other, or from the AWS sites (the predictand). Steig alludes to this in the paper on page 2:
“Unlike simple distance-weighting or similar calculations, application of RegEM takes into account temporal changes in the spatial covariance pattern, which depend on the relative importance of differing influences on Antarctic temperature at a given time.”
I’m an engineer, not a statistician so I’m not sure exactly what that means, but it sounds like hand-waving and a subtle admission there is no distance weighting. He might be saying that RegEM can draw conclusions based on the similarity in the temperature trend patterns from site to site, but that is about it. If I’ve got that wrong, I would welcome an explanation.
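To see concretely why RegEM cannot know about distance, consider what an EM-style infilling loop actually consumes. Below is a minimal Python sketch of the idea; this is my own crude stand-in, not the actual RegEM code used in the paper, and the ridge term and iteration count are arbitrary choices. The point is that the only input is the data matrix itself: latitude and longitude never appear.

```python
import numpy as np

def em_infill(X, n_iter=50, ridge=1e-6):
    """Crude EM-style infilling sketch. X is a (months x stations)
    matrix with NaN for missing values. Missing entries are repeatedly
    re-estimated from the column means and covariance of the data;
    station coordinates are never supplied, so "closeness" can only
    mean similarity of the temperature series, not physical distance."""
    X = X.copy()
    missing = np.isnan(X)
    # initialize gaps with each station's own mean
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        C = np.cov(X, rowvar=False)  # station-by-station covariance, learned from time alone
        for i in range(X.shape[0]):
            m = missing[i]
            if not m.any() or m.all():
                continue
            o = ~m
            # regress this month's missing stations on its observed ones
            coef = np.linalg.solve(C[np.ix_(o, o)] + ridge * np.eye(int(o.sum())),
                                   C[np.ix_(o, m)])
            X[i, m] = mu[m] + (X[i, o] - mu[o]) @ coef
    return X
```

Two stations a thousand miles apart that happen to covary will pull on each other exactly as hard as two neighbors.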
I plotted out the locations of the 42 occupied stations used in the reconstruction below. Note the clustering of stations on the Antarctic Peninsula. This is important because the peninsula is known to be warming, yet only constitutes a small percentage of the overall land mass (less than 5%). Despite this, 15 of the 42 occupied stations used in the reconstruction are on the peninsula.
Location of 42 occupied stations that form the READER temperature dataset (per Steig 2009 Supplemental Information). Note clustering of locations at northern extremes of the Antarctic Peninsula.
I decided to see what would happen if I applied some distance weighting to the data prior to running it through RegEM.
DISCLAIMER: I am not stating or implying that my reconstruction is the “correct” way to do it. I’m not claiming my results are any more accurate than that done by Steig. The point of this exercise is to show that RegEM does, in fact, care about the sparseness, location and weighting of the occupied station data.
I decided to carve up Antarctica into a series of grid cells. I used a triangular lattice and experimented with various cell diameters and lattice rotations. The goal was to have as many cells as possible containing occupied stations, but also to have as high a percentage of the cells as possible contain at least one occupied station. I ended up with a cell diameter of about 550 miles with the layout below.
Gridcells used for averaging and weighting. Cell diameter is approximately 550 miles. Value in parenthesis is number of occupied stations in cell. Note that cell C (northern peninsula extreme) contains 11 occupied stations, far more than other cells. Cells without letters have no occupied stations.
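For readers who want to reproduce the sorting step, a rough sketch follows. The projection, the cell-center list, and the nearest-center snapping rule are my own assumptions; Jeff C's exact lattice and rotation are not reproduced here.

```python
import numpy as np

R_EARTH_MI = 3959.0      # mean Earth radius in miles
CELL_DIAM_MI = 550.0     # cell diameter from the text above

def polar_xy(lat, lon):
    """Project a southern-hemisphere point to a flat plane (simple
    azimuthal equidistant projection about the South Pole, in miles)."""
    r = R_EARTH_MI * np.radians(90.0 + lat)  # lat is negative in Antarctica
    theta = np.radians(lon)
    return r * np.cos(theta), r * np.sin(theta)

def assign_to_cells(stations, centers):
    """Snap each station to the nearest lattice center within one cell
    radius. stations: dict name -> (lat, lon); centers: list of (lat, lon)."""
    centers_xy = [polar_xy(lat, lon) for lat, lon in centers]
    cells = {}
    for name, (lat, lon) in stations.items():
        x, y = polar_xy(lat, lon)
        dists = [np.hypot(x - cx, y - cy) for cx, cy in centers_xy]
        nearest = int(np.argmin(dists))
        if dists[nearest] <= CELL_DIAM_MI / 2:
            cells.setdefault(nearest, []).append(name)
    return cells
```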
I sorted the occupied station data (converted to anomalies by Jeff Id’s code) into groups that corresponded to each gridcell location. If a gridcell had more than one station, I averaged the results into a single series and assigned it to the gridcell. Unfortunately, 14 of the 36 gridcells had no occupied station within them. Most of these gridcells were in the interior of the continent and covered a large percentage of the land mass. Since manufacturing data is all the rage these days, I decided to assign a temperature series to these grid cells based on the average of neighboring grid cells. The goal was to use the available temperature data to spread observed temperature trends across equal areas. For example, 17 stations on the peninsula in three grid cells would have three inputs to RegEM. Likewise, two stations in the interior over three grid cells would have three inputs to RegEM. The plot below shows my methodology.
Shaded cells with single letter contain occupied stations. Cells with two or more letters have no occupied stations but have temperature records derived from average of adjacent cells (cell letters describe cells used for derivation). Cells with derived records must have three adjacent or two non-contiguous adjacent cells with occupied stations or they are left unfilled.
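A sketch of that infilling rule, as I read the caption (the adjacency map and the anomaly series are placeholders you would have to supply):

```python
import numpy as np

def infill_empty_cells(cell_series, adjacency):
    """Derive series for empty grid cells from occupied neighbours.
    An empty cell qualifies only if it has at least three occupied
    neighbours, or two occupied neighbours that are not themselves
    adjacent ("non-contiguous"); otherwise it stays unfilled.
    cell_series: dict cell -> anomaly array (occupied cells only)
    adjacency:   dict cell -> set of neighbouring cells (all cells keyed)"""
    derived = {}
    for cell, nbrs in adjacency.items():
        if cell in cell_series:
            continue  # already occupied
        occ = [n for n in nbrs if n in cell_series]
        non_contig = any(b not in adjacency[a]
                         for a in occ for b in occ if a != b)
        if len(occ) >= 3 or (len(occ) == 2 and non_contig):
            derived[cell] = np.nanmean([cell_series[n] for n in occ], axis=0)
    return derived
```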
I ended up with 34 gridcell temperature series. Two of the grid cells I left unfilled as I did not think there was adequate information from the adjacent gridcells to justify infilling. Once complete, I ran the 34 occupied station gridcell series through RegEM along with the 63 AWS series. The same methodology was used as in Jeff Id’s AWS reconstruction except the 42 station series were replaced by the 34 gridcell series.
For comparison, here is Steig’s AWS reconstruction:
Calculated monthly means of 63 AWS reconstructions using aws_recon.txt from Steig website. Trend is +0.138 deg C. per decade using full 1957-2006 reconstruction record. Steig 2009 states continent-wide trend is +0.12 deg C. per decade for satellite reconstruction. AWS reconstruction trend is said to be similar.
And here is my gridcell reconstruction using Jeff Id’s implementation of RegEM:
Calculated monthly means of 63 AWS reconstructions using Jeff Id RegEM implementation and averaged grid cell approach. Trend is +0.069 deg C. per decade using full 1957-2006 reconstruction record.
Although the plots are similar, the gridcell reconstruction trend is about half of that seen in the Steig reconstruction. Note that most warming occurred prior to 1972.
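For anyone checking the quoted trends, the number is just the least-squares slope of the continent-wide monthly mean; a minimal sketch, assuming `recon` is the (months x 63) reconstruction matrix:

```python
import numpy as np

def trend_per_decade(recon, start_year=1957):
    """OLS slope of the monthly continent-wide mean, in deg C per decade."""
    monthly_mean = np.nanmean(recon, axis=1)              # average over the 63 AWS series
    t = start_year + np.arange(len(monthly_mean)) / 12.0  # decimal years
    slope_per_year, _ = np.polyfit(t, monthly_mean, 1)
    return slope_per_year * 10.0
```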
Again, I’m not trying to say this is the correct reconstruction or that this is any more valid than that done by Steig. In fact, beyond the peninsula and coast, the data is so sparse that I doubt any reconstruction is accurate. This is simply to demonstrate that RegEM doesn’t realize that 40% of the occupied station data came from less than 5% of the land mass when it does its infilling. Because of this, the results can be affected by changing the spatial distribution of the predictor data (i.e. occupied stations).
———————-
Fantastic work - congratulations! IMHO it would appear more reasonable to equalise temperature data on an equal area basis, as is done here, rather than apparently allowing the 5% tail (of the Antarctic Peninsula) to wag the (Continental Antarctic) dog, a la Steig et al.
Why not try and get this accepted as a counter publication?
I’m just a layman here, but it would seem that trying to deal with the fact that there is a concentration of measurements in a small area in Antarctica that is warming would be important.
Is that just not so?
Would it be too much work to see what the plot looks like once the peninsula grid cells (A/B/C/D/E) are removed?
If we suspect that vulcanism has played a part in peninsular “warming”, then removing those (affected) cells should better reflect the rest of the land mass.
Essentially the question (mine, anyway) is how much influence the peninsula itself is having, not just station weighting.
Thanks…
Excellent job! Just another example of how the Man-made Global Warming Hypothesis is truly a phenomenon of number & data manipulation. All in the name of reaching a pre-determined result!
Pure GIGO.
I’m sorry and don’t mean to be grumpy, but sparse data measured with error have no inferential power. It doesn’t matter if you weight them, massage them, put them in a sack and beat them with a stick, calculate the principal component eigenvectors, tickle them, squeeze them, or put them through the dishwasher; poor data are worthless and meaningless.
The confidence limits exceed the range of the data. The temp trend could be plus or minus 5 degrees. We have no way of knowing from these data. IMHO.
PS — please don’t use RegEM PCA when you engineer bridges, or chips, or cars, or anything else that might inflict tremendous tragedy and suffering should it fail.
15 of the 42 stations are in one area, and that happens to be the warmest area? That doesn’t seem to me like it would give the right result.
And if we chart the main Antarctic mass separately from the peninsula, my guess is that the main mass has cooled far more than the Peninsula has warmed. And we are right back to that embarrassing jpeg of Antarctica.
Which is why it had to be remodeled just recently.
SEPP Science Editorial #6-09 (2/7/09)
Returning to the Antarctic:
You may recall our skepticism about reported Antarctic warming [Science Editorial #4-09 (1/24/09)]:
Recall that Professor Eric Steig et al last month announced in Nature that they had spotted a warming in West Antarctica that previous researchers had missed through slackness – a warming so strong that it more than made up for the cooling in East Antarctica. Finally, Global Warming really was global.
The paper was immediately greeted with suspicion, not least because one of the authors was Michael Mann, ‘inventor’ of the infamous hockey stick, now discredited, and the data was reconstructed from very sketchy weather-station records. But also, because the Steig result was contradicted by the much superior MSU data from satellites.
As reported by Australia’s Herald Sun (Feb 4), the warming trend ‘arises entirely from the impact of splicing two data sets together’. Read this link and this to see Steve McIntyre’s superb forensic work. Why wasn’t this error picked up earlier? Perhaps because the researchers got the results they’d hoped for, and no alarm bell went off that made them check. Now, wait for the papers to report the error with the zeal with which they reported Steig’s warming.
http://blogs.news.com.au/heraldsun/andrewbolt/index.php/heraldsun/comments/going_cold_on_antarctic_warming#48360
———————————————–
University of Toronto geophysicists have shown that should the West Antarctic Ice Sheet (WAIS) collapse and melt in a warming world, as many scientists are concerned it will, it is the coastlines of North America and of nations in the southern Indian Ocean that will face the greatest threats from rising sea levels. The research is published in the February 6 issue of Science magazine.
“This concern was reinforced further in a recent study led by Eric Steig of the University of Washington that showed that the entire region is indeed warming.”
Well now, not only is there no indication of a collapse of the WAIS – but it’s not even warming. The researchers end their news release with: “The most important lesson is that scientists and policy makers should focus on projections that avoid simplistic assumptions.” I agree fully.
http://www.sepp.org/
“Unlike simple distance-weighting or similar calculations, application of RegEM takes into account temporal changes in the spatial covariance pattern, which depend on the relative importance of differing influences on Antarctic temperature at a given time.”
I’m an engineer, not a statistician so I’m not sure exactly what that means, but it sounds like hand-waving and a subtle admission there is no distance weighting.
Well, I’m pretty good at deconstructing science Babblespeak… but this is a bit on the dense side. My cut at it ends up where yours does:
~”We don’t do simple distance weighting or similar things (implied air of superiority), RegEM uses time changing things in the pattern of how space is covered (wave of obfuscation over exact thing done) which depends on how time based changes move Antarctic temperature data”
Or my paraphrase: We figure that seasons might have an impact, so we try to match change patterns over time and ignore distance. You know, summer may warm up faster near the coasts but very slowly in the middle; winter comes and the pole drops fastest first; so we just vary the impact with time relationships and figure that drags some distance stuff in with it, maybe.
Richard Sharpe (10:26:48)
Richard, using the suspect readings from a very small area, taken on a volcanically-active peninsula, to make a case for an entire continent’s “multi-decadal warming trend” would, essentially, be akin to taking temperature readings from a few stations in South Florida to extrapolate the temperature trends (within a few hundredths of a degree Celsius) on Hudson Bay.
One can do it, but one shouldn’t expect to have one’s results taken seriously by others with even a modicum of rational thinking processes going on inside their craniums.
If I’ve misunderstood the point of your comment and question, I apologize in advance.
Jeff, this is yet another incredible piece of work that us lowly non-statisticians can only sit back and marvel over. Great job.
Now pardon me while I sit back and watch those evil capitalists burn all of that [snip] racing fuel and increase their carbon footprints! Woot!
[snip]
REPLY: B.C. Please stop making up slogans on trademarks, it puts this blog at risk. I’ve had to snip several of your posts in the past, and I don’t need the extra work. Final warning. – Anthony
Mike D. (11:06:32): Pure GIGO.
I’m sorry and don’t mean to be grumpy, but sparse data measured with error have no inferential power. It doesn’t matter if you weight them, massage them, […]; poor data are worthless and meaningless.
The confidence limits exceed the range of the data.
Strange. This is exactly my evaluation of what GISS does in GIStemp!
In fact, my first response to the original issue (the Steig paper) was: “Wait a minute, they are making the same ‘coastal’ error that Hansen does in GIStemp, only inverting the direction of action! In a Very Cold Place, projecting into a sparse interior will imply bogus interior warming, where in warm places with most sites on the coast being ‘adjusted’ via interior stations it will imply bogus warming of the coasts!”
And that is why I keep saying that the GIS data are ‘cooked’, and why I think the original Steig approach is in error.
Basically, the “Reference Station Method” does not work well for projecting into non-homogeneous zones. It is especially bad with coastal vs. non-coastal comparisons over long distances, and even worse when linear offsets are used rather than a coefficient-of-correlation formula with a (perhaps variable) slope (and perhaps variance over time… summer vs. winter).
This induces error that exceeds the accuracy of the data; that then forces your precision down to the single digit range. And we know to “Never let your precision exceed your accuracy!” -Mr. McGuire.
So we again end up dancing in the error bands of the simulation (or extrapolation, or interpolation, or projection, or whatever fancy word you want to use for “made up ‘data’ set”).
We fret over 1/10ths or 1/100ths in “made up data” that are only useful in the 1/1ths…
With that said: Jeff C.’s experiment demonstrates some of the sensitivities of this method to minor changes of assumption. It shows that the outcome is more related to the method than to the reality: And THAT is a very important lesson!
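To put the offset-versus-slope point above in code (a toy of my own construction, not GIStemp or RegEM code): if a coastal series tracks an interior reference at half its amplitude, projecting by plain offset carries the interior’s full trend to the coast, while a fitted regression slope scales it down.

```python
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(120)

# interior reference: 2 deg C of warming over the period, large swings
interior = np.linspace(0.0, 2.0, 120) + rng.normal(0, 0.3, 120)
# coast tracks the interior at half amplitude (~1 deg C true trend)
coast = 0.5 * interior + rng.normal(0, 0.1, 120)

# offset method: shift the interior series to the coast's mean level
offset_proj = interior - interior.mean() + coast.mean()
# slope method: regress coast on interior, then project through the fit
b, a = np.polyfit(interior, coast, 1)
slope_proj = a + b * interior

print(np.polyfit(months, offset_proj, 1)[0] * 120)  # ~2 deg C: trend exaggerated
print(np.polyfit(months, slope_proj, 1)[0] * 120)   # ~1 deg C: about right
```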
Anthony, will do and apologies for any problems.
The two Jeffs have demonstrated what fakery the temperature reconstructions employ.
Ignoble Prize for Mann & Steig, anyone?
EM,
I believe that you should be the official translator. Whenever you see a bit of pedantic folderol (BS), you can let each of us know what it means.
Maybe sometime in the future we could even have a dictionary here of words that are commonly used to prevaricate (lie).
Jeff,
Thanks for this exercise. I can hardly wait to see the responses and big words that will come soon.
Mike Bryant
Let us put these very small temperature fluctuations in perspective.
Antarctica today has ice that averages over 7,000 feet thick and currently holds about 90% of all ice on Earth. The average temperature of this ice is -37˚C. Antarctica is so cold today that the vast majority of it never gets above freezing, which means its ice is safe from melting. From 1986 to 2000, both satellite data and ground stations show that Antarctica cooled and actually gained ice at a staggering 26.8 million tons per year.
The greater part of Antarctica experiences a longer sea ice season, lasting 21 days longer than it did in 1979. While the overall amount of Antarctic sea ice has increased since 1979, some ice shelf is melting. This ice was already floating on the ocean as a solid, so very little sea level rise occurs because of the melt. This melting and freezing is part of a natural cycle that has gone on for over 30 million years, long before any human influence.
Can someone enlighten me as I get confused with all the acronyms for everything.
I thought that Steig did the following:
1) Took ground data that have a long history but are geographically limited, and compared them with satellite data, station by station, and claims reasonable agreement over the period of overlap.
2) Made a grid of satellite data that covers the whole antarctic.
3) Using this grid, extrapolated the old station data into the interior, generating interior data for the times before satellites existed.
4) took the average and made a temperature versus time plot which he claims shows warming, though very slight.
It seems to me that the analysis above addresses the ground stations and uses them for extrapolating to the interior, which is something entirely different.
What am I misunderstanding?
Ed Scott says: “As reported by Australia’s Herald Sun (Feb 4), the warming trend ‘arises entirely from the impact of splicing two data sets together’”
Actually, Ed-san, if you re-read the article more closely, the warming trend referred to is strictly for Harry, just one station, which I believe Steig even denies was part of the final data set. It seems to have been a major player in some way, however, given that it had the largest trend and was well away from the peninsula. More work remains to be done on this.
As stated on Jeff’s blog, simply figure out what kind of grouping you need to maximize the trend, and there you go….
anna v
Steig’s paper did two entirely separate reconstructions. The AWS reconstruction was used as verification of the satellite reconstruction, which showed a ‘strong’ warming trend. Steig has seen fit not to share the satellite data, or even one line of the code his team wrote for this paper; however, he did point to RegEM, claiming this was all of the code.
We have been able to come close to the AWS reconstruction but not the satellite. So since this is what we have, this is what we’re using.
What Jeff C showed is that RegEM results are dramatically affected by the spatial distribution of the surface data. The implication is that this effect on the reconstruction is not properly accounted for. Actually, it wasn’t discussed at all as far as I know. Since a simple and reasonable regridding of the data creates so much difference in trend, the question becomes:
How can this rebuilt and heavily imputed data be used for verification of sat trend?
and of course
If the AWS data which verifies sat trend is flawed, what does that say for sat trend?
BTW: The satellite data was an IR surface measurement, not a lower troposphere measurement as in UAH or RSS.
“Note that most warming occurred prior to 1972”.
And most (if not all) warming occurred on 5% of Antarctica, where most of the recently active volcanoes are.
jorgekafkazar (12:53:21)
You say, “…which I believe Steig even denies was part of the final data set.”
——————————————-
I cannot find the denial by Dr. Steig in either the posted article or the referenced blog.
—————————–
“Previous researchers hadn’t overlooked the data. What they’d done was to ignore data from four West Antarctic automatic weather stations in particular that didn’t meet their quality control. As you can see above, one shows no warming, two show insignificant warming and a fourth – from a station dubbed “Harry” – shows a sharp jump in temperature that helped Steig and his team discover their warming Antarctic.”
“Harry in fact is a problematic site that was buried in snow for years and then re-sited in 2005. But, worse, the data that Steig used in his modelling which he claimed came from Harry was actually old data from another station on the Ross Ice Shelf known as Gill with new data from Harry added to it, producing the abrupt warming. The data is worthless.”
As a failed engineer (couldn’t hack the math) may I congratulate Jeff C on his superbly presented thought experiment. A tribute to engineers everywhere.
And dontcha love that throwaway line buried there in the middle:
“since manufacturing data is all the rage these days”
As Glenn Reynolds says: Heh.
“This is simply to demonstrate that RegEM doesn’t realize that 40% of the occupied station data came from less than 5% of the land mass when it does its infilling. Because of this, the results can be affected by changing the spatial distribution of the predictor data (i.e. occupied stations).”
Let me see if I’m interpreting this correctly. Let me use fuel mileage as an example:
On two consecutive fillups, I record 22 and 26 mpg, and determine that my average fuel mileage is 24 mpg. But I only used 5 gallons from the first fillup, and 12 gallons from the second fillup, so it’s not as simple as just averaging 22 and 26 together.
This is similar to 40% of the stations accounting for 40% of the signal even though they cover only 5% of the landmass.
Am I thinking about that right?
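The analogy holds if the average is weighted properly: the true figure is total miles over total gallons, not the mean of the two per-tank numbers, just as stations should be weighted by the area they represent. A quick check with those numbers:

```python
# two fillups: (gallons used, mpg recorded)
tanks = [(5, 22), (12, 26)]

miles = sum(g * mpg for g, mpg in tanks)  # 5*22 + 12*26 = 422 miles
gallons = sum(g for g, _ in tanks)        # 17 gallons

print(miles / gallons)  # gallon-weighted: ~24.8 mpg
print((22 + 26) / 2)    # naive average:    24.0 mpg
```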
I am just a dummy trying to understand all the statistical mumbo jumbo, but it seems to me that a lot of people are just making up numbers using computer programs and a limited amount of real data. I see that playing around with the numbers gives one different results (which I think is your point here), but exactly why does anyone take all of this manipulation seriously? It reminds me of what my company does with sales data…then they feed us a line of bs about the metric being exact to 0.00…none of us believe THAT, either.
To moderator: I am trying to both have some fun here and explain what I have come to understand as the upshot of the two Jeffs’ work.
Reading the comments about this post at Climate Audit, I get the feeling that the Mann & Steig discussion went something like this:
“Well, we got rid of the medieval warm period, Eric, so let’s deal with the Antarctic cold period”
“Hmmm, how do we go about that?”
“Well, we know most of the key measurements are from the Antarctic Peninsula, which we know is warming due to the volcanoes underneath”
“OK … where are you going with this Mike??”
“So, we do a statistical analysis which does not weight the measurements for geographical distribution. Therefore, the statistical analysis will have built into it the assumption that all stations have equal weighting and biases.”
“Oh, right Mike. They will all have equal weights, meaning they are assumed to be equally distributed, at equal latitudes and altitudes, in equal environments.”
“You got it … and we don’t need to state the assumptions as they aren’t assumptions … merely the product of our analytical method”.
“Now, how about that beer?”