Jeff Id emailed me today to ask if I wanted to post this, with the caveat “it’s very technical, but I think you’ll like it”. Indeed I do, because it represents a significant step forward in the puzzle that is the Steig et al. paper published in Nature this year (Nature, Jan 22, 2009), which claims to have reversed the previously accepted idea that Antarctica is cooling. From the “consensus” point of view, it is very important for “the Team” to make Antarctica start warming. But then there’s that pesky problem of all that above-normal ice in Antarctica. Plus, there are other problems, such as buried weather stations, which will tend to read warmer when covered with snow. And the majority of the weather stations (and thus data points) are in the Antarctic Peninsula, which weights the results. The Antarctic Peninsula could even be classified under a different climate zone, given its separation from the mainland and strong maritime influence.
A central point here is that Steig has not provided all of the MatLab and RegEM code needed to fully replicate his work, and has so far refused requests for it. Without the code, replication is difficult, and without replication, there can be no significant challenge to the validity of the Steig et al. paper.
Steig’s claim that there has been “published code” is only partially true, and what has been published by him is only akin to a set of spark plugs and a manual on using a spark plug wrench when given the task of rebuilding an entire V-8 engine.
In a previous Air Vent post, Jeff C points out the percentage of code provided by Steig:
“Here is an excellent flow chart done by JeffC on the methods used in the satellite reconstruction. If you see the little rectangle which says RegEM at the bottom right of the screen, that’s the part of the code which was released, the thousands of lines I and others have written for the rest of the little blocks had to be guessed at, some of it still isn’t figured out yet.”

With that, I give you Jeff and Ryan’s post below. – Anthony
Posted by Jeff Id on May 20, 2009
I was going to hold off on this post, because Dr. Weinstein’s post is getting a lot of attention right now (it has been picked up on several blogs and even translated into different languages), but this is too good not to post.
Ryan has done something amazing here, no joking. He’s recalibrated the satellite data used in Steig’s Antarctic paper, correcting offsets and trends; determined a reasonable number of PCs for the reconstruction; and actually calculated a reasonable trend for the Antarctic, with proper cooling and warming distributions. He basically fixed Steig et al. by addressing the very concern I had: that AVHRR vs. surface station temperature (SST) trends and AVHRR vs. SST correlations were not well related in the Steig paper.
Not only that, he dealt a substantial blow to the ‘robustness’ of the Steig/Mann method at the same time.
If you’ve followed this discussion whatsoever you’ve got to read this post.
The RegEM used for this post was originally ported to R by Steve McIntyre; certain versions used here are Steve M’s truncated-PC variant, along with code modified by Ryan.
Ryan O – Guest post on the Air Vent
I’m certain that all of the discussion about the Steig paper will eventually become stale unless we begin drawing some concrete conclusions. Does the Steig reconstruction accurately (or even semi-accurately) reflect the 50-year temperature history of Antarctica?
Probably not – and this time, I would like to present proof.
I: SATELLITE CALIBRATION
As some of you may recall, one of the things I had been working on for a while was attempting to properly calibrate the AVHRR data to the ground data. In doing so, I noted some major problems with NOAA-11 and NOAA-14. I also noted a minor linear decay of NOAA-7, while NOAA-9 just had a simple offset.
But before I was willing to say that there were actually real problems with how Comiso strung the satellites together, I wanted to verify that there was published literature that confirmed the issues I had noted. Some references:
(NOAA-11): i1520-0469-59-3-262.pdf
(Drift)
(Ground/Satellite Temperature Comparisons): p26_cihlar_rse60.pdf
The references generally confirmed what I had noted by comparing the satellite data to the ground station data: NOAA-7 had a temperature decrease with time, NOAA-9 was fairly linear, and NOAA-11 had a major unexplained offset in 1993.

Let us see what this means in terms of differences in trends.

The satellite trend (using only common points between the AVHRR data and the ground data) is double that of the ground trend. While zero is still within the 95% confidence intervals, remember that there are 6 different satellites. So even though the confidence intervals overlap zero, the individual offsets may not.
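For readers who want to see the mechanics of this kind of comparison, here is a minimal sketch (in Python rather than the R/MatLab used for the actual analysis, with a synthetic series standing in for the real data, and with `decadal_trend` being an illustrative name of mine) of computing a trend in deg C/decade with a naive 95% confidence interval via ordinary least squares. A proper treatment would also correct the interval for autocorrelation, as discussed in the sidebar further down:

```python
import numpy as np
from scipy import stats

def decadal_trend(anomalies, months_per_year=12):
    """OLS trend of a monthly anomaly series in deg C/decade, plus the
    half-width of a naive 95% confidence interval (no autocorrelation
    correction applied)."""
    t = np.arange(len(anomalies)) / (months_per_year * 10.0)  # time in decades
    fit = stats.linregress(t, anomalies)
    half_width = stats.t.ppf(0.975, len(anomalies) - 2) * fit.stderr
    return fit.slope, half_width

# Synthetic 25-year monthly series: 0.1 deg C/decade trend plus noise
rng = np.random.default_rng(0)
series = 0.1 * np.arange(300) / 120.0 + rng.normal(0.0, 0.5, 300)
slope, ci = decadal_trend(series)
```

With noisy monthly data, the confidence half-width can easily be as large as the trend itself, which is why overlapping intervals alone do not settle whether individual satellites are offset.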
In order to check the individual offsets, I performed running Wilcoxon and t-tests on the difference between the satellites and ground data using a +/-12 month range. Each point is normalized to the 95% confidence interval. If any point exceeds +/- 1.0, then there is a statistically significant difference between the two data sets.
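A sketch of such a running test (illustrative Python with function names of my own choosing, not the code behind the figures) might look like this; each point is the test statistic scaled so that values beyond +/-1.0 are significant at the 95% level:

```python
import numpy as np
from scipy import stats

def running_tests(diff, window=12):
    """Running one-sample t and Wilcoxon signed-rank tests on a
    satellite-minus-ground difference series, using a +/-window range.
    Each point is normalized so that +/-1.0 marks the 95% level."""
    t_norm, w_norm = [], []
    for i in range(window, len(diff) - window):
        seg = diff[i - window : i + window + 1]
        # t-test: statistic divided by its 95% critical value
        t_stat, _ = stats.ttest_1samp(seg, 0.0)
        t_norm.append(t_stat / stats.t.ppf(0.975, len(seg) - 1))
        # Wilcoxon: convert the two-sided p-value into a signed,
        # normalized score (sign taken from the segment median)
        _, p = stats.wilcoxon(seg)
        z = stats.norm.ppf(1.0 - p / 2.0) / stats.norm.ppf(0.975)
        w_norm.append(np.sign(np.median(seg)) * z)
    return np.array(t_norm), np.array(w_norm)
```

A difference series with a genuine offset pushes both curves well past +/-1.0 over most of the affected interval, which is the pattern described below.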

Note that there are two distinct peaks well beyond the confidence intervals and that both lines spend much greater than 5% of the time outside the limits. There is, without a doubt, a statistically significant difference between the satellite data and the ground data.
As a sidebar, the Wilcoxon test is a non-parametric test. It does not require correction for autocorrelation of the residuals when calculating confidence intervals. The fact that it differs from the t-test results indicates that the residuals are not normally distributed and/or the residuals are not free from correlation. This is why it is important to correct for autocorrelation when using tests that rely on assumptions of normality and uncorrelated residuals. Alternatively, you could simply use non-parametric tests, and though they often have less statistical power, I’ve found the Wilcoxon test to be pretty good for most temperature analyses.
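For reference, one common autocorrelation correction (not necessarily the one used for the figures above) assumes the residuals follow an AR(1) process and replaces the sample size n with an effective sample size before computing confidence intervals. A minimal sketch:

```python
import numpy as np

def effective_n(residuals):
    """Effective sample size under an AR(1) assumption:
    n_eff = n * (1 - r1) / (1 + r1), where r1 is the lag-1
    autocorrelation of the demeaned residuals."""
    r = np.asarray(residuals, float)
    r = r - r.mean()
    r1 = np.sum(r[:-1] * r[1:]) / np.sum(r * r)
    return len(r) * (1.0 - r1) / (1.0 + r1)
```

White-noise residuals give n_eff close to n; strongly autocorrelated residuals give a much smaller n_eff, which widens the confidence intervals accordingly.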
Here’s what the difference plot looks like with the satellite periods shown:

The downward trend during NOAA-7 is apparent, as is the strange drop in NOAA-11. NOAA-14 is visibly too high, and NOAA-16 and -17 display some strange upward spikes. Overall, though, NOAA-16 and -17 do not show a statistically significant difference from the ground data, so no correction was applied to them.
After having confirmed that other researchers had noted similar issues, I felt comfortable in performing a calibration of the AVHRR data to the ground data. The calculated offsets and the resulting Wilcoxon and t-test plot are next:


To make sure that I did not “over-modify” the data, I ran a Steig (3 PC, regpar=3, 42 ground stations) reconstruction. The resulting trend was 0.1079 deg C/decade and the trend maps looked nearly identical to the Steig reconstructions. Therefore, the satellite offsets – while they do produce a greater trend when not corrected – do not seem to have a major impact on the Steig result. This should not be surprising, as most of the temperature rise in Antarctica occurs between 1957 and 1970.
II: PCA
One of the items on which we’ve spent a lot of time doing sensitivity analysis is the PCA of the AVHRR data. Between Jeff Id, Jeff C, and myself, we’ve performed somewhere north of 200 reconstructions using different methods and different numbers of retained PCs. Based on that, I believe that we have a pretty good feel for the ranges of values that the reconstructions produce, and we all feel that the 3 PC, regpar=3 solution does not accurately reproduce Antarctic temperatures. Unfortunately, our opinions count for very little. We must have a solid basis for concluding that Steig’s choices were less than optimal – not just opinions.
How many PCs to retain for an analysis has been the subject of much debate in many fields. I will quickly summarize some of the major stopping rules:
1. Kaiser-Guttman: Include all PCs with eigenvalues greater than the average eigenvalue. In this case, this would require retention of 73 PCs.
2. Scree Analysis: Plot the eigenvalues from largest to smallest and take all PCs where the slope of the line visibly ticks up. This is subjective, and in this case it would require the retention of 25 – 50 PCs.
3. Minimum explained variance: Retain PCs until some preset amount of variance has been explained. This preset amount is arbitrary, and different people have selected anywhere from 80-95%. This would justify including as few as 14 PCs and as many as 100.
4. Broken stick analysis: Retain PCs that exceed the theoretical scree plot of random, uncorrelated noise. This yields precisely 11 PCs.
5. Bootstrapped eigenvalue and eigenvalue/eigenvector: Through iterative random sampling of either the PCA matrix or the original data matrix, retain PCs that are statistically different from PCs containing only noise. I have not yet done this for the AVHRR data, though the bootstrap analysis typically yields about the same number (or a slightly greater number) of significant PCs as broken stick.
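As an illustration, the broken stick rule (number 4) is simple enough to sketch in a few lines. This is an illustrative Python sketch, not the code used for the reconstructions:

```python
import numpy as np

def broken_stick(eigenvalues):
    """Count the leading PCs whose variance fraction exceeds the
    broken-stick expectation b_k = (1/p) * sum_{i=k..p} 1/i, i.e. the
    expected k-th largest piece of a randomly broken unit stick."""
    p = len(eigenvalues)
    frac = np.asarray(eigenvalues, float) / np.sum(eigenvalues)
    expect = [np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)]
    keep = 0
    for f, b in zip(frac, expect):
        if f <= b:
            break  # stop at the first PC that falls below random expectation
        keep += 1
    return keep
```

A spectrum dominated by a few large eigenvalues retains only those few PCs, while a flat, noise-like spectrum retains none.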
The first 3 rules are widely criticized for being either subjective or retaining too many PCs. In the Jackson article below, a comparison is made showing that 1, 2, and 3 will select “significant” PCs out of matrices populated entirely with uncorrelated noise. There is no reason to retain noise, and the more PCs you retain, the more difficult and cumbersome the analysis becomes.
The last 2 rules have statistical justification. And, not surprisingly, they are much more effective at distinguishing truly significant PCs from noise. The broken stick analysis typically yields the fewest number of significant PCs, but is normally very comparable to the more robust bootstrap method.
Note that all of these rules would indicate retaining far more than simply 3 PCs. I have included some references:
North_et_al_1982_EOF_error_MWR.pdf
I have not yet had time to modify a bootstrapping algorithm I found (it was written for a much older version of R), but when I finish that, I will show the bootstrap results. For now, I will simply present the broken stick analysis results.
The broken stick analysis finds 11 significant PCs. PCs 12 and 13 are also very close, and I suspect the bootstrap test will find that they are significant. I chose to retain 13 PCs for the reconstruction to follow.
Without presenting plots for the moment, retaining more than 11 PCs does not end up affecting the results much at all. The trend does drop slightly, but this is due to better resolution on the Peninsula warming. The rest of the continent does not change if additional PCs are added. The only thing that changes is the time it takes to do the reconstruction.
Remember that the purpose of the PCA on the AVHRR data is not to perform factor analysis. The purpose is simply to reduce the size of the data to something that can be computed. The penalty for retaining “too many” – in this case – is simply computational time or a failure of RegEM to converge. The penalty for retaining too few, on the other hand, is a faulty analysis.
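In other words, the PCA step here is just lossy compression of the AVHRR field. A sketch of that reduction via a truncated SVD (illustrative Python; the names are mine, not those of the reconstruction code):

```python
import numpy as np

def truncate_pcs(field, k):
    """Approximate a (time x gridcell) anomaly field by its k leading PCs."""
    mean = field.mean(axis=0)
    U, s, Vt = np.linalg.svd(field - mean, full_matrices=False)
    pcs = U[:, :k] * s[:k]     # time series of the k leading PCs
    eofs = Vt[:k]              # corresponding spatial patterns
    return pcs @ eofs + mean   # reduced-rank reconstruction of the field
```

If k captures all of the real structure, the reconstruction is essentially exact; if k is too small, real spatial patterns are discarded, which is the “faulty analysis” penalty described above.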
I do not see how the choice of 3 PCs can be justified on either practical or theoretical grounds. On the practical side, RegEM works just fine with as many as 25 PCs. On the theoretical side, none of the stopping criteria yield anything close to 3. Not only that, but these are empirical functions. They have no direct physical meaning. Despite claims in Steig et al. to the contrary, they do not relate to physical processes in Antarctica – at least not directly. Therefore, there is no justification for excluding PCs that show significance simply because the other ones “look” like physical processes. This latter bit is a whole other discussion that’s probably post worthy at some point, but I’ll leave it there for now.
III: RegEM
We’ve also spent a great deal of time on RegEM. Steig & Co. used a regpar setting of 3. Was that the “right” setting? They do not present any justification, but that does not necessarily mean the choice is wrong. Fortunately, there is a way to decide.
RegEM works by approximating the actual data with a certain number of principal components and estimating a covariance from which missing data is predicted. Each iteration improves the prediction. In this case (unlike the AVHRR data), selecting too many can be detrimental to the analysis as it can result in over-fitting, spurious correlations between stations and PCs that only represent noise, and retention of the initial infill of zeros. On the other hand, just like the AVHRR data, too few will result in throwing away important information about station and PC covariance.
Figuring out how many PCs (i.e., what regpar setting to use) is a bit trickier because most of the data is missing. Like RegEM itself, this problem needs to be approached iteratively.
The first step was to substitute AVHRR data for station data, calculate the PCs, and perform the broken stick analysis. This yielded 4 or 5 significant PCs. After that, I performed reconstructions with steadily increasing numbers of PCs and performed a broken stick analysis on each one. Once the regpar setting is high enough to begin including insignificant PCs, the broken stick analysis yields the same result every time. The extra PCs show up in the analysis as noise. I first did this using all the AWS and manned stations (minus the open ocean stations).
I ran this all the way up to regpar=20 and the broken stick analysis indicates that 9 PCs are required to properly describe the station covariance. Hence the appropriate regpar setting is 9 if all the manned and AWS stations are used. It is certainly not 3, which is what Steig used for the AWS recon.
I also performed this for the 42 manned stations Steig selected for the main reconstruction. That analysis yielded a regpar setting of 6 – again, not 3.
The conclusion, then, is similar to the AVHRR PC analysis. The selection of regpar=3 does not appear to be justifiable. Additional PCs are necessary to properly describe the covariance.
IV: THE RECONSTRUCTION
So what happens if the satellite offsets are properly accounted for, the correct number of PCs are retained, and the right regpar settings are used? I present the following panel:
(Right side) Reconstruction trends using just the model frame.
RegEM PTTLS does not return the entire best-fit solution (the model frame, or surface). It only returns what the best-fit solution says the missing points are. It retains the original points. When imputing small amounts of data, this is fine. When imputing large amounts of data, it can be argued that the surface is what is important.
RegEM IPCA returns the surface (along with the spliced solution). This allows you to see the entire solution. In my opinion, in this particular case, the reconstruction should be based on the solution, not a partial solution with data tacked on the end. That is akin to doing a linear regression, throwing away the last half of the regression, adding the data back in, and then doing another linear regression on the result to get the trend. The discontinuity between the model and the data causes errors in the computed trend.
Regardless, the verification statistics are computed vs. the model – not the spliced data – and though Steig did not do this for his paper, we can do it ourselves. (I will do this in a later post.) Besides, the trends between the model and the spliced reconstructions are not that different.
Overall trends are 0.071 deg C/decade for the spliced reconstruction and 0.060 deg C/decade for the model frame. This is comparable to Jeff’s reconstructions using just the ground data, and as you can see, the temperature distribution of the model frame is closer to that of the ground stations. This is another indication that the satellites and the ground stations are not measuring exactly the same thing. It is close, but not exact, and splicing PCs derived solely from satellite data on a reconstruction where the only actual temperatures come from ground data is conceptually suspect.
When I ran the same settings in RegEM PTTLS – which only returns a spliced version – I got 0.077 deg C/decade, which checks nicely with RegEM IPCA.
I also did 11 PC, 15 PC, and 20 PC reconstructions. Trends were 0.081, 0.071, and 0.069 for the spliced and 0.072, 0.059, and 0.055 for the model. The reason for the reduction in trend was simply better resolution (less smearing) of the Peninsula warming.
Additionally, I ran reconstructions using just Steig’s station selection. With 13 PCs, this yielded a spliced trend of 0.080 and a model trend of 0.065. I then did one after removing the open-ocean stations, which yielded 0.080 and 0.064.
Note how when the PCs and regpar are properly selected, the inclusion and exclusion of individual stations does not significantly affect the result. The answers are nearly identical whether 98 AWS/manned stations are used, or only 37 manned stations are used. One might be tempted to call this “robust”.
V: THE COUP DE GRACE
Let us assume for a moment that the reconstruction presented above represents the real 50-year temperature history of Antarctica. Whether this is true is immaterial. We will assume it to be true for the moment. If Steig’s method has validity, then, if we substitute the above reconstruction for the raw ground and AVHRR data, his method should return a result that looks similar to the above reconstruction.
Let’s see if that happens.
For the substitution, I took the ground station model frame (which does not have any actual ground data spliced back in) and removed the same exact points that are missing from the real data.
I then took the post-1982 model frame (so the one with the lowest trend) and substituted that for the AVHRR data.
I set the number of PCs equal to 3.
I set regpar equal to 3 in PTTLS.
I let it rip.
Look familiar?
Overall trend: 0.102 deg C/decade.
Remember that the input data had a trend of 0.060 deg C/decade, showed cooling on the Ross and Weddell ice shelves, showed cooling near the pole, and showed a maximum trend in the Peninsula.
If “robust” means the same answer pops out of a fancy computer algorithm regardless of what the input data is, then I guess Antarctic warming is, indeed, “robust”.
———————————————
Code for the above post is HERE.
John Silver (06:33:02) :
I forgot: The Norwegian Nobel Committee (Den norske Nobelkomité) awards the Nobel Peace Prize each year. Its five members are appointed by the Norwegian parliament.
They are politicians!! Got that?
http://en.wikipedia.org/wiki/Norwegian_Nobel_Committee
Sure, the Nobel Peace Prize has always been a political one. At the time Alfred Nobel established the Nobel Prize (1895), Sweden and Norway were in a union. That ended in 1905.
The Nobel Prizes for science, physics, chemistry etc. are awarded by the Swedish committee. http://en.wikipedia.org/wiki/Nobel_Prize
“”” Ryan O (12:38:06) :
George E. Smith (10:09:06) :
Does anyone really believe that the blue is representative of the black ?
No. The blue is simply a linear model. No one believes that a linear model is an accurate model for near-surface temperatures. It is used because it is simple and provides a gross comparison between different analyses. Because it is a poor model of the underlying physical process, the confidence limits associated with it are quite large.
Your comments about the Nyquist sampling theorem are not quite apt in this case. There is no attempt – either in Steig or in the above post – to reconstruct the high-frequency portion of the temperature signal. The attempt is to reconstruct the low-frequency portion, the wavelength of which is determined by the interval over which you want to determine the trend. Given that Steig was not concerned with trends of less than about 15 years, there is no need to sample at high rates. “””
Ryan, if the linear model is not a good model of the data, then how can its comparison to another linear model of another set of data have any validity?
I understand your point about 15 year time scales; but somehow the climate science folks seem to forget there is a spatial sampling process going on as well; and in the case of these Antarctic comparisons; there simply isn’t enough spatial sampling to ascribe any real meaning to the data.
You can’t bore a rock core in Spokane, and another one in Atlanta, and then proceed to describe the complete geology of North America from such poor sampling strategies.
And that is the whole problem with all of this global mean temperature hullabaloo; neither the spatial nor the temporal sampling is adequate to accurately recover the global continuous function; and the aliasing noise is so pervasive that even the average (the zero frequency component) is unrecoverable. The typical twice-daily time measurement is already not sufficient to recover the correct time average; and that completely ignores higher frequency components due to cloud variations.
No wonder the computer modellers don’t address clouds properly; their data sampling regimens assume that cloud variations during the day do not occur.
Well the planet faithfully keeps track of all those cloud variations, to come up with the correct answer. We won’t get the correct answer by ignoring them; nor the spatial variations.
I keep reading that typical ground measured temperature “anomalies” are transported over distances of 1200 Km. That leads to San Jose CA temperatures being used to represent Loreto on the Sea of Cortez, half way down the Baja.
You only have to violate the Nyquist Criterion by a factor of two to make the average unrecoverable (or at least noisy); so even though you aren’t interested in high frequency variations, the inadequacy of the sampling process prevents you from even getting the low frequency information.
George
Jeff, you guys discussed bias in the instrumentation and accounted for it in your analysis. I would expect that to also account for any sensor calibration issues. Are there any issues left outstanding with regard to accuracy or precision of the sensors?
Perhaps I am poking my nose too much on this particular issue, but Mann and Schön are entirely different stories. Schön could not be replicated; Mann can be. Wahl and Ammann did just that. Whether Mann’s method yields useful results is an entirely separate question.
Furthermore, the methods involved in a work such as Mann’s are highly technical and esoteric, as are the McIntyre and McKitrick criticisms. Many people – climate scientists included – do not have the background to independently determine the validity of M&M’s arguments or Mann’s counter-arguments. Without that ability to independently verify, they can do no better than defaulting to the party they believe is more credible – which, in most cases, will be the party they are more familiar with.
I agree that the hockey stick is given far more credence than it deserves (and the same goes for many paleo reconstructions), but to put Mann in the same category as Schön is, in my opinion, unjustified.
But lest I sound like a broken record, I’ll leave it at that. 😉
And I do appreciate the attention this seems to be getting – it’s gratifying.
“”” Jeff Id (10:53:17) :
George E. Smith (10:21:17) :
Ryan’s work as I see it is basically a better representation of the spatial distribution of ground data, infilling between stations based on covariance with the satellite data. By releasing the RegEM TTLS requirement that the satellite grid data is fixed in the reconstruction, it becomes a more appropriate blend of the two sets, allowing the actual measured temperatures to direct the reconstruction. This is why it and other higher order RegEM matches well with the area weighted reconstructions which included only surface station data.
As far as fraud, Ryan and I have no intention of calling Steig et al. fraud, and nobody else I know of who’s worked on this paper would call it fraud. I don’t mind calling it incorrect though, and I do have suspicions about choices made during publication – see the link in my comments above.
I think Mann was chosen as a coauthor for his work with RegEM on the M08 hockeystick. “””
Jeff, please don’t think for a minute that I am in any way critical of your’s and Ryan’s work; I am not. You have helped show up the weaknesses in the whole process that Steig et al embarked on.
And I’m in agreement with you that there’s no basis for flinging around words like fraud.
Steig was very courteous and responsive to the questions I put to him; and my problem with the whole business, is that somebody tried to make something out of nothing.
In any case, I don’t believe that the Arctic, or the Antarctic are the right places to be looking for answers to global climate mysteries. At those low temperatures, the energy fluxes are way down from global norms, so those regions are just bit players in the global temperature question.
Doubling the CO2 in the atmosphere over Vostok station, or the whole of Antarctica, simply points out how silly the whole idea of “Climate Sensitivity” is.
Fossil hunters try to build a whole animal, including its skin and the color of its eyes, from a chip off a tooth fragment. Well, if they got any DNA in the process, that might lead somewhere; but mostly it is wild guesswork that will be criticized by “peers” on the basis of equally wild guesswork.
What is wrong with the experimental result that simply says: we don’t have enough data to know?
“…Schön was, in effect, doing science backwards: working out what his conclusions should be, and then using his computer to produce the appropriate graphs…”
The Wegman commission wrote about Mann’s hockey stick:
“…A cardinal rule of statistical inference is that the method of analysis must be decided before looking at the data. The rules and strategy of analysis cannot be changed in order to obtain the desired result. Such a strategy carries no statistical integrity and cannot be used as a basis for drawing sound inferential conclusions…
…I am baffled by the claim that the incorrect method doesn’t matter because the answer is correct anyway.
Method Wrong + Answer Correct = Bad Science…”
I wouldn’t call Mann’s output fraud, as, to my knowledge, Mann did not artificially produce data, but I would call it INTENTIONAL bad science.
Actress Daryl Hannah is a real leading expert on climate. She can predict anything with her enormous mental ability. We know this because the BBC covers her and not any skeptic mentioned by this site ;p
http://www.bbc.co.uk/blogs/ethicalman/2009/05/is_coal_the_number_one_enemy.html
Thanks for doing this important work.
I agree with Antonio San and disagree with G E Smith, the reason being that this is not a trivial matter and could cost trillions. It simply leads the media on to think AGW is actually happening; therefore it was not OK to publish this work, especially in Nature, when it was obviously flawed.
George E. Smith (14:15:08) :
What is wrong with the experimental result that simply says: we don’t have enough data to know?
In effect, isn’t that what Ryan is doing? By showing that the Steig et al.’s analysis is not robust, he’s called into question their claim to have enough data to know.
Your question is a good one, but not very practical. The horse is already out of the barn. Science has been usurped into the service of policy. There is no going back on that. We can only do what we can to expose bad science when it is used to justify bad policy. If this means getting our hands dirty doing science in a way that is less than ideal, it still has to be done.
I doubt that you will be offended, nor do I mean it in any critical sort of way, but I think you are too much of an idealist.
This is quite interesting, I like the attempt at ‘correcting’ the satellite data. However I’m left with a number of points:
1) When a reconstruction is used with an increasing number of PCs (> 10), a pattern emerges: the Antarctic cannot be simplified into an overall trend.
2) If 3 PCs are used and an ‘overall’ trend is calculated, then it gives an inaccurate impression of the nature of temperature variation in Antarctica, both by assuming a trend and by using too few PCs.
When reading the article I was still drawn to the mention of an overall positive trend (as Flanagan mentioned). If this device of calculating an overall trend is continually used as a metric, then by your own admission and calculation, Jeff, the Antarctic is warming. However, this doesn’t represent the actual picture, as the graphs show. It’s a bit more complicated.
Perhaps it would be better to separate zones out and then propose that these be modelled. That would be a good way of moving forward.
With regards to the posts about peer review, sometimes it is better to let a paper be published that has basic science correct but on a deeper level is then incorrect. Then someone can publish a better paper. The process, in its ideal form, is not a catch all: that is the job of science. However some people (and it is used as a pro-AGW argument) say that because a paper is published it is instantaneously correct. So, the thinking goes, if there are 1000 peer-reviewed papers on a subject then that means the information and results they contain are correct. Well, no, this isn’t how science works. Jeff has just shown that. It only takes someone to spot the mistake no-one else has considered or repeat the experiment correctly and get a different result. The key though is to get this published so that it is reviewed and available in the same forum. Then no-one can complain its not been through the same wringer and is less valid.
Sadly, with this in mind, in some more ‘sexy’ subjects (climate science being one) there appears to be a bias in the publishing, so that some papers going against the consensus don’t get considered objectively. That is not the principle of peer review, but it is the reality of the consensus view.
John W. (14:03:44) :
Jeff, you guys discussed bias in the instrumentation and accounted for it in your analysis. I would expect that to also account for any sensor calibration issues. Are there any issues left outstanding with regard to accuracy or precision of the sensors?
===
Yes, a lot in fact. Ryan has done a good job working with the end result of the problem by recalibrating the processed AVHRR satellite to surface station data, however the satellite data is very heavily processed. AVHRR is surface temperature data – think dirt/snow/ice temp – rather than surface air temperature. The sensors used cannot penetrate clouds and to call cloud removal problematic would be a huge understatement. It seems to me that it would be reasonable to assume that trends in ground temperature would be muted relative to air temperature.
Another issue is with the blackbody calibration targets onboard each satellite as well. Each satellite seems to have its own story. That’s why Ryan’s approach of correcting the end result to surface station knowns makes a lot of sense to me.
In this case the magnitude of the trend signal is so small compared to even daily variation that it is IMO impossible to detect the difference. By calibrating to surface station data, the problem is eliminated at the endpoint. Then the process becomes primarily a covariance distribution of the surface station data. It’s really kind of an interesting and clever process. Like Ryan said before, the process is complex enough that you can easily produce bad results. The only way to determine if your result is good is through good statistical setup and extreme quality control afterward.
As a baseline, I believe the result must match what is in this case higher quality surface station data, which is why Fig 10 is an exciting result to me.
George E. Smith (14:15:08) :
Thanks for the comments, I was more concerned that others would take it as though there is implication of fraud somewhere. The F word has an unfortunate place in science but must be used with extreme care.
What is wrong with the experimental result that simply says; we don’t have enough data to know.
Nothing of course, BTW: I have the same problems with science by watercolor paleontology you do.
Hu McCulloch did a post on CA, either in the thread or as a headpost, where he determined the confidence interval of the trend was something like +/- 0.11 C/decade (I’m not positive about the number, but it was similar to this). That would pretty well kill any knowledge of a trend, and I don’t think it took into account many of the small errors that can occur along the way.
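For readers who want a feel for where a confidence interval like that comes from, here is a naive sketch in Python (toy data, hypothetical `trend_ci` helper) of an ordinary least-squares trend and its ~95% interval. It assumes independent residuals; properly accounting for autocorrelation, as Hu McCulloch’s analysis did, widens the interval further.

```python
import numpy as np

def trend_ci(years, temps, z=1.96):
    """Least-squares trend and a naive ~95% confidence half-width.
    Assumes independent residuals (autocorrelation would widen it)."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(temps, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    # standard error of the slope from the residual variance
    se = np.sqrt((resid @ resid) / (n - 2) / np.sum((x - x.mean()) ** 2))
    return slope, z * se

# toy series: a small trend buried in large year-to-year noise
rng = np.random.default_rng(1)
yrs = np.arange(1957, 2007, dtype=float)
t = 0.01 * (yrs - 1957) + rng.normal(0.0, 0.5, len(yrs))
slope, half = trend_ci(yrs, t)
print(f"trend = {slope:+.3f} +/- {half:.3f} C/yr")
```

When the half-width is comparable to or larger than the fitted slope, as here, the sign of the trend is not statistically distinguishable from zero.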
Brilliant. I shudder at the thought of how much work it must have taken, and all because information was not published.
One sentence nudged my memory: “most of the temperature rise in Antarctica occurs between 1957 and 1970.”
That was before the late 20thC warming got under way, in fact the world as a whole was cooling slightly at that time. Then, when the world did warm up, the Antarctic cooled. This is exactly as described (and explained) by Henrik Svensmark.
Now, if Henrik Svensmark’s theories are correct, and if the world is indeed now in a cooling phase, the Antarctic will start to warm up.
What a horrible irony if this does indeed happen. The proponents of AGW will be able to get all the ammunition they need from the Antarctic (the Arctic melt will become a distant memory), and all because Henrik Svensmark’s theories are correct!
To echo Jeff, there is absolutely nothing wrong with the result that says we don’t have enough data to know. And when you don’t have enough data to know, the responsible thing to do is simply admit it.
Your point is right on target, George.
“Aron (14:48:52) :
Actress Daryl Hannah is a real leading expert on climate. She can predict anything with her enormous mental ability. We know this because the BBC covers her and not any skeptic mentioned by this site ;p”
True. I believe that Waxman and Markey are waiting to push Cap and Tax through until they get the final reports from Barbra Streisand and Celine Dion.
I am very uncomfortable with seeing Eric Steig’s name juxtaposed with words like “fraud” and “Mann” and “hockey stick.” Dr. Steig’s work is, in fact, proving robust in the sense that he got the sign right, and I don’t recall him ever claiming that the trend was hugely positive. Post-analysis does seem to show a smaller trend, but one that is still positive.
Okay, so maybe there was some accidental cherry-picking inherent in the start date for the data. But that’s no crime, in itself, and it’s laughable to suspect that he spent hours and hours searching for a start date that would give the highest positive trend. That’s simply where the best data start.
Should West Antarctica be analyzed separately? I think so, but it’s a matter of opinion.
Did Dr. Steig use the best statistical methods? Likely not, but I’d point out that post-facto analysis is a tiny bit like Monday-morning quarterbacking–not wrong, just looking at the game with the advantage of a few days’ (or months’) better perspective. It would have been nice if the advantage of transparency had also been made available.
I don’t think we need rude ad hominem attacks here. RC has that well covered. Let’s all retain the behavioral high ground and welcome politely and positively any response in kind from Eric Steig.
“George E. Smith (14:15:08) :
What is wrong with the experimental result that simply says; we don’t have enough data to know.”
That is what is wrong with both analyses.
What the second analysis does though, is prove that by changing the method, you change the result.
I also agree that the whole exercise is futile, trying to obtain data where there are none.
DaveE.
jorgekafkazar (16:47:40) :
I need to point out that Mann is a coauthor of this paper so we would expect some comparison.
I agree with you, except for the Monday-morning quarterback part. Steig is the one paid to do this day in and day out. We are the ones who do it in our spare time, unfunded, and in this case for understanding. I’m enough of a rookie in climate science that I really wanted to know how Antarctic warming could occur while the ice simultaneously grew.
If someone doesn’t spend the hours, is it a stretch to hope that paid climatologists are going to address a paper on the cover of Nature? I don’t know how many will remember, but one of the RC crowd’s main points when this paper was introduced was its robustness due to the high level of time-consuming scrutiny by the ‘pros’.
Regarding trends, the Steig et al trend is positive throughout the record. The reconstructions I’ve done show negative trends for the last 40 years, and I’m fairly certain that Ryan’s will do the same – Fig 10. This also has a bit to do with the nature of a least-squares fit, as the trend is heavily affected by the earliest (1957) endpoints in the data. Each time I read comments above saying that the sign is right so he’s not that far off, I cringe a little.
I know it’s not apparent if you haven’t followed it all along, but the continuous upslope of the Steig/Mann process is a problem. You ask whether Steig et al used the best statistical methods; I say nope, but they accidentally chose the absolute highest-trend version.
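A toy illustration (made-up numbers, not the actual reconstruction) of why a least-squares trend leans so heavily on the earliest points in the record: the same half-degree bump shifts the fitted slope far more when applied at the 1957 end of the series than in the middle.

```python
import numpy as np

rng = np.random.default_rng(2)
yrs = np.arange(1957, 2007, dtype=float)
temps = rng.normal(0.0, 0.3, len(yrs))  # flat series: pure noise, no real trend

base = np.polyfit(yrs, temps, 1)[0]  # slope of the untouched series

# warm the first three years by half a degree and refit
early = temps.copy()
early[:3] += 0.5
early_trend = np.polyfit(yrs, early, 1)[0]

# the same bump in the middle of the record barely moves the slope
mid = temps.copy()
mid[23:26] += 0.5
mid_trend = np.polyfit(yrs, mid, 1)[0]

print(early_trend - base, mid_trend - base)
```

Because the slope is a weighted sum of the data with weights proportional to each point’s distance from the mean year, points at either end of the record carry far more leverage than points near the middle.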
I agree with you Jorge. I think he just got in with a bad crowd. He certainly seems to be a forthright person. I’m looking forward to his comments.
Mike
I am serious about my above comment. Take a set of numbers made from whole cloth, plug it into the family circle circuitous route charted in the post and see if Antarctica warms. If it does, the algorithm has a life of its own regardless of data set input.
“”” Basil (14:55:19) :
George E. Smith (14:15:08) :
What is wrong with the experimental result that simply says; we don’t have enough data to know.
In effect, isn’t that what Ryan is doing? By showing that the Steig et al.’s analysis is not robust, he’s called into question their claim to have enough data to know.
Your question is a good one, but not very practical. I doubt that you will be offended, nor do I mean it in any critical sort of way, but I think you are too much of an idealist. “””
Well Basil I am way too long in the tooth to be offended by what anybody says about me; but also I am not an idealist; a realist yes; but having never seen the ideal; I quit searching long ago.
I am not in this academic science community; I have been in Industry all my working life; trying to make life better for everybody. My work is judged by my employers very simply; it just has to work and do what I tell them it is supposed to do. Some stuff works very well; some I wish I had to do over again; but walking into a retail store, and watching some individual put down his own money freely to exchange for some product I helped to design; and that my employer makes a profit on; is the sort of peer review that grabs me.
It’s an even bet that maybe half of the people posting on WUWT anywhere in the world did so using something I had a big hand in the practical design of.
And that’s likely true no matter what brand logo it has on it.
I always know a better way to do it; but it may not be the most practical or marketably economical way; and in the end, customer satisfaction is all that matters to my employers and, by proxy, to me.
I don’t have the pressures of publish or perish as they do in academia (I did start there); in fact the only peer reviewed literature I am even permitted to generate is filed with the US Patent Office. Doesn’t make me rich; may not make my employers rich; but it does help stop the idea thieves from cashing in on our hard work.
One of the most important lessons that one learns in an industrial R&D environment is that sometimes the most important research result is learning that you have no business continuing with the project you are working on; it simply isn’t going to lead anywhere. So you down tools (after telling your boss it’s a no-go) and you look for a seat in another boat; then tell him to get the hell out of your way so you can row that boat.
Maybe that’s a good rule in academia too. Don’t get married to an idea, when your innermost thoughts tell you that things aren’t the way you thought they were.
“Mr Mac”, the McDonnell of McDonnell Aircraft (St Louis, MO), used to say: “We seldom fire anyone for making a mistake; but we always fire anyone who tries to cover up a mistake.” It’s a good philosophy.
I come here to learn from any and all of the posters here; and if my thoughts help or inspire anyone else, that’s good too; but for me it is the learning opportunities.
George
Mr. Lynn said
Tuesday the President of the United States said this:
“We have over the course of decades slowly built an economy that runs on oil. It has given us much of what we have — for good but also for ill. It has transformed the way we live and work, but it’s also wreaked havoc on our climate.”
Actually, the TOTUS said “wrecked”, so that’s what the POTUS said… “wrecked havoc on our climate”.
There were a number of interesting comments on fraud. I cannot find the source of one of my favorite lines: “Never ascribe to malice what can be adequately explained by incompetence.” It’s so much more fun to humiliate someone with evidence of incompetence. They can hardly explain, “No, no, I’m not incompetent. I was committing fraud.”
On the other hand, withholding data and methods could easily be defined as fraud all by itself. Some drug companies have gotten into big trouble by withholding studies that had negative results and only presenting favorable studies. If you use any chemicals in industry, you had better be keeping files on any allegations of adverse reactions under TSCA rules. It doesn’t matter if the allegations turn out to be unfounded. You must keep those files or face big penalties.
The NSF “expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work.”
I think it’s time to declare that withholding data and methods, while being funded by the taxpayers, is fraud. Suppose just for a moment that the disasters forecast by the AGW proponents are all true. How is it going to look if the threat could not be confirmed in time because some researchers hid the evidence? Why are they withholding the data if they really believe there is a danger?
Ron de Haan (09:41:47) :
“What is happening at the Southern Hemisphere?
In New Zealand, winter has arrived skipping Autumn this year.
http://www.iceagenow.com/2009_Other_Parts_of_the_World.htm
And heavy hail covering a surfer resort and low temperatures have convinced some people that a new ice age is due.”
Currently here in the South Island of NZ we are being blasted by icy cold weather we would “normally” expect during mid-winter, that being July/August. Somebody once said, “don’t knock the weather, because it starts 80% of conversations.” Kinda true down under right now.