NOTE: This has been running two weeks at the top of WUWT, discussion has slowed, so I’m placing it back in the regular queue. – Anthony
UPDATES:
Statistician William Briggs weighs in here
Eduardo Zorita weighs in here
Anonymous blogger “Deep Climate” weighs in with what he/she calls a “deeply flawed study” here
After a week of being “preoccupied”, Real Climate finally breaks radio silence here. It appears to be a prelude to a dismissal with a “wave of the hand”
Supplementary Info now available: All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website:
http://www.imstat.org/aoas/supplements/default.htm
=========================================
Sticky Wicket – phrase, meaning: “A difficult situation”.
Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010), submitted to the Annals of Applied Statistics and listed for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface because, instead of attacking the proxy data quality issues, the authors assumed the proxy data were accurate for their purpose and created a Bayesian backcast method. Then, using the proxy data, they demonstrate that it fails to reproduce the sharp 20th century uptick.
Now, there’s a new look to the familiar “hockey stick”.
Before:
[image: the familiar hockey stick reconstruction]
After:
[image: the McShane and Wyner reconstruction]
Not only are the results stunning, but the paper is highly readable, written in a sensible style that most laymen can absorb, even if they don’t understand some of the finer points of Bayesian methods, loess filters, or principal components. Better still, this paper is a confirmation of McIntyre and McKitrick’s work, with a strong nod to Wegman. I highly recommend reading it and distributing this story widely.
Here’s the submitted paper:
(PDF, 2.5 MB. Backup download available here: McShane and Wyner 2010)
It states in its abstract:
We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.
Here are some excerpts from the paper (emphasis in paragraphs mine):
This one shows that M&M hit the mark, because it is independent validation:
In other words, our model performs better when using highly autocorrelated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data. While the Lasso-generated reconstructions using the proxies are highly statistically significant compared to simple null models, they do not achieve statistical significance against sophisticated null models.
We are not the first to observe this effect. It was shown, in McIntyre and McKitrick (2005a,c), that random sequences with complex local dependence structures can predict temperatures. Their approach has been roundly dismissed in the climate science literature:
To generate “random” noise series, MM05c apply the full autoregressive structure of the real world proxy series. In this way, they in fact train their stochastic engine with significant (if not dominant) low frequency climate signal rather than purely non-climatic noise and its persistence. [Emphasis in original]
Ammann and Wahl (2007)
…
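For readers who want to see the mechanics of this kind of null-model test, here is a minimal sketch in Python. It is my own illustration, not the authors’ code: the temperature series, proxy count, and AR(1) coefficient are all assumptions chosen for the example, and the paper itself compares the null models against the actual proxies rather than against white noise.

```python
# Minimal sketch of a null-model test: fit a Lasso to "proxies" that are pure
# AR(1) noise, generated independently of temperature, and see how well they
# appear to predict a 30-year holdout block. Illustrative only -- the series
# length, proxy count, and AR coefficient are assumptions, not the paper's.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_years, n_proxies = 149, 93              # ~1850-1998, 93 proxies as in the paper
temp = np.linspace(-0.4, 0.6, n_years) + 0.1 * rng.standard_normal(n_years)

def ar1_noise(n, phi, rng):
    """AR(1) series: x[t] = phi * x[t-1] + e[t], with standard normal shocks."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

# Pseudo-proxies that carry no temperature signal, only persistence.
ar_proxies = np.column_stack([ar1_noise(n_years, 0.9, rng) for _ in range(n_proxies)])
white_proxies = rng.standard_normal((n_years, n_proxies))

train, test = slice(0, n_years - 30), slice(n_years - 30, n_years)
for name, X in [("AR(1) noise", ar_proxies), ("white noise", white_proxies)]:
    fit = LassoCV(cv=5).fit(X[train], temp[train])
    rmse = np.sqrt(np.mean((fit.predict(X[test]) - temp[test]) ** 2))
    print(f"{name:12s} holdout RMSE: {rmse:.3f}")
```

The paper’s point is that the real proxies do not beat such autocorrelated noise by a statistically significant margin.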
On the power of the proxy data to actually detect climate change:
This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past. It is even more discouraging when one recalls Figure 15: the model cannot capture the sharp run-up even in-sample. In sum, these results suggest that the ninety-three sequences that comprise the 1,000 year old proxy record simply lack power to detect a sharp increase in temperature. See Footnote 12.
Footnote 12:
On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.
…

We plot the in-sample portion of this backcast (1850-1998 AD) in Figure 15. Not surprisingly, the model tracks CRU reasonably well because it is in-sample. However, despite the fact that the backcast is both in-sample and initialized with the high true temperatures from 1999 AD and 2000 AD, it still cannot capture either the high level of or the sharp run-up in temperatures of the 1990s. It is substantially biased low. That the model cannot capture the run-up even in-sample does not portend well for its ability to capture similar levels and run-ups if they exist out-of-sample.
…
Conclusion.
Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.
On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run-up in temperatures recorded in the 1990s, even in-sample.
As can be seen in Figure 15, our estimate of the run-up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.
Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated.
The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
Climate scientists have greatly underestimated the uncertainty of proxy based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy based models with approximately the same amount of reconstructive skill (Figures 11, 12, and 13) produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).
Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate. Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.
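Two technical points in that conclusion are easy to illustrate numerically: the “flat handle as a feature of regression,” and the shrunken effective sample size under strong autocorrelation. The sketch below is my own toy illustration under stated assumptions, not the authors’ method; the signal strength, series length, and AR coefficients are all made up for the example.

```python
# Toy illustration (mine, not the paper's) of two conclusion points:
# (1) regressing on weakly predictive proxies yields a backcast shrunk toward
#     the calibration-period mean -- a flat "handle";
# (2) strong autocorrelation leaves few truly independent observations
#     (standard AR(1) rule of thumb for effective sample size).
import numpy as np

rng = np.random.default_rng(1)
n = 1000
true_temp = 0.5 * np.sin(np.linspace(0, 6, n))     # hypothetical past climate
proxy = 0.2 * true_temp + rng.standard_normal(n)   # weak signal, heavy noise

cal = slice(n - 150, n)                            # calibrate on the last 150 "years"
slope, intercept = np.polyfit(proxy[cal], true_temp[cal], 1)
backcast = intercept + slope * proxy
print("std of true temperatures: ", round(float(true_temp.std()), 3))
print("std of regression backcast:", round(float(backcast.std()), 3))  # far smaller

# Effective sample size of an AR(1) series: n_eff = n * (1 - rho) / (1 + rho)
for rho in (0.5, 0.8, 0.9):
    print(f"rho = {rho}: 149 annual obs -> n_eff ~ {149 * (1 - rho) / (1 + rho):.0f}")
```

The backcast’s standard deviation comes out far smaller than the truth’s, which is exactly the flattening the authors describe; and at a lag-one correlation of 0.9, about 149 years of annual data behave like only eight independent observations.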
===============================================================
Commenters on WUWT report that Tamino and Romm are deleting comments that even mention this paper from their blog comment forums. Their refusal to even acknowledge it tells you it has squarely hit the target, and the fat lady has sung – loudly.
(h/t to WUWT reader “thechuckr”)

It shall be interesting indeed to see the response from both the statisticians and climate experts once this paper is actually peer reviewed…
stephen richards says:
August 17, 2010 at 8:07 am
Jaye
“that after an experiment shows that a theory or some aspect of a theory is nullified the experimenter must show an alternative theory”
Suggest you read Feynman. This is not STRICTLY true. It is not necessary to show an alternative theory, merely to show where the current theory breaks down.
Well, you didn’t read the entire phrase, which was:
or the nonexistent requirement that after an experiment shows that a theory or some aspect of a theory is nullified the experimenter must show an alternative theory.
duckster, you are very confused, or maybe you have 1000 monkeys typing on your keyboard.
@wobble
Here’s what can change the heat equation a lot at the periphery and below the Arctic ice cap: we pick up more or less warm water as the surface conveyor cruises through the warm side of ENSO, the PDO, and the AMO. The mother of all El Niños was a biggie, picking up the 100-year record warm water in the tropical Pacific and shuttling it right on up until it hit the far North Atlantic.
http://oceanmotion.org/html/background/ocean-conveyor-belt.htm
Don’t worry, duckster… The chance of Mann’s hockey stick being certified viable is nil. Even if they manage to somehow bury this paper, Mann is still toast. He was toast 20 years ago; the bigger crime is that he’s defended his nefarious “science” in the face of all superior and weighty criticism. The flood of criticism started with Climategate and continues to swell. It is that simple.
I’d like to see more in-depth statistical analysis of this sort on non-cherry-picked datasets so a better understanding of past climate can be constructed. As with datasets I’ve worked with in the past, skewed or tampered datasets can be identified and eliminated by testing their intrinsic characteristics.
Dave Springer says:
August 17, 2010 at 10:43 am
wobble says:
August 17, 2010 at 8:27 am
“Arctic ice melts every single year.”
It doesn’t stay melted through the next winter every single year. A million square kilometers has gone missing.
_________________________
Does it make any sense in this discussion of latent heat to treat the Arctic as if it were a separate sealed container? There is the rest of the world, isn’t there? How do the overall global sea ice budget and sea temperature (surface and deep) factor into this? What about the temperature of the air? Aren’t these things all interrelated? And further, even in the Arctic, isn’t your “missing” ice showing back up a bit more year by year? To bring this discussion back to the subject matter of this thread: isn’t there a lot we don’t know about the way heat, ice, temperature, wind, currents, clouds and the like behave? Just stating that some ice has gone missing is an admission that we don’t know much, especially when the ice may not be missing for long, or even at all if we knew where to look for it, or even how to accurately measure it.
People complaining about scrolling are probably nursing some strange ulterior motive(s).
(or they are irritably looking for the punched card that says ‘0.8 deg C’ – needed to complete their latest (peer-reviewed) submission to ‘The International Journal of Climate Change, Juicy Grants, and Exotic Conference Locations’)
I have a few comments. I am definitely a skeptic from RC’s point of view.
Don’t underestimate the Team; they will have a response, and they have a large mass of foaming-at-the-mouth followers who will accept the response as golden and spread it widely. Don’t be surprised.
When faced with a problem like this, where the authors have shown legitimate problems with methods and have reached reasonable conclusions, you don’t fight the methods or conclusions; you attack the very basis of their paper. When you are guilty, argue the law.
Something along these lines: these guys aren’t climate scientists; their methods of data reduction are OK from a purely neutral data point of view, and we have always agreed the data is difficult to interpret. However, our insider expert knowledge of the proxies allows us to “intelligently” reduce the data in a much more meaningful way than a blind statistician can. Unless someone can substantiate why our expert data reduction processes are invalid from a climate science perspective, our results stand.
This moves the argument back into their arena where they can argue from authority again. They won’t be giving up anytime soon. Their careers and legacy depend on it. It’s personal.
This is probably a very weak argument, but the response will likely be along these lines. They may praise the mathematics but will dismiss the results as an amateur-hour attempt to refute science of which the authors have little knowledge.
That has been the pattern with M&M.
This paper was in fact very readable, and the explanations made it easy to understand exactly what the authors were saying. They do make some very good arguments:
The estimated skill of the models is overestimated with the two-block approach because filling in the center 30-year block is an interpolation (i.e., the model knows both the starting point and the ending point of the estimate). When asked to predict or backcast, where only the starting point is known, the model performs much more poorly (see the sketch below).
The creation of a null model that has localized matches but no long-term trend was quite illuminating. It basically showed you aren’t going to get a very good backcast when only a short time period of training data is available.
In summary, the backcast using proxies is simply too unreliable to use for decision making. I think anyone who has been around this stuff for a while already knows this, though.
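To see the interpolation-versus-extrapolation asymmetry Tom describes, here is a simplified sketch; it is my own illustration, not the paper’s code, and the synthetic temperature trend and weak-signal proxies are assumptions. Even a static regression scores better on an interior holdout block, whose values sit inside the range of the training data, than on an end block at the trending extreme.

```python
# Sketch of the two holdout schemes: an interior 30-year block (values lie
# within the training range) versus an end block (true backcast/forecast
# conditions). Synthetic data; illustrative assumptions throughout.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n = 149
temp = np.linspace(-0.4, 0.6, n) + 0.1 * rng.standard_normal(n)
proxies = 0.3 * temp[:, None] + rng.standard_normal((n, 20))   # weak-signal proxies

def holdout_rmse(test_idx):
    """Fit on everything outside the block, score on the block."""
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    fit = Ridge(alpha=10.0).fit(proxies[train_idx], temp[train_idx])
    return np.sqrt(np.mean((fit.predict(proxies[test_idx]) - temp[test_idx]) ** 2))

interior = np.arange(60, 90)      # training data on both sides of the block
end = np.arange(n - 30, n)        # training data on one side only
print("interior-block RMSE:", round(float(holdout_rmse(interior)), 3))
print("end-block RMSE:     ", round(float(holdout_rmse(end)), 3))  # typically worse
```

Because the regression shrinks predictions toward the training mean, the end block (the warmest stretch) comes out biased low, just as the paper reports for the 1990s run-up.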
Henry@DaveSpringer
You said that CO2 is transparent to sunshine.
Well, is it or is it not? What do the spectra tell you, and at what wavelengths does the sun shine?
RockyRoad says:
August 17, 2010 at 10:57 am
Hi Rocky,
One of the points I took away from the paper was that the proxies in general are insufficient to predict anything on the basis of sample size: there just aren’t enough of them to extrapolate a meaningful result.
You seem to be saying that there are proxy data sets out there that are unaltered and in sufficient quantity to produce a meaningful, and hence predictive, result?
Just from reading above I was beginning to think that all past reconstructions were thrown into the mix with MBH. Of course, pre-1986 products, most likely not Mann-handled, would have more integrity/credibility.
@Henry
http://wattsupwiththat.com/2008/06/21/a-window-on-water-vapor-and-planetary-temperature-part-2/
Has a nice chart of different major absorption bands for GHGs.
CO2 in the visible spectrum (shorter than 1.2 µm, which is near-infrared) has no appreciable absorption band. I don’t doubt that with a sensitive enough instrument you can dig some characteristic scattering at visible frequencies out of the dirt, but the power just ain’t there compared to the infrared absorption bands.
Barry:
You mistakenly assert to me August 17, 2010 at 5:16 am:
“It would appear from the paper cited that there has been a long cooling trend from 1000 – 1900, followed by a sharp uptick during the 20th century. As you posit 900-year cycles as comprising two phases of 450 years, warming and cooling, the paper du jour doesn’t seem to support your contention.”
No, that is a misreading of the paper.
JER0ME stated the matter clearly at August 17, 2010 at 7:03 am, so I can do no better than to quote his post, which said:
“People are still looking at the graph and saying “Look, it says this…”
The graph represents nothing. All it demonstrates is the Hockey Stick is broken. Nothing is created to replace it – welcome to the ‘void’ of Hidden Global Warming hypothesis.”
However, there is an enormous amount of information from history and from archaeology (in addition to proxy studies) that indicates the existence of the ~900 year global temperature cycle. The importance that was placed on the MBH ‘hockey stick’ was that it seemed to deny all that evidence.
But the MBH ‘hockey stick’ is now in the refuse bin so all that evidence is again seen to be valid.
Richard
Eli Rabett:
“Well yeah, the science team always looks at things, and finds answers. It looks like the basic error on this one is that by calibrating against the hemispheric average, rather than smaller grid cells, they lose information and kill the signal to noise. Averaging out the local signal means that noise looks better than signal and, in their words, noise provides a better fit than the proxies. There are, however, some other useful ideas in the paper.”
Uhuh. The thing that really strikes me about the climate debate is that on the one hand you have a bunch of people pointing out what you would expect an average ten-year-old to notice. On the other hand you have people spinning complicated verbiage to try and convince you that the issue is so complex that only the experts can tell what the truth is.
Apparently “calibrating against the hemispheric average” is the major issue here. Gosh, it just sounds so sciencey. It must be right.
It couldn’t possibly be as simple as “we can’t find any genuine proxies that actually have the shape of graph that we are looking for.”
It has happened just as predicted numerous times above.
“While WattsUpWithThat thinks this paper is so important that he has been running a post on it at the top of his blog for days, he conveniently omits this rather remarkable statement from the authors:
Using our model, we calculate that there is a 36% posterior probability that 1998 was the warmest year over the past thousand. If we consider rolling decades, 1997-2006 is the warmest on record; our model gives an 80% chance that it was the warmest in the past thousand years.
Doh!”
http://climateprogress.org/2010/08/16/hockey-stick-paper-mcshane-and-wyner-statisticians/
REPLY: And Climateprogress ignores everything else in the paper, such as the predictive ability of fake data being better than that of the proxy data. 1998 was caused by an El Niño, not by “global warming”, as the next year, 1999, was near the zero anomaly line. Romm has no point. – Anthony
For those claiming the paper has not been peer reviewed: note that in one of the first links provided at the top of this thread, the paper is listed to appear in one of the next issues. That means it has already been peer reviewed.
The process works like this: submit paper to journal – journal sends the paper to anonymous referees for peer review – based on the referee reports, the journal editor rejects the paper, offers a revise-and-resubmit, or accepts it (the review period can take several months). The queue for most journals is long, so acceptance today may mean that the authors have to wait months or more before the paper is actually in print.
The authors probably finished this paper six months to a year ago and are only now learning it has been accepted. It’s also possible they received a revise-and-resubmit, which means they submitted the paper well over a year ago, but the referees wanted some revisions before publication. Revisions take time, and then the paper goes back to the referees or journal editor. Few papers are accepted without revision.
The next step is for the profession to extend or refute the paper by basing new research on the paper’s assumptions/methods/data set, etc.
Stephen Brown says:
August 17, 2010 at 11:56 am
A commenter at Briggs’ site raised a similar issue. Briggs’ response says it better than I ever could:
Briggs says:
17 August 2010 at 9:29 am
Bernie,
The money quote is:
Using our model, we calculate that there is a 36% posterior probability that 1998 was the warmest year over the past thousand. If we consider rolling decades, 1997-2006 is the warmest on record; our model gives an 80% chance that it was the warmest in the past thousand years. Finally, if we look at rolling thirty-year blocks, the posterior probability that the last thirty years (again, the warmest on record) were the warmest over the past thousand is 38%.
Recall our litany: All probability statements are conditional on certain premises or evidence. One premise here is the truth of the model. It’s unlikely that this model is perfect. If we allow some chance for other, better models, then the chance that, say, the last thirty years were the warmest would be less than 38%. And the chance that the rolling decade 1997-2006 is the warmest would be less than 80%.
Another piece of evidence is the purity of the data, assumed by their model to be measured without error. Again, not true, so we must damp down the probabilities even more.
A third piece of evidence, an assumption, is stationarity of the proxy-temperature relationship: note that this is not the same as assuming the stationarity of either series; we only require that the statistical relationship between them is stationary. There is good evidence that this is not so (see the above, and the main article). Once more, this being so, the chances are lowered yet again.
Since we don’t have a model for these premises, we cannot say explicitly how low the probabilities should drop; but a reasonable guess is by at least a quarter to a half. That’s based on my subjective assessment of the likelihood that (1) the model is perfect, (2) the data are measured without significant error, and (3) the relationship is stationary.
In other words, we just can’t be that sure what the pre-historical-record temperatures were—at least, not to the tune of fractions of degrees Celsius. Ice ages we can tell; as for the difference between last year and, say, 1640, about the best we can do is say, “It was about the same.”
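To make Briggs’ damping argument concrete, here is a worked version with my own made-up numbers (the 3% base-rate figure in particular is purely an assumption for illustration). By the law of total probability, the paper’s posterior gets averaged with whatever a different, better model would say.

```python
# Worked version of Briggs's point (my numbers, not his): the paper's posterior
# probabilities are conditional on the model being true. Averaging over model
# uncertainty with the law of total probability damps them.
def damped(p_given_model, p_model_true, p_given_other=0.03):
    # p_given_other: chance of "warmest on record" under some other, unspecified
    # model; 0.03 is an assumed base rate (a random 30-year block out of ~1000
    # years), chosen purely for illustration.
    return p_given_model * p_model_true + p_given_other * (1 - p_model_true)

print(damped(0.38, 0.5))   # the 38% thirty-year figure becomes ~0.21
print(damped(0.80, 0.5))   # the 80% decade figure becomes ~0.42
```

Even granting the model a coin-flip chance of being right, the headline probabilities roughly halve, which is the damping Briggs describes.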
Dear Stephen Brown
Go back to Climate Progress – wherein lives the comment-deleting Mr Romm. Go to his Usual Suspects post. First give him a bit of an earful for spoiling the name of a good movie.
Draw a straight line parallel to the X-axis at the ‘0’ point on the MW paper’s graph. Draw a straight line parallel to the X-axis at the ‘0’ point on the graph above it.
What do you see?
Romm thinks MW did not ‘break’ the hockey stick, because he draws an actual hockey stick overlying the MW graph. What a joke! Someone should tell him he can draw a hockey stick over Lamb’s graph with the huge medieval warmth as well.
I occasionally consult with statisticians about the analysis of my scientific data, and the collaboration can be quite helpful. However, based on my experience, statisticians analyzing scientific data without the help of a scientist would be more likely to make mistakes than scientists who analyze their own data without the assistance of a professional statistician. I get help from statisticians from departments of mathematics and statistics who have experience with scientific projects. I have not tried working with statisticians, such as M & W, who work primarily (I assume) with economic and business data rather than scientific data. Even though the M & W paper was submitted to a statistics journal, the editors would have been well advised to include some scientific input.
Tom Scharf: I believe that you have it exactly right.
This statement is untrue. There are no winter freeze data which indicate a million square kilometers less of Arctic ice since 2007. Additionally, there are no data which indicate that there is a million square kilometers less of summer Arctic ice since 2007.
So what? It’s irrational to believe that a materially equal amount of radiation would cause the same exact amount of Arctic summer ice melt year after year after year. Surely you accept that some natural variance should be expected. Can you quantify the amount of natural variance that a rational person should expect?
Duckster is not confused. He is Father Duckster of the great warming religion, a complete zealot and evangelist of the great warming faith. The Galileos of statistics have pronounced the self-evident truth, and here begins the inquisition, with no regard for the integrity of their domain, as it is not of the global warming faith.
I just started reading the paper and came across this:
All data and code used in this paper are available at the Annals of Applied
Statistics supplementary materials website:
http://www.imstat.org/aoas/supplements/default.htm
Well done!
REPLY: Thanks for that- added your note and link to main page – Anthony
My point is quite simple.
Massive amounts of latent heat are absorbed during every spring and summer Arctic ice melt. The amount of latent heat required to melt an additional million square kilometers of ice might only be one, two, or even three standard deviations from the average amount of heat which melts the Arctic ice every year.
We don’t have enough data to quantify the average amount of annual heat, and we certainly don’t have enough data to calculate standard deviations.
So while I’m not trying to claim that your hypothesis is wrong, I am absolutely claiming that it’s far from “clear.”
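As a rough sanity check on the magnitudes involved, here is a back-of-envelope calculation (my own numbers; the 2 m mean ice thickness in particular is an assumption, and the real figure varies a lot):

```python
# Back-of-envelope latent heat to melt an extra million square kilometers of
# sea ice. The mean thickness is the big unknown; 2 m is assumed here purely
# for illustration.
AREA_M2 = 1.0e6 * 1.0e6      # one million km^2, in m^2
THICKNESS_M = 2.0            # assumed mean ice thickness
RHO_ICE = 917.0              # density of ice, kg/m^3
L_FUSION = 3.34e5            # latent heat of fusion, J/kg

energy_joules = AREA_M2 * THICKNESS_M * RHO_ICE * L_FUSION
print(f"~{energy_joules:.1e} J")   # roughly 6e20 joules under these assumptions
```

The answer scales linearly with the assumed thickness, which underlines wobble’s point: without good thickness data, the year-to-year heat budget is hard to pin down.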
In another comment response at his site, Mr. Briggs addresses what has always been one of my biggest personal bugaboos in “climate science”: the pervasive practice of presenting graphs with the central tendency plotted in bold color while the error bars are represented by the palest of grays and pastels. The clear, though unstated, implication is that the central tendency best reflects reality. To quote Mr. Briggs:
“Their red line would be the best guess, if you had to commit yourself to just one number. Chance that that one number is right is near zero. The error envelope is always—always—a superior way to do business. It says that there’s a 95% chance that the observable temperature was somewhere in that window.”
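For anyone who wants to plot things the way Briggs suggests, here is a minimal matplotlib sketch (both series are synthetic placeholders, not reconstruction output) that makes the envelope the primary object and the central estimate secondary:

```python
# Minimal sketch of Briggs's preferred presentation: the 95% envelope drawn
# prominently, the central "best guess" drawn as a thin secondary line.
# Both series below are synthetic placeholders, not reconstruction output.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1000, 2001)
central = 0.2 * np.sin((years - 1000) / 150.0)       # stand-in central estimate
half_width = 0.3 + 0.2 * (2000 - years) / 1000.0     # wider further back in time

fig, ax = plt.subplots()
ax.fill_between(years, central - half_width, central + half_width,
                color="steelblue", alpha=0.6, label="95% envelope")
ax.plot(years, central, color="black", lw=0.8, ls="--", label="central estimate")
ax.set_xlabel("Year AD")
ax.set_ylabel("Temperature anomaly (°C)")
ax.legend()
plt.show()
```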
Some of the comments above suggest that the M & W paper will be published with discussion and critical comments. Perhaps that would be a chance for Michael Mann (maybe with a statistics Ph.D. as a co-author) to defend his analysis and to criticize the M & W paper.
I was the editor for such a critique, and the procedure was to have comments from both sides, where the authors who disagreed acted as one set of peer reviewers for each other’s comments. Presumably, M & W would have a chance to defend their work in view of the critique, just as Mann and his colleagues would have a chance to defend their work.
This would certainly cut down the extent of disagreement and make sure that neither side’s criticism was based on misunderstanding or misinterpretation. The downside of the procedure is that a couple of rounds of peer review will delay publication.
In any case, I would expect that M & W would want to be sure that their critique was not based in part on any misunderstanding of the Mann et al. papers.