NOTE: This has been running two weeks at the top of WUWT, discussion has slowed, so I’m placing it back in the regular queue. – Anthony
UPDATES:
Statistician William Briggs weighs in here
Eduardo Zorita weighs in here
Anonymous blogger “Deep Climate” weighs in with what he/she calls a “deeply flawed study” here
After a week of being “preoccupied” Real Climate finally breaks radio silence here. It appears to be a prelude to a dismissal with a “wave of the hand”
Supplementary Info now available: All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website:
http://www.imstat.org/aoas/supplements/default.htm
=========================================
Sticky Wicket – phrase, meaning: “A difficult situation”.
Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010), submitted to the Annals of Applied Statistics and slated for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface, because instead of attacking the proxy data quality issues, the authors assumed the proxy data were accurate for their purposes and then built a Bayesian backcasting method. Using the proxy data, they demonstrate that it fails to reproduce the sharp 20th century uptick.
Now, there’s a new look to the familiar “hockey stick”.
Before:

After:

Not only are the results stunning, but the paper is highly readable, written in a sensible style that most laymen can absorb, even if they don’t understand some of the finer points of Bayesian methods, loess filters, or principal components. Moreover, this paper is a confirmation of McIntyre and McKitrick’s work, with a strong nod to Wegman. I highly recommend reading this and distributing this story widely.
Here’s the submitted paper:
(PDF, 2.5 MB. Backup download available here: McShane and Wyner 2010 )
It states in its abstract:
We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.
Here are some excerpts from the paper (emphasis in paragraphs mine):
This one shows that M&M hit the mark, because it is independent validation:
In other words, our model performs better when using highly autocorrelated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data. While the Lasso-generated reconstructions using the proxies are highly statistically significant compared to simple null models, they do not achieve statistical significance against sophisticated null models.
We are not the first to observe this effect. It was shown in McIntyre and McKitrick (2005a,c) that random sequences with complex local dependence structures can predict temperatures. Their approach has been roundly dismissed in the climate science literature:
To generate “random” noise series, MM05c apply the full autoregressive structure of the real world proxy series. In this way, they in fact train their stochastic engine with significant (if not dominant) low frequency climate signal rather than purely non-climatic noise and its persistence. [Emphasis in original]
Ammann and Wahl (2007)
…
On the power of the proxy data to actually detect climate change:
This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past. It is even more discouraging when one recalls Figure 15: the model cannot capture the sharp run-up even in-sample. In sum, these results suggest that the ninety-three sequences that comprise the 1,000-year-old proxy record simply lack power to detect a sharp increase in temperature. See Footnote 12.
Footnote 12:
On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.
…

We plot the in-sample portion of this backcast (1850-1998 AD) in Figure 15. Not surprisingly, the model tracks CRU reasonably well because it is in-sample. However, despite the fact that the backcast is both in-sample and initialized with the high true temperatures from 1999 AD and 2000 AD, it still cannot capture either the high level of or the sharp run-up in temperatures of the 1990s. It is substantially biased low. That the model cannot capture the run-up even in-sample does not portend well for its ability to capture similar levels and run-ups if they exist out-of-sample.
…
Conclusion.
Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.
On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run-up in temperatures recorded in the 1990s, even in-sample.
As can be seen in Figure 15, our estimate of the run-up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high-gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand-year temperature curves sampled from the posterior distribution of our model.
Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated.
The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
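For readers who want a feel for what “effective sample size” means here, a standard rule of thumb for an AR(1) series is n_eff = n(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation. This sketch is mine, not a formula from the paper, and the ρ = 0.9 value is purely illustrative:

```python
def effective_sample_size(n, rho):
    """Approximate number of independent observations in an
    AR(1) series of length n with lag-1 autocorrelation rho."""
    return n * (1 - rho) / (1 + rho)

# 149 annual observations (1850-1998) with strong persistence:
# the "independent" information shrinks to a handful of points.
print(round(effective_sample_size(149, 0.9), 1))  # → 7.8
```

With persistence that strong, a 149-year calibration record behaves statistically like fewer than ten independent data points, which is the authors’ point about accurate reconstruction being out of reach.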
Climate scientists have greatly underestimated the uncertainty of proxy based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy based models with approximately the same amount of reconstructive skill (Figures 11,12, and 13), produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).
Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades, let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate. Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.
===============================================================
Commenters on WUWT report that Tamino and Romm are deleting comments that so much as mention this paper on their blogs. Their refusal even to acknowledge it tells you it has squarely hit the target, and the fat lady has sung – loudly.
(h/t to WUWT reader “thechuckr”)

Well, as Ernest Rutherford said:
“If your experiment depends on statistics, perhaps you should have designed a better experiment.”
BTW, the acronym “CAGW” deserves mention: as this paper illustrates, the “C” (for “Catastrophic”) is gone, because temperatures in the recent past (say, the past thousand years or so) have been as high as today’s with no associated catastrophe (indeed, those were far better times than the Little Ice Age).
The “A” in the term stands for “Anthropogenic”, but following the argument above, it can be asserted that past warmings weren’t of Anthropogenic origin, so that’s gone.
Finally, the “GW” (for “Global Warming”) is apparently exaggerated and is most likely of natural origin.
So with no “C”, no “A”, and little or no “GW”, they’re down to practically nothing. One could assert, then, that “CAGW” doesn’t exist.
If you google “Mann climate” you get one news story, from PrisonPlanet.com. Obviously a lot of ignoring is going on here; who are the deniers now?
Duckster you said:
“…MWP graphs that have commonly been used here – cited earlier – which clearly placed the MWP at 1200 – 1400ce…”
Are you citing the graphic called the “Battle of the graphs” for your date range of 1200-1400 for the MWP? The comparison shows Mann’s hockey stick, with its globally flat temperatures, alongside the hot and cold variation in Europe over a similar time span. Using the second graph, not Mann’s, I would describe the warm period from 950 to 1400, peaking at 1200, as the Medieval Warm Period. I would not say the Medieval Warm Period was from 1200 to 1400.
duckster: M&W are not scientists and their point is not scientific. They are statisticians and their point is statistical. They do not claim to present a new, “valid” reconstruction. Their point is that proxies will not support any reconstruction. In other words, the Hockey Stick is not so much broken as it is a castle in the air.
duckster says:
August 16, 2010 at 2:13 am
“………….Once again – the timing of the MWP, if you accept the M&W graph above, wildly conflicts with the timing of the MWP graphs that have commonly been used here – cited earlier – which clearly placed the MWP at 1200 – 1400ce……………”
‘GROAN’
duckster, are you intentionally disregarding some of the things the paper states? Or the efforts of many here to clarify the assertions of the paper? Are you intentionally misinforming people who come here to read about climate issues?
duckster, the paper itself states rather clearly that the authors don’t believe the proxy data are sensitive enough to detect “upticks” in warming. I’ve quoted several places in the paper where they state as much.
duckster, the lack of an apparent MWP in the graph was generated from proxy data in which the authors state they have no confidence. You are entirely misinterpreting what the graph means. If you disagree with my statements, please show me where in the paper I’m wrong. I’m beginning to think this is a willful attempt at disinformation dissemination.
duckster says:
“Once again – the timing of the MWP, if you accept the M&W graph above, wildly conflicts with the timing of the MWP graphs that have commonly been used here – cited earlier – which clearly placed the MWP at 1200-1400 CE. I could therefore use this paper to discredit many of the lines of argument that have been made here in support of a medieval warming period (it almost completely disappears into the margin of error, its temporal placement is wildly inconsistent, etc.). Not that I am doing this. My point is that this paper also undermines much of the work that has been ‘published’ here.”
duckster, you have raised this strawman about the timing of the MWP a couple of times on this thread. Have a look at http://pages.science-skeptical.de/MWP/Loehle-2007.html and, if you wish, go and read the full Loehle (2007) paper. It shows that 1000 CE is right in the middle of the MWP. By 1200 it was already waning, and by 1400 it was well over. Maybe you could link to the item on this site where you believe graphs have been presented showing the MWP started in 1200 CE.
Now that we have that strawman out of the way, maybe we can get back to the statistical methods that this paper is all about. If the maths behind Mann’s hockey stick is wrong, then there can be no confidence in the conclusions. In other words, the hockey stick may be correct, but more likely it isn’t.
evanmjones says:
August 15, 2010 at 2:02 pm
I am not concerned with the politics. But the Medieval Warm Period is in the literature, architecture, geology, and archaeology. Even the statistics, such as they were (e.g., “the emperor’s cherry trees”). Climate scientists have got to face up to the fact that they are, as one wit put it, like Truman Capote trying to marry Dolly Parton. The job is just too big for them.
Nice post, Evan, but this next comment really hit home, especially when I read that some, like Richy, were crying foul at sceptics “vilifying” Michael Mann:
“Climatology takes a village. A Full and Complete village”.
I know what you meant, but I suppose every village needs a village idiot, and Mann has sadly cast himself in that role.
The facts, the science, the numbers make that abundantly clear. He did the trick “his way”.
[REPLY – Every village needs an idiot. No village would be complete without one. (The problem only arises when he gets elected mayor.) ~ Evan]
Jabbed this on Tamino’s blog; will it survive?
“What perplexes me the most is why the majority of folk on this thread are so fixated on a catastrophic outcome and seem to welcome a doomsday scenario.
Instead of welcoming a study that might point to a less damaging outcome for the future of our children there is an instant cry for blood when a new paper emerges that might indicate otherwise.
Even before the paper has been analysed or assessed there are attempts to discredit the authors, why? What is the motivation here and why is it not possible to give this new analysis a fair hearing? Something to hide?
Why is the historical evidence of a Roman warming, a medieval warming and a Little Ice Age, all absent from the original Mann analysis, ignored? Were these folk all lying nutters? Were they anticipating this argument and telling porkies to discredit the deniers? Were the still-buried settlements in Greenland planted evidence?
Just saying.”
[REPLY – I think it may have something to do with the fact that if we survive, they don’t. ~ Evan]
I don’t think that Rutherford said ‘perhaps’….
“If your experiment needs statistics, you ought to have done a better experiment.”
Vorlath says:
August 16, 2010 at 6:14 am
“They talk about how one will select series that agree with known temperatures. This is something you should never do in statistics. It ends up skewing the results toward what you want for the data that you know, but leads to erroneous results everywhere else.”
Indeed!
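Vorlath’s point can be demonstrated with a toy simulation (my own sketch, not anything from the paper; all the series and thresholds here are invented): screen a pile of signal-free random walks by how well they match a rising “instrumental” target, and by construction the survivors track the target in-sample while telling you nothing about the earlier period.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 random walks -- series with no climate signal whatsoever.
n_series, n_years = 1000, 200
walks = rng.standard_normal((n_series, n_years)).cumsum(axis=1)

# A rising "instrumental" target covering only the last 50 years.
target = np.linspace(0.0, 1.0, 50)

# Screening: keep only the series that happen to correlate with the
# target over the calibration window -- the practice being criticized.
calib = walks[:, -50:]
corr = np.array([np.corrcoef(row, target)[0, 1] for row in calib])
selected = walks[corr > 0.5]

# The survivors match the known temperatures by construction, but
# their average carries no information about the earlier 150 years.
print(f"kept {len(selected)} of {n_series} series")
```

Because random walks wander, a sizable fraction of them will spuriously correlate with any short trend, so the screen always “succeeds” – which is exactly why selecting on agreement with known temperatures is circular.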
From the paper: “All three of these datasets have been substantially processed including smoothing and imputation of missing data (Mann et al., 2008). While these present interesting problems, they are not the focus of our inquiry. We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline. Without taking a position on these data quality issues, we thus take the dataset as given. We further make the assumptions of linearity and stationarity of the relationship between temperature and proxies, an assumption employed throughout the climate science literature (NRC, 2006) noting that ‘the stationarity of the relationship does not require stationarity of the series themselves’ (NRC, 2006). Even with these substantial assumptions, the paleoclimatological reconstructive endeavor is a very difficult one and we focus on the substantive modeling problems encountered in this setting.”
(Please note: for consistency, I’ve been quoting the article in italics. In the paper itself, the word assume is in italics; for clarity I put it in bold.)
One could read the paragraph in one of two ways. One could state this is a nice way of saying “We believe the processing and collecting of samples were done in a professional manner and we don’t take any exception to the data.” I read it a bit differently, though. It seems to me the authors knew many would have taken exception to the data collection and processing (including the imputation) and were forced to put an early disclaimer in the paper as opposed to some footnote. To me, it reads something akin to “Yes, we know they gathered and processed the data in errant fashion, but there’s only so much we can write without publishing a textbook on how not to apply statistics.” The sentence “We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline.” seems a particularly harsh slap at an entire profession. OUCH!!!
duckster says:
August 16, 2010 at 2:13 am
I don’t know what you are babbling on about. If there is a theory out there that says “if X then Y” and I produce an experiment that shows “if X then not Y”, that’s it: theory gone, poof. It’s still very much science, and it’s a valid result, to reject the hypothesis without any attempt to explain what the theory might actually be. You might well counter with “if X in the presence of Z, then Y” and start the whole process over again. However, it is definitely scientific to independently verify a result, and if the result is not verified, then it is well within accepted norms that the burden of correction is on the original purveyor of the hypothesis.
Mark in Oz: August 16, 2010 at 6:30 am
Well, as Ernest Rutherford said:
“If your experiment depends on statistics, perhaps you should have designed a better experiment.”
Or, “If your stats don’t fit, you’d better quit.”
Willem’s claim is simply untrue. Anyone who looks at the two graphs from the paper as Anthony has reproduced here can see the point clearly. There is an uptick in the 20th century, but temps are still not as high as they were during the Medieval Warm Period. The new graph is very like the pre-1998 reconstruction of temperature history… before Michael Mann photoshopped the temp record. This means, even with all of the CO2 in the atmosphere, the Earth’s energy budget is not imbalanced in any unprecedented way. The Earth has been this warm before and warmer…. the polar bears survived and the human race survived. We have time to figure out how, if at all, increasing atmospheric CO2 is changing our climate.
Nope, deleted.
This paper is beautifully and clearly written and a welcome addition to undoing past careless work in climate science.
There is a lot in McShane and Wyner 2010 and much of it is dense in statistics and specialized mathematical treatment; but the result is clear:
(from pg 42) “Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries…”
To come to such a conclusion is devastating to the so-called Hockey Stick, but McIntyre and McKitrick did so in 2005 by showing (pg 19) “…that random sequences (also referred to as ‘pseudo-proxies’) with complex local dependence structures can predict temperatures.”
What? A pseudo-proxy, a “random sequence”, predicts temperature about as well as the actual proxies used by Mann et al.? (pg 17) “Since these pseudo-proxies are generated independently of the temperature series, we know they cannot be truly predictive of it. Hence, the real proxies – if they contain linear signal on temperatures – should outperform our pseudo-proxies, at least with high probability.”
At this point, I am on the edge of my seat: do the real proxies outperform the pseudo-proxies? If they do not, then the Hockey Stick isn’t a much more important or valid contribution to climate science than, say, palm-reading or reading tea leaves, and it is time to recognize this and move the science on. It also tells us the science was not settled at the time of that oft-made proclamation.
So are skeptics the only ones using pseudo-proxies? Not at all.
(pg 16) “The use of pseudo-proxies is quite common in the climate science literature where pseudo-proxies are often built by adding an AR1 time series (‘red noise’) to natural proxies, local temperatures, or simulated temperatures generated from General Circulation Models (Mann and Rutherford, 2002; Wahl and Ammann, 2006). These pseudo-proxies determine whether a given reconstruction is ‘skillful’ (i.e. statistically significant). Skill is demonstrated with respect to a class of pseudo-proxies if the true proxies outperform the pseudo-proxies with high probability (probabilities are approximated by simulation). In our study, we use an even weaker benchmark than those in the climate science literature: our pseudo-proxies are random numbers known to be completely independent of the temperature series.”
They continue and describe, with helpful detail, precisely what pseudo-proxies they employ and provide a rationale for using them and then reiterate their aim:
(pg 17) “Since these pseudo-proxies are generated independently of the temperature series, we know they cannot be truly predictive of it. Hence the real proxies – if they contain linear signal on temperature – should outperform our pseudo-proxies, at least with high probability.”
Do the real proxies outperform the pseudo-proxies? No. They do not.
(pg 18) “Finally, the empirical AR1 process and Brownian Motion both substantially outperform the proxies. They have a lower average holdout RMSE and lower variability than that achieved by the proxies. This is extremely important since these three classes of time series are generally completely independent of the temperature data. They have no long term predictive ability, and they cannot be used to reconstruct historical temperatures. Yet, they significantly outperform the proxies at thirty-year holdout prediction!”
“In other words, our model performs better when using highly auto-correlated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data.”
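The mechanics of the benchmark can be sketched with synthetic data (this is purely my illustration – not the paper’s Lasso or Bayesian models, and every series and parameter here is invented): generate “red noise” that is independent of a persistent target series, calibrate a simple regression, and score it on a 30-year holdout.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, rho, rng):
    """Generate an AR(1) ('red noise') series of length n."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    return x

# A stand-in "temperature" record: a slow trend plus persistent noise.
n = 150
temp = 0.01 * np.arange(n) + 0.3 * ar1(n, 0.7, rng)

# A pseudo-proxy: red noise generated independently of temperature,
# so by construction it cannot be truly predictive of it.
proxy = ar1(n, 0.9, rng)

# Calibrate on the first 120 "years", then predict a 30-year holdout.
slope, intercept = np.polyfit(proxy[:120], temp[:120], 1)
pred = slope * proxy[120:] + intercept
rmse = float(np.sqrt(np.mean((pred - temp[120:]) ** 2)))
print(f"30-year holdout RMSE from pure noise: {rmse:.2f}")
```

Because both series are strongly autocorrelated, noise like this can post a respectable holdout RMSE surprisingly often; the paper’s finding is that the real proxies fail to beat such benchmarks.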
More rationale for their methods and procedures and of note if slightly out of page number order:
(pg 17) “This suggests that climate scientists are using a particularly weak null benchmark to test their models. That the null models may be too weak and the associated standard errors in papers such as Mann et al. (1998) are not wide enough has already been pointed out in the climate literature (von Storch et al., 2004).”
This is already getting to be a long post, but their paper is 45 pages long and I have to get back to work.
Paul Jackson says:
August 15, 2010 at 7:51 am
Paul, the encryption is for sending and receiving the emails. They are still stored in plain text (depending upon the email program of course) on the server – from whence the climategate information was obtained.
They would have to change their email server software (something they probably have no control over) to try to get encryption there.
Mark in Oz says:
August 16, 2010 at 6:30 am
I wonder what Rutherford would have to say about the politicizing of science in his home country?
Did you not get what I was saying earlier:
The climate (and temperature) in 1200 AD was the same as it is now. This is because in that time there were many animals. They produced a lot of methane. And that caused warming. Unfortunately, the humans killed the animals. And then it became ice cold (they still call it the “little ice age”). Lucky for us, we now have many humans producing carbon dioxide. Otherwise it would become (very) cold again!!
bushy says:
August 16, 2010 at 7:46 am
======================================
Everyone is noticing the same thing, bushy.
They are shouting “do it for the children”, but every time any good news comes out that maybe we are not all going to die, they all go ballistic.
Excellent and exciting post, but the headline is a bit of a mess: there are no wickets in hockey, let alone in the ice hockey of the hockey stick debate. You’re thinking of cricket, where the wicket may be sticky, but the bat is always straight!
Doug Proctor, August 15, 2010 at 10:22 am
Read it with wonder when I first saw it. Two points: one of the authors is at the University of Pennsylvania. Direct smack at Mann.
Not really. The University of Pennsylvania and Penn State are two different schools. Penn is Ivy League; Penn State is a state school.
Perhaps following this paper and the hard work of M&M we should update an old adage?
Statistics has done us, the people, a great service, so a re-write seems the least we can do.
My suggestion would be:
‘Lies, damn lies, and robust AGW science’
Any other suggestions?
Ron Cram says:
“Willem’s claim is simply untrue. Anyone who looks at the two graphs from the paper as Anthony has reproduced here can see the point clearly. There is an uptick in the 20th century, but temps are still not as high as they were during the Medieval Warm Period.”
-Yet the TREND is clearly upward, Ron. And the observed data correlate with the modeled data where they exist.
For Evan & Brian (in case you missed it)
http://wattsupwiththat.com/2010/08/14/breaking-new-paper-makes-a-hockey-sticky-wicket-of-mann-et-al-99/#comment-458382