New paper makes a hockey sticky wicket of Mann et al 98/99/08

NOTE: This has been running for two weeks at the top of WUWT and discussion has slowed, so I’m placing it back in the regular queue. – Anthony

UPDATES:

Statistician William Briggs weighs in here

Eduardo Zorita weighs in here

Anonymous blogger “Deep Climate” weighs in with what he/she calls a “deeply flawed study” here

After a week of being “preoccupied” Real Climate finally breaks radio silence here. It appears to be a prelude to a dismissal with a “wave of the hand”

Supplementary Info now available: All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website:

http://www.imstat.org/aoas/supplements/default.htm

=========================================

Sticky Wicket – phrase, meaning: “A difficult situation”.

Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010) that has been submitted to the Annals of Applied Statistics and is listed for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface because, instead of attacking the proxy data quality issues, the authors assumed the proxy data was accurate for their purpose, then created a Bayesian backcast method. Then, using the proxy data, they demonstrate that it fails to reproduce the sharp 20th century uptick.

Now, there’s a new look to the familiar “hockey stick”.

Before:

McShane-Wyner-Fig1 — Multiproxy reconstruction of Northern Hemisphere surface temperature variations over the past millennium (blue), along with 50-year average (black), a measure of the statistical uncertainty associated with the reconstruction (gray), and instrumental surface temperature data for the last 150 years (red), based on the work by Mann et al. (1999). This figure has sometimes been referred to as the hockey stick. Source: IPCC (2001).

After:

McShane-Wyner-Fig16 — FIG 16. Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD and backcasts 998-1849 AD. The cyan region indicates uncertainty due to εt, the green region indicates uncertainty due to β, and the gray region indicates total uncertainty.

Not only are the results stunning, but the paper is highly readable, written in a sensible style that most laymen can absorb, even if they don’t understand some of the finer points of Bayesian methods, loess filters, or principal components. Not only that, this paper is a confirmation of McIntyre and McKitrick’s work, with a strong nod to Wegman. I highly recommend reading this and distributing this story widely.

Here’s the submitted paper:

A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?

(PDF, 2.5 MB. Backup download available here: McShane and Wyner 2010 )

It states in its abstract:

We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.

Here are some excerpts from the paper (emphasis in paragraphs mine):

This one shows that M&M hit the mark, because it is independent validation:

In other words, our model performs better when using highly autocorrelated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data. While the Lasso generated reconstructions using the proxies are highly statistically significant compared to simple null models, they do not achieve statistical significance against sophisticated null models.

We are not the first to observe this effect. It was shown, in McIntyre and McKitrick (2005a,c), that random sequences with complex local dependence structures can predict temperatures. Their approach has been roundly dismissed in the climate science literature:

To generate “random” noise series, MM05c apply the full autoregressive structure of the real world proxy series. In this way, they in fact train their stochastic engine with significant (if not dominant) low frequency climate signal rather than purely non-climatic noise and its persistence. [Emphasis in original]

Ammann and Wahl (2007)
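For readers who want to see why autocorrelation matters in this argument, here is a minimal Python sketch. It is not the Lasso holdout test from the paper. It only illustrates the general point that two independent series with strong persistence tend to show much larger chance correlations than two independent white-noise series of the same length. The AR(1) form, the coefficient 0.95, the series length of 149 (chosen to match 1850-1998), and every function name below are assumptions made for illustration.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Generate n points of an AR(1) process: x[t] = phi * x[t-1] + noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def pearson(a, b):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean_abs_corr(phi, n=149, trials=300, seed=0):
    """Average |correlation| between pairs of INDEPENDENT AR(1) series."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(trials):
        a = ar1_series(n, phi, rng)
        b = ar1_series(n, phi, rng)
        total += abs(pearson(a, b))
    return total / trials


# phi = 0.0 is white noise; phi = 0.95 is an assumed, strongly persistent series.
white = mean_abs_corr(phi=0.0)
persistent = mean_abs_corr(phi=0.95)
print(f"mean |r|, white noise:       {white:.3f}")
print(f"mean |r|, AR(1), phi = 0.95: {persistent:.3f}")
```

The series in each pair share nothing by construction, so any correlation is accidental. That is why a null model built from persistent noise is a harder benchmark to beat than one built from white noise.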

…

On the power of the proxy data to actually detect climate change:

This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past. It is even more discouraging when one recalls Figure 15: the model cannot capture the sharp run-up even in-sample. In sum, these results suggest that the ninety-three sequences that comprise the 1,000 year old proxy record simply lack power to detect a sharp increase in temperature. See Footnote 12

Footnote 12:

On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.

…

McShane-Wyner-Fig15 — FIG 15. In-sample Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD.

We plot the in-sample portion of this backcast (1850-1998 AD) in Figure 15. Not surprisingly, the model tracks CRU reasonably well because it is in-sample. However, despite the fact that the backcast is both in-sample and initialized with the high true temperatures from 1999 AD and 2000 AD, it still cannot capture either the high level of or the sharp run-up in temperatures of the 1990s. It is substantially biased low. That the model cannot capture run-up even in-sample does not portend well for its ability to capture similar levels and run-ups if they exist out-of-sample.

…

Conclusion.

Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.

On the one hand, we conclude unequivocally that the evidence for a ”long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even in-sample.

As can be seen in Figure 15, our estimate of the run up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.

Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated.

The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.

Climate scientists have greatly underestimated the uncertainty of proxy based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy based models with approximately the same amount of reconstructive skill (Figures 11,12, and 13), produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).

Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate. Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.
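The “effective sample size” point in the conclusion above can be made concrete with a rough Python sketch. It uses the common large-sample approximation for a series with lag-1 autocorrelation ρ, n_eff ≈ n(1 − ρ)/(1 + ρ), which applies to estimating a mean under an AR(1) assumption. The paper itself says the data are not easily modeled by a simple autoregressive process, so treat this as a simplified illustration and not as the authors' calculation. The function names and the example values of ρ are assumptions.

```python
def lag1_autocorr(series):
    """Estimate the lag-1 autocorrelation of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(n, rho):
    """Approximate count of independent observations under an assumed AR(1) model."""
    return n * (1.0 - rho) / (1.0 + rho)


# 149 annual values corresponds to the 1850-1998 fitting period quoted above.
# The rho values are illustrative, not estimates from the actual temperature data.
for rho in (0.0, 0.5, 0.9):
    n_eff = effective_sample_size(149, rho)
    print(f"rho = {rho:.1f}: about {n_eff:.1f} effectively independent observations")
```

Under this approximation, the stronger the year-to-year persistence, the fewer independent data points a 149-year record really contains.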

===============================================================

Commenters on WUWT report that Tamino and Romm are deleting comments even mentioning this paper on their blog comment forum. Their refusal to even acknowledge it tells you it has squarely hit the target, and the fat lady has sung – loudly.

(h/t to WUWT reader “thechuckr”)


cohenite

August 21, 2010 5:42 am

Latimer; you are reading something into my post which I did not intend and which is not the point; the point is that the recent warming [assuming one can trust the data and really the satellites are the only sources which are reliable with GISS and NOAA beyond the pale] is not exceptional; the extent of the trend figures I quoted from WFT are superfluous because the broad trend is sufficient to make that conclusion in the periods I referred to.

Boudu

August 21, 2010 5:43 am

Almost 1000 comments !

Matt

August 21, 2010 6:32 am

Well
Zorita already called it a “deeply flawed” study – he was only too polite to say so. However, seeing how there is nothing substantial left standing of the paper after his review, it is fair to sum it up that way. – And now you have a second guy taking it to the cleaners. I think it is safe to take the news item off the pole position now, it seems it is not all that you had hoped for.

Pamela Gray

August 21, 2010 7:16 am

hmmm. Early in the comment section over at RC, one of the posters wrote, “In fact, one might wonder if they didn’t search for a method that wouldn’t beat some noise models (i.e. lasso) for the data at hand …” First of all, you must select your statistical method before you do the study. So if the authors here searched, they searched beforehand. Second, a “robust” conclusion is exactly one that demonstrates the same conclusion no matter WHAT method you use. These efforts to find a statistical method that would challenge conclusions should be welcomed by scientists, lest we lead the scientific community, let alone the world community, down a primrose path. That the hockey stick does not stand up against the statistical methods chosen by the authors more than likely means the hockey stick, in its present popular version, is not robust.

Josh

Editor

August 21, 2010 7:52 am

Phlogiston, I liked your comment so much I drew this…
http://www.cartoonsbyjosh.com/Blades4Uscr.jpg

John Whitman

August 21, 2010 8:29 am

Josh says:
August 21, 2010 at 7:52 am
Phlogiston, I liked your comment so much I drew this…
http://www.cartoonsbyjosh.com/Blades4Uscr.jpg

Josh,
Wonderful stuff.
Please try a cartoon of yourself . . . . or is the hockey salesman in your above cartoon a self-portrait? Ever put your self-portrait in one of your cartoons?
John

Vince Causey

August 21, 2010 9:04 am

Matt,
“Zorita already called it a “deeply flawed” study.”
In what way is the study deeply flawed? Zorita has attacked M&W’s discourses on proxy data gathering, but when you get to the bottom of the arguments, it is all smoke and mirrors.
Why does M&W’s ignorance of CO2 in ice cores impact the statistical findings of their paper? M&W are guilty of naïveté: by venturing to discuss the background to climate proxies, they have allowed their paper to fall victim to the argumentium strawmanium. So far, I have not seen – and Zorita does not provide – any critique of their statistical methods.

latitude

August 21, 2010 9:06 am

Matt, read what Pamela posted until you get it.
Three methods, three different results, is not robust.

James Sexton

August 21, 2010 9:26 am

Matt says:
August 21, 2010 at 6:32 am
“Well
Zorita already called it a “deeply flawed” study – he was only too polite to say so. However, seeing how there is nothing substantial left standing of the paper after his review, it is fair to sum it up that way. – And now you have a second guy taking it to the cleaners. I think it is safe to take the news item off the pole position now, it seems it is not all that you had hoped for.”
Matt, read the critiques again, but before you do, read the title of the paper: “A STATISTICAL ANALYSIS OF MULTIPLE TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF SURFACE TEMPERATURES OVER THE LAST 1000 YEARS RELIABLE?” In either DC or Zorita, it seems lost that this is the question the authors are trying to answer. In fact, while rambling, Zorita actually confirms that the proxies are not reliable. Zorita states, “The authors unfortunately do not go into a deeper analysis. Questions of proxy selection, underestimation of past variability (the failure of their method to reproduce the trend in the last 30 years could be perfectly due to this problem as well), the role of non-climate noise in the proxies, and finally the tendency of almost all methods to produce spurious hockey sticks, all of them are related to some degree. For instance, the presence of noise in the proxy records alone could, regardless of the statistical method used, lead to underestimation of past variations.” Later he states, “Well, this result may be interesting and probably correct, but I doubt it is useful, since I am not aware of any reconstruction using this statistical regression model.” This is the crux of the argument. In all of the critiques of the M&W paper I’ve seen, it is ‘we don’t like the method, so it’s probably wrong’ (while never really presenting a valid reason why the methods are incorrect). Or, as in Zorita’s case, he states it’s probably correct, but it doesn’t count because others haven’t done it before. But then he later states it isn’t novel: “This is the part I most agree with, but their conclusions are hardly revolutionary. Already the NRC assessment on millennial reconstructions and other later papers indicate that the uncertainties are much larger than those included in the hockey stick and that the underestimation of past variability is ubiquitous.” THIS IS THE ENTIRE STATED PURPOSE OF THE PAPER!!!
I guess Zorita is stating the authors were correct, but for the wrong reasons.
Zorita seems to take them to task about not addressing the validity of the proxies chosen even though the paper clearly states that isn’t the purpose of the paper. Zorita even takes some time to do a bit of self-promotion and talks of a paper he wrote regarding proxy selection. Nice. The authors, M&W, stated they would stipulate to the validity of the proxies, even though there are questions about that in itself, which is why they used Mann’s later study: it used the most comprehensive set of proxies.
At this point, I feel I must apologize to the many that I’ve taken to task for not reading the paper. Apparently, throughout the world, it is too much of a difficulty to even understand the stated purpose of the paper, apparent in the title. Obviously, my expectations were a bit high for even professionals, much less laymen.
While Zorita’s critique is a bit more than DC’s, it still spends an inordinate amount of time on the background that doesn’t really have a darn thing to do with the actual study. As I stated on DC’s blog, who cares what the impetus was for them to write the study? They could have stated aliens told them to write it; that in itself doesn’t invalidate the methodologies or the analysis used in the study.

James Sexton

August 21, 2010 10:01 am

Pamela Gray says:
August 21, 2010 at 7:16 am
Hammer hitting the nail on the head!!! For the life of me, I can’t understand why this is such a difficulty to understand. I thought we all learned this in grade school by looking at the different views of a cylinder and understanding we have to give it various dimensional views to conclude it is indeed a cylinder. Apparently, once again we see that temps and their proxies are so special that fundamental and rudimentary rules and concepts don’t apply to these magical numbers and only a selected few are trained well enough to be guardians of the truth.<—— Yet another tell tale sign of pseudo-science, bordering on cult like behavior.

Latimer Alder

August 21, 2010 10:15 am

@cohenite
‘Latimer; you are reading something into my post which I did not intend and which is not the point;’
Well, I suggest that next time you decide to comment on a thread that is all about uncertainty, you refrain from quoting trends to a completely unjustified 6 significant figures. And when asked to explain why, you could consider understanding the question rather than trashing the intellectual abilities of the questioner. RC tactics are not appropriate on WUWT.
And you should also remember that readers do not see what you meant…they see what you wrote. If these are different things, then the problem lies with you to correct, not with them to second guess your intention.

phlogiston

August 21, 2010 10:49 am

Josh says:
August 21, 2010 at 7:52 am
Brilliant cartoon, wow, I’m lost for words

Jobnls

August 21, 2010 11:07 am

Soon to be a thousand comments…… and some of the replies are apparently still defending the glorious hockey stick despite being up against an army of logic and common sense.
Does anybody really think that boring holes into old trees and then measuring the tree rings is a good way of knowing past mean temperatures 600 years ago with a certainty of +- 1.5 degrees??? What if birds ate half of the leaves at random intervals? =)
REPLY: I discuss the reliability of “treemometers” here:
http://wattsupwiththat.com/2009/09/28/a-look-at-treemometers-and-tree-ring-growth/
-Anthony

_Jim

August 21, 2010 11:07 am

cohenite August 19, 2010 at 11:38 pm
Russell teaches?!

Operative word here would be lectures (in the strictest sense).
.

Russell Seitz

August 21, 2010 11:23 am

What’s up with Mr. Watts’ claim of “original” coverage of my forthcoming work? One so deeply shocked over grey literature sources in IPCC reports must be in high dudgeon that a climate blog would stoop to a third hand account of an embargoed paper still under peer review. As guests should not beat their hosts, I must decline Mr. Watts’ kind offer to post on his behalf, as his original failure to contact me to confirm merits a journalistic thrashing.
His “original” coverage link was one hack’s take on a second journalist’s account of an interview with a _Science_ news reporter who hadn’t read the paper in question for the simple reason that I hadn’t finished writing it; he instead attended a conference where I gave a talk on the work in question.
The paper must speak for itself when it appears, but readers wishing to get a grip on its context might profitably read Paul Crutzen’s piece in _Climate Change_ dealing with the policy ramifications of the inverse problem: solar radiation management by aerosol scattering. If that’s too technical try Victor Davis’ article (and my reply) in _Foreign Affairs_. If even that seems too abstruse, there’s always the lucid _Economist_.
But if no matter how far down the chain of popularization and decaying signal-to-noise ratio you descend, you still can’t make head or tail of the science itself, just stay on this page. Mr. Watts is entitled to his lawful prey.
REPLY: Ah, well thanks for confirming that even when someone tries to engage you nicely, offering a guest post for you to explain further, you write condescendingly. Oh well, you had your chance to engage the public in a meaningful way; instead you chose the “let them eat cake” route. Remember, people that live in greenhouses shouldn’t throw stones.
As for “original” coverage, no such word was used; you inserted it. Thus you err in your interpretation.
I wrote “our first coverage of it”, meaning “the first time on WUWT”. Certainly there will be a follow up to that one when your paper is published. I do find it funny though that you can lecture on a paper “not yet finished” and then lambaste somebody for writing about it. The simplest solution to your dilemma is to not say anything if you don’t want anyone to take notice.
But given the press coverage so far, it appears that in fact you are seeking attention for it, so your protestations about press coverage are ridiculous. BTW can you confirm for me that all authors of the linked stories above contacted you first? I want to know who will be doing the journalistic thrashing of me.
And finally I’m curious, this story, is that you? Are you a jade hunter? – Anthony

kim

August 21, 2010 11:41 am

He always finds these
Hockey Sticks, through the ruckus.
Accident or not?
==========

kim

August 21, 2010 11:45 am

Sneers are, it appears,
property of the fearful.
Arch higher, and hiss.
============

Invariant

August 21, 2010 11:47 am

Let us assume that the global temperature rises one or two degrees. So what?
Let us further assume that the sea level rises a couple feet. So what? Not unusual in the Earth’s long history.
The climate establishment states that this will lead to doomsday – 30% of the world’s species will become extinct. The temperature increased several degrees after the last ice age; moreover, sea levels rose by 120 meters. And no animals became extinct. Some died indirectly because they migrated but not due to climate change!
I think we have to adapt to perfectly normal fluctuations in temperature and sea level due to climate change – we are unable to stop it!

kim

August 21, 2010 11:53 am

Look Prof, we’ve long known
It’s all about albedo.
Cloud’s quiet message.
============

Bill Tuttle

August 21, 2010 12:20 pm

SamG: August 14, 2010 at 5:56 pm
What’s a truck?
REPLY: A Lorry.
It’s also the round ball atop a flagpole.
*koff*
Only two more to go for a thousand comments…

john kenny

August 21, 2010 1:02 pm

I’ve heard that if it gets to 1000 comments then MBH have to release their code.
Only one to go and we’ll know.
Well done WUWT et al. for your wonderful site.

H.R.

August 21, 2010 1:11 pm

“Are we there, yet? Are we there yet? Are we there yet?”
A: “If I have to stop this blog and turn around…”

Mike Post

August 21, 2010 1:32 pm

@Invariant at 11.47 am. “The temperature increased several degrees after the last ice age; moreover, sea levels rose by 120 meters. And no animals became extinct.”
Surely animals are always becoming extinct? That is the nature of evolution.

Mike Post

August 21, 2010 1:36 pm

SamG: August 14, 2010 at 5:56 pm
What’s a truck?
“The Truck” was, maybe still is, a famous large vehicle parked in a car park near Tokyo’s Narita airport which contained a bar. In my day, predominantly Anglo-Saxon aircrew used to meet there for a few beers.

John Whitman

August 21, 2010 1:42 pm

Since this post is about bringing professional statisticians into the climate science methods/processes, I thought you might enjoy a post to RC that had the life expectancy of a tsetse fly. It didn’t make it nearly that long, like zero life.
NOTE: Honestly, I am not trying (perhaps just a little bit) to be the 1000th commenter : )

Gavin,
It is educational to me to see a clearly original uniquely alternative approach to the scientific process here at RC compared to what I find in the history of science, in the history of philosophy of science and in current general process of science.
Thanks for the experience, sincerely.
John

John