New paper makes a hockey sticky wicket of Mann et al 98/99/08

NOTE: This has been running for two weeks at the top of WUWT and discussion has slowed, so I’m placing it back in the regular queue. – Anthony

UPDATES:

Statistician William Briggs weighs in here

Eduardo Zorita weighs in here

Anonymous blogger “Deep Climate” weighs in with what he/she calls a “deeply flawed study” here

After a week of being “preoccupied” Real Climate finally breaks radio silence here. It appears to be a prelude to a dismissal with a “wave of the hand”

Supplementary Info now available: All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website:

http://www.imstat.org/aoas/supplements/default.htm

=========================================

Sticky Wicket – phrase, meaning: “A difficult situation”.

Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010) that has been submitted to the Annals of Applied Statistics and is listed for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface because, instead of attacking the proxy data quality issues, the authors assumed the proxy data was accurate for their purpose and then created a Bayesian backcast method. Then, using the proxy data, they demonstrate that it fails to reproduce the sharp 20th century uptick.

Now, there’s a new look to the familiar “hockey stick”.

Before:

Multiproxy reconstruction of Northern Hemisphere surface temperature variations over the past millennium (blue), along with 50-year average (black), a measure of the statistical uncertainty associated with the reconstruction (gray), and instrumental surface temperature data for the last 150 years (red), based on the work by Mann et al. (1999). This figure has sometimes been referred to as the hockey stick. Source: IPCC (2001).

After:

FIG 16. Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD and backcasts 998-1849 AD. The cyan region indicates uncertainty due to εt, the green region indicates uncertainty due to β, and the gray region indicates total uncertainty.

Not only are the results stunning, but the paper is highly readable, written in a sensible style that most laymen can absorb, even if they don’t understand some of the finer points of Bayesian and loess filters, or principal components. Not only that, this paper is a confirmation of McIntyre and McKitrick’s work, with a strong nod to Wegman. I highly recommend reading this and distributing this story widely.

Here’s the submitted paper:

A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?

(PDF, 2.5 MB. Backup download available here: McShane and Wyner 2010)

It states in its abstract:

We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.

Here are some excerpts from the paper (emphasis in paragraphs mine):

This one shows that M&M hit the mark, because it is independent validation:

In other words, our model performs better when using highly autocorrelated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data. While the Lasso generated reconstructions using the proxies are highly statistically significant compared to simple null models, they do not achieve statistical significance against sophisticated null models.

We are not the first to observe this effect. It was shown, in McIntyre and McKitrick (2005a,c), that random sequences with complex local dependence structures can predict temperatures. Their approach has been roundly dismissed in the climate science literature:

To generate “random” noise series, MM05c apply the full autoregressive structure of the real world proxy series. In this way, they in fact train their stochastic engine with significant (if not dominant) low frequency climate signal rather than purely non-climatic noise and its persistence. [Emphasis in original]

Ammann and Wahl (2007)
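For readers who want to see why persistent noise is such a tough null model, here is a minimal sketch. It is not the paper's Lasso procedure and uses none of its data: it assumes a simple AR(1) process as a stand-in for "highly autocorrelated noise", and the function names, the persistence value of 0.9, the trial count, and the 149-year length (matching 1850-1998) are all choices made for illustration. It compares how strongly two completely unrelated series appear to correlate when both are white noise versus when both are persistent.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Return n values of an AR(1) process: x[t] = phi * x[t-1] + noise."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def mean_abs_correlation(phi, n_years=149, n_trials=300, seed=0):
    """Average |correlation| between pairs of independent series."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        target = ar1_series(n_years, phi, rng)  # stand-in for temperature
        fake = ar1_series(n_years, phi, rng)    # stand-in for a "proxy"
        total += abs(correlation(target, fake))
    return total / n_trials


# phi = 0.0 is white noise; phi = 0.9 is an assumed, strongly persistent case.
white = mean_abs_correlation(phi=0.0)
persistent = mean_abs_correlation(phi=0.9)
print(f"white noise pairs:     {white:.3f}")
print(f"autocorrelated pairs:  {persistent:.3f}")
```

The series in each pair are generated independently, so any correlation is spurious. The persistent pairs should show a noticeably larger average spurious correlation, which is the basic reason a proxy has to beat autocorrelated noise, and not just white noise, before its skill means much.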

…

On the power of the proxy data to actually detect climate change:

This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past. It is even more discouraging when one recalls Figure 15: the model cannot capture the sharp run-up even in-sample. In sum, these results suggest that the ninety-three sequences that comprise the 1,000 year old proxy record simply lack power to detect a sharp increase in temperature. See Footnote 12.

Footnote 12:

On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.
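To make the idea of a "contiguous holdout block" concrete, here is a rough sketch of how such splits can be laid out over the 1850-1998 fitting period. The 30-year block length, the function names, and the made-up linear "temperature" series are assumptions for illustration only. Nothing here uses CRU data or the paper's models; the only predictor is a trivial "mean of the training years" null.

```python
def contiguous_holdout_blocks(years, block_length):
    """Yield (train_years, holdout_years) for every contiguous block."""
    for start in range(len(years) - block_length + 1):
        holdout = years[start:start + block_length]
        train = years[:start] + years[start + block_length:]
        yield train, holdout


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return (sum(squared) / len(squared)) ** 0.5


years = list(range(1850, 1999))  # 1850-1998 inclusive
splits = list(contiguous_holdout_blocks(years, block_length=30))  # 30 is assumed

# Hypothetical toy series: steady warming of 0.005 degrees per year.
toy_temp = {year: 0.005 * (year - 1850) for year in years}


def mean_null_rmse(train, holdout):
    """Predict every holdout year with the mean of the training years."""
    train_mean = sum(toy_temp[y] for y in train) / len(train)
    observed = [toy_temp[y] for y in holdout]
    return rmse([train_mean] * len(holdout), observed)


errors = [mean_null_rmse(train, holdout) for train, holdout in splits]

print(len(splits))                           # number of holdout blocks
print(splits[0][1][0], splits[0][1][-1])     # first block: 1850 1879
print(splits[-1][1][0], splits[-1][1][-1])   # last block: 1969 1998
print(round(errors[len(errors) // 2], 3), round(errors[-1], 3))
```

Even in this toy setup the holdout error is much larger for the final block than for a block in the middle of the record. A block at the end of a rising series has to be extrapolated, while an interior block can be interpolated from both sides.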

…

FIG 15. In-sample Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD.

We plot the in-sample portion of this backcast (1850-1998 AD) in Figure 15. Not surprisingly, the model tracks CRU reasonably well because it is in-sample. However, despite the fact that the backcast is both in-sample and initialized with the high true temperatures from 1999 AD and 2000 AD, it still cannot capture either the high level of or the sharp run-up in temperatures of the 1990s. It is substantially biased low. That the model cannot capture run-up even in-sample does not portend well for its ability to capture similar levels and run-ups if they exist out-of-sample.

…

Conclusion.

Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.

On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even in-sample.

As can be seen in Figure 15, our estimate of the run up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.
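A statement like "recent decades are warm compared to many of the sampled curves" boils down to counting. The sketch below shows one plausible way to do that count, assuming each posterior draw is simply a list of annual values ending in the most recent year. The function names, the 10-year window, and the four tiny hand-made curves are hypothetical. This is not the authors' code, and the toy result says nothing about real temperatures.

```python
def window_means(curve, window=10):
    """Means of consecutive, non-overlapping windows aligned to the end.

    Leftover years at the old end of the curve are dropped.
    """
    n_blocks = len(curve) // window
    start = len(curve) - n_blocks * window
    return [sum(curve[i:i + window]) / window
            for i in range(start, len(curve), window)]


def share_with_warmest_final_window(curves, window=10):
    """Fraction of curves whose final window beats every earlier window.

    Each curve must span at least two full windows.
    """
    hits = 0
    for curve in curves:
        means = window_means(curve, window)
        if means[-1] > max(means[:-1]):
            hits += 1
    return hits / len(curves)


# Four made-up 30-year "draws", purely to exercise the counting logic.
toy_curves = [
    [0.0] * 20 + [1.0] * 10,               # final decade warmest
    [1.0] * 10 + [0.0] * 20,               # first decade warmest
    [0.0] * 10 + [0.5] * 10 + [0.4] * 10,  # middle decade warmest
    [0.1] * 10 + [0.2] * 10 + [0.3] * 10,  # final decade warmest
]

share = share_with_warmest_final_window(toy_curves)
print(share)  # 0.5
```

With thousands of real posterior draws in place of the four toy curves, the same count gives a probability-style answer to "was the last decade the warmest?", and it automatically reflects however wide the uncertainty bands are.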

Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated.

The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
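The "effective sample size" point can be put in numbers with a common textbook approximation. The sketch below assumes a plain AR(1) process, which the authors explicitly say does not model this data well, so treat it only as a rough guide to the order of magnitude. The function names and the sample persistence values are illustrative assumptions.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def effective_sample_size(n, rho):
    """AR(1) approximation: n_eff = n * (1 - rho) / (1 + rho)."""
    return n * (1.0 - rho) / (1.0 + rho)


n_years = 149  # 1850-1998 inclusive
for rho in (0.0, 0.5, 0.9):  # assumed persistence values, not estimates
    print(rho, round(effective_sample_size(n_years, rho), 1))
# prints:
# 0.0 149.0
# 0.5 49.7
# 0.9 7.8
```

Under this approximation, a 149-year record with lag-1 autocorrelation of 0.9 carries roughly as much independent information as eight uncorrelated observations, which is a very thin basis for fitting a model with more predictors than observations.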

Climate scientists have greatly underestimated the uncertainty of proxy based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy based models with approximately the same amount of reconstructive skill (Figures 11,12, and 13), produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).

Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate. Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.

===============================================================

Commenters on WUWT report that Tamino and Romm are deleting comments even mentioning this paper on their blog comment forum. Their refusal to even acknowledge it tells you it has squarely hit the target, and the fat lady has sung – loudly.

(h/t to WUWT reader “thechuckr”)


1.2K Comments

SamG

August 14, 2010 5:56 pm

What’s a truck?
REPLY: A Lorry. But don’t make me figure out what you are referring to. Just say it. – Anthony

Enneagram

August 14, 2010 5:56 pm

Mannipulated statistics?

dbstealey

August 14, 2010 6:07 pm

Natural climate variability is not well understood and is probably quite large.
We have been saying this here for the past few years. It’s good having the planet’s large natural variability statistically confirmed in a peer reviewed paper. The larger the natural variability, the less wiggle room for the putative effects of a rise in a tiny trace gas.

Methow Ken

August 14, 2010 6:12 pm

Downloaded the paper; saved local.
Definitely a devastating “curtain call” for Mann et al.
Three cheers for the Fat Lady. . . .

co2insanity

August 14, 2010 6:17 pm

I bet you can’t read the DEL on a couple of computer keyboard keys about now.

Nick Stokes

August 14, 2010 6:20 pm

“As can be seen in Figure 15, our estimate of the run up in temperature in the 1990s has a much smaller slope than the actual temperature series.”
This does not sound like a recommendation.
“The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. “
But they give a backcast anyway?
REPLY: Oh puhlezze, but Mann writes a paper anyway? Ammann and Wahl go through all their gyrations to avoid McIntyre to write a supporting paper? Yeah, sure. Nick, you are deluding yourself. Proxies are not temperature data, and trees are not accurate thermometers.
You failed to make any headway over at CA with your line of reasoning, I don’t think you’ll get any traction here either. – Anthony

trbixler

August 14, 2010 6:33 pm

Anthony as always thanks for the update, one hopes that truth will finally be heard. With our current MSM and government I worry that it will be kept here and in the obscurity of statistical academia.

J.Hansford

August 14, 2010 6:36 pm

So…. Th’ science isn’t settled…… Who woulda thunk it!
The hard bit though, is getting the mainstream media to tell people about it….. They’re more interested in headlines like, “CO2 stole my Baby”, and other fanciful notions of greenhouse gases, than in reporting factual accounts of good science and statistics.
…. But maybe there’s a change in the wind.

Jason

August 14, 2010 6:41 pm

The title of this post should refer to Mann ’08 because that is where they drew their data from. The reference to Mann ’99 is just a passing reference used to place their work in historical context.
REPLY: yes but really it refers to all of them, as it has been an ongoing paper chase. – Anthony

Aldi

August 14, 2010 6:43 pm

“Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even in-sample.”
Hide the decline? The recorded data has been *massaged*, most climate scientists are riding the gravy train (engaged in fraud).

Jason

August 14, 2010 6:48 pm

Nick said:
“But they give a backcast anyway?”
They give a backcast which shows that the temperature a thousand years ago could have been much warmer or much cooler than the present day. This is perfectly consistent with their deep reservations about the predictive ability of the proxy data.
It’s worth noting that their Bayesian reconstruction calculates an 80% probability that the most recent decade is the warmest in the past 1000 years. That is not exactly a complete repudiation of the hockey stick. Then again, they didn’t even try to address the data quality issues in Mann ’08. Their reconstruction includes the tree rings and Tiljander.
I would be interested to see what happens when that data is removed.

Matt Hardy

August 14, 2010 6:55 pm

“Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.”
OUCH!

Wind Rider

August 14, 2010 7:03 pm

Best line of the excerpts, it bears repeating.
Climate scientists have greatly underestimated the uncertainty of proxy based reconstructions and hence have been overconfident in their models.
i.e. they’ve done it wrong, and then oversold it.
Bravo.

Ed Caryl

August 14, 2010 7:11 pm

Tell Ken Cuccinelli.

John Blake

August 14, 2010 7:13 pm

Guest post to WUWT on April 26, 2010 by Girma Orozngo, B.Tech, MASc, PhD, provides an equation bearing on the latter-day period from 1880 – 2010 projected to AD 2100, showing “excellent agreement” with GMTAs’ [Global Mean Temperature Anomalies] observed vs. modeled turning points, to wit:
GMTA = .0059 x (Year – 1880) – .52 + 2pi x Cos((Year – 1880)/60)
Prof. Orozngo’s chart (termed Figure 3) realistically depicts late-19th Century temperatures rebounding from Earth’s Little Ice Age (LIA) through AD 2100, exhibiting cyclical highs/lows above and below a long-term linear regression-line. As real-world evidence refuting Mann et al. continues to accumulate, it would be useful to track Prof. Orozngo’s extrapolation in light of a looming Dalton if not Maunder Minimum presaging an overdue reversion to Pleistocene Ice Time.

Mike Roddy

August 14, 2010 7:13 pm

Here’s the definitive article on questions about the Mann Hockey Stick:
http://www.realclimate.org/index.php/archives/2009/09/hey-ya-mal/
The authors of the 20-odd studies that confirmed Mann’s data are not really interested in what professional statisticians and mathematicians are saying about it. The people who understand and develop the data are the reliable sources, including actual climate scientists who produce their own outlier charts of the upward march of temperatures (are there any?)
Besides… Species are migrating north. Glaciers and Arctic ice are melting at unheard of rates. The ocean is becoming more acidic, and has experienced a 40% decline in fish biomass since 1950 due to CO2’s effect on phytoplankton.
Similarly, climate scientists are getting bored with arguments from untrained individuals that the “trace gas” CO2 does not play the major role in the recent and rapid temperature increases. This role was proven in a laboratory in the 19th century by Arrhenius, and has not been seriously disputed since.
Best wishes.

Anthony Watts

Author

Reply to Mike Roddy

August 14, 2010 7:18 pm

Nice try at misdirection Mike Roddy- FAIL

dbstealey

August 14, 2010 7:16 pm

Yo, Nick Stokes,
That whole issue can be settled in very short order by Mann and his clique opening the books on their data and methodologies.
Only pseudo-scientific charlatans would refuse to disclose tree ring data and methods…
…right?

Evan Jones

Editor

August 14, 2010 7:18 pm

We hold these truths to be self evident that all data shall be weighted equally and endowed by their compiler with verifiable links, among them, raw data, algorithms, and methodologies . . .

dbstealey

August 14, 2010 7:20 pm

Mike Roddy,
At first I honestly thought you were doing a silly parody of the RealClimate charlatans.
Then I realized you were serious.
Condolences. You’ve been immersed in the realclimate echo chamber way too long.

Evan Jones

Editor

August 14, 2010 7:23 pm

The people who understand and develop the data are the reliable sources, including actual climate scientists who produce their own outlier charts of the upward march of temperatures (are there any?)
Oh, the usual collection of liars, damnliars, and outliers.
This role was proven in a laboratory in the 19th century by Arrhenius, and has not been seriously disputed since.
Well, not until Arrhenius, 1906, anyway . . .
But, seriously, Mike. Stick around. Impart knowledge. Learn. In the genuine liberal tradition.
At any rate, most of us here believe the planet has somewhat warmed, CO2 is a GHG and has increased temperatures. The crux of the argument is all about rates and feedbacks, and, heh-heh, “adjustments”.
And so long as you keep it civil, your posts will not be deleted, which is more than you can say for realclimate. It’s a contentious issue, but you’ll find WUWT’s little (that is to say “huge”) readership to be more openminded than most.

Michael

August 14, 2010 7:26 pm

OT
This WUWT blog should create its own hurricane prediction poll on the side. I bet we could predict hurricane activity much more accurately than NOAA currently does.
This blog’s Hurricane prediction forecast poll may be the one in the future that financial institutions rely on to make actuarial plans, set premiums, and is used to make preparedness plans.
I predicted zero hurricanes last year, this year and was 100% accurate.
I’m not saying I am 100% accurate, but with the contribution of the WUWT community, I bet we will increase prediction accuracy by 1000%
This also goes for predicting the severity of the coming winters so that the states can more accurately prepare for the amount of money they will need to spend on salt and snow removal.
It’s obvious our experts are failing us.

MichaelO

August 14, 2010 7:32 pm

Journalists will not attempt to understand, let alone explain, these findings. There should be someone (perhaps Mr Watts himself) who can issue concise, accurate summaries of this and other papers cited on this site in a form that will be understood by the general populace and perhaps even by journalists. It has to be in a form that will allow an eye-catching headline and a television news story. Accuracy would be of the utmost importance, so that news outlets can trust the summaries. There is, of course, no guarantee that the news media will take advantage of such a service, but we can hope and pray.

Sonicfrog

August 14, 2010 7:32 pm

The authors of the 20-odd studies that confirmed Mann’s data are not really interested in what professional statisticians and mathematicians are saying about it.
Yet they rely on stats and math to deduce the state of climate….. do you realize just what you’re saying?

SimonH

August 14, 2010 7:37 pm

I read this paper earlier this evening. It’s spectacularly devastating to the Mann hockey stick series of papers, not least because it’s very much up-to-the-minute, and it coincidentally amounts to being a resounding affirmation of M&M’s work. And more besides, in fact. It’s also wonderfully easy to read (which makes a nice change) and I therefore commend it to the house.
Everyone should read it, because it is effectively the last chapter in the field of paleo reconstruction and the final nail in the coffin of Mann’s hockey stick.
Mike Roddy says:

The authors of the 20-odd studies that confirmed Mann’s data are not really interested in what professional statisticians and mathematicians are saying about it. The people who understand and develop the data are the reliable sources, including actual climate scientists who produce their own outlier charts of the upward march of temperatures

. . . . . . . BWHAHAHAHAHA!!!

OK S.

August 14, 2010 7:38 pm

Now that took time to read. I found footnote 12 on page 39 particularly telling:

On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.

Also, it’s nice to see other statisticians are stepping up to take a look.
OK S.