New paper makes a hockey sticky wicket of Mann et al 98/99/08

NOTE: This has been running two weeks at the top of WUWT, discussion has slowed, so I’m placing it back in the regular queue.  – Anthony

UPDATES:

Statistician William Briggs weighs in here

Eduardo Zorita weighs in here

Anonymous blogger “Deep Climate” weighs in with what he/she calls a “deeply flawed study” here

After a week of being “preoccupied”, Real Climate finally breaks radio silence here. It appears to be a prelude to a dismissal with a “wave of the hand”

Supplementary Info now available: All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website:

http://www.imstat.org/aoas/supplements/default.htm

=========================================

Sticky Wicket – phrase, meaning: “A difficult situation”.

Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010), submitted to the Annals of Applied Statistics and slated for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. Its approach seems watertight because, instead of attacking the proxy data quality issues, the authors assumed the proxy data was accurate for their purpose, then created a Bayesian backcast method. Then, using that proxy data, they demonstrate the method fails to reproduce the sharp 20th century uptick.
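For readers curious about the mechanics, here is a minimal sketch of a Bayesian backcast on synthetic stand-in data. This is not the authors’ actual model (theirs uses proxy principal components and lagged temperature terms, among other refinements), and every variable below is a hypothetical stand-in. It simply shows the general idea: fit a regression of temperature on proxies over the instrumental period, then push pre-instrumental proxy values through draws from the posterior to get a backcast with uncertainty bands.

```python
# A minimal sketch of a Bayesian "backcast" -- far simpler than the
# McShane & Wyner model. All data here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 149 "instrumental" years, 93 proxies (the size of the
# paper's 1,000-year network), 852 pre-instrumental years to backcast.
n_train, n_back, p = 149, 852, 93
X_train = rng.standard_normal((n_train, p))   # proxies, instrumental era
beta_true = rng.standard_normal(p) * 0.05     # deliberately weak signal
y_train = X_train @ beta_true + 0.2 * rng.standard_normal(n_train)
X_back = rng.standard_normal((n_back, p))     # proxies, pre-instrumental era

# Conjugate Bayesian linear regression: prior beta ~ N(0, tau^2 I),
# known noise scale sigma, so the posterior on beta is Gaussian.
sigma, tau = 0.2, 0.1
cov_post = np.linalg.inv(X_train.T @ X_train / sigma**2 + np.eye(p) / tau**2)
mu_post = cov_post @ X_train.T @ y_train / sigma**2

# Backcast: sample coefficient vectors from the posterior, push the
# pre-instrumental proxies through each draw, and add residual noise.
draws = rng.multivariate_normal(mu_post, cov_post, size=1000)   # (1000, p)
backcasts = X_back @ draws.T + sigma * rng.standard_normal((n_back, 1000))

# The spread across draws gives the uncertainty band (cf. Fig 16 below).
lo, hi = np.percentile(backcasts, [2.5, 97.5], axis=1)
print("mean width of the 95% backcast band:", (hi - lo).mean())
```

With a signal this weak and this many predictors, the band comes out wide, which is the paper’s larger point: honest uncertainty bands are wide enough to swallow most claimed features of the reconstruction.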

Now, there’s a new look to the familiar “hockey stick”.

Before:

Multiproxy reconstruction of Northern Hemisphere surface temperature variations over the past millennium (blue), along with 50-year average (black), a measure of the statistical uncertainty associated with the reconstruction (gray), and instrumental surface temperature data for the last 150 years (red), based on the work by Mann et al. (1999). This figure has sometimes been referred to as the hockey stick. Source: IPCC (2001).

After:

FIG 16. Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD and backcasts 998-1849 AD. The cyan region indicates uncertainty due to ε, the green region indicates uncertainty due to β, and the gray region indicates total uncertainty.

Not only are the results stunning, but the paper is highly readable, written in a sensible style that most laymen can absorb, even if they don’t understand some of the finer points of Bayesian methods, loess filters, or principal components. What’s more, this paper is a confirmation of McIntyre and McKitrick’s work, with a strong nod to Wegman. I highly recommend reading it and distributing this story widely.

Here’s the submitted paper:

A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?

(PDF, 2.5 MB. Backup download available here: McShane and Wyner 2010)

It states in its abstract:

We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.

Here are some excerpts from the paper (emphasis in paragraphs mine):

This one shows that M&M hit the mark, because it provides independent validation:

In other words, our model performs better when using highly autocorrelated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data. While the Lasso generated reconstructions using the proxies are highly statistically significant compared to simple null models, they do not achieve statistical significance against sophisticated null models.

We are not the first to observe this effect. It was shown, in McIntyre and McKitrick (2005a,c), that random sequences with complex local dependence structures can predict temperatures. Their approach has been roundly dismissed in the climate science literature:

To generate “random” noise series, MM05c apply the full autoregressive structure of the real world proxy series. In this way, they in fact train their stochastic engine with significant (if not dominant) low frequency climate signal rather than purely non-climatic noise and its persistence. [Emphasis in original]

Ammann and Wahl (2007)
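For the curious, here is a rough sketch of what such a “sophisticated null model” test looks like in practice. Everything below is synthetic and hypothetical (the paper uses the real Mann et al. (2008) proxy network, CRU temperatures, and a more careful battery of holdout blocks), but the mechanics are the same: fit the Lasso both to the proxies and to autocorrelated noise generated independently of temperature, then compare performance on a contiguous holdout block.

```python
# A rough, hypothetical sketch of the paper's null-model comparison:
# does the Lasso predict a held-out temperature block any better from
# "proxies" than from autocorrelated noise that is independent of
# temperature? All data here are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p, phi = 149, 93, 0.9   # years, number of series, AR(1) persistence

def ar1(n, phi, rng):
    """One AR(1) series: x[t] = phi * x[t-1] + white noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

temp = ar1(n, phi, rng)  # stand-in for the instrumental temperature record
# "Proxies": weakly related to temperature, plus persistent noise.
proxies = 0.1 * temp[:, None] + np.column_stack(
    [ar1(n, phi, rng) for _ in range(p)])
# "Fake proxies": same persistence, generated independently of temperature.
fakes = np.column_stack([ar1(n, phi, rng) for _ in range(p)])

# Contiguous 30-year holdout block at the end, as in the paper's design.
train, hold = np.arange(n - 30), np.arange(n - 30, n)

def holdout_rmse(X):
    model = LassoCV(cv=5).fit(X[train], temp[train])
    resid = temp[hold] - model.predict(X[hold])
    return float(np.sqrt(np.mean(resid**2)))

print("proxy RMSE on holdout:", holdout_rmse(proxies))
print("noise RMSE on holdout:", holdout_rmse(fakes))
# If the two RMSEs are comparable, the proxies are not demonstrably more
# informative about temperature than random persistent series.
```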

On the power of the proxy data to actually detect climate change:

This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past. It is even more discouraging when one recalls Figure 15: the model cannot capture the sharp run-up even in-sample. In sum, these results suggest that the ninety-three sequences that comprise the 1,000 year old proxy record simply lack power to detect a sharp increase in temperature. See Footnote 12.

Footnote 12:

On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.

FIG 15. In-sample Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD.

We plot the in-sample portion of this backcast (1850-1998 AD) in Figure 15. Not surprisingly, the model tracks CRU reasonably well because it is in-sample. However, despite the fact that the backcast is both in-sample and initialized with the high true temperatures from 1999 AD and 2000 AD, it still cannot capture either the high level of or the sharp run-up in temperatures of the 1990s. It is substantially biased low. That the model cannot capture the run-up even in-sample does not portend well for its ability to capture similar levels and run-ups if they exist out-of-sample.

Conclusion.

Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.

On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run-up in temperatures recorded in the 1990s, even in-sample.

As can be seen in Figure 15, our estimate of the run-up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.

Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated.

The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
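The effective-sample-size point can be made concrete with the standard AR(1) rule of thumb, n_eff ≈ n(1 − ρ)/(1 + ρ). To be clear, this is a back-of-envelope sketch, not the paper’s own calculation; the authors note the data are not well modeled by a simple autoregressive process, so the true effective sample size could be smaller still.

```python
# Back-of-envelope effective sample size under the standard AR(1)
# approximation n_eff = n * (1 - rho) / (1 + rho). A sketch of the
# general point only, not the paper's own calculation.
def effective_n(n, rho):
    return n * (1 - rho) / (1 + rho)

n = 149  # instrumental years, 1850-1998
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"lag-1 autocorrelation {rho:.2f}: "
          f"~{effective_n(n, rho):.0f} independent observations")
# At rho = 0.9, the 149 annual observations carry roughly the
# information of 8 independent data points, far too few to constrain
# 93 proxy coefficients with any precision.
```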

Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy-based models with approximately the same amount of reconstructive skill (Figures 11, 12, and 13) produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).

Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades, let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate. Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.

===============================================================

Commenters on WUWT report that Tamino and Romm are deleting comments that even mention this paper on their blog comment forums. Their refusal to even acknowledge it tells you it has squarely hit the target, and the fat lady has sung – loudly.

(h/t to WUWT reader “thechuckr”)

latitude
August 15, 2010 7:11 am

Brad says:
August 15, 2010 at 6:27 am
So what happens when you use the real data? I guess the whole thing was made up?
=========================================================
Brad, there’s no real data, they used Mann’s data.
This is not a reconstruction of temperature data, this is a reconstruction of Mann’s data.
It’s not meant to prove or disprove or anything to do with the MWP.
It’s only looking at Mann’s reconstruction of his own data.
Mann ran his data and came up with a flat line with an up-tick on the end, the hockey stick.
They ran his own data, and came up with warmer temperatures at the beginning than the end, no hockey stick.
If this paper proves to be true, then it can only mean one of two things:
1 Mann lied and cheated
2 Mann doesn’t know what he’s doing and is inept

Bill Tuttle
August 15, 2010 7:11 am

TerryS: August 15, 2010 at 5:02 am
You say: “Figure 16 illustrates the remarkable feature that, at the onset of the industrial revolution, the increase in the Earth’s temp was so great it created a reversal in its slope. Fascinating.”
Gee, the same thing happened in 1350 and 1690, so there must have been something other than industrialization to cause it — say, some natural variation. Fascinating, huh?

Richard M
August 15, 2010 7:13 am

If I understand this paper correctly, it demonstrates that “Mike’s Nature trick” really was a trick in the usual meaning of the word. LOL.

August 15, 2010 7:20 am

I congratulate McShane and Wyner not only for the substance and readability of their paper.
More importantly, given that they had to know that publishing it would invite frenzied vehemence from the “team” of entrenched climate scientists, I congratulate them on having the courage to stand up and speak critically of the so called “consensus”.
I think it only takes one man with integrity and independence, speaking out without fear, to stop any falsity in climate science.
John

Stephan
August 15, 2010 7:22 am

Great idea!
“NOTE: this will be the top post at WUWT for a couple of days, see below for new stories – Anthony”

Mikael Pihlström
August 15, 2010 7:24 am

Joe Horner says:
August 15, 2010 at 5:56 am
So, your call: Either:
(a) the proxies are unreliable predictors because they fail to track the current temp rise. In which case they are also worthless for back-casting. In which case there is absolutely no evidence to claim current warming is “unprecedented”, or,
(b) the proxies are reasonable predictors. In which case they may be ok to support a claim of unprecedented warming. But in that case, the instrumental record is showing warming that isn’t really there, because the (reliable) proxies would show it if it was. In which case, the instrumental record is (as has been widely discussed) contaminated beyond usefulness.
Your call, (a), (b) or both of the above?
———
It is (a), for the moment.

Pamela Gray
August 15, 2010 7:26 am

Oh, this is an old trick. Timing your publication is every bit a main concern, especially for those who have an “in” with the journal editor. Trust me on the magic 3 things: 1. who gets published, 2. in what journal, and 3. when. These are the three main considerations of many research efforts. The research itself can go to hell in a handbasket and still get published, as long as the unwritten 3 main considerations are given top priority. The next round of IPCC authors and their studies are already being planned around the magic 3 things. Who cares if the conclusions are nothing but piles of poo and statistically infantile?

August 15, 2010 7:28 am

TerryS says: August 15, 2010 at 5:29 am
Re: joshua corning says: The real fun will be watching the next IPCC panel doing back flips to keep this out of their next report.
It will be easy for them. They will simply arrange for one of their pet journals to publish a paper, refuting this one, just before the cutoff date for IPCC submissions. The paper won’t have to be accurate or have sound statistics, it simply has to be published too late for any responses to it to make it into the next IPCC report.

OTOH, they might decide it cannot be fought any longer. They might say
“ah, now at last we have a real peer-reviewed statisticians’ paper. Why didn’t McIntyre get published and peer-reviewed, then we could have taken him seriously. In fact, he’s not really helped anyone by refusing to publish all this time. If we’d known our stats were shaky, of course we’d have got expert help…” etc etc
Now of course, we should remember Wegman, Gerry North, and all the rest. But people have short memories, and anyway, the press at the time of the congressional inquiries made it sound like Mann’s hokey stick had been vindicated by North.

Mike Roddy
August 15, 2010 7:44 am

A reader questioned my comment that the oceans have 40% less fish biomass. This is actually only a logical assumption, since it’s impossible to measure fish biomass, due to their dispersion. The study in question measures phytoplankton, which form the basis of the oceanic food chain. I should have noted that in my comment. Here is the study:
http://www.cleveland.com/world/index.ssf/2010/07/oceans_phytoplankton_drops_40.html
Climate scientists have plenty of training in statistical methodology. Those who claim superior abilities, such as McIntyre and Wegman, have not been successful in producing charts in peer-reviewed publications that show anything other than the many versions of the hockey stick that have appeared in scientific publications. Their attempted corrections tend to be heavy on jargon, and in some cases dispute the randomness of tree ring selection when they have little knowledge of the raw sampling.
“The hockey stick is broken” is a great rallying cry, but it has zero substance in the world of qualified scientists who actually produce the charts in question. Some climate scientists have actually investigated the broken hockey stick claim in detail. Here’s what they found: nothing. If, on the other hand, one chooses to believe that IPCC and NASA scientists are part of a grant-seeking, world-government-installing cabal, then it is difficult to dispute your argument. It’s considerably more difficult to believe it.

E. Robichaud
August 15, 2010 7:44 am

Thank you, thank you. I realized two years ago that climate science was a statistical exercise: garbage in, garbage out. However, when I tried to explain this in the comments section of my local MSM newspapers, I was immediately shot down with the usual comments of “You are not a scientist” and “scientists say…” and “X scientists have issued reports proving that AGW is real”. This would be followed up by multi-page arguments between an anti-AGW scientist and a pro-AGW scientist bringing out ice measurements, currents, air temperatures, etc. They are all in awe of scientists, whereas the poor old statisticians and mathematicians are ignored.

August 15, 2010 7:46 am

I don’t understand! Where’s the hockey stick? (sarc)

August 15, 2010 7:51 am

Michael Jankowski says:
I wonder how many emails went back-and-forth between team members today?
I wonder if they were smart enough to encrypt them this time around.

Anders L.
August 15, 2010 7:54 am

To me, it still looks very much like a hockey stick. The only real difference is that the handle now has a downward slope.

August 15, 2010 8:02 am

Moderators,
If this post goes ballistic, as it has started . . . . . you guys better cancel some dates and stock up on Red Bull and popcorn.
Good moderating to ya . . .
John

Pamela Gray
August 15, 2010 8:04 am

But… but… but… weren’t the proxies properly homogenized, parametrized, adjusted, back filled, and quality enhanced in Mann’s version? Surely these statisticians were able to use Mann’s original proprietary data code file, yes? Makes me wonder if they even bothered to ask him. I’m sure he would have said yes, right?

August 15, 2010 8:11 am

To a layman such as I, the arguments presented so far by the Mann believers who have rushed to defend their idol are quite droll in their utter lack of understanding of the paper, and of any attempt to point out that it was solely the Mann-made data that was used.

TerryS
August 15, 2010 8:15 am

Bill Tuttle says:
August 15, 2010 at 7:11 am
TerryS: August 15, 2010 at 5:02 am
You say: “Figure 16 illustrates the remarkable feature that, at the onset of the industrial revolution, the increase in the Earth’s temp was so great it created a reversal in its slope. Fascinating.”
Please get your quotes right. I did not say that; I was quoting eudoxus.

Feet2theFire
August 15, 2010 8:22 am

OK. After having looked at this a good deal, I come away with these observations:
1. They use Mann’s data, which means CRU-adjusted data
2. The data uses temps from 80%+ poorly sited stations, distorting the post-1990 record
3. Their frustrations about forecasting and backcasting have to do with the post-1990 record, which is distorted by poorly sited stations and unknown adjustments, plus the loss of nearly 90% of met stations in the post-1990 period.
4. They came up with a hockey stick that shows 1000 AD as high as now, yet no one is farming Greenland now, so no matter how much this undercuts Mann/CRU, it is still inadequate (which they seem to be saying, in fact)
5. The new hockey stick is missing the LIA (as I read it); it shows temps in 1900 as low as the LIA. The bottom of the curve (annual and rolling both) is after 1800, which we all know is not true.
6. Amazingly, the 2000 un-rolling curve is pretty much exactly equal to 1000 AD
7. One of their main points is the width of the uncertainty bands, which I have yelled and screamed about for a long time. A better graphing would show the 95% certainty bands, IMHO.
8. They do conclude that the predictive capability of the Mann dataset is just too low to be usable. For the IPCC, this is definitely not a good result.
9. I posit that corrected and more inclusive data for post-1990 would remedy much of their difficulties, which are tied to the post-1990 steep rise; i.e., I still suggest the steep rise does not exist in the real world, only in the post-adjusted CRU numbers.

TomRude
August 15, 2010 8:22 am

Latest news:
Not only has the surface temperature record been shown inaccurate by Ross McKitrick’s latest papers and others, but the statistical collage of the proxies with this temperature record has now been shown to be a dubious use of statistical tools that does not stand up to proper analysis.
CO2 might be increasing, but the temperature curve that is supposed to reflect this CO2 increase is now exposed as baseless. We are finally back to the basics of meteorology, and it is time for many here to read Leroux’s Dynamic Analysis of Weather and Climate (Springer, 2010, 2nd ed.), as all the weather events we are witnessing were predicted and explained there.

Policyguy
August 15, 2010 8:32 am

In my opinion, Tamino and Romm are enforcers, not thought leaders. I’d be interested in Revkin’s take on this. It strikes me that this is the kind of paper that he will read and then call his buddies Mann and Jones for a chat about what they think it means.
While he’s at it, perhaps he will call his other buddy Hansen for a chat about his program to manage the NASA GISS data set so well. Your recent post http://wattsupwiththat.com/2010/08/11/more-gunsmoke-this-time-in-nepal/ would provide a quick reference point for him.
Fat chance.

John Blake
August 15, 2010 8:35 am

McShane and Wyner will never again share faculty lounge crudités with Briffa, Hansen, Jones, Mann, Trenberth et al. Meantime, we note that in respecting climate hysterics’ self-evidently selective and skewed dendrochronological time-series, this seminal paper grants the collusive Green Gang unwonted legitimacy even in non-statistical contexts. Factoring in such cultists’ absurdly manipulated base data eked out from c. AD 1000 reduces any and all Warmist hypotheses to smoking ruin.

JDN
August 15, 2010 8:55 am

I’d like to second this:
Brad says:
August 15, 2010 at 6:27 am
So what happens when you use the real data?

Jimbo
August 15, 2010 8:57 am

“Commenters on WUWT report that Tamino and Romm are deleting comments even mentioning this paper on their blog comment forum.”

Just found a comment about the paper on Tamino’s site but no response yet.
http://tamino.wordpress.com/2010/08/13/changes/#comment-43742
I have posted one at Romm’s site about the paper.

Chuck L
August 15, 2010 8:58 am

It seems that Tamino has let a few comments be posted about the paper. Already, one of his sycophants has called McShane and Wyner “well-known denialists.” (shaking my head sadly)

David
August 15, 2010 9:01 am

McIntyre and McKitrick were published and peer-reviewed; this is additional confirmation. At some point the ref will throw in the towel, but the team will continue until then.
