New paper makes a hockey sticky wicket of Mann et al 98/99/08

NOTE: This has been running two weeks at the top of WUWT, discussion has slowed, so I’m placing it back in the regular queue.  – Anthony

UPDATES:

Statistician William Briggs weighs in here

Eduardo Zorita weighs in here

Anonymous blogger “Deep Climate” weighs in with what he/she calls a “deeply flawed study” here

After a week of being “preoccupied”, Real Climate finally breaks radio silence here. It appears to be a prelude to a dismissal with a “wave of the hand”

Supplementary Info now available: All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website:

http://www.imstat.org/aoas/supplements/default.htm

=========================================

Sticky Wicket – phrase, meaning: “A difficult situation”.

Oh, my. There is a new and important study on temperature proxy reconstructions (McShane and Wyner 2010), submitted to the Annals of Applied Statistics and slated for publication in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to the proxy reconstructions of Mann. It seems watertight on the surface: instead of attacking the proxy data quality issues, the authors assumed the proxy data was accurate for their purpose, then created a Bayesian backcast method. Then, using the proxy data, they demonstrate that it fails to reproduce the sharp 20th century uptick.

Now, there’s a new look to the familiar “hockey stick”.

Before:

Multiproxy reconstruction of Northern Hemisphere surface temperature variations over the past millennium (blue), along with 50-year average (black), a measure of the statistical uncertainty associated with the reconstruction (gray), and instrumental surface temperature data for the last 150 years (red), based on the work by Mann et al. (1999). This figure has sometimes been referred to as the hockey stick. Source: IPCC (2001).

After:

FIG 16. Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD and backcasts 998-1849 AD. The cyan region indicates uncertainty due to t, the green region indicates uncertainty due to β, and the gray region indicates total uncertainty.

Not only are the results stunning, but the paper is highly readable, written in a sensible style that most laymen can absorb, even if they don’t understand some of the finer points of Bayesian methods, loess filters, or principal components. Not only that, this paper is a confirmation of McIntyre and McKitrick’s work, with a strong nod to Wegman. I highly recommend reading it and distributing this story widely.

Here’s the submitted paper:

A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?

(PDF, 2.5 MB. Backup download available here: McShane and Wyner 2010)

It states in its abstract:

We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.

Here are some excerpts from the paper (emphasis in paragraphs mine):

This one shows that M&M hit the mark, because it is independent validation:

In other words, our model performs better when using highly autocorrelated noise rather than proxies to “predict” temperature. The real proxies are less predictive than our “fake” data. While the Lasso generated reconstructions using the proxies are highly statistically significant compared to simple null models, they do not achieve statistical significance against sophisticated null models.

We are not the first to observe this effect. It was shown, in McIntyre and McKitrick (2005a,c), that random sequences with complex local dependence structures can predict temperatures. Their approach has been roundly dismissed in the climate science literature:

To generate ”random” noise series, MM05c apply the full autoregressive structure of the real world proxy series. In this way, they in fact train their stochastic engine with significant (if not dominant) low frequency climate signal rather than purely non-climatic noise and its persistence. [Emphasis in original]

Ammann and Wahl (2007)
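As an aside, the effect M&W describe is easy to reproduce for yourself. Here is a toy numpy sketch (my own illustration, not the paper’s actual Lasso-and-holdout procedure; the 149-year length, the AR coefficient of 0.9, and the ten pseudo-proxies are arbitrary assumptions for the demo): independent autocorrelated noise series, which have never seen the temperature record, still produce a respectable in-sample fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, rho, rng):
    """AR(1) series: x[t] = rho * x[t-1] + white noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

n = 149                    # length of the 1850-1998 calibration period
temps = ar1(n, 0.9, rng)   # stand-in for a smooth temperature record

# Regress the "temperature" on 10 fake proxies: independent AR(1) noise
# series that were generated with no connection to the temperature series.
k = 10
fakes = np.column_stack([ar1(n, 0.9, rng) for _ in range(k)])
X = np.column_stack([np.ones(n), fakes])
beta, *_ = np.linalg.lstsq(X, temps, rcond=None)
r2 = 1.0 - np.var(temps - X @ beta) / np.var(temps)
print(f"in-sample R^2 from pure-noise 'proxies': {r2:.2f}")
```

The point is not the exact R² but that apparent “skill” comes free with enough autocorrelated regressors, which is why M&W test the real proxies against these sophisticated null models rather than against white noise.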

On the power of the proxy data to actually detect climate change:

This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past. It is even more discouraging when one recalls Figure 15: the model cannot capture the sharp run-up even in-sample. In sum, these results suggest that the ninety-three sequences that comprise the 1,000 year old proxy record simply lack power to detect a sharp increase in temperature. See Footnote 12.

Footnote 12:

On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.

FIG 15. In-sample Backcast from Bayesian Model of Section 5. CRU Northern Hemisphere annual mean land temperature is given by the thin black line and a smoothed version is given by the thick black line. The forecast is given by the thin red line and a smoothed version is given by the thick red line. The model is fit on 1850-1998 AD.

We plot the in-sample portion of this backcast (1850-1998 AD) in Figure 15. Not surprisingly, the model tracks CRU reasonably well because it is in-sample. However, despite the fact that the backcast is both in-sample and initialized with the high true temperatures from 1999 AD and 2000 AD, it still cannot capture either the high level of or the sharp run-up in temperatures of the 1990s. It is substantially biased low. That the model cannot capture the run-up even in-sample does not portend well for its ability to capture similar levels and run-ups if they exist out-of-sample.

Conclusion.

Research on multi-proxy temperature reconstructions of the earth’s temperature is now entering its second decade. While the literature is large, there has been very little collaboration with university-level, professional statisticians (Wegman et al., 2006; Wegman, 2006). Our paper is an effort to apply some modern statistical methods to these problems. While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement.

On the one hand, we conclude unequivocally that the evidence for a ”long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data. The fundamental problem is that there is a limited amount of proxy data which dates back to 1000 AD; what is available is weakly predictive of global annual temperature. Our backcasting methods, which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even in-sample.

As can be seen in Figure 15, our estimate of the run up in temperature in the 1990s has a much smaller slope than the actual temperature series. Furthermore, the lower frame of Figure 18 clearly reveals that the proxy model is not at all able to track the high gradient segment. Consequently, the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.

Our main contribution is our efforts to seriously grapple with the uncertainty involved in paleoclimatological reconstructions. Regression of high dimensional time series is always a complex problem with many traps. In our case, the particular challenges include (i) a short sequence of training data, (ii) more predictors than observations, (iii) a very weak signal, and (iv) response and predictor variables which are both strongly autocorrelated.
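Challenge (ii), more predictors than observations, deserves emphasis: in that regime an unregularized fit can look perfect on pure noise, which is why shrinkage methods like the Lasso come into play at all. A minimal numpy sketch (mine, not from the paper; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 100                    # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)            # response is pure noise, unrelated to X

# With p > n (and X of full row rank), the minimum-norm least-squares fit
# interpolates the training data exactly, so in-sample "skill" is perfect
# despite there being zero real signal.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1.0 - np.var(y - X @ beta) / np.var(y)
print(f"in-sample R^2 with p > n: {r2:.3f}")   # ~1.000
```

With 93 proxies, a training window of roughly 149 annual observations, and model selection on top, in-sample fit statistics alone tell you almost nothing, which is exactly why M&W lean on holdout blocks.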

The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.
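The standard AR(1) rule of thumb behind that effective-sample-size point is n_eff = n(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation. A short numpy sketch (my illustration; the 149-year length and ρ = 0.9 are assumptions for the demo, not M&W’s estimates):

```python
import numpy as np

def effective_sample_size(x):
    """n_eff = n * (1 - rho) / (1 + rho), where rho is the lag-1
    autocorrelation -- the standard AR(1) approximation."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    rho = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)
    return len(x) * (1.0 - rho) / (1.0 + rho)

# A strongly autocorrelated 149-point series (one point per year of the
# 1850-1998 instrumental period) behaves like far fewer independent draws.
rng = np.random.default_rng(1)
x = np.zeros(149)
for t in range(1, 149):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print(f"nominal n = {len(x)}, effective n ~ {effective_sample_size(x):.0f}")
```

With ρ near 0.9, a 149-year record carries only a handful of effectively independent observations, which is the sense in which the effective sample size “may be just too small for accurate reconstruction.”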

Climate scientists have greatly underestimated the uncertainty of proxy based reconstructions and hence have been overconfident in their models. We have shown that time dependence in the temperature series is sufficiently strong to permit complex sequences of random numbers to forecast out-of-sample reasonably well fairly frequently (see, for example, Figure 9). Furthermore, even proxy based models with approximately the same amount of reconstructive skill (Figures 11,12, and 13), produce strikingly dissimilar historical backcasts: some of these look like hockey sticks but most do not (Figure 14).

Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades, let alone over many centuries. Nonetheless, paleoclimatological reconstructions constitute only one source of evidence in the AGW debate. Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural proxies. Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.

===============================================================

Commenters on WUWT report that Tamino and Romm are deleting comments that even mention this paper in their blog comment forums. Their refusal to even acknowledge it tells you it has squarely hit the target, and the fat lady has sung – loudly.

(h/t to WUWT reader “thechuckr”)

Geoff Sherrington
August 15, 2010 6:19 pm

Nick Stokes says:
August 14, 2010 at 6:20 pm re run-up period that ends in 1998. Now, 1998 was an anomalous year, if the recording was accurate. It therefore is an unfortunate choice for the final data point through its potential ability to exaggerate detail in some forms of treatments.
You have made some good points in the past, to be fair, and a few horrible ones, to be unfair, so here’s a chance for another good score. Was the anomalously hot global year 1998 caused by an additional anomalous heat input into the system, by a less-than-usual subtraction, or by a redistribution of heat already in the “closed” global system?
Nobody I have asked can offer an answer as to why 1998 was so much hotter, apart from being at the extremity of statistical fluctuation. But, it does have some strange character of its own. Any thoughts? Would McShane & Wyner be better if they chose a different final point?

Chris D.
August 15, 2010 6:36 pm

Finally, we learn the true meaning of “climate justice”!

Geoff Sherrington
August 15, 2010 6:37 pm

Mike Roddy says: August 14, 2010 at 7:13 pm -“a 40% decline in fish biomass since 1950 due to CO2′s effect on phytoplankton.”
You forgot to count the big one that got away.
References?

Stephan
August 15, 2010 6:45 pm

I would not be surprised if all the young climate researchers in their old age become the most ardent skeptics, yes even Gavin, Mann etc. It’s very common for this to happen, so skeptics, you will get satisfaction eventually.. of course the main driver is the actual weather year by year and that is not revealing any consistent warming/cooling anywhere LOL. Again I repeat we only live max 100 years; we will never experience palpable climate change. At least a 1000–3000 year life span would be required…. so we can all go home and get a life and forget this nonsense, ciao…..

anticlimactic
August 15, 2010 7:27 pm

It is nice to see some experts becoming involved, rather than the amateur approach often found in climate science. When processing statistics it is important to process them with skill to get the right results, not the ‘desired’ results.
This is the crux of science: get an idea, assemble the data, and process it to see if you are correct. Hopefully ‘yes’, but even ‘no’ is useful information. It is knowledge.
In some areas of climate science it is manipulating data until you get the ‘desired’ result; then if anyone questions it they should, variously: lose their job, be banned from publication, be shunned, be put on trial, be deported [and I am sure some would wish a worse fate on these blasphemers]. This is just degenerate pseudo-scientific propaganda. It is not knowledge.

Glenn
August 15, 2010 7:29 pm

Mike says:
August 15, 2010 at 5:33 pm

“I said: “The journal claims to have the 6th highest impact factor among stat journals, but it is not clear what that really means.”
You said more than that. “Not likely to be a top journal” and “It takes many years to establish a reputation.”
“Glenn said: “Clear to those who know what it means and can check. But your previous claims would have more weight by saying that.”
“The impact factor is a measure of how often the articles in a journal are cited. But if a paper is cited a lot because others are criticising it, that counts the same as when it is cited by others praising it. That is one reason impact factors are not clear indicators of quality. See http://www.ams.org/notices/200603/comm-milman.pdf for another view of this. (A sub may be required.)”
That is true for all journals, but whether “quality” can be determined by how many “praise” the journal articles is what is not clear. What I do know is that journals are measured by their impact.
You have provided absolutely no support for your contentions, your reasons seem to be nothing more than to cast doubt on the journal and the authors.
“I will repeat that the M&W article seems interesting and should be judged on its merits.”
Yet you didn’t, except to take a sentence from the paper out of context in another post, quote:
“…our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium,…”
What “merit” does that have, and what does it imply about you in light of your attempt to discredit or downplay the journal and authors reputations?

Dave F
August 15, 2010 7:33 pm

If 2005 is the hottest year ever, then allowing the series to continue only exacerbates the problem. Choosing 1998 as an endpoint is as valid as, say, anomalously hot 2010?

geo
August 15, 2010 7:44 pm

Knights says:
August 15, 2010 at 4:03 pm
Thank you, Roger. I for one will “Huzzah” from the rooftops at a more realistic debate over percentages of causation.
And I feel the need to point out that Anthony’s work re UHI (which I have contributed a goodly amount of time and money to over the last few years, because basic R&D is always a worthy endeavor, whatever the results) moves front and center with the Hockey Stick in tatters.
Understand, I consider myself a “lukewarmist” and bristle more than most at being called a “denier”. I think CO2 almost certainly has played a role in modern warming. But whether that role is 1/5, 1/3, 2/3, or 4/5 is a vitally important question to determine in the next 20 years. I think we will. But then I’ve always been an optimist about the human race in the longer term, and as a semi-pro historian I tend to take with a large grain of salt the contretemps and mud-slinging of the moment from a historical perspective.
[REPLY – Yes, geo, you have been a staunch footsoldier for the surfacestations project. (I’ve personally evaluated a lot of your work.) Thanks to all of you who have hunted down stations; you know who you are (and so do I) — so take a proud moment out. ~ Evan]

DR
August 15, 2010 7:53 pm

Mike Roddy: follow the link below.
Physical tests were done raising CO2 levels in ocean water. Most tests showing damaging effects to shellfish and plankton were done by lowering pH with something other than CO2.
Plankton actually thrive on excess CO2 in the water.
http://tinyurl.com/37v9pd2

Mike Roddy
August 15, 2010 8:04 pm

Geoff Sherrington, the reference to the study published in Nature showing a 40% decline in phytoplankton was the link I posted in that comment.
Mann’s hockey stick and the blogosphere (not scientific) controversy that came from it was studied by NAS, or the National Academy of Sciences. His work was vindicated in all respects, and was shown to be robust. Here’s the link:
http://live.psu.edu/fullimg/userpics/10026/Final_Investigation_Report.pdf
I also suggest that readers take a look at the Realclimate post on the subject that I linked in my previous comment. If neither of these convinces you, then nothing I can say will. Have WUWT commenters and readers actually read them? If not, you should.

August 15, 2010 8:14 pm

Here is a poem I wrote about this – written to the tune of “Blowin’ in The Wind.”
___________________________________________
Blowin’ in the Trees
How many times must Mann get spanked
Before he admits he was wrong!
Yes, and how many emails must scream out, “Denier!”
That some scientists don’t belong!
Yes and how many times must statistics be damned
To invent Mann-made catastrophe!
The answer my friend, is blowin’ in the trees,
The answer is blowin’ in the trees.
How many times must Briffa measure wood
Before he can make a hockey stick?
Yes, and how many times must McIntyre insist
That bad math should never persist?
Yes, and how much cooling can push back the lies
Claiming any given storm proves the fit!
The answer my friend, is blowin’ in the trees,
The answer is blowin’ in the trees.
How much C02 is the “proper amount?”
And what temperature is the “best?”
Yes, and how many years can ‘the team’ suck up Grants
Before it’s exposed by the Press?
Yes and how many newspapers can publish foolish claims
Pretending that they understand this mess?
The answer my friend, is blowin’ in the trees,
The answer is blowin’ in the trees.
___________________________________________
©2010 Dave Stephens
(with apologies to Bob Dylan)

August 15, 2010 8:16 pm

geo says:
August 15, 2010 at 7:44 pm
Understand, I consider myself a “lukewarmist” and bristle more than most at being called a “denier”. I think CO2 almost certainly has played a role in modern warming. But whether that role is 1/5, 1/3, 2/3, or 4/5 is a vitally important question to determine in the next 20 years.
============================================================
Whichever fraction you choose to attribute to CO2, hopefully you will properly accept the fraction of CO2 that we contribute.

Evan Jones
Editor
August 15, 2010 8:23 pm

Whichever fraction you choose to attribute to CO2, hopefully you will properly accept the fraction of CO2 that we contribute.
Naturally (well, okay, anthropogenically).
What is, is. All we are trying to do is find out. Preferably while not flushing half of world growth while we’re about it!

geo
August 15, 2010 8:33 pm

Evan–
Hopefully while not *unnecessarily* flushing half of world growth while we’re about it.
Fixed it for you. . .

wwf
August 15, 2010 8:34 pm

[snip – invalid email – see policy page ~mod]

August 15, 2010 8:53 pm

Adding to what Gail mentioned above … From the paper itself – and noting the hundreds of BAD data inputs, selections of data from the record, selective picking of sources of that data from the total environment, errors in processing, errors in statistics, errors in counting, and double-selecting redundant and self-duplicating errors displayed by Mann – and his white-washed cohorts – in dissembling their propaganda, as related in The Hockey Stick Illusion by Montford …
The following discussed Mann 2008:
“This is by far the most comprehensive publicly available database of temperatures and proxies collected to date. It contains 1,209 climate proxies (with some going back as far as 8855 BC and some continuing up till 2003 AD). It also contains a database of eight global annual temperature aggregates dating 1850-2006 AD (expressed as deviations or ”anomalies” from the 1961-1990 AD average). Finally, there is a database of 1,732 local annual temperatures dating 1850-2006 AD (also expressed as anomalies from the 1961-1990 AD average). All three of these datasets have been substantially processed including smoothing and imputation of missing data (Mann et al., 2008). While these present interesting problems, they are not the focus of our inquiry. We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline. Without taking a position on these data quality issues, we thus take the dataset as given. We further make the assumptions of linearity and stationarity of the relationship between temperature and proxies, an assumption employed throughout the climate science literature (NRC, 2006), noting that ”the stationarity of the relationship does not require stationarity of the series themselves” (NRC, 2006).”
—…—…—
Thus, one wonders what the critiques of Mann-made CAGW would become if the full story, with the full set of errors, were discussed honestly.

Jaye
August 15, 2010 8:53 pm

GeoFlynx says:
August 15, 2010 at 2:56 pm
You are seriously deluding yourself. The statisticians didn’t take on measured data because they knew better. Mann did it for nefarious reasons.

August 15, 2010 8:58 pm

Mike Roddy says:
August 15, 2010 at 8:04 pm
Geoff Sherrington, the reference to the study published in Nature showing a 40% decline in phytoplankton was the link I posted in that comment.
Mann’s hockey stick and the blogosphere (not scientific) controversy that came from it was studied by NAS, or the National Academy of Sciences. His work was vindicated in all respects, and was shown to be robust.
—…—…
False. Mann’s methods, motives, and opportunities were reviewed twice by Congressional hearings – far less biased than Penn State’s whitewash – and his conclusions were rejected. That an administration chooses to select the result it wants – a result vitally needed to extract 1.3 trillion in unnecessary taxes from the world’s poor and middle class, condemning billions to lives cut short by disease and poverty through policies that restrict energy development based on FALSE premises and FALSE processing – is not surprising.

August 15, 2010 8:59 pm

I just saw a sign that reads, “CO2 HOAX”, right by the US Marine base at Quantico, Virginia…!!!
Watts up with that ???

Reed Coray
August 15, 2010 9:00 pm

Well, at least one university in the state of Pennsylvania has the courage to put science before the almighty dollar.

Robert in Calgary
August 15, 2010 9:07 pm

Mike Roddy says…..
“His work was vindicated in all respects, and was shown to be robust. ”
Ha ha ha! That Mike Roddy! Don’t bother him with the facts, it will burst his “reality bubble”.

Jaye
August 15, 2010 9:12 pm

His work was vindicated in all respects, and was shown to be robust
Ok now we have entered into the fantastical. In reality, he was politely chastised. That North chose to back pedal a bit in his use of language was likely political.
The NAS did find some of Mann’s work “plausible” — that’s the closest that it comes to actually supporting Mann’s findings — but then it immediately states there are so many scientific uncertainties attached to Mann’s work that it doesn’t have great confidence in it. The committee then proceeds to further downgrade its view of Mann’s work: “Even less confidence can be placed in the original conclusions by Mann et al. (1999) that ‘the 1990s are likely the warmest decade, and 1998 the warmest year, in at least a millennium.’ ”

orkneygal
August 15, 2010 9:14 pm

Mike Roddy-
Your link is to the Penn State whitewash report, not to the NAS report.
In the Penn State report, Dr. Mann is not found to be pure as the driven snow, by any means.
This is not about anyone’s behaviour in any case; it’s about the data and the truth.

August 15, 2010 9:14 pm

It’s amazing to me that UV and other radiation-absorbing volcanic ash aerosols in the atmosphere have increased, probably close to a good 30% in the last three decades, especially since 1995. Right along with people being convinced that human CO2 emissions were to blame.
How did they manage to stage that?

Evan Jones
Editor
August 15, 2010 9:18 pm

Well, geo, as I have commented in the past, for every $billion wasted (or never produced) anywhere in the world, babies starve somewhere in the world. I am sure we agree on that.
And no false appeals to Pascal, please, people. Pascal’s wager presumes there’s no material cost to taking the precaution in question. (AND that the solution will be effective if the danger is real!) But this one’s a cost-benefit deal with innocent blood on the line for every iota of cost.
So we better be very damn sure what we are about. Not only do we need to be reasonably certain there is a problem in the first place, but we also need good reason to believe that the proposed solution is going to turn the trick.
And, so far, not only does the supposed problem not add up, but the proposed solution wouldn’t add up even if the problem did.
