This post over at Lucia’s Blackboard caught my interest because it shows you don’t need to operate heavy equipment (like Al Gore’s famous elevator scene) to get yourself a big data hockey stick. More evidence that Mann and Briffa’s selection criteria for trees lead to spurious results.
It doesn’t matter whether you are using treemometers, thermometers, stock prices, or wheat futures: if your method isn’t carefully designed to prevent an artificial and unintentional selection of data that correlates with your theory, the whole study can turn out like a hockey stick. Anyone can do this; no special science skills are needed.
So, even though the method seems reasonable, and the person doing it doesn’t intend to cherry-pick, rejecting trees that don’t correlate with the recent record biases an analysis unless some very sophisticated corrections are applied. It encourages spurious results, and in the context of the whole “hockey stick” controversy, effectively imposes hockey sticks on the results.
And she backs it up with a simple experiment anybody can do with Microsoft Excel.
Method of creating hockey stick reconstructions out of nothing
To create “hockey stick” reconstructions out of nothing, I’m going to do this:
- Generate roughly 148 years’ worth of monthly “tree-ring” data using rand() in EXCEL. This corresponds to 1850-1998. I will impose autocorrelation with r=0.995. I’ll repeat this 154 times. (This number is chosen arbitrarily.) On the one hand, we know these series don’t correlate with Hadley because they are synthetically generated. However, we are going to pretend we believe “some” are sensitive to temperature and see what sort of reconstruction we get.
- To screen out the series that prove themselves insensitive to temperature, I’ll compute the correlation, R, between Hadley monthly temperature data and the tree-ring data for each of the 154 series. To show the problem with this method, I will compute the correlation only over the years from 1960-1998. Then, I’ll keep all series whose correlations have absolute values greater than 1.2 times the standard deviation of the 154 correlations R. I’ll assume the other randomly generated monthly series are “not sensitive” to temperature and ignore them. (Note: The series with negative values of R are the equivalent of “upside down” proxies.)
- I’ll create a proxy by simply averaging over the “proxies” that passed the test just described. I’ll rebaseline so the average temperature and trends for the proxy and Hadley match over 1960-1998.
- I’ll plot the average from the proxies and compare it to Hadley.
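For readers who would rather script it than spreadsheet it, here is one way the recipe above might look in code. This is only a sketch, not Lucia’s actual workbook: it uses numpy’s random generator in place of EXCEL’s rand(), and since the HadCRUT data can’t be bundled here, the “temperature record” is a made-up series (flat noise with a rise over the last 39 years) that merely has the same general shape.

```python
import numpy as np

rng = np.random.default_rng(0)

n_series = 154              # number of synthetic "tree-ring" series
n_months = 149 * 12         # monthly data, 1850-1998
r = 0.995                   # imposed lag-1 autocorrelation

# Step 1: red-noise "tree rings" -- AR(1) with coefficient r
shocks = rng.standard_normal((n_series, n_months))
rings = np.empty_like(shocks)
rings[:, 0] = shocks[:, 0]
for t in range(1, n_months):
    rings[:, t] = r * rings[:, t - 1] + shocks[:, t]

# Step 2: a stand-in "temperature record" (flat noise, then a rise over
# the last 39 years -- the post uses real Hadley data here instead)
cal_start = n_months - 39 * 12                    # "1960"
temp = 0.1 * rng.standard_normal(n_months)
temp[cal_start:] += np.linspace(0.0, 0.8, 39 * 12)

# Step 3: screen on the 1960-1998 calibration window
cal = slice(cal_start, None)
corrs = np.array([np.corrcoef(s[cal], temp[cal])[0, 1] for s in rings])
keep = np.abs(corrs) > 1.2 * corrs.std()

# Step 4: flip the "upside down" (negative-R) proxies and average
recon = (rings * np.sign(corrs)[:, None])[keep].mean(axis=0)

# Step 5: rebaseline -- match mean and trend to temp over the calibration window
a, b = np.polyfit(recon[cal], temp[cal], 1)
recon = a * recon + b

r_cal = np.corrcoef(recon[cal], temp[cal])[0, 1]                 # strong match...
r_pre = np.corrcoef(recon[:cal_start], temp[:cal_start])[0, 1]   # ...only after "1960"
```

With this setup the screened average tracks the target closely over the calibration window and shows essentially no relationship before it, which is the whole point of the exercise.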
The comparison for one (typical) case is shown below. The blue curve is the “proxy reconstruction”; the yellow is the Hadley data (all data are 12-month smoothed).
Notice that after 1960, the blue curve based on the average of “noise” that passed the test mimics the yellow observations. It looks good because I screened out all the noise that was “not sensitive to temperature”. (In reality, none is sensitive to temperature. I just picked the series that didn’t happen to fail. )
Because the “proxies” really are not sensitive to temperature, you will notice there is no correspondence between the blue “proxy reconstruction” and the yellow Hadley data prior to 1960. I could do this exercise a bajillion times and I’ll always get the same result. After 1960, there are always some “proxies” that by random chance correlate well with Hadley. If I throw away the other “proxies” and average over the “sensitive” ones, the series looks like Hadley after 1960. But before 1960? No dice.
Also notice that when I do this, the blue “proxy reconstruction” prior to 1960 is quite smooth. In fact, because the proxies are not sensitive, the past history prior to the “calibration” period looks unchanging. If the current period has an uptick, applying this method to red noise will make the current uptick look “unprecedented”. (The same would happen if the current period had a downturn, except we’d have unprecedented cooling.)
The red curve
Are you wondering what the red curve is? Well, after screening once, I screened again. This time, I looked at all the proxies making up the “blue” curve, and checked whether they correlated with Hadley during the period from 1900-1960. If they did not, I threw them away. Then I averaged to get the red line. (I did not rebaseline again.)
The purpose of the second step is to “confirm” the temperature dependence.
Having done this, I get a curve that looks sort of like Hadley from 1900-1960. That is: the wiggles sort of match. The “red proxy reconstruction” looks very much like Hadley after 1960: both the “wiggles” and the “absolute values” match. It’s also “noisier” than the blue curve; that’s because it contains fewer “proxies”.
But notice that prior to 1900, the wiggles in the red proxy and the yellow Hadley data don’t match. (Also, the red proxy wants to “revert to the mean.”)
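The second, “confirming” screen can be bolted onto the same kind of sketch. As before, this is a stand-in (numpy red noise and a made-up flat-then-rising target, not the real Hadley data): survivors of the 1960-1998 screen are screened again on a 1900-1960 verification window, and the better-correlated half are averaged into a “red” reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_series, n_months, r = 154, 149 * 12, 0.995
shocks = rng.standard_normal((n_series, n_months))
rings = np.empty_like(shocks)
rings[:, 0] = shocks[:, 0]
for t in range(1, n_months):
    rings[:, t] = r * rings[:, t - 1] + shocks[:, t]

cal_start = n_months - 39 * 12          # "1960"
ver_start = n_months - 99 * 12          # "1900"
temp = 0.1 * rng.standard_normal(n_months)
temp[cal_start:] += np.linspace(0.0, 0.8, 39 * 12)

# First screen (1960-1998), with "upside down" proxies flipped
cal = slice(cal_start, None)
corrs = np.array([np.corrcoef(s[cal], temp[cal])[0, 1] for s in rings])
keep1 = np.abs(corrs) > 1.2 * corrs.std()
flipped = (rings * np.sign(corrs)[:, None])[keep1]

# Second screen: of the survivors, keep those that also correlate with
# "temperature" over the 1900-1960 verification window
ver = slice(ver_start, cal_start)
vcorrs = np.array([np.corrcoef(s[ver], temp[ver])[0, 1] for s in flipped])
keep2 = vcorrs >= np.median(vcorrs)     # the better half "confirms"
red = flipped[keep2].mean(axis=0)

r_cal = np.corrcoef(red[cal], temp[cal])[0, 1]
r_ver = np.corrcoef(red[ver], temp[ver])[0, 1]
r_early = np.corrcoef(red[:ver_start], temp[:ver_start])[0, 1]  # pre-"1900": no match
```

The double-screened average still matches the calibration window very well, tends to match the verification window it was selected on, and falls apart before it, mirroring the red curve in the post.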
Can I do this again? Sure. Here are the two plots created on the next two “refreshes” of the EXCEL spreadsheet:
I can keep doing this over and over. Some “reconstructions” look better; some look worse. But these don’t look too shabby when you consider that none of the “proxies” are sensitive to temperature at all. This is what you get if you screen red noise.
Naturally, if you use real proxies that contain some signal, you should do better than this. But knowing you can get this close with nothing but noise should make you suspect that screening based on a known temperature record can bias your answers to:
- Make a “proxy reconstruction” based on nothing but noise match the thermometer record and
- Make the historical temperature variations look flat and unvarying.
So, Hu is correct. If you screen out “bad” proxies based on a match to the current temperature record, you bias your answer. Given the appearance of the thermometer record during the 20th century, you bias toward hockey sticks!
Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.
Update
I modified the script to show the number of proxies in the “blue” and “red” reconstructions. Here is one case; the second will be uploaded in a sec.
Steve McIntyre writes in comments:
Steve McIntyre (Comment#21669)
October 15th, 2009 at 4:24 pm
Lucia, in addition to Jeff Id, this phenomenon has now been more or less independently reported by Lubos, David Stockwell and myself. David published an article on the phenomenon in AIG News, online at http://landshape.org/enm/wp-co…..6%2014.pdf . We cited this paper in our PNAS comment (as one of our 5 citations.) I don’t have a link for Lubos on it, but he wrote about it.
I mention this phenomenon in a post prior to the starting of Climate Audit, that was carried forward from my old website from Dec 2004 http://www.climateaudit.org/?p=9, where I remarked on this phenomenon in connection with Jacoby and D’Arrigo picking the 10 most “temperature-sensitive” out of 35 that they sampled as follows:
If you look at the original 1989 paper, you will see that Jacoby “cherry-picked” the 10 “most temperature-sensitive” sites from 36 studied. I’ve done simulations to emulate cherry-picking from persistent red noise and consistently get hockey stick shaped series, with the Jacoby northern treeline reconstruction being indistinguishable from simulated hockey sticks. The other 26 sites have not been archived. I’ve written to Climatic Change to get them to intervene in getting the data. Jacoby has refused to provide the data. He says that his research is “mission-oriented” and, as an ex-marine, he is only interested in a “few good” series.
===
Read the whole post at Lucia’s blog here
I encourage readers to try these experiments in hockey stick construction themselves. – Anthony






As the brilliant card magician Ricky Jay said: “Every profession is a conspiracy against the laity”.
Al Gore didn’t use an elevator. It’s called a scissor lift. 😉
Method A is exactly what was done. We used a broad frequency click to measure subjects first. If their brainstem response was easy to read, we put them in the next group. If the response was too noisy (that higher brain problem), we rejected the subject for inclusion. We then selected the 6 easiest to read click responders from the second group and performed the rest of the procedures. Why didn’t we just put them all to sleep? The study was expensive and we had a very limited budget. Putting them to sleep would have enlarged the costs way beyond our budget. And the literature was already filled with comparisons between the response ability of the auditory pathway in sleeping brains and quiet brains, demonstrating very little difference. So our method was valid.
Traders and analysts in the financial markets are well aware of such biases, as are people in data modeling in general, myself included. At a *minimum*, you must fit data on one set and validate on another. In our analysis, as we search for models, we fit one set, validate on a second to evaluate model performance, then validate on a third set to select models and finally a FOURTH set to hold back to make sure the entire modeling process wasn’t blind luck. Good performance means your “fit” matches on ALL data sets EQUALLY.
In matters of major importance, such as taxation of the masses and suppression of standards of living, such modeling should stand up to the highest rigors of validation.
Thanks,
Carl
TomP, I do not believe you are in a position to make any recommendations on how Steve McIntyre should make use of his time. In the past, I have found my life to be on the wrong tack. Course corrections can be difficult and humbling, but in the end, were very rewarding. We can help you through this.
TomP, you have Imperial College at your disposal. If you are so concerned, why not do the work yourself and publish it with your name on it?
I don’t have a lot of respect for academics who challenge others who do their work in the open, but don’t want to put their own names on their challenges. It’s put up or shut up time. Time to define yourself as a true academic or just another net hack.
Suppose you had 50 years of good temperature data from modern thermometers, and you first selected your trees using only the last 25 years of data, then you checked it by seeing how they compare to the first 25 years of data? Would that sort of test reliably tell you that the process of selection was broken?
Just curious how you would go about “proving” to a true believer that the process is broken using their own data.
Larry
hotrod,
“Suppose you had 50 years of good temperature data from modern thermometers, and you first selected your trees using only the last 25 years of data, then you checked it by seeing how they compare to the first 25 years of data? Would that sort of test reliably tell you that the process of selection was broken?”
That would be only the most primitive first step and probably not sufficient. The field of model building and validation is substantial and covers more than can be posted here as a comment, but includes things such as model parsimony (degrees of freedom), bootstrapping within and across multiple windows (data sets) to prove model reliability and fit for purpose, extensive analysis of errors, the performance on data after the model building and validation process is complete (true out-of-sample) and much more.
Data snooping, overt and subtle, can cause models to look valid after doing all the above on historical data. It’s easy for your presumptions and biases to leak into the process (the subject of the article), but ongoing performance after you say “there, I’m done” is the final true test. If the model’s performance, which can be measured in many ways, degrades going forward, you find out you weren’t as successful as you thought, which is often the case, I’m afraid. We live in a non-stationary, highly interconnected universe and to claim that you have a model that works in all cases for all time would be foolish. Even Newton was “wrong”. The best you can expect is a workable (useful) approximation.
Thanks,
Carl
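Carl’s fit-on-one-set, validate-on-another discipline exposes the screening artifact immediately. A minimal sketch under the same assumptions as the earlier ones (numpy red noise standing in for proxies, a made-up flat-then-rising series standing in for temperature): screen on the last 25 years, then check the first 25, as hotrod suggested.

```python
import numpy as np

rng = np.random.default_rng(0)

n_series, n_months, r = 154, 50 * 12, 0.995   # 50 years of monthly data
shocks = rng.standard_normal((n_series, n_months))
rings = np.empty_like(shocks)
rings[:, 0] = shocks[:, 0]
for t in range(1, n_months):
    rings[:, t] = r * rings[:, t - 1] + shocks[:, t]

# Made-up "temperature": flat noise, rising over the final 25 years
temp = 0.1 * rng.standard_normal(n_months)
temp[n_months // 2:] += np.linspace(0.0, 0.6, n_months // 2)

fit = slice(n_months // 2, None)      # screen on the last 25 years...
hold = slice(0, n_months // 2)        # ...validate on the first 25

corrs = np.array([np.corrcoef(s[fit], temp[fit])[0, 1] for s in rings])
keep = np.abs(corrs) > 1.2 * corrs.std()
recon = (rings * np.sign(corrs)[:, None])[keep].mean(axis=0)

r_fit = np.corrcoef(recon[fit], temp[fit])[0, 1]
r_hold = np.corrcoef(recon[hold], temp[hold])[0, 1]
# Screening inflates r_fit; the holdout correlation collapses, which is
# exactly the red flag a validation set is designed to raise.
```

The inflated in-sample fit together with a collapsed holdout fit is the signature of data snooping Carl describes, and the split-sample check hotrod asked about does catch it, at least as a first step.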
“Suppose you had 50 years of good temperature data from modern thermometers, and you first selected your trees using only the last 25 years of data, then you checked it by seeing how they compare to the first 25 years of data? Would that sort of test reliably tell you that the process of selection was broken?”
This is what they should have done to confirm their selection method. Easy really. They picked the trees that were good thermometers in the modern period. Now the question was, were they also good thermometers in the distant past? All they had to do was find out what the temps were in the distant past, then compare that with what their selected trees said they were. They must have done that, mustn’t they? That’s how they would confirm their selection method really works.
But wait a minute. I just thought of a problem with that. The reason we were using these trees is because they didn’t got no thermometers back in that distant past. So how do we go about validating that the modern selected trees, just because they match the modern record, also matched the distant past record?
Round about this point you realize the only thing to do is accuse your critics of being a close relative of Dick Cheney, or maybe having worked for Exxon, or been associated with the tobacco lobby, or perhaps being a creationist. Because the statistics, well, on that basis, you are sunk.
michel:
Actually, both sides tend to accuse the other side of associations of this type. The difference is that one side suggests that the Heartland Institute might not be the most unbiased source of information while the other side claims there is some kind of mass conspiracy or collusion / distortion brought about by funding that has made the IPCC, the National Academy of Sciences and the analogous bodies in all the other G8+5 countries, AAAS, and the councils of most of the major scientific societies (APS, AMS, AGU, …) untrustworthy. Do you see the difference?
Also, since you mentioned Exxon, it is worth noting that Exxon is now light-years ahead of the “skeptic” community here in publicly accepting the science of AGW: http://www.exxonmobil.com/Corporate/energy_climate_views.aspx
But surely the random sequences added together are just that: random. Because they are random there will be random sequences that conform to any curve required, but outside the conformance window the sequence will fall back to random = average zero.
Surely what is being proposed is that tree growth is controlled by many factors: no randomness, just noise and a combination of factors.
Trees will not grow at -40C; trees will not grow at +100C.
Trees do grow well at a temperature in between (all else being satisfactory).
Choosing trees that grow in tune with the temperature means that if they extend beyond the temp record then there is a greater possibility that these will continue to grow in tune with the temp. If they grow to a different tune then they are invalid responders.
A long time ago I posted a sequence of pictures showing what can be obtained by adding and averaging a sequence of aligned photos – the only visible data was the church and sky glow. I added 128 of these images together and obtained this photo:
http://img514.imageshack.us/img514/1989/128imagesaddednootheradub9.jpg
Note that it also shows the imperfections in the digital sensor (the window frame effect)
Image shack did have a single image with the gamma turned up to reveal the only visible image (Church+sky) but they’ve lost it!
The picture was taken in near dark conditions.
A flash photo of the same:
http://img514.imageshack.us/img514/6475/singleflashulpn3.jpg
By removing all invalid data (pictures of the wife, the kids, flowers etc) that do not have the church and sky, a reasonable picture of the back garden appears from the noise.
Of course I may have included a few dark pictures with 2 streetlights in those locations, but with enough of the correct image these will have a lessening effect.
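The photo-stacking comment is the same mathematics in a friendlier setting, and it is easy to reproduce numerically. Below is a hypothetical stand-in for the 128 exposures: a faint fixed “scene” buried under strong per-frame sensor noise, where averaging N aligned frames shrinks the random noise by roughly sqrt(N).

```python
import numpy as np

rng = np.random.default_rng(1)

# A faint fixed scene (a dim "church tower") well below the noise floor
scene = np.zeros((64, 64))
scene[20:40, 28:36] = 0.2

# 128 aligned "exposures", each swamped by sensor noise (sd = 1.0)
frames = scene + rng.normal(0.0, 1.0, size=(128, 64, 64))

single_err = np.abs(frames[0] - scene).mean()    # noise level in one exposure
stacked = frames.mean(axis=0)                    # add-and-average, as in the post
stacked_err = np.abs(stacked - scene).mean()
# Random noise drops by ~1/sqrt(128), about 11x; any *fixed* pattern (the
# scene, or a sensor defect like the "window frame effect") survives the stack.
```

Note the connection to the proxy example: stacking works here because the selection (keeping only frames of the church) is made on information independent of the noise being averaged away. Screening proxies on the very record you then compare against has no such independence.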
wattsupwiththat (12:08:12) :
“If you are so concerned, why not do the work yourself and publish it with your name on it?”
It’s Steve McIntyre who is making the accusation against Mann with his description of “fudged” benchmarks. The burden is on Steve to publish and prove Mann wrong.
In any event I very much doubt I could get a publication if all I did was reproduce Mann’s paper.
How, exactly, is ExxonMobil supposed to act when the US Senate is threatening them with a tobacco-style inquisition if they don’t start toeing the line…
Very McCarthy-esque… “Are you now, or have you ever been, a AGW denier?”
One would have to assume that the world’s largest organization of sedimentary geologists, the AAPG, is one of the “groups… whose public advocacy has contributed to the small, but unfortunately effective, climate change denial myth.”
The biggest lie in all of this is the phrase “climate change denial.” None of the serious AGW skeptics has ever denied “climate change.” Our main point is that the climate is always changing.
Tom P, oh puh-leeze. Mann developed an unheard-of pick-two procedure. The benchmarks are going to be higher than the pick-one used by Mann. How much higher? I didn’t try to simulate it at the time: but really that should have been Mann’s job rather than not allowing for the procedure. My point here was that it was going to be a higher benchmark than the one reported by Mann and that the dendro proxies (other than the RegEMed MXD series) were already more or less at a random yield.
Joel Shore said “light years ahead of this blog”. Light years in the wrong direction. By the way, Joel, condescending means “to talk down to”. Just so you know, …. 8^]
It is worth noting Jeff Id’s demonstration of using Mann’s method on a large grouping of random data.
After you get through finding spurious sine waves and step functions, he demonstrates how even the slightest imperfection in “the true temperature” can drive the entire calculation off into the weeds. Quickly.
It would be interesting IMNSHO to see the discrepancies between the normal Mann method, and the Mann method applied strictly to the satellite era and the local grid cell temperatures as determined by satellites.
Dave Middleton says:
It doesn’t sound that McCarthy-esque to me. They are just calling Exxon-Mobil out on some deceptive practices. As they note, it is not as if Exxon was funding peer-reviewed science that was attempting to prevail in the scientific community. Rather they were funding public obfuscation of peer-reviewed science.
If you want to talk about something McCarthy-esque, you could talk about Barton’s inquisition of Mann and his co-authors, which was so bad that even fellow Republican Sherwood Boehlert, Chair of the House Science Committee, found it extremely objectionable ( http://sciencepolicy.colorado.edu/prometheus/archives/climate_change/000497letter_from_boehlert.html ):
Of course, various scientific organizations like AAAS also weighed in on the heavy-handed tactics of Congressman Barton. But I didn’t hear a lot of concern about McCarthy-esque tactics coming from the “skeptic” community on that one.
Joel Shore, I find it a bit odd you don’t mention the pro-global warming funding from Exxon-Mobil.
If you sum pro- and anti-global-warming studies funded by ExxonMobil, care to guess which is larger?
The problem for people like you isn’t that they exclusively fund climate skeptics (which they don’t fund at all now, btw), it’s that they ever funded them at all.
The thing that is destroying the AGW movement from within is that it cannot admit any error, ever, past or present, by any member of the Nomenklatura.
And so we find RC still defending MBH98, and Tamino asserting that the PCA methods used in it are legitimate and approved of by Ian Jolliffe, and now we find a chorus of people asserting that Briffa’s selection procedures are entirely normal and legitimate, despite the fact that this stuff is what stats students are taught not to do in the first course. We find the surface station record defended in its entirety, down to the last station.
The funny thing about it, really, is that it was the AGW movement which began by calling dissidents ‘denialists’, but it is now way outscoring its worst betes noires on denialism. What is coming out of the AGW movement on this question of the Hockey Stick, and the Mann and Briffa studies, is now total denialism of settled science. We know, everyone knows, that you just cannot do sampling and statistics like that.
If they could just admit that, get rid of the studies, drop the Hockey Stick, and move on, they would have a chance. There are lot more serious indicators that warming is an issue. But they will not or cannot. And so the thing is snowballing all the time, and with every desperate attempt to cover it up, it gets worse and they lose more credibility.
As always, it’s not the mistake that sinks you, it’s the cover-up.
Barton wasn’t threatening anyone or any corporation; nor was Barton demanding that anyone or any corporation spend their own money in an Al Gore-approved manner.
Barton asked Wegman to check Mann’s work because the peers who supposedly reviewed MBH98/99 failed to do so.
This is not a Republican v Democrat thing. A lot of Republicans have fallen for this AGW scam. Enough Republicans have fallen for it, that we will almost certainly have some sort of massive carbon tax imposed on our economy. So, when oil, natural gas and electricity prices go through the roof while our gov’t is artificially restricting supplies over the next few decades, the AGW crowd better hope that the cooling over the next ~25 years is more like 1942 to 1976 than it will be like the Dalton or Maunder Minima… Because our response will be, “Y’All can freeze in the dark for all we care.”
I think the Spam filter grabbed my last post… I’m not sure why.
Joel Shore (19:29:22) :
If you want to talk about something McCarthy-esque, you could talk about Barton’s inquisition of Mann and his co-authors, which was so bad that even fellow Republican Sherwood Boehlert, Chair of the House Science Committee, found it extremely objectionable.
I do not take the scientific credentials of this man seriously. In fact, I find it extremely objectionable even referencing him on a science blog. In addition, as you are quick to point out, he has not been peer-reviewed in a scientific journal.
Boehlert was born in Utica, New York, to Elizabeth Monica Champoux and Sherwood Boehlert, and graduated with a B.A. from Utica College, Utica, N.Y., in 1961. He served two years in the United States Army (1956–1958) and then worked as a manager of public relations for Wyandotte Chemical Company. After leaving Wyandotte, Boehlert served as Chief of Staff for two upstate Congressmen, Alexander Pirnie and Donald J. Mitchell; following this, he was elected county executive of Oneida County, New York, serving from 1979 to 1983. After his four-year term as county executive, he ran successfully for Congress.
Since 2007, Boehlert has remained active promoting environmental and scientific causes. He serves currently on the Board of the bipartisan Alliance for Climate Protection chaired by former Vice President Al Gore.
Dave Middleton says:
Well, Congressman Boehlert found Barton’s approach to be quite intimidating to scientists, as did scientific organizations like AAAS ( http://www.aaas.org/news/releases/2005/0714letter.pdf ) and Ralph Cicerone, the head of the National Academy of Sciences ( http://www.realclimate.org/Cicerone_to_Barton.pdf ).
And, for that matter, I don’t see Sen. Rockefeller and Snowe’s letter as being threatening or demanding. They were just making their views known and requesting that Exxon act as a better corporate citizen.
What I would say is that there are a fair number of Republicans who are not completely ideologically-blinded and are thus actually listening to the scientific community on this issue.
And, what happens if the warming continues at about the average rate it has between the mid-1970s and now?
To quote Bill Cosby, “How long can you tread water?”
It would be the end of the PDO. The Earth did warm from somewhere around 1976 to somewhere around 2003. In many of the surface stations, particularly along the Pacific coast of North America, the Climate Shift of 1976 (PDO shift) is very obvious. Somewhere between 2003 and 2007, the PDO shifted back to negative.
Where do you start calculating your linear temperature trend?
What would have happened if the Earth had continued to cool at the rate it had from 1942-1976? We’d be back in Little Ice Age conditions. What would have happened if the Earth had not cooled from 1942-1976? The rate of warming from 1908-1942 was almost exactly the same as it was from 1977-2005. Without that ~30-year cooling period, the Earth might be getting close to as warm as it was in the Sangamon interglacial.
Back to your question…”And what happens if the warming continues at about the average rate it has between the mid-1970s and now?”
It will be really easy to tell if I’m wrong… The satellite temperature data will quickly revert to the pre-2003 trend within the next few years.
Dave Middleton: Your A) – D) scenario seems to rely on assuming that even if you are wrong, the effects of climate change are less severe than most projections and that the economic effects of mitigating climate change are much, much larger than most projections.
Indeed it will. Here is a plot where I added in a few more linear fits over similar time periods to see how well similar extrapolations might have done in the past: http://www.woodfortrees.org/plot/uah/plot/uah/from:2003/trend/plot/uah/to:2003/trend/plot/uah/from:1979/to:1986/trend/plot/uah/from:1988/to:1995/trend
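The woodfortrees exercise, fitting trends over several similar-length windows to see how such extrapolations would have fared, is easy to mimic. The series below is made up (a weak trend plus heavy monthly noise, not real UAH data); the point is only that short-window OLS trends scatter widely around the long-run slope.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up "anomaly" series: weak trend + heavy monthly noise (not real UAH data)
months = np.arange(360)                       # 30 years, monthly
series = 0.001 * months + rng.normal(0.0, 0.15, 360)

def trend(lo, hi):
    """OLS slope (degrees per month) fitted over series[lo:hi]."""
    return np.polyfit(months[lo:hi], series[lo:hi], 1)[0]

full = trend(0, 360)                                     # long-record trend
windows = [trend(i, i + 60) for i in range(0, 300, 60)]  # five 5-year trends
spread = max(windows) - min(windows)
# The 5-year slopes scatter around the long-run slope, so extrapolating
# any one short trend segment says little about the underlying trend.
```

In noisy monthly data the sampling error of a short-window slope can be as large as, or larger than, the slope itself, which is the caution behind comparing several similar-period fits before trusting an extrapolation.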