How to trick yourself into unintentional cherry picking to make hockey sticks

This post over at Lucia’s Blackboard caught my interest because it shows you don’t need to operate heavy equipment (like Al Gore’s famous elevator scene) to get yourself a big data hockey stick. More evidence that Mann and Briffa’s selection criteria for trees lead to spurious results.


It doesn’t matter if you are using treemometers, thermometers, stock prices, or wheat futures: if your method isn’t carefully designed to prevent an artificial and unintentional selection of data that correlates with your theory, the whole study can turn out like a hockey stick. Anyone can do this; no special science skills are needed.

Lucia writes:

So, even though the method seems reasonable, and the person doing it doesn’t intend to cherry pick, if they don’t do some very sophisticated things, rejecting trees that don’t correlate with the recent record biases an analysis. It encourages spurious results, and in the context of the whole “hockey stick” controversy, effectively imposes hockey sticks on the results.

And she backs it up with a simple experiment anybody can do with Microsoft Excel.

Method of creating hockey stick reconstructions out of nothing

To create “hockey stick” reconstructions out of nothing, I’m going to do this:

  1. Generate roughly 148 years’ worth of monthly “tree-ring” data using rand() in EXCEL. This corresponds to 1850-1998. I will impose autocorrelation with r=0.995. I’ll repeat this 154 times. (This number is chosen arbitrarily.) On the one hand, we know these functions don’t correlate with Hadley because they are synthetically generated. However, we are going to pretend we believe “some” are sensitive to temperature and see what sort of reconstruction we get.
  2. To screen out the series that prove themselves insensitive to temperature, I’ll compute the correlation, R, between Hadley monthly temperature data and the tree-ring data for each of the 154 series. To show the problem with this method, I will compute the correlation only over the years from 1960-1998. Then, I’ll keep all series whose correlations R have absolute values greater than 1.2 times the standard deviation of the 154 correlations. I’ll assume the other randomly generated monthly series are “not sensitive” to temperature and ignore them. (Note: The series with negative values of R are the equivalent of “upside down” proxies.)
  3. I’ll create a proxy by simply averaging over the “proxies” that passed the test just described. I’ll rebaseline so the average temperature and trends for the proxy and Hadley match over 1960-1998.
  4. I’ll plot the average from the proxies and compare it to Hadley. [A code sketch of these steps appears below.]
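For anyone who would rather try this in code than in Excel, here is a minimal Python sketch of the four steps above. It is not Lucia’s spreadsheet: the AR(1) noise generator, the trend-plus-noise stand-in for the Hadley record, the sign-flipping of negative correlators, and rebaselining by mean and variance (rather than mean and trend) are all assumptions made so the example is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

n_months = 149 * 12        # monthly data spanning 1850-1998
n_series = 154             # number of synthetic "tree-ring" series
r = 0.995                  # imposed lag-1 autocorrelation

years = 1850 + np.arange(n_months) / 12.0
cal = years >= 1960        # the 1960-1998 screening window

def red_noise(n, r, rng):
    """AR(1) red noise: x[t] = r * x[t-1] + e[t]."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + rng.normal()
    return x

# Stand-in for the Hadley record: red noise plus a steady uptick.
# (Any target with a rise in the calibration window will do.)
target = 0.1 * red_noise(n_months, 0.99, rng)
target += 0.8 * (years - 1850) / (1998 - 1850)

proxies = np.array([red_noise(n_months, r, rng) for _ in range(n_series)])

# Step 2: correlate each series with the target over 1960-1998 only,
# and keep those with |R| > 1.2 * std of all 154 correlations.
R = np.array([np.corrcoef(p[cal], target[cal])[0, 1] for p in proxies])
keep = np.abs(R) > 1.2 * R.std()

# Step 3: average the survivors. Negative-R survivors are the "upside
# down" proxies; flipping their sign before averaging is an assumption.
recon = (proxies[keep] * np.sign(R[keep])[:, None]).mean(axis=0)

# Rebaseline -- matching mean and variance over 1960-1998 here, rather
# than mean and trend as in the post, for brevity.
recon = (recon - recon[cal].mean()) / recon[cal].std()
recon = recon * target[cal].std() + target[cal].mean()

print(f"kept {keep.sum()} of {n_series} series")
print(f"1960-1998 correlation: {np.corrcoef(recon[cal], target[cal])[0, 1]:.2f}")
print(f"pre-1960 correlation:  {np.corrcoef(recon[~cal], target[~cal])[0, 1]:.2f}")
```

Run it a few times with different seeds: the post-1960 match should always come out strong (it is built in by the screening and rebaselining), while the pre-1960 correlation should hover near zero.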

The comparison from one (typical) case is shown below. The blue curve is the “proxy reconstruction”; the yellow is the Hadley data (all data are 12-month smoothed).

Figure 1: "Typical" hockey stick generated by screening synthetic red noise.

Notice that after 1960, the blue curve based on the average of “noise” that passed the test mimics the yellow observations. It looks good because I screened out all the noise that was “not sensitive to temperature”. (In reality, none is sensitive to temperature. I just picked the series that didn’t happen to fail.)

Because the “proxies” really are not sensitive to temperature, you will notice there is no correspondence between the blue “proxy reconstruction” and the yellow Hadley data prior to 1960. I could do this exercise a bajillion times and I’d always get the same result. After 1960, there are always some “proxies” that by random chance correlate well with Hadley. If I throw away the other “proxies” and average over the “sensitive” ones, the series looks like Hadley after 1960. But before 1960? No dice.

Also notice that when I do this, the “blue proxy reconstruction” prior to 1960 is quite smooth. In fact, because the proxies are not sensitive, the past history prior to the “calibration” period looks unchanging. If the current period has an uptick, applying this method to red noise will make the current uptick look “unprecedented”. (The same would happen if the current period had a downturn, except we’d have unprecedented cooling.)
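Why so smooth? Outside the calibration window the kept series are just independent noise, so their wiggles cancel: the average of N independent series has roughly 1/√N the amplitude of any single one. A tiny self-contained check, using the same assumed AR(1) generator as above:

```python
import numpy as np

rng = np.random.default_rng(1)

def red_noise(n, r, rng):
    """AR(1) red noise: x[t] = r * x[t-1] + e[t]."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + rng.normal()
    return x

series = np.array([red_noise(149 * 12, 0.995, rng) for _ in range(25)])

# One series wanders widely; the mean of 25 barely moves. That shrinkage
# is what flattens the reconstruction outside the calibration window.
print("typical single-series std:", round(series.std(axis=1).mean(), 2))
print("std of the 25-series mean:", round(float(series.mean(axis=0).std()), 2))
```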

The red curve

Are you wondering what the red curve is? Well, after screening once, I screened again. This time, I looked at all the proxies making up the “blue” curve, and checked whether they correlated with Hadley during the period from 1900-1960. If they did not, I threw them away. Then I averaged to get the red line. (I did not rebaseline again.)

The purpose of the second step is to “confirm” the temperature dependence.
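For completeness, here is a sketch of that confirmation pass. The post doesn’t state the cutoff used for the second screen, so keeping only the survivors that correlate positively over the confirmation window is an assumption; the names in the commented usage come from the first sketch above.

```python
import numpy as np

def confirm_screen(survivors, target, window):
    """Second pass for the 'red' curve: of the series that passed the
    1960-1998 screen, keep only those that also correlate (positively)
    with the target over `window`, e.g. 1900-1960. The R > 0 cutoff is
    an assumption; the post only says non-correlating series were dropped."""
    R = np.array([np.corrcoef(s[window], target[window])[0, 1]
                  for s in survivors])
    return survivors[R > 0]

# Continuing the first sketch (names assumed from there):
#   confirm = (years >= 1900) & (years < 1960)
#   red = confirm_screen(proxies[keep] * np.sign(R[keep])[:, None],
#                        target, confirm).mean(axis=0)  # no second rebaseline
```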

Having done this, I get a curve that looks sort of like Hadley from 1900-1960. That is: the wiggles sort of match. The “red proxy reconstruction” looks very much like Hadley after 1960: both the “wiggles” and the “absolute values” match. It’s also “noisier” than the blue curve; that’s because it contains fewer “proxies”.

But notice that prior to 1900, the wiggles in the red proxy and the yellow Hadley data don’t match. (Also, the red proxy wants to “revert to the mean.”)

Can I do this again? Sure. Here are the two plots created on the next two “refreshes” of the EXCEL spreadsheet:

Figures 2 and 3: Two more "hockey stick" reconstructions from subsequent refreshes of the spreadsheet.

I can keep doing this over and over. Some “reconstructions” look better; some look worse. But these don’t look too shabby when you consider that none of the “proxies” are sensitive to temperature at all. This is what you get if you screen red noise.

Naturally, if you use real proxies that contain some signal, you should do better than this. But knowing you can get this close with nothing but noise should make you suspect that screening based on a known temperature record can bias your answers to:

  1. Make a “proxy reconstruction” based on nothing but noise match the thermometer record and
  2. Make the historical temperature variations look flat and unvarying.

So, Hu is correct. If you screen out “bad” proxies based on a match to the current temperature record, you bias your answer. Given the appearance of the thermometer record during the 20th century, you bias toward hockey sticks!

Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.

Update

I modified the script to show the number of proxies in the “blue” and “red” reconstructions. Here is one case; the second will be uploaded in a sec.

Figures 4 and 5: The two cases, annotated with the number of proxies in the "blue" and "red" reconstructions.

Steve McIntyre writes in comments:

Steve McIntyre (Comment#21669)

October 15th, 2009 at 4:24 pm

Lucia, in addition to Jeff Id, this phenomenon has now been more or less independently reported by Lubos, David Stockwell and myself. David published an article on the phenomenon in AIG News, online at http://landshape.org/enm/wp-co…..6%2014.pdf . We cited this paper in our PNAS comment (as one of our 5 citations.) I don’t have a link for Lubos on it, but he wrote about it.

I mentioned this phenomenon in a post prior to the start of Climate Audit that was carried forward from my old website from Dec 2004 http://www.climateaudit.org/?p=9, where I remarked on this phenomenon in connection with Jacoby and D’Arrigo picking the 10 most “temperature-sensitive” out of 35 that they sampled as follows:

If you look at the original 1989 paper, you will see that Jacoby “cherry-picked” the 10 “most temperature-sensitive” sites from 36 studied. I’ve done simulations to emulate cherry-picking from persistent red noise and consistently get hockey stick shaped series, with the Jacoby northern treeline reconstruction being indistinguishable from simulated hockey sticks. The other 26 sites have not been archived. I’ve written to Climatic Change to get them to intervene in getting the data. Jacoby has refused to provide the data. He says that his research is “mission-oriented” and, as an ex-marine, he is only interested in a “few good” series.

===

Read the whole post at Lucia’s blog here

I encourage readers to try these experiments in hockey stick construction themselves. – Anthony


Comments
October 19, 2009 2:51 pm

Joel Shore (12:46:57) :
Dave Middleton: Your A) – D) scenario seems to rely on assuming that even if you are wrong, the effects of climate change are less severe than most projections and that the economic effects of mitigating climate change are much, much larger than most projections.
It will be really easy to tell if I’m wrong… The satellite temperature data will quickly revert to the pre-2003 trend within the next few years.
Indeed it will. Here is a plot where I added in a few more linear fits over similar time periods to see how well similar extrapolations might have done in the past…

You can actually “fit” all of the warming trend in the UAH series into one 63-month period from January 1995 to March 2000. The trend was flat before January 1995 and after March 2000.

October 20, 2009 1:26 am

michel (00:33:18) : The thing that is destroying the AGW movement from within is that it cannot admit any error, ever, past or present, by any member of the Nomenklatura…
well put, whole post, deserves QOTW.

MrAce
October 20, 2009 12:09 pm

@Patrik
“Please tell us how Mann et al knows that one should expect ~13% conformity with measurements from a purely random source?”
Generate 100 groups of 100 random series and count, for each group, the series that match. Average this over the 100 groups. This number turned out to be 13%. So if your group of 100 series has 40 matches, it is probably not random.
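For readers who want to try the Monte Carlo MrAce describes, here is a rough sketch. The persistence, the window length, and the definition of a “match” (|R| > 0.5) are all assumptions, so the number it prints need not come out to 13%.

```python
import numpy as np

rng = np.random.default_rng(2)

def red_noise(n, r, rng):
    """AR(1) red noise: x[t] = r * x[t-1] + e[t]."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + rng.normal()
    return x

n, r, cutoff = 39 * 12, 0.995, 0.5   # window, persistence, cutoff: assumed
counts = []
for _ in range(100):                 # 100 groups...
    target = red_noise(n, r, rng)
    hits = sum(abs(np.corrcoef(red_noise(n, r, rng), target)[0, 1]) > cutoff
               for _ in range(100))  # ...of 100 random series each
    counts.append(hits)
print(f"average matches per 100 series: {np.mean(counts):.0f}")
```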

Patrick
November 23, 2009 8:41 pm

@MrAce, P
MrAce said, “Generate 100 groups of 100 random series and count, for each group, the series that match. Average this over the 100 groups. This number turned out to be 13%.”
What would be the result of this type of analysis if there is actually no correlation between temperature and tree growth but there is a strong correlation between tree growth in one period and the next? (If month x was a good growth month for a tree, isn’t month x+1 more likely to be a good growth month?) If this were true, wouldn’t the bar be set drastically too low by this random-generation screening method?
To test for selection bias, it seems to me you’d at least have to figure out how to measure/model this inertia or self-correlation or whatever you’d like to call it. This seems hard, but perhaps doable. I haven’t read about anything like this being accounted for, but if it was done, I’d be very interested in the methodology. It sounds like a very interesting problem.
I might look at the dendrology data used for this study and try to figure out a good way to estimate the self-correlation in growth and the effect on the number of tree series that would match if there were no true correlation between temperature and tree growth. Any suggestions for how to do this would be appreciated. I’m already thinking that I might want to look at both pre-industrial and post-industrial periods to see if the self-correlation is the same. I could probably find this on my own, but does anyone have a link to the dendrology data in a nice format (.csv or Excel would be ideal for me, but I’m flexible)?
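Patrick’s intuition is easy to test in simulation: the more persistent the series, the fewer effective degrees of freedom in the correlation, and the more often pure noise clears a fixed bar. A minimal sketch (all parameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def red_noise(n, r, rng):
    """AR(1) red noise: x[t] = r * x[t-1] + e[t]."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + rng.normal()
    return x

n, cutoff, trials = 39 * 12, 0.5, 200   # all parameters assumed
target = red_noise(n, 0.995, rng)       # one fixed, persistent "target"
for r in (0.0, 0.9, 0.99, 0.995):
    hits = sum(abs(np.corrcoef(red_noise(n, r, rng), target)[0, 1]) > cutoff
               for _ in range(trials))
    print(f"lag-1 autocorrelation {r:5.3f}: "
          f"{100 * hits / trials:.0f}% spurious matches")
```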

November 28, 2009 7:29 am

CO2 is plant food.