This post over at Lucia's Blackboard caught my interest because it shows you don't need to operate heavy equipment (like Al Gore's famous elevator scene) to get yourself a big data hockey stick. It is more evidence that Mann and Briffa's selection criteria for trees lead to spurious results.
It doesn't matter whether you are using treemometers, thermometers, stock prices, or wheat futures: if your method isn't carefully designed to prevent an artificial and unintentional selection of data that correlates with your theory, the whole study can turn out like a hockey stick. Anyone can do this; no special science skills are needed.
So even if the method seems reasonable and the person doing it doesn't intend to cherry-pick, rejecting trees that don't correlate with the recent record biases the analysis unless some very sophisticated precautions are taken. It encourages spurious results and, in the context of the whole "hockey stick" controversy, effectively imposes hockey sticks on the results.
And she backs it up with a simple experiment anybody can do with Microsoft Excel.
Method of creating hockey stick reconstructions out of nothing
To create “hockey stick” reconstructions out of nothing, I’m going to do this:
- Generate roughly 148 years' worth of monthly "tree-ring" data using rand() in EXCEL. This corresponds to 1850-1998. I will impose autocorrelation with r=0.995. I'll repeat this 154 times. (This number is chosen arbitrarily.) On the one hand, we know these functions don't correlate with Hadley because they are synthetically generated. However, we are going to pretend we believe "some" are sensitive to temperature and see what sort of reconstruction we get.
- To screen out the series that prove themselves insensitive to temperature, I'll compute the correlation, R, between the Hadley monthly temperature data and the tree-ring data for each of the 154 series. To show the problem with this method, I will compute the correlation only over the years from 1960-1998. Then, I'll keep all series whose correlations have absolute values greater than 1.2 times the standard deviation of the 154 correlations R. I'll assume the other randomly generated monthly series are "not sensitive" to temperature and ignore them. (Note: The series with negative values of R are the equivalent of "upside down" proxies.)
- I'll create a proxy by simply averaging over the "proxies" that passed the test just described. I'll rebaseline so the average temperature and trends for the proxy and Hadley match between 1960-1998.
- I'll plot the average from the proxies and compare it to Hadley. (A minimal code sketch of these steps appears after this list.)
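For readers who want to try this outside of EXCEL, here is a minimal Python/numpy sketch of the four steps above. It is my own illustration rather than Lucia's spreadsheet: the "observed temperature" series is a synthetic placeholder for the Hadley record, and the rebaselining is simplified to a mean/variance match over the calibration window.

```python
import numpy as np

rng = np.random.default_rng(0)

n_months = 149 * 12   # monthly data, roughly 1850-1998
n_series = 154        # number of random "tree-ring" series
r = 0.995             # imposed lag-1 autocorrelation (red noise)

def red_noise(n, r, rng):
    """AR(1) series: x[t] = r * x[t-1] + white noise."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = r * x[t - 1] + eps[t]
    return x

# Step 1: generate the synthetic "proxies".
proxies = np.array([red_noise(n_months, r, rng) for _ in range(n_series)])

# Placeholder "observed temperature": red noise plus a gentle trend, standing
# in for the Hadley record (which is not reproduced here).
temps = red_noise(n_months, 0.9, rng) + np.linspace(0.0, 1.0, n_months)

months = 1850 + np.arange(n_months) / 12.0
cal = (months >= 1960) & (months < 1999)   # screening/calibration window

# Step 2: screen on correlation with "temperature" over 1960-1998 only.
R = np.array([np.corrcoef(p[cal], temps[cal])[0, 1] for p in proxies])
keep = np.abs(R) > 1.2 * R.std()

# Step 3: average the survivors, flipping the negatively correlated
# ("upside-down") ones, then rebaseline to the calibration window.
selected = proxies[keep] * np.sign(R[keep])[:, None]
recon = selected.mean(axis=0)
recon = (recon - recon[cal].mean()) / recon[cal].std() * temps[cal].std() + temps[cal].mean()

# Step 4: compare the reconstruction with the "temperature" series.
print(f"{keep.sum()} of {n_series} red-noise series passed screening")
```

Plotting `recon` against `temps` should reproduce the qualitative picture described below: good agreement inside the screening window, and a nearly flat line before it.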
The comparison from one (typical) case is shown below. The blue curve is the "proxy reconstruction"; the yellow is the Hadley data (all data are 12-month smoothed).
Notice that after 1960, the blue curve based on the average of “noise” that passed the test mimics the yellow observations. It looks good because I screened out all the noise that was “not sensitive to temperature”. (In reality, none is sensitive to temperature. I just picked the series that didn’t happen to fail. )
Because the “proxies” really are not sensitive to temperature, you will notice there is no correspondence between the blue “proxy reconstruction” and the yellow Hadley data prior to 1960. I could do this exercise a bajillion times and I’ll always get the same result. After 1960, there are always some “proxies” that by random chance correlate well with Hadley. If I throw away the other “proxies” and average over the “sensitive” ones, the series looks like Hadley after 1960. But before 1960? No dice.
Also notice that when I do this, the "blue proxy reconstruction" prior to 1960 is quite smooth. In fact, because the proxies are not sensitive, the past history prior to the "calibration" period looks unchanging. If the current period has an uptick, applying this method to red noise will make the current uptick look "unprecedented". (The same would happen if the current period had a downturn, except we'd have unprecedented cooling.)
The red curve
Are you wondering what the red curve is? Well, after screening once, I screened again. This time, I looked at all the proxies making up the "blue" curve and checked whether they correlated with Hadley during the period from 1900-1960. If they did not, I threw them away. Then I averaged to get the red line. (I did not rebaseline again.)
The purpose of the second step is to “confirm” the temperature dependence.
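Continuing the sketch above, the second screening pass might look like the following. This again is only an illustration: the positive-correlation cutoff for the 1900-1960 window is a placeholder, since the post doesn't state the exact rule used.

```python
# Second screening pass over 1900-1960, reusing the arrays from the sketch
# above. The "> 0" cutoff is a placeholder, not necessarily Lucia's exact rule.
verif = (months >= 1900) & (months < 1960)

R2 = np.array([np.corrcoef(p[verif], temps[verif])[0, 1] for p in selected])
confirmed = selected[R2 > 0]         # discard series that fail the earlier window

red_recon = confirmed.mean(axis=0)   # no second rebaselining, as in the post
print(f"{len(confirmed)} of {len(selected)} screened series survive the second pass")
```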
Having done this, I get a curve that sort of looks like Hadley from 1900-1960. That is: the wiggles sort of match. The "red proxy reconstruction" looks very much like Hadley after 1960: both the "wiggles" and the "absolute values" match. It's also "noisier" than the blue curve; that's because it contains fewer "proxies".
But notice that prior to 1900, the wiggles in the red proxy and the yellow Hadley data don't match. (Also, the red proxy wants to "revert to the mean.")
Can I do this again? Sure. Here are the two plots created on the next two “refreshes” of the EXCEL spreadsheet:
I can keep doing this over and over. Some “reconstructions” look better; some look worse. But these don’t look too shabby when you consider that none of the “proxies” are sensitive to temperature at all. This is what you get if you screen red noise.
Naturally, if you use real proxies that contain some signal, you should do better than this. But knowing you can get this close with nothing but noise should make you suspect that screening based on a known temperature record can bias your answers to:
- Make a “proxy reconstruction” based on nothing but noise match the thermometer record and
- Make the historical temperature variations look flat and unvarying.
So, Hu is correct. If you screen out “bad” proxies based on a match to the current temperature record, you bias your answer. Given the appearance of the thermometer record during the 20th century, you bias toward hockey sticks!
Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.
Update
I modified the script to show the number of proxies in the "blue" and "red" reconstructions. Here is one case; the second will be uploaded in a sec.
Steve McIntyre writes in comments:
Steve McIntyre (Comment#21669)
October 15th, 2009 at 4:24 pm
Lucia, in addition to Jeff Id, this phenomenon has now been more or less independently reported by Lubos, David Stockwell and myself. David published an article on the phenomenon in AIG News, online at http://landshape.org/enm/wp-co…..6%2014.pdf . We cited this paper in our PNAS comment (as one of our 5 citations.) I don’t have a link for Lubos on it, but he wrote about it.
I mentioned this phenomenon in a post prior to the start of Climate Audit, carried forward from my old website from Dec 2004, http://www.climateaudit.org/?p=9, where I remarked on it in connection with Jacoby and D'Arrigo picking the 10 most "temperature-sensitive" out of 35 that they sampled, as follows:
If you look at the original 1989 paper, you will see that Jacoby “cherry-picked” the 10 “most temperature-sensitive” sites from 36 studied. I’ve done simulations to emulate cherry-picking from persistent red noise and consistently get hockey stick shaped series, with the Jacoby northern treeline reconstruction being indistinguishable from simulated hockey sticks. The other 26 sites have not been archived. I’ve written to Climatic Change to get them to intervene in getting the data. Jacoby has refused to provide the data. He says that his research is “mission-oriented” and, as an ex-marine, he is only interested in a “few good” series.
===
Read the whole post at Lucia’s blog here
I encourage readers to try these experiments in hockey stick construction themselves. – Anthony
Steve-M: Jacoby has refused to provide the data. He says that his research is “mission-oriented” and, as an ex-marine, he is only interested in a “few good” series
Lol, of all the lame excuses for poor scientific method I’ve ever heard from the warmista… Very VERY revealing.
Jacoby has earned a couple of lines in my next satirical song.
Yes, once again we see the power of the patented Mann-O-matic proxy fitting AlGore-ithm.
It's amazing how the religious among us avert their gazes from this obvious torture of numbers. Kinda like the Spanish Inquisition.
Sort of like Michelangelo’s elephant. Just cut away all the parts that don’t look like an elephant.
Look, I am not a scientist, but this whole thing smacks to me of snake oil. To believe that you can take trees, get temperatures from them precise enough to splice with modern records, and obtain measurements to the tenth of a degree is delusional. Perhaps we should just chuck out our thermometers and use trees instead. Could a double-blind test be done? If not, it is not worth anything! I very much appreciate the efforts to debunk it, but now we need to get this bunkum before the general public; that is the real problem. I have asked on blogs when it was that the climate did not change, and what would be an ideal global average temperature. I get abuse from the warmists, and no answers. Showing the lie is simple; putting it to Joe Public is not, for he is beset by superstition.
"Only interested in a few good series" and "mission oriented" speak volumes. The refusal to provide data for replication is inexcusable. When are the governing bodies going to admonish these people? After science is dead? Methinks the defibrillator is needed now.
Very understandable. The problem, however, is when these issues are pointed out to the perpetrators, and summarily ignored. Their papers should also be ignored. End of story.
And they want to take our money to mitigate data cherry picking?
Once again, the folly of the statistically unwashed attempting to do statistics is demonstrated. This is what I was trying to argue with Tom P (or whatever) a while ago, but they just don’t get it.
Our universities churn out "scientists" on an assembly line (especially biologists) who remain ignorant of math, and these are the results we end up with.
I think there is a data problem in the sense that a yearly temperature metric is essentially a yearly statistic derived from a monthly statistic, which is derived from a daily statistic computed from raw temperature measurements over the day. To then compute the ubiquitous temperature anomaly, another derived statistic, the 30-year average, is subtracted from the very statistics from which it was computed.
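As a concrete illustration of the chain of derived statistics the comment describes, here is a toy sketch; the input temperatures are fabricated, and only the layering of averages is the point.

```python
# Toy illustration: daily stats -> monthly stats -> yearly stats, then an
# anomaly relative to a 30-year base period. All numbers are fabricated.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1850, 1999)
daily = 10.0 + 5.0 * rng.standard_normal((len(years), 12, 30))  # fake daily means

monthly = daily.mean(axis=2)               # daily statistic -> monthly statistic
yearly = monthly.mean(axis=1)              # monthly statistic -> yearly statistic
base = (years >= 1961) & (years <= 1990)   # a conventional 30-year baseline
anomaly = yearly - yearly[base].mean()     # subtract the average derived from it
```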
So, in a nutshell, we are looking at the variation of a statistic over time compared to equally nebulous metrics derived from tree rings. That different conclusions could be reached from this data depending on the choice of time frame, a sort of quasi-cherry-picking, tells me that these derived data are simply random.
Lucia's work seems to bear this out as well, and rather than carp on the methodologies used to analyse the data, it's in the original data collection stage that the problems start.
This happens when the raw data (temperature) are intensive variables. Intensive variables do not represent physical quantities, and while you can subject them to statistical analysis because they are expressed as numbers, it's physically meaningless, no better than doing a stats analysis of the phone numbers for Los Angeles.
I personally don’t think the Team are doing this purposefully – they just don’t understand the physical science behind their maths.
Yes, this was an excellent posting by Lucia. Very clear. The thing people with less stats background find difficult to see is: when you do the selection of proxies to find which ones are good temperature readers, you are not ‘picking the good temperature readers’. You are just picking the ones that correlate well for the period you have temps for.
As Lucia says in the comments, there is nothing wrong with picking some trees rather than others. It is just that if you are using them as a proxy for temps, you cannot have temps as your selection criterion. This sounds totally unintuitive. The lay person's natural impulse is to say: that is crazy, of course you should use temps, that is what you are interested in.
Well no, because making the selection using temps in effect assumes what is required to be proven: that the same trees that correlate with temps in the period you use for selection also did so in the period you are trying to reconstruct. And the fact that they did in the selection period tells you nothing about whether they did in the period you are trying to use them to measure.
Now, if you picked them on some other basis, like being long-lived or well formed, and then looked at the temp reconstruction that resulted, that might be legitimate, provided you could show some theoretical reason why that would make them better thermometers. Then the correlation of such an independently selected series with a given set of temp measurements would have some validation value.
The easy way to see it is to ask yourself: why does this correlation of temp with tree rings for this sample for the years 18xx to 19xx make me think that there will be the same correlation for the years 12xx to 14xx? As soon as you ask that, you see that you cannot get to any greater certainty about the second hypothesis by selecting on the basis of the first correlation.
“… they just don’t understand the physical science behind their maths.”
Or perhaps, if John Reid is to be believed …
http://www.quadrant.org.au/magazine/issue/2009/10/climate-modelling-nonsense
they don’t understand the maths behind the physical science 🙂
Louis Hissink wrote: “I personally don’t think the Team are doing this purposefully – they just don’t understand the physical science behind their maths.”
I’ve said it before (in the post directly above yours, at the latest), and now I’ll say it again, directly.
It’s not that these people don’t understand the physical science behind their math (they are highly educated, and probably very smart, professional scientists), it’s that they don’t understand the MATH behind their science.
Lucia:
“Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.”
This issue has indeed been thought about very carefully. For example, Mann and coworkers address it in the supplement to their 2008 paper on proxies:
http://www.pnas.org/content/suppl/2008/09/02/0805721105.DCSupplemental/0805721105SI.pdf
“Although 484 (~40%) pass the temperature screening process over the full (1850 –1995) calibration interval, one would expect that no more than ~150 (~13%) of the proxy series would pass the screening procedure described above by chance alone. This observation indicates that selection bias, although potentially problematic when employing screened predictors… does not appear a significant problem in our case.”
I've just started to read all the articles on this, especially Steve McIntyre's work, and am still just flabbergasted. It seems to me that the hockey players are breaking the most fundamental principles of statistics (w.r.t. randomness of samples and using separate sample sets for model building and the actual tests), but I need to read a little more on PCA on time series to get a grasp of this.
I once worked with a professional statistician. One day I asked him what job, and with what employer, statisticians aspire to.
His reply was short and quick: the Tobacco Institute.
This "climate studies" field seems to be teeming with Tobacco Institute refugees now that the shine is off its previous beauty.
It’s not just hockey sticks.
When I attended Caltech, in our freshman chemistry class we were treated to a lecture that is eventually given to everyone who graduates from Caltech, on the famous Millikan Oil Drop Experiment. The lecture was originally derived from Richard Feynman's 1974 commencement address on cargo cult science, and it goes something like this:
Robert Millikan (after whom the Millikan Library at Caltech is named) originally devised an experiment using drops of oil suspended in an electric field in order to measure the charge of an electron: if you know the field used to suspend an oil drop and you know its mass (by allowing it to fall in the absence of a field), you can calculate the force due to the charge on the drop, and by examining the different values of those charges you can extrapolate the electric charge of an electron.
Now, to quote Dr. Feynman:
“We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It’s a little bit off because he had the incorrect value for the viscosity of air. It’s interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan’s, and the next one’s a little bit bigger than that, and the next one’s a little bit bigger than that, until finally they settle down to a number which is higher.
“Why didn’t they discover the new number was higher right away? It’s a thing that scientists are ashamed of – this history – because it’s apparent that people did things like this: When they got a number that was too high above Millikan’s, they thought something must be wrong – and they would look for and find a reason why something might be wrong. When they got a number close to Millikan’s value they didn’t look so hard. And so they eliminated the numbers that were too far off, and did other things like that. We’ve learned those tricks nowadays, and now we don’t have that kind of a disease.”
With all due respect to the late Dr. Feynman, it’s clear we still have that kind of a disease.
As an aside, one of the things that saddens me about most scientific educations is that most scientists never hear the story of the Millikan Oil Drop Experiment, or see the graph, shown in my chemistry class, giving the “accepted” value of the electron charge plotted over time, which shows this beautiful sinusoidal curve, with the largest uptick right around 1953.
Tom P
And how, exactly, are you going to show how many should correlate on a purely chance basis? Think about it!
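For what it's worth, one common way to put a number on "passing by chance alone" is a Monte Carlo test against a red-noise null. The sketch below is only illustrative: the series length, autocorrelation, cutoff, and trial count are placeholders rather than the settings Mann et al. actually used, and the answer depends heavily on exactly those choices, which is the point of the question above.

```python
# Monte Carlo estimate of a chance-alone pass rate against a red-noise null.
# All numbers (length, r, cutoff, trial count) are placeholders.
import numpy as np

rng = np.random.default_rng(2)

def red_noise(n, r):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = r * x[t - 1] + rng.standard_normal()
    return x

n_years = 146                        # 1850-1995 calibration interval
target = red_noise(n_years, 0.9)     # stand-in for the instrumental record

trials, passes = 2000, 0
for _ in range(trials):
    if abs(np.corrcoef(red_noise(n_years, 0.9), target)[0, 1]) > 0.1:
        passes += 1                  # passed the placeholder screening rule

print(f"Estimated chance pass rate: {passes / trials:.1%}")
```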
The reason this method is not so obviously flawed to the eye seems to be that there is a springiness to the median line that prevents it from very suddenly jerking down below the median when the thermometer data begins, in this case in 1960. Can the time period over which the randomness varies be tweaked to alter this transition? Were a much larger number of random graphs used, two things would stand out clearly and give a casual viewer pause: (1) there would be a PERFECT match to the thermometer data right down to what is clearly noise, while there would be no visible noise in the pre-thermometer data at all, and (2) the dip right before temperature data appears would either be an abrupt 90-degree one or there would be none at all, meaning a completely and perfectly impossible horizontal "temperature" line followed by a perfect match of noisy thermometer data. So even using this method as is, if many hundreds of "good temperature signal" trees were used instead of fewer than a dozen each time, the graph itself would lack the noise required to look like data instead of obviously not at all like data.
My curiosity is really about why there is any springiness to the baseline at all. It is that very dip that makes a hockey stick look like real data, and thus it is that dip (along with some noise) that fools the eye into not automatically doubting whether what you are looking at is too artificial. Imagine a yellow graph with pre-1960 data replaced by a ruler-drawn horizontal line.
The solution would be to go outside and get 10X-100X as much data, then accept their method and make a new hockey stick that thus becomes too perfect a match to thermometer data NOISE to be taken seriously!
I remember that picture of cranes and cherry pickers. It was from an auction of unused and repoed construction equipment because of the real estate bust.
Wow! Great explanation!
I haven’t even begun to grasp the real meaning of the critique of the “excluding bad matches”-method before…
But I believe I see it clearly now. I think… 😮
Does this mean that if one were to move the 1960-1990 Hadley temp data to the horizontal centre of the graph, then both ends on either side of it would "straighten out"?
I.e., if I select series that match the 1960-1990 Hadley data but treat them as if they represent, say, 1930-1960, would I then get a rather straight line with a bump in the middle instead of a hockey stick?
Because the “unmatched” parts of the series are more “truly random” than the matched data and will therefore always be represented by a somewhat straight line?
And; the more series one adds the smoother the line “outside” the matched area will be!?
This means that the more proxy studies they add – the more straight the blade of the hockey stick will become! 😀
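The conjecture is easy to test with the sketch in the post above (same variables; the 1930-1960 window here is just the commenter's hypothetical case):

```python
# Re-run the screening from the earlier sketch against a window in the middle
# of the record instead of at the end (the hypothetical 1930-1960 case).
mid = (months >= 1930) & (months < 1960)

R_mid = np.array([np.corrcoef(p[mid], temps[mid])[0, 1] for p in proxies])
keep_mid = np.abs(R_mid) > 1.2 * R_mid.std()
bump = (proxies[keep_mid] * np.sign(R_mid[keep_mid])[:, None]).mean(axis=0)

# Plotted against `months`, `bump` should show structure only inside the
# screening window, with the averaged red noise flattening out on either side.
```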
Neo says:
Actually, the Tobacco Institute refugees seem to be mainly on the “skeptic” side of the climate change issue. These include Steven Milloy of JunkScience.com ( http://www.sourcewatch.org/index.php?title=Steven_Milloy ), the late Frederick Seitz ( http://www.sourcewatch.org/index.php?title=Frederick_Seitz ), and S. Fred Singer ( http://www.sourcewatch.org/index.php?title=Fred_Singer ).
As for the current post, the important point is made by Tom P: the potential for this sort of bias has been understood, and that is why it is controlled for or various verification methods are used. Are those methods sufficient? I haven't investigated well enough to know. However, that should be the question, rather than just illustrating this fact that has been understood and then not looking into how it has been dealt with.
Tom P>> “would expect that no more than ~150 (~13%) of the proxy series would pass the screening procedure”
Interesting – how would they know this?
[snip – moved to the correct thread where we cover this, thanks for the video tip – Anthony]
Tom P (14:31:51) : I suggest that you read the contemporary CA posts on Mann et al 2008 (See the Categories in the left frame.) To say that “Mann and co-workers” dealt with this issue “very carefully” is laughable.