How to trick yourself into unintentional cherry picking to make hockey sticks

This post over at Lucia’s Blackboard caught my interest because it shows you don’t need to operate heavy equipment (like Al Gore’s famous elevator scene) to get yourself a big data hockey stick. More evidence that Mann and Briffa’s selection criteria for trees lead to spurious results.

[Image: cherry_pickers]

It doesn’t matter whether you are using treemometers, thermometers, stock prices, or wheat futures: if your method isn’t carefully designed to prevent an artificial and unintentional selection of data that correlates with your theory, the whole study can turn out like a hockey stick. Anyone can do this; no special science skills are needed.

Lucia writes:

So, even though the method seems reasonable, and the person doing it doesn’t intend to cherry pick, if they don’t do some very sophisticated things, rejecting trees that don’t correlate with the recent record biases an analysis. It encourages spurious results, and in the context of the whole “hockey stick” controversy, effectively imposes hockey sticks on the results.

And she backs it up with a simple experiment anybody can do with Microsoft Excel.

Method of creating hockey stick reconstructions out of nothing

To create “hockey stick” reconstructions out of nothing, I’m going to do this:

  1. Generate roughly 148 years’ worth of monthly “tree-ring” data using RAND() in Excel. This corresponds to 1850-1998. I will impose autocorrelation with r=0.995. I’ll repeat this 154 times. (This number is chosen arbitrarily.) On the one hand, we know these functions don’t correlate with Hadley because they are synthetically generated. However, we are going to pretend we believe “some” are sensitive to temperature and see what sort of reconstruction we get.
  2. To screen out the series that prove themselves insensitive to temperature, I’ll compute the correlation, R, between Hadley monthly temperature data and the tree-ring data for each of the 154 series. To show the problem with this method, I will compute the correlation only over the years from 1960-1998. Then, I’ll keep all series whose correlations have absolute values greater than 1.2 times the standard deviation of the 154 correlations R. I’ll assume the other randomly generated monthly series are “not sensitive” to temperature and ignore them. (Note: The series with negative values of R are the equivalent of “upside down” proxies.)
  3. I’ll create a reconstruction by simply averaging the “proxies” that passed the test just described. I’ll rebaseline so the average temperature and trend for the proxy and Hadley match between 1960-1998.
  4. I’ll plot the average from the proxies and compare it to Hadley.
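For readers without Excel handy, the recipe above can be sketched in Python. This is my sketch, not Lucia’s actual spreadsheet: the “Hadley” series here is a made-up trend-plus-noise stand-in, and the sign-flip of negative correlators and the mean/std rebaselining are my assumptions about details the post leaves open.

```python
import numpy as np

rng = np.random.default_rng(0)

n_months = 149 * 12      # monthly data, 1850-1998
n_series = 154           # arbitrary, as in the post
r = 0.995                # imposed lag-1 autocorrelation

def red_noise(n, r, rng):
    """AR(1) red noise with lag-1 autocorrelation r and unit variance."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + np.sqrt(1.0 - r * r) * rng.standard_normal()
    return x

# Step 1: 154 synthetic "tree-ring" series containing no temperature signal
proxies = np.array([red_noise(n_months, r, rng) for _ in range(n_series)])

# Stand-in for the Hadley record (made up): flat, then a 20th-century rise
months = np.arange(n_months)
hadley = 0.0003 * np.clip(months - 60 * 12, 0, None) + 0.1 * rng.standard_normal(n_months)

# Step 2: correlate each series with "Hadley" over 1960-1998 only,
# then keep those with |R| > 1.2 * std(R)
cal = slice((1960 - 1850) * 12, n_months)
R = np.array([np.corrcoef(p[cal], hadley[cal])[0, 1] for p in proxies])
keep = np.abs(R) > 1.2 * R.std()

# Flip "upside down" (negative-R) proxies so they reinforce rather than cancel
oriented = proxies[keep] * np.sign(R[keep])[:, None]

# Step 3: average the survivors and rebaseline to the calibration window
recon = oriented.mean(axis=0)
recon = (recon - recon[cal].mean()) / recon[cal].std() * hadley[cal].std() + hadley[cal].mean()

# Step 4: the "reconstruction" tracks Hadley after 1960 but not before
print(f"{keep.sum()} of {n_series} series pass the screen")
print(f"1960-1998 correlation: {np.corrcoef(recon[cal], hadley[cal])[0, 1]:.2f}")
print(f"pre-1960 correlation:  {np.corrcoef(recon[:cal.start], hadley[:cal.start])[0, 1]:.2f}")
```

Run it a few times with different seeds: the calibration-period correlation is always strong, while the pre-1960 correlation bounces around zero.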

The comparison from one (typical) case is shown below. The blue curve is the “proxy reconstruction”; the yellow is the Hadley data (all data are 12-month smoothed).

Figure 1: "Typical" hockey stick generated by screening synthetic red noise.

Notice that after 1960, the blue curve based on the average of “noise” that passed the test mimics the yellow observations. It looks good because I screened out all the noise that was “not sensitive to temperature”. (In reality, none is sensitive to temperature. I just picked the series that didn’t happen to fail.)

Because the “proxies” really are not sensitive to temperature, you will notice there is no correspondence between the blue “proxy reconstruction” and the yellow Hadley data prior to 1960. I could do this exercise a bajillion times and I’d always get the same result. After 1960, there are always some “proxies” that by random chance correlate well with Hadley. If I throw away the other “proxies” and average over the “sensitive” ones, the series looks like Hadley after 1960. But before 1960? No dice.

Also notice that when I do this, the “blue proxy reconstruction” prior to 1960 is quite smooth. In fact, because the proxies are not sensitive, the past history prior to the “calibration” period looks unchanging. If the current period has an uptick, applying this method to red noise will make the current uptick look “unprecedented”. (The same would happen if the current period had a downturn, except we’d have unprecedented cooling.)
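The pre-calibration flatness is just averaging of independent noise: the mean of N independent series has roughly 1/√N the spread of any single series, and the screening constrains the series only inside the calibration window. A quick check of that arithmetic (my sketch, using the same AR(1) recipe):

```python
import numpy as np

rng = np.random.default_rng(1)

def red_noise(n, r, rng):
    # AR(1) red noise with lag-1 autocorrelation r and unit variance
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + np.sqrt(1.0 - r * r) * rng.standard_normal()
    return x

n, r, N = 149 * 12, 0.995, 30
series = np.array([red_noise(n, r, rng) for _ in range(N)])

# Each individual series wanders, but their average is far flatter,
# because the independent wanderings cancel (roughly a 1/sqrt(N) factor)
print("typical single-series std:", round(series.std(axis=1).mean(), 3))
print("std of the N-series mean: ", round(series.mean(axis=0).std(), 3))
```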

The red curve

Are you wondering what the red curve is? Well, after screening once, I screened again. This time, I looked at all the proxies making up the “blue” curve, and checked whether they correlated with Hadley during the period from 1900-1960. If they did not, I threw them away. Then I averaged to get the red line. (I did not rebaseline again.)

The purpose of the second step is to “confirm” the temperature dependence.

Having done this, I get a curve that sort of looks like Hadley from 1900-1960. That is: the wiggles sort of match. The “red proxy reconstruction” looks very much like Hadley after 1960: both the “wiggles” and the “absolute values” match. It’s also “noisier” than the blue curve; that’s because it contains fewer “proxies”.

But notice that prior to 1900, the wiggles in the red proxy and the yellow Hadley data don’t match. (Also, the red proxy wants to “revert to the mean.”)
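The two-pass screen behind the red curve can be sketched the same way. Lucia doesn’t give her exact second-pass criterion, so the 1.2-standard-deviation cut and the same-sign requirement below are my assumptions, as is the red-noise stand-in for the Hadley record:

```python
import numpy as np

rng = np.random.default_rng(2)

def red_noise(n, r, rng):
    # AR(1) red noise with lag-1 autocorrelation r
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + np.sqrt(1.0 - r * r) * rng.standard_normal()
    return x

n = 149 * 12                        # monthly, 1850-1998
proxies = np.array([red_noise(n, 0.995, rng) for _ in range(154)])
target = red_noise(n, 0.99, rng)    # stand-in "Hadley" record

def month(year):                    # helper: calendar year -> month index
    return (year - 1850) * 12

def corr(p, window):
    return np.corrcoef(p[window], target[window])[0, 1]

# Pass 1 (the "blue" set): screen on 1960-1998 correlation
late = slice(month(1960), n)
R1 = np.array([corr(p, late) for p in proxies])
blue = np.abs(R1) > 1.2 * R1.std()

# Pass 2 (the "red" set): of the survivors, keep only those that also
# correlate, with the same sign, over 1900-1960 -- the "confirmation" step
early = slice(month(1900), month(1960))
R2 = np.array([corr(p, early) for p in proxies])
red = blue & (R1 * R2 > 0) & (np.abs(R2) > 1.2 * R2.std())

print("blue set:", blue.sum(), "series   red set:", red.sum(), "series")
```

The second pass shrinks the set further, which is why the red curve is noisier, and the selection effect simply moves the good-looking fit back to 1900 without helping before that.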

Can I do this again? Sure. Here are the two plots created on the next two “refreshes” of the Excel spreadsheet:

[Image: Hockey2]

[Image: Hockey3]

I can keep doing this over and over. Some “reconstructions” look better; some look worse. But these don’t look too shabby when you consider that none of the “proxies” are sensitive to temperature at all. This is what you get if you screen red noise.

Naturally, if you use real proxies that contain some signal, you should do better than this. But knowing you can get this close with nothing but noise should make you suspect that screening based on a known temperature record can bias your answers to:

  1. Make a “proxy reconstruction” based on nothing but noise match the thermometer record and
  2. Make the historical temperature variations look flat and unvarying.

So, Hu is correct. If you screen out “bad” proxies based on a match to the current temperature record, you bias your answer. Given the appearance of the thermometer record during the 20th century, you bias toward hockey sticks!

Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.

Update

I modified the script to show the number of proxies in the “blue” and “red” reconstructions. Here is one case; the second will be uploaded in a sec.

[Image: Hockey4]

[Image: Hockey]

Steve McIntyre writes in comments:

Steve McIntyre (Comment#21669)

October 15th, 2009 at 4:24 pm

Lucia, in addition to Jeff Id, this phenomenon has now been more or less independently reported by Lubos, David Stockwell and myself. David published an article on the phenomenon in AIG News, online at http://landshape.org/enm/wp-co…..6%2014.pdf . We cited this paper in our PNAS comment (as one of our 5 citations.) I don’t have a link for Lubos on it, but he wrote about it.

I mentioned this phenomenon in a post prior to the start of Climate Audit, carried forward from my old website from Dec 2004 (http://www.climateaudit.org/?p=9), where I remarked on it in connection with Jacoby and D’Arrigo picking the 10 most “temperature-sensitive” of the 35 that they sampled, as follows:

If you look at the original 1989 paper, you will see that Jacoby “cherry-picked” the 10 “most temperature-sensitive” sites from 36 studied. I’ve done simulations to emulate cherry-picking from persistent red noise and consistently get hockey stick shaped series, with the Jacoby northern treeline reconstruction being indistinguishable from simulated hockey sticks. The other 26 sites have not been archived. I’ve written to Climatic Change to get them to intervene in getting the data. Jacoby has refused to provide the data. He says that his research is “mission-oriented” and, as an ex-marine, he is only interested in a “few good” series.

===

Read the whole post at Lucia’s blog here

I encourage readers to try these experiments in hockey stick construction themselves. – Anthony

tallbloke

Steve-M: Jacoby has refused to provide the data. He says that his research is “mission-oriented” and, as an ex-marine, he is only interested in a “few good” series.
Lol, of all the lame excuses for poor scientific method I’ve ever heard from the warmista… Very VERY revealing.
Jacoby has earned a couple of lines in my next satirical song.

Doug in Seattle

Yes, once again we see the power of the patented Mann-O-matic proxy fitting AlGore-ithm.
Its amazing how the religious among us avert their gazes from this obvious torture of numbers. Kinda like the Spanish Inquisition.

Evan Jones

Sort of like Michelangelo’s elephant. Just cut away all the parts that don’t look like an elephant.

dhmo

Look, I am not a scientist, but this whole thing smacks to me of snake oil. To believe that you can take trees and get precise enough temperatures from them to splice with modern records and get measurements to the tenth of a degree is delusional. Perhaps we should just chuck out our thermometers and use trees instead. Could a double-blind test be done? If not, it is not worth anything! I very much appreciate the efforts to debunk it, but now we need to get this bunkum before the general public; that is the real problem. I have asked on blogs: when was it the climate did not change, and what would be an ideal global average temperature? I get abuse from the warmists, no answers. To show the lie is simple; to put it to joe public is not, for he is beset by superstition.

David Ball

“Only interested in a few good series” and “mission oriented” speak volumes. The refusal to provide data for replication is inexcusable. When are the governing bodies going to admonish these people? After science is dead? Methinks the defibrillator is needed now.

So, even though the method seems reasonable, and the person doing it doesn’t intend to cherry pick, if they don’t do some very sophisticated things, rejecting trees that don’t correlate with the recent record biases an analysis. It encourages spurious results, and in the context of the whole “hockey stick” controversy, effectively imposes hockey sticks on the results.

Very understandable. The problem, however, is when these issues are pointed out to the perpetrators, and summarily ignored. Their papers should also be ignored. End of story.

crosspatch

And they want to take our money to mitigate data cherry picking?

Clayton Hollowell

Once again, the folly of the statistically unwashed attempting to do statistics is demonstrated. This is what I was trying to argue with Tom P (or whatever) a while ago, but they just don’t get it.
Our universities churn out “scientists” on an assembly line (especially biologists) who remain ignorant of math, and these are the results we end up with.

Louis Hissink

I think there is a data problem in the sense that a yearly temperature metric is essentially a yearly statistic derived from a monthly statistic, which is derived from a daily statistic computed from raw temperature measurements over the day. To then compute the ubiquitous temperature anomaly, another derived statistic, the 30-year average, is subtracted from the statistics from which it was computed.
So in a nutshell we are looking at the variation of a statistic over time compared to equally nebulous metrics derived from tree rings. That different conclusions could be reached from this data depending on the choice of time frame, a sort of quasi-cherry-picking, tells me that these derived data are simply random.
Lucia’s work seems to bear this out as well, and rather than carp on the methodologies used to analyse the data, it’s in the original data collection stage that the problems start.
This happens when the raw data (temperature) are intensive variables. Intensive variables do not represent physical quantities, and while you can subject them to statistical analysis because they are expressed as numbers, it’s physically meaningless, no better than doing a stats analysis of the phone numbers for Los Angeles.
I personally don’t think the Team are doing this purposefully – they just don’t understand the physical science behind their maths.

michel

Yes, this was an excellent posting by Lucia. Very clear. The thing people with less stats background find difficult to see is: when you do the selection of proxies to find which ones are good temperature readers, you are not ‘picking the good temperature readers’. You are just picking the ones that correlate well for the period you have temps for.
As Lucia says in the comments, there is nothing wrong with picking some trees rather than others. It is just that if you are using them as a proxy for temps, you cannot have temps as your selection criterion. This sounds totally unintuitive. The lay person’s natural impulse is to say: that is crazy, of course you should use temps, that is what you are interested in.
Well no, because to make the selection when using temps, in effect you assume what is required to be proven, that the same trees that correlate with temps in the period you use for selection, also did in the period you are trying to use them for. And the fact that they did in the period you are using for selection tells you nothing about whether they will correlate in the period you are trying to use them to measure.
Now, if you picked them on some other basis, like being long lived, well formed, whatever, and then came on a temp reconstruction that resulted, that might be legitimate, provided you could show some theoretical reason why that might make them better thermometers. Then the correlation of such an independently selected series with a given set of temp measurements would have some validation value.
The easy way to see it is to ask yourself, why does this correlation of temp with tree rings for this sample for the years 18xx to 19xx make me think that there will be the same correlation for the years 12xx to 14xx. As soon as you ask that, you see that you cannot get to any greater certainty about the second hypothesis from selecting on the basis of the first correlation.

Sam the Skeptic

“… they just don’t understand the physical science behind their maths.”
Or perhaps, if John Reid is to be believed …
http://www.quadrant.org.au/magazine/issue/2009/10/climate-modelling-nonsense
they don’t understand the maths behind the physical science 🙂

Clayton Hollowell

Louis Hissink wrote: “I personally don’t think the Team are doing this purposefully – they just don’t understand the physical science behind their maths.”
I’ve said it before (in the post directly above yours, at the latest), and now I’ll say it again, directly.
It’s not that these people don’t understand the physical science behind their math (they are highly educated, and probably very smart, professional scientists), it’s that they don’t understand the MATH behind their science.

Tom P

Lucia:
“Does this mean it’s impossible to make a reliable reconstruction? No. It means you need to think very carefully about how you select your proxies. Just screening to match the current record is not an appropriate method.”
This issue has indeed been thought about very carefully. For example, Mann and coworkers in their supplement to their 2008 paper on proxies:
http://www.pnas.org/content/suppl/2008/09/02/0805721105.DCSupplemental/0805721105SI.pdf
“Although 484 (~40%) pass the temperature screening process over the full (1850 –1995) calibration interval, one would expect that no more than ~150 (~13%) of the proxy series would pass the screening procedure described above by chance alone. This observation indicates that selection bias, although potentially problematic when employing screened predictors… does not appear a significant problem in our case.”
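The “~13% by chance” benchmark quoted above depends on the null model assumed for the proxies. For strongly autocorrelated series, a per-series significance threshold derived from an i.i.d. null lets far more pure noise pass, a point Steve McIntyre raises further down the thread. A short simulation illustrates it; the series length, the persistence values, and the two-sided 10% cutoff below are my assumed setup, not Mann’s actual screening procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

def red_noise(n, r, rng):
    # AR(1) red noise with lag-1 autocorrelation r
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = r * x[t - 1] + np.sqrt(1.0 - r * r) * rng.standard_normal()
    return x

n_years, trials = 146, 2000        # 1850-1995, annual values
r_crit = 0.137                     # two-sided 10% critical |r| for n = 146
                                   # under an i.i.d. null
target = red_noise(n_years, 0.9, rng)   # one fixed "temperature" realization

def pass_rate(rho):
    """Fraction of pure-noise proxies of persistence rho passing the screen."""
    hits = 0
    for _ in range(trials):
        if abs(np.corrcoef(red_noise(n_years, rho, rng), target)[0, 1]) > r_crit:
            hits += 1
    return hits / trials

white = pass_rate(0.0)   # close to the nominal 10%, as i.i.d. theory says
red = pass_rate(0.9)     # much higher: autocorrelation cuts the effective dof
print(f"white-noise proxies passing: {white:.0%}")
print(f"red-noise proxies passing:   {red:.0%}")
```

So whether 40% passing against a ~13% chance baseline is reassuring depends entirely on how that baseline accounts for the persistence of the proxies.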

Espen

I’ve just started to read all the articles on this, especially Steve McIntyre’s work, and am still just flabbergasted. It seems to me that the hockey players are breaking the most fundamental principles of statistics (wrt. randomness of samples and using separate sample sets for model building and the actual tests), but I need to read a little more on PCA on time series to get the grasp of this.

Neo

I once worked with a professional statistician. One day I asked him, “To what job, and with whom, do statisticians aspire?”
His reply was short and quick: The Tobacco Institute.
This “climate studies” field seems to be teeming with Tobacco Institute refugees as the shine is now off its previous beauty.

It’s not just hockey sticks.
When I attended Caltech, in our freshman chemistry class we were treated to a lecture that is eventually given to everyone who graduates from Caltech, on the famous Millikan Oil Drop Experiment. The lecture was originally derived from Richard Feynman’s 1974 commencement address on cargo cult science, and it goes something like this:
Robert Millikan (who the Millikan library at Caltech is named after) originally devised an experiment using drops of oil suspended in an electric field in order to measure the charge of an electron: if you know the current used to suspend the oil drop and you know its mass (by allowing it to fall in the absence of a field), you can calculate the force due to charges on oil drops–and by examining the different values of those charges you can extrapolate the electric charge of an electron.
Now, to quote Dr. Feynman:
“We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It’s a little bit off because he had the incorrect value for the viscosity of air. It’s interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan’s, and the next one’s a little bit bigger than that, and the next one’s a little bit bigger than that, until finally they settle down to a number which is higher.
“Why didn’t they discover the new number was higher right away? It’s a thing that scientists are ashamed of – this history – because it’s apparent that people did things like this: When they got a number that was too high above Millikan’s, they thought something must be wrong – and they would look for and find a reason why something might be wrong. When they got a number close to Millikan’s value they didn’t look so hard. And so they eliminated the numbers that were too far off, and did other things like that. We’ve learned those tricks nowadays, and now we don’t have that kind of a disease.”
With all due respect to the late Dr. Feynman, it’s clear we still have that kind of a disease.
As an aside, one of the things that saddens me about most scientific educations is that most scientists never hear the story of the Millikan Oil Drop Experiment, or see the graph, shown in my chemistry class, giving the “accepted” value of the electron charge plotted over time, which shows this beautiful sinusoidal curve, with the largest uptick right around 1953.

michel

Tom P
And how, exactly, are you going to show how many should correlate on a purely chance basis? Think about it!

The reason this method is not so obviously flawed to the eye seems to be because there is a springiness to the median line that prevents it from very suddenly jerking down below the median when the thermometer data begins, in this case in 1960. Can the randomness be tweaked for which time period it varies in to alter this transition? Were a much larger number of random graphs used, two things would stand out clearly that would give a casual viewer pause: (1) there would be a PERFECT match to the thermometer data right down to what is clearly noise while there would be no visible noise in the pre-thermometer data at all, and (2) the dip right before temperature data appears would either be an abrupt 90-degree one or there would be none at all, meaning a completely and perfectly impossible horizontal “temperature” line followed by a perfect match of noisy thermometer data. So even using this method as-is, if many hundreds of “good temperature signal” trees were used instead of fewer than a dozen each time, the graph itself would lack the noise that is required to appear data-like instead of obviously not data-like at all.
My curiosity is really that of why there is any springiness to the baseline at all. It is that very dip that makes a hockey stick look like real data and thus it is that dip (along with some noise) that fools the eye into not automatically doubting whether what you are looking at is too artificial looking. Imagine a yellow graph with pre-1960 data replaced by a ruler-drawn horizontal line.
The solution would be to go outside and get 10X-100X as much data, then accept their method and make a new hockey stick that thus becomes too perfect a match to thermometer data NOISE to be taken seriously!

Michael

I remember that picture of cranes and cherry pickers. It was from an auction of unused and repoed construction equipment because of the real estate bust.

Patrik

Wow! Great explanation!
I haven’t even begun to grasp the real meaning of the critique of the “excluding bad matches” method before…
But I believe I see it clearly now. I think… 😮
Does this mean that if one was to move the 1960-1990 Hadley temp data to the horizontal centre of the graph, then both ends on the sides of it would “straighten out”?
I.e. if I select series that match the 1960-1990 Hadley data but do it as if they represent, say, 1930-1960, I would then get a rather straight line with a bump in the middle instead of a hockey stick?
Because the “unmatched” parts of the series are more “truly random” than the matched data and will therefore always be represented by a somewhat straight line?
And: the more series one adds, the smoother the line “outside” the matched area will be!?
This means that the more proxy studies they add, the straighter the shaft of the hockey stick will become! 😀

Joel Shore

Neo says:

This “climate studies” field seems to be teeming with Tobacco Institute refugees as the shine is now off its previous beauty.

Actually, the Tobacco Institute refugees seem to be mainly on the “skeptic” side of the climate change issue. These include Steven Milloy of JunkScience.com ( http://www.sourcewatch.org/index.php?title=Steven_Milloy ), the late Frederick Seitz ( http://www.sourcewatch.org/index.php?title=Frederick_Seitz ), and S. Fred Singer ( http://www.sourcewatch.org/index.php?title=Fred_Singer ).
As for this current post, the important point is made by Tom P: The potential for this sort of bias has been understood and that is why it is controlled for or various verification methods are used. Are those methods sufficient? I haven’t investigated well enough to know. However, that should be the question rather than just illustrating this fact that has been understood and then not looking into how it has been dealt with.

Patrik

Tom P>> “would expect that no more than ~150 (~13%) of the proxy series would pass the screening procedure”
Interesting – how would they know this?

Michael

[snip – moved to the correct thread where we cover this, thanks for the video tip – Anthony]

stevemcintyre

Tom P (14:31:51) : I suggest that you read the contemporary CA posts on Mann et al 2008 (See the Categories in the left frame.) To say that “Mann and co-workers” dealt with this issue “very carefully” is laughable.


_Jim

Michael (15:09:30):
“I remember that picture of cranes and cherry pickers. It was from an auction of unused and repoed construction equipment because of the real estate bust.”

I see loading forks on a number of those Gradalls and the JLG showing in the picture (meaning: they were used in warehouse operations).
Along I-55 in mid Illinois one passes an equipment yard where that kind of equipment can be seen year after year … I have a picture somewhere on this PC of that yard from about 2003 …

Patrik

Tom P>> Would You agree on this statement:
IF the data is random and the method described by Lucia is used THEN the more series you add, the straighter and smoother the “unmatched” parts of the series will be. At the same time, the more series you add, the “uptick part”/blade of the stick will be more and more similar in detail to the data you’re matching against.
?

Tony Hansen

re Tom P ‘… The issue has indeed been thought about very carefully.’
Thinking about something in no way guarantees understanding.
Perhaps one might think on that.

stevemcintyre

CA posts on this topic include http://www.climateaudit.org/?p=4908 4216 3821 3838 3858.
While the matter may have been thought about “very carefully”, unfortunately with Team articles, that often means that you also have to examine it very carefully.
If something is a temperature “proxy”, then one would expect it to have significant correlation in both “early” and “late” periods as defined in M08. Only 342 of the ~400 do (and one of these is disqualified in my opinion because the “significant” correlation has a different sign in each period).
Further breaking down the 342 of 1209: 71 of 71 Luterbacher proxies pass – these use instrumental data and thus are irrelevant. Of the 927 code 9000 dendro proxies (the majority), only 143 (15.4%) pass the first-cut test. There are issues pertaining to AR1 autocorrelation that would reduce this. Also, Mann picks the better of two correlations, goosing the observed number up a bit without adjusting the benchmark. I didn’t test how to do a proper benchmark; at this stage, one merely knows that Mann’s benchmark is fudged. However, 93 of 105 MXD proxies pass (these are the Rutherford RegEMed version of Briffa’s MXD proxies). These suffer from the divergence problem, and therefore Mann 2008 deleted the post-1960 portion of these proxies and substituted infilled data. I don’t recall offhand whether we figured out which version the correlations were calculated on. The Briffa MXD series are relatively short – none goes back before AD1400, so they do not pertain to the MWP-modern comparison that is the main game.
Tom P, as usual, speaks overconfidently on a matter where he has not familiarized himself with the relevant analysis.

David Ball (13:12:03) : ” ‘Only interested in a few good series’ and ‘mission oriented’ [speak] volumes. The refusal to provide data for replication is inexcusable. When are the governing bodies going to admonish these people? After science is dead? Methinks the defibrillator is needed now.”
“She’s dead, Jim.”

AnonyMoose

Instead of throwing out data, should the calibration instead adjust factors which are applied to all of the data? If a linear or geometric relationship doesn’t exist, that’s just too bad.

AnonyMoose

While the matter may have been thought about “very carefully”,…

The Nobel Prize in Chemistry will go to someone who very carefully thought about how to change lead into gold. Someone who tried very hard and very sincerely. That the results are not gold is a detail; some of them do look like yellow metal.

Gordon Ford

Excellent explanation Lucia.
This tale reminds me of days long ago when I was evaluating gold properties for a mining company. One class of property was the gold-quartz vein with pockets of free (visible) gold. This type of property was typically owned by a junior mining company with a name like Free Gold Inc. or GMITS Inc. The owners, typically a prospector, a promoter and a couple of dentists, could proudly point to sections of the vein and truthfully say “John Doe (a reputable geologist whom I knew) took a sample from here that ran 10 ounces to the ton, and over there he got 5 ounces to the ton.”
On examining and sampling the exposed portions of the vein I would note the small pockets containing visible gold (a sampling QA/QC and statistical nightmare) and long sections of barren-looking quartz vein (a mining / grade control nightmare).
On receipt of the assay results from my sampling (typically one or two interesting values and a lot of trace or nils) I would write them a nice letter expressing our regrets that their property was not the type of gold property we were looking for, and as a courtesy enclose a copy of the assay results of my sampling.
The owners were still convinced that they had a high-grade gold vein and had more assay reports to prove it. In reality they had an essentially barren quartz vein with small pockets containing erratically distributed flakes of gold.
The interesting thing is that while they could never get a mining company to put the property into production (and make them millionaires), they could always find another dentist, doctor or lawyer to put more money into the company.
I guess that it will always be that some people (possibly most people) will fixate on data that conforms to their beliefs and reject or ignore that which doesn’t.
My apologies to dentists, doctors and lawyers, but they kept cropping up time after time.

Espen (14:34:30) : “I’ve just started to read all the articles on this, especially Steve McIntyre’s work, and am still just flabbergasted. It seems to me that they hockey players are breaking the most fundamental principles of statistics (wrt. randomness of samples and using separate sample sets for model building and the actual tests), but I need to read a little more on PCA on time series to get the grasp of this.”
Special rules seem to apply to climatology in general and to dendroclimatology in particular. The relevance of tree cores is extremely low, given that they are highly ambiguous per se and are being used to attempt to measure localized atmospheric temperatures. Any dendro climatic signals are easily obscured by weather signals. The oceans are ~1200 times as great a heatsink as the atmosphere. How significant can this type of study be? It’s astrology.

Michael

“_Jim (15:36:58) :
I see loading forks on a number of those Gradalls and the JLG showing in the picture (meaning: they were used in warehouse operations).”
Jim,
Now I know what the AGW’ers feel like when they get debunked.
I do have a picture very similar though of construction equipment up at auction.

Pamela Gray

There are cases in research when this is an acceptable practice (i.e. gathering data and then using only the subjects sensitive to the treatment). I did this in my research. It was necessary as we did not put our subjects to sleep (which would have caused nearly all subjects to be sensitive to the treatment). So instead we had to find subjects whose brains were quiet enough when awake to allow their auditory brainstem synaptic response to the signals to rise above the noisy synaptic brain when listening to high frequency tone pips. I even had my brain tested and it turned out to be WAAAYYYYY too noisy for any use in the study. A few years later I had an EEG done (I was put to sleep). It confirmed that even when asleep, my brain just keeps talking (go figure). I was able to find 6 subjects who had quiet brains. As a result we were able to demonstrate that the auditory pathway was sensitive to narrow frequency from the get-go (at the first synaptic junction as well as the later ones in the brainstem).
So I can understand why a researcher would think that looking through the trees to find the sensitive ones is okay to do. However the difference is this: In my case we were removing higher noisy responses that were not related to frequency response in the brainstem (the noise comes from higher synaptic junctions and could be the result of just “thinking”). So we needed to remove noisy brains or put all our subjects to sleep.
With trees, the rings demonstrate the tree’s response to the environment, which is the treatment under consideration. So all rings should be used within a stand that is subject to the same environment (meaning you should not mix trees in a meadow with trees growing next to a river bank). If the intent was to remove trees that were growing next to a stream from trees that were in the meadow, it would make sense to remove those rings from the data pool. Is it possible this is what was done?

glen martin

“stevemcintyre (16:02:45) :
Of the 927 code 9000 dendro proxies (the majority), only 143 (15.4%) pass the first-cut test.”
Roughly what they claim would be expected by chance alone (~13%).

David Ball

Joel Shore, speaking of things that have been understood for some time: the tree ring proxy has been known to have very little accuracy for temperature for at least three decades. As a child, I remember dinner table conversations between my father and his colleagues over this very issue. It was known to be problematic 30 years ago. Climate science should have moved way beyond this stuff by now, and yet here we are arguing about its validity today. The use of tree ring proxies is a joke, yet the people using them are still trying to show they are valid. Discussing the interpretation of this proxy data is a waste of time. Examine your basic assumptions.

climatebeagle

Very nice article by Lucia, and a very clear and valid comment by michel (13:47:43).

p.g.sharrow "PG"

I remember when research on tree rings was an important field as a proxy for climate favorability to local farm or food production in areas without written records. Sometimes this was used as a precipitation proxy and at other times as a temperature proxy by lazy theorists to prove their point.
Tree rings are a good proxy for plant growth conditions and little else. With nearly 60 years of growing trees and other plants, I can attest to that fact.

michel

AnonyMoose (16:25:46) :
"Instead of throwing out data, should the calibration instead adjust factors which are applied to all of the data? If a linear or geometric relationship doesn’t exist, that’s just too bad."
Yes, exactly. That’s the procedure that Pamela Gray describes. Call this Method A. You have a huge number of trees. You may have some reason for thinking that some trees, but not others, are thermometers. They may, for instance, be of a particular species, particularly large, old, undamaged, regular in shape, irregular in shape, or in a particular region. Whatever.
For some reason connected to an account of their biology, you think you’ve found the thing that makes these particular ones accurate registers of temperature. So you pick them. Notice that so far you have not compared any of them to any temperature record.
THEN you plot the temperatures they indicate.
Now you check your results by comparing the temperatures they seem to indicate with a known, trusted thermometer record covering part of the period you are interested in. They show a reasonably decent correlation for this period. Now you have some reason to find it plausible that this sample consists of thermometers, at least for some period, and so some reason to think that they will also be thermometers in other periods.
The method Lucia is exposing, Method B, is this: you take the same initial large sample and the same temperature record, compare every series in your sample against the temperature record, and throw out all the ones that do not correlate. This is not legitimate.
Again, you can see why if you consider how likely the two methods are to lead to the same sub-sample. They will do so only under one condition: that a temperature match in your sample period is found only in trees with some biological reason for being good thermometers throughout their lives. But that is the question at issue, so you’ve assumed what you are trying to prove. Method A provides some kind of test of whether your sample is a sample of thermometers, precisely because you did the picking before doing any temperature checks.
It is possible that Method B could work, and yield a sample of thermometers. How would you know? Well, you’d have to do Method A, find the key biological variables, then see if Method B also selected the trees with them. Bit of a waste of time then, doing Method B. In short, you have no way of avoiding doing Method A. Method B adds nothing.
You can also see the problem if you consider how certain you will be in the first case of correlating to temperature in any given period. Not at all a priori. It really will depend on whether your theory about the biology of the trees is correct, and that is one of the things you’ll be finding out in Method A, when you compare your runs against known temps.
Therefore, in Method A, correlating against temperatures really may give you some new information. In Method B, as the red noise example shows, it does not. It just picks the samples that show what you want them to show in the period you’ve picked for correlation.
Take this a step further. Someone criticizes your procedure in Method B by arguing that you just picked trees that correlate in your temperature measured time, and that this shows nothing about whether this sample correlated earlier or later. Aha, you say. I have calculated the results of picking from my tree sample randomly, and correlating the resulting sample against temperature in my period. If I do this, I find that very few of my trees correlate with temps. This shows I am not picking at random. I am picking ones that really do correlate.
Which of course, we knew all along was being done, and the problem was not that at all. It was that the procedure gave us no reason to think this sample would show the same correlations outside the measured temp period that it did within it.
There is no way around this one. You cannot do sampling like this. Like the absorption spectrum of CO2, or gravity, this really is settled science. There is no point denying it. It’s just that because it’s statistics, it can be a bit hard to get your head around. But it really is settled.
It does seem very odd that climate scientists should refuse to accept settled science when it comes to statistics, or should misrepresent what the settled science actually is – they have form on this; have a look at Tamino’s series on PCA – yet accuse others of denialism when those others point out that the science on this matter is not what they say it is. Doing and defending statistics in this way really is denialism.
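The failure mode of Method B can be sketched numerically. The toy simulation below is my own illustration, not Lucia’s actual spreadsheet; the window sizes, autocorrelation, and sample counts are all arbitrary choices. It screens purely random red-noise series against an equally random “temperature” target inside a late calibration window, then checks how the survivors behave outside that window:

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def red_noise(n, r, rng):
    """AR(1) 'tree ring' series: x[t] = r*x[t-1] + white noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = r * x + rng.gauss(0, 1)
        out.append(x)
    return out

rng = random.Random(0)
n = 150                                   # pretend "years"
temp = red_noise(n, 0.9, rng)             # stand-in "temperature" record
series = [red_noise(n, 0.9, rng) for _ in range(200)]

cal = slice(110, 150)                     # screening window (the "recent" part)
pre = slice(0, 110)                       # everything before the window

# Method B: keep the 20 series that correlate best with "temperature"
# inside the screening window only
kept = sorted(series, key=lambda s: -abs(pearson(s[cal], temp[cal])))[:20]

in_r = statistics.fmean(abs(pearson(s[cal], temp[cal])) for s in kept)
out_r = statistics.fmean(abs(pearson(s[pre], temp[pre])) for s in kept)
print(f"kept 20/200; mean |r| inside window: {in_r:.2f}, outside: {out_r:.2f}")
```

The retained series correlate strongly inside the screening window by construction, but show no special skill outside it, which is michel’s point: Method B’s in-window correlation tells you nothing about other periods.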

Greg Cavanagh

Tom P (14:31:51) :
“Although 484 (~40%) pass the temperature screening process over the full (1850–1995) calibration interval, one would expect that no more than ~150 (~13%) of the proxy series would pass the screening procedure described above by chance alone. This observation indicates that selection bias, although potentially problematic when employing screened predictors… does not appear a significant problem in our case.”
I’ve got an issue with this.
Of the full data set, they say ~13% would be expected to pass the screening by chance alone. They then discard 60% of the data, keeping the remaining 40%, which still contains those chance-passers. In effect, roughly 13/40 ≈ 32.5% of the retained data could be there by chance.
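Greg’s arithmetic, spelled out (assuming, as he does, that every series that would pass by chance does survive the screen):

```python
# Back-of-envelope: ~13% of ALL series would pass by chance, the screen
# keeps 40% of the series, and every chance-passer survives the screen
# by definition -- so chance-passers make up 13/40 of the retained set.
chance_pass = 0.13   # fraction of all series expected to pass by chance
kept = 0.40          # fraction of series actually retained
print(f"{chance_pass / kept:.1%} of the retained series could be there by chance")
# → 32.5% of the retained series could be there by chance
```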

A Mathematica notebook, and its PDF preview, showing how you get a hockey stick out of red noise if you prefer the series that show warming at the end:
http://cid-9cd81cfa06ff7718.skydrive.live.com/self.aspx/.Public/mann-hockey-fun.pdf
http://cid-9cd81cfa06ff7718.skydrive.live.com/self.aspx/.Public/mann-hockey-fun.nb
http://motls.blogspot.com/2009/09/beaten-with-hockey-sticks-yamal-tree.html
The problem with MBH98, MBH99, and similar papers was (and is!) that the algorithm preferred proxies – trees or their equivalents – that showed a warming trend in the 20th century, assuming that this condition guaranteed the trees were sensitive to temperature.
But even if such a 20th-century trend occurred by chance for a certain tree (and a fraction of the trees inevitably satisfies this condition), the corresponding tree would influence Mann’s final graphs a lot. Effectively, the algorithm picked a lot of trees that showed no correlation with the temperature but were instead composed of random data – red noise – before 1900, plus an increasing trend in 1900–2000.
You can’t be surprised that the average of such trees looks like a hockey stick even if the temperature didn’t. The noise before 1900 averages out to a roughly constant level, while the 20th-century warming survives.
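A minimal Python sketch of the effect Luboš describes (not his Mathematica notebook, just an illustration with arbitrary parameters): generate red noise, keep only the series that happen to rise at the end, and average the survivors.

```python
import random
import statistics

def red_noise(n, r, rng):
    """AR(1) red noise: x[t] = r*x[t-1] + white noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = r * x + rng.gauss(0, 1)
        out.append(x)
    return out

rng = random.Random(1)
n = 200
series = [red_noise(n, 0.95, rng) for _ in range(500)]

# "Screening": keep only the series that happen to rise over the last
# 50 steps -- the proxy equivalent of "shows 20th-century warming"
kept = [s for s in series if s[-1] - s[-50] > 0]

# "Reconstruction": average the survivors at each time step.  The
# selected end-of-series rise is common to all survivors, while the
# unselected earlier noise tends to cancel.
recon = [statistics.fmean(s[t] for s in kept) for t in range(n)]
print(f"kept {len(kept)}/500 series;",
      f"reconstruction at t=150: {recon[150]:.2f}, at t=199: {recon[199]:.2f}")
```

By construction every kept series ends higher than it stood 50 steps earlier, so the selected rise survives the averaging while the pre-selection noise largely cancels, leaving a hockey-stick shape made out of nothing.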

Tom P

stevemcintyre (16:02:45) :
“I didn’t test how to do a proper benchmark – at this stage, one merely knows that Mann’s benchmark is fudged.”
That’s what I call a statistical analysis! Back in January, in the original posting you referenced, you promised to look further into “questionable Mannian benchmarks”. Are you going to do this?

O. Weinzierl

Why not extend the timeline of the random data to 2100? You can get future climate data this way much cheaper than modelling and at least the same quality, probably even better.

Jack Simmons

dhmo (13:07:32) :

Look, I am not a scientist, but this whole thing smacks to me of snake oil. To believe that you can take trees, get temperatures from them precise enough to splice with modern records, and obtain measurements to the tenth of a degree is delusional. Perhaps we should just chuck out our thermometers and use trees instead.

If it’s all the same to you, I would prefer to keep the rectal thermometer I have.

Patrik

Tom P>> Please enlighten us:
Please tell us how Mann et al know that one should expect ~13% conformity with the measurements from a purely random source?
I’m not sure that I will understand the answer, but you owe at least that to the more mathematically gifted people here.

Bulldust

Sooner or later someone is going to come up with something witty regarding Pres Washington and cherry trees… I can feel it in my waters. Somehow I sense a missed opportunity that he did not graph the rings.
Anywho… as for data & statistics, look no further than Aesop (and that IS going back a ways):
“We can easily represent things as we wish them to be.”
Tru dat.

Patrik

Greg Cavanagh (00:47:35)>>
I’m not sure that is the context of the Mann quote. I reacted the same way as you at first, but:
I believe they are saying that if the conforming sources amounted to only ~13%, then those series would probably be of a random nature.
This actually doesn’t mean that ~13% is always random noise.
I’m still wondering how on earth they would know that 13% is the magic number.
Perhaps there is some statistical algorithm that can resolve this likelihood?
I’m waiting for Tom P to give us this algorithm.
On the other hand, I really don’t see why such an algorithm would be of any significance, since:
A) The sources are organic and may well have developed differently during different time spans.
B) Point A) means that any source, or none, could be valid BEFORE ~1850, totally unrelated to any mathematical formulae.
I believe that A and B are supported by people who know much about trees and vegetation in general.
And if A and B are true, then this is also true:
C) The probability of the pre-1850 data being noise, as opposed to temperature data, must be resolved as well.
If C) is resolved and it is found that the probability of these data being temperature is sufficiently high, then one should proceed to:
D) Gather measured temperature data from exactly those areas where the samples were collected (which would probably leave only satellite temperature data from ~1980 onwards to compare with) and then:
E) Start over again from A), and only do matching against the actually measured data (satellite) from the exact areas where one has gathered the proxy data.
I believe the above A–E process is impossible, since there will probably be a huge gap between the available proxies and temperature data from ~1980, but only this process would reveal how T might have fluctuated in the actual areas of proxy collection.
I believe there is a looong way to wander before any significance can be given to these pre-measurement data used in reconstructions.
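For what it’s worth, one way a “percent passing by chance” figure like the ~13% could in principle be estimated is by Monte Carlo: screen synthetic red-noise series against the instrumental record and count how often they pass. This is purely hypothetical, since Mann et al.’s actual benchmark procedure is not given here, and the parameters below are arbitrary:

```python
import random

def red_noise(n, r, rng):
    """AR(1) red noise: x[t] = r*x[t-1] + white noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = r * x + rng.gauss(0, 1)
        out.append(x)
    return out

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(2)
n, trials, thresh = 40, 2000, 0.4      # all three values are arbitrary
target = red_noise(n, 0.9, rng)        # stand-in "instrumental" record

# How often does a freshly generated noise series pass the screen?
passes = sum(abs(pearson(red_noise(n, 0.9, rng), target)) > thresh
             for _ in range(trials))
print(f"chance pass rate: {passes / trials:.1%}")
```

The resulting rate depends heavily on the assumed autocorrelation, window length, and threshold, which is exactly why such a benchmark needs careful justification before a number like ~13% can be treated as meaningful.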

UK Sceptic

I have always thought it ironic that Gore used a cherry picker (what we in the UK call an elevator machine) to demonstrate the Hockey Stick graph. Now, with Mr McIntyre lifting the lid on the diddled dendro revelations, the irony is all the more delicious.

Stefan

I like Lucia’s simple article, very informative, and fun too! I like that it demonstrates things in a way a layperson can understand.
One of the things that has often troubled me about Climate Science, is the attitude that, the science is so complex that only experts are qualified to judge. And yet, sooner or later, one has to summarise, and the summary needs to make sense. I’m sure one needs to be expert to grasp the details, but one need only have a brain to grasp the logic.
As for suggestion that the biases have already been corrected or accounted for, well, what can I say, I’ve been taught a little something by various experts on how people deceive themselves, how I deceive myself, and I know that it takes a lot of introspection and testing to actually get out of that hole. The automatic response is to create a smoke screen, and I emphasise, it is automatic–the person under the illusion doesn’t know it. Just watch a friend say he’s planning on starting a diet on Jan 1st, as a New Year’s resolution, and then watch in January and February all the reasons the friend gives for why it isn’t the right time to start yet. All the reasons are reasonable and make sense. We all know this. No expertise required.
So when scientists are criticised, their supporters can claim, “oh, we already knew that! we’ve already corrected!” The more truthful can say, “oh, well, um… I’m sure they probably corrected for that already!”
The cool thing about being a layperson or not having a vested interest in the topic, is that there is no pressure to be right about it. I mean, this is what lay AGW supporters keep saying, they keep claiming that various scientists are paid by oil industry–in effect affirming that scientists can be easily biased, and yet any scientist in favour of AGW is somehow so immune to bias that they can self-correct for any and all bias–and we can even assume that they have done this, to the point that we should be the ones to jump through hoops to disprove this, rather than them jumping through hoops to show that they have.
I’m sure my friend is very right, that he is already perfectly well aware that he didn’t start the diet, and as he says that’s for some very good reasons–plus he’s already planning a new diet that’ll be twice as effective–and he doesn’t need to say what they are, for what business is it of mine anyway.