CG2 and Ex Post Picking

Reposted from Climate Audit

Jul 31, 2019: Noticed this as an unpublished draft from 2014. Not sure why I didn’t publish at the time. Neukom, lead author of PAGES (2019) was coauthor of Gergis’ papers.

One of the longest-standing Climate Audit controversies has concerned the bias introduced into reconstructions that use ex post screening/correlation. In today’s post, I’ll report on a previously unnoticed Climategate-2 email in which a member of the paleoclimatology guild (though then junior) reported to other members of the guild that he had carried out simulations to test “the phenomenon that Macintyre has been going on about”, finding that his reconstructions from white noise “clearly show a ‘hockey-stick’ trend”, a result that he described as “certainly worrying”.

A more senior member of the guild dismissed the results out of hand:  “Controversy about which bull caused mess not relevent.”  Members of the guild have continued to merrily ex post screen to this day without cavil or caveat.

The bias introduced by ex post screening of a large network of proxies by correlation against rising temperatures has been noticed and commented on (more or less independently) by me, David Stockwell, Jeff Id, Lucia and Lubos Motl. It is trivial to demonstrate through simulations, as each of us has done in our own slightly different ways.
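A minimal sketch of such a simulation (all parameters here are my own illustrative assumptions, not any particular author's setup): generate white-noise "proxies", keep only those that happen to correlate with a trending target, and average the survivors. The average acquires the target's trend even though every input is pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_years = 1000, 150

# Illustrative trending "temperature" target and pure white-noise "proxies"
target = np.linspace(0.0, 1.0, n_years)
noise = rng.standard_normal((n_series, n_years))

# Ex post screening: keep series whose correlation with the target passes
# a nominal 90% (two-sided) threshold, roughly |r| > 0.135 for n = 150,
# retaining the positive tail only, as screening studies typically do.
r = np.array([np.corrcoef(s, target)[0, 1] for s in noise])
passed = noise[r > 0.135]

# The composite of the "significant" noise series inherits the trend
composite = passed.mean(axis=0)
print(len(passed), composite[-30:].mean() - composite[:30].mean())
```

Roughly 5% of the series pass by chance alone, and their average rises toward the end of the record: a hockey stick manufactured entirely by the selection step.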

In my case, I had directed the criticism of ex post screening particularly at practices of D’Arrigo and Jacoby in their original studies: see, for example, one of the earliest Climate Audit posts (Feb 2005) where I wrote:

Jacoby and d’Arrigo [1989] states on page 44 that they sampled 36 northern boreal forest sites within the preceding decade, of which the ten “judged to provide the best record of temperature-influenced tree growth” were selected. No criteria for this judgement are described, and one presumes that they probably picked the 10 most hockey-stick shaped series.  I have done simulations, which indicate that merely selecting the 10 most hockey stick shaped series from 36 red noise series and then averaging them will result in a hockey stick shaped composite, which is more so than the individual series.
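The selection effect described in that passage is easy to mimic. In this sketch (the AR(1) coefficient, series length and scoring rule are my own illustrative choices, not those of Jacoby and d'Arrigo), 36 red-noise series stand in for tree-ring chronologies; scoring each by how far its final decades rise above its own mean, then averaging the 10 highest-scoring series, yields a composite with a modern-looking uptick.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, n_years, blade = 36, 200, 30

# AR(1) red noise as a crude stand-in for tree-ring persistence
phi = 0.5
noise = np.zeros((n_series, n_years))
eps = rng.standard_normal((n_series, n_years))
for t in range(1, n_years):
    noise[:, t] = phi * noise[:, t - 1] + eps[:, t]

# "Hockey-stick-ness": how far the last `blade` years sit above the series mean
score = noise[:, -blade:].mean(axis=1) - noise.mean(axis=1)

# Select the 10 most hockey-stick-shaped series and average them
composite = noise[np.argsort(score)[-10:]].mean(axis=0)
print(composite[-blade:].mean() - composite.mean())
```

The composite's final decades sit well above its long-term mean, and the uptick is sharper than in any individual series, because the independent noise in the early portion averages toward zero while the selected-for feature reinforces.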

The issue of cherry picking arose forcefully at the NAS Panel on paleoclimate reconstructions on March 2, 2006, when D’Arrigo told a surprised panel that you had to pick cherries if you wanted to make “cherry pie”, an incident that I reported in a blog post a few days later on March 7 (after my return to Toronto).

Ironically, on the same day, Rob Wilson, then an itinerant and very junior academic, wrote a thus far unnoticed CG2 email (4241. 2006-03-07) which reported on simulations that convincingly supported my concerns about ex post screening. Wilson’s email was addressed to most of the leading dendroclimatologists of the day:  Ed Cook, Rosanne D’Arrigo, Gordon Jacoby, Jan Esper, Tim Osborn, Keith Briffa, Ulf Buentgen, David Frank,  Brian Luckman and Emma Watson, as well as Philip Brohan of the Met Office. Wilson wrote:

Greetings All,

I thought you might be interested in these results. The wonderful thing about being paid properly (i. e. not by the hour) is that I have time to play.

The whole Macintyre issue got me thinking about over-fitting and the potential bias of screening against the target climate parameter.  Therefore, I thought I’d play around with some randomly generated time-series and see if I could ‘reconstruct’ northern hemisphere temperatures.

I first generated 1000 random time-series in Excel – I did not try and approximate the persistence structure in tree-ring data. The autocorrelation therefore of the time-series was close to zero, although it did vary between each time-series. Playing around therefore with the AR persistent structure of these time-series would make a difference. However, as these series are generally random white noise processes, I thought this would be a conservative test of any potential bias.

I then screened the time-series against NH mean annual temperatures and retained those series that correlated at the 90% C. L. 48 series passed this screening process.

Using three different methods, I developed a NH temperature reconstruction from these data:

  1. simple mean of all 48 series after they had been normalised to their common period
  2. Stepwise multiple regression
  3. Principle component regression using a stepwise selection process.

The results are attached.  Interestingly, the averaging method produced the best results, although for each method there is a linear trend in the model residuals – perhaps an end-effect problem of over-fitting.

The reconstructions clearly show a ‘hockey-stick’ trend. I guess this is precisely the phenomenon that Macintyre has been going on about. [SM bold]

It is certainly worrying, but I do not think that it is a problem so long as one screens against LOCAL temperature data and not large scale temperature where trend dominates the correlation. I guess this over-fitting issue will be relevant to studies that rely more on trend coherence rather than inter-annual coherence. It would be interesting to do a similar analysis against the NAO or PDO indices. However, I should work on other things.

Thought you’d might find it interesting though. comments welcome

Rob

Wilson’s sensible observations, which surely ought to have caused some reflection within the guild, were peremptorily dismissed about 15 minutes later by the more senior Ed Cook as nothing more than a question of “which bull caused mess”:

You are a masochist. Maybe Tom Melvin has it right:  “Controversy about which bull caused mess not relevent. The possibility that the results in all cases were heap of dung has been missed by commentators.”

Cook’s summary and contemptuous dismissal seems to have persuaded the other correspondents and the issue receded from the consciousness of the dendroclimatology guild.

Looking back at the contemporary history, it is interesting to note that the issue of the “divergence problem” embroiled the dendro guild the following day (March 8) when Richard Alley, who had been in attendance on March 2, wrote to IPCC Coordinating Lead Author Overpeck “doubt[ing] that the NRC panel can now return any strong endorsement of the hockey stick, or of any other reconstruction of the last millennium”: see 1055. 2006-03-11 (embedded in which is Alley’s opening March 8 email to Overpeck). In a series of interesting emails (e.g. CG2 1983. 2006-03-08;  1336. 2006-03-09; 3234. 2006-03-10; 1055. 2006-03-11), Alley and others discussed the apparent concerns of the NAS panel about the divergence problem, e.g. Alley:

As I noted, my observations of the NRC committee members suggest rather strongly to me that they now have serious doubts about tree-rings as paleothermometers (and I do, too… at least until someone shows me why this divergence problem really doesn’t matter).

In the end, after considerable pressure from paleoclimatologists, the NAS Panel more or less evaded the divergence problem (but that’s another story, discussed here from time to time.)

Notwithstanding Wilson’s “worry” about the results of his simulations, ex post screening continued to be standard practice within the paleoclimate guild.  Ex post screening was used, for example, in the Mann et al (2008) CPS reconstruction.  Ross and I commented on the bias in a comment published by PNAS in 2009 as follows:

Their CPS reconstruction screens proxies by calibration-period correlation, a procedure known to generate ‘‘hockey sticks’’ from red noise (4 – Stockwell, AIG News, 2006).

In their reply in PNAS, Mann et al dismissed the existence of ex post screening bias, claiming that we showed  “unfamiliarity with the concept of screening regression/validation”:

McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates ”hockey sticks” is unsupported in peer reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.

CA readers will remember that the issue arose once again in Gergis et al 2012, who had claimed to have carried out detrended screening, but had not.  CA readers will also recall that Mann and Schmidt both intervened in the fray, arguing in favor of ex post screening as a valid procedure.


26 thoughts on “CG2 and Ex Post Picking”

  1. Always exhilarating to be so vividly reminded that some of these so-called scientists can be just as stuffed with self-importance and wilfully blind as any leftist celebrity.

  2. I saw the video of that NAS panel session where Rosanne D’Arrigo was giving her presentation. She was asked about “cherry picking” the data. That is when she said “you have to pick cherries if you wanted to make cherry pie”.
    In front of a NAS panel.
    Showstopper. Yow.

    I find it amazing that people would try to defend any of this. You screen a bunch of data sets for a specific pattern. Of course, the combination of the selected data sets will show the pattern. And a very enhanced pattern. Where the data sets have no pattern, like in the early portion of the time series, minor ups and downs cancel out. Where the data sets have a common element, that element sums together as the data sets are added.
    But wait, There’s more!
    Your “Big Effect” will show very strong statistical significance, and a “wee p value”.
    The statistical test can not know that the data sets were pre-screened. The test assumes “random”, and so gives a bogus significance.

    “When you screen for hockey sticks, hockey sticks are what you will find.”

      • When you filter a signal, you are removing an unwanted or interfering component.
        Here, they are including a whole data set or excluding a data set based on the presence or absence of a desired feature. This is totally different from what you do.
        You use the word “filter”, they may also use the word “filter”. DO NOT ASSUME the process is the same just because the word is the same.
        Note that this selection process produces the desired “hockey stick” shape even when fed a synthetic red noise signal. I would doubt that your signal processing produces “hockey stick” shapes or anything else from red noise.
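The "wee p value" point above can be made concrete. In this sketch (thresholds and sample sizes are illustrative assumptions), a naive correlation test applied to a composite of screened noise reports overwhelming significance, because the test has no way of knowing that the series were pre-selected for that very correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_series, n_years = 1000, 150

target = np.linspace(0.0, 1.0, n_years)          # trending "temperature" target
noise = rng.standard_normal((n_series, n_years)) # pure white noise

# Screen: keep series that happen to correlate with the target
# (r > ~0.135 is a nominal 90% two-sided threshold for n = 150)
r = np.array([stats.pearsonr(s, target)[0] for s in noise])
composite = noise[r > 0.135].mean(axis=0)

# Naive significance test on the screened composite: the p-value is tiny
# even though every input was noise, because the test assumes an
# unscreened sample.
r_comp, p_comp = stats.pearsonr(composite, target)
print(r_comp, p_comp)
```

The screening step concentrates the chance correlations while the independent noise averages away, so the composite correlates strongly with the target and the unadjusted test returns a spuriously tiny p-value.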

  3. “Ex post screening”? There are very good reasons for double blind designs in experiments, and not just in medicine or psych. As the people dealing with the data have strong expectations of how they want the study to come out, using a procedure that allows for bias has a strong chance of producing illusory results.
    This is different from deliberate deception, but probably more common in producing bad results.

    • I can’t agree it is “different” from deliberate deception. If the perpetrator is trained in science and competent then they must know this is a biased process and the results unreliable. It is unscientific. If they claim otherwise they are deliberately deceptive. Unfortunately the entire body of scientific literature is riddled with this type of deception and the incentives encourage it.

  4. I’m not a great archer. I know which end of the arrow goes forward (the pointy end, I think), and how not to slap my forearm with the bowstring, but that’s about it.

    Still, I can reliably shoot a dozen or more arrows into the center of as many 1 inch diameter circles drawn on a target surface (like the side of a barn). Without fail. Every time. All I have to do is shoot the arrows first, then draw the circles.

    • You are a great “Texas Sharpshooter”. There have been a few in the paleo proxy crowd caught doing that. This case is a little different. What they did is like shooting a thousand arrows at the target. Then they remove all the ones that did not hit the bullseye, and show off the remaining ones.
      They have convinced themselves that there is nothing wrong with this.

    • It’s called the Texas sharpshooter fallacy.

      The problem in paleothermometry appears to be group think. Everybody is doing something stupid and evil but since everyone else is doing it, it’s OK. link Hint … when the chickens come home to roost, it’s not OK at all and it is evil.

  5. “ex post screening” *IS* a completely valid procedure if you want to produce a given result…just do not confuse it with being a valid unbiased statistical procedure, or with it being valid for producing a scientific result.

    I honestly do not think most “scientists” studying climate understand statistics. Any, at all. I don’t think they understand how misleading a simple average can be. They fool themselves and then rant at the world.

    • I have been reading fisheries simulation papers lately and also suspect the statistical training there.
      “But in the absence of critical questions related to validity when using ex post facto methodology, conclusions can be problematic.” This came out of one of my papers; one of us did most of the stats, but three of us checked each other. No bragging intended, we all learned this long ago from different schools of statistics over a period of at least two decades.

      “We accepted the model simulations because good, long-term estimates of annual XXXX abundances are not available…” This came out of a recent paper that seems to claim to understand the necessity of verification and does not seem to bully or snow, but is heavy into the importance of models.

      I’m still studying, but have to wonder. In this work from a long time fisheries modeler who actually worked with a lot of real data, [Cushing, D. H. 1996. Towards a science of recruitment in fish populations. Excellence in Ecology. Inter-Research 7. 175pp.] a main conclusion was that a “…..science of recruitment does not exist,” and offered some suggestions. The number of variables that should be in fisheries models is greater than climate, partly because it includes it, but also because of the biology.

  6. “…finding that the results from his simulations from white noise “clearly show a ‘hockey-stick’ trend”, a result that he described as “certainly worrying”.”

    Isn’t it true that McIntyre showed that result using red noise?

    I see white noise mentioned a couple of times in the correspondence, but I recall it was red noise data sets that were used to demonstrate that Mann’s MBH98 filtering always produced a hockey stick. Red noise is autocorrelated, with each value a random step from the previous one, rather than independent random values between two limits (white noise).

    So whether Mann used tree rings, Excel random number generators, the number of hairs on babies at birth or the number of stars that can be seen on sequential nights, the ex post filtering would only have selected those which fit a predetermined shape – his fantasy temperature chart with no medieval warm period and a sharp rise near the industrial age – remembering that it still failed towards the end, so he tacked high resolution instrument measurements on top to complete the blade.

    The usual consequence for that is being defrocked by the institution that granted the relevant degree. To see the others supporting this nonsense is just unbelievable tribalism. And they were warned by a neophyte demonstrating to all that even a beginner could see it was cheating and rendered the paper, MBH98, worthless.

  7. Steve Mc has brought this up as a result of the latest attempt to resurrect the dead parrot of the hockey stick, by the PAGES2k team. Amazingly, they use the long-discredited technique of proxy screening/weighting discussed by Rob Wilson in that 2006 email (and also described many times here and at other sceptic blogs).

    See Steve’s tweets on this (start with his current pinned tweet) or my blog post linked by Clipe.

  8. A lot of the initial errors in early papers can be blamed on lack of stats skill among team members.
    As pointed out in the report into the UEA scandal.

    However, the truly shocking thing is the team still uses bad stats after the effect of bad stats had been pointed out to them, quite publicly, years ago.

    I would like to think that in any other branch of science, if you had been publicly shown to be very wrong in your techniques, methods etc that you change your ways immediately. (and keep your head down for a while!)

  9. This can be explained to a high school graduate. My question is just a repeat of the questions above, but if this is so basic, why do Michael Mann and those that follow him, still have jobs? Why are he and his work and some of those that have criticized it, still in the courts? He should have fallen into world-wide disrepute years ago and shunned into oblivion. Why has this not happened? The whole hockey stick illusion is the underpinning of trillions of dollars in wasted money. But the “emperor” has no clothes and no hockey stick. Even the IPCC has recognized this and stopped using the Mann graphic, yet it is still the symbol of the scam in everyone’s mind, and all the graphics of modern temperature include a form of it. All the data from GISS is made to eventually show it, again, when averaged together. The whole climate scam is built around this technique. All the models (except one Russian) are designed to support it. All “climate scientists” know this technique and use it. It is so universal it is accepted! Why???

  10. “Cook’s summary and contemptuous dismissal seems to have persuaded the other correspondents and the issue receded from the consciousness of the dendroclimatology guild.”

    The same Ed Cook who in CG1, believing it was confidential, said that we know eff all about variability greater than 100 years in the past, but still defends the false narrative.

  11. Mathematically, the formal name for this error (ex post picking) is:
    “Selecting on the dependent variable”. Google to see why it generates false results.

  12. This problem is common in social sciences. A filtered sample is no longer random. But statistics requires a random sample to deliver accurate results. In effect you are lying to the stats package when you filter. So by filtering out the garbage in, you end up creating garbage out, which is hard for non mathematicians to understand. It seems illogical to the layman.

Comments are closed.