Duke Neukom's Secret Sauce

Guest Post by Willis Eschenbach

In my last post, I talked about the “secret sauce”, as I described it, in the Neukom et al. study “Inter-hemispheric temperature variability over the past millennium”. By the “secret sauce” I mean the method which is able to turn the raw proxy data, which has no particular shape or trend, into a consensus-approved hockeystick … but in truth, I fear I can’t reveal all of the secret sauce, because as is far too common in climate science, they have not released their computer code. However, they did provide some clues, along with pretty pictures.

Figure 1. The overview graphic from Neukom2014. Click to embiggen.

So what did they do, and how did they do it? Well, don your masks, respirators, coveralls, and hip boots, because folks, we’re about to go wading in some murky waters …

From my last post, Figure 2 shows the mean of the proxies used by Neukom, and the final result of cooking those proxies with their secret sauce:

Figure 2. Raw proxy data average and final result from the Neukom2014 study. Note the hockeystick shape of the result.

Let me start with an overview of the whole process of proxy reconstruction, as practiced by far too many paleoclimatologists. It is fatally flawed, in my opinion, by their proxy selection methods.

What they do first is to find a whole bunch of proxies. Proxies are things like tree ring widths, or the thickness of layers of sediment, or the amounts of the isotope oxygen-18 in ice cores—in short, a proxy might be anything and everything which might possibly be related to temperature. The Neukom proxies, for example, include things like rainfall and streamflow … not sure how those might be related to temperature in any given location, but never mind. It’s all grist for the proxy mill.

Then comes the malfeasance. They compare the recent century or so of all of the proxies to some temperature measurement located near the proxy, like say the temperature of their gridcell in the GISS temperature dataset. If there is no significant correlation between the proxy and the gridcell temperature where the proxy is located, the record is discarded as not being a temperature proxy. However, if there is a statistically significant correlation between the proxy and the gridcell temperature, then the proxy is judged to be a valid temperature proxy, and is used in the analysis.

Do you see the huge problem with this procedure?

The practitioners of this arcane art don’t see the problem. They say this procedure is totally justified. How else, they argue, will we be able to tell if something actually IS a proxy for the temperature or not? Here is Esper on the subject:

However as we mentioned earlier on the subject of biological growth populations, this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.

“An advantage unique to dendroclimatology”? Why hasn’t this brilliant insight been more widely adopted?

To show why this procedure is totally illegitimate, all we have to do is to replace the word “proxies” in a couple of the paragraphs above with the words “random data”, and repeat the statements. Here we go:

They compare the recent century or so of all of the random data proxies to some temperature measurement located near the random data proxy, like say the temperature of their gridcell in the GISS temperature dataset. If there is no significant correlation between the random data proxy and the gridcell temperature, the random data proxy is discarded. However, if there is a statistically significant correlation between the random data proxy and the gridcell temperature, then the random data proxy is judged to be a valid temperature proxy, and is used in the analysis.

Now you see the first part of the problem. The selection procedure will give its blessing to random data just as readily as to a real temperature proxy. That’s the reason why this practice is “unique to dendroclimatology”, no one else is daft enough to use it … and sadly, this illegitimate procedure has become the go-to standard of the industry in proxy paleoclimate studies from the original Hockeystick all the way to Neukom2014.
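To make that concrete, here is a minimal sketch in Python (my own illustration, since their code has not been released) of what happens when you feed pure red-noise “proxies” into a correlate-and-keep screen against a trending temperature series:

```python
# Sketch: how readily pure noise passes a correlate-and-keep proxy screen.
# This is an illustration of the logical problem, not Neukom's code.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
years = 100                                    # roughly the instrumental overlap
# A stand-in gridcell temperature: modest warming trend plus weather noise
temperature = 0.008 * np.arange(years) + rng.normal(0, 0.15, years)

def red_noise(n, phi=0.7):
    """AR(1) series: random data with the sort of persistence real proxies have."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

n_fake = 1000
passed = sum(
    stats.pearsonr(red_noise(years), temperature)[1] < 0.05   # the screening test
    for _ in range(n_fake)
)
print(f"{passed} of {n_fake} pure-noise 'proxies' pass screening "
      f"({100 * passed / n_fake:.0f}%)")
# Far more than the nominal 5% pass, and every one of them would then be
# carried forward into the reconstruction as a "temperature proxy".
```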

The name for this logical error is “post-hoc proxy selection”. This means that you have selected your proxies, not based on some inherent physical or chemical properties that tie them to temperature, but on how well they match the data you are trying to predict …

The use of post-hoc proxy selection in Neukom2014 is enough in itself to totally disqualify the study … but wait, it gets worse. I guess that comparing a proxy with the temperature record of the actual gridcell where it is physically located was too hard a test, and as a result they couldn’t find enough proxies (read: random data) that would pass it. So here is the test that they ended up using, from their Supplementary Information:

We consider the “local” correlation of each record as the highest absolute correlation of a proxy with all grid cells within a radius of 1000 km and for all the three lags (0, 1 or -1 years). A proxy record is included in the predictor set if this local correlation is significant (p<0.05).

“Local” means within a thousand kilometers? Dear heavens, how many problems and misconceptions can they pack into a single statement? Like I said, hip boots are necessary for this kind of work.

First question, of course, is “how many gridcells are within 1,000 kilometres of a given proxy?” And this reveals a truly bizarre problem with their procedure. They are using GISS data on a regular 2° x 2° grid. At the Equator, anywhere from 68 to 78 of those gridcells have centers within 1,000 km of a given point, depending on where the point falls within its own gridcell … so they are comparing their proxy to ABOUT 70 GRIDCELL VALUES!!! Talk about a data dredge, that about takes the cake … but not quite, because they’ve outdone themselves.

The equatorial situation doesn’t take the cake, though, once we consider, say, a proxy which is an ice core from the South Pole … because there are no fewer than 900 of those 2° x 2° gridcells within 1,000 kilometres of the South Pole. I’ve heard of tilting the playing field in your favor, but that’s ridiculous.
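For anyone who wants to check the arithmetic, here is a quick back-of-envelope sketch in Python (my own, not theirs) counting how many 2° x 2° gridcell centers sit within 1,000 km of a point at various latitudes:

```python
# Count 2 x 2 degree gridcell centers within 1,000 km of a given point.
# Back-of-envelope check of the numbers quoted above, nothing more.
import numpy as np

R_EARTH = 6371.0  # km

def gc_dist_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R_EARTH * np.arcsin(np.sqrt(a))

# Centers of a regular 2 x 2 degree grid
grid_lat, grid_lon = np.meshgrid(np.arange(-89.0, 90.0, 2.0),
                                 np.arange(-179.0, 180.0, 2.0))

def cells_within(lat0, lon0, radius_km=1000.0):
    return int(np.sum(gc_dist_km(lat0, lon0, grid_lat, grid_lon) <= radius_km))

for lat in (0, -50, -70, -89):
    print(f"latitude {lat:4d}: {cells_within(lat, 0):4d} gridcells within 1,000 km")
# On the order of 70 cells at the Equator, about 100 at 50S, and hundreds
# near the pole, where 2-degree cells shrink to slivers of longitude.
```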

I note that they may be a bit uneasy about this procedure themselves. I say this because they dodge the worst of the bullet on other grounds, saying:

The predictors for the reconstructions are selected based on their local correlations with the target grid. We use the domain covering 55°S-10°N and all longitudes for the proxy screening. High latitude regions of the grid are excluded from the correlation analysis because south of 55°S, the instrumental data are not reliable at the grid-point level over large parts of the 20th century due to very sparse data coverage (Hansen et al., 2010). We include the regions between 0°N and 10°N because the equatorial regions have a strong influence on SH temperature variability.

Sketchy … and of course that doesn’t solve the problem:

Proxies from Antarctica, which are outside the domain used for proxy screening, are included, if they correlate significantly with at least 10% of the grid-area used for screening (latitude weighted).

It’s not at all clear what that means. How do you check correlation with 10% of a huge area? Which 10%? I don’t even know how you’d exhaustively search that area. I mean, do you divide the area into ten squares? Does the 10% have to be rectangular? And why 10%?

In any case, the underlying issue of checking different proxies against different numbers of gridcells is not solved by their kludge. At 50°S, there are no less than one hundred gridcells within the search radius. This has the odd effect that the nearer to the poles that a proxy is located, the greater the odds that it will be crowned with the title of temperature proxy … truly strange.

And it gets stranger. In the GISS temperature data, each gridcell’s temperature is some kind of average of the temperature stations in that gridcell. But what if there is no temperature station in that gridcell? Well, they assign it a temperature as a weighted average of the other local gridcells. And how big is “local” for GISS? Well … 1,200 kilometres.

This means that when the proxy is compared to all the local gridcells, in many cases a large number of the gridcell “temperatures” will be nothing but slightly differing averages of what’s going on within 1,200 kilometres.

Not strange enough for you? Bizarrely, they then go on to say (emphasis mine):

An alternative reconstruction using the full un-screened proxy network yields very similar results (Supplementary Figure 20, see section 3.2.2), demonstrating that the screening procedure has only a limited effect on the reconstruction outcome.

Say what? On any sane planet, the fact that such a huge change in the procedure has “only a limited effect” on your results should lead a scientist to re-examine very carefully whatever they are doing. To me, the meaning of this phrase is “our procedures are so successful at hockeystick mining that they can get the same results using random data” … how is that not a huge concern?

Returning to the question of the number of gridcells, here’s the problem with looking through that many gridcells to find the highest correlation. The math is simple: the more times or places you look for something, the more likely you are to find an unusual but purely random result.

For example, if you flip a coin five times, the odds of all five flips coming up heads are 1/2 * 1/2 * 1/2 * 1/2 * 1/2. That is 1/32, or about 0.03, which is below the 0.05 significance threshold usually used in climate science.

So if that happened the first time you flipped a coin five times, five heads in a row, you’d be justified in saying that the coin might be weighted.

But suppose you repeated the whole process a dozen times, with each sample consisting of flipping the same coin five times. If we come up with five heads at some point in that process, should we still think the coin might be loaded?

Well … no. Because in a dozen sets of five flips, the odds of five heads coming up somewhere in there are about 30% … so if it happens, it’s not unusual.

So in that context, consider the value of testing either random data or a proxy against a hundred gridcell temperatures, not forgetting the three lags per gridcell, and then accepting the proxy if any one of those correlations is significant at p < 0.05 … egads. This procedure is guaranteed to drive the number of false positives through the roof.
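Here is that arithmetic laid out, with the obvious caveat that treating all the comparisons as independent is a simplification on my part (neighboring gridcells are correlated, which softens the effect but does not remove it):

```python
# The coin-flip numbers, then the same logic applied to the proxy screen.
p_five_heads = 0.5 ** 5
print(f"P(5 heads in one set of 5 flips)   = {p_five_heads:.3f}")    # about 0.03

p_in_12_sets = 1 - (1 - p_five_heads) ** 12
print(f"P(5 heads somewhere in 12 sets)    = {p_in_12_sets:.2f}")    # about 0.32

# The screen: ~100 gridcells x 3 lags, each tested at the p < 0.05 level.
alpha, n_tests = 0.05, 100 * 3
p_false_positive = 1 - (1 - alpha) ** n_tests
print(f"P(at least one 'significant' correlation from chance alone) = "
      f"{p_false_positive:.6f}")
# Under independence this is essentially 1: random data is all but
# guaranteed to find a "significant" correlation somewhere in the search.
```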

Next, they say:

Both the proxy and instrumental data are linearly detrended over the 1911-1990 overlap period prior to the correlation analyses.

While this sounds reasonable, they haven’t thought it all the way through, and unfortunately this procedure leads to a subtle error. Let me illustrate it using the GISS data for the southern hemisphere, since this is the mean of the various gridcells they are using to screen their data:

Figure 3. GISS land-ocean temperature index (LOTI) for the southern hemisphere.

Now, they are detrending it for a good reason, which is to keep the long-term trend from influencing the analysis. If you don’t do that, you end up doing what is also known as “mining for hockeysticks”, because the trend of the recent data will dominate the selection process. So they are trying to solve a real problem, but look what happens when we do linear detrending:

Figure 4. Linearly detrended GISS land-ocean temperature index (LOTI) for the southern hemisphere.

All that linear detrending does is change the shape of the long-term trend. It does not remove it: the detrended data still rises steadily after about 1910. So they are still mining for hockeysticks.

The proper way to do this detrending is to use some kind of smoothing filter on the data to remove the slow swings in the data. Here’s a loess smooth; you could use other filters, as the particular choice is not critical for these purposes:

Figure 5. Loess smooth of GISS land-ocean temperature index (LOTI) for the southern hemisphere.

And once we subtract that loess smooth (gold line) from the GISS LOTI data, here’s what we get:

Figure 6. GISS land-ocean temperature index (LOTI) for the southern hemisphere, after detrending using a loess smooth.

As you can see, that would put all of the proxies and data on a level playing field. Bear in mind, however, that improving the details of the actual method of post-hoc proxy selection is just putting lipstick on a pig … it’s still post-hoc proxy selection.
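For anyone who wants to try this at home, here is a sketch of the two detrending approaches in Python, using the lowess routine from statsmodels on a made-up hemispheric-style series. The smoothing span is my own choice; as noted above, the particular filter is not critical:

```python
# Linear detrending versus removing a loess (locally weighted) smooth.
# Toy data standing in for a hemispheric temperature series.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
years = np.arange(1880, 2011)
temps = (0.4 * np.sin((years - 1880) / 25)        # slow multidecadal swings
         + 0.005 * (years - 1880)                 # long-term trend
         + rng.normal(0, 0.1, len(years)))        # weather noise

# Linear detrend: subtract the least-squares straight line
slope, intercept = np.polyfit(years, temps, 1)
linear_resid = temps - (slope * years + intercept)

# Loess detrend: subtract a locally weighted smooth of the slow swings
smooth = lowess(temps, years, frac=0.3, return_sorted=False)
loess_resid = temps - smooth

# The linear residuals still carry the bends of the slow swings (and so
# still reward hockeystick-shaped candidates in a screening step); the
# loess residuals are much closer to trendless noise.
print("std of linear residuals:", round(float(linear_resid.std()), 3))
print("std of loess residuals: ", round(float(loess_resid.std()), 3))
```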

And since they haven’t done that, they are definitely mining for hockeysticks. No wonder that their proxy selection process is so meaningless.

From there, the process is generally pretty standard. They “calibrate” each proxy using a linear model to determine the best fit of the proxy to the temperature data from whichever of the roughly 70 gridcells the proxy correlated best with. Then they use another portion of the data (1880-1910) to “validate” the calibration parameters, that is to say, they check how well their formula replicates that early portion of the instrumental data.
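Since their code is not public, here is a generic calibrate-then-validate sketch on toy data, just to show the moving parts; it is not necessarily exactly what they did:

```python
# Generic split-period calibration and validation of a single proxy.
# Toy data only; the real study works on its own proxy and gridcell series.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1880, 1991)
gridcell_temp = 0.006 * (years - 1880) + rng.normal(0, 0.15, len(years))
proxy = 0.5 * gridcell_temp + rng.normal(0, 0.2, len(years))

calib = (years >= 1911) & (years <= 1990)     # calibration period
valid = (years >= 1880) & (years <= 1910)     # validation period

# "Calibrate": least-squares fit of temperature on the proxy, 1911-1990
slope, intercept = np.polyfit(proxy[calib], gridcell_temp[calib], 1)
reconstruction = slope * proxy + intercept

# "Validate": how well does that formula do on the withheld 1880-1910 data?
r_valid = np.corrcoef(reconstruction[valid], gridcell_temp[valid])[0, 1]
rmse_valid = np.sqrt(np.mean((reconstruction[valid] - gridcell_temp[valid]) ** 2))
print(f"validation correlation = {r_valid:.2f}, RMSE = {rmse_valid:.2f} deg C")
```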

However, in Neukom2014 they introduced an interesting wrinkle. In their words:

For most of these choices, objective “best” solutions are largely missing in literature. The main limitation is that the real-world performance of different approaches and parameters can only be verified over the instrumental period, which is short and contains a strong trend, complicating quality assessments. We assess the influence of these methodological choices by varying methodological parameters in the ensemble and quantifying their effect on the reconstruction results. Obviously, the range within which these parameters are varied in the ensemble is also subjective, but we argue that the ranges chosen herein are within reasonable thresholds, based our own experience and the literature. Given the limited possibilities to identify the “best” ensemble members, we treat all reconstruction results equally and consider the ensemble mean our best estimate.

OK, fair enough. I kind of like this idea, but you’d have to be very careful with it. It’s like a “Monte Carlo” analysis. For each step in their analysis, they generate a variety of results by varying the parameters up and down. That explores the parameter space of the model to a greater extent. In theory this might be a useful procedure … but the devil is in the details, and there are a couple of them that are not pretty. One difficulty involves the uncertainty estimates for the “ensemble mean”, the average of the whole group of results that they’ve gotten by varying the parameters of the analysis.

Now, the standard formula for the error in calculating a mean has been known for a long time: the standard error of the mean is the standard deviation of the results divided by the square root of the number of data points.

However, they don’t use that formula. Instead, they say that the error is the quadratic sum (the square root of the sum of the squares) of the standard deviation of the data and the “residual standard deviation”. I can’t make heads or tails out of this procedure. Why doesn’t the number of data points enter into the calculation of the standard error? Is this some formula I’m unaware of?

And what is the “residual standard deviation”? It’s not explained, but I think it is the standard deviation of the residuals in the calibration model for each proxy, which would be a measure of how well or how poorly the individual proxy matched up with the actual temperature it was calibrated against.

So they are saying that the overall error can be calculated as the quadratic sum of the year-by-year average of the residual errors of all proxies contributing to that year and the standard deviation of the 3,000 results for that year … gotta confess, I’m not feeling it. I don’t understand even in theory how you’d calculate the expected error from this procedure, but I’m pretty sure that’s not it. In any case, I’d love to see the theoretical derivation of that result.
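For what it is worth, here is a toy comparison of the textbook standard error of the mean with the quadratic-sum figure as I read their description; this is my reading of their SI, not a confirmed reconstruction of their method:

```python
# Standard error of the ensemble mean versus a quadratic sum of standard
# deviations, for one reconstruction year. Numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
ensemble = rng.normal(loc=0.1, scale=0.3, size=3000)   # 3,000 ensemble results
residual_sd = 0.25                      # stand-in "residual standard deviation"

# Textbook: uncertainty of the mean shrinks with the square root of N
sem = ensemble.std(ddof=1) / np.sqrt(len(ensemble))

# Their apparent formula: quadratic sum of the two standard deviations,
# with no dependence on the number of ensemble members at all
quad_sum = np.sqrt(ensemble.std(ddof=1) ** 2 + residual_sd ** 2)

print(f"standard error of the mean : {sem:.4f}")       # of order 0.005
print(f"quadratic-sum 'error'      : {quad_sum:.4f}")  # of order 0.4
```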

I mentioned that the devil is in the details. The second kinda troublesome detail about their Monte Carlo method is that at the end of the day, their method does almost nothing.

Here’s why. Let me take one of the “methodological parameters” that they are actually varying, viz:

Sampling the weight that each proxy gets in the PC analysis by increasing its variance by a factor of 0.67-1.5 (after scaling all proxies to mean zero and unit standard deviation over their common period).

OK, in the standard analysis the variance is not adjusted at all, which is the equivalent of a variance factor of 1. Now, they are varying it above and below 1, from 2/3 to 3/2, in order to explore the possible outcomes. This gives a whole range of results; they collected 3,000 of them.

The problem is that at the end of the day, they average all of the results to get their final answer … and of course, that lands them right back where they started. They have varied the parameter up and down from the actual value used, but the average of all of that is just about the actual value …

Unless, of course, they vary the parameter more in one direction than the other. This, of course, has the effect of simply increasing or decreasing the parameter. Because at the end of the day, in a linear model if you vary a parameter and average the results, all you end up with is what you’d get if you had simply used the average of the random parameters chosen.
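Here is a little sketch of that point: in a linear pipeline, scaling a proxy by random factors drawn from 2/3 to 3/2 and then averaging the ensemble simply reproduces the result at the mean factor (which, note, is about 1.08 rather than 1, because that range is not symmetric about 1):

```python
# Averaging a perturbed-parameter ensemble in a linear pipeline just gives
# the result at the mean parameter value. Toy illustration only.
import numpy as np

rng = np.random.default_rng(3)
proxy = rng.normal(size=200)        # a standardized proxy series
weight = 0.4                        # toy linear "reconstruction" weight

def reconstruct(scale):
    """Everything downstream of the scaling here is linear in 'scale'."""
    return weight * (scale * proxy)

factors = rng.uniform(2 / 3, 3 / 2, size=3000)          # the perturbed parameter
ensemble_mean = np.mean([reconstruct(f) for f in factors], axis=0)
single_run = reconstruct(factors.mean())

print("mean perturbation factor:", round(float(factors.mean()), 3))  # ~1.08, not 1
print("max |ensemble mean - run at mean factor|:",
      float(np.abs(ensemble_mean - single_run).max()))               # ~0
```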

Dang details, always messing up a good story …

Anyhow, that’s at least some of the oddities and the problems with what they’ve done. Other than that it is just more of the usual paleoclimate handwaving, addition and distraction. Here’s one of my favorite lines:

To determine the extent to which reconstructed temperature patterns are independently identified by climate models, we investigate inter-hemispheric temperature coherence from a 24-member multi-model ensemble

Yes siree, that’s the first thing I’d reach for in their situation, a 24-model climate circus, that’s the ticket …

If nothing else, this study could serve as the poster child for the need to provide computer code. Without it, despite their detailed description, we don’t know what was actually done … and given the fact that bugs infest computer code, they may not even have done what they think they’ve done.

Conclusions? My main conclusion is that almost the entire string of paleoclimate reconstructions, from the Hockeystick up to this one, is fatally flawed through the use of post-hoc proxy selection. This is exacerbated by the bizarre means of selection. In addition, their error estimates seem doubtful. They are saying that they know the average temperature of the southern hemisphere in the year 1000 to within a 95% confidence interval of plus or minus a quarter of a degree C? Really? … c’mon, guys. Surely you can’t expect us to believe that …

Anyhow, that’s their secret sauce … post-hoc proxy selection.

My best wishes to all,

w.

CODA: With post-hoc proxy selection, you are choosing your explanatory variables on the basis of how well they match up with what you are trying to predict. This is generally called “data snooping”, and in real sciences it is regarded as a huge no-no. I don’t know how it got so widespread in climate science, but here we are … so given that post-hoc selection is clearly the wrong way to go, what would be the proper way to do a proxy temperature reconstruction?

First, you have to establish the size and nature of the link between the proxy and the temperature. For example, suppose your experiments show that the magnesium/calcium ratio in a particular kind of seashell varies up and down with temperature. What you do then is you get every freaking record of that kind of seashell that you can lay your hands on, from as many drill cores in as many parts of the ocean as you can find.

And then? Well, first you have to look at each and every one of them, and decide what the rules of the game are going to be. Are you going to use the proxies that are heteroskedastic (change in variance with time)? Are you going to use the proxies with missing data, and if so, how much missing data is acceptable? Are you going to restrict them to some minimum length? Are you only allowing proxies from a given geographical area? You need to specify exactly which proxies qualify and which don’t.

Then once you’ve made your proxy selection rules, you have to find each and every proxy that qualifies under those rules. Then you have to USE THEM ALL and see what the result looks like.

You can’t start by comparing the seashell records to the temperature that they are supposed to predict and throw out the proxies that don’t match the temperature, that’s a joke, it’s extreme data snooping. Instead, you have to make the rules in advance as to what kind of proxies you’re going to use, and then use every proxy that fits those rules. That’s the proper way to go about it.
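To make the contrast concrete, here is a sketch of “rules first, then use everything that qualifies”. The particular thresholds (record length, missing data, region) are hypothetical examples; the point is that they are fixed in advance, before anyone looks at how a proxy correlates with the temperature it is supposed to predict:

```python
# A-priori proxy selection: rules set in advance, no peeking at the target.
# The rule thresholds below are hypothetical examples, not anyone's actual rules.
from dataclasses import dataclass

@dataclass
class ProxyRecord:
    name: str
    lat: float           # degrees, negative = south
    n_years: int         # record length
    frac_missing: float  # fraction of missing values

RULES = dict(min_years=500, max_missing=0.10, lat_range=(-90.0, 0.0))

def qualifies(p: ProxyRecord) -> bool:
    """Selection uses only the proxy's own properties, never its correlation
    with the temperature record it is meant to reconstruct."""
    return (p.n_years >= RULES["min_years"]
            and p.frac_missing <= RULES["max_missing"]
            and RULES["lat_range"][0] <= p.lat <= RULES["lat_range"][1])

records = [
    ProxyRecord("core_A", -43.0, 1000, 0.02),
    ProxyRecord("core_B", -12.0, 300, 0.01),    # too short: excluded by rule
    ProxyRecord("core_C", -77.5, 800, 0.20),    # too gappy: excluded by rule
]
selected = [p for p in records if qualifies(p)]
print("use ALL of:", [p.name for p in selected])
```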

PS–The Usual Request. If you disagree, quote what you disagree with. Otherwise, no one really knows what the heck you’re talking about.

tonyb
Editor
April 4, 2014 12:23 am

Willis
You forgot to mention that sometimes the random data is inverted or truncated, but the answer will still be robust as the more random the data is, the more robust it becomes. (apparently)
tonyb

April 4, 2014 12:27 am

Hint to paleos…if your linearly detrended data show clear linear trends still, you have done something very wrong.

Mike Bromley the Kurd
April 4, 2014 12:38 am

“this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal.”
What???????? this does not mean that one could not improve a silk purse by reducing the number of Sow’s ears used if the purpose of removing sow’s ears is to enhance the desired silk purse.
This is astounding beyond all belief. They come right out and SAY that they fake it. Holy cow.

climatereason
Editor
April 4, 2014 12:41 am

Willis
Perhaps we can have some promotional lapel badges?
“Duke Neukom’s secret sauce-now with added robustness.”
nice piece by the way.
tonyb

April 4, 2014 12:44 am

I’m reminded of the origins of http://en.wikipedia.org/wiki/Duke_Nukem way back when. BTW the wiki does not go into the origins. – a dispute with another Neukom.

Dudley Horscroft
April 4, 2014 12:57 am

My Physics Master used to call this the use of Cook’s Constant and Fudge’s Formula.
Fudge’s formula – Divide the result you want by the result you got. This gives you Cook’s Constant. Then use Fudge’s formula – multiply the result you got by Cook’s Constant. This gives you the result you want. QED. Success!
Used by University examiners to catch poor candidates – eg, giving data for an experiment to calculate the water equivalent of a copper calorimeter. The given data, if correctly worked, gives a value of, say, 0.01. Students knowing that the correct value is 0.1, manage to slip in a little error along the way, and turn in the result 0.1 Zero marks for that question, and look very, very, carefully at the working in all the other questions. Students a bit more honest turn in the correct answer of 0.01 and get full marks. Really bright students, knowing the answer should be 0.1, turn in the answer of 0.01, and add a rider – “I believe that the result of other experiments usually gives a value of 0.1. I would therefore question very carefully the data as recorded in this experiment, and/or the way it was carried out.”

thingadonta
April 4, 2014 1:11 am

Am I right in concluding that the average of random data and hockeysticks is still hockeysticks?
That is, the random data in the past pre ~20th century averages, cancels or smooths out (depending on the statistical method used) to produce a smooth shaft, whilst it combines with the more recent self- selected temperature upticks to give a hockeystick shape overall, because these have been preferentially weighted by the selection method to begin with?
I think, hesitantly, I’m not all that far off, and I am not trained in statistics. But why aren’t these papers reviewed by professional statisticians before they pass peer review?

Stephen Richards
April 4, 2014 1:24 am

Mike Bromley the Kurd says:
April 4, 2014 at 12:38 am
“this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal.”
This new super duper statistical method was used in one of the more famous pieces of fraud that SteveMc dissected. I just can’t remember which one.

jim
April 4, 2014 2:04 am

thingadonta says: “That is, the random data in the past pre ~20th century averages, cancels or smooths out (depending on the statistical method used) to produce a smooth shaft, whilst it combines with the more recent self- selected temperature upticks to give a hockeystick shape overall’
JK — That is also my conclusion.
You will also get the average of noise AFTER the calibration period. Perhaps this explains the need to “hide the decline”
Thanks
JK

Nick Stokes
April 4, 2014 3:01 am

If readers are interested, I have posted an active viewer for the Neukom proxy data here.

Espen
April 4, 2014 3:01 am

In textbook statistics, the way to use methods like principal component analysis or stepwise multiple regression is as an exploratory tool to find suitable mathematical models for what you’re studying. When you’ve decided on your model, you should discard all data used during your model building and collect a completely new set of data that you can now test your model on. Then, and only then, can you say anything about the statistical significance of your results.
At least this was the way I learned it at university and the first time I looked at this paleo stuff I thought it was so bizarre and wrong that I was almost certain that I didn’t understand what they were doing, they couldn’t really be messing up things that thoroughly, could they? It seems they could. The whole field of climate science is tainted by the awful statistical wrongdoings of the tree wringers, and I can’t really trust anything of it until the serious researchers speak up against this.

Kon Dealer
April 4, 2014 3:06 am

Why can’t the people who reviewed this paper- who are supposed to be “experts”- see the logical fallacies of the methodology?
Is it because they are incompetent, stupid, or mates of the authors (pal review)- or all 3?
I’ll leave you to ponder…

JDN
April 4, 2014 3:18 am

Willis:
That was great! I’ve said it before, you should do a course on stats, like on Youtube or Khan Academy, showing off your mad R skills and some of this real world data analysis on a systematic basis.
I’m sure you’ll exceed “every freaking record” (heh) they have for stats course attendance. Or just EFR for short.

Greg
April 4, 2014 3:23 am

OMG , this is Mann’s hockeystick all over again.
Following tonyb’s point, how many Tiljanders are there in the retained “proxies”?
The best part is what the data shows in figure 2. The ‘mean screened proxies’ line clearly shows c. 950 AD a good 0.5K warmer than today and a steady increase from about 1550. They manage to annihilate both those features in their final result.
The other thing they manage to remove is an apparent repetitive bump in the data. Since the LIA there have been three bumps and we are currently at the top of a fourth one.
If that grey line is the result of their suspect screening, they must have some extra chilli sauce to get to the red line as the final result.
In the caption of figure 2 you label grey as “Raw proxy data average” yet in the legend you call it “mean screened proxies”. Could you clarify what the grey line is?
Is this data available?
thanks.

April 4, 2014 3:31 am

Never mind the hip boots, you need chest waders for this one.

Doubting Rich
April 4, 2014 3:37 am

Leave them alone. There is a great history of post-hoc proxy selection in climate science. It has been serving climate scientists lucratively … I mean well for years. It is so common it should now count as standard technique in climate science.
So who was it that said that the best scientists go into physics and chemistry, not climate science?

Greg
April 4, 2014 3:46 am

SI figs 6 and 7 show their uncertainties go through the roof at the end of the data, after generally declining in more recent times.
I guess that not that many proxies run that recently, so the results get unreliable.

Sparks
April 4, 2014 3:57 am

Is “Duke Neukom” a video game reference or just a happy coincidence?

Editor
April 4, 2014 3:59 am

Thanks, Willis. A great analysis and wonderful read.
Regards

April 4, 2014 4:07 am

“Then comes the malfeasance. They compare the recent century or so of all of the proxies to some temperature measurement located near the proxy, like say the temperature of their gridcell in the GISS temperature dataset. If there is no significant correlation between the proxy and the gridcell temperature where the proxy is located, the record is discarded as not being a temperature proxy. However, if there is a statistically significant correlation between the proxy and the gridcell temperature, then the proxy is judged to be a valid temperature proxy, and is used in the analysis.”
Early on in my delving into climate “science,” one of Michael Mann’s follow-up hockey sticks used this same type of procedure, and my jaw just dropped; I soon began calling it “algorithmic cherry picking” and finally “Al-Gore-ithmic cherry picking.” I was roundly ridiculed by whole armies of online Gorebots, around 2008. My background was benchtop chemistry and nanofabrication with some genetics lab work too, and in all of that the main lesson driven home was to be oh so careful that you are not fooling yourself, that you really have what you think you do, damn it. I couldn’t believe this sort of very simple cheating was allowed in another field of science. To this day such hockey sticks remain unretracted in the literature. It was very difficult and in the end impossible to use this as a useful debate point, online, since AGW enthusiasts simply had no idea how science really worked, how intense the discipline in it was to get things right, at least in the hard physical sciences. So I switched to dirt simple data plots that showed no AGW signal in various old thermometer and tide gauge records. Then there was no hand waving of it away, data on the ground, that is. The Marcott 2013 hockey stick, however, whose input data Willis plotted and which shows no blade in any of it, was the first hockey stick that anybody at all could competently debunk just by looking at the data. For myself though, I have to chuckle at the meaningless idea that you can average or somehow black-box combine various proxy series that vary between each other in upwards, downwards, or kinked trends and expect the result to have any physical meaning at all. It’s the sheer audacity of the bad math involved that makes so few people so far really understand that true cheating is at work, not just over-enthusiasm and precautionary principle panic.

April 4, 2014 4:21 am

The same secret sauce as one finds in Mikey Mann’s ‘Nature Trick’ fraud, namely a deliberate use of statistical fraud based on malicious code malfeasance. I have IT headcount who will for no charge, audit this or any other climate-baloney application from top to bottom. My guess is that the main elements of the coding based fraud will be uncovered within 3 days or less. I offered said resources to the little-Mann, but so far, no interest. In the name of ‘science’ I would assume he would be delighted to prove that his ‘system’ is ‘sound’.

April 4, 2014 4:22 am

Nick: Do you see things differently from Willis?

Chris Wright
April 4, 2014 4:32 am

NikFromNYC says:
April 4, 2014 at 4:07 am
“Then comes the malfeasance…….”
Very nicely put. The sad, sad thing is that these frauds are still winning awards.
On page 153 of Montford’s ‘The Hockey Stick Illusion’ are twelve perfect hockey stick graphs created with Mann’s method. Problem is, eleven were created with random red noise. It’s blindingly obvious that Mann’s method was mining for hockey sticks. It looks like Neukom has been using essentially the same deeply flawed method.
So, here’s my question: does Neukom2014 also create perfect hockey sticks from red noise? Is it possible to replicate Neukom2014 and, if not, why not? Of course, if it can’t be replicated by other researchers then it’s not science, it’s an opinion piece. It sounds as if the proxy data is available, but what about the methodology and software used? Is this publicly available? I assume not.
It seems to me that the best way to prove fraud is to prove, using the various author’s data and methods, that these methods reliably create hockey sticks from random data.
Chris

Nick Stokes
April 4, 2014 4:36 am

bernie1815 says:April 4, 2014 at 4:22 am
“Nick: Do you see things differently from Willis?”

I’ve concentrated so far on visualizing the proxy data, rather than the analysis. I’ll note one thing though. Neukom et al. looked at the effect of their screening. In Fig 20, they show the results of recon with:
1. Their screening with the 1000 km test
2. A screening with a 500 km test
3. No screening at all.
4. A simple average.
No screening and the 1000 km test gave very similar results. The 500 km test screened out more proxies, reducing 111 to 85, and made a bit more difference. A simple average was quite a lot different, but that is not surprising.

April 4, 2014 4:42 am

In the bad old days of medical research publishing, oncology studies would commit a similar post-hoc fallacy. Doctors would try a new cancer treatment on a group of patients. At some point the size of the tumors would be compared to the pre-treatment size, and patients would be divided into “responders” if the tumors shrunk, and “non-responders” if the tumors did not shrink. Treatment would be stopped in the non-responders, but continued in the responders until it stopped working for them too. So far, so good. Then the drug company representatives would write a paper on how much the new treatment improved life expectancy in responders when compared with a control group, and how the new treatment should become standard. The graphs were impressive.
By limiting their analysis to “responders” they selected only the patients with cancers that are susceptible to the drug. What about the non-responders? The morbidity of the treatment, combined with the morbidity of progressive disease, shortened their lives.
This created a situation where the 20% of patients who responded lived an extra 6 months, on average, while the 80% of non-responders survived an average of 2 fewer months. The group overall had 0.4 fewer months of survival, and yet the drug was being proposed as a new treatment because of the great job it did for the responders. The non-responders get thrown under the bus because post-hoc selection has crept into the study.
There may still be some hope that the drug can help patients, but not until some test can identify the responders up-front, before the non-responders get exposed to the treatment. Just as Willis describes an up-front selection of proxies, followed by an analysis of *all* the data, an up-front selection of patients, followed by an analysis of *all* the data is part of how we protect ourselves from statistical fallacies and wishful thinking.
I know how it feels to work in the heady times of a new field, where low-hanging fruit appears to be everywhere, and very junior people can make discoveries and be experts, and also what it is like to work in a mature field, where the grave markers of half-cocked theories dot the landscape. I think this is why scientists in more mature disciplines that have developed a culture of rigor and self-criticism, because they have been burned in the past, are more likely to be climate skeptics.

The Ghost Of Big Jim Cooley
April 4, 2014 5:00 am

Excellent, Willis. But I still want Kon Dealer’s question answered:
“Why can’t the people who reviewed this paper- who are supposed to be “experts”- see the logical fallacies of the methodology?”

Aussiebear
April 4, 2014 5:01 am

I think this may get Modded. Why does http://www.populartechnology.net/ hate you?
What you write seems, on the face of it, reasonable.

rgbatduke
April 4, 2014 5:05 am

Now you see the first part of the problem. The selection procedure will give its blessing to random data just as readily as to a real temperature proxy. That’s the reason why this practice is “unique to dendroclimatology”, no one else is daft enough to use it … and sadly, this illegitimate procedure has become the go-to standard of the industry in proxy paleoclimate studies from the original Hockeystick all the way to Neukom2014.
What is really amazing is that statistics is so arcane and difficult, and climatology people so ill-trained in the art, that it is chock full of people that are precisely that daft. If you want to make cherry pie you have to pick cherries. If you want to sell catastrophic global warming you have to take the simple mean of the means of the 36 models in CMIP5 independent of their model independence, how many perturbed parameter ensemble runs contribute, or how well each model does in comparison with the data and then you have to pretend that the envelope of the results has some sort of statistical meaning as a measure of statistical variance in order to be able to make various claims “with confidence”.
Note that this is the exact opposite of what they are doing in proxy estimation. They are refusing to only give weight to models that are at least arguably working across the thermometric data to predict the future. This is one of the places where selectivity could easily be justified, as the models are not at all random samples and do not produce “noise” — comparison with the data is merely identifying probable occult bias, errors in computational methodology, or errors in the physics (all of which exist, I’m pretty sure, in profusion, in most of the models).
They acknowledge in AR5 that this procedure is flawed and means that when they use it they can no longer assess the predictivity of the result or any sort of measure of confidence (in a single line in the entire document that no policy maker will ever read) and then do it anyway.
Then we could go on to kriging, Cowtan and Way, or Trenberth’s paper on millidegree oceanic warming, and how to fill in a sparse grid and make statistically impossible claims for precision at the same time!
I don’t know if these guys understand it, but the predictions and claims of AR-N (for any value of N) are going to be subjected to the brutal effects of time and empirical verification no matter what they do. The data on global climate (thanks largely to enormous investment in technology) is getting to be vast enough, and based on enough unfutzable hardware, and dense enough (although it is a LONG way from adequate there) that it is getting to be very difficult to “readjust” existing temperatures still warmer to prolong the illusion of ongoing warming. RSS alone is putting a serious lid on surface temperatures, for example.
What, exactly, do they plan to do if temperatures actually start to fall as we move into the long slow decline associated with the current (already weak) solar cycle? Or if they merely remain flat? What will they do if arctic sea ice actively regresses to the mean while antarctic ice remains strong? What will they do if the current possible ENSO fizzles like the last two or (their worst nightmare) turns into a strong La Nina and chills the entire Northern Hemisphere? Or just turns out to be weak and have little effect on temperature?
In the case of tree rings, even trees that were selected not infrequently failed to be predictors when compared to known temperatures outside of the selection interval (e.g. the infamous bristlecone pine). That’s the problem with multivariate dependency in a proxy — it might well reflect temperature for fifty years and then turn around and reflect rainfall, or an ecological change, or depletion of nutrients, or predator prey cycles, or volcanic activity, or the flicked switch change associated with a flood.
rgb

Alan Robertson
April 4, 2014 5:15 am

Espen says:
April 4, 2014 at 3:01 am
“In textbook statistics, the way to use methods like principal component analysis or stepwise multiple regression is as an exploratory tool to find suitable mathematical models for what you’re studying. When you’ve decided on your model, you should discard all data used during your model building and collect a completely new set of data that you can now test your model on. Then, and only then, can you say anything about the statistical significance of your results.
At least this was the way I learned it at university and the first time I looked at this paleo stuff I thought it was so bizarre and wrong that I was almost certain that I didn’t understand what they were doing, they couldn’t really be messing up things that thoroughly, could they? It seems they could. The whole field of climate science is tainted by the awful statistical wrongdoings of the tree wringers, and I can’t really trust anything of it until the serious researchers speak up against this.”
_____________________________________
We know that Willis is a fun- loving guy, but he’s just shown us again that he’s serious about unmasking the endemic statistical malpractices of “climate science”, which at this point in time, look more like blatant and deliberate fraud.

Bill Illis
April 4, 2014 5:23 am

Why would anyone do this?
Anyone who is able to obtain a PhD and an academic position at any well-known university is going to know this is wrong mathematically and wrong ethically.
It’s depressing that this is occurring and even more depressing that it is encouraged.
It is a symptom of something that has gone really, really wrong.

hunter
April 4, 2014 5:24 am

The AGW hypesters wave the scary pictures around and pretend they represent evidence.

Rob
April 4, 2014 5:38 am

Yes, proxy data reconstructions are perhaps the best example of “non science”.

ferd berple
April 4, 2014 5:44 am

Calibration is known statistically as “selection on the dependent variable”. It is forbidden mathematically because it leads to spurious (false) correlations.
However, in Climate Science, where you are trying to prove something that is not true, spurious correlations are a positive boon.

JustAnotherPoster
April 4, 2014 5:56 am

it won’t be long before rgb is termed a “denier”.

Oscar Bajner
April 4, 2014 5:56 am

I have been trying (as a scientific layman) to understand the (basics of the) science of climate, and the essence of the controversies as interpreted by skeptics, agnostics and cynics alike for probably 10 years now. I have followed several sagas in as much depth as I could stand (with
particular reference to climate audit issues of reconstructions), and I have reached the following firm conclusion:
Two men were walking the plains of the Serengeti when they came upon a pride of lions in the open. The men froze, until several of the lions roused themselves and began to plod purposefully towards them. One of the men sat down, ripped off his heavy boots and
produced a pair of Nike (TM) running shoes from his kitbag.
“What the hell are you doing?” asked the other man, “you’ll never manage to outrun those
lions!”
“I know” replied the man, furiously tying up his shoelaces, “I just have to outrun you!”
BTW: It is incomprehensible to me that scientific studies that utilize computers and software are not required to publish their source code; as Willis points out, from a bug-catching point of view alone, it is necessary.
All models are wrong, but some models are useful.
All software has bugs, but some bugs have been found.

Dudley Horscroft
Reply to  Oscar Bajner
April 4, 2014 6:19 am

rgbatduke asks (4 April, 05:05):
“What, exactly, do they plan to do if temperatures actually start to fall as we move into the long slow decline associated with the current (already weak) solar cycle? Or if they merely remain flat? What will they do if arctic sea ice actively regresses to the mean while antarctic ice remains strong?”
Simple. They will say that these prove that Climate Change exists and therefore we have to do something really quickly to stop it, because it will be disastrous, and in the mean time send them some more money.

Lance Wallace
April 4, 2014 6:58 am

jgbatduke’s illustrious predecessor J.B.Rhine (at Duke) used the same technique to derive gold from dross in his studies of ESP, telekinesis, etc. By running a large number of Duke students through his card-guessing games, there would be a few high scores. (“Responders” as mentioned by UnfrozenCavemanMD above in connection with drug effectiveness tests). Further tests on the responders might pick out someone doing well on both sets of tests, a super-responder. QED, ESP exists. The late great Martin Gardner dealt with this in his book Fads and Fallacies in the Name of Science.

Lance Wallace
April 4, 2014 6:59 am

Whoops, rgbatduke. (Sorry, rgb).

April 4, 2014 7:02 am

Professor Brown,
The Universities in this country seem to produce an awful lot more Mann’s and Neukom’s than people such as yourself. Could you discuss how you got where you are, and more particularly, how you have managed to stay there? We hear truth from you, every time, and deliberate lies from the “Climate Scientists” who are allowed to use the imprimatur of, say, Princeton, or Stanford, or U of NSW. The problem is not Mann, it is the University Presidents who permit his ilk to flourish.
When I was at Michigan I took an Econ course, and discovered that the professor was an active Communist preaching the drivel from the Club of Rome. I did not last long in that class!
Yes, something in our society has gone very very wrong…

ferdberple
April 4, 2014 7:05 am

http://www.nyu.edu/classes/nbeck/q2/geddes.pdf
“Most graduate students learn in the statistics courses forced upon them that selection on the dependent variable is forbidden, but few remember why, or what the implications of violating this taboo are for their own work.”
At the heart of statistics is the notion of the “random sample”. That you have chosen the data randomly. On this basis you can make statistical conclusions.
“Calibration” changes the data from a “random sample” to a “calibrated sample”. This sample is no longer random, thus your statistical conclusions are no longer valid.
Your statistics may well tell you there is a high correlation, but because your sample is no longer randomly selected, this is a spurious (false) correlation. Thus your conclusions are false, or at best unproven.
In Medicine this has been a hard learned lesson. Many of the treatment disasters of the past have resulted from this statistical mistake. Statistics requires that your sample be randomly selected. As soon as you seek to “qualify” the sample you cannot use statistics to test the results.
Unfortunately Climate Science is one of those soft sciences, where the results trump methodology. If the method gives the expected (desired) answer, the method is assumed to be correct. Snake oil salesmen use the same approach.

ferdberple
April 4, 2014 7:14 am

Lance Wallace says:
April 4, 2014 at 6:58 am
“Responders” as mentioned by UnfrozenCavemanMD
=============
“Responders” violate the statistical requirement of the random sample. This leads to false statistical conclusions.
The problem is that our common sense tells us that we should be able to “improve” the sample by selecting only “responders”. While forgetting that statistics forbids this.

Evan Jones
Editor
April 4, 2014 7:22 am

An advantage unique to dendroclimatology
Sounds like a lot of Gergis to me.
Like I said, hip boots are necessary for this kind of work.
It’s all too hip for me.

Craig Loehle
April 4, 2014 7:25 am

The way science is supposed to work is that things known to compromise your results must be avoided at all cost. That is why randomized double-blind trials were instituted in medicine–if patients knew they were taking the medicine they reported getting better, and the doctors thought they were better. If something violates conservation of energy, you check your equipment and calculations. And always always you must beware of random effects. You make sure your sample size is adequate. You watch out for spurious correlation. You keep samples for verification. post hoc proxy selection has been rigorously shown (and has been known in econometrics for decades) to be a risky procedure able to mine for spurious relationships easily. This problem has been ably demonstrated in stock market forecasting for example. When something has been clearly demonstrated to be likely to mislead, you guard against it, period. You don’t keep doing it over and over because you like the answer. And the reviewers are guilty of this also.

izen
April 4, 2014 7:33 am

@-“To show why this procedure is totally illegitimate, all we have to do is to replace the word “proxies” in a couple of the paragraphs above with the words “random data”, and repeat the statements.”
This is totally illegitimate reasoning because there is never any expectation or possibility that the correlation with random data is anything but coincidence, with a probability that can be calculated.
The correlation between proxies and recent temperature data IS legitimate because there are well established physical and biological processes that result in temperature changes altering the proxy measured as with dO18 isotope analysis.
A lack of correlation in such cases indicates that factors other than temperature are distorting the data so that it should be discarded.
@-“The name for this logical error is “post-hoc proxy selection”. This means that you have selected your proxies, not based on some inherent physical or chemical properties that tie them to temperature, but on how well they match the data you are trying to predict …”
You have this entirely reversed, I am not sure what the name for that logical error is but the proxies are chosen BECAUSE of their known potential for revealing past temperatures based on some inherent physical or chemical properties that tie them to temperature. The correlation with recent temperatures just confirms and quantifies that tie.

Greg
April 4, 2014 7:34 am

“No screening and the 1000 km test gave very similar results. The 500 km test screened out more proxies, reducing 111 to 85, and made a bit more difference. A simple average was quite a lot different, but that is not surprising.”
Is the simple average of the whole lot statistically less valid than cherry pie?
From Willis’ figure 2 the grey line looks like it may be more credible than the processed result.
Steve Mc says the data is now archived at NOAA, anyone have a link?

Rud Istvan
April 4, 2014 7:43 am

Taking a few steps back to survey the landscape, Willis’ excellent posts on Neukom illustrate two larger issues.
First, in the whole proxytology (pun intended) field the tradition of a fundamentally flawed procedure is so ingrained it automatically passes peer review. It is like phrenologists practicing phrenology in the 1800s. The most effective macro-response is not proxy by proxy, post-hoc by post-hoc paper whack-a-mole, but rather discrediting proxytology as legitimate science in the first place.
Second, why the need to repeatedly try to establish a hockey stick? I think the main need is the shaft, not the blade. We have thermometer records for the last century showing small scale natural variability beyond reasonable dispute even according to the IPCC. Disappear the MWP and the LIA and you remove large scale natural variability from the AGW equation in order to make future catastrophe claims. Small scale natural variability (the pause) is the biggest thing going against the AGW model projections. Only if large scale variability does not exist is any claim of unprecedented, impending doom … even possible. So that is really the only ‘service to the cause’ proxytologists can provide. Now, there are ways to get at large scale natural variability both qualitatively and quantitatively, for example ClimateReason’s use of written historical records, that do not involve proxytology. Its practitioners must be feeling very threatened. They have an illegitimate science (treemometers) using illegitimate methods (post hoc selection) that can easily be rebutted by more legitimate means.
Existentially threatened.

April 4, 2014 7:46 am

Crappy cult science + crappy MSM reporting = dire predictions of doom. Without a warming trend, “fear” is all they have left in their arsenal. I’d venture 97% of Alarmists are “parrots,” just repeating what the MSM, and/or McKibben et al, tell them. This is a battle for minds, and (as the MSNBC poll showed the other day) they are losing. You don’t need a computer model to predict CAGW voices becoming shriller and claims more outrageous as the scientists who are crying wolf hurriedly pressure politicians to enshrine their failed theories into government policy, to support their ongoing bogus research. That’s why the demands of censorship of “deniers/skeptics” are increasing…silence the disbelievers and implement the manifesto….and ignore the record cold temperatures outside…this is going to be fun to watch this slow motion train wreck…the irony is Pachauri, a railroad engineer, is the locomotive’s driver….http://m.youtube.com/watch?v=6VIECzlFVUM. “Drivin’ that train, high on CO2 and methane, Mister Pa-chauri you’d better watch your speed….”

DocMartyn
April 4, 2014 7:57 am

They have used a proxy representing the Galapagos Islands; nice and red it is too. The ‘Chiefio’ looked at the temperature series of the Galapagos Islands and noticed its non-warming;
http://chiefio.files.wordpress.com/2011/02/galapagos-islands-temp-w-a.gif
http://www.wolframalpha.com/input/?i=Galapagos+Islands+temperature
Now, how did they calibrate their proxy?

NormD
April 4, 2014 8:12 am

Stupid question:
Before any method is applied to real data should not one have to demonstrate that the method works correctly when fed random data?
What I am imagining is that you generate say a million sets of random data, apply your method and see if the results show a trend. If so then you know that your method has a bias and you go back to the drawing board.

Matt Skaggs
April 4, 2014 8:21 am

Excellent perspective piece, thanks Willis!

April 4, 2014 8:25 am

Thank you, Willis.
Steamboat Jack (Jon Jewett’s evil twin)

April 4, 2014 8:32 am

Izen,
How does rainfall correlate to temperature? How do tree ring widths or “latewood density” correlate to temperature? How does stream flow correlate to temperature? How do sediment layer thicknesses correlate to temperature?
The simple answer: Not in any way easy to quantify! Such “data” would have been thrown out of my 10th-grade biology class, and just because it comes from Stanford or wherever does not make it correlate.
Come back with something rational, we would all like to hear it…

MarkW
April 4, 2014 8:33 am

The sad thing is that with a straight face, these guys claim to be doing science.

Paul Linsay
April 4, 2014 8:38 am

“The Neukom proxies, for example, include things like rainfall and streamflow … not sure how those might be related to temperature in any given location, but never mind.” You got it right in one and should quit right there.
I’ve never seen a single proxy where it’s been demonstrated that there is a physical connection between the proxy and temperature. For these studies to be valid, there have to be independent experiments demonstrating the connection, along with calibration curves, between temperature and [your proxy here]. (I once asked my arborist if tree rings were thermometers. He just laughed. Nope, they measure precipitation. Worse yet, if the north side of the tree got more water than the south side, the rings would be wider on the north side. There’s even a climategate email by a biologist stating this. ) Quite frankly, I think even the d18O measurements are suspect as temperature proxies. http://scienceofdoom.com/2014/02/24/ghosts-of-climates-past-seventeen-proxies-under-water-i/
An honest presentation of the data would also include error bars derived from the calibration curves. Take a look at the data from the recent BICEP2 experiment that measured the Cosmic Microwave Background. Every point has its one sigma error bar, the standard in physics. Ever see error bars on the data points in a proxy time series? Neither have I.
The entire paleo-proxy effort fails at the level of basic science. All the statistical manipulations in the world can’t change that.

pottereaton
April 4, 2014 8:42 am

Professor Brown: as always, thank you for your clear analysis. You ask some questions I’ve been asking (as have many others):

What, exactly, do they plan to do if temperatures actually start to fall as we move into the long slow decline associated with the current (already weak) solar cycle? Or if they merely remain flat? What will they do if arctic sea ice actively regresses to the mean while antarctic ice remains strong? What will they do if the current possible ENSO fizzles like the last two or (their worst nightmare) turns into a strong La Nina and chills the entire Northern Hemisphere? Or just turns out to be weak and have little effect on temperature?

First they will pretend that the science is and always will be settled. This has already begun. Then, I suppose they will do what ideologues always do. They will become reactionary. They will dig in and defend the widespread changes in policies, regulations, laws, technology and attitudes that they have inspired and for which they lobbied. They will continue to proselytize and indoctrinate children in the old ways. There was always a goal for all this, and it was legislative, social and cultural. Revolution by other means. And they have succeeded to a degree. Imposing new technology, passing laws, and decreeing regulations is difficult, but abandoning and/or repealing them when they become omnipresent, burdensome, archaic and even destructive, is even more so.
Which is not to say that all that change has been negative. But there is much that the new generation of progressives will have to do to clean up the mess that was created by a combination of genuine environmental concern and the ability to raise enormous amounts of funding through the device of climate alarmism.

April 4, 2014 8:51 am

Thanks, Willis. A superb article.
“you have to make the rules in advance as to what kind of proxies you’re going to use, and then use every proxy that fits those rules”. This is the golden rule.

izen
April 4, 2014 9:06 am

@-Michael Moon
“How does rainfall correlate to temperature? How do tree ring widths or “latewood density” correlate to temperature? ”
That they do is a fundamental part of the economic exploitation of timber.
The productivity (the expected output of a forest assessed for logging) is calculated from temperature and rainfall records.
http://www.fsl.orst.edu/~waring/Publications/pdf/87%20-%20Copy.pdf

Anachronda
April 4, 2014 9:17 am

Don’t know about anyone else, but I’m looking forward to working “post hoc ergo proxy hoc” into conversation.

Russ R.
April 4, 2014 9:31 am

A “double-blind” approach to proxy selection would be to mix the actual proxy series with an equivalent number of random data series, either drawn from phenomena unrelated to climate (such as sports statistics) or constructed from a random number generator to resemble natural time series data.
That way, neither the proxy selectors nor the proxies themselves know which are real or fake.
If the selection process can differentiate real proxies from the random data, and the historical average of the proxies is materially different from that of the random data, then I might believe that the proxy reconstructions had some value.
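A rough sketch of that double-blind check, assuming a simple correlation screen as the selection rule (the actual Neukom selection rule is not public) and placeholder data for the proxy matrix and instrumental series; the fakes here are AR(1) surrogates matched to each real series’ persistence:

```python
# Sketch of the "double-blind" check: mix real proxy series with surrogate
# series of similar persistence, then see how often the screen passes the fakes.
import numpy as np

def ar1_surrogate(series, rng):
    """Random series with roughly the same lag-1 autocorrelation and variance."""
    x = series - series.mean()
    phi = np.corrcoef(x[:-1], x[1:])[0, 1]
    innov = rng.normal(0.0, x.std() * np.sqrt(max(1.0 - phi**2, 1e-6)), len(x))
    out = np.empty(len(x))
    out[0] = innov[0]
    for t in range(1, len(x)):
        out[t] = phi * out[t - 1] + innov[t]
    return out

def screen_pass_rate(pool, instrumental, r_min=0.2):
    """Fraction of series passing a simple correlation screen (a stand-in rule)."""
    n_cal = len(instrumental)
    r = [np.corrcoef(s[-n_cal:], instrumental)[0, 1] for s in pool]
    return float(np.mean(np.abs(r) > r_min))

rng = np.random.default_rng(0)
# Placeholders: substitute the real proxy matrix (series x years) and the
# instrumental calibration series if you have them.
real_proxies = 0.05 * rng.normal(size=(50, 1000)).cumsum(axis=1)
instrumental = np.linspace(0.0, 1.0, 100) + rng.normal(0, 0.1, 100)

fakes = np.array([ar1_surrogate(p, rng) for p in real_proxies])
print("screen pass rate, 'real' proxies:", screen_pass_rate(real_proxies, instrumental))
print("screen pass rate, surrogates:    ", screen_pass_rate(fakes, instrumental))
```

If the fakes pass at about the same rate as the real series, the screen is not telling you anything about temperature.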

Michael D Smith
April 4, 2014 10:23 am

“Greg says:
April 4, 2014 at 3:23 am
OMG, this is Mann’s hockeystick all over again.”
See, this paleo stuff REALLY IS reproducible! 🙂

Max Hugoson
April 4, 2014 10:31 am

Can ANYONE explain to me why ANY OF THIS has ANY MEANING?
Temperature + Humidity (plus of course, a minor contribution of Atm pressure at the time of a reading) = ENTHALPY. (Or yields it.)
That means the ENERGY in a VOLUME of Air.
Averaging TEMPERATURES is MEANINGLESS. Even if you say, “Well, we are looking at the “changes”, not the averages of the temperatures…” ALL THE MORE B.S. (Barbara Streisand)
BECAUSE the “necessary and sufficient condition” for that to have MEANING would be a “non-moving data set,” i.e., a “baseline” which said that for a given season and location, over some period of time, you could say the temperature (of the observed air mass) will be at this VALUE. Again, completely impossible to do!
Sorry, while I retreat to my BOMB shelter, to let ALL this “hogwash” go by me!
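For what it’s worth, the enthalpy point is easy to illustrate numerically. The sketch below uses standard psychrometric approximations (a Magnus-type saturation-pressure formula, mixing ratio from vapour pressure, h = cp·T + w·(L + cpv·T)); the numbers are illustrative only and nothing here comes from the Neukom paper.

```python
# Moist-air enthalpy from temperature, relative humidity and pressure,
# using standard psychrometric approximations (illustrative only).
import math

def moist_air_enthalpy(temp_c, rel_humidity, pressure_hpa=1013.25):
    """Approximate enthalpy of moist air, kJ per kg of dry air."""
    # Saturation vapour pressure (hPa), Magnus-type approximation
    e_sat = 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))
    e = rel_humidity * e_sat                   # actual vapour pressure (hPa)
    w = 0.622 * e / (pressure_hpa - e)         # mixing ratio, kg water / kg dry air
    # h = cp_dry*T + w*(latent heat of vaporisation + cp_vapour*T)
    return 1.006 * temp_c + w * (2501.0 + 1.86 * temp_c)

# Two air masses at the SAME temperature carry very different amounts of energy:
print(round(moist_air_enthalpy(30.0, 0.20), 1))   # hot and dry   -> about 44 kJ/kg
print(round(moist_air_enthalpy(30.0, 0.80), 1))   # hot and humid -> about 85 kJ/kg
```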

CrossBorder
April 4, 2014 10:35 am

Stephen Richards says:
April 4, 2014 at 1:24 am
This new super duper statistical method was used in one of the more famous pieces of fraud that SteveMc dissected. I just can’t remember which one.
———————————–
Wasn’t it Yamal, especially Yamal06?

April 4, 2014 10:45 am

izen, I suspect you mean well, but you are mistaken in exactly the way that is being discussed. I know the appeal to “common sense” feels strong, but in this case it is wrong. Every proxy is a mixture of signal and noise, even the ones with fairly straightforward temperature relationships such as oxygen isotopes; most of the proxies used here, as you can see from the graph, are mostly noise to begin with. The post-hoc selection of proxies allows noise that by chance correlates with recent temperature to be magically (fraudulently) converted into signal. Even people with higher education in science make this mistake because it is so seductive. So, don’t feel bad. Statistical rigor can be cruel, but we are better off for it.

tty
April 4, 2014 10:58 am

The weird thing is that there are a number of paleotemperature proxies that actually work, and where the physical relationship between the proxy and the temperature is understood (or has at least been thoroughly tested and verified):
D18O (from arctic icecaps)
TEX86
Alkenones
Foraminifera
Pollen analysis
Treeline changes
Faunal compositions
These all have limitations and problems, and none of them is very exact (the uncertainty is at least plus or minus a degree or two), but the strangest thing of all is that (with the possible exception of D18O) they are almost never used by “climate scientists”, who prefer to use proxies like tree-rings, streamflow and lake deposits whose relation to temperatures is, to put things mildly, very indirect.

David Riser
April 4, 2014 11:07 am

I wonder how many folks used this kind of statistics to earn their PhD in the first place?

April 4, 2014 11:24 am

It’s fine to use correlation with T to select proxy types, but not to cherry-pick individual proxy series within each proxy type. So I will translate izen’s statement thus:
“The correlation between PROXY TYPES and recent temperature data IS legitimate because there are well established physical and biological processes that result in temperature changes altering the proxy measured as with dO18 isotope analysis. A lack of correlation in such PROXY TYPES indicates that factors other than temperature are distorting the data so that THOSE PROXY TYPES should be discarded.”
…for otherwise you are just data mining for recent correlation with thermometers while less recent noise just averages out to a horizontal hockey stick blade.

Reply to  NikFromNYC
April 4, 2014 11:47 am

It strikes me that what proxy constructors have done is the equivalent of removing outliers in an experiment because, well, they are outliers and we don’t like them!!

tty
April 4, 2014 11:27 am

Izen says:
“The productivity, expected output of a forest assessed for logging is calculated from the temperature and rainfall records.”
Exactly: temperature and rainfall. And in most parts of the world rainfall is the most important factor. You might find a reasonably “pure” temperature-dominated tree-ring record close to the arctic treeline in areas where there is never any moisture stress (not many such places in the world), and provided you take samples from a very large number of trees to even out local effects. And you will still get plenty of noise and spurious signals from, e.g., exceptionally late spring or early autumn frosts, major insect infestations, forest fires, and large storms that fell trees over a large area.
“there are well established physical and biological processes that result in temperature changes altering the proxy measured as with dO18 isotope analysis.”
Sometimes yes, sometimes no:
“The general peril in isotopic paleoclimate proxies is that the data may reflect a change in the source vapor due to a minor circulation change, rather than a widespread change in a major climate variable such as temperature or runoff” (R. T. Pierrehumbert) (my emphasis)

basicstats
April 4, 2014 1:12 pm

Relying just on the account given here, it would seem that a big problem with the proxy selection lies in the evident nonstationarity of (typical) 20th-century temperature time series. Whatever the exact nature of these time series may be – integrated, trend stationary or other – they are clearly ripe for spurious correlation with any other time series with an ‘upward drift’, or even just high autocorrelation, over the same time period. Random walks, for instance, especially if drift is added.
By way of explanation for those invoking physical arguments to connect proxy and temperature, ‘spurious’ in this sense does not imply there definitely is not a connection, just that the correlations obtained (and their p-values) have no statistical worth.
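A quick way to see the size of the problem basicstats describes: correlate independent random walks (with a little drift) against a trending “temperature” series and count how often the naive p-value declares significance. The series lengths, drift and threshold below are arbitrary choices for illustration.

```python
# Spurious-correlation demo: random walks vs. a trending series.
# The nominal 5% false-positive rate is wildly exceeded.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100                                             # about a century of annual values
temperature = np.linspace(0.0, 1.0, n) + rng.normal(0, 0.1, n)

trials, significant = 2000, 0
for _ in range(trials):
    walk = np.cumsum(rng.normal(0.01, 1.0, n))      # independent random walk, slight drift
    r, p = stats.pearsonr(walk, temperature)
    if p < 0.05:
        significant += 1

print(f"nominally 'significant' correlations: {significant / trials:.0%} "
      f"(should be about 5% if the p-values meant anything here)")
```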

Christopher Hanley
April 4, 2014 2:04 pm

Thank you Willis.
Circular reasoning can be difficult to detect, particularly when the premise and conclusion are widely separated by a series of convoluted steps.

Rud Istvan
April 4, 2014 2:29 pm

Izen, I happen to own several hundred acres of managed forest. Managed for maximum wildlife and maximum mixed deciduous hardwood yield (I won’t allow any cutting of the few residual northern white pines). Game includes whitetail, turkey, squirrel, coyote, ruffed grouse, raccoon, rabbit, the occasional black bear, plus lots of hawks, lesser rodents, and other wonderful forest edge creatures, even visiting bald eagles up from the Uplands LWRV, a national scenic waterway.
The relationship between tree growth and environmental factors is much more complex than the website you note says. As a single example, every ‘wolf tree’ of any species should be cut at any selective logging. All the trees around it undergo a growth spurt for perhaps 15-20 years just because they have access to more sunlight, everything else being equal (which it never is). Google for an explanation. On my land, old hollow 200-year-old bur oaks, prairie savannah remnants left over after we snuffed out fire in southwest Wisconsin, are but one example. We only spare the wild honey bee trees. There are exactly three. Darned big old box elders are another that is always cut.
There is no way trees make proxy thermometers. Neither in Wisconsin nor in Yamal. People should stop trying, and learn forestry instead.
And learning basic statistical principles would not hurt the proxytologists either, as Willis continues to point out.

Nick Stokes
April 4, 2014 2:40 pm

Robert Scribbler has been talking about a coming Kelvin Wave for a while now.

April 4, 2014 2:54 pm

Izen,
Sure, rainfall correlates to expected output, and temperature correlates to expected output. This does not remotely imply that rainfall correlates to temperature, not under any system of logic I have ever encountered. The hottest places on Earth are also the driest! You have given yourself yet another black eye…

Follow the Money
April 4, 2014 3:02 pm

Rud,
Izen’s paper reference is well off his implied point regarding “temps.” The writers mostly apply moisture budgets and “solar irradiance” variability, which is mostly a product of cloud patterns. Not much new there. The idea that tree rings are akin to thermometers developed from the notion that annually varying temperatures or growing seasons might play a noticeable role in tree rings, but only for trees under a lot of cold stress, that is, timberline species. That is why Sierra bristlecones and Siberian taiga trees show up.
I have no knowledge of any Northern Hemisphere dendro study using anything other than timberline trees as possible temperature indicators. There appears to have grown a heresy of sorts in Australian and New Zealand science circles that has dropped the timberline requirement. I have no knowledge of how they justify that; even Mann could talk about mysterious “teleconnections” for his ideas. No one has challenged the Aussie and Kiwi treemometer assertions because… where is the money in that? A simple comparative study of the types of trees used in NH studies and Aussie/NZ studies would be fruitful on the point.

Follow the Money
April 4, 2014 3:15 pm

“I fear I can’t reveal all of the secret sauce, because as is far too common in climate science, they have not released their computer code.”
Looking at the shape of the blade in Fig. 2, I would guess some twist on the modern instrumental temperature records is smeared in. The code might also be adulterated with Law Dome CO2 concentration findings.
Note in Fig. 1 that, color-wise, the most intense warming, and even some of the cooling, is located far from the proxy sites, e.g. the north Indian Ocean. How does that happen? Isn’t it convenient that the most extreme temps can’t be corroborated with surface stations?

April 4, 2014 3:17 pm

Isn’t GISS the data set that Hansen “corrected”?
A month or so ago I looked at what is currently posted and then used the Wayback Machine (http://archive.org/web/web.php) to look at older versions. The oldest I found was from 2012. I didn’t do anything that could be called an analysis, but even so it was obvious that many, many changes had been made. Even the first numbers, from January 1880, were different.
I wonder what kind of stick would be formed using the old numbers? (Especially if data before 2012 could be found.)

robert schooley
April 4, 2014 3:25 pm

Dr. Unfrozen Caveman raises an important issue on oncology research.
Another is that the oncologists did not use prospective “new drug” vs. “old drug” vs. NO (oncology) drug double-blinded studies.
What happened in the 1970s-80s was an unaccounted factor: intensive care. Intensive care increased the lifespans of millions of patients suffering from myriad diseases. In the cancer chemotherapy patients, it got them through immunodeficiency crises. New penicillinase-proof penicillin-based drugs and cephalosporins (thank you, chemists and chemical engineers at Eli Lilly et al.), mechanical ventilation (thank you, real-scientist mechanical engineers at Bird and Siemens), oxygen delivery and CO2 expungement: good job!
Battling DIC (disseminated intravascular coagulation) was a matter of combining a lot of ordinary-people blood-donations, scientists’ studies on fractionation, and engineers figuring out how to do it, e.g. platelet concentration, and coagulation-factor concentration.
Anyway, we got millions of people through crises. Did the oncologists ever try to measure our inputs’ effects? No. They ascribed the improved longevities to new oncology drugs. All they had to do was to compare old anti-tumor drugs and new anti-tumor drugs head to head, in a controlled fashion, using new intensive care medicine. This didn’t happen.
“Historic control” studies were completely unscientific, putting up old oncology drugs and old supportive care longevity numbers, vs. new drug and new supportive care longevity numbers, and ascribing new longer-term longevities only to the new oncology drugs.
Is it Nixon’s fault for promoting the “War on Cancer” which provided billions of dollars to do research whose funds attracted tens of thousands of second, third and fourth-rate scientific minds to the new federal gravy train?
The federal gravy train is what it is. The Continental Railroad, and Northern Capitalists’ desire to substitute low-cost semi-slave European laborers for lower-cost slave African laborers may have been the start of this. The new crop was abused. Read “Sod and Stubble”. Read about how the New England textile mills were made the model for public education. Read about how Common Core is not a “Federal” program, but it only “works” if it is implemented nationwide. Read deeply about “International Baccalaureate Programme” to find it is a UN scheme.
Why aren’t Barack Obama’s daughters, or Bill Gates’ children, or even Arne Duncan’s children in Common Core schools? They have decided that Common Core is good for remedial students, which characterization does not describe their kids. But CC is good for “stupid kids”. Alas, the “stupid kids” will reject it.
I visited a “post-modern” school. Its perimeter was enwrapped by chainlink topped by razor wire. No joke. It was ostensibly designed to prevent after-school vandalism. But the attending students, what were they supposed to think? Vandals excluded, or we are imprisoned?

Mike Ozanne
April 4, 2014 3:29 pm

Gosh gee-willikers, thank goodness that these people are in charge of trivial matters like the future of the global economy and the survival of democracy, and not the quality standard of your car’s brake linings, or the PORV in your boiler, or the surge sensor in your house’s consumer unit. Because that level of wilfully ignorant incompetence would be dangerous if it were exercised on matters of importance….
Yes yes /sarc on

scf
April 4, 2014 4:05 pm

You would think that after the hockey stick statistical methods had been debunked, there might be some halt to the use of such methods… I guess that’s asking for a lot from today’s climate scientists. Learning is not their specialty, it seems.

Patrick B
April 4, 2014 7:55 pm

“They are saying that they know the average temperature of the southern hemisphere in the year 1000 to within a 95% confidence interval of plus or minus a quarter of a degree C?? Really? … c’mon, guys. Surely you can’t expect us to believe that …”
This. From when I first started reading about the topic, I found it hard to believe that the error analysis wouldn’t swamp any trends people were searching for. I find it impossible to believe we know the average temperature of the southern hemisphere in the year 1935 to within plus or minus 1 degree.
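A back-of-envelope check of that intuition: if each proxy gave an unbiased, independent local temperature good to, say, ±2 °C (an assumed figure, not one from the paper), the number of proxies needed for a ±0.25 °C (95%) hemispheric mean follows from the standard error of the mean. Shared biases and non-independence would push the number higher still.

```python
# How many INDEPENDENT, unbiased proxies would a +/- 0.25 C (95%) mean require,
# assuming an illustrative +/- 2 C (1-sigma) error per proxy?
import math

per_proxy_sd = 2.0        # assumed 1-sigma error of one proxy's local temperature, deg C
target_half_width = 0.25  # desired 95% half-width of the hemispheric mean, deg C

# 95% half-width of a mean of n independent estimates: 1.96 * sd / sqrt(n)
n_needed = (1.96 * per_proxy_sd / target_half_width) ** 2
print(math.ceil(n_needed))   # about 246 independent, unbiased proxies
```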

Sparks
April 4, 2014 9:07 pm

Willis, what is that high horse your pedestal is supported on?

Sparks
April 4, 2014 9:16 pm

There’s logic!

April 4, 2014 10:07 pm

Aussiebear says: April 4, 2014 at 5:01 am
I think this may get Modded. Why does http://www.populartechnology.net/ hate you?
What you write seems, on the face of it, reasonable.

I don’t know why you posted this as I do not hate Willis nor have I ever implied any such thing.

robert schooley
April 4, 2014 11:05 pm

Oops, I screwed up. The proper tests of old oncology drugs vs. new oncology drugs were: old oncology drugs with old intensive care, new oncology drugs with old intensive care, old oncology drugs with new intensive care, and new oncology drugs with new intensive care. What the oncology profession decided to do was old oncology drugs with old intensive care vs. new oncology drugs with new intensive care, and to decide that marginally prolonged lives were due to the new oncology drugs.
I went to Berkeley a bit before Michael Mann. I didn’t have a father who was a tenured Associate Professor at UMass. I didn’t have the chance to study math under my dad in Amherst, MA, and take DiffEQ and linear algebra at Amherst College or UMass. Michael Mann had advantages I couldn’t have dreamed of.
Here is the interesting thing: Michael Mann, a Massachusetts native whose father was a UMass tenured Associate Professor, didn’t get into Harvard or MIT. Why not? Then, at Berkeley, Mann graduated with “Honors”. Not High Honors or Highest Honors. I graduated with High Honors.
Living in Amherst, Dr. Mann was able to concurrently enroll, pass out of lower-division physics and math, and could have graduated from Berkeley in three years, or two. It took him five years.
Here’s the thing. In my field, I was immediately set up for MIT, Harvard, Stanford, Caltech.
Michael Mann didn’t make it. Yale physics, not really happening. Zero Nobel Laureates. Four NAS Physics-Section members (now down to two). Who told Mr. Mann, “You’re not good enough for Harvard, MIT, Stanford, Berkeley, or Caltech”? Or even physics-legendary, albeit faded, Cornell, Columbia or Chicago.
If you’re a high-honors student at Berkeley, you can go to Harvard, MIT, Stanford, Caltech or Berkeley. Or Cambridge. Mr. Mann had the opportunity to study and excel in high school and college. Mr. Mann’s Massachusetts teachers thought he wasn’t good enough for undergrad study at MIT or Harvard. His Berkeley physics and math teachers decided he wasn’t worthy of studying at graduate level under first-rate minds.
I don’t want my international policies determined by second-rate minds. The UN is populated by second-rate minds. They want second-rate science minds to give them cover. I’m not interested in subjecting myself to that.

Jeff Alberts
April 4, 2014 11:05 pm

M Simon says:
April 4, 2014 at 12:44 am
I’m reminded of the origins of http://en.wikipedia.org/wiki/Duke_Nukem way back when. BTW the wiki does not go into the origins. – a dispute with another Neukom.

Except Neukom is most likely pronounced “noykom”.

Lynn Clark
April 4, 2014 11:33 pm

At 3:06 am on April 4, 2014, Kon Dealer said, “Is it because they are incompetant [sic], stupid, or mates of the authors (pal review)- or all 3?”
Richard Lindzen hinted that it’s door number one: incompetent. Listen to 3 minutes of his remarks before the UK Parliament House of Commons Energy and Climate Change Committee on January 28, 2014 (video should start playing at 2:49:10):
http://youtu.be/6GzNATrGH7I?t=2h49m10s

matayaya
April 4, 2014 11:34 pm

I think this study and you all underestimate the quality of the proxy temperature record. Many independent studies have been done using corals, ocean and lake sediments, cave deposits, ice cores, boreholes, glaciers, and documentary evidence such as paintings of glaciers. You can dismiss the value of such information, but the methods used are fully transparent and not at all as described above. The websites of the National Academy of Sciences and others give very detailed information on how they reach their conclusions.

Dudley Horscroft
Reply to  matayaya
April 5, 2014 1:44 am

Izen at 4 April 0906 answers Michael Moon’s queries “How does rainfall correlate to temperature? How do tree ring widths or “latewood density” correlate to temperature?” by referring to a particular site. I have read it; it is an interesting site.
Read it and you find:
“Absorbed photosynthetically active radiation (APAR) is estimated from global solar radiation, derived if necessary from an established empirical relationship based on average maximum and minimum temperatures. The utilized portion of APAR (APARU) is obtained by reducing APAR by an amount determined by a series of modifiers derived from constraints that cause partial to complete stomatal closure: (a) subfreezing temperatures; (b) high daytime atmospheric vapour pressure deficit (VPD); (c) depletion of soil water reserves.”
Then you find that the incoming radiation (which is what the trees use and what the researchers are interested in) is obtained using temperature where an actual measurement is not available.
The relevance of temperature is this: if it is sub-freezing, trees do not grow (this puts a ‘modifier’ of ‘0’ into their equations), and if they haven’t got measured APAR they have to use temperature records (average max and min) as a proxy for it. It is radiation and water (plus CO2, of course) that allow the trees to grow. Negative temperatures can stop this.
It is difficult to get the reverse logic, “if the tree grows well, the temperature is X”.
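A schematic of the model structure quoted above (3-PG-style growth driven by absorbed radiation, knocked down by multiplicative modifiers) shows why that inversion is ill-posed: many different combinations of frost, vapour pressure deficit and soil water give the same growth. The functional forms and constants below are made up for illustration; they are not the paper’s.

```python
# Schematic only: growth driver = absorbed radiation times multiplicative
# constraint modifiers. Constants and forms are illustrative, not from the paper.
def utilised_apar(apar, frost_frac, vpd_kpa, soil_water_frac):
    """Utilised radiation after frost, VPD and soil-water constraints."""
    f_frost = 1.0 - frost_frac                 # no growth on sub-freezing days
    f_vpd = max(0.0, 1.0 - 0.5 * vpd_kpa)      # stomata close as VPD rises
    f_soil = soil_water_frac                   # simple linear soil-water constraint
    return apar * f_frost * f_vpd * f_soil

# Two very different climates, identical "growth", so growth alone cannot be
# inverted to read off temperature:
print(utilised_apar(1000.0, 0.10, 1.0, 0.90))   # cooler, wetter site -> 405.0
print(utilised_apar(1000.0, 0.00, 0.5, 0.54))   # warmer, drier site  -> 405.0
```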

thegriss
April 5, 2014 2:17 am

The fact that the reconstruction matches pre-1979 GISS (Fig 1c) PROVES that it is a fantasy!!

Kon Dealer
April 5, 2014 7:39 am

Izen, were you a reviewer of the Neukom paper?

Steve Keohane
April 5, 2014 7:59 am

Max Hugoson says: April 4, 2014 at 10:31 am
Bingo!! Until we stop talking temperature and start measuring enthalpy, it is a meaningless discussion.

Steve Keohane
April 5, 2014 8:00 am

Willis, thanks once again for a great dissection.

Robert Schooley
April 5, 2014 8:07 am

Willis, your CV is weird. My read on it is that you are an unmitigated outlier. Why do you insist on being off the statistical charts? And how did you cook those Alaskan salmon? Did you grill them over alderwood, or did you use fir? I’m just asking because one of my kids, and my best friend from high school, want an adventure, and they want me to lead it. You know these Cali kids, they want something exciting. I made the mistake of taking them to Baja. Once you’ve been there, you can’t really go back. The same is true of salmon fishing in Alaska.

rgbatduke
April 5, 2014 8:14 am

it won’t be long before rgb is termed a “denier”.
Oh, it happens all the time when I post on any forum outside of WUWT. It’s the quickest way to end debate and “win”, after all. First I’m labelled a denier. Then I present evidence, usually straight from e.g. W4T but I have a pile of it bookmarked. Then I’m accused of cherrypicking dates in my presentation of evidence, usually accompanied by some robust cherrypicking straight back at me. These days, I end up presenting evidence straight out of AR5, chapter 9.
The really silly thing is that I’m not, actually, “a denier”. I not only acknowledge that there is a GHE, I try to educate people about how it works (usually to no point). My primary beef with climate science is that the claim that the GCMs are accurate and have predictive skill is not borne out by the usual sorts of comparisons and tests we would put any new theory through. The best of them are starting to look “better” as they up computational resolution and include more dynamics, but CMIP5 is full of models that really pretty much suck, and it isn’t surprising that they suck. What is surprising is that AR4 and AR5 use them as if they don’t suck, as if they are equally likely to be accurate predictors as the better models. What is surprising is that there is no empirical component to determining what the best models are (as in, which ones DO the best job of predicting the actual temperature).
What is tragic about the whole discussion is that the side that is so very quick to label somebody a “denier”, and hence avoid having to confront any of the uncertainties or inconsistencies in the picture of catastrophe that is being sold, hard, around the world, is completely unwilling to acknowledge that the climate system is enormously complex and may not do at all what one expects from a naive argument. Especially an argument bolstered by infinitely adjustable, model-based, proxy-based determinations of past climates that are custom tuned to accentuate present warming, and that deliberately neglect confounding effects such as UHI corruption of the land surface record that would reduce the apparent severity of the recent warming (which is none too severe even without removing the UHI effect). They do not wish to acknowledge that anybody could disagree with them, have reasons for disagreeing with them, and not be either in the pay of special interests or Evil Incarnate.
rgb

lemiere jacques
April 5, 2014 8:53 am

They could make an easy test: split the temperature data in two, apply the selection method to each period, and then compare the results with the temperature record.

Reply to  lemiere jacques
April 5, 2014 9:19 am

That helps and will undoubtedly screen out more proxies included merely by chance, but it does not address the more fundamental weakness of their approach. It seems to me that you first need a compelling rationale for testing against temperatures 1000 km away and with leads and lags. Assume that you had a temperature record at the location and it did not correlate, but those further away did. On what basis would you ignore the in situ record?
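For what it’s worth, the split-period test is simple to mock up on pure noise: screen on the first half of the instrumental record, then see how many survivors still “verify” on the second half. The screening threshold and series below are arbitrary, illustrative choices, not anything from the paper.

```python
# Split-calibration check on pure-noise "proxies": screen on the first half,
# verify on the second half. Thresholds and sizes are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(3)
n_cal = 100
instrumental = np.linspace(0.0, 1.0, n_cal) + rng.normal(0, 0.15, n_cal)
first, second = instrumental[:50], instrumental[50:]

# Pure-noise "proxies" covering only the instrumental period, for the test
proxies = rng.normal(size=(500, n_cal))

r_first = np.array([np.corrcoef(p[:50], first)[0, 1] for p in proxies])
survivors = proxies[np.abs(r_first) > 0.28]       # screened on the first half only

r_second = np.array([np.corrcoef(p[50:], second)[0, 1] for p in survivors])
print(f"passed the first-half screen: {len(survivors)} of {len(proxies)}")
print(f"of those, still 'significant' on the second half: "
      f"{np.mean(np.abs(r_second) > 0.28):.0%}")
```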

Kon Dealer
April 5, 2014 9:08 am

“We Live in Cold Times” should be compulsory viewing for Neukom and supporters.

April 5, 2014 10:47 am

As I stated before, if something in my post was inaccurate I would correct it, so I removed the sentence about not listing a programming language on his CV. Otherwise everything else I stated stands.
As for why I wrote it, this is very simple: various Willis fanboys here refused to concede that Willis is not a professional scientist and then became quite nasty about it. As I promised them, when people search for his name my article will come up, so that is likely why Aussiebear found it.
https://www.google.com/search?q=willis+eschenbach
Apparently they did not take me seriously.
Willis continues to argue a strawman as my argument was always in relation to his credentials not his work, which is pretty obvious when you read my article. I believe people should read his articles in the proper context of his actual qualifications, which have been misrepresented in major news articles.
Willis of course can plead ignorance on the whole issue and the behavior of his fanboys.

How does poptech think that I have done the literally hundreds of very complex analyses that I’ve published here if I’m not a computer programmer?

Using Excel.
Since I posted my article I have had no need to bring the issue up, as I have not seen anyone incorrectly refer to Willis as a professional scientist.

April 5, 2014 11:06 am

Brown of course is wrong about Cowtan and Way, and wrong about kriging, the method skeptics approved of before they saw the answer. Cowtan and Way make a prediction of the Arctic. That is what kriging does. They then test that prediction using out-of-sample data from buoys.
Subsequent to their paper I have tested their prediction using yet another dataset that covers 100 percent of the Arctic from 2002 to present. Their prediction is superior to HadCRUT, which ignores usable data.

ferdberple
April 5, 2014 12:48 pm

izen says:
April 4, 2014 at 7:33 am
A lack of correlation in such cases indicates that factors other than temperature are distorting the data so that it should be discarded.
===============
FALSE. It shows no such thing.
Statistics does not allow you to reject individuals from your sample simply because they show poor correlation. Those individuals with poor correlation are telling you about the overall quality of the entire sample. If you remove them, this will artificially skew the sample and make the results appear more statistically reliable than they are. That is why this is mathematically forbidden in statistics.
For example, say I institute a new teaching method in school. We measure student performance before and after the method is introduced. Then we examine the records after the method is introduced, and exclude any students with bad marks after, because something must be distorting their response to the teaching (drugs, family life, poverty, etc).
So what we are left with is only students with good marks after the teaching method is imposed, and students with as yet unknown marks before the teaching method is imposed.
Question: what effect will this have on our statistical analysis of the teaching method? Will it make our analysis of the trend more reliable or less reliable? Will it increase the reliability of the trend, or will it falsely skew the trend?
Most people will now recognize that by rejecting individuals we have made our statistical analysis much less reliable. It likely will have made our teaching method appear to have created an increase in grades where no such increase actually exists.
What we have created is a hockey stick.
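That classroom example can be put in numbers. In the sketch below the “new method” has no effect at all, yet dropping the students with poor after-scores manufactures an apparent improvement of several points; the class size and score distributions are made up for illustration.

```python
# Selection-bias demo: the "new method" has NO effect, but screening out
# low after-scores creates an apparent improvement. Synthetic data only.
import numpy as np

rng = np.random.default_rng(7)
n_students = 1000
before = rng.normal(70, 10, n_students)          # marks before the new method
after = before + rng.normal(0, 10, n_students)   # no real improvement, just noise

keep = after >= 70                               # drop the "distorted" low scorers
print(f"true change, all students:       {after.mean() - before.mean():+.1f}")
print(f"apparent change after screening: {after[keep].mean() - before[keep].mean():+.1f}")
```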

Dale Muncie
April 5, 2014 10:30 pm

Robert, I would not recommend using fir or pine to cook salmon.
Dale

Dale Muncie
April 5, 2014 10:42 pm

Robert Schooley says:
April 5, 2014 at 8:07 am
Sorry about the mess I made. I would suggest grilling. Pine and fir leave a nasty taste.
Dale