Guest Post by Willis Eschenbach
Over at Bishop Hill, the Bish has an interesting thread about a new proxy reconstruction by Rob Wilson et al. entitled “Last millennium northern hemisphere summer temperatures from tree rings: Part I: The long term context”, hereinafter Wilson 2016. The paper and the associated data are available here. They describe the genesis of the work as follows:
This work is the first product of a consortium called N-TREND (Northern Hemisphere Tree-Ring Network Development) which brings together dendroclimatologists to identify a collective strategy for improving large-scale summer temperature reconstructions.
At first I was stoked that they had included an Excel spreadsheet with the proxy data. Like they say in the 12-step programs, Hi, my name’s Willis, and I’m a data addict … anyhow, here’s a graph of all of the data, along with the annual average in red.
Figure 1. Plot of the proxy data from the Wilson 2016 Excel worksheet. All proxies cover the period 1710 – 1988, as indicated by the vertical dotted lines. Note what happens to the average at the recent end.
But as always, the devil is in the details. I ran across a couple of surprises as I looked at the data.
First, I realized after looking at the data for a bit that all of the proxies had been “normalized”, that is to say, set to a mean of zero and a standard deviation of one. This is curious, because one of the selling points of their study is the following (emphasis mine):
For N-TREND, rather than statistically screening all extant TR chronologies for a significant local temperature signal, we utilise mostly published TR temperature reconstructions (or chronologies used in published reconstructions) that start prior to 1750. This strategy explicitly incorporates the expert judgement the original authors used to derive the most robust [temperature] reconstruction possible from the available data at that particular location.
So to summarize the whole process: most of the data started out as various kinds of proxy measurements (ring width, wood density, “Blue Intensity”).
Then it was transformed, using the “expert judgement of the original authors”, into temperature estimates in degrees Celsius.
Then it was transformed again, this time using the expert judgement of the current authors, into standard deviations based on the mean and standard deviation of the period 1750–1950. Why this exact period? Presumably, expert judgement.
Finally, it was re-transformed one last time, again using the expert judgement of the current authors, back into temperatures in degrees Celsius.
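For concreteness, here is a minimal sketch of that middle transformation, in Python on made-up numbers (not the authors' actual code or data): normalising a proxy already expressed in degrees C to the 1750–1950 base period.

```python
# Sketch of the normalisation step: convert a proxy already in degrees C
# into standard-deviation units using the 1750-1950 base period.
# The proxy values here are invented for illustration.
import numpy as np

years = np.arange(1700, 1989)
rng = np.random.default_rng(0)
proxy_degC = 10.0 + 0.5 * rng.standard_normal(years.size)  # fake proxy, deg C

base = (years >= 1750) & (years <= 1950)
mu, sigma = proxy_degC[base].mean(), proxy_degC[base].std()
proxy_z = (proxy_degC - mu) / sigma   # the "normalised" series, in SD units

# By construction the base period now has mean 0 and SD 1:
print(proxy_z[base].mean(), proxy_z[base].std())
```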
This strikes me as … well … a strangely circuitous route. I mean, if you start with proxy temperatures in degrees C and you are looking to calculate an average temperature in degrees C, why change it to something else in between?
It got odder when I analyzed the authorship of the temperature records reconstructed from the 53 proxies. For 48 of the 53 proxies, the original lead author is also an author on this paper. It is true that they said this study is the result of a consortium of dendroclimatologists. However, I had expected them to look at more tree ring temperature reconstructions from other authors. So when they say they depend on the expert judgement of the authors of the proxies, for more than 90% of the proxies studied they are merely saying they trust their own judgement.
And indeed, they think quite highly of their own judgement, rating it “expert”.
But since that is the case, since they are depending on their own prior transformation of a record of, e.g., tree ring width in mm into an estimated temperature in degrees C, then why on earth would they convert it out of degrees C again, and then at the end of the day convert it back into degrees C? What is the gain in that?
My second surprise came after I’d messed with the actual data for a few hours, when I got around to looking at their reconstruction. They describe their method for creating their reconstruction as follows:
3. Reconstruction methodology
A similar iterative nesting method (Meko, 1997; Cook et al., 2002), as utilised in D’Arrigo et al. (2006) and Wilson et al. (2007), was used to develop the N-TREND2015 NH temperature reconstruction. This approach involves first normalising the TR data over a common period (1750 – 1950), averaging the series to derive a mean series and iteratively removing the shorter series to allow the extension of the reconstruction back (as well as forward) in time. Each nest is then scaled to have the same mean and variance as the most replicated nest (hereafter referred to as NEST 1) and the relevant time-series sections of each nest spliced together to derive the full-length reconstruction. For each nest, separate average time series were first generated for 4 longitude quadrats (Fig. 1). These continental scale time series were then averaged (after again normalising to 1750 – 1950) to produce the final large-scale hemispheric mean to ensure it is not biased to data rich regions in any one continent. 37 backward nests and 17 forward nests were calculated to produce the full reconstruction length from 750 to 2011.
Like the song says, “Well, it was clear as mud but it covered the ground” … I was reminded of a valuable insight by Steve McIntyre, which was that at the end of the day all these different systems for combining proxies are simply setting weights for a weighted average. No matter how complex or simple they are, whether it’s principal components or 37 backwards nests and 17 forwards nests, all they can do is weight different points by different amounts. This is another such system.
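McIntyre's observation is easy to verify algebraically: normalising each proxy and then averaging is exactly a weighted average of the raw series, with weight 1/(n·σᵢ) on proxy i, minus a constant offset. A quick check on synthetic data (my sketch, nothing from the paper):

```python
# "Normalise each proxy, then average" is identical to a weighted average
# of the raw series with weight 1/(n*sigma_i) on proxy i, plus an offset.
# The data here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 5
raw = rng.normal(size=(n, 100)) * rng.uniform(0.5, 2.0, size=(n, 1))

mu = raw.mean(axis=1, keepdims=True)
sigma = raw.std(axis=1, keepdims=True)
normalised_mean = ((raw - mu) / sigma).mean(axis=0)

weights = 1.0 / (n * sigma)                    # the implied per-proxy weights
weighted_mean = (weights * raw).sum(axis=0) - (mu / (n * sigma)).sum()

print(np.allclose(normalised_mean, weighted_mean))  # True
```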
In any case, that explained why they put the normalized data in their spreadsheet. This normalized data was what they used in creating their reconstruction.
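As best I can read the quoted methods, the nest-scale-splice step amounts to something like the following sketch. This is my interpretation on invented numbers, not the N-TREND code: each nest's mean series is rescaled to the mean and variance of the most-replicated nest (NEST 1), and the sections are then spliced together.

```python
# Rough reading of the nest-scale-splice step (an interpretation of the
# quoted methods text, not the authors' code), on invented numbers.
import numpy as np

def scale_to(seg, ref):
    """Give seg the same mean and standard deviation as ref."""
    return (seg - seg.mean()) / seg.std() * ref.std() + ref.mean()

rng = np.random.default_rng(4)
nest1 = rng.normal(0.2, 0.5, size=300)   # most-replicated nest (NEST 1)
older = rng.normal(-0.1, 0.9, size=100)  # a shorter nest reaching further back

# Splice: the rescaled early section, then NEST 1 itself
recon = np.concatenate([scale_to(older, nest1), nest1])

# The rescaled section now matches NEST 1's mean and SD:
print(np.isclose(scale_to(older, nest1).std(), nest1.std()))  # True
```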
The surprise came when I plotted up their reconstruction from the data given in their Excel worksheet. I looked at it and said “Dang, that looks like the red line in Figure 1”. So I plotted up the annual average of the 53 normalized proxies in black, and I overlaid it with a regression of their reconstruction in red. Figure 2 shows that result:
Figure 2. Annual average of 53 proxies (black), and linear regression of Wilson 2016 iterative nested reconstruction. Regression is of the form Proxy_Average = m * Reconstruction + b, with m = 1.25 and b = 0.54.
All I can say is, I hope they didn’t pay full retail price for their Nested Reconstruction Integratomasticator. Other than the final data point, their nested reconstructed integrated results are nearly identical to a simple average of the data.
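For anyone who wants to reproduce the Figure 2 overlay, the fit is just an ordinary least-squares regression of the proxy average on the reconstruction. Here is the idea on synthetic stand-in data, since the real series live in the Wilson 2016 spreadsheet:

```python
# Reproducing the Figure 2 overlay: least-squares fit of
# Proxy_Average = m * Reconstruction + b. Synthetic stand-ins used here.
import numpy as np

rng = np.random.default_rng(2)
recon = rng.normal(size=200)                        # stand-in reconstruction
proxy_avg = 1.25 * recon + 0.54 + 0.05 * rng.normal(size=200)

m, b = np.polyfit(recon, proxy_avg, 1)              # least-squares fit
rescaled_recon = m * recon + b                      # the red line of Figure 2
print(round(m, 2), round(b, 2))
```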
Finally, as you can see, in recent times (post 1988) the fewer the proxies, the higher the estimated temperature. This is abated but not solved by their method. We can see what this means by restricting our analysis to the time period when all of the proxies have data.
Figure 3. As in Figure 2, annual average of 53 proxies (black), and linear regression of Wilson 2016 iterative nested reconstruction (red). Blue lines show proxy data. You can see how the number of proxies drops off after 1988 by the change in the intensity of the blue color.
Again you can see that their reconstruction is scarcely different from the plain old average of the data. As you can see, according to the full set of proxies the temperature in 1988 was lower than the temperature in 1950, and there is no big hockey-stick in the recent data up to 1988. After that the number of proxies drops off a cliff. By 1990, we’ve already lost about 40% of the proxies, and from there the proxy count just continues to drop.
In closing let me add that this post is far from an exhaustive analysis of difficulties facing the Wilson 2016 study. It does not touch any of the individual proxies or the problems that they might have. I hope Steve McIntyre takes on that one, he’s the undisputed king of understanding and explaining proxy minutiae. It also doesn’t address the lack of bright-line ex-ante proxy selection criteria. Nor does it discuss “data snooping”, the practice of (often unconsciously or unwittingly) selecting the proxies that will support your thesis. I can only cover so much in one post.
My conclusions from all of this:
• Transforming a dataset from tree ring widths in mm to temperatures in degrees C, thence to standard deviations, and finally back to degrees C, seems like a doubtful procedure.
• Without seeing the underlying data, it is hard to judge the full effects of what they have done. While having the normalized datasets is valuable, it cannot replace the actual underlying data.
• Whatever their iterative nested method might be doing, it’s not doing a whole lot.
• I do not know of any justification for normalizing the proxies before averaging them. They are already in degrees C. In addition, normalization greatly distorts the trend of a time series, in a manner that depends on the exact shape and variance of the series.
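That last point is easy to demonstrate on invented data: give two proxies the identical trend in degrees C but different year-to-year variance, normalise both, and the trends no longer match, because dividing by sigma rescales the slope by a different factor for each series.

```python
# Two proxies with the SAME trend in degrees C but different variance
# acquire DIFFERENT trends once normalised. Data invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(200.0)
trend = 0.01 * years                              # identical 0.01 degC/yr trend
quiet = trend + 0.1 * rng.standard_normal(200)    # low-variance proxy
noisy = trend + 1.0 * rng.standard_normal(200)    # high-variance proxy

def normalised_slope(x):
    z = (x - x.mean()) / x.std()                  # normalise the series
    return np.polyfit(years, z, 1)[0]             # fitted slope afterwards

# Same physical trend, clearly different normalised trends:
print(normalised_slope(quiet), normalised_slope(noisy))
```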
Finally, to a good first approximation their reconstruction is the same as the annual average of the normalized data. That means their method uses the following process:
Transform the "expert judgement" proxy temperature estimates from degrees C to units of standard deviations. Average them. Transform them back to degrees C using linear regression.
I’m sorry, but I simply don’t believe you can do that. Well, you can do it, but the result will have error bars from floor to ceiling and will have little to do with temperature.
El Niño rains here tonight. We’ve gotten four inches (10 cm) in the last four days, and it’s supposed to rain on and off for a week … great news here in drought city.
Best of life to all, sun when you need it, rain when it’s dry, silver from the five-day-old moon far-reaching on the sea …
My Usual Request: If you disagree with me or anyone, please quote the exact words you disagree with. I can defend my own words. I cannot defend someone’s interpretation of my words.
My Other Request: If you think that e.g. I’m using the wrong method on the wrong dataset, please educate me and others by demonstrating the proper use of the right method on the right dataset. Simply claiming I’m wrong doesn’t advance the discussion.