A question about proxies and calibration with the adjusted temperature record

WUWT reader Tom O’Hara writes in with a question that seemed worthwhile to discuss. Paleo specialists can weigh in on this. It seems to me that he has a point, but like him, I don’t know all the nuances of calibrating a proxy. (Graphic at right by Willis Eschenbach, from another discussion.)

O’Hara writes:

[This] is a puzzle to me.

Everything we know about past climate is based on “proxies.”  As I understand the concept, science looks at “stuff” and finds something that tends to mirror the changes in temperature, or whatever, and uses that as a means to determine what the likely temperature would have been at an earlier time.  This is, I am sure, an oversimplified explanation.

So what we have, in essence, is a 150-year or so record of temperature readings to use to determine our proxy’s hoped-for accuracy.

Now my question would be, if we are continuously adjusting the “readings” of that record, how does that affect the usefulness of the proxy information?

If I have correlated my proxy to a moving target, doesn’t that affect the likelihood that the proxy will yield useful information?

It would seem to me that this constant massaging of the database used to define and tune my proxy would, in the end, destroy the utility of my proxy to deliver useful information. Or have I got it all wrong?

A few primers for discussion:

1. Detecting instabilities in tree-ring proxy calibration – Visser et al.

Abstract. Evidence has been found for reduced sensitivity of tree growth to temperature in a number of forests at high northern latitudes and alpine locations. Furthermore, at some of these sites, emergent subpopulations of trees show negative growth trends with rising temperature. These findings are typically referred to as the “Divergence Problem” (DP). Given the high relevance of paleoclimatic reconstructions for policy-related studies, it is important for dendrochronologists to address this issue of potential model uncertainties associated with the DP. Here we address this issue by proposing a calibration technique, termed “stochastic response function” (SRF), which allows the presence or absence of any instabilities in growth response of trees (or any other climate proxy) to their calibration target to be visualized and detected. Since this framework estimates confidence limits and subsequently provides statistical significance tests, the approach is also very well suited for proxy screening prior to the generation of a climate-reconstruction network.

Two examples of tree growth/climate relationships are provided, one from the North American Arctic treeline and the other from the upper treeline in the European Alps. Instabilities were found to be present where stabilities were reported in the literature, and vice versa, stabilities were found where instabilities were reported. We advise to apply SRFs in future proxy-screening schemes, next to the use of correlations and RE/CE statistics. It will improve the strength of reconstruction hindcasts.

Citation: Visser, H., Büntgen, U., D’Arrigo, R., and Petersen, A. C.: Detecting instabilities in tree-ring proxy calibration, Clim. Past, 6, 367-377, doi:10.5194/cp-6-367-2010, 2010.

2.

From WUWT: A new paper now in open review in the journal Climate of the Past suggests that “modern sample bias” has “seriously compromised” tree-ring temperature reconstructions, producing an “artificial positive signal [e.g. ‘hockey stick’] in the final chronology.”

Basically, older trees grow slower, and that mimics the temperature signal paleo researchers like Mann look for. Unless you correct for this issue, you end up with a false temperature signal, like a hockey stick in modern times. Separating a valid temperature signal from the natural growth pattern of the tree becomes a larger challenge with this correction.  More here

 

3.

Calibration trails using very long instrumental and proxy data

Esper et al. 2008

Introduction

The European Alps are one of the few places that allow comparisons of natural climate proxies, such as tree-rings, with instrumental and documentary data over multiple centuries. Evidence from local and regional tree-ring analyses in the Alps clearly showed that tree-ring width (TRW) data from high elevation, near treeline environments contain substantial temperature signals (e.g., Büntgen et al. 2005, 2006, Carrer et al. 2007, Frank and Esper 2005a, 2005b, Frank et al. 2005). This sensitivity can be evaluated over longer timescales by comparison with instrumental temperature data recorded in higher elevation (>1,500 m asl) environments back to the early 19th century, and, due to the spatially homogenous temperature field, back to the mid 18th century using observational data from stations surrounding the Alps (Auer et al. 2007, Böhm et al. 2001, Casty et al. 2005, Frank et al. 2007a, Luterbacher et al. 2004). Further, the combination of such instrumental data with even older documentary evidence (Pfister 1999, Brázdil et al. 2005) allows an assessment of temporal coherence changes between tree-rings and combined instrumental and documentary data back to AD 1660. Such analyses are outlined here using TRW data from a set of Pinus cembra L. sampling sites from the Swiss Engadin, and calibrating these data against a gridded surface air temperature reconstruction integrating long-term instrumental and multi-proxy data (Luterbacher et al. 2004).

paper here: Esper_et_al_TraceVol_6 (PDF)

nutso fasst

How does CO2 concentration affect growth rates?

Coach Springer

“Lay person” reaction: Proxy is a projection of the present onto the past. Change the present, change the projection. The issue serves to remind me that these are only projections, rather than letting me assume a false accuracy just because different projections might produce only minute variances. (A whole set of forecasts could be off by a mile but vary from one another only minimally. Sound familiar with regard to projections into the future?)

Maybe the reason for the divergence problem is also bad weather station placement, especially in the Arctic. If you “train” your proxies using “bad” stations, where the situation has changed, you may see artificial warming which is not visible in the tree rings, simply because there was no real warming.

David McKeever

Statistics can look at the same data from a different angle (so to speak) and find a stronger correlation with some subset of the data (PCA is just one technique). Once you have a data set you aren’t frozen into one analysis. That also opens the door to abusing these same techniques (see Steve McIntyre on the hockey stick). Abusing the methods to find a predetermined pattern doesn’t nullify all the methods (used appropriately).

Joseph Murphy

A general rule of mine: you cannot do hard science on anything with a specific date attached to it. Experimentation requires that time be irrelevant. (You can do an experiment that shows x causes y. But if you know that y occurred at some time in the past, you cannot do an experiment to show that x was the cause of that y.) This post seems to be pondering some of the extra assumptions required when specific ‘times’ are incorporated into science.

Jim G

Don’t trees, like most living things, adapt over time to their environment? Plus the variables are many, i.e. CO2, moisture, temperature, sunlight, humidity, etc.

BioBob

– It is important to realize that field temperature readings are themselves proxies of particle velocity or kinetic energy.
– In addition, the unit scales (Celsius, Kelvin, etc.) employed are also proxies for reality.
– calibration is also a proxy for ‘accuracy’, since precision and the limits of observation make the resulting readings a ‘fuzzy’ probability cloud rather than a single value.

James

Related to proxies, what is the resolution of the various proxies? I always hear (mostly from skepticalscience.com) that proxies show we’ve never seen as rapid a temperature rise as we have in the last century anywhere in the historical record. My impression, though, is that there’s not enough resolution in the proxies at the sub-centennial scale. Is this true? Can someone help shed some light on this for me? Thank you!

rgbatduke

All you are really pointing out is that tree rings in particular make lousy proxies, because tree growth rates are highly multivariate and because any process you use to include or exclude specific trees on the basis of IMAGINED confounding processes is an open opportunity for undetectable confirmation bias to creep into your assessment. You can only reject trees if you think you know the answer they are supposed to be providing, outside of the usual statistical process of rejecting extreme outliers. But one of the problems with Bayesian reasoning in this context is that one man’s Bayesian prior can all too easily become another man’s confirmation bias that prejudices a particular answer. One has to have a systematic way of reassessing the posterior probabilities based on data.
But data is what this approach can never obtain. We cannot ever know the temperatures in the remote, pre-thermometric past. Hell, we can barely assess them now, with thermometers! One could do a multi-proxy analysis, using things like O18 levels that might be a completely independent proxy with independent confounding errors to improve Bayesian confidence levels, but I’ve always thought “dendroclimatology” is largely nonsense because of my work on random number generator testers (dieharder).
Here’s an interesting question. Once upon a time, before computer generation of pseudorandom numbers became cheap and reliable in situ, books like the CRC handbook or Abramowitz and Stegun (tables) often included pages of “tested” random numbers for people to use in Monte Carlo computations done basically by hand. Even into the 90’s, one of the premier experts on random number generators and testing (George Marsaglia) distributed tables of a few million “certified” random numbers — sets that passed his diehard battery of random number generator tests — along with the tests themselves on a CD you could buy. What is wrong with this picture?
Random number generators are tested on the basis of a pure (null) hypothesis test. One assumes that the generator is a perfect generator (and that the test is a perfect test!), uses it to generate some systematically improvable/predictable statistic that can be computed precisely some other way, and then computes the probability of getting the answer you got from using the RNG if it were a perfect RNG. In case this is obscure, consider testing a coin, presumed to be 50-50 heads and tails. If we flip the coin 100 times and record the number of heads (say), we know that the distribution of outcomes should be the well-known binomial distribution. We know exactly how (un)likely it is to get (say) 75 heads and 25 tails — it’s a number that really, really wants to be zero. If we have a coin that produces 75 heads and 25 tails, we compute this probability — known as the p-value of the test — and if it is very, very low, we conclude that it is very, very unlikely that a fair coin would produce this outcome, and hence it is very, very unlikely that the coin is, indeed, the unbiased coin we assumed that it was. We falsify the null hypothesis by the data.
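As a quick check of the coin arithmetic above, here is a minimal Python sketch (an illustration added for this post, not part of the comment; it uses scipy’s exact binomial test) computing the two-sided p-value for 75 heads in 100 flips of a fair coin:

```python
# Exact two-sided binomial test for 75 heads out of 100 flips of a fair coin.
# Requires scipy >= 1.7 for stats.binomtest.
from scipy import stats

result = stats.binomtest(75, n=100, p=0.5)
print(result.pvalue)   # about 6e-7 -- a number that "really wants to be zero"
```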
Traditionally, one sets the rejection threshold to p = 0.05. There isn’t the slightest good reason for this — all this means is that a perfect coin will be falsely rejected on average 1 time out of 20 trials of many samples, no matter how many samples there are in each trial. A similar problem with accepting a result if it reaches at least a p-value of 0.05 “significance” plagues medical science as it enables data dredging, see:
http://xkcd.com/882/
which is an entire education on this in a single series of cartoon panels.
However, there is a serious problem with distributing only sets of random numbers that have passed a test at the 0.05 level. Suppose you release 200 such sets — all of them pass the test at the 0.05 level, so they are “certified good” random numbers, right? Yet if you feed all 200 sets into a good random number generator tester, it will without question reject the series! What’s up with that?
It’s simple. The set is now too “random”! You’ve applied an accept/reject criterion to the sets with some basically arbitrary threshold. That means that all the sets of 100 coin flips that have just enough heads or tails to reach a p-value of 0.04 have been removed. But in 200 sets, 8 of them should have had p-values this low or lower if the coin was a perfectly random coin! You now have too few outliers. The exact same thing can be understood if one imagines testing not total numbers of heads but the probability of (say) 8 heads in a row. Suppose heads are 1’s and tails are 0’s. 8 heads in a row is something we’d consider (correctly) pretty unlikely — 1 in 256. Again, we “expect” most coin flips to have roughly equal numbers of heads and tails, so combinations with 4 0’s and 4 1’s are going to be a lot more likely than combinations with 8 0’s or 8 1’s.
We are then tempted to reject all of the sets of flips that contain 6, 7, or 8 1’s or 0’s as being “not random enough” and reject them from a table of “random coin flips”. But this too is a capital mistake. The probability of getting the sequence 11111111 is indeed 1/256. But so is the probability of getting 10101010, or 11001010, or 01100101! In fact, the probability of getting any particular bit pattern is 1/256. A perfect generator should produce all such bit patterns with equal probability. Omitting any of them on the basis of accept/reject at some threshold results in an output data set that is perfectly biased and that will fail any elementary test for randomness except the one used to establish the threshold. This is one reason that humans make lousy random number generators. If you are asked to put down a random series of 1’s and 0’s on the page, or play rock-paper-scissors with random selection, you simply cannot do it. We aren’t wired right. We will always produce series that lack sufficient outliers because 11111111 doesn’t look random, where 10011010 does.
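To make the “too few outliers” point concrete, here is a minimal sketch (an added illustration with assumed parameters: 2,000 sets of 100 fair-coin flips, the same two-sided binomial test, the conventional 0.05 threshold). Keep only the “certified” sets and the survivors show visibly less spread in their head counts than the binomial variance a fair coin must produce:

```python
# Simulate "certifying" random sets: keep only those that pass a 0.05-level
# binomial test, then show the survivors have too little spread (too few outliers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sets, n_flips = 2000, 100

flips = rng.integers(0, 2, size=(n_sets, n_flips))    # a perfectly fair coin
heads = flips.sum(axis=1)
pvals = np.array([stats.binomtest(int(h), n_flips, 0.5).pvalue for h in heads])

certified = heads[pvals >= 0.05]                       # the sets a 0.05 screen would "certify"

print("expected binomial variance (npq):", n_flips * 0.25)   # 25.0
print("variance of all sets:            ", heads.var())      # close to 25
print("variance of certified sets only: ", certified.var())  # noticeably smaller
```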
With that (hopefully) clear, the relevance to data selection in any sort of statistical analysis should be clear. This is an area where angels should rightly fear to tread! Even the process of rejecting data outliers is predicated on a Bayesian assumption that they are more likely to be produced by e.g. spurious errors in our measuring process or apparatus than to be “real”. However, that assumption is not always correct! When Rutherford (well, really Geiger and Marsden) started bombarding thin metal foil with alpha particles, most passed through as expected. However, some appeared to bounce back at “impossible” angles. These were data outliers that contradicted all prior expectations, and it would have been all too easy to reject them as fluctuations in the apparatus or other method errors — in which case a Nobel prize for discovering the nucleus would have been lost.
In the case of tree ring analysis, it is precisely this sort of accept/reject data selection on the basis of an arbitrary criterion that led Mann to make the infamous Hockey Stick Error in his homebrew PCA code — a bias for random noise to be turned into hockey sticks. Even participants in this sort of work acknowledge that it is as much guesswork and bias as it ever is science — usually off the record. In the Climategate letters, one such researcher laments that the trees in his own back yard don’t reflect the perfectly well known temperature series for that back yard, causing his own son to get a null result in a science fair contest (IIRC, it is some time since I read them:-). Then there is the infamous remark ON the record about the need to pick cherries to make cherry pie — to the US Congress!
I spent 15 years plus doing Monte Carlo computations, which rely even now on very, very reliable sources of random numbers. Applying heuristic selection criteria to any data series with a large, unknown, unknowable set of “random” confounding influences in the remote past to “improve” the result compared to just sampling the entire range of data and hope that the signal exceeds the noise (eventually) by dint of sheer statistics is trying to squeeze proverbial statistical blood from a very, very hard no-free-lunch stone. Chances are excellent that your criterion will simply bias your answer in a way you can never detect and that will actually make your answers systematically worse as you gather more data relative to the unknown true answer.
rgb

Pamela Gray

Proxies based on solar metrics may also find themselves with a published temperature paper that has morphed from a gold standard to one that is now questionable and possibly unreliable. But this should be seen as part of the scientific process and not reflect poorly on the authors of such papers. There have been many examples in the past where the understanding at the time was accepted, only to see that understanding nearly stand on its head decades later (or even a few years later), yet those papers were not pulled and can still be read today. Which is the way it should be. The fact that tree ring and other proxies are now being questioned, and temperature observations adjusted up or down, is rather normal in the history of science advances and paradigm shifts rather than an exception, and the process whereby these things happen should remain in the journals instead of being removed.
Which reminds me of a very important step in defensible research: do your literature review very thoroughly. That vetting process should not be quickly dispatched, lest you find yourself basing your entire work on out-of-date information or, somewhat paradoxically, on current science fads that will eventually go down the same path.

Robert of Texas

There is a related issue I would like to hear someone address:
Proxies such as Tree Ring measurements have multiple confounding factors: Temperature, Water Availability, Sunlight, CO2 Availability, Nutrient Availability (other than CO2), other stress factors (pests, disease, early winter). There may be others.
So not only does the baseline move, but how do you assign the growth of a ring to all of these factors (and probably more I didn’t think of)? Each of these factors may change year to year or decade to decade. I just do not understand how you untangle them without introducing bias.
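One way to see the untangling problem is a small synthetic experiment (an added sketch with made-up numbers, not a claim about any real chronology): when a confounding factor such as moisture is strongly correlated with temperature, regression can no longer assign ring growth cleanly to either factor, and the estimated temperature coefficient becomes far less stable than it is for independent factors.

```python
# Synthetic rings driven equally by temperature and moisture; compare how well
# regression recovers the temperature coefficient when moisture is independent
# of temperature versus strongly correlated with it.
import numpy as np

rng = np.random.default_rng(3)

def estimated_temp_coefficient(corr, n=100):
    temp = rng.standard_normal(n)
    moisture = corr * temp + np.sqrt(1 - corr**2) * rng.standard_normal(n)
    rings = 0.5 * temp + 0.5 * moisture + 0.5 * rng.standard_normal(n)
    X = np.column_stack([temp, moisture, np.ones(n)])
    coef, *_ = np.linalg.lstsq(X, rings, rcond=None)
    return coef[0]                        # fitted weight on temperature (true value 0.5)

for corr in (0.0, 0.95):
    est = [estimated_temp_coefficient(corr) for _ in range(500)]
    print(f"corr(temp, moisture) = {corr}: "
          f"temperature coefficient = {np.mean(est):.2f} +/- {np.std(est):.2f}")
```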

Bob Kutz

No, I think the part where dendrochronology falls off the rails has nothing to do with revising data. The fact that our proxy goes completely the wrong way for about the last 50 years (aka “hide the decline” in the original context of ‘Mike’s Nature trick’) means that this particular proxy should be completely disregarded until such time as the difference can be reconciled.
THAT, at least, shouldn’t be hard for anybody to understand.
Too bad the media completely glossed over Muller’s real comments on that point in favor of his ‘vindication’ of the historical temp. data.

BioBob says:
June 13, 2014 at 8:07 am

– It is important to realize that field temperature readings are themselves proxies of particle velocity or kinetic energy.
– In addition, the unit scales (Celsius, Kelvin, etc.) employed are also proxies for reality.
– calibration is also a proxy for ‘accuracy’, since precision and the limits of observation make the resulting readings a ‘fuzzy’ probability cloud rather than a single value.

The first two are true, but they have the advantage of being replicable (i.e., I can build a thermometer in my kitchen, calibrate it to the freezing & boiling points of water, & I will be very close to everyone else’s thermometers), unlike the paleo-proxies, which have to simply be trusted.
The third point is a great big “So what? That’s life, get used to it.”

pouncer

RGB, Plus One, as usual.

Ron Clutz

In Soviet Russia, they used to say: “The future is known, it is the past that keeps changing.”

joe

From the article above: “Furthermore, at some of these sites, emergent subpopulations of trees show negative growth trends with rising temperature. These findings are typically referred to as the “Divergence Problem” (DP). ”
This is not a divergence problem – this is basic biology. All plants have optimum growing ranges (a bell curve): too cold, and plants grow slowly; as it warms, plants grow faster until reaching optimum growth; as temperatures get even warmer, growth slows down. An important question for the dendro experts is how you can tell the difference, along with all the other factors – light, nutrition, rainfall, etc.
Virtually all plant species have geographical ranges. There is a reason plants growing in northern latitudes don’t grow well in southern latitudes, and vice versa for plant species growing in southern latitudes.
Is the Yamal/Urals divergence problem due to getting too warm? I don’t know.
Are the proxies that are not picking up the MWP due to it getting too warm and therefore having slower growth?

lemiere jacques

Using a proxy means you are making an assumption.
Being able to assess a proxy means you didn’t need it.
Well, if you have several independent proxies you can begin to work more seriously.

Stanleysteamer

I have not seen a discussion about sampling technique and sampling bias. I teach a basic statistics course and have some insight into these problems when associated with any study. Personally, I do not think that taking a few trees in the Northern Hemisphere constitutes a valid sampling technique. In addition, a researcher must be very careful about extrapolating conclusions beyond the region in which the samples were taken. From my perspective, the only valid data we have for the entire planet is the satellite data, and we just don’t have enough of it to be drawing firm conclusions about anything. Maybe an expert can weigh in on sampling.

Gunga Din

My conclusion? There’s more than one bug in Mann’s tree rings.

JeffC

At a minimum, every time the data is “adjusted”, any modeling done using said data becomes invalidated and must be rerun … if the modeler used hindcasting to tune his model, he would have to retune the model with the new historic data and rerun the model for future forecasts …

BioBob

Stark Dickflüssig says: June 13, 2014 at 8:34 am The third point is a great big “So what? That’s life, get used to it.”
——————————-
So what? Every AGW graph, every temperature reading I have ever seen ignores point 3. Liquid-in-glass thermometers typically have a plus or minus 0.5 degree F limit of observability reported by the manufacturer, and yet weather stations that employ such devices report temperatures with a supposed precision of 0.01 to 0.001 degrees rather than to the nearest degree F, which is all the instrument proxy can discern.
That’s what, Stark. Read ’em and weep for “life as we do NOT know it”. The central limit theorem (which likely does not apply in any case) concerns variance, not instrument limitations.

vukcevic

By comparing two very respectable proxy-based science reconstructions and one observational data calculation, good agreement is attained (HERE).
A provisional conclusion could be that the accord is not coincidental, or is at least unlikely to be.
a – …annual band counting on three radiometrically dated stalagmites from NW Scotland, provides a record of growth rate variations for the last 300 years. Over the period of instrumental meteorological records we have a good historical calibration with local climate (mean annual temperature/mean annual precipitation), regional climate (North Atlantic Oscillation) and sea surface temperature (SST; strongest at 65-70°N, 15-20°W)….- Baker, A., Proctor, C., – NOAA/NGDC Paleoclimatology Program, Boulder CO, USA.
b – ….. observational results indicate that Summer NAO variations are partly related to the Atlantic Multidecadal Oscillation. Reconstruction of NAO variations back to 1706 is based on tree-ring records from specimens collected in Norway and United Kingdom …. – C. Folland, UK Met Office Hadley Centre.
c – …. Solar magnetic cycles (SIDC-SSN based) & geomagnetic variability (A. Jackson, J. Bloxham data) interaction. Calculations – vukcevic

BioBob

Stanleysteamer says: June 13, 2014 at 9:33 am From my perspective, the only valid data we have for the entire planet is the satellite data
———————-
Even sat data has its problems or we would not have things like this:
http://wattsupwiththat.com/2010/10/04/an-over-the-top-view-of-satellite-sensor-failure/
example from the post:
“The U.S. physicist agrees there may now be thousands of temperatures in the range of 415-604 degrees Fahrenheit automatically fed into computer climate models and contaminating climate models with a substantial warming bias. This may have gone on for a far longer period than the five years originally identified.”

BioBob

Stark Dickflüssig says: June 13, 2014 at 8:34 am I can build a thermometer in my kitchen, calibrate it to the freezing & boiling points of water, & I will be very close to everyone else’s thermometers
—————–
LOL – when pigs fly.
Build your thermometer:
1) demonstrate how it’s response is linear between -200 c to +100 c or whatever range you use,
2) determine the limits of the changes it can reliably discern (limits of observation)
3) let me know how you define “very close” when identical machine produced devices placed in identical seeming Stevenson screens at identical heights, etc vary significantly with age, etc etc.
In short bullcrap !!

Peter Miller

Ron C, I hope you know that saying is the First Rule of Mann – The future is known, it is the past which keeps changing.
The gatekeepers of the earth’s ground temperature records obviously believe the same.

The fork in tree proxies for me is that I planted 10 or 12 trees of the same age/size 14 years ago in my yard. Most of them are about the same size, but I have the largest (+10/20%) right next to the smallest (-10/20%). I know why they grew differently (water), but once the trees were turned into lumber (IIRC where the oldest proxies came from), there’s no way you’d know.

Kev-in-Uk

In my humble opinion, ALL proxies must be viewed with extreme caution. For a multitude of reasons, but mostly because
a) it is very unusual for a proxy measurement to be in effect a remote measurement of a single parameter itself determined by a single factor/parameter (I can’t think of one offhand). Hence, all proxies are based on assumptions of other factors (often many, as in tree ring width!).
b) the accuracy of the ‘proxy’ measurement itself – i.e. the physical analysis and measurement of the proxy, e.g. isotope analysis. For such micro-scale detections, accuracy (and therefore the deduced proxy effect) is paramount.
c) the physical issue of the proxy itself – e.g. take ice cores, where we have the additional assumption that the trapped air is indeed ‘trapped’ and has not been altered since trapping or cross contaminated with adjacent ‘bubbles’, etc. Is there an error in there? How would we know?
d) the cross comparison or calibration (if you like) with modern measurements. How do we know the modern measurement and calibration is realistic? In truth, we simply cannot ‘know’ for such long term things as climate assumptions and unless humankind lives on for another few millennia, it is unlikely we will ever be able to check our calibrations!
When you add all these potential errors together, it is clear that they could compound/combine very badly and give extremely misleading results, or at least a ‘value’ with very large error bars! The primary scientific assumption, with which I do not agree, is that any errors will be evened out over the dataset. For example, I consider it a bit wild to assume that the modern calibration for isotope analysis of ice cores since the last ice age should be assumed/applied for much older times, e.g. before the last ice age!

Phil.

In the case of tree ring analysis, it is precisely this sort of accept/reject data selection on the basis of an arbitrary criterion that led Mann to make the infamous Hockey Stick Error in his homebrew PCA code — a bias for random noise to be turned into hockey sticks.
No, that only happened if you use Monte-Carlo data with an extremely high autocorrelation parameter. The NRC had to use AR1(.9) to get that result.

Steve McIntyre

As someone that’s spent a lot of time on proxy data, I don’t regard adjustments to temperature data as an important issue in proxy reconstructions. Or even a minor issue.

phi

joe,
“Is the yamal ural divergence problem due to getting too warm?”
No. This is a problem with climatologists and their strange data processing. The divergence of dendros is a myth.
http://imageshack.us/a/img21/1076/polar2.png

basicstats

Proxy construction is another area where climate research hopes that combining/averaging a number of mediocre estimators might improve the accuracy of the resulting estimate. Of course when the proxies are highly correlated, reduction in the variance/dispersion of the average or other linear combination will be small. Laws of large numbers (LLNs) require something close to independence (or else low correlation) between the things being averaged. A point rgbatduke often makes about averaging GCM simulations.
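A small numerical illustration of that point (an added sketch assuming an equal pairwise correlation rho between proxies): the variance of the mean of N correlated series levels off near rho times the single-proxy variance instead of shrinking like 1/N.

```python
# Average N proxies that share a common component (pairwise correlation ~rho)
# and compare the variance of the mean with the independent-proxy expectation.
import numpy as np

rng = np.random.default_rng(0)
n_proxies, n_years, rho = 20, 5000, 0.6

common = rng.standard_normal(n_years)                  # shared signal-plus-common-noise component
proxies = (np.sqrt(rho) * common
           + np.sqrt(1 - rho) * rng.standard_normal((n_proxies, n_years)))

mean_series = proxies.mean(axis=0)
print("variance of a single proxy:   ", proxies[0].var())    # ~ 1
print("variance of the 20-proxy mean:", mean_series.var())   # ~ rho + (1-rho)/N, not 1/N
print("theory, rho + (1-rho)/N:      ", rho + (1 - rho) / n_proxies)
```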

Bruce

My understanding follows, taking tree rings as an example:
If we know what the temperatures were over a period of time and, further, if we know the amount the tree rings expanded over that time, we can ascribe (calibrate) a certain amount of tree expansion to a change in temperature.
So it is assumed, ceteris paribus, that we may infer the ambient temperature from tree rings at a time when no temperature record exists.
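For what it is worth, here is a minimal sketch of that calibrate-then-invert logic (an added illustration with made-up numbers and a single linear predictor, far cruder than actual reconstruction methods):

```python
# Fit ring width against instrumental temperature over the overlap period,
# then invert the fitted line to estimate temperature from older ring widths.
import numpy as np

rng = np.random.default_rng(0)
temp_obs = 8.0 + 0.5 * rng.standard_normal(150)       # ~150 years of "instrumental" temps (deg C)
ring_obs = 0.8 + 0.1 * (temp_obs - 8.0) + 0.02 * rng.standard_normal(150)   # ring widths (mm)

slope, intercept = np.polyfit(temp_obs, ring_obs, 1)  # calibration: ring = intercept + slope*temp

ring_old = np.array([0.75, 0.82, 0.90])                # pre-instrumental ring widths
temp_reconstructed = (ring_old - intercept) / slope    # inversion: ceteris paribus temperature
print(temp_reconstructed)
```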

“Is the Yamal/Urals divergence problem due to getting too warm? I don’t know.
Are the proxies that are not picking up the MWP due to it getting too warm and therefore having slower growth?”
The divergence has been variously explained.
I recall one study that put some of the issue down to temperature adjustments
(need to verify)
More recently
http://www.bishop-hill.net/blog/2014/5/8/divergence-problem-solved-allegedly.html

DeNihilist

Jez, I wish Jim Bouldin would join this discussion. He has become an outlier in the paleo world hisself.

BioBob says:
June 13, 2014 at 10:40 am

1) demonstrate how it’s response is linear between -200 c to +100 c or whatever range you use,

Demonstrate the difference between a possessive & a contraction, you illiterate goober.

tgasloli

Botanists and paleobotanists have spent the past 30 years telling the “climate scientists” that you can’t use tree rings as a proxy for temperature. The annual growth ring size is dependent on too many factors to be a proxy for temperature. I am surprised to see this website further the scientific nonsense of tree rings as temperature proxies. You should know better.
You also really didn’t answer the question of how you can calibrate any proxy with only 150 years of poor quality temperature data. The simple fact is, they really aren’t calibrated; they are used to define temperatures that are outside the range available in the 150 years of poor quality data. Most of the proxies are really just bench tests (unverified by real world data), extrapolations beyond the range, and “educated guess work”.
In short, there is very little actual science in “climate science.”

Willis Eschenbach

Well, now, that’s odd. I took a look at the Esper-Frank 2008 study linked to above. They compared tree rings and observational data. They say the stations they used were:

Stations include Bernina Pass (Ber), Bever (Bev), Buffalora (Buf ), Samedan (Sam), Sils Maria (Sil), Station Maria (Stm).

I think that what they call “Station Maria” is a misreading of the Santa Maria station, which is abbreviated “Sta. Maria”. And I’ve located Samedan and Buffalora in the Berkeley Earth dataset.
However, none of those three have more than a few decades of data, and I can’t locate the other ones in either the GISS or the Berkeley Earth dataset (Switzerland station map).
Anyone have any guesses about why this might be? The authors show data from ~1960 for their stations. I can’t find it.
w.

Follow the Money

I will say this discussion is steps above IPCC science. The trees are limited here to “alpine” and northern treeline specimens, whose rings, even pre-Kyoto, were thought to be maybe partially related to summer temperature or summer season length. Remember, partially, not even mainly. However, IPCC science, perhaps I should specify as Australian Climate Science, uses studies (mostly Australian) that almost any old tree in Australia and NZ is a tree-mometer. What is it about Australia? These also show up in AR5 S. Hemisphere multi-proxy thingeroos.

Willis Eschenbach

Steve McIntyre says:
June 13, 2014 at 12:25 pm

As someone that’s spent a lot of time on proxy data, I don’t regard adjustments to temperature data as an important issue in proxy reconstructions. Or even a minor issue.

Thanks for that, Steve. Any ideas on my question immediately above?
Also, it seems to me that whether adjustments are important in proxy reconstructions depends on how they’ve calibrated their reconstruction. Esper and Frank appear to have used the Luterbacher temperature reconstruction to calibrate their results. As a result, it seems that how that reconstruction was created, including adjustments to the stations, could have an effect on their results. The problem from my perspective is that the overall trend in the proxy reconstruction is generally some linear function of the overall trend in the temperature data used for the reconstruction.
w.

I learned more than enough about proxies and statistics from “Hockey Stick Illusion.” I consider proxies not far removed from tea leaves and Ouija boards. How can any reasonable person take the results seriously? And to “..torture and molest..” a line out of a huge cloud of proxy data and then plot that “result” with real instrument data from MLO, that’s not just sloppy data presentation, IMHO it’s plain, ordinary fraud!

vukcevic

RGB: tree rings in particular make lousy proxies, because tree growth rates are highly multivariate
C. Folland from the UK’s Met Office used tree-ring records collected from specimens in the UK and Norway, suggesting the data should be a good proxy for the summer NAO (atmospheric pressure – related to both temperature and precipitation during the growing season). As it happens, it also appears to be an excellent proxy for solar/geomagnetic intra-activity for the 1700-1850 period (see the second graph in my comment above, June 13, 2014 at 10:20 am). Alas, nothing is forever; for the following 150 years the correlation is only sporadic (non-stationary). As it happens, the Scottish stalagmites’ growth has a longer tolerable range. Just another natural coincidence, you might say.

phi

Willis Eschenbach,
Sils Maria, homogenized data from 1864 :
http://www.meteosuisse.admin.ch/files/kd/homogreihen/homog_mo_SIA.txt
Apparently, Esper uses the raw series.

That’s it, Global Dimming!
“So the trees aren’t acting as thermometers over a significant fraction of the instrumental era.
Ah yes it could be “increased CO2”, “global dimming”, “atmospheric nitrate deposition”.
Virtually anything in fact.”
Mosher, you are a hoot…

Gary Pearse

Using a priori reasoning:
Wouldn’t an ancient forest have richer nutrients, particularly minerals, than it would a thousand years later? I know that virgin prairie soil grew crops like crazy for a few generations and then nutrients had to be added annually.
How is it possible to control out the effect of limitations on moisture from the putative temperature differences in the signal? Wouldn’t a drought look like cold temperatures in the tree growth? Wouldn’t following years of adequate moisture look like warmer temperatures?
Wouldn’t too much water reduce a tree’s growth?
Would it not be better to look at several species at a time? If you had a forest of pine and spruce, but with some boggy areas that might support ash, willow or other water-loving trees, and you had a drought, wouldn’t you tend to find the ash affected more, pine the most resistant and spruce in between? Wouldn’t the spruce tend to encroach on the ash bog as it dried?
Do proxilitizers consider these types of questions?

Steve McIntyre

RE Mosher comment above – I tried to look at the data underpinning the argument supposedly linking divergence to global dimming. Unfortunately the global dimming data was password protected and I have thus far been unsuccessful in getting access to it. If I can’t get data, it’s hard to analyse the supposed linkage.
Willis, I don’t know what question you’re talking about. Is it about station data? If it is, I don’t know.

Pat Frank

Steve Mc, as temperature data are used to calibrate proxy data, any magnitude uncertainty in the temperature record is immediately transferred to the proxy reconstruction. Accuracy is a major issue, quite independent of whether the proxy is actually a proxy, or whether the proxy record exhibits proper statistics.
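A toy example of that transfer (an added sketch; the “adjustment” here is just an affine rescaling of the calibration target): because least-squares calibration is equivariant under such changes of the target, the reconstruction inherits the adjustment exactly.

```python
# Calibrate the same proxy against an original and an "adjusted" target record;
# the resulting reconstructions differ by exactly the adjustment applied.
import numpy as np

rng = np.random.default_rng(2)
proxy = rng.standard_normal(150)
target = 0.6 * proxy + 0.2 * rng.standard_normal(150)  # original calibration target
target_adj = 1.2 * target - 0.1                         # rescaled and shifted "adjusted" record

def reconstruct(t):
    slope, intercept = np.polyfit(proxy, t, 1)          # calibrate proxy -> temperature
    return intercept + slope * proxy

diff = reconstruct(target_adj) - (1.2 * reconstruct(target) - 0.1)
print("max discrepancy:", np.abs(diff).max())           # ~1e-16: the adjustment carries straight through
```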

Kurt

Maybe there are different issues here. My understanding (which may be wrong) is that the proxy data is initially calibrated or trained against a first subset of the instrumental record and then tested against a second, different subset of the instrumental record as a verification step. This always smacked of something that would be susceptible to cherry picking to me, as I don’t think I read anything about accepted standards for demarking the boundary between the training set and the testing set, so that someone could just adjust that boundary (or the relative sizes of the sets) until the proxy trends matched the interval in the instrumental record set aside for verification. In other words, you just determine what percentage of the instrumental record is used to calibrate the reconstruction in a manner such that the reconstruction matches the remainder of the instrumental record.
Anyway, what happens if you have a proxy reconstruction that showed a good match against that portion of the instrumental record set aside as the testing data when the reconstruction was made, but ten years later the instrumental record is “adjusted” in light of newly discovered biases or supposedly better statistical adjustment techniques, and the reconstruction no longer matches the instrumental record?
Similarly, if the portions of the instrumental record against which the reconstruction was trained are later revised, does the proxy reconstruction have to be adjusted? And do you get to start all over again and select a new interval as the training data and a new interval as the testing data to try to keep the reconstruction from changing too much?
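Here is a minimal sketch of the split-period scheme Kurt describes (an added illustration on synthetic data, with a simple linear calibration and the usual verification correlation and Reduction of Error statistic; real studies use more elaborate methods):

```python
# Calibrate a proxy against one window of a synthetic "instrumental" record,
# then verify the reconstruction against the withheld window.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1850, 2000)
temp = 0.005 * (years - 1850) + 0.2 * rng.standard_normal(years.size)   # fake temperatures
proxy = 1.5 * temp + 0.3 * rng.standard_normal(years.size)              # fake proxy with noise

split = 1950                                  # boundary between calibration and verification
cal, ver = years < split, years >= split

slope, intercept = np.polyfit(proxy[cal], temp[cal], 1)   # calibration-window fit
recon = intercept + slope * proxy                          # reconstruction over the full record

r = np.corrcoef(recon[ver], temp[ver])[0, 1]               # verification correlation
re = 1 - np.sum((temp[ver] - recon[ver])**2) / np.sum((temp[ver] - temp[cal].mean())**2)
print(f"verification r = {r:.2f}, RE = {re:.2f}")
# Moving `split` changes these scores -- which is exactly the cherry-picking worry above.
```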

Latitude

As someone who’s spent the greater part of my adult life trying to divine this crap…..all proxies are crap!
Too many assumptions have to be made, too much data has to be ignored…and all of it is built on precious proxies that made the same assumptions and picked data

rgb – I find your explanation of random numbers and their pitfalls intriguing. Logically, it is impossible to examine a set of numbers and determine whether they are random. [Explanation: Start with a sequence of binary numbers, length one. You can’t tell. Length two – no matter what the combination 00, 01, 10, 11 – you can’t tell. Etc, ad infinitum]. So you can test a generator for randomness, but you can’t do that just by testing its output!
On proxies – IMHO if you have to torture the data then what you end up with is of little or no value. I can’t see how random numbers can be relevant to proxies if you aren’t torturing – either the proxies have a clear theoretical basis and give a clear picture or they don’t. Re temperature, trees don’t have a clear theoretical basis (too many competing factors) so they are of little or no value as proxies. Period.

Catherine Ronconi

Steven Mosher says:
June 13, 2014 at 1:30 pm
Has anyone here read the awful offal on Yamal currently stinking up Rational Wiki? It bears the smelly imprint of Connolley, to wit:
“The Yamal controversy was an explosion of drama in the global warming blog wars that spilled over into the pages of mainstream newspapers. In the wake of Climategate, deniers latched onto a set of tree-ring data called the Yamal series that had been the topic of some of the leaked e-mails (after they were done squawking about “nature tricks” and “hiding the decline,” of course). The Yamal series refers to the tree-ring data taken from the Yamal Peninsula in Siberia by a team of Russian researchers, Hantemirov and Shiyatov, in the late ’90s. Hantemirov and Shivatov released more of their data in 2009 and Steve McIntyre jumped all over it, snarking:
“”I’m assuming that CA readers are aware that, once the Yamal series got on the street in 2000, it got used like crack cocaine by paleoclimatologists, and of its critical role in many spaghetti graph reconstructions, including, most recently, a critical role in the Kaufman reconstruction.[1]
“Keith Briffa, a climatoligist at the Climatic Research Unit (CRU) in East Anglia, had based a number of temperature reconstructions on a subset of the Yamal data. He claimed he had used a different methodology than Hantemirov and Shivatov because the original methodology didn’t preserve long-term climate change.[2] McIntyre accused Briffa of cherry-picking. Of course, it would be perfectly legitimate to criticize Briffa’s reconstruction and perform a new reconstruction on one’s own. However, McIntyre just downloaded some other unrelated Yamal dataset from the internet and chucked it into the original set.[3] Deniers, obviously, failed to care about this and the “Yamal is a lie!” claim shot through the deniosphere, with Anthony Watts picking up the story next.[4] It then found its way into the right-wing rags, with James Delingpole and others declaring that the “hockey stick” graph had been soundly “debunked.”[5][6]
“However, Briffa’s Yamal reconstructions were only included in four of the twelve hockey stick reconstructions and even McIntyre criticized other deniers for blowing his “critique” of Briffa out of proportion and walked back his accusations of cherry-picking. Sure enough, both Briffa and a member of the original Russian team released full reconstructions using the previously unreleased data and the hockey stick shape returned, confirming Briffa’s original assertions.[7][8]
“However, the incident was still missing something: That classic McIntyre hypocrisy. McIntyre had been whining for quite some time that Briffa had been blowing him off (gee, wonder why?). However, Briffa, even though he had a good excuse, hadn’t been stonewalling McIntyre — the complete dataset was under the control of the Russian team that had collected it. After Briffa notified him of this, McIntyre then flippantly replied he had had the data all along!
“”In response to your point that I wasn’t “diligent enough” in pursuing the matter with the Russians, in fact, I already had a version of the data from the Russians, one that I’d had since 2004.[9]”
Correct me if wrong, but doesn’t “Yamal” come down to a single tree, which somehow missed out on the now touted “global dimming” magically affecting all its neighboring trees?