Guest Post by Willis Eschenbach
As Anthony discussed here at WUWT, we have yet another effort to re-animate the long-dead “hockeystick” of Michael Mann. This time, it’s Recent temperature extremes at high northern latitudes unprecedented in the past 600 years, by Martin P. Tingley and Peter Huybers (paywalled), hereinafter TH2013.
Here’s their claim from the abstract.
Here, using a hierarchical Bayesian analysis of instrumental, tree-ring, ice-core and lake-sediment records, we show that the magnitude and frequency of recent warm temperature extremes at high northern latitudes are unprecedented in the past 600 years. The summers of 2005, 2007, 2010 and 2011 were warmer than those of all prior years back to 1400 (probability P > 0.95), in terms of the spatial average. The summer of 2010 was the warmest in the previous 600 years in western Russia (P > 0.99) and probably the warmest in western Greenland and the Canadian Arctic as well (P > 0.90). These and other recent extremes greatly exceed those expected from a stationary climate, but can be understood as resulting from constant space–time variability about an increased mean temperature.
Now, Steve McIntyre has found some lovely problems with their claims over at ClimateAudit. I thought I’d take a look at their lake-sediment records. Here’s the raw data itself, before any analysis:
Figure 1. All varve thickness records used in TH2013. Units vary, and are as reported by the original investigator. Click image to embiggen.
So what’s not to like? Well, a number of things.
To start with, there’s the infamous Korttajarvi record. Steve McIntyre describes this one well:
In keeping with the total and complete stubbornness of the paleoclimate community, they use the most famous series of Mann et al 2008: the contaminated Korttajarvi sediments, the problems with which are well known in skeptic blogs and which were reported in a comment at PNAS by Ross and I at the time. The original author, Mia Tiljander, warned against use of the modern portion of this data, as the sediments had been contaminated by modern bridgebuilding and farming. Although the defects of this series as a proxy are well known to readers of “skeptical” blogs, peer reviewers at Nature were obviously untroubled by the inclusion of this proxy in a temperature reconstruction.
Let me stop here a moment and talk about lake proxies. Down at the bottom of most every lake, a new layer of sediment is laid down every year. This sediment contains a very informative mix of whatever was washed into the lake during a given year. You can identify the changes in the local vegetation, for example, by changes in the plant pollens that are laid down as part of the sediment. There’s a lot of information that can be mined from the mud at the bottom of lakes.
One piece of information we can look at is the rate at which the sediment accumulates. This is called “varve thickness”, with a “varve” meaning a pair of thin layers of sediment, one for summer and one for winter, that comprise a single year’s sediment. Obviously, this thickness can vary quite a bit. And in some cases, it’s correlated in some sense with temperature.
However, in one important way lake proxies are unlike say ice core proxies. The daily activities of human beings don’t change the thickness of the layers of ice that get laid down. But everything from road construction to changes in farming methods can radically change the amount of sediment in the local watercourses and lakes. That’s the problem with Korttajarvi.
And in addition, changes in the surrounding natural landscape can also change the sediment levels. Many things, from burning of local vegetation to insect infestation to changes in local water flow can radically change the amount of sediment in a particular part of a particular lake.
Look, for example, at the Soper data in Figure 1. It is more than obvious that we are looking at some significant changes in the sedimentation rate during the first half of the 20th Century. After four centuries of one regime, something happened. We don’t know what, but it seems doubtful a gradual change in temperature would cause a sudden step change in the amount of sediment combined with a change in variability.
Now, let me stop right here and say that, even setting aside the obvious madness of including Korttajarvi, the inclusion of the Soper proxy alone should totally disqualify the whole paper. There is no justification for claiming that it is temperature related. Yes, I know it gets log-transformed further on in the story, but get real. This is not a representation of temperature.
But Korttajarvi and Soper are not the only problems. Look at Iceberg, which is three separate records. It's like one of those second-grade quizzes: "Which of these three records is unlike the other two?" How can that possibly be considered a valid proxy?
How does one end up with this kind of garbage? Here’s the authors’ explanation:
All varve thickness records publicly available from the NOAA Paleolimnology Data Archive as of January 2012 are incorporated, provided they meet the following criteria:
• extend back at least 200 years,
• are at annual resolution,
• are reported in length units, and
• the original publication or other references indicate or argue for a positive association with summer temperature.
Well, that all sounds good, but these guys are so classic … take a look at Devon Lake in Figure 1, labeled DV09. Notice how far back it goes? 1843, which is 170 years ago … so much for their 200-year criterion.
Want to know the funny part? I might never have noticed, but when I read the criteria, I thought, "Why a 200-year criterion?" It struck me as special pleading, so I looked more closely at the only record it applied to and thought, huh? That didn't look like 200 years. So I checked the data here … 1843, not 200 years ago, only 170.
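For what it's worth, checking that criterion takes only a couple of lines of R. This is my own sketch, not anything from the paper; the start years below are placeholders except for DV09's 1843, which is what the archived file shows.

# Quick check of the 200-year inclusion criterion against record start
# years. Values are placeholders except DV09 = 1843; the NOAA archive
# cutoff used by the authors was January 2012.
start_year    <- c(DV09 = 1843, Murray = 1750, Big_Round = 1600)  # illustrative
archive_year  <- 2012
record_length <- archive_year - start_year
data.frame(record = names(start_year), start_year, record_length,
           meets_200yr = record_length >= 200, row.names = NULL)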
Man, the more I look, the more I find. In that regard, both Sawtooth and Murray have short, separate segments tacked onto the end of their main data. Perhaps by chance, both of them will add to whatever spurious hockeystick has been formed by Korttajarvi and Soper and the other main players.
So that’s the first look, at the raw data. Now, let’s follow what they actually do with the data. From the paper:
As is common, varve thicknesses are logarithmically transformed before analysis, giving distributions that are more nearly normally distributed and in agreement with the assumptions characterizing our analysis (see subsequent section).
I'm not entirely at ease with this log transformation. I don't understand the underlying justification or logic for doing it. If the varve thickness is proportional in some way to temperature, and it may well be, why would temperature instead be proportional to the logarithm of the thickness?
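To make the discussion concrete, here is a minimal R sketch of the pre-processing as I read their description, applied to made-up data rather than the real records. The Shapiro-Wilk test at the end is just one way to put a number on "more nearly normally distributed".

# Minimal sketch of the described pre-processing: log-transform the raw
# varve thicknesses, then standardize to zero mean and unit standard
# deviation. 'varve_mm' is a simulated stand-in for one raw record.
set.seed(42)
varve_mm <- rlnorm(600, meanlog = 0, sdlog = 0.5)             # fake raw thicknesses

log_varve <- log(varve_mm)                                    # log transformation
std_varve <- (log_varve - mean(log_varve)) / sd(log_varve)    # standardization

# One way to quantify "more nearly normally distributed":
shapiro.test(varve_mm)    # raw record: typically strongly non-normal
shapiro.test(std_varve)   # after log transform: usually much closer to normal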
In any case, let's see how much "more nearly normally distributed" we're talking about. Here are the distributions of the same records, after log transformation and standardization. I use a "violin plot" to examine the shape of a distribution. The width at any point indicates the smoothed density of data points at that value. The white dot shows the median value of the data. The black box shows the interquartile range, which contains half of the data. The vertical "whiskers" extend up to 1.5 times the interquartile range above and below the black box.
Figure 2. Violin plots of the data shown in Figure 1, but after log transformation and standardization. Random normal distribution included at lower right for comparison.
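For anyone who wants to reproduce that kind of figure, a rough equivalent in R using the vioplot package looks like the following. The data below are simulated stand-ins, not the archived records.

# Rough sketch of a Figure-2-style plot: one violin per standardized
# record plus a random normal series for comparison. Placeholder data.
library(vioplot)
set.seed(1)
big_round <- as.numeric(scale(log(rlnorm(400, sdlog = 0.3))))
soper     <- as.numeric(scale(log(rlnorm(400, sdlog = 0.9))))
reference <- rnorm(400)                       # random normal for comparison

vioplot(big_round, soper, reference,
        names = c("Big Round", "Soper", "Normal"), col = "skyblue")
title(ylab = "standard deviations")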
Note the very large variation among the different varve thickness datasets. You can see the problems with the Soper dataset. Some datasets, like Big Round and Donard, have a fairly normal distribution after the log transform. Others, like DV09 and Soper, are far from normal even after transformation. Many of them are strongly asymmetrical, with excursions of four standard deviations being common in the positive direction, while in the negative direction they often vary by only half of that, two standard deviations. When the underlying data are that far from normal, in my world that's always a reason for further investigation. And if you are going to include them, the differences in how they depart from normal (excess positive over negative excursions) affect both the results and their uncertainty.
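A quick numerical version of that eyeball check, again just a sketch on made-up data, is to compare each standardized record's largest positive and negative excursions along with its skewness:

# Asymmetry check on a standardized record: largest positive and
# negative excursions (in standard deviations) plus the sample skewness.
excursion_summary <- function(x) {
  z <- as.numeric(scale(x))
  skew <- mean((z - mean(z))^3) / sd(z)^3
  c(max_pos = max(z), max_neg = min(z), skewness = skew)
}
excursion_summary(log(rlnorm(400, sdlog = 0.5)))   # placeholder data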
In any case, after the log transformation and standardization to a mean of zero and a standard deviation of one, the datasets and their average are shown in Figure 3.
Figure 3. Varve thickness records after log transformation and standardization.
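The simple average is straightforward to compute once the standardized records are aligned by year. The sketch below uses a placeholder matrix (my names and values, not theirs), with NA where a record has no data.

# Simple cross-record average by year. 'proxies' is a year-by-record
# matrix of standardized log thicknesses, NA where a record is absent.
set.seed(2)
years   <- 1400:2000
proxies <- matrix(rnorm(length(years) * 11), ncol = 11)   # placeholder for 11 records
proxies[years < 1843, 11] <- NA                           # e.g. DV09 starts in 1843

simple_avg <- rowMeans(proxies, na.rm = TRUE)
plot(years, simple_avg, type = "l",
     xlab = "Year", ylab = "Mean standardized log thickness")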
As you can see, the log transform doesn’t change the problems with e.g. the Soper or the Iceberg records. They still do not have internal consistency. As a result of the inclusion of these problematic records, all of which contain visible irregularities in the recent data, even a simple average shows an entirely spurious hockeystick.
In fact, the average shows a typical shape for this kind of spurious hockeystick. In the “shaft” part of the hockeystick, the random variations in the chosen proxies tend to cancel each other out. Then in the “blade”, the random proxies still cancel each other out, and all that’s left are the few proxies that show rises in the most recent section.
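To illustrate that mechanism, here's a toy example, entirely synthetic and nothing to do with the TH2013 data: average nine pure-noise "proxies" with two that ramp upward over the last century, and the average grows a blade.

# Toy illustration: nine pure-noise proxies plus two with a ramp over
# the last 100 years. The noise averages toward zero everywhere, so the
# recent rise in the two contaminated series is all that survives.
set.seed(3)
n_years <- 600
noise   <- matrix(rnorm(n_years * 9), ncol = 9)
ramp    <- c(rep(0, n_years - 100), seq(0, 3, length.out = 100))
contaminated <- cbind(rnorm(n_years) + ramp, rnorm(n_years) + ramp)

avg <- rowMeans(cbind(noise, contaminated))
plot(seq(1400, length.out = n_years), avg, type = "l",
     xlab = "Year", ylab = "Average of standardized proxies")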
My conclusions, in no particular order, are:
• The authors are to be congratulated for being clear about the sources of their data. It makes for easy analysis of their work.
• They are also to be congratulated for the clear statement of the criteria for inclusion of the proxies.
• Sadly, they did not follow their own criteria.
• The main conclusion, however, is that clear, bright-line criteria of the type that they used are a necessary but not sufficient part of the process. There are more steps that need to be followed.
The second step is the use of the source documents and the literature to see if there are problems with using some parts of the data. For them to include Korttajarvi is a particularly egregious oversight. Michael Mann used it upside down in his 2008 analysis. He subsequently argued it "didn't matter". It is used upside down again here, and the original investigators said don't use it after 1750 or so. It is absolutely pathetic that after all of the discussion in the literature and on the web, including a published letter to PNAS, Korttajarvi is once again being used in a proxy reconstruction, and once again it is being used upside down. That's inexcusable.
The third part of the proxy selection process is the use of the Mark I eyeball to see if there are gaps, jumps in amplitude, changes in variability, or other signs of problems with the data.
The next part is to investigate the effect of the questionable data on the final result.
And the final part is to discuss the reasons for the inclusion or the exclusion of the questionable data, and its effects on the outcome of the study.
Unfortunately, they only did the first part, establishing the bright-line criteria.
Look, you can't just grab a bunch of proxies and average them, whether you use Bayesian methods or not. The paleoproxy crowd has shown over and over that you can artfully construct a hockeystick by doing exactly that, just pick the right proxies …
So what? All that proves is that yes indeed, if you put garbage in, you will assuredly get garbage out. If you carefully stack the proxy selection process, you can get any result you want.
Man, I’m tired of rooting through this kind of garbage, faux studies by faux scientists.
w.
Thanks Willis.
~~~~~~~~~~~
Geoff Sherrington says:
April 13, 2013 at 11:33 pm
“Nature tends not to know what a logarithm is . . .”
Thanks, Geoff. I needed an early morning laugh.
http://upload.wikimedia.org/wikipedia/commons/thumb/3/36/Spiral_aloe.jpg/800px-Spiral_aloe.jpg
http://www.empowernetwork.com/zeflow/files/2012/08/shell.jpg?id=zeflow
Just kidding. I know you know.
~~~~~~~~~~~~~~~~
Janice Moore says:
April 13, 2013 at 9:17 pm
“This is my first and likely last post . . . ”
And that could be a disservice to the “fight for TRUTH”.
You know things. Share.
I thought this was interesting:
‘They also serve who only stand and wait.’
~John Milton’s Sonnet XVII
http://forum.quoteland.com/eve/forums/a/tpc/f/99191541/m/8371905596
~~~~~~~~~~~~~~~~~~
Pamela Gray says:
April 14, 2013 at 8:15 am
“The Willamette Valley lake and wet land sediments . . . ”
See:
http://www.firescience.gov/projects/04-2-1-115/project/04-2-1-115_04-2-1-115_final_report.pdf
Title: Historical Fire Regimes of the Willamette Valley, Oregon: Providing a Long-Term, Regional Context for Fire and Fuels Management
From the introduction: “A fire history study based on lake-sediment records of the last 2000 years and tree-ring records from the Willamette Valley and surrounding lowlands was undertaken with three objectives: …”
The 4th author has been at Central Washington University (Ellensburg) for several years and continues this sort of work in WA and OR; as, I assume, the others also continue to do.
j ferguson says:
April 14, 2013 at 7:34 am
“Thank you Chris Wright for alerting us to Benford’s Law. What an astonishing law it is.”
I have a "Word" file graph showing the distribution of leading digits in Bill Clinton's 13 years of tax returns overlaid with the Benford distribution – the fit is impressive and shows that Clinton's tax returns are highly likely to have been correct. (I don't know how to put this on the WordPress page.) Cooked books tend toward a normal distribution of leading digits. Benford's law is also the reason that the first pages of the old log table books were scruffy and dog-eared compared to the rest of the book. Apparently the Benford distribution is the only one that is "scale invariant", i.e. it does not vary with differing units (dollars, Yen, Euros; SI, English units…).
Taking the logarithm of the varve thickness data probably does, in essence, change the scale and distort its natural distribution. Perhaps a test using the Benford distribution could be another way of "auditing" climate statistical manipulations.
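A rough R sketch of such a comparison might look like this; the input below is made-up data, and in practice one would feed in the raw thicknesses in their original units.

# Compare observed leading-digit frequencies with Benford's law,
# P(d) = log10(1 + 1/d) for d = 1..9. 'x' is any vector of positive values.
benford_check <- function(x) {
  lead <- as.integer(substr(formatC(x, format = "e"), 1, 1))   # leading digit
  observed <- as.numeric(table(factor(lead, levels = 1:9))) / length(lead)
  expected <- log10(1 + 1 / (1:9))
  round(rbind(observed, expected), 3)
}
benford_check(rlnorm(1000, sdlog = 2))   # placeholder data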
Just reread the paper Willis – oops! You’re correct. My bad.
Procedural certainty, not representational certainty. Reality is not found simply by number-crunching.
I am shocked that technically proficient men and women don’t understand the difference between correct math and valid conclusions determined from math. The sampling is invalid for the purpose, the variation clearly indicating a regional situation.
A mineral geologist would recognize this pattern: local ore deposits, two of which deserve staking.
Who would claim that 11 partly incomplete temperature records would accurately reflect the larger regional history, except by accident? Apparently knowledgeable is not the same as smart.
A great post Mr. Eschenbach but you say “…the random variations in the chosen proxies tend to cancel each other out.”
Actually I would go further. From a visual inspection I suspect that if someone wasted enough of their time obtaining the correlation coefficients between each pair of proxies, even after transformation and standardisation, there would be sufficient evidence to show that the varve thicknesses were not responding to the same stimuli.
“Steve McIntyre says:
April 14, 2013 at 9:54 am
In respect to the Korttajarvi organics.”….
Readers do not in general understand the technical point here (Willis's point) concerning temperature: Tiljander does not relate varve thickness to temperature, but to other measurements of details within the year.
There has been a barrage of warming garbage of late from the climateer “scientists”. “Hide the decline” has morphed into “Bury the truth”.
A lot of comments, but you and the readers leave out an important subject: the Little Ice Age. In the 1400s we were near the bottom of the temperature curve, so despite all the flaws in this study, temperatures could well have been lower then and higher today. What was the temperature in, say, 1000 AD, before temperatures started dropping? The basin study I just checked shows the temperature in 1225 was the same as in 1990, so maybe you folks are looking at the trees in this study and ignoring the forest, so to speak.
Janice Moore says:
April 13, 2013 at 9:17 pm
Thank you, Mr. Eschenbach and Mr. Watts. Your thorough and insightful summarizing of scientific papers in language a non-science major can understand (er, well, ahem, most of the time…), is much appreciated. This is my first and likely last post (I am not likely to have anything substantial to say), but know that I and, no doubt, thousands [ah, why not speculate — MILLIONS $:)] of us are silently reading and learning, gaining more ammunition for the fight for TRUTH.
Thanks, too, to all you other cool science geeks who weigh in, here! You rock.
++++++++++++++
My vote for comment of the month…
If only I had the same sense of restraint, and could write.
Data transformations are often used in direct response to the technology that generated the data. "Random" errors arising from measurement techniques tend to be proportional to the size of the quantity being measured. The sources of potential error really need to be considered when thinking about the validity and utility of performing data transformations, as does the apparent inherent scatter in the quantity being investigated.

Take precipitation measurements as an example. Collections of monthly average rainfall data may contain values from 0 (zero) millimeters to a few hundred. It will be obvious that very high values are likely to be subject to substantial errors of sampling as well as of actually measuring the height of the water in the collector. In dry months such errors will be small. In these circumstances a log scale may be useful, but it falls down for any zero observations. One might thus consider a log(V + C) transform, choosing C such that you feel comfortable with its non-linear scale effects. Clearly subjective in general, but possibly useful. What about a square root transform? The zero problem vanishes while the de-emphasis of high values is fairly well maintained. If your original data contain negative values you may again need to choose an arbitrary constant to add.

Simply deciding on a transform in an attempt to improve the "normality" of a set of observations is very ill-advised. Normality is never achieved in practice, and despite the attractive simplification it brings to the arithmetic and algebra of regression, the inferential statistical calculations that generally follow the regression, and the consequences of back-transforming to the original data scale, still have to be accepted and allowed for. Take care when using transforms!
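A small sketch of those alternatives in R, applied to made-up data containing zeros:

# Transforms discussed above, applied to a series with zeros (e.g.
# monthly rainfall in mm): log(x + C) needs an arbitrary constant C,
# while sqrt(x) handles zeros directly. Placeholder data only.
set.seed(4)
rain <- c(rep(0, 20), rgamma(180, shape = 0.8, scale = 40))

C <- 1                           # arbitrary offset; a subjective choice
log_shifted <- log(rain + C)
root        <- sqrt(rain)

op <- par(mfrow = c(1, 3))
hist(rain,        main = "raw",        xlab = "mm")
hist(log_shifted, main = "log(x + C)", xlab = "")
hist(root,        main = "sqrt(x)",    xlab = "")
par(op)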
My two cents: air pressure in the atmosphere drops off exponentially with altitude, which is to say altitude varies as the logarithm of the pressure … another example of nature's use of logarithms.
berniel
Thank you for the comment about glacial meltwater producing increased runoff and sediment yield. You are correct. Willis is also correct in that glacial advance as well as retreat can cause increased sediment transport. I used precipitation to include rain as well as snow. Also, in places such as the NE USA, rapid snowmelt from warm rain can cause huge floods with ice jams and drastically increased sediment transport. When Lake Bonneville's natural dam burst, it scoured out huge canyons along the Snake River in the northwestern USA, perhaps the largest flood yet found. I can't swear to it until I go back and find a good book on it, but I do believe an ice dike may have been involved in the very rapid flood and huge depths of water. But with a flood of that magnitude we may never know.
So again, you are correct, precipitation in the form of snow and ice flow/melt can be significant.
I appreciate your comments and those of Willis.
Leonard Lane says:
April 14, 2013 at 9:22 pm
Leonard,
The great scouring was done by the Columbia River, fed by bursts of water from the Clark Fork River. Lake Missoula was indeed a glacier-formed lake that burst periodically. I remember an article in Scientific American belatedly acknowledging the work of J Harlen Bretz in figuring out the mechanism of the periodic flooding by hiking over the ground on foot. They did discover varves, though I don't recall use of the term. The varves were located on the Snake River because the huge volume of water in the Columbia caused the Snake to flow upstream, allowing deposition of soil layers as the water flow decelerated. Examination of those layers was key in figuring out what had gone on so many years before.
pbh
McComber Boy.
Correct. The Missoula floods ran down the Clark Fork to the Columbia, and certainly where the Snake River joined the Columbia, water would have gone everywhere, including up the Snake River. And the Columbia River was scoured several times by the Missoula floods.
But I was referring to Lake Bonneville (in Utah) and its massive flood scouring the Snake River. As I recall, the Lake Bonneville flood was a single occurrence. Thank you for making the comment more general by including other massive floods in the region.
"Actually I would go further. From a visual inspection I suspect that if someone wasted enough of their time obtaining the correlation coefficients between each pair of proxies, even after transformation and standardisation, there would be sufficient evidence to show that the varve thicknesses were not responding to the same stimuli."
Not to be picky, but you mean there would be insufficient evidence to conclude that they are responding to the same stimulus (one at a time). It is virtually impossible to conclude a negative in science. The correct use of statistics here is to try to disprove the null hypothesis of no common cause, and to be unable to do so.
Beyond that, I disagree as long as one includes the sets that are obviously correlated with what is being interpreted as a 19th and 20th century warming. These aren’t correlated with most of the rest of the sites, but they are correlated strongly with each other. I would guess that a few of the other sites might have decent correlation too, but on a much more modest basis — they might well be responding weakly to a common set of conditions e.g. excessive rainfall or drought per year or per decade. This is actually pretty reasonable, as things like ENSO often create correlated patterns of drought quite independent of the temperature, “often” (on a century timescale) drought or flood patterns that last 2 to 5 years and that are quite widespread on a continental basis.
But your suggestion is an excellent one, and since Willis has the data in hand and with R it is very easy, perhaps he might compute the correlations between all of the datasets pairwise. The reason this is important (and Steve McIntyre can check me on this if he is still listening in) is the following. Suppose that all of the datasets BUT the ones that are obviously corrupted in the 20th century have a very consistent level of mutual correlation, one that is equally consistent with a weak hypothesis of correlation with, e.g., El Nino type phenomena (or some simple model of spatiotemporal correlation of weather on a decadal scale). Then the datasets that are in fact questionable will stand out like a sore thumb, with absurdly incorrect correlation properties, outvoted within the study some three to one. Add to that the fact that those sites/studies have other serious problems discussed even in the literature from which they were drawn, and you have the basis for a comment in the original journal, if not withdrawal of the paper. Better yet, you have the basis for a new paper that directly contradicts its conclusions (and supports the null hypothesis of "no observable warming") using a sound statistical analysis of the data, one that omits the sites known to be corrupted by human activity such as land use changes AND those that fail a clearly stated correlation criterion used to separate the climate signal (which is surely universal) from any land use signals that OVERWHELM the climate signal at selected sites.
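That pairwise check is only a few lines of R once the standardized records are aligned by year; the matrix and names below are placeholders, not the actual archived data.

# Pairwise correlations among the proxy records, using only the years
# where both members of a pair have data. 'proxies' is a year-by-record
# matrix of standardized log thicknesses (placeholder values here).
set.seed(5)
proxies <- matrix(rnorm(600 * 11), ncol = 11)
colnames(proxies) <- paste0("lake_", 1:11)             # illustrative names

r <- cor(proxies, use = "pairwise.complete.obs")
round(r, 2)

# Records that correlate strongly with each other but weakly with the
# rest (e.g. the 20th-century-contaminated ones) should stand out here.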
I fervently await the day when climate scientists actually learn real statistics, or stats grad students start to go into climate science. So far the best that I can say is that the papers purporting to present detailed statistical studies of complex multivariate data from proxies with multiple influences look like they have been written by rank amateurs, people who have literally taken no more than one or two stats courses and who are utterly clueless about the subject beyond that. Mann writing his own PCA code in FORTRAN for gosh sake. This study, where a mere glance at the data suggests that one had better resolve an obvious bimodal inconsistency before publishing a result that de facto selects one of the two modes as being representative of the climate and the other as not. I mean, one would think that a scientist would want to understand this before risking reputation and career on a premature publication.
Sadly, one of the most profound effects of the kind of gatekeeping and behind-the-scenes career-ruining activity revealed by the Climategate letters, where there really is an informal committee of sorts that will try to ruin your career and obstruct your papers (and that has had some success doing so), is that researchers in the field no doubt fear Mann more than they fear publishing a bad result. That is, it is literally safer, career-wise, to publish a paper that confirms a "warming signal", even if there are obvious, glaring inconsistencies in the data and no adequate explanation of those inconsistencies, than to do the right thing and pursue the inconsistencies at the expense of ending up with a result that shows no warming and fails to reject the null hypothesis.
What a disaster, if this part of the US actually shows no discernible 19th and 20th century warming, or a barely resolvable 0.3 to 0.5 C warming from the LIA on (cherrypicking the interval where we KNOW global warming to have occurred, just not anthropogenic global warming).
rgb
rgb@duke said:
“I fervently await the day when climate scientists actually learn real statistics, or stats grad students start to go into climate science. ”
Nice one. That would sting a little if Mann & Co. were bright enough to understand your comment.