Standard Deviation, The Overlooked But Essential Climate Statistic

Guest essay by Dr Tim Ball

Never try to walk across a river just because it has an average depth of four feet. Milton Friedman

“Statistics: The only science that enables different experts using the same figures to draw different conclusions.” Evan Esar

I am not a statistician. I took university level statistics because I knew, as a climatologist, I needed to know enough to ask statisticians the right questions and understand the answers. I was mindful of what the Wegman Committee later identified as a failure of those working on the Intergovernmental Panel on Climate Change (IPCC) paleoclimate reconstructions.

It is important to note the isolation of the paleoclimate community; even though they rely heavily on statistical methods they do not seem to be interacting with the statistical community.

Apparently they knew their use and abuse of statistics and statistical methods would not bear examination. It was true of the “hockey stick”, an example of misuse and creation of ‘unique’ statistical techniques to predetermine the result. Unfortunately this is an inherent danger in statistics. A statistics professor told me that the more sophisticated the statistical technique, the weaker the data. Anything beyond basic statistical techniques was ‘mining’ the data and moving further from reality and reasonable analysis. This is inevitable in climatology because of inadequate data. As the US National Research Council Report of Feb 3, 1999 noted,

“Deficiencies in the accuracy, quality and continuity of the records place serious limitations on the confidence that can be placed in the research results.”

Methods in Climatology by Victor Conrad is a classic text that identified most of the fundamental issues in climate analysis. Its strength is its recognition that the amount and quality of the data are critical, a theme central to Hubert Lamb’s establishment of the Climatic Research Unit (CRU). In my opinion, statistics as applied in climate has advanced very little since. True, we now have other techniques, such as spectral analysis, but all of those techniques are meaningless if you don’t accept that cycles exist or lack records of adequate quality and length.

Ironically, some techniques, such as moving averages, remove data. Ice core records are a good example. The Antarctic ice core graphs, first presented in the 1990s, illustrate statistician William Briggs’ admonition:

Now I’m going to tell you the great truth of time series analysis. Ready? Unless the data is measured with error, you never, ever, for no reason, under no threat, SMOOTH the series! And if for some bizarre reason you do smooth it, you absolutely on pain of death do NOT use the smoothed series as input for other analyses! If the data is measured with error, you might attempt to model it (which means smooth it) in an attempt to estimate the measurement error, but even in these rare cases you have to have an outside (the learned word is “exogenous”) estimate of that error, that is, one not based on your current data. (His bold)

A 70-year moving average was applied to the Antarctic ice core records. It eliminates a large amount of what Briggs calls “real data”, replacing it with the “fictional data” created by the smoothing. The smoothing also diminishes a major component of basic statistics: the standard deviation of the raw data. This is partly why standard deviation received little attention in climate studies, even though it is a crucial factor in the impact of weather and climate on flora and fauna. The focus on averages and trends was also responsible. More important from a scientific perspective is its importance for determining mechanisms.
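Briggs’ point about smoothing destroying variability is easy to demonstrate. The sketch below is my illustration, not from the original papers: the series is synthetic random data, not actual ice core measurements. It applies a 70-year moving average to a noisy annual series and compares the standard deviations before and after.

```python
import random
import statistics

random.seed(42)

# Synthetic stand-in for an annual record: a flat climate plus
# year-to-year "weather" noise (mean 0, standard deviation 1).
raw = [random.gauss(0.0, 1.0) for _ in range(500)]

def moving_average(series, window):
    """Simple moving average; one value per full window."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# The 70-year window described above for the ice core records.
smoothed = moving_average(raw, 70)

sd_raw = statistics.pstdev(raw)
sd_smoothed = statistics.pstdev(smoothed)

print(f"std dev of raw series:         {sd_raw:.3f}")
print(f"std dev after 70-yr smoothing: {sd_smoothed:.3f}")
# The smoothed series retains only a small fraction of the raw
# variability: precisely the information a standard-deviation
# analysis would need.
```

Most of the raw variability, the standard deviation argued here to be crucial, is simply gone from the smoothed series.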


Figure 1: (Partial original caption) Reconstructed CO2 concentrations for the time interval ca. 8700 to ca. 6800 calendar years B.P., based on CO2 extracted from air in Antarctic ice of Taylor Dome (left curve; ref. 2; raw data available via www.ngdc.noaa.gov/paleo/taylor/taylor.html) and SI data for fossil B. pendula and B. pubescens from Lake Lille Gribso, Denmark. The arrows indicate accelerator mass spectrometry 14C chronologies used for temporal control. The shaded time interval corresponds to the 8.2-ka-B.P. cooling event.

Source: Proc. Natl. Acad. Sci. USA, September 17, 2002; 99(19): 12011-12014.

Figure 1 shows a determination of atmospheric CO2 levels for a 2000-year span comparing data from a smoothed ice core (left) and stomata (right). Regardless of the efficacy of each method of data extraction, it is not hard to determine which plot is likely to yield the most information about mechanisms. Where is the 8.2-ka-BP cooling event in the ice core curve?

At the beginning of the 20th century, statistics was applied to society. Universities, previously divided into the Natural Sciences and Humanities, saw a new and ultimately larger division emerge, the Social Sciences. Many in the Natural Sciences view Social Science as an oxymoron and not a ‘real’ science. In order to justify the name, social scientists began to apply statistics to their research. A book titled “Statistical Package for the Social Sciences” (SPSS) first appeared in 1970 and became the handbook for students and researchers. Plug in some numbers and the program provides results. The suitability of the data, such as the difference between continuous and discrete numbers, and of the technique was little known or ignored, yet it affected the results.

Most people know Disraeli’s comment, “There are three kinds of lies: lies, damn lies and statistics”, but few understand how the application of statistics affects their lives. Beyond inaccurate application of statistics is the elimination of anything beyond one standard deviation, which removes the dynamism of society. McDonald’s typifies the application of statistics: they have perfected mediocrity. We sense it when everything sort of fits everyone, but doesn’t exactly fit anyone.

Statistics in Climate

Climate is an average of the weather over time or in a region, and until the 1960s averages were effectively the only statistic developed. The ancient Greeks used average conditions to identify three global climate regions, the Torrid, Temperate, and Frigid Zones, created by the angle of the sun. Climate research involved calculating and publishing average conditions at individual stations or in regions. Few understand how meaningless a measure it is, although Robert Heinlein implied it when he wrote, “Climate is what you expect, weather is what you get”. Mark Twain also appeared aware of it with his remark that “Climate lasts all the time, and weather only a few days.” A farmer once asked me about the chances of an average summer. He was annoyed with the answer “virtually zero” because he didn’t understand that ‘average’ is a statistic. A more informed question is whether the summer will be above or below average, but that requires knowledge of two other basic statistics, the variation and the trend.
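The farmer’s question can be made concrete with a short sketch. The numbers below are hypothetical, an invented 30-summer record rather than real station data, but they show why an exactly “average” summer almost never occurs, and what the variation and trend statistics look like.

```python
import random
import statistics

random.seed(1)

# Hypothetical 30-summer record of mean temperature (deg C):
# a long-term average near 18, year-to-year noise, and a slight trend.
summers = [18.0 + 0.02 * yr + random.gauss(0.0, 1.5) for yr in range(30)]

avg = statistics.mean(summers)
sd = statistics.stdev(summers)

# How many summers were "average" to within 0.1 deg C?
near_average = sum(1 for t in summers if abs(t - avg) <= 0.1)
print(f"average {avg:.2f} C, std dev {sd:.2f} C")
print(f"summers within 0.1 C of average: {near_average} of 30")

# Least-squares slope: the "trend", the other statistic the informed
# question requires.
n = len(summers)
xbar = (n - 1) / 2
slope = (sum((i - xbar) * (t - avg) for i, t in enumerate(summers))
         / sum((i - xbar) ** 2 for i in range(n)))
print(f"trend: {slope:+.3f} C per year")
```

Almost every summer misses the average by more than a rounding error; the variation (standard deviation) and the trend (slope) carry the information the average alone cannot.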

After WWII, the demand for predictions for planning and social engineering in postwar societies triggered the development of simple trend analysis. It assumed that once a trend started it would continue. The mentality persists despite evidence of downturns or upturns; in climate it seems to be part of the rejection of cycles.

The study of trends in climate essentially began in the 1970s with the prediction of a coming mini ice age as temperatures declined from 1940. When temperatures increased in the mid-1980s, they said this new trend would continue unabated. Political users of climate adopted what I called the “trend wagon”. The IPCC made the trend inevitable by saying human CO2 was the cause and it would continue to increase as long as industrial development continued. Like all previous trends, it did not last, as temperatures trended down after 1998.

For year-to-year living and business the variability is very important. Farmers know you don’t plan next year’s operation on last year’s weather, but reduced variability reduces risk considerably. The most recent change in variability is normal and explained by known mechanisms but exploited as abnormal by those with a political agenda.

John Holdren, Obama’s science tsar, used the authority of the White House to exploit increased variation of the weather and a mechanism little known to most scientists, let alone the public: the circumpolar vortex. He created an inaccurate propaganda release about the Polar Vortex to imply it was something new rather than natural, and therefore due to humans. Two of the three Greek climate zones are very stable, the Tropics and the Polar regions. The Temperate zone has the greatest short-term variability because of seasonal variations. It also has longer-term variability as the Circumpolar Vortex cycles through Zonal and Meridional patterns. The latter creates increased variation in weather statistics, as has occurred recently.

IPCC studies and prediction failures were inevitable because they lack data, manufacture data, lack knowledge of mechanisms, and exclude known mechanisms. Reduction or elimination of the standard deviation leads to loss of information and further distortion of the natural variability of weather and climate, both of which continue to occur within historic and natural norms.

Brent Walker
June 15, 2014 6:23 pm

I am just a humble actuary. Although the variance of a distribution is important I like to look much further. I often describe distributions in terms of their skewness and kurtosis. These statistics help me understand what I am working with. Unfortunately no-one seems to want to use the higher levels of understanding anymore. It seems to be all about the quick media bite.
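For readers unfamiliar with these higher moments, here is a minimal Python sketch (an editorial illustration with invented data, not the commenter’s) showing how two distributions with the same mean-scale variance can differ sharply in skewness and kurtosis:

```python
import random
import statistics

random.seed(7)

def skewness(xs):
    m, s = statistics.mean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    m, s = statistics.mean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3.0

n = 100_000
normal = [random.gauss(0.0, 1.0) for _ in range(n)]
skewed = [random.expovariate(1.0) for _ in range(n)]  # heavy right tail

# Both distributions have variance 1, yet their shapes differ sharply.
print(f"normal:      skew {skewness(normal):+.2f}, "
      f"excess kurtosis {excess_kurtosis(normal):+.2f}")
print(f"exponential: skew {skewness(skewed):+.2f}, "
      f"excess kurtosis {excess_kurtosis(skewed):+.2f}")
# The exponential's skew lands near its theoretical value of 2 and its
# excess kurtosis near 6; the normal's are both near 0.
```

Variance alone would not distinguish the tails of these two distributions, which is the commenter’s point about quick media bites versus real understanding.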

kadaka (KD Knoebel)
June 15, 2014 6:40 pm

From RoHa on June 15, 2014 at 5:25 pm:

“Universities previously divided into the Natural Sciences and Humanities, saw a new and ultimately larger division emerge, the Social Sciences.”
This sentence violates the “no comma after subject clause” rule. It is not a difficult rule, so I do not understand why I see it violated so often.

Here it gathers up the subject to avoid confusion. Without the comma it may be read (with an implied comma) as “Universities previously divided into the Natural Sciences, and Humanities saw a new and ultimately larger division emerge, the Social Sciences.”
Social Sciences emerging from Humanities (apparently) makes sense, the Natural Sciences are divided at universities, etc. Using the comma to gather up the “they” improves readability.
Your suggested use of a comma after “Universities”, could also work.
Note the comma after subject, is good for dramatic effect, making for a “pause” that brings the written words closer to how the writer, would have spoken them.

mebbe
June 15, 2014 7:14 pm

profitup10 says:
June 15, 2014 at 5:46 pm
English majors need a forum . . who cares about use of commas in a informal discussion? Find some real issues and join the discussion.
——————————————————————
Perhaps, people that don’t want to waste their time re-reading some sloppy prose on the off-chance that there was something worthwhile to be read.

RoHa
June 15, 2014 7:21 pm

“Without the comma it may be read (with an implied comma) as “Universities previously divided into the Natural Sciences, and Humanities saw a new and ultimately larger division emerge, the Social Sciences.”
Since “divided into” leads the reader to predict “A and B”, “divided into the Natural Sciences, ” (A alone) would be a perverse reading. Placing a comma after a subject clause is confusing, since it implies (incorrectly) that part of the preceding clause is a subordinate clause.
And, profitup10, as the story about the panda shows, misuse of commas can be an obstacle to even informal communication.

June 15, 2014 7:39 pm

not possible unless we know, within a great degree of accuracy
The accuracy is insufficient and the computers you use do not have long enough words. And I don’t care about your accuracy nor about the length of your computer words. In a chaotic system it is never enough. If nothing else quantum fluctuations will get you. You can’t measure it close enough. Ever. Fundamentally.

June 15, 2014 7:46 pm

Jeff L says:
June 15, 2014 at 5:51 pm

It tends to depend on the environment they live in. In general the right are country folk and the left are city folk.
Behavioral Sink Behavior And Thermodynamics
A thermodynamic explanation of politics

kadaka (KD Knoebel)
June 15, 2014 8:08 pm

From RoHa on June 15, 2014 at 7:21 pm:

Since “divided into” leads the reader to predict “A and B”, “divided into the Natural Sciences, ” (A alone) would be a perverse reading.

Bacteria previously divided into the numerous flora, and Climatosis saw a new and ultimately larger division emerge, the Grantia Suckus.
Bacteria previously divided into the numerous flora and Climatosis saw a new and ultimately larger division emerge, the Grantia Suckus.

And where is this predicting of A and B?

June 15, 2014 8:31 pm

Quinx says:
June 15, 2014 at 2:04 pm
A lot of people do
+++++++++++++++++++++++
I agree with you. Averages often leave out critical information. Averages of averages get worse. The average day doesn’t tell you about the high and low trends. Average the day into months, the months into years, the years into decades and make a trend. But the trend can be deceiving if you don’t know how it was made. Is it an average of averages, or is it a true mean? I assume we are still grading on a bell curve. It would be interesting to see how the degree of difficulty has trended versus the bell curve. Just kidding. Although I did have the opportunity to train a number of university graduates in my career since their academic training …
I love claims that a fraction of a degree change on average is disrupting the environment … an environment with a diurnal variation of 20 degrees C or more.
But in reality, there may be a local impact of 5 degrees C up or down in regional areas resulting in an “average” of a fraction of a degree. It cuts both ways and we don’t know what the “averaging” algorithm has done.
Lots to learn.
Happy Fathers day.
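The commenter’s point that “averages of averages get worse” can be shown in a few lines. The example below is deliberately artificial (two months of constant temperatures), but it isolates the effect: averaging the monthly averages weights a short month the same as a long one, so it disagrees with the true mean of the daily values.

```python
# Two months of hypothetical daily temperatures (deg C): a short cold
# month and a longer warm month.
feb = [0.0] * 28
mar = [10.0] * 31

monthly_averages = [sum(feb) / len(feb), sum(mar) / len(mar)]
avg_of_avgs = sum(monthly_averages) / len(monthly_averages)

all_days = feb + mar
true_mean = sum(all_days) / len(all_days)

print(f"average of monthly averages: {avg_of_avgs:.3f}")  # 5.000
print(f"true mean of daily values:   {true_mean:.3f}")    # 5.254
# The average of averages weights 28 cold days the same as 31 warm
# days, so it understates the true mean.
```

Whether a published trend was built from true means or from averages of averages is exactly the kind of detail the comment says we rarely get to see.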

ossqss
June 15, 2014 8:42 pm

Latitude says:
June 15, 2014 at 3:33 pm
Mosh…seems he was making a joke along the lines of “torturing the data”…
sound familiar?
——————————————————–
Ha! nice……
Live the “dash” as they say.
The data can be tortuous in the end…………..
Regards, Ed

richard verney
June 15, 2014 8:51 pm

Steven Mosher says:
June 15, 2014 at 2:29 pm
/////////////////////////
We all know what Dr Tim Ball was seeking to convey by the statement quoted. It is in effect part of the ‘truism’ observed by Lord Rutherford, namely:
“If your experiment needs statistics, you ought to have done a better experiment.”
If the signal in the data is significant, you do not need some fancy statistical modulation to identify it. The data can stand on its own. If it can’t stand on its own, chances are that you are merely looking at noise.

Greg Goodman
June 15, 2014 9:01 pm

An interesting article, however, one phrase seemed odd.
” A more informed question is whether it will be above or below average, but that requires knowledge of two other basic statistics, the variation and the trend.”
It is very common to see statements like “the trend is” , presented as if it is a fundamental fact of the data and without any recognition of the fact that this is ASSUMING a linear model is applicable to the data and has some predictive value.
Another way of looking at the “trend” is as the average rate of change, so it could be called the “expectation value” (the statistician’s term for the mean) of the rate of change. But what about higher orders: what is the acceleration? Is it speeding up or slowing down? And why are we now fitting higher order polynomial models to the data ? Is that suitable or would some other model ( perhaps periodic ) be more appropriate? Is that implicit choice or ASSUMPTION even being recognised.
Rate of change may be relevant to auto-regressive data like temperature, for example. There is not much sense in taking the “expectation value” of the last 150 years of SST and using it as a “best estimate” for next year’s SST. Just taking last year’s value would be a lot better. Adding the average annual change to last year’s SST may be better still (i.e. using the “trend” rather than the mean).
That may seem like common sense, but we are already applying some physical knowledge of the system to know that this will be better than just taking the mean as our guess for next year. A model is being applied to the data.
How often are we presented with statements like “if the current trend continues, by the year 2100 …. bhah….blah.” with the implication that this “if” somehow makes sense, that it is at least a likely outcome.
Without presenting some reason for the choice this has no more validity than “if the current average persists, by the year 2100, it will be the same as today”.
The other problem is extrapolation beyond the data. Any reasonable model may give a good estimate for next year. If you project ten years there’s a fair chance you will be badly off (even if you spend 30 years and billions of dollars making your model). If you project a complex system 100 years hence based on at most 150 years of very poor quality data, you have as much chance as guessing next week’s lottery numbers.
The exercise is meaningless. No honest scientist would make such a projection.
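The persistence-versus-mean point above can be tested on synthetic data. The sketch below is an editorial illustration: an invented AR(1) series standing in for SST, with assumed (not fitted) coefficients. It scores two one-step forecasts, the long-run mean versus last year’s value.

```python
import random
import statistics

random.seed(3)

# Invented AR(1) series standing in for an annual SST-like record:
# each year is 0.9 of last year plus fresh noise (coefficients assumed).
n = 150
x = [0.0]
for _ in range(n - 1):
    x.append(0.9 * x[-1] + random.gauss(0.0, 0.3))

# Score two one-step-ahead forecasts over the second half of the record.
mae_mean = statistics.mean(
    abs(x[t] - statistics.mean(x[:t])) for t in range(75, n))
mae_persistence = statistics.mean(
    abs(x[t] - x[t - 1]) for t in range(75, n))

print(f"forecast = mean of record so far: MAE {mae_mean:.3f}")
print(f"forecast = last year's value:     MAE {mae_persistence:.3f}")
# For autocorrelated data, persistence beats the long-run mean.
```

Persistence wins by a wide margin here, which is exactly the comment’s point: the “best estimate” depends on an implicit model choice that is rarely acknowledged.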

John F. Hultquist
June 15, 2014 9:06 pm

Seems like a genuinely fine essay. However, the little bit about “Statistics in Climate” jumps from the Greeks to Mark Twain. This ignores the interesting work of Wladimir Köppen and Rudolf Geiger that began with the concept that native vegetation is the best expression of climate. This was first published in 1884 and temperature records are easier to obtain than the field work necessary to find the boundaries of biomes (ecotone not in use then). Think of vegetation as integrators of weather.
The modern “climate-is-average-temperature” is a hoax.

John F. Hultquist
June 15, 2014 9:19 pm

Greg Goodman says:
June 15, 2014 at 9:01 pm
It is very common to see statements like “the trend is”, presented as if it is a fundamental fact of the data and without any recognition of the fact that this is ASSUMING a linear model is applicable to the data and has some predictive value.

In the WSJ for June 14-15 (p. C6) there is a review [by Mario Livio] of a book you might like:
“How Not to Be Wrong”
by Jordan Ellenberg
Here is a quote from the review: “But as Mr. Ellenberg quickly points out, the relation between prosperity and social services could very well be non-linear.” Then an inverted U form is mentioned.

ferdberple
June 15, 2014 9:33 pm

Steven Mosher says:
June 15, 2014 at 2:29 pm
“A statistics professor told me that the more sophisticated the statistical technique, the weaker the data. ”
false.
The data and its quality are separate and distinct from the method.
=============
I was also troubled by Dr Ball’s comment in this specific example. I believe the correct answer is:
“A statistics professor told me that the more sophisticated the statistical technique, the weaker the RESULT.”
Perhaps Dr Ball could review this point for us. Thanks,

ferdberple
June 15, 2014 9:40 pm

or alternatively:
“A statistics professor told me that the more sophisticated the statistical technique, the weaker the CONCLUSION.”
I expect Dr Ball’s original quote was also correct, but not for the obvious reason. Rather, what his professor meant was that when someone needs to use sophisticated statistics, it is because their underlying data is weak. If you have good data there is no need for sophisticated statistics; the simplest of statistics will suffice with good data.

ferdberple
June 15, 2014 9:42 pm

Steven Mosher says:
June 15, 2014 at 2:29 pm
The data and its quality are separate and distinct from the method.
======
thinking more on this question, I believe you are incorrect. Good data will reveal itself with simple methods, while bad data requires sophisticated methods to extract the signal from the noise.

george e. smith
June 15, 2014 9:47 pm

“””””…..Never try to walk across a river just because it has an average depth of four feet. Martin Friedman
“Statistics: The only science that enables different experts using the same figures to draw different conclusions.“ Evan Esar……””””””
Well I wouldn’t try to walk across a river that had an average depth (at that location) of two feet; I wouldn’t even step into a river that was 4 feet deep, right where I was (not) going to step. A lake maybe; but not a river.
Standard deviation wouldn’t help much; you only get one experiment, in the case where your first try is six sigma out on the deep end.
As for the second quip, by Evan Esar, whoever he is; the problem is in drawing ANY conclusions, from the results of a statistical calculation. The statistics is about numbers (figures), that you already know. Doesn’t tell you anything about any numbers you DON’T already know.

ferdberple
June 15, 2014 9:52 pm

The modern “climate-is-average-temperature” is a hoax.
==================
With one foot in the freezer and one foot in the oven you are on average comfortable. climatic science 101. everything else is dialogue.

June 15, 2014 9:54 pm

Seems to me that a cabal of ‘climatologists’, not a big number, decided to work the numbers to create a funding pipeline and a bit of power … Yeah, those dumbass politicians and bureaucrats were easily convinced to satisfy the craving. Seems also to me that a couple of astute politicians came upon this and recognised that they could gain political sway and riches by scaring the bejeezus out of the weak-of-mind citizens, and then a few ‘industrialists’ got together with a few socialists to hold sway over the errant ‘climatologists’ and the politicians. The result is now the Great Climate Scam … ‘data’ and ‘statistics’ are just tools for manipulation.

Reply to  Streetcred
June 16, 2014 6:10 am

Good points and in my opinion, a bullseye.

george e. smith
June 15, 2014 9:59 pm

“””””……profitup10 says:
June 15, 2014 at 5:46 pm
English majors need a forum . . who cares about use of commas in a informal discussion? Find some real issues and join the discussion…….””””””
Well Dr. Richard Lederer; the world’s leading authority on the English language, says to put a comma anywhere you would “pause” in normal speech. (Most people do have to pause for breath reasons).
Yes he can parse anything you can write; but he says that “language is for communicating.”
I put them wherever I darn well please.

Reply to  george e. smith
June 16, 2014 6:07 am

Ys und my bt is taht lla can rd and comperehend tis setnece? We communicate in many ways, while some prefer long detailed narratives. Most people prefer getting to the point and make it short not flowery.

ferdberple
June 15, 2014 10:03 pm

I often describe distributions in terms of their skewness and kurtosis.
=========
baby steps. start with standard deviation and the normal distribution. then show how this is not the real world. rather, how we can sample the real world to arrive at the normal distribution, and thereby simplify the math to analyze the results in terms of the standard deviation.
while your data may not be normal, you can sample it in such a way that the sample is normal. and thus can be analyzed by standard methods. or you can skip this step, and analyze non-normal data as though it was normal, and the result is almost certainly misleading. this is what we affectionately call climate science 101.
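The sampling idea described here is the central limit theorem in action. The sketch below is an editorial illustration with invented exponential data: the raw data is heavily skewed, but the means of repeated samples are far closer to normal.

```python
import random
import statistics

random.seed(11)

def skewness(xs):
    m, s = statistics.mean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# "Real world" data that is decidedly non-normal: exponential,
# with a theoretical skew of 2.
population = [random.expovariate(1.0) for _ in range(100_000)]

# Means of repeated samples of 50 are much closer to normal (CLT).
sample_means = [statistics.mean(random.sample(population, 50))
                for _ in range(2_000)]

print(f"skew of raw data:     {skewness(population):+.2f}")
print(f"skew of sample means: {skewness(sample_means):+.2f}")
# Analyzing the raw data as if it were normal would mislead; the
# sampling distribution of the mean is far better behaved.
```

Skipping this step and treating non-normal data as normal, as the comment notes, is where standard-deviation-based analysis goes quietly wrong.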

george e. smith
June 15, 2014 10:09 pm

When I went to school, every single exam, regardless of subject, was also an English test. If you couldn’t write reasonably correct English, you were docked marks, even if the answers related to the subject, were quite correct.
If you are content to be around nitwits, who, like, can’t even write a sensible sentence, I know plenty of places to go and seek their company.
WUWT is not such a place.

ferdberple
June 15, 2014 10:25 pm

She is being blocked at every turn.
Don’t think for one instance that Australia is an isolated case by these fraudsters.
=======
the adjacent pair adjustment is biased to perceive short term trend as artificial while ignoring long term trends. this assumes that any long term trend (logging, agriculture, urbanization) is natural while any short term trend (volcanoes) is caused by humans.
the adjacent pair adjustment to US temperatures is then used to reduce observed temperatures so that they match climate models, proving that US models are “correct”. the definition of correctness lies in the ability of reality to match model prediction. Any model that states otherwise must be noise and thereby must be corrected.

kadaka (KD Knoebel)
June 15, 2014 10:32 pm

Statistics convinces us with great certainty by comparing similar data that we eat with and breathe through our short muzzles.

BioBob
June 15, 2014 11:20 pm

kadaka (KD Knoebel) says:
June 15, 2014 at 10:32 pm
Statistics convinces us with greatestimated certainty by comparing similar data that we eat with and breathe through our short muzzles.
There…I fixed it. 8>P