Guest essay by Dr Tim Ball
“Never try to walk across a river just because it has an average depth of four feet.” Milton Friedman
“Statistics: The only science that enables different experts using the same figures to draw different conclusions.” Evan Esar
I am not a statistician. I took university level statistics because I knew, as a climatologist, I needed to know enough to ask statisticians the right questions and understand the answers. I was mindful of what the Wegman Committee later identified as a failure of those working on the Intergovernmental Panel on Climate Change (IPCC) paleoclimate reconstructions.
“It is important to note the isolation of the paleoclimate community; even though they rely heavily on statistical methods they do not seem to be interacting with the statistical community.”
Apparently they knew their use and abuse of statistics and statistical methods would not bear examination. This was true of the “hockey stick”, an example of the misuse and creation of ‘unique’ statistical techniques to predetermine the result. Unfortunately, this is an inherent danger in statistics. A statistics professor told me that the more sophisticated the statistical technique, the weaker the data. In his view, anything beyond basic statistical techniques was ‘mining’ the data and moving further from reality and reasonable analysis. This is inevitable in climatology because of inadequate data. As the US National Research Council Report of Feb 3, 1999 noted,
“Deficiencies in the accuracy, quality and continuity of the records place serious limitations on the confidence that can be placed in the research results.”
Methods in Climatology by Victor Conrad is a classic text that identified most of the fundamental issues in climate analysis. Its strength is its recognition that the amount and quality of the data are critical, a theme central to Hubert Lamb’s establishing the Climatic Research Unit (CRU). In my opinion, statistics as applied in climate has advanced very little since. True, we now have other techniques such as spectral analysis, but all those techniques are meaningless if you don’t accept that cycles exist or don’t have records of adequate quality and length.
Ironically, some techniques, such as moving averages, remove data. Ice core records are a good example. The Antarctic ice core graphs, first presented in the 1990s, illustrate statistician William Briggs’ admonition:
Now I’m going to tell you the great truth of time series analysis. Ready? Unless the data is measured with error, you never, ever, for no reason, under no threat, SMOOTH the series! And if for some bizarre reason you do smooth it, you absolutely on pain of death do NOT use the smoothed series as input for other analyses! If the data is measured with error, you might attempt to model it (which means smooth it) in an attempt to estimate the measurement error, but even in these rare cases you have to have an outside (the learned word is “exogenous”) estimate of that error, that is, one not based on your current data. (His bold)
A 70-year moving average was applied to the Antarctic ice core records. It eliminates a large amount of what Briggs calls “real data”, replacing it with “fictional data” created by the smoothing. The smoothing also diminishes a basic statistic, the standard deviation of the raw data. This is partly why standard deviation has received little attention in climate studies, even though it is a crucial factor in the impact of weather and climate on flora and fauna; the focus on averages and trends was also responsible. More important from a scientific perspective is its role in determining mechanisms.
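The effect is easy to demonstrate numerically. The following minimal Python sketch uses a made-up 2,000-year annual series (a slow cycle plus year-to-year variability, not actual ice core data) and applies a 70-year moving average: the slow cycle survives, but the year-to-year variability, and with it most of the standard deviation, is gone.

import numpy as np

# Illustrative only: a synthetic 2000-"year" annual series, a slow 500-year
# cycle plus year-to-year variability (not real ice core data).
rng = np.random.default_rng(0)
years = np.arange(2000)
raw = np.sin(2 * np.pi * years / 500) + rng.normal(0.0, 1.0, size=years.size)

# A 70-year centred moving average, analogous to the smoothing described above.
window = 70
smoothed = np.convolve(raw, np.ones(window) / window, mode="valid")

print("standard deviation of raw series:     ", round(float(raw.std()), 2))
print("standard deviation of smoothed series:", round(float(smoothed.std()), 2))
# The smoothed curve keeps the slow cycle, but the year-to-year variability
# (Briggs' "real data") and its contribution to the standard deviation are gone.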
Figure 1: (Partial original caption) Reconstructed CO2 concentrations for the time interval ca. 8700 to ca. 6800 calendar years B.P., based on CO2 extracted from air in Antarctic ice of Taylor Dome (left curve; ref. 2; raw data available via www.ngdc.noaa.gov/paleo/taylor/taylor.html) and SI data for fossil B. pendula and B. pubescens from Lake Lille Gribso, Denmark. The arrows indicate accelerator mass spectrometry 14C chronologies used for temporal control. The shaded time interval corresponds to the 8.2-ka-B.P. cooling event.
Source: Proc. Natl. Acad. Sci. USA, September 17, 2002; 99(19): 12011–12014.
Figure 1 shows a determination of atmospheric CO2 levels for a 2000-year span comparing data from a smoothed ice core (left) and stomata (right). Regardless of the efficacy of each method of data extraction, it is not hard to determine which plot is likely to yield the most information about mechanisms. Where is the 8.2-ka-BP cooling event in the ice core curve?
At the beginning of the 20th century statistics was applied to society. Universities, previously divided into the Natural Sciences and the Humanities, saw a new and ultimately larger division emerge: the Social Sciences. Many in the Natural Sciences view Social Science as an oxymoron and not a ‘real’ science. In order to justify the name, social scientists began to apply statistics to their research. A book titled “Statistical Package for the Social Sciences” (SPSS) first appeared in 1970 and became the handbook for students and researchers. Plug in some numbers and the program provides results. The suitability of the data, such as the difference between continuous and discrete numbers, and of the technique was little understood or ignored, yet it affected the results.
Most people know Disraeli’s comment, “There are three kinds of lies: lies, damned lies and statistics”, but few understand how the application of statistics affects their lives. Beyond inaccurate application of statistics is the elimination of anything beyond one standard deviation, which removes the dynamism of society. McDonald’s typifies the application of statistics – they have perfected mediocrity. We sense it when everything sort of fits everyone, but doesn’t exactly fit anyone.
Statistics in Climate
Climate is the average of the weather over time or in a region, and until the 1960s averages were effectively the only statistic used. The ancient Greeks used average conditions to identify three global climate regions, the Torrid, Temperate, and Frigid Zones, created by the angle of the sun. Climate research involved calculating and publishing average conditions at individual stations or in regions. Few understand how meaningless a measure the average is, although Robert Heinlein implied it when he wrote, “Climate is what you expect, weather is what you get”. Mark Twain also appears to have been aware of it with his remark that, “Climate lasts all the time, and weather only a few days.” A farmer asked me about the chances of an average summer. He was annoyed with the answer “virtually zero” because he didn’t understand that ‘average’ is a statistic. A more informed question is whether the summer will be above or below average, but that requires knowledge of two other basic statistics, the variation and the trend.
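To separate those three numbers, here is a minimal Python sketch using invented summer temperatures (the values are hypothetical, chosen only to illustrate the point): the average, the year-to-year variation, and the trend answer different questions, and no single summer in the series actually equals the average.

import numpy as np

# Hypothetical summer mean temperatures (deg C) for ten consecutive years;
# invented numbers, used only to separate "average", "variation" and "trend".
summers = np.array([17.2, 19.1, 16.4, 18.8, 17.9, 20.3, 16.9, 18.1, 19.6, 17.5])

average = summers.mean()                                    # the single "climate" number
variation = summers.std(ddof=1)                             # the spread a farmer lives with
trend = np.polyfit(np.arange(summers.size), summers, 1)[0]  # deg C per year

print(f"average   {average:.2f} C")
print(f"std dev   {variation:.2f} C")
print(f"trend     {trend:+.3f} C/yr")
# None of the ten summers equals the 18.18 C average, which is why the
# honest answer to "what are the chances of an average summer?" is near zero.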
After WWII, the demand for predictions for planning and social engineering in postwar societies triggered the development of simple trend analysis. It assumed that once a trend started it would continue. The mentality persists despite evidence of downturns or upturns; in climate it seems to be part of the rejection of cycles.
Study of trends in climate essentially began in the 1970s with the prediction of a coming mini ice age as temperatures declined from 1940. When temperatures increased in the mid-1980s, the claim became that this new trend would continue unabated. Political users of climate adopted what I call the ‘trend wagon’. The IPCC made the trend seem inevitable by saying human CO2 was the cause and would continue to increase as long as industrial development continued. Like all previous trends, it did not last, as temperatures trended down after 1998.
For year-to-year living and business the variability is very important. Farmers know you don’t plan next year’s operation on last year’s weather, but reduced variability reduces risk considerably. The most recent change in variability is normal and explained by known mechanisms but exploited as abnormal by those with a political agenda.
John Holdren, Obama’s science Tsar, used the authority of the White House to exploit increased variation in the weather and a mechanism little known to most scientists, let alone the public: the circumpolar vortex. He created an inaccurate propaganda release about the Polar Vortex to imply it was something new and unnatural, and therefore due to humans. Two of the three Greek climate zones, the Tropics and the Polar regions, are very stable. The Temperate zone has the greatest short-term variability because of seasonal variations. It also has longer-term variability as the Circumpolar Vortex cycles between Zonal and Meridional patterns. The latter creates increased variation in weather statistics, as has occurred recently.
IPCC studies and prediction failures were inevitable because they lack data, manufacture data, lack knowledge of mechanisms, and exclude known mechanisms. Reduction or elimination of the standard deviation leads to loss of information and further distortion of the natural variability of weather and climate, both of which continue to occur within historic and natural norms.
Forgot to mention the 9 (count ’em) investigations into the so-called Climategate emails and how the skeptical crowd came out on top in, erm, none of them.
To cement her position as a gullible lemming, Margaret cites so-called ‘investigations’, in which no hostile witnesses were called — and in which Michael Mann was allowed to confer with the committee beforehand in order to decide what questions to ask, and what not to ask!
That is the ultimate appeal to a corrupt authority. It is like the police investigating a robbery, but refusing to question eyewitnesses. Only a fool would label those so-called ‘investigations’ as anything other than a whitewash.
[cut the personal insults. .mod]
[Margaret, you are welcome to dial back your hatred, and resubmit this comment, or not. Either way, you’re gonna blow a vein if you keep it up. – Anthony]
I will just speak of standard deviations and their like. Those, along with correlations and some other things, are second-order statistics. They only exist in a space of dimension greater than or equal to L2. You can always compute a sample “standard deviation”, “correlation”, etc., but that does not guarantee that the thing you are sampling lives in that L2 or higher space. Many things do, but there are some that simply do not. You have to estimate the dimension of the space using some other tools to be confident that the “standard deviation” will not diverge to infinity as your sample size increases. The Cotton Futures market was shown by Mandelbrot to probably live in around L1.8. [See Cootner, MIT Press, The Random Character of Stock Market Prices] Compute the sample statistic, but please be aware that it may well be that your anticipated 100-year flood occurs every 25 years or so.
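As a rough illustration of that warning (in Python, with a Pareto draw of tail index 1.5 standing in for heavy-tailed data of the kind Mandelbrot described; the numbers are synthetic, not the cotton futures series), the running sample standard deviation of a normal series settles down while that of the heavy-tailed series never does:

import numpy as np

# Illustrative only: compare running sample standard deviations for a
# light-tailed (normal) series and a heavy-tailed Pareto series with
# tail index 1.5, for which the theoretical variance is infinite.
rng = np.random.default_rng(42)
n = 200_000
series = {"normal": rng.normal(size=n), "heavy-tailed": rng.pareto(1.5, size=n)}

for label, x in series.items():
    for m in (1_000, 10_000, 100_000, n):
        print(f"{label:12s}  n = {m:7d}  sample std = {x[:m].std():8.2f}")
# The normal column stays near 1; the heavy-tailed column keeps changing
# (and tends to grow) because the sample statistic has no finite limit.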
@Robin Edwards: “Incidentally, the idea of someone doing statistical calculations using a slide rule strikes me as rather funny. Stats is a digital discipline. I used to use a mechanical calculator in the 1950s to sum squares and products, and very time consuming it was too. But I did get the right answer, which you wouldn’t with a slide rule!”
Statistical calculations are all precisely defined in a myriad of math textbooks. Mostly it is pretty much all simple arithmetic, so any four-year-old could do it. Well, you might have to coach them about square roots.
But a slide rule is simply a tool for doing arithmetic, and many can even do trigonometry.
A slide rule is based on logarithms and other book-tabulated data. The cognoscenti know how to do slide rule math to about 0.1% precision, if you have the right slide rule (K&E).
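For what it is worth, the log-table arithmetic behind a slide rule is easy to mimic. A small Python sketch (illustrative only, with each logarithm rounded to four figures as in 4-figure tables) shows a product landing well inside that 0.1% band:

import math

# Illustrative: slide-rule-style multiplication via base-10 logarithms,
# with each log rounded to four figures as in 4-figure log tables.
def log_multiply(x, y, figures=4):
    log_sum = round(math.log10(x), figures) + round(math.log10(y), figures)
    return 10 ** log_sum

exact = 3.1416 * 2.7183
approx = log_multiply(3.1416, 2.7183)
print(f"exact {exact:.5f}  via logs {approx:.5f}  "
      f"relative error {abs(approx - exact) / exact:.4%}")
# Four-figure logs keep the error comfortably below 0.1 percent.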
So when I first started actually doing lens design (seriously), computation was very expensive, with little in the way of mechanical assistance.
So you didn’t do any calculations you didn’t need to do.
So early researchers extensively studied the general theory of optical imaging, which resulted in the theory of the Seidel aberrations. As a result, it was quite possible to completely design a very good cemented achromatic doublet objective lens, suitable for world-class binoculars (7 x 50), by tracing just three rays in two different colors. The finished, manufacturable lens could be specified from just those three rays.
The ray tracing process was reduced to a cyclic spreadsheet-style routine, done using 4-figure log tables: logs of numbers and logs of trig functions.
Any good slide rule could do the same design, to three digits.
Nowadays, people who don’t know a thing about the Seidel aberrations call themselves lens designers. They are mostly mechanical packaging engineers who understand that some construction materials actually transmit “light”.
So you use a computer not to do what was done years ago by slide rule or pen and paper plus log tables, but simply to trace a lot of rays very cheaply.
I can do that too. If I’m designing something like an LED light “bulb” to replace a 60 Watt incandescent Edison lamp, I can trace a hundred million light rays, with full Fresnel polarized ray optics, through a few surfaces, refractive or reflective, and plot surface illumination at some arbitrary location; perhaps in false color mapping, or actual spreadsheet tables.
So the tools don’t matter much; they do the math quickly; a slide rule is a bit slower, but plenty adequate for doing statistics.
The trouble with lens design programs is they wouldn’t know a good lens if it came crashing through your living room window.
And if the design is no good, or maybe not even physically realizable, the computer can’t tell you anything about what needs to be changed. Well, you can tell the computer what a good lens is, and it can hunt for a possible candidate, or even improve not-so-good ones.
Trouble with that scenario is that then YOU have to know about lens design, just like in the good old days, and the computer whizzes don’t.
I can’t imagine that calculation of weather/climate statistics requires anything more than a good slide rule and knowing how to use it. When was the last time you saw 0.1% precision in any weather report?
Key facts about “climate change” which are ignored by true believers.
1. The concentration of CO2 in the global atmosphere is lower today, even including human emissions, than it has been during most of the existence of life on Earth.
2. The global climate has been much warmer than it is today during most of the existence of life on Earth. Today we are in an interglacial period of the Pleistocene Ice Age that began 2.5 million years ago and has not ended.
3. There was an Ice Age 450 million years ago when CO2 was about 10 times higher than it is today.
4. Humans evolved in the tropics near the equator. We are a tropical species and can only survive in colder climates due to fire, clothing and shelter.
5. CO2 is the most important food for all life on earth. All green plants use CO2 to produce the sugars that provide energy for their growth and our growth. Without CO2 in the atmosphere carbon-based life could never have evolved.
6. The optimum CO2 level for most plants is about 1600 parts per million, four times higher than the level today. This is why greenhouse growers purposely inject the CO2-rich exhaust from their gas and wood-fired heaters into the greenhouse, resulting in a 40-80 per cent increase in growth.
7. If human emissions of CO2 do end up causing significant warming (which is not certain) it may be possible to grow food crops in northern Canada and Russia, vast areas that are now too cold for agriculture.
8. Whether increased CO2 levels cause significant warming or not, the increased CO2 levels themselves will result in considerable increases in the growth rate of plants, including our food crops and forests.
9. There has been no further global warming for nearly 18 years during which time about 25 per cent of all the CO2 ever emitted by humans has been added to the atmosphere. How long will it remain flat and will it next go up or back down? Now we are out of the realm of facts and back into the game of predictions.
George E Smith writes with knowledge and experience about the merits and utility of slide rules. The point I was wanting to make, without actually stating it, is that slide rules are not very good for simple addition and subtraction. For these essential statistical operations you need something else. An abacus would be ideal if you know how to drive one, I’m sure. True, all the hard stuff like multiplications and long divisions are in the realm of slide rule technology. I used spiral slide rules for years, and really liked them, but for adding up I used pencil and paper until the Monroe appeared, with its 20 (?) digit keyboard. Sums of squares and products of two or three digit numbers became easy using the (x + y)² = x² + 2xy + y² recipe. Magic! Now I simply use 1st, a stats program I wrote years ago aimed originally at interactive multiple regression but now with loads of other stuff. Much easier, and always correct if the original data are reliable and not wrongly entered!
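That recipe still works and is easy to check. A small Python sketch (with made-up two-digit numbers) recovers the sum of cross products from three sums of squares alone, which is all a squaring calculator had to accumulate:

import numpy as np

# Illustrative check of the old calculator recipe: since
# (x + y)^2 = x^2 + 2xy + y^2, the sum of cross products can be
# recovered from three sums of squares.
x = np.array([12.0, 7.0, 31.0, 24.0, 18.0])
y = np.array([5.0, 9.0, 14.0, 2.0, 11.0])

sum_xx = (x ** 2).sum()
sum_yy = (y ** 2).sum()
sum_xpy = ((x + y) ** 2).sum()

sum_xy_from_squares = (sum_xpy - sum_xx - sum_yy) / 2
print(sum_xy_from_squares, (x * y).sum())  # the two values agree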