Guest essay by Dr. Tim Ball
“If you torture the data enough, nature will always confess” – Ronald Coase.
Facts are stubborn things, but statistics are more pliable. – Anonymous.
Climatology is the study of average weather over time or in a region. It is very different from Climate Science, which is the study by specialists of individual components of the complex system that is weather. Each part is usually studied independently of the entire system, with little regard for how it interacts with or influences the larger system. Statistics are supposed to provide the link between the parts. Climatology has suffered from a pronounced form of the average-versus-discrete problem since the early 1980s, when computer modelers began to dominate the science. Climatology was doomed to failure from then on, a decline only accelerated by its hijacking for a political agenda. I witnessed a good example early on at a conference in Edmonton on Prairie climate predictions and their implications for agriculture.
It was dominated by the keynote speaker, a climate modeler, Michael Schlesinger. His presentation compared five major global models and their results. He claimed that because they all showed warming they were valid. Of course they did, because they were programmed toward that general result. The problem is that they varied enormously over vast regions. For example, one showed North America cooling while another showed it warming. The audience was looking for information adequate for planning and became agitated, especially in the question period. It peaked when someone asked about the accuracy of his warmer-and-drier prediction for Alberta. The answer was 50%. The person replied, "That is useless; my Minister needs 95%." The shouting intensified.
Eventually a man threw his shoe on the stage. When the room went silent he said, "I didn't have a towel." We learned he had a voice box and the shoe was the only way he could get attention. He asked permission to go on stage, where he explained his qualifications and put a formula on the blackboard. He asked Schlesinger if this was the formula he used as the basis for his model of the atmosphere. Schlesinger said yes. The man then proceeded to eliminate variables, asking Schlesinger if they were omitted in his work. After a few eliminations he said one was probably enough, "but you have no formula left and you certainly don't have a model." It has been that way ever since with the computer models.
Climate is an average, and in the early days averages were the only statistic determined. In most weather offices the climatologist's job was to produce monthly and annual averages. The subject of climatology was of no interest or concern. The top people were forecasters, meteorologists whose training was only in the physics of the atmosphere. Even now few know the difference between a meteorologist and a climatologist. When I sought my PhD, essentially only two centers of climatology existed: Reid Bryson's center in Wisconsin and Hubert Lamb's Climatic Research Unit (CRU) at East Anglia. Lamb set up there because the national weather office wasn't interested in climatology. People ridiculed my PhD being in the Geography Department at the University of London, but other university departments weren't doing such work. Geography accommodated it because of its chorologic objectives (chorology being the study of the causal relationships between geographic phenomena in a region).
Disraeli's admonition of lies, damn lies and statistics was exemplified by the work of the IPCC and its supporters. I realized years ago that the more sophisticated the statistical technique, the more likely the data was inadequate. In climate the data was inadequate from the start, as Lamb pointed out when he formed the CRU. He wrote in his autobiography, "…it was clear that the first and greatest need was to establish the facts of the past record of the natural climate in times before any side effects of human activities could well be important." It is even worse today. Proof of the inadequacy is the increasing use of ever more bizarre statistical techniques. Now they invent data, as in parameterization. Now they use the output of one statistical contrivance or model as if it were real data in another model.
The climate debate cannot be separated from environmental politics. Global warming became the central theme of the claim, promoted by the Club of Rome, that humans are destroying the planet. Their book, Limits to Growth, did two major things, both removing understanding and creating a false sense of authority and accuracy. First was the simplistic application of statistics beyond an average, in the form of straight-line trend analysis. Second, predictions were given awesome but unjustified status as the output of computer models. They wanted to show we were heading for disaster and selected the statistics and process to that end. This became the method and philosophy of the IPCC. Initially, we had climate averages. Then in the 1970s, with the cooling from 1940, trends became the fashion. Of course, the cooling trend did not last and was replaced in the 1980s by an equally simplistic warming trend. Now they are trying to ignore another cooling trend.
One problem developed with switching from average to trend. People trying to reconstruct historic averages needed a period in the modern record for comparison. The 30-year Normal was created, with 30 chosen because it was considered a statistically significant sample, n, from any population N. The first one was the period 1931-1960, because it was believed to have the best instrumental data sets. They keep changing the 30-year period, which only adds to the confusion. It is also problematic because the number of stations has declined significantly. How valid are the studies done using earlier "Normal periods"?
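For readers unfamiliar with how a Normal is used in practice, here is a minimal sketch with made-up numbers (a synthetic 30-year record of monthly means, not real station data):

```python
import numpy as np

# Minimal sketch, not real station data: a synthetic 30-year record of monthly
# mean temperatures (360 values), reshaped to (years, months) so each column
# average becomes the 30-year Normal for that calendar month.
rng = np.random.default_rng(0)
monthly_means = 10 + 8 * np.sin(2 * np.pi * np.arange(360) / 12) + rng.normal(0, 1, 360)

normals = monthly_means.reshape(30, 12).mean(axis=0)

# An observation is then reported as a departure (anomaly) from its month's Normal.
july_anomaly = 18.5 - normals[6]
print(normals.round(2))
print(round(july_anomaly, 2))
```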
Unfortunately, people started using the Normal for the wrong purposes. Now it is treated as the average weather overall, when it is only the average weather for a particular 30-year period. It is actually inappropriate for climate because most changes occur over longer periods.
But there is another simple statistical measure they effectively ignore. People, like farmers, who use climate data in their work know that one of the most important statistics is variation. Climatology was aware of this decades ago as it became aware of changing variability, especially of mid-latitude weather, with changes in the upper-level winds. It was what Lamb was working on and what Leroux continued.
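A minimal illustration of why the average alone misleads, using synthetic numbers standing in for two stations with the same mean but very different variability:

```python
import numpy as np

# Synthetic numbers only: two records with an identical mean but very different
# variability; the average alone hides the risk a farmer actually cares about.
rng = np.random.default_rng(1)
station_a = rng.normal(loc=15.0, scale=1.0, size=120)   # stable climate
station_b = rng.normal(loc=15.0, scale=4.0, size=120)   # highly variable climate

for name, series in (("A", station_a), ("B", station_b)):
    print(name, "mean:", round(series.mean(), 2),
          "std:", round(series.std(ddof=1), 2),
          "range:", round(series.min(), 1), "to", round(series.max(), 1))
```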
Now, as the global trend swings from warming to cooling, these winds have switched from zonal to meridional flow, causing dramatic increases in the variability of temperature and precipitation. The IPCC, cursed with the tunnel vision of political objectives and limited by their terms of reference, did not accommodate natural variability. They can only claim, incorrectly, that the change is proof of their failed projections.
Edward Wegman in his analysis of the “hockey stick” issue for the Barton Congressional committee identified a bigger problem in climate science when he wrote:
"We know that there is no evidence that Dr. Mann or any of the authors in paleoclimatology studies have had significant interactions with mainstream statisticians."
This identifies the problem that has long plagued the use of statistics, especially in the Social Sciences, namely the use of statistics without knowledge or understanding.
Many used a book referred to as SPSS (it is still available), the acronym for the Statistical Package for the Social Sciences. I know of people simply plugging in numbers and getting totally irrelevant results. One misapplication of statistics undermined the career of an English geomorphologist who completely misapplied a trend surface analysis.
IPCC projections fail in large part because of inappropriate statistics and statistical methods. Of course, it took a statistician to identify the corrupted use of statistics and to show how they fooled the world into disastrous policies, but that only underlines the problem with statistics, as the two opening quotes attest.
There is another germane quote by mathematician and philosopher A.N. Whitehead about the use, or misuse, of statistics in climate science.
There is no more common error than to assume that, because prolonged and accurate mathematical calculations have been made, the application of the result to some fact of nature is absolutely certain.
_______________
Other quotes about statistics reveal a common understanding of their limitations and, worse, their application. Here are a few:
He uses statistics as a drunken man uses lampposts – for support rather than for illumination. – Andrew Lang.
One more fagot (bundle) of these adamantine bandages is the new science of statistics. – Ralph Waldo Emerson
Then there is the man who drowned crossing a stream with an average depth of six inches. – W E Gates.
Satan delights equally in statistics and in quoting scripture. – H G Wells
A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. – M J Moroney.
Statistics are the modern equivalent of the number of angels on the head of a pin – but then they probably have a statistical estimate for that. – Tim Ball
Some great quotes on statistics. My favorite statistics quote is by Homer Simpson:
‘Kent Brockman: Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack beatings are up a shocking nine hundred percent?
Homer Simpson: Aw, you can come up with statistics to prove anything, Kent. Forty percent of all people know that. ‘
Dr. Ball: thanks, a wonderful essay. Better yet, it prompts some outstanding comments and mini-essays. In particular I want to echo the praise from “chris y” and “stan stendera”–
“chris y on October 2, 2013 at 1:37 pm
stan stendera says: October 2, 2013 at 1:25 pm
“Brad and Tom G: You should both write books!!!! Among the all time great comments on WUWT, and that’s saying something.”
I agree. These are absolutely excellent comments.
I encourage both of you to consider putting together a post for Anthony. Your writing styles are similar to Willis Eschenbach, whose posts I enjoy immensely!”
@rgb
You have outlined very eloquently the problems with statistics and climate modelling. I am simply astonished by the errors made in basic probability in justifying models. The problem arises when you have trashed the obvious non-performers and are left with something that approximates real data. Is this simply a fortuitous choice of initial conditions and parameters, or does the model really work? I cannot see any way of testing this except through prospective testing. In other words, there is a hypothesis that looks as though it might be right, but it has not been tested.
Most of the models seem to be no more accurate than a first-order Taylor expansion. Unfortunately for the IPCC, if the reference point is moved backwards, giving a wider range of estimates that include current data, the second-order term is probably significant, and how you extract the second derivative of the temperature signal with any confidence is beyond me.
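For what it's worth, here is a minimal sketch with synthetic numbers (not a real temperature record, and not any actual model output) of how poorly constrained the second-order term is in a short, noisy series:

```python
import numpy as np

# Fit first- and second-order polynomials to a short noisy synthetic "temperature"
# series and look at how uncertain the quadratic (second-derivative) term is.
rng = np.random.default_rng(5)
t = np.linspace(0, 1, 35)                      # ~35 points, scaled to [0, 1]
y = 0.5 * t + 0.1 * t**2 + rng.normal(0, 0.15, t.size)

lin, lin_cov = np.polyfit(t, y, 1, cov=True)
quad, quad_cov = np.polyfit(t, y, 2, cov=True)

a2, a2_err = quad[0], np.sqrt(quad_cov[0, 0])  # quadratic coefficient and its std error
print("linear slope:  ", round(lin[0], 3))
print("quadratic term:", round(a2, 3), "+/-", round(a2_err, 3))  # typically swamped by noise
```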
The point about auto-correlation is very important, since many models give wildly different ACFs from real data. The ACF is important in model identification and seems to be roundly ignored, which it should not be.
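A small illustration of that ACF point, again with synthetic series only (white noise standing in for a hypothetical model run, an AR(1)-style red-noise series standing in for observations):

```python
import numpy as np

def acf(x, nlags=20):
    """Sample autocorrelation function of a 1-D series (plain numpy)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

# Synthetic comparison: white noise vs. a persistent AR(1) series.
rng = np.random.default_rng(6)
model_run = rng.normal(0, 1, 500)              # stand-in for an uncorrelated model output
obs = np.zeros(500)
for i in range(1, 500):
    obs[i] = 0.8 * obs[i - 1] + rng.normal(0, 1)   # stand-in for persistent observations

print("model ACF, lags 1-3:", acf(model_run, 3)[1:].round(2))
print("obs   ACF, lags 1-3:", acf(obs, 3)[1:].round(2))
```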
RC Saumarez (October 3, 2013 at 4:17 am) wrote:
“I cannot see any way of testing this […]”
Step back from the univariate view to see other variables, some of which are rigidly constrained. If you really cannot see this, you definitely need a jolting perspective change. Step One: Sober Up. Global average temperature alone does not determine large-scale terrestrial flows, which are constrained by the law of conservation of angular momentum in earth rotation summaries. A specification of the spatiotemporal distribution of temperature gradients is needed. When attractors in well-constrained flow data are taken into account, the set of permissible ensembles is reduced by an order of magnitude (or an order of idiocy in YEP's translation). This can be proven geometrically. This is the natural extension of the work of Lamb & Leroux mentioned by Tim Ball.
Universal Fluctuations in Correlated Systems
http://arxiv.org/abs/cond-mat/9912255
"Self similarity is an important feature of the natural world. It arises in strongly correlated many body systems when fluctuations over all scales from a microscopic length a to a diverging correlation length ξ lead to the appearance of "anomalous dimension" [1] and fractal properties. However, even in an ideal world the divergence of ξ must ultimately be cut off by a macroscopic length L, allowing the definition of a range of scales between a and L, over which the anomalous behaviour can occur. Such systems are found, for example, in critical phenomena, in Self-Organized Criticality [2,3] or in turbulent flow problems. By analogy with fluid mechanics we shall call these finite size critical systems "inertial systems" and the range of scales between a and L the "inertial range". One of the anomalous statistical properties of inertial systems is that, whatever their size, they can never be divided into mesoscopic regions that are statistically independent. As a result they do not satisfy the basic criterion of the central limit theorem and one should not necessarily expect global, or spatially averaged quantities to have Gaussian fluctuations about the mean value."
That’s what makes earth orientation parameters such indispensable climate indicators.
Macroscopic control parameter for avalanche models for bursty transport
http://arxiv.org/abs/0806.1133
“It is increasingly recognized that a large group of physical systems can be characterized as driven, dissipative, out-of-equilibrium and having a conservation law or laws (see the comprehensive treatments of [1, 2]). They usually have many degrees of freedom (d.o.f.), or excited modes, and long range correlations leading to scaling or multiscaling. Two examples are fully developed turbulence (see e.g. [3, 4]) and Self-Organized Criticality (SOC, [5, 6, 7]).
[…]
Our focus here is then to establish the macroscopic similarities and differences between turbulence and SOC in the most general sense. A central idea in physics is that complex and otherwise intractable behavior may be quantified by a few measurable macroscopic control parameters. In fluid turbulence, the Reynolds number RE expresses the ratio of driving to dissipation and parameterizes the transition from laminar to turbulent flow. Control parameters such as the Reynolds number can be obtained from dimensional analysis (see e.g. [3, 56]), without reference to the detailed dynamics. From this perspective the level of complexity resulting from the detailed dynamics is simply characterized by the number N of excited, coupled d.o.f. (or energy carrying modes) in the system. The transition from laminar to turbulent flow then corresponds to an (explosive) increase in N. The nature of this transition, the value of the RE at which it occurs, and the rate at which N grows with RE all depend on specific system phenomenology. Dimensional arguments, along with the assumptions of steady state and energy conservation, are however, sufficient to give the result that N always grows with RE (as in [57], see also [3]).
We anticipate that an analogous control parameter for complexity, RA, will exist for the wider group of systems discussed above. Interestingly it is now known that such a control parameter that expresses the ratio of driving to dissipation does indeed exist for SOC. In this paper we will give a prescription to obtain RA generally from dimensional analysis, that is, without reference to the range of detailed and rich phenomenology that any given system will also exhibit.”
Well some people believe everything they are told by the IPCC – oh dear!
http://travelnewsandwire.com/meteorologist-breaks-down-in-tears-after-climate-change-report-say-he-business-insider/?utm_source=outbrain&utm_medium=traffic&utm_content=%7Baid%7D&utm_campaign=outbrain
Thanks, Tim. I enjoyed that.
Regards
Steven Mosher says,
“The bottomline is that 30 years was not selected because of the reason Ball asserts. In fact, you can find a discussion about this in the climategate mails.”
Could Mr. Mosher tell us why 30 years was selected, please? Otherwise some might suspect that it was because, at the time it was chosen, it provided the steepest incline in global temperatures.
I have no problem with models if they are used for testing simple hypotheses. Let a certain sensitivity be the hypothesis. Do a thousand model runs with random initial conditions. For each run, note the slope of a linear regression over the past 17 years. Compute the mean and variance of all obtained slopes. If the real slope deviates by more than 2 sigma from the model mean, the hypothesis should be rejected. Results can be reported on one page. It is rather suspect that the IPCC needs 2000 pages to tell their story.
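That procedure is easy to sketch. The following is only a toy illustration of it: the "model" is an invented stand-in (a prescribed warming rate plus random internal variability) and the observed slope is a made-up number, not a real dataset or GCM.

```python
import numpy as np

# Toy version of the test described above; everything here is invented.
rng = np.random.default_rng(7)
years = np.arange(17)

def toy_model_run(sensitivity=0.02):
    # hypothesised warming rate plus a random walk of internal variability
    return sensitivity * years + np.cumsum(rng.normal(0, 0.05, years.size))

# 1000 runs with different random realisations; take the 17-year trend of each.
slopes = np.array([np.polyfit(years, toy_model_run(), 1)[0] for _ in range(1000)])

observed_slope = 0.005            # stand-in for the real 17-year trend (illustrative only)
mean, sigma = slopes.mean(), slopes.std()
reject = abs(observed_slope - mean) > 2 * sigma
print(f"ensemble slope: {mean:.3f} +/- {sigma:.3f} per year; reject hypothesis: {reject}")
```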
– – – – – – – –
Why did the IPCC choose models as the locus of their arguments? Was their decision an objective one or subjective one?
The corpus of A. N. Whitehead's philosophy of science was influential for a following generation of scientists via his extensive treatment of the conception of objectivity.
From a philosophy of science basis, was the IPCC’s choice explicitly justified in peer reviewed literature or was it based on an inherited, and presumed to be correct, paradigm of science?
The latter justification, if found to be the case, is a central strategic critique point for skeptics.
John
I disagree. I think the Team understood exactly what they were doing, especially Mann. They just (rightly) assumed the general public’s eyes would glaze over when the “errors” were made public.
A couple of things I want to emphasize:
Absolutely spot-on. Computer programs output the results they do because they were created to do so. Computer models do not output facts. Computer models do not output data.
This highlights one of the many, many instances of double-talk and dissembling on the part of the alarmists. They scream “the sky is falling, the sky is falling” because of the model results. In the next breath they tell you the models used by the IPCC don’t make predictions.
Repeat after me: "Computer models do not output facts. Computer models do not output data."
This is a pet peeve of mine. I sometimes talk back to the TV weatherman when he or she says today's weather is warmer/colder/wetter/drier/etc. than normal. "The high today will be 86, which is above normal." What they mean to say is that the high is above the 30-year average for today's date. To get a real idea of what normal is, look at the record high and record low for that date. That range defines the normal range, which can be quite variable.
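A small made-up illustration of the difference (synthetic numbers for a hypothetical station, not a real record):

```python
import numpy as np

# Hypothetical station record for one calendar date: the "normal" the forecaster
# quotes is just the 30-year mean; the record high/low span is far wider.
rng = np.random.default_rng(8)
highs_for_date = rng.normal(loc=86.0, scale=7.0, size=30)   # 30 years of highs for this date (F)

thirty_year_normal = highs_for_date.mean()
record_low, record_high = highs_for_date.min(), highs_for_date.max()
print("30-year 'normal' high:", round(thirty_year_normal, 1))
print("range actually observed:", round(record_low, 1), "to", round(record_high, 1))
```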
That was a very interesting read, Dr. Ball. Thank you.
@Paul Vaughn
Thank you for jolting me out of my complacency. I am sure that all you have quoted is highly relevant.
I simply said that if you ditch the obviously non-performing models you are left with ones that seem to conform to observations. By selecting this subgroup of models one has formed a hypothesis. Remembering the statistical adage that you cannot test a hypothesis with the data used to construct it, these models have to be tested with fresh data, in other words by seeing how they predict the future.
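A minimal sketch of that adage, using synthetic data and an over-flexible polynomial as a stand-in for a hypothetical model (not any actual GCM): tuned on one period it fits beautifully, then fails on fresh data.

```python
import numpy as np

# Synthetic series: slow trend plus noise. The "model" is just an overfitted
# polynomial hindcast, used only to illustrate in-sample vs. out-of-sample skill.
rng = np.random.default_rng(4)
t = np.arange(60) / 60.0                      # 60 "years", scaled for numerical conditioning
y = 1.2 * t + rng.normal(0, 0.2, t.size)

t_fit, y_fit = t[:40], y[:40]                 # data used to construct the hypothesis
t_new, y_new = t[40:], y[40:]                 # fresh data for prospective testing

coeffs = np.polyfit(t_fit, y_fit, deg=7)      # overfitted hindcast
rmse_fit = np.sqrt(np.mean((np.polyval(coeffs, t_fit) - y_fit) ** 2))
rmse_new = np.sqrt(np.mean((np.polyval(coeffs, t_new) - y_new) ** 2))
print("in-sample RMSE:    ", round(rmse_fit, 3))
print("out-of-sample RMSE:", round(rmse_new, 3))
```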
RC Saumarez says at October 3, 2013 at 8:46 am
Yes! That is right and it needs to be understood. But many don’t understand it.
For those who do not know why it is true, see
http://en.wikipedia.org/wiki/Texas_sharpshooter_fallacy
Richard
the climate is simply the average weather over the previous 30 years…….just like a baseball player's batting average is his PAST actual record at the plate…..his average exerts NO control on any level over what happens in his next at bat….a player that has never hit one home run in his life COULD hit one his next time up, and the all-time leader could strike out his next time up…….those claiming the climate exerts any control over the weather are showing incredible IGNORANCE.
Vukcevic says:
“For some years now, I have looked into changes of the Earth’s magnetic field and noticed easily observed ‘apparent correlation’ with the temperature records.
For the time being the science indicates that the only way these changes may make an impact is through the cosmic ray nucleation process, but it doesn't support Svensmark's hypothesis since the Earth's and solar fields appear to have a negative 'correlation':"
I have seen the same anti-correlation and interpret it as another case of two opposite-acting forces (the solar and geomagnetic fields) that contribute to the stability of the Earth's climate through a governing effect on Svensmark's hypothesized cosmic ray nucleation process. This is actually necessary to match the temperature anomaly record.
Does everybody know Barack Obama never signed the Kyoto Treaty? It's true, it's true.
I plead guilty to using SPSS for my MSc in 1985 (I am getting old!) and getting it wrong.
Dr. Ball, regarding "Eventually a man threw his shoe on the stage": if you can divulge the name, who was the man who went on the stage?
My theory of climate change/global warming – painting houses yellow creates global warming.
I can test the theory by creating a computer model that adds heat to the atmosphere every time a house is painted yellow. Run the simulation, with large numbers of simulated houses painted with simulated yellow paint, and – the simulated atmosphere gets warmer!
Not very surprising results. It's pretty obvious that if you create a simulation in which heat is driven by a factor in the simulation, an increase in that factor will yield more heat, even if that factor is yellow paint. And, naturally, you can substitute CO2 for yellow paint; it will work just as well. In the end, the simulation proves nothing.
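For anyone who doubts how little such a simulation proves, here is the yellow-paint model written out as a toy script. The warming is wired in by the made-up SENSITIVITY constant, so the output can only ever confirm the premise:

```python
# Deliberately silly toy model: warming is programmed in, so the simulation
# "confirms" whatever driver we wire into it (yellow paint here; substitute any
# variable you like and the result is the same).
SENSITIVITY = 0.0001  # simulated degrees of warming per yellow house (made up)

def run_simulation(houses_painted_per_year, years=100, start_temp=15.0):
    temp = start_temp
    for _ in range(years):
        temp += SENSITIVITY * houses_painted_per_year  # warming by construction
    return temp

print(run_simulation(houses_painted_per_year=5000))  # warmer, exactly as programmed
```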
Steven Mosher says,
“The bottomline is that 30 years was not selected because of the reason Ball asserts. In fact, you can find a discussion about this in the climategate mails.”
Could Mr. Mosher tell us why 30 years was selected, please? Otherwise some might suspect that it was because, at the time it was chosen, it provided the steepest incline in global temperatures.
Well, except for the interval between 1910 and 1940. In fact, the incline in GAST from 1910 to 1940 almost exactly matches the incline in GAST from 1980 to 2010. Dick Lindzen is fond of presenting e.g. HADCRUT4:
http://www.woodfortrees.org/plot/hadcrut4gl/offset:288
broken into 1900-1950 and 1950-2000, without numbers but on identical vertical anomaly scales, side by side, and asking people to identify which is which: which incline is associated with CO_2 and which one isn't. It ain't easy.
I also recommend looking at:
http://www.woodfortrees.org/plot/hadcrut4gl/detrend:0.7
Note that this subtracts a trend of 0.7/165 ≈ 0.42 degrees per century from the anomaly data. What's left looks like a sinusoid with a period approximately equal to the period of the PDO, and it is SO CLOSE to being flat; maybe there is a bare 0.2 C that could be attributed to the entire "anthropogenic era" of increased CO_2 in the second half of the 20th century. Could be, or not, because the mid-1800s exceeded this trend line by just as much without CO_2, and we don't understand this graph or its underlying causality.
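For readers who want to see what that woodfortrees operation actually does, here is a minimal sketch of linear detrending applied to a synthetic stand-in series (not the real HADCRUT4 data):

```python
import numpy as np

# Synthetic series standing in for HADCRUT4 anomalies. The "detrend:0.7" option
# removes a straight line whose total rise over the whole record is 0.7 C.
rng = np.random.default_rng(2)
years = np.arange(1849, 2015)                                   # 165-year span
frac = (years - years[0]) / (years[-1] - years[0])
anomaly = (0.7 * frac                                           # long, slow linear rise
           + 0.15 * np.sin(2 * np.pi * (years - years[0]) / 60.0)  # ~60-yr PDO-like wiggle
           + rng.normal(0, 0.1, years.size))                    # weather noise

detrended = anomaly - 0.7 * frac                                # what "detrend:0.7" leaves behind
print("trend removed:", round(0.7 / (years[-1] - years[0]) * 100, 2), "C per century")
print("std of residual:", round(float(detrended.std()), 2), "C")
```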
rgb
@RC Saumarez (October 3, 2013 at 8:46 am)
My concern is that the modelers never draw attention to multivariate model output. The FIRST thing I want to see is their global AAM (atmospheric angular momentum) output. WHY DO THEY HIDE IT? Because they know that if I analyze it I can PROVE BEYOND ALL SHADOW OF A DOUBT that their model output is TOTAL BS TOTAL BS TOTAL BS in the most egregious sense possible. Like the sharpest razor blade possible, cutting effortlessly through something that senses nothing…
Demand that they report global AAM for all models. I dare them to….
Best Regards!
Thanks to all for your kind and generous comments and your request for “more of the same”. If you want to stray a little off topic I do have a blog which addresses timely topics in the news pertaining to geology. I don’t want to post the link here without permission from Anthony and the moderators, as it would be a shameless plug to drag the audience from here.
Back on track, however, I have been considering a more concentrated blog about the reckless waste inflicted by the EPA on the American taxpayer and consumer since the enactment of CERCLA in 1980. It would be an insider's perspective with ample examples (intentional word play there). I have refrained until now because, until this past January, I was still representing clients in that regulatory arena and couldn't risk it. Now, however, I am one of those lying, untrustworthy, evil, polluting "frackers." (Those who know my comments also know how hard it is for me to write it that way.) Now that I don't deal with CERCLA any longer, and before I lose my insider's edge, I might begin that blog.
Cheerio.
Tom
I find it strange that, within the study and use of statistics for the purpose of analyzing a dataset, we do not have an agreed methodology.
Not being versed in stats, I cannot comment well; however, there are many parallels with other professions I do work within: software and hardware design.
In both of these skill areas a number of competing paradigms can be used, and some favorites become displaced by 'new kids on the block' for clear and understandable reasons: they do a better job of achieving your intended outcome.
By now I would have expected perhaps half a dozen or so methods or techniques to be vying to be 'the best' way to analyse time series datasets.
However, we have endless, heated debate, not about which method to use, but about how to implement a method for what is 'standard data'.
Is it not possible to expect a few good statisticians to get together and hammer out a few strong methods that people could agree on because they will be able to ‘see’ the value of each method?
Or am I misunderstanding the maturity level of applied statistics today?
As an ex-software engineer/hardware designer I would like to be able to find the most appropriate method to use to find a trend in time series data.
It is disconcerting that I cannot.
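As a small illustration of the difficulty, two perfectly standard trend estimators, ordinary least squares and the robust Theil-Sen estimator (both in scipy), applied to the same synthetic series give noticeably different answers, and there is no single agreed rule for which to prefer:

```python
import numpy as np
from scipy import stats

# Synthetic data only: a gentle trend, noise, and a few late outliers.
rng = np.random.default_rng(3)
t = np.arange(100)
y = 0.01 * t + rng.normal(0, 0.3, t.size)
y[95:] += 1.5                                   # outliers / a step near the end

ols = stats.linregress(t, y)                    # ordinary least squares
ts_slope, ts_intercept, lo, hi = stats.theilslopes(y, t)  # robust Theil-Sen

print("OLS slope:      ", round(ols.slope, 4))
print("Theil-Sen slope:", round(ts_slope, 4), "95% CI:", round(lo, 4), "to", round(hi, 4))
```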
TomG(ologist) says: October 3, 2013 at 8:32 pm
Back on track, however, I have been considering a more concentrated blog about the reckless waste inflicted by EPA on the American taxpayer and consumer since the enactment of CERCLA in 1980. …… Now that I don’t deal with CERCLA any longer, and before I lose my insider’s edge I might begin that blog now.
Anything which brings a public focus onto the regulatory monster that the EPA has become would be a wonderful thing.