The quote in the headline comes directly from this article in Science News, an excerpt of which I’ve posted below. I found the article interesting for two reasons. 1- It challenges the use of statistical methods that have recently come into question in climate science, such as Mann’s tree ring proxy hockey stick and the statistical assertion by Steig et al. that Antarctica is warming. 2- It pulls no punches in pointing out that an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:
“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford
There are many more interesting quotes about statistics here.
– Anthony
UPDATE: Luboš Motl has posted a rebuttal, also worth reading, here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford pointed out, better experiments make those tracks clear. – A
==================================
Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics
March 27th, 2010; Vol.177 #7 (p. 26)

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.
During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.
It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.
Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.
Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.
“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”
Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”
Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”
====================================
Read much more of this story here at Science News

I am 99 percent confident that this thread belongs to the top 2 percent of all posts that I have read in WUWT. Thank you 100 percent of all contributors for making statistics something to read about at least once in a lifetime. 🙂
There was a long thread about “random walk” and statistics elsewhere in the blogosphere, but most of the stuff there went over this layman’s head, and the thread got derailed by Tamino’s gargoyles, especially his Gargoyle-in-Chief, dhogaza.
Anyways… So, I’d like to highlight a particular comment here that finally helped me understand what was being discussed there. Could temperature trends be nothing more than a “random walk”? Interesting question. And I think steveta_uk’s DIY experiment gives an interesting answer. An expanded version of the experiment (which can be done safely at any home) is worthy of a separate post on WUWT.
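For readers who want to try the experiment themselves, here is a minimal sketch of the idea in Python (the walk length, number of trials, and random seed are illustrative choices of mine, not steveta_uk’s): generate series that by construction contain no trend at all, then count how often a naive least-squares fit declares a “significant” trend anyway.

```python
# Generate pure random walks (no trend in the generating process) and
# count how often ordinary least squares reports a "significant" slope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_walks, n_steps = 1000, 130          # ~130 "years" of annual data
significant = 0
for _ in range(n_walks):
    walk = np.cumsum(rng.normal(size=n_steps))   # random walk: cumulative i.i.d. shocks
    result = stats.linregress(np.arange(n_steps), walk)
    if result.pvalue < 0.05:
        significant += 1
print(f"{significant / n_walks:.0%} of trendless walks show a 'significant' trend")
```

The fraction comes out far above the nominal 5 percent, which is the whole point: a trend test that assumes independent errors is badly misled by a random walk.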
Steve Goddard (12:04:19) :
Mountain climbers always “cherry pick” their start location to achieve the appearance of an uphill slope.
Like picking when to start a snow cover slope? Putting numbers to something is better than just eyeballing, no?
What we need is a little lateral thinking here:
Our resident experts here have said that you can’t use statistics to prove anything.
But Gordon Brown has been using statistics for 13 years to prove that he’s a crook.
[OK. Some pedant will tell me it wasn’t the statistics, it was the statist.]
Evans (11:24:46) wrote: “Has Science observed & measured for H2O at every location possible to confirm that indeed that H2O does “average” out to one percent?”
Leif Svalgaard (12:23:18) replied: “It doesn’t have to.”
As I suggested before, “no,” [it is correct that Science doesn’t need to measure from every possible location] but it is a “statistical” inference.
A product of a statistical work-up:
Based on prime observations & measurements and given Science’s understanding of the relevant physical material and conditions.
Guys, for the 100th time, don’t fall for Tamino’s strawmen.
Nobody claimed temperatures are a *random walk*. We claim that the series contains a *unit root*. A random walk implies a unit root, a unit root doesn’t imply a random walk.
The presence of a unit root invalidates most ‘trend analysis’ as performed in climate science, because such analysis implicitly assumes the underlying data generating process to be a ‘trend-stationary’ one. However, extensive (formal) testing (see link) has shown this not to be the case (i.e. the series is non-stationary).
This is a simple implication of a whole body of literature establishing the presence of unit roots in temperature series.
Read this comment by Alex; it’s enlightening:
http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1931
Again, for a detailed analysis of where Tamino ‘went wrong’, and links to all the relevant test results, see:
http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1643
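For the curious, here is a hedged sketch of the kind of formal unit-root test VS is referring to, using the Augmented Dickey-Fuller test from Python’s statsmodels package (the simulated series and settings are my own illustration, not the actual test runs linked above):

```python
# Augmented Dickey-Fuller test: the null hypothesis is that the series
# contains a unit root; a large p-value means we cannot reject it.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=150))              # an I(1) series by construction

stat, pvalue, *_ = adfuller(series, regression="ct")  # "ct": constant + linear trend
print(f"ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
# Failing to reject the unit root here is exactly the situation in which
# ordinary trend regression becomes unreliable.
```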
@Steve Goddard
‘You can prove anything you want with statistics.’
I say people only think that they can, and usually because they don’t really understand.
I don’t think I really need to explain this to you, but then again it is a Saturday. A statistical representation of reality is, and will only ever be, the complete raw plot of everything in its own context. So the reality of the last hundred years’ temperatures is only accurately shown by plotting every temperature reading taken at the same hour on the hour, every day, for every year of the last hundred years. The reality that is reconstructed is only for that one specific hour on the hour, though, no more and no less. Every calculation done on that data is mostly bulls— unless otherwise accurately stated, and, most importantly, the calculations won’t ever describe reality, other than on the odd random occasion.
Leif,
Thanks for proving my point.
VS (12:25:42) :
Say what? The Hurst coefficient of the global temperature datasets is on the order of 0.8 … how is that not high?
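For context, a Hurst coefficient near 0.5 indicates no long-term memory, while values near 0.8 indicate strong persistence. Here is a minimal sketch of estimating it by classical rescaled-range (R/S) analysis; the window sizes and test data are illustrative choices of mine:

```python
# Estimate the Hurst exponent from the slope of log(R/S) versus log(n).
import numpy as np

def hurst_rs(x, window_sizes=(8, 16, 32, 64, 128)):
    rs_means = []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):
            seg = x[start:start + n]
            dev = np.cumsum(seg - seg.mean())   # cumulative deviation from the mean
            r = dev.max() - dev.min()           # range of the cumulative deviations
            s = seg.std(ddof=1)                 # standard deviation of the segment
            if s > 0:
                rs_vals.append(r / s)
        rs_means.append(np.mean(rs_vals))
    slope, _ = np.polyfit(np.log(window_sizes), np.log(rs_means), 1)
    return slope

rng = np.random.default_rng(1)
print(hurst_rs(rng.normal(size=1024)))   # white noise: expect roughly 0.5
```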
James F. Evans (12:42:19) :
A product of a statistical work-up:
Based on prime observations & measurements and given Science’s understanding of the relevant physical material and conditions.
And it is thus entirely valid, and the best that humankind can and need do, right? An example of good use of statistics, in your opinion.
But you are wrong. It is not a statistical inference, it is a simple coverage-weighted average of the individual measurements. There is no hypothesis to be validated or disproved, etc.
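To illustrate the distinction Leif is drawing: a coverage-weighted average is pure description, with no hypothesis being tested. A toy sketch, with station readings and areas entirely made up for illustration:

```python
# Each station's reading is weighted by the area (coverage) it represents.
import numpy as np

readings = np.array([14.2, 15.1, 13.8])       # station measurements, degrees C
area_km2 = np.array([1.0e6, 2.5e6, 0.5e6])    # area each station represents
print(f"coverage-weighted average = {np.average(readings, weights=area_km2):.2f} C")
```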
Steve Goddard (12:54:17) :
Thanks for proving my point.
I didn’t think you had a point, but if you are happy, then stay so.
Willis Eschenbach (12:56:01) :
The Hurst coefficient is not applicable here because the series has a unit root. In other words, it is non-stationary.
The Hurst coefficient is relevant for stationary series, as is ‘trend inference’.
I reiterate: the temperature series is NOT trend-stationary with high persistence. It is integrated, or I(1), and contains a unit root. Under the link given above, you will find the test results to support that assertion.
You seem like a devoted and intelligent individual and I enjoyed many of your earlier posts, so I strongly suggest you read the thread in question (thousands of words, many trolls, but you can start with my links given above).
Most of the statistical methodology employed in climate science (see the scholar link above) is in fact inapplicable in the presence of a unit root (which we have established; again, extensive test results are posted under the link above).
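A brief sketch of what “integrated of order one”, I(1), means in practice, again with simulated data of my own rather than the actual temperature series: the series fails a unit-root test in levels, but its first difference passes.

```python
# An I(1) series is non-stationary in levels but stationary after one
# differencing; the ADF p-values make the contrast visible.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
level = np.cumsum(rng.normal(size=200))   # I(1) by construction
diff = np.diff(level)                     # first difference: stationary

for name, series in [("levels", level), ("first difference", diff)]:
    print(f"{name}: ADF p-value = {adfuller(series)[1]:.3f}")
# Typical output: the levels fail to reject the unit root (large p-value),
# while the differenced series rejects it decisively.
```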
Statisticians have a sacred thing called “maximum likelihood”. This thing leads to things like the global financial meltdown when it goes wrong.
Everything rests on the “i.i.d.” assumption. Get that wrong and here is what happens due to the way the calculations go:
WRONG MULTIPLIED BY ITSELF N TIMES – i.e. WRONG^N
Confronted with this observation, a quick-witted statistician countered with, “Well, once you take logs it’s only additive.” He was referring to what statisticians call “log-likelihood” — it’s a log that gets optimized. So his defense was that when the “i.i.d.” assumption fails (which is literally almost all the time in many fields of study), results are only n-times wrong, rather than wrong^n.
I remember a horrified look on another statistician’s face when I suggested that this practice should not be used where the “i.i.d.” assumption is untenable. Why the look of such horror? A whole paradigm is built on the assumption. It is therefore (in the minds of some) not open to challenge.
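For readers unfamiliar with the jargon: under the i.i.d. assumption the joint likelihood is a product of per-observation densities, so its logarithm is a sum, which is the “only additive” defense quoted above. A minimal sketch (Gaussian model, made-up data):

```python
# Under i.i.d., joint likelihood = product of densities, so
# log-likelihood = sum of log-densities. Misspecify the independence
# assumption and every one of the n terms repeats the same error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=2.0, scale=1.0, size=50)

loglik = np.sum(stats.norm.logpdf(data, loc=2.0, scale=1.0))  # sum of logs
lik = np.prod(stats.norm.pdf(data, loc=2.0, scale=1.0))       # product
print(loglik, np.log(lik))   # equal, up to floating-point error
```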
Note to academic statisticians:
I appreciate the elegant abstract mathematics, but do honesty, integrity, & reality mean anything to you folks? Or is it all about mathematical convenience?
All that’s on offer to ecologists, physical geographers, economists, climate scientists, etc. through official academic channels (as statistics “outreach” to the wider community) is methods based on untenable assumptions. There’s no benefit in looking to the wrong people for the wrong methods.
One could spend several lifetimes buried in endless literature that has no application in reality (due to untenable assumptions). I wouldn’t recommend lifting a finger to study such methods formally (but I can see the appeal for those looking to join a smirking guild of deception).
VS (12:44:28) :
Guys, for the 100th time, don’t fall for Tamino’s strawmen.
Nobody claimed temperatures are a *random walk*. We claim that the series contains a *unit root*. A random walk implies a unit root, a unit root doesn’t imply a random walk.
Well, I guess I am one of those guys, and I’m happy to stand corrected. Great to see VS discussing the issue on the WUWT.
Off topic: Bishop Hill has just reported that the Geological Society is seeking submissions from its members to prepare a position statement on climate change. This was a long time coming. Geologists are finally on the march!
VS (13:14:45)
Thanks, VS, I’ll read the thread in question.
I agree with what Lubos Motl says.
I worked on an experiment where we “saw” a Higgs at the 2 sigma (95.4499736%) level. You have not heard of it, because three other experiments saw nothing, so only a limit was set.
In particle physics, to establish a resonance/particle we required 4 sigma (99.993666%).
One got excited about 3 sigma effects (99.7300204%), but I have seen 4 sigma effects that were not reproduced, because of too many cuts on the data.
In the end repetition of experiments is what is crucial.
I welcome your input.
Cheers.
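As a side note, anna v’s sigma-to-probability figures can be checked with a one-liner; this sketch uses the two-sided coverage of a normal distribution:

```python
# Two-sided coverage of a normal distribution at k-sigma thresholds.
from scipy import stats

for sigma in (2, 3, 4):
    coverage = stats.norm.cdf(sigma) - stats.norm.cdf(-sigma)
    print(f"{sigma} sigma = {coverage:.7%}")
# 2 sigma -> 95.4499736%, 3 sigma -> 99.7300204%, 4 sigma -> 99.9936658%
```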
Don’t you know that
90% of all statistics are made up on the spot?
yep,
Climate scientists have convinced a lot of people that CO2 is causing changes that can only be detected by teasing a tiny, tiny signal out of very noisy data with arcane statistical techniques, and that once that tiny signal is detected we can conclude that we must restructure the world’s economies to stave off catastrophe. If we wait until we can see the damage (sans arcane statistics), it will be too late.
Seems to me that statistics are the most powerful tool ever employed by scientists.
VS
I very much enjoyed reading the thread as it unfolded over many days. As you say, there were many trolls, one of which pops up in all sorts of places but mainly in the Guardian stories supporting George Monbiot. As far as I am aware, neither Dr. Mann nor Dr. Jones is particularly statistically literate, which causes problems, as their subjects demand a high level of that skill.
tonyb
anna v (13:36:37) :
I agree with what Lubos Motl says.
Me too. No need to throw the baby out with the bathwater.
Mike Mann, for example, not using statistics correctly does not mean statistics cannot be used correctly.
“-weighted average”
…is a statistical term…
Statistics: With one foot in a bucket of ice and one in a hot frying pan….well, on average I feel pretty good!
No mention so far that p-tests control the probability of Type I errors (rejecting a valid null) while failing to control the probability of Type II errors (failing to reject an invalid null). The latter can become very large at small sample sizes, even for small deviations into the range of the alternative to the null.
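A small simulation makes the Type II point concrete (the sample size, effect size, and test are illustrative choices of mine): with a small sample and a small true effect, a test that nominally holds false positives at 5 percent still misses the real effect most of the time.

```python
# Simulate a one-sample t-test at alpha = 0.05 when the null is false
# but the true effect is small; count the Type II errors (misses).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, effect, trials = 20, 0.3, 5000     # small sample, small true shift
misses = 0
for _ in range(trials):
    sample = rng.normal(loc=effect, size=n)     # null (mean 0) is false here
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    if p >= 0.05:
        misses += 1                   # failed to reject an invalid null
print(f"Type II error rate ~ {misses / trials:.0%}")   # typically well over 50%
```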
It seems Siegfried bases his conclusions mostly on retrospective studies, where one must assume the inclusion criteria fit the objectives of the study. In such cases, it is not always the statistical analysis that produces a false result, but rather the working assumption that the data are applicable to the degree necessary to avoid false conclusions. In clinical studies, patient enrollment is better controlled by the study’s protocol, making the data quality, validity, and end statistical analysis far more reliable.
As a general rule, doctors and researchers pay less attention to retrospective studies than to clinical studies because the statistical problems of retrospective studies are well understood. There is a growing trend in all areas of medical research to require a qualified statistician to at least check the end analysis of a study submitted for publication. Indeed, many IRBs (institutional review boards) require a study to include analysis by a qualified statistician before it is approved.
Giving considerable thought to Siegfried’s implications, I’m hard pressed to imagine a better objective means of qualifying the findings of research analysis than the traditional statistical tests. In fairness to his points, it is important that the tests be oriented toward disproving the null hypothesis as opposed to proving the working hypothesis. Disproving the null hypothesis is more robust and reliable. Tests to prove the working hypothesis are far more prone to Type I errors (rejecting the null hypothesis when the null hypothesis is true). I believe that is where many studies contain serious statistical flaws. I see a lot of studies bent on proving the working hypothesis in climate science, which is why I remain skeptical of their findings.
Statistics remains the workhorse for formulating study conclusions, unless, of course, you’re into postmodern science, in which case consensus will be the method of decision. Comparing the consensus view of treatment practices against study findings I’ve personally worked on, I’ve seen too many times that the consensus view was wrong. I recently co-authored a published abstract showing conclusively that the consensus wisdom of treating intra-amniotic hyperechoic matter with antibiotics is not only unwarranted but exposes the patient and fetus to undue risks.
James F. Evans (14:17:21) :
“-weighted average”
…is a statistical term…
Wrong kind of statistics; we were talking about inference, not description. There comes a time when you should stop digging. Your example was not a wrong use of statistics.