'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline is taken directly from this article in Science News, for which I’ve posted an excerpt below. I found this article interesting for two reasons. 1) It challenges the use of statistical methods that have come into question in climate science recently, such as Mann’s tree ring proxy hockey stick and the Steig et al. statistical assertion that Antarctica is warming. 2) It pulls no punches in pointing out that an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal also worth reading here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals that are so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

[Figure: A P value is the probability of an observed (or more extreme) result arising only from chance. Credit: S. Goodman, adapted by A. Nandy]

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.
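[Editor’s illustration, not part of Siegfried’s article: a minimal simulation of the logic being criticized. Assuming two hypothetical groups of 30 measurements drawn from the same distribution, a standard t-test will still declare a “significant” difference roughly one time in twenty at the conventional p < 0.05 threshold.]

```python
# Editor's sketch (not from the article): how often a p < 0.05 "effect" appears
# when there is, by construction, no effect at all. Sample sizes are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
trials, false_positives = 1000, 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, size=30)  # "control" group, pure noise
    b = rng.normal(0.0, 1.0, size=30)  # "treatment" group, same distribution
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {trials} null experiments were 'significant' at p < 0.05")
# Expect roughly 50: chance alone supplies a steady stream of spurious findings.
```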

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News


238 Comments
DR
March 20, 2010 7:17 am

Being grounded in metrology (the science of measurement) should be a prerequisite for anyone before applying any advanced statistical tests, and that includes scientists.
The surface station debacle is the perfect example. Reading through the blogjacks at Lucia’s, after all the statistical game playing, the bottom line is that unless first principles of metrology are understood and applied to the instrumentation, it is a meaningless exercise to evaluate the data statistically.
That’s the problem I have with slick data miners like Tamino and a few threads over at Lucia’s. After all the statistical game playing, not one single analysis employed has addressed the problem of each individual station’s integrity.

John M
March 20, 2010 7:28 am

Blackbarry (07:14:30) :

“It’s all relative”

Yes indeed. That is why a handful of years ago, we often heard “We must act because the science is certain”. We now hear “We must act because of uncertainty, and we just don’t know how bad it could be!”
It’s all relative to how the headlines are going.

jdn
March 20, 2010 7:37 am

I can’t believe that so many here think this Science News article is worth anything. Their “barking dog” experiments are such obvious straw men. Nobody would conduct an experiment that way. And these jokers keep returning to that same bad experimental design to “prove” their point. It’s not proof, and it is one more example of bad science. It’s pretty funny that so many people here think these guys are right just because they have an anti-establishment tone to their article. They’re also the establishment, and you guys who think this article was good have been fooled yet again.
The complaint that many of you have, namely, that enough experimentation will lead to statistical significance by chance, is already addressed by the ANOVA test or Bonferroni correction to the t-test. In short, statisticians already know about it and raise the bar for statistical significance when multiple trials are performed. It’s true that many people don’t follow the rules, but, that’s the fault of the experimenter, not the field of statistics.
This Science News article is an example of people speaking about statistics who don’t know what they’re doing or who are misrepresenting the field. Nobody uses p = .05 anymore unless they have really rare data. We like to see p = 0.01 or 0.001. Also, the Bayesian crap these guys are pushing leads to trivial results in trivial situations but madness in difficult situations. In my experience, people use these trendy methods to get funding, not to get results…. just like the CRU does.
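(A hedged sketch of the multiple-comparisons point above, with invented numbers: the Bonferroni correction simply divides the per-test significance threshold by the number of comparisons performed, so results that look impressive in isolation may no longer clear the bar.)

```python
# Illustration only (hypothetical p-values): applying a Bonferroni correction
# when 20 comparisons have been run on the same data set.
alpha = 0.05                       # nominal single-test threshold
n_tests = 20                       # comparisons actually performed
corrected_alpha = alpha / n_tests  # Bonferroni per-test threshold (0.0025)

p_values = [0.04, 0.012, 0.003, 0.0007]  # four of the twenty hypothetical results

for p in p_values:
    naive = "significant" if p < alpha else "not significant"
    strict = "significant" if p < corrected_alpha else "not significant"
    print(f"p = {p}: single-test -> {naive}; Bonferroni-corrected -> {strict}")
```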

Bernie
March 20, 2010 7:50 am

IMHO, Bayesians have a significant epistemological advantage over frequentists – they start with a more explicit causal model, not an associative model. Consequently they have to be more explicit about causal mechanisms.
Climate science has always struck me as being underspecified – the dendrochronology mess is a perfect example.
The danger of Bayesian models is attaching significance to subjective probabilities generated by experts with agendas.

March 20, 2010 7:54 am

Statistics are a valuable tool that can be used to investigate the secrets of science. Trouble occurs when they are used to develop proof of a scientific theory. One of the techniques the Japanese used when applying statistics to improve their manufacturing capabilities was to ask why 5 times. Example: the global average for February was the second warmest in 32 years, according to Christy and Spencer. Why? On his website, Roger Pielke, Sr, reports that this anomaly is all due to Greenland being much warmer than normal in February. http://pielkeclimatesci.wordpress.com/2010/03/19/an-example-of-why-a-global-average-temperature-anomaly-is-not-an-effective-metric-of-climate/
Why was Greenland much warmer than normal? Empirical data and statistics can be used as a tool to investigate this question. In this case there are 4 more whys to answer.

J. Bob
March 20, 2010 7:57 am

I’ll second Stephen Skinner’s recommendation of the book, still available, “How to Lie with Statistics”. One of the great things about the book is that it shows how statistics can be manipulated to support a particular view. I have seen that book on many desks of designers, especially in the medical field, for many years.
The general rule we had for data analysis was to look at a good time history chart, use signal conditioning methods to extract additional information, and resort to stats at the end, to see if there were any “droppings”.
It is most interesting to see the proponents of AGW using stats first, almost as a religion, and not even mentioning looking at charts and graphs to see if all the data sets make sense. However, stats do provide a cover if one does not understand the underlying principles.

r
March 20, 2010 8:03 am

When my children were learning algebra…
they said to me:
“Why do I have to learn this? I’m never going to use it.”
My answer:
So that no one can fool you with algebra.

Henry chance
March 20, 2010 8:09 am

Statistics are fantastic for showing coincidental relationships.

JER0ME (01:45:36) :
As a born mathematician, I have revelled in many aspects of the subject. I must admit that imaginary numbers and the like made me feel a bit queasy, but I ‘took it like a man’, and accepted it all in the end once I saw the benefit of something that seemed so wrong.
But statistics? I have never, ever, been even slightly comfortable with them. It is all so easy to manipulate, even for very bright people. I have enormous respect for those that can delve into this area and come out with any kind of truth. It is all so easy to be misled, and, indeed, to mislead.
I have a strong belief that mathematics is a pure subject in its own right. It also forms the basis, or foundation, for physics. Without mathematics we cannot accurately describe the physical world.
Chemistry then rests on top of Physics, as we eventually find that we cannot explain or describe Chemistry without Physics. So further, Biology rests in exactly the same way on Chemistry, we also find.
Where does statistics come into the equation? Pretty much nowhere IMO.
Of course, it is almost certainly possible to prove me wrong … with statistics…

Are you asking me to give the “odds” that you are wrong? LOL
I can think of many ways statistics are misused, misunderstood, wrongfully applied to probabilities as being proof, and often just wrong in the “science” of global warming.

Douglas DC
March 20, 2010 8:12 am

I was in on the beginning of using computers to store data. I had my sights on a career in Forestry. As I pursued my study, it became clear that the life of a Forester, Field Biologist/Botanist, Ranger, etc., was to be mostly spent sorting data and putting it into a database, with little real field work. Reality was not as important as the stats and trends: “Make the data fit the hypothesis.” I was troubled by that. Finished a general Biology B.S. and left the field. Went into aviation, where I did more in the field than I did while in study…
I think that academia sometimes crawls into that Ivory tower and cuts the rope ladder that is used to get up there; trouble is, you cannot get down either…

tom Trevor
March 20, 2010 8:31 am

I really don’t know anything about statistics, but I always wonder about the results of many medical studies, especially those that have a very limited sample size, say only 40 or so female nursing students at one college. The results of these studies are often reported in the media as if they are valid for everyone in general.

kadaka
March 20, 2010 8:38 am

Sou (02:52:56) :
(…)
(Amateur statisticians are the cause of much confusion and misinformation in climate science.)

True. The work of Mann, Jones et al has been exposed as having such amateurism. You would be wise to ignore them.

March 20, 2010 8:47 am

Basil (07:02:14) :
calling attention to something that is quite true: the misuse and frequent misunderstanding of the results of statistical tests.
We, of course, basically agree. What I was trying to say is that the misuse is rarely done by the scientists themselves, as we know how the method works. I am one of them and know hundreds of those critters, and do not know a single one that has a misunderstanding of this. The misuse is done by people trying to use a scientific ‘finding’ for their own purposes.
That said, scientists are also people [I state this at the 95% confidence level] so some may try to use statistics to misrepresent the significance of a finding i.e. to fool others rather than fooling themselves. One of the best [worst?] examples of that is in this famous paper [cited – at least – 216 times]: http://www.ukssdc.ac.uk/wdcc1/papers/nature.html
where they state that they found an “unprecedentedly high and significant correlation”: “The correlation coefficient is 0.91, for which the significance level is (100-4.3*10^-11)%.” Note the clever way the ridiculous significance level is expressed. Had they used the equivalent form 99.99999999996% it would have jumped out at you that something was amiss [there were only about 30 data points]. In fact, when I pointed that out to them in http://www.leif.org/research/Reply%20to%20Lockwood%20IDV%20Comment.pdf paragraph 21, they lamely admitted [in http://www.eiscat.rl.ac.uk/Members/mike/publications/pdfs/sub/239_Lockwood_2006JA011640R.pdf ] that “the significance levels of correlations quoted here all make correction for the persistence of the data (from their autocorrelation functions) and hence the effective number of independent samples [Wilkes, 1995]. This correction was not made in the original paper by LEA99 who, as a result, quoted significance values that were too high. However, even with the correction for persistence, all the correlations presented by LEA99 remain significant at greater than the 99.9% level”. A significant(!) number of 9s were dropped.
The two [comment] papers contain lots of statistics [some bad, some good]. But my point was that we as scientists don’t really believe that statistics ‘prove’ anything, or disproves anything because as Willis notes: if you can’t see it by eye in the graph, it probably ain’t there.
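(A rough sketch of the persistence correction described above, assuming the common lag-1 approximation for the effective number of independent samples; the series and numbers below are invented, not the LEA99 data.)

```python
# Hypothetical illustration: a correlation between two persistent (autocorrelated)
# series looks far more "significant" if the nominal sample size is used instead
# of the effective number of independent samples. Not the actual LEA99 data.
import numpy as np
from scipy import stats

def lag1_autocorr(x):
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def ar1(n, phi, rng):
    """Generate a persistent AR(1) series with coefficient phi."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    return x

def corr_pvalue(r, n):
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

rng = np.random.default_rng(0)
driver = ar1(30, 0.8, rng)               # shared slowly varying signal
x = driver + 0.3 * rng.normal(size=30)
y = driver + 0.3 * rng.normal(size=30)

n = len(x)
r = float(np.corrcoef(x, y)[0, 1])
r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
n_eff = n * (1 - r1 * r2) / (1 + r1 * r2)  # lag-1 approximation of effective N

print(f"r = {r:.2f}")
print(f"p with nominal N = {n}:       {corr_pvalue(r, n):.2e}")
print(f"p with effective N ~ {n_eff:.1f}: {corr_pvalue(r, n_eff):.2e}")
```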

Pascvaks
March 20, 2010 8:47 am

Ref – Leif Svalgaard (01:48:47) :
“..Now, if a prediction has been made, statistics can be used as a rough gauge of how close to the observations the prediction came, but the ultimate test is if the predictions hold up time after time again. This is understood by scientists, but often not by Joe Public [his dirtiest secret perhaps 🙂 ].”
____________________________
That’s certainly the way it is supposed to work. (Well… the way it was supposed to work ;-) The problem is that the Scientific Ethic has suffered as much, if not more in some ‘psyentists’, as in every other field of human endeavour. The Guilds do not police their own. Indeed, there are no Guilds anymore. All is chaos. Kind of a “Do Your Own Thing” sort of thing. Dig it, Man? Cool! Oh, neato skeeto!

Kevin Kilty
March 20, 2010 8:49 am

Leif Svalgaard (01:48:47) : hit the issue pretty well on its head, as did later postings by dearieme et al. However, I might add that without the use of statistics in process control, and design of experiments, the modern world of low-cost, reliable, uniformly manufactured items might never have come to be. If one is offended by the illogic of frequentist approaches, then by all means look at more comprehensible methods such as likelihood ratios.

1DandyTroll
March 20, 2010 8:51 am

That was one sensible article. I think I almost cried.
Statistics is very easy. When properly used, it always includes a rational context that the author explains very well. Rational context: compiling yearly rain water levels during the last hundred years to see what year got the most. Semi-rational: using the previous result to predict future averages in no uncertain terms, or otherwise drawing nutty conclusions from it, like saying there’s a fifty-fifty chance of more rain. Irrational context: predicting future drought or flooding or other doom and gloom scenarios a hundred years from now using the same rain water data as above.

Michael
March 20, 2010 8:52 am

This is idiotic, especially the quote about an experiment requiring statistics being a bad experiment. There is no escape from the chore of having to analyze and interpret our data, and statistics is still the best tool for that.
Kitchen knives can cut you but we still keep cooking.

richard
March 20, 2010 8:52 am

PJB –
and there’s the problem right there. We can’t reproduce the climate, and the models that are being used, while undoubtedly very clever, are not a match for reality.
Case in point is the current lack of surface warming. None of the models predicted it, so all of the models are unsuitable to base governmental policy on.

March 20, 2010 9:00 am

Blackbarry (07:14:30) :
–Heisenberg, uncertainty paper, 1927 […]
All the statistics do is measure the degree of uncertainty in a chaotic world. As Einstein demonstrated, “It’s all relative”.

This is a common misunderstanding and has often been misused. The Schroedinger Equation that governs quantum mechanics is completely deterministic and allows no uncertainty whatsoever. The uncertainty principle may perhaps better refer to the difficulty of pinning down where a wave is: the bigger the wave, the less sense it makes to state its position with high precision. Think of saying that this monster 50-ft ocean wave off Oahu is 123456.789 inches from the shore.

March 20, 2010 9:03 am


r (06:54:26) :
Alas, not even calculus is perfect:
I was devastated when I was shown Gabriel’s Horn:
Gabriel’s Horn is obtained by rotating the curve y = 1/x around the x axis for 1 ≤ x < ∞.

Include Wicked-pedia in that category too (‘not perfect’); they appear to ignore the constraints on ‘x’ in their graphics, thereby _not_ accurately illustrating Gabriel’s Horn:
http://en.wikipedia.org/wiki/Gabriel's_Horn
Versus a proper depiction (albeit 2D) if one observes the dotted line indicating x=1:
http://local.wasp.uwa.edu.au/~pbourke/fun/calculus/
Or the graphic shown here at bottom-right:
http://www2.scc-fl.edu/lvosbury/CalculusII_Folder/Calculus_II_Exam_2.htm
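(For anyone wondering why the figure above is so unsettling, here is the standard textbook calculation, not part of the original comment: the horn encloses a finite volume yet has an infinite surface area.)

```latex
% Gabriel's Horn: y = 1/x rotated about the x-axis for x >= 1.
% Volume (disk method) converges:
V = \pi \int_{1}^{\infty} \frac{1}{x^{2}}\,dx
  = \pi \left[ -\frac{1}{x} \right]_{1}^{\infty} = \pi
% Surface area diverges, since the integrand is bounded below by 1/x:
S = 2\pi \int_{1}^{\infty} \frac{1}{x}\sqrt{1 + \frac{1}{x^{4}}}\,dx
  \;\ge\; 2\pi \int_{1}^{\infty} \frac{dx}{x} = \infty
```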

R. de Haan
March 20, 2010 9:04 am

“Statistics can prove anything!”
“Absolute certainty is a privilege of uneducated minds-and fanatics. It is, for scientific folk, an unattainable ideal.” Cassius J. Keyser (like this one)
“CO2 is causing AGW with 95% probability!”
“We are 100% honest scientists! We did not fudge the data!”
“Statistics can never “prove” anything. All a statistical test can do is assign a probability to the data you have, indicating the likelihood (or probability) that these numbers come from random fluctuations in sampling. If this likelihood is low, a better decision might be to conclude that maybe these aren’t random fluctuations that are being observed. Maybe there is a systematic, predictable, or understandable, relationship going on? In this case, we reject the initial randomness hypothesis in favor of one that says, “Yes we do have a real relationship here” and then go on to discuss or speculate about this relationship”.
In short, gambling game!”
So what will this gambling game look like when the data is corrupted?
Welcome to the Climate Casino!

R. Gates
March 20, 2010 9:08 am

Fascinating! Thanks for the post. I think the entire issue can be summed up nicely by this little snippet from the actual article; in talking about the marriage between science and statistics, it said:
“Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability.”
It’s always been a rough and ugly marriage, and it may very well end in a painful divorce, but IF, and this is a big IF, the two can find a shared perspective on what is truly meant by probability, then not only will the marriage be saved, but it will be a very positive advancement for both. (just as a good marriage should be!)

harrywr2
March 20, 2010 9:13 am

Liars never figure, figures never lie. The Easter Bunny is real.

March 20, 2010 9:15 am

DR: “After all the statistical game playing, not one single analysis employed has addressed the problem of each individual station’s integrity.”
Totally agree. There are just so many ways measurements can be out, and so many ways the readings could be biased in one direction in terms of the trend, that most analysis is completely ridiculous theoretical claptrap. The basic essential prerequisite for an experiment is that comparisons use the same extraneous conditions. Instead this “experiment” has gone from manual measurements using mercury thermometers in an era when there probably wasn’t even a common idea of a universal time, to an era of automated measurement at one universal time.

Dr Anthony Fallone
March 20, 2010 9:17 am

Various commentators have mentioned sample size in relation to finding significant p-values. I taught my students that they must make the standard p-threshold in their stats more and more stringent as their comparisons within one data set multiplied. Something called ‘the Bonferroni Correction’ should be used, which is merely shifting the value from p<.05 to p<.01 or perhaps p<.001 in statistical testing that really chops a data set into mincemeat. Hans Eysenck had me do that with one of my papers.
It is correct to say simply that if significance is found it ensures just that the null hypothesis can be rejected, implying that the experimental hypothesis may be accepted, but, as has been said by commentators above, greater certainty can be gained only by replication of the experiment. In the same way that an anecdote is worthless to support any hypothesis, so one experiment that supports a particular hypothesis and its theory is almost as worthless. I know that sounds really hard, but if you want to be credible that is the way to go.
A large data set can be mined to find significance, as has been said, but the larger the data set, the more small effects begin to become significant; it has been said (Bakan) that a statistical significance in a large data set is uninteresting but a dramatic significance in a small data set is much more satisfying, presuming that this small set obeys the prerequisite of having a normal distribution and is otherwise suitable for feeding into the strongest stats test being used (e.g., parametric, interval, etc.). Non-parametric tests such as Chi-squared, Wilcoxon’s and so on are as weak as correlations and regressions, these latter tests being most often found being mangled in climate research.
A 90% confidence interval is not acceptable even in my discipline, psychology.
If you want to be credible when writing scientific papers you should never use the terms ‘proof’, ‘proven’, ‘truth’, ‘fact’, only ‘hypothesis supported’, acknowledging that your finding, however exciting you think it is, can only be temporary, contingent, waiting to be either knocked down by the next experiment or, if it is lucky, further supported by some other published paper.
For crusty old scientists emotionally attached to their little theories their careers usually end in tears as those theories are shot down in flames by subsequent work. Being a scientist is a tough life, one that usually ends in being patronised or disregarded. Chin up!
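(A small illustration of the point above about large data sets, using invented numbers: a practically meaningless effect of 0.02 standard deviations sails past the conventional threshold once the sample is large enough.)

```python
# Hypothetical demonstration: the same trivial effect (0.02 standard deviations)
# goes from "not significant" to "highly significant" purely by increasing n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
tiny_effect = 0.02  # practically negligible difference between the two groups

for n in (50, 5_000, 500_000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(tiny_effect, 1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>7}: p = {p:.3g} -> {verdict}")
```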

Ron
March 20, 2010 9:20 am

My impression is that ‘statistical hypothesis testing’ is not done in the ‘hard’ sciences such as physics and chemistry, but used to study populations in psychology, ecology, medicine, etc.
There is a hard-hitting academic paper titled “Statistical Hypothesis Testing As Pseudo Science”. The entire paper is in an academic journal, and was once online in its entirety. Only some of it appears online now; here it is:
http://www.npwrc.usgs.gov/resource/methods/hypotest/?C=M%3BO=A