'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline is taken directly from this article in Science News, from which I’ve posted an excerpt below. I found this article interesting for two reasons. 1- It challenges the use of statistical methods that have come into question in climate science recently, such as Mann’s tree ring proxy hockey stick and the Steig et al. statistical assertion that Antarctica is warming. 2- It pulls no punches in pointing out that an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal also worth reading here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

[Figure: P value. A P value is the probability of an observed (or more extreme) result arising only from chance. Credit: S. Goodman, adapted by A. Nandy]
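The caption’s definition can be made concrete with a short simulation. This is an illustrative Python sketch (not from the article): the p-value is estimated as the fraction of null-hypothesis simulations that come out at least as extreme as the observed result.

```python
import random

random.seed(42)

# Observed result: 60 heads in 100 tosses of a coin we suspect is biased.
observed_heads = 60
n_tosses = 100
n_trials = 20_000

# Simulate the null hypothesis (a fair coin) many times and count how
# often chance alone produces a result at least as extreme.
extreme = 0
for _ in range(n_trials):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    if heads >= observed_heads:
        extreme += 1

# One-sided p-value; the exact binomial answer is about 0.028.
p_value = extreme / n_trials
```

Note that this number says how surprising the data would be *if* the coin were fair; as the article goes on to argue, it is not the probability that the coin is fair.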

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News

238 Comments
James F. Evans
March 20, 2010 3:09 pm

Leif Svalgaard (13:01:34) wrote: “There is no hypothesis to be validated or disproved, etc.”
Yes, there is “no hypothesis to be validated or disproved, etc.”; rather, there is a law of physics to be applied, a theory, and all theories started off as hypotheses.

H.R.
March 20, 2010 3:13 pm

Here’s a nice topical example of “the beauty of statistics.”
Approximately 2,420,000 Americans die each year. Approximately 18,000 uninsured Americans die each year.
Therefore, you are 133.44 times more likely to die if you are insured than if you are uninsured.
We do not need universal healthcare. We need to ban insurance altogether.
Silly you say? Consider Great Britain where there is universal health insurance. 100% of the population who die do so while insured. Think of the lives that could be saved if G.B. insured only half the population.
I rest my case.
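H.R.’s joke works because it compares raw death counts rather than death rates. A quick Python sketch makes the fallacy explicit; the group population sizes below are purely hypothetical, chosen only for illustration.

```python
# H.R.'s 133.44 comes from comparing raw death COUNTS, not death RATES.
# His figures: ~2,420,000 US deaths/year, ~18,000 of them uninsured.
deaths_insured = 2_420_000 - 18_000
deaths_uninsured = 18_000

count_ratio = deaths_insured / deaths_uninsured   # ~133.4, the joke's number

# Purely hypothetical group sizes, for illustration only:
insured_population = 260_000_000
uninsured_population = 45_000_000

rate_insured = deaths_insured / insured_population
rate_uninsured = deaths_uninsured / uninsured_population

# Compared as per-person rates, the headline ratio shrinks dramatically,
# and whatever remains is confounded by age and health differences.
rate_ratio = rate_insured / rate_uninsured
```

The Great Britain line is the same trick taken to its limit: when everyone is in one group, 100% of any outcome occurs in that group by construction.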

wayne
March 20, 2010 3:14 pm

steveta_uk (04:31:35) :
After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records
You might enjoy Dr. Spencer’s two-term GCM discussed in http://www.youtube.com/watch?v=xos49g1sdzo
He calls it a “Minimalist’s Global Climate Model”. It works more properly than a mere random walk and he gets into white noise, red noise, and the pink noise he uses.

Basil
Editor
March 20, 2010 3:50 pm

Leif Svalgaard (09:25:01) :
Leif Svalgaard (08:47:00) :
Basil (07:02:14) :
We, of course, basically agree.
Hey, I had intentionally misspelled ‘basically’ as ‘basilally’. And somehow it got corrected…
REPLY: No good deed goes unpunished, sorry Leif. – Anthony

🙂
But Anthony didn’t correct Leif’s “I am one off…” and I’m not sure if this was intended or not. 🙂 I think Leif meant “I am one of…”
But Leif is more of a “one off” scientist than I think he realizes. I’m glad the hundreds of scientists he knows are not among those who abuse statistics. But how should the thousands of scientists who saw their papers warped into the IPCC “Treatment of Uncertainty” feel about the “>66% = likely” nonsense? And weren’t the people who created this “Treatment of Uncertainty” supposedly “scientists?”

March 20, 2010 4:05 pm

James F. Evans (15:09:08) :
rather, there is a law of physics to be applied, a theory, and all theories started off as hypotheses.
You are rambling, spare us and stop digging.

Espen
March 20, 2010 4:05 pm

jdn: This Science News article is an example of people speaking about statistics who don’t know what they’re doing or who are misrepresenting the field. Nobody uses p = .05 anymore unless they have really rare data. We like to see p = 0.01 or 0.001.
Yes, my BS alarm went red when I read that.
I still think the article raises important points, though, and it’s highly relevant to WUWT. As far as I’ve been able to tell, a lot of the work in climatology violates two principles of statistics: 1) Samples should be random (the tree-ring guys seem to use subjective judgments of data sets instead of choosing randomly from them), and 2) You can’t reuse data (as far as I can tell, they build their models with exploratory methods such as PCA and then run tests on the same data sets; you can’t do that: to test your model you need a new, randomly sampled data set from the same population).
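Espen’s second point (don’t reuse data) can be illustrated with a toy screening exercise; this is a hypothetical Python sketch, not a reconstruction of any actual proxy study. Pick the best-correlated “proxy” out of many pure-noise series: it looks impressive on the data used to pick it, then loses most of that apparent skill on a fresh sample.

```python
import random

random.seed(0)

n = 30
target = [random.gauss(0, 1) for _ in range(n)]

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Screen 200 pure-noise "proxies" and keep whichever best matches the
# target on the SAME data used for the screening (the reuse in question).
proxies = [[random.gauss(0, 1) for _ in range(n)] for _ in range(200)]
best = max(proxies, key=lambda p: abs(corr(p, target)))

in_sample = abs(corr(best, target))   # looks impressively strong

# Check the chosen proxy against a fresh sample from the same process:
# the apparent skill largely evaporates out of sample.
fresh = [random.gauss(0, 1) for _ in range(n)]
out_of_sample = abs(corr(best, fresh))
```

This is exactly why a holdout sample (or fresh data) is required: selection on the training data inflates the in-sample fit by construction.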

Rob H
March 20, 2010 4:15 pm

Nothing is more ridiculous than the use of statistics to claim the probability of global warming. There is no scientific basis for claiming statistics can help predict the amount of warming or anything else based on some previous period of temperature data. How would the climate scientists of the cold 1700s have done with temperature data from their period, using statistics to predict temperatures in the 1800s?
And as far as computing a global average temperature using sample data sets that are then “homogenized”; how can the larger scientific community not call this out for the pseudo science that it is?

March 20, 2010 4:56 pm

Basil (15:50:07) :
I think Leif meant “I am one of…”
Yeah, yeah, ..
But how should the thousands of scientists who saw their papers warped into the IPCC “Treatment of Uncertainty” feel about the “>66% = likely” nonsense?
There is probably a disconnect here. >66% is ‘likely’, but scientists take that with several grains of salt anyway, and do not think the word is as ‘strong’ as Joe Public does. For example, in a court of law and in dealings with the IRS, the phrase ‘more likely than not’ is taken to mean ‘indicative’ rather than ‘probable’, e.g. http://www.pwc.com/en_US/us/tax-compliance-services/assets/fin_48_tax_penalty_standard.pdf
And weren’t the people who created this “Treatment of Uncertainty” supposedly “scientists?”
see above.

John Whitman
March 20, 2010 5:19 pm

Anthony,
In the lead paragraph you mention, in summarizing the article that: “2- It pulls no punches in pointing out an over-reliance on statistical methods can produce competing results from the same base data. ”
Is your point to stress that competition is positive or is your point to stress that it is negative?
I think it can only be positive.
John

rw
March 20, 2010 5:23 pm

Someone in this thread said something about “the Gaussian assumption”. I suggest you go back and read about the Central Limit Theorem.
I have always felt that statistics is one of the most extraordinary achievements of the human mind. But using it well takes a lot of experience and effort. (And it is certainly not a guarantee of _scientific_ significance – in fact, some of the greatest experimental scientists, such as N. Tinbergen or C. S. Sherrington, used little or no statistics in their work.)
Someone also said something about only accepting results beyond 4 sigma – but what was the sample size? The 0.05 criterion is reasonable, but not if the sample size is so large that even small deviations from expectation become statistically significant.
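The sample-size caveat rw raises can be made concrete with a short illustrative Python sketch (the numbers are made up): the same trivial 0.1% bias in a coin is statistically invisible at n = 1,000 but overwhelmingly “significant” at n = 10,000,000, which is why a fixed p = 0.05 cutoff says nothing about practical importance.

```python
import math

def z_score(heads, n, p0=0.5):
    """z statistic for an observed head fraction against a fair-coin null."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (heads / n - p0) / se

# The same trivial bias -- 50.1% heads -- at two sample sizes.
bias = 0.501

small_z = z_score(bias * 1_000, 1_000)           # ~0.06: nowhere near significance
huge_z = z_score(bias * 10_000_000, 10_000_000)  # ~6.3: "significant" at any cutoff
```

The standard error shrinks like 1/√n, so with enough data any nonzero deviation, however unimportant, eventually clears any fixed significance threshold.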

Editor
March 20, 2010 5:55 pm

VS, I have now slogged through the entire post you cited, most fascinating.
My only remaining question is this. Hurst derived this theory and his “Hurst statistic” from a climate dataset, the flow of the Nile. You say that Hurst statistics don’t apply to climate datasets … what am I missing here?
I also found an interesting paper here about unit roots and the Hurst statistic that I would love to get your comments on if you have time.

idlex
March 20, 2010 6:17 pm

My take on Siegfried and Motl.

James F. Evans
March 20, 2010 6:19 pm

Evans (14:17:21) wrote: “-weighted average”
Leif Svalgaard (15:07:54) replied: “Wrong kind of statistics…”
Hey, you’re the one using a statistical term. I can’t help it if you use a term and then turn around and say, “wrong kind of statistics…”
Sounds like you’re shuffling the terms of debate.
Evans (15:09:08) wrote: “rather, there is a law of physics to be applied, a theory, and all theories started off as hypotheses.”
Leif Svalgaard (16:05:27) replied: “You are rambling…”
Sorry, not rambling at all.
Dr. Svalgaard previously claimed, “There is no hypothesis to be validated or disproved, etc.”, which as I previously stated is correct, and it is so because originally the hypothesis was demonstrated via experiment so many times and in many different experiments that it is deemed a Theory, a set of physical relationships that is so well established that it is deemed a ‘physical law’ or part of what is deemed a sub-set of a physical law.
Silly Dr. Svalgaard, so determined to disagree, he can’t even recognize when I’m acknowledging that statistical analysis can be applied to physical relationships to derive further understanding.
The point others and I have made here on this thread about statistics, “There are three kinds of lies: lies, damned lies and statistics”, is that statistics are subject to improper use, application, and interpretation, that is to say, abuse or improper use, whether intentional or unintentional, or they may not convey any useful information.
Like any powerful tool, and statistics is a tool, as is mathematics, it can be used for profit and understanding or misused for confusion and loss.
This is true for any tool and statistics is no different.
Which way do you want it, Dr. Svalgaard, statistics are a useful servant to physics or…what?
But let’s not obscure my original point (09:30:31): “An example of misleading statistics: Water vapor is around 1% of the atmosphere, a molecular constituent of air. But the 1% figure is misleading because it is an ‘average’ of the entire volume of air in the atmosphere.”
Of course, the 1% figure for water vapor in the atmosphere is a data point, often referred to as a statistic, perhaps colloquially, but still often used; as I stated above, the 1% figure holds limited utility for the scientist.
Statistics are often a breakdown of a larger whole into its constituent parts and proportions…hey, that’s what a percentage is, a proportion or part of a whole.
And, particular mathematical equations can be applied to derive the percentage based on known physical relationships.
Which side of the argument are you on Dr. Svalgaard?

Gary P
March 20, 2010 6:19 pm

For a continuous education in statistics follow the link in the sidebar to William Brigg’s website.
http://wmbriggs.com/blog/
A recent entry discussed parameters in models and how one can sometimes compute the confidence interval for one of the parameters to show that the value is correct to a small margin of error, GIVEN that the model is correct. The message, of course, was that statistics said nothing about the correctness of the model.
For example, I made a Mayan climate model that is exactly the same as the best computer model existing but all outputs are multiplied by zero after 2012. It will match all statistical tests until then so one cannot use statistics to show my false model is any worse than the existing model.
John von Neumann, “With four parameters I can fit an elephant and with five I can make him wiggle his trunk.”
With more than five parameters the modelers still cannot get rid of the tropical hot spot from their outputs. Then some argue about the data.

March 20, 2010 6:21 pm

rw (17:23:59) :
Someone in this thread said something about “the Gaussian assumption” … Someone also said something about only accepting results beyond 4 sigma – but what was the sample size?
There is some confusion between inductive statistics and descriptive statistics. The 5-sigma physicists’ criterion is almost always about descriptive, not inductive, statistics. One measures a ‘blip’ in the counting rate and wonders if the blip rises above the noise. This is very different from trying to tease a trend out of the data.
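The descriptive “blip above the noise” calculation Leif describes is simple in the counting case: for Poisson-distributed counts, the noise on a background of N expected counts is √N. A minimal Python sketch with made-up numbers:

```python
import math

# A counting experiment: a background of 400 expected counts in a
# detector bin, and 512 counts observed. For Poisson counts the
# fluctuation (noise) on the background is sqrt(background).
background = 400.0
observed = 512

noise = math.sqrt(background)                    # 20 counts
significance = (observed - background) / noise   # 112 / 20 = 5.6 sigma
```

This asks only how far the excess sits above known counting noise in one bin; fitting a trend through a time series is a different, model-dependent exercise.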

Tenuc
March 20, 2010 6:33 pm

juandos (03:58:07) :
“Hmmm, regardless of the numbers and quality of data sets how can one model something like climate if all the subtleties aren’t understood?
I’m sorry for asking what might be a seriously dumb question but I keep tripping over the butterfly effect…”

So do the climate models!
‘Climate’ is driven by deterministic chaos and is in constant change at all spatial and temporal scales. Trends have no meaning in non-linear systems, and can easily be cherry-picked to support or refute any hypothesis tabled. The current style of pseudo-statistical science is a travesty.

John Whitman
March 20, 2010 7:09 pm

VS,
Thank you for your thread participation over at Bart’s “http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared”
I went through it a few days ago [repeatedly].
You made me want to be a statistician. I never had that desire before.
New subject: Can you provide guidance on the standard [frequentist] vs the Bayesian approaches. Is there a fundamental difference, or is it just a difference in emphasis?
John

March 20, 2010 7:21 pm

We need to require people add a logo like the following to results that have not been replicated by others, including at least one skeptic.
http://2.bp.blogspot.com/_djgssszshgM/S6SzU2VMTTI/AAAAAAAABHo/RqY8Xkbzdy8/s1600-h/warninglabel.jpg

bob
March 20, 2010 7:57 pm

VS, a question if I may.
If a temperature series indeed contains a unit root, then the temperature series must diverge to infinity, and since it cannot go to negative infinity due to the laws of thermodynamics, it must go to positive infinity;
therefore global warming is caused by statistics.
The following is from the Wikipedia article on unit roots:
“As noted above, a unit root process has a variance that depends on t, and diverges to infinity”
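The Wikipedia line bob quotes is about the variance, not the path: a driftless unit-root process is not forced toward +infinity, but its variance does grow with t. A quick illustrative Python sketch simulating many random walks shows the spread across walks widening over time, roughly as variance ≈ t.

```python
import random

random.seed(1)

# Simulate many driftless random walks (the simplest unit-root process)
# and compare the spread across walks at t = 100 and t = 400.
n_walks, n_steps = 2000, 400
at_100, at_400 = [], []

for _ in range(n_walks):
    x = 0.0
    for t in range(1, n_steps + 1):
        x += random.gauss(0, 1)
        if t == 100:
            at_100.append(x)
    at_400.append(x)

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

var_100 = variance(at_100)  # theory: variance at time t is t, so ~100
var_400 = variance(at_400)  # ~400: the VARIANCE diverges, not the path
```

So a unit root implies unbounded uncertainty about where the series wanders, not a one-way trip to positive infinity.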

John Whitman
March 20, 2010 8:37 pm

Luboš,
In your ‘rebuttal’ you said,
“””””””And quite often, your data simply don’t contain enough information to decide. This is not a bug that you should blame on the statistical method. The statistical method is innocent. It is telling you the truth and the truth is that we don’t know. The laymen may often be scared by the idea that we don’t know something – and they often prefer fake and wrong knowledge over admitting that we don’t know – but it’s their illness, their inability to live with what the actual science is telling us (or not telling us, in this case), not a bug of the statistical method.””””
Luboš,
Good stuff.
A person who fully faces reality and the many uncertainties, yet does not seek the false security of comfortable nonreality theories/beliefs . . . . that is a special human.
Your essay is not a rebuttal, it has a stand alone merit. Just edit it slightly to remove the ‘rebuttal’ part.
Please see if you can do your ‘rebuttal’ post at WUWT by itself, rather than just as a side note to this Tom Siegfried post.
John

JDN
March 20, 2010 8:53 pm

Espen (16:05:52) :
>I still think the article raises important points, though, and it’s highly relevant to WUWT.
I liked the topic, but I felt the authors had an agenda… and were writing badly in Science News 🙂
Here’s a nice article if you can see it: http://jrsm.rsmjournals.com/cgi/content/abstract/101/10/507
They had reviewers examine fake articles in medical research with deliberate errors to see how many of the errors were caught. As you might imagine, the results were not good. Such exercises would be worth doing elsewhere. The climate journals can’t be trusted to self-police. It would have to be done without them knowing they were being tested.

Brian G Valentine
March 20, 2010 9:04 pm

As I see it, statistics have to be applied to (and may only properly be applied to) true indeterminate errors of measurements (every measurement excepting true counting involves an indeterminate error) – that is the only way hypotheses can be tested, when the true indeterminate errors of measurements fall within known deviations of means of measurements.
Statistics are improperly applied to establishing limits of confidence intervals – when those confidence intervals can be adjusted evidently at will by assumptions made to fit some hypothesis, neglecting assumptions consistent with the hypothesis but altering the confidence intervals over which the hypothesis is taken to be valid

wayne
March 20, 2010 9:37 pm

VS (13:37:05) :
I have read the links you pointed to on series that have a unit root.
Would I correctly understand unit-root series in this thought example: if I view a time series over the last 5,000 years of the average global temperature to have a slope of zero (no trend), then any offsets caused by processes such as additional CO2 do not then create a trend but instead simply move the base up as a one-time permanent offset to create a new base of the zero-slope linear regression? In other words, a permanent step move: the current regression is unaffected and the slope is still zero, but the regression is now split in two pieces at the time the offset occurred.
Of course in reality that would have to include the logarithmic nature of CO2’s concentration and the rate of its increase and the sensitivity of the climate system and CO2’s effect on temperatures itself. But every month there might be a ~0.02 degC permanent step, decreasing as the years go by if CO2 concentrations continue to increase in logarithmic style, but any slope (if there is any slope) would not change from CO2 influence alone.
Does that type of interpretation of a unit root fit correctly in rough terms? Does someone actually know how to mathematically state this type of example in statistics?

March 20, 2010 9:46 pm

Since you didn’t see it, my pun wasn’t any good to begin with 🙂
Missed you at ctm’s great party last night. Seven police cruisers were standing by outside [only half a block away] the joint.
It was great to see you Leif!.

wayne
March 20, 2010 9:52 pm

VS (13:37:05) :
Correction:
I should not have put any explicit rate in my example; the ~0.02/mo was more like ~0.02/cy/mo, and that is just grabbing an example figure. Just make it read “some positive amount”.