'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline is taken directly from this article in Science News, which I’ve excerpted below. I found this article interesting for two reasons. 1- It challenges the use of statistical methods that have recently come into question in climate science, such as Mann’s tree ring proxy hockey stick and the Steig et al. statistical assertion that Antarctica is warming. 2- It pulls no punches in pointing out that over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal, also worth reading, here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

[Figure: A P value is the probability of an observed (or more extreme) result arising only from chance. S. Goodman, adapted by A. Nandy]

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News


238 Comments
March 20, 2010 10:14 pm

VS (12:25:42) :
Debating on their turf is always problematic. You should guest post on CA.
You’ll have more commenters (UC, Roman, stevemc, hu, jeanS) who can actually add to the discussion.

March 20, 2010 10:52 pm

James F. Evans (18:19:24) :
Which side of the argument are you on Dr. Svalgaard?
On the right side, of course. What else?
But I shall not bother with you this time. Ramble on.

Tucci
March 20, 2010 11:25 pm

There are damned few places online where I’m not the smartest guy in the room. This is, by jeez, one of ’em. It seems to me that if Anthony Watts’ Web site is to be the basis of judgment, the “deniers” are the best educated, most knowledgeable people in the whole AGW discussion.
I’m wondering if anybody in this discussion of statistical analysis has given thought to the compounding effects of instrumental limits of accuracy upon the datasets being subjected to statistical evaluation in climate research.
Like some of the other folks commenting here, I’m a physician, and one of the things that was pounded into my skull back when I was young and had hair was that information cannot be relied upon unless you stay conscious of the degree to which your measurements can be affected by errors.
I’ve been paying attention to the “global warming” brouhaha for more than thirty years, and from the outset I’ve wondered about the ways in which the alarmists’ contentions appear to have been invalidated by uncertainties in the ways that measurements have been taken, and how those uncertainties have been addressed.
It never struck me as having been taken seriously into consideration, and you’d think that with all the statistical hand-waving they’ve been doing over the decades, some consideration of it should have crept in.
I’d welcome comments along that line. Thanks.
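
Tucci’s point about instrumental accuracy can be made concrete with a minimal Python sketch (all numbers invented for illustration, not real station data): averaging many readings beats down random instrument error, but a systematic bias shared by the instruments survives in the mean untouched.

    import numpy as np

    rng = np.random.default_rng(42)
    n_stations = 1000
    true_temp = 15.0                  # hypothetical true value, degrees C

    # Each station reads with +/-0.5 C random noise, rounded to 0.5 C...
    noise = rng.normal(0.0, 0.5, n_stations)
    readings = np.round((true_temp + noise) / 0.5) * 0.5

    # ...plus a systematic bias shared by every instrument (e.g. siting).
    bias = 0.3
    biased = readings + bias

    print(f"mean of unbiased readings: {readings.mean():.3f}")   # ~15.0
    print(f"mean of biased readings:   {biased.mean():.3f}")     # ~15.3
    # Averaging 1000 stations beats down the random +/-0.5 C error,
    # but the shared 0.3 C bias survives intact in the mean.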

Wren
March 20, 2010 11:35 pm

Max Hugoson (06:43:37) :
A little bit of fun here: Years ago I was listening to NPR (yes, I still do occasionally, but with a VERY jaundiced eye/ear these days! I’ve learned how completely left-biased they are) and they gave a report on the attempts in N.J. to raise the “standard test scores” for high school graduates.
The announcer said this: “Despite 4 years of efforts, 50% of all students still fall below the mean on the standard tests…”
I almost drove off the road (I was in my car), I laughed so hard.
My thought: “We know which side of the MEAN this announcer fell on…”
Remember: 78% of all statistics are actually made up on the spot. (Like that one.)
====
Did he mean median? That would be funnier.
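
For what it’s worth, the median version is true by definition (essentially half of any group falls below its median), while the share below the mean depends on the distribution’s skew. A quick Python sketch with invented, right-skewed test scores:

    import numpy as np

    rng = np.random.default_rng(2)
    # Invented, right-skewed "scores": most students cluster low, a few
    # score very high and drag the mean upward past the median.
    scores = rng.lognormal(mean=4.0, sigma=0.5, size=100_000)

    print(f"below the mean:   {np.mean(scores < scores.mean()):.1%}")      # ~60%
    print(f"below the median: {np.mean(scores < np.median(scores)):.1%}")  # ~50%
    # "50% below the median" holds by definition; "50% below the mean"
    # is only guaranteed for symmetric distributions.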

Wren
March 20, 2010 11:51 pm

Rob H (16:15:59) :
Nothing is more ridiculous than the use of statistics to claim the probability of global warming. There is no scientific basis for claiming statistics can help predict the amount of warming or anything else based on some previous period of temperature data…..
======
You take the no-change extrapolation, and I’ll take the predicted warming.

March 21, 2010 12:26 am

Wren (23:51:02) :
You take the no-change extrapolation, and I’ll take the predicted warming.
And I’ll sit back and adapt to whatever transpires.
I predict that it’ll get warmer over here in July than it was in February…

pft
March 21, 2010 12:27 am

The article touches upon the philosophy of science: frequentists vs. Bayesians, proponents of deductive vs. inductive reasoning, realists vs. non-realists.
The Bayesians and inductive-reasoning proponents have introduced more subjectivity and untested assumptions into science. Bayesian statistics was pretty much discredited before WW II, but has made a comeback in the late 20th century. In order to handle the uncertainty of science, scientific inference has become reliant on probability, perhaps too reliant. The definition of probability is of course at the heart of the debate.
It is not statistics that is the problem; the problem is those who use statistics to test hypotheses that depend on assumptions which are not true and who, despite knowing this, pretend to a certainty in their hypothesis, or probability if you like, that is not warranted. Yes, the argument and statistics used may be valid, yet the conclusion reached is only as good as the assumptions, which are not always supported by scientific data or a good understanding of the science.
For example, in climate science, whatever is not understood (precipitation efficiency, cosmic rays), or wherever data is not available (historical cloud cover, TSI), is ignored, the assumption being that it is not important; or estimations are made by estimators whose uncertainty is large or even unknown, the assumption being that the estimator is accurate enough. False assumptions tend to lead to false conclusions, despite a valid argument supported by the statistics.
Science is far removed from reliance on experiments in a lab. Einstein never performed an experiment and had to outsource his math. He eliminated the Aether with a mental eraser, claiming it was not needed to support his theory of special relativity. He backtracked on this a bit in his general theory of relativity, saying the special theory of relativity did not deny the Aether, and in regard to the general theory accepting there may be a different kind of Aether, or as he put it, “another thing [in the vacuum], which is not perceptible, [that] must be looked upon as real, to enable acceleration or rotation to be looked upon as something real”.
In quantum mechanics they are now looking at the possibility of a kind of Aether-like medium, called the quantum vacuum, which is thought of as a seething froth of real-particle/virtual-particle pairs going in and out of existence continuously and very rapidly.
So much of science consists of considering things which cannot be directly observed. Nobody has ever seen an electron or even an atom, and we seek to explain what happened in the past 100 years, to 4 billion years, to the very beginning (Big Bang). We don’t really know what gravity is, just what it does. If you look at the earth as an apple, we are only guessing at what is under the skin of the apple, having explored so little of it. Perhaps 10 thousand years from now people will laugh at how little we know, while thinking we know so much. Much like a physicist in the late 19th century claiming the greatest discoveries have all been made.
Generally, though, however you go about it, the test of a scientific theory, backed up by statistics or not, is whether it can do more than fit the known observational data and predict that which is unknown (e.g. the global temperature in 10 years).
And even this does not prove the theory is true (just useful). For example, the Aether theory was considered proven by French physicist Augustin Fresnel in the 19th century, when he developed a mathematical version of the theory which was used to predict new optical phenomena. Optical experiments confirmed his predictions, yet Einstein and modern physicists claim there is no Aether, and the experiments disproving the Aether, whatever form it might take, are not as convincing as they would like us to think.

wayne
March 21, 2010 2:53 am

VS:
If you read this before answering my question above, forget it! I picked up a reference in the last few comments of Charles’ party post leading to BishopHill, which led to some external sites where you have laid out enough links on “unit root” testing to last me at least a week. Thanks for the clarity!
Here’s a hint: Dr. Spencer described a bounded method using a proper feedback parameter, common in climate science GCMs, which creates your bounded “random walk” alternative. Check it out.

Steve:
Willis:
You should follow that path too. There is a lot of stat info there. And Willis, I followed your method to detect discontinuities using Excel under “Tale of two cities”, it works great! Still don’t understand how the residual sums and the math beneath create such a graph, but it does work fine.
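
For readers new to the “unit root” testing wayne mentions, here is a minimal sketch using the augmented Dickey-Fuller test from statsmodels (my choice of tool, an assumption; the linked material may use different tests entirely):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(0)
    n = 500

    random_walk = np.cumsum(rng.normal(size=n))  # unit root by construction
    white_noise = rng.normal(size=n)             # stationary, no unit root

    for name, series in (("random walk", random_walk),
                         ("white noise", white_noise)):
        stat, pvalue = adfuller(series)[:2]
        # The ADF null hypothesis: the series HAS a unit root.
        print(f"{name}: ADF statistic = {stat:.2f}, p = {pvalue:.3f}")
    # Typical outcome: the random walk fails to reject the unit-root null
    # (large p), while the white noise rejects it decisively (tiny p).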

March 21, 2010 3:23 am

MikeE (10:05:22) Wrote: “My pet concern, I have seen it many times in biology/biochemistry, is when people assume the thing they are measuring is normally distributed when that is not at all clear from their data.”
The selfsame assumption is made in the stockmarkets. Benoit Mandelbrot demonstrated this in his book ‘The (mis)Behaviour of Markets’. He related the tale of a bunch of wheeler-dealers who invested vast sums on the assumption of normal distributions, made millions, lost more millions when the assumption let them down, and had to be bailed out to prevent a crash (before the recent crash). Movements in chaotic systems are not – repeat not – Gaussian.
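
A minimal sketch of that trap (synthetic data, nothing from Mandelbrot’s book): standardize heavy-tailed data to the same variance as Gaussian data, then count the 4-sigma “surprises” in each.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000

    gaussian = rng.normal(size=n)
    heavy = rng.standard_t(df=3, size=n)
    heavy /= heavy.std()          # rescale to unit variance, like the Gaussian

    for name, x in (("gaussian", gaussian), ("student-t (df=3)", heavy)):
        frac = np.mean(np.abs(x) > 4.0)
        print(f"{name}: fraction of |moves| > 4 sigma = {frac:.2e}")
    # The heavy-tailed series produces 4-sigma "impossible" moves orders
    # of magnitude more often -- exactly the assumption that fails in
    # markets and, arguably, in other chaotic systems.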

March 21, 2010 3:28 am

pft (00:27:50): “Much like a physicist in the late 19th century claiming the greatest discoveries have all been made.”
Hah! I knew that ‘the science is settled’ had a familiar ring! I wonder if somebody asked Einstein, ‘Why should I give you my data? You’ll just try to find something wrong with it!’

Joe Leblanc
March 21, 2010 7:12 am

As usual Luboš Motl is right. Measurement is useless without an estimate of confidence intervals and that is gained by statistical analysis.
When your doctor gets the results of your blood test, all of the results are accompanied by upper and lower values equal to one standard deviation above and below the mean value observed in the general population. This is how she applies science to diagnosing your ailment.
For the statistical approach to data analysis to work properly, the doctor or scientist must be willing to reject or accept the hypothesis being tested. This is difficult to do when the financial and career consequences are great.
Professor Wegman’s criticism of Michael Mann’s use of statistics was that he did not apply the statistical techniques properly. Wegman confirmed the claim by McIntyre and McKitrick that the technique used actually mined the data to generate the “hockey stick”.
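
The data-mining effect can be illustrated with a toy Python sketch. To be clear, this is my own illustration of the short-centering criticism (invented sizes, plain red noise), not a reproduction of Mann’s code or of Wegman’s analysis: centering each series on only its final “calibration” segment lets the leading principal component acquire a hockey-stick shape from trendless noise.

    import numpy as np

    rng = np.random.default_rng(7)
    n_series, n_years, calib = 70, 200, 50   # invented sizes

    # Trendless red noise: AR(1) series containing no signal at all.
    phi = 0.9
    data = np.zeros((n_series, n_years))
    shocks = rng.normal(size=(n_series, n_years))
    for t in range(1, n_years):
        data[:, t] = phi * data[:, t - 1] + shocks[:, t]

    def pc1(matrix):
        """Leading principal component (time pattern) of series-by-time data."""
        _, _, vt = np.linalg.svd(matrix, full_matrices=False)
        return vt[0]

    full_centered = data - data.mean(axis=1, keepdims=True)
    short_centered = data - data[:, -calib:].mean(axis=1, keepdims=True)

    for name, m in (("full centering", full_centered),
                    ("short centering", short_centered)):
        v = pc1(m)
        # "Hockey stick index": separation of late (blade) vs early (shaft)
        # means, in units of the PC's overall spread.
        hsi = abs(v[-calib:].mean() - v[:-calib].mean()) / v.std()
        print(f"{name}: hockey stick index = {hsi:.2f}")
    # Short centering typically yields a much larger shaft/blade separation
    # from the very same trendless noise.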

Curiousgeorge
March 21, 2010 7:19 am

John Whitman (19:09:00) :

New subject: Can you provide guidance on the standard [frequentist] vs. the Bayesian approaches? Is there a fundamental difference, or is it just a difference in emphasis?
John

Edwin Jaynes, “Probability Theory: The Logic of Science” (partial manuscript): http://omega.albany.edu:8008/JaynesBook.html (Full book also available at Amazon.)
And a large number of statistical papers by various authors on both general and specific applications — http://bayes.wustl.edu/
Enjoy.
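
On John’s actual question: the difference is more than emphasis; the two schools answer different questions from the same data. A toy sketch of my own in Python (7 heads in 10 coin flips, not an example taken from Jaynes):

    import math

    heads, flips = 7, 10

    # Frequentist: maximum-likelihood estimate, plus an exact two-sided
    # p-value against the null hypothesis "the coin is fair".
    p_mle = heads / flips
    p_value = sum(math.comb(flips, k) * 0.5**flips
                  for k in range(flips + 1)
                  if math.comb(flips, k) <= math.comb(flips, heads))
    print(f"MLE = {p_mle}, p-value vs. fair coin = {p_value:.3f}")   # 0.344

    # Bayesian: start from a uniform Beta(1, 1) prior and update it.
    # The posterior is Beta(1 + heads, 1 + tails); report its mean.
    a, b = 1 + heads, 1 + (flips - heads)
    print(f"posterior mean = {a / (a + b):.3f}")                     # 0.667
    # Same data, different questions: "how often would data this extreme
    # occur if the coin were fair?" versus "given the data and my prior,
    # what should I now believe about the coin?"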

Alan D McIntire
March 21, 2010 8:01 am

I think the major problem here is with the social sciences, especially medical studies. A p value of 1% means that if the characteristic you are measuring occurs randomly, you’ll get an event as rare as the one observed, or rarer, 1% of the time.
Then consider all the factors that can affect the medical testing of a food, drug, etc.: age, sex, weight, frequency of use. Combining a bunch of factors, you can easily get 1% results like:
“Women over 40 drinking over 5 cups of coffee per day reduce their chance of heart attacks by 1/3. That would happen less than 0.5% of the time by chance alone, so the results are significant.”
In actuality, the first test is really just a “fishing” expedition. Once the results are in, the next step is to run an independent test on the same factors. If you still get that 1% result, there may be something to it.
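
Alan’s “fishing expedition” point is easy to demonstrate with a sketch (pure noise by construction; the subgroup counts are invented):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n_subgroups, n_per_group = 500, 100

    false_positives = 0
    for _ in range(n_subgroups):
        # "Treated" and "control" outcomes drawn from the SAME distribution,
        # so any "effect" detected is spurious by construction.
        treated = rng.normal(0.0, 1.0, n_per_group)
        control = rng.normal(0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(treated, control)
        if p < 0.01:
            false_positives += 1

    print(f"{false_positives} of {n_subgroups} null comparisons "
          f"'significant' at p < 0.01")   # expect about 5
    # This is why the independent replication step matters: a second
    # test of the same factor rarely repeats the fluke.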

OceanTwo
March 21, 2010 9:01 am

steveta_uk (04:31:35) :
After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records, …

As have I, repeatedly. I also have access to huge quantities (GBytes) of raw process data.
I can make up random numbers, use actual process data, induce both negative and positive forcings, examine process data with harmful forcings, and add various feedback mechanisms. With the process data, a lot of it has the input data correlated with the output data.
What was quite interesting was that when you got below 500-odd data points, the cause/effect and process forcing was indistinguishable from random action.
True, I’m sure from a statistical standpoint this is completely meaningless, but when data presented with climate schience (sic) whitewashed on top matches the random burp of a computer program, you tend to suspect that the climate models aren’t analyzing data but generating it.
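
In the same spirit as steveta_uk’s experiment, here is a Python version of the exercise (my own sketch, not his C code): fit ordinary least-squares trends to trendless random walks and count how many look “significant”.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n_records, n_points = 200, 400

    t = np.arange(n_points)
    significant = 0
    for _ in range(n_records):
        # A random walk: each "year" is last year's value plus noise.
        record = np.cumsum(rng.normal(0.0, 0.1, n_points))
        slope, intercept, r, p, se = stats.linregress(t, record)
        if p < 0.05:
            significant += 1

    print(f"{significant}/{n_records} trendless random walks show a "
          f"'significant' OLS trend at p < 0.05")
    # Most of them 'pass': OLS assumes independent errors, and a random
    # walk violates that badly -- the unit-root issue raised upthread.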

March 21, 2010 9:14 am

Statistics can help support a hypothesis; they cannot be the basis of one.

Tad
March 21, 2010 9:54 am

So what are you going to do when you want to research something and you don’t have a gazillion dollars? You do what you can afford, get some preliminary results, and write up what you’ve found. But aha, others do not see a trend where you see one. How do you get everyone to agree whether there is or isn’t a trend? A statistical statement. Then at least people have a basis on which they can argue whether the stats were done right or not. And if they agree the stats are correct, then someone can pursue further research. This is but the first step in finding the truth, not the final determination. The final determination will be made when there is such a large body of data, or when the scientific theory is so solid, that stats are not needed. In the meantime, statistics can provide a guideline as to what areas of research to pursue or not.

Wren
March 21, 2010 10:02 am

Bill Tuttle (00:26:10) :
Wren (23:51:02) :
You take the no-change extrapolation, and I’ll take the predicted warming.
And I’ll sit back and adapt to whatever transpires.
I predict that it’ll get warmer over here in July than it was in February…
============
That’s not very bold of you, but I wasn’t referring to seasonal changes in temperature.
A bold prediction would be a prediction of no more global warming (i.e., a no-change extrapolation). I say bold because it doesn’t backcast well.

March 21, 2010 11:12 am


Wren (10:02:39) :

A bold prediction would …

Pls, sully not a thread containing some really good technical posts, references and so forth …

A C Osborn
March 21, 2010 11:43 am

VS, along with all the others, I say thanks for coming on here. I posted this question twice over on Bart’s “Global average temperature increase GISS HadCRU and NCDC compared” but haven’t had an answer from anyone.
Well nobody bothered to answer my question, so I will ask it again.
We all know that the Global Temperature Anomaly series is “Corrected”, “Celled”, “Averaged” and “Homogenised”.
Has anyone looked at a Raw Temperature Series to see if it exhibits the same Statistical characteristics?

March 21, 2010 12:28 pm

Wren (10:02:39) :
That’s not very bold of you…A bold prediction would be a prediction of no more global warming
There’s bold, and then there’s rash.
I haven’t survived four combat zones and two marriages by being rash.

Editor
March 21, 2010 12:29 pm

wayne (02:53:10)

Willis, I followed your method to detect discontinuities using Excel under “Tale of two cities”, it works great! Still don’t understand how the residual sums and the math beneath create such a graph, but it does work fine.

Thanks, wayne. Now if you’d be so kind, post that over on the “Tale of Two Cities” thread, Steve Goddard still doesn’t believe it.
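
For wayne’s lingering puzzlement about the residual sums: I don’t know the exact mechanics of the spreadsheet, but a common residual-sums approach to locating a discontinuity looks like this sketch: try every candidate break point, fit each segment separately, and take the break that minimizes the combined residual sum of squares.

    import numpy as np

    def sse(y):
        """Residual sum of squares around a fitted straight line."""
        t = np.arange(len(y))
        resid = y - np.polyval(np.polyfit(t, y, 1), t)
        return float(resid @ resid)

    def best_break(series, min_seg=10):
        """Candidate break index minimizing the two-segment residual SSE."""
        scores = {k: sse(series[:k]) + sse(series[k:])
                  for k in range(min_seg, len(series) - min_seg)}
        return min(scores, key=scores.get)

    # Synthetic demo: flat noisy record with a 1.0-unit step at index 60.
    rng = np.random.default_rng(11)
    y = rng.normal(0.0, 0.3, 120)
    y[60:] += 1.0
    print("estimated break index:", best_break(y))   # expect ~60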

Steve Goddard
March 21, 2010 1:28 pm

Willis,
I am glad that you and Leif like your spreadsheet. Nevertheless, the entire 1895-1941 “trend” you claim occurred during one year (1920), indicating a discontinuity which your spreadsheet missed.

Editor
March 21, 2010 3:13 pm

Steve Goddard (13:28:36)

Willis,
I am glad that you and Leif like your spreadsheet. Nevertheless, the entire 1895-1941 “trend” you claim occurred during one year (1920), indicating a discontinuity which your spreadsheet missed.

Trend 1895-1919 (not including 1920) 0.04°C/decade
Trend 1920-1941 (not including 1920) 0.09°C/decade
In other words, there is a trend both before and after 1920.
I’m afraid your eyeball has misled you again. There is a change in the trend in 1920, but a change in the trend != a discontinuity.
Also, please recall that your claim was that

The increase in temperatures started around 1970.

When I showed there was a trend post 1941, your new claim was that there was no trend pre 1941. But please note:
Your original thesis was that the trend only existed post-1970, and was driven by the differential post-1970 population growth.
That claim has been resoundingly disproven. We’re now discussing other issues about the temperature record.
But we should discuss this on the relevant thread. I have cross-posted this there. Thread drift, it burns …

Veronica (England)
March 21, 2010 3:22 pm

A very petite friend of mine was expecting a baby and was incensed to be told by her health visitor that there must be a problem with the pregnancy because her baby’s size-for-dates was “below average”.
When Joe Public has such a clear understanding of statistics, it is very easy to make them believe anything.

Steve Goddard
March 21, 2010 4:45 pm

Willis,
I don’t know where you are getting your numbers from. The trend from 1895-1919 is negative (-0.0056); the trend from 1920-1941 is 0.0097. This is nowhere near the 0.20 you originally claimed, or your reduced numbers above.
You are changing the subject of this discussion, which was the fact that your spreadsheet missed the discontinuity in 1920.
And I had already agreed that there was probably a post mid-1940s trend, which is when the population started to grow rapidly in Fort Collins – supporting the UHI thesis.