'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline comes directly from this article in Science News, an excerpt of which is posted below. I found this article interesting for two reasons. First, it challenges the use of statistical methods that have come into question in climate science recently, such as Mann’s tree ring proxy hockey stick and the Steig et al. statistical assertion that Antarctica is warming. Second, it pulls no punches in pointing out that an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal, also worth reading, here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals so weak that they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

[Figure: A P value is the probability of an observed (or more extreme) result arising only from chance. Credit: S. Goodman, adapted by A. Nandy]

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News

Onion
March 20, 2010 3:14 am

Great article. Flavour of the month in medicine is the QRisk score for assessing risk of CHD, similar to the Framingham risk score. Once all of my patient’s CHD risk factors are entered, the computer magically calculates the patient’s risk, expressed as a percentage, on the basis of which we decide whether or not to prescribe statins.
There are many flaws with this. Say a patient scores 20%. What is the accuracy of that score? We don’t know. In order to test the validity, we would have to do a prospective study recruiting a statistically significant number of patients with a QRisk of 20% and measuring their CHD outcomes over the following 10 years.
There are many other problems with the studies used to justify statin use for primary prevention of coronary heart disease. Giving patients a percentage risk score whose accuracy and confidence limits we have no idea of is pure voodoo medicine.

Allan M
March 20, 2010 3:17 am

“This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.”

“That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result.”

Someone ought to tell the IPCC that!
So, statistically, it’s impossible to distinguish a hockey stick from a pogo stick.
Good article, and a list of books to read.
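The “transposed conditional” error quoted above can be made concrete with a few lines of arithmetic. This is a hedged sketch with illustrative numbers of my own choosing (a 10% prior, 80% power), not figures from the article:

```python
# Illustrative assumptions: 10% of tested hypotheses are true,
# studies have 80% power, and the significance threshold is 0.05.
prior_true = 0.10   # P(hypothesis true) before the experiment
power = 0.80        # P(significant result | hypothesis true)
alpha = 0.05        # P(significant result | hypothesis false)

# Bayes' theorem: the probability the hypothesis is true GIVEN a
# significant result -- which is NOT 1 - alpha = 95%.
p_sig = power * prior_true + alpha * (1 - prior_true)
p_true_given_sig = power * prior_true / p_sig
print(round(p_true_given_sig, 2))  # 0.64, not 0.95
```

With these inputs a “95 percent certain” result is actually true only about 64 percent of the time: exactly the confusion between the odds of the result given the hypothesis and the odds of the hypothesis given the result.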

TerrySkinner
March 20, 2010 3:38 am

I have an amateur interest in archaeology and ancient history. Even here, the use and misuse of statistics is commonplace. For example, some seeds from a site might be radiocarbon tested. For whatever reason, it is commonplace to get a range, often a wide one, of dates from the samples tested.
To get a date for the site, either a mean is taken of all of the dates and published as the date of the site, or a few ‘anomalous’ dates are ignored and the mean of the rest is calculated.
This is all very understandable and is probably the best that can be done with the present imperfect science. But at most one of the readings can be the true date, possibly one of the anomalous ones, and all of the dates might be wrong.
Mathematically, it has always seemed to me like trying to measure the height of your son by averaging the heights of all of the other boys in his school.

dearieme
March 20, 2010 3:40 am

Rutherford did physics. His dictum is laughable in, say, agricultural research.
As for the shaky foundation of stats: yes, it’s reasonably well known, i.e. I’ve known for 40 years :-). But it doesn’t have much to do with the lousy (or even dishonest) methods that pollute Climate Science.

channon
March 20, 2010 3:54 am

Yes, pure math gives what appears to be the comfort of certainty, and although many pure scientists believe this to be absolutely true, most philosophers can show that no system of logic is both complete and consistent.
That being the case, that purity is only relatively true, and there is an element of uncertainty inherent in all calculations and proofs.
Statistics at least states from the start that it is dealing with probabilities and not absolutes.
I do agree with the author that bunching together a whole load of probablys from disparate research findings is just as likely to compound errors in conclusions as it is to diminish them.
However, there isn’t really any gold standard to work from that can be equally applied to all fields of enquiry, and he needs to get a little more comfortable with uncertainty.
Nor should there be. Man the toolmaker has to develop tools that will give results most of the time in every field of endeavour. Some are better than others but I’m not throwing my nutcracker away just because it won’t crack almonds. I hit them with it instead!

davide
March 20, 2010 3:57 am

one quotation missing is from Winston Churchill
“There are three types of lie: lies, damned lies and statistics!”

juandos
March 20, 2010 3:58 am

Hmmm, regardless of the numbers and quality of data sets how can one model something like climate if all the subtleties aren’t understood?
I’m sorry for asking what might be a seriously dumb question but I keep tripping over the butterfly effect…

Simon H
March 20, 2010 4:03 am

It’s a frustrating thing, to see a thick black line graph representing the average of a model’s predictions over 50 cycles. I think it should be a legal requirement to graph climate model projections in a more directly representative fashion. If you plot the result of a model you ran 50 times, all the plotted points of all the model’s run-time projections should be shown at 1/50th density.
The result will look like the end of a frayed piece of string, but it will at least be more visually representative of the confidence that should be placed in the model’s predictions. There would be nothing visually discernible at the right-hand side of a graph plotted this way, and that’s exactly what it should look like.

JimD
March 20, 2010 4:08 am

Stats can be excellent in the right disciplines and if deployed appropriately – without this branch of maths, agriculture could not have progressed nearly as far as it has. Natural biological variation needs appropriate tools to determine effects of treatments on yield, growth, etc.
In this case, we have the effective methods, which have been fully tested for 150+ years and are responsible for advances which feed billions of people. However, these tools depend on utterly scrupulous and trustworthy scientists not massaging data for their own theories and Lysenkoist delusions.
Dishonest pseudoscientists using fabricated and worthless techniques damage every area of science they contact. Climate fraud risks public confidence in so many respects – I’m disappointed in the likes of The Royal Society for being so utterly spineless in failing to defend their own constituencies.

tom wannamaker
March 20, 2010 4:09 am

“Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted.” … Matches are a good thing, but in the hands of children or the careless can burn the house down.
Thanks for the compendium of statistics quotes. Missing was one of my all-time favorites:
“There are three kinds of lies: lies, damned lies and statistics.”
-Benjamin Disraeli (attributed by Mark Twain)

Allen63
March 20, 2010 4:22 am

Very informative. It comports with my actual experiences over the course of my career. Knowing I was to become a scientist, I elected to take courses in experiment design, statistics, and computer programming. I thought I was “normal”. When I worked at NASA, I found I was one of the very, very few.
So often, I look at how statistics are used in climate related studies (even those I would like to support) and, without being able to put my finger on it, think — this has to be a misuse of statistics.

slow to follow
March 20, 2010 4:25 am

I think the author is too hard on “maths” rather than “incompetence”. As he states:
“Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals.”
…Sounds like a certain Canadian blogger!

jimp
March 20, 2010 4:26 am

The diagram of p values is OK, but one vital piece of information is left out. The p value is the probability of an observed (or more extreme) result arising only from chance but UNDER THE NULL HYPOTHESIS.
There is a venerable literature on the null hypothesis and a venerable tradition, followed by generations of weak students, of ignoring it and what methodologically it means.
The method of science (summarised by Popper’s idea of ‘conjectures and refutations’) is neatly caught by the concept of the null hypothesis but consideration of its proper use seems to have gone out of fashion in some quarters.
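jimp’s point can be checked by simulation: when the null hypothesis really is true, a correctly computed test statistic clears the 5% bar about 5% of the time, by construction. A minimal sketch (pure standard library; a crude z-test, not a claim about any particular published method):

```python
import math
import random
import statistics

random.seed(0)

def false_positive_rate(n=30, trials=2000):
    """Draw two samples from the SAME distribution (so the null is true
    by construction) and count how often a naive z-test calls them
    significantly different."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(z) > 1.96:  # the usual p < 0.05 threshold
            hits += 1
    return hits / trials

rate = false_positive_rate()
print(rate)  # close to 0.05: about 1 in 20 "findings" with no effect at all
```

That one-in-twenty rate is what the p value actually promises, and it only means anything relative to the null hypothesis the test assumed.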

March 20, 2010 4:26 am

What drives me nuts are statistical analyses of rare, unevenly distributed but probably not random events.
If there are only about 6 Category 5 hurricanes hitting the US mainland in a century, then you’re almost certainly going to get more in one half than in the other, even if it means nothing at all.
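The hurricane example above is easy to quantify. A hedged sketch (the six-per-century figure is taken from the comment; everything else is a simple coin-flip model of uniformly random timing):

```python
import random

random.seed(42)

def lopsided_fraction(events=6, trials=100_000):
    """Drop `events` hurricanes uniformly at random into a century and
    count how often one 50-year half gets more than the other."""
    unequal = 0
    for _ in range(trials):
        first_half = sum(random.random() < 0.5 for _ in range(events))
        if first_half != events - first_half:
            unequal += 1
    return unequal / trials

rate = lopsided_fraction()
print(rate)  # about 0.69: an uneven split is the norm, not a signal
```

So roughly two-thirds of purely random centuries look “front-loaded” or “back-loaded”, exactly as the comment warns.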

steveta_uk
March 20, 2010 4:31 am

After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records, based on a random +- 0.1 annual deviation from the previous year, centered around 15C, and with a bias factor that made the temperature drift towards 15C if it starts drifting away.
So this 15-minute job produced 10,000 years of temperature records which I imported into a spreadsheet and drew some pictures.
There’s basically a boring average close to 15, and lots of apparent noise between 13 and 17C. But zoom in a bit, and you see features like little ice ages, medieval warm periods, “hockey stick” features, and all sorts.
Apply some of the trend analysis functions to selected parts of the “noise” and they find all sorts of things.
And it’s all random.
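steveta_uk describes the experiment in C; below is a minimal Python transcription of the same idea. The pull-back strength (0.02 per degree of excursion) is my own assumption, since the comment doesn’t give the bias factor:

```python
import random

random.seed(1)

def fake_record(years=10_000, base=15.0, step=0.1, pull=0.02):
    """Each year the temperature drifts by a random amount in
    [-0.1, 0.1] C, with a weak bias nudging it back toward the
    15 C base line (so it can't wander off indefinitely)."""
    temps, t = [], base
    for _ in range(years):
        t += random.uniform(-step, step) - pull * (t - base)
        temps.append(t)
    return temps

rec = fake_record()
print(min(rec), max(rec), sum(rec) / len(rec))
```

Chart any window of the output and, as the comment says, you will find “warm periods”, “little ice ages” and hockey-stick upticks, all of them pure noise.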

Ron
March 20, 2010 4:44 am

A bit off topic but vaguely relevant. Next to this article on my computer this morning was an ad for the London Speakers Bureau advertising the services of Rajendra Pachauri. What words in the article could have prompted the link: “dirtiest secret”, “flimsy foundation” or “countless illegitimate findings”? I wonder.

Joe
March 20, 2010 4:44 am

KISS comes to mind… Keep It Simple, Stupid.
When you start to get into more complex math married to science, a scientist is not a mathematician.
At times, even simple math will lose people when you are trying to show something or prove a point.
My wife will go into a fog when I show her interesting points of science, as it is not her interest. Likewise, I’ll tune out when she talks about cooking and recipes.
We all have our different areas that will pique our interest or disinterest, which turns the foggy eyes on.

March 20, 2010 5:02 am

If using P-values correctly is so “problematical,” then perhaps the alternative, Bayesian statistics, should be considered.
E.T. Jaynes wrote a famous book, “Probability Theory: The Logic of Science,” and I note that the term “P-value” does not even appear in the index. Maybe the world would be better understood without P-values.

Shocked
March 20, 2010 5:11 am

This is why a growing number of young scientists are Bayesians.
One of the crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas under the frequentist view, a hypothesis is typically rejected or not rejected without directly assigning a probability.
http://en.wikipedia.org/wiki/Bayesian_probability
So the statistical approach with the “flimsy foundations” is known as the frequentist approach. The same one that says “the debate is over”: the hypothesis that GHGs are causing a dangerous linear warming has been accepted. By contrast, Bayesians (which most of us here are without even realising it) will work out how probable it is that the dangerous GHG hypothesis is correct, and compare this probability to those of competing climate theories.
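As a concrete (and entirely illustrative) sketch of “assigning a probability to a hypothesis”: compare a fair coin against one specific biased alternative after seeing 14 heads in 20 flips. The 50/50 prior and the 0.7 alternative are assumptions chosen for the example, not anything from the comment:

```python
from math import comb

def posterior_biased(heads, flips, prior_biased=0.5, p_biased=0.7):
    """P(coin is biased | data): a probability FOR the hypothesis itself,
    which a frequentist reject/don't-reject test never supplies."""
    like_fair = comb(flips, heads) * 0.5 ** flips
    like_bias = comb(flips, heads) * p_biased ** heads * (1 - p_biased) ** (flips - heads)
    num = like_bias * prior_biased
    return num / (num + like_fair * (1 - prior_biased))

print(round(posterior_biased(14, 20), 2))  # about 0.84
```

The output is a direct statement about the hypothesis given the data, rather than a statement about the data given a null hypothesis.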

richard
March 20, 2010 5:14 am

Agree strongly. When dealing with something as complex as human-drug interaction, even when you use statistics rigorously you can often get some very odd results, hence the almost weekly ‘wine is good/bad for you’ headlines.
To grace something as intangible as climatology with the moniker “science” is to give it a source of legitimacy that it really shouldn’t have. Maybe “climate prediction” or “climate educated guesswork” would be closer to the mark.

March 20, 2010 5:16 am

In almost anything related to human beings, direct experiment is not possible, so statistical analysis of measurements is all we have, and probably all we can have.
Sometimes we rely on worse than that. For example, the LNT theory that low-level radiation is harmful relies on absolutely no evidence whatsoever, whereas the opposing theory of hormesis (that at low levels it is good for you) has a considerable amount of statistical evidence in humans and an unquestioned base in experimental results on animals and plants. Once again we see real science being ignored for the sake of politics.

DocMartyn
March 20, 2010 5:24 am

Actually, the major problem is the ‘Gaussian’ assumption. One can only use statistics if you know the error distribution; typically scientists assume that their error distribution is Gaussian and use statistics (packages that they don’t understand) to define standard errors and confidence levels.
A large number of processes are not Gaussian; probably the majority of things measured in biology are not. In biology one tends to have bimodal populations where there is a significant overlap between members of the two populations.
It is rather like trying to find the average speed of human movement in New York: some are sedate (mean 0 mph), some people are walking (mean 4 mph), some are on the subway (mean 8 mph) and some are in a car (mean 11 mph). During the course of the day the average speed changes, but mostly due to the transition from one state to another. The movement of sub-populations between states is very difficult to test, but calculating the mean is easy.
The best thing to do is to experimentally populate or depopulate a state, negating the need for statistics. Clever experimental design is a lot better than clever statistics.
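DocMartyn’s New York example can be sketched numerically. The mph figures come from the comment; the sub-population weights and the 0.5 mph spread within each group are my own illustrative assumptions:

```python
import random
import statistics

random.seed(7)

# (mean mph, share of population) -- speeds from the comment, weights assumed
modes = [(0, 0.3), (4, 0.3), (8, 0.2), (11, 0.2)]

speeds = []
for mean_mph, weight in modes:
    n = int(10_000 * weight)
    speeds += [random.gauss(mean_mph, 0.5) for _ in range(n)]

avg = statistics.mean(speeds)
# fraction of people actually moving within 0.5 mph of the overall mean
near_avg = sum(abs(s - avg) < 0.5 for s in speeds) / len(speeds)
print(avg, near_avg)  # mean ~5 mph, yet almost nobody moves at ~5 mph
```

Calculating the mean is easy, but it describes almost no one in the mixture, which is exactly the point about multimodal data.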

steveta_uk
March 20, 2010 5:28 am

FatBigot (01:05:20) :
I think this depends on the nature of the problem, where an uncertain answer can be completely correct.
For example, try running across a busy fast-moving road. Once you’ve got across, the answer to “did a car kill me” is clearly “no”. Or else, if you failed to cross, the answer is clearly “yes”.
So if the question is “will I be killed running across a busy road” a probabilistic answer seems completely reasonable, but in your view the answer can only be “I don’t know”.
If the question is “is it dangerous to run across a busy road” then the answer is clearly “yes” despite the possibility of survival.

Simon H
March 20, 2010 5:34 am

richard (05:14:44) : “To grace something as intangible as climatology with the moniker “science” is to give it a source of legitimacy that it really shouldn’t have. Maybe “climate prediction” or “climate educated guesswork” would be closer to the mark.”
I concur. Climatology has far more in common with Astrology than Geology. Whoever would call Astrology a science? In the immortal words of Maureen Lipman in the 80s British Telecom TV ad, “You get an ‘ology’? You’re a SCIENTIST!” – Youtube: http://www.youtube.com/watch?v=vEfKEzX9QLE
:o)

Tom
March 20, 2010 5:53 am

Close enough, ‘IS’ government work… because they then have to review and update their findings of fact, which will then need to be kept from the public. It is all about the ‘O’ flow. All of this needs to take place first, before the scientists throw it all out to save ‘SPACE’. You know, they are moving into a bigger and better facility after they hold the ‘ACLUE Meeting’ in Madrid…