'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline is taken directly from this article in Science News, an excerpt of which I've posted below. I found the article interesting for two reasons. 1- It challenges the use of statistical methods that have recently come into question in climate science, such as Mann's tree ring proxy hockey stick and the Steig et al. statistical assertion that Antarctica is warming. 2- It pulls no punches in pointing out that an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal also worth reading here. I should make it clear that my position is not that we should discard statistics, but that we shouldn't over-rely on them to tease out signals that are so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

[Figure: P value. A P value is the probability of an observed (or more extreme) result arising only from chance. Credit: S. Goodman, adapted by A. Nandy]

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”
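
To put rough numbers on Ioannidis's point, here is a minimal back-of-the-envelope sketch in Python. The prior, power, and significance level below are illustrative assumptions of mine, not figures from his paper or from the article.

```python
# Illustrative arithmetic behind "most published findings may be false".
# All three numbers below are assumptions chosen only for the example.
prior_true = 0.10   # fraction of tested hypotheses assumed to be actually true
power      = 0.80   # assumed chance a real effect reaches statistical significance
alpha      = 0.05   # conventional false-positive rate when the null is true

true_positives  = prior_true * power          # real effects that come out "significant"
false_positives = (1 - prior_true) * alpha    # null effects that come out "significant"

share_false = false_positives / (true_positives + false_positives)
print(f"Share of 'significant' findings that are false: {share_false:.0%}")
# About 36% with these assumptions; publication bias and analytic
# flexibility push the share higher still.
```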

====================================

Read much more of this story here at Science News

238 Comments
Dan Lee
March 20, 2010 5:54 am

Willis Eschenbach (02:41:51)
Thank you for that. I got a warning in one of my stats classes about “fishing expeditions” through the data looking for results. I was told that I would always find -something- significant in the data somewhere if I tweaked my tests enough.
But finding significance on your umpteenth attempt misses the whole point: if your hypothesis was right and the test result was truly significant, it would have jumped out at you on the first try.
Each try after that is like doubling the size of your net and draining half the pond looking for that monster catfish that you just know is in there somewhere.
When you finally do find a minnow flopping around in the mud at the bottom of the drained pond, the stats will still show it to be significant, and guess what gets submitted for publication.
That’s why the point in the article about replication is so important. And perhaps why stonewalling on providing data for replication has been the biggest problem with climatology.
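
The fishing-expedition problem described above is easy to demonstrate with a short simulation. This sketch (my own illustrative code, not anyone's actual analysis) runs twenty significance tests on pure noise:

```python
# Twenty t-tests on pure noise: by chance alone, some will look "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_samples = 20, 30

p_values = []
for _ in range(n_tests):
    noise = rng.normal(size=n_samples)            # pure noise, true mean is zero
    _, p = stats.ttest_1samp(noise, popmean=0.0)  # does the sample mean differ from zero?
    p_values.append(p)

hits = sum(p < 0.05 for p in p_values)
print(f"{hits} of {n_tests} noise-only tests were 'significant' at p < 0.05")
# On average about 1 in 20 such tests clears the 5% threshold by chance, and the
# odds of at least one false hit across 20 tests are about 1 - 0.95**20, roughly 64%.
```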

PJB
March 20, 2010 5:55 am

Having used statistical methods in my research, I always distilled my results down to one aspect of analysis.
Signal to noise ratio.
If the effect could not be attributed to random fluctuations (no matter what cycle state was currently affecting the result), then it was a real effect and could be reproduced under all conditions.

Adrian Ashfied
March 20, 2010 5:58 am

First read “Odds are, it’s wrong” http://www.sciencenews.org/view/feature/id/57091/title/Odds_Are,_Its_Wrong That looks like the piece from which Anthony Watts got his excerpt.
Then, if you have more than half a day, follow the comments of VS on Bart’s blog here: http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1216
Interesting discussion until Tamino’s trolls join the fray.
See Bishop Hill’s comment on March 17th

NickB.
March 20, 2010 6:01 am

Anthony,
We should see if we could get VS to post an intro/overview of econometrics here. The real/worst issue with statistics and climate science is the picking and choosing of what methods to use – how many model tweaks do you think Mann went through before he got his Hockey Stick *just* right?
According to VS, CO2 and temperature don’t pass the first – mandatory – econometric test for correlation. In his field, if you don’t get past that, the theory is invalidated – no matter what fundamental theory underlies it or how similar/correlated the graphs look. *Real* statistics conflicts with *their* statistics – and advanced econometrics, especially applied to climate data sets, is much more appropriate than Statistics for Science I.
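
For readers curious what such a preliminary test looks like in practice, here is a rough sketch of an Augmented Dickey-Fuller unit-root check run on a synthetic random walk. Whether this is exactly the test VS has in mind is an assumption on my part, and the data below are made up purely for illustration.

```python
# Before regressing one trending series on another, test whether each
# contains a unit root; here an ADF test on a synthetic random walk.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=500))   # non-stationary: contains a unit root

adf_stat, p_value, *_ = adfuller(random_walk)
print(f"ADF statistic = {adf_stat:.2f}, p-value = {p_value:.3f}")
# A large p-value means we cannot reject the unit-root null, so a naive
# correlation or OLS fit against another trending series risks being spurious;
# differencing or cointegration methods are the usual next step.
```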

Brent Hargreaves
March 20, 2010 6:01 am

Juandos (3:58:07): You wrote: “Regardless of the numbers and quality of data sets how can one model something like climate if all the subtleties aren’t understood? I’m sorry for asking what might be a seriously dumb question but I keep tripping over the butterfly effect…”
Not dumb in the least. I am beginning to firm up these two ideas:
(i) That in many areas of modern life, qualitative understanding is being shouldered aside by quantitative methods, to the detriment of common sense and ‘feel’. The lawyer’s piece above (Fat Bigot, 1:05:20) is a perfect example.
(ii) That the warmists and the sceptics stand either side of a profound philosophical gulf. They are determinists, confident that the forecasts are founded on such solid science and such solid initial conditions that the future of the climate is more predictable than it actually is. We are Chaoticists, conscious of “known unknowns” and wondering whether there remain “unknown unknowns” yet to emerge.
I recently tried to discuss this philosophical divide with a bunch of warmists, but was labelled a know-nothing-numpty.

Allan M
March 20, 2010 6:02 am

FatBigot (01:05:20) :
To say there is a 25%, 50%, 75% or even 99% chance is to say “we don’t know”.
The answer can only be “yes 75% of the time but no 25% of the time” if one can identify that which causes three quarters of occasions to give an affirmative response and one quarter to give a negative, and that can only be done by further refining the external factors. One then has a more detailed analysis and a longer list of yeses and noes.

Which is what troubled Erwin Schrodinger. But with those little electromagnetic globules (technical term) they can’t do any better.
——————-
There are ways in which it is good to be a ‘frequentist,’ but not at my age.

March 20, 2010 6:03 am

Oh, I get it now: when natural scientists say they don’t trust statistics, they mean they don’t trust the use of quantitative probability techniques to prove hypotheses.
Daniel H’s quote from Schneider finds me (if I understand Schneider correctly) in sympathy with Schneider’s view (oh dear!) – which itself reminds me of a forgotten text by the young John Maynard Keynes, A Treatise on Probability 1921.
Keynes demolishes the frequency theory of probability as found from Bernoulli to Laplace. His critique of Laplace’s approach to probability (and stats generally) is marvelous:

It seemed to follow from the Laplacian doctrine that the primary qualification for one who would be well informed was an equally balanced ignorance. (p. 85)

It’s like: What are the chances of tossing heads? On ignorance/frequency grounds, it’s 1/2. OK, so now I have tossed heads 15 times, and you get a million bucks if you guess the 16th toss. On that knowledge, what do you choose? He notices that while the frequency doctrine prevailed in his time in science, it did not prevail in the field of insurance – where failure meant insolvency.
Keynes was a founder of modern economics (and not in the narrow way you might think from the term ‘Keynesian’) and one of the big things he advocated was statistical indicators – accurate public knowledge upon which investors could make decisions. Injecting this knowledge into the market would help to avoid panics and bubbles based on rumour and misinformation. So here we have a social scientist who would approve of this critique of the over-reliance on quantitative probability in the validation of hypotheses but also an advocate of making the primary data publicly available… I think this social scientist is on our side.
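
A toy calculation makes the 15-heads example concrete. The uniform prior and the conjugate Beta-Binomial update below are my own illustrative choices, not anything drawn from Keynes.

```python
# Start from "equally balanced ignorance" about the coin's bias and update
# on the evidence, using the standard Beta-Binomial conjugate update.
from scipy import stats

heads, tails = 15, 0
posterior = stats.beta(1 + heads, 1 + tails)   # uniform Beta(1,1) prior updated on the data

print(f"Posterior mean probability of heads: {posterior.mean():.2f}")
print(f"Probability the coin favours heads (p > 0.5): {1 - posterior.cdf(0.5):.6f}")
# The frequency answer "1/2" ignores what we have just seen; after 15 straight
# heads the posterior mean is about 0.94, which is why betting on heads for the
# 16th toss is the sensible choice the example points to.
```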

Capn Jack.
March 20, 2010 6:11 am

Pepsicology gives me hiccups.

Mike B
March 20, 2010 6:12 am

Oh gosh, who cares about statistics when the science is settled?

March 20, 2010 6:26 am

Holy cow, it is a silly article.
There is a substantial portion of science where work without statistics would be almost impossible – and be sure that you’re hearing this from a person who almost always used “non-statistical” arguments about everything. That people make errors or add their biases or misinterpret findings can’t reduce the importance of statistics. People make mistakes, misinterpretations, and distortions outside statistics, too.
The notion that statistics itself should be blamed for these human problems or that it is inconsistent because of them is preposterous. Even in the most accurate disciplines, like particle physics, it’s inevitable to work with statistics. It’s a large part of the job. And people usually don’t make flagrant errors because scientists in this discipline don’t suck.
One can have his opinions about the ideal methodology and/or required confidence level, but dismissing all of statistics is surely throwing the baby out with the bath water.

H.R.
March 20, 2010 6:38 am

channon (03:54:04) :
“Yes pure math gives what appears to be the comfort of certainty and although many pure scientists believe this to be absolutely true, most philosophers can show that no system of logic is both complete and consistent.
That being the case, that purity is only relatively true and there is an element of uncertainty inherent in all calculations and proofs. […]”
Sooo… when is it that 1 pebble + 1 pebble doesn’t equal 2 pebbles? Did I make an error in logic somewhere?

ShrNfr
March 20, 2010 6:39 am

You can never use a probability to prove a hypothesis. You can only use it to reject a hypothesis with a certain probability of being wrong. Further than that, the methods of sample survey are non-trivial. You cannot just take a batch of data from wherever you might get it and regard it as the full state of affairs. It represents a sample. Errors introduced by poor sample survey designs can overwhelm any purported results of the survey.

Stephen Skinner
March 20, 2010 6:40 am

There is an excellent book, “How to Lie with Statistics” by Darrell Huff, first published in 1954.
Chapter 5 is titled ‘The Gee-Whiz Graph’ and the writer goes on to describe how a graph that doesn’t have an impact can be made to have an impact. One way is to remove the space at the bottom from 0 to where the data makes an appearance on the graph. This immediately isolates any slope and makes it look more important. Next is to expand the scale on the left to increase the incline.
This book was written before AGW and I read it from an unrelated recommendation. So when I see the usual graphs of increasing CO2, I see that the lower section from 0 to where the data makes an appearance has been removed, and the scale has been expanded to ensure the gradient is steep. So what should the scale on the left be? I would have thought that as what we are measuring is parts per million, the scale should run to a million? I tried that and the problem was I couldn’t see the CO2 on the graph!
In addition we hear that CO2 has increased by 30%, but compared to what? I would argue that the CO2 content of the atmosphere as a whole has increased by 1% of 1%. Have I got this wrong?
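
Huff’s “Gee-Whiz Graph” trick is easy to reproduce. The sketch below plots the same made-up CO2-like series twice, once with a truncated axis and once on the full parts-per-million scale the commenter tried; the numbers are illustrative stand-ins, not actual Mauna Loa data.

```python
# Same series, two axes: the "gee-whiz" truncated view vs. a zero-based view.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1960, 2011)
co2_ppm = 315 + 1.5 * (years - 1960)             # rough linear stand-in for the record

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(years, co2_ppm)
ax1.set_ylim(310, 400)                           # truncated axis: the classic "gee-whiz" look
ax1.set_title("Truncated axis")

ax2.plot(years, co2_ppm)
ax2.set_ylim(0, 1_000_000)                       # full parts-per-million scale
ax2.set_title("Zero-based (per-million) axis")   # the trace all but vanishes, as the comment found

for ax in (ax1, ax2):
    ax.set_xlabel("Year")
    ax.set_ylabel("CO2 (ppm)")

plt.tight_layout()
plt.show()
```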

Max Hugoson
March 20, 2010 6:43 am

A little bit of fun here: Years ago I was listening to NPR (yes, I still do occasionally, but with a VERY jaundiced eye/ear these days! I’ve learned… how completely left-biased they are) and they gave a report on the attempts in N.J. to raise the “standard test scores” for high school graduates.
The announcer said this: “Despite 4 years of efforts, 50% of all students still fall below the mean on the standard tests…”
I almost drove off the road (I was in my car) I laughed so hard.
My thought: “We know which side of the MEAN this announcer fell on…”
Remember: 78% of all statistics are actually made up on the spot. (Like that one.)

JimB
March 20, 2010 6:47 am

Hopefully the dirty little secret will get much more exposure as things like this start popping up:
http://www.reuters.com/article/idUSN1916237120100319
JimB

geo
March 20, 2010 6:51 am

Interesting article, and proof that “auditing” is necessary when statistics are involved.
It is nice to get the perspectives in that article from scientists outside the climatology furball. How will the AGW proponents attack them – funded by Exxon-Mobil? Oh, I know… “associated with industries that are significant CO2 producers!” As if there are any that aren’t.

r
March 20, 2010 6:54 am

Alas, not even calculus is perfect:
I was devastated when I was shown Gabriel’s Horn:
Gabriel’s Horn is obtained by rotating the curve y = 1/x around the x axis for 1 ≤ x < ∞. Remarkably, the resulting surface of revolution has a finite volume and an infinite surface area. It is interesting to note that as the horn extends to infinity, the volume of the horn approaches π.
After having so many things in the world betray me, I thought at least math is perfect and truthful… I was crushed.
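
For reference (an editorial addition, not part of the comment), the standard calculus behind the finite-volume, infinite-surface claim is a short pair of integrals:

```latex
V = \pi \int_{1}^{\infty} \frac{dx}{x^{2}} = \pi,
\qquad
S = 2\pi \int_{1}^{\infty} \frac{1}{x}\sqrt{1 + \frac{1}{x^{4}}}\,dx
  \;\ge\; 2\pi \int_{1}^{\infty} \frac{dx}{x} = \infty .
```

The volume converges to π while the surface area diverges, exactly as the commenter says.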

r
March 20, 2010 6:58 am

I still say:
Show me the data,
Show me how you measured the data,
Show me how you manipulated the data,
and I will show you the truth.

R. Craigen
March 20, 2010 7:01 am

My favourite variation on the famous Mark Twain quote:
There are three kinds of liars: Liars, Outliers, and Out-and-out Liars.
Maybe we should amend Twain to say that there are three kinds of lies: Lies, Statistics and Climate Science.
Fun rhetoric aside, as a professional mathematician I have hated statistics. In my own studies I maneuvered my course choices at UBC to avoid the subject, and took my PhD in Pure Math. I was, therefore, not prepared when the first university that hired me asked me to teach Statistics. I guess they liked that so much they started giving me two statistics classes at a time, both introductory statistics and a second-year course in multivariate analysis of variance. The next place I taught also handed me a Stats load. Where I am now, Statistics is a separate department, and I like it that way.
All that said, I wish to pour a bit of cold water on the discussion by saying that modern statistics itself is not poorly founded but rather poorly understood or deliberately abused by many who would draft statistics to preconceived ends. This includes many in the “hard” sciences, and even more in the soft sciences, the humanities and medicine. What we need, however, is not to ridicule and curtail the use of statistics but more, and better, rigour in its use, and more scrutiny of how conclusions are drawn from it.
There are two kinds of statistics: Descriptive statistics (which is 100% accurate — it merely describes the characteristics of a data set, but may be used to “lie” simply by selective reporting) and Inferential statistics (in which sampled data is used to infer facts about unsampled data or predictions about the behavior of that which is sampled, such as weather or climate systems).
Inferential statistics is not wrong by virtue of having built-in uncertainty, but the uncertainty is an integral part of any conclusions one draws with it, and more rigour must be applied in the analysis of uncertainty than in the principal figures themselves. In inferential statistics there are more places where one can make errors (deliberate or accidental) and misinterpret or otherwise simply get conclusions wrong. This is why people like Steve McIntyre, who devote great energy and care to scrutinizing others’ use of statistics on matters of great importance, are heroes.
As Willis points out above, it is easy to create a surface impression that statistics is done right. A rule of thumb in the world of polling is that a properly randomized poll of roughly 1,000 citizens (almost regardless of the size of the population) on a binary question (will you vote X in the upcoming election) gives a result that is accurate to within 3%, 19 times out of 20. This means that the actual number reported (52% of voters sampled say they will vote X) can be expected to be within 3% of the actual proportion of the population (so in fact you can expect 49 to 55% with something like 95% certainty).
But this also means that, if you carried out this poll 20 times there is a very good chance that one of the 20 will give a number MORE than 3% from the actual proportion. This makes it easy to cherry-pick results to suit one’s desired conclusion (which necessitates strict controls over statistical methodology).
But this is not the only way to screw up results. As Anthony has well established in climate science, it is easy to deliberately or inadvertently introduce bias through systematic problems in sampling methodology. In my polling example, a telephone poll may be problematic: How are the telephone numbers selected? Will the fact that you won’t get responses from people unwilling to give time to answer telephone polls have an effect on your results (for example, what if you’re doing a consumer survey on how people feel about telemarketers)? A phone survey leaves out people who don’t have the financial means to own a phone, or who have no permanent address — not a good way to sample the homeless. A land-phone survey will miss those who rely only on cell phones, skype and internet-based phone services, a growing demographic, heavily skewed to young people and retirees. And so on.
It’s important to remember that figures don’t lie — but at the same time, liars can figure! And that’s the critical datum here.
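
To check the polling rule of thumb above, here is a minimal sketch with my own illustrative numbers: the 95% margin of error for a sample proportion, and a simulation of how often one of twenty honest polls still lands outside it, which is what makes cherry-picking possible.

```python
# Margin of error for a proportion, plus a simulation of 20 honest polls.
import math
import numpy as np

n, p = 1000, 0.52
margin = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"95% margin of error for n={n}: +/- {margin:.1%}")   # about +/- 3 points

rng = np.random.default_rng(2)
polls = rng.binomial(n, p, size=20) / n                      # 20 honest polls of the same electorate
outside = np.sum(np.abs(polls - p) > margin)
print(f"{outside} of 20 simulated polls fell outside the stated margin")
# Roughly 1 in 20 will, purely by chance; report only that one and the
# "result" looks very different from the truth.
```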

March 20, 2010 7:01 am

Stephen Skinner (06:40:28),
Here’s a good example: click
Another: click
John Daly’s chart: click
And the original alarming CO2 chart: click

Basil
Editor
March 20, 2010 7:02 am

Leif Svalgaard (01:48:47) :
It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation
Clearly not written by a scientist. We validate a hypothesis by its predictions or explanatory power or even ‘usefulness’ [even if actually not the correct one – e.g. the Bohr atom]. Statistics is only used as a rough guide to whether the result is worth looking into further. Now, if a prediction has been made, statistics can be used as a rough gauge of how close to the observations the prediction came, but the ultimate test is if the predictions hold up time after time again. This is understood by scientists, but often not by Joe Public [his dirtiest secret perhaps 🙂 ].

Leif, you are too much the idealist here. You are exactly right, of course, in the role of prediction or explanatory power or utility in “validating” hypotheses. (Be careful, though. We do not “validate” hypotheses; strictly speaking we fail to invalidate them. But I know you know that.)
But did you read the entire article? It is actually quite good, and is calling attention to something that is quite true: the misuse and frequent misunderstanding of the results of statistical tests.
Let’s apply this discussion to the IPCC exercise. What, exactly, are they predicting? As I understand it, nothing. So how is that science?
More pointedly, I’d like to see some serious discussion of the IPCC’s use of statistics in its “Treatment of Uncertainty.” They take the outcome of statistical tests and interpret them in this way:
“Where uncertainty in specific outcomes is assessed using expert judgment and statistical analysis of a body of evidence (e.g. observations or model results), then the following likelihood ranges are used to express the assessed probability of occurrence: virtually certain >99%; extremely likely >95%; very likely >90%; likely >66%; more likely than not >50%; about as likely as not 33% to 66%; unlikely <33%; very unlikely <10%; extremely unlikely <5%; exceptionally unlikely <1%.”

I think there is indeed a “dirty little secret” here that the article we’re discussing exposes. The IPCC “Treatment of Uncertainty” is based on the fallacy of the “transposed conditional.” When a paper publishes a statistical result with, say, a 95% “level of significance,” that doesn’t mean that there is a 95% likelihood that the finding is “true.” Properly constructed, it means that there is a 95% likelihood that the finding is “not not true” or “not false.” In statistics (conditional reasoning), the fact that something is “not false” does not make it “true.” Or, to use the terminology of the IPCC, the fact that there is a 95% probability that something is “not unlikely” does not mean that it is 95% likely.

Formally, the problem here is that in most cases we cannot properly state the conditions under which something is likely to be true. So we cannot say, with any degree of measurable certainty, whether or not our results were a fluke. All the “null hypothesis” can do is assign a probability to the likelihood that our results are not a fluke. But if the results are a fluke, the fact that it was that one time in twenty is little consolation. And since it only takes one fluke to disprove an hypothesis, we don’t stop with one test. We keep testing, and testing. Or, as you put it, we use our 95% (or whatever) statistically significant result only “as a rough guide to whether the result is worth looking into further.”

And I agree further that the goal is a prediction, and that prediction cannot be validated or invalidated by statistics. It is validated or invalidated by observation. Either the prediction holds true, or it does not.
When I started out doing statistical research for publication (back in the 1990’s), a 95% level of significance was a kind of threshold for concluding the possibility of a meaningful inference (in my field, at least). A 90% level of significance was considered a weak finding in support of an hypothesis. I think this was an intuitive way of avoiding the “fallacy of the transposed conditional.” So when I see the IPCC treating >66% as “likely” I find it revolting. And yet these are so-called scientists!
Give the author a little slack. He’s calling attention to a real problem. He may have not said everything exactly the way you would have, but that doesn’t mean everything he said is unworthy of your consideration.

Basil
Editor
March 20, 2010 7:03 am

I hit the post button too quickly.
1) When I bold Leif’s “rough guide” comment, I’m supplying the emphasis.
2) I started out publishing in the 1970’s, not 1990’s.

Curiousgeorge
March 20, 2010 7:06 am

This has been an issue between scientists and statisticians for a long time. For anyone who wishes to explore the issue from the statistical side, I’ll recommend the American Statistical Association http://www.amstat.org/ . There have been many papers and discussions of climate and the use of statistics therein. Suggest searching the site for “climate” and/or “Bayes + climate”.
In addition there is a meeting coming up on this very issue:
Richard L. Smith, L. Mark Berliner, and Peter Guttorp, the authors of the article: Statisticians Comment on Status of Climate Change Science in the March issue of Amstat News, are going to discuss their article online LIVE from noon to 1 p.m. EST on Wednesday, March 31. Be sure to check back here or Amstat News online on the 31st for the link!
That said, there is also considerable disagreement within the statistics community about philosophy and methods used to analyze climate data. Bayesian vs. Orthodox, for example.
My own personal philosophy is: If you ask a fuzzy question, you’ll get a fuzzy answer.

Duke C.
March 20, 2010 7:06 am

The EPA has been quite successful manipulating statistics for political purposes:
EPA ’93 Study on Second Hand Smoke
No doubt their success with the secondhand smoke meta-study in 1993 paved the way for their recent CO2 finding.

Blackbarry
March 20, 2010 7:14 am

FatBigot (01:05:20) :
“To say there is a 25%, 50%, 75% or even 99% chance is to say “we don’t know”. ”
I believe that the existence of the classical “path” can be pregnantly formulated as follows: The “path” comes into existence only when we observe it.
–Heisenberg, in uncertainty principle paper, 1927
The more precisely the position is determined, the less precisely the momentum is known in this instant, and vice versa.
–Heisenberg, uncertainty paper, 1927
Although Heisenberg was referring to the character of particles in the subatomic world, he shed a great light on our scientific conundrum. We can determine the path of past climate by ferreting out historical evidence in tree rings, sediments, ice cores, etc. But that path did not exist until we looked for it and created it. The path does not exist as a physical entity because it is simply a human creation. As we try to accurately pin down the exact conditions that exist today, we lose our ability to see where we are going. It’s as if we assume there is a linearity to climate and we simply need to draw the future path from the direction of the past + current data. But Heisenberg points out that you can’t know both at the same time!
Despite the Heisenberg uncertainty principle, the understanding of quantum mechanics has given us computers, cell phones, etc. We can function quite well in a world of uncertainty. All the statistics do is measure the degree of uncertainty in a chaotic world. As Einstein demonstrated, “It’s all relative”.