In the wake of what Willis recently pointed out from Nassim Taleb, that “In fact errors are so convex that the contribution of a single additional variable could increase the total error more than the previous one,” I thought it relevant to share this evisceration of the over-reliance on statistical techniques in science, especially since our global surface temperature record is entirely a statistical construct.
Excerpts from the Science News article by Tom Siegfried:
Science is heroic. It fuels the economy, it feeds the world, it fights disease. Sure, it enables some unsavory stuff as well — knowledge confers power for bad as well as good — but on the whole, science deserves credit for providing the foundation underlying modern civilization’s comforts and conveniences.
But for all its heroic accomplishments, science has a tragic flaw: It does not always live up to the image it has created of itself. Science supposedly stands for allegiance to reason, logical rigor and the search for truth free from the dogmas of authority. Yet science in practice is largely subservient to journal-editor authority, riddled with dogma and oblivious to the logical lapses in its primary method of investigation: statistical analysis of experimental data for testing hypotheses. As a result, scientific studies are not as reliable as they pretend to be. Dogmatic devotion to traditional statistical methods is an Achilles heel that science resists acknowledging, thereby endangering its hero status in society.
More emphatically, an analysis of 100 results published in psychology journals shows that most of them evaporated when the same study was conducted again, as a news report in the journal Nature recently recounted. And then there’s the fiasco about changing attitudes toward gay marriage, reported in a (now retracted) paper apparently based on fabricated data.
But fraud is not the most prominent problem. More often, innocent factors can conspire to make a scientific finding difficult to reproduce, as my colleague Tina Hesman Saey recently documented in Science News. And even apart from those practical problems, statistical shortcomings guarantee that many findings will turn out to be bogus. As I’ve mentioned on many occasions, the standard statistical methods for evaluating evidence are usually misused, almost always misinterpreted and are not very informative even when they are used and interpreted correctly.
Nobody in the scientific world has articulated these issues more insightfully than psychologist Gerd Gigerenzer of the Max Planck Institute for Human Development in Berlin. In a recent paper written with Julian Marewski of the University of Lausanne, Gigerenzer delves into some of the reasons for this lamentable situation.
Above all, their analysis suggests, the problems persist because the quest for “statistical significance” is mindless. “Determining significance has become a surrogate for good research,” Gigerenzer and Marewski write in the February issue of Journal of Management. Among multiple scientific communities, “statistical significance” has become an idol, worshiped as the path to truth. “Advocated as the only game in town, it is practiced in a compulsive, mechanical way — without judging whether it makes sense or not.”
Commonly, statistical significance is judged by computing a P value, the probability that the observed results (or results more extreme) would be obtained if no difference truly existed between the factors tested (such as a drug versus a placebo for treating a disease). But there are other approaches. Often researchers will compute confidence intervals — ranges much like the margin of error in public opinion polls. In some cases more sophisticated statistical testing may be applied. One school of statistical thought prefers the Bayesian approach, the standard method’s longtime rival.
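To make that definition of a P value concrete, here is a minimal sketch of a permutation test in Python. It is only an illustration, not a method taken from the Science News article: the function name `permutation_p_value`, the made-up drug and placebo numbers, and the choice of a permutation test are all my own assumptions. The idea is that if the drug made no difference, the group labels would be interchangeable, so we shuffle the labels many times and count how often the shuffled difference is at least as large as the one actually observed.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Estimate a two-sided P value for a difference in group means.

    Hypothetical helper for illustration only. It estimates the probability
    of seeing a difference at least as extreme as the observed one if the
    group labels carried no information (the "no difference" hypothesis).
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign the labels at random, as if treatment did nothing.
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented numbers, purely for demonstration.
    drug = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
    placebo = [4.2, 4.0, 4.5, 4.1, 4.4, 4.3]
    print("estimated P value:", permutation_p_value(drug, placebo))
```

Note what the number does and does not say: it is the probability of data this extreme assuming no effect, not the probability that the "no effect" hypothesis is true. Confusing those two is one of the misinterpretations the article complains about.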
Why don’t scientists do something about these problems? Contrary motivations! In one of the few popular books that grasp these statistical issues insightfully, physicist-turned-statistician Alex Reinhart points out that there are few rewards for scientists who resist the current statistical system.
“Unfortunate incentive structures … pressure scientists to rapidly publish small studies with slapdash statistical methods,” Reinhart writes in Statistics Done Wrong. “Promotions, tenure, raises, and job offers are all dependent on having a long list of publications in prestigious journals, so there is a strong incentive to publish promising results as soon as possible.”
And publishing papers requires playing the games refereed by journal editors.
“Journal editors attempt to judge which papers will have the greatest impact and interest and consequently those with the most surprising, controversial, or novel results,” Reinhart points out. “This is a recipe for truth inflation.”
Scientific publishing is therefore riddled with wrongness.
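The “truth inflation” Reinhart describes can be sketched with a small simulation. This is my own illustration under stated assumptions, not code from Statistics Done Wrong: the function name `simulate_truth_inflation`, the true effect of 0.2 standard deviations, the 20 subjects per group, and the simple z test with known unit variance are all hypothetical choices. Many small studies of the same modest effect are run, and only those reaching P < 0.05 are “published.”

```python
import random
from statistics import mean


def simulate_truth_inflation(true_effect=0.2, n_per_group=20,
                             n_studies=5000, seed=1):
    """Compare the average effect in all studies with the "published" ones.

    Hypothetical illustration: every study measures the same true effect,
    but only studies with a two-sided P value below 0.05 get published.
    """
    rng = random.Random(seed)
    # Standard error of a difference in means, assuming unit variance
    # in both groups (a simplifying assumption).
    standard_error = (2.0 / n_per_group) ** 0.5

    all_estimates = []
    published = []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        estimate = mean(treated) - mean(control)
        all_estimates.append(estimate)
        # |z| > 1.96 corresponds to a two-sided P value below 0.05.
        if abs(estimate) / standard_error > 1.96:
            published.append(estimate)

    published_mean = mean(published) if published else float("nan")
    return mean(all_estimates), published_mean, len(published) / n_studies


if __name__ == "__main__":
    all_mean, published_mean, fraction = simulate_truth_inflation()
    print("true effect:               0.2")
    print("mean effect, all studies: ", round(all_mean, 3))
    print("mean effect, published:   ", round(published_mean, 3))
    print("fraction published:       ", round(fraction, 3))
```

Under these assumptions, a small study can only clear the significance bar when it happens to overestimate the effect, so the published average sits well above the true value even though no individual study did anything dishonest.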
Read all of part 1 here
Excerpts from part 2:
Statistics is to science as steroids are to baseball. Addictive poison. But at least baseball has attempted to remedy the problem. Science remains mostly in denial.
True, not all uses of statistics in science are evil, just as steroids are sometimes appropriate medicines. But one particular use of statistics — testing null hypotheses — deserves the same fate with science as Pete Rose got with baseball. Banishment.
Numerous experts have identified statistical testing of null hypotheses — the staple of scientific methodology — as a prime culprit in rendering many research findings irreproducible and, perhaps more often than not, erroneous. Many factors contribute to this abysmal situation. In the life sciences, for instance, problems with biological agents and reference materials are a major source of irreproducible results, a new report in PLOS Biology shows. But troubles with “data analysis and reporting” are also cited. As statistician Victoria Stodden recently documented, a variety of statistical issues lead to irreproducibility. And many of those issues center on null hypothesis testing. Rather than furthering scientific knowledge, null hypothesis testing virtually guarantees frequent faulty conclusions.
10. Ban P values
9. Emphasize estimation
8. Rethink confidence intervals
7. Improve meta-analyses
6. Create a Journal of Statistical Shame
5. Better guidelines for scientists and journal editors
4. Require preregistration of study designs
3. Promote better textbooks
2. Alter the incentive structure
1. Rethink media coverage of science
Read the reasoning behind the list in part 2 here
I would add one more to that top 10 list:
0. Ban the use of the word “robust” in science papers.
Given what we’ve just read here and from Nassim Taleb, and since climate science in particular seems to love that word in papers, I think “robust” is nothing more than a projection of ego from the author(s) of many climate science papers, and not a supportable statement of statistical confidence.
One other point: a paragraph in part 1 from Tom Siegfried said this:
For science is still, in the long run, the superior strategy for establishing sound knowledge about nature. Over time, accumulating scientific evidence generally sorts out the sane from the inane. (In other words, climate science deniers and vaccine evaders aren’t justified by statistical snafus in individual studies.) Nevertheless, too many individual papers in peer-reviewed journals are no more reliable than public opinion polls before British elections.
That ugly label about climate skeptics mars an otherwise excellent article about science. It also suggests Mr. Siegfried hasn’t really looked into the issue with the same questioning (i.e. skepticism) that he applied to the abuse of statistics.
Should Mr. Siegfried read this, I’ll point out that many climate skeptics became climate skeptics once we started examining some of the shoddy statistical methods that were used, or outright invented, in climate science papers. The questionable statistical work of Dr. Michael Mann alone (coupled with the unquestioning media hype) has created legions of climate skeptics. Perhaps Mr. Siegfried should spend some time looking at the statistical critiques done by Stephen McIntyre, and tell us how things like a single tree sample, upside-down data, or pre-screened data beget “robust” climate science before he uses the label “climate deniers” again.