The replication crisis in science has just begun. It will be big.
By Larry Kummer. From the Fabius Maximus website.
Summary: After a decade of slow growth out of sight, the replication crisis in science is breaking into public view. It began in psychology and biomedical studies and is now spreading to many other fields, overturning what we were told is settled science, the foundations of our personal behavior and public policy. Here is an introduction to the conflict (there is pushback), with the usual links to detailed information at the end, and some tentative conclusions about effects on the public’s trust in science. It’s early days yet, with the real action yet to begin.
“Men only care for science so far as they get a living by it, and that they worship even error when it affords them a subsistence.”
— Goethe, from Conversations of Goethe with Eckermann and Soret
Mickey Kaus referred to undernews as those “stories bubbling up from the blogs and the tabs that don’t meet MSM standards.” More broadly, it refers to information which mainstream journalists pretend not to see. By mysterious processes it sometimes becomes news. A sufficiently large story can mark the next stage in a social revolution. Game, the latest counter-revolution to feminism, has not yet reached that stage. The replicability crisis of science appears to be doing so, breaching like a whale from the depths of the sea in which it has silently grown.
See these powerful articles in the past month about the crisis. The first four discuss egregious failures of scientific institutions — with large public policy consequences; the last two are among the few articles describing this crisis for a general audience.
- “A Study on Fats That Doesn’t Fit the Story Line” by the NYT, looking at the long-hidden research suggesting that animal fats are not worse than vegetable fats. See #12 below for links to these studies.
- “The sugar conspiracy” by Ian Leslie in The Guardian — “In 1972, a British scientist sounded the alarm that sugar – and not fat – was the greatest danger to our health. His findings were ridiculed and his reputation ruined. How did the world’s top nutrition scientists get it so wrong for so long?”
- “How scientists fell in and out of love with the hormone oxytocin” by Brian Resnick at VOX — “Scientists believed a whiff of the chemical could increase trust between humans. Then they went back and checked their work.”
- “Cancer Research Is Broken” by Daniel Engber at Slate — “There’s a replication crisis in biomedicine — and no one even knows how deep it runs.”
- “Big Science is broken” by Pascal-Emmanuel Gobry at The Week.
- Best so far: “Scientific Regress” by William A. Wilson at First Things.
This crisis emerged a decade ago as problems in a few fields, especially health care and psychology. Slowly similar problems emerged in other fields, usually as failures to replicate widely accepted research. Even economics, with its high standards for transparency, has been hit. The landmark 2010 paper “Growth in a Time of Debt” by Harvard professors Carmen Reinhart and Kenneth Rogoff, used to justify austerity policies in scores of nations, was found to have serious errors in its spreadsheets. Even physics has been affected, as William Wilson notes:
“Two of the most vaunted physics results of the past few years — the announced discovery of both cosmic inflation and gravitational waves at the BICEP2 experiment in Antarctica, and the supposed discovery of superluminal neutrinos at the Swiss-Italian border — have now been retracted, with far less fanfare than when they were first published.” {See this about the former and this about the latter.}
By now it’s obvious that there is a structural problem in modern science, a deterioration of the always sloppy (as with most social processes) self-correcting dynamics of institutional research. Only small scale research has been conducted so far, so we do not know how broadly and deeply this dysfunction extends. The available evidence suggests that “large” is the most likely answer.
The stakes are almost beyond imagination. It’s not just a matter of time and money wasted when bad studies send research down blind alleys. Science is one of our best ways to see the world, and effective public policy requires reliable research on scores of subjects, from health care to climate change. Trillions of dollars, the world’s rate of economic growth, and the health of billions can be affected.
Actions and resistance
Talk precedes action, and there have been several high-level conferences about this crisis, such as the February 2014 workshop by the Subcommittee on Replicability in Science, part of the Advisory Committee to the NSF Directorate for Social, Behavioral, and Economic Sciences. They produced this typically thorough report: “Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science”.
Journalists describe the replication crisis as a “Whig history” — another step in the inevitable evolution and perfection of science. They seldom mention the scientists — and science institutions — resisting reforms, making the outcome uncertain (here’s an example in social psychology). This hidden side of the crisis is described by David Funder (Prof of psychology, UC-Riverside) at his website.
“It’s not just – or even especially – about psychology. I was heartened to see that the government representatives saw the bulk of problems with replication as lying in fields such as molecular biology, genetics, and medicine, not in psychology. Psychology has problems too, but is widely viewed as the best place to look for solutions since the basic issues all involve human behavior.
“It makes me a bit crazy when psychologists say (or sometimes shout) that everything is fine, that critics of research practices are “witch hunting,” or that examining the degree to which our science is replicable is self-defeating. Quite the reverse: psychology is being looked to as the source of the expertise that can improve all of science. As a psychologist, I’m proud of this.
Backlash and resistance.
“This issue came up only a couple of times and I wish it had gotten more attention. It seemed like nobody at the table (a) denied there was a replicability problem in much of the most prominent research in the major journals or (b) denied that something needed to be done. As one participant said, “we are all drinking the same bath water.” … {But} there will be resistance out there. And we need to watch out for it.
“…One of Geoff Cumming’s graduate students, Fiona Fidler, recently wrote a thesis on the history of null hypothesis significance testing {NHST}. It’s a fascinating read and I hope will be turned into a book soon. One of its major themes is that NHST has been criticized thoroughly and compellingly many times over the years. Yet it persists, even though – and, ironically, perhaps because – it has never really been explicitly defended! Instead, the defense of NHST is largely passive. People just keep using it. Reviewers and editors just keep publishing it; granting agencies keep giving money to researchers who use it. Eventually the critiques die down. Nothing changes.
“That could happen this time too. The defenders of the status quo rarely actively defend anything. They aren’t about to publish articles explaining why NHST tells you everything you need to know, or arguing that effect sizes of r = .80 in studies with an N of 20 represent important and reliable breakthroughs, or least of all reporting data to show that major counter-intuitive findings are robustly replicable. Instead they will just continue to publish each others’ work in all the “best” places, hire each other into excellent jobs and, of course, give each other awards. This is what has happened every time before.
“Things just might be different this time. Doubts about statistical standard operating procedure and the replicability of major findings are rampant across multiple fields of study, not just psychology. And, these issues have the attention of major scientific studies and even the US Government. But the strength of the resistance should not be underestimated.”
Conclusions
“But what a weak barrier is truth when it stands in the way of an hypothesis!”
By Mary Wollstonecraft, in A Vindication of the Rights of Woman (1792).
This just touches on the many dimensions of the replication crisis. For example, there is the large and growing literature about the misuse of statistics — and the first steps to understanding the various causes of replication failure (almost certainly from structural issues, perhaps common to many or all sciences today).
We can only guess at how many of the sciences have serious problems with replication, and at the methodological problems that produce it. This might be one of the greatest challenges to science since the backlash to Darwin’s theory of evolution. Depending on the extent of the problem and the resistance of institutions to reform, this might become the largest challenge since the Roman Catholic Church’s assault in the 16th and 17th centuries, putting the works of famous scientists on the Index Librorum Prohibitorum (e.g., Copernicus, Kepler, Galileo). But this time the problems are within, not external to, science.
The likely (but not certain) eventual results are reforms which strengthen the institutions of science, but the crisis might have severe side-effects, such as a loss in public confidence. America has long had a rocky relationship with science, from the 1925 Scopes “Monkey Trial” about evolution to the modern climate wars. With our confidence in our institutions so low and falling, news about replication failures in “settled science” might affect the public’s willingness to trust scientists. This might take a long time to heal.
Many sciences are vulnerable, but climate science might become the most affected. It combines high visibility, a central role in one of our time’s major public policy questions, and a frequent disregard for the methodological safeguards that other sciences rely upon.
Watch for news developments in this important story.
To learn more about this crisis in science
Some early articles about the crisis
- “Most scientific papers are probably wrong“, Kurt Kleiner, New Scientist, 30 August 2005.
- “Replication studies: Bad copy” by Ed Yong in Nature, 16 May 2012 — “In the wake of high-profile controversies, psychologists are facing up to problems with replication.”
- “How science goes wrong: Scientific research has changed the world. Now it needs to change itself“, The Economist, 19 October 2013.
- An excellent intro to the subject: “The Replication Crisis in Psychology” by Edward Diener and Robert Biswas-Diener, NOBA, 2016.
Some of the many papers about the replication crisis
- An early warning that something was amiss: “Problems With Null Hypothesis Significance Testing (NHST)” by Jeffrey A. Gliner et al in The Journal of Experimental Education, 2002 — “The results show that almost all of the textbooks fail to acknowledge that there is controversy surrounding NHST.”
- “Why Most Published Research Findings Are False“, John P. A. Ioannidis, Public Library of Science Medicine, 30 August 2005.
- “Statistical errors in medical research – a review of common pitfalls” by Alexander M. Strasak et al, Swiss Medical Weekly, 27 January 2007: “Standards in the use of statistics in medical research are generally low. A growing body of literature points to persistent statistical errors, flaws and deficiencies in most medical journals.”
- “What errors do peer reviewers detect, and does training improve their ability to detect them?” by Sara Schroter et al in the Journal of the Royal Society of Medicine, 1 October 2008: Showed massive failure of peer review on a deliberately flawed paper submitted to the British Medical Journal.
- “Reliability of ‘new drug target’ claims called into question“, Brian Owens, Nature, 5 September 2011: An internal study at Bayer found that in only 14 of 67 target-validation projects did results match the published findings. These projects covered the majority of Bayer’s work in oncology, women’s health and cardiovascular medicine over the past 4 years. See the paper: “Reliability of ‘new drug target’ claims called into question“, Asher Mullard, Nature Reviews Drug Discovery,
- “Academic bias & biotech failures” at Life Sci VC, 28 March 2011 — “The unspoken rule is that at least 50% of the studies published even in top tier academic journals – Science, Nature, Cell, PNAS, etc… – can’t be repeated with the same conclusions by an industrial lab.”
- “In cancer science, many “discoveries” don’t hold up“, Reuters, 28 March 2012 — About Amgen’s study, “Drug development: Raise standards for preclinical cancer research” by C. Glenn Begley and Lee M. Ellis in Nature, 28 March 2012. They tested 53 “landmark” papers about cancer; 47 could not be replicated.
- “Weak statistical standards implicated in scientific irreproducibility” by Erika Check Hayden, Nature, 11 November 2013 — “One-quarter of studies that meet commonly used statistical cutoff may be false.” About “Revised standards for statistical evidence” by Valen E. Johnson in PNAS, 26 November 2013.
- “Estimating the reproducibility of psychological science” by the Open Science Collaboration, Science, 28 August 2015. Part of The Reproducibility Project: Psychology of the Open Science Foundation.
- “Records found in dusty basement undermine decades of dietary advice” by Sharon Begley at STAT, 12 April 2016. — Powerful but unpublished studies decisively refuted the consensus belief about dangers of animal fats. They were probably unpublished because they contradicted the ruling paradigm. The NYT also covered this. See these two papers in the British Medical Journal: “Re-evaluation of the traditional diet-heart hypothesis: analysis of recovered data from Minnesota Coronary Experiment (1968-73)“, 12 April 2016 — and “Use of dietary linoleic acid for secondary prevention of coronary heart disease and death: evaluation of recovered data from the Sydney Diet Heart Study {1966-73} and updated meta-analysis“, 5 February 2013.
- Investigating Variation in Replicability: A “Many Labs” Replication Project by the Open Science Collaboration. See a summary at National Geographic.
- List of replication attempts in psychological research. Many failed.
- The master website for anyone interested in this subject: Retraction Watch.
Thanks for the great review of the issues. I will keep this and send it to friends (and enemies) when the subject comes up. There’s a lot of useful reading to do here.
The problem is that the output of institutional organizations funded by government becomes political.
There are so many reasons for the replication crisis that there won’t be a simple, single solution that can be used across all of science.
1. Confirmation bias (unconscious; benign)
2. Lack of reporting of negative findings
3. Poor knowledge, or deliberately improper use, of statistics
4. Political bias (conscious yet not publicly acknowledged; malicious) among researchers and funding agencies
5. Overhyped reporting of findings
6. Money (see 1., 2., 4., and 5. above)
In many cases, it’s not a big deal. If somebody gets something wrong in astronomy, oh well. But when somebody gets something wrong in medicine, people die. Medicine is the rock on which this problem is going to be broken wide open. My gut tells me it’s mainly a problem of too many effects being searched for in too few samples, which results in misleading statistical results, combined with lack of reporting of negative results. All these problems can be dealt with, because medicine is basically using controlled experiments (though often with small sample sizes). It will require real courage to reform the publication process in medicine, but I believe most doctors and researchers are acting in good faith and that reform can happen.
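The worry about "too many effects being searched for in too few samples" can be made concrete with a rough calculation. The sketch below is only illustrative: it assumes every test is independent, that no real effect exists, and that each test uses the conventional 0.05 significance cutoff. None of those assumptions come from the comment itself, and the function name is made up for the example.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumes (for illustration only) that the tests are independent and
    that no true effect exists, so each test has probability `alpha` of
    producing a spurious "significant" result.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


# How quickly the risk grows as more effects are searched for.
for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: {familywise_error_rate(k):.1%} chance of a false positive")
```

Under these assumptions, searching for 20 effects gives roughly a 64% chance of at least one spurious finding, which is why unreported negative results matter so much.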
For climate science, as Climategate showed, that festering pile of offal is filled with the more conscious problems, possibly including outright lies. When so many of the practitioners in a field are morally and ethically compromised, I don’t know how the field can be reformed. Further, climate science is not amenable to controlled experiments, so you’ll be dealing with trying to counter assertions based on models tuned to deal with altered data of the past and designed to give a predetermined result for the future. Possibly the best one can hope for is that the field is increasingly ignored and the funding eventually shrinks to negligible levels.
As a database architect and developer with 25 years in the field, what strikes me is the apparent complete lack of competency in climate science in maintaining and preserving the temperature data in a pristine form. You absolutely have to have a “gold copy” of your data somewhere that it will never be lost or altered, so that you can always reset back to the starting point when you desire. My heart went out to the anonymous DBA from the Climategate emails who bemoaned how he’d worked on their data all weekend and thought he was making progress, only to discover yet another inconsistency late in the day.
Here’s how a DBA sees the problem: the surface area of the Earth is 510 million square kilometers. If you had a well-sited temperature station in every square mile and took measurements every hour, at the end of the day you’d have 12.24B records containing latitude, longitude, elevation, time and temperature. If we assume 32 bytes of storage for each record, a day’s worth of data would be around 391 gigabytes, a year’s worth would be just about 142 terabytes, seven years to reach a petabyte, and 7000 years to hit an exabyte.
While that is a lot of data, it’s not an unbearable amount of data (and we wouldn’t have a record every square kilometer, either), and the fact that we don’t have a better handle on the sheer amount of data generated every year is a crime.
I managed to switch from km to miles up there, but of course I mean km for all units.
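The back-of-envelope storage figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one station per square kilometer (per the correction), hourly readings, 32 bytes per record, decimal units (1 GB = 10^9 bytes) and a 365-day year. The constant names are illustrative and not from the comment.

```python
# Assumptions taken from the comment: 510 million km^2 of surface,
# one reading per station per hour, 32 bytes per stored record.
SURFACE_KM2 = 510_000_000
READINGS_PER_DAY = 24
BYTES_PER_RECORD = 32
DAYS_PER_YEAR = 365  # leap years ignored for a rough estimate

records_per_day = SURFACE_KM2 * READINGS_PER_DAY
bytes_per_day = records_per_day * BYTES_PER_RECORD
bytes_per_year = bytes_per_day * DAYS_PER_YEAR

# Decimal (SI) units: 1 PB = 10**15 bytes, 1 EB = 10**18 bytes.
years_to_petabyte = 10**15 / bytes_per_year
years_to_exabyte = 10**18 / bytes_per_year

print(f"records per day: {records_per_day:,}")
print(f"storage per day: {bytes_per_day / 10**9:.1f} GB")
print(f"storage per year: {bytes_per_year / 10**12:.1f} TB")
print(f"years to 1 PB: {years_to_petabyte:.1f}")
print(f"years to 1 EB: {years_to_exabyte:.0f}")
```

With these assumptions the numbers match the comment: about 12.24 billion records and 392 GB per day, about 143 TB per year, and roughly seven years per petabyte.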
Steven Mosher wrote on April 22, 2016 at 7:31 pm:
“The only experiment being done is the uncontrolled experiment of putting c02 into the atmosphere.
We are doing that experiment.
One side predicts no bad effects based on past experience
One side predicts bad effects based on physics.”
With all due respect (not meant sarcastically) let me humbly adjust that last sentence for you:
One side predicts no bad effects based on past experience covering many millennia
One side predicts bad effects based on physics-based models which each replicate only a tiny portion of an immensely complex natural mechanism, the complete workings of which are unknown and presently unknowable.
For now, science-based conclusions which veer away from past experience may only be drawn after at least a few more centuries of experience and observation.
I offer this not as an aggressive gesture toward Mosh, but as a simple statement of how things look from the perspective of an ex-engineer.
Never underestimate Mosher’s physics. His latent [heat] of water vaporization never changes.
Grrrh … latent heat of water vaporization.
Only one side has both experience and physics on its side. We have 600 million years of geologic records showing that CO2 as high as 7,000 never caused catastrophic warming. We have ice core data showing that the recent temperature variation is well within the norm for the Holocene. We have IR absorption data showing that CO2 has an exponential decay in the amount of radiation absorbed/unit. We have MODTRAN that shows that doubling CO2 in the lower 100 m of the earth’s atmosphere has an immeasurable impact on W/M^2, and we have warming oceans and day time temperatures, neither of which can be due to CO2’s absorption of IR between 13 and 18µ. One side has physics, history and the scientific method on its side, the other has politics and a misguided social movement.
co2islife,
That quote should read:
“One side predicts no bad effects based on past experience
One side predicts bad effects based on bad physics.”
Every climate science paper that tries to use physics, inevitably gets the physics wrong – either starting with the wrong physics model or using simplified functions that do not get the physics of the atmosphere right. And since the entire debate is around 2W/m2 out of around 1350W/m2, it doesn’t take much for errors to accumulate.
For those who haven’t seen it before, this paper by Hermann Harde is still the most comprehensive one that I’ve found:
http://www.hindawi.com/journals/ijas/2013/503727/
And where was that CO2 before it was placed in those things that man is taking it from and putting back in the atmosphere?
Is it wrong to put water back in the desert, dry lands? Should we stop all irrigation?
I have published research, and with just a Master’s degree. Not bad. I am also a one-hit wonder. Not good. But neither of these things impress me much one way or the other. The merit is in the fact that without my input and quite a few years later, my study was repeated (not replicated because equipment improvements were made that produced a cleaner stimulus used in early latency auditory evoked potentials and with more subjects). The salient results I obtained were again seen in the repeated study.
With that background, research replication is an exact copy. In many fields that may not be the best design, especially if better techniques or better equipment is available. Repeating a study with new and improved techniques and/or equipment that results in the same finding gives robustness to the proposed hypothesis. As it did mine. And that’s not too bad at all.
The guest Blogger is being too generous to the ‘halls of science’. His comment: “the crisis might have severe side-effects — such as a loss in public confidence. America has long had a rocky relationship with science”, is well taken but too lenient as many straight thinking every day type Americans have already thrown the bath water out and have a grave mistrust of scientific conclusions. Can you blame them?
The main problem, in addition to replicability of many current scientific beliefs/findings, is the fact that climate science is not, and has not ever been a science. It is a study involving the input of many scientific fields and statistics. It has been hijacked by those who were intent on making beaucoup dollars from fraudulent enterprises — and — even more importantly — developing a global governance structure run only by elites. Enough of the term “climate science”.
What is really scary is the amount of analysis on CO2 and carbon compounds compared to the amount of analysis on H2O in the UN IPCC WG1 report. If you have a PDF of the report, do a word search for the terms. Most references on water I found are in terms of water causing increases in CO2. Those that are there are fuzzy “Not to Worry” comments. This supports “blame it on CO2, do NOT look for any other cause.”
From the National Cancer Institute.
“Cyclamate: Because the findings in rats suggested that cyclamate might increase the risk of bladder cancer in humans, the FDA banned the use of cyclamate in 1969. After reexamination of cyclamate’s carcinogenicity and the evaluation of additional data, scientists concluded that cyclamate was not a carcinogen or a co-carcinogen (a substance that enhances the effect of a cancer-causing substance). A food additive petition was filed with the FDA for the reapproval of cyclamate, but this petition is currently being held in abeyance (not actively being considered). The FDA’s concerns about cyclamate are not cancer related.”
I actually liked drinking soda that was sweetened with Cyclamates. It was difficult for me to tell the difference between Cyclamate sweetened and Sugar sweetened drinks. Have never been able to drink any of the new sweeteners.
I personally prefer Sucralose, which has 3 hydroxyl groups replaced with chlorine atoms making it, well, a chlorinated hydrocarbon (sucrose = C12H22O11, sucralose = C12H19Cl3O8) and therefore mostly calorie free. A safe(?) Chlorinated hydrocarbon, who would have thought.
https://edis.ifas.ufl.edu/pi090
https://en.wikipedia.org/wiki/Sucralose
“The defenders of the status quo rarely actively defend anything. They aren’t about to publish articles explaining why NHST tells you everything you need to know, or arguing that effect sizes of r = .80 in studies with an N of 20 represent important and reliable breakthroughs, or least of all reporting data to show that major counter-intuitive findings are robustly replicable.”
Not only is the n=20 issue a problem, but so is the question of exactly which universe has been sampled, compared to the universe about which one is attempting to make predictions. Many times, even with large sample sizes, they are completely different universes, negating any predictability from the sample to the universe upon which one wishes to project effects. In addition, there are many sources of potential error other than sample size, many of which are not quantified or may not be quantifiable in any event. Statistically, in the real world, the science can never be “settled” completely. Statements of “fact” always need to be qualified as reflecting the best data we presently have, since many items stated as fact are much closer to conjecture than fact due to poor research methods and/or poor statistical analysis. The most often violated principle is that correlation must not be mistaken for causality.
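To put a rough number on the N = 20 concern in the quote, one standard textbook approach is a confidence interval for a correlation via the Fisher z-transform. The sketch below is an illustration only: it assumes roughly bivariate-normal data and a 95% level (z = 1.96), and the function name is hypothetical. It is not the method used by anyone quoted in this thread.

```python
import math


def pearson_r_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transform, which assumes roughly bivariate-normal
    data. z_crit=1.96 corresponds to a two-sided 95% interval.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    low = math.tanh(z - z_crit * se)
    high = math.tanh(z + z_crit * se)
    return low, high


low, high = pearson_r_confidence_interval(0.80, 20)
print(f"r = 0.80, N = 20: 95% CI roughly {low:.2f} to {high:.2f}")
```

Under these assumptions the interval runs from about 0.55 to 0.92, so even before asking which universe was sampled, a study of 20 subjects pins down the effect size only loosely.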
Larry Kummer;
I repeat my comment from upthread as you have failed to respond to it. In your reply to Steven Mosher you said:
can you provide a published source to support it?
To which I replied:
The appeal to authority card in all its glory.
Can you cite a single example of a climate science “experiment” which can be replicated?
Per Einstein, all you need is one.
I repeat my request.
Nice job Larry! Much appreciated write up.
How do we replicate something like this?
http://realclimatescience.com/2016/04/nsidc-busted/
The probity of this comments-thread has inspired me to share my quotations file, which contains many quotations directly relevant to this thread, as well as many that are an important part of this thread’s larger context — the general politicization of “science” as part of a cultural and social engineering project masterfully administered by the minions of the global oligarchy. However fractious relations between the various factions of the oligarchy, they are united in their manipulation of their bred-to-obey, credulous chattels. Wheels within wheels…
https://www.dropbox.com/s/knkzbgqle31ox6h/QUOTATIONSApr2016.doc.zip?dl=0
Please enjoy…
I’ve got a few bookmarked myself. You have linked to many that I have bookmarked in your essay. If you are keeping a file, here are a few I didn’t see linked in your piece that may interest you:
“Stanford researchers uncover patterns in how scientists lie about their data”
https://news.stanford.edu/2015/11/16/fraud-science-papers-111615/
“Many scientific “truths” are, in fact, false” [lots of embedded links in this one]
http://qz.com/638059/many-scientific-truths-are-in-fact-false/
“On the origins of the linear no-threshold (LNT) dogma by means of untruths, artful dodges and blind faith”
http://www.sciencedirect.com/science/article/pii/S0013935115300311
How does one replicate altered data and rank opinion? The process has been impossible to replicate and climate scientists are careful to not share raw data. But the conclusions are easy to formulate: CAGW = $.
There are no morals in politics; there is only expedience.
A scoundrel may be of use to us just because he is a scoundrel.
– Vladimir Lenin
Vlad understood how to staff a climate science department.
Such alarmism:
One would think that before putting such alarmism on paper, you’d research the public trust in scientific institutions (and scientists), think about whether there have been changes over time, and consider what some of the influences might be for any changes you might find, other than those that you’re speculating about.
Surely, you’ve done that. Then why didn’t you report what you found? Because I’ve seen a fair amount of evidence that your alarmism is not substantiated.
I read this a few years ago. Seems to me this has been brewing for a few years at the very least:
“Not even trying: the corruption of real science” by Bruce G Charlton.
http://corruption-of-science.blogspot.co.uk/2013/07/not-even-trying-corruption-of-real.html?m=1
My most recent copy of “Mechanical Engineering” magazine included an article by several young engineers recommending books that other young engineers might find helpful in beginning their careers. I thought of the unwritten book that would be helpful.
You get paid to say what the boss wants said.
You get paid to do what the boss wants done.
These are the two most basic, though unwritten, expectations.
If you expect to get paid, check your ethics and conscience at the door. Nobody really cares what you think about it. All those high minded corporate values are just window dressing. When those values roll up against big egos or big money, they lose.
I have a growing list of papers and articles on the troubled state of what passes for “science” these days, on my web site, here:
sealevel.info/papers.html#whitherscience
Thank you, Larry Kummer, for providing me with more material to add to my list.
Larry: 3rd Request!
davidmhoffer April 23, 2016 at 10:48 am
Larry Kummer;
I repeat my comment from upthread as you have failed to respond to it. In your reply to Steven Mosher you said:
can you provide a published source to support it?
To which I replied:
The appeal to authority card in all its glory.
Can you cite a single example of a climate science “experiment” which can be replicated?
Per Einstein, all you need is one.
I repeat my request.
Larry: You’ve made an assertion which you haven’t seen fit to support with a single example. You can support your assertion, you can admit that you are wrong, or you can quietly ignore this simple request which would make you guilty of the exact crisis in science you accuse others of.
There are no Nobels for replication… sadly all the kudos goes to the first publisher.
Perhaps they and other awards should be withheld until such studies are replicated.