The Replication Crisis

Harvard Data Science Review explores reproducibility and replicability in science


Research News

CAMBRIDGE, MA, December 16, 2020. In 2019, the National Academies of Sciences, Engineering, and Medicine (NASEM) published a consensus report for the US Congress, Reproducibility and Replicability in Science, which addressed a major methodological crisis in the sciences: the fact that many experiments and results are difficult or impossible to reproduce. The conversation about this report and this vital topic continues in a special, twelve-article feature in issue 2.4 of the Harvard Data Science Review (HDSR), publishing today.

Growing awareness of the replication crisis has rocked the fields of medicine and psychology in particular, where famous experiments and influential findings have been cast into doubt. But these issues affect researchers in a wide range of disciplines, from economics to particle physics to climate science, and addressing them requires an interdisciplinary approach.

“The overall aim of reproducibility and replicability is to ensure that our research findings are reliable,” states HDSR Editor-in-Chief Xiao-Li Meng in his editorial. “Reliability does not imply absolute truth, which is an epistemologically debatable notion to start with, but it does require that our findings are reasonably robust to the relevant data or methods we employ.”

“Designing sound replication studies requires a host of data science skills, from statistical designs to causal inference to signal-noise separation, that are simultaneously tailored by and aimed at substantive understanding,” Meng continues.

Guest edited by Victoria Stodden (University of Illinois, Urbana-Champaign), the special theme collection presents research and commentary from an interdisciplinary group of scholars and professionals. Articles include:

The editors hope to take advantage of the collaborative features available on PubPub, the open-source publishing platform where HDSR is hosted. Readers around the world can freely read, annotate, and comment on the essays, continuing this important conversation.


The Harvard Data Science Initiative, launched in 2017, is a cross-University initiative working at the nexus of statistics, computer science, and related disciplines to gain insights from complex data in nearly every research domain. Those insights can be deployed to address issues ranging from global economics and inequality to targeted medical treatments, privacy and security, health and the environment, scientific discovery, education, and many more. While the collection and analysis of data has long held an important role in academic research, the Harvard Data Science Initiative strengthens, deepens, and expands this work by advancing methodologies, enabling breakthroughs, promoting new research collaborations, and enhancing Harvard’s educational mission. All of these efforts are rooted in an urgent desire to improve our world: how can we best use data for the common good?

The Harvard Data Science Review is published for the Harvard Data Science Initiative by the MIT Press. Established in 1962, the MIT Press (Cambridge, MA and London) is one of the largest and most distinguished university presses in the world and a leading publisher of books and journals at the intersection of science, technology, art, social science, and design. MIT Press books and journals are known for their intellectual daring, scholarly standards, interdisciplinary focus, and distinctive design. For almost 50 years the MIT Press journals division has been publishing journals that are at the leading edge of their field and launching new journals that have nurtured burgeoning areas of scholarship.

PubPub is an open-source publishing platform from the Knowledge Futures Group for collaboratively editing and publishing journals, monographs, and other open access scholarly content. The Knowledge Futures Group, a nonprofit originally founded as a partnership between the MIT Press and MIT Media Lab, builds and sustains technology for the production, curation, and preservation of knowledge in service of the public good.

From EurekAlert!

Keith Peregrine
December 16, 2020 4:11 pm

In this age of Intellectoids, Junk Science is acceptable. Given the current weltanschauung, logic has no place in discussions. Proof in their view is simply what feels best….

Curious George
December 16, 2020 4:20 pm

I speed-read “Self-Correction by Design” by Marcia McNutt, President of the NAS. The article mentions the word “consensus” only twice. She does not define how to reach consensus, but stresses its importance: “Scientists are therefore obligated to use all means at our disposal to ensure that science advances from intriguing hypothesis to validated consensus as quickly and efficiently as possible.”

Apparently the validated consensus is the shining ultimate goal of science. I guess the best way to validate a consensus is more consensus. That’s where the National Academy of Sciences is heading.

Reply to  Curious George
December 16, 2020 5:34 pm

Science be damned, we need to validate the consensus!

All in favor say AYE!

(We will skip the NAY for sake of efficiency and expediency).

Reply to  Curious George
December 16, 2020 9:13 pm

Self-correction means it took the collapse of Soviet Russia for the European intelligentsia to admit that model was unworkable, and also to discover it was not actually the communism/socialism/whatever they believed it was.

Self correction is the same as saying: planes are self correcting as broken ones end up on the ground, one way or another.

It’s tautological. Failed practical ideas fail.

The problem is that they may take the whole empire with them.

Tim Folkerts
Reply to  Curious George
December 16, 2020 9:53 pm

I think the intention here with “validated consensus” is to differentiate it from either
* invalid consensus (like “ulcers are caused by stress” or “continents don’t move”)
* valid but little-known conclusions (like “bacteria cause ulcers” or “continents do move”).

Intriguing, correct hypotheses like H. pylori or plate tectonics *should* move quickly and efficiently to become the new consensus. That really seems to be all that Marcia McNutt is trying to say.

Reply to  Tim Folkerts
December 17, 2020 8:10 am

I see some problems: who validates the consensus, and what are the criteria for validating it?

Paul Penrose
Reply to  Tim Folkerts
December 17, 2020 10:24 am

Consensus is a political/social phenomenon, and as such is not a good way to determine the validity of a hypothesis or theory, unless you are completely ignorant of the subject in question. If that’s the case, you are better off not forming an opinion, or better yet, educating yourself in the relevant subject(s).

Len Werner
Reply to  Paul Penrose
December 17, 2020 5:42 pm

I would propose that consensus was involved in neither understanding plate tectonics nor curing ulcers. In the case of plate tectonics, consensus if anything impeded the eventual explanation, which was in true scientific form proved by experiment: by the actual measurement of sea-floor movement (the magnetic striping) and eventually continental movement. Consensus also did not cure ulcers; penicillin did. The only effect of consensus was to delay acceptance of the proving experiment.

I will agree that science should not be determined by consensus; it is uncomfortable to read of someone as learned as Marcia McNutt promoting the concept.

Incidentally, when in grad studies I once tracked down the earliest mention of the possibility of continental drift in the scientific literature; this used to be, and for this subject probably still must be, conducted in libraries, since not all of history is digitized. The first mention I found was within 2 years of the exploration voyages of HMS Challenger, which mapped the Atlantic sea floor in advance of the laying of the first trans-Atlantic cable and confirmed the existence of the mid-Atlantic ridge. This was before Alfred Wegener was born; Wegener was NOT the originator of the theory of continental drift.

I also suggest that it was not wrong that the theory went unaccepted until it was measured by experiment; isn’t this how science should be conducted?

Mark Pawelek
Reply to  Curious George
December 17, 2020 9:44 pm

Lysenko would be proud of the consensusi. The 160 or so dissenting biologists killed in the Soviet Union for disagreeing with him: not so proud. The 30 million who died of starvation due to agricultural plans which failed are wailing in their graves every time they hear the term ‘scientific consensus’.

December 16, 2020 4:26 pm

“Designing sound replication studies requires a host of data science skills, from statistical designs to causal inference to signal-noise separation.” When I started out, this set of skills was grouped together under Biometrics, which has now come to mean at least a few other things. I believe the focus on statistical analysis at the expense of good design has contributed hugely to the Reproducibility Crisis. Statistical analyses have become increasingly arcane and convoluted and are routinely used to patch up poor design. The old adage that if you can’t see it by plotting the data then it probably isn’t interesting still holds. Statisticians who think they can offer general statistical advice without understanding the nuances of the subject matter are a big part of the problem. Bring back Biometrics. And don’t believe anything reported by the mainstream media.

Reply to  BCBill
December 17, 2020 5:50 pm

It’s not that sophisticated statistical tools are bad in themselves. The problem is that most users and peer reviewers are somewhere between unaware of and totally clueless about their limitations. And given the complexity of the tools and pressures to publish, there is very little incentive to learn when and how to use or not use them.

Tim Gorman
Reply to  Ralph Dave Westfall
December 17, 2020 6:04 pm


“clueless about their limitations”

It’s even worse than that. My youngest son, when he was going for his Microbiology undergraduate degree, was told by his counselor and one of his mentors not to even bother taking any statistics courses. If he needed a statistical analysis of any data, he should find a math major or grad student to do it for him. I counseled him otherwise and he took 9 hours of statistics. Today he has a PhD in immunology and many of his peers use him as a resource on how to analyze the data from their projects.

Math majors have no background to base educated judgments on when it comes to experimental data and biologists have no background to base educated judgments on when it comes to statistical analysis. It’s the blind leading the blind.

It apparently isn’t much different in the field of climate science.

December 16, 2020 4:45 pm

The problem is really really bad.

In this case Big Pharma are the good guys. They scan the literature looking for findings they can turn into drugs. Then, the first thing they do is try to reproduce those findings. As a result one company, Amgen, found that research findings were wrong 90% of the time.

Bio-med is the one place where replication is routinely attempted. There’s no good reason to think the rest of the research world is any more reliable. Replication is not attempted for the vast majority of research findings. If it were, I have little doubt that the results would be equally abysmal.

… and then there’s climate science …

Rick C PE
Reply to  commieBob
December 16, 2020 8:10 pm

CommieB: Good point. In fact, researchers in the business community need to be very careful to assure they are making decisions based on repeatable, replicable data. If they don’t, they risk putting products on the market that don’t work and may well be dangerous. Business-oriented scientific research is, in my experience, much more stringent in this regard than academic or government research. Invariably, those companies that cut corners and put products on the market without verifying that they are safe and effective end up paying out big-time liability settlements. Boeing failed to adequately test and verify a single system on their 737-Max and we all have seen how that worked out.

Ian W
Reply to  Rick C PE
December 17, 2020 5:48 am

Academic research is pushed by publish-or-perish requirements throughout academia, from masters students to postdocs to tenure-seeking/maintaining professors. There is no requirement to be right, just to publish; but to get funded and get through peer review the paper must support the fashionable narrative. Replication is almost never done.
Papers are expected to have multiple cited papers as references. But the author of a paper never attempts to replicate their references (indeed some are only read through to obtain a phrase that supports the paper being written). The result is that scientists in academia have become extremely gullible and easily led, and this has seeped into industry, with ‘peer review’ treated as ‘the gold standard’ for papers. Yet peer review is meaningless without replication of the research the paper is reporting.
When research meets business, all of a sudden the claims have to work, rather than be nicely put-together statistics and glib explanations.
Interestingly, the example of the 737-Max is also a case of believing what has been published rather than understanding the more complex underlying issues.

Eric Vieira
Reply to  commieBob
December 17, 2020 1:58 am

I have some doubts regarding this. These “findings” are usually published as patents. If something not reproducible appears in a patent, it can be attacked by competitors, which means the whole patent gets declared void. I don’t see the interest big pharma would have in doing this, except if one wants to kill a whole research area (a “burned fields” policy). But then a research disclosure would be a much more reasonable alternative.

Reply to  commieBob
December 17, 2020 6:34 am

This includes Rick C PE:
The proper use of statistics and experimental design is often the deal-breaker in product development. Typically (prior to about 1980) it relied heavily on the personal experience of the scientists. This often led to intense, even acrimonious meetings due to misunderstandings about product tests or the meaning of statistical tests on the data.

Fortunately food products or other consumer products are easier to evaluate than seismic activity. My experience was that complicated multifactorial designs that required computer analysis were not more effective than prehistoric 2-level designs. Multifactorial screening experiments were rarely useful for anything but the top 2-3 variables. Any variable with less than 1 sigma variance was very rarely useful.

I’m really glad bioMed does rigorous screening and replication studies to avoid the late appearance of poor effectiveness in drugs and treatments.

December 16, 2020 4:47 pm

Two types of “studies” I read for amusement:
Wild guess predictions of the future climate
(called scientific studies, but are really climate astrology)
Retracted studies discussed at

December 16, 2020 6:43 pm

“Perspectives on Data Reproducibility and Replicability in Paleoclimate and Climate Science”

by Rosemary Bush, Andrea Dutton, Michael Evans, Rich Loft, and Gavin A. Schmidt
Published on Dec 16, 2020

Safety valve built in…

December 16, 2020 7:02 pm

Inference will fill in the missing links, smooth the computational “big bangs”… if you believe.

December 16, 2020 7:05 pm

“that many experiments and results are difficult or impossible to reproduce”. So, what? We’re just supposed to accept them? Not question their veracity? What B.S.

December 16, 2020 7:16 pm

This all qualifies as a massive red herring: the appearance of research sincerity without actually taking steps to force research credibility or to reward honest research or researchers.

” For almost 50 years the MIT Press journals division has been publishing journals that are at the leading edge of their field and launching new journals that have nurtured burgeoning areas of scholarship.”

Ah yes, no mention of requiring researchers to meet experimental standards, share data, allow their program code to be analyzed, use repeatable experimental structures, and publish only when they have firm results?

Nor does MIT condemn the publish or perish mentality forced upon researchers and strictly enforced by Universities, journals and employers?

MIT does not, in any way, condemn blatant use of waffle words that imply firm results. Nor does MIT bother to establish teams that verify by replication the results of all MIT research and MIT researchers.

December 16, 2020 7:52 pm

As a graduate student at the University of Chicago (business and economics), I and other graduate students were required to replicate published studies. It was amazing how frequently we found statistical errors. I would recommend we do the same with climate change studies.

Reply to  Mohatdebos
December 16, 2020 9:44 pm

Steve McIntyre and Ross McKitrick have led the way in doing what you suggest. The problem is that their work has been consistently dismissed, ignored, downplayed, etc., followed by accusations that they are on the payroll of “Big Oil.”

John Ioannidis at Stanford has done a great deal of work on peer-reviewed medical research papers that could not be replicated. I find it informative that his name does not appear on any of the studies noted above.

Joel O’Bryan
Reply to  RayG
December 16, 2020 11:39 pm

This entire exercise from MIT Press is a whitewash job, meant as PR lip service to appease critics in Congress who sit on the committees funding the NIH, DoE and NSF grant-making processes.
Evidence: Marcia McNutt is the author here of the “Self-Correction by Design” essay. She is one of the key science adulterers who allowed and actively enabled pal review of flawed climate papers while she was the senior editor at Science magazine.

This whole thing reeks of a whitewash job to enable the climate science corruption to continue.

The Dark Lord
December 16, 2020 8:57 pm

Maybe some ethics training and/or fewer scientists willing to lie …

December 16, 2020 9:02 pm

The part about “Climate Science” is a joke. This is a field packed with charlatans: those manufacturing phony data for political purposes, those corrupting historical climate data, psychologists, sociologists, policy-studies types, and others ignorant of science, professional grafters, etc., who wouldn’t know science or honesty if it bit them in the bottom.

Worldwide we may have wasted trillions of dollars on these scams and on efforts to destroy intellectual and physical freedoms. A pox on them all!

Doc Chuck
Reply to  Leonard
December 17, 2020 12:12 am

Fully seven decades ago Eric Blair (pen name George Orwell) presciently described a despotic, system-wide forced societal regimentation, with behavioral and thought-crime surveillance prohibiting simple truth-telling, in his dystopian novel “Nineteen Eighty-Four,” patterned on the Soviet model. And in our own bankrupt post-modern philosophical milieu, any abiding truth is dismissed as conveniently as Pontius Pilate famously ducked what it might have called upon him for two millennia ago. Thus ‘War is Peace,’ ‘Freedom is Slavery,’ and ‘Ignorance is Strength’ become axiomatic double-think for the citizenry of an oppressive new world order controlled by those who turn out to be ‘more equal than others’ (from another such top-down command society in his portrayal “Animal Farm”). Reproducibility of research findings may yet prove to be misdirection at the distracting margins (not so much unlike spurious claims of Russian collusion to affect an election have for so long served, while the real misshaping of the popular will takes place by those deeply beholden to Communist Chinese bidding, just sayin’).

Ed Reid
Reply to  Doc Chuck
December 17, 2020 4:41 am

Jim Whelan
December 16, 2020 11:20 pm

Replicating a previous study isn’t “original” and won’t get you any points in academia.

Reply to  Jim Whelan
December 17, 2020 11:27 am

And finding that some cherished study conclusion is total vaporware is a career ender!

Ian W
Reply to  OweninGA
December 19, 2020 6:47 am

It is not normally the conclusion; it is usually one of the base assumptions of the paper, often not even stated as an assumption but there by implication. But yes, you are right: pointing out that the assumption(s) on which a paper is based do not appear to be correct does not increase one’s popularity, especially if the paper has elicited a large number of citations.

Carl Friis-Hansen
December 17, 2020 3:44 am

Curious George writes:

“Apparently the validated consensus is the shining ultimate goal of
science. I guess the best way to validate a consensus is more consensus.
That’s where the National Academy of Science is heading.”

You may be correct, George.
Take for example the Corman-Drosten RT-PCR test used all over the world. It is devastatingly useless:

“There are ten fatal problems with the Corman-Drosten paper which we will outline and explain in greater detail in the following sections.”

“We aimed to develop and deploy robust diagnostic methodology for use in public health laboratory settings without having virus material available.”

As I see it:
Scientific results these days do not need to be replicated, reviewed or in any way qualitative, as long as it serves a political or financial purpose.

Tim Gorman
December 17, 2020 4:51 am

Every so-called climate scientist should read:

Reproducibility and Replicability in Science, A Metrology Perspective

In there it states:

For numerical data, the critical evaluation criteria are:

a. Assuring the integrity of the data, such as provision of uncertainty determinations and use of standards; (bolding mine, tpg)

b. Checking the reasonableness of the data, such as consistency with physical principles and comparison with data obtained by independent methods; and

c. Assessing the usability of the data, such as inclusion of metadata and well-documented measurement procedures.

I have yet to read any “climate” study that even gives the bolded statement above a glance. Measurements and computations using those measurements are assumed to be 100% accurate out to whatever precision is needed to confirm a pre-assumed result.

And why do none of the GCM authors ever address item “b”: why the GCM results don’t match balloon and satellite data, i.e., data obtained by independent methods?

December 17, 2020 7:37 am

An old science adage I first heard in the early 1980s – “If it’s not reproducible or replicable, then it’s not worth notice.”

Thomas Gasloli
December 17, 2020 8:09 am

Of course the elephant in the room being ignored is that unreproducible science reaches the conclusions that the government agency providing the research funding desires. This isn’t just a science problem; it is a problem of the damaging effect of corrupt government. A pretend effort by a publication that is a partisan of the corrupt government is unlikely to result in a correction of the problem.

Kevin kilty
December 17, 2020 8:22 am

and addressing them requires an interdisciplinary approach….

I notice they do not list skepticism. Skepticism is an absolute necessity for addressing groupthink, poor design, inadequate models and analysis, and so forth.

Don Thompson
December 17, 2020 8:36 am

In scanning the paleoclimate article, the recommendation for shared information and linked databases would be useful. As most of us recall, Dr. Mann steadfastly refused critics access to the data behind his hockey stick, as well as any clear presentation of the model details. One of the biggest issues for publicly funded research has been refusal to share data for replication studies. It is scandalous that laws and regulations that will impact our health and economy are based on hidden data.

December 17, 2020 10:42 am

Here was the giveaway: “All of these efforts are rooted in an urgent desire to improve our world: How can we best use data for the common good?”

That is a certain path to scientific totalitarianism. Wasn’t it H.L. Mencken who observed that “the urge to save humanity is nearly always accompanied by the urge to rule humanity,” or words to that effect?

The question should be: “How can we assure that data will be collected and analyzed without regard to ‘higher purposes’?”

David Wojick
December 17, 2020 11:26 am

Replication in psychology is impossible, due to the wide range of human traits. An experiment with a group, no matter how large, is a small sample of a huge population. Given the variability, the next group tested will give different results. Even testing the same group twice is likely to give different results.

Sampling theory says different samples will give different results. Replication requires the same results. So any time the experiment includes sampling, replication is impossible. This is true of all science.
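The sampling variability at issue can be sketched in a few lines of simulation (purely illustrative; the normal population, its parameters, and the group size of 200 are assumptions, not from the comment). Two equally sized groups drawn from the same population do give different means, but the difference is on the order of the standard error, which is how replication studies judge agreement in practice.

```python
import random
import statistics

random.seed(42)

# A hypothetical population trait: mean 100, standard deviation 15.
population = [random.gauss(100, 15) for _ in range(100_000)]

# Two independent "replication" groups drawn from the same population.
group_a = random.sample(population, 200)
group_b = random.sample(population, 200)

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)

# The two group means differ, but the expected standard error of each
# mean is sigma / sqrt(n) ~= 15 / sqrt(200) ~= 1.06, so a difference of
# a point or so is ordinary sampling noise, not a failed replication.
print(round(mean_a, 2), round(mean_b, 2), round(abs(mean_a - mean_b), 2))
```

So "the same results" is never literal under sampling; what can be replicated is agreement within the expected sampling error.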

Javert Chip
Reply to  David Wojick
December 17, 2020 1:30 pm

This is either pure bullshit, or proof-positive that psychology is meaningless.

Take your pick.

Jim Gorman
Reply to  Javert Chip
December 18, 2020 4:45 am

Psychology is worthless when applying group results to individuals. This is where statistical variance comes into play. Part of this is the misconception that, since the error of the mean involves division by sqrt(N), a whole bunch of samples will give you a more and more accurate answer.

They fail to realize that N is not the number of samples; it is the sample size. Why is that important? The Central Limit Theorem can only give a good answer if the size of each sample encompasses a good representation of the distribution of the population. Otherwise the variance of the sample mean goes off the chart. The mean itself may be accurate, but the variance means you have a large range of values around it.

When you apply this mean + variance to individuals, i.e. localities, there is no way to ensure you have an accurate description. That’s why political polling is so often wrong. It is why a Global Average Temperature is meaningless, among many other problems.
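The point about N being the sample size can be checked numerically (a minimal sketch; the lognormal population is an assumption, chosen only because it is skewed): the spread of the sample means shrinks as each sample gets larger, not merely as more samples are collected.

```python
import random
import statistics

random.seed(0)

def spread_of_sample_means(sample_size, n_samples=2000):
    """Draw many samples of a given size from a skewed (lognormal)
    population and return the standard deviation of their means."""
    means = []
    for _ in range(n_samples):
        sample = [random.lognormvariate(0, 1) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

# The N in sigma / sqrt(N) is the size of EACH sample, not the number
# of samples: quadrupling the sample size roughly halves the spread of
# the sample means, while the spread of individual values is unchanged.
for n in (10, 40, 160):
    print(n, round(spread_of_sample_means(n), 3))
```

With a heavily skewed population, small samples leave the sample mean with a wide spread around the population mean, which is the variance problem the comment describes.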

Javert Chip
December 17, 2020 1:27 pm

The article reads like a fluffy marketing pitch for the Harvard Data Science Initiative (aka bovine excrement). I didn’t see the term “metric” used a single time. Things like:

a) The Audit rate: metric for the total number of funded replication attempts as a % of total funded projects, by discipline & academic institution (we’re talking about you, psychology, as well as others)

b) The Failure rate: metric for the total number of funded replication attempts that do not succeed as a % of total replication attempts, by discipline & academic institution

c) Reasonable extrapolation from the Audit sample size & Failure rate to the discipline

I realize “…God is in the details…”, but taxpayers blindly spend billions on this activity, and those “funded projects by discipline” look like a damn good place to start.

Not all “un-replicability” is necessarily bad, but a discipline consistently running with low replicability is a prime target for methodological improvement or loss of academic credibility (which appears to be happening with psychology).
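The two proposed rates, (a) and (b), are simple ratios; a minimal sketch of the arithmetic with invented counts (all numbers are hypothetical, for illustration only):

```python
# Hypothetical counts for one discipline; the figures are invented
# purely to show the arithmetic behind the proposed metrics.
funded_projects = 1200            # total funded projects in the discipline
replication_attempts = 90         # funded attempts to replicate one of them
failed_replications = 54          # attempts that did not reproduce the result

audit_rate = replication_attempts / funded_projects        # metric (a)
failure_rate = failed_replications / replication_attempts  # metric (b)

print(f"audit rate:   {audit_rate:.1%}")    # prints "audit rate:   7.5%"
print(f"failure rate: {failure_rate:.1%}")  # prints "failure rate: 60.0%"
```

Metric (c) then asks how far the failure rate observed in the audited 7.5% can be extrapolated to the other 92.5% of funded projects.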

Doc Chuck
Reply to  Javert Chip
December 17, 2020 7:59 pm

Well yes, us taxpayers are blind to the spending that is being done by our so-called public service representatives whose authorization of spending increases enhances their supposed importance within their own realm.

December 17, 2020 3:20 pm

In an era where scientists (think Phil Jones or Michael Mann) hide their data because they don’t want skeptics to have it, this should not come as a shock.

December 17, 2020 6:12 pm

Gavin Schmidt is on a panel investigating reproducibility on paleoclimate?

That’s a good one!

Mark Pawelek
December 17, 2020 9:34 pm

“Designing sound replication studies requires a host of data science skills, from statistical designs to causal inference to signal-noise separation, that are simultaneously tailored by and aimed at substantive understanding,” Meng continues.

I don’t see that. Successful replication requires the replicating team to faithfully reproduce the original study’s experimental details, yet in an effort to save space many studies have only basic experimental details published! Knowledge of the full experimental and data-gathering techniques is likely to be more important than statistical wizardry. A study with statistical faults should never be published in the first place. So a lot of studies which can’t be replicated can be blamed on bad statistics, and lax editing at the original journal.

