A Nature editorial on the state of ‘robusted’ science reproducibility


Illustration by David Parkins for Nature

Excerpts from Robust research: Institutions must do their part for reproducibility

C. Glenn Begley, Alastair M. Buchan & Ulrich Dirnagl

Tie funding to verified good institutional practice, and robust science will shoot up the agenda, say C. Glenn Begley, Alastair M. Buchan and Ulrich Dirnagl.

Irreproducible research poses an enormous burden: it delays treatments, wastes patients’ and scientists’ time, and squanders billions of research dollars. It is also widespread. An unpublished 2015 survey by the American Society for Cell Biology found that more than two-thirds of respondents had on at least one occasion been unable to reproduce published results. Biomedical researchers from drug companies have reported that one-quarter or fewer of high-profile papers are reproducible1, 2.

Many parties are addressing the problem. Funding bodies such as the US National Institutes of Health (NIH) have announced training initiatives3 and explicitly instructed grant reviewers to consider whether experimental plans ensure rigour. New methods of data analysis and peer review have been proposed to deflate bias.

Several journals, including Nature and Science, have updated their guidelines and introduced checklists. These ask scientists whether they followed practices such as randomizing, blinding and calculating appropriate sample size. Science has also added statisticians to its panel of reviewing editors. Philanthropic and non-profit organizations have sponsored projects to improve robustness.

Funders’ policies, journal guidelines and widespread soul-searching are necessary. But they are not sufficient.

Conspicuous by their absence from these efforts are the places in which science is done: universities, hospitals, government-supported labs and independent research institutes. This has to change. Institutions must support and reward researchers who do solid — not just flashy — science and hold to account those whose methods are questionable.

The systems needed to promote reproducible research must come from institutions — scientists, funders and journals cannot build them on their own. These kinds of changes will require additional money, infrastructure, personnel and paperwork. The load on institutions and investigators will be real, but so is the burden of irreproducible research. Even if it is accompanied by an apparent decrease in productivity, the resulting increase in research quality will be well worth the costs.

Still, most institutions will not make the necessary moves unless forced. Funding bodies should make good institutional practice (GIP) a prerequisite for receiving a grant. The concept has gained some traction: last year, Science Foundation Ireland announced plans to conduct external audits on some of the labs that it supports.

There will not be one ideal solution. Faculty members, trainees and administrators will need to come together for honest, difficult discussions to restructure institutions. Neither scientists nor institutions should engage in mere box checking; new practices must restrain sloppiness while interfering only minimally with the many scientists who are behaving well.

Read the full article here: http://www.nature.com/news/robust-research-institutions-must-do-their-part-for-reproducibility-1.18259?WT.mc_id=SFB_NNEWS_1508_RHBox


57 thoughts on “A Nature editorial on the state of ‘robusted’ science reproducibility”

  1. Good idea, but will it ever get implemented to an extent to significantly diminish the number of articles based on bad science?

  2. “Several journals, including Nature and Science, have updated their guidelines and introduced checklists. These ask scientists whether they followed practices such as randomizing, blinding and calculating appropriate sample size.”

    This will be yet another hoop, used to squelch the publication of legitimate studies but ignored for papers that fit the desired narrative.

  3. “The systems needed to promote reproducible research must come from institutions — scientists, funders and journals cannot build them on their own.”

    There is already a strong bias in favor of pal-review science. If you’re in the club, your papers sail through, if you’re out of the club you don’t have a chance to publish in the top tier journals. Grant applications heavily depend on your “research environment”, meaning the perceived quality of your university. A revolutionary idea alone is not enough.

    I published a paper last year that was rejected by Science because they didn’t think it was impactful enough. I compared the response to my paper when it was published to the articles published in Science the same week it came out. My paper got more attention and more accolades than anything they published in place of it. But I wasn’t in the club, and they were.

    This will further entrench clubby science, and squeeze out more quality research so that the powerful labs can continue shunting grant money and high profile publications into their own pockets.

    • Grant applications heavily depend on your “research environment”, meaning the perceived quality of your university.

      Yep, folks are too lazy and/or incompetent to fairly evaluate grant proposals (and maybe it’s an impossible task anyway) so they filter by whatever is easy.

      There is work that suggests that we should give up on trying to pick winners. AI (Artificial Intelligence) researchers are finding interesting things:

      They make the case that great achievement can’t be bottled up into mechanical metrics; that innovation is not driven by narrowly focused heroic effort; and that we would be wiser (and the outcomes better) if instead we whole-heartedly embraced serendipitous discovery and playful creativity. Why Greatness Cannot Be Planned – The Myth of the Objective

      We are victims of the kind of management thinking that believes that if it quantifies everything and creates the right processes the optimum outcome will result. My new favorite enemy is the MBA and my new hero is Henry Mintzberg. MBAs as the new Aristocracy

      Mintzberg tracked the careers of 19 superstar graduates of Harvard Business School:

      Ten were outright failures (the company went bankrupt, the CEO was fired, a major merger backfired etc.);

      four had questionable records at best.

      Five out of the 19 seemed to do fine.

      The formatting is mine http://www.freerepublic.com/focus/news/2443544/posts

      Never mind ‘granting bodies’ or ‘institutions’. We have a management problem. It isn’t just wrecking science; it’s wrecking the whole economy. I’m seriously worried that we have already passed the tipping point. [/rant]

      • In the 1980s, consultants convinced corporations they didn’t need managers. It has been a mess ever since.

      • Another poor idea is that a manager can manage anything and doesn’t need to understand the process, it’s those below who make things happen. Been working in this sort of situation for a number of years, some managers are willing to listen to those that know how stuff works others just order things done and you’re being negative if you question them.
        They also want things to work in a fixed way and in my industry (like the climate) there are so many variables that can and do change as the weather changes.

        James Bull

      • The management problem is because since the 1980s, investors have only cared about short term stock price and performance. If you are in any conglomerate it doesn’t matter that you invented a cure for cancer if the stock price lost 1% under your oversight. Since executive management forces managers to focus on stock price above all else, it creates a vicious cycle of poor decisions that work in the short term but destroy the long term.

        You can be the literal best executive/manager in the world and still fail completely because all you were allowed to focus on is stock performance.

        As for the “managers can manage anything” I have to say that it is true that a good manager can manage any group of people. What they cannot manage is the process those people follow if they don’t understand the dynamic of the work place (culture, needs, requirements, technical details, etc). Conflating the former with the latter is the problem I see at so many companies — especially when they are hoping to quickly squeeze a particular function into fewer jobs to help the stock prices.

      • Bob, quoting from the Free Republic article above:

        Fail to make the targets, no matter how profitable the company remained, and out the door went thousands of employees, those “human resources.”

        I guess I was lucky to enter the job market (high tech) back in the mid 1960s, when managers were long-term employees of the company and understood the business from the inside. Over 40 some-odd years I have worked for a few large companies and four or five start ups. Along the way I developed a couple of what I call “truths”, neither of which appears to be taught in business school:

        1. You cannot grow faster than the available talent – every time I have seen a company try this, they pack on the people, start fudging the numbers and within 3 to 5 years are in, or close to bankruptcy. All of this while the CEO is making enough money each year to retire.
        2. Your first layoff, rather than allowing you to get rid of the dead weight, costs you your most talented people. The better employees, rather than hang around waiting for the next ax to fall, will find jobs elsewhere. But, layoffs make the numbers look good for another year, allowing the CEO to bank a few million more in his retirement fund.

        On the other side of the coin, I talked to a childhood friend a few months ago. He and his brother took over their father’s local-area business. Both of them had started working for the business when they got out of the service, cycling through every department and practically every job within the company. When I asked him how things were going, he said that federal regulations had been slowly eliminating major parts of the business but, “we’re keeping it alive until several of our long-time employees retire”. You don’t find that dedication, or that concern for your fellow man taught in business school today.

  4. Reproducibility is indeed a big problem. But the institutional solutions being proposed in Nature do not address root causes like publication quantity over quality incentives, pal review, statistical package autopilots, going with the flow, and disinterest generally in replication as it is not ‘new’ or resume building. And when Nature spawns Nature Climate Change where computer model output is magically published with editorial approval as data, then Nature itself is a large part of the problem.
    You can put lipstick on a pig, but it is still a pig.

  5. One of the main elephants in the room is the current practice of basing tenure and promotions on publishing record. Since your future often depends more on your publishing record rather than on the actual quality of your work it is much easier to just play along, by the rules, never taking on a difficult project and jointly with your pals put out a continuous stream of poppycock.

    The reliance on ‘peer reviewed’ publishing is just a cop out taken by incompetent administrators that saves them from having to spend the time necessary to learn their jobs and do them correctly. Until Dumbo is shipped back to the circus nothing will change.

    • That has always been the problem with management by the numbers. It is human nature to “work the numbers” to look good and in that environment the cream does not rise to the top but instead those most effective at working the numbers.

      For repetitive tasks as in manufacturing using numbers is practical and effective. “Does this change to the assembly line manufacture widgets faster, cheaper and with less error” is an easy and meaningful measure with just hard numbers.

      For non-repetitive tasks (like research) it is much more difficult to assess quality or effectiveness, but one thing for sure is that measuring quantity of papers alone is not a meaningful measure.

      • Alx, you are quite right about the ‘management by the numbers’. The auto companies went through that in spades when most of their top management came out of accounting. That’s what gave us the Chevy Vega and the Ford Pinto. They also went through a time where they were adding a lot of legal eagles to top management. That didn’t fare well either. Expanding on what someone stated (or implied) above, management that comes up through the ranks via competency, not Peter’d, doesn’t have a problem evaluating their subordinates. They also tend to make decisions for the long haul, rather than to make this quarter’s numbers please Wall Street.

        I still like a statement out of an old book titled ‘The Entrepreneur’s Handbook’:

        First rate managers hire first rate people, second rate managers hire third rate people.

  6. A crisis is needed to make a change – the climate science crisis might turn out beneficial for a change towards more robust scientific practices. The IPCC report looks like a monument over inductivism and justificationism. A monument over approaches warned about by Karl Popper.

    If in doubt, the first 25 pages from “The logic of scientific discovery” by Karl Popper will be enlightening. (Karl Popper was the master mind behind the modern scientific method):
    http://strangebeautiful.com/other-texts/popper-logic-scientific-discovery.pdf
    A handful of pages from the following IPCC report should then be sufficient to become suspicious: The contribution from working group I (On the scientific basis) to the fifth assessment report by IPCC.

    • I doubt it will have much of an effect. Childrearing and Nutrition studies, famous for contradicting each other, have already undermined public confidence in medical science outside of actual disease treatment, and yet these studies are still funded en masse and widely reported.

      • But I once heard that raising a child upside down for 18 hours a day was the healthiest way to raise them. Someone should do a study!

  7. I don’t see how NIH study sections could be the problem. Funding lines are modest, and I assume that the studies getting funded have to be meritorious and worthy.

    When I was on study section, I know that we gave fair weight to setting, but that is part of the judgment: is this researcher and his or her environment capable of pulling off the study as planned? Beyond that, we were sensitive to new investigators, spreading the wealth, etc.

    I would appreciate hearing others chime in.

    • Funding success rates at NIH (and at its Canadian counterpart, CIHR) are very low – below 15%, sometimes below 10% of all applications get funded. That means many, if not most, sound and worthy applications will get rejected, and there sure is a temptation to stretch the truth in order to make the cut.

  8. The news that much ‘research’ is basically flawed or worse, utter hogwash, isn’t merely a problem, it is a crisis.

    This news about irreproducible science is huge news. It has so infested the system it has overwhelmed real science. This means the publishers were putting out wretched excess of worthless information at an amazing pace and devalues everything done in the last 30 years or so.

    I remember when Nature and Science magazines were rigorous, I remember scientists like my father refereeing studies submitted to both, I helped him go over the data at home back in the 1970’s, for example.

    This is terrible what is going on. All the studies that are irreproducible should be hung out in public humiliation and the authors should be denounced in no uncertain terms and the editors should be fined or some sort of thing to prevent this from happening ever again.

    • In my now considerable lifetime (anecdotal, I know), I have seen so many supposedly rock solid medical truths debunked that it makes me doubt anything I hear, even from my own doctor. Salt, eggs, fat, sugar, cholesterol, you name it. The trust is gone due to all of these idiot “studies.”

    • In fairness, the problem comes partly because the p-values for significant results were set in the first part of the last century.

      If there’s a one in twenty chance of your result being a fluke (confounded by complexity) then your result will soon be rejected as unreproducible – if only ten groups try and reproduce it.
      But now every city in the Western World (and many in the Developing World too) has a University. If 100 PhD students try to reproduce it – you get about five confirmations.

      That’s enough for a conference – maybe even a journal started.
      And then the rubbish is entrenched.
      After that it’s hard to reject further findings during peer review just for poor practise when the outcomes are so self-evidently reasonable.

      And the tower of tripe builds one level higher.
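
      The arithmetic in this comment can be sketched in a few lines of Python. This is only an illustration under assumptions the comment does not spell out: the effect being tested is not real, every replication attempt is independent, and each attempt has a 5% chance of producing a fluke positive. The function names are made up for the example.

```python
from math import comb

ALPHA = 0.05  # assumed per-attempt chance of a fluke "confirmation"


def expected_flukes(n_attempts: int, alpha: float = ALPHA) -> float:
    """Expected number of false confirmations among n independent attempts."""
    return n_attempts * alpha


def prob_at_least(k: int, n_attempts: int, alpha: float = ALPHA) -> float:
    """P(X >= k) for X ~ Binomial(n_attempts, alpha)."""
    # Sum the probabilities of seeing fewer than k flukes, then take the complement.
    below_k = sum(
        comb(n_attempts, i) * alpha**i * (1 - alpha) ** (n_attempts - i)
        for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    for n in (10, 100):
        print(n, "attempts:", expected_flukes(n), "expected fluke confirmations")
    print("P(at least 5 flukes in 100 attempts):", prob_at_least(5, 100))
```

      With ten attempts the expected number of fluke confirmations is below one, so a false result tends to look isolated. With a hundred attempts the same 5% rate yields about five, which matches the figure given above.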

    • The want of proper critical appraisal that pervades the sciences is not only reflected in poor reproducibility, but also in poorly conceived scientific aims and goals. Many fashionable “new paradigms” that attract money and effort are fuzzy concepts that have no fundamental interest and little practical relevance, and which predictably fizzle and fade, only to be replaced by the next short-lived fad. Current examples are buzzwords such as “biosensors”, “nanotechnology”, “proteomics” and so forth. Experimental cancer therapy is particularly and perpetually rife with poorly thought out, unrealistic approaches that have no chance of ever becoming practically useful.

      Science is supposedly based on rigorous and vigorous criticism, which in reality is, however, conspicuous mostly by its absence. Why is this so?

      My personal explanation starts with the question: What kind of person is happy as an academic researcher? Quite simply, it is one who feels that he is doing good work. This requires that he produces enough scientific ideas that stand up to his own criticism — his creativity must exceed his critical sense. Therefore, a happy academic must either be extraordinarily creative, or he must be lacking in critical sense. Don’t laugh — the majority of career academics, while reasonably intelligent and knowledgeable, is distinguished from other scientifically educated professionals not by superior scientific acumen and creativity, but by a subpar critical sense. All it takes for a new scientific fad to be born is that a critical mass of such people congregate around a new buzzword.

  9. This article is a small step in the right direction. Some are beginning to realize what engineers the world over have known for a long time: Eventually ‘facts’ (as in real reproducible results) will push through the fog and things will fail if they are not built while taking real characteristics into account. Imagine building a very tall building and not knowing very precisely the limits of your structural calculations. The first wind storm or earthquake that comes along could be tragic.

    • The World Trade Center where my husband worked once, had zero internal ‘honeycomb’ support, it was all the elevator shaft and the outside walls. This is why I told him to quit his job on the 66th floor which saved his life a few years later.

  10. Problem with “robust” is that it’s a linguistic phrase that varies between disciplines.

    I (was) most used to seeing it in chemical processes where a “robust” process could be given to any fairly new science-graduate/technician in the lab, and they would invariably fail to make a mess of it.
    A “no brainer”, as they say. And usually not a full complement of limbs needed either.

    Cli-Sci doesn’t appear to use the word in the same way.

    • Cli-Sci doesn’t appear to use the word in the same way.
      ==============
      a robust result in climate science is a result that happens once, maybe. but only on the 5th Thursday in February, excepting leap years.

    • … a “robust” process could be given to any fairly new science-graduate/technician in the lab, and they would invariably fail to make a mess of it.

      The logic hurts my brain. Murphy’s law dictates that the technician should make a mess of the process, that’s what technicians are supposed to do; so Murphy’s law dictates that they should, therefore, mess up messing up. Gracie Allen would be proud of such a construct. Perhaps Yhprum’s law superseded Murphy’s law here.

  11. Apply 100 different statistical methods to your data. 99 give you a negative result, 1 gives you a positive result. Say nothing about the 99 negative results and publish the 1 positive result with great fanfare and press releases up the wazoo.

    This is the state of science today. The scientific method has been thrown under the bus in the name of getting published and getting future grants.

    Even if someone tries to replicate your results, so long as they choose the same statistical method you used, they may well be able to replicate. But it is a meaningless replication because statistics are only a probability, not an absolute.

    If you try enough statistical methods you will often find one that gives the results you want, but all the other statistical methods that failed to give a positive result are screaming at you that your one positive result is bogus.

    So science covers up the 99 negative results. A lie of omission is still a lie.
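
    A rough sketch of why one positive out of 100 tries is weak evidence, written in Python. It assumes a 5% significance threshold per test and treats the tests as independent, which different methods applied to the same data are not, so the first number is a guide rather than an exact figure. The function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests
    when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall false-positive rate at or
    below alpha (Bonferroni correction)."""
    return alpha / n_tests


if __name__ == "__main__":
    n = 100
    print("Chance of at least one fluke positive:", familywise_error_rate(n))
    print("Per-test threshold after correction:", bonferroni_threshold(n))
```

    Under these assumptions, trying 100 methods makes at least one spurious positive nearly certain. The Bonferroni correction is one standard remedy: a result only counts if it clears the much stricter per-test threshold, and that bound holds even when the tests are correlated. It also only works if all 100 attempts are reported.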

  12. “Science has also added statisticians to its panel of reviewing editors.”

    So McIntyre is being added to the editorial staff of Science?

    • One custom of the recent few years, for articles I have reviewed, has been to have me, as reviewer, check a box if I believe a statistician is needed to review the stat methods of an article.

      I am on a team submitting an article now, and the stats are far beyond me; that is not my role. The stats will be beyond most reviewers, and beyond a lot of statisticians. An expert reviewer will be needed. Or, if they just get dazzled by all of the fancy talk and just publish it, my CV won’t mind.

  13. Also please note the PNAS paper “Revised standards for statistical evidence” by Valen Johnson, which suggests that the alpha level in classical hypothesis tests should be reduced by an order of magnitude to improve reproducibility of results.

    • I disagree – set “p” to .005?
      The problem is this: a “p” of .05 indicates something worth further investigation. It THEN has to be investigated in various other avenues to see if the holding holds up.

      When examined in observational data, like a longitudinal cohort study, vitamins are associated with longevity. Once examined by prospective trials, it has become clear they don’t have this magic.

      However, when the observational studies, with all of the confounding influences, indicated that vitamins were worth further investigation, “p” < .05, this was a perfectly legitimate use of stats and probability testing – as long as the next study is an improvement to cross-check the finding, not to parrot methods.

  14. It all comes back to the present lack among research scientists of what is called the Feynman integrity :

    “It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty–a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid–not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked–to make sure the other fellow can tell they have been eliminated.

    Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can–if you know anything at all wrong, or possibly wrong–to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition”

    That is the attitude to scientific research which should be fostered. Scientific papers with more reproducible results would follow.

    • The difference is that the theoretical physics community actively encourages dissent and debate. I’d suggest reading “The Black Hole War” for a good example. Susskind and Hawking split physics in twain about the basic properties of entropy, but they remained cordial through it all.

      To compare, the alarmist movement cracks down on anyone who even claims the effects are probably weaker than predicted. You must support the narrative.

  15. National Institutes of Health (NIH) have announced training initiatives3 and explicitly instructed grant reviewers to consider whether experimental plans ensure rigour.

    RIGOUR is what should be being demanded. “Robust” is a word for politicians, not scientists. If someone is trying to put forward the idea that their results are “robust” you can be pretty sure that they are not. It is also a fairly good bet that the results are contentious in some way, probably politically.

    It is a shame that the authors chose to start their title with the word robust. We need to see less use of this word in science, not more.

    At least this issue is finally getting some attention. SIX years after climategate, we start to see some recognition of the problem.

  16. It is about money. I am horrified by a few of the comments above that paint a picture of a person in the funding stream with many $$$ in hand, wondering about the best way to distribute it.
    It seems also that the sources of grants (these days typically government or quasi-gov’t groups), have been given more money than they can handle.
    The money I spent on scientific research from the late 60s to year 2000 was generated by the efforts of our internal group. That is, if we did not generate profits from our work, we would have no money, irrespective of merit. Well, close to that situation. At age 29 I was part owner of a large laboratory that was established and operated with no government funding, rebates, incentives at all. It failed when the market for our work crashed, partly because of governmental insufficiencies nationally. It was a good way to learn a lesson about independence.
    I have never been able to hold out my hand to a government slush fund and get free $$$ for research.
    So, by analogy, I have to feel that there is now too little incentive to produce reproducible science.
    One is left with the feeling that the whole funding system needs an overhaul.
    It is not base to analyse in terms of $$$. They were invented as a way to compare performance. Some form of value for effort incentive should be studied as an alternative to current funding models if it is not already there.

  17. In the modern British university, it is not that funding is sought in order to carry out research, but that research projects are formulated in order to get funding. I am not joking when I say that a physics lecturer called Einstein, who just thought about the Universe would risk being sacked because he brought in no grants.
    From a letter to the Times by Prof Sir Fergus Millar.

  18. Page limitations in journals used to be an excuse for not publishing complete details and metadata for experiments. With online journals that’s no longer a problem. The scientific societies ought to develop standards of experimentation and reporting with which all research must comply before it can be published. Of course, the journals will have to do the policing, but that is part of their mission.

  19. A NASA rocket was launched recently with a device that would ‘calculate’ how much water is in the soil up to 2″ deep so they can plan for global warming hysteria when they see the Sahara is actually very dry, I suppose.

    Well…this billion dollar device just broke when it went into orbit and is now useless. More money down a very silly drain.

  20. Maybe if universities upped the ante for Ph.D. degrees we might see some change. Students should be required to take at least a year of graduate level statistical methods. Students should be required to produce original research that meets stringent standards for reproducible results. And most of all, Lab directors/lead authors should be banned from giving Ph.D. candidates low hanging fruit projects stemming off of their own endeavors and should not be among the authors of the candidate’s research. But if you really want qualified Ph.D.’s, the candidates should be barred from any input and the product should be a stand alone single author study. It should be a test, not a coddled trip down easy street. The degree means you are capable of discovery. If the degree is awarded for lab sweeping with help from your boss, you end up with monkeys whose acumen has not been tested.

    • Pamela, you hit the nail on the head with “…monkeys whose acumen has not been tested.” It perfectly describes leading global warming activists that have the ear of our political leaders. I also think the field has gone overboard for statistical studies that obfuscate rather than enhance actual climate observations. An example is a temperature curve so festooned with annoying statistical markers that you can hardly see the curve itself. I am with Ernest Rutherford here who opined that “…If your experiment needs statistics, you ought to have done a better experiment.” As to co-authors, I have never had any through the fifties and sixties and have no idea how it got so out of hand. But then again I was not subject to the academic pressures that brought forth this expansion of participatory credits. It really is out of control as Hansen’s recent paper on assessing dangerous climate change shows. He has 18 co-authors whose function apparently is more political than scientific. How else do you explain the presence of Jeffrey Sachs on his author list? He is a well-known economist but certainly not a climate scientist, and quite likely is meant to supply name recognition to those who do not know climate science. As far as climate science goes this article is nothing more than a pseudo-scientific fantasy about carbon reduction to save the world.

  21. As Mr. W. Briggs will tell you, often and clearly, statistics can make NO predictions about future experiments. It can only tell you characteristics about the actual data you have.

    The 95% confidence interval was never meant to be anything but a rough guide to the variability of your data. Anything less is more or less a good guess. Robust data, or rigorous data would be in the range of 99.9999%.

  22. If 75% of biomedical studies aren’t reproducible then it’s not a few bad apples. It’s a system that rewards notoriety over truth. For-profit companies must demonstrate reproducibility every day. That’s why we trust their products. Federally funded science whether direct or via a tax subsidy can’t be reproduced because the scientists’ incentives are political not economic. Like all politicians they don’t need to be right, just persuade enough people to win. But like political systems science must be seen to function over the long haul, which means reproducibility. Academic science like so much of academe turns out to be deeply corrupt and exploitative. Academe: our most corrupt sector.

  23. Yet another indicator (as if the climate change fraud wasn’t enough) of the root problem of overpopulation in the ecological niche called “science.” Even if published results become rigidly reproducible (lol), what’s the benefit to society from all the scientific blah-blah-blah?

    The big bang observations have already been made, calculus has already been invented, relativity has already been described, the germ theory of disease has already been posited, and antibiotics have already been proven. So, genius though you may be, it looks like you’re a little late to the party to make much of a dent. Furthermore, Sputnik is dead, the Cold War is over, and public funding of “science” has become an anachronism at best.

    My suggestion, if the markets retest their Tech Wreck and Housing Bubble lows before the elections: it would probably be smart for 75% of so-called scientists to shore up their resumes with a more apropos job niche in mind, like a middle management spot in the fast food industry. For those left, if you’re interested in how many Higgs particles can dance on the end of a retrovirus, fine. Fund the research among yourselves and your pals. Just get your hand out of my back pocket.
