Climate Science take note: New gold standard established for open and reproducible research


From the University of Cambridge:

A group of Cambridge computer scientists have set a new gold standard for openness and reproducibility in research by sharing the more than 200GB of data and 20,000 lines of code behind their latest results – an unprecedented degree of openness in a peer-reviewed publication. The researchers hope that this new gold standard will be adopted by other fields, increasing the reliability of research results, especially for work which is publicly funded.

The researchers are presenting their results at a talk today (4 May) at the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI) in Oakland, California.

In recent years there’s been a great deal of discussion about so-called ‘open access’ publications – the idea that research publications, particularly those funded by public money, should be made publicly available.

Computer science has embraced open access more than many disciplines, with some publishers sub-licensing publications and allowing authors to publish them in open archives. However, as more and more corporations publish their research in academic journals, and as academics find themselves in a ‘publish or perish’ culture, the reliability of research results has come into question.

“Open access isn’t as open as you think, especially when there are corporate interests involved,” said Matthew Grosvenor, a PhD student from the University’s Computer Laboratory, and the paper’s lead author. “Due to commercial sensitivities, corporations are reluctant to make their code and data sets available when they publish in peer-reviewed journals. But without the code or data sets, the results are irrelevant – we can’t know whether an experiment is the same if we try to recreate it.”

Beyond computer science, a number of high-profile incidents of errors, fraud or misconduct have called quality standards in research into question. This has thrown the issue of reproducibility – that a result can be reliably repeated given the same conditions – into the spotlight.

“If a result cannot be reliably repeated, then how can we trust it?” said Grosvenor. “If you try to reproduce other people’s work from the paper alone, you often end up with different numbers. Unless you have access to everything, it’s useless to call a piece of research open source. It’s either open source or it’s not – you can’t open source just a little bit.”

With their most recent publication, Grosvenor and his colleagues have gone several steps beyond typical open access standards – setting a new gold standard for open and reproducible research. All of the experimental figures and tables in the award-winning final version of their paper, which describes a new method of making data centres more efficient, are clickable.

By clicking on any of the figures or tables in the paper, readers are taken to a website where the researchers have produced technically detailed descriptions of the methods for every one of their experiments. These descriptions include the original data sets and tools that were used to produce the figures as well as free and open source access to all of the source code that they wrote and modified.

In the past this might not have been possible, but thanks to cheap cloud storage, the researchers have put nearly 200GB of data and 20,000 lines of code on to the internet and made it freely available to all under a permissive open-source license.

“It now should be possible for anyone with a collection of computers to follow our instructions and produce our exact graphs,” said Grosvenor. “We think that this is the way forward for all scientific publications and so we’ve put our money where our mouth is and done it.”

###


46 thoughts on “Climate Science take note: New gold standard established for open and reproducible research”

  1. And every one of those researchers will become rich when their software is adopted … as they should … because nobody will adopt a black box …

    • As long as corporate interest drives the politics and funding of science, the so-called science will remain “settled” by committee.
      Observation becomes relevant when prostitution becomes a crime that actually embarrasses the observers, beyond their interests in greed and stature. That issue will not be resolved. So although this is a positive move, politics will find a way to bypass it. Or they will simply label honest conclusions as the work of deniers and heretics.

  2. Excellent but “should be” should be changed to “must be” if we are ever to have faith in scientists again…

  3. It’s a good thing climate science is settled… there is no need for this new-fangled cloud storage and open source stuff and nonsense.
    /snark

  4. About time.
    In these days of cheap storage and even cheaper internet access, data size is no longer an excuse.
    For example, Amazon S3 web storage – 1000GB storage, 10,000GB monthly transfer costs $150 / month by my calculation. Any university not prepared to pay a sum so trivial to ensure open access should hang its head in shame.
    http://calculator.s3.amazonaws.com/index.html

    • It’s not just about storing it. It is also about making it discoverable. There is quite a bit of digital curation that needs to happen with all of this data. Mandates are well on their way in Europe and the UK. The U.S. is behind in that regard and the state of open, discoverable digital repositories is nascent.

      • That is a great comment about curation.
        An organization can be malicious in its adherence to FOIA requests and innocently say they are complying, while slow walking it or stonewalling it. Same goes for the data.

    • Just bought a TB USB drive for $60, so that 200 GB is about $12. Not an issue. Oh, and the transfer rate exceeds network speeds on the internet… Cloud rental is for folks who can’t build an FTP server or web server… or do BitTorrent releases. (I serve about 30 GB of BitTorrent files from a $35 Raspberry Pi…)

    • It’s 41 Euros/year for 10 TB of storage on hubic.com. Works like Dropbox, with a synced folder across computers and devices, and has a cloud backup routine that doesn’t sync. This cloud storage service is only based in France. The parent company is huge.

  5. “the idea that research publications, particularly those funded by public money, should be made publicly available.”
    Wow, what a concept. Actually, I thought this was already the law! Good thing our legislators are honorable people with common sense.
    (mods: does this require a /s?)

    • Ask Phil Jones and Michael Mann…..they say that their publicly funded data and methodology is “proprietary”……. Now we can all see what transparency really means…

      • I expect when the GWPF does their study on the terrestrial temperature data, they will run up against the Jones/Mann style buzz saw. Should be interesting and hopefully enlightening. It could even become Climategate III. We shall see…….

      • It has nothing to do with transparency. It’s the “Glory” of first to publish. Once the Press Release generates the intended interest, then folks have to come to them for the “details”. They then have the power of gatekeeper.
        This is what really pisses me off about Science by Press Release. The MSM laps it up with headlines of “…a recent study…”, “…new research shows…”. Until it can be reproduced, it is interesting. Newsworthy, maybe. Scientific. No.

      • One of the problems with that is even publicly funded research may include data that another country considers proprietary. The ground station temperature data frequently includes data from stations that have unduly restrictive licensing terms on the data.
        Even after having said that, I will admit that Jones, Mann et al. really milk that excuse for all it’s worth.

    • It’s not always the case that it SHOULD be, at least not until IP protection, if appropriate, has been executed. Why should UK taxpayers fund the future wealth streams of foreign corporations, after all?
      Plenty of publicly funded research has commercial value. Why waste it through some cock-eyed self-righteous academic culture of ‘we decide what happens to our research’?? The funder decides, and in that case, it most certainly is not the HEI or the researcher, it is the taxpayer……..

  6. Well, it’s a long way from:
    ‘Why should I give the data to you, when all you want to do is find something wrong with it?’

  7. Good job! Don’t forget to archive operational metadata and dataset version info in your archive. In the long run, I would hope large national bureaus like NOAA and Hadley would store raw, unedited data, including remote sensing data, and ALL edits, adjustments, infills and cleanups to those datasets would be reproducible derivations, NOT end products which overwrite the whole. Only in this way can we ever hope to test the adjustments themselves.

  8. The University of Cambridge, Openness and “Gold Standard for…”
    I’m a skeptic.

  9. One giant leap for computer modelling and now for one small step from Mann, et al?

  10. http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328
    That link above is all you really need to know about university science. When the end product is a press release and a publication, what’s the harm, right… I’ve seen businesses waste millions of dollars trying to repeat academic science and taxpayers paying for patents that have zero commercial value. Climate science seems to be the worst: let’s see, take a worst-case climate model, apply it to a local ecosystem and conclude the polar bears are going to be eaten by huge sea creatures – presto, Nature paper.

  11. Of course when you provide all your methods and procedures along with your data, then those that follow may well make all the same “mistakes”. Sometimes even the data has been sourced to fit a belief and hence all the medical “breakthroughs” that turned out not to be breakthroughs.
    Nevertheless, if one won’t divulge their methods, procedures, data collection, assumptions and repeatability, there really is no way to check. Isn’t that what we all learned in high school physics and chemistry? Or don’t they have to do reports anymore?
    Double edged sword in some ways.
    The Phil Jones comment is interesting projection though. Says a lot about the man.

    • Or as I said before about the Glory. You do “research” and publish a Press Release. Then someone comes along, re-uses/re-analyses YOUR data and finds something else more interesting. They release Science by Press Release, get famous and get grants. You as original publisher get nothing. Why publish more than the Press Release, the paper and vague references to your data? If you do, you could lose your funding.
      What a strange world we live in.

      • LOL research produced for the Bold print, undermines the reality, that science does not require promotion insults or debates. You simply need to look.
        Long lost in the dust storm, that is essentially the largest proportion of “climate science”.

  12. I am encouraged by this. Careful review by outsiders is usually the best way to find ‘oops!’. It happens to me all the time. Why not others? We can’t have our noses buried in our methods and not make oversights.
    In a world that seeks progress we cannot have the egos of ‘those who are always right’ directing policy. Sooner or later everyone is wrong on something and we should stop punishing people so much for that. Should we correct our mistakes or defend them to the death because of fear of censure?
    The peer review process was supposed to catch those OMG! errors so that what went out did not waste readers’ time. What passes for peer review these days is not protecting the public and giving them good work, it is too frequently protecting cliques of interested parties who are empire building – white tower empires, often.
    Science review is not about trashing people, it is about proving, if at all possible, that there is something new known and shown which you can own. Learn, and move forward.
    Science has always had its toxic ideologies and ideologues – ok we can deal with it – but we do it by showing how things work. That any public money is spent based on the predictions of unvalidated, secret code computer models of the atmosphere – models paid for by the public – is downright embarrassing. Especially when there is so much which remains unknown. It is the broad spectrum of unknowns which created the space to allow a few Chicken Littles to crow, “The sky is burning.” Let’s share what we know and close that space.

  13. The situation in many areas of academia is worse than could ever be imagined by the layperson.
    As demonstrated by this astonishing long running fraud in which data was simply made up. In some cases with no actual “experiment” having been conducted at all.
    The question we should be asking is not, “how many such frauds are detected?”, but “how many cannot or will not ever be detected?”.
    The Diederik Stapel case is essential reading.
    Many insightful articles and links exist. Here is a summary of the case:
    http://pipeline.corante.com/archives/2011/11/02/faking_two_papers_a_month_for_seven_years.php

  14. Need to be a bit careful here. There are a number of new “open access” journals that have appeared which charge the author, not the reader, but some are regarded as having suspect scientific credibility. There is something called Beall’s List, which categorises them as to the credibility of the contents, and whether they are just an example of “vanity publishing”.
    What I have said does not of course include long established journals, that sometimes now allow a temporary period of free access for fresh papers, or those where the Institution of the author(s) has paid for access.

  15. ‘a number of high-profile incidents of errors, fraud or misconduct’
    The trouble is that what would be poor standards in other areas of science are accepted or even celebrated standards in climate ‘science’. Put simply, how you get the results means nothing; all that matters is that you get the ‘right result’.
    So such calls for quality really are water off a duck’s back in an area that cares nothing about quality and everything about ‘effect’.

  16. Not sure what this article has to do with climate science. I think the public has access to most of the climate models used in the various CMIP studies — the intercomparison projects between models on which IPCC reports are ultimately based. A few seconds of Googling found a list of models (http://cmip-pcmdi.llnl.gov/cmip5/docs/CMIP5_modeling_groups.pdf), and, picking one at random (the NCAR CCSM4), further search got me to a list of model components (http://www.cesm.ucar.edu/models/ccsm4.0/). From there I chose an atmospheric component and immediately found a 224 page technical description (http://www.cesm.ucar.edu/models/ccsm4.0/cam/docs/description/cam4_desc.pdf). And this is just for one atmospheric component of the CCSM4 model. Oh, and here (http://www.cesm.ucar.edu/models/ccsm4.0/tags/ccsm4_0_rel/) is a list of the versions of code anyone can download and examine as they like. All seems pretty open to me.

  17. When the likes of Phil Jones were renowned for not being able to run Excel, I don’t expect they’ll be at the front of the queue wanting to implement this sort of thing. Especially as people will only try and find fault with their work.

  18. “A group of Cambridge computer scientists have set a new gold standard for openness and reproducibility in research ”
    They use “Gold standard” THREE times in that blurb. What are these people, terrorists? Haven’t they noticed that Keynesian scientists have found out that Gold is a Barbarous Relic, that there is no Gold standard anymore, and that they SHOULD have said “Papiermark standard” or something along that line. Only debt-backed fiat money delivers real value! Just ask Venezuela, Zimbabwe or any other nation of importance!

  19. To someone like me who writes code for a living, 20,000 lines seems like a pretty modest amount of code to make a big deal about. Still, it’s certainly a step in the right direction.
    I saw a reference to the Cambridge press release on a programming web site earlier today, and also immediately thought of our friends the climate scientists. I’m glad to see that your blog picked it up.
