The code of Nature: making authors part with their programs

Guest post by Shub Niggurath

However, as it is now often practiced, one can make a good case
that computing is the last refuge of the scientific scoundrel.

—Randall LeVeque

Nature - show me your code if you want to

Some backstory first

A very interesting editorial has appeared recently in Nature magazine. What is striking is that the editorial picks up the same strands of argument that were considered in this blog – of data availability in climate science and genomics.

Arising from this post at Bishop Hill, cryptic climate blogger Eli Rabett and encyclopedia activist WM Connolley claimed that the Nature magazine of yore (c. 1990) required only crystallography and nucleic acid sequence data to be submitted as a condition for publication (which implied that all other kinds of data were exempt).

We showed this to be wrong (here and here). Nature, in those days, placed no conditions on publication, but instead expected scientists to adhere to a gentleman’s code of scientific conduct. Post-1996, it decided, like most other scientific journals, to make full data availability a formal requirement for publication.

The present data policy at Nature reads:

… a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications in material transfer agreements.

Did the above mean that everything was to be painfully worked out just to be gifted away, to be audited and dissected? Eli Rabett pursued his own inquiries at Nature. Writing to editor Philip Campbell, the blogger wondered: when Nature says ‘make protocols promptly available’, does it mean ‘hand over everything’, as in the case of software code?

I am also interested in whether Nature considers algorithmic descriptions of protocols sufficient, or, as in the case of software, a complete delivery.

Interestingly, Campbell’s answer addressed something else:

As for software, the principle we adopt is that the code need only be supplied when a new program is the kernel of the paper’s advance, and otherwise we require the algorithm to be made available.

This caused Eli Rabett to be distracted and forget his original question altogether. “A-ha! See, you don’t have to give code” (something he’d assumed was to be given).

At least something doesn’t have to be given.

A question of code

The Nature editorial carried the same idea about authors of scientific papers making their code available:

Nature does not require authors to make code available, but we do expect a description detailed enough to allow others to write their own code to do a similar analysis.

The above example serves to illustrate how partisan advocacy positions can cause long-term damage in science. In some quarters, work proceeds tirelessly to obscure and befuddle simple issues. The editorial raises a number of unsettling questions that such efforts seek to bury. Journals try to frame policy to accommodate requirements and developments in science, but apologists and obscurantists seek to hide behind journal policy to avoid providing data.

So, is the Nature position sustainable as its journal policy?

Alchemy - making things without knowing

A popularly held notion mistakes publication for science; in other words, it is science in alchemy mode. ‘I am a scientist and I synthesized A from B. I don’t need to describe how, in detail. If you can see that A could have been synthesized from B without needing explanations, that would prove you are a scientist. If you are not a scientist, why would you need to see my methods anyway?’

It is easy to see why such parochialism and close-mindedness were jettisoned. Good science does not waste time describing every known step or in pedantry. Poor science tries to hide its flaws in stunted description, masquerading as the terseness of scholarly parlance. Curiously, it is often the more spectacular results that are accompanied by this technique. As a result, rationalizations to not provide data or method take on the same form: ‘my descriptions may be sketchy but you cannot replicate my experiment, because you are just not good enough to understand the science, or follow the same trail’.

If we revisit the case of Duke University genomics researcher Anil Potti, this effect was clearly visible (a brief introduction is here). Biostatisticians Baggerly and Coombes could not replicate Potti et al’s findings from microarray experiments reported in their Nature Medicine paper.  Potti et al’s response, predictably, contained the defense: ‘You did not do what we did’.

Unfortunately, they have not followed our methods in several crucial contexts and have made unjustified conclusions in others, and as a result their interpretation of our process is flawed.

Because Coombes et al did not follow these methods precisely and excluded cell lines and experiments with truncated -log concentrations, they have made assumptions inconsistent with our procedures.

Behind the scenes, web pages changed, data files changed versions and errors were acknowledged. Eventually, the Nature Medicine paper was retracted.

The same thing repeated itself in greater vehemence with another paper. Dressman et al published results on microarray research on cancer, in the Journal of Clinical Oncology. Anil Potti and Joseph Nevins were co-authors. The paper claimed to have developed a method of finding out which patients with cancer would not respond to certain drugs. Baggerly et al reported that Dressman et al’s results arose from ‘run batch effects’ –  i.e., results that varied solely due to parts of the experiment being done on different occasions.

This time the response was severe. Dressman, with Potti and Nevins wrote in their reply in the Journal of Clinical Oncology:

To “reproduce” means to repeat, following the methods outlined in an original report. In their correspondence, Baggerly et al conclude that they are unable to reproduce the results reported in our study […]. This is an erroneous claim since in fact they did not repeat our methods.

Beyond the specific issues addressed above, we believe it is incumbent on those who question the accuracy and reproducibility of scientific studies, and thus the value of these studies, to base their accusations with the same level of rigor that they claim to address.

To reproduce means to repeat, using the same methods of analysis as reported. It does not mean to attempt to achieve the same goal of the study but with different methods. …

Despite the source code for our method of analysis being made publicly available, Baggerly et al did not repeat our methods and thus cannot comment on the reproducibility of our work.

Is this a correct understanding of scientific experiment? If a method claims to have uncovered a fundamental facet of reality, should it not be robust enough to be revealed by other methods as well, which follow the principle but differ slightly? Obviously, Potti and colleagues are wandering off into the deep end here. The points raised here are unprecedented and go well beyond the specifics of their particular case. Not only do the authors say: ‘you did not do what we did and therefore you are wrong’, they go on to say: ‘you have to do exactly what we did, to be right’. In addition, they attempt to shift the burden of proof from a paper’s authors to those who critique it.

Victoria Stodden

The Dressman et al authors face round criticism from statisticians Vincent Carey and Victoria Stodden for their approach. They note that a significant portion of Dressman et al’s results were nonreconstructible, i.e., could not be replicated even with the original data and methods, because of flaws in the data. This was only exposed when attempts were made to repeat their experiments. This defeats the authors’ comments about the rigor of their critics’ accusations. Carey and Stodden take issue with the claim that only the precise original methods can produce true results:

The rhetoric – that an investigation of reproducibility just employ “the precise methods used in the study being criticized” – is strong and introduces important obligations for primary authors. Specifically, if checks on reproducibility are to be scientifically feasible, authors must make it possible for independent scientists to somehow execute “the precise methods used” to generate the primary conclusions.

Arising from their own analysis, they agree firmly with Baggerly et al’s observations of ‘batch effects’ confounding the results. They conclude, making crucial distinctions between experiment reconstruction and reproduction:

The distinction between nonreconstructible and nonreproducible findings is worth making. Reconstructibility of an analysis is a condition that can be checked computationally, concerning data resources and availability of algorithms, tuning parameter settings, random number generator states, and suitable computing environments. Reproducibility of an analysis is a more complex and scientifically more compelling condition that is only met when scientific assertions derived from the analysis are found to be at least approximately correct when checked under independently established conditions.

Seen in this light, it is clear that an issue of ‘we cannot do what you say you did’ will morph rapidly to a  ‘does your own methods do what you say they do?’ Intractable disputes arise even with both author and critic being expert, and with much of the data openly available. Full availability of data, algorithm and computer code is perhaps the only way to address both questions.

Therefore Nature magazine’s approach of not asking for software code as a matter of routine, while requiring everything else, becomes difficult to justify.
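The ‘reconstructibility’ conditions that Carey and Stodden list (data resources, tuning parameter settings, random number generator states, computing environment) can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: the toy bootstrap analysis, the function names `run_analysis` and `build_manifest`, the data values and the seed are all hypothetical and have nothing to do with the Dressman et al analysis.

```python
import hashlib
import json
import platform
import random
import sys


def run_analysis(data, seed, n_resamples):
    """Toy bootstrap estimate of the mean; a stand-in for any stochastic analysis."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the result
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


def build_manifest(data, seed, n_resamples):
    """Record what a third party would need to re-run the analysis exactly."""
    payload = json.dumps(data).encode("utf-8")
    return {
        "data_sha256": hashlib.sha256(payload).hexdigest(),  # which data were used
        "seed": seed,                                        # random number generator state
        "n_resamples": n_resamples,                          # tuning parameter setting
        "python_version": sys.version.split()[0],            # computing environment
        "platform": platform.platform(),
    }


# Hypothetical data and settings, for illustration only.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]
manifest = build_manifest(data, seed=20110226, n_resamples=200)

first = run_analysis(data, manifest["seed"], manifest["n_resamples"])
second = run_analysis(data, manifest["seed"], manifest["n_resamples"])
print(json.dumps(manifest, indent=2))
print(first == second)
```

Re-running with the recorded seed and settings gives the same number, which is all that reconstructibility asks for. Whether that number is scientifically right is the separate, harder question of reproducibility, and a manifest like this cannot answer it.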

Software dependence

Results of experiments can hinge just on software, just as they can on the other components of scientific research. The editorial recounts an interesting example of one more instance of bioinformatics findings which were dependent on the version number of commercially available software employed by the authors.

The most bizarre example of software-dependence of results, however, comes from Hothorn and Leisch’s recent paper ‘Case studies in reproducibility’ in the journal Briefings in Bioinformatics. The authors recount the example of Pollet and Nettle (2009) reaching the mind-boggling conclusion that wealthy men give women more orgasms. Their results remained fully reproducible, in the usual sense:

Pollet and Nettle very carefully describe the data and the methods applied and their analysis meets the state-of-the-art for statistical analyzes of such a survey. Since the data are publicly[sic] available, it should be easy to fit the model and derive the same conclusions on your own computer. It is, in fact, possible to do so using the same software that was used by the authors. So, in this sense, this article is fully reproducible.

What then was the problem? It turned out that the results were software-specific.

However, one fails performing the same analysis in R Core Development Team. It turns out that Pollet and Nettle were tricked by a rather unfortunate and subtle default option when computing AICs for their proportional odds model in SPSS.
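How a quiet default can change a model comparison is easy to sketch. The example below is not a reconstruction of the SPSS option that Hothorn and Leisch describe; it is a generic illustration, with hypothetical numbers, of one well-known convention difference: whether the constant terms of a Gaussian log-likelihood are kept or dropped when computing AIC = 2k − 2 log L. The function name `aic_gaussian` and the flag `include_constant` are assumptions made for this sketch.

```python
import math


def aic_gaussian(rss, n, k, include_constant=True):
    """AIC = 2k - 2*logL for a Gaussian model with the variance at its MLE (rss / n).

    With include_constant=False, the terms n*(log(2*pi) + 1) that do not depend
    on the fitted model are dropped, as some tools do.
    """
    neg2_loglik = n * math.log(rss / n)
    if include_constant:
        neg2_loglik += n * (math.log(2 * math.pi) + 1)
    return 2 * k + neg2_loglik


# Hypothetical fits of two models to the same 50 observations.
n = 50
full_a = aic_gaussian(rss=120.0, n=n, k=3)
full_b = aic_gaussian(rss=110.0, n=n, k=5)
kernel_a = aic_gaussian(rss=120.0, n=n, k=3, include_constant=False)
kernel_b = aic_gaussian(rss=110.0, n=n, k=5, include_constant=False)

# Within one convention the difference between models is the same...
print(full_a - full_b, kernel_a - kernel_b)
# ...but mixing conventions shifts the comparison by n*(log(2*pi) + 1).
print(full_a - kernel_b)
```

Either convention is defensible as long as it is applied to every model being compared. The trouble starts when the convention changes silently between models or between packages, and a prose ‘Methods’ section will rarely reveal that.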

Certainly this type of problem is not confined to one branch of science. Many a time, the description of a method conveys one thing but the underlying code does something else (of which even the authors are unaware). The results, in turn, seem to substantiate emerging, untested hypotheses, and so the blind spot goes unchecked. Veering to climate science and the touchstone of code-related issues in scientific reproducibility, the McIntyre and McKitrick papers, Hothorn and Leisch draw obvious conclusions:

While a scientific debate on the relationship of men’s wealth and women’s orgasm frequency might be interesting only for a smaller group of specialists there is no doubt that the scientific evidence of global warming has enormous political, social and economic implications. In both cases, there would have been no hope for other, independent, researchers of detecting (potential) problems in the statistical analyzes and, therefore, conclusions, without access to the data.

Acknowledging the many subtle choices that have to be made and that never appear in a ‘Methods’ section in papers, McIntyre and McKitrick go as far as printing the main steps of their analysis in the paper (as R code).

Certainly when science becomes data- and computing-intensive, the question of how to reproduce an experiment’s results is inextricably linked with its own repeatability or reconstructibility. Papers may fall into any combination of repeatability and reproducibility, with varying degrees of both, and yet be wrong. As Hothorn and Leisch write:

So, in principle, the same issues as discussed above arise here: (i) Data need to be publically[sic] available for reinspection and (ii) the complete source code of the analysis is the only valid reference when it comes to replication of a specific analysis

Why the reluctance?

(C) Jan Hein van Dierendonck

What reasons can there be for scientists being unwilling to share their software code? As always, the answers turn out far less exotic. In 2009, Nature magazine devoted an entire issue to the question of data sharing. Post-Climategate, it briefly addressed issues of code. Computer engineer Nick Barnes opined in a Nature column on the software angle and why scientists are generally reluctant. He sympathized with scientists: they feel that their code is very “raw”, “awkward” and therefore hold “misplaced concerns about quality”. Other more routine excuses for not releasing code, we are informed, are that it is ‘not common practice’, will ‘result in requests for technical support’, is ‘intellectual property’ and that ‘it is too much work’.

In another piece, journalist Zeeya Merali took a less patronizing look at the problem. Professional computer programmers were less sanguine about what was revealed in the Climategate code.

As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists. At best, poorly written programs cause researchers such as Harry to waste valuable time and energy. But the coding problems can sometimes cause substantial harm, and have forced some scientists to retract papers.

While Climategate and HARRY_READ_ME focused attention on the problem, this was by no means unknown before. Merali reported results from an online survey by computer scientist Greg Wilson conducted in 2008.  Wilson noted that most scientists taught themselves to code and had no idea ‘how bad’ their own work was.

As a result, codes may be riddled with tiny errors that do not cause the program to break down, but may drastically change the scientific results that it spits out. One such error tripped up a structural-biology group led by Geoffrey Chang of the Scripps Research Institute in La Jolla, California. In 2006, the team realized that a computer program supplied by another lab had flipped a minus sign, which in turn reversed two columns of input data, causing protein crystal structures that the group had derived to be inverted.

Geoffrey Chang’s story was widely reported in 2006. By the time the code error was detected, his paper in Science on a protein structure had accumulated 300+ citations, influenced grant applications, caused contrary papers to be bounced, and prompted drug development work. Science magazine reported scientist Douglas Rees as saying that Chang was a hard-working scientist with good data, but the “faulty software threw everything off”. Chang’s group retracted five papers in prominent science journals.
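Why would a flipped sign go unnoticed for so long? A minimal sketch, assuming nothing about the actual program involved: negating one coordinate of every atom produces the mirror image of a structure. Every interatomic distance is preserved, so nothing crashes and simple sanity checks pass, yet the handedness is reversed. The four ‘atoms’ and the helper names below are hypothetical.

```python
import itertools
import math


def pairwise_distances(points):
    """All interatomic distances, in a fixed order."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def signed_volume(p0, p1, p2, p3):
    """Signed volume of a tetrahedron; its sign encodes handedness."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return det / 6.0


def flip_sign(points, axis=0):
    """Simulate the bug: negate one coordinate of every point."""
    return [tuple(-v if i == axis else v for i, v in enumerate(p)) for p in points]


# Four hypothetical atom positions, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirrored = flip_sign(atoms)

same_distances = all(
    math.isclose(d1, d2)
    for d1, d2 in zip(pairwise_distances(atoms), pairwise_distances(mirrored))
)
print(same_distances)                                   # distances are unchanged
print(signed_volume(*atoms), signed_volume(*mirrored))  # handedness flips sign
```

A check on distances alone would never catch this class of error. Only a test that is sensitive to handedness, or a reader with access to the code itself, would.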

Interestingly enough, Victoria Stodden reports in her blog, that she and Mark Gerstein wrote a letter to Nature, responding to the Nick Barnes and Zeeya Merali articles voicing some disagreements and suggestions. They felt that journals could help tighten the slack:

However, we disagree with an implicit assertion, that the computer codes are a component separate from the actual publication of scientific findings, often neglected in preference to the manuscript text in the race to publish. More and more, the key research results in papers are not fully contained within the small amount of manuscript text allotted to them. That is, the crucial aspects of many Nature papers are often sophisticated computer codes, and these cannot be separated from the prose narrative communicating the results of computational science. If the computer code associated with a manuscript were laid out according to accepted software standards, made openly available, and looked over as thoroughly by the journal as the text in the figure legends, many of the issues alluded to in the two pieces would simply disappear overnight.

We propose that high-quality journals such as Nature not only have editors and reviewers that focus on the prose of a manuscript but also “computational editors” that look over computer codes and verify results.

Nature decided not to publish it. It is now easy to see why.

Code battleground

Small sparks about scientific code can set off major rows. In a more recent example, the Antarctic researcher Eric Steig wrote in a comment to Nick Barnes that he faced problems with the code of Ryan O’Donnell and colleagues’ Journal of Climate paper. Irked, O’Donnell wrote back that he was surprised Steig hadn’t taken time to run their R code as a reviewer of their paper, a fact which had remained unknown up to that point. The ensuing conflagration is now well-known.

In the end, software code is undoubtedly an area where errors, inadvertent or systemic, can lurk and impact significantly on results, as even the meager examples above show, again and again. In his paper on reproducible research in 2006, Randall LeVeque wrote in the journal Proceedings of the International Congress of Mathematicians:

Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel. Of course not all computational scientists are scoundrels, any more than all patriots are, but those inclined to be sloppy in their work currently find themselves too much at home in the computational sciences.

However, LeVeque was perhaps a bit naivé, when expecting only disciplines with significant computing to attempt getting away with poor description:

Where else in science can one get away with publishing observations that are claimed to prove a theory or illustrate the success of a technique without having to give a careful description of the methods used, in sufficient detail that others can attempt to repeat the experiment? In other branches of science it is not only expected that publications contain such details, it is also standard practice for other labs to attempt to repeat important experiments soon after they are published.

In an ideal world, authors would make their methods, including software code, available along with their data. But that doesn’t happen in the real world. ‘Sharing data and code’ for the benefit of ‘scientific progress’ may be driving data repository efforts (such as DataONE), but hypothesis-driven research generates data and code specific to the question being asked. Only the primary researchers possess such data to begin with. As the “Rome” meeting of researchers, journal editors and attorneys wrote in their Nature article laying out their recommendations (Post-publication sharing of data and tools):

A strong message from Rome was that funding organizations, journals and researchers need to develop coordinated policies and actions on sharing issues.

When it comes to compliance, journals and funding agencies have the most important role in enforcement and should clearly state their distribution and data-deposition policies, the consequences of non-compliance, and consistently enforce their policy.

Modern-day science is mired in rules and investigative committees. What’s to be done naturally in science, showing others what you did, becomes a chore under a regime. However, rightly or otherwise, scientific journals have been drafted into making authors comply. Consequently it is inevitable that journal policies become battlegrounds where such issues are fought out.

71 thoughts on “The code of Nature: making authors part with their programs”

  1. Shub Niggurath,

    I enjoyed your circumspect approach to the topic of disclosure of scientific documentation.

    It is my intention to advise my representatives in the US Federal Government as follows:

    NOTE: Of course I would wordsmith the following a lot before sending to my government reps. : )

    1. All, and I mean ALL, documentation in support of climate research performed with government funds must be supplied to citizens on request. ALL means all, which means the code too. Also, associated emails.

    2. To the extent that government funds were provided to a scientist who subsequently submits a paper to a scientific journal using the result of government funding and that paper is published, then the author(s) of the papers must show all documentation to any citizen . . . . ALL as in including code.

    3. In the modern era, clicking a button or several on a computer is all that is required to submit all documentation. Electronic storage is absurdly cheap. So, arguments that it is too costly or time consuming or manpower intensive are insufficient arguments.

    John

  2. This is a subject of significant interest to me. As a licensed engineer I’m bound by law to be liable for the accuracy and appropriateness of all solutions, regardless of whether I’ve used pencil and paper for a simple analytical solution, or computer code for a complex numerical solution. I would rank computer code in order of my increasing burden of proof that the results are appropriate:

    1. Computer code which is in the public domain and has been verified by others in the literature to properly reproduce solutions of known analytical test cases.
    2. Proprietary, but commercially available computer code which has been demonstrated to reproduce known analytical solutions.
    3. Proprietary code which is not commercially available and has no previous evaluation or review in the literature.

    If I were reviewing a paper and the results were presented using a computer code such as 1) above, I would be ok as long as the author made the raw input files to the code available and I was provided assurance that solution parameters were chosen that did not cause any sort of numerical instability in the solution or otherwise exceeded the capabilities of the solution techniques used in the code. Of course whether their solution makes any sense and leads to the conclusion described by the author is a separate matter, concerning proper calibration, and selection of proper input parameters.
    If I were reviewing a paper using unverifiable proprietary code such as 3) above, I would require a complete description of the code, the numerical solutions used, and a rigorous verification of the code by comparing its solutions to a wide range of known analytical solutions. I would expect the bulk of the paper to simply address the code, and I would still be skeptical of the conclusions drawn from use of the code.

  3. First, Edit Notes:
    ’does your own methods do what you say they do?’ == ‘do your own methods …’
    a bit naivé, == a bit naïve
    _______

    Second, it is a strange “truth of nature” which can only be observed by following a specific study protocol. (I am reminded of Skinner’s “Superstition and the Pigeon”, in which he induced hungry pigeons to perform bizarre dances and rituals by letting them have food pellets at random intervals.) Surely, any coherent hypothesis/conclusion should be demonstrable in a large number of procedurally unrelated ways. A “scientific truth” specific to one and only one method of discovery is of little value or significance.

  4. Thank you for this.

    The reality is that data and code need to be disclosed so that errors – such as the inversion you mention – can be corrected.

    And, just as importantly, before a computationally dense paper can be analyzed, its own basis should be replicated. This should not be made difficult. Instead, the researchers should make a serious effort to put their “workings” before their peers and the occasional itinerant auditor who happens upon their material.

    After all, there is nothing to lose scientifically. To be found out in an error is not shameful, it is useful. (A fact Steig seems to have trouble with, but, oh well.)

    Science advances by getting closer to the truth. Providing the data and the code which got you closer to the truth confirms your claim. Unless it doesn’t in which case you go back to your work and see where you went wrong. No harm in that. Two steps forward, one step back.

  5. Software is intellectual property. It can be a trade secret, or patented. If it is published without protection, potential value is lost. That means there is a significant extra step that may be required prior to publishing. Of course, that is true of any process, frequently the scientific work that is the subject of the paper.

    Perhaps authors should think about publishing the software aspect separately, and only demonstrating the process acting on representative data. Another possibility would be to deposit compiled versions of software without source code, but with descriptions of algorithms. These compiled versions might be downloaded and installed on compatible systems for anyone to use. Or, there might be a web interface to a server that could process data input from another laboratory.

    It is important to provide the software name and version in methods sections. If a widely-available commercial product is used, obviously, the authors should not be required to deposit the software or source code in an archive prior to publication. Software changes rapidly, because people want to do new things with a useful tool. Of course, there are always bugs to discover and fix. Keeping track of every version could be a terrible burden and eventually a waste of resources.

    I don’t think there is a perfect solution regarding software. People have too many different systems and preferred programming languages. Software can be combined in a larger process in nearly infinite ways. I believe it should be sufficient to report the approach and the tools used. A good scientist would try to make software available to colleagues. Universities are almost as concerned with intellectual property as industry. Academia is not much of a refuge.

    Bad science gets discovered eventually. We can’t eliminate all of it through a prefilter. We find it when results just never seem to be repeated by anyone else. To me, the worst evil in climatology was multiple researchers apparently conspiring to hide cherry-picked data and the fraudulent nature of their conclusions. They may not be the only ones. I believe the only explanation for their conspiracy is the grant money involved. Large amounts of cash seem to breed corruption.

    At this point, I’ve concluded corruption at some level does seem to be a necessary component of AGW. It seems to be more cult religion than science.

  6. I presume Potti is in a prison somewhere now, fed beans with a slingshot on odd numbered days that are also primes, and in months with a blue moon.

  7. This is an excellent post, Shub.

    Code must be likened to a lab book, in any research that includes processing of data, and maybe nowhere more so than in climate science compilations with its reams of data, necessary adjustments and homogenization.

    Providing textual descriptions of what the code is tasked with doing is directly analogous to an abstract to a scientific paper – a summary of what is described in greater detail in the body of the paper. But would any scientist think it adequate to provide only an abstract? Of course not. No one would accept only an abstract.

    Just so, no one should accept descriptions of what code is intended to do, not without being able to see the code itself to see if some single typo or some mis-programming might have occurred.

    I found it very impressive that over 300 papers had citations to Chang’s work, and that the process and his integrity brought about the retraction of his papers. I am sure it is something he lost sleep over, but it had to be done.

    Let us not peer-review here. It DOES need to be pointed out that Chang’s papers were peer-reviewed prior to publication, thus showing that peer-review is imperfect and that any appeals to peer-review as proof that work is unimpeachable are to be viewed skeptically. No matter how small a percentage of reviews turn up to have missed flaws, the fact that they do should mean that the “assumption of error” approach of those attempting to replicate work are not “us vs them,” but merely scientists doing due diligence. It simply cannot suffice for others to start replication from an assumption of correctness.

    Correctness has to earn its way to respect. The original work is step #1, review is step #2, publication is step #3, and replication is step #4. Only after step #4 is it established science.

    Note that consensus was not one of the four steps.

  8. @Steve R Feb 26, 2011 at 11:28 pm:

    If I were reviewing a paper using unverifiable proprietary code such as 3) above, I would require a complete description of the code, the numerical solutions used, and a rigorous verification of the code by comparing its solutions to a wide range of known analytical solutions.

    Yes, thanks for pointing out the last.

    Why anyone would assert that new code is to be accepted without this proof of reliability and applicability beggars belief. I see replication as part of this step in making a paper become established “scientific fact,” because replication will ideally include testing this “wider range.”

  9. You raise a vital topic and make an excellent case, but don’t go far enough!

    The impression is given that coding bugs are a risk but perhaps not extensive, and that testing code is straightforward. Both are wrong. Coding bugs are endemic and difficult to find. Any code that has not been rigorously tested will contain bugs.

    Any software-based work should include not just how the code works, but how it was tested and verified. Submitted code should include a software test harness and test data.

    On the issue of intellectual property, why don’t publications require that code is released under some type of formal public licence?

    Mike

  10. I would suggest that the version problem can be minimized by requiring that the final published results be run on a (clean, minimal) virtual machine which is then archived at the journal’s SI site.

    A Windows 7 VM (which is, in my experience, the most space consuming) requires about 10GBytes of OS before you stick the relevant scientific packages etc. on it. However that 10GB is compressible so the total archive size is rather less than that. A Linux VM typically requires 1-2GB for a basic OS plus GUI. ISTR that I built a Linux VM with R in an 8GB HD and never used more than about a third of the disk space, even with multiple R libraries added.

    There are a huge number of benefits from requiring the clean build. These range from being completely certain about the code version and any dependencies to the ability to document precisely the steps taken (install a specific OS from ISO or VMware image, install the following versions of tools etc.). Overall it helps enforce a good attitude regarding version control and other related software engineering best practices and, by providing the VM, it allows anyone who wishes to reproduce the results given the input data. This then allows those who are interested to audit the process by analysing code quality or verifying robustness by porting the code to a different OS/languages/library version etc.

  11. Excellent article. The dangers of acceptance of insufficient testing are highlighted by the failure of the drug Thalidomide. The United States was very fortunate to have Frances Kelsey review the drug for the FDA. She decided that Thalidomide had not been sufficiently tested, so FDA approval was not given.

    In countries which had accepted the test results and approved the drug, around 12000 babies were born with phocomelia (birth deformities) due to the use of Thalidomide during pregnancy. However only seventeen were born in the US.

    The original testing had been on rats, which had apparently shown no ill-effects. When Thalidomide was found to be teratogenic (cause malformation of a fetus) further tests on rabbits and primates resulted in phocomelia in these animals.

    Thousands of people in the USA, around the age of fifty, were born healthy because Frances Kelsey refused to accept, on faith, the research results presented to her.

  12. One of the factors probably resulting in most resistance to publishing one’s code is that a lot of scientists write ugly code. I worked as a scientific programmer in the 1980s and there was considerable pressure to get the code working and get on with the real work of the lab, which was electrophysiology. Once the code seemed to work, it was used in experiments, and I’d often be frantically programming in the middle of an experiment to correct bugs which had just shown up. Many of these quick fixes were undocumented, so different versions of the program were used for different sets of data. Fortunately, the data acquisition and storage portion of the code was well debugged early in the project, but even the final version of my code still has bugs.

    I found this out the hard way when I agreed to port what I thought was a well tested assembly language PDP-11 program I’d written. This was a bleeding edge program for the time, pushing the PDP-11 to its limits for data acquisition/realtime analysis, and I naively assumed that all I’d have to do was recompile the program in my colleague’s lab and use a lower A/D sampling rate for his slower PDP-11 model. What I didn’t know was that some instructions were missing from his machine, specifically SOB (subtract one and branch). I just replaced that instruction with a decrement and comparison, but then the program didn’t work. It took over a day of almost continuous poring over my code to find a very subtle error which resulted in the program working just fine with the ordering of the instructions I had for my PDP 11/34 but failing on the lower end PDP-11. I’m sure there are still other bugs waiting to be found in that piece of code.

    This is why I like open source code which has been described as peer-reviewed code. It’s impossible to adequately document scientific code as it’s constantly changing. When we needed programming help in the lab we chose engineers who got code written and working fast, whereas computer science students coded far too slowly, producing excessively commented pretty code instead of working programs. The other thing that has to be specified is what compiler was used for the code, what libraries were linked into the code, and what fixes were made to the libraries, as well as the CPU version used. I didn’t realize how much I patched certain programs until I attempted, unsuccessfully, to run some of my own code 20 years later on a non-customized PDP-11 system.

    When I first started at UBC, we stored experimental results in analog form on a multichannel tape recorder which means the worst case scenario is that one just has to re-analyze the original data if the analysis program has serious bugs. The last work I was doing involved total digital recording and there were numerous complaints about my being overly obsessive with meticulously documenting and testing the data acquisition system (my job also included digital logic design) and writing what I was told was far too detailed calibration code. That was the one section of the project which I had great confidence in but there were some major errors along the way in the data display and analysis sections which almost always involved using + instead of – or dividing when I should have multiplied. I hate to think what type of errors are present in some of the insanely complex satellite temperature measurement programs where the displayed data involves multiple processing steps.

    One of the problems we had was that development of the programs was considered ancillary, and the actual results, which involved measuring the transfer functions of guinea pig trigeminal ganglion neurons and how they were affected by general anesthetics, were the only things that counted for publication credits, which were needed to renew grants. I estimate that 95% of the work on this project involved programming and digital electronics to perform a novel type of analysis which was pushing the limits of 1980s hardware. I guess I got something right, as people have now duplicated the results we got 25 years ago just using the general description of how we did the experiments. I tried to use some of my data acquisition routines recently and found it simpler to rewrite them than to try to figure out my 30-year-old FORTRAN code, where a 2-character variable name was used only when absolutely necessary and, not only did I use GOTOs in profusion, but I hacked the FORTRAN compiler so that I could create jump tables and use other techniques which are considered too dangerous to use in code now.

    I should also note that the full extent of my formal computer science education consisted of one evening FORTRAN programming course in 1969.

  13. Hoser says:
    February 27, 2011 at 12:03 am

    “Software is intellectual property. It can be a trade secret, or patented. If it is published without protection, potential value is lost.”

    This is fortunately not true. Software is automatically copyrighted upon creation (Berne convention and laws in each jurisdiction). It has the same protections as e.g. a novel that has been published. The mere fact of publication does not remove the copyright protections.

    I repeat, this means that the software source code can be made available without loss of protection. Perhaps if (make that when) code is required to be made available, then the scientists creating the code will have it checked by competent programmers before they use it to create dubious results. I am certain that by following the “open source” model, which does not necessarily involve the “free/libre open source” model, major improvements will be made in the code and in the reliability of the results.

  14. I am not a (Paid) scientist, BUT I have studied physics and chemistry since before the AGW-CO2 scare began (Long years. And I wish people would go read a real book and learn unbiased views, libraries are, STILL, free, rather than Wiki/Google “filtered” links. I can hope, right?). I started my “study” when “scientists” stated “an ice age was approaching”, due to emissions of CO2. Hummmm! There may have been genuine scientific concern, unfortunately, Thatcher sorted that out for us all. Her vision for the UK was “services”, NOT “making stuff”, as “making stuff” relied on COAL (Energy) at that time. Coal was bad in her eyes (Miners), AS WELL AS, oil AND nuclear. So, crush “industry” (The making stuff bit), and “expand services”. Result? Stuff all your eggs in one basket…=FAIL!

  15. Thanks for a good, informative and revealing article. It seems to this layman that the computer age has taken the scientific community by surprise with respect to:
    1. software documentation and version control
    2. the blog review process
    3. “freedom of information” (e.g. climategate)

  16. Shub said: Certainly this type of problem is not confined to one branch of science. Many a time, description of method conveys something, but the underlying code does something else (of which even the authors are unaware), the results in turn seem to substantiate emerging, untested hypotheses and as a result, the blind spot goes unchecked.

    The thrust of your piece is about how not making codes available is allowing bad science to be sustained. More might be achieved in persuading journals if you reverse the argument – good science *may* be being abandoned due to duff results caused by computer programs.

    Rather than laying it on thick that it should be easier to show where individual scientists have gone wrong, argue that releasing computer code as a matter of routine would be a massive aid to science in general getting things right. Perhaps even releasing code long before you finish your paper, so that you don’t slog away for months or years only to be undone by bad computer code.

    Nature said: A strong message from Rome was that funding organizations, journals and researchers need to develop coordinated policies and actions on sharing issues.

    Nonsense. What is it with this craven institutional attitude, bordering on mania, that no group should dare take the lead on something? They wish to erase the effect of peer pressure, to make ‘advances’ that the most illustrious journals can take credit for and further cement themselves at the top of the tree.

    This is an opportunity for other journals to upset the applecart.

  17. Hoser, that’s nonsense. “Intellectual property” is a term relevant to a marketable product or process, not to scientific reports to the research community. Get this straight: the purpose of research, in each specific instance and in general, is to bring forward ideas and evidence which contribute to understanding. PERIOD. Especially when it comes to climate. Unless you have a climate control device you’re trying to sell?

    Any competing interests are destructive to that understanding. Are they what you are defending?

  18. Speaking as a programmer, I’d say there are two necessary practices to adopt. The first is something that might better be termed code auditing than code review: the code used to write a paper needs to be checked by an expert programmer to make sure it does what it is supposed to do. The choice of programming methodology would be more suitable for reviewers to comment on – so in some cases a reviewer will need to be an expert programmer. The other practice I would suggest involves the creation of some kind of standard pseudo-code with which one can accurately describe the programmatic steps taken, without providing actual code to run a program.

    All that said, the adoption of proper programming practices would significantly improve matters. Scientists in general don’t seem to appreciate that programming is a profession just as much as practising law or medicine, with similar levels of knowledge and experience required to do the job well. People who wouldn’t dream of attempting to diagnose and treat their own illnesses, or representing themselves in court, will think themselves capable of effectively writing code that they are equally unqualified to write.

  19. Oh, one more point: adequate unit-testing would all-but prevent the programmatic ‘bugs’ from affecting results. Every section of code should be checked piece by piece to ensure correct function at every stage.
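
    To make "checked piece by piece" concrete, here is a minimal sketch using Python's standard `unittest` module. The function `anomalies` and its test values are hypothetical, chosen only to show how a small test pins down the kind of sign error (+ instead of –) mentioned in an earlier comment. It is not taken from any real analysis code.

```python
import unittest


def anomalies(readings, baseline):
    """Return each reading minus the mean of the baseline period (hypothetical example)."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    mean = sum(baseline) / len(baseline)
    return [r - mean for r in readings]


class AnomaliesTest(unittest.TestCase):
    def test_known_values(self):
        # Hand-computed case: the baseline mean is 10.0.
        self.assertEqual(anomalies([11.0, 9.0], [10.0, 10.0]), [1.0, -1.0])

    def test_sign_convention(self):
        # A reading above the baseline must give a positive anomaly.
        self.assertGreater(anomalies([12.0], [10.0])[0], 0)

    def test_empty_baseline_rejected(self):
        with self.assertRaises(ValueError):
            anomalies([1.0], [])


if __name__ == "__main__":
    # Run the three tests without exiting the interpreter.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AnomaliesTest)
    unittest.TextTestRunner().run(suite)
```

    Tests like these only cover the cases someone thought to write down, so they reduce the risk of bugs affecting results without removing it.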

  20. I wonder if there is not a case for a sub-division of science: Basic-science and “scientific interpretation”. The aim of basic science would be to provide a repository of scientific fact – facts that are undisputed. As such basic-science would publish all code, data etc. etc.

    Then perhaps for those who wish to be more secretive in their techniques, there should be lesser standards for “interpretative” science, whereby the rules for publishing details of code and methodology would be a lot less rigorous.

    The problem with climate “science” is that it seems to want to have its cake and eat it. It wants to be seen as providing the raw data, the climate “facts”, but it doesn’t want to be subject to public scrutiny.

  21. “As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists.”

    As an engineer, I was responsible for documenting everything that I did for future reference use. One problem I found early on in my career is that when I cleaned everything up I found mistakes and I thought of other things that I should have done. Then it was especially embarrassing to go back and correct my mistakes and omissions. The lack of rigor mentioned above is inexcusable. Full disclosure of code will force researchers to be more rigorous.

  22. Excellent information. We need a volunteer effort like Anthony Watts’s Surface Stations to audit journals, determine the availability of data, and publish results. I would be willing to subscribe to one journal such as Science and request information on papers, and I’m sure many others would. What we need is someone to lead this effort.

  23. Excellent stuff here.
    I am a professional software engineer. I have never released an application to the testers that does not contain hundreds of bugs, and they have never signed off an application that does not contain dozens of bugs.
    When a new version is released it can contain hundreds, even thousands of changes, and each change might have a disturbance effect, creating yet more bugs. It should go through the same testing as the original, but often does not.
    To apply this type of regime (that still results in errors) to ‘home-grown-built-on-the-fly-by-scientists’ software is simply a non-starter.

    As for versioning, that is so open to abuse that it becomes an absurd notion. There is no way of knowing that the software used at the beginning of a process is the same as that at the end. It can be changing faster than – well, the weather.

    I don’t know what the answer is here, apart from the obvious – which is to get to the position where the software doesn’t matter, where it is not so crucial.

    which means ditching the models

    EO

  24. Mike G
    I agree with you at a basic level on bug testing. But most scientific code is written by one person or a small group, to be used only by them (or so they think), run to produce expected outcomes (which serves as validation), and has a high chance of never being used in its form again.

    No wonder bug tolerance is high. I would suspect bug testing would involve repeatedly reading the code and pulling out hair, rather than a systematic approach with specific tools. (that is what I do).

    Another Gareth:
    Your approach suggests that journals would be attracted to the prospect of contributing to science. I don’t think that is particularly the case. Or at least it is not true with all journals. This may be true even if their editorial boards are staffed with people who have the highest aims about science. Most journal policies however, also have the imperative of being in tune with their economic pressures. So for a high profile journal like Nature, getting ‘scoops’ is of paramount importance for its business, rather than weeding out trouble with its papers.

    If Nature started laying down stringent code requirements, which cannot be met for the very reason that scientists with an exciting finding are hurrying to avoid getting scooped and to beat other labs into print (and don’t have time to clean up code), its business model would suffer.

    Secondly, imagine if Nature published code (it does, but not every bit) and people started balking at the awful code that is behind the latest and the greatest papers. I would imagine scientists are scared that their code sloppiness would be read as an indicator of their science sloppiness. Which in many cases, it is.

  25. John Whitman>

    “To the extent that government funds were provided to a scientist who subsequently submits a paper to a scientific journal using the result of government funding and that paper is published, then the author(s) of the papers must show all documentation to any citizen . . . . ALL as in including code.”

    That’s a different issue, really. Publicly funded work is subject to different access requirements. That issue aside, it’s certainly possible for someone to describe a method which can be followed, but without providing their code. If I have done (non-publicly funded) work which involved writing code, I may have come up with a particularly neat and elegant solution – perhaps it’s easy to work with, or particularly fast, or some such. I don’t see that it’s my responsibility, in that situation, to pass on my code – it’s my professional advantage.

    I think this situation is somewhat analogous to another which is more straightforward: for a lab experiment, directions should be provided for replicability, but there is no onus to provide lab facilities.

  26. This problem is not limited to scientific journals. It is apparently rampant in law reviews as well, a problem I am in the process of documenting.

    The review process for a law review article is limited to checking the reference to be sure that the precise words quoted are in that reference. It does not include checking to see that the inference or conclusion that the law article author takes from the reference is in fact the conclusion that the reference author drew. Nor does it include checking that source’s reference if it is not original.

    So we have an article by a prominent environmental law professor and textbook author who cites a law review article by someone else for a specific scientific fact, that second article cites to a scientific journal for that fact, and both law articles present a different conclusion than the author of the scientific journal article, even though the quotes are all accurate as far as they go. The prominent professor goes on in his article to recommend major federal policy changes based in part on this fact and unsupported conclusion. I haven’t seen any evidence that this particular article actually has been used to support major public policy changes, but I can tell you that court opinions often reference law review articles to support technical and scientific positions that provide a basis for the opinion and decision.

    This particular prominent professor told me in an email that it is not his responsibility to check the veracity of his sources, so long as they have gone through a review process, either peer-review in a scientific journal, or at another law review. He merely relies on the system to be right.

    This is all very scary to me–collectively the blind leading the blind, and ignoring those who may be able to see because they are not in the system.

  27. Randall LeVeque knows engineering science computing.

    We mere mortal engineers have been doing it the correct way for years.

    References for the editorial policies for some of the journals that have the same requirements are:

    The ASME Journal of Heat Transfer: Editorial Board, “Journal of Heat
    Transfer Editorial Policy Statement on Numerical Accuracy,” ASME
    Journal of Heat Transfer, Vol. 116, pp. 797-798, 1994.

    The ASME Journal of Fluids Engineering: C. J. Freitas, “Editorial
    Policy Statement on the Control of Numerical Accuracy,” ASME Journal
    of Fluids Engineering, Vol. 117, No. 1, p. 9, 1995

    http://www.asme.org/pubs/journals/fluideng/JFENumAccuracy.pdf.

    The AIAA Journal of Spacecraft and Rockets: AIAA, Editorial Policy
    Statement on Numerical Accuracy and Experimental Uncertainty, AIAA
    Journal, Vol. 32, No. 1, p. 3, 1994.

    The International Journal for Numerical Methods in Fluids: P. M.
    Gresho and C. Taylor, “Editorial,” International Journal of
    Numerical Methods in Fluids, Vol. 19, p. iii, 1994.

    Some posts about a methodology that has been successfully applied to a variety of engineering and scientific software are characterization of the software, requirements for production-grade software, and software verification.

  28. Excellent post. All code should be available whether it’s a climate model, string theory model, or simply a counting algorithm. I don’t really care what the scientific community thinks about this, especially since so many scientists (?) are in bed with politicians, and are more interested in promoting an ideology that goes beyond the research they are publishing. The public in general had better take notice and start demanding complete transparency in published research, and withdrawing public funding if it is not forthcoming.

  29. I once toiled in the Ivory Tower. My idealism destroyed by that experience, I had enough knowledge of what unimpeachable science was, to take my original raw data, frequency response photo images, and schematics of the equipment I used with me when I left, leaving of course the equipment, xeroxed data copies, and computer files behind. While I no longer practice in that chosen field, I still have that raw data and could reconstruct what I did, right down to building the electronic equipment, and the “Statview SE” software I used for data analysis. If anyone were to ask me for that raw data, I would not be able to say, “the dog ate it”.

    Back then I was just a low down, bottom of the totem pole research audiologist in a lab with plenty of other more experienced managers and lead researchers above me. I was hardly worth mentioning on the “et al.” list. But it was my toiling and my thesis work. I thought it incumbent to keep that raw data for as long as I lived, just in case others needed it. What gobsmacks me is the recent bungling by well-known and experienced climate researchers who state “we didn’t keep the raw data”.

    That either says something very, very bad about them, or it says maybe I should have had the cojones to stick around with a laundry clip on my nose, and continued doing research.

  30. Dave says:
    February 27, 2011 at 5:42 am

    I think this situation is somewhat analogous to another which is more straightforward: for a lab experiment, directions should be provided for replicability, but there is no onus to provide lab facilities.

    Although I generally agree with your sentiments, I think the above statement requires some elaboration. (If you’ve been following discussions at Lucia’s Blackboard, you know that analogies are kind of a running joke, but here goes anyway.)

    My background is in chemistry, so I’ll speak from that area. If a lab experiment only uses readily available equipment, certainly there would be no expectation other than to disclose “the reaction was run in a three-neck, 500 mL round-bottom flask”. But if custom equipment is required, the usual requirement is that the equipment be described in detail with drawings and a full description of how someone can obtain a similar if not identical piece of equipment. If the experiment is such that “it can only be reproduced by Joe in a special flask he takes home every night and hides under his bed”, that would present a problem.

    There are, of course, occasions where the equipment is unique and/or expensive, such as Brookhaven’s synchrotron facilities for things like XAFS or Argonne’s for neutron diffraction. In those cases, it is understood that the researchers are using shared facilities that, while not universally available, can become available with something other than a secret handshake and a certain way of spitting.

  31. The British Met Office has a $53,000,000 supercomputer that was installed in 2008 to predict the climate/weather years into the future. It has problems with next week’s. The computer works just wonderfully; it is the programs that don’t work. Or I should say, the writers of the programs are not up to the capacity of the computer’s ability.
    As Shub Niggurath says in this superb article, most scientists write their own programs. I would ask these ‘Scientists’: would you be happy having open heart surgery done to you by the local butcher?

  32. I’d go further, and require sharing the change history of any code written by the researcher. This can be done easily on a service like Bitbucket, GitHub, or Google Code, and is generally free for open-source projects. Reproducible, hermetic builds are a key bit of software engineering that the scientific community really should adopt.

  33. feet2thefire says:
    February 27, 2011 at 12:23 am
    Let us not peer-review here. It DOES need to be pointed out that Chang’s papers were peer-reviewed prior to publication, thus showing that peer-review is imperfect and that any appeals to peer-review as proof that work is unimpeachable are to be viewed skeptically.

    I would suggest as well, that Chang’s papers were not properly peer-reviewed prior to publication since the code was not provided. Peer review must include complete access to the methodology which would include any computer software or code involved.

    Is this not the crux of the matter?

  34. On the first day of every job I have had, I had to face existing code that was written by another programmer who often was not available to provide information about what they left behind. If I was lucky, the comments would tell me something about what the programmer had in mind. Because the code was often already in production, rewriting from scratch was not an option unless the code was so bad that it couldn’t be salvaged. On my current job, the code needs to be as close to 100% bug free as possible because it controls a device that in one case would take out over 7000 users if it fails. When I started on the project, uptime was measured in hours. Now it is a year or better (hard to measure with so few failures).
    This has taught me that programmers fresh out of school often lack the caution needed to write complex code and are willing to release code that may not have been reviewed or tested sufficiently in order to meet deadlines. Getting the code out fast ends up costing far more in the long run because the repair programmer not only has to learn what was intended, but has to figure out what the programmer had in mind before making a single change to the code. This is a great duplication of effort.

    This makes me wonder how someone who is interested in another subject can take a single course in programming and with no experience in programming can be expected to turn out a correctly debugged application without reviewing it several times after the program is complete. I suspect many of these applications are written like throw away code where only one run of the code is all the use the code will ever see. While I have written throw away code, I didn’t trust it and checked the results carefully to make sure it was doing what I expected it to. Permanent code I review several times, step through with a debugger when possible and I have another person review it when possible. It is easy to assume the code is doing something that it is not so a programmer must never assume their code bug free without using every check that is available.

  35. Published science is only as good as its ability to be INDEPENDENTLY VERIFIED. Think about Albert Einstein, he said gravity would bend light.

    Most thought he was right, others didn’t.

    Finally, he was proven correct. It took a number of eclipses to do it, but his logic, equations, “algorithms” & calculations were vindicated.

    The least the nerds who are proclaiming the destruction of land, atmosphere & oceans can do is to submit to the same approach used to honor Einstein’s achievements! If not, they must be hiding something, like fraud!

  36. What, today, would be an onerous requirement, will tomorrow be the click of a button.

    The only reason truly complete submissions of code, methodology, data, etc… are difficult in today’s scientific world is that there exists no apparatus to facilitate them.

    Never wanting to complain without proposing a solution: A “better” solution for issues with code would be for groups who write and use code in their papers to be asked to make a copy workable on the server of the journal of publication. Then anyone who wants access to the code can log into a code server, type a command, and have a folder with all of the code, data, etc… freshly copied, linked, compiled, and ready to go.

    In this way the group is asked only to provide support to a single party (the journal to which they submit their paper) and then the journal can set up whatever scripts are needed to maintain a system whereby anyone can quickly and easily bring the code and data on any submission together to examine the results.

    Such a system would be incredibly simple to set up, and doubtless there would be many additional features that could be added.

  37. Some more excellent points in the comments (Dena)

    Can I suggest looking at this from a slightly different point of view?

    If the good fairy descended, and provided a bit of software that purported to solve world hunger, we would run it and accept the results.

    In the real world, we should require forensic levels of proof that a particular copy of the software was used against a particular set of data in order to produce a particular set of results.
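
    One way such a link between a copy of the software, a data set and a set of results could be recorded is with cryptographic hashes. The Python sketch below is an illustration under stated assumptions: the names `sha256_of` and `provenance_record` and the three-file layout are invented for this example, and a hash only shows that the files are unchanged, not that the run actually happened as claimed.

```python
import hashlib
import json
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(code_path, data_path, results_path):
    """Tie one copy of the code, one data set and one result file together."""
    return {
        "code_sha256": sha256_of(code_path),
        "data_sha256": sha256_of(data_path),
        "results_sha256": sha256_of(results_path),
    }


if __name__ == "__main__":
    # Demonstration with throwaway files; real use would point at the actual files.
    samples = [
        ("analysis.py", b"print('hi')\n"),
        ("input.csv", b"1,2,3\n"),
        ("output.csv", b"6\n"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, content in samples:
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(content)
            paths.append(path)
        print(json.dumps(provenance_record(*paths), indent=2))
```

    Each record is three 64-character strings, so keeping one per run costs very little storage. Keeping every version of the code and data that the hashes refer to is a separate matter.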

    Anyone who knows anything about how often these things change knows that the world would run out of storage capacity in a few days. I just don’t believe, with the way things in science are currently practised, that it’s possible to get forensic.

    we need a new approach

    EO

  38. The easiest way as I see it to do this, is for researchers to show their methodology in a flowchart or UML diagram. There are a number of very good free programs to do this. Some of them even write out the underlying code in the language of choice.

    A flowchart shows the underlying logic behind the method. It is a wonderful debugging and optimisation tool in its own right, and allows other researchers to rewrite the code in their favourite language if they so wish. A lot of the mistakes outlined in this article would have been caught before publication if that type of check was made as a final step.

    As for “intellectual property”, if it is being paid for by the public purse, the researcher doesn’t get a choice in this, it has to be published along with the underlying data or a link to it.

    Even where it is not being paid for by the public, say in the case of pharmaceuticals, it should be made available to the authority governing whether the product should be approved or not. This way, an audit trail of who exactly is to blame if it proves erroneous, can be established quickly and correctly.

    Most research nowadays couldn’t even be done without computers, and to leave out or hide the code or methodology is a huge backward step in my view. Excellent article, btw.

  39. Shub Niggurath said:

    Another Gareth:
    Your approach suggests that journals would be attracted to the prospect of contributing to science. I don’t think that is particularly the case. Or at least it is not true with all journals. This may be true even if their editorial boards are staffed with people who have the highest aims about science. Most journal policies however, also have the imperative of being in tune with their economic pressures. So for a high profile journal like Nature, getting ‘scoops’ is of paramount importance for its business, rather than weeding out trouble with its papers.

    Which is why I think the smaller journals could steal a march on the leviathans. Their income stream is (presumably) smaller so economic considerations could be less distorting. The top tier have a vested interest in crafting a compromise that they are able to meet and doesn’t perturb their business operations but could conceivably hinder the smaller outfits.

    Survival of the fittest. Smaller journals are fitter. Lots of journals taking lots of different approaches and see what generates support and credit rather than a compromised, controlled and sedate inching towards transparency that maintains the cartel at the top.

    Shub Niggurath said:

    Secondly, imagine if Nature published code (it does, but not every bit) and people started balking at the awful code that is behind the latest and the greatest papers. I would imagine scientists are scared where their code sloppiness would be read as an indicator of their science sloppiness. Which in many cases, it is.

    The potential damage to reputations would be an incentive to get professional help or training to make your code tidy, sensible and well documented. Many scientists are working in or with universities. Would that be a useful environment to get computer science students and departments to cast an eye over programs and maths departments to do likewise with the handling of statistics? They could do it early on in an investigation rather than after time, money and reputations have been invested in a particular result.

  40. It is NOT enough to ask for “code” to be published with a paper or provided by the authors upon request. Code has multiple meanings in computer science. Code can be the BINARY CODE, and while having the executable program might help you reproduce the results, it’s the SOURCE CODE, along with all the necessary source code of all the software libraries needed to rebuild the software and the instructions for doing so, that is crucial for studying the analytical, mathematical, statistical, information processing and data manipulation methods, as well as other methods, used in the software.

    Actually you want both the binary and source code of the programs used and all the libraries they used, plus instructions.

  41. pwl,

    We want it all; raw data, code, metadata, methodologies. Everything, including what was collected but not used. It can all be publicly archived on line with very minimal effort. When public funds are involved there is no excuse to hide anything. We’re not talking national defense secrets here. It’s essentially weather predictions.

  42. Good article.
    As a slight aside – Microsoft spend millions/billions of dollars and zillions of programmer hours producing their stuff – and it still has bugs of one sort or another. Often, such bugs are not found until some alcohol crazed ham fisted idiot presses the wrong sequence of keys (for example) and the system, despite all its development and testing – says ‘Beggar this for a lark’ – and freezes! Now, I am sure we have all had this at some time, and we all then wondered ‘Was it me? – or is there a bug?’

    The point that a scientist’s self-generated code may be bug-ridden is surely taken as a given. The problem is that it cannot be tested sufficiently by the writer – it has to be destruct tested by ‘ham fisted idiots’, in effect! I would presume that many code writers never ‘hand ball’ their calculations to see if the code output is correct? – the old, ‘it looks right’ attitude may often prevail.

    The advantage of general code release is at least two-fold in that
    a) the code can be inspected and checked for errors by independent persons.
    b) the code can effectively be destruct tested by all other ham fisted muppets! (I mean that in the nicest possible way!)
    Obviously, additional benefits may arise in kind – such as a kindly or more experienced programmer re-writing code to be less error prone and more understandable?
    I find it hard to believe that anyone considers code to be intellectual property in the context that it is to be separated from any published work that relies upon it. If any work uses code – it must be published to validate the work.

  43. John M says:
    February 27, 2011 at 7:10 am

    Well, let’s not debate the analogy :)

    Suffice it to say that the point I was making is that there’s a point beyond which as long as replicability is reasonably possible, it’s not the responsibility of the original researcher.

    I think if any genuinely revolutionary computational method is used, it needs to be published separately in the computer science literature and evaluated there before it can be relied on. Well below that level, though, there exists something that we might loosely call ‘good code’. To take your example of the unusual piece of lab glassware, I have to describe it, but I don’t have to tell you my glass-blowing tips as long as it’s possible to construct the glassware without them.

    All this said, I can’t help feeling that a true scientist welcomes all challengers, and does his best to help them. If his theories are correct, the stronger the test, the better the proof. In the real world, many scientists don’t adhere to that ideal, of course. And, as I said before, if they’re publicly funded, that’s reason in itself to disclose pretty much anything anyone asks to see.

  44. pwl says:
    February 27, 2011 at 11:00 am

    Absolutely.
    Just receiving a bunch of compiled Fortran code means nothing without the compiler version, build number, etc…. In the same way as we cannot run MS software on Macs, etc, we cannot expect everyone’s version of a piece of software to be exactly the same. Logically, it would make sense for a repository of any used system type code to be made available (at some online server) as a ‘reference’ to be included in any published work where code was used in its production. It’s not difficult in academia, I wouldn’t think. Even in private companies, we all talk about which version of Office or Excel or whatever we are using/running, in order to advise our clients/counterparts accordingly when we send them something – would it be so difficult to organise this in science?
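
    A minimal sketch of the kind of ‘reference’ record described above, assuming the analysis is run in Python rather than Fortran. The function name `environment_record` and the choice of fields are illustrative assumptions, not anything prescribed by a journal; the point is only that interpreter, compiler and platform details can be archived alongside a result with very little effort.

```python
import json
import platform
import sys


def environment_record():
    """Collect interpreter and platform details to archive next to a result.

    The field list is an assumption for illustration; a real project would
    also want the versions of every third-party library it imports.
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "compiler": platform.python_compiler(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


record = environment_record()
# Write this JSON next to the published output so readers can see
# which environment produced it.
print(json.dumps(record, indent=2, sort_keys=True))
```

    Saving this output with each published figure or table would let a reader tell at a glance whether a mismatch might be down to a different build rather than a different method.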

  45. I too am a professional civil engineer. That is, a practitioner of applied science. I have a great deal of experience in highway design, hydrology studies and hydraulic design, and worked in the field of transportation planning.

    All of my work, including calculations, was reviewed and commented on by experts in the appropriate field of engineering, and I would not have had it any other way.

    I was paid mostly by the taxpayer (some of my work, in private practice, was for commercial interests), and since I had agreed to the various pay-scales in state and local government I was subject to, I thought I was receiving adequate pay, even though I could have practiced in the private sector (and did for about 7 years, as project manager), and received far better pay (but with far fewer benefits).
    Since I was being paid with public money, I fully expected that all of my work, including calculations, was open and available for all to see, review and comment on.

    These so-called climate “experts” are paid with public money, all of them, and paid handsomely indeed. They should not be paid extra. Nor should they be allowed to be paid and expensed to go to what are in reality grand vacations at resorts, which they refer to as seminars, on the taxpayer’s dollar.

    As it is, it is clear to me that they have little knowledge of the fundamentals of physics and chemistry, and they don’t even do arithmetical calculations properly.

  46. So Nature the journal remains the Journal of Irreproducible Results. I stopped buying it some years ago as the content of so many of its articles diverged from the headline and claimed conclusion. Opinion is not scientific proof and that is all the JIR has had for years. I browse it at news stands but there is still not enough content to justify buying. Too much of the self-proclaimed science of today is thinly disguised G.I.G.O. Flow chart analysis included with the code would go a long way toward the authors spotting their own errors, but wrt the team I have seen no real interest in science in their actions to date. For the IPCC etc., science is a cloak not a method.

  47. I concur with you fully Smokey.

    I’m simply pointing out that we have to be very specific in what is asked for from the scientists otherwise – as we know from hard lessons – many of them will be sneaky and not give source code but only the binary code which while it can be reverse engineered it’s a very time consuming and error prone process.

    The article above uses “code” when it should be saying “source code” or “source code and binary code”.

  48. Dave says:
    February 27, 2011 at 5:42 am

    ‘If I have done (non-publicly funded) work which involved writing code, I may have come up with a particularly neat and elegant solution – perhaps it’s easy to work with, or particularly fast, or some such. I don’t see that it’s my responsibility, in that situation, to pass on my code – it’s my professional advantage.’

    ————–

    I am very troubled by this attitude. Surely the professional advantage accruing to a scientist should be his sound contributions to science and any deserving recognition arising from this?

    Science properly understood is not a competition to gain advantage over others – that is a business model. Science historically, when done at its best, has either been collaborative and constructive, or competitive in terms of the ideas under consideration, not in terms of hiding the methods used. Above all else, science should strive to be reproducible. Any barriers put up to block that reproducibility or to prevent others from reaching the same results — if they are valid — should, by rendering the science unrepeatable, invalidate that science.

    If the commercial possibilities of a computer code are more enticing to a scientist than the actual scientific outcomes, then prioritize and b****y-well patent or copyright the process first, then make it available at the same time you publish your wonderful model. This was the method followed by Craig Venter in his approach to the Human Genome Project, when ‘decoding’ genes. If your only reason to withhold the code is intellectual arrogance or a desire to have some strange advantage (advantage in what?), stop pretending you are a real scientist. You are not trying to advance humanity’s understanding of nature, in that case.

  49. Following the links here led me to a timely reminder at Rabett Run written a few months after Climategate:

    As Jones wrote:

    Almost all the data we have in the CRU archive is exactly the same as in the [GHCN] archive… The original raw data are not “lost”… If we have “lost” any data it is the following:

    1. Station series for sites that in the 1980s we deemed then to be affected by either urban biases or by numerous site moves, that were either not correctable or not worth doing as there were other series in the region.

    2. The original data for sites for which we made appropriate adjustments in the temperature data in the 1980s. We still have our adjusted data, of course, and these along with all other sites that didn’t need adjusting.

    3. Since the 1980s as colleagues and National Meteorological Services (NMSs) have produced adjusted series for regions and or countries, then we replaced the data we had with the better series.

    In the papers, I’ve always said that homogeneity adjustments are best produced by [National Meteorological Services]. A good example of this is […] Here we just replaced what data we had for the 200+ sites she sorted out…

    I think if it hadn’t been this issue, the Competitive Enterprise Institute would have dreamt up something else!

    But were these adjusted “data” relying on the inadequate UHI adjustment standards, still effectively drawing on the 1990 Wang & Jones paper whose serious critique by McKitrick & Michaels was suppressed from AR4?

    I look forward to transparent data from BEST.

  50. Wonderful article. Everyone should copy and save. It is a fine contribution to our understanding of scientific method and how journal standards support or undermine it.

  51. Why not include the code?

    How else to validate that the conversion of the mathematical algorithms is correct? How else to validate that the implementation of said functions is correct? How else to validate how the data is read and mann handled in all the processes?

    We now know there never was any QA of the CRUde code, and that there’s no real QA done on NASA/GISS’s and NOAA’s climate-related coding either.

    Anyone who has an interest in math has most likely learnt about error propagation, and most folks who are interested in coding have learnt, probably the hard way, about error propagation in code, and that different languages can yield different results even though the implementation is the same in all of them. How many think the average so-called climate scientist is interested in either math or programming? How many think the average climate scientist understands, if they even know it, that different brands of calculator, and different versions of the same calculator from the same manufacturer, yield different results from the same equations?
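
    The error-propagation point can be seen on any machine using IEEE 754 double precision. The sketch below is only an illustration in Python, not code from any climate package: the helper `naive_sum` is a made-up name, and `math.fsum` is the standard library’s exactly rounded summation. The same ten numbers give different totals depending on how they are added, and the same three terms give different answers depending on the order of operations.

```python
import math


def naive_sum(values):
    """Add values left to right, rounding after every single addition."""
    total = 0.0
    for v in values:
        total += v
    return total


tenths = [0.1] * 10

# Same data, two summation methods.
accumulated = naive_sum(tenths)   # rounding error builds up step by step
exact = math.fsum(tenths)         # tracks the lost low-order bits

# Same three terms, two orders of evaluation.
big = 1e16
absorbed = (big + 1.0) - big      # the 1.0 is lost when added to 1e16
preserved = (big - big) + 1.0     # the 1.0 survives

print(accumulated == exact)       # False
print(absorbed, preserved)        # 0.0 1.0
```

    Neither answer is a ‘bug’ in the usual sense, which is exactly why the summation method and order used need to be visible to anyone trying to reproduce a published number.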

    If you’re given an average, would you not want to know how much of the data was excluded, on what grounds it was excluded, and whether the exclusion was rational and deliberate or just the result of badly implemented code or badly constructed filters (like excluding 20 years of a pre-defined period because one year was considered bad, just because there weren’t enough statically predefined days within the statically predefined temperature ranges; and who would not want to know if the predefined ranges came from the original coder’s geographic location in some Siberian outpost but were applied to Miami?), or just bugs? (But of course crazed climate hippies can outdo Microsoft, Apple, IBM, Oracle, Sun Microsystems, Google, and the whole Open Source community with bug-free code, because hacking is what they’re so into?)
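
    As a hedged illustration of what ‘knowing what was excluded and why’ could look like in practice, here is a small Python sketch. The function `filter_readings`, the station labels and the temperature limits are all hypothetical; the only idea being shown is that a filter can return its rejects, with reasons, instead of silently dropping them.

```python
def filter_readings(readings, low, high):
    """Split (station, value) pairs into kept and excluded lists.

    Every excluded reading is returned with the reason it was dropped,
    so the exclusions can be archived and reviewed with the average.
    """
    kept, excluded = [], []
    for station, value in readings:
        if value is None:
            excluded.append((station, value, "missing"))
        elif not (low <= value <= high):
            excluded.append((station, value, f"outside [{low}, {high}]"))
        else:
            kept.append((station, value))
    return kept, excluded


# Hypothetical readings in degrees C; the limits are assumptions too.
readings = [("A", 10.0), ("B", None), ("C", 99.0), ("D", 12.0)]
kept, excluded = filter_readings(readings, low=-50.0, high=60.0)

mean = sum(value for _, value in kept) / len(kept)
print(f"mean of {len(kept)} kept readings: {mean}")
for station, value, reason in excluded:
    print(f"excluded {station}: {value!r} ({reason})")
```

    With the limits passed in as explicit arguments, a reader can also see at once whether Siberian ranges were applied to Miami.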

  52. vigilantfish;
    It’s all about “scoops”. In any research field there is a set of discoveries waiting to occur: their time has come. The one who publishes first is no genius, generally, just the first of a number of “like-minded” researchers. So every little advantage counts.

    Hence the hoarding of secret sauce.

  53. Michael Mann collated bristlecone data, and when it didn’t show a hockey stick, he put it in an ftp file labeled “Censored.” [A.W. Montford’s The Hockey Stick Illusion explains how McIntyre eventually found the ‘censored file’.]

    If Mann had used the ‘censored’ data his chart would have shown declining temperatures instead of a fast rising hockey stick shape.

    Rejected data should be publicly archived, with an explanation given for why it was not used. If public funding paid for the work product, all information should be available to the public.

  54. Dave says:
    February 27, 2011 at 5:42 am

    “If I have done (non-publicly funded) work which involved writing code, I may have come up with a particularly neat and elegant solution – perhaps it’s easy to work with, or particularly fast, or some such. I don’t see that it’s my responsibility, in that situation, to pass on my code – it’s my professional advantage.”

    This position assumes that the author of the code has God’s understanding of the code and its role in the scientific work. Better to have some humility and give others the opportunity to see what you did not see in your code.

    In addition, this position assumes that others who are trying to replicate the experiment will be able to do so without the assistance of your code. That might be possible but highly unlikely. In any case, it will delay the work of replication and waste the time of fellow scientists.

    For the sake of replicability, give up your “advantage.”

  55. Smokey says:
    February 27, 2011 at 4:52 pm

    “Rejected data should be publicly archived, with an explanation given for why it was not used.”

    This is more than “rejected data.” This is a rejected finding. It is another case of “hide the decline.”

  56. Seems to me that results only achievable using a certain method aren’t worth much. Back in the 60s, programmers often did things using several different algorithms to ensure their results weren’t simply an artifact of the chosen method. Why should any worthwhile scientific knowledge depend on using the “correct” statistical package?
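
    A small sketch of that cross-checking habit, assuming Python and using population variance as the stand-in calculation. Both function names are made up for illustration: one is the textbook two-pass formula, the other is Welford’s one-pass update. If two independent algorithms agree to within rounding, the result is less likely to be an artifact of either.

```python
import math


def variance_two_pass(xs):
    """Population variance: compute the mean first, then squared deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def variance_welford(xs):
    """Population variance via Welford's one-pass running update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a = variance_two_pass(data)
b = variance_welford(data)

# Agreement within floating-point tolerance is the check; a large gap
# would mean at least one implementation (or the data handling) is wrong.
print(a, b, math.isclose(a, b, rel_tol=1e-12))
```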

  57. Nature was forced to retract Mann, Bradley and Hughes’ 1998 paper [MBH98], which purported to show an alarming hockey stick-shaped rise in modern temperatures. The chart is fraudulent.

    For the Kool Aid drinkers who still claim that Mann’s Hokey Stick is valid, note that even the UN/IPCC will no longer use it because it was debunked.

    The IPCC loved Mann’s chart. It was alarming because it clearly showed runaway global warming. The [equally bogus] hockey stick spaghetti charts that replaced it have nowhere near the visual impact of Mann’s chart. The IPCC would never have stopped using Mann’s debunked chart if they were not forced to discard it.

  58. Richard says:
    February 27, 2011 at 2:22 am
    Hoser says:
    February 27, 2011 at 12:03 am

    “Software is intellectual property. It can be a trade secret, or patented. If it is published without protection, potential value is lost.”

    This is fortunately not true. Software is automatically copyrighted upon creation (Berne convention and laws in each jurisdiction).
    ______________________________

    Well, I’m not a lawyer, but I believe there is a big difference between having a copyright and having a patent. It seems software companies agree.
    See http://www.bitlaw.com/software-patent/why-patent.html

  59. Brian H says:
    February 27, 2011 at 2:47 am
    Hoser, that’s nonsense. “Intellectual property” is a term relevant to a marketable product or process, not to scientific reports to the research community. Get this straight: the purpose of research, in each specific instance and in general, is to bring forward ideas and evidence which contribute to understanding. PERIOD.
    ____________________

    Who stuck a quarter in you? I’m talking about science as it is in the real world.

  60. In all these comments there has not been one mention of ‘Quality Management’. Yet what is being discussed is just that. If a paper being submitted to a journal was poorly spelled with incorrect punctuation and no formatting, the editors would dismiss it out of hand. Yet it seems acceptable for similar lack of quality to be displayed in software by amateur software engineers but with the important difference that the results of the paper hinge on that poor quality software!

    This isn’t a case of wouldn’t it be nice to have diagrams to support the text. The software IS the driver for the text; the software is what provides the support for the claims made.

    The entire edifice of AGW is based on software models. Read that again: SOFTWARE MODELS. This is software built by amateurs who are unable to correctly document and publish software. Has any one of these laboratories been assessed for quality against say ISO 9000-3 or similar industry standards? Who would drive a car, cross a bridge, fly an aircraft designed using software written by amateurs who are incapable of writing quality software?

    Yet the world economy is being turned over based on the output of these programs written by these people.

    Some quotes from the thread:

    Boris Gimbarzevsky says:
    February 27, 2011 at 2:13 am

    “When we needed programming help in the lab we chose engineers who got code written and working fast whereas computer science students coded far too slowly producing excessively commented pretty code instead of working programs”

    Academic programming falls into these camps – pretty non-functional code that industry has to retrain out of computer science graduates, or undocumented kludges held together with patches by hurried but keen undergrad engineering students. This – as is said in the quote – was to ‘get code written and working fast’. Great – but how do you know that it is ‘working’ unless you know the results already? Has it been verification and validation tested? No – it’s a university – you are lucky if it’s got titles let alone documentation – ‘Harry Read-Me’ is way over the top for this code. It’s fast now – but finding the bugs in it when the grad that wrote it has left and the requirements are slightly altered takes three or four times as long, and each fix introduces more effective bugs.

    Shub Niggurath says:
    February 27, 2011 at 5:34 am
    imagine if Nature published code (it does, but not every bit) and people started balking at the awful code that is behind the latest and the greatest papers. I would imagine scientists are scared where their code sloppiness would be read as an indicator of their science sloppiness. Which in many cases, it is.

    The readers of Nature should realize that the results they have been presented with are based on ‘awful code’ and sloppy code. And I am sorry to say it but sloppy code is an indicator of science sloppiness. Universities are places of learning – they should learn to write quality code. Believe it or not documented and careful up front analysis and design followed by documented implementation in something like UML followed by programming with continual verification testing, actually saves time in the long run _and_ creates quality software that your establishment can be proud of and not want to conceal.

    Dena says:
    February 27, 2011 at 8:52 am
    I suspect many of these applications are written like throw away code where only one run of the code is all the use the code will ever see. While I have written throw away code, I didn’t trust it and checked the results carefully to make sure it was doing what I expected it to.

    There is no such thing as throw away code. Code is always used longer than expected and often, if not documented properly, for uses other than those it was designed for.

    It is disappointing that something that has become as important worldwide as study of climate is carried out by ‘scientists’ who do not understand botany or statistics and who cannot produce or maintain quality software. For some reason the ‘learned’ journals seem to be run by people who fail to understand the central importance of industry quality standards and verification and validation tested well documented software. One can only assume that the editors and peer reviewers are as at a loss with software as their contributors.

  61. Interesting discussion but I must disagree with those posters who are of the opinion that scientists shouldn’t write their own programs. Programming is simple and the big attraction of minicomputers in the 1960’s and 1970’s is that one could have the machine in one’s own lab and use an interpreted language like FOCAL on the PDP-8 to do “real time” calculations. (Real time had a different meaning when the only other option was batch processing on the mainframe machine). The programs were small as the PDP-8 only had 4 Kw of core memory.

    Small programs are far easier to debug than large programs. With early minicomputers the amount of RAM was limited (I had to shoehorn every program into 56 Kb on the PDP-11) and complex tasks were accomplished by performing analysis in steps using multiple small programs which could be independently tested and debugged.

    I’m not sure at what program size the approach of self-taught scientist programmers breaks down, but a multi-megabyte program has orders of magnitude more dependencies among various subroutines than a 16 Kb program.

    The problems that were brought up by the climategate emails were not primarily as a result of poor programming practice but rather abysmal documentation. There should be a very clear record of what happens along every step of the data analysis and every result should be reproducible. Raw data is the most important asset in a research project and should never be discarded. Also, code for data acquisition and recording should be the most obsessively debugged code whereas subsequent data analysis code can always be corrected if mistakes are found.

    Having a repository of all source code is a wonderful idea as, for an area as controversial as climate research, a mass of programmers will descend on the code and pick it apart finding the bugs. This is a type of peer review which is not currently conducted but needs to be as software becomes an increasingly important component of scientific research. Valid criticisms are programming mistakes that result in incorrect results being published whereas criticism of programming style is only appropriate if the style of programming produces very buggy code.

  62. Computers, motherboards, processors and commercial software can have math errors as well. I think the computers need to be described.

  63. Well, gents, it’s an interesting discussion between computer junkies, which leaves me out.

    However, being somewhat monetarily enhanced, I can vouch for the veracity of the original premise in the allegedly faulty study above:

    reaching the mind-boggling conclusion that wealthy men give women more orgasms.

  64. Ian W
    The issues of quality control you raise apply to software written for software’s sake. Do they apply to scientific code? Of certain kinds, maybe. But do they apply to scientific code written as part of a project, just to get things required for that project done? I am not sure. The people involved are probably struggling with the coding language for the first time, reading up computer books and getting their experiments ready, at the same time (most common scenario). I would hardly imagine that they would be aware of software quality control concepts, apart from the odd student/postdoc who’s had some exposure to such concepts accidentally. Even if they are, they would quickly realize that learning and incorporating those elements is going to take effort and time, which they tend to be short of.

    Which is why the usual outcome is, the end-point of scientific code-writing is considered reached when results of some meaningful variety start showing up. Effort is then expended in making those results presentable, presenting at conferences and writing up the paper. The student who wrote the code sometimes leaves, explains everything briefly to the incoming candidate, who tries to stitch up things and wind up everything.

    I am sure there are fleeting insights that much of the results depends on the code written (because when you explain your methods you always say ‘… and then you put through the system and this is what comes out’), but I guess – just like in a business project – things change in a science project once ‘results’ have arrived, the mood is different. There is no space or opportunity to bring up ‘old’ issues and be klutzing around with code. In fact, it may become impossible to bring up.

    Most of lab-based experimentation verifies results by repeating the experiments under similar and/or different conditions. That is scientists’ internal model of validity of underlying science. With the code, you run the program many times and if it gives ‘consistent’ results, it is OK (!).

    The above is a common scenario of software/code in science (in my limited experience), and my understanding of why and where bad code gets written. I don’t think this is the universal case. (I am sure the engineering/software types are smacking their foreheads and rolling their eyes). Science is about finding something exciting and cool. Are those findings true or not – that can partly be for the community to figure out after publication (with the caveat that you don’t want too many non-replicable results coming out of your lab). Isn’t this why scientific code should be made absolutely available – a good chunk of quality control can happen afterward, when other people peer into your code? These people don’t even know how to code,…how are they going to do anything about the quality of that code…

  65. Shub Niggurath says:
    February 28, 2011 at 3:56 am
    Ian W
    The issues of quality control you raise apply to software written for software’s sake. Do they apply to scientific code? Of certain kinds, maybe. But do they apply to scientific code written as part of a project, just to get things required for that project done? I am not sure. The people involved are probably struggling with the coding language for the first time, reading up computer books and getting their experiments ready, at the same time (most common scenario). I would hardly imagine that they would be aware of software quality control concepts, apart from the odd student/postdoc who’s had some exposure to such concepts accidentally. Even if they are, they would quickly realize that learning and incorporating those elements is going to take effort and time, which they tend to be short of.

    I must first say that I worked in a university research department for several years, with very similar time pressures to get things done for research and customer deadlines.

    It is a common misconception that you get computer code written faster if you don’t bother documenting what you are doing. However, it always helps to follow quality procedures even when working alone. Analysis and design in something simple like UML, then a design review, ideally with someone else who understands what you are doing, can prevent a considerable amount of wasted effort and errors. As the requirements for the program are generated, validation test specs are written that will be run to show that each requirement has been met. Asking “how is this tested for?” often leads to the requirement definitions being rewritten to be more precise. Choice of programming language may also have a huge effect on the length of the programming task – ask for advice. Then, as the program is implemented, add more inline documentation to explain what is being done and why. Each module can then be verification tested. If a team is relatively large it also helps to impose configuration control and of course run a full secure backup of all data and software. When the program is written the validation tests can be run to confirm that it does what it was required to do. Some/most of these laboratory administrative tasks and repetitive testing and regression testing are ideal jobs for undergraduate students, who get to understand the importance of process in creating a stable research computing environment, and it takes the drudge admin work away from the grad students and post-docs. This also leads to interdisciplinary team work and ‘ego-less’ review, as a post-doc in say human factors gets his C++ code tested and corrected by a keen undergrad. This is a win-win – nobody loses by this ‘quality’ approach. Not only that but code development time actually shortens as the amount of rework is reduced, especially at the design stage.
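
    A minimal sketch of a validation test written from a requirement, assuming Python. The requirement itself (“a monthly mean is reported only when at least 20 daily values are present”), the function `monthly_mean` and the threshold are all hypothetical, chosen only to show how asking “how is this tested for?” forces the requirement to be stated precisely.

```python
def monthly_mean(daily_values, min_days=20):
    """Return the mean of the valid daily values for one month.

    Hypothetical requirement: refuse to report a mean when fewer than
    `min_days` valid (non-None) values are available.
    """
    valid = [v for v in daily_values if v is not None]
    if len(valid) < min_days:
        raise ValueError(f"only {len(valid)} valid days, need {min_days}")
    return sum(valid) / len(valid)


def test_complete_month_is_averaged():
    assert monthly_mean([10.0] * 30) == 10.0


def test_missing_days_are_ignored_not_counted_as_zero():
    assert monthly_mean([10.0] * 25 + [None] * 5) == 10.0


def test_sparse_month_is_rejected():
    try:
        monthly_mean([10.0] * 5 + [None] * 25)
    except ValueError:
        return
    raise AssertionError("sparse month should have been rejected")


# Validation run: each test maps to one clause of the requirement.
test_complete_month_is_averaged()
test_missing_days_are_ignored_not_counted_as_zero()
test_sparse_month_is_rejected()
print("all validation tests passed")
```

    The tests double as documentation: a reviewer who has never seen the analysis can read them to learn what the code promises to do with incomplete months.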

    Now take it a step further and you are bidding for industry or government contracts: isn’t it better for your department to show that they have a Quality Management System in place and working? These don’t have to be more than a set of agreed processes that everyone follows. As far as the funded research groups in NASA and NOAA and the Federally Funded Research and Development Centers (FFRDC) like CRU, I can see no justification for sloppy coding. These are professional establishments and should have their funding withdrawn if they cannot demonstrate audited standards to at least ISO 9000-3. This was one of the major unreported issues in ‘climategate’: that a DOE funded ‘professional’ research unit, CRU, was not applying any quality control; harry-read-me shows that the code was not being openly reviewed. (If it was being openly reviewed then that raises all sorts of other questions!). So the trillion dollar economic decisions of the world’s politicians are based on low quality amateur software with no documented testing. This cannot be correct.

    So what part do the ‘learned journals’ take in this – they appear to accept the ‘low quality amateur software with no documented testing’ even though it is the meat of the research on which the outcome of the research rests. Does it make sense that someone who would reject a research report because of split infinitives or poor referencing, is unworried by untested low quality software that is the basis for the report’s content?

    From my point of view, publication of the source code (and software libraries) is more important than the textual report. If a research group is ashamed to show their source code, then their research should not be trusted.

  66. I fully agree that data and source codes used in scientific articles should be made unconditionally available. This should not be difficult nowadays, as most journals offer Web space for hosting supplementary information about published articles. However, I disagree with the suggestion that journal referees should be required to examine and test source codes for software used, even if a team of “computational editors” is assigned the job. Testing and verifying computational codes can take anywhere from weeks to months of full-time work, depending on the complexity of the code. This would make the time it takes to publish an article much longer than it already is, and make it extraordinarily difficult to recruit referees, whose work is not compensated by the publishers. Let those who have critically read a particular article and are suspicious of the results and conclusions do the examining and testing, as well as the other required analysis.

    I think this idea of ensuring absolute correctness arises from the distorted meaning of “peer-reviewed” article that is being presented in discussions about AGW in the press and other media. The way the term is used, in particular by CAGW propagandists, it would seem that a peer-reviewed article should be free of any errors and its conclusions should be rock solid and eternal. The publication of an article in a refereed scientific journal is not a guarantee of the absolute absence of errors, whether mathematical, computational, methodological or conceptual. Nor is it a guarantee of the correctness of its conclusions. It should be viewed more as a quality assurance process: the review should verify that errors that can be ascertained by a knowledgeable reader within a reasonably short period of time are not present, that the scientific contents are properly connected to a larger body of knowledge and that relevant references have been cited, and that the procedures used are sound and the results obtained are novel and original.

    That is, the peer-review (or refereeing) process at most gives plausibility to the claims made in the article. Such claims should be verified by others, with greater value given to comparisons with relevant empirical data. That is why scientists should read articles with a critical mind and an attitude of sympathetic skepticism: sympathetic in the sense that the reader recognizes the value of the research presented in the article in question, and skeptical about whether the authors really made their case, so that the reader will be motivated to examine the article in detail.
