Guest post by Shub Niggurath
However, as it is now often practiced, one can make a good case
that computing is the last refuge of the scientific scoundrel.
—Randall LeVeque

Some backstory first
A very interesting editorial appeared recently in Nature magazine. What is striking is that it picks up the same strands of argument considered on this blog – data availability in climate science and genomics.
Arising from this post at Bishop Hill, cryptic climate blogger Eli Rabett and encyclopedia activist WM Connolley claimed that the Nature magazine of yore (c1990) required only crystallography and nucleic acid sequence data to be submitted as a condition for publication (implying that all other kinds of data were exempt).
We showed this to be wrong (here and here). Nature, in those days, placed no conditions on publication, but instead expected scientists to adhere to a gentleman’s code of scientific conduct. Post-1996, it decided, like most other scientific journals, to make full data availability a formal requirement for publication.
The present data policy at Nature reads:
… a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications in material transfer agreements.
Did the above mean that everything was to be painfully worked out just to be gifted away, to be audited and dissected? Eli Rabett pursued his own inquiries at Nature. Writing to editor Philip Campbell, the blogger wondered: when Nature says ‘make protocols promptly available’, does it mean ‘hand over everything’, as in the case of software code?
I am also interested in whether Nature considers algorithmic descriptions of protocols sufficient, or, as in the case of software, a complete delivery.
Interestingly, Campbell’s answer addressed something else:
As for software, the principle we adopt is that the code need only be supplied when a new program is the kernel of the paper’s advance, and otherwise we require the algorithm to be made available.
This caused Eli Rabett to be distracted and forget his original question altogether. “A-ha! See, you don’t have to give code” (something he had assumed was to be given).
At least something doesn’t have to be given.
A question of code
The Nature editorial carried the same idea about authors of scientific papers making their code available:
Nature does not require authors to make code available, but we do expect a description detailed enough to allow others to write their own code to do a similar analysis.
The above example serves to illustrate how partisan advocacy positions can cause long-term damage in science. In some quarters, work proceeds tirelessly to obscure and befuddle simple issues. The editorial raises a number of unsettling questions that such efforts seek to bury. Journals try to frame policy to accommodate requirements and developments in science, but apologists and obscurantists seek to hide behind journal policy to avoid providing data.
So, is the Nature position sustainable as its journal policy?

A popularly held notion mistakes publication for science; in other words, it is science in alchemy mode. ‘I am a scientist and I synthesized A from B. I don’t need to describe how, in detail. If you can see how A could have been synthesized from B without needing explanations, that would prove you are a scientist. If you are not a scientist, why would you need to see my methods anyway?’
It is easy to see why such parochialism and close-mindedness was jettisoned. Good science does not waste time describing every known step or indulging in pedantry. Poor science tries to hide its flaws in stunted description, masquerading as the terseness of scholarly parlance. Curiously, it is often the more spectacular results that are accompanied by this technique. As a result, rationalizations for not providing data or methods take the same form – ‘my descriptions may be sketchy but you cannot replicate my experiment, because you are just not good enough to understand the science, or follow the same trail’.
If we revisit the case of Duke University genomics researcher Anil Potti, this effect is clearly visible (a brief introduction is here). Biostatisticians Baggerly and Coombes could not replicate Potti et al’s findings from microarray experiments reported in their Nature Medicine paper. Potti et al’s response, predictably, contained the defense: ‘You did not do what we did’.
Unfortunately, they have not followed our methods in several crucial contexts and have made unjustified conclusions in others, and as a result their interpretation of our process is flawed.
…
Because Coombes et al did not follow these methods precisely and excluded cell lines and experiments with truncated -log concentrations, they have made assumptions inconsistent with our procedures.
Behind the scenes, web pages changed, data files changed versions and errors were acknowledged. Eventually, the Nature Medicine paper was retracted.
The same thing repeated itself with greater vehemence with another paper. Dressman et al published results of microarray research on cancer in the Journal of Clinical Oncology; Anil Potti and Joseph Nevins were co-authors. The paper claimed to have developed a method of finding out which patients with cancer would not respond to certain drugs. Baggerly et al reported that Dressman et al’s results arose from ‘run batch effects’ – i.e., results that varied solely because parts of the experiment were done on different occasions.
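For readers unfamiliar with the term, a ‘run batch effect’ is easy to simulate. Below is a minimal, hypothetical R sketch (invented numbers, not the Dressman data): when the groups being compared are processed on different runs, instrument drift between runs masquerades as a biological difference.

# Hypothetical simulation of a run batch effect confounded with the grouping of interest
set.seed(42)
n <- 50
batch <- rep(c("run1", "run2"), each = n)                 # two measurement runs
group <- rep(c("responder", "non-responder"), each = n)   # groups processed on different runs
signal <- rnorm(2 * n)                                    # true biology: no group difference
drift  <- ifelse(batch == "run2", 1.5, 0)                 # instrument drift in the second run
measured <- signal + drift

t.test(measured ~ group)   # "detects" a difference that is purely the batch effect

Because group and batch are perfectly confounded in this toy setup, no amount of downstream statistics can separate biology from the measurement artifact.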
This time the response was severe. Dressman, with Potti and Nevins wrote in their reply in the Journal of Clinical Oncology:
To “reproduce” means to repeat, following the methods outlined in an original report. In their correspondence, Baggerly et al conclude that they are unable to reproduce the results reported in our study […]. This is an erroneous claim since in fact they did not repeat our methods.
…
Beyond the specific issues addressed above, we believe it is incumbent on those who question the accuracy and reproducibility of scientific studies, and thus the value of these studies, to base their accusations with the same level of rigor that they claim to address.
…
To reproduce means to repeat, using the same methods of analysis as reported. It does not mean to attempt to achieve the same goal of the study but with different methods. …
Despite the source code for our method of analysis being made publicly available, Baggerly et al did not repeat our methods and thus cannot comment on the reproducibility of our work.
Is this a correct understanding of scientific experiment? If a method claims to have uncovered a fundamental facet of reality, should it not be robust enough to be revealed by other methods as well – methods that follow the same principle but differ slightly? Obviously, Potti and colleagues are wandering off into the deep end here. The points raised are unprecedented and go well beyond the specifics of their particular case – not only do the authors say ‘you did not do what we did and therefore you are wrong’, they go on to say ‘you have to do exactly what we did, to be right’. In addition, they attempt to shift the burden of proof from a paper’s authors to those who critique it.

The Dressman et al authors faced round criticism from statisticians Vincent Carey and Victoria Stodden for their approach. Carey and Stodden note that a significant portion of the Dressman et al results were nonreconstructible – i.e., they could not be replicated even with the original data and methods, because of flaws in the data. This was exposed only when attempts were made to repeat the experiments, and it undercuts the authors’ comments about the rigor of their critics’ accusations. Carey and Stodden take issue with the claim that only the precise original methods can produce true results:
The rhetoric – that an investigation of reproducibility just employ “the precise methods used in the study being criticized” – is strong and introduces important obligations for primary authors. Specifically, if checks on reproducibility are to be scientifically feasible, authors must make it possible for independent scientists to somehow execute “the precise methods used” to generate the primary conclusions.
Arising from their own analysis, they agree firmly with Baggerly et al’s observations of ‘batch effects’ confounding the results. They conclude, making crucial distinctions between experiment reconstruction and reproduction:
The distinction between nonreconstructible and nonreproducible findings is worth making. Reconstructibility of an analysis is a condition that can be checked computationally, concerning data resources and availability of algorithms, tuning parameter settings, random number generator states, and suitable computing environments. Reproducibility of an analysis is a more complex and scientifically more compelling condition that is only met when scientific assertions derived from the analysis are found to be at least approximately correct when checked under independently established conditions.
Seen in this light, it is clear that an issue of ‘we cannot do what you say you did’ will morph rapidly into ‘do your own methods do what you say they do?’ Intractable disputes arise even when both author and critic are experts, and much of the data is openly available. Full availability of data, algorithm and computer code is perhaps the only way to address both questions.
Therefore Nature magazine’s approach – not asking for software code as a matter of routine while requiring everything else – becomes difficult to sustain.
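What the ‘computational check’ of reconstructibility involves is mundane. A minimal, hypothetical R sketch (file names and parameters invented for illustration) of the ingredients Carey and Stodden list – tuning parameter settings, random number generator state and the computing environment – being archived alongside the results:

set.seed(20110227)                             # record the random number generator state
tuning <- list(n_boot = 1000, alpha = 0.05)    # spell out the tuning parameter settings

# ... the actual analysis would run here ...

saveRDS(tuning, "tuning_parameters.rds")                       # archive the settings used
writeLines(capture.output(sessionInfo()), "session_info.txt")  # archive the computing environment

Nothing here is scientifically deep; it is bookkeeping. Which is precisely why its absence is hard to excuse.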
Software dependence
Results of experiments can hinge on software just as they can on the other components of scientific research. The editorial recounts another interesting instance: bioinformatics findings that depended on the version number of the commercially available software employed by the authors.
The most bizarre example of software-dependent results, however, comes from Hothorn and Leisch’s recent paper ‘Case studies in reproducibility‘ in the journal Briefings in Bioinformatics. The authors recount the example of Pollet and Nettle (2009) reaching the mind-boggling conclusion that wealthy men give women more orgasms. Their results remained fully reproducible – in the usual sense:
Pollet and Nettle very carefully describe the data and the methods applied and their analysis meets the state-of-the-art for statistical analyzes of such a survey. Since the data are publicly[sic] available, it should be easy to fit the model and derive the same conclusions on your own computer. It is, in fact, possible to do so using the same software that was used by the authors. So, in this sense, this article is fully reproducible.
What then was the problem? It turned out that the results were software-specific.
However, one fails performing the same analysis in R (R Development Core Team). It turns out that Pollet and Nettle were tricked by a rather unfortunate and subtle default option when computing AICs for their proportional odds model in SPSS.
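To make the mechanics concrete: the model at issue is a proportional odds (ordinal logistic) regression, and the trap lay in how a default option affected the AIC being reported. A hedged R sketch with invented variable names (this is not Pollet and Nettle’s data or code):

library(MASS)   # polr() fits proportional odds models

# survey_data, orgasm_freq, partner_income and age are hypothetical placeholders
fit_full <- polr(orgasm_freq ~ partner_income + age, data = survey_data, Hess = TRUE)
fit_null <- polr(orgasm_freq ~ age, data = survey_data, Hess = TRUE)

AIC(fit_full, fit_null)   # R's AIC for the two models; SPSS defaults can report something different

The point is not that one package is right and the other wrong, but that the conclusion changed with the software, and only access to data and code made that discoverable.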
Certainly this type of problem is not confined to one branch of science. Many a time, the description of a method conveys one thing but the underlying code does something else (of which even the authors are unaware); the results in turn seem to substantiate emerging, untested hypotheses, and the blind spot goes unchecked. Turning to climate science and the touchstone of code-related issues in scientific reproducibility – the McIntyre and McKitrick papers – Hothorn and Leisch draw the obvious conclusions:
While a scientific debate on the relationship of men’s wealth and women’s orgasm frequency might be interesting only for a smaller group of specialists there is no doubt that the scientific evidence of global warming has enormous political, social and economic implications. In both cases, there would have been no hope for other, independent, researchers of detecting (potential) problems in the statistical analyzes and, therefore, conclusions, without access to the data.
…
Acknowledging the many subtle choices that have to be made and that never appear in a ‘Methods’ section in papers, McIntyre and McKitrick go as far as printing the main steps of their analysis in the paper (as R code).
Certainly, when science becomes data- and computing-intensive, the question of how to reproduce an experiment’s results is inextricably linked with its repeatability or reconstructibility. Papers may fall into any combination of repeatability and reproducibility, with varying degrees of both, and yet be wrong. As Hothorn and Leisch write:
So, in principle, the same issues as discussed above arise here: (i) Data need to be publically[sic] available for reinspection and (ii) the complete source code of the analysis is the only valid reference when it comes to replication of a specific analysis
Why the reluctance?

What reasons can there be for scientists being unwilling to share their software code? As always, the answers turn out to be far less exotic. In 2009, Nature magazine devoted an entire issue to the question of data sharing. Post-Climategate, it briefly addressed issues of code. Computer engineer Nick Barnes opined in a Nature column on the software angle and why scientists are generally reluctant. He sympathized with scientists – they feel that their code is very “raw” and “awkward”, and therefore hold “misplaced concerns about quality”. Other, more routine excuses for not releasing code, we are informed, are that it is ‘not common practice’, will ‘result in requests for technical support’, is ‘intellectual property’ and that ‘it is too much work’.
In another piece, journalist Zeeya Merali took a less patronizing look at the problem. Professional computer programmers were less sanguine about what was revealed in the Climategate code.
As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists. At best, poorly written programs cause researchers such as Harry to waste valuable time and energy. But the coding problems can sometimes cause substantial harm, and have forced some scientists to retract papers.
While Climategate and HARRY_READ_ME focused attention on the problem, this was by no means unknown before. Merali reported results from an online survey by computer scientist Greg Wilson conducted in 2008. Wilson noted that most scientists taught themselves to code and had no idea ‘how bad’ their own work was.
As a result, codes may be riddled with tiny errors that do not cause the program to break down, but may drastically change the scientific results that it spits out. One such error tripped up a structural-biology group led by Geoffrey Chang of the Scripps Research Institute in La Jolla, California. In 2006, the team realized that a computer program supplied by another lab had flipped a minus sign, which in turn reversed two columns of input data, causing protein crystal structures that the group had derived to be inverted.
Geoffrey Chang’s story was widely reported in 2006. His Science paper on a protein structure had, by the time the code error was detected, accumulated 300+ citations, influenced grant applications, caused contradicting papers to be rejected, and fed into drug-development work. Chang, Science magazine reported scientist Douglas Rees as saying, was a hard-working scientist with good data, but the “faulty software threw everything off”. Chang’s group retracted five papers in prominent science journals.
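The nature of the Chang error is easy to picture with a toy example (this is not the actual Scripps code): flipping the sign of one coordinate column mirrors the whole structure into its opposite hand, while quantities such as inter-atomic distances stay identical, so the bug is invisible to many downstream checks.

# Toy illustration: three atoms with x, y, z coordinates (invented values)
coords <- matrix(c(1.0, 0.2, -0.5,
                   0.8, 1.1,  0.3,
                   0.1, 0.4,  1.2), ncol = 3, byrow = TRUE)

mirrored <- coords
mirrored[, 1] <- -mirrored[, 1]   # the stray minus sign: reflect through the yz-plane

# Pairwise distances are unchanged, so the inverted (wrong-handed) structure looks plausible
all.equal(c(dist(coords)), c(dist(mirrored)))   # TRUE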
Interestingly enough, Victoria Stodden reports on her blog that she and Mark Gerstein wrote a letter to Nature responding to the Nick Barnes and Zeeya Merali articles, voicing some disagreements and suggestions. They felt that journals could help take up the slack:
However, we disagree with an implicit assertion, that the computer codes are a component separate from the actual publication of scientific findings, often neglected in preference to the manuscript text in the race to publish. More and more, the key research results in papers are not fully contained within the small amount of manuscript text allotted to them. That is, the crucial aspects of many Nature papers are often sophisticated computer codes, and these cannot be separated from the prose narrative communicating the results of computational science. If the computer code associated with a manuscript were laid out according to accepted software standards, made openly available, and looked over as thoroughly by the journal as the text in the figure legends, many of the issues alluded to in the two pieces would simply disappear overnight.
…
We propose that high-quality journals such as Nature not only have editors and reviewers that focus on the prose of a manuscript but also “computational editors” that look over computer codes and verify results.
Nature decided not to publish it. It is now easy to see why.
Code battleground
Small sparks about scientific code can set off major rows. In a more recent example, the Antarctic researcher Eric Steig wrote in a comment to Nick Barnes that he had faced problems with the code of Ryan O’Donnell and colleagues’ Journal of Climate paper. Irked, O’Donnell wrote back that he was surprised Steig hadn’t taken the time to run their R code as a reviewer of their paper – a fact that had remained unknown up to that point. The ensuing conflagration is now well known.
In the end, software code is undoubtedly an area where errors, inadvertent or systemic, can lurk and significantly affect results, as even the meager examples above show again and again. In his 2006 paper on reproducible research in the Proceedings of the International Congress of Mathematicians, Randall LeVeque wrote:
Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel. Of course not all computational scientists are scoundrels, any more than all patriots are, but those inclined to be sloppy in their work currently find themselves too much at home in the computational sciences.
However, LeVeque was perhaps a bit naïve in expecting that only disciplines with significant computing would try to get away with poor description:
Where else in science can one get away with publishing observations that are claimed to prove a theory or illustrate the success of a technique without having to give a careful description of the methods used, in sufficient detail that others can attempt to repeat the experiment? In other branches of science it is not only expected that publications contain such details, it is also standard practice for other labs to attempt to repeat important experiments soon after they are published.
In an ideal world, authors would make their methods, including software code, available along with their data. But that doesn’t happen in the real world. ‘Sharing data and code’ for the benefit of ‘scientific progress’ may be driving data repository efforts (such as DataONE), but hypothesis-driven research generates data and code specific to the question being asked, and only the primary researchers possess such data to begin with. As the “Rome” meeting of researchers, journal editors and attorneys wrote in their Nature article laying out their recommendations (Post-publication sharing of data and tools):
A strong message from Rome was that funding organizations, journals and researchers need to develop coordinated policies and actions on sharing issues.
…
When it comes to compliance, journals and funding agencies have the most important role in enforcement and should clearly state their distribution and data-deposition policies, the consequences of non-compliance, and consistently enforce their policy.
Modern-day science is mired in rules and investigative committees. What should come naturally in science – showing others what you did – becomes a chore under a regime. However, rightly or otherwise, scientific journals have been drafted into making authors comply. Consequently, it is inevitable that journal policies become battlegrounds where such issues are fought out.
So Nature the journal remains the Journal of Irreproducible Results. I stopped buying it some years ago as the content of so many of its articles diverged from the headline and claimed conclusion. Opinion is not scientific proof, and that is all the JIR has had for years; I browse it at news stands but there is still not enough content to justify buying. Too much of the self-proclaimed science of today is thinly disguised G.I.G.O. Flow-chart analysis included with the code would go a long way toward the authors spotting their own errors, but with respect to the team I have seen no real interest in science in their actions to date. For the IPCC etc., science is a cloak, not a method.
I concur with you fully Smokey.
I’m simply pointing out that we have to be very specific in what is asked for from the scientists; otherwise – as we know from hard lessons – many of them will be sneaky and give not the source code but only the binary code, which, while it can be reverse-engineered, is a very time-consuming and error-prone process.
The article above uses “code” when it should say “source code” or “source code and binary code”.
Dave says:
February 27, 2011 at 5:42 am
‘If I have done (non-publicly funded) work which involved writing code, I may have come up with a particularly neat an elegant solution – perhaps it’s easy to work with, or particularly fast, or some such. I don’t see that it’s my responsibility, in that situation, to pass on my code – it’s my professional advantage.’
————–
I am very troubled by this attitude. Surely the professional advantage accruing to a scientist should be his sound contributions to science and any deserved recognition arising from them?
Science properly understood is not a competition to gain advantage over others – that is a business model. Science historically, when done at its best, has been either collaborative and constructive, or competitive in terms of the ideas under consideration, not in terms of hiding the methods used. Above all else, science should strive to be reproducible. Any barriers put up to block that reproducibility or to prevent others from reaching the same results – if they are valid – should, by rendering the science unrepeatable, invalidate that science.
If the commercial possibilities of a computer code are more enticing to a scientist than the actual scientific outcomes, then prioritize and b****y-well patent or copyright the process first, then make it available at the same time you publish your wonderful model. This was the method followed by Craig Venter in his approach to the Human Genome Project, when ‘decoding’ genes. If your only reason to withhold the code is intellectual arrogance or a desire to have some strange advantage (advantage in what?), stop pretending you are a real scientist. You are not trying to advance humanity’s understanding of nature, in that case.
Following the links here led me to a timely reminder at Rabett Run written a few months after Climategate:
As Jones wrote:
But were these adjusted “data” relying on the inadequate UHI adjustment standards, still effectively drawing on the 1990 Wang & Jones paper whose serious critique by McKitrick & Michaels was suppressed from AR4?
I look forward to transparent data from BEST.
Wonderful article. Everyone should copy and save. It is a fine contribution to our understanding of scientific method and how journal standards support or undermine it.
Why not include the code?
How else to validate that the conversion of the mathematical algorithms is correct? How else to validate that the implementation of said functions is correct? How else to validate how the data is read and mann-handled in all the processes?
We now know there never was any QA of the CRUde code, and that there’s no real QA done on NASA/GISS’s and NOAA’s climate-related code either.
Anyone with an interest in math has most likely learnt about error propagation, and most folks interested in coding have learnt, probably the hard way, about error propagation in code and that different languages can yield different results even when you implement the same thing the same way in all of them. How many think the average so-called climate scientist is interested in either math or programming? How many think the average climate scientist understands, if they even know it, that different brands of calculator, and different versions of the same calculator from the same manufacturer, yield different results from the same equations?
If you’re given an average, would you not want to know how much of the data was excluded, on what grounds it was excluded, and whether the exclusion was rational and deliberate, or just the result of badly implemented code or badly constructed filters (like excluding 20 years of a pre-defined period because one year was considered bad, merely because there weren’t enough statically pre-defined days within the statically pre-defined temperature ranges – and who would not want to know if those pre-defined ranges came from the original coder’s geographic location in some Siberian outpost but were applied to Miami?), et cetera, or just bugs (but of course crazed climate hippies can outdo Microsoft, Apple, IBM, Oracle, Sun Microsystems, Google and the whole open-source community with bug-free code, because they’re just so into hacking?).
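The commenter’s point about error propagation is easy to demonstrate. A tiny illustrative R snippet (nothing to do with any particular climate code): floating-point addition is not associative, so the same formula implemented with a different order of operations gives a slightly different answer, and such differences can compound through a long chain of calculations.

a <- (0.1 + 0.2) + 0.3   # one order of operations
b <- 0.1 + (0.2 + 0.3)   # the mathematically identical calculation, grouped differently

identical(a, b)            # FALSE
print(a - b, digits = 3)   # a difference of about 1e-16, which can propagate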
vigilantfish;
It’s all about “scoops”. In any research field there is a set of discoveries waiting to occur: their time has come. The one who publishes first is no genius, generally, just the first of a number of “like-minded” researchers. So every little advantage counts.
Hence the hoarding of secret sauce.
Michael Mann collated bristlecone data, and when it didn’t show a hockey stick, he put it in an FTP directory labeled “Censored.” [A.W. Montford’s The Hockey Stick Illusion explains how McIntyre eventually found the ‘censored’ directory.]
If Mann had used the ‘censored’ data his chart would have shown declining temperatures instead of a fast rising hockey stick shape.
Rejected data should be publicly archived, with an explanation given for why it was not used. If public funding paid for the work product, all information should be available to the public.
Dave says:
February 27, 2011 at 5:42 am
“If I have done (non-publicly funded) work which involved writing code, I may have come up with a particularly neat an elegant solution – perhaps it’s easy to work with, or particularly fast, or some such. I don’t see that it’s my responsibility, in that situation, to pass on my code – it’s my professional advantage.”
This position assumes that the author of the code has God’s understanding of the code and its role in the scientific work. Better to have some humility and give others the opportunity to see what you did not see in your code.
In addition, this position assumes that others who are trying to replicate the experiment will be able to do so without the assistance of your code. That might be possible but highly unlikely. In any case, it will delay the work of replication and waste the time of fellow scientists.
For the sake of replicability, give up your “advantage.”
Smokey says:
February 27, 2011 at 4:52 pm
“Rejected data should be publicly archived, with an explanation given for why it was not used.”
This is more than “rejected data.” This is a rejected finding. It is another case of “hide the decline.”
Seems to me that results only achievable using a certain method aren’t worth much. Back in the 60s, programmers often did things using several different algorithms to ensure their results weren’t simply an artifact of the chosen method. Why should any worthwhile scientific knowledge depend on using the “correct” statistical package?
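That habit is still cheap to keep. A small illustrative R sketch (made-up data): compute the same quantity – here a regression slope – by two independent routes and check that they agree, so the number is not an artifact of one implementation.

set.seed(7)
x <- 1:100
y <- 0.03 * x + rnorm(100)               # invented data

slope_lm     <- coef(lm(y ~ x))[["x"]]   # route 1: least-squares fit
slope_direct <- cov(x, y) / var(x)       # route 2: textbook covariance formula

all.equal(slope_lm, slope_direct)        # should be TRUE to numerical precision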
Nature was forced to publish a correction to Mann, Bradley and Hughes’ 1998 paper [MBH98], which purported to show an alarming hockey stick-shaped rise in modern temperatures. The chart is fraudulent.
For the Kool Aid drinkers who still claim that Mann’s Hokey Stick is valid, note that even the UN/IPCC will no longer use it because it was debunked.
The IPCC loved Mann’s chart. It was alarming because it clearly showed runaway global warming. The [equally bogus] hockey stick spaghetti charts that replaced it have nowhere near the visual impact of Mann’s chart. The IPCC would never have stopped using Mann’s debunked chart if they were not forced to discard it.
Richard says:
February 27, 2011 at 2:22 am
Hoser says:
February 27, 2011 at 12:03 am
“Software is intellectual property. It can be a trade secret, or patented. If it is published without protection, potential value is lost.”
This is fortunately not true. Software is automatically copyrighted upon creation (Berne convention and laws in each jurisdiction).
______________________________
Well, I’m not a lawyer, but I believe there is a big difference between having a copyright and having a patent. It seems software companies agree.
See http://www.bitlaw.com/software-patent/why-patent.html
Brian H says:
February 27, 2011 at 2:47 am
Hoser, that’s nonsense. “Intellectual property” is a term relevant to a marketable product or process, not to scientific reports to the research community. Get this straight: the purpose of research, in each specific instance and in general, is to bring forward ideas and evidence which contribute to understanding. PERIOD.
____________________
Who stuck a quarter in you? I’m talking about science as it is in the real world.
In all these comments there has not been one mention of ‘Quality Management’. Yet what is being discussed is just that. If a paper being submitted to a journal were poorly spelled, with incorrect punctuation and no formatting, the editors would dismiss it out of hand. Yet it seems acceptable for a similar lack of quality to be displayed in software by amateur software engineers – with the important difference that the results of the paper hinge on that poor-quality software!
This isn’t a case of ‘wouldn’t it be nice to have diagrams to support the text’. The software IS the driver for the text; the software is what provides the support for the claims made.
The entire edifice of AGW is based on software models. Read that again: SOFTWARE MODELS. This is software built by amateurs who are unable to correctly document and publish software. Has any one of these laboratories been assessed for quality against, say, ISO 9000-3 or similar industry standards? Who would drive a car, cross a bridge, or fly in an aircraft designed using software written by amateurs who are incapable of writing quality software?
Yet the world economy is being turned over based on the output of these programs written by these people.
Some quotes from the thread:
Boris Gimbarzevsky says:
February 27, 2011 at 2:13 am
“When we needed programming help in the lab we chose engineers who got code written and working fast whereas computer science students coded far too slowly producing excessively commented pretty code instead of working programs”
Academic programming falls into these two camps – pretty, non-functional code that industry has to retrain out of computer science graduates, or undocumented kludges held together with patches by hurried but keen undergrad engineering students. The latter – as is said in the quote – was to ‘get code written and working fast’. Great – but how do you know that it is ‘working’ – because you know the results already? Has it been verification- and validation-tested? No – it’s a university – you are lucky if it’s got titles, let alone documentation – ‘Harry Read-Me’ is way over the top for this code. It’s fast now – but finding the bugs in it when the grad who wrote it has left and the requirements are slightly altered takes three or four times as long, and each fix introduces more bugs.
Shub Niggurath says:
February 27, 2011 at 5:34 am
imagine if Nature published code (it does, but not every bit) and people started balking at the awful code that is behind the latest and greatest papers. I would imagine scientists are scared that their code sloppiness would be read as an indicator of their science sloppiness. Which, in many cases, it is.
The readers of Nature should realize that the results they have been presented with are based on ‘awful’ and sloppy code. And I am sorry to say it, but sloppy code is an indicator of science sloppiness. Universities are places of learning – they should learn to write quality code. Believe it or not, documented and careful up-front analysis and design in something like UML, followed by documented implementation and programming with continual verification testing, actually saves time in the long run _and_ creates quality software that your establishment can be proud of and not want to conceal.
Dena says:
February 27, 2011 at 8:52 am
I suspect many of these applications are written like throw away code where only one run of the code is all the use the code will ever see. While I have written throw away code, I didn’t trust it and checked the results carefully to make sure it was doing what I expected it to.
There is no such thing as throw away code. Code is always used longer than expected and often, if not documented properly, for uses other than those it was designed for.
It is disappointing that something that has become as important worldwide as the study of climate is carried out by ‘scientists’ who do not understand botany or statistics and who cannot produce or maintain quality software. For some reason the ‘learned’ journals seem to be run by people who fail to understand the central importance of industry quality standards and of verification- and validation-tested, well-documented software. One can only assume that the editors and peer reviewers are as much at a loss with software as their contributors.
Interesting discussion, but I must disagree with those posters who are of the opinion that scientists shouldn’t write their own programs. Programming is simple, and the big attraction of minicomputers in the 1960s and 1970s was that one could have the machine in one’s own lab and use an interpreted language like FOCAL on the PDP-8 to do “real time” calculations. (Real time had a different meaning when the only other option was batch processing on the mainframe.) The programs were small, as the PDP-8 only had 4 Kw of core memory.
Small programs are far easier to debug than large programs. With early minicomputers the amount of RAM was limited (I had to shoehorn every program into 56 Kb on the PDP-11) and complex tasks were accomplished by performing analysis in steps using multiple small programs which could be independently tested and debugged.
I’m not sure at what program size the approach of self-taught scientist programmers breaks down, but a multi-megabyte program has orders of magnitude more dependencies among various subroutines than a 16 Kb program.
The problems that were brought up by the Climategate emails were not primarily a result of poor programming practice but rather of abysmal documentation. There should be a very clear record of what happens at every step of the data analysis, and every result should be reproducible. Raw data is the most important asset in a research project and should never be discarded. Also, code for data acquisition and recording should be the most obsessively debugged code, whereas subsequent data-analysis code can always be corrected if mistakes are found.
Having a repository of all source code is a wonderful idea as, for an area as controversial as climate research, a mass of programmers will descend on the code and pick it apart finding the bugs. This is a type of peer review which is not currently conducted but needs to be as software becomes an increasingly important component of scientific research. Valid criticisms are programming mistakes that result in incorrect results being published whereas criticism of programming style is only appropriate if the style of programming produces very buggy code.
Computers, motherboards, processors and commercial software can have math errors as well. I think the computers need to be described.
Well, gents, it’s an interesting discussion between computer junkies, which leaves me out.
However, being somewhat monetarily enhanced, I can vouch for the veracity of the original premise in the allegedly faulty study above:
reaching the mind-boggling conclusion that wealthy men give women more orgasms.
Ian W
The issues of quality control you raise apply to software written for software’s sake. Do they apply to scientific code? Of certain kinds, maybe. But do they apply to scientific code written as part of a project, just to get things required for that project done? I am not sure. The people involved are probably struggling with the coding language for the first time, reading up computer books and getting their experiments ready at the same time (the most common scenario). I would hardly imagine that they would be aware of software quality control concepts, apart from the odd student/postdoc who has had some exposure to such concepts accidentally. Even if they are, they would quickly realize that learning and incorporating those elements is going to take effort and time, which they tend to be short of.
Which is why the usual outcome is that the end-point of scientific code-writing is considered reached when results of some meaningful variety start showing up. Effort is then expended in making those results presentable, presenting at conferences and writing up the paper. The student who wrote the code sometimes leaves, explains everything briefly to the incoming candidate, who tries to stitch things up and wind everything up.
I am sure there are fleeting insights that much of the result depends on the code written (because when you explain your methods you always say ‘… and then you put it through the system and this is what comes out’), but I guess – just as in a business project – things change in a science project once ‘results’ have arrived; the mood is different. There is no space or opportunity to bring up ‘old’ issues and be klutzing around with code. In fact, it may become impossible to bring them up.
Most lab-based experimentation verifies results by repeating the experiments under similar and/or different conditions. That is scientists’ internal model of the validity of the underlying science. With code, you run the program many times and if it gives ‘consistent’ results, it is OK (!).
The above is a common scenario of software/code in science (in my limited experience), and my understanding of why and where bad code gets written. I don’t think this is the universal case. (I am sure the engineering/software types are smacking their foreheads and rolling their eyes.) Science is about finding something exciting and cool. Whether those findings are true or not can partly be for the community to figure out after publication (with the caveat that you don’t want too many non-replicable results coming out of your lab). Isn’t this why scientific code should be made absolutely available – a good chunk of quality control can happen afterward, when other people peer into your code? These people don’t even know how to code… how are they going to do anything about the quality of that code…
Shub Niggurath says:
February 28, 2011 at 3:56 am
Ian W
The issues of quality control you raise apply to software written for software’s sake. Do they apply to scientific code? Of certain kinds, maybe. But do they apply to scientific code written as part of a project, just to get things required for that project done? I am not sure. The people involved are probably struggling with the coding language for the first time, reading up computer books and getting their experiments ready at the same time (the most common scenario). I would hardly imagine that they would be aware of software quality control concepts, apart from the odd student/postdoc who has had some exposure to such concepts accidentally. Even if they are, they would quickly realize that learning and incorporating those elements is going to take effort and time, which they tend to be short of.
I must first say that I worked in a university research department for several years, with very similar time pressures to get things done for research and customer deadlines.
It is a common misconception that you get computer code written faster if you don’t bother documenting what you are doing. However, it always helps to follow quality procedures even when working alone. Analysis and design in something simple like UML, then a design review – ideally with someone else who understands what you are doing – can prevent a considerable amount of wasted effort and errors. As the requirements for the program are generated, validation test specs are written that will be run to show that each requirement has been met. Asking “how is this tested for?” often leads to the requirement definitions being rewritten to be more precise. Choice of programming language may also have a huge effect on how long the programming task takes – ask for advice. Then, as the program is implemented, add inline documentation to explain what is being done and why. Each module can then be verification-tested. If a team is relatively large, it also helps to impose configuration control and, of course, to run full secure backups of all data and software. When the program is written, the validation tests can be run to confirm that it does what it was required to do. Most of these laboratory administrative tasks, and the repetitive testing and regression testing, are ideal jobs for undergraduate students, who get to understand the importance of process in creating a stable research computing environment, and it takes the drudge admin work away from the grad students and post-docs. This also leads to interdisciplinary teamwork and ‘ego-less’ review, as a post-doc in, say, human factors gets his C++ code tested and corrected by a keen undergrad. This is a win-win – nobody loses by this ‘quality’ approach. Not only that, but code development time actually shortens as the amount of rework is reduced, especially at the design stage.
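As a concrete flavour of what a ‘verification test’ looks like in practice, here is a minimal hypothetical R sketch using the testthat package; the function and the requirement it encodes are invented for illustration.

library(testthat)

# Hypothetical requirement: an anomaly is the deviation from the mean of a baseline period
compute_anomaly <- function(temps, baseline) temps - mean(baseline)

test_that("the baseline mean itself has zero anomaly", {
  baseline <- c(14.1, 14.3, 13.9)
  expect_equal(compute_anomaly(mean(baseline), baseline), 0)
})

Tests like this are re-run after every change, which is exactly the ‘continual verification testing’ described above.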
Now take it a step further: if you are bidding for industry or government contracts, isn’t it better for your department to show that it has a Quality Management System in place and working? This doesn’t have to be more than a set of agreed processes that everyone follows. As for the funded research groups in NASA and NOAA, the Federally Funded Research and Development Centers (FFRDCs), and DOE-funded units like CRU, I can see no justification for sloppy coding. These are professional establishments and should have their funding withdrawn if they cannot demonstrate audited standards to at least ISO 9000-3. This was one of the major unreported issues in ‘Climategate’: a DOE-funded ‘professional’ research unit, CRU, was not applying any quality control – HARRY_READ_ME shows that the code was not being openly reviewed. (If it was being openly reviewed, then that raises all sorts of other questions!) So the trillion-dollar economic decisions of the world’s politicians are based on low-quality amateur software with no documented testing. This cannot be correct.
So what part do the ‘learned journals’ play in this? They appear to accept the ‘low-quality amateur software with no documented testing’ even though it is the meat of the research, on which the outcome of the research rests. Does it make sense that someone who would reject a research report because of split infinitives or poor referencing is unworried by untested, low-quality software that is the basis for the report’s content?
From my point of view, publication of the source code (and software libraries) is more important than the textual report. If a research group is ashamed to show their source code then their research should not be trusted.
I fully agree that data and source code used in scientific articles should be made unconditionally available. This should not be difficult nowadays, as most journals offer Web space for hosting supplementary information about published articles. However, I disagree with the suggestion that journal referees should be required to examine and test the source code for software used, even if a team of “computational editors” is assigned the job. Testing and verifying computational code can take anywhere from weeks to months of full-time work, depending on the complexity of the code. This would make the time it takes to publish an article much longer than it already is and make it extraordinarily difficult to recruit referees, whose work is not compensated by the publishers. Let those who have critically read a particular article and are suspicious of the results and conclusions do the examining and testing, as well as the other required analysis.
I think this idea of ensuring absolute correctness arises from the distorted meaning of “peer-reviewed” article that is being presented in discussions about AGW in the press and other media. The way the term is used, in particular by CAGW propagandists, it would seem that a peer-reviewed article should be free of any errors and its conclusions should be rock solid and eternal. The publication of an article in a refereed scientific journal is not a guarantee of the absolute absence of errors, whether mathematical, computational, methodological or conceptual. Nor is it a guarantee of the correctness of its conclusions. It should be viewed more as a quality assurance process: the review should verify that errors that can be ascertained by a knowledgeable reader within a reasonably short period of time are not present, that the scientific contents are properly connected to a larger body of knowledge and that relevant references have been cited, and that the procedures used are sound and the results obtained are novel and original. That is, the peer-review (or refereeing) process at most gives plausibility to the claims made in the article. Such claims should be verified by others, with greater value given to comparisons with relevant empirical data. That is why scientists should read articles with a critical mind and an attitude of sympathetic skepticism: sympathetic in the sense that the reader recognizes the value of the research presented in the article in question, and skeptical about whether the authors really made their case, so that the reader will be motivated to examine the article in detail.