In my opinion, this is a testament to Steve McIntyre’s tenacity.
Via the GWPF: At Last, The Right Lesson From Climategate Fiasco
A diverse group of academic research scientists from across the U.S. have written a policy paper which has been published in the journal Science, suggesting that the time has come for all science journals to begin requiring computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.
The group argues that because computer programs are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom written applications in order for work to be peer reviewed or duplicated by other researchers attempting to verify results.
Not providing source code, they say, is now akin to withholding parts of the procedural process, which results in a “black box” approach to science, which is, of course, not tolerated in virtually any other area of research in which results are published. It’s difficult to imagine any other realm of scientific research getting such a pass and the fact that code is not published in an open source forum detracts from the credibility of any study upon which it is based. Articles based on computer simulations, for example, such as many of those written about astrophysics or environmental predictions, tend to become meaningless when they are offered without also offering the source code of the simulations on which they are based.
The team acknowledges that many researchers are clearly reticent to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value, but suggest that such reasons should no longer be considered sufficient for withholding such code. They suggest that forcing researchers to reveal their code would likely result in cleaner more portable code and that open-source licensing could be made available for proprietary code.
They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.
The group also points out that the use of computer code, both off the shelf and custom written, will likely become ever more present in research endeavors, and thus, as time passes, it becomes ever more crucial that such code is made available when results are published; otherwise, the very nature of peer review and reproducibility will cease to have meaning in the scientific context.
More information: Shining Light into Black Boxes, Science 13 April 2012: Vol. 336 no. 6078 pp. 159-160 DOI: 10.1126/science.1218263
Abstract
The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).
=========================================
Should this happen I expect the publication rate of many individuals in ‘climate modeling’ to collapse.
Many researchers are *reluctant* to reveal their source code, but *reticent* they are not.
Anthony Watts says:
April 17, 2012 at 12:36 pm
I wrote my first computer program in 1963. As a broad generalization, for me, there’s two kinds of programmers. Those that used punch cards, and those that didn’t.
Those that used punch cards are generally skeptical of all computer results. Those that didn’t, often not so much …
I publish my own data and code, but it’s not only not user-friendly, it would best be described as “user-aggressive” … I write in R, and the joy and the bane of R is that you can run all or part of a program. So to produce a graph, I may run some lines from one part of the program to get the data, and some lines from another part to graph it … bad Willis, no cookies.
w.
Excellent idea. Regarding the problem with amateurish, undocumented code – why, we will just stipulate that every program be accompanied by its own Harry.readme file /sarc
Lord, lord, when I was doing scientific programming about 45 years ago, I wrote in Fortran. But it was an extensively tweaked Fortran, which let me use assembly language in with the Fortran. It probably was only run on that one particular Control Data 3100. “Portable code” is a much larger problem than it looks to be. Flowcharts and pseudocode are the easiest way out.
Graeme W says:
April 17, 2012 at 1:41 pm
I agree with Leif’s concerns about software dependencies, but the main point is to make sure that the custom code associated with the paper is available. The dependencies required to get it to work, while potentially annoying and time-consuming, do not stop someone from evaluating the code required for the paper to see if there are any issues with it.
#############################################################
This is surely the nub of the thread. The reason why source code and data should be an automatic provision is so that somebody who wishes to can follow the process and attempt to see any errors which would disprove whatever was claimed. It doesn’t matter if it is amateurish, long-winded or disorganised – it either works or it doesn’t.
If somebody claimed to be able to prove that a cold fusion process worked and provided a proof in a new, private mathematical symbology, who would believe them? It is the same with these climatologists who claim things but won’t show their workings – who believes them? Not me.
Really excellent news, except for those scientists who care more about avoiding their own errors coming to light than the truth and scientific progress.
I don’t see that what language the code is written in is likely to be a major problem in the vast majority of cases, nor that it would be reasonable to seek to restrict what language was used, providing it and any code libraries used were reasonably accessible.
Likewise, while good programming practice should certainly be encouraged (as should employment of a widely used language), I’m very doubtful that there should be any prescriptive requirements – IMO that would be unduly onerous on scientists, who cannot and should not be expected to be professional-standard programmers. The important objectives are surely to ensure replicability (including testing the effects of various changes) and to enable discovery of errors. I don’t see that elimination of errors shouldn’t be the key aim, at least in most cases, although publication of code can be expected to make scientists take more care to avoid and to correct program errors.
The main thing from the point of replicability and finding any errors is surely that others should both be able to follow exactly what the code does and to run it, both as is and with changes to test the robustness of the results of the study employing the code. IMO (speaking as a pretty poor programmer), the key requirement for doing so is frequent, detailed comments in the code setting out what operation is being performed, along with extensive explanatory comments at the start of each code module setting out what it does, including a description of the variables and functions involved. An overall description of what the program does, with any relevant flowcharts, arguably belongs in the published paper, or in textual supplementary information to it, rather than within the code itself.
Let me get this straight:
Louise and joeldshore have elevated this site to the equivalence of a major scientific journal.
Such high praise. Congrats Anthony.
I applaud the policy paper which has been published in the journal Science. It is heartening to see the scientific community make a serious effort at self-correcting. I am sure there are some stubborn pragmatic obstacles to tame to implement the policy, perhaps, but there are no credible/serious show stoppers to achieve what the Science journal’s policy paper is promoting.
We have seen the disturbing shortcomings of some high profile climate science papers in the area of QC and openness of code, methodology and data. The policy advocated by the journal Science should also be taken up by bodies issuing grant application specifications. The grant specifications also should include similar requirements on code, method and data, required of grant applicants by the body issuing grant applications to ensure there can be complete and timely independent verification by other scientists.
Personal Note: I remember programming FORTRAN on punch cards at university in 1969 for my required computer science class for engineering majors. The computer was a CDC 6400 . . . . ahh the incredible fun we all had in the computer terminal building until the weeee hours of the morning. : )
John
Shock News: Scientific journal suggests that scientists use the Scientific Method. Film at 11.
“and that some code may have commercial value”
What a joke ! Just look at the wealth of extraordinary open source software in almost every area that is now available: Blender, GIMP, Open Office, Linux, Audacity, etc, etc
Anthony
Just after I’d enjoyably taken algebra in 1960 I was given access to a little IBM 1620. FORTRAN was the most wonderful magic, and I loved filling in the code-sheet with algebra-looking formulae, then seeing my lines turn into punch cards, the smell of which I liked as much as those of book-bindings.
My very first program (on its twentieth try) transformed geocentric stellar coordinates into galactic xyz and displayed star maps from locations within 100 light years. When they started discovering exoplanets I went back to find I had listed their host stars in my little catalog of nearby sunlike stars, four decades earlier.
I still haven’t heard anyone comment about supercomputer climate models and what you do with a code listing. Instead of comments, the various lines of code would need links to briefings of the numerous meetings that made the decisions embodied in the code. But that sounds to me like something so useful that of course no one would do it.
You can bet that climate-model documentation, if any, is scattered, disorganized, incomplete, and intricately sequestered from public view. If the Team is ever forced to comply they will dump mountains of digital trash requiring hundreds of volunteers to spend years combing through the wretched mess. Then they’ll trumpet their transparency and openness.
It seems to me that the concern about “amateur” code gives all the more reason to insist on releasing the code – so that more experienced coders can see if the code has “amateur” mistakes!
Even as a professional programmer, I go through a code review process on a regular basis. It’s a good practice to get other eyes on your code to see things you’re often just too close to to see clearly.
cw00p says:
April 17, 2012 at 1:54 pm
Thank you.
Leif Svalgaard: slogging through that is no fun and not very illuminating…
It makes publication possible, which addresses your point about Excel.
pbittle says:
That’s great news! I had always wondered why this wasn’t a requirement all along. Of course, the corollary to this is that the source data must be made available as well. Had this been the practice there would have been no “hidden decline”.
—————–
Just because it is not a journal requirement does not mean the code is not available. Sometimes the code is available on the Internet, sometimes it’s available on request, sometimes it’s been lost, sometimes it is deliberately kept back for whatever reason.
I tend to regard this particular debating point as bogus. There is a lot of code available and it can be inspected. When code or data does become available, when previously it was not, I see Climate skeptic land lose interest. The general impression is that the lack of code or data is just a debating point and you are not sincere in your complaints.
REPLY: Riiiiight, Mr. Sincerity himself speaks. When we requested Hansen’s code, nobody could run it due to lack of hardware and ancient compilers, because Hansen’s stuff is spaghetti code, which was a learning experience all by itself. After weeks of effort and hair pulling, Steve Mosher finally succeeded. You wouldn’t be able to work on those Macs you program edu-apps for I bet. But go ahead diss the idea, it is quite suitable for your demeanor.
Bottom line though is that I support code release across the board, whether you like it or not. – Anthony
Let’s see how much substance there is in all this instead of hot air?
How many of you have actually downloaded climate-related code and checked it for bugs? Just a simple code inspection would suffice.
How many of you have actually found bugs in such code and reported the difference it makes to the outcome? My general impression is that one or two instances of this have actually occurred.
The same applies to data. I have been led to believe that the Climategate raw data has been released. If true, an analysis of that would prove or disprove the repeated claims of fraudulent data manipulation in that data set. But I have seen nothing in climate skeptic land about this.
Remember the stigmata of the 60’s programmer–the rubber band (from and/or for the card deck) around the wrist?! A neat Trivia question, maybe.
Regarding the problem of library dependencies and such, maybe these are some possible workarounds:
1. The journals could stipulate that the code should compile and run on a reference platform. This could be for example FreeBSD or some stable and resource-rich Linux flavor like Debian. These two have a wealth of pre-packaged programs and libraries available right out of the box, especially in the realm of scientific computing, and are very well integrated and stable.
The journals could then administer such reference platforms and make user accounts available to authors and readers alike; the authors would set up the software, and the readers could run their tests.
2. In case the authors are unable to release code that conforms with the reference platform, they could be requested to make user accounts available for reviewers and readers on their own machines. The users would be able to run their own inputs, and at least the reviewers would also have to be given the credentials to verify that they are indeed running the released code; for example, they would have to be allowed to compile the code, md5digest the result and then compare it to the active executable.
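The check described in workaround 2 (compile the released code, digest the result, compare it to the active executable) could be sketched roughly as below. This is only an illustration, not anything a journal or author has specified: the function names and file paths are hypothetical, and SHA-256 is used by default instead of the MD5 digest mentioned above, since MD5 is no longer considered collision-resistant (pass "md5" to follow the original suggestion). It also assumes the build is reproducible, i.e. the same compiler, flags and environment yield byte-identical output.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large executables need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def same_binary(rebuilt_path, active_path, algorithm="sha256"):
    """Return True if the rebuilt file and the active executable are byte-identical
    as far as the chosen digest can tell."""
    return file_digest(rebuilt_path, algorithm) == file_digest(active_path, algorithm)
```

A reviewer would call something like same_binary("build/model", "/opt/authors/bin/model"), where both paths are placeholders for the freshly compiled file and the executable actually being run.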
This IS good news. But it means NOTHING if the codes are not properly documented.
Here is a good example of poor documentation. Try to find what equations their FORTRAN code is solving and the numerical methods associated with those equations… Black box indeed!
joeldshore says:
April 17, 2012 at 2:33 pm
COOL! Where’s the link to his code for a) his Hockeystick paper, b) his 1999 paper, c) his 2003 paper with Jones (see below)? (After publication of his 2008 paper, and after complaints from Steve McIntyre among others, Mann archived his 2008 code.)
Regarding the code for his 2003 paper, Mann said in the Climategate emails:
Joel, you talk as though Mann has voluntarily made his code public, rather than claiming it was his personal property, and hiding it every chance he has gotten, as above. This is the guy who famously was quoted in the WSJ article:
I just want you to remember exactly what kind of a serial liar and expert in scientific malfeasance you are holding up as an example of available code, and to remind you that when you lie down with dogs, you get up with fleas.
For you to hold Mann’s code out as an example of transparency is a sick joke. I had expected better from you, Joel, much better. In fact you go on to say:
So … are you saying you are against the thrust of the Science article that says the opposite of that? You don’t believe that as a rule scientists should release their code when they publish the results of that code?
w.
I always like the box in the flow chart that reads: “And then a miracle happened.”
Of course, with the AGW crowd it would read: “And then a catastrophe happened.”
John W. says:
April 17, 2012 at 11:07 am
Len, FYI: As a general rule, any code developed by a DoD contractor with any contract money is the property of the US government.
———————-
In that case it seems a small step to require it of all government spending, including grants and everything else.
Willis Eschenbach says:
I would not say that I am against it, but I think there are some serious issues that will need to be addressed, including:
(1) What if the code is proprietary to your company? When I worked for Kodak, I published papers based on code that I and others at Kodak had written but that Kodak would never allow us to release; in fact, releasing it could be grounds for severe disciplinary action up to and including dismissal. Will this provide further discouragement for scientists in industrial environments to publish their work?
(2) What if you use code that is proprietary to another company? There are lots of papers out there using, say, TracePro raytracing software or proprietary quantum chemistry software where the scientists writing the paper don’t have access to the source code themselves.
(3) Will the requirement that scientists have to release code that has taken a significant amount of work to write mean that some scientists will shy away from writing papers in a timely manner, preferring instead to get all the use that they can out of their code before revealing it to the competing scientists and, if this occurs, how large a negative impact might it have? There are, after all, good reasons for allowing people to have certain intellectual property rights even given the need for openness and transparency in science.
I am not saying that these issues can’t be overcome but I am just saying that there are some real issues that need to be thought about. It is not as cut-and-dried as people seem to think.
Leif Svalgaard says:
April 17, 2012 at 11:53 am
While laudable [and with my full support] there are problems. Some research does not require specific custom-written code, but can be adequately done with interactive tools. A trivial example being Excel. So, there is no code to publish. Publishing the spreadsheet might solve the problem for Excel, but not for more sophisticated [or proprietary] tools. Another problem is that the code might actually be quite small, but use extensive libraries and procedures developed by others [and shared by a larger community]. Some of that may be difficult to publish and near impossible for outsiders to use. In my own research I have often tried to use other people’s code, but mostly given up and written my own because of the steep learning curve involved in using other people’s stuff.
———————————————–
If a paper to be published requires peer review, how can you say it was peer reviewed if the data and code were not checked? Do you just trust your pal’s numbers and graphs?
I used to believe if a paper was peer reviewed it was pretty darn sure the truth. And the reviewer agreed with the paper (with comments). Otherwise why even bother having a peer review it?
If you’re not looking at the paper’s accuracy then why lie about the “peer review”?
It sounds like maybe climate science is loaded with PAL review and maybe not so much peer review.
Willis Eschenbach says:
I am not sure why people are so fascinated by code over a decade old that has been superseded by more recent work. As you noted, he has archived his 2008 code.
He also has archived his data for the earlier papers, see e.g., here: http://www.meteo.psu.edu/~mann/shared/research/old/mbh99.html
http://www.meteo.psu.edu/~mann/shared/research/old/mbh98.html
I had thought the code was there too now but I can’t seem to find it at the moment.
So, are you going to return the favor and provide me with the link to the UAH code? I don’t need to see all the earlier versions. The latest version would be just fine.