The Journal Science – Free the code

In my opinion, this is a testament to Steve McIntyre’s tenacity.

Via the GWPF: At Last, The Right Lesson From Climategate Fiasco

Monday, 16 April 2012 11:21 PhysOrg

A diverse group of academic research scientists from across the U.S. have written a policy paper which has been published in the journal Science, suggesting that the time has come for all science journals to begin requiring computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.

The group argues that because computers are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom-written applications so that work can be peer reviewed or duplicated by other researchers attempting to verify results.

Not providing source code, they say, is now akin to withholding parts of the procedural process, which results in a “black box” approach to science that is, of course, not tolerated in virtually any other area of research in which results are published. It’s difficult to imagine any other realm of scientific research getting such a pass, and the fact that code is not published in an open source forum detracts from the credibility of any study based on it. Articles based on computer simulations, for example, such as many of those written about astrophysics or environmental predictions, tend to become meaningless when they are offered without the source code of the simulations on which they are based.

The team acknowledges that many researchers are clearly reticent to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value, but suggest that such reasons should no longer be considered sufficient for withholding such code. They suggest that forcing researchers to reveal their code would likely result in cleaner more portable code and that open-source licensing could be made available for proprietary code.

They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.

The group also points out that the use of computer code, both off the shelf and custom written, will likely become ever more present in research endeavors, and thus, as time passes, it becomes ever more crucial that such code is made available when results are published; otherwise, the very nature of peer review and reproducibility will cease to have meaning in the scientific context.

More information: Shining Light into Black Boxes, Science 13 April 2012: Vol. 336 no. 6078 pp. 159-160 DOI: 10.1126/science.1218263

Abstract

The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).

=========================================

Dr Slop
April 17, 2012 1:30 pm

In all a very good move. Should provide plenty of fodder for http://thedailywtf.com/Default.aspx

Jakehig
April 17, 2012 1:31 pm

Coincidentally this week’s Economist has a leading article calling for open access to all research funded by government and charities. Apparently the UK government is going to mandate this.
Another straw in the wind….

April 17, 2012 1:33 pm

Earle Williams says:
April 17, 2012 at 1:15 pm
“My programs are completely portable and will run on ANY computer with a COBOL compiler.”
Shouldn’t that read BOTH computers, not ANY computer?

The joke apart, I meant ANY [modern or ancient] computer.
AGU’s EOS has a piece about using the R language which is appropriate for the topic:
http://www.leif.org/EOS/2012EO160003-R.pdf

April 17, 2012 1:33 pm

WHY WOULDN’T THEY USE STANDARD, UNIVERSALLY RECOGNIZED CODE COMMENTING METHODS?
Well, if they are going to produce the source code and data, then there should be some standards set on those submissions, much like the process and protocol you follow when you submit the paper (citing, bib, etc). It would not be an issue to specify that the code be annotated in certain ways to identify algorithms. If you use external libraries (regardless of the language), you cite them. (Heck, we used to put Easter eggs in the code and put in all kinds of funny stuff in the comments for future programmers to see, you can write a friggin’ book based on just the comments in coding. And with all the code and comment management tools out there….FREE…this makes no sense at all…they can just load it all up into CVS, SourceForge, subversion…come on, there’s tons of these out there. Plus there are tons of tools that walk code and pull out the comments; as well as map the code, etc, etc)
Yes, some of these folks are really just almost hobbyists on some of the code; but, since the computer, source code, and data are now part of the professional tool kit, then they better darn well follow some semblance of standard programming format; much of which has been around for close to 60 years now.
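
For readers who have not seen one, a tool that "walks code and pulls out the comments" can be very small. The following is only a minimal sketch in Python using the standard library tokenize module; the function name extract_comments and the SAMPLE source are made up for illustration and are not taken from any tool mentioned in this thread.

```python
import io
import tokenize


def extract_comments(source):
    """Return (line_number, comment_text) pairs for every comment in Python source."""
    comments = []
    # generate_tokens() reads the source line by line and yields typed tokens.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments


# Hypothetical sample program, used only to demonstrate the extractor.
SAMPLE = '''\
# Smooth the series with a 3-point running mean.
def smooth(values):
    # Endpoints are left unchanged (an assumption of this sketch).
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3  # centered mean
    return out
'''

if __name__ == "__main__":
    for line_no, text in extract_comments(SAMPLE):
        print(f"line {line_no}: {text}")
```

Run against a real source file, the same loop would give a reviewer a quick outline of what the author says each step does, which can then be checked against what the code actually does.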

Jim Melton
April 17, 2012 1:40 pm

There is no excuse today for producing undocumented, bug-ridden code that has no error traps or set-up guide, unless it is for personal use. I had this *principle* drummed into me at uni – not about code, but as an engineer producing unlabeled drawings and maths without annotation. The reason given was – if you work for an employer they pay for your time, so if you are hit by a bus someone else MUST be able to pick it up.
After having spent 20 years since in IT consultancy, I can say my tutors at uni were correct. Any employee of mine who does not do this gets a dressing down (it is rarely needed) and any contractor gets the heave-ho.
Welcome O great scientific minds to the 21st century.
Interstellar Bill, Antony and Leif,
The language is not at issue because, like anything else, languages are always machine/OS/version specific if they are to run error free. One answer is to archive a virtual machine (our company uses data centres, but cloud or a journal repository would be better in this case) with the full OS and the version-controlled, documented source code. There is now a slew of commercially available software and hardware to enable exactly this sort of archiving.
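
A full virtual machine image is the heavyweight form of this idea. A lighter complement, sketched here in Python, is to save a small manifest of the interpreter, operating system and library versions next to the published results. The function name environment_manifest and the package names in the usage line are illustrative assumptions only; this is not a replacement for archiving the machine itself.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest(packages):
    """Collect interpreter, OS, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "scipy" are example names; list whatever the analysis imports.
    manifest = environment_manifest(["numpy", "scipy"])
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Writing this JSON into the same archive as the code gives a later reader at least the version numbers needed to rebuild a comparable environment.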

Graeme W
April 17, 2012 1:41 pm

I agree with Leif’s concerns about software dependencies, but the main point is to make sure that the custom code associated with the paper is available. The dependencies required to get it to work, while potentially annoying and time-consuming, do not stop someone from evaluating the code required for the paper to see if there are any issues with it.
As for a standard language, I agree with the poster above that it is too late for that. Indeed, one of the reasons we have multiple programming languages is that they are for different purposes. FORTRAN, for example, was the language of choice for mathematical manipulation, but other languages were better for other purposes (e.g. text manipulation). Someone expressed a preference for C#, but one disadvantage of that is that it’s not available on other platforms such as Linux. The Mono project is overcoming that limitation, but can it guarantee 100% compatibility? As an example, while OpenOffice is largely compatible with MS-Office, there are some areas where the two have subtle differences. I suspect there will be similar cases with Mono that may or may not affect the results for a program written in C#.
As a computer programmer, I believe that seeing the code has benefits, even if the environment to run it can’t be easily reproduced. As an example, I’ve personally found bugs in open source code by simply reading the code – bugs that I then reproduced by setting up the conditions that the source code led me to believe would cause a problem. I wouldn’t have found those bugs if I hadn’t been able to read the code.

Matthew R Marler
April 17, 2012 1:42 pm

Leif Svalgaard: Some research does not require specific custom-written code, but can be adequately done with interactive tools. A trivial example being Excel. So, there is no code to publish.
Does not Excel create a log showing the code that is executed during the interactive session?

April 17, 2012 1:43 pm

Editor’s note to self…proof read my own stuff sometime *seesh* Sorry folks for my creative use of the alphabet.

peter_dtm
April 17, 2012 1:48 pm

so if you think your code is that bad – then you should do the following
you *do* write top down code don’t you; you are not that mad that you just write it out without planning ?
So – release your FLOW CHART and/or your pseudo code
and for heavens sake COMMENT the thing; if it doesn’t have comments explaining what it does it is GIGO code.
And – surely if you write code as part of your research project (publicly funded) then ALL of that code belongs to the tax payer; NOT to you.
Of course if you work for a private institute then the code belongs to your employer.
Either way it is NOT your code. Unless you are a gentleman scientist ?
Note that by releasing your flow chart and pseudo code your ‘code’ is portable to an extent. And if one of your assumptions is wrong; why; the little box in the flow chart where you employ that assumption is available for nit picking.
Just think; if we had the flow charts for the GCM we would be able to see the so few variables they use; and all the Crook’s Constants and Fiddle’s Factors that have to be applied to make them work.

Matthew R Marler
April 17, 2012 1:48 pm

Leif Svalgaard: My experience with this [going back to the 1960s and continuing today] is that it is easier to write good, self-documenting, correct code than the amateurish hash you may refer to. The way to do this can be taught.
I agree on both counts: (1) it can be taught and learned and (2) over the course of a project, it is easier to write good code than amateurish junk; it’s just that on some days you feel like you only have to solve a simple problem for that day, and you feel justified in cutting corners, and you need to learn early on to fight the temptation to do so. My experience goes back to the 60s and card punches, and I am not good enough to call myself proficient, but it is always best to write the programs from the start with the knowledge that you may have to explain how they do what they do two years from now when you have forgotten it.

April 17, 2012 1:54 pm

Matthew: Does not Excel create a log showing the code that is executed during the interactive session?
You can actually run it in a debug mode and step it too. http://www.excel-vba-easy.com/vba-basic-debug-macro.html and yes, it logs.
For MS stuff, you can reverse engineer the libraries (including MS’s libraries) with plenty of tools…you don’t need the source code (but you better be a pro).

April 17, 2012 2:02 pm

Bottom line on the coding thing from me. Who cares if you write crappy code or even what computer language you write it in. Commenting can follow universal standards (Flower Box for one); it’s not the computer language, it’s specifically describing what it’s doing. (Comment tools can even take all your comments and put them into a nice little book for you). ie: [this sets java up to work with javadoc] http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html
And…shall we count the comment methods and styles?: http://en.wikipedia.org/wiki/Comment_(computer_programming)

Rob Crawford
April 17, 2012 2:03 pm

“The team acknowledges that many researchers are clearly reticent to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value”
Well, then, maybe they should spend some cash on developers instead of conferences?
Then maybe they’ll hear about things like automated testing and acceptance tests and…
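
As a rough illustration of what automated testing looks like in practice, here is a minimal sketch in Python using the standard unittest module. The anomaly function is a hypothetical stand-in for a step in a research script, not code from any paper discussed here; the point is only that each processing step can carry small checks that anyone can rerun.

```python
import unittest


def anomaly(values, baseline):
    """Subtract the mean of the baseline period from every value."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    base_mean = sum(baseline) / len(baseline)
    return [v - base_mean for v in values]


class AnomalyTest(unittest.TestCase):
    def test_baseline_mean_is_removed(self):
        # Mean of the baseline is 2.0, so each value shifts down by 2.0.
        self.assertEqual(anomaly([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), [-1.0, 0.0, 1.0])

    def test_empty_baseline_is_rejected(self):
        with self.assertRaises(ValueError):
            anomaly([1.0], [])


if __name__ == "__main__":
    # Build and run the suite explicitly so the script does not call sys.exit().
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AnomalyTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Publishing tests like these with the code lets a reader confirm that the script still does what the paper says after it has been moved to a different machine.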

peter
April 17, 2012 2:05 pm

I’ll happily give them a pass on amateur code as long as the statistics it implements are not.
That’s really all we care about. What exactly did you do to the numbers prior to the pretty plots?
It’s so simple. Just show all your work.
Doing anything else is like a proof or derivation with a missing few lines.
Lunacy.. and it’s about bloody time.

Rob Crawford
April 17, 2012 2:05 pm

“As a computer programmer, I believe that seeing the code has benefits, even if the environment to run it can’t be easily reproduced.”
Are there still languages that do not have things like Java’s “maven” and ruby’s “gems”?
It’s been six or seven years since I’ve had to worry about having all the right libraries and tools installed; the tools manage that for me.

Otter
April 17, 2012 2:06 pm

Louise~ Since the WUWT team appears to have grown since Anthony created the site, I was thinking they could use someone who is good at going over code. You sound very interested in seeing Dr. Spencer’s, perhaps you could petition to join the team and help them out?

More Soylent Green!
April 17, 2012 2:14 pm

peter_dtm says:
April 17, 2012 at 1:48 pm
so if you think your code is that bad – then you should do the following
you *do* write top down code don’t you; you are not that mad that you just write it out without planning ?
So – release your FLOW CHART and/or your pseudo code
and for heavens sake COMMENT the thing; if it doesn’t have comments explaining what it does it is GIGO code.
And – surely if you write code as part of your research project (publicly funded) then ALL of that code belongs to the tax payer; NOT to you.
Of course if you work for a private institute then the code belongs to your employer.
Either way it is NOT your code. Unless you are a gentleman scientist ?
Note that by releasing your flow chart and pseudo code your ‘code’ is portable to an extent. And if one of your assumptions is wrong; why; the little box in the flow chart where you employ that assumption is available for nit picking.
Just think; if we had the flow charts for the GCM we would be able to see the so few variables they use; and all the Crook’s Constants and Fiddle’s Factors that have to be applied to make them work.

Flow chart? Pseudo code? Top down code? In which century did you learn to program? 8>)
I learned programming in the 80’s (OK, that’s a decade, not a century, but it was a decade in the last century) and that’s what we used. These were replaced with UML diagrams, class diagrams, object-oriented design, use cases and user stories years ago.

Macsteep
April 17, 2012 2:28 pm

As programming is not their main task, they are afraid of releasing the code. There cannot be a better reason for releasing the code. As scientific study now relies on programs, the code must be made available to allow for the identification of coding and logic errors.

sophocles
April 17, 2012 2:30 pm

Opening the code is an excellent method of having it improved—free. It shouldn’t be scary.
Look at the Free Software Foundation (at gnu.org) and Linux, and all the Linux distributions available. None of the authors are scared of others seeing their code.
The Internet runs on open sourced computer applications and operating systems.
Without it, we wouldn’t have an Internet.
How many scientists use R—the free, opensource competitor to Matlab (which is definitely not free, nor open)?
As one open source developer said: “many eyes make bugs shallow.”

joeldshore
April 17, 2012 2:33 pm

Anthony Watts says:

REPLY: As far as I know, they’ve made it available for inspection, and it was the doing so that enabled others to spot the orbit decay issue which introduced a bias, long since corrected, though few people like yourself ever let others bashing Christy and Spencer forget about that.

Really…It is available? Where? I can give you links to Michael Mann’s code or the code to do GISS Temp or the code for the GISS climate model, but nobody ever seems to be able to tell me where I can go to get Spencer and Christy’s code, just vague unsubstantiated claims that they think that it is available. Actually, as I recall hearing, it took the RSS group some effort just to get Spencer and Christy to release the relevant section of their code to them.

BTW where’s the code on that paper you wrote a couple years ago Joel? – Anthony

Our paper commenting on Gerlich & Tscheuschner? I don’t think there was anything in there that requires computer code to calculate. We purposely made our examples simple enough to work out easily with pencil and paper.
I will also note that I am not the one who is loudly proclaiming that I believe scientists must always release their computer code. I only note the issue with Spencer and Christy to point out the inconsistency of those who do.

April 17, 2012 2:35 pm

Matthew R Marler says:
April 17, 2012 at 1:42 pm
Does not Excel create a log showing the code that is executed during the interactive session?
slogging through that is no fun and not very illuminating…

timg56
April 17, 2012 2:38 pm

Leave Louise alone.
Anyone who likes wine, supports nuclear power and reportedly likes to relax in high heeled boots, scant under garments, with a whip to hand is someone to be held in high regard in my book.
Provided they are from the female side of the population. (Sorry Anthony.)

April 17, 2012 2:50 pm

It is worth repeating again and again: Steve McIntyre (together with a handful of others) deserves major credit for triggering this essential improvement in the peer review publication process.

Latitude
April 17, 2012 3:01 pm

so the fate of the world is all based on computer programs…
…that are based on amateurish code, so bad it would be embarrassing
nice

AndyG55
April 17, 2012 3:05 pm

FORTRAN is not dead.. and it’s certainly a very quick language for throwing around big matrices and applying complex formulas.. Not so pretty on the user interface though!
We use Intel Fortran 95 for our Engineering stuff, mainly because there is so much useful old code that you can link into without having to re-write it.