The Journal Science – Free the code

In my opinion, this is a testament to Steve McIntyre’s tenacity.

Via the GWPF: At Last, The Right Lesson From Climategate Fiasco

Monday, 16 April 2012 11:21 PhysOrg

A diverse group of academic research scientists from across the U.S. have written a policy paper which has been published in the journal Science, suggesting that the time has come for all science journals to begin requiring computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.

The group argues that because computers are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom-written applications in order for work to be peer reviewed or duplicated by other researchers attempting to verify results.

Not providing source code, they say, is now akin to withholding parts of the procedural process, which results in a “black box” approach to science, which is, of course, not tolerated in virtually any other area of research in which results are published. It’s difficult to imagine any other realm of scientific research getting such a pass, and the fact that code is not published in an open source forum detracts from the credibility of any study upon which it is based. Articles based on computer simulations, for example, such as many of those written about astrophysics or environmental predictions, tend to become meaningless when they are offered without also offering the source code of the simulations on which they are based.

The team acknowledges that many researchers are clearly reluctant to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value, but suggest that such reasons should no longer be considered sufficient for withholding such code. They suggest that forcing researchers to reveal their code would likely result in cleaner, more portable code and that open-source licensing could be made available for proprietary code.

They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.

The group also points out that the use of computer code, both off the shelf and custom written, will likely become ever more present in research endeavors, and thus, as time passes, it becomes ever more crucial that such code is made available when results are published; otherwise, the very nature of peer review and reproducibility will cease to have meaning in the scientific context.

More information: Shining Light into Black Boxes, Science 13 April 2012: Vol. 336 no. 6078 pp. 159-160 DOI: 10.1126/science.1218263

Abstract

The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).

=========================================

248 Comments
barry
April 17, 2012 7:28 pm

Would WUWT be willing to initiate a policy whereby it will only publish articles on science where the code/data et al is available at the time of posting?
If there is strong feeling about this, that would seem like the righteous thing to do, both to act according to principle, and to encourage scientists to make their data available.
This could not operate retrospectively, of course, and it does make for some interesting choices regarding semi-regular contributors (eg, Roy Spencer, who I don’t think has published the UAH code/algorithms?).
I would be interested in seeing discussion here of such a policy, pros and cons, because I think it would be revealing as to true attitudes on ‘releasing the code’.

April 17, 2012 7:49 pm

barry,
The difference is that my tax dollars do not go into paying for WUWT. Where my tax dollars subsidized the code, then that code should be available to the public that paid for it.

April 17, 2012 7:59 pm

Nobody expects scientists in general to be wonderful programmers. But if they use computers as tools for analysis, and write their own programs as part of that, then they must provide source code. It becomes part and parcel of the “experimental” process and essential for others to independently try to replicate the results.
There are so many traps in programming which aren’t even obvious to professional programmers. One has to understand the limits of digital representation of data and its consequences in numerical analysis. This is definitely an issue when one is dealing with finite element analysis, which encompasses circulation and other climate models. And also numerical statistics. “The Computer” won’t take care of these things automagically. Theoretical analysis techniques must be programmed with due regard to the limits of data representation, which often calls for different techniques than one would use when working through the data by hand.
At the very least, one must be aware that “random noise” from the numerical analysis may swamp the actual results.
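A minimal sketch of the kind of trap described above, in Python. The readings and the function names `naive_variance` and `two_pass_variance` are illustrative assumptions, not taken from any climate code. Both functions implement the same textbook sample-variance formula; only the order of the arithmetic differs. For four readings that sit on a large offset, the one-pass version subtracts two nearly equal numbers of size around 1e18, and the rounding error swamps the true answer of 30, while the two-pass version recovers it.

```python
def naive_variance(xs):
    """One-pass formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = 0.0
    sum_sq = 0.0
    for x in xs:
        sum_x += x
        sum_sq += x * x  # each square is already rounded at the 1e18 scale
    # Subtracting two huge, nearly equal numbers loses the small true difference.
    return (sum_sq - sum_x * sum_x / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: subtract the mean first, then square the small deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical readings: small signal (4, 7, 13, 16) on top of a large offset.
readings = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("one-pass:", naive_variance(readings))
print("two-pass:", two_pass_variance(readings))
```

A paper would print the same formula for both, so without the source a reader cannot tell which of the two was actually run.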

April 17, 2012 8:06 pm

joeldshore ,
I think you must have skipped this, because about 10 people have said this as comments already, but you must have missed it….judging from your “points” you listed…
If someone writes a science paper, the results are supposed to be replicated by anyone in that field. This is the same as any patent that is filed in the patent office should be easily copied by anyone in the field that a certain patent is filed in.
The advantage of writing in pure science is rather whimsical. There is very little advantage in publishing. If you do not like what pure science is, then perhaps publishing and likewise pure science and research is not for you.
The call for source code might not be clear, but it’s not a call for all source code per se.
If someone is using a third party app for instance, there are probably cases where the source code might not be relevant to revelation to the usage of the code. For instance, if the professor merely used code, I don’t see how putting out source code in that case which has nothing to do with the research paper helps, to give an example, if someone writes a research paper on dinosaur eggs I doubt the source code for excel would help if he merely used it.
So in essence, your points 1 and 2 are really irrelevant for this case. The source code being asked for is the source code that the writer of the papers actually writes himself. Now for one, perhaps in the rare case that a writer of a scientific paper was also working at a private firm and wrote some software at the same time and tried to write a scientific paper, but in that rare case, that writer could NOT write about that process because in a scientific paper you have to describe the process so that anyone could duplicate the process, and if the code is proprietary, well, describing how to write the code is just as bad…and that defeats the purpose if you think about it; otherwise you aren’t writing a real scientific paper and are just having people guess.
And if people are guessing, it’s not real science. And that is the entire point being raised here: if your code and method is not easily duplicated, then you need to produce all of it so that others can attempt to do so. In essence, any code being provided along with data should be provided because otherwise what are we being asked to accept? The scientist’s word as a person? I am sorry, that does not cut it. In the heart of pure science we accept only blunt scepticism as our tool of trade.
And your third point… Going back to what pure scientific research is, it’s about taking lots of time and effort for little or no gain for oneself.
I tried to explain this at the start, but if you did not understand this by now, to explain it further, if someone is not willing to share everything up to the current point on what they have, they have no business being in science anyway. If someone wants to make a profit, then start a business, because that is not science. And in that case, you can act like Dr. Mann all you want and hide all the code you want to as well.
But by saying that there might be a concern with research in the future possibly about sharing code is just trying to make it more difficult for others to replicate research when it’s hard enough to get other people’s programs to run.

BernieH
April 17, 2012 8:11 pm

Providing computer code, and where necessary, input data, seems unquestionably the least ambiguous way to finally document what we have done. Without the code we are saying that here is the graph (or numerical results, etc.), and here is the method (or equations, etc.) that we affirm were used to produce these results. Here is our manuscript.
Such descriptions in words are often incomplete, imprecise, or ambiguous, and even mathematical formulas may be ambiguous and/or subject to typos. But I can’t see how computer code, when provided “cut-and-paste” from our working programs, provided as a supplement to our text, can be either incomplete or ambiguous. Computers WON’T tolerate either.
If a methodology is unclear to a reader from the text and related offerings, program code, even in a language in which the reader may not be familiar (not capable of writing), is nonetheless often fairly easily interpreted to yield the missing elements leading to a fuller, more exact, understanding. Aiding replication of results is important, but first of all the code can aid understanding of published papers.
Authors who are not proud of their code because it is not elegant have likely, correspondingly, written simpler and more straightforward coding, code which will be MORE useful to more readers to show what was done and that it was done correctly. Researchers following up are of course free to write better (faster, more compact – but more obscure?) code for their own daily use.
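A small illustration of the ambiguity point, sketched in Python with made-up numbers (the `values` list is an assumption for the example only). A methods section that says "we computed the standard deviation" fits both calls below, yet they give different answers; the code itself leaves no doubt which one was used.

```python
import statistics

# Hypothetical measurements, chosen only to make the arithmetic easy to check.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(values)

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(values)

print("population SD:", population_sd)
print("sample SD:    ", sample_sd)
```

The gap between the two shrinks as the number of points grows, but for short series it can be enough to change whether a result looks significant.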

barry
April 17, 2012 8:14 pm

Smokey,
For quite a few commenters here the principle is that you can’t trust the science if you don’t have the code. Do you think it’s ok for WUWT to promote science that doesn’t include code (and other data)?
Comments by Joel and Louise touch on the issue of principle. Why do skeptics berate Michael Mann but not Spencer and Christy for keeping code? (both are ‘publicly funded’) It appears that skeptics think the ‘warmists’ are trying to keep their code secret, and that revealing it will overturn their conclusions. But this could work both ways.
My query is about principle. I want to discover if the general agreement here on releasing code is about improving the science generally, or if it is about a belief that releasing the code will weaken the ‘warmist’ science. I’m seeing a little of both right now, from different contributors. Anthony has declared code should be released ‘across the board’. Your reply, Smokey, makes me think that you are mainly interested in how such a policy might crush the ‘warmists’. (You’re free to correct me on that)
So, to ask once more of people who seem to feel strongly about this – would you advocate for WUWT only to post articles on science where all the data is available?
I think the answer would be “no”. So my next question would be the reasoning behind such a reply. And how should we as readers approach such articles? BAU?
Is code release so important that advocates would like to see some principled action here? Or is it not meaningful enough to make some kind of stand on it?
Money/mouth and all that.

Michael Palmer
April 17, 2012 8:19 pm

joeldshore
April 17, 2012 at 7:12 pm

makes some excellent points regarding intellectual property.
I still think that making the release of code the norm is the right thing to do. People will then have to explicitly declare whether or not they reserve intellectual property and, as such, prevent others from verifying their software. It will then be up to the reviewers and readers to decide whether or not a paper that is shackled and restricted in this way still has enough substance and credibility to merit publication. If that leads to fewer papers, it’s not the end of the world; in fact, it happens quite commonly in medicinal chemistry.

Editor
April 17, 2012 8:30 pm

joeldshore says:
April 17, 2012 at 7:12 pm

Willis Eschenbach says:

So … are you saying you are against the thrust of the Science article that says the opposite of that? You don’t believe that as a rule scientists should release their code when they publish the results of that code?

I would not say that I am against it, but I think there are some serious issues that will need to be addressed, including:
(1) What if the code is proprietary to your company? When I worked for Kodak, I published papers based on code that I and others at Kodak had written but that Kodak would never allow us to release; in fact, releasing it could be grounds for severe disciplinary action up to and including dismissal. Will this provide further discouragement for scientists in industrial environments to publish their work?

Thanks as always for your reply, Joel. Certainly, if the code is proprietary then it cannot be published in a scientific journal … but then if the code used in a study is proprietary, then the study can’t be replicated, and thus it shouldn’t be published in a scientific journal.

(2) What if you use code that is proprietary to another company? There are lots of papers out there using, say, TracePro raytracing software or proprietary quantum chemistry software where the scientists writing the paper don’t have access to the source code themselves.

That seems like a non-issue to me, for the same reason. If someone can’t explain what their procedure is doing for any reason, be it that they used proprietary quantum chemistry software or some other reason, then why are they publishing in a scientific journal?

(3) Will the requirement that scientists have to release code that has taken a significant amount of work to write mean that some scientists will shy away from writing papers in a timely manner, preferring instead to get all the use that they can out of their code before revealing it to the competing scientists and, if this occurs, how large a negative impact might it have? There are, after all, good reasons for allowing people to have certain intellectual property rights even given the need for openness and transparency in science.

At some point, the researchers are going to have to man up and decide if they are businessmen or scientists. If they are businessmen looking for a competitive advantage, fine, keep it all secret. If they are scientists, reveal it. I have no problem either way. But you want to have it both ways. You want them to have the imprimatur and the prestige of scientists, but not show their work because they work for Kodak … sorry, my friend. Make up your mind, one or the other.
Finally, I don’t see how the ruling could have a “negative impact” compared to what’s happening now, because the problem is that people aren’t revealing their code now, but they are publishing. I’d much prefer that if they don’t want to reveal their code that they don’t publish, because that is commercialism masquerading as science … and them not publishing would be greatly preferable to them pretending to be scientists and publishing without backing up their claims in the normal scientific manner.

I am not saying that these issues can’t be overcome but I am just saying that there are some real issues that need to be thought about. It is not as cut-and-dry as people seem to think.

I don’t see the “real issues”. It seems extremely cut and dried to me. If you want to be a businessman, you get to keep all the secrets you want. But if you want to be a scientist, you have to show your work. Where’s the issue?
Transparency is at the core of science, because science is built on replicability. My high school chemistry teacher, Mrs. Henniger, would laugh in your face for claiming otherwise, Joel, for saying that business concerns should allow scientists to not show their work. She would say that’s not science, that’s just business … and she’d be right.
w.
PS—I’m still waiting for the link to the code for Mann’s various papers, you said you had them … although at this point Mann is probably claiming that they are business secrets because he was working for Kodak or something …

April 17, 2012 8:36 pm

Chuck Nolan says:
April 17, 2012 at 7:18 pm
If a paper to be published requires peer review, how can you say it was peer reviewed if the data and code were not checked?
This is often not easy, but a lot can be done by inspection of the method and the analysis and generally making sure the authors are honest. As an example of peer review I offer my own review of a solar prediction paper [the prediction eventually failed] done without access to data and code:
http://www.leif.org/research/Dikpati%20Referee%20Report.pdf

Frank K.
April 17, 2012 8:59 pm

barry says:
April 17, 2012 at 8:14 pm
“Your reply, Smokey, makes me think that you are mainly interested in how such a policy might crush the ‘warmists’.”
Barry – we don’t wish to “crush the warmists”. We just want them to go away, stop bothering us, and use their OWN money to conduct their “science”.
And to be honest, if NASA GISS Model E is any indication of the code quality for climate models, I really DON’T want to see the source code…

Steve Garcia
April 17, 2012 9:12 pm

Let’s not forget that adjustments would also be within the code, thereby outing any fudge factors for what they are. Such adjustments thus would need explaining – either at the time or when replication is attempted. Someone would spot some dodgy adjustments. The big ‘adjusters’, if they think they are powerful enough, would lean on journals to not go along with this.
Any bets on which journals those would be?

April 17, 2012 9:32 pm

To the person asking if Dr. Christy makes his code available: the last time I asked him, he said no, but they were working with NOAA to get it available to the public.
This was 2 years ago and I’ve since lost interest in chasing after scientists trying to verify their work (can’t be done, IMHO). But if someone wanted to follow up they could shoot Dr. Christy an email.
I have the history of my conversations with him as well as the name of the NOAA contact on my old blog http://magicjava.blogspot.com/ You’ll need to go through the posts to find the right ones, but they’re there.

Ally E.
April 17, 2012 11:04 pm

The thing I like about this site is that there are so many links provided. I can follow it along, checking out source documentation or sidetrack into deeper research.
By contrast, the AGW mob wave around pretty graphs and yell that the science is settled. I always get the feeling they are talking down to me, as though I’m some kid who just has to accept what they say.
Anthony never does that. None of the contributors here do that. Their line is, “Here’s what we’ve found, this is what we base our conclusions on, check it out for yourself.”
The truth is not a religion. There have been too many errors and not a single prediction correct from the warmist camp. That’s thirty years’ worth of nonsense from them. Now let’s cut the crap, it’s time to toss out CO2 as a cause of trouble, it’s time to stop treating human beings as the enemy and it’s time to stop wasting billions in trying to control what we are too puny to control – nature.
Let’s get back to living and growing and creating funds enough to handle any adaptation we may need in the face of the cooler years heading our way.

Jeef
April 17, 2012 11:33 pm

Three words to justify the thrust of this article: HARRY READ ME
that is all.

barry
April 17, 2012 11:50 pm

magicjava,
searching your old website, Christy replies to you.

We are in a program with NOAA to transfer the code to a certified system that will be mounted on a government site and where almost anyone should be able to run it. We actually tried this several years ago, but our code was so complicated that the transfer was eventually given up after six months.

http://magicjava.blogspot.com.au/2010/02/dr-john-christy-on-uah-source-code.html
You contact NOAA, and receive reply confirming

the existence of the project and indicated that it is in its very early stages with no ETA at this time.

http://magicjava.blogspot.com.au/2010/02/update-on-my-attempts-to-get-airs-and.html
Both posts are dated February 2010. If we take several years to mean 3, then it has not been possible for Christy et al to get their code in the public domain for 5 years. His responses and the time-frame may give some perspective on other campaigns to ‘release the code’. Perhaps accusations of malfeasance have been ill-advised.

Editor
April 18, 2012 1:00 am

joeldshore says:
April 17, 2012 at 7:23 pm

Willis Eschenbach says:

COOL! Where’s the link to his code for a) his Hockeystick paper, b) his 1999 paper, c) his 2003 paper with Jones (see below)? (After publication of his 2008 paper, and after complaints from Steve McIntyre among others, Mann archived his 2008 code.)

I am not sure why people are so fascinated by code over a decade old that has been superseded by more recent work. As you noted, he has archived his 2008 code.
He also has archived his data for the earlier papers, see e.g., here: http://www.meteo.psu.edu/~mann/shared/research/old/mbh99.html
http://www.meteo.psu.edu/~mann/shared/research/old/mbh98.html
I had thought the code was there too now but I can’t seem to find it at the moment.

You can’t seem to find it at the moment … OK, well, get back to us when you do find it. At this point your claim, that you could provide us links to Mann’s code, is looking real shaky. So if you want to make your word good, I wouldn’t delay too long.

So, are you going to return the favor and provide me with the link to the UAH code? I don’t need to see all the earlier versions. The latest version would be just fine.

Hey, you’re the one saying “I can give you links to” someone’s code, not me. I’ve said nothing of the sort, particularly about UAH. I have no clue if their code is available or not, never given it a moment’s thought until now. So why are you bugging me about it? Go ask John Christy for his code if you want it, come back and report the result.
w.

Steve Richards
April 18, 2012 1:29 am

Not only should the code and data be provided, but a master script should be provided that runs all of the program sequences etc that create the final outputs.
This way would allow anyone to recreate the final charts and tables that researchers drew their conclusions from.
Otherwise you may get a climategate style data dump, which would require verifiers to try many different sequences of compilations trying to achieve the original output. (With the originators saying “well I have given you all the data, can’t you make sense of it?”)
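A minimal sketch of what such a master script might look like, in Python. Everything named here is hypothetical: `load_data`, `adjust`, `summarize`, the numbers, and the 0.1 offset are stand-ins, not anyone's actual pipeline. The idea is simply that the order of the steps is written down in one place and the final table gets a checksum, so a verifier can confirm they reproduced the same output.

```python
import hashlib
import json


def load_data():
    # Hypothetical stand-in for reading the archived input file.
    return [12.1, 12.4, 11.9, 12.8, 13.0]


def adjust(values):
    # Any adjustment lives in code, where a reader can see exactly what was done.
    offset = 0.1  # hypothetical calibration offset
    return [round(v - offset, 3) for v in values]


def summarize(values):
    # Produce the final table that the conclusions are drawn from.
    return {"n": len(values), "mean": round(sum(values) / len(values), 3)}


# The processing sequence is recorded once, in order, instead of living in
# someone's memory or shell history.
STEPS = [adjust, summarize]


def run_all():
    result = load_data()
    for step in STEPS:
        result = step(result)
    # A checksum of the final output lets anyone compare their run to the original.
    blob = json.dumps(result, sort_keys=True).encode("utf-8")
    return result, hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    table, digest = run_all()
    print(table)
    print("sha256:", digest)
```

Publishing the digest alongside the paper would give verifiers a simple yes/no check before they start digging into any differences.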

Ally E.
April 18, 2012 2:39 am

Let’s get real here. It’s the warmists who insist we are “deniers”, it’s the warmists who object to our objections. It’s the warmists who want to bring in laws to silence the criticism and who want to “treat” us or ban us or lock us away. There’s even hints of death for dissension in the future – just like the good old days, hey what?
If AGW is such a certainty and the AGW crowd has real evidence of it, wouldn’t it be a whole lot easier to shut us all up by SHOWING THE CODES and SHARING THE DATA?
Surely Mann and the rest aren’t putting PERSONAL INTEREST ahead of SAVING THE WORLD!
/sarc off (and sorry for shouting).

April 18, 2012 2:48 am

As a professional programmer, I do more than provide source code and data (inputs and outputs). I also use version/revision control, in case I wipe out something important. Lately, I’ve also been documenting my coding process, in hopes of being able to automate my thinking – thus being able to think at a higher level. I do this so I can discover what is the quickest way to go from point A to point B.

April 18, 2012 3:17 am

Frank K. says:
April 17, 2012 at 6:14 pm
This IS good news. But it means NOTHING unless the codes are properly documented.
Better than trying to document code, one should describe how the code was developed. I’ll give an example of the development of a non-trivial code for calculation of spherical harmonics of the sun’s magnetic field as we measure it at the Wilcox Solar Observatory: http://www.leif.org/research/Calculation%20of%20Spherical%20Harmonics.pdf

April 18, 2012 3:47 am

thomasl3125 says:
April 18, 2012 at 2:48 am
Lately, I’ve also been documenting my coding process,
This is what I meant in my previous comment. The important word is ‘process’. That is what should be documented, not the code itself. Some would say that any tricks and obscure points should be well-documented. I would say that one should not use tricks and obscure code in the first place.

ZZZ
April 18, 2012 4:50 am

There is really no substitute for good will and a real desire to communicate how the work was done. If that isn’t present, “process” type rules — like saying you have to publish the code and its documentation — won’t be as helpful as people here seem to think. No large and long code is perfect and without more bugs to be discovered — why do you think Microsoft and Apple keep on issuing patches to their operating systems? The real question is whether the bugs still there are important and significantly affect the results you’re interested in. (Come to think of it, computer languages like Fortran and C++ are also implemented by long complicated computer programs with undiscovered bugs.)
What you really want is access to the stuff that is not meant for public consumption, like the emails leaked in climategate. Hey, that’s an idea! Why not require all scientific emails be disclosed? Wait, I know why that won’t work! The official emails will be unhelpful (because everyone will know they’re destined for release) while the real conspiracy occurs off the record. I suspect that requiring all the code to be released will produce lots of similarly unhelpful pro-forma information, with the computational dirty work buried in lots of “magic” data files that cannot be easily examined or understood.
A better idea would be to require a scientific “audit” of what was done, or better yet, just require that others be able to reproduce the disputed work — and isn’t this in fact already the gold standard for whether a scientific discovery is valid?. If a group of scientists aren’t of their own free will going to help you — a fellow member of their discipline — replicate what they did, that already tells you everything you need to know about the quality of what was done. If they respond to criticism by launching a publicity campaign against you in the popular media, that confirms it.

Walt The Physicist
April 18, 2012 7:12 am

Publishing the codes is a bad idea. Any simulation requires three components: a physical model, a mathematical model, and a numerical code. Publishing the first two components, i.e. physical and mathematical models, provides complete disclosure of information needed to reproduce the simulation. Disclosing the numerical procedure isn’t necessary. A numerical code can (and usually does) contain proprietary methods and, if enforced, this rule will preclude publication of a large number of good works. At this point I wonder why it is that all you smart people allow a few bad scientists to control you by making you reconsider the established publication procedures that are suitable for a free society? What’s next… will the journals require a purity of heart certificate for publication?

François GM
April 18, 2012 7:17 am

Code is a good start but I suspect more important problems in climate research that influence results: selection and expectation biases being the dominant ones. Both can in part be subconscious and these biases are not apparent in the code. Also, the best written code in the world cannot prevent using an improper statistical method.

April 18, 2012 7:30 am

Walt the Physicist,
Transparency is an absolute requirement of the scientific method. If an experiment or a paper based on a hypothesis cannot be replicated, it is simply a conjecture; an opinion.
Public policy costing $Trillions is now predicated on opinions in which the basic assumptions are not transparent. Willis is correct: if it’s business, by all means, withhold the codes. But if public policy depends upon the codes, they must be published in full. No exceptions. No excuses.
