In my opinion, this is a testament to Steve McIntyre’s tenacity.
Via the GWPF: At Last, The Right Lesson From Climategate Fiasco
A diverse group of academic research scientists from across the U.S. have written a policy paper which has been published in the journal Science, suggesting that the time has come for all science journals to begin requiring computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.
The group argues that because computer programs are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom written applications in order for work to be peer reviewed or duplicated by other researchers attempting to verify results.
Not providing source code, they say, is now akin to withholding parts of the procedural process, which results in a “black box” approach to science, which is, of course, not tolerated in virtually any other area of research in which results are published. It’s difficult to imagine any other realm of scientific research getting such a pass and the fact that code is not published in an open source forum detracts from the credibility of any study upon which it is based. Articles based on computer simulations, for example, such as many of those written about astrophysics or environmental predictions, tend to become meaningless when they are offered without also offering the source code of the simulations on which they are based.
The team acknowledges that many researchers are clearly reticent to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value, but suggest that such reasons should no longer be considered sufficient for withholding such code. They suggest that forcing researchers to reveal their code would likely result in cleaner more portable code and that open-source licensing could be made available for proprietary code.
They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.
The group also points out that the use of computer code, both off the shelf and custom written will likely become ever more present in research endeavors, and thus as time passes, it becomes ever more crucial that such code is made available when results are published, otherwise, the very nature of peer review and reproducibility will cease to have meaning in the scientific context.
More information: Shining Light into Black Boxes, Science 13 April 2012: Vol. 336 no. 6078 pp. 159-160 DOI: 10.1126/science.1218263
Abstract
The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).
=========================================
The team acknowledges that many researchers are clearly reticent to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value
=============================
And how many people have fallen for this….for how long
The first one is obvious……the second one is….well…..commercial ops have better programmers, and if it had any commercial application…they would already have one and it would be better….
Translated: We produce a really crappy below standard product….but someone with far superior sources might steal it
While laudable [and with my full support] there are problems. Some research does not require specific custom-written code, but can be adequately done with interactive tools. A trivial example being Excel. So, there is no code to publish. Publishing the spreadsheet might solve the problem for Excel, but not for more sophisticated [or proprietary] tools. Another problem is that the code might actually be quite small, but use extensive libraries and procedures developed by others [and shared by a larger community]. Some of that may be difficult to publish and near impossible for outsiders to use. In my own research I have often tried to use other people’s code, but mostly given up and written my own because of the steep learning curve involved in using other people’s stuff.
Leif – Maybe the solution is to require all code be programmed in something portable. FORTRAN used to be the universal programming language for science in the early days, now I’d guess that C++ might be more common…hard to say.
Anthony Watts says:
April 17, 2012 at 11:56 am
Leif – Maybe the solution is to require all code be programmed in something portable. FORTRAN used to be the universal programming language for science in the early days, now I’d guess that C++ might be more common…hard to say.
My choice would be R. But that is probably asking for too much. Now, don’t laugh, but a lot of my own coding is done in a dead [but standard] language COBOL [because of its easy handling of large amounts of textual data and databases]. My programs are completely portable and will run on ANY computer with a COBOL compiler.
Leif, no laughing. I myself programmed FORTRAN on punch cards and paper tape in ASR33 teletype terminals…talk about dead.
??…..Unbelievable! How could any honest person be against this proposal? And why would “this site” NOT require this? Why on earth not! Talk about a made up mind, damn…
Just ignore Louise, she’s just a well meaning Earth momma out to get me to save the Earth…she obviously has issues.
It’s one thing to dash off some hastily assembled code for an immediate task, with the programmer relying on short-term memory for documentation, but entirely another to produce code that is fully documented and maintainable, ready to be passed on to the world.
From my decades of programming experience, the latter takes about ten times the effort of the former. People budgeting for grants will have to up their software costs, labor time, and scheduling delays. It would be wise for a research facility to have a software validation department to handle all this new code promulgation.
I’m not saying we shouldn’t do this, but good things are never free and everything has a downside. For example, it will be mandatory for science education to include as many software courses as math courses.
Also, supercomputer code would fill a phone book, and is so particular to one parallel machine (and one particular month, most likely) that it can never be replicated. How are we supposed to vet all those huge climate models? When the models someday are truly enormous enough to reliably forecast weather and climate, they will be beyond any human ability to even inspect.
Finally, what code language are we talking about here?
Fortran? (whose version?)
Matlab?
Lab View?
Every one of these has a costly learning curve for users and software specialists alike, but there’s no way every scientist in the world is going to use the same language. Can we make them only use approved languages from a short list? (whose list?)
I wouldn’t bother writing to Science, but posting here may get these ideas to the right people.
This will set alchemy back decades!
I think it’s 50 years too late to insist on a common programming language for science. Many people will just not touch some languages: personally, I would do C# but not C++. Publishing code is a good idea. No-one wants to look a fool for writing code with obvious errors, so expect much better checking in future if this happens.
This is a damn good start.
Of course when I run a computer program using the same data, I expect to get the same results each time. But if we have the source code, we can verify the methods being used and how the results were calculated.
Two quick points:
1 – Releasing the code and the data still doesn’t necessarily mean the results are correct, either.
2 – Computer model runs are not experiments. The output of a computer model may be an hypothesis that can be validated against real-world observations.
Well it’s about time…
For everything I do, every single character in my software gets scrutinized….no exaggeration.
Furthermore, I have to even do tolerance analysis, StN analysis etc…..
This has been a bee in my bonnet for years… I bet 1/2 of the code they use is uncontrolled crap.
At last, Progress……….
Anthony Watts says:
April 17, 2012 at 12:36 pm
Leif, no laughing. I myself programmed FORTRAN on punch cards and paper tape in ASR33 teletype terminals…talk about dead.
I still do it…This is from my program to process data from the ACE satellite:
IDENTIFICATION DIVISION.
PROGRAM-ID. GETACE.
AUTHOR. LEIF SVALGAARD.
DATE-WRITTEN. 08/05/18
-REVISED: 12/01/03.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SOURCE-COMPUTER. PORTABLE.
OBJECT-COMPUTER. PORTABLE.
DATA DIVISION.
WORKING-STORAGE SECTION.
“But they only want it so that they can pick holes in it and find errors!!”
Well – yes, that is the idea; it’s how science works. Oh, I didn’t realize you are a climatologist – I do apologize, I am more used to dealing with the hard sciences like sociology. /sarc
By the way… openness is one thing…. revision control is quite another…I have never met a graduate student that was concerned about revision tracking…. and guess who is writing all that crap code…. it isn’t Price Waterhouse.
You’re right Anthony, I will.
Why did I bother? It just amazes me how a warped mind can turn this sensible, honest and perfectly logical (although difficult, as smart guys above state) proposal around and use it to try to smear someone…but in a very weirdo way…it’s like asking Realclimate if they require their own posters to declare that they do not take money from big oil…it doesn’t make ANY sense at all! But you’re right, zen zen…
Louise, btw, is just one of the Bunnies.
What a great concept transparency is!
I would be curious as to how many custom formulas in Excel it takes to create a climate model?
Perhaps a Googol of them? 🙂
Anthony Watts says:
She is asking a legitimate question. If this site is so big on code being made publicly available, why not use your influence as a friend and colleague to have Spencer and Christy release the code to do the UAH satellite temperature analysis? There are a lot of people on this site who complain to high heaven that Mann et al. haven’t released every last bit of code (although they have now released pretty close to that) but seem to give Spencer and Christy a free pass. Why is that?
REPLY: As far as I know, they’ve made it available for inspection, and it was the doing so that enabled others to spot the orbit decay issue which introduced a bias, long since corrected, though few people like yourself ever let others bashing Christy and Spencer forget about that. BTW where’s the code on that paper you wrote a couple years ago Joel? – Anthony
a word of warning here.
Most coders regard even their own work as amateurish, because they recognise that with infinite resources it could be done a lot better. A lot tighter.
Most code can be written amateurishly or perfectly and come up with the same results.
Code can also be written so that it is not consistent, uses different data sources without warning, or uses inconsistent formatting (for example), and this is shoddy
it is not amateurishness that is dangerous, but shoddiness
In my experience the shoddiness comes from the fellow specifying the task, rather than the executor, the developer (although, in reality, they may be the same person)
Wijnand – can you tell me where I can find Dr Spencer’s code?
As a programmer I have to totally agree. Without access to the code any results are worthless. Less than worthless. In my entire career (30 years now) I have never found a program that behaved as its documentation claimed.
Furthermore if scientists are worried about unprofessional code then write professional code. Have someone else look at it. Code reviews are a key element of good software creation.
Just like reviews are a part of good science. Or should be.
Louise,
You have my encouragement to request whatever code you feel is pertinent from Dr. Spencer. Should he refuse, you have the option of pursuing a request under state or federal freedom of information laws. You also have the option of publicly announcing any refusal on this and other blogs.
If scientific journals take to heart the recommendations of Morin, et al, then you also will have the option of requesting or obtaining the code from these journals. I share in your joy that the scientific establishment now recognizes that the need for access to obscure computer code extends to all who pursue excellence in science, not just those who have established work relationships with select researchers.
Please let us know how your project proceeds.
REPLY: Agreed. I think she’s in for a shock, the real fun will be to watch what she does with it once she has it. – Anthony
Dr. Svalgaard,
My programs are completely portable and will run on ANY computer with a COBOL compiler.
Shouldn’t that read BOTH computers, not ANY computer?
😉
Cheers,
Earle
EternalOptimist says:
April 17, 2012 at 12:57 pm
Most coders regard even their own work as amateurish, because they recognise that with infinite resources it could be done a lot better. A lot tighter.
My experience with this [going back to the 1960s and continuing today] is that it is easier to write good, self-documenting, correct code than the amateurish hash you may refer to. The way to do this can be taught.
“they feel is amateurish due to computer programming not being their profession and that some code may have commercial value, but suggest that such reasons should no longer be considered sufficient for withholding such code” As someone with a fair amount of numerical methods (modeling) training plus statistics, I think the code must be available for review, as it is too easy to make mistakes in these areas; plus, of course, programming errors are notoriously frequent. I wonder how they test these programs, including regression testing after making any and all changes.
Dave W
Leif, I think you will find that I was not talking about what is easiest, and I was not talking about what can be taught
I was talking about how it is, right now, amongst the people who work in the field