In my opinion, this is a testament to Steve McIntyre’s tenacity.
Via the GWPF: At Last, The Right Lesson From Climategate Fiasco
A diverse group of academic research scientists from across the U.S. have written a policy paper which has been published in the journal Science, suggesting that the time has come for all science journals to begin requiring computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.
The group argues that because computer programs are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom written applications in order for work to be peer reviewed or duplicated by other researchers attempting to verify results.
Not providing source code, they say, is now akin to withholding parts of the procedural process, which results in a “black box” approach to science, which is of course, not tolerated in virtually every other area of research in which results are published. It’s difficult to imagine any other realm of scientific research getting such a pass and the fact that code is not published in an open source forum detracts from the credibility of any study upon which it is based. Articles based on computer simulations, for example, such as many of those written about astrophysics or environmental predictions, tend to become meaningless when they are offered without also offering the source code of the simulations on which they are based.
The team acknowledges that many researchers are clearly reticent to reveal code that they feel is amateurish due to computer programming not being their profession and that some code may have commercial value, but suggest that such reasons should no longer be considered sufficient for withholding such code. They suggest that forcing researchers to reveal their code would likely result in cleaner more portable code and that open-source licensing could be made available for proprietary code.
They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.
The group also points out that the use of computer code, both off the shelf and custom written will likely become ever more present in research endeavors, and thus as time passes, it becomes ever more crucial that such code is made available when results are published, otherwise, the very nature of peer review and reproducibility will cease to have meaning in the scientific context.
More information: Shining Light into Black Boxes, Science 13 April 2012: Vol. 336 no. 6078 pp. 159-160 DOI: 10.1126/science.1218263
Abstract
The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).
=========================================
Not releasing the source code and data is the same as not releasing the procedures necessary to replicate an experiment. Without the source code, we just have some scientists saying “trust us.” That’s not science.
BTW: I don’t want the source code released because I want to run the model myself. I want it released because it’s necessary for the process. We need to pay attention to the man (or Mann) behind the curtain.
Releasing flow charts and pseudo code without the source code won’t cut it, either. We have to be able to verify the program works as documented, and the source code and the data are the only way to verify that.
Back in school, if you filled in your maths homework with just the answers, you would get no marks, and a comment ‘please show your working out’…….it seemed that teachers used to be concerned that not showing ‘your working out’ was akin to wild guesswork or maybe even cheating.
They were right of course.
Wonder if Joel ever ‘showed his working out’ ????
Ian W (April 17, 2012 at 12:42 pm) wrote:
“Oh, I didn’t realize you are a climatologist – I do apologize I am more used to dealing with the hard sciences like sociology. /sarc”
An insightful comment.
I feel that they are using the term “amateurish” as a fig leaf for those out there who have models with a predetermined output. It is a much less confrontational term than “dishonest”.
This entire situation is mind boggling to me, as I was taught from grade school through college that showing your work was as important as what your answer was. Initially it was to allow the teacher to see where I had messed up in my long division homework, but eventually it came to the point where the steps being shown allowed my professors to follow my line of thought in addressing a project. No steps shown, no credit. How did this environment allow the situation we are in to develop?
I can, with minimal effort, lay my hands on the deck of punch cards for the simple distillation program I wrote (in FORTRAN) for a class now 35 years in the past. I can see them becoming collector’s items in another 35 :-} (should I live so long).
D. J. Hawkins says:
April 18, 2012 at 9:48 am
I can, with minimal effort, lay my hands on the deck of punch cards for the simple distillation program I wrote (in FORTRAN) for a class now 35 years in the past.
Most likely that program would still run today.
I run a company that writes complex scientific code for our own commercial products. The work is private and self-funded. The resulting code is proprietary, but it’s not the code I want to hide specifically, it’s the underlying technology, so I simply don’t publish anything about it. If I am an academic and want to publish a novel method, then the code and implementation should be as much a part of the paper as the mathematical derivation. If the work is privately funded, then don’t publish. If it’s publicly funded then I get VERY ticked off by academia claiming commercial advantage when they don’t have to provide their own money – a huge advantage over small companies like mine that have to self-fund.
A good example of how academia can still write fantastic code, make it freely available and yet still demand (modest) licence fees for commercial use is the FFTW code from MIT. Developed and owned by MIT, it performs very fast prime number FFT transforms in higher dimensions. It’s amazing code and you can download it for free. If you use it internally, even in a commercial company, there is no licence fee (or for academic purposes of course) but if you sell the resulting product you have to buy a licence. The price is very reasonable – single payment of about $2,500 – $5,000 depending on the version. THAT’s the right commercial model for these things – everyone wins from that kind of arrangement.
On the topic of code and portability, the journal Computers & Geosciences has been publishing algorithms and accompanying code for many years, sometimes on very technical and sophisticated techniques. No-one makes rules about what language etc, the authors just need to give possible users a guide as to what the basic compiler/OS might be. There are clever people out there – look at how Steve McIntyre has reverse engineered even some of the most obfuscated results from Mann and others. It’s not hard for others to replicate the work on other OS etc if they have the basic algorithm written as code. Very often just the key functions are provided, with the calling arguments, and it’s enough. Take a look at Numerical Recipes in [C, Java, FORTRAN – take your pick] to see how useful such routines are to a wider scientific community. You don’t need all of it, although for scripted languages like R it helps – Steve McIntyre is great in this regard.
I am a technical bod, not a programmer, but I can still write reasonable programs. I learnt a long time ago the difference between commercial code and technical/scientific code. I write code to prove a concept or test an idea (be it something simple like using an Excel spreadsheet, or maybe Java or C). What I write is simple, practical, even amateurish code, but I am not ashamed of that. I am not a commercial programmer. It’s the commercial programmer’s job to turn it into robust code that has regression tests and bounds checking etc etc, but that doesn’t invalidate my original analysis – unless I made a mistake. In which case I want to know about it and PUT IT RIGHT. Don’t those publishing analyses/algorithms concerned with AGW want to know if they got it wrong and, if so, put it right? Or is that too much to expect?
ThinkingScientist:
Thank you for your superb post at April 18, 2012 at 10:21 am.
I think it is by far the best post in the thread and I commend everybody to read all of it: each paragraph contains some meat.
Again, thank you.
Richard
@ThinkingScientist:
Great post, a very insightful, professional and completely logical argument.
As to your question
When speaking of the Team, I think I know the answer to that one.
Question for everybody here: if this were to be implemented by the journals, would this then mean that earlier published work also has to make available the code used, else be retracted?
One potential issue with sharing the source code is more uniformity of the models. The code just shows us how it works, what assumptions are made, the fudge factors, and errors, etc. The code still doesn’t show us the results are correct, just how the results were made.
I can even foresee open source climate models, officially endorsed and stamped with approval and the claim that since the code is good, then the results are as well.
joeldshore: It is not as cut-and-dry as people seem to think.
Actually, yes it is.
As to points 1 and 3, the scientists involved can choose to protect their intellectual property rights (possibly temporarily) or to publish and gain the scientific prestige. All the complications can be considered by them individually on a case-by-case basis, but the choice is simple: publication in scientific journals requires disclosure. Publishing a paper without disclosing original code is equivalent to publishing a description of the procedure that omitted critical steps, or omitted a key ingredient.
As to point 2, when using proprietary code by someone else (the examples cited above include Matlab, Excel, and statistical software), cite the name and version number — all information needed for someone to replicate the procedure exactly. That’s what scientists do when they use commercial reagents.
About effing time that some scientists have stated the obvious… Now let’s see how long it takes to get the journals to enforce the idea.
When they do, computer code should also come with instructions or documentation on how it was built, host platform, specific compiler versions, notes on platform dependencies, notes on underlying support code such as libraries and runtime environments. Obviously if some commercial library is used which cannot be legally shared as an object file, then version numbers for that software should be provided, with a caveat that if the precise code is no longer sold due to obsolescence, the actual libraries used in the experiment can be retrieved. (Scientific and academic study is one of the Fair Use provisions that allow for copying in U.S. copyright law).
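As a rough illustration of the kind of build notes described above, here is a minimal Python sketch that records the host platform, interpreter version and library versions alongside a result. The function name `describe_environment` and the package names passed to it are hypothetical, chosen only for this example; a compiled code base would need the equivalent for its compiler and linked libraries.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect platform and library versions a reader would need to rebuild a run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than silently skipping it.
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package list is an assumption; substitute whatever the analysis imports.
    print(json.dumps(describe_environment(["numpy", "scipy"]), indent=2))
```

Saving that output next to the published results would give a later reader the version numbers needed to rebuild the same environment, or at least to know which pieces have since changed.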
Wijnand:
At April 18, 2012 at 11:21 am you say;
“Question: if this were to be implemented by the journals, would this then mean that earlier published work also has to make available the code used, else be retracted?”
Well, I can only answer for myself and not “for everybody here”, but I think it would be very unreasonable to demand “that earlier published work also has to make available the code used, else be retracted”.
A paper is published according to the rules stipulated by the journal at the time of the paper’s submission. Changing the publication rules after a paper has been published should not affect the published paper in any way.
Your suggestion would require authors to guess how other publication rules may change in future. This would require all authors to each own a crystal ball (sarc on/ or a GCM sarc off/).
Richard
ThinkingScientist says:
April 18, 2012 at 10:21 am
Well said!
Willis Eschenbach says:
That’s quite an admission. You are so concerned about the fact that Mann might not have publicly available code over a decade old that has since been superseded by newer codes of his (for his 2008 paper) that are, by your own admission, publicly available. And, yet, at the same time, you have no clue whether Spencer and Christy have made ANY version of their codes publicly available?!?
One thing that I have admired about you is how you are willing to step up and complain about censorship of comments whether the “censor” is RealClimate or tallbloke. Similarly, don’t you think it is important for you to be sure that scientists who you admire are complying with the standards that you want to impose on other scientists in regards to making code freely available?
Willis Eschenbach says:
So, basically, you are saying that a huge swath of science currently published in scientific journals shouldn’t be there. Practically entire fields of research, like research on organic light emitting diodes (OLEDs) would disappear under your new standards.
As has been explained many times, “replication” has traditionally not meant what McIntyre and you and many other skeptics have defined it to mean. “Replication” has traditionally meant using the methods described in the paper to reproduce the basic results and conclusions of the paper. That has rarely involved using the author’s computer code.
There is a difference between explaining what it is doing and having the line-by-line code. I think there is universal agreement that papers need to adequately explain their methods. That has not traditionally been taken to mean that they have to provide the actual computer code.
The issue is with your very black-and-white, two-valued orientation: You are either a scientist or a businessman and if you are a scientist you have absolutely no intellectual property rights.
That is not traditionally how things have been done…and there are some good reasons why things haven’t been done that way.
Smokey says:
Great…So, where are the UAH codes? I am not asking for versions going back more than a decade. I would be perfectly happy just to see the most recent version …Or the version from Spencer and Christy’s most recent publication on the satellite temperature record.
The day I see someone demanding the code from scientists on their ‘side’ is the day I might believe I’m looking at a real live skeptic instead of a propagandist. Of course, a real live skeptic would eschew the notion of taking a side to begin with.
Joel, Willis, is this the description of, and the actual fortran coding for MBH98?
http://www.meteo.psu.edu/~mann/shared/research/MANNETAL98/METHODS/
joeldshore: The issue is with your very black-and-white, two-valued orientation: You are either a scientist or a businessman and if you are a scientist you have absolutely no intellectual property rights.
No, it just means that the scientist-businessman has to decide which intellectual property rights to keep secret for commercial reasons, and which to sacrifice for academic/professional advancement. Same as with reagents and other physical assets: keep the ingredients secret (as with Coke and Kentucky Fried Chicken) and sell the product, or publish the ingredients (Taq, etc) in the peer-reviewed literature for the academic and scientific credit.
Great…So, where are the UAH codes?
A palpable hit, in my opinion. There has to be a clear and unbiased standard. If you have asked for their code and they have not released it, then they have a problem.
joel shore,
I am an equal opportunity skeptic. Everyone who is government subsidized should provide their code, methods, metadata, etc. upon request. I would gladly support a law requiring that. The details could be worked out, but one requirement would be that anyone refusing to comply would be ineligible for any future gov’t money, as would their employer. Yeah, let’s make transparency the law. For everyone.
Ball’s back in your court: would you support such a law? Do you think Michael Mann would?
joeldshore: “Replication” has traditionally meant using the methods described in the paper to reproduce the basic results and conclusions of the paper.
That “tradition” has outlived its usefulness. Now it is recognized that without the code that supported a reported experimental result, it cannot be determined whether successors repeated the “same” procedure with adequate fidelity; that applies whether the successors do or do not seem to have replicated the original result.
More Soylent Green! I can even foresee open source climate models, officially endorsed and stamped with approval and the claim that since the code is good, then the results are as well.
A more likely outcome, in my opinion, will be lots of professionals volunteering time to test various sections of the code, and compiling a list of tests that the code sections have passed and (sometimes) failed. What happens now with commercial software like SAS is that many people test the new releases and report back to the SAS Institute when (it’s always “when”, not “if”) they discover problems. People in Big Pharma routinely test new releases against results from previous releases to ensure that the new releases are as reliable as the old releases. So do people in other industries.
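The release-against-release checking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `compare_releases`, the result names and the numbers are all hypothetical, and real validation suites are far more extensive.

```python
import math


def compare_releases(baseline, candidate, rel_tol=1e-9, abs_tol=0.0):
    """Return the names of results that are missing or differ beyond tolerance."""
    failures = []
    for name, expected in baseline.items():
        actual = candidate.get(name)
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Made-up outputs from an "old" and a "new" release of the same routine.
    old_release = {"mean": 14.2, "trend": 0.017}
    new_release = {"mean": 14.2, "trend": 0.019}
    print(compare_releases(old_release, new_release))
```

With the code published, anyone can keep such a list of stored results and rerun it against each new version, which is the kind of volunteer testing the comment anticipates.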
barry says:
Thanks, barry. I think that is indeed what I had seen before and was trying to find again!
Smokey says:
I would hardly call you “an equal opportunity skeptic”. You have repeatedly made statements that Mann is a fraud, hiding things, etc. just because McIntyre says there is some obscure piece of code from 14 years ago that Mann has supposedly not yet released. And yet, I have never seen you use any sort of bad language about Spencer and Christy who, to my knowledge, have never publicly made available ANY version of their code.
Your claiming to be “an equal opportunity skeptic” is like a sheriff who claims to believe that everyone should obey the speed limit and then throws a little old lady in jail for a week for going 27 in a 25 mph zone while failing to even pull over a political friend who whizzes by at 60 in a 30 zone.
joel shore,
That’s a lot of pointless chatter in place of answering my questions: would you support such a law? Do you think Michael Mann would?
Joel Shore,
You are still habitually evading questions I see.
BTW, you should try reading Andrew Montford’s book, The Hockey Stick Illusion, available at modest cost from Amazon. You make yourself look very silly by making such a naïve statement on that topic.