In my opinion, this is a testament to Steve McIntyre’s tenacity.
Via the GWPF: At Last, The Right Lesson From Climategate Fiasco
A diverse group of academic research scientists from across the U.S. has written a policy paper, published in the journal Science, suggesting that the time has come for all science journals to require that computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.
The group argues that because computer programs are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom written applications in order for work to be peer reviewed or duplicated by other researchers attempting to verify results.
Not providing source code, they say, is now akin to withholding parts of the procedural process, which results in a “black box” approach to science, something that is, of course, not tolerated in virtually any other area of research in which results are published. It’s difficult to imagine any other realm of scientific research getting such a pass, and the fact that code is not published in an open forum detracts from the credibility of any study that relies on it. Articles based on computer simulations, for example, such as many of those written about astrophysics or environmental predictions, tend to become meaningless when they are offered without the source code of the simulations on which they are based.
The team acknowledges that many researchers are clearly reluctant to reveal code they feel is amateurish, since computer programming is not their profession, and that some code may have commercial value, but they suggest that such reasons should no longer be considered sufficient for withholding it. They suggest that forcing researchers to reveal their code would likely result in cleaner, more portable code, and that open-source licensing could be made available for proprietary code.
They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.
The group also points out that the use of computer code, both off-the-shelf and custom-written, will likely become ever more prevalent in research endeavors, and thus, as time passes, it becomes ever more crucial that such code be made available when results are published; otherwise, the very nature of peer review and reproducibility will cease to have meaning in the scientific context.
More information: Shining Light into Black Boxes, Science 13 April 2012: Vol. 336 no. 6078 pp. 159-160 DOI: 10.1126/science.1218263
Abstract
The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).
=========================================
Joel Shore,
You are still habitually evading questions I see.
BTW, you should try reading Andrew Montford’s book, The Hockey Stick Illusion, available at modest cost from Amazon. You make yourself look silly by making such a naïve statement on that topic.
joeldshore says:
April 18, 2012 at 6:24 pm
So you are claiming the scientific journals should publish results that cannot be replicated … you’ll have to justify that one to me. Because it seems clear to me that if the research doesn’t contain enough information to replicate it, it shouldn’t be in a scientific journal. Feel free to argue the opposite side, that irreproducible results belong in scientific journals … I can hardly wait.
Oh, please, Joel, come up with a new argument. That has been explained to you many times. Why do you think Science magazine recommends requiring that the code be provided? Perhaps you can explain why Science magazine would do that when Joel Shore says that it is not necessary in any sense.
And of course, as you say, replication “traditionally” hasn’t involved the author’s computer code, because traditionally authors didn’t use computers, or they didn’t build the entire study around computers … but as the Science article clearly indicates, that was then, and this is now. Sorry to drag you kicking and screaming into the 21st century, but that’s the modern reality.
Traditionally in science there were neither computers nor code, so I’m totally unclear what you mean by “traditionally”. In any case, explaining your computer code, even explaining it perfectly, doesn’t mean that your code actually does what you claim … you sure you’re familiar with computer programs? You do know that they often contain what they call “bugs”, and that without the code you can’t establish whether there are any of those “bugs”?
Where did I say scientists have no property rights? That’s a straw man. What I said was very different. I said that if you want to claim the mantle of science for your results, you have to be transparent so people can check your work. Otherwise, it’s not science. Why is that so hard to understand?
Again with the tradition … I don’t know if you noticed, but times have changed, Joel. Traditionally scientists wrote their reports in longhand. So should we continue the tradition?
Look, it appears that you think that Science magazine and a host of scientists are 100% wrong in wanting to require computer code. To most of us who have used computers extensively, requiring code makes perfect sense, because we know that computer programs are
a) very difficult to describe accurately in English, and
b) often contain bugs, and
c) can easily conceal a foolish error like, say, using degrees instead of radians (see the sketch after this list), and
d) are quite fond of doing things that their programmers never dreamed of.
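To make point (c) concrete, here is a minimal, purely hypothetical Python sketch (invented for illustration, not anyone’s actual code). The written methods can truthfully say “we weighted by the cosine of latitude” while the program quietly does something else:

import numpy as np

latitude_deg = 45.0

# What the methods section says: "we weighted by the cosine of latitude".
correct = np.cos(np.radians(latitude_deg))   # ~0.707
# What the code might actually do: pass degrees straight to cos().
buggy = np.cos(latitude_deg)                 # ~0.525, silently wrong

print(correct, buggy)

Nothing in the prose description distinguishes the two; only reading the code does.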
When you can explain to me how to discover those problems with a scientist’s work WITHOUT having access to the code, I’ll believe you have a point. Until then, you’re just defending the actions of scoundrels. I see above that you want to claim that this kind of investigation, of trying to do exactly what the author did with the data and code, is not a part of “replication”, whatever replication means to you.
Perhaps so, but I’ll leave the semantic hairsplitting to you. Me, I don’t care what you call it, but checking the accuracy and validity of someone’s data and code is a very necessary and critical part of the investigation and replication of anyone’s claims. The issue with Michael Mann’s “Hockeystick” code is a perfect example.
Mann made a newbie mistake: he used un-centered PC analysis, and he didn’t even realize it. The only way it was discovered was that he left some of his code on an open server. Without that good fortune, we would still not know about his error—it simply could not have been discovered without access to the code.
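Purely for illustration, here is a toy Python sketch of the centering issue (emphatically not Mann’s actual code, and the real procedure is more involved; the 79-step window below just stands in for a calibration period). The point is that the difference between conventional centering and “short” centering is a one-line change that a prose methods description can easily gloss over:

import numpy as np

rng = np.random.default_rng(0)
# 50 synthetic "proxy" series of 600 time steps, pure random walks,
# so there is no real signal in them at all.
X = np.cumsum(rng.standard_normal((600, 50)), axis=0)

def leading_pc(centered):
    # Leading principal component of an already-centered data matrix.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

# Conventional PCA: center each series on its full-period mean.
pc_full = leading_pc(X - X.mean(axis=0))

# "Short-centered" PCA: center on the mean of the last 79 steps only.
pc_short = leading_pc(X - X[-79:].mean(axis=0))

The two choices generally give different leading components even on pure noise; which one a paper actually used can only be settled by looking at the code.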
That is the kind of bullshit that your argument is supporting, Joel—you are speaking out in favor of the concealment of the kind of crappy, error-containing code that Michael Mann wanted so desperately to keep secret. Are you sure that’s the side you want to be on?
I was under the impression you were a scientist … so why are you so opposed to transparency? Why do you want Mann and others to continue to be able to hide their errors? Science magazine understands the issue with computer code. We all understand that issue. So why don’t you? That’s the part I don’t get …
w.
PS—Are you going to have the courage to acknowledge that you were … well, let me call it “overly optimistic” when you said “I can give you links to Michael Mann’s code”? Because all of this flailing strikes me in part as a vain attempt to distract us from your failure to make your word good.
joeldshore says:
April 18, 2012 at 6:13 pm
I’m deeply sorry for my lack of omniscience, Joel, but I truly had never considered the question. So sue me …
I suspect I had not considered it because I knew Spencer and Christy had given their code to RSS to pick apart. RSS is their competitor, and one of the few groups with the expertise to find errors in the UAH code. I’ve often advocated doing just that, giving the code to your worst enemy. I’ve said in the past that if your worst enemy can’t find errors in the code, then things are looking good.
Since they had done exactly that, I didn’t even give it a second thought.
Now you want to bust me for that? Get real. As posters above have pointed out, Spencer and Christy have been working with NOAA to make their code public, so you are bitching me out about a total non-issue.
w.
PS—If Mann had done exactly what Spencer and Christy did, if he had given the code to Steve McIntyre, it wouldn’t be an issue. He didn’t. Your attempt to equate the two is a joke.
You need to get out more. You are the Pollyanna optimist who claimed that you could give us links to Mann’s code … now that you’ve noticed you can’t do that, you’d love to change the subject …
joeldshore says:
April 18, 2012 at 7:50 pm
Nice try, but no. See, Joel, some of us were actually following this story when Mann finally released what he claimed was the code in 2005. As Steve McIntyre commented at the time (emphasis mine):
Read Steve’s article for more details, and also see here—the bottom line is, no, that’s not what barry calls “the actual fortran coding for MBH98”.
w.
The overall discussion triggered two points in my mind, and I’m interested to hear reactions.
1. It takes time to prepare code and model results for publication. When proposing for funding, scientists, including myself [I am primarily NASA-funded], would naturally rather propose to do more research than to propose to spend resources, say, cleaning up code and readying model runs for public consumption. The reason for this is pretty obvious: there is currently hardly any stigma attached to not publishing code, and proposals that contain additional research [vs. those proposing to archive code and model runs] are usually viewed by review panels as more cost-effective [more science per public dollar] and thus more likely to be funded. You may view the resulting science as inferior, but that usually doesn’t matter unless you happen to be serving on a review panel. No one ever said scientists were saints, and when the economic incentives slant heavily against publishing code and model runs, it shouldn’t be surprising that these items aren’t published. My opinion is that probably the best way to make publishing these items more commonplace is to have government funding agencies require it, but even if that were in place, how do you get the independently wealthy or privately funded scientists [and there are lots of the latter in the medical disciplines] who are not beholden to government funding to change their current habits and buy in?
2. Further, there is the additional cost of archiving. These costs are not trivial when one considers that model runs, especially those involving 3D time-evolving numerical simulations, can take up terabytes of storage space, and this number is growing rapidly. [I’m assuming that to live up to the ideal being espoused here, one would require the model results and the associated processing code, in addition to the computer code that generated them, to be available as well.] Any suggestions for who should archive these data, and who should pay for it? This is not a trivial question to answer. Such data would ideally be stored in such a way as to allow easy access by everyone for long periods of time, or basically forever. However, it turns out the main priority of the for-profit publishers who house the journals in which the majority of peer-reviewed scientific research is published [the three big ones are Wiley, Springer, and Elsevier] is to make money, not to back up terabytes of data or keep it online and readily accessible. Government centers or funding agencies are subject to myriad forces [political, financial, bureaucratic, etc.] and may not be reliable in the long term. Maybe something like cloud storage or bittorrent could work?
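Just to put rough numbers on the storage point above, here is a back-of-envelope Python sketch; the grid sizes and counts are made up for illustration and do not describe any particular model:

# Rough, purely illustrative estimate of output size for one 3D run.
nx, ny, nz = 500, 500, 100      # grid points (assumed)
n_timesteps = 10_000            # saved output steps (assumed)
n_variables = 10                # fields written per step (assumed)
bytes_per_value = 8             # double precision

total_bytes = nx * ny * nz * n_timesteps * n_variables * bytes_per_value
print(total_bytes / 1e12, "TB")  # about 20 TB for this single run

Even with these modest assumptions a single run lands around 20 TB, and a published study may involve many runs, so the question of who hosts and pays for all of it is a real one.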
Competent parties need only:
1. data.
2. concise outline of methods. (Brighter parties won’t need #2.)
The incessant whining for code spoonfeeding gives the impression of a quantitatively weak community.
Red tape:
a) builds in delays.
b) deflects resources from research to admin.
In the aggregate, the delays and resource waste burden an already over-taxed society.
Seems like common sense to me. How much grief and misery would’ve been avoided if scientists, publishers and academics had gone down this route 15 years ago? Science is about testing hypotheses. If the hypothesis is based on computer code, it’s not testable unless the code is in the hands of people seeking to find fault with it. Climate models might be well nigh perfect by now … if only the modellers had opened their code to scrutiny. What we need now is retrospective enforcement of what has always been the rule. We’ll soon learn who’s been doing good research, and who’s been cooking the data to produce a grant … once we see the actual calculations. There probably aren’t all that many who’ve loaded the dice for profit. We need to check, though. It’s the only way to restore respect to climate science.
1) Computer models, computer simulations do not output data. Model results are neither facts nor data.
2) Without the code, there is no way to tell if what was actually done matches the claimed methods. Perhaps there is a bug in the code, so that no matter what data is entered, the results are essentially the same.
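For what it’s worth, here is a minimal hypothetical Python sketch of the kind of bug meant here (invented for illustration, not taken from any real study): the function quietly ignores its input, so every dataset “confirms” the same result.

import numpy as np

REFERENCE = np.linspace(0.0, 1.0, 100)   # hard-wired test data left in by accident

def trend_per_step(series):
    # Intended: least-squares slope of whatever 'series' is passed in.
    # Bug: a leftover debugging line overwrites the input, so every
    # caller gets the trend of the hard-wired test data instead.
    series = REFERENCE
    t = np.arange(len(series))
    return np.polyfit(t, series, 1)[0]

print(trend_per_step(np.zeros(500)))                                    # ~0.0101
print(trend_per_step(np.random.default_rng(1).standard_normal(500)))    # same number

A methods section could still describe this as “we fit a linear trend to the data”; only the source shows that the supplied data never mattered.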
~More Soylent Green!
Willis Eschenbach: I suspect I had not considered it because I knew Spencer and Christy had given their code to RSS to pick apart. RSS is their competitor, and one of the few groups with the expertise to find errors in the UAH code.
Ah, that is news to me. Thanks. That’s a good step. Did they make their code publicly available, as called for in the Science article, when they published results?
More Soylent Green! (April 19, 2012 at 6:48 am) wrote:
“Computer models, computer simulations do not output data. Model results are neither facts nor data.”
What you address here is art & culture based on fantasy assumptions – and it isn’t good enough to attract sensible attention, let alone retain it.
—
More Soylent Green! (April 19, 2012 at 6:48 am) wrote:
“Without the code, there is no way to tell if what was actually done matches the claimed methods.”
Simply not true. Red herrings & straw men – if allowed – will deliberately have armies tied up unproductively at committee in a protracted resource drain. A war of attrition is a tactical political exercise that will do nothing to advance our understanding of natural climate variability.
—
I suggest redirecting focus to climate exploration without assuming climatology’s already at the level of a science. Simply explore the terrain and informally share raw findings without succumbing to procedural harassment and corrupt cultural pressure formally demanding editorial cosmetic distortion.
Can you explain the “simply not true” remark? Can you explain your entire post?
Willis Eschenbach says:
You know this how? What I have heard (admittedly only second-hand) is that it took RSS quite a bit of effort to get what they wanted from Spencer and Christy. And note that they, unlike McIntyre, were not asking for everything and anything but only for one particular small section of the code that related to their main point of contention.
What magicjava said was:
If Mann had said that he was working with his employer (or former employer) to get the code made available to the public and then 2 years went by, I doubt that you guys would be representing this as “Mann has been working with his employer to make his code public.” No, I think you would be representing it as something more along the lines of “Mann is continuing to stonewall about releasing his code.”
Willis Eschenbach says:
The simple fact remains that if I want to see the code for Mann’s ***MOST CURRENT*** work on the proxy temperature record, it is all publicly available. If I want to see ***ANY*** code for Spencer and Christy’s UAH temperature analysis based on the satellite data, I appear to be completely out of luck.
Willis Eschenbach says:
I said that if you want to claim the mantle of science for your results, you have to be transparent so people can check your work.
So even if you fully publish your methods, your work still isn’t reproducible?
My group works on simulations of materials. We publish the equations we solve, the numerical methods we use to solve them, and all parameters and inputs that go into the simulations. The results are reproducible: but you’ll have to actually do some work to reproduce them.
For us, our code is like a piece of lab equipment. We’re happy to explain what we do and how our methods work; that’s part of science and reproducibility. But we’re not just going to let you come in and use our lab equipment (even if it didn’t cost us anything), because that equipment cost us time and money to set up.
To most of us who have used computers extensively, requiring code makes perfect sense…
When you can explain to me how to discover those problems with a scientist’s work WITHOUT having access to the code, I’ll believe you have a point.
Why not just write your own code?
Sure, if it produces different results, you may not be able to explain exactly *why* the results are different, but that’s quite normal in science – some % of the time, we never figure out why so-and-so’s results were wrong.
Willis Eschenbach says:
I am not saying that. I am saying that you are defining the meaning of replication quite differently than it has traditionally been defined.
When I say “traditionally”, I mean including the last several decades over which computer code has been a very important part of scientific research.
Where has Science magazine made this recommendation? I hope you are not misinterpreting / misrepresenting the fact that Science published this article in their “Policy Forum” to mean that Science endorses all of its conclusions.
Windchaser says:
April 19, 2012 at 11:53 am
I’m sorry, Windchaser, but I don’t understand that. Why would your work not be reproducible?
The issue is whether the results are reproducible. For many studies, they are not reproducible without the computer code.
Windchaser, this has all come up and come to a head because for many things, being “happy to explain” what you do and how you do it isn’t enough to reproduce what you have done, for the reasons I spelled out above.
The problem is that your method just leads to dueling claims, where nothing gets settled. For example, people tried to reproduce what Michael Mann did in the “Hockeystick” paper. They couldn’t replicate it. But Mann continued to insist that everything was right and proper, and his results continued to be cited even though they were clearly wrong. What do you do then?
Without Mann leaving some of his code on an open server, that issue would never have been settled, because we’d never have found out just where Mann’s paper went off the rails. The problem was, unbeknownst to Mann, he was making a newbie math error. So the code wasn’t doing what he said it did. He, like you, was “happy to explain what we do and how our methods work” … but the code didn’t work the way he explained. What then?
I see no way to avoid those problems and to settle those issues without access to the code. It’s like when Ross McKitrick erroneously used degrees instead of radians … but unlike Mann, Ross revealed his code, and so the error was found and remedied quite quickly.
Under your proposed method, where hiding the code is perfectly fine, we’d never have found either Mann’s or McKitrick’s errors. Not only that, but in Mann’s case, the hours involved in trying to replicate it were huge. That’s a great waste of human resources, and that’s part of why Science magazine says, free the code.
So if you want to keep your code secret, that’s your call … just don’t expect me to believe a single word of the results from your code. Why should I, when you won’t reveal exactly how your code does it? The code may well contain errors that you don’t know about. It may be that you are shading the truth about what it does. It may be that you have just made up the answers. It may be that, for some unknown reason, you are deliberately producing slightly wrong answers.
And without your code, we’ll never know the difference. You want us to trust that your code actually does what you claim, but science isn’t built on trust.
w.
donkeygod says:
This is just pure silliness. The legitimate debates and uncertainties in climate science have nothing to do with the nitty-gritty details of various groups’ computer codes. They have to do with real issues involving things like clouds, aerosols, etc., all of which are openly discussed in the scientific literature, at conferences, and by e-mail every day. I doubt you’d be able to find any serious scientist in the field who would list lack of access to other groups’ computer codes as among the top issues in the field.
And, some of the models, such as GISS Model E, have been publicly available. I doubt that as a result of this it is considered any more perfect than any other model.
Windchaser
My group works on simulations of materials. We publish the equations we solve, the numerical methods we use to solve them, and all parameters and inputs that go into the simulations. The results are reproducible: but you’ll have to actually do some work to reproduce them. For us, our code is like a piece of lab equipment. We’re happy to explain what we do and how our methods work; that’s part of science and reproducibility. But we’re not just going to let you come in and use our lab equipment (even if it didn’t cost us anything), because that equipment cost us time and money to set up.
So, let’s assume that I take the information you provide and come up with a totally different result. And MY results are just as consistent as yours. Obviously, at least one of us has NOT implemented the simulation correctly – but how is anyone to know?
In your example, it’s like you built your OWN piece of lab ware, but didn’t provide all of the details about how it was built. Maybe you gave the measurements, but failed to specify the materials, and yours was made of aluminum while mine was made of plastic. Or maybe you gave all the details, but made a mistake in the construction. How is anyone to know if you don’t let anyone SEE the piece of equipment you built? There is no way someone can compare what you SAY it’s supposed to do with what it ACTUALLY does, if nobody can examine it.
joeldshore says:
April 19, 2012 at 11:44 am
The fact remains, your mouth made a bet that your resources can’t pay. You said you could give us links to Mann’s code. You can’t. Quit trying to wriggle out of it, it’s unseemly.
w.
PS—Please don’t try to claim you were talking about Mann’s most current code all along, your own words show that’s not the case.
joeldshore says:
April 19, 2012 at 12:29 pm
So your claim is that in a Science magazine special issue, whose theme is “Computational Biology”, and whose introduction to the special issue says:
… your claim is that the Editors allowed a group of authors to put out a very strong call for disclosure that the Editors plan to ignore, or don’t approve of?
The article says (emphasis mine):
and
Reference 16 is by Brooks Hanson, Andrew Sugden, and Bruce Alberts, who are respectively two Deputy Editors and the Editor-in-Chief of Science magazine … the other three are to the code requirements of the Journal of Biological Sciences, PLoS, and the Proceedings of the National Academy of Sciences … and you are trying to get us to believe that Science magazine is not endorsing this approach? That they are just putting it up for discussion, but they’re not going to follow the ideas because they don’t believe in them??
Get real, Joel, your gyrations to try to establish your claims are getting embarrassing.
w.
PS—You seem to think that the appearance of the article in the “Policy Forum” section means the magazine editors don’t back the ideas, but actually the opposite is true. Not only are they backed by the editors, but according to Science guidelines for authors, items in the “Policy Forum” are mostly “commissioned by the editors” …
I am “Computer Challenged.” However, the one point I see in all this is that if I do a scientific experiment, I try to document what I did exactly, because I want someone to be able to reproduce what I did, whether it is tomorrow or a hundred years from now. The Millikan Oil Drop Experiment is a classic example.
If ALL the information is not published, then the results are completely worthless to scientists in the future, because they are nothing more than someone’s opinion. All the papers published so far without code and data attached will be useless a hundred years from now, unlike Robert Millikan’s 1909 experiment.
THAT is the critical issue in my opinion.
joeldshore says:
April 19, 2012 at 12:29 pm
Upon further research, I find that Science magazine made the recommendation in the cited reference 16 above, available here (paywalled), viz (emphasis mine):
w.
Eschenbach says:
Windchaser, this has all come up and come to a head because for many things, being “happy to explain” what you do and how you do it isn’t enough to reproduce what you have done, for the reasons I spelled out above.
The issue is whether the results are reproducible. For many studies, they are not reproducible without the computer code.
Then either:
1) The original work is incorrect, or
2) the methodology is not clearly explained.
The problem is that your method just leads to dueling claims, where nothing gets settled.
Yes. Dueling claims happen all the time in science. But that’s hardly the same as “nothing gets settled”, because usually other people are willing to jump on board and run the experiments/simulations themselves. Controversy is what drives the scientific method, after all.
The ‘currency’ of the publishing world is citations. Citations are a measure of your influence, success, and fame in this world, and are a part of what gets you more research money, a bigger lab, etc.
A big controversy generates lots of papers and lots of citations, and getting in on the action by publishing is an easy way to build your academic career (provided your data is solid). Take a look at the recent faster-than-light neutrino controversy, for instance, where something like 150 papers were published in a few months after the controversial results came out, and see how many citations those initial papers received. Or look at the cold fusion story from the late ’80s and the flurry of publication that followed.
Basically – it’d be great for you if you can show that some important piece of work is not reproducible (well, provided the errors are in their code and not yours). And there’s plenty of motivation for other people to come along and help resolve the controversy.
And without your code, we’ll never know the difference. You want us to trust that your code actually does what you claim, but science isn’t built on trust.
Of course it’s not. Which is why you write your own code, test it yourself, and run the simulations yourself.
TonyG says:
In your example, it’s like you built your OWN piece of lab ware, but didn’t provide all of the details about how it was built. Maybe you gave the measurements, but failed to specify the materials, and yours was made of aluminum while mine was made of plastic. Or maybe you gave all the details, but made a mistake in the construction.
Nah. All the important details about how it’s built are given, so that other people can build one themselves.
If you follow the exact instructions and get different results from your equipment, then one of us built it badly, right? Maybe I scratched the lens on the microscope I was building, or didn’t measure the focal length properly, or whatever. But even if I didn’t correctly follow my own methodology, it doesn’t mean that you get to come into my lab and inspect my equipment.
Most of the potential problems you guys have raised, like bugs in the code, have nothing to do with publishing enough information about your methods, and everything to do with implementing those methods correctly.
Willis,
as far as I am aware, that is the base code for MBH98. You have cited a preliminary post McIntyre made in 2005, following its release. Since then McIntyre has made use of the code to check MBH98/99.
A commenter at Science, under the Black Box abstract, says:
McIntyre has found it very difficult to work with the code Mann posted. This stuff (apparently) is not like commercial software; it’s buggy, purpose-built, and not easily applicable. It’s not written for end users.
The commenter raises an interesting point – if code is not user-friendly, will it be enough to provide it and sit back? If it works for the makers, but peer-reviewers can’t get it to function, what then? If journals may not publish unless user-friendly code is provided, then that is going to slow down research.
Results that are reproduced using different methods are more robust than those obtained by simply repeating the same steps. This call for scientists to write and publish code that others can easily use to verify is a call from auditors, not researchers. I can understand serving the wants of the gatekeepers, but I’m not sure we’d get better science or scientists. We might get better programmers (for end users).
BTW, I could only get the summary of the Science editorial you cited. That was about freeing data, not code. Is the rest of the article specifically asking for code?
barry says:
April 19, 2012 at 4:03 pm
Cite?
And what do you mean by “base code”? Either that is the code that was used to do the Hockeystick calculations, or it isn’t. As far as I know, it isn’t. Calling it the “base code” is just a verbal trick; that statement means nothing.
Most of the article is about data. The part I quoted was about how Science magazine is changing its editorial policies to require code.
w.