Dr. Leif Svalgaard directs my attention via email to this letter in Eos, Vol. 93, No. 29, 17 July 2012, which calls for more “open science” in the peer review process, always a good thing.
I’m repeating parts of it here; a link to the full letter follows.
By Duncan Carr Agnew
Institute of Geophysics and Planetary Physics, University of California, San Diego, La Jolla
The title of this Forum is meant to sound paradoxical: Isn’t the publication of results what AGU journals are for? I argue that in some ways they aren’t, and suggest how to fix this.
Explaining this apparent paradox requires that we look at the structure of a published paper and of the research project that produced it. Any project involves many steps; for those using data to examine some problem the first step (step A) is for researchers to collect the relevant raw data. Next (step B), they analyze these data to learn about some phenomenon of interest; this analysis is very often purely computational. Then (step C), the researchers (we can now call them “the authors”) arrange the results of this analysis in a way that shows the reader the evidence for the conclusions of the paper. Sometimes these results appear as a table, but more often they are shown pictorially, as, for example, a plot of a time series, a map, a correlation plot, or a cross-section. Finally (step D), the authors state the conclusions to be drawn from the results presented.
I would claim that almost always what is published (that is, made available to others) is just the conclusions (step D). The actual results produced in step B from the data collected in step A are usually not available: All that a paper usually contains is pictures made from these results (step C), and only rarely can the actual results (the numbers) be determined from these pictures.
…
The whole point of papers is to allow scientists to benefit from each other’s research; we could do this much more easily if AGU journals strongly encouraged authors to provide, as text files with some annotation, the information on which they based their conclusions.
Just as with the current option of providing data files in an electronic supplement, these text files would be kept by AGU. Then, a future author could not only cite the paper but also use its results—and researchers could build on each other’s work in ways that now can be very difficult.
One objection to this proposal is the cost of having AGU store all this information—a concern reflected in the existing AGU data policy, which states that AGU will not archive data sets. But that policy was written in 1996, and the cost of storage is 5 × 10⁻⁴ of what it was then. For many papers it is quite likely that the size of the files for the results would occupy much less storage space than what is now used for the PDF files for the figures.
…
I suggest two changes in AGU’s data policy to help make results available:
1. It should be considered standard practice (though not an absolute requirement) for authors to include files of numerical results, which would go in an electronic supplement.
…
For some papers the results might be too large to store, but that should not be a reason to omit them in others. In accordance with present policy for materials in a supplement, these numerical files, or thorough descriptions of them, should be made available to the reviewers.
2. Any numerical information put in the supplement should be provided in forms that can easily be accessed by future readers. It is common to convert tabular information in supplements to PDF files. This format is not an absolute barrier to using the numbers given, but it requires more work than a text file would to convert the contents to a useful (that is, machine-readable) form.
Also, authors should be allowed to bundle compressed text files using some common utility, such as zip, rather than having to keep them separate, which requires users (and reviewers) to download them one by one.
Much discussion of data policy has been about making raw data openly available. This is certainly desirable but would not, practically speaking, make the results of research available, since getting the authors’ results from raw data would require complete replication of the authors’ original processing. Making such replication easy, called “reproducible research,” was proposed by geophysicist Jon Claerbout in the early 1990s; it has made some, but slow, progress since then [Mesirov, 2010].
Pending the arrival of this utopia, the more modest steps I propose would do much to make at least some aspects of our work easily reproduced. AGU’s (former) motto, “unselfish cooperation in research,” would be perfectly exemplified by papers that contained results that could be used by others.
Why not take this step?
See the full letter here
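To make the letter’s second suggestion concrete, here is a minimal sketch (mine, not Agnew’s) of how a future reader could pull numbers straight out of a zipped plain-text supplement. The archive name, the member name, and the file layout (lines starting with “#” carrying the annotation, followed by whitespace-separated numeric columns) are all assumptions for illustration; the point is simply that a few lines of standard-library Python recover every number, which is more than any amount of squinting at a PDF figure can do.

    import io
    import zipfile

    def read_columns(archive_path, member_name):
        """Return rows of floats from a plain-text member of a zipped supplement,
        skipping blank lines and '#' annotation lines."""
        rows = []
        with zipfile.ZipFile(archive_path) as archive:
            with archive.open(member_name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue  # annotation lines describe the columns but carry no data
                    rows.append([float(field) for field in line.split()])
        return rows

    # Hypothetical usage; the file names are placeholders, not files from the letter.
    # data = read_columns("supplement.zip", "timeseries.txt")
    # print(len(data), "rows; first row:", data[0])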
A few details of a non-scientific but practical nature may push this proposal forward. What, specifically, is the AGU worried about? Presumably a data size (“We can’t store Y terabytes of data, that’s huge!”) and a dollar amount (it costs $X/GB/yr to store). The proposers could also state what they think is a reasonable burden for the AGU to take on, and work toward a consensus on that. At some point the engineers building storage systems will make them cheap and large enough that the objections are overcome, and we will know when they have done it because we will have these figures ahead of time. And if budgets really are so tight at the AGU, perhaps we should promote a Kickstarter so that the money is covered.
I suspect that in the real world, storage systems that can handle this cheaply enough are already commercially available. I suspect that the AGU is shying away from this data storage mission because they don’t want people to be able to go back and knock out too many of the studies that passed peer review. I suspect that it’s mostly not a conscious conspiracy but rather a not-entirely-thought-through unease that leads them not to even run the numbers. So let’s insist that the numbers be run and that the technologists take a crack at sorting out the practicalities. If it turns out that the job *is* actually insanely expensive and prohibitively large in 2012, we can look at present trends and start preparing for the time when it ceases to be unreasonable.
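By way of illustration, here is the sort of back-of-envelope calculation I mean. Every input below is a placeholder of my own invention, not an AGU statistic; whoever makes the case should swap in real publication counts, supplement sizes, and storage prices.

    # All figures are assumed placeholders, to be replaced with real values.
    PAPERS_PER_YEAR = 6000        # assumed AGU-wide publication volume
    AVG_SUPPLEMENT_GB = 0.05      # assumed ~50 MB of plain-text results per paper
    COST_PER_GB_YEAR = 0.30       # assumed bulk storage price, dollars per GB per year
    YEARS_ACCUMULATED = 20        # assumed archive horizon

    total_gb = PAPERS_PER_YEAR * AVG_SUPPLEMENT_GB * YEARS_ACCUMULATED
    annual_cost = total_gb * COST_PER_GB_YEAR

    print(f"Archive after {YEARS_ACCUMULATED} years: {total_gb:,.0f} GB")
    print(f"Storage cost at that point: ${annual_cost:,.2f} per year")

With these made-up inputs the whole archive comes to a few thousand gigabytes and a storage bill in the low thousands of dollars per year; the exercise is trivial once the real figures are on the table, which is exactly why they should be published.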
My only possible comment could be, “Duh!”
Kev-in-UK: Well, close, but not quite. On one low-cost (so no instrumentation or test apparatus readily available) grassroots job where I worked as compassman early in my career, the geologist warned me up front that he was accustomed to igneous terrains and that he saw everything through that lens. He recognized this, and wanted me to question anything he said, because the area we were in was supposed to be sedimentary and the rock face we were looking at could have been pyroclastic in origin, or sedimentary outwash. I think those were the possibilities; it was a couple of decades ago. The presence of something different in the strata column (namely, one strata column “eating” another one, with boulders of the older column in all size ranges) was what finally convinced the whole party that we were indeed in an igneous setting. Since the presence and type of former life is one of the classifying agents for age, a change in fossils might indicate a change in the life at that time, or a mis-categorization. That’s why I was harping on the location (raw data) so much.
Paul Vaughan says: “As I say every time this issue comes up: All capable parties need are the data on a plain-text webpage. Turning exploration into a nightmare of administrative, financial, & temporal inefficiency seems to be the objective of many. What you would get is more cosmetics, less substance, long delays, & cost overruns.”
What we’ve got now is $79 billion worth of pseudoscience designed to put us all in a global-scale concentration camp.