AGU Letter: Journals Should Ask Authors to Publish Results

Dr. Leif Svalgaard directs my attention via email to this letter in Eos, Vol. 93, No. 29, 17 July 2012, which calls for more “open science” in the role of peer review, always a good thing.

I’m repeating parts of it here; a link to the full letter follows.

By Duncan Carr Agnew

Institute of Geophysics and Planetary Physics, University of California, San Diego, La Jolla

The title of this Forum is meant to sound paradoxical: Isn’t the publication of results what AGU journals are for? I argue that in some ways they aren’t, and suggest how to fix this.

Explaining this apparent paradox requires that we look at the structure of a published
paper and of the research project that produced it. Any project involves many
steps; for those using data to examine some problem the first step (step A) is for
researchers to collect the relevant raw data. Next (step B), they analyze these
data to learn about some phenomenon of interest; this analysis is very often
purely computational. Then (step C), the researchers (we can now call them “the
authors”) arrange the results of this analysis in a way that shows the reader the evidence
for the conclusions of the paper. Sometimes these results appear as a table, but more
often they are shown pictorially, as, for example, a plot of a time series, a map, a
correlation plot, or a cross-section. Finally (step D), the authors state the conclusions to
be drawn from the results presented.

I would claim that almost always what is published (that is, made available to others)
is just the conclusions (step D).

The actual results produced in step B from the data
collected in step A are usually not available: All that a paper usually contains is pictures
made from these results (step C), and only rarely can the actual results (the numbers)
be determined from these pictures.

The whole point of papers is to allow scientists to benefit from each others’ research; we could do this much more easily if AGU journals strongly encouraged authors to provide, as text files with some annotation, the information on which they based their conclusions.

Just as with the current option of providing data files in an electronic supplement, these text files would be kept by AGU. Then, a future author could not only cite the paper but also use its results—and researchers could build on each others’ work in ways that now can be very difficult.

One objection to this proposal is the cost of having AGU store all this information—a concern reflected in the existing AGU data policy, which states that AGU will
not archive data sets. But that policy was written in 1996, and the cost of storage is
5 × 10⁻⁴ of what it was then. For many papers it is quite likely that the size of the files
for the results would occupy much less storage space than what is now used for the
PDF files for the figures.

I suggest two changes in AGU’s data policy to help make results available:

1. It should be considered standard practice (though not an absolute requirement) for authors to include files of numerical results, which would go in an electronic supplement.

For some papers the results might be too large to store, but that should not be a reason to omit them in others. In accordance with present policy for materials
in a supplement these numerical files, or thorough descriptions of them, should be
made available to the reviewers.

2. Any numerical information put in the supplement should be provided in forms that
can easily be accessed by future readers. It is common to convert tabular information in
supplements to PDF files. This format is not an absolute barrier to using the numbers
given, but it requires more work than a text file would to convert the contents to
a useful (that is, machine-readable) form.

Also, authors should be allowed to bundle compressed text files using some common utility, such as zip, rather than having to keep them separate, which requires users (and reviewers) to download them one by one.

Much discussion of data policy has been about making raw data openly available. This is certainly desirable but would not, practically speaking, make the results of research available, since getting the authors’ results from raw data would require complete replication of the authors’ original processing. Making such replication easy, called “reproducible research,” was proposed by geophysicist Jon Claerbout in the early 1990s; it has made some, but slow, progress since then [Mesirov, 2010].
Pending the arrival of this utopia, the more modest steps I propose would do much to
make at least some aspects of our work easily reproduced. AGU’s (former) motto,
“unselfish cooperation in research,” would be perfectly exemplified by papers that
contained results that could be used by others.

Why not take this step?

See the full letter here
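
As a rough illustration of what the letter asks for (annotated text files of numerical results, bundled with a common utility such as zip), here is a minimal Python sketch. The file names, column names, notes, and numbers are all hypothetical and chosen only for illustration; nothing here reflects an actual AGU submission format.

```python
import csv
import tempfile
import zipfile
from pathlib import Path


def write_results(path, header_notes, columns, rows):
    """Write numerical results as annotated, tab-separated plain text."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Annotation lines start with '#' so a reader can skip them easily.
        for note in header_notes:
            f.write(f"# {note}\n")
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)


def bundle(paths, archive):
    """Bundle several text files into one compressed zip archive."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=Path(p).name)


def read_results(archive, member):
    """Read one results file from the archive, skipping '#' annotations."""
    with zipfile.ZipFile(archive) as zf:
        text = zf.read(member).decode("utf-8")
    data_lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    reader = csv.reader(data_lines, delimiter="\t")
    columns = next(reader)
    rows = [[float(v) for v in row] for row in reader]
    return columns, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file holding the numbers behind one figure.
        results = Path(tmp) / "figure2_timeseries.txt"
        write_results(
            results,
            ["Numbers plotted in a hypothetical Figure 2", "value is in made-up units"],
            ["year", "value"],
            [[2001, 1.5], [2002, 2.25]],
        )
        archive = Path(tmp) / "supplement.zip"
        bundle([results], archive)
        print(read_results(archive, "figure2_timeseries.txt"))
```

A later reader then needs only one download and a few lines of code to get the numbers back, which is the kind of reuse the letter has in mind.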

29 thoughts on “AGU Letter: Journals Should Ask Authors to Publish Results”

  1. My vote is to present relevant data necessary to support the conclusions drawn. I like to draw my own conclusions based on available data. I may draw data from other sources to focus or support any given conclusion. The author is correct that the cost of data storage is much less than it used to be, so AGU’s reason for the No-Data-Storage policy is no longer valid. What could be difficult is labeling all data such that it can be found and retrieved later. A foolproof identification system would be required before full storage could be implemented.

  2. “Dr. Leif Svalgaard directs my attention via email to this letter in Eos, Vol. 93, No. 29, 17 July 2012, which calls for more “open science” in the role of peer review, always a good thing.”…….
    =======
    Impressive, I guess, considering my cognitive ability.
    Which is always improved by visiting WUWT, and will continue.

  3. Exactly one of the reasons why I wouldn’t even contemplate joining the AGU !
    As a geologist (first) and an engineer (second) and a ‘failed’ physicist (third) – I take geological processes and their interpretation as my baseline – interestingly enough, geology is one of the few subjects where the ‘science’ and its interpretation and understanding (by others) is paramount to the very core (excuse pun) of the subject. Sure, we have arguments over major theory challenges – but the basic data (i.e. the ‘rocks’ to you non-geological types) is ‘Out there’ and is usually readily visible….it is irrefutable in its basic form…if some guy studies foraminifera of the cretaceous period and announces that it shows ‘something’ – it’s easily repeatable and the data is essentially ‘available’ for any tom, dick or harry to check….all science should be this way IMHO – but then again, when it’s all modeled this and modeled that – based on the imagination of some bloke at a computer – what do you expect? science should be based on observations and actual evidence as far as possible – not fanciful undemonstrable ‘ideas’.
    Transparency of ideas, workings and proof (or at least the experimental findings/proof) is the fine detail that must be shown in any published work……..just my twopenneth….

  4. “1. It should be considered standard practice (though not an absolute requirement) for authors to include files of numerical results, which would go in an electronic supplement.”

    Why the qualification? The major reason for publishing is not to stroke one’s own ego, but to permit, nay, encourage replication. No data & methodology, no replication. No replication, no science. Color me unimpressed. If it’s not an absolute requirement, we’ll get what we’ve always gotten: “I’ve discovered evidence of impending global doom! My output numbers are: 42, 1.21 x 10³, 0.0054. The results are robust. Dismantle modern society. Blow up unbelievers. Send me and the UN oodles of dosh. My methods, code, and raw data? My error bars? Don’t be silly, I’m a climate scientist. Yer has ta troost me, Guv’nah.”

  5. The archive site also needs a downloadable PDF reader and an UNZIP utility so that future readers can actually access the data.

  6. For those papers whose supporting data are too large for storage by AGU, the authors should provide access to the data on their own servers. This would limit the cost to AGU while achieving the desired transparency. Authors who fail to do this would have their papers immediately rejected until a signed agreement is submitted.

    Bill

  7. If the data supports the conclusions in a paper and if the Researcher’s principal concern was scientific, the data and the algorithms used to manipulate the data would be made available. The only logical motivation to hide the data is that the data does not support the conclusions in the paper.

    The problem, from the extreme AGW supporters’ viewpoint, is that the science does not support the extreme AGW paradigm. The planet’s response to a change in forcing is to resist the forcing (negative feedback). Clouds in the tropics increase or decrease to resist the forcing change (negative feedback). The IPCC’s general circulation models (GCMs) that make the extreme warming prediction assume the planet amplifies the forcing change (positive feedback). If there is no amplification, planetary warming due to a doubling of atmospheric CO2 is less than 1°C, with most of the warming occurring at high latitudes where it has and will result in the biosphere expanding. The data and analysis clearly support the assertion that the planet’s feedback response is negative. There is hence no extreme AGW warming issue. The extreme AGW war has been won.

    CO2 is not a poison. Commercial greenhouses inject CO2 to increase yield and reduce growing times. The optimum level of atmospheric CO2 from the standpoint of plants is 1000 ppm to 1500 ppm. The biosphere expands when the planet is warmer and contracts when it is colder.

    When science is removed from the formation of public policy the resultant is anarchy, absurdity. Intelligent, knowledgeable environmentalists should logically be on the side of the so called “deniers”, “skeptics”.

    http://www.uoguelph.ca/~rmckitri/research/McKitrick-hockeystick.pdf

    ” What is the ‘Hockey Stick’ Debate About?
    … At the political level the emerging debate is about whether the enormous international trust that has been placed in the IPCC was betrayed. The hockey stick story reveals that the IPCC allowed a deeply flawed study to dominate the Third Assessment Report, which suggests the possibility of bias in the Report-writing…

    …The result is in the bottom panel of Figure 6 (“Censored”). It shows what happens when Mann’s PC algorithm is applied to the NOAMER data after removing 20 bristlecone pine series. Without these hockey stick shapes to mine for, the Mann method generates a result just like that from a conventional PC algorithm, and shows the dominant pattern is not hockey stick-shaped at all. Without the bristlecone pines the overall MBH98 results would not have a hockey stick shape, instead it would have a pronounced peak in the 15th century.
    Of crucial importance here: the data for the bottom panel of Figure 6 is from a folder called CENSORED on Mann’s FTP site. He did this very experiment himself and discovered that the PCs lose their hockey stick shape when the Graybill-Idso series are removed. In so doing he discovered that the hockey stick is not a global pattern, it is driven by a flawed group of US proxies that experts do not consider valid as climate indicators. But he did not disclose this fatal weakness of his results, and it only came to light because of Stephen McIntyre’s laborious efforts.

    Another extension to our analysis concerned the claims of statistical significance in Mann’s papers. We found that meaningless red noise could yield hockey stick-like proxy PCs. This allowed us to generate a “Monte Carlo” benchmark for statistical significance. The idea is that if you fit a model using random numbers you can see how well they do at “explaining” the data. Then the “real world” data, if they are actually informative about the climate, have to outperform the random numbers. We calculated significance benchmarks for the hockey stick algorithm and showed that the hockey stick did not achieve statistical significance, at least in the pre-1450 segment where all the controversy is. In other words, MBH98 and MBH99 present results that are no more informative about the millennial climate history than random numbers. “…”

    http://www.climatechangefacts.info/ClimateChangeDocuments/LandseaResignationLetterFromIPCC.htm

    “After some prolonged deliberation, I have decided to withdraw from participating in the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC). I am withdrawing because I have come to view the part of the IPCC to which my expertise is relevant as having become politicized. In addition, when I have raised my concerns to the IPCC leadership, their response was simply to dismiss my concerns….

    Shortly after Dr. Trenberth requested that I draft the Atlantic hurricane section for the AR4’s Observations chapter, Dr. Trenberth participated in a press conference organized by scientists at Harvard on the topic “Experts to warn global warming likely to continue spurring more outbreaks of intense hurricane activity” along with other media interviews on the topic. The result of this media interaction was widespread coverage that directly connected the very busy 2004 Atlantic hurricane season as being caused by anthropogenic greenhouse gas warming occurring today. Listening to and reading transcripts of this press conference and media interviews, it is apparent that Dr. Trenberth was being accurately quoted and summarized in such statements and was not being misrepresented in the media. These media sessions have potential to result in a widespread perception that global warming has made recent hurricane activity much more severe. Moreover, the evidence is quite strong and supported by the most recent credible studies that any impact in the future from global warming upon hurricane will likely be quite small. The latest results from the Geophysical Fluid Dynamics Laboratory (Knutson and Tuleya, Journal of Climate, 2004) suggest that by around 2080, hurricanes may have winds and rainfall about 5% more intense than today. It has been proposed that even this tiny change may be an exaggeration as to what may happen by the end of the 21st Century (Michaels, Knappenberger, and Landsea, Journal of Climate, 2005, submitted).

    It is beyond me why my colleagues would utilize the media to push an unsupported agenda that recent hurricane activity has been due to global warming. Given Dr. Trenberth’s role as the IPCC’s Lead Author responsible for preparing the text on hurricanes, his public statements so far outside of current scientific understanding led me to concern that it would be very difficult for the IPCC process to proceed objectively with regards to the assessment on hurricane activity.”

    http://www.leif.org/EOS/2009GL039628-pip.pdf

    “On the determination of climate feedbacks from ERBE data
    Richard S. Lindzen and Yong-Sang Choi
    Program in Atmospheres, Oceans, and Climate
    Massachusetts Institute of Technology

    Climate feedbacks are estimated from fluctuations in the outgoing radiation budget from the latest version of Earth Radiation Budget Experiment (ERBE) nonscanner data. It appears, for the entire tropics, the observed outgoing radiation fluxes increase with the increase in sea surface temperatures (SSTs). The observed behavior of radiation fluxes implies negative feedback processes associated with relatively low climate sensitivity. This is the opposite of the behavior of 11 atmospheric models forced by the same SSTs. Therefore, the models display much higher climate sensitivity than is inferred from ERBE…

    1) The models display much higher climate sensitivity than is inferred from ERBE.

    2) The (negative) feedback in ERBE is mostly from SW while the (positive) feedback in
    the models is mostly from OLR.

    Finally, it should be noted that our analysis has only considered the tropics. Following Lindzen et al. [2001], allowing for sharing this tropical feedback with neutral higher latitudes could reduce the negative feedback factor by about a factor of two. This would lead to an equilibrium sensitivity that is 2/3 rather than 1/2 of the non-feedback value. This, of course, is still a small sensitivity.”

  8. Said before, will say it again. The only way to break the circle of confirmation bias and less than honest methodologies is to separate the collection of data from the analysis completely. The people responsible for drawing conclusions from the data should NOT be the same people collecting the data. We wouldn’t allow a single drug on the market that wasn’t tested via the “double blind” process, and why we would have a lower (shockingly lower) standard for climate research is beyond me.

    The data should be collected and published BEFORE the research is done, and with open access for all. This nonsense about archiving data is just that, nonsense. Even with the data archived, we still don’t know if it was collected properly, if contrary data was discarded, and so on.

  9. Without open access to data the conclusion cannot be falsified. Karl Popper (Logic of Scientific Discovery) says that’s non-science. Marketing and technology are validation and not science. Popper’s other influential book is The Open Society and Its Enemies. Notice the confluence of ideas? An open society requires open access to information. In another forum the argument has been made that an ad hominem implies argumentum ad verecundiam and the opponent is not worthy of the special knowledge.

  10. Kev-in-UK – would you agree that the type of formation and the location (not the the millimeter, but in regional or rock zone or transition zone terms? Any datum is useless without the context. The context for temperature data is where and when it’s taken, and what it purports to measure. Rock faces should not change over time (unless they’re in an active working mine, the equivalent of having a thermometer directly above a BBQ, 5 feet away from an air conditioner, next to a tennis court that gets ripped up every 5 years and re-paved).

    In your example, foraminifera of the cretaceous period, if the geologist making the claim were actually in a different formation and didn’t realize it then the results would be non-replicable and only by examination of the location data could the underlying difference of opinion be brought to light. For the computer based folks, it’s like the question of whether 1+1=10, or 1+1=2. Both are right, it depends on the frame of reference. If you don’t realize that you’re not sharing a frame of reference, then the other person looks like they’re making things up (ie the word “two”, when everyone knows the right answer is “one-zero”). Without the meaning/location/source of the underlying data, cross-disciplinary misunderstandings are more likely.

  11. drat. proofread the second multiple times, missed the garbled opening. would you agree that the type of formation and the location (not to the millimeter, but in regional or rock zone or transition zone terms) needs to be known and logged as part of the results?

  12. Most Geologists I know are baffled by anyone not making his raw data and his complete methodology available to everyone. I would like to think that approach and attitude stems from our foundations and founders who called themselves Natural Philosophers. I have taken to referring to myself in just this way to separate myself from the other earth scientists such as climatologists that are mucking the place up. Professional societies and journals that do not strictly enforce this kind of information exchange should simply be ignored. Many of those journals depend on subscribers to pay the bills. What happens if people stop subscribing? I have.

  13. I would suggest, perhaps erroneously, that most data used in studies could be compressed to under 1Mb. That is one HELL of a lot of text.

    DaveE.

  14. question: how many scientists publishing in AGU journals, or climatologists more generally, could meet the basic standards of what is expected of, e.g., undergrad chemistry students recording all of their “workings” faithfully in a non-expungeable lab notebook? i.e., are climatologists really practicing reproducible science, or are they giving “marketing” write-ups about what they want others to believe?? sample description of how basic science is to be approached in a chemistry course (simply chosen from a web search):

    http://www.dartmouth.edu/~chemlab/info/notebooks/how_to.html

    “The laboratory notebook is a permanent, documented, and primary record of laboratory observations. Therefore, your notebook will be a bound journal with pages that should be numbered in advance and never torn out. A notebook will be supplied to you before the first laboratory period. Write your name, the name of your TA, and your lab section on the cover of your notebook. All notebook entries must be in ink and clearly dated. No entry is ever erased or obliterated by pen or “white out”. Changes are made by drawing a single line through an entry in such a way that it can still be read and placing the new entry nearby. If it is a primary datum that is changed, a brief explanation of the change should be entered (e.g. “balance drifted” or “reading error”). No explanation is necessary if a calculation or discussion is changed; the section to be deleted is simply removed by drawing a neat “x” through it.”

    “In view of the fact that a notebook is a primary record, data are not copied into it from other sources (such as this manual or a lab partner’s notebook, in a joint experiment) without clear acknowledgment of the source. Observations are never collected on note pads, filter paper, or other temporary paper for later transfer into a notebook. If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA or instructor at any time. It is important to develop a standard approach to using a notebook routinely as the primary receptacle of observations.”

  15. Fun little irony (I’m sure plenty of others have made this kind of observation), but just as the discussion of Gergis et al (2012) started on Climate Audit I commented upon Feynman’s famous “Cargo Cult” talk and how climatologists such as Gergis (and Karoly) don’t grasp his crucial distinction between “advertising” and “science”…. Karoly still can’t grasp it:

    Climate Audit, Gergis/Karoly, and Feynman on “advertising” vs. “science”

  16. Skiphil says: @ July 14, 2012 at 6:50 pm

    Thank you for the map of how things should be done, and in most cases are. But not in mainstream climate science it seems.

  17. Mike D;
    yes, the basic definitions differ depending on context.

    “There are only 10 kinds of people; those who understand binary, and those who don’t!”

  18. Brian H says:
    July 14, 2012 at 8:50 pm
    Mike D;
    yes, the basic definitions differ depending on context.
    “There are only 10 kinds of people; those who understand binary, and those who don’t!”
    >>>>>>>>>>>

    Itz 101. You forgot the parity bit ;-)

  19. There has been an ongoing trend to do all this, but the technical solutions are either not widely known, used or not there at all. It’s a mess basically and it’s an evolving situation. And it’s not as easy as you might naively conclude.

  20. LazyTeenager says:
    July 14, 2012 at 10:53 pm
    There has been an ongoing trend to do all this, but the technical solutions are either not widely known, used or not there at all. It’s a mess basically and it’s an evolving situation. And it’s not as easy as you might naively conclude.
    >>>>>>>>>>>>>>

    So enlighten us oh young towering genius. ‘Cuz I notice that when anyone challenges you to specifics you tend to make a fool of yourself or else not respond at all. Spell it out. What are we so naive about?

  21. As I say every time this issue comes up:
    All capable parties need are the data on a plain-text webpage.

    Turning exploration into a nightmare of administrative, financial, & temporal inefficiency seems to be the objective of many. What you would get is more cosmetics, less substance, long delays, & cost overruns. More importantly, 80% less territory would be covered (Pareto Principle Corollary), causing decades-long delays in collective progress. Naive &/or deceptive.

    There are alternatives that might actually address the problem efficiently. I again suggest a careful rethink.

  22. Mike D in AB says:
    July 14, 2012 at 5:18 pm

    I think you grasped my point which was that in ‘GeolWorld’ – there is always a solid point of reference (literally!) and it is always (to all intents and purposes) re-visitable for other geologists to check and moreover to be able to directly compare one person’s set of observations with another’s – of essentially the same data (rocks)….and thus, if there is some obvious error, it can be ‘observed’ and discussed/corrected….
    I believe that the ‘climate scientists’ must KNOW they have no real ‘product’ as such – and they have no choice but to hide this behind imaginary scenarios and computer models. That’s fine and dandy – leave ‘em to it for their own enjoyment – BUT they should not be allowed to influence/define the future of mankind without just cause – and just cause cannot be shown when you have beggar all to actually show – which (I presume) is precisely why they don’t show it!

  23. Let me provide some perspective on storage costs

    Amazon S3 currently charges about $0.10/GB/month for reasonable quantities of data, or $1.20/GB/year.

    An annuity to fund “perpetual” storage would run between $15 and $35/GB, with a long term average probably around $20.

    This could be funded by requiring the authors of the paper to provide half of the annuity cost, $10, and by charging a download fee, say $2/GB.

    These cost estimates are likely too high.

    First, the cost of data center storage has dropped by a factor of 8 in the last 3 years thanks to lower hardware costs. This drop will continue.

    Second, these are the prices for active online data. More likely, the data we’re talking about will be stored for long periods, but for the most part never or very rarely accessed and essentially never rewritten. Taking advantage of this allows a significant drop in costs. There is another significant drop in cost available if the archive is allowed to impose an hour delay in retrieving less used data, allowing it to be stored off line.

    The problem is, I don’t have a good feel for the size of data sets in geophysics. Seti@Home collects a bit less than 1 TB/day when operating at Arecibo. The upcoming GOES-R system will produce about 2 TB/satellite/day. These are interesting problems to archive.

    ++PLS

  24. A few details of a non-scientific but practical nature may push this proposal forward. What, specifically, is the AGU worried about? That would be a data size (We can’t store Y terabytes of data, that’s huge!) and a dollar amount (it costs $X/gb/yr to store). And the proposers could also mention what they think is a reasonable burden for the AGU to take on and work to a consensus on that. At some point the engineers making the storage systems will build systems cheap and large enough that the objections are overcome and we’ll know when they’ve done it because we have these figures ahead of time. And if the budgets are so tight at the AGU perhaps we should promote a kickstarter so that the money is covered.

    I suspect that in the real world, the storage systems to handle this cheaply enough are already available in commerce. I suspect that the AGU is shying away from this data storage mission because they don’t want people to be able to go back and knock out too many of the studies that passed peer review. I suspect that it’s mostly not a conscious conspiracy but rather a not entirely thought through unease that leads them not to even run the numbers. So let’s insist that the numbers be run and the technologists take a crack at sorting out the practicalities. If it turns out that the job *is* actually insanely expensive and prohibitively large in 2012, we can look at present trends and start preparing for the time when it ceases to be unreasonable.

  25. Kev-in-UK: Well, close, but not quite. One low-cost (so no instrumentation or test apparatus readily available) grassroots job I worked as compassman early in my career, the geologist warned me up front that he was accustomed to igneous terrains and that he saw everything in that lens. He recognized this, and wanted me to question anything he said because the area we were in was supposed to be sedimentary and the rock face we were looking at could have been pyroclastic in origin, or sedimentary outwash. I think that those were the possibilities, it was a couple of decades ago. The presence of something different in the strata column (namely, one strata column “eating” another one, with boulders of the older column made up of all size ranges) was what finally convinced the whole party that we were indeed in an igneous setting. Since the presence and type of former life is one of the classifying agents for age, a change in fossils might indicate a change in the life at that time, or a miscategorization. That’s why I was harping on the location (raw data) so much.

  26. Paul Vaughan says: “As I say every time this issue comes up: All capable parties need are the data on a plain-text webpage. Turning exploration into a nightmare of administrative, financial, & temporal inefficiency seems to be the objective of many. What you would get is more cosmetics, less substance, long delays, & cost overruns.”

    What we’ve got now is $79 billion worth of pseudoscience designed to put us all in a global-scale concentration camp.
