The Vast Majority of Raw Data From Old Scientific Studies May Now Be Missing

From the people who know how to save and care for things of importance comes this essay from The Smithsonian:

One of the foundations of the scientific method is the reproducibility of results. In a lab anywhere around the world, a researcher should be able to study the same subject as another scientist and reproduce the same data, or analyze the same data and notice the same patterns.

This is why the findings of a study published today in Current Biology are so concerning. When a group of researchers tried to email the authors of 516 biological studies published between 1991 and 2011 and ask for the raw data, they were dismayed to find that more than 90 percent of the oldest data (from papers written more than 20 years ago) were inaccessible. In total, even including papers published as recently as 2011, they were only able to track down the data for 23 percent.

“Everybody kind of knows that if you ask a researcher for data from old studies, they’ll hem and haw, because they don’t know where it is,” says Timothy Vines, a zoologist at the University of British Columbia, who led the effort. “But there really hadn’t ever been systematic estimates of how quickly the data held by authors actually disappears.”

To make their estimate, his group chose a type of data that’s been relatively consistent over time—anatomical measurements of plants and animals—and dug up between 25 and 40 papers for each odd year during the period that used this sort of data, to see if they could hunt down the raw numbers.

A surprising number of their inquiries were halted at the very first step: for 25 percent of the studies, active email addresses couldn’t be found, with defunct addresses listed on the paper itself and web searches not turning up any current ones. For another 38 percent of studies, their queries led to no response. Another 7 percent of the data sets were lost or inaccessible.

“Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,” Vines says. Because the basic idea of keeping data is so that it can be used by others in future research, this sort of obsolescence essentially renders the data useless.

These might seem like mundane obstacles, but scientists are just like the rest of us—they change email addresses, they get new computers with different drives, they lose their file backups—so these trends reflect serious, systemic problems in science.

===============================================================

The paper:

The Availability of Research Data Declines Rapidly with Article Age


Highlights

• We examined the availability of data from 516 studies between 2 and 22 years old
• The odds of a data set being reported as extant fell by 17% per year
• Broken e-mails and obsolete storage devices were the main obstacles to data sharing
• Policies mandating data archiving at publication are clearly needed


Summary

Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2, 3 and 4], and journal [5 and 6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8, 9, 10 and 11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested data sets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.


Results

We investigated how research data availability changes with article age. To avoid potential confounding effects of data type and different research community practices, we focused on recovering data from articles containing morphological data from plants or animals that made use of a discriminant function analysis (DFA). Our final data set consisted of 516 articles published between 1991 and 2011. We found at least one apparently working e-mail for 385 papers (74%), either in the article itself or by searching online. We received 101 data sets (19%) and were told that another 20 (4%) were still in use and could not be shared, such that a total of 121 data sets (23%) were confirmed as extant. Table 1 provides a breakdown of the data by year.
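As a quick sanity check of the counts above (my own arithmetic, not code from the paper), the reported figures line up with the stated percentages:

```python
# Counts reported in the Results paragraph above
articles = 516
with_email = 385   # papers with at least one apparently working e-mail
shared = 101       # data sets actually received
in_use = 20        # extant but could not be shared

extant = shared + in_use   # 121 data sets confirmed extant

print(f"e-mail found: {100 * with_email / articles:.1f}%")  # 74.6%, reported as 74%
print(f"data shared:  {100 * shared / articles:.1f}%")      # 19.6%, reported as 19%
print(f"data extant:  {100 * extant / articles:.1f}%")      # 23.4%, reported as 23%
```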

We used logistic regression to formally investigate the relationships between the age of the paper and (1) the probability that at least one e-mail appeared to work (i.e., did not generate an error message), (2) the conditional probability of a response given that at least one e-mail appeared to work, (3) the conditional probability of getting a response that indicated the status of the data (data lost, data exist but unwilling to share, or data shared) given that a response was received, and, finally, (4) the conditional probability that the data were extant (either “shared” or “exists but unwilling to share”) given that an informative response was received.

There was a negative relationship between the age of the paper and the probability of finding at least one apparently working e-mail either in the paper or by searching online (odds ratio [OR] = 0.93 [0.90–0.96, 95% confidence interval (CI)], p < 0.00001). The odds ratio suggests that for every year since publication, the odds of finding at least one apparently working e-mail decreased by 7%.
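To make these odds-ratio figures concrete, here is a small illustrative sketch (my own, not from the paper; the 0.83 odds ratio is inferred from the reported 17%-per-year decline for extant data) converting a per-year odds ratio into a yearly decline and an approximate half-life of the odds:

```python
import math

def yearly_decline_pct(odds_ratio):
    """Percent by which the odds shrink each year: (1 - OR) * 100."""
    return (1 - odds_ratio) * 100

def odds_half_life_years(odds_ratio):
    """Years until the odds fall to half their starting value."""
    return math.log(0.5) / math.log(odds_ratio)

# OR = 0.93 per year for finding a working e-mail address
print(round(yearly_decline_pct(0.93)))       # 7 (percent per year)
# OR = 0.83 per year, implied by the 17%-per-year figure for extant data
print(round(odds_half_life_years(0.83), 1))  # 3.7 (years)
```

Under that inferred 0.83 ratio, the odds of a data set surviving halve roughly every four years, which matches the urgency of the paper’s conclusion.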

See more discussion and graphs here:

http://www.sciencedirect.com/science/article/pii/S0960982213014000


144 thoughts on “The Vast Majority of Raw Data From Old Scientific Studies May Now Be Missing”

  1. Can a request be put in to just lose the bad data, like temperature measurements that don’t fully embrace the Climate Change Cause?

    Doh, it’s been done.

    /sarc

  2. Our problem with older data is that it’s stored in media that are no longer supported, such as TK50 and TK70 tapes formatted in VAX VMS. It’s an expensive and time-consuming proposition to recover those files, even if you have a hard-copy data log, which we do.

    I’d suspect most of the recovery problem is there. The data aren’t lost, they’re just very poorly accessible.

  3. Most data is stored on I-drives, and when you leave your job, every 3-5 years, it goes bye-bye with your account shortly afterward.
    You generally keep your raw data for five years or so, so that you can produce it in the event of a query from a grant body or journal. Lab books go into storage or the skip, depending on the institution.
    I have the data for all my publications back to 2004. All my data from 95-99 is on disks for drives that are not made any more, but the drive is in a drawer at my old department.
    The data from 2000-2004 was purged from the I-drive at my previous Institute 12 months after I left, as it is for all former staff.
    The IT people also deliberately reformat drives so that there are no IP documents, viruses or Trojans in reused computers.

  4. I’m experiencing this myself. I have boxes with 3.5 inch diskettes and even 5 inch ‘floppies’ with stored data from +25 years of engineering work. I don’t have a drive (or software) necessary to read them.

    Perhaps there is a viable service business to be explored here??!

  5. Pat Frank says

    I’d suspect most of the recovery problem is there. The data aren’t lost, they’re just very poorly accessible
    _______________________________
    Having spent some time trying to reproduce scientific results with data available to the public, I’d hazard a guess you’re wrong and the data and source code are intentionally kept from the public.

    Does anyone know if the data that went into the study discussed by this article is publicly available?

  6. Should I be proud or embarrassed to say that I would have no problem accessing data on 3 1/2 (or 5 1/4) inch disks?

  7. P.s. and “kept from the public” includes kept from other scientists too, even scientists working on the project.

    And sorry for any typos, I’m on a tiny little nook tablet right now.

  8. Two thoughts:
    1) If all the data is truly valuable, then we need a National Scientific Digital Library (NSDL) that hosts digital copies of any published papers and supporting data along with relevant facts about the authors, etc, etc. Given the scale of supercomputers these days, the cost per paper would be minimal. The responsibility for sending the data to the NSDL should lie with the authors.
    2) If a paper loses its supporting data it should be considered obsolete or of no value. It should not be cited in subsequent studies.
    It’s unrealistic to think that individuals, or even institutions, will protect historical data.

  9. P.p.s. for those wondering how you could keep data from scientists working on the project, the answer is put the data into computer code and no where else. Once that’s done, the other scientists never see it and even a FOIA request can’t get it.

  10. On a related matter, nearly all of the IPCC documentation of any historical interest prior to the 3rd assessment is at risk of being lost. The only significant holdings of these documents are in the private print archives of participants — most of whom are now in their late 70s or 80s. Given the importance that many of us assign to this episode in the history of science, the loss of these documents would be as surprising as it would be tragic. I believe they hold the key to understanding how this singular scare corrupted the institutions of public science. So far I have failed to gain any real support for their collection and preservation.

  11. “Obsolete storage devices.”
    Ya think?!
    How many of us could lay our hands on a punch card reader, 9-track mag tape drive, DECtape, or 5.25 inch floppy? Heck, moving from IBM 370 to VAX, you are going to lose a lot. Going from Mac to Windows 3.1 or Win-95 you lost almost everything. For a brief time I worked on the Landmark Mark-1 seismic interpretation station. Its big removable media was a 12-inch optical disk cassette, WORM in eight 100 megabyte sectors. $800 per cassette.

  12. Let’s not forget 8mm and 16mm film.
    I move our family films to video tape about 10 years ago.
    Now I have to do it again, from 8mm digital video to DVD or HD. I have to find an 8mm digital cassette camera now — lost the one we had.
    Sisyphus must have been an archivist.

  13. Mac the Knife says:
    December 20, 2013 at 6:09 pm
    I’m experiencing this myself. I have boxes with 3.5 inch diskettes and even 5 inch ‘floppies’ with stored data from +25 years of engineering work. I don’t have a drive (or software) necessary to read them.

    Perhaps there is a viable service business to be explored here??!
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    There are service bureaus available that can take old back up tapes/floppies, hard drives – everything from an IBM 360 or DEC PDP 11/70 (many still running after 40 years – you can still buy them online) to tapes and floppies from Atari and Commodore 64’s. I even had some old 12 inch floppies from AES word processors (that we actually used for engineering calculations using CP/M). I have 40 years of engineering work on everything from tape to floppy to external hard drives tucked away in the basement. The data does degrade over time but much is recoverable, and I have actually copied data to different media to avoid degradation. Although now that we have a 10 year limit of liability in Canada for engineering, I recently tossed all my paper as I have now been retired for more than 10 years. The media files don’t take a lot of space but they may go soon. Data conversion from one form to another isn’t terribly difficult. In the late 70’s we converted VAX Intergraph files to work with Trash 80’s, Victor computers from England, and later the first IBM PC’s. Kind of fun. Course now I can barely start a computer … sort of.

  14. Mac the Knife says: December 20, 2013 at 6:09 pm

    I’m experiencing this myself. I have boxes with 3.5 inch diskettes and even 5 inch ‘floppies’ with stored data from +25 years of engineering work. I don’t have a drive (or software) necessary to read them.

    In the tropics we found that improperly stored (ie no air con) floppy disks became overgrown with fungus within a few years. As well as clogging up the drives, the fungus damages the surface (ie, dismantling and cleaning did not help).

  15. Wayne Delbeke says:
    There are service bureaus available that can take old back up tapes/floppies, hard drives
    ——————————-
    While what you’re saying is true, it takes funding to carry out. I’ll bet dollars to donuts the science teams will never even make a request for such funding.

    Go back and re-read the article. Notice that even including papers as recent as *2011*, only 23% of the data was available. That has nothing to do with technology.

    In modern science, reproducibility is not a goal, it is something to be avoided at all costs.

  16. Set a deadline that in order to do new, taxpayer funded research, all of your old taxpayer funded data must be saved to the “peoples” cloud. Sort of like a taxpayer amnesty program for science data. Another stick could be loss of awards, or retraction of honorarium.

  17. For climatology this is a design feature.
    Small wonder Phil Jones claimed “context”, as losing/destroying original data now appears to be a tradition.
    Surely if taxpayer dollars fund the research, the same agencies must be responsible for storing the completed research, which includes the raw data.
    Otherwise what benefit does the taxpayer accrue from funding scientific research?
    This makes the case for non funding even more coherent.

  18. I have some 1/2″ mag tapes that it is probably possible to read somewhere … then a couple of DEC-tapes, which maybe some museum could help with … my 10″ floppies would be harder, and the machine code and assembler source on them is for a machine that probably hasn’t been made for 30 years .. as for my punched cards and paper tape … although I *used* to be able to read the paper tape by eye.

    There is a real issue here. Not just for research results, but for civilization itself if information can’t be preserved!

  19. “Losing” data is a dead giveaway that there is something to hide. The “Inconvenient Truth” perhaps. What will happen is that their claims will simply be assumed as gospel truth to support theories that have not a shred of truth in them: 100% baseless lies from start to finish.

  20. One last comment…

    Even these numbers do not tell the full story. Saying that 23% of the data is available does not imply that 23% of the studies are reproducible.

    A science team can make 95% of its data available and the study still cannot be reproduced without the missing 5%.

  21. “Our problem with older data is that it’s stored in media that are no longer supported, such as TK50 and TK70 tapes formatted in VAX VMS.”

    In the “old days” we used to make a final “released” drawing of our engineering designs with ink on “vellum” (originally dried sheep stomachs, but later polyester), with several copies created and numbered. One copy went into the “vault”, a nice secure fire resistant room. I have on occasion retrieved a drawing 1 or 2 decades later from the vault for reuse. A little harder to do with a long list of numbers, but things like microfilm (metallic silver in gelatin) are also very stable for many many decades. And the only support needed is a magnifying lens (still available as version 1.0).

    At one time companies used to offer “long term” storage by renting space in old (dry) mines to store your microfilm.

    If you want your data to be stable it can be done, of course if you would like to “forget” your predictions a nice magnetic tape or hard drive will make your data disappear without much effort.

    Cheers, Kevin.

  22. Nothing said above should surprise those that have worked on projects.
    Isn’t it funny though that keeping data about important topics is difficult while if someone drunk or naked gets her/his photo on the web it is there forever (or at least a long time)?

  23. “To avoid potential confounding effects of data type and different research community practices, we focused on recovering data from articles containing morphological data from plants or animals…” Yes, they knew better than to choose the area of climatology!

    (I still have my boxes of FORTRAN II and FORTRAN IV programs and data on punched cards from the 1960s. Unfortunately I have several hundred 5 1/4 floppies that are waiting for an external drive-to-USB solution.)

  24. You can purchase an external USB 3.5 or 5.25 in. drive from Newegg for $14.99. Working in IT for a large public University, I suspect this is less a technical problem and more a “human” one. As usual, it’s the people, and not the technology causing the problem.

  25. Magnetic media degrades with time. So even if you can find a drive, you might have a hard time reading the diskette or tape.

    I have drives that can read 3.5″, 5.25″, and even soft-sectored 8″ diskettes. Unfortunately, most of the 8″ diskettes and many of the others are so degraded that I can’t read the data from them.

    The big problem with the 8″ diskettes (except for the 3M brand!) is that the oxide flakes off when the drive’s heads rub the media. If anyone has a solution to that problem, please contact me!

    For 3.5″ diskettes, even if the data seems to be unreadable, it still might read correctly with an LS-120 “SuperDisk” drive. I have an old machine with one of those drives, and it is absolutely amazingly good at recovering the data from old diskettes.

    For future data archiving, the right solution is probably M-Discs (though you’ll need an LG brand DVD burner to write ‘em).

  26. Dougmanxx;
    I suspect this is less a technical problem and more a “human” one.
    >>>>>>>>>>>>>

    Exactly. For anyone in the data management profession, there’s nothing novel or surprising in this study. The technology and the processes to protect data for the long term have been known for decades. The IT department merely needs the mandate and the funding to make it happen along with publication of the indexes and process for retrieval.

  27. … saw this ‘magnetic medium’ issue in the 70’s when I was working at the FWSH&TC (Ft Wayne State Hospital and Training Center) where I worked while going to school; we received a boatload of 1″ Ampex video mag tapes that had been stored ‘outside’ in non-temp/environment controlled atmosphere … at that time the magnetic medium was separating from the polyester tape base …

    .

  28. This is one topic where even the private sector isn’t fully immune. I’ve seen some companies attempt to bury research results from internal efforts, not out of stupidity or malice, but because the tax code makes it painful if the effort isn’t a complete write-off.

  29. I still have my first PC, with both 3.5″ and 5.25″ floppy drives.

    On rare occasions we need to use it to pull up old survey data stored on those formats.

  30. KevinK says December 20, 2013 at 7:06 pm

    In the “old days” we used to make a final “released” drawing of our engineering designs with ink on “vellum” (originally dried sheep stomachs, but later polyester), with several copies created and numbered. One copy went into the “vault” …

    I was going to say, whatever happened to “Drawing Control”? The master vellum copies were ‘microfiched’ (at TI) for subsequent human access in ‘card libraries’ at the various company campuses … the 35mm (or so) microfiche film was placed into a ‘cutout’ on an IBM 80-column ‘card’ which had the drawing number encoded in the first half of the card. Doing this allowed the clerk, using a nearby IBM ‘card sorter’, to re-sort the cards after they had been pulled by engineering and production personnel during the course of the work day …

    .

  31. If you wanted to justify why you couldn’t satisfy an FOIA request it would be useful to have a study like this one. You could say, “I’d be happy to give you my tree ring data but it seems the magnetic medium was accidentally demagnetized so you’re out of luck. Please stop persecuting me about it.”

  32. …. the 35mm (or so) microfiche film was placed into a ‘cutout’ on an IBM 80-column ‘card’ which had the drawing number encoded in the first half of the card. …..
    ________________________________________________________

    These are called aperture cards. I worked on a project in 1985 to digitize all of the U.S. government (Army, USAF, Navy) Aperture cards. It was a massive contract to build the systems and deploy them (7 systems) and then a much larger task to run the cards.

    The Aperture cards for the B1 Bomber numbered over 5 million. North American Rockwell estimated that it was going to cost $238 million to digitize that data. We did it for $32 million.

    Data migration is one of the two biggest issues in the preservation world. We are working with the National Archives and the Library of Congress and there is never enough money to get everything done.

    Just our project to capture 1960’s era raw data, digitize it, and deliver it to the planetary data system from the five Lunar Orbiters is generating almost 40 terabytes of data.

  33. One of the foundations of the scientific method is the reproducibility of results.
    Yes, true. That is why experiments are not one-offs, and many are repeated many times, improving the result. The Millikan oil drop experiment, where one measures the charge of the electron, is an example of this, and of how measurements change through time. This was a lab experiment during my studies of physics back in 1960.

    In a lab anywhere around the world, a researcher should be able to study the same subject as another scientist and reproduce the same data, or analyze the same data and notice the same patterns.

    I think the last statement, “analyze the same data and notice the same patterns,” overshoots the mark for the scientific method by hundreds of years. The way the scientific community made sure of the stability of knowledge was through publications, from which another researcher could study and repeat the experiment. Nobody required the same data to be chewed over again and again. It is the computer age that made the data available for chewing over and over, which in my opinion is the wrong path to take. If there are doubts about data, experiments should be repeated, not checked like homework problems.

    The reason is that the complexity of any decent experiment is large, so the probability of errors entering during data gathering is also large, as humans are fallible. Chewing over the same data may only show up these errors, which would explain discrepancies between experiments, or not, because similar blind spots could exist in the new analysis. It would not advance knowledge, particularly if this habit of rechecking made experiments one-offs, on the thinking that rechecking the same data is like a new experiment and makes it safe.

  34. From essay:
    “These might seem like mundane obstacles, but scientists are just like the rest of us—they change email addresses, they get new computers with different drives, they lose their file backups—so these trends reflect serious, systemic problems in science.”

    Mundane obstacles? What, like tying your shoes?
    This kind of data storage is part of ‘science’, isn’t it? Some are proud of it, no?
    IDK if it’s just me, but I seem to notice an administrative point of view in recent articles, studies, news items.

    From article summary:
    “Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2, 3 and 4], and journal [5 and 6] level”

    Thanks for the interesting posts,articles and comments

  35. I smell a project for Google! The Google Scientific Archives. Stored in multiple data centres on multiple continents. Online all the time. It’s better than trying to read VMS tapes that haven’t been retentioned in a decade.

  36. anna v says:
    December 20, 2013 at 8:42 pm

    If there are doubts about data, experiments should be repeated, not checked like homework problems.
    —————-
    You cannot repeat the experiments without the data.

    Example 1:
    Satellite data needs to be adjusted due to noise in one of the sensors. The adjustments are written in computer code and stored nowhere else. No one can independently repeat the results of the adjusted satellite readings, including scientists working on the project. And the data cannot be obtained via a FOIA request because computer code is not considered to be documentation.

    Example 2:
    Government climate scientists adjust temps of cities *up* rather than down (as would normally be expected for an urban heat island). They give no reason why this adjustment was done this way. You cannot independently verify their reasoning when no reason was given.

    Example 3:
    Astronomer claims to have reproduced the orbits of all the objects in the solar system, from ancient times when it was little more than gas to modern day. Without the computer code and data it is impossible to know what assumptions were made to reach such a conclusion.

    These are all real-life examples.

  37. One medium that is particularly stable and needs no supporting technology is ink on acid-free paper. We used to have whole buildings just full of bundles of data in this medium. I forget what those places were called. “Librioteks” or something like that.

    3.5″ floppy drives are still available, but if you try to access a 20 year old floppy you may well be disappointed. Bit rot. The recording medium is not stable over that time scale.

    Even without the tropical fungus someone else referred to, I have backups of software from 30 years ago kept in clean dry conditions, and I’d estimate less than half are fully readable. Even commercially produced originals, and that predates the collapse in the quality of floppy disks that killed the medium.

    17% per year! That’s a half-life of just 4 years. That’s serious.

    Data storage requires maintenance. Libraries were traditionally created to perform this function for paper records. It seems that, like much in our disposable age, data is now a throwaway commodity too.

    But modern science is disposable too. Study results are made to order for political or commercial needs. The naive idea of objective, investigative science is long dead. Scientific “reports” are bought to serve a short term objective and are then no longer needed.

    Welcome to Kleenex science. One wipe and flush.

  39. This is a real problem with all data, it requires a lot of work to keep porting your older data to new media.

  40. magicjava says:
    December 20, 2013 at 8:56 pm

    You cannot repeat the experiments without the data.

    We have a different definition of experiment. I am a physicist, and experiment means that one sets up a new experimental setup and gets new data.

    What you call experiment I call analysis. Historically, scientific knowledge advanced by experiments, and even multiple observations for astronomy, not by reanalyzing the same data.

    I agree that it would be good, since the facilities now exist, to keep the data from one-off observations, but it will be a new way of doing science and should only be used for checking discrepancies by reanalysis, not as if it is a new experiment/observation. Certainly they should not be written in stone for the generations. If the next generation wants to examine something it should redo the experiments, not regurgitate old measurements.

  41. RoHa says:
    December 20, 2013 at 8:58 pm

    One medium that is particularly stable and needs no supporting technology is ink on acid-free paper. We used to have whole buildings just full of bundles of data in this medium. I forget what those places were called. “Librioteks” or something like that.

    The problem is information density. Paper is a really useful adjunct for relatively limited amounts of data. But, paper is really bulky for the amount of information you can store on it. I’ve dealt with projects where my staff really whined when I insisted on hard copies of everything. At the very worst, the data may need to be re-entered, but that is cheap compared to losing it completely.

  42. Now where did I leave that Bernoulli Cartridge with my copy of Volkswriter? I know my Overunity Design plans were stored somewhere. Perhaps it is still in my Bernoulli Box. Come to think of it, my unified theory is probably on the same disk. DRATS

  43. anna v says:
    December 20, 2013 at 9:50 pm

    We have a different definition of experiment. I am a physicist, and experiment means that one sets up a new experimental setup and gets new data.
    ——————————-
    In the case of satellites, how would one set up a new experiment to verify satellite readings? With another satellite? Few can do that and even when it’s done *that* data and code cannot be fully verified either.

    In the case of urban heat islands, how does one set up an experiment to verify or refute a claim that was never made? It can be easily demonstrated that urban heat islands raise temperatures. That does not mean some other factor rightfully required the final temperature to be adjusted up even higher.

    In the case of computer models of the life of the solar system, how does one set up an experiment that can only be performed on a computer? And if your computer model, or 100 computer models, get different results than someone else’s, what does that demonstrate? That all your models are wrong? You cannot refute the core assumptions of a model without knowing what they are.

    Experiments mean nothing if they cannot be verified.

  44. You silly people thought this was about Science? Pish-tush! It was all about Publication! Academic publish or perish. Once you’ve published, you’re done. Science had nothing to do with it. Archiving? That’s something they do on another planet.

    Fifty years ago, I worked in aerospace R&D. When a co-worker was laid off, we heard there was money left in her project for archiving, but no way to access the funds. She’d worked near me for several years, doing studies on [can't tell you]. On her last day, she put all her notebooks in her lab bench drawer. I’m fairly certain (since others were also laid off) that her bench was eventually taken to field storage and left there, notebooks and all, in the humid Santa Monica air.

  45. Why do Atheists put up Christmas lights?

    Why do they leave them up permanently?

    Today’s progressive epiphany: I should have used Cliffs Notes for the classics. They’d worked wonderfully for both years of calculus, physics, chemistry, mechanics of materials, et cetera.

    Why did I waste so much time?

  46. They couldn’t find email addresses for biology research papers older than 20 years?!!! OMG, I’m going to have to report that to some kind of scientific probity society!!!

  47. Not a problem. Just think of the studies that must be redone. Grants, grants, grants.

    Where did I leave those MathCAD floppies…

  48. This problem is really urgent in historical research, which depends on our archives. These contain huge amounts of written sources, but almost all information from the past decades is digital. Some archives try to keep old computers with old drives and programs because they cannot copy gigabytes of information with each innovation. The whole thing is hopeless. It means that historians will call the period behind us the Dark Decades.

  49. Claivus says: Stone is the only answer. Poor data density but great shelf life.

    You’re wrong. I tried backing up my software as a binary record marked on granite once. My shelves didn’t last 5 minutes.

  50. anna v says:
    December 20, 2013 at 9:50 pm

    We have a different definition of experiment. I am a physicist, and experiment means that one sets up a new experimental setup and gets new data.

    ===

    Then your comments are irrelevant to this discussion, which I think is what others have been trying to say.

  51. From the poor old taxpayer’s angle, the failure to ensure comprehensive storage of all the data behind the immense number of science articles published over the last couple of decades is proving to be just another example of how the public’s trust, and the immense, almost no-strings-attached financial largesse showered on science of every type in ever-increasing quantities, is becoming nothing more than another deep, integrity- and accountability-free rat hole of steadily decreasing value to society.

    It is becoming apparent that a very large percentage of scientists are now grabbing every dollar of the public’s money they can, while blithely and arrogantly assuming that they do not have to meet any standards of integrity, accountability or responsibility to society in return.

    It can only end in tears for much of science unless it gets its house in order, as there is an increasing sentiment that maybe our society doesn’t need the number of scientists we currently and quite lavishly support.
    The increasing public perception of science, driven primarily by the bad image climate science is developing, is that most so-called scientists are in it today for the money and prestige rather than a deep passion for science.

    To quote one of my close relatives, who got a degree at a well known university here in Australia: “In science we pay ninety-nine dickwits to get the hundredth guy or gal who can really make a difference.”
    Maybe we as a society only need to pay nine dickwits to still get that tenth guy or gal who can really make a difference.

  52. We are told that even science that does not appear to have any useful application at all is valuable because it Adds to the Sum of Human Knowledge.
    Yeah, right! It Adds to the Sum of Human Knowledge only as long as the data format is still around and the discs are not lost, thrown into the rubbish bin or gone moldy, a period that now seems to be down to only half a decade or so.

    So in short, society, through no fault of anybody except the scientists involved, has completely done its dough when it backed that bit of supposed research and those scientists.
    The only real beneficiaries are the scientists involved.
    The rest of us have done our dough big time.

    An excellent reason not to back those scientists or that research again until society and the taxpayer can be categorically assured that ALL the relevant data tied to that research will be around permanently, in a format that posterity into the far future can still view, sort through, check and verify.

    Society only has limited resources to spread around amongst its various important sectors, and if full accountability is not assumed by the recipients of society’s largesse, they should not be surprised if they find themselves out looking for a job as a street sweeper.

  53. Gerry says: December 20, 2013 at 6:10 pm
    Should I be proud or embarrassed to say that I would have no problem accessing data on 3 1/2 (or 5 1/4) inch drives?

    Ditto. My 60 Mb tapes are toast, though, because the rubber pinch wheel on the drive turned to bubble gum.

    I have my high school PSAT scores on a Hollerith card, but I found out there are a bunch of different coding schemes for punch cards.
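    The zone-plus-digit punches for letters and digits were stable across card codes; it is the symbol punches where schemes such as the 026 and 029 keypunch codes diverge. A rough sketch of decoding one card column under that assumption (everything beyond digits and letters is deliberately left undecoded):

```python
# A rough sketch of Hollerith column decoding, covering only digits and
# letters (the zone-plus-digit punches, which were stable across card
# codes); symbol punches differed between e.g. the 026 and 029 schemes,
# so they are left undecoded here.
ZONES = {12: "ABCDEFGHI", 11: "JKLMNOPQR", 0: "STUVWXYZ"}

def decode_column(punches):
    """Decode one card column, given the set of punched row numbers."""
    punches = set(punches)
    if not punches:
        return " "                      # blank column
    if len(punches) == 1:
        (row,) = punches
        if 0 <= row <= 9:
            return str(row)             # a lone punch in rows 0-9 is a digit
    for zone, letters in ZONES.items():
        if zone in punches:
            rest = punches - {zone}
            if len(rest) == 1:
                (d,) = rest
                start = 2 if zone == 0 else 1   # zone 0 pairs with digits 2-9
                if start <= d <= 9:
                    return letters[d - start]
    return "?"                          # symbol: scheme-dependent, not decoded
```

    The same deck therefore decodes cleanly for letters and digits, while the symbols come out differently depending on which code table you assume, which is presumably the trap with those PSAT cards.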

  54. This is a problem which has been well known among librarians and archivists for decades. There is major concern that all the digital data from the 1970’s through the early 2000’s will soon be lost to history. As mentioned above, there are multiple lines of attack that destroy the data: physical loss (I forgot where I put it, or it got thrown out while I was on vacation); degradation of the media itself, such as the gradual loss of magnetic domains on the tape; loss of the mechanical devices necessary to read the media; and loss of the supporting software and data formats needed to make sense of the raw bits even if you can read them off the media.

    The Library of Congress has storage rooms full of old media-recovery equipment, from Betamax tape readers to vinyl record turntables, so they can read the data storage resources they acquire.

    In the late 1990’s I worked in a tape library for a large data processing company here in Colorado. They had 500,000 3480 tape cartridges and racks and racks of both large and small reel tapes. Many customers had archival tape reels stored there (“keep forever” tapes) which, if you looked closely, were probably useless: they had tape cinches deep inside the reel, where shrinkage of the tape had caused segments to be folded over and permanently creased.

    We ran into data recovery problems with these tapes weekly. We had two brands of reel-to-reel tape drives, IBM and Storage Tek. Sometimes the Storage Tek drives would refuse to even load the reels, but often the IBM drives would load and read the tapes. This was due not only to differences in the operational methods each drive used to read bad spots on the tape, but also to simple mechanical issues like slight differences in tape head alignment. A tape that was unreadable on one drive could “usually” be read if moved to a different, identical drive, provided you could get it to load.

    I also have boxes of 3.5 inch disks sitting under my desk as I write this post. A year or so ago I went through those boxes (several hundred disks), loaded them one by one into my desktop computer and wrote the files out to a hard drive to refresh the data, then wrote the data to CDs. I am currently planning to pick up an M-Disc compatible drive, because I have several terabytes of photos that I need to back up on a permanent medium. I also have hundreds of silver halide slides and film negatives that I have slowly been digitizing. Funny that film is a far better archival storage medium than modern digital systems.
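    That kind of periodic “refresh” copy is only as trustworthy as its verification. A minimal sketch of the idea (the file paths and hash-manifest approach here are my own assumptions, not the commenter’s actual workflow):

```python
# A minimal sketch of a "refresh" copy with verification. The point is
# that a media refresh is only trustworthy if you hash both sides after
# copying and keep the digest with the archived copy.
import hashlib
import shutil

def sha256_of(path, bufsize=1 << 16):
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def refresh_copy(src, dst):
    """Copy src to dst and confirm the copy is bit-identical."""
    shutil.copyfile(src, dst)
    src_digest, dst_digest = sha256_of(src), sha256_of(dst)
    if src_digest != dst_digest:
        raise IOError(f"verification failed for {dst}")
    return dst_digest  # worth storing alongside the archived copy
```

    Keeping the digest means the next refresh, years later, can prove the intermediate copy never silently rotted.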

    The fact is, almost no one takes data preservation seriously. We cry and moan about all the film lost to history as the early celluloid films crumbled to dust in the vaults of the movie studios, but the exact same thing is happening to both personal and commercial digital data as we debate this problem. Right now it looks like M-Disc is the only truly archival means of storing digital data, other than optically on silver halide microfilm.

    We may find that 50 to 100 years from now paper books will suddenly become enormously valuable, as they will be the only surviving records from our era. I have cheap paperback books on my shelves that I bought 40 years ago that are as good as the day I bought them, except for a little yellowing at the edges of the pages, whereas cassette tapes and early floppy disks only half that age are essentially all gone.

    The best option right now for important data is to get it into archives like the Wayback Machine, whose scanning projects attempt to archive all manner of information, including printed media.

    http://en.wikipedia.org/wiki/Internet_Archive

  55. jorgekafkazar
    “You silly people thought this was about Science? Pish-tush! It was all about Publication! Academic publish or perish. Once you’ve published, you’re done. Science had nothing to do with it. Archiving? That’s something they do on another planet.”

    The above comment is nasty, but pretty well nails it. With the imperative to publish, quantity will inevitably trump quality. It is unreasonable to expect the authors of low-quality papers to leave data (evidence) lying around indefinitely as it increases the likelihood of their eventual exposure.

    Genuinely high-quality scientific papers rarely ‘die’ because they are too widely copied. (Nobody mention Nikola Tesla.) If the academic emphasis ever shifts to quality it will be accompanied by a corresponding decrease in the quantity of papers published — and confirmed sightings of flying pigs.

  56. Duster said ‘At the very worst, the data may need to be re-entered, but that is cheap compared to losing it completely’.

    It reminds me of research I did many years ago. I needed someone else’s database, and yes, all the data was still available on punched tape. The lab owned a bizarre machine that could translate the code into normal print. It was quite defective, meaning that after the job the room was filled with spaghetti, but I had the data. Next, I had to re-enter everything on punch cards. That took many weeks, and at the end I owned some boxes. If you dropped a box, you had an information disaster. Finally, the data arrived on the university computer, and after many years I got the message that I had to copy it because they were cleaning the drives. Because the subject had become totally obsolete, I did not respond, which meant the end of a database. Perhaps science should live with the fact that we cannot keep all that data, and we should retain only the most important, from astronomical observations, for example.

  57. In 1995 I was asked to give correct readings for Sweden to Tema Vatten at Linköping University. How and why aren’t important here. Tema Vatten’s scientist answered: it’s easier to estimate the readings before 1990 than to type them into a computer…

    More records are preserved in unexpected places: at archaeological and historical institutions and university libraries. I know of one other place where almost all correct data can be found from the 1890’s on… guess I’d better keep that information to myself for the time being.

  58. Yep. Which is the reason why the publication of any manuscript should require submission of the data. Unfortunately, standards are so low that not even a comprehensible methodology is required when reviewers find their opinions validated.

  59. So, in the modern computer age, we have a data half-life, which is what?

    Interestingly, this was not a problem before computers, because all data had to be written. The past was obviously more permanent than previously thought.

    This should lead to the establishment of sound data procedures, such as making the data publicly available in computer form, etc.
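    Taking the study’s headline figure at face value (roughly 90 percent of 20-year-old data gone) and assuming, as a deliberate simplification, that loss follows simple exponential decay, the implied half-life can be worked out directly:

```python
# Back-of-the-envelope data half-life, assuming (a big assumption) that
# data loss follows simple exponential decay. The 10%-survives-20-years
# figure comes from the Current Biology study quoted at the top.
import math

def data_half_life(surviving_fraction, years):
    """Half-life implied by an exponential-decay model of data survival."""
    return years * math.log(2) / -math.log(surviving_fraction)

half_life = data_half_life(0.10, 20)   # about 6 years
```

    On that crude model, roughly half of the raw data behind a paper is gone about six years after publication.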

  60. michaelwiseguy says:
    December 20, 2013 at 7:30 pm

    In other news, here’s a great interview;
    COP19: Marc Morano, Executive Editor/Chief Correspondent, Climate Depot

    michaelwiseguy,

    Thank you very much for the wonderful link. Very good interview.

  61. Information loss is a problem which has plagued mankind for all of human history.

    “As for knowledge, it will pass away.”1 Cor 13:8

    Claivus wrote, “Stone is the only answer. Poor data density but great shelf life.”

    Or use M-Discs, which are probably the next best thing to “written in stone.” M-Discs are a new technology for inexpensive thousand-year data storage. The guys at Millenniata are heroes. I hope they get very rich.

    Data loss is a severe problem in climatology. The NSIDC falsely claims that “the satellite record [of sea-ice extent] only dates back to 1979,” which was the end of a particularly cold period, characterized by above-normal Arctic sea ice. But actually, Nimbus-5, Nimbus-6, and Seasat-1 all made sea ice measurements via passive microwave radiometry prior to 1979. Unfortunately, NASA has lost the Nimbus-6 and Seasat-1 data.

    We still have good quality Nimbus-5 ESMR (passive microwave) measurement data from December 11, 1972 through May 16, 1977. Nimbus-5’s ESMR instrument continued to operate in a degraded mode through March 1983, but the 1977-1983 data doesn’t seem to be available on-line; perhaps it has been lost, too.

    The early Nimbus satellite measurements showed that 1979 was probably near the peak for Arctic sea ice, a fact which was reflected in graphs in the IPCC’s First and Second Assessment Reports (in 1990 and 1995, respectively), but omitted in later Assessment Reports.

    The other thing that can be done to preserve data is to get it onto a web site on the Internet, and archived by services like TheWaybackMachine, WebCite, AwesomeHighlighter, and CiteBite. But even that doesn’t guarantee that the knowledge won’t pass away. AwesomeHighlighter is now gone, along with all its archived data.

    It would help if scientists in universities and research institutions didn’t use robots.txt exclusion rules to prevent their web pages from being archived, and deliberately delete and hide their data, like Jones, Mann, Briffa, etc.

    The loss of early data is obviously very bad for science, but it can be convenient for propaganda. Starting the sea ice graphs at the 1979 peak maximizes the appearance of subsequent decline, to support the CAGW narrative. The loss of so much of the earlier data makes it easier to perpetrate that deception.

  62. _Jim says:
    December 20, 2013 at 7:54 pm

    Too cool not to post directly, daveburton!

    Jim,

    Yes, very cool!

    Just thinking out loud here: the same basic technique could be used to convert cuneiform tablets or even hieroglyphics. After the image is transferred to digital form, translate it.

    Why don’t archaeologists simply carry around an app on their phone and snap the photos?

    Somebody beat me to it: https://play.google.com/store/apps/details?id=com.eyelid.Nexus.Hieroglyphs&hl=en

  63. The only data is the raw data, so if the raw data goes missing, that’s great news for the AGWers, because nobody can tell if they are lying; the evidence is gone. So stop yapping and get to work. Look for the raw data and save it while you still can.

    • Well, it would be convenient for them if no raw data still existed, but that’s not true in a world where Swedes exist… I know of an archive where all the essential original daily newspaper reports from around the world, written from the 1890’s on, have been saved one way or another (not on computers or servers, even if some have been digitized as well)… If we Swedes hadn’t had a master of administration and bureaucracy back in history (Axel Oxenstierna), we wouldn’t have so many archived papers of every kind.

      Then in another archive there are copies of raw temperature data for the Northern Hemisphere as far back as the early 1800’s, together with dissertations.

  64. Somewhat [un]related:

    Those of us who are currently alive may be the last generations whose descendants will be able to see us in old photographs. When I take my last breath, the thousands of digital photos I have stored on my strong-password-protected computer will probably never be seen again by anyone. This year I finally got around to transferring all the VHS home movies of my family, and the old 8mm home movies of my Dad’s family to digital files. Uploaded them all to youtube (unlisted so only family members can view them) and gave the URLs to every relative I thought would be interested in them. I couldn’t think of a better way to try to preserve them for posterity. I’ve commented to various people before that 100 years from now it is highly unlikely that there will be anyone alive who remembers any of us unless we manage to do something extraordinarily memorable (good or bad).

  65. Well, this is a very relevant issue.
    I surely do not have raw data from the experiments I did in the early ’90s, when we used to record on a paper polygraph with no digital capability at all.
    As others have mentioned, keeping all the raw data of an entire life spent in research is a very costly undertaking. Perhaps funding bodies should consider sponsoring this kind of application, which may turn out to be even more scientifically relevant than a sound and trendy new grant application using up-to-date technology.

  66. For those of you who are having trouble recovering data from 3.5″ floppy diskettes, I strongly recommend two things:

    1. Open the write-protect tab/window on each 3.5″ diskette before inserting it into any disk drive. (For 5.25″ diskettes, cover the write-protect notch.) This is very important. Diskette drives position their heads by “dead reckoning,” and when two different drives write to a diskette, the new data usually doesn’t line up exactly with the old, which causes hard-to-read mixtures of old and new data. Microsoft Windows writes to the disks (to update the “last accessed” date/time) whenever you read from them, so by write-protecting the diskettes before you try to read from them, you will prevent Windows from destroying the readability of your fragile data.

    2. Buy a used LS-120 “SuperDisk” drive (under US$30 on eBay), install it in (or plug the USB version into) an old Windows XP computer, and use it to rescue the data from your diskettes. Those LS-120 drives work much better than regular diskette drives when reading from old, degraded media.

    Note that most LS-120 drives (including mine) use IDE (PATA) cable connections rather than diskette cable connections. There are also some USB LS-120 drives, but I’ve never used one. There are laptop drives as well, but it could be challenging to get one of those to work unless you have the right model of (old!) laptop computer.

    You’ll need an old computer. I don’t think that Windows versions after Windows XP include LS-120 drivers (I’m sure 64-bit versions of Windows don’t), and new computers don’t have IDE interfaces unless you add an adapter.
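    The read-what-you-can, skip-what-you-can’t strategy that dedicated recovery tools such as GNU ddrescue use can be sketched in a few lines. This toy version (the 512-byte sector size and zero-fill policy are my assumptions, and real tools also retry and re-read passes) simply records which sectors failed:

```python
# A toy analogue of the ddrescue approach: copy sector by sector,
# zero-fill anything unreadable so later sectors keep their offsets,
# and report where the holes are.
SECTOR = 512

def rescue_image(src_path, dst_path):
    """Copy src to dst sector by sector, zero-filling unreadable sectors."""
    bad = []                                # byte offsets that failed to read
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        offset = 0
        while True:
            src.seek(offset)
            try:
                chunk = src.read(SECTOR)
            except OSError:
                bad.append(offset)           # unreadable sector: keep going
                dst.write(b"\x00" * SECTOR)  # zero-fill so offsets line up
                offset += SECTOR
                continue
            if not chunk:
                break                        # end of medium
            dst.write(chunk)
            offset += len(chunk)
    return bad
```

    Against a real diskette you would point this at the raw device, with the write-protect tab open as item 66 above recommends, and then work on the image file rather than the fragile original.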

  67. A small part of high defence procurement costs is due to data archiving.

    My old company had procedures where, on a regular basis, all archived data was copied from surface to surface. Irrespective of the type of surface medium, tape or disk, the data would be copied.

    This ensured that the design and support data could genuinely survive the contracted period of 20 years.

    The customer could then procure a new contract or have the data destroyed.

    In the engineering world, to completely lose a whole dataset would be professional suicide.

    It seems in science, it’s a badge of respect amongst some!

  68. I think much “scientific” data is bad data in the sense of not completely supporting the “conclusions” for one reason or another. But, given “publish or perish”, the conclusions are published — and the “data” not.

    Given the modern “cloud”, virtually “all scientific data” could be uploaded there and probably kept for “all time” — if fellow scientists really wanted their data available to the “public”. In a more honorable world, someone might seriously propose that solution on a National or World Wide basis — and others agree to it.

  69. Steve Richards says December 21, 2013 at 4:15 am

    Part of high defence procurement costs is in a small part due to data archive.

    My old company had procedures where, on a regular basis, all archived data was copied from surface to surface. Irrespective of the type of surface medium, tape or disk, the data would be copied.

    One of the big advantages of working in the forward-looking and organized environment provided for on-going semiconductor production and research as well as (defense) ‘projects’ at a company such as TI was access to such resources as the IBM “Tape Librarian” facilities maintained by the CIC (Corporate Information Center) folks …

    Nowadays, robotically implemented mechanisms ‘fetch and replace’ tape volumes in cartridge form when named datasets are requested for read or write … ‘updating’ or refreshing of the data is done on a regular timed basis to new tapes in a multi-tape (older to newer) ‘set’ to allow access to some number “n” back in the series of tapes in the event of any issue which might arise.

    Unaccessed, ‘dormant’ (for some period of time) datasets normally residing on DASD (direct access storage devices: 3350, 3380, etc. ‘hard disks’) were also backed out to the ‘tape library’, freeing up that valuable and limited resource as well. I can recall several ‘jobs’ (on the IBM 370 mainframe) in the 80’s that required extra time to complete (or start, actually!) because an infrequently used data file had been taken off the HDs and put into the long-term (and cheaper) ‘tape’ storage system.

    http://en.wikipedia.org/wiki/Tape_library

    Quickie showing a modern tape library being installed, plus a sample of its operation:

    automated tape library (IBM TS3500) … capable of storing ~27.4 petabytes (PB) of uncompressed data. The library is composed of 16 frames plus 2 service bay cabinets and can contain up to 18,257 tapes moved by two robots

  70. I binned my stuff when I retired: there was neither room nor point in taking it home, and nobody at work would have been interested in my punched cards, mag tapes, floppies, zip drives and so on. Or even paper records. It also saved me from having to distinguish data that were mine to use freely and data I had been given under confidentiality agreements, or had assigned to grant-givers. I suppose my collaborators still have raw data from the last decade or so.

    Mind you, I did once have a Philistine Head of Department who had our librarian destroy all reports older than ten years on the grounds that nobody could possibly be interested any more. Inevitably this was discovered by somebody asking for an old report of his own and being told by the librarian what had happened. So there is not necessarily any advantage in handing stuff over on retirement anyway.

    The key lessons are (i) Don’t appoint arseholes as Heads of Department, and (ii) If data aren’t archived promptly, loss is very likely.

    So who’s to pay for the cost of the original archiving (probably minor) and for maintaining the archive (probably major)?

  71. I use a “One Year Rule” to control bin access when tidying up; surely the same kind of arrangement could be used here. If your data have sat around for, say, 5 years and nobody has shown a lick of interest, you can be fairly confident that the world has forgotten your efforts. If you get a request from someone talking about data preservation, send them 2 GB of random Wiki on a cheap memory stick. If they complain, try to retrieve the real thing.

  72. I have spent literally days and days scanning old photographic negatives (and prints, where I couldn’t find the negatives) and have catalogued them by year and season (current is 2013 Winter). The earliest I have are my grandparents’ photos from 1902 (season unknown!). I have digitised VHS videos of my children’s parties, school plays etc. To lose it all would be heartbreaking, so it is automatically backed up on two network HDDs in the house and manually to another one in Spain. I have tried cloud backup, but with 1.2 TB of data it takes ages to upload and almost as long to download, even with fibre broadband.
    Like many people have said in previous postings, I think some climate data was deliberately “lost” to prevent uncomfortable questions being asked. Again, as others have said above, if the raw data is no longer available, that should make the study invalid. I appreciate that storage methods and formats have changed, but the bottom line is that all computer data is binary, sequences of zeroes and ones, so moving from one medium to another is only a problem if the person doing the moving does not think it important enough to move.

  73. Data Retrieval

     Something that so far hasn’t been addressed needs mentioning here.

     Back in the late 80’s I worked on complex non-planar surfaces, and one of the things we had to do was generate surface normals of a predetermined length, and the approach vectors to them. This was done on a VAX/VMS system.

     Some years later, when we migrated to desktop systems, it was discovered that the end points of the surface normals and the approach vectors changed when processed on the new systems.

     Much handwringing and tooth gnashing took place until it was finally determined that the VAX was using a least-squares fit, while the desktops were using a much newer cubic-spline technique to arrive at what were supposed to be the same points in space.

     Simply having the data may be only part of the solution, especially concerning complex calculations. Processor math has changed over time as well.
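     The anecdote above generalizes: even with identical input data, floating-point results depend on the order of operations, so archived raw data alone does not always pin down the derived numbers. A tiny illustration (nothing here is specific to the original VAX code; it is just the general hazard):

```python
# Floating-point addition is not associative: the same three numbers,
# summed in a different order, give different answers.
import math

a, b, c = 1e16, 1.0, -1e16

s1 = (a + b) + c           # the 1.0 is absorbed by the huge intermediate
s2 = (a + c) + b           # same numbers, different order
s3 = math.fsum([a, b, c])  # correctly rounded sum

# s1 == 0.0 while s2 == 1.0 == s3
```

     Two machines evaluating “the same” calculation in different orders can legitimately disagree, which is why archiving the code and its evaluation order matters as much as archiving the numbers.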

  74. Almost 40 years ago I worked on outdoor environmental exposure to asbestos. There were not very many samples taken outdoors, much less of indoor/occupational exposures. The analytical techniques were all over the map, and there were many, many disagreements about the health exposures. Asbestos was the hot-button issue of the mid-1970’s and early 1980’s. Most of the exposures we were concerned with had to do with the type of rock and how it was used (road surfaces). We established a sampling protocol with analytical techniques that were cutting edge at the time (not used by any others) and are now standard protocol. There were 3 years of sampling of different exposures. At the time I worked for a governmental agency, which, like everyone else, I did not trust to keep the information so that others could look at the data or analyze the samples.
    I kept the samples, the raw lab data, and the end results until 4 years ago, when I moved my home (which is where I kept the information and stuff). I threw out everything, since there would be no more interest in this data. BTW, under EPA law much of the asbestos data, job records, etc. have to be kept for up to 30 years or more. So I have a whole closet full of asbestos projects that will only go when I die, and then no one will care. ;-)

  75. One of the suggestions was requiring a government data archive.

    Let me give you an example of a government data archive in action.

    Harris County, Texas (Houston) had a massive problem with old land records etc.: it was all on paper and costing a fortune to house. So, being efficient, they decided to transfer it all to microfilm and then burn the originals.

    Lots of money was made on the contract, and it was expertly carried out by a friend of the Powers That Be. In due course the paper went up in flames.

    And then the discovery: for cost-control purposes, the contractor had stored over 100 years of data on cheap microfilm with an expected life of 4 years. Nothing to be done but weep; the project was a 5-year effort, and the earliest data had already decayed.

    Trust not in Princes!

  76. “they were dismayed to find that more 90 percent of the oldest data (from papers written more than 20 years ago) were inaccessible.”

    This data-storage worry is a little overdone. First, explain what meaningful finding has been “destroyed” by the fact that the data is no longer available. Is Madame Curie’s radium not radiating anymore? Are the fly mutants of Nüsslein-Volhard and Eric Wieschaus no longer identifying important genes in human disease? Is science falling apart because we can’t read our 3 1/2 inch floppies?

    Any work that is important is usually repeated in a slightly different way. For example, most people believe Mann’s old ’98 paper not because they have looked at his data, but because the study has been repeated (not exactly, but close enough) many times by now; Marcott’s paper (’13) might be the latest one.

  77. Stephen Rasey says:
    December 20, 2013 at 6:24 pm

    “Obsolete storage devices.”
    Ya think?!
    How many of us could lay our hands on a punch card reader, 9-track mag tape drive, DecTape, 5.25 inch floppy?

    Philip Peake says:
    December 20, 2013 at 7:02 pm

    I have some 1/2″ mag tapes that it is probably possible to read somewhere … then a couple of DEC-tapes,

    Yay DECtapes! I have one on my desk that I’ve been meaning to take into work for a while. I don’t have a DECtape drive, but I bet the tape is still readable. At least the oxide hasn’t fallen off the tape yet.

    I should be able to read a 5.25″ floppy, but only if my CP/M system still boots.

    http://www.obsoletemedia.org/dectape/

  78. This is done on PURPOSE for the most part.

    1) There is no science in their studies so they lose the data.
    2) The Piled Higher and Deeper ‘researchers’ are not researching anything with meaning.
    3) If they ‘lose’ the data, they can always reissue their ‘study’ for grant requests and tax money.
    4) They are incompetent and corrupt – see Mann made hockey sticks for more info.

  79. As a non scientist there are a number of things which strike me about this debate:

    1. In days of old, only a small segment of a small population generated data. It was expensive and valued, and people were motivated to preserve it (libraries). In 2013, data is cheap to generate by a hugely increased population, most of whom have the ability (education and facilities) to do so.

    2. With geometric growth, data (scientific and other) cannot all realistically be archived and kept indefinitely accessible. The costs (hardware, software, porting, file maintenance, storage etc.) will become increasingly unaffordable. We therefore have to accept that most data will have a short shelf life.

    3. The scientific community needs to respond to this by doing a number of things:

    (a) set minimum standards for data retention supporting scientific papers – minimum of 5 years??
    (b) identify the key data sets which need to be preserved for longer – 20, 50, 100 years??.
    (c) funding for this to be from the scientific community budgets – probably unpopular but otherwise the taxpayer will be dragged into funding all kinds of data retention claims (film, news, sport, photos, where would it stop??)
    (d) improve peer review process to ensure (amongst possibly other things) that only those papers which have clear data policies in place, and whose narrative properly describes changes to base data sets and assumptions are approved for publication/acceptance.

  80. Several things come to mind. First and foremost: what happened to the rest of you folks who used 8″ floppy disks like me? Next, this article is about data collected for biology, and as Rutherford(?) is quoted (paraphrasing), “Physics is science; all else is stamp collecting.” Then there is that infamous paper from a couple of years back that estimated 90% of scientific papers are proven wrong or turn out to be severely flawed within 5 to 7 years of publication. Finally, I’ve not yet found out whether that infamous paper was in the 90% or the 10%, or whatever the numbers really turn out to be.

    It is clearly a problem that important information is getting lost in our so far rapidly advancing technological society, including scientific research. It has been a problem for quite some time that the repeatability part of the scientific method has taken a back seat for the vast majority of experimental results.

    The creation of the patent office was one of the first attempts at preventing this sort of problem, at least with technology. I’m sure computer repositories can deal with the much greater volume of information now existing, at least until an EMP takes out the technology to the point where we can no longer access the information necessary to reproduce the damaged technology.

  81. Mag tape isn’t dead yet!

    “Magnetic tape to the rescue”

    http://www.economist.com/news/technology-quarterly/21590758-information-storage-60-year-old-technology-offers-solution-modern

    “WHEN physicists switch on the Large Hadron Collider (LHC), between three and six gigabytes of data spew out of it every second. That is, admittedly, an extreme example. But the flow of data from smaller sources than CERN, the European particle-research organisation outside Geneva that runs the LHC, is also growing inexorably. At the moment it is doubling every two years. These data need to be stored. The need for mass storage is reviving a technology which, only a few years ago, seemed destined for the scrapheap: magnetic tape.”

    It’s interesting to see that mag tape still provides a viable method of mass storage. But ultimately, someone in academia/research has to mandate the actual procedure of moving “cold data” to tape storage (or other) instead of simply letting the data vanish.

  82. Maybe it isn’t a bad thing for humans to have to re-discover themselves and their world. Maybe we are not genetically designed to constantly improve from generation to generation. Maybe we are a pendulum species, like many others, waxing and waning in an oscillation of re-discovery and hibernation.

  83. Records management long ago became its own IT specialty, and for my employer it is a growing part of the IT budget. There is a constant review of holdings (physical and electronic), with increasing costs associated with storage space (including physical security and environmental-control costs), maintaining legacy equipment and software (to access old storage media), and ongoing media conversion (microfilm or paper to electronic format, and moving data from old media to new). Add FOIA, congressional, or litigation-related requests, and it’s easy to see that archive management is a growth area within IT (especially within the federal government).

  84. Given the reversals of so many scientific conclusions, particularly in science involving health, the environment, and climate, I suggest the loss is intentional.

  85. Perhaps the missing raw data is not missing at all?

    Perhaps it’s being stored in the deep ocean, from where it will emerge at some future date and time?

    That’s my theory anyway.

  86. IMO, all of this could give rise to a new logical fallacy: the
    “Dog Ate My Homework Fallacy”.
    An example from a teacher’s POV:
    “I have lost the assignment, class; now all of you fail, but I will still give you all a passing grade depending on your social status.”
    Or it could be like the appeal-to-tradition fallacy (http://www.logicalfallacies.info/relevance/appeals/appeal-to-tradition/), with or without the actual data.
    A paraphrased quote from the preface of “The Golden Bough”, circa 1922:
    “…the Khazars in S. Russia, where kings were liable to be put to death either on the expiry of a set term or whenever some public calamity, such as drought, dearth, or defeat in war, seemed to indicate a failure of their natural powers.” (italics mine) Some cultures back then gave their kings everything and then took it all away.
    Nowadays, we’re supposed to elect a new king.
    How’s that for conjecture? http://en.wikipedia.org/wiki/Conjecture
    Thanks and have a good day.

  87. Case law usually drives record management. But case law is ahead of our ability to store mountains of data in a small space. Non-profits and governmental agencies are notoriously disconnected and technologically mismatched to the extent that record keeping is localized and housed in a huge variety of ways. Some of it is paper, some of it is on disks of various configurations, and some of it has been shoved to the back of some storage room, uncatalogued and slowly melting into oblivion. No one’s fault. It is simply the current state of the collective mess.

    So what does that mean? It means that we will have to rediscover through new research what was discovered and forgotten decades ago. If there is one lesson to be learned when we open the door on the raw data archive mess it is this one: do not let politicians pass any legislation based on this current state of affairs. If we do, the onus is on us, not them, and Ike will have been proven right.

  88. Another argument for Open Access is the wider distribution of data across locations, formats and media. Unfortunately this horse is long gone from the barn. Is DRM/paywall akin to a flaming barn?

  89. I apologize for the spelling/syntax errors in my comments.
    Now I must pay a sin-tax penance, and can’t comment for an undefined period of time.
    dang it ;-)

  90. Let X be an elliptic curve, of the form: $y^2 = x^3 +17$. Consider the set of integer solutions…

    150 years in the future a mathematician will know what ‘data’ I am working with!

    “Let X be a CMIP model initialized with UAH and GISS data from 1970-1990…” – we can’t replicate that next year!
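    To make the contrast concrete: the curve above fully specifies its own “data”, so anyone, in any century, can regenerate the integer solutions from the sentence alone. A minimal brute-force sketch (the search bound here is an arbitrary choice of mine, not part of the original statement):

```python
def integer_points(lo=-3, hi=100):
    """Integer points (x, y), y >= 0, on y^2 = x^3 + 17 with lo <= x < hi."""
    points = []
    for x in range(lo, hi):
        rhs = x**3 + 17
        if rhs < 0:
            continue
        y = round(rhs ** 0.5)
        if y * y == rhs:  # exact integer check, no floating-point trust needed
            points.append((x, y))  # (x, -y) also lies on the curve when y > 0
    return points
```

    Run a century from now, the same loop yields the same list – which is precisely the point: the model-plus-vintage-data “experiment” in the same comment can make no such promise.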

  91. Why does this make me think of the fire at the Library of Alexandria? In some cases the names of the Greek and other inventors/philosophers are only remembered today because some of their scrolls survived.
    If a system is devised to store the data on obsolete media or in obsolete formats, it should be redundant.
    For starters, the NSA has a huge storage facility that could be put to a better use.

  92. First, I suspect that 99+% of “data” collected for all studies, both published and unpublished, is of relatively little value and will never, ever, be accessed again, even by the original authors. Spending public money to archive all of it will therefore be a waste of our tax dollars.

    Instead, now that anyone can “publish” even if only via a personal website, authors should have the sense to strip all data of personal identities and publish the supporting data along with their results. And why can’t a journal insist that upon publication of a journal article, all supporting data be uploaded to their server and made available instantly to anyone interested? The cost of doing so would be trivial, for the data need not be archived beyond a few years. If anyone had an interest in the raw data, they could download it to their own server/computer and keep it as long as they wanted. The 99% that is simply useless junk will never be accessed and will disappear into the ether in time. The 1% that interests people will be preserved in multiple places. If it’s not, so what? That’s what’s apparently happening now anyway, and will likely continue to happen.

    Journals should make it an absolute requirement that raw data be published online at the same time any article is published. Authors of unpublished articles who publish them on their own websites instead should do the same if they expect anyone to pay attention to their studies. People who actually discover something should be more than willing to share their data if they are scientists. After all, they’re supposedly trying to make a point, a point indicated by the data itself.

    I do realize that authors will have competitive reasons for not sharing all of the data they’ve generated. However, any data relied upon in a published paper should be made available. And, as we’ve seen, those who withhold their data, at least in the climate “science” area, are not always behaving like real scientists seeking to advance the state of knowledge.

    Incidentally, the only records I can find of data I’ve collected for personal reasons that are over 20 years old are all recorded on paper, not a hard drive or a floppy. And I expect most of it to be tossed when I’m no longer around.

  93. This is nothing. What about those climate scientists who absolutely refuse to hand over the data NOW!

    One of the foundations of the scientific method is the reproducibility of results.

    I agree. But what is this?

    Dr. Phil Jones (CRU)
    I should warn you that some data we have we are not supposed to pass on to others. We can pass on the gridded data – which we do. Even if WMO agrees, I will still not pass on the data. We have 25 or so years invested in the work. Why should I make the data available to you, when your aim is to try and find something wrong with it. There is IPR to consider.

    http://climateaudit.org/2005/10/15/we-have-25-years-invested-in-this-work/

    This is the world of Climastrology in action. This is not yet a science I see.

  94. It has been alluded to in prior posts, but one of the biggest problems is changing hardware and standards. I do photography and generate about 0.5 terabytes of images a year. Having worked in the IT field for many years, I saw this problem a long time ago and have considered how to preserve and protect my images so that, like the glass negatives of photographers of old, any of my images might outlive me. The biggest hurdle was not knowing what means of recovery will exist 10, 30, 50, 100 years from now.

    Let’s assume a small-budget generator of data gathers some information. It matters little if it is some obscure author’s plays (Shakespeare) or a budding amateur photographer’s images (Ansel Adams); each generates data in some original form, which gets stored away in the storage media of the day. In both of the above examples, they used two of the best currently available archival means. Fade-proof ink on archival paper is good for 100-200 years, assuming it is kept dry and free of bugs and mold. Likewise, silver halide film negatives on glass or polyester stock last over 100 years (not so much if on celluloid base stock).

    In modern digital data you have progressive changes in the storage media itself (IBM punch cards, punched paper tape, 8″ and 10″ reel tapes, the 3480 family of tape cartridges, 4mm DAT tape, DLT tapes, LTO tapes, and probably a dozen other media I have never seen or heard of).

    Not only do the data formats on the media change, but so does the availability of the software that can read the data (not to mention forgotten passwords for protected files). In the late 1980s and early 1990s the de facto standard word processing system used in government was Corel WordPerfect on the early PCs. I have a large number of old documents I wrote as a State Planner on early PCs using WordPerfect. OpenOffice used to allow you to open those documents, but recent editions no longer do; they no longer have the drivers/internal code to open those files. I recently had to dig through a stack of old CDs to find an old copy of WordPerfect 8 so I could open one of those files. How many folks out there do you think can open any of those documents written just 30 years ago?

    Then you have the physical hardware infrastructure changes. Suppose our intrepid data generator diligently put all his/her data on a hard drive in some relatively universal document format like RTF, HTML, XLS or TXT, which almost all systems can read today, and locked that disk drive away in a safety deposit box. 30, 40, 50 years later, someone pulls out that disk drive and tries to read the data off it. Assuming the lubricant in the disk drive’s bearings has not turned to varnish, so the platter still spins: do they have a system and the needed adapter cables to allow the drive to even be plugged in and powered up? Is it an IDE, PATA, SATA, USB 2, or USB 3 interface? Can they find the drivers to read that format on modern equipment? Will their new “Garagedoor 28” computer running a 128-bit operating system still be backward compatible with now-ubiquitous document and image formats? Will they still be using simple 0/1 binary, or will we have moved on to a storage system that uses a 4-state code or some quantum scheme that no longer even uses binary data representation? Will 2010-vintage PDF files still be readable? Will anyone know what a JPG or PNG image file is?
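    One small mercy: many common formats announce themselves with fixed “magic bytes” at the start of the file, so a future archivist could at least identify a mystery blob before hunting for software that opens it. A minimal sketch (the signature table is illustrative only, covering a few of the formats mentioned above):

```python
# Identify a file's format from its leading "magic bytes".
# Signature table is deliberately tiny; real tools carry hundreds of entries.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff",      "JPEG image"),
    (b"%PDF-",             "PDF document"),
    (b"{\\rtf",            "RTF document"),
    (b"\xd0\xcf\x11\xe0",  "OLE compound file (e.g. legacy XLS/DOC)"),
]

def sniff(path):
    """Return a best-guess format name for the file, or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(16)  # all signatures above fit in the first 16 bytes
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"
```

    Of course, identifying the container is the easy half; the hard half remains finding software that still understands what is inside it.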

    The biggest hurdles right now are at the hardware/software level – simple little things like not having an obsolete adapter cable interface. (Remember the old keyboards and mice before almost everything moved to USB? You had the old PS/2 connectors, and prior to that the older DE-9 RS-232 serial mouse connectors.) Right now most everything is moving to some variant of the USB interface, but even there you have 3-4 different connector types and sizes. Who will have a junk box of old USB cables 50 years from now, even if you have a working disk drive?

    One data preservation task is to store not only the physical media but also the primary interface hardware (cables, sockets, adapters, drivers, etc.) along with the primary storage device. Otherwise your only option is to migrate the data every 5-10 years to a more “modern” storage medium, and hope that you don’t pick a “newer” media system that is suddenly obsoleted by a lawsuit or the bankruptcy of the parent company, or by some poorly informed administrator who has all your old “archival junk” tossed out so they can use the space for a new break room.

    This is a very big problem! Right now the most reliable form of preservation of rare old reports is on the personal hard drives of thousands of topic-specific web surfers. We have seen it several times right here on WUWT: someone posts that they could not find a certain old document, and another user posts an obscure link to a site that has a local copy of the original, or a personal copy they captured years ago on their own system.

    Anthony needs to consider this issue seriously for his study materials on station quality and see that some archival version of his study and raw data is placed in as many reliable repositories as possible! I suspect in 100 years all the data that supported the CAGW hysteria will be long gone and only the diligent efforts of a few skeptics might survive for historians to review.

    In fact this blog is a very important historical archive of CAGW and how it matured and decayed as the hype gave way to reality. I sincerely hope all the early blog data is well kept and preserved somewhere.

    I would happily send in some donation money to help Anthony archive his blog data!

  95. Not a new problem. When I was in high school, the electronics shop teacher made some money on the side fixing up magnetic wire recorders and using them to transcribe old recordings in the Smithsonian collection to (then) modern magnetic tape. There was a lot of material recorded during the Depression of rural American folk music that was at risk of being lost forever. The Germans developed magnetic tape for audio recording during WWII; prior to that there were several competing systems which used magnetic wire.

    I also attended a very interesting lecture by a 3M chemist on the archival properties of magnetic tape and basically it boils down to “store in a cool, dry place, and hope for the best”. After 10 years, there is no assurance the information can be read.

    Digitizing only puts off data obsolescence for a while, as you have to continually refresh/migrate it onto new media. At some point the effort involved is no longer worth it for most content.

    If you want to keep information for a very long time with very high assurance it can be read again, the best bet is archival microfilm. If processed and stored correctly it should be recoverable for 500 years, based on extensive testing by Kodak. The only technology required to read microfilm is illumination and magnification; modern microfilm would have been readable with the technology of the 18th century. Unfortunately, the digital camera revolution has gutted the market for conventional film and I think Kodak no longer makes it, but Fuji does.

  96. Sir Isaac Newton’s notes were kept. His notes on alchemy were “lost”, I believe, because later biographers didn’t want to reveal this unbecoming interest of the great man. However, in the 1930s they were found among unsorted papers at the Royal Society. They created a stir, and history panned Newton mercilessly, referring to him as a magician rather than a scientist. The linearity-of-history eggheads at Oxford and Cambridge (Newton’s university), who are still far behind the great man in knowledge and understanding, do this kind of thing: tear down the great. The guy was 17th century, for goodness sake, before chemistry had legs! Despite this, he successfully experimented with production of hydrochloric acid from salt, and he crystallized antimony oxide needles (how many of you even know there is such an element as antimony, symbol Sb?). By the way, Newton was denied the chair in mathematics at King’s College, Oxford because he had heated disagreements with King James, he of the King James’s version of the Holy Bible.

    An outside-the-box thought: maybe we will convert lead to gold one day. It will be expensive rejiggering the nucleus, but it would be worth it to put a shine back on Newton’s image and thumb a nose at the linearity eggheads. We have already converted some elements to others, after all.

    http://rsnr.royalsocietypublishing.org/content/60/1/25.full

    More on linearity: the same Oxford eggheads disenfranchised Herodotus as the father of history, buying into the probably jealous and much more boring (like the Oxford historians) Thucydides (whom I’ve also read), who called Herodotus a storyteller – ironically not realizing that he himself would likely not have been a historian if the course hadn’t been charted out by his benefactor. I’ve read Herodotus’s Histories and it was a superb read. I forgave him his wrong thinking about the ebb and flow of the Nile, which some believed (correctly) was caused by seasonal melting of snow.

    I also have two of the three volumes of Isaac Newton’s physics and mathematics lecture notes, collected and published in the late 18th century (still looking for vol. 3). Believe you me, we are in for a veritable tsunami of lost data when the climate-science house of cards finally collapses. No loss, really. But irresponsible scoundrels like these will unfortunately also do their best to destroy the raw data that has been foolishly left in their care to play with as they like. I have little doubt we have already lost long-running records that weren’t behaving according to the script after their data had been put through the mincer. Boy, there is a mess to clean up and a major starting-over awaiting us.

  97. Pippen Kool says:
    December 21, 2013 at 6:26 am
    “Any work that is important is usually repeated in a slightly different way. For example, most people believe Mann’s old ’98 paper not because they have looked at his data but because the study has been repeated – not exactly, but close enough – many times by now; Marcott’s paper (’13) might be the last one.”

    Well, the spike near the present in Marcott & Shakun’s data is not reliable, as Shakun told Revkin (http://dotearth.blogs.nytimes.com/2013/03/07/scientists-find-an-abrupt-warm-jog-after-a-very-long-cooling/#more-48664); and when we ignore the spike, we see gradual cooling over the past 8000 years.

  98. This would be a great public service that a company like Google could provide for the world. They could have a data archive for scientific research. It could be part of Google Scholar.

    REPLY: I was thinking the same thing, but with a backup on the Amazon cloud service. – Anthony

  99. What gets me is how these papers get written in the first place if they cite previous research data. SOMEONE must be archiving their data or there would be nothing to research. New research would require new data every single time, or maybe that is part of the game. Collecting data requires money.

  100. “Newton … had heated disagreements with King James, he of the King James’s version of the Holy Bible.” King James died in 1625; Newton was born in 1643.

  101. crosspatch says:
    December 21, 2013 at 11:56 am

    What gets me is how these papers get written in the first place if they cite previous research data. SOMEONE must be archiving their data or there would be nothing to research.

    That is why it is so much cheaper to just make the data up. /sarc

    Actually, I would bet that much of that old data started out as photocopies made in the library from old documents, or as lists hand-transcribed (by grad students) from old printed documents. The originals are long gone now. Then that hand-transcribed or photocopied second-generation data was re-transcribed into digital format (again by a graduate student) as third-generation data, with more new typos introduced at each layer of use.

    Then that data was manipulated, adjusted, tweaked, modified and reformatted, each time losing content or introducing unintentional errors (we will ignore any intentionally introduced errors).

    As a result the “original data” that the author of the study used was in reality 2nd, 3rd or 4th generation data from an original source that no longer exists.

    Just as historians have identified about seven different versions of the Gettysburg Address, all that data has been subject to decay of content at each step in the replication process, if not intentional destruction. Even computers drop data during copy operations. That is why we usually include data verification steps like hash values and checksums to verify that the data has not suffered errors in copying when we duplicate files on computers.
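    That hash-and-verify step can be sketched in a few lines: a copy is trusted only once its digest matches the original’s. This is a minimal illustration of the idea, not any particular archival tool:

```python
import hashlib
import shutil

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verified_copy(src, dst):
    """Copy src to dst, then confirm the replica's digest matches the original."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        raise IOError("copy of %s is corrupt" % src)
    return expected
```

    Publishing the digest alongside the data would also let a reader decades later confirm that what they downloaded is what the author deposited.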

    Even high-profile programs suffer this sort of data decay. Many people tell apocryphal stories that the U.S. could not build a new copy of the Saturn V booster because the original engineering drawings and specifications are no longer available, nor are key single-source components used in the design. The original designers and builders have all retired or died, taking with them their first-person knowledge of why and how certain things were done. Similar examples exist, such as the order to destroy all tooling and fixtures used in the production of the SR-71. If there were a need to re-manufacture one, it would be cheaper to start from scratch than to reassemble all the blueprints and hardware necessary to assemble a new clone. It is obvious that the same or worse happens to much less visible programs on a daily basis. Do you think Lockheed could produce the original air tunnel test data for the SR-71?

    We just burn our Library of Alexandria a bit more slowly than the Egyptians did.

  102. The newspapers of New Zealand have now been digitised for around 100 years up to 1945, and are searchable. This is the result of my first search, for the words Auckland temperature:

    Note the large amount of information, amazing for 1869 or even today.
    Since NIWA seem to have lost NZ’s early temperature data, I plan to recreate the records from printed newspapers. I’ll save a JPG with the relevant date for each reading.

  103. dearieme says:
    December 21, 2013 at 12:15 pm

    “Newton … had heated disagreements with King James, he of the King James’s version of the Holy Bible.” King James died in 1625; Newton was born in 1643.”

    Oops, you are correct. It was James the Second (James the VII for you Scots out there). Jimmy the First was famous for his treatise on tobacco smoke being bad for us.

    http://www.royal.gov.uk/HistoryoftheMonarchy/KingsandQueensoftheUnitedKingdom/TheStuarts/JamesII.aspx

    http://www.jesus-is-lord.com/kjcounte.htm

  104. This is the reason I am in favor of the old paper lab notebooks and 35-millimeter film for photography (and microfiche). This loss of data is not only happening in science but throughout our entire lives.

    Think about it: no paper letters between friends and families, no diaries or permanent photos from the present generation. With e-books, much literature may soon be published only in electronic form. Future historians will consider this a “Lost Era”. Heck, they don’t even want to teach kids how to write cursive or how to take handwritten notes in class!

    There will be no real permanent records for this generation. Orwell would love it. /sarc

  105. So I’m thinking back to my first publication in the cancer literature, which was between 1991 and 2011, the period of study. The raw data is written in a notebook archived somewhere, I have no idea where, at Boston University School of Medicine. If I had been e-mailed for the study, I would have replied that it is archived somewhere in a notebook, but that retrieving it would take some doing. Would I have been counted as having “missing raw data?”

    There really is no excuse for relying on magnetic media of uncertain lifespan as the record of scientific work. For my later lab work, I wrote all my notes on my laptop, but then printed out the pages, signed and dated them, and put them in a notebook. It is clumsy in the digital age to resort to paper, but it is the only thing that will be verifiable 50 years from now. Eventually it will all be scanned, interpreted, and searchable.

  106. There really is no excuse for relying on magnetic media of uncertain lifespan as the record of scientific work.

    Given the prices of storage, replication across physically separated devices would solve most of these problems. Most raw data should be publicly available in any case, seeing as how we paid for it.
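    A sketch of the replication idea: keep several copies, periodically compare their digests, and rewrite any replica that disagrees with the majority. In practice the replicas would live on physically separated machines or sites; here they are just file paths, purely for illustration:

```python
import hashlib
from collections import Counter

def digest(path):
    """SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def scrub(replicas):
    """Compare replicas of one file; rewrite any replica that disagrees
    with the majority digest. Returns the list of repaired paths."""
    counts = Counter(digest(p) for p in replicas)
    good_digest, _ = counts.most_common(1)[0]
    source = next(p for p in replicas if digest(p) == good_digest)
    repaired = []
    for p in replicas:
        if digest(p) != good_digest:
            with open(source, "rb") as s, open(p, "wb") as d:
                d.write(s.read())
            repaired.append(p)
    return repaired
```

    With three or more copies, a single bad replica is outvoted and silently healed – the same scrubbing discipline that serious storage systems run on a schedule.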

  107. Given that, even with the data, only about 1/4 of the studies are reproducible, I don’t see the great loss. So much of what we think we know ain’t so.

  108. Shouldn’t the journals be responsible for the archiving of data that supports what they have published?

    What’s the archival policy of this discussion forum? The material herein and in a few associated blogs will provide an interesting historical footnote in the cold, hungry world of 2040.

  109. As I “print” this comment I feel ???? rising up. As Larry said, “somewhere someone must be archiving”. I agree.
    Is that why the NSA is building this massive project in an environment both isolated from people and weather, and self-sustainable? And as far as where all the data is going: every time I hear of another scientific project being a “success” and “providing scientists” with enough data to keep them “BUSY FOR YEARS TO COME”, I think great, guys … but WHO is paying for that?

  110. daveburton says:

    noaaprogrammer, the data on your old Hollerith punch cards is probably recoverable by optically scanning them, but it won’t be very easy.

    With a bit of effort, I could read paper tape, but punched cards are harder.

    In the case of punched cards and paper tape the data density is very low, and it is likely to take quite a bit of damage before things are irrecoverable.
    Also, cards can be “interpreted”, where the data is also printed along the top, which means you have the same data represented in two different ways.

  111. anna v says:

    The reason is that the complexity of any decent experiment is large, and the probability of errors entering during data gathering is also large, as humans are fallible. Chewing over the same data again and again may only show up these errors, which would explain discrepancies between experiments – or not, because similar blind spots could exist in the new analysis.

    It may depend on who is doing the analysis. Someone from a different group or background may spot “obvious” errors because they don’t have the same “blind spots” in their thinking and reasoning. Thus the attitude that only “climate scientists” are qualified to hold opinions on “climate science” is potentially a big problem.

  112. Please allow me to add a proper citation for the “archive”. The quote from my earlier comment is
    *from “The Golden Bough: A Study in Magic and Religion” by Sir James George Frazer, F.R.S., F.B.A.; Hon. D.C.L., Oxford; Hon. Litt.D., Cambridge and Durham; Hon. LL.D., Glasgow; Doctor Honoris Causa of the Universities of Paris and Strasbourg
    one volume, abridged edition
    copyright 1922 by The Macmillan Company
    copyright 1950 by Barclays Bank Ltd.
    I consider my use of the quote to be that of a reviewer who wishes to quote a brief passage in connection with a review for inclusion in a magazine or newspaper.
    My connection from the posting to the quote was in light of data being lost and unelected officials that control the archives(?) – because this quote is from an old book that might get lost in the new digital age of data.
    I wish to thank the author/publisher for their work in producing the book.
    I also would like to thank WUWT for publishing/posting my comment/review.
    I apologize for any mis-use of quotes and my cryptic style of commenting/reviewing

  113. mbur says December 22, 2013 at 6:19 am

    I also would like to thank WUWT for publishing/posting my comment/review.
    I apologize for any mis-use of quotes and my cryptic style of commenting/reviewing

    re: In bold above.

    Yes, and it does require more than the normal or average amount of effort to read and mentally parse. On first glance, if you’ll pardon the blunt appraisal, it almost looks like, well, gibberish (yes, we have a few posters who post at that caliber).

    Let me be kind: the visual style and presentation need some work …

    Apologies if English is not your native language. Also, in my own personal view, I would rather you post in any form you can than not post at all; better to have your viewpoint, if substantive, than not.

    .

  114. @_Jim: Thank you for your reply and your kind words. Sometimes I have a flare-up ;-) and blurt something out. Recently I have refrained from commenting as often as I would like to, because when I read a comment again it doesn’t make as much sense as I first thought.
    I apologize to any who think that it is “gibberish”. I have been exploring ways to improve, and your reply is appreciated.
    Maybe I am just having a “Watts Up With That” moment, or others are having it.

  115. mbur says December 22, 2013 at 6:19 am

    please allow me to add a proper citation for the “archive’. The quote from my earlier comment.
    *from “The Golden Bough A Study in Magic and Religion” by Sir James George Frazer

    Incidentally, the above cited volume seems to be viewable here:

    1922 abridged edition – http://ebooks.adelaide.edu.au/f/frazer/james/golden/

    1894 ed Vol. I (of 2) – https://archive.org/stream/goldenboughstudy01fraz#page/n9/mode/2up

    1900 ed Vol. I (of 3) – https://archive.org/stream/goldenboughstudy01frazuoft#page/n11/mode/2up

    Google search for all volumes on Archive.org
    .
    .
    The referenced quote from the Preface reads:

    … of the Khazars in Southern Russia, where the kings were liable to be put to death either on the expiry of a set term or whenever some public calamity, such as drought, dearth, or defeat in war, seemed to indicate a failure of their natural powers.

    .

  116. Ernst-Georg Beck reconstructed the data from tens of thousands of CO2 measurements, analysed the experimental techniques and rated the quality of the data, based on descriptions of the methods of measurement and ambient conditions.

    Methinks there’ll be a “science gap” from about the 1970s until, I hope, real soon now. Future researchers may come to the conclusion that there was no scientific activity at all for a generation, because there are almost no surviving data or rigorous experimental or analytical documentation.

    Ask some of the “climate scientists” to replicate their own “experiments” with the catastrophic models that they used just five years ago. It has nought to do with bit-rot.

  117. Larry Ledwick says December 21, 2013 at 12:23 pm

    Do you think Lockheed could produce the original air tunnel test data for the SR-71?

    There is also the aspect, Larry, that the design and design verification (testing) would use contemporary methods involving (dare I say it?) modelling on computer equipment unavailable ‘in the day’; also, the design would make use of CAD software/hardware unavailable in the day as well. The coupling of CAD with numerically controlled metal forming/cutting/turning equipment, and the availability of composite materials, might in all likelihood result in a shorter design cycle than the original.


  118. Thank you for a complete linked citation for my selected quote. I do know that almost everything is ‘archived’. My cryptic comments alluded to this: those that control the archives are really the ones in control of the whole thing.

  119. Thanks guys for mentioning M-Disc.

    I hadn’t been aware of its capabilities but as it turns out, the optical drive that I bought about 2 years ago for my current desktop system will etch M-Disc. Looks like I’ll be ordering some M-Disc media after Christmas.

    BTW: Most of my old VHS tapes are still OK after 30 years with no special storage facilities; just a bit of common sense. The DV tapes I have from 1998 are still readable, with a few correctable errors. Old hard drives that haven’t been powered up for 5+ years, last written before the turn of the century, also come up good; fsck (read-only) has no complaints. The major problem I have with old cassette tapes (other than entropic losses of quality) is in the glue binding the leaders to the spools, mostly on pre-1980s tapes.

    And my university notes from the 1970’s are still as incomprehensible as they were the day after they were written.

    Have a Merry Christmas.

  120. Gail Combs says:
    December 21, 2013 at 3:37 pm

    This is the reason I am in favor of the old paper lab notebooks and 35 millimeter film for photography. (And microfiche.) This loss of data is not only happening in science but throughout our entire lives.

    Think about it. No paper letters between friends and families, no diaries or permanent photos from the present generation. With e-books, much literature may be published only in electronic form in the near future. Future historians will consider this a “Lost Era”. Heck, they don’t even want to teach kids how to write cursive or how to take handwritten notes in class!

    There will be no real permanent records for this generation. Orwell would love it. /sarc

    =============================================================================
    We’re required by the EPA to keep our lab records for 10 years. At present, our “official” records are paper but there is an option to go digital.
    In my locker I have records from almost 25 years ago on 5 1/4 floppies. I also have the proprietary DOS program that made them, also on the old floppies.
    I’ve tried to access a backup of the info using other programs but to no avail.
    At present we enter our paper data into a WindowsXP based proprietary program to generate our reports.
    If we told the EPA we were “officially” going digital, could we be fined for not being able to access the digital data because Microsoft is forcing people to move to Windows 8, or because the company that supplied our program goes under?
    I’m just a peon where I work, but there are also legal issues to be considered with digital-only storage.

  121. If you really want to keep those records on 5.25 inch floppies and be able to recover them if needed in the future, you need to open those files on a computer that has a 5.25 inch drive and the appropriate software, then save the documents out to a hard drive so they can be written to CD-ROM or DVD, at least in a universally accepted format. For images you want to save them as JPG, TIFF or PNG; those formats are recognized by just about all browsers and document-imaging programs. For text data, use RTF (rich text), TXT (ASCII text), HTML or ODT (the open document format used in OpenOffice).

    The early 5.25 inch floppies lose data over time as the small areas which are magnetized to store the info slowly spread (sort of like a spreading stain in a rug). They eventually begin to blend into adjacent data and become unreadable. That is assuming the magnetic oxide is still adhered to the disk itself; they also have a tendency to flake off oxide, just like old magnetic tapes do.

    The only sure way to keep that data is to periodically “refresh it” by writing it back out to newer media.
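    That refresh cycle is easy to script. Here is a minimal sketch for a Linux/BSD box with GNU coreutils; the file names and directory layout are made up for illustration, the idea is just that a checksum manifest travels with the data so every copy to new media can be verified before the old media is retired.

```shell
#!/bin/sh
set -e

# Hypothetical layout: "archive" is the current copy of the data,
# "new_media" stands in for the freshly written disc or drive.
mkdir -p archive new_media

# Stand-in for the old data (contents are invented for the example).
printf 'specimen 12: wing length 54.2 mm\n' > archive/field_data.txt

# Record checksums once, when the archive is first created.
( cd archive && sha256sum field_data.txt > manifest.sha256 )

# The "refresh": copy the data plus its manifest to the new media...
cp archive/field_data.txt archive/manifest.sha256 new_media/

# ...and verify the copy against the manifest before trusting it.
# Prints "field_data.txt: OK" when the copy is bit-identical.
( cd new_media && sha256sum -c manifest.sha256 )
```

    Run the verify step again each time the files are migrated; a silent flip of even one bit on the aging media will show up as a FAILED line instead of OK.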

    Like I mentioned above, I have a lot of late-1980s/early-1990s vintage 3.5 inch floppies which are still readable, though some of the files do take a couple of attempts to read. They have been stored at room temperature and low relative humidity for 20+ years.

    I just pulled 4 random floppies from that box and opened random files on the disks with no problems or read errors. But they could be unreadable next week; no guarantees on a floppy that old.

    If they can only be opened in that proprietary program, then your only option might be to use Print Screen, or Ctrl-A / Ctrl-C, to highlight the data and cut and paste it into a more modern document format. You might lose formatting, but with both a screen grab of the display and the raw text data, you could reconstruct the page layout in a modern word-processor document with proper formatting. Big hassle, I know; I spent about 4 hours yesterday evening converting some of my old WP7 files to RTF and ODT documents.

    It is a long, tedious process, but if the data is important to you, it is a cost of doing business to protect it. If you are dependent on an XP-compatible system to run that software, you might need to keep an old XP box someplace, disconnected from all networks, where you can read the data and then export it to flash drives, a USB hard drive or CD/DVD storage, to be transferred to a more modern computer system.

  122. Larry Ledwick says:
    December 22, 2013 at 1:45 pm

    =======================================================================
    Thanks. Even though we couldn’t legally be required to retrieve data that old, it’s worth a shot.
    The paper records should also be cared for. About five years ago someone found some of our old paper records. They went back to before we had any automation. I found the first report with my initials on it. A year or two ago someone decided to toss out the old records. *Sigh*
    Aside from the personal disappointment, there were clues in those records as to how to run the place if our SCADA went down.
    (“SCADA” is Supervisory Control and Data Acquisition or, in other words, automation.)

  123. The solution is a bit mean but not particularly difficult. If the data is not available to replicate, this is “trust us” science. “Trust us” science needs a rating scale, running from “the data from the study itself is not available”, to data missing from the immediate citations, on up to citations of the 2nd, 3rd, or whatever degree up the line no longer having data. Grant-making bodies should use the scores to determine future funding.

    Once journals are rated on how reliably they insist on data archiving and on maintaining the ability to replicate, and real money is on the line in the form of grant eligibility, the problem will correct itself, and basic experiments without available data will get redone, perhaps with interesting variations in results.

  124. I have just read that there was no pause because not enough data was collected in the Arctic. Surely new data for the pause years cannot be slotted in? How can this work?

  125. Access to the raw data, and re-analysis of it, is valuable to confirm the original analysis isn’t screwed up (e.g., Mann’s hockey stick). Ideally, this sort of thing should happen soon after publication (if not before, but I doubt academic peer review will ever be that good). After the analysis is verified, then you try to reproduce the experiment.

Comments are closed.