Over at Climate Audit, Steve reports on the Update for the FOI for the Wahl Attachments
He’s wondering about the use of that mailserver and why there are inconsistencies, for example:
I have a quick question for the technically-inclined about backup protocols. I had asked UEA the following question:
4. You stated that the earliest backup of Briffa’s computer that the university located was on August 2, 2009. I must confess to being completely astonished at this information, particularly since the Climategate dossier included Briffa emails from 2006 that were said to have been deleted.
To provide reassurance on this point, can you explain whether this late date of earliest backup also applied to other CRU computers e.g. it is my understanding that CRUBACK3 contained backups of four of Phil Jones’ computers, with a total of 22 individual backups. Did any of these backups date prior to July 2009? What was this earliest date? If there were earlier backups for other computers, why was the earliest backup of Briffa’s computer so late? Is there perhaps another machine attributable to Briffa that needs to be searched?
UEA replied as follows:
It is right to say that the earliest backup that is held for Professor Briffa’s work PC is the 2 August 2009 backup. However, that is not to say that that backup does not store emails dating back to a period before 2 August 2009. It is merely to say that there are no earlier backups. UEA’s position is that the 2 August 2009 backup would have included copies of all emails and attachments stored on Professor Briffa’s PC as at 2 August 2009 and this could easily have included documents and emails dating back to 2005/2006. You should in any event note that the backup server had an automated function that operated so as to remove older backups on a rolling basis. It is possible that the hacker who obtained and disclosed the emails to which you refer had access to the server for a number of months and that he or she obtained the emails from a backup that is no longer on the server.
…
But some aspects of the backup don’t make obvious sense to me. (They appear to have used BackupPC). Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”. Wouldn’t it be standard practice to periodically preserve some of the older backups?
I note that the police report indicated that access to the CRU backup was not established until September 2009, so the presence of emails in the Climategate dossier that cannot be located on the CRUBACK3 server would require a different explanation than the one proffered here by the UEA.
==============================================================
I left this comment at CA:
Backups of operating system active drives typically use rolling backups…because why would you need a backup from 3 years prior if your intent is simply to recover the operational state of the machine?
In my server room we keep current backups for operational recovery, but not old backups unless that old backup has some particular configuration of value, like only running on specific older hardware that we may have to revert to.
For a mailserver, one labeled CRUBACK3, the question then becomes, what is the purpose of that server?
1. Is it a server that acts as a failover for the main mail server?
…or…
2. Is it an archiving server?
If the latter, then there would be absolutely no reason to use a rolling backup, and in fact it would be contrary to the archival mission. The fact that that same server had emails on it from 2006 suggests its mission was archival.
Archival servers typically have removable storage, so that you can put years of data/correspondence on the shelf. The FOI request may be too narrow in stating that the specific server be searched. I would restate it to include removable storage, including media such as magnetic tape, DVDs, CD-ROMs, removable hard drives, and network-attached storage drives that were used on CRUBACK3.
You might also ask what happened to CRUBACK1 and CRUBACK2 servers.
==================================================================
If any readers have anything valuable to add, please leave a comment.
A typical procedure [for a well-managed server] is the son-father-grandfather system, where when the backup of the son becomes the father, the old backup [father] becomes the grandfather. Earlier backups are not kept.
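A minimal sketch of that rotation, with hypothetical slot names; the only point is that each new "son" pushes the older generations down, and anything past the "grandfather" slot is simply discarded:

```python
# Grandfather-father-son rotation, sketched in a few lines.
# Three generations are kept; anything older is dropped by design.

def rotate_gfs(generations, new_backup, depth=3):
    """Prepend the newest backup (the "son"); keep only `depth` generations."""
    return [new_backup] + generations[:depth - 1]

slots = []
for week in ["wk1", "wk2", "wk3", "wk4"]:
    slots = rotate_gfs(slots, week)

print(slots)  # ['wk4', 'wk3', 'wk2'] -- "wk1" is gone, as the scheme intends
```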
Leif,
UEA has a “Records Retention Schedule (RRS)” requirement. See:
http://www.uea.ac.uk/is/strategies/infregs/Records+management/RRS
I don’t know what the RRS is for CRU or the particular information being sought.
Backups are controlled by policy and schedules. Part of policy includes retention time and archival method. Following the Enron problem, a number of organizations realized that email was discoverable for court purposes and made radical changes to their retention policies. Where once it was common to preserve certain email for 10-20 years, it is now not uncommon to keep it around for no more than 90 days. There are other considerations as well — if laws regarding Sarbanes-Oxley, personal credit information, or HIPAA (health data) are involved, all manner of laws kick in. These are US specific, but the UK and the EU have similar regulations and privacy requirements for archived data.
Archiving data means long-term storage, and it is still common to use tape systems. With very long-term retention policies it is common to put the tapes in off-site storage as part of a disaster recovery program. More and more, spinning storage is being used as disk storage arrays become more prevalent. This kind of storage often includes deduplication, so that the same file (email attachments, for example) will be stored once and referenced many times, and will appear to be multiple copies to the operators. Tremendous storage efficiency is achieved via deduplication but, oddly, you end up with just a single backup of everything — all eggs in one basket.
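Deduplication of this kind is usually content-addressed: identical data hashes to the same key, so the bytes are stored once and referenced many times. A hypothetical sketch (assuming nothing about whatever storage array CRU actually used):

```python
# Content-addressed deduplication: identical attachments hash to the same
# key, so the store holds one copy while the catalog holds many references.

import hashlib

store = {}    # content hash -> data, stored exactly once
catalog = []  # per-email references; looks like many copies to operators

def dedup_store(attachment: bytes) -> str:
    key = hashlib.sha256(attachment).hexdigest()
    store.setdefault(key, attachment)  # keep only the first physical copy
    catalog.append(key)                # but record a reference every time
    return key

# The same attachment mailed to three recipients...
for _ in range(3):
    dedup_store(b"big-spreadsheet-bytes")

print(len(catalog), len(store))  # 3 references, 1 stored copy
```

This is also why a single failure in the deduplicated store can take out what looked like many independent copies — the "all eggs in one basket" problem above.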
My bet is that unless the email was already part of a court matter and had been the subject of an FOIA request or subpoena for a court proceeding, it is not likely to be held long, and will not make it to archive status. However — every server room that handled the mail potentially has a copy as well. That list of servers could be found from the CC: and BCC: lists.
Yes, FOI requests should always delineate between “backups” and “archiving” systems. The processes and procedures are significantly different with completely different aims. A “backup system” is intended to allow the restoration of a server or service after a failure. An “archiving system” is intended to allow going back in time to examine data from the past. The two should not be conflated when making FOI requests.
It would be reasonable for a backup/archival strategy to depend on the nature of the owning entity. For a Forbes Top 100 international company there would be an overriding concern with long term liability and therefore very long term electronic info retention that would be well in excess of the legal retention requirements.
For a low tier public UK University whose risks are covered ultimately by the UK Government, it sadly seems reasonable to expect much less rigorous IT strategy on backup/archival. I think that is what we are seeing from CRU’s FOI responses and I think in parallel we are also seeing a resistant attitude by CRU that is less than an open and transparent one.
John
Backups are like underpants.
Everybody expects you to have them. But few would be willing to check how or if you wear them, how often they are washed and their state of repair.
A good backup policy is one that is tuned to the needs of the organization and that is tested on a regular basis. A bad one is one off the rack: standard software that somebody comes in to install and that you then forget about.
Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”. Wouldn’t it be standard practice to periodically preserve some of the older backups?
>>>>>>>>>>>>>>>>>>>>>
The statement is standard practice. The question seems to be a misunderstanding of the meaning of the statement.
In backup systems, you can make a “full copy” of the data, or you can make an “incremental copy”. The latter is simply a copy of all the changes since the last full copy. The most common rotation is daily incrementals and weekly fulls. So yes, it is standard practice to preserve full copies of the data for some period of time, often 7 years. After that, when the next full copy of the data is made, it overwrites the oldest copy in the system, be it on disk or tape. Or the tapes over a certain age are discarded, or one of several other possibilities. There is nothing “fishy” about removing older backups.
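The rolling removal described can be sketched as a schedule simulation. This is a hypothetical illustration, not CRU's actual policy — the retention window here is four weeks purely to keep the example small, where real policies are often years:

```python
# Rolling retention: weekly fulls on Sundays, daily incrementals in between.
# Each new full resets the incremental chain; fulls older than the retention
# window are removed automatically -- nothing "fishy", just the schedule.

from datetime import date, timedelta

FULL_RETENTION_WEEKS = 4  # illustrative; production policies are often years

def run_schedule(start: date, days: int):
    fulls, incrementals = [], []
    for n in range(days):
        today = start + timedelta(days=n)
        if today.weekday() == 6:                # Sunday: take a full backup
            fulls.append(today)
            incrementals.clear()                # new full restarts the chain
            cutoff = today - timedelta(weeks=FULL_RETENTION_WEEKS)
            fulls[:] = [d for d in fulls if d > cutoff]  # rolling removal
        else:
            incrementals.append(today)          # weekday: incremental only
    return fulls, incrementals

fulls, incs = run_schedule(date(2009, 1, 1), 90)
print(len(fulls), fulls[-1])  # only the last 4 fulls survive
```

Note that the oldest surviving full still contains every email that was on the machine on that Sunday — which is exactly the point UEA's reply is making about the 2 August 2009 backup.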
There is a very useful reference here
https://www.watsonhall.com/resources/downloads/paper-uk-data-retention-requirements.pdf
Note that some requirements will be contract specific and depend on ‘vesting’ or ownership clauses in the contract. Typically, for government contracts ownership of _everything_ is vested in the Government (be it UK or US). By everything it means everything including laboratory logs, emails, pens, desks etc etc. If some derogation from that vesting was desired it would need to be requested and would be shown in the contract with the Government department. In most cases no derogation is possible.
Also note that the extreme cases are ‘indefinite storage’ and ‘100 years’ – this is a little ambitious for data storage so it is usual to have a requirement to maintain the data in useable form. So the government isn’t given an 8 inch floppy disk, an 8 track cartridge or a stack of punched/magnetic cards; or even a stack of floppy drives with the data in an unknown format. (I have been there!)
Backups at a larger institution/company are usually governed by two things: legal requirements (i.e. for purposes of taxation or financial auditing) which are obviously not negotiable and a policy for “the rest” which is a compromise of constraints on financial and staffing needs vs. the need to recover old information.
In the academic world one would assume that the researchers would put a big emphasis on having all their work protected with a backup plan, in particular since they usually are notoriously difficult to educate about keeping their files on the network instead of their local computer drives. They also don’t delete stuff as long as they have space, in my experience. The proper backup policy for that would be an incremental backup plan that starts with a snapshot of “what is” and then only records changes from run to run (usually just additions and revisions of single files). If space preservation is an issue, you’d limit the number of total increments and do a new snapshot every year or so and then begin increments again.
As far as email retention is concerned, the only real issue is attachments, not the email body per se, which is just text. That doesn’t take a whole lot of space and additionally can be compressed quite well on the fly. From the perspective of backing up emails sans attachments there is really no reason to be fickle about storage requirements. Considering all the material is centralized on the email server anyway, it can be backed up quite easily and efficiently.
1. Is it a server that acts as a failover for the main mail server?
…or…
2. Is it an archiving server?
>>>>>>>>>>>>>>>>>>>>
Question 2 should actually be two questions.
2. Is it a server that backs up data from other servers?
3. Is it a server that archives data from the mail and backup servers?
Archive differs from backup but serves some of the same purposes. In a backup system, copies of data from the production server are made on a regular basis, most often weekly, and retained for some period of time, often years. So consider an email that comes into the system. Once it is in the system, it never changes (or shouldn’t). So, after one year of weekly backups, the backup system now has 52 copies of that email. Just how many copies of the same email does one actually need? Particularly since it takes time to do the backup, and it costs money to store each copy on disk or tape.
So an archive is actually an extension of the backup system. An archive functions not by deleting data from either the production server or the backup server, but by simply making a copy of that same email, and then telling the backup system not to make any more backup copies of it. The result is that we now have the potential to recover any given email from one of three places in the overall system, depending on what the data retention policies are for each part of the system. A typical implementation might be:
Production system => retain for 3 months
Backup system => retain for 1 year
Archive system => retain for 7 years
In the case of both the backup and archive systems though, there still needs to be a retention policy of some sort, and anything older than that retention policy should be deleted by one process or another. As an example, a brokerage firm in the US was nailed with a $1.3 billion summary judgment a few years ago when it came to light that they had inadvertently preserved emails that were beyond their retention policy. They were essentially guilty not because of what was in the emails, but because they had emails that were supposed to have been destroyed. For that reason, having a retention policy that automatically deletes copies of data older than the retention policy is not only common, it is often by deliberate design driven by legal requirements (differences between UK and US may alter this for the CRU, I don’t know).
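Those three tiers can be expressed as a simple lookup: given an email's age, which tiers could still hold a copy? Past the archive window the answer is "none" — by deliberate design, not accident. The retention lengths below are the typical figures from the example above, not any institution's actual policy:

```python
# Three-tier retention from the example above: an email "ages out" of each
# tier in turn, and past the archive window it is gone everywhere -- on purpose.

RETENTION_DAYS = {
    "production": 90,      # ~3 months on the live mail server
    "backup": 365,         # 1 year of backups
    "archive": 7 * 365,    # 7 years in the archive
}

def recoverable_from(age_days: int):
    """Return the tiers that could still hold a copy of an email this old."""
    return [tier for tier, keep in RETENTION_DAYS.items() if age_days <= keep]

print(recoverable_from(30))       # all three tiers
print(recoverable_from(200))      # ['backup', 'archive']
print(recoverable_from(8 * 365))  # [] -- deleted everywhere, per policy
```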
Backups of operating system active drives typically use rolling backups…because why would you need a backup from 3 years prior if your intent is simply to recover the operational state of the machine?
>>>>>>>>>>>>>>>>>
Backups serve more than one purpose. If your only goal is to preserve the last operational state of the machine, then yes. But there are many cases where this is insufficient. Suppose you discover that you have a virus on your machine. How long has it been there? What has it been doing to the data? How do you back those changes out?
If the virus has been making subtle changes to the data, you need to know when that started. If all you have is the last operational state of the machine (say from one week ago), and the virus is on that copy as well….you’re hooped. If the virus has been present for say 9 months, you’ll need a copy of the operational state of the machine from before that.
The above is only an example. I’ve had situations where we needed to recover the operational state of a server from as much as 3 years prior to resolve an issue. For data, I’ve had situations where we had to go back to copies of the data which were 3 years old to resolve a discrepancy in a financial system. Operational recovery is simply one aspect of backup and archive.
I believe I have read elsewhere that these people used the Eudora mail client and retrieved their mail by pop3 from the server. This means the mail was stored on their local PC and unless they actively deleted it from the server, it would usually be stored on the server for a period of time, too. I keep many old emails on my laptop. If I back up my computer right now, there will be emails going back to 2008 included in that backup as I have my inbox and other mailboxes to which I filter messages stored on my computer.
There is also the question if the backups were server backups, workstation backups, or a combination of both. Even if an older backup is deleted, on the next “full” backup cycle, all of those older emails that are still in the user’s mailboxes will be backed up again. I believe I read in at least one of the emails a complaint by someone that full backups were being done and it was taking a long time to finish (this would keep their mailbox folder locked so they couldn’t use email).
So if a full backup is done you have old emails included. Then you have “incremental” backups made. It is regular practice at some point to make a new “full” backup and delete the old one and all of its incrementals in order to save space. When that new full backup is made, any old email that is still in the user’s mailbox folders will again be backed up.
Some resources:
UEA Records Management Policy
http://www.uea.ac.uk/is/strategies/infregs/Records+management+policy
Data Protection Policy
http://www.uea.ac.uk/is/strategies/infregs/dp/Data+Protection+Policy
Data Protection Act 1998
http://www.legislation.gov.uk/ukpga/1998/29/contents
The comments above seem competent (and I speak as a 26 year sysadmin who has trained many sysadmins). There is a difference between “backup” for local files on a local computer and policy driven backup and archival storage for a department level server. There is also a fairly wide range of both the driving archiving policies and the competence of those implementing them. Finally, there are constraints from both economics and hardware.
A current/typical department-level archival/backup scheme for a mail server or file server would be to maintain a running “hot” backup — a twinned filesystem or RAID — plus archival and incremental backups on a weekly or monthly full backup schedule with incrementals in between that could range from daily to several times a day. For a corporate site with a high cost to lost data, the backups would be regularly rotated into offsite storage so that a fire in the server room that destroyed both the primary server and any redundancy systems would not be a complete disaster. In the old days that was accomplished by literally carrying tapes offsite and storing them in a fire safe in a different building; nowadays it would be more common to store the offsite archivals in a suitably secured cloud or on an institutional server provided for that purpose.
One thing that has altered the character of backups over the last decade or so is the sheer volume of material being backed up and the capacity of media that can hold it. At one point in time it was fairly easy to have a tape robot that could backup a department-sized server, but as data storage has skyrocketed into the tens to hundreds of terabytes of capacity, a serious mismatch between total data and non-disk media capacity has developed. Hence the increasing tendency to back disks up to other disks — nothing but hard disk farms in some sort of RAID has the capacity or reliability to store a full image of the contents of a large disk RAID used in a server. Time has also become an issue — it takes a long time to write very large amounts of data by ANY mechanism to synchronize two images on separate systems. Many of the hot backup schemes avoid enormously long backup times (times so long that the image being backed up can change significantly during the backup itself) by constantly writing only the deltas from one image over to another.
With all of that said, mail servers usually DON’T carry terabytes of data — I’m probably one of the most egregious of email users on the planet and have anywhere from 2000 to 20000 messages saved in my mail spool at any given time (at the moment, just short of 4000, but I cleaned up and saved a full copy of my 24000+ message spool file six months ago and have kept something of a lid on it in the meantime). Even allowing for attachments, I probably have less than a gigabyte of mail spooled, and probably had at most a few gigabytes when I backed it off and started over. The entire physics department at Duke has around 200 GB of current active mail spooled, and runs a backup scheme like those described above — servers on RAID, running backup to failover server, nightly incrementals on top of periodic fulls (onto tape, frequency determined by when an incremental starts to take too long), offsite archival backups every six months or so. The University sets policy as to how long archivals must be maintained and I don’t know offhand what the current policy is but I’m guessing at least 2 years simply on the basis of CYA on liability issues. We had all sorts of “interesting” subpoenas seeking mail spool content floating around during the infamous Duke Lacrosse case a few years ago, and if anything they increased the longevity of the backups in its wake.
Beyond that, everything stated in the CRU response makes sense. If their policy is to keep full archival images for 2 years, those images might contain mail messages dating back a DECADE or more — before I restarted my own mail directory I had messages dating back to 2007, and I have personal archives dating back into the 90’s.
I also do have to say that as much as I found the climategate emails interesting, I do think that their publication violated any number of excellent and well-founded privacy laws. I’m not a big fan of the FOIA. I am a strong proponent of open data and methods and public ownership of all results obtained with government grant support except in very limited, very specific exceptions granted for e.g. military research or contracted corporate research, but think that the place to establish the rules and enforce them is in the granting agencies (which is where it is currently happening). Either publish your data and methods and make your results freely available for open and public comment and use, or no grant money for you, end of story. Even there, I don’t care for the idea that my or anybody else’s email spools can be opened up to any damn fool who asks for them just because I might participate in government funded research. That’s not part of the results OR data I might be working with, and most of what is there is nobody’s business but my own.
I think it is an established fact that free and open communication is a fundamental aspect of modern science. At the same time, humans need a reasonable expectation of privacy or they are unable to speak freely — they have to weigh every word of every communication as if it might be made completely public. How can I write frankly to a Dean about some student having problems if that communication can be published on the internet or in a tell-all book at the whim of somebody who thinks that they have a “right” to the information? Do I need to resort to automatic encryption of every single message I send so that at the very least it won’t be revealed unless a court order that I have a chance to defend against compels me to give it up, where there is a good chance that the court order will place strict limits on what somebody might go “fishing” for and impose severe consequences on overstepping those bounds?
There are some cures that are worse than the disease, and the current tendency to wield the FOIA as a political weapon or instrument of a kind of harassment is one of them. Email is often an unguarded forum where we state things that we would never say, after sober reflection, in a public arena. It is also a “flat” medium, where it is impossible to easily tell when somebody is kidding or being sarcastic, where “smileys” (emoticons) were invented to convey some small fraction of the missing affect and emotion essential to sense, where critical remarks sound far more critical than they perhaps really are, where insults are more insulting, where flame wars cause us to overstate a case in the heat of battle, where we are more inclined to voice private doubts or analyze a competing point of view we don’t really agree with. If we’re going to just make it all public at anybody’s whim, we may as well just start posting all of our email messages straight to facebook…
rgb
There are still 200,000 emails with a password lurking out there……
Where did they come from?
rgbatduke says:
October 7, 2012 at 10:53 am
He makes a very good point about the nature of e-mail discourse: that it is a loosely self-censored discussion with more passion than one would tolerate in himself in a face-to-face talk. We have to allow much more benefit-of-the-doubt, more tolerance of excess, less maturity than otherwise.
Much of the alarmist-skeptic rhetoric is the stuff of junior high. We need to take that into account when we use the grown-up tools of FOIA and the courts to find out what is going on. A lot of it so far seems to be little more than back-stabbing and teenage dissing from the in-group about the out-group. Not all, of course, but a lot, actually most. Judgment says we should be careful about the noise we make, lest we be said to be nothing more than a shepherd crying wolf.
Occasionally an astute IT manager will make an on-going series of back-ups of the emails without telling anyone else; that way if the C-levels are under subpoena, they honestly don’t know it exists, yet he can still say “let me take a look around for what you need and I’ll get back to you” if they need a lost email. I hold little hope that this will be the case in anything to do with climatastrology.
Perhaps it’s time for researchers receiving public grant money to start acting like real-life grownups, quit the embarrassing sophomoric trash-talking, the back channel communications, and the hide-the-original-data BS. We need a data retention law for federal grant funded research data, something like: can’t produce the data, give the money back or go to prison; lose the email, give the money back or go to prison.
Seriously, if the Hockey Team is correct, then the fate of the human race rests on their ability to convince us to spend US$ trillions to fix the CO2 problem; if that were really true, would they be so lackadaisical about retaining the original data and methods? I honestly feel that these buffoons have failed to demonstrate the maturity to allow independent living without adult supervision.
rgbatduke says:
October 7, 2012 at 10:53 am
– – – – – – –
rgb,
A US gov’t grant receiving scientist’s emails discussing the government funded research he is formally and legally contracted to perform are not private emails.
For the present controversial ‘team’ research (of CG1 & CG2 fame), if no FOI laws existed, given that those publicly funded climate scientists refuse to provide all publicly funded research data, programming, communications and methodology, THEN there would be virtually no recourse for getting that publicly funded info. I disagree with your apparent suggestion that FOI is being inappropriately used with respect to those non-open and non-transparent climate scientists.
Your idea that the requirement for openness and transparency should be strictly enforced by the granting institution seems reasonable, except those are virtually all government and therefore political bodies which can be ‘political’. I think FOI is a reasonable independent line of inquiry that is reasonably outside of politics and should be continued and strengthened.
John
Don’t confuse “backups” and “archiving”.
A “backup” is a copy of the data to allow recovery in the event of a disaster.
“Archiving” is the process of moving un-used data from primary storage to secondary storage (secondary storage could be tape, or even low-cost slow hard disk).
Retention is set by policy, and it is very uncommon to have long-term retention. Nearly all organisations only set retention policy based on tax laws. If you need to keep something for tax reasons for five years, then retention policies will mostly be in the 5-7 year range.
Long-term retention is made difficult due to changing backup software (even upgrades of the same software), hardware changes and even deterioration of the backup media. It can be extremely expensive to cycle your retained data to modern standards. Most quotes I’ve seen come to about one thousand times the cost of originally backing up the data, to cycle it to modern standards.
Steve should however enquire as to what JOURNALING CRU employed. This is the IT secret, and how they’ve been able to recover an email from years ago (when pushed to do it). Basically, nearly all email servers file a copy of every single inbound and outbound email in what’s called the journal. This acts as a log for IT email troubleshooting, as well as compliance with Sarbanes.
However IT generally doesn’t admit it exists, as the journal means they can easily read everyone’s emails. (The journal is also why the IT person is the best person to find out what secret plans an organisation has, and why they seem to know about a big decision ahead of time). 😉
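The key property of a journal is that it is written at delivery time and is append-only, so it survives anything the user later deletes from their own mailbox. A hypothetical sketch (the names here are illustrative, not any real mail server's API):

```python
# Mail journaling, sketched: every message is copied to an append-only
# journal at delivery time, independent of the user's mailbox folders.

journal = []    # append-only log; untouched by user deletions
mailboxes = {}  # per-user folders, which users are free to purge

def deliver(sender, recipient, body):
    journal.append((sender, recipient, body))         # journaled first
    mailboxes.setdefault(recipient, []).append(body)  # then delivered

deliver("sender_a", "recipient_b", "re: proxy data")
mailboxes["recipient_b"].clear()  # the user deletes everything

# The mailbox is empty, but the journal still holds the message.
print(len(mailboxes["recipient_b"]), len(journal))  # 0 1
```

This is why a journaled server can sometimes produce an email "from years ago" even when no user mailbox or backup still contains it.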
I wouldn’t worry too much on the server name. When IT rebuilds a server (for whatever reason) they usually use a number to indicate what iteration they are up to. It’s extremely bad practice to rebuild a server with the same name (can cause all sorts of technical problems on the network). Also, CRUBACK is most likely to be the email backup server, as in there would be an active and backup server in case the active goes out of action for any reason. It’s to ensure uptime, not to hold backups of the email server.
Such an odd set of questions to see… since most scientists keep everything and freak out if anything is lost.
It would surprise me if there wasn’t an archived copy of everything not subject to loss-by-failure.
These kinds of people I commonly dealt with in storage concerns because they REFUSED to lose even one e-mail.
Perhaps it’s time for researchers receiving public grant money to start acting like real-life grownups, quit the embarrassing sophomoric trash-talking, the back channel communications, and the hide-the-original-data BS. We need a data retention law for federal grant funded research data, something like: can’t produce the data, give the money back or go to prison; lose the email, give the money back or go to prison.
I agree completely with regard to the data and methods, but again, that’s a matter for the federal funding agencies to require and enforce. NASA, for example, has the requirement and recently, has been quite stringent about enforcing it. I suspect that within a very few years it will be an across-the-board requirement for federally funded research. As for “trash talking” and “back channel communications”, I fundamentally disagree and indeed, think you are treading heavily on the toes of the first amendment. Both sides in this debate engage in far too much of the former — WUWT is full of trash talk and presumption of guilt and malice aforethought in “all climate researchers”, and there are at least some participants on the CAGW “side” of the discussion that are no better. But this isn’t a legal matter, it is a social one or a political one and grant agencies literally don’t have the right to take trash talk or political agenda into account when considering whether to fund or not fund a proposal.
As for back-channel communications, are you trying to make it illegal for me to sit in my office and have a private discussion with somebody? How about calling them on the phone, am I obligated to make a recording of it if I discuss any sort of research? When I go to meetings, do we need to hire a video crew to make videotapes of not only the presentations but all of the private discussions, including those held while sitting around drinking beer late into the night (which happens remarkably frequently, as you might well imagine)? Where, precisely, is there a privacy boundary that permits my unguarded email conversations with people, where I might well be voicing new ideas, patentable concepts, doubts, concerns, or just blowing off steam with regard to a person that I think is a butt-head for entirely personal or any mix of personal and professional reasons, to be open to all comers armed with the flimsiest of excuses but still protects my right to hold exactly the same discussions offline in any number of venues?
Personally, I don’t think there is one. I think email should be precisely as private as personal mail — literally a federal crime to open or expose without a court order obtained on grounds more substantial than a “FOI request”.
This has nothing to do with whether or not CRU or other agents involved in the climate debate should or should not be required to provide their data and methods not only “upon request” but to make them utterly publicly available without any request. Again, NASA is actually not a bad example of the way it should run — you can get ALMOST any of the data used in NASA funded publications, and MOST of the code (maybe all of it, but I doubt it because things don’t work perfectly) straight off of the internet in real time. I’m also not commenting on whether or not there was academic dishonesty or unethical behavior exhibited on the part of the hockey team members and revealed by climategate and many other events, or whether or not there were real crimes committed in there somewhere.
While one might wish for a greater level of maturity and tolerance by ALL participants in the discussion, one part of freedom is the freedom to be immature and intolerant in one’s communications. Usually, in science, such behavior in the long run carries the seeds of its own destruction. Nature, like honey badger, just doesn’t give a shit about what people think — it is what it is and does what it does. You can trash-talk all you want and even believe true things for stupid reasons — if you turn out to be right. But heaven help you if you turn out to be wrong, as then you are wrong and stupid, not just wrong. When discussing nature it’s a lot wiser to be open to the possibility that, no matter how passionately you believe something to be true, you could be mistaken, and practice just a teensy bit of humility on that account.
rgb
Oh, another important thing to remember, is this:
Today, 2012, if an IT Manager did not ensure very strong and resilient backup/archive/retention policies (and test them) they would be fired on the spot.
Back in 2006, and even 2009, that same IT Manager, in the same situation would just get a slap on the hand and told to do better next time.
We cannot apply today’s thinking and standards to the past.
At home, I use Windows Home Server (the second best OS Microsoft made, behind Windows 7) to manage my backups. It performs daily backups. On the eighth day, it deletes all but the backup from the seventh day. To make things clear, let’s say the seventh day is the daily backup on Sunday. Then on Monday, it deletes all the backups except the one from Sunday and once again does a daily backup. That Sunday backup becomes the weekly backup. The next Monday, it deletes all the backups for the previous week except the one from Sunday. So now it has two weekly backups. And the process repeats. Windows Home Server lets you specify how long to keep weekly backups; I chose three weeks.
I would imagine some backups are that way. There would be the daily backup that would be cleared each week and then a weekly backup that is kept for a fixed duration before the backup media is reused or destroyed.
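For the curious, the rolling daily/weekly rotation described above can be sketched in a few lines. This is only an illustration of the general scheme (recent dailies survive, one backup per week is promoted to a “weekly” and kept for a few weeks, everything else is purged), not Windows Home Server’s actual implementation; the seven-day daily window, the choice of Sunday as the weekly, and the three-week weekly retention are assumptions taken from the comment above.

```python
from datetime import date, timedelta

def backups_to_keep(backups, today, daily_days=7, weekly_weeks=3):
    """Return the backup dates retained under a simple rolling scheme:
    keep every daily backup from the last `daily_days` days, plus one
    'weekly' backup (here: each Sunday) for the last `weekly_weeks`
    weeks.  Everything older is purged -- the kind of automated
    rolling deletion UEA described."""
    keep = set()
    for b in backups:
        age = (today - b).days
        if 0 <= age < daily_days:
            keep.add(b)                      # recent dailies survive
        elif b.weekday() == 6 and age < weekly_weeks * 7:
            keep.add(b)                      # Sundays survive as weeklies
    return sorted(keep)

# Thirty consecutive daily backups ending Sunday, 7 October 2012:
backups = [date(2012, 9, 8) + timedelta(days=i) for i in range(30)]
kept = backups_to_keep(backups, date(2012, 10, 7))
```

With these settings, `kept` holds the last seven dailies (Oct 1–7) plus the two prior Sundays still inside the three-week window (Sep 23 and Sep 30); the Sunday of Sep 16 has aged out. The practical upshot, as noted above, is that anything older than the weekly retention window simply no longer exists on the server.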
(Since I’m talking about Windows, let me say that Windows 8 is awful and perhaps the worst OS Microsoft ever made. Much MUCH worse than Vista. If you need to buy a computer, buy it before Windows 8 is officially released later this month, or you will be cursed with it.)
A US gov’t grant receiving scientist’s emails discussing the government funded research he is formally and legally contracted to perform are not private emails.
That is simply not true, and indeed, it is absurd. What a scientist who receives a grant is morally obligated to provide is an end product, not every single step of the process through which that product is obtained, including video recordings of every discussion with a thesis advisor, some sort of reality TV taping of every day’s work in a lab, every single bug introduced or resolved while writing the code, every single concept considered and accepted or rejected along the way. For one thing, there isn’t enough bandwidth even in the modern world to provide that kind of trail. For another, it would utterly squelch the freedom to be wrong, especially in intermediate steps of research, if every half-baked idea one has but might end up rejecting becomes part of a public record that can be trotted out by political enemies of your conclusion seeking to prove that some ideas you have had, some things you have said, some statements you might have made in an utterly unguarded venue as part of the search process that is real science turned out — surprise — to be wrong.
I have a fair number of papers to my name, although nothing like the number a truly prolific researcher might produce, as I tended to work more or less on my own or with just one or two colleagues. I was, and am, perfectly happy for my work to be judged on the basis of what I published, but not on the basis of every single thought I had along the way and discussed with somebody. Lots of those ideas were wrong, and one of the ways I learned they were wrong was in those private discussions. Some of the actual published results turned out to be wrong as well. Other parts I’m still quite proud of; they turned out to be right, or close to right.
A grant is given to conduct a particular piece of proposed research and publish the result. It is quite reasonable oversight for the granting agency to be able to verify that the research it contracted for actually occurred (so money wasn’t taken under false pretenses). It is certainly reasonable to write into the contract that the researcher openly publish their results, their methods, and provide open access to their data as reproducibility is a key aspect of the “product” in all sorts of actual science research. It is utterly unreasonable to require that they document every vagrant thought or conversation they have with any or every other worker in or out of the field over the period of the grant and to make them openly available to anyone who asks. It is utterly unreasonable to require them to document or provide any access to letters, emails, private phone calls, discussions held at private dinner tables or in bars, or on a daily basis in the laboratory or office, beyond the minimum needed to ensure that they were in fact present and performing the research and not basking on a beach in Jamaica and making it all up. And the latter is already S.O.P. for all Universities and other agencies that are authorized to administer grant funded research in the first place.
Even this latter is done, as it should be, with a light touch. A grant usually doesn’t require one to spend X hours a week working on project Y; it requires that at the end of the grant period the proposed work be completed. Usually, the consequence of failure is simple — you lose your funding and have a hard time getting new funding. End of career, end of story.
In the meantime, hands off my email spool! I try to keep all my money laundering and cocaine transactions out of my email spool because we all know it isn’t REALLY private (ask Oliver North). But I do, sometimes, express personal opinions or speculations that I would rather not trumpet to the world.
rgb
rgbatduke says:
October 7, 2012 at 1:03 pm
That is simply not true, and indeed, it is absurd. What a scientist who receives a grant is morally obligated to provide is an end product, not every single step of the process through which that product is obtained, including video recordings of every discussion with a thesis advisor, some sort of reality TV taping of every day’s work in a lab, every single bug introduced or resolved while writing the code, every single concept considered and accepted or rejected along the way. For one thing, there isn’t enough bandwidth even in the modern world to provide that kind of trail. For another, it would utterly squelch the freedom to be wrong, especially in intermediate steps of research, if every half-baked idea one has but might end up rejecting becomes part of a public record that can be trotted out by political enemies of your conclusion seeking to prove that some ideas you have had, some things you have said, some statements you might have made in an utterly unguarded venue as part of the search process that is real science turned out — surprise — to be wrong.
[ . . . ]
rgb
– – – – – –
rgb,
Hey, appreciate your return comment.
The way I understand it, you are saying that the ‘in-process’ part of performing government funded research work is exempt from the rules regarding working for the government — that is, the process is exempt from the spirit of openness and transparency in government.
Are you suggesting that, in principle, no investigation of the in-process activities of a government funded scientific researcher should be possible by media, private citizens, or enforcement agencies? That between the time they receive grant money and the time they produce the research product, a publicly funded research scientist should be exempt from investigation by private citizens, media, and enforcement agencies? They get a free pass?
My view is that your argument appears to mirror the position of the IPCC AR5 leadership with regard to preventing in-process openness and transparency. I do not think we should simply trust that their product is unbiased, open, and objective. Trust does not work that way. I do not think science works that way.
John