Questions on the CRU email backup server.

Over at Climate Audit, Steve reports on the Update for the FOI for the Wahl Attachments

He’s wondering about the use of that mailserver and why there are inconsistencies, for example:

I have a quick question for the technically-inclined about backup protocols. I had asked UEA the following question:

4. You stated that the earliest backup of Briffa’s computer that the university located was on August 2, 2009. I must confess to being completely astonished at this information, particularly since the Climategate dossier included Briffa emails from 2006 that were said to have been deleted.

To provide reassurance on this point, can you explain whether this late date of earliest backup also applied to other CRU computers e.g. it is my understanding that CRUBACK3 contained backups of four of Phil Jones’ computers, with a total of 22 individual backups. Did any of these backups date prior to July 2009? What was this earliest date? If there were earlier backups for other computers, why was the earliest backup of Briffa’s computer so late? Is there perhaps another machine attributable to Briffa that needs to be searched?

UEA replied as follows:

It is right to say that the earliest backup that is held for Professor Briffa’s work PC is the 2 August 2009 backup. However, that is not to say that that backup does not store emails dating back to a period before 2 August 2009. It is merely to say that there are no earlier backups. UEA’s position is that the 2 August 2009 backup would have included copies of all emails and attachments stored on Professor Briffa’s PC as at 2 August 2009 and this could easily have included documents and emails dating back to 2005/2006. You should in any event note that the backup server had an automated function that operated so as to remove older backups on a rolling basis. It is possible that the hacker who obtained and disclosed the emails to which you refer had access to the server for a number of months and that he or she obtained the emails from a backup that is no longer on the server.

But some aspects of the backup don’t make obvious sense to me. (They appear to have used BackupPC). Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”. Wouldn’t it be standard practice to periodically preserve some of the older backups?

I note that the police report indicated that access to the CRU backup was not established until September 2009 so that the presence of emails in the Climategate dossier that cannot be located on the CRUBACK3 server would require a different explanation than the one proffered here by the UEA.

==============================================================

I left this comment at CA:

Backups of operating system active drives typically use rolling backups…because why would you need a backup from 3 years prior if your intent is simply to recover the operational state of the machine?

In my server room we keep current backups for operational recovery, but not old backups unless that old backup has some particular configuration of value, like only running on specific older hardware that we may have to revert to.

For a mailserver, one labeled CRUBACK3, the question then becomes, what is the purpose of that server?

1. Is it a server that acts as a failover for the main mail server?

…or…

2. Is it an archiving server?

If the latter, then there would be absolutely no reason to use a rolling backup, and in fact it would be contrary to the archival mission. The fact that that same server had emails on it from 2006 suggests its mission was archival.

Archival servers typically have removable storage, so that you can put years of data/correspondence on the shelf. The FOI request may be too narrow in stating that the specific server be searched. I would restate it to include removable storage, including media such as: magnetic tape, DVDs, CD-ROMs, removable hard drives, and network-attached storage drives that were used on CRUBACK3.

You might also ask what happened to CRUBACK1 and CRUBACK2 servers.

==================================================================

If any readers have anything valuable to add, leave a comment please.


46 Responses to Questions on the CRU email backup server.

  1. A typical procedure [for a well-managed server] is the son-father-grandfather system: when a new backup is made it becomes the son, the previous son becomes the father, and the old father becomes the grandfather. Earlier backups are not kept.
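
The rotation described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rotate`, the three-slot dictionary and the `week1`..`week4` labels are hypothetical and do not describe any particular backup product or the CRU setup.

```python
def rotate(slots, new_backup):
    """Three-generation rotation.

    The new backup becomes the son, the old son becomes the father,
    the old father becomes the grandfather, and the old grandfather
    is dropped. `slots` is a hypothetical dict with the keys
    'son', 'father' and 'grandfather'.
    Returns the new slots and the backup that was discarded (or None).
    """
    discarded = slots.get("grandfather")
    rotated = {
        "grandfather": slots.get("father"),
        "father": slots.get("son"),
        "son": new_backup,
    }
    return rotated, discarded


slots = {"son": None, "father": None, "grandfather": None}
for label in ["week1", "week2", "week3", "week4"]:
    slots, discarded = rotate(slots, label)

print(slots)      # three most recent backups remain
print(discarded)  # the oldest one has been rotated out
```

With only three slots, the fourth backup pushes the first one out, which is the sense in which earlier backups are not kept.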

  2. D. Patterson says:

    Leif,

    UEA has a “Records Retention Schedule (RRS)” requirement. See:
    http://www.uea.ac.uk/is/strategies/infregs/Records+management/RRS

    I don’t know what the RRS is for CRU or the particular information being sought.

  3. dp says:

    Backups are controlled by policy and schedules. Part of policy includes retention time and archival method. Following the Enron problem a number of organizations realized that email was discoverable for court purposes and made radical changes to the retention policy. Where once it was common to preserve certain email for 10 – 20 years it is now not uncommon to keep it around for no more than 90 days. There are other considerations as well – if laws regarding Sarbanes-Oxley, personal credit information, or HIPAA (health data) are involved, all manner of laws kick in. These are US specific but the UK and the EU have similar regulations and privacy requirements for archived data.

    Archiving data means long-term storage and it is still common to use tape systems. With very long-term retention policies it is common to put the tapes on off-site storage as part of a disaster recovery program. More and more, spinning storage is being used as disk storage arrays become more prevalent. This kind of storage often includes deduplication so that the same file, an email attachment for example, will be stored once and referenced many times, and will appear to be multiple copies to the operators. Tremendous storage efficiency is achieved via deduplication but oddly, you end up with just a single backup of everything – all eggs in one basket.
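
To make the deduplication idea concrete, here is a minimal sketch of a content-addressed store in Python. It assumes a hypothetical `DedupStore` class and made-up mailbox names; real storage arrays deduplicate at block or file level with far more machinery, so this only shows the "stored once, referenced many times" principle.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept once."""

    def __init__(self):
        self.blobs = {}   # digest -> bytes (one physical copy)
        self.index = {}   # logical name -> digest (many references)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Only store the bytes if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.index[name] = digest

    def get(self, name):
        return self.blobs[self.index[name]]


store = DedupStore()
attachment = b"same attachment sent to three people"
for mailbox in ["alice", "bob", "carol"]:  # hypothetical mailboxes
    store.put(f"{mailbox}/report.pdf", attachment)

print(len(store.index))  # logical copies visible to operators
print(len(store.blobs))  # physical copies actually stored
```

The operator sees three files, but losing the single stored blob loses all three, which is the "all eggs in one basket" concern.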

    My bet is that unless the email was already part of a court matter, or has been the subject of a FOIA request or subpoenaed for a court proceeding, it is not likely to be held long, and will not make it to archive status. However – every server room that handled the mail potentially has a copy as well. That list of servers would be found from the CC: and BCC: lists.

  4. patrioticduo says:

    Yes, FOI requests should always delineate between “backups” and “archiving” systems. The processes and procedures are significantly different with completely different aims. A “backup system” is intended to allow the restoration of a server or service after a failure. An “archiving system” is intended to allow going back in time to examine data from the past. The two should not be conflated when making FOI requests.

  5. John Whitman says:

    It would be reasonable for a backup/archival strategy to depend on the nature of the owning entity. For a Forbes Top 100 international company there would be an overriding concern with long term liability and therefore very long term electronic info retention that would be well in excess of the legal retention requirements.

    For a low tier public UK University whose risks are covered ultimately by the UK Government, it sadly seems reasonable to expect much less rigorous IT strategy on backup/archival. I think that is what we are seeing from CRU’s FOI responses and I think in parallel we are also seeing a resistant attitude by CRU that is less than an open and transparent one.

    John

  6. Backups are like underpants.

    Everybody expects you to have them. But few would be willing to check how or if you wear them, how often they are washed and their state of repair.

    A good backup policy is one that is tuned to the needs of the organization and tested on a regular basis. A bad one is one off the rack; standard software that somebody comes in to install and you forget about it.

  7. davidmhoffer says:

    Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”. Wouldn’t it be standard practice to periodically preserve some of the older backups?
    >>>>>>>>>>>>>>>>>>>>>

    The statement is standard practice. The question seems to be a misunderstanding of the meaning of the statement.

    In backup systems, you can make a “full copy” of the data, or you can make an “incremental copy”. The latter is simply a copy of all the changes since the last full copy. The most common rotation is daily incrementals and weekly fulls. So yes, it is standard practice to preserve full copies of the data for some period of time, often 7 years. After that, when the next full copy of the data is made, it overwrites the oldest copy in the system, be it on disk or tape. Or the tapes over a certain age are discarded, or one of several other possibilities. There is nothing “fishy” about removing older backups.
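
A rolling removal of older backups, as discussed in this comment, might look like the following Python sketch. The function name `prune`, the 28-day window and the weekly schedule are assumptions chosen for illustration; they are not taken from UEA's configuration or from any specific backup tool.

```python
from datetime import date, timedelta


def prune(backup_dates, today, keep_days):
    """Split backup dates into (kept, removed) under a rolling window."""
    cutoff = today - timedelta(days=keep_days)
    kept = [d for d in backup_dates if d >= cutoff]
    removed = [d for d in backup_dates if d < cutoff]
    return kept, removed


# Hypothetical schedule: twelve weekly full backups.
start = date(2012, 1, 1)
fulls = [start + timedelta(weeks=i) for i in range(12)]
today = fulls[-1]

kept, removed = prune(fulls, today, keep_days=28)
print(len(kept), len(removed))
print(kept[0])  # the earliest backup still held
```

Under such a policy the earliest backup still on the server says nothing about when backups began; it only reflects the length of the window.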

  8. Ian W says:

    There is a very useful reference here

    https://www.watsonhall.com/resources/downloads/paper-uk-data-retention-requirements.pdf

    Note that some requirements will be contract specific and depend on ‘vesting’ or ownership clauses in the contract. Typically, for government contracts ownership of _everything_ is vested in the Government (be it UK or US). By everything it means everything including laboratory logs, emails, pens, desks etc etc. If some derogation from that vesting was desired it would need to be requested and would be shown in the contract with the Government department. In most cases no derogation is possible.
    Also note that the extreme cases are ‘indefinite storage’ and ’100 years’ – this is a little ambitious for data storage so it is usual to have a requirement to maintain the data in useable form. So the government isn’t given an 8 inch floppy disk, an 8 track cartridge or a stack of punched/magnetic cards; or even a stack of floppy drives with the data in an unknown format. (I have been there!)

  9. Kaboom says:

    Backups at a larger institution/company are usually governed by two things: legal requirements (i.e. for purposes of taxation or financial auditing) which are obviously not negotiable and a policy for “the rest” which is a compromise of constraints on financial and staffing needs vs. the need to recover old information.

    In the academic world one would assume that the researchers would put a big emphasis on having all their work protected with a backup plan, in particular since they usually are notoriously difficult to educate about keeping their files on the network instead of their local computer drives. They also don’t delete stuff as long as they have space, in my experience. The proper backup policy for that would be an incremental backup plan that starts with a snapshot of “what is” and then only records changes from run to run (usually just additions and revisions of single files). If space preservation is an issue, you’d limit the number of total increments and do a new snapshot every year or so and then begin increments again.

    As far as email retention is concerned, the only real issue are attachments, not the email body per se, which is just text. That doesn’t take a whole lot of space and additionally can be compressed quite well on the fly. From the perspective of backing up emails sans attachments there is really no reason to be fickle about storage requirements. Considering all the material is centralized on the email server anyway, it can be backed up quite easily and efficiently.

  10. davidmhoffer says:

    1. Is it a server that acts as a failover for the main mail server?
    …or…
    2. Is it an archiving server?
    >>>>>>>>>>>>>>>>>>>>

    Question 2 should actually be two questions.

    2. Is it a server that backs up data from other servers?
    3. Is it a server that archives data from the mail and backup servers?

    Archive differs from backup but serves some of the same purposes. In a backup system, copies of data from the production server are made on a regular basis, most often weekly, and retained for some period of time, often years. So consider an email that comes into the system. Once it is in the system, it never changes (or shouldn’t). So, after one year of weekly backups, the backup system now has 52 copies of that email. Just how many copies of the same email does one actually need? Particularly since it takes time to do the backup, and it costs money to store each copy on disk or tape.

    So an archive is actually an extension of the backup system. An archive functions not by deleting data from either the production server or the backup server, but by simply making a copy of that same email, and then telling the backup system not to make any more backup copies of it. The result is that we now have the potential to recover any given email from one of three places in the overall system, depending on what the data retention policies are for each part of the system. A typical implementation might be:

    Production system => retain for 3 months
    Backup system => retain for 1 year
    Archive system => retain for 7 years

    In the case of both the backup and archive systems though, there still needs to be a retention policy of some sort, and anything older than that retention policy should be deleted by one process or another. As an example, a brokerage firm in the US was nailed with a $1.3 billion summary judgment a few years ago when it came to light that they had inadvertently preserved emails that were beyond their retention policy. They were essentially guilty not because of what was in the emails, but because they had emails that were supposed to have been destroyed. For that reason, having a retention policy that automatically deletes copies of data older than the retention policy is not only common, it is often by deliberate design driven by legal requirements (differences between UK and US may alter this for the CRU, I don’t know).
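
The three-tier example in this comment can be expressed as a small lookup. This is a simplified sketch: the day counts approximate "3 months", "1 year" and "7 years", the names `RETENTION_DAYS` and `recoverable_from` are hypothetical, and it assumes every tier received a copy of the email in the first place.

```python
# Hypothetical retention windows in days, based on the example tiers.
RETENTION_DAYS = {
    "production": 90,      # about 3 months
    "backup": 365,         # 1 year
    "archive": 7 * 365,    # 7 years
}


def recoverable_from(age_days):
    """List the tiers that should still hold an email of the given age."""
    return [tier for tier, limit in RETENTION_DAYS.items() if age_days <= limit]


print(recoverable_from(30))    # still within every window
print(recoverable_from(200))   # gone from production
print(recoverable_from(1000))  # only the archive is left
print(recoverable_from(3000))  # past every retention window
```

An email older than every window returns an empty list, which corresponds to the automatic deletion the comment describes.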

  11. davidmhoffer says:

    Backups of operating system active drives typically use rolling backups…because why would you need a backup from 3 years prior if your intent is simply to recover the operational state of the machine?
    >>>>>>>>>>>>>>>>>

    Backups serve more than one purpose. If your only goal is to preserve the last operational state of the machine, then yes. But there are many cases where this is insufficient. Suppose you discover that you have a virus on your machine. How long has it been there? What has it been doing to the data? How do you back those changes out?

    If the virus has been making subtle changes to the data, you need to know when that started. If all you have is the last operational state of the machine (say from one week ago), and the virus is on that copy as well….you’re hooped. If the virus has been present for say 9 months, you’ll need a copy of the operational state of the machine from before that.

    The above is only an example. I’ve had situations where we needed to recover the operational state of a server from as much as 3 years prior to resolve an issue. For data, I’ve had situations where we had to go back to copies of the data which were 3 years old to resolve a discrepancy in a financial system. Operational recovery is simply one aspect of backup and archive.

  12. George says:

    I believe I have read elsewhere that these people used the Eudora mail client and retrieved their mail by pop3 from the server. This means the mail was stored on their local PC and unless they actively deleted it from the server, it would usually be stored on the server for a period of time, too. I keep many old emails on my laptop. If I back up my computer right now, there will be emails going back to 2008 included in that backup as I have my inbox and other mailboxes to which I filter messages stored on my computer.

    There is also the question if the backups were server backups, workstation backups, or a combination of both. Even if an older backup is deleted, on the next “full” backup cycle, all of those older emails that are still in the user’s mailboxes will be backed up again. I believe I read in at least one of the emails a complaint by someone that full backups were being done and it was taking a long time to finish (this would keep their mailbox folder locked so they couldn’t use email).

    So if a full backup is done you have old emails included. Then you have “incremental” backups made. It is regular practice at some point to make a new “full” backup and delete the old one and all of its incrementals in order to save space. When that new full backup is made, any old email that is still in the user’s mailbox folders will again be backed up.
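
The point that a recent full backup still carries old mail can be shown with a toy example. The mailbox contents below are invented for illustration, and `full_backup` is a hypothetical helper; only the 2 August 2009 backup date comes from the UEA reply quoted in the post.

```python
from datetime import date


def full_backup(mailbox, taken_on):
    """A full backup copies everything present in the mailbox on that day."""
    return {"taken_on": taken_on, "messages": list(mailbox)}


# Hypothetical mailbox: (date received, subject) pairs still on the PC.
mailbox = [
    (date(2006, 3, 14), "old thread"),
    (date(2008, 11, 2), "newer thread"),
    (date(2009, 7, 30), "recent thread"),
]

backup = full_backup(mailbox, taken_on=date(2009, 8, 2))
oldest = min(received for received, _ in backup["messages"])

print(backup["taken_on"])  # when the backup was made
print(oldest)              # date of the oldest message inside it
```

The date of the backup and the date of the oldest message it contains are independent, so a 2009 backup holding 2006 mail is unremarkable as long as the user never deleted that mail locally.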

  13. rgbatduke says:

    The comments above seem competent (and I speak as a 26 year sysadmin who has trained many sysadmins). There is a difference between “backup” for local files on a local computer and policy driven backup and archival storage for a department level server. There is also a fairly wide range of both the driving archiving policies and the competence of those implementing them. Finally, there are constraints from both economics and hardware.

    A current/typical department-level archival/backup scheme for a mail server or file server would be to maintain a running “hot” backup — a twinned filesystem or RAID — plus archival and incremental backups on a weekly or monthly full backup schedule with incrementals in between that could range from daily to several times a day. For a corporate site with a high cost to lost data, the backups would be regularly rotated into offsite storage so that a fire in the server room that destroyed both the primary server and any redundancy systems would not be a complete disaster. In the old days that was accomplished by literally carrying tapes offsite and storing them in a fire safe in a different building; nowadays it would be more common to store the offsite archivals in a suitably secured cloud or on an institutional server provided for that purpose.

    One thing that has altered the character of backups over the last decade or so is the sheer volume of material being backed up and the capacity of media that can hold it. At one point in time it was fairly easy to have a tape robot that could backup a department-sized server, but as data storage has skyrocketed into the tens to hundreds of terabytes of capacity, a serious mismatch between total data and non-disk media capacity has developed. Hence the increasing tendency to back disks up to other disks — nothing but hard disk farms in some sort of RAID has the capacity or reliability to store a full image of the contents of a large disk RAID used in a server. Time has also become an issue — it takes a long time to write very large amounts of data by ANY mechanism to synchronize two images on separate systems. Many of the hot backup schemes avoid enormously long backup times (times so long that the image being backed up can change significantly during the backup itself) by constantly writing only the deltas from one image over to another.

    With all of that said, mail servers usually DON’T carry terabytes of data — I’m probably one of the most egregious of email users on the planet and have anywhere from 2000 to 20000 messages saved in my mail spool at any given time (at the moment, just short of 4000, but I cleaned up and saved a full copy of my 24000+ message spool file six months ago and have kept something of a lid on it in the meantime). Even allowing for attachments, I probably have less than a gigabyte of mail spooled, and probably had at most a few gigabytes when I backed it off and started over. The entire physics department at Duke has around 200 GB of current active mail spooled, and runs a backup scheme like those described above — servers on RAID, running backup to failover server, nightly incrementals on top of periodic fulls (onto tape, frequency determined by when an incremental starts to take too long), offsite archival backups every six months or so. The University sets policy as to how long archivals must be maintained and I don’t know offhand what the current policy is but I’m guessing at least 2 years simply on the basis of CYA on liability issues. We had all sorts of “interesting” subpoenas seeking mail spool content floating around during the infamous Duke Lacrosse case a few years ago, and if anything they increased the longevity of the backups in its wake.

    Beyond that, everything stated in the CRU response makes sense. If their policy is to keep full archival images for 2 years, those images might contain mail messages dating back a DECADE or more; before I restarted my own mail directory I had messages dating back to 2007, and I have personal archives dating back into the 90s.

    I also do have to say that as much as I found the climategate emails interesting, I do think that their publication violated any number of excellent and well-founded privacy laws. I’m not a big fan of the FOIA. I am a strong proponent of open data and methods and public ownership of all results obtained with government grant support except in very limited, very specific exceptions granted for e.g. military research or contracted corporate research, but think that the place to establish the rules and enforce them is in the granting agencies (which is where it is currently happening). Either publish your data and methods and make your results freely available for open and public comment and use, or no grant money for you, end of story. Even there, I don’t care for the idea that my or anybody else’s email spools can be opened up to any damn fool who asks for them just because I might participate in government funded research. That’s not part of the results OR data I might be working with, and most of what is there is nobody’s business but my own.

    I think it is an established fact that free and open communication is a fundamental aspect of modern science. At the same time, humans need a reasonable expectation of privacy or they are unable to speak freely — they have to weigh every word of every communication as if it might be made completely public. How can I write frankly to a Dean about some student having problems if that communication can be published on the internet or in a tell-all book at the whim of somebody who thinks that they have a “right” to the information? Do I need to resort to automatic encryption of every single message I send so that at the very least it won’t be revealed unless a court order that I have a chance to defend against compels me to give it up, where there is a good chance that the court order will place strict limits on what somebody might go “fishing” for and impose severe consequences on overstepping those bounds?

    There are some cures that are worse than the disease, and the current tendency to wield the FOIA as a political weapon or instrument of a kind of harassment is one of them. Email is often an unguarded forum where we state things that we would never say, after sober reflection, in a public arena. It is also a “flat” medium, where it is impossible to easily tell when somebody is kidding or being sarcastic, where “smileys” (emoticons) were invented to convey some small fraction of the missing affect and emotion essential to sense, where critical remarks sound far more critical than they perhaps really are, where insults are more insulting, where flame wars cause us to overstate a case in the heat of battle, where we are more inclined to voice private doubts or analyze a competing point of view we don’t really agree with. If we’re going to just make it all public at anybody’s whim, we may as well just start posting all of our email messages straight to facebook…

    rgb

  14. There are still 200,000 emails with a password lurking out there……

    Where did they come from?

  15. Doug Proctor says:

    rgbatduke says:
    October 7, 2012 at 10:53 am

    He makes a very good point about the nature of e-mail discourse: that it is a lowly self-censored discussion with more passion than one would tolerate in himself in a face-to-face talk. We have to allow much more benefit-of-the-doubt, more tolerance of excess, less maturity than otherwise.

    Much of the alarmist-skeptic rhetoric is the stuff of junior high. We need to take that into account when we use the grown-up tools of FOIA and the courts to find out what is going on. A lot of it so far seems to be little more than back-stabbing and teenage dissing from the in-group about the out-group. Not all, of course, but a lot, actually most. Judgement says we are careful about the noise we make, lest we be said to be nothing more than a shepherd calling wolf.

  16. Paul Jackson says:

    Occasionally an astute IT manager will make an on-going series of back-ups of the Emails without telling anyone else; that way if the C-levels are under subpoena, they honestly don’t know it exists, yet he can still say “let me take a look around for what you need and I’ll get back to you” if they need a lost email. I hold little hope that this will be the case in anything to do with climatastrology.

    Perhaps it’s time for researchers receiving public grant money to start acting like real-life grownups, quit the embarrassing sophomoric trash-talking, the back channel communications, and the hide the original data BS. We need a data retention law for federal grant funded research data, something like can’t produce the data, give the money back or go to prison, lose the Email, give the money back or go to prison.

    Seriously, if the Hockey team is correct, then the fate of the human race rests on their ability to convince us to spend US$ trillions to fix the CO2 problem; if that were really true, would they be so lackadaisical about retaining the original data and methods? I honestly feel that these buffoons have failed to demonstrate the maturity to allow independent living without adult supervision.

  17. John Whitman says:

    rgbatduke says:
    October 7, 2012 at 10:53 am

    – - – - – - -

    rgb,

    A US gov’t grant receiving scientist’s emails discussing the government funded research he is formally and legally contracted to perform are not private emails.

    For the present controversial ‘team’ research (of CG1 & CG2 fame), if no FOI laws existed, and given that those publicly funded climate scientists refuse to provide all publicly funded research data, programming, communications and methodology, there would be virtually no recourse for getting that publicly funded info. I disagree with your apparent suggestion that FOI is being inappropriately used with respect to those non-open and non-transparent climate scientists.

    Your idea that the requirement for openness and transparency should be strictly enforced by the granting institution seems reasonable, except those are virtually all government and therefore political bodies which can be ‘political’. I think FOI is a reasonable independent line of inquiry that is reasonably outside of politics and should be continued and strengthened.

    John

  18. Dale says:

    Don’t confuse “backups” and “archiving”.

    A “backup” is a copy of the data to allow recovery in the event of a disaster.

    “Archiving” is the process of moving un-used data from primary storage to secondary storage (secondary storage could be tape, or even low-cost slow hard disk).

    Retention is set by policy, and it is very uncommon to have long-term retention. Nearly all organisations only set retention policy based on tax laws. If you need to keep something for tax reasons for five years, then retention policies will mostly be in the 5-7 years.

    Long-term retention is made difficult due to changing backup software (even upgrades of the same software), hardware changes and even deterioration of the backup media. It can be extremely expensive to cycle your retained data to modern standards. Most quotes I’ve seen come to about one thousand times the cost of originally backing up the data, to cycle it to modern standards.

    Steve should however enquire as to what JOURNALING CRU employed. This is the IT secret, and how they’ve been able to recover an email from years ago (when pushed to do it). Basically, nearly all email servers file a copy of every single inbound and outbound email in what’s called the journal. This acts as a log for IT email troubleshooting, as well as compliance with Sarbanes.

    However IT generally doesn’t admit it exists, as the journal means they can easily read everyone’s emails. (The journal is also why the IT person is the best person to find out what secret plans an organisation has, and why they seem to know a big decision ahead of time). ;)

    I wouldn’t worry too much on the server name. When IT rebuilds a server (for whatever reason) they usually use a number to indicate what iteration they are up to. It’s extremely bad practice to rebuild a server with the same name (can cause all sorts of technical problems on the network). Also, CRUBACK is most likely to be the email backup server, as in there would be an active and backup server in case the active goes out of action for any reason. It’s to ensure uptime, not to hold backups of the email server.

  19. prjindigo says:

    Such an odd set of questions to see… since most scientists keep everything and freak out if anything is lost.

    It would surprise me if there wasn’t an archived copy of everything not subject to loss-by-failure.
    These kinds of people I commonly dealt with in storage concerns because they REFUSED to lose even one e-mail.

  20. rgbatduke says:

    Perhaps it’s time for researchers receiving public grant money to start acting like real-life grownups, quit the embarrassing sophomoric trash-talking, the back channel communications, and the hide the original data BS. We need a data retention law for federal grant funded research data, something like can’t produce the data, give the money back or go to prison, lose the Email, give the money back or go to prison.

    I agree completely with regard to the data and methods, but again, that’s a matter for the federal funding agencies to require and enforce. NASA, for example, has the requirement and recently, has been quite stringent about enforcing it. I suspect that within a very few years it will be an across-the-board requirement for federally funded research. As for “trash talking” and “back channel communications”, I fundamentally disagree and indeed, think you are treading heavily on the toes of the first amendment. Both sides in this debate engage in far too much of the former — WUWT is full of trash talk and presumption of guilt and malice aforethought in “all climate researchers”, and there are at least some participants on the CAGW “side” of the discussion that are no better. But this isn’t a legal matter, it is a social one or a political one and grant agencies literally don’t have the right to take trash talk or political agenda into account when considering whether to fund or not fund a proposal.

    As for back-channel communications, are you trying to make it illegal for me to sit in my office and have a private discussion with somebody? How about calling them on the phone, am I obligated to make a recording of it if I discuss any sort of research? When I go to meetings, do we need to hire a video crew to make videotapes of not only the presentations but all of the private discussions, including those held while sitting around drinking beer late into the night (which happens remarkably frequently, as you might well imagine)? Where, precisely, is there a privacy boundary that permits my unguarded email conversations with people, where I might well be voicing new ideas, patentable concepts, doubts, concerns, or just blowing off steam with regard to a person that I think is a butt-head for entirely personal or any mix of personal and professional reasons, to be open to all comers armed with the flimsiest of excuses but still protects my right to hold exactly the same discussions offline in any number of venues?

    Personally, I don’t think there is one. I think email should be precisely as private as personal mail — literally a federal crime to open or expose without a court order obtained on grounds more substantial than a “FOI request”.

    This has nothing to do with whether or not CRU or other agents involved in the climate debate should or should not be required to provide their data and methods not only “upon request” but to make them utterly publicly available without any request. Again, NASA is actually not a bad example of the way it should run — you can get ALMOST any of the data used in NASA funded publications, and MOST of the code (maybe all of it, but I doubt it because things don’t work perfectly) straight off of the internet in real time. I’m also not commenting on whether or not there was academic dishonesty or unethical behavior exhibited on the part of the hockey team members and revealed by climategate and many other events, or whether or not there were real crimes committed in there somewhere.

    While one might wish for a greater level of maturity and tolerance by ALL participants in the discussion, one part of freedom is the freedom to be immature and intolerant in one’s communications. Usually, in science, such behavior in the long run carries the seeds of its own destruction. Nature, like honey badger, just doesn’t give a shit about what people think — it is what it is and does what it does. You can trash-talk all you want and even believe true things for stupid reasons — if you turn out to be right. But heaven help you if you turn out to be wrong, as then you are wrong and stupid, not just wrong. When discussing nature it’s a lot wiser to be open to the possibility that, no matter how passionately you believe something to be true, you could be mistaken, and practice just a teensy bit of humility on that account.

    rgb

  21. thesdale says:

    Oh, another important thing to remember, is this:

    Today, 2012, if an IT Manager did not ensure very strong and resilient backup/archive/retention policies (and test them) they would be fired on the spot.

    Back in 2006, and even 2009, that same IT Manager, in the same situation, would just get a slap on the hand and be told to do better next time.

    We cannot apply today’s thinking and standards to the past.

  22. alexwade says:

    At home, I use Windows Home Server (the second best OS Microsoft made behind Windows 7) to manage my backups. It performs daily backups. On the eighth day, it deletes all but the backup on the seventh day. To make things clear, let’s say the seventh day is the daily backup on Sunday. Then on Monday, it deletes all the backups except the one on Sunday and once again does a daily backup. That Sunday backup becomes the weekly backup. The next Monday, it deletes all the backups for the previous week except the one on Sunday. So now it has two weekly backups. And the process repeats. Windows Home Server lets you specify how many months of weekly backups you can keep. I choose 3 weeks.

    I would imagine some backups are that way. There would be the daily backup that would be cleared each week and then a weekly backup that is kept for a fixed duration before the backup media is reused or destroyed.
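
    The rotation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Windows Home Server actually implements it: the function name `backups_kept`, the parameter `weekly_to_keep`, and the dates are all made up for the example, and it assumes a backup succeeds every single day.

```python
from datetime import date, timedelta

def backups_kept(today, first_backup, weekly_to_keep=3):
    """Return the backup dates still on disk at the end of `today`.

    Assumed policy (a simplification of the scheme described above):
    - a backup is taken every day from `first_backup` through `today`;
    - every daily backup from the current week (Monday..today) is kept;
    - from earlier weeks only the Sunday backup survives, and only the
      `weekly_to_keep` most recent Sundays are retained.
    """
    all_days = [first_backup + timedelta(days=i)
                for i in range((today - first_backup).days + 1)]
    week_start = today - timedelta(days=today.weekday())  # this week's Monday
    dailies = [d for d in all_days if d >= week_start]
    sundays = [d for d in all_days if d < week_start and d.weekday() == 6]
    weeklies = sundays[-weekly_to_keep:] if weekly_to_keep > 0 else []
    return weeklies + dailies

if __name__ == "__main__":
    # Hypothetical dates: backups started on Monday 3 September 2012.
    for d in backups_kept(date(2012, 10, 10), date(2012, 9, 3)):
        print(d.isoformat(), d.strftime("%a"))
```

    The point for the question at hand: with a rolling scheme like this, the oldest backup on the server is never older than the retention window, no matter how long the machine has been backed up.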

    (Since I’m talking about Windows, let me say that Windows 8 is awful and perhaps the worst OS Microsoft ever made. Much MUCH worse than Vista. If you need to buy a computer, buy it before Windows 8 is officially released later this month and you are cursed with it.)

  23. rgbatduke says:

    A US gov’t grant receiving scientist’s emails discussing the government funded research he is formally and legally contracted to perform are not private emails.

    That is simply not true, and indeed, it is absurd. What a scientist who receives a grant is morally obligated to provide is an end product, not every single step of the process through which that product is obtained, including video recordings of every discussion with a thesis advisor, some sort of reality TV taping of every day’s work in a lab, every single bug introduced or resolved while writing the code, every single concept considered and accepted or rejected along the way. For one thing, there isn’t enough bandwidth even in the modern world to provide that kind of trail. For another, it would utterly squelch the freedom to be wrong, especially in intermediate steps of research, if every half-baked idea one has but might end up rejecting becomes part of a public record that can be trotted out by political enemies of your conclusion seeking to prove that some ideas you have had, some things you have said, some statements you might have made in an utterly unguarded venue as part of the search process that is real science turned out — surprise — to be wrong.

    I have a fair number of papers to my name, although nothing like the number a truly prolific researcher might produce, as I tended to work more or less on my own or with just one or two colleagues. I was, and am, perfectly happy for my work to be judged on the basis of what I published, but not on the basis of every single thought I had along the way and discussed with somebody. Lots of those ideas were wrong, and one of the ways I learned they were wrong was in those private discussions. Some of the actual published results turned out to be wrong as well. Other parts I’m still quite proud of; they turned out to be right, or close to right.

    A grant is given to conduct a particular piece of proposed research and publish the result. It is quite reasonable oversight for the granting agency to be able to verify that the research it contracted for actually occurred (so money wasn’t taken under false pretenses). It is certainly reasonable to write into the contract that the researcher openly publish their results, their methods, and provide open access to their data as reproducibility is a key aspect of the “product” in all sorts of actual science research. It is utterly unreasonable to require that they document every vagrant thought or conversation they have with any or every other worker in or out of the field over the period of the grant and to make them openly available to anyone who asks. It is utterly unreasonable to require them to document or provide any access to letters, emails, private phone calls, discussions held at private dinner tables or in bars, or on a daily basis in the laboratory or office, beyond the minimum needed to ensure that they were in fact present and performing the research and not basking on a beach in Jamaica and making it all up. And the latter is already S.O.P. for all Universities and other agencies that are authorized to administer grant funded research in the first place.

    Even this latter is done, as it should be, with a light touch. A grant usually doesn’t require one to spend X hours a week working on project Y; it requires that at the end of the grant period the proposed work be completed. Usually, the consequence of failure is simple — you lose your funding and have a hard time getting new funding. End of career, end of story.

    In the meantime, hands off my email spool! I try to keep all my money laundering and cocaine transactions out of my email spool because we all know it isn’t REALLY private (ask Oliver North). But I do, sometimes, express personal opinions or speculations that I would rather not trumpet to the world.

    rgb

  24. John Whitman says:

    rgbatduke says:
    October 7, 2012 at 1:03 pm

    John Whitman says, “A US gov’t grant receiving scientist’s emails discussing the government funded research he is formally and legally contracted to perform are not private emails.”

    That is simply not true, and indeed, it is absurd. What a scientist who receives a grant is morally obligated to provide is an end product, not every single step of the process through which that product is obtained, including video recordings of every discussion with a thesis advisor, some sort of reality TV taping of every day’s work in a lab, every single bug introduced or resolved while writing the code, every single concept considered and accepted or rejected along the way. For one thing, there isn’t enough bandwidth even in the modern world to provide that kind of trail. For another, it would utterly squelch the freedom to be wrong, especially in intermediate steps of research, if every half-baked idea one has but might end up rejecting becomes part of a public record that can be trotted out by political enemies of your conclusion seeking to prove that some ideas you have had, some things you have said, some statements you might have made in an utterly unguarded venue as part of the search process that is real science turned out — surprise — to be wrong.

    [ . . . ]

    rgb

    - – - – - -

    rgb,

    Hey, appreciate your return comment.

    The way I understand it, you are saying that the ‘in-process’ part of performing government funded research work is exempt from the rules regarding working for the government, that is, the process is exempt from the spirit of openness and transparency in government.

    Are you suggesting that in principle no investigation by media, private citizens or enforcement agencies should be allowed wrt the in-process activities of a government funded scientific researcher? Between the time they receive grant money and the time they produce the research product, are you suggesting a publicly funded research scientist should be exempt from investigation by private citizens, media and enforcement agencies? They get a free pass?

    My view is that by your argument you appear to be emulating the argument/position of the IPCC AR5 leadership wrt preventing in-process openness and transparency. I do not think we should just trust in their product’s unbiased, open and objective nature. Trust does not work that way. I do not think science works that way.

    John

  25. D. J. Hawkins says:

    Back in the Dark Ages (1992), I made the mistake of demonstrating to my boss I knew what DOS stood for. For my sins I was appointed the network manager (Novell 2.15; yes, that old). I had an automated tape backup routine that was incremental Monday through Thursday; Friday was a complete backup. The complete backups were Week 1, Week 2, and Week 3. The fourth weekly backup was Month 1. At the end of the next cycle, Month 1 came home with me and sat in my desk there, Month 2 stayed on site. The following cycle, Month 2 came home and Month 1 went back in the lineup to be recycled. I never viewed the scheme as anything but a backup in case something went sideways on the server, or someone had a serious “oops!” and deleted the wrong file.
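
    For anyone who wants to see that kind of tape rotation spelled out, here is a rough Python sketch. It is a reconstruction under assumptions, not the original routine: the function `tape_for`, the tape labels, the reuse of the same four incremental tapes every week, the absence of weekend runs, and the example dates are all hypothetical.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def tape_for(day, cycle_start):
    """Return the label of the tape written on `day`, or None on weekends.

    Assumptions (not stated in the comment above):
    - `cycle_start` is the Monday on which a four-week cycle begins;
    - the four incremental tapes are reused every week;
    - the monthly full backup alternates between Month 1 and Month 2,
      so one of the two can sit off site while the other is in use.
    """
    offset = (day - cycle_start).days
    if offset < 0:
        raise ValueError("day is before the start of the rotation")
    weekday = day.weekday()  # Monday == 0
    if weekday > 4:
        return None  # no backup run on Saturday or Sunday
    if weekday < 4:
        return "Incremental " + WEEKDAYS[weekday]
    week_in_cycle = (offset // 7) % 4
    if week_in_cycle < 3:
        return "Full Week %d" % (week_in_cycle + 1)
    return "Full Month %d" % ((offset // 28) % 2 + 1)

if __name__ == "__main__":
    start = date(2012, 10, 1)  # a Monday, chosen only for illustration
    for d in (date(2012, 10, 3), date(2012, 10, 5),
              date(2012, 10, 26), date(2012, 11, 23)):
        print(d.isoformat(), tape_for(d, start))
```

    With only two monthly tapes in the lineup, nothing older than about two months survives, which is the whole design: it is a recovery scheme, not an archive.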

  26. Jace says:

    Sounds like the file system that was backed up in 2009 contained email archives dating back some time

  27. D. J. Hawkins says:

    @ John Whitman
    @ rgbatduke

    John, on reflection, I find rgbatduke’s argument highly persuasive, yours less so. I am reminded that the creation of the fundamental law of the land (U.S.) took place in utter secrecy. The founding fathers understood that free and uninhibited exchange could not take place under the withering glare of political scrutiny. Once they had hammered out the finished product it could be vigorously debated in the coffee houses, newspapers and legislatures. Would it have informed the public discourse to find that, say, James Wilson of Pennsylvania thought William Paterson of New Jersey a scoundrel and buffoon (as an example only, not suggesting this was the case) during the Convention’s deliberations?

    On the other hand, it strikes me as more than a little valuable to preserve the work product after the fact for any government grant or government funded activity. This would correspond to the various diaries and notes kept by the Convention participants and published after the matters were settled. For e-mails pertaining directly to the work, it is the public’s property. How hard can this be? In my own work life, every project I work on has its own Outlook folder. If I have something nasty to say about someone, it doesn’t go in a project-related folder. At the end of the project I usually print the entire folder to a PDF document and store it with the other electronic copies of the project’s supporting documents. Then the folder gets archived on our server. Done and done.

  28. Neo says:

    In many private sector businesses, the backup policy may be customized to meet the regulatory policies that apply. It is not uncommon for all e-mails to be saved for company “insiders” (as defined by the SEC), but to delete e-mail after one year for all other employees.

  29. John Whitman says:

    D. J. Hawkins says:
    October 7, 2012 at 2:17 pm

    @ John Whitman
    @ rgbatduke

    John, on reflection, I find rgbatduke’s argument highly persuasive, yours less so.

    - – - – - – - -

    D. J. Hawkins,

    Thank you for your comment.

    NOTE: I have the greatest respect for rgb’s scientific content comments at WUWT. I am a fan from his very first comment at WUWT. : )

    Since I am a retired engineer and have not ever taken government money to do research, I am an outsider looking in to the gov’t use of money in scientific research in climate.

    rgb, it appears to me, has a much more precise insider view since he appears to have performed government funded research, but apparently not in climate.

    I expect him to be more persuasive therefore. I would be surprised if I was more persuasive in that regard. : )

    I, as a non-scientist (like some of the public), am very negative to any suggestion that we must trust climate scientists ‘in-process’ with public money. I am absolutely against it. I will trust only after I am assured that everything, including ‘in-process’ is completely done with the highest intellectual integrity on the part of the scientist receiving government funding. No ‘a priori’ trust.

    John

  30. Greg House says:

    rgbatduke says:
    October 7, 2012 at 10:53 am:
    “There are some cures that are worse than the disease, and the current tendency to wield the FOIA as a political weapon or instrument of a kind of harassment is one of them.”
    ========================================================

    I have never heard of such a “tendency”. You are the first one I hear to claim that such a tendency exists. At the moment I doubt that you or anyone else scientifically studied a sufficient number of FOIA requests and can prove your claim.

    As far as I understand, FOIA requests concerning climate have a purpose to expose climate liars.

  31. Rick Powell says:

    I want to second what people are saying here. Email is backed up, but isn’t usually archived. The backups will contain a hodge-podge of old and new stuff – whatever random people happen to keep in their inboxes. Some people delete everything right away, and others hoard everything until you tell them they’re using too much server space. But once an email message has been deleted, you only have a limited amount of time (90 days, or 1 year, or whatever the backup overwrite policy is) before the message disappears from all backups.

    Most people will save a few very old emails – if they’re important or personally meaningful – so you should see a few old messages in the backups. But just because there are a few old messages doesn’t mean all old messages will be there.

    If you’re looking for a particular message, your best bet is to find it in the backup of an account of one of the email hoarders. Another possible source is the “outbox” of the person who sent the email.
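
    To make the timing concrete, here is a small Python sketch of the logic. It assumes nightly backups that each capture the mailbox as it stands at the end of the day and are overwritten once they are older than the retention period; the function names and every date below are hypothetical, picked only to illustrate the point, and say nothing about what UEA actually ran.

```python
from datetime import date, timedelta

def in_backup(received_on, deleted_on, backup_date):
    """True if a backup taken on `backup_date` would contain the message.

    `deleted_on` is None for a message the user never deleted. A message
    deleted during a given day is assumed to be missing from that night's
    backup.
    """
    if backup_date < received_on:
        return False
    return deleted_on is None or backup_date < deleted_on

def recoverable_until(deleted_on, retention_days):
    """Last day on which some retained backup still holds a deleted message."""
    last_backup_with_message = deleted_on - timedelta(days=1)
    return last_backup_with_message + timedelta(days=retention_days)

if __name__ == "__main__":
    backup = date(2009, 8, 2)
    # A 2006 message that was kept in the mailbox is in the 2009 backup...
    print(in_backup(date(2006, 5, 1), None, backup))
    # ...but one deleted in early 2008 is not,
    print(in_backup(date(2006, 5, 1), date(2008, 1, 15), backup))
    # and with a 90-day overwrite policy it left the backups months earlier.
    print(recoverable_until(date(2008, 1, 15), 90))
```

    So a single backup from August 2009 can hold mail from 2006 without that telling you anything about the 2006 mail that had already been deleted.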

  32. Michael J says:

    Large disks are now very cheap. It was not always so. Going back a few years, backup servers were generally struggling for space so it was common to delete all but a small number of old backups.

    For simple data recovery backups, you frequently kept just two or three backups. The more paranoid (a.k.a. sensible) would use some sort of “grandfathering” algorithm.

    For permanent(-ish) archives, they were generally stored on some form of tape, or maybe a DVD or (recently) a Blu-ray disc.

    For people without an (implemented) retention policy it would be common to keep email and working copies of files in just a small number of backups. Important files (data, work product, etc.) might be archived by project at certain milestones (e.g. project release).

    Obviously people implementing a retention policy would need a longer-term solution.

  33. Ian W says:

    D. J. Hawkins says:
    October 7, 2012 at 2:17 pm

    @ John Whitman
    @ rgbatduke

    John, on reflection, I find rgbatduke’s argument highly persuasive, yours less so. I am reminded that the creation of the fundamental law of the land (U.S.) took place in utter secrecy. The founding fathers understood that free and uninhibited exchange could not take place under the withering glare of political scrutiny.

    It is extremely simple, if you do not wish to have the ‘withering glare of political scrutiny’ – then do not take government money. The government contract will state that everything that you do while claiming that government money belongs to the government and everything that you purchase (and sometimes use) belongs to the government. If you do not like those terms – do not take the research funding or job – or in some way alter the contract to remove the ‘withering glare of scrutiny’. To put it another way – the government needs you ‘to show your working’ as well as the results.

    This is absolutely no different from a 7/11 checkout being videoed: if you don’t want to be continually videoed then you can’t have the job at 7/11. It is part of the rules of the contract. If you want privacy in how you work on the research then you do not take _any_ government funding and you definitely do _not_ use any government funded systems to play those games on. Your reductio ad absurdum of continually videoing discussions is invalid. What is being discussed here is the conduct of research and discussions – on email – where the terms of the job state that there is ‘no expectation of privacy’, as mine do in my commercial job and as the woman at the 7/11 has in her terms of employment.

    It may be unfair to those who work correctly – but if the sunshine is removed then some people are tempted to, and do, make use of the darkness. The person who decides where that sunshine falls is the funding agency – in the 7/11 case it is the cash till area and the fuel pumps, in the government research funding case email is covered.


  34. atheok says:

    “D. J. Hawkins says:
    October 7, 2012 at 2:17 pm
    @ John Whitman
    @ rgbatduke
    John, on reflection, I find rgbatduke’s argument highly persuasive, yours less so. I am reminded that the creation of the fundamental law of the land (U.S.) took place in utter secrecy. The founding fathers understood that free and uninhibited exchange could not take place under the withering glare of political scrutiny. Once they had hammered out the finished product it could be vigorously debated in the coffee houses, newspapers and legislatures. Would it have informed the public discourse to find that, say, James Wilson of Pennsylvania thought William Paterson of New Jersey a scoundrel and buffoon (as an example only, not suggesting this was the case) during the Convention’s deliberations?

    On the other hand, it strikes me as more than a little valuable to preserve the work product after the fact for any government grant or government funded activity. This would correspond to the various diaries and notes kept by the Convention participants and published after the matters were settled. For e-mails pertaining directly to the work, it is the public’s property….”

    Um; which fundamental law was that? If you mean the Constitution of the United States, you are incorrect about “utter secrecy”. Only the deliberations themselves were to be considered secret and even this ‘secrecy’ meaning is loose. Remember, 55 delegates reported back to their delegations, state legislatures and others. Thomas Jefferson, who at the time was in France as United States minister, was kept in the loop via James Madison and was able to add his thinking/influence. This is before taking into account that the delegates often stayed at the same taverns/rooming houses. Secret wasn’t so secret, but what was kept quiet was any acrimony from badly reported ‘he said they said’ getting published and poisoning discussions. Basically, anyone could trundle on down and meet with one of their delegates and get the lowdown, just not the ugly arguments from the floor of the convention.

    On ‘privacy’ and other outhouse stuff; if you don’t want it read, don’t write it! The whole idea of writing something that clearly expresses your thoughts is that you thought about it before you wrote it. If gutter mouth, envy, burn your bridges, hate, pompous, ego, and many other nasty personal traits are how you want to be remembered, then by all means communicate that way (sound like some infamous people we discuss?). Don’t expect any sympathy from me if it goes public, ever.

    Having spent my career in US Government, mostly in IT, I do disagree with you rgb. A grant recipient takes Federal money and is expected to operate within ALL legal boundaries. Whether it is EEO, FOIA, lack of prejudice, and so on, you are considered a Federal representative. If claims of sex/gender/race discrimination are made against you, you will not be protected by claiming you are independent of the Federal government or that you don’t keep emails/records whatever… As the Manniacal one and UVA have discovered, grant recipients are NOT protected from FOIA requests for emails.

    On the matter of a deliverable; you took money from the Federal government and left the government with some expectations of what they will get for their money. If you fall off of a cliff, are the Feds expected to ‘write off’ that money as a bad investment? Shame, the Feds don’t agree. When accepting a Federal (and most state) grants, the grantee agrees to comply with all grant conditions. http://www.whitehouse.gov/omb/grants_circulars/.

    Steve:
    Dale gives some great advice (amongst others who cover the backup/archive detail well) about seeking the server’s journal. If you want to get pissy technical, every server those emails passed through may have a copy; that store and forward notion goes a little astray when IT guys get really going on CYA. It’s one thing to not be able to restore some cubicle worker’s damaged emails, it is entirely different when the top boss in your building is missing his great auntie’s address in an email from years ago. When hard drives were expensive, tape was cheap comparatively. Tape is slow and very linear. Now hard drives are cheap and getting cheaper all the time, plus they’re easily searched. I used to have to write an ROI (return on investment) to buy drives over a hundred MB; now TB RAIDs can be had easily even on small budgets. Even back in the early 2000s, it was very easy to punch out archivals. If someone wanted to they could use DVDs at a personal level and keep multiple whole copies on the same DVD and then keep the DVD in storage. As I type this, I have an external HD I use to copy (much easier than a backup) my work to. I also have a 32GB flashdrive I use to keep ‘fresher’ copies of my work. Back in 2007, the flashdrives were smaller but still effective as were external HDs. And this is my personal laptop.

    It may be a pain to re-install software, but it is murder to redo previous work, especially if that work is older stuff.

    Other than that, I think you’re missing the concept. When you submit an FOIA request, you should not have to state all of the possibilities where information may reside. That is their job! When you’re left unsatisfied and dissatisfied, escalate. Lame excuses from CRU/UEA are irresponsible and escalating to the next level is your only true course. I would add as documentation that UEA and CRU are trying to make FOIA requests a joke as you are supposed to identify technical options where they can look.

    Stick to the facts. Several have posted the retention policies and standards links. State the relevant policy (not the specific words, just the policy) and that you have no recourse since UEA/CRU (different every time, just like them) are uncooperative and perhaps are avoiding due diligence. Remember, when you made your original request all relevant items of your request are required to be maintained, no matter what the original expiration date was.

    As an aside; I keep remembering that one ‘team member’ said he kept all of his emails on a thumb drive and I wonder if that might’ve been ‘advice’ circulated by the team at some time.

  35. Shevva says:

    I’ll help but I get bored of this as I do it 8 hours a day. If you get confused can you ask close ended questions, cheers.

    Oh and don’t think just because you hit delete the e-mail is deleted, although this is more technical and would not have anything to do with FOI.

    Right I’m off to check the backups ran over the weekend.

  36. rgbatduke says:

    Are you suggesting that in principle no investigation by media, private citizens or enforcement agencies should be allowed wrt the in-process activities of a government funded scientific researcher? Between the time they receive grant money and the time they produce the research product, are you suggesting a publicly funded research scientist should be exempt from investigation by private citizens, media and enforcement agencies? They get a free pass?

    Not at all. Here’s what one can investigate. Is researcher A conducting the contracted research, spending a reasonable fraction of their time on it? If researcher A alleged that they spent salary money hiring postdoc B and grad student C, did they indeed do so? When postdoc B spent valuable grant moneys on a trip that was claimed to be to a workshop relevant to the ongoing research, did they in fact do so or did they instead use the money to purchase cocaine and hookers? Finally, when the contracted work is eventually published, did the author(s) of that work actually do the work alleged to have been done in the publications and obtain the results those publications assert? In the event that the work was done for an agency (such as NASA) that requires data and methods to be made available post-publication, are they in fact both available?

    None of this implies carte blanche permission to open their underwear drawer and paw through it at home, or their email spool to paw through it at work. All of the first part of it is satisfied by what we in the business call an “audit” and is a routine part of actually doing work for a granting agency that is administered by e.g. a University. Every year I fill in a form (even though I’m not currently supported by any grants) certifying my “effort” distribution among activities such as teaching, doing research, and non-University work such as consulting or entrepreneurial stuff. This certification must be co-signed by the department administrative staff and correspond within reason with their observations and resolve any complaints of me doing or not doing my job. This goes to sponsored programs, which is responsible to all of the granting agencies supporting work done at Duke. At any time, a granting agency or sponsored program can and cheerfully will conduct a far more detailed audit if there is any reason to suspect abuse.

    Nobody gets to sit in judgment of the work itself except my colleagues — which includes everybody in the whole world, but with weight given according to expertise and corresponding effort. Nothing constrains my results, which may or may not be what I expected them to be when writing the proposal, politically popular, or what everybody else gets when they study the same problem. There is no recourse for the granting agency to recover any part of the contracted funds should I or my work prove to be incompetent or for that matter mildly wasteful of their money at a level short of active embezzlement — their only recourse is to not fund me in the future if I waste their money now.

    The single exception to this is as follows: If, after my work is published, it proves to be egregiously non-reproducible in a way that suggests that I faked my results and the work itself is broadly challenged as not, in fact, being an objective presentation of work that I actually did, a granting agency and sponsored programs can delve deeper and investigate the possibility that actual fraud was committed. They don’t even need a subpoena — I give them implicit permission to do this when I apply for and accept the grant. At that time I do indeed have to open up my “books” to them, show them the details of my intermediate work, show them the actual data if I’m an experimentalist (I personally am not) and/or work from data, show them my methods or code or algebra if I’m a theorist (which I personally am).

    Only in the event that my work proves to be academically dishonest — to put it bluntly, I either faked the data or claimed to have done work I did not actually do — do my employer (the University) and the granting agency have to pursue civil remedies, and of course the possibility of criminal prosecution also looms if my actions shade over towards embezzlement rather than mere sloth or academic dishonesty. The most likely outcome in this case would be me getting fired, for cause, end of career in academia on the spot, the University absorbing the loss (they were responsible for my hiring in the first place) and reimbursing the granting agency as a condition for continuing to receive grant money in the future. It is possible, but moderately unlikely, that the University would pursue a civil suit against me to recover some or all of this loss.

    Sadly, the University is the home of a poster child for scientific misconduct, Anil Potti, see:

    http://en.wikipedia.org/wiki/Scientific_misconduct#United_States

    Crime: Faking data extensively, over years, to assert egregious claims in the general area of cancer research IIRC. Revealed by the utter failure of his work to be reproducible. Exposed and verified by interviews with his postdocs and research fellows, by delving into the details of his supposed “patient data”, by interviewing the patients. Punishment (so far): systematic withdrawal of all of his papers. Public humiliation. Termination of employment, with a near guarantee that his future career will be as a used car salesman or working in sanitary engineering. Exposure to lawsuits seeking to recover damages on the part of both the University and any patients who claim to have been hurt by the “therapies” that he claimed were fruitful on the basis of faked data. Possible criminal prosecution, although it will never happen — it doesn’t need to, he’s finished.

    Note well that the issue isn’t whether or not his work was mistaken — there is a ton of stuff published in science that turns out to be wrong. It doesn’t have anything to do with whether or not he believed his own claims — I rather expect that he did, so strongly that he didn’t think he needed to actually verify them or to the point where he was willing to bend them by a few percentage points to make them statistically significant instead of insignificant. At issue was scientific misconduct — deliberately faking data, bending it by those percentage points, and publishing something that as a consequence was egregiously false.

    Note well that at no time is the implicit right to investigate cases of possible scientific misconduct for cause by the granting agencies and the administrators of the grants to be construed as a carte blanche invitation to the general public to embark on a witch hunt at will anytime the results being published by some researcher don’t agree with their own personal and political biases and beliefs. Nor does it extend even to the granting agency and/or administrator complete freedom to open and read all correspondence electronic or otherwise.

    Finally, only a tiny handful of even the climategate emails cross the line and provide evidence of “edgy” behavior, behavior shading on scientific misconduct. Incompetence yes. An egregious tendency towards confirmation bias absolutely. An entirely unhealthy superman save-the-world complex sure. But none of these are scientific misconduct; they are scientific incompetence.

    To be explicit, Michael Mann’s promotion of bristlecone pine trees from a tiny fraction of the surface of the Earth into the dominant component in a flawed PCA method that produced only hockey stick shapes, and the subsequent promotion of his hockey stick into “the” proxy record of global temperature by political extremists is not scientific misconduct on Mann’s part, it is scientific incompetence on his part and political misconduct on the part of others. If he had faked the data it would have been scientific misconduct. Using handwritten PCA code tuned to produce a result that he expected to find, however, was just incompetent because if he were competent he would have doubted his own work and taken steps to verify that his own results were robust instead of waiting for M&M to do it for him. The one place where his activities edged over into scientific misconduct was his reluctance to release his data and code when M&M requested it, but back then granting agencies were a lot more lax with disclosure rules if they had them at all!

    This reluctance was rather understandable from a human point of view. Even if you publish results obtained in the best of faith — and I have little doubt that Mann’s results were good-faith results and do not think to this very day that he doubts that they are correct — the public demonstration that they are crap because you made a number of mistakes and failed to exercise sufficient care checking them is rather bad for a career. To some extent we create this monster by supporting a research system that despises negative results and rewards positive ones. As Feynman points out in the cargo cult lecture, negative results are often more valuable (and honest!) than positive ones but nobody gets tenure and future grant money for negative results. Why should we then be surprised when young researchers with their entire future on the line drag their heels and seek to at least postpone the day of reckoning when they make a mistake that could cost them their entire future career?

    The structure of the ivory tower itself created Potti, created Mann, and continues to create piles of poor results every day. Incompetent results. Results tainted with confirmation bias and cherrypicking of data, but still honest results in the sense that the researcher in question honestly believes their conclusions to be true and can rationalize their arguments for inclusion of the data in some way. The whole point of the scientific process is to correct for this nearly inevitable tendency by exposing it to the process of verification and falsification. It is, or should be, all right to make a mistake, as long as you don’t lie and create false data to support it.

    I am far from immune to this myself. I spent a decade or so investigating the critical behavior of the classical Heisenberg ferromagnet via Monte Carlo computations. This is a model that looks like it “should” be analytically solvable, but in fact it has so far proven intractable (only one major magnetic model — the Ising ferromagnet — has proven solvable). In the course of my simulations I discovered that the dimensionless critical temperature/coupling of the model was very, very close to T_c = 1/\ln(2) = 1.442695..., and I came to believe that it was this value (it fits to between four and five significant figures). This was naturally very exciting — it suggested that a simple/tractable solution might exist where this result could be proven!

    If one had opened up my email records to scrutiny, I’m quite certain that they would have contained all sorts of discussions of this that I had with colleagues I was working with. I was quite convinced of it; too many digits to be a coincidence, surely, I thought. However, other researchers got numbers that were very slightly different from mine: still within my error bars, but they claimed smaller error bars. I doubted their error bars were as small as they claimed; they agreed that my results were correct as far as they went but held that I needed to make my error smaller.

    They, as it turns out, were quite correct. Eventually I accumulated enough data to positively reject my own hypothesis of a “nice” analytic form for T_c — in the fifth and sixth significant digit — within my own error bars. The seductive pretty story sadly gave way to actual computations that eventually falsified it.
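    The kind of check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual analysis: the function names, the 3-sigma rejection threshold, and the sample numbers are all assumptions made up for the example, and the estimate and error bars are not the real simulation results.

```python
import math

# Hypothesized "nice" analytic value discussed above: T_c = 1/ln(2).
T_C_HYPOTHESIS = 1.0 / math.log(2.0)


def sigma_distance(estimate, error, hypothesis=T_C_HYPOTHESIS):
    """Return how many error bars separate an estimate from the hypothesis."""
    if error <= 0.0:
        raise ValueError("error bar must be positive")
    return abs(estimate - hypothesis) / error


def is_rejected(estimate, error, threshold=3.0, hypothesis=T_C_HYPOTHESIS):
    """True if the estimate lies more than `threshold` error bars away.

    The 3-sigma default is an assumed convention, not the author's criterion.
    """
    return sigma_distance(estimate, error, hypothesis) > threshold


if __name__ == "__main__":
    # ILLUSTRATIVE numbers only: one central value that differs from
    # 1/ln(2) in the fifth significant digit, with a shrinking error bar.
    illustrative_estimate = 1.44275
    for illustrative_error in (2e-4, 1e-5):
        z = sigma_distance(illustrative_estimate, illustrative_error)
        verdict = "rejected" if is_rejected(illustrative_estimate, illustrative_error) else "not rejected"
        print(f"error bar {illustrative_error:g}: {z:.2f} sigma from 1/ln(2), {verdict}")
```

    The point of the sketch is that the same central value can be consistent with the hypothesis when the error bar is large and inconsistent with it once enough data has shrunk the error bar.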

    At no time did I participate in scientific misconduct, mind you. I didn’t fake my results. I didn’t misreport them. I presented my hypothesis as a hypothesis, not a proven fact (one would have to have the derivation itself to report it as a proven fact). But I damn skippy sure believed, throughout a couple of years of work, that the result was analytic, and presented results that were very slightly erroneous by assuming (as a stated assumption) that this was T_c.

    This is only one of several pretty stories I’ve been misled by over the years. The formation of a scientific hypothesis is the formation of a pretty story. It is by its nature seductive, the observation of a pattern and a proposed explanation for that pattern. It isn’t always easy to do work to accept or reject it, either — one’s results tend to reflect the bias of assumptions made in obtaining them, the bete noire of scientific research and the reason many eyes and hands looking at things and challenging one another are good.

    In the end, there are checks and balances in the scientific process aplenty. Treating the email spools of every scientist in the world as open to public scrutiny is not, and should not be, one of them, not without far more cause than what is behind a typical FOI request. If someone wants my code, they are welcome to it. If someone wants my data (well, I personally don’t have any data, being a theorist), sure, why not. If somebody wants to read through my private emails, many of them discussing students with special needs and other things that would violate privacy laws if exposed, just so they can see whether or not there was a time I believed something that turned out to be incorrect, such as the value of T_c for the critical Heisenberg ferromagnet, well, all I can say is get a life. Go away. Leave me and my email spool alone.

    That’s a line that can and should only be crossed armed with the equivalent of a warrant issued on the basis of actual hard evidence of a committed crime, precisely like my real underwear drawer in my own house, precisely like my real, federally protected paper mail. Email and paper mail should absolutely enjoy the same protections under law, and those protections should completely block and actively punish its exposure by third parties armed with anything less than a court order, or even my employer (who technically “own” the spool file itself) acting as an agent of granting agencies that support me without just cause and due process.

    I repeat: Without a reasonable expectation of privacy, frank discussions and speculations essential to the scientific process cannot occur. Freedom of scientific discourse includes the freedom to be wrong, the freedom to be misled, the freedom to pursue the wrong path to a fault, to stubbornly stay on it even in the teeth of contrary evidence. The system has checks and balances galore built into it, and is loose enough to tolerate the iconoclast in the short and intermediate run while still ending up quite rigorous in the long run. FOI is often being used as a weapon and tool for harassment in a political war, not a scientific war, and is just as repugnant to me as the gatekeeping and other over-the-edge activities revealed in climategate. It is one thing to ask for data and methods, quite another to ask for open access to email spools so one can see if there is a “conspiracy” of some sort that can be discovered there.

    Whatever happened to ordinary civility in scientific discourse?

    rgb

  37. D. J. Hawkins says:

    Anthony/mods;

    I suggest that rgbatduke’s post at 07:56 be elevated to guest post.

  38. John Whitman says:

    rgbatduke says:
    October 8, 2012 at 7:56 am

    [ . . . ]

    I repeat: Without a reasonable expectation of privacy, frank discussions and speculations essential to the scientific process cannot occur. Freedom of scientific discourse includes the freedom to be wrong, the freedom to be misled, the freedom to pursue the wrong path to a fault, to stubbornly stay on it even in the teeth of contrary evidence. The system has checks and balances galore built into it, and is loose enough to tolerate the iconoclast in the short and intermediate run while still ending up quite rigorous in the long run. FOI is often being used as a weapon and tool for harassment in a political war, not a scientific war, and is just as repugnant to me as the gatekeeping and other over-the-edge activities revealed in climategate. It is one thing to ask for data and methods, quite another to ask for open access to email spools so one can see if there is a “conspiracy” of some sort that can be discovered there.

    Whatever happened to ordinary civility in scientific discourse?

    rgb

    ———-

    rgb

    Again, appreciate your continued dialog in this important subject of the role of open transparent professionalism by climate scientists in the performance of the entire process of executing a US government contract for climate research.

    We are from different philosophical foundations, apparently.

    Private is private, but conduct of public research is inherently not private in any common sense way.

    I see that a climate scientist receiving US government funding is not prevented from fully and creatively conducting activities via the scientific method, especially if there is a clear public expectation of ‘in-process’ openness and transparency in conduct of government funded climate research. I really see no problem for said scientist. I see no dilemma, whereas you have expressed, at length, a critically severe concern over ‘privacy’ and ‘suppressing free thought’.

    What scientist fears being seen, during the entire process of grant through final research product, as a highly professional and open scientist in public research? I see no suppression of free thought and no intimidation. I take that as civility in the highest possible mode. Where has civility gone, you ask? That is where civility should be.

    In my long professional experience in corporate America, there was never the expectation of any ‘in-process’ privacy from the owning corporation’s eyes; it did not reduce creativity nor dampen open discussion of even our failed attempts at achieving original goals . . . on the contrary the openness stimulated new ideas and solutions. Likewise, that same expectation of no privacy from owning government (and public) eyes during the conduct of government funded research is not inherently inimical to creativity, false starts and failure to meet original goals; in fact it can and should enhance the creative ideation process and scientific objectives.

    As to fundamental philosophical differences; I live in a benignly indifferent universe.

    John

  39. atheok says:

    rgb said “…FOI is often being used as a weapon and tool for harassment in a political war, not a scientific war, and is just as repugnant to me as the gatekeeping and other over-the-edge activities revealed in climategate. It is one thing to ask for data and methods, quite another to ask for open access to email spools so one can see if there is a “conspiracy” of some sort that can be discovered there.
    Whatever happened to ordinary civility in scientific discourse?…”

    Well said and well asked rgb!
    I don’t quite agree with what ‘D. J. Hawkins’ said about a guest post; but I do agree that it is a magnificent post.
    I do agree with ‘John Whitman’s’ followup. Neither in public occupation nor in private did I ever get an expectation of privacy in communications/actions except at home on personal business relations. Government business discussion at home has no protection and every time I was recertified for security classification, there was NO privacy anywhere.

    Grant recipients are subject to complete audits anytime by the grantors. As the Virginia AG discovered, he did not have the right to just FOI UVA emails. However he could have initiated a fraud investigation that has proper jurisdiction and gotten the emails that way; only any success at prosecution is in serious doubt when, as you say, a blind witchhunt is how the evidence is discovered. Without proof of illegal activity so he can subpoena legally, the AG can not search just to search.

    FOIA requests, whether by concerned or curious citizens or by questing scientists such as Steve, are for a purpose, not harassment. Is there another way that Steve can get the information he seeks? Well, that is why FOIA legislation was proposed and passed.

    FOIA does not give requesters the right to access personal information and any/all FOI coordinators know this. There are some other exemptions that were added later, such as the one about ‘unpublished’ research. I suspect that once all the dust settles regarding scientists (cough) that sought to block/deny/ignore/destroy FOI requested items, there might be additional legislation narrowing those exemptions.

    When one works ‘openly’ at work and is relatively organized, it is easy to respond to FOI’s. Hand over (or identify the repositories) and let the FOI coordinators do their work. Meet with the manager and explain what information is ‘restricted’, go over their findings and agree/disagree on points. The FOI staff should know already where to search emails and backups. Privately held and protected storage offsite is against basic work rules, unless your storage has been certified by security and steps are taken to ensure secure copies are available if you should fall off a cliff. Again, the FOI coordinator should be aware of or find out this in the meeting.

    Only when someone is very rectally oriented and insists on reviewing/approving every word/sentence/paragraph does an FOI request become unmanageable and a horror to all involved. These are also the types who believe they are the only people who have a right to ‘touch’ what they consider ‘their’ personal stuff including words strung into paragraphs.

    In a civilized society, people who are overly protective/possessive don’t play well with others. Lacking good social integration skills is one term for it. Still, everybody is expected to follow certain social protocols whether at work or home. You may be allowed to act irresponsibly at home, but it is unacceptable at work or in public. Public is interaction with anyone but you yourself alone. Spouses and family may tolerate irresponsible behavior, but that does not make it right. Even then, anyone who takes the effort can ask your spouse, family, neighbors, friends about your activities and generally find out anything they want.

    Why do people not do this regularly? To paraphrase you, “civility in discourse and life”. Nor will people seek this type of information unless one’s public actions cause others (anyone) to seek more information. From my personal perspective, I’d say that the ‘climate team’s’ public actions are cause for concern. As for those climategate emails you thought interesting (“rgb says …I also do have to say that as much as I found the climategate emails interesting, I do think that their publication violated any number of excellent and well-founded privacy laws…”), I personally thought their freely discussed and shared actions against editors, publications, other scientists were neither interesting nor normal scientific discourse. I did think they were frightening, scurrilous and definitely cause for citizen concerns. Citizens ARE concerned and under legal civil access process the citizens are asking questions.

    My apologies to Steve and Anthony. We’ve strayed far off topic and I certainly contributed to the straying.

  40. Greg House says:

    rgbatduke says:
    October 8, 2012 at 7:56 am:
    “That’s a line that can and should only be crossed armed with the equivalent of a warrant issued on the basis of actual hard evidence of a committed crime, precisely like my real underwear drawer in my own house, precisely like my real, federally protected paper mail.”
    =======================================================

    Look, talking about your real underwear, I’ll give you one example: going through Customs. They can touch your real underwear without a warrant. They can confiscate or arrest it under circumstances. You can not reasonably say they should not be allowed to go through your luggage, because you might have some real underwear there.

    Back to the taxpayer-funded climate liars, it looks like there are quite a few around and they do not just perpetrate their “research” alone, but in teams. In this context their emails are relevant and no way are their taxpayer-funded working emails private. And then there is a law allowing taxpayers to look into that stuff. This is the way the liars can be caught and I hope they will.

  41. D. J. Hawkins says:

    atheok says:
    October 7, 2012 at 11:14 pm

    Um; which fundamental law was that? If you mean the Constitution of the United States, you are incorrect about “utter secrecy”. Only the deliberations themselves were to be considered secret and even this ‘secrecy’ meaning is loose. Remember, 55 delegates reported back to their delegations, state legislatures and others. Thomas Jefferson, who at the time was in France as United States minister, was kept in the loop via James Madison and was able to add his thinking/influence. This is before taking into account that the delegates often stayed at the same taverns/rooming houses. Secret wasn’t so secret, but what was kept quiet was any acrimony from badly reported ‘he said they said’ getting published and poisoning discussions. Basically, anyone could trundle on down and meet with one of their delegates and get the lowdown, just not the ugly arguments from the floor of the convention…

    I will not claim expertise on the precise manner in which the deliberations were held. I’ve skimmed some of the commentary by Madison and it seems clear you could NOT sidle up to a participant and “get the lowdown”, and they were NOT reporting back to their respective legislatures, particularly since many of the said legislatures would have recalled their delegations if they’d gotten wind of what they were up to. They weren’t supposed to be writing a new constitution, just fixing the Articles of Confederation.

    On the broader issue, I fail to see the utility of what seems to be the general demand of “real time” open public monitoring of every e-mail under the sun possibly related to a government project. Too frequently it seems commenters here and at other blogs conflate “wrong” with “evil”. Therefore, the only result I can see is the premature and misguided pillorying of investigators as they go about the very difficult task of trying to understand the universe, or at least a small part of it.

    Unless there is evidence of E. coli contamination, I don’t much care to wade through every particular of the sausage making process. And those who do care to, should have compelling reasons related to the public good, not just idle curiosity or a desire to tar a researcher’s efforts with the brush of a few ill-considered personal or political remarks dropped among the pedestrian messages regarding the real work at hand. This is simple voyeurism and serves no legitimate interest.

  42. rogerknights says:

    rgb says:
    Without a reasonable expectation of privacy, frank discussions and speculations essential to the scientific process cannot occur.

    Steve McIntyre came to Mann’s defense (re the FOIA for his e-mails) on these grounds. But maybe there’s a middle way. For instance, maybe the judge (or an expert appointed by him) could look through the e-mails for evidence of skullduggery, if there were reason for suspicion, even without probable cause.

  43. John Brookes says:

    Look, you should just assume that they are trying to hide scientific misconduct from you. Actually getting proof of this is just a time consuming distraction. Just assume it.

  44. atheok says:

    “I will not claim expertise on the precise manner in which the deliberations were held. I’ve skimmed some of the commentary by Madison and it seems clear you could NOT sidle up to a participant and “get the lowdown”, and they were NOT reporting back to their respective legislatures, particularly since many of the said legislatures would have recalled their delegations if they’d gotten wind of what they were up to. They weren’t supposed to be writing a new constitution, just fixing the Articles of Confederation…”

    Aye in some respects, but this discussion turns the FOIA back into a very real need. Madison was in frequent contact with Thomas Jefferson. And yes, in Colonial America, it was possible to sit down at a table, buy the delegate a beer (or some of George Washington’s rye whiskey) and chat. No, the delegate would NOT tell you that they were overturning the present government (Articles of Confederation). However, if the delegate thought you could add something, he might discuss any number of ideas that were floating around the Convention.

    If you think a peer review is difficult, imagine the result if 13 delegations returned to their states and suddenly dropped the bomb of an idea of throwing out the present government? As it was, only 39 of the delegates signed the constitution for presentation to the present Congress. Add to that sudden bombshell the idea that the present Articles of Confederation Congress did not have the authority to create a new national government. That was a State choice and could have only happened if the States themselves were leaning towards trashing the Articles of Confederation.

    There were a number of plans submitted to the Convention:
    •The Annapolis Conference

    http://www.usconstitution.net/consttop_ccon.html#annapolis

    “…So it was in September 1786 that a conference was called to discuss the state of commerce in the fledgling nation. The national government had no authority to regulate trade between and among the states. The conference was called to discuss ways to facilitate commerce and establish standard rules and regulations. The conference was called by Virginia, at the urging of one of its great minds of the time, James Madison. Madison had designs on doing more than just discussing commerce, but his hopes were dashed when he arrived at the conference. Only five of the 13 states sent any delegates at all (Delaware, New Jersey, New York, Pennsylvania, and Virginia), and of those, only three (Delaware, New Jersey, and Virginia) had enough delegates to speak for their states.

    Unable to do much of anything, the people who were there sat down and talked amongst themselves. The group consisted of some of the great political minds of the time; besides Madison, Alexander Hamilton, George Read, and Edmund Randolph. Most were dissatisfied with the current system of government. The delegates decided that another conference, “with more enlarged powers” meet in Philadelphia the following summer to “take into consideration the situation of the United States, to devise such further provisions as shall appear to them necessary to render the constitution of the Federal Government adequate to the exigencies of the Union.” The report was written by Alexander Hamilton and sent to Congress for its consideration on September 14, 1786…”

    •Madison and the Virginia Plan
    James Madison submitted a plan that encompassed ideas from others. Ideas that had been long thought over and discussed with many people.

    •Sherman and the Connecticut (Great) Compromise

    “…Threats to dissolve the Convention, and, indeed, the Union, flew from one side of the issue to the other. Fortunately, when the convention adjourned that day, it did so on a Saturday evening, allowing heads to cool and deals to be made that Sunday for presentation to the Convention on Monday. On June 11, Roger Sherman of Connecticut rose on the floor and proposed:

    “That the proportion of suffrage in the 1st. branch should be according to the respective numbers of free inhabitants; and that in the second branch or Senate, each State should have one vote and no more.”

    Sherman was very well-liked and well-respected among the delegates, and spoke more in the Convention than anyone except Madison. In his time, he was a leader, respected by political friend and foe alike. His opinion carried weight. He had advanced an idea such as this as far back as 1776, when it was considered too radical to be taken seriously. This time, it not only was taken seriously, but Sherman’s voicing of his compromise may have saved the Convention from doom…”

    •Paterson and the New Jersey Plan
    Paterson and other delegates of the smaller states formed a caucus to discuss/rebut Madison’s Virginia plan. Again, many of the ideas Paterson and other small state delegates favored were the results of the, in their day, overwhelming power of the big states and large population centers. Ideas generated back in their state legislatures from dissatisfaction arising from the Articles of Confederation (AoC). Surprisingly Paterson’s group ideas were closest to their original mandate of fixing the AoC.

    •Hamilton and the British Plan
    Hamilton proposed many features of government that are recognizable in our constitution today, yet he left the Convention early and was frequently outvoted by his fellow New York delegates. His frustration was not evident when the Constitution was announced and sent for ratification, as he very strongly campaigned for the new Constitution.

    •Pinckney – step-father of the Constitution
    Pinckney is considered a step-father to the Constitution, where Madison is considered a Father. Pinckney himself did not propose radical new ideas; instead he submitted many ideas coalesced from the ideas of others. The Constitutional Convention convened on May 25, 1787. Madison submitted the Virginia plan on May 29, 1787 and Pinckney submitted his plan also on May 29, 1787.

    Both Madison’s and Pinckney’s plans had several similarities:
    •A bicameral legislature
    •The lower house, the House of Delegates, was elected by the people, with proportional representation
    •The upper house, the Senate, (Pinckney’s proposal was for equal state representation, Madison’s proposal was for state proportional representation)
    •An executive – elected by the legislature
    •Ability for veto over bills (Pinckney – by executive council; Madison – executive and judiciary)
    •National veto power over any state legislation
    •A judiciary was established

    For that matter Pinckney had submitted a similar plan to the AoC Congress before the Annapolis Conference.

    Perhaps today people think that Madison cribbed ideas from Pinckney and Hamilton cribbed ideas from Madison and King George…? The truth is that many of these ideas had been floated and discussed openly since before the AoC. With the appalling lack of AoC ability to act for the nation it was more and more apparent that changes were needed.

    The colonies and then the states had operated on a central power concept, with that power controlled by the elite, along the lines of Hamilton’s ideas. Yet the end result was for “…Madison’s idea, certainly not an original one, but unique for the new United States, was to recreate the United States under an entirely different form of government – a republican model. In a republic, the people are the ultimate power, and the people transfer that power to representatives….” http://www.usconstitution.net/consttop_ccon.html#madison

    The Delaware delegation were sent with explicit instructions to leave the Convention if equal state representation in the legislature was compromised.

    So secrecy was kept, in a manner of speaking. No one let on that the Convention was dissolving the AoC, or public confusion would have been a problem, especially given how the public over in England, Spain and France would view the situation. Diatribes and pernicious arguments were quashed by not exposing them outside the Convention. Think about what the legislature of New York would think about Alexander Hamilton suddenly returning home from a Convention where his input is of high value to the state. Can you imagine telling your boss that you left an important meeting, because you felt like it… Oh yeah that’d go over well.

    Back to the FOIA, the whole Constitutional Convention was a success, because neither it nor many of the ideas it proposed were a surprise. Many of the ideas were greatly enhanced by open discussion both in the Convention and in the years prior by anyone interested.

    Most of our knowledge about the internal discussions of the Convention came from letters and notes of participants after their deaths. Even then, they were strongly flavored by the person’s attitudes who wrote them. I am not aware of any ‘orders’ to keep the notes private (secret) so I’m inclined to believe that the authors considered it ‘ungentlemanly’ to release them. Whatever their personal reasons, their papers eventually came under public scrutiny and this was in a time where fireplaces were commonplace.

    Fast forward to today; not only are electronic systems in place, but many of them have multiple fail-safe capabilities. The Delete function at the user level may not mean what they think it means.

    rgb stated “…Whatever happened to ordinary civility in scientific discourse?” and I paraphrased that as ordinary civility in discourse. Many of us were raised with the admonition of “If you can’t say anything nice, then don’t say anything.”. Civility in discussion should always be practiced, no matter the medium.

    The example provided by the Constitutional Convention is that ideas exposed to open discussion usually are improved by that discussion. Couple that with the fact that publishing a concept or idea establishes ‘proof’ of ownership/authorship. Yeah, the world is full of smart people who can see a rough idea and then form their own expansion of that idea. Normally that’s called progress. Many people believe it’s an invasion of their ‘personal’ idea and they want to protect it from violation. They’re allowed that right and “research that hasn’t been published yet” is protected from FOIA. Publishing research carries the assumption that everything needed for anyone to replicate the research has been published. Retaining and ‘hiding’ some of the research that went into what is published, so that replication is impossible, is anti-scientific in purpose and principle.

    We’re back to, if you don’t want something read by everyone/everybody at some point in time, then do NOT write it. That is civility. When you refuse to release ALL information pertinent to published research; expect people, especially scientists, to question the research. Uncivil communications that support the research are poor reasons to claim privacy. So is feeling motherly about ‘invasive violation of personal space’ by others wanting the background information.

    It’s a big world and there are billions of people who live on it. Privacy is relative and fleeting at all times. Civility and social behaviors are necessary in all societies. Hermits and others desiring hermit lifestyles should find their own mountaintops and cease being human if that is their desire. Refusing to share is something many people try to minimize in their children by teaching them to share. Privacy is protected where necessary and open to challenge when it is not.

    Steve has legitimately requested documentation and some communications under FOIA; let us help him obtain that information. What is posted above are supposed technical challenges to properly responding to Steve’s FOIA. Does everyone truly believe this important background information no longer exists? If so, what is proper procedure regarding the published research that these documents and communications resulted in? Where is replicability?

  45. Philemon says:

    Frankly, I have very little sympathy for the Duke professor claiming that FOIA requests are “harassment.” He’s crazy if he wrote anything in an email while at Duke which in any way could lead to anything like a discrimination charge! ;)
