The Climategate email network infrastructure

Computer Room 1, University of East Anglia, Jan 13, 2009 - somewhere in here are the emails and the data

Guest Post by David M. Hoffer

Since ClimateGate 1 and 2, there has been considerable confusion about how the emails were obtained, how the FOIA requests were managed, and what was or wasn’t possible in that context.  There is no simple answer to those questions.

The ClimateGate emails span a period of nearly two decades.  During that time, email systems evolved substantially in terms of technology, implementation, operational procedures, and the job descriptions of those responsible for them.  Other technologies such as backup systems, archive, and supporting technologies for legal compliance also changed (as did the laws themselves).  Even a question as simple as whether an email could be deleted has completely different answers over time, depending on the technology implemented at any given point.  With so many moving targets, it is impossible to draw any conclusions with 100 percent certainty.

This article covers the basics of how email systems and their supporting infrastructure work, and how they have evolved over time.  With that as a common background, we can then discuss everything from the simple questions of who could delete what (and when), to how the emails might have been obtained, and, possibly most interesting of all, raise some serious questions about the manner in which the FOIA requests were handled at the CRU.

EMAIL 101

There are many different email systems, and many different ways for end users to access them.  The basics are common to all of them, however.  Each user has a “client” that allows them to access their email.  It could be an internet browser based client such as the ones used by Hotmail and Gmail, or it could be an email client that runs on your desktop computer like Outlook or Eudora.  For the purposes of this discussion I am going to describe how things work from the perspective of an email client running on a desktop computer.

The email client connects to an email server (or servers in a very large implementation).  To send an email to someone on a different email server, the two servers must “talk” to each other.  In most cases they do so over the internet.  How the clients interact with the servers, however, is part of understanding why deleting an email that you sent (or received) is not straightforward.  The reason is that an email is never actually “sent” anywhere.  Once you write an email it exists on the disk drive of the computer the client software is installed on.  Press “send” and it goes….nowhere.  It is still there, exactly as it was before you “sent” it.

A copy however, has now been sent to the email server you are connected to.  That email server makes yet another copy and sends it to the email server the recipient is connected to.  That email server then makes still one more copy and sends it to the email client on the recipient’s computer, which in turn writes it to the local hard drive.  There are now a minimum of four copies of that one email.

image
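
To make the copy counting concrete, here is a minimal sketch of the “send” step in Python.  The server name, addresses, and file path are hypothetical; the behaviour is the point: “sending” transmits a copy, and the original stays exactly where it was.

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "researcher@example.edu"       # hypothetical addresses
msg["To"] = "colleague@other.example.org"
msg["Subject"] = "Station data"
msg.set_content("Here are the numbers we discussed.")

# Copy 1: the client saves the message to the local hard drive (a "Sent" folder).
with open("sent/station_data.eml", "wb") as f:
    f.write(bytes(msg))

# Copy 2: a copy is transmitted to our email server.  That server relays
# copy 3 to the recipient's server, and the recipient's client downloads
# copy 4.  Deleting any one of the four does nothing to the other three.
with smtplib.SMTP("mail.example.edu") as server:
    server.send_message(msg)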

But wait, there may be more copies.  When researchers first started exchanging information via email, they were commonly years ahead of the rest of the world.  Most large organizations had central IT shops, but those ran financial applications for the most part; email was a curiosity at best.  Many researchers were left to run their own email systems, and it wasn’t that hard to do.  Solaris was the UNIX operating system in vogue in those days, and Solaris came with a pretty good email system built in called Sendmail.  There were many other options too.  The bottom line was that early email systems were frequently run by researchers on their own computers.

As time went on, email became more common, and more important.  Data volumes, performance, and security were all moving beyond the skill set of anyone but someone whose full time job it was to run IT (Information Technology) systems.  Researchers began giving up ownership of their own email systems and central IT shops took over.  Email was becoming mission critical, and a lot of data was being stored in email systems, along with records of contract negotiations and other important “paper” trails.  Losing email was becoming a painful matter if important information disappeared as a result.  As a consequence, the systems that protected the data on email systems also began to mature and be run professionally by IT departments.

The early email systems were just a single server with local hard drives.  As they grew in capacity and overall usage, plain old hard drives could no longer keep up.  Storage arrays emerged which used many hard drives working together to increase both capacity and performance.  Storage arrays also came with  interesting features that could be leveraged to protect email systems from data loss.  Two important ones were “snapshots” and “replication”.

Snapshots were simply point-in-time copies of the data.  By taking a snapshot every hour or so on the storage array, the email administrator could recover from a crash by rolling back to the last available snapshot and restarting the system.  Some storage arrays could handle keeping a few snapshots, others could maintain hundreds.  But each snapshot was actually a full copy of the data!  Not only could a storage array store many copies of the data, consider the question of deletion.  If an email was received and then deleted after a snapshot, even by the central IT department itself, the email would still exist in the last snapshot of the data, no matter what procedure was used to delete it from the email system itself.
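
The deletion point can be made concrete with a toy model in Python.  This is only an illustration of the behaviour described above, not how any real storage array is implemented:

import copy

class ToyArray:
    """A storage array reduced to a dictionary plus point-in-time copies."""
    def __init__(self):
        self.live = {}         # the live email store: message id -> message
        self.snapshots = []    # point-in-time copies of the live store

    def take_snapshot(self):
        self.snapshots.append(copy.deepcopy(self.live))

    def delete(self, msg_id):
        # Deleting from the live store does nothing to existing snapshots.
        self.live.pop(msg_id, None)

array = ToyArray()
array.live["msg-001"] = "the email in question"
array.take_snapshot()                      # the hourly snapshot fires
array.delete("msg-001")                    # the email is deleted... or is it?

assert "msg-001" not in array.live
assert "msg-001" in array.snapshots[-1]    # it survives in the snapshot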

What if the storage array itself crashed?  Since the storage arrays could replicate their data to other storage arrays, it wasn’t uncommon to have two arrays and two email servers in a computer room so that no matter what failed, the email system could keep on running.  What if the whole computer room burned down?  Replication to another storage array at a completely different location is also very common, and should the main data centre burn down, the remote data centre would take over.  Keep in mind as you think this through that the ability of the storage arrays to replicate data in this fashion is completely and totally independent of the email system itself.
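
Extending the toy model, replication happens at the storage layer, entirely outside the email system.  A sketch (the scheduling and transport in a real array are far more involved):

import copy

def replicate(source, target):
    # The array ships its data to a second array; the email system that
    # wrote the data is never consulted and never notified.
    target.live = copy.deepcopy(source.live)

remote = ToyArray()          # the array in the remote data centre
replicate(array, remote)     # the scheduled replication runs
remote.take_snapshot()       # the remote site keeps its own snapshots too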

Early email systems were, as mentioned before, most often a single server with internal hard drives.  A modern “enterprise class” email system would be comprised of many servers and storage arrays more like this:

image

If you recall that just sending an email makes, at minimum, four copies, consider what “one” copy on a large email system actually translates to.  In the figure above, there are two copies on the storage arrays in the data centre.  If snapshots are being used, there may be considerably more.  Plus, there is at least one more copy being replicated to a remote data centre, which may also have regular snapshots of the data.  That’s a LOT of copies of just one email!  And we haven’t even started talking about backup and archive systems yet.

Let’s return to the question of deleting email.  It should be plain to see that in terms of current email technology, deleting an email just from the email system itself is not a simple task if your intention is to erase every single copy that ever existed.

As an end user, Phil Jones is simply running an email client connected to an email server run by somebody else.  He has no control over what happens on the server.  When he deletes an email, it is deleted from his email client (and hence the hard drive on his computer), and from his view of his emails on the email server.  Technically it is possible to set up the email server to also delete the email on the server at the same time, but that is almost never done, and we’ll see why when we start discussing backup, archive, and compliance.
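
For the common case of an IMAP mailbox, the end user’s “delete” looks something like the following sketch (the server, account, and credentials are hypothetical).  Everything here operates on the user’s view of one mailbox; snapshots, replicas, backup tapes and archives are completely untouched:

import imaplib

conn = imaplib.IMAP4_SSL("mail.example.edu")   # hypothetical server
conn.login("pjones", "not-a-real-password")    # hypothetical credentials
conn.select("INBOX")

# Find the offending messages and "delete" them.
_, data = conn.search(None, 'SUBJECT "FOIA"')
for num in data[0].split():
    conn.store(num, "+FLAGS", r"\Deleted")     # flag for deletion
conn.expunge()                                 # removed from this mailbox only
conn.logout()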

On the other hand, are we talking about what was most likely to happen when Phil Jones deleted an email in 2009?  Or what was most likely to happen when he deleted an email in 1996?  The answers would most likely be entirely different.  In terms of how email systems have been run in the last ten years or so, however, while it is technically possible that Phil Jones hit delete and erased all possible copies of the email that he received, this would have done nothing to all the copies on the sender’s desktop and on the sender’s email server… and backup systems.  Let’s jump now into an explanation of additional systems that coexist along with the email system, and make the possibility of simply deleting an email even more remote.

Backup Systems

Just as we started with email and how it worked at first and then evolved, let’s trace how backup systems worked and evolved.  There are many different approaches to backup systems, but I’ll focus here on the most common, which is to make a copy of data to a tape cartridge.

At first, backup was for “operational” purposes only.  The most common method of making a backup copy of data for a server (or servers) was to copy it to tape.  The idea was that if a disk drive failed, or someone deleted something inadvertently, you could restore the data from the copy on tape.  This had some inherent problems.  Suppose you had a program that tracked your bank account balance, but for some reason you want to know what the balance was a week ago, not what it is today.  If the application didn’t retain that information, just updated the “current” balance as it went, you would have only one choice: restore the data as it existed on that specific day.  To do that, you’d need one tape for each day (or perhaps one set of tapes for each day in a larger environment).  That adds up to a lot of tape, fast.  Worse, as data grew, backups took longer (and the applications had to be shut down while they ran), while the window at night when people didn’t need their applications running kept shrinking as companies became more global.

Several approaches emerged, and I will cover only one.  The most common by far is an approach called “weekly full, daily incremental”.  The name pretty much describes it.  Every weekend (when the backup window is longest), a full copy of the data is made to tape.  During the week, only what changed that day is copied to tape.  Since the changes represent a tiny fraction of the total data, the incrementals could run in a fraction of the time a full copy took.  To restore to any given day, you would first restore the last “full copy” and then add each daily “incremental” on top until you got to the day you wanted.
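
The restore logic is simple enough to sketch in a few lines of Python, with each backup modelled as a dictionary of file name to contents, and an incremental holding only what changed that day:

def restore_to_day(last_full, incrementals):
    """Rebuild the data as of a given day: start from the last weekly full,
    then layer each daily incremental on top, oldest first."""
    state = dict(last_full)
    for daily in incrementals:
        state.update(daily)    # later changes overwrite earlier versions
    return state

# Full backup on Sunday, incrementals Monday to Wednesday:
sunday_full = {"inbox.mbox": "v1", "sent.mbox": "v1"}
monday      = {"inbox.mbox": "v2"}
tuesday     = {"sent.mbox": "v2"}
wednesday   = {"inbox.mbox": "v3"}

# Restoring to Wednesday replays Monday, Tuesday, then Wednesday:
assert restore_to_day(sunday_full, [monday, tuesday, wednesday]) == {
    "inbox.mbox": "v3",
    "sent.mbox": "v2",
}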

This worked fine for many organizations, and larger ones bought “tape libraries” which were exactly what they sound like.  They would have slots for dozens, sometimes hundreds, of tape cartridges, several tape drives, and a robot arm that could change tapes for both backup processes and for restore processes.  The problem was that the tape library had to be as close as possible to the servers so that data could be copied as fast as possible (performance degrades sharply with distance).   The following depicts the email system we’ve already looked at, plus a tape backup system:

image

By making regular copies of data to tape, which was a fraction of the cost of disk storage, the IT department could have copies of the data exactly as it existed on any given day, going as far back as the capacity of the tape library (or libraries) would allow.  Now try deleting an email from, say, a year ago.  In addition to all the copies on disk, there are at least 52 copies in the tape library.  And since we have a tape library, it is easy to make still more copies, automatically, and most organizations do.

Disaster Recovery

What if there was a major flood, or perhaps an earthquake that destroyed both our local and remote data centers?  In order to protect themselves from disaster scenarios, most IT shops adopted an “off site” policy.  Once the backup was complete, they would use the copy of the data on tape to make… another copy on tape.  The second set of tapes would then be sent to an “off site” facility, preferably one as far away as practical from the data centers themselves.

image

Consider how many copies of a given email exist at any given time.  Unlike that financial application whose “current account balance” is constantly changing, an email, once received, should never change.  (But it might, which is a security discussion just as lengthy as this one!)  Provided the email doesn’t change, there are many copies in many places, and no end user would have the security permissions to delete all of them.  In fact, in a large IT shop, it would take several people in close cooperation to delete all the copies of a single email.  Don’t organizations ever delete their old data?

Data Retention

That question can only be answered by knowing what the data retention policy of the organization is.  Many organizations just kept everything, until the cost of constantly expanding their storage systems and tape libraries, and of housing offsite tapes, started to become significant.  Many decided to retain only enough history on tape to cover themselves from a tax law perspective.  If the retention policy was implemented correctly, any tapes older than a certain period of time would be removed from the tape library and discarded (or possibly re-used and overwritten).  The copies in the offsite storage facility would also be retrieved, to be either destroyed or re-used, so that the offsite data and the onsite data matched.
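
As a sketch, a correctly implemented retention policy is just a scheduled job over the tape catalogue, with the offsite copies recalled in lockstep.  The seven-year window and the catalogue format here are assumptions for illustration:

from datetime import date, timedelta

RETENTION = timedelta(days=7 * 365)    # assumed tax-driven retention window

def expire_tapes(catalogue, today):
    """catalogue: list of (tape_id, backup_date, location) tuples."""
    keep, destroy = [], []
    for tape_id, backup_date, location in catalogue:
        if today - backup_date > RETENTION:
            destroy.append((tape_id, location))   # onsite and offsite alike
        else:
            keep.append((tape_id, backup_date, location))
    return keep, destroy

catalogue = [
    ("T0001", date(1996, 5, 12), "offsite"),
    ("T0417", date(2008, 11, 3), "onsite"),
]
keep, destroy = expire_tapes(catalogue, today=date(2009, 1, 13))
# T0001 is long past retention and must be destroyed at both sites; if it
# lingers in a closet instead, the organization is violating its own policy.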

Archive

As email systems grew, the backup practices described above became problematic.  They were designed for general purpose applications with ever changing data, and how long people wanted to keep their email was often in conflict with the retention periods set for financial purposes.  As the amount of data in an email system started to grow exponentially due to ever larger attachments, graphics, and volume, the expense and pressure on even an “incremental” backup window became enormous.  That’s where archive started to emerge as a strategy.  The storage arrays that supported large email systems were very expensive because they had to be ultra reliable as well as ultra high performance.  But 99% of all emails were being read on the day they were sent… and never again.  Only if something made an older email important… evidence of who said what and when from a year ago, for example… would an email be accessed again after it was a few days old.  So why house it on the most expensive storage the organization owned?  And why back it up and make a copy of it every week for years?

Many organizations moved to an “archive”, which was simply a way of storing email on the cheapest storage available.  If someone needed an email from a year ago, they would have to wait minutes or perhaps hours to get it back.  Not a big issue provided it didn’t need to be done very often.  Some organizations used low performance, low cost disk; some even went so far as to write the archive to tape.  So, for example, the email you sent and received in the last 90 days might open and close in seconds, but something from two years ago might take an hour.  Not only did this reduce the cost of storing email data, but it had the added benefit of removing almost all the email from the email system and moving it to the archive.  Since the archive typically wasn’t backed up at all, the only data the backup system had to deal with in its weekly full, daily incremental rotation was the last 90 days.  This left an email system, with the integrated backup and archive systems, looking something like this:

image
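
The archiving pass itself is conceptually simple.  A minimal sketch in Python, using the 90-day threshold from the example above (the data structures are invented for illustration):

from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(days=90)    # keep only recent email on fast storage

def archive_pass(email_store, archive, today):
    """email_store, archive: dicts of msg_id -> (received_date, message)."""
    for msg_id, (received, message) in list(email_store.items()):
        if today - received > ARCHIVE_AFTER:
            archive[msg_id] = (received, message)  # copy to cheap storage
            del email_store[msg_id]                # shrink the live system

# After this runs, the nightly backups only ever see 90 days of email, while
# the archive quietly accumulates everything older... yet one more place a
# copy of any given email lives.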

Ask most IT shops how many copies they have of a given email sent a year ago, and they can’t even answer the question.  Lots.

What does that mean in terms of FOIA requests?  Plenty.

Compliance

The world was rolling along quite nicely using these general techniques to protect data, and then the law got involved.  Enron resulted in Sarbanes-Oxley in the United States and similar laws in other countries.  FOIA legislation came into existence in most western countries.  Privacy laws cropped up.  Suddenly IT had a new problem, and a big one.  The board of directors was suddenly asking questions about data retention.  The IT department went from not being able to get a meeting with the board of directors to having the board shining a spotlight on them.  Why?

Because they (the board of directors) could suddenly go to jail (and some did) because of what was in their email systems.  Worse, they could even go to jail for something that was NOT in their email system.  The laws in most jurisdictions took what you could delete, and what you could not delete, to a whole new level.  Worse (if you were a member of the board of directors) you could be held responsible for something an employee deleted and shouldn’t have…. or didn’t delete and should have.  Bingo.  The board of directors is suddenly no longer interested in letting employees decide what they can and cannot delete, and when.  The same applied in most cases to senior management of public institutions.

Back to the original question.  Could Phil Jones have deleted his emails?  When?  In the early days, when his emails were all in a server run by someone in his department?  Probably.  When the email system moved to central IT and they started backing it up regularly?  No.  He would only be able to permanently delete any given email provided that he had access to all the snapshot copies on all the storage arrays, plus the archive, plus all the backup tapes (offsite and onsite).  Fat chance without the express cooperation of a lot of people in IT, and the job of those people, under laws such as FOIA, SOX and others, was expressly to prevent an end user such as Phil Jones from ever doing anything of the sort, because management had little interest in going to jail over something someone else deleted and shouldn’t have.

So…did CRU have a backup system?  Did they send tapes off site?  Did they have a data retention policy and what was it?  Did they have an archive?  If they had these things, when did they have them?

With all that in mind, now we can look at two other interesting issues:

  • What are the possible ways the emails could have been obtained?
  • Were the proper mechanisms to search those emails against FOIA requests followed?

Short answer to the second question: No.

In terms of how the emails could have been obtained, we’ve seen various comments from the investigation into ClimateGate 1 suggesting that they were most likely obtained by accessing an email archive.  This suggests that there was, at least, an email archive.  Without someone laying out a complete architecture drawing of the email systems, archive system, backup system, data retention policies and operational procedures, we can only guess at how the system was implemented, what options were available, and what options were not.  What we can conclude, however, is that at some point in time an archive was implemented.  Did it work like the description of archives above?  Probably.  But there are many different archive products on the market, and some IT shops refer to their backup tapes as an archive, just to confuse matters more.

In addition, without knowing how the investigators came to the conclusion that the emails were obtained from the archive, we don’t have any way to assess the quality of their conclusions.  I’m not accusing them of malfeasance, but the fact is that without the data, we can’t determine whether the conclusions are correct.  Computer forensics is an “upside down” investigation in which the “evidence” invariably points to an innocent party.  For example, if someone figured out what Phil Jones’ username and password were, and used them to download the entire archive, the “evidence” in the server logs would show that Phil Jones did the deed.  It takes a skilled investigator to sort out what Phil Jones did (or didn’t do) from what someone using Phil Jones’ credentials did (or didn’t do).  So let’s put aside what the investigators say they think happened and just take a look at some of the possibilities:

Email Administrator – anyone who had administration rights to the email system itself could have made copies of the entire email database, going back as far as the oldest backup tapes retained, with little effort.  So…who had administration rights on the email system itself?  There’s reason to believe that it was not any of the researchers, because it is clear from many of the emails themselves that they had no idea that things like archives and backup tapes existed.

Storage Administrator – In large IT shops, managing the large storage arrays that the application servers are attached to is often a job completely separate from application administration jobs such as running the email system.  Since the storage administrator has direct access to the data on the storage arrays, copying the data from places such as the email system and the archive would be a matter of a few mouse clicks.

Backup Administrator – This again is often a separate job description in a large organization, but it might be rolled in with storage administration.  The point, however, is that whoever had backup administration rights had everything available to copy with a few mouse clicks.  Even in a scenario where no archive existed, and copying the data required restoring it from backup tapes that went back 20 years, this would have been a snap for the backup administrator.  Provided that the tapes were retained for that length of time, of course, the backup administrator could simply have used the backup system itself, and the robotics in the tape library, to pull every tape there was with email data on it and copy the emails to a single tape.  This is a technique called a “synthetic full” and could easily run late at night, when it would just look like regular backup activity to the casual observer.  The backup administrator could also “restore” data to any hard drive s/he had access to… like the personal computer on their desk.
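
Here is a sketch of that consolidation idea, with the tape library reduced to a Python dictionary.  The tape names and contents are invented; the point is that the robot does all the work and the email servers are never touched:

tape_library = {
    "T1996-W01": {"msg-00001": "an email from 1996"},
    "T2008-W52": {"msg-09999": "an email from 2008"},
    "T2009-W02": {"msg-10432": "an email from 2009"},
}

consolidated = {}
for tape_id, contents in tape_library.items():   # the robot mounts each tape
    consolidated.update(contents)                # and its emails are copied off

# "consolidated" now holds the whole span of email, ready to be written to a
# single output tape, or "restored" to any disk the administrator can reach.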

Truck Driver – yes, you read that right, the truck driver.  Google keywords like “backup tapes stolen truck” and see what you get.  The results are eye-popping.  The companies that specialize in storing tapes off site for customers send a truck around on a regular basis to pick up the weekly backup tapes.  There have been incidents where entire trucks (and the tapes they were carrying) were stolen.  Did anyone steal CRU’s tapes that way?  Probably not.  The point, however, is that once the tapes leave your site and are entrusted to another organization for storage, they could be copied by anyone from the truck driver to the janitor at the storage site.  Assembling 20 years of email from backup tapes could be a real hassle, of course.  On the other hand, an offsite storage facility frequently has, as part of the service it provides to clients, great big tape libraries for automating the copying of tapes.  Encryption of backup tapes was a direct response to incidents in which tapes with valuable (and/or embarrassing) information wound up in the wrong hands.

But encryption has only been common for a few years.  That raises an interesting theoretical question.  The last email release ends in 2009, and the rest of the release is, in fact, encrypted.  One can only wonder: does the CRU encrypt their backup tapes, and if so, when did they start doing that?

Administrative Foul Up – One of the biggest “cyber crimes” in history occurred when a company doing seismic processing for oil companies cycled tapes back to customers for the next round of data, and sent one customer’s old tapes to a different customer.  One of the customers figured it out, and started examining the data they were being sent, which was from their competitors.  It wasn’t the first time that happened, and it wasn’t the last.

Janitor – Let’s be clear, I’m not accusing anyone, just making a point.  There’s an old saying about computer security.  If you have physical access, then you have access.  Anyone with physical access to the computer room itself, and the right technical skills, could have copied anything from anywhere.

The FOIA Requests

There are dozens of emails that provide glimpses into both how the email systems at CRU were run, and how FOIA requests were handled.  Some of them raise some very interesting questions.  To understand just how complex compliance law can be, here’s a brief real world story.  Keep in mind as you read this that we’re talking about American law, and the CRU is subject to British law which isn’t quite the same.

In the early days of compliance law, a large financial services firm was sued by one of its clients.  His claim was that he’d sent instructions via email to make changes to his investment portfolio.  The changes hadn’t been made and he’d suffered large losses as a result.  His problem was that he didn’t have copies of the emails he’d sent (years previous), so his legal case was predicated upon the financial firm having copies of them.  To his chagrin, the financial firm had a data retention policy that required all email older than a certain date to be deleted.  The financial firm figured they were scot-free.  Here’s where compliance law starts to get nasty.

A whistle blower revealed that the financial firm had been storing backup tapes in a closet, and had essentially forgotten about them.  A quick inspection revealed that a number of the backup tapes were from the time in question.  The financial services firm asked the judge for time to restore the data from the tapes, and see what was on them that might be relevant.  The judge said no.

The judge entered a default judgment against the financial services firm, awarding the complainant $1.3 billion in damages.  The ruling of the court was that the financial services firm was guilty by virtue of the fact that they had told the court the data from that time period had been deleted, but it hadn’t been.  They had violated their own data retention policies by not deleting the data, and were guilty on that basis alone.  A wake-up call for the financial industry… and everyone else subject to compliance law, which includes FOIA requests.

Suddenly deleting information when you said you hadn’t was a crime.  Not deleting information when you said you had, was a crime.  Keeping information could wind up being used against you.  Not keeping information that it turns out you were required to keep (by the tax department for example) could be used against you.  No one serious about compliance could possibly take the risk of allowing end users to simply delete or keep whatever they wanted.  From their own personal accounts certainly, but not from the company email server.  Ever.

In that context, let’s consider just a few words from one email in which Phil Jones, discussing with David Palmer whether or not he’d supplied all the email in regard to a specific FOIA request, says “Eudora tells me…”

These few words raise some serious questions.  Eudora is an email client, similar to the more familiar Outlook.  So, let us ask ourselves:

Why was David Palmer relying on Phil Jones to report back all the emails he had?  Compliance law in most countries would have required that David Palmer have the appropriate search done by the IT department.  This would have captured any emails deleted by Phil Jones that were still retained by the CRU under its data retention policy.

Was David Palmer aware of the proper procedure (to get the search done by IT)?  If not, was he improperly trained and who was responsible for properly training him in terms of responding to FOIA requests?  If he was aware… well then why was he talking to Phil Jones about it at all?

Phil Jones specifically says “Eudora tells me” in his response to Palmer.  Since Phil Jones evidently did the search from his own desktop, the only emails he could search were the ones he had not deleted.  But that doesn’t mean he found all the emails subject to the FOIA request, because email that he did delete was more than likely retained on the CRU email server according to their data retention policies.  As in the case of the financial company, the CRU may well have said they didn’t have something that they did.  In fact, we can surmise this to be highly likely.  There are multiple emails in the release in which, for example, Phil Jones says he is going to delete the message right after sending it.  But we now have a copy of that specific message.  Did he send it and then forget to delete it?  Probably not.  The more likely answer is that he did delete it, not realizing that the CRU data retention policy resulted in a copy being left on the server.  If the CRU responded to an FOIA request and didn’t include an email that met the FOIA parameters because they failed to search all their email, instead of just the email that Phil Jones retained in his personal folder… well, in the US, there would be some prosecutors very interested in advancing their careers…

“Eudora tells me” is even more curious from another perspective.  Why “Eudora”?  Why didn’t he say that he’d searched all his email?  Why specify that he’d used the search capabilities in Eudora?  Personally, I have three email systems that I connect to, and a different email client for each one.  Searching all my email and searching all my email in just one email client are two completely different things.  Most interesting!

While I am comfortable discussing with IT shops how to architect email systems to protect data and properly service legal requirements such as FOIA requests, I have to admit that I wouldn’t know where to start in terms of submitting one.  If I did, I just might ask for all the emails they have pertaining to FOIA requests, and I’d be specific about wanting all the email, regardless of whether it resides in the main email system, the archives, or on backup media, and all the email that ever existed, regardless of having been deleted by end users.  Then, for the capper, I’d ask for their data retention policy and see if they managed to meet it in their response.

Just sayin’

dmh

Comments
Dale
November 30, 2011 1:56 am

“But each snapshot was actually a full copy of the data!”
Just want to point out that a snapshot is technically not a full copy of the data. A snapshot is the difference between the old snapshot and the new. The first snapshot is a full copy, but ones after that are just the differences.
Replication is a full copy though, a replica of the original.

Ripper
November 30, 2011 2:13 am

Great post.

Larry Kirk
November 30, 2011 2:25 am

A fascinating tour de force! Relevant, revealing and much appreciated. The belief in having ‘deleted’ emails, at CRU or anywhere else for that matter, sounds about as valid as the belief of the infant that covers its eyes and says: ‘Now you can’t see me Mummy!’
I heard a very closely related issue discussed on US National Public Radio (syndicated to us here in Oz via the Australian ABC News Radio service). The interviewee was Max Schrems, who owns a website called Europe-vs-Facebook: http://europe-v-facebook.org/EN/Objectives/objectives.html
Whilst studying in the US, Max had made a legal request to Facebook for a copy of all the data that they retained from his Facebook account. He expected to receive only the small amount of current information that he had not ‘deleted’, and was therefore shocked to receive thousands of pages of data which he had supposedly deleted, but which in his words had: ‘actually only been hidden from my own view’.
Back in Europe, he feels very strongly that this policy is in conflict with the European legal concept of privacy, which he is therefore trying to get Facebook to comply with (at least at their European server, which is located in the Irish Republic).
Reading the foregoing however, I realise that he hasn’t got a cat-in-hell’s chance.
Facebook are unlikely to have set their systems up to facilitate this, and very unlikely to voluntarily apply productive resources to do it the hard way.

A Public Servant
November 30, 2011 2:30 am

I normally lurk here rather than post, but in this instance feel the need to do so, so that some very valid parts of this article are not clouded by a misunderstanding of how the UK’s FOI environment operates. Under (current) FOI legislation and statutory guidance, UK public bodies are not required to search their IT systems and email servers for copies of an email that a user has deleted from their email client or other system. A public body can legitimately refuse any FOI request that asks them to do so. It is precisely why Phil Jones deserves the opprobrium being heaped upon him for deleting emails in his possession that were subject to FOI requests: he and David Palmer would have known full well that in FOI disclosure terms “they were gone for ever”. The Information Commissioner’s Office guidance is very specific on this matter. It is why the Information Commissioner took such a dim view of the deletion and its inability to do anything about it. (The UK has no statute of limitations equivalent to the US’s, so the time restriction on the ICO’s ability to act has to have been deliberately drafted by the civil servants who drafted the Act.) David Palmer followed the correct procedure when he asked Phil Jones to report back on the emails he held.
The truth is far more damaging than any possible conspiracy or failure to search IT systems to comply with an FOI request: there was deliberate connivance to delete a series of emails subject to an FOI request, in the full knowledge that it would prohibit disclosure and would probably never be discovered; and if it were, it would probably be too late for anyone to do anything about it.
Remember, were it not for “Climategate”, we would not know (for a fact) that Phil Jones had systematically deleted emails subject to FOI requests.

November 30, 2011 2:54 am

Excellent sleuthing, David. A thoroughly worthwhile read.
You say that latterly emails would have been encrypted on the server. I wonder if that is what is in the encrypted portion of the CG2.0 batch, and FOIA wasn’t able to decrypt it himself but released them anyway?
If that were the case, the CRU team would be able to use their own key to see what was in that batch. I wonder if they are poring over it as we write?

Duke C.
November 30, 2011 3:36 am

Nice work, David. Just wanted to add- the oldest email goes back to 1990, not 1996 (636048969 contained in the “all.7z ” archive).

November 30, 2011 3:51 am

Thanks David, a brilliantly clear post that all should be able to comprehend.

Michael Larkin
November 30, 2011 4:04 am

What a wonderful and educational article – thank you very much, David Hoffer! 🙂

Russell
November 30, 2011 4:09 am

While interesting, this overstates the availability of data *in readable form*. For most of the time period covered in the emails, one would need root privileges to read the emails, even if they were from a stolen tape (or any tape). The data is backed up and archived generally as described, but the security information is backed up with it, and the correct passwords etc. are still required. It’s possible that the password security was shoddy and easily cracked, but it’s more likely that someone with the passwords liberated the data (which doesn’t mean they were supposed to have the passwords). It’s unlikely to have been collected from tapes either, simply because tapes have changed formats so many times over the years. (I am dutifully retaining tapes for the network I manage in accordance with the law and the retention policy, but I’ve no idea where I would find a drive to read most of them if it ever became necessary.)
Good summary of why the data still exists!

Shevva
November 30, 2011 4:11 am

Sorry just about to read the post but might be worth cutting the front page post as it’s the full post [please delete]

John
November 30, 2011 4:13 am

Great article. I wanted to add a comment to Public Servant, but first, thanks; I was wondering how UK law works. That may be why the CRU is where it is.
A Public Servant says:
November 30, 2011 at 2:30 am
I normally lurk here rather than post, but in this instance feel the need to do so…..
That said, and I am sure many posters here know this or have figured it out while reading: many of these emails were sent to sites in the US (much better FOIA laws).
Many of those addresses in the US are universities, publications, private and public entities, and US locals, and likewise servers as described above, like …llnr.gov, which if I recall is Lawrence Livermore National Laboratory, and so many more, which I am certain can be accessed via the many back-ups and redundancies as described. Good luck and start hunting. By hunting I mean FOIA requests to those entities, and now…

kim2ooo
November 30, 2011 4:13 am

Great Read! Thank you 🙂

SteveW
November 30, 2011 4:16 am

Just a thought, and one which I may well pursue.
Assuming the above to be correct (and I have no reason to assume otherwise), is it fair to assume that somewhere within CRU’s systems there should be copies of all the emails Phil Jones stated that he’d spent half a day deleting (can’t pull the precise reference out of the air, but I’m sure you know the mail I’m talking about)?
Would their systems also have a record of what was deleted by a specific user from their own personal folder, and when?
If so, I feel a request for any emails deleted by Phil on the date of the aforementioned email, or within 7 days thereof, could prove to be quite interesting reading.

Roger Knights
November 30, 2011 4:32 am

A Public Servant says:
November 30, 2011 at 2:30 am
… UK public bodies are not required to search their IT systems and email servers for copies of an email that a user has deleted from their email client or other system. A public body can legitimately refuse any FOI request that asks them to do so. It is precisely why Phil Jones deserves the opprobrium being heaped upon him for deleting emails in his possession that were subject to FOI requests: he and David Palmer would have known full well that in FOI disclosure terms “they were gone for ever”.

Ah-HA!

Shevva
November 30, 2011 4:42 am

Just a few points: the daily, weekly is generally more like daily (incremental), weekly (Friday night, full), monthly (full) and year end (full); generally the year end tape is kept until FOIA is up (7 years I think) and you have the monthly back ups for the year before. You can delete e-mails from your client, but if you’re sitting in front of your Outlook now, try Tools->Recover Deleted Items; now they’re not deleted anymore (depending on your system setup).
The troubling point, as you say, is that they span 20 years. There is no way that any IT department would keep the e-mails live on the system for that long, so the only way to get 20 year old e-mails is the back up tapes. Why the back up tapes? Because 20 years ago this was the only way to back up data (20 AS/400 tapes a night was a lot of tapes to take off site each night).
Educated guess: you restore all 200,00 e-mails from the back up tapes and then copy the data to a DVD, copy the tapes to second tapes and take the second tapes home with you, or just simply walk off with the originals (although someone might notice this).

Slartibartfast
November 30, 2011 4:46 am

Comment section should be interesting today, given the fairly recent dustup between David Hoffer and another commenter whose name escapes me over who has the biggest…expertise in this area.

steveta_uk
November 30, 2011 4:52 am

There was recently an email outage at a company I’ve been doing some work for. It took a couple of days to restore the systems, then feed thru the pending stuff that was queued while the servers were out. I have IMAP access to their email servers.
On completion, every email I’d sent or received over the past 3 years was back in the folder structure visible via Thunderbird, so there were some hours of sorting to do to work out which ones I’d deliberately kept and which could be deleted.
I wasn’t aware until this happened that deleting emails, even using IMAP, doesn’t actually delete emails – just removes them from my ‘view’.

steveta_uk
November 30, 2011 4:56 am

Russell, given a tape of an email archive, there’s nothing to prevent someone dumping the tape content as a raw device and extracting the text, assuming the data isn’t encrypted.
Sure, if you restore the archive, then you restore the security, but the raw tape might just have all the emails in readable form. This could even explain why the email numbering in the FOIA releases is apparently random.

November 30, 2011 5:04 am

Very lovely article (and I say that as a professional). I’d add only a few things to it.
No history of people bitten by email “deletion” (not!) should exclude good old Oliver North in the Iran-Contra scandal. He tried to hide his covert activities by deleting all of his email relevant to it from his email client, to be sure (removing it from the primary email server his client connected to) as well as deleting any personal copies of messages and so forth that he had. He forgot about backups. Big oops. Quoting a nice little review here:
http://www.akdart.com/priv9.html
“Apparently there are many security-conscious individuals — even some who own paper shredders — who don’t know or don’t care about residual information from their deleted computer files. But there have been many public figures in recent history who have learned about this issue the hard way. That’s how U.S. Senate investigators got evidence on Col. Oliver North. E-mail messages that North believed to be deleted were found and used against him in litigation. A total of 758 e-mail messages were sent, involving him in the Iran-Contra affair, and every one of them was recovered.”
This was the mid-1980s, and was one of the first cases where this happened, and it is now an iconic event, the legal precedent that started the whole “responsible archiving” thing. Today, to be REALLY secure, you not only have to delete the file, you have to go in and do bad things to the physical disk platter to be certain of putting the data firmly beyond the reach of somebody who really wanted to get at it, say the NSA. Yeah, you can encrypt it instead, but a court can always compel the decryption key with the alternative of life in prison in contempt of court until you decide to comply. Data security is serious business that becomes more difficult and expensive as the stakes in the game go up, and just who “owns” data and what their rights/expectations are to data privacy are open questions with the probable answers of “anybody who can get at it by fair/legal means or foul” and “none”. Few people understand the means or are willing to pay the costs to really secure their data and information exchanges against less-than-NSA-level efforts to access it.
The second is an illustration of the power of discovery and its effect on e.g. University governance these days. Those of you who live in the US are likely aware of the infamous Duke Lacrosse Scandal. The story, in a nutshell, is that the members of the Duke lacrosse team had a party in a house just across the street from East Campus and hired three strippers as part of the entertainment (off campus, not Duke’s business, technically). One of the strippers was found in her car in a nearby parking lot completely wasted — drunk and drugged out of her mind (and obviously, driving at least to where she was found). In the hospital, when they asked her how she came to be in such a state she replied that she had been gang-raped by members of the Duke lacrosse team.
Instantly (in a way that is eerily reminiscent of the way CAGW caught on) she went from being a drunk, stoned, stripper being arrested to being a victim celebre — the investigating detective hated Duke and had a history of harassing undergrads and the DA was involved in a tough re-election battle. The “victim” (Crystal Gail Mangum) was black, female, bravely working as a stripper (or so it was implied) to support her family and educational aspirations, the perpetrators were from a mostly white group that was “elite” even at Duke, an elite University.
It was almost surreal as my own wife and sister-in-law were openly believing all of the charges, where there’s smoke there’s fire, etc, and they are both physicians and not stupid. I myself counselled caution and wait and see — even from the beginning there were a lot of things in her story that didn’t add up, and the DA was obviously pursuing the case in a completely inappropriate way. In the meantime, a number of Duke faculty — now called “the gang of 88” — took out a full page ad in the local paper where they expressed shock and regret on behalf of Duke that any such thing could have been permitted to occur, effectively convicting the three young men who were eventually charged (picked out of a lineup that was one of the most biased and telegraphed lineups in history, with ONLY members of the lacrosse team in it so that no matter who she picked, she’d get a lacrosse player). Long story short, the case completely fell apart, the DA was fired and disbarred, Ms. Mangum currently resides in prison (I imagine, for life) for allegedly knifing her last boyfriend. But that’s not the story.
The completely exonerated students, quite reasonably, started filing massive lawsuits right after their release. One of them was against Duke and all of the faculty in the gang of 88. One of those faculty was in the physics department (a good friend of mine, actually, and again very, very much not stupid, although he should probably have known better). Well before Duke got “served”, however, word came down from the administration that discovery was imminent and that our department IT staff (where I’m nominally a member as I actually built most of our first serious Unix client/server network, although at this point I hardly do anything any more) was to lock everything down, making copies of our archival backups and delivering them up for offsite and out-of-our-hands storage. I mean backups of everything — email, files on disk, the whole nine yards, going back over incrementals and snaps covering the entire time around the incident. Not just of the one person affected — of everybody. IIRC (I’m not certain of this) this request was made of pretty much the entire IT superstructure at Duke, which is not really particularly centralized. It was both expensive and universal, in other words, but the CYA principle dictated that it become IMPOSSIBLE for anyone in the Duke administration OR the gang of 88 OR (for that matter) anyone else on campus to delete any electronic document involving the lacrosse case.
I have no idea who is really “in charge” of the UEA-CRU, or whether UK discovery laws are similar to our own, but the CYA principle strongly suggests that they are fairly unlikely to permit tampering with their backup system, and indeed may have taken steps to ensure that they can comply with court-compelled discovery of all sorts, not just “FOI” requests. The case of data and methods stored on CRU’s lovely beowulf-style compute cluster (visible in the picture above:-) is a bit hazier — research groups and units often are responsible for their own backup, and if you have (say) a few PB of data somewhere, backing it up outside of RAID (which isn’t really a backup, it is just protection against corruption and loss on disk failure) requires that you have ANOTHER PB (that’s “petabyte” — 1000 terabytes (TB)) server to do it on, and that starts getting very expensive.
Now I have no direct idea what sorts of data storage requirements the CRU has — it may well be in the TB range and be thoroughly backed up and snapshotted. However, it is common practice to back up AT LEAST the code! Who would ever risk losing their own code! That’s typically an investment of FTE years on the part of any researcher that publishes computed quantities. In addition to the code, I’d absolutely expect any modest dataset (that is, a set in the 1 to 1000 GB range, nowadays) to be thoroughly backed up, given that TB-scale disks are cheap and plentiful. Given this — and I think any expert who has helped manage scientific computing would agree with it and be willing to testify to that effect — the main question at the CRU is who is going to be covering whose rear end, and what they’re willing to risk while doing so.
Jones et al or back in the US Mann and co (the researchers) might well want to cover THEIR OWN asses by deleting all sorts of things — email, data, code, whatever. Indeed, Jones could lose things without even meaning to out of sheer computer incompetence — not backing things up, overwriting them, not using a version control system with primary repository on a backed up system. On their own, sufficiently local servers they MIGHT be able to manage it, too!
Or rather, they could if they were me instead of being them — Jones in particular is such a computer klutz that he would be utterly incapable of going into a local mail/file server and deleting local copies, going into the archives and deleting selected parts of the archived copies, and so on, covering his tracks in the end (assuming that they are still “directly” accessible online on e.g. a tape library and that he has access to any offsite copies that might exist). It wouldn’t be easy even for me, even on a smallish system without offsite backups where I have direct root access, but at least I could take a stab at it.
If those servers are managed by third parties, however, it becomes almost impossible. Their IT staff have their OWN asses to cover, as do the toplevel administrators, who usually are risk averse. There is humiliation and loss of prestige and grant funding, and then there is humiliation and loss of prestige and grant funding accompanied by cash settlements and possible jail time! For them, not the researchers.
To summarize: Mann may well have had his own server(s) and done a lot of his own system management at UVA. In this case he — possibly with some help — may have actually been able to delete or hide all sorts of things that are being sought in the current FOI request. UVA almost certainly does not still have full archival copies of everything being requested, and is likely stonewalling and awaiting a court order BOTH because of a sincere desire to protect the privacy of its faculty (a laudable one, frankly!) AND because they may fear being caught out not having such an archival copy. As noted in the original article above, you can be damned either way, although the greatest damnation is reserved for those that do something like deliberately alter or delete records when they have been served or expect to be served with a discovery order of any sort, not just FOIA. Given the size of the CRU and its probable IT structure, I’d be very surprised if their records are not all carefully preserved, intact, offsite, and completely beyond the reach of any researcher without the aid and abettance of a serious system wizard. I’d be very surprised if any member of that staff would comply with an order by e.g. Jones to alter or delete part of that backup record — such a request would have to go through channels and would leave traces of its own and leave a whole bunch of people open to lawsuits and possibly jail time.

1DandyTroll
November 30, 2011 5:13 am

In EU we archive stuff for the purpose of fueling our deep seated perverted need to just keep stuff around because it feels safe. And that’s pretty much the only reason, because if you want to look at it you really need a damn good rational reason and usually a court order, even when you don’t need one, to do so. It’s a democratic dilemma in a newly open democratic society that has its roots in countries that have been run by socialists for the better part of the last hundred years. :p
I think you complicate things too much with the email issue.
1. Eudora was one of the more common clients in education, and people over 60 tend to change to new software when they’re dead.
2. No educational organization has implemented a proper utopian email system.
3. All systems suffer from fundamental flaws in the security design: they’re run by morons who want to constantly save time and money, which is why people end up having all sorts of access rights, because they’re performing all sorts of administration duties, not least because in educational institutions people come and people go, when they really shouldn’t.
4. Usually, in politicized controversies, leaks and hacks are not done by super villains from the opposition but by the very side that cries about having been hacked and sprung a leak. Sometimes knowingly so, if only to have the opposition running in circles wasting time, and if it works, hurray, but if it fails there are always a couple of guys to feed to the wolves, because sometimes they don’t really tell the upper echelon about their trek into the adventures of smoke and mirrors lane.
5. Most people, it seems to me, in politicized controversies, who get caught having hacked and leaked, get caught because they’re, in fact, not super villains, but village idiots suffering a hubris level of naiveté, like using their own names when asking for help to solve data problems, apparently, all the while thinking nobody will ever be the wiser. :-p

November 30, 2011 5:14 am

Great summary, Dave. Those emails still exist. They can never be completely deleted, and they are damning evidence of the conniving climate alarmist clique that is perverting honest science for their own self-serving interest.
For Phil Jones, behind that silly, fatuous Jimmy Carter-type smile, there lies an evil, vindictive mind. Jones is completely anti-science, and self-serving at the expense of all honest, skeptical scientists [the only honest kind of scientists].
For the good of civilization Jones must be removed from his position; tried, convicted and imprisoned for misrepresenting science, and for deliberately conniving to destroy the careers of honest scientists. Phil Jones is nothing but a criminal with a thin veneer of faux respectability.

November 30, 2011 5:31 am

Thanks for the lesson !!!
Very informative !!

November 30, 2011 5:38 am

Thanks for the education.

R Barker
November 30, 2011 5:39 am

Thanks David. Very informative. I wonder what gems might be revealed in the all.7z file(s).

Frank K.
November 30, 2011 5:44 am

Nice article – Thanks.
I hope this encourages everyone to remember to be professional and respectful in your e-mail correspondence at work (in particular), and even in your personal e-mail/g-mail/hotmail account. The internet remembers…

davidmhoffer
November 30, 2011 5:56 am

Dale;
The first snapshot is a full copy, but ones after that are just the differences.>>>
That is mostly correct. However, there are multiple snapshot architectures. “Split Mirror-Extent” makes a full copy and then records incremental changes after that. “Copy on Write” never makes a full copy; it just makes copies of the original portions of the data that would otherwise have changed, in order to reassemble what any given point in time looked like. “Redirect on Write” never overwrites data in the first place; it just copies the pointer tables at the given time and then reverts to those.
Then there are techniques such as journaling, which also provide the ability to restore data to a specific point in time, and are sometimes used instead of (or as well as) snapshots.
Of course for this answer to be complete… I’d have to also describe vaulting 😉

Stephan
November 30, 2011 5:59 am

Well Lucia is now showing that the IPCC projections are now outside (below) the error bars LOL
http://rankexploits.com/musings/2011/la-nina-drives-hadcrut-nhsh-13-month-mean-outside-1sigma-model-spread/#comments
and that does not include November temps hahahah

davidmhoffer
November 30, 2011 6:01 am

A Public Servant;
Thanks for the clarifications! I knew that FOIA laws were implemented differently in the UK than in NA; your explanation clears up a lot of my own confusion as to why things were done a certain way.
If I read your explanation correctly, then in the UK one could actually avoid an FOIA request by deleting the email, even though the institution itself would clearly still have it. How odd. Sort of like a law that comes with a get out of jail free card attached to it!

Warren in Minnesota
November 30, 2011 6:06 am

Excellent article!
I will add that another minor problem with storage is similar to someone who now wants to play 8-track tapes: the hardware is no longer available or supported. Computer and storage technology, as well as software, constantly change, making retrieval difficult when hardware and software are no longer available.

Dodgy Geezer
November 30, 2011 6:09 am

One minor point – law does require an evidence chain. In theory, I could delete all copies of a damning email from my PC and my email server, and then claim that copies found on systems belonging to someone else were forgeries which had been placed there to incriminate me. In justification, I would show that the corresponding files did not exist on my system…
In reality, if someone tried that I am pretty sure that the balance of probabilities would be firmly against me, and, in the UK at least, there is the ‘legal fiction’ that a computer is deemed to be operating properly unless evidence can be given to the contrary, so the remote servers might be deemed to be definitive.
But it would be an interesting argument to put up….

davidmhoffer
November 30, 2011 6:10 am

Shevva;
The troubling point as you say is that they span 20 years? For this there is no way that any IT department would keep the e-mails live on the system for that long so the only way to get 20 year old e-mails is the back up tapes, why the back up tapes because 20 years ago this was the only way to back up data >>>
As old email systems were replaced by new email systems it would have been a hassle (but just a hassle, not impossible) to ingest the old system’s email database into the new system’s email database. Similarly, tape backup evolved over time. 20 years ago a tape cartridge could store a few megabytes. Now a single tape cartridge can store on the order of a terabyte. As backup systems evolved, ingesting data from old backup tape systems into the new ones was also a hassle, but certainly nowhere near impossible.
Building an archive would also be a way of addressing these issues. Most archive software can ingest email from most email systems…and export it back to most email systems. Once you have a good solid archive system in place, evolving the email systems without actually losing access to older email becomes very easily managed.

RobB
November 30, 2011 6:15 am

Bearing in mind the long time frame of these emails it does sound most likely that it was the archive that was accessed. I would be interested to hear whether it would be possible to remotely access an email archive or storage facility or do you physically have to be there to get at it? In other words, can such facilities be hacked off-site? The answer might put to bed one of the more common arguments expressed in the blogosphere!

davidmhoffer
November 30, 2011 6:18 am

SteveW says:
November 30, 2011 at 4:16 am
Just a thought, and one which I may well pursue.
Assuming the above to be correct (and I have no reason to assume otherwise), is it fair to assume that somewhere within CRU’s systems there should be copies of all the emails Phil Jones stated that he’d spent half a day deleting>>>
Depends on exactly when in time he did the deleting, and exactly how the systems were implemented at that specific point in time.
SteveW
Would their systems also have a record of what was deleted, and when, by a specific user from their own personal folder?>>>
Is it possible? Yes. Is it likely? Yes. Do we know for certain that the specific implementation of the email system during the specific time period in question retained that information? No. What is most likely? Most likely yes.
SteveW
If so, I feel a request for any emails deleted by Phil on the date of the aforementioned email, or within 7 days thereof, could prove to be quite interesting reading.>>>
I’d check out the comment from A Public Servant above. In the jurisdictions I’m familiar with, you could do that. From the looks of A Public Servant’s comments, this may not be so in the UK.

November 30, 2011 6:18 am

“The troubling point as you say is that they span 20 years? For this there is no way that any IT department would keep the e-mails live on the system for that long…”
Not true. I know lots of people who have mail spools that go back at least that far, for at least selected messages. My own would, except that I get so damn much spam that it grows to where I have to make my own archival copy of my own mail spool and then reset the whole thing back to “zero messages” (I have almost 20,000 messages in my current mail spool). The oldest mail message that I have in my own regular email archives is from 1993, although there is nothing special about that. I probably do have messages archived that go back to bitnet days in my research directory space.
The question is whether or not storage for mail spools grew faster than people’s mail spool files. For an “IT rich group” with its own server(s) and mail spool and plenty of money for big disks, individuals may literally never have been asked to delete email to make room, or may have selectively deleted away crap and retained communications from their closest colleagues, especially (in the latter case) if they thought the discussions were important. For example, one of the 1994 archives I have references the steps taken to catch out a postdoc who was engaged in IP theft of code and data from a major (but poorly secured) project in a department at Duke. Since I did the actual work of running him to ground (doing the kind of thing described elsewhere, correlating the logs of several systems at once to validate the assertion that it was really him and not somebody who had his password) and since there were all sorts of liability and legal issues involved, I kept all the records of the correspondence.
This latter case is still apropos today. Absolutely anybody could have stolen the CRU archives. By absolutely anybody, I mean that I (who have never been anywhere near East Anglia) could, under certain circumstances, have done it. So could Joe Cracker from the middle of Iowa, or a paid hacker employed by the near-mythical Oil Cartel that somehow fails to send me money for being publicly skeptical. All it takes is either mad skills and a certain amount of luck (both needed to crack a competently managed site) where the skill/luck combination steadily decrease with the competence of the CRU IT staff and/or their reliance on notoriously insecure server software and technology (such as various incarnations of WinXX and Outlook, but Unixoid servers can be just as badly run and insecure).
Note well that there is no real reason to think that the perpetrator (if any) physically resided inside the building at the time. A scenario like:
a) CRU IT staff person X often works from home. His teenage son uses the same WinXP computer to play games. The son downloads a WoW hack that contains a trojan and a sniffer/keystroke logger. The IT person logs in to CRU’s main servers to make sure the overnights are going well, or to finish a chore he started during the day (all IT people get their best work done at night anyway), unaware that his keystrokes are being logged.
b) Joe Cracker (who built the trojan and receives the output after umpty breakouts designed to protect him) gets the logs — and there are a LOT of logs. He runs them through a filter to pull out things like probable password entry sequences, and is scanning through the addresses when he notices that he’s hit the jackpot! Hot damn, entry into the CRU. Not only that, but as an IT staff person — he now has the ROOT password on the SERVERS.
c) Joe himself, or one of Joe’s friends, or somebody Joe approaches to sell off his discovery to, takes said password, goes in, and — doing what crackers do — looks around a bit. Maybe he’s curious about AGW and starts reading the mail spools of these “famous” AGW people. Lo, he discovers that they aren’t heroes, they are a**holes! He’s almost by definition an anarchist rebel who sees himself romantically. For whatever reason, he sets up a script that will take a snapshot of the entire /var/spool/mail directory for starters, then looks around in e.g. home directory space and elsewhere and grabs other stuff as well. CRU must have very high bandwidth (big datasets flying across the Atlantic to colleagues both ways) and of course Joe has the highest bandwidth he can afford — if he’s at a university himself, very high indeed, but even at home probably a premium service. Minutes to hours later, he has everything that has appeared in CG1 and CG2, and maybe more! Why limit what you steal while you have the opportunity! Take it all, sort it out later!
Some time later — perhaps even the next MORNING — the intrusion is noticed, passwords are changed, and so on. Barn door locked, horse gone. Perhaps Joe installed a backdoor, perhaps not. The point is that there are plenty of perfectly reasonable scenarios (where I know as historical fact not one but SEVERAL cases where almost precisely this has happened in the past, and it is one of my own personal nightmares and one reason I never use my sons’ systems at home to do work or let them use my personal system at home for much of anything at all) whereby perfect strangers could have cracked CRU and taken whatever they wanted.
As noted in the article above, the barrier goes down the closer you are to the server, the more you are inside its security barriers (e.g. firewalls, VPNs). Hence postdocs on site, disgruntled faculty, the janitor, IT staff gone bad… But this isn’t a national security site — it is a bloody university research department doing non-classified research, so it is virtually certain that people have e.g. ssh access or vpn access or even rdesktop access from outside of the firewall, and anyone with the right credentials then can waltz right in. There are so many ways those credentials could have been obtained by a third party that a third party cracker is right up there with the main chances of an inside job.
rgb
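A side note on those mail spools: Unix spools such as /var/spool/mail/<user> are typically plain mbox files, which is one reason a copied spool is immediately readable anywhere with standard tools. A short Python standard-library sketch, with a hypothetical path:

```python
# Read a copied Unix mail spool: mbox is just concatenated plain-text messages.
import mailbox

spool = mailbox.mbox("/var/spool/mail/jones")   # hypothetical path to an mbox file
for msg in spool:
    # Print the headers an investigator (or a leaker) would sort by first.
    print(msg["Date"], msg["From"], "->", msg["To"], "|", msg["Subject"])
```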

November 30, 2011 6:20 am

Someone care to ‘splain the scenario where the central ‘server’ (in the closet!) complains that I have “exceeded my mail quota” and I need to free up some space by ‘moving’ the e-mails to a private folder (ostensibly on my local PC it appeared)?
Mail client was MS Outlook running under Xp.
Small Windows shop (not *NIX) BTW (and I was just a ‘user’ not the admin). Time frame 2006 – 2007.
Q’s:
1) Were the e-mails moved _off_ the server in this case?
2) Would copies have existed anywhere accessible by the server?
.

davidmhoffer
November 30, 2011 6:23 am

Russell says:
November 30, 2011 at 4:09 am
While interesting, this overstates the availability of data *in readable form*. For most of the time period covered in the emails, one would need root privileges to read the emails, even if they were from a stolen tape (or any tape). >>>
In the case of email systems it has been common practice to ingest email from old systems into new systems and then backup the entire new system (including the old email) in order to preserve everything in a readable format. Only encryption would defeat this from a backup tape perspective. The fact that the unencrypted emails end in 2009, and encrypting tape backups began getting popular right about then may be a “root cause”…. or just a coincidence.

davidmhoffer
November 30, 2011 6:37 am

Robert Brown;
Today, to be REALLY secure, you not only have to delete the file, you have to go in and do bad things to the physical disk platter to be certain of putting the data firmly beyond the reach of somebody who really wanted to get at it, say the NSA.>>>
I used to have a lot of military customers, though it’s been a while. When they wanted to “retire” an old hard drive, they had a machine with a hopper on top that you dropped the hard drive into. There would be a nasty grinding sound, and metal filings would emerge at the bottom.
Pretty effective, but your point is valid. Provided that you have the right equipment, it is sometimes possible to recover data from a drive that has been overwritten several times.

Alex
November 30, 2011 6:50 am

If today’s price of 1 GB of storage is, let’s say, one dollar, and it will cost half a dollar next year and so on and so forth, the cost of storing the same volume of data is halved every year. If the volume of stored data is doubling every year, the total bill stays roughly flat, so there would not be much scope for really deleting old data to make space for the storage of new data, especially considering the risks due to FOIA. Hence, I would say that there should be no real scope for deleting old FOIA-able emails and other data. The best and safest thing to do is just keep on buying storage. It all depends on the economics of the thing really.
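To put rough numbers on that, here is the arithmetic under Alex’s stated assumptions (the starting figures are purely illustrative):

```python
# If cost per GB halves yearly while stored volume doubles, the annual
# storage bill stays flat -- the arithmetic behind "just keep buying disk".
cost_per_gb, volume_gb = 1.00, 1000.0    # illustrative starting figures
for year in range(5):
    bill = volume_gb * cost_per_gb
    print(f"year {year}: {volume_gb:8.0f} GB x ${cost_per_gb:.4f}/GB = ${bill:.2f}")
    cost_per_gb /= 2    # price halves
    volume_gb *= 2      # data doubles
```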

A Public Servant
November 30, 2011 7:10 am

John says:
November 30, 2011 at 4:13 am
I totally agree. Reading through the emails, my belief is that Phil Jones was ignorant of FOI, was then rapidly apprised of UK FOI law, and then decided to urge colleagues to delete their emails without realising that most would already have their own FOI concerns and would have been well aware that deletion of emails from their client was not sufficient. Otherwise, the emails from Phil Jones make little sense to me. (I am at work – otherwise I would search and quote directly.) I could easily be wrong of course, but I have seen not dissimilar responses at work…
Sorry to Anthony and the mods if I have broken convention when referring back to earlier posts.

Louis Hooffstetter
November 30, 2011 7:21 am

Suggestion:
Although I’m not a Brit, I would gladly contribute to a ‘tip jar’ to hire lawyers to sue the CRU on behalf of the British taxpayers. They should be sued not only for failure to comply with British FOI laws, but they should be held liable for massive punitive damages as well. Maybe British citizens could recover some of their money that was wasted on this tribe of fraudsters. If such a ‘tip jar’ or fund is ever set up, please let everyone at WUWT know about it.

November 30, 2011 7:29 am

Fascinating article on the basic way an email exchange works, and how it is damn near impossible to delete something once it is sent out on the web.

Slartibartfast
November 30, 2011 7:29 am

I used to have a lot of military customers, though it’s been a while. When they wanted to “retire” an old hard drive, they had a machine with a hopper on top that you dropped the hard drive into. There would be a nasty grinding sound, and metal filings would emerge at the bottom.

Interesting approach. My own non-DOD-approved technique is to remove the hard drive platters and bend them into interesting shapes. In theory you might be able to recover some fragments of data, but it’s unlikely you’d be able to do anything with them.
My comment about the thread in which you, davidmhoffer, got into it with another commenter should have linked to here. It’d be interesting to continue the parts of that discussion that actually apply to the problem at hand, here in this thread. Because I don’t think either of you admitted to being wrong there.
Which is not to say that I have any idea which of you actually is wrong.

A Public Servant
November 30, 2011 7:31 am

davidmhoffer says:
November 30, 2011 at 6:01 am
That is exactly right. UK FOI law is a mixture of the onerous and the very lax e.g. having poor or non-existent records management policies may not be a breach of the Act, but if an organisation cannot address legitimate requests because of that fault, it may be a breach(!) This kind of oddity also extends to physical documents: if a public body’s records management / document retention policy states a document should have been deleted, there is generally no need to look for it when asked for under FOI.
The exemption relating to international relations cited by UEA would have been interesting to see challenged at an Information Tribunal or in the High Court, particularly as it would be subject to a public interest test…

November 30, 2011 7:48 am

Just a technical note on IMAP servers and their interactions with email clients; “clients” being the application programs that are used to access emails on the server.
When the email client, e.g. Eudora, Thunderbird, Outlook, … deletes a message, it flags it for deletion on the IMAP server. (There are technical and human-interface reasons for that.) When the user wants to recover space in their mail area, they send a “compress” command to the IMAP server for that mail folder, which causes actual deletion. If the user set their mail client options to show deleted items, then they’d be listed, usually with strike-through.
Unfortunately, because users were never educated about this and never actually compressed their mail folders, the delete from the client has more recently come to be accompanied by an automatic compress; this reduces operational efficiency and the ability to recover from OOPS moments.
It still doesn’t fix the problem of the Trash folders with 123,456 deleted items.
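That two-step dance is visible from a few lines of Python’s standard imaplib. Host, credentials and message number below are placeholders; the protocol verb for the “compress” step is EXPUNGE, which clients often surface as “compact”:

```python
# IMAP deletion is two steps: flag, then expunge.
import imaplib

imap = imaplib.IMAP4_SSL("mail.example.com")   # hypothetical server
imap.login("user", "password")                 # hypothetical credentials
imap.select("INBOX")

# Step 1: flag message 42 as deleted -- it is still on the server,
# merely hidden (or struck through) in most clients.
imap.store("42", "+FLAGS", "\\Deleted")

# Step 2: only EXPUNGE actually removes flagged messages from the folder.
imap.expunge()
imap.logout()
```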

Joe
November 30, 2011 8:02 am

As a professional Backup and Storage administrator I fully endorse this article.

AJC
November 30, 2011 8:03 am

If you want something less generic and more specific to CRU and their e-mail system(s) then I commend Lance Levsen’s original analysis (December, 2009) …
Climate-Gate: Leaked
http://www.smalldeadanimals.com/FOIA_Leaked/
also a quick look at Peter Sommer’s pathetic UEA – CRU Review ….
Initial Report and commentary on email examination
http://www.cce-review.org/evidence/Report%20on%20email%20extraction.pdf
informs us that the three memory sticks he received for analysis from the backup server were in the alien (to him) format called “Thunderbird”. I surmise that these three sets were personal collections produced by the three individuals. The ClimateGate 1.0 archive was just that – an archive – presumably for the whole CRU or more likely a select sub-group.

Olen
November 30, 2011 8:05 am

AaHH for the days of a paper shredder and burn site.
A great post and superb comments.

Stuck-Record
November 30, 2011 8:18 am

David
Thank you for your genuinely enlightening essay.
I have a question. Do you think that the authorities (UEA and Norfolk Police) know who did this and how?

November 30, 2011 8:37 am

As with most universities, the UEA IT department is a bureaucratic nightmare:
http://www.uea.ac.uk/is/contacts
http://www.uea.ac.uk/is/contacts/isdstaffingstructure
Admin access to the email servers is likely shared with the network and IT support staff, which looks like a lot of people. Just as I suspected, they do not have anyone dedicated to email. I would be very surprised if they had redundant data centers (redundant servers in the same data center, possibly) or more than one form of backup for their email servers.
(performance degrades sharply with distance)
That depends on the type of network connection between the servers.
The last email release ends in 2009, and the rest of the release is, in fact, encrypted. One can only wonder, does the CRU encrypt their backup tapes, and if so, when did they start doing that?
The hacker encrypted the rest of the emails using 7-Zip and AES 256-bit encryption. I am almost positive it was a hacker and not internal IT staff or something else as they hacked into the RealClimate website with the original Climategate release.

eyesonu
November 30, 2011 8:41 am

Thank you David for a very informative post.
Thank you WUWT for the knowledge offered on this site.

henrychance
November 30, 2011 8:48 am

Only 10 years ago, Enron’s CPAs were working hard with the shredders. Like your article says, that is a mere “symbolic” act of destroying files. CPAs do like to review original source documents and have a document trail. (Enron wind is now known as GE wind)
This article is simple and much needed. It is too little and too late for the Mann/Jones world to clean up their underground activities. CPAs are taught how to audit around the IT system and audit through the IT systems. This article is actually a template for how fraud is caught. Jones claims some old temp data sets have been deleted. I think they exist and he hopes they can’t be restored.

More Soylent Green!
November 30, 2011 8:53 am

Is it just possible they had lax security standards and poor enforcement? You bet it is. There are lots of hacker types hanging around universities of any size. Sometimes those guys become the unofficial IT guy because the official guys are so hard to work with, move at a snail’s pace and don’t really care if they help you or not.
It could simply be the revenge of the hacker. Maybe somebody kept treating him like a flunky and he got tired of it.

AJC
November 30, 2011 9:00 am

For a “research” group like CRU it is almost certain that its computing effort was provided, for many years at least, internally on research-funded equipment and staff, possibly as a “hobby” overseen by one of the long-term researchers.
Over many years, centralised IT service provision would have lagged the group’s requirements. More recently, the provision of IT for the masses will have been the primary focus of the IT service – not providing support for specialist research requirements.
So it’s quite probable that CRU have been “running” their own show for most of their history.
The quality of the code fragments which leaked, and the excuses about being unable to resource even minimal change/version control for their datasets, indeed indicate that CRU was/is just another shambles on the IT front: it just shouts lack of professionalism – and I would guess that this applied/applies more widely within UEA.

mojo
November 30, 2011 9:03 am

My money is on backup tapes – they are, as a general rule, incredibly easy to walk off with, and these days are quite small physically. Most shops rotate through a set of tapes, overwriting old data, so if part of a set gets replaced with blanks, chances are good that nobody will notice.

Mike Wilson
November 30, 2011 9:07 am

I have worked in storage administration since 1986. You did a pretty good job on your article, but it has actually become even more complex than your article states.
I did not see any reference to how many companies are virtualizing their tape systems. I helped IBM test their first VTS in Texas while it was still in beta in the mid 1990s. Since tapes were so difficult and expensive to handle, and so poorly utilized, VTSs were developed. The backup software thinks it is writing to a bank of tape drives, but the VTS is actually writing the data to a disk RAID array. The VTS can then take multiple tape images on the array(s) and stack them together with other tape images to fill today’s large-capacity tape media. The VTS can automatically duplicate the physical tape, and you can have multiple VTSs in different locations where the data can be automatically replicated between them for disaster recovery. Also, as tape data expires and the valid data on a tape drops below the desired threshold, all the remaining valid data is automatically copied to new tapes in a reclamation process. The VTS can also spit out copies of tapes for vaulting, and will automatically eject tapes it determines to have media problems (which can still hold data that could be retrieved).

Pascvaks
November 30, 2011 9:17 am

Not every hacker is a Russian or Chinese Commie out to disembowel the decadent West. Not every hacker believes in Mann Made Global Warming nor the End of Civilization, Versions 101, 102, 103, 104, etc., proposed by the IPCC and preached by the Rt Rev Al Gore and his flock of Save the Earth from Human Destruction jihadists. The ‘science’ of Climate is still being fed by a baby bottle and still makes a mess in his nappy. Keep looking! Keep searching! And don’t believe everything you hear, no matter who told you, or what the subject is. Be skeptical of all college professors, all politicians, and all stock brokers. It’s a dog eat dog world out there.

mrrabbit
November 30, 2011 9:20 am

Dale beat me to it…
Only the first snapshot is a full copy – and a read-only copy at that. From there on out – each additional snapshot is a read-only copy of changes that have occurred with referential pointers to the original full snapshot of where the changes occurred.
The only time snapshots become full copies again is when they are requested either manually or via scheduling – typically monthly.
Manuals for NetApp arrays usually have a nice section with a very easy to read explanation of how snapshots work with pie diagrams and all.
=8-)

davidmhoffer
November 30, 2011 9:31 am

_Jim
Q’s:
1) Were the e-mails moved _off_ the server in this case?
2) Would copies have existed anywhere accessible by the server?
OK, I neglected to cover this in the article itself, so great question. Unfortunately the answer isn’t static. I’ll try and keep it brief.
The message you got was the result of the email administrator assigning a storage quota. When you used up a certain amount of disk space, you got a message telling you to clean up. So, you “moved” some of your email to a “different folder”.
Could the email server “see” those emails? Probably not. The whole point of forcing you to clean up your email was to get it off the email server’s storage.
But…where did you move them to? This is ultra important.
If you moved them to a thumb drive and stuck it in your pocket, of course the server couldn’t see it. If you moved them to anywhere on the company network however…different answer.
1. While the email server could not “see” those files, if they are network accessible, including the hard drive on your desk top computer, or a shared drive on the network, an IT administrator could restore them back to the email server with a few mouse clicks.
2. Here’s where implementation of archives makes the answer even more complicated. One of the things that enterprise class archive software does is “crawl” the network, searching for email folders, pst files, etc, that were at some point removed from the email server. It can then copy those emails to the archive, and then delete the copy you made on a shared drive or on your own desk top. You probably wouldn’t know it, because the archive software would leave a “Stub” behind. When you searched for those emails, the stub would present the index to you exactly as you would have seen it before, but now the emails themselves (should you actually open one) would be opened from the archive, not the folder you think you have them in. Plus, when the archive was set up, they would most likely have made it visible to the email server….so the emails are once again available from the email server.
3. Provided your emails were being properly backed up, there are most likely copies in the backup system no matter what you deleted or moved somewhere else.
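As a rough sketch of what the “crawl” in point 2 amounts to, here is the trivial core of it in Python. Real archive products do ingestion, stubbing and much more; the share path and file extensions here are illustrative only:

```python
# Walk a network share looking for mail stores users "moved off" the server.
import os

def find_mail_stores(root):
    """Return paths of likely personal mail stores under the given root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # .pst/.ost are Outlook stores; .mbox is the classic Unix format.
            if name.lower().endswith((".pst", ".ost", ".mbox")):
                hits.append(os.path.join(dirpath, name))
    return hits

for path in find_mail_stores(r"\\fileserver\home"):   # hypothetical shared drive
    print("candidate mail store:", path)
```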

davidmhoffer
November 30, 2011 9:32 am

More great questions to answer, but I’m off line for the next few hours. Will do my best to answer them all later today.

Ray
November 30, 2011 9:32 am

If we consider that the whistle-blower might be an internal IT, I bet he has a personal copy of all the emails… maybe this is why the password protected document is locked. It could contain recent emails.

Mike Wilson
November 30, 2011 9:35 am

mrrabbit,
“Only the first snapshot is a full copy – and a read-only copy at that.”
I think it depends on the particular vendor’s implementation of snapshot, FlashCopy, BCV or whatever the vendor calls their implementation. It is actually possible for the 1st copy to be just incremental (all done with a magic bitmap in the storage controller) or for all copies to be full copies. And some snapshot technologies let you specify which you want.

Austin
November 30, 2011 9:35 am

Good article. There are a few errors. The most glaring one is how snapshots work. Most snapshot technology today just maintains a list of pointers to the old blocks. They do not keep a full copy. Snapshots age and then are thrown away or become stale. You can also set the system to not throw away the blocks, too. So, if you keep the list of pointers AND the blocks, then you can keep the information in stasis.
Upgrades. When email systems are upgraded, the old emails are brought along. Sometimes the new email system allows users to use their old client. Furthermore, the PC that each person works on when upgraded or replaced is often backed up over the network or replicated. This then feeds into the topic below.
Compliance and archive. In general, all email and other relevant documents on workstations are copied into the compliance/archive system and then organized. All data from servers is loaded as well. Compliance and archive systems have tools to perform searches. Many compliance systems today intercept all the email before sending it to the email server, so even if someone deletes an email, a copy still exists.
Decommissioning. Most people just get rid of their drives and tapes rather than shredding them. The data on them is recoverable.
Most backup systems have pretty wide open back doors even when configured properly. And when admins do take pains to do it right, and most don’t, the backup encryption technology is often not upgraded over time – many key based systems from the 90s are crackable by direct attack using publicly available methods in a few days. Furthermore, a clever person can program a tape reader to read the data from a tape directly, which in many cases, is easily accessible from a low level. Most places have very poor chain of custody on tapes and do not use tamper proof seals on the tapes.
Looking at the emails, it looks to me like the output from a search of a compliance/archive system. That does not mean it came from one that is used by the emailers’ employers. It could be a system set up by the leaker into which data was fed, or came from a backup of a system or systems in place.
A careful social network analysis of the emails might reveal more clues.
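As a hedged illustration of that last suggestion, a first-pass “who emails whom” count needs nothing beyond the Python standard library. The emails/ directory of raw message files is hypothetical:

```python
# Count directed sender -> recipient edges across a folder of raw emails.
import os
from collections import Counter
from email.parser import BytesParser
from email.utils import getaddresses

edges = Counter()
for fname in os.listdir("emails"):               # one raw RFC 822 message per file
    with open(os.path.join("emails", fname), "rb") as f:
        msg = BytesParser().parse(f, headersonly=True)
    senders = [addr for _name, addr in getaddresses([msg.get("From", "")])]
    rcpts = [addr for _name, addr in
             getaddresses([msg.get("To", ""), msg.get("Cc", "")])]
    for s in senders:
        for r in rcpts:
            edges[(s, r)] += 1                   # one edge per sender/recipient pair

# The heaviest edges are the core of the correspondence network.
for (s, r), n in edges.most_common(10):
    print(f"{n:4d}  {s} -> {r}")
```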

Alan the Brit
November 30, 2011 9:44 am

Oh how I love this bloggsite! I learn so much every time I enter it! I have definitely learned about emails now & how back-up systems & servers work – well a bit more than I did before! Thank you.
LibertyisOurTreasure says:
November 30, 2011 at 7:29 am
Nice one, Cyril! 🙂

mrrabbit
November 30, 2011 9:57 am

Mike Wilson:
As Dale mentioned earlier, replication is a full copy. Calling archiving, replication, backup, incremental backup, etc., “snapshots” doesn’t make it so unless it involves an initial full snapshot followed by snapshots that are read-only pointers to changes that reference the related full snapshot.
In others words, while replication in IT circles may not be snapshots – a snapshot can be replication.
As I said earlier, the Netapp manual does a good job of clarifying exactly what a snapshot is compared to a simple replication, copy or backup.
Sure it is splitting hairs and circular argumentation – I won’t deny that. But the distinction is important.
I’ve watched managers in IT several times who were quite familiar with replication, backups, archiving, etc., try to solve disc space issues on Netapps, IBM arrays, as well as Sun disk arrays – delete the wrong snapshot – try to recover from the other snapshots – only to wonder why they end up with nothing.
It is a slight, “splitting hairs”, circular distinction – but an important one.
=8-)

Bill
November 30, 2011 10:10 am

Someone else may have already said this ……. As far as why did he say “Eudora tells me”?
In another e-mail string, he admits to not knowing how to fit a straight-line in Excel (or apparently any other program he had handy). So it does not surprise me that he only knows a little about the e-mail system.

Mike Wilson
November 30, 2011 10:14 am

mrrabbit,
“In others words, while replication in IT circles may not be snapshots – a snapshot can be replication.”
There are all sorts of definitions out there, but in my circle, we have always considered replication to be another copy of data on a separate computer system or separate storage controller. And a snapshot to be an additional instantaneous copy of data (whether full or incremental) within the same storage controller.

November 30, 2011 10:27 am

question on exchange-outlook relationship where non-cached is used.
where is the local copy kept?
there is no ost file like cached mode, is it a temp file deleted at some point?
I had actually been wondering this and was running some tests against my exchange 2010 box today to see when I happened to read this.
figured I would ask.
like everything on this site good article and I hope it makes people appreciate their IT depts some 🙂

Atomic Hairdryer
November 30, 2011 10:59 am

I suspect a simpler explanation, considering CRU’s email system was apparently self-administered and existed outside of UEA’s normal IT infrastructure and management. That’s always been a challenge with IT in academia from my experience, users that think they know what they’re doing, and in doing so, blow gaping holes in security.
I think this explains how CRU’s email backup system was configured:-
http://www.devco.net/archives/2006/03/24/saving_copies_of_all_email_using_exim.php

You can now restart your Exim server, if you’ve done it all right and created the main Maildir where this all live under your incoming and outgoing mail for domain.com will all be saved on a per user basis.

Simple, convenient for users, but also simple for someone who knows where the copies are held to grab nicely organised user directories containing all incoming/outgoing emails. No need to worry about obscure archival formats either. From memory, CRU initially used Sendmail and it’s also simple to script that to copy all incoming/outgoing mail as well. It’s also still a cheap way to do compliance for organisations that didn’t buy into the Microsoft ecosystem.
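The effect of that Exim recipe can be sketched in a few lines of Python. This is only an illustration of the outcome (every message also filed per user under an archive tree), not how Exim itself does it, and all paths are hypothetical:

```python
# Illustrative only: file one extra copy of each message, per local user,
# into a Maildir-style archive tree -- the effect of the Exim system filter.
import os
import time

ARCHIVE = "/var/mail-archive"    # hypothetical archive root

def archive_copy(raw_message, local_users):
    """Save one copy of the raw message for each local sender/recipient."""
    for user in local_users:
        userdir = os.path.join(ARCHIVE, user, "new")   # Maildir-style layout
        os.makedirs(userdir, exist_ok=True)
        fname = "%.6f.archived" % time.time()          # crude unique file name
        with open(os.path.join(userdir, fname), "wb") as f:
            f.write(raw_message)

archive_copy(b"From: jones\r\nTo: briffa\r\n\r\nbody", ["jones", "briffa"])
```

Anyone who knows where that tree lives can copy the whole thing in one pass, which is the point being made above.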

dukeofurl
November 30, 2011 10:59 am

For tape backups, especially the old reel types, you don’t need to keep one of the standard high-speed tape units to read old tapes. There are (were?) desktop units which read slowly and are normally used to transfer to a hard disk. A software program on the attached PC would show the data in hex (or EBCDIC) and converted format side by side. Record length and block length would be shown along with headers. Once that is known, or with some good guesswork, the data is converted into PC format. This would be for raw data; database-formatted data could have a different structure. Of course, email data would have fairly standard formats.
Gee, I’m glad I haven’t had to think about this sort of stuff for nearly 15 years.
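On the EBCDIC point: modern scripting languages still ship the old mainframe code pages, so the hex-plus-text view described above takes only a few lines. The sample bytes below are a made-up block, decoded with cp037, one common EBCDIC code page:

```python
# Decode a raw EBCDIC block and show it next to its hex dump.
raw = bytes.fromhex("c8c5d3d3d640e6d6d9d3c4")   # EBCDIC bytes for "HELLO WORLD"

print(raw.hex(" "))           # the hex column: c8 c5 d3 d3 d6 40 e6 d6 d9 d3 c4
print(raw.decode("cp037"))    # the converted column: HELLO WORLD
```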

November 30, 2011 11:10 am

Can you say Pandora? I hope there is more to come…

November 30, 2011 11:38 am

Found that they use X.www, where X is a variable, maybe ??, etc.
Ilkka Mononen. alias Myteurastaja.
https://sites.google.com/site/myteurastaja/home
e) p16 – an operations timetable – need to specifically mention the setting
up of a comprehensive WWW site with public and private pages….
f) on page 3 of the call- para 3 starting Global climate models… it says
that a significant challenge for the new centre is …. I am NOT sure we
have explicitly addressed the question – esp local and regional scales, cut
down model etc
g) perhaps in Suggested Research Agenda intro need to specifically mention
existing close links with Hadley, UGAMP, UKICP, IPCC….. and say will work
closely with and compliment …
The list of typos and small changes(done on John Shepherd’s version of the
draft) :
1)p3, line 1 – double comma ,,
2)p3, line 7 – major cultural divides -> major cultural and organisational
divides
3)p3, last line – double full stop ..
4)p4, list of names -Markvart – Dr not Prof
5)p7 – management structure : bullet point one : line 2 scientists We ->
scientists. We
6)p7 – management structure : bullet point 2 : delete open square bracket [
7)p7 – management structure : bullet point 3 : need a little explanation
after the Programme Leaders if only to say ‘whose role is described below’
8)p7 – management structure : text after bullet points : Council’s ->
Councils
9)p7 – ditto – ditto : if the Soton & UMIST reps are well known figures then
I think they should be named now
10)p7 – ditto – ditto – need to define the Centre’s Science Co-ordinator and
Communications Manager – is this one post or two ? what are their role’s ?
how is the Science Co-ordinatoir different from the PL’s or the ED ?
11) p8 – line 1 – I thought the Management Team mtgs should be MUCH MORE
FREQUENT than every six months, if not then what body/person is running
things in the interim ?
12) p8, para 2, line 3 ‘responsible to implement’ -> ‘responsible for
implementing’
13) p8 ditto, last line – double full stop ..
14) p8, para 3, line 2 : this JIF -> a recent JIF
14) p8, ditto, ditto, office accommodation has -> office accommodation has
already
15) p9, para 2 line 2 – double full stop ..
16) p9, challenge 1, para 2, line 3 double full stop ..
17) p10, challenge 2 line 2 delete [and alternative]
18) p12, challenge 5 para 1, line 5 double full stop ..
19) p12, ditto, ditto, line 9 ?. to ?
20) p13, para 2 methodsl -> methods
21) p19, Jim Halliday, line 2 Director, Energy -> Head of the Energy
22) p20, Nick Jenkins, email address -> ???@umist.ac.uk
23) p21, Jonathan Kohler, email address -> ???@econ.acm.ac.uk
24) p21, Tom Markvart : details are School of Engineering Sciences,
University of Southampton email ???@soton.ac.uk

davidmhoffer
November 30, 2011 1:00 pm

Stuck-Record says:
November 30, 2011 at 8:18 am
David
Thank you for your genuinely enlightening essay.
I have a question. Do you think that the authorities (UEA and Norfolk Police) know who did this and how?>>>
Not a clue, there’s not enough info out there to even hazard a guess on that score. I will make the observation however that the skill sets to conduct investigations into who did what and when in an environment this complex are very rare. I’d be shocked if the IT department at UEA had someone with those skill sets. I wouldn’t be shocked if Norfolk Police had them…. but I’d be surprised. That’s the domain of high end security consultants and government agencies with three letter acronyms.

mikerossander
November 30, 2011 1:07 pm

This was a great post. Thank you.
A couple of minor quibbles, though. First, in the example above the copies on the recipient’s computer (and his/her institution’s server) do still exist but are irrelevant to an FOI request. Document requests can only be served against you for documents under your control. Copies of documents on other people’s equipment are not your problem. Requestors wanting those copies must submit separate requests to those custodians. (And assuming that the other custodian is third-party and not a direct participant in the suit, the request is held to a higher standard than a request to a directly-involved party.)
Second, remember that (depending on the email system’s settings) deletion is generally treated as just another differential that gets rapidly propagated to the local server, archives and storage arrays. If you delete the email on your desktop, it will be flagged for deletion on your institution’s server copy and overwritten the next time the computer decides it needs to use that particular piece of the disk. If a message was sent, received and deleted all within the time between incrementals, it may not be captured for backup at all. (That won’t be true if the institution journals all messages, but journalling is uncommon except in highly litigious environments.)
Even if the message was captured for backup, the incrementals are unlikely to still exist by the time a document request comes through. Most IT shops reuse their incremental media on the same cycle as their full backups – a week or a month at most. The fulls themselves are reused on a cycle set by the IT backup needs – usually 3 or 4 back. You could keep longer but why would you?
(Yes, there are some horrific counter-examples. I remember one case where the company had an email retention policy based on business need and properly approved by the Board. They were sued and rightly disclosed their retention rules and practices along with the statement that the documents in question had been legitimately deleted in the normal course of business. During depositions, a single data center person told of having been chewed out as a young tech for ‘losing’ a file and admitted that he had been taking the tapes out of the trash and storing them in his garage at home “just in case”. He violated company policy and the law (theft, falsification of destruction records). When discovered, he was disciplined and ultimately fired for his misbehavior. The evidence showed that no one in management knew of this practice or had any idea that these tapes had not been destroyed as the records showed. Despite all this, the company was held liable for not knowing that these copies still existed.)
My (long-winded) point is that I haven’t seen annual backups of transient systems like email in years. Annual backups of your general ledger system? Yes. Email? No. Maybe public educational institutions have a different cultural attitude about document retention. In the environments where I’ve worked, the costs of managing all that data and all those copies are not trivial and the company has a strong and legitimate business incentive to keep the number of backups down to what is reasonably necessary. (Users are packrats and want to keep everything but that’s another rant entirely.)
I have to offer another quibble with davidmhoffer’s comment above that “it is sometimes possible to recover data from a drive that has been overwritten several times”. Not unless you’re still using some pretty old equipment. To understand why it’s no longer possible, we first have to understand why it used to be possible.
Remember that the 1s and 0s on the media are really little blobs of up and down magnetic ‘charge’ set down on the disk or tape. Think of it as blobs of blue and green paint for a minute. As the data is being written, little blobs are being plopped on the disk or tape. When it’s time to read, the head goes back and looks at the pattern of blobs. When it’s time to overwrite, you lay a new pattern of blobs down overtop the old pattern. Old disk and tape heads were pretty sloppy, though. If things shifted a bit or weren’t aligned perfectly, the new blob might mostly cover the old blob but still leave a sliver of the old blob visible at the edge. If you had a high-enough powered microscope and some really good software, you could sometimes tease out the pattern of misalignments and make a pretty good guess from the slivers showing about the overwritten content – at one point, as much as 7 iterations back.
Miniaturization and better alignment made that a thing of the past. Nowadays, the blobs are so small and the heads so well aligned that there’s essentially no slop. A single overwrite will completely obscure the prior data. There is, in fact, a reward offered for anyone who can recover once-overwritten data from any reasonably modern media. It has gone uncollected for several years.
Again, thank you for a great post.

davidmhoffer
November 30, 2011 1:09 pm

Dale beat me to it…
Only the first snapshot is a full copy – and a read-only copy at that. From there on out – each additional snapshot is a read-only copy of changes that have occurred with referential pointers to the original full snapshot of where the changes occurred.>>>
There are different snapshot architectures. In a “split mirror” architecture, the first snapshot is a full copy, and additional snapshots are the incremental differences. In a Copy on First Write architecture, the first snapshot is zero new bytes, and the same is true for Redirect on Write.
You are correct that snapshots are “read only” copies of the data. In any snapshot architecture, there is usually a method to promote a given snapshot to be both read and write, but it is then called a “clone” rather than a snapshot.
The vendor you mentioned (Netapp) uses a Redirect on Write snapshot. Other platforms such as Equallogic and Compellent and ZFS are also Redirect on Write. Copy on First Write is the technique used by Hitachi Data Systems, EMC, IBM DS5000 and many others. Split Mirror is available as an option from EMC, and is inherent in the IBM v7000. My point here is that you cannot say snapshots work in any specific way. There are three main architectures, and the implementation between two vendors using the same architecture also have differences.

Mike Wilson
November 30, 2011 1:10 pm

mrrabbit,
Also, I believe what you are describing as a snapshot, I have always referred to as a “point-in-time”. A snapshot is an instantaneous picture of the data. A snapshot may or may not involve a valid point-in-time depending on a number of factors. With replication, you are continually replicating a copy of the data on a server (it may be all the data or just a critical subset of it), which would be more like a video tape that is continually moving. And the replication may be synchronous or asynchronous (synchronous meaning updates are sent to both servers at the same time, asynchronous meaning the updates happen on the secondary system, the replica, a little later as bandwidth allows). But in either sync or async mode you need to momentarily pause the replication to take a point-in-time in order to have a valid system for recovery. Otherwise your data is what we call “fuzzy”.

PaulH
November 30, 2011 1:13 pm

Excellent article and discussion. Puts me in mind of the book “The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage”
http://www.amazon.com/Cuckoos-Egg-Tracking-Computer-Espionage/dp/1416507787

davidmhoffer
November 30, 2011 1:24 pm

mrrabbit;
In others words, while replication in IT circles may not be snapshots – a snapshot can be replication.>>>
Ahem. No. Well…. maybe. Ask three experienced IT managers what a snapshot is and you will get four answers. So to have the discussion, we’d have to make certain we both mean the same things when we use the same terms. That said….
A “snapshot” is a point in time copy of the data, in most cases predicated upon the original data that the “snapshot” was taken of.
Replication, on the other hand, is a physical copy of the data that is physically separate in all respects from the original data.
Here is the important distinction. Suppose that I have a large email system. Suppose further that I replicate data to a remote site. In addition, I also snapshot my data every hour. Now…something bad happens.
Bad Thing Example 1: My email system gets infected with a virus, and scrambles all the data on disk. Guess what happened to the data replicated to the remote site? Bad news, it is scrambled too. The replication tool makes no distinction between good data and bad data. If something gets scrambled on one site, then it gets scrambled on the replica. Delete from one, delete from the other. BUT, since we’ve also been running snapshots, we can simply roll back to the last good snapshot (which is less than an hour ago) and with a few mouse clicks, we’re up and running.
Bad Thing Example 2: My primary data centre burns down. All the snapshots in the world will not help me, they are ALL gone. Well, from that site anyway. If my replication software was doing what it was supposed to, all my data has been replicated to another storage array and I can just mount that to a server and start running again. (assuming I wasn’t so unlucky as to be in the data centre when it burned down 😉 )

November 30, 2011 1:24 pm

But maybe converting zip to txt just failed, losing some part of the email address
(some 8-bit ASCII conversion?).
——-?? ??.WWW > ???.WWW
So ——-?? ??.WWW is somebody?
I’m only guessing.
Ilkka.

davidmhoffer
November 30, 2011 1:28 pm

mrrabbit;
– delete the wrong snapshot – try to recover from the other snapshots – only to wonder why they end up with nothing.>>>
Yup, if you are messing with snapshots, one had best be familiar with exactly how that specific snapshot implementation works, they are NOT all the same. In some, delete any given snapshot, and all the snapshots older than that one go poof and disappear. In other implementations, deleting one only affects that one.
The other thing that messes IT up is performance. In some architectures, more than two or three snapshots will bring the storage array to its knees. In others, you can have hundreds without a performance impact.

davidmhoffer
November 30, 2011 1:33 pm

dmacleo says:
November 30, 2011 at 10:27 am
question on exchange-outlook relationship where non-cached is used.
where is the local copy kept?
there is no ost file like cached mode, is it a temp file deleted at some point?>>>
I’m not up to speed on their latest implementation, but in general the term “cache” means a temporary means of storing some subset of the data for performance purposes. So, the ultimate place that your emails are stored in should be the same either way. I’d think there would be someone deep on how Outlook works these days who could comment further?

davidmhoffer
November 30, 2011 1:43 pm

mikerossender;
Document requests can only be served against you for documents under your control.>>>
Correct. I only made the point about the copies being in both the sender’s environment and in the recipient’s environment to illustrate how the system works. Of course, this makes (for example) Michael Mann’s emails even more interesting than they would be otherwise. There very well could be email trails that no longer exist at the CRU but do at Penn State, with time stamps showing they existed at the time an FOIA request was submitted. If so, it would indicate that the email was deleted after the FOIA request came in, which would mean… in North America, jail time; I’ve no idea about Britain.

November 30, 2011 1:46 pm

http://www.ResearchResearch, into the FOIA ripper, 1 minute and I found a new “delete emails” one..
http://www.ecowho.com/foia.php?file=3000.txt&search=www.ResearchResearch
Ilkka.

davidmhoffer
November 30, 2011 1:46 pm

mikerossander;
If you delete the email on your desktop, it will be flagged for deletion on your institution’s server copy and overwritten the next time the computer decides it needs to use that particular piece of the disk. If a message was sent, received and deleted all within the time between incrementals, it may not be captured for backup at all.>>>
It is POSSIBLE to set up an email system to work exactly like that. It is UNLIKELY that any system designed to current best practices and to meet current compliance law would do so.

Berndt Koch
November 30, 2011 1:51 pm

A couple of other things to consider if you are trying to delete a specific email.. in general people will ‘reply’ to an email or forward an email quoting the original (as can be seen from both sets of Climategate emails) so if you REALLY want to delete your original email you also have to find all the emails where your original email was quoted or forwarded.. and all their copies..
Also emails can be sent to multiple people, so multiple servers in different IT shops with different backup and archive policies..
I’m not sure the term delete really applies.. maybe reduce the number of copies by 1 or 2 % would be better!

davidmhoffer
November 30, 2011 1:54 pm

mikerossander;
There is, in fact, a reward offered for anyone who can recover once-overwritten data from any reasonably modern media. It has gone uncollected for several years.
Again, thank you for a great post.>>>
I wasn’t aware of that, can you post a link? I’d like to read what their terms and conditions are. In terms of your overlapping blobs explanation, yup, that methodology is pretty tough to use these days. There are other methods though. We’re straying into some pretty exotic technology however, and I haven’t reviewed it in detail for several years so I may very well be behind the times on that one. I also haven’t been working with military organizations for quite some time, so all the really cool stuff that I got to hear about at the risk of getting shot for hearing it is no longer accessible to me (wah!)

Joe Public
November 30, 2011 1:57 pm

I’ve no need to buy Detective Fiction Novels; I just read the enormous variety of educational postings on WUWT.
Thank you for a very instructive read.

Dan Murphy
November 30, 2011 1:59 pm

David,
Excellent post, and great comments (and commentators!)
The only thing missing that I would have expected to see in the general discussion of this issue is the recent emergence of “de-duplication” technology. I would expect that there were often several or more users with the same e-mail on their local systems. Suppose that Michael Mann copies both Kevin Trenberth and Phil Jones on an e-mail. They would both have copies of that e-mail on their local systems, and backups of their local systems would have copies of the same e-mail. Would you kindly take a moment to address how de-duplication software would impact the situation from the standpoint of the release of ClimateGate e-mails, and the attempts to delete e-mails to avoid FOIA requests?
Thanks in advance,
Dan Murphy

Joe Public
November 30, 2011 2:00 pm

@ davidmhoffer says: November 30, 2011 at 1:00 pm
RE: “Do you think that the authorities (UEA and Norfolk Police) know who did this and how?”
“Not a clue, ….. that’s the domain of high end security consultants and government agencies with three letter acronyms.”
And the delay in releasing their findings might, just might, be deliberate.

davidmhoffer
November 30, 2011 2:01 pm

Mike Wilson;
I did not see any reference to how many companies are virtualizing their tape systems. I helped IBM test their first VTS in Texas while it was still in beta in the mid 1990s>>>
Do you remember the product name? My guess is that it is the VTS (or the more common term VTL) that they used to rebrand from FalconStor. A few years ago, they bought a company named Diligent and renamed it ProtecTIER. ProtecTIER is both a VTL and a deduplication platform for use as a backup target.
Your description of using a VTL between the backup servers and the tape libraries is pretty much on the money. I decided to leave VTL, disk-to-disk, and deduplication out of the article as it was getting long enough as it was.

davidmhoffer
November 30, 2011 2:05 pm

PaulH;
Puts me in mind of the book “The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage”
http://www.amazon.com/Cuckoos-Egg-Tracking-Computer-Espionage/dp/1416507787
Wow, thanks! I knew the basics of the story from being involved with certain customers…but I digress. Anyway, I didn’t know it had come out as a book. Putting it on my reading list!

40 Shades of Green
November 30, 2011 2:18 pm

Great article. One of the mostninformative ever on wuwt which is saying a lot. Thanks.

Mike Wilson
November 30, 2011 2:34 pm

David,
It was an all-IBM product. It was VTS model B10, v1.0 of the IBM Virtual Tape Server, running inside an IBM 3494 ATL. The library manager driving the VTS ran a specialized version of TSM (Tivoli Storage Manager) under the covers. The tape drives storing the data on the backend in the ATL were 3 IBM 3590 Magstar drives storing a whopping 10GB uncompressed per cartridge. The virtual drives presented to the system were 16 3490 tape drives.

Mike Wilson
November 30, 2011 2:48 pm

David,
Also, this VTS product has today evolved into the current TS7700 line of products. You can google it if you are interested.

davidmhoffer
November 30, 2011 2:49 pm

Mike Wilson;
It was an all-IBM product. It was VTS model B10, v1.0 of the IBM Virtual Tape Server, running inside an IBM 3494 ATL. The library manager driving the VTS ran a specialized version of TSM (Tivoli Storage Manager) under the covers. The tape drives storing the data on the backend in the ATL were 3 IBM 3590 Magstar drives storing a whopping 10GB uncompressed per cartridge. The virtual drives presented to the system were 16 3490 tape drives.>>>
Yup, that was well before the FalconStor product for sure. To complicate the discussion, TSM is an “incremental forever” architecture rather than a “weekly full, daily incremental” one. In an incremental-forever architecture, tape reclamation and save-set consolidation are very important to do, but they are very hard on the tape drives, cartridges and the library itself. So, doing those processes in a virtual environment such as the VTS is a HUGE benefit. Today’s IBM ProtecTIER can present several virtual libraries and thousands of virtual tape cartridges, plus it does deduplication too.

KnR
November 30, 2011 2:56 pm

I think you left one out: “cleaners”. Every office has them and most people don’t even know who they are, and given they’re low-paid sub-contractors to sub-contractors who are normally treated poorly, they have no loyalty to those they clean for. They make for damn good ways of getting access to sometimes even the most secure room or building; everyone has bins, and these people do need access to do their job, and you can’t expect important people to clean their own toilets now can you. As for the idea that these people are vetted, well, that costs money and takes time, and who would do that for just “stupid cleaners”?

davidmhoffer
November 30, 2011 2:56 pm

Dan Murphy;
The only thing missing that I would have expected to see in the general discussion of this issue is the recent emergence of “de-duplication” technology.
Would you kindly take a moment to address how de-duplication software would impact the situation from the standpoint of the release of ClimateGate e-mails, and the attempts to delete e-mails to avoid FOIA requests?>>>
I just knew someone would ask that one….
De-duplication is a very old technology (late 70’s) that everyone forgot about, and it has now been “invented” again. I’ll try to respond later tonight or else tomorrow AM. I know what the answers are; it’s just that condensing them into something understandable and less than a hundred pages is a bit of a challenge.
Stay tuned!

Mike Wilson
November 30, 2011 3:05 pm

davidmhoffer,
Yes, it is absolutely getting crazy out there. IT professionals can barely keep up with what is going on, so you know Phil Jones had no clue. He doesn’t even have a clue concerning his own profession.
Thanks for the informative article!

Crispin in Waterloo
November 30, 2011 3:09 pm

@R Barker:
>…I wonder what gems might be revealed in the all.7z file(s).
+++++
We had a discussion in the living room (including a high-end IT server expert) and we posit the following:
The encrypted file may be something as simple as a whole backup file from early in the ‘game’ containing something of great importance, perhaps from other sources, showing collusion to manipulate the public with falsified temperature data. It might be necessary to crack the code and then decrypt the (original) result.
It may be a carefully assembled set of data including the deleted programmes and original temperature data (which were claimed to be unavailable only after they were asked for). I think that is what most people expect – lots of UEA emails.
It may be a nested set of encrypted files with sequential revelations, each level requiring a serious brute-force attack to open the hard way. Crack it once and you get a trove plus another huge encrypted file: (1(2(3(4(5(6)))))) with the most important ‘reveal’ sitting at position 6.
The way things are going, the last seems most likely. We have no idea what is going on behind the scenes re threats to publish a password. Sending the target a pwd would only open level 1, but it would serve to hint at what will happen (and more) if they do not, for example, come forward to admit the fraud and perfidy and take down the whole sorry mess from the IPCC to RC. The additional mails in CG2 are so damning to those who have papered over their asses since CG1 that there are no doubt more layers of revelation still to be released.

Dan Murphy
November 30, 2011 3:22 pm

David,
Thanks, I’ll look for your de-duplication post. I had not realized that it was “resurrected” technology!
Dan

davidmhoffer
November 30, 2011 3:38 pm

Dan Murphy;
Thanks, I’ll look for your de-duplication post. I had not realized that it was “resurrected” technology!>>>
Yup. We were doing the equivalent of de-duplication, snapshots, server virtualization, storage virtualization, and many other things in the 70’s and 80’s. Then servers and storage became uber cheap, and we forgot about all those nifty tools because they were more work to run than they were worth. Then the number of servers we needed started to explode, and the amount of data started to explode…. and so we “invented” them all over again.

1DanydTroll
November 30, 2011 3:53 pm

Everything is so goddamn complex unless you know how to do it. But when you know what to do and how to do it, it’s very easy and not at all complex. It was a bit harder some 10-15 years ago, but these days, not so much.
1. Disregard everything you think you know about Climategate 1. Too much information makes you susceptible to presumptions.
2. What then do you actually know about Climategate 2? The real actual facts, not what you think are the facts.
3. From where do most people leak? Excepting the usual orifices.
4. From where do most people hack? Excepting the woods.
5. Put the remainder into a simple visual social network analysis PowerPoint presentation if you like.
6. Try to connect each actual real fact to each person in the network using Bing and Google.
7. Probability dictates, these days, that you’re then done.
Pretty much everything that is done via the internet is stored on the internet, so it’s not at all that difficult these days to find out about stuff; the hard part is actually figuring out why people do what they do, unless they’re trying to fix something that is corrupt, that is. :p

Eric Anderson
November 30, 2011 4:10 pm

“Interesting approach. My own non-DOD-approved technique is to remove the hard drive platters and bend them into interesting shapes. In theory you might be able to recover some fragments of data, but it’s unlikely you’d be able to do anything with them.”
I know you’re talking about large arrays and hard drives, rather than personal backups, but it reminded me of my approach to making my DVD backups non-accessible before I throw them in the trash. About 5-6 seconds in the microwave works great!

mikerossander
November 30, 2011 5:50 pm

re: single-overwrite
I suggest a quick search on the “Great Zero Challenge”. The original challenge was by 16 Systems, I think. It was up for several years and finally taken down, unaccepted. Several smaller groups appear to have made similar offers under the same name and also claim that no one has ever taken them up on the offer.
A good analysis of Gutmann’s original paper about the ability to read overwritten data was published by Daniel Feenberg of the National Bureau of Economic Research at http://www.nber.org/sys-admin/overwritten-data-guttman.html
One caveat (and maybe the source of the recent multiple-overwrite recommendations): when a sector goes bad, the data is still in the sector even though your OS can’t access it. You can, however, format and attempt to overwrite the device; only some of the overwrites of the bad sector will fail. Overwrite enough times and you have a good chance of overwriting even the bad sectors.
And, yeah, the paint blob explanation is a pretty loose analogy for what’s really done but trying to concisely explain fractional additive charge probabilities and quantum interference patterns was beyond my ability. It’s possible the NSA has something that the rest of us don’t know about. I’m skeptical, though. Write sizes are approaching the theoretical limits of resolution for the media and all the research I’ve seen suggests that you needed a read head 20x more precise than was used to write to even have a shot at reading overwritten magnetic media.

Johnnythelowery
November 30, 2011 7:01 pm

This whole thing is incredible. Anti-Cause WUWT undercover Agent Unit #1 ‘FOIA’ took a great gamble: he/she waited for the anniversary to release the next tranche of emails, Climategate II, assuming he’d be alive to do so!! I wouldn’t put it past this conglomeration, deluded into thinking no one should stand in the way of them saving the planet, to investigate FOIA and perhaps ‘intervene’. The access to the emails points to an inside job(?) (see above discussions). We know it was dumped onto a Russian server. The few who are the web gatekeepers should be able to trace the time and location of the uploads. What is incredible is that the feat was repeated since ClimateGate I. The leaks derailed the Copenhagen meeting of Sark, Merk, Brown and O’bama (IMHO). The combined secret service power of these nations seems unable to ‘deal with’ FOIA. A possible clue to his/her identity. Whoever it is needs to make provision for the password release in case of a visit from the heavies. The Climate Science perversion is total. Why would ‘Russia’ want to derail the AGW steamroller? Who, without the protection of a state, could wait a year and repeat the release of these damaging emails??? GCHQ should surely be able to trace an upload of Climategate’s size to a Belligerent State’s (Hi Chelski fans) obscure server. Governments don’t suffer individuals with the power to politically derail $37 Trillion tabbed tax schemes. FOIA, whoever you are: stay thirsty my friend!!
And as he posted on Tallbloke’s under FOIA… wouldn’t Tall Bloke have some basic information to identify something about this… hero??

November 30, 2011 7:01 pm

davidmhoffer ,
Outlook against Exchange in cached mode does use an OST file, similar to a PST file.
But in non-cached mode there isn’t a file (OST or PST). I suspect there HAS to be a trace somewhere, though; I was just trying to track down some unknown disk usage today on a client machine, so I was hoping you might have known 🙂
A very interesting thread; again, I thank you.
my username got skewed last time LOL

November 30, 2011 8:51 pm

For a crash course in real world information security go to the CEO’s desk, or virtually any (non-IT) senior executive’s desk and open the center drawer. His username and passwords for all the systems he has access to will be taped there for him to use when he needs them – unless they’re taped to his keyboard or the front of his monitor.

davidmhoffer
November 30, 2011 8:55 pm

De-Duplication – does it change anything?
Yes. And no. Like all technology questions…. it depends.
Let’s start with what it is. Even that isn’t straight forward.
Data has a lot of duplication within it. Let’s think about it just in terms of email to see how much duplication there could be. Suppose one sent an e-mail to 100 people, and the email had an attachment (say a large Word document that was 1 Megabyte in size). Suppose further that all of those people are on the same mail server (within your company for example). How many copies of the email and the attachment are there? Better still, let’s further suppose that someone changes a single word in the attachment, hits reply all, and sends it. Now how many copies of what are there?
Answer: It depends. The answer varies from one email system to another. Sendmail, Lotus Notes and Exchange all do things differently. So, let’s just focus on one of those. Exchange is the most common, so we’ll go with that.
Answer: It depends. What version of Exchange? Let’s say Exchange 2003, which is a couple of versions ago.
Answer: It depends. Are all the users in the same datastore? Or are they spread out amongst several data stores? OK, let’s make it easy, and say they are all in the same datastore.
Answer: There is one copy of the first email, one copy of the original attachment, 100 “pointers” to them, one copy of the “reply all” email, one copy of the attachment with one word changed, and 100 pointers to them. For easy reference, let’s call them Email1, Attachment1, Email2 and Attachment2 respectively.
If we were on Exchange 2007 however (the second oldest release) we’d get a different answer. There would be 100 copies of Email1, but only one copy of Attachment1 (plus 100 pointers). There would also be 100 copies of Email2, and one copy of Attachment2 (plus 100 pointers).
Now if we were on Exchange 2010, there would be (hang onto your hat) 100 copies of Email1, 100 copies of Attachment1, 100 copies of Email2 and 100 copies of Attachment2. No, I am not kidding! Isn’t that going backwards? Needing more storage for the exact same emails as we did with the older version?
Answer: It depends. The disk performance that Exchange 2003 demanded of its storage was, by comparison to Exchange 2010, ginormous. Since the performance requirements of Exchange 2010 are so much lower, you can actually store all your email for less money than you did with Exchange 2003. The expression “your mileage may vary” applies, however. But let’s put that aside for the moment.
If you have been following along, you’ve already figured out that there really are only two unique emails and two unique attachments. On top of that, the two attachments, which are huge, are identical except for one single word. The mechanism used in Exchange 2003 to store one copy of each and 100 “pointers” was called “Single Instance Store”, or SIS. In brief, SIS was a way to have 100 copies of something without actually “duplicating” it 100 times. But in Exchange 2010, it is in fact duplicated 100 times. So SIS is one form of “de-duplication”. But when people talk about “de-duplication” they are, in most cases, talking about “de-duplicated” backup. To add to the confusion, however, de-duplication actually applies to much more than backup.
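For those who want to see the pointer idea in action, here is a toy sketch in Python. To be clear, this illustrates single-instance storage in general, not Exchange’s actual implementation; the class and every name in it are invented.

```python
import hashlib

# Toy single-instance store: one physical copy per unique attachment,
# while each mailbox holds only a cheap "pointer" (the content hash).
class SingleInstanceStore:
    def __init__(self):
        self.blobs = {}      # content hash -> the single stored copy
        self.mailboxes = {}  # user -> list of pointers

    def deliver(self, user, attachment):
        key = hashlib.sha256(attachment).hexdigest()
        self.blobs.setdefault(key, attachment)  # stored once, however many recipients
        self.mailboxes.setdefault(user, []).append(key)

store = SingleInstanceStore()
doc = b"quarterly numbers " * 60_000  # a ~1 MB attachment
for i in range(100):
    store.deliver("user%d" % i, doc)

print(len(store.blobs))      # 1   -> one physical copy of the attachment
print(len(store.mailboxes))  # 100 -> one pointer per recipient
```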
Let’s discuss backup first. In the article we discussed backup to tape. Tape was designed to back up large amounts of data and restore large amounts of data. If all you want is one file though (say someone deleted their one-page document), it might take as long to restore that one file as the whole backup took. If only the users would coordinate their accidental deletions and do them all at the same time…. but they don’t. Like herding cats, those end users…
So along came these very interesting specialized storage arrays that could “de-duplicate” data, and could look like a tape library to the backup system. This changed backup architectures considerably, solving a lot of problems along the way. And of course, creating some new ones. To illustrate how complex the question of de-duplication is, let’s follow the changes in backup systems based on the three different versions of Exchange already discussed.
If we were backing up Exchange 2003, we’d only have two emails, two attachments, and a bunch of pointers to back up, right? Wrong. The backup program doesn’t look at the data on the storage array. OK, that’s not true either; there are ways for the backup program to talk to the storage array directly, but let’s forget about that for a moment. In most cases the backup program would talk to the email server and ask it to send out copies of all the emails it has. Now, Exchange 2003 doesn’t give a hoot about what is actually on disk; it responds with what it logically has stored. That is 200 emails and 200 attachments, and it sends each one individually. If we were backing up to tape, we would actually write 200 emails and 200 attachments to tape. Good thing we bought that de-duplication device.
The de-duplication device recognizes that it just got 100 copies of the same email, so it makes one copy on disk, with 100 pointers. It does the same with the other email, and the attachments. But wait! Those two attachments are nearly 100% identical. The de-duplication device picks up on that too, and stores only one copy of the attachment, plus the delta changes between it and the second attachment. So, even though Exchange 2003 had SIS, the de-duplication device used even less storage for backing up the data than the email system itself used. It could “de-duplicate” the two attachments. In fact, if one paragraph was repeated inside the attachment, it could very possibly de-duplicate that too.
Paradoxically, from a backup perspective, Exchange 2007 and Exchange 2010 look exactly the same to the backup software as Exchange 2003 does. But the amount of disk space one needs to back up Exchange 2010 to a de-duplication device would be (in this example) less than 1% of the disk space used on the email server. (The purists would argue that this statement is incorrect, and it is. Exchange 2010 uses a number of compression techniques, so while the apparent storage on the email server is 100X, the actual storage used is less.)
When someone wants to restore data from the de-duplication device, the data, in a sense, doesn’t actually exist: there is just a whole bunch of blobs of data with a whole bunch of pointers. If someone deleted that large Word document and needed it back, it could be restored; to do so, the de-duplication device would re-assemble the original document from the various blobs on the disk. This is a process known as “re-hydrating” the data.
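If a sketch helps make the blobs-and-pointers idea concrete, here is one in Python. Real de-duplication devices use much smarter, variable-size chunking and keep everything on disk; the fixed 4KB chunks and all the names here are invented for illustration.

```python
import os, hashlib

CHUNK = 4096  # real devices use variable-size chunking; fixed-size keeps this short

class DedupeStore:
    def __init__(self):
        self.chunks = {}   # chunk hash -> chunk bytes, stored exactly once
        self.recipes = {}  # backup name -> ordered list of chunk hashes

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            key = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(key, piece)  # a duplicate chunk costs nothing extra
            recipe.append(key)
        self.recipes[name] = recipe

    def rehydrate(self, name):
        # "Re-hydration": reassemble the original stream from blobs plus pointers
        return b"".join(self.chunks[k] for k in self.recipes[name])

store = DedupeStore()
attachment1 = os.urandom(CHUNK * 256)                       # a ~1 MB attachment
attachment2 = attachment1[:100] + b"!" + attachment1[101:]  # same doc, one byte changed
store.write("Attachment1", attachment1)
store.write("Attachment2", attachment2)

print(len(store.chunks))  # 257, not 512: the 255 unchanged chunks are shared
assert store.rehydrate("Attachment2") == attachment2
```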
OK, now we can start to answer the original question. De-duplication devices changed the way tape was used. It was so much easier to back up and restore to de-duplication devices that they became the main target for backup. One de-duplication device could hold the equivalent of weeks’ or months’ worth of backups to tape, but on disk. Could the tape libraries just be abandoned? No.
Tape was still a lot cheaper per terabyte in the long run. Tape didn’t need power to keep running. Tape could be thrown into a cardboard box and shipped off site. What many IT shops did was move their daily backup activity to their de-duplication device. They could then relegate tape to much less frequent use, sometimes as little as once per quarter or even once per month. Better still, the copies on tape could actually be made from the data on the de-duplication device. The de-duplication device would re-hydrate the data and copy it to tape, producing a tape that was identical to having made a backup directly to tape in the first place. There were a number of secondary effects of de-duplication though.
The first one was that the number of tapes one needed for tape backup went way down. No more daily incrementals and weekly fulls; maybe four or twelve full copies on tape for the year instead of 52. Suddenly the capacity of the tape library to keep years and years of data increased by a factor of four or more. Capacity is to an IT shop like space in a garage: the amount of stuff you need to keep will always expand to fill the capacity. So, one immediate effect of de-duplication was a massive increase in tape capacity that enabled IT shops to keep more years of data than they otherwise would have, and the number of tapes required to represent one year of data was a fraction of what it otherwise would have been. More years of data to…uhm… give liberty to, and fewer tapes required to do it.
Since the introduction of de-duplication devices, de-duplication has spread throughout the backup system. Backup systems can actually de-duplicate data at a target device such as the de-duplication device we have been discussing, or it can be done in software by the backup application itself, or it can be done by an agent running on the server. But all those options introduced a new problem.
For security purposes, encryption is the best way to protect data. Encryption can be done anywhere in the data food chain: at the application level, on the storage array, on the backup tapes, and so on. The problem is that encryption and de-duplication don’t get along. Encrypt your data, and your de-duplication devices can no longer de-duplicate it. The same is true of compression. So, a lot of IT shops that were moving toward encryption of their data on disk had to back away from that if they wanted the benefits of de-duplication. De-duplication thus increased the risk of…having data liberated… because it reduced the number of places in the food chain where data could be protected using that method.
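A toy example shows why the two don’t get along. The “cipher” below is for illustration only (a keyed SHA-256 keystream invented for this sketch, not something to protect real data with); the point is the random nonce, which makes two identical attachments encrypt to completely different bytes, leaving the de-duplication device nothing to match.

```python
import os, hashlib

def toy_encrypt(key, plaintext):
    # Toy stream cipher: keystream = SHA-256 in counter mode over (key, nonce).
    # A fresh random nonce per message means identical plaintexts never
    # produce identical ciphertexts.
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

key = os.urandom(32)
doc = b"the same attachment, sent twice " * 1000

c1 = toy_encrypt(key, doc)
c2 = toy_encrypt(key, doc)
print(c1 == c2)  # False: the duplicate copy is now invisible to the dedup device
```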
You’d think we’d be done with this topic by now; we’re not. There is also archive. Going back to the discussion in the backup section of the article, many IT shops instituted archives to reduce storage costs and take pressure off of backup systems. De-duplication is also a feature of many archive systems. As a result, the capacity to store email in an archive platform is much higher than it is in the email system itself, often by orders of magnitude. Capacity is like a garage… oops, I already said that. The bottom line is that archives capable of de-duplicating data could store years of email for little cost compared to the email system itself. So, if the source of the data was in fact the archive, our intrepid data liberator would have had considerably more emails all nicely stored in one place. Better still (or worse, depending on your point of view), archive systems came with sophisticated search tools. If someone had access to those tools in the archive, isolating emails with key words and downloading them from years’ worth of data would be a snap.
To be complete, I’d have to add primary storage de-duplication, de-duplication as it applies to wide area acceleration, how compression and de-duplication don’t play well together, how deterministic de-duplication changes that, how journaling and de-duplication affect one another…. and instead, I’m going to bed.
I think that’s the basics though. De-duplication would have no doubt increased the amount of data that was available to liberate, decreased the number of security options available to protect it from unauthorized download, concentrated larger amounts of data in fewer tapes, and made searching for specific email easier. Again, I don’t know that CRU used all (or any) of these techniques, or that they enabled or hindered the liberator of the ClimateGate emails. I’m just saying what the possibilities were.
G’night!

Alex Heyworth
November 30, 2011 9:26 pm

mikerossander;
“There is, in fact, a reward offered for anyone who can recover once-overwritten data from any reasonably modern media. It has gone uncollected for several years.”
Probably because the “reward” is an NSA front. Anybody who tries to claim the reward is shot. /tinfoil hat

davidmhoffer
November 30, 2011 9:49 pm

Alex Heyworth;
Probably because the “reward” is an NSA front. Anybody who tries to claim the reward is shot. /tinfoil hat>>>
Now now now, “shot” is such an ugly term. I believe the policy is to identify the source of the irritation and delete it or encrypt it (scramble them memory cells). hey, you got one of them tinfoil hats too? cool. works like a hot zzzzzzzt. huh? who are you? where am I?

December 1, 2011 5:27 am

I’ve been wondering if some of these mails/data were gotten off a versioning and/or CMS server such as SharePoint.

Dave Springer
December 1, 2011 7:56 am

Hey it’s JEDI SALESMAN!
http://www.cracked.com/members/david.hoffer
ROFLMAO

John Whitman
December 1, 2011 8:01 am

davidmhoffer,
Well done post.
I am left with the question of whether the self-named “we” was a single person (an “I”) wrt the CG1 and CG2 releases or an actual “we”.
After reading your wonderful post, I still lean toward there actually being a “we”: more than one person involved. The primary reason I think so is that I cannot see a unique personality in the effort and communications by “we”.
John

Dave Springer
December 1, 2011 8:18 am

The emails were obtained off a backup email server in the IT department. Ostensibly few people knew about it, including the CRU team and the person handling the FOIA requests. This is a finding of the UK government’s investigation into Climategate, as given in item 31 of the conclusions of the Muir-Russell Review:
http://www.cce-review.org/pdf/FINAL%20REPORT.pdf

31. Limited internal communication. We found a lack of understanding within University central functions of the presence of extensive, and long duration, backups of e-mail and other materials despite these being on a server housed within the central Information Technology (IT) facilities. Awareness of these might have led to much greater challenge of assertions regarding non-availability of material by CRU, notably in the case of a subject access request made under the DPA for material naming the requesting individual.

Unadvertised archiving of employee email is quite common. Even if the official policy states otherwise there’s usually someone in IT who does it anyway.

Dave Springer
December 1, 2011 8:32 am

One might wonder if Muir-Russell conclusion 31 is an honest finding. UEA responded to a number of FOIA requests saying there was no responsive material. Phil Jones and the rest of the usual suspects cleaned out their personal email storage around 2008, which was admitted in the Climategate emails. If those had been the only copies, then it would have been a defensible response. If any of the CRU team, or the FOIA respondent himself, had known about the “extensive”, “long duration” email backup server, then UEA would have been committing a serious fraud in denying the existence of responsive materials. So I’m left wondering whether it actually was an honest mistake or whether the Muir-Russell review is itself covering up criminal activity with the seemingly less evil conclusion of “poor communication”.
I’d say it probably was an honest mistake. Many institutions keep a permanent easily searchable record of all employee emails and few institutions advertise the fact that they do. Often it’s done in contravention of official policy with IT doing it and everyone who’s aware it’s being done just looking the other way.

Dave Springer
December 1, 2011 8:51 am

Let’s get something straight. I’ve actually written email clients and POP servers. I’ve been in the business of designing computer hardware and software since the 1970s. I didn’t sell the stuff, I designed it, and I worked with the people who used what I designed. There are so many mistakes in the OP I hardly know where to begin. This one below just happened to jump out at me:

The email client connects an email server (or servers in a very large implementation). To send an email to someone on a different email server, the two servers must “talk” to each other. In most cases they do so over the internet. How the clients interact with the servers however, is part of understanding why deleting an email that you sent (or received) is not straight forward. The reason is that an email is never actually “sent” anywhere. Once you write an email it exists on the disk drive of the computer the client software is installed on. Press “send” and it goes….nowhere. It is still there, exactly as it was before you “sent” it.

This is not how it works in thin clients. Web browsers are thin clients. If you’re accessing your email through a browser, nothing is stored locally on your computer. When you press send it does indeed leave your computer, except perhaps for an inadvertent copy left behind in a temporary browser cache. Other email client implementations may not be thin clients, but thin clients have been increasingly popular since the mid-to-late 1990s, when browsers became ubiquitous and internet bandwidth costs were plummeting.

davidmhoffer
December 1, 2011 9:24 am

John Whitman;
I am left with the question of whether the self-named “we” was a single person (an “I”) wrt the CG1 and CG2 releases or an actual “we”.>>>
Having zero evidence in regard to how anything was or wasn’t done, exactly how their environment was set up, etc., I am nonetheless inclined to agree. Consider simple things like a backup administrator setting the backup system to cut an extra set of tapes for off-site storage, and a truck driver from the off-site storage company simply taking them out of the box. There are many ways in which an individual in the right place could pull this off themselves, but two or three people working together…and you have a crazy number of possibilities.

davidmhoffer
December 1, 2011 9:30 am

Dave Springer;
The emails were obtained off a backup email server in the IT department. Ostensibly few people knew about it including the CRU team and the person handling the FOIA requests. This is a finding of the UK government’s investigation into the climategate as given in item 31 of the conclusions in the Muir-Russell Review:>>>
The Muir-Russell Review that went out of its way to whitewash anyone and everything it could in regard to the entire incident? Yes, that would be a firm conclusion backed up by solid evidence that we could all just accept at face value. Sorta like tree ring data and hockey sticks…. here are the results; no, you can’t see how we got them.

davidmhoffer
December 1, 2011 9:35 am

Dave Springer;
This is not how it works in thin clients. Web browsers are thin clients.>>>
You are correct. If you will refer back to the article itself, you will see that I was specific about the fact that there is a difference, and that the balance of the article would be written from the perspective of an email client installed on a desktop (i.e. a “thick” client). As at least one of the emails from Phil Jones refers to his use of Eudora, a thick client, it made most sense to restrict the article to a discussion of email from that perspective.

davidmhoffer
December 1, 2011 9:40 am

Dave Springer;
There are so many mistakes in the OP I hardly know where to begin. This one below just happened to jump out at me:>>>
Your personal animosity toward me is leading you astray. You claim that I’ve made many errors, yet single out one which, as a brief read of the article itself makes clear, was not an error on my part at all. I specified the difference, gave examples of both, and then advised that the balance of the article would focus on only one of them. Do you want to contribute to the discussion? Or just attack what I’ve said because you don’t like me personally?

davidmhoffer
December 1, 2011 9:53 am

Dave Springer says:
December 1, 2011 at 7:56 am
Hey it’s JEDI SALESMAN!
http://www.cracked.com/members/david.hoffer
ROFLMAO>>>
Hey, that was hilarious! Are you certain that the author and I are the same person? There’s a David Hoffer who went to jail in Saskatoon, and all the people he owed money to tried to collect by putting liens on my assets. Boy, was that fun to sort out. Then there was a David Hoffer who had a one-night stand with a girl in Winnipeg, got her pregnant, and phoned my house at 3:00 AM demanding that I pay child support. My wife was not amused. If you hit LinkedIn, you’ll find 25 professionals named David Hoffer. There’s a David Hoffer who is a public defender, and another who is an investment banker. Of course, there’s actually only one of me; those are all just personas I’ve adopted to keep you guessing.
Really Dave, drop it. Your obsession with discrediting me is tiresome and of little value to the discussion.

Dave Springer
December 1, 2011 10:02 am

Store-and-forward email
This, where emails you write and receive are stored on your local computer, is considered obsolete. In 1993, when I first started working at Dell Computer, we were using cc:Mail, which is a store-and-forward system. This is what Hoffer is describing when he says “when you press ‘send’ nothing is really sent”. I think it was finally abandoned at Dell around 1998, but it may have been later. The last I recall needing a cc:Mail client was when I was over in Taipei working on the first Inspiron laptop circa 1998. In order to get my email I had to dial up Dell from my hotel room using an analog modem on my laptop and use a fat client called cc:Remote, which retrieved any emails sent to me since the last connection and forwarded anything I’d written. My personal email at that time was already on a thin client, where telnet was the client-side software. It was pretty soon after that I began using web mail and never looked back. After January 2000, when I left the corporate rat race, I’ve used nothing but Hotmail (general purpose) and Gmail (technical-only). I understand Google introduced something called “Gears” which allows for some measure of offline work with your Gmail. I read about Gears just a moment ago, as I was engaged in due-diligence fact-checking about fat clients that may still be around before posting this comment. Gears would be an example of a fat client.
Anyhow, cc:Mail itself was purchased by Lotus in 1991, Lotus was purchased by IBM in 1995, and cc:Mail was officially abandoned by IBM/Lotus in 2000. I recall the struggle. There was a push by Lotus to migrate cc:Mail users onto Lotus Notes. Lotus Notes sucked, IMO. The UI was obnoxious, bordering on unusable for the casual user. I used Notes because our test/qualification system was tied to our suppliers, so, for instance, when we were qualifying a newly designed motherboard, Intel, NVidia, and assorted others could all work off the same issues database hosted through Notes, with appropriate compartmentalization. I never recall our email going through Notes, though, and I was high enough up the food chain that I didn’t have to muck around in Notes very often. An issue serious enough for my attention usually found me, as opposed to me finding the issue.

davidmhoffer
December 1, 2011 10:09 am

mikerossander;
A good analysis of Gutmann’s original paper about the ability to read overwritten data was published by Daniel Feenberg of the National Bureau of Economic Research at http://www.nber.org/sys-admin/overwritten-data-guttman.html>>>
I read Feenberg’s article and agree 100% with his criticisms of Gutmann’s paper. Thanks for the link! I had a bit of time to dig through my own archives, and I have to admit that what I have (the newest correspondence I have on the matter is almost ten years old) suggests that you are probably correct. The precision and density of current hard drive technology would have made the techniques I recall pretty much obsolete. Also, the correspondence I have turned out to be in relation to formatting hard drives and whether or not the data could be recovered afterward (I’m talking a low-level format here, not an O/S format). The answer was “sometimes”, because while formatting would return any given “bit” to “zero”, it would still be in a range called “zero” by the heads in the drive. By reading the absolute value of the magnetic “charge” of that “bit”, it was possible to guess that a “zero” at the top of the range was originally a “one”, but that a “zero” at the bottom of the range was originally a “zero”. After coming up with a “best guess” on a bit-by-bit basis, the technique was then to use parity bits and other ECC information to try and verify the guesswork.
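If it helps, here is that guessing logic as a toy sketch; every number in it is invented for illustration, and real recovery would work on raw head signals, not neat floats.

```python
# After a format, every bit reads as logical "zero", but the analog level
# within the "zero" band hints at what the bit held before the format.
ZERO_BAND_MAX = 0.5  # hypothetical: the head calls anything below this a 0

def guess_prior_bit(analog_level):
    # Residue near the top of the "zero" band suggests the cell was a 1
    # before the format; near the bottom suggests it was already a 0.
    return 1 if analog_level > ZERO_BAND_MAX * 0.6 else 0

readings = [0.05, 0.41, 0.38, 0.02, 0.44, 0.11]  # hypothetical post-format levels
print([guess_prior_bit(r) for r in readings])    # best guess: [0, 1, 1, 0, 1, 0]
# The guesses would then be checked against parity/ECC bits, as described above.
```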
However, that was recovering data from a format operation; actually overwriting the data would be many, many times as challenging, and, as you pointed out, the precision of ten years ago was completely different from what we have today.
dmh

davidmhoffer
December 1, 2011 10:23 am

Dave Springer;
This, where emails you write and receive are stored on your local computer, is considered obsolete. >>>
Dave, Dave, Dave…. Outlook is by far and away the most commonly used email client there is. It can be implemented with different options, the most common of which is “store and forward”. Phil Jones specified his use of Eudora, another thick client that is “store and forward”. It matters not one whit what is obsolete and what isn’t. What matters is what technology was being used, and how it worked, which is what I’ve focused on. If Phil Jones had made comments about using thin clients, I would have focused more on that approach.
BTW, even if you are using a thin client, the email servers themselves still operate via a store and forward mechanism.
In regard to Lotus Notes, this is far more than an email system. It is also an electronic conferencing tool, and a workflow automation tool. The manner in which you describe interacting with Lotus suggests you were using functionality over and above the email system itself. If you want to trace the heritage of Lotus Notes from that perspective, you’d have to dig into technologies such as Digital Equipment’s VAX Notes which was pretty much abandoned after the VAX Notes development team resigned en masse and went to work for…Lotus.

CRS, Dr.P.H.
December 1, 2011 10:43 am

Thank you, David, for an excellent read! Also, thanks to commenters for your valuable analysis and, as always, Anthony & the Mods for keeping the conversation going!
I’m just a humble biologist, but I’ve known the basics about email for many years (I have friends who helped to develop these systems at the University of Illinois in the 1970’s on our PLATO network system, a fascinating story). It blows my mind that these “brilliant” climate scientists were so ignorant about even the basics of email architecture & function!!

There are multiple emails showing up in which, for example, Phil Jones says he is going to delete the message right after sending it. But we now have a copy of that specific message. Did he send it and then forget to delete it? Probably not. The more likely answer is that he did delete it, not realizing that the CRU data retention policy resulted in a copy being left on the server.

Ha ha ha, Eudora made him do it!! What a pack of fools!! Oops, now that comment will be retained for eternity on a drive somewhere. Oh well.

Dave Springer
December 1, 2011 12:36 pm


“Phil Jones specified his use of Eudora, another thick client that is a “store and forward”. It matters not one wit what is obsolete and what isn’t. ”
It also doesn’t matter what kind of client Jones used. I never claimed he was using one or the other. I just said not all email clients keep a local copy, which you do not dispute. You wrote:
“Once you write an email it exists on the disk drive of the computer the client software is installed on. Press “send” and it goes….nowhere. It is still there, exactly as it was before you “sent” it.”
This you wrote under your heading EMAIL 101. Now you’re trying to say it wasn’t a general introduction to email (hence 101) but rather on Phil Jones’ Suspected Email Client Configuration. Okay. Sure. That’s it. That’s the ticket.
You REALLY need to learn to stop digging.
.

Dave Springer
December 1, 2011 12:47 pm


“BTW, even if you are using a thin client, the email servers themselves still operate via a store and forward mechanism.”
That’s certainly an option, but they only need to store until delivery is confirmed, which might be anywhere from milliseconds to never. If policy is to trash everything more than, say, 6 months old, then it might get trashed that often. Reality doesn’t always match policy, and it’s awfully easy to configure the mail server to save a copy of everything. Fourteen years is a long time for “The Team” to not have a clue about it, isn’t it?

Dave Springer
December 1, 2011 1:11 pm

“Also, the correspondence I have turned out to be in relation to formatting hard drives and whether or not the data could be recovered afterward (I’m talking low level format here, not an O/S format).”
The problem with that is that when a zero-fill format is done, it’s almost always in conjunction with a test of the media as well, to locate & mark bad sectors. That means all bits will be raised to one before being lowered to zero as part of the integrity test. I worked for a couple of years in internal diagnostics at Dell and have written more low-level disk utilities than I care to remember, except for the first-to-market things, which are cool. I wrote and sold the first hard disk cache (LRU sector algorithm) for an IBM PC or compatible using LEM (Lotus Intel Microsoft) expanded memory, for instance, just about exactly 30 years ago.

Dave Springer
December 1, 2011 1:18 pm

Oops, that should be LIM, not LEM. I guess there’s too much stink of neo-NASA around here and I was thinking Lunar Excursion Module. 🙂

Dave Springer
December 1, 2011 1:46 pm

davidmhoffer says:
December 1, 2011 at 9:24 am
John Whitman;
I am left with the question of whether the self-named “we” was a single person (an “I”) wrt the CG1 and CG2 releases or an actual “we”.>>>
Having zero evidence in regard to how anything was or wasn’t done, exactly how their environment was set up, etc., I am nonetheless inclined to agree. Consider simple things like a backup administrator setting the backup system to cut an extra set of tapes for off-site storage, and a truck driver from the off-site storage company simply taking them out of the box. There are many ways in which an individual in the right place could pull this off themselves, but two or three people working together…and you have a crazy number of possibilities.
==============================================================
This is why in science, where there are very often many possible explanations, we have the principle called Occam’s Razor which states that the simplest explanation is usually the correct one and you should not add things that are not strictly necessary.
The simplest explanation is there was a 13-year continuous inclusive email archive on a departmental mail server configured to keep just such a continuous inclusive record, and then someone with legitimate credentials to access the archive made a copy of it or allowed someone else to make a copy of it. This has whistleblower written all over it or, barring that, someone on the inside was paid off. I’d really like to know if anyone with legitimate access bought any expensive cars lately, if you get my drift.
Hackers and tape libraries and anything else I’ve seen mentioned are unnecessary complications.
I mean, it could have been intercepted by the extraterrestrials described by Erich von Däniken, who beamed it to their hidden base in Area 51, who then transferred it by mental telepathy to an army of elves, who transcribed and emailed it to a 17-year-old Russian kid who knew a guy who could post it. But that’s a lot of unnecessary complication.

davidmhoffer
December 1, 2011 2:12 pm

Dave Springer;
It also doesn’t matter what kind of client Jones used. I never claimed he was using one or the other.>>>
No. You claimed my article was in error because thick clients were obsolete. I answered that issue, and now you are casting my answer in the context of something else completely.
Dave Springer;
This you wrote under your heading EMAIL 101. Now you’re trying to say it wasn’t a general introduction to email (hence 101)>>>
Anyone who read the article knows it was a general discussion of email with a specific focus on the most likely technologies that were in use.
Dave Springer;
You REALLY need to learn to stop digging>>>
LOL.
Dave Springer;
That’s certainly an option but they only need to store until delivery is confirmed which might be from milliseconds to never. If policy is to trash everything more than say 6 months old then it might get trashed that often>>>
Which is why I explained what a data retention policy is, how data retention from the end-user perspective differs from the email server perspective, how various data protection mechanisms such as snapshots and backup systems work, and how they typically integrate with each other.
Is there any purpose to your sniping? Are you adding any value by attacking snippets of what I said and casting them in an ill light by presenting them out of context? Is there something to be gained by pretending you are adding new information when the article itself covers exactly the information you claim to be adding? Have you spent one second trying to be of value to the discussion rather than simply trying to find fault with my explanations?
Seems to me you are pretty fixated on your dislike for me. Tiresome, frankly. Don’t go away mad.
Just go away.
REPLY: BOTH OF YOU JUST STOP - PLEASE - DON'T MAKE ME THROW YOU IN THE TROLL BIN - Anthony

davidmhoffer
December 1, 2011 2:20 pm

Dave Springer;
The simplest explanation is there was a 13-year continuous inclusive email >>>
The purpose of the article was to explain the main technologies involved in order to illustrate the possibilities. Applying Occam’s Razor to arrive at any conclusion at all, simply by pointing the finger at the simplest possible method, displays a total and complete misunderstanding of the complexity of the systems involved and of the security systems that were very likely in place for the specific purpose of preventing access by the simplest and most obvious methods. It also says something about your fixation on attacking me personally, for whatever reasons you have to do so.
Hugs and kisses.

December 1, 2011 9:17 pm

This article also does not discuss the difference between email protocols, the two most prevalent being POP and IMAP. This matters because how your email client interacts with the email server, and how emails are retrieved and deleted, are very different between the two. Most home users are probably using webmail clients (browser based) and need to check whether the account is using POP or IMAP. For example, Comcast, Hotmail and Yahoo! use POP3, while Gmail supports both.
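The difference is easy to see with Python’s standard poplib and imaplib; the host and credentials below are made up. A POP client downloads messages and (traditionally) deletes them from the server, so the “real” copy ends up on the client’s disk, while an IMAP client works on messages that stay on the server.

```python
import poplib, imaplib

HOST, USER, PASSWORD = "mail.example.com", "user", "secret"  # hypothetical account

# POP3: download each message, then mark it for deletion on the server.
pop = poplib.POP3_SSL(HOST)
pop.user(USER)
pop.pass_(PASSWORD)
count, _ = pop.stat()
for i in range(1, count + 1):
    response, lines, octets = pop.retr(i)  # the local client now owns this copy
    pop.dele(i)                            # flag it for deletion server-side
pop.quit()                                 # deletions actually happen at QUIT

# IMAP: messages live on the server; "delete" is a flag plus an explicit expunge.
imap = imaplib.IMAP4_SSL(HOST)
imap.login(USER, PASSWORD)
imap.select("INBOX")
typ, data = imap.search(None, "ALL")
for num in data[0].split():
    imap.store(num, "+FLAGS", "\\Deleted")  # flagged, but still on the server
imap.expunge()                              # now it is actually removed
imap.logout()
```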

davidmhoffer
December 1, 2011 10:34 pm

dmacleo;
dmacleogravacleo says:
November 30, 2011 at 10:27 am
question on exchange-outlook relationship where non-cached is used.
where is the local copy kept?
there is no ost file like cached mode, is it a temp file deleted at some point?>>>
I pinged one of my contacts and he explained it this way. In “online” mode, that is, non-cached mode, the email is never stored on the hard drive of your desktop computer. So, while Outlook itself is installed on your desktop computer, when you read an email it is downloaded to your desktop computer but only stored in RAM (memory). When you close the email, from the perspective of your desktop computer, it is “gone”. No permanent copy was ever made to the desktop computer’s hard drive.
This is slightly different from what a web client (thin client) does, but has a very similar overall result. If you want to make a copy of your emails to (for example) a USB drive, you would have to “export” them to a .pst file and store it on your USB drive. However, you cannot do that unless the email server itself is configured to ALLOW it to be done. A lot of IT shops shut that feature off for the specific purpose of preventing end users from walking off site with all their emails on a USB key.

CRS, Dr.P.H.
December 1, 2011 11:08 pm

Whoever these DUQU folks are, they sure showed us how to wipe your trail out properly!!
http://www.computerworld.com/s/article/9222293/Duqu_hackers_scrub_evidence_from_command_servers_shut_down_spying_op?taxonomyId=85#disqus_thread

According to Kaspersky, each Duqu variant — and it knows of an even dozen — used a different compromised server to manage the PCs infected with that specific version of the malware. Those servers were located in Belgium, India, the Netherlands and Vietnam, among other countries.
“The attackers wiped every single server they had used as far back as 2009,” Kaspersky said, referring to the Oct. 20 cleaning job.
The hackers not only deleted all their files from those systems, but double-checked afterward that the cleaning had been effective, Kaspersky noted. “Each [C&C server] we’ve investigated has been scrubbed,” said Schouwenberg.

Brian H
December 1, 2011 11:58 pm

From A Public Servant’s account of UK FOI laws, they adhere to the commandment, “Thou shalt remember the legal fictions, and keep them holy.” Deemed deletion => real deletion? Bizarre indeed, as davidmhoffer sez..
BTW, david, there is not, thankfully, any such word in English as “becomed”. Irregular ol’ Anglo-Saxon verb, past tense “became”. Honest.

davidmhoffer
December 2, 2011 12:10 am

CRS, Dr.P.H. says:
December 1, 2011 at 11:08 pm
Whoever these DUQU folks are, they sure showed us how to wipe your trail out properly!! >>>
I read the full article and some related articles; thanks for the link! That is one seriously nasty, vicious, brilliant piece (or pieces) of code those guys wrote. At the beginning of the article, when they theorized that there were government agencies behind it, I thought… yeah, right.
Then I started reading a bit more about how the darn thing works. That’s one seriously heavy-duty chunk of code. It works by tunneling across operating systems and network protocols; it even invades one version of a piece of code with one exploit, and then uses that to upgrade to a newer version in order to take advantage of a different exploit. WOW!
This isn’t the kind of thing that you can write on a PC. You’d need an entire network with all the operating system variants, protocol variants, and edge network devices like firewalls, load balancers, network sniffers, and on and on, just to test it and see if it works as designed. You’d even need another whole environment to represent the “target” environment; otherwise, when you did payload testing, you’d wipe out your own computers.
One can’t help but be both impressed and scared sh*tless at the same time.

davidmhoffer
December 2, 2011 12:50 am

Brian H;
BTW, david, there is not, thankfully, any such word in English as “becomed”. >>>
Sigh. You are correct. You’d think that with “unbecoming” and “becoming” and “become” all being real words, “becomed” would be too? Or would it be “becamed”? 😉
There’s no such word as “discombooberated” either, but when I read the ClimateGate emails…it seems appropriate. After all, the emails were appropriated in some manner from their proprietors, and when one reads the explanations of “the team” it is clear that they have become discombooberated. Quite unbecoming.

December 2, 2011 5:37 am

Offline is cached; it generates the OST file.
Online (non-cached) is similar to a thin client; I suspected, and you confirmed, it’s held in RAM.
I appreciate your looking into that.
I have set the server (SBS 2011 now) up to allow export to PST; it’s a small network, so I can get away with it.
As for the disk space I was wondering about…well, let’s just say I am too embarrassed to say what it was. Just accept that I was being really stupid and failed to actually look at what I needed to LOL LOL

davidmhoffer
December 2, 2011 9:04 am

dmacleo;
as far as the disc space I was wondering about…well lets just say I am too embarrassed to say what it was. just accept I was being really stupid and failed to actually look at what I needed to LOL LOL>>>
Never be embarrassed about this stuff. As a couple of commenters pointed out, IT is now so complicated that nobody on earth understands all the technologies and all the options. Get good at the stuff you do every day, and reach out for expertise on the “once in a while” stuff. The real trick is knowing when to reach out, because you don’t know what you don’t know. At least, that’s my philosophy. Here’s a brief anecdotal story that ought to make you feel better.
I was in a meeting with a very large ($Billions/year) company that was having trouble with their high-performance compute farm. They’d bought entire racks of servers from three different manufacturers and, in their words, “none of them work”. The complaint was that they were getting about 1/8th the performance they thought they should be getting. I asked to see the list of software they were running.
The problem was that their software was all “single threaded”, but their server blades had 8 CPU cores each. So…the maximum they could get out of a single-threaded application was 1 CPU core out of the 8 actually doing any work. Talk about some red faces. Then I suggested that they run virtualization software (and there are free versions that suited their purposes) so that they could run 8 O/S instances on each blade, and hence 8 threads on each blade. More red faces.
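For anyone who wants to see the effect, here is a minimal sketch of the problem and the fix, using Python’s multiprocessing to stand in for the virtualization approach; the workload itself is invented.

```python
import multiprocessing as mp

def crunch(n):
    # Stand-in for the CPU-bound work; the real workload is unknown
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [5_000_000] * 8

    # Single-threaded: one core does all the work while the other seven idle
    serial = [crunch(j) for j in jobs]

    # One worker per core: the same work spread across all eight cores,
    # roughly what running 8 O/S instances per blade accomplished
    with mp.Pool(processes=8) as pool:
        parallel = pool.map(crunch, jobs)

    assert serial == parallel  # same answers, a fraction of the wall-clock time
```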
I won’t tell you who the company was, but I will tell you that they are one of the largest software companies in the world.
Feel better now? LOL

December 2, 2011 2:01 pm

HA LOL
a little better, still embarrassed at myself though 🙂

davidmhoffer
December 2, 2011 4:29 pm

I just remembered a better story. I was travelling, and had forwarded one email account to another so I would get both sets of inbound email on my BlackBerry (this was before BlackBerries could have multiple email accounts). Then I forgot that I had done it. I moved my BlackBerry service from the second account to the first, and because I wanted all my email visible on my BlackBerry, I forwarded all my inbound email from the second account to the first account…
32,000 emails later, I exceeded some limit or other on one of the accounts, or a firewall rule woke up; I’m not even sure what broke the infinite loop. I woke up in the morning and had 32,000 unread emails. Actually, 64,000 (32k in each account).
The razzing didn’t last as long as the time I wrote a piece of code with an infinite loop that had a print command in it that went to a 400 page per minute line printer….

Mike Wilson
December 5, 2011 7:09 am

David,
I wonder if any outsourcing was involved here? We do not even realize how lax security has become through outsourcing. I lost my job 2 years ago, shortly after demonstrating to management how they were allowing US military secrets to be accessible to employees (administrators) residing in foreign lands. After I documented how it could be done, it was taken to the very top of this VERY large corporation. My manager was forced to sign off that there were no security risks, even though I could prove otherwise!
They did not want to hear about it, because they could save money by using the cheaper labor. This just goes to show what a lack of leadership we have today!

Mike Wilson
December 5, 2011 7:17 am

Oh, and while I am not certain, I think this data is still sitting available today.

December 5, 2011 8:21 am

It seems that CRU’s MIME device left its footprint in the emails, some cryptic strings.
For example: ??——————————————
Maybe they didn’t recognize that the string carries a message ID inside it, and that the original message can be found with that ID.
Is that an accident, or are they really so silly?
And is MI5 investigating what happened?
I’m just taking a closer look at their MIME device, some UAX or ….
Hold on, is this only a dream?? I must check it.
Ilkka

davidmhoffer
December 5, 2011 8:30 am

Mike Wilson says:
December 5, 2011 at 7:09 am
David,
I wonder if any outsourcing has been involved here? We do not even realize how lax security has become through outsourcing.>>>
Yes, outsourcing is a huge security risk. In most cases, outsourcing contracts rely on contractual obligations to enforce security. The client gets legal commitments and then frequently washes their hands of the risk, as it is up to the supplier of the outsourced services to put the proper security in place. Oddly, many clients will put the outsourcer through the wringer to prove they are capable of providing the services being contracted for, and then just sign a piece of paper without any due diligence on the security aspects.
Was outsourcing part of the mix here? I don’t know. From the various snippets of information that we get, it looks like UEA had their own central IT department running the email system. But off-site storage of backup tapes is almost always an outsourced service. Implementation of new technologies not familiar to the IT group is often outsourced to a contractor with experience and expertise who comes on site to get things up and running. I don’t know that anything at all was outsourced by UEA, but it certainly is a possibility, and in fact likely for at least some tasks.

Mike Wilson
December 5, 2011 8:46 am

David,
Not saying this is what happened at CRU, but the following did happen.
Imagine this scenario. A large outsourcer is taking over data centers for many customers. Some customers have extremely sensitive data of national-security interest and some don’t. The outsourcer wants to offer ATLs (automated tape libraries) to customers, but these libraries cost 1 or 2 million apiece. So to save money, a management & marketing decision (with the customers’ approval at the time) is made to place multiple customers’ data in a shared ATL. The ATL is configured by the storage administrator, as directed, in such a way that nobody on customer A’s system can access data in the ATL that belongs to customer B (it is actually also customers C, D, E, F, G & H) except the storage administrator(s) who holds the keys. Now imagine, after this is all set up and running quite well, management starts thinking these storage administrators are very expensive and we have cheaper people overseas. Management starts making decisions w/o input from the current storage administrators. They know good and well that customers A and C have very sensitive data that cannot legally be administered overseas. But due to many internal reorgs and management changes, they really have no clue as to how the systems are set up, or even that the ATLs are shared, or even what that means. So the plan becomes to divvy up support for individual customers according to who can be managed overseas and who cannot. But the problem is that a storage administrator must have complete control of all the data, due to the nature of his job, in order to manage the hardware and data. So now when the storage administrator learns of these plans, he raises a red flag, but by then the ball has already rolled a long way downhill & decision makers have much crow to eat if plans are halted. THAT MUST NOT HAPPEN, SAYS MANAGEMENT! WE DON’T LIKE CROW!
So now you have ATLs and VTSs with multiple companies’ data in them, with storage administrators in multiple countries who (if they know what they are doing) can access the tape data of any system in the ATL or VTS.
And do you think any of the companies that outsourced their systems even know about this?

Reply to  Mike Wilson
December 5, 2011 8:58 am

If you don’t believe me, why don’t you give Phil Jones a call?
I can almost hear the MIME device’s dial tones.
Click the link.
http://www.ecowho.com/foia.php?search=+______________________________________________________________

Mike Wilson
December 5, 2011 11:26 am

Oh yea, if you are reading this and you are in the United States. Chances are good your data is in there too. That is if you have credit or if you have had medical tests performed. Otherwise nothing to worry about. Actually I think the medical data may have been removed, not sure though, have not been there for a while now. I was the trouble maker that went 1st.
And yes, David is absolutely correct. We storage administrators have access to EVERYTHING! Nobody on the system has more access to the data. Generally not even the data security admin (although they could grant themselves access if they wanted to).

davidmhoffer
December 5, 2011 4:30 pm

Mike Wilson;
Your example of an outsourcer and changing administrative roles that result in a security risk happens… way too often. Not only does management not want to eat crow, most customers don’t even think to ask the question in the first place, let alone demand a contractual obligation to implement best practices that would protect them.
To underline just how much access to data a storage admin has, and how they can cover their tracks, here is an amusing story involving a telephone company that shall remain nameless. I’ll call them Telco for easy reference.
Telco had a study done on their customer satisfaction versus their competition in the cell phone market. One company came out WAY on top, so Telco decided to take a closer look at that company’s web site to see what made them so “good”. To Telco’s surprise, the packages being offered on the web site were identical to their own. The web site seemed to be a one-man operation, and to top things off, the prices were well below Telco’s cost of operation. How could this one-man show have replicated Telco’s services right down to the last feature, and offer them for less than Telco themselves could, despite Telco being a $10 billion/yr company that owned its own cell tower infrastructure?
A little sleuthing turned up the fact that the one-man band was, in fact, a storage administrator who worked at Telco. Every evening, he would take a “snapshot” of the data in all the admin systems. He would then enter all the cell phone orders he had gotten from his web site that day and initialize them. This would trigger the admin systems to contact the phone switches themselves and provision services to the cell phone being set up (including the number). This is where the power of the storage admin in that position becomes clear.
Once all the services were provisioned in the switching network, the storage admin would interrupt the admin systems, kill the current copy of the data, and then promote the “snapshot” he had taken earlier to be the current copy of the data. The net result was that all the services the rogue admin had provisioned for his customers were now up and running in the cellular network itself, but there was no record of them ever having been initialized in the billing system… and hence, no bill.
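A stripped-down sketch of that trick (invented data structures, nothing to do with Telco’s actual systems): the rollback erases the billing trail, but the side effects already pushed to the switches survive.

```python
# The snapshot/promote trick: provision services, then roll the billing
# database back to a point-in-time copy taken before the orders existed.

import copy

billing_db = {"orders": []}             # admin/billing system state
network_switches = {"provisioned": []}  # downstream state the rollback can't touch

# 1. Take a point-in-time snapshot of the billing data.
snapshot = copy.deepcopy(billing_db)

# 2. Enter the rogue orders; provisioning propagates to the switches.
for phone in ["555-0101", "555-0102"]:
    billing_db["orders"].append(phone)
    network_switches["provisioned"].append(phone)  # side effect in the network

# 3. Promote the snapshot back to "current": the billing record vanishes,
#    but the switches still carry the live services.
billing_db = snapshot

print(billing_db["orders"])             # [] -- nothing to bill
print(network_switches["provisioned"])  # ['555-0101', '555-0102'] -- live service
```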
I have way too many of these stories, but here are a couple more quick points:
Question: Can you preserve a document that you wrote on a computer by printing it and saving the hard copy?
Answer: Not unless you want to go to jail. Evidentiary law in both the United States and Canada requires that all documents be preserved with all their original attributes. If you print a hard copy, it is no longer searchable and it no longer has metadata (date of creation, modification, author, etc.), and hence it is a violation of compliance law.
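To make the point concrete, here is a small illustration (file name invented) of the machine-readable attributes that live alongside a file on disk, none of which survive on a sheet of paper:

```python
# The on-disk file carries timestamps (and, for many formats, embedded
# author fields) that a printed hard copy cannot preserve.

import os
import time

path = "report.txt"  # any file you have on hand
with open(path, "w") as f:
    f.write("draft findings")

st = os.stat(path)
print("size (bytes):  ", st.st_size)
print("last modified: ", time.ctime(st.st_mtime))
print("last accessed: ", time.ctime(st.st_atime))
# None of this -- nor full-text searchability -- exists on paper.
```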
Question: If you are a Canadian publicly traded company, are you subject to Sarbanes Oxley in the United States even if you do not trade on an American stock exchange?
Answer: Possibly. SOX applies regardless of where your shares are traded, provided that 20% (I think) or more of your shares are held by American citizens. I did work for one Canadian company that was sure they were exempt, until I asked whether they had an employee share program, and how many of their American employees held how many shares… oops!
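Back-of-envelope arithmetic for that scenario (all numbers invented, and the 20% threshold is as recalled above, not legal advice):

```python
# A Canadian issuer can cross the US-ownership line once the
# overlooked employee share program is counted.

total_shares = 10_000_000
us_institutional = 1_500_000   # 15% -- looks safely under 20%
us_employee_plan = 600_000     # the employee share program nobody counted

us_held = us_institutional + us_employee_plan
print(f"US-held: {us_held / total_shares:.0%}")  # 21% ... oops
```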
Question: If you are in Canada and are an executive of a company that has violated SOX, can you avoid arrest and prosecution in the United States simply by never going there? SOX is not an extraditable offense, so one ought to be safe, right?
Answer: Without naming names, there’s a fellow in jail in the United States right now who figured he could avoid prosecution by simply never travelling to the US. But he decided to take a direct flight from Vancouver to Mexico. The moment his plane crossed into American air space… hello… where’d all those fighter jets come from? They forced the plane down in the US and arrested him.
The fact that those of us who work in IT don’t simply lose our minds trying to figure this stuff out and architect systems that both work and are secure… wait a sec… everyone who knows me says I lost my mind a long time ago…

December 6, 2011 4:43 am

I have always been fond of the old “The Butler Did It!” solution. In this case the ‘butler’ is any long-standing, trusted but underpaid employee with almost unlimited access, good attention to detail, and a strong moral sense. The motivation would be righteous indignation of the employee against what he perceives as wrong-doing by his employers.
His trusted standing with his employers would deflect any suspicion, partly because people who react emotionally before they think (the employers) are likely to look for a scapegoat before they will buckle down and analyze the problem. By that time it is easy to plant and nurture suspicion against a chosen nemesis.
You see in case after case of criminal law that once a suspect has been identified, all efforts turn to finding evidence to convict that suspect rather than finding evidence of exactly what happened.
The anonymous leaker will remain so unless he publicly confesses, simply because the Climategaters *want* to believe they were ‘hacked.’