
Tonight, a prescient prediction made on WUWT shortly after Gleick posted his confession has come true in the form of DeSmog blog making yet another outrageous and unsupported claim in an effort to save their reputation and that of Dr. Peter Gleick as you can read here: Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic
In a desperate attempt at self-vindication, the paid propagandists at DeSmog blog have become their own “verification bureau” for a document they have no way to properly verify. The source (Heartland) says it isn’t verified (and a fake) but that’s not good enough for the Smoggers and is a threat to them, so they spin it and hope the weak minded regurgitators retweet it and blog it unquestioned. They didn’t even bother to get an independent opinion. It seems to be just climate news porn for the weak minded Suzuki followers upon which their blog is founded. As one WUWT commenter (Copner) put it – “triple face palm”.
Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.
Twitter / @DeSmogBlog: Evaluation shows “Faked” H …
Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL – Retweeted by Michael E. Mann
Tonight in comments, Russ R. brought attention to his comment with prediction from two days ago:
I just read Desmog’s most recent argument claiming that the confidential strategy document is “authentic”. I can’t resist reposting this prediction from 2 days ago:
Russ R. says:
February 20, 2012 at 8:49 pm
Predictions:
1. Desmog and other alarmist outfits will rush to support Gleick, accepting his story uncritically, and offering up plausible defenses, contorting the evidence and timeline to explain how things could have transpired. They will also continue to act as if the strategy document were authentic. They will portray him simultaneously as a hero (David standing up to Goliath), and a victim (an innocent whistleblower being harassed by evil deniers and their lawyers).
2. It will become apparent that Gleick was in contact with Desmog prior to sending them the document cache. They knew he was the source, and they probably knew that he falsified the strategy document. They also likely received the documents ahead of the other 14 recipients, which is the only way they could have had a blog post up with all the documents AND a summary hyping up their talking points within hours of receiving them.
3. This will take months, or possibly years to fully resolve.
Russ R. is spot on, except maybe for number 3, and that’s where you WUWT readers and crowdsourcing come in. Welcome to the science of stylometry / textometry.
Since DeSmog blog (which is run by a Public Relations firm backed by the David Suzuki foundation) has no scruples about calling WUWT, Heartland, and skeptics in general “anti-science”, let’s use science to show how they are wrong. Of course the hilarious thing about that is that these guys are just a bunch of PR hacks, and there isn’t a scientist among them. As Megan McArdle points out, you don’t have to be a scientist to figure out the “Climate Strategy” document is a fake, common sense will do just fine. She writes in her third story on the issue: The Most Surprising Heartland Fact: Not the Leaks, but the Leaker
… a few more questions about Gleick’s story: How did his correspondent manage to send him a memo which was so neatly corroborated by the documents he managed to phish from Heartland? How did he know that the board package he phished would contain the documents he wanted? Did he just get lucky? If Gleick obtained the other documents for the purposes of corroborating the memo, why didn’t he notice that there were substantial errors, such as saying the Kochs had donated $200,000 in 2011, when in fact that was Heartland’s target for their donation for 2012? This seems like a very strange error for a senior Heartland staffer to make. Didn’t it strike Gleick as suspicious? Didn’t any of the other math errors?
So, let’s use science to show the world what the common sense geniuses at DeSmog haven’t been able to do themselves. Of course I could do this analysis myself, and post my results, but the usual suspects would just say the usual things like “denier, anti-science, not qualified, not a linguist, not verified,” etc. Basically as PR hacks, they’ll say anything they could dream up and throw it at us to see if it sticks. But if we have multiple people take on the task, well then, their arguments won’t have much weight (not that they do now). Besides, it will be fun and we’ll all learn something.
Full disclosure: I don’t know how this experiment will turn out. I haven’t run it completely myself. I’ve only familiarized myself enough with the software and science of stylometry / textometry to write about it. I’ll leave the actual experiment to the readers of WUWT (and we know there are people on both sides of the aisle that read WUWT every day).
Thankfully, the open-source software community provides us with a cross-platform open source tool to do this. It is called JGAAP (Java Graphical Authorship Attribution Program). It was developed for the express purpose of examining unsigned manuscripts to determine a likely author attribution. Think of it like fingerprinting via word, phrase, and punctuation usage.
From the website main page and FAQs:
JGAAP is a Java-based, modular, program for textual analysis, text categorization, and authorship attribution i.e. stylometry / textometry. JGAAP is intended to tackle two different problems, firstly to allow people unfamiliar with machine learning and quantitative analysis the ability to use cutting edge techniques on their text based stylometry / textometry problems, and secondly to act as a framework for testing and comparing the effectiveness of different analytic techniques’ performance on text analysis quickly and easily.
What is JGAAP?
JGAAP is a software package designed to allow research and development into best practices in stylometric authorship attribution.
Okay, what is “stylometric authorship attribution”?
It’s a buzzword to describe the process of analyzing a document’s writing style with an eye to determining who wrote it. As an easy and accessible example, we’d expect Professor Albus Dumbledore to use bigger words and longer sentences than Ronald Weasley. As it happens (this is where the R&D comes in), word and sentence lengths tend not to be very accurate or reliable ways of doing this kind of analysis. So we’re looking for what other types of analysis we can do that would be more accurate and more reliable.
Why would I care?
Well, maybe you’re a scholar and you found an unsigned manuscript in a dusty library that you think might be a previously unknown Shakespeare sonnet. Or maybe you’re an investigative reporter and Deep Throat sent you a document by email that you need to validate. Or maybe you’re a defense attorney and you need to prove that your client didn’t write the threatening ransom note.
Sounds like the perfect tool for the job. And, best of all, it is FREE.
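For readers who want a feel for what “fingerprinting via word usage” means before opening JGAAP, here is a minimal Python sketch. It is not how JGAAP works internally; the function names and the short function-word list are my own assumptions for illustration. It computes the crude measures the FAQ mentions (word and sentence length) plus function-word frequencies, a feature often used in stylometry.

```python
import re
from collections import Counter

# Hypothetical, illustrative list of function words; NOT JGAAP's feature set.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]

def style_profile(text):
    """Return a few crude style features for one text sample."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    profile = {
        "avg_word_len": sum(len(w) for w in words) / total,
        "avg_sentence_len": len(words) / (len(sentences) or 1),
    }
    for fw in FUNCTION_WORDS:
        profile["freq_" + fw] = counts[fw] / total
    return profile

def profile_distance(p, q):
    """Sum of absolute differences over the feature keys (unnormalized)."""
    return sum(abs(p[k] - q[k]) for k in p)
```

A smaller distance between two profiles only hints at similar style. The features here are unscaled and the samples would need to be long, so treat this as a toy for building intuition, not as evidence of authorship.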
So here’s the experiment and how you can participate.
1. Download, and install the JGAAP software. Pretty easy, works on Mac/PC/Linux
If your computer does not already have Java installed, download the appropriate version of the Java Runtime Environment from Sun Microsystems. JGAAP should work with any version of Java at least as recent as version 6. If you are using a Mac, you may need to use the Software Update command built into your computer instead.
You can download the JGAAP software here. The jar will be named jgaap-5.2.0.jar. Once it has finished downloading, simply double-click on it to launch JGAAP. I recommend copying it to a folder and launching it from there.
2. Read the tutorial here. Pay attention to the workflow process and steps required to “train” the software. Full documentation is here. Demos are here.
3. Run some simple tests using some known documents to get familiar with the software. For example, you might run tests using some posts from WUWT (saved as text files) from different authors, and then put in one whose author you know as a test, and see if it can be identified. Or run some tests on articles by authors from your local newspaper.
4. Download the Heartland files from Desmog Blog’s original post here. Do it fast, because this experiment is the one thing that may actually cause them to take them offline. Save them in a folder all together. Use the “properties” section of the PDF viewer to determine authorship. I suggest appending the author names (like J.Bast) to the end of the filename to help you keep things straight during analysis.
5. Run tests on the files with known authors based on what you learned in step 3.
6. Run tests of known Heartland authors (and maybe even throw in some non-heartland authors) against the “fake” document 2012 Climate Strategy.pdf
You might also visit this thread on Lucia’s and get some of the documents Mosher used to compare visually to tag Gleick as the likely leaker/faker. Perhaps Mosher can provide a list of files he used. If he does, I’ll add them. Other Gleick authored documents can be found around the Internet and at the Pacific Institute. I won’t dictate any particular strategy, I’ll leave it up to our readers to devise their own tests for exclusion/inclusion.
7. Report your findings here in comments. Make screencaps of the results and use tinypic.com or photobucket (or any image drop web service) to leave the images in comments as URLs. Document your procedure so that others can test/replicate it.
8. I’ll then make a new post (probably this weekend) reporting the results of the experiment from readers.
As a final note, I welcome comments now in the early stages for any suggestions that may make the experiment better. The FBI and other law enforcement agencies investigating this have far better tools I’m told, but this experiment might provide some interesting results in advance of their findings.
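For step 4, the filename tagging can be scripted if you have many files. This is only a sketch: the helper names are hypothetical, and the author label is whatever you read from the PDF properties yourself.

```python
from pathlib import Path

def tagged_name(path, author):
    """Return a new Path with '_<author>' appended before the extension."""
    p = Path(path)
    return p.with_name(f"{p.stem}_{author}{p.suffix}")

def author_from_name(path):
    """Recover the tag from a name produced by tagged_name().

    Caveat: an untagged name that already contains '_' will be misread,
    so only call this on files you tagged yourself.
    """
    stem = Path(path).stem
    return stem.rsplit("_", 1)[1] if "_" in stem else None
```

To actually rename a file, something like `Path(src).rename(tagged_name(src, "J.Bast"))` should do it. Work on copies so the downloaded originals stay untouched.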

Michael J says:
February 23, 2012 at 12:39 am
I have a new theory about the fake document.
I suspect that it was sent to him by a colleague or, more likely, an opponent for the specific purpose of yanking his chain.
They hoped to get a laugh as Dr Gleick’s anger and hatred blinded him to the document’s obvious faults.
============================================================
Nope, it’s been scientifically proven that liberals have no sense of humor.
To make this a crowdsourcing exercise, we have to compile a huge archive. Not only of Gleick’s texts but of tens, even hundreds of other people. One has to compile a set of docs from Mann, one from Jones, one from Trenberth etc.
And to make this the right way, we also have to compile similar sets from McIntyre, Willis, Pielke, you name it.
People should write here, whose texts they’ll collect. The texts have to be copy-pasted as txt-files and the whole material has to be collected at one central place.
The only way to figure out who forged the memo is to start a lawsuit and compel Gleick to show the letter he received.
Then it must be checked whether it was printed on his own printer.
There is definitely more to this than meets the eye.
There are 4 documents that have multiple versions:
(1-15-2012) 2012 Heartland Budget here and here
2010_IRS_Form_990 here and here
2012 Climate Strategy here and here
Binder1 here and here
The actual content of the different versions is the same but they are different documents. The modification times of the latest versions have changed to Feb 14 2012 from their original values.
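Anyone who wants to check for themselves whether two downloaded copies are the same file can compare sizes and hashes. A minimal Python sketch follows; the function names are mine, and it assumes you have saved both versions locally.

```python
import hashlib
import os

def fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest) for one file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large PDFs do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return os.path.getsize(path), h.hexdigest()

def same_bytes(path_a, path_b):
    """True only if both files have identical size and SHA-256 digest."""
    return fingerprint(path_a) == fingerprint(path_b)
```

Matching fingerprints mean the two files are byte-for-byte copies. Differing fingerprints only say the bytes differ; they do not say why.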
Having played around with JGAAP for a while, I’ve started to think: why do we always have to do the work? That program has a lot of analysis methods. Even if JGAAP fingers Gleick, there are tons of excuses to use:
– Oh, the real writer is not one among those that were tested.
– Yes, but “WEKA J48 Decision Tree Classifier” tells that it wasn’t Gleick!
We’ve seen this already with statistical analysis of Mann’s work. No matter how much we prove our point, it’s being denied.
Mr Connolley, you’re a software designer, aren’t you? Obviously as an IT guru, you’ve an advantage over many here. Instead of issuing cheap potshots, howsabout you pitch in on this and show us if you can actually manage a simple research project? Given your track record here, it’s not like you have too many things to do, people to see, etc.
@William Howard M. Connolley says:
February 23, 2012 at 4:28 am
Trying to thread jack again? Your ego must be desperate for stroking.
@Bill
“As Mosher pointed out, if the text copied from the other Heartland documents is not removed, the software will say that Heartland wrote it. If you do take out those lines, the folks at DeSmog can criticize the results as you using an edited document. It still might be interesting though.”
My recommendation is that you do NOT take anything out, but rather go paragraph by paragraph through the strategy memo. This should yield paragraphs which align with the real HI documents (i.e. cut and paste), and other paragraphs which may point to another author (our forger).
If someone really has time go through sentence by sentence. My caution though is that in order to quell critics, you must eventually scan the WHOLE document for attribution.
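The paragraph-by-paragraph approach could be prepared with a few lines of Python. This is a sketch under assumptions: it presumes you already have a plain-text transcription of the memo with blank lines between paragraphs, and the helper names and file prefix are made up for illustration.

```python
import re
from pathlib import Path

def split_paragraphs(text):
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def write_paragraph_files(text, out_dir, prefix="memo_para"):
    """Write each paragraph to its own numbered .txt file; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, para in enumerate(split_paragraphs(text), start=1):
        path = out / f"{prefix}_{i:02d}.txt"
        path.write_text(para, encoding="utf-8")
        paths.append(path)
    return paths
```

Each resulting file can then be loaded as a separate unknown document. Keep in mind that single paragraphs are short samples, so any attribution on them is likely to be noisier than on the whole document.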
TerryS>
There’s not enough consistency in timestamp handling across different operating systems and applications to draw any conclusions whatsoever. It’s perfectly possible for the documents to have inconsistent timestamps without it meaning anything at all.
For me the really odd thing is the idea that HI would send the board documents as PDF format. I very much doubt that would be the case. Making them PDFs for distribution appears to have been done by whoever released them, and this appears to be highly significant when you check the earliest timestamps.
I have written the following to BBC’s Richard Black on twitlonger. I doubt that he will even read it, let alone respond, but I had to get it off my chest.
“Have you no shame?
The Climate Alarmist’s reasoning as to the fake document in the fakegate leak actually being genuine, would be similar to me not having a driving licence, and stealing one from someone else with the same name as me and claiming, “of course I must have passed my driving test, I have a driving licence to prove it. And look, the information on it is mostly true, and I always believed I could drive anyway, so combining all these matters I guess I can legally drive now!”
The document which YOU claim proves that the Heartland Institute is attacking education is a FAKE! Can you understand that?
You are crudely and unsuccessfully and dishonestly passing off FAKED information gained through deception and lies as “news” and expecting us to believe you.
By all means, write whatever lie based rubbish you like for the likes of the Guardian or Greenpeace, but DO NOT do that on the BBC!
I cannot believe ANYTHING you write ever again, for you are NOT a journalist in ANY rational meaning of the word. You are nothing more than a very overly privileged advocate and activist for a political cause. NOTHING MORE!
Have the decency to apologise, resign from the BBC and go work for the Gutter press where you belong.
At least I support the side of the debate which still supports truth, honesty, empirical evidence, the full and strict adherence to the FULL tenets of the scientific method, freedom and openness of research and opinion, acceptance and WELCOMING of scientific debate.
How can you look yourself in the face knowing that you are on the side which supports criminality, lies, fraud, fakery, deception, bullying, keeping secret publicly funded research, the hiding of inconvenient data, misrepresentation of data, the bullying of editors and the threats to journals to supinely cave in to the oppression by advocates of a political agenda, the imposition of “acceptable” thought upon everyone, regardless of the weakness and error-filled level of research?
Your follow up on your BBC blog fails to address your negligence and complicity in passing off fraudulently obtained and faked information as accurate news, nor does it address your stark double standard in suppressing the climate gate emails for two weeks and then when that news broke internationally, your blatantly biased defence of the CRU at UEA, and your attacking the leak (or theft as you described it, without ANY evidence whatsoever to back up that serious allegation).
And the difference in this fakegate case is your immediate rush to publication of what you called “leaked” information from an “insider”, and then your attacking the VICTIM of this fraudulent theft and defending the thief!
You FAILED to point out the difference between the Heartland being a private organisation which is not subject to FOIA requests, and the climate gate leaks happening largely because that data had already been subject to a FOIA request and the people at CRU ILLEGALLY withheld that public data. NOR did you point out another crucial difference in that ALL the climategate data-leaks were of GENUINE data. NONE of it faked or edited, whereas the FAKEGATE data contained damning information which was ENTIRELY FAKE! It now appears POSSIBLE that you obtained the news of this fakegate theft firsthand from Peter Gleick himself. IF that is the case, then you are guilty of being an accessory to the crime and then deliberately and wilfully misleading (lying to) the BBC audience about the information coming from an “insider”; you knew Peter Gleick was not an “insider” of the Heartland Institute when you wrote BOTH of your misleading articles about this theft.
Do the decent thing and resign! “
We have all been acting as if we have been in a fair minded debate with persons who could be convinced if they were shown conclusive evidence. Unfortunately it seems that the opinions and positions of the warmists are not based in fact at all and that no amount of proof or evidence will ever sway their opinions. Whether their positions are based on emotion, eco-fascism, politics or money gathering – our acting as if we think they are logical adults who can look at scientific facts and be swayed has been a waste of effort. Many of the people who have shown their true stripes in the past few days were never honest brokers – they simply are pretending to be.
There is plenty of proof and it is being ignored and denied. Additional proof will be denied. Spending time trying to prove something is not only wasted – it will be twisted by them to some new message.
Can someone ask Dr. Gleick for a better scan of the memo he has allegedly got by snail mail? 2012 Climate Strategy.pdf is an awful B&W scan of pretty low quality.
I suppose he still has the original in his possession. Also, he may publish a high resolution scan of the envelope the memo was sent in.
Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.
Twitter / @DeSmogBlog: Evaluation shows “Faked” H …
Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL – Retweeted by Michael E. Mann
—
From someone who KNOWS something about “fake but authentic” [heh]!
Dave,
It is a pretty normal committee practice to have documents as pdfs, as it ensures everyone is looking at the same thing (if you use word, for example, different versions or even setups can lead to slightly different results). And, since they cannot be edited, pdfs are a permanent record – they cannot be tampered with as can documents in editable formats. This is why, for example, legal judgements are now issued in pdf in most countries.
In case DeSmog does remove the documents, there is another copy at
http://scienceblogs.com/gregladen/2012/02/heartlandgate_anti-science_ins.php
It’s a Feb 14th post, Greg invites readers to:
I don’t have time today to compare them against DeSmog’s offerings, I assume they are a binary match.
BTW, Greg agrees that the strategy document is authentic (i.e. from Heartland), see http://scienceblogs.com/gregladen/2012/02/faked_heartland_institute_is_a.php
to DeSmog’s Feb 22 post http://www.desmogblog.com/evaluation-shows-faked-heartland-climate-strategy-memo-authentic
One reason they claim it’s authentic – “It also uses phrases, language and, in many cases, whole sentences that were taken directly from Heartland’s own material.” Gee, that was one reason McArdle et al thought it was a fabrication.
They make no mention of the various Gleickisms in the strategy document or (I suppose) why Heartland would put them in. Perhaps it’s all a complex attempt by Heartland to take down Gleick. 🙂
Re: Dave
The significance is that there are 2 DIFFERENT Budget documents, for example.
The document contents are the same but the documents are different. It has nothing to do with how different OS’es handle timestamps. The file sizes are even different.
PDF is not an editable document format. You can send, receive, view, print and even cut and paste from them. None of these actions will change the modification time (a META property of the file). The modification time for those who receive the PDF will be the same as for those who sent it.
If, however, you had a Word document and exported it multiple times as a PDF it would have different modification times.
The conclusion I draw from this is that DeSmog has the documents in a different format than PDF and has saved them to PDF’s multiple times.
The question is: WHERE DID THEY GET THEM FROM?
correction: codification time should be modification time
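For those who want to look at the dates stored inside a PDF rather than the timestamp their own filesystem shows, here is a rough Python sketch. It assumes the PDF keeps its /CreationDate and /ModDate entries as plain, uncompressed, unencrypted text, which is not guaranteed; where that does not hold, a proper PDF library would be needed. The function name is hypothetical.

```python
import re

# Matches an Info-dictionary date entry such as: /ModDate (D:YYYYMMDDHHmmSS...
DATE_RE = re.compile(rb"/(CreationDate|ModDate)\s*\(D:(\d{4,14})")

def embedded_dates(pdf_bytes):
    """Return a list of (key, digits) pairs found in raw PDF bytes."""
    return [(k.decode("ascii"), v.decode("ascii"))
            for k, v in DATE_RE.findall(pdf_bytes)]
```

Usage would be along the lines of `embedded_dates(open(path, "rb").read())`. An empty list means only that this simple scan found nothing, not that the file carries no dates.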
“Dave says:
February 23, 2012 at 5:09 am
…For me the really odd thing is the idea that HI would send the board documents as PDF format. I very much doubt that would be the case. Making them PDFs for distribution appears to have been done by whoever released them, and this appears to be highly significant when you check the earliest timestamps.”
Actually, no – PDF format is as close as is practical to a universal format. Everybody can get the reader, and just about every platform has some form of reader, whereas that’s not always the case with proprietary software. Makes sense to save them out as PDFs.
Cross posted from the Bishop’s Palace
PS – My original idea is wrong too – copying and pasting a file into the same place renames it to “xx Copy” and then if you do it again it becomes “xx Copy (2)” – there is a slight difference between XP and 7 between the names but the word “Copy” is inserted into the new file name.
As you were 😉
TerryS says:
February 23, 2012 at 5:49 am
The conclusion I draw from this is that DeSmog has the documents in a different format than PDF and has saved them to PDF’s multiple times.
It may be even simpler: DeSmog received it several times, from Gleick and different receivers who forwarded them (again) to DeSmog, or he had background conversations with Gleick, who sent him different versions of the (fake) file…
Alex at 04:54PT is right– the only way this is fully explained is when HI sues Gleicker and conducts discovery, such that Gleicker must produce (under penalty of perjury) the snail-mail envelope and produce his printer for testing. That said, this group inquiry this weekend should be fun and surprisingly productive– hey, “Pajamas Media” was born 7 years ago when CBS/Dan Rather pulled the TANG nonsense. I’m sure warmists will want to participate to show that the forgery is “authentic”.
Mann slays me, he really does.
He’s the kinda guy that would know an “Authentic” fake when he sees one, wouldn’t he?
Anyone for Hockey?
@Bill and everyone else,
Ok finally had a chance to look at the documentation for JGAAP and I don’t believe it will actually be necessary to decompose the memo into paragraphs or sentences. It looks like the LDA and both of the SVM algorithms do this automatically and assign attribution to sections of the document.
So for individuals wanting to give this a go, I’d suggest using those algorithms first.
where did they get it from/ what about desmoblog LOL