
Tonight, a prescient prediction made on WUWT shortly after Gleick posted his confession has come true: DeSmog blog is making yet another outrageous and unsupported claim in an effort to save its reputation and that of Dr. Peter Gleick, as you can read here: Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic
In a desperate attempt at self-vindication, the paid propagandists at DeSmog blog have become their own “verification bureau” for a document they have no way to properly verify. The source (Heartland) says it isn’t verified (and is a fake), but that isn’t good enough for the Smoggers; the document is a threat to them, so they spin it and hope the weak-minded regurgitators retweet it and blog it unquestioned. They didn’t even bother to get an independent opinion. It seems to be just climate news porn for the weak-minded Suzuki followers upon which their blog is founded. As one WUWT commenter (Copner) put it: “triple face palm”.
Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.
Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL – Retweeted by Michael E. Mann
Tonight in comments, Russ R. called attention to his prediction from two days ago:
I just read Desmog’s most recent argument claiming that the confidential strategy document is “authentic”. I can’t resist reposting this prediction from 2 days ago:
Russ R. says:
February 20, 2012 at 8:49 pm
Predictions:
1. Desmog and other alarmist outfits will rush to support Gleick, accepting his story uncritically, and offering up plausible defenses, contorting the evidence and timeline to explain how things could have transpired. They will also continue to act as if the strategy document were authentic. They will portray him simultaneously as a hero (David standing up to Goliath), and a victim (an innocent whistleblower being harassed by evil deniers and their lawyers).
2. It will become apparent that Gleick was in contact with Desmog prior to sending them the document cache. They knew he was the source, and they probably knew that he falsified the strategy document. They also likely received the documents ahead of the other 14 recipients, which is the only way they could have had a blog post up with all the documents AND a summary hyping up their talking points within hours of receiving them.
3. This will take months, or possibly years to fully resolve.
Russ R. is spot on, except maybe for number 3, and that’s where you WUWT readers and crowdsourcing come in. Welcome to the science of stylometry / textometry.
Since DeSmog blog (which is run by a Public Relations firm backed by the David Suzuki foundation) has no scruples about calling WUWT, Heartland, and skeptics in general “anti-science”, let’s use science to show how they are wrong. The hilarious thing, of course, is that these guys are just a bunch of PR hacks; there isn’t a scientist among them. As Megan McArdle points out, you don’t have to be a scientist to figure out that the “Climate Strategy” document is a fake; common sense will do just fine. She writes in her third story on the issue: The Most Surprising Heartland Fact: Not the Leaks, but the Leaker
… a few more questions about Gleick’s story:

How did his correspondent manage to send him a memo which was so neatly corroborated by the documents he managed to phish from Heartland?

How did he know that the board package he phished would contain the documents he wanted? Did he just get lucky?

If Gleick obtained the other documents for the purposes of corroborating the memo, why didn’t he notice that there were substantial errors, such as saying the Kochs had donated $200,000 in 2011, when in fact that was Heartland’s target for their donation for 2012? This seems like a very strange error for a senior Heartland staffer to make. Didn’t it strike Gleick as suspicious? Didn’t any of the other math errors?
So, let’s use science to show the world what the common-sense geniuses at DeSmog haven’t been able to do themselves. Of course I could do this analysis myself and post my results, but the usual suspects would just say the usual things: “denier, anti-science, not qualified, not a linguist, not verified,” etc. As PR hacks, they’ll say anything they can dream up and throw it at us to see what sticks. But if we have multiple people take on the task, their arguments won’t carry much weight (not that they do now). Besides, it will be fun, and we’ll all learn something.
Full disclosure: I don’t know how this experiment will turn out. I haven’t run it completely myself. I’ve only familiarized myself enough with the software and science of stylometry / textometry to write about it. I’ll leave the actual experiment to the readers of WUWT (and we know there are people on both sides of the aisle that read WUWT every day).
Thankfully, the open-source software community provides us with a cross-platform tool to do this. It is called JGAAP (Java Graphical Authorship Attribution Program), and it was developed for the express purpose of examining unsigned manuscripts to determine a likely author. Think of it as fingerprinting via word, phrase, and punctuation usage.
From the website main page and FAQs:
JGAAP is a Java-based, modular, program for textual analysis, text categorization, and authorship attribution i.e. stylometry / textometry. JGAAP is intended to tackle two different problems, firstly to allow people unfamiliar with machine learning and quantitative analysis the ability to use cutting edge techniques on their text based stylometry / textometry problems, and secondly to act as a framework for testing and comparing the effectiveness of different analytic techniques’ performance on text analysis quickly and easily.
What is JGAAP?
JGAAP is a software package designed to allow research and development into best practices in stylometric authorship attribution.
Okay, what is “stylometric authorship attribution”?
It’s a buzzword to describe the process of analyzing a document’s writing style with an eye to determining who wrote it. As an easy and accessible example, we’d expect Professor Albus Dumbledore to use bigger words and longer sentences than Ronald Weasley. As it happens (this is where the R&D comes in), word and sentence lengths tend not to be very accurate or reliable ways of doing this kind of analysis. So we’re looking for what other types of analysis we can do that would be more accurate and more reliable.
Why would I care?
Well, maybe you’re a scholar and you found an unsigned manuscript in a dusty library that you think might be a previously unknown Shakespeare sonnet. Or maybe you’re an investigative reporter and Deep Throat sent you a document by email that you need to validate. Or maybe you’re a defense attorney and you need to prove that your client didn’t write the threatening ransom note.
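To make the FAQ’s Dumbledore/Weasley example concrete, here is a tiny Python sketch (my own illustration, nothing to do with JGAAP’s internals) of the naive average-word-length measure that the FAQ warns tends to be unreliable:

```python
def avg_word_len(text):
    """Naive stylometric feature: mean word length, ignoring punctuation."""
    words = [w.strip(".,!?;:\"'") for w in text.split()]
    words = [w for w in words if w]
    return sum(len(w) for w in words) / len(words)

# Hypothetical samples in each character's register:
dumbledore = "It is our choices that show what we truly are, far more than our abilities."
weasley = "Why spiders? Why could it not be follow the butterflies?"
print(avg_word_len(dumbledore), avg_word_len(weasley))
```

Two authors can easily land on similar averages, which is exactly why JGAAP offers richer “event drivers” than raw word or sentence length.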
Sounds like the perfect tool for the job. And, best of all, it is FREE.
So here’s the experiment and how you can participate.
1. Download and install the JGAAP software. It’s pretty easy, and works on Mac/PC/Linux.
If your computer does not already have Java installed, download the appropriate version of the Java Runtime Environment from Sun Microsystems. JGAAP should work with any version of Java at least as recent as version 6. If you are using a Mac, you may need to use the Software Update command built into your computer instead.
You can download the JGAAP software here. The jar will be named jgaap-5.2.0.jar. Once it has finished downloading, simply double-click it to launch JGAAP. I recommend copying it to a folder and launching it from there.
2. Read the tutorial here. Pay attention to the workflow process and steps required to “train” the software. Full documentation is here. Demos are here
3. Run some simple tests using some known documents to get familiar with the software. For example, you might run tests using some posts from WUWT (saved as text files) from different authors, and then put in one that you know who authored as a test, and see if it can be identified. Or run some tests from authors of newspaper articles from your local newspaper.
4. Download the Heartland files from DeSmog Blog’s original post here. Do it fast, because this experiment is the one thing that may actually cause them to take the files offline. Save them all together in one folder. Use the “properties” section of the PDF viewer to determine authorship. I suggest appending the author names (like J.Bast) to the end of the filenames to help you keep things straight during analysis.
5. Run tests on the files with known authors based on what you learned in step 3.
6. Run tests of known Heartland authors (and maybe even throw in some non-Heartland authors) against the “fake” document, 2012 Climate Strategy.pdf.
You might also visit this thread on Lucia’s and get some of the documents Mosher used to compare visually to tag Gleick as the likely leaker/faker. Perhaps Mosher can provide a list of the files he used; if he does, I’ll add them. Other Gleick-authored documents can be found around the Internet and at the Pacific Institute. I won’t dictate any particular strategy; I’ll leave it up to our readers to devise their own tests for exclusion/inclusion.
7. Report your findings here in comments. Make screencaps of the results and use tinypic.com or photobucket (or any image drop web service) to leave the images in comments as URLs. Document your procedure so that others can test/replicate it.
8. I’ll then make a new post (probably this weekend) reporting the results of the experiment from readers.
As a final note, I welcome comments now in the early stages for any suggestions that may make the experiment better. The FBI and other law enforcement agencies investigating this have far better tools I’m told, but this experiment might provide some interesting results in advance of their findings.

Shawn Otto has published a result from JGAAP.
Good one, DukeC!
I do work with images and imagine myself a bit of a Photoshop expert, if I may blow my own horn, so it’s my semi-educated guess that the fix-up may have been done manually on the original, possibly with white tape or a label. Still, we cannot discount that it was done digitally with a rectangular block and then re-scanned at lower resolution. The low-resolution pixelation you see on what is either a shadow from a tape or a label (my guess) or a leftover logo bit, evident in the sloppily left-over line on the left-hand side, is dimensionally quite similar if not identical to the resolution of the text, and it spills over to the right-hand side of the line instead of being crisply sharp, as it would be had the obscuring been done at the last stage and at high res. I hope all this made sense.
“Duke C. says:
February 23, 2012 at 9:31 am
I think there might have been a header of some sort that was cropped out (sloppily, I might add) on the 2012 Climate Strategy.pdf”
Hmmmmm I don’t think that’s part of a header… It just looks like a smudge on the scanner glass.
Since our friend Willie isn’t reading my posts or rising to my goading to help out, I’ll say, sotto scriptum, that Connolley is merely goading everyone out of sheer nervousness. If so, just imagine what Dr Gleick must be feeling like, shvitzing as we merrily stumble along this excellent sleuthing adventure Anthony was good enough to throw for us. To indulge in some cheap parlour psychology, all for entertainment value of course, it’s my esteemed opinion as a wag that with his seemingly gratuitous taunts and mockery, Connolley is exhibiting for our pleasure what Type A (both alpha and –ss hole) personalities tend to do when very stressed. In any case, keep in mind that had this process here (which is actually moving along quite nicely, with the preliminary discussions and levity) been moving faster, he’d be accusing everyone of jumping the gun. After all, Connolley does have the ethics and integrity of his favourite beast, the stoat, which is a kind of weasel.
Peter Kovachev says:
February 23, 2012 at 10:19 am
“… it’s my semi-educated guess that the fix-up may have been done manually on the original, possibly with a white tape or a label.”
That’s plausible. 🙂
Although the type starts so high up on the page that it doesn’t look like it was intended to print on letterhead stationery. Hard to say!
@Ian Hoder
I haven’t had a chance to play with JGAAP myself yet (will do that tonight), but I have been reading the documentation. I would NOT select “all”, since the developers warn that they have a memory leak problem, which is probably why it dies every time.
My recommended settings would be:
– Canonicizers: Normalize ASCII and Whitespace, Strip Punctuation, and Unify Case
– Event Drivers: Character Grams (N=4, N=5), M…N letter words (M=6, N=12)
– Event Culling: Most Common Events (N=100 or other fairly large number)
– Analysis Methods: Try these ONE at a time (Gaussian SVM, LDA, Linear SVM)
Good luck and let us know how that works.
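For the curious, here is a rough Python sketch of what those canonicizer and event-driver settings amount to conceptually. This is my own simplification for illustration, not JGAAP’s actual code:

```python
from collections import Counter

def canonicize(text):
    """Roughly mimic JGAAP's Unify Case, Strip Punctuation,
    and Normalize Whitespace canonicizers."""
    text = text.lower()
    text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    return " ".join(text.split())  # collapse whitespace runs

def char_ngrams(text, n=4):
    """Character N-gram event driver: every overlapping n-character slice."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def most_common_events(events, k=100):
    """Event culling: keep only occurrences of the k most frequent event types."""
    keep = {e for e, _ in Counter(events).most_common(k)}
    return [e for e in events if e in keep]

sample = "The Heartland memo,  allegedly faked,  reads oddly."
events = most_common_events(char_ngrams(canonicize(sample)), k=100)
print(events[:5])  # → ['the ', 'he h', 'e he', ' hea', 'hear']
```

The analysis methods (SVM, LDA, etc.) then learn from these event streams which candidate author the unknown text most resembles.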
A dood, I beg to respectfully differ with your smudge hypothesis. If you look again, you’ll note that the anomaly is unnaturally regular: rectangular, straight-lined, and with an apparent corner. I still place my wager on a shadow caused by white tape or a label. On second look, it may actually be a shadow caused by the upper corner and edge of the document page, although the abrupt, un-tapered ending of the bottom bit of the line on the left-hand side would militate against that. Argh, what I would give for the original doc! Dr Gleick, if you’re reading all our prattle through a curtain of tears, be a jolly good sport and hand the item over to the authorities. Do that and I promise to make you a soap-on-a-rope with your name carved onto it.
“Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.” –William M. Connolley (February 23, 2012 at 8:09 am)
Actually several other lines of useful enquiry have been opened.
1) is the original pdf of a folded document? It will take a lot of explaining if it is not.
2) why is the leading wrong?
Of course we know the document is faked. You have to be a particular sort of blind not to see it.
I would find myself leaving any site that thought defending forged documents was fine. I have a list of sites I never visit for ethical violations well below that. How low do you have to sink that defending forgery becomes acceptable?
@Mooloo – Apparently Mr. Connolley is too high and mighty to try himself, but would rather deride others for not instantly publishing what they may be working on. -or- Maybe he’s a “genius” like Dr. Gleick, and has no learning curve for new software and new techniques. – Anthony
Charles Bruce Richardson (February 23, 2012 at 7:28 am)
“Then lying will be a felony for which he would likely serve time–the justice system cannot allow perjury to go unpunished.”
Unfortunately not true. As Bill Clinton repeatedly demonstrated. Ask any member of the Democratic Party – they will make it clear that liberals consider themselves above the law.
Duke C. says:
February 23, 2012 at 9:31 am
I think there might have been a header of some sort that was cropped out (sloppily, I might add) on the 2012 Climate Strategy.pdf:
http://img813.imageshack.us/img813/5586/startdocpg2.jpg
—————————
Good catch!
It looks to me as if it is a left corner header……….Does this fit? [ The Blue header ]
http://pacinst.org/reports/success_stories/new_ag_water_success_stories.pdf
@Michael Tobis,
I see that the results based on scanning the document as a whole are unsurprising from your POV, and in fact I anticipated them myself. If you scroll up many comments, you’ll note that one of my suggestions was that each paragraph/section should be scanned individually; most problematically, I would suggest the “Expanded Communications” section.
Re: file time – this can be faked, but a lot of computers automatically set the time based on either a network time server or the interweb. In fact it can be a real pain when you want to fake it, e.g. when testing software.
Good heavens, kim2000. I hate you, how dare you blow my beautiful hypothesis to smithereens! Alas, I think you are onto something!
Alright, bugger my tape/label hypothesis. The Pacific Institute header, which appears even on the inside or secondary pages, jibes proportionally with the page-edge and text distances, and the rectangle bit is a shoo-in. Bloody marvelous, Mr/Ms kim2000. I’d offer to kiss you if I knew your gender.
Me, I’m thinking that with all the re-use and sustainability hoopla eco-critters have conditioned themselves to, it would be just like a Pacific employee to conscientiously recycle an old document from the re-use bin in the copier or mail room. LOL! What poetic justice that would be!
William M. Connolley says:
February 23, 2012 at 8:09 am
Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.
============================================================
Peter Kovachev says:
February 23, 2012 at 11:31 am
I’m a girl 😉
I’m a kid
…………………………
Mr Watts do I get a hat tip..please?
[No. However, using hockey stick terminology, you can perform a “hat trick” … 8<) Robt]
LOL! No kiss from me, then; I’m too old! And apologies for my sailor’s lingo, young Missy… must remember there’re young-uns here too.
But yes, how about it, Mr Watts, a hat tip for the young lady at least? This thing might have easily been missed by all of us.
Wonder if maybe the line at the top is a fax stamp covered with correction tape?
Just noticed Kim2000’s graphics; it looks plausible, as the aspect ratio of the border seems identical. Further investigation needed.

Give the kid a gold star!
I R a scientist now?
Thank you Mr. Peter Kovachev says:
February 23, 2012 at 11:51 am
And Mr Watts 🙂
kim2ooo says:
February 23, 2012 at 11:37 am
I’m a girl 😉
I’m a kid
…………………………
Mr Watts do I get a hat tip..please?
==============================================
ABSOLUTELY!……………………… 😀
Linux/Unix has a utility called ‘touch’ that updates the time stamp to the current time or any other time and date. That existed for DOS too as a utility, distributed with Borland stuff for instance.
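For illustration, the same back-dating trick is a few lines of Python via os.utime. The filename here is made up purely for the example:

```python
import os
import time

# Hypothetical scratch file, purely for illustration.
fname = "memo_test.txt"
with open(fname, "w") as f:
    f.write("placeholder")

# Back-date the access/modification times to Feb 13, 2012, 12:00 --
# the rough equivalent of `touch -t 201202131200 memo_test.txt`.
past = time.mktime((2012, 2, 13, 12, 0, 0, 0, 0, -1))
os.utime(fname, (past, past))  # (atime, mtime)

print(time.ctime(os.path.getmtime(fname)))
```

Which is exactly why a file timestamp, by itself, proves very little about when a document was actually created.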
keep getting “java.lang.IllegalArgumentException: URL source must use ‘file’ protocol”
Tried on Mac/PC, tried sudo, made sure the doc was UTF-8 (with and without BOM)… I will run from the source later. This program seems very sensitive, to the point of not working at all. You can see the errors by running from the prompt in your download folder: java -jar jgaap-5.2.0.jar
I did get it working for a while using their test files (I used ‘L’), then removing their unknown docs and adding the Heartland memo and some authors. The first run used their authors plus the memo, the Gleick text from above, and a Heartland speech taken from this (excellent) overview of the docs, written pre-Gleick-confession: http://ljzigerell.wordpress.com/2012/02/18/profiling-the-heartland-memo-author/
I ran with only Burrows Delta, and the Event Drivers are listed below (I couldn’t run them all without crashing, so this is a random-ish sample). I assume lower is closer, as the other authors are ancient Latin poets, iirc. Infinities probably mean no hits in their algorithm.
unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
Canonicizers: none
Analyzed by Burrows Delta using Sentence Length as events
1. Author01 78.77155793679051
2. Author01 80.2867632506562
3. gleick 86.33117704938115
4. Author03 88.9431049630841
5. bast 104.26533350267803
Analyzed by Burrows Delta using Words as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using MW Function Words as events
1. bast 100.96472895245202
2. Author01 110.13874051963056
3. Author03 111.98543979724113
4. Author01 112.50870666862883
5. gleick 118.5464526044639
Analyzed by Burrows Delta using Sentences as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using Syllables Per Word as events
1. gleick 4.678547658695827
2. Author01 7.16552557153825
3. Author01 9.372995377516089
4. Author03 9.552504044292814
5. bast 13.812675242063655
Analyzed by Burrows Delta using Characters as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using Word Lengths as events
1. gleick 19.34449212885342
2. Author01 23.605545416411637
3. Author01 28.46339579762878
4. bast 30.397652383928673
5. Author03 30.8325765083569
Analyzed by Burrows Delta using Suffices as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using Lexical Frequencies as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using Rare Words as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using Binned Frequencies as events
1. gleick 56.804661456724205
2. bast 57.4768784285835
3. Author03 90.58279638495986
4. Author01 92.86171017325175
5. Author01 103.78059122092623
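For readers puzzling over what those Delta numbers mean: Burrows’s Delta compares z-scores of the most frequent words between the questioned text and each candidate author, and a lower score means a closer stylistic match (which is presumably why the “Infinity” runs carry no information). Here is a minimal Python sketch of the idea, my own simplification rather than JGAAP’s exact implementation:

```python
from collections import Counter
import statistics

def word_freqs(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    n = len(words) or 1
    return [counts[w] / n for w in vocab]

def burrows_delta(unknown, candidates, top_n=30):
    """Score each candidate text against the unknown text.
    Lower delta = closer stylistic match."""
    corpus = [unknown] + list(candidates.values())
    vocab = [w for w, _ in
             Counter(" ".join(corpus).lower().split()).most_common(top_n)]
    rows = [word_freqs(t, vocab) for t in corpus]
    # Per-word mean and standard deviation across the whole corpus.
    mu = [statistics.mean(col) for col in zip(*rows)]
    sd = [statistics.stdev(col) or 1e-9 for col in zip(*rows)]

    def zscores(row):
        return [(f - m) / s for f, m, s in zip(row, mu, sd)]

    zu = zscores(rows[0])
    return {name: statistics.mean(abs(a - b) for a, b in zip(zu, zscores(row)))
            for name, row in zip(candidates, rows[1:])}
```

With so few reference texts per author, as in the run above, the scores should be read as suggestive at best; Delta is far more trustworthy with large, balanced training corpora.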
Michael Tobis says:
February 23, 2012 at 10:02 am
Shawn Otto has published a result from JGAAP.
Of course, if 90% of the text is borrowed from the original documents, then the software will show that the original author is the most likely author. But it is not about the 90%; it is the author of the remaining juicy comments whom we would like to know. One should run the test only on those comments that are not traceable back to the original documents…
YES! I believe that’s the first gold star anyone has earned here, right ?
Congrats, Miss kim2000! Oh yes, indeed, for whatever my opinion’s worth, I’d say U R indeed a scientist. Science, as I’m sure you already know, is mostly about methodology: the ability to analyse, eliminate, induce and deduce, not to mention to root around dull data until something, hopefully, pops up. This you’ve done admirably. Regardless of whether this pans out… as Anthony says, it’s plausible and needs further investigation… you’ve still succeeded in presenting a very good workable hypothesis. I predict you’ll do very well in this game of life.
Kim2000;
you not only get a hat tip and a hat trick. . . . . .
But you now also have your very own fan club here. . . of which I am a member!
Good job!!!