Readers may recall that on February 22nd, I offered up some open source stylometry/textometry software called JGAAP (Java Graphical Authorship Attribution Program), with a suggestion that readers make use of it to determine the authorship of the faked Heartland strategy memo disseminated to the media by Peter Gleick.
A link to that article is here:
I did that because many had speculated that Dr. Peter Gleick was the author. Gleick, who admitted to obtaining the Heartland board meeting documents under false pretenses, and likely illegally, denies he wrote it. Apart from a few holdouts and those who won’t give an opinion, like Andy Revkin, prominent voices in the online community think otherwise. Megan McArdle of The Atlantic, for one, doesn’t even see it as a professionally written memo:
“…their Top Secret Here’s All the Bad Stuff We’re Gonna Do This Year memo…reads like it was written from the secret villain lair in a Batman comic. By an intern.”
In posting about JGAAP software crowdsourcing, I had hoped that the wide professional base of readers could make use of this software and would be able to come to conclusions using it, but there were complications that made the task more difficult than it would normally be. These complications included the fact that there were cut and pasted elements of other stolen Heartland documents in the “Climate Strategy Memo,” making it difficult for the software to delineate the separate writing styles without knowledgeable fine tuning.
These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis, coming to the conclusion that Joe Bast, president of the Heartland Institute, had authored the fake memo. The problem was that Mr. Otto did not perform the due diligence required in his selection of documents and the JGAAP software controls, and this led to an erroneous result.
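To give a feel for why document selection and pasted text matter so much, here is a toy sketch in Python. It is not JGAAP, and it is not the method used by anyone mentioned in this post; the word list, the sample sentences, and the function names (profile, cosine, attribute) are all made up for illustration. It compares how often a handful of common words appear in a disputed text and in each candidate's known writing, then picks the closest match.

```python
from collections import Counter
import math

# Hypothetical, tiny function-word list chosen for illustration only;
# real stylometric tools use far larger and more varied feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "we", "our"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(disputed, known):
    """Return the candidate whose known writing is most similar to the disputed text."""
    target = profile(disputed)
    return max(known, key=lambda name: cosine(target, profile(known[name])))

# Made-up sample texts, not taken from any real document.
known = {
    "author_a": "we think that our plan is the best of the plans and we want to share it",
    "author_b": "a memo in a folder is a record of a meeting in a room",
}
disputed = "we believe that our work is the start of the effort"

clean_result = attribute(disputed, known)

# Paste a block of author_b's text into the disputed document and try again.
contaminated = disputed + " " + known["author_b"]
contaminated_result = attribute(contaminated, known)

print(clean_result, contaminated_result)
```

In this toy case the clean text is matched to author_a, but once author_b's sentence is pasted in, the same procedure picks author_b. That is the kind of distortion copied passages and poorly chosen comparison documents can introduce, even in a far more sophisticated tool.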
In the end I realized that only professionals familiar with the science of stylometry/textometry would be able to make a credible determination as to the authorship. So, I asked for help.
On February 23, 2012 I sent the Evaluating Variations in Language Laboratory (the group responsible for the JGAAP software) a request for assistance. Initially I was mainly looking for tips on how best to operate their software, but given the high-profile nature of this issue, and the unique situation, they referred me to Juola & Associates and its president, Patrick Brennan, who responded with an even better offer. They would use their larger collection of tools and techniques reserved for their forensics consulting work and apply it to the task, pro bono. Normally such professional analysis for courtroom-quality work nets them fees comparable to what a metropolitan lawyer might charge, so not only was I extremely grateful, I also realized it was an offer I couldn’t refuse.
In my email to Brennan on Fri, Feb 24, 2012 at 5:07 PM I wrote:
For the record, I do not know what the outcome might be, but it is always best to consult experts externally who have no financial interest in the outcome of the case.
Here’s the background on the group:
Juola & Associates (www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship. Our scientists are leading, world-recognized experts in the fields of stylometry, authorship attribution, authorship verification, and author analysis. Every written document is a snapshot of the person who wrote it; through our analysis, we can determine everything from sociological information to biographical information, even the identity of the author. We provide sound, tested, and legally-recognized analysis as well as expert testimony by Dr. Patrick Juola, arguably one of the world’s leaders in the field of Forensic Stylometry.
We have worked with groups as wide-ranging as multinational companies, Federal courts, research groups, and individuals seeking political asylum. We have literally written the book (ISBN 978-1-60198-118-9) on computational methods for authorship analysis and profiling.
The lead analysis was conducted by Patrick Juola, Ph.D., Director of Research, and director of the Evaluating Variations in Language Laboratory at Duquesne University in Pittsburgh. Juola & Associates, headed by President Patrick Brennan, is a separate commercial entity that provides analysis and consultation on stylometry.
Dr. Juola has published his analysis of the “Climate Strategy Memo,” which I present first and in its entirety here at WUWT.
First, the short read:
Stylometric Report – Heartland Institute Memo
Patrick Juola, Ph.D.
As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.
And the larger excerpt of the document, bolds mine:
24 This task is challenging for several reasons, some technical and some linguistic.
25 First, the Heartland memo as published contains a great many quotations taken from other sources. As originally published, the memo contains approximately 717 words, but at least 266 of those words have been identified as belonging to phrases (or paraphrases of phrases) found elsewhere in the stolen documents. [N.b. this identification was done by the Heartland Institute, who admit that these 266 words are “paraphrases [of] text appearing in one of the stolen documents.”]
As paraphrases, they may or may not reflect the style of the original authors, and they also may or may not reflect the style of the alleged forger. For this reason, we analyzed both the full document as well as the 451-word redacted document with the controversial passages removed.
26 Second, even the full-length document is rather short for an accurate analysis. Most authorship attribution experts recommend larger samples if possible. (E.g., Eder recommends 3500 words per sample, noting that results obtained from fewer than 3000 words “are simply disastrous.”)
27 Thirdly, perhaps as a result of the previous factors, we have observed that Bast and Gleick appear to have extremely similar writing styles.
28 Despite this difficulty, we were able to identify and calibrate an appropriate analysis method. Using this method, we analyzed both the complete Heartland memo and the selections from the Heartland memo that had been identified as not copied from other stolen documents. In both analyses, the JGAAP system identified the author as Peter Gleick.
29 In particular, the JGAAP system identified the author of the complete (unredacted) memo as Peter Gleick, despite the large amount of text that even Bast admits is largely taken from genuine writings of the Heartland Institute. We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.
30 In response to the question of who wrote the disputed Heartland strategy memo, it is difficult to deliver an answer with complete certainty. The writing styles are similar and the sample is extremely small, both of which act to reduce the accuracy of our analysis. Our procedure by assumption excluded every possible author but Bast and Gleick. Nevertheless, the analytic method that correctly and reliably identified twelve of twelve authors in calibration testing also selected Gleick as the author of the disputed document. Having examined these documents and their results, I therefore consider it more likely than not that Gleick is in fact the author/compiler of the document entitled “Confidential Memo: 2012 Heartland Climate Strategy,” and further that the document does not represent a genuine strategy memo from the Heartland Institute.
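For readers who want to check the figures quoted in the excerpt, here is a short Python sketch. The word counts come straight from the report. The last calculation rests on an assumption of mine that the excerpt does not state: that each of the twelve calibration tests was an independent choice between two candidates, so a method with no skill at all would behave like a fair coin. With more candidates per test, the chance figure would be smaller still.

```python
# Word counts as quoted in the report excerpt above.
total_words = 717    # approximate length of the memo as published
quoted_words = 266   # words identified as copied or paraphrased from other documents

# Length of the redacted memo and its share of the full text.
undisputed_words = total_words - quoted_words
undisputed_share = undisputed_words / total_words

# ASSUMPTION (not from the report): each calibration test is an independent
# two-way choice, so a method with no skill is right half the time.
calibration_tests = 12
chance_all_correct = 0.5 ** calibration_tests

print(undisputed_words)              # 451
print(round(undisputed_share, 3))    # 0.629
print(chance_all_correct)            # 0.000244140625
```

The 451 words match the redacted document the report describes, and 0.629 is consistent with its “nearly 2/3” remark. Under the coin-flip assumption, a perfect twelve-for-twelve calibration run would happen by luck about once in 4,096 tries.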
It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as “Confidential Memo: 2012 Heartland Climate Strategy.” The preponderance of the evidence points squarely to Gleick. According to Wikipedia’s entry on the “legal burden of proof”:
Preponderance of the evidence, also known as balance of probabilities, is the standard required in most civil cases. This is also the standard of proof used in Grand Jury indictment proceedings (which, unlike civil proceedings, are procedurally unrebuttable).
Further, it is abundantly clear that this document was not authored by Heartland’s Joe Bast, nor was it included as part of the board package of documents Dr. Gleick (by his own admission) phished under false pretenses from Heartland.
The complete analysis by Dr. Juola is available here: MemoReport (PDF 101k)