Readers may recall that on February 22nd, I offered up some open source stylometry/textometry software called JGAAP (Java Graphical Authorship Attribution Program), with a suggestion that readers make use of it to determine the authorship of the faked Heartland strategy memo disseminated to the media by Peter Gleick.
A link to that article is here:
I did that because many had speculated that Dr. Peter Gleick was the author. Gleick, who admitted to obtaining the Heartland board meeting documents under false pretenses, and likely illegally, denies he wrote it. Apart from a few holdouts and those who won’t give an opinion, like Andy Revkin, prominent online voices such as Megan McArdle of The Atlantic think otherwise. McArdle doesn’t even see it as a professionally written memo:
“…their Top Secret Here’s All the Bad Stuff We’re Gonna Do This Year memo…reads like it was written from the secret villain lair in a Batman comic. By an intern.”
In posting about JGAAP software crowdsourcing, I had hoped that the wide professional base of readers could make use of this software and would be able to come to conclusions using it, but there were complications that made the task more difficult than it would normally be. These complications included the fact that there were cut and pasted elements of other stolen Heartland documents in the “Climate Strategy Memo,” making it difficult for the software to delineate the separate writing styles without knowledgeable fine tuning.
These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis, coming to the conclusion that Joe Bast, president of the Heartland Institute, had authored the fake memo. The problem was that Mr. Otto did not perform the due diligence required in his selection of documents and the JGAAP software controls, and this led to an erroneous result.
In the end I realized that only professionals familiar with the science of stylometry/textometry would be able to make a credible determination as to the authorship. So, I asked for help.
On February 23, 2012, I sent the Evaluating Variations in Language Laboratory (the group responsible for the JGAAP software) a request for assistance. Initially I was mainly looking for tips on how best to operate their software, but given the high-profile nature of this issue and the unique situation, they referred me to Juola & Associates and its president, Patrick Brennan, who responded with an even better offer. They would use the larger collection of tools and techniques reserved for their forensic consulting work and apply it to the task, pro bono. Normally such courtroom-quality professional analysis nets them fees comparable to what a metropolitan lawyer might charge, so not only was I extremely grateful, I also realized it was an offer I couldn’t refuse.
In my email to Brennan on Fri, Feb 24, 2012 at 5:07 PM I wrote:
For the record, I do not know what the outcome might be, but it is always best to consult experts externally who have no financial interest in the outcome of the case.
Here’s the background on the group:
Juola & Associates (www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship. Our scientists are leading, world-recognized experts in the fields of stylometry, authorship attribution, authorship verification, and author analysis. Every written document is a snapshot of the person who wrote it; through our analysis, we can determine everything from sociological information to biographical information, even the identity of the author. We provide sound, tested, and legally-recognized analysis as well as expert testimony by Dr. Patrick Juola, arguably one of the world’s leaders in the field of Forensic Stylometry.
We have worked with groups as wide-ranging as multinational companies, Federal courts, research groups, and individuals seeking political asylum. We have literally written the book (ISBN 978-1-60198-118-9) on computational methods for authorship analysis and profiling.
The lead analysis was conducted by Patrick Juola, Ph.D., Director of Research, and director of the Evaluating Variations in Language Laboratory at Duquesne University in Pittsburgh. Juola & Associates, headed by President Patrick Brennan, is a separate commercial entity that provides analysis and consultation on stylometry.
Dr. Juola has published his analysis of the “Climate Strategy Memo,” which I present first and in entirety here at WUWT.
First, the short read:
Stylometric Report – Heartland Institute Memo
Patrick Juola, Ph.D.
Summary
As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.
And the larger excerpt of the document, bolds mine:
Analysis
24 This task is challenging for several reasons, some technical and some linguistic.
25 First, the Heartland memo as published contains a great many quotations taken from other sources. As originally published, the memo contains approximately 717 words, but at least 266 of those words have been identified as belonging to phrases (or paraphrases of phrases) found elsewhere in the stolen documents. [N.b. this identification was done by the Heartland Institute, who admit that these 266 words are “paraphrases [of] text appearing in one of the stolen documents.”]
As paraphrases, they may or may not reflect the style of the original authors, and they also may or may not reflect the style of the alleged forger. For this reason, we analyzed both the full document as well as the 451-word redacted document with the controversial passages removed.
26 Second, even the full-length document is rather short for an accurate analysis. Most authorship attribution experts recommend larger samples if possible. (E.g., Eder recommends 3500 words per sample, noting that results obtained from fewer than 3000 words “are simply disastrous.”)
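The word counts quoted in the report can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the excerpt (717, 266, and Eder's 3000-word floor); the variable names are my own and are not taken from the report.

```python
# Figures as quoted in the report excerpt.
full_memo_words = 717       # memo as originally published (approximate)
paraphrased_words = 266     # words identified as paraphrased from other documents
eder_floor_words = 3000     # below this, Eder calls results "simply disastrous"

# Words left once the disputed passages are removed.
redacted_words = full_memo_words - paraphrased_words

# Share of the full memo that survives redaction.
redacted_share = redacted_words / full_memo_words

# How each sample compares with the 3000-word floor.
full_vs_floor = full_memo_words / eder_floor_words
redacted_vs_floor = redacted_words / eder_floor_words

print(f"redacted memo: {redacted_words} words")
print(f"share of full memo: {redacted_share:.1%}")
print(f"full memo vs. floor: {full_vs_floor:.1%}")
print(f"redacted memo vs. floor: {redacted_vs_floor:.1%}")
```

The subtraction reproduces the 451-word figure given in the report, and both versions of the memo come in at well under a quarter of the 3000-word floor.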
27 Thirdly, perhaps as a result of the previous factors, we have observed that Bast and Gleick appear to have extremely similar writing styles.
Results
28 Despite this difficulty, we were able to identify and calibrate an appropriate analysis method. Using this method, we analyzed both the complete Heartland memo and the selections from the Heartland memo that had been identified as not copied from other stolen documents. In both analyses, the JGAAP system identified the author as Peter Gleick.
29 In particular, the JGAAP system identified the author of the complete (unredacted) memo as Peter Gleick, despite the large amount of text that even Bast admits is largely taken from genuine writings of the Heartland Institute. We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.
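For readers unfamiliar with how this kind of comparison works in principle, here is a minimal sketch of one common family of techniques: build a character n-gram profile for each candidate's known writing, then pick the candidate whose profile is closest to the disputed text. This is an assumption-laden illustration, not the method Dr. Juola calibrated and not JGAAP's code; the function names and the toy texts are hypothetical placeholders.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(disputed, known_samples, n=3):
    """Return (closest_author, scores) for a disputed text.

    known_samples maps an author label to a sample of that author's writing.
    Like the report's procedure, this can only choose among the candidates
    it is given; it cannot say "none of the above".
    """
    disputed_profile = char_ngrams(disputed, n)
    scores = {
        author: cosine_similarity(disputed_profile, char_ngrams(sample, n))
        for author, sample in known_samples.items()
    }
    return max(scores, key=scores.get), scores


# Toy placeholder texts, invented purely to exercise the functions.
known = {
    "author_a": "the cat sat on the mat and the cat ate the rat",
    "author_b": "quantum flux zigzags quickly beyond jumbled vortex",
}
best, scores = attribute("the cat sat on the mat", known)
print(best, scores)
```

With real documents the samples would need to be far longer than these toy strings, which is exactly the sample-size caveat the report raises.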
Conclusions
30 In response to the question of who wrote the disputed Heartland strategy memo, it is difficult to deliver an answer with complete certainty. The writing styles are similar and the sample is extremely small, both of which act to reduce the accuracy of our analysis. Our procedure by assumption excluded every possible author but Bast and Gleick. Nevertheless, the analytic method that correctly and reliably identified twelve of twelve authors in calibration testing also selected Gleick as the author of the disputed document. Having examined these documents and their results, I therefore consider it more likely than not that Gleick is in fact the author/compiler of the document entitled ”Confidential Memo: 2012 Heartland Climate Strategy,” and further that the document does not represent a genuine strategy memo from the Heartland Institute.
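To put the "twelve of twelve" calibration figure in context, one can ask how often a method with no skill at all would score that well. The sketch below assumes each calibration trial was an independent two-way choice, which the excerpt does not actually state; if the calibration involved more candidates per trial, a lucky perfect score would be rarer still.

```python
# Assumption (not stated in the report excerpt): each calibration trial is an
# independent choice between two candidate authors, so a method with no skill
# would be right with probability 0.5 per trial.
trials = 12
p_guess = 0.5

# Probability that pure guessing gets every one of the trials right.
p_all_correct_by_chance = p_guess ** trials

print(f"chance of 12/12 by guessing: {p_all_correct_by_chance:.6f}")
print(f"roughly 1 in {round(1 / p_all_correct_by_chance)}")
```

This speaks only to the calibration run itself; it is not a probability that the attribution of the disputed memo is correct.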
It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as ”Confidential Memo: 2012 Heartland Climate Strategy.” The preponderance of the evidence points squarely to Gleick. According to Wikipedia’s entry on the “legal burden of proof”:
Preponderance of the evidence, also known as balance of probabilities is the standard required in most civil cases. This is also the standard of proof used in Grand Jury indictment proceedings (which, unlike civil proceedings, are procedurally unrebuttable).
Further, it is abundantly clear that this document was not authored by Heartland’s Joe Bast, nor was it included as part of the board package of documents Dr. Gleick (by his own admission) phished under false pretenses from Heartland.
The complete analysis by Dr. Juola is available here: MemoReport (PDF 101k)
This is certainly interesting, but I’m having a hard time getting past him acknowledging other experts advise that trying to use samples of less than 3,000 words “are simply disastrous” when this sample is so much smaller in either form (paraphrases included or not).
That’s a pretty big caveat.
The MSM, NYT included of course, is waiting for this to blow over. In so doing, they are making life a lot easier for a criminal.
So despite the alarmist crowd still desperately attempting to blame Heartland for the memo, it was Gleick the reprobate after all. Anyone who still employs that guy, or has anything to do with him, is condoning dishonesty. Gleick needs to just go away, and let someone honest take over.
It is a serious allegation that he actually forged the document. I wonder whether Heartland would have the nerve to run with that allegation.
Of course, Gleick could lay such allegations to rest by simply disclosing how he came into possession of the ‘faked’ document. He could open up the paper trail, if there truly is a trail.
By pro bono do you actually mean it’s being funded by Big Oil? /sarc
…I don’t see anything wrong with the memo in the first place
`It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as ”Confidential Memo: 2012 Heartland Climate Strategy.” `
Well, well, well…
Nothing about this business surprises me anymore.
It is dirty, dishonest and disingenuous.
Science is the loser. Thank you for nothing, Dr Gleick!
The report is impressive and potentially devastating for Gleick. Where a number of WUWT readers tend to be scientifically (or perhaps analytically is better) orientated, they should avoid taking a scientific method-like approach to analyzing the report and its conclusions. Rather, it should be viewed as a member of a jury hearing introduced evidence. In this light, the report certainly lends credence to Gleick as the strategy memo’s author – beyond the level of speculation or conjecture via a quantitative analysis.
I’m still curious, though, as to how Heartland (and/or the FBI) will address Gleick’s clearly criminal activity from a legal perspective – false moral justifications from Gleick apologists notwithstanding.
Yes, interesting but difficult to take very seriously. Would a genuine language expert use “The quick brown fox jumped over the lazy dog” (missing the letter ‘s’) as an example?
Given a 300 word essay by my wife and another by someone else, I would be able to tell which was which about half way through. What Dr. Patrick Juola has going for him here is that the two most likely writers are both known. The analysis (with high probability) says eliminate Bast. Then it says, also with high probability, that the other fellow is the culprit. Also, knowing that the other fellow was a bit unhinged at the time, we have the third strike. Back to the dugout, Gleick.
Well done Anthony. I especially appreciate that you took care to point out the various caveats and limitations in the analysis.
I only wish that all scientists and journalists displayed such integrity.
It’s worth noting, especially for non-US readers, that this is a much lower bar than the criminal standard “Beyond reasonable doubt”.
This is what allows things like OJ Simpson to be found not guilty of murder, but also a civil result that he was responsible for his wife’s death. (Well, that and some really poor handling by the prosecution, but let’s not go there.)
So, from the self-critique, it appears this analysis is inadequate by itself to prove Gleick violated federal wire fraud statutes. However, it could have important implications for any civil proceedings between Heartland and Gleick.
I suspect both the FBI and Heartland know that already.
Wouldn’t it be great if AR5 were written with this much attention to uncertainties?
Has Peter Gleick actually denied penning that ‘secret strategy memo’?
If I recall correctly, the wordings were carefully (lawyerly) crafted around the hot topics and questions pertaining to that memo.
In the same manner as he did claim, after having been ridiculed and pressured for ‘reviewing’ Donna F’s book without touching upon its contents, that he actually had read it. But carefully avoided the real relevant question whether he actually had read it prior to his Amazon ‘review’.
If expert testimony, with caveats, gives odds sufficient to state that the “weight of evidence” identifies the sole plausible candidate to have perpetrated this egregious fraud,
then as in forensic genealogy –relating to trusts and estates– an impartial jury is fully entitled to convict Peter Gleick of criminal forgery (among other things).
For over a generation now, pseudo-intellectual poseurs like Gleick have poisoned climate science with impunity. Let’s hope that Joe Bast on Heartland’s behalf pursues this case, prosecuting Gleick as the simpering phony that he is.
This is what I got from your link….
Navigation
Links
Contact
Login
JComputing was started from an authorship attribution / stylometry research group working out of Duquesne University under the direction of Prof. Patrick Juola.
Johnny Japan
As the experts themselves caution us, this was a probe at (or beyond?) the limits of detection. A tiny sample and a binary decision: eliminate Bast and Gleick is the only one left. Ideally the software would have been given a “line up” of hundreds or thousands of possible culprits and big samples to compare. But we don’t live in an ideal world. I agree this might not be enough to hang Gleick but it does put him in the hot seat. I doubt he will respond; his best strategy is to go to ground, quietly settle with Heartland (possibly by applying pressure through unseen or indirect channels: the War Against The Donors is his main strategic objective in any case) and then re-emerge in a few months as loud and proud as ever.
I think Heartland either breaks him, very publicly and convincingly, or he and his team will eventually destroy Heartland. This is for reals. Just my uninformed opinion, of course.
Based on all this, I wonder if P. Gleick would voluntarily submit to a polygraph. I’m guessing…’No’.
Now if Dr. Gleick has to go to court I wonder who Joe Bast will call in as an expert witness?
I’ve said it before and I’ll say it again Gleick should do the cathartic thing and just own up. This is an open secret for goodness sake man. You lied once why wouldn’t you lie again.
Juola & Associates (http://www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship.
Could we start up a fund drive to hire them to see if Dr. Mann’s latest novel of historical fiction was truly authored by him?
Thanks, to the “Evaluating Variations in Language Laboratory”. I suppose the payback for the Pro Bono could be a good will thing but something they clearly do not need.
Their donation of time and resources to this matter is very much appreciated, at least, by myself.
Good idea to go to the experts on this one to get a truly objective opinion. Just think what a game changer this would be if Bast was fingered as the author. In any case the truth is more important than motivation and I am sure all you are after is the truth.
Thanks Anthony, and thanks Patrick Juola, Ph.D.
Jonas N says:
March 14, 2012 at 6:52 am
Has Peter Gleick actually denied penning that ‘secret strategy memo’?
No he hasn’t. He denied writing the alleged anonymous document which he allegedly received through the alleged US Postal Service. One is led to infer that this was the 2012 Strategy Memo but nowhere does Gleick confirm this.
If Gleick thought the 2012 Strategy Memo was genuine, why did he not explicitly ask Heartland to email him a copy when he was phishing for confirming documents? The embedded document characteristics would have proved to the world that this was a genuine Heartland document.
My bet is that not only did Gleick write the fake memo after phishing and finding no dirt, but also that the anonymous document never existed.
Game, Set, Match. (with emphasis on the match!).
I find the analysis interesting but uncompelling for the reason identified previously by myself, and in more detail by my colleague Byronic, in relation to Otto’s and Greg Laden’s analysis – stylometric analysis is very difficult against a hostile author.
http://books.google.co.uk/books?id=NdnMX5NUBJQC&pg=PA117&lpg=PA117&dq=jgaap+obfuscation&source=bl&ots=M5J5HnXZ7-&sig=DloBYZcM5vbdnIt_yeKDGJNdYRc&hl=en&sa=X&ei=sdhUT9jqN4eT8gPd1KjxBQ&ved=0CDcQ6AEwAw#v=onepage&q=jgaap%20obfuscation&f=false –
“Brennan and Greenstadt applied three fairly standard stylometric methods to determine authorship of obfuscated or imitative essays. Their results for obfuscated essays were essentially at chance, suggesting that attempts to disguise or imitate style are likely to be successful against stylometric methods.”
For the record, I can’t speak for Byronic, but I believe that Gleick is the author for a very simple reason:
The memo must have been written by
(a) somebody who had access to Heartland documents prior to February 13th – which narrows it down to somebody at Heartland or Gleick
(b) can’t understand Heartland’s spreadsheets (there are two maths errors – Koch & the double counting of $88,000) – which makes a Heartland insider unlikely
(c) believed that Gleick and his Forbes column was not just important, but very important – which rules out everybody except for Gleick & perhaps his mum.
Ric Werme says:
March 14, 2012 at 6:40 am
///////////////////////////////////////////////////////
Your summary of the burden of proof and why you can therefore see different outcomes in related civil and criminal proceedings is useful.
However, whilst in the UK we have that distinction, the boundaries are sometimes blurred in that for example in a civil case which involves an allegation of fraud, my understanding is that the criminal burden of proof (or at any rate ‘clear and convincing evidence’) is required in order to substantiate that allegation.