Readers may recall that on February 22nd, I offered up some open source stylometry/textometry software called JGAAP (Java Graphical Authorship Attribution Program), with a suggestion that readers make use of it to determine the authorship of the faked Heartland strategy memo disseminated to the media by Peter Gleick.
A link to that article is here:
The reason I did that was that many had speculated that Dr. Peter Gleick was the author. Gleick, who admitted to obtaining the Heartland board meeting documents under false pretenses, and likely illegally, denies he wrote it. Except for a few holdouts and those who won’t give an opinion, like Andy Revkin, other prominent voices of the online community such as Megan McArdle of The Atlantic think otherwise, and she doesn’t even see it as a professionally written memo:
“…their Top Secret Here’s All the Bad Stuff We’re Gonna Do This Year memo…reads like it was written from the secret villain lair in a Batman comic. By an intern.”
In posting about JGAAP software crowdsourcing, I had hoped that the wide professional base of readers could make use of this software and would be able to come to conclusions using it, but there were complications that made the task more difficult than it would normally be. These complications included the fact that there were cut and pasted elements of other stolen Heartland documents in the “Climate Strategy Memo,” making it difficult for the software to delineate the separate writing styles without knowledgeable fine tuning.
These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis, coming to the conclusion that Joe Bast, president of the Heartland Institute, had authored the fake memo. The problem was that Mr. Otto did not perform the due diligence required in his selection of documents and the JGAAP software controls, and this led to an erroneous result.
In the end I realized that only professionals familiar with the science of stylometry/textometry would be able to make a credible determination as to the authorship. So, I asked for help.
On February 23, 2012 I sent the Evaluating Variations in Language Laboratory (the group responsible for the JGAAP software) a request for assistance. Mainly what I was looking for initially was tips on how to best operate their software, but given the high-profile nature of this issue, and the unique situation, they referred me to Juola & Associates and its president, Patrick Brennan, who responded with an even better offer. They would use their larger collection of tools and techniques reserved for their forensics consulting work and apply it to the task, pro bono. Normally such professional analysis for courtroom-quality work nets them fees comparable to what a metropolitan lawyer might charge, so not only was I extremely grateful, but I realized it was an offer I couldn’t refuse.
In my email to Brennan on Fri, Feb 24, 2012 at 5:07 PM I wrote:
For the record, I do not know what the outcome might be, but it is always best to consult experts externally who have no financial interest in the outcome of the case.
Here’s the background on the group:
Juola & Associates (www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship. Our scientists are leading, world-recognized experts in the fields of stylometry, authorship attribution, authorship verification, and author analysis. Every written document is a snapshot of the person who wrote it; through our analysis, we can determine everything from sociological information to biographical information, even the identity of the author. We provide sound, tested, and legally-recognized analysis as well as expert testimony by Dr. Patrick Juola, arguably one of the world’s leaders in the field of Forensic Stylometry.
We have worked with groups as wide-ranging as multinational companies, Federal courts, research groups, and individuals seeking political asylum. We have literally written the book (ISBN 978-1-60198-118-9) on computational methods for authorship analysis and profiling.
The lead analysis was conducted by Patrick Juola, Ph.D., Director of Research, and director of the Evaluating Variations in Language Laboratory at Duquesne University in Pittsburgh. Juola & Associates, headed by President Patrick Brennan, is a separate commercial entity that provides analysis and consultation on stylometry.
Dr. Juola has published his analysis of the “Climate Strategy Memo,” which I present first and in entirety here at WUWT.
First, the short read:
Stylometric Report – Heartland Institute Memo
Patrick Juola, Ph.D.
Summary
As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.
And the larger excerpt of the document, bolds mine:
Analysis
This task is challenging for several reasons, some technical and some linguistic.
First, the Heartland memo as published contains a great many quotations taken from other sources. As originally published, the memo contains approximately 717 words, but at least 266 of those words have been identified as belonging to phrases (or paraphrases of phrases) found elsewhere in the stolen documents. [N.b. this identification was done by the Heartland Institute, who admit that these 266 words are “paraphrases [of] text appearing in one of the stolen documents.”]
As paraphrases, they may or may not reflect the style of the original authors, and they also may or may not reflect the style of the alleged forger. For this reason, we analyzed both the full document as well as the 451-word redacted document with the controversial passages removed.
Second, even the full-length document is rather short for an accurate analysis. Most authorship attribution experts recommend larger samples if possible. (E.g., Eder recommends 3500 words per sample, noting that results obtained from fewer than 3000 words “are simply disastrous.”)
Thirdly, perhaps as a result of the previous factors, we have observed that Bast and Gleick appear to have extremely similar writing styles.
Results
Despite this difficulty, we were able to identify and calibrate an appropriate analysis method. Using this method, we analyzed both the complete Heartland memo and the selections from the Heartland memo that had been identified as not copied from other stolen documents. In both analyses, the JGAAP system identified the author as Peter Gleick.
In particular, the JGAAP system identified the author of the complete (unredacted) memo as Peter Gleick, despite the large amount of text that even Bast admits is largely taken from genuine writings of the Heartland Institute. We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.
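The word counts quoted in the report can be checked with simple arithmetic. The following is a minimal sketch that uses only the figures stated in the excerpt (approximately 717 words in total, 266 of them identified as paraphrased); the variable names are illustrative and do not come from the report.

```python
# Figures as quoted in the report excerpt; the report calls the total approximate.
total_words = 717        # length of the memo as published
paraphrased_words = 266  # words identified as paraphrased from other documents

# Words left once the paraphrased passages are removed.
undisputed_words = total_words - paraphrased_words

# Share of the full memo that the undisputed text represents.
undisputed_share = undisputed_words / total_words

print(undisputed_words)            # 451
print(round(undisputed_share, 3))  # 0.629
```

The difference matches the 451-word redacted document mentioned in the report, and a share of roughly 63 percent is consistent with the report's description of "nearly 2/3."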
Conclusions
In response to the question of who wrote the disputed Heartland strategy memo, it is difficult to deliver an answer with complete certainty. The writing styles are similar and the sample is extremely small, both of which act to reduce the accuracy of our analysis. Our procedure by assumption excluded every possible author but Bast and Gleick. Nevertheless, the analytic method that correctly and reliably identified twelve of twelve authors in calibration testing also selected Gleick as the author of the disputed document. Having examined these documents and their results, I therefore consider it more likely than not that Gleick is in fact the author/compiler of the document entitled “Confidential Memo: 2012 Heartland Climate Strategy,” and further that the document does not represent a genuine strategy memo from the Heartland Institute.
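For readers unfamiliar with how a closed-set attribution and its calibration test work in principle, here is a minimal sketch. It is not the method used in the report and not JGAAP's API: the feature choice (character trigrams), the cosine distance, the nearest-sample rule, and every function name are assumptions made for illustration. Like the report's procedure, it can only choose among the candidate authors it is given.

```python
from collections import Counter
import math


def profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed feature set)."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_distance(p, q):
    """1 minus cosine similarity of two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def attribute(unknown_text, known_samples):
    """Return the author of the known sample closest to the unknown text.

    known_samples is a list of (author, text) pairs.
    """
    target = profile(unknown_text)
    nearest = min(known_samples,
                  key=lambda pair: cosine_distance(target, profile(pair[1])))
    return nearest[0]


def calibrate(known_samples):
    """Leave-one-out test: hold out each known sample in turn and attribute it
    using the remaining samples. Returns (number correct, number tested)."""
    correct = 0
    for i, (author, text) in enumerate(known_samples):
        rest = known_samples[:i] + known_samples[i + 1:]
        if attribute(text, rest) == author:
            correct += 1
    return correct, len(known_samples)
```

In use, `known_samples` would hold undisputed writings by each candidate, `calibrate` would be run first to see how often the method recovers the right author on texts whose authorship is known, and only then would `attribute` be applied to the disputed text. A perfect calibration score on a small set does not guarantee a correct answer on a short or heavily paraphrased document.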
It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as “Confidential Memo: 2012 Heartland Climate Strategy.” The preponderance of the evidence points squarely to Gleick. According to Wikipedia’s entry on the “legal burden of proof”:
Preponderance of the evidence, also known as balance of probabilities is the standard required in most civil cases. This is also the standard of proof used in Grand Jury indictment proceedings (which, unlike civil proceedings, are procedurally unrebuttable).
Further, it is abundantly clear that this document was not authored by Heartland’s Joe Bast, nor was it included as part of the board package of documents Dr. Gleick (by his own admission) phished under false pretenses from Heartland.
The complete analysis by Dr. Juola is available here: MemoReport (PDF 101k)
It is nice, though probably not conclusive, to have this analysis.
I like Copner’s summary points, but would add one other that hasn’t appeared yet in this thread:
The forged document requires that Heartland (presumably in the person of Joe Bast) be knowingly mendacious. In other words: They are wrong, they know they are wrong and they revel in it. To attribute this sentiment to Bast is preposterous. Even most warmists probably find that ridiculous. Only a true believer in the inherent and conscious evil of the skeptic position could have penned this. Forget about style.
Several weeks ago, I did a cursory search through the released documents to try to determine the extent of copying in the production of the fake agenda. The result of the search with some annotation of the text can be found in this pdf document.
As I mentioned in a comment on the previous WUWT thread on this topic, I doubt that the methodology used in the stylometry process can provide any definitive information as to the strategy document’s authorship.
The relative shortness of the document, the sparsity of the analysis, and the lack of any concrete numerical results in the PDF linked in the head post do little to alter my view of this. It all seems to come down to an evaluation of the idiosyncrasies of the text of the document and to its apparent provenance: that it was in the possession of Peter Gleick at some point in time.
REPLY: And yet when properly calibrated with sample input, the software identified Gleick without failure – Anthony
I was concerned with the original idea of a crowd-source analysis of the document for the very reasons outlined … it contained material from one possible source.
Whilst I detect some possible sympathies toward WUWT, both in the pro bono work and in some of the wording, it would seem that they have approached it in as methodical a way as possible, given the various caveats regarding length etc.
I may not be the Heartland’s greatest fan, but they were clearly the victim of a nasty viscous attack by Gleick, which was all the worse given the way Heartland had extended the hand of friendship to him. It is even worse that many people like those “journalists” at the Huff&Puff post have tried to defend the indefensible.
All I can say is Thank you Juola & Associates (www.juolaassoc.com)
REPLY: “…but they were clearly the victim of a nasty viscous attack by Gleick”
Yes indeed, it is a sticky wicket 😉 Anthony
Gail Combs says:
March 14, 2012 at 8:44 am
My lawyer wife says “Probable cause” is all you need to get a search warrant. Oh, I could’ve just checked that Wiki page:
Quite right. That anybody still denies Gleick is the author, with a straight face no less, is astonishing.
I marvel with every passing day that there appear to be no criminal repercussions to admitted criminal behavior.
HowardG @8:53:
If you go and carefully read the documents, all of them, I think you will find that the possibility that Heartland will be that forgiving is unlikely. I think Gleick has done real damage to them and their employees. I’m thinking there are a lot of angry people at Heartland who had their private information compromised. Bast has a responsibility to them and, as has been hinted at by some, may even be, as the principal of the institute, culpable for damages in some way.
Dave says: March 14, 2012 at 7:49 am
One has to wonder why Gleick hasn’t been arrested already since he admitted phishing data from Heartland. Isn’t that illegal?
It is if you or I do it! But climate scientists are special. They are above the law: above the law of man, above the law of science, and, given the way Gleick just makes up his own ethics, above the law of god/morality.
Being above everything, all I can say is “the higher you are the harder you fall”.
Of all the above text, the BBC’s Richard Black, and the other mainstream media alarmist advocates are most likely to focus on the following:
“These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis, coming to the conclusion that Joe Bast, president of the Heartland Institute, had authored the fake memo.”
I can’t wait for the Huffington Post piece on this.
It would be a shame for them to “deny” it by ignoring the scientific consensus.
Gleick needs to just go away, and let someone honest take over.
If they can find an honest environmentalist.
It is impossible to take Gleick’s word as true when he said it was mailed to him by an unknown source. Once someone has been proven to be a liar, you cannot believe anything they say, especially when it relates to ‘vindicating’ that person.
Now that’s style. I can’t conceive a more elegant and professional way of refuting the amateurish deflection/smear of the Huffington Post. Entries like this explain the “best science blog” and “lifetime achievement” awards.
Many thanks to Anthony for the inspiring work. A big thank you to Juola & Associates for their generosity and expertise.
On reflection, I feel so very sad for Gleick and I do hope he has some supportive friends and family.
Until now, he probably had a lot of sympathy from the warmist group – even being called a hero. Unfortunately, that is going to disappear when they find out that it appears he continued to lie, further undermining the “cause”.
It looks like his career is gone.
It looks like his warmist “friends” will desert him.
And perhaps worst of all, it will be Gleick himself that is likely to do the most almighty damage and bring down the global warming scam which he feels so passionate about.
We all make mistakes, we are all human, he may have felt he was doing the right thing, but he’s going to be ostracised. Let’s not be too hard on the man, even if we do not like the action because I fear we may soon be the best friends he has.
Weak.
“Gleick (…) denies he wrote it.” (the faked Heartland strategy memo)
Did he deny it? Really? When? I remember reading that he said he received it in the mail. But I have never read any denial by Gleick that he wrote it.
In the words of the song “…send it off, in a letter to yourself…”
In other words it is good enough to get a search warrant but not good enough to hang him.
—
It is however, good enough to sue him for every penny he is worth.
I found this:
http://www.huffingtonpost.com/shawn-lawrence-otto/joe-bast-fake-document_b_1297042.html
So who is doing the correct analysis?
Civil case penalties rarely involve ‘hanging’ …
Gleick will not be prosecuted on federal charges so long as Obama is president and/or Holder is AG. The politics in CA seem similar, even though he has publicly confessed to identity theft and impersonation, a state crime.
“They would use their larger collection of tools and techniques reserved for their forensics consulting work and apply it to the task, pro bono.”
So you admit to being on the payroll of Big Lingo?!
Ric Werme says: March 14, 2012 at 6:40 am
Preponderance of the evidence, also known as balance of probabilities is the standard required in most civil cases.
It’s worth noting, especially for non-US readers, that this is a much lower bar than the criminal standard “Beyond reasonable doubt”……
_____________________________________
Gail Combs says: March 14, 2012 at 8:44 am
In other words it is good enough to get a search warrant but not good enough to hang him.
____________________________________
Ric Werme says:
March 14, 2012 at 9:03 am
My lawyer wife says “Probable cause” is all you need to get a search warrant.
____________________________________
We are talking a sacred “Climate Science” “Hero” here. The example of Mann and the University of Virginia e-mails shows “Probable Cause” has not allowed State Attorney General Cuccinelli access (search warrant) to e-mails that should have been available through a FOIA.
The high court in Virginia ruled that the university is not “a person” under the state’s Fraud Against Taxpayers Act, and the term “corporation” as used in the statute does not include state agencies such as public universities. (Does that mean the Virginia State Supreme Court just ruled that Mann is not “a person” but a god?)
That is why I said good enough to get a search warrant. After all we no longer have “The Rule of Law” in the USA but “The Rule of the Privileged Class” (I wish I could put /sarc)
I just want to note that, on the point that there is no hurry to charge the fraudster, the more time that passes, the more difficult forensic analysis on the computers in use by the perpetrator becomes. While many people think they have cleaned up after themselves, it takes a bit of expertise to do it properly. Given that we aren’t dealing with an expert in this area, time is the enemy of discovering evidence on the computers.
_Jim says:
March 14, 2012 at 9:58 am
Gail Combs says March 14, 2012 at 8:44 am
…
In other words it is good enough to get a search warrant but not good enough to hang him.
Civil case penalties rarely involve ‘hanging’ …..
_____________________________
The last person hanged in the United States was Billy Bailey, on January 25, 1996… So there is hope yet! /sarc
(Sad, Jim, that was really very weak snark; you are losing your touch)
Stephen Richards says:
March 14, 2012 at 9:21 am
Gleick needs to just go away, and let someone honest take over.
If they can find an honest environmentalist.
“Honest environmentalist,” oxymoron?
IF law enforcement gets involved, and that’s a big if, it is sometimes possible to connect a document with the printer it was printed on. It shouldn’t be too difficult to check the printers at Heartland, as well as those of Bast and Gleick.
I still think it is possible that someone else wrote the fake memo and mailed it to Gleick, leading him to become curious enough to dig for confirmation. Gleick and Bast have similar writing styles. I wonder how many people also share similar styles with them? For example, what would the analysis software tell us if everyone here was also tested as a possible source of the memo? Would anyone here be an even better match than Gleick? If so, would that be enough to say that you have found the source? Any half-competent lawyer should be able to convince a jury that the evidence is circumstantial.
Now, IF you could get a warrant and check to see if you could find a printer that matches the one on which the memo was printed, that would be a different story.