Forensic analysis of the fake Heartland 'Climate Strategy Memo' concludes Peter Gleick is the likely forger

Readers may recall that on February 22nd, I offered up some open source stylometry/textometry software called JGAAP (Java Graphical Authorship Attribution Program) and suggested that readers use it to determine the authorship of the faked Heartland strategy memo that Peter Gleick disseminated to the media.

A link to that article is here:

An online and open exercise in stylometry/textometry: Crowdsourcing the Gleick “Climate Strategy Memo” authorship

I did that because many had speculated that Dr. Peter Gleick was the author. Gleick, who admitted to obtaining the Heartland board meeting documents under false pretenses, and likely illegally, denies he wrote it. Apart from a few holdouts and those who won’t offer an opinion, such as Andy Revkin, prominent voices in the online community think otherwise. Megan McArdle of The Atlantic, for one, doesn’t even see it as a professionally written memo:

“…their Top Secret Here’s All the Bad Stuff We’re Gonna Do This Year memo…reads like it was written from the secret villain lair in a Batman comic. By an intern.”

In posting about crowdsourcing with the JGAAP software, I had hoped that the wide professional base of readers could use it to reach conclusions, but complications made the task more difficult than it would normally be. Chief among them, the “Climate Strategy Memo” contained cut-and-pasted elements of other stolen Heartland documents, which made it difficult for the software to separate the different writing styles without expert fine-tuning.

These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis and concluded that Joe Bast, president of the Heartland Institute, had authored the fake memo. The problem was that Mr. Otto did not perform the due diligence required in selecting his comparison documents and setting the JGAAP software controls, and this led to an erroneous result.
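For readers who haven’t used JGAAP, it may help to see roughly what one of its configurations does. The Python sketch that follows is a minimal illustration under stated assumptions, not JGAAP’s code and not the method Dr. Juola used (the excerpt quoted in this post does not name his configuration). It counts character 2-grams in each text, converts them to relative frequencies (a normalization assumed here for illustration), measures the Canberra distance from the disputed text to each known sample, and ranks the known samples nearest-first. That mirrors the “Nearest Neighbor Driver with metric Canberra Distance using Character 2Grams” setting that appears in a reader’s reproduction of Mr. Otto’s run. The function names and sample texts are hypothetical.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping two-character sequences ("character 2-grams")."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def profile(text: str) -> dict[str, float]:
    """Relative frequency of each bigram (an assumed normalization, for illustration)."""
    counts = char_bigrams(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: n / total for gram, n in counts.items()}


def canberra(p: dict[str, float], q: dict[str, float]) -> float:
    """Canberra distance: sum of |p_i - q_i| / (p_i + q_i) over every observed bigram."""
    distance = 0.0
    for gram in set(p) | set(q):
        a, b = p.get(gram, 0.0), q.get(gram, 0.0)
        distance += abs(a - b) / (a + b)  # a + b > 0: the bigram occurs in at least one text
    return distance


def rank_candidates(disputed: str, known: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Nearest-neighbor ranking: known samples sorted by distance to the disputed text."""
    target = profile(disputed)
    scored = [(author, canberra(target, profile(text))) for author, text in known]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy samples; real attribution needs thousands of words per sample.
    known_samples = [
        ("Author A", "the cat sat on the mat and the cat sat again"),
        ("Author B", "quantum flux capacitors require plutonium cores"),
    ]
    for author, dist in rank_candidates("the cat sat on a mat", known_samples):
        print(f"{author}: {dist:.3f}")
```

Note that the ranking can only ever choose among the known samples supplied, which is why the choice of comparison documents and settings matters so much.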

In the end I realized that only professionals familiar with the science of stylometry/textometry would be able to make a credible determination as to the authorship. So, I asked for help.

On February 23, 2012, I sent the Evaluating Variations in Language Laboratory (the group responsible for the JGAAP software) a request for assistance. Initially I was mainly looking for tips on how best to operate their software, but given the high-profile nature of this issue and the unique situation, they referred me to Juola & Associates and its president, Patrick Brennan, who responded with an even better offer: they would apply the larger collection of tools and techniques reserved for their forensics consulting work to the task, pro bono. Normally such courtroom-quality professional analysis earns them fees comparable to what a metropolitan lawyer might charge, so I was not only extremely grateful but also realized it was an offer I couldn’t refuse.

In my email to Brennan on Friday, February 24, 2012, at 5:07 PM, I wrote:

For the record, I do not know what the outcome might be, but it is always best to consult experts externally who have no financial interest in the outcome of the case.

Here’s the background on the group:

Juola & Associates (www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship. Our scientists are leading, world-recognized experts in the fields of stylometry, authorship attribution, authorship verification, and author analysis.  Every written document is a snapshot of the person who wrote it; through our analysis, we can determine everything from sociological information to biographical information, even the identity of the author.  We provide sound, tested, and legally-recognized analysis as well as expert testimony by Dr. Patrick Juola, arguably one of the world’s leaders in the field of Forensic Stylometry.

We have worked with groups as wide-ranging as multinational companies, Federal courts, research groups, and individuals seeking political asylum.  We have literally written the book (ISBN 978-1-60198-118-9) on computational methods for authorship analysis and profiling.

The lead analysis was conducted by Patrick Juola, Ph.D., Director of Research, and director of the Evaluating Variations in Language Laboratory at Duquesne University in Pittsburgh. Juola & Associates, headed by President Patrick Brennan, is a separate commercial entity that provides analysis and consultation on stylometry.

Dr. Juola has published his analysis of the “Climate Strategy Memo,” which I present first, and in its entirety, here at WUWT.

First, the short read:

Stylometric Report – Heartland Institute Memo

Patrick Juola, Ph.D.

Summary

As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.

And a longer excerpt from the document (emphasis mine):

Analysis

24 This task is challenging for several reasons, some technical and some linguistic.

25 First, the Heartland memo as published contains a great many quotations taken from other sources. As originally published, the memo contains approximately 717 words, but at least 266 of those words have been identified as belonging to phrases (or paraphrases of phrases) found elsewhere in the stolen documents. [N.b. This identification was done by the Heartland Institute, which admits that these 266 words are “paraphrases [of] text appearing in one of the stolen documents.”]

As paraphrases, they may or may not reflect the style of the original authors, and they also may or may not reflect the style of the alleged forger. For this reason, we analyzed both the full document and the 451-word redacted document with the controversial passages removed.

26 Second, even the full-length document is rather short for an accurate analysis. Most authorship attribution experts recommend larger samples if possible. (E.g., Eder recommends 3500 words per sample, noting that results obtained from fewer than 3000 words “are simply disastrous.”)

27 Third, perhaps as a result of the previous factors, we have observed that Bast and Gleick appear to have extremely similar writing styles.

Results

28 Despite this difficulty, we were able to identify and calibrate an appropriate analysis method. Using this method, we analyzed both the complete Heartland memo and the selections from the Heartland memo that had been identified as not copied from other stolen documents. In both analyses, the JGAAP system identified the author as Peter Gleick.

29 In particular, the JGAAP system identified the author of the complete (unredacted) memo as Peter Gleick, despite the large amount of text that even Bast admits is largely taken from genuine writings of the Heartland Institute. We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.
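The word counts in paragraphs 25, 26 and 29 can be checked with a few lines of arithmetic. The short Python sketch that follows is an illustrative aside, not part of Dr. Juola’s report; it uses only the figures quoted there (717 total words, 266 flagged words, and Eder’s 3,000- and 3,500-word thresholds), and the variable names are hypothetical.

```python
# Word counts as given in paragraphs 25 and 26 of the report.
FULL_MEMO_WORDS = 717
FLAGGED_WORDS = 266        # identified by Heartland as quoted or paraphrased
EDER_RECOMMENDED = 3500    # Eder's recommended words per sample
EDER_FLOOR = 3000          # below this, Eder calls results "simply disastrous"

redacted_words = FULL_MEMO_WORDS - FLAGGED_WORDS
undisputed_share = redacted_words / FULL_MEMO_WORDS

print(f"Redacted memo: {redacted_words} words")
print(f"Undisputed share of full memo: {undisputed_share:.1%}")
print(f"Full memo vs. Eder's recommendation: {FULL_MEMO_WORDS / EDER_RECOMMENDED:.0%}")
print(f"Redacted memo vs. Eder's recommendation: {redacted_words / EDER_RECOMMENDED:.0%}")
print(f"Full memo below Eder's floor: {FULL_MEMO_WORDS < EDER_FLOOR}")
```

This prints a 451-word redacted memo, an undisputed share of 62.9% (somewhat under two-thirds), and sample sizes of roughly 20% and 13% of Eder’s recommended 3,500 words.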

Conclusions

30 In response to the question of who wrote the disputed Heartland strategy memo, it is difficult to deliver an answer with complete certainty. The writing styles are similar and the sample is extremely small, both of which act to reduce the accuracy of our analysis. Our procedure by assumption excluded every possible author but Bast and Gleick. Nevertheless, the analytic method that correctly and reliably identified twelve of twelve authors in calibration testing also selected Gleick as the author of the disputed document. Having examined these documents and their results, I therefore consider it more likely than not that Gleick is in fact the author/compiler of the document entitled “Confidential Memo: 2012 Heartland Climate Strategy,” and further that the document does not represent a genuine strategy memo from the Heartland Institute.

It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as “Confidential Memo: 2012 Heartland Climate Strategy.” The preponderance of the evidence points squarely to Gleick. According to Wikipedia’s entry on the “legal burden of proof”:

Preponderance of the evidence, also known as balance of probabilities is the standard required in most civil cases. This is also the standard of proof used in Grand Jury indictment proceedings (which, unlike civil proceedings, are procedurally unrebuttable).

Further, it is abundantly clear that this document was not authored by Heartland’s Joe Bast, nor was it included as part of the board package of documents Dr. Gleick (by his own admission) phished under false pretenses from Heartland.

The complete analysis by Dr. Juola is available here: MemoReport (PDF 101k)


166 Comments
RomanM
March 15, 2012 10:12 am

shawnotto

But it doesn’t change the fact that using that methodology the JGAAP software produces the same result over and over.

Huh? My comments are substantive whether you can recognize them as such or not. Your “calibration” was in no way a genuine calibration of the program’s specific abilities with regard to identification of authorship of documents, but merely a gratuitous demonstration that the software can repeat a calculation twice and get the same numeric result each time. That you actually did that as part of your analysis struck me as misguided when I first read the post on your home page.
The JGAAP software itself appears to be well written and contains a very extensive list of analysis alternatives. Producing such software requires an excellent knowledge of forensic stylometric methodology. Given that Prof. Juola originated the project to create the software and directed it to its conclusion, I would suggest that he is probably more proficient in its use than you are. Although I also have reservations about the capability of the methodology in this case and the relative lack of specifics, based on ability and knowledge of the software user alone, my naive guess would be that his conclusions would be more believable than your own.
The fact remains that you used a scientific procedure with which you were completely unfamiliar and, based on questionable output which you also did not fully understand, you made a categorical statement on a website viewed by the public. Given the poor quality of the analysis, I would agree with Anthony that an admission that the statement was unjustified would indeed be in order. However, I am well aware that this will probably not happen because AGW advocates cannot be seen as admitting to or correcting previous inaccuracies.

March 15, 2012 10:52 am

> copner 7:28AM I don’t disagree with you; there obviously is something different about them.
In which case, why do you not issue a retraction or correction for the following misleading statements that appear in your articles?
“the document, titled “Confidential Memo: 2012 Heartland Climate Strategy,” (PDF) simply recapitulates the information contained in much more incriminating detail in Heartland’s undisputed Fundraising Plan (PDF).”
“Heartland says it’s a fake, although as I showed several days ago, it is, if anything, a milder version of the information contained in much greater detail the apparently authentic Fundraising Plan (PDF).”
“Thus while Heartland has expressly denied the authenticity of the Strategy document, it does nevertheless seem to be essentially in line with the strategy as actually put forth in their undisputed budget and fundraising documents.”

RockyRoad
March 15, 2012 10:54 am

David L says:
March 14, 2012 at 6:26 am

By pro bono do you actually mean it’s being funded by Big Oil?
/sarc

I don’t know why this statement hit me with such force (considering I’ve been making “pro bono” comments here for years), but I shall quit looking for my check in the mail from Big Oil from henceforth.
On Gleick–he’s like Mann. Never owning up, always obfuscating, always the negative of anything good you can say about a man. Wow–how does he put up with himself? Only a distorted value system could be the answer. Repentance is good not only for the soul but also for what’s left of his career.

March 15, 2012 11:26 am

shawnotto says:
“@Smokey to me it’s not a matter of belief. I don’t approach questions of fact from a faith-based perspective.”
With the alarmist crowd it’s always one of two motives: religious belief in CO2=CAGW, or advancing the narrative for personal gain. So maybe it’s not faith-based in your case.
How do I know it’s always one of those two motivations? Because the scientific method is always ignored by the alarmist contingent. As is transparency. Without transparency and the scientific method, it’s true belief, or it’s self-serving propaganda. Because there is no empirical, testable evidence supporting the “carbon” scare. It is an evidence-free conjecture, which is still alive and kicking only because of the $billions thrown at it every year.
And regarding Gleick’s self-confessed dishonesty, once a liar, always a liar. That’s what makes your feeble attempts to defend him so impotent. So go ahead. Keep digging.

Dave Wendt
March 15, 2012 12:38 pm

On a somewhat related note, as it deals quite cogently with the mindset of Gleick and his like, I highly recommend this piece.
http://sultanknish.blogspot.com/2012/03/consensus-wants-you.html
“The Consensus Wants You
While terms like “The Marketplace of Ideas” are still tossed about occasionally like confetti out of a tenth story window, they mean about as much as the soiled mass of tape that everyone has stepped on by the time the parade is over. The age of ideas, when issues might actually be debated, instead of answered immediately with talking points derived from an inflexible ideology whose only two poles are outrage and guilt, ended some time ago.
Today we live in the age of consensus. The cultural elites no longer debate opposing points of view, they dismiss them as racist or ignorant, ridiculing not only the argument, but the arguer and the very premise that there can even be an argument.
The “marketplace of ideas” is replaced with “I’m offended that we’re even having this discussion” or “Only ignorant people believe that.” These alternating poses of victimhood and superiority make it illegal or pointless to even discuss the subject and leave every issue settled by consensus. Scientific debates end before they have begun. Political debates exist only to allow candidates to affirm the consensus or castigate them for standing outside the consensus. Personal exchanges of views either reflect the consensus or become perilous and illegal….

Dissent is not allowed within the consensus. If the consensus cannot reach directly into your head, it will do its best to force you to violate your principles to the extent that it can, knowing that people rationalize the compromises that they are forced to make and that such rationalizations lead them away from their principles. If it cannot alter your thoughts, then it will do its best to prevent you from expressing them.
The consensus wants you. All of you. If it cannot have you, it will have your children or your grand-children. It is through talking and done debating. It has climbed up to the steeple, past the gargoyles and shouts down at the world. It is not interested in ideas, only in submission to its will. It will sneer at you, laugh at you and do its best to compel you to obey its doctrines. Because it knows that if it cannot, then the consensus will dry up and blow away on the wind.”
As I said, I highly recommend reading the whole thing.

March 15, 2012 1:44 pm

[SNIP: If you want to post here, Greg, you’ll learn to obey the house rules. -REP]

March 15, 2012 2:48 pm

Shawn Otto’s claim isn’t that he did this correctly. It is that he did exactly what Anthony suggested, and that it is hypocritical of Anthony to criticize him for that. And that he showed his work, which is something you guys are selectively adamant about.
Juola’s claim is very weak. He simply says that he calibrated his tools on a training set to distinguish reliably between Gleick and Bast on it. They were also “very close,” which probably means that there was a very small segment of the tuning space that was 100% reliable on the 12-sample training set.
Which in turn would imply a fairly high likelihood that there is no perfect tuning for a larger set of sample documents, meaning in turn, a finite probability of failure.
Interestingly, Juola does NOT show methods or results. Presumably, he did not cherry-pick the training documents, which would mean that, all else being equal, the author is more likely to be Gleick than Bast. Or, more specifically, that the memo is more likely a sample from the space spanned by the chosen Gleick documents than from the space spanned by the chosen Bast ones. But we don’t know how much more so.
As is intuitively obvious given the peculiar content of the disputed memo, neither author will give a good match. Not only is the wording notably peculiar, but the context (whatever it is) is very different from the contexts from which the training documents were drawn. That is, we would need a dozen samples of Gleick committing fraud and a dozen samples of Bast exposing his cynicism, even if we presume that the author could not possibly be a third party.
While I continue to propose that the document was retranslated from an obscure language, a more serious theory is that this document is an authentic draft written by an assistant to Bast on his instruction.
In short, this result is of very low quality. I feel confident that if the result had come out the other way, we would not have heard about it here. I would also propose that if we had heard about it elsewhere, say through Shawn, it would be cavalierly dismissed here, as much stronger arguments that do not suit the prejudices of the audience are. In this case you would be right to do so.
For you to claim that the matter is settled on this feeble basis is quite remarkable, given how often settled matters are called into question in these parts. Your indifference to the details of Juola’s work is also telling.
Still, I do appreciate your attention to this matter, as anything that gets people thinking about who Heartland is, and why Gleick might be so angry at them as to take this course, is of great benefit.
REPLY: Sometimes Mr. Tobis, I think that maybe you have been under the influence of something when you write, such as when you wrote your famous f-word fusillade. Or maybe it’s just your innate bias, or fear that we are all going to roast when you said: “…because the f***ing survival of the f***ing planet is at f***ing stake.” I don’t know, but I do know that your reasoning makes no sense at all to me.
But to the point, Otto wrote this:

Shawn Lawrence Otto:
Mar 14, 2012 at 07:09 PM
This was, of course, the point. It was, indeed, “irresponsible” of Anthony Watts to suggest readers adopt this approach on his blog,

So, it is “irresponsible” for me to ask people to have a look at the software and the science of stylometry, but by your reasoning it was responsible for Otto to do so and come to a conclusion, yet not responsible for the authors of the program to do so. By your reasoning (as I follow it), Otto’s claim is “high quality” but the claim of one of the world’s leading experts on the issue, who is also an author of the software, is “low quality”.
Riiiiight. Stick to f-word fusillades my friend, it is your true calling.
Here’s the comment from statistician RomanM that Shawn Otto has so far avoided addressing:

RomanM:
Feb 24, 2012 at 08:35 AM
I tried to post this at the Huffington site, but was rejected because the comment was too long.
Are you sure this was done correctly?
I downloaded your files and attempted to duplicate your work. Initially, I got error messages when using the .docx files provided. However, after saving them in the earlier Word .doc format (and also as ordinary text files), the program ran smoothly. All instructions as given were followed and the results were:
Heartland Strategy Memo.doc D:\Otto Files\Heartland Strategy Memo.doc
Canonicizers: none
Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Character 2Grams as events
1. Strategy Memo 0.0
2. Peter Gleick 351.42130776228015
3. Joe Bast 352.58168488057447
4. Joe Bast 429.3252413560006
5. Peter Gleick 506.8088869961059
6. Heartland Staff 568.8816858635792
Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Word 2Grams as events
1. Strategy Memo 0.0
2. Joe Bast 991.0149182362647
3. Peter Gleick 1002.0033572750334
4. Peter Gleick 1183.5345626116602
5. Joe Bast 1304.0765401409653
6. Heartland Staff 3260.0970573163086
Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Word stems as events
1. Strategy Memo 0.0
2. Peter Gleick 475.2953391024075
3. Joe Bast 494.5595682060029
4. Joe Bast 612.1427626952322
5. Peter Gleick 618.9538041331741
6. Heartland Staff 1295.3724837604568
You will note that the ordering of the documents is different and that two of the analyses now choose Peter Gleick, rather than Joe Bast. You will also note that the magnitude of the statistics for each document is substantially greater than in the results of your post.
I have further questions about your method choices in the JGAAP program, as well as the particular texts you selected for the analysis, but that is another matter.
RomanM

I figure if Mr. Otto can’t be bothered to answer questions on his own blog about methodology, then he really isn’t doing science, but instead advocacy.
– Anthony Watts

March 15, 2012 5:08 pm

mtobis (@mtobis) says:
March 15, 2012 at 2:48 pm
Still, I do appreciate your attention to this matter, as anytime we can get people thinking about who Heartland is and why Gleick might be so angry at them as to take this course, is of great benefit.
Heartland is a small, modestly funded, above-board conservative think tank that covers your friggin’ climate conniptions almost as a flicking sideline. The flipping problem is that it’s been wiping the effing floor with the freaking Big Oil and Big Green super-funded fudge-packing Warmie foundations and their foofing PR shlocks. That made Pete Gleick, a spoiled and bitter old puck, oooh-so-ficking-mad, mad enough to turn him into a mother-truckin’ little thief and a liar. There, simple explanations are the best and I didn’t have to say “f*ck” even once.

Steve O
March 15, 2012 5:37 pm

“Peter Gleick is the likely forger” if the suspect list is narrowed to two possible subjects. Given that restriction, how many of US would be the likely forger if the choice were between one of us and one other person?
Positions are stronger when they are not overstated. And they are less subject to criticism.

IanR
March 15, 2012 7:54 pm

Um, no, Anthony. I find your blog post unconvincing, and I find the forensic analyst’s argument circular, an issue of logic. There are far better ways to be snarky than summarizing what was written. So I will ask again: how would you paraphrase “We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo”? How does that not just say that part of the memo is paraphrased and part of it is not? Is it not the author’s justification, because he says, “We justify this result”?
REPLY: Well, I find your comment circular as well. I think you misunderstand what is being said. Basically, all he’s saying is that in the analysis he excluded text from other known documents in order to remove that bias.
It is no different from taking out the pieces of another, unrelated jigsaw puzzle that your kids mixed into the same box when you are trying to complete the puzzle. – Anthony

March 15, 2012 7:57 pm

IanR,
You’re grasping at straws. Gleick is self-admittedly dishonest. Why would you believe him? Are you really that credulous?

IanR
March 15, 2012 8:05 pm

I’m sorry, where did I say I believe Gleick? If the forensic analysis “for” Gleick’s dishonesty is built on poor logic, then why use it? How does that make it any better than what alarmists do?

RomanM
March 16, 2012 5:34 am

IanR: I don’t understand what part of the report constitutes “circular logic” in this situation. Perhaps you could elaborate on this.
The primary non-legal meaning of the word justify is to explain or provide evidence for something. The analyst raised some possible difficulties for producing a proper examination of the documents. In particular, one objection was the relatively short length of the target document. This was further exacerbated by the fact that parts of that document consisted of direct quotes from stolen Heartland material, paraphrased text from the same source and text which was written by the unknown author.
Analysis was done on both the complete document and on the document with the quoted material removed. In both cases, the result was reported to be the same, selecting Gleick as the more likely of the two to be the author. The report indicates that the fact that the entire document (including the quoted text) was also attributed to Gleick is reasonable (i.e. justified) because the combined amount of paraphrased text and original written text was still sufficiently large for the methodology to properly do so.
Now, one can reasonably argue the technical details in the entire process and the lack of provision of finer detail in the results, but I fail to see where any “circular logic” is evident in these statements.

March 16, 2012 11:24 am

Did this analysis have a “none of the above” option?
12 samples tested, using 12 known sources. How about a 13th, not known source? How would the program have handled this?
A fundamental question. Even in a line-up there are those impossible to be the perp.
Right now, to me, this analysis is worth what was paid for it.

ben
March 19, 2012 3:06 am

Well done Anthony.
But I have to ask whether the analysis comparing Bast vs Gleick is compromised by a false counterfactual. Is there any real suggestion that Bast really is behind the forged memo? Isn’t the correct test Gleick versus a random member of the public, or perhaps a random scientist?
Since Gleick and Bast have similar writing styles, this alternative will probably point more firmly at Gleick, but I tentatively think that would also be the truer test.
