Readers may recall that on February 22nd, I offered up some open source stylometry/textometry software called JGAAP (Java Graphical Authorship Attribution Program), with a suggestion that readers make use of it to determine the authorship of the faked Heartland strategy memo disseminated to the media by Peter Gleick.
A link to that article is here:
The reason I did that was that many had speculated that Dr. Peter Gleick was the author. Gleick, who admitted to obtaining the Heartland board meeting documents under false pretenses, and likely illegally, denies he wrote it. Except for a few holdouts and those who won’t give an opinion, like Andy Revkin, prominent voices of the online community think otherwise. Megan McArdle of The Atlantic, for one, doesn’t even see it as a professionally written memo:
“…their Top Secret Here’s All the Bad Stuff We’re Gonna Do This Year memo…reads like it was written from the secret villain lair in a Batman comic. By an intern.”
In posting about JGAAP software crowdsourcing, I had hoped that the wide professional base of readers could make use of this software and would be able to come to conclusions using it, but there were complications that made the task more difficult than it would normally be. These complications included the fact that there were cut and pasted elements of other stolen Heartland documents in the “Climate Strategy Memo,” making it difficult for the software to delineate the separate writing styles without knowledgeable fine tuning.
These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis, coming to the conclusion that Joe Bast, president of the Heartland Institute, had authored the fake memo. The problem was that Mr. Otto did not perform the due diligence required in his selection of documents and the JGAAP software controls, and this led to an erroneous result.
In the end I realized that only professionals familiar with the science of stylometry/textometry would be able to make a credible determination as to the authorship. So, I asked for help.
On February 23, 2012 I sent the Evaluating Variations in Language Laboratory (the group responsible for the JGAAP software) a request for assistance. Mainly what I was looking for initially was tips on how to best operate their software, but given the high profile nature of this issue, and the unique situation, they referred me to Juola & Associates and its president, Patrick Brennan, who responded with an even better offer. They would use their larger collection of tools and techniques reserved for their forensics consulting work and apply it to the task, pro bono. Normally such professional analysis for courtroom quality work nets them fees comparable to what a metropolitan lawyer might charge, so not only was I extremely grateful, but realized it was an offer I couldn’t refuse.
In my email to Brennan on Fri, Feb 24, 2012 at 5:07 PM I wrote:
For the record, I do not know what the outcome might be, but it is always best to consult experts externally who have no financial interest in the outcome of the case.
Here’s the background on the group:
Juola & Associates (www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship. Our scientists are leading, world-recognized experts in the fields of stylometry, authorship attribution, authorship verification, and author analysis. Every written document is a snapshot of the person who wrote it; through our analysis, we can determine everything from sociological information to biographical information, even the identity of the author. We provide sound, tested, and legally-recognized analysis as well as expert testimony by Dr. Patrick Juola, arguably one of the world’s leaders in the field of Forensic Stylometry.
We have worked with groups as wide-ranging as multinational companies, Federal courts, research groups, and individuals seeking political asylum. We have literally written the book (ISBN 978-1-60198-118-9) on computational methods for authorship analysis and profiling.
The lead analysis was conducted by Patrick Juola, Ph.D., Director of Research, and director of the Evaluating Variations in Language Laboratory at Duquesne University in Pittsburgh. Juola & Associates, headed by President Patrick Brennan, is a separate commercial entity that provides analysis and consultation on stylometry.
Dr. Juola has published his analysis of the “Climate Strategy Memo,” which I present first and in entirety here at WUWT.
First, the short read:
Stylometric Report – Heartland Institute Memo
Patrick Juola, Ph.D.
Summary
As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.
And the larger excerpt of the document, bolds mine:
Analysis
24 This task is challenging for several reasons, some technical and some linguistic.
25 First, the Heartland memo as published contains a great many quotations taken from other sources. As originally published, the memo contains approximately 717 words, but at least 266 of those words have been identified as belonging to phrases (or paraphrases of phrases) found elsewhere in the stolen documents. [N.b. this identification was done by the Heartland Institute, who admit that these 266 words are “paraphrases [of] text appearing in one of the stolen documents.”]
As paraphrases, they may or may not reflect the style of the original authors, and they also may or may not reflect the style of the alleged forger. For this reason, we analyzed both the full document as well as the 451-word redacted document with the controversial passages removed.
26 Second, even the full-length document is rather short for an accurate analysis. Most authorship attribution experts recommend larger samples if possible. (E.g., Eder recommends 3500 words per sample, noting that results obtained from fewer than 3000 words “are simply disastrous.”)
27 Thirdly, perhaps as a result of the previous factors, we have observed that Bast and Gleick appear to have extremely similar writing styles.
Results
28 Despite this difficulty, we were able to identify and calibrate an appropriate analysis method. Using this method, we analyzed both the complete Heartland memo and the selections from the Heartland memo that had been identified as not copied from other stolen documents. In both analyses, the JGAAP system identified the author as Peter Gleick.
29 In particular, the JGAAP system identified the author of the complete (unredacted) memo as Peter Gleick, despite the large amount of text that even Bast admits is largely taken from genuine writings of the Heartland Institute. We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.
Conclusions
30 In response to the question of who wrote the disputed Heartland strategy memo, it is difficult to deliver an answer with complete certainty. The writing styles are similar and the sample is extremely small, both of which act to reduce the accuracy of our analysis. Our procedure by assumption excluded every possible author but Bast and Gleick. Nevertheless, the analytic method that correctly and reliably identified twelve of twelve authors in calibration testing also selected Gleick as the author of the disputed document. Having examined these documents and their results, I therefore consider it more likely than not that Gleick is in fact the author/compiler of the document entitled ”Confidential Memo: 2012 Heartland Climate Strategy,” and further that the document does not represent a genuine strategy memo from the Heartland Institute.
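As a quick sanity check on the word counts quoted in the report, the arithmetic can be sketched as below. The figures 717 and 266 come from the report excerpt above; the variable names are mine and purely illustrative.

```python
# Word counts as quoted in the report excerpt (variable names are illustrative).
total_words = 717        # approximate length of the memo as published
paraphrased_words = 266  # words Heartland identified as paraphrased from other documents

# Words left once the paraphrased passages are removed.
remaining_words = total_words - paraphrased_words

# Share of the full memo that is undisputed original writing.
undisputed_share = remaining_words / total_words

print(remaining_words, round(undisputed_share, 3))
```

The remainder matches the 451-word redacted document, and the share comes out a little under two thirds, consistent with the report's "nearly 2/3".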
It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as ”Confidential Memo: 2012 Heartland Climate Strategy.” The preponderance of the evidence points squarely to Gleick. According to Wikipedia’s entry on the “legal burden of proof”:
Preponderance of the evidence, also known as balance of probabilities is the standard required in most civil cases. This is also the standard of proof used in Grand Jury indictment proceedings (which, unlike civil proceedings, are procedurally unrebuttable).
Further, it is abundantly clear that this document was not authored by Heartland’s Joe Bast, nor was it included as part of the board package of documents Dr. Gleick (by his own admission) phished under false pretenses from Heartland.
The complete analysis by Dr. Juola is available here: MemoReport (PDF 101k)
Zen,
That’s been posted and refuted in the Gleick thread. If you look, you can find others that come to a different conclusion. And the article here refers to that same opinion piece. Keep in mind that your amateur Huffpo guy admittedly was just playing around with the software, while Juola does this as a profession and is accepted in courtrooms as an expert witness. The eco-zealot at Huffpo has a big motive to reach his conclusion. And like all eco-zealots everywhere, he doesn’t have any ethics. Which one are you going to listen to? The ethics-free eco guy, or Dr Juola, who doesn’t have a dog in this fight?
And Bill Parsons says: “Weak”. To Bill maybe, because Bill is leaving out an essential element: motive. What motive would Heartland have to produce a document with erroneous numbers and misinformation? Why would Heartland promptly state that all the stolen documents were theirs, except the memo [which said essentially the same thing as the other documents, but with a spin that made them look bad]?
On the other hand we have Gleick, who had a big motive to forge that memo: he was livid at Forbes running an article signed by scientific skeptics, and then refusing to run his own article. He confessed to lashing out at Heartland.
Gleick was completely unethical. He was dishonest. Where has Heartland been anything but a totally ethical, upstanding, honest think tank, working on a shoestring budget and yet offering to pay for Gleick to come and be a part of the discussion?
Motive is the preponderance of the evidence as I see it. Gleick had the motive, and he admitted to identity fraud to steal the documents. Does anyone in their right mind really believe this crime was committed by anyone at Heartland?
One further thought on the printer thing:
Heartland should move to secure the original document before Gleick trashes it.
I suspect somebody is busy forging a date stamped envelope.
I suspect Gleick will do a Weiner on us and sneak away with a million dollar penshun fund paid by the taxpayer.
the faux science is unsettling.
Copner says: “(b) can’t understand Heartland’s spreadsheets (there are two maths errors – Koch & the double counting of $88,000) – which makes a Heartland insider likely”
Don’t you mean: “– which makes a Heartland insider unlikely” or do you think that Heartland’s own staff do not understand their own spreadsheets?
Like a forensic analysis of Gleick’s computer(s)? Though it is way too late now. But you might be able to implicate nefarious activity from all the google searches on “how to permanently delete” in his browser cache.
Can anyone post a link to this essay in comments over on Shawn Otto’s HuffPo story?
http://www.huffingtonpost.com/shawn-lawrence-otto/joe-bast-fake-document_b_1297042.html
I have been denied access.
It’s always nice to see unbiased professionals at work.
Headline:
Forensic analysis of the fake Heartland ‘Climate Strategy Memo’ concludes Peter Gleick is the likely forger.
Reality:
Everybody knows and knew from the start that Peter Gleick is the likely forger.
Pretence:
Greens want to hide the decline in their support by pretending they don’t know Peter Gleick is the likely forger.
Richard
I also suspect it is by Gleick, but he claims to have gotten it from someone else. So the number of candidates is 7 billion minus 1 (Bast) minus 1 (each of us who know it’s not me).
theduke says:
March 14, 2012 at 8:34 am
I think it’s still possible that another person wrote it and that person is known to Gleick.
———————————
Who would you suggest is “the other person”? Due to the number of quotes and paraphrases from the other documents, whoever wrote it would have had to have access to said documents. The only people we know had access to those documents are Heartland staff and Gleick. And only one person in that pool of suspects had the motive to do it, and that person isn’t on Heartland’s staff.
On a related note, the new tack is tu quoque: http://www.desmogblog.com/heartland-double-standard-institute-tried-scam-greenpeace-internal-documents
The headline says “internal documents”. The story says UN documents.
DirkH says:
March 14, 2012 at 8:43 am
Tom in Florida says:
March 14, 2012 at 7:37 am
“My, my how what goes around comes around. So the analysis eliminates Bast and the only one left is Gleick. Much like the reasoning of the AGW crowd that says once we’ve eliminated natural causes for warming the only thing left is man made CO2.”
Tom, imagine I send a document to some bloggers that I scanned, and that looks like a bad parody of a DoD document, that says “we plan to bomb country XXX on that and that day”.
I later “confess” that yes, I scanned it, and I got the original in the mail, and I thought it could be genuine so I sent it to the bloggers without telling them how I got it, letting them also believe it’s genuine – let’s assume for the moment that those bloggers are so gullible they don’t question the document. Wouldn’t that make me just a tiny bit suspect?
Dirk, I was jabbing at how it must feel to be on the receiving end of “it must be you (Gleick) because we have eliminated others we considered” just as Gleick and his pals deem the science is settled on CO2 for the same reason.
It is obvious to me that Gleick is the forger. If he had received it through the mail, as claimed, would an honest person then think “hmm, this looks damning, but how can I be sure it’s genuine? Oh wait – why don’t I create a fake email address, impersonate a Heartland director and trick the secretary into sending me the document?”
An honest person would – if he had an axe to grind – just forward it to some sympathetic journalists and let them make of it what they will.
Ladies and gentlemen of the jury. I put it to you that the defendant planned to lift genuine documents by deception in the belief that they contained the smoking gun evidence to incriminate Heartland over funding. Having phished the documents and discovered the smoking gun did not exist, he decided to invent one. Lifting enough material from the genuine documents to lend credibility, he then proceeded to build a web of lies. He forwarded this to the sympathetic journalists, and sat back waiting gleefully for the fallout to destroy Heartland.
But caught up in his own hubris, he didn’t even realise that his amateurish forgery stood out like a beacon, a work so absurd that it prompted one writer to comment that it looked like it had been written in the den of a villain in a Batman comic – by an intern.
Gleick got the fake memo in the mail, I have video evidence, starting about 35 seconds into this clip. 🙂
http://www.youtube.com/embed/-A0yqKbIpAU?version=3&rel=1&fs=1&showsearch=0&showinfo=1&iv_load_policy=1
Anthony Watts said on March 14, 2012 at 11:24 am:
Google Cache: link
Google Cache Text-only (has links): here
I doubt they can block your access to them, but they could request Google delete the cached versions. If they work on your end and you worry they could be disappeared, just delete this text and leave a generic Thanks, message received, etc.
Tried Anthony, but my post vanished instantly
This is the same way Peter Gleick and his elk forged the Co2/ AGW scam, these people have no morals or honesty as regular readers at WUWT are well aware. The good news is the whole world is waking up to the biggest Ponzi scheme in history, some by education and others by the price of energy/everything.
Re: Previous post:
Whoops, sorry, misunderstood. You’re not blocked from accessing the article, just leaving comments there period. My apologies.
ChE, I took a quick look at that link. I guess DeSmog is digging their hole deeper. Now to disinfect the computer
Where does the huffingtonpost get a poster’s name from? All I get is a box to put a comment in, and no place for a poster’s name or email address.
I have tried a preview and it still doesn’t show my name, wherever they might get that from.
jaymam says: March 14, 2012 at 2:11 pm
Where does the huffingtonpost get a poster’s name from?
When you try and post on the huff&Puff, it then asks you to register using one of various means.
It’s really dishonest! But that’s about standard for the huff&puff.
From AndiC on March 14, 2012 at 1:32 pm:
Might be like at WUWT, posts that auto-drop into the spam bucket don’t show up as any sort of “pending”. The question is if they ever bother to check it. Of course it’s been ten days since the last comment, maybe comments are technically closed by auto-rejecting all submissions.
===
jaymam said on March 14, 2012 at 2:11 pm:
If you’re not already logged in when you hit Post, it’ll give you the opportunity to give your log in info or create an account. Soon as you get done with that, then the page will reload with the comment # in the URL, and they can start properly censoring your submission.
“Ted G says:
March 14, 2012 at 1:40 pm
This is the same way Peter Gleick and his elk forged the Co2/ AGW scam,”
That would make an interesting cartoon by Josh. Science from Mooseport perhaps?
The points of “Analysis” 24 – 27 above, are all caveats that cite the difficulties of an accurate analysis.
Following those weaknesses with Dr. Juola’s conclusion that identifies Gleick as the author does not inspire confidence. Since (in my opinion) the actual analyses could and should be cited here in the thread, I considered the analysis itself “weak”. I was a bit more encouraged after reading some of the methods used, which are outlined in steps 18 – 23 of the PDF.
One of Dr. Juola’s own essays, “Does Size Matter? Authorship Attribution, Small Samples, Big Problem,” 2010, hints at the problem. From the original 717-word memo, Juola had exactly 451 words (after removing paraphrases, or paraphrases of paraphrases of the authentic Heartland documents) of… (what?) apocrypha (?) to establish its author. Obviously, if the choice of authors is Gleick or Bast, I make the assumption that anyone else makes: Bast said he didn’t write it, and Bast has never lied; Gleick, on the other hand, claims not to have written it, and has just perpetrated a complicated fraud that he has confessed to – as part of which he lied copiously and with apparent gusto. Gleick clearly considered fraudulent behavior as just another tool to spread the word about skeptics, and having found nothing of much interest in the legitimate paperwork from Heartland – decided to create his own “word”, a memo that vilifies its Moriarty-like author by a tone of bland venality. The memo, he hopes, will damn Heartland by its intention to prevent “true science” from being taught. Yes, I think Gleick wrote it.
Still, to find that Gleick is the writer based on the stylistic evidence alone seems a leap. It did help me to go through Dr. Juola’s findings in the PDF. I’d be very interested to know more details about the kind of analysis that he carried out.
As I understand it, Juola has a computer capable of recognizing the characteristics of an author once it has established a database of that author’s writing foibles and characteristics. He “calibrated” this computer by “feeding it” 12 known works by Gleick, which he had downloaded from the internet. He said at one point he had to “calibrate” the computer to identify stylistic traits and qualities unique to Gleick. The omission of details here is troubling. The computer programming part is likely complicated, but omitting analyses for proprietary reasons (the secret sauce that makes Mr. Juola “the world’s leading…”), seems unreasonable in light of the serious conclusions being sought. My guess is that the computer looks for repetitions of anything: usage, grammar, syntax, diction, spacing, case (capitals). Repetitions of correct usages would be revealing, but (again, I would imagine) errors would be even more so.
Once the database was created, Juola asked the computer if it could recognize any individual paper omitted from the “known” body of Gleick’s work – the 12 downloaded papers, a test he calls the “leave-one-out validation”. He did the same thing with several of Bast’s works. Once he had established a winning record at identifying every omitted paper, he introduced the memo. By the end, he claims, the computer could sniff out the author, and that was Gleick.
I think Dr. Juola really must assume that anybody could have penned the memo, notwithstanding the claims of “the most logical candidate(s)”. Until more substantial proof incriminates Gleick, it just seems that 717 words is a bit sketchy to attribute authorship.
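For readers wondering what the “leave-one-out validation” mentioned above might look like in practice, here is a minimal sketch in Python. It is not Dr. Juola’s method and not JGAAP’s code; the report excerpt does not specify the features or the classifier, so this assumes character trigram frequencies compared by cosine distance, with the nearest known sample deciding the author. All function names here are made up for illustration.

```python
from collections import Counter
import math


def profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed feature set)."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_distance(p, q):
    """1 minus the cosine similarity of two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def attribute(unknown_text, samples):
    """Return the author of the known sample closest to unknown_text.

    samples is a list of (author, text) pairs.
    """
    target = profile(unknown_text)
    nearest = min(samples, key=lambda s: cosine_distance(target, profile(s[1])))
    return nearest[0]


def leave_one_out_accuracy(samples):
    """Hold out each known sample in turn, attribute it using the rest,
    and report the fraction attributed to its true author."""
    correct = 0
    for i, (author, text) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if attribute(text, rest) == author:
            correct += 1
    return correct / len(samples)
```

The idea is that a method only earns trust on the disputed memo if it first gets the known documents right when each is hidden from it in turn. With samples as short as a 717-word memo, the frequency profiles are noisy, which is the concern raised in the comment above.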