Forensic analysis of the fake Heartland 'Climate Strategy Memo' concludes Peter Gleick is the likely forger

Readers may recall that on February 22nd, I offered up some open source stylometry/textometry software called JGAAP (Java Graphical Authorship Attribution Program), with a suggestion that readers make use of it to determine the authorship of the faked Heartland strategy memo disseminated to the media by Peter Gleick.

A link to that article is here:

An online and open exercise in stylometry/textometry: Crowdsourcing the Gleick “Climate Strategy Memo” authorship

The reason I did that was that many had speculated that Dr. Peter Gleick was the author. Gleick, who admitted to obtaining the Heartland board meeting documents under false pretenses, and likely illegally, denies he wrote it. Except for a few holdouts and those who won’t give an opinion, like Andy Revkin, prominent voices in the online community think otherwise. Megan McArdle of The Atlantic, for one, doesn’t even see it as a professionally written memo:

“…their Top Secret Here’s All the Bad Stuff We’re Gonna Do This Year memo…reads like it was written from the secret villain lair in a Batman comic. By an intern.”

In posting about JGAAP software crowdsourcing, I had hoped that the wide professional base of readers could make use of this software and would be able to come to conclusions using it, but there were complications that made the task more difficult than it would normally be. These complications included the fact that there were cut and pasted elements of other stolen Heartland documents in the “Climate Strategy Memo,” making it difficult for the software to delineate the separate writing styles without knowledgeable fine tuning.

These complications became especially evident when writer Shawn Otto at the Huffington Post used the JGAAP software to do his own analysis, coming to the conclusion that Joe Bast, president of the Heartland Institute, had authored the fake memo.  The problem was that Mr. Otto did not perform the due diligence required in his selection of documents and the JGAAP software controls, and this led to an erroneous result.

In the end I realized that only professionals familiar with the science of stylometry/textometry would be able to make a credible determination as to the authorship. So, I asked for help.

On February 23, 2012 I sent the Evaluating Variations in Language Laboratory (the group responsible for the JGAAP software) a request for assistance. Initially I was mainly looking for tips on how best to operate their software, but given the high-profile nature of this issue and the unique situation, they referred me to Juola & Associates and its president, Patrick Brennan, who responded with an even better offer. They would use their larger collection of tools and techniques reserved for their forensics consulting work and apply it to the task, pro bono. Normally such professional analysis for courtroom quality work nets them fees comparable to what a metropolitan lawyer might charge, so not only was I extremely grateful, but I realized it was an offer I couldn’t refuse.

In my email to Brennan on Fri, Feb 24, 2012 at 5:07 PM I wrote:

For the record, I do not know what the outcome might be, but it is always best to consult experts externally who have no financial interest in the outcome of the case.

Here’s the background on the group:

Juola & Associates (www.juolaassoc.com) is the premier provider of expert analysis and testimony in the field of text and authorship. Our scientists are leading, world-recognized experts in the fields of stylometry, authorship attribution, authorship verification, and author analysis.  Every written document is a snapshot of the person who wrote it; through our analysis, we can determine everything from sociological information to biographical information, even the identity of the author.  We provide sound, tested, and legally-recognized analysis as well as expert testimony by Dr. Patrick Juola, arguably one of the world’s leaders in the field of Forensic Stylometry.

We have worked with groups as wide-ranging as multinational companies, Federal courts, research groups, and individuals seeking political asylum.  We have literally written the book (ISBN 978-1-60198-118-9) on computational methods for authorship analysis and profiling.

The lead analysis was conducted by Patrick Juola, Ph.D., Director of Research, and director of the Evaluating Variations in Language Laboratory at Duquesne University in Pittsburgh. Juola & Associates, headed by President Patrick Brennan, is a separate commercial entity that provides analysis and consultation on stylometry.

Dr. Juola has published his analysis of the “Climate Strategy Memo,” which I present first and in entirety here at WUWT.

First, the short read:

Stylometric Report – Heartland Institute Memo

Patrick Juola, Ph.D.

Summary

As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.

And the larger excerpt of the document, bolds mine:

Analysis

24 This task is challenging for several reasons, some technical and some linguistic.

25 First, the Heartland memo as published contains a great many quotations taken from other sources. As originally published, the memo contains approximately 717 words, but at least 266 of those words have been identified as belonging to phrases (or paraphrases of phrases) found elsewhere in the stolen documents. [N.b. this identification was done by the Heartland Institute, who admit that these 266 words are “paraphrases [of] text appearing in one of the stolen documents.”]

As paraphrases, they may or may not reflect the style of the original authors, and they also may or may not reflect the style of the alleged forger. For this reason, we analyzed both the full document as well as the 451-word redacted document with the controversial passages removed.

26 Second, even the full-length document is rather short for an accurate analysis. Most authorship attribution experts recommend larger samples if possible. (E.g., Eder recommends 3500 words per sample, noting that results obtained from fewer than 3000 words “are simply disastrous.”)

27 Thirdly, perhaps as a result of the previous factors, we have observed that Bast and Gleick appear to have extremely similar writing styles.

Results

28 Despite this difficulty, we were able to identify and calibrate an appropriate analysis method. Using this method, we analyzed both the complete Heartland memo and the selections from the Heartland memo that had been identified as not copied from other stolen documents. In both analyses, the JGAAP system identified the author as Peter Gleick.

29 In particular, the JGAAP system identified the author of the complete (unredacted) memo as Peter Gleick, despite the large amount of text that even Bast admits is largely taken from genuine writings of the Heartland Institute. We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.

Conclusions

30 In response to the question of who wrote the disputed Heartland strategy memo, it is difficult to deliver an answer with complete certainty. The writing styles are similar and the sample is extremely small, both of which act to reduce the accuracy of our analysis. Our procedure by assumption excluded every possible author but Bast and Gleick. Nevertheless, the analytic method that correctly and reliably identified twelve of twelve authors in calibration testing also selected Gleick as the author of the disputed document. Having examined these documents and their results, I therefore consider it more likely than not that Gleick is in fact the author/compiler of the document entitled “Confidential Memo: 2012 Heartland Climate Strategy,” and further that the document does not represent a genuine strategy memo from the Heartland Institute.

It seems very likely then, given the result of this analysis, plus the circumstances, proximity, motive, and opportunity, that Dr. Peter Gleick forged the document known as “Confidential Memo: 2012 Heartland Climate Strategy.” The preponderance of the evidence points squarely to Gleick. According to Wikipedia’s entry on the “legal burden of proof”:

Preponderance of the evidence, also known as balance of probabilities is the standard required in most civil cases. This is also the standard of proof used in Grand Jury indictment proceedings (which, unlike civil proceedings, are procedurally unrebuttable).

Further, it is abundantly clear that this document was not authored by Heartland’s Joe Bast, nor was it included as part of the board package of documents Dr. Gleick (by his own admission) phished under false pretenses from Heartland.

The complete analysis by Dr. Juola is available here: MemoReport (PDF 101k)

166 Comments
Pamela Gray
March 14, 2012 9:46 pm

It amazes me that people who complain loudly about the speck in the eye of their opposition remain unaware of the old growth forest log stuck in…[self-snip]. Or they know it is there and choose to flaunt it anyway due to their false sense of superiority.
To be direct, the sad part is this. The damage he has caused to himself (and I pity him because of that) and the profession he says he loves will ripple through climate science for many decades. Worse, he is completely blind to that fact and instead, completely believes he has done his profession a huge favor by attempting to rid the world of what he believes is a very badly flawed organization.
Unfortunately, the world has plenty of people serving in positions of leadership who have that same log stuck up…

Jeff
March 14, 2012 10:23 pm

Like a resume; I’m concerned when I find typos on a professional website, http://www.juolaassoc.com/about-us/ “… and act is socially unacceptable ways on the Internet.”
REPLY: That may have been a web designer typo- don’t assume the people that do analysis made that error – Anthony

Hilary Ostrov (aka hro001)
March 14, 2012 11:16 pm

Endicott March 14, 2012 at 11:44 am

Whom would you suggest is “the other person”. Due to the number of quotes and paraphrases from the other documents, whomever wrote it would have had to have access to said documents. The only people to whom we know had access to those documents are Heartland staff and Gleick. And only one person in that pool of suspects had the motive to do it, and that person isn’t on Heartland’s staff.

My guess is that Gleick may have had a helping hand from Mashey. Such mashups as this “memo” turned out to be are very much up Mashey’s alley. It is conceivable that whoever wrote the “memo” had initially written a much longer piece, and that Gleick “edited” whatever he had received, adding some flourishes of his own.
Gleick’s “confession” does not specifically say that he received the memo via snail-mail, just “in the mail”. Nor does it specify the date that he had allegedly received it, just “early 2012”. As we know from his “review” of The Delinquent Teenager … – and subsequent discussion thereof – honesty, accuracy and clarity are not Gleick’s forté.
Except for the list of Board Members, Gleick had fraudulently acquired all purloined Heartland documents during the period Feb. 2 – Feb. 6. So he had between Feb. 6 and Feb. 13 to either “verify” or create the “contents” of the memo before bundling it up with the docs for the E-mail he sent on Feb. 14, doing yet another great pretender act.
Even if Gleick received no assistance from anyone else – and the memo was, in fact, “composed” by some unknown other person – then the known errors strongly suggest that his attention to detail during the process of this alleged “verification” exercise is somewhat lacking. In which case one might well ask, what does this say about Gleick’s “science”, and his reputation as an “expert”?!
And let’s not forget that undated “An Open Letter to the Heartland Institute” from the Gang of 7 that surfaced on Feb. 17. Gleick’s name was highly conspicuous by its absence in this blatant attempt to spin equivalence to Climategate.
For all his “genius” attributes, Gleick does not strike me as being an original or creative thinker. Even his organization’s mission statement was lifted from the “official definition” of sustainable development. And for all we know, his determination to get hold of Heartland’s list of funders might even be a copy-cat act: Bob Ward has been after the same information from GWPF for quite some time, has he not?!
Bottom line, as far as I can see, is that whichever way one looks at this, there are multiple lines of evidence that Gleick is indisputably the author of his own misfortune.

Scottish Sceptic
March 15, 2012 1:14 am

Anthony, I notice a few heart felt replies. In the context that Gleick was personally attacking you in a nasty vindictive way, they are warranted. But for someone reading the blog without knowing that context ….

Alistair Pope
March 15, 2012 1:53 am

Surely it is time for the police, FBI, etc. to call in Gleick and ask him under oath if he is the author. If ‘Yes’ then criminal charges must follow then a civil case for damages.
If ‘No’ then the investigation must continue and if the evidence stacks up that he lied under oath then add one more charge.
We will never get the truth until the Climate Deceivers & Climate Cultists start going to prison. This is still cheaper than paying them further alchemy grants.

John Q. Galt
March 15, 2012 2:51 am

What about Jon Entine’s attempt at satire? Why isn’t anybody acknowledging it?
http://junkscience.com/2012/03/13/nrdc-threatens-legal-action-over-memo-spoof/

March 15, 2012 3:52 am

Otto:
My colleague Byronic has just posted the following comment in response to your latest 2 posts on your blog (in which you now claim the main purpose of your articles was to expose the weakness of stylometric analysis rather than accuse Jo Bast). Let’s see if you allow them through, rather than disappear them, and whether you rise to the challenge.
The quoted lines in Byronic’s post are quotes from your earlier post at shawnotto.com
Byronic:
Mar 15, 2012 at 04:47 AM
> A) JGAAP showed different answers;
I believe it possible, maybe even likely, that JGAAP could show different answers, and not reliably identify an author, but even a cursory examination of your post shows you are not using the software correctly.
> B) the study was commissioned by an interested party (big surprise he got the results he had hoped for)
You’re stretching “commissioned” to mean something that it doesn’t in common parlance.
> and C) as noted by other experts, the sample the study considered was small to the point of being garbage.
Agreed, I am doubtful that software examination of the text can be used to identify the author.
It’s funny how you say now that your point was to prove that stylometric analysis was flawed as an approach – because at the time you wrote 2 articles, you never mentioned that as your goal – you merely cautioned that your analysis might be unreliable.
Instead you wrote 2 articles accusing Jo Bast of being the author, based on your experiment. Your experiment was then republished at HuffPo & Greg Laden, again making the same accusation – Greg Laden retrospectively and deceptively edited his article to imply it was partly a joke – but is still sticking with the line that because you completed your experiment first, it must be reliable, and proves Bast is the author.
You never issued a correction, or even made a comment, correcting the misinterpretation of your experiments at Laden’s or Huffington Post.
You also told an untruth in your article, claiming the memo “simply recapitulates the information contained in much more incriminating detail in [other documents]”.
Even a cursory examination shows that is untrue.
First there is a significantly different use of language, for example “anti-climate”, stopping teachers from teaching science, and so on, which only appears in the disputed memo.
Secondly there are funding details in the disputed memo which do not match the other documents – such as Koch funding being for climate work, Koch having already paid $200K, and apparent double counting of $88K.
Thirdly there are miscellaneous items, many libelous if untrue, which appear only in the disputed memo, such as a secret strategy being distributed to only a subset of the board.
Fourthly, and in my opinion most significantly (in terms of determining the author), the entire section about Gleick, Forbes (as well as the bit about Revkin, Curry, etc.) only appears in the disputed memo.
Science is in large part about honesty. Maybe you should think about that.
I would be charitable enough to say your summation of the memo might be an honest mistake.
I would also be charitable enough to say that maybe your failure to communicate that the purpose of your experiment was to point out the weakness of stylometric analysis (rather than accuse Bast of being the author), was also an oversight.
But if you are honest, then you need to go back and correct your mistakes.
That means making a prominent correction in both your articles here, and your articles at Huffington Post, and doing the best to get a correction at Laden’s too – posting a comment if not able to do anything else.
So, I challenge you to be a credible science writer, and correct these points.

shawnotto
Reply to  copner
March 15, 2012 6:24 am

Well, copner had my respect until he went off the rails. Re: the Byronic post:
@Byronic:
“I believe it possible, maybe even likely, that JGAAP could show different answers, and not reliably identify an author, but even a cursory examination of your post shows you are not using the software correctly.”
To be credible scientifically, you have to explain this claim convincingly. Otherwise it’s just fluff.
“You’re stretching “commissioned” to mean something that it doesn’t in common parlance.”
If you want to have credibility, please don’t quibble over points where the facts contradict you. Just because you don’t understand the definition of a word does not make the word wrong.
From the free online dictionary:
com·mis·sion (kə-mĭsh′ən)
n.
1.
a. The act of granting certain powers or the authority to carry out a particular task or duty.
b. The authority so granted.
c. The matter or task so authorized: Investigation of fraud was their commission.
d. A document conferring such authorization.
http://www.thefreedictionary.com/commission
“It’s funny how you say now that your point was to prove that stylometric analysis was flawed as an approach – because at the time you wrote 2 articles, you never mentioned that as your goal – you merely cautioned that your analysis might be unreliable.”
I simply performed an honest analysis and made my methodology available to all readers – something Juola has not, incidentally. Then when Joe Bast complained that the memo included pasted language that should be removed, I removed it and reperformed the analysis per his suggestion, and wrote it up. See my conversation in the comment thread about this as well, here shawnotto.rebeccaotto.com/neorenaissance/blog20120229.html I was not attempting to “prove” anything. “Proof” is not a scientific term; it’s something you do in math.
“Instead you wrote 2 articles accusing Jo Bast of being the author, based on your experiment.”
Wrong. I did not accuse anyone. You seem to have a somewhat loose grasp of the English language and of scientific methodology.
“You never issued a correction, or even made a comment, correcting the misinterpretation of your experiments at Laden’s or Huffington Post.”
It is what it is. There is no correction to be issued. Others can confirm the outcome of the experiment by reproducing it reliably, and, using the methodology I indicated the software reliably identifies Joe Bast as the most likely author. And I have been far more transparent than Juola. His analysis is no more credible than mine, and arguably less credible since it is less transparent.
“You also told an untruth in your article, claiming the memo “simply recapitulates the information contained in much more incriminating detail in [other documents]”.”
That’s the fact, and has been observed many times by other writers and scientists around the world. The only people so up in arms over the memo are you Heartland lovers, and you are acting like that’s the basis of all the outrage in the press, which is patently false. It appears to me to simply be an attempt to deflect attention from the substantive issue of the nefarious plan to teach school children propaganda. Nobody else cares about the language choices of the memo – that’s just rhetoric. What the mainstream press cares about is the substance of the plan and who’s funding the Heartland.

AGW_Skeptic
March 15, 2012 4:33 am

Quoting hro001:
“Except for the list of Board Members, Gleick had fraudulently acquired all purloined Heartland documents during the period Feb. 2 – Feb. 6. So he had between Feb. 6 and Feb. 13 to either “verify” or create the “contents” of the memo before bundling it up with the docs for the E-mail he sent on Feb. 14, doing yet another great pretender act.”
This is not correct. While it is true that board members are listed on The Heartland Institutes website, the document Gleick requested and received contained the board members private addresses, email and phone numbers (not on the website).
The only reason to specifically and individually request this (the last doc requested after receiving all of the other ones) was to harass and harm the board because he included this information when he released the documents to the bloggers and media. This is probably the most serious breach as it had nothing to do with the strategy claims in the forged document.
It had everything to do with providing personal access to board members in an attempt to harass/harm them and/or give others the information and opportunity to do so. Gleick cannot spin his way out of this one.

March 15, 2012 5:44 am

shawnotto says:
March 14, 2012 at 8:14 pm
…blah, blah, blah…
REPLY(From Anthony Watts): Oh, please. You were fine with it all when you thought you had snagged Joe Bast, and to use your words “your readers lapped it up”. But now that somebody who actually understands the science of stylometry has correctly configured JGAAP and run the calibration and produced results that don’t support you it’s now “oh wait, the method is flawed”. Right, sure, anything to defend Gleick and keep yourself from looking foolish

Uh, the “keep (himself) from looking foolish” part isn’t working.
Perhaps someone should run his above post through JGAAP to determine if he really wrote it.
🙂
(Bold mine, for emphasis and clarity.)

March 15, 2012 6:43 am

shawnotto,
Next time a judge approves an expert witness, he’ll probably appoint you… not.
But thanx for your non-expert opinion, and Vanna has some lovely parting gifts for you on your way out.
“What the mainstream press cares about is the substance of the plan and who’s funding the Heartland.”
But we know who is funding Gleick, it’s the same George Soros who funds all the anti-science chumps. Anyone who still believes that Gleick didn’t forge that memo is as deluded as a Scientologist.

shawnotto
March 15, 2012 6:50 am

@Smokey to me it’s not a matter of belief. I don’t approach questions of fact from a faith-based perspective. The jury is still out and this analysis has contributed very little, IMHO.

March 15, 2012 6:51 am

Byronic:
Mar 15, 2012 at 07:49 AM
You don’t have to love Heartland to see that the disputed memo does not “simply recapitulate” points in the other documents. There are a number of points in the disputed memo, which are unique to that memo and appear nowhere else.
It is not merely a question of language, but of factual claims that appear in the disputed memo. It is the only place that says Koch funds Heartland’s climate activities. It is the only place where it says Heartland wants to stop teachers teaching science. It is the only place where it says that Heartland wants to exclude other voices. All these were important in the press coverage. But regardless of what the press focused on – your coverage is dishonest when you claim the memo “simply recapitulates” the other documents. It does not in either language or content (“substance of the plan”).
As to your experiment, you are missing any kind of calibration, and not using the software correctly. A proper experiment would train the software on known samples, then see if it can correctly identify authorship on a second set of known samples (to test whether you are using the right metrics, and calibrate a known error rate for predictions based on those metrics), and then try to identify authorship on the unknown memo.
You appear to have just thrown a small set of documents at the software, producing a jumbled list of those documents, with Gleick & Bast samples intermixed in the list, and said some selected numbers look similar, so… you avoided doing any kind of calibration whatsoever.
Juola may not have published all his data, but at least he used the software correctly.

IanR
March 15, 2012 7:16 am

Yes, Anthony, it’s an accurate paraphrase:
What I wrote, “The memo was paraphrased by Peter Gleick, because the memo was paraphrased, except for what wasn’t paraphrased. Therefore, Peter Gleick paraphrased it. Maybe, but more than certainly.”
What the analysis says, “We justify this result by observing, first, that much of the quotation is actual paraphrase, and the amount of undisputed writing is still nearly 2/3 of the full memo.”
Could you provide an alternate paraphrasing that is different than mine?
REPLY: Why bother? Snark is snark. Your comment wasn’t intended to be anything else. – Anthony

shawnotto
March 15, 2012 7:17 am

Byronic
“A proper experiment would train the software on known samples,”
I wonder if you understand the software. It is not AI, subject to training. Maybe this is another case of your using a word differently than the dictionary?
“…then see if it can correctly identify authorship on a second set of known samples (to test whether you are using the right metrics, and calibrate a known error rate for predictions based on those metrics), and then try to identify authorship on the unknown memo.”
Now you are getting closer to a substantive scientific criticism. But this is what I did in the first pass. The software correctly identified the memo itself with a distance score of 0.00. You may have a methodological disagreement with this choice but you haven’t clarified what it is or why the choice is flawed.
“you avoided doing any kind of calibration whatsoever.”
From the above I don’t think you can support that statement. The software identifying the memo itself with a perfect match of 0.00 is a pretty clear calibration verification of both the software and the methodology.

March 15, 2012 7:28 am

My question for Shawn Otto would be: If he really thinks the strategy doc contains the same stuff as the other documents, why is the strategy doc the only one which Heartland disavows?
Even before you look at the content of strategy doc, and even before you consider the question of whether it is genuine or faked, it wouldn’t make sense to disavow just those 2 pages, unless there really was something different about them.

Peter Kovachev
March 15, 2012 7:34 am

shawnotto says:
March 14, 2012 at 8:14 pm
———————————-
O, you poor baby! If anyone’s blathering here, it’s you, the censorious, mendacious and self-promoting apparatchik of no remarkable talents other than grabbing a good seat on the bandwagon. I accuse you of being a liar because your “pro-science” website, http://www.shawnotto.com, is based on the latest lying propaganda meme, namely that CAGW skepticism is “anti-science,” and a liar and a coward because you resorted to slandering Joe Bast and Anthony Watts with nothing but cowardly hints and innuendos and when called out, you attempted to repair the damage with more of the same. Shame on you.
Carrick (March 14, 2012 at 7:33 pm), that makes sense, especially when it concerns the same case. Assuming that Gleick is the author of the forgery is reasonable, not only given the amateurish job, but his history of animosity. Very different from Shawn Otto’s hint that Steven Mosher is engaging in slander. Mosher is on safe ground, given Gleick’s admission to fraud. It’s Shawn Otto who may be in deep doo-doo for his attempt to smear and accuse Heartland’s Joe Bast, the clear victim of theft, fraud and forgery. One can hope that Heartland includes him in their civil suits.

shawnotto
March 15, 2012 8:05 am

@Anthony I have noticed that when you get backed into a corner you don’t argue facts; you get derisive and make ad hominem attacks. This hurts your credibility and your public image. As to your claims above, they seem to me to be an exercise in revisionist history. Please reread your own post: http://wattsupwiththat.com/2012/02/22/an-online-and-open-excercise-in-stylometrytextometry-crowdsourcing-the-gleick-climate-strategy-memo-authorship/
“So, let’s use science to show the world what they the common sense geniuses at DeSmog haven’t been able to do themselves. Of course I could do this analysis myself, and post my results, but the usual suspects would just say the usual things like “denier, anti-science, not qualified, not a linguist, not verified,” etc. Basically as PR hacks, they’ll say anything they could dream up and throw it at us to see if it sticks. But if we have multiple people take on the task, well then, their arguments won’t have much weight (not that they do now). Besides, it will be fun and we’ll all learn something.”
You then go on to provide a methodology, and invited people “on both sides” to follow it. You didn’t say don’t do this if you’re not an expert in stylometry; quite the contrary — you encouraged “multiple people take on the task.” Unlike other readers, I identified the many possible sources of error inherent in the approach. I simply took you up on your suggestion, and by arguing against me for doing that you are casting doubt on the responsibility of making the suggestion in the first place.
REPLY: Asking for help rather than being self assured enough to publish on something you knew nothing about like you did is being backed into a corner? Curious.
And, the last sentence..”we’ll all learn something.” did in fact occur. We learned that none of us, including you, is an expert on the program and the science of stylometry. So, rather than publish a flawed rushed result like you did, I asked the true experts for help. Now, you don’t like their result so you resort to tearing it and me down. Shameless ego there Mr. Otto. See Roman M’s comment.
Oh and speaking of “derisive” ad homs, why in your essay did you say: “Anthony Watts, one of the climate deniers…” Seems pretty clear to me that you don’t follow the advice you gave to me. Look in my essay above. Do you see any derogatory labels for you? – Anthony

RomanM
March 15, 2012 8:08 am

shawnotto:

It is what it is. There is no correction to be issued. Others can confirm the outcome of the experiment by reproducing it reliably, and, using the methodology I indicated the software reliably identifies Joe Bast as the most likely author.

Hogwash! I am not exactly a beginner at statistical analyses and I could not even come close to determining exactly what you had done. Given the metric you had chosen, the values in your output were several orders of magnitude away from what might be expected and the ordering was not the same as what I was getting.
When I questioned your analysis, you made no effort to elucidate on your efforts in the process, quite probably because you have little or no understanding of the program and the methodology involved.
Do you understand the numbers calculated by the program? Do you understand the dependence on the specific documents chosen for the analysis? I sincerely doubt it. However, this does not deter you from making definitive statements such as “the software reliably identifies Joe Bast as the most likely author”.
I guess that defense of The Cause can justify anything, stealing documents, forging new ones with inflammatory statements, or as in your case, just making things up for your own propaganda purposes.

shawnotto
March 15, 2012 8:11 am

copner 7:28AM I don’t disagree with you; there obviously is something different about them.

RomanM
March 15, 2012 8:25 am

shawnotto:

“you avoided doing any kind of calibration whatsoever.”
From the above I don’t think you can support that statement. The software identifying the memo itself with a perfect match of 0.00 is a pretty clear calibration verification of both the software and the methodology.

!!! This is tantamount to writing a program to calculate the square root of a number and then “calibrating” the program by running it twice with the same number. Then, when the same result occurs both times, the process is declared a “pretty clear calibration verification of both the software and the methodology”.
What this would actually show is that the program reproduces the same value when given the same input. It in no way indicates that the software does what it is intended to do nor does it tell you that the value generated is correct.
“True” calibration would be showing that the particular methodology chosen is at least capable of distinguishing between the writing of Mr. Bast and Mr. Gleick, using samples of their writing and target samples of known authorship, while at the same time determining a measure of that ability.
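
The distinction drawn above can be sketched in a few lines of Python. This is a toy illustration only: it is not JGAAP, and it is not the method used by anyone in this thread. The feature (character trigram frequencies), the cosine distance, and every function name here are assumptions chosen for brevity. It shows that a document always sits at distance zero from itself, which says nothing about accuracy, whereas a calibration run scores the method on held-out texts whose authors are already known.

```python
from collections import Counter
import math


def profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed, toy feature set)."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def cosine_distance(p, q):
    """1 - cosine similarity between two n-gram profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


def attribute(unknown, known):
    """Return the candidate author whose known sample is nearest to `unknown`.

    `known` maps an author name to a sample of that author's writing.
    """
    up = profile(unknown)
    return min(known, key=lambda a: cosine_distance(up, profile(known[a])))


def calibration_accuracy(training, held_out):
    """Fraction of held-out (true_author, text) pairs attributed correctly.

    The held-out texts must not appear in `training`; otherwise the check
    degenerates into matching a document against itself.
    """
    hits = sum(attribute(text, training) == author for author, text in held_out)
    return hits / len(held_out)
```

Calling `cosine_distance(profile(doc), profile(doc))` returns zero (up to rounding) for any document, whatever the method's merits. Only `calibration_accuracy`, run on writing samples of known authorship that were kept out of the training set, measures whether the chosen settings can tell the candidate authors apart at all.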

gnomish
March 15, 2012 8:30 am

another klimate kamikaze crashes and burns.
i do love to hear shawnotto howling – can practically visualize him standing there naked and burning.
moar, please! fry them – they need to be crispy to prevent contagion.

Peter Kovachev
March 15, 2012 8:35 am

shawnotto says:
March 15, 2012 at 8:05 am
“…You then go on to provide a methodology, and invited people “on both sides” to follow it. You didn’t say don’t do this if you’re not an expert in stylometry; quite the contrary — you encouraged “multiple people take on the task.” Unlike other readers, I identified the many possible sources of error inherent in the approach. I simply took you up on your suggestion, and by arguing against me for doing that you are casting doubt on the responsibility of making the suggestion in the first place.”
Thought so. What a sorry character. It’s all Anthony’s fault now that Otto ran a gentleman’s experiment with his cup of cacao and then, the careful and high-principled ueber-scientist that he is, ran with it as a juicy piece of slander and all the way to the usual suspects, like Huff Po. Otto’s lawyered up now, would be my guess, so now he speaks with two heads.

shawnotto
March 15, 2012 8:37 am

@RomanM you are coming close to making some substantive criticisms of my approach. I respect that. You disagree with the methodology. Fine. But it doesn’t change the fact that using that methodology the JGAAP software produces the same result over and over. A valid criticism will show why that should be disregarded on the merits, not by slamming me personally as Anthony has done. The experiment I conducted is reproducible, so has nothing to do with me. Neither am I making any conclusions about whether that result is true or whether it is the best methodology. In fact I caution that it may not be. I don’t think anyone can comment on whether it is true or false at this point, and from what I have read I am not convinced that any stock can be placed in Juola’s contradictory conclusions either, for reasons I indicate in comments above, predominantly because as he himself points out in HIS caveats, other experts believe a sample this size is far too small to distinguish from noise. That being the case, issuing this crowd sourcing suggestion in the first place was, in my view, probably not wise, and issuing this story now is little more than rhetoric. Looking at noise, Juola has nevertheless divined an opinion. On what basis, then? Hard to say, because he doesn’t supply his data or methodology, but does caution that the results are probably unreliable, and then goes on to cast an opinion anyway. I think it’s valid to criticize that as unscientific, and if this were a conversation about a report issued by a climate scientist on global warming, you’d probably be making the same argument.
REPLY: “…other experts believe a sample this size is far too small to distinguish from noise.” That would be true, as in your case study without any calibration. But like any filter, when properly tuned (via Juola’s calibration methodology) the filter can then extract a signal from the noise.
You did no tuning, no calibration, and thus your results are in fact indistinguishable from noise. The difference between your approach and Juola’s is obvious. Juola is enough of a professional (with the risks that entails) to know if he couldn’t produce a credible result. His caveat was essentially, “yes the sample size is small, but this method improves the filter enough for me to make a credible probability judgment”. I doubt that a leading expert in this field would risk his professional reputation on a pro bono freebie for some blogger if he didn’t think his method was sound.
“The experiment I conducted is reproducible” So what? Reproducibility of an erroneous result does not equate to divining the truth. Lots of published erroneous science is reproducible.
I predict that you will be unable to write a post about this new result on HuffPo and link to it, even though you stated “I would encourage others to attempt to replicate, critique, and perform other analyses,” because you can’t risk having your credibility as a science writer impugned.
– Anthony

Peter Kovachev
March 15, 2012 9:37 am

shawnotto says:
March 15, 2012 at 8:37 am
“The experiment I conducted is reproducible, so has nothing to do with me.”
And, “I am not convinced that any stock can be placed in Juola’s contradictory conclusions either, for reasons I indicate in comments above, predominantly because as he himself points out in HIS caveats, other experts believe a sample this size is far too small to distinguish from noise.”
————————————
Elevator summary: My cluelessness and ineptitude will yield the same results for anyone as clueless and inept as I. I have nothing to do with that. An expert can’t make proper conclusions because of this and that, but I can, because my clueless and inept experiment is reproducible.