An online and open exercise in stylometry/textometry: Crowdsourcing the Gleick "Climate Strategy Memo" authorship

Tonight, a prescient prediction made on WUWT shortly after Gleick posted his confession has come true: DeSmog blog has made yet another outrageous and unsupported claim in an effort to save its reputation and that of Dr. Peter Gleick, as you can read here: Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic

In a desperate attempt at self-vindication, the paid propagandists at DeSmog blog have become their own “verification bureau” for a document they have no way to properly verify. The source (Heartland) says it isn’t verified (and is a fake), but that isn’t good enough for the Smoggers, and it threatens them, so they spin it and hope the weak-minded regurgitators retweet it and blog it unquestioned. They didn’t even bother to get an independent opinion. It seems to be just climate news porn for the weak-minded Suzuki followers upon which their blog is founded. As one WUWT commenter (Copner) put it – “triple face palm”.

Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.


Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL  – Retweeted by Michael E. Mann

Tonight in comments, Russ R. drew attention to his prediction from two days ago:

I just read Desmog’s most recent argument claiming that the confidential strategy document is “authentic”. I can’t resist reposting this prediction from 2 days ago:

Russ R. says:

February 20, 2012 at 8:49 pm

Predictions:

1. Desmog and other alarmist outfits will rush to support Gleick, accepting his story uncritically, and offering up plausible defenses, contorting the evidence and timeline to explain how things could have transpired. They will also continue to act as if the strategy document were authentic. They will portray him simultaneously as a hero (David standing up to Goliath), and a victim (an innocent whistleblower being harassed by evil deniers and their lawyers).

2. It will become apparent that Gleick was in contact with Desmog prior to sending them the document cache. They knew he was the source, and they probably knew that he falsified the strategy document. They also likely received the documents ahead of the other 14 recipients, which is the only way they could have had a blog post up with all the documents AND a summary hyping up their talking points within hours of receiving them.

3. This will take months, or possibly years to fully resolve.

Russ R. is spot on, except maybe for number 3, and that’s where you WUWT readers and crowdsourcing come in. Welcome to the science of stylometry / textometry.

Since DeSmog blog (which is run by a Public Relations firm backed by the David Suzuki Foundation) has no scruples about calling WUWT, Heartland, and skeptics in general “anti-science”, let’s use science to show how they are wrong. Of course the hilarious thing about that is that these guys are just a bunch of PR hacks, and there isn’t a scientist among them. As Megan McArdle points out, you don’t have to be a scientist to figure out the “Climate Strategy” document is a fake; common sense will do just fine. She writes in her third story on the issue: The Most Surprising Heartland Fact: Not the Leaks, but the Leaker

… a few more questions about Gleick’s story:  How did his correspondent manage to send him a memo which was so neatly corroborated by the documents he managed to phish from Heartland?
How did he know that the board package he phished would contain the documents he wanted?  Did he just get lucky?

If Gleick obtained the other documents for the purposes of corroborating the memo, why didn’t he notice that there were substantial errors, such as saying the Kochs had donated $200,000 in 2011, when in fact that was Heartland’s target for their donation for 2012?  This seems like a very strange error for a senior Heartland staffer to make.  Didn’t it strike Gleick as suspicious?  Didn’t any of the other math errors?

So, let’s use science to show the world what the common-sense geniuses at DeSmog haven’t been able to do themselves. Of course I could do this analysis myself and post my results, but the usual suspects would just say the usual things like “denier, anti-science, not qualified, not a linguist, not verified,” etc. Basically, as PR hacks, they’ll say anything they can dream up and throw it at us to see if it sticks. But if we have multiple people take on the task, well then, their arguments won’t have much weight (not that they do now). Besides, it will be fun and we’ll all learn something.

Full disclosure: I don’t know how this experiment will turn out. I haven’t run it completely myself. I’ve only familiarized myself enough with the software and science of stylometry / textometry to write about it. I’ll leave the actual experiment to the readers of WUWT (and we know there are people on both sides of the aisle that read WUWT every day).

Thankfully, the open-source community provides us with a cross-platform tool to do this. It is called JGAAP (Java Graphical Authorship Attribution Program). It was developed for the express purpose of examining unsigned manuscripts to determine a likely author. Think of it as fingerprinting via word, phrase, and punctuation usage.

From the website main page and FAQs:

JGAAP is a Java-based, modular, program for textual analysis, text categorization, and authorship attribution i.e. stylometry / textometry. JGAAP is intended to tackle two different problems, firstly to allow people unfamiliar with machine learning and quantitative analysis the ability to use cutting edge techniques on their text based stylometry / textometry problems, and secondly to act as a framework for testing and comparing the effectiveness of different analytic techniques’ performance on text analysis quickly and easily.

What is JGAAP?

JGAAP is a software package designed to allow research and development into best practices in stylometric authorship attribution.

Okay, what is “stylometric authorship attribution”?

It’s a buzzword to describe the process of analyzing a document’s writing style with an eye to determining who wrote it. As an easy and accessible example, we’d expect Professor Albus Dumbledore to use bigger words and longer sentences than Ronald Weasley. As it happens (this is where the R&D comes in), word and sentence lengths tend not to be very accurate or reliable ways of doing this kind of analysis. So we’re looking for what other types of analysis we can do that would be more accurate and more reliable.

Why would I care?

Well, maybe you’re a scholar and you found an unsigned manuscript in a dusty library that you think might be a previously unknown Shakespeare sonnet. Or maybe you’re an investigative reporter and Deep Throat sent you a document by email that you need to validate. Or maybe you’re a defense attorney and you need to prove that your client didn’t write the threatening ransom note.

Sounds like the perfect tool for the job. And, best of all, it is FREE.
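
To make the word-length and sentence-length idea from the FAQ concrete, here is a toy Python sketch of those two features (this is not JGAAP, just an illustration; the file names are placeholders):

```python
# Toy stylometry: average word length and average sentence length per text.
# These simple features are illustrative only; file names are placeholders.
import re

def text_stats(path):
    text = open(path, encoding="utf-8").read()
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)
    return avg_word_len, avg_sent_len

for name in ["unknown.txt", "author_a_sample.txt", "author_b_sample.txt"]:
    w, s = text_stats(name)
    print(f"{name}: avg word length {w:.2f}, avg sentence length {s:.1f} words")
```

As the FAQ notes, features this simple are not very reliable on their own, which is exactly why JGAAP bundles many different event drivers and analysis methods.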

So here’s the experiment and how you can participate.

1. Download and install the JGAAP software. It’s pretty easy and works on Mac/PC/Linux.

If your computer does not already have Java installed, download the appropriate version of the Java Runtime Environment from Sun Microsystems. JGAAP should work with any version of Java at least as recent as version 6. If you are using a Mac, you may need to use the Software Update command built into your computer instead.

You can download the JGAAP software here. The jar will be named jgaap-5.2.0.jar; once it has finished downloading, simply double-click it to launch JGAAP. I recommend copying it to a folder and launching it from there.

2. Read the tutorial here. Pay attention to the workflow and the steps required to “train” the software. Full documentation is here. Demos are here.

3. Run some simple tests using known documents to get familiar with the software. For example, you might run tests using posts from WUWT (saved as text files) by different authors, then add one whose author you already know as a test and see if the software identifies it correctly. Or run some tests on articles by different authors from your local newspaper.

4. Download the Heartland files from DeSmog Blog’s original post here. Do it fast, because this experiment is the one thing that may actually cause them to take the files offline. Save them all together in one folder. Use the “properties” section of your PDF viewer to determine authorship. I suggest appending the author name (like J.Bast) to the end of each filename to help you keep things straight during analysis.
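
If you would rather not click through each file’s properties by hand, the same Author field can be pulled out with a short script. Here is a minimal sketch using the pypdf Python library (my choice of tool, not something JGAAP needs, and the folder name is hypothetical):

```python
# Read the Author field from every PDF in a folder (folder name is hypothetical).
# Requires: pip install pypdf
from pathlib import Path
from pypdf import PdfReader

for pdf_path in sorted(Path("heartland_docs").glob("*.pdf")):
    info = PdfReader(pdf_path).metadata  # the same "properties" a PDF viewer shows
    author = info.get("/Author") if info else None
    print(f"{pdf_path.name}: author = {author}")
```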

5. Run tests on the files with known authors based on what you learned in step 3.

6. Run tests of known Heartland authors (and maybe even throw in some non-Heartland authors) against the “fake” document, 2012 Climate Strategy.pdf.

You might also visit this thread on Lucia’s and get some of the documents Mosher compared visually to tag Gleick as the likely leaker/faker. Perhaps Mosher can provide a list of the files he used; if he does, I’ll add them. Other Gleick-authored documents can be found around the Internet and at the Pacific Institute. I won’t dictate any particular strategy; I’ll leave it to our readers to devise their own tests for exclusion/inclusion.

7. Report your findings here in comments. Make screencaps of the results and use tinypic.com or Photobucket (or any image-hosting web service) to leave the images in comments as URLs. Document your procedure so that others can test/replicate it.

8. I’ll then make a new post (probably this weekend) reporting the results of the experiment from readers.

As a final note, I welcome comments now, in the early stages, with any suggestions that may make the experiment better. The FBI and other law enforcement agencies investigating this have far better tools, I’m told, but this experiment might provide some interesting results in advance of their findings.

233 Comments
robin
February 23, 2012 12:34 pm

Some more complete numbers. I was able to run a few tests, including SPCA (Standardized Principal Component Analysis), which resulted in charts without labels. http://blog.debreuil.com/images/jcharts.png
I think this would be more useful being run by the people that made it, as they know the ins and outs of what everything signifies. Sorry for the text dump here.
unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
Canonicizers: none
Analyzed by Burrows Delta using Word stems as events
1. Author01 Infinity
1. Author01 Infinity
1. Author03 Infinity
1. gleick Infinity
1. bast Infinity
Analyzed by Burrows Delta using Sentence Length as events
1. Author01 78.77155793679051
2. Author01 80.2867632506562
3. gleick 86.33117704938115
4. Author03 88.9431049630841
5. bast 104.26533350267803
Analyzed by Burrows Delta using Syllable Transitions as events
1. gleick 26.876961580487105
2. Author03 39.498099057362204
3. Author01 44.81833603913991
4. Author01 53.69169935354535
5. bast 53.93613732442708
Analyzed by Burrows Delta using MW Function Words as events
1. bast 100.96472895245202
2. Author01 110.13874051963056
3. Author03 111.98543979724113
4. Author01 112.50870666862883
5. gleick 118.5464526044639
Analyzed by Burrows Delta using Syllables Per Word as events
1. gleick 4.678547658695827
2. Author01 7.16552557153825
3. Author01 9.372995377516089
4. Author03 9.552504044292814
5. bast 13.812675242063655
Analyzed by Burrows Delta using Word Lengths as events
1. gleick 19.34449212885342
2. Author01 23.605545416411637
3. Author01 28.46339579762878
4. bast 30.397652383928673
5. Author03 30.8325765083569
Analyzed by Burrows Delta using Binned Frequencies as events
1. gleick 56.804661456724205
2. bast 57.4768784285835
3. Author03 90.58279638495986
4. Author01 92.86171017325175
5. Author01 103.78059122092623
——————————
unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
Canonicizers: none
Analyzed by Naive Bayes Classifier using Word stems as events
1. gleick 6.284762047923577E-306
2. bast 1.5118408762742144E-308
3. Author03 9.881312917E-314
4. Author01 4.9E-323
Analyzed by Naive Bayes Classifier using Sentence Length as events
1. bast 2.520179164211697E-52
2. Author03 9.791582471231638E-73
3. Author01 1.0853260416155313E-87
4. gleick 8.612586952329744E-88
Analyzed by Naive Bayes Classifier using Syllable Transitions as events
1. bast 1.379192064573727E-41
2. gleick 2.9856345009842305E-43
3. Author01 9.87335629357855E-53
4. Author03 2.4869515345576324E-54
Analyzed by Naive Bayes Classifier using MW Function Words as events
1. bast 3.523051194773655E-79
2. gleick 1.3668325518288174E-95
3. Author03 1.228600114016044E-180
4. Author01 6.883327597227027E-188
Analyzed by Naive Bayes Classifier using Syllables Per Word as events
1. bast 1.936285647735328E-5
2. gleick 1.1133390291997895E-5
3. Author01 2.339809499435671E-6
4. Author03 1.1557038384224756E-6
Analyzed by Naive Bayes Classifier using Word Lengths as events
1. gleick 4.185945546916831E-28
2. bast 2.1906437924926997E-29
3. Author01 8.198176784562617E-38
4. Author03 2.2375812457401515E-42
Analyzed by Naive Bayes Classifier using Binned Frequencies as events
1. bast 1.1656631793255989E-88
2. gleick 3.7646212780446196E-113
3. Author03 4.074684412801988E-118
4. Author01 3.7928452097311674E-162
———————————-
unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
Canonicizers: none
Analyzed by Markov Chain Analysis using Word stems as events
1. bast 193.85717213511924
2. gleick 13.278602110106085
3. Author03 0.0
3. Author01 0.0
3. Author01 0.0
Analyzed by Markov Chain Analysis using Sentence Length as events
1. Author01 8.391629968440892
2. bast 4.382026634673881
3. gleick 0.6931471805599453
4. Author03 0.0
4. Author01 0.0
Analyzed by Markov Chain Analysis using Syllable Transitions as events
1. Author01 1112.350964916729
2. Author03 968.781177953394
3. Author01 930.332078970121
4. bast 885.1058174495475
5. gleick 851.9537665612401
Analyzed by Markov Chain Analysis using MW Function Words as events
1. bast 496.32584285189483
2. gleick 122.80138586374213
3. Author01 0.6931471805599453
4. Author03 0.0
4. Author01 0.0
Analyzed by Markov Chain Analysis using Syllables Per Word as events
1. Author01 1175.347569702713
2. Author03 1010.443933363838
3. Author01 977.0885796023475
4. gleick 897.9218021335724
5. bast 881.9137449893942
Analyzed by Markov Chain Analysis using Word Lengths as events
1. Author01 1754.7669675494938
2. bast 1693.2837194057502
3. gleick 1543.939541549111
4. Author01 1506.0750905415073
5. Author03 1478.8738547796027
Analyzed by Markov Chain Analysis using Binned Frequencies as events
1. bast 1244.1093577231186
2. gleick 836.2033308547939
3. Author03 227.3215761357559
4. Author01 203.7365317508035
5. Author01 108.59439936922432
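
For context on what the Burrows Delta numbers above measure: Delta is roughly the mean absolute difference between the z-scored frequencies of common words in the unknown text and in each candidate’s samples, so smaller means a closer stylistic match. A minimal sketch of that recipe (my own simplified version, not JGAAP’s implementation; file names are hypothetical):

```python
# Minimal Burrows' Delta sketch: lower Delta = closer stylistic match.
# A simplified version of the classic recipe, not JGAAP's implementation.
# File names are hypothetical placeholders.
import re
from collections import Counter
from statistics import mean, pstdev

def word_freqs(path):
    words = re.findall(r"[a-z']+", open(path, encoding="utf-8").read().lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

candidates = {"gleick": "gleick_sample.txt",
              "bast": "bast_sample.txt",
              "author01": "author01_sample.txt"}
freqs = {name: word_freqs(path) for name, path in candidates.items()}
unknown = word_freqs("unknown.txt")

# Use the most frequent words across the candidate texts as features.
overall = Counter()
for f in freqs.values():
    overall.update(f)
features = [w for w, _ in overall.most_common(150)]

def delta(candidate):
    diffs = []
    for w in features:
        vals = [freqs[name].get(w, 0.0) for name in candidates]
        sd = pstdev(vals)
        if sd == 0:
            continue  # word is equally (un)common everywhere; no signal
        m = mean(vals)
        z_cand = (freqs[candidate].get(w, 0.0) - m) / sd
        z_unknown = (unknown.get(w, 0.0) - m) / sd
        diffs.append(abs(z_cand - z_unknown))
    return mean(diffs)

for name in sorted(candidates):
    print(f"{name}: Delta = {delta(name):.3f}")
```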

Ferdinand Engelbeen
February 23, 2012 12:48 pm

kim2ooo says:
February 23, 2012 at 11:12 am
Good catch!
It looks to me as if it is a left corner header……….Does this fit? [ The Blue header ]

When both documents are printed, the text width does fit, but the text of the faked document starts too high, over the Pacific Institute logo. Still, it may be some logo, as the text starts rather low and something is hidden at the top (probably by paper). But the aspect ratio may be different in the US (both were printed here on A4, that is 12″ instead of 11″).

Ferdinand Engelbeen
February 23, 2012 12:57 pm

Additionally, it indeed seems to be a logo, but not the Pacific Institute logo; it is too wide for that. But in the two headers the vertical and horizontal dots match exactly at the same left and top margins. That would be very difficult to achieve with tape or paper…

1DandyTroll
February 23, 2012 1:28 pm

This bloke has free stylometry software, but it’s only for MS Windows. It looks like he’s looking for more beta testers for version 2, though.
http://www.philocomp.net/?pageref=humanities&page=signature
(At least it comes with a powerpoint presentation) 🙂
Pfft, there’s even this: Writeprint Stylometry (Scripts) 0.1 for wordpress for analyzing comments to know who’s hiding behind who in the comment section. :p
I never knew linguisticians could have so much fun… o_O

P. Solar
February 23, 2012 1:28 pm

>>
My recommended settings would be:
– Canonicizers: Normalize ASCII and Whitespace, Strip Punctuation, and Unify Case
>>
No, actually one of Gleick’s notable features is his (over)use of commas. Strip Punctuation would not be the best way to detect that!
Like any analysis, you will need to think and take time to read the doc and get familiar with it. I doubt anyone will get any meaningful results without spending an evening just finding out how to do this. In fact, I doubt any trivial analysis by someone who does not know what they are doing will be of any use.
It would be interesting if someone has the time to do this properly, but the “crowd” seems small thus far.
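To put a number on the comma point above, here is a quick sketch comparing commas per 100 words, exactly the kind of signal that Strip Punctuation throws away (file names are hypothetical):

```python
# Commas per 100 words: the kind of signal that Strip Punctuation discards.
# File names are hypothetical.
import re

def commas_per_100_words(path):
    text = open(path, encoding="utf-8").read()
    n_words = len(re.findall(r"[A-Za-z']+", text))
    return 100.0 * text.count(",") / n_words

for name in ["unknown.txt", "gleick_sample.txt", "bast_sample.txt"]:
    print(f"{name}: {commas_per_100_words(name):.1f} commas per 100 words")
```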

Bart
February 23, 2012 1:32 pm

Ferdinand Engelbeen says:
February 23, 2012 at 12:57 pm
“…it is too wide for that…”
Scanning shadow?

February 23, 2012 1:46 pm

Bart says:
February 23, 2012 at 1:32 pm
Scanning shadow?

Indeed, it looks like the scanning shadow of the paper, as on the second page the bottom line (which is the bottom scanning shadow of the paper) starts at the same left margin as the top vertical dots.
Pity, but no information in here… But anyway, clever deduction from Kim2000!

Manfred
February 23, 2012 1:51 pm

Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.
That must raise disturbing doubts about his judgements on other issues as well.

February 23, 2012 1:55 pm

Ferdinand Engelbeen says:
February 23, 2012 at 12:48 pm
I’ve been doing some research….
It absolutely doesn’t fit Heartland’s logo [letterhead]
http://heartland.org/issues/law
It doesn’t seem to fit Desmogs
http://www.desmogblog.com/media_centre
“but the text of the faked document starts too high, over the Pacific Institute logo.”
Remember, this is a jpg image and is probably cropped.
http://img813.imageshack.us/img813/5586/startdocpg2.jpg
Or am I missing what you are saying?
———————
Thank you Mr Robt Moderator
Thank you, Latitude says:
February 23, 2012 at 12:06 pm
Thank you Mr. Ben Wilson says:
February 23, 2012 at 12:26 pm
Thank you Mr.Peter Kovachev says:
February 23, 2012 at 12:23 pm

robin
February 23, 2012 2:02 pm

I really think this is a job for a human – though this program could be a good tool to give insights. The idea of plugging in data and getting a result seems a bit hopeful given all the possible settings and lack of details. People always want to believe you can just dump data into a computer and a correct result will come out; sometimes that is true, but at the very least you need impartial humans to calibrate and confirm. In this case I’d say it is of zero value as a result, and of little value as a tool unless you wrote it or are well steeped in the field of stylometry.
I wonder what http://ljzigerell.wordpress.com/2012/02/18/profiling-the-heartland-memo-author/ would find with Gleick’s writings compared to the Heartland speech he used.

Ian Hoder
February 23, 2012 3:00 pm

Thanks Jake. I analyzed a few writings from just myself and Gleick and used your recommended settings of:
– Canonicizers: Normalize ASCII and Whitespace, Strip Punctuation, and Unify Case
– Event Drivers: Character Grams (N=4, N=5), M…N letter words (M=6, N=12)
– Event Culling: Most Common Events (N=100 or other fairly large number)
– Analysis Methods: Try these ONE at a time (Gaussian SVM, LDA, Linear SVM)
Results are:
Canonicizers: Normalize ASCII Normalize Whitespace Strip Punctuation Unify Case
Analyzed by Linear SVM using Character 4Grams as events
1. Gleick 1.0
Analyzed by Linear SVM using 6–12 letter Words as events
1. Hoder 2.0
Analyzed by Gaussian SVM using Character 4Grams as events
1. Hoder 2.0
Analyzed by Gaussian SVM using 6–12 letter Words as events
1. Hoder 2.0
Analyzed by LDA using Character 4Grams as events
1. Gleick 2.3824198113541523E12
2. Hoder 2.38241981135374E12
Analyzed by LDA using 6–12 letter Words as events
1. Hoder -1.0570728820980994E13
2. Gleick -1.0570728820981416E13
No smoking gun here. It looks like there’s a better chance of myself having written the memo than Gleick. As others have mentioned though, he probably used quite a few words from the other documents that he stole to produce the fake one.
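For anyone who wants to sanity-check the character-gram runs outside JGAAP, here is a rough scikit-learn equivalent of “Character 4Grams + Linear SVM” (my own approximation, not what JGAAP does internally; training files and labels are hypothetical):

```python
# Rough outside-JGAAP equivalent of "Character 4Grams + Linear SVM".
# Not JGAAP's implementation; training files and labels are hypothetical.
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def read(path):
    with open(path, encoding="utf-8") as f:
        return f.read()

train_texts = [read("gleick_1.txt"), read("gleick_2.txt"),
               read("hoder_1.txt"), read("hoder_2.txt")]
train_labels = ["Gleick", "Gleick", "Hoder", "Hoder"]

# Character 4-grams, lower-cased (roughly the Unify Case canonicizer).
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(4, 4), lowercase=True)
X_train = vectorizer.fit_transform(train_texts)

classifier = LinearSVC().fit(X_train, train_labels)

X_unknown = vectorizer.transform([read("unknown.txt")])
print("Predicted author:", classifier.predict(X_unknown)[0])
```

With only a handful of short training texts per author, any prediction will be fragile, which matches the “no smoking gun” caveat above.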

GeoLurking
February 23, 2012 3:21 pm

P. Solar says:
February 23, 2012 at 1:28 pm
“… but the “crowd” seems small thus far.”
If I can ever get past the “failed experiment” error I’ll be in the game. Rummaging through data is fun.

Bart
February 23, 2012 3:37 pm

A big problem is that most of the memo was clearly cut and pasted from Heartland documents. It’s the cardboard-cutout, Snidely Whiplash, Tourette’s-like expulsions which John Hinderaker pointed out here that were inserted by the faker. It’s not a lot to go on, but that is where the focus should be.

February 23, 2012 3:56 pm

Feeding time for all you WUWT sleuths. On another thread here (http://wattsupwiththat.com/2012/02/23/peter-gleick-debate-invitation-email-thread/#comment-901963), Heartland submits the emails between them and Dr. Gleick.
To make it easier for you tired textual analysts, I’ve taken the liberty of copying and pasting Gleick’s text, minus all else. The delightful bit is that although normally this would be a pretty pro forma correspondence with the usual polite and formalized language, Gleick’s special character and idiosyncratic language come through loud and clear. You have to read the entire email thread to get a feel for his total gracelessness. And, as TheGoodLocust noted, Gleick makes it unambiguously clear that he knew about the importance of donor anonymity to Heartland, and his opposition to that.
***
Dear Mr. Lakely,
After reviewing your email and after serious consideration, I must decline your invitation to participate in the August fundraising event for the Heartland Institute.
I think the seriousness of the threat of climate change is too important to be considered the “entertainment portion of the event” as you describe it, for the amusement of your donors.
Perhaps more importantly, the lack of transparency about the financial support for the
Heartland Institute is at odds with my belief in transparency, especially when your Institute and its donors benefit from major tax breaks at the expense of the public.
Thank you for considering me.
Dr. Peter Gleick
* * *
Dear Mr. Lakely,
Thank you for your email of January 13th, 2012, inviting me to participate in the Heartland Institute’s 28th Anniversary Benefit Dinner.
In order for me to consider this invitation, please let me know if the Heartland Institute
publishes its financial records and donors for the public and where to find this information.
Such transparency is important to me when I am offered a speaking fee (or in this case, a
comparable donation to a charity). My own institution puts this information on our website.
Also, I would like a little more information about the date, venue, and expected audience and format. In addition, I assume your offer includes all travel and hotel expenses, economy class, but can you please confirm this?
Sincerely,
Dr. Peter Gleick
***

AFPhys
February 23, 2012 4:18 pm

robin @12:34
What you ran through there is quite interesting to start this project. One thing that seems quite surprising to me is that the other three authors are nearly always either all above or below Gleick & Bast. I understand Bast being in the mix: we know that he was part of the team who wrote the true documents, and those were heavily quoted by the fake.
Ah – what was the general topic and subject matter of the other documents you used?
Somewhere, two days ago or so, I read (at Curry’s?) about a project to highlight the fake memo in such a way as to show what was cut from the other documents, what was taken from them but modified, and the totally new material. I have not seen it yet.
It seems to me that unless the cut-and-pasted (and possibly modified) material is isolated, no reliable analysis can be done with this tool.

AFPhys
February 23, 2012 4:26 pm

Kim2000, DukeC, etc…
One thing that would really make it interesting is if PDFs scanned by Gleick or PI on “plain paper” also just happened to bear those same marks in the upper left hand corner… hmmmm…???

Philemon
February 23, 2012 4:32 pm

The textual analysis should exclude the copy/pasted text from the rest of the material.
In addition to the papered over header, the typeset is not consistent with the Heartland house-style in their other published documents. It would be interesting if it were, coincidentally, consistent with the typeset house-style of any of the institutions with which Peter Gleick was associated, or with documents he produced independently.
Given the names named, and the context in which they were named, as well as the recent history of his comments in various venues, with the addition of the odd expressions used, it is completely consistent with Gleick having authored it.
As Sir Thomas More would have said: “Why Peter, it profits a man nothing to give his soul for the whole world… but for a Forbes blogspot and a K-12 science curriculum?”

P. Solar
February 23, 2012 4:58 pm

h/t to robin for the Zigerell link. That is an excellent analysis as far as it goes. Oddly, it does not extend the memo-vs-Bast-speech analysis to a similar exercise on some Gleick text.
I googled “Peter Gleick Forbes” (Forbes is prominent in the memo) and took the first Forbes article by G.
http://www.forbes.com/sites/petergleick/2012/02/05/global-warming-has-stopped-how-to-fool-people-using-cherry-picked-climate-data/
Z’s #2, the Oxford comma before “and” in lists of items. G uses this without fail.
Z’s #3. The memo author wrote 20 as a number but two as a word. [One instance of “fifteen”; all other numbers are numerals.]
#4. The memo author did not indent paragraphs.
#5. The memo author used ragged-right justification with no hyphenation. [Not applicable to HTML.]
#10. The memo author inconsistently hyphenated the adjective high-profile / high profile. [G seems consistent in hyphenation in this text.]
#13. The memo author used parenthetical remarks. [Strong +ve. G’s overuse of this technique is a hallmark of just about every sentence.]
#14. The memo author introduced the acronyms IPCC, NIPCC, AGW, and WUWT without explanation. [Neutral: virtually no acronyms; NASA (trivial), GISS not explained.]
The ones I’ve skipped don’t seem to apply to the Forbes article.
Let’s see if there are any similar things to pick out.
S #1: Use of a double hyphen to introduce an example. Three times on the first page of the memo, e.g.:
“focus on providing curriculum that shows that the topic of climate change is controversial and
uncertain — two key points that are effective at dissuading teachers from teaching science. ”
G uses this device 6 times in the Forbes article, plus a 7th time as a parenthesis:
higher-than-average warming – a dynamic confirmed by both models and by actual observations.
Z’s idea that the memo may be an original that was fraudulently “spiced up” is interesting. I’m not that convinced, but it merits further consideration.
The final para is surely the most obvious fakery.
H.I. would be unlikely to refer to one of their in-house climate experts bluntly as “Taylor” even in a memo. Indeed, all are referred to by last name only in contrast to the rest of the memo where names are given in full, with title where appropriate.
Forbes seems to get improbable importance. “High profile climate scientist (such as Gleick)”. Not sure who would have considered him in those terms (before this week). Self-promotion?
Attempted kiss of death to “Curry (who has become popular with our supporters)”.
Circulation to a “subset”. Scientific phraseology. Also, if you intend to be underhanded, it’s unwise to put it in writing and say “shh, don’t tell anyone, will you?” Especially if there is nothing controversial that is worth hiding from half your colleagues!
“… if our work continues to align with their [KOCH Foundation.] interests.” Obvious “ah-ha! so Koch are dictating the H.I. agenda. Greenpeace were right and here’s the proof”. This is for external consumption, not a secret “subset” of the board.
“showing… climate change is controversial and uncertain — two key points that are effective in dissuading teachers from teaching science.”
H.I. certainly would say they want teachers to START teaching science, not stop. This is straight out of warmist vocabulary. What the author meant by science is “our science”, “the science”. You know, the indivisible, consensus, the science is settled, don’t argue ever again “science”.
That is a subconscious slip of language by someone so familiar with that use of the word “science” that he does not even notice.
“… we sponsor the NIPCC to undermine the official…”. Again, a Freudian slip. H.I. aim to rebut and expose what they say is corrupt science, not undermine it. This phrase comes from the mind of a “believer” who feels his dogma is being threatened.
Conclusion
=========
It would appear that the whole memo was written by a frustrated warmist with a grudge against Forbes, Taylor, Watts and Curry in particular.
Certain distinct features of Gleick’s writing style are very much present, though not enough to condemn him from such a cursory examination. Of course, once he is in a court of law, under oath, he’s going to be hard pressed to play “I got it in the post but I lost it”.
More pop-corn required…
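If anyone wants to put rough numbers on a few of the markers above (double hyphens, parenthetical remarks, commas before “and”) across a set of texts, here is a small sketch; the regexes are crude approximations and the file names are hypothetical:

```python
# Count a few of the stylistic markers discussed above in each text.
# The regexes are crude approximations; file names are hypothetical.
import re

MARKERS = {
    "double hyphen/dash": r"\s(?:--|\u2013|\u2014)\s",
    "parenthetical remark": r"\([^)]{3,}\)",
    "comma before 'and'": r",\s+and\s",
}

for name in ["memo.txt", "gleick_forbes.txt", "bast_speech.txt"]:
    text = open(name, encoding="utf-8").read()
    n_words = len(re.findall(r"[A-Za-z']+", text))
    for label, pattern in MARKERS.items():
        count = len(re.findall(pattern, text))
        print(f"{name}: {label}: {count} ({100.0 * count / n_words:.2f} per 100 words)")
```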

P. Solar
February 23, 2012 5:09 pm

Robin says:
>>
I think this is a job for a human.
>>
….I agree it is better done by brain than by box. The result of the software can only be as good as some “subset” of the knowledge of the person who wrote it. I doubt that Java applet is much more than an amusing toy.
>>
I wonder what http://ljzigerell.wordpress.com/2012/02/18/profiling-the-heartland-memo-author/ would find with Gleick’s writings compared to the Heartland speech he used.
>>
Great minds think alike 😉

Dave Worley
February 23, 2012 5:28 pm

Mr. Connelly…..the courts are overloaded with criminal cases. This information will not be needed for a while. You trying to rush good scientists at work? How’s a scientist to cope with such pressure?
Maybe where you come from things get published overnight, but crowdsourcing is a much more laborious process than today’s climatological peer review process.
Relax, give the wine a little time to ferment.

Mike Rossander
February 23, 2012 5:36 pm

re: TerryS’ challenge above (at February 23, 2012 at 6:42 am) regarding PDF meta-data:
Piece of cake. And marginally plausible, too.
1. Open your document in Adobe X Standard (the default for those of us who have to create forms, OCR old scans or redact contents)
2. Click File/Save As from within the program to save it to a different location.
3. Open both versions and compare xmp:ModifyDate and the xmp:MM:InstanceID.
Why is this plausible? When receiving a document via email, most of us just launch the attachment. If your job requires you to create PDF-based forms, OCR old flat scans, or redact contents, defaulting to Adobe Pro or Standard is normal. Once the document is open and interesting, you want to save it to your hard drive so you can do something with it (like, say, post it to a website). The document is already open, and File/Save As is the pattern Microsoft trained us to use. No forensic examiner would make that kind of mistake, but a casual computer user could, with no evil intent.
By the way, it’s also plausible that you might Save As to Reduced File Size if you are concerned about download times and want to make a document more accessible to readers.
Inconsistent modification dates and file sizes may be clues that something happened, but they are far from proof that anything malicious happened.
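If you want to check those date fields across the whole document set without opening each file, here is a minimal sketch (pypdf is my choice of library, the folder name is hypothetical, and note it reads the info dictionary’s /CreationDate and /ModDate rather than the XMP fields Adobe displays):

```python
# Print /CreationDate and /ModDate for each PDF in a folder.
# A ModDate later than CreationDate just means the file was re-saved at some
# point, which, as argued above, is not by itself proof of tampering.
# Requires: pip install pypdf  (folder name is hypothetical)
from pathlib import Path
from pypdf import PdfReader

for pdf_path in sorted(Path("heartland_docs").glob("*.pdf")):
    info = PdfReader(pdf_path).metadata
    created = info.get("/CreationDate") if info else None
    modified = info.get("/ModDate") if info else None
    note = "  <- re-saved after creation" if created and modified and created != modified else ""
    print(f"{pdf_path.name}: created {created}, modified {modified}{note}")
```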

Tim Minchin
February 23, 2012 5:48 pm

If he says he received it by snail mail, shouldn’t he have kept the envelope, which would have the postmark on it?

Jake
February 23, 2012 5:57 pm

Hoder
Well I’m glad you got it to work. I just keep getting the same errors that you alluded to this morning.

David L.
February 23, 2012 6:31 pm

If you want to compare documents against each other for similar text, use Wcopyfind, a simple, free executable that works great. You can find it here:
http://plagiarism.bloomfieldmedia.com/z-wordpress/software/wcopyfind/

Jenn Oates
February 23, 2012 7:23 pm

I do believe that this will be fun!