An online and open exercise in stylometry/textometry: Crowdsourcing the Gleick “Climate Strategy Memo” authorship

Tonight, a prediction made on WUWT shortly after Gleick posted his confession has come true: DeSmog blog has made yet another outrageous and unsupported claim in an effort to save its reputation and that of Dr. Peter Gleick, as you can read here: Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic

In a desperate attempt at self-vindication, the paid propagandists at DeSmog blog have become their own “verification bureau” for a document they have no way to properly verify. The source (Heartland) says it isn’t verified (and is a fake), but that’s not good enough for the Smoggers and is a threat to them, so they spin it and hope the weak-minded regurgitators retweet it and blog it unquestioned. They didn’t even bother to get an independent opinion. It seems to be just climate news porn for the weak-minded Suzuki followers upon which their blog is founded. As one WUWT commenter (Copner) put it – “triple face palm”.

Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.

Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL  – Retweeted by Michael E. Mann

Tonight in comments, Russ R. drew attention to the predictions he made in a comment two days ago:

I just read Desmog’s most recent argument claiming that the confidential strategy document is “authentic”. I can’t resist reposting this prediction from 2 days ago:

Russ R. says:
February 20, 2012 at 8:49 pm
Predictions:

1. Desmog and other alarmist outfits will rush to support Gleick, accepting his story uncritically, and offering up plausible defenses, contorting the evidence and timeline to explain how things could have transpired. They will also continue to act as if the strategy document were authentic. They will portray him simultaneously as a hero (David standing up to Goliath), and a victim (an innocent whistleblower being harassed by evil deniers and their lawyers).

2. It will become apparent that Gleick was in contact with Desmog prior to sending them the document cache. They knew he was the source, and they probably knew that he falsified the strategy document. They also likely received the documents ahead of the other 14 recipients, which is the only way they could have had a blog post up with all the documents AND a summary hyping up their talking points within hours of receiving them.

3. This will take months, or possibly years to fully resolve.

Russ R. is spot on, except maybe for number 3, and that’s where you WUWT readers and crowdsourcing come in. Welcome to the science of stylometry / textometry.

Since DeSmog blog (which is run by a Public Relations firm backed by the David Suzuki foundation) has no scruples about calling WUWT, Heartland, and skeptics in general “anti-science”, let’s use science to show how they are wrong. Of course the hilarious thing about that is that these guys are just a bunch of PR hacks, and there isn’t a scientist among them. As Megan McArdle points out, you don’t have to be a scientist to figure out the “Climate Strategy” document is a fake; common sense will do just fine. She writes in her third story on the issue: The Most Surprising Heartland Fact: Not the Leaks, but the Leaker

… a few more questions about Gleick’s story:  How did his correspondent manage to send him a memo which was so neatly corroborated by the documents he managed to phish from Heartland?
How did he know that the board package he phished would contain the documents he wanted?  Did he just get lucky?

If Gleick obtained the other documents for the purposes of corroborating the memo, why didn’t he notice that there were substantial errors, such as saying the Kochs had donated $200,000 in 2011, when in fact that was Heartland’s target for their donation for 2012?  This seems like a very strange error for a senior Heartland staffer to make.  Didn’t it strike Gleick as suspicious?  Didn’t any of the other math errors?

So, let’s use science to do what the common-sense geniuses at DeSmog haven’t been able to do themselves. Of course I could do this analysis myself, and post my results, but the usual suspects would just say the usual things like “denier, anti-science, not qualified, not a linguist, not verified,” etc. Basically, as PR hacks, they’ll say anything they can dream up and throw it at us to see if it sticks. But if we have multiple people take on the task, well then, their arguments won’t have much weight (not that they do now). Besides, it will be fun and we’ll all learn something.

Full disclosure: I don’t know how this experiment will turn out. I haven’t run it completely myself. I’ve only familiarized myself enough with the software and science of stylometry / textometry to write about it. I’ll leave the actual experiment to the readers of WUWT (and we know there are people on both sides of the aisle that read WUWT every day).

Thankfully, the open-source software community provides us with a cross-platform open source tool to do this. It is called JGAAP (Java Graphical Authorship Attribution Program). It was developed for the express purpose of examining unsigned manuscripts to determine a likely author attribution. Think of it like fingerprinting via word, phrase, and punctuation usage.
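
To make the fingerprint idea concrete, here is a minimal Python sketch. It is not JGAAP's method and not part of the experiment; the word list, the punctuation list, and the function names `fingerprint` and `cosine_distance` are assumptions chosen purely for illustration. It turns a text into relative frequencies of a few common function words and punctuation marks, then measures how far apart two such profiles are.

```python
import math
import re
from collections import Counter

# Illustrative feature lists (assumptions, not taken from JGAAP).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]
PUNCTUATION = [",", ";", ":", "-", "(", "!", "?"]


def fingerprint(text):
    """Return relative frequencies of function words and punctuation marks."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    n_words = max(len(words), 1)   # avoid division by zero on empty text
    n_chars = max(len(text), 1)
    vector = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    vector += [text.count(p) / n_chars for p in PUNCTUATION]
    return vector


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means identical proportions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no features present in one of the texts
    return 1.0 - dot / (norm_a * norm_b)
```

A distance near 0 means two texts use these words and marks in similar proportions. On short texts the numbers are noisy, so treat this as rough intuition only.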

From the website main page and FAQs:

JGAAP is a Java-based, modular, program for textual analysis, text categorization, and authorship attribution i.e. stylometry / textometry. JGAAP is intended to tackle two different problems, firstly to allow people unfamiliar with machine learning and quantitative analysis the ability to use cutting edge techniques on their text based stylometry / textometry problems, and secondly to act as a framework for testing and comparing the effectiveness of different analytic techniques’ performance on text analysis quickly and easily.

What is JGAAP?

JGAAP is a software package designed to allow research and development into best practices in stylometric authorship attribution.

Okay, what is “stylometric authorship attribution”?

It’s a buzzword to describe the process of analyzing a document’s writing style with an eye to determining who wrote it. As an easy and accessible example, we’d expect Professor Albus Dumbledore to use bigger words and longer sentences than Ronald Weasley. As it happens (this is where the R&D comes in), word and sentence lengths tend not to be very accurate or reliable ways of doing this kind of analysis. So we’re looking for what other types of analysis we can do that would be more accurate and more reliable.

Why would I care?

Well, maybe you’re a scholar and you found an unsigned manuscript in a dusty library that you think might be a previously unknown Shakespeare sonnet. Or maybe you’re an investigative reporter and Deep Throat sent you a document by email that you need to validate. Or maybe you’re a defense attorney and you need to prove that your client didn’t write the threatening ransom note.

Sounds like the perfect tool for the job. And, best of all, it is FREE.

So here’s the experiment and how you can participate.

1. Download and install the JGAAP software. It is pretty easy and works on Mac/PC/Linux.

If your computer does not already have Java installed, download the appropriate version of the Java Runtime Environment from Sun Microsystems. JGAAP should work with any version of Java at least as recent as version 6. If you are using a Mac, you may need to use the Software Update command built into your computer instead.

You can download the JGAAP software here. The jar will be named jgaap-5.2.0.jar; once it has finished downloading, simply double-click on it to launch JGAAP. I recommend copying it to a folder and launching it from there.

2. Read the tutorial here. Pay attention to the workflow process and steps required to “train” the software. Full documentation is here. Demos are here.

3. Run some simple tests using some known documents to get familiar with the software. For example, you might run tests using some posts from WUWT (saved as text files) from different authors, and then put in one that you know who authored as a test, and see if it can be identified. Or run some tests from authors of newspaper articles from your local newspaper.

4. Download the Heartland files from Desmog Blog’s original post here. Do it fast, because this experiment is the one thing that may actually cause them to take them offline. Save them in a folder all together. Use the “properties” section of the PDF viewer to determine authorship. I suggest appending the author names (like J.Bast) to the end of the filename to help you keep things straight during analysis.

5. Run tests on the files with known authors based on what you learned in step 3.

6. Run tests of known Heartland authors (and maybe even throw in some non-Heartland authors) against the “fake” document 2012 Climate Strategy.pdf.

You might also visit this thread on Lucia’s and get some of the documents Mosher used to compare visually to tag Gleick as the likely leaker/faker. Perhaps Mosher can provide a list of files he used. If he does, I’ll add them. Other Gleick-authored documents can be found around the Internet and at the Pacific Institute. I won’t dictate any particular strategy; I’ll leave it up to our readers to devise their own tests for exclusion/inclusion.

7. Report your findings here in comments. Make screencaps of the results and use tinypic.com or photobucket (or any image drop web service) to leave the images in comments as URLs. Document your procedure so that others can test/replicate it.

8. I’ll then make a new post (probably this weekend) reporting the results of the experiment from readers.
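
For readers who want to see the logic behind steps 3 through 6 before opening JGAAP, the following Python sketch may help. It is an illustration under stated assumptions, not JGAAP's algorithm: it uses character trigram frequencies and a plain Euclidean distance, and the names `trigram_profile`, `attribute`, and `leave_one_out_accuracy` are made up for this example. The corpus is assumed to be a dictionary mapping each author name to a list of plain-text documents known to be theirs.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Relative frequencies of character trigrams, ignoring case and extra spaces."""
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def attribute(unknown_text, corpus):
    """Rank candidate authors by distance to the unknown text, nearest first."""
    target = trigram_profile(unknown_text)
    ranked = [
        (author, distance(target, trigram_profile(" ".join(docs))))
        for author, docs in corpus.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1])


def leave_one_out_accuracy(corpus):
    """Hold out each known document in turn and check it is attributed correctly."""
    hits = 0
    trials = 0
    for author, docs in corpus.items():
        if len(docs) < 2:
            continue  # nothing left to train on if we hold out the only document
        for i, held_out in enumerate(docs):
            reduced = dict(corpus)
            reduced[author] = docs[:i] + docs[i + 1:]
            best_author = attribute(held_out, reduced)[0][0]
            if best_author == author:
                hits += 1
            trials += 1
    return hits / trials if trials else 0.0
```

Note that `attribute` always ranks the candidates it is given, even if the true author is not among them, so a nearest match is only as meaningful as the candidate list. Running `leave_one_out_accuracy` first (step 5) gives a sense of whether the method can tell the known authors apart at all.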

As a final note, I welcome comments now in the early stages for any suggestions that may make the experiment better. The FBI and other law enforcement agencies investigating this have far better tools I’m told, but this experiment might provide some interesting results in advance of their findings.

233 thoughts on “An online and open exercise in stylometry/textometry: Crowdsourcing the Gleick “Climate Strategy Memo” authorship”

  1. You have a wicked, and extremely funny, sense of humor. At this moment, I’m a bit pinched for time or I’d go through the exercise myself……but I can see the joke well enough without going through the exercise. I’m sure that I’ll only laugh the louder when results start coming in. Bravo!

  2. Mike Mann:
    “Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic ”

    Mike, Mike, stop, it’s a trap… Oh. Poor Mike. There ya go.

  3. I have a different idea altogether.

    The Heartland Institute can go into their email server & retrieve all of the documents sent to Dr. Gleick’s phony email address.

    This action can cast more doubt on any warmist claims.

  4. Rude: No dice, Gleick implies he got the memo separately, from an anonymous source, by snail mail. Very convenient, no?

  5. If the climate strategy document is real, why did they feel the need to redact the header and footer of the document? You can still see the little bits of what was there peeking above the area that has been removed.

  6. Rudebaeger, I bet the strategy doc isn’t one of the ones sent. Why would they scan the document? None of the others are scanned images.

  7. Do not use documents that have been edited by anyone other than Gleick. The mistakes he makes would be removed.

    Use: his letter to Pielke and his blog comments.

    DO NOT use articles that have been or might have been edited by others.

    The fake memo also contains a plagiarized sentence. You have to remove that.

  8. Actually the bits that are left above the redaction on the first page could be enough of a fingerprint to match up a header to. Maybe.

  9. I have a new theory about the fake document.

    I suspect that it was sent to him by a colleague or, more likely, an opponent for the specific purpose of yanking his chain.

    They hoped to get a laugh as Dr Gleick’s anger and hatred blinded him to the document’s obvious faults. However even the provocateur(s) could not have anticipated Dr Gleick’s actions.

  10. Got my second treatment to stop me going blind today so I may have a few hours spare this afternoon as I sit in a darkened room.

    Then again it being the NHS my appointment was 3 hours late last time so that spare time I talked about probably will be a pocket full of dreams.

  11. “… and hope the weak-minded regurgitators retweet it and blog it unquestioned.”
    – more a certainty than a hope :-|

  12. I like The Register’s description of DeSmog blog.

    A publication run by a Canadian PR firm which lists various green businesses and organisations among its clients, also funded in part by convicted online-gambling payments kingpin and hippie biz tycoon John Lefebvre.

  13. I am very much looking forward to the court case where Gleick presents the “original” memo as evidence.

    Today, thanks to a nifty government program called Echelon, most printers (at least those from HP, Xerox, Dell, Canon, Epson, and Lexmark, amongst others) now embed microscopic code on everything they print. A forensic science team will be able to look at Gleick’s memo and tell which printer it came from without much difficulty.

    My money is on it coming from Gleick’s very own Epson printer, the same one he used to scan the document.

  14. Reading what DeSmegHead write is like hearing some Islamic extremist saying the woman “deserved to be raped … it was her fault because she dressed provocatively.”

    Well perhaps the Heartland should have worn a Burkha and not provoked the taliban eco-nuts to attack it????

    This is their warped mentality. There is a real victim here. The Heartland Institute were not breaking the law or acting with any impropriety; because they refused to toe the eco-nut line, they deserved what they got.

  15. Mike Mann: “Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic ”
    ____________________________
    Agreed; it’s an authentic fake.

  16. The document in the link is now called Strategy Document (3).pdf – like someone has made some copies – or perhaps some changes ?

  17. Given the nature of Gleick’s usual “unburnished” literary style, I suspect that almost all of his efforts which have made it into print have been subjected to a rigorous editing process to polish his rather crude prose.
    I am not convinced of the validity of this experiment, although I remain amused by the hilarity!

    I would also remind everyone, before they make any definitive statements that may come back to haunt them, that despite all the speculation there is still no definitive proof of the authorship of the fake. Or, dare I say it, absolute proof that it is a fake.

    Let us be careful not to lose the high ground.

  18. This software can’t ever give you reliable proof of who the author is. It can only select the most likely author out of the group of potential authors you train it on, and you can never guarantee that you’ve included the real author in the analysis unless you know for sure who it is.
    If the analysis shows that a particular person outside Heartland is far more likely to be the author than anybody from Heartland, then that is weak evidence the memo is indeed fake. It in no way proves that this particular person is the faker.

  19. I may be way off the mark here but isn’t DeSmogBlog funded by a convicted money launderer by the name of John Lefebvre?

    “Two former directors and founding shareholders of NETeller Plc, a British online money transfer company, have been charged in the United States with laundering billions of dollars in illegal gambling proceeds.

    Canadians Stephen Lawrence, 46, and John Lefebvre, 55, were arrested on Monday — Lawrence in the U.S. Virgin Islands and Lefebvre in Malibu, California — U.S. Attorney Michael Garcia said.”

    http://www.reuters.com/article/2007/01/16/us-crime-neteller-idUSN1622302920070116

    http://www.taxabletalk.com/tag/neteller/

    http://www.populartechnology.net/2011/04/truth-about-desmogblog.html

    http://news.heartland.org/newspaper-article/2008/02/01/blog-funder-guilty-money-laundering

  20. At the beginning of the “authentication” post, DeSmegHead writes:

    “It also uses phrases, language and, in many cases, whole sentences that were taken directly from Heartland’s own material. Only someone who had previous access to all of that material could have prepared the Climate Strategy in its current form.”

    Yep, that’s the way it looks. So until Gleick comes up with some documentary proof (rather than the word of a self-confessed liar) that he received a paper copy of the document by regular mail, that statement fits perfectly with the creation date of the fake and the date that he conned H.I. into sending him the other docs.

    At least we are all agreed so far.

  21. Sorry, but I haven’t got time to actually do this myself.

    But here is some Gleick correspondence (published just 2-3 weeks ago, with permission); even Forbes gets a mention.

    http://www.realclimategate.org/2012/02/clarifications-and-how-better-to-communicate-science/

    —————————————————

    Barry,

    Again, I am not going to spend more time on this, but I will try to be clear.

    My comments about your communications with me were not meant to suggest that you were either abusive or threatening to me in the nature of the kinds of emails/comments Hayhoe (or I or others in the climate community regularly receive). You have not been so far as I know, and I will try to make that clear in a tweet, when I get a chance. And perhaps “incredibly annoying” or “incredibly frustrating” or “incredibly discourteous,” or “incredibly uncivil” or some other synonym would have been a better choice. Do you really want me to pick one?

    I stand by my other comments in the email I sent to you, about how I personally perceived your participation in exchanges in the fall when I ran out of patience with any chance of rational discussion with WUWT, Bishop Hill, or the regular tweeters and bloggers of that group. It became clear it was an unproductive time sink with a group whose minds were closed to fact, and whose primary tool was ad hominem attack. The systematic and coordinated and dishonest attack on me after my negative review of LaFramboise’s book was only one example that made it clear that rational debate was not possible and dissenting views not tolerated. The fact that WUWT blocked me from adding comments more than a year ago to his routinely biased and often dissembling blog further convinced me that there was little interest in discussion among that group. Perhaps you’re having more luck, or have more patience.

    Peter Gleick

    ——————-
    Tamsin,

    I am not going to deal with this anymore. It has taken far too much
    of my constrained time and bandwidth already.

    I am glad Woods’ exchanges with you seem to have been decent. We’re
    probably all far more polite one-on-one than in public online
    screamfests. I’m sorry he didn’t like my comment. But I’ve reviewed
    his tweets, blog posts, status, web URL, and comments and
    contributions in places like Bishop Hill and WUWT (where, by the way,
    I’ve been blocked for more than a year from posting comments,
    presumably because my comments are “incredibly offensive” — yet I’m
    regularly and personally attacked on these kinds of sites). His
    adoption of the language, often coded, of the
    denier/skeptic/contrarian community, his amplification of memes
    around “climategate,” “AGwarmists,” “hide the decline,” “the hockey
    stick,” the straw man of “catastrophic” climate change, etc. may have
    changed since I blocked his Twitter feed to me last year, but I
    simply don’t find his input to the debate helpful or informative, and
    I’m certainly entitled to both my opinions and to decide what part of
    the climate controversy comes to me through different media. By the
    way, I also block people I LIKE, when I can no longer tolerate or
    filter their massive overuse of Twitter.

    I do what I can to communicate rationally with open-minded
    participants in this debate, but the polarization makes it hard to
    find them. If this is something you’re committed to diving into, I
    wish you the best of luck. I hope you’ll continue to publish in the
    scientific literature as your top priority — in the long-run, your
    reputation as a scientist (and your influence in the associated
    policy debates) will benefit from it.

    Barry, if you want to pursue this further, feel free, but honestly,
    you should consider cutting your tweet rate by a factor of 10 until
    your ratio of tweets to followers improves, you might consider what
    you really believe and how you express it, and we should probably
    ALL count to 10 after writing anything and before hitting send.

    (and Tamsin, your note about how Barry regrets the domain name, but
    “has kept it because it’s known” might be a warning to you, apropos
    “All Models Are Wrong…” But I’ve already made my opinion known on that.)

    Peter Gleick
    ————————————————
    There is a little bit more email at the URL.

    I am still in total shock that Peter did this. I really hope he was NOT involved in the fake document, for his own sake. But we must get to the bottom of it, because Desmogblog have ‘authenticated’ it, and Greenpeace are using the ‘fake’ document for political purposes.
    This is very serious.

  22. Re: Morph
    You are correct. The document has changed.
    The Modified time has changed from Mon Feb 13 12:41:52 2012 to Tue Feb 14 12:36:20 2012.
    The format of the original was PDF-1.4 and this has now changed to PDF-1.5.
    The Document instance UUID has changed from 692440ef-d85e-4cec-afef-742d339ece7b to e5477a6f-aa33-4521-b161-1ae07ed0a258

    The original document UUID has not changed so this indicates that it is the same scan as before.

    What it looks like to me is that somebody has the original scan and has converted it to pdf again.

  23. Hi,

    Just found something interesting – the Fake they point to in the article you link to is different to the original.

    The original had a modification and creation time of 2012-02-13T12:41:52-08:00. The ‘new’ fake (with ‘(3)’ on the end) has a modification time of 2012-02-14T12:36:20-08:00 (same original creation time) and is slightly shorter. Is this PDF meant to be the ‘original’ as leaked? If so, why the change? Also, the time zone offset is -8, which is the West Coast.

    Maybe nothing; it just seems interesting that the PDF changed at all.

  24. More info about the changed document.
    I’ve now extracted the internal page images from both the original and the modified document and they are the same bit for bit. This shows that it is not a new scan of the original document since it would be nearly impossible to achieve the bit for bit match with a new scan.

  25. Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL – Retweeted by Michael E. Mann

    If this gets to court we may find out whether the “Evaluation” is correct.
    Fake but real.
    Cold but warm.
    This is what I call living in denial; desperate, desperate stuff indeed.

  26. I suspect that the forged document was produced shortly before it was scanned – after the receipt of the stolen emailed documents. In fact, because where it fits it fits too well, it would have been very difficult to forge without the emailed documents to hand. If, contrary to my suspicion, the original forged document was in fact mailed at some stage it would almost certainly have been folded. I wonder if one of our forensically minded IT expert readers could have a close look at the scan and tell us whether there is any indication that the original document from which it was copied was folded prior to the scan?

  27. Does Gleick’s version of the story allow TIME for anyone else to have written the “strategy memo”? I haven’t followed the details but his version seems to go like:

    1. Mysterious Insider obtains strategy memo, puts it in the mail.
    2. Gleick receives, decides to investigate, tricks Heartland into sending board documents.
    3. Heartland emails documents, Gleick spends a week or two reading them.
    4. Only then does Gleick scan the original memo and spread the combined package.

    If that original memo shares literal text with the real documents, how did that happen? Let’s say it’s not coincidence. So… maybe the memo was the source for the board documents. Clearly it wasn’t written to be, or the “dissuade teachers from teaching science” slur wouldn’t have been there. And why did the shared text stay unchanged throughout the editing process when everything else changed so much? Why are some parts of such a sloppily-edited memo suddenly of boardroom quality? And shouldn’t the memo have other usable details that weren’t incriminating but would have been removed in editing anyway? It seems unlikely.

    So assuming all that, the memo was written based on the board documents. HOW OLD are those board documents? Was there really time for an insider to read them and distill them into that memo, and for the mail system to deliver them to Gleick by the time he says he got it? Clearly Gleick would have had time to write the memo after he got the finished board documents, but does the timeline of his version of the story fit?

  28. Ignore my previous post – file names are here

    http://www.desmogblog.com/heartland-insider-exposes-institute-s-budget-and-strategy

    (1-15-2012) 2012 Fundraising Plan.pdf 89.87 KB
    (1-15-2012) 2012 Heartland Budget (2).pdf 124.62 KB
    2 Agenda for January 17 Meeting.pdf 7.4 KB
    2010_IRS_Form_990 (2).pdf 2.7 MB
    2012 Climate Strategy (3).pdf 96.56 KB
    Binder1 (2).pdf 55.36 KB
    Board Meeting Package January 17.pdf 6.84 KB

    It’s interesting that a few of them have the copy|paste numbers (2) and (3) added to their names, including the document declared a fake by HI. Seems inconsistent.

    Hmmm – makes me think that maybe DSBlog did some editing of their own, perhaps? Pure conjecture of course.

  29. Why didn’t he keep the packaging with the postmark and date stamp? Hardly conclusive evidence, obviously, but it would have been at least circumstantial support for his version of events.
    Secondly, in the time between the alleged receipt and “going fishing”, he would surely have talked to a few confidantes who would be able to corroborate his story. Like so much else, it defies credibility and reason that he would keep such news entirely to himself and then act as he did.

  30. As a matter of interest, does there exist a programme that can take a text and re-write it in the style of a different author? I know it’s just my twisted mind at work, but if that programme doesn’t yet exist I’m sure it will soon.

  31. Good practice would suggest that you give the software an equal number of stories from a wide variety of authors in as similar domains and formats as possible. Make it as hard as possible for the software.

    I mean, if you put in 19 Gleick rants and one poem by Kahlil Gibran, then offer the fake memo as a test, the answer is obvious but unenlightening.

    And if you’re collecting material from web posts, be sure to snip quoted material from other sources.

  32. An important thing to note before speculating about recent manipulation of the files:
    File names like “2010_IRS_Form_990 (2).pdf” are often created by software like Firefox when a second download is done that would overwrite an existing file.

    On Linux-based systems the files can be checked for being identical with the ‘diff’ command.

    If diff outputs nothing, they are identical; otherwise it will show the differences if possible, or else just state that they differ.

    It would be well within the style of SmogBlog to be trying to prevent such analysis as is being done here by “tidying up” the docs.

    If there are differences I’m sure enough people here have untainted copies.

  33. Barry Woods quoted Gleick as follows:

    “I personally perceived your participation in exchanges in the fall when I ran out of patience with any chance of rational discussion with WUWT, Bishop Hill, or the regular tweeters and bloggers of that group. It became clear it was an unproductive time sink with a group whose minds were closed to fact, and whose primary tool was ad hominem attack. The systematic and coordinated and dishonest attack on me after my negative review of LaFramboise’s book was only one example that made it clear that rational debate was not possible and dissenting views not tolerated. The fact that WUWT blocked me from adding comments more than a year ago to his routinely biased and often dissembling blog further convinced me that there was little interest in discussion among that group.”
    ————————————————————————-
    The primary tool of WUWT and Bishop Hill is ad hominem attack? The contributors’ minds are ‘closed to fact’? Gleick must have been reading slightly distorted mirror sites that are not available to the rest of us. Bizarre.

    He then goes on to claim that he was ‘blocked’ from WUWT, which Anthony and his mods assure us is not true. Apart from their respective reputations for veracity, it is difficult to imagine why Gleick would be blocked, given the rules which state that it is how you behave, not who you are, that governs access. Connolley is freebasing all over another thread as I type – why does Gleick imagine that he was singled out for exclusion?

    The recurring themes of grandiosity and paranoia illustrated by Barry’s post help us to comprehend how Gleick saw no inconsistency between his public role as the ethics guru and the private person who would do ‘whatever it takes.’ He is just like the family values politicians and evangelical preachers who are eventually exposed as hypocrites. In one of John Le Carre’s spy novels, there is a telling comment on criteria used for recruitment – along the lines that the last thing they wanted was someone who spewed hatred for Communism. They said: if he hates it that much, he’s half in love with it already (my paraphrase).

  34. Barry’s post is a good sample of his style in email; perhaps blog posts may be a bit different. In any case I can see the profuse use of commas, while generally correct, has the, rather common, fault of putting a comma before and, and but which, being prepositions, should not be preceded by a comma, and his propensity for rather long, drawn out sentences, that would better be made into a paragraph, like this one. ;)

    OMG, spot the style, I’ve just given myself away as being the source of the sting operation to trick Gleick into impersonating an H.I. officer to get confidential documents. LOL

  35. Who elected Connolley to decide what is “content free”? This isn’t Wikipedia, where Connolley can make decisions. He is only another commenter here. And as we can see, he is usually wrong.

  36. Great idea Anthony.

    So far, though the sheer volume of it is remarkable, most of the actual, specific evidence for Gleick authorship of the fake document has been of the subjective variety. (Tone, grammatical idiosyncrasies, out of place mentions of things that matter to Gleick and few others: himself, Forbes, Taylor, etc.) It’s enough that I’m confident he’s the author, but there’s nothing yet that “proves” it.

    It would be a win to gather additional “hard” evidence in a more rigorous, methodical fashion… more scientifically. So, on that note, all I’d ask “the crowd” is to please treat this like any other piece of scientific research… be disciplined, objective, consider all the evidence, don’t jump to conclusions, and if your findings happen to point away from Gleick, consider it an obligation to report those findings as well.

    Gleick first caved and gave a partial confession when the evidence pointing to him began to accumulate. There’s a good chance that he or his accomplices throw in the towel when the evidence becomes overwhelming and they realize they can’t possibly win.

  37. @Okander says:
    February 23, 2012 at 12:29 am
    Rude: No dice, Gleick implies he got the memo separately, from an anonymous source, by snail mail. Very convenient, no?

    If Gleick says he has a hard copy, it can be claimed during the discovery phase of the lawsuit, trial, defense, offense, etc, etc… “I got a hard copy, trust me!” “OK, show us the hard copy.” Gleick: “Trust me, my dog ate it!” QED, case closed.

  38. As Mosher pointed out, if the text copied from the other Heartland documents is not removed, the software will say that Heartland wrote it. If you do take out those lines, the folks at DeSmog can criticize the results as you using an edited document. It still might be interesting though.

  39. Michael J says:
    February 23, 2012 at 12:39 am
    I have a new theory about the fake document.
    I suspect that it was sent to him by a colleague or, more likely, an opponent for the specific purpose of yanking his chain.
    They hoped to get a laugh as Dr Gleick’s anger and hatred blinded him to the document’s obvious faults.
    ============================================================
    Nope, it’s been scientifically proven that liberals have no sense of humor.

  40. To make this a crowdsourcing exercise, we have to compile a huge archive. Not only of Gleick’s texts but of tens, even hundreds of other people. One has to compile a set of docs from Mann, one from Jones, one from Trenberth etc.

    And to make this the right way, we also have to compile similar sets from McIntyre, Willis, Pielke, you name it.

    People should write here, whose texts they’ll collect. The texts have to be copy-pasted as txt-files and the whole material has to be collected at one central place.

  41. The only way to figure out who forged the memo is to start a lawsuit and compel Gleick to show the letter he received.

    Then it must be checked whether it was printed on his own printer.

  42. There is definitely more to this than meets the eye.

    There are 4 documents that have multiple versions:

    (1-15-2012) 2012 Heartland Budget here and here

    2010_IRS_Form_990 here and here

    2012 Climate Strategy here and here

    Binder1 here and here

    The actual content of the different versions is the same but they are different documents. The modification times of the latest versions have changed to Feb 14 2012 from their original values.

  43. Having played around with JGAAP for a while, I’ve started to wonder: why do we always have to do the work? That program has a lot of analysis methods. Even if JGAAP fingers Gleick, there are tons of excuses to use:

    – Oh, the real writer is not one among those that were tested.
    – Yes, but “WEKA J48 Decision Tree Classifier” tells that it wasn’t Gleick!

    We’ve seen this already with statistical analysis of Mann’s work. No matter how much we prove our point, it’s being denied.

  44. Mr Connolley, you’re a software designer, aren’t you? Obviously as an IT guru, you’ve an advantage over many here. Instead of issuing cheap potshots, howsabout you pitch in on this and show us if you can actually manage a simple research project? Given your track record here, it’s not like you have too many things to do, people to see, etc.

  45. @William M. Connolley says:
    February 23, 2012 at 4:28 am

    Trying to thread jack again? Your ego must be desperate for stroking.

  46. @Bill
    “As Mosher pointed out, if the text copied from the other Heartland documents is not removed, the software will say that Heartland wrote it. If you do take out those lines, the folks at DeSmog can criticize the results as you using an edited document. It still might be interesting though.”

    My recommendation is that you do NOT take anything out, but rather go paragraph by paragraph through the strategy memo. This should yield paragraphs which align with the real HI documents (i.e. cut and paste), and other paragraphs which may point to another author (our forger).

    If someone really has time, go through sentence by sentence. My caution though is that in order to quell critics, you must eventually scan the WHOLE document for attribution.
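
    A minimal sketch of the paragraph-by-paragraph idea, for anyone who wants to try it before firing up JGAAP. This is not a stylometry method, only a crude copy-and-paste detector: it scores each memo paragraph by the fraction of its five-word sequences that also appear in a set of reference texts. The function names, the choice of five-word sequences and the sample strings are all assumptions made for illustration.

```python
import re

def shingles(text, n=5):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def paragraph_overlap(memo_text, reference_texts, n=5):
    """Score each memo paragraph by the share of its n-grams
    that also occur somewhere in the reference texts."""
    ref = set()
    for t in reference_texts:
        ref |= shingles(t, n)
    results = []
    # Paragraphs are assumed to be separated by blank lines.
    for para in re.split(r"\n\s*\n", memo_text.strip()):
        grams = shingles(para, n)
        score = len(grams & ref) / len(grams) if grams else 0.0
        results.append((score, para))
    return results

if __name__ == "__main__":
    # Hypothetical stand-ins for the memo and the known documents.
    memo = ("We expect to ramp up their level of support in 2012.\n\n"
            "This paragraph is entirely new text written by somebody else.")
    known = ["We expect to ramp up their level of support in 2012 and gain access."]
    for score, para in paragraph_overlap(memo, known):
        print(f"{score:.2f}  {para[:50]}")
```

    Paragraphs scoring near 1.0 are probably lifted from the reference documents; paragraphs scoring near 0.0 are the ones worth feeding to an attribution tool.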

  47. TerryS>

    There’s not enough consistency in timestamp handling across different operating systems and applications to draw any conclusions whatsoever. It’s perfectly possible for the documents to have inconsistent timestamps without it meaning anything at all.

    For me the really odd thing is the idea that HI would send the board documents as PDF format. I very much doubt that would be the case. Making them PDFs for distribution appears to have been done by whoever released them, and this appears to be highly significant when you check the earliest timestamps.

  48. I have written the following to BBC’s Richard Black on twitlonger. I doubt that he will even read it, let alone respond, but I had to get it off my chest.

    “Have you no shame?

    The Climate Alarmist’s reasoning as to the fake document in the fakegate leak actually being genuine, would be similar to me not having a driving licence, and stealing one from someone else with the same name as me and claiming, “of course I must have passed my driving test, I have a driving licence to prove it. And look, the information on it is mostly true, and I always believed I could drive anyway, so combining all these matters I guess I can legally drive now!”

    The document which YOU claim proves that the Heartland Institute is attacking education is a FAKE! Can you understand that?

    You are crudely and unsuccessfully and dishonestly passing off FAKED information gained through deception and lies as “news” and expecting us to believe you.

    By all means, write whatever lie based rubbish you like for the likes of the Guardian or Greenpeace, but DO NOT do that on the BBC!

    I cannot believe ANYTHING you write ever again, for you are NOT a journalist in ANY rational meaning of the word. You are nothing more than a very overly privileged advocate and activist for a political cause. NOTHING MORE!

    Have the decency to apologise, resign from the BBC and go work for the Gutter press where you belong.

    At least I support the side of the debate which still supports, truth, honesty, empirical evidence, the full and strict adherence to the FULL tenets of the scientific method, freedom and openness of research and opinion, acceptance and WELCOMING of scientific debate.

    How can you look yourself in the face knowing that you are on the side which supports criminality, lies, fraud, fakery, deception, bullying, keeping secret publicly funded research, the hiding of inconvenient data, misrepresentation of data, the bullying of editors and the threats to journals to supinely cave in to the oppression by advocates of a political agenda, the imposition of “acceptable” thought upon everyone, regardless of the weakness and error-filled level of research.

    Your follow up on your BBC blog fails to address your negligence and complicity in passing off fraudulently obtained and faked information as accurate news, nor does it address your stark double standard in suppressing the climate gate emails for two weeks and then when that news broke internationally, your blatantly biased defence of the CRU at UEA, and your attacking the leak (or theft as you described it, without ANY evidence whatsoever to back up that serious allegation).

    And the difference in this fakegate case is your immediate rush to publication of what you called “leaked” information from an “insider” and then your attacking the VICTIM of this fraudulent theft and defending the thief!

    You FAILED to point out the difference between the Heartland being a private organisation which is not subject to FOIA requests, and the climate gate leaks happening largely because that data had already been subject to a FOIA request and the people at CRU ILLEGALLY withheld that public data. NOR did you point out another crucial difference in that ALL the climategate data-leaks were of GENUINE data. NONE of it faked or edited, whereas the FAKEGATE data contained damning information which was ENTIRELY FAKE! It now appears POSSIBLE that you obtained the news of this fakegate theft firsthand from Peter Gleick himself. IF that is the case, then you are guilty of being an accessory to the crime and then deliberately and wilfully misleading, (lying) to the BBC Audience about the information coming from an “insider” you knew Peter Gleick was not an “insider” of the Heartland Institute when you wrote BOTH of your misleading articles about this theft.

    Do the decent thing and resign! “

  49. We have all been acting as if we have been in a fair minded debate with persons who could be convinced if they were shown conclusive evidence. Unfortunately it seems that the opinions and positions of the warmists are not based in fact at all and that no amount of proof or evidence will ever sway their opinions. Whether their positions are based on emotion, eco-fascism, politics or money gathering – our acting as if we think they are logical adults who can look at scientific facts and be swayed has been a waste of effort. Many of the people who have shown their true stripes in the past few days were never honest brokers – they simply are pretending to be.

  50. There is plenty of proof and it is being ignored and denied. Additional proof will be denied. Spending time trying to prove something is not only wasted – it will be twisted by them to some new message.

  51. Can someone ask Dr. Gleick for a better scan of the memo he has allegedly got by snail mail? 2012 Climate Strategy.pdf is an awful B&W scan of pretty low quality.

    I suppose he still has the original in his possession. Also, he may publish a high resolution scan of the envelope the memo was sent in.

  52. Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.

    Twitter / @DeSmogBlog: Evaluation shows “Faked” H …

    Evaluation shows “Faked” Heartland Climate Strategy Memo is Authentic bit.ly/y0Z7cL – Retweeted by Michael E. Mann

    From someone who KNOWS something about “fake but authentic” [heh]!

  53. Dave,

    It is a pretty normal committee practice to have documents as pdfs, as it ensures everyone is looking at the same thing (if you use word, for example, different versions or even setups can lead to slightly different results). And, since they cannot be edited, pdfs are a permanent record – they cannot be tampered with as can documents in editable formats. This is why, for example, legal judgements are now issued in pdf in most countries.

  54. In case DeSmog does remove the documents, there is another copy at

    http://scienceblogs.com/gregladen/2012/02/heartlandgate_anti-science_ins.php

    It’s a Feb 14th post, Greg invites readers to:

    I can not prove that these documents are real or fake. I will certainly pass on to you any information that comes along about this. Have a look at the documents and make up your own mind (before I am forced by guys in suits to take down the links).

    I don’t have time today to compare them against DeSmog’s offerings, I assume they are a binary match.
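
    Rather than assume, a binary match is quick to check. Below is a minimal sketch that compares two downloaded files by SHA-256 digest; the file names in the usage comment are hypothetical placeholders for wherever the two copies were saved.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_bytes(path_a, path_b):
    """True only if both files have identical contents, byte for byte."""
    return sha256_of(path_a) == sha256_of(path_b)

# Hypothetical usage, with file names chosen for illustration only:
# same_bytes("desmog/2012 Climate Strategy.pdf", "laden/2012 Climate Strategy.pdf")
```

    Identical digests mean identical files. Different digests only say the bytes differ, not what differs or why.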

    BTW, Greg agrees that the strategy document is authentic (i.e. from Heartland), see http://scienceblogs.com/gregladen/2012/02/faked_heartland_institute_is_a.php
    to DeSmog’s Feb 22 post http://www.desmogblog.com/evaluation-shows-faked-heartland-climate-strategy-memo-authentic

    One reason they claim it’s authentic – “It also uses phrases, language and, in many cases, whole sentences that were taken directly from Heartland’s own material.” Gee, that was one reason McArdle et al thought it was a fabrication.

    They make no mention of the various Gleickisms in the strategy document or (I suppose) why Heartland would put them in. Perhaps it’s all a complex attempt by Heartland to take down Gleick. :-)

  55. Re: Dave

    The significance is that there are 2 DIFFERENT Budget documents, for example.
    The document contents are the same but the documents are different. It has nothing to do with how different OS’es handle timestamps. The file sizes are even different.

    PDF is not an editable document format. You can send, receive, view, print and even cut and paste from them. None of these actions will change the modification time (a META property of the file). The modification time for those who receive the PDF will be the same as for those who sent it.

    If, however, you had a Word document and exported it multiple times as a PDF it would have different modification times.

    The conclusion I draw from this is that DeSmog has the documents in a different format than PDF and has saved them to PDF’s multiple times.
    The question is: WHERE DID THEY GET THEM FROM?

  56. “Dave says:
    February 23, 2012 at 5:09 am
    …For me the really odd thing is the idea that HI would send the board documents as PDF format. I very much doubt that would be the case. Making them PDFs for distribution appears to have been done by whoever released them, and this appears to be highly significant when you check the earliest timestamps.”

    Actually, no – PDF format is as close as is practical to a universal format. Everybody can get the reader, and just about every platform has some form of reader, whereas that’s not always the case with proprietary software. Makes sense to save them out as PDFs.

  57. Cross posted from the Bishop’s Palace

    PS – My original idea is wrong too – copying and pasting a file into the same place renames it to “xx Copy” and then if you do it again it becomes “xx Copy (2)” – there is a slight difference between XP and 7 between the names but the word “Copy” is inserted into the new file name.

    As you were ;)

  58. TerryS says:
    February 23, 2012 at 5:49 am

    The conclusion I draw from this is that DeSmog has the documents in a different format than PDF and has saved them to PDF’s multiple times.

    It may be even simpler: DeSmog received it several times, from Gleick and different receivers who forwarded them (again) to DeSmog, or he had background conversations with Gleick, who sent him different versions of the (fake) file…

  59. Alex at 04:54PT is right– the only way this is fully explained is when HI sues Gleicker and conducts discovery, such that Gleicker must produce (under penalty for perjury) the snail-mail envelope and produce his printer for testing. That said, this group inquiry this weekend should be fun and surprisingly productive– hey, “Pajamas Media” was born 7 years ago when CBS/Dan Rather pulled the TANG nonsense. I’m sure warmists will want to participate to show that the forgery is “authentic”

  60. Mann slays me, he really does.
    He’s the kinda guy that would know an “Authentic” fake when he sees one, wouldn’t he?
    Anyone for Hockey?

  61. @Bill and everyone else,
    Ok finally had a chance to look at the documentation for JGAAP and I don’t believe it will actually be necessary to decompose the memo into paragraphs or sentences. It looks like the LDA and both of the SVM algorithms do this automatically and assign attribution to sections of the document.
    So for individuals wanting to give this a go, I’d suggest using those algorithms first.

  62. Here’s some material from the Pacific Institute site, written while the Fakegate plot was being hatched:

    Dear Friends,

    There’s a Los Angeles Times clipping from 1993 hanging on the wall at our Oakland office: “Study Forecasts Ill Effects of Possible Warming Trend.” That story was about a groundbreaking Pacific Institute study on climate change. The Pacific Institute turns 25 this year, a striking milestone. We have led the call to put climate change mitigation and adaptation into action; we have put the human right to water on the global agenda and water conservation and efficiency into local policy; we tackle environmental and human health with local communities and social responsibility with multinational corporations and the United Nations, and our influence continues to grow. Celebrating a quarter century is a time to assess how the Institute’s efforts to produce innovative, influential research and to get it into the right hands are changing things! I invite you to share this celebration with us each month and see both where we’ve been and where we’re going. This is just the beginning…

    Peter Gleick
    President and Co-founder
    Pacific Institute

    Standing Up to Editorial Bias on Climate Science – Peter Gleick Writes, Thousands Respond

    On January 27, Pacific Institute President Peter Gleick responded to the op-ed piece “No Need to Panic about Global Warming” published in The Wall Street Journal, which claimed climate change was not occurring. In his Forbes blog post, Dr. Gleick said, “The most amazing and telling evidence of the bias of The Wall Street Journal in [the climate science] field is the fact that 255 members of the United States National Academy of Sciences wrote a comparable (but scientifically accurate) essay on the realities of climate change and on the need for improved and serious public debate around the issue, offered to The Wall Street Journal, and were turned down.” That letter was subsequently published in Science magazine.

    Responses from Dr. Gleick’s post by climate scientists, environmental activists and organizations, and the public have been remarkable — sparking investigations of the sixteen scientists who signed the op-ed submitted to the WSJ; dozens of editorial pieces in prominent publications and by organizations such as The Huffington Post, Catholic Alliance for the Common Good, The New York Times Dot Earth Blog, The Australian, Media Matters, and Climate Progress debunking the WSJ’s flawed and misleading arguments about climate science; and thousands of comments and conversations on blogs, social media outlets, and forums on the current climate dialog in the media. Later that week, The Wall Street Journal implicitly acknowledged their “climate goof” by accepting a letter from 38 climate scientists, including Peter Gleick, who responded to the WSJ op-ed.

    “While much of the opposition to addressing the issue of climate change is political,” Dr. Gleick commented, “it often hides behind pseudo-scientific claims, with persistent efforts to intentionally mislead the public and policymakers with bad science about climate change.”

    With efforts by well-funded climate change deniers to sow confusion and delay action by Congress and the public, the Pacific Institute has taken a stand to address climate science misinformation and to push for planning to prepare for increasingly severe negative impacts of climate change.

    -Read Peter Gleick’s blog “Remarkable Editorial Bias on Climate Science at The Wall Street Journal.”
    -Read the climate scientists’ Wall Street Journal letter.
    -More about climate change.

  63. Heartland’s lawyers need to send Gleick a letter asking him to preserve his cell phone, and not to delete any of his texts and e-mails. They need to be sure he preserves all communications by which he set up the one- time-only e-mail account.

    Given his amateur and apparently rushed forgery, I am sure he left plenty of tracks. He will be under tremendous pressure to try to cover those tracks which will only make things worse.

  64. Changed document numbers ((2), (3)) might have a quite simple explanation: when downloading documents from web (including web-based e-mail) using Windows 7 (and possibly other OS), first instance of the document will be saved with correct title (i.e. filename.doc).
    Every time you download the same file after that, W7 will save the file with added number (1), (2), (3) etc. (ie filename (1).doc, filename (2).doc, etc)

  65. It may be even simpler: DeSmog received it several times, from Gleick and different receivers who forwarded them (again) to DeSmog, or he had background conversations with Gleick, who sent him different versions of the (fake) file…

    Gleick makes no claims about sending it multiple times. In fact, in the email he sent out he even states that the mailbox they are sent from is going to be deleted so there is no possibility of either sending again or having an email exchange.

    No matter how many times you forward a PDF or save it or print it or whatever, the modification time in the META data will never change.

    For those who doubt this I’ll issue a challenge. Change the meta data modification time (and the Instance UUID) in any one of the PDFs to a different time and post the modified PDF somewhere I can get it (to verify it) together with what you did to achieve this.
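
    For anyone taking up the challenge, here is a minimal sketch for reading the dates stored inside a PDF, as opposed to the timestamp the operating system keeps for the file. It simply scans the raw bytes for the standard /CreationDate and /ModDate entries and for the XMP elements xmp:CreateDate, xmp:ModifyDate and xmpMM:InstanceID. It is an assumption that these entries are stored uncompressed and unencrypted, and that the XMP values are written as elements rather than attributes; where that does not hold, the scan finds nothing and a real PDF library is needed. The function name and sample bytes are made up for illustration.

```python
import re

# Info-dictionary dates, e.g. /ModDate (D:20120214123620)
DATE_KEYS = re.compile(rb"/(CreationDate|ModDate)\s*\(([^)]*)\)")
# XMP packet values written in element form
XMP_KEYS = re.compile(
    rb"<(xmp:ModifyDate|xmp:CreateDate|xmpMM:InstanceID)>([^<]*)<"
)

def embedded_metadata(pdf_bytes):
    """Return (key, value) pairs for date and instance-ID entries
    found as plain text in the raw PDF bytes."""
    found = []
    for pattern in (DATE_KEYS, XMP_KEYS):
        for key, value in pattern.findall(pdf_bytes):
            found.append((key.decode("ascii"), value.decode("latin-1")))
    return found

if __name__ == "__main__":
    # Synthetic bytes standing in for a real file's contents.
    sample = (b"<< /CreationDate (D:20120213124152) "
              b"/ModDate (D:20120214123620) >> "
              b"<xmpMM:InstanceID>uuid:example</xmpMM:InstanceID>")
    for key, value in embedded_metadata(sample):
        print(key, value)
```

    To run it on a real file, read the file in binary mode and pass the bytes in. Comparing the output for two copies shows whether their embedded dates and instance IDs agree, whatever the file names or download times say.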

  66. Also may want to consider our list of potential authors. At a minimum, I’m including Bast, McElrath, Gleick, and a group that I call HI-unattributed (which includes a scattering of other heartland documents of mixed authorship).
    Now to counter criticism that the sample does not include a fair representation of “Lucy Ramirez’s”, I’m also thinking about including Littlemore, DeMelle, and Laden.
    We’ll see if it crashes my computer.

  67. Some change in comment: if you receive several times a document by mail with the same title, a mail program automatically adds (2), (3),… to the latest version, at least it does in Eudora, before storing it in the attachment box. Thus Greg Laden received the same or different files with the same title.

    That can be copies forwarded by others, who received it from Gleick too, but that may be that the fake document and/or others was/were forwarded several times by Gleick, awaiting comments of one or more of the “15” and altering some parts of the document. DeSmog was maybe smart enough to skip the receive number addition, but Greg only published the latest version as is.

  68. How often do climate scientists change the leading in their documents?
    How many climate scientists know how to change the leading in their documents?
    How many climate scientists know what leading is?

    “Leading” is the spacing between lines in a document; Microsoft Word actually uses the term “Spacing”, and it is very clunky to change.

    Another tell is the graphic styling, the style of the fake memo is different than the authentic ones. For instance the fake one uses 16 pt leading and all of the Heartland documents are all 14 pt even when double spaced. Plus there are other graphic styling differences, I point them out here.

    http://www.minnesotansforglobalwarming.com/m4gw/2012/02/desmogbloggate.html

    If anybody has access to any documents produced by Gleick, send them to me and I can analyze them.

  69. TerryS says:
    February 23, 2012 at 6:42 am

    Gleick makes no claims about sending it multiple times. In fact, in the email he sent out he even states that the mailbox they are sent from is going to be deleted so there is no possibility of either sending again or having an email exchange.

    That is what Gleick says, but at least Greg Laden received the same (or different versions of the) fake file at least three times… But indeed it may be simple forwarding by mutual friends…

  70. Re: Ferdinand Engelbeen

    if you receive several times a document by mail with the same title, a mail program automatically adds (2), (3),… to the latest version

    The mail program will not change either the modification time or the Instance UUID contained in the meta data of the PDFs. You can rename the file to whatever you like, the meta data will remain the same.
    Gleick only sent the documents out once and deleted the mailbox he sent it from so there should be no possibility of comments or multiple.
    If DeSmog has copies of the PDFs that pre-date those sent to the “15” then DeSmog acted in concert with Gleick and are thus claiming to have verified documents they helped to compile.

  71. TerryS says on February 23, 2012 at 2:40 am

    Re: Morph
    You are correct. The document has changed.
    The Modified time has changed from Mon Feb 13 12:41:52 2012 to Tue Feb 14 12:36:20 2012.

    (1) Also, do not overlook the fact that PC time can be changed, to ‘backdate docs’ (when reproduced ‘again’) for instance …

    (2) And – PDF docs can be edited as they stand (i.e., they need not be the product or output of some other process e.g. Adobe Acrobat, Distiller PDF-Print etc. to have content changed), using any one of several pdf edit tools.


  72. For samples of writing to test see:
    Peter Gleick’s Blog: Water By Numbers
    Note his characterization: Climate BS Award

    Fourth Place: The Koch Brothers for funding the promotion of bad climate science.
    Fourth place goes to fossil-fuel billionaires Charles and David Koch of Koch Industries, Inc., who provide substantial funding to groups and politicians who deny the science of climate change. The Koch brothers fund a veritable Who’s Who of groups that put out misleading science or tout bad science on climate change as an intentional strategy.

    Fifth Place: Anthony Watts for his BEST hypocrisy
    Anti-climate-science blogger Anthony Watts . . .

    Runners-Up in 2011 included:
    Harrison Schmitt and the Heartland Institute for “Arcticgate” (documented errors in denying disappearance of Arctic sea ice); Rush Limbaugh for his consistent falsehoods about climate science; and Steve McIntyre for his smear of climate scientist Dr. Michael Mann of Penn State University.

    For potential people and training documents see:
    The Pacific Institute http://pacinst.org/
    Pacific Institute Staff & Board
    Pacific Institute Publications

    National Center for Science Education
    NCSE Board of Directors
    NCSE Staff
    NCSE Supporters

  73. Berényi Péter said, “Can someone ask Dr. Gleick for a better scan of the memo he has allegedly got by snail mail? 2012 Climate Strategy.pdf is an awful B&W scan of pretty low quality….I suppose he still has the original in his possession. Also, he may publish a high resolution scan of the envelope the memo was sent in.”

    Ha! That’s funny, Mr Berényi. Anthony’s current venture has no doubt caused Dr Gleick to freak again and so has no doubt misplaced / spring cleaned-away / accidentally eaten or otherwise lost the habeas documenti. I’m betting too that his laptop was stolen or was inadvertently put by someone in his fridge whereupon the powerful fridge magnets regretfully wiped out his hard drive. Or, perhaps it was a solar magnetic storm. Yeah, that’s the ticket; a solar forcing.

    Paging Mr Connolley ! You seemed so impatient earlier on. Come on now Sir, show us lazy peons how to run a proper document analysis. An aggressive, hyper-active Alpha male like you should have had it done, published and peer-reviewed by now.

  74. For writing samples and persons see:
    The Pacific Institute
    Peter Gleick’s Blogs
    Note
    2011 Climate B.S. of the Year Awards

    Fourth Place: The Koch Brothers for funding the promotion of bad climate science
    Fourth place goes to fossil-fuel billionaires Charles and David Koch of Koch Industries, Inc., who provide substantial funding to groups and politicians who deny the science of climate change. The Koch brothers fund a veritable Who’s Who of groups that put out misleading science or tout bad science on climate change as an intentional strategy.
    Runners-Up in 2011 included:
    Harrison Schmitt and the Heartland Institute for “Arcticgate” (documented errors in denying disappearance of Arctic sea ice);

    Also: The National Center for Science Education http://ncse.com/about/

  75. They are almost forcing Heartland Institute to take steps which will force Dr. Gleick to testify under oath. Then lying will be a felony for which he would likely serve time–the justice system cannot allow perjury to go unpunished. If Dr. Gleick is forced to admit that he forged the key document, DeSmogBlog will be exposed as what we already know they are.

  76. TerryS says on February 23, 2012 at 5:49 am:

    PDF is not an editable document format.

    Actually, using 3rd party tools they are editable; sometimes in limited ways (font sets might be restricted for instance depending on the 3rd party tool).

    I have used such 3rd party tools to annotate/markup for shop (floor) personnel various technical ‘schematic’ pdfs with additional info. At other times I have also ‘scrubbed’ various identifying title blocks and such from PDF files, and on occasion even unselected the ‘no select-and-copy’ option feature that can be invoked when printing/producing/outputting a PDF file.

    Examples of 3rd party pdf editors: https://www.google.com/search?client=opera&rls=en&q=pdf+editor&sourceid=opera&ie=utf-8&oe=utf-8


  77. Good one, Mr Hardy Cross! The word subset certainly appears frequently enough. We all become enamoured of certain words or terms and use them heavily. Now some may say that the software is configged to spot such repetition, but one can’t be always sure, as we use many words over and over again and the programmers may not have been able to design for or calibrate for unusual repetitions. Even with our advanced systems, it never hurts to eyeball something whenever possible.

  78. Ok- so maybe this is TL;DR stuff.
    I did some research into the documents. I found some things that were inconsistent between the Strategy document and the other Heartland documents. They may be small, but if the Strategy document is coming from a member of the Heartland board, wouldn’t it be at least correct and consistent?
    I dunno, maybe it’s just me….

    “Strategy” document – “Development of our ‘Global Warming Curriculum for K-12 Classrooms’ project.”
    Heartland 2012 Fundraising plan calls this the “Global Warming Curriculum for K-12 Schools”

    “Strategy” document – “We tentatively plan to pay Dr. Wojick $100,000 for 20 modules in 2012, with funding pledged by the Anonymous Donor.”
    Heartland 2012 Fundraising plan – “We tentatively plan to pay Dr. Wojick $5,000 per module, about $25,000 per quarter, starting in the second quarter of 2012, for this work. The Anonymous Donor has pledged $100,000 for this project” Wouldn’t that be paying Dr. Wojick $75,000? The other $25,000 for the project would be for materials, supplies for the curriculum, etc? At $5,000 per module for THREE quarters of 2012, that would be 15 modules total, not 20.
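
    The arithmetic behind that, spelled out as a small sketch. The only inputs are the figures quoted from the Fundraising plan; the three-quarter count assumes payments run from the second quarter through the fourth quarter of 2012, and the variable names are just for illustration.

```python
PER_MODULE = 5_000        # dollars per module (Fundraising plan)
PER_QUARTER = 25_000      # dollars per quarter (Fundraising plan)
QUARTERS_IN_2012 = 3      # assumption: Q2, Q3 and Q4 of 2012
PLEDGED = 100_000         # Anonymous Donor pledge for the project

modules_per_quarter = PER_QUARTER // PER_MODULE        # 5 modules
modules_2012 = modules_per_quarter * QUARTERS_IN_2012  # 15 modules
paid_2012 = PER_QUARTER * QUARTERS_IN_2012             # 75,000 dollars
remainder = PLEDGED - paid_2012                        # 25,000 dollars

print(modules_2012, paid_2012, remainder)
```

    So the Fundraising plan’s own numbers give 15 modules and $75,000 in 2012, against the 20 modules and $100,000 stated in the “Strategy” document.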

    “Strategy” document – Regarding the drop in funding by the Anonymous Donor – “He has promised an increase in 2012-see the 2011 Fourth Quarter Financial Report” There is no mention in the Fourth Quarter financial report (part of the “Binder” document) of a promise by the Anonymous Donor to increase donations. The paragraph on page 10 of the “Binder” document regarding the Anonymous Donor has been cut off. In the Fundraising plan (pg. 20) Heartland mentions the projects the Anonymous Donor has agreed to fund and “those we hope he will agree to fund as the year progresses.” The amount he has already pledged is $1,000,000 (Fundraising document, pg 21, Table 6), which is an increase of $21,000 over the 2011 giving total.

    “Strategy” document – regarding funding from the Koch foundation in paragraph “Increased climate project fundraising” – “We expect to push up their level of support in 2012 and gain access to their network of philanthropists”
    Heartland 2012 Fundraising plan – “We expect to ramp up their level of support in 2012 and gain access to the network of philanthropists they work with.” Pg 7

    “Strategy” document; Funding for parallel organizations – “At present we sponsor the NIPCC to undermine the official United Nation’s IPCC reports and paid a team of writers $388,000 in 2011 to work on a series of editions of Climate Change Reconsidered.” “Another $88,000 is earmarked this year for Heartland staff, incremental expenses, and overhead for editing, expense reimbursement for the authors, and marketing.”
    Heartland 2012 Fundraising plan, pg 13 – “Heartland pays a team of scientists approximately $300,000 a year to work on a series of editions of Climate Change Reconsidered, the most comprehensive rebuttal of the United Nations’ IPCC reports. Another $88,000 is earmarked for Heartland staff, incremental expenses, and overhead for editing, expense reimbursement for the authors, and marketing.”
    Heartland pays the scientists $300,000 a year, NOT $388,000. The total with the expenses is $388,000.
    However, in the 2012 Heartland Budget, pg. 8, table 2 – payment to lead authors and contributors is listed as $140,000 in 2011. The rest of the 2011 budget went to center staff ($140,000), SEPP to recruit authors and host meetings ($84,000) and Heartland to fundraise, edit, proof, publish and promote the book ($24,000).
    Also notable is the changing of “scientists” as authors of the NIPCC in the Heartland document to “writers” in the “Strategy” document. Telling that the author of the “Strategy” document does not consider the NIPCC to be a scientific document.

    “Strategy” Document – Funding for selected individuals outside of Heartland. The funding levels mentioned for Craig Idso, Fred Singer and Robert Carter are payment for work on the NIPCC report. The “Strategy” document makes no mention of this.

    The “Strategy” document uses the term “consider expanding…, if funding can be (found) obtained” twice on one page.
    The last paragraph of the “Strategy” document also seems to be entirely someone’s opinion. The only fact I can find in this paragraph supported by any of the Heartland documents is the funding for Anthony Watts for tracking temperature station data.

    Use of the term “anti-climate” in the “Strategy” document; this term is being used by a supposed Heartland board member & AGW skeptic to describe other AGW skeptics. – I found 4 Heartland documents that use the term “anti-climate”. The term is consistently used by AGW proponents to describe AGW skeptics. It is not a term AGW skeptics have used to describe themselves.

    FWIW…..

  79. Acrobat is made by Adobe; the Adobe Creative Suite comes with Illustrator.
    With Illustrator you can open any “unlocked” pdf file (which the Heartland Documents were) and do ANYTHING to that document then simply resave it as a pdf.

  80. PDF files are fully (if rather messily) editable with Adobe Illustrator and marginally editable with Adobe Acrobat and a number of 3rd party editors.
    The problem is that a PDF file consists of a number of objects that have to be edited separately, and the composition of the objects can be very weird. Usually it is easier to edit the source file and make a new PDF, but if this isn’t possible, fairly extensive editing of the PDF is perfectly possible but time-consuming.

  81. @TerryS says:
    February 23, 2012 at 5:49 am

    “Re: Dave

    The significance is that there are 2 DIFFERENT Budget documents, for example.
    The document contents are the same but the documents are different. It has nothing to do with how different OS’es handle timestamps. The file sizes are even different.

    PDF is not an editable document format. ”

    Actually you can, all you need is the right tools.

    A PDF analyzer, not free, that is able to erase incriminating metadata; otherwise PDF files record every byte put into them, even after you have deleted the bytes.
    http://www.pdfanalyzer dot com

    A free PDF analyzer; to edit, however, you need to pay

    http://www.softpedia.com/get/Office-tools/PDF/PDFAnalyzer.shtml

    A list of different tools for pdf and other documents

    http://zeltser.com/reverse-malware/analyzing-malicious-documents.html

    How to put viruses into pdf

    http://www.sudosecure.net/archives/636

    Free pdf edit tool

    http://www.labnol.org/software/edit-pdf-files/10870/

    So, there’s a bunch of stuff you can do. :p

  82. Here is a sample from the faked HI doc:

    “Efforts at places such as Forbes are especially important now that they have begun to allow high-profile climate scientists (such as Gleick) to post warmist science essays that counter our own. This influential audience has usually been reliably anti-climate and it is important to keep opposing voices out.”

    Here is Gleick’s review of Donna Laframboise’s book, which he wrote in October:

    “This book is a stunning compilation of lies, misrepresentations, and falsehoods about the fundamental science of climate change. It compiles the old arguments, long refuted, about the Intergovernmental Panel on Climate Change, which summarizes the state of science on climate change. The IPCC reports — the most comprehensive summary of climate science in the world — are so influential and important, that they must be challenged by climate change deniers, who have no other science to stand on. LaFramboise recycles these critiques in a form bound to find favor with those who hate science, fear science, or are afraid that if climate change is real and caused by humans then governments will have to act (and they hate government)….

    Are you already convinced that climate change is false? Then you don’t need this book, since there is nothing new in it for you. If you respect science, then you ALSO don’t need this book, since there’s no science in it, and lots of pseudo-science and misrepresentations of science. See, especially, the section trying to discredit the “hockey stick” — long a bugaboo of the anti-climate change crowd. Seven independent scientific commissions and studies have separately verified it, but you won’t find out about that in this book.”

    Granted, two words do not automatically show that he wrote the fake doc. That said, those of us who write a lot have our own linguistic quirks, our own personal style. Unless you are trained in the art of writing dialog, or something along those lines, these quirks and tics typically go unnoticed by the writer. Again, the use of the words “influential” and “anti-climate” doesn’t prove that he wrote the fake, but the use of the same words and language does not help his cause.

  83. Re: _Jim
    > Actually, using 3rd party tools they are editable;
    Re: MissBrooks
    > PDFs are actually (marginally) editable, if you have Acrobat Pro.

    And what possible reason is there for DeSmog to employ these tools and modify the PDFs?

    I accept that there might well be a perfectly innocent explanation of why DeSmog has 2 different versions of 4 of the files, but I cannot think of one. Why, for example, would they open the PDFs in an editor?

  84. “Who elected Connolley to decide what is “content free”? This isn’t Wikipedia, where Connolley can make decisions. He is only another commenter here. And as we can see, he is usually wrong.”

    I have no love for Connolley, who is an eco-bigot who has done much damage to the usefulness of Wikipedia. However, once again I have to say he’s right.

    This effort at crowd sourcing seems to have a very poor response rate. Lots of chaff about PDFs, but if no one is analysing them, comment is immaterial.

    I think the idea is interesting but I have some climate science I’d rather do instead of invest time in what Gleick is going to have to come clean on at some stage.

    He looks like a kind and decent guy; I don’t think he will stand up well to the prospect of doing serious time for forgery.

  85. Rude: No dice, Gleick implies he got the memo separately, from an anonymous source, by snail mail. Very convenient, no?

    Sherlock Mosher has already pointed out that this is highly unlikely due to the fact that there are no crease lines from folding in the fake document….

  86. One suggestion on the crowdsourcing: I am not familiar with the tool you use, but analyses like these are subject to both false negatives and false positives. If the tool seems to point to Gleick as a likely author, we are not worried about false negatives. But there is still the problem of false positives. The tool needs to be run against other climate authors to see how they score as authors. If I were a defender of Gleick, and your tool came up with a “positive” match, the first thing I would try in order to defend my guy is to run the tool against other authors and see if I can get other positives. If they are able to get positives against, say, yourself or even Madonna as authors, that is going to undermine your results. So I would recommend that you understand the false positive rate of your tool.
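
    One way to estimate that false positive rate is sketched below. It assumes you can wrap the tool in a function that returns its top-ranked author for a text; that wrapper (`attribute`) and the control set are hypothetical, not part of JGAAP.

```python
from typing import Callable, Iterable, Tuple


def false_positive_rate(
    attribute: Callable[[str], str],
    controls: Iterable[Tuple[str, str]],
    suspect: str,
) -> float:
    """Fraction of control texts NOT written by `suspect` that the
    tool nevertheless attributes to `suspect`.

    `attribute` is a hypothetical wrapper returning the tool's
    top-ranked author; `controls` yields (text, true_author) pairs.
    """
    total = 0
    hits = 0
    for text, true_author in controls:
        if true_author == suspect:
            continue  # only texts by other authors can be false positives
        total += 1
        if attribute(text) == suspect:
            hits += 1
    if total == 0:
        raise ValueError("need at least one control text by another author")
    return hits / total
```

    A high rate on texts by unrelated authors would mean a "match" on the memo says little.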

  87. Mr Connolley seems to imagine that in the time between late last night and this morning that many of us should have found the time to perform this experiment already. I guess we should have forgone sleep. After all, if we can find the time to spend 30 seconds writing a blog posting, we should have been able to perform this experiment by now.

    This cannot possibly be done as quickly as Mr. Connolley seems to think it should be done. Something like this effort will likely require free time over a weekend or something like that. Counting the number of posts which occur before the first results are in is embarrassingly childish in nature.

  88. @Aaron,
    I would even go further to suggest that people who are doing this checking, should hold onto their results until they have done several runs using different options within the software.
    Then when you DO publish, be prepared to discuss BOTH negative and positive results.
    After all, I believe that is how “science” is supposed to be done.

  89. The document and the envelope: Was it sent by US mail? I would guess that Peter Gleick would say that he threw it away and it is gone. But if it were anonymously mailed, my common sense would want to know from where the letter came. Maybe Peter Gleick has similar common sense and knows the place of mailing and even kept the envelope.

  90. Sonicfrog, (February 23, 2012 at 7:52 am), I agree with you that the use of the terms “anti-climate” and “influential” you highlighted may not make a strong case for forensically identifying the author, but the use of the former has raised many eyebrows. I cannot imagine a skeptic calling his position “anti-climate.” It would be akin to, for example, an antisemite today referring to himself as an antisemite, when the common and favoured terms are “critic of Israel” or “anti-Zionist.” My guess is that the doc has been put together by someone who may be otherwise a bright person, but an utter doofus when it comes to IT-based forgery, which our digitized environment has made harder.

  91. Michael J says:
    February 23, 2012 at 12:39 am

    I have a new theory about the fake document.

    I suspect that it was sent to him by a colleague or, more likely, an opponent for the specific purpose of yanking his chain.

    They hoped to get a laugh as Dr Gleick’s anger and hatred blinded him to the document’s obvious faults. However even the provocateur(s) could not have anticipated Dr Gleick’s actions.

    ================

    This gave me a good laugh. As did imagining the panic that would have gripped the prankster when Gleick told him about it.

  92. William M. Connolley says February 23, 2012 at 8:09 am

    Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.

    That is what is called “a drive-by”; it was contentless as well as witlessly done.

    Somehow one would expect more from the big kahuna-combination of Climate + Wiki …

    Say, Big Kahuna, how do you know what is taking place behind the scenes? Have you no imagination, no ability to think beyond your own 6′ x 8′ cubicle?

    .

  93. Has Andrew Revkin or any other journalist made an effort to directly contact Gleick and ask him if he fabricated (or had any involvement in the fabrication of) the “Climate Strategy” memo?

  94. William M Connolley has form in faking information. Perhaps he could give us pointers on how he does it? It seems a pity to waste all that experience when it is close at hand.

  95. “Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.” –William M. Connolley (February 23, 2012 at 8:09 am)

    And right indeed you are, Mr Connolley, we’re all a pile of lazy loafs here, it would seem. Shame on us. All the more reason for you to take up my challenge to you, to roll up your sleeves and show us how to do a man’s job properly. An interesting experiment to see how devoted you are to the muse of science, and o, what a delightful screamer it would be if it were to be you who conclusively nailed the identity of the fraudster! I bet you still have the mojo for that.

    Still, in defense of us lazy bones, some of us like me are lucky to manage a PC for email and to even tart up our simpletonian postings with html, and all here appear to be cursed with day jobs which limit us to quick missives throughout the workday. Perhaps if the folks who keep you in supply of your daily pint and bangers-‘n-beans would throw some green paper stuff our way….

  96. One thing to note about the scanned PDF: The ‘readable’ text was created by the EPSON’s OCR software as invisible overlay. This is pretty common – it makes the scan’s text searchable. But the OCR software will generate errors in spacing and spelling that are NOT in the original text. For a cleaner analysis the text should be cleaned up to actually match the original before running the JGAAP software….. Anyway, that’s just a thought. I’ll try and remember to try this out later!
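
    A first pass at that cleanup could look like the sketch below. It only repairs spacing and line-break damage; OCR spelling errors still need proofreading by eye against the scan. The function name and the specific rules are assumptions for illustration, not anything JGAAP provides.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Rough cleanup of an OCR text layer before stylometric analysis."""
    text = raw.replace("\u00ad", "")  # drop soft hyphens
    # Rejoin words split across a line break ("Heart-\nland" -> "Heartland").
    # Note: this also fuses genuinely hyphenated compounds that happen
    # to break at the hyphen, so check the result by eye.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)     # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line
    # A single newline inside a paragraph is just a wrapped line.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text.strip()
```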

    Also, I keep seeing people bringing up the ‘yellow dot tracking’ – it doesn’t look like the original scan was made at a resolution where those would be detectable. They’re just low res black and white bitmaps.

  97. William M. Connolley says:
    February 23, 2012 at 8:09 am

    Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.
    _________________________
    By your own standard, you are guilty of doing absolutely nothing but spouting off.

    Not that it matters, your Wiki antics have lent you about as much credibility as the statement: “Guaranteed by Enron”.

  98. I think there might have been a header of some sort that was cropped out (sloppily, I might add) on the 2012 Climate Strategy.pdf:

    There is also a slight trace of the same artifact on page 1.

    After comparing page 2 with a boatload of scanned docs on my drive, I cannot find a similar type of artifact. If page 2 was simply misaligned when it was scanned, the vertical line would have extended the entire length of the page.
    Since it occurs in the earliest instance of the memo, it would rule out any ex post facto alterations by DeSmog. It doesn’t rule out any involvement they might have had in the original creation, though.

    That being said, feel free to shoot it down with a better theory. :)

  99. Thought I’d put another comment on for Connolley to count. OCD types just can’t help themselves. I’ll get to the “work” when MY work allows it.

  100. OK Bill lets try one more time.

    ‘Besides, it will be fun and we’ll all learn something.’

    I did apply for a climate grant but as I couldn’t promise that the answer would be CAGW they turned me down.

  101. Mr. Cannoli, I suggest that before you shoot your mouth off yet one more time, you do the following to teach everyone how it is done (if you aren’t too busy guarding your stash of misinformation on Wiki) :

    Download the program.

    Learn from scratch how it works. Determine the various choices to be made in the large variety of analysis methods available within the program – there are many. Of course, your thorough knowledge of statistical techniques including the principles of cluster analysis will come in handy here.

    Gather together sufficient samples of text to work on. The program requires not only those created by the author under investigation, but also others with which comparisons need to be made. The results order the authors according to the likelihood of ownership of the unknown text, so external comparisons are important.

    Do the analyses (plural, not singular), for properly evaluating the results.

    Come back within 24 hours with the answers.

    Although the discussion here has been helpful, what would be even more useful is an assembled collection of texts written by Gleick as well as several other persons, possibly including the De SmugBlog posters, who should not be dismissed as possible contributors. The program can supposedly read Word and pdf files, but mine seemed to give error messages when encountering docx files. Text files seem to run reasonably well.

  102. William seems to be setting the bar at the standard set by Gavin, who worked right through the Superbowl to beat McIntyre to a GHCN error that McIntyre discovered.

  103. I’ve downloaded the program and tried to process 3 times but just keep on getting an error of “Experiment failed to complete”. Perhaps someone else could actually try to give it a go? It takes about 5 minutes to download and run. All you need to do is add in the fake document to the “Unknown Author” section, then add in documents from known authors including Gleick.

    The updated user guide is here. http://evllabs.com/jgaap/5.2/JGAAP_User_Guide.pdf

    I honestly didn’t know what all the different types of analysis meant so I just clicked “All” on every tab and then hit “Process” on the last tab. Maybe that was my mistake.

  104. Good one, DukeC!

    I do work with images and imagine myself a bit of a Photoshop expert, if I may blow my own horn, and so it’s my semi-educated guess that the fix-up may have been done manually on the original, possibly with a white tape or a label. Still, we cannot discount that it was done digitally with a rectangular block and then re-scanned in lower resolution. This low resolution pixelation you see on either what is a shadow from a tape or a label (my guess), or a leftover logo bit, which is evident on the sloppily left-out line on the left-hand side, is dimensionally quite similar if not identical to the resolution on the text, and spills over to the right-hand side of the line, instead of being crispy sharp, had the obscuring been done at the last stage and at a high res. I hope all this made sense.

  105. “Duke C. says:
    February 23, 2012 at 9:31 am
    I think there might have been a header of some sort that was cropped out (sloppily, I might add) on the 2012 Climate Strategy.pdf”

    Hmmmmm I don’t think that’s part of a header… It just looks like a smudge on the scanner glass.

  106. Since our friend Willie isn’t reading my posts or rising to my goading to help out, I’ll say sotto scriptum, that Connolley is merely goading everyone out of sheer nervousness. If so, just imagine what Dr Gleick must be feeling like, shvitzing as we merrily stumble along this excellent sleuthing adventure Anthony was good enough to throw for us. To indulge in some cheap parlour-psychology, all for entertainment value of course, it’s my esteemed opinion as a wag that with his seemingly gratuitous taunts and mockery, Connolley is exhibiting for our pleasure what A Type (both alpha and –ss hole) personalities tend to do when very stressed. In any case, keep in mind that had this process here – which is actually moving along quite nicely with the preliminary discussions and levity – been moving faster, he’d be accusing everyone of jumping the gun. After all, Connolley does have the ethics and integrity of his favourite beast, the stoat, which is a kind of a weasel.

  107. Peter Kovachev says:
    February 23, 2012 at 10:19 am
    “… it’s my semi-educated guess that the fix-up may have been done manually on the original, possibly with a white tape or a label.”

    That’s plausible. :)

    Although the type starts so high up on the page that it doesn’t look like it was intended to print on letterhead stationery. Hard to say!

  108. @Ian Hoder
    I haven’t had a chance to play with JGAAP myself yet, will do that tonight, but have been reading the documentation. I would NOT select “all” since the developers warn that they have a memory leak problem, which is probably why it is dying every time.
    My recommended settings would be:
    – Canonicizers: Normalize ASCII and Whitespace, Strip Punctuation, and Unify Case
    – Event Drivers: Character Grams (N=4, N=5), M…N letter words (M=6, N=12)
    – Event Culling: Most Common Events (N=100 or other fairly large number)
    – Analysis Methods: Try these ONE at a time (Gaussian SVM, LDA, Linear SVM)

    Good luck and let us know how that works.
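
    For anyone wondering what those settings amount to, here is a rough Python sketch of the same pipeline: canonicize the text, extract character n-grams or words of a given length, then keep only the most common events. These are approximations written for illustration; the function names are made up and JGAAP's own implementations may differ in detail.

```python
import re
import string
from collections import Counter


def canonicize(text: str) -> str:
    """Approximate 'Normalize ASCII and Whitespace, Strip Punctuation, Unify Case'."""
    text = text.encode("ascii", "ignore").decode("ascii")  # crude: drops non-ASCII
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams (the 'Character Grams' events)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def words_of_length(text: str, m: int, n: int) -> list:
    """Words with between m and n letters (the 'M...N letter words' events)."""
    return [w for w in text.split() if m <= len(w) <= n]


def most_common_events(counts: Counter, k: int) -> dict:
    """Event culling: keep only the k most frequent events."""
    return dict(counts.most_common(k))
```

    Typical use would be `most_common_events(char_ngrams(canonicize(doc), 4), 100)` for each document, with the resulting profiles handed to whichever analysis method is being tried.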

  109. A dood, beg to respectfully differ with your smudge hypothesis. If you were to look again, you’ll note that the anomaly is unnaturally regular; rectangular, straight lined and with an apparent corner. I still place my wager on a shadow caused by a white tape or a label. On second look, it may actually be a shadow caused by the upper corner and edge of the document page, although the abrupt, un-tapered ending of the bottom bit of the line on the left hand side would militate against that. Argh, what I would give for the original doc! Dr Gleick, if you’re reading all our prattle through a curtain of tears, be a jolly good sport and hand the item over to the authorities. Do that and I promise to make you a soap-on-a-rope with your name carved onto it.

  110. “Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.” –William M. Connolley (February 23, 2012 at 8:09 am)

    Actually several other lines of useful enquiry have been opened.

    1) is the original pdf of a folded document? It will take a lot of explaining if it is not.

    2) why is the leading wrong?

    Of course we know the document is faked. You have to be a particular sort of blind not to see it.

    I would find myself leaving any site that thought defending forged documents was fine. I have a list of sites I never visit for ethical violations well below that. How low do you have to sink that defending forgery becomes acceptable?

    • @Mooloo – Apparently Mr. Connolley is too high and mighty to try himself, but would rather deride others for not instantly publishing what they may be working on. -or- Maybe he’s a “genius” like Dr. Gleick, and has no learning curve for new software and new techniques. – Anthony

  111. Charles Bruce Richardson (February 23, 2012 at 7:28 am)
    “Then lying will be a felony for which he would likely serve time–the justice system cannot allow perjury to go unpunished.”

    Unfortunately not true. As Bill Clinton repeatedly demonstrated. Ask any member of the Democratic Party – they will make it clear that liberals consider themselves above the law.

  112. @Michael Tobis,
    I see that the results based on scanning the document as a whole are unsurprising from your POV. And in fact were anticipated by myself. If you’ll scan up many comments, you’ll note that one of my suggestions was that each paragraph/section should be scanned individually. Most problematically, I would suggest the “Expanded Communications” section.

  113. Re:File time – this can be faked but a lot of computers automatically set the time back based on either a network time server or the interweb – in fact it can be a real pain when you want to fake it – e.g. when testing software.

  114. Good heavens, kim2000. I hate you, how dare you blow my beautiful hypothesis to smithereens! Alas, I think you are onto something!

    Alright, bugger my tape/label hypothesis. The Pacific Institute header, which appears even on the inside or secondary pages, totally jibes proportionally with the page edge and text distance, and the rectangle bit is a shoo-in. Bloody marvelous, Mr/Ms kim2000... I’d offer to kiss you if I knew your gender.

    Me, I’m thinking that with all the re-use and sustainability hoopla eco-critters have conditioned themselves to, it would be just like a Pacific employee to conscientiously recycle an old document from their re-use bin in the copier or mail room. LOL! What poetic justice that would be!

  115. William M. Connolley says:
    February 23, 2012 at 8:09 am

    Up to 98 comments. Lots of people spouting off, but no-one has actually done any work.
    ============================================================

  116. Peter Kovachev says:
    February 23, 2012 at 11:31 am

    I’m a girl ;)
    I’m a kid
    …………………………
    Mr Watts do I get a hat tip..please?

    [No. However, using hockey stick terminology, you can perform a "hat trick" ... 8<) Robt]

  117. LOL! No kiss from me, then; I’m too old! And apologies for my sailor’s lingo, young Missy…must remember there’re young-uns here too.

    But yes, how about it, Mr Watts, a hat tip for the young lady at least? This thing might have easily been missed by all of us.

  118. Just noticed Kim2000’s graphics, looks plausible, as the aspect ratio of the border seems identical. Further investigation needed.

    Give the kid a gold star!

  119. kim2ooo says:
    February 23, 2012 at 11:37 am
    I’m a girl ;)
    I’m a kid
    …………………………
    Mr Watts do I get a hat tip..please?
    ==============================================
    ABSOLUTELY!……………………… :-D

  120. Linux/Unix has a utility called ‘touch’ that updates the time stamp to the current time or any other time and date. That existed for DOS too as a utility, distributed with Borland stuff for instance.
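
    The same thing can be done from Python with the standard `os.utime` call, which is why a file timestamp on its own proves very little. A minimal sketch (the helper name is made up):

```python
import os


def set_mtime(path: str, when: float) -> None:
    """Set a file's access and modification times to `when`
    (seconds since the Unix epoch), much as `touch` can do."""
    os.utime(path, (when, when))


# Example: stamp a file as last modified on 2012-02-23 00:00:00 UTC.
# set_mtime("some_file.pdf", 1329955200)
```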

  121. keep getting “java.lang.IllegalArgumentException: URL source must use ‘file’ protocol”

    Tried on mac/pc, tried sudo, made sure doc was utf8 (with and without BOM)… I will run from the source later. This program seems very prone to not working at all. You can see the errors by running it from the prompt in your download folder: java -jar jgaap-5.2.0.jar

    I did get it working for a while using their test files (I used ‘L’), then removing their unknown docs, and adding the heartland memo and some authors. The first run with their authors and the memo, the gleick text from above here, and a heartland speech was from this (excellent) overview of the docs, pre gleick confession: http://ljzigerell.wordpress.com/2012/02/18/profiling-the-heartland-memo-author/

    I ran with only Burrows delta, and the Event Drivers are listed below (couldn’t run them all without crashing, so a randomish sample). I assume lower is closer as the other authors are ancient latin poets iirc. Infinities probably means no hits in their algorithm.

    unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
    Canonicizers: none
    Analyzed by Burrows Delta using Sentence Length as events
    1. Author01 78.77155793679051
    2. Author01 80.2867632506562
    3. gleick 86.33117704938115
    4. Author03 88.9431049630841
    5. bast 104.26533350267803

    Analyzed by Burrows Delta using Words as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using MW Function Words as events
    1. bast 100.96472895245202
    2. Author01 110.13874051963056
    3. Author03 111.98543979724113
    4. Author01 112.50870666862883
    5. gleick 118.5464526044639

    Analyzed by Burrows Delta using Sentences as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using Syllables Per Word as events
    1. gleick 4.678547658695827
    2. Author01 7.16552557153825
    3. Author01 9.372995377516089
    4. Author03 9.552504044292814
    5. bast 13.812675242063655

    Analyzed by Burrows Delta using Characters as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using Word Lengths as events
    1. gleick 19.34449212885342
    2. Author01 23.605545416411637
    3. Author01 28.46339579762878
    4. bast 30.397652383928673
    5. Author03 30.8325765083569

    Analyzed by Burrows Delta using Suffices as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using Lexical Frequencies as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using Rare Words as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using Binned Frequencies as events
    1. gleick 56.804661456724205
    2. bast 57.4768784285835
    3. Author03 90.58279638495986
    4. Author01 92.86171017325175
    5. Author01 103.78059122092623
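
    For readers unfamiliar with the method, here is a minimal sketch of the textbook form of Burrows's Delta using word frequencies as events: z-score each frequent word's relative frequency across the reference texts, then average the absolute z-score differences between the unknown text and each candidate. Lower means closer. This is an illustration under those assumptions, not JGAAP's code. A feature with zero spread across the reference texts would cause a division by zero, so the sketch skips such features; whether anything like that lies behind the Infinity results above is only a guess.

```python
from collections import Counter
from statistics import mean, pstdev


def rel_freqs(text: str, vocab: list) -> list:
    """Relative frequency of each vocab word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(unknown: str, candidates: dict, n_features: int = 50) -> dict:
    """Return {author: delta} for the unknown text; lower is closer."""
    pooled = Counter(w for t in candidates.values() for w in t.lower().split())
    vocab = [w for w, _ in pooled.most_common(n_features)]
    cand = {a: rel_freqs(t, vocab) for a, t in candidates.items()}
    unk = rel_freqs(unknown, vocab)

    scores = {a: 0.0 for a in candidates}
    used = 0
    for i in range(len(vocab)):
        column = [f[i] for f in cand.values()]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # no spread across references: z-score undefined
        used += 1
        z_unknown = (unk[i] - mu) / sigma
        for author, f in cand.items():
            scores[author] += abs(z_unknown - (f[i] - mu) / sigma)
    if used == 0:
        raise ValueError("no usable features")
    return {a: s / used for a, s in scores.items()}
```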

  122. Michael Tobis says:
    February 23, 2012 at 10:02 am

    Shawn Otto has published a result from JGAAP.

    Of course, if 90% of the text is lifted from the original documents, then the software will show that the original author is the most likely author. But it is not about the 90%; it is the author of the remaining juicy comments that we would like to know. One should run the test only on those comments that cannot be traced back to the original documents…
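
    One rough way to isolate those remaining comments is to drop every memo sentence that shares most of its word trigrams with the source documents. The sketch below assumes plain-text copies of the memo and the sources; the function names and the 0.5 threshold are arbitrary choices for illustration.

```python
import re


def sentences(text: str) -> list:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def shingles(text: str, n: int = 3) -> set:
    """Set of overlapping word n-grams, lower-cased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def untraceable_sentences(memo: str, sources: list, n: int = 3,
                          max_overlap: float = 0.5) -> list:
    """Memo sentences whose n-gram overlap with the sources is at most max_overlap."""
    source_shingles = set()
    for src in sources:
        source_shingles |= shingles(src, n)
    kept = []
    for sent in sentences(memo):
        sh = shingles(sent, n)
        if not sh:
            continue  # too short to judge either way
        overlap = len(sh & source_shingles) / len(sh)
        if overlap <= max_overlap:
            kept.append(sent)
    return kept
```

    The kept sentences can then be joined into a single text and used as the unknown document.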

  123. YES! I believe that’s the first gold star anyone has earned here, right ?

    Congrats, Miss kim2000! O, yes, indeed, for whatever my opinion’s worth, I’d say U R indeed a scientist. Science, as I’m already sure you know, is mostly about methodology. The ability to analyse, eliminate, induct and deduct, not to mention to root around dull data until something, hopefully, pops up. This you’ve done admirably. Regardless of whether this pans out…as Anthony says, it’s plausible and needs further investigation…you’ve still succeeded in presenting a very good workable hypothesis. I predict you’ll do very well in this game of life.

  124. Kim2000;

    you not only get a hat tip and a hat trick. . . . . .

    But you now also have your very own fan club here. . . of which I am a member!

    Good job!!!

  125. Some more complete numbers. Was able to run a few tests, including SPCA (Standardized Principal Component Analysis), which resulted in charts without labels. http://blog.debreuil.com/images/jcharts.png

    I think this would be more useful being run by the people that made it, as they know the ins and outs of what everything signifies. Sorry for the text dump here.

    unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
    Canonicizers: none
    Analyzed by Burrows Delta using Word stems as events
    1. Author01 Infinity
    1. Author01 Infinity
    1. Author03 Infinity
    1. gleick Infinity
    1. bast Infinity

    Analyzed by Burrows Delta using Sentence Length as events
    1. Author01 78.77155793679051
    2. Author01 80.2867632506562
    3. gleick 86.33117704938115
    4. Author03 88.9431049630841
    5. bast 104.26533350267803

    Analyzed by Burrows Delta using Syllable Transitions as events
    1. gleick 26.876961580487105
    2. Author03 39.498099057362204
    3. Author01 44.81833603913991
    4. Author01 53.69169935354535
    5. bast 53.93613732442708

    Analyzed by Burrows Delta using MW Function Words as events
    1. bast 100.96472895245202
    2. Author01 110.13874051963056
    3. Author03 111.98543979724113
    4. Author01 112.50870666862883
    5. gleick 118.5464526044639

    Analyzed by Burrows Delta using Syllables Per Word as events
    1. gleick 4.678547658695827
    2. Author01 7.16552557153825
    3. Author01 9.372995377516089
    4. Author03 9.552504044292814
    5. bast 13.812675242063655

    Analyzed by Burrows Delta using Word Lengths as events
    1. gleick 19.34449212885342
    2. Author01 23.605545416411637
    3. Author01 28.46339579762878
    4. bast 30.397652383928673
    5. Author03 30.8325765083569

    Analyzed by Burrows Delta using Binned Frequencies as events
    1. gleick 56.804661456724205
    2. bast 57.4768784285835
    3. Author03 90.58279638495986
    4. Author01 92.86171017325175
    5. Author01 103.78059122092623

    ——————————

    unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
    Canonicizers: none
    Analyzed by Naive Bayes Classifier using Word stems as events
    1. gleick 6.284762047923577E-306
    2. bast 1.5118408762742144E-308
    3. Author03 9.881312917E-314
    4. Author01 4.9E-323

    Analyzed by Naive Bayes Classifier using Sentence Length as events
    1. bast 2.520179164211697E-52
    2. Author03 9.791582471231638E-73
    3. Author01 1.0853260416155313E-87
    4. gleick 8.612586952329744E-88

    Analyzed by Naive Bayes Classifier using Syllable Transitions as events
    1. bast 1.379192064573727E-41
    2. gleick 2.9856345009842305E-43
    3. Author01 9.87335629357855E-53
    4. Author03 2.4869515345576324E-54

    Analyzed by Naive Bayes Classifier using MW Function Words as events
    1. bast 3.523051194773655E-79
    2. gleick 1.3668325518288174E-95
    3. Author03 1.228600114016044E-180
    4. Author01 6.883327597227027E-188

    Analyzed by Naive Bayes Classifier using Syllables Per Word as events
    1. bast 1.936285647735328E-5
    2. gleick 1.1133390291997895E-5
    3. Author01 2.339809499435671E-6
    4. Author03 1.1557038384224756E-6

    Analyzed by Naive Bayes Classifier using Word Lengths as events
    1. gleick 4.185945546916831E-28
    2. bast 2.1906437924926997E-29
    3. Author01 8.198176784562617E-38
    4. Author03 2.2375812457401515E-42

    Analyzed by Naive Bayes Classifier using Binned Frequencies as events
    1. bast 1.1656631793255989E-88
    2. gleick 3.7646212780446196E-113
    3. Author03 4.074684412801988E-118
    4. Author01 3.7928452097311674E-162

    ———————————-

    unknown.txt /Users/admin/Desktop/jar/unknown/unknown.txt
    Canonicizers: none
    Analyzed by Markov Chain Analysis using Word stems as events
    1. bast 193.85717213511924
    2. gleick 13.278602110106085
    3. Author03 0.0
    3. Author01 0.0
    3. Author01 0.0

    Analyzed by Markov Chain Analysis using Sentence Length as events
    1. Author01 8.391629968440892
    2. bast 4.382026634673881
    3. gleick 0.6931471805599453
    4. Author03 0.0
    4. Author01 0.0

    Analyzed by Markov Chain Analysis using Syllable Transitions as events
    1. Author01 1112.350964916729
    2. Author03 968.781177953394
    3. Author01 930.332078970121
    4. bast 885.1058174495475
    5. gleick 851.9537665612401

    Analyzed by Markov Chain Analysis using MW Function Words as events
    1. bast 496.32584285189483
    2. gleick 122.80138586374213
    3. Author01 0.6931471805599453
    4. Author03 0.0
    4. Author01 0.0

    Analyzed by Markov Chain Analysis using Syllables Per Word as events
    1. Author01 1175.347569702713
    2. Author03 1010.443933363838
    3. Author01 977.0885796023475
    4. gleick 897.9218021335724
    5. bast 881.9137449893942

    Analyzed by Markov Chain Analysis using Word Lengths as events
    1. Author01 1754.7669675494938
    2. bast 1693.2837194057502
    3. gleick 1543.939541549111
    4. Author01 1506.0750905415073
    5. Author03 1478.8738547796027

    Analyzed by Markov Chain Analysis using Binned Frequencies as events
    1. bast 1244.1093577231186
    2. gleick 836.2033308547939
    3. Author03 227.3215761357559
    4. Author01 203.7365317508035
    5. Author01 108.59439936922432
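For readers wondering what a "Burrows Delta" score measures, here is a minimal sketch of the classic formulation: take the most frequent words, z-score each word's relative frequency across the candidate texts, and average the absolute z-score differences between the unknown text and each candidate. Smaller means closer. The function names and the `n_words` default are illustrative assumptions, and this is not JGAAP's implementation, whose event sets and normalization may differ. One thing the sketch does make visible: a feature whose frequency is identical in every candidate has zero standard deviation, so its z-score is undefined. Whether that is what produced the "Infinity" scores above is only a guess.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def rel_freqs(tokens, vocab):
    """Relative frequency of each vocab word in one token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(unknown, candidates, n_words=30):
    """Return {author: delta}; a smaller delta means a closer match.

    `candidates` maps an author label to one sample text (hypothetical input
    shape chosen for this sketch).
    """
    cand_tokens = {a: tokenize(t) for a, t in candidates.items()}
    pooled = Counter()
    for toks in cand_tokens.values():
        pooled.update(toks)
    vocab = [w for w, _ in pooled.most_common(n_words)]

    profiles = {a: rel_freqs(t, vocab) for a, t in cand_tokens.items()}
    unk = rel_freqs(tokenize(unknown), vocab)

    # Mean and standard deviation of each word's frequency across candidates.
    cols = list(zip(*profiles.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    # Words with zero spread carry no information and would divide by zero.
    keep = [i for i, s in enumerate(sds) if s > 0]

    scores = {}
    for author, prof in profiles.items():
        diffs = [
            abs((unk[i] - mus[i]) / sds[i] - (prof[i] - mus[i]) / sds[i])
            for i in keep
        ]
        scores[author] = mean(diffs) if diffs else float("inf")
    return scores
```

With only one candidate author every standard deviation is zero, so the sketch returns infinity for that author. At least two candidates with differing texts are needed for the score to mean anything.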

  126. kim2ooo says:
    February 23, 2012 at 11:12 am

    Good catch!
    It looks to me as if it is a left corner header……….Does this fit? [ The Blue header ]

    When both documents are printed, the text width does fit, but the text of the faked document starts too high, over the Pacific Institute logo. But still it may be some logo, as the text starts rather low and something is hidden at the top (probably by paper). But the aspect ratio may be different in the US (both printed here on A4, that is 12″ i.s.o. 11″).

  127. Additionally, it indeed seems to be a logo, but not the Pacific Institute logo; it is too wide for that. But in the two headers the vertical and horizontal dots match exactly at the same left and top margins. To make that happen with tape or paper would be very difficult to achieve…

  128. This bloke has a free stylometry software, but it’s only for ms windows. It looks like he’s looking for more beta testers for version 2, though.

    http://www.philocomp.net/?pageref=humanities&page=signature

    (At least it comes with a powerpoint presentation) :)

    Pfft, there’s even this: Writeprint Stylometry (Scripts) 0.1 for wordpress for analyzing comments to know who’s hiding behind who in the comment section. :p

    I never knew linguisticians could have so much fun… o_O

  129. >>
    My recommended settings would be:
    – Canonicizers: Normalize ASCII and Whitespace, Strip Punctuation, and Unify Case
    >>
    No, actually one of Gleick’s notable features is his (over) use of commas. Strip Punctuation would not be the best way to detect that !

    Like any analysis, you will need to think and take time to read the doc and get familiar with it. I doubt anyone will get any meaningful results without spending an evening just finding out how to do this. In fact, I doubt any trivial analysis by someone who does not know what they are doing will be of any use.

    It would be interesting if someone has the time to do this properly but the “crowd” seems small thus far.
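The comma point above is easy to check by hand. A minimal sketch, assuming plain-text input: it measures commas per 100 words before and after a punctuation-stripping step. The function names are made up for illustration, and the stripping here only imitates what a "Strip Punctuation" canonicizer presumably does. It is not JGAAP code.

```python
import re
import string


def count_words(text):
    """Count word tokens, keeping internal apostrophes and hyphens together."""
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))


def commas_per_100_words(text):
    """Comma rate normalized by length, so texts of different sizes compare."""
    n = count_words(text)
    return 100.0 * text.count(",") / n if n else 0.0


def strip_punctuation(text):
    """Remove ASCII punctuation, imitating a punctuation-stripping canonicizer."""
    return text.translate(str.maketrans("", "", string.punctuation))


sample = "After review, and after thought, I must decline."
print(commas_per_100_words(sample))                     # rate on the raw text
print(commas_per_100_words(strip_punctuation(sample)))  # rate after stripping
```

Once punctuation is stripped, the comma rate is zero for every author, so any comma habit disappears from the comparison.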

  130. Ferdinand Engelbeen says:
    February 23, 2012 at 12:57 pm

    “…it is too wide for that…”

    Scanning shadow?

  131. Bart says:
    February 23, 2012 at 1:32 pm

    Scanning shadow?

    Indeed it looks like the scanning shadow of the paper, as in the second page the bottom line (which is the bottom scanning shadow of the paper) starts at the same left margin as the top vertical dots.

    Pity, but no information in here… But anyway, clever deduction from Kim2000!

  132. Laughably, the Penn State sabbaticalized Dr. Mike Mann accepted it uncritically.

    That must raise disturbing doubts about his judgements on other issues as well.

  133. Ferdinand Engelbeen says:
    February 23, 2012 at 12:48 pm

    I’ve been doing some research….
    It absolutely doesn’t fit Heartland’s logo [ letterhead ]

    http://heartland.org/issues/law

    It doesn’t seem to fit DeSmog’s

    http://www.desmogblog.com/media_centre

    “but the text of the faked document starts too high, over the Pacific Institute logo.”
    Remember, this is a jpg image and is probably cropped.

    Or am I missing what you are saying?

    ———————
    Thank you Mr Robt Moderator
    Thank you, Latitude says:
    February 23, 2012 at 12:06 pm
    Thank you Mr. Ben Wilson says:
    February 23, 2012 at 12:26 pm
    Thank you Mr.Peter Kovachev says:
    February 23, 2012 at 12:23 pm

  134. I really think this is a job for a human, though this program could be a good tool to give insights. The idea of plugging in data and getting a result seems a bit hopeful given all the possible settings and lack of details. People always want to believe you can just dump data into a computer and a correct result will come out. Sometimes that is true, but at the very least you need impartial humans to calibrate and confirm. In this case I’d say it is of zero value as a result, and little value as a tool unless you wrote it or are well steeped in the field of stylometry.

    I wonder what http://ljzigerell.wordpress.com/2012/02/18/profiling-the-heartland-memo-author/ would find with Gleick’s writings compared to the Heartland speech he used.

  135. Thanks Jake. I analyzed a few writings from just myself and Gleick and used your recommended settings of:
    – Canonicizers: Normalize ASCII and Whitespace, Strip Punctuation, and Unify Case
    – Event Drivers: Character Grams (N=4, N=5), M…N letter words (M=6, N=12)
    – Event Culling: Most Common Events (N=100 or other fairly large number)
    – Analysis Methods: Try these ONE at a time (Gaussian SVM, LDA, Linear SVM)

    Results are:

    Canonicizers: Normalize ASCII Normalize Whitespace Strip Punctuation Unify Case
    Analyzed by Linear SVM using Character 4Grams as events
    1. Gleick 1.0
    Analyzed by Linear SVM using 6–12 letter Words as events
    1. Hoder 2.0
    Analyzed by Gaussian SVM using Character 4Grams as events
    1. Hoder 2.0
    Analyzed by Gaussian SVM using 6–12 letter Words as events
    1. Hoder 2.0
    Analyzed by LDA using Character 4Grams as events
    1. Gleick 2.3824198113541523E12
    2. Hoder 2.38241981135374E12
    Analyzed by LDA using 6–12 letter Words as events
    1. Hoder -1.0570728820980994E13
    2. Gleick -1.0570728820981416E13

    No smoking gun here. It looks like there’s a better chance of myself having written the memo than Gleick. As others have mentioned though, he probably used quite a few words from the other documents that he stole to produce the fake one.

  136. P. Solar says:
    February 23, 2012 at 1:28 pm

    “… but the “crowd” seems small thus far.”

    If I can ever get past the “failed experiment” error I’ll be in the game. Rummaging through data is fun.

  137. A big problem is that, clearly, most of the memo was cut and pasted from Heartland documents. It’s the cardboard cutout, Snidely Whiplash, Tourette’s-like expulsions, which John Hinderaker pointed out here, that were inserted by the faker. It’s not a lot to go on, but that is where the focus should be.

  138. Feeding time for all you WUWT sleuths. On another thread here (http://wattsupwiththat.com/2012/02/23/peter-gleick-debate-invitation-email-thread/#comment-901963), Heartland submits emails between them and Dr Gleick.

    To make it easier for you tired textual analysts, I’ve taken the liberty to copy and paste Gleick’s text, minus all else. The delightful bit is that although normally this would be a pretty pro forma correspondence with the usual polite and formalized language, Gleick’s “special” character and idiosyncratic language come through loud and clear. You have to read the entire email thread to get a feel for his total gracelessness. And, as TheGoodLocust noted, Gleick makes it unambiguously clear that he knew about the importance of donor anonymity to Heartland, and his opposition to that.

    ***
    Dear Mr. Lakely,

    After reviewing your email and after serious consideration, I must decline your invitation to participate in the August fundraising event for the Heartland Institute.

    I think the seriousness of the threat of climate change is too important to be considered the “entertainment portion of the event” as you describe it, for the amusement of your donors.

    Perhaps more importantly, the lack of transparency about the financial support for the
    Heartland Institute is at odds with my belief in transparency, especially when your Institute and its donors benefit from major tax breaks at the expense of the public.

    Thank you for considering me.

    Dr. Peter Gleick

    * * *

    Dear Mr. Lakely,

    Thank you for your email of January 13th, 2012, inviting me to participate in the Heartland Institute’s 28th Anniversary Benefit Dinner.

    In order for me to consider this invitation, please let me know if the Heartland Institute
    publishes its financial records and donors for the public and where to find this information.

    Such transparency is important to me when I am offered a speaking fee (or in this case, a
    comparable donation to a charity). My own institution puts this information on our website.

    Also, I would like a little more information about the date, venue, and expected audience and format. In addition, I assume your offer includes all travel and hotel expenses, economy class, but can you please confirm this?

    Sincerely,

    Dr. Peter Gleick

    ***

  139. robin @12:34
    What you ran through there is quite interesting to start this project. One thing that seems quite surprising to me is that the other three authors are nearly always either all above or below Gleick & Bast. I understand Bast being in the mix: we know that he was part of the team who wrote the true documents, and those were heavily quoted by the fake.

    Ah – what was the general topic and subject matter of the other documents you used?

    Somewhere I read two days ago or so (Curry’s?) about a project to highlight the fake memo in such a way as to show what is cut from the other documents, what was modified but taken from them, and the totally new material. I have not seen
    It seems to me unless the cut/paste, and possibly modified, stuff is isolated no reliable analysis can be done with this tool.
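The isolation step suggested above can be roughed out in a few lines. This is a sketch under stated assumptions, not a tested forensic method: it splits the memo into sentences and drops any sentence that shares a run of `n` consecutive words with one of the source documents, leaving only the text that appears to be new. The function names and the default `n=5` are arbitrary choices for illustration. Lightly reworded passages will slip through, and sentences shorter than `n` words are always kept.

```python
import re


def words(text):
    """Lowercased word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, n):
    """Set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def original_sentences(memo, sources, n=5):
    """Return memo sentences that share no n-word run with any source text."""
    source_shingles = set()
    for src in sources:
        source_shingles |= shingles(words(src), n)

    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", memo.strip()):
        toks = words(sentence)
        if not toks:
            continue
        if shingles(toks, n) & source_shingles:
            continue  # overlaps a source document: treat as copied
        kept.append(sentence)
    return kept
```

Only the sentences this returns would then be fed to the stylometry tool as the "unknown" document.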

  140. Kim2000, DukeC, etc…
    One thing that would really make it interesting is if PDFs scanned by Gleick or PI on “plain paper” also just happened to bear those same marks in the upper left hand corner… hmmmm…???

  141. The textual analysis should exclude the copy/pasted text from the rest of the material.

    In addition to the papered over header, the typeset is not consistent with the Heartland house-style in their other published documents. It would be interesting if it were, coincidentally, consistent with the typeset house-style of any of the institutions with which Peter Gleick was associated, or with documents he produced independently.

    Given the names named, and the context in which they were named, as well as the recent history of his comments in various venues, with the addition of the odd expressions used, it is completely consistent with Gleick having authored it.

    As Sir Thomas More would have said: “Why Peter, it profits a man nothing to give his soul for the whole world… but for a Forbes blogspot and a K-12 science curriculum?”

  142. h/t to robin for the Zigerell link. That is an excellent analysis as far as it goes. Oddly, it does not continue the analysis of memo vs. Bast speech to a similar exercise on some Gleick text.

    I googled Peter Gleick Forbes (prominent in the memo) and took the first Forbes article by G.

    http://www.forbes.com/sites/petergleick/2012/02/05/global-warming-has-stopped-how-to-fool-people-using-cherry-picked-climate-data/

    Z’s #2, the Oxford comma, before “and” in lists of items. G uses this without fail.
    Z’s #3. The memo author wrote 20 as a number but two as a word. [one instance of "fifteen", all other numbers are numerals]
    #4. The memo author did not indent paragraphs.
    #5. The memo author used ragged-right justification with no hyphenation. [not applicable to HTML]
    #10. The memo author inconsistently hyphenated the adjective high-profile / high profile.[G seems consistent in hyphenation in this text ]
    #13. The memo author used parenthetical remarks. [Strong +ve. G’s overuse of this technique is a hallmark of just about every sentence.]
    #14 The memo author introduced the acronyms IPCC, NIPCC, AGW, and WUWT without explanation [neutral: virtually no acronyms, NASA (trivial), GISS not explained. ]

    The ones I’ve skipped don’t seem to apply to the Forbes article.

    Let’s see if there are any similar things to pick out.

    S #1 Use of double hyphen to introduce an example. Three times in first page of memo eg.
    “focus on providing curriculum that shows that the topic of climate change is controversial and
    uncertain — two key points that are effective at dissuading teachers from teaching science. ”
    G uses this device 6 times in Forbes plus 7th time as a parenthesis.
    higher-than-average warming – a dynamic confirmed by both models and by actual observations.

    Z’s idea that the memo may be an original but fraudulently “spiced up” is interesting. I’m not that convinced but it merits further consideration.

    The final para is surely the most obvious fakery.

    H.I. would be unlikely to refer to one of their in-house climate experts bluntly as “Taylor” even in a memo. Indeed, all are referred to by last name only in contrast to the rest of the memo where names are given in full, with title where appropriate.

    Forbes seems to get improbable importance. “High profile climate scientist (such as Gleick)”. Not sure who would have considered him in those terms (before this week). Self-promotion?

    Attempted kiss of death to “Curry (who has become popular with our supporters)”.

    Circulation to “subset”. Scientific phraseology. Also, if you intend to be underhand, it’s unwise to put it in writing and say “shh, don’t tell anyone will you?” Especially if there is nothing controversial that is worth hiding from half your colleagues!

    “… if our work continues to align with their [KOCH Foundation.] interests.” Obvious “ah-ha! so Koch are dictating the H.I. agenda. Greenpeace were right and here’s the proof”. This is for external consumption, not a secret “subset” of the board.

    “showing… climate change is controversial and uncertain – two key points that are effective in dissuading teachers from teaching science.”

    H.I. certainly would say they want teachers to START teaching science, not stop. This is straight out of warmist vocabulary. What the author meant by science is “our science” , “the science”. You know, the indivisible, consensus, the science is settled, don’t argue ever again “science”.
    That is a subconscious slip of language by someone so familiar with that use of the word science he does not even notice.

    “… we sponsor the NIPCC to undermine the official…” . Again, Freudian slip. H.I. aim to rebut and expose what they say is corrupt science not undermine it. This phrase comes from the mind of a “believer” who feels his dogma is being threatened.

    Conclusion
    =========

    It would appear that the whole memo was written by a frustrated warmist with a grudge against Forbes, Taylor, Watts and Curry in particular.

    Certain distinct features of Gleick’s writing style are very present, though not enough to condemn him from such a cursory examination. Of course once he is in a court of law, under oath, he’s going to be hard pressed to play “I got it in the post but I lost it”.

    More pop-corn required…

  143. Robin says:
    >>
    I think this is a job for a human.
    >>
    ….I agree it is best done by brain than box. The result of the software can only be as good as some “subset” of the knowledge of the person who wrote it. I doubt that java applet is much more than an amusing toy.

    >>
    I wonder what http://ljzigerell.wordpress.com/2012/02/18/profiling-the-heartland-memo-author/ would find with Gleick’s writings compared to the Heartland speech he used.
    >>

    Great minds think alike ;)

  144. Mr. Connelly…..the courts are overloaded with criminal cases. This information will not be needed for a while. You trying to rush good scientists at work? How’s a scientist to cope with such pressure?

    Maybe where you come from things get published overnight, but crowdsourcing is a much more laborious process than today’s climatological peer review process.

    Relax, give the wine a little time to ferment.

  145. re: TerryS’ challenge above (at February 23, 2012 at 6:42 am) regarding PDF meta-data:
    Piece of cake. And marginally plausible, too.
    1. Open your document in Adobe X Standard (the default for those of us who have to create forms, OCR old scans or redact contents)
    2. Click File/Save As from within the program to save it to a different location.
    3. Open both versions and compare xmp:ModifyDate and xmpMM:InstanceID.
    Why is this plausible? When receiving a document via email, most of us just launch the attachment. If your job requires you to create PDF-based forms, OCR old flat scans or redact contents, defaulting to Adobe Pro or Standard is normal. Once the document is open and looks interesting, you want to save it to your hard drive so you can do something with it (like, say, post it to a website). The document is already open, and File/Save-As is the pattern that Microsoft trained us to use. No forensic examiner would make that kind of mistake, but a casual computer user could, with no evil intent.
    By the way, it’s also plausible that you might Save-As to Reduced-File-Size if you are concerned about download times and want to make a document more accessible to readers.
    Inconsistent modification dates and filesizes are maybe clues that something happened but they are far from proof that anything malicious happened.
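The comparison in step 3 can be roughed out without Acrobat. A minimal sketch, assuming the PDF carries an uncompressed XMP packet (common, but not guaranteed): it scans the raw bytes for a handful of XMP properties, written either as elements or as attributes, and reports which ones differ between two files. The field list and function names are illustrative choices, and a proper XMP parser would be more robust.

```python
import re

# Standard XMP property names; the selection is an assumption for this sketch.
FIELDS = ("xmp:CreateDate", "xmp:ModifyDate",
          "xmpMM:DocumentID", "xmpMM:InstanceID")


def xmp_fields(pdf_bytes):
    """Pull selected XMP properties out of raw PDF bytes, if present."""
    found = {}
    for name in FIELDS:
        tag = re.escape(name).encode()
        # Element form <tag>value</tag> or attribute form tag="value".
        pattern = (rb"<" + tag + rb">([^<]*)</" + tag + rb">|"
                   + tag + rb'="([^"]*)"')
        m = re.search(pattern, pdf_bytes)
        if m:
            raw = m.group(1) if m.group(1) is not None else m.group(2)
            found[name] = raw.decode("latin-1").strip()
    return found


def changed_fields(a, b):
    """Map each field that differs to its (value_in_a, value_in_b) pair."""
    return {k: (a.get(k), b.get(k)) for k in FIELDS if a.get(k) != b.get(k)}
```

Usage would be along the lines of `changed_fields(xmp_fields(open("a.pdf", "rb").read()), xmp_fields(open("b.pdf", "rb").read()))`, with both file names hypothetical. A differing ModifyDate or InstanceID shows only that a file was re-saved, which is the point made above.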

  146. If he says he received it by snail mail shouldn’t he have kept the envelope which will have the post mark on it ?

  147. @Ian Hoder
    Well I’m glad you got it to work. I just keep getting the same errors that you alluded to this morning.

  148. When anything is printed on a laser printer it suffers the George Costanza problem; shrinkage.
    The printer works by using reflected light to ionize a selenium coated rod, the rod picks up magnetic ink, which is deposited on paper, the paper is placed in a strong magnetic field to remove the charge, finally, the ink is heated to melt it into the paper.
    Different laser printers have different inks, different heat treatments, and different amounts of shrinkage. However, cheap papers tend to shrink the most and also have much more asymmetry in the product: the edges shrink more than the center of the paper.
    If the document was printed on high quality paper, shrinkage is minimal and justified lines in the center of the paper will have the same linearity and width as those on the top and bottom.
    Lower quality papers are much more likely to have asymmetry about the center of the paper, so the bottom and top lines will have a curved form.
    Paper has to be kept at the correct humidity to cut down on curling, the more humid the atmosphere and cheaper the paper, the more curl will be observed.
    Does the text have smilees and frownies?

  149. P. Solar: you are supposed to spell out only single digit numbers. Two or more are inserted as digits. Spelling fifteen would be the only anomaly.

    Mark

  150. The number of commenters on this thread, as well as others, who have pitched in with their knowledge to decipher fakegate and the many other topics discussed on WUWT is astounding. It is truly an ‘army of ones’. I doubt that defeat will ever be an option. Now that is a team you can believe in!

  151. No problem getting the software installed on my mac. So it looks like basically I need to load in text from several authors to compare the “anonymous” document against. Hmmmm I bet it could identify a Willis essay from a mile away. Will keep you posted.

  152. “Peter Kovachev says:
    February 23, 2012 at 10:59 am
    A dood, beg to respectfully differ with your smudge hypothesis. If you were to look again, you’ll note that the anomaly is unnaturally regular; rectangular, straight lined and with an apparent corner. I still place my wager on a shadow caused by a white tape or a label. On second look, it may actually be a shadow caused by the upper corner and edge of the document page, although the abrupt, un-tapered ending of the bottom bit of the line on the left hand side would militate against that.”

    Agreed. I was just thinking that whenever I had the same mark on all the pages of a scan it would often be something on the scanner glass.

  153. Hey now! Using Mr. Watts recommended software => most likely author of strategy document: Heartland’s Joe Bast (with bonus consideration: Mosher sent the document to Gleick)

    The most likely author of the Heartland Institute climate strategy memo? => http://www.shawnotto.com/neorenaissance/blog20120223.html

    REPLY:
    Yawn, predictable. It’s HuffPo, what did you expect? And he just throws out the numbers there, not knowing how to interpret them, and jumps to the conclusion he wants, like you do, providing scads of caveats. The scores he published suggest otherwise. The authors of the program are helping WUWT do an analysis, we’ll wait for their input on how to properly configure the program. Which we’ll share. -Anthony

  154. I do not have the capability of using the recommended software and doing a textual analysis. Nevertheless I think it very worthwhile to simply study the purported memo to see what one can see. In agreement with the message left by P. Solar I have the following observations which may or may not prove useful.

    First – the memo says “His effort will focus on providing curriculum that shows that the topic of climate change is controversial and uncertain – two key points that are effective at dissuading teachers from teaching science.”

    Second – the memo says “This influential audience has usually been reliably anti-climate and it is important to keep opposing voices out.”

    Now ponder these quotes a minute or so. Ask yourself, have you ever ever ever heard such blatant self-incrimination as “dissuading teachers from teaching science”, “reliably anti-climate” and “important to keep opposing voices out”? That is way beyond even “hide the decline”. Any such phrases, if authenticated, would comprise direct evidence of fraud. But even low level fraudsters are sufficiently careful to never put such self-incrimination into writing. It just does not add up that an official document, however secret, would be produced which contained proof of abysmally guilty intent, not to mention guilty action.

    I am reminded of an event cited in Richard Frank’s book GUADALCANAL where some Marines heard a noise in the distant brush and hollered out “who goes there?” After a few seconds came an answer “we are American Marines returning to report on the evening’s activities.”

    Sometimes words are so blatantly phony that they give themselves utterly away.

    Press on and best of luck.

  155. I’ve been playing around with JGAAP, and can see why it’s definitely beta software.

    If you get your combination of analysis methods and event drivers wrong, it just fails to do the computation.

    When I was using Training and Test sets (in the approved style), it correctly identified Gleick’s writing on matters of Sentence Length and ‘Words as Events’, whatever that is supposed to mean.

    It’s an interesting topic, so I might go away and write my own analysis tool — as it stands JGAAP is too opaque right now.

  156. @Draig Du says:
    February 23, 2012 at 11:13 pm

    Let me help you there.

    *Picks up rock*

    Are you OK getting back under there or do you need help?

  157. James Hill picks up on “important to keep opposing voices out”?

    That is one I spotted and forgot to comment on. This is such a blatant parallel to the Machiavellian gatekeeping revealed in climategate that it seems to be an obvious plant of “proof that the other side are just as bad, so don’t criticise us any more”.

    This document reeks of forgery from top to bottom. The fact that the last para is a bit more obviously so probably means the author started by trying to weave in the material he had purloined from H.I. and by the end was just letting it flow.

  158. The prominence of Gleick(isms) and Bast(isms) indicates that Gleick wrote the strategy document based on summaries of the stolen documents which were probably written in large parts by Bast. That is a clincher.

  159. Memo refers to “the Anonymous Donor” rather than “our”. This suggests it is written by an outsider to the organisation.

    The memo states their climate work is of special interest to this donor. This seems to be an obsession of Gleick’s, as does his insistence on this being published as a condition of him speaking at the dinner.

  160. Mark T on February 23, 2012 at 8:27 pm said:
    P. Solar: you are supposed to spell out only single digit numbers. Two or more are inserted as digits. Spelling fifteen would be the only anomaly.

    Mark
    —————
    I was not taught that rule in college technical writing class. I was taught that you keep consistent. If you’re spelling numbers then you spell them all. If not, then they’re all numeric. Not saying this is the absolute truth, just that there are variations to the rule, just like I was taught to use the Oxford comma but not using it is also okay.

  161. @kim2000: I do not buy it. Compare the size of the left “margin” in the two documents.

    docA: http://img813.imageshack.us/img813/5586/startdocpg2.jpg
    docB: http://pacinst.org/reports/success_stories/new_ag_water_success_stories.pdf

    The horizontal distance between text and pacinst logo-box in docA, is much smaller than the
    distance between text and pixel-clutter in docB. They obviously do not match.
    Here is an overlay where I’ve matched the font size and first line of text on page 2 from both documents: – feel free to try yourself.

    @watts: Could you please share your aspect-ratio calculation? I see only two edges to measure from, and I am curious how you measure the aspect ratio from those.

    To me the distance between the text and the pixel clutter is exactly where I would expect the paper margin to be. The margin looks just like the standard margins in word processors. That is also the parsimonious explanation. The idea that somebody should have used their institute template with a header, and then later removed it when creating a forgery is just unbelievable. Who would do such an idiotic thing? So @watts, @Kovachev, @wilson, and @Engelbeen: Talk about confirmation bias!

  162. Oh Anthony. I have a feeling this idea could come back and bite you [snip].

    I rather hope not though. The alternative would be far more interesting.

  163. A potential problem with this is that it might be detecting the style of the writing more so than the author.

    Think about it this way. The “fake” memo is written in the style of an official document. If the only examples of Bast’s writings you are putting into the program are the other official documents, then naturally it is going to peg him as the author when compared to Gleick’s writings, which are more journalistic in tone.

    Has anyone found any reports written by Gleick when he was head of the scientific integrity panel?

  164. Unknown Document:
    Expanded Climate Communications section of fake memo (the obviously adlibbed section of the fake memo)

    Joe Bast documents:

    http://news.heartland.org/newspaper-article/2011/09/02/are-we-doomed

    http://news.heartland.org/newspaper-article/2011/08/10/heartland-replies-science

    http://news.heartland.org/newspaper-article/2011/07/28/heartland-replies-nature

    http://heartland.org/editorial/2011/01/31/writer-owes-schmitt-readers-apology

    Peter Gleick Documents:
    E-mail to Barry, E-mail to Tamsin

    Expanded climate communications.doc C:\Users\Sean\Documents\Expanded climate communications.doc
    Canonicizers: none
    Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Character 2Grams as events
    1. Expanded Climate Communications 0.0
    2. Peter Gleick 276.85855690431157
    3. Joe Bast 341.0642160873912
    4. Peter Gleick 342.9695238767721
    5. Joe Bast 387.49597772052016
    6. Joe Bast 398.2288354385788
    7. Joe Bast 595.1870344738793

    Expanded climate communications.doc C:\Users\Sean\Documents\Expanded climate communications.doc
    Canonicizers: none
    Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Character 4Grams as events
    1. Expanded Climate Communications 0.0
    2. Peter Gleick 1697.8838678571715
    3. Peter Gleick 2212.55268946424
    4. Joe Bast 2498.548796452392
    5. Joe Bast 2746.177655844846
    6. Joe Bast 2904.130159851305
    7. Joe Bast 4659.464731336592

    Expanded climate communications.doc C:\Users\Sean\Documents\Expanded climate communications.doc
    Canonicizers: none
    Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Word 2Grams as events
    1. Expanded Climate Communications 0.0
    2. Peter Gleick 428.2207207207207
    3. Peter Gleick 542.1495016611295
    4. Joe Bast 625.1890034364261
    5. Joe Bast 720.8237103084341
    6. Joe Bast 816.7965029955717
    7. Joe Bast 1496.2399713441844

    Expanded climate communications.doc C:\Users\Sean\Documents\Expanded climate communications.doc
    Canonicizers: none
    Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Word 4Grams as events
    1. Expanded Climate Communications 0.0
    2. Peter Gleick 439.0
    3. Peter Gleick 553.0
    4. Joe Bast 641.0
    5. Joe Bast 753.0
    6. Joe Bast 876.0
    7. Joe Bast 1627.0

    Expanded climate communications.doc C:\Users\Sean\Documents\Expanded climate communications.doc
    Canonicizers: none
    Analyzed by Nearest Neighbor Driver with metric Canberra Distance using Word stems as events
    1. Expanded Climate Communications 0.0
    2. Peter Gleick 245.90269295532852
    3. Peter Gleick 329.25374822603555
    4. Joe Bast 374.28204336681557
    5. Joe Bast 422.0761532754034
    6. Joe Bast 433.8851473207132
    7. Joe Bast 718.2862711836141

    Conclusion: Gleick much more likely than Bast. However, I need to understand the settings better.

  165. “This influential audience has usually been reliably anti-climate and it is important to keep opposing voices out.”

    “Reliably anti-climate” sounds like a clear giveaway slip. The term is always used in a derogatory way, and that’s how Gleick uses it in his own writings, as when he calls the hockey stick a “bugaboo of the anti-climate change crowd.”

    Heartland describing its own audience as “reliably anti-climate” would be similar to describing them as “reliably denialist”. Or, from the opposite perspective, it would be as if a warmist/alarmist organization referred to its own audience as “reliably alarmist”.

  166. Sean,

    I’m too busy to look them all up right now, but other posters like Robin [@12:34 above] have come to the opposite conclusion. It appears that the program is a cherrypicker’s delight. So I’ll wait for Mosher’s input. He was the one who ID’d Gleick, forcing Gleick to admit to criminal activity in his noble cause corruption. I trust Mosher’s reasoning much more than any GIGO program in beta testing.

  167. @ smokey

    I agree. A black box, subject to the GIGO effect definitely. Just like climate change models. Fix the inputs and the settings and you can get whatever answer you want.

    I was trying to use settings similar to Shawn Otto, but only using a sub-section of the fake memo that is definitely not plagiarized. My result is quite different than his too.

  168. Aslak Grinsted says:
    February 24, 2012 at 4:27 am
    “@watts: Could you please share your aspect-ratio calculation? I see only two edges to measure from, and I am curious how you measure the aspect ratio from those.”

    ………………………………….
    I see the remnants of 4 edges.
    There are 2 dots in-line at the bottom of the vertical arrows and a dot at the right of the arrows in http://img813.imageshack.us/img813/5586/startdocpg2.jpg

    Until the original is released, analyzed by forensic experts, we will not know. [ Maybe, not even then ].
    Kinda like “sighting a rifle”…a millimeter off at the sight – exponentially… miles off… down range. It is also the reason why raw data and programming data transparency is important in Climate Science.

    This is why my hypothesis doesn’t come with a conclusion. It asks the question, “Does This Fit”?

  169. AFPhys says:
    February 23, 2012 at 4:26 pm

    Kim2000, DukeC, etc…
    One thing that would really make it interesting is if PDFs scanned by Gleick or PI on “plain paper” also just happened to bear those same marks in the upper left hand corner… hmmmm…???

    ————————————–
    Indeed! :)

  170. Having lived with this style analysis for a day or so I am beginning to feel that it is very unlikely to produce anything useful. It may well be a parallel to computer models that predict future climate. The variables are so plentiful as to render any conclusion chaotic and unreliable. Good effort, but poor results.

  171. kim2000, nice bit of sleuthing, there.
    I have been trying to replicate the steps that the Strategy memo writer followed when creating the document. If you copy and paste the PDF OCR overlay into Microsoft Word you can come very close! When you open Word, it defaults to the same type font and size (Times New Roman 12 point, unless your copy of Word has customized settings). And, it is very easy to add a header that matches the Strategy Memo. So, it’s very plausible.

    If you have Word on your computer, you can use this example:

    http://dl.dropbox.com/u/18009262/January%202012%20-%20Copy.doc

  172. Using Evince document viewer (Debian Linux), Properties, fonts in the pdf’s:

    The suspect “2012 Climate Strategy” doc:
    Times-Bold
    Times-Roman
    Times-Italic
    All identified as Type 1, not embedded.

    2012 Fundraising Plan:
    TimesNewRomanPSMT, TrueType, not embedded
    TimesNewRomanPS-BoldMT, TrueType, not embedded
    WPTypographicSymbols, TrueType, embedded subset
    TimesNewRomanPS-ItalicMT, TrueType, not embedded

    2012 Proposed Budget:
    TimesNewRomanPSMT, TrueType, not embedded
    TimesNewRomanPS-BoldMT, TrueType, not embedded
    TimesNewRomanPS-ItalicMT, TrueType, not embedded
    TimesNewRomanPS-BoldItalicMT, TrueType, not embedded
    WPTypographicSymbols, TrueType, embedded subset

    1/17/2012 Board Meeting Agenda:
    TimesNewRomanPS-BoldMT, TrueType, not embedded
    TimesNewRomanPSMT, TrueType, not embedded
    TimesNewRomanPS-ItalicMT, TrueType, not embedded

    “Binder1″ Notice of 1/17/2012 Board Meeting:
    TimesNewRomanPS-BoldMT, TrueType, not embedded
    TimesNewRomanPSMT, TrueType, not embedded
    TimesNewRomanPS-ItalicMT, TrueType, not embedded
    Arial-BoldMT, TrueType, not embedded
    ArialMT, TrueType, not embedded
    SymbolMT, TrueType (CID), embedded subset
    WPTypographicSymbols, TrueType, embedded subset

    “Board Meeting Package January 17.pdf”:
    TimesNewRomanPS-BoldMT, TrueType, not embedded
    TimesNewRomanPSMT, TrueType, not embedded

    The real Heartland Institute documents consistently use TrueType fonts from a limited group, and every one is using Times New Roman. The “Climate Strategy” pdf uses Times and is not TrueType. Could that be an artifact of the scanning, that the Epson scanner software recognized there was one font in three different forms yet misidentified the exact font?

    Occam’s Razor suggests another explanation to me, but I’d rather hear from more experienced commenters about this finding before reaching a definitive conclusion.
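
The font listing in comment 172 came from Evince's Properties dialog. A rough way to script the same check is sketched below, assuming the `pdffonts` command from Poppler is installed and prints its usual table (name, type, encoding, emb, sub, uni, object ID); that column layout is an assumption about recent Poppler versions, and the function names are hypothetical.

```python
import subprocess


def parse_pdffonts(output):
    """Turn pdffonts' text table into a list of dicts.

    Assumed layout per row: name, type (may contain spaces), encoding,
    emb, sub, uni, object number, generation number.
    """
    fonts = []
    for row in output.splitlines()[2:]:  # skip the header and dashed rule
        tokens = row.split()
        if len(tokens) < 8:
            continue  # not a font row
        fonts.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),  # e.g. "Type 1" or "TrueType"
            "embedded": tokens[-5] == "yes",
            "subset": tokens[-4] == "yes",
        })
    return fonts


def list_fonts(pdf_path):
    """Run pdffonts on one PDF (requires Poppler on the PATH)."""
    result = subprocess.run(["pdffonts", pdf_path],
                            capture_output=True, text=True, check=True)
    return parse_pdffonts(result.stdout)


def font_types(fonts):
    """Distinct font types in a document, for side-by-side comparison."""
    return sorted({f["type"] for f in fonts})
```

Calling `font_types(list_fonts(path))` on each PDF would show at a glance whether one document reports Type 1 fonts while the others report TrueType. It says nothing about why they differ.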

  173. Francisco says : “Reliably anti-climate” sounds like a clear giveaway slip.

    Definitely. How can you be “anti-climate” ? This comes out of the same word book as “anti-science”. Every paragraph of this memo reeks of being a fake written by a warmist. In view of who sent it and the strong parallels to his writing style, I’d say he’s now going to keep his head down until the plea bargaining stage.

    “anti-climate” ? Yeah, I’m totally anti-climate, I think climate should be banned! Weather, I can live with, but climate …

    Some of this crew, of which Gleick is a good example, have so seriously lost the plot they think climate change itself is the “cause”. Anyone who cared about Earth’s climate, nature, wild-life(no comma) and future of the planet would be over-joyed if the alarming warming trend at the end of the last century flattened out. But for these guys, NOT seeing the climate rush to thermageddon seems to be the end of the world [sic]. They must, at all ends, maintain the pretence, frig the figures, even resort to serious felony to stop anyone saying the awful truth: IT’S NOT AS BAD AS WE THOUGHT.

    Environmentalism could be regarded as a cause. Not shitting on our own collective doorstep and destroying a biosphere that keeps us alive makes a lot of sense. Except that the environmental movement seem to have forgotten what REAL pollution looks like.

    The whole thing has got rather perverse. It seems they are praying for more catastrophic global warming so they can show how urgently we need to do something to save the planet from catastrophic global warming.

    The same is true of the other side to some extent. There seem to be some on the denier end of the sceptic scale that would love to see another LIA so that they can be proved right.

    I think we should consider ourselves very fortunate to be living in a period with such a benign climate.

    It seems the software Anthony suggested using here has little worth as a serious tool; check back in five years. However, the discussion of human analysis of the content seems to have been quite productive.

    If, as he claims, someone sent him the memo anonymously, they had a writing style remarkably similar to his own.

    Of course, this is not necessarily at odds with his careful statements made so far. He could well have sent it to himself “anonymously” as a precaution.

    Unfortunately this adds “premeditated” to the charge sheets, rather than it being an emotional and uncharacteristic mistake in a moment of frustration.

    I’d say Gleick is up shit creek.

  174. Most of the people on this thread who have tried running the text analysis software are coming to what I believe to be the correct conclusion – it will not offer much evidence as to who may have written the fake agenda.

    The software is reasonably well written and easy to use, although I think it would be more useful if it could save some of the intermediate results and identify the specific documents in the output. There are many available choices for the type of analysis to be done, but that also creates a problem for the new user as to which ones to choose. To understand these, the user needs an understanding of each method and the process by which the statistics are calculated. For the latter, a course in multivariate statistical analysis would be recommended as an essential prerequisite.

    For example, some of you have used the Character 2Grams as a method. This method counts the frequency of occurrence of all possible consecutive pairs of letters for each document. This forms strings of numbers which are compared to the string from the target document. The “closest” document to the target is deemed the most likely to be created by the same author. “Closest” is determined by the choice of Analysis Method and Distance function, and different selections may very well give conflicting results. To understand what you are getting, you need to know how each of these is defined.

    But that isn’t enough. The effectiveness of these methods depends very much on the character of the document to which it is applied, and on the length of the target text. Thus the choice of documents and the comparison individuals used is very important and will affect the conclusions one comes up with. In practice, one would apply many methods to examine various aspects of the text and base their conclusions on the overall picture. This requires some further knowledge and experience to use the methodology properly.

    In the case of the fake, we seem to be left with the prospect of doing what we have been doing all along, looking at the words and trying to infer where they came from. Where is Mosher when you need him? ;)
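
The character 2-gram comparison described in comment 174 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not JGAAP's actual code: the function names are hypothetical, counts are normalized by document length (JGAAP may weight events differently), and the "nearest" document is simply the one with the smallest Canberra distance.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Relative frequency of every run of n consecutive characters."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def canberra(p, q):
    """Canberra distance over the union of events seen in either profile."""
    dist = 0.0
    for g in set(p) | set(q):
        a, b = p.get(g, 0.0), q.get(g, 0.0)
        # a + b > 0 here because g occurs in at least one profile
        dist += abs(a - b) / (a + b)
    return dist


def rank_by_distance(unknown, known, n=2):
    """Sort labelled sample texts from closest to farthest from the unknown."""
    target = char_ngrams(unknown, n)
    scored = [(canberra(target, char_ngrams(text, n)), label)
              for label, text in known.items()]
    return sorted(scored)
```

Changing `n`, the normalization, or the distance function can reorder the ranking, which is the point made above about different selections giving conflicting results.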

  175. “Could that be an artifact of the scanning, that the Epson scanner software recognized there was one font in three different forms yet misidentified the exact font?”

    Some printers have fonts, some rely on the software to render fonts. A scanner is a scanner, it does pixels not fonts.

    To get back to fonts implies OCR. That could be part of the scanner software bundle, a separate process done by Gleick, or a feature of Evince. I doubt the latter but have not checked.

    A simple jpeg scan can be inserted into a pdf. ANY modern scanner will do a highly respectable scan with a resolution that will be near photo quality. The crappy rendition suggests a low res scan and OCR were probably done as further obfuscation.

    +1 for forgery
    +1 for premeditation.

  176. Normally I wouldn’t post private conversations, but this may help your analysis. A while back Peter Gleick responded to an email I wrote to him complaining about his article in Forbes “Paper Disputing Basic Science of Climate Change is “Fundamentally Flawed,” Editor Resigns, Apologizes”. He responded fairly quickly, so I suppose that this would be his writing unedited.

    He’s already in a whole world of trouble. Since he admitted it fairly early, I guess I have some small hope that he was honest in saying that he didn’t write the fake memo.

    From: Peter H. Gleick
    To: Scott Basinger ; “pgleick@pacinst.org”
    Sent: Saturday, September 3, 2011 2:40:59 PM
    Subject: Re: Article in Forbes

    There are many examples where theoretical models, created and based on our understanding of physical, chemical, biological (etc.) laws have made predictions BEFORE observations were made. When observations are then made, they either confirm the model, or tell us the model must be adjusted. Models are part of the overall scientific process.

    In the end, your statement that modelers must reconcile with reality is correct. But models are a very important and useful part of the process.
    There is a classic statement: “All models are wrong, but some are useful.” Climate models have been incredibly useful.

    And frankly, they just saved perhaps hundreds of lives by estimating, incredibly accurately, the path of Hurricane Irene. These models are the children of the climate models we run.

  177. From P. Solar on February 24, 2012 at 10:30 am:

    To get back to fonts implies OCR. That could be part of the scanner software bundle, a separate process done by Gleick, or a feature of Evince. I doubt the latter but have not checked.

    It’s not Evince, the document viewer. It reported no fonts at all in the 2010 IRS Form 990, nor can any text be selected although it is clearly visible and legible. Meanwhile the suspect “Climate Strategy” and the other real Heartland docs have selectable text. Therefore Evince is not performing any OCR, that info would have to be in the pdf.

  178. This isn’t really going to work. The fake document quotes extensively from the other ones. The author used few of his own words.

  179. Duke C. says:
    February 24, 2012 at 9:27 am

    “kim2000, nice bit of sleuthing, there.”

    ——————-
    Thank you, Mr. C :)
    ————————-
    “I have been trying to replicate the steps that the Strategy memo writer followed when creating the document. If you copy and paste the PDF OCR overlay into Microsoft Word you can come very close! When you open Word, it defaults to the same type font and size (Times New Roman 12 point, unless your copy of Word has customized settings). And, it is very easy to add a header that matches the Strategy Memo. So, it’s very plausible.

    If you have Word on your computer, you can use this example:

    http://dl.dropbox.com/u/18009262/January%202012%20-%20Copy.doc

    ———————————-

    Thank you, for your work and link.
    I wonder if any readers, here, have a PI actual letterhead [ logo ]?
    We might be able to get the actual pixel sizes of the logo?

  180. kadaka (KD Knoebel) says:
    February 24, 2012 at 9:40 am

    “Using Evince document viewer (Debian Linux), Properties, fonts in the pdf’s:……………”

    ——————-
    That is interesting! Thank you.

  181. “I guess I have some small hope that he was honest in saying that he didn’t write the fake memo.”

    I think you may be falling for the assumption he wanted people to make, without his actually saying it in a way that would be untruthful.

    IIRC he just said “someone” had sent it to him anonymously in the mail.

    That statement does not preclude him having written it then (aclaimedly or really) posted it to himself “anonymously”.

    That may get him around not having lied in his HuffPo confession but I don’t think it would pass the test for “the truth, the whole truth and nothing but the truth” under oath.

    I’m wondering how long it would take for anyone else who admitted a felony, to be arrested.

    I seem to recall Tallbloke got the dawn raid treatment just to be interviewed as a witness.

  182. re fonts: Heartland appears to have used WordPerfect. Can you tell from the scan whether the fake memo is in Word?

  183. P. Solar says:
    February 24, 2012 at 12:30 pm
    “IIRC he just said “someone” had sent it to him anonymously in the mail.

    That statement does not preclude him having written it then (aclaimedly or really) posted it to himself “anonymously”. ”

    Attention. The confession states that he received a document in the snail mail. And it says he forwarded “the material” to DeSmogBlog et al. But it DOESN’T say anything about whether the snail mailed document was part of that; whether the snail mailed document is identical to the suspect strategic memo, and whether or not he produced the memo himself. It’s very carefully wordsmithed to leave all of these conditions open. This also means that it was a lawyer who wrote it. It also has no redundant parentheses.

  184. Anthony, you are an evil man :) Some organization needs to give you a genius award! Sorry, I can’t be a part of your consensus.

  185. P. Solar said @ February 24, 2012 at 10:13 am

    Some of this crew, of which Gleick is a good example, have so seriously lost the plot they think climate change itself is the “cause”. Anyone who cared about Earth’s climate, nature, wild-life(no comma) and future of the planet would be over-joyed if the alarming warming trend at the end of the last century flattened out. But for these guys, NOT seeing the climate rush to thermageddon seems to be the end of the world [sic]. They must, at all ends, maintain the pretence, frig the figures, even resort to serious felony to stop anyone saying the awful truth: IT’S NOT AS BAD AS WE THOUGHT.

    Environmentalism could be regarded as a cause. Not shitting on our own collective doorstep and destroying a biosphere that keeps us alive makes a lot of sense. Except that the environmental movement seem to have forgotten what REAL pollution looks like.

    The whole thing has got rather perverse. It seems they are praying for more catastrophic global warming so they can show how urgently we need to do something to save the planet from catastrophic global warming.

    The same is true of the other side to some extent. There seem to be some on the denier end of the sceptic scale that would love to see another LIA so that they can be proved right.

    I think we should consider ourselves very fortunate to be living in a period with such a benign climate.

    All too true P. I remember back in the 1960s when living in UKLand that if it started to rain after me mam hung out the washing, she’d rush to get it off the clothesline. There was so much soot in the air, the clothes would have ended up dirtier than before they were washed. Now that was pollution! Personally, I’d like just a tad more warmth in the summer, but then I live in the southern hemisphere where we don’t seem to have any “global” warming.

    It seems the software Anthony suggested using here has little worth as a serious tool; check back in five years. However, the discussion of human analysis of the content seems to have been quite productive.

    Pompous Gits being what they are (fragile ego and all) I used a number of examples of my own writing. This software enables me to “prove” that The Git did, or did not write what he wrote. Of course I’ve never used this software before and don’t really understand what I am doing. It’s probably even less useful than grammar checkers.

  186. Steve McIntyre said @ February 24, 2012 at 1:11 pm

    re fonts: Heartland appears to have used WordPerfect. Can you tell from the scan whether the fake memo is in Word?

    Opened Kim2000’s copy of the Word doc in Word 2010 (default settings) but the lines of text are too long. It creates 10 instances of orphaned words on their own line.

  187. Steve McIntyre says:
    February 24, 2012 at 1:11 pm

    re fonts: Heartland appears to have used WordPerfect. Can you tell from the scan whether the fake memo is in Word?

    ——————————————————————————————————————–

    The font is an identical match in Word 2003. Kerning and line spacing are off, however. I printed out 2012 Climate Strategy.pdf and overlaid it on a printout of a Word doc version, w/ default settings, and held it up to a back-light.

    Don’t have access to WordPerfect. The results might be telling. Since it’s a TrueType font, however, they may be indistinguishable from each other.

  188. Here’s an idea for those who have a full version of Adobe and/or WordPerfect.

    Open a new doc and type the first paragraph of 2012 Climate Strategy w/ the default settings. Then try it with settings from one of the HI PDFs, and the settings from a PI PDF. Print them out, overlay them, and hold them up to a back-light. Do any of the combinations match up?

  189. Elmer from m4gw had an interesting post on some of the discrepancies between the forged document and the others.

    The forger probably did not reset the default formatting on their word-processing app all that carefully. It looks like all they did was go for a font/pt match and decided that was good enough.

    http://www.minnesotansforglobalwarming.com/m4gw/2012/02/desmogbloggate.html

    “The Style is Different: The faked “2012 Climate Strategy” is in a completely different style than the other live text documents. They both use the Font “Times New Roman” and they are both 12pt but that is where the similarities end. The headlines and subheads on the real documents are 18 pt., the subheads are numbered and the paragraphs are indented. The fake document doesn’t use any of these devices.

    “…The Leading is Different: In all of the live text documents the leading is 14pt but on the fake memo its 16 pt. I overlayed the fake document (in gray) over the real one to show the difference.”

  190. Philemon says:
    February 24, 2012 at 4:07 pm

    Yup. That’s what I’m thinking….
    The memo writer used the existing formatting from a previously loaded .doc or .pdf.

  191. OK-
    Might have found something.

    This is the Pac Inst 2011 Funders List which, ironically, was posted this morning:

    http://www.pacinst.org/about_us/financial_information/funders_2011.pdf

    The heading, left hand margin, and line spacing/kerning are identical to the Strategy Memo.
    However, the font is 14 pt, not 12 pt.

    So if one were to load this PDF, then select/delete all text, change the font size to 12 pt., would the kerning be preserved?

    (That being said, been at the ‘puter all day, my wife is p***ed. I’m out for awhile :))

  192. The analytical problem is separating the author’s style from the subject and content which has been lifted from Heartland sources. So there are mixed DNA traces in the text and word-match tests are going to get tangled. The punctuation and letterhead clues are going to be less contaminated.

  193. I ran an n-gram analysis of the writings of several climate-related authors and checked to see how well they matched the unknown memo; this is a fairly simplistic method, but interesting nonetheless. The 5 best matches were:

    1. Richard Littlemore 21.08%
    2. John Mashey 18.87%
    3. Peter Gleick 18.63%
    4. David Karoly 18.38%
    5. Joe Bast 18.29%

    (As a control, I included a Woody Allen short story; he scored a 6.91% match, Ha! A result! Woody Allen did not write the fake memo ….. where’s my government grant?)

    All of which says this is too blunt a tool to use, and more specific algorithms (stop-word analysis, for example) might show better results.
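
Comment 193 does not say how its match percentages were computed. One plausible reading, sketched below purely as an assumption, is the share of the unknown text's distinct word n-grams that also appear in a candidate author's writing. The function names and the choice of `n=3` are hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Set of distinct runs of n consecutive lower-cased words."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def match_percent(unknown, candidate, n=3):
    """Percentage of the unknown text's n-grams also found in the candidate."""
    target = word_ngrams(unknown, n)
    if not target:
        return 0.0  # text too short to form any n-gram
    shared = target & word_ngrams(candidate, n)
    return 100.0 * len(shared) / len(target)
```

A score like this rewards shared subject matter as much as shared style, so closely bunched results for authors who all write about climate are what one would expect. An unrelated control text, like the Woody Allen story, gives a sense of the floor.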

  194. I used JGAAP to compare the 2012 Climate Strategy document from DeSmog’s blog to a set of documents that I was reasonably confident had been authored by either Joe Bast or Peter Gleick. The set of documents that I used for training follows:

    Bast 01 – Email to Judy Curry 02/24/12
    Bast 02 – HI doc: 10 Bold New Projects for 2012
    Bast 03 – HI Press Release 02/20/12
    Bast 04 – HI Press Release 02/24/12
    Gleick 01 – Forbes: Reply to Taylor 01/25/12
    Gleick 02 – Forbes Article 01/05/12
    Gleick 03 – Forbes Article 01/27/12
    Gleick 04 – Huffington Post Blog 02/20/12

    The results of the analysis by Nearest Neighbor Driver with metric Kendall Correlation Distance using Character 2Grams as events are:

    Gleick 02 = 0.1746
    Gleick 04 = 0.2008
    Bast 02 = 0.2839
    Bast 01 = 0.3090
    Gleick 01 = 0.3117
    Bast 04 = 0.3189
    Bast 03 = 0.3902
    Gleick 03 = 0.3920

    This indicates that Gleick 02 is the closest match to the 2012 Climate Strategy document (a smaller distance means a closer match). I have no insights about the relative significance of the above-listed values for each author.

    I also used other event drivers (MW Function Words, Sentence Length, and Syllables per word), analysis methods (Centroid, Linear SVM, Markov Chain) and weighting distances (Keselj-weighted, Pearson Correlation). Gleick 02 was the most likely author in all cases except Centroid w/Kendall Correlation Distance and Nearest Neighbor w/Keselj distance, which indicated that Gleick 04 is the most likely author of the 2012 Climate Strategy document.

    IMHO, based on my analysis, Peter Gleick is the most likely author of the 2012 Climate Strategy document.

    As RomanM reported, I also found JGAAP easy to use but it was a bit finicky and didn’t accept *.docx files or a mixture of files. I finally converted all of the training documents and the unknown document to *.txt files. Also, I found the user documentation did not provide me much help understanding the analytical methods used in the program.
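
For readers wondering what a "Kendall Correlation Distance" between two documents might mean, here is a minimal sketch. It assumes the distance is 1 minus Kendall's tau computed over the two documents' event counts; JGAAP's exact definition and tie handling may differ, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


def kendall_distance(p, q):
    """Assumed form: 1 - tau over the union of events in both profiles."""
    events = sorted(set(p) | set(q))
    x = [p.get(e, 0) for e in events]
    y = [q.get(e, 0) for e in events]
    return 1.0 - kendall_tau(x, y)
```

Under this reading, 0 means the two documents rank their events in exactly the same order of frequency, and 2 means the orders are exactly reversed. Only the ordering matters, not the size of the counts.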

  195. In my 02/27/2012 3:05PM comment, I said that I found the JGAAP user documentation did not provide me much help understanding the analytical methods used in the program. Subsequently, I found this document,
    Authorship Attribution
    by Dr. Patrick Juola, one of the lead developers of JGAAP, to be very informative. It is lengthy (~100 pages), so have a couple of glasses of wine and read slowly!!
