A look at comment deletion at RealClimate compared with WUWT
Guest post submitted by Ian Rons
Regular readers will doubtless be familiar, either at first- or second-hand, with the enthusiasm with which moderators at RealClimate.org seem to reject comments from AGW sceptics. Ecotretas’ recent story on Realclimate censorship (re-posted here) piqued my interest, since in addition to the usual tones of indignation, it suggested a method of estimating the RealClimate comment deletion rate by looking at the comment IDs (as revealed by WordPress’s use of an HTML attribute) and counting the number of these IDs which are missing from the sequence.
Being at a loose end, I took up the cudgels and wrote a script using PHP and cURL which took about an hour to mine every available page on realclimate.org by accessing the page using its WordPress “post_id” value (from http://…/?p=1 to http://…/?p=8092, as at 14th July), extracting comment IDs with a simple regular expression and doing a bit of maths on the result. Naturally, the source code and the various output files are available upon request.
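As an illustration only, the counting step can be sketched in a few lines of Python. This is not the PHP/cURL script described above: the `id="comment-<number>"` pattern is an assumption about how the theme marks up comments, the page fragments are made-up stand-ins for fetched pages, and the function names are hypothetical.

```python
import re

# Assumption: each published comment is rendered with id="comment-<number>".
COMMENT_ID_RE = re.compile(r'id="comment-(\d+)"')


def extract_comment_ids(html: str) -> set[int]:
    """Collect every comment ID that appears in one page of HTML."""
    return {int(m) for m in COMMENT_ID_RE.findall(html)}


def missing_stats(seen_ids: set[int]) -> tuple[int, int, float]:
    """Return (missing, total, missing_pct) over the span min..max of seen IDs."""
    lo, hi = min(seen_ids), max(seen_ids)
    total = hi - lo + 1                # IDs that were allocated in this span
    missing = total - len(seen_ids)    # allocated but not visible on the site
    return missing, total, 100.0 * missing / total


# Hypothetical page fragments standing in for pages fetched via ?p=<post_id>.
pages = [
    '<li id="comment-101">a</li><li id="comment-102">b</li>',
    '<li id="comment-105">c</li><li id="comment-106">d</li>',
]

seen: set[int] = set()
for html in pages:
    seen |= extract_comment_ids(html)

missing, total, pct = missing_stats(seen)
print(missing, total, round(pct, 1))
```

Grouping the same counts by the month of the surrounding comments would give per-month figures of the kind tabulated further down.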
The figures are rather high, though doubtless some missing comments will have been spam.
However, at least in recent times, the RC site has employed the “re-CAPTCHA” service, which (unlike the Akismet service used by WUWT) does not create a new comment ID if the comment is rejected for being spam. The 56% of comments missing during June 2011, for instance, therefore seems likely to be an accurate figure, unless some other explanation can be found.
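Why the order of filtering matters can be shown with a toy model. The sketch below is not WordPress code; the `CommentTable` class and `submit` function are hypothetical, and the only behaviour assumed is that an auto-increment ID, once allocated, is never reused.

```python
class CommentTable:
    """Toy stand-in for a comments table with an auto-increment ID."""

    def __init__(self):
        self.next_id = 1
        self.rows = {}

    def insert(self, text):
        cid = self.next_id
        self.next_id += 1
        self.rows[cid] = text
        return cid

    def delete(self, cid):
        del self.rows[cid]  # the ID itself is never handed out again


def submit(table, text, is_spam, filter_before_insert):
    """Store a comment, filtering spam either before or after the insert."""
    if is_spam and filter_before_insert:
        return None           # rejected before any ID is allocated
    cid = table.insert(text)
    if is_spam:
        table.delete(cid)     # stored first, then removed: leaves a gap
    return cid


def gaps(table):
    """Number of allocated IDs that no longer have a visible comment."""
    return (table.next_id - 1) - len(table.rows)


stream = [("ok", False), ("spam", True), ("ok", False), ("spam", True), ("ok", False)]
pre, post = CommentTable(), CommentTable()
for text, is_spam in stream:
    submit(pre, text, is_spam, filter_before_insert=True)
    submit(post, text, is_spam, filter_before_insert=False)

print(gaps(pre), gaps(post))
```

In the first case spam leaves no trace in the ID sequence, so any gaps must come from something else; in the second, spam and moderation deletions are indistinguishable without extra information.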
A possible explanation might be the existence of a large number of comments on the site by an inner circle of users hidden on special-access pages, but I find it hard to believe this could account for a large proportion of (e.g.) the 933 comments which are missing in June 2011. Similarly, the apparent surge of deletions beginning July 2007 may also be truly reflective of events, since it has been suggested in comments here that it “coincided” with RC’s attack on the surface stations project. However it has also been pointed out that such interpretations are impossible to verify using this method.
Overall, there were 78,639 missing comment IDs, out of a total of 210,595, or 37.3%. As for the RC page known as “The Bore Hole”, which started on 6th January 2011 (the date being evident from the post_id of 6013 and the moderator’s response to the first comment, although they have since re-dated it 6th December 2004 for some reason), the comments on that page are of course counted here as “published” comments, however they are small in number (404) when compared with the number of comments which seem to go missing even after that date in January (5,000). At the risk of mixing metaphors, “The Bore Hole” could perhaps be regarded as something of a fig-leaf.
I ran a similar scan on WUWT (also a WordPress site) on the 14th July, extracting data from http://…/?p=1 to http://…/?p=43440. However, analysing WUWT with this method presented several problems that aren’t applicable to the RC site:
- Some earlier comment IDs are out of chronological order (stemming, it seems, from the import from TypePad to WordPress in October 2007), so figures for early months are impossible to calculate. During this period there seems to have been some infilling of comment IDs (probably due to the TypePad import not setting the “auto_increment” values in the database properly), which would affect the overall total; however, the numbers involved (whilst impossible to calculate precisely) are probably at most in the very low hundreds.
- WUWT has always used WordPress’s Akismet spam-filtering, which creates new comment IDs before marking them as spam. Anthony provided me with a screenshot showing the total volume of spam which had been deleted as of early on the 15th July to be 55,097. This can be adjusted down to 55,085 for the period covered by my data to late on the 14th July 2011.
- The Tips & Notes page encourages comments from readers which are not intended to remain permanently on the site, so they are to be regarded as “legitimate” deletions. Anthony provided me with records of the numbers of Tips & Notes comments posted (then eventually deleted) for the period 24th March to 10th July 2011 (3,220), on which I based an estimate of 22,215 “legitimately deleted” comments for the period 23rd June 2009 (when the T&N page was created) to 14th July 2011 inclusive.
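The text does not say how the Tips & Notes estimate was derived. One simple possibility, offered here as an assumption rather than as the author's method, is a constant daily rate taken from the sample period and applied to the whole life of the page. That approach happens to reproduce the quoted figure:

```python
from datetime import date


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


sample_count = 3220  # Tips & Notes comments recorded in the sample period
sample_days = days_inclusive(date(2011, 3, 24), date(2011, 7, 10))
full_days = days_inclusive(date(2009, 6, 23), date(2011, 7, 14))

# Assumption: comments arrived at the same average daily rate throughout.
daily_rate = sample_count / sample_days
estimate = round(daily_rate * full_days)

print(sample_days, full_days, estimate)
```

If the page was quieter in its early months, which seems plausible given the growth in overall comment volume, a constant-rate extrapolation would overstate the total.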
Overall, WUWT has 75,989 missing comment IDs, out of a total of 700,115 submitted comments (10.9%). Subtracting the above figures for Akismet and Tips & Notes gives us a problem, since it’s a negative figure: -1,311. I think this is most likely due to an over-estimation of the number of comments posted on the Tips & Notes page, combined with perhaps a few hundred from the infilling problem mentioned above. However, the combined additions from these two sources of error would have to be in excess of 8,000 to raise the number of deletions to 1% of the total submitted, which I think very unlikely.
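The arithmetic can be checked with a short sketch. All inputs are figures quoted in the post; the 200 “infilled” IDs are the author's own high estimate, and the variable names are purely illustrative.

```python
missing_ids = 75_989          # missing comment IDs on WUWT
submitted = 700_115           # total comment IDs allocated
akismet_spam = 55_085         # spam removed by Akismet in the period
tips_notes_estimate = 22_215  # estimated Tips & Notes deletions

# Deletions left unexplained after removing spam and Tips & Notes.
residual = missing_ids - akismet_spam - tips_notes_estimate

# How far the two error sources would have to move to reach a 1% rate.
one_percent = submitted / 100
shortfall = one_percent - residual

# With 200 infilled IDs assumed, the rest must come from Tips & Notes.
infilled = 200
tips_notes_needed = tips_notes_estimate - (shortfall - infilled)
overestimate_pct = 100 * (tips_notes_estimate / tips_notes_needed - 1)

print(residual, round(shortfall), round(overestimate_pct, 1))
```

The shortfall comes out a little above 8,300, and the implied over-estimate at roughly 57.5%, in line with the rounded figures in the text.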
Putting it another way, and assuming a total of 200 “infilled” comment IDs (a high estimate, in my opinion), I would have to have over-estimated the volume of Tips & Notes comments by some 58% to reach a 1% deletion rate. I therefore see no reason to doubt the claims made on behalf of WUWT that the deletion rate is less than 1%. In fact it may be considerably lower. It is, however, noteworthy that December 2007 and January 2008 show high deletion rates, with another bump during Sep-Oct 2008:
In summary, whilst there remains some uncertainty especially regarding the RC data, the data does tend to support the anecdotal evidence concerning RC’s tendentious comment moderation practices. It also tends to support (or at least does not contradict) WUWT’s claims of a <1% comment deletion record.
For reference, here are the monthly totals which I used for the graphs. This table excludes incomplete months and some early WUWT months as noted:
| Month | RealClimate: Missing | RealClimate: Submitted | RealClimate: Missing (%) | WUWT: Missing | WUWT: Submitted | WUWT: Missing (%) |
|---|---|---|---|---|---|---|
| Jan 2005 | 86 | 524 | 16.4% | |||
| Feb 2005 | 111 | 383 | 29% | |||
| Mar 2005 | 53 | 286 | 18.5% | |||
| Apr 2005 | 96 | 294 | 32.7% | |||
| May 2005 | 47 | 305 | 15.4% | |||
| Jun 2005 | 119 | 482 | 24.7% | |||
| Jul 2005 | 524 | 826 | 63.4% | |||
| Aug 2005 | 255 | 474 | 53.8% | |||
| Sep 2005 | 112 | 527 | 21.3% | |||
| Oct 2005 | 99 | 664 | 14.9% | |||
| Nov 2005 | 67 | 654 | 10.2% | |||
| Dec 2005 | 544 | 1150 | 47.3% | |||
| Jan 2006 | 277 | 944 | 29.3% | |||
| Feb 2006 | 306 | 1236 | 24.8% | |||
| Mar 2006 | 390 | 1292 | 30.2% | |||
| Apr 2006 | 660 | 2130 | 31% | |||
| May 2006 | 580 | 1477 | 39.3% | |||
| Jun 2006 | 174 | 995 | 17.5% | |||
| Jul 2006 | 142 | 1252 | 11.3% | |||
| Aug 2006 | 888 | 2123 | 41.8% | |||
| Sep 2006 | 253 | 1005 | 25.2% | |||
| Oct 2006 | 340 | 1055 | 32.2% | |||
| Nov 2006 | 114 | 1290 | 8.8% | |||
| Dec 2006 | 62 | 876 | 7.1% | |||
| Jan 2007 | 203 | 1791 | 11.3% | |||
| Feb 2007 | 223 | 2282 | 9.8% | |||
| Mar 2007 | 343 | 3107 | 11% | |||
| Apr 2007 | 160 | 1960 | 8.2% | |||
| May 2007 | 213 | 2271 | 9.4% | |||
| Jun 2007 | 188 | 2055 | 9.1% | |||
| Jul 2007 | 4061 | 5724 | 70.9% | |||
| Aug 2007 | 7171 | 9511 | 75.4% | |||
| Sep 2007 | 4140 | 5499 | 75.3% | |||
| Oct 2007 | 4561 | 7091 | 64.3% | |||
| Nov 2007 | 6064 | 8226 | 73.7% | 108 | 476 | 22.7% |
| Dec 2007 | 4184 | 6073 | 68.9% | 547 | 869 | 62.9% |
| Jan 2008 | 493 | 1938 | 25.4% | 497 | 1217 | 40.8% |
| Feb 2008 | 452 | 1656 | 27.3% | 536 | 2027 | 26.4% |
| Mar 2008 | 332 | 1444 | 23% | 776 | 3212 | 24.2% |
| Apr 2008 | 854 | 2222 | 38.4% | 396 | 3023 | 13.1% |
| May 2008 | 1159 | 3050 | 38% | 465 | 3192 | 14.6% |
| Jun 2008 | 880 | 2526 | 34.8% | 586 | 5781 | 10.1% |
| Jul 2008 | 1156 | 3086 | 37.5% | 751 | 6651 | 11.3% |
| Aug 2008 | 922 | 2733 | 33.7% | 514 | 6775 | 7.6% |
| Sep 2008 | 873 | 2827 | 30.9% | 1596 | 9174 | 17.4% |
| Oct 2008 | 692 | 1892 | 36.6% | 1918 | 8936 | 21.5% |
| Nov 2008 | 1466 | 3026 | 48.4% | 931 | 7012 | 13.3% |
| Dec 2008 | 1089 | 3127 | 34.8% | 436 | 7599 | 5.7% |
| Jan 2009 | 1063 | 3269 | 32.5% | 508 | 11357 | 4.5% |
| Feb 2009 | 834 | 2587 | 32.2% | 1053 | 12586 | 8.4% |
| Mar 2009 | 1232 | 3260 | 37.8% | 857 | 16186 | 5.3% |
| Apr 2009 | 1635 | 4369 | 37.4% | 662 | 16291 | 4.1% |
| May 2009 | 2037 | 4361 | 46.7% | 641 | 14217 | 4.5% |
| Jun 2009 | 808 | 3183 | 25.4% | 1236 | 13525 | 9.1% |
| Jul 2009 | 646 | 3664 | 17.6% | 1561 | 14722 | 10.6% |
| Aug 2009 | 384 | 2341 | 16.4% | 1606 | 13619 | 11.8% |
| Sep 2009 | 337 | 1657 | 20.3% | 1802 | 15389 | 11.7% |
| Oct 2009 | 722 | 3699 | 19.5% | 2187 | 19746 | 11.1% |
| Nov 2009 | 1518 | 5745 | 26.4% | 2945 | 25712 | 11.5% |
| Dec 2009 | 981 | 6401 | 15.3% | 4339 | 36716 | 11.8% |
| Jan 2010 | 728 | 5349 | 13.6% | 2250 | 26840 | 8.4% |
| Feb 2010 | 966 | 6020 | 16% | 2267 | 26640 | 8.5% |
| Mar 2010 | 873 | 5066 | 17.2% | 2349 | 26051 | 9% |
| Apr 2010 | 883 | 4227 | 20.9% | 2312 | 23259 | 9.9% |
| May 2010 | 966 | 3425 | 28.2% | 2877 | 20174 | 14.3% |
| Jun 2010 | 983 | 2915 | 33.7% | 2295 | 19584 | 11.7% |
| Jul 2010 | 1613 | 3808 | 42.4% | 2789 | 23840 | 11.7% |
| Aug 2010 | 772 | 2324 | 33.2% | 3211 | 27241 | 11.8% |
| Sep 2010 | 770 | 2072 | 37.2% | 3414 | 24257 | 14.1% |
| Oct 2010 | 681 | 2267 | 30% | 2547 | 24362 | 10.5% |
| Nov 2010 | 824 | 2698 | 30.5% | 2667 | 20508 | 13% |
| Dec 2010 | 1942 | 3744 | 51.9% | 1983 | 22411 | 8.8% |
| Jan 2011 | 685 | 2794 | 24.5% | 2716 | 24451 | 11.1% |
| Feb 2011 | 963 | 2901 | 33.2% | 2243 | 22524 | 10% |
| Mar 2011 | 1077 | 2326 | 46.3% | 2371 | 23480 | 10.1% |
| Apr 2011 | 684 | 1674 | 40.9% | 2124 | 17466 | 12.2% |
| May 2011 | 738 | 1679 | 44% | 2457 | 20028 | 12.3% |
| Jun 2011 | 933 | 1677 | 55.6% | 2544 | 20682 | 12.3% |
====================================================================
UPDATE:
Thanks to Ian. It should be noted that I had no influence of any kind on his analysis, other than providing the input data he requested. It is published exactly as he presented it to me, with only some small edits for formatting, with no content changes.
I thought this might be a good time to show something I encountered personally on June 7th, 2009 at RC. Gavin posted up a thread asking for ideas about the blog.
http://www.realclimate.org/index.php/archives/2009/06/groundhog-day-2/
His central question to readers was:
“What is it that you feel needs more explaining?”
I decided I’d offer my suggestion. Big mistake. Here’s a series of screen caps I made illustrating the central systemic bias that RC has, even for basic and germane topics.
It starts out like this when my first suggestion was not published:
It never appeared, so I thought I’d try an experiment. Using my wife’s computer (on the same DSL circuit, same IP address) I decided I’d submit an upbeat generic comment that didn’t offer any sort of challenge to RC using a new email account to see if it was an automation problem related to IP or my name/email address, or if it was simply that RC does not like challenges to their position:
And amazingly, it went right through. So I knew I was not being blocked by IP address or name/keyword. As you can see below, it was approved:
So, I tried again, again on the same home network, my PC this time:
And here it is awaiting moderation:
Nope, it was consigned to the ether:
A few comments later, we can see who is moderating: Gavin himself. Note the inline response:
I decided to send a polite email inquiring about my missing comments:
And of course, I never received a response.
So there you have it, even when they ASK for ideas, ones that come from skeptics are apparently deleted; real open debate from a Real Climate scientist, Dr. Gavin Schmidt of NASA GISS.
Update#2 Ric Werme asks in comments:
The next question is “Why do people even bother posting comments at RC?” I’ve found it easier to not go there at all, so I don’t get tempted to add a comment.
Apparently, other than dhogaza and a few hangers on, not many do:
Except for search engine hits, WUWT beats RC in every measure. See for yourself here:
http://www.alexa.com/siteinfo/wattsupwiththat.com+realclimate.org#trafficstats
Ian Rons,
Thank you for that superb post. I know this expression will displease some, but “let the numbers speak for themselves”.
What I’d really like to know is the contribution record of certain regular RC commenters. Any chance on finding out the comment numbers and percentages of such ‘scientific’ luminaries as:
dhogaza
CompletelyFedUp
SecularAnimist
Hank Roberts
Paul Burton Leveson.
Any chance a computer program could tell us how much that privileged lot has contributed quantitatively?
sHx,
You have identified five echo chamber commentators who probably account for half the total RC comments. Add the next 5 and you’re probably up to ≈70% of the total.
Tamino! What are you doing on this blog and what have you done with Ian???
I have to say, Ian – this is not the blog where I expected reasonable skepticism to draw ad hominem attacks from the article authors.
I remain skeptical, I don’t recall mentioning MySQL 4.x, and I’m out of patience with you.
Re dp on July 24, 2011 at 11:28 am:
Heh, lots of command line work to find the nameserver, when it pops up when you WHOIS the site.
Here in the modern age I just Googled “realclimate server host” and found this site with all that info and more.
We estimated that realclimate.org is ranked #162,390 of all websites. Beautiful. BTW: We estimated that wattsupwiththat.com is ranked #18,056 of all websites.
I have a question for you gathered internet site gurus. The physical site of the web83-dot-webfaction-dot-com server is Dallas, Texas. Note something strange in the “Traffic by City – Monthly Data”:
| City Name | Unique Visitors (%) | Page Views (%) |
|---|---|---|
| Dallas-Fort Worth | 14.70% | 11.80% |
| Chicago | 8.00% | 10.00% |
| San Diego | 4.70% | 6.50% |
| Miami | 4.60% | 6.80% |
| Washington | 3.80% | 3.70% |
| Monterey-Salinas | 3.40% | 3.70% |
| Other Cities | 60.80% | 57.50% |
The server is in Dallas. The Dallas-Fort Worth numbers look skewed, much higher on Unique Visitors vs Page Views, with Unique Visitors nearly twice as high as the next-highest city. Is the server somehow generating its own “Unique Visitors”, perhaps generating its own “hits”?
# of comments in RC’s “Borehole”: 404
How ironic!
[Sorry if this has been posted – I hastily scrolled the comments and couldn’t see it. For those unfamiliar with the term, 404 is the internet standard code for ‘not found’, typical of any post that does not conform to the Real Climate ideology!]
dp said on July 24, 2011 at 7:06 pm:
I must have missed the meeting where it was decided to redefine the term “ad hominem”. You came along saying you are a very experienced IT professional, and whilst I did refer back to this in a somewhat sceptical manner, this was a response to your argumentum ad verecundiam and not an ad hom. It’s very easy to claim expertise when posting anonymously, and I have every right to question those claims — not that it gets to the substance of the issues under discussion, which I have considered (and examined empirically, as far as possible) in a manner which I think is conscientious.
In your reply, you “quoted” my post with the words “bla bla bla” and simply restated your points more vaguely before saying “I’m right”, then accused me of using ad homs in another comment and talked about your professional experience again. I then responded to the only part of these follow-up replies that seemed even vaguely relevant (data migration between versions of MySQL), and frankly I still see no reason to think the RC database has had serious issues (like mangling of numerical primary keys or other inadvertent post/comment deletion), partly because they’ve always used WordPress and partly because the sort of issues you allude to would almost certainly be evident in the data (in post_id and comment ID sequences), or more visibly in character-set conversion problems on screen. And yes, I have experience in this area: I’ve run a WordPress MU installation for a large social/community hub, and dealt with exactly the sort of database migration that comes with the territory of content management systems, including WordPress, Drupal, phpBB and older stuff like postNuke (for what it’s worth). And I didn’t find any evidence of database corruption with RC (unlike WUWT), so if it exists I think it would be very minor. I note Gavin hasn’t made an issue of it, and nobody’s come along to say they recall RC ever losing a lot of posts (it would have to be a lot to make a difference).
Sure, there’s always some uncertainty involved in this sort of forensic analysis, and I think I’ve been appropriately cautious in the OP, but after looking at the data I don’t really see much in what you’re saying to be concerned about.
sHx said on July 24, 2011 at 3:59 pm:
Glad you enjoyed the article. It was fun to write.
This may sound overly cautious, but I think we’re getting into tricky ethical territory if we start mining the data for information about specific individuals in the way you suggest. And it’s too easy to see how that would get “spun” by the warmists as some sort of a hate-list (though obviously they’re not above that themselves). What I am willing to say is that 50% of comments come from 70 distinct usernames/handles. Here are the percentages for these comment-posters:
6.584620505
3.7215775858
3.0506346349
2.1520098032
2.0476240904
1.8796992481
1.7057230602
1.2722954267
1.2609491536
1.1194989486
1.0332672728
1.0052797991
0.9841000893
0.9031633409
0.8010468828
0.7526361175
0.7526361175
0.7163280434
0.6807763876
0.573365002
0.5703393292
0.5680700746
0.56580082
0.5559673832
0.5310055824
0.5226849821
0.5181464728
0.5158772182
0.478812726
0.4780563078
0.4697357075
0.4614151072
0.4296455424
0.4009016505
0.3872861228
0.3676192493
0.3623243219
0.3615679037
0.3570293944
0.3494652123
0.3388753574
0.3358496846
0.3328240117
0.3275290843
0.3108878837
0.3071057927
0.3040801198
0.3040801198
0.2950031013
0.286682501
0.2851696646
0.2677720458
0.264746373
0.2639899548
0.2632335366
0.2473487542
0.2435666631
0.2428102449
0.2397845721
0.2299511354
0.2238997897
0.2238997897
0.2208741169
0.2201176987
0.217848444
0.2170920258
0.2170920258
0.2057457527
0.2027200799
0.2004508253
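As a quick check on how concentrated the commenting is, the first ten percentages in the list above can be summed. This is only a sketch over the figures as posted; the variable names are illustrative.

```python
from itertools import accumulate

# The ten largest shares (percent of all comments), copied from the list above.
top_shares = [
    6.584620505, 3.7215775858, 3.0506346349, 2.1520098032, 2.0476240904,
    1.8796992481, 1.7057230602, 1.2722954267, 1.2609491536, 1.1194989486,
]

# Running total: share of all comments held by the top 1, 2, ... 10 handles.
cumulative = list(accumulate(top_shares))
top10 = cumulative[-1]

print(round(top10, 2))
```

The ten most prolific handles therefore account for a little under a quarter of all published comments.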
Curious – while the comments are definitely opinions and some are even ad hominems, the article is factual. This article is merely seeking the truth through study and analysis – the very core of real science. That Gavin would label discussion and/or debate as ad hominem tends to indicate he has forgotten what science is (as has Hans Moleman). I am sure that should Gavin ever want to start practicing REAL science again, he will allow all contrary comments on his blog. But seeing as how he has forgotten what science is, I doubt his policies will change in the near term.
Ian Rons,
Wow!
The top ten commenters in RC have authored not 70% of all comments as we feared but only approx. 25%. It turns out there is absolutely nothing for Gavin and Mike (“Quality academic journals boast about their rejection rates”) to be alarmed about.
And, yes, I withdraw the question about specific RC names/monikers, lest the bullies claim to be the victims.