Further Down the "Bore Hole"

A look at comment deletion at RealClimate compared with WUWT

Guest post submitted by Ian Rons

Regular readers will doubtless be familiar, either at first or second hand, with the enthusiasm with which moderators at RealClimate.org seem to reject comments from AGW sceptics. Ecotretas’ recent story on RealClimate censorship (re-posted here) piqued my interest, since in addition to the usual tones of indignation, it suggested a method of estimating the RealClimate comment deletion rate: look at the comment IDs (as revealed by WordPress’s use of an HTML attribute) and count the number of these IDs which are missing from the sequence.

Being at a loose end, I took up the cudgels and wrote a script using PHP and cURL which took about an hour to mine every available page on realclimate.org. It accessed each page by its WordPress “post_id” value (from http://…/?p=1 to http://…/?p=8092, as at 14th July), extracted comment IDs with a simple regular expression and did a bit of maths on the result. Naturally, the source code and the various output files are available upon request.
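The counting step can be illustrated with a minimal sketch. This is not the PHP script described above: it is written in Python, it works on an HTML string rather than fetching pages, and it assumes each comment element carries an attribute of the form id="comment-<number>". The function names and the sample markup are hypothetical.

```python
import re

# Assumed markup: each comment element carries id="comment-<number>".
COMMENT_ID_RE = re.compile(r'id=["\']comment-(\d+)["\']')

def extract_comment_ids(html):
    """Return the set of numeric comment IDs found in one page of HTML."""
    return {int(m) for m in COMMENT_ID_RE.findall(html)}

def missing_stats(ids):
    """Count IDs absent between the lowest and highest ID observed.

    Returns (missing, allocated, missing_percent).
    """
    if not ids:
        return 0, 0, 0.0
    allocated = max(ids) - min(ids) + 1   # IDs handed out in this range
    missing = allocated - len(ids)        # handed out but never seen
    return missing, allocated, 100.0 * missing / allocated

# Hypothetical page fragment: IDs 103 and 104 never appear.
sample_page = '''
<li id="comment-101">first</li>
<li id="comment-102">second</li>
<li id="comment-105">third</li>
'''

ids = extract_comment_ids(sample_page)
print(missing_stats(ids))  # (2, 5, 40.0)
```

In a full scan the ID sets from every page would be merged before calling missing_stats, and a monthly breakdown would presumably also need each comment's date, which this sketch leaves out.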

The figures are rather high, though doubtless some missing comments will have been spam.

However, at least in recent times, the RC site has employed the “re-CAPTCHA” service, which (unlike the Akismet service used by WUWT) does not create a new comment ID if the comment is rejected for being spam. So, for instance, the 56% of comments missing during June 2011 seems likely to be an accurate figure, unless some other explanation can be found.
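Why the two spam-handling approaches leave different traces in the ID sequence can be shown with a toy model. The sketch below is purely illustrative: it is not WordPress, Akismet or re-CAPTCHA code, and the simulate function and its parameters are hypothetical. It only models an auto-incrementing ID counter with spam rejected either before or after a row is created.

```python
def simulate(submissions, filter_before_insert):
    """Toy auto-increment comment table.

    submissions: list of booleans, True meaning the submission is spam.
    filter_before_insert: True models a check that rejects spam before a
    row is created; False models a filter that stores the row first and
    marks it as spam afterwards.
    Returns (visible_ids, ids_allocated).
    """
    next_id = 1
    visible = []
    for is_spam in submissions:
        if is_spam and filter_before_insert:
            continue              # rejected up front: no ID is consumed
        comment_id = next_id      # a row is created and takes an ID
        next_id += 1
        if not is_spam:
            visible.append(comment_id)
    return visible, next_id - 1

submissions = [False, True, False, True, True, False]

# Spam rejected before insertion: published IDs stay contiguous.
print(simulate(submissions, filter_before_insert=True))   # ([1, 2, 3], 3)

# Spam stored first, then removed: gaps appear in the published IDs.
print(simulate(submissions, filter_before_insert=False))  # ([1, 3, 6], 6)
```

Under the first model every gap in the sequence corresponds to a comment that was stored and later removed; under the second, spam has to be subtracted before the gaps say anything about moderation.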

A possible explanation might be the existence of a large number of comments on the site by an inner circle of users hidden on special-access pages, but I find it hard to believe this could account for a large proportion of (e.g.) the 933 comments which are missing in June 2011. Similarly, the apparent surge of deletions beginning July 2007 may also be truly reflective of events, since it has been suggested in comments here that it “coincided” with RC’s attack on the surface stations project. However, it has also been pointed out that such interpretations are impossible to verify using this method.

Overall, there were 78,639 missing comment IDs, out of a total of 210,595, or 37.3%. As for the RC page known as “The Bore Hole”, it started on 6th January 2011 (the date being evident from the post_id of 6013 and the moderator’s response to the first comment, although they have since re-dated it 6th December 2004 for some reason). The comments on that page are of course counted here as “published” comments; however, they are small in number (404) when compared with the number of comments which seem to go missing even after that date in January (5,000). At the risk of mixing metaphors, “The Bore Hole” could perhaps be regarded as something of a fig-leaf.

I ran a similar scan on WUWT (also a WordPress site) on the 14th July, extracting data from http://…/?p=1 to http://…/?p=43440. However, analysing WUWT with this method presented several problems that aren’t applicable to the RC site:

  • Some earlier comment IDs are out of chronological order (stemming, it seems, from the import from TypePad to WordPress in October 2007), so figures for early months are impossible to calculate. During this period there seems to have been some infilling of comment IDs (probably due to the TypePad import not setting the “auto_increment” values in the database properly), which would affect the overall total; however, the numbers involved (whilst impossible to calculate precisely) are probably at most in the very low hundreds.
  • WUWT has always used WordPress’s Akismet spam-filtering, which creates new comment IDs before marking them as spam. Anthony provided me with a screenshot showing the total volume of spam which had been deleted as of early on the 15th July to be 55,097. This can be adjusted down to 55,085 for the period covered by my data to late on the 14th July 2011.
  • The Tips & Notes page encourages comments from readers which are not intended to remain permanently on the site, so they are to be regarded as “legitimate” deletions. Anthony provided me with records of the numbers of Tips & Notes comments posted (then eventually deleted) for the period 24th March to 10th July 2011 (3,220), on which I based an estimate of 22,215 “legitimately deleted” comments for the period 23rd June 2009 (when the T&N page was created) to 14th July 2011 inclusive.

Overall, WUWT has 75,989 missing comment IDs, out of a total of 700,115 submitted comments (10.9%). Subtracting the above figures for Akismet and Tips & Notes gives us a problem, since it’s a negative figure: -1,311. I think this is most likely due to an over-estimation of the number of comments posted on the Tips & Notes page, combined with perhaps a few hundred from the infilling problem mentioned above. However, the combined additions from these two sources of error would have to be in excess of 8,000 to raise the number of deletions to 1% of the total submitted, which I think very unlikely.
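The arithmetic behind these figures can be worked through in a few lines. The sketch below is in Python and uses only numbers quoted in the text; the variable names are hypothetical, and the 200 infilled IDs are the assumption made in the following paragraph. It treats the infilled IDs and the Tips & Notes over-estimate as the only two sources of error, as the text does.

```python
# Figures quoted in the text (WUWT scan of 14th July 2011).
missing_ids = 75_989      # comment IDs absent from the sequence
submitted = 700_115       # total submitted comments
akismet_spam = 55_085     # spam deletions, adjusted to the scan period
tips_notes_est = 22_215   # estimated Tips & Notes deletions

# Missing IDs left over once both known sources are subtracted.
unexplained = missing_ids - akismet_spam - tips_notes_est
print(unexplained)  # -1311

# Combined error needed before unexplained deletions reach 1% of submissions.
one_percent = submitted / 100
error_needed = one_percent - unexplained
print(round(error_needed))  # 8312

# Assumption taken from the following paragraph: 200 infilled IDs.
infilled = 200
tips_notes_true = tips_notes_est - (error_needed - infilled)
overestimate_pct = 100 * (tips_notes_est / tips_notes_true - 1)
print(round(overestimate_pct, 1))
```

The last figure comes out at roughly 57.5%, which is consistent with the "some 58%" over-estimate discussed next.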

Putting it another way, and assuming a total of 200 “infilled” comment IDs (a high estimate, in my opinion), I would have to have over-estimated the volume of Tips & Notes comments by some 58% to reach a 1% deletion rate. I therefore see no reason to doubt the claims made on behalf of WUWT that the deletion rate is less than 1%. In fact it may be considerably lower. It is, however, noteworthy that December 2007 and January 2008 show high deletion rates, with another bump during Sep-Oct 2008:

In summary, whilst there remains some uncertainty especially regarding the RC data, the data does tend to support the anecdotal evidence concerning RC’s tendentious comment moderation practices. It also tends to support (or at least does not contradict) WUWT’s claims of a <1% comment deletion record.

For reference, here are the monthly totals which I used for the graphs. This table excludes incomplete months and some early WUWT months as noted:

Columns: Month, then RealClimate (Missing, Submitted, Missing %), then Watts Up With That? (Missing, Submitted, Missing %). Rows with only three figures are RealClimate only.
Jan 2005 86 524 16.4%
Feb 2005 111 383 29%
Mar 2005 53 286 18.5%
Apr 2005 96 294 32.7%
May 2005 47 305 15.4%
Jun 2005 119 482 24.7%
Jul 2005 524 826 63.4%
Aug 2005 255 474 53.8%
Sep 2005 112 527 21.3%
Oct 2005 99 664 14.9%
Nov 2005 67 654 10.2%
Dec 2005 544 1150 47.3%
Jan 2006 277 944 29.3%
Feb 2006 306 1236 24.8%
Mar 2006 390 1292 30.2%
Apr 2006 660 2130 31%
May 2006 580 1477 39.3%
Jun 2006 174 995 17.5%
Jul 2006 142 1252 11.3%
Aug 2006 888 2123 41.8%
Sep 2006 253 1005 25.2%
Oct 2006 340 1055 32.2%
Nov 2006 114 1290 8.8%
Dec 2006 62 876 7.1%
Jan 2007 203 1791 11.3%
Feb 2007 223 2282 9.8%
Mar 2007 343 3107 11%
Apr 2007 160 1960 8.2%
May 2007 213 2271 9.4%
Jun 2007 188 2055 9.1%
Jul 2007 4061 5724 70.9%
Aug 2007 7171 9511 75.4%
Sep 2007 4140 5499 75.3%
Oct 2007 4561 7091 64.3%
Nov 2007 6064 8226 73.7% 108 476 22.7%
Dec 2007 4184 6073 68.9% 547 869 62.9%
Jan 2008 493 1938 25.4% 497 1217 40.8%
Feb 2008 452 1656 27.3% 536 2027 26.4%
Mar 2008 332 1444 23% 776 3212 24.2%
Apr 2008 854 2222 38.4% 396 3023 13.1%
May 2008 1159 3050 38% 465 3192 14.6%
Jun 2008 880 2526 34.8% 586 5781 10.1%
Jul 2008 1156 3086 37.5% 751 6651 11.3%
Aug 2008 922 2733 33.7% 514 6775 7.6%
Sep 2008 873 2827 30.9% 1596 9174 17.4%
Oct 2008 692 1892 36.6% 1918 8936 21.5%
Nov 2008 1466 3026 48.4% 931 7012 13.3%
Dec 2008 1089 3127 34.8% 436 7599 5.7%
Jan 2009 1063 3269 32.5% 508 11357 4.5%
Feb 2009 834 2587 32.2% 1053 12586 8.4%
Mar 2009 1232 3260 37.8% 857 16186 5.3%
Apr 2009 1635 4369 37.4% 662 16291 4.1%
May 2009 2037 4361 46.7% 641 14217 4.5%
Jun 2009 808 3183 25.4% 1236 13525 9.1%
Jul 2009 646 3664 17.6% 1561 14722 10.6%
Aug 2009 384 2341 16.4% 1606 13619 11.8%
Sep 2009 337 1657 20.3% 1802 15389 11.7%
Oct 2009 722 3699 19.5% 2187 19746 11.1%
Nov 2009 1518 5745 26.4% 2945 25712 11.5%
Dec 2009 981 6401 15.3% 4339 36716 11.8%
Jan 2010 728 5349 13.6% 2250 26840 8.4%
Feb 2010 966 6020 16% 2267 26640 8.5%
Mar 2010 873 5066 17.2% 2349 26051 9%
Apr 2010 883 4227 20.9% 2312 23259 9.9%
May 2010 966 3425 28.2% 2877 20174 14.3%
Jun 2010 983 2915 33.7% 2295 19584 11.7%
Jul 2010 1613 3808 42.4% 2789 23840 11.7%
Aug 2010 772 2324 33.2% 3211 27241 11.8%
Sep 2010 770 2072 37.2% 3414 24257 14.1%
Oct 2010 681 2267 30% 2547 24362 10.5%
Nov 2010 824 2698 30.5% 2667 20508 13%
Dec 2010 1942 3744 51.9% 1983 22411 8.8%
Jan 2011 685 2794 24.5% 2716 24451 11.1%
Feb 2011 963 2901 33.2% 2243 22524 10%
Mar 2011 1077 2326 46.3% 2371 23480 10.1%
Apr 2011 684 1674 40.9% 2124 17466 12.2%
May 2011 738 1679 44% 2457 20028 12.3%
Jun 2011 933 1677 55.6% 2544 20682 12.3%

====================================================================

UPDATE:

Thanks to Ian. It should be noted that I had no influence of any kind on his analysis, other than providing the input data he requested. It is published exactly as he presented it to me, with only some small edits for formatting and no content changes.

I thought this might be a good time to show something I encountered personally on June 7th, 2009 at RC. Gavin posted up a thread asking for ideas about the blog.

http://www.realclimate.org/index.php/archives/2009/06/groundhog-day-2/

His central question to readers was:

“What is it that you feel needs more explaining?”

I decided I’d offer my suggestion. Big mistake. Here’s a series of screen caps I made illustrating the central systemic bias that RC has, even for basic and germane topics.

It starts out like this when my first suggestion was not published:

It never appeared, so I thought I’d try an experiment. Using my wife’s computer (on the same DSL circuit, same IP address) I decided I’d submit an upbeat generic comment that didn’t offer any sort of challenge to RC using a new email account to see if it was an automation problem related to IP or my name/email address, or if it was simply that RC does not like challenges to their position:

And amazingly, it went right through. So I knew I was not being blocked by IP address or name/keyword. As you can see below, it was approved:

So, I tried again, again on the same home network, my PC this time:

And here it is awaiting moderation:

Nope, it was consigned to the ether:

A few comments later, we can see who is moderating: Gavin himself. Note the inline response:

I decided to send a polite email inquiring about my missing comments:

And of course, I never received a response.

So there you have it, even when they ASK for ideas, ones that come from skeptics are apparently deleted; real open debate from a Real Climate scientist, Dr. Gavin Schmidt of NASA GISS.

Update #2: Ric Werme asks in comments:

The next question is “Why do people even bother posting comments at RC?” I’ve found it easier to not go there at all, so I don’t get tempted to add a comment.

Apparently, other than dhogaza and a few hangers on, not many do:

Except for search engine hits, WUWT beats RC in every measure. See for yourself here:

http://www.alexa.com/siteinfo/wattsupwiththat.com+realclimate.org#trafficstats

162 Comments
sHx
July 24, 2011 3:59 pm

Ian Rons,
Thank you for that superb post. I know this expression will displease some, but “let the numbers speak for themselves”.
What I’d really like to know is the contribution record of certain regular RC commenters. Any chance on finding out the comment numbers and percentages of such ‘scientific’ luminaries as:
dhogaza
CompletelyFedUp
SecularAnimist
Hank Roberts
Paul Burton Leveson.
Any chance a computer program could tell us how much that privileged lot has contributed quantitatively?

July 24, 2011 5:43 pm

sHx,
You have identified five echo chamber commentators who probably account for half the total RC comments. Add the next 5 and you’re probably up to ≈70% of the total.

dp
July 24, 2011 7:06 pm

Ian Rons says:
July 24, 2011 at 3:26 pm
dp,
Differences between MySQL <4/4.0 and 4.1/5.xxx are irrelevant, and it's totally spurious to raise such issues. FUD.

Tamino! What are you doing on this blog and what have you done with Ian???
I have to say, Ian – this is not the blog where I expected reasonable skepticism to draw ad hominem attacks from the article authors.
I remain skeptical, I don’t recall mentioning MySQL 4.x, and I’m out of patience with you.

kadaka (KD Knoebel)
July 24, 2011 9:43 pm

Re dp on July 24, 2011 at 11:28 am:
Heh, lots of command line work to find the nameserver, when it pops up when you WHOIS the site.
Here in the modern age I just Googled “realclimate server host” and found this site with all that info and more.
We estimated that realclimate.org is ranked #162,390 of all websites. Beautiful. BTW: We estimated that wattsupwiththat.com is ranked #18,056 of all websites.
I have a question for you gathered internet site gurus. The physical site of the web83-dot-webfaction-dot-com server is Dallas, Texas. Note something strange in the “Traffic by City – Monthly Data”:

City Name__________Unique Visitors (%)__Page Views (%)
Dallas-Fort Worth__14.70%_______________11.80%
Chicago_____________8.00%_______________10.00%
San Diego___________4.70%________________6.50%
Miami_______________4.60%________________6.80%
Washington__________3.80%________________3.70%
Monterey-Salinas____3.40%________________3.70%
Other Cities_______60.80%_______________57.50%

The server is in Dallas. The Dallas-Fort Worth numbers look skewed, much higher on Unique Visitors vs Page Views, with Unique Visitors nearly twice as high as the next lowest city. Is the server somehow generating its own “Unique Visitors”, perhaps generating its own “hits”?

jeef
July 24, 2011 9:49 pm

# of comments in RC’s “Borehole”: 404
How ironic!
[Sorry if this has been posted – I hastily scrolled the comments and couldn’t see it. For those unfamiliar with the term, 404 is the internet standard code for ‘not found’, typical of any post that does not conform to the Real Climate ideology!]

Ian Rons
July 25, 2011 3:37 am

dp said on July 24, 2011 at 7:06 pm:

Tamino! What are you doing on this blog and what have you done with Ian???
I have to say, Ian – this is not the blog where I expected reasonable skepticism to draw ad hominem attacks from the article authors.
I remain skeptical, I don’t recall mentioning MySQL 4.x, and I’m out of patience with you.

I must have missed the meeting where it was decided to redefine the term “ad hominem”. You came along saying you are a very experienced IT professional, and whilst I did refer back to this in a somewhat sceptical manner, this was a response to your argumentum ad verecundiam and not an ad hom. It’s very easy to claim expertise when posting anonymously, and I have every right to question those claims — not that it gets to the substance of the issues under discussion, which I have considered (and examined empirically, as far as possible) in a manner which I think is conscientious.
In your reply, you “quoted” my post with the words “bla bla bla” and simply restated your points more vaguely before saying “I’m right”, then accused me of using ad homs in another comment and talked about your professional experience again. I then responded to the only part of these follow-up replies that seemed even vaguely relevant (data migration between versions of MySQL), and frankly I still see no reason to think the RC database has had serious issues (like mangling of numerical primary keys or other inadvertent post/comment deletion), partly because they’ve always used WordPress and partly because the sort of issues you allude to would almost certainly be evident in the data (in post_id and comment ID sequences), or more visibly in character-set conversion problems on screen. And yes, I have experience in this area: I’ve run a WordPress MU installation for a large social/community hub, and dealt with exactly the sort of database migration that comes with the territory of content management systems, including WordPress, Drupal, phpBB and older stuff like postNuke (for what it’s worth). And I didn’t find any evidence of database corruption with RC (unlike WUWT), so if it exists I think it would be very minor. I note Gavin hasn’t made an issue of it, and nobody’s come along to say they recall RC ever losing a lot of posts (it would have to be a lot to make a difference).
Sure, there’s always some uncertainty involved in this sort of forensic analysis, and I think I’ve been appropriately cautious in the OP, but after looking at the data I don’t really see much in what you’re saying to be concerned about.
sHx said on July 24, 2011 at 3:59 pm:

Thank you for that superb post. I know this expression will displease some, but “let the numbers speak for themselves”.
What I’d really like to know is the contribution record of certain regular RC commenters. Any chance on finding out the comment numbers and percentages of such ‘scientific’ luminaries as:
dhogaza
CompletelyFedUp
SecularAnimist
Hank Roberts
Paul Burton Leveson.
Any chance a computer program could tell us how much that privileged lot has contributed quantitatively?

Glad you enjoyed the article. It was fun to write.
This may sound overly cautious, but I think we’re getting into tricky ethical territory if we start mining the data for information about specific individuals in the way you suggest. And it’s too easy to see how that would get “spun” by the warmists as some sort of a hate-list (though obviously they’re not above that themselves). What I am willing to say is that 50% of comments come from 70 distinct usernames/handles. Here are the percentages for these comment-posters:
6.584620505
3.7215775858
3.0506346349
2.1520098032
2.0476240904
1.8796992481
1.7057230602
1.2722954267
1.2609491536
1.1194989486
1.0332672728
1.0052797991
0.9841000893
0.9031633409
0.8010468828
0.7526361175
0.7526361175
0.7163280434
0.6807763876
0.573365002
0.5703393292
0.5680700746
0.56580082
0.5559673832
0.5310055824
0.5226849821
0.5181464728
0.5158772182
0.478812726
0.4780563078
0.4697357075
0.4614151072
0.4296455424
0.4009016505
0.3872861228
0.3676192493
0.3623243219
0.3615679037
0.3570293944
0.3494652123
0.3388753574
0.3358496846
0.3328240117
0.3275290843
0.3108878837
0.3071057927
0.3040801198
0.3040801198
0.2950031013
0.286682501
0.2851696646
0.2677720458
0.264746373
0.2639899548
0.2632335366
0.2473487542
0.2435666631
0.2428102449
0.2397845721
0.2299511354
0.2238997897
0.2238997897
0.2208741169
0.2201176987
0.217848444
0.2170920258
0.2170920258
0.2057457527
0.2027200799
0.2004508253

July 25, 2011 5:19 am

Their whole endeavour is a giant ad hom argument designed to shift discussion from substance and science to personalities.

Curious – while the comments are definitely opinions and some are even ad hominems, the article is factual. This article is merely seeking the truth through study and analysis – the very core of real science. That Gavin would label discussion and/or debate as Ad Hominem tends to indicate he has forgotten what science is (as has Hans Moleman). I am sure that should Gavin ever want to start practicing REAL science again, he will allow all contrary comments on his blog. But seeing as how he has forgotten what science is, I doubt his policies will change in the near term.

sHx
July 25, 2011 9:26 am

Ian Rons,
Wow!
The top ten commenters in RC have authored not 70% of all comments as we feared but only approx. 25%. It turns out there is absolutely nothing for Gavin and Mike (“Quality academic journals boast about their rejection rates”) to be alarmed about.
And, yes, I withdraw the question about specific RC names/monikers, lest the bullies claim to be the victims.

Nickolas Smialek
August 10, 2011 12:49 pm

Further Down the “Bore Hole” | Watts Up With That? is really the sweetest on this notable topic. I harmonise with your conclusions and will thirstily look forward to your incoming updates. Saying thanks will not just be sufficient, for the phenomenal clarity in your writing. I will directly grab your rss feed to stay informed of any updates. Admirable work and much success in your business dealings! Please excuse my poor English as it is not my first tongue.
