A look at comment deletion at RealClimate compared with WUWT
Guest post submitted by Ian Rons
Regular readers will doubtless be familiar, either at first- or second-hand, with the enthusiasm with which moderators at RealClimate.org seem to reject comments from AGW sceptics. Ecotretas’ recent story on RealClimate censorship (re-posted here) piqued my interest, since in addition to the usual tones of indignation, it suggested a method of estimating the RealClimate comment deletion rate by looking at the comment IDs (as revealed by WordPress’s use of an HTML attribute) and counting the number of these IDs which are missing from the sequence.
Being at a loose end, I took up the cudgels and wrote a script using PHP and cURL which took about an hour to mine every available page on realclimate.org by accessing the page using its WordPress “post_id” value (from http://…/?p=1 to http://…/?p=8092, as at 14th July), extracting comment IDs with a simple regular expression and doing a bit of maths on the result. Naturally, the source code and the various output files are available upon request.
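For readers who want to try the counting step themselves, here is a minimal sketch in Python. It is not the PHP/cURL script described above, and it leaves out the page fetching entirely. It assumes that each comment is rendered with an HTML id of the form id="comment-12345", which is common in WordPress themes but may differ on any given site, and it assumes that the "total" is the span from the lowest to the highest ID seen. The function names are made up for illustration.

```python
import re

# Assumption: comments are marked up as id="comment-<number>".
# The real markup on a given site may differ.
COMMENT_ID_RE = re.compile(r'id="comment-(\d+)"')

def extract_comment_ids(html):
    """Return the set of integer comment IDs found in one page of HTML."""
    return {int(m) for m in COMMENT_ID_RE.findall(html)}

def missing_stats(seen_ids):
    """Return (missing, total, percent missing) over the span of IDs seen.

    Assumption: every ID between the lowest and highest seen was issued
    to some submitted comment, so any gap counts as a missing comment.
    """
    if not seen_ids:
        return 0, 0, 0.0
    lowest, highest = min(seen_ids), max(seen_ids)
    total = highest - lowest + 1
    missing = total - len(seen_ids)
    return missing, total, 100.0 * missing / total

# Two made-up page fragments standing in for fetched pages.
pages = [
    '<li id="comment-101">first</li><li id="comment-103">second</li>',
    '<li id="comment-106">third</li>',
]
seen = set()
for page in pages:
    seen |= extract_comment_ids(page)

print(missing_stats(seen))  # (3, 6, 50.0)
```

Grouping the IDs by the month of the comments around them would give monthly figures, but that needs the comment dates as well and is not shown here.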
The figures are rather high, though doubtless some missing comments will have been spam.
However, at least in recent times, the RC site has employed the “re-CAPTCHA” service, which (unlike the Akismet service used by WUWT) does not create a new comment ID if the comment is rejected for being spam, so for instance the 56% of comments missing during June 2011 seems likely to be an accurate figure, unless some other explanation can be found.
A possible explanation might be the existence of a large number of comments on the site by an inner circle of users hidden on special-access pages, but I find it hard to believe this could account for a large proportion of (e.g.) the 933 comments which are missing in June 2011. Similarly, the apparent surge of deletions beginning July 2007 may also be truly reflective of events, since it has been suggested in comments here that it “coincided” with RC’s attack on the surface stations project. However, it has also been pointed out that such interpretations are impossible to verify using this method.
Overall, there were 78,639 missing comment IDs, out of a total of 210,595, or 37.3%. As for the RC page known as “The Bore Hole”, which started on 6th January 2011 (the date being evident from the post_id of 6013 and the moderator’s response to the first comment, although they have since re-dated it 6th December 2004 for some reason), the comments on that page are of course counted here as “published” comments; however, they are small in number (404) when compared with the number of comments which seem to go missing even after that date in January (5,000). At the risk of mixing metaphors, “The Bore Hole” could perhaps be regarded as something of a fig-leaf.
I ran a similar scan on WUWT (also a WordPress site) on the 14th July, extracting data from http://…/?p=1 to http://…/?p=43440. However, analysing WUWT with this method presented several problems that aren’t applicable to the RC site:
- Some earlier comment IDs are out of chronological order (stemming, it seems, from the import from TypePad to WordPress in October 2007), so figures for early months are impossible to calculate. During this period there seems to have been some infilling of comment IDs (probably due to the TypePad import not setting the “auto_increment” values in the database properly), which would affect the overall total; however, the numbers involved (whilst impossible to calculate precisely) are probably at most in the very low hundreds.
- WUWT has always used WordPress’s Akismet spam-filtering, which creates new comment IDs before marking them as spam. Anthony provided me with a screenshot showing the total volume of spam which had been deleted as of early on the 15th July to be 55,097. This can be adjusted down to 55,085 for the period covered by my data to late on the 14th July 2011.
- The Tips & Notes page encourages comments from readers which are not intended to remain permanently on the site, so they are to be regarded as “legitimate” deletions. Anthony provided me with records of the numbers of Tips & Notes comments posted (then eventually deleted) for the period 24th March to 10th July 2011 (3,220), on which I based an estimate of 22,215 “legitimately deleted” comments for the period 23rd June 2009 (when the T&N page was created) to 14th July 2011 inclusive.
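The post does not say how the 22,215 estimate for Tips & Notes was derived. One reading that reproduces it is a simple pro-rata scaling by days: take the daily rate over the sample period and apply it to the whole life of the page. The sketch below assumes both periods are counted inclusively and that the posting rate was constant, which is itself a simplification.

```python
from datetime import date

def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both ends."""
    return (end - start).days + 1

sample_count = 3220  # Tips & Notes comments recorded in the sample period
sample_days = days_inclusive(date(2011, 3, 24), date(2011, 7, 10))
full_days = days_inclusive(date(2009, 6, 23), date(2011, 7, 14))

# Assumption: a constant daily rate across the whole period.
estimate = round(sample_count * full_days / sample_days)

print(sample_days, full_days, estimate)  # 109 752 22215
```

If the Tips & Notes page was quieter in its early months, a constant-rate estimate like this would overstate the true figure.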
Overall, WUWT has 75,989 missing comment IDs, out of a total of 700,115 submitted comments (10.9%). Subtracting the above figures for Akismet and Tips & Notes gives us a problem, since it’s a negative figure: -1,311. I think this is most likely due to an over-estimation of the number of comments posted on the Tips & Notes page, combined with perhaps a few hundred from the infilling problem mentioned above. However, the combined additions from these two sources of error would have to be in excess of 8,000 to raise the number of deletions to 1% of the total submitted, which I think very unlikely.
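The arithmetic here, and the 58% figure in the paragraph that follows, can be checked in a few lines of Python. All the inputs are the numbers quoted in the post; the variable names are made up for illustration, and the 200 "infilled" IDs are the post's own high estimate.

```python
missing_ids = 75_989     # missing comment IDs on WUWT
submitted = 700_115      # total comment IDs issued
akismet_spam = 55_085    # spam deleted, adjusted to the scan period
tips_notes_est = 22_215  # estimated Tips & Notes deletions

# Deletions left over once spam and Tips & Notes are accounted for.
unexplained = missing_ids - akismet_spam - tips_notes_est
print(unexplained)  # -1311

# How much error would be needed to reach a 1% deletion rate?
one_percent = 0.01 * submitted
shortfall = one_percent - unexplained
print(round(shortfall))  # 8312

# Attribute 200 of that to infilled IDs and the rest to an
# over-estimate of the Tips & Notes volume.
infilled = 200
tn_error = shortfall - infilled
true_tips_notes = tips_notes_est - tn_error
overestimate_pct = 100 * tn_error / true_tips_notes
print(round(overestimate_pct, 1))  # 57.5
```

The last figure comes out at about 57.5%, in line with the "some 58%" quoted in the text.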
Putting it another way, and assuming a total of 200 “infilled” comment IDs (a high estimate, in my opinion), I would have to have over-estimated the volume of Tips & Notes comments by some 58% to reach a 1% deletion rate. I therefore see no reason to doubt the claims made on behalf of WUWT that the deletion rate is less than 1%. In fact it may be considerably lower. It is, however, noteworthy that December 2007 and January 2008 show high deletion rates, with another bump during Sep-Oct 2008:
In summary, whilst there remains some uncertainty especially regarding the RC data, the data does tend to support the anecdotal evidence concerning RC’s tendentious comment moderation practices. It also tends to support (or at least does not contradict) WUWT’s claims of a <1% comment deletion record.
For reference, here are the monthly totals which I used for the graphs. This table excludes incomplete months and some early WUWT months as noted:
|Month|RealClimate: Missing|RealClimate: Submitted|RealClimate: Missing (%)|Watts Up With That?: Missing|Watts Up With That?: Submitted|Watts Up With That?: Missing (%)|
Thanks to Ian. It should be noted that I had no influence of any kind on his analysis, other than providing the input data he requested. It is published exactly as he presented it to me, with only some small edits for formatting and no content changes.
I thought this might be a good time to show something I encountered personally on June 7th, 2009 at RC. Gavin posted up a thread asking for ideas about the blog.
His central question to readers was:
“What is it that you feel needs more explaining?”
I decided I’d offer my suggestion. Big mistake. Here’s a series of screen caps I made illustrating the central systemic bias that RC has, even for basic and germane topics.
It starts out like this when my first suggestion was not published:
It never appeared, so I thought I’d try an experiment. Using my wife’s computer (on the same DSL circuit, same IP address) I decided I’d submit an upbeat generic comment that didn’t offer any sort of challenge to RC using a new email account to see if it was an automation problem related to IP or my name/email address, or if it was simply that RC does not like challenges to their position:
And amazingly, it went right through. So I knew I was not being blocked by IP address or name/keyword; as you can see below, it was approved:
So, I tried again, again on the same home network, my PC this time:
And here it is awaiting moderation:
Nope, it was consigned to the ether:
A few comments later, we can see who is moderating: Gavin himself. Note the inline response:
I decided to send a polite email inquiring about my missing comments:
And of course, I never received a response.
So there you have it, even when they ASK for ideas, ones that come from skeptics are apparently deleted; real open debate from a Real Climate scientist, Dr. Gavin Schmidt of NASA GISS.
Update#2 Ric Werme asks in comments:
The next question is “Why do people even bother posting comments at RC?” I’ve found it easier to not go there at all, so I don’t get tempted to add a comment.
Apparently, other than dhogaza and a few hangers-on, not many do:
Except for search engine hits, WUWT beats RC in every measure. See for yourself here: