How many pieces of comment spam have you got hidden?

This post is written partly in jest, but it's on a serious issue nonetheless. It's also a little hint to the great guy behind WordPress, Matt Mullenweg, that we might need some changes in the next version of WordPress.

In the old days, when I was an MT user, comment spam was a major problem. Jay Allen's MT Blacklist blocked some of it, but every day I would find myself deleting comments by the hundreds. Then I converted to WordPress about this time last year, with its built-in spam-fighting abilities, and in my case comment spam isn’t nearly as bad any more. Sure, a few do get through, but on average maybe 5 a day. More get picked up in the moderation queue (this morning it was 20; some days it's only a handful), and I zap those with no worries as well. But this doesn’t mean that comment spam wasn’t being targeted at the Blog Herald; it means that most of it never sees the light of day. I made one mistake, though: I presumed that the spam getting caught was deleted. I was wrong.

I've been using a plugin, Paged Comment Editing Plugin for WordPress, so I could flick through comments in the morning, because so many legitimate ones have been coming in lately that they’d disappear off the end of the screen! The plugin also has an option to display spam comments. Spam comments, I said to myself? But don’t they all get deleted? To my amazement, in WordPress they are suppressed but not deleted, so every spam comment takes up space on your server (which in my case for the Blog Herald recently went past 250MB).

So here's the fun part: can anyone beat this number?

The number of spam comments found on the Blog Herald server that didn’t make the screen (so they weren’t manually deleted):

65,000 (give or take a few)

I can put a number on it because I displayed them at 5,000 per page so I could delete them, and there were 13 pages.

That's a lot of comment spam and bandwidth. If I take it that on average 5 comments a day get through, then over 365 days 1,825 comments got through and 65,000 didn’t, which means that WordPress picked up roughly 97.3% (65,000 out of 66,825) of all comment spam targeted at the Blog Herald, and that’s with a standard installation and no specific spam-related plugins. Those are mighty fine fighting figures.
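
For anyone who wants to check the maths, here is a quick sketch in Python. The figures come straight from this post (5 a day getting through, 13 pages at 5,000 a page); the variable names are just mine for illustration, and the result is only as good as that rough 5-a-day average.

```python
# Figures taken from the post; variable names are illustrative only.
got_through_per_day = 5      # rough daily average of spam that reaches the blog
days = 365                   # about one year on WordPress
caught = 13 * 5000           # 13 pages of hidden spam at 5,000 per page

got_through = got_through_per_day * days
total = got_through + caught
catch_rate = caught / total  # share of all spam that never reached the screen

print(f"got through: {got_through:,}")
print(f"caught:      {caught:,}")
print(f"catch rate:  {catch_rate:.1%}")
```

Plug in your own daily average and hidden-spam count to see how your blog compares.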

What we need in the next version of WordPress, though, is the ability to natively access the comment spam and delete it (or to have it deleted properly by default so it doesn’t take up space on a server). The saving to me? Over 30MB freed by deleting the comment spam.
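
Until something like that exists, a purge could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the real WordPress database, and the table name `wp_comments`, the `comment_approved` column and the `'spam'` status value are my assumptions about how flagged comments are stored, so check your own schema (and take a backup) before running anything like it against a live blog.

```python
import sqlite3

# In-memory stand-in for a blog database. The table and column names
# (wp_comments, comment_approved) and the 'spam' status value are ASSUMPTIONS
# made for this sketch; verify them against your own installation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_comments ("
    "comment_ID INTEGER PRIMARY KEY, "
    "comment_content TEXT, "
    "comment_approved TEXT)"
)
conn.executemany(
    "INSERT INTO wp_comments (comment_content, comment_approved) VALUES (?, ?)",
    [
        ("Great post!", "1"),           # approved comment
        ("Buy cheap pills", "spam"),    # hidden spam
        ("Waiting for approval", "0"),  # held in moderation
        ("Online casino", "spam"),      # hidden spam
    ],
)


def purge_spam(connection):
    """Delete every row flagged as spam and return how many were removed."""
    cursor = connection.execute(
        "DELETE FROM wp_comments WHERE comment_approved = 'spam'"
    )
    connection.commit()
    return cursor.rowcount


removed = purge_spam(conn)
remaining = conn.execute("SELECT COUNT(*) FROM wp_comments").fetchone()[0]
print(f"removed {removed} spam comments, {remaining} comments remain")
```

The point of the sketch is simply that hidden spam is ordinary rows with a status flag, so deleting it is one query once you know which flag to look for.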



  1. says

    I blogged about that a few months ago, and that was around 20,000 hidden spam comments after one year of running WordPress.

    Matt himself argued that it is necessary to keep enough samples in the database to have the potential to build a Bayesian classifier (see comments here). But not in MY database!

    I am now running daily cron jobs purging spam emails from the database, before I back them up.

  2. says

    All of my comment spam is dropped before it gets onto the blog by WP Hashcash 3.0 (which no one but me has yet) and a new trackback plugin (Hardened Trackbacks). Therefore, it’s hard to really estimate how much spam I have gotten. I think the average is around 300 a day, so for a year that would be 109,500, or roughly the same order of magnitude…

  3. says

    I get about 20 spam comments a day, which works out to 7,300 a year, but my database is not too big since I tend to delete spam comments held in moderation rather than marking them as spam.

    I’ve noticed an increase in trackback spam though.

  4. says

    As far as I can tell, the plugin BadBehavior reduced the spam coming from bots to zero on my WordPress blog. I only receive “real comments/trackbacks” or “manual spam”, which gets caught in the moderation queue anyway. Once I disabled the plugin, I had several spam comments in the queue after one night.

  5. says

    Why not use Bad Behavior in combination with SpamKarma 2? I have virtually no spam comments on my blog now that I've got both running.

    The stats for both show a lot is being blocked every day.

    Check out my post on tackling spam.