False Positives and Better Akismet Spam Management

I’ve just spent about two hours rummaging thru 64 pages of spam flagged by Akismet (that’s more than 3,150 spam comments/trackbacks in the last 4 days alone). Of the lot, I found at least 4 legit comments and 7 trackbacks which I had to un-mark as spam, and they were not all easy to spot amongst tons and tons of actual spam. A false-positive rate of about 0.3% (11 out of roughly 3,150) is actually pretty good, but don’t we all want those precious comments and trackbacks to get thru without a hitch?

So, I tried several tricks to find the false positives using the search:

  • Search for the term “blogherald” or “blog herald”. Most legit trackbacks would mention the name or the URL so that’s a good start to find them.
  • Search for author names. Usually, when readers leave a comment, they address the author as well.
  • Search for unique keywords in recent post titles. Trackbacks often include the post titles too so that’s another way to find the legit ones.
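
If the flagged entries can be exported somewhere (a database dump, say), the three searches above could be run in one pass. Here’s a rough sketch in Python. The sample entries, the author name “jane” and the post-title keyword are all made up for illustration, and this works on a plain list of dicts, not on Akismet’s own interface:

```python
import re

# Hypothetical sample of flagged entries; real ones would come from an
# export of the spam queue. Field names here are assumptions.
flagged = [
    {"id": 1, "type": "trackback",
     "content": "Nice write-up over at Blog Herald on spam queues."},
    {"id": 2, "type": "comment",
     "content": "Buy now http://a.example http://b.example"},
    {"id": 3, "type": "comment",
     "content": "Thanks Jane, this tip saved me an evening."},
]

def find_candidates(entries, terms):
    """Return entries whose content mentions any term (case-insensitive)."""
    if not terms:
        return []
    # Escape each term so it is matched literally, then OR them together.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [e for e in entries if pattern.search(e["content"])]

# Site name variants, an author name, and a keyword from a recent post title.
terms = ["blogherald", "blog herald", "jane", "spam management"]
candidates = find_candidates(flagged, terms)

for entry in candidates:
    print(entry["id"], entry["type"], entry["content"])
```

Anything this turns up is only a candidate for a manual look; it won’t catch legit comments that mention none of the terms.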

Despite the above quick tricks, I wasn’t sure I had caught all the real comments and trackbacks, so I had to scroll through all 64 pages in the Akismet admin section. I probably would have had a better chance sorting them out via PHPMyAdmin. I wish there were a nicer way to manage the spam box, and I was thinking of the following:

  • Ability to delete all spam entries on a “per page” basis. Right now, all you can do is delete ALL spam and that could include the legit ones.
  • Categorize all flagged entries as either spam comments or spam trackbacks. The trackbacks will be fewer so you’ll easily spot the real ones from the fake ones.
  • Ability to filter spam by the following parameters: number of links per entry, type of language/characters used, originating IP addresses, email, or even the length of comments.
  • … and maybe even a spam intensity rating filter: smells like spam, spammy, spammier, spammiest.
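
To make the filter and rating ideas a bit more concrete, here’s a rough Python sketch. Everything in it is an assumption for illustration: the field names, the sample entries, the word list, the weights and the score thresholds are invented, and none of it reflects how Akismet actually scores anything.

```python
import re

LINK_RE = re.compile(r"https?://", re.IGNORECASE)

def count_links(text):
    """Count http/https links in a comment body."""
    return len(LINK_RE.findall(text))

def spam_score(entry, spammy_words, bad_ips):
    """Crude score: 1 per link, 2 per spammy word found, 3 for a known bad IP."""
    text = entry["content"].lower()
    score = count_links(text)
    score += 2 * sum(1 for word in spammy_words if word in text)
    if entry["ip"] in bad_ips:
        score += 3
    return score

def rating(score):
    """Map a score onto the four labels; thresholds are arbitrary."""
    if score >= 9:
        return "spammiest"
    if score >= 6:
        return "spammier"
    if score >= 3:
        return "spammy"
    return "smells like spam"

# Hypothetical flagged entries (documentation-only IPs and domains).
entries = [
    {"id": 1, "ip": "203.0.113.5",
     "content": "cheap pills http://a.example http://b.example "
                "http://c.example http://d.example"},
    {"id": 2, "ip": "198.51.100.7",
     "content": "Great post, I linked to it from http://myblog.example"},
    {"id": 3, "ip": "203.0.113.5",
     "content": "casino bonus http://x.example"},
]
spammy_words = ["pills", "casino"]
bad_ips = {"203.0.113.5"}

# Lowest score first, so likely false positives sit at the top of the list.
ranked = sorted(entries, key=lambda e: spam_score(e, spammy_words, bad_ips))

for e in ranked:
    s = spam_score(e, spammy_words, bad_ips)
    print(e["id"], s, rating(s))
```

Sorting by score instead of deleting outright keeps the final call with a human, which matters when the whole point is rescuing the handful of legit entries.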

I know I could be demanding too much from a service that’s practically free but it doesn’t hurt to give suggestions for improvement. :)

  • I think that’s a great idea, at least a filter or two you can run the list of spam through to thin it down, e.g.

    Delete/hide spam with more than 6 links or any of ‘these’ words (the general spammish words) or from known spammy IPs/emails.

    or just order by spamification so all of the least spammy float to the top and can be plucked out more easily :)

    Great post

  • Akismet is one of the best SPAM filters out there. However, there are a number of obvious improvements that can be made to:
    – Improve accuracy of filtering
    – Make it easy to scan through and separate valid comments from true spam

    When looking at a site for a different product, I saw they had an interesting approach where people could register and then post “proposed product enhancements”. More interesting, one could go through the existing list and “vote” for developments of interest. This enabled the user community to officially register their wish-list and also, as a group, to prioritise the items on it. I hope that Akismet and other products will take a similar approach. That way, rather than posting suggestions on a blog like this (which is admirable, given the absence of other formal feedback mechanisms), one can feed directly into the design community.

  • I’ve been begging for a per page delete for ages. It’s critical to the success of handling that many comment spams.

    After a trip recently, and several days offline, I found over 6,000 comment spams. When the paging feature goes into the hundreds, there is no way to keep up with all of that. I do the same searching and other techniques you mentioned, but I fear that some are still caught. Still, 4-8 found in 3000+ is pretty good comment spam catching.

  • I’ve been using Spam Arrest for a couple of years now. It really works for me, and it seems more and more people I know keep joining it. I guess more and more people like me are just getting too much spam for any filter to work. You should check it out if you haven’t already; it’s Spam Arrest.

  • Have you done any searches for 1800HART in your spam pages? I know I’ve lost many comments here. But after a while I figured it was just an impulse comment anyway – if it was really contributing to the betterment of life I would just let you know and ask to be un-spam-detected (which I have done in the past).

    I don’t mind being a statistic in people’s footer. Often.

