2008: The Year Ahead for Spam Blogs

Filed as Features on December 31, 2007 2:25 pm

As the year draws to a close, the blogging community has a great deal to reflect on and look ahead toward. Between the viral videos, blogstorms and major upgrades, it has been a busy year.

But for those of us involved in content theft and spamming issues, 2007 was something of a bittersweet year. A lot of progress was made in the fight against spam, but a great deal went wrong. It seemed that, for every victory, there were at least two setbacks.

Sadly, it seems that we can expect a very similar year in 2008. However, there are new tools and new possibilities and, with a little bit of luck, they might make 2008 a brighter year than 2007 when it comes to spam blogs.

A Look Back at 2007

To be certain, 2007 started out with a great deal of promise. WordPress.com showed us how to run a relatively spam-free service. A little while later, Blogspot hit back against blog spam on its service. Though the effect was not lasting, it was the first major public offensive by the Blogspot team against what many see as the main source of blog spam.

However, as the year wore on, spammers began to evolve their tactics. 2007 would also be known as the year that we saw the rise of spinning spammers, scrapers that take content from RSS feeds and then modify it through either translations or synonyms. These spammers are tougher to detect, for both copyright holders and search engines, and have a much more elaborate structure to their operation.
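
To see why spun content slips past exact-match checks, here is a minimal Python sketch that compares two texts by their overlapping three-word sequences (often called shingles) and scores the overlap with Jaccard similarity. This is only an illustration, not how any particular detection service works; the function names (`shingles`, `jaccard`, `similarity`), the window size and the sample sentences are all made up for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(original, candidate, k=3):
    """Score how much of the wording two texts share, from 0.0 to 1.0."""
    return jaccard(shingles(original, k), shingles(candidate, k))


# Hypothetical sample texts, invented for this sketch.
original = ("Spammers scrape content from RSS feeds and then modify it "
            "with synonyms to avoid detection")
spun = ("Spammers scrape content from RSS feeds and then alter it "
        "with synonyms to evade detection")
unrelated = ("The weather in Hamburg was cold and rainy for most of "
             "the winter holidays")

print("exact match:", original == spun)
print("spun score:", round(similarity(original, spun), 2))
print("unrelated score:", round(similarity(original, unrelated), 2))
```

The spun copy no longer matches the original word for word, yet it still shares many three-word runs with it, while the unrelated text shares none. What score counts as "too similar" is a judgment call that would need tuning on real posts, and translation-based spinning would likely defeat a word-overlap check like this one altogether.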

The rise of spinning spammers was coupled with growth in two other trends: a sharp increase in keyword scraping, where the spammer scrapes a Google Blog Search result or a Technorati Watchlist for a certain desired keyword, and a growing number of spammers turning to domain hosting.

The latter trend is especially worrisome. Previously, the majority of spammers would simply create as many free blog accounts as they could, fill them up either with computer-generated garbage or content scraped from RSS feeds, include a few affiliate links/ads and then try to make money. This made them easy to detect and easier to shut down. The new trend for spammers is paid hosting with hundreds of domains. This is often done with “wink and nod” approval from Web hosts, happy to get a lucrative customer. These networks are harder to detect and even more difficult to shut down.

This evolution in spam was due in part to pressures from the free hosting providers but, most likely, due more to the fierce competition for search results and traffic. Simply put, spinning, keyword scraping and domain hosts generate results, which is why spammers have taken to using them despite their higher costs.

Sadly, it appears that trend will continue deep into the new year.

Looking Ahead to 2008

Unfortunately, the trends that ended 2007 will likely mar the beginnings of 2008. Spinning spammers will continue to become more common, more spammers will move to domains and keyword scraping will become even more popular. Though some spammers will always find a home on the free services, especially newcomers, the more skilled and prolific spammers will likely continue to favor more reliable hosting.

This means that spam blog hosts, once eager to get rid of junk content on their servers, will have less incentive to cooperate in the fight against internet pollution, and Webmasters will struggle to get their content pulled from junk sites. Couple this with the continuing lack of cooperation from AdSense and other ad networks, and these operations are going to be nearly impossible to shut down.

As Jason Calacanis pointed out in his presentation about internet pollution, much of the problem with battling spam is that the people who create the platforms that enable spammers would make much less money if they stepped in and effectively policed their services.

Sadly, as the amount of money spammers can make grows with the new techniques, they will likely find more and more services willing to turn a blind eye to the abuse. Services currently being abused by spammers will continue to be abused, albeit with more impunity, and new services will open up the back door to spammers, including at least some respected Web hosts.

However, there is good news. In response, at least in large part, to this rise in spamming and scraping, there is a rising counter-market for protection against it.

First, new services will take flight in 2008. For one, Attributor will offer a version of their service for individual bloggers. Attributor can help not only improve detection of infringing material by detecting more kinds of infringements, checking for license compliance and prioritizing cases, but also aid with the removal of any unlicensed content.

Second, Blogwerx is scheduled to re-launch some time in the coming months. The service stumbled after its first launch almost a year ago due to technical issues. Blogwerx has the potential to detect modified plagiarism, including that from spinning spammers.

Finally, as the problem grows, we can expect to see more and more programmers taking up the challenge. We already have several great WordPress plugins for dealing with content theft, and 2008 seems as if it could be a great year for even more advanced tools and techniques.

In short, though 2007 brought out the worst in the spammers, it seems likely that 2008 could bring out the best in the spam fighters. The heightened interest in tracking content and preventing misuse is attracting investors to a budding industry and that, in turn, could make available cheaper, better and more powerful content tracking and protection tools.

Conclusions

More than anything, 2008 is poised to be a year of escalated warfare. In that regard, it will be little different than years gone by. After all, every year since spam first found its way onto the Web has been a game of cat and mouse.

However, what will make this year different is that, for the first time, corporations, along with big dollars, will be thrown into the fight. Companies will have to choose sides and decide if they can make more money by helping the spammers, tacitly or directly, or by promoting a healthy Web by fighting Internet pollution.

Unfortunately, we’ve seen how far too many companies have answered that question and, given that some of the biggest companies on the Web are already some of the most important “supporters” of spam blogs, it is going to be very hard for smaller companies to resist the temptation of an easy buck.

Let’s hope that 2008 is the year that companies start putting their foot down and start getting serious about fighting spam. If it isn’t, it could be too late to ever hope to control the problem.

Disclosure: I am a consultant for Attributor.

  1. By Adam posted on December 31, 2007 at 3:00 pm

    And more and more programmers developing new spam techniques, I’m afraid...

  2. By Jonathan Bailey posted on December 31, 2007 at 4:23 pm

    Adam:

    But at least we now have some money and corporations backing the good guys. It’s a slight improvement…

  3. By Josly posted on January 1, 2008 at 4:47 am

    Anyway, I’ve been arguing this point for years: that too much choice is bad because it makes it difficult to find the right thing and makes it easy for dishonest companies to slip crap products into the market.

  4. By Sue posted on January 2, 2008 at 3:07 am

    Jonathan, Attributor made a fine choice in choosing you as a consultant.
    What gets me is why are they scraping me? I have a narrow niche that is of no interest to the general blogger, nowhere near the readers most blogs do (around a hundred a day, counting subscribers) so why me? Ah, wait. It’s that Page Rank thing! As soon as I got a page rank from Google, as in 0 to 3, suddenly overnight I have scrapers pulling my content. And guess what, they’re all loaded with adsense. At least between Akismet and Peter’s Custom Captcha plugin, I can keep the spam comments away.

  5. By Lorelle VanFossen posted on January 2, 2008 at 6:46 am

    @Sue:

    It’s amazing how often I hear people say this same thing. Honestly, do you think that any scraper who grabs thousands of blog contents every day even pays attention to who they are grabbing?

    I’ve found that they go after “small” bloggers as they are the ones less willing to “fight back”. It may or may not have anything to do with PageRank. It could have to do with keywords you use on your blog that brought you into their sights. It could be a lot of things, which I hope Jonathan can enlighten us with, but don’t think you are special because you’ve been scraped.

    EVERYONE is being scraped or has the potential to be scraped. It’s a sick world out there. Thank goodness we have people like Jonathan Bailey fighting for our rights and educating us.

  6. By Jonathan Bailey posted on January 2, 2008 at 12:34 pm

    Josly:

    As a marketer I have to agree. Choice is good, too much choice is bad. Look at Linux for proof of that.

    Still, I don’t know if that is the problem here or not. I’m really unsure what angle you’re coming at this from or which choice you are referring to. Choice in Web sites? That could be part of it at least.

    Sue:

    Why are they scraping you?

    Well, it’s hard to add on to Lorelle here, but I can take a stab.

    The first problem is that some spammers seem to be grabbing just about everything that comes down the pipe. When you ping your blog out with every new entry, it goes to services that spammers and good guys alike can read and update from. Many pull from those sources and get incredible amounts of content.

    The second is that many spammers target keywords and since it doesn’t matter if the keyword is the center of your article, you can get picked up on some strange sites. Case in point, I had a woman who operated a blog focused on helping parents protect their children who found her content being used on a spam blog promoting teen porn. Why? Because she happened to mention the two words in the same piece when talking about the dangers faced by runaways. It was horrible and disgusting, but is also the nature of this automated software.

    I’ve done tests on my own and found that scraping starts from the very first post, literally. It’s frightening stuff. It doesn’t matter how big or small you are, it’s just a matter of whether your content is available.

    The only thing the PageRank hike might have changed is that it put you on spam “short lists”, lists that spammers keep of active blogs that have a good reputation. These are preferred targets because they create a reliable content stream. However, it isn’t a matter of big or small, just how much output you have.

    The bottom line is that spammers want content and they don’t care who they get it from.

    Sad, but true.

    Lorelle:

    Thank you for the high praise and for your efforts as well. You’ve done as much as I in this field!

  7. PlagiarismToday » 2008: Looking Ahead, January 2, 2008 at 5:18 pm
  8. By Lorelle VanFossen posted on January 2, 2008 at 7:27 pm

    @Jonathan Bailey:

    I have to take care in the words I choose now for the very reason Jonathan mentions. Within minutes of publishing Give Credit When Credit is Due: Skip The Middle Man, three scraper trackbacks hit from mortgage, banking, and credit card splogs. The article is about giving credit to the source of the quoted post, not whoever pointed you to the source. Nothing to do with finance.

    Don’t forget, these people are making money. Lots of money. All their time and money goes into figuring out how to do this better and faster. There are a couple people we can “thank” for originally creating auto-scraping programs for WordPress blogs, but their excuse is that “someone would do this so why not me first” and they continue to get attention, links, and money for their actions.

    Sure, evil blooms where people gather. It’s just heartbreaking that evil is allowed to flourish in what should be a bastion of freedom and goodwill.

  9. By ian in hamburg posted on January 2, 2008 at 7:45 pm

    Being careful with keywords can cut down on the scrapers, but then again you’re cutting yourself off from potential legitimate readers, are you not? I had instant scrape as soon as a post with “credit card” in the title and travel in the categories was published. I don’t often write about credit cards, thank goodness, but I do like to write a travel post now and then. What to do? I don’t want to feed these weasels my content on a platter, but avoiding these tags is a little like letting the bad guys win.

  10. By Lorelle VanFossen posted on January 2, 2008 at 8:25 pm

    @ian in hamburg:

    Good point. I should have been more specific in my mention of “careful”. Watch trackbacks closer to ensure you catch the scrapers and splogs and remove them from your comments panel, and take action if necessary to help put an end to them.

    Thanks for catching me on that, Ian!

  11. By Jonathan Bailey posted on January 3, 2008 at 5:39 am

    Ian & Lorelle:

    Though it wasn’t what Lorelle meant to say, it is worth accenting. I don’t think there is much point in watching your keywords because there isn’t much you can do as you never know which keywords the spammers will target.

    Case in point. Say I wrote the sentence:

    “Increased competition from legitimate bloggers and search engines will result in enhancements in spam techniques.”

    Seems innocent enough, but a few pharma spammers will see the word “enhancements” and latch on to the post.

    Ok, rewrite the last part to read as follows:

    “…will result in improved performance in spamming techniques.”

    Once again, same problem though now with “improved performance”. Ok, so we try again, change the direction of the sentence.

    “…will result in increased interest in developing new spamming techniques.”

    Same boat, this time real estate spammers will hit on “increased interest” and so the loop goes on.

    Not only would you be denying your readers your true voice, but I can promise you that you will never be able to write your way out of the spammer’s traps. There are so many kinds of spammers seeking so many keywords it isn’t possible.

    Though it would be a fun writing assignment to try. Maybe I just came up with a great way to torment my readers :)

  12. die-Spammer » Blog Archiv » Spam spam spam egg and spam! - Der dümmste Kriminelle auf diesem Planet…, February 19, 2008 at 9:02 pm
  13. Google! Clean Up Blogger! Now! : The Blog Herald, February 21, 2008 at 7:58 pm
  14. By Rick posted on February 27, 2009 at 9:29 am

    Well, spamming is really a big issue nowadays. Unsolicited advertisements... they want easy money, so they adopted that kind of style, which is bothersome. If you are a bit naive, you may fall into the trap.

  15. By Tico posted on October 24, 2009 at 6:55 pm

    I’ve been arguing this point for years: that too much choice is bad because it makes it difficult to find the right thing and makes it easy for dishonest companies to slip crap products into the market.
