Protecting Your Content From the Spinning Spammers

Filed as Features on November 12, 2007 12:30 pm

Last week, Tony posted an article about a somewhat different kind of spam blogger.

The spammer had taken an article from this site, scraped it and then modified it before republishing. Though the method of modification remains debatable, it is clear that it was through some automated means as the duplicate version was mangled and borderline unintelligible.

However, the unfortunate truth is that this type of scraping is not as uncommon as we might wish and the technology to do it has been around for several years. Worse still, this type of scraping is growing much more popular as search engines clamp down on duplicate content and ad networks get better at detecting traditional content theft.

Modified scraping is a rising threat that bloggers need to be aware of as it presents a whole new set of challenges for content creators.

No Laughing Matter

It is easy to laugh at these automatic scrapers as their results are often quite comical and produce gems such as this out of completely legible text:

“One word, or the demand thereof, denaturized the full intent. Such is the venture digit takes when they indite in a module in which they demand the pertinent fluency.”

However, the broken English belies the full extent of the problem. Spammers create these works by taking posts from legitimate bloggers and running them through an algorithm. This can involve using a thesaurus to find synonyms for the words in question or using an automatic translation program to convert the work into another language, possibly then converting it back to English.

This process of modifying the content before reposting it is often called “spinning”. Spinning a work before republication has several advantages for the spammer, the largest of which is that Google is less likely to detect the work as a duplicate and thus more likely to rank it higher. Almost equally important, it is much harder for victims of plagiarism to detect and follow up on the misuse, making this kind of abuse much harder to stop.
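To make the detection problem concrete, here is a minimal Python sketch of thesaurus-style substitution. It is a toy under stated assumptions, not how any particular spinning tool works: the `SYNONYMS` table and the `spin` function are hypothetical names invented for this illustration, and real tools use far larger word lists.

```python
import re

# Hypothetical toy thesaurus, invented for this illustration.
SYNONYMS = {
    "word": "term",
    "changed": "altered",
    "meaning": "intent",
}

def spin(text):
    """Swap every word found in SYNONYMS; leave all other text untouched."""
    def swap(match):
        word = match.group(0)
        replacement = SYNONYMS.get(word.lower())
        if replacement is None:
            return word  # no synonym known, keep the original word
        # Keep a leading capital so sentence starts still look right.
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)

original = "One word changed the full meaning."
spun = spin(original)
print(spun)                                # One term altered the full intent.
print("changed the full meaning" in spun)  # False
```

Swapping only three words is enough to make a search for the exact phrase fail, which is why verbatim searching alone tends to miss spun copies.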

The good news in all of this is that, since so little of the content remains the same, the odds of the search engines penalizing the victim are much slimmer than with traditional spamming. However, that does not make these modified scrapers harmless. They target the same keywords as your site, often intentionally leaving them intact when spinning a work, and might usurp the original work through a combination of scraping and spam linking.

Though less of a direct threat to bloggers, these scrapers are still a major thorn in the side of legitimate content creators and remain a threat well worth addressing.

Legal Issues

The problem is that, when confronted with this type of scraping, many feel that there is little they can do. They fear that, since the reuse isn’t verbatim, the law does not protect them and there is no action they can take.

Fortunately, the law is very clear on this subject. Copyright is not merely the right to copy one’s own work, but a set of rights that includes the right to create derivative works. This is why only J.K. Rowling can sell Harry Potter books, though she does tolerate non-profit fan fiction, and why spinning a work is almost always still illegal.

This right to create derivative works covers the right to create translations and any other work based on copyrightable portions of the original. Spinning, since it starts with a copyright-protected work and creates a new work based upon it, violates that right.

Fair use arguments fall equally flat in the eyes of the law. Spinning is not transformative as it is designed to replace the original, it offers no commentary or criticism, it is for commercial use, it can greatly harm the market for the original work and usually is unattributed. There is almost no fair use argument left for the spammers who modify the posts they scrape, leaving the door wide open for rightsholders to take action.

In short, though I am not a lawyer, I can see little reason to doubt your rights in the event you detect such scraping of your content. Your work is still very much protected and your rights are still very much enforceable.

What to Do

Of course, knowing that your work is protected does little good if you cannot detect the misuse of your content. As we discussed earlier, this can be a challenge as the content has been modified and most search engines can only detect verbatim copying. Even powerful academic tools, such as Turnitin, struggle when faced with non-verbatim copying.

In a recent article on my site, I talked about various techniques for detecting spun versions of your posts. Those tips included the following:

  1. Digital Fingerprinting: Digital fingerprinting is a process by which you append a unique word or phrase to the end of your posts in your RSS feed. If the feed is scraped, so is the fingerprint and searching for that string of characters tells you which sites have taken your content. Since fingerprints don’t have easy translations or synonyms, they remain intact through the spinning process. Plugins such as the Digital Fingerprint Plugin and Copyfeed can automate the process.
  2. Trackback Monitoring: As was the case with Tony’s original post, spam blogs often leave links in the scraped post intact, even as they modify the copy. They often send trackbacks to those URLs in a bid to get extra incoming links to the spam blog. If you link to your own articles when writing, you can watch the trackbacks and get an idea of who is using your content, even if it is spun.
  3. FeedBurner Tracking: FeedBurner offers a very powerful “uncommon uses” feature that tracks where your feed is published. Since FeedBurner does not depend upon the post content to track the feed, spinning the text will not fool the system.
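
The fingerprinting idea in the first tip can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Digital Fingerprint Plugin or Copyfeed actually work: the function names, the `site_secret` value and the `fp` prefix are all hypothetical.

```python
import hashlib

def make_fingerprint(site_secret, length=12):
    """Derive a nonsense token that is unlikely to appear anywhere else."""
    digest = hashlib.sha256(site_secret.encode("utf-8")).hexdigest()
    return "fp" + digest[:length]

def add_fingerprint(feed_item_html, fingerprint):
    """Append the fingerprint to the end of a feed item's body."""
    return feed_item_html + "\n<p>" + fingerprint + "</p>"

def contains_fingerprint(page_text, fingerprint):
    """Report whether a fetched page still carries the fingerprint."""
    return fingerprint.lower() in page_text.lower()

# Hypothetical secret; any string private to your site would do.
fingerprint = make_fingerprint("example-blog-secret")
item = add_fingerprint("<p>My original post.</p>", fingerprint)

print(contains_fingerprint(item, fingerprint))                        # True
print(contains_fingerprint("<p>My original post.</p>", fingerprint))  # False
```

Because the token is not a dictionary word, a thesaurus or translation pass has nothing to swap it for, so searching for it should still turn up spun copies of the feed.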

Once you’ve detected the scraping, you then have all of your typical resolution techniques at your disposal, including contacting advertising networks such as AdSense, filing a DMCA notice with the host or sending such a notice to the search engines.

In short, detecting spun content is the hard part; dealing with it is relatively easy. Still, if you ever need help with that, please feel free to post to the Performancing Legal Issues Forum and I will be glad to assist you.

Conclusions

In the case that Tony references, we discovered after some research that the blog in question is really just the tip of a much larger spam blog network. So, we are currently contacting and filing notices with the ad networks involved to see if we can sever the revenue stream and, once that is done, we will seek takedown of the infringing work.

The process may be slower and require more work but, since there is little harm being done to the original work in the short run, we feel it is more valuable to try to topple the whole network before seeking removal of the infringing work.

It is a bid to clean up at least one small corner of the Web and, hopefully, we’ll begin to show the fruits of that labor very soon.

  1. PlagiarismToday » Copyright 2.0 Show - Episode 32 - Ahoy Pirates, November 12, 2007 at 3:16 pm
  2. By eschaton posted on November 12, 2007 at 4:47 pm

    I’ve also noticed that some bloggers scrape their own material, run it through text-modifying software, and post it to splogs (numbering often in the hundreds) linking back to their main blog in order to create massive Technorati authority gains.

  3. By Ross Gordon posted on November 12, 2007 at 5:33 pm

    I have a big problem with sploggers ripping off content from my Free Stuff blog. I have contacted a few by commenting on their blogs and threatening legal action. It actually worked a few times.

    But most don’t care. I added an auto sig to my feed with a WordPress plugin, so at least I am getting backlinks. But the truth is that they are not quality links. I would rather just have my content stay mine!

    Some people just have no lives….

  4. By Jonathan Bailey posted on November 12, 2007 at 11:56 pm

    Eschaton: There’s special software that does exactly that, working like the scrapers I’ve described here but without the scraping ability. Instead, it uses variations on a theme to spin hundreds of copies of the same work, each with its own modifications.

    This is something Google has had to work very hard to stop, but it seems to be making at least some progress.

    Ross: If you want, either shoot me an email or post about the problem to the Performancing Legal Issues Forum and I’ll see what I can do to help.

    http://performancing.com/forums/performancing-blog-forums/legal-issues

    There are other ways of handling this than just threatening legal action. One doesn’t need a lawyer, just to know the law.

    Let me know if I can help!

  5. By Dave posted on November 14, 2007 at 12:34 am

    I have seen my content scraped within two minutes after it’s been published. Thanks for the article.

  6. By Jonathan Bailey posted on November 14, 2007 at 3:18 pm

    Dave: That sounds about right. You might want to take a look at my article about pinging smart to see if that can help you out!

    http://www.blogherald.com/2007/09/17/how-to-avoid-spambots-by-using-pinging-services/

  7. Censoring WordPress.com, New Comment Spam Fighter, More Blog Security News, WordCamp Melbourne, BlogWorld Expo, and More WordPress News : The Blog Herald, November 14, 2007 at 8:40 pm
  8. Spinning Spammers Steal Our Blog Content « Lorelle on WordPress, November 15, 2007 at 2:05 am
  9. Protecting Your Content From the Spinning Spammers : The Blog Herald - Bloggercamp, November 15, 2007 at 4:03 am
  10. Fighting the spinning spammers, November 15, 2007 at 4:31 am
  11. A very spammy friday to you too : Everybody Knows, November 15, 2007 at 9:20 am
  12. Spinning Spammers: The New Breed Of Splogs » Brown Thoughts, November 15, 2007 at 11:30 am
  13. Spimmers « Gormful, November 15, 2007 at 12:58 pm
  14. Spimmers « Unintelligible, November 15, 2007 at 1:05 pm
  15. Protecting Your Content From the Spinning Spammers | Web Standards Weblog, November 15, 2007 at 1:59 pm
  16. Had Your Blog “Scraped” Lately? - - It’s not random, it’s CHAOS!, November 17, 2007 at 9:18 am
  17. By Ron posted on November 17, 2007 at 8:59 pm

    Since I’m just getting started with my own domain and blogging site, I found your article very informative. One item that troubled me though is the link that you have for the digital fingerprint plugin. It took me to a Google page that warned me that the site in question could harm my computer. Their page took me to StopBadware.org which had further information about the safety of visiting this plugin site. Is the site for the digital fingerprint legitimate and safe?

  18. By Lorelle VanFossen posted on November 17, 2007 at 9:18 pm

    @Ron:

    The site is fine. I’ve contacted the author as there is something going on in the web page code, but the site is absolutely fine.

  19. By Kim posted on November 18, 2007 at 3:57 pm

    Thanks, your article told me a lot!
    I am finding they are taking my blog title and what I have written, but the only way I know is that I have Google Alerts set up for keywords I blog about often. Yet when I visit the spammer, there is no way to contact the person. I am finding I am now getting scraped often due to the subject matter I blog about. I tell Google on them.
    I also have a site that uses my blog as part of their membership ads. Is there a way to set up the digital fingerprint to ignore them?

  20. By Jonathan Bailey posted on November 19, 2007 at 9:20 am

    Ron & Lorelle: I’ve spoken with the person who wrote the plugin many times and the site is fine. However, if you don’t feel comfortable, you can use the Copyfeed plugin, as it has the same functionality along with many more features.

    Kim: First, in the future, you may want to consider informing their ad networks and their host about what is going on before contacting Google. The reason is that the latter doesn’t remove your work from the Web and other search engines. If you cut off the money and then cut off the hosting, you do much more harm to the spammer.

    As far as the other site you’re talking about, the secret there is to create a second, secret, feed that you only give out to sites that need a version of it without the fingerprint. You can use FeedBurner to do that. It can create two feeds from one and then add the digital fingerprint to one of the feeds using Feedflare.

    Hope that helps!

  21. PlagiarismToday » Massive Trackback/Comment Spam Attack, November 20, 2007 at 12:22 pm
  22. Spinning Spammers — Full Time in NM, November 20, 2007 at 3:52 pm
  23. Netsensei » Blog Archive » links for 2007-11-21, November 21, 2007 at 12:20 pm
  24. Spinning Spammers Steal Our Blog Content | BigDadGib.net, November 22, 2007 at 8:15 pm
  25. By Spewb posted on December 13, 2007 at 9:43 pm

    Under the Computer Fraud and Abuse Act (CFAA), which forbids exceeding authorized access to a computer with the intent to defraud, the host of the blogging site can bust the scrapers, not for plagiarizing your work, but for accessing its servers repeatedly for the purpose of scraping (an act that is against almost everyone’s terms of service).

  26. By Jonathan Bailey posted on December 15, 2007 at 12:15 am

    Spewb: As true as that is, the process is much more complicated. To get almost anything done under the CFAA you have to get an attorney, file an injunction and jump through legal hoops. The DMCA is as simple as a sheet of paper and takes less than 48 hours.

    It is a good alternative though, something to consider.

  27. Links für den 15.11.2007 | virtuatron::weblog, December 22, 2007 at 6:56 am
  28. Can Spammers Legally Steal Your Blog Content? | Internet Marketing Blog, December 28, 2007 at 1:41 pm
  29. Breaking Trust: How Not To Link to a Plagiarist : The Blog Herald, March 5, 2008 at 3:23 am
  30. Cleaning Blogspot Spam: Is Google Responding to Public Pressure? : The Blog Herald, April 1, 2008 at 8:30 am
  31. War against content slammers, June 19, 2008 at 3:20 am
  32. How to Add MyFreeCopyright To Your WordPress Blog « Lorelle on WordPress, October 15, 2008 at 3:17 pm
  33. How to Add MyFreeCopyright To Your WordPress Blog | This Is The Maverick Of Blogs, December 17, 2008 at 3:28 pm
