The Evil Linking of Sploggers and Scrapers Messing With Your Content

Filed as Features on July 2, 2007 4:10 pm

The other day, I found a fascinating post on blogging programs that really held my interest, exploring why someone would stay with a specific program even when it made them unhappy. I noticed a reference to Matt Mullenweg, founder of WordPress, but the link was not on his name. It was on the word “of”.

Splog uses buried links to direct visitors to porn sites in stolen content

I thought that was odd, then noticed that other unlikely words were linked too: her, the, in, about, and he. A very odd linking pattern.

Hovering over the links showed that they pointed to porn sites.
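If you would rather not hover over every link by hand, the same check can be roughed out in a few lines of Python. This is only a sketch of the idea, using the standard library's HTML parser. The `FILLER_WORDS` list and the names `AnchorCollector` and `suspicious_links` are my own inventions for illustration, and the word list is simply the handful of words I spotted above.

```python
from html.parser import HTMLParser

# Hypothetical list of filler words; extend it to suit your own writing.
FILLER_WORDS = {"of", "her", "the", "in", "about", "he"}


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # An <a> without an href is recorded with an empty target.
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_links(html):
    """Return links whose entire anchor text is a single filler word."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if text.lower() in FILLER_WORDS]
```

Feed it the HTML of a post and it returns the links that sit on words like “of” or “he”. A hit is not proof of a splog, only a reason to look closer at where those links go.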

Suspecting this was a splog, I wondered whether the author had really written this interesting article and stuffed it with nasty porn links, or whether the content had been stolen. I selected a block of text with unique phrasing and searched Google for the phrase wrapped in quotes. Sure enough, I found the original author and informed them of the scraping and copyright violation, as well as the nasty links.
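The quoted search is easy to repeat for your own posts. Here is a small sketch that turns a phrase into an exact-match search URL. The function name `exact_phrase_search_url` is made up for this example, and the base URL is an assumption you can swap for whichever search engine you prefer.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(phrase, base="https://www.google.com/search?q="):
    """Build a search URL for the phrase wrapped in double quotes."""
    # Drop any quotes already in the phrase and collapse whitespace,
    # so the pair of quotes added below is the only one in the query.
    cleaned = " ".join(phrase.replace('"', " ").split())
    return base + quote_plus('"' + cleaned + '"')
```

Pick a sentence fragment that only you would have written, pass it in, and open the resulting URL in your browser. Any result that is not your own blog deserves a closer look.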

Getting Scraped is a Compliment

Many feel that getting your blog content stolen and scraped by a splogger is a compliment. It means they care enough about your content to “spread the word”. Others think it won’t hurt them and may even benefit them through trackbacks and link love.

This is crap. Loads of it. Piled very high.

It’s a load of garbage because few splogs give credit to the original author. In fact, they have programs which strip the HTML links and tags so they are free to insert their own with no good links getting in the way.

Google’s new PageRank algorithm now investigates and considers links and content in many ways. It’s about keyword matching and relative content linking.

If there is a credit link back to your blog, and the links within the blog post are not in line with the blog contents, on the blog and the linking blogs, the discrepancy is noticed and can score against you. If the content from two different sites matches, and the links within don’t, it can score against you.

If you are worried about duplicate content, then be more worried. If the duplicated content is matched up with your blog, then your site may get scored low for such duplication. It isn’t just the duplication on your blog but the duplication of your content off your blog.
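To get a feel for how much of a post a suspect page has lifted, you can compare the two texts yourself. The sketch below counts shared five-word runs, often called shingles. To be clear, this is a generic overlap check I am using for illustration, not a description of how Google measures duplication, and the names `shingles` and `copied_fraction` are hypothetical.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A result near 1.0 means nearly every five-word run from your post turns up on the other page, while a result near 0.0 means the two texts share almost nothing. Where to draw the line between quoting and scraping is a judgment call this code does not make for you.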

Many a blogger’s PageRank has dropped due to splogs scraping their content, so help stop scrapers and sploggers from stealing and abusing your content. If others abusing your content is a compliment, it’s a painful one.







  1. By gaman posted on July 3, 2007 at 2:51 am

    I thought Google was able to identify the stronger blog (the original most of the time) and wouldn’t penalize it even though its content is duplicated elsewhere.

    The PageRank you point out was published in 2005; I am sure certain things have changed since.


  2. By Lorelle VanFossen posted on July 3, 2007 at 4:24 am

    The article continues to be valid today. I’ve updated it and if anything, it is more relevant today due to the increasing ability to compare data in the PageRank algorithm.

    Google could identify the “stronger” blog if there was enough history to compare to. I’ve heard many stories from bloggers who had been blogging for a year or so and found their PageRank falling and staying that way until they discovered a splogger scraping their content. Once the splogger was shut down, it took a few months but their PageRank increased.


  3. By David Krug posted on July 3, 2007 at 3:39 pm

    Lorelle,
    I truly believe what you’ve written here is rubbish. Google’s PageRank algorithm is private and has not been leaked, and no information is roaming around about how it operates. Again, yet another story of a blogger who thinks they know how something works when in reality they don’t. Will you stop already…

    Teaching bloggers in this fashion is really misleading.


  4. By Lorelle VanFossen posted on July 4, 2007 at 1:26 pm

    The following are the publicly released patents on the Google PageRank:

    Google’s patent for their search engine ranking technique from 2005
    Google PageRank patent release from 2007


  5. By Andy Merrett posted on July 4, 2007 at 1:58 pm

    David,

    Could you be a little less offensive in your attack? I thought you’d grown out of this.


  6. Blog News Watch » Blog Archive » Talk Back: Sploggers and Scrapers and Thieves - Oh My! (July 7, 2007 at 7:29 pm)
  7. Vile Links Mess Up Your Blog Content : 先驱博客 - The Blog Herald China (August 3, 2007 at 9:13 am)
  8. Are You Willing To Go Naked For One Day For Akismet? « Lorelle on WordPress (November 29, 2007 at 6:23 am)
  9. How to Add MyFreeCopyright To Your WordPress Blog « Lorelle on WordPress (October 16, 2008 at 1:31 am)
