Negotiating with the RSS content scrapers

Last week I was browsing Technorati to see which blogs had recently linked to one of my own, when I saw something familiar: a copy of one of my articles on a different web site.

Sometimes it’s just a long quote, used either for another site’s commentary, or to link back, and I’ve no problem with this usage.

The main issue is with sites that scrape content and publish it on their own AdSense-riddled layouts in the hope of making some cash from other people’s work.

Though I abhor this type of site, I’ll often let them pass by because it’s not worth my time reporting them or searching around for a contact. If they’ve lifted an item of news I’ve written based on a press release, then I don’t feel quite so hard done by. The article in question, however, was a verbatim copy of an opinion piece I had spent a considerable amount of time working on. I felt aggrieved. The only reason any links pointed back to my site at all was that I had put internal links within the article itself.

Unusually, I found a contact form for the site (I won’t name and shame them here) and sent a polite but firm explanation stating that they had lifted several articles (I searched further) from my site – as well as from a number of others – and that I’d appreciate the content being removed.

I didn’t expect to get a reply, but today an email arrived which (paraphrased) said:

I didn’t copy and paste your content on my site. I just used your RSS feed from your site (and many other sites) and my script imported it to my site. Our blogger writes content about gadgets from all over the world, and you can find it in another section.

Anyway, I’m sorry if you don’t agree that I used your RSS, and I have two options to offer you.

1. I use your RSS feed in my site and at the end of it link to your site for the complete story. In this way you’ll get more visitors to your site.
2. I remove your RSS from my system.

Anyway, I think if you choose option 1 I’ll send you more visitors and you can send me visitors.

So, what do we have here?

  1. Someone who believes (as I’m sure many others do), and blatantly acknowledges, that it’s OK to do whatever they want with the contents of an RSS feed. In their view, the very fact that I chose to publish a feed gives them that right.
  2. Someone who thinks that, after I’ve discovered their splog, I might still consider “option 1” and link back to their site.

Would it have been a big deal to link and be linked? Well, quite apart from my own ‘ethical’ viewpoint, I don’t really want to test out Google’s ‘bad neighbourhood’ penalty, its duplicate content penalty, or any other problems that might arise from being associated with dubious sites.

The dilemma is how much time, in a busy blogging schedule, to spend on tracking down and attempting to eradicate these parasites.


Comments

  1. says

    Why don’t you just add a signature to the bottom of each post with your name and URL. That way you get the credit and, maybe, a little traffic.

  2. says

    These types of people really tick me off!

    After discovering one site from [a certain domain] hosting blogs that were stealing my content (which I had shut down), I discovered dozens more!

    You’re right about one thing… fighting these types of blogs is time-consuming and irritating as well.

    I have better things to do in life than to send “cease and desist” orders to spam-infested blog hosting sites.

  3. says

    One of the biggest reasons to publish only partial feeds is to get around this annoyance. It’s a bit of a debate, but if you do, you’ll guarantee that this won’t happen, at least not to the extent that it does now.

    It’s funny how sploggers rationalize it by saying that it “sends” “traffic” to your blog, which is patently ridiculous.

    Also, feel free to publish their name if you want … it’s time we put a spotlight on sploggers, many of whom seem not to have an ethical bone in their bodies.

    Cheers
    Tony.

  4. says

    I feel your pain, Andy. I find one of these every second day and am at a similar point of trying to work out whether it’s worth my time dealing with it any more.

    My main problem with it previously was around duplicate content issues, but Google recently came out with info on duplicate content that seemed to indicate that this wouldn’t be too big an issue in this situation. For me, however, it’s also more of an ethical thing. I put many hours of work into my content, and to have every post I write republished word for word, without them making it more useful in any way, just feels wrong to me.

    So I fight each one and hope that someone will find a way to stop it or at least automate how we track them down and stop it happening….

  5. says

    Just send a very short email saying that if the content is not removed in the next 24 hours, you will be contacting AdSense and their web host.

    You will meet with success in 99% of the cases.

  6. says

    The ‘original content’ that this site talks of consists of rewritten articles from other tech sites, which is not a huge problem, and it does give attribution. However, ALL of the articles that are scraped from the site I write for are just pulled verbatim.

    Paranoid? Moi?

    I won’t name and shame them… this time…

  7. says

    If it’s Blogger/Blogspot, you can always use the “mark as spam” button, though I don’t know how fast they address that. Sometimes, it really is a waste of time going through the process. Would you rather be writing new content than catching scrapers? I’m not sure myself.

  8. says

    I love their response: I’ll take it down if you ask but if I keep it up you’ll get more traffic. This is the *EXACT* same argument Robert Scoble is using with his link blog that republishes full posts.

  9. says

    Andy, I use Antileech for my WordPress feeds. I also use Angsuman’s feed copyrighter to embed a copyright notice below my feeds. So if my feed is scraped, the splog gets a blaring copyright notice, and I get a linkback and so can easily track it back.

    And I suggest you report them directly to AdSense and the web host with a DMCA notice. He may take off your feed, but there are other webmasters out there who will be suffering right now. We need to work together against splogs.
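
Several of the suggestions in the comments above (a signature at the bottom of each post, partial feeds, a copyright notice embedded in the feed) come down to changing what goes into each feed item before it is published. As a rough sketch only, and not a reproduction of how Antileech or the feed copyrighter plugin actually work, the idea might look something like this in Python. The function names, the dictionary keys (`title`, `link`, `description`) and the footer wording are all assumptions made for illustration.

```python
from html import escape


def make_excerpt(text, max_words=50):
    """Return a partial-feed excerpt: the first max_words words of text.

    Assumes plain text; real feed content with HTML would need its
    markup stripped first.
    """
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [...]"


def add_attribution_footer(items, site_name):
    """Return copies of feed items with an attribution footer appended.

    Each item is assumed to be a dict with 'title', 'link' and
    'description' keys (hypothetical structure, not a real feed API).
    The original items are left unmodified.
    """
    stamped = []
    for item in items:
        footer = (
            '<p>Originally published as <a href="{link}">{title}</a> '
            "on {site}. Copyright {site}.</p>"
        ).format(
            link=escape(item["link"], quote=True),
            title=escape(item["title"]),
            site=escape(site_name),
        )
        # Build a new dict so the caller's item is not changed in place.
        stamped.append({**item, "description": item["description"] + footer})
    return stamped
```

If a scraper republishes the item verbatim, the footer and its link travel with it, which is the same effect my own internal links had in the article that was lifted. It will not stop a scraper that strips links or rewrites the text.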

Trackbacks

  1. […] I’ve already written about the nasty little sites that scrape complete content from legitimate web sites via their RSS feeds, so I won’t go into huge detail here. Suffice it to say that this particular breed of scrapers, who go by the name of “The Gadget World” but have registered ‘igizmodo.com’ as their domain name (hmm, sounds familiar), see nothing wrong in what they’re doing. They even dare to suggest that our much higher ranked site would benefit from being linked to by a splog. Yeah, I don’t think so. […]

  2. […] The main reason for wanting to get readers to click through to one’s content (by using partial feeds) would be to get more advertisement impressions, to which Rick responds by mentioning feed monetizing options (which one is more profitable may ultimately be debatable). Apart from that, Dan brings up an important issue, that of RSS content-scraping which may prevent some people from publishing full feeds, but then himself provides the solution: 1. Use internal linking. […]