
The Five Worst Ideas in Content Theft

When it comes to detecting and stopping content theft, there is a great deal of progress to be seen. New plugins are constantly being developed to stop scrapers, search techniques are constantly being improved and new tracking methods are being explored.

But despite all of the effective ways to monitor your content and protect it from misuse, it seems some of the worst ways never die.

No matter how many times these techniques get shot down, disproved or otherwise defeated, there are still those who preach them as gospel. However, these systems not only provide a false sense of security, but oftentimes irritate readers and, in some cases, can actually make the problem worse.

So let us take a moment to look at the five worst methods of dealing with content theft on the Web and analyze why they are so bad.

5. Truncated Feeds

There is little doubt that RSS scraping by spam blogs is the single largest threat bloggers face when it comes to content theft. Not only is it the most common type of content theft taking place today, but the quantity of content stolen, which can often include the entire blog, and the immediacy of the infringement make it an extremely dangerous form of misuse.

This has caused something of a panic among many bloggers, and they have sought to nip this problem in the bud. The solution seems simple enough: if you don’t put the content in the feed, it can’t be scraped, so use truncated feeds to prevent the misuse before it happens.
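
For illustration, truncating a feed just means replacing each item’s full body with a short excerpt before it is published. The sketch below is a hypothetical helper, not anything from a specific plugin, and it assumes the roughly 55-word excerpt length WordPress uses by default:

```ts
// Hypothetical sketch of what a feed generator does when it truncates items:
// strip the markup and keep only the first few dozen words, so anything that
// republishes the feed gets an excerpt instead of the full post.
function toFeedExcerpt(postHtml: string, maxWords = 55): string {
  const text = postHtml.replace(/<[^>]+>/g, " ");        // drop HTML tags
  const words = text.split(/\s+/).filter(Boolean);       // split into words
  if (words.length <= maxWords) return words.join(" ");
  return words.slice(0, maxWords).join(" ") + " [...]";  // truncate and mark the cut
}

// The feed would then carry toFeedExcerpt(post.content) instead of post.content.
```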

Though shortened feeds can prevent most direct RSS scraping from taking place, they don’t protect against other forms of scraping, such as keyword scraping, and they can greatly annoy legitimate readers of your feed. Using a partial feed will cause many readers not to subscribe at all and, according to FeedBurner, those who do subscribe are not any more likely to click through to your site.

There are many tools available to track and monitor feeds, especially for WordPress users running their own server, and there is little to be gained from using a partial feed to protect against content theft. Worst of all, many of those who use truncated feeds view their content as “safe” and ignore other threats, including those from human plagiarists.
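
If you do leave the full content in the feed, monitoring it is not hard either. As a rough, hypothetical sketch (not any specific tool referenced here), you can embed a unique fingerprint token in each feed item and then check whether a suspect page actually contains it:

```ts
// Hypothetical sketch: tag each feed item with a unique token, then check
// whether a suspected scraper's page contains that token.
const FINGERPRINT = "my-feed-fp-7c41e2"; // made-up token; use something unique to you

// Append the (hidden) token to the HTML that goes into the feed only.
function tagFeedItem(itemHtml: string): string {
  return `${itemHtml}\n<p style="display:none">${FINGERPRINT}</p>`;
}

// Fetch a suspect URL and see whether the token shows up in its markup.
async function copiesMyFeed(url: string): Promise<boolean> {
  const res = await fetch(url);   // built-in fetch (browser or Node 18+)
  const body = await res.text();
  return body.includes(FINGERPRINT);
}
```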

4. Overblown & Hostile Copyright Notices

There are literally dozens of sites on the Web that will provide you with banners and badges that you can put on your blog to let visitors know that you disapprove of any copying of your work. If a button isn’t to your taste, you can always write a bold-faced copyright warning, letting your users know exactly how you feel about those wanting to take your content.

The problem is that these warnings are very rarely, if ever, an effective deterrent against infringement. In the case of RSS scraping, no human is likely to even see the warning since the entire process is automated, and human plagiarists tend to ignore such warnings because they already understand that what they are doing is wrong.

Simply put, if someone visits your site with the intent to misuse your content, they are going to do so, regardless of what warnings you may display.

Being clear about the rights you reserve and letting your readers know your terms of reuse is important, as doing so avoids confusion and helps prevent accidental misuse of your work. However, there is a fine line between stating your rights and being hostile toward your readers.

Remember, your efforts to protect your content should impact your legitimate readers as little as possible. Though we all have to cover the legal bases, there is little to be gained from being more hostile than absolutely necessary.

3. Converting to Static Content

If a truncated RSS feed is not extreme enough for you, you can always try taking all of your valuable content and moving it to static pages, thus using your actual RSS feed to just announce updates, additions and revisions to the rest of your site.

This idea has all of the drawbacks of a truncated feed but goes a step farther by reducing the amount of content you can effectively present on your site. In following this method, one either has to sacrifice usability by providing hundreds of links or cut back severely on the amount of unique content offered.

Worst of all though, moving content to a static page may eliminate the threat of RSS scraping, but it increases the threat of human plagiarism. Where an entry to a blog might have a life span of a few days, static content is perpetual and can be lifted by humans months or years down the road.

Though every blog requires some cornerstone content to anchor it, such content also requires special protection to prevent misuse.

Moving content to static pages does not prevent content theft outright; it just shifts the danger from one type of infringer to another.

2. False Copyright Registrations

If you want to take the extra step of registering your work, there is no shortage of services available to take your money. With names such as iCreateditfirst and DulyRegistered, these sites vary wildly in terms of price and services, but the sales pitch is almost exactly the same.

These services claim to help you “protect” your copyright and offer “peace of mind” by having a version of your work in storage to prove you created it on or before a certain date or time.

Though non-repudiation can play an important part in a content protection strategy, many of these services are out of step with the realities of the Web. Their offerings are simply too slow and too expensive to be practical for your average blogger.

Fortunately, services such as MyFreeCopyright, Numly and Registered Commons provide more practical and effective non-repudiation services at either a minimal cost or for no fee at all.

However, the biggest problem with many of these expensive “copyright” services is that their marketing copy leads users to believe that they are the equivalent of a legal copyright registration. Unfortunately, in the United States, that is not the case. If you wish to register your work or site and receive full legal rights and protection, you need to visit the United States Copyright Office Web site and go through its registration process.

Though the process there is even more expensive and slow, it is the only way to obtain the advantages of such a registration.


Unfortunately, many who use these services may not realize the difference and move forward believing that they have rights they have not yet obtained. This can cause serious legal issues down the road and seriously injure one’s ability to obtain damages from an infringement.

1. Scripts

It doesn’t really matter which script it is. Among the popular scripts are the “no-right-click” script, which is designed to prevent people from right clicking on images or text, the “no select text” script, which prevents the user from selecting text to copy, and the “image protection script”, which creates special menus on the right click of an image. No matter what the script, the idea of using a script to protect content stinks.

The problem is that scripts simply do not work. Any JavaScript you use to protect your content, image or text, can be defeated any number of ways, the simplest being to simply disable JavaScript in the browser.
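
To see why, consider what a typical “no-right-click” or “no select text” script boils down to. The sketch below is a generic illustration, not any particular plugin; it only suppresses two browser behaviors on the rendered page, so disabling JavaScript or simply viewing the page source bypasses it entirely:

```ts
// Generic illustration of the kind of "protection" script being criticized.
// It only cancels two browser events on the rendered page; the HTML, images
// and RSS feed remain freely available, and turning JavaScript off (or using
// "View Source") defeats it completely.
document.addEventListener("contextmenu", (e) => e.preventDefault()); // blocks the right-click menu
document.addEventListener("selectstart", (e) => e.preventDefault()); // blocks selecting text
```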

But while there are a million ways to circumvent such scripts, there are an equal number of ways that they can upset legitimate visitors by changing their browser behavior unpredictably. Want to view an image by itself? Think it would be great to select a funny quote to send to a friend? Like using the right-click menu to move forward and backward? You can’t do any of those things if they are disabled by the Webmaster.

Users, by and large, are very hostile to this and tend not to return to sites with such protections. Meanwhile, your dedicated scrapers and spammers are largely unfazed. These scripts do nothing to stop RSS scraping, since they aren’t embedded into the feed itself, and most human plagiarists can get around such blocks in a matter of seconds.

In short, the handful of “simple minded” plagiarists that either don’t know how or don’t bother to circumvent such simple blocks will easily be outnumbered by the legitimate users that never return to your site out of frustration. Like it or not, the use of such scripts, especially in a broad manner, can actually work to sink your Web site.

In most cases, your best bet is to stay far, far away.

Conclusions

As with most bad ideas, the ones that seem to survive when dealing with content theft share one thing in common: they are supposedly easy solutions that promise to fix the problems of plagiarism and copyright infringement in a few simple steps.

The problem is that, as with anything else, there are no easy solutions or quick fixes. If you look for shortcuts and easy ways out, you’re likely to wind up falling into well-placed traps.

There is simply no easy way to deal with content theft and, while there are many tools to help you in your fight, there is still a need to work, study and learn. It is important to remember that you cannot just flip a switch and make this go away.

Most of all though, be wary of snake oil salesmen and anyone promising you something too good to be true. In my experience, they either don’t know what they are selling, or there is something in it for them.

Comments
  • Excellent list of what doesn’t work that people use every day. Wonderful!

    Though, I don’t agree with the issue of feeds. One of those myths with no supporting data. Sure, it doesn’t change anything for content theft, other than limiting scrapers to the excerpt if they are using traditional feed scraping programs, but the myth that readers won’t read blogs with feed excerpts has not been thoroughly researched, only assumed. If the content is good, they’ll click through to read. Choose the feed style that matches your blog.

    How does Creative Commons fit into your list? Is it really helpful? Bloggers everywhere embrace them, but what can they really do to help protect your content?

  • Lorelle wrote that ‘If the content is good, they’ll click through to read’. I disagree; truncated feeds are a real nuisance to the subscriber.

    Feeds are used so I can read in my mail instead of jumping around blogs. When I get a truncated feed I usually skip that feed or even unsubscribe.

  • Lorelle: Though I agree that one should choose the feed style that fits their blog, I have to trust FeedBurner with their statistics. They monitor over 1 million feeds and are usually the ones doing the truncating. They monitor clickthroughs and viewership. According to the link I gave in the article, they looked at it and found no difference.

    Are there cases where a truncated feed could be better? Definitely. There’s a reason why all of the MSM outlets I know of use partial feeds. That being said, the averages indicate no real difference in clickthrough rates between truncated and full feeds.

    As far as CC goes, it was never really billed, that I’ve read, as a solution to content theft. It’s designed to solve the problem of people having to obtain permission for every single use.

    In general, the consensus I hear is that it doesn’t really affect content theft one way or another. Scrapers ignore the license, as do most plagiarists, and I doubt anyone who has gone into reuse in bad faith is swayed to change any of their behavior by a CC license.

    That being said, the problem it is designed to solve it solves quite well, though it would be much better with more promotion and a slightly clearer set of license terms.

    Still, in that regard, it beats the current regime.

    Glad you liked the article!

    Bengt:

    The approach I take is this: great content can survive a partial feed the same way that a great story can survive bad writing.

    However, this isn’t to say that a partial feed helps great content or bad writing helps a great story. Both can work and there may be cases where it is preferable, but those applications seem to be getting more limited by the day.

    Just my thoughts though.

  • A great story can survive bad writing a few times, but not in the long run. If every story comes with bad writing (a partial feed), then it does not work anymore. That is my two cents.
