The 6 Steps to Stop Content Theft

With spammers and plagiarists becoming more prolific and more aggressive than ever, content theft is no longer a matter of “if”, but “when”.

Where once protecting content was the realm of lawyers and billion-dollar industries, it is now important for Webmasters, large and small, to be familiar with both the laws and the tools available for dealing with content theft.

Fortunately, the steps for fighting plagiarism are easy to follow and, for the most part, the tools are free and readily available.

If you take a few moments to familiarize yourself with the process and technology, you can become a champion plagiarism fighter in short order and get back to the business of running your site before you realize how effective you’ve become.

Step One: Detection

The Internet is vast, and detecting content theft can feel like searching for a needle in a haystack. Fortunately, we have tools that are designed to wade through the Web and find exactly what we’re looking for. Though some scrapers and plagiarists are kind enough to leave you trackbacks that lead you straight to their infringement, for those who aren’t that nice, the following tools can make life a lot easier.

  • Copyscape: Punch in a URL, see a list of potential matches. It can’t be any easier. Though the free service might be too limited for many Webmasters, the paid service starts at pennies a search and is well worth the money. Drastic improvements to the service have made it a force to be reckoned with.
  • Google Alerts: Why search for content yourself when Google can do it for you, every single day? Simply punch in a unique phrase from your work, put it in quote marks and Google Alerts will email you suspect sites every day. Great for static content or keystone pieces that are frequently stolen. Also works great with the Digital Fingerprint Plugin for WordPress.
  • Mahalo’s Plagiarism Detection Tool: Add this applet to your browser’s toolbar, highlight some text and click the button. You’ll be whisked to a Google result with all suspect matches. Though really just a simple JavaScript trick, it is great for random plagiarism checks and quick comparisons.
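The exact-phrase trick behind Google Alerts and the toolbar button above is easy to reproduce by hand. Here is a minimal Python sketch that turns a sentence from your work into a quoted search address. The function name is my own invention, and this is only an illustration of the idea, not how either tool is actually built.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(phrase: str) -> str:
    """Build a Google search URL that looks for the phrase as an exact match."""
    # Collapse whitespace so text copied across line breaks still matches.
    cleaned = " ".join(phrase.split())
    # Double quotes around the phrase ask the search engine for an exact match.
    return "https://www.google.com/search?q=" + quote_plus(f'"{cleaned}"')


if __name__ == "__main__":
    # Pick a distinctive sentence; common phrases return too many false hits.
    print(exact_phrase_search_url("you can become a champion plagiarism fighter"))
```

Open the printed address in a browser, or paste the same quoted phrase into Google Alerts to have the search repeated for you.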

Step Two: Preserving the Evidence

Once you’ve discovered the misuse of your content, you next need to preserve what you have found. Since the later steps usually result in the infringing site either being altered or taken down, having a personal copy of the site both for your records and to verify what was there previously can be very important in the event that a dispute arises later.

Fortunately, there are several great services to help preserve Web pages on the Web and offer some third-party non-repudiation of the results.

  • WebCite: Originally intended for academics, WebCite creates on-demand caches of Web pages and stores them at a simple URL that you can easily access later or offer others as evidence. See a sample snapshot of my site here.
  • Furl: This LookSmart service functions both as an archiving tool and as a bookmarking service. Not only does it create caches of bookmarked pages, but also allows you to organize bookmarks with tags. Also check out MyWeb by Yahoo!.
  • The Internet Archive: The grandfather of all Web archives, the Internet Archive caches pages automatically and keeps a regular archive of most of the Web. Great for situations where you don’t have a personal archive.
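If you also want a personal copy alongside one of the third-party services above, a small script can store the page together with the time it was saved and a fingerprint of its contents. The sketch below is a rough example under my own assumptions: the function name, the folder layout and the record fields are all hypothetical, and a copy you made yourself does not carry the third-party weight that the services above provide.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_evidence(url: str, page_bytes: bytes, folder: str = "evidence") -> dict:
    """Store a copy of a page plus a small record describing when it was saved."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamp and SHA-256 digest identify when and what was captured.
    captured_at = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    digest = hashlib.sha256(page_bytes).hexdigest()

    # Save the raw page exactly as it was received.
    page_path = out_dir / f"{captured_at}-{digest[:12]}.html"
    page_path.write_bytes(page_bytes)

    # Save a JSON record next to it so the details are not lost.
    record = {
        "url": url,
        "captured_at_utc": captured_at,
        "sha256": digest,
        "file": page_path.name,
    }
    (out_dir / f"{page_path.stem}.json").write_text(json.dumps(record, indent=2))
    return record


if __name__ == "__main__":
    # Demo with made-up content; in practice the bytes could come from
    # urllib.request.urlopen(url).read().
    demo_dir = tempfile.mkdtemp()
    print(save_evidence("http://example.org/copied-post", b"<html>copy</html>", demo_dir))
```

The digest lets you show later that the saved file has not been changed since the day you captured it.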

Step Three: Contact the Plagiarist (if Practical)

Once the plagiarism has been discovered and the evidence preserved, the next step is to try and resolve the situation. For some, this involves first contacting the plagiarist directly. Though not practical with most spammers and scrapers, it may be possible with human plagiarists and, while results may vary, generally leads to the most amicable solutions.

However, contacting a plagiarist isn’t always as simple as sending an email; sometimes getting in touch takes a little extra work.

  • Domain Tools: If the plagiarist/spammer uses their own domain, you can perform a Whois search at Domain Tools to locate the contact information for them. If you punch in the domain, click submit and scroll to the bottom, you’ll see all of the contact information for the site. Remember, even if the site uses an anonymous service, such as DomainsByProxy, the email address often forwards to the real account so you can still use that address to contact them.
  • 10 Minute Mail: If the plagiarist doesn’t have their own domain and, instead, is on a service such as a social network, you may have to register for a new account before you can contact them. If that is so, you may want to keep your personal email private when registering for this one-time use account. You can do that easily by using 10 Minute Mail to create a temporary email account you can use to receive registration emails.
  • Commentful: Though leaving a comment on a site is far from the best way to contact a plagiarist, especially considering that it could lead to other legal issues such as defamation and create unnecessary drama, it sometimes is the only approach available. If that is the case, keeping track of comments and replies can be tricky. Rather than checking back with the site regularly, use Commentful to notify you when a reply is posted.
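Whois records can be long, and the contact addresses are often buried near the bottom. As a small illustration, this Python sketch pulls the email addresses out of Whois text you have copied from a lookup. The function name and the sample record are made up for the example, and the pattern is a simple approximation that will not catch every valid address.

```python
import re
from typing import List

# A deliberately simple email pattern; good enough for scanning Whois output.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contact_emails(whois_text: str) -> List[str]:
    """Return the unique email addresses in the order they first appear."""
    seen = {}
    for address in EMAIL_PATTERN.findall(whois_text):
        # Lowercase so the same address in different casing is listed once.
        seen.setdefault(address.lower(), None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical Whois output, not a real record.
    sample = """
    Registrant Email: owner@example.com
    Admin Email: OWNER@example.com
    Tech Email: proxy-contact@example.net
    """
    print(contact_emails(sample))
```

Addresses belonging to an anonymous registration service will show up too, which is fine, since those often forward to the real account.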

Step Four: Contacting the Advertisers (optional)

Though many find it faster to just demand takedown of the work and be done with it, others want to strike at the heart of the plagiarist: their profit motive. Since one advertising account can serve hundreds, if not thousands, of spam sites, targeting the advertisers is an obvious choice for dealing with spammers. Though not always a good solution or a practical one, it can be a powerful tool.

To aid with that, here are several resources to help make the process easier.

  • Adsense DMCA Policy: By far the most common advertising network on the Web, Adsense comes up more often than any other service on plagiarists’ sites. Having their DMCA policy handy is absolutely essential if you plan to take this route.
  • Plagiarism Today’s Stock Letters: Since most advertising networks require a DMCA notice to take action, you are going to need a template. Fortunately I provide one on my site that has worked well for me over the years.
  • FaxZERO: Many advertising networks, including Adsense, will only accept faxed or mailed communications. In this age of email and Internet that can be extremely frustrating. Fortunately, FaxZERO provides a means to send quick faxes via the Internet at no cost.

Step Five: Contacting the Host

Of all the methods of resolving plagiarism issues, contacting the host to get the offending site/page removed is almost always the fastest and most reliable. Laws, such as the Digital Millennium Copyright Act (PDF), require hosts to remove infringing materials once they have been properly notified.

For those interested only in a quick, clean resolution to the matter, this route is almost certainly the first, and final, cessation step.

  • Domain Tools (linked above): If the plagiarist is hosting their own domain, then Domain Tools comes to the rescue again. In addition to providing Whois information, Domain Tools also provides information about the host. Simply punch in the domain you want to learn more about and scroll down to the “Server Info” heading. You can see where the domain is hosted and who it is hosted by in the “IP Location” line. You can also click the Red “W” on the line above it to perform an IP Whois and obtain even more information.
  • DMCA Contact Information: If you need to know who the DMCA contact for a particular host is, the DMCA Contact Information Page on Plagiarism Today may be a good place to start. With over 100 major hosts listed, there is a very good chance that the host you are looking for is already included.
  • Copyright Office’s Directory of Agents: If you can’t find the DMCA contact information on the list above or by searching the host’s site, usually under their terms of service or “legal” page, then check with the United States Copyright Office and see if they have registered there. Since the DMCA requires hosts to register in order to obtain the protections that the DMCA provides them, there is a very good chance that they have.
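If you send notices regularly, filling in a template by script saves retyping. The Python sketch below shows the general idea. It is not one of my stock letters: the wording, the function name and the field names are placeholders I made up for illustration, loosely following the elements the DMCA asks for (identification of the work and of the infringing copy, contact details, a good faith statement, an accuracy statement and a signature). Check the host’s own policy before relying on any template.

```python
from string import Template

# Illustrative wording only; replace it with the letter you actually use.
NOTICE = Template("""\
To: $agent

I am writing to report material that infringes my copyright.

Original work: $original_url
Infringing copy: $infringing_url

I have a good faith belief that the use described above is not authorized
by the copyright owner, its agent, or the law. The information in this
notice is accurate and, under penalty of perjury, I am the owner of the
copyright involved or am authorized to act on the owner's behalf.

Contact: $name, $email
Signature: $name
""")


def build_notice(agent: str, original_url: str, infringing_url: str,
                 name: str, email: str) -> str:
    """Fill in the template; substitute() fails loudly if a field is missing."""
    return NOTICE.substitute(
        agent=agent,
        original_url=original_url,
        infringing_url=infringing_url,
        name=name,
        email=email,
    )


if __name__ == "__main__":
    # All values here are placeholders.
    print(build_notice("DMCA Agent, Example Hosting", "http://example.com/my-post",
                       "http://example.org/copied-post", "Jane Doe", "jane@example.com"))
```

List each infringing address separately when there is more than one, so the host knows exactly which pages to remove.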

Step Six: Contacting the Search Engines

The last step, if all else has failed, is generally to contact the search engines and get the offending content removed. Though it doesn’t actually remove the content from the Web, it prevents others from finding it, stops the plagiarist from gaining any benefit from it and keeps the misuse from damaging your rankings for shared search terms.

Fortunately, the DMCA also requires search engines to remove infringing URLs once properly notified and, even if the other techniques fail, this one works very reliably.

  • Google Site Status Wizard: Before you can report a site to Google for infringement, you have to be sure that it is indexed. If you didn’t discover the plagiarism through the search engine initially, use their Site Status tool to see if the site is indexed already. You can also use it to ensure that the site is delisted from the search engine after your complaint is sent.
  • PrimoPDF: Google’s DMCA policy requires a handwritten signature before they will act on a complaint. You can either fax the complaint in, perhaps using FaxZERO above, or you can scan your signature, place it into a document and print it to a PDF. From there, you can email it to their DMCA agent (PDF). PrimoPDF makes it easy to print to a PDF from any application. Also, you can use OpenOffice.org to create the letter and export it to a PDF directly.
  • Plagiarism Today’s Stock Letters (Linked Above): Dealing with search engines requires a special DMCA notice. However, just such a notice is provided at my site. You can use it in conjunction with OpenOffice.org or PrimoPDF to create your letters to send to the various search engines.

Conclusions

Though plagiarism fighting has not traditionally been the realm of your everyday author or artist, the Internet has forever changed the game. Fortunately, the technology has risen to the challenge and empowered us to protect our content in ways that the bad guys never could have envisioned. Even better, it continues to rise even higher, promising us new tools in the coming months and years to even better protect our content.

In the meantime, it is important that we learn the laws, procedures and tools that are at our disposal and do the best with what we have. Though the approach may be somewhat hodgepodge, it is very effective and has worked for me in over 600 cases.

However, what is important about me is that there is nothing special about me in this matter. Six years ago I had no interest in copyright law. I learned the techniques the same way other Webmasters do today, by hitting the books and learning from others.

Though it sounds like a great deal of work, I cannot think of anything else that has been so easy for me to learn or had so many wonderful people there to help me.

Considering how important this issue is and how little time and energy is truly required, there is no reason not to familiarize yourself with the procedure and make use of the resources available.

Comments

  1. says

    Thanks a million Jonathan,
    I have been going through some serious problems. Many bloggers started copying articles from my blog and posted in their blogs and some even got stumbled..
    I have made some arrangements already.
    That was a healthy list.

  2. says

    An excellent and timely post because indeed, this internet plagiarism is a growing problem for many bloggers.

    It’s really astounding how fast the internet is growing and how easy for plagiarists to copy one’s work so thanks for the advice on how to handle plagiarism. I hope that the tools we use and laws that are implemented now are continously developed to keep up with the growing needs of internet writers, bloggers, etc to be protected from plagiarism.

  3. says

    phenomenal job ! Thanks to both you and Lorell for the great content.

    you have an error in your “about” section..
    Check the word “ago”

    Jonathan Bailey writes at Plagiarism Today, a site about plagiarism, content theft and copyright issues on the Web. He started Plagiarism Today about in 2005 [ago] as a way

  4. says

    TechDune: I’m glad that the article was helpful but I am sorry to hear about your troubles. If I can help in any way, please let me know! You can also post to the Performancing Legal Issues Forum if needed.

    Syaifudin: Very welcome!

    Tony: My pleasure, it is still frustrating that I had to pass over some great tools. Another post though.

    Vienna: I hope so too. However, it appears that the laws are starting to lag behind the needs on the Web, especially when you look at it from an international stance. It may be time for another broad international treaty, just to keep pace.

    Erisian: Thanks for the correction, I’m fixing it now. That’s what I get for hammering that out in about five minutes the day before my first post…

  5. says

    I’ve found a number of my articles reprinted without my permission. But interestingly, many provide a link or at least a Web address leading people to my site.

    One reason could be that I encourage people to repost my content and all I ask for is a byline and a link to my site. I provide a “reprint policy” link with every article that launches a popup which asks people to not be a jerk and please just ask me before they use my content. This is on my main site and not on my blog, since my blog posts are not as thorough as the articles.

    The real thieves online are print publishers who think they own all the content they publish. Virtually all of my articles have been abducted by print publishers who actually sell my articles online with no compensation for me. I don’t make a fuss, though, since it’s free promotion.

  6. says

    Dean: Like you, I am fine with a lot of reuse of my content. I have a Creative Commons License and encourage others to take advantage of it. The problem is that too many people are now violating the simple “don’t be a jerk” premise.

    A lot of them are spammers and scrapers that do it automatically, others are just idiots that don’t want to give a link back.

    I am a bit confused when you talk about the print publishers. Are you saying that you’ve had work you’ve put on the Web taken by print publishers and sold or is there something else going on? If that is the case, you might want to reconsider making a fuss, it would be very easy to do, if you can prove it, and depending on the situation, there could be a decent amount of money involved.

    That would be something for you discuss with your attorney. However, I don’t think there is anything wrong with getting what you are owed.

  7. says

    Hi! It has been quite a while since I have been looking for a post about his on-line theft prevention, and I am glad to see your site through google alerts. Now I can safely say that somehow I am protecting my self. Another thing, though I have only less than a thousand visitors per day, I think I should have this copyscape widget, it makes sense. Being new here, I really want to thank you for such great tips!

  8. says

    Paul: Detecting images is tricky, but the rest of the steps pretty much remain the same. I’m working on a new method to detect image plagiarism and I should have something about that on this site and mine soon.

    However, since you already found the plagiarism, steps 2-6 still apply pretty much the same as if it were text. If you have any specific questions, feel free to ask!

  9. says

    I actually fly on the other side of the spectrum.

    I do not like anyone re-printing — scrapping it — because it doesn’t actually add any value to the web.

    If someone wants to take an excerpt from my article, great! I would happily allow that.

    But do not, by any means, just copy text from my article or re-print it with some random name as the link back — or not even linking back at all.

    I simply do not like sploggers. It’s just cluttering up the Internet more and more everyday with more worthless crap.

  10. says

    Jonathan: Though I agree that I don’t like any and all spam bloggers, I have also grown to realize that limited reuse can have a purpose, especially if it is targeted at different audiences.

    However, I have to say that the amount of reuse I consider to be acceptable has gone down in recent years, mainly due to spam blogging. Sure, some people use my content in ways that I would gladly permit under my CC licenses, but it seems the bulk of reuse I’m seeing comes from spammers and junk purveyors.

    I guess you hit at a good point, we should stop and ask not so much if this is good for us, but good for the Web. No easy answers there that I see.

    Thank you for your input.

  11. says

    Jonathan,

    I agree with you to a certain degree about re-producing content, but from a search engine’s point of view, it’s just a bunch of crap.

    That’s why I take such a firm stance on copying any of my content. I don’t show full feeds RSS anymore, only partial, because so many people were stealing my content.

    My subscriber rate went up and so did my traffic, so the switch was good for me.

    But aside from that, I always want to keep the web’s best interest at heart. The web only exists because people like us exist. If webmasters didn’t care, Google would just have a search engine full of pay per click ads and make money on every single click — and they wouldn’t even need organic search (and they wouldn’t care whether someone re-produced your content or not).

    Just another note from my side of things.

    Thanks for the great article again, it’s a good write-up.

  12. says

    Hello. Your article is NECESSARY stuff, indeed. I bookmarked it, and I’ll add a link to it from my blog, if you don’t mind (anyway, nobody reads my blog, heheh –and don’t even try, Jonathan, it’s in spanish).

  13. says

    Great information:

    As your writing skills increase, you are more open to these types of flattery/attack. Sometimes you just want to shake them down so you can confront them with a “Dude, write your own”! Warning, but oftentimes, the webmasters just use harvesting tools and fake aliases for the domain info. Great tips, thanks.

  14. says

    Great article, I have been having some issues with my blog getting scraped by a few sites. I will try some of these things and hopefully I can get it resolved. I really hate how you work on writing something and people steal it to try and benefit themselves.

    Bud
    http://www.budcalabrese.com

  15. Mia Tyler says

    Hey!…Thanks for the nice read, keep up the interesting posts about scan ip address..what a nice Thursday .

  16. says

    The most important is to prevent plagiarism, and you don’t even mention it. It is time for us to be “more aggressive” and stop them before they steal.
    I had a Google PR5 site with good, unique content, and they come, and they take it away. My page rank drop to 1 , my earnings on ad sense simply melted, i was in despair, then I have found: http://donotcopy.org to be honest, it looks stupid, but it works! Last six months i had 0 copy attempts, my page rank is 5 again, ad sense earnings are back to normal. Thank you do not copy , thank you (sorry for my bad English but i had to spread the word).
    S.V. my site is: http://uradi.com

  17. says

    Hi

    I run eBuster.co.uk that is dedicated to exposing fraud on eBay and eBays unwillingness to deal with fraud on the site and as part of that process I often need to preserve the contents of pages before eBay removed them in an effort to hide the crimes under the carpet.

    One of the many scams uncovered is a fake login page that was hosted by about.com and I took a copy of the page and posted several messages to eBay concerning the page but as usual they were ignored but dam me today my host pulls the site after it received a complaint from the DCMA where eBay claims copyright infringement and this is to a page they never hosted.

    Clearly something is very wrong here and it can not be argued it was part of a scam after I made the effort to insert big red bold text in the page warning no one to use the page so my conclusion is the DCMA only care about the big boys and it has become illegal to expose corporate fraud on the internet.

    Yes the page did include the eBay logo as all the pages I present as evidence needed to remain as close as possible to the original item and yet DCMA are on my back wanting me to jump through hoops.

    I wanted to contact these DCMA people myself to put my case but alas like most terrorcrats they remain hidden and if ever I saw a suppression or freedom of speech then this is it.

  18. says

    This is bullshit. I have a site which I have copied from many others. The best thing is I enjoy good SERP. No one can get me to remove those, although many tried. LOL. Your advises are not practical.

  19. Greg says

    The best one so far is trackment.com It’s not expensive, yet easy and reliable. I’ve been with them for 2 years, highly recomenned.

Trackbacks

  1. […] The 6 Steps to Stop Content Theft : The Blog Herald Nice article on stopping content theft. In conjunction with my post on getting traffic from scrapers, these seem to work well together. http://www.ewhisper.net/blog/scraper-sites-steal-your-content-use-them-to-build-your-traffic/ […]
