Protecting Your Cornerstone Content
Having your RSS feed scraped is bad enough. Spam blogs that republish your feed, or unwanted aggregators that reuse your content without proper attribution, can not only siphon off your visitors but also damage your rankings in the search engines.
However, having your feed scraped is not nearly as damaging as having your cornerstone content plagiarized. This is the content that you build your site around, the content that stands the test of time and gives visitors a reason to come by your site in the first place.
But more importantly, this is the content that gives your site true uniqueness, both in the eyes of human visitors and of the search engines. Where a blog post might be forgotten in a week’s time, losing control over this content can have a negative effect for a long time to come.
So, while taking reasonable precautions with your RSS feed is important, it is at least as important to take steps to guard your static content against plagiarism. After all, given the time you spent crafting and creating this content, it makes perfect sense to take a few moments to protect it.
What is Cornerstone Content?
On Copyblogger, Brian Clark defines cornerstone content as “what people need to know to make use of your website and do business with you.” In his free eBook, Chris Garrett describes it as “Flagship Content,” saying that “Just like Wikipedia is the lazy blogger’s go-to reference for general research, your Flagship Content is the place everyone will link to when they think of your topic.”
In short, cornerstone content is any content that is static on your site. It is your “about” pages, your FAQs, your guides and any other general information that is not in a regular blog post. However, in my opinion, it goes even beyond that. Any posts that were very popular, well-linked and still get a decent amount of traffic even after two months could also be considered an element of cornerstone (or flagship) content. The same could be said for any content that was Dugg or Reddited, as well as any posts that sustain a comment discussion long after they have fallen off the front page.
Basically, any article or entry that defines your site, or would be a likely entry point for a new visitor, could be considered a cornerstone piece. Think of these as your signature pieces, the touchstones everyone knows you and your site by. These are not only your site’s most visible works, making them the most likely to be plagiarized, but also the works that can do the most damage when taken, since your search engine traffic depends heavily upon them.
It makes good sense, in every regard, to fortify these weaknesses and take extra precautions to protect these works against plagiarism. Fortunately, doing so only takes a few simple steps that can be completed in a few moments’ time.
Since cornerstone content is static and not protected by more traditional RSS tools, the easiest way to protect it is to constantly search for it. However, searching for all of your non-blog post pieces regularly can quickly become very time-consuming.
For example, if you have twelve flagship items and it takes you five minutes to do a single search, that’s an hour of searching each time you want to do a scan. You may be able to perform a search once per week or once per month, but most would rather be doing something else with that time and many will simply forget to perform their next search.
The need for an automated search solution is clear. Fortunately, Google provides the perfect tool for the job: Google Alerts.
What Google Alerts does is provide email notifications for Google search terms. It is frequently used to perform vanity searches, letting users monitor all mentions of their name, and to keep on top of topics that matter to them (for example, I have several Google Alerts for plagiarism-related topics).
However, twisting Google Alerts into a powerful plagiarism monitor is a simple task, one that can be completed in five steps.
- Open the article you wish to protect and locate a unique phrase in it that is about 8-10 words long.
- Copy the phrase and paste it into the “Search Terms” box, being sure to add quotes around the sentence.
- Set the type to “Comprehensive” unless you only wish to search one kind of site.
- Set the “How Often” option to indicate how regularly you want to see the results. Most set it to once per day or “As it Happens”.
- Enter your email address and click “Create Alert”.
If you repeat that process for every article you want to protect, you will start receiving email alerts when Google detects a potential duplicate of one of your articles. Also, by creating multiple alerts for the same article, you can better guard against plagiarists that might only take a few paragraphs or reword portions of your work. This is especially important to do on longer works with multiple parts.
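Google Alerts has no public API, so the alerts themselves must be created by hand, but choosing the unique phrases can be scripted. Below is a minimal Python sketch of the idea; the function names and the evenly-spaced selection strategy are my own, not part of any Google tool. It pulls a few well-spread phrases from an article and wraps them in quotes, ready to paste into the “Search Terms” box:

```python
import re

def candidate_phrases(text, phrase_len=9, count=3):
    """Pick a few evenly spaced phrases of roughly phrase_len words.

    Spreading the phrases across the article means an alert can still
    fire when a plagiarist copies only part of the work.
    """
    words = re.findall(r"[A-Za-z0-9'\-]+", text)
    if len(words) <= phrase_len:
        return [" ".join(words)] if words else []
    max_start = len(words) - phrase_len
    # Start points spread from the beginning to the end of the article.
    starts = [round(i * max_start / max(count - 1, 1)) for i in range(count)]
    return [" ".join(words[s:s + phrase_len]) for s in starts]

def as_quoted_queries(phrases):
    """Quote each phrase so the search engine matches it exactly."""
    return ['"%s"' % phrase for phrase in phrases]
```

Each quoted string then becomes one alert; generating several per article, as suggested above, is what guards against partial copies and reworded excerpts.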
The idea, however, is that you do not have to search for your own material; Google will do it for you and only email you when something new is discovered. This not only saves you time and effort, but also enables you to search far more often than you could by hand.
That, in turn, makes it easier to put a stop to the plagiarism before the site gets established, and minimizes the damage such a site can do by not letting it accumulate traffic or build up its search engine ranking.
Stopping Cornerstone Plagiarism
Dealing with plagiarism of static content is no different than any other form of plagiarism. However, since the content is not being scraped from your RSS feed, you can’t simply disable access to the feed or block the plagiarist from scraping your content. The damage has already been done.
Instead, you have to actually focus on stopping the plagiarism and either getting the content removed or the site shut down. To that end, there are three options that can work for you.
- Send a Cease and Desist: If you can find contact information for the plagiarist, you can send him or her a cease and desist letter. These are generally done best over email and seem to work well in cases where a human is responsible for the misappropriation, not just a spam bot. However, these can backfire and create more problems than they solve if the plagiarist responds angrily.
- Send a DMCA or other Host Contact: Informing the host of the plagiarism, either through a DMCA notice with a U.S.-based host or regular abuse complaints for hosts in other countries, is usually the quickest path to resolution. This amounts to a decapitation attack, removing the infringing site by cutting it off at the source.
- Search Engine Removal: Finally, you can get the infringing content removed from the search engines by sending a DMCA notice to Google and the other major companies. Though this will not result in the content being removed from the Web, it will prevent it from competing with you in the search engines and severely limit the people who access it. Ideal for the rare situations where both the host and the plagiarist are uncooperative.
While many people prefer to use cease and desist letters for ethical reasons, generally speaking, the second option is the best. Hosts, for the most part, are more professional and cooperative than plagiarists and they resolve these situations more quickly with fewer problems than the more direct route.
Though sending a DMCA notice takes a little more time, especially when trying to locate the host, it saves many headaches in the long run.
All in all, there is nothing special or unusual about protecting cornerstone content versus other kinds of material on your site. The same methods apply; you just don’t have the technological tools available that you do when dealing with scraping of your blog posts.
While there are many reasons to be concerned about the lifting of cornerstone content, protecting it does not require any new tools or techniques, just renewed vigilance.
Fortunately, the vigilance that is required can be obtained not through repetitive work, but through an automated tool that is both easy and free to use.
Given the small amount of time and effort that goes into setting up these alerts, there is little reason not to create them. Also, given the potential problems such infringements create, even bloggers who don’t bother with spam blogs or RSS scraping may find it worthwhile to pursue those who plagiarize their cornerstone content.
In the end, it is an issue all bloggers need to be aware of and one that virtually any blogger can deal with easily if they are just willing to take a few moments to address it.
It is one of the few situations where an ounce of prevention truly is worth a pound of cure and the time to start the prevention is now.
Jonathan Bailey writes at Plagiarism Today, a site about plagiarism, content theft and copyright issues on the Web. Jonathan is not a lawyer and none of the information he provides should be taken as legal advice.
Excellent point, Jonathan. While some scrapers take what’s current, the worst are those who target the most popular, timeless posts. Those are the ones I deal with constantly.
In some cases, the rip off is naive. They just want to “bookmark” the content on the blog to help them remember. There is a fear that they won’t be able to find it in the future, so ripping it off “preserves” it.
Either way, you are so right. It is incredibly painful to the blogger whose core content is taken.
Which reminds me, have you dealt with the copyright issues of Furl and other “bookmarking” sites which allow preservation of the entire post content on their servers for individuals? Google and the Way Back Archive both get away with copyright violations because of the perception of “greater good”, but what about the others?
Thanks, as always, for helping us fight the good fight!
I read your article and I’m going to comment on it in just a minute (answering your question there). But thank you for your comment here.
However, I will say that I use Furl very heavily in my work. As you might imagine, I deal heavily with sites that are likely, or certain, to go down within a day or so, so Furl is an easy way for me to preserve evidence and keep a record. I also use it to preserve articles for the Copyright 2.0 Show before I move them over to Delicious.
That being said though, I keep my account private and primarily use the external link. The cached copy is only a backup.
I was wary of the same issues, but I realized that: A) they don’t cache sites that have the nocache meta tag or forbid it in their robots.txt; B) the cached copies are not available to search engines, as far as I have been able to find; C) the cached content is used non-commercially in most cases; and D) though they do less good than the Web Archive or Google Cache, they also do much less harm. I would wager that the “greater good” argument you mention still applies.
Though this stuff has to be taken case by case, I feel comfortable that Furl would pass a fair use test, especially given the other sites that have. But any service that wants to enter this field needs to tread carefully; there’s a dangerous line here that is mostly invisible.