What Gets Copied: A 3-Week Study

Three weeks ago I signed up my blog for a beta service by Tynt called Tracer in an attempt to both test the service and get a better understanding of how people are using my content.

The service works by having site owners embed a line of JavaScript code into their site, which tracks when visitors select or copy text and images from the site itself. Tracer also adds an attribution line to every copy that includes a special link Tracer can track, thus letting you know when people visit your site from copied text.
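To make the attribution idea concrete, here is a minimal sketch of what appending a tracked link to copied text could look like. It is written in Python purely for illustration (the real service runs as JavaScript in the browser), and the function name, the "Read more:" wording and the `src` query parameter are all my own assumptions, not Tracer's actual format.

```python
from urllib.parse import urlencode


def add_attribution(copied_text: str, page_url: str, tracking_id: str) -> str:
    """Return the copied text with a hypothetical attribution line appended.

    Assumptions (not taken from Tracer): the link is tagged with a "src"
    query parameter, and page_url has no query string of its own.
    """
    tracked_url = f"{page_url}?{urlencode({'src': tracking_id})}"
    # A visit to tracked_url can later be counted as "generated traffic".
    return f"{copied_text}\n\nRead more: {tracked_url}"


example = add_attribution(
    "Some copied sentence.", "https://example.com/post", "abc123"
)
```

Anyone pasting the text can simply delete the extra line, which is worth keeping in mind when reading the traffic numbers later on.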

The information provided by Tracer is aggregate only: there is no information about what an individual user did with your content. Tracer also does nothing to prevent copying, so it is not a DRM solution. All Tracer does is analyze how users interact with your content and which pages are the most “active”.

To do that, Tracer follows four metrics: page views, selections (meaning when someone selects objects), copies (actually copying the work) and generated traffic (clicks on links generated by Tracer).

After over three weeks of running the service, I’ve gotten some pretty good data on my site and the results more than surprised me. Here is what I learned.

Static Pages Receive More Copying

Tracer cannot look at content in RSS feeds; it could only capture and analyze what visitors to my site were doing. Despite that, it became clear from reading the statistics that, once at my site, readers gravitated toward the static pages, especially those in my “Stop Internet Plagiarism” section.

Of the top five most copied URLs of my site, if you discount posts about Tracer where I encouraged readers to test it, four were static pages. Three were from the series mentioned above and one was the stock letter page, which makes sense as it is there solely for copying.

However, these pages tended to have a much lower level of traffic than other posts. This means that, though they were far from the most visited URLs, they were head and shoulders above dynamic content in the level of user interaction.

Traffic Does Not Equal Copying

On a related note, traffic was not a good indicator of copying. The two variables seemed to be almost completely unrelated.

For example, a page about the limitations of copyright had less than one fifth the traffic of a page about how to find plagiarism but it still had nearly three times the number of copies and was dead even in the number of selections. Likewise, one of the more popular posts on my site received only one copy per 300 visitors while another, lesser-known, post had more than one copy per three views.

The difference between the posts was astounding and it became clear that traffic is a very poor indicator for how likely a post is to be copied.
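The gap between the two posts mentioned above is easier to appreciate as a rate. The sketch below works through the arithmetic using only the rough figures quoted in this section (one copy per 300 visitors versus one copy per three views); the function and variable names are hypothetical and not part of Tracer.

```python
def copies_per_view(copies: int, page_views: int) -> float:
    """Copies divided by page views: a simple interaction rate."""
    if page_views <= 0:
        raise ValueError("page_views must be positive")
    return copies / page_views


# Rough figures from the text, not exact Tracer output.
popular_post = copies_per_view(copies=1, page_views=300)    # about 0.3%
lesser_known = copies_per_view(copies=1, page_views=3)      # about 33%

# How many times more likely a view of the lesser-known post was to
# end in a copy (roughly 100 times).
ratio = lesser_known / popular_post
```

Since the lesser-known post actually had more than one copy per three views, the real gap was at least this large.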

Most Copies are Lengthy

Tracer distinguishes between long copies, over seven words in length, and short ones. The copying on my site was overwhelmingly long: no URL I checked had more than 50% short copies, and on most the great majority of copies were long.

This is important as it distinguishes between people who are copying to ensure they don’t misspell a name or some other element from those wanting to copy actual content. Though seven words is not a magical threshold, most posts seemed to have an average copy length well over 100 words, in some cases over 200.

Though it is impossible to tell if this is plagiarism or copyright infringement, it does seem to point to copying beyond what is needed for a quote or a reference.
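For anyone who wants to apply the same long/short split to their own data, it can be sketched in a few lines. This assumes words are counted by splitting on whitespace, which is my guess rather than Tracer's documented method, and the function names are hypothetical.

```python
LONG_COPY_THRESHOLD = 7  # "long" means over seven words, per the text


def is_long_copy(text: str, threshold: int = LONG_COPY_THRESHOLD) -> bool:
    """True when the copied text has more words than the threshold."""
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split()) > threshold


def short_copy_share(copies: list[str]) -> float:
    """Fraction of copies that are short (0.0 for an empty list)."""
    if not copies:
        return 0.0
    short = sum(1 for text in copies if not is_long_copy(text))
    return short / len(copies)
```

A share below 0.5 for a URL would match what I saw on every page I checked.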

Little Traffic Gained

Despite having well over 100,000 words copied on my site over the three-week trial and Tracer adding the attribution link to every copy (Note: Tracer doesn’t provide a total of the number of copies made), I only received 20 visits from Tracer’s links. Though it is likely that many people stripped out Tracer’s links to create their own, it is still somewhat discouraging.

There is a good chance that many of these copies were for other uses, including ones offline, but it seems that only a very small percentage of cases actually kept the Tracer link intact, for whatever reason.

Caveats

Needless to say, this is far from a formal study as it only deals with my site. Plagiarism Today is likely not a great case study for many blogs as it favors lengthy posts (over 1000 words), few images and has a large amount of mission-critical static content. If your blog is radically different, so likely will be your copying.

The system itself has limitations too, namely that it does not track RSS subscribers (where the bulk of my readers get the content) and it can’t track automated copying of any variety. This solely deals with how humans who visit my Web presence interact with my content.

Despite these limitations though, there is still a great deal to be gleaned from this information and it hints at ways that we may want to change our content tracking focus.

Changing Strategies

As someone who is interested in monitoring and tracking copying, both good and bad, this study has made me consider several different strategy changes for monitoring my content.

More Focus on Static Content: Though RSS scraping/spamming is still the biggest obstacle many bloggers will face, static content is clearly getting a great deal of attention too. Using Google Alerts or performing Copyscape searches for static pages will be more important moving forward.
Focus on Interaction: In determining which posts to watch most closely, traffic will no longer be the deciding factor. This is where Tracer can really help, by providing an interaction metric that can be a better guide for this determination.
Ignoring Short Copies: Though I’ve long ignored short copies of text, including those up to about 100 words, due to issues with fair use, this reinforces that by showing that the average copy length is much higher. Clearly, if we’re only going to target the worst offenders, the goal is to aim higher than the average.

These are not drastic shifts in strategy by any stretch, but they can help track copies and study content use that otherwise might have slipped through the cracks.

Bottom Line

In the end, Tracer is an interesting service with some potentially valuable metrics. Though it has some serious limitations, it can help bloggers study how users interact with their content and, from that, both what content is most interesting and what content is most likely to be reused.

For that reason alone, Tracer has been fascinating for me to use these past three weeks. Though I’ve now changed to a version of the JavaScript that doesn’t forcibly add attribution (it seemed unkind to mess with users’ copy/paste functionality, especially for so little reward), I plan to keep it on the site a little while longer to see how else my content is used.

Obviously, if I find anything else out, I will report it here.

Jonathan Bailey

Jonathan Bailey writes at Plagiarism Today, a site about plagiarism, content theft and copyright issues on the Web. Jonathan is not a lawyer and none of the information he provides should be taken as legal advice.