Tweetbacks, Copyright and Scraping
Yesterday, a friend of mine on Twitter sent me a DM to alert me to what she said "looks like a Twitter scraping tool." I clicked the link expecting to find a social aggregator gone awry or a spam blog. Instead, the link pointed to Joost de Valk's new Tweetback plugin.
The plugin, as well as Dan Zarrella's plugin by the same name, searches Twitter for tweets that link back to posts on the blog and displays those tweets on the site under their respective entries, much like a trackback, but with Twitter (hence the name).
These plugins, by their very nature, copy tweets and display them on the user's Web site, all without the explicit permission of the author. Where trackbacks are sent from the linking site and comments are left intentionally by the visitor, these plugins actively go out in search of these "tweetbacks" (including by parsing URL shortening services), even though the tweet's author has taken no steps to make them appear on the site.
This, in turn, raises serious questions about copyright, scraping and more that at least deserve a look. Is it legal to copy and publish tweets from others without permission, simply because they link back to your site? The answers are not as simple as one might initially think.
Previous Coverage
Back in May I posted a somewhat controversial article on this site about Twitter and Copyright. The article was broader in scope but concluded that it was difficult, though not impossible, for a tweet to meet the criteria for copyrightability.
However, with tweetbacks, which link to original articles, there is even less potential for copyrightability. Not only does the URL reduce the potential character count, but typically such tweets are just a short, fact-based description of the post itself.
But this isn't to say that such tweets could never be copyrightable. Short works, including haiku, routinely enjoy copyright protection and are original enough for that protection to hold up in court.
The problem with copyright law is that it only takes one "perfect storm" to create trouble. If a tweet were copyrightable, and its author did not wish to see it appear elsewhere, discovered the use and was in a position to take action on it (either by having registered the work with the USCO or by being in a country with no such requirement), there could be issues. Though the odds of such a situation are incredibly slim, when you spread those odds across millions of Twitter users and hundreds of times as many tweets, it becomes at least an outside chance.
This isn’t to say that this should be a huge worry for those using these plugins, but that it is a possibility. However, there are still more factors to consider.
A More Likely Problem
Rather than someone using copyright to file a successful lawsuit or a legitimate takedown notice, I would be more worried about someone using it as leverage to harass or annoy a blogger. Consider the following:
- A blogger creates a controversial post.
- A Twitter user links to the post in their account, likely to complain about it.
- The Tweetback plugin grabs the tweet and reposts it.
- The Twitter user either doesn't understand or ignores the copyright issues and files a takedown notice to get it removed.
- The host complies and either disables the entire post or has the user remove the plugin for the time being.
Though this situation is potentially very scary, it is one faced every time a blogger uses someone else's content, even if it is just to quote it and to do so within the bounds of what is likely fair use. Tweetbacks are hardly the only time this situation arises and, in my experience, certainly not the most dangerous.
People who get upset over use of their content, even likely legitimate use, typically do so over deliberate uses rather than automated ones such as tweetbacks. Quoting, citing and screenshots have, historically, been much riskier behaviors.
Is It Scraping?
Another, somewhat broader, question is whether or not these plugins are guilty of "scraping" Twitter. The answer is most likely no.
Twitter provides a robust API and terms of service governing its various elements.
If these plugins follow the API and its guidelines, and it seems there is no reason they wouldn’t, then an accusation of scraping would be hard to sustain. It would be like following a Creative Commons license and being accused of copyright infringement.
That being said, the two plugins go about pulling tweetbacks in different ways, and the difference is worth noting. Dan Zarrella's, for example, uses JavaScript and pulls down the tweets on almost every page load. Joost de Valk's, on the other hand, copies the tweets to a database on the blog's server. Though De Valk's plugin likely displays tweets faster, Zarrella's raises fewer copyright issues.
Since Zarrella's implementation is JavaScript-based, if someone wished to remove their tweet from a site, the easiest way would be to just delete the tweet from their own stream. (Note: I spoke with De Valk and he informed me that removing a tweet from your stream does NOT remove it from search. Not only does this create privacy issues in general, it negates this advantage for now.) With De Valk's plugin, however, depending on how it handles deleted tweets, the tweet would likely remain up.
Furthermore, the database implementation creates a more permanent copy of the tweet, whereas the JavaScript one creates only temporary copies. Zarrella's implementation more closely resembles browser caching and ISP caching, creating only the short-lived copies needed to transmit the data.
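To make the difference between the two storage models concrete, here is a minimal Python sketch. It is not code from either plugin: `SEARCH_INDEX`, `search_tweets`, `render_live` and `StoredTweetbacks` are hypothetical names, and an in-memory dictionary stands in for Twitter search. The "live" approach keeps nothing between page loads, while the "stored" approach keeps a copy in the blog's own database that outlasts the tweet dropping out of search results. (Per De Valk's note above, deleting a tweet did not necessarily remove it from search at the time, so the simulated removal is an assumption.)

```python
import sqlite3

# Hypothetical stand-in for Twitter search: post URL -> (author, text) pairs.
SEARCH_INDEX = {
    "http://example.com/post": [
        ("alice", "Interesting take on tweetbacks http://example.com/post"),
        ("bob", "Not sure I agree: http://example.com/post"),
    ]
}


def search_tweets(post_url):
    """Return whatever the hypothetical search currently reports."""
    return list(SEARCH_INDEX.get(post_url, []))


def render_live(post_url):
    """JavaScript-style: fetch on every page load and keep no copy."""
    return search_tweets(post_url)


class StoredTweetbacks:
    """Database-style: copy tweets into the blog's own storage."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE tweetbacks (post_url TEXT, author TEXT, text TEXT, "
            "UNIQUE (post_url, author, text))"
        )

    def sync(self, post_url):
        # Adds newly found tweets; never deletes rows that vanished from search.
        self.db.executemany(
            "INSERT OR IGNORE INTO tweetbacks VALUES (?, ?, ?)",
            [(post_url, author, text) for author, text in search_tweets(post_url)],
        )
        self.db.commit()

    def render(self, post_url):
        cur = self.db.execute(
            "SELECT author, text FROM tweetbacks WHERE post_url = ? ORDER BY rowid",
            (post_url,),
        )
        return cur.fetchall()


url = "http://example.com/post"
store = StoredTweetbacks()
store.sync(url)

# Simulate alice's tweet dropping out of search results, then sync again.
SEARCH_INDEX[url] = [t for t in SEARCH_INDEX[url] if t[0] != "alice"]
store.sync(url)

print(len(render_live(url)))   # 1: the live view reflects the change
print(len(store.render(url)))  # 2: the stored copy is still there
```

The design trade-off mirrors the one described above: the stored copy is faster to display and survives outages, but it is also the more permanent copy, and removing it depends on how the blog owner's software handles cleanup.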
Is this to say that De Valk’s plugin is an infringement or that people who use it need to fear copyright problems? Not necessarily. Just that if Tweetbacks were to become a copyright issue, De Valk’s plugin would raise more issues.
Bottom Line
These plugins are a novel use of Twitter and, with such newness, comes a new set of potential legal questions. Though the likelihood of any serious copyright trouble from using tweetbacks is slim to none, as with anything used by a large number of people, the popularity of a service increases the likelihood that someone will find themselves in a dispute.
Though I would be hard pressed to call Tweetbacks a form of “scraping”, considering that Twitter intentionally licenses such access to their data via an API and a TOS, it is easy to see how some users, not expecting or foreseeing that their Tweets could start appearing on other sites, could be upset and, since they are the ones that hold copyright interest in the work, they are the ones with the right to file any copyright complaint.
However, the likelihood of such copyright complaints rising to anything beyond saber rattling is slim to none. Simply put, most tweets likely aren't copyrightable, and those that include links even less so. The bigger danger is that the copyright issues could be used as an excuse to harass a site with the plugin installed.
Still, it might be worthwhile to warn visitors that, if they tweet your post, it will appear as a Tweetback. Displaying tweetbacks should be warning enough, but offering “Tweet This” tools and an explicit statement can’t hurt either.
However, even without such warnings, the risks and issues that exist with tweetbacks also exist any time one uses content from another source, no matter how legitimate the use.
In the end, though there are some logical comparisons between tweetbacks and traditional RSS scraping, the likely lack of copyrightable material and the blank-check permission from Twitter separate tweetbacks from it.
Furthermore, unlike RSS scraping, most Twitter users are likely going to be happy to have their tweets displayed on other sites. With so little time going into each Tweet and almost no SEO benefit, there is little lost in having one’s Twitter feed scraped. However, given the social nature of Twitter, there may be a lot to be gained.
Still, it only takes one person to create a problem and, until that person appears, it remains to be seen what the exact implications are.
Jonathan Bailey writes at Plagiarism Today, a site about plagiarism, content theft and copyright issues on the Web. Jonathan is not a lawyer and none of the information he provides should be taken as legal advice.
You sort of gloss over the API, but I think that’s a very key point. Because Twitter provides an API, authors must expect that tweets may be used or re-published in various ways. That doesn’t make -all- usage OK, but I think it opens the door enough that re-publishing a tweet on the page it links to would be OK.
(It also seems pretty straightforward that fair use would be a successful defense, but I’d hope that it never gets so far in courts as to need a defense.)
And of course, I’m not a lawyer, and you’re not a lawyer…not that most lawyers do an adequate job of understanding trends in online intellectual property before they write about them ;)
Terrific article. It brings up a question about using someone's tweets directly in a blog post instead of just attaching them to the article/entry via the tweetback process. The reason I ask is that last week I did two different articles using the "opinion" of Twitter users to make a point in the blog post. Obviously they did not make their comments with the intent of being included in a blog post, as opposed to retweeting something they liked and wanted to share.
I did not ask specific permission to use those tweets – would this be considered a copyright issue under the Twitter Terms of Use or are the comments we post on Twitter considered to be in the public domain?
Thanks for the chance to comment on this.
You know how I feel about scraping (for those who don't, I HATE it and wish all scrapers would die a horrible, painful death), yet I didn't think twice about using Tweetbacks in my blog. It seemed natural to be able to include tweets about a post, just as it's automatic and natural to include trackbacks/pingbacks for a post. I didn't consider it "scraping" at all, let alone in the same sense as bloggers building entire Web sites by scraping and using the original content from another blog.
To me, a Tweetback is a sign of a post’s popularity. If people are tweeting its link, they like it. The more tweetbacks, the better.
It’s also a good way to encourage Twitter users to spread the word about blog posts they like. When their icon and Twitter name appears in a Tweetback, blog post readers learn about them. This could help them get new followers.
In defense of the plugin (not the Javascript solution), there IS a way to filter out certain Twitter IDs. I think that can be used to remove tweetbacks by people who don’t want their tweets “scraped” for this purpose. Once filtered, a “clean up” feature removes the tweets from the database. So it really is a better solution for those worried about copyright infringement — provided that an offended tweet author takes the time to ask to have his tweets removed.
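A minimal sketch of the filter-and-clean-up idea described in this comment, assuming a simple SQLite table. The table layout, `BLOCKED_AUTHORS`, `store_new` and `clean_up` are hypothetical illustrations, not the plugin's actual code: tweets from blocked Twitter IDs are skipped when new tweetbacks are imported, and a clean-up pass deletes any rows already stored for those IDs.

```python
import sqlite3

# Hypothetical table of stored tweetbacks (not the plugin's real schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tweetbacks (post_url TEXT, author TEXT, text TEXT)")
db.executemany(
    "INSERT INTO tweetbacks VALUES (?, ?, ?)",
    [
        ("http://example.com/post", "alice", "Great post http://example.com/post"),
        ("http://example.com/post", "carol", "Meh http://example.com/post"),
    ],
)

# Hypothetical blocklist of Twitter IDs, stored in lowercase.
BLOCKED_AUTHORS = {"carol"}


def is_allowed(author):
    """True unless the author's Twitter ID is on the blocklist."""
    return author.lower() not in BLOCKED_AUTHORS


def store_new(post_url, author, text):
    """Import-time filter: tweets from blocked IDs are never copied in."""
    if is_allowed(author):
        db.execute(
            "INSERT INTO tweetbacks VALUES (?, ?, ?)", (post_url, author, text)
        )
        db.commit()


def clean_up():
    """Delete already-stored tweets from blocked IDs; return rows removed."""
    if not BLOCKED_AUTHORS:
        return 0
    placeholders = ",".join("?" for _ in BLOCKED_AUTHORS)
    cur = db.execute(
        f"DELETE FROM tweetbacks WHERE lower(author) IN ({placeholders})",
        list(BLOCKED_AUTHORS),
    )
    db.commit()
    return cur.rowcount


removed = clean_up()
store_new("http://example.com/post", "Carol", "Another one")
remaining = [row[0] for row in db.execute("SELECT author FROM tweetbacks")]
print(removed, remaining)  # 1 ['alice']
```

As the comment notes, this only helps once the blog owner knows which IDs to block, so it still depends on the tweet's author asking.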
I use the JavaScript solution now, primarily because I like the way it looks and I’m having a heck of a time integrating the plugin in my theme.
Anyway, I hope this does not become a serious issue. Frankly, I see no difference between it and pingbacks/trackbacks — and don’t understand why anyone else would.
Come on, let's be serious here. Anyone who's throwing a fuss over 140 characters of text (fewer if you discount the link in it) is bonkers.
Oh, and wouldn’t that also make retweeting susceptible? It’s basically the same thing.
Dave: The API point is an interesting one and, sadly, the post was already too long. The problem with it is that, in order to claim an implied license for this kind of use, one has to show that the use was easily foreseeable, as with Google indexing your site.
Certainly a lawyer defending the use would argue exactly as you did, but a counter argument would be that there was no way to foresee it as such a technology had never existed. In this case, the newness of the product could hurt it.
An analogy, to play devil's advocate, would be RSS scraping. It is foreseeable that a feed would be used in RSS readers, but not on other sites. An RSS feed is, in a way, an API. And, absent a clear license, many scrapers argue that scraping should be foreseeable and that an implied license grants them that right.
An argument that Twitter users granted such an implied license for tweetbacks could work, but it could also be a dangerous way to win with consequences elsewhere on the Web.
Maria: Believe me, I know well how much you hate scraping. But, like you, I did install the Tweetbacks JavaScript (though I pulled it due to speed issues, not copyright ones, may try one of the newer ones later though).
The difference between pingbacks/trackbacks and tweetbacks is that pingbacks send out their notification to the original site. Users, when blogging, select whether they want to send pingbacks out. With tweetbacks, the process is passive. The blog goes out looking for the tweetbacks without the Twitter user doing anything.
In short, pingbacks go from the commenter to the original post, with tweetbacks the original post finds the commenter. Does not change the end result, but could have an impact when looking at it from the law.
Overall though, I don’t think there are any serious issues with tweetbacks, something I tried to get across. The problem is that it only takes one perfect storm and we have to play hypotheticals to prepare for that…
Jonathan said:
This is very true and a good point. But between the use of tweets on the "public" timeline and API tools, I think a Twitter user would have a very hard time convincing any legal body that he had the ability or right to control where his 140-character tweets appear.
I’m not going to worry about it — at least not yet. We’ll see how this pans out.
Agreed on performance for that JavaScript solution. One of these days, I’m going to decipher my Comments template file and get Tweetbacks to appear properly with the plugin.
Maria: I agree that someone would have a hard time convincing a court; it would be an uphill battle, but there are several aspects of the technology that do leave it more open. Besides, most of the concerns one would have about a copyright holder going after them these days deal less with courts and more with DMCA notices.
I'm probably going to install one of the database versions (Zarrella released one this morning as I was writing this, hellacious timing), so I'll look into that later this week :)
I would argue that as you are posting your tweets to Twitter, they are the publisher and have distribution rights. As they have a public API allowing the postings to be pulled to other sites, they are explicitly allowing the tweets to be posted on other sites. Therefore, anyone complaining about “scraping” doesn’t have a leg to stand on.
I’ve had my blog posts scraped before (it’s very annoying!) but I hadn’t even considered this. I find all this information fascinating; I gave it a stumble.
Nice article, Jonathan! As the author of one of the two plugins, I'm curious how the discussion on this will develop. There are a few things to keep in mind, I think. First and foremost, neither of us scrapes; we both use the Twitter search API. But you're actually a bit wrong about how that works, because you say:
“the easiest way would be to just delete the tweet themselves from their own stream.”
That’s actually not true. Deleting a tweet from your stream does NOT delete it from twitter search. Though I would agree that that behavior is odd, it is the current fact. Blog owners can delete tweetbacks with my plugin, and if they do, they’ll stay deleted.
As for the speed comments: Dan has his work cut out for him in that area, and I've got mine in other areas. F/i, there are some issues with short URL providers giving back wrong data, which results in tweetbacks that look like spam :)
Redwall: That would be an interesting argument. Without looking at the Twitter TOS, it’s hard to say but I find it hard to claim that Twitter would have the right to control whether the work appears on sites outside of their control. I would doubt that Twitter would require such strict rights from users.
Joost: Thank you for the info, post is updated!
I post on Twitter occasionally, and I find that within two weeks, if I search the Web for my posts' text, I'll find them on scraper sites.
Search engines are bad with scrapers. There are lots of sites that scrape mailing lists, and they end up with a PageRank of 7 or 8 and outrank the real archives, which Google won't even index.
I agree that Tweetbacks are unlikely to get the blog owner in trouble with the vast majority of Twitter participants, especially as the presence of the URL is going to reduce the available space for original content.
However, since many blogs using the plugin are also displaying ads, is there a ‘for profit’ condition that lessens the validity of the fair use argument in favor of using the tweets? This was what ended up torpedoing the Choicetweets t-shirt project — whether or not an individual tweet is copyrightable, selling content for profit without making an effort to clear it with the author (even in cases of clear infringement/copyright, such as a tweeted haiku) is on the wrong side of the issue.
Since I have a decent number of followers on Twitter, most of the feedback I get is through Twitter, … you can place anywhere. Here’s how to install it in WordPress: – Download the WordPress plugin here … about. If you have a decent following on Twitter, chances are your Tweet about a post of yours receives
http://cloudappers.com/2009/01/commentweet-beta-twitter-conversation-on-blogs/
I think we’re all forgetting one primary consideration (this is just my opinion). When I have a discussion in public, everyone hears (or reads) it. However this is more a privacy issue, but it does touch on copyright issues in the sense that it could be argued that content (contributed through twitter) is contributed to the public domain. As such, it appears that twitter (http://twitter.com/terms) encourages users to contribute to the public domain and also indicates that we should consider progressive licensing terms (if we do not want to contribute to the public domain). To me at least, it seems that any text posted using twitter falls within the public domain, unless I ensure other licensing terms or tools are in effect. Common sense however MUST prevail – If I wish to retain copyright of any text I transmit via twitter, then I should ensure I retain copyright BEFORE transmitting and ensure copyright ownership.
Finally, twitter very clearly states in their terms: “The Twitter service makes it possible to post images and text hosted on Twitter to outside websites. This use is accepted (and even encouraged!). However, pages on other websites which display data hosted on Twitter.com must provide a link back to Twitter.”
And here is the crux… Twitter users accepted the above term (and others) during the registration process. As such all users are aware of this (as users are required to review and accept the terms during registration).