
May 11, 2009

5 Alternatives to Truncated Feeds

Yesterday, Mark Ghosh at Weblog Tools Collection announced that, due to the rampant abuse of its RSS feed, the site would be moving to a truncated, or shortened, feed.

However, the decision did not last long. Less than three hours later, Ghosh reactivated the full feed after many of the site’s readers posted comments objecting to the change. He said he would instead experiment with an RSS footer and reopen the full feed.

Still, Ghosh’s frustration is more than understandable. With countless spam blogs scraping content without permission, the temptation to deny them access is strong. However, users overwhelmingly prefer full RSS feeds, and denying access to spammers is almost impossible without hindering access by legitimate aggregators.

The good news is that there are alternatives to shortening your RSS feed, practical ways to protect your content without cutting off your readers.


April 23, 2009

Year of Original Content: Make Money From Copyright Thieves

Ask First Copyright badge, by Lorelle VanFossen

Jonathan Bailey of Plagiarism Today and I have long been advocates of copyright protections and education, leading the way with projects such as “Ask First,” the “Year of Original Content,” “5 Content Theft Myths and Why They Are False,” and “The 6 Steps to Stop Content Theft.”

It seems that the rest of the world is waking up to the fact that stolen content is big business. Within the past two years, a variety of services have appeared that you can use to track where your online content has gone, report its misuse, and stop it. A new project called the Fair Syndication Consortium is underway that might put a dollar amount on that stolen content, paying you when others abuse your content.


December 8, 2008

When Does Social Media Copying Go Too Far?

Yesterday, on CenterNetworks, Allen Stern reported on a new social news site, Social|Median. The story, however, didn’t center around Social|Median’s features or capability, but instead on how, according to Stern, it “take(s) content from around the Web, put it onto Socialmedian and let you comment about it.”

Though I did not see any widespread copying of content on the links that I checked (example), it appears that the amount of content copied in the snippet is determined by the user posting the link, not the site.

Still, it is clear that there has to be a balancing act between social media and content creators. Though social news sites need to use some of the content and conversation from the blog in order to properly function, if they take too much, there is nothing left to encourage content creators to participate or permit their works to be used.

Finding this balance is tricky, and the problem has plagued social news sites since the beginning. Many sites have faced criticism for “scraping content” or “fragmenting the conversation,” and the concern remains top of mind for many Webmasters, especially when dealing with new social news sites that do not drive significant traffic.

So how should social news behave? The law is not very clear, but the standards on the Web seem to have spoken to at least some degree.


October 13, 2008

Comments, Copyright and Confusion

For many blogs, the bulk of the content comes not from the posts, but from the comments. It is not uncommon for a blog to have only a few hundred words of text per post, perhaps even fewer, and many thousands of words in comments.

For bloggers, this is a very good deal. Not only do comments promote a sense of community, add value to the site and encourage repeat visitors, they also add a great deal of search engine-friendly content that helps to grow the blog.

But the power of comments has left many bloggers worried about what rights they have over them. What happens if a spammer begins to scrape the comment feed? What if a commenter changes his or her mind and asks for the post to be removed? What happens if I move to another site or service?

Unfortunately, these are not simple questions, but they are important ones for bloggers to be aware of, especially since disputes over comments are happening more and more frequently.


September 29, 2008

The Future of Blog Spam

When Steven Carrol of The Next Web admitted to using a content generation service known as Datapresser, reportedly after seeing it used by an unnamed author at TechCrunch, he seemed to indicate that it was the future of mainstream blog publishing.

But while there is no doubt that at least some mainstream blogs use content creation tools to help meet their deadlines, content generation has found a much more comfortable home with another group: spammers.

Creating content from nothing has always been something of a holy grail for spammers. Traditionally, filling their junk blogs has required scraping content from article databases, other blogs (usually without permission) or other sources. This has made them easy for search engines to spot and has also drawn the ire of many bloggers who have had their content reused.

But technology is advancing and content generation is becoming increasingly practical. Many spammers have already moved to it and it seems likely that others will follow soon. This has some strong implications for both the future of spam and the Web itself.
