Blogging vs. Human Content Aggregators

Siani says:

March 30, 2008 at 11:17 am

I think people who grab other bloggers’ content are lazy and disrespectful. As a blogger myself, I expect other bloggers to bring a personal touch to their work, rather than simply copying and pasting what someone else has slaved over. If I find a blog post I consider worthy of a mention, I’ll quote a sentence or two from it, and link to the full article, so that my readers can check it out for themselves. To scrape the whole content, even with attribution, is rude, and very poor blogging practice.

Could someone honestly call themselves an author, if they scraped the entire content of the latest Stephen King novel? Definitely not. Nor would they be allowed to get away with it. It’s high time the intellectual property rights of bloggers were placed on a par with those of other writers.
Michael says:

March 30, 2008 at 5:29 pm

I read your rants here and on Jack of All Blogs. At first, I wanted to chide you. Are you living in the past? Gripe all you want about casual scraping. People will do it, even if confronted publicly about their thievery. Many sued by RIAA probably found untraceable ways to steal music afterwards.

Then I may have grasped a solution. Google, Yahoo, MSN, and other major players can agree on a standard whereby authors submit content to search engines before making it public online. Submitted content is tagged for time and author, but not listed in SERPs until it is made public on the site and crawlable by search engine spiders. The author is notified immediately that content has been tagged. Then the author can display content publicly with greater assurance that scrapers will not be indexed favorably by search engines.

Time and author tags provide search engines in-house tracking that could be used to determine authenticity after content is indexed. Obviously, if you submit your content before making it public for indexing, then you probably are the author. Page rank and relevancy could be pegged to authenticity. Duplicate (scraped) content could be spotted immediately and ignored or penalized without jeopardizing the original’s SERPs.
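As a rough illustration of the tagging idea above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `TagRegistry`, `AuthenticTag`, `submit`, and `classify` are hypothetical, no search engine is known to offer such a service, and matching is done on an exact hash of whitespace- and case-normalized text, which a lightly edited copy would evade.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and case (an assumed normalization)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuthenticTag:
    author: str
    url: str
    submitted_at: datetime


class TagRegistry:
    """Hypothetical in-house store of tags, keyed by content fingerprint."""

    def __init__(self):
        self._tags = {}

    def submit(self, author, url, text, now=None):
        """Tag content before it goes public; the first submission wins."""
        key = fingerprint(text)
        if key not in self._tags:
            stamp = now or datetime.now(timezone.utc)
            self._tags[key] = AuthenticTag(author, url, stamp)
        return self._tags[key]

    def classify(self, crawled_url, text):
        """Label a crawled page as 'original', 'duplicate', or 'untagged'."""
        tag = self._tags.get(fingerprint(text))
        if tag is None:
            return "untagged"
        return "original" if tag.url == crawled_url else "duplicate"


registry = TagRegistry()
registry.submit("michael", "https://example.com/poem", "A new poem.")
print(registry.classify("https://example.com/poem", "A new poem."))
print(registry.classify("https://scraper.example/copy", "A new poem."))
```

A real system would presumably need fuzzy matching to catch near-duplicates; the exact-hash comparison here is only the simplest stand-in.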

Those who submit content must have Google, Yahoo, MSN accounts, etc. (This would make it an easier sell to search engines.) All older content is grandfathered. In other words, content already indexed cannot be resubmitted for authentic tags. So I cannot submit my site’s present poems for tags, only new poems. Someone could steal another’s content, such as one of my handwritten poems, before the author submits it to search engines for tagging. In that case, the author could file a DMCA takedown notice and go to court if necessary, just like we have to do now.

How do the author and the search engine reconcile the authentic tag with the content location (URL) before the content is made public? The author must tell the search engine at what URL the content will appear, although the content will not appear at that URL until a future date.

What if the author wants to move content from one URL to another? The author could log into his or her account and change the authentic tag’s URL information. This probably would be the most difficult problem this idea faces. Web sites go dark, and the author uploads old content elsewhere at a later date, etc.

What if the author loses his or her login information and cannot recover it? In the short term, I am not sure. In the long term, defunct accounts could have submitted content untagged after, say, a year has passed with no account activity. This would not negate copyright protection, but search engines need to clean house from time to time.

I think authentic tagging would work much better than the ragtag system we have now. Will bloggers still copy and paste? Will scrapers still scrape? Yes, but your original content would be ranked more favorably than theirs if tagged. Their “content” might be ignored altogether. What do you think?
Yearblook says:

March 30, 2008 at 6:05 pm

This is an interesting post. You should submit it at yearblook.com/submit.php. Yearblook is a competition to find each day’s best blog posts.
At the end of the year, the 365 best posts (1 from each day) will be published in a book (a real, printed book, you will find it on Amazon).

If you’re not ready to post your articles yet, browse around and see if there is anything you find interesting.

Also, since we’re just starting out, we would love any feedback you are willing to share.
Darnell Clayton says:

March 30, 2008 at 6:29 pm

It all depends (yeah, I know it sounds hippish and relativism gone bad).

Example: I have a space blog called Colony Worlds that I often find partially reposted (the first four paragraphs) on the Mars Society.

While some may get angry at the concept of them using my content, I really do not mind as it helps stir up some more ideas and (as I have found) I have received more subscribers because of it.

On the other hand, on other blogs I have started, I allow users to post the article IN FULL provided they link back to the original article, as I found that dealing with content via “cease and desist” takes too much time.

I decided to take TechCrunch’s approach and just “keep blogging” and let the divine reward me with more traffic (as I am producing the original content).

But to each their own I guess… ;-)
Sally says:

March 30, 2008 at 6:43 pm

It’s theft. And it’s lazy. I like your analogy to the Daily Post reprinting the New York Times articles – AND NOT PAYING. It’s one thing for an article from Reuters to be published in the New York Times, since the NYT paid to reprint it – and they still have to acknowledge it was sourced from Reuters. So why should a content aggregator lift the blog entry you worked hard on, place it on their blog and not pay a penny for it?

Content pilfering and copying, even if it links back to you, is not blogging. It’s not what blogs are about – blogs are logs of our interpretations of something or of some information. Are they going to say that copying someone else’s work in full and placing it on their own website constitutes a log of their surfing??? I would respect them more if they put up a paragraph that said, “Hey, I read this article and I think it’s interesting because… so click here to read it.”

And quite frankly, I’ve seen some of my blog entries appear on other ‘blogs’ which I don’t think the owner has even seen – they are clearly there because I had certain keywords in my entry which were pertinent to the so-called subject of their ‘blog’. I don’t call posting my entire blog entry minus the last paragraph a satisfactory link back to my website – it is usually the first clue a reader has that this blog entry wasn’t even written by the owner of the pseudo-blog they were on.

It’s theft, and these content aggregators should not call themselves bloggers or steal our content and pretend to pass most of it off as their own.
Sally says:

March 30, 2008 at 7:38 pm

Actually, further to my rant above, I think I despise most ‘human content aggregators’ because they practice deception – not revealing that it is not their post until the end of the article where, if you are lucky, there is some text which, if you know how to read it, acknowledges your website, or a link saying “more…” that will bounce someone to your website.

If the content aggregator acknowledged their source up front, e.g. Travelblips writes… (with the title of your blog linking back to your blog) and then reproduced a paragraph or two with a link at the end to the full article on your blog, then I’d only be grumbling about being an unpaid contributor to their ‘blog.’ And I’d let it go as it was an oblique bit of free web traffic.

But they go to extraordinary lengths to hide that they did not write that article, which is deceptive and sometimes, when there is no discernible acknowledgement, blatant copyright theft.
Gloria Karlos says:

March 30, 2008 at 10:26 pm

I think they should be charged with plagiarism. They claim the intellectual property of other bloggers as their own, which is very unethical. They should take time out to review R.A. 8792 (Philippine E-Commerce Law)…
Andrew G.R. says:

March 31, 2008 at 7:40 am

Michael: Your concept of an “authentic tag” is quite interesting. How do we get started? ;-)
Jeremy Steele says:

March 31, 2008 at 9:45 am

I think the difference between someone like a splogger and a site like AllTop has to do with the intention. Is it malicious or not? Is the entire goal to make money or is it to provide an actual service?
David LaFerney says:

March 31, 2008 at 10:29 am

Although scraping is offensive on a certain level, as long as they link back to my original post, I don’t see that it really hurts anything.

I suspect that in most cases Google and Co. can tell from the one-way link between duplicate content which is the original, and most of these kinds of sites have little chance of gaining any real traction with either Readers or Search. And if they do, then that just inflates the value of the link.

So, Scrape Me – as long as you link back!
Michael says:

March 31, 2008 at 4:50 pm

Andrew: I don’t think “authentic tagging” is beyond the capability of current programming. Jonathan Bailey just wrote “How to Immunize Your Site Against Scraping” and made me wonder whether my idea is worth pursuing. But having search engines tag content before it is made public would provide the best method to authenticate authorship for the purposes of arranging SERPs, in my opinion.

The chief problem with the techniques Bailey explained is that your content is potentially exposed to content thieves before any legit service picks it up…before any Technorati or Feedburner or search engine knows that your content is there.

Consider Ryan D’s dilemma in Bailey’s comments. A thief hit his server every second for hours. No search engine or legit aggregator does that. If the thief snatches his content and uploads it and the thief’s services pick up Ryan D’s stolen content before Ryan D’s services pick up his original, how are search engines to know the difference? (This is probably why the thief hit the server every second. Chances were very good the thief could nab the content and look like the original author.) A human editor might be able to deduce that you wrote it, but I doubt there is any algorithm at present that can.

How do we get started? That should be simple in theory. If enough of us ask search engines to develop this, then I think they would. It would be in their best interests. Their programmers, at least, might find the concept intriguing. If not, then a third-party system could be developed to catalog content before it is made public. Then we ask search engines to recognize the third-party standard or develop one of their own.

There could be a “permissions tag” as well. There may be duplicate content on another site that you want there or do not mind being there. But how do you convince an indifferent algorithm that you gave the other site permission so that the search engine does not strongly penalize that site? Add a permissions tag that only you, the author, can give out, a random alphanumeric string that could be displayed in the code, perhaps as a property of a blockquote tag, that is unique and securely stored and only recognized by the search engine. Each time you give a tag, the tag is a different string.
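A permissions tag of this kind could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `PermissionStore`, `grant`, `is_permitted`, and the `data-permission` attribute name are all hypothetical, and the in-memory dictionary stands in for whatever secure storage a search engine would actually use.

```python
import html
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


class PermissionStore:
    """Hypothetical search-engine-side record of permissions granted by authors."""

    def __init__(self):
        self._grants = {}  # token -> (author, permitted_site)

    def grant(self, author, permitted_site, length=32):
        """Issue a fresh random alphanumeric token for one author/site grant."""
        token = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._grants[token] = (author, permitted_site)
        return token

    def is_permitted(self, token, site, author):
        """True only if this token was issued by this author for this site."""
        return self._grants.get(token) == (author, site)


def blockquote(text, token):
    """Render the quoted text with the token in an assumed data-permission attribute."""
    return f'<blockquote data-permission="{token}">{html.escape(text)}</blockquote>'


store = PermissionStore()
token = store.grant("michael", "partner.example")
print(blockquote("A new poem.", token))
print(store.is_permitted(token, "partner.example", "michael"))
print(store.is_permitted(token, "scraper.example", "michael"))
```

Because the token is tied to one site, a scraper who copies the markup, token included, would still fail the check on its own domain.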
TourPro says:

April 1, 2008 at 7:23 am

I think we should identify the spectrum of behavior we are talking about.

On one end is total plagiarism – republishing, no attribution. The other end might be the automated aggregators like PopUrls and Google News.

Somewhere near the latter end are the human-driven content aggregators – Boing Boing, Instapundit – and popularity sites like del.icio.us and Digg.

There’s also a similar spectrum when you look at the goals of the content producers. Some actually are able to charge for syndicating their content – newswires for example.

It’s like the Wild West out there and the environment is constantly changing. But as in most human endeavors, I think basic social etiquette and good behavior always rise to the top. Call me an optimist.
Tomas says:

April 1, 2008 at 7:51 am

Hola,

I want to throw in my dos centavos as I have a different perspective. I have personally aggregated over 36,000 posts and that is a number I am proud of. I spend hours on my site/blog because of the scope of what I cover. It is my job/passion. I don’t claim to be an expert in my subject matter, although I am definitely in the ‘know.’ Can’t help it.

In fact, I would say I am more of an expert in human content aggregation than in my niche subject, mainly because it is broad. (You try aggregating 200 to 400 articles a week.) However, does that negate me as a blogger? I don’t think so. There is a tendency to think there is a pedestal; there isn’t. We are all exploring this new medium using various tools and methods – hopefully producing something new, even wonderful.

However, at the end of the day, I say who cares? Even though I personally have debated this with myself, I eventually landed on the question, “Why do I have to label myself?” I provide a useful service that many love. I am happy.

I am a human content aggregator (and proud), a blogger (and proud) and/or something else (and proud). Reminds me of whether or not I am a Hispanic, Latino or something else – either way I am proud. I am me and I am doing something good.

What I do – collecting, categorizing (by both subject and geography), prioritizing and presenting, all using the awesome WordPress, while sometimes sprinkling in witty (at least I think so) comments/snippets or clarifying/augmenting titles – makes me all of these.

Let me clarify that I only use 1 to 3 paragraphs, with a link to the original plus a link to a translation of the article at Goog and Babelfish.

One last thing: after 2.5 years, I would say that between 5 and 10% of my ‘source’ articles vanish – link rot. If not for my site, there would be no record of them. Am I an archiver as well, or all of these labels?

Once again I don’t care about labels. I am providing a useful service to both this nation and a large minority population that needs this service (really need, imho) and I am doing it in a respectful way.

I also have been scraped and copied, but I dealt with it. I hated it/them even though I saw the irony.

My name is Tomas. I run HispanicTips.com. National Hispanic News: For, From & About Hispanics & Latinos – Presenting, Celebrating & Documenting This As Blogante News So You May Easily Stay Informed. – A Living, Growing Archive/Database Categorized by Subject, State & Metro – With 36,576 News Items