Blogging vs. Human Content Aggregators
I recently blogged about my disdain for so-called ‘bloggers’ who rip and run with your content. You know the drill. You stay up late researching and writing a post, only to find it re-posted (at varying lengths) on other people’s blogs. Sure, they’re kind enough to attribute the story to you. But let’s be honest: how many people are going to click through to your Website to read other articles?
We all like to think that our writing is strong enough to lure people in to read more and win them over as subscribers. But the majority of Web surfers take a glance and move on.
This tactic of copying and pasting within a niche does NOT make you an authority on a subject. In fact, I’m not even sure it should qualify as blogging.
Here in New York, how would the New York Times feel if the Daily News started to publish its stories – without permission – in their entirety? Even with proper attribution, it would be illegal and would never fly.
I’m sure there are plenty of you out there who take this route and do consider yourselves bloggers. I’m open-minded and willing to consider both sides of the argument. So let’s get the debate started:
If you grab content from multiple blogs, and do not offer your own commentary, should you be considered a blogger?
On the flip side, I will say that I thoroughly enjoy both the content and traffic generation offered by “blog catalog” Websites like Alltop.
However, I say these folks trying to pass themselves off as bloggers are nothing more than human content aggregators. What’s your take?
Andrew G.R. is the owner of Jobacle, a career advice and employment news blog and podcast designed to make work better. Follow him on Twitter.
I think people who grab other bloggers’ content are lazy and disrespectful. As a blogger myself, I expect other bloggers to bring a personal touch to their work, rather than simply copying and pasting what someone else has slaved over. If I find a blog post I consider worthy of a mention, I’ll quote a sentence or two from it, and link to the full article, so that my readers can check it out for themselves. To scrape the whole content, even with attribution, is rude, and very poor blogging practice.
Could someone honestly call themselves an author, if they scraped the entire content of the latest Stephen King novel? Definitely not. Nor would they be allowed to get away with it. It’s high time the intellectual property rights of bloggers were placed on a par with those of other writers.
I read your rants here and on Jack of All Blogs. At first, I wanted to chide you. Are you living in the past? Gripe all you want about casual scraping; people will do it, even if confronted publicly about their thievery. Many of those sued by the RIAA probably found untraceable ways to steal music afterward.
Then I may have hit on a solution. Google, Yahoo, MSN, and other major players could agree on a standard whereby authors submit content to search engines before making it public online. Submitted content is tagged for time and author, but not listed in SERPs until it is made public on site and crawlable by search engine spiders. The author is notified immediately that the content has been tagged. The author can then display the content publicly with greater assurance that scrapers will not be indexed favorably by search engines.
Time and author tags provide search engines in-house tracking that could be used to determine authenticity after content is indexed. Obviously, if you submit your content before making it public for indexing, then you probably are the author. Page rank and relevancy could be pegged to authenticity. Duplicate (scraped) content could be spotted immediately and ignored or penalized without jeopardizing the original’s SERPs.
Those who submit content must have Google, Yahoo, MSN accounts, etc. (This would make it an easier sell to search engines.) All older content is grandfathered. In other words, content already indexed cannot be resubmitted for authentic tags. So I cannot submit my site’s present poems for tags, only new poems. Someone could steal another’s content, such as one of my handwritten poems, before the author submits it to search engines for tagging. In that case, the author could file a DMCA takedown notice and go to court if necessary, just like we have to do now.
How do the author and search engine tie the authentic tag to the content’s location (URL) before the content is made public? The author must tell the search engine at what URL the content will appear, even though the content will not appear at that URL until a future date.
What if the author wants to move content from one URL to another? The author could log into his or her account and change the authentic tag’s URL information. This would probably be the most difficult problem this idea faces: Web sites go dark, the author uploads old content elsewhere at a later date, and so on.
What if the author loses his or her login information and cannot recover it? In the short term, I am not sure. In the long term, content submitted by defunct accounts could be untagged after, say, a year with no account activity. This would not negate copyright protection, but search engines need to clean house from time to time.
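The submit-before-publish idea above can be sketched as a small registry. This is a minimal, hypothetical sketch: no search engine offers such a service, and the names `AuthenticRegistry`, `register`, `move` and `original_for` are invented for illustration. It also assumes, for simplicity, that copies are matched by a SHA-256 hash of whitespace- and case-normalized text; real scrapers alter content, so a working system would need fuzzier matching.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class AuthenticTag:
    """Hypothetical time-and-author tag issued before content goes public."""
    author: str
    url: str              # where the author says the content will appear
    registered_at: float  # timestamp recorded at submission


def fingerprint(text):
    # Assumption: exact match after collapsing whitespace and case.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class AuthenticRegistry:
    tags: dict = field(default_factory=dict)  # fingerprint -> AuthenticTag

    def register(self, author, url, text, now=None):
        """Tag content before publication; a second tag for the same text is refused."""
        key = fingerprint(text)
        if key in self.tags:
            raise ValueError("content already tagged")
        stamp = time.time() if now is None else now
        tag = AuthenticTag(author, url, stamp)
        self.tags[key] = tag
        return tag

    def move(self, author, text, new_url):
        """Let the tagging author point the tag at a new URL."""
        tag = self.tags[fingerprint(text)]
        if tag.author != author:
            raise PermissionError("only the tagging author can move content")
        tag.url = new_url

    def original_for(self, text, crawled_url):
        """Return (is_original, tag) for a crawled page, or (None, None) if untagged."""
        tag = self.tags.get(fingerprint(text))
        if tag is None:
            return None, None  # untagged: the engine falls back to its usual heuristics
        return crawled_url == tag.url, tag


if __name__ == "__main__":
    registry = AuthenticRegistry()
    poem = "A new poem, submitted before it goes live."
    registry.register("michael", "https://example.com/poems/new", poem)
    print(registry.original_for(poem, "https://example.com/poems/new")[0])  # True
    print(registry.original_for(poem, "https://scraper.example/copy")[0])   # False
```

Keying the check on the URL the author promised in advance is what would let a scraper’s copy, crawled at a different address, be recognized as a duplicate even if a crawler happened to find it first.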
I think authentic tagging would work much better than the ragtag system we have now. Will bloggers still copy and paste? Will scrapers still scrape? Yes, but your original content would be ranked more favorably than theirs if tagged. Their “content” might be ignored altogether. What do you think?
This is an interesting post. You should submit it at yearblook.com/submit.php. Yearblook is a competition to find each day’s best blog posts.
At the end of the year, the 365 best posts (1 from each day) will be published in a book (a real, printed book, you will find it on Amazon).
If you’re not ready to post your articles yet, browse around and see if there is anything you find interesting.
Also, since we’re just starting out, we would love any feedback you are willing to share.
It all depends (yeah, I know it sounds hippish, like relativism gone bad).
Example: I have a space blog called Colony Worlds that I often find partially reposted (the first four paragraphs) on the Mars Society’s site.
While some may get angry at the idea of them using my content, I really do not mind, as it helps stir up more ideas and (as I have found) I have gained more subscribers because of it.
On the other hand, on other blogs I have started, I allow users to post the article IN FULL provided they link back to the original article, as I found that dealing with copied content via “cease and desist” letters takes too much time.
I decided to take TechCrunch’s approach and just “keep blogging” and let the divine reward me with more traffic (as I am producing the original content).
But to each their own I guess… ;-)
It’s theft. And it’s lazy. I like your analogy of the Daily News reprinting New York Times articles – AND NOT PAYING. It’s one thing for a Reuters article to be published in the New York Times, since the NYT paid to reprint it – and they still have to acknowledge that it was sourced from Reuters. So why should a content aggregator lift your hard-worked blog entry, place it on their blog and not pay a penny for it?
Content pilfering and copying, even if it links back to you, is not blogging. It’s not what blogs are about – blogs are logs of our interpretations of something or some piece of information. Are they going to say that copying someone else’s work in full and placing it on their own website constitutes a log of their surfing??? I would respect them more if they put up a paragraph that said, “hey, I read this article and I think it’s interesting because… so click here to read it.”
And quite frankly, I’ve seen some of my blog entries appear on other ‘blogs’ which I don’t think the owner has even seen – they are clearly there because I had certain keywords in my entry which were pertinent to the so-called subject of their ‘blog’. I don’t call posting my entire blog entry minus the last paragraph a satisfactory link back to my website – it is usually the first clue a reader has that this blog entry wasn’t even written by the owner of the pseudo-blog they were on.
It’s theft, and these content aggregators should not call themselves bloggers, or steal our content and pretend to pass most of it off as their own.
Actually, further to my rant above, I think I despise most ‘human content aggregators’ because they practice deception – not revealing that it is not their post until the end of the article, where, if you are lucky, there is some text which, if you knew how to read it, acknowledges your website, or a link saying “more…” that will bounce someone to your website.
If the content aggregator acknowledged their source up front, e.g. Travelblips writes… (with the title of your blog linking back to your blog) and then reproduced a paragraph or two with a link at the end to the full article on your blog, then I’d only be grumbling about being an unpaid contributor to their ‘blog.’ And I’d let it go, as it would be an oblique bit of free web traffic.
But they go to extraordinary lengths to hide that they did not write the article, which is deceptive and sometimes, when there is no discernible acknowledgement, blatant copyright theft.
I think they should be charged with plagiarism. They claim the intellectual property of other bloggers as their own, which is very unethical. They should take time out to review R.A. 8792 (Philippine E-Commerce Law)…
Michael: Your concept of an “authentic tag” is quite interesting. How do we get started? ;-)
I think the difference between someone like a splogger and a site like AllTop has to do with the intention. Is it malicious or not? Is the entire goal to make money or is it to provide an actual service?
Although scraping is offensive on a certain level, as long as they link back to my original post I don’t see that it really hurts anything.
I suspect that in most cases Google and co. can tell from the one-way link between duplicate pieces of content which one is the original, and most of these kinds of sites have little chance of gaining any real traction with either readers or search. And if they do, that just inflates the value of the link.
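The one-way-link point can be illustrated with a toy heuristic. This is a hypothetical sketch, not a description of how Google or any other engine actually handles duplicates: `likely_originals` is an invented name, and `links` is an assumed mapping from each page to the set of pages it links to.

```python
def likely_originals(duplicates, links):
    """Among pages carrying the same text, return those that other copies link to
    but that link to none of the copies themselves (a one-way link)."""
    dupes = set(duplicates)
    originals = []
    for page in duplicates:
        # Links from this page to the other copies (self-links ignored).
        outbound = (links.get(page, set()) & dupes) - {page}
        # Other copies that link to this page.
        inbound = {p for p in dupes if p != page and page in links.get(p, set())}
        if inbound and not outbound:
            originals.append(page)
    return originals


if __name__ == "__main__":
    links = {
        "scraper-a.example/post": {"myblog.example/post"},
        "scraper-b.example/post": {"myblog.example/post"},
        "myblog.example/post": {"myblog.example/about"},
    }
    pages = ["myblog.example/post", "scraper-a.example/post", "scraper-b.example/post"]
    print(likely_originals(pages, links))  # ['myblog.example/post']
```

The heuristic only helps when the copy links back; a scraper that strips attribution leaves no inbound edge, and the function returns an empty list.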
So, Scrape Me – as long as you link back!
Andrew: I don’t think “authentic tagging” is beyond the capability of current programming. Jonathan Bailey just wrote “How to Immunize Your Site Against Scraping” and made me wonder whether my idea is worth pursuing. But having search engines tag content before it is made public would provide the best method to authenticate authorship for the purposes of arranging SERPs, in my opinion.
The chief problem with the techniques Bailey explained is that your content is potentially exposed to content thieves before any legit service picks it up…before any Technorati or Feedburner or search engine knows that your content is there.
Consider Ryan D’s dilemma in Bailey’s comments. A thief hit his server every second for hours. No search engine or legit aggregator does that. If the thief snatches his content and uploads it and the thief’s services pick up Ryan D’s stolen content before Ryan D’s services pick up his original, how are search engines to know the difference? (This is probably why the thief hit the server every second. Chances were very good the thief could nab the content and look like the original author.) A human editor might be able to deduce that you wrote it, but I doubt there is any algorithm at present that can.
How do we get started? That should be simple in theory. If enough of us ask search engines to develop this, then I think they would. It would be in their best interests. Their programmers, at least, might find the concept intriguing. If not, then a third-party system could be developed to catalog content before it is made public. Then we ask search engines to recognize the third-party standard or develop one of their own.
There could be a “permissions tag” as well. There may be duplicate content on another site that you want there, or do not mind being there. But how do you convince an indifferent algorithm that you gave the other site permission, so that the search engine does not strongly penalize that site? Add a permissions tag that only you, the author, can give out: a random alphanumeric string, unique and securely stored, that could be displayed in the code (perhaps as a property of a blockquote tag) and is recognized only by the search engine. Each time you give a tag, the tag is a different string.
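A permissions tag along these lines could be sketched as a ledger held by the search engine. This is a hypothetical sketch: `PermissionLedger`, `issue` and `is_permitted` are invented names, and binding each token to the grantee’s host and the original URL is an added assumption so that a scraper cannot simply copy a token from a permitted page. Python’s standard `secrets` module supplies the random string, a fresh one per grant.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    author: str
    original_url: str
    grantee_host: str  # assumption: a token is only valid on the site it was given to


class PermissionLedger:
    """Hypothetical store kept privately by the search engine."""

    def __init__(self):
        self._grants = {}  # token -> Grant

    def issue(self, author, original_url, grantee_host):
        # A new random 32-character hex string for every grant.
        token = secrets.token_hex(16)
        self._grants[token] = Grant(author, original_url, grantee_host)
        return token

    def is_permitted(self, token, page_host, original_url):
        """True if a duplicate on page_host carries a valid token for this original."""
        grant = self._grants.get(token)
        return (
            grant is not None
            and grant.grantee_host == page_host
            and grant.original_url == original_url
        )


if __name__ == "__main__":
    ledger = PermissionLedger()
    token = ledger.issue("michael", "https://example.com/poems/sea", "friend.example")
    print(ledger.is_permitted(token, "friend.example", "https://example.com/poems/sea"))   # True
    print(ledger.is_permitted(token, "scraper.example", "https://example.com/poems/sea"))  # False
```

The grantee might then embed the token in markup such as `<blockquote cite="https://example.com/poems/sea" data-permission="...">`, where `data-permission` is a hypothetical attribute name, not part of any existing standard.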
I think we should identify the spectrum of behavior we are talking about.
On one end is total plagiarism – republishing, no attribution. The other end might be the automated aggregators like PopUrls and Google News.
Somewhere near the latter end are the human-driven content aggregators – Boing Boing, Instapundit – and popularity sites like del.icio.us and Digg.
There’s also a similar spectrum when you look at the goals of the content producers. Some actually are able to charge for syndicating their content – newswires for example.
It’s like the Wild West out there, and the environment is constantly changing. But as with most human endeavors, I think basic social etiquette and good behavior always rise to the top. Call me an optimist.
I want to throw in my dos centavos, as I have a different perspective. I have personally aggregated over 36,000 posts, and that is a number I am proud of. I spend hours on my site/blog because of the scope of what I cover. It is my job/passion. I don’t claim to be an expert in my subject matter, although I am definitely in the ‘know.’ Can’t help it.
In fact, I would say I am more of an expert in human content aggregation than in my niche subject, mainly because it is broad. (You try aggregating 200 to 400 articles a week.) However, does that negate me as a blogger? I don’t think so. There is a tendency to think there is a pedestal; there isn’t. We are all exploring this new medium using various tools and methods – hopefully producing something new, even wonderful.
However, at the end of the day, I say who cares? Even though I personally have debated this with myself, I eventually landed on the question, “Why do I have to label myself?” I provide a useful service that many love. I am happy.
I am a human content aggregator (and proud), a blogger (and proud) and/or something else (and proud). Reminds me of whether or not I am a Hispanic, Latino or something else – either way I am proud. I am me and I am doing something good.
What I do makes me all of these: collecting, categorizing (by both subject and geography), prioritizing and presenting, all using the awesome WordPress, while sometimes sprinkling in witty (at least I think so) comments/snippets or clarifying/augmenting titles.
Let me clarify that I only use 1 to 3 paragraphs, with a link to the original plus a link to a translation of the article at Google and Babelfish.
One last thing: after 2.5 years, I would say that between 5 and 10% of my source articles have vanished (link rot). If not for my site, there would be no record of them. Am I an archiver as well, or all of these labels?
Once again I don’t care about labels. I am providing a useful service to both this nation and a large minority population that needs this service (really need, imho) and I am doing it in a respectful way.
I have also been scraped and copied, but I dealt with it. I hated it/them even though I saw the irony.
My name is Tomas. I run HispanicTips.com. National Hispanic News: For, From & About Hispanics & Latinos – Presenting, Celebrating & Documenting This As Blogante News So You May Easily Stay Informed. – A Living, Growing Archive/Database Categorized by Subject, State & Metro – With 36,576 News Items