Is Blog Searching Dead?

When I first started blogging over three years ago, blog search engines including Technorati and Google Blog Search were my favorite tools for keeping on top of who was talking about my topics and who was linking to my site, and for finding posts to comment on and offer help with.

However, over the years, the usefulness of these services has dwindled to nearly nothing. Where once nearly every great tip or connection came from either a Technorati Watchlist or a Google RSS feed, now I seem to get the best results from Twitter and more targeted searches.

The days of punching a few keywords into Technorati and getting a stream of useful results are over. What follows now is a kludge of spam, off-topic posts and other noise that has to be sifted through to find the few grains of great content.

If blog searching isn’t dead, it certainly is very ill and it is time that something is done to fix it.

The Problems

All told, blog searching has at least five major problems that have helped to lessen its usefulness over the past few years.

  1. Spam: Blog spam has exploded over the past three years and every blog search engine has struggled to keep the junk out while letting the real blogs in. This has not been an easy task and, thanks to scrapers, many results in blog search engines are repeated several times over with only one being the original post.
  2. Non-Blogs with RSS: Three years ago, RSS was mostly a blogging phenomenon. Now it is not uncommon to see RSS feeds on mainstream news sites, forums, social networking sites and elsewhere. These sites often get picked up by blog search engines unwittingly and usually provide little added value.
  3. Tag Gaming: Technorati may have brought the idea of tagging blog entries into the mainstream, but some bloggers have begun to abuse tags, usually by stuffing their posts full of unrelated keywords. This greatly hurts the accuracy of tag search results and has made tag searching dubious at best.
  4. Incorrect Feeds: Even if a blog search engine has found a true blog, it is not uncommon for it to mistake the comment feed, tag feed or category feed for a whole new blog. Worse still, depending on permalink structure, it may read these feeds as separate blogs, with separate URLs.
  5. General Expansion: Perhaps the biggest problem is simply that there are many, many times more blogs than there were three years ago. Even if every blog were legit and every feed parsed were a correct one, there would be a many-fold increase in the amount of noise for every keyword.
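
To make the "Incorrect Feeds" problem concrete, here is a minimal Python sketch of how an indexer might tell a blog's main feed apart from its comment, tag and category feeds. It assumes WordPress-style permalinks (`/comments/feed/`, `/tag/<name>/feed/`, `/category/<name>/feed/`); the function names are hypothetical, and nothing here reflects how Technorati or Google actually handle feeds.

```python
import re
from urllib.parse import urlparse

# Assumption: WordPress-style permalinks. Other platforms use other paths,
# so a real indexer would need many more patterns than these three.
SECONDARY_FEED_PATTERNS = [
    ("comment", re.compile(r"/comments/feed/?$")),
    ("tag", re.compile(r"/tag/[^/]+/feed/?$")),
    ("category", re.compile(r"/category/.+/feed/?$")),
]


def classify_feed(url: str) -> str:
    """Return 'comment', 'tag', 'category' or 'main' for a feed URL."""
    path = urlparse(url).path
    for label, pattern in SECONDARY_FEED_PATTERNS:
        if pattern.search(path):
            return label
    return "main"


def main_feeds(urls):
    """Keep only the feeds that look like a blog's primary feed."""
    return [u for u in urls if classify_feed(u) == "main"]
```

A URL heuristic like this is only a first pass. Per-post comment feeds and custom permalink structures would slip through, which is part of why the problem has proven so stubborn.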

The major blog search engines have tried, with mixed results, to deal with these problems, but both have managed to fail, each in its own unique way.

Paths to Failure

Technorati used to be the darling of blog search. With its massive index and authority-based rankings, it seemed to have the best system, even better than Google. But in recent months and years, the index has become diluted with spam and duplicate content.

Even a simple link search for my own site turned up several cases of duplicate content, sometimes posted by the Webmasters themselves, other times by the spam bloggers that scrape them. Worse still, sometimes the spammers would be indexed before the original site, making it appear that the spammer posted first.

The more active the keyword, the bigger this problem becomes. Though Technorati has counteracted this somewhat with its “authority” system, which looks at how often a blog is linked to, the system is far from perfect and setting the slider high enough to filter most of the spam also filters out a large number of legitimate blogs.

Searchers are forced to make a “devil’s choice” between accuracy and completeness.
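
The trade-off can be shown with a toy example. The Python sketch below uses invented data and a plain inbound-link count as a stand-in for Technorati's authority score, whose real calculation is not described here. "Accuracy" is the share of kept results that are legitimate and "completeness" is the share of legitimate blogs that survive the filter.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    authority: int  # inbound-link count, a stand-in for "authority"
    is_spam: bool   # ground truth, known only because this is toy data


# Hypothetical search results for a single keyword.
RESULTS = [
    Result("legit-big.example", 500, False),
    Result("legit-mid.example", 40, False),
    Result("legit-new.example", 2, False),
    Result("scraper-a.example", 15, True),
    Result("scraper-b.example", 1, True),
]


def filter_by_authority(results, min_authority):
    """Mimic the authority slider: drop anything below the threshold."""
    return [r for r in results if r.authority >= min_authority]


def accuracy_and_completeness(results, min_authority):
    """Return (share of kept results that are legit, share of legit blogs kept)."""
    kept = filter_by_authority(results, min_authority)
    legit_total = sum(not r.is_spam for r in results)
    legit_kept = sum(not r.is_spam for r in kept)
    accuracy = legit_kept / len(kept) if kept else 0.0
    completeness = legit_kept / legit_total if legit_total else 0.0
    return accuracy, completeness
```

With the slider at 0, every legitimate blog is kept but two of the five results are spam. Raising it to 20 removes both scrapers, but the new blog with only two inbound links disappears along with them.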

Google Blog Search, on the other hand, seems to have done a respectable job keeping spam blogs at bay but a far worse one at keeping other results out. Forums, comment feeds, etc. routinely make appearances in Google Blog Search and can quickly drown out some keywords.

To make matters worse, Google has a strange way of ordering results, leading to a lot of very old content ranking well, even when newer stories are breaking.

Most frustrating, however, is how Google has rendered its own link results, which are used by default in the WordPress dashboard, useless. It recently made the decision to index pages, not RSS feeds.

Though the decision was great news for sites with partial feeds, as their full content would now be indexed, it also meant that all links on the page, including blogroll links, would be counted. That means if a site has you linked in its blogroll, you’ll likely see every post it publishes in your WordPress dashboard or your link feed.
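
One conceivable way to separate blogroll links from links inside posts is to treat any link that appears on nearly every page of a site as site-wide furniture. The Python sketch below is a hypothetical heuristic along those lines, not a description of anything Google does; the `sitewide_fraction` cutoff and the three-page minimum are arbitrary assumptions.

```python
from collections import Counter


def in_post_links(pages, sitewide_fraction=0.8, min_pages=3):
    """Drop links that appear on most pages of a site (likely blogroll links).

    pages: dict mapping each page URL to the set of outbound links found on it.
    Returns a dict of the same shape with site-wide links removed.
    """
    # With too few pages there is no way to tell a blogroll link from a
    # link that simply appears in every post, so leave the data untouched.
    if len(pages) < min_pages:
        return {page: set(links) for page, links in pages.items()}

    counts = Counter(link for links in pages.values() for link in links)
    cutoff = sitewide_fraction * len(pages)
    return {
        page: {link for link in links if counts[link] < cutoff}
        for page, links in pages.items()
    }
```

The obvious weakness is that a blogger who genuinely links to the same site in most posts would be filtered out as well, so this again trades some completeness for accuracy.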

All in all, the problem is frustrating but it doesn’t come with easy answers. The brightest minds in search have been stumped and it doesn’t appear that any perfect answers are on the horizon.

Ways to Address the Issue

Several smaller blog and news search engines have arisen to address this issue, most focused on limiting the index to only the best blogs. Regator, Blog Search Engine (owned by SplashPress and partnered with Icerocket) and Twingly all work on this principle of greater control over the index to yield better results.

The problem with these search engines is twofold. First, the indexes have to be maintained either by humans or by some form of automated process. If it is the latter, it is only a matter of time before spammers learn how to game it. If it is the former, then maintenance costs will go up and new sites will be slower to appear.

Secondly, these sites, as with Technorati authority, are a trade-off between accuracy and completeness. The results may be relatively spam and garbage free, but they will miss at least some legitimate blogs.

Though these are imperfect and inelegant solutions, they are likely the best ones available.

Conclusions

Personally, I’ve begun to seek out and use other means of staying on top of my field. Subscribing to relevant blogs directly, using Twitter search feeds, watching incoming Delicious links and social news sites seem to do the job much better.

I’m almost at the point where I am ready to unsubscribe from my blog search feeds completely, but I’m not prepared to give up yet. I keep holding out some hope that Technorati and/or Google will figure it out.

Though the heyday of blog search may have passed, its memory remains strong.

View Comments (12)
  • I’ve never really searched for blogs – I generally find my way round using other people’s blogrolls. More recently I’ve used feeds from the growing number of small specialist index sites such as naturenetworkblogs or similar sites.

  • Chung: Thank you. I just started using Twingly off and on a while back but it seems to be the best solution going right now.

    Aaron: I’m the same way, mostly subscribing to related search results. It’s in my RSS reader that I’ve noticed the noise become unbearable.

    Krug: So far, so good!

  • Hi Jonathan,

    It’s really amazing how the notion of “blog search” has gone from generally sceptical, to that of “broken” to that of “dead” over the last year or so. Blogs are still on the rise, yet there still seems to be no decent search for them. Somewhat strange, really.

    I think that the biggest problem is not really spam-blogs or irrelevant sources as such, but the lack of semantic/meaning mining approach to the search problem.

    Enjoy the follow-up over at my site if you like (if you don’t, you’re still on our list of first people to get an invite when our service launches :))

    Cheers,
    Martin

  • Martin: I’m going to respond on your other post as well but I wanted to say that I think the reason the dialog has shifted is because people, like myself, remember the “golden age” of blog search around late 2005 to mid-2006 and have watched as it has deteriorated. In less than 1000 days it has gone from liquid greatness to liquid garbage.

    It ends up hard to not either call it dead or want to put it out of its misery.

    As far as the problem goes, though, you may be right. The trouble is that we’ll have to see how a more semantic Web fixes this problem. Even with a truly semantic blog search, you still have the issue of spammers, who will now try to game this as well, and the exponential increase in blogs. Semantic Web search addresses neither of these directly.

    I guess we’ll see what happens next…

  • Jonathan, great post. I’m one of the co-founders of Regator and I would agree with you on the decline of useful blog search from the major players for all the reasons you outline. I also agree there are some issues with purely algorithmic estimations of a blog’s quality as they can and will get gamed if there is enough value in doing so.

    We looked at the quality problem and went back to basics. We use qualified editors to select blogs on the site, essentially creating a curated and closed system to ensure only quality stuff is in our database. We do use algorithms and other magic to ensure that we keep track of what’s going on with those blogs, as doing it manually is unrealistic. Is it the most technologically forward method, using AI and advanced semantic algorithms to find the best blogs? Not really. But it works well.

    We are especially selective when it comes to the blogs on our site. We have strict criteria on each one and our qualified editors actually read months of posts (on most) to ascertain if the writing quality is up to scratch, as well as whether the blog is on topic and updated regularly, among other factors. Sure, this means that we don’t have tens of thousands of blogs on every subject just yet, but we are still pretty young and we intend on continuing to grow. Our archive is over 2,000,000 posts right now so we are creating a pool of useful content. Also, we are targeting users who are not quite familiar with blogs/RSS and we want to be an easy way for them to be introduced to quality blog content.

    Anyways, I have rambled on here but we love the feedback and if you or any of your readers have any, we’d love to hear it. Cheers!

  • Scott: I actually use several Regator feeds as part of my “keep on top of things” feeds. They are fairly quiet feeds, I get maybe 5 posts on average per day on my topics, but I do know that they are relevant and they are always high-quality posts.

    So, in that regard, Regator is a smashing success. I can’t find any fault in your process of finding new blogs to pick (it would be suicide as you actually have picked up my main blog, plagiarismtoday.com).

    Still, there is a trade-off. I follow these watchlists to find new articles to link to, comment on and converse with. Sadly, the problem with this setup is that, while density is high, quantity is low, as you mentioned.

    What I do right now is read my Regator feeds first, trusting that they are going to be at least reasonable in quality, and then begin the process of wading through less-dense feeds. It does its job well and it is a great asset, one I recommend to others wholeheartedly, but it also isn’t a complete solution.

    It’s part of the answer, perhaps even a big part, but there is more to be done from other sides.

    Congrats though on a great service!

  • Jonathan: we are definitely going for quality over quantity, but I hear you on the overall volume. Partially it’s by design. We’re working on adding more blogs every week to round out some areas and add a larger pool of quality stuff so those numbers will go up. At the end of the day we are keeping our eye on our niche. We know there is value in searching the larger blogosphere in some cases which you can do on Technorati or Google Blog Search, but we feel there’s also real value in a resource that is really focused on searching and browsing only quality posts. Thanks for your nice words about Regator and we’re glad you find it useful.

  • A Prouerbe old, yet nere forgot;
    Tis good to strike while the Irons hott.

    To the glory days of the blogger’s domain;
    may the tweeterer’s hearts grow fonder.

    Can’t stay me ways in those frontier days;
    twill ne’r forget them either.

  • Jonathan, you may want to run this entry through a spell check and grammar check. There are several errors that take away from the great piece you wrote.
