When I first started blogging over three years ago, blog search engines such as Technorati and Google Blog Search were my favorite tools for keeping on top of who was talking about my topics and who was linking to my site, and for finding posts to comment on and offer help with.
However, over the years, the usefulness of these services has dwindled to nearly nothing. Where once nearly every great tip or connection came from either a Technorati Watchlist or a Google RSS feed, now I seem to get the best results from Twitter and more targeted searches.
The days of punching a few keywords into Technorati and getting a stream of useful results are over. What comes back now is a jumble of spam, off-topic posts and other noise that has to be sifted through to find the few grains of great content.
If blog searching isn’t dead, it is certainly very ill, and it is time something was done to fix it.
All told, blog searching has at least five major problems that have helped to lessen its usefulness over the past few years.
- Spam: Blog spam has exploded over the past three years and every blog search engine has struggled to keep the junk out while letting the real blogs in. This has not been an easy task and, thanks to scrapers, many results in blog search engines are repeated several times over with only one being the original post.
- Non-Blogs with RSS: Three years ago, RSS was mostly a blogging phenomenon. Now it is not uncommon to see RSS feeds on mainstream news sites, forums, social networking sites and elsewhere. These sites often get picked up by blog search engines unwittingly and usually provide little added value.
- Tag Gaming: Technorati may have brought the idea of tagging blog entries into the mainstream, but some bloggers have begun to abuse tags, usually by stuffing their posts full of unrelated keywords. This greatly hurts the accuracy of tag search results and has made tag searching dubious at best.
- Incorrect Feeds: Even if a blog search engine has found a true blog, it is not uncommon for it to mistake the comment feed, tag feed or category feed for a whole new blog. Worse still, depending on permalink structure, it may read these feeds as separate blogs, with separate URLs.
- General Expansion: Perhaps the biggest problem is simply that there are many, many times more blogs than there were three years ago. Even if every blog were legit and every feed parsed were a correct one, there would be a many-fold increase in the amount of noise for every keyword.
The two major blog search engines have tried, with mixed results, to deal with these problems, but both have managed to fail, each in its own unique way.
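To make the "Incorrect Feeds" problem a little more concrete, here is a minimal Python sketch of how an indexer might tell a blog's main feed apart from its comment, tag and category feeds. Everything in it is an assumption for illustration: the URL patterns are modelled on common WordPress-style permalinks, the function names are hypothetical, and grouping by hostname would break for blogs hosted in subdirectories of a shared domain. It is not how Technorati or Google actually work.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns, modelled on common WordPress-style permalinks.
# A real indexer would need many more, one set per blogging platform.
SECONDARY_FEED_PATTERNS = {
    "comments": re.compile(r"/comments/feed/?$"),
    "tag": re.compile(r"/tag/[^/]+/feed/?$"),
    "category": re.compile(r"/category/(?:[^/]+/)*[^/]+/feed/?$"),
}


def classify_feed(url):
    """Return 'comments', 'tag', 'category' or 'main' for a feed URL."""
    path = urlparse(url).path.lower()
    for kind, pattern in SECONDARY_FEED_PATTERNS.items():
        if pattern.search(path):
            return kind
    return "main"


def group_by_blog(feed_urls):
    """Group feeds by hostname so secondary feeds are not counted as new blogs.

    Assumption: one blog per hostname.
    """
    blogs = {}
    for url in feed_urls:
        host = urlparse(url).netloc.lower()
        entry = blogs.setdefault(host, {"main": [], "secondary": []})
        bucket = "main" if classify_feed(url) == "main" else "secondary"
        entry[bucket].append(url)
    return blogs


# Placeholder URLs on example.com, used only to exercise the functions.
feeds = [
    "http://example.com/feed/",
    "http://example.com/comments/feed/",
    "http://example.com/tag/seo/feed/",
    "http://example.com/category/news/tech/feed",
]
blogs = group_by_blog(feeds)
```

With the placeholder URLs above, all four feeds end up under a single blog, with one main feed and three secondary ones, rather than being indexed as four separate blogs.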
Paths to Failure
Technorati used to be the darling of blog search. With its massive index and authority-based rankings, it seemed to have the best system, even better than Google’s. But in recent months and years, the index has become diluted with spam and duplicate content.
Even a simple link search for my own site turned up several cases of duplicate content, some posted by the Webmasters themselves, others by the spam bloggers that scrape them. Worse still, sometimes the spammers would be indexed before the original site, making it appear that the spammer posted first.
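For illustration only, here is a rough Python sketch of one way duplicate posts could be grouped together: normalize the text of each post, hash it and collect the URLs that share a hash. The function names and sample posts are hypothetical, and this only catches copies that are identical after normalization; a scraper that adds or trims text would need a fuzzier comparison. It also makes no attempt to say which copy is the original, since, as noted above, the order in which posts are indexed is not a reliable guide.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash a post body after lower-casing and dropping punctuation/whitespace."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def group_duplicates(posts):
    """Take (url, body) pairs and return lists of URLs with identical bodies."""
    groups = defaultdict(list)
    for url, body in posts:
        groups[fingerprint(body)].append(url)
    # Only groups with more than one URL are duplicates.
    return [urls for urls in groups.values() if len(urls) > 1]


# Made-up sample posts on placeholder domains.
posts = [
    ("http://original.example/post", "Blog search is broken!"),
    ("http://scraper.example/copy", "blog  search is BROKEN"),
    ("http://other.example/unrelated", "Something else entirely"),
]
duplicates = group_duplicates(posts)
```

Here the first two sample posts fall into one group and the third is left alone, so a search engine could show the group as a single result instead of repeating it.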
The more active the keyword, the bigger this problem becomes. Though Technorati has counteracted this somewhat with its “authority” system, which looks at how often a blog is linked to, the system is far from perfect and setting the slider high enough to filter most of the spam also filters out a large number of legitimate blogs.
Searchers are forced to make a “devil’s choice” between accuracy and completeness.
Google Blog Search, on the other hand, seems to have done a respectable job of keeping spam blogs at bay but a far worse one of keeping other non-blog results out. Forums, comment feeds and the like routinely make appearances in Google Blog Search and can quickly drown out the real posts for some keywords.
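The trade-off can be shown with a small Python sketch. The result set and authority numbers below are invented purely for illustration, and the function names are hypothetical; nothing here reflects real Technorati data or its actual ranking code. "Accuracy" is the share of kept results that are legitimate, and "completeness" is the share of legitimate results that survive the filter.

```python
def filter_by_authority(results, min_authority):
    """Keep only results whose authority meets the slider setting."""
    return [r for r in results if r["authority"] >= min_authority]


def accuracy_and_completeness(results, min_authority):
    """Return (accuracy, completeness) for a given slider setting."""
    kept = filter_by_authority(results, min_authority)
    legit_total = sum(1 for r in results if not r["spam"])
    legit_kept = sum(1 for r in kept if not r["spam"])
    accuracy = legit_kept / len(kept) if kept else 1.0
    completeness = legit_kept / legit_total if legit_total else 1.0
    return accuracy, completeness


# Invented sample data: four legitimate blogs and two spam blogs.
results = [
    {"url": "a", "authority": 120, "spam": False},
    {"url": "b", "authority": 45, "spam": False},
    {"url": "c", "authority": 3, "spam": False},
    {"url": "d", "authority": 2, "spam": False},
    {"url": "e", "authority": 4, "spam": True},
    {"url": "f", "authority": 1, "spam": True},
]

no_filter = accuracy_and_completeness(results, 0)
high_filter = accuracy_and_completeness(results, 5)
```

With no filter, every legitimate blog in this made-up set is found but a third of the results are spam. With the slider at 5, the spam is gone, but so are two of the four legitimate blogs, which happen to be the small ones with few inbound links.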
To make matters worse, Google has a strange way of ordering results, leading to a lot of very old content ranking well, even when newer stories are breaking.
Most frustrating, however, is how Google has rendered its own link results, which are used by default in the WordPress dashboard, useless. It recently made the decision to index pages, not RSS feeds.
Though the decision was great news for sites with partial feeds, as their full content would now be indexed, it also meant that all links on the page, including blogroll links, would be counted. That means if a site has you linked in its blogroll, you’ll likely see every post it publishes in your WordPress dashboard or your link feed.
All in all, the problem is frustrating but it doesn’t come with easy answers. The brightest minds in search have been stumped and it doesn’t appear that any perfect answers are on the horizon.
Ways to Address the Issue
Several smaller blog and news search engines have arisen to address this issue, most focused on limiting the index to only the best blogs. Regator, Blog Search Engine (owned by SplashPress and partnered with Icerocket) and Twingly all work on this principle of greater control over the index to yield better results.
The problem with these search engines is twofold. First, the indexes have to be maintained either by humans or by some form of automated process. If it is the latter, it is only a matter of time before spammers learn how to game it. If it is the former, then maintenance costs will go up and new sites will be slower to appear.
Second, these sites, as with Technorati’s authority system, are a trade-off between accuracy and completeness. The results may be relatively free of spam and garbage, but they will miss at least some legitimate blogs.
Though these are imperfect and inelegant solutions, they are likely the best ones available.
Personally, I’ve begun to seek out and use other means of staying on top of my field. Subscribing to relevant blogs directly, using Twitter search feeds, watching incoming Delicious links and social news sites seem to do the job much better.
I’m almost at the point where I am ready to unsubscribe from my blog search feeds completely, but I’m not prepared to give up yet. I keep holding out some hope that Technorati and/or Google will figure it out.
Though the heyday of blog search may have passed, its memory remains strong.