Blogs are one of the richest sources for certain kinds of information. Yet it is frustratingly hard to find blogs or extract information from them, and the state of the art (Google, Technorati) feels completely stagnant. Here are a few example use cases I have that aren’t well served by existing tools:

1. Recruiting. When I recruit for a particular role, I’m looking for thought leaders or people with insight and passion. Usually these people have blogs. If I could see, for example, a list of all the people in the Boston area who blog about web development, I’d probably find some rock star developers. There’s no easy way to do this today.
2. Travel planning. I’m thinking about a stay in southern Utah at a Bed & Breakfast. Who’s blogged about their trips there that might have some good perspective for me?
3. Music Discovery. Blogs like Aurgasm, Quietcolor or TheMusicSlut are great ways to find music. But how many others like that are out there?

The current serious choices are pretty much limited to using normal search (Google, Bing, etc.), or using a blog search engine like Google Blog Search or Technorati. With Google Blog Search, you get pretty much a toned-down version of Google: a search box with 10 results. You can’t really search for *blogs*, only for *posts*, with the relevance ranking determined by some version of PageRank. There’s no real sense of the authority of a blog (other than that of PageRank), and no real opportunity for discovery – just punch in your keywords and hope for the best.

With Technorati, you do get some increased power. You can search for blogs as entities distinct from an individual post, and blogs do get assigned an authority score. But the experience seems to fail as often as it succeeds. A search for “boston web developer” blogs on Technorati returns three blogs, all with an authority score of 1 (the minimum) – I’m pretty sure there are more than three blogs of this kind in Boston! And there’s no way to sort the blogs by their authority score, at least that I can see. The Technorati blog directory also seems to be mostly limited to “authoritative” blogs – personal blogs (for example my own) seem to have little or no representation. But on long tail topics (say, music reviews of obscure artists), blogs by “real people” are often the only place to find this kind of commentary. Most importantly, there seems to be little innovation happening in Google Blog Search, Technorati, or more generally – the field is stagnant.

What would the characteristics of a good blog search tool be? What’s lacking in today’s approaches?
1. Comprehensiveness. There are hundreds of millions of blogs (billions?) – yet Technorati doesn’t seem to find many of them. (Google is more comprehensive, but limited by the “search box + 10 results” interface.)
2. Ranking of blogs relative to search query and/or authority of author. (Of course this ranking problem is non-trivial. There are some interesting ideas on authority for Twitter accounts which could perhaps be leveraged, e.g.
3. A faceted, searchable directory of blogs supporting discovery. Categorization technology has come a long way. It ought to be possible to categorize every blog against a reasonably detailed taxonomy or facet set (say, the Open Directory categories, or something better), with 80% accuracy, across some common facets: topics, locations, age of blog, date of last post, and so forth. Even at 80% accuracy, this asset would be quite effective. And with a little UGC (user-generated content) thrown in, the errors that matter among the remaining 20% will quickly get corrected. Using a microformat like hCard, blog authors could document their blog’s metadata quickly and accurately. Blogs also typically have some consistent thematic elements, such as an “About” page or a Blogroll list, that could be mined for interesting metadata. This kind of experience would power a new way to discover fresh and interesting blogs & content.
4. Recommend other, similar blogs. Powered by the facets above, or by a blogroll analysis, or something similar, a recommendation feature for similar blogs could be implemented, based on information readily available in an “almost standard” format.
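
To make items 2 through 4 a bit more concrete, here is a minimal sketch in Python of what a faceted lookup and a blogroll-based recommendation might look like. Everything in it is an assumption for illustration: the `Blog` record and its field names, the made-up `authority` numbers, the `.example` URLs, and the choice of Jaccard overlap between blogrolls as the similarity measure. It is not how Technorati or Google actually work, just one simple way the idea could be tried out.

```python
from dataclasses import dataclass, field


@dataclass
class Blog:
    """Hypothetical directory record; every field name here is an assumption."""
    url: str
    topics: set[str]
    location: str
    authority: float  # any score where higher means more authoritative
    blogroll: set[str] = field(default_factory=set)  # URLs this blog links to


def facet_search(blogs, topic=None, location=None):
    """Return blogs matching the given facets, most authoritative first."""
    hits = [
        b for b in blogs
        if (topic is None or topic in b.topics)
        and (location is None or b.location == location)
    ]
    return sorted(hits, key=lambda b: b.authority, reverse=True)


def jaccard(a, b):
    """Set overlap: size of intersection over size of union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_blogs(target, blogs, limit=3):
    """Rank other blogs by how much their blogroll overlaps the target's."""
    scored = [
        (jaccard(target.blogroll, b.blogroll), b)
        for b in blogs if b.url != target.url
    ]
    scored = [(s, b) for s, b in scored if s > 0]  # drop blogs with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [b for _, b in scored[:limit]]


# Made-up sample data, purely for illustration.
blogs = [
    Blog("a.example", {"web development"}, "Boston", 12.0,
         {"x.example", "y.example"}),
    Blog("b.example", {"web development", "music"}, "Boston", 40.0,
         {"x.example", "z.example"}),
    Blog("c.example", {"travel"}, "Moab", 7.0, {"q.example"}),
]

for blog in facet_search(blogs, topic="web development", location="Boston"):
    print(blog.url, blog.authority)

for blog in similar_blogs(blogs[0], blogs):
    print("similar to", blogs[0].url, "->", blog.url)
```

The hard parts are of course everything this sketch skips: crawling, assigning the facets in the first place, and coming up with an authority score worth sorting by. But once that metadata exists, the query side (filter by facet, sort by authority, suggest neighbors) is not much more than this.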

Verticalized blog search engines might also provide some task-centric capabilities. As I’ve written before, the future of search is about providing task-centric search capabilities. In music, for example, The Hype Machine can support some very interesting behaviors simply by virtue of being focused on music.

The obvious question: what business or investment model would support this kind of vertical search engine? In the Goby world of travel and entertainment, there’s a long history of various ways to monetize that kind of content. In the “pure content” world of blog search, it’s less clear – a pure page-view based CPM ad model isn’t likely to work. If the New York Times can’t make that kind of model work, a startup probably can’t either. Perhaps some form of interest-based, downstream ad retargeting approach might get enough leverage that it could get to critical mass. Alternatively, in some domains a “freemium” model might work, where additional tools (say for recruiters or brand managers looking for a competitive edge) are offered for a fee. Given the scale of the problem, it’s not clear a bootstrapped company could take this on – the infrastructure requirements (bandwidth for crawling, servers, etc.) probably require a non-trivial level of investment.

What blog search tool do you use? Do you use a blog search tool, or just Google? Is anyone innovating in this area?

