May 26 2010

The future of Blog Search

Does Blog Search have a future?

Blogs are one of the richest sources for certain classes of information. Yet they are frustratingly hard to find or extract information from, and the state of the art (Google, Technorati) feels 100% stagnant. Here are a few example use cases I have that aren’t well served by existing tools:

1. Recruiting. When I recruit for a particular role, I’m looking for thought leaders or people with insight and passion. Usually these people have blogs. If I could see, for example, a list of all the people in the Boston area with blogs that blog about web development, I’d probably find some rock star developers. No easy way to do this today.
2. Travel planning. I’m thinking about a stay in southern Utah at a Bed & Breakfast. Who’s blogged about their trips there that might have some good perspective for me?
3. Music Discovery. Blogs like Aurgasm, Quietcolor or TheMusicSlut are great ways to find music. But how many others like that are out there?

The current serious choices are pretty much limited to using normal search (Google, Bing etc), or using a Blog search engine like Google Blog Search or Technorati. With Google Blog Search, you get pretty much a toned-down version of Google: a search box with 10 results – you can’t really search for *blogs*, you can only search for *posts*, with the relevance ranking determined by some version of PageRank. There’s no real sense of the authority of a blog (other than that of PageRank), and no real opportunity for discovery – just punch in your keywords and hope for the best.

With Technorati, you do get some increased power. You can search for blogs as entities distinct from an individual post, and blogs do get assigned an authority score. But the experience seems to fail as often as it succeeds. A search for “boston web developer” blogs on Technorati returns three blogs, all with an authority score of 1 (the minimum) – pretty sure there are more than 3 of this kind of blog in Boston! And there’s no way to sort the blogs by their authority score, at least that I can see. The Technorati blog directory also seems to be mostly limited to “authoritative” blogs – personal blogs (for example my own) seem to have little or no representation. But on long tail topics (say, music reviews of obscure artists), blogs by “real people” are often the only place to find this kind of commentary. Most importantly, there seems to be little innovation happening in Google Blog Search, Technorati, or more generally – the field is stagnant.

What would the characteristics of a good blog search tool be? What’s lacking in today’s approaches?
1. Comprehensiveness. There are hundreds of millions of blogs (billions?) – yet Technorati doesn’t seem to find many of them. (Google is more comprehensive, but limited by the “search box + 10 results” interface).
2. Ranking of blogs relative to search query and/or authority of author. (Of course this ranking problem is non-trivial. There are some interesting ideas on authority for twitter accounts which could perhaps be leveraged, e.g. http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/)
3. A faceted, searchable directory of blogs supporting discovery. Categorization technology has come a long way. It ought to be possible to categorize every blog against a reasonably detailed taxonomy or facet set (say, the Open Directory categories, or something better), with 80% accuracy, across some common facets: topics, locations, age of blog, date of last post, and so forth. Even at 80% accuracy, this asset would be quite effective. And with a little UGC thrown in, the 20% that matter, and are wrong, will quickly get corrected. Using a microformat like hCard, blog authors could document their blog’s metadata quickly and accurately. Blogs also typically have some consistent thematic elements, such as an “About” page or a Blogroll list, that could be mined for interesting metadata. This kind of experience would power a new way to discover fresh and interesting blogs & content.
4. Recommend other, similar blogs. Powered by the facets above, or by a blogroll analysis, or something similar, a recommendation feature for similar blogs could be implemented, based on information readily available in an “almost standard” format.
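To make items 3 and 4 a bit more concrete, here is a minimal sketch in Python of what a faceted lookup and a blogroll-based "similar blogs" feature might look like. Everything in it is an assumption for illustration: the `Blog` fields, the example URLs, the authority numbers, and the choice of Jaccard overlap as the similarity measure are not taken from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Blog:
    # All field names are hypothetical, chosen for illustration only.
    url: str
    topics: set = field(default_factory=set)
    location: str = ""
    blogroll: set = field(default_factory=set)
    authority: float = 0.0


def facet_search(blogs, topic=None, location=None):
    """Return blogs matching the given facets, highest authority first."""
    hits = [
        b for b in blogs
        if (topic is None or topic in b.topics)
        and (location is None or b.location == location)
    ]
    return sorted(hits, key=lambda b: b.authority, reverse=True)


def blogroll_similarity(a, b):
    """Jaccard overlap of two blogrolls: shared links / all links."""
    union = a.blogroll | b.blogroll
    return len(a.blogroll & b.blogroll) / len(union) if union else 0.0


def similar_blogs(target, blogs, limit=3):
    """Rank other blogs by blogroll overlap with the target blog."""
    scored = [
        (blogroll_similarity(target, b), b)
        for b in blogs
        if b.url != target.url
    ]
    scored = [(s, b) for s, b in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [b.url for _, b in scored[:limit]]


# Made-up sample data.
blogs = [
    Blog("a.example", {"web development"}, "Boston", {"x.example", "y.example"}, 0.9),
    Blog("b.example", {"web development"}, "Boston", {"x.example", "z.example"}, 0.4),
    Blog("c.example", {"music"}, "Austin", {"q.example"}, 0.7),
]

boston_devs = facet_search(blogs, topic="web development", location="Boston")
neighbors = similar_blogs(blogs[0], blogs)
```

The hard parts (crawling, assigning the facets at 80% accuracy, computing authority) are exactly what this sketch leaves out; it only shows that once the facets exist, the recruiting query from the top of the post becomes a simple filter and sort.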

Verticalized Blog Search Engines might also provide some task-centric capabilities. As I’ve written before, the future of search is about providing task-centric search capabilities. In music for example, The Hype Machine has some very interesting behaviors it can support, simply by virtue of being focused on music.

The obvious question: what business or investment model would support this kind of vertical search engine? In the Goby world of travel and entertainment, there’s a long history of various ways to monetize that kind of content. In the “pure content” world of blog search, it’s less clear – a pure page-view based CPM ad model isn’t likely to work. If the New York Times can’t make that kind of model work, a startup probably can’t either. Perhaps some form of interest-based, downstream ad retargeting approach might get enough leverage that it could get to critical mass. Alternatively, in some domains a “freemium” model might work, where additional tools (say for recruiters or brand managers looking for a competitive edge) are offered for a fee. Given the scale of the problem, it’s not clear a bootstrapped company could take this on – the infrastructure requirements (bandwidth for crawling, servers, etc) probably require a non-trivial level of investment.

What blog search tool do you use? Do you use a blog search tool, or just Google? Is anyone innovating in the area?


May 10 2010

Some thoughts on the convergence of Search, Travel, Local & Social

There’s a convergence coming, between the worlds of search, travel, local, and social. It used to be that if you were traveling, you used a guidebook and map and talked to the concierge, then you graduated to TripAdvisor and Expedia (and if you were adventurous, Kayak). People’s use of search engines tended not to intersect with their travel planning. In recent years of course Google has become a de facto part of the travel planning experience – although by no means a perfect one. And some search engines have introduced travel products (notably Bing Travel). And for planning your weekend, search engines have historically not been of much use at all – they don’t understand the concept of time or location very well (“this weekend” is just a few keywords to them), and don’t understand your task (when I search for beaches on Cape Cod, why do I get back results for restaurants with the word “beach” in them?). Robert Scoble has some thoughts on this subject, here. Google appears to be moving in this direction, with their rumored acquisition of ITA, which powers many airfare metasearch sites including Kayak. Their abortive attempt to acquire Yelp shows how search & local are converging as well.

But there’s a new game in town – social/local gaming, in particular with things like Foursquare and Gowalla, that combine social gaming with local-search-like results, allowing people to broadcast where they are and what they’re doing. There’s an evolving “stack” of technologies, including location databases and engagement tools, nicely summarized by Chris Dixon. (I disagree with his assertion that location databases will become commoditized – the information is too hard to come by, and companies like InfoUSA make hundreds of millions in revenue providing this kind of data. Not to mention the startups like SimpleGeo and Locationary and for that matter Goby, that are tackling the problem, but I digress).

This kind of engagement is going to have a profound impact on how people plan travel and figure out their weekends. DeepDish Creative (http://deepdishcreative.com/wordpress/2010/02/foursquare-for-tourism/) is talking about how destination marketing organizations can leverage these tools to promote their destination. But I see two problems with this generation of tools as they apply to this problem:

  1. They are after-the-fact. I tend to engage with Foursquare after I’m already AT someplace – Foursquare isn’t really involved in my decision process, it simply records what I’ve already decided. As a result, it has limited use (not no use, just limited use) in making decisions.
  2. These tools only recognize a limited set of entities, primarily businesses (in fact, primarily restaurants). It’s hard to check in at a U2 concert, because it’s an event, and it’s hard to check in at the Grand Canyon, because it’s not really an entity, it’s a generalized (and off-the-beaten-track) place. God help you if you want to check-in on a hiking trail!

Addressing those last two elements would create a resource that will not only appeal to my vanity & let me broadcast what I’m doing, but more importantly help me decide.

The key need here is a semantically meaningful database of things, to key all your features off of, and a search tool to find & organize them – not just a pile of URLs. The system needs to know that Yo La Tengo is a band playing at the Fillmore on the 23rd of April, with a date and a location – not just a pile of keywords without any meaning. Any system like this needs to cover hotels and restaurants as well as non-business entities like hiking trails or concerts, and once you leave hotels/restaurants, this information is hard to come by. Once you have the database of entities, it is straightforward to build a platform for people to engage with their networks, in the context of that content. Once you have a strongly categorized, rich database of things to do, and a strong network of people telling you what they are interested in, you can provide compelling recommendations as well as support discovery. And, strangely enough 8), that’s where we’re headed with Goby – we plan to be right at the intersection of this convergence.
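As a rough illustration of the difference between "a database of things" and "a pile of keywords", here is a small Python sketch. It is not how Goby works; the `Entity` fields, the `things_to_do` function, the trail name, and the year attached to the concert date are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entity:
    # Hypothetical schema: a typed thing with a place and, optionally, a date.
    name: str
    kind: str                     # e.g. "concert", "trail", "restaurant"
    place: str
    when: Optional[date] = None   # None means "always there" (a trail, a park)


def things_to_do(entities, kind=None, on=None):
    """Filter by entity type and by day; undated entities match any day."""
    return [
        e for e in entities
        if (kind is None or e.kind == kind)
        and (on is None or e.when is None or e.when == on)
    ]


# Sample catalog. The year 2010 and the trail are assumed for illustration.
catalog = [
    Entity("Yo La Tengo", "concert", "the Fillmore", date(2010, 4, 23)),
    Entity("Example Ridge Trail", "trail", "somewhere off the beaten track"),
    Entity("Some Other Band", "concert", "the Fillmore", date(2010, 4, 30)),
]

friday = things_to_do(catalog, on=date(2010, 4, 23))
concerts = things_to_do(catalog, kind="concert")
```

Because the system knows the concert is an event with a date and the trail is a place without one, a query for a specific day returns the show that night plus the trail, and leaves out next week's concert. A keyword index has no way to make that distinction.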


Apr 27 2010

TV

Recorded today in NY for Shelly Palmer’s Digital Life, an NBC show in NY focused on consumer tech. My first time doing TV. Makeup & the whole bit. First observation: These people work early! The makeup person told me she started work around 3:30am. ick. Good news for her: home by 10:30am.

Recording was fun. 3 minutes (my segment) goes by SO fast when the lights go on. You’ve got to have your message down to so few words to get it in there crisply. Watching Shelly was fun – he so matter-of-factly creates so many facial expressions, and rarely makes any mistakes. When he looks at you on camera, it’s with such focus that I had a hard time not feeling like a deer in the headlights…

They had these cool robotic cameras (for some reason my picture didn’t come out), but one of the other guests was telling me that those cameras replace 3 or 4 cameramen, driving costs way down. They seem very similar to the robots that are in many factories and warehouses these days.

Came down the night before and caught Los Campesinos, a Welsh punk band, at the Fillmore. Those guys rock. Was going to mention it on the show, but NBC doesn’t really seem like a “Welsh punk” kind of show 8)


Apr 13 2010

Startup I wish someone would build: TheNextOne.com

How many times have you gone to lunch or drinks with somebody, tried to figure out who paid last time, and promised “ok the next one’s on me”? Happens to me all the time. I wish someone would build this app – “thenextone.com” – I go to lunch, hit a mobile site or send a text message, record who I went to lunch with, and who paid. Then, next time, voila! You know whose turn it is. Presumably it could even be integrated into foursquare or gowalla so that as you are checking in, it’s recorded. Done.

Bonus points for cross-referencing to my social network so the “I owe” is tagged onto the social identities for the people I owe or am owed by. Karma points for buying more than you are bought for…..
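For what it's worth, the core of this is tiny. Here is a minimal Python sketch of the ledger, assuming a hypothetical `NextOne` class that keeps everything in memory; the class, its method names, and the karma formula (times paid minus times paid for) are all invented for illustration, and the mobile, SMS, and check-in pieces are left out entirely.

```python
from collections import defaultdict


class NextOne:
    """Tracks who paid for each outing between pairs of people."""

    def __init__(self):
        # Maps an unordered pair of people to the payers, oldest first.
        self._history = defaultdict(list)

    def record(self, payer, guest):
        """Record one outing where `payer` picked up the tab for `guest`."""
        self._history[frozenset((payer, guest))].append(payer)

    def whose_turn(self, a, b):
        """Return whoever did not pay last time, or None with no history."""
        payers = self._history.get(frozenset((a, b)))
        if not payers:
            return None
        return b if payers[-1] == a else a

    def karma(self, person):
        """Times the person paid minus times they were paid for."""
        score = 0
        for pair, payers in self._history.items():
            if person in pair:
                paid = payers.count(person)
                score += paid - (len(payers) - paid)
        return score


ledger = NextOne()
ledger.record("me", "sam")    # I bought lunch
ledger.record("sam", "me")    # Sam got the next one
ledger.record("me", "sam")    # my turn again
turn = ledger.whose_turn("me", "sam")
```

Keying the history on an unordered pair means it doesn't matter who asks the question: `whose_turn("sam", "me")` gives the same answer as `whose_turn("me", "sam")`.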


Mar 30 2010

What’s Goby all about anyway?

What’s Goby all about anyway? On the surface, Goby is a search engine for things to do in your free time. The travel industry has invested hundreds of millions (if not billions) of dollars in helping you get a hotel room and plane ticket – but hotels and plane rides aren’t why people travel – they travel for experiences. Finding experiences is tough – the information is scattered around the web, locked away in domain-specific databases, and often with poor user experiences and bad information architecture. And we’ve all had the experience of sitting around on a Friday night trying to decide what to do over the weekend – essentially the same problem. Goby crawls the web looking for high quality sources of information about all kinds of experiences, covering both traditional travel content (tours, attractions, lodging) as well as more local-oriented things to do (like music, theater, restaurants, museums, hiking trails, surfing spots and skiing…). We then take those results and contextualize them, by geolocating the results and putting them on a map, cross-referencing photography from around the web, and converting those web pages we found into real-world objects you can make decisions about.

Under the covers, Goby is a structured data, task-centric search engine. Over the years there has been continuous interest in the tech & business communities around “what is the next Google?”. In my view there won’t be a “next Google” in search, if by that one means a market-dominating, universally applicable search engine. The future of search is task-centric information access that supports both findability and exploration in the context of specific objectives – say, finding a new book to read, deciding what neighborhood to move to, getting your next job or deciding where to eat. The shortcoming of major search engines is that, while they can happily parse your query and give you some web pages to read, they have no idea what you are trying to accomplish – and therefore cannot adapt their experience to support your task. You can see this trend happening with Goby (search engine for your free time), and with other interesting products like Milo (product search with real-time store inventory), and the very interesting Hunch (a general-purpose recommendation/decision engine).

The other major dimension to how people consume information is through social media – tools that integrate search & social media have the opportunity to bring the engagement of social media to the findability of search. Look for more on this from Goby in the future.


Feb 17 2010

New & Improved…

Welcome to the new and improved viking blog. I started this blog back in the day as an experiment in home-rolled lifestreaming, covering some of the passions in my life – travel, books, and music. I wanted a way to integrate my activities in some of the social media sites I was engaged with (Last.FM, LibraryThing, Flickr) with longer-form blogging. When I co-founded Goby, a lot of that went out the window as we got the company off the ground. I’m now at a place where I can restart things. I’ll continue to cover the books/music/travel areas that I’m passionate about, but want to bring in some new areas I’m really engaged in. In particular, information is the ocean we all swim in, and I want to explore the world of search, social media, and information retrieval, and how people consume information, especially as it relates to their free time, and of course, our experiences building Goby.