Seek But Ye Shan’t Find

Friday October 06th 2006, 6:47 pm
Filed under: World Wide Web, Language
Posted By: Matt

Yesterday I stumbled upon Matt Marshall’s exposé of stealth search startup Powerset on VentureBeat. My immediate reaction was intense skepticism. Having majored in computational linguistics in university, I’ve developed a healthy respect for the tremendously difficult problem of natural language understanding. I find it impossible to imagine a startup emerging today with a radically new search solution built on this kind of technology. Scrolling down to the comments, I noticed that Danny Sullivan of SearchEngineWatch fame had expressed similar doubts (which didn’t stop me from posting my own rambling opinion).

Sure enough, Danny followed up with a fascinating post about the sordid history of natural language search. For all his encyclopaedic knowledge of the space, however, I disagree with his analysis of why these efforts have all failed. He seems to be saying that the main obstacle is changing people’s habits.

Maybe this is because we’re talking about different things. For me, natural language search means that I can enter “What is the best way to revive a failed hollandaise sauce?” into a box and get back a list of relevant results. Get this to work and people will change their habits so fast it’ll make your head spin. In fact, the notion of entering terse keywords to get search results is far less intuitive than just asking the computer for information as you would ask another person, so going back to the more natural approach shouldn’t be a problem at all.

The real issue is that it’s so hard to make this work. At the risk of sounding pretentious (like that ever stopped me), I don’t think that the average layman realizes how difficult understanding language is. It’s so easy for us humans that we aren’t able (without academic study) to take a step back and bask in the complexity of the task that we are performing so effortlessly. Keeping this in mind, Danny’s quote from former Excite CTO Graham Spencer is extremely revealing:

“The problem with any technology that tries to be explicitly ‘smart’ is that it has to be really close to perfect or else a human will notice.”

If you’re very clever and work very hard, you can perhaps create a search engine that uses computational linguistics to provide intelligent results X% of the time (where X is some low number). This is fantastic for demos since you know which queries work well. But in the real world, users are going to expect the computer to respond like a human being, since they’re talking to it as if it were one. These expectations are sure to be dashed after only a handful of queries, sending them running back to Google with their keywords in tow.
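A little arithmetic shows why “a handful of queries” is enough. The sketch below is only an illustration, assuming (unrealistically) that each query independently gets an intelligent answer with the same probability p; the function name `prob_any_failure` is hypothetical. Under that assumption, the chance that a user sees at least one failure in n queries is 1 − pⁿ, which climbs quickly even when p is high.

```python
def prob_any_failure(success_rate: float, num_queries: int) -> float:
    """Chance that at least one of num_queries queries fails.

    Assumes every query succeeds independently with probability
    success_rate (a simplifying assumption, not a measured figure).
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # All n queries succeed with probability p**n; anything else is a failure.
    return 1.0 - success_rate ** num_queries


if __name__ == "__main__":
    for rate in (0.5, 0.9, 0.99):
        for n in (1, 5, 10):
            chance = prob_any_failure(rate, n)
            print(f"p={rate:.2f}, n={n:2d}: chance of seeing a failure = {chance:.1%}")
```

For example, even a system that answers intelligently 90% of the time gives a user roughly a 41% chance of hitting a failure within five queries (1 − 0.9⁵ ≈ 0.41). With the low success rates described above, the failure shows up almost immediately.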


1 Comment

  1. Matt, there are two ways to apply linguistic processing: (1) interpreting the user’s query; (2) extracting the true meaning of the pages themselves. Just as PageRank indexed the hyperlinks between pages, the ability to deeply analyze all of the web’s pages themselves is the key. You can then extract information. For example, you search for the link between global warming and hurricanes (e.g. enter the keywords global warming hurricanes) and you get both pro and con references, with links to core research articles on both sides of the argument. This alone would be huge, but it would require crawling and analyzing the pages instead of just indexing them. Then people could see for themselves that there is no support for a link between global warming and the frequency or severity of hurricanes in the Gulf…but that is another post.

    Comment by Mike — 10/6/2006 @ 10:27 pm
