Thursday, May 18, 2006

A Virtuous "Cyc"le

How Common Sense is Uncommon, and Why Computers Need it Badly

When I first learnt about computers in college in 1982, it seemed logical to me that computers must have common sense. That turned out to be far from true. I was astounded at the level of detail a programmer must spell out when writing a program to solve even the most trivial problem. Although things have improved in the intervening two-and-a-half decades, programming computers remains a task for people with specialized knowledge, and so-called high-level programming languages remain far, far less sophisticated than natural languages.

That state of affairs, however, is not for lack of effort by the computer science research community. Much work has focused on natural-language (NL) processing, that is, figuring out how to make computers smart enough to allow humans to interact with them as though they were other people. This may sound easy enough, but it has turned out to be stupefyingly complex: there is probably no computer in the world today that could read and understand the previous sentence. (Its greatest difficulty would be working out that the word "they" in that sentence refers to computers and not to humans, even though "humans" is the nearest preceding noun!)

NL processing is crucial if we are ever to make computers truly easy for humans to use, in applications ranging from teaching kids to making cars easier to drive.

Against this backdrop, the story of Cyc, the brainchild of former Stanford professor Douglas Lenat, is staggering. The project has soldiered on for two whole decades in pursuit of the laudable goal of imparting common sense to computers. An excerpt from the project's website:
"Natural-language (NL) processing is among the most studied -- and most intractable -- challenges of software engineering. Consider the following pair of sentences:
Fred saw the plane flying over Zurich.
Fred saw the mountains flying over Zurich.
Humans have little difficulty in recognizing that in the first sentence, "flying" probably refers to the plane, while in the second sentence, "flying" almost certainly refers to Fred. Traditional NL systems will have difficulty resolving this syntactic ambiguity, but because Cyc knows that planes fly and mountains do not, it will be able to reject nonsensical interpretations. It's difficult to see how this could be done without relying on a large database of common sense."
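To make the idea concrete, here is a minimal Python sketch of how a single common-sense fact can settle the "flying" ambiguity. It is a toy under stated assumptions, not how Cyc actually works: the tables `CAN_BE_FLYING` and `CATEGORY` and the function `who_is_flying` are hypothetical names invented for illustration. Syntax alone allows "flying" to describe either noun, so the sketch tries the nearer noun (the object) first and falls back to the subject, rejecting any reading that the facts say is nonsense.

```python
from typing import Optional

# Hypothetical toy common-sense facts: can this kind of thing be "flying"?
# A person counts, since "Fred was flying" can mean he was a passenger on a plane.
CAN_BE_FLYING = {"plane": True, "mountains": False, "person": True}

# Hypothetical lexicon mapping words in the sentence to the categories above.
CATEGORY = {"Fred": "person", "plane": "plane", "mountains": "mountains"}


def who_is_flying(subject: str, obj: str) -> Optional[str]:
    """Decide which noun "flying" describes in '<subject> saw the <obj> flying ...'."""
    # Both readings are grammatical; prefer the nearer noun (the object),
    # then fall back to the subject, skipping any reading the facts rule out.
    for candidate in (obj, subject):
        if CAN_BE_FLYING.get(CATEGORY.get(candidate, ""), False):
            return candidate
    return None  # neither reading makes sense


print(who_is_flying("Fred", "plane"))      # plane
print(who_is_flying("Fred", "mountains"))  # Fred
```

The point of the sketch is that the parsing logic is trivial; all the work is done by the facts table, which is exactly the part Cyc tries to fill in at scale.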
Cyc has two main components: a Knowledge Base (KB), which holds what Cyc "knows", and an Inference Engine (IE), which Cyc uses to reason with what it knows. The Cyc KB contains nearly two hundred thousand terms and several hundred thousand assertions, or rules, connecting these terms. Human workers continually add new assertions to the KB. The whole project is still far from achieving its goal, but it is making progress.
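The split between a knowledge base and an inference engine can be illustrated with a tiny Python sketch. This is only an assumption-laden toy: Cyc's real KB is written in its own language, CycL, and its inference engine is far more sophisticated. The relations below loosely echo CycL's `isa` ("is an instance of") and `genls` ("is a specialization of"), but every constant name (`Boeing747`, `FlyingThing`, and so on) and the `infer` function are hypothetical. The KB holds hand-entered facts; the engine applies two general rules by forward chaining until no new facts appear.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)

# Hypothetical toy knowledge base: ground facts "entered by hand".
facts: Set[Fact] = {
    ("isa", "Boeing747", "Airplane"),
    ("genls", "Airplane", "Aircraft"),     # every Airplane is an Aircraft
    ("genls", "Aircraft", "FlyingThing"),  # every Aircraft is a FlyingThing
    ("isa", "Matterhorn", "Mountain"),
    ("genls", "Mountain", "LandForm"),
}


def infer(kb: Set[Fact]) -> Set[Fact]:
    """Toy inference engine: forward-chain two rules until nothing new appears.

    Rule 1: genls(A, B) and genls(B, C)  =>  genls(A, C)
    Rule 2: isa(x, A)   and genls(A, B)  =>  isa(x, B)
    """
    known = set(kb)
    while True:
        genls = [(a, b) for (r, a, b) in known if r == "genls"]
        isa = [(x, a) for (r, x, a) in known if r == "isa"]
        new = {("genls", a, c) for (a, b) in genls for (b2, c) in genls if b == b2}
        new |= {("isa", x, b) for (x, a) in isa for (a2, b) in genls if a == a2}
        if new <= known:  # fixed point reached: no new conclusions
            return known
        known |= new


closure = infer(facts)
print(("isa", "Boeing747", "FlyingThing") in closure)   # True
print(("isa", "Matterhorn", "FlyingThing") in closure)  # False
```

Nobody typed in "a Boeing 747 is a flying thing"; the engine derived it from more general assertions. That is why adding assertions to the KB pays off: each new rule or fact can combine with everything already there.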
MIT's Open Mind Project is another example of an initiative directed towards making computers understand things that human beings take for granted.
It has been said that common sense is uncommon (even among humans). But that is no excuse for computers not to have plenty of it. Maybe one of these days, we'll actually see a computer that talks - and listens to - common sense!