Watson computer smoking hot at Jeopardy challenge

Well, the contest isn’t over yet, but the outcome looks like a foregone conclusion. After two days, the Watson computer is poised to defeat the two human champions it is playing. The computer’s performance has been impressive, to say the least, and has left the human contestants looking dazed and confused.

And who wouldn’t be? The computer was both ruthless & relentless. (There I go, anthropomorphising again.) The two human champions were barely able to answer a question or two as Watson virtually ran the board in the 2nd day of the competition. Watson, which has to generate an answer in real-time, was so successful at beating the human contestants to the punch that it generated speculation about whether the computer had some kind of unfair time advantage from being fed the question electronically. As reported here (thanks, Phillip), according to IBM, Watson actually cedes a slight “reaction time” advantage to the human contestants. Given how successful Watson is in determining the correct answer so quickly, I think it would be more sporting to give the human players an even bigger head start. Hey, give us poor, deserving humans a break!

After Day 1 of the contest, the computer and one of the contestants were tied, and it looked as if things would get interesting. After Tuesday’s totally one-sided shellacking, though, commentators were reduced to wondering about the few missteps and obvious quirks that the computer did exhibit on occasion. See, for example: http://www.wired.com/epicenter/2011/02/watson-does-well-and-not/, which analyzes the prodigious strengths the program displayed, as well as describing its few weak spots.

I am afraid that the computer is so good at answering trivia questions that the contest isn’t turning into much of a drama. (It is turning into a great promo, though, for the IBM Watson Research lab.)

However, it remains a challenge of mythic proportions, which is very cool. Like John Henry, the steel-driving man vs. a steam-powered machine, or Charlie Chaplin trapped inside the assembly line in “Modern Times.” On Ray Kurzweil’s web site (he is the author of “The Singularity is Near”), I can almost hear the champagne glasses clinking.

The Smartest Machine on Earth Plays Jeopardy

I don’t know if anyone out there besides me saw the NOVA TV show “Smartest Machine on Earth” about the IBM Research Watson computer. Watson is scheduled to play two human Jeopardy champions on TV on Monday-Wednesday (Feb 14-16) of next week. I thought the show was excellent.

Here’s a link to the broadcast: http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html.

If you are interested in going deeper, the current issue of AI Magazine is devoted to “Question Answering,” and contains an article by the Watson researchers. After the IBM Deep Blue chess computer successfully challenged the reigning human chess champion in 1997, AI researchers at IBM turned to other “hard problems” in AI. I am not much of a chess player myself, but I enjoyed following the progress of man against machine at the time, and I expect to tune in to watch the new IBM software play Jeopardy next week.

I admit I enjoy the drama of these human vs. computer challenges. A computer that plays Jeopardy models the famous “Turing test” for artificial intelligence proposed by mathematician and computer pioneer Alan Turing. Today, the Turing test has been largely supplanted by John Searle’s Chinese room thought experiment, a challenge to the AI research agenda that is taken quite seriously. This, perhaps, explains why IBM is willing to spend millions of dollars on this Jeopardy effort.

Essentially, Searle’s philosophical argument is that humans have minds, while computer programs that perform automated reasoning based on encoded rules do not. Searle’s challenge encapsulates the gulf between syntax in language, which is indisputably governed by formal rules, and semantic knowledge, which may or may not be. The gulf between syntax and semantics is very wide indeed, but it is one that many AI researchers are actively engaged in trying to bridge. (Things like the Semantic Web come to mind.)

Of course, I also found the show relevant in the context of my current blog topic, where I have been discussing rule-based “expert systems” approaches to computer performance analysis. As I have written earlier, I am not a huge fan of the approach, but I do acknowledge some of its benefits, particularly in filtering very large sets of performance-oriented data, like the ones associated with huge server farms, for example. My assessment of the value of the rule-based, automated reasoning approach does appear to square with current academic thinking in the AI world. Today, engineering-oriented approaches dominate much of the current research in AI. The emphasis of the machine learning approach, for example, is on the underlying performance of the system, not the extent to which the cognitive capabilities of humans are modeled or imitated.

The NOVA show on Watson featured several AI luminaries from the academic world. Doug Lenat, a prominent AI researcher formerly at Stanford and now head of Cycorp, who is still pursuing the rule-based approach, was on camera. Lenat’s current focus is a reasoning engine in which millions of “common sense” rules are represented in CycL, a language he developed that is derived from the predicate calculus. On the NOVA program, Lenat said that the CYC knowledge base currently consists of more than 6 million assertions in CycL.

A sample CycL assertion looks like this:

(#$implies
   (#$isa ?A #$Animal)
   (#$thereExists ?M
      (#$and (#$mother ?A ?M)
             (#$isa ?M #$FemaleAnimal))))

CycL is certainly interesting as an example of a Knowledge Representation (KR) language. The problem is that, by nature (pun intended), biological categories are messy. If you think about it, the assertion in the example should probably say something about the mother object being the same species as its offspring. This is both an important biological and logical constraint. The assertion I learned in biology is:

Animal a  => HasA femaleparent m =>
Where m IsA Animal and a.species == m.species  

Which, if you think about it, also implies that a new species coming into existence is a (bio)logical contradiction. I don’t know why creationists don’t argue this. The logical inconsistency seems pretty explicit to me, but, perhaps, their positions aren’t grounded in logic to begin with.

The CycL rule doesn’t even mention animals like snails that are hermaphrodites and can self-fertilize their own eggs, a pretty neat trick, but not entirely unknown in the Animal Kingdom. It turns out that there is more to heaven and earth than is dreamt of in this set of categorical rules that evaluate as either true or false using an automated reasoning program. Whether individual specimens belong to the same or different species is often in dispute. I remember learning in science class that there were nine planets in our solar system; now astronomers aren’t so sure. Poor Pluto. It has been demoted. There are some people who are devastated by the demotion. Poor Pluto and its acolytes.
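To make the point concrete, here is a minimal sketch of the “every animal has a same-species female parent” rule, evaluated as a strict true/false test over a tiny fact base. This is plain Python, not CycL, and it is not how the Cyc engine actually works; the `Animal` class, the `satisfies_mother_rule` function, and the sample animals are all names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Animal:
    """A hypothetical fact-base entry; every field here is an assumption."""
    name: str
    species: str
    sex: str  # "female", "male", or "hermaphrodite"
    mother: Optional["Animal"] = None


def satisfies_mother_rule(a: Animal) -> bool:
    """True only if `a` has a recorded mother that is female and same-species."""
    m = a.mother
    return m is not None and m.sex == "female" and m.species == a.species


doe = Animal("doe", "white-tailed deer", "female")
fawn = Animal("fawn", "white-tailed deer", "male", mother=doe)
parent_snail = Animal("parent snail", "garden snail", "hermaphrodite")
young_snail = Animal("young snail", "garden snail", "hermaphrodite",
                     mother=parent_snail)

# Evaluate the categorical rule against each entry in the fact base.
results = {x.name: satisfies_mother_rule(x) for x in (fawn, young_snail, doe)}
print(results)
```

The fawn passes. The young snail fails because its parent is not strictly female, and the doe fails simply because nobody recorded her mother. A rule that can only answer true or false has no good way to say “it depends” or “we don’t know.”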

In KR, this is known as the problem of ontologies. The problem is that the differences between a planet, an asteroid, and a comet are not always clear cut. Worse, we are blind to our own tacit assumptions. A central thesis of cultural anthropology is the extent to which reality is culturally determined. Lévi-Strauss, in “La Pensée sauvage,” argues that plant and animal classification schemes used by so-called “primitive” societies are no less rigorous than the one we use that originated with Linnaeus. The American linguist (and darling of the Left) George Lakoff also writes about the socially-constructed, culturally-determined “cognitive models” that shape our thinking in “Women, Fire, and Dangerous Things.” We see the world “through a glass, darkly.” We are like the prisoners in Plato’s cave who mistake the shadows on the wall for reality.

Less philosophically, there are mathematical-logical objections to the automated reasoning approach. The fact that 1st order logic is undecidable (Church and Turing, following Gödel), or that computer programs of arbitrary complexity are subject to the halting problem (Turing, again) ought to give proponents of the rule-based approach in AI pause, but it doesn’t seem to. They have faith in mathematical modes of reasoning that I guess I must lack.

Given some of these inherent limitations, the trend in AI research today is away from Lenat’s rule-based reasoning approach. For instance, Terry Winograd also appeared in the NOVA show. When he was a graduate student at MIT, Winograd conducted ground-breaking research in AI, building a program called SHRDLU that could carry out simple tasks in a small domain of physical objects (called the Blocks world) using a natural language interface. (For a very amusing account of the origin of the name SHRDLU, see http://hci.stanford.edu/~winograd/shrdlu/name.html.) Winograd’s doctoral dissertation was later published as a book, “Understanding Natural Language” (currently out of print).

Back when I was in graduate school, Winograd’s SHRDLU program was considered one of the great success stories in “strong AI.” But Winograd, one of the rising stars in AI, subsequently became disenchanted with the mechanistic reasoning approach he used in building SHRDLU, essentially a parser for a context-free grammar with back-tracking, which is a very rigid and limited approximation of natural language understanding. Winograd famously repudiated the rule-based reasoning approach to AI in a 2nd book, “Understanding Computers and Cognition: A New Foundation for Design.” His critique, coming from someone deep within the orthodoxy, was notorious. But, in fact, if you look at the way computer technology is used in speech recognition today, it is very far removed from the approach Winograd used back in the day. (I am thinking of the statistical approach described in Jelinek, “Statistical Methods for Speech Recognition” that relies on Hidden Markov Models.) These statistical techniques are quite effective in distinguishing human speech, but I doubt anyone would mistake them for simulating or imitating what it is we humans do when we converse with each other.

On the NOVA episode, Winograd demoed a version of Eliza, another celebrated AI program from the sixties that “simulated” conversing with a sympathetic therapist. The syntactically-oriented approach used in Eliza is easy to defeat, as Winograd demonstrated to some comic effect. Unlike Watson, the program could never hope to pass Searle’s Chinese Room test, although maybe today’s computers, several orders of magnitude more powerful, can.
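For a feel of why a purely syntactic approach is so easy to trip up, here is a toy sketch in the spirit of Eliza. It is not the original program or its script; the three patterns, the canned responses, and the `reply` function are all my own hypothetical stand-ins. All it does is match a surface pattern and echo part of the input back.

```python
import re

# Hypothetical rules: (surface pattern, response template).
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."


def reply(utterance: str) -> str:
    """Return the first matching canned response, echoing the captured text."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT


print(reply("It is about my mother"))
print(reply("I am glorp zzt."))
print(reply("The weather is nice"))
```

The second input is gibberish, yet the sketch cheerfully asks why you say you are “glorp zzt.” Nothing in the program knows or cares what the words mean, which is roughly the weakness Winograd was poking at.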

Despite Eliza’s simple-minded capabilities, many human subjects who interacted with Eliza were comfortable having extended “conversations” with the computer, which surprised its author, given how limited a range of human interaction the program imitated. What seems to happen with Eliza is that human subjects project human attributes onto something that exhibits or mimics recognizably human behavior. Cognitive scientists claim we develop in early childhood a “Theory of Mind” that aids us in social interaction with other humans, something clinical researchers noted was absent in autistic children. When we encounter a computer that walks like a duck and quacks like a duck, it is normal for us to assume it is a duck. Similarly, participants in the Eliza sessions naively assumed that the replies Eliza generated reflected empathy from a recognizably human Mind.

Searle’s Chinese room challenge turns Eliza on its head. It begins with a Skeptic’s perspective: can the computer program present a thoroughly convincing simulation of human interaction? Can it tell a joke, can it be ironic, or coin a metaphor? Can it be intuitive? Can it truly exhibit sympathy? These are human qualities and capabilities that have evolved that may require elements that are not wholly logical.

Finally, Tom Mitchell, one of the prominent researchers in the machine learning school, was featured on the NOVA show. Mitchell wrote the first textbook on the subject in 1997. Several of the recently minted PhDs at Microsoft Research I worked with on computer performance issues trained in Mitchell’s “machine learning” approach. It is a broad term, encompassing a variety of (mainly) statistical techniques for improving the performance of task-oriented computer programs through iteration and feedback (or “training”). The Watson Jeopardy-playing computer is programmed using machine learning techniques.

The iteration and feedback aspects of the machine learning approach are really trial and error, or more succinctly, error-correcting, procedures. Not only can they be quite effective, they also seem to model the incremental and adaptive procedures that biological agents (like Homo sapiens) use to learn a new skill or hone an existing one. The Watson computer trains on Jeopardy questions, and its learning algorithms are modified and adjusted to improve the probability the program will choose the correct answers. Similarly, if you are human and you want to get better at answering questions on the SAT exam, you take an SAT prep course where you practice answering a whole lot of questions from previous exams. Some of what you might learn in the class helps you with the content of the test (like vocabulary and rules of English grammar). But learning about the kinds of questions and the manner in which they are asked – on an exam where questions are often deliberately designed to trick you or confuse you – can also be extremely helpful. Having Watson train on a dataset of existing Jeopardy questions is essentially the same, proven strategy.
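As a bare-bones illustration of “iteration and feedback,” here is a sketch of a perceptron, one of the oldest error-correcting learning procedures. I am not claiming this is what Watson uses; it is only a stand-in for the general idea, and the toy dataset (the logical AND of two inputs) and the function names are assumptions of mine. The program guesses, checks the guess against the known answer, and nudges its weights only when it gets one wrong.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Error-correcting training: weights change only after a wrong guess."""
    n = len(examples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # +1, 0, or -1
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * x
                           for w, x in zip(weights, features)]
                bias += lr * error
        if mistakes == 0:  # a full pass with no errors: stop training
            break
    return weights, bias


def predict(weights, bias, features):
    """Apply the learned weights to one input."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


# Toy "practice exam": the answer is 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(examples)
print([predict(weights, bias, x) for x, _ in examples])
```

After a handful of passes over the four practice questions the program stops making mistakes. That is the SAT prep course in miniature: nobody tells it the rule, it just keeps getting corrected until it stops being wrong.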

In the upcoming televised contest, Watson is competing against two reigning Jeopardy champions, the most skilled human contestants alive. I don’t know whether Watson vs. the human Jeopardy champions is going to be David vs. Goliath or Achilles vs. Hector, but I expect it will be a very intriguing human drama.
