
SUO: CNN article about Cyc




The ISP that serves my email and my web site has been down today,
and I can't read any of my mail.  However, I can send email by a
different route, so I thought that I'd mention the following article:

 
http://www.cnn.com/2002/TECH/industry/04/11/memome.project.idg/index.html

This is a brief news report that doesn't get into any of the technical
issues about Cyc, but it reports an interview with Doug Lenat,
who made some interesting remarks about Cyc and where it is going.
In this note, I'd like to comment on those remarks and to indicate
how I believe those directions should be modified.

First of all, the article gives some statistics, such as 600 person-
years of development effort that went into Cyc over the past 17 years,
its 3 million rules (aka axioms) and 300,000 concept types.  It also
mentions the availability of OpenCyc with about 5,000 concepts and
50,000 rules.

I agree with Lenat's view that this knowledge base is broader than
anybody else's.  It might not be larger than some very specialized ones,
but for overall breadth and coverage, I would agree with Lenat that
nobody else is likely to come close for a long time.  I would also
argue that they shouldn't even try -- at least they shouldn't try
to do it the way Cyc was done.  But I believe that there are ways to
go much further by other means (which could take advantage of current
knowledge bases, such as OpenCyc or SUMO as resources).

The following excerpt from the article indicates their future plans:

   Cyc finally knows enough that it can actually help with the
   knowledge-entry process. It's changed in the past year from where we
   were entering these things by hand and writing them in logic to a kind
   of tutoring mode. For example, you say, "I want to tell you about a
   new kind of bacteria," and it might say, "What kinds of things does it
   kill? Is it similar to anything I know about already?" Up until now,
   the only people adding knowledge were a small priesthood of logicians.
   Now, suddenly, millions of people can add their knowledge to Cyc.
   Because of the acceleration, we'll be at 10 million assertions a year
   from now.

To sort out the obvious garbage that such an open project would throw
into the pot, Lenat proposes a committee:

   I'll have an OpenCyc committee to help vet knowledge that is
   suggested. Also, we've developed the notion of local consistency,
   which is analogous to our everyday notion of the earth as being
   locally flat and globally spherical. In the same way, we have divided
   the knowledge base into regions that are locally consistent, and all
   the inconsistent information is so far away that you can ignore it.
   If someone puts in "Dining room tables are made of Jello," that will
   contradict so many things in the "normal" part of the knowledge base
   that it automatically will get pushed out into the boonies.

I like the idea of "local regions" or contexts for which consistency
can be guaranteed.  Those would fit nicely into my proposed lattice
of all possible theories.  I agree that new knowledge that hasn't been
"vetted" should be placed in a context that hasn't been certified or
validated as reliable.  But rather than have that knowledge vetted
by a committee appointed by the OpenCyc organizers, I would like to
see that knowledge tested and validated by anyone who happens to be
knowledgeable about the subject.  And I would like to see automatic
methods for assisting in that validation.  I believe they can be built.
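
As a minimal sketch of what such automatic assistance might look like,
the following Python fragment files every new assertion in an unvetted
context and flags it when it contradicts the validated context.  The
names (conflicts, submit), the representation of an assertion as a
(statement, truth value) pair, and the sample statements are all
assumptions made for illustration; none of this is how Cyc or OpenCyc
actually works, and a real system would need a theorem prover rather
than a lookup.

```python
from typing import Set, Tuple

# Assumption: an assertion is a (statement, claimed truth value) pair.
Assertion = Tuple[str, bool]

def conflicts(context: Set[Assertion], new: Assertion) -> bool:
    """True if the context already holds the opposite truth value."""
    statement, value = new
    return (statement, not value) in context

def submit(validated: Set[Assertion], unvetted: Set[Assertion],
           new: Assertion) -> str:
    """File new knowledge in the unvetted context and flag it for review.

    The validated context is never modified here; promotion would be a
    separate step carried out by knowledgeable reviewers.
    """
    unvetted.add(new)
    return "conflict" if conflicts(validated, new) else "candidate"

# Illustrative contents only.
validated = {("tables are rigid", True), ("Jello is rigid", False)}
unvetted: Set[Assertion] = set()

status_a = submit(validated, unvetted, ("Jello is rigid", True))
status_b = submit(validated, unvetted, ("microfoam is rigid", True))
```

Both submissions end up in the unvetted context; the status only tells
a reviewer which ones clash with what has already been validated.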

Suppose, for example, that somebody proposed that dining room tables be
made of styrofoam.  Commonsense knowledge tells us that styrofoam cups
and packing material are too soft, flimsy, and breakable to be used
in a table.  But chemists and materials engineers have recently
developed techniques for making "microfoams" of various kinds of
plastics that have much greater strength and resistance to impact.

In terms of the knowledge base, there is a fundamental distinction
that must be observed:  ordinary plastic foams are weak, but microfoams
can be strong.  That one tiny fact can invalidate millions of
"commonsense" observations.  The lattice of theories can accommodate
such a new fact by its organization:  the new fact would migrate high
up in the lattice above all the other knowledge about plastic foams.
It would automatically cause a split between the subtheories about
today's "ordinary" plastic foam and the new kinds of microfoams.
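
The split can be sketched in a few lines of Python, under a strong
simplifying assumption: a theory is just a set of axiom strings, and
one theory generalizes another when its axioms are a subset of the
other's.  That is a purely syntactic stand-in for logical entailment,
and the function names and axiom strings below are hypothetical.

```python
from typing import FrozenSet

# Assumption: a theory is modeled as a set of axiom strings.
Theory = FrozenSet[str]

def is_generalization(general: Theory, special: Theory) -> bool:
    """A theory with a subset of the axioms sits higher in the lattice."""
    return general <= special

def specialize(theory: Theory, *axioms: str) -> Theory:
    """Move down the lattice by adding axioms."""
    return theory | frozenset(axioms)

def common_generalization(a: Theory, b: Theory) -> Theory:
    """The axioms two theories share (syntactic approximation)."""
    return a & b

# The shared theory no longer commits to any claim about strength.
plastic_foam: Theory = frozenset({"x is a plastic foam"})

# The new distinction yields two sibling subtheories.
ordinary_foam = specialize(plastic_foam, "x is weak")
microfoam = specialize(plastic_foam, "x can be strong")

# Older commonsense conclusions now hang below one branch only.
no_foam_tables = specialize(ordinary_foam, "x is unsuitable for a table")
```

In this toy model the conclusion about tables is inherited by
everything below ordinary_foam but says nothing about microfoam, which
is the effect of the split described above.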

This point also relates to my objection to the term "commonsense",
which I believe confuses too many issues to be in any way useful
as a guideline for KB design and development.  Fifty years ago,
nobody had any commonsense ideas about styrofoam, but today most
people do.  Now an esoteric fact that is known only to a tiny
percentage of scientists and engineers may have an enormous impact
on what everybody thinks is "commonsense" a few years from now.

That kind of revolutionary migration from esoteric fact to common
sense has happened many times during the 20th century, and it will
happen at an increasing rate during this century.  In fact, it has
happened throughout human history, but the slow communication
meant that different tribes living under different customs and
environments had very different ideas of common sense.  I prefer
to talk about axioms and theories and how they should be organized.

The interviewer asked "Is Cyc like the human genome project, where
eventually you will be done, or will it grow forever?"  To which
Lenat responded:

   I refer to it as the human "memome" project. A typical person knows
   about 100 million things about the world. I see us crossing that point
   in five years. It's difficult to predict the course thereafter.

Lenat is very consistent:  Since 1984, he has been predicting that Cyc
would reach human-like abilities in five years.  With its current
structure, I wouldn't expect Cyc to reach such a level in 50 years.
And I strongly disagree with Lenat that the total number of axioms
is the most significant factor in intelligence.  A 6-year-old child
with a fraction of an adult's knowledge is extremely intelligent,
as Gene Charniak discovered in the 1970s when he tried to design
a computer that could read children's stories.  It turned out that
they were far harder to understand than an advanced textbook on
science or mathematics.

Instead, I believe that the organization of the knowledge base in a
flexible, dynamically changeable structure is far, far more important.
That is the main point of my paper "Signs, Processes, and Language
Games", which I have mentioned on these email lists many times:

    http://www.jfsowa.com/pubs/signproc.htm

(I just tried to ping my web site, and it's still unreachable.  But
I hope that it will be up later today.)

Meanwhile, I wouldn't say that large projects like Cyc and SUMO are
a total waste of time.  They should be regarded as useful resources
that can be incorporated in much more flexible reasoning structures.
But the important word here is "structure", which is far more
important than the number of axioms.

John Sowa