SUO: Some comments about Cyc

Following is another copy of my note about Cyc, which was included
as an addendum to another note.  I'm resending it now to serve as
a resource for some further discussions related to the notion of
a single monolithic ontology vs. a modular ontology.  Some relevant
points:

 1. Cyc has supported microtheories since Guha implemented them
    while he was working on his PhD dissertation (which was
    completed in 1991).

 2. The microtheories are widely used for lower-level theories,
    but Adam has made the point that the upper levels consist of
    a single large theory.  I have not heard or seen any explanation
    of why the microtheories were not used to organize the upper
    levels, but my suspicion is that it was easier for Guha to
    add the microtheories to the lower levels, which were still
    in development, than to reorganize the entire upper level,
    which was already fixed in place.

 3. However, if microtheories (or modules) are going to be used
    for the lower levels, there is no reason not to use them for
    the upper levels as well -- especially for a new project that
    is just being developed from scratch.  Cyc, for example, has
    made some major reorganizations of its upper levels during the
    past 10 years (as Fritz Lehmann has pointed out many times).
    We can certainly expect such reorganizations to take place in any
    ontology developed by SUO, and a modular structure would
    facilitate the changes by making clear exactly which parts
    have been changed.  (Any module that does not inherit from
    one of the changed modules is not affected by the change;
    that property is certainly a big help in managing change.)

 4. As the following note illustrates, the Cyc developers themselves
    do not fully understand what is in Cyc and how its features
    interact.  Right now, the only person who really knows what is
    in SUMO is Ian, and I seriously doubt whether he or anyone else
    fully understands all the implications of SUMO.  Furthermore,
    if Ian took a vacation for a couple of weeks, I doubt whether
    he could really be said to know what is in SUMO when he got back.

 5. Adam keeps asking for tools to manage the modules.  Cyc does
    have such tools, and Cycorp has promised to release some parts
    of Cyc very soon.  Perhaps we could ask them for those tools,
    and ask that they be put under a license, such as the LGPL,
    which would allow them to be combined with proprietary code
    without requiring the proprietary parts to be released.

 6. And the clinching argument is that the Cyc upper levels are going
    to be released very soon.  Cyc is certainly bigger than SUMO,
    and a lot more effort has gone into it than into SUMO.  If IEEE is
    going to standardize any ontology, there are probably more
    arguments for standardizing Cyc than for standardizing SUMO.
    However, many of us have concerns about buying a "pig in a
    poke" -- a large undocumented system that we haven't had a
    chance to examine in detail.  That is certainly true of Cyc,
    but it is also true of SUMO -- at least for everyone but Ian.
    Therefore, it would be much better to have a modular system,
    in which we could "buy" or "certify" one module at a time
    rather than accept an all-or-nothing monolith.

 7. No one today knows whether there are inconsistencies between Cyc
    and SUMO, but the probability of inconsistencies is extremely
    high.  For any particular inconsistency, no one today can tell
    us whether the Cyc version or the SUMO version or some other
    version would be preferable.  We need to establish a framework
    that would enable us to adopt and certify one module at a time
    when its suitability has been determined.
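Point 3's observation that a module which does not inherit from a changed module is unaffected amounts to a reachability computation over the inheritance graph. A minimal Python sketch, assuming each module simply lists the modules it inherits from; the module names and the function `affected_modules` are hypothetical and not part of Cyc, SUMO, or any existing tool:

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it inherits from.
# The names are illustrative only; they are not taken from Cyc or SUMO.
INHERITS_FROM = {
    "Top": [],
    "Time": ["Top"],
    "Space": ["Top"],
    "Physics": ["Space", "Time"],
    "Biology": ["Physics"],
    "Movies": ["Time"],
}


def affected_modules(inherits_from, changed):
    """Return the changed modules plus every module that inherits from them,
    directly or transitively.  All other modules are unaffected."""
    # Invert the edges so we can walk from a module to the modules built on it.
    heirs = {module: [] for module in inherits_from}
    for module, parents in inherits_from.items():
        for parent in parents:
            heirs.setdefault(parent, []).append(module)

    # Breadth-first search outward from the changed modules.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for heir in heirs.get(current, []):
            if heir not in affected:
                affected.add(heir)
                queue.append(heir)
    return affected


print(sorted(affected_modules(INHERITS_FROM, {"Space"})))
# prints ['Biology', 'Physics', 'Space']
```

With this sample graph, a change to `Space` touches only `Space`, `Physics`, and `Biology`; `Top`, `Time`, and `Movies` need no re-certification.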

Bottom line:  The modular approach would allow us to adopt the best
modules of both Cyc and SUMO after they have been analyzed, tested,
and certified instead of taking everything in one undigested lump.

John Sowa
_____________________________________________________________________

                      Some Observations about Cyc

[The following comments on Cyc have been extracted from a paper that
was presented by Stuart Shapiro at an IJCAI Workshop (citation below).
The evaluation of Cyc is based on Cycorp documentation and on experience
by the first author (Frances Johnson) during a Cyc training course.]

Doug Lenat and Cycorp have developed Cyc [Cycorp, 2001a] -- a large
knowledge base and inferencing system that is built upon a core of over
a million hand-entered assertions or rules about the world and how it
works.  This system attempts to perform commonsense reasoning with the
help of this large corpus of beliefs (mostly default with some that are
monotonic).  It divides its knowledge base into smaller contexts called
microtheories which contain specialized information regarding specific
areas (such as troop movement, physics, movies, etc.).  Belief
revision is performed within microtheories or within a small group
of microtheories that are working together, and the system is only
concerned with maintaining consistency within that small group (as
opposed to across the entire belief space).  For example:  in an
everyday context, a table is solid, but within a physics context,
it is mostly space (between atoms).
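The idea of keeping consistency only within a small group of cooperating microtheories can be sketched in Python as below. This assumes each microtheory stores propositions with a single truth value and that its group is itself plus the microtheories it uses; the `Microtheory` class, its methods, and the `FurnitureShopping` context are hypothetical illustrations, not CycL or a Cyc API.

```python
class Microtheory:
    """Hypothetical microtheory: a named store of proposition -> truth value.
    Not CycL; just enough structure to show where consistency is checked."""

    def __init__(self, name, uses=()):
        self.name = name
        self.uses = list(uses)  # microtheories this one works together with
        self.beliefs = {}       # proposition (str) -> True or False

    def group(self):
        """This microtheory plus every microtheory it uses, transitively."""
        seen, stack = [], [self]
        while stack:
            mt = stack.pop()
            if mt not in seen:
                seen.append(mt)
                stack.extend(mt.uses)
        return seen

    def assert_belief(self, prop, value):
        """Store prop with a single truth value, checking only this group;
        microtheories outside the group are never consulted."""
        for mt in self.group():
            if mt.beliefs.get(prop, value) != value:
                raise ValueError(
                    f"{prop} would be both True and False in the {self.name} group"
                )
        self.beliefs[prop] = value


everyday = Microtheory("Everyday")
physics = Microtheory("Physics")
everyday.assert_belief("solid(table)", True)
physics.assert_belief("solid(table)", False)  # accepted: a different group

shopping = Microtheory("FurnitureShopping", uses=[everyday])
try:
    shopping.assert_belief("solid(table)", False)
except ValueError as err:
    print("rejected:", err)
```

The everyday and physics contexts hold opposite values for `solid(table)` without conflict, because neither is in the other's group; only a context that uses `Everyday` is prevented from asserting the opposite value.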

A belief can have only one truth value, so no microtheory can contain
both p and ~p.  In other words, ~p is represented as the proposition p
with a truth value of false.  The technique for maintaining consistency
is to check for contradictory arguments whenever a proposition is
derived or asserted into a microtheory.  When contradictions are found,
their arguments are analyzed, and a decision is made regarding the truth
value of the propositions involved.  Ranking of beliefs, however, is
not a part of the system -- it uses specificity to determine the truth
value of a default belief.  For example:  Opus the penguin does not fly,
even though he is a bird, because penguins don't fly.  If there can be
no decision based on specificity, the truth value of the default belief
is unknown.  A default belief loses out to a monotonic one.  And,
lastly, according to Cyc trainers and other contacts, contradictions
that are purely monotonic bring the system to a halt until they are
fixed.  During Cyc training, Johnson attempted to prove this last
statement and failed -- revision was performed.  The instructors were
surprised, but thought the training interface might be the cause.  We
plan to explore this further, but it is an example of a system behaving
differently than it is described.
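A rough Python sketch of the resolution policy described above, assuming candidate beliefs arrive as `(value, source_class, kind)` tuples and that "more specific" means "a subclass of" in a small hypothetical hierarchy. It is not Cycorp's implementation; the function names and data are illustrative, and the Quaker/Republican case (the classic "Nixon diamond") is added as an example where specificity cannot decide.

```python
# Hypothetical class hierarchy: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "Penguin": {"Bird"},
    "Bird": {"Animal"},
    "Quaker": {"Person"},
    "Republican": {"Person"},
}


def superclasses(cls):
    """All proper superclasses of cls, following SUBCLASS_OF transitively."""
    found, stack = set(), list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return found


def resolve(candidates):
    """Pick a truth value from (value, source_class, kind) candidates, where kind
    is "monotonic" or "default".  Returns True, False, or None (unknown)."""
    hard = {value for value, _, kind in candidates if kind == "monotonic"}
    if len(hard) > 1:
        # Documented behaviour: a purely monotonic contradiction halts the
        # system until it is fixed (Johnson observed revision instead).
        raise ValueError("monotonic contradiction")
    if hard:
        return hard.pop()  # a monotonic belief beats any default

    defaults = [(value, cls) for value, cls, kind in candidates if kind == "default"]
    # A default survives unless a conflicting default comes from a more specific class.
    winners = {
        value
        for value, cls in defaults
        if not any(
            other != value and cls in superclasses(other_cls)
            for other, other_cls in defaults
        )
    }
    return winners.pop() if len(winners) == 1 else None


# Opus: "birds fly" (default) vs. "penguins don't fly" (default, more specific).
print(resolve([(True, "Bird", "default"), (False, "Penguin", "default")]))
# Nixon diamond: neither class is more specific, so the value is unknown.
print(resolve([(True, "Quaker", "default"), (False, "Republican", "default")]))
```

The first call prints `False` and the second prints `None`; adding a monotonic candidate to either list would override both defaults.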

As mentioned [above], Cyc did not perform as described, which raises the
question of whether it differs from its design theory in other ways.
Most specifically, Cyc literature [Cycorp, 2001b] claims to keep the
microtheories consistent, for lack of a better word.  When asked,
contacts agreed that, in spite of a cursory check, it was possible that
unknown contradictions might exist that had not yet been derived.  In
this sense, Cyc can only guarantee that its microtheories are not known
to be inconsistent (or KS-consistent).  Ideal terminology, such as
consistent and derivable, is often not appropriate for use with a large
or complex implemented system.
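The gap between "consistent" and "not known to be inconsistent" can be illustrated with a toy forward chainer. In this hypothetical Python sketch (the facts, rules, and function names are all illustrative assumptions, not Cyc's inference engine), a contradiction is detected only among literals derived so far, so a knowledge base that needs three inference steps to expose `s` and `~s` looks consistent after zero, one, or two steps.

```python
def negate(lit):
    """Toggle a leading '~' to form the complementary literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def derive(facts, rules, max_steps):
    """Forward-chain Horn rules (premises -> conclusion) for at most max_steps rounds."""
    known = set(facts)
    for _ in range(max_steps):
        new = {concl for premises, concl in rules if set(premises) <= known} - known
        if not new:
            break
        known |= new
    return known


def known_contradictions(literals):
    """Positive literals whose negation has also been derived (each pair reported once)."""
    return {lit for lit in literals if not lit.startswith("~") and negate(lit) in literals}


# Hypothetical knowledge base: p leads to s in three steps, but ~s is asserted.
FACTS = {"p", "~s"}
RULES = [(("p",), "q"), (("q",), "r"), (("r",), "s")]

for steps in range(4):
    print(steps, sorted(known_contradictions(derive(FACTS, RULES, steps))))
# prints:
# 0 []
# 1 []
# 2 []
# 3 ['s']
```

Until the third round of inference, the knowledge base is only KS-consistent in the paper's sense: no contradiction is known, although one is derivable.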

References

Cycorp [2001a] _Cycorp, Creators of the Cyc Knowledge Base_,
http://cyc.com

Cycorp [2001b] _Features of CycL_, http://cyc.com/cycl.html

The original article from which these paragraphs were extracted:

Frances L. Johnson and Stuart C. Shapiro, "Redefining belief change
terminology for implemented systems," _Inconsistency in Data and
Knowledge_, Working Notes from IJCAI'01, Seattle, Washington,
6 August 2001, pp. 11-21