SUO: Some comments about Cyc
Following is another copy of my note about Cyc, which was included
as an addendum to another note. I'm resending it now to serve as
a resource for some further discussions related to the notion of
a single monolithic ontology vs. a modular ontology. Some relevant
points:
1. Cyc has supported microtheories since Guha implemented them
while he was working on his PhD dissertation (which was
completed in 1991).
2. The microtheories are widely used for lower-level theories,
but Adam has made the point that the upper levels consist of
a single large theory. I have not heard or seen any arguments
why the microtheories were not used to organize the upper
levels, but my suspicion is that it was easier for Guha to
add the microtheories to the lower levels, which were still
in development, than to reorganize the entire upper level,
which was already fixed in place.
3. However, if microtheories (or modules) are going to be used
for the lower levels, there is no reason not to use them for
the upper levels as well -- especially for a new project that
is just being developed from scratch. Cyc, for example, has
made some major reorganizations of its upper levels during the
past 10 years (as Fritz Lehmann has pointed out many times).
We can certainly expect such reorganizations to take place in any
ontology developed by SUO, and a modular structure would
facilitate the changes by making clear exactly which parts
have been changed. (Any module that does not inherit from
one of the changed modules is not affected by the change;
that property is certainly a big help in managing change.)
4. As the following note illustrates, the Cyc developers themselves
do not fully understand what is in Cyc and how those features
interact. Right now, the only person who really knows what is
in SUMO is Ian, and I seriously doubt whether he or anyone else
fully understands all the implications of SUMO. Furthermore,
if Ian took a vacation for a couple of weeks, I doubt whether
he could really be said to know what is in SUMO when he got back.
5. Adam keeps asking for tools to manage the modules. Cyc does
have such tools, and they are promising to release some parts
of Cyc very soon. Perhaps we could ask them for those tools,
and ask that they be put under some license, such as LGPL,
which would allow them to be merged with proprietary code
without requiring the proprietary parts to be released.
6. And the clinching argument is that Cyc upper levels are going
to be released very soon. Cyc is certainly bigger than SUMO,
and a lot more effort has gone into it than into SUMO. If IEEE is
going to standardize any ontology, there are probably more
arguments for standardizing Cyc than for standardizing SUMO.
However, many of us have concerns about buying a "pig in a
poke" -- a large undocumented system that we haven't had a
chance to examine in detail. That is certainly true of Cyc,
but it is also true of SUMO -- at least for everyone but Ian.
Therefore, it would be much better to have a modular system,
in which we could "buy" or "certify" one module at a time
rather than accept an all-or-nothing monolith.
7. No one today knows whether there are inconsistencies between Cyc
and SUMO, but the probability of inconsistencies is extremely
high. For any particular inconsistency, no one today can tell
us whether the Cyc version or the SUMO version or some other
version would be preferable. We need to establish a framework
that would enable us to adopt and certify one module at a time
when its suitability has been determined.
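
The change-management property in point 3 (a module is unaffected unless
it inherits from a changed module) can be made concrete with a short
sketch. This is only an illustration under assumptions: the module names
and the inherits_from table are hypothetical and are not taken from Cyc
or SUMO, and inheritance is assumed to be transitive.

```python
from collections import deque

def affected_modules(inherits_from, changed):
    """Return the changed modules plus every module that inherits,
    directly or transitively, from one of them."""
    # Invert the edges: parent -> modules that inherit directly from it.
    children = {}
    for module, parents in inherits_from.items():
        for parent in parents:
            children.setdefault(parent, set()).add(module)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected

# Hypothetical modules; each entry lists the modules it inherits from.
inherits_from = {
    "Time": [],
    "Space": [],
    "Process": ["Time"],
    "Geography": ["Space"],
    "Logistics": ["Process", "Geography"],
}

print(sorted(affected_modules(inherits_from, ["Time"])))
# ['Logistics', 'Process', 'Time']
```

In this toy hierarchy a change to Time leaves Space and Geography
untouched, so only three of the five modules would need to be
re-examined or re-certified.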
Bottom line: The modular approach would allow us to adopt the best
modules of both Cyc and SUMO after they have been analyzed, tested,
and certified instead of taking everything in one undigested lump.
John Sowa
_____________________________________________________________________
Some Observations about Cyc
[The following comments on Cyc have been extracted from a paper that
was presented by Stuart Shapiro at an IJCAI Workshop (citation below).
The evaluation of Cyc is based on Cycorp documentation and on experience
by the first author (Frances Johnson) during a Cyc training course.]
Doug Lenat and Cycorp have developed Cyc [Cycorp, 2001a] -- a large
knowledge base and inferencing system that is built upon a core of over
a million hand-entered assertions or rules about the world and how it
works. This system attempts to perform commonsense reasoning with the
help of this large corpus of beliefs (mostly default with some that are
monotonic). It divides its knowledge base into smaller contexts called
microtheories which contain specialized information regarding specific
areas (such as troop movement, physics, movies, etc.). Belief
revision is performed within microtheories or within a small group
of microtheories that are working together, and the system is only
concerned with maintaining consistency within that small group (as
opposed to across the entire belief space). For example: in an
everyday context, a table is solid, but within a physics context,
it is mostly space (between atoms).
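
A minimal sketch of the idea that consistency is checked only within a
group of cooperating microtheories, not across the whole belief space.
The microtheory names, the dictionary layout, and the function
conflicts_within are all invented for illustration; nothing here is
Cyc's actual representation or API.

```python
# Each hypothetical microtheory maps a proposition to one truth value.
microtheories = {
    "EverydayMt": {"table is solid": True},
    "PhysicsMt": {"table is solid": False},
    "FurnitureMt": {"table is solid": True},
}

def conflicts_within(group):
    """List propositions that receive different truth values from the
    microtheories named in `group`. Microtheories outside the group
    are never consulted."""
    seen = {}
    conflicts = set()
    for name in group:
        for prop, value in microtheories[name].items():
            if prop in seen and seen[prop] != value:
                conflicts.add(prop)
            seen.setdefault(prop, value)
    return sorted(conflicts)

print(conflicts_within(["EverydayMt", "FurnitureMt"]))  # []
print(conflicts_within(["EverydayMt", "PhysicsMt"]))    # ['table is solid']
```

The everyday and physics contexts disagree about the table, but that
disagreement only becomes a contradiction to resolve if the two are put
into the same working group.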
A belief can have only one truth value, so no microtheory can contain
both p and ~p. For example, ~p could be expressed as the proposition p
with a truth value of false. The technique for maintaining consistency
is to check for contradictory arguments whenever a proposition is
derived or asserted into a microtheory. When contradictions are found,
their arguments are analyzed, and a decision is made regarding the truth
value of the propositions involved. Ranking of beliefs, however, is
not a part of the system -- it uses specificity to determine the truth
value of a default belief. For example: Opus the penguin does not fly,
even though he is a bird, because penguins don't fly. If there can be
no decision based on specificity, the truth value of the default belief
is unknown. A default belief loses out to a monotonic one. And,
lastly, according to Cyc trainers and other contacts, contradictions
that are purely monotonic bring the system to a halt until they are
fixed. During Cyc training, Johnson attempted to prove this last
statement and failed -- revision was performed. The instructors were
surprised, but thought the training interface might be the cause. We
plan to explore this further, but it is an example of a system behaving
differently than it is described.
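
The resolution rules described in the paragraph above can be sketched as
follows. This models the behavior as described by the documentation and
trainers, not the behavior Johnson actually observed, and it rests on
assumptions: the function resolve, the tuple layout, and the use of an
integer to stand for specificity are all hypothetical simplifications.

```python
def resolve(arguments):
    """Choose a truth value for one proposition from competing arguments.

    Each argument is a tuple (value, kind, specificity), where kind is
    "monotonic" or "default" and a larger specificity means a more
    specific rule. Returns True, False, or None for "unknown".
    """
    monotonic = {value for value, kind, _ in arguments if kind == "monotonic"}
    if len(monotonic) > 1:
        # Described behavior: a purely monotonic contradiction halts
        # the system until someone repairs it.
        raise RuntimeError("monotonic contradiction: manual repair needed")
    if monotonic:
        # A default belief loses out to a monotonic one.
        return monotonic.pop()

    # Only default arguments remain; the most specific ones decide.
    defaults = [(spec, value) for value, _, spec in arguments]
    if not defaults:
        return None
    top = max(spec for spec, _ in defaults)
    winners = {value for spec, value in defaults if spec == top}
    # If the most specific arguments disagree, the value is unknown.
    return winners.pop() if len(winners) == 1 else None

# Opus: "birds fly" (less specific) vs. "penguins don't fly" (more specific).
opus_flies = resolve([(True, "default", 1), (False, "default", 2)])
print(opus_flies)  # False
```

Representing specificity as a single number is a deliberate shortcut; in
a real class hierarchy two rules may simply be incomparable, which is
the case the "unknown" result is meant to stand for.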
As mentioned [above], Cyc did not perform as described, and there must
be some question as to other possible differences from design theory.
Most specifically, Cyc literature [Cycorp, 2001b] claims to keep the
microtheories consistent, for lack of a better word. When asked,
contacts agreed that, in spite of a cursory check, it was possible that
unknown contradictions might exist that had not, yet, been derived. In
this sense, Cyc can only guarantee that its microtheories are not known
to be inconsistent (or KS-consistent). Ideal terminology, such as
consistent and derivable, is often not appropriate for use with a large
or complex implemented system.
References
Cycorp [2001a] _Cycorp, Creators of the Cyc Knowledge Base_,
http://cyc.com
Cycorp [2001b] _Features of CycL_, http://cyc.com/cycl.html
The original article from which these paragraphs were extracted:
Frances L. Johnson and Stuart C. Shapiro, "Redefining belief change
terminology for implemented systems," _Inconsistency in Data and
Knowledge_, Working Notes from IJCAI'01, Seattle, Washington,
6 August 2001, pp. 11-21