
SUO: Sowa's comments about CNN article about Cyc


I just wanted to clarify some possible misunderstandings
about Cyc, some of which are no doubt due to media garbling.

1. I've made it a point to explain, in every talk I've given on Cyc
for the past 17 years, that the number of axioms is a red herring.
The example I usually give is having assertions like "No elk is
a goldfish", where it's better to have 10,000 Linnaean taxonomy
assertions plus a disjointness-unless-spec rule, rather
than have 100,000,000 rules of the elk/goldfish variety.  But
reporters keep asking about the number of assertions, and
there is a sense in which it's worth mentioning, namely that
we continually try to police the KB and find ways to generalize
and combine assertions, to REDUCE the number of assertions,
and despite our best efforts it is still 10**6 and growing.
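
To make the elk/goldfish point concrete, here is a minimal sketch in Python.
It is not Cyc code or CycL: the tiny taxonomy and the names PARENT, is_spec
and disjoint are hypothetical, and the taxonomy is assumed to be a tree.
One link per taxon plus a single rule stands in for every pairwise
"No X is a Y" assertion.

```python
# Hypothetical toy taxonomy: each taxon maps to its direct generalization.
# Illustrative only; this is not Cyc's representation.
PARENT = {
    "Elk": "Deer",
    "Deer": "Mammal",
    "Mammal": "Vertebrate",
    "Goldfish": "Fish",
    "Fish": "Vertebrate",
}


def ancestors(taxon):
    """Return the taxon itself followed by everything above it."""
    chain = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def is_spec(a, b):
    """True if a is b or a specialization of b."""
    return b in ancestors(a)


def disjoint(a, b):
    """Disjointness-unless-spec: two taxa are disjoint unless one of
    them is a specialization of the other."""
    return not (is_spec(a, b) or is_spec(b, a))


def pairwise_assertion_count(n_taxa):
    """Upper bound on explicit 'No X is a Y' assertions for n taxa."""
    return n_taxa * (n_taxa - 1) // 2
```

Here disjoint("Elk", "Goldfish") holds without any elk-specific assertion,
and the number of stored links grows linearly with the number of taxa while
the pairwise count grows quadratically.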

2. The systematic microfoam exception is easy to add to Cyc.
The fact that it's hard but composed of a type of stuff which normally is
soft is just an exception.  That's why we have been using default
logic for all these years; Cyc's reasoning has been based on
argumentation since the late 80's.  For the Nixon diamond, e.g.,
there are simply two arguments, one pro and one con, and
Cyc applies metarules (rules that conclude that one argument
is PREFERRED over another) to decide which argument to
believe, in any given context -- or whether they are incommensurable
in which case it (correctly) doesn't trust either argument except in
the specialized context in which that argument is assumed valid.
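
The pro/con scheme for the Nixon diamond can be sketched as follows. This is
a rough illustration in Python under stated assumptions, not Cyc's actual
machinery: Argument, decide and the example metarule are hypothetical names,
and a metarule is modeled simply as a function saying whether one argument
is preferred over another in the current context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    conclusion: bool  # True = pro, False = con
    basis: str        # what the argument rests on


def decide(arguments, prefers):
    """Return True or False if the undefeated arguments agree, else None.

    prefers(a, b) is a hypothetical metarule: it returns True when
    argument a is preferred over argument b in the current context.
    """
    undefeated = [
        a for a in arguments
        if not any(prefers(b, a)
                   for b in arguments if b.conclusion != a.conclusion)
    ]
    conclusions = {a.conclusion for a in undefeated}
    if len(conclusions) == 1:
        return conclusions.pop()
    return None  # incommensurable (or no argument at all): trust neither


# Query: is Nixon a pacifist?
pro = Argument(True, "Quaker")
con = Argument(False, "Republican")


def no_metarule(a, b):
    """No preference applies, so the two arguments stay incommensurable."""
    return False


def republican_wins(a, b):
    """Hypothetical context-specific metarule, for illustration only."""
    return a.basis == "Republican" and b.basis == "Quaker"
```

With no_metarule, decide([pro, con], no_metarule) returns None, i.e. neither
argument is believed; with the illustrative republican_wins metarule the con
argument prevails and the answer is False.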

3. ">I would like to see that knowledge tested and validated by
      > anyone who happens to be knowledgeable about the subject."
Absolutely.  There can/should/will be contexts for individuals' beliefs,
and when you ask a question you should be able to designate which
individuals (by name or description) you want to trust.  E.g., in asking
a medical question, you might trust the AMA official position, but not
M.D.s in general, except for those who have at least
10 refereed publications in J.AMA, plus anyone that your brother
trusts.  And you might want to see answers using arguments
that involve information from some middling-trusted group, or
even groups you strongly distrust (e.g. terrorist organizations),
but with explicit annotation of who claims/believes this.  Out of all
this, one organization you might choose to trust in general is
Good Housekeeping.  Another might be the OpenCyc Committee.
Another set might be those vetted by the OpenCyc Committee, etc.
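
A rough sketch of such a trust-scoped query, in Python. Everything here is
an assumption made for illustration (the Source and Claim records, the
field names, and the trusted/answer functions); it only mirrors the medical
example above and says nothing about how Cyc or OpenCyc would implement it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    is_md: bool = False
    jama_pubs: int = 0  # refereed publications in J.AMA


@dataclass(frozen=True)
class Claim:
    statement: str
    source: Source


def trusted(source, brother_trusts):
    """The asker's trust policy from the medical example."""
    return (
        source.name == "AMA"
        or (source.is_md and source.jama_pubs >= 10)
        or source.name in brother_trusts
    )


def answer(claims, brother_trusts, also_show=frozenset()):
    """Return (statement, source name, status) triples.

    Claims from trusted sources are marked 'trusted'.  Claims from
    sources listed in also_show are still returned, but explicitly
    annotated as merely claimed by that source.  All others are dropped.
    """
    results = []
    for claim in claims:
        if trusted(claim.source, brother_trusts):
            results.append((claim.statement, claim.source.name, "trusted"))
        elif claim.source.name in also_show:
            results.append((claim.statement, claim.source.name, "claimed only"))
    return results
```

The point of the third field is that every answer carries who
claims/believes it, so a distrusted group's position can be seen without
being mistaken for an endorsed one.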

4. ">Fifty years ago..."
I think this arises from a misunderstanding: thinking we are claiming
Cyc is somehow correct or timeless or globally consistent knowledge.
Cyc assertions are default-true in the context they were entered (and
other contexts that inherit or lift from that context), and no more.  So
things can be true in one context and absent from or false in others.
And yes, naturally, one dimension of context space is time. So
things can be true at one time and absent from (i.e., meaningless)
or false at other times.
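
A minimal Python sketch of context-relative truth with three outcomes (true,
false, absent). The Context class, its tell/lookup methods and the example
contexts are hypothetical and are not Cyc's context mechanism; inheritance
here is a plain parent lookup, and time is treated as just another context.

```python
class Context:
    """A bag of default-true/default-false assertions plus parent contexts."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)
        self.assertions = {}  # statement -> True / False

    def tell(self, statement, value=True):
        """Assert a statement as default-true (or default-false) here."""
        self.assertions[statement] = value

    def lookup(self, statement):
        """Return True, False, or None if the statement is absent.

        A local assertion wins; otherwise the first parent context
        that says anything about the statement decides.
        """
        if statement in self.assertions:
            return self.assertions[statement]
        for parent in self.parents:
            value = parent.lookup(statement)
            if value is not None:
                return value
        return None


# Two time-indexed contexts and one context that inherits from the later one.
year_1950 = Context("Year1950")
year_2002 = Context("Year2002")
travel_2002 = Context("TravelPlanning2002", parents=[year_2002])

year_1950.tell("Alaska is a U.S. state", False)
year_2002.tell("Alaska is a U.S. state", True)
year_2002.tell("Airlines sell tickets on the Web", True)
```

Here the Alaska statement is false in Year1950 and true in Year2002 (and,
by inheritance, in TravelPlanning2002), while the Web statement is simply
absent from Year1950 rather than false there.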

5. ">Since 1984, he has been predicting that Cyc
       >would reach human-like abilities in five years"
That's a little cruel; the horizon has approached several months per
year.  Part of the problem here is the media, I think, since
the reporters ask where we'll be in 5 years, rather than what the
next qualitative state-change will be and when we'll be there.
In our 1984 project plan at MCC, and talks based on it, I
would always show -- and in fact STILL show the same slide, but
now in color in PowerPoint -- a graph with 3 S-curves representing
three phases I thought (in 1984) would take about a decade each:
getting Cyc manually to the point where manual ontologizing would
yield to interactive dialogue and tutoring; then another decade to
virtually automated NLU (information extraction from text at a level
where the extracted information would be as good as a human
doing the manual translation of the same written material);  and
then a third decade to "real" open-ended machine learning.  The
first decade happened to have taken 16 years, which just shows
I was thinking in hexadecimal all along.  Seriously, that was longer
than planned but I consider getting within a factor of 2 pretty good.
I now believe that this second decade (getting the tutoring to be
easier and more automatic, approaching real NLU) will take less
than 10 more years, partly because of the Web, so we'll be starting
our focus on machine learning around 2007, a few years later than
planned in 1984.

6. ">Instead, I believe that the organization of the knowledge base in a
      >flexible, dynamically changeable structure is far, far more important."
We have no ideological axe to grind here.  We evolved from frames to
FOPC to HOL as we had to; we evolved away from cf's and to arguments
because we had to; we evolved the context mechanism because we had
to; etc.  When you demonstrate something we can use, rest assured we
will add it into the mix.  We view the inference engine, the logic, even
the CONTENT of the KB to be just scaffolding to facilitate the construction
of what will come later; much like building a sand castle where you get
the rough form in place and gradually get the details right.  I don't
CURRENTLY believe that structure, as you mean it, is that important,
e.g. the sort of symmetry that Aristotle mystically ascribed great value to,
nor do I believe the number of axioms is important.  But I am willing to
be proven wrong empirically, and I will (hopefully) continue to evolve.
What I do believe is important is to get ENOUGH coverage (deductive
closure), and ENOUGH "traction" (some combination of consistency
and correct contextualization, plus efficient enough inference modules)
to get to the next stage:  from manual entry to tutoring to automated NL/ML.
Feeling too much like lungfish crawling from the sea, we're slowly, painfully,
getting there.