Directions for future research
Since I have made some critical comments about current
R & D projects that address ontology, the Semantic Web,
and related topics, somebody sent me an offline note
to ask what kind of research I would recommend.
Following is my response:
1. I believe that the most difficult and important
problem for any database system, knowledge-based
system, learning system, AI system, or the entire
Semantic Web, is to address what I call the challenge
of knowledge soup:
http://www.jfsowa.com/talks/challenge.pdf
2. The collection of problems that I lump under the
name of knowledge soup has been addressed repeatedly
by various partial solutions: nonmonotonic reasoning,
standardized terminologies, nomenclatures, ontologies,
information extraction, data mining, knowledge discovery,
genetic algorithms, etc.
3. But all the attempts break down when they get to one
fundamental problem: the world is far more complex
than any discrete notation, language, or other
representation is capable of handling.
4. The problems of dealing with natural languages are
so well known that many people have given up. Since
they recognize that those problems are too difficult
to solve with current methods, they look for a "quick
fix" by some other means. That was what Frege, Russell,
and Carnap attempted when they proposed symbolic logic
as a more perfect, ideal language. But it failed.
5. The failure of symbolic logic as a replacement for NL
does not mean that any system of logic is bad for what
it is capable of doing well. It just means that no
version of logic (which includes every language or
notation for knowledge or data representation) can,
by itself, solve the problems that make NLP difficult.
6. At best, any system of logic or knowledge representation
addresses some useful special cases. One example is SQL,
which runs the databases that support the world economy.
That is certainly an important special case, and there
are many other important special cases.
7. But just lumping all the solvable special cases in one
big package is not going to address the fundamental issue:
The world is so complex that any representation that is
adequate to handle the full range of problems must be as
general and flexible as natural languages. Nothing less
will do.
As one example of the kind of research that I believe should be
done in order to get some handle on the complexity, I would cite
the proposal that a couple of colleagues and I submitted as a
"Grand Challenge"; this was our response to a solicitation by
DARPA in December. I don't know whether they will adopt our
suggestion as one of their challenge problems, but I believe
it is one way of forcing people to address the issues:
http://www.jfsowa.com/ai/gcprop.pdf
Following is a brief description from the opening paragraph:
The task we suggested is one that nearly every two-year-old
child solves: the problem of learning to integrate visual,
tactile, and motor information with language. To evaluate
progress on this task, we proposed that any AI research group
that wished to respond to the challenge be given a collection
of binocular pictures, still or moving, together with some
natural-language questions about those pictures. Any AI
system they develop would be asked to determine which pictures
could answer any of the questions and to state those answers.
This is a problem that has been solved for some simple cases with
current technology, but a general solution would be, as DARPA
requested, a challenge and motivation for research over the
next 10 to 20 years.
Although this problem is not going to be solved soon, I believe
it helps to put current work into perspective. There are many
good special cases that are very nicely handled with current
technology, but it is essential to recognize that they are special
cases. That means that proposed solutions should be flexible
enough to be extended to fit into more general
solutions to the Grand Challenge that might be developed over
the next 10 to 20 years (or perhaps more).
John Sowa