RE: SUO: RE: Re: Missing Ingredients
Murray Altheim wrote:
> Richard Cooper wrote:
> [...]
> > This set of three questions is the most important triple we're
> > dealing with in all SUO work. Getting clear answers to how
> > meaning is represented, communicated, stored, compared and
> > organized would be a successful result.
>
> Rich,
>
> Yes, I agree very strongly that until there are some clear
> answers to this, until there is a foundation such as you
> describe, everything else is resting on sand. This is in
> part why I thought tackling the LBase document might be a
> good start, since it was Pat's best shot at the problem.
I'm not familiar with LBase. The reason I suggested WordNet
is that it's an ongoing, funded program by computational linguists
who release new versions with the same database format. Since
it's run under the auspices of Princeton University and has secure
funding, it is about as authoritative a list of words as
we can obtain at this time.
The other relationships in WordNet (hypernymy, synonymy, is-a,
...) are well organized and can be extracted as versions
change and evolve. And WordNet is free, which makes it likely
to become the standard in years to come.
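To make "extracting" a relationship concrete, here is a minimal sketch that walks hypernym (is-a) links upward from a word to the root with a recursive SQL query. It assumes a single hypothetical table `synset(id, lemma, hypernym_id)` holding a tiny made-up hierarchy with at most one hypernym per synset; real WordNet synsets can have several hypernyms and several words each. It uses SQLite through Python's built-in `sqlite3` module, not the SQL Server setup mentioned in this thread.

```python
import sqlite3

# Hypothetical, simplified is-a hierarchy: (id, lemma, hypernym_id).
# Ids are made up, not WordNet offsets, and each synset has at most one
# hypernym; real WordNet synsets can have several.
SAMPLE = [
    (1, "entity", None),
    (2, "physical_entity", 1),
    (3, "object", 2),
    (4, "organism", 3),
    (5, "person", 4),
    (6, "customer", 5),
]

def hypernym_chain(conn, lemma):
    """Follow hypernym links from `lemma` to the root, nearest first.

    Assumes the hierarchy is acyclic; a cycle would make the query loop.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, lemma, parent, depth) AS (
            SELECT id, lemma, hypernym_id, 0 FROM synset WHERE lemma = ?
            UNION ALL
            SELECT s.id, s.lemma, s.hypernym_id, c.depth + 1
            FROM synset AS s JOIN chain AS c ON s.id = c.parent
        )
        SELECT lemma FROM chain ORDER BY depth
        """,
        (lemma,),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE synset (id INTEGER PRIMARY KEY, lemma TEXT, hypernym_id INTEGER)"
)
conn.executemany("INSERT INTO synset VALUES (?, ?, ?)", SAMPLE)
print(hypernym_chain(conn, "customer"))
# ['customer', 'person', 'organism', 'object', 'physical_entity', 'entity']
```

The recursion stops at the root because joining on a NULL `hypernym_id` matches no rows, so the same query works unchanged as new versions add or reorganize entries.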
So WordNet is a useful thing to build tools around for
extracting word 'meanings' as defined by a dictionary.
The controlled vocabulary John talks about seems to me to
be good conceptual fodder, so let's make one that can be
revised and restructured as new uses dictate.
Of course, WordNet is just the most general of dictionaries,
but it could be augmented in specific contexts with words
and meanings, so long as the WordNet database format is
followed. It just gives us the initial everyday words
needed to get started.
I've been working on a tool to build an SQL database from
the WordNet dictionary so that the various relationships
can be automatically extracted. It uses SQL Server 2000,
but in principle, it could be ported to other databases
pretty easily. The tool itself is written in Delphi.
If anyone is interested in collaborating, please let me
know.
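For anyone curious what the loading step involves, here is a rough sketch in Python against SQLite rather than the Delphi/SQL Server 2000 tool described above. It assumes the noun data-file layout documented in WordNet's wndb(5WN) page: synset offset, lexicographer file number, synset type, a two-digit hexadecimal word count, word/lex_id pairs, a three-digit decimal pointer count, four-field pointers, then ` | ` and the gloss. Verb frames are not handled. The table names, function names, and sample lines are hypothetical, and the offsets are made up.

```python
import sqlite3

def parse_data_line(line):
    """Split one noun synset line from a WordNet data file into its parts."""
    fields, _, gloss = line.partition(" | ")
    tok = fields.split()
    synset_id, pos = tok[0], tok[2]      # tok[1] (lex_filenum) is unused here
    w_cnt = int(tok[3], 16)              # word count is written in hexadecimal
    words = [tok[4 + 2 * k] for k in range(w_cnt)]  # each word is followed by a lex_id
    i = 4 + 2 * w_cnt
    p_cnt = int(tok[i])                  # pointer count is written in decimal
    pointers = [tuple(tok[i + 1 + 4 * k : i + 5 + 4 * k]) for k in range(p_cnt)]
    return synset_id, pos, words, pointers, gloss.strip()

def load(conn, lines):
    """Create hypothetical tables and load parsed synsets, words and pointers."""
    conn.executescript("""
        CREATE TABLE synset  (id TEXT PRIMARY KEY, pos TEXT, gloss TEXT);
        CREATE TABLE word    (synset_id TEXT, lemma TEXT);
        CREATE TABLE pointer (src_id TEXT, symbol TEXT, dst_id TEXT, dst_pos TEXT);
    """)
    for line in lines:
        if not line.strip() or line.startswith("  "):  # skip blanks and license header
            continue
        sid, pos, words, ptrs, gloss = parse_data_line(line)
        conn.execute("INSERT INTO synset VALUES (?, ?, ?)", (sid, pos, gloss))
        conn.executemany("INSERT INTO word VALUES (?, ?)", [(sid, w) for w in words])
        conn.executemany(
            "INSERT INTO pointer VALUES (?, ?, ?, ?)",
            [(sid, sym, dst, dpos) for sym, dst, dpos, _ in ptrs],  # drop source/target field
        )

def hypernyms(conn, lemma):
    """Words in the direct hypernym ('@' pointer) synsets of `lemma`."""
    rows = conn.execute(
        """SELECT w2.lemma FROM word AS w1
           JOIN pointer AS p ON p.src_id = w1.synset_id AND p.symbol = '@'
           JOIN word AS w2 ON w2.synset_id = p.dst_id
           WHERE w1.lemma = ?""",
        (lemma,),
    ).fetchall()
    return [r[0] for r in rows]

# Made-up sample lines in the data-file layout; offsets and glosses are hypothetical.
SAMPLE = [
    "  1 made-up license header line",
    "00000050 03 n 01 organism 0 000 | a living thing",
    "00000100 18 n 01 person 0 001 @ 00000050 n 0000 | a human being",
    "00000200 18 n 02 customer 0 client 0 001 @ 00000100 n 0000 | someone who pays for goods or services",
]

conn = sqlite3.connect(":memory:")
load(conn, SAMPLE)
print(hypernyms(conn, "customer"))  # ['person']
```

Because every pointer is stored as a row, other relationships can be pulled out the same way by changing the symbol in the query (for example `~` for hyponyms).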
> John and Jon have both written extensively on this, but
> rather than point at existing, lengthy texts, I think it'd be
> good if we just talked, informally, until the formalisms and
> the assumptions behind them become clearer. Background
> reading is good too, but conversation seems called for.
>
> > We have predefined the answer to be an ontology. Then we refined
> > that concept to include the lattice of ontologies, plus the IFF
> > framework, but I still get the feeling there's a lot of stuff left
> > out.
> >
> > So I agree with Tom that the focus should be refined further
> > to incorporate real world database concepts, and I add one more
> > suggestion: that we should be working with natural language
> > words and sentences to impose the type structure, or class
> > structure, and property lists, of common everyday concepts like
> > address, customer, person, ..., fill in your favorite concepts.
> >
> > Finally, since we haven't been able to agree on more enhanced
> > ontologies than WordNet, perhaps we should start the bottom-up
> > process by extracting exactly the ontology that WordNet provides.
> > This could be one of the bottom-level concept sets, along with
> > others that may appear in the lattice as we continue.
>
> Before we head down the WordNet road, a number of questions
> come to mind. The first is that if there is to be any value in
> using something like WordNet, we have to assume that it has some
> kind of legitimate, ordered, and relatively accurate structure to
> its ontology. This would rely on language itself having the same
> kinds of order (i.e., any structures meant to describe natural
> language must assume that natural language has structure), and
> on WordNet not simply being arbitrarily ordered in some way
> similar to the ordering of natural language. (Perhaps
> "arbitrary" is not quite right -- "organic" or even recursively
> associative, but I don't think of natural language as remotely
> being formally ordered, not even within one person's head, much
> less the world population. If there is an inherent structure in
> natural language, it is so enormously more complex and intricate
> than our current understanding that we cannot expect to find a
> solution for many decades, which jibes with what I hear from the
> computational linguists.)
WordNet represents the best efforts of linguists to organize
what is known about English usage. It's the growing, evolving
result of their work. So it does seem to be ordered, and I
would bet the linguists at Princeton would love to get any
feedback we can give them on disorderly entries.
Natural language processing as a whole won't be solved for a very
long time, but the dictionary of word meanings and synonyms is
reasonably well defined for now. Of course, word meanings change
over time and new words are added, but that is simply the dynamic
nature of language. So I think we should build our tools to work
on a changing, dynamic dictionary, hence WordNet.
> My difficulty with this is that from my understanding of the
> current state of computational linguistics, we're many, many
> years from understanding language to that level, and indeed,
> several comp. ling. experts I've heard (such as Geoff Nunberg,
> who spoke at ICCS 2001) or read imply that it's likely that
> language is so inextricably bound to individuals and individual
> contexts and usages, linguistic families, cultures, communities,
> etc., that no formal set of rules for natural languages is
> likely to be devised. I tend to rule out WordNet as a tool to
> support computer-based reasoning for this reason, and just
> consider it a handy tool for human judgment and use, like a
> book-form thesaurus.
Again, I don't think we have to tackle all of natural language
to do useful work on ontologies. But I think we do need to
use words the way people express themselves.
<snip>
> [I'm not sure if this is clear, but it's the best my neurons
> can do for now... they're kinda fried lately. I'm waiting for
> a convenient time for a gin and tonic, after I don't need them,
> when I can properly fry them.]
>
> Murray
I think I'll join you - the day is hereby over.
Rich