RE: SUO: RE: RE: Re: Missing Ingredients
Tom Johnston wrote:
> I must admit to being skeptical about using WordNet. I've finally
> realized why. Two reasons.
>
> 1. As I've explained before, I'll bet I can give you a dozen
> definitions of "customer", each of them a real one, used by a real
> company, such that the WordNet definition of "customer" fails to
> discriminate among any of them. So interesting, tough semantic
> problems are less likely to arise from an analysis of WordNet.
Good programmers choose names for database columns that code
reviewers will readily recognize. WordNet lists all the senses
it records for a word like 'customer' or 'address'.

In the 1980s, there were several research projects aimed at
translating English requirements documents into designs. Since
there was no WordNet at the time, and computers were still
very slow, expensive and forgetful, that effort didn't get
very far, and the government agencies funding it weren't
able to justify their budgets.

I think a controlled language based on WordNet, using the verb
frames, hypernymy/hyponymy and other relations that have been
made freely available, could revitalize that kind of work.
Now that OO tools like Delphi and Common Lisp are widespread,
there is more understanding of the need for ontologies, and
more agreement on that base of construction. There is also
work on various XML dialects that identifies certain concepts
as important in specific domains. Cross-wiring these with a
common dictionary would improve interoperability.
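To make the cross-wiring idea concrete, here is a minimal Python
sketch. It assumes each XML dialect publishes a mapping from its own
element names to shared sense identifiers taken from a common
dictionary. The dialects, element names and `sense:...` identifiers
are hypothetical placeholders, not actual WordNet synset IDs, and
`translate` is an illustrative helper, not an existing API.

```python
# Hypothetical sketch: element names from two XML dialects are each mapped
# onto shared sense identifiers from a common dictionary. The identifiers
# below are illustrative placeholders, not real WordNet synset IDs.

# Dialect A (say, an order-processing schema) -> shared sense ID
DIALECT_A = {
    "Buyer": "sense:customer/1",
    "ShipTo": "sense:address/1",
}

# Dialect B (say, a CRM schema) -> shared sense ID
DIALECT_B = {
    "Client": "sense:customer/1",
    "MailingAddress": "sense:address/1",
}


def translate(record: dict[str, str],
              source: dict[str, str],
              target: dict[str, str]) -> dict[str, str]:
    """Rename the fields of `record` from the source dialect to the target
    dialect by going through the shared sense IDs. Fields with no shared
    sense raise KeyError instead of being silently dropped."""
    # Invert the target mapping: sense ID -> target element name.
    # Assumes at most one target element name per sense.
    by_sense = {sense: name for name, sense in target.items()}
    translated = {}
    unmapped = []
    for field, value in record.items():
        sense = source.get(field)
        if sense is not None and sense in by_sense:
            translated[by_sense[sense]] = value
        else:
            unmapped.append(field)
    if unmapped:
        raise KeyError(f"no shared sense for fields: {unmapped}")
    return translated
```

For example, `translate({"Buyer": "ACME Corp", "ShipTo": "1 Main St"},
DIALECT_A, DIALECT_B)` yields `{"Client": "ACME Corp", "MailingAddress":
"1 Main St"}`. Neither dialect needs to know about the other; each maps
only onto the shared dictionary, so adding a third dialect costs one
new mapping rather than one per existing dialect.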
> 2. If we use WordNet, where will we find the discoveries that bring
> up ambiguities, vaguenesses, homonymies, etc. that we must then
> cleanse WordNet of in ways I've already talked about?
By automated analysis of well-written English text, such as
dialogue in good fiction, scene descriptions, and other
annotated text collections. WordNet also has synsets, groups of
words that share a meaning in a given context. There is no
lack of ambiguity in these sources. What's needed is a way to
organize the concepts drawn from natural-language originals
into more structured, machine-processable forms that give
programmers a substrate on which to add further subtleties.
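A rough sketch of the first step of such an analysis: flag every word
in a text that has more than one sense in the dictionary, so the
ambiguity is surfaced for later resolution. The tiny `SENSES` table is
a hypothetical stand-in for a real WordNet lookup, and choosing among
the flagged senses, which needs context, is deliberately left out.

```python
import re

# Hypothetical sense inventory in the spirit of WordNet: each word maps to
# the sense identifiers it can express. Entries are illustrative only; a
# real system would load the inventory from WordNet itself.
SENSES = {
    "bank": ["sense:bank/river", "sense:bank/institution"],
    "customer": ["sense:customer/1"],
    "deposit": ["sense:deposit/money", "sense:deposit/sediment"],
}


def ambiguous_words(text: str,
                    senses: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each word in `text` that has more than one known sense,
    with its candidate senses, in order of first appearance."""
    found: dict[str, list[str]] = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        candidates = senses.get(token, [])
        if len(candidates) > 1 and token not in found:
            found[token] = candidates
    return found
```

On "The customer made a deposit at the bank." this flags "deposit" and
"bank" but not "customer", which has only one sense in the toy table.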
> These discoveries come out of recognized needs. In business, that
> comes when a decision maker finds that a screen or report,
> purportedly containing the information he needs, really contains
> less, or more, or different information than what he had in mind.
That's an iterative process. Shown a program for the first time,
nearly everyone wants something a little different. Build to
that spec, and the second iteration reveals still more needs,
wishes and requirements. There is no end to software
development, as I'm sure you know. What's needed are
substructures that let users make these changes without
programming. Let the user do the interpretation, but give
her the means to do it immediately, iteratively, and with
changing goals in mind.
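One way to read "substructures that let users make these changes
without programming" is to keep the report definition as data the
user edits, interpreted by a single generic routine. A minimal sketch,
assuming a flat list of sales rows; `ORDERS`, `run_report` and the
spec keys (`group_by`, `measure`, `where`) are hypothetical names, not
any particular product's API.

```python
from collections import defaultdict

# Hypothetical sample rows; a real system would read these from a database.
ORDERS = [
    {"region": "East", "product": "widget", "amount": 120.0},
    {"region": "East", "product": "gadget", "amount": 80.0},
    {"region": "West", "product": "widget", "amount": 50.0},
]


def run_report(rows, spec):
    """Interpret a user-editable report spec: keep only rows whose fields
    match every key/value in spec.get("where", {}), then sum
    spec["measure"] grouped by spec["group_by"]."""
    where = spec.get("where", {})
    totals = defaultdict(float)
    for row in rows:
        if all(row.get(key) == value for key, value in where.items()):
            totals[row[spec["group_by"]]] += row[spec["measure"]]
    return dict(totals)


# First iteration: sales by region.
first = {"group_by": "region", "measure": "amount"}
# Second iteration: the user adds a filter by editing the spec, not the code.
second = {"group_by": "region", "measure": "amount",
          "where": {"product": "widget"}}
```

`run_report(ORDERS, first)` gives `{"East": 200.0, "West": 50.0}`, and
`run_report(ORDERS, second)` gives `{"East": 120.0, "West": 50.0}`. The
interpreter stays fixed while the spec changes from one iteration to
the next.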
> (Yes, non-Peirceans can talk about what people have in mind, can
> realize that meaning isn't inherent in phonetic and orthographic
> tokens but rather in the minds of the interpreters.)
>
> An example is in a series of three articles I recently wrote at
> datawarehouse.com. The subject was zip codes. A certain zip code was
> split into two by the post office. Call it zip code 12345, split
> into a new 12345 and also 12346. (This is how the post office does
> it. Talking to them about 12345 now being a homonym isn't going to
> change their practice, and for good reasons, too.)
>
> The split happened a couple of months ago. The big boss decision
> maker says he wants to see the history of sales for his division, by
> the zip code of the customers to whom those sales were made. The
> report lands on his desk. Sales for 12345 are shown to have dropped
> by 60% two months ago.
>
> This is known, in business database management, as the "as was vs.
> as is" problem. In this case, it will probably be pretty easy to
> figure out what happened. But often the cause is harder to
> discover. Suppose that what the boss sees is a list in which there
> is some highly derived dollar figure, and in which sales by
> customer zip code by month is only a small contributing factor to
> that derived amount. Now the boss starts to see a downward trend in
> certain entries on that list that started two months ago but that,
> even a year from now, shows no signs of correcting itself.
>
> Well, anyway, the issue here is what "..... by zip code" means. Does
> it mean (a) by the definition of the zip code as of some date in the
> past, or (b) by the definition of the zip code as it is currently?
> ("Definition of the zip code" really means "referent of the proper
> name 12345", of course.)
>
> Point is: using examples from databases where the semantics vary
> considerably from one database to another, and also vary to a
> degree for the same database over time, will give us some meat to
> sink our teeth into. I don't see how just using WordNet will do
> that.
It can if we use it to accomplish that goal.
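The "as was vs. as is" distinction in the zip code example can be made
concrete with an effective-dated history table. The sketch below is
hypothetical: the customers, dates, amounts and split date are invented
for illustration, and each customer is assumed to have exactly one zip
code in effect on any given date.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Hypothetical data: zip code 12345 is split on 2003-08-01, and customer
# C2's neighborhood is reassigned to the new 12346. Each row is
# (customer, zip, effective_from); a row stays in effect until the next one.
ZIP_HISTORY = [
    ("C1", "12345", date(2000, 1, 1)),
    ("C2", "12345", date(2000, 1, 1)),
    ("C2", "12346", date(2003, 8, 1)),
]

# Hypothetical sales: (customer, sale_date, amount)
SALES = [
    ("C1", date(2003, 7, 15), 100),
    ("C2", date(2003, 7, 20), 100),
    ("C1", date(2003, 9, 10), 100),
    ("C2", date(2003, 9, 12), 100),
]


def zip_on(customer: str, when: date) -> str:
    """Zip code in effect for `customer` on `when`: the latest history
    row whose effective_from is on or before that date."""
    in_effect = [(start, z) for c, z, start in ZIP_HISTORY
                 if c == customer and start <= when]
    return max(in_effect)[1]


def sales_by_zip(as_of: Optional[date] = None) -> dict[tuple[str, str], int]:
    """Total sales keyed by (zip, "YYYY-MM").
    as_of=None   -> "as was": each sale uses the zip in effect on its date.
    as_of=<date> -> "as is": every sale uses the zip in effect on as_of."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for customer, sold_on, amount in SALES:
        z = zip_on(customer, sold_on if as_of is None else as_of)
        totals[(z, sold_on.strftime("%Y-%m"))] += amount
    return dict(totals)
```

With `sales_by_zip()` ("as was"), 12345 falls from 200 in 2003-07 to
100 in 2003-09, because part of its old territory now reports as 12346.
With `sales_by_zip(date(2003, 10, 1))` ("as is"), each customer's
history is regrouped under the current zip code, so both series stay
flat, even though 12346 did not exist in July. Neither answer is wrong;
the report has to say which one it is giving.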
Rich
> -----Original Message-----
> From: owner-standard-upper-ontology@majordomo.ieee.org
> [mailto:owner-standard-upper-ontology@majordomo.ieee.org]On Behalf Of
> Richard Cooper
> Sent: Wednesday, October 22, 2003 3:49 PM
> To: Murray Altheim
> Cc: Tom Johnston; Jon Awbrey; SUO
> Subject: RE: SUO: RE: RE: Re: Missing Ingredients
>
> Murray Altheim wrote:
> > Richard & Tom,
> >
> > I perhaps wasn't clear in my previous message regarding use of
> > WordNet as a resource for building reasoning engines.
> >
> > It's not that WordNet doesn't have categories or any structure;
> > it's that in order to build a system that uses natural language,
> > you need to build into its architecture the notion that each of
> > these words has essentially *no* meaning outside of a given
> > context. Meaning only comes via interpretation; it's not
> > bound somehow essentially within the word. So the specific string
> > of characters "parrot" has no meaning whatsoever. The meaning you
> > think it has is there because *you've* interpreted it. When you
> > go to a thesaurus, how do you choose the "correct" word? From
> > experience, you do so by looking at the context in which you plan
> > to use it, and make a judgement. The reverse, from a machine POV,
> > is not straightforward. I think this is what Jon alluded to when
> > he says we don't have a clue about how to tackle it.
>
> That is certainly true of natural language, but I'm using WordNet
> as a "lexical resource", as it's called, not to interpret natural
> language in all its complexity. The point is to select a project
> scope that is within reach. WordNet is within reach, and fits
> the requirement for defining the commonly held interpretations
> people have for words. But that doesn't mean full natural
> language in all its variations; it's just a vocabulary of words
> that have defined meanings, is computer processable, and can be
> applied to ontology development.
>
> The alternative is to have everybody who designs an ontology
> make up their own words. Since an authoritative dictionary
> like WordNet exists, it's far better to use the definitions from
> that source than to have an ontologist make up words.
>
> <snip/>
> > A child's understanding of "bird" is much different than a
> > biologist's, and to my understanding, DNA evidence has completely
> > mucked up zoological taxonomy, showing that there are no hard and
> > fast boundaries between species, that "species" may be as flawed
> > a concept as "race". So what does "bird" mean? Absent a specific
> > context, a specific interpretation, nothing at all. Meaning doesn't
> > exist on the printed page or in the words themselves -- it only
> > exists in our heads.
>
> Yes, but we have no instruments that can measure what goes on
> in our heads. So a commonly used set of definitions is better
> than no definition at all, or an unrefereed salad of words
> chosen without regard to any natural language interpretations.
>
> > Point is, there's a "pragmatic" and a Pragmatic way to go about
> > solving this problem. What may seem like a pragmatic approach (to
> > ignore the complexities of language, to jump nominalistically
> > into the fray of WordNet) will likely in the end bite you in the
> > ass. Or somebody else, maybe somebody named Osama.
>
> Absolutely right. There is only so far we can take this project,
> but that doesn't mean we can't get some useful results with some
> reasonable amount of work. Just don't have such high expectations
> that you think it will properly handle any utterance. Some is
> better than none.
>
> > Any reasoning
> > engine that makes a mistaken assumption about the context of usage
> > of a word, even if it guesses the base definition from a dictionary
> > correctly, is going to create a flawed result. Am I talking about
> > "bird" or "bird" or "bird" or "bird"? How would you know which one
> > is which?
> >
> > I don't think natural language is yet at a point where we
> > understand its complexities well enough, understand the vagaries
> > of context and interpretation well enough, that we can perform
> > machine-based reasoning on it, except in toy experiments. As I
> > mentioned, one approach is to use a controlled vocabulary, or
> > require that all machine-based reasoning operate upon known,
> > agreed-upon identifiers for known, agreed-upon concepts, sort of
> > a business agreement about the use of terms in a vocabulary.
> >
> > Murray
>
> The latter - a controlled vocabulary - is what I want from WordNet,
> but not a tiny toy vocabulary.
>
> As for known, agreed-upon identifiers - WordNet has them. It shows
> each interpretation as a synonym set (a synset) for every noun,
> verb, adjective and adverb.
>
> As for known, agreed-upon concepts - WordNet has those also. They
> include the dictionary relations of hypernymy/hyponymy, synonymy,
> and so on.
>
> So it gets my vote for the controlled vocabulary. However, language
> is time varying also. So having a database that formalizes the words
> means we have a perishable product. So the actual benefit is short
> lived without a continuing maintenance effort.
>
> But at least this way we can get some useful results on a smaller
> scale before trying to go any higher.
>
> Rich
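As a sketch of how synsets could serve as the agreed-upon identifiers,
with hypernymy as one of the agreed-upon relations, consider the toy
vocabulary below. The `sense:...` identifiers and entries are
hypothetical, and each concept is given at most one hypernym for
simplicity (WordNet itself gives some synsets more than one).

```python
# Hypothetical fragment of a controlled vocabulary: each concept has an
# identifier, the synonymous words that may express it, and at most one
# hypernym (more general concept). IDs and entries are illustrative only.
VOCAB = {
    "sense:entity/1": {"words": ["entity"], "hypernym": None},
    "sense:person/1": {"words": ["person", "individual"],
                       "hypernym": "sense:entity/1"},
    "sense:customer/1": {"words": ["customer", "client"],
                         "hypernym": "sense:person/1"},
}


def senses_of(word: str) -> list[str]:
    """All concept identifiers whose synonym set contains `word`."""
    return [sid for sid, concept in VOCAB.items() if word in concept["words"]]


def is_a(concept: str, ancestor: str) -> bool:
    """True if `ancestor` is `concept` itself or is reached by following
    hypernym links upward from `concept`."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        current = VOCAB[current]["hypernym"]
    return False
```

Here "client" resolves to the same identifier as "customer", and
`is_a("sense:customer/1", "sense:person/1")` holds, so two applications
can agree that a client is a person without either one inventing its
own terms.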