SUO: RE: RE: Re: Missing Ingredients
Tom Johnston wrote:
> Rich:
>
> 1. To use WordNet to implement what I have in mind would
> require database
> tables for every leaf-node concept in WordNet.
When you download WordNet 2.0, there are 19 tables, properly
normalized, but expressed in text according to their documentation.
I've developed Delphi code to extract the 19 tables into true
SQL tables. Then I defined a few views that seem useful, but
I've only just started, so it will grow quickly. I have yet
to extract hypernymy, synonymy, or any other -onymies, but
the relationships are encoded in the detail relations for
each part of speech.
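
[A minimal sketch of the "text files into SQL tables plus views" idea, in Python with SQLite rather than the Delphi code mentioned above. The three-table schema, the view name word_sense, and the helpers build_toy_db and senses_of are assumptions made for illustration; they are not the actual WordNet 2.0 layout or its 19 tables.]

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical, simplified schema.

    This is NOT the real WordNet 2.0 table layout; it only illustrates
    normalized tables plus a convenience view defined on top of them.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE synset (
            synset_id INTEGER PRIMARY KEY,
            pos       TEXT,      -- part of speech, e.g. 'n' or 'v'
            gloss     TEXT
        );
        CREATE TABLE word (
            word_id INTEGER PRIMARY KEY,
            lemma   TEXT UNIQUE
        );
        CREATE TABLE sense (
            word_id   INTEGER REFERENCES word(word_id),
            synset_id INTEGER REFERENCES synset(synset_id),
            PRIMARY KEY (word_id, synset_id)
        );
        -- View that joins the three tables so callers can query by lemma.
        CREATE VIEW word_sense AS
            SELECT w.lemma, s.pos, s.gloss
            FROM word w
            JOIN sense x  ON x.word_id = w.word_id
            JOIN synset s ON s.synset_id = x.synset_id;
    """)
    # Sample rows taken from the 'customer' entry quoted later in this thread.
    conn.execute(
        "INSERT INTO synset VALUES (1, 'n', 'someone who pays for goods or services')"
    )
    conn.executemany("INSERT INTO word VALUES (?, ?)",
                     [(1, "customer"), (2, "client")])
    conn.executemany("INSERT INTO sense VALUES (?, ?)", [(1, 1), (2, 1)])
    return conn


def senses_of(conn, lemma):
    """Return (pos, gloss) pairs for every sense of the given lemma."""
    return conn.execute(
        "SELECT pos, gloss FROM word_sense WHERE lemma = ? ORDER BY gloss",
        (lemma,),
    ).fetchall()
```

[Because 'customer' and 'client' share one synset row, senses_of returns the same gloss for both, which is roughly how synonymy could fall out of a normalized layout.]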
> 2. It would also require a database schema for every concept
> in WordNet,
> leaf-node or not.
Or rather, it requires a schema for 'concept' as encoded
in WordNet tables. There are lots of ways to skin that cat.
> 3. Then SQL queries could reference any of these schemas,
> specifying result
> sets.
Yes, and the queries could implement arbitrary predications
so long as they can be expressed in WordNet terms, unless
you want to build on top of that structure to make more
application-oriented distinctions that aren't part of the
commonly communicated concepts.
> 4. All tables registered under the referenced nodes would be
> in-scope; all
> others not.
If you're using 'scope' in the same way I use 'context', it's
not a property of entire tables, but of selected rows and
columns in the query. Scope in the programming sense is
appropriate to that level of reference.
> 5. The semantics of the result set would be as clear as the
> definitions of
> the referenced nodes, and not any clearer.
Yes; there's no free lunch.
> Theoretical issues abound, of course. And certainly many more
> than I am
> aware of. For example:
>
> re 1: dictionaries are notoriously not hierarchies, and are
> notoriously
> circular. So perhaps we would need a hierarchical distillate
> of WordNet, not
> WordNet itself.
I haven't gotten far enough yet to extract hypernymy relationships,
but I think somewhere in the WordNet documentation they claim
that theirs is noncircular, but they offer no proof and they
apparently don't test that assertion. So we would have to do it.
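
[One way the noncircularity test could be sketched, assuming the hypernymy relation has already been extracted into a mapping from each term to its direct hypernyms. The function name find_cycle and the toy data in the usage note are hypothetical; this is an ordinary depth-first search for a cycle, not anything taken from the WordNet distribution.]

```python
def find_cycle(hypernyms):
    """Return one cycle as a list of terms, or None if there is no cycle.

    `hypernyms` maps each term (a string) to an iterable of its direct
    hypernyms. The search is an iterative depth-first traversal, so deep
    chains do not hit Python's recursion limit.
    """
    done = set()  # terms fully explored and known not to lead into a cycle
    for start in hypernyms:
        if start in done:
            continue
        path = [start]          # terms on the current DFS path, in order
        on_path = {start}       # same terms, for fast membership tests
        pending = [iter(hypernyms.get(start, ()))]
        while pending:
            parent = next(pending[-1], None)
            if parent is None:
                # All hypernyms of the last term were explored: backtrack.
                finished = path.pop()
                on_path.discard(finished)
                done.add(finished)
                pending.pop()
            elif parent in on_path:
                # Reached a term already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            elif parent not in done:
                path.append(parent)
                on_path.add(parent)
                pending.append(iter(hypernyms.get(parent, ())))
    return None
```

[For example, find_cycle({"parakeet": ["parrot"], "parrot": ["bird"]}) returns None, while a mapping in which "a" points to "b", "b" to "c", and "c" back to "a" returns the offending loop.]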
> re 2: are all non-leaf nodes instantiated only through their leaf-node
> subtypes?
A 'bird' is defined with 5 senses.
A 'parrot' is a kind of 'bird' in one of its two noun senses, and a
'copycat' in another, and has a verb sense 'to repeat mindlessly', so
there is a combination of ways in which the word is used.
If you want a table of birds, every row is a bird. If you
have a table of parrots, every row is still a bird, but the
distinction between parrots and nonparrots has to be encoded
in a descriptor such as 'BirdType', which might include Robins,
Parrots, Parakeets, etc. Since a parakeet is a kind of Parrot,
you could be encoding a partially hierarchical concept in a
discrete enumerator. Just like in the real world.
Then you can specify queries that retrieve all parrots, robins,
parakeets, or whatever is needed for the business distinctions.
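
[A rough sketch of the BirdType idea, again in Python with SQLite. The table names bird and bird_type, the parent column, the sample rows, and the helper birds_of_type are all assumptions for illustration. The enumerator values live in their own table with an optional parent, so "a parakeet is a kind of Parrot" is recorded once and a recursive query can retrieve all parrots, parakeets included.]

```python
import sqlite3


def build_bird_db():
    """Create a toy database: every row of `bird` is a bird, and the
    BirdType descriptor carries the partial hierarchy via `parent`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bird_type (
            name   TEXT PRIMARY KEY,
            parent TEXT REFERENCES bird_type(name)  -- NULL for top-level types
        );
        CREATE TABLE bird (
            bird_id   INTEGER PRIMARY KEY,
            nickname  TEXT,
            bird_type TEXT REFERENCES bird_type(name)
        );
    """)
    conn.executemany(
        "INSERT INTO bird_type VALUES (?, ?)",
        [("Robin", None), ("Parrot", None), ("Parakeet", "Parrot")],
    )
    # Hypothetical sample rows.
    conn.executemany(
        "INSERT INTO bird VALUES (?, ?, ?)",
        [(1, "Red", "Robin"), (2, "Polly", "Parrot"), (3, "Kiwi", "Parakeet")],
    )
    return conn


def birds_of_type(conn, type_name):
    """Return nicknames of birds whose type is `type_name` or any subtype."""
    rows = conn.execute(
        """
        WITH RECURSIVE wanted(name) AS (
            SELECT ?
            UNION
            SELECT t.name
            FROM bird_type t
            JOIN wanted w ON t.parent = w.name
        )
        SELECT nickname
        FROM bird
        WHERE bird_type IN (SELECT name FROM wanted)
        ORDER BY bird_id
        """,
        (type_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

[Asking for "Parrot" returns both the parrot and the parakeet rows, while asking for "Parakeet" returns only the parakeet, so the business distinction is made in the query rather than by splitting the table.]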
My view is that SUO could provide an ontology of concepts
and relationships that come from the WordNet database, and
with a lot of work, encode many of the subtler concepts into
the ontology. The whole thing could be done in SQL, and would
provide a starting point for more specific application-oriented
ontologies that could be grown from the SUO one.
Instead of calling it the Standard Upper Ontology, we could
call it the Standard Useful Ontology!
> re 3 & 4: so SQL queries, at least those referencing more
> than one database,
> would be redirected to reference the registration hierarchy, and
> supplemented to optionally specify an include-database and/or
> exclude-database list for all databases whose tables are
> registered under
> the referenced node. There's a fair chunk of software
> development work here,
> and it's hard, internals-type middleware work, not traditional
> applications
> development work.
Yes. But government agencies (NIST, NIH, NASA, DoD, DOC, DARPA,
...) have funded infrastructure work like that before. So with
a good starting concept, and some useful results, perhaps some kind
government agency would fund extension of the work until there
is a truly useful infrastructure to support database interchange.
After all, there is a lot of government data that could be used
to make a better, more capable government service.
> re 5: my experience has been that the relevant semantics of business
> database tables nearly always requires distinctions not part of the
> definition of the ordinary language terms used to label
> tables (or columns).
In very large projects, there is a documentation standard
that defines the tables and columns in English, as well as
the ER diagrams, UML, and other notations that help define
the data more meticulously. The English description is
what operators get in their 'help' screens. If a meticulously
documented SUO is made part of the public infrastructure,
all kinds of useful tools could extract that information to
translate user requirements into databases more automatically
than at present. That approach would help remedy the
seat-of-the-pants way of building databases in the future.
> So even if all this could be done starting with WordNet, we
> still wouldn't
> have anything a CFO would pull out his checkbook for.
A government project manager might though. Start with the
government, but aim at the commercial world as the endpoint.
That's the way CS&EE research has often gone from ideas to
useful objects. When the CFO can buy bigger, better databases
for less money, she'll pull out the ATM card and sign the
receipt.
> Finally, your comments about sets not being mutually
> exclusive, and the
> messiness of databases, are well taken. My emails in which I discussed
> Wittgensteinian family resemblances, a couple of months ago,
> were attempts
> to talk about this messiness. But I'd rather stop talking philosophy,
> translating back and forth between Peirce, Wittgenstein,
> Quine and even the
> dreaded Rorty. I'd rather stop talking about axiomatizing
> everything we do.
> I'd rather start building a house on a local sandbar instead
> of endlessly
> prospecting for granite bedrock
That's why I'm putting my spare time into the WordNet project. I like
to see something useful come out of my work. It used to be enough
to publish papers, but after a while I realized how few people
read the really deep papers, and how useful the glossier magazines
and web sites are to most of us.
>(to use the least favorable
> metaphor I can
> think of for what I am recommending). (After all, Stone Mountain, the
> largest outcrop of granite in the world, is only a few miles
> from me and my
> local creek!
Tom, are you near Stone Mountain? Coincidentally, I went to
Georgia Tech in Atlanta many long years ago. I climbed Stone
Mountain several times during those years.
>i.e. various metaphors about firm foundations
> are creating
> "analysis paralysis", IMHO.)
>
> But so what? Is there an idea here that might lead somewhere?
> Or have I just
> rediscovered, in less refined language, some of the more obvious
> implications of knowledge soup, KIF, a lattice of theories, IFF, C. S.
> Peirce, twenty years of work in deductive databases, or two
> ISO documents
> recently mentioned by Matthew West? I certainly don't know.
At the end of "Candide", Voltaire has the 40 year old man
turn from concepts to real world work. He seems to think
there is a time to stop analyzing and start producing also.
Rich
> -----Original Message-----
> From: Richard Cooper [mailto:rich@valutech.com]
> Sent: Wednesday, October 22, 2003 12:22 PM
> To: Tom Johnston; Jon Awbrey; SUO
> Subject: RE: RE: Re: Missing Ingredients
>
>
> From: Tom Johnston wrote:
> > That's interesting. I've been thinking of a do-it-yourself,
> > start-from-scratch approach.
> >
> > One question: are the entries in WordNet sophisticated enough
> > to make the
> > kind of distinctions I've been providing examples of,
> > distinctions where
> > tables in different databases but with the same names (the
> > Customer table,
> > the Shipments table, etc.) nonetheless have significantly
> > different set
> > membership criteria?
>
> WordNet provides class structure, so the concept of a
> customer is returned from their standard search as below:
>
> "
> The noun customer has 1 sense (first 1 from tagged texts)
>
> 1. (25) customer, client -- (someone who pays for goods or services)
> "
>
> The distinctions you mentioned were specific predicates that
> distinguish the class of customers into those who pay upon
> purchase, those who pay upon shipment, and those who pay
> a bill when it's due. Those kinds of distinctions are deeper
> than "customer", but still based on predications using the
> same English words you used in describing the subsets.
>
> But you could take that arbitrarily deep. For example,
> some customers could pay upon purchase for some items, pay
> upon shipment for others, and pay by mail for yet others.
> So the sets aren't necessarily mutually exclusive. Since
> you've dealt with databases, you know how things can get
> muddled up by unanticipated real world conditions.
>
> Given the columns of your customer table, you could make
> all kinds of distinctions, such as customers from Chicago,
> customers over 65, and so on. All of these distinctions
> have to be communicated to your users, and your users
> work in natural language, so you must also.
>
> So I think WordNet provides a good starting point, and it's
> an authoritative reference work that lots of people use,
> with ongoing funding and good prospects for continued
> refinement. And it's free. So it makes a very good starting
> point.
>
> JMHO,
> Rich
>
>
> > From a business perspective, that's
> > where the rubber
> > really meets the road. Clearing up stuff like that is what will get
> > corporate checkbooks out. Formalizing ordinary language
> > semantics will not.
> >
> > Thanks.
> >
> > Tom
> >
> > -----Original Message-----
> > From: owner-standard-upper-ontology@majordomo.ieee.org
> > [mailto:owner-standard-upper-ontology@majordomo.ieee.org] On Behalf Of
> > Richard Cooper
> > Sent: Tuesday, October 21, 2003 5:22 PM
> > To: Jon Awbrey; SUO
> > Subject: SUO: RE: Re: Missing Ingredients
> >
> >
> >
> > Jon Awbrey wrote:
> > <snip>
> > > TJ: 1.1. Our goal, I take it, is to increase the semantic
> > > interoperability
> > > of databases. This means, I take it, (although I
> > > have found no
> > > description of any such thing on the SUO website)
> > > is to create
> > > a registration framework for real world databases.
> > >
> > > Tom,
> > >
> > > There's about 20 years worth of research on "deductive databases"
> > > that I can remember just since the first standard textbooks began
> > > to appear. But you said bottoms-up, and I'm all for that, well,
> > > let me check -- yes, it's an odd-numbered day where I am, so OK.
> > >
> > > Let us try to approach the question
> > > of "semantic inter-operability" (SIO)
> > > by way of the following sub-questions:
> > >
> > > 1. What is the "meaning" of a "set of sentences" (SOS)?
> > >
> > > 2. What is the "meaning" of a "table of tuples" (TOT)?
> > >
> > > 3. How shall we compare the "meanings" of these two?
> > >
> > > I will give you and me both time to think and then get back to you.
> > >
> > > Jon Awbrey
> >
> > This set of three questions is the most important triple we're
> > dealing with in all SUO work. Getting clear answers to how
> > meaning is represented, communicated, stored, compared and
> > organized would be a successful result.
> >
> > We have predefined the answer to be an ontology. Then we refined
> > that concept to include the lattice of ontologies, plus the IFF
> > framework, but I still get the feeling there's a lot of stuff left
> > out.
> >
> > So I agree with Tom that the focus should be refined further
> > to incorporate real world database concepts, and I add one more
> > suggestion; that we should be working with natural language
> > words and sentences to impose the type structure, or class
> > structure, and property lists, of common everyday concepts like
> > address, customer, person, ..., fill in your favorite concepts.
> >
> > Finally, since we haven't been able to agree on more enhanced
> > ontologies than WordNet, perhaps we should start the bottom-up
> > process by extracting exactly the ontology that WordNet provides.
> > This could be one of the bottom-level concept sets, along with
> > others that may appear in the lattice as we continue.
> >
> > Rich
> >
> >
>
>