
RE: ONT RE: Ontology case study




Matthew,

At 11:14 AM 5/30/2002 +0200, West, Matthew R SITI-ITPSIE wrote:
>Dear Adam,
>
>See comments below.
>
>
>Matthew West
>Principal Consultant
>Shell Information Technology International Limited
>Shell Centre, London SE1 7NA, United Kingdom
>
>Tel: +44 20 7934 4490 Other Tel: +44 7796 336538
>Email: matthew.r.west@is.shell.com
>Internet: http://www.shell.com
>
>
> > -----Original Message-----
> > From: Adam Pease [mailto:apease@ks.teknowledge.com]
> > Sent: 29 May 2002 15:38
> > To: West, Matthew R SITI-ITPSIE; ontology@ieee.org
> > Subject: RE: ONT RE: Ontology case study
> >
> >
> > Matthew,
> >
> > At 11:21 AM 5/29/2002 +0200, West, Matthew R SITI-ITPSIE wrote:
> >
> >
> >
> > [snip]
> >
> > > > So why is using an ontology overkill again?
> > >
> > >MW: A data model is an ontology, just one with far fewer axioms, but
> > >including those that are useful in database design. So it
> > is a matter
> > >of degree. A fully axiomatized ontology just includes a lot of stuff
> > >that has no value for database design.
> >
> > If a data model is an ontology with fewer axioms then the
> > terms in the data
> > model are lacking explicit mention of some sufficient
> > conditions.  I could
> > see how on a particular ontology one might claim that
> > particular axioms are
> > unnecessary, but don't understand how you can make this general
> > claim.  Could you explain further?
>
>MW: In the case of database design it is about what axioms (constraints
>in database speak) that can be implemented in the database. These are
>relatively
>limited, mostly cardinality related. So for example you can note that
>the number of dates of birth a person has is exactly one, and stop people
>trying to enter more than one.

Yes, but surely not all legitimate and useful constraints involve 
cardinality or other limited conditions that can be handled by database 
constraint languages.
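
A minimal sketch of the distinction, using Python's built-in sqlite3 module.
The table, column names, and dates are hypothetical and chosen only for
illustration. The key constraint enforces "exactly one date of birth per
person", yet the same schema accepts a child recorded as born before the
mother.

```python
import sqlite3

# Hypothetical schema: names and dates are assumptions, not from a real system.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        ssn        TEXT PRIMARY KEY,   -- one row per person
        name       TEXT NOT NULL,
        birth_date TEXT NOT NULL,      -- so exactly one date of birth per person
        mother_ssn TEXT REFERENCES person(ssn)
    )
    """
)
conn.execute("INSERT INTO person VALUES ('111', 'Mary', '1965-05-02', NULL)")

# Cardinality: a second row (and so a second birth date) for the same SSN
# violates the primary key and is rejected by the database.
try:
    conn.execute("INSERT INTO person VALUES ('111', 'Mary', '1966-01-01', NULL)")
    second_birth_date_rejected = False
except sqlite3.IntegrityError:
    second_birth_date_rejected = True

# Not cardinality: a child whose birth date precedes the mother's is accepted,
# because no column or key constraint in this schema relates the two dates.
conn.execute("INSERT INTO person VALUES ('222', 'John', '1964-08-07', '111')")
child_rows = conn.execute(
    "SELECT COUNT(*) FROM person WHERE mother_ssn = '111'"
).fetchone()[0]

print(second_birth_date_rejected, child_rows)
```

A trigger or application code could reject the second insert too, but that is
a step beyond the declarative constraints of the schema itself.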

>Others can be implemented in code of some sort, but in practice
>they are not.

Is that a good thing?  It sounds to me like simply a limitation of the 
tools available, so even though a constraint is known and valid, it doesn't 
get implemented.  Are you claiming that this limitation is somehow an 
advantage?

>On the other hand there are some data quality applications
>that look at the data in the database, and can check against quite
>sophisticated rules, and discover patterns in the data.
> >
> > [snip]
> >
> >
> > > > How about a database in which both John and Jon (should be
> > > > John) have the
> > > > same SSN, or hospital records where Mary who was born on
> > > > 5/2/65 is recorded
> > > > as giving birth to John on 8/7/68 (should be 8/7/86)?  This
> > > > sort of thing
> > > > is not at all rare in real world databases.
> > >
> > >MW: This sort of thing is sorted out by referential integrity, and
> > >is the sort of constraint that is set up in data models. In this case
> > >you would probably not get the opportunity to enter the
> > name, unless you
> > >were the first person to enter the SSN and related details,
> > after that
> > >these would be provided by reference only.
> >
> > There are two typos in the example.  How would referential
> > integrity catch
> > the date problem?
>
>MW: You would not enter the data twice. The date of birth is already
>there. Of course you can have bad database designs that allow multiple
>entry of the same data, but then I am sure there are bad ontologies too.

The birth date is not entered twice in the example.  It's entered once for 
two different people.

>MW: In this situation you would have a primary key, probably the SSN
>number, and when you enter that, the other details would be returned
>as a check that you had typed in the correct SSN. What you don't do
>is allow people to just type data into foreign key fields. That is
>asking for trouble. You always do a lookup, and insist that the object
>referenced exists. That is referential integrity.

While it's true that the example had that issue, I'm currently addressing 
the second issue in the example which is that the birth date of the child 
has a typo which shows it as being earlier than the birth date of the 
mother.  That is a situation that an ontology can catch.  Could you address 
how you would see that being handled?
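
As a rough sketch of the kind of check meant here, assume the rule is simply
that a child's birth date must be later than the mother's. A plain Python
function stands in for what an ontology axiom plus a reasoner would do; the
Person record, the function name, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, for illustration only.
    ssn: str
    name: str
    birth_date: date
    mother_ssn: Optional[str] = None


def born_before_mother(people: List[Person]) -> List[str]:
    """Return SSNs of people whose birth date is not after their mother's."""
    by_ssn: Dict[str, Person] = {p.ssn: p for p in people}
    violations = []
    for p in people:
        mother = by_ssn.get(p.mother_ssn) if p.mother_ssn else None
        if mother is not None and p.birth_date <= mother.birth_date:
            violations.append(p.ssn)
    return violations


records = [
    Person("111", "Mary", date(1965, 5, 2)),
    Person("222", "John", date(1964, 8, 7), mother_ssn="111"),  # mistyped year
    Person("333", "Ann", date(1990, 3, 1), mother_ssn="111"),
]
print(born_before_mother(records))  # prints ['222']
```

Referential integrity is satisfied for every row above (each mother_ssn
refers to an existing person); only the rule relating the two dates flags the
bad record.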

> >
> > [snip]
> >
> > > > It's just as easy to come up with an issue relevant to database
> > > > design.  Let's say that you're integrating databases that are
> > > > hospital
> > > > records from different countries.  The schema from one
> > Muslim country
> > > > allows multiple wives for a given husband and the schema for
> > > > the US DB
> > > > prohibits this.  With English comments, there's no way for an
> > > > automated
> > > > process to find the problem.
> > >
> > >MW: The database structure is likely to be quite different (assuming
> > >only the current wife is recorded in the US database) and so
> > the constraint
> > >quite obvious.
> >
> > Obvious to a human being, but not to a software program, that's the
> > point.  With a formal axiom, a theorem prover can catch the
> > problem.  With
> > an English comment, a human has to catch the problem.
>
>MW: I don't think there is anything that can merge two databases
>automatically without humans looking at it as part of the process.

Nor do I.  What an ontology can do is provide a set of integrity 
constraints that can be automatically processed to determine in part if 
that merger was specified correctly.

> >
> > >  I don't see what advantage the ontologies would have. You
> > >would have to manually map the concepts in the two
> > ontologies anyway, and
> > >that is much harder than finding the differences in data
> > structure and
> > >how/whether the constraints are enforced in the database.
> > >
> > >MW: In fact, if the US database is able to hold information
> > about previous
> > >wives, it will probably be able to hold the data about the multiple
> > >concurrent Muslim wives, since it is unlikely that a
> > constraint restricting
> > >a US citizen to one wife at a time would have been implemented.
> >
> > That's assuming a number of things that change the example,
> > including that
> > database designers do the right thing.  Mistakes happen, and
> > if all you
> > have are English (or some other human language) definitions,
> > no automated
> > process can check it.
>
>MW: Again I know of no way to  compare databases without human
>intervention.

Nor do I.

>If they each have an ontology, they will be just
>as different as the two databases, so I don't see how that gets
>you further.

If the database integrator makes an error and reverses the mapping of 
birthdate and deathdate from a client database to a central database, a 
human being has to catch the error.  If there's an ontology that has an 
axiom that you die after you're born, a theorem prover can catch the error.
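
A minimal sketch of that scenario, with a plain Python check standing in for
the theorem prover. The field names, mappings, and rows are hypothetical. The
integrator's reversed mapping produces rows in the central schema that violate
"death is not earlier than birth", so the error is flagged without a human
reading the mapping.

```python
from datetime import date
from typing import Dict, List

# Hypothetical client rows and field mappings into a central schema.
client_rows = [
    {"id": "a1", "born": date(1920, 1, 15), "died": date(1999, 6, 30)},
    {"id": "a2", "born": date(1950, 9, 9), "died": None},
]
correct_mapping = {"id": "id", "born": "birth_date", "died": "death_date"}
reversed_mapping = {"id": "id", "born": "death_date", "died": "birth_date"}


def apply_mapping(rows: List[dict], mapping: Dict[str, str]) -> List[dict]:
    """Rename each client field to its central-schema field."""
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


def death_before_birth(rows: List[dict]) -> List[str]:
    """Return ids of rows whose death date is earlier than the birth date."""
    bad = []
    for row in rows:
        born, died = row.get("birth_date"), row.get("death_date")
        # Rows with either date missing cannot be checked and are skipped.
        if born is not None and died is not None and died < born:
            bad.append(row["id"])
    return bad


print(death_before_birth(apply_mapping(client_rows, reversed_mapping)))  # ['a1']
print(death_before_birth(apply_mapping(client_rows, correct_mapping)))   # []
```

The check only flags rows where both dates are present, so it detects the bad
mapping in part, not in full, which matches the modest claim above.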

> >
> > [snip]
> >
> >
> > > > >MW: EXPRESS is a language that is computer processable, and
> > > > is processed
> > > > >to provide a database. You do not automatically process
> > the KIF in
> > > > >your ontology, to produce a database design, it is read by
> > > > humans, so you
> > > > >should use something that they understand. It is the
> > information you
> > > > >convey, rather than the information you contain that matters.
> > > >
> > > > Let's distinguish between what is currently done with today's
> > > > tools, and
> > > > what this standardization effort is supposed to enable.  A
> > > > logic based
> > > > ontology enables computer processing, so the SUO ontology is
> > > > not intended
> > > > just for human eyes.  Also keep in mind that for human
> > eyes, we have
> > > > Michal's pseudo-English presentation and a graphical browser,
> > > > as well as
> > > > English comments.
> > >
> > >MW: Fine. But what is it for ...
> >
> > You made a comment above that when the ontology is used
> > by humans, it
> > should be presented in a language they understand.  That's
> > the point I
> > answered.  We do present it in a human-readable form, as well
> > as in formal
> > logic.
>
>MW: Accepted.
> >
> > [snip]
> >
> >
> > > > >MW: Well if I had an ontology, I certainly wouldn't ignore
> > > > it (as long
> > > > >as I could make sense of it) but I would translate it into a
> > > > data model
> > > > >as my first step if what I was doing was database design.
> > > >
> > > > ...and you're free to do that.  Knowing that many people are
> > > > put off by
> > > > logic, we've translated SUMO into Protege and DAML.  We'd
> > be happy to
> > > > implement other translators as our time allows.
> > >
> > >MW: That would be potentially useful, although one thing I
> > have seen is
> > >that the best way to say something in one language, does not
> > necessarily
> > >translate automatically to another language in the best way
> > to say it there.
> > > >
> >
> > I agree.
> >
> > Adam
> >
> >
> >
> > Adam Pease
> > Teknowledge
> > (650) 424-0500 x571
> >

Adam Pease
Teknowledge
(650) 424-0500 x571