RE: ONT RE: Ontology case study
Dear Adam,
See comments below.
Matthew West
Principal Consultant
Shell Information Technology International Limited
Shell Centre, London SE1 7NA, United Kingdom
Tel: +44 20 7934 4490 Other Tel: +44 7796 336538
Email: matthew.r.west@is.shell.com
Internet: http://www.shell.com
> -----Original Message-----
> From: Adam Pease [mailto:apease@ks.teknowledge.com]
> Sent: 30 May 2002 21:10
> To: West, Matthew R SITI-ITPSIE; ontology@ieee.org
> Subject: RE: ONT RE: Ontology case study
>
>
> Matthew,
>
> At 11:14 AM 5/30/2002 +0200, West, Matthew R SITI-ITPSIE wrote:
> >Dear Adam,
> >
> >See comments below.
> >
> >
> > > -----Original Message-----
> > > From: Adam Pease [mailto:apease@ks.teknowledge.com]
> > > Sent: 29 May 2002 15:38
> > > To: West, Matthew R SITI-ITPSIE; ontology@ieee.org
> > > Subject: RE: ONT RE: Ontology case study
> > >
> > >
> > > Matthew,
> > >
> > > At 11:21 AM 5/29/2002 +0200, West, Matthew R SITI-ITPSIE wrote:
> > >
> > >
> > >
> > > [snip]
> > >
> > > > > So why is using an ontology overkill again?
> > > >
> > > >MW: A data model is an ontology, just one with far fewer axioms,
> > > >but including those that are useful in database design. So it is
> > > >a matter of degree. A fully axiomatized ontology just includes a
> > > >lot of stuff that has no value for database design.
> > >
> > > If a data model is an ontology with fewer axioms then the terms in
> > > the data model are lacking explicit mention of some sufficient
> > > conditions. I could see how on a particular ontology one might
> > > claim that particular axioms are unnecessary, but don't understand
> > > how you can make this general claim. Could you explain further?
> >
> >MW: In the case of database design it is about which axioms
> >(constraints in database speak) can be implemented in the database.
> >These are relatively limited, mostly cardinality related. So for
> >example you can note that the number of dates of birth a person has
> >is exactly one, and stop people trying to enter more than one.
>
> Yes, but surely not all legitimate and useful constraints involve
> cardinality or other limited conditions that can be handled by
> database constraint languages.
MW: Pretty much, at least for pure SQL.
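
As a minimal sketch of the date-of-birth example quoted above, the
following uses Python's sqlite3 module. The table and column names are
hypothetical, and using the SSN as primary key is an assumption borrowed
from later in this thread. One NOT NULL column per person row is what
gives "exactly one" date of birth.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        ssn           TEXT PRIMARY KEY,   -- one row per person
        name          TEXT NOT NULL,
        date_of_birth TEXT NOT NULL       -- single column + NOT NULL = exactly one
    )
    """
)
conn.execute("INSERT INTO person VALUES ('123-45-6789', 'Mary', '1965-05-02')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# A second date of birth for the same person violates the primary key.
second_dob_accepted = try_insert(("123-45-6789", "Mary", "1966-05-02"))
# A person with no date of birth violates NOT NULL.
missing_dob_accepted = try_insert(("987-65-4321", "John", None))

print(second_dob_accepted, missing_dob_accepted)
```

Both attempts are rejected by the database itself, with no application
code involved, which is the kind of constraint pure SQL handles well.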
>
> >Others can be implemented in code of some sort, but in practice
> >they are not.
>
> Is that a good thing? It sounds to me like simply a limitation of the
> tools available, so even though a constraint is known and valid, it
> doesn't get implemented. Are you claiming that this limitation is
> somehow an advantage?
MW: I'm not making judgements, just saying how it is and that there is
a lot of it.
>
> >On the other hand there are some data quality applications
> >that look at the data in the database, and can check against quite
> >sophisticated rules, and discover patterns in the data.
> > >
> > > [snip]
> > >
> > >
> > > > > How about a database in which both John and Jon (should be
> > > > > John) have the same SSN, or hospital records where Mary who
> > > > > was born on 5/2/65 is recorded as giving birth to John on
> > > > > 8/7/68 (should be 8/7/86)? This sort of thing is not at all
> > > > > rare in real world databases.
> > > >
> > > >MW: This sort of thing is sorted out by referential integrity,
> > > >and is the sort of constraint that is set up in data models. In
> > > >this case you would probably not get the opportunity to enter
> > > >the name, unless you were the first person to enter the SSN and
> > > >related details; after that these would be provided by
> > > >reference only.
> > >
> > > There are two typos in the example. How would referential
> > > integrity catch the date problem?
> >
> >MW: You would not enter the data twice. The date of birth is already
> >there. Of course you can have bad database designs that allow
> >multiple entry of the same data, but then I am sure there are bad
> >ontologies too.
>
> The birth date is not entered twice in the example. It's entered once
> for two different people.
>
> >MW: In this situation you would have a primary key, probably the
> >SSN, and when you enter that, the other details would be returned
> >as a check that you had typed in the correct SSN. What you don't do
> >is allow people to just type data into foreign key fields. That is
> >asking for trouble. You always do a lookup, and insist that the
> >object referenced exists. That is referential integrity.
>
> While it's true that the example had that issue, I'm currently
> addressing the second issue in the example, which is that the birth
> date of the child has a typo which shows it as being earlier than the
> birth date of the mother. That is a situation that an ontology can
> catch. Could you address how you would see that being handled?
MW: Probably it wouldn't be. If you did something, it would most likely be
some code in the application that lies between the user and the database.
You might also use a stored procedure that is triggered on each update.
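
A rough sketch of the trigger option, again using Python's sqlite3 with
hypothetical table, column, and trigger names. Note that with the dates
as quoted (5/2/65 and 8/7/68, read here as month/day/year) the child's
birth date is still later than the mother's, so a plain "child born
after mother" comparison would not fire. The sketch therefore assumes a
minimum gap of 12 years, which is an illustrative threshold and not
something stated in this thread. Only inserts are covered; an
equivalent trigger would be needed for updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        ssn           TEXT PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT NOT NULL,   -- ISO 8601 text, so text order = date order
        mother_ssn    TEXT             -- NULL when the mother is not recorded
    );

    -- Hypothetical rule: a child must be born at least 12 years after the mother.
    CREATE TRIGGER child_born_after_mother
    BEFORE INSERT ON person
    WHEN NEW.mother_ssn IS NOT NULL
    BEGIN
        SELECT RAISE(ABORT, 'child born too soon after mother')
        WHERE NEW.date_of_birth <
              (SELECT date(date_of_birth, '+12 years')
                 FROM person WHERE ssn = NEW.mother_ssn);
    END;
    """
)


def try_insert(row):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:
        return False


conn.execute("INSERT INTO person VALUES ('111-11-1111', 'Mary', '1965-05-02', NULL)")

# The mistyped year (68) is rejected; the intended year (86) is accepted.
typo_accepted = try_insert(("222-22-2222", "John", "1968-08-07", "111-11-1111"))
fixed_accepted = try_insert(("222-22-2222", "John", "1986-08-07", "111-11-1111"))

print(typo_accepted, fixed_accepted)
```

The subquery runs on every insert that names a mother, which is a small
instance of the per-constraint cost on each write.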
MW: The main reason people don't do this is that there are a huge number
of constraints you might specify, and there is a performance hit for each
one you implement. So generally people wait and only implement constraints
when there is an obvious problem.
MW: What you might do is an occasional offline check of data quality
against some range of constraints.
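
An offline check of this kind might look roughly like the following
Python sketch. The record layout, the rule functions, and the 12-year
threshold are all assumptions made for illustration; the sample rows
reuse the John/Jon and Mary examples from earlier in the thread.

```python
from collections import defaultdict
from datetime import date

MIN_MATERNAL_AGE_YEARS = 12  # assumed threshold, not from the thread

# Hypothetical records as they might be exported from a database.
records = [
    {"ssn": "111-11-1111", "name": "Mary", "born": date(1965, 5, 2), "mother": None},
    {"ssn": "222-22-2222", "name": "John", "born": date(1968, 8, 7), "mother": "111-11-1111"},
    {"ssn": "222-22-2222", "name": "Jon", "born": date(1968, 8, 7), "mother": "111-11-1111"},
]


def same_ssn_different_names(rows):
    """Flag any SSN that appears with more than one spelling of the name."""
    names = defaultdict(set)
    for r in rows:
        names[r["ssn"]].add(r["name"])
    return [f"SSN {ssn} has names {sorted(n)}" for ssn, n in names.items() if len(n) > 1]


def mother_too_young(rows):
    """Flag children born implausibly soon after their recorded mother."""
    born = {r["ssn"]: r["born"] for r in rows}
    problems = []
    for r in rows:
        mother_born = born.get(r["mother"])
        if mother_born is None:
            continue
        # Crude whole-year gap; enough to flag a record for human review.
        gap = r["born"].year - mother_born.year
        if gap < MIN_MATERNAL_AGE_YEARS:
            problems.append(f"{r['name']} born {gap} years after mother {r['mother']}")
    return problems


# Each rule is run over the whole data set; nothing is checked at write time.
RULES = [same_ssn_different_names, mother_too_young]
report = [msg for rule in RULES for msg in rule(records)]
for msg in report:
    print(msg)
```

Because the rules run in a batch, adding another rule costs nothing on
each write to the database.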
>
> > >
> > > [snip]
> > >
> > > > > It's just as easy to come up with an issue relevant to
> > > > > database design. Let's say that you're integrating databases
> > > > > that are hospital records from different countries. The
> > > > > schema from one Muslim country allows multiple wives for a
> > > > > given husband and the schema for the US DB prohibits this.
> > > > > With English comments, there's no way for an automated
> > > > > process to find the problem.
> > > >
> > > >MW: The database structure is likely to be quite different
> > > >(assuming only the current wife is recorded in the US database)
> > > >and so the constraint quite obvious.
> > >
> > > Obvious to a human being, but not to a software program, that's
> > > the point. With a formal axiom, a theorem prover can catch the
> > > problem. With an English comment, a human has to catch the
> > > problem.
> >
> >MW: I don't think there is anything that can merge two databases
> >automatically without humans looking at it as part of the process.
>
> Nor do I. What an ontology can do is provide a set of integrity
> constraints that can be automatically processed to determine in part
> if that merger was specified correctly.
MW: Please explain how you think that might work.
>
> > >
> > > >I don't see what advantage the ontologies would have. You
> > > >would have to manually map the concepts in the two ontologies
> > > >anyway, and that is much harder than finding the differences in
> > > >data structure and how/whether the constraints are enforced in
> > > >the database.
> > > >
> > > >MW: In fact, if the US database is able to hold information
> > > >about previous wives, it will probably be able to hold the data
> > > >about the multiple concurrent Muslim wives, since it is unlikely
> > > >that a constraint restricting a US citizen to one wife at a time
> > > >would have been implemented.
> > >
> > > That's assuming a number of things that change the example,
> > > including that database designers do the right thing. Mistakes
> > > happen, and if all you have are English (or some other human
> > > language) definitions, no automated process can check it.
> >
> >MW: Again I know of no way to compare databases without human
> >intervention.
>
> Nor do I.
>
> >If they each have an ontology, they will be just
> >as different as the two databases, so I don't see how that gets
> >you further.
>
> If the database integrator makes an error and reverses the mapping of
> birthdate and deathdate from a client database to a central database,
> a human being has to catch the error. If there's an ontology that has
> an axiom that you die after you're born, a theorem prover can catch
> the error.
MW: This is a high overhead for an error that is unlikely. Normal procedure
would be to eyeball the resulting data to check you got what you expected.
Errors like this are pretty obvious. This way you look for the errors you
made rather than try to guess all the errors you might make.
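
For concreteness, the reversed birthdate/deathdate mapping discussed
above can be sketched in Python as follows. This is a plain runtime
check over sample rows, not a theorem prover, and the field names,
mappings, and sample data are all hypothetical. It also assumes every
person record must carry a birth date.

```python
from datetime import date

# Hypothetical client rows; "dob" and "dod" are illustrative field names.
client_rows = [
    {"id": 1, "dob": date(1920, 3, 1), "dod": date(1999, 7, 4)},
    {"id": 2, "dob": date(1950, 1, 15), "dod": None},  # still living
]

correct_mapping = {"dob": "birth_date", "dod": "death_date"}
reversed_mapping = {"dob": "death_date", "dod": "birth_date"}  # the integrator's error


def apply_mapping(row, mapping):
    """Copy a client row into the central schema using the given field mapping."""
    out = {"id": row["id"]}
    for src, dst in mapping.items():
        out[dst] = row[src]
    return out


def violates_birth_before_death(row):
    """The 'you die after you are born' rule, written as a simple predicate."""
    birth, death = row["birth_date"], row["death_date"]
    if birth is None:
        return True  # assumption: a person record without a birth date is invalid
    return death is not None and death < birth


def mapping_violations(rows, mapping):
    """Return the ids of rows that break the rule once the mapping is applied."""
    return [r["id"] for r in rows if violates_birth_before_death(apply_mapping(r, mapping))]


ok = mapping_violations(client_rows, correct_mapping)
bad = mapping_violations(client_rows, reversed_mapping)
print(ok, bad)
```

With the reversed mapping every sample row fails the rule, so the same
error that eyeballing would reveal is also caught mechanically, at the
cost of writing the rule down in advance.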
>
> > >
> > > [snip]
> > >
> > >
> > > > > >MW: EXPRESS is a language that is computer processable, and
> > > > > >is processed to provide a database. You do not automatically
> > > > > >process the KIF in your ontology to produce a database
> > > > > >design; it is read by humans, so you should use something
> > > > > >that they understand. It is the information you convey,
> > > > > >rather than the information you contain, that matters.
> > > > >
> > > > > Let's distinguish between what is currently done with
> > > > > today's tools, and what this standardization effort is
> > > > > supposed to enable. A logic based ontology enables computer
> > > > > processing, so the SUO ontology is not intended just for
> > > > > human eyes. Also keep in mind that for human eyes, we have
> > > > > Michal's pseudo-English presentation and a graphical
> > > > > browser, as well as English comments.
> > > >
> > > >MW: Fine. But what is it for ...
> > >
> > > You made a comment above that when the ontology is used by
> > > humans, it should be presented in a language they understand.
> > > That's the point I answered. We do present it in a
> > > human-readable form, as well as in formal logic.
> >
> >MW: Accepted.
> > >
> > > [snip]
> > >
> > >
> > > > > >MW: Well if I had an ontology, I certainly wouldn't ignore it
> > > > > >(as long as I could make sense of it) but I would translate
> > > > > >it into a data model as my first step if what I was doing was
> > > > > >database design.
> > > > >
> > > > > ...and you're free to do that. Knowing that many people are
> > > > > put off by logic, we've translated SUMO into Protege and
> > > > > DAML. We'd be happy to implement other translators as our
> > > > > time allows.
> > > >
> > > >MW: That would be potentially useful, although one thing I have
> > > >seen is that the best way to say something in one language does
> > > >not necessarily translate automatically into the best way to say
> > > >it in another language.
> > > > >
> > >
> > > I agree.
> > >
> > > Adam
> > >
> > >
> > >
> > > Adam Pease
> > > Teknowledge
> > > (650) 424-0500 x571
> > >
>
> Adam Pease
> Teknowledge
> (650) 424-0500 x571
>