
RE: ONT RE: Ontology case study




Dear Adam,

See comments below.


Matthew West
Principal Consultant
Shell Information Technology International Limited
Shell Centre, London SE1 7NA, United Kingdom

Tel: +44 20 7934 4490 Other Tel: +44 7796 336538
Email: matthew.r.west@is.shell.com
Internet: http://www.shell.com


> -----Original Message-----
> From: Adam Pease [mailto:apease@ks.teknowledge.com]
> Sent: 28 May 2002 20:51
> To: West, Matthew R SITI-ITPSIE; ontology@ieee.org
> Subject: Re: ONT RE: Ontology case study
> 
> 
> Matthew,
> 
> At 09:24 PM 5/28/2002 +0200, West, Matthew R SITI-ITPSIE wrote:
> 
> 
> 
> [snip]
> 
> > > >MW: An ontology is really overkill for this.
> > >
> > > > I think this is simply a question of personal preference and
> > > > experience.  My understanding is that you're not familiar with
> > > > logic so it's not surprising that it seems imposing.
> >
> >MW: I am much more familiar than I was when I arrived here, so
> >my principal concern is utility and economy.
> >
> > > > I'm not familiar with the EXPRESS language so it would seem like
> > > > overkill to me to use EXPRESS syntax for describing something in a
> > > > much smaller and more familiar (to me) syntax of SUO-KIF.
> >
> >MW: Well this is compact vs expressive. I generally prefer compact
> >myself, so this is not the reason. Rather economy, and using the
> >tools of the community you are working in. I haven't tried to
> >suggest that people here should use EXPRESS or some other data
> >modelling formalism. Neither would I suggest other communities
> >used KIF, unless their current tools couldn't support the business
> >requirements, and KIF obviously did.
> 
> So why is using an ontology overkill again?

MW: A data model is an ontology, just one with far fewer axioms, but
including those that are useful in database design. So it is a matter
of degree. A fully axiomatized ontology just includes a lot of stuff
that has no value for database design.
> 
> > >
> > > >  Though I should perhaps make
> > > >clear some distinctions I use related to terms like ontology.
> > > >
> > > >  - Taxonomy, a dictionary of standard terms and their natural
> > > >    language definitions.
> > > >
> > > >  - Thesaurus, a taxonomy with subtype/supertype relations.
> > > >
> > > >  - Data Model, a structure of types and relations against which
> > > >    instances of the types and relations can be stored. Definitions
> > > >    of the types and relations are in natural language. Some
> > > >    constraints are defined, usually in terms of the cardinality of
> > > >    relationships.
> > > >
> > > >  - Ontology, types, relations, and possibly instances of those
> > > >    together with axioms, usually in First Order Logic, that define
> > > >    the rules that apply to members of the types and relations.
> > > >
> > > >Data Models are what are usually developed/used to design a
> > > >database. The axioms of an ontology are generally of little
> > > >use/relevance either in the design of the database,
> > >
> > > Again, a question of perspective.  If one doesn't read logic, the
> > > axioms would certainly be of little use.  If one does read logic,
> > > and has a theorem prover available, the axioms could be used to
> > > test the consistency of the schema.
> >
> >MW: Not many people - particularly business users - read logic. They
> >have a tough enough time with data models, and we usually present those
> >in a somewhat simplified form to check that we understand what they
> >need.
> 
> ...and we do the same with logic.  Check out Michal Sevcenko's SUMO
> browser for one very simple algorithm that presents logic in an
> English-like form.

MW: I haven't had time yet, but good to hear.
> 
> > >
> > > >or in the population of it by instances.
> > > >
> > > >So in general my question for this sort of case would be: Why
> > > >would you develop an ontology (more work) when a data model will do?
> > >
> > > Because if I have a schema like
> > >
> > > Table: Foo
> > > parent name      parent SSN   child name       child SSN
> > > Mary M. Smith    555-11-1234  Robert J. Smith  555-11-4444
> > > Robert J. Smith  555-11-4444  Mary M. Smith    555-11-1234
> > >
> > > Table comments:
> > > parent: "The parent of a child specified in the child columns."
> > >
> > > No automated process can read the English and determine that the
> > > database is bad.  If you have instead a formal ontology that defines
> > >
> > > (=>
> > >    (parent ?X ?Y)
> > >    (not
> > >      (parent ?Y ?X)))
> > >
> > > as part of the formal definition of the parent relation, then an
> > > automated process could catch the problem.  I'm sure you could find
> > > some existing database constraint language that could handle this
> > > particular constraint, but not all the constraints that are easily
> > > specified in logic, unless of course the database constraint
> > > language is a particular syntax for logic (which many language
> > > designers seem to be moving towards, see OCL in the UML for
> > > example).
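
[Editorial illustration, not part of the original exchange.] The asymmetry axiom quoted above can be checked mechanically against the two rows of table Foo. The following is only a minimal Python sketch of that one check, not a theorem prover and not the SUO-KIF tooling; the function name asymmetry_violations and the choice to identify people by SSN are assumptions made for illustration.

```python
# Rows of the example table Foo:
# (parent name, parent SSN, child name, child SSN).
ROWS = [
    ("Mary M. Smith", "555-11-1234", "Robert J. Smith", "555-11-4444"),
    ("Robert J. Smith", "555-11-4444", "Mary M. Smith", "555-11-1234"),
]


def asymmetry_violations(rows):
    """Return SSN pairs (x, y) where both parent(x, y) and parent(y, x) hold.

    This mirrors the axiom (=> (parent ?X ?Y) (not (parent ?Y ?X))).
    Identifying a person by SSN is an assumption of this sketch.
    """
    parent_facts = {(p_ssn, c_ssn) for _, p_ssn, _, c_ssn in rows}
    # x <= y reports each offending pair once and also catches parent(x, x).
    return sorted(
        (x, y) for (x, y) in parent_facts if x <= y and (y, x) in parent_facts
    )


violations = asymmetry_violations(ROWS)
print(violations)
```

With the two rows shown, the function reports the Mary/Robert pair once; with only the first row present it reports nothing.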
> >
> >MW: The utility of this is determined by the frequency with which
> >people put this kind of bad data into the database. This kind of
> >error is very rare, simply because the people putting the data in
> >know it is wrong. Now typos are very common, but I don't see how
> >axioms can help with the misspelling of names.
> 
> How about a database in which both John and Jon (should be John) have
> the same SSN, or hospital records where Mary who was born on 5/2/65 is
> recorded as giving birth to John on 8/7/68 (should be 8/7/86)?  This
> sort of thing is not at all rare in real world databases.

MW: This sort of thing is sorted out by referential integrity, and
is the sort of constraint that is set up in data models. In this case
you would probably not get the opportunity to enter the name, unless you
were the first person to enter the SSN and related details, after that
these would be provided by reference only.
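
[Editorial illustration, not part of the original exchange.] A minimal sketch of the by-reference design described above, using Python's built-in sqlite3 module. The table names person and parent_of, the helper try_insert, and the SSN values are hypothetical; the sketch covers only the duplicate-SSN case and a dangling reference, not the birth-date case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE person (
        ssn  TEXT PRIMARY KEY,   -- entered once, by whoever records it first
        name TEXT NOT NULL
    );
    CREATE TABLE parent_of (
        parent_ssn TEXT NOT NULL REFERENCES person(ssn),
        child_ssn  TEXT NOT NULL REFERENCES person(ssn),
        PRIMARY KEY (parent_ssn, child_ssn)
    );
    """
)


def try_insert(sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical SSNs, chosen only for illustration.
first = try_insert("INSERT INTO person VALUES (?, ?)", ("555-11-0000", "John"))
# A second row for the same SSN under the name "Jon" violates the primary key.
duplicate = try_insert("INSERT INTO person VALUES (?, ?)", ("555-11-0000", "Jon"))
# A relation that mentions an SSN nobody has recorded violates the foreign key.
dangling = try_insert(
    "INSERT INTO parent_of VALUES (?, ?)", ("555-11-9999", "555-11-0000")
)
names = [row[0] for row in conn.execute("SELECT name FROM person")]
print(first, duplicate, dangling, names)
```

Once the person row exists, other tables refer to it by SSN only, so the name is never typed a second time.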
> 
> > >
> > > >Of course ISO15926-2 is a data model developed specifically to
> > > >solve this sort of problem, with ISO18876 an architecture and
> > > >methodology for it.
> > > >
> > > > > We then "compiled" the ontology by hand into a relational
> > > > > database.
> > > >
> > > >MW: Most data modelling tools can compile automatically to a
> > > >relational database, so more expense here.
> > >
> > > True, but that's only because these modeling tools are built on
> > > technology that's been around a bit longer, so it's not surprising
> > > there would be software implemented for it.  That doesn't mean
> > > there is some technical obstacle.  Another issue is that existing
> > > DB languages are less expressive than logic.  If the data modeling
> > > tool is just an ER diagram, it maps trivially into a DB schema.
> >
> >MW: As one design option yes (but certainly not the only one). Yes DB
> >languages are less expressive than logic, but my point is that that
> >has no value unless/until you need that additional expressiveness,
> >and then you generally want it in a different context, e.g. checking
> >data quality, rather than doing database design.
> 
> It's just as easy to come up with an issue relevant to database
> design.  Let's say that you're integrating databases that are hospital
> records from different countries.  The schema from one Muslim country
> allows multiple wives for a given husband and the schema for the US DB
> prohibits this.  With English comments, there's no way for an
> automated process to find the problem.

MW: The database structure is likely to be quite different (assuming
only the current wife is recorded in the US database) and so the constraint
quite obvious.  I don't see what advantage the ontologies would have. You
would have to manually map the concepts in the two ontologies anyway, and
that is much harder than finding the differences in data structure and
how/whether the constraints are enforced in the database.

MW: In fact, if the US database is able to hold information about previous
wives, it will probably be able to hold the data about the multiple
concurrent Muslim wives, since it is unlikely that a constraint restricting
a US citizen to one wife at a time would have been implemented.
> 
> [snip]
> 
> > > > > I've provided references in previous messages to other
> > > > > government research projects where we've used ontologies for
> > > > > integration as well.
> > > > >    One impact of an ontology that I find undeniable is that at
> > > > > the very least, a formal ontology can serve as a more precise
> > > > > set of comments about the meaning of database tables and fields
> > > > > than informal English.
> > > >
> > > >MW: I disagree. The disadvantage in this application of a formal
> > > >definition is that only those who understand the formalism can
> > > >understand it.
> > >
> > > That's true for any language including EXPRESS, EPISTLE etc.  How
> > > is this an argument against logic and ontology?
> >
> >MW: EXPRESS is a language that is computer processable, and is
> >processed to provide a database. You do not automatically process the
> >KIF in your ontology, to produce a database design, it is read by
> >humans, so you should use something that they understand. It is the
> >information you convey, rather than the information you contain that
> >matters.
> 
> Let's distinguish between what is currently done with today's tools,
> and what this standardization effort is supposed to enable.  A logic
> based ontology enables computer processing, so the SUO ontology is not
> intended just for human eyes.  Also keep in mind that for human eyes,
> we have Michal's pseudo-English presentation and a graphical browser,
> as well as English comments.

MW: Fine. But what is it for ...
> 
> > >
> > > >  I think the only time that stating axioms formally is sensible
> > > >(in a commercial application) is when they are going to be acted
> > > >on by some application (usually of course some sort of inference
> > > >engine).
> > >
> > > That's a reasonable statement to make about expression of
> > > information in logic, but it doesn't address the at least
> > > anecdotally demonstrated advantage of ontology in supporting
> > > modeling tasks, as opposed to generating a model from scratch.
> >
> >MW: Well if I had an ontology, I certainly wouldn't ignore it (as long
> >as I could make sense of it) but I would translate it into a data model
> >as my first step if what I was doing was database design.
> 
> ...and you're free to do that.  Knowing that many people are put off
> by logic, we've translated SUMO into Protege and DAML.  We'd be happy
> to implement other translators as our time allows.

MW: That would be potentially useful, although one thing I have seen is
that the best way to say something in one language does not necessarily
translate automatically to another language in the best way to say it there.
> 
> [snip]
> 
> Adam
> 
> 
> Adam Pease
> Teknowledge
> (650) 424-0500 x571
>