ONT RE: Ontology case study
Matthew,
Since we're addressing generalities rather than a change proposal for a
starter document, I'd prefer to continue this discussion on the
ontology@ieee.org list. Comments below:
At 05:29 PM 5/26/2002 +0200, West, Matthew R SITI-ITPSIE wrote:
>Dear Adam,
>
>Your case study tramples all over my heartland, so see some
>comments below.
>
>
>Matthew West
>Principal Consultant
>Shell Information Technology International Limited
>Shell Centre, London SE1 7NA, United Kingdom
>
>Tel: +44 20 7934 4490 Other Tel: +44 7796 336538
>Email: matthew.r.west@is.shell.com
>Internet: http://www.shell.com
>
>
> > -----Original Message-----
> > From: Adam Pease [mailto:apease@ks.teknowledge.com]
> > Sent: 23 May 2002 00:17
> > To: SUO
> > Subject: SUO: Ontology case study
> >
> >
> >
> > Folks,
> > Bill Andersen and I were speaking a moment ago and he
> > pointed out that I
> > hadn't related an application of ontology that we did a while
> > ago. My hope
> > is that this case should point out one simple, concrete
> > application of an
> > ontology, in a deployed business environment.
> > During the dot-com boom we worked with an Internet-based
> > real-estate
> > company to solve a database integration task. I was
> > surprised to find out
> > as we started the project that what consumers know as the
> > Multiple Listing
> > Service database that realtors use is actually a collection
> > of some 30,000
> > locally-developed databases that record information about
> > homes for sale in
> > a particular geographic area. All the databases cover the same basic
> > information, but all the table names, field names, and symbols may be
> > different from one database to the next.
> > We created an ontology of real estate in first order
> > logic, using an
> > upper ontology.
>
>MW: An ontology is really overkill for this.
I think this is simply a question of personal preference and
experience. My understanding is that you're not familiar with logic, so
it's not surprising that it seems imposing. I'm not familiar with the
EXPRESS language, so to me it would seem like overkill to use EXPRESS
syntax to describe something that fits in the much smaller and (to me)
more familiar syntax of SUO-KIF.
> Though I should perhaps make
>clear some distinctions I use related to terms like ontology.
>
> - Taxonomy, a dictionary of standard terms and their natural language
> definitions.
>
> - Thesaurus, a taxonomy with subtype/supertype relations.
>
> - Data Model, a structure of types and relations against which instances
> of the types and relations can be stored. Definitions of the types and
> relations are in natural language. Some constraints are defined, usually
> in terms of the cardinality of relationships.
>
> - Ontology, types, relations, and possibly instances of those together
> with axioms, usually in First Order Logic, that define the rules that
> apply to members of the types and relations.
>
>Data Models are what are usually developed/used to design a database. The
>axioms of an ontology are generally of little use/relevance either in the
>design of the database,
Again, a question of perspective. If one doesn't read logic, the axioms
would certainly be of little use. If one does read logic, and has a
theorem prover available, the axioms could be used to test the consistency
of the schema.
>or in the population of it by instances.
>
>So in general my question for this sort of case would be: Why would you
>develop an ontology (more work) when a data model will do?
Because if I have a schema like
Table: Foo
col 1: parent name   col 2: parent SSN   col 3: child name   col 4: child SSN
Mary M. Smith        555-11-1234         Robert J. Smith     555-11-4444
Robert J. Smith      555-11-4444         Mary M. Smith       555-11-1234
Table comments:
parent: "The parent of a child specified in the child columns."
No automated process can read the English and determine that the database
is bad. If you have instead a formal ontology that defines
(=>
  (parent ?X ?Y)
  (not
    (parent ?Y ?X)))
as part of the formal definition of the parent relation, then an automated
process could catch the problem. I'm sure you could find some existing
database constraint language that could handle this particular constraint,
but not all the constraints that are easily specified in logic, unless of
course the database constraint language is a particular syntax for logic
(which many language designers seem to be moving towards, see OCL in the
UML for example).
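
To make the point concrete, here is a minimal sketch of the kind of automated
check I mean, written in plain Python rather than with a theorem prover. It
hard-codes the asymmetry axiom for the parent relation and applies it to the
two rows of the table Foo above. The function name asymmetry_violations and
the tuple layout are my own assumptions for illustration, not part of any
existing tool.

```python
# Rows of the example table Foo:
# (parent name, parent SSN, child name, child SSN)
rows = [
    ("Mary M. Smith", "555-11-1234", "Robert J. Smith", "555-11-4444"),
    ("Robert J. Smith", "555-11-4444", "Mary M. Smith", "555-11-1234"),
]


def asymmetry_violations(rows):
    """Return SSN pairs (a, b) for which both parent(a, b) and parent(b, a) hold.

    This is a hand-coded check of the axiom
    (=> (parent ?X ?Y) (not (parent ?Y ?X))), keyed on SSN.
    """
    facts = {(parent_ssn, child_ssn) for _, parent_ssn, _, child_ssn in rows}
    # Keep only the ordering a <= b so each offending pair is reported once;
    # a == b covers the degenerate case of someone listed as their own parent.
    return sorted((a, b) for (a, b) in facts if (b, a) in facts and a <= b)


violations = asymmetry_violations(rows)
print(violations)
```

A general-purpose reasoner would derive this from the axiom itself instead of
from code written for one relation, which is the advantage I'm arguing for.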
>Of course ISO15926-2 is a data model developed specifically to solve this
>sort of problem, with ISO18876 an architecture and methodology for it.
>
> > We then "compiled" the ontology by hand into
> > a relational
> > database.
>
>MW: Most data modelling tools can compile automatically to a relational
>database, so more expense here.
True, but that's only because these modeling tools are built on technology
that's been around a bit longer, so it's not surprising there would be
software implemented for it. That doesn't mean there is some technical
obstacle. Another issue is that existing DB languages are less expressive
than logic. If the data modeling tool is just an ER diagram, it maps
trivially into a DB schema.
> > We then wrote scripts, again by hand, to map the
> > contents of
> > each MLS database into our common database.
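
For readers who want a picture of what such a mapping script involves, the
following is a rough sketch only, not the scripts we actually wrote. The
source names (mls_a, mls_b), the local field names, and the common field
names are all hypothetical. It shows just the renaming step, where each
local database's field names are translated to the terms of the common
schema.

```python
# Hypothetical per-source mappings from local field names to the
# field names of the common schema.
FIELD_MAPS = {
    "mls_a": {"ListPrice": "price", "Beds": "bedrooms", "Town": "city"},
    "mls_b": {"asking_amt": "price", "br_count": "bedrooms", "municipality": "city"},
}


def to_common(source, record):
    """Rename one source record's fields to the common schema.

    Fields with no entry in the source's mapping are dropped.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


a = to_common("mls_a", {"ListPrice": 250000, "Beds": 3, "Town": "Palo Alto"})
b = to_common(
    "mls_b",
    {"asking_amt": 310000, "br_count": 4, "municipality": "Menlo Park", "agent_id": 17},
)
print(a)
print(b)
```

A real script would also have to translate coded values and units, which
this sketch leaves out.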
>
>MW: There are sophisticated tools to support this kind of application.
>There are two basic types, ETL (Extract Transform and Load) and EAI
>(Enterprise Application Integration).
>
>ETL tools specialise in the batch migration of usually transaction data
>between different databases with different structures. Often the data
>is being brought into a Management Information System (MIS) data warehouse.
>
>EAI tools specialise in real time integration of data where systems need
>to cooperate, usually this involves the synchronisation and restructuring
>of reference data (organisations, products, addresses, ...).
>
> > The ontology was
> > robust and
> > comprehensive enough such that after the first couple
> > databases, and we
> > eventually mapped several dozen of them, there were no
> > changes to the ontology.
>
>MW: When integrating (or merging) a number of systems there is a lot more
>than just the data structure that needs to be integrated. In the case of
>your real-estate (we call them estate agents) application, things like
>getting the names of cities, states etc in a standard form and correct
>spelling, not to mention the companies involved.
I agree.
>There are tools that
>will help with these data quality issues.
>
> > The company's web site went live for several months, using our
> > ontology-based database, and then they exhausted their
> > funding and went out
> > of business.
> > Of course this is just an anecdote. I doubt it's going to change
> > anyone's mind who already has a strong opinion about the
> > usefulness of a
> > single ontology for supporting integration. But at the very
> > least, it's a
> > concrete instance of the productive use of a particular
> > ontology on one
> > commercial integration task.
>
>MW: Whilst I accept that you can use an ontology for this kind of thing,
>it is generally more than you need. So it is interesting that I am here,
>even though I wouldn't use a formal ontology for this sort of application,
>and I don't see much future for formal ontologies here, except as in your
>case it happens to be the approach that those involved are already
>familiar with.
And likewise the opposite in your case. This does raise the question, though:
if you believe ontologies are useless, why are you here?
> > I've provided references in
> > previous messages
> > to other government research projects where we've used ontologies for
> > integration as well.
> > One impact of an ontology that I find undeniable is that
> > at the very
> > least, a formal ontology can serve as a more precise set of
> > comments about
> > the meaning of database tables and fields than informal English.
>
>MW: I disagree. The disadvantage in this application of a formal definition
>is that only those who understand the formalism can understand it.
That's true for any language including EXPRESS, EPISTLE etc. How is this
an argument against logic and ontology?
> I think
>the only time that stating axioms formally is sensible (in a commercial
>application) is when they are going to be acted on by some application
>(usually of course some sort of inference engine).
That's a reasonable statement to make about expression of information in
logic, but it doesn't address the at least anecdotally demonstrated
advantage of ontology in supporting modeling tasks, as opposed to
generating a model from scratch.
> > This also points out some fruitful areas of research for tool
> > builders. It would be nice to compile the ontology to an SQL
> > DB schema. I
> > believe Bill's company is working on that.
>
>MW: This might be useful, but personally I think that merging data models,
>ontologies, and database definition, would be even better. We have
>also taken approaches that involve a fixed database structure whatever
>the data model/ontology.
>
> > Another extremely
> > valuable tool
> > would be one that aids in doing the mapping from disparate
> > databases to a
> > common database.
>
>MW: Already done (though at the data model rather than ontology level)
>see above. Actually, I do think there is some room here, since most of
>the tools have grown up empirically, rather than based on FOL explicitly
>or knowingly.
>
> > I'm doubtful about the prospects for doing that
> > completely automatically, but hopefully tools could be
> > developed that
> > would help.
>
>MW: Indeed. None of the tools I mention are automatic, they manage the
>definition and execution of the interfaces. I too very much doubt
>if full automation is possible.
>
> > I hope this is helpful.
>
>MW: Yes, we should be talking more about possible applications, and
>what it takes to support them.
>
>My personal view is that full formal ontologies only come into their
>own when there is inference-based work to do. I think there are two
>potential areas for this:
>
>1. Data Mining, by which I mean discovering facts that are hidden in
>the data you have, but are not easily apparent.
>
>2. Automation, by which I mean providing the wherewithal for applications
>to take decisions without (or with reduced) human intervention.
> >
> > Adam
> >
> >
> >
> > Adam Pease
> > Teknowledge
> > (650) 424-0500 x571
> >
Adam Pease
Teknowledge
(650) 424-0500 x571