Re: ONT RE: Ontology case study
Bill,
Comments below:
At 08:14 AM 5/29/2002 -0700, William Burkett wrote:
>Hi, Adam --
>
> > > -----Original Message-----
> > > > From: Adam Pease [mailto:apease@ks.teknowledge.com]
> > > > Sent: Thursday, May 23, 2002 2:10 PM
> > > > To: William Burkett; ontology@ieee.org
> > > > Subject: RE: Ontology case study
> > >...
> > > > >*The* most significant problem with this paradigm, however, is the
> > > > >development and application of mappings. What is "mapping", really? Can
> > > > >it be understood and taught to the general ontology-using public? Your
> > > > >effort was successful because you were dealing with a closed system of a
> > > > >known and well-defined scope and data meanings. How can the mapping
> > > > >lessons you learned (and were learned in the above efforts) be applied to
> > > > >an open system with a huge, unknown, and constantly evolving scope and
> > > > >fuzzy, ambiguous, context-sensitive data meanings?
> > > >
> > > > The mapping problem is significant, to be sure, but is a problem in any
> > > > sort of integration effort, whether using ontologies, or a more
> > > > conventional data warehouse approach. I would suggest that at least the
> > > > problem is more manageable than typical systems integration approaches
> > > > where n components require n^2 mappings.
> > >
> > >While I agree that mapping is an issue regardless of the integration
> > >paradigm/approach used (e.g., neutral model, data warehouse, database
> > >federations), I don't agree that the neutral model offers any advantages
> > >in terms of manageability. In fact, I think the problem is actually far
> > >more complex and less manageable than n^2 direct mappings. Sure, you
> > >reduce the number of mappings to 2 * n, but then you have to deal with:
> > >
> > > - loss of semantic precision when "generalizing" local data into the
> > > neutral model, making extraction (interpretation) of meaning in the
> > > neutral model by other connected data sources imprecise or wrong.
> >
> > That could be a problem if the neutral model is not specific or detailed
> > enough. That needn't be the case however.
>
>I feel I should respond to this indirectly, because - on reading your
>other responses - I feel that the following observation is the source of
>many of our differences of opinion.
>
>I think that our differences stem from differing assumptions about how an
>upper ontology is to be or will be used (a neutral integration model being
>one such use.) Your statement here implies that it is *possible* to be
>detailed and specific enough that the upper ontology/neutral model either
>cannot or will not be misused. (Matthew, in his response, makes the same
>assumption.) I feel this assumption is -- forgive me for saying and I
>intend no offense -- both naive and dogmatic. If "people" use the upper
>ontology/neutral model at all, they WILL misuse it and interpret it to
>their own needs - knowingly or unknowingly. All of my data modelling
>experience leads me to this position.
I have not made any such assumption. Of course it is possible that people
will use any tool improperly. Ontologies are no more sensitive to that
issue than databases, however.
>I envision a bunch of different communities of people creating a bunch of
>ontologies and mapping them together following some standardized protocols
>such that a "knowledge web" can be built up incrementally as people do
>their jobs locally (like the internet has grown based on a few standard
>protocols). I suspect that your vision is not dissimilar, though I cannot
>imagine what safeguards or procedures could possibly be put in place to
>prevent misuse of an upper ontology without some single overseeing arbiter
>to police its use. (I also don't know if this "knowledge web" is the same
>thing as the "Semantic Web", though it's as valid an interpretation or
>vision as any.)
>
>Perhaps another important differing assumption is that you/Matthew
>assume computing or modelling professionals will be interpreting the
>upper ontology/neutral model and, therefore, have the responsibility of
>using it correctly. I can't argue with this. My assumption, however, is
>that Joe Everyman can pick it up, use it, or create his own ontology if he
>wants to get what he knows into a computer.
>
>We want the cost-of-entry for using an upper ontology to be low,
>right? We can't, therefore, assume or depend on people using it
>correctly; rather, we should build in the safe-fail features to make sure
>that when it fails, such a fail isn't disastrous and recovery operations
>can be immediately started.
Again, this is neither something that ontologies are particularly prone to,
nor something to which databases are immune.
> >
> > > - Mapping "data source A" -> "neutral model NM" and "data source B" ->
> > > "neutral model NM" is not the same as mapping "data source A + B" ->
> > > "neutral model NM". If there's an overlap of information in A and B,
> > > there's a kind of "information multiplexing" involved that makes correct
> > > mapping more difficult.
> >
> > I'm not sure I understand. Could you provide an example?
>
>Suppose I map data from a local driver's registration database (data
>source A) into a state government database for motor-voter registration
>(data source NM). Suppose then that data from the Department of Motor
>Vehicles (data source B) is also mapped into the state database. If
>William Burkett in Los Angeles is translated from A into NM, there will be
>a motor-voter record for that individual created. If William C. Burkett
>from Los Angeles County is mapped from data source B into NM, then is a
>new motor-voter individual record created? If I'd had prior knowledge of
>the two data sources - A and B - I could write mapping rules that take
>into account the fact that data source A doesn't use middle initials and
>that if (1) first name-last name and (2) city is in county, then it's one
>and the same individual. If A and B are both mapped independently of the
>knowledge of the other, it's very likely that there will be two records
>for the same individual in the NM.
Now I understand the example, but what is the point that it makes about
databases or ontologies? One could have a common model developed from an
ontology, or a common database not developed from an ontology, and you
would still have the same problem.
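
The duplicate-record scenario in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Person record, the CITY_TO_COUNTY lookup, and the same_person rule are hypothetical names made up here, not part of any real motor-voter system. The matching rule is the simple one described in the example (same first and last name, a missing middle initial treated as compatible, and city located in county).

```python
from dataclasses import dataclass

# Hypothetical lookup (an assumption): which county a city belongs to.
CITY_TO_COUNTY = {"Los Angeles": "Los Angeles County"}


@dataclass(frozen=True)
class Person:
    first: str
    middle_initial: str  # empty string when the source records none
    last: str
    place: str  # a city or a county, depending on the source


def same_person(a: Person, b: Person) -> bool:
    """Rule that needs prior knowledge of both sources A and B."""
    if (a.first, a.last) != (b.first, b.last):
        return False
    # A missing middle initial (as in source A) is compatible with any initial.
    if a.middle_initial and b.middle_initial and a.middle_initial != b.middle_initial:
        return False
    # Normalize a city to its county so "city is in county" counts as a match.
    county_a = CITY_TO_COUNTY.get(a.place, a.place)
    county_b = CITY_TO_COUNTY.get(b.place, b.place)
    return county_a == county_b


def load_naive(nm: list, records: list) -> None:
    """Independent mapping: every incoming record becomes a new NM record."""
    nm.extend(records)


def load_with_rule(nm: list, records: list) -> None:
    """Mapping that consults the matching rule before creating a record."""
    for record in records:
        if not any(same_person(record, existing) for existing in nm):
            nm.append(record)


from_a = Person("William", "", "Burkett", "Los Angeles")
from_b = Person("William", "C", "Burkett", "Los Angeles County")

naive_nm: list = []
load_naive(naive_nm, [from_a])
load_naive(naive_nm, [from_b])

merged_nm: list = []
load_with_rule(merged_nm, [from_a])
load_with_rule(merged_nm, [from_b])
```

Loading A and B independently leaves two records in naive_nm for the same individual, while merged_nm, which encodes knowledge of both sources, holds one. Nothing in the sketch depends on whether NM was derived from an ontology.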
> >
> > > - Mapping between A and B is straight-forward because the "system" is
> > > essentially closed: you know what is in A and what is in B. Mapping A to
> > > a NM is less deterministic: you *think* you know what is in NM, but if
> > > others are free to map to it, their interpretation of what is in the NM
> > > will likely be very different from yours. In other words, the assumption
> > > that all mappers will interpret the NM in the same way while mapping is
> > > false. (Hell - the assumption that any two people will interpret *any*
> > > model the same way is probably false, too.)
> >
> > I believe this is actually a good counterexample. While the terms and
> > relations in a database representation don't have a formal semantics (note
> > that I didn't say SQL itself doesn't have a formal semantics), axiomatized
> > terms and relations in first order logic do. The axioms completely specify
> > the meaning of the term so there is not as much of an issue about people's
> > different interpretations. Of course, if the axioms are not detailed or
> > specific enough that's a problem just as it would be with any
> > underspecified representation.
>
>I think THE most important issue is people's different
>interpretations. (See my second assumption above.) Regardless of the FOL
>language chosen to represent the ontology, human beings are still going to
>read the words/terms that are tokens in the FOL representation and apply
>natural language interpretations to them. There is no getting away from
>this -- we are trapped using natural language - ultimately - to articulate
>and interpret meaning (i.e., real world domain semantics).
Maybe you're arguing against any sort of data warehouse then, regardless of
whether it was created from an ontology. It seems in fact that you're also
arguing against any sort of team development where people must interpret
each other's data definitions (or APIs, etc.). Ontologies don't solve this
issue completely (and I wouldn't claim they do), but they do help, because
the definitions are in logic rather than English, and therefore some
additional automated consistency checking is possible.
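
As a rough illustration of the kind of automated check meant here, the sketch below flags an individual that has been mapped into two classes declared disjoint. It is a toy under stated assumptions: the class names, the SUBCLASS and DISJOINT tables, and the inconsistencies function are all hypothetical, and a real first-order reasoner does far more than this.

```python
# Hypothetical axioms (assumptions for illustration only).
SUBCLASS = {"Car": "Vehicle", "Voter": "Person"}  # child -> parent
DISJOINT = [frozenset({"Vehicle", "Person"})]     # nothing may be both


def ancestors(cls: str) -> list:
    """Return cls followed by all of its superclasses."""
    seen = [cls]
    while cls in SUBCLASS and SUBCLASS[cls] not in seen:
        cls = SUBCLASS[cls]
        seen.append(cls)
    return seen


def inconsistencies(assertions: list) -> list:
    """Given (individual, class) pairs, list violations of disjointness."""
    classes_of: dict = {}
    for individual, cls in assertions:
        classes_of.setdefault(individual, set()).update(ancestors(cls))
    found = []
    for individual, classes in classes_of.items():
        for pair in DISJOINT:
            if pair <= classes:  # individual falls under both disjoint classes
                found.append((individual, tuple(sorted(pair))))
    return found


# Two mappers interpret the same record differently.
conflict = inconsistencies([("record-17", "Car"), ("record-17", "Voter")])
clean = inconsistencies([("record-17", "Car"), ("record-18", "Voter")])
```

The conflicting interpretations of record-17 are detected mechanically because the disjointness is stated as an axiom rather than in an English comment; a plain-text data dictionary would give a program nothing to check against.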
> > > - When a new "node" is added to the community of integrated "nodes"
> > > mapped to a common NM, the mappings of all the nodes need to be reviewed
> > > to see if they still "interpret" the NM properly given the expansion of
> > > its semantic applicability with/for the new node.
> >
> > I would have to disagree with this as well. The interpretation of a term
> > does not change just because some additional term is added to the
> > ontology. All the past mappings would still be correct. The only issue
> > would be whether the mappings are specific enough and take appropriate
> > advantage of the presence of a new term.
>
>I guess we'll have to settle for the old "agree to disagree" conclusion,
>then, because my fundamental assumptions lead me to the opposite
>position. I think it is very likely, if not inevitable, that the
>interpretation of a new term could cause "interpretive ripples" through a
>collection of mapped ontologies. It's the same phenomenon as a new person
>coming into your committee meeting half-way through: there's a temporary
>perturbation of the discussion while the new person "comes up to speed"
>with what's transpired so that he/she can then fully and fruitfully
>participate and contribute.
>
> >
> > >Off the top of my head, these are just some of the problems with a neutral
> > >model integration approach. These problems can be overcome
> > >methodologically, of course, but the depth and dimensions of the problem
> > >are, I think, still poorly understood (if not mostly unrecognized).
> > >
> > >
> > > > >While I think you can sell the neutral ontology integration model as a
> > > > >problem solving approach, getting people to know about and use SUMO (or
> > > > >any other "upper" ontology) as neutral ontology in their solution is a
> > > > >different kind of sales job altogether. And it is one that I don't think
> > > > >will be very successful - any well-defined and well-bounded integration
> > > > >effort will want to use their own.
> > > >
> > > > Can you discuss further why you feel they'd want to use their own? I've
> > > > found that unlike in the research world, people who want to accomplish a
> > > > practical commercial task are very happy to adopt someone else's models or
> > > > software if it helps them get their job done.
> > >
> > >But in adopting someone else's models or software, how often do they use
> > >them exactly as is? I don't know how often I've heard "My/our
> > >requirements are different". At the very best, they would use the
> > >models/software as a starting point for doing what they want to
> > >do. Adopting and adapting a neutral ontology model to the usage and needs
> > >of your local (integrated) community defeats the whole purpose of using it
> > >as a generalized integration model. People will interpret and use the
> > >model as they wish, and this can't be policed (and shouldn't because it is
> > >not wrong of them to do this - it's natural.) The only way "standardized"
> > >interpretations will arise is by the conventions that arise and are
> > >reinforced in a language-use community, in which case it will pay for
> > >people to interpret the ontology the same way. (Remember: dictionaries
> > >don't specify the meaning of words; they document the conventional
> > >meanings of usages of a word.)
> >
> > Well, we're drawing on the anecdotes of personal experience here, not
> > having the results of some survey that specifies how various groups of
> > people use various types of software.
>
>True enough. I know of few realistic "research studies" in this field.
>
> >I would only try to support my view
> > further with the fact that the vast majority of Java programmers use, and
> > subclass the JDK, rather than feeling a need to modify it.
>
>And my view stems from a different set of experiences, e.g., modelling the
>information requirements of domain experts for the purpose of representing
>and exchanging data between CAD systems. My response to your Java example
>is that Java and the behavior of computing machines is a (relatively)
>well-understood domain compared to the knowledge in the "real
>world". Therefore the programmers that subclass the JDK already know what
>the classes could/should do and how to apply them. You give the same
>programmers a class diagram ostensibly representing domain knowledge,
>like people or parts or products, and they will each interpret them -
>differently - as they see fit in their applications. (Again - that has
>been my consistent and unvarying experience.)
I would agree with that, which is precisely why it is important that the
definitions of those models be as precise as possible. English comments
are ambiguous. Logic is more precise. It doesn't solve all the social
problems of team development, but it's certainly a worthwhile, if modest,
tool to help.
Adam
>--- Bill
Adam Pease
Teknowledge
(650) 424-0500 x571