RE: ONT RE: Ontology case study
Adam,
AP> Forgive me for being a bit pointed, but I continue to be very troubled by
claims that appear to me unjustified, unspecific or unsupported on this
list. If you believe Cyc and SUMO are "not sufficiently accurate", what
metric is that in regard to? Is there a particular axiom in either that
you could point out that would lead to an incorrect or inaccurate
conclusion during the course of logical deduction?
Firstly, as I and a number of other people have pointed out on this list
many times, the issue is not about particular axioms, which one can tinker
with for the rest of our lives with little effect. My recollection of the
discussion at the workshop in Seattle is that it was agreed that the top
levels were not particularly well regimented and that they needed more
work; one area discussed was, I recall, mereology. Your recent post
admitting that there is no policy for dealing with the difference between
attribution and exemplification is another example. At the more detailed
level, I recall that a number of people pointed out problems with the
notions of transaction and process, a point I raised personally with you.
I am "very troubled" that these issues continue to be ignored.
To take an enterprise example from Cyc: from the data I have, it seems to me
that Cyc regards positions as roles of a person. This is not sufficient to
track the identity of a position, such as the English Monarch. (If I am
wrong, I would like to know.) I believe I raised the general requirement
when the enterprise module of SUMO was being discussed, and I seem to recall
that SUMO was amended.
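The distinction can be sketched as a minimal Python data model, assuming (hypothetically) that a position is modelled as an entity in its own right with dated incumbencies, rather than as a role attached to a person. The names `Position`, `Incumbency` and `holder_on` are illustrative only and are not taken from Cyc or SUMO; the Victoria/Edward VII succession dates are used purely as an example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    name: str


@dataclass
class Incumbency:
    """One period during which a person occupies a position."""
    holder: Person
    start: date
    end: Optional[date] = None  # None means the period is still open


@dataclass
class Position:
    """A position with its own identity, independent of whoever holds it."""
    title: str
    incumbencies: List[Incumbency] = field(default_factory=list)

    def holder_on(self, day: date) -> Optional[Person]:
        """Return the holder on a given day, or None if the post is vacant."""
        for inc in self.incumbencies:
            if inc.start <= day and (inc.end is None or day < inc.end):
                return inc.holder
        return None


# The same Position object persists across a succession.
monarch = Position("English Monarch")
victoria = Person("Victoria")
edward = Person("Edward VII")
monarch.incumbencies.append(Incumbency(victoria, date(1837, 6, 20), date(1901, 1, 22)))
monarch.incumbencies.append(Incumbency(edward, date(1901, 1, 22), date(1910, 5, 6)))

print(monarch.holder_on(date(1900, 1, 1)).name)  # Victoria
print(monarch.holder_on(date(1905, 1, 1)).name)  # Edward VII
```

Under a role-of-a-person model, the only persisting entities would be the two `Person` objects, so there would be nothing with a stable identity for "the English Monarch" to refer to across the succession.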
CP>It seems to me that if we are looking for the kind of general ontologies
>that Bill suspects are not feasible, then we need to address the demands
>of accuracy and regimentation at both the top and domain levels. In my
>view, for the kind of applications that fall under Bill Anderson's item 3), it
>is only really possible to do these together.
AP>Indeed. How would you suggest that we proceed, or proceed differently?
What the meeting in Seattle agreed, a process of regimenting the top level,
seems sensible to me. My experience is that the lower levels normally
benefit from some contact with reality. There are two complementary
strategies: first, extracting (re-engineering) the information from
existing working systems; and second, using SUMO to integrate systems.
Obviously, in both cases the goal is not just to find any way of matching
up the systems, but to find ways of improving the ontology.
Chris
-----Original Message-----
From: Adam Pease [mailto:apease@ks.teknowledge.com]
Sent: 30 May 2002 20:41
To: mail@ChrisPartridge.net; ontology@ieee.org
Subject: RE: ONT RE: Ontology case study
Chris,
At 10:03 PM 5/29/2002 +0200, Chris Partridge wrote:
>I think the discussion that has developed on the difference between data
>models and ontologies is linked to Bill Anderson's original comment about
>the PAR (his item 3). It also helps to answer the question Pierluigi
>Miraglia asked about why certain types of effort are less successful than
>it seems theoretically they should be. It also helps to answer Bill's
>point below, where I (for once) take Adam's side.
I'm not sure which side that might be :-)
>The issue is, it seems to me, accuracy and tolerance.
>
>Firstly, the kind of inference in the things that the thread is labelling
>'ontologies' is absolute, without any tolerance. As real engineers know
>:-), you need to plan for tolerance. An example I like is from (I think)
>Mike Uschold: when building the 777, the errors in the individual
>tolerances added up, leading to a visible difference in the length of the
>plane and associated problems. A philosopher has made the same point (see
>pp. 50-1 of Dummett's The Logical Basis of Metaphysics, 1991): making
>inferences tends to dilute precision significantly. I copy the extract
>below for those that are interested.
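The accumulation of tolerances described here is usually analysed as a tolerance stack-up. A minimal Python sketch, assuming a hypothetical chain of 100 parts each held to +/-0.5 mm (illustrative figures, not the 777's), compares the worst-case stack, where every deviation lines up in the same direction, with the root-sum-square (RSS) estimate, which assumes independent, roughly normally distributed deviations.

```python
import math
from typing import Sequence


def worst_case_stack(tolerances: Sequence[float]) -> float:
    """Every part sits at the edge of its tolerance in the same direction."""
    return sum(abs(t) for t in tolerances)


def rss_stack(tolerances: Sequence[float]) -> float:
    """Root-sum-square estimate for independent, roughly normal deviations."""
    return math.sqrt(sum(t * t for t in tolerances))


# Hypothetical assembly: 100 sections, each within +/-0.5 mm.
tols = [0.5] * 100
print(worst_case_stack(tols))  # 50.0
print(rss_stack(tols))         # 5.0
```

Either way, the error allowed in the whole assembly is far larger than the error allowed in any single part, which is the point of the analogy with long chains of inference.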
>
>This means we need techniques to identify which inferences preserve enough
>accuracy to be workable. The way that this is done in operational systems
>is two-fold. Once the requirements are clear, the data is made sufficiently
>accurate for the specified processes (and their inferences). That is why
>database people are always a bit wary of new processes; one of the checks
>they apply is around data quality. In a system with unrestricted inference,
>the data has to be unrestrictedly accurate, i.e. exact.
>
>I also note, in passing, that FOL as it stands cannot do what Aristotle
>called practical reasoning: no amount of logical/inferential processing
>will result in an action. This is not a problem in database systems.
>
>This leads on to another problem with what this discussion has labelled
>'ontologies': their strong roots in predicate logic, particularly FOL. As is
>well known (see e.g. p. 48 of Lowe's latest book), predicate logic was
>developed for mathematical applications and so is not well crafted for more
>mundane uses. For example, if you believe in a distinction between
>exemplification and attribution, this is not well marked and, at the very
>least, the temporality of predication needs some explaining. Note that
>database systems typically have this distinction built into them, but in
>such a simplistic way that it cannot be practically used to reliably mark
>the distinction.
>
>How does this link to Bill's comment below about the potential for
>misinterpretation? Firstly, Bill is basically right: without some kind of
>framework, there will be misinterpretations. However, the work on
>ontologies may be able to reduce the potential significantly. It seems to
>me that a well-founded top ontology, if one can be developed, would
>significantly reduce the potential for some kinds of misinterpretation
>(NB neither Cyc nor SUMO seems to have this currently).
Certainly any set of precise definitions that one adds to an existing model
will help to reduce potential misinterpretation. One might argue that Cyc
or SUMO don't do this *sufficiently* with respect to some criteria, or that
there are some other disadvantages involved in their use, but not that they
don't add specificity and reduce potential misinterpretation.
>It also seems to me that the domain ontologies need to be more
>rigorous. Just as the introduction of databases forced an increased level
>of accuracy, so the sharing of databases (or ontologies) will force a
>further increase. In the area that I am interested in, enterprise
>ontologies, it seems to me that both Cyc and SUMO are not sufficiently
>accurate. Of course, Cyc is intended to do common-sense reasoning, not
>database integration (a different problem), and so has no need for
>accuracy along this dimension.
Forgive me for being a bit pointed, but I continue to be very troubled by
claims that appear to me unjustified, unspecific or unsupported on this
list. If you believe Cyc and SUMO are "not sufficiently accurate", what
metric is that in regard to? Is there a particular axiom in either that
you could point out that would lead to an incorrect or inaccurate
conclusion during the course of logical deduction?
>
>It seems to me that if we are looking for the kind of general ontologies
>that Bill suspects are not feasible, then we need to address the demands
>of accuracy and regimentation at both the top and domain levels. In my
>view, for the kind of applications that fall under Bill Anderson's item 3), it
>is only really possible to do these together.
Indeed. How would you suggest that we proceed, or proceed differently?
Adam
>Regards,
>
>Chris
>
>pp. 50-1 of Dummett, M., The Logical Basis of Metaphysics, 1991.
>
>In a section called "Degeneration of Probabilities":
>
>Hence it is sufficient, for mathematical purposes, that a principle of
>inference should guarantee that truth is transmitted from premises to
>conclusion. Outside mathematics, we have a motive to demand more, if we
>can get it. ... Most of our beliefs are perforce based upon grounds that
>fall short of being conclusive, but a form of inference guaranteed to
>preserve truth is not, in general, guaranteed to preserve degree of
>probability. ... The 'ideal' subject starting from beliefs whose
>probability is close to 1, will end up with beliefs negligibly greater
>than 0; the man of common sense, initially adopting beliefs with a much
>weaker evidential basis, but reasoning from them only to a meagre extent,
>will finish with far fewer beliefs than he. That is why scientific
>conclusions arrived at by long chains of impeccable reasoning almost
>always prove, when a direct test becomes possible, to be wrong. ... In
>practical life, truth is valued chiefly as a guide to action; and then the
>principal remedy for the degeneration of probability in the course of
>inferential reasoning is to employ it sparingly.
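Dummett's point can be made concrete with a little arithmetic. A valid inference from n premises guarantees only that the conclusion is at least as probable as the conjunction of those premises. The Python sketch below computes that conjunction two ways: as a product, under the strong and purely illustrative assumption that the premises are independent, and as the Boole/Fréchet lower bound 1 - Σ(1 - p_i), which needs no independence assumption. The function names are hypothetical.

```python
import math
from typing import Sequence


def conjunction_if_independent(probs: Sequence[float]) -> float:
    """P(all premises hold), assuming the premises are independent."""
    return math.prod(probs)


def conjunction_lower_bound(probs: Sequence[float]) -> float:
    """Boole/Frechet bound: P(all hold) >= 1 - sum of P(each fails)."""
    return max(0.0, 1.0 - sum(1.0 - p for p in probs))


for n in (1, 10, 100):
    premises = [0.99] * n  # each premise believed with probability 0.99
    print(n, round(conjunction_if_independent(premises), 3),
          round(conjunction_lower_bound(premises), 3))
```

With 100 premises each held at 0.99, the independent estimate falls to about 0.37 and the distribution-free bound to zero, which is the "degeneration" the extract describes.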
>
>-----Original Message-----
>From: owner-ontology@majordomo.ieee.org
>[mailto:owner-ontology@majordomo.ieee.org]On Behalf Of William Burkett
>Sent: 29 May 2002 17:15
>To: 'Adam Pease'
>Cc: ontology@ieee.org
>Subject: ONT RE: Ontology case study
>
>Hi, Adam --
>
> > > -----Original Message-----
> > > > From: Adam Pease [mailto:apease@ks.teknowledge.com]
> > > > Sent: Thursday, May 23, 2002 2:10 PM
> > > > To: William Burkett; ontology@ieee.org
> > > > Subject: RE: Ontology case study
> > >...
> > > > >*The* most significant problem with this paradigm, however, is the
> > > > >development and application of mappings. What is "mapping", really? Can
> > > > >it be understood and taught to the general ontology-using public? Your
> > > > >effort was successful because you were dealing with a closed system of a
> > > > >known and well-defined scope and data meanings. How can the mapping
> > > > >lessons you learned (and that were learned in the above efforts) be
> > > > >applied to an open system with a huge, unknown, and constantly evolving
> > > > >scope and fuzzy, ambiguous, context-sensitive data meanings?
> > > >
> > > > The mapping problem is significant, to be sure, but it is a problem in
> > > > any sort of integration effort, whether using ontologies or a more
> > > > conventional data warehouse approach. I would suggest that at least the
> > > > problem is more manageable than typical systems integration approaches,
> > > > where n components require n^2 mappings.
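The arithmetic behind the n^2 versus 2 * n comparison in this exchange is easy to check. The Python sketch below, with hypothetical function names, counts one directed mapping for each ordered pair of components (n(n - 1), which the messages round to n^2) against one mapping into and one mapping out of a neutral model per component (2n).

```python
def point_to_point_mappings(n: int) -> int:
    """One directed mapping from each component to every other component."""
    return n * (n - 1)


def neutral_model_mappings(n: int) -> int:
    """One mapping into and one mapping out of the neutral model per component."""
    return 2 * n


for n in (2, 3, 10, 50):
    print(n, point_to_point_mappings(n), neutral_model_mappings(n))
```

On these counts the neutral model only wins from four components upward (at three they tie at six). The counts say nothing about how hard each individual mapping is, which is what the reply below goes on to contest.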
> > >
> > >While I agree that mapping is an issue regardless of the integration
> > >paradigm/approach used (e.g., neutral model, data warehouse, database
> > >federations), I don't agree that the neutral model offers any advantages
> > >in terms of manageability. In fact, I think the problem is actually far
> > >more complex and less manageable than n^2 direct mappings. Sure, you
> > >reduce the number of mappings to 2 * n, but then you have to deal with:
> > >
> > > - loss of semantic precision when "generalizing" local data into the
> > > neutral model, making extraction (interpretation) of meaning in the
> > > neutral model by other connected data sources imprecise or wrong.
> >
> > That could be a problem if the neutral model is not specific or detailed
> > enough. That needn't be the case, however.
>
>I feel I should respond to this indirectly, because - on reading your
>other responses - I feel that the following observation is the source of
>many of our differences of opinion.
>
>I think that our differences stem from differing assumptions about how an
>upper ontology is to be or will be used (a neutral integration model being
>one such use.) Your statement here implies that it is *possible* to be
>detailed and specific enough that the upper ontology/neutral model either
>cannot or will not be misused. (Matthew, in his response, makes the same
>assumption.) I feel this assumption is -- forgive me for saying and I
>intend no offense -- both naive and dogmatic. If "people" use the upper
>ontology/neutral model at all, they WILL misuse it and interpret it to
>their own needs - knowingly or unknowingly. All of my data modelling
>experience leads me to this position.
>
>I envision a bunch of different communities of people creating a bunch of
>ontologies and mapping them together following some standardized protocols
>such that a "knowledge web" can be built up incrementally as people do
>their jobs locally (like the internet has grown based on a few standard
>protocols). I suspect that your vision is not dissimilar, though I cannot
>imagine what safeguards or procedures could possibly be put in place to
>prevent misuse of an upper ontology without some single overseeing arbiter
>to police its use. (I also don't know if this "knowledge web" is the same
>thing as the "Semantic Web", though it's as valid an interpretation or
>vision as any.)
>
>Perhaps another important differing assumption is that you and Matthew
>assume computing or modelling professionals will be interpreting the
>upper ontology/neutral model and, therefore, have the responsibility of
>using it correctly. I can't argue with this. My assumption, however, is
>that Joe Everyman can pick it up, use it, or create his own ontology if he
>wants to get what he knows into a computer.
>
>We want the cost-of-entry for using an upper ontology to be low,
>right? We can't, therefore, assume or depend on people using it
>correctly; rather, we should build in the safe-fail features to make sure
>that when it fails, such a fail isn't disastrous and recovery operations
>can be immediately started.
>
> >
> > > - Mapping "data source A" -> "neutral model NM" and "data source B" ->
> > > "neutral model NM" is not the same as mapping "data source A + B" ->
> > > "neutral model NM". If there's an overlap of information in A and B,
> > > there's a kind of "information multiplexing" involved that makes correct
> > > mapping more difficult.
> >
> > I'm not sure I understand. Could you provide an example?
>
>Suppose I map data from a local driver's registration database (data
>source A) into a state government database for motor-voter registration
>(data source NM). Suppose then that data from the Department of Motor
>Vehicles (data source B) is also mapped into the state database. If
>William Burkett in Los Angeles is translated from A into NM, a
>motor-voter record for that individual will be created. If William C.
>Burkett from Los Angeles County is mapped from data source B into NM, is
>a new motor-voter individual record created? If I'd had prior knowledge of
>the two data sources - A and B - I could write mapping rules that take
>into account the fact that data source A doesn't use middle initials, and
>that if (1) the first and last names match and (2) the city is in the
>county, then it's one and the same individual. If A and B are both mapped
>independently of the knowledge of the other, it's very likely that there
>will be two records for the same individual in the NM.
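The mapping rule described above is a simple form of record linkage. A minimal Python sketch, assuming hypothetical `Record` fields and a hypothetical city-to-county lookup table, contrasts exact-match keys (which yield two "individuals") with the rule Bill describes: ignore middle initials, require matching first and last names, and treat a city as matching the county that contains it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    first: str
    middle: str  # "" when the source does not record middle initials
    last: str
    place: str   # a city (source A) or a county (source B)


# Hypothetical gazetteer: which county each city lies in.
CITY_TO_COUNTY: Dict[str, str] = {"Los Angeles": "Los Angeles County"}


def places_compatible(p: str, q: str) -> bool:
    """Same place, or one is a city lying in the other (a county)."""
    return p == q or CITY_TO_COUNTY.get(p) == q or CITY_TO_COUNTY.get(q) == p


def same_individual(a: Record, b: Record) -> bool:
    """Match on first and last name plus compatible place; ignore middle initials."""
    return (a.first, a.last) == (b.first, b.last) and places_compatible(a.place, b.place)


def merge_into_nm(records: List[Record]) -> List[Record]:
    """Add each record to NM unless an already-added record matches it."""
    nm: List[Record] = []
    for r in records:
        if not any(same_individual(r, existing) for existing in nm):
            nm.append(r)
    return nm


a = Record("William", "", "Burkett", "Los Angeles")          # from source A
b = Record("William", "C.", "Burkett", "Los Angeles County")  # from source B

print(len({a, b}))                 # exact-match keys: 2 records
print(len(merge_into_nm([a, b])))  # rule-based merge: 1 record
```

The rule works here only because it was written with knowledge of both A and B, and it would also merge two different people who happen to share a name and a county.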
>
> >
> > > - Mapping between A and B is straightforward because the "system" is
> > > essentially closed: you know what is in A and what is in B. Mapping A to
> > > an NM is less deterministic: you *think* you know what is in NM, but if
> > > others are free to map to it, their interpretation of what is in the NM
> > > will likely be very different from yours. In other words, the assumption
> > > that all mappers will interpret the NM in the same way while mapping is
> > > false. (Hell - the assumption that any two people will interpret *any*
> > > model the same way is probably false, too.)
> >
> > I believe this is actually a good counterexample. While the terms and
> > relations in a database representation don't have a formal semantics
> > (note that I didn't say SQL itself doesn't have a formal semantics),
> > axiomatized terms and relations in first order logic do. The axioms
> > completely specify the meaning of the term, so there is not as much of an
> > issue about people's different interpretations. Of course, if the axioms
> > are not detailed or specific enough, that's a problem, just as it would be
> > with any underspecified representation.
>
>I think THE most important issue is people's different
>interpretations. (See my second assumption above.) Regardless of the FOL
>language chosen to represent the ontology, human beings are still going to
>read the words/terms that are tokens in the FOL representation and apply
>natural language interpretations to them. There is no getting away from
>this -- we are trapped using natural language - ultimately - to articulate
>and interpret meaning (i.e., real world domain semantics).
>
> > > - When a new "node" is added to the community of integrated "nodes"
> > > mapped to a common NM, the mappings of all the nodes need to be reviewed
> > > to see if they still "interpret" the NM properly, given the expansion of
> > > its semantic applicability with/for the new node.
> >
> > I would have to disagree with this as well. The interpretation of a term
> > does not change just because some additional term is added to the
> > ontology. All the past mappings would still be correct. The only issue
> > would be whether the mappings are specific enough and take appropriate
> > advantage of the presence of a new term.
>
>I guess we'll have to settle for the old "agree to disagree" conclusion,
>then, because my fundamental assumptions lead me to the opposite
>position. I think it is very likely, if not inevitable, that the
>interpretation of a new term could cause "interpretive ripples" through a
>collection of mapped ontologies. It's the same phenomenon as a new person
>coming into your committee meeting halfway through: there's a temporary
>perturbation of the discussion while the new person "comes up to speed"
>with what's transpired so that he/she can then fully and fruitfully
>participate and contribute.
>
> >
> > >Off the top of my head, these are just some of the problems with a
> > >neutral model integration approach. These problems can be overcome
> > >methodologically, of course, but the depth and dimensions of the
> > >problem are, I think, still poorly understood (if not mostly
> > >unrecognized).
> > >
> > >
> > > > >While I think you can sell the neutral ontology integration model as
> > > > >a problem-solving approach, getting people to know about and use SUMO
> > > > >(or any other "upper" ontology) as the neutral ontology in their
> > > > >solution is a different kind of sales job altogether. And it is one
> > > > >that I don't think will be very successful - any well-defined and
> > > > >well-bounded integration effort will want to use its own.
> > > >
> > > > Can you discuss further why you feel they'd want to use their own? I've
> > > > found that unlike in the research world, people who want to accomplish
> > > > a practical commercial task are very happy to adopt someone else's
> > > > models or software if it helps them get their job done.
> > >
> > >But in adopting someone else's models or software, how often do they use
> > >them exactly as is? I can't count how often I've heard "My/our
> > >requirements are different". At the very best, they would use the
> > >models/software as a starting point for doing what they want to do.
> > >Adopting and adapting a neutral ontology model to the usage and needs of
> > >your local (integrated) community defeats the whole purpose of using it
> > >as a generalized integration model. People will interpret and use the
> > >model as they wish, and this can't be policed (and shouldn't be, because
> > >it is not wrong of them to do this - it's natural.) The only way
> > >"standardized" interpretations will arise is through the conventions that
> > >arise and are reinforced in a language-use community, in which case it
> > >will pay for people to interpret the ontology the same way. (Remember:
> > >dictionaries don't specify the meaning of words; they document the
> > >conventional meanings of usages of a word.)
> >
> > Well, we're drawing on the anecdotes of personal experience here, not
> > having the results of some survey that specifies how various groups of
> > people use various types of software.
>
>True enough. I know of few realistic research studies in this field.
>
> > I would only try to support my view further with the fact that the vast
> > majority of Java programmers use and subclass the JDK, rather than
> > feeling a need to modify it.
>
>And my view stems from a different set of experiences, e.g., modelling the
>information requirements of domain experts for the purpose of representing
>and exchanging data between CAD systems. My response to your Java example
>is that Java and the behavior of computing machines is a (relatively)
>well-understood domain compared to the knowledge in the "real
>world". Therefore the programmers that subclass the JDK already know what
>the classes could/should do and how to apply them. You give the same
>programmers a class diagram ostensibly representing domain knowledge,
>like people or parts or products, and they will each interpret them -
>differently - as they see fit in their applications. (Again - that has
>been my consistent and unvarying experience.)
>
>--- Bill
Adam Pease
Teknowledge
(650) 424-0500 x571