
Re: Avoiding the pitfalls of ontology development



John Sowa wrote:
> That's why I suggested that we address the problem of
> working on a document about principles for developing
> and defining ontologies.
> ... compile an annotated bibliography of such resources.

I have put the principles I list below in
http://www.webkb.org/doc/ontologyBuildingPrinciples.html
It links to my previous "discussion on recommendations to
increase knowledge reuse".



Matthew West wrote:
> The traps and principles were combined in a publicly available document:
> http://www.matthew-west.org.uk/Documents/princ03.pdf

This document lists six good principles (on page 7) and illustrates
well the importance of "reflecting the reality" and "generic entity
modelling". Section 8.6 (page 51) asks the question "Why Stop Here?",
i.e. why not use a "binary relational model" representing all
relationships as entity types since "associations are
derived entity types". The given answer is (i) "unfortunately,
the data model that results is almost impossible to understand" and
(ii) this would not make the data models more "flexible and stable".
I won't dispute this for a data model (in a database context),
but in a knowledge base, (i) relation types may be defined and thus
used as abbreviations that keep statements readable, and (ii) this
would lead to
more normalized knowledge representations (hence more comparable,
retrievable, re-usable, ...). Thus, for KBs, I propose the
following list of principles as an element for the discussion:


I Structural/ontological principles

1) Each introduced relation should be defined. If contexts (i.e.
   meta-statements) can be used, only one primitive relation is
   needed and it is binary (John Sowa called it LINK in his 1984 book).
   If contexts are not used, a ternary primitive relation may be
   useful.
   I am not sure, but I think this covers (and extends) the first 4
   of the 6 principles referred to above, when adapted to KBs.

2) There should be one subtype hierarchy (hence with one top)
   which includes all the declared concept types.
   This is my reformulation of the last of the 6 principles
   referred to above.

3) The meaning of the subtype and instance relations should
   be respected. The criteria of the <a href="http://www.loa-cnr.it/Papers/CACM2002.pdf">OntoClean methodology</a>
   (e.g. identity, unity, ...) help to check that.

4) The "proper_subtype" relation and the "identity" or "equivalence"
   relations should be preferred to the general "subtype" relation.
   The same applies to other partial order relations.

5) Organizing types via the "subtype" relation should be preferred
   to organizing via the "instance" relation, whenever adequate.

6) The classic knowledge representation structures (relation
   signatures and cardinalities, open/complete subtype partitions,
   exclusion links, ...) should be used whenever adequate.

7) New ontologies should reuse, directly or indirectly, a
   top-level ontology that defines notions of collection, state,
   process, event, spatial_entity, physical_entity, temporal_entity,
   information_entity, property and measure.
   The ontological assumptions (e.g. see the <a href="http://wonderweb.semanticweb.org/deliverables/D18.shtml">WonderWeb D18</a>
   for a nice summary) should be explicit.

8) If a 3D approach is used (instead of a 4D approach), temporal
   constraints should be represented in the statements that require
   them. Temporal constraints may be represented using contexts, or
   by adding a time parameter to some relation types. The first option
   seems to make it easier for people to represent natural language
   sentences. The second option is easier for current theorem provers
   to reason with.
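
As a rough illustration of the two options in I.8, here is a minimal
Python sketch. All the names in it (Statement, TimeContext,
TimedStatement, the "employer" relation and its arguments) are
hypothetical and only serve the example. It shows a time constraint
attached as a context around a binary statement, the same constraint
added as an extra relation parameter, and a lossless conversion
between the two forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A binary relation between two terms, e.g. employer(Tom, Acme)."""
    relation: str
    source: str
    destination: str


@dataclass(frozen=True)
class TimeContext:
    """Option 1: a meta-statement giving the time at which a statement holds."""
    statement: Statement
    time: str


@dataclass(frozen=True)
class TimedStatement:
    """Option 2: the same relation with an added time parameter."""
    relation: str
    source: str
    destination: str
    time: str


def context_to_timed(ctx: TimeContext) -> TimedStatement:
    """Move the time from the context into the relation itself."""
    s = ctx.statement
    return TimedStatement(s.relation, s.source, s.destination, ctx.time)


def timed_to_context(timed: TimedStatement) -> TimeContext:
    """Move the time parameter back out into a context."""
    inner = Statement(timed.relation, timed.source, timed.destination)
    return TimeContext(inner, timed.time)


# Hypothetical example data: "Tom's employer is Acme, in 2005".
ctx = TimeContext(Statement("employer", "Tom", "Acme"), "2005")
timed = context_to_timed(ctx)
```

With only a time constraint, the two forms carry the same information,
so a KB could accept the context form from people and hand the
parameter form to a prover. This sketch does not cover contexts that
carry other kinds of constraints.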


II Naming principles

1) "Entity types should represent, and be named after, the
   underlying nature of an object, not the role it plays in a
   particular context" (this is the 5th of the 6 principles
   referred to above).

2) The <a href="http://www.webkb.org/doc/conventions.html#relationArguments">reading and naming convention generally advocated by
   frame-based or graph-based languages</a> should be preferred
   (hence the use of a noun or a nominal form for an identifier,
   without "has_", "is_" or "_of" inside this identifier; e.g.
   "parent", not "child_of", not "has_parent" nor the verbal forms
   "parenting" and "parents").
   However, if this convention is not respected, it is better to
   use "_of" to make the parameter ordering explicit than to use
   a different parameter ordering without signaling it in the
   identifier.

3) Identifiers that reuse common words should use the same
   capitalization. To separate words, it is <a href="http://www.webkb.org/doc/conventions.html#interCapStyle">better to use
   underscore than the Intercap style</a>. When European words
   are re-used, a first capitalized letter should be the mark of
   an individual (not a type). Thus, "member_of_the_ONU" is better
   than "MemberOfTheONU", "memberOfTheONU", "ONU_member" and
   "ONUMember".

4) Identifiers for second-order types should include "_type" or
   "_class" at the end, as in "binary_relation_type" (instead of
   "Property").



Philippe