Re: Avoiding the pitfalls of ontology development
John Sowa wrote:
> That's why I suggested that we address the problem of
> working on a document about principles for developing
> and defining ontologies.
> ... compile an annotated bibliography of such resources.
I have put the principles I list below in
http://www.webkb.org/doc/ontologyBuildingPrinciples.html
It links to my previous "discussion on recommendations to
increase knowledge reuse".
Matthew West wrote:
> The traps and principles were combined in a publicly available document:
> http://www.matthew-west.org.uk/Documents/princ03.pdf
It gives six good principles (listed on page 7), and the importance
of "reflecting the reality" and "generic entity modelling" is well
illustrated. Section 8.6 (page 51) asks the question "Why Stop Here?",
i.e. why not use a "binary relational model" representing all
relationships as entity types since "associations are
derived entity types". The given answer is (i) "unfortunately,
the data model that results is almost impossible to understand" and
(ii) this would not make the data models more "flexible and stable".
I won't dispute this for a data model (in a database context),
but in a knowledge base, (i) relation types may be defined and thus
used as abbreviations to keep readability, and (ii) this would lead to
more normalized knowledge representations (hence more comparable,
retrievable, re-usable, ...). Thus, for KBs, I propose the
following list of principles as an element for the discussion:
I Structural/ontological principles
1) Each introduced relation should be defined. If contexts (i.e.
meta-statements) can be used, only one primitive relation is
needed and it is binary (John Sowa called it LINK in his 1984 book).
If contexts are not used, a ternary primitive relation may be
useful.
I am not sure, but I think this covers (and extends) the first 4
of the 6 principles referred to above, when adapted to KBs.
2) There should be one subtype hierarchy (hence with a single top)
which includes all the declared concept types.
This is my reformulation of the last of the 6 principles referred
to above.
3) The meaning of the subtype and instance relations should
be respected. The criteria of the <a href="http://www.loa-cnr.it/Papers/CACM2002.pdf">OntoClean methodology</a>
(e.g. identity, unity, ...) help to check that.
4) The "proper_subtype" relation and the "identity" or "equivalence"
relations should be preferred to the general "subtype" relation.
The same applies to other partial order relations.
5) Organizing types via the "subtype" relation should be preferred
to organizing them via the "instance" relation, whenever adequate.
6) The classic knowledge representation structures (relation
signatures and cardinalities, open/complete subtype partitions,
exclusion links, ...) should be used whenever adequate.
7) New ontologies should reuse, directly or indirectly, a
top-level ontology that defines notions of collection, state,
process, event, spatial_entity, physical_entity, temporal_entity,
information_entity, property and measure.
The ontological assumptions (e.g. see the <a href="http://wonderweb.semanticweb.org/deliverables/D18.shtml">WonderWeb D18</a>
for a nice summary) should be explicit.
8) If a 3D approach is used (instead of a 4D approach), temporal
constraints should be represented in the statements that require
them. Temporal constraints may be represented using contexts, or by
adding a time parameter to some relation types. The first option
seems to make it easier for people to represent natural language
sentences. The second option is easier for current theorem provers
to reason with.
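
As an aside, principle I.2 can be checked mechanically. Here is a
minimal Python sketch, not taken from any existing tool: the function
name has_single_top_hierarchy, the dictionary encoding of the subtype
links and the example type names are all assumptions made for
illustration only.

```python
from typing import Dict, Set


def has_single_top_hierarchy(supertypes: Dict[str, Set[str]]) -> bool:
    """Check that all declared types sit in one hierarchy with one top.

    `supertypes` maps each type name to the set of its direct supertypes
    (hypothetical encoding; an empty set means "no declared supertype").
    """
    # A type that only appears as a supertype is also considered declared.
    declared = set(supertypes)
    for parents in supertypes.values():
        declared |= parents

    # Candidate tops: declared types without any direct supertype.
    tops = {t for t in declared if not supertypes.get(t)}
    if len(tops) != 1:
        return False
    (top,) = tops

    # Invert the links so the hierarchy can be walked downwards.
    subtypes: Dict[str, Set[str]] = {t: set() for t in declared}
    for t, parents in supertypes.items():
        for p in parents:
            subtypes[p].add(t)

    # Every declared type must be reachable from the single top
    # (this also rejects cycles that are detached from the top).
    reached: Set[str] = set()
    stack = [top]
    while stack:
        t = stack.pop()
        if t not in reached:
            reached.add(t)
            stack.extend(subtypes[t])
    return reached == declared


# Illustrative type names only, not a proposed top-level ontology.
good = {
    "thing": set(),
    "situation": {"thing"},
    "state": {"situation"},
    "process": {"situation"},
    "spatial_entity": {"thing"},
}
bad = dict(good, stray_type=set())  # a second type without supertype

print(has_single_top_hierarchy(good))  # True
print(has_single_top_hierarchy(bad))   # False
```

The check only looks at declared subtype links; it says nothing about
whether those links respect the meaning of "subtype" (principle I.3).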
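
To make the two options of principle I.8 concrete, here is a small
Python sketch under stated assumptions: the classes Statement and
Context, the two conversion functions and the example "employer"
relation are hypothetical names introduced for this illustration, and
the time is kept as a plain string.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Statement:
    """A relation applied to its arguments, e.g. employer(Tom, Acme)."""
    relation: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Context:
    """A meta-statement: `statement` holds at (or during) `time`."""
    statement: Statement
    time: str


def to_time_parameter(ctx: Context) -> Statement:
    """Option 2: move the time of the context into an extra last argument."""
    return Statement(ctx.statement.relation, ctx.statement.args + (ctx.time,))


def to_context(stmt: Statement) -> Context:
    """Option 1: assumes the last argument of `stmt` is its time parameter."""
    *args, time = stmt.args
    return Context(Statement(stmt.relation, tuple(args)), time)


# "Tom had Acme as employer in 2005", written with a context ...
in_context = Context(Statement("employer", ("Tom", "Acme")), "2005")
# ... and the same content with a time parameter added to the relation.
with_parameter = to_time_parameter(in_context)

print(with_parameter.args)                       # ('Tom', 'Acme', '2005')
print(to_context(with_parameter) == in_context)  # True
```

The conversion is only this direct when the context carries nothing
but a time; contexts with other meta-information (author, modality,
...) would need more than one extra parameter.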
II Naming principles
1) "Entity types should represent, and be named after, the
underlying nature of an object, not the role it plays in a
particular context" (this is the 5th of the 6 principles referred
to above).
2) The <a href="http://www.webkb.org/doc/conventions.html#relationArguments">reading and naming convention generally advocated by
frame-based or graph-based languages</a> should be preferred
(hence the use of a noun or a nominal form for an identifier,
without "has_", "is_" or "_of" inside this identifier; e.g.
"parent", not "child_of" or "has_parent", nor the verbal forms
"parenting" and "parents").
However, if this convention is not respected, it is better to
use "_of" to make the parameter ordering explicit than to use
a different parameter ordering without signaling it in the
identifier.
3) Identifiers that reuse common words should use the same
capitalization. To separate words, it is <a href="http://www.webkb.org/doc/conventions.html#interCapStyle">better to use
underscores than the Intercap style</a>. When words from European
languages are reused, a first capitalized letter should be the mark of
an individual (not a type). Thus, "member_of_the_ONU" is better
than "MemberOfTheONU", "memberOfTheONU", "ONU_member" and
"ONUMember".
4) Identifiers for second-order types should include "_type" or
"_class" at the end, as in "binary_relation_type" (instead of
"Property").
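
Part of these naming principles can be checked automatically. The
following Python sketch is only a rough approximation: the function
name naming_issues, its second_order flag and the regular expressions
are my assumptions, it only tests for a "has_"/"is_" prefix, a "_of"
suffix, the Intercap style and the "_type"/"_class" suffix, and it
cannot tell a type from an individual or judge word order.

```python
import re
from typing import List


def naming_issues(identifier: str, second_order: bool = False) -> List[str]:
    """Return the naming problems found in `identifier` (empty list if none)."""
    issues: List[str] = []

    # II.2: nominal form, without "has_"/"is_" prefix or "_of" suffix.
    if re.match(r"(has|is)_", identifier):
        issues.append("starts with 'has_' or 'is_'")
    if identifier.endswith("_of"):
        issues.append("ends with '_of'")

    # II.3: Intercap style, detected as a lower-case letter followed by
    # an upper-case one, or an acronym glued to a capitalized word.
    if re.search(r"[a-z][A-Z]|[A-Z]{2}[a-z]", identifier):
        issues.append("uses Intercap style instead of underscores")

    # II.4: second-order types should end with "_type" or "_class".
    if second_order and not identifier.endswith(("_type", "_class")):
        issues.append("second-order type without '_type' or '_class' suffix")

    return issues


for name in ["parent", "child_of", "has_parent",
             "member_of_the_ONU", "MemberOfTheONU", "ONUMember"]:
    print(name, naming_issues(name))

print(naming_issues("binary_relation_type", second_order=True))  # []
print(naming_issues("Property", second_order=True))
```

Note that "ONU_member" passes these tests although "member_of_the_ONU"
is preferred above, so such a check complements the conventions
without replacing them.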
Philippe