Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

SUO: Building the Hierarchy by Language ansd Module Processing




All

In my message "Language and Module Processing"
http://suo.ieee.org/email/msg09546.html
I discussed the connection between "Language and Module Processing" (LMP)
document
http://suo.ieee.org/IFF/language-module-processing.html
(consisting of six diagrams and five process steps)
and John Sowa's message "Building the hierarchy" .
http://suo.ieee.org/email/msg09453.html
This note will reemphasize and expand on this connection.

John Sowa wrote:
> Bottom line:  Building the hierarchy can be done in
> a step-by-step fashion, and even the early stages
> can be useful to the SUMO and OpenCyc developers
> and anyone else who needs ontology building blocks.

The LMP document illustrates this is depth.

Here are some general comments.

All diagrams and processing steps of (LMP) are represented in the IFF. The
processing steps are ordered in the LMP document in order to illustrate how
John Sowa's diagram can be built up. However, in general these processing
steps can be used in any way that one wants. The two basic processing steps
in the context of theories are (1) FOL theory/language sum and (2) FOL
theory/language quotient. The lattice of theories contains many basic
processing steps, including (1) library of modules sum, (2) sub-theorizing,
(3) generic theory creation, (4) theory creation via meet, etc.

There is a lattice of theories determined by any FOL language. Each FOL
theory has an underlying FOL language, and hence the context of theories is
partitioned by the FOL languages. Each block of this partition corresponds
to the lattice of theories built from the FOL language associated with that
partition block. Following the preceding comments the following point should
be strongly emphasized: each lattice of theories lies within the context of
theories. In particular, any context of theories processing step can follow
any lattice of theories processing step.

Here are some comments connecting John Sowa's message to the LMP document.
____________________

LANGUAGE SUM
____________________

John Sowa wrote:
> Some strategy for distintuishing the identifiers
> used in different modules and relating them to
> one another is essential.  I'm renaming this
> thread "Building the hierarchy" because it
> addresses some important general principles.
>
> The first point to note is that every module in
> the hierarchy should have a unique identifier.
> Therefore, all names could be distinguished by
> a concatenated string such as "moduleID.localId".

This is represented by the FOL language sums in the first processing step in
the LMP. The standard sum of FOL languages is the disjoint union of the sets
of relation symbols, function symbols and constant symbols. Suppose that two
FOL languages L1 and L2 need to be summed, and suppose that we decide to use
the standard sum (disjoint unions). Suppose that L1 has relations {..., R_m,
...}, functions {..., f_j, ...}, and constants {..., c_p, ...}, and L2 has
relations {..., S_n, ...}, functions {..., g_k, ...}, and constants {...,
d_q, ...}. Then, using the numbers 1 and 2 as tags for disjoint union, the
sum FOL language has relations {..., (1, R_m), ..., (2, S_n), ...},
functions {..., (1, f_j), ..., (2, g_k), ...}, and constants {..., (1, c_p),
..., (2, d_q), ...}.
____________________

LANGUAGE QUOTIENT
____________________

John Sowa wrote:
> Keeping the names distinct is not a problem.
> The real problem is to have a systematic way
> of specifying the conditions for determining
> how and whether identifiers in different
> modules could be assumed to be "the same".
>
> As you noted in your email, one might assume
> that the SUMO names or the OpenCyc names had
> been coordinated with one another so that
> we could assume:
>
>  1. Two names with the same spelling in two
>     modules that had been extracted from
>     some larger module would have the same
>     intended referents.
>
>  2. Two names, even with the same spelling, in
>     indepedently developed modules could not be
>     assumed to have the same intended referents.
>
> The purpose of the lattice of theories (or at least
> a generalization hierarchy that represents some
> finite excerpt from or some finite step toward
> such a lattice) is to relate the names in
> different modules.

FOL languages and FOL theories represent unpopulated object-level
ontologies -- they are only concerned with types of things. FOL languages
consist of a set of relation symbols, a set of function symbols, and a set
of constants. When the FOL languages are sorted (as they are in the IFF),
there is also a set of sort or entity symbols (called entity types in the
IFF), and certain coherence assumptions may need to be made about entity
symbols being equivalent to certain unary relation symbols. Relations
symbols represent relation types, function symbols represent function types,
and entity symbols represent entity types. From the mathematical viewpoint,
constants are just nullary functions, and hence can be ignored.

Just as in set theory, quotients of FOL languages are determined by
endorelations (binary relations on a particular set, with the same set of
elements for each coordinate). There is an endorelation need for relations,
an endorelation needed for functions, and an endorelation needed for
entities. Moreover, arities must respect these endorelations -- the arity of
equivalent relations must be coordinatewise equivalent. Once suitable
endorelations are specified, then corresponding equivalence relations
(symmetric, transitive, reflexive closure) can be calculated and quotients
can be formed. Each quotient set is the collection of equivalence blocks for
the corresponding equivalence relation.

As an example, suppose that a FOL languages L needs to be quotiented. It is
important to note that endorelations and quotients are formed on a single
FOL language. Suppose that L has relations {..., R_m, ...}, functions {...,
f_j, ...}, and entities (sorts) {..., E_p, ...}. For simplicity, we will
only quotient relations and entities. Suppose that we want to regard the
following sort equivalences: E1 = E2 and E2 = E3, and suppose that we want
to regard the following relation equivalence: R(E1, E5) = S(E3, E5). These
arities respect the equivalences. Forming the quotient sets results in the
quotient relations {..., [R_m], ..., [R] = [S] = {R, S}, ...} and the
quotient entities (sorts) {..., [E_p], ..., [E1] = [E2] = [E3] = {E1, E2,
E3}, ..}.

As a final note, FOL models and FOL logics represent populated object-level
ontologies -- they not only are concerned with types of things, but are also
concerned with instances and how these instances are classified by the
types. Quotients can also be formed for models and logics. But the formation
works in a dual fashion: endorelations are formed on types (relation,
function, entity), but only a subset of instances (relation, function,
entity) may qualify, since equivalent instances must be classified by the
same types. For more on this, see the notion of a *dual invariant* in the
book "Information Flow: The Logic of Distrubuted Systems" by Jon Barwise and
Jerry Seligman. Finally, the need to form quotients of FOL logics occurs
when one is involved in the semantic integration of object-level ontologies.
There the endorelations are indirectly specified by an alignment diagram
with a coordinating reference or mediating theory.
____________________

LIBRARY OF MODULE SUM
____________________

John Sowa wrote:
> At the start of the joint project, we begin with
> a one-node hierarchy:  just the universal theory
> T at the top (the one with no axioms at all).
> Then we would put two big nodes on two separate
> branches under T:  for example, OpenCyc on the
> right branch and SUMO on the left branch.

A library of modules consists of a set of indices and a indexing function
from indices to modules (FOL theories). We think of the indexing function as
mapping into the background (usually infinite) lattice of theories for the
particular FOL language that underlies all of the theories in the lattice. A
library of modules is pictured as a diagram. This diagram is the (infinite)
diagram of the lattice of theories with only the indexed theories (modules)
highlighted, but all the other theories except for the top and bottom nodes
invisible (not pictured in the diagram). The top and bottom nodes are
special -- they are not indexed in the library, but they are pictured in the
diagram. The top represents the closed theory having only the logically true
expressions as its theorems, whereas the bottom represents the inconsistent
closed theory having all expressions (including non-satisfiable expressions
of the form P&~P) as its theorems.

Recall that the lattice of theories is not a partially ordered set, but only
a preorder with some of the elements equivalent to each other. The links in
the lattice of theories and any of its libraries of theories represent the
lattice order with a link going upward from theory T1 to theory T2 above it
when the two theories are ordered as T1 <= T2, where order means theory T1
is more specialized than theory T2 (has more theorems than T2). However,
since the lattice and any of its libraries is just a preorder, it is a
little difficult to picture equivalent theories (modules). In particular, a
theory may appear in the diagram for a library of modules, but be
inconsistent and hence equivalent to the bottom node.

We form the sum of two libraries of modules by forming the sum (disjoint
union) of their indexing sets, and forming the so-called copairing indexing
function that applies each component indexing function on the corresponding
component part of the indexing set. For example, suppose Lib1 is a library
of modules with indexing set {a, b} and indexing function T mapping a to Ta
and b to Tb, and suppose Lib2 is a library of modules with indexing set {c,
d, e} and indexing function S mapping c to Sc, d to Sd, and e to Se. Then
the sum Lib1 + Lib2 library of modules has indexing set {a, b} + {c, d, e} =
{a, b, c, d, e} and indexing function P = [T, S] mapping a to Pa = Ta, Pb =
Tb, Pc = Sc, Pd = Sd, and Pe = Se.
____________________

SUB-THEORIZING
____________________

John Sowa wrote:
> The next step would be to extract smaller modules
> (or microtheories) from the two big theories.
> Each of the smaller modules would be more general
> than the larger module from which it was extracted.
> Therefore, all the SUMO modules would lie on some
> branch between T and SUMO, and all the OpenCyc
> modules would lie on some branch between T and
> OpenCyc.  The hierarchy at this step is shown in
> the attached file, which I also put on my web site:
>
>    http://www.jfsowa.com/figs/suohier.gif

No further comments needed here.
____________________

THEORY CREATION VIA MEET
____________________

John Sowa wrote:
> The second drawing, suohier2.gif, shows a later stage in the development
> of the hierarchy, after some collaboration between the SUMO and OpenCyc
> groups.  The modules in blue represent smaller, more general theories,
> many of which are common to both groups and can be shared by them (or
> by anyone else who might find them useful).  The large theory in the
> middle (colored yellow) represents a customized ontology that somebody
> derived by combining the S3 module from SUMO, the O2 module from OpenCyc,
> and some of the blue modules. (It might also include some specialized
> axioms that were not taken from any other modules in the hierarchy.)

I assume that such a customized module (theory) is combined by taking the
lattice meet of various other theories. The lattice meet of a collection of
theories can be formed by taking the union of the axiom sets. As John Sowa
mentioned elsewhere, the customized theory may be inconsistent (equivalent
to the bottom node) and this nontrivial question must be checked.

Robert E. Kent
rekent@ontologos.org