SUO: The lattice of theories
At 08:25 2002-06-10 -0400, John F. Sowa wrote:
> The term "infinite lattice" may be distracting, since it sounds
> very, very big. But you can have a lattice with just one element
> or as many as you need or want.
> It should be compared to the integers, which are also infinite,
> but nobody ever uses more than a finite subset. Recognizing that
> the integers are infinite is essential for theoretical reasons, and
> it is important in practice: it ensures that we will never run out
> of integers, no matter how big our computers get.
> It might sound more comforting to call it a "library of theories".
> Then it sounds like a more conventional kind of thing that everyone
> is familiar with: an indexed collection of things, where each item
> has a unique identifier; a catalog with a short description of each
> item; a method for finding any item; and a "neighborhood" around
> each item that contains related items for convenient browsing.
Mike and John-
About a month ago (2002-05-12 E-mail to the SUO list), I suggested some ideas for organizing the SUO work that are consistent with your "lattice" or "library of theories" approach. I identified three kinds of standards as necessary (I've slightly edited my words of 2002-05-12):
- Type #1 (the registry): A registry of general concepts where there is common agreement. The goal would be to build consensus around a bunch of concepts and, hopefully, these general concepts could be used as a foundation for developing other concepts. Note 1: I believe that SUMO, some portion of IFF, and OpenCyc might be this kind of standard. Note 2: We might want to relax the constraint of "common agreement". Note 3: This kind of standard usually produces two separate standards: the registry (i.e., the "table" of entries -- the technical part), and the registration authority process (i.e., how we agree to include entries in the table -- the administrative part).
- Type #2 (the relationships/mappings): A technique for mapping one set of concepts to another. Why do we care about mappings? Because, e.g., that's all the market wants, or maybe it's impossible to get to common agreement on all the entries in Type #1. So this kind of standard would specify how we describe, in general, the mapping of one set of concepts to another. Note: I believe that some portion of IFF might be this kind of standard.
- Type #3 (the descriptive attributes): We specify the necessary attributes to describe a concept. In other words, we wouldn't be specifying the concepts themselves, but we'd specify the descriptive attributes that are required when describing a concept. Note: Someone had mentioned something like "metadata for ontologies" <-- an example of a Type #3 standard.
These types of standards could be further partitioned or combined ... other types are possible, too. And once we've settled on the types of standards we want, we'll probably need to specify how to use this kind of information from an IT perspective, e.g., codings (file formats, XML, KIF, etc.), APIs (objects/interfaces with convenient paradigms in C/C++, Java, LISP, etc.), protocols (ontology exchange, storage/retrieval, etc.), and so on. <-- But most of this stuff is lower-level engineering (also known as "bindings"); first we really need to decide what kind of standard we want: Type #1, ..., Type #N, or some combination.
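To make the three types concrete, here is a minimal sketch in Python of how they might fit together as data structures. Everything in it is an assumption for illustration: the class names, the field names, and the identifiers "src-a:metre" and "src-b:meter" are hypothetical and do not come from SUMO, IFF, or any standard. The registration authority process (the administrative part of Type #1) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptMetadata:
    """Type #3: the descriptive attributes required for each concept (hypothetical fields)."""
    identifier: str
    short_description: str
    source: str


@dataclass(frozen=True)
class Mapping:
    """Type #2: a relationship from one registered concept to another."""
    source_id: str
    target_id: str
    relation: str  # illustrative vocabulary, e.g. "equivalent-to"


@dataclass
class Registry:
    """Type #1: the 'table' of entries (technical part only)."""
    entries: dict = field(default_factory=dict)
    mappings: list = field(default_factory=list)

    def register(self, meta: ConceptMetadata) -> None:
        # Each entry needs a unique identifier.
        if meta.identifier in self.entries:
            raise ValueError(f"duplicate identifier: {meta.identifier}")
        self.entries[meta.identifier] = meta

    def register_mapping(self, mapping: Mapping) -> None:
        # Mappings are registered too, and both ends must already be entries.
        for concept_id in (mapping.source_id, mapping.target_id):
            if concept_id not in self.entries:
                raise KeyError(f"unregistered concept: {concept_id}")
        self.mappings.append(mapping)


registry = Registry()
registry.register(ConceptMetadata("src-a:metre", "Unit of length", "Source A"))
registry.register(ConceptMetadata("src-b:meter", "Unit of length", "Source B"))
registry.register_mapping(Mapping("src-a:metre", "src-b:meter", "equivalent-to"))
```

The registry "grows" by calling `register` as entries are approved, and two sources that never agree on a single entry can still be related through a registered `Mapping`.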
> Mike Uschold wrote:
> MU> Even if we succeeded in building your dream lattice of many
> > different theories, it is not clear to me that such a beast would be
> > usable in practice. It could be very unwieldy indeed. Simplicity
> > will help ensure usability.
> On the contrary, a lattice is the simplest kind of organization you
> can imagine for a library of theories. It is much better than the
> linear indexing used in libraries of books, which can never accommodate
> books that happen to address more than one topic. A lattice lets you
> have multiple pointers to all the related items. If you focus on any
> one theory, you can find its neighbors by going up (generalizations),
> going down (specializations), or going sideways (siblings).
> Furthermore, you can start your library with as many or as few theories
> as you want. A single monolithic theory, such as SUMO, is simply a
> lattice with just one element. If you break SUMO into modules, you can
> still keep SUMO in the lattice as one complete whole, with each of its
> submodules as different generalizations.
As I mentioned in my E-mail of 2002-05-12, the approach would be to "grow" the registry over time ... add entries, as appropriate/as approved. I don't know how many entries SUMO would require ... and there might be more than one "modular" representation/decomposition of SUMO if we used Type #1 and Type #2 standards.
Once the registry became large, it would have the usual issues concerning searching, indexing, tagging, etc. This is why I suggested that we also look into a Type #3 standard ("metadata for ontologies") to support the larger structures when they become available.
> Another nice feature is that the location of any element of the
> lattice can be automatically computed. You don't need an army of
> librarians to catalog the theories. You just let the IFF operators
> compute the position of any new theory relative to the ones that are
> already there.
This is a feature of registration (into the registry) and mappings (which are registered themselves in the registry).
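A rough sketch of what "computing the position" could look like, under a strong simplifying assumption: each theory is modeled as a plain set of axiom names, and theory A counts as a generalization of theory B when A's axioms are a proper subset of B's. This is not the IFF operators, which would work with logical entailment rather than set inclusion; the function name `place` and the axiom names are hypothetical.

```python
def place(new_theory, lattice):
    """Return (parents, children) of new_theory among the theories in lattice.

    Theories are frozensets of axiom names. Assumption: fewer axioms means
    more general, so generalizations are proper subsets.
    """
    above = [t for t in lattice if t < new_theory]  # all generalizations
    below = [t for t in lattice if t > new_theory]  # all specializations
    # Keep only the nearest neighbors in each direction.
    parents = [t for t in above if not any(t < u for u in above)]
    children = [t for t in below if not any(u < t for u in below)]
    return parents, children


top = frozenset()
preorder = frozenset({"reflexive", "transitive"})
total_order = frozenset({"reflexive", "transitive", "antisymmetric", "total"})
lattice = [top, preorder, total_order]

partial_order = frozenset({"reflexive", "transitive", "antisymmetric"})
parents, children = place(partial_order, lattice)
# parents  == [preorder]     (the nearest generalization)
# children == [total_order]  (the nearest specialization)
```

Registering the new theory then amounts to recording it together with the mappings to its computed parents and children.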
> A lattice is the simplest way of organizing a library of theories.
> MU> There will be some, hopefully small number of sets of axioms that
> > DO INDEED have some privileged status, in the sense of being an
> > agreed standard that others conform to.
> Sure. That is what we have in any library of books. Some are very
> popular, and some are never read. You can have a "best sellers"
> list, a reference section, etc. You can put a big pointer to Oprah's
> latest "standard" so that everybody can find it.
Regarding so-called "privileged status", I am uncertain if this is necessary. I agree with John that a better approach might be simply to leave the "registry" (or "library", as John calls it) alone -- without so-called "privileged status". I'm not sure what "privileged status" might mean or how we would get agreement on its meaning.
Maybe there is an entry on "Oprah's perspective of the world", and people who think like Oprah can find their entries. On a more practical note, there might be entries that correspond to international standards (ISO/IEC/ITU) that might be useful because they might be widely adopted, yet there is no guarantee that the international standard is correct from a theoretical perspective.
For example, in my E-mail of 2002-05-12 I mentioned the ISO standard for units of measurement (ISO 31-*). Even though this series of standards is in widespread use in industry, trade, and science, there is no assurance that these are the "correct" descriptions.
So "privileged status" might be (1) difficult to achieve (how would we know we are "really right", other than this group agreeing?), and (2) impractical for use ... e.g., for business applications, I'd want to use the ISO version of units of measure because that is the one all my trading partners are using.
> MU> This will be driven by the impracticality of dealing with
> > infinitely many infinitely variable ways to combine many different
> > sets of axioms.
> Think about the integers. There are infinitely many infinitely
> variable ways to combine them. But children learn to count very
> quickly, and the infinity never bothers them. You just use what you
> need, ignore what you don't, and never worry about the ones you don't
> need or want.
I'd respond differently ... One doesn't worry about algebraic expressions, even though there are "infinitely many infinitely variable ways to combine them", right? So, if one considers "walking a lattice" (i.e., navigation in IT terminology) to be constructing an expression or query, then there are certainly an infinite number of possibilities, but the main concerns would include: (1) well-understood "semantics" for the navigation/expression/query technique, and (2) an implementation that supports these "semantics". <-- I'm trying to frame this sentence using the context of standards documents (replace "semantics" with "standards wording") to make it clear what kind of standards wording is necessary to achieve this kind of expressiveness.
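As a small illustration of navigation-as-expression, the sketch below defines two primitive steps with fixed semantics ("up" to immediate generalizations, "down" to immediate specializations) and lets a walk be any finite sequence of them. The table `PARENTS`, the theory names, and the function names are all hypothetical; the point is only that infinitely many walks are possible while the semantics of each step stays small and well defined.

```python
# Hypothetical lattice fragment: each theory maps to its immediate generalizations.
PARENTS = {
    "preorder": set(),
    "partial_order": {"preorder"},
    "equivalence": {"preorder"},
    "total_order": {"partial_order"},
}


def up(nodes):
    """Immediate generalizations of every theory in nodes."""
    return set().union(*(PARENTS[n] for n in nodes))


def down(nodes):
    """Immediate specializations of every theory in nodes."""
    return {child for child, parents in PARENTS.items() if parents & nodes}


def walk(start, steps):
    """Evaluate a navigation expression: apply each step in order."""
    current = {start}
    for step in steps:
        current = step(current)
    return current


# "Sideways" is not a third primitive: it is the expression up-then-down,
# minus the starting theory.
siblings = walk("partial_order", [up, down]) - {"partial_order"}
# siblings == {"equivalence"}
```

A standard would need to pin down the meaning of `up` and `down` (concern 1); any implementation that honors that wording supports every expression built from them (concern 2).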
> MU> Any UNIMPORTANT differences among representations for the same
> > concepts should be eliminated. To some extent it will be arbitrary
> > deciding which one to choose. Where there are important differences
> > there should be an attempt to keep them to a minimum and as you say,
> > document them carefully so users can choose.
> That is the beauty of having a library of theories instead of one big
> lump. Each separate theory can be reviewed, studied, and evaluated
> on its own merits and on its merits in comparison to its neighbors.
One reason for keeping slightly different representations is to deal with "drift" over time ("terminology drift", "concept drift", etc.). It might be useful to have the 1970-timeframe thinking available, even though (presumably) the 2002 thinking would be more up-to-date. For practical reasons, both would need to be available.
Frank Farance, Farance Inc. T: +1 212 486 4700 F: +1 212 759 1605
Standards, products, services for the Global Information Infrastructure