Re: Many-sorted (or typed) logic
I wonder if I might try to pick this issue apart a little, as I think
both (all?) sides are in a sense right but in a different sense
wrong. As Fritz mentioned, this issue has been debated before in the
ontology community; it has also been debated in other forums,
including the theorem-proving community about 25 years ago and more
recently even at such academic heights as the Foundations of
mathematics (FOM) list. Moreover it is potentially important for the
choice of an ontology language, so for all these reasons I think it
is important to keep the issues straight.
Following advice from Matthew West, I will try to explain them so
that everyone can see what the issues are; I apologize in advance to
those to whom such 'tutorial' explanations might seem condescending.
What is this about?
The issue can be phrased as whether or not to allow higher-order
quantifiers in the ontological language, and how to decide what they
mean if we do allow them. Any ontological language worth a damn will
allow quantification over 'things' (sometimes 'entities' or some
other word is used), and this is often called 'first-order'
quantification. What makes it first-order is that there is a clear
distinction, in this kind of logic (first-order logic, or FOL),
between the things that quantifiers range over and the assertions
that are made about them in the logic. Higher-order quantifiers treat
properties of things as another kind of 'thing', which makes
everything more complicated.
A brief summary of logical semantics.
The basic idea of 'model theory' as it is often called (a better term
would be 'interpretation theory') is to give a mathematical
specification of what would count as a way to interpret the logic.
The logical expressions themselves say something, but what do they
say it about? An 'interpretation' is a way to answer that question.
An interpretation defines just enough structure to give a meaningful
truthvalue to every expression of the logical language. There are
interpretation rules which often seem so obvious as to be hardly
worth saying, such as that the meaning of '(and p q)' is true in an
interpretation just when the meaning of 'p' is true and the meaning
of 'q' is also true in that interpretation, but it gets interesting
when one gets to the quantifiers. The way that first-order
quantification is understood is that the quantifier ranges over some
set of 'things' called 'the universe' of the interpretation. In fact,
the choice of a universe (it usually has to be nonempty) is the first
thing to be done in specifying a first-order interpretation, and then
each name is interpreted to be the name of a member of the universe,
each predicate is given an interpretation which is a subset of the
universe, each binary relation is a subset of pairs of elements of
the universe, and so on. A statement beginning with the universal
quantifier is true just when the body of the statement (without the
quantifier) is true for every value of the bound variable. Notice
that this might come out different if we had chosen a different
universe: the meaning of the quantifier depends on the entire
universe, not just on some part of it.
(The title line refers to many-sorted logics. This is just a way of
saying that some of the predicates of the language are given a
special status as a 'sort', and some extra syntactic machinery is
provided to restrict quantifiers to one of these sorts, and maybe
some special inference schemes are also added for reasoning
efficiently about sort heirarchies. But all the sorts are subsets of
the same universe.)
An interpretation specifies an unique value for every expression of
the logic; and indeed it can be characterised as the 'smallest'
mathematical structure which would assign a truthvalue to every
expression in the language in this way: it is a kind of minimal
mathematical specification of a possible universe. It is one way the
world could be so as to make all the sentences perfectly clear in
their meaning. These 'ways the world might be' are not themselves
part of the logic, and are not to be thought of as in any physical or
inferential or mechanical way 'connected' to the logic. They are not
required to be computable or constructible (though some of them may
be) and they are not constrained to be 'made of' any particular kind
or category of 'stuff'. They are a purely mathematical description of
the idea of a 'possible meaning' for the logic.
It is important to see that this is only ONE way the world could be.
It doesnt make sense to ask what is THE interpretation of an
expression in a logical language. In general one cannot point to a
single interpretation and say anything very useful about the logic
from that single interpretation, since the logic may well have
different interpretations which have different properties. What one
can do is relate inference in the logic - the mechanical business of
drawing conclusions and creating proofs - to what is true in *all*
interpretations of the logic. Sometimes a logic is "complete" in the
sense that a sentence is provable exactly when it is true on all
possible interpretations (according to the rules of interpretation
given). This is a very nice property of a logic and its semantics,
since it shows that they are in a precise kind of correspondence with
each other. If a logic is not 'complete' in this way (note, the name
doesnt mean that it is 'completely defined' or that everything
intuitively true is provable in it) then it means that the semantics
- the way the language is being interpreted - and the proof machinery
- which in our terms means how the computer is able to use the logic
- are out of step with each other: the semantics makes claims about
the logic which the machinery cannot guarantee.
(Notice that the very same language could be given different rules of
interpretation. This isnt often done for FOL but it could be, and it
is often done for other logics. Completeness only makes sense with
respect to a set of interpretation rules, so a logic may be complete
with reference to one semantics but incomplete with repect to
another.)
A natural question to ask is, can one write enough axioms into an
ontology to be able to guarantee that the only interpretations which
satisfy those axioms are just the ones one has in mind? And the
answer is: almost certainly, NO. Almost all interesting ontologies
have 'nonstandard' interpretations, ie interpretations which satisfy
the logical rules but which don't correspond to the intuitions one
had in mind when writing the axioms. However, it is important to not
let this get one down, since it is in fact a source of power rather
than a kind of defeat. The existence of nonstandard interpretations
doesnt amount to a claim that there is something wrong with the
axioms, or to say that the axioms in some sense are infected by these
unintended interpretations, or that one is somehow 'forced' into
considering them. In foundations of mathematics work nonstandard
models are often considered an embarrassment, since mathematicians
aspire to complete theories of Platonic realities, but for ontology
writing it seems to be what one would expect: it's another way of
saying that ontologies must be open-ended. There is always something
more you could say about your intended topic: the ontological theory
is never 'finished' in the sense that it says everything that could
logically be said about a topic in enough detail to guarantee that
nobody will ever be able to say anything more about it.
('Nonstandard' interpretations can often be very interesting, in any
case, even in mathematics. For example there is an entire field
concerned with 'nonstandard' interpretations of arithmetic in which
infinitesimals are allowed.) But the key point is, whether one likes
it or not, nonstandard interpretations are perfectly good
interpretations, and so there is no way to legislate them out of
existence or claim that they are somehow forbidden.
Now, a final point about interpretations. I said above that proofs
could be related to what is true in *all* interpretations. This 'all'
assumes that we have given an exact characterisation of what counts
as an interpretation, which is relatively easy for 'first-order'
quantification; but when we turn to 'higher-order' quantification,
the story gets more complicated, since there will be several
different ways to interpret a higher-order quantifier, and these
correspond to different kinds of interpretation, and so it is no
longer quite so clear what it means to say 'all interpretations'.
This is where the Chris/John/Fritz exchange becomes tricky.
Higher-order logical notation
Mathematicians decided about 85 years ago that they needed a way to
talk about properties and relations as well as about things, and
practicing ontologists came to a similar conclusion about 75 years
later. First-order logic has symbols for properties, of course, but
it doesnt allow one to say something about 'all properties': it only
has names for properties, not quantifiers and variables over them.
Second-order logic (SOL) allows such quantifiers (and third-order
logic allows properties of properties, and fourth-order... you get
the idea. 'Higher-order logic' (HOL) is sometimes taken to mean any
of these logics and sometimes to mean their union where one has all
properties of properties of properties of..... up as far as you want.
It might seem like a big difference, but in practice once you have
second order you pretty much get all the rest, so I won't bother to
be exact about the distinctions.)
Now it is not hard to tweak the syntax of the logical language to
allow quantifiers over properties (and relations: again I won't
bother with the details). You just allow variables to be put in the
predicate position of an expression, ie you allow things like (?P a
?x) as well as (P ?x ?y), and then you allow them to be quantified.
Nothing to it: CYCL does it without even batting an eyelid. (KIF
carefully excludes this, which IMHO is a major problem for using KIF
in ontology specification. I agree with Fritz and I think with John
here). It gets a bit trickier however to specify inference rules and
interpretations.
Inferences using higher-order notation.
It is almost as easy to adapt the usual FOL inference rules/machinery
to handle these property quantifiers. In fact one hardly needs to
adapt it: it works on them just like it works on other variables. The
usual unification algorithm, for example, used by first-order
reasoners, works just fine with the variables in the first position
of an expression. (As far as unification is concerned, the expression
is just a tree with constant names or variables at its tips. The
usual first-order restriction to only allow constants in the relation
position has to be added as an explicit restriction.) So what, one
might well ask, is all the fuss about?
A hint of the problem can be seen by asking whether there ought to be
any new inference rules which we can now use which we couldnt use
before. For example, suppose we have asserted (I'll do this in
'illegal KIF')
(not(exists ((?P))(and (?P a) (?P b)) )
and something else asserts (FOO a) and (BAZ b). Now, is this a
contradiction? The slightly-modified first-order reasoner isnt going
to see any contradiction here. But does this mean there is no
contradiction, or that first-order reasoning isnt up to the new task?
Well, it depends on what that "exists ((?P))" is supposed to mean.
What that says is, intuitively, there exists a property which is true
of both a and b, ie a and b 'share a property'. Well, do they? One
person (call him F.) might say: no, or at least not unless FOO is the
same as BAZ. But someone else (call him P.) might say: Oh yes they
do. Consider the property 'FOO OR BAZ", ie the property which is true
of something just when either FOO or BAZ is true of it (compare "No
dogs or Irish need apply"). If FOO and BAZ are properties, so is
this. And this disjunction property is true of both a and of b, so it
is a counterxample to the axiom, so there is a contradiction here.
Imagine how the resulting debate might go:
F: That 'property' of yours isn't a real property.
P: Sure it is. I can write a sentence in the logical language which defines it:
(defproperty FUBAZ(?x) (or (FOO ?x)(BAZ ?x)))
F: But you didn't write that definition, did you?
P: But that is irrelevant: just because I didnt bother to define it
explicitly, doesnt mean it doesnt *exist*. When I quantify over the
universe, I don't quantify only over the things that have been
defined explicitly. Even first-order quantification wouldnt work
properly with that interpretation.
F: But if you allow property quantifiers to range over all properties
which could possibly be defined, they become useless, since there are
all kinds of those 'properties'. In fact there is, potentially, such
a property for every expression in the logic.
P: Yes, indeed, there is; and that is why it is so hard to give a
complete set of inference rules for higher-order quantifiers. But
whether you like it or not, that is the universe of properties over
which your quantifiers are supposed to be interpreted, so you had
better face up to that fact.
F: Hey, I can make my quantifiers range over any set I want them to
range over! The question is, who is to be master, that is all.
R: (A stranger who has been listening nearby): Actually the situation
is even worse than either of you realize. Second-order quantifiers
are true only over the set of ALL properties, whether they can be
defined or not, so there might be more properties than there are
expressions in the language, even.
F and P , together: Listen, things are hard enough already. Go back
to wherever you came from and leave us alone.
Second order universes
As one can see, the issue turns on what universe the higher-order
quantifiers are supposed to range over. If we were giving an
interpretation for a higher-order logic, we would have to choose a
basic universe for the ordinary quantifiers to range over, but what
would be the proper universe for the the second-order quantifers?
Here are a few possible answers:
1. The set of all properties, ie all subsets of the first-order
universe (often called the 'power set' of the universe). (R's answer.)
2. The set of all definable properties, ie those which could be
defined using an expression of the language. This is a subset of the
power set. (P's answer)
3. The set of all named properties, ie the properties which have been
assigned to predicate names by the interpretation. This is an even
smaller subset of the power set. (F's answer.)
Of these answers 1. seems best: it is simple, clear, intuitive and
has the merit that it is completely determined by the first-order
universe itself, so that the choice of a 'basic' universe determines
the whole universe for all the other quantifiers (third-order
property quantifiers range over subsets of subsets of the universe,
etc. on up.). This is the 'classical' semantics for higher-order
logic, and it is what most mathematicians mean when they contrast
higher-order with first-order logics. Unfortunately for us it is
utterly impractical, as one can prove that NO logical system of rules
can possibly be complete with respect to this semantics. (The reason
is quite interesting: basically because the power set of a countable
universe is uncountable. This means for example that there are
uncountably many properties of the integers; see R's interjection.
Maybe we don't really want to be quantifying over such a large set
after all...) What this means is that no matter how loudly someone
might claim that this was the semantics they had in mind for their
logic, the logic itself - and any implementation of it - will not in
fact accurately reflect this semantics. It can only be wishful
thinking. There is absolutely no way around this, its just a
mathematical fact of life. This is the sense in which Chris is right
in emphasising that SOL (or HOL) cannot be transcribed into FOL or
indeed into any first-order theory.
(An aside. Fritz's remark that we just have to abandon completeness
is a misunderstanding about the nature of completeness. If we
'abandon completeness' in this sense then we are just claiming the
right to assign arbitrary semantic interpetations to our logics as a
kind of ritualization of fantasy. I can claim that the symbol "urgh"
has the same semantics as the US Constitution, but what does that
claim amount to? How could it be justified? It wouldnt be enough for
me to just say that this was my "intended interpretation" and expect
my machinery to work accordingly. Completeness is the only guarantee
that our semantic claims about our logics, and the actual inferential
performance of those logics, have any meaningful relationship to each
other. 'Abandoning completeness' allows one to bask in a kind of glow
of satisfaction about what ones axioms seem to be saying, but it
doesnt help with the awkward fact that the 'nonstandard' models which
one has conveniently refused to allow into ones preferred semantic
account are, like it or not, the sand in the machinery of ones
inference system.)
(Just as a point of clarification: Ive been careful to distinguish
between a logic, and a language or notation. One can take a given
notation and define a variety of different inference rules for it,
each giving a different logic. In the FOO OR BAZ example, there are
(at least) two different logics being considered, but they both
involve the same language. The moral is that one cannot tell a logic
from its syntax: one needs to know something more about it. This
'something more' might be some inference rules or it might be a
system of interpretation, or both. If both and they correspond to
each other, we have completeness. If only a semantics with no rules,
we have an aspiration. If only rules but no semantics, we have
confusion.)
Returning to the above answers, the imaginary debate between F. and
P. is really about the difference between 2. and 3. Here there is
something useful to be said. Answer number 2 - P's position in the
debate - gives what is often called the Henkin semantics for HOL
(named for its inventor) . This has several nice properties. There is
a complete set of inference rules, and HOL with this semantics can be
transcribed into a (rather clumsy, but never mind for now) ontology
in FOL. (It is a many-sorted logic with a sort for each arity of each
level, eg there is a separate sort for third-order predicates with
seven arguments, etc., and a lot of axioms about how various
combinations of predicates are connected with each other.) This is
the sense which Chris dismisses as being just a notational variant on
FOL, as indeed in a robust mathematical sense it is. Nevertheless, it
is a distinctly different logic to mechanize, one with its own
special inference rules which go beyond the usual first-order
inference machinery (for example, it would be able to find the FOO OR
BAZ contradiction all by itself) and which are very tricky to
implement. (Another way of saying the same thing is that the FOL
ontology which would 'transcribe' a Henkin-interpreted HOL is so
clumsy as to be unusable in practice.) For example, the usual
unification algorithm notoriously does not work in this version of
HOL (it doesnt even work for SOL). I know of no current
implementation of higher-order logic in this sense, though if there
was one that would be a wonderful thing. (To implement the logic
directly one would need rules of lambda-conversion to be incorporated
into the unification algorithm.)
Answer 3 makes a higher-order quantification amount to even less than
the second answer, and as that is equivalent to FOL, one might expect
that with this answer, a higher-order quantification is actually LESS
expressive than a first-order quantifier, and indeed that is the
case. What this amounts to is that higher-order quantifiers are
equivalent to quantifications over the names of properties rather
than the properties themselves. When one writes
(forall ((?P))...?P....) one really means, on this interpretation,
something like
(forall ((?n)) (implies (PropertyName ?n) ....(propertynamedby ?n)....)
and any property that doesnt have an explicit name is just not in the
universe of properties.
Now, this last way of interpreting second-order quantifiers is SO
limited in semantic expressiveness that the mathematicians of 85
years ago wouldnt have taken it seriously for a minute. Nevertheless,
I would urge, this is in fact the most useful way to interpret
higher-order quantifiers for ontological use. It means that
higher-order quantifiers are severely restricted in their scope, but
still it allows one to write properties of properties of ... etc., to
be able to draw conclusions about named properties, to define classes
of properties, etc., and use all that expressive licence whichFritz
and other happy CYCLists have been taking for granted for years; and
still have the resulting inferences interpreted by a standard
efficient first-order reasoner.
In terms of the 'semantic debates' to which Chris and John have been
referring, it is so weak as to be hardly worth mentioning, but for
practical usage in writing ontologies it is of immense and liberating
value. It costs nothing in inference complexity and does not
complicate the proof theory in any harmful way, and it fits smoothly
into the usual inference machinery. Its very weakness is an odd
source of strength, in that it renders harmless a kind of use/mention
casualness which one sees often in CYC, where a predicate on a
property can refer either to the property or to the name of the
property. In conventional interpretations this is a potential
disaster, but with this weak notion of higher-order quantifier it is
harmless. It also allows the user to control the universe of
quantification by providing new definitions, for example.
Why everyone except me is wrong
So, to return to the adhominemism for a moment. Chris is the
mathematician here, and he is correct (contra John) that HOL with the
Henkin semantics is no more expressive than FOL, and that it is
incorrect to say that SOL (still less HOL) is *equivalent* to a
sorted first-order theory. Nevertheless John is right to emphasise
that even with the Henkin semantics, SOL and HOL are distinct
languages with their own distinct proof styles and are in many ways
superior to their transcription into first-order language.
Unfortunately these strengths do not translate into mechanizability,
as far as I know. Fritz is, I guess, entitled to claim that he will
ignore completeness and hence claim that his logic fits a semantic
theory which it in fact doesn't fit, but then he is also entitled to
claim that black is white. The truth of the matter is that the logic
which Fritz likes so much is actually FOL inference applied directly
to HOL syntax, an odd blend from the traditional point of view but
one which (Fritz is right to emphasise) is of considerable utility
for representing ontologies, and which is readily mechanizable using
well-known techniques.
(But the debates would be a lot shorter and less noisy if John and
Fritz would stop making philosophical claims which are overstated,
controversial and unnecessary to their main points and irrelevant to
the main topic at hand. So there.)
-----
I plan to expand on this in a later message, giving a more thorough
description of the proposed language. One might call it a CYClified
version of KIF. (This logic lies somewhere in between answers 2 and
3, by the way, which is where CYCL may also lie.)
Pat Hayes
---------------------------------------------------------------------
IHMC (850)434 8903 home
40 South Alcaniz St. (850)202 4416 office
Pensacola, FL 32501 (850)202 4440 fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes