Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

Re: Many-sorted (or typed) logic




I wonder if I might try to pick this issue apart a little, as I think 
both (all?) sides are in a sense right but in a different sense 
wrong. As Fritz mentioned, this issue has been debated before in the 
ontology community; it has also been debated in other forums, 
including the theorem-proving community about 25 years ago and more 
recently even at such academic heights as the Foundations of 
mathematics (FOM) list. Moreover it is potentially important for the 
choice of an ontology language, so for all these reasons I think it 
is important to keep the issues straight.

Following advice from Matthew West, I will try to explain them so 
that everyone can see what the issues are; I apologize in advance to 
those to whom such 'tutorial' explanations might seem condescending.

What is this about?

The issue can be phrased as whether or not to allow higher-order 
quantifiers in the ontological language, and how to decide what they 
mean if we do allow them.  Any ontological language worth a damn will 
allow quantification over 'things' (sometimes 'entities' or some 
other word is used), and this is often called 'first-order' 
quantification. What makes it first-order is that there is a clear 
distinction, in this kind of logic (first-order logic, or FOL), 
between the things that quantifiers range over and the assertions 
that are made about them in the logic. Higher-order quantifiers treat 
properties of things as another kind of 'thing', which makes 
everything more complicated.

A brief summary of logical semantics.

The basic idea of 'model theory' as it is often called (a better term 
would be 'interpretation theory') is to give a mathematical 
specification of what would count as a way to interpret the logic. 
The logical expressions themselves say something, but what do they 
say it about? An 'interpretation' is a way to answer that question. 
An interpretation defines just enough structure to give a meaningful 
truthvalue to every expression of the logical language. There are 
interpretation rules which often seem so obvious as to be hardly 
worth saying, such as that the meaning of '(and p q)' is true in an 
interpretation just when the meaning of 'p' is true and the meaning 
of 'q' is also true in that interpretation, but it gets interesting 
when one gets to the quantifiers.  The way that first-order 
quantification is understood is that the quantifier ranges over some 
set of 'things' called 'the universe' of the interpretation. In fact, 
the choice of a universe (it usually has to be nonempty) is the first 
thing to be done in specifying a first-order interpretation, and then 
each name is interpreted to be the name of a member of the universe, 
each predicate is given an interpretation which is a subset of the 
universe, each binary relation is a subset of pairs of elements of 
the universe, and so on. A statement beginning with the universal 
quantifier is true just when the body of the statement (without the 
quantifier) is true for every value of the bound variable.  Notice 
that this might come out different if we had chosen a different 
universe: the meaning of the quantifier depends on the entire 
universe, not just on some part of it.

(The title line refers to many-sorted logics. This is just a way of 
saying that some of the predicates of the language are given a 
special status as a 'sort', and some extra syntactic machinery is 
provided to restrict quantifiers to one of these sorts, and maybe 
some special inference schemes are also added for reasoning 
efficiently about sort heirarchies. But all the sorts are subsets of 
the same universe.)

An interpretation specifies an unique value for every expression of 
the logic; and indeed it can be characterised as the 'smallest' 
mathematical structure which would assign a truthvalue to every 
expression in the language in this way: it is a kind of minimal 
mathematical specification of a possible universe. It is one way the 
world could be so as to make all the sentences perfectly clear in 
their meaning. These 'ways the world might be' are not themselves 
part of the logic, and are not to be thought of as in any physical or 
inferential or mechanical way 'connected' to the logic.  They are not 
required to be computable or constructible (though some of them may 
be) and they are not constrained to be 'made of' any particular kind 
or category of 'stuff'. They are a purely mathematical description of 
the idea of a 'possible meaning' for the logic.

It is important to see that this is only ONE way the world could be. 
It doesnt make sense to ask what is THE interpretation of an 
expression in a logical language. In general one cannot point to a 
single interpretation and say anything very useful about the logic 
from that single interpretation, since the logic may well have 
different interpretations which have different properties. What one 
can do is relate inference in the logic - the mechanical business of 
drawing conclusions and creating proofs - to what is true in *all* 
interpretations of the logic. Sometimes a logic is "complete" in the 
sense that a sentence is provable exactly when it is true on all 
possible interpretations (according to the rules of interpretation 
given). This is a very nice property of a logic and its semantics, 
since it shows that they are in a precise kind of correspondence with 
each other. If a logic is not 'complete' in this way (note, the name 
doesnt mean that it is 'completely defined' or that everything 
intuitively true is provable in it) then it means that the semantics 
- the way the language is being interpreted - and the proof machinery 
- which in our terms means how the computer is able to use the logic 
- are out of step with each other: the semantics makes claims about 
the logic which the machinery cannot guarantee.

(Notice that the very same language could be given different rules of 
interpretation. This isnt often done for FOL but it could be, and it 
is often done for other logics. Completeness only makes sense with 
respect to a set of interpretation rules, so a logic may be complete 
with reference to one semantics but incomplete with repect to 
another.)

A natural question to ask is, can one write enough axioms into an 
ontology to be able to guarantee that the only interpretations which 
satisfy those axioms are just the ones one has in mind? And the 
answer is: almost certainly, NO. Almost all interesting ontologies 
have 'nonstandard' interpretations, ie interpretations which satisfy 
the logical rules but which don't correspond to the intuitions one 
had in mind when writing the axioms.  However, it is important to not 
let this get one down, since it is in fact a source of power rather 
than a kind of defeat. The existence of nonstandard interpretations 
doesnt amount to a claim that there is something wrong with the 
axioms, or to say that the axioms in some sense are infected by these 
unintended interpretations, or that one is  somehow 'forced' into 
considering them. In foundations of mathematics work nonstandard 
models are often considered an embarrassment, since mathematicians 
aspire to complete theories of Platonic realities, but for ontology 
writing it seems to be what one would expect: it's another way of 
saying that ontologies must be open-ended. There is always something 
more you could say about your intended topic: the ontological theory 
is never 'finished' in the sense that it says everything that could 
logically be said about a topic in enough detail to guarantee that 
nobody will ever be able to say anything more about it. 
('Nonstandard' interpretations can often be very interesting, in any 
case, even in mathematics. For example there is an entire field 
concerned with 'nonstandard' interpretations of arithmetic in which 
infinitesimals are allowed.)  But the key point is, whether one likes 
it or not, nonstandard interpretations are perfectly good 
interpretations, and so there is no way to legislate them out of 
existence or claim that they are somehow forbidden.

Now, a final point about interpretations. I said above that proofs 
could be related to what is true in *all* interpretations. This 'all' 
assumes that we have given an exact characterisation of what counts 
as an interpretation, which is relatively easy for 'first-order' 
quantification; but when we turn to 'higher-order' quantification, 
the story gets more complicated, since there will be several 
different ways to interpret a higher-order quantifier, and these 
correspond to different kinds of interpretation, and so it is no 
longer quite so clear what it means to say 'all interpretations'. 
This is where the Chris/John/Fritz exchange becomes tricky.

Higher-order logical notation

Mathematicians decided about 85 years ago that they needed a way to 
talk about properties and relations as well as about things, and 
practicing ontologists came to a similar conclusion about 75 years 
later. First-order logic has symbols for properties, of course, but 
it doesnt allow one to say something about 'all properties': it only 
has names for properties, not quantifiers and variables over them. 
Second-order logic (SOL) allows such quantifiers (and third-order 
logic allows properties of properties, and fourth-order... you get 
the idea. 'Higher-order logic' (HOL) is sometimes taken to mean any 
of these logics and sometimes to mean their union where one has all 
properties of properties of properties of..... up as far as you want. 
It might seem like a big difference, but in practice once you have 
second order you pretty much get all the rest, so I won't bother to 
be exact about the distinctions.)

Now it is not hard to tweak the syntax of the logical language to 
allow quantifiers over properties (and relations: again I won't 
bother with the details). You just allow variables to be put in the 
predicate position of an expression, ie you allow things like (?P a 
?x) as well as (P ?x ?y), and then you allow them to be quantified. 
Nothing to it: CYCL does it without even batting an eyelid.  (KIF 
carefully excludes this, which IMHO is a major problem for using KIF 
in ontology specification. I agree with Fritz and I think with John 
here).  It gets a bit trickier however to specify inference rules and 
interpretations.

Inferences using higher-order notation.

It is almost as easy to adapt the usual FOL inference rules/machinery 
to handle these property quantifiers. In fact one hardly needs to 
adapt it: it works on them just like it works on other variables. The 
usual unification algorithm, for example, used by first-order 
reasoners, works just fine with the variables in the first position 
of an expression. (As far as unification is concerned, the expression 
is just a tree with constant names or variables at its tips. The 
usual first-order restriction to only allow constants in the relation 
position has to be added as an explicit restriction.) So what, one 
might well ask, is all the fuss about?

A hint of the problem can be seen by asking whether there ought to be 
any new inference rules which we can now use which we couldnt use 
before. For example, suppose we have asserted (I'll do this in 
'illegal KIF')
(not(exists ((?P))(and (?P a) (?P b)) )
and something else asserts (FOO a) and (BAZ b). Now, is this a 
contradiction? The slightly-modified first-order reasoner isnt going 
to see any contradiction here. But does this mean there is no 
contradiction, or that first-order reasoning isnt up to the new task? 
Well, it depends on what that "exists ((?P))" is supposed to mean. 
What that says is, intuitively, there exists a property which is true 
of both a and b, ie a and b 'share a property'.  Well, do they? One 
person (call him F.) might say: no, or at least not unless FOO is the 
same as BAZ.  But someone else (call him P.) might say: Oh yes they 
do. Consider the property 'FOO OR BAZ", ie the property which is true 
of something just when either FOO or BAZ is true of it (compare "No 
dogs or Irish need apply"). If FOO and BAZ are properties, so is 
this. And this disjunction property is true of both a and of b, so it 
is a counterxample to the axiom, so there is a contradiction here. 
Imagine how the resulting debate might go:

F: That 'property' of yours isn't a real property.
P: Sure it is. I can write a sentence in the logical language which defines it:
(defproperty FUBAZ(?x) (or (FOO ?x)(BAZ ?x)))
F: But you didn't write that definition, did you?
P: But that is irrelevant: just because I didnt bother to define it 
explicitly, doesnt mean it doesnt *exist*. When I quantify over the 
universe, I don't quantify only over the things that have been 
defined explicitly. Even first-order quantification wouldnt work 
properly with that interpretation.
F: But if you allow property quantifiers to range over all properties 
which could possibly be defined, they become useless, since there are 
all kinds of those 'properties'. In fact there is, potentially, such 
a property for every expression in the logic.
P: Yes, indeed, there is; and that is why it is so hard to give a 
complete set of inference rules for higher-order quantifiers. But 
whether you like it or not, that is the universe of properties over 
which your quantifiers are supposed to be interpreted, so you had 
better face up to that fact.
F: Hey, I can make my quantifiers range over any set I want them to 
range over! The question is, who is to be master, that is all.
R: (A stranger who has been listening nearby): Actually the situation 
is even worse than either of you realize. Second-order quantifiers 
are true only over the set of ALL properties, whether they can be 
defined or not, so there might be more properties than there are 
expressions in the language, even.
F and P , together: Listen, things are hard enough already. Go back 
to wherever you came from and leave us alone.

Second order universes

As one can see, the issue turns on what universe the higher-order 
quantifiers are supposed to range over. If we were giving an 
interpretation for a higher-order logic, we would have to choose a 
basic universe for the ordinary quantifiers to range over, but what 
would be the proper universe for the the second-order quantifers? 
Here are a few possible answers:

1. The set of all properties, ie all subsets of the first-order 
universe (often called the 'power set' of the universe). (R's answer.)
2. The set of all definable properties, ie those which could be 
defined using an expression of the language. This is a subset of the 
power set. (P's answer)
3. The set of all named properties, ie the properties which have been 
assigned to predicate names by the interpretation. This is an even 
smaller subset of the power set. (F's answer.)

Of these answers 1. seems best: it is simple, clear, intuitive and 
has the merit that it is completely determined by the first-order 
universe itself, so that the choice of a 'basic' universe determines 
the whole universe for all the other quantifiers (third-order 
property quantifiers range over subsets of subsets of the universe, 
etc. on up.). This is the 'classical' semantics for higher-order 
logic, and it is what most mathematicians mean when they contrast 
higher-order with first-order logics. Unfortunately for us it is 
utterly impractical, as one can prove that NO logical system of rules 
can possibly be complete with respect to this semantics. (The reason 
is quite interesting: basically because the power set of a countable 
universe is uncountable. This means for example that there are 
uncountably many properties of the integers; see R's interjection. 
Maybe we don't really want to be quantifying over such a large set 
after all...) What this means is that no matter how loudly someone 
might claim that this was the semantics they had in mind for their 
logic, the logic itself - and any implementation of it - will not in 
fact accurately reflect this semantics.  It can only be wishful 
thinking. There is absolutely no way around this, its just a 
mathematical fact of life. This is the sense in which Chris is right 
in emphasising that SOL (or HOL) cannot be transcribed into FOL or 
indeed into any first-order theory.

(An aside. Fritz's remark that we just have to abandon completeness 
is a misunderstanding about the nature of completeness. If we 
'abandon completeness' in this sense then we are just claiming the 
right to assign arbitrary semantic interpetations to our logics as a 
kind of ritualization of fantasy. I can claim that the symbol "urgh" 
has the same semantics as the US Constitution, but what does that 
claim amount to? How could it be justified? It wouldnt be enough for 
me to just say that this was my "intended interpretation" and expect 
my machinery to work accordingly. Completeness is the only guarantee 
that our semantic claims about our logics, and the actual inferential 
performance of those logics, have any meaningful relationship to each 
other. 'Abandoning completeness' allows one to bask in a kind of glow 
of satisfaction about what ones axioms seem to be saying, but it 
doesnt help with the awkward fact that the 'nonstandard' models which 
one has conveniently refused to allow into ones preferred semantic 
account are, like it or not, the sand in the machinery of ones 
inference system.)

(Just as a point of clarification: Ive been careful to distinguish 
between a logic, and a language or notation. One can take a given 
notation and define a variety of different inference rules for it, 
each giving a different logic. In the FOO OR BAZ example, there are 
(at least) two different logics being considered, but they both 
involve the same language. The moral is that one cannot tell a logic 
from its syntax: one needs to know something more about it.  This 
'something more' might be some inference rules or it might be a 
system of interpretation, or both. If both and they correspond to 
each other, we have completeness. If only a semantics with no rules, 
we have an aspiration. If only rules but no semantics, we have 
confusion.)

Returning to the above answers, the imaginary debate between F. and 
P. is really about the difference between 2. and 3.  Here there is 
something useful to be said. Answer number 2 - P's position in the 
debate - gives what is often called the Henkin semantics for HOL 
(named for its inventor) . This has several nice properties. There is 
a complete set of inference rules, and HOL with this semantics can be 
transcribed into a (rather clumsy, but never mind for now) ontology 
in FOL. (It is a many-sorted logic with a sort for each arity of each 
level, eg there is a separate sort for third-order predicates with 
seven arguments, etc., and a lot of axioms about how various 
combinations of predicates are connected with each other.) This is 
the sense which Chris dismisses as being just a notational variant on 
FOL, as indeed in a robust mathematical sense it is. Nevertheless, it 
is a distinctly different logic to mechanize, one with its own 
special inference rules which go beyond the usual first-order 
inference machinery (for example, it would be able to find the FOO OR 
BAZ contradiction all by itself) and which are very tricky to 
implement. (Another way of saying the same thing is that the FOL 
ontology which would 'transcribe' a Henkin-interpreted HOL is so 
clumsy as to be unusable in practice.) For example, the usual 
unification algorithm notoriously does not work in this version of 
HOL (it doesnt even work for SOL).  I know of no current 
implementation of higher-order logic in this sense, though if there 
was one that would be a wonderful thing. (To implement the logic 
directly one would need rules of lambda-conversion to be incorporated 
into the unification algorithm.)

Answer 3 makes a higher-order quantification amount to even less than 
the second answer, and as that is equivalent to FOL, one might expect 
that with this answer, a higher-order quantification is actually LESS 
expressive than a first-order quantifier, and indeed that is the 
case. What this amounts to is that higher-order quantifiers are 
equivalent to quantifications over the names of properties rather 
than the properties themselves. When one writes
(forall ((?P))...?P....) one really means, on this interpretation, 
something like
(forall ((?n)) (implies (PropertyName ?n) ....(propertynamedby ?n)....)
and any property that doesnt have an explicit name is just not in the 
universe of properties.

Now, this last way of interpreting second-order quantifiers is SO 
limited in semantic expressiveness that the mathematicians of 85 
years ago wouldnt have taken it seriously for a minute. Nevertheless, 
I would urge, this is in fact the most useful way to interpret 
higher-order quantifiers for ontological use.  It means that 
higher-order quantifiers are severely restricted in their scope, but 
still it allows one to write properties of properties of ... etc., to 
be able to draw conclusions about named properties, to define classes 
of properties, etc., and use all that expressive licence whichFritz 
and other happy CYCLists have been taking for granted for years; and 
still have the resulting inferences interpreted by a standard 
efficient first-order reasoner.

In terms of the 'semantic debates' to which Chris and John have been 
referring, it is so weak as to be hardly worth mentioning, but for 
practical usage in writing ontologies it is of immense and liberating 
value. It costs nothing in inference complexity and does not 
complicate the proof theory in any harmful way, and it fits smoothly 
into the usual inference machinery. Its very weakness is an odd 
source of strength, in that it renders harmless a kind of use/mention 
casualness which one sees often in CYC, where a predicate on a 
property can refer either to the property or to the name of the 
property. In conventional interpretations this is a potential 
disaster, but with this weak notion of higher-order quantifier it is 
harmless. It also allows the user to control the universe of 
quantification by providing new definitions, for example.

Why everyone except me is wrong

So, to return to the adhominemism for a moment. Chris is the 
mathematician here, and he is correct (contra John) that HOL with the 
Henkin semantics is no more expressive than FOL, and that it is 
incorrect to say that SOL (still less HOL) is *equivalent* to a 
sorted first-order theory.  Nevertheless John is right to emphasise 
that even with the Henkin semantics, SOL and HOL are distinct 
languages with their own distinct proof styles and are in many ways 
superior to their transcription into first-order language. 
Unfortunately these strengths do not translate into mechanizability, 
as far as I know. Fritz is, I guess, entitled to claim that he will 
ignore completeness and hence claim that his logic fits a semantic 
theory which it in fact doesn't fit, but then he is also entitled to 
claim that black is white. The truth of the matter is that the logic 
which Fritz likes so much is actually FOL inference applied directly 
to HOL syntax, an odd blend from the traditional point of view but 
one which (Fritz is right to emphasise) is of considerable utility 
for representing ontologies, and which is readily mechanizable using 
well-known techniques.
(But the debates would be a lot shorter and less noisy if John and 
Fritz would stop making philosophical claims which are overstated, 
controversial and unnecessary to their main points and irrelevant to 
the main topic at hand. So there.)

-----

I plan to expand on this in a later message, giving a more thorough 
description of the proposed language. One might call it a CYClified 
version of KIF. (This logic lies somewhere in between answers 2 and 
3, by the way, which is where CYCL may also lie.)

Pat Hayes
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes