Note: As with all other versions of KIF as of April 2003, this
is a work in progress on a proposal with no official standards
status.

Knowledge Interchange Format (KIF) is a language designed for use in the interchange of knowledge among disparate computer systems (created by different programmers, at different times, in different languages, and so forth).

KIF is not intended as an internal representation for knowledge within computer systems or within closely related sets of computer systems (though the language can be used for this purpose as well). Typically, when a computer system reads a knowledge base in KIF, it converts the data into its own internal form (specialized pointer structures, arrays, etc.). All computation is done using these internal forms. When the computer system needs to communicate with another computer system, it maps its internal data structures into KIF.

The following categorical features are essential to the design of KIF.

- The language has declarative semantics. It is possible to understand the meaning of expressions in the language without appeal to an interpreter for manipulating those expressions. In this way, KIF differs from other languages that are based on specific interpreters, such as Emycin and Prolog.
- The language is logically comprehensive -- at its most general, it provides for the expression of arbitrary logical sentences. In this way, it differs from relational database languages (like SQL) and logic programming languages (like Prolog).
- The language provides for the representation of knowledge
about knowledge. This allows the user to make knowledge representation
decisions explicit and permits the user to introduce new knowledge
representation constructs without changing the language.

The following normative documents contain provisions, which, through reference in this text, constitute provisions of this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this document are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. Members of ISO and IEC maintain registers of currently valid International Standards. ANSI maintains a register of currently valid American National Standards.

ISO/IEC 10646-1:1993, *Information Technology (IT) - Universal
Multiple-Octet Coded Character Set (UCS)*.

ISO/IEC 14481:1998, *Information Technology (IT) - Conceptual
Schema Modeling Facilities (CSMF)*.

For the purpose of this document, the terms and definitions given in ISO/IEC 10646-1:1993 and ISO/IEC 14481:1998 apply.

As with many computer-oriented languages, the syntax of KIF
is most easily described in three layers. First, there are the
basic *characters* of the language. These characters can
be combined to form *lexemes*. Finally, the lexemes of the
language can be combined to form grammatically legal *expressions*.
Although this layering is not strictly esential to the specification
of KIF, it simplifies the description of the syntax by dealing
with white space at the lexeme level and eliminating that detail
from the expression level.

The notation ** nonterminal*** means zero or more
occurrences;

The alphabet of KIF consists of 7 bit blocks of data. In this document, we refer to KIF data blocks via their usual ASCII encodings as characters (as given in ISO 646:1983).

KIF characters are classified as upper case letters, lower case letters, digits, alpha characters (non-alphabetic characters that are used in the same way that letters are used), white space, and other characters (every ascii character that is not in one of the other categories). Initial characters which are the first character of a term, must be alphabetic. Constants and variables consist of an initial alphabetic character plus a sequence of alphabetic, numeric or dash characters.

upper ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z lower ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 special ::= ! | $ | % | & | * | + | - | . | / | < | = | > | ? | @ | _ | ~ | white ::= space | tab | return | linefeed | page initialchar ::= upper | lower wordchar ::= upper | lower | digit | - | _ | special character ::= upper | lower | digit | special | white

Use of characters in "special" for word characters is discouraged as they may be given particular meaning in future versions of the standard or its extensions.

The process of converting characters into lexemes in called
*lexical analysis*. The input to this process is a stream
of characters, and the output is a stream of *lexemes*.

The function of a lexical analyzer is cyclic. It reads characters from the input string until it encounters a character that cannot be combined with previous characters to form a legal lexeme. When this happens, it outputs the lexeme corresponding to the previously read characters. It then starts the process over again with the new character. Whitespace causes a break in the lexical analysis process but otherwise is discarded.

A *character string* is a series of characters enclosed
in quotation marks. The escape character `\` is used to
permit the inclusion of quotation marks and the `\` character
itself within such strings.

string ::= "character*"

For the purpose of grammatical analysis, it is useful to subdivide the class of words a little further, viz. as variables, operators, and constants.

A *constant* is a letter or digit followed by any number
of other legal word characters.

word ::= initialchar wordchar*

A *variable* is a word in which the first character is
`?` or `@`. A variable with a '@' character is called
"row variable" or "sequence variable". It
holds a variable number of arguments. While the unbounded implementation
of this makes KIF technically an "infinitary logic",
with associated issues in efficient implementation, a bounded
interpretation does force KIF out of first order. See Appendix
C for further information.

variable ::= ?word | @word

Semantically, there are four categories of constants in KIF
-- object constants, function constants, relation constants, and
logical constants. *Object constants* are used to denote
individual objects. *Function constants* denote functions
on those objects. *Relation constants* denote relations.
*Logical constants* express conditions about the world and
are either true or false. KIF is unusual among logical languages
in that there is no syntactic distinction among these four types
of constants; any constant can be used where any other constant
can be used. The differences between these categories of constants
is entirely semantic.

The legal expressions of KIF are formed from lexemes according
to the rules presented in this section. Terms are used to denote
objects in the world being described; sentences are used to express
facts about the world. Sentences can be used as terms, allowing
higher-order expressions to be written. A *knowledge base*
is a finite set of sentences.

number ::= [-] digit+ [. digit+] [exponent] exponent ::= e [-] digit+

There are several types of terms in KIF -- variables, constants, character strings, and functional terms, as well as sentences themselves.

term ::= variable | word | string | funterm | number | sentence

A *functional term* consists of a constant and an arbitrary
number of *argument* terms surrounded by matching parentheses.
Note that there is no syntactic restriction on the number of argument
terms; arity restrictions in KIF are treated semantically.

relword ::= initialchar wordchar* funword ::= initialchar wordchar*

No "relword" and "funword" shall have the same character sequence in a particular knowledge base.

funterm ::= (funword term+)

The following BNF defines the set of legal sentences in KIF. There are six types of sentences. We have already mentioned logical constants.

sentence ::= word | equation | relsent | logsent | quantsent

An *equation* consists of the `=` operator and
two terms.

equation ::= (= term term)

An *implicit relational sentence* consists of a constant
and an arbitrary number of *argument* terms. As with functional
terms, there is no syntactic restriction on the number of argument
terms in a relation sentence.

relsent ::= (relword term+)

The syntax of *logical sentences* depends on the logical
operator involved. A sentence involving the `not` operator
is called a *negation*. A sentence involving the `and`
operator is called a *conjunction*, and the arguments are
called *conjuncts*. A sentence involving the `or`
operator is called a *disjunction*, and the arguments are
called *disjuncts*. A sentence involving the `=>`
operator is called an *implication*; all of its arguments
but the last are called *antecedents*; and the last argument
is called the *consequent*. A sentence involving the `<=>`
operator is called an *equivalence*.

logsent ::= (not sentence) | (and sentence+) | (or sentence+) | (=> sentence sentence) | (<=> sentence sentence)

There are two types of *quantified sentences* -- a *universally
quantified sentence* is signalled by the use of the `forall`
operator, and an *existentially quantified sentence* is signalled
by the use of the `exists` operator. The first argument
in each case is a list of variable specifications. A variable
specification is either a variable or a list consisting of a variable
and a term denoting a relation that restricts the domain of the
specified variable.

quantsent ::= (forall (variable+) sentence) | (exists (variable+) sentence)

Note that, according to these rules, it is permissible to write
sentences with *free* variables, i.e. variables that do not
occur within the scope of any enclosing quantifiers. The significance
of the free variables in a sentence depends on the use of the
sentence. When we assert the truth of a sentence with free variables,
we are, in effect, saying that the sentence is true for all values
of the free variables, i.e. the variables are universally quantified.
When we ask whether a sentence with free variables is true, we
are, in effect, asking whether there are any values for the free
variables for which the sentence is true, i.e. the variables are
existentially quantified.

It is important to keep in mind that a knowledge base is a
*set* of sentences, not a *sequence*; and, therefore,
the order of forms within a knowledge base is unimportant. Order
*may* have heuristic value to deductive programs by suggesting
an order in which to use those sentences; however, this implicit
approach to knowledge exchange lies outside of the definition
of KIF.

Comments in SUO-KIF are indicated with a single semi-colon. All characters from the semi-colon to the end of the line can be ignored.

The basis for the semantics of KIF is a *conceptualization*
of the world in terms of objects and relations among those objects.

A *universe of discourse* is the set of all objects
presumed or hypothesized to exist in the world. The notion of
*object* used here is quite broad. Objects can be concrete
(e.g. a specific carbon atom, Confucius, the Sun) or abstract
(e.g. the number 2, the set of all integers, the concept of justice).
Objects can be primitive or composite (e.g. a circuit that consists
of many subcircuits). Objects can even be fictional (e.g. a unicorn,
Sherlock Holmes).

Different users of a declarative representation language, like
KIF, are likely to have different universes of discourse. KIF
is *conceptually promiscuous* in that it does *not*
require every user to share the same universe of discourse. On
the other hand, KIF is *conceptually grounded* in that every
universe of discourse *is* required to include certain *basic*
objects.

The following basic objects must occur in every universe of discourse.

- All numbers, real and complex.
- All ASCII characters.
- All finite strings of ASCII characters.
- Words. Yes, words are themselves objects in the universe of discourse, along with the things they represent.
- All finite lists of objects in the universe of discourse.

Remember, that to these basic elements, the user can add whatever
*non-basic* objects seem useful.

In KIF, relationships among objects take the form of relations.
Formally, a *relation* is defined as an arbitrary set of
finite lists of objects (of possibly varying lengths). Each list
is a selection of objects that jointly satisfy the relation. For
example, the < relation on numbers contains the list <2,3>,
indicating that 2 is less than 3.

A *function* is a special kind of relation. For every
finite sequence of objects (called the *arguments*), a function
associates a unique object (called the *value*). More formally,
a function is defined as a set of finite lists of objects, one
for each combination of possible arguments. In each list, the
initial elements are the arguments, and the final element is the
value. For example, the 1+ function contains the list <2,3>,
indicating that integer successor of 2 is 3.

Note that both functions and relations are defined as sets of lists. In fact, every function is a relation. However, not every relation is a function. In a function, there cannot be two lists that disagree on only the last element. This would be tantamount to the function having two values for one combination of arguments. By contrast, in a relation, there can be any number of lists that agree on all but the last element. For example, the list <2,3> is a member of the 1+ function, and there is no other list of length 2 with 2 as its first argument, i.e. there is only one successor for 2. By contrast, the < relation contains the lists <2,3>, <2,4>, <2,5>, and so forth, indicating that 2 is less than 3, 4, 5, and so forth.

Many mathematicians require that functions and relations have fixed arity, i.e they require that all of the lists comprising a relation have the same length. The definitions here allow for relations with variable arity, i.e. it is perfectly acceptable for a function or a relation to contain lists of different lengths. For example, the relation < contains the lists <2,3> and <2,3,4>, reflecting the fact that 2 is less than 3 and the fact that 2 is less than 3 and 3 is less than 4. This flexibility is not essential, but it is extremely convenient and poses no significant theoretical problems.

The value of a functional term is obtained by applying the function denoted by the function constant in the term to the objects denoted by the arguments.

For example, the value of the term `(+ 2 3)` is obtained
by applying the addition function (the function denoted by `+`)
to the numbers ** 2** and

A simple relational sentence is true if and only if the relation denoted by the relation constant in the sentence is true of the objects denoted by the arguments. Equivalently, viewing a relation as a set of tuples, we say that the relational sentence is true if and only if the tuple of objects formed from the values of the arguments is a member of the set of tuples denoted by the relation constant.

An equation is true if and only if the terms in the equation refer to the same object in the universe of discourse.

An inequality is true if and only if the terms in the equation refer to distinct objects in the universe of discourse.

The truth value of `true` is true, and the truth value
of `false` is false.

A negation is true if and only if the negated sentence is false.

A conjunction is true if and only if every conjunct is true.

A disjunction is true if and only if at least one of the disjuncts is true.

If every antecedent in an implication is true, then the implication as a whole is true if and only if the the consequent is true. If any of the antecedents is false, then the implication as a whole is true, regardless of the truth value of the consequent.

A reverse implication is just an implication with the consequent and antecedents reversed.

An equivalence is equivalent to the conjunction of an implication and a reverse implication.

A simple existentially quantified sentence (one in which the
first argument is a list of variables) is true if and only if
the embedded sentence is true for *some* value of the variables
mentioned in the first argument.

A simple universally quantified sentence (one in which the
first argument is a list of variables) is true if and only if
the embedded sentence is true for *every* value of the variables
mentioned in the first argument.

Note that the significance of free variables in quantifier-free sentences depends on context. Free variables in an assertion are assumed to be universally quantified. Free variables in a query are assumed to be existentially quantified. In other words, the meaning of free variables is determined by the way in which KIF is used. It cannot be unambiguously defined within KIF itself. To be certain of the usage in all contexts, use explicit quantifiers.

The referent of every numerical constant in KIF is assumed
to be the number for which that constant is the base **10**
representation. Among other things, this means that we can infer
inequality of all distinct numerical constants, i.e. for every
** t1** and distinct

(/=t1t2)

We use the intended meaning of numerical constants in defining the numerical functions and relations in this chapter. In particular, we require that these functions and relations behave correctly on all numbers represented in this way.

Note that this does mean that it is incorrect to apply these
functions and relations to terms other than numbers. For example,
a nonnumerical term may *refer* to a number, e.g. the term
`two` may be defined to be the same as the number `2`
in which case it is perfectly proper to write `(+ two two)`.

The user may also want to extend these functions and relations to apply to objects other than numbers, e.g. sets and lists.

KIF is a highly expressive language. For many, this is a desirable feature; but there are disadvantages. One disadvantage is that it complicates the job of building fully conforming systems. Another disdvantage is that the resulting systems tend to be "heavyweight" (i.e. they are larger and in some cases less efficient than systems that employ more restricted languages).

In order to deal with these problems, the basic language specification is augmented with a set of "conformance dimensions". These dimensions are not the same as the "conformance levels" of other languages. Rather, each conformance dimension has a variety of levels within that dimension.

A "conformance profile" is a selection of alternatives from each conformance dimension. System builders are expected to make choices for each dimension and and then ensure that their systems adhere to the resulting comformance profile. Systems are expected to use the terminology defined here to share information about their conformance profile with other systems (in a protocol-specific manner).

Although this conformance profile scheme is more complex than one based on conformance levels, it accommodates varying capabilities and/or computational constraints while providing a migration path from more restrictive to more expressive.

A **conformance dimension** is a classification of KIF sentences
into **conformance categories** on the basis of a single syntactic
criterion. (For example, the quantification dimension provides
two categories, quantified KIF and unquantified KIF, based on
whether or not a conforming knowledge base contans quantifiers.)

The first conformance dimension concerns logical form. There are five basic categories: atomic, conjunctive, positive, logical, and rule-like. Rule-like knowledge bases are further categorized as Horn or non-Horn and recursive or non-recursive.

A knowledge base is **atomic** if and only if it contains
no logical operators.

A knowledge base is **conjunctive** if and only if it contains
no logical operators except for conjunction.

A knowledge base is **positive** if and only if it contains
no logical operators except for conjunction and disjunction.

A knowledge base is **logical** if and only if it contains
no logical operators except for conjunction, disjunction, and
negation.

A knowledge base is **rule-like** if and only if every sentence
is either atomic or an implication or reverse implication in which
all subexpressions are atomic sentences or negations of atomic
sentences. A rule system is a rule-like knowledge base.

A rule system is **Horn** if and only if every constituent
of every rule is atomic (i.e. no negations allowed). Otherwise,
the rule system is said to be **non-Horn**.

The dependency graph for a rule system is a graph whose nodes
are the constants in relational position. There is an edge from
the node for a given relation constant *p* to the node of
relation constant *q* if and only if *p* appears in
the body of a rule whose head predicate is *p*.

A rule system is **recursive** if there is a cycle in its
dependency graph. Otherwise, the rule system is said to be **non-recursive**.

The nature of terms defines a second conformance dimension. There are two categories: simple and complex.

A knowledge base is **simple** if and only if the only terms
occurring the knowledge base are constants and variables.

A knowledge base is **complex** if and only if it contains
terms other than constants or variables, e.g. functional terms
or logical terms.

The third conformance dimension concerns the presence or absence of variables.

A knowledge base is **ground**, or **zeroth-order**,
if and only if it contains no variables. Otherwise, a knowledge
base in **nonground**.

A knowledge base is **first-order** if and only if there
are no variables in the first argument of any explicit functional
term or explicit relational sentence.

A knowledge base is **higher-order** otherwise.

For nonground knowledge bases, there are two alternatives -- quantified and unquantified.

A nonground knowledge base is **quantified** if and only
if it contains at least one explicit quantifier.

A nonground knowledge base is **unquantified** if and only
if it contains no explicit quantifiers.

A **conformance profile** is a selection of alternatives
for each conformance dimension. Given the dimensions and categories
defined in the preceding section, it is possible to define a large
number of profiles. A single system may use different profiles
in different types of communication. In particular, it is common
to use one profile for assertions and another for queries. The
following paragraphs define a few common types of systems with
their corresponding profiles.

A **database system** is one in which (1) all assertions
are atomic, simple, ground, and baselevel and (2) all queries
are positive, simple, unquantified, and baselevel.

A **Horn system** (e.g. pure Datalog) is one in which (1)
all assertions are rules that are Horn, unquantified, and baselevel
and (2) all queries are positive, non-recursive, unquantified,
and baselevel.

A **relational system** is one in which (1) all assertions
are rules that are simple, unquantified (but may be non-Horn and
non-recursive), and baselevel and (2) all queries are logical,
non-recursive, unquantified, and baselevel.

A **first-order system** is one that allows the broadest
categories within each conformance dimension except that only
first-order expressions are accommodated.

A **full KIF system** is one that accepts the broadest categories
within each conformance dimension, i.e. any KIF knowledge base
is acceptable in any context.

The existence of multiple conformance profiles raises the question of what happens when systems with different profles must communicate.

Whenever the conformance profile of a receiver is known, a sender should avoid sending expressions that fall outside the receiver's conformance profile.

Unfortunately, this rule cannot be enforced in all situations. In some cases, conformance information about receivers is unavailable; and, even when conformance information is available, it may be desirable to send a message that falls outside a receiver's profile, e.g. it may be most efficient for a sender to broadcast a single knowledge base to a large number of receivers with differing conformance profiles rather than sending different knowledge bases to each receiver.

Whenever a receiver receives a non-conforming expression, it is free to ignore the expression, even though it may be able to make sense of portions of that expression. If the receiver ignores a non-conforming expression and the sender requests a reply, the receiver should report a failure.

upper ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z lower ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 special ::= ! | $ | % | & | * | + | - | . | / | < | = | > | ? | @ | _ | ~ | white ::= space | tab | return | linefeed | page initialchar ::= upper | lower wordchar ::= upper | lower | digit | - | _ | special character ::= upper | lower | digit | special | white word ::= initialchar wordchar* string ::= "character*" variable ::= ?word | @word number ::= [-] digit+ [. digit+] [exponent] exponent ::= e [-] digit+ term ::= variable | word | string | funterm | number | sentence relword ::= initialchar wordchar* funword ::= initialchar wordchar* funterm ::= (funword term+) sentence ::= word | equation | inequality | relsent | logsent | quantsent equation ::= (= term term) relsent ::= (relword term+) logsent ::= (not sentence) | (and sentence+) | (or sentence+) | (=> sentence sentence) | (<=> sentence sentence) quantsent ::= (forall (variable+) sentence) | (exists (variable+) sentence)

`( instance ?X ?Y)` - ?X is an individual which
is a member of class ?Y

**subclass** is transitive and **instance** is transitive
through **subclass**.

(=> (and (instance ?X ?Y) (subclass ?Y ?Z)) (instance ?X ?Z)) (=> (and (subclass ?X ?Y) (subclass ?Y ?Z)) (subclass ?X ?Z))

`( domain ?X ?N ?Y)` - the argument in position
?N of relation ?X is a member of class ?Y. Note that ?N=0 could
be used for the return type of functional relations.

One term is also required which could be called "**top**"
(as in Sowa's ontology) or "**Entity**" (as in SUMO)

A B (and A B) T T T T F F F T F F F F A B (or A B) T T T T F T F T T F F F A (not A) T F F T A B (=> A B) T T T T F F F T T F F T A B (<=> A B) T T T T F F T F F F F T

**Row Variables**

One option is to treat row varaibles as "macros", which would get expanded automatically so

(=> (and (subrelation ?REL1 ?REL2) (holds ?REL1 @ROW)) (holds ?REL2 @ROW))

would become

(=> (and (subrelation ?REL1 ?REL2) (holds ?REL1 ?ARG1)) (holds ?REL2 ?ARG1)) (=> (and (subrelation ?REL1 ?REL2) (holds ?REL1 ?ARG1 ?ARG2)) (holds ?REL2 ?ARG1 ?ARG2))

etc.

Note that this "macro" style expansion has the problem that unlike the true semantics of row variables, that it is not infinite. If the interpretation only expands to five variables, that there is a problem if the knowledge engineer uses a relation with six.

Last updated on 2003-12-28 by *Webmaster
*

Copyright IEEE 2003 - All rights reserved.