Standard Upper Ontology Knowledge Interchange Format

Note: As with all other versions of KIF as of April 2003, this is a work in progress on a proposal with no official standards status.


1 Scope

Knowledge Interchange Format (KIF) is a language designed for use in the interchange of knowledge among disparate computer systems (created by different programmers, at different times, in different languages, and so forth).

KIF is not intended as an internal representation for knowledge within computer systems or within closely related sets of computer systems (though the language can be used for this purpose as well). Typically, when a computer system reads a knowledge base in KIF, it converts the data into its own internal form (specialized pointer structures, arrays, etc.). All computation is done using these internal forms. When the computer system needs to communicate with another computer system, it maps its internal data structures into KIF.

The following categorical features are essential to the design of KIF.

  1. The language has declarative semantics. It is possible to understand the meaning of expressions in the language without appeal to an interpreter for manipulating those expressions. In this way, KIF differs from other languages that are based on specific interpreters, such as Emycin and Prolog.
  2. The language is logically comprehensive -- at its most general, it provides for the expression of arbitrary logical sentences. In this way, it differs from relational database languages (like SQL) and logic programming languages (like Prolog).
  3. The language provides for the representation of knowledge about knowledge. This allows the user to make knowledge representation decisions explicit and permits the user to introduce new knowledge representation constructs without changing the language.

     


2 Normative References

The following normative documents contain provisions, which, through reference in this text, constitute provisions of this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this document are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. Members of ISO and IEC maintain registers of currently valid International Standards. ANSI maintains a register of currently valid American National Standards.

ISO/IEC 10646-1:1993, Information Technology (IT) - Universal Multiple-Octet Coded Character Set (UCS).

ISO/IEC 14481:1998, Information Technology (IT) - Conceptual Schema Modeling Facilities (CSMF).


3 Terms and Definitions

For the purpose of this document, the terms and definitions given in ISO/IEC 10646-1:1993 and ISO/IEC 14481:1998 apply.


4 Syntax

4.1 Introduction

As with many computer-oriented languages, the syntax of KIF is most easily described in three layers. First, there are the basic characters of the language. These characters can be combined to form lexemes. Finally, the lexemes of the language can be combined to form grammatically legal expressions. Although this layering is not strictly esential to the specification of KIF, it simplifies the description of the syntax by dealing with white space at the lexeme level and eliminating that detail from the expression level.

The notation nonterminal* means zero or more occurrences; nonterminal+ means one or more occurrences; The nonterminals space, tab, return, linefeed, and page refer to the characters corresponding to ascii codes 32, 9, 13, 10, and 12, respectively. The nonterminal character denotes the set of all 128 ascii characters.

4.2 Characters

The alphabet of KIF consists of 7 bit blocks of data. In this document, we refer to KIF data blocks via their usual ASCII encodings as characters (as given in ISO 646:1983).

KIF characters are classified as upper case letters, lower case letters, digits, alpha characters (non-alphabetic characters that are used in the same way that letters are used), white space, and other characters (every ascii character that is not in one of the other categories). Initial characters which are the first character of a term, must be alphabetic. Constants and variables consist of an initial alphabetic character plus a sequence of alphabetic, numeric or dash characters.

 

    upper ::= A | B | C | D | E | F | G | H | I | J | K | L | M | 
              N | O | P | Q | R | S | T | U | V | W | X | Y | Z

    lower ::= a | b | c | d | e | f | g | h | i | j | k | l | m | 
              n | o | p | q | r | s | t | u | v | w | x | y | z

    digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    special ::= ! | $ | % | & | * | + | - | . | / | < | = | > | ? |
              @ | _ | ~ |

    white ::= space | tab | return | linefeed | page

    initialchar ::= upper | lower

    wordchar ::= upper | lower | digit | - | _ | special

    character ::= upper | lower | digit | special | white

Use of characters in "special" for word characters is discouraged as they may be given particular meaning in future versions of the standard or its extensions.

4.3 Lexemes

The process of converting characters into lexemes in called lexical analysis. The input to this process is a stream of characters, and the output is a stream of lexemes.

The function of a lexical analyzer is cyclic. It reads characters from the input string until it encounters a character that cannot be combined with previous characters to form a legal lexeme. When this happens, it outputs the lexeme corresponding to the previously read characters. It then starts the process over again with the new character. Whitespace causes a break in the lexical analysis process but otherwise is discarded.

A character string is a series of characters enclosed in quotation marks. The escape character \ is used to permit the inclusion of quotation marks and the \ character itself within such strings.

 

   string ::= "character*"

For the purpose of grammatical analysis, it is useful to subdivide the class of words a little further, viz. as variables, operators, and constants.

A constant is a letter or digit followed by any number of other legal word characters.

 

    word ::= initialchar wordchar*

A variable is a word in which the first character is ? or @. A variable with a '@' character is called "row variable" or "sequence variable". It holds a variable number of arguments. While the unbounded implementation of this makes KIF technically an "infinitary logic", with associated issues in efficient implementation, a bounded interpretation does force KIF out of first order. See Appendix C for further information.

    variable ::= ?word | @word

Semantically, there are four categories of constants in KIF -- object constants, function constants, relation constants, and logical constants. Object constants are used to denote individual objects. Function constants denote functions on those objects. Relation constants denote relations. Logical constants express conditions about the world and are either true or false. KIF is unusual among logical languages in that there is no syntactic distinction among these four types of constants; any constant can be used where any other constant can be used. The differences between these categories of constants is entirely semantic.

 

4.4 Expressions

The legal expressions of KIF are formed from lexemes according to the rules presented in this section. Terms are used to denote objects in the world being described; sentences are used to express facts about the world. Sentences can be used as terms, allowing higher-order expressions to be written. A knowledge base is a finite set of sentences.

 

    number ::= [-] digit+ [. digit+] [exponent]
    exponent ::= e [-] digit+

There are several types of terms in KIF -- variables, constants, character strings, and functional terms, as well as sentences themselves.

 

    term ::= variable | word | string | funterm | number | sentence

A functional term consists of a constant and an arbitrary number of argument terms surrounded by matching parentheses. Note that there is no syntactic restriction on the number of argument terms; arity restrictions in KIF are treated semantically.

 

    relword ::= initialchar wordchar*
    funword ::= initialchar wordchar*

No "relword" and "funword" shall have the same character sequence in a particular knowledge base.

    funterm ::= (funword term+)

The following BNF defines the set of legal sentences in KIF. There are six types of sentences. We have already mentioned logical constants.

 

     sentence ::= word | equation | relsent | logsent | quantsent

An equation consists of the = operator and two terms.

    equation ::= (= term term)

An implicit relational sentence consists of a constant and an arbitrary number of argument terms. As with functional terms, there is no syntactic restriction on the number of argument terms in a relation sentence.

 

    relsent ::= (relword term+)

The syntax of logical sentences depends on the logical operator involved. A sentence involving the not operator is called a negation. A sentence involving the and operator is called a conjunction, and the arguments are called conjuncts. A sentence involving the or operator is called a disjunction, and the arguments are called disjuncts. A sentence involving the => operator is called an implication; all of its arguments but the last are called antecedents; and the last argument is called the consequent. A sentence involving the <=> operator is called an equivalence.

    logsent ::= (not sentence) |
                (and sentence+) |
                (or sentence+) |
                (=> sentence sentence) |
                (<=> sentence sentence)

There are two types of quantified sentences -- a universally quantified sentence is signalled by the use of the forall operator, and an existentially quantified sentence is signalled by the use of the exists operator. The first argument in each case is a list of variable specifications. A variable specification is either a variable or a list consisting of a variable and a term denoting a relation that restricts the domain of the specified variable.

 

    quantsent ::= (forall (variable+) sentence) |
                  (exists (variable+) sentence)

Note that, according to these rules, it is permissible to write sentences with free variables, i.e. variables that do not occur within the scope of any enclosing quantifiers. The significance of the free variables in a sentence depends on the use of the sentence. When we assert the truth of a sentence with free variables, we are, in effect, saying that the sentence is true for all values of the free variables, i.e. the variables are universally quantified. When we ask whether a sentence with free variables is true, we are, in effect, asking whether there are any values for the free variables for which the sentence is true, i.e. the variables are existentially quantified.

It is important to keep in mind that a knowledge base is a set of sentences, not a sequence; and, therefore, the order of forms within a knowledge base is unimportant. Order may have heuristic value to deductive programs by suggesting an order in which to use those sentences; however, this implicit approach to knowledge exchange lies outside of the definition of KIF.

 

4.5 Comments

Comments in SUO-KIF are indicated with a single semi-colon. All characters from the semi-colon to the end of the line can be ignored.


5 Basics

5.1 Introduction

The basis for the semantics of KIF is a conceptualization of the world in terms of objects and relations among those objects.

A universe of discourse is the set of all objects presumed or hypothesized to exist in the world. The notion of object used here is quite broad. Objects can be concrete (e.g. a specific carbon atom, Confucius, the Sun) or abstract (e.g. the number 2, the set of all integers, the concept of justice). Objects can be primitive or composite (e.g. a circuit that consists of many subcircuits). Objects can even be fictional (e.g. a unicorn, Sherlock Holmes).

Different users of a declarative representation language, like KIF, are likely to have different universes of discourse. KIF is conceptually promiscuous in that it does not require every user to share the same universe of discourse. On the other hand, KIF is conceptually grounded in that every universe of discourse is required to include certain basic objects.

The following basic objects must occur in every universe of discourse.

Remember, that to these basic elements, the user can add whatever non-basic objects seem useful.

In KIF, relationships among objects take the form of relations. Formally, a relation is defined as an arbitrary set of finite lists of objects (of possibly varying lengths). Each list is a selection of objects that jointly satisfy the relation. For example, the < relation on numbers contains the list <2,3>, indicating that 2 is less than 3.

A function is a special kind of relation. For every finite sequence of objects (called the arguments), a function associates a unique object (called the value). More formally, a function is defined as a set of finite lists of objects, one for each combination of possible arguments. In each list, the initial elements are the arguments, and the final element is the value. For example, the 1+ function contains the list <2,3>, indicating that integer successor of 2 is 3.

Note that both functions and relations are defined as sets of lists. In fact, every function is a relation. However, not every relation is a function. In a function, there cannot be two lists that disagree on only the last element. This would be tantamount to the function having two values for one combination of arguments. By contrast, in a relation, there can be any number of lists that agree on all but the last element. For example, the list <2,3> is a member of the 1+ function, and there is no other list of length 2 with 2 as its first argument, i.e. there is only one successor for 2. By contrast, the < relation contains the lists <2,3>, <2,4>, <2,5>, and so forth, indicating that 2 is less than 3, 4, 5, and so forth.

Many mathematicians require that functions and relations have fixed arity, i.e they require that all of the lists comprising a relation have the same length. The definitions here allow for relations with variable arity, i.e. it is perfectly acceptable for a function or a relation to contain lists of different lengths. For example, the relation < contains the lists <2,3> and <2,3,4>, reflecting the fact that 2 is less than 3 and the fact that 2 is less than 3 and 3 is less than 4. This flexibility is not essential, but it is extremely convenient and poses no significant theoretical problems.

 

5.3 Functional Terms

The value of a functional term is obtained by applying the function denoted by the function constant in the term to the objects denoted by the arguments.

For example, the value of the term (+ 2 3) is obtained by applying the addition function (the function denoted by +) to the numbers 2 and 3 (the objects denoted by the object constants 2 and 3) to obtain the value 5, which is the value of the object constant 5.

5.4 Relational Sentences

A simple relational sentence is true if and only if the relation denoted by the relation constant in the sentence is true of the objects denoted by the arguments. Equivalently, viewing a relation as a set of tuples, we say that the relational sentence is true if and only if the tuple of objects formed from the values of the arguments is a member of the set of tuples denoted by the relation constant.

5.5 Equations and Inequalities

An equation is true if and only if the terms in the equation refer to the same object in the universe of discourse.

An inequality is true if and only if the terms in the equation refer to distinct objects in the universe of discourse.

5.6 True and False

The truth value of true is true, and the truth value of false is false.


6 Logic

6.2 Logical Sentences

A negation is true if and only if the negated sentence is false.

A conjunction is true if and only if every conjunct is true.

A disjunction is true if and only if at least one of the disjuncts is true.

If every antecedent in an implication is true, then the implication as a whole is true if and only if the the consequent is true. If any of the antecedents is false, then the implication as a whole is true, regardless of the truth value of the consequent.

A reverse implication is just an implication with the consequent and antecedents reversed.

An equivalence is equivalent to the conjunction of an implication and a reverse implication.

6.3 Quantified Sentences

A simple existentially quantified sentence (one in which the first argument is a list of variables) is true if and only if the embedded sentence is true for some value of the variables mentioned in the first argument.

A simple universally quantified sentence (one in which the first argument is a list of variables) is true if and only if the embedded sentence is true for every value of the variables mentioned in the first argument.

Note that the significance of free variables in quantifier-free sentences depends on context. Free variables in an assertion are assumed to be universally quantified. Free variables in a query are assumed to be existentially quantified. In other words, the meaning of free variables is determined by the way in which KIF is used. It cannot be unambiguously defined within KIF itself. To be certain of the usage in all contexts, use explicit quantifiers.


7 Numbers

7.1 Introduction

The referent of every numerical constant in KIF is assumed to be the number for which that constant is the base 10 representation. Among other things, this means that we can infer inequality of all distinct numerical constants, i.e. for every t1 and distinct t2 the following sentence is true.

    (/= t1 t2)

We use the intended meaning of numerical constants in defining the numerical functions and relations in this chapter. In particular, we require that these functions and relations behave correctly on all numbers represented in this way.

Note that this does mean that it is incorrect to apply these functions and relations to terms other than numbers. For example, a nonnumerical term may refer to a number, e.g. the term two may be defined to be the same as the number 2 in which case it is perfectly proper to write (+ two two).

The user may also want to extend these functions and relations to apply to objects other than numbers, e.g. sets and lists.

 


12 Conformance

12.1 Introduction

KIF is a highly expressive language. For many, this is a desirable feature; but there are disadvantages. One disadvantage is that it complicates the job of building fully conforming systems. Another disdvantage is that the resulting systems tend to be "heavyweight" (i.e. they are larger and in some cases less efficient than systems that employ more restricted languages).

In order to deal with these problems, the basic language specification is augmented with a set of "conformance dimensions". These dimensions are not the same as the "conformance levels" of other languages. Rather, each conformance dimension has a variety of levels within that dimension.

A "conformance profile" is a selection of alternatives from each conformance dimension. System builders are expected to make choices for each dimension and and then ensure that their systems adhere to the resulting comformance profile. Systems are expected to use the terminology defined here to share information about their conformance profile with other systems (in a protocol-specific manner).

Although this conformance profile scheme is more complex than one based on conformance levels, it accommodates varying capabilities and/or computational constraints while providing a migration path from more restrictive to more expressive.

12.2 Conformance Dimensions

12.2.1 Introduction

A conformance dimension is a classification of KIF sentences into conformance categories on the basis of a single syntactic criterion. (For example, the quantification dimension provides two categories, quantified KIF and unquantified KIF, based on whether or not a conforming knowledge base contans quantifiers.)

12.2.2 Logical Form

The first conformance dimension concerns logical form. There are five basic categories: atomic, conjunctive, positive, logical, and rule-like. Rule-like knowledge bases are further categorized as Horn or non-Horn and recursive or non-recursive.

A knowledge base is atomic if and only if it contains no logical operators.

A knowledge base is conjunctive if and only if it contains no logical operators except for conjunction.

A knowledge base is positive if and only if it contains no logical operators except for conjunction and disjunction.

A knowledge base is logical if and only if it contains no logical operators except for conjunction, disjunction, and negation.

A knowledge base is rule-like if and only if every sentence is either atomic or an implication or reverse implication in which all subexpressions are atomic sentences or negations of atomic sentences. A rule system is a rule-like knowledge base.

A rule system is Horn if and only if every constituent of every rule is atomic (i.e. no negations allowed). Otherwise, the rule system is said to be non-Horn.

The dependency graph for a rule system is a graph whose nodes are the constants in relational position. There is an edge from the node for a given relation constant p to the node of relation constant q if and only if p appears in the body of a rule whose head predicate is p.

A rule system is recursive if there is a cycle in its dependency graph. Otherwise, the rule system is said to be non-recursive.

12.2.3 Term Complexity

The nature of terms defines a second conformance dimension. There are two categories: simple and complex.

A knowledge base is simple if and only if the only terms occurring the knowledge base are constants and variables.

A knowledge base is complex if and only if it contains terms other than constants or variables, e.g. functional terms or logical terms.

12.2.4 Order

The third conformance dimension concerns the presence or absence of variables.

A knowledge base is ground, or zeroth-order, if and only if it contains no variables. Otherwise, a knowledge base in nonground.

A knowledge base is first-order if and only if there are no variables in the first argument of any explicit functional term or explicit relational sentence.

A knowledge base is higher-order otherwise.

12.2.5 Quantification

For nonground knowledge bases, there are two alternatives -- quantified and unquantified.

A nonground knowledge base is quantified if and only if it contains at least one explicit quantifier.

A nonground knowledge base is unquantified if and only if it contains no explicit quantifiers.

12.3 Common Conformance Profiles

A conformance profile is a selection of alternatives for each conformance dimension. Given the dimensions and categories defined in the preceding section, it is possible to define a large number of profiles. A single system may use different profiles in different types of communication. In particular, it is common to use one profile for assertions and another for queries. The following paragraphs define a few common types of systems with their corresponding profiles.

A database system is one in which (1) all assertions are atomic, simple, ground, and baselevel and (2) all queries are positive, simple, unquantified, and baselevel.

A Horn system (e.g. pure Datalog) is one in which (1) all assertions are rules that are Horn, unquantified, and baselevel and (2) all queries are positive, non-recursive, unquantified, and baselevel.

A relational system is one in which (1) all assertions are rules that are simple, unquantified (but may be non-Horn and non-recursive), and baselevel and (2) all queries are logical, non-recursive, unquantified, and baselevel.

A first-order system is one that allows the broadest categories within each conformance dimension except that only first-order expressions are accommodated.

A full KIF system is one that accepts the broadest categories within each conformance dimension, i.e. any KIF knowledge base is acceptable in any context.

12.4 Dealing with Differences in Conformance Profiles

The existence of multiple conformance profiles raises the question of what happens when systems with different profles must communicate.

Whenever the conformance profile of a receiver is known, a sender should avoid sending expressions that fall outside the receiver's conformance profile.

Unfortunately, this rule cannot be enforced in all situations. In some cases, conformance information about receivers is unavailable; and, even when conformance information is available, it may be desirable to send a message that falls outside a receiver's profile, e.g. it may be most efficient for a sender to broadcast a single knowledge base to a large number of receivers with differing conformance profiles rather than sending different knowledge bases to each receiver.

Whenever a receiver receives a non-conforming expression, it is free to ignore the expression, even though it may be able to make sense of portions of that expression. If the receiver ignores a non-conforming expression and the sender requests a reply, the receiver should report a failure.


Appendix A - BNF Syntax

    upper ::= A | B | C | D | E | F | G | H | I | J | K | L | M | 
              N | O | P | Q | R | S | T | U | V | W | X | Y | Z

    lower ::= a | b | c | d | e | f | g | h | i | j | k | l | m | 
              n | o | p | q | r | s | t | u | v | w | x | y | z

    digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    special ::= ! | $ | % | & | * | + | - | . | / | < | = | > | ? |
              @ | _ | ~ |

    white ::= space | tab | return | linefeed | page

    initialchar ::= upper | lower

    wordchar ::= upper | lower | digit | - | _ | special

    character ::= upper | lower | digit | special | white

    word ::= initialchar wordchar*

    string ::= "character*"

    variable ::= ?word | @word

    number ::= [-] digit+ [. digit+] [exponent]

    exponent ::= e [-] digit+

    term ::= variable | word | string | funterm | number | sentence

    relword ::= initialchar wordchar*

    funword ::= initialchar wordchar*

    funterm ::= (funword term+)

    sentence ::= word | equation | inequality | 
                 relsent | logsent | quantsent

    equation ::= (= term term)

    relsent ::= (relword term+)

    logsent ::= (not sentence) |
                (and sentence+) |
                (or sentence+) |
                (=> sentence sentence) |
                (<=> sentence sentence)

    quantsent ::= (forall (variable+) sentence) |
                  (exists (variable+) sentence)


Appendix B - Structural Ontology

(instance ?X ?Y) - ?X is an individual which is a member of class ?Y
(subclass ?X ?Y) - ?X is a class which is a subclass of class ?Y.

subclass is transitive and instance is transitive through subclass.

  (=>
    (and
      (instance ?X ?Y)
      (subclass ?Y ?Z))
    (instance ?X ?Z))

  (=>
    (and
      (subclass ?X ?Y)
      (subclass ?Y ?Z))
    (subclass ?X ?Z))

(domain ?X ?N ?Y) - the argument in position ?N of relation ?X is a member of class ?Y. Note that ?N=0 could be used for the return type of functional relations.

One term is also required which could be called "top" (as in Sowa's ontology) or "Entity" (as in SUMO)

 


Appendix C - Truth Tables and Interpretations

A  B  (and A B)
T  T      T
T  F      F
F  T      F
F  F      F

A  B  (or A B)
T  T      T
T  F      T
F  T      T
F  F      F

A     (not A)
T         F
F         T

A  B  (=> A B)
T  T      T
T  F      F
F  T      T
F  F      T

A  B  (<=> A B)
T  T      T
T  F      F
T  F      F
F  F      T

Row Variables

One option is to treat row varaibles as "macros", which would get expanded automatically so

(=>
   (and
      (subrelation ?REL1 ?REL2)
      (holds ?REL1 @ROW))
   (holds ?REL2 @ROW))

would become

(=>
   (and
      (subrelation ?REL1 ?REL2)
      (holds ?REL1 ?ARG1))
   (holds ?REL2 ?ARG1))

(=>
   (and
      (subrelation ?REL1 ?REL2)
      (holds ?REL1 ?ARG1 ?ARG2))
   (holds ?REL2 ?ARG1 ?ARG2))

etc.

Note that this "macro" style expansion has the problem that unlike the true semantics of row variables, that it is not infinite. If the interpretation only expands to five variables, that there is a problem if the knowledge engineer uses a relation with six.


Last updated on 2003-12-28 by Webmaster

Copyright IEEE 2003 - All rights reserved.