Conceptual Graph Summary

John F. Sowa

A conceptual graph (CG) is a graph representation for logic based on the semantic networks of artificial intelligence and the existential graphs of Charles Sanders Peirce. The ISO standard conceptual graphs express the full semantics of Common Logic (CL), which includes the subset used for the Semantic Web languages; a larger CG subset adds a context notation to support metalanguage and modality; and the research CGs are exploring an open-ended variety of extensions Section 1 of this report presents a summary of the ISO standard for Common Logic and its mapping to the Conceptual Graph Interchange Format (CGIF) and the Common Logic Interchange Format (CLIF). for aspects of natural language semantics. Section 2 summarizes various extensions to the CG notation to support versions of logic that go beyond the current CL standard. Section 3 presents the grammar for CGIF, as defined by the CL standard. In case of any discrepancies, the ISO/IEC 24707 document is the definitive standard for Common Logic that takes precedence over this summary.

1. Common Logic

Common Logic (CL) evolved from two projects to develop parallel ANSI standards for conceptual graphs and the Knowledge Interchange Format (Genesereth & Fikes 1992). Eventually, those projects were merged into a single ISO project to develop a common abstract syntax and model-theoretic foundation for a family of logic-based notations (ISO/IEC 24707). Hayes and Menzel (2001) defined a very general model theory for CL, which Hayes and McBride (2003) used to define the semantics for the languages RDF(S) and OWL. In addition to the abstract syntax and model theory, the CL standard specifies three concrete dialects that are capable of expressing the full CL semantics: the Common Logic Interchange Format (CLIF), the Conceptual Graph Interchange Format (CGIF), and the XML-based notation for CL (XCL). Since the semantics of RDF and OWL is based on a subset of CL semantics, those languages can also be considered dialects of CL: any statement in RDF or OWL can be translated to CLIF, CGIF, or XCL, but only a subset of CL can be translated back to RDF or OWL.

The CL syntax allows quantifiers to range over functions and relations, but CL retains a first-order style of model theory and proof theory. To support a higher-order syntax, but without the computational complexity of higher-order semantics, the CL model theory uses a single domain D that includes individuals, functions, and relations. The option of limiting the domain of quantification to a single set was suggested by Quine (1954) and used in various theorem provers that allow quantifiers to range over relations (Chen et al., 1993).

The CL standard is defined in an abstract syntax that is independent of any concrete notation. It does, however, support the full Unicode character set and the URIs of the Semantic Web. The three dialects defined in the standard (CLIF, CGIF, and XCL) use only the ASCII subset of Unicode for their basic syntax, but they allow any Unicode symbols in names and character strings. Although CGIF and CLIF had different origins, the two notations have many similarities. As an example, following is the core CGIF for the sentence Bob drives his Chevy to St. Louis:

   [*x] [*y]
   (Drive ?x) (Person Bob) (City "St. Louis") (Chevy ?y)
   (Agnt ?x Bob) (Dest ?x "St. Louis") (Thme ?x ?y) (Poss Bob ?y)

In core CGIF, the concept nodes [*x] and [*y] represent the existential quantifiers ∃x and ∃y. Following is the CLIF statement, which uses the keyword exists to introduce a list of existentially quantified variables:

   (exists (x y)
      (and (Drive x) (Person Bob) (City "St. Louis") (Chevy y)
           (Agnt x Bob) (Dest x "St. Louis") (Thme x y) (Poss Bob y) ))

Although CGIF and CLIF look similar, there are several fundamental differences:

Since CGIF is a serialized representation of a graph, labels such as x or y represent coreference links between nodes, but they represent variables in CLIF or predicate calculus.
CGIF distinguishes the labels *x and ?x from a name like Bob by an explicit prefix. CLIF, however, has no special markers on variables; the only distinction is that variables appear in a list after the quantifier.
Since the nodes of a graph have no inherent ordering, a CGIF sentence is an unordered list of nodes. Unless grouped by context brackets, the list may be permuted without affecting the semantics.
The CLIF operator and does not occur in CGIF because the conjunction of nodes within any context is implicit. Omitting the conjunction operator in CGIF tends to reduce the number of parentheses.

Extended CGIF allows monadic relation names to be used as type labels, and extended CLIF allows them to be used as restrictions on the scope of quantifiers. Following is the extended CGIF for the above sentence:

   [Drive *x] [Person: Bob] [City: "St. Louis"] [Chevy *y]
   (Agnt ?x Bob) (Dest ?x "St. Louis") (Thme ?x ?y) (Poss Bob ?y)

And following is the equivalent in extended CLIF:

   (exists ((x Drive) (y Chevy))
      (and (Person Bob) (City "St. Louis") (Agnt x Bob)
      (Dest x "St. Louis") (Thme x y) (Poss Bob y) ))

Since the semantics of any statement in extended CGIF and CLIF is defined by its translation to the core language, neither language makes a semantic distinction between type labels and monadic relations. If a statement in a strongly typed language, such as the Z Specification Language, is translated to CGIF or CLIF, the Z types are mapped to CGIF type labels or CLIF quantifier restrictions. A syntactically correct statement in Z and its translation to CGIF or CLIF have the same truth value. But an expression with a type mismatch would cause a syntax error in Z, but it would merely be false in CGIF or CLIF.

As another example, Figure 20 shows a CG for the sentence If a cat is on a mat, then it is a happy pet. The dotted line that connects the concept [Cat] to the concept [Pet], which is called a coreference link, indicates that they both refer to the same entity. The Attr relation indicates that the cat, also called a pet, has an attribute, which is an instance of happiness.

Figure 20. CG display form for If a cat is on a mat, then it is a happy pet.

The dotted line in Figure 20, called a coreference link, is shown in CGIF by the defining label *x in the concept [Cat: *x] and the bound label ?x in [Pet: ?x]. Following is the extended CGIF:

   [If: [Cat *x] [Mat *y] (On ?x ?y)
      [Then: [Pet ?x] [Happy *z] (Attr ?x ?z) ]]

In CGs, functions are represented by conceptual relations called actors. Figure 21 is the CG display form for the following equation written in ordinary algebraic notation:

   y = (x + 7)/sqrt(7)

The three functions in this equation would be represented by three actors, which are shown in Figure 21 as diamond-shaped nodes with the type labels Add, Sqrt, and Divide. The concept nodes contain the input and output values of the actors. The two empty concept nodes contain the output values of Add and Sqrt.

Figure 21. CL functions represented by actor nodes

In CGIF, actors are represented as relations with two kinds of arcs: a sequence of input arcs separated by a vertical bar from a sequence of output arcs.

   [Number: *x] [Number: *y] [Number: 7]
   (Add ?x 7 | [*u]) (Sqrt 7 | [*v]) (Divide ?u ?v | ?y)

In the display form, the input arcs of Add and Divide are numbered 1 and 2 to indicate the order in which the arcs are written in CGIF. Following is the corresponding CLIF:

   (exists ((x Number) (y Number))
      (and (Number 7) (= y (Divide (Add x 7) (Sqrt 7)))))

No CLIF variables are needed to represent the coreference labels *u and *v since the functional notation used in CLIF shows the connections directly.

CLIF only permits functions to have a single output, but extended CGIF allows actors to have multiple outputs. The following actor of type IntegerDivide has two inputs: an integer x and an integer 7. It also has two outputs: a quotient u and a remainder v.

   (IntegerDivide [Integer: *x] [Integer: 7] | [*u] [*v])

When this actor is translated to core CGIF or CLIF, the vertical bar is removed, and the actor becomes an ordinary relation with four arguments; the distinction between inputs and outputs is lost. In order to assert the constraint that the last two arguments are functionally dependent on the first two arguments, the following CGIF sentence asserts that there exist two functions, identified by the coreference labels Quotient and Remainder, which for every combination of input and output values are logically equivalent to an actor of type IntegerDivide with the same input and output values:

   [Function: *Quotient] [Function: *Remainder]
   [[@every*x1] [@every*x2] [@every*x3] [@every*x4]
   [Equiv: [Iff: (IntegerDivide ?x1 ?x2 | ?x3 ?x4)]
           [Iff: (#?Quotient ?x1 ?x2 | ?x3) (#?Remainder ?x1 ?x2 | ?x4)]]]

Each line of this example illustrates one or more features of CGIF. The first line represents existential quantifiers for two entities of type Function. On the second line, the context bracket [ encloses the concept nodes with universal quantifiers, marked by @every, to show that the existential quantifiers for Quotient and Remainder include the universals within their scope. The equivalence on lines three and four shows that an actor of type IntegerDivide is logically equivalent to a conjunction of the quotient and remainder functions. Finally, the symbol # on line four shows that the coreference labels ?Quotient and ?Remainder are being used as type labels. Following is the corresponding CLIF:

   (exists ((Quotient Function) (Remainder Function))
      (forall (x1 x2 x3 x4)
         (iff (IntegerDivide x1 x2 x3 x4)
              (and (= x3 (Quotient x1 x2)) (= x4 (Remainder x1 x2))))))

As another example of quantifiers that range over relations, someone might say “Bob and Sue are related,” but not say exactly how they are related. The following sentences in CGIF and CLIF state that there exists some familial relation r that relates Bob and Sue:

   [Relation: *r] (Familial ?r) (#?r Bob Sue)

   (exists ((r Relation)) (and (Familial r) (r Bob Sue)))

The concept [Relation: *r] states that there exists a relation r. The next two relations state that r is familial and r relates Bob and Sue.

This brief survey has illustrated nearly every major feature of CGIF and CLIF. One important feature that has not been mentioned is a sequence marker to support relations with a variable number of arguments. Another is the use of comments, which can be placed before, after, or inside any concept or relation node in CGIF. The specifications in the CL standard guarantee that any sentence expressed in the dialects CGIF, CLIF, or XCL can be translated to any of the others in a logically equivalent form. Although the translation will preserve the semantics, it is not guaranteed to preserve all syntactic details: a sentence translated from one dialect to another and then back to the first will be logically equivalent to the original, but some subexpressions might be reordered or replaced by semantic equivalents. Following is a summary of the CGIF grammar; see ISO/IEC 24707 for the complete specification of the Common Logic syntax and semantics.

2. CG Extensions

Natural languages are highly expressive systems that can state anything that can be expressed in any formal language or logic. That enormous expressive power makes it difficult or impossible for any formalism to represent every feature of every natural language. To increase the range of expressibility, conceptual graphs constitute an open-ended family of notations with a formally defined core. Each of the following four levels of CGs can be expressed in the graphic display form or the linear CGIF:

Core CGs. A typeless version of logic that expresses the full semantics of Common Logic, as defined by ISO/IEC 24707. This level corresponds to Peirce’s existential graphs: its only logical primitives are conjunction, negation, and the existential quantifier. Core CGs permit quantifiers to range over relations, but Peirce also experimented with that option for EGs.
Extended CGs. An upward compatible extension of the core, which adds a universal quantifier; type labels for restricting the range of quantifiers; Boolean contexts with type labels If, Then, Either, Or, Equivalence, and Iff; and the option of importing external CGIF text. Core and extended CGs have exactly the same expressive power, since the semantics of extended CGs is defined by a formal translation to core CGs. Extended CGs are usually more concise than core CGs, and they have a more direct mapping to and from natural languages.
CGs with contexts. An upward compatible extension of core and extended CGs to support metalanguage, the option of using language to talk about language. That option requires some way of quoting or delimiting object level statements from metalevel statements. As a delimiter, CG notations use a type of concept node, called a context, which contains a nested conceptual graph (Figure 18). CG contexts can be formalized by the IKL semantics, which includes the CL semantics as a subset (Hayes & Menzel 2006). Another formalization, which can be adapted to the IKL semantics, is the system of nested graph models (Sowa 2003, 2005). Either or both of these formalizations can be used in an upward compatible extension of core and extended CGs.
Research CGs. Adaptations of the CG notation for an open-ended variety of purposes. The advantage of a standard is a fixed, reliable platform for developing computer systems and applications. The disadvantage of a standard is the inability to explore useful modifications. But the combination of an ISO standard with the option of open-ended variations gives developers a solid basis for applications without limiting the creativity of researchers. Research extensions that prove to be useful may be incorporated in some future standard.

Peirce’s first use for the oval was to negate the graphs nested inside, and that is the only use supported by the ISO standard. But Peirce (1898) generalized the ovals to context enclosures, which allow relations other than negation to be linked to the enclosure. The basic use of a context enclosure is to quote the nested graphs. That syntactic option allows metalevel statements outside the context to specify how the nested (object level) graphs are interpreted. Nested graph models (NGMs) can be used to formalize the semantics of many kinds of modal and intensional logics. A hierarchy of metalevels with the NGM semantics can express the equivalent of a wide range of modal, temporal, and intentional logics. The most useful NGMs can be represented with the IKL semantics, but the many variations and their application to natural languages have not yet been fully explored.

The most common use of language about language is to talk about the beliefs, desires, and intentions of the speaker and other people. As an example, the sentence Tom believes that Mary wants to marry a sailor, contains three clauses, whose nesting may be marked by brackets:

     Tom believes that [Mary wants [to marry a sailor]].

The outer clause asserts that Tom has a belief, which is the object of the verb believe. Tom’s belief is that Mary wants a situation described by the nested infinitive, whose subject is the same person who wants the situation. Each clause makes a comment about the clause or clauses nested in it. References to the individuals mentioned in those clauses may cross context boundaries in various ways, as in the following two interpretations of the original English sentence:

     Tom believes that [there is a sailor whom Mary wants [to marry]].
     There is a sailor whom Tom believes that [Mary wants [to marry]].

The two conceptual graphs in Figure 18 represent the first and third interpretations. In the CG on the left, the existential quantifier for the concept [Sailor] is nested inside the situation that Mary wants. Whether such a sailor actually exists and whether Tom or Mary knows his identity are undetermined. The CG on the right explicitly states that such a sailor exists; the connections of contexts and relations imply that Tom knows him and that Tom believes that Mary also knows him. Another option (not shown) would place the concept [Sailor] inside the context of type Proposition; it would leave the sailor’s existence undetermined, but it would imply that Tom believes he exists and that Tom believes Mary knows him.

Figure 18. Two interpretations of Tom believes that Mary wants to marry a sailor

The context boxes illustrated in Figures 4 and 6 express negations or operators such as If and Then, which are defined in terms of negations. But the contexts of type Proposition and Situation in Figure 18 raise new issues of logic and ontology. The CL semantics can represent entities of any type, including propositions and situations, but it has no provision for relating such entities to the internal structure of CL sentences. A more expressive language, called IKL (Hayes & Menzel 2006), was defined as an upward compatible extension of CL. The IKL semantics introduces entities called propositions and a special operator, spelled that, which relates IKL sentences to the propositions they express. IKL semantics does not have a built-in type for situations, but it is possible in IKL to make statements that state the existence of entities of type Situation and relate them to propositions.

The first step toward translating the CGs in Figure 18 to IKL is to write them in an extended version of CGIF, which allows CGs to be nested inside concept nodes of type Proposition or Situation. Following is the CGIF for the CG on the left:

     [Person: Tom] [Believe: *x1] (Expr ?x1 Tom)
     (Thme ?x1 [Proposition:
        [Person: Mary] [Want: *x2] (Expr ?x2 Mary)
        (Thme ?x2 [Situation:
           [Marry: *x3] [Sailor: *x4] (Agnt ?x3 Mary) (Thme ?x3 ?x4)])])

This statement uses the option of moving the concept nodes for the types Proposition and Situation inside the relation nodes of type Thme. That option has no semantic significance, but it makes the order of writing the CGIF closer to English word order. A much more important semantic question is the relation between situations and propositions. In the ontology commonly used with CGs, that relation is spelled Dscr and called the description relation. The last two lines of the CGIF statement above could be rewritten in the following form:

     (Thme ?x2 [Situation: *s]) (Dscr ?s [Proposition:
        [Marry: *x3] [Sailor: *x4] (Agnt ?x3 Mary) (Thme ?x3 ?x4)])])

The last line is unchanged, but the line before it states that the theme of x2 is the situation s and the description of s is the proposition stated on the last line. In effect, every concept of type Situation that contains a nested CG is an abbreviation for a situation that is described by a concept of type Proposition that has the same nested CG. This expanded CGIF statement can then be translated to IKL (which is based on CLIF syntax with the addition of the operator that).

     (exists ((x1 Believe)) (and (Person Tom) (Expr x1 Tom)
     (Thme x1 (that
        (exists ((x2 Want) (s Situation)) (and (Person Mary) (Expr x2 Mary)
        (Thme x2 s) (Dscr s (that
           (exists ((x3 Marry) (x4 Sailor)) (and (Agnt x3 Mary) (Thme x3 x4)
     ))))))))))

Each line of the IKL statement expresses the equivalent of the corresponding line in CGIF. Note that every occurrence of Proposition in CGIF corresponds to that in IKL. The syntax of CLIF or IKL requires more parentheses than CGIF because every occurrence of exists or and requires an extra closing parenthesis at the end.

As these examples illustrate, the operator that adds an enormous amount of expressive power, but IKL still has a first-order style of semantics. The proposition nodes in CGs or the that operator in IKL introduce abstract entities of type Proposition, but propositions are treated as zero-argument relations, which are supported by the semantics of Common Logic. Although language about propositions is a kind of metalanguage, it does not, by itself, go beyond first-order logic. Tarski (1933), for example, demonstrated how a stratified series of metalevels, each of which is purely first order, can be used without creating paradoxes or going beyond the semantics of FOL. In effect, Tarski avoided paradoxes by declaring that certain kinds of sentences (those that violate the stratification) do not express propositions in his models. The IKL model theory has a similar way of avoiding paradoxes: it does not require every model to include a proposition for every possible sentence. For example, the following English sentence, which sounds paradoxical, could be expressed in either IKL or CGIF syntax:

There exists a proposition p, p is true,
and p is the proposition that p is false.

Since IKL does not require every sentence to express a proposition in every model, there are permissible IKL models in which this sentence is false simply because no such proposition exists. Therefore, the paradox vanishes because the sentence has a stable, but false, truth value.

Issues of context and metalanguage require some syntactic and semantic extensions beyond the CL standard. The syntax for context was proposed by Sowa (1984) with a semantics that was a subset of the later IKL and NGM versions. Syntax for the following three extensions has also been available since 1984, and some CG systems have implemented versions of them. But they are not yet in ISO standard CGIF because a fully general treatment would involve ongoing research in linguistics:

Generalized quantifiers. In addition to the usual quantifiers of every and some, natural languages support an open-ended number of quantificational expressions, such as exactly one, at least seven, or considerably more. Some of these quantifiers, such as exactly one cat, could be represented as [Cat: @1] and defined in terms of the CL standard. Others, such as at least seven cats, could be represented [Cat: @≤7] and defined with a version of set theory added to the base logic. But quantifiers such as considerably more would require some method of approximate reasoning, such as fuzzy sets or rough sets.
Indexicals. Peirce observed that every statement in logic requires at least one indexical to fix the referents of its symbols. The basic indexical, which corresponds to the definite article the, is represented by the symbol # inside a concept node: [Dog: #] would represent the phrase the dog. The pronouns I, you, and she would be represented [Person: #I], [Person: #you], and [Person: #she]. To process indexicals, some linguists propose versions of dynamic semantics, in which the model is updated during the discourse. A simpler method is to treat the # symbol as a syntactic marker that indicates a incomplete interpretation of the original sentence. With this approach, the truth value of a CG that contains any occurrences of # is not determined until those markers are replaced by names or coreference labels. This approach supports indexicals in an intermediate representation, but uses a conventional model theory to evaluate the final resolution.
Plural nouns. Plurals have been represented in CGs by set expressions inside the concept boxes. The concept [Cat: {*}@3] would represent three cats, and [Dog: {Lucky, Macula}] would represent the dogs Lucky and Macula. Various methods have been proposed for representing distributed and collective plurals and translating them to versions of set theory and mereology.

Simple versions of these features have been implemented in CG systems. The difficult issues involve generalizing them in a systematic way to cover all the variations that occur in natural languages.

The major difference between natural languages and formal languages is not in the notation. For the world’s first formal logic, Aristotle used a subset of Greek. Some computer systems use a controlled natural language based on a formally defined subset of a natural language. The defining characteristic of a formal language is that the meaning of any statement is completely determined by its form or syntax. Natural languages, however, are highly context dependent. Ambiguities and indexicals are two kinds of dependencies that have been thoroughly analyzed, but vagueness is more challenging. Peirce (1902:748) defined vagueness in an article for Baldwin’s Dictionary of Philosophy and Psychology:

A proposition is vague when there are possible states of things concerning which it is intrinsically uncertain whether, had they been contemplated by the speaker, he would have regarded them as excluded or allowed by the proposition. By intrinsically uncertain we mean not uncertain in consequence of any ignorance of the interpreter, but because the speaker’s habits of language were indeterminate.

In other writings, Peirce explained that a vague statement requires some additional information to make it precise, but the sentence itself gives little or no indication of what kind of information is missing. That characteristic distinguishes vagueness from the ambiguity illustrated in Figure 18. An ambiguous sentence has two or more interpretations, and the kind of information needed to resolve the ambiguity can usually be determined by analyzing the sentence itself. An indexical is a word or phrase, such as she or the book, whose referent is not specified, but the kind of thing that would serve as a referent and the means of finding it are usually determined. But the meaning of a vague sentence, as Peirce observed, is “intrinsically uncertain” because the method for finding the missing information is unknown, perhaps even to the speaker.

Peirce’s definition is compatible with an approach to vagueness by the logical methods of underspecification and supervaluation. As an example, the graph on the left of Figure 18 is underspecified because it is true in all three of the interpretations of the sentence Tom believes that Mary wants to marry a sailor. The graph on the right is the most specific because it is true in only one interpretation. CGs represent indexicals with underspecified concepts such as [Person: #she] for she and [Book: #] for the book, whose referents could later be resolved by the methods of discourse representation theory. To accommodate the truth-value gaps of vague sentences that are neither true nor false, van Fraassen (1966) proposed supervaluation as a model-theoretic method that allows some sentence p to have an indeterminate truth value in a specific model. But other sentences, such as p∨~p would have a supervalue of true in all models, while the sentence p∧~p would have the supervalue false in all models.

For language understanding, Hintikka (1973:129) observed that infinite families of models might be theoretically interesting, but “It is doubtful whether we can realistically expect such structures to be somehow actually involved in our understanding of a sentence or in our contemplation of its meaning.” As an alternative, he proposed a surface model of a sentence S as “a mental anticipation of what can happen in one’s step-by-step investigation of a world in which S is true.” Instead of a logic of vagueness, Hintikka suggested a method of constructing a model that would use all available information to fill in the details left indeterminate in a given text. The first stage of constructing a surface model begins with the entities occurring in a sentence or story. During the construction, new facts may be asserted that block certain extensions or facilitate others. A conventional model is the limit of a surface model that has been extended infinitely far, but such infinite processes are not required for normal understanding.

After forty years, supervaluations are still widely used by logicians, but Fodor and Lepore (1996) denounced them as useless for linguistics and psychology. Surface models, although largely neglected, can be constructed by constraint satisfaction methods similar to the heuristic techniques in AI and dynamic semantics in linguistics. Conceptual graphs are convenient for such methods because they can be used to represent arbitrary statements in logic, to represent formal models of those statements, and to represent each step from an initially vague statement to a complete specification. Any Tarski-style model, for example, can be represented as a potentially infinite simple graph, whose only logical operators are conjunction and existential quantification. Each step in constructing a surface model is a subgraph of a potentially infinite complete model, but for most applications, there is no need to finish the construction. Such techniques have been successfully used for understanding dialogs, in which fragments of information from multiple participants must be assembled to construct a larger view of a subject.

Implementations, in either computers or neurons, can exploit the topological properties of graphs, which include symmetry, duality, connectivity, and cycles. For knowledge representation, the topology often reveals similarities that are masked by different choices of ontology. Figure 19 shows two different ways of representing the sentence Sue gives a child a book. The CG on the left represents the verb by a triadic relation, while the one on the right represents it by a concept linked to three dyadic relations: agent, theme, and recipient. Different ontologies lead to different numbers of concept and relation nodes with different type labels, but a characteristic invariant of giving is the triadic connectivity of three participants to a central node.

Figure 19. Two ways of representing an act of giving

Although the relation type Gives and the concept type Give have similar names, concept nodes and relation nodes cannot be directly mapped to one another. But when the topology is analyzed, adjacent nodes can be merged to reduce the size of the graphs while preserving invariants such as cycles and connectivity. The invariants can indicate common underlying relationships expressed with different ontologies. Sowa and Majumdar (2003) presented a more complex example that compared one CG derived from a relational database to another CG derived from an English sentence, both of which represented the same physical structure. The database CG had 15 concept nodes and 9 relation nodes, but the CG from English had 12 concept nodes and 11 relation nodes. Furthermore, no type label on any node of one CG was identical to any type label on any node of the other. Given only the constraint that five concept nodes of each CG referred to the same five objects, the algorithms correctly derived the mappings from one ontology to the other. Those mappings could then be used to compare other representations with the same two ontologies. Ongoing research on methods for representing graphs has led to high-performance algorithms for searching, comparing, and transforming large numbers of very large graphs.

As an example of applied research, one of the largest commercial CG systems is Sonetto (Sarraf & Ellis 2006), which uses extended versions of earlier algorithms by Levinson and Ellis (1992). A key innovation of Sonetto is its semi-automated methods for extracting ontologies and business rules from unstructured documents. The users who assist Sonetto in the knowledge extraction process are familiar with the subject matter, but they have no training in programming or knowledge engineering. CGIF is the knowledge representation language for ontologies, rules, and queries. It is also used to manage the schemas of documents and other objects in the system and to represent the rules that translate CGIF to XML and other formats. For the early CG research, see the collections edited by Nagle et al. (1992), Way (1992), and Chein (1996). More recent research on CGs has been published in the annual proceedings of the International Conferences on Conceptual Structures.

3. CGIF Syntax

Lexical Grammar Rules

The syntax rules are written in Extended Backus-Naur Form (EBNF) rules, as specified by ISO/IEC 14977. The CGIF syntax rules assume the same four types of names as CLIF: namecharsequence for names not enclosed in quotes; enclosedname for names enclosed in double quotes; numeral for numerals consisting of one or more digits; and quotedstring for character strings enclosed in single quotes. But because of syntactic differences between CGIF and CLIF, CGIF must enclose more names in quotes than CLIF in order to avoid ambiguity. Therefore, the only CG names not enclosed in quotes belong to the categories identifier and numeral.

   CGname = identifier | '"', (namecharsequence - identifier), '"'
            | numeral | enclosedname | quotedstring;
   identifier = letter, {letter | digit | "_"};

When CGIF is translated to CL, a CGname is translated to a CLIF name by removing any quotes around a name character sequence. CLIF does not make a syntactic distinction between constants and variables, but in CGIF any CGname that is not used as a defining label or a bound label is called a constant. The start symbol of CGIF syntax is the category text for a complete text or the category CG for just a single conceptual graph.

Core CGIF Grammar Rules

An actor is a conceptual relation that represents a function in Common Logic. It begins with (, an optional comment, an optional string #?, a CG name, |, an arc, an optional end comment, and ). If the CG name is preceded by #?, it represents a bound coreference label; otherwise, it represents a type label. The arc sequence represents the arguments of the CL function and the last arc represents the value of the function.

   actor = "(", [comment], ["#?"], CGname, arcSequence, "|", arc,
                [endComment], ")";

An arc is an optional comment followed by a reference. It links an actor or a conceptual relation to a concept that represents one argument of a CL function or relation.

   arc = [comment], reference;

An arc sequence is a sequence of zero or more arcs, followed by an option consisting of an optional comment, ?, and a sequence marker.

   arcSequence = {arc}, [[comment], "?", seqmark];

A comment or an end comment is a character string that has no effect on the semantics of a conceptual graph or any part of a conceptual graph. A comment begins with "/*", followed by a character string that contains no occurrence of "*/", and ends with "*/". A comment may occur immediately after the opening bracket of any concept, immediately after the opening parenthesis of any actor or conceptual relation, immediately before any arc, or intermixed with the concepts and conceptual relations of a conceptual graph. An end comment begins with ;, followed by a character string that contains no occurrence of ] or ). An end comment may occur immediately before the closing bracket of any concept or immediately before the closing parenthesis of any actor or conceptual relation.

   comment = "/*", {(character-"*") | ["*", (character-"/")]}, ["*"], "*/";

   endComment = ";", {character - ("]" | ")")};

A concept is either a context, an existential concept, or a coreference concept. Every concept begins with [ and an optional comment; and every concept ends with an optional end comment and ]. Between the beginning and end, a context contains a CG; an existential concept contains * and either a CG name or a sequence marker; and a coreference concept contains : and a sequence of one or more references. A context that contains a blank CG is said to be empty, even if it contains one or more comments; any comment that occurs immediately after the opening bracket shall be part of the concept, not the following CG.

   concept = "[", [comment],
              (CG | "*", (CGname | seqmark) | ":", {reference}- ),
              [endComment], "]";

A conceptual graph (CG) is an unordered list of concepts, conceptual relations, negations, and comments.

   CG = {concept | conceptualRelation | negation | comment};

A conceptual relation is either an ordinary relation or an actor. An ordinary relation, which represents a CL relation, begins with (, an optional comment, an optional string #?, a CG name, an optional end comment, and ). If the CG name is preceded by #?, it represents a bound coreference label; otherwise, it represents a type label. An ordinary relation has just one sequence of arcs, but an actor has two sequences of arcs.

   conceptualRelation = ordinaryRelation | actor;

   ordinaryRelation = "(", [comment], ["#?"], CGname, arcSequence,
                           [endComment], ")";

A negation is ~ followed by a context.

   negation = "~", context;

A reference is an optional ? followed by a CG name. A CG name prefixed with ? is called a bound coreference label; without the prefix ?, it is called a constant.

   reference = ["?"], CGname;

A text is a context, called an outermost context, that has an optional name, has an arbitrarily large conceptual graph, and is not nested inside any other context. It consists of [, an optional comment, the type label Proposition, :, an optional CG name, a conceptual graph, an optional end comment, and ]. Although a text may contain core CGIF, the type label Proposition is outside the syntax of core CGIF.

   text = "[", [comment], "Proposition", ":", [CGname], CG,
               [endComment], "]";

Extended CGIF Grammar Rules

Extended CGIF is superset of core CGIF, and every syntactically correct sentence of core CGIF is also syntactically correct in extended CGIF. Its most prominent feature is the option of a type label or a type expression on the left side of any concept. In addition to types, extended CGIF adds the following features to core CGIF:

More options in concepts, including universal quantifiers.
Boolean contexts for representing the operators or, if, and iff.
The option of allowing concept nodes to be placed in the arc sequence of conceptual relations.
The ability to import text into a text.

These extensions are designed to make sentences more concise, more readable, and more suitable as a target language for translations from natural languages and from other CL dialects, including CLIF. None of them, however, extend the expressive power of CGIF beyond the CG core, since the semantics of every extended feature is defined by its translation to core CGIF, whose semantics is defined by its translation to the abstract syntax of Common Logic.

The following grammar rules of extended CGIF have the same definitions as the core CGIF rules of the same name: arcSequence, conceptualRelation, negation, ordinaryRelation, text. The following grammar rules of extended CGIF don’t occur in core CGIF, or they have more options than the corresponding rules of core CGIF: actor, arc, boolean, CG, concept, eitherOr, equivalence, ifThen, typeExpression.

An actor in extended CGIF has the option of zero or more arcs following | instead of just one arc.

   actor = "(", [comment], ["#?"], CGname,
           arcSequence, "|", {arc}, [endComment], ")";

An arc in extended CGIF has the options of a defining coreference label and a concept in addition to a bound coreference label.

   arc = [comment], (reference | "*", CGname | concept);

A boolean is either a negation or a combination of negations that represent an either-or construction, an if-then construction, or an equivalence. Instead of being marked with ~, the additional negations are represented as contexts with the type labels Either, Or, If, Then, Equiv, Equivalence, or Iff.

   boolean = negation | eitherOr | ifThen | equivalence;

A concept in extended CGIF permits any combination allowed in core CGIF in the same node and it adds two important options: a type field on the left side of the concept node, and a universal quantifier on the right. Four options are permitted in the type field: a type expression, a bound coreference label prefixed with "#", a constant, or the empty string; a colon is required after a type expression, but optional after the other three.

   concept = "[", [comment],
             ( (typeExpression, ":"
               | ["#?"], CGname, [":"]),
               [["@every"], "*", CGname], {reference}, CG
             | ["@every"], "*", seqmark
             ), [endComment], "]";

A conceptual graph (CG) in extended CGIF adds Boolean combinations of contexts to core CGIF.

   CG = {concept | conceptualRelation | boolean | comment};

An either-or is a negation with a type label Either that contains zero or more negations with a type label Or.

   eitherOr = "[", [comment], "Either", [":"],
                   {"[", [comment], "Or", [":"], CG, [endComment], "]"}
                   [endComment], "]";

An equivalence is a context with a type label Equivalence or Equiv that contains two contexts with a type label Iff. It is defined as a pair of if-then constructions, each with one of the iff-contexts as antecedent and the other as consequent.

   equivalence = "[", [comment], ("Equivalence" | "Equiv"), [":"],
                     "[", [comment], "Iff", [":"], CG, [endComment], "]",
                     "[", [comment], "Iff", [":"], CG, [endComment], "]",
                      [endComment], "]";

An if-then is a negation with a type label If that contains a negation with a type label Then.

   ifThen = "[", [comment], "If", [":"], CG,
                   "[", [comment], "Then", [":"], CG, [endComment], "]",
                   [endComment], "]";

A type expression is a lambda-expression that may be used in the type field of a concept. The symbol @ marks a type expression, since the Greek letter λ is not available in the ASCII subset of Unicode.

   typeExpression = "@", "*", CGname, CG;

References

Chein, Michel, ed., (1996) Revue d’intelligence artificielle, Numéro Spécial Graphes Conceptuels, vol. 10, no. 1.

Dau, Frithjof (2006) “Some notes on proofs with Alpha graphs,” in Schärfe et al., pp. 172-188.

Davidson, Donald (1967) “The logical form of action sentences,” reprinted in D. Davidson (1980) Essays on Actions and Events, Clarendon Press, Oxford, pp. 105-148.

Fillmore, Charles J. (1968) “The case for case” in E. Bach & R. T. Harms, eds., Universals in Linguistic Theory, Holt, Rinehart and Winston, New York, 1-88.

Fodor, Jerry A., & Ernie Lepore (1996) “What can’t be valued, can’t be valued, and it can’t be supervalued either,” Journal of Philosophy 93, 516-536.

Genesereth, Michael R., & Richard Fikes, eds. (1992) Knowledge Interchange Format, Version 3.0 Reference Manual, TR Logic-92-1, Computer Science Department, Stanford University.

Gentzen, Gerhard (1935) “Untersuchungen über das logische Schließen,” translated as “Investigations into logical deduction” in The Collected Papers of Gerhard Gentzen, ed. and translated by M. E. Szabo, North-Holland Publishing Co., Amsterdam, 1969, pp. 68-131.

Hayes, Patrick (2005) “Translating semantic web languages into Common Logic,” http://www.ihmc.us/users/phayes/CL/SW2SCL.html

Hayes, Patrick, & Brian McBride (2003) “RDF semantics,” W3C Technical Report, http://www.w3.org/TR/rdf-mt/

Hayes, Patrick, & Chris Menzel (2001) “A semantics for the Knowledge Interchange Format,” Proc. IJCAI 2001 Workshop on the IEEE Standard Upper Ontology, Seattle.

Hayes, Patrick, & Chris Menzel (2006) “IKL Specification Document,” http://www.ihmc.us/users/phayes/IKL/SPEC/SPEC.html

Hays, David G. (1964) “Dependency theory: a formalism and some observations,” Language 40:4, 511-525.

Hintikka, Jaakko (1973) "Surface semantics: definition and its motivation," in H. Leblanc, ed., Truth, Syntax and Modality, North-Holland, Amsterdam, 128-147.

Hughes, Dominic J. D. (2006) “Proofs Without Syntax,” Annals of Mathematics,

ISO/IEC (2002) Z Formal Specification Notation — Syntax, Type System, and Semantics, IS 13568, International Organisation for Standardisation.

ISO/IEC (2007) Common Logic (CL) — A Framework for a family of Logic-Based Languages, IS 24707, International Organisation for Standardisation.

Kamp, Hans (1981) “A theory of truth and semantic representation,” in Formal Methods in the Study of Language, ed. by J. A. G. Groenendijk, T. M. V. Janssen, & M. B. J. Stokhof, Mathematical Centre Tracts, Amsterdam, 277-322.

Kamp, Hans, & Uwe Reyle (1993) From Discourse to Logic, Kluwer, Dordrecht.

Klein, Sheldon, & Robert F. Simmons (1963) “Syntactic dependence and the computer generation of coherent discourse,” Mechanical Translation 7.

Levinson, Robert A., & Gerard Ellis (1992) “Multilevel hierarchical retrieval,” Knowledge Based Systems 5:3, pp. 233-244.

McKinley, Richard (2006) Categorical Models of First-Order Classical Proofs, PhD Dissertation, University of Bath.

Nagle, T. E., J. A. Nagle, L. L. Gerholz, & P. W. Eklund, eds. (1992) Conceptual Structures: Current Research and Practice, Ellis Horwood, New York.

Newell, Allen, & Herbert A. Simon (1972) Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ.

Peano, Giuseppe (1889) Aritmetices principia nova methoda exposita, Bocca, Torino.

Peirce, Charles Sanders (1870) “Description of a notation for the logic of relatives,” reprinted in Writings of Charles S. Peirce, Indiana University Press, Bloomington, vol. 2, pp. 359-429.

Peirce, Charles Sanders (1880) “On the algebra of logic,” American Journal of Mathematics 3, 15-57.

Peirce, Charles Sanders (1885) “On the algebra of logic,” American Journal of Mathematics 7, 180-202.

Peirce, Charles Sanders (1898) Reasoning and the Logic of Things, The Cambridge Conferences Lectures of 1898, ed. by K. L. Ketner, Harvard University Press, Cambridge, MA, 1992.

Peirce, Charles Sanders (1902) “Vague,” in J. M. Baldwin, ed., Dictionary of Philosophy and Psychology, MacMillan, New York, p. 748.

Peirce, Charles Sanders (1906) Manuscripts on existential graphs, Collected Papers of Charles Sanders Peirce, vol. 4, Harvard University Press, Cambridge, MA, pp. 320-410.

Peirce, Charles Sanders (1909) Manuscript 514, with commentary by J. F. Sowa, http://www.jfsowa.com/peirce/ms514.htm

Quillian, M. Ross (1966) “Semantic memory,” in M. Minsky, ed., Semantic Information Processing, MIT Press, Cambridge, MA, pp. 227-270.

Quine, Willard Van Orman (1954) “Reduction to a dyadic predicate,” J. Symbolic Logic 19, reprinted in W. V. Quine, Selected Logic Papers, Enlarged Edition, Harvard University Press, Cambridge, MA, 1995, pp. 224-226.

Sarraf, Qusai, & Gerard Ellis (2006) “Business Rules in Retail: The Tesco.com Story,” Business Rules Journal 7:6, http://www.brcommunity.com/a2006/n014.html

Schärfe, Henrik, Pascal Hitzler, & Peter Øhrstrom, eds. (2006) Conceptual Structures: Inspiration and Application, LNAI 4068, Springer, Berlin.

Schank, Roger C., ed. (1975) Conceptual Information Processing, North-Holland Publishing Co., Amsterdam.

Selz, Otto (1913) Über die Gesetze des geordneten Denkverlaufs, Spemann, Stuttgart.

Selz, Otto (1922) Zur Psychologie des produktiven Denkens und des Irrtums, Friedrich Cohen, Bonn.

Sowa, John F. (1976) “Conceptual graphs for a database interface,” IBM Journal of Research and Development 20:4, 336-357.

Sowa, John F. (1984) Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA.

Sowa, John F. (2000) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Publishing Co., Pacific Grove, CA.

Sowa, John F. (2003) “Laws, facts, and contexts: Foundations for multimodal reasoning,” in Knowledge Contributors, edited by V. F. Hendricks, K. F. Jørgensen, and S. A. Pedersen, Kluwer Academic Publishers, Dordrecht, pp. 145-184.

Sowa, John F. (2006) “Worlds, Models, and Descriptions,” Studia Logica, Special Issue Ways of Worlds II, 84:2, 2006, pp. 323-360.

Sowa, John F., & Arun K. Majumdar (2003) “Analogical reasoning,” in A. de Moor, W. Lex, & B. Ganter, eds. (2003) Conceptual Structures for Knowledge Creation and Communication, LNAI 2746, Springer-Verlag, Berlin, pp. 16-36.

Stewart, John (1996) Theorem Proving Using Existential Graphs, MS Thesis, Computer and Information Science, University of California at Santa Cruz.

Tarski, Alfred (1933) “The concept of truth in formalized languages,” in A. Tarski, Logic, Semantics, Metamathematics, Second edition, Hackett Publishing Co., Indianapolis, pp. 152-278.

Tesnière, Lucien (1959) Éléments de Syntaxe structurale, Librairie C. Klincksieck, Paris. Second edition, 1965.

van Fraassen, Bas C. (1966) “Singular terms, truth-value gaps, and free logic,” Journal of Philosophy 63, 481-495.

Way, Eileen C., ed. (1992) Journal of Experimental and Theoretical Artificial Intelligence (JETAI), Special Issue on Conceptual Graphs, vol. 4, no. 2.

Whitehead, Alfred North, & Bertrand Russell (1910) Principia Mathematica, 2nd edition, Cambridge University Press, Cambridge, 1925.