Sharing and Integrating Ontologies

John F. Sowa

This report evolved from an email discussion in Ontolog Forum starting in February 2010. Since many of the ideas were introduced, elaborated, and modified by multiple participants, it’s impossible to credit any particular individual for any specific point. Instead, all participants in the thread with the subject line Foundation Ontology, Cyc, and Mapping should be acknowledged as contributors. Some related discussions, also starting in February, took place on the email list of the Architecture Ecosystem SIG of the Object Management Group, which also influenced the ideas presented in this report. Other publications and presentations are cited in the body of the report and collected in the bibliography at the end.

1. Foundations for Interoperability

Two or more application programs that interoperate successfully on common data must be based, explicitly or implicitly, on some agreement about the meaning of that data. Internally, those applications may use very different syntax, and some of their processing may depend on information that is not described in the common agreements. For example, a personnel database and a medical database may share information about the names, addresses, and Social Security numbers of many of the same people. But the business-related details in the personnel DB and the case histories in the medical DB would not be shared. In general, the conditions for successful interoperability are explained in the following terms:

  1. Syntax.  Languages, notations, and formats shared among multiple programs must have syntax that is recognized and interpreted by each program. The shared notations may be as simple as comma-separated values in a file, or they may be rich languages with a highly expressive grammar.

  2. Semantics.  Meaningful notations must have some semantics, which determines how symbols in that notation are related to something outside the notation itself. Normally, those external things are significant for the people who pay the cost of developing and maintaining the application systems.

  3. Pragmatics.  Useful applications must have some pragmatics that supports the goals or intentions of the designers, developers, and users. Pragmatics is that aspect of meaning that explains the motivation or the reason why some action is performed.

  4. Specifications.  Any program that must interoperate with programs implemented by more than one person requires detailed documentation. The syntax, semantics, and pragmatics of all the data and operations accessible to other programs must be specified to a level of precision that enables programmers to write compatible code. To support automated tools, the documentation must be specified with even greater precision — a level achieved only in programming languages and formal logics.

  5. Metadata.  Language about language is called metalanguage. Metadata is metalanguage about data and software that is sufficiently precise for computer processing. Several levels of data, metadata, and metalanguage have been used to represent subject domains, applications designed for the domain, languages for implementing applications, languages for specifying the designs, languages that relate designs to implementations, and methodologies for relating all of the above.

  6. Ontology.  The field of ontology analyzes, describes, and classifies the entities that exist or may exist in any domain. Those domains could be the physical world, plans and designs for the future, computer systems that relate to the world or to each other, or any language or metalanguage that relates any domain to any other.

In short, interoperability requires precise documentation of the syntax, semantics, and pragmatics of all the interactions among interoperable systems. Formal ontologies are metadata about the things, events, properties, people, and information involved in the design, implementation, and use of those systems.

As an example of typical documentation, consider the following paragraph, written in a mixture of English and computer jargon. This level of detail can be used by programmers who are familiar with the subject domain. But the level of precision is not adequate for automated processing by computer:

This file contains the billing transaction codes (types of records) that are to be interfaced to General Ledger for the given month. For this process the following transaction codes are used: 32 - loss on unbilled, 72 - gain on uncollected, and 85 - loss on uncollected. Any of these records that are actually taxes are bypassed. Only client types 01 - Mar, 05 - Internal Non/Billable, 06 - Internal Billable, and 08 - BAS are selected. This is determined by a GETBDATA call to the client file. The unit that the gain or loss is assigned to is supplied at the time of its creation in EBT.

This paragraph illustrates the massive amount of detail that must be defined for precise interoperability. Just a single phrase, such as 32 - loss on unbilled, shows the complexity. The number 32 is a highly specific code used in a narrow range of applications for a single aspect of one business enterprise. The definition of loss on unbilled presupposes and mixes information about money, bookkeeping, profits and losses, and computer processing. Much of that information is common to all business, but different companies and even different departments of the same company are likely to organize and express similar information in different ways.

Defining formal ontologies for all the kinds of information used by a single business enterprise is an enormous undertaking. Generalizing those ontologies for an entire industry with all the competing and supporting companies, suppliers, and customers is far more difficult. Defining universal ontologies to support all science, engineering, business, medicine, politics, law, and the arts will not be achieved for a long time, if ever.

Some intended solutions, unfortunately, can themselves become obstacles to finding better solutions. In an article on information integration, Firat, Madnick, and Grosof (2002) made the following observation:

The likelihood of a single international accounting standard coming to dominate anytime soon is quite slim. This is further complicated by the complexities and localities involved in the accounting practices of different countries (e.g. the UK views the proposed standards as actually reducing the quality of their corporate reporting).

A more reasonable goal is to define an open-ended framework that can accommodate any ontologies at any level of generalization or specialization and display whatever relationships among them that anyone has been able to discover. Such a framework will not solve all the problems, but it can help to organize and relate any solutions that have been found. Most importantly, the framework itself must not become an obstacle to further innovation and integration.

2. Ontologies, Terminologies, and Lexical Resources

[This section is incomplete.]

3. Methods of Reasoning

For a summary of some of the issues, see Fads and Fallacies about Logic (Sowa 2007).

[This section is incomplete.]

4. Lattice of Theories

For any given logic, the set of all possible theories expressible in that logic forms a lattice. The ordering of theories is defined by specialization and generalization:  adding axioms to a theory creates a more specialized theory; deleting axioms creates a more generalized theory.

The most general theory at the top of the lattice has no axioms. It contains all tautologies, which are true of everything and say nothing about anything. Examples include (p or not p), (p implies p), and (p & q implies p).
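That a formula is a tautology can be checked mechanically by enumerating every truth assignment. The following minimal Python sketch (the function name and lambda encodings are illustrative, not part of the report) verifies the three examples above, writing implication p → q in its equivalent form (not p) or q:

```python
from itertools import product

def is_tautology(formula, n):
    """True if formula (a Boolean function of n arguments)
    holds under every one of the 2**n truth assignments."""
    return all(formula(*vals) for vals in product([False, True], repeat=n))

# The three example tautologies from the text:
assert is_tautology(lambda p: p or not p, 1)                # (p or not p)
assert is_tautology(lambda p: (not p) or p, 1)              # (p implies p)
assert is_tautology(lambda p, q: not (p and q) or p, 2)     # (p & q implies p)

# A non-tautology for contrast: p alone is false when p is False.
assert not is_tautology(lambda p: p, 1)
```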

The bottom of the lattice is the absurd theory, which contains all statements in the lattice, including all contradictions. Adding a contradiction to any consistent theory causes it to degenerate to the absurd theory.
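The ordering of the lattice can be modeled directly: if a theory is represented as a set of axioms, then one theory specializes another exactly when it contains every axiom of the other. A toy sketch, with made-up axiom strings standing in for real formulas:

```python
def specializes(t1, t2):
    """Theory t1 specializes theory t2 when t1 contains
    every axiom of t2 (superset test on axiom sets)."""
    return t2 <= t1

top = frozenset()                              # empty theory: only tautologies
base = frozenset({"F = G*m1*m2/r^2", "F = m*a"})
extended = base | {"orbits are conic sections"}  # adding an axiom specializes

assert specializes(base, top)          # every theory specializes the top
assert specializes(extended, base)     # adding axioms moves down the lattice
assert not specializes(base, extended) # deleting axioms moves up (generalizes)
```

Under this ordering the empty set sits at the top, and the (inconsistent) set containing every statement sits at the bottom, matching the description of the absurd theory above.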

[This section is incomplete.]

Deduction can never introduce anything new that is not already implicit in a given theory. In terms of the lattice, deduction stays within the bounds of a single theory. Methods of theory revision move from one theory to another by deleting, adding, or modifying axioms. Figure 1 shows the four basic operators for navigating the lattice:  contraction, expansion, revision, and relabeling. The operators of contraction and expansion follow the arcs of the lattice, revision makes short hops sideways, and relabeling makes long-distance jumps.

Figure 1:  Four operators for navigating the lattice of theories

To illustrate the moves through the lattice, suppose that A is Newton’s theory of gravitation applied to the earth revolving around the sun and F is Niels Bohr’s theory about an electron revolving around the nucleus of a hydrogen atom. The path from A to F is a step-by-step transformation of the old theory to the new one. The revision step from A to C replaces the gravitational attraction between the earth and the sun with the electrical attraction between the electron and the proton. That step can be carried out in two intermediate steps:  a contraction step that deletes the axioms for the gravitational attraction, followed by an expansion step that adds axioms for the electrical attraction.

Unlike contraction and expansion, which move to nearby theories in the lattice, relabeling jumps to a remote theory, such as C to E, by systematically renaming the types, relations, and individuals that appear in the axioms:  the earth is renamed the electron; the sun is renamed the nucleus; and the solar system is renamed the atom. Finally, the revision step from E to F uses a contraction step to discard details about the earth and sun that have become irrelevant, followed by an expansion step to add new axioms for quantum mechanics.
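The walk from A to E can be sketched with the same axiom-set representation. Contraction deletes axioms, expansion adds them, and relabeling systematically renames the symbols; the axiom strings and theory names below are illustrative placeholders, not formal axioms:

```python
def contract(theory, axioms):
    """Delete axioms: move up to a more general theory."""
    return theory - axioms

def expand(theory, axioms):
    """Add axioms: move down to a more specialized theory."""
    return theory | axioms

def relabel(theory, renaming):
    """Systematically rename types, relations, and individuals."""
    out = set()
    for axiom in theory:
        for old, new in renaming.items():
            axiom = axiom.replace(old, new)
        out.add(axiom)
    return frozenset(out)

A = frozenset({"earth revolves around sun",
               "gravity attracts earth to sun"})
B = contract(A, {"gravity attracts earth to sun"})      # revision, part 1
C = expand(B, {"electricity attracts earth to sun"})    # revision, part 2
E = relabel(C, {"earth": "electron", "sun": "nucleus"}) # long-distance jump

assert "electron revolves around nucleus" in E
```

The final revision from E to F would likewise be a contraction that discards the irrelevant planetary details followed by an expansion that adds the quantum-mechanical axioms.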

The lattice of theories can be viewed as the theoretical background for a wide range of developments in AI, ranging from informal heuristics to the most sophisticated methods of learning and reasoning. In fact, any method of default or nonmonotonic reasoning can be interpreted as a strategy for walking and jumping through the infinite lattice in search of a suitable theory.

5. Hierarchy of Certified Theories

The set of all theories that have been tested, certified as consistent, and used successfully in one or more applications is a finite subset of the lattice of all possible theories. It is called the hierarchy of certified theories.

[This section is incomplete.]

6. Methodologies and Metadata

[This section is incomplete.]

References

Cassidy, Patrick (2008), Toward an open-source foundation ontology representing the Longman’s defining vocabulary: The COSMO Ontology OWL version, in Proc. Third International Ontology for the Intelligence Community Conference, CEUR Workshop Proceedings, vol. 440, Fairfax, VA. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-440/paper11.pdf

Cassidy, Patrick (2009), The foundation ontology as a basis for semantic interoperability, http://www.micra.com/COSMO/TheFoundationOntologyForInteroperability.ppt

Fellbaum, Christiane, ed. (1998) WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA.

Firat, Aykut, Stuart Madnick, & Benjamin Grosof (2002) Financial information integration in the presence of equational ontological conflicts, Proc. Workshop on Information Technology and Systems (WITS). http://www.mit.edu/~bgrosof/paps/wits02.pdf

Grüninger, Michael (2009) COLORE:  Common Logic Ontology Repository, Semantic Technologies Lab, University of Toronto, http://ontolog.cim3.net/file/work/OOR-Ontolog-Panel/2009-08-06_Ontology-Repository-Research-Issues/Colore--MichaelGruninger_20090806.pdf

Grüninger, Michael (2010) An update on COLORE and OOR, Semantic Technologies Lab, University of Toronto, http://ontolog.cim3.net/file/work/OpenOntologyRepository/2010-02-19_OOR-Developers-Panel/COLORE--MichaelGruninger_20100219.pdf

ISO/IEC (2007) Common Logic (CL) — A Framework for a Family of Logic-Based Languages, IS 24707, International Organisation for Standardisation, Geneva.

Lenat, Douglas B. (1995) Cyc: A large-scale investment in knowledge infrastructure, Communications of the ACM 38:11, 33-38.

Lenat, D. B., & R. V. Guha (1990) Building Large Knowledge-Based Systems, Addison-Wesley, Reading, MA.

Miller, George A. (1995) WordNet: A lexical database for English, Communications of the ACM 38:11, 39-41.

Rössler, Otto E. (1998) Endophysics: The World as an Interface, Singapore: World Scientific Publishing Co.

Sowa, John F. (2000) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Publishing Co., Pacific Grove, CA.

Sowa, John F. (2005) The challenge of knowledge soup, in J. Ramadas & S. Chunawala, Research Trends in Science, Technology, and Mathematics Education, Homi Bhabha Centre, Mumbai, pp. 55-90.

Sowa, John F. (2006) A dynamic theory of ontology, in B. Bennett & C. Fellbaum, eds., Formal Ontology in Information Systems, IOS Press, Amsterdam, pp. 204-213.

Sowa, John F. (2007) Fads and fallacies about logic, IEEE Intelligent Systems, 22:2, pp. 84-87.

Sowa, John F. (2008) Conceptual graphs, in F. van Harmelen, V. Lifschitz, and B. Porter, eds., Handbook of Knowledge Representation, Elsevier, Amsterdam, pp. 213-237.

Tsuda, Ichiro, & Takeshi Ikegami (2002) Review of Endophysics: The World as Interface, Discrete Dynamics in Nature and Society 7:3, 213-214.

West, Matthew (2009) Ontology meets business, in A. Tolk & L. C. Jain, eds., Complex Systems in Knowledge-based Environments: Theory, Models and Applications, Springer, Berlin, pp. 229-260.

West, Matthew, & Julian Fowler (2001) The IIDEAS architecture and integration methodology for integrating enterprises, PDT Days.

West, Matthew, Jan Sullivan, & Hans Teijgeler (2003) ISO/FDIS 15926-2 — Lifecycle integration of process plant data including oil and gas production facilities, ISO TC184/SC4/WG3 N1328.

Yokoi, Toshio (1995) The EDR electronic dictionary, Communications of the ACM 38:11, 42-44.