This report evolved from an email discussion in Ontolog Forum starting in February 2010. Since many of the ideas were introduced, elaborated, and modified by multiple participants, it’s impossible to credit any particular individual for any specific point. Instead, all participants in the thread with the subject line “Foundation Ontology, Cyc, and Mapping” should be acknowledged as contributors. Some related discussions, also starting in February, took place on the email list of the Architecture Ecosystem SIG of the Object Management Group and also influenced the ideas presented in this report. Other publications and presentations are cited in the body of the report and collected in the bibliography at the end.
Two or more application programs that interoperate successfully on common data must be based, explicitly or implicitly, on some agreement about the meaning of that data. Internally, those applications may use very different syntax, and some of their processing may depend on information that is not described in the common agreements. For example, a personnel database and a medical database may share information about the names, addresses, and Social Security numbers of many of the same people. But the business-related details in the personnel DB and the case histories in the medical DB would not be shared. In general, the conditions for successful interoperability are explained in the following terms:
In short, interoperability requires precise documentation of the syntax, semantics, and pragmatics of all the interactions among interoperable systems. Formal ontologies are metadata about the things, events, properties, people, and information involved in the design, implementation, and use of those systems.
As an example of typical documentation, consider the following paragraph, written in a mixture of English and computer jargon. This level of detail can be used by programmers who are familiar with the subject domain. But the level of precision is not adequate for automated processing by computer:
This file contains the billing transaction codes (types of records) that are to be interfaced to General Ledger for the given month. For this process the following transaction codes are used: 32 - loss on unbilled, 72 - gain on uncollected, and 85 - loss on uncollected. Any of these records that are actually taxes are bypassed. Only client types 01 - Mar, 05 - Internal Non/Billable, 06 - Internal Billable, and 08 - BAS are selected. This is determined by a GETBDATA call to the client file. The unit that the gain or loss is assigned to is supplied at the time of its creation in EBT.

This paragraph illustrates the massive amount of detail that must be defined for precise interoperability. Just a single phrase, such as 32 - loss on unbilled, illustrates the complexity. The number 32 is a highly specific code used in a narrow range of applications for a single aspect of one business enterprise. The definition of loss on unbilled presupposes and mixes information about money, bookkeeping, profits and losses, and computer processing. Much of that information is common to all businesses, but different companies and even different departments of the same company are likely to organize and express similar information in different ways.
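To make the point concrete, here is a minimal Python sketch of what a programmer might write from the quoted paragraph. Everything about the record layout is an assumption: the class BillingRecord, its fields trans_code, client_type, and is_tax, and the function select_for_general_ledger are hypothetical, since the paragraph defines none of them. Even this small filter must settle questions the English leaves open, such as how a record is recognized as a tax, and it captures only which codes are selected, not what loss on unbilled means.

```python
from dataclasses import dataclass

# Codes copied from the quoted paragraph; their meanings stay informal labels.
TRANSACTION_CODES = {
    "32": "loss on unbilled",
    "72": "gain on uncollected",
    "85": "loss on uncollected",
}

SELECTED_CLIENT_TYPES = {
    "01": "Mar",
    "05": "Internal Non/Billable",
    "06": "Internal Billable",
    "08": "BAS",
}


@dataclass(frozen=True)
class BillingRecord:
    """Hypothetical record layout; the original documentation gives none."""
    trans_code: str
    client_type: str  # in the original, obtained by a GETBDATA call to the client file
    is_tax: bool      # assumption: the paragraph does not say how taxes are recognized


def select_for_general_ledger(records):
    """Keep records with a listed transaction code and client type that are not taxes."""
    return [
        r for r in records
        if r.trans_code in TRANSACTION_CODES
        and not r.is_tax
        and r.client_type in SELECTED_CLIENT_TYPES
    ]


if __name__ == "__main__":
    sample = [
        BillingRecord("32", "01", False),  # selected
        BillingRecord("72", "05", True),   # bypassed: a tax record
        BillingRecord("85", "03", False),  # rejected: client type not listed
        BillingRecord("40", "06", False),  # rejected: transaction code not listed
    ]
    for r in select_for_general_ledger(sample):
        print(r.trans_code, TRANSACTION_CODES[r.trans_code])
```

Every hypothetical name above is a decision that another department could make differently, which is exactly the interoperability problem the paragraph describes.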
Defining formal ontologies for all the kinds of information used by a single business enterprise is an enormous undertaking. Generalizing those ontologies for an entire industry with all the competing and supporting companies, suppliers, and customers is far more difficult. Defining universal ontologies to support all science, engineering, business, medicine, politics, law, and the arts will not be achieved for a long time, if ever.
Some intended solutions, unfortunately, can themselves become obstacles to finding better solutions. In an article on information integration, Firat, Madnick, and Grosof (2002) made the following observation:
The likelihood of a single international accounting standard coming to dominate anytime soon is quite slim. This is further complicated by the complexities and localities involved in the accounting practices of different countries (e.g. the UK views the proposed standards as actually reducing the quality of their corporate reporting).
A more reasonable goal is to define an open-ended framework that can accommodate any ontologies at any level of generalization or specialization and display whatever relationships among them that anyone has been able to discover. Such a framework will not solve all the problems, but it can help to organize and relate any solutions that have been found. Most importantly, the framework itself must not become an obstacle to further innovation and integration.
For a summary of some of the issues, see Fads and fallacies about logic (Sowa 2007).
For any given logic, the set of all possible theories expressible in that logic forms a lattice. The ordering of theories is defined by specialization and generalization: adding axioms to a theory creates a more specialized theory; deleting axioms creates a more generalized theory.
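A minimal sketch of the ordering, assuming propositional logic over two variables p and q, with each axiom written as a Python predicate on truth assignments. Comparing axiom lists directly would be misleading, because different axiom sets can denote the same theory; this sketch therefore compares sets of models, where a more specialized theory has fewer models. The names models and is_specialization are illustrative.

```python
from itertools import product

VARIABLES = ("p", "q")


def models(axioms):
    """The set of truth assignments (tuples of booleans, one per variable)
    that satisfy every axiom of the theory."""
    result = set()
    for values in product((False, True), repeat=len(VARIABLES)):
        valuation = dict(zip(VARIABLES, values))
        if all(axiom(valuation) for axiom in axioms):
            result.add(values)
    return frozenset(result)


def is_specialization(special, general):
    """True if `special` lies at or below `general` in the lattice: every model
    of `special` is a model of `general`, so `special` entails all of `general`."""
    return models(special) <= models(general)


def p(v):
    return v["p"]


def q(v):
    return v["q"]


def p_and_q(v):
    return v["p"] and v["q"]


T1 = [p]        # one axiom
T2 = [p, q]     # T1 plus an axiom: more specialized
T3 = [p_and_q]  # different axioms, same node of the lattice as T2

if __name__ == "__main__":
    print(is_specialization(T2, T1))  # True
    print(is_specialization(T1, T2))  # False
    print(models(T2) == models(T3))   # True
```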
The most general theory at the top of the lattice has no axioms. It contains all tautologies, which are true of everything and say nothing about anything. Examples include (p or not p), (p implies p), and ((p and q) implies p).
The bottom of the lattice is the absurd theory, which contains every statement expressible in the logic, including all contradictions. Adding a contradiction to any consistent theory causes it to degenerate to the absurd theory.
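The top and bottom of the lattice can be checked in the same hedged propositional setting (two variables, axioms as Python predicates; all names are illustrative). A sentence is entailed when it is true in every model of the theory, so the empty theory entails exactly the tautologies, and a theory with no models entails everything.

```python
from itertools import product

VARIABLES = ("p", "q")


def valuations():
    """Every assignment of True/False to the variables."""
    for values in product((False, True), repeat=len(VARIABLES)):
        yield dict(zip(VARIABLES, values))


def entails(axioms, sentence):
    """True if the sentence holds in every model of the axioms."""
    return all(sentence(v) for v in valuations() if all(a(v) for a in axioms))


def implies(a, b):
    return (not a) or b


TAUTOLOGIES = [
    lambda v: v["p"] or not v["p"],                # p or not p
    lambda v: implies(v["p"], v["p"]),             # p implies p
    lambda v: implies(v["p"] and v["q"], v["p"]),  # (p and q) implies p
]

TOP = []                                                 # no axioms
CONSISTENT = [lambda v: v["p"]]                          # a consistent theory: p
ABSURD = CONSISTENT + [lambda v: v["q"] and not v["q"]]  # add a contradiction

if __name__ == "__main__":
    print(all(entails(TOP, t) for t in TAUTOLOGIES))  # True
    print(entails(TOP, lambda v: v["q"]))             # False: q is not a tautology
    print(entails(CONSISTENT, lambda v: not v["p"]))  # False
    print(entails(ABSURD, lambda v: not v["p"]))      # True: no models remain
```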
Deduction can never introduce anything new that is not already implicit in a given theory. In terms of the lattice, deduction stays within the bounds of a single theory. Methods of theory revision move from one theory to another by deleting, adding, or modifying axioms. Figure 1 shows the four basic operators for navigating the lattice: contraction, expansion, revision, and relabeling. The operators of contraction and expansion follow the arcs of the lattice, revision makes short hops sideways, and relabeling makes long-distance jumps.
Figure 1: Four operators for navigating the lattice of theories
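The four operators can be sketched on a deliberately simplified representation, assuming a theory is just a finite set of axioms and each axiom is a tuple of symbols. This is a syntactic approximation: it ignores that different axiom sets can be logically equivalent, and it does not check consistency. The function names contract, expand, revise, and relabel are hypothetical. Revision is modeled here as a contraction followed by an expansion, and relabeling as a systematic renaming of symbols that preserves the structure of every axiom.

```python
from typing import FrozenSet, Iterable, Mapping, Tuple

# Illustrative representation: an axiom is a tuple of symbols, such as
# ("revolves_around", "earth", "sun"), and a theory is a frozenset of axioms.
Axiom = Tuple[str, ...]
Theory = FrozenSet[Axiom]


def contract(theory: Theory, removed: Iterable[Axiom]) -> Theory:
    """Contraction: delete axioms, moving up to a more general theory."""
    return theory - frozenset(removed)


def expand(theory: Theory, added: Iterable[Axiom]) -> Theory:
    """Expansion: add axioms, moving down to a more specialized theory."""
    return theory | frozenset(added)


def revise(theory: Theory, removed: Iterable[Axiom], added: Iterable[Axiom]) -> Theory:
    """Revision: a contraction followed by an expansion (a sideways hop)."""
    return expand(contract(theory, removed), added)


def relabel(theory: Theory, renaming: Mapping[str, str]) -> Theory:
    """Relabeling: rename symbols systematically; the structure of each axiom is kept."""
    return frozenset(
        tuple(renaming.get(symbol, symbol) for symbol in axiom) for axiom in theory
    )


if __name__ == "__main__":
    orbit = frozenset({("revolves_around", "earth", "sun")})
    print(sorted(relabel(orbit, {"earth": "electron", "sun": "nucleus"})))
    # [('revolves_around', 'electron', 'nucleus')]
```

Relabeling changes only names, so the result has the same structure as the original while possibly sharing no axioms with it.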
To illustrate the moves through the lattice, suppose that A is Newton’s theory of gravitation applied to the earth revolving around the sun and F is Niels Bohr’s theory about an electron revolving around the nucleus of a hydrogen atom. The path from A to F is a step-by-step transformation of the old theory to the new one. The revision step from A to C replaces the gravitational attraction between the earth and the sun with the electrical attraction between the electron and the proton. That step can be carried out in two intermediate steps: a contraction that deletes the axioms for gravitational attraction, moving up the lattice to a more general theory, followed by an expansion that adds axioms for electrical attraction, moving down to C.
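Under the same simplified representation (a theory as a set of axiom tuples; every symbol below is illustrative, not a formalization of Newton’s theory), the revision from A to C can be sketched as a contraction that deletes the gravitational axioms, then an expansion that adds electrical ones. The intermediate theory is a common generalization of both A and C, and neither A nor C is a specialization of the other, which is what makes the revision a sideways hop.

```python
# Hypothetical axioms for theory A, one tuple per axiom. The symbols are
# illustrative labels, not a formalization of Newtonian mechanics.
A = frozenset({
    ("revolves_around", "earth", "sun"),
    ("attracts", "sun", "earth", "gravitational_force"),
    ("inverse_square", "gravitational_force"),
})

# Contraction: delete every axiom that mentions the gravitational force.
# This moves up the lattice to a more general theory.
gravitational = frozenset(axiom for axiom in A if "gravitational_force" in axiom)
intermediate = A - gravitational

# Expansion: add axioms for the electrical force, moving down to C.
electrical = frozenset({
    ("attracts", "sun", "earth", "electrical_force"),
    ("inverse_square", "electrical_force"),
})
C = intermediate | electrical

if __name__ == "__main__":
    # The intermediate theory is a proper subset of both A and C.
    print(intermediate < A, intermediate < C)  # True True
    # A and C are incomparable: the revision is a sideways hop.
    print(A <= C, C <= A)                      # False False
```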
The lattice of theories can be viewed as the theoretical background for a wide range of developments in AI, ranging from informal heuristics to the most sophisticated methods of learning and reasoning. In fact, any method of default or nonmonotonic reasoning can be interpreted as a strategy for walking and jumping through the infinite lattice in search of a suitable theory.
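As one illustration of such a strategy, the sketch below walks down the lattice from a base theory by adding default axioms one at a time, keeping each default only if the expanded theory still has a model. It is loosely in the spirit of default logic but is not an implementation of any particular formalism; the propositional encoding, the bird and penguin example, and the function walk_down are assumptions for illustration. Adding the fact that Tweety is a penguin leads the walk to a different theory, so a conclusion drawn earlier is withdrawn, which is the nonmonotonic behavior.

```python
from itertools import product

VARIABLES = ("bird", "penguin", "flies")


def models(theory):
    """All truth assignments that satisfy every axiom in the theory."""
    result = []
    for values in product((False, True), repeat=len(VARIABLES)):
        v = dict(zip(VARIABLES, values))
        if all(axiom(v) for axiom in theory):
            result.append(v)
    return result


def entails(theory, sentence):
    return all(sentence(v) for v in models(theory))


def walk_down(base, defaults):
    """Greedily expand the base theory with each default, in order,
    skipping any default that would make the theory inconsistent."""
    theory = list(base)
    for default in defaults:
        candidate = theory + [default]
        if models(candidate):  # at least one model: still consistent
            theory = candidate
    return theory


# Strict axioms (hypothetical encoding): penguins are birds; penguins do not fly.
STRICT = [
    lambda v: not v["penguin"] or v["bird"],
    lambda v: not v["penguin"] or not v["flies"],
]
# Default axiom: birds fly.
DEFAULTS = [lambda v: not v["bird"] or v["flies"]]


def flies(v):
    return v["flies"]


def is_bird(v):
    return v["bird"]


def is_penguin(v):
    return v["penguin"]


tweety_as_bird = walk_down(STRICT + [is_bird], DEFAULTS)
tweety_as_penguin = walk_down(STRICT + [is_bird, is_penguin], DEFAULTS)

if __name__ == "__main__":
    print(entails(tweety_as_bird, flies))     # True
    print(entails(tweety_as_penguin, flies))  # False
```

Because the walk is greedy, the theory it reaches can depend on the order of the defaults; other strategies would search the lattice differently.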
The set of all theories that have been tested and certified to be consistent and used successfully in one or more applications is a finite subset of the lattice of all possible theories. It is called the hierarchy of certified theories.
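A certified hierarchy can be sketched as a finite registry that admits a theory only after a consistency check and then reports which registered theories lie above a given one. The class CertifiedHierarchy and its methods are hypothetical; consistency is checked here by finding a model in a three-variable propositional setting, and the testing and successful use in applications that certification also requires are outside the sketch.

```python
from itertools import product

VARIABLES = ("p", "q", "r")


def models(axioms):
    """Set of truth-value tuples that satisfy every axiom."""
    return frozenset(
        values
        for values in product((False, True), repeat=len(VARIABLES))
        if all(axiom(dict(zip(VARIABLES, values))) for axiom in axioms)
    )


class CertifiedHierarchy:
    """Hypothetical finite registry of theories that passed a consistency check."""

    def __init__(self):
        self.theories = {}  # name -> frozenset of models

    def certify(self, name, axioms):
        """Admit a theory only if it has at least one model."""
        theory_models = models(axioms)
        if not theory_models:
            raise ValueError(f"{name} is inconsistent and cannot be certified")
        self.theories[name] = theory_models

    def generalizations(self, name):
        """Certified theories strictly more general than `name` (proper superset of models)."""
        own = self.theories[name]
        return sorted(other for other, ms in self.theories.items() if own < ms)


if __name__ == "__main__":
    h = CertifiedHierarchy()
    h.certify("T_p", [lambda v: v["p"]])
    h.certify("T_pq", [lambda v: v["p"], lambda v: v["q"]])
    h.certify("T_r", [lambda v: v["r"]])
    print(h.generalizations("T_pq"))  # ['T_p']
    try:
        h.certify("T_bad", [lambda v: v["p"], lambda v: not v["p"]])
    except ValueError as err:
        print(err)  # T_bad is inconsistent and cannot be certified
```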
Cassidy, Patrick (2008) Toward an open-source foundation ontology representing the Longman’s defining vocabulary: The COSMO Ontology OWL version, in Proc. Third International Ontology for the Intelligence Community Conference, CEUR Workshop Proceedings, vol. 440, Fairfax, VA. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-440/paper11.pdf
Cassidy, Patrick (2009) The foundation ontology as a basis for semantic interoperability, http://www.micra.com/COSMO/TheFoundationOntologyForInteroperability.ppt
Fellbaum, Christiane, ed. (1998) WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA.
Firat, Aykut, Stuart Madnick, & Benjamin Grosof (2002) Financial information integration in the presence of equational ontological conflicts, Proc. Workshop on Information Technology and Systems (WITS). http://www.mit.edu/~bgrosof/paps/wits02.pdf
Grüninger, Michael (2009) COLORE: Common Logic Ontology Repository, Semantic Technologies Lab, University of Toronto, http://ontolog.cim3.net/file/work/OOR-Ontolog-Panel/2009-08-06_Ontology-Repository-Research-Issues/Colore--MichaelGruninger_20090806.pdf
Grüninger, Michael (2010) An update on COLORE and OOR, Semantic Technologies Lab, University of Toronto, http://ontolog.cim3.net/file/work/OpenOntologyRepository/2010-02-19_OOR-Developers-Panel/COLORE--MichaelGruninger_20100219.pdf
ISO/IEC (2007) Common Logic (CL): A Framework for a Family of Logic-Based Languages, IS 24707, International Organization for Standardization, Geneva.
Lenat, Douglas B. (1995) Cyc: A large-scale investment in knowledge infrastructure, Communications of the ACM 38:11, 33-38.
Lenat, D. B., & R. V. Guha (1990) Building Large Knowledge-Based Systems, Addison-Wesley, Reading, MA.
Miller, George A. (1995) WordNet: A lexical database for English, Communications of the ACM 38:11, 39-41.
Rössler, Otto E. (1998) Endophysics: The World as an Interface, World Scientific Publishing Co., Singapore.
Sowa, John F. (2000) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Publishing Co., Pacific Grove, CA.
Sowa, John F. (2005) The challenge of knowledge soup, in J. Ramadas & S. Chunawala, Research Trends in Science, Technology, and Mathematics Education, Homi Bhabha Centre, Mumbai, pp. 55-90.
Sowa, John F. (2006) A dynamic theory of ontology, in B. Bennett & C. Fellbaum, eds., Formal Ontology in Information Systems, IOS Press, Amsterdam, pp. 204-213.
Sowa, John F. (2007) Fads and fallacies about logic, IEEE Intelligent Systems, 22:2, pp. 84-87.
Sowa, John F. (2008) Conceptual graphs, in F. van Harmelen, V. Lifschitz, and B. Porter, eds., Handbook of Knowledge Representation, Elsevier, Amsterdam, pp. 213-237.
Tsuda, Ichiro, & Takeshi Ikegami (2002) Review of Endophysics: The World as Interface, Discrete Dynamics in Nature and Society 7:3, 213-214.
West, Matthew (2009) Ontology Meets Business, in A. Tolk & L. C. Jain, eds., Complex Systems in Knowledge-based Environments: Theory, Models and Applications, Springer, Berlin, pp. 229-260.
West, Matthew, & Julian Fowler (2001) The IIDEAS architecture and integration methodology for integrating enterprises, PDT Days.
West, Matthew, Jan Sullivan, & Hans Teijgeler (2003) ISO/FDIS 15926-2: Lifecycle integration of process plant data including oil and gas production facilities, ISO TC184/SC4/WG3N1328.
Yokoi, Toshio (1995) The EDR electronic dictionary, Communications of the ACM 38:11, 42-44.