Open Forum 2003
on Metadata Registries
John F. Sowa
VivoMind LLC
Copyright ©2003, John F. Sowa
What to Standardize
- Terminology: Character strings that refer to the entities of interest in some domain.
- Ontology: Formal descriptions of the entities that exist in a domain.
- Methodology: Methods for determining what entities exist and how they should be described.
- Framework: APIs for tools and techniques that support the terminology, ontology, and methodology.
Common Logic
- Standardized by a higher body than ISO or W3C — God.
- Semantically identical notations for the past 124 years, since Frege (1879) and Peirce (1880, 1885).
- Very large overlap (up to 100%) with the alphabet soup: CGIF, CycL, IDEF1X, KIF, OCL, OWL, RDF, SQL, UML, Z.
- Proposed New Work Item for ISO.
- Presentations tomorrow by Chris Menzel and Pat Hayes.
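The claim that different notations can share one semantics can be illustrated with a minimal sketch. It assumes a tiny abstract syntax (the hypothetical `Exists` class) and renders the same formula, "a cat is on a mat", as a KIF-style and a CGIF-style string. Both renderers are simplified illustrations covering only this fragment, not implementations of either language or of the proposed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exists:
    """Hypothetical abstract syntax: typed existential variables over relations."""
    variables: tuple   # ((name, type), ...)
    relations: tuple   # ((relation, arg, arg, ...), ...)


def to_kif(f):
    """Render the formula as a KIF-style string (simplified)."""
    decls = " ".join(f"(?{v} {t})" for v, t in f.variables)
    atoms = [f"({r} " + " ".join(f"?{a}" for a in args) + ")"
             for r, *args in f.relations]
    body = atoms[0] if len(atoms) == 1 else "(and " + " ".join(atoms) + ")"
    return f"(exists ({decls}) {body})"


def to_cgif(f):
    """Render the same formula as a CGIF-style string (simplified)."""
    concepts = " ".join(f"[{t}: *{v}]" for v, t in f.variables)
    rels = " ".join(f"({r} " + " ".join(f"?{a}" for a in args) + ")"
                    for r, *args in f.relations)
    return f"{concepts} {rels}"


# One abstract formula, two concrete notations.
cat_on_mat = Exists(variables=(("x", "Cat"), ("y", "Mat")),
                    relations=(("On", "x", "y"),))
print(to_kif(cat_on_mat))
print(to_cgif(cat_on_mat))
```

Because both strings are generated from one abstract structure, any difference between them is purely syntactic.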
Ontology Projects
Shifting groups of people with a large common core, who have generated many interesting ideas:
- 1991—199?: Shared Reusable Knowledge Base (SRKB) project.
- 1996—1997: X3H2 ontology working group.
  Offshoot from the Conceptual Schema Modeling Facility (CSMF) Project, which survived in many versions from 1978 to 1998.
- 1998: Ontology workshop in Heidelberg hosted by the Klaus Tschira Foundation.
- 2000—200?: IEEE Standard Upper Ontology Project.
Observation: Don't hold your breath waiting for a standard.
Three Large Ontologies
- Cyc: 100,000 concept types with over a million axioms.
- Electronic Dictionary Research (EDR): 400,000 concept types with mappings to English and Japanese words.
- WordNet: 170,000 English word senses (and related projects for other languages).
A lot of time, effort, and money required — 600 person-years for Cyc since 1984 and many billions of yen for EDR.
Three SUO Projects
- SUMO: Large hierarchy of concept types with formal definitions stated in KIF.
- OpenCyc: Free, open-source subset of Cyc.
  - Even larger hierarchy of concepts than SUMO.
  - Formal definitions stated in CycL.
  - Software for developing and reasoning about the definitions.
- IFF: Framework based on category theory.
  - Defines formal mappings between theories.
  - Very mathematical and possibly very powerful, when and if completed.
  - Could be used to relate SUMO, OpenCyc, and many other ontologies to one another.
Uncertain whether any of these will become an IEEE standard.
Precision and Vagueness
Precision is sometimes bad:
- Essential for computability and logical deduction.
- But highly inflexible: an advantage in some cases, but a disadvantage in many other cases.
- A computer program is never vague. But what it does so precisely may have no relationship to what was intended.
Vagueness is sometimes good:
- Inevitable starting point for planning, design, research, and any kind of sincere negotiation.
- Observation by C. S. Peirce: "It is easy to be certain. One has only to be sufficiently vague."
- The engineers' dilemma: Customers never know what they want until they see what they get.
- In diplomacy, too much precision at the beginning leads to war. Vagueness is necessary at the beginning, but compromises must be codified in precise treaties.
Cyc and WordNet
For natural language processing, Cyc is too brittle, and WordNet is more flexible.
- Cyc has 100,000 precisely defined concept types that are intended to support logical deduction.
- WordNet has 170,000 word senses, but the definitions are not precise enough for deduction.
- Cyc allows new concepts to be added, but they must be precisely defined.
- WordNet emphasizes contextual relationships between word senses, which are more flexible.
Questions
- Can we have a single system that can support both language and logic?
- Aligning the Cyc concept types to the WordNet synsets (senses) just makes WordNet as brittle as Cyc. What else is possible?
- Can computers negotiate meaning to move from vagueness to precision? How?
- What kind of system would support such negotiation?
- Could the same system support the kinds of applications that the current Cyc and WordNet can handle?
Legacy Re-engineering
- An application that requires both language and logic.
- Comparing English documentation to programming implementation:
  - 100 megabytes of English reports, notes, comments, etc.
  - 1.5 million lines of COBOL code.
  - Hundreds of JCL scripts (IBM Job Control Language).
- Some programs in daily use are up to 40 years old.
- A major consulting firm estimated 80 person-years to analyze and compare all the programs and documentation (40 people for 2 years).
Using Automated Tools
- Two programmers, Majumdar and Leclerc, completed the job in 8 weeks.
- Using the same system to extract and translate information from English, COBOL, and JCL to conceptual graphs.
- Analogy finder compared CGs from all 3 sources.
- Ran for 504 hours (3×7×24) on a 750 MHz Pentium III.
- Generated one CD-ROM with results of the analysis:
  - Glossary with definitions of all terms in English.
  - Data dictionary suitable for use in modern DBMS.
  - Specifications for generating UML diagrams.
- 250 to 1 productivity increase (16 person-weeks vs. 80 person-years).
- For more info: http://www.jfsowa.com/pubs/tosi.htm
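The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes 50 working weeks per person-year, which the talk does not state; that assumption reproduces the 250 to 1 ratio, while 52 weeks per year would give 260 to 1.

```python
# Assumption (not stated in the talk): one person-year = 50 working weeks.
WORKING_WEEKS_PER_YEAR = 50

# Consulting firm's estimate: 40 people for 2 years = 80 person-years.
estimated_person_weeks = 40 * 2 * WORKING_WEEKS_PER_YEAR

# Actual effort: 2 programmers for 8 weeks = 16 person-weeks.
actual_person_weeks = 2 * 8

ratio = estimated_person_weeks / actual_person_weeks

# Machine time for the analogy finder: 3 weeks of 7 days of 24 hours.
machine_hours = 3 * 7 * 24

print(f"productivity ratio: {ratio:.0f} to 1")
print(f"machine time: {machine_hours} hours")
```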
A Consensus Dictionary
A proposal:
- A worldwide collaborative effort of academic and industrial R & D centers.
- To take advantage of available resources, such as WordNet, OpenCyc, SUMO, Ωmega, and many others.
- With a central core (the consensus) that represents the commonly accepted wisdom.
- And with open-ended research contributions that may be as controversial, specialized, or exotic as any researcher might suggest.
Supporting Tools
- Multiple cross-indexing schemes:
  - A Cyc-like organization by contexts or microtheories.
  - A WordNet-like organization by synsets.
  - Any other organizational methods, such as IFF, that anyone might develop.
- Many applications might choose to use only the consensus core.
- A researcher might want to see everything that anyone has ever said about a particular word.
- An editorial board would decide which research contributions should go into the consensus.
- But anyone could extract or develop an index to a different version of the consensus core with some selection of the research.
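One way to picture these tools is a single pool of entries with several indexes built over it on demand. The following is a minimal sketch under stated assumptions: the `Entry` fields, the `Dictionary` class, the sense identifiers, and the contributor name `LabX` are all hypothetical and are not part of any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    gloss: str
    synset: str      # WordNet-like grouping key (hypothetical identifier)
    context: str     # Cyc-like context or microtheory name (hypothetical)
    source: str      # who contributed the entry
    consensus: bool  # True once the editorial board has accepted it


class Dictionary:
    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def index(self, key, include=lambda e: e.consensus):
        """Build an index on any Entry field; defaults to the consensus core."""
        idx = defaultdict(list)
        for e in self.entries:
            if include(e):
                idx[getattr(e, key)].append(e)
        return dict(idx)

    def everything_about(self, word):
        """Return every entry for a word, consensus or not."""
        return [e for e in self.entries if e.word == word]


d = Dictionary()
d.add(Entry("bank", "financial institution", "bank.n.1", "Finance", "WordNet", True))
d.add(Entry("bank", "sloping land beside a river", "bank.n.2", "Geography", "WordNet", True))
d.add(Entry("bank", "speculative sense proposed by one lab", "bank.n.3", "Finance", "LabX", False))

core_by_synset = d.index("synset")  # consensus core, WordNet-like view
custom_by_context = d.index(        # core plus one lab's research, Cyc-like view
    "context", include=lambda e: e.consensus or e.source == "LabX")
```

The `include` predicate is the design point: the editorial board's decision is just the default filter, and anyone can substitute a different selection without touching the underlying entries.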
Project Organization
- A bazaar rather than a cathedral.
- External APIs defined in terms of XML.
- Semantics defined by the Common Logic (CL) standard.
- Internally, any notation based on the CL semantics can be used.
- Freedom for anybody to add anything they please to the research extensions.
- Editorial board controls only what goes into the consensus core.
- Version 0.1 within 6 months (WordNet + some extensions translated to version 0.1 of the XML interfaces).
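As an illustration of an XML-based external interface, the sketch below serializes one dictionary entry with Python's standard `xml.etree.ElementTree` module and reads it back. The element and attribute names (`entry`, `gloss`, `axiom`, `status`, `notation`) are assumptions made for this example; the talk does not define the version 0.1 interfaces.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(word, gloss, status, axiom=None):
    """Serialize one entry; all element and attribute names are hypothetical."""
    entry = ET.Element("entry", {"word": word, "status": status})
    ET.SubElement(entry, "gloss").text = gloss
    if axiom is not None:
        # The axiom is carried as text in some CL-based notation.
        ET.SubElement(entry, "axiom", {"notation": "CGIF"}).text = axiom
    return ET.tostring(entry, encoding="unicode")


def xml_to_entry(text):
    """Parse the XML produced by entry_to_xml back into a plain dict."""
    entry = ET.fromstring(text)
    axiom = entry.find("axiom")
    return {
        "word": entry.get("word"),
        "status": entry.get("status"),
        "gloss": entry.findtext("gloss"),
        "axiom": None if axiom is None else axiom.text,
    }


xml_text = entry_to_xml("cat", "a small domesticated feline", "consensus",
                        axiom="[Cat: *x] [Mat: *y] (On ?x ?y)")
print(xml_text)
print(xml_to_entry(xml_text))
```

Keeping the axiom as opaque text tagged with its notation matches the idea that any notation based on the CL semantics can be used internally, while the XML layer only handles exchange.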
This talk: http://www.jfsowa.com/talks/santafe.htm