Copyright ©2003, John F. Sowa
Implications of a Fast Analogy Engine
For Ontology Design and Use
John F. Sowa
VivoMind LLC
Mitre Technical Exchange Meeting, 12 June 2003
Cyc Review
Two-day DARPA-sponsored review of Cyc on June 10 and 11, attended by about two dozen AI experts.
Consensus:
- Cyc is a unique and valuable resource:
  Since 1984, 650 person-years and $65 million have been spent to define and axiomatize about 600,000 concept types.
- Support for Cyc should be continued.
- Cyc should be freely available for research purposes.
- Many questions remain about the relationship of Cyc to other R&D efforts.
Lexical Resources
Developers of WordNet (George Miller) and FrameNet (Chuck Fillmore) were also present.
Consensus:
- Lexical resources are complementary to Cyc.
- Extremely valuable for natural language projects.
- Desirable to integrate contributions from various sources.
- Integration would require relatively modest funding.
- Word senses (synsets) can be linked to the concept types of Cyc and other axiomatized ontologies.
- Further questions to be explored.
Common Logic (CL)
Abstract syntax and model theory for logic-based languages.
Currently supported: KIF, conceptual graphs, and OWL.
Other notations can be supported: Z, UML Object Constraint Language, and traditional predicate calculus.
Lenat agreed:
- CycL is very close in expressive power to CL.
- Defining CycL in terms of CL abstract syntax is important for knowledge interchange.
- Doing so would give CycL a model-theoretic semantics.
Feigenbaum's Question #1
Ed Feigenbaum asked a question:
- Lenat had claimed that when the KB reached a critical size, new knowledge could be added much faster.
- Recently, the size of the KB has increased significantly.
- Has Cyc now reached a critical mass that would support an exponential increase in size?
Fritz Lehmann's response: The major reason for the recent increase is a managerial decision by Lenat.
Feigenbaum's Question #2
Feigenbaum asked another question:
- In 1961, I. J. Good made a prediction:
  "It is more probable than not that, within the twentieth century, an ultraintelligent machine will be built and that it will be the last invention that man need make."
- Why hasn't Good's prediction come to pass?
- Is there some missing ingredient that the AI community hasn't discovered?
- What is it? Could it be added to Cyc?
Sowa's Answer
The missing ingredient is analogical reasoning:
- All human thinking is based on analogies.
- All aspects of logic (induction, deduction, and abduction) are highly disciplined special cases of analogy.
- A high-speed analogical reasoner would be more flexible than Cyc.
- And it could call Cyc as a subroutine.
This answer generated more squabbles than consensus.
Intellitex
- NL processor developed by VivoMind LLC.
- Lightweight syntax with heavy semantics.
- Based on conceptual graphs (CGs) — a graphical version of logic.
- Uses analogies as its primary reasoning method.
- VivoMind Analogy Engine (VAE) takes O(N log N) time to find analogies, where N is the number of nodes in the knowledge base.
- Older algorithms take O(N³) time.
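The following back-of-the-envelope sketch shows what the gap between the two bounds means in practice. It counts abstract steps and sets all constant factors to 1. The sizes in the loop are arbitrary illustrations, not measurements of VAE or of any older algorithm.

```python
import math

def steps_n_log_n(n: int) -> float:
    """Abstract step count for an O(N log N) algorithm, constant factor 1."""
    return n * math.log2(n)

def steps_cubic(n: int) -> int:
    """Abstract step count for an O(N^3) algorithm, constant factor 1."""
    return n ** 3

# Illustrative knowledge-base sizes (number of nodes), chosen arbitrarily.
for n in (1_000, 100_000, 10_000_000):
    ratio = steps_cubic(n) / steps_n_log_n(n)
    print(f"N = {n:>10,}: N^3 / (N log2 N) is about {ratio:.1e}")
```

Even at N = 1,000 the cubic bound costs roughly 10⁵ times more steps, and the ratio grows as N² / log N.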
VivoMind Analogy Engine
Three methods of analogy:
- Method #1, matching labels: compare the type labels on conceptual graphs.
- Method #2, matching subgraphs: compare subgraphs independently of their labels.
- Method #3, matching transformations: find transformations that map subgraphs of one graph to subgraphs of the other.
Methods #1 and #2 take O(N log N) time.
Method #3 takes polynomial time (analogies of analogies).
Analogy of Cat to Car
| Cat | Car |
|---|---|
| head | hood |
| eye | headlight |
| cornea | glass plate |
| mouth | fuel cap |
| stomach | fuel tank |
| bowel | combustion chamber |
| anus | exhaust pipe |
| skeleton | chassis |
| heart | engine |
| paw | wheel |
| fur | paint |
VAE used methods #1 and #2.
Source data from WordNet mapped to CGs.
Matching Labels
Corresponding concepts have similar functions:
- Fur and paint are outer coverings.
- Heart and engine are internal parts with a regular beat.
- Skeleton and chassis are structures for attaching parts.
- Paw and wheel support the body, and there are four of each.
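The following minimal sketch shows label matching. It assumes that each concept type has already been mapped to a coarse functional category. The `FUNCTION` table below is hypothetical and hand-coded from this slide; it does not come from WordNet or VAE, and a fuller system might derive such categories from a type hierarchy. The sketch sorts both sides by category and merges the sorted lists, which keeps the cost at O(N log N). This shows the idea only and is not VAE's actual algorithm.

```python
from itertools import groupby

# Hypothetical functional categories, hand-coded from this slide's examples.
FUNCTION = {
    "fur": "outer covering", "paint": "outer covering",
    "heart": "beating internal part", "engine": "beating internal part",
    "skeleton": "attachment structure", "chassis": "attachment structure",
    "paw": "body support", "wheel": "body support",
}

def group_by_function(labels):
    """Sort labels by category (O(N log N)) and group equal categories."""
    keyed = sorted((FUNCTION[lab], lab) for lab in labels if lab in FUNCTION)
    return [(cat, [lab for _, lab in grp])
            for cat, grp in groupby(keyed, key=lambda pair: pair[0])]

def match_labels(source, target):
    """Pair source and target labels that share a functional category.

    After sorting, one linear merge pass aligns the two category lists, so
    labels are only paired within a shared category.
    """
    src, tgt = group_by_function(source), group_by_function(target)
    i = j = 0
    pairs = []
    while i < len(src) and j < len(tgt):
        if src[i][0] == tgt[j][0]:
            pairs.extend((s, t) for s in src[i][1] for t in tgt[j][1])
            i += 1
            j += 1
        elif src[i][0] < tgt[j][0]:
            i += 1
        else:
            j += 1
    return pairs

cat_parts = ["fur", "heart", "skeleton", "paw"]
car_parts = ["paint", "engine", "chassis", "wheel"]
print(match_labels(cat_parts, car_parts))
# [('skeleton', 'chassis'), ('heart', 'engine'), ('paw', 'wheel'), ('fur', 'paint')]
```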
Matching Subgraphs
A pair of isomorphic subgraphs:
- Cat: head → eyes → cornea.
- Car: hood → headlights → glass plate.
Approximate match (missing esophagus and muffler):
- Cat: mouth → stomach → bowel → anus.
- Car: fuel cap → fuel tank → combustion chamber → exhaust pipe.
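One standard way to compare subgraphs without looking at their labels is Weisfeiler-Lehman color refinement. It is swapped in here purely for illustration; these slides do not say that VAE uses it. Every node starts with the same color. Each round recolors a node from its own color and the colors of its neighbors, and nodes in the two graphs that end with the same color are candidate correspondents. The edge lists below are transcribed from this slide. Note that refinement is a heuristic: some non-isomorphic graphs receive identical colorings.

```python
def wl_colors(graphs, rounds=4):
    """Weisfeiler-Lehman color refinement over the disjoint union of graphs.

    graphs maps a graph name to a list of directed (source, target) edges.
    Node labels are never inspected: every node starts with color 0, and each
    round recolors a node by its own color plus the sorted colors of its
    successors and predecessors.
    """
    succ, pred = {}, {}
    for g, edges in graphs.items():
        for s, t in edges:
            succ.setdefault((g, s), []).append((g, t))
            pred.setdefault((g, t), []).append((g, s))
            succ.setdefault((g, t), [])
            pred.setdefault((g, s), [])
    color = {v: 0 for v in succ}
    for _ in range(rounds):
        sig = {v: (color[v],
                   tuple(sorted(color[u] for u in succ[v])),
                   tuple(sorted(color[u] for u in pred[v])))
               for v in color}
        # Renumber signatures so colors stay small integers shared by all graphs.
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: palette[sig[v]] for v in sig}
    return color

def structural_matches(a, b, rounds=4):
    """Map each node of graph a to the nodes of graph b with the same color."""
    color = wl_colors({"a": a, "b": b}, rounds)
    in_b = {}
    for (g, v), c in color.items():
        if g == "b":
            in_b.setdefault(c, []).append(v)
    return {v: sorted(in_b.get(c, [])) for (g, v), c in color.items() if g == "a"}

cat = [("head", "eye"), ("eye", "cornea"),
       ("mouth", "stomach"), ("stomach", "bowel"), ("bowel", "anus")]
car = [("hood", "headlight"), ("headlight", "glass plate"),
       ("fuel cap", "fuel tank"), ("fuel tank", "combustion chamber"),
       ("combustion chamber", "exhaust pipe")]
for part, matches in structural_matches(cat, car).items():
    print(part, "->", matches)
```

Each round is dominated by a sort, so a fixed number of rounds stays within O(N log N) in the size of the graphs.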
Relating Different Representations
Method #3 relates data structures that represent equivalent information.
- A structure described in different ways:
- English description: "A red pyramid A, a green pyramid B, and a yellow pyramid C support a blue block D, which supports an orange pyramid E."
- A relational database would use tables.
- But there are many different options for choosing tables, rows, columns, and column labels.
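To make the point concrete, here are two of the many possible relational encodings of the English description, written as Python lists of rows. Both schemas are hypothetical choices for illustration: an `objects` table plus a `supports` relation, and a single wide `scene` table whose columns have different names. Neither is the schema shown in the original figure. The mapping between the two is written by hand here, whereas the slides describe VAE discovering such mappings automatically.

```python
# Two hypothetical relational encodings of the same blocks-world scene.

# Encoding 1: an entity table plus a separate binary relation.
objects = [  # (id, shape, color)
    ("A", "pyramid", "red"), ("B", "pyramid", "green"),
    ("C", "pyramid", "yellow"), ("D", "block", "blue"),
    ("E", "pyramid", "orange"),
]
supports = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")]  # (lower, upper)

# Encoding 2: one wide table with different column names; each row names
# the object it holds up (None if it holds up nothing).
scene = [  # (name, kind, hue, holds_up)
    ("A", "pyramid", "red", "D"),
    ("B", "pyramid", "green", "D"),
    ("C", "pyramid", "yellow", "D"),
    ("D", "block", "blue", "E"),
    ("E", "pyramid", "orange", None),
]

def facts_1():
    """Normalize encoding 1 into (entity attributes, support pairs)."""
    return set(objects), set(supports)

def facts_2():
    """Normalize encoding 2 into the same two sets."""
    attrs = {(name, kind, hue) for name, kind, hue, _ in scene}
    pairs = {(name, up) for name, _, _, up in scene if up is not None}
    return attrs, pairs

print(facts_1() == facts_2())  # True
```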
Representation in a Relational DB
CG Derived from Relational DB
CG Derived from English
"A red pyramid A, a green pyramid B, and a yellow pyramid C support a blue block D, which supports an orange pyramid E."
The Two CGs Look Very Different
- CG from RDB has 15 concept nodes and 8 relation nodes.
- CG from English has 12 concept nodes and 11 relation nodes.
- No label on any node in the first graph is identical to any label on any node in the second graph.
- But there are some structural similarities.
- VAE uses method #3 to find them.
Transformations Found by VAE
The top transformation was applied to 5 subgraphs; the bottom one was applied to 3 subgraphs.
One application could be due to chance, but 3 or 5 applications provide strong evidence for the mapping.
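The evidence rule above can be sketched as a simple count. The rule is: a candidate transformation that maps several distinct subgraphs is trusted, and one that matches only once is treated as possible chance. The threshold of 3 follows this slide's wording. The transformation and subgraph ids are hypothetical. VAE's actual scoring is not described here, so this is only an illustration of the counting idea.

```python
from collections import Counter

def classify_transformations(applications, threshold=3):
    """Split candidate transformations into strong and weak evidence.

    applications: iterable of (transformation_id, subgraph_id) pairs, one per
    subgraph that a transformation maps successfully. Duplicate pairs are
    counted once. A transformation is "strong" if it maps at least
    `threshold` distinct subgraphs.
    """
    counts = Counter(t for t, _ in set(applications))
    strong = {t for t, n in counts.items() if n >= threshold}
    weak = set(counts) - strong
    return strong, weak

# Hypothetical ids mirroring the slide: T1 maps 5 subgraphs, T2 maps 3,
# and T3 matched only once (possibly by chance).
apps = ([("T1", f"s{i}") for i in range(5)]
        + [("T2", f"s{i}") for i in range(3)]
        + [("T3", "s0")])
strong, weak = classify_transformations(apps)
print(sorted(strong), sorted(weak))  # ['T1', 'T2'] ['T3']
```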
Legacy Re-engineering
An earlier version of Intellitex was applied to three languages — English, COBOL, and JCL:
- 1.5 million lines of COBOL.
- Several hundred JCL scripts.
- 100 megabytes of English documentation — text files, e-mails, Lotus Notes, HTML, and transcriptions of oral communications.
Same Parser for Three Languages
- English used canonical graphs derived from WordNet and task-oriented graphs for the application.
- COBOL and JCL did not require the WordNet graphs, but they used the same task-oriented graphs.
- Results represented in conceptual graphs independent of the source language.
- Translated to diagrams with English text:
Glossary, data dictionary, data flow diagrams, process architecture diagrams, system context diagrams.
Results
Two programmers, Arun Majumdar and André LeClerc, finished the job in 8 weeks:
- Four weeks for customization:
- Design and logistics.
- Additional programming for I/O formats.
- Three weeks to run Intellitex + VAE + extensions:
- 24 hours a day on a 750 MHz Pentium III.
- VAE handled matches with strong evidence.
- Matches with weak evidence were confirmed or corrected by Majumdar and LeClerc.
- One week to produce a CD-ROM with integrated views of the results:
Glossary, data dictionary, data flow diagrams, process architecture, system context diagrams.
Supporting Multiple Ontologies
- Cyc supports microtheories, which are subontologies that may be inconsistent with one another.
- Example: microtheories about vampires or Greek mythology.
- Cyc currently has 6,000 microtheories.
- Cyc can create new microtheories dynamically to represent modalities or some agent's knowledge and belief.
- But there is a need for different microtheories even at the upper levels of the ontology.
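The following toy sketch illustrates the microtheory idea: assertions are partitioned into named contexts, consistency is checked only within a context and the contexts it inherits from, and new contexts can be created at run time. It mirrors the concept only. It is not CycL, not Cyc's API, and the microtheory names are hypothetical.

```python
class MicrotheoryStore:
    """Toy store of assertions partitioned into named microtheories.

    A fact is a (predicate, subject) key with a True/False value. When a fact
    is asserted, it is checked only against what its own microtheory can see
    (its facts plus those inherited from parent microtheories), so separate
    microtheories are free to contradict each other.
    """

    def __init__(self):
        self.facts = {}    # microtheory -> {(predicate, subject): bool}
        self.parents = {}  # microtheory -> list of parent microtheories

    def create(self, name, parents=()):
        self.facts[name] = {}
        self.parents[name] = list(parents)

    def visible(self, name):
        """Facts asserted in `name` plus those inherited from its ancestors."""
        merged = {}
        for parent in self.parents[name]:
            merged.update(self.visible(parent))
        merged.update(self.facts[name])
        return merged

    def assert_fact(self, name, predicate, subject, value=True):
        known = self.visible(name).get((predicate, subject))
        if known is not None and known != value:
            raise ValueError(f"{predicate}({subject}) contradicts {name}")
        self.facts[name][(predicate, subject)] = value

    def ask(self, name, predicate, subject):
        """Return True/False if known in `name`, else None."""
        return self.visible(name).get((predicate, subject))

# Hypothetical microtheory names, not actual Cyc constants.
kb = MicrotheoryStore()
kb.create("RealWorldMt")
kb.create("GreekMythologyMt")
kb.assert_fact("RealWorldMt", "isReal", "Zeus", False)
kb.assert_fact("GreekMythologyMt", "isReal", "Zeus", True)  # allowed: separate context
kb.create("IliadMt", parents=["GreekMythologyMt"])          # created on demand
print(kb.ask("IliadMt", "isReal", "Zeus"))  # True (inherited)
```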
Example from Cyc
A sample story used by the Cyc ontologists:
"Jim, a car dealer, saw a tornado approaching the lot where all his cars were located. Shortly thereafter, the tornado swept through the lot and destroyed all his cars."
Implications of the Cyc ontology:
- If a tornado approaches, it is an object.
- If a tornado destroys something, it is an event.
- But objects and events are disjoint.
- Therefore, there must be two distinct entities: TornadoAsObject and TornadoAsEvent.
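The reasoning above can be sketched as a disjointness check. Each verb constrains the category of its subject. When one entity would have to belong to two categories that the ontology declares disjoint, the entity is split into one entity per category. The `DISJOINT` and `SUBJECT_TYPE` tables are a hypothetical fragment written to follow the story; they are not Cyc's actual axioms.

```python
# Hypothetical fragment of an upper ontology, following the slide's story.
DISJOINT = [frozenset({"Object", "Event"})]
# Category that each verb requires of its subject.
SUBJECT_TYPE = {"approaches": "Object", "destroys": "Event"}

def resolve(entity, verbs):
    """Return {entity name: required categories}, splitting the entity when
    its verbs require categories the ontology declares disjoint."""
    required = {SUBJECT_TYPE[v] for v in verbs}
    if any(pair <= required for pair in DISJOINT):
        # One entity may not belong to two disjoint categories, so mint one
        # entity per required category (e.g. TornadoAsObject, TornadoAsEvent).
        return {f"{entity}As{cat}": {cat} for cat in sorted(required)}
    return {entity: required}

print(resolve("Tornado", ["approaches"]))
# {'Tornado': {'Object'}}
print(resolve("Tornado", ["approaches", "destroys"]))
# {'TornadoAsEvent': {'Event'}, 'TornadoAsObject': {'Object'}}
```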
Precision and Vagueness
Precision is sometimes bad:
- Essential for computability and logical deduction.
- But highly inflexible: an advantage in some cases, but a disadvantage in other cases.
- A precise ontology may force undesirable choices.
Vagueness is sometimes good:
- Inevitable starting point for planning, design, research, and any kind of sincere negotiation.
- Some things — such as tornadoes, glaciers, and clouds — may be both objects and events.
Cyc Intermediate Language (I-CycL)
- I-CycL uses the same syntax and logical operators as CycL.
- But I-CycL uses concept types that map directly to the words of natural languages.
- The concept type Tornado, for example, could be used in I-CycL.
- But in the mapping from I-CycL to CycL, the constraints imposed by the Cyc ontology would replace Tornado with either TornadoAsObject or TornadoAsEvent.
Mapping Language to Logic
The I-CycL approach has also been used with other logic-based languages, including conceptual graphs:
- The first stage of mapping language to logic uses labels taken from lexical resources, such as WordNet and FrameNet.
- Usually, there is a one-to-many mapping from the lexical labels to the names of the concept types and relations in the ontology.
- The selection of specific concept types and relations depends on constraints derived from axioms and definitions in the ontology.
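The following sketch shows the two-stage mapping in an I-CycL style. First, a word label maps one-to-many to candidate concept types. Then a constraint from the relation the word participates in filters the candidates. The lexicon, supertype table, and relation signatures are hypothetical stand-ins. In a real ontology the constraints would come from its axioms and definitions, and the labels from resources such as WordNet or FrameNet.

```python
# Hypothetical lexicon: one word label maps to several ontology concept types.
LEXICON = {"tornado": ["TornadoAsObject", "TornadoAsEvent"]}
SUPERTYPE = {"TornadoAsObject": "Object", "TornadoAsEvent": "Event"}
# Hypothetical relation signatures: category required of the agent.
AGENT_TYPE = {"approach": "Object", "destroy": "Event"}

def select_concepts(word, relation):
    """Keep only the candidate concept types for `word` that satisfy the
    agent constraint of `relation` (one-to-many mapping, then filtering)."""
    required = AGENT_TYPE[relation]
    return [c for c in LEXICON[word] if SUPERTYPE[c] == required]

print(select_concepts("tornado", "approach"))  # ['TornadoAsObject']
print(select_concepts("tornado", "destroy"))   # ['TornadoAsEvent']
```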
Conclusions
The analogy engine should be the primary controller:
- Most human activities are guided by analogies.
- People can do precise deduction, but only when they "stop to think."
- VAE has been successfully used as an agent's primary controller and evaluator.
- A system such as Cyc can be used as an assistant for planning.
- But the main use for deduction is to provide more data for the analogy engine.
References
- Paper on analogical reasoning by Sowa and Majumdar: http://www.jfsowa.com/pubs/analog.htm
- Cyc web sites: http://www.cyc.com/
- WordNet web site: http://www.cogsci.princeton.edu/~wn/
- FrameNet web site: http://www.icsi.berkeley.edu/~framenet/
- Paper on Laws, Facts, and Contexts: http://www.jfsowa.com/pubs/laws.htm
- Peirce's tutorial on existential graphs, with commentary by Sowa: http://www.jfsowa.com/peirce/ms514.htm