Automating Ontology Development |
John F. Sowa
Source: http://www.jfsowa.com/pubs/autotalk.htm
Abstract:
Large ontologies are required for two different, but related purposes:
understanding unrestricted natural language text; and merging and
aligning independently developed knowledge bases and databases.
The largest ontologies currently available for these purposes,
Cyc, WordNet, and EDR, have been developed at enormous expense
by organizing and encoding the most critical information by hand.
The high cost and slow pace of the development indicate that handcoding
techniques are obsolete, inflexible, and inappropriate for large-scale
ontology development. Although fully-automated ontology development is
not yet feasible, many automated and semiautomated techniques have been
implemented for performing some of the required subtasks.
This talk surveys some of those techniques and proposes a framework
for integrating them into a system of tools that could support
more efficient and flexible methods of ontology development and
customization.
|
Questions to Consider |
- How can we build large ontologies?
- How can we accommodate existing ontologies, databases, knowledge bases, and web sites?
- How do we relate them to natural languages?
- How can we use automated or semiautomated tools to build them, modify them, extend them, or apply them?
|
Large Hand-Coded Ontologies |
Three large ontologies:
- Cyc: 100,000 concept types with over a million axioms.
- Electronic Dictionary Research (EDR): 400,000 concept types with mappings to English and Japanese words.
- WordNet: 166,000 English word senses with related projects for other languages.
Building these things requires a great deal of time and money.
|
Small Hand-Coded Ontologies |
Can such simple systems coexist peacefully with the grand ontologies?
|
Purpose of This Talk |
- Propose a more flexible structure.
- Show how it can coexist with various kinds of systems, large and small.
- Show how it can support natural language processing.
- Show how it can support detailed deductions for knowledge-based systems.
- Show how automated and semiautomated tools can help the above tasks.
|
2N+2 Hierarchies |
Two hierarchies for each natural language:
- Words. A hierarchy of words and word senses, similar to WordNet and EDR.
- Canonical graphs. A partial ordering of conceptual graphs that express the lexical patterns associated with each word of each language.
Two language-independent hierarchies:
- Types. A lattice of all the concept and relation types that are used in the theories and in the canonical graphs.
- Theories. An open-ended, potentially infinite, lattice of all possible theories, each corresponding to a possible language game.
|
Some Automated Techniques |
A small sample of many techniques — similar, related, or radically different:
- Word hierarchies. Example: MindNet by Richardson et al. (1998), Dolan et al. (2000).
- Canonical graph hierarchies. Example: Ariosto by Basili et al. (1993, 1994, 1996, 1999).
- Type hierarchies. Example: Formal Concept Analysis by Ganter & Wille (1999).
- Lattices of theories. Example: BIAIT by Donald Burnstine (1979).
|
MindNet |
Automated development of semantics for the MS-NLP project
- Processes dictionaries (LDOCE) and corpora (Encarta Encyclopedia).
- Derives word hierarchies automatically (7 hours on a P266).
- Finds relationships by analyzing corpora (34 hours for 500,000 sentences on a P266).
|
Ariosto-Lex and Trevi |
Ongoing project to develop integrated tools for NLP engineering
- Includes modules for lexical acquisition, linguistic processing, and applications.
- Lexical acquisition starts with a generic word hierarchy with generic canonical graphs and uses automated means to specialize them to particular domains.
|
Formal Concept Analysis |
Techniques for analyzing concepts and creating lattices
- Analyze a given set of concepts to determine relevant attributes.
- Present concepts and attributes in a table.
- Construct a lattice of concept types automatically.
- Basis for tools that can be used in semiautomated and collaborative development.
|
Table of Attributes and Categories |
| Concept Types | nonalcoholic | hot | alcoholic | caffeinic | sparkling |
|---|---|---|---|---|---|
| HerbTea | x | x | | | |
| Coffee | x | x | | x | |
| MineralWater | x | | | | x |
| Wine | | | x | | |
| Beer | | | x | | x |
| Cola | x | | | x | x |
| Champagne | | | x | | x |
Table of beverage types and attributes
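The step from the table to a lattice can be sketched in a few lines of Python. This is a brute-force illustration, not the algorithm used by any particular Formal Concept Analysis tool: it closes every subset of concept types and collects the resulting (extent, intent) pairs. The function names are hypothetical, and the placement of each `x` in `CONTEXT` is an assumption wherever the extracted table is ambiguous.

```python
from itertools import combinations

# Formal context read from the beverage table. Which column each "x"
# belongs to is an assumption where the extracted table is ambiguous.
CONTEXT = {
    "HerbTea": {"nonalcoholic", "hot"},
    "Coffee": {"nonalcoholic", "hot", "caffeinic"},
    "MineralWater": {"nonalcoholic", "sparkling"},
    "Wine": {"alcoholic"},
    "Beer": {"alcoholic", "sparkling"},
    "Cola": {"nonalcoholic", "caffeinic", "sparkling"},
    "Champagne": {"alcoholic", "sparkling"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_having(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = list(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((extent, intent))
    return concepts


# Print the concepts from the most general (largest extent) downward.
for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the lattice. The exhaustive subset enumeration is only practical for small tables like this one.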
|
A Lattice of Beverages |
Problem: No distinction between Beer and Champagne.
|
Revised Lattice of Beverages |
Solution: Add attributes madeFromGrapes and madeFromGrain.
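A minimal sketch of the problem and its fix, assuming the attribute assignments shown in `BEVERAGES` (read from the table, with ambiguous cells filled by common sense) and assuming that Wine and Champagne receive madeFromGrapes while Beer receives madeFromGrain. The helper names are hypothetical.

```python
def indistinguishable_pairs(context):
    """Return pairs of concept types that have exactly the same attributes."""
    names = sorted(context)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if context[a] == context[b]
    ]


def add_attribute(context, attribute, concept_types):
    """Return a copy of the context with `attribute` added to the given types."""
    revised = {name: set(attrs) for name, attrs in context.items()}
    for name in concept_types:
        revised[name].add(attribute)
    return revised


# Assumed reading of the beverage table.
BEVERAGES = {
    "HerbTea": {"nonalcoholic", "hot"},
    "Coffee": {"nonalcoholic", "hot", "caffeinic"},
    "MineralWater": {"nonalcoholic", "sparkling"},
    "Wine": {"alcoholic"},
    "Beer": {"alcoholic", "sparkling"},
    "Cola": {"nonalcoholic", "caffeinic", "sparkling"},
    "Champagne": {"alcoholic", "sparkling"},
}

print(indistinguishable_pairs(BEVERAGES))  # [('Beer', 'Champagne')]

# Draw the distinction, then check again.
revised = add_attribute(BEVERAGES, "madeFromGrapes", ["Wine", "Champagne"])
revised = add_attribute(revised, "madeFromGrain", ["Beer"])
print(indistinguishable_pairs(revised))  # []
```

After the two attributes are added, no two concept types share the same attribute set, so each one gets its own node when the lattice is recomputed.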
|
Disagreements Lead to Distinctions |
- Socrates' principle:
Whenever two philosophers, human or machine, disagree, draw a distinction.
- Anyone may discover a conflict.
- Anyone may suggest a distinction.
- Machine recomputes the lattice.
- Repeat until everybody is happy.
|
Collaborative Development |
Semiautomated development:
Collaboration between one machine and one human.
Collaborative development:
- Collaboration among multiple humans and multiple machines distributed across the Internet.
- Anyone may draw attention to an error, conflict, or inconsistency.
- Owner of the data has the right to accept or reject a suggested change.
|
BIAIT |
Business Information Analysis and Integration Technique — based on seven binary distinctions:
- Bill. Does the supplier bill the customer, or does the customer pay by cash?
- Future. Does the supplier deliver the product at some time in the future, or does the customer take the order from stock?
- Profile. Does the supplier keep a profile of the customer, or is every transaction a surprise?
- Negotiate. Is the price negotiated or fixed?
- Rent. Is the product rented or purchased?
- Track. Does the supplier keep track of the product after it is sold or not?
- Make to order. Is the product made to order, or prefabricated?
|
Building Theories with BIAIT |
Combinatorial construction of theories from conjunctions of axioms:
- Each BIAIT question is based on a binary distinction.
- Each choice contributes one axiom.
- Result of seven choices is a theory with seven axioms.
- Total number of theories is 2^7 = 128, which describe 128 kinds of businesses classified according to their record-keeping needs.
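The combinatorial construction can be sketched as follows. The question labels come from the BIAIT list; representing each theory as a dictionary of yes/no answers is an assumption made for illustration, with each answer standing in for the axiom it contributes.

```python
from itertools import product

# The seven BIAIT distinctions, one binary choice each.
QUESTIONS = [
    "Bill",
    "Future",
    "Profile",
    "Negotiate",
    "Rent",
    "Track",
    "MakeToOrder",
]


def all_theories(questions):
    """Build every theory: one True/False answer (one axiom) per question."""
    return [
        dict(zip(questions, answers))
        for answers in product([True, False], repeat=len(questions))
    ]


theories = all_theories(QUESTIONS)
print(len(theories))  # 128
print(theories[0])    # the theory in which every answer is "yes"
```

Each dictionary is a conjunction of seven axioms, and the list covers all 2^7 combinations.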
|
The Lattice of Theories |
An infinite lattice of all possible theories, also called a Lindenbaum lattice:
- Only a finite number of theories are stored at any given time.
- But there are enough slots in the lattice to accommodate any theory that anyone could ever conceive.
- Ordered by generalization and specialization.
- Belief-revision operators are used to relate theories: contraction, expansion, and analogy.
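A deliberately simplified sketch of the ordering and two of the operators, assuming a theory is just a finite set of axiom names. Real theories are closed under deduction, so proper contraction and expansion involve more than set difference and union; the axiom names below are hypothetical.

```python
def expand(theory, axiom):
    """Expansion: add an axiom, moving to a more specialized theory."""
    return frozenset(theory | {axiom})


def contract(theory, axiom):
    """Contraction: drop an axiom, moving to a more general theory."""
    return frozenset(theory - {axiom})


def is_generalization(general, special):
    """A theory with fewer axioms is more general than one that includes them all."""
    return general <= special


# Hypothetical axioms, loosely modeled on business record-keeping choices.
cash_sale = frozenset({"price_is_fixed", "product_is_purchased"})
billed_sale = expand(cash_sale, "supplier_bills_customer")

print(is_generalization(cash_sale, billed_sale))                       # True
print(contract(billed_sale, "supplier_bills_customer") == cash_sale)   # True
```

Under this reading, contraction walks up the lattice and expansion walks down it, so any two stored theories can be related by a sequence of such steps.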
|
Navigating the Lattice of Theories |
Example: earth and sun map to the hydrogen atom.
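One way to picture the analogy step is as a systematic relabeling of the terms in a theory. The sketch below assumes that reading. The two statements about the sun and earth, and the target labels `nucleus` and `electron`, are illustrative assumptions and are not taken from the slides.

```python
def relabel(theory, mapping):
    """Rename every term in every statement; unmapped terms stay as they are."""
    return frozenset(
        tuple(mapping.get(term, term) for term in statement)
        for statement in theory
    )


# Hypothetical theory of a planetary system: (relation, subject, object).
solar_system = frozenset({
    ("attracts", "sun", "earth"),
    ("orbits", "earth", "sun"),
})

# Assumed mapping from the planetary system to the hydrogen atom.
to_atom = {"sun": "nucleus", "earth": "electron"}

hydrogen_atom = relabel(solar_system, to_atom)
for statement in sorted(hydrogen_atom):
    print(statement)
```

The relabeled theory has the same structure as the original, which is what lets one jump across the lattice rather than moving one axiom at a time.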
|
Special-Purpose Theories |
Belief-revision operators can accommodate such theories
|
Summary |
- A single, consistent, all-encompassing theory is impossible.
- The enormous number of special-purpose solutions isn't going to disappear overnight, if ever.
- The infinite hierarchy of theories can accommodate everything, general or special-purpose, scruffy or neat.
- Automated and semiautomated techniques are essential for developing, extending, and using the four kinds of hierarchies.
|
References |
For the slides used in this talk, see
http://www.jfsowa.com/pubs/autotalk.htm
For further discussion of the hierarchies, see
http://www.jfsowa.com/pubs/signtalk.htm
For even more detail, see the [unfinished] paper:
http://www.jfsowa.com/pubs/signproc.htm
All other references are [or will be] in the bibliography:
http://www.jfsowa.com/bib.htm
Copyright ©2001, John F. Sowa