Data Mining
and
Knowledge Discovery


John F. Sowa

VivoMind LLC



Panel Discussion

4 July 2003











 

Panelists

Today's speakers:

  • Guy Mineau, Université Laval
    Graph Structures:  Simple Models, Complex Processing

  • Yves Kodratoff, Université Paris-Sud XI
    Text Mining:  From Text Retrieval to Knowledge Extraction

  • Amadeo Napoli, LORIA
    Knowledge Discovery in Databases

  • Jean-Guy Meunier, UQÀM
    Computer Text Categorization

Speaker on Tuesday, July 8th:

  • Ryszard Michalski, George Mason University
    Inferential Theory of Learning









 

Peirce's Logic of Pragmatism










 

Relating Logic to the World



George Box:  "All models are wrong, but some are useful."








 

Branches of Semeiotic

Peirce's classification:

  1. Grammar:  Patterns of signs at every level of complexity in every sensory modality.

  2. Logic:  Formal conditions for the truth of representations.

  3. Methodeutic:  Methods of observation, experiment, and testing for relating signs to their referents in science, engineering, and everyday life.









 

Major Challenge

  • Computer systems are very good at deduction.

  • They can process large volumes of data for induction and abduction.

  • But they cannot compete with a child in learning language.

  • Why not?













 

Replacing Sherlock Holmes










 

Paradox of Information Retrieval

  • People try to understand a document before classifying it.

  • They try to understand a question before answering it.

  • Since the 1950s, computational linguists have been developing sophisticated methods for information retrieval.

  • But the most successful methods use little or no linguistics.
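
A minimal sketch of the kind of method that uses no linguistics: documents and queries are treated as bags of words, weighted by TF-IDF, and compared by cosine similarity. The function names, the sample documents, and the smoothed weighting formula are illustrative assumptions, not a description of any particular retrieval system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters: no stemming, parsing, or grammar."""
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(query, documents):
    """Return document indices ordered from most to least similar to the query."""
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())

    def tfidf(counts):
        # Smoothed inverse document frequency (an assumed, common variant)
        # keeps every weight positive, even for words unseen in the documents.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = tfidf(Counter(tokenize(query)))
    scores = [cosine(query_vec, tfidf(counts)) for counts in doc_counts]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical sample data, for illustration only.
DOCUMENTS = [
    "The translator used a large dictionary to translate web pages.",
    "Neural networks learn by adjusting numerical weights.",
    "Children learn language without a dictionary.",
]
QUERY = "dictionary for translating pages"

ranking = rank_documents(QUERY, DOCUMENTS)
```

The first document ranks highest because it shares two words with the query. Note that "translating" matches nothing: without any morphology, "translate" is just a different string.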









 

Paradox of Machine Translation

  • Human translators must know the subject matter.

  • Research on knowledge-based MT since the 1970s.

  • But the most widely used MT system is SYSTRAN:
    Originally called GAT (Georgetown Automatic Translation).

    Research terminated in 1963.

    Uses a very big dictionary and very little linguistics.

    Now called Babelfish for translating WWW pages.










 

Paradox of Machine Learning

Human knowledge is expressed in complex structures.

But most machine learning systems use very simple structures:

  • Boolean combinations

  • Adjusting numerical weights

  • Vectors of features

How can a system learn the structures that occur in language?








 

Utterance by a 3-year-old Child

When I was a little girl, I could go "geek, geek" like that; but now I can go "This is a chair."

Enormous logical complexity in one short passage:

  • Subordinate and coordinate clauses

  • Tenses:  Earlier time contrasted with "now"

  • Modal auxiliaries:  can and could

  • Quotations:  "geek, geek" and "This is a chair"

  • Metalanguage about her own linguistic abilities

  • Contrast shown by but

  • Parallel stylistic structure









 

A Typical Neural Network

  • Fixed set of features, concepts, nodes, and arcs.

  • Learning is limited to adjusting weights.

  • Such a structure cannot learn a language.
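
To make the first two points concrete, here is a minimal sketch of a single-unit perceptron, assuming a two-input task (logical AND) chosen only for illustration. The names train_perceptron, predict, and AND_DATA are hypothetical. The number of inputs and connections is fixed before training begins; the only thing training changes is the numerical weights and bias.

```python
def train_perceptron(samples, epochs=20, learning_rate=1):
    """Train one threshold unit with the classic perceptron update rule.

    The structure (one weight per input feature, plus a bias) is fixed
    up front. Learning never adds features, nodes, or arcs.
    """
    n_features = len(samples[0][0])
    weights = [0] * n_features
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted
            # The only change learning makes: nudge the existing weights.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the trained unit to one input vector."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Hypothetical training data: the truth table for logical AND.
AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_perceptron(AND_DATA)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_DATA]
```

After training, the unit still has exactly two weights and one bias. It has learned which numbers to use, not any new structure.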








 

Questions for the Panelists

Why hasn't linguistics helped information retrieval?

Why aren't richer structures used in machine learning?

How could richer learning systems be designed?

How could other branches of cognitive science

(a) contribute to research in machine learning?

(b) benefit from research in machine learning?

What are the prospects for the future?








 

References

Slides presented on the opening day:

http://www.jfsowa.com/talks/uqam.htm

Paper on analogical reasoning by Sowa and Majumdar:

http://www.jfsowa.com/pubs/analog.htm

Peirce's tutorial on existential graphs, with commentary by Sowa:

http://www.jfsowa.com/peirce/ms514.htm

Selected papers by Peirce on semeiotic and related topics; see his 1903 lectures on pragmatism in vol. 2 for material related to this talk:

Peirce, Charles Sanders (EP) The Essential Peirce, ed. by N. Houser, C. Kloesel, and members of the Peirce Edition Project, 2 vols., Indiana University Press, Bloomington, 1991-1998.


Copyright ©2003, John F. Sowa