Reconciling Documentation and Implementation


John F. Sowa

VivoMind LLC




Recent Object-Oriented Trends Symposium (rOOts)

Bergen, Norway

May 2003











 

Documentation and Implementation

They will never be consistent unless...

Both are automatically derived from a common source.

For legacy code,

Some documentation can be derived from the implementation.

For new code in current languages,

Some of the implementation can be derived from the documentation.

With well-designed tools and languages,

Documentation can be identical to implementation.









 

Knowledge Representation

  • Natural languages have the words and syntax to express anything that can be stated in any version of logic.

    • First-order logic:  and, or, not, if, some, and every.
    • Modal logic:  can and must.
    • Temporal logic:  sometimes, always, and next.
    • Epistemic logic:  know and believe.
    • Context-dependent indexicals:  the, this, that, he, she, it, I, and you.
    • Metalanguage:  direct and indirect quotation.

  • Natural languages can be learned by a two-year-old child.

  • But they still pose a major challenge for computer systems.









 

Problem

Observation by Alan Perlis:

You can't map informal specifications to formal specifications by any formal algorithm.

Questions:

  • How can you relate documentation (comments, manuals, and help facilities) to the implementation?

  • Can you derive one from the other?

  • Can you derive everything from a common source?

  • Can you do something useful that is less than a complete derivation?









 

Controlled Natural Languages

Formal languages that use a subset of the syntax and vocabulary of natural languages.

First used by Aristotle for his syllogisms, which express the description logic subset of FOL.

Syllogisms are limited to four sentence types:

  1. Universal affirmative.  Every employee is a person.

  2. Particular affirmative.  Some employees are customers.

  3. Universal negative.  No employee is a competitor.

  4. Particular negative.  Some customers are not employees.

Syllogisms can support declarations, inheritance, and constraint checking for object-oriented languages.
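
As a rough illustration of how the four sentence types could serve as constraint checks, each one can be read as a relation between two sets. This is a minimal sketch, not part of the talk: the sample data and the function names `every`, `some`, `no`, and `some_not` are assumptions made for the example.

```python
# Hypothetical sample data; the member names are illustrative only.
employees = {"ann", "bob"}
persons = {"ann", "bob", "carl", "dora"}
customers = {"bob", "carl"}
competitors = {"dora"}


def every(a, b):
    """Universal affirmative: every A is a B (A is a subset of B)."""
    return a <= b


def some(a, b):
    """Particular affirmative: some A is a B (the sets overlap)."""
    return bool(a & b)


def no(a, b):
    """Universal negative: no A is a B (the sets are disjoint)."""
    return not (a & b)


def some_not(a, b):
    """Particular negative: some A is not a B (A has a member outside B)."""
    return bool(a - b)


checks = {
    "Every employee is a person": every(employees, persons),
    "Some employees are customers": some(employees, customers),
    "No employee is a competitor": no(employees, competitors),
    "Some customers are not employees": some_not(customers, employees),
}

for sentence, holds in checks.items():
    print(f"{sentence}: {holds}")
```

With this sample data all four sentences hold. Changing the data so that an employee is also a competitor would make the third check fail, which is the kind of constraint violation a tool could report.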








 

OO Declarations in Syllogistic Form

Declaration of a new type:

Every truck
  — is a vehicle.
  — has an unloaded weight.
  — has a maximum gross weight.
  — has a cargo capacity.
  — has a number of wheels.

Declaration of an instance:

X39071D
  — is a truck.
  — has a cargo capacity of 25 cubic meters.
  — has 6 wheels.

Inheritance:

Every truck has a cargo capacity.
X39071D is a truck.
Therefore, X39071D has a cargo capacity.
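
For comparison, the same declarations might look as follows in a current object-oriented language. This is only a sketch under stated assumptions: the class and attribute names are invented for the example, and values that the instance declaration leaves unstated are represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vehicle:
    """Supertype named in 'Every truck is a vehicle.'"""
    ident: str


@dataclass
class Truck(Vehicle):
    """Each 'has a ...' clause becomes an attribute (names are assumptions)."""
    unloaded_weight: Optional[float] = None
    max_gross_weight: Optional[float] = None
    cargo_capacity_m3: Optional[float] = None
    number_of_wheels: Optional[int] = None


# 'X39071D is a truck, has a cargo capacity of 25 cubic meters, has 6 wheels.'
x39071d = Truck(ident="X39071D", cargo_capacity_m3=25.0, number_of_wheels=6)

# Inheritance: every truck has a cargo capacity, so X39071D has one,
# and because every truck is a vehicle, X39071D is also a vehicle.
print(isinstance(x39071d, Vehicle))
print(x39071d.cargo_capacity_m3)
```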







 

Going Beyond Syllogisms


  • Syllogisms can only represent the description logic subset of first-order logic.

  • Procedural information requires Horn-clause logic.

  • SQL queries and constraints require full FOL.

  • All of the required forms can be represented in controlled NLs.

  • And they have been implemented in working prototypes.













 

Representing Horn Clauses in ACE


Attempto Controlled English:

If a copy of a book is checked out to a borrower
   and a staff member returns the copy
then the copy is available.
If a staff member adds a copy of a book to the library
   and no catalog entry of the book exists
then the staff member creates a catalog entry
        that contains the author name of the book
           and the title of the book
           and the subject area of the book
   and the staff member enters the id of the copy
   and the copy is available.

These statements are compiled to executable code.
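
To suggest what a Horn-clause reading of the first rule might look like, the sketch below encodes it as available(C) if checked_out(C, B) and returns(S, C), and applies it by simple forward chaining. It is not how ACE compiles its input: the fact encoding, the function names, and the sample identifiers are all assumptions, and the sketch ignores the state change implied by returning a copy.

```python
# Facts are tuples of the form (predicate, arg1, arg2); all values are illustrative.
facts = {
    ("checked_out", "copy1", "borrower7"),
    ("returns", "staff3", "copy1"),
    ("checked_out", "copy2", "borrower9"),
}


def rule_copy_available(known):
    """available(C) if checked_out(C, B) and returns(S, C)."""
    checked_out = {f[1] for f in known if f[0] == "checked_out"}
    returned = {f[2] for f in known if f[0] == "returns"}
    return {("available", copy) for copy in checked_out & returned}


def forward_chain(initial, rules):
    """Apply every rule repeatedly until no new facts are derived."""
    derived = set(initial)
    while True:
        new = set().union(*(rule(derived) for rule in rules)) - derived
        if not new:
            return derived
        derived |= new


result = forward_chain(facts, [rule_copy_available])
print(("available", "copy1") in result)
print(("available", "copy2") in result)
```

Only `copy1` satisfies both conditions, so it is the only copy for which an `available` fact is derived.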








 

Use of Controlled NLs


  • Controlled natural languages are formal languages that can be read by people who have never studied logic or programming languages.

  • They are not true NLs, and they require tools that ensure the authors stay within the limited subsets.

  • They can be implemented in new development tools.

  • But the real challenge is to deal with legacy systems.









 

Analogical Reasoning

  • All human thinking is based on analogies.

  • Logic is a disciplined method of using analogies.

  • Analogical reasoning by computer can...

    1. Process computer languages.

    2. Process natural languages.

    3. Map (some of) one language to another.










 

Intellitex

  • NL processor developed by VivoMind LLC.

  • Lightweight syntax with heavy semantics.

  • Based on conceptual graphs (CGs) — a graphical version of logic.

  • Uses analogies as its primary reasoning method.









 

VivoMind Analogy Engine

Three methods of analogy:

  1. Matching labels: 

    • Compare type labels on conceptual graphs.

  2. Matching subgraphs: 

    • Compare subgraphs independent of labels.

  3. Matching transformations: 

    • Transform subgraphs.

Methods #1 and #2 take O(N log N) time.

Method #3 takes polynomial time (analogies of analogies).




 

Analogy of Cat to Car

Cat          Car
head         hood
eye          headlight
cornea       glass plate
mouth        fuel cap
stomach      fuel tank
bowel        combustion chamber
anus         exhaust pipe
skeleton     chassis
heart        engine
paw          wheel
fur          paint

VAE used methods #1 and #2.

Source data from WordNet mapped to CGs.






 

Matching Labels and Subgraphs

Corresponding parts have similar functions:

  • Fur and paint are outer coverings.

  • Heart and engine are internal parts with a regular beat.

  • Skeleton and chassis are structures for attaching parts.

  • Paw and wheel support the body, and there are four of each.

Approximate matching (missing esophagus and muffler):

  • Cat:  mouth → stomach → bowel → anus.

  • Car:  fuel cap → fuel tank → combustion chamber → exhaust pipe.

Another pair of matching subgraphs:

  • Cat:  head → eyes → cornea.

  • Car:  hood → headlights → glass plate.
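
The idea of matching subgraphs independent of labels can be illustrated on the two chains above. The sketch below is a toy, not the VAE algorithm: it handles only simple linear paths, and the helper names `chain` and `align` are assumptions. It pairs nodes purely by their position in the structure, never by comparing labels.

```python
def chain(edges):
    """Return the node sequence of a simple path given as (source, target) edges."""
    successors = dict(edges)
    targets = set(successors.values())
    # The start node is the only node that is never a target.
    (start,) = [node for node in successors if node not in targets]
    path = [start]
    while path[-1] in successors:
        path.append(successors[path[-1]])
    return path


def align(edges_a, edges_b):
    """Map nodes of one path to another if the paths have the same shape."""
    path_a, path_b = chain(edges_a), chain(edges_b)
    if len(path_a) != len(path_b):
        return None
    return dict(zip(path_a, path_b))


cat_digestive = {("mouth", "stomach"), ("stomach", "bowel"), ("bowel", "anus")}
car_fuel = {
    ("fuel cap", "fuel tank"),
    ("fuel tank", "combustion chamber"),
    ("combustion chamber", "exhaust pipe"),
}

for cat_part, car_part in align(cat_digestive, car_fuel).items():
    print(f"{cat_part} -> {car_part}")
```

A real matcher would also have to tolerate missing nodes, as in the esophagus and muffler case, which this sketch does not attempt.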





 

Relating Different Representations

Method #3 for relating data structures that represent equivalent information.

  • A structure described in different ways:

  • English description:  "A red pyramid A, a green pyramid B, and a yellow pyramid C support a blue block D, which supports an orange pyramid E."

  • A relational database would use tables.

  • But there are many different options for choosing tables, rows, columns, and column labels.
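
To make the point about design choices concrete, here are two hypothetical layouts for the same five-object structure. Neither is claimed to be the layout used in the talk; the table names, column order, and helper functions are assumptions. Both layouts answer the question "what supports D?" identically, even though they share almost no structure.

```python
# Layout 1: one table for objects (id, shape, color), one for the support relation.
objects = [
    ("A", "pyramid", "red"),
    ("B", "pyramid", "green"),
    ("C", "pyramid", "yellow"),
    ("D", "block", "blue"),
    ("E", "pyramid", "orange"),
]
supports = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")]

# Layout 2: a single table of (entity, attribute, value) rows.
triples = (
    [(ident, "shape", shape) for ident, shape, _ in objects]
    + [(ident, "color", color) for ident, _, color in objects]
    + [(lower, "supports", upper) for lower, upper in supports]
)


def supporters_layout1(target):
    """Objects that support the target, read from the two-table layout."""
    return sorted(lower for lower, upper in supports if upper == target)


def supporters_layout2(target):
    """The same query against the single triple table."""
    return sorted(
        entity
        for entity, attribute, value in triples
        if attribute == "supports" and value == target
    )


print(supporters_layout1("D"))
print(supporters_layout2("D"))
```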





 

Representation in a Relational DB












 

CG Derived from Relational DB
















 

CG Derived from English

"A red pyramid A, a green pyramid B, and a yellow pyramid C support a blue block D, which supports an orange pyramid E."














 

The Two CGs Look Very Different

  • CG from RDB has 15 concept nodes and 8 relation nodes.

  • CG from English has 12 concept nodes and 11 relation nodes.

  • No label on any node in the first graph is identical to any label on any node in the second graph.

  • But there are some structural similarities.

  • VAE uses method #3 to find them.













 

Transformations Found by VAE


Top transformation applied to 5 subgraphs.

Bottom one applied to 3 subgraphs.

One application could be due to chance, but 3 or 5 contribute strong evidence for the mapping.








 

A Large Application

An earlier version of Intellitex was applied to three languages — English, COBOL, and JCL:

  • 1.5 million lines of COBOL.

  • Several hundred JCL scripts.

  • 100 megabytes of English documentation — text files, e-mails, Lotus Notes, HTML, and transcriptions of oral communications.

Same parser but different grammars for all three languages:

  • English used canonical graphs derived from WordNet and TOSI graphs for the application.

  • COBOL and JCL did not require the WordNet graphs, but they did use the TOSI graphs.

  • Results represented in conceptual graphs independent of the source language.

  • Translated into diagrams with English text: 

    Glossary, data dictionary, data flow diagrams, process architecture diagrams, system context diagrams.






 

Results

Job finished in 8 weeks by two programmers, Arun Majumdar and André LeClerc.

  • Four weeks for customization:
    • Design and logistics.
    • Additional programming for I/O formats.

  • Three weeks to run Intellitex + VAE + extensions:
    • 24 hours a day on a 750 MHz Pentium III.
    • VAE handled matches with strong evidence.
    • Matches with weak evidence were confirmed or corrected by Majumdar and LeClerc.

  • One week to produce a CD-ROM with integrated views of the results:

    Glossary, data dictionary, data flow diagrams, process architecture, system context diagrams.






 

Contradiction Found by VAE

From analyzing English documentation:

  • Every employee is a human being.

  • No human being is a computer.

From analyzing COBOL programs:

  • Some employees are computers.
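
The three statements above are jointly inconsistent, and a check of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the VAE method: the rule encoding, the entity identifiers, and the function names are invented. It derives "no employee is a computer" from the two documentation rules and then looks for entities in the code-derived facts that violate it.

```python
# Constraints from the documentation, encoded as (quantifier, subject, predicate).
doc_rules = [
    ("every", "employee", "human being"),
    ("no", "human being", "computer"),
]


def derive_exclusions(rules):
    """Syllogism: every A is a B, and no B is a C, therefore no A is a C."""
    every = [(a, b) for quantifier, a, b in rules if quantifier == "every"]
    no = [(b, c) for quantifier, b, c in rules if quantifier == "no"]
    return {(a, c) for a, b in every for b2, c in no if b == b2}


# Hypothetical facts extracted from program code: entity id -> set of types.
code_facts = {
    "id-1001": {"employee"},
    "id-1002": {"employee", "computer"},
}


def find_conflicts(exclusions, facts):
    """List entities that carry two types the documentation declares disjoint."""
    return sorted(
        (entity, a, c)
        for entity, types in facts.items()
        for a, c in exclusions
        if a in types and c in types
    )


exclusions = derive_exclusions(doc_rules)
conflicts = find_conflicts(exclusions, code_facts)
print(exclusions)
print(conflicts)
```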

What is the reason for this contradiction?









 

Quick Patch in 1979

A COBOL programmer made a quick patch:

  • Two computers were used to assist human consultants.

  • But there was no provision to bill for computer time.

  • Therefore, the programmer named the computers Bob and Sally, and assigned them employee ids.

For more than 20 years:

  • Bob and Sally were issued payroll checks.

  • But they never cashed them.

  • However, the computer system was clogged with phantom accounts.

VAE discovered the two computer "employees".




 

Mismatch Found by VAE

Question:  Which location determines the market?

According to the documentation:  The business unit.

According to COBOL:  The client HQ.










 

Legacy Works

Tools under development by VivoMind LLC.

  • Convert the consultant-ware to products.

  • Short term:  translate CGs to current languages and tools, such as UML, SQL, Java, C#, etc.

  • Long term:  develop better languages and tools.











 

Current Tools

  • Too much emphasis on syntactic details:
    FORTRAN, COBOL, SQL, C, Perl, LISP, English, Norwegian, Japanese, etc.

  • Too many interface details:
    Programming language, database, command shell, GUI, email, Internet, I/O, formatting, printing, security, multimedia, etc.

  • Too many tools:
    Any one by itself is a tremendous aid to productivity, but any two together will kill you.











 

Flexible Modular Framework












 

Logic-Based Components

  • Choose any syntax you like:
    Algebraic, graphical, English, Norwegian, Japanese, etc.

  • Specify as much or as little as you like:
    application details, database, command shell, GUI, email, Internet, I/O, formatting, printing, security, multimedia, etc.

  • One tool, but many uses.









 

Method Repository System

  • Implemented by Olivier Gerbé and his colleagues at the DMR Consulting Group.

  • Uses conceptual graphs as the representation language at every level.

  • CGs are the metametalanguage for defining CGs themselves and other notations, including UML and Common KADS.

  • About 200 business processes modeled in a total of 80,000 CGs.

  • All specifications are stored as CGs, but they can be translated to web pages in either English or French.









 

Conclusions

Ideally,

Documentation ≡ Implementation.

For legacy code,

Some documentation can be derived from the implementation.

For new code in current languages,

Some of the implementation can be derived from the documentation.

The ultimate goal of the VivoMind tools is to make

Documentation ≡ Implementation.









 

References

For background information on all the topics discussed in this talk:
Sowa, John F. (2000) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA.
For an article about the Flexible Modular Framework (FMF):
Sowa, John F. (2002) "Architectures for intelligent systems," IBM Systems Journal 41:3, 331-349.
For more information about logic, ontology, and metalanguage:
Sowa, John F. (2000) "Ontology, metadata, and semiotics," in B. Ganter & G. W. Mineau, eds., Conceptual Structures: Logical, Linguistic, and Computational Issues, Lecture Notes in AI #1867, Springer-Verlag, Berlin, 2000, pp. 55-81.
For information about other systems discussed in this talk:

Fuchs, Norbert E., Uta Schwertel, Rolf Schwitter (1998) "Attempto Controlled English — not just another logic specification language," Proceedings LOPSTR'98, Manchester.

Fuchs, Norbert E., Uta Schwertel, Rolf Schwitter (1999) Attempto Controlled English (ACE), Language Manual, Version 3.0, Technical Report ifi-99.03, University of Zurich.

Gerbé, Olivier, & M. Perron (1995) "Presentation definition language using conceptual graphs," in G. Ellis, R. A. Levinson, & W. Rich, eds., Conceptual Structures: Applications, Implementation, and Theory, Lecture Notes in AI 954, Springer-Verlag, Berlin.

Gerbé, Olivier, B. Guay, & M. Perron (1996) "Using conceptual graphs for methods modeling," in P. W. Eklund, G. Ellis, & G. Mann, eds., Conceptual Structures: Knowledge Representation as Interlingua, Lecture Notes in AI 1115, Springer-Verlag, Berlin, pp. 161-174.

Gerbé, Olivier (1997) "Conceptual graphs for corporate knowledge management," in D. Lukose, H. Delugach, M. Keeler, L. Searle, & J. Sowa, eds., Conceptual Structures: Fulfilling Peirce's Dream, Lecture Notes in AI 1257, Springer-Verlag, Berlin, pp. 474-488.

Gerbé, Olivier, R. Keller, & G. Mineau (1998) "Conceptual graphs for representing business processes in corporate memories," in M-L Mugnier & Michel Chein, eds., Conceptual Structures: Theory, Tools, and Applications, Lecture Notes in AI 1453, Springer-Verlag, Berlin, pp. 401-415.

Gerbé, Olivier, & Brigitte Kerhervé (1998) "Modeling and metamodeling requirements for knowledge management," in Proc. of OOPSLA'98 Workshops, Vancouver.

Gerbé, Olivier (2000) Un Modèle uniforme pour la Modélisation et la Métamodélisation d'une Mémoire d'Entreprise, PhD Dissertation, Département d'informatique et de recherche opérationnelle, Université de Montréal.

LeClerc, André, & Arun Majumdar (2002) "Legacy revaluation and the making of LegacyWorks," Distributed Enterprise Architecture 5:9, Cutter Consortium, Arlington, MA.

Schwitter, Rolf (1998) Kontrolliertes Englisch für Anforderungsspezifikationen, Studentdruckerei, Zurich. Available from http://www.ifi.unizh.ch/~schwitter/.

Skuce, Doug, & Timothy Lethbridge (1995) "CODE4: A unified system for managing conceptual knowledge," International J. of Human-Computer Studies 42, 413-451.

Skuce, Doug (1998) "Intelligent knowledge management: integrating documents, knowledge bases, and linguistic knowledge," in Proceedings of KAW'98, Calgary.

Skuce, Doug (2000) "Integrating web-based documents, shared knowledge bases, and information retrieval for user help," Computational Intelligence 16:1.










Copyright ©2003, John F. Sowa