The New World of Computing: The Sublanguage Paradigm

Bożena Henisz Thompson and Frederick B. Thompson

Note:  This is a slightly revised version of a preprint formerly located at http://www.cs.caltech.edu/people/fbt/vision.htm . Changes to the text are based on the published version in Proceedings of the Second International Symposium on the Frontiers of Science and Technology, United Nations University, Kyoto, Japan, May 1992, pp. 290-327.

1. Prologue

We are witnessing one of history's major technological events: the advance of the telephone-computer era. Over a number of years, we have directed our research toward the solution to the problems of this coming age. The results of this research are presented here. However, it is not enough to put down some concepts from which a solution may be implied. The solution therefore is also in the form of a fully implemented system, which now exists as a commercial prototype ready for product development: the New World of Computing System.

2. Obstacles to the Development of the Telephone-Computer

The next decade will see the telephone, personal computer, work station, and television set combined into a single, ubiquitous instrument — the telephone-computer. The telephone-computer will cause a rapid, widespread acceleration in the use of information processing and telecommunications. As a result of this major development, the market for both computer hardware and software will rapidly expand in both quantity and variety. This market will soon far exceed the current market for personal computers, workstations and large transaction processors.

Current hardware capabilities are already adequate for this development: 30 MIP processing chips, voice digitizing, image processing and communication chips, 8 megabyte main memories and 100+ megabyte peripheral memories are quite adequate for the great bulk of processing to be done. Error-free telecommunications at 144 K bits per second, and soon 1500 K bits per second, with packet switching, are now becoming available; capabilities can be expected to stay well ahead of foreseeable needs. New technologies, such as flat screens and parallel processing chips, will all enhance this radical change in the human/computer interface.

What will the era of the telephone-computer be like? This question has been the focus of our research over a number of years. One thing is clear: in the confluence of computer technology and telecommunications technology, we are witnessing one of history's major advances in human communication.

Industry is ill-prepared for this rapid acceleration of the information technologies. There are dislocations in the current software industry that work against the full development of the telephone-computer. One major symptom of these dislocations is the high cost of software development. Industry is acutely aware of these symptoms, as evidenced by its concern with "open systems" and "software engineering" approaches. However, the roots of these problems lie elsewhere. In this paper we will identify these roots and an approach that substantially corrects them.

As industry moves into this period of accelerating change and expanding market opportunity, management, and indeed the industry as a whole, needs a coherent set of concepts that can provide the perspective required for intelligent decision making. At this point in time, there is an almost total lack of sensible, down-to-earth concepts on which to develop an understanding and a strategy for what is taking place. Lacking perspective, managers in the computer industry are preoccupied with tactical questions and short range considerations.

The artificial intelligence paradigm has distracted us and has proven to be inadequate. The successes of UNIX, on the one hand, and the Macintosh computer interface, on the other, have led industry into espousing the minimalist philosophy of the computer as an applications independent tool kit. The situation was stated succinctly by Dr. Robert W. Lucky, in his capacity as Annenberg Distinguished Lecturer, University of Southern California, on January 22, 1991. After surveying the astounding advances in telecommunications resulting from the development of digital switching and optical fiber technologies, he asked the rhetorical question:

"What are we going to do with this gigabyte? To be honest with you, nobody knows."

He went on to state that there is a total lack of leadership to carry us into the emerging telephone-computer era. In parallel with the vacuum in the conceptual area, he pointed out that the legal position of the Local Area Telephone Companies has worked against any telephone company taking a leadership role.

A new paradigm is needed that puts into proper perspective the role of the computer in human communications.

3. Sublanguage: A New Paradigm

We present here a simple, basic concept, that of a "sublanguage." A sublanguage is a form of human communication that is domain specific, appropriate to that domain and, consequently, highly efficient. Using this concept, we put forward a new paradigm for human information processing and communication. We then use this paradigm to lay out the new world of computing that will characterize the telephone-computer era.

We have learned to impose structure on the jumble of our moment-to-moment experiences so as to create order and provide perspective. In the words of William James:

"Is not the sum of your actual experience taken at this moment and impartially added together an utter chaos? The strains of my voice, the lights and shades inside the room and out, the murmur of the wind, the ticking of the clock, the various organic feelings you may happen individually to possess, do these make a whole at all? ... We break it: we break it into histories, and we break it into arts, and we break it into sciences; and then we begin to feel at home.... The intellectual life of a man consists almost wholly in his substitution of a conceptual order for the perceptual order in which his experience originally came."

It is in terms of this conceptual order that our world becomes comprehensible. It is the infinitely variable expressions of language that give tangible form to our own immediate view of the world and by which we share that view with others. It should be no surprise to find human language playing the central role in leading to an understanding of information processing. The mechanisms of language are precisely the tools we need and use to express the recursive structures we impose on our experience. It is these mechanisms of language which we universally share that form the basis of communication.

What is this "natural language" that we use? The notion of natural language has played a useful role in the linguists' development of a general understanding of human communication, in the codification and maintenance of purity of national languages and the training of language teachers. Language, in the sense used by the linguists, might more properly be thought of as an integrated family of linguistic mechanisms, often expressible as grammar rules. The phenomenon of information processing and communications, although exhibiting in a given cultural community the adherence to such syntactic forms, also has features better characterized, we believe, by the notion of "sublanguage," here being introduced. Current literature often refers to a person's mental awareness of the world as one's "cognitive model." But it is not a single model; it is a large family of interrelated, comparable models: the many alternatives that we visualize and choose among. The logician would refer to these as the model theoretic counterpart of our sublanguage; and indeed, abstractly, the model theoretic and the linguistic representations are quite equivalent. We do not wish to imply that we "think linguistically," as an alternative to "thinking in terms of a cognitive model." Rather, it is simply a more useful paradigm for the consideration of the role of the computer, a case that we intend to make in this paper. The linguistic formulation, we feel, grasps much more clearly the characteristics of our ongoing cognitive processes.

In dealing with their immediate task environments, people narrow their considerations by making judgments of relevance, value, and task effectiveness, judgments that are characteristically human. The results of these judgments take concrete form in the sublanguages we use both in communicating within our task group and in our own internal thought processes. A moment's introspection makes it clear that as we move about from one task to another during a busy day we change from one sublanguage to another as our attention is drawn from one domain of activity to another. These sublanguages differ in vocabulary, often in their cryptic syntax, and even in the meanings of the same words. Their only commonality is their basic linguistic structure.

Consider the sentences in Figure 1, entered by the trust officer of a bank in his ongoing dialogue with his computer. Does this look like English? It is not. Surely it is a sublanguage of English, one that has been geared to the concerns of the trust officer. In such a defined context, the phrasing is no longer ambiguous or indecipherable.

The concerns of a human individual engaged in a specific task environment can be characterized by that person's immediate sublanguage. The essential characteristic of a well functioning team is their common sublanguage. The stability of a given sublanguage is found in the stability of the task we undertake. When we return again and again to a task, and to that group in which we interact in conducting that task, it is the sublanguage that codifies and externalizes our ongoing considerations. And it is the ongoing, ubiquitous changes in that sublanguage that track our decision making processes.

As an illustration, consider a particular work environment, say that of a person working as secretary for an industrial manager. One aspect of that person's environment is their typewriter. The technology of the typewriter keyboard has not changed in many years, even though a more efficient layout of the keys is known. The reasons for this stability are not hard to envision: the keyboard does not change because of the strong social inertia resulting from so many people having been trained on the existing one. There are many other aspects of the typing, filing and sending of letters, reports, etc. where there are both physical and social inertia that militate against change in many aspects of the secretary's concern. The moment to moment sublanguage of such a person is constantly shifting as a needed address must be found, a letter retrieved from a file, a phone call answered. But a part of all of these sublanguages remains essentially constant: that part related to the mechanics of typing, phoning, filing, where the physical and social inertia are high. This part can itself be characterized as a formal language. It is precisely these highly stable sublanguages that can economically be built into computer systems. Word processors are an ideal example of how the inertia of a significant part of the secretarial world can be exploited.

This illustration, concerning word processors, bears greater scrutiny. When a team is working on a project, there are at any given time a number of aspects that are undergoing change, and the rapidity of this change results in uncertainties and ambiguities in the sublanguage that characterizes their interactions. Each of the participants has their own sublanguage related to the common task but also containing the expertise and personal insights the individual brings to the common effort. But it is essential to the effective functioning of the team that they share completely and unambiguously and tacitly a sublanguage that establishes the basis for their intercommunication. It is this that is the team's sublanguage. Relative to their many interactions (the sharing of insights, the settling of differing ways to approach a problem, etc.), this team sublanguage changes only slowly. And these changes result directly from interventions of the team itself. A word processor obviously does not comprehend the entire sublanguage of a busy secretary. But it is a stable part of the secretary's rapidly changing sublanguage.

A useful analogy is to the layout of a mountainous terrain. Suppose all sublanguages were spread out over a broad area, and that the altitude of this terrain at any point was some measure of the relevance of the associated sublanguage to the task at hand. The landscape would in general consist of a tall mesa whose top was quite flat except for a small hillock. As time went by, the mesa would move almost imperceptibly, but the hillock would be seen to shift about, in almost constant motion, as the concerns of the moment shifted from one situation to the next.

Consider a team working on a common design task. If one were to ask them what they were doing, they would say "designing a ......" From our perspective, however, their sole task is the maintenance of the underlying team sublanguage. The completion of their task is marked by their agreement that this sublanguage is now ready to be passed on to those who will implement their design. Their common sublanguage will, by that time, have evolved into one containing the design drawings, part tables, and specification lists in forms they know to meet the conventions established for transmittal to the industrial engineers who will further prepare them to go to tooling and the production floor. These conventions are characteristic of the hierarchy of sublanguages that characterize interactions in an organization.

There is one area where the computerization of sublanguages is already highly developed, and sophisticated, namely programming languages. The stability of the Von Neumann architecture has resulted in an evolutionary development of sublanguages that exploit this stability. We will say that the computer "understands" a sublanguage to mean precisely the same as when we say a computer understands a programming language. Namely, it carries out instructions in the way they were intended to be carried out, whether in the sublanguage of a programmer or the sublanguage of a professional when referring to work related matters. Stable, domain-specific sublanguages, from simple word processors to the cryptic, icon oriented yet highly sophisticated sublanguage of a NASA space flight control room, can be handled by a computer as easily as any traditional programming language. Computers can be programmed to understand the highly idiosyncratic, often cryptic sublanguages of individuals and teams from all walks of life.

The academic disciplines of linguistics, foundations of mathematics and computer science provide firm theoretical underpinnings for the study and computer implementation of sublanguages. A brief overview of these underpinnings reveals in sharp focus the full sublanguage paradigm. The concept of "sublanguage" is abstractly equivalent to the concepts of "recursive function" and "Turing machine." Thus our "sublanguage" paradigm is equivalent to Church's Thesis that human information processing can be characterized by these formalisms.

To give utmost precision to this statement, we restate this paradigm in terms of one of the above formalisms. Turing machines can be enumerated, i.e., we can speak of the ith Turing machine, T_i. A Universal Turing machine, T_u, is such that given any argument n, T_u(i, n) = T_i(n); thus a universal Turing machine can simulate any Turing machine, provided it is given the proper index. We are indeed Universal Turing machines, but with a Demon D. Having observed n, our Demon selects that Turing machine that is most informing, and T_u(D(n), n) becomes our cognitive model. In this formalism, the sublanguage paradigm is equivalent to saying that at any instant our thought processes can be characterized as a Turing machine; but the selection of what Turing machine is a non-computable process, characteristically human.
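
The shape of this formulation can be illustrated with a small sketch. It is only a toy under stated assumptions: the enumeration is a finite list of three ordinary functions rather than all Turing machines, and the Demon, which the text holds to be non-computable, is replaced by an arbitrary computable rule so that the example can run. The names MACHINES, universal, demon and cognitive_model are invented for the illustration.

```python
from typing import Callable, List

# Toy stand-in for an enumeration of Turing machines: index i names T_i.
# A real enumeration is infinite; this finite list is only illustrative.
MACHINES: List[Callable[[int], int]] = [
    lambda n: n + 1,   # T_0: successor
    lambda n: 2 * n,   # T_1: doubling
    lambda n: n * n,   # T_2: squaring
]


def universal(i: int, n: int) -> int:
    """T_u(i, n) = T_i(n): run the machine with index i on argument n."""
    return MACHINES[i](n)


def demon(n: int) -> int:
    """Hypothetical selector D(n).

    In the text D is non-computable and characteristically human; an
    arbitrary computable rule is substituted here so the sketch runs.
    """
    return n % len(MACHINES)


def cognitive_model(n: int) -> int:
    """T_u(D(n), n): the selected machine applied to the observation n."""
    return universal(demon(n), n)
```

The only point of the sketch is the division of labor: universal is fixed and mechanical, while everything of interest is concentrated in the choice of index.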

The sublanguage paradigm can also be considered as the integration of two other well established paradigms of computer science, namely object oriented programming and compositional semantics. Approaching this relationship from the object oriented perspective, take the object classes as "parts of speech," the "semantic categories" of Tarski's seminal paper [3] that established modern mathematical linguistics. We note the clear tie to consideration of data structures. In the terminology of compositional (or more generally procedural) semantics, a sublanguage is defined by its "rules of grammar," each of which is associated with a semantic procedure. The implementation of these semantic procedures on the computer is in terms of the processes encapsulated in the object classes associated with the parts of speech occurring in the syntax rule. This point of view yields an elegant formulation of the "sublanguage" notion: Let the processes and memory structures of a given object oriented programming environment be implemented in "hardware," then the "sublanguage" becomes the "machine language" of the resulting computer. In this form, sublanguage is abstractly equivalent to "computer"; that is, the set of all "sublanguages" coincides with the set of all machine languages of computers.

The paradigm for human information processing can now be stated:

It is the constant re-evaluation and adjustment of the relevant view that characterizes human information processing. A succinct expression of such a view is as a formal language. When there are strong social and physical inertia in an area of broad concern, a part of the sublanguages characterizing those concerns stabilizes. For these stable areas, it is economically expedient to develop computer systems that can understand these sublanguages. When, in some area, these stabilities dissipate and others arise, it is only human intervention that can maintain their relevance and effectiveness.

A computer can "understand" a given task-specific sublanguage far beyond its use as a simple query language to a database. Consider a middle level manager in a large engineering establishment. See Figure 2.

He changes the estimated time of completion of one of the tasks for which he is responsible by typing instructions to his computer; the computer responds by carrying out the indicated actions. The sublanguage of engineering management is thus "understood" by the computer, just as the manager would expect a staff assistant to have responded in a pre-computer era. The computer then becomes an instantaneous link, conveying the relevant implications of this simple change to wide-ranging concerns across the engineering floor. The computer is seen here in its true identity, as a powerful communications device. A brief, curt word to the computer, in the same jargon that has been developed by the management team, is enough to elicit a complex computer response, which may include composing and sending messages, or controlling equipment. However, as the task develops, there will always be new instructions for the computer; thus there is a need to fall back on longer sentences and more complex commands, using, of course, the syntax and vocabulary of our own sublanguage.

4. The Implementation of Sublanguages

How are sublanguages implemented in the computer? Members of one class of sublanguages are already implemented in computers, namely programming languages. How are they currently implemented? A compiler is written which embodies both the syntax of the language and the semantics. The compiler accepts a sentence of the language and returns a single, machine language program. When used in interactive mode, the program is then executed. That is, the abstract computer which understands the programming language consists of a hardware computer with its own machine language and the compiler which translates the programming language into machine language. To change the programming language, one rewrites the compiler. There are compelling reasons why it is a bad technology for implementing sublanguages, including programming languages. Before discussing these reasons, we present here a radically different technology.

Let the computer be a Universal Language Processor. (A hardware computer with a universal language processor replacing the compiler of a particular language, if you like.) It operates in two modes:

  1. It accepts, one at a time, the rules of grammar and their associated semantic procedures that define the sublanguage, building them into its internal grammar table.
  2. It accepts an input sentence, parses it according to the grammar, uses the resulting parsing graph to compose the associated semantic procedures, evaluates them, outputs the result, and cycles.

Thus it is a simple, straightforward implementation of compositional semantics.
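
The two modes can be sketched in a few dozen lines. What follows is a minimal illustration, not the New World of Computing implementation: the class name LanguageProcessor, the naive recursive parser, the category names S and NP, the contents of FILES, and the reading of noun-noun modification as set intersection are all assumptions made for the example.

```python
from typing import Any, Callable, List, Optional, Tuple

Thunk = Callable[[], Any]   # a composed, not yet evaluated, meaning
Rule = Tuple[str, Tuple[str, ...], Callable[..., Any]]


class LanguageProcessor:
    """Toy two-mode processor; unary rule cycles (A -> B, B -> A) are not handled."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []   # the internal grammar table

    def add_rule(self, lhs: str, rhs: str, semantics: Callable[..., Any]) -> None:
        """Mode 1: accept one rule of grammar with its semantic procedure."""
        self.rules.append((lhs, tuple(rhs.split()), semantics))

    def evaluate(self, sentence: str, goal: str = "S") -> Any:
        """Mode 2: parse, compose the semantic procedures, evaluate."""
        tokens = sentence.split()
        meaning = self._parse(goal, tokens, 0, len(tokens))
        if meaning is None:
            raise ValueError(f"not in this sublanguage: {sentence!r}")
        return meaning()

    def _is_category(self, symbol: str) -> bool:
        # A symbol is a category if some rule defines it; otherwise a literal word.
        return any(lhs == symbol for lhs, _, _ in self.rules)

    def _parse(self, cat: str, tokens: List[str], i: int, j: int) -> Optional[Thunk]:
        for lhs, rhs, sem in self.rules:
            if lhs != cat:
                continue
            children = self._match(rhs, tokens, i, j)
            if children is not None:
                # The meaning of the phrase is the rule's procedure applied
                # to the meanings of its constituents.
                return lambda sem=sem, children=children: sem(*[c() for c in children])
        return None

    def _match(self, rhs, tokens, i, j) -> Optional[List[Thunk]]:
        if not rhs:
            return [] if i == j else None
        head, rest = rhs[0], rhs[1:]
        if not self._is_category(head):   # literal word, contributes no meaning
            if i < j and tokens[i] == head:
                return self._match(rest, tokens, i + 1, j)
            return None
        for k in range(i + 1, j - len(rest) + 1):   # try every split point
            child = self._parse(head, tokens, i, k)
            if child is None:
                continue
            tail = self._match(rest, tokens, k, j)
            if tail is not None:
                return [child] + tail
        return None


# Invented data files: each word names a set of record identifiers.
FILES = {"government": {"c1", "c2", "c4"}, "contracts": {"c2", "c3", "c4"}}

ulp = LanguageProcessor()
ulp.add_rule("NP", "government", lambda: FILES["government"])
ulp.add_rule("NP", "contracts", lambda: FILES["contracts"])
ulp.add_rule("NP", "NP NP", lambda modifier, head: modifier & head)
ulp.add_rule("S", "list NP", lambda np: sorted(np))
print(ulp.evaluate("list government contracts"))    # ['c2', 'c4']

# Rules can be added at any time; the sublanguage grows without a rewrite.
ulp.add_rule("S", "count NP", lambda np: len(np))
print(ulp.evaluate("count government contracts"))   # 2
```

Because each rule is a separate unit, adding the "count" rule after sentences have already been processed requires no change to the processor itself.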

A little insight into what is going on here will be useful in understanding the power of this paradigm. Sublanguages are defined to the computer in terms of grammar rules, consisting of a syntactic aspect and an associated semantic procedure. An example of such a rule is shown in Figure 3.

Given the constituents of a meaningful phrase, for example: "government" and "contracts," the semantic procedure goes to the two associated data files and produces the "meaning" of the entire phrase: "government contracts." The role of syntax is to show how words and phrases can be combined into meaningful statements. Once the syntactic structure of a sentence is seen, the associated semantic procedures can be composed appropriately.

The rules of grammar, along with the corresponding semantic procedures, constitute the building blocks. Each of these rules is implemented as a separate unit. The syntax of a sentence provides the plan for combining these building blocks into the complex meaning of the entire sentence. Thus the individual semantic procedures can be efficiently composed in innumerable ways to produce the needed answers to immediate user concerns.

(The first question that will come to the mind of a knowledgeable computer person is the effect of such an architecture on response time. Let us deal with this immediately. In our current implementation of this architecture, against a moderate size data base concerning ships and shipping (for computational linguists, this is the well known DARPA "blue" file), and using a sizable grammar, the parsing time for the following sentence:

"What is the cargo type and destination of each ship whose port of departure was some Soviet port?"

is about a tenth of a second; the throughput time including data base access is 8 seconds. The key to these response times lies in the fact that in such very high level sublanguages, the object class data structures and processes are highly optimized, so that in processing a sentence one is composing a few highly optimized procedures.)

The first thing to notice is the implication of the independence of the grammar rules (syntax and associated semantic procedures). As said above, in building a sublanguage, rules are added one at a time. These same rule adding utilities can obviously be used at any time to add an additional rule or, for example, a whole family of rules implementing a new object class. It is these same utilities that implement the user's ability to extend his own sublanguage by definitions.

An "insider's" problem is to determine how the great number of highly complex procedures that may all be needed at some time or another can be retained in a form that makes them available for rapid response to a query. One way that has proved particularly effective is to use "pages" in peripheral memory that are organized on the basis of semantic content. In response to a particular query, only those pages that are required are brought into main memory, whether they be data base record, procedure, text, image, digitized-voice, or other pages. Pages holding all manner of material are brought into the same paging area. Obviously, procedure pages require a modicum of run time binding, but since the number of paging slots is large, there is very little thrashing of pages between main and peripheral memories. The information available to the computer is organized into a network of "nodes" and "links". The "nouns" of a sublanguage point to certain of the nodes in this semantic net. The syntax rules also have a geometric interpretation in terms of the semantic net; they indicate how to move from one set of nodes to another. Thus the parser composes the path from the words in the initial expression of a question to the nodes constituting the desired answer. The information about a node is kept on a database record on one or more pages of peripheral memory.
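
The geometric reading of syntax rules can be made concrete with a small sketch. It assumes a dictionary-based net; the ship names, link names and the function follow are invented for the illustration and are not taken from the DARPA file or from the system described here. Paging is left out entirely.

```python
from typing import Dict, Set, Tuple

# Links of a toy semantic net: (source node, link name) -> target nodes.
LINKS: Dict[Tuple[str, str], Set[str]] = {
    ("Maru", "destination"): {"Oslo"},
    ("Maru", "cargo"): {"grain"},
    ("Kiev", "destination"): {"Boston"},
    ("Kiev", "cargo"): {"oil"},
}

# The nouns of the sublanguage point to sets of nodes in the net.
NOUNS: Dict[str, Set[str]] = {"ship": {"Maru", "Kiev"}}


def follow(link: str, nodes: Set[str]) -> Set[str]:
    """Move from one set of nodes to another along a named link.

    This is one possible reading of a rule such as '<link> of <noun phrase>'.
    """
    result: Set[str] = set()
    for node in nodes:
        result |= LINKS.get((node, link), set())
    return result
```

Under these assumptions a phrase like "destination of each ship" corresponds to follow("destination", NOUNS["ship"]), and longer phrases correspond to chains of such moves.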

Organizing information in this way provides a highly efficient and flexible method for maintaining a rather shallow level of information organization (essentially equivalent to an entity-attribute database or relational database, plus inheritance). By linking such "database" records to more complex forms of representation (e.g., texts, pixel files, postscript files, engineering drawings) and by providing sophisticated semantic procedures that can exploit the additional complexities of these structures, the computer can give wide-ranging responses to highly complex technical questions. In the terminology of object oriented programming, these database records constitute the object representations for the single all-encompassing object class, "noun." Any hierarchy of subclasses of objects may be created, such as "image noun," "matrix noun," "covariance matrix noun," etc. with their associated processing procedures.

A new object class can be easily implemented as a new subclass of the "noun" object class; when an instance of the new object class is created, first its record as an instance of "noun" is created, and then a link from this record to an instance of the data structure of the new class is added. As an example, suppose one were building a new sublanguage to be used by the structural engineers in an aerospace company. Suppose the company already had a major investment in files of stress data and, say, FORTRAN procedures that processed these files. The new object class, a sub-object class of "noun," would be created whose associated data structure was that of the stress data files. Syntax rules for noun phrases that engineers commonly used in referring to the stress data would be added, their corresponding semantic procedures consisting largely of calls to the relevant FORTRAN routines. Such queries as:

"Plot the stress against wing tip loading for both Model A12 and Model A14 wing aileron designs."

would be immediately available.
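
A minimal sketch of this construction follows, assuming a record-plus-link representation. The names NounRecord, StressFile, new_stress_noun and legacy_peak_stress are hypothetical, and the legacy routine is an ordinary Python function standing in for an existing FORTRAN procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NounRecord:
    """The record every object has as an instance of 'noun'."""
    name: str
    object_class: str = "noun"
    link: Optional[object] = None   # link to the class-specific data structure


@dataclass
class StressFile:
    """Hypothetical data structure mirroring an existing stress data file."""
    loads: List[float]
    stresses: List[float]


def new_stress_noun(name: str, loads: List[float], stresses: List[float]) -> NounRecord:
    record = NounRecord(name)                   # first, the 'noun' record
    record.object_class = "stress data noun"
    record.link = StressFile(loads, stresses)   # then, the link to the new structure
    return record


def legacy_peak_stress(stresses: List[float]) -> float:
    """Stand-in for an existing FORTRAN routine; real code would call out to it."""
    return max(stresses)


def peak_stress_of(record: NounRecord) -> float:
    """Semantic procedure for a phrase such as 'peak stress of <design>'."""
    assert isinstance(record.link, StressFile)
    return legacy_peak_stress(record.link.stresses)
```

The semantic procedure does little beyond locating the linked data and handing it to the legacy routine, which is the sense in which the existing investment is preserved.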

In today's highly visual world, sublanguages are seldom limited to written text. But how can this complete integration of media be implemented? Certainly the identification of the object class with its encapsulation of structure and process is a major step. Another step concerns the extended "alphabet" available to all sublanguages. All letters and characters of the usual alphabet as well as the entire extended ASCII character set; all graphic "events," such as clicks of the mouse and movements of the cursor, and all "interrupts" from internal and external sources (properly screened and identified) can be used in the input string that is fed to the language processor. (The computer, like human beings, has "fingers" for pointing and "intonation" and "gestures" it can use.) In this respect all sublanguages have the same terminal vocabulary, namely this extended alphabet. Once this is established, grammar rules can supply the recursive, flexible link between the input string and the internal object classes. For example, one can at any time introduce a new icon, placing under it any sentence or phrase of the sublanguage which then is evaluated in line whenever the icon is clicked during input of a query.
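
One simple way to picture the icon mechanism is as substitution in the input stream. The sketch below assumes that a click arrives as a ("click", icon_name) event mixed in with typed words, and that "evaluated in line" can be approximated by splicing the stored phrase into the string passed on to the language processor; the event encoding and the function names are invented for the illustration.

```python
from typing import Dict, List, Tuple, Union

# An input event is either a typed word or a ("click", icon_name) pair.
Event = Union[str, Tuple[str, str]]

ICONS: Dict[str, str] = {}   # icon name -> phrase placed under the icon


def define_icon(name: str, phrase: str) -> None:
    """Introduce a new icon with a phrase of the sublanguage under it."""
    ICONS[name] = phrase


def expand(events: List[Event]) -> str:
    """Splice icon phrases into the stream of typed words."""
    words: List[str] = []
    for event in events:
        if isinstance(event, tuple) and event[0] == "click":
            words.extend(ICONS[event[1]].split())
        else:
            words.append(event)
    return " ".join(words)
```

For instance, after define_icon("gov", "government contracts"), the events ["list", ("click", "gov")] yield the same input string as typing the phrase in full.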

In Figure 4, an airline mechanic is seen working on the radar nose cone of a Boeing 747 aircraft. He turns to his computer for detailed technical support. He has already entered information identifying the particular aircraft he is working on, and has called for a display of the nose cone area. The computer generated photo image of the relevant area (plus an invisible back-plane drawing outlining all significant parts) provides a highly efficient medium for communication. For example, he may type "leak" and click his mouse on the image of the place he suspects is leaking oil. The computer may respond with the spoken word: "tighten" and blink the bolt it identified in its diagnosis as the probable cause of the leak. In response to a sparsely stated but technically involved question, the mechanic receives an immediately useful response that reflects a high degree of built-in understanding.

In Figure 5, a maintenance professional is entering instructions into his personal, completely mobile, telephone-computer. It eliminates any need for the usual truckful of manuals. The professional's efficiency is greatly increased, since the computer tailors its responses to the specific installation. Astute use of hyper-media links from one data display to another quickly provides pathways to the details the professional really needs. References that establish context (e.g., "I am at ") as well as pronouns and elliptic constructions (e.g., "What about the other connector?") play important roles in effective dialogue. Note that pointing to and blinking significant areas in pictures and drawings constitutes visual "pronouns" (e.g., "voltage 'there'?" or "tighten 'that'", "[show schematic icon] of 'that'").

5. The Creation and Basing of Sublanguages

The typical industrial manager will have many sublanguages, for example:

Underlying each of these, and a part of every sublanguage, are the general dialect of the manager's natural language, a complete graphics package, text editor, electronic mail, voice messaging, etc. Once he has chosen to use any one of his sublanguages, all of these services will be immediately available; the manager will not be aware of which service a phrase of his query may have invoked as he proceeds in his normal way: "Send this draft budget to my section managers with the following message: '...(voice)...'"; "Schedule a meeting with them sometime on Wednesday afternoon."

How are sublanguages created? Initially, there is one sublanguage, BASE. It contains a limited dialect of English which is adequate to handle expressions concerning typical relational or entity-attribute data bases with inheritance. It also contains a graphics package, text editor, electronic mail, etc., as mentioned above. To create a new sublanguage, say "Finances," one "bases" it on BASE:  base Finances on BASE, or, for that matter, on any pre-existing sublanguage that may be available. Then, choosing this new sublanguage:  enter Finances, one has all the capabilities of the based upon sublanguage immediately available. One can then extend this new sublanguage in many ways (these will be discussed below).

In Figure 6, engineering manager E. D. Moore creates a sublanguage to share with his three subordinate managers. Now any of the four of them can use, modify and extend this common sublanguage "EngScc." Thus they jointly maintain a common, up to date view of their joint activities (e.g., preliminary designs, personal schedules). This is the significance of being able to "enter."

There is a strong asymmetric relationship between a sublanguage and all of the sublanguages on which it is based, either directly or indirectly. Suppose one sublanguage, "Accounting," is based on another, "Personnel Accounting": base Accounting on Personnel Accounting. Any changes in Personnel Accounting are immediately reflected in Accounting; however, Accounting can be changed in any way without affecting Personnel Accounting at all. This asymmetric relationship is characteristic of basing.
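
The asymmetry of basing resembles a layered lookup, which can be sketched with Python's collections.ChainMap: reads fall through to the underlying layers, while writes touch only the topmost one. This is an analogy under stated assumptions, not the system's mechanism; the registry SUBLANGUAGES, the function base, and the "headcount" entry are invented, and a sublanguage is reduced here to a mapping of named values.

```python
from collections import ChainMap
from typing import Dict

# Registry of sublanguages; initially only BASE exists.
SUBLANGUAGES: Dict[str, ChainMap] = {"BASE": ChainMap({})}


def base(new: str, *on: str) -> ChainMap:
    """'base <new> on <on...>': a fresh local layer over existing sublanguages."""
    parents = [SUBLANGUAGES[name] for name in on]
    SUBLANGUAGES[new] = ChainMap({}, *parents)
    return SUBLANGUAGES[new]


personnel = base("Personnel Accounting", "BASE")
personnel["headcount"] = 120

accounting = base("Accounting", "Personnel Accounting")

personnel["headcount"] = 125                  # a change in the underlying sublanguage
seen_from_accounting = accounting["headcount"]   # is visible from above at once

accounting["headcount"] = 200                 # a change in the based sublanguage
still_in_personnel = personnel["headcount"]   # leaves the underlying one untouched
```

A sublanguage based on several others, such as one based on both Accounting and Production, corresponds to passing more than one name to base.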

 

In Figure 7., showing the accounting sublanguage, the people in the Personnel Accounting Section are the only ones who are authorized to enter the Personnel Accounting sublanguage; therefore they are the only ones who can change it. The same holds for the Contracts Accounting and General Accounting sublanguages. Accounting is based on each of these three. No one is authorized to enter Accounting; therefore no one can make any changes in it. Of course, it is automatically always up to date with the latest data from Personnel Accounting, Contracts Accounting and General Accounting.

Appropriate managers are authorized to base on Accounting. One of the Department Manager's sublanguages is based on both Accounting and Production, and therefore always has available the very latest accounting information. The manager may well have had the application programmers add a number of grammar rules, graphic output formats, and icons so that overviews of the complete operation are always readily available. These added facilities would only be available in this particular sublanguage, but would always utilize the latest accounting and production data. A member of the manager's staff looking into a possible change in the pricing structure for company products could also base a staff study sublanguage on Accounting, change many of the entries to values reflecting the new pricing structure, then examine the inferred results, and finally arrange appropriate graphics for a presentation (without, of course, affecting the Accounting sublanguage at all). In Figure 8., placing one sublanguage above another indicates that the top one has been based upon the one immediately below.

6. Networking in the Telephone-Computer Era

Local area networks (LANs) are a very transient aspect of our computer environment. Once the Integrated Services Digital Network (ISDN), Broadband ISDN and packet switching are fully implemented and installed, file transfers will be sufficiently fast to provide current LAN services between any two telephone-computers. At that time, of course, telephone-computers will have telephone numbers; indeed, they will be the terminal equipment represented by one's telephone number. Simpler forms of telephone communication such as voice, e-mail, fax and electronic messaging will be subsumed within a much broader spectrum of telecommunication services mediated by the telephone-computer.

When one telephone-computer calls another, pairs of windows are created; one window of each pair appears on the monitor of each computer. One pair of windows is controlled by each participant. The participant controlling a given pair of windows can enter any of his sublanguages; then any of the wide-ranging data and graphics thereby available can be displayed on both windows of the pair. If both participants are authorized to enter the given sublanguage, then both can address this sublanguage jointly.

For example, in Figure 9., a banker calls Bob Moore, one of his clients. The banker shows Moore a bar graph of the changes in Moore's DetEd stock over the last quarter. Moore asks what happened during the previous quarter, and the banker immediately instructs the computer to bring this information forward. The banker shows Moore the implications of the changes he is advocating, and Moore's questions are quickly answered with the supporting data displayed by the computer. A change is made, and its effect is immediately seen by both parties. Once a transaction is decided upon, it is immediately implemented by means of a transaction transfer to the appropriate stock exchange.

We have already shown how a manager can create a sublanguage, and then authorize his subordinates to be able to enter it, thus creating a common sublanguage. A sublanguage on one telephone-computer can be entered from a different telephone-computer provided the requesting person has been authorized.

The combination of basing with the single, worldwide telephone network adds up to a powerful set of capabilities. To see the significance of basing, consider the matter of virtual address space. In current practice, each computer has its own private virtual address space. Data and processes in other files must be brought into this virtual address space by bulk transfer before they can be utilized in calculations. This two-level addressing constitutes a major barrier to the expanded uses of computers, in particular to the integration of information resources relevant to the changing needs of applications from the vast resources that are available. That is not how "addressable memory" is utilized in our daily life. Our addressable memory provides an encyclopedic array of vast amounts of data and process, while only a minute fraction is involved in our immediate considerations.

In the telephone-computer age, there is just one common virtual address space, down to the byte level, for all telephone-computers and for all these resources. This address space is blocked into pages, which are also the packets that move over the telephone lines. International agreements long ago created a universal address standard, namely the telephone number. When one installs a new telephone and is assigned a new telephone number, one is thereby automatically assigned one's own slice of "virtual memory," one's own corner of the world's address space. Companies that maintain huge data and powerful processing resources have the whole world as their market because they are uniquely identified and reachable through their telephone number. It remains to provide the means of establishing the addressability to these resources. That is the role of basing.

Basing one sublanguage on another establishes addressability among the data and process pages that constitute the physical manifestation of the sublanguage. Basing results in the sharing of a common address space whose key element is the addressable page. Therefore, pages whose "home" is on one station are comingled with pages from another. This results in a common address space (down to the byte level) across the entire hierarchy of associated sublanguages. Thus it is the sublanguages in such a hierarchy that are networked, not computers. Such a network may be small and of short duration, servicing an immediate problem. Other networks may be large and stable, hosting an extensive hierarchy of sublanguages. Such "networks" are created, and deleted, by the simple acts of basing and unbasing.

The banker has formed such a network with the various stock exchanges and commodity markets (these being archival stations, discussed below). Since the banker shares an address space and sublanguage with these markets, transactions can be completed as a natural aspect of any dialogue with a client.

Here again is the maintenance professional working on the nose cone radar of a Boeing 747 aircraft in Figure 10., his computer by his side. Consider the networking aspects of this maintenance situation. The computer is networked, in the strong sense of sharing address space, with the Boeing maintenance shops in Seattle, Washington. That is, a sublanguage in this computer is based on the Boeing Maintenance sublanguage in the Seattle shops. None of the maintenance material is in the computer being used by the maintenance person. First, the identification of the aircraft being serviced is established. In response to a call for a full color annotated image of the radar nose cone area, the pixel data sets come, via ISDN, from a single source: the high transaction rate server in Seattle. Although some processing is being done locally, all maintenance data is held, and all diagnostic analysis is done, in Seattle. The bane of having out-of-date maintenance manuals will be a thing of the past. The addressable page is both the unit of storage and the packet of telecommunication. If the maintenance professional is still puzzled, clicking the mouse on a special icon will establish an immediate conversation with a maintenance specialist at Boeing, Seattle. Both monitors will display the same material, both people can use their mouse to point, and both can have a voice discussion of the problem at hand.

Figure 11. illustrates the point that all of the database records, semantic procedures, utility routines, pixel data sets, etc. are stored as pages, and that only the necessary pages are brought into main memory. We see that by using basing and networking, many if not most of these pages will be drawn from distant stations at the time they are needed.
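The idea that only the necessary pages are brought into main memory, and that they come from their home station when first needed, can be sketched as follows. This is an illustration under stated assumptions, not the system's paging code: the classes `Station` and `MainMemory`, the telephone number 1001, and the page contents are all made up.

```python
class Station:
    """Hypothetical remote station that holds pages in its peripheral memory."""

    def __init__(self, pages):
        self.pages = dict(pages)      # page number -> page contents
        self.requests_served = 0

    def serve(self, page_number):
        self.requests_served += 1
        return self.pages[page_number]


class MainMemory:
    """Local main memory that fetches a page only when it is first touched."""

    def __init__(self, network):
        self.network = network        # telephone number -> Station
        self.resident = {}            # (telephone number, page number) -> contents

    def read(self, phone, page_number):
        key = (phone, page_number)
        if key not in self.resident:  # page fault: ask the page's home station
            self.resident[key] = self.network[phone].serve(page_number)
        return self.resident[key]


seattle = Station({1: "nose cone image", 2: "diagnostic table", 3: "unused manual page"})
memory = MainMemory({1001: seattle})   # 1001 is a made-up telephone number

first = memory.read(1001, 1)
again = memory.read(1001, 1)           # already resident, so no second request is sent
print(first, seattle.requests_served, sorted(memory.resident))
```

Pages 2 and 3 never leave the remote station in this run, because nothing in the local computation touched them.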

Each individual item of information in a network of sublanguages has its unique address by which it is identified. The stations whose peripheral memories are the depositories for these items are uniquely addressable by their telephone numbers. Note that it is the hierarchies of sublanguages, with their associated common address space, that are networked, and not the computers. Thus, within a single computer, a person may have many sublanguages, each in its own network. Sublanguages, not computers, are networked. Sublanguages have no geographic limits.

In current data base technologies, the database consists largely of links from one node of the data base to another. Each "link" in our semantic net data base consists of a page address and a byte offset. Thus for a network of more than one telephone-computer, the links within a data base may refer to pages that reside anywhere in the net, anywhere in the world. A data base in such a network is "distributed" in an intrinsic way; the basing procedure implements this. The size of a page is 2048 bytes.

The 64-bit page address has the following structure:

Field                    Width     Capacity
byte offset on a page    11 bits   2^11 bytes per page
page number              21 bits   2^21 pages per telephone-computer
telephone number         32 bits   2^32 telephone-computers

Thus up to four billion telephone-computers can be accommodated. Each telephone-computer may hold two million pages, that is, four billion bytes of information. Thus the bound on directly addressable data and processes, the size of the world's single virtual memory, is on the order of 10^19 bytes (2^64).
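The arithmetic behind these bounds can be checked with a short sketch, assuming the three fields are 11, 21 and 32 bits wide. The order of the fields within the 64-bit word is not stated in the text; placing the telephone number in the high bits is an assumption, and the function names and sample values are hypothetical.

```python
OFFSET_BITS = 11   # byte offset on a page
PAGE_BITS = 21     # page number within one telephone-computer
PHONE_BITS = 32    # telephone number


def pack_address(phone, page, offset):
    """Pack the three fields into one 64-bit integer.

    Placing the telephone number in the high bits is an assumption.
    """
    if not 0 <= phone < 2 ** PHONE_BITS:
        raise ValueError("telephone number out of range")
    if not 0 <= page < 2 ** PAGE_BITS:
        raise ValueError("page number out of range")
    if not 0 <= offset < 2 ** OFFSET_BITS:
        raise ValueError("byte offset out of range")
    return (phone << (PAGE_BITS + OFFSET_BITS)) | (page << OFFSET_BITS) | offset


def unpack_address(address):
    """Recover (phone, page, offset) from a packed 64-bit address."""
    offset = address & (2 ** OFFSET_BITS - 1)
    page = (address >> OFFSET_BITS) & (2 ** PAGE_BITS - 1)
    phone = address >> (PAGE_BITS + OFFSET_BITS)
    return phone, page, offset


page_size = 2 ** OFFSET_BITS                                # bytes per page
bytes_per_station = 2 ** (PAGE_BITS + OFFSET_BITS)          # bytes per telephone-computer
total_bytes = 2 ** (PHONE_BITS + PAGE_BITS + OFFSET_BITS)   # whole address space

address = pack_address(phone=5550142, page=7, offset=100)   # made-up sample values
print(unpack_address(address), page_size, bytes_per_station, total_bytes)
```

With these widths a page holds 2048 bytes, one telephone-computer addresses 2^32 bytes (about four billion), and the whole space is 2^64 bytes, roughly 1.8 x 10^19.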

Figure 12. shows a schematic of a typical sublanguage network. The inclusion in these networks of large volume servers of archival information will be typical. We have seen this need in maintenance situations. The resources of companies like Mead Data Central in Dayton and ISML (a Houston-based supplier of scientific subroutines), Springer-Verlag's Beilsteins Handbuch der Organischen Chemie, cookbooks and garden catalogues, the New York Stock Exchange stock closings and the show records of the American Kennel Club will all be available, page by page as needed by thousands of users. A department store, serving several hundred thousand charge customers, will do so through a high transaction server that makes available all manner of sales material, processes incoming queries and orders, and connects customers to knowledgeable sales personnel.

Here is the maintenance professional working for a local service company (Figure 13.). The professional's mobile station is in at least two networks. The first, previously illustrated, is with the home maintenance station whose server has all records for the field locations served and all the maintenance information. The second is the dispatcher network. The central dispatcher can see on the map displayed before him the location of all maintenance trucks as they move about the city. When the dispatcher receives a call requesting maintenance service, he can type in the address of the caller and immediately see its location on the map, spot the nearest maintenance unit, click it with his mouse, and talk with the maintenance person directly to coordinate the new service.

 

7. All of the World's Information

What needs to be addressed? The obvious answer is anything that may at some time be an object of attention in some sublanguage. Each object class will have its own grain and often its own special means for the identification of its relevant elements.

It is interesting to note that many, if not the majority, of objects referred to by sublanguages do not have "names" in the lexicon. Consider the marriage of Edward D. Moore and Patricia Jones Moore. Friends of the Moores often refer to their marriage, for example, "her daughter by her second marriage," but it does not have a name. "Texas Instruments FB74 transistor" may name a class of its instances as they exist in a great variety of circuits, or alternatively, of its instances in circuit drawings (as subdrawings). In either case they will inherit the properties (such as impedance) of the FB74 transistor class. In the latter case, one can identify the particular transistor either by "the FB74 transistor used in the monopole section of the shift register" or by pointing at the transistor and clicking the mouse while viewing the circuit diagram.

How is the addressability problem for all of the world's information solved? We identify the notion of the Archival Station. You call up a station that supplies information you wish to be accessible to one of your sublanguages. A form appears on your monitor and you are asked to fill it out. After doing so, you put your charge card in the card reader slot in your telephone-computer. You are then free to base your sublanguage on any material of interest to you and available from this source. Your initiating call to the Archival Station provides it with all it needs for billing, notification, etc. The act of basing automatically identifies authorization information necessary for security. The Archival Station only infrequently initiates a call to you. Your computer calls it, requesting a page. Since the request is sent as one of your own pages, it carries not only the requested page address but also the return address. Thus it carries all the information the Archival Station needs to identify you and your account and provide you with the information you need. In this manner, the Archival Station information resource sublanguage can be "in" as many "networks" as there are clients who wish to have its resources available. Since clients' sublanguages are based upon this resource, it itself is protected from change. Billing services for the use of these pages are handled by the telephone company in the manner of "900" numbers today.
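The request mechanism described above, in which each page request carries both the requested address and the return address, might be sketched as follows. The names `PageRequest` and `ArchivalStation`, the per-page tally, and the sample data are illustrative assumptions; actual billing is handled by the telephone company and is not modelled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    requested_page: int   # page wanted from the archival station
    return_phone: int     # requester's telephone number, carried by the request itself


class ArchivalStation:
    """Hypothetical archival station: serves pages and tallies use per client."""

    def __init__(self, pages):
        self._pages = dict(pages)     # clients only read these pages, never change them
        self.pages_served = Counter() # return telephone number -> pages delivered

    def handle(self, request):
        page = self._pages[request.requested_page]
        # The return address alone identifies the client and the account.
        self.pages_served[request.return_phone] += 1
        return page


archive = ArchivalStation({10: "stock closings", 11: "show records"})
answer = archive.handle(PageRequest(requested_page=10, return_phone=5550142))
archive.handle(PageRequest(requested_page=11, return_phone=5550142))
archive.handle(PageRequest(requested_page=10, return_phone=5550199))

print(answer, dict(archive.pages_served))
```

Because every request identifies its sender, the same station can serve any number of client networks without keeping a connection open to any of them.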

All telephone-computer users in each of their sublanguages have complete freedom to choose and base on whatever information resources they desire, paying for accesses to only those pages required in the course of their processing. Furthermore, this information does not come as isolated, independent displays (as, for example, in the French Minitel System). Any number of such resources may be integrated in response to a single user query in a single sublanguage. This may be a sublanguage a telephone-computer user has developed in conjunction with one of his personal interests, having personally selected the several information resources it has been based upon. The processes of adding such resources and of extending and modifying the sublanguage and its data in many ways then become just part of normal day-to-day activities.

8. The New World of Computing Applications Development Environment

The most serious problem facing the computer industry today is the prohibitive cost of software development. Without a major improvement in this area, the telephone-computer will remain underutilized. So what is the software development environment for application programming in the era of the telephone-computer?

Each sublanguage has an associated meta-sublanguage. A sublanguage expresses the way a person views an area of interest; a meta-sublanguage has as its focus the associated sublanguage itself. This meta-sublanguage is the proper software development environment of the applications programmer. In this environment, he can extend the vocabulary and grammar rules of his user's sublanguage to encompass the idiosyncratic expressions of the user's domain, construct the utilities needed for efficiently building and maintaining the semantics of these expressions, and create the new object classes, with their data structures and processes, that bring the necessary efficiency to the computer's support of user needs. The result is a programming environment that is domain specific and highly efficient.

The same language processor that handles the "English" sentences for the client also handles the "Pascal" or "C" statements for the applications programmer. So it is a straightforward matter to give the meta-sublanguage access to the lexicon and grammar table of the sublanguage. When a new rule of grammar is added, the name of the semantic procedure is put into the meta-sublanguage's lexicon, linking to both the source code file and the paged object code.

When a sublanguage L2 is based on a sublanguage L1, the basing process also creates a meta-sublanguage meta-L2, based on meta-L1. All sublanguages are ultimately based on BASE; thus, all meta-sublanguages are ultimately based on Meta-BASE. Meta-BASE contains a rich programming environment: trace, breakpoints, and a complete spectrum of programming and debugging tools. Further, it knows all about the associated user's sublanguage. The client sublanguage's symbol map and grammar table, source code for its semantic procedures, etc. are all available to the applications programmer through the efficient syntax of the meta-sublanguage inherited through basing on Meta-BASE.

In the example in Figure 14, both the end user and the application programmer are referring to the British Star; however, their respective interests are markedly different.

The applications programmer maintains and extends the client's sublanguage by dealing directly with the syntax rules and associated semantic procedures of the sublanguage. To add a new capability for the client, the programmer first enters the client's sublanguage, types "metalanguage" to enter the meta-sublanguage, and then types "RULE." The meta-sublanguage responds with the prompt "SYNTAX" and the programmer adds the syntax for the new grammar rule. If this includes a new part of speech not recognized by the sublanguage, it is automatically added to the appropriate table. The meta-sublanguage then prompts for the semantic procedure, which is then programmed directly on line. When the procedure has been completed, the system:

  1. compiles the procedure
  2. links the result to the resident code
  3. puts the result on a page
  4. puts the syntax into the grammar table linked to this page

Note that linking is done only with the relatively small resident code. The programmer types "return" and is back in the client's sublanguage, with all the client's data, grammar and previously added extensions, and can immediately try out the new rule in this actual client environment. Again entering the meta-sublanguage, the programmer can edit the procedure, and iterate. One final iteration, for removing remaining debug material, and the programmer can call the client, saying that the new capability is available, and giving a concrete illustration of how it can be used on the coupled windows.
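The rule-adding cycle can be mimicked in a few lines. The sketch below is only an analogy: it uses Python's built-in `compile` and `exec` in place of the system's compiler, a regular expression in place of a grammar rule, and it omits the linking step. The class `ToySublanguage`, the sample rule, and the procedure `total` are hypothetical.

```python
import re


class ToySublanguage:
    """Toy grammar table: each syntax pattern is linked to a page holding a procedure."""

    def __init__(self):
        self.pages = {}     # page number -> compiled semantic procedure
        self.grammar = []   # list of (compiled syntax pattern, page number)

    def add_rule(self, syntax, source, proc_name):
        namespace = {}
        # Compile the procedure from its source text.
        exec(compile(source, proc_name, "exec"), namespace)
        # Put the result on a new page.
        page_number = len(self.pages)
        self.pages[page_number] = namespace[proc_name]
        # Put the syntax into the grammar table, linked to that page.
        self.grammar.append((re.compile(syntax), page_number))

    def ask(self, sentence):
        # Try each rule in turn; the first whose syntax matches handles the sentence.
        for pattern, page_number in self.grammar:
            match = pattern.fullmatch(sentence)
            if match:
                return self.pages[page_number](*match.groups())
        raise ValueError("no rule matches: " + sentence)


client = ToySublanguage()
client.add_rule(
    syntax=r"total of (\d+) and (\d+)",
    source="def total(a, b):\n    return int(a) + int(b)\n",
    proc_name="total",
)
result = client.ask("total of 2 and 3")
print(result)
```

The new rule is usable immediately after `add_rule` returns, which mirrors the point that the programmer can try out the rule in the client's actual environment without rebuilding anything else.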

In Figure 15., the Trust Officer of a bank has called the applications programmer to request a new analysis procedure, "the ABC value," to be applied to various equities. One new rule of grammar with semantic procedure defining the notion of the ABC value is all that the programmer need add. It then can immediately be used by the banker in far ranging queries.

Just as any given sublanguage can be extended by adding new syntax rules and their associated semantic procedures, its meta-sublanguage can be extended by adding syntax and semantics appropriate to the context of the applications programmer. The applications programmer can extend his own domain specific meta-sublanguage by first typing "METARULE", and then proceeding as before when adding a "RULE". The programmer can indicate any convenient syntax for calling this new procedure, including of course the standard functional notation. A new utility procedure can be added by typing "PROC"; he will then be prompted to write, on line, the program for the utility. For example, in the meta-accounting sublanguage, the programmer may add the procedure:

"update_col_totals(ledger, change_amount)"

which can subsequently be used in semantic procedures either by the programmer or by an applications programmer who has purchased the accounting package. If a programmer wants to use an abbreviation for an often used sequence of code (a "macro" in the programming sense), then "MACRO" is used in place of "RULE" or "PROC".

We see in Figure 16. that the BASE sublanguage has been extended to an Accounting Sublanguage. The Accounting Sublanguage contains all the terms and syntax that are commonly used in the accounting world. For example, it may include an object class of procedures and data structures for double-entry bookkeeping. Meta-accounting includes an extensive family of accounting utilities and a convenient syntax for using them. Thus, a meta-sublanguage becomes highly domain knowledgeable. When carefully crafted, it can have the look and feel of a specification language, yet after macro expansion and compilation, produce efficient object code.

Such a sublanguage/meta-sublanguage may be marketed by a software firm specializing in accounting. The source code for the semantic procedures will not, of course, be shipped with the package. The ABC Co. purchases this package, and their small group of application programmers tailor the ABC Co. Accounting sublanguage to the special practices and jargon of their own accountants. They lose nothing of their company's preexisting capabilities; they simply add rules that embed their own existing files and special procedures in the "accounting English" that the package provides.

Now the whole panorama of the sublanguage hierarchy begins to come into focus. Each office of the enterprise has its own sublanguages, which may be related to each other through basing. Many of these sublanguages will also be based on other information resources, including archival stations. Many will have been extended to include domain specific application packages. But this is only half the picture.

9. Toward an Efficient Organization of the Software and Data Provider Industry

A sublanguage does not have to be a complete, self-contained "language." It may, for example, consist of only those syntax rules and associated semantic procedures that are the tangible implementation of a new object class. By identifying the object class by a new, otherwise unused part of speech, the encapsulation of the processes and data of the object class is achieved. Since this new object class is automatically made a subclass of "noun," the objects of this class satisfy the already existing syntax of "English" and thereby link to other appropriately related object classes in a natural way.

To visualize this other half of the picture, consider how easy it is to change a significant part of a sublanguage, in contrast to the difficulties this would entail in current systems. One rather narrow facility that almost all sublanguages would want to include is a graphics package. A graphics package exists now, consisting of syntax rules, semantic procedures, and utility procedures, and these in turn can call appropriate PostScript operators. The package only supports two-dimensional drawings. Suppose some software house that generally works close to the UNIX or DOS level produces a considerably improved graphics package that supports three-dimensional drawings. The applications programmer in any applied software shop, say one specializing in spread-sheet packages, can go into the meta-sublanguage of any appropriate sublanguage and type:

"delete from file:"

followed by the name of the file containing the current graphics sublanguage syntax, then after having mounted the new graphics system in the floppy disk drive, type:

"extend from disk."

The result is the replacement of one graphics package by the new, updated one. Everything else remains the same; no data is disturbed, no relinking of the whole huge system is required. And when the new spread-sheet package is purchased, the client will be able to analyze and modify their preexisting database along any three selected attributes in a beautiful three dimensional display.
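A minimal sketch of this kind of package replacement follows, assuming that each rule in the grammar table remembers which package supplied it. The class `GrammarTable`, its `extend_from` and `delete_from` methods, and the sample rules are hypothetical stand-ins for the "delete from file:" and "extend from disk." commands.

```python
class GrammarTable:
    """Toy grammar table in which every rule records the package it came from."""

    def __init__(self):
        self.rules = {}   # syntax -> (package name, semantic procedure)

    def extend_from(self, package_name, rules):
        # Add every rule of a package, tagged with the package name.
        for syntax, procedure in rules.items():
            self.rules[syntax] = (package_name, procedure)

    def delete_from(self, package_name):
        # Remove only the rules that this package supplied.
        self.rules = {
            syntax: entry
            for syntax, entry in self.rules.items()
            if entry[0] != package_name
        }

    def call(self, syntax, *args):
        return self.rules[syntax][1](*args)


table = GrammarTable()
table.extend_from("spreadsheet", {"sum": lambda values: sum(values)})
table.extend_from("graphics-2d", {"plot": lambda points: "2-D plot of %d points" % len(points)})

before = table.call("plot", [(0, 0), (1, 1)])

table.delete_from("graphics-2d")
table.extend_from("graphics-3d", {"plot": lambda points: "3-D plot of %d points" % len(points)})

after = table.call("plot", [(0, 0, 0), (1, 1, 1)])
untouched = table.call("sum", [1, 2, 3])
print(before, "|", after, "|", untouched)
```

Only the graphics rules change hands; the spreadsheet rule and its data are never touched, which is the property the text emphasizes.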

The door is now open for a vast proliferation of software development at all levels, a layering of development levels, a multiple branching hierarchy of specialization. It is just such a hierarchy that characterizes the textile industry, the automotive maintenance industry and, indeed, today's computer hardware business. The monolithic, application-independent "tools" approach that so dominates the software industry today finds its proper niche in the lower levels within this far broader perspective.

Near the top of this hierarchy, shown in Figure 17., software houses that consider themselves as specialists in their client's domain rather than in computer software can succeed in small, specialized markets because their development environment is already highly specific to their needs. The result is the drastic reduction of software development costs. This opening of the competitive market to the innovative specialist will produce sublanguages that can be fully understood by the telephone-computer and imbue it with a really penetrating understanding of the subject domain.

Let us suppose that some software house that specializes in the graphic aspects of merchandising has produced a package, including television camera, for building catalogues. A women's apparel shop has purchased this package and is seen in Figure 18. preparing a Spring Sales catalogue. How is the catalogue material to be connected to the rest of the apparel shop's information system? The software house assumed correctly that video sequences and other catalogue material would be directly related to items of merchandise offered by their clients, and that the names of these items would already be in the client's lexicon as noun phrases. This was enough; what other links from these items (inventory levels, prices, etc.) are maintained by the stores is immaterial to the catalogue building package. So when the store personnel are in the midst of preparing a catalogue, and the catalogue package asks for the description of the merchandise item to be associated with a particular graphic, the store personnel reply using their store's item number. The catalogue package procedure then links the graphics to this item record in the database; it is, therefore, indirectly linked to all non-catalogue aspects of store management.

When a customer, perusing the catalogue over the telephone-computer, clicks the mouse on a graphic and types:

"Send me one of those in size 14, color blue."

the following occurs:

  1. The pronoun "me" is assumed to be the person on the phone to the store's telephone-computer, who is therefore identifiable from the page address of the page sent to the store containing the request; from this the store, using a standard utility procedure, goes to the customer's telephone-computer for the number of the customer's credit card (entered by the customer in the computer slot as a part of logging in) which identifies the customer's account, credit rating, and other information.
  2. The syntax of the catalogue package, through a graphics package the software house had based on, includes the RULE: <noun phrase> => "one of those" <click>. The semantics of this rule retrieves the item number associated by the catalogue package with this graphic.
  3. The store's order-processing package, purchased from a different software house, includes the verb "send" with the associated semantic procedure that carries out the appropriate processing of the whole transaction.
  4. The underlying English includes all of the other rules necessary to complete the parsing, and therefore the processing of the request.
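The four steps above can be sketched as the cooperation of independently supplied pieces. The sketch is a simplification under stated assumptions: a single regular expression stands in for the underlying English grammar, and the graphic identifier, telephone number, and function names are hypothetical. The item number and customer name are those used in the example reply.

```python
import re

# Catalogue package: graphic identifier -> item number (hypothetical graphic id).
GRAPHIC_TO_ITEM = {"graphic-17": "08249"}

# Store records keyed by the telephone number found in the request's return address
# (the telephone number is made up).
CUSTOMERS = {5550142: "Mrs. Smith"}


def resolve_me(return_phone):
    # Step 1: "me" is whoever sent the request page.
    return CUSTOMERS[return_phone]


def resolve_one_of_those(clicked_graphic):
    # Step 2: "one of those" <click> yields the item linked to the graphic.
    return GRAPHIC_TO_ITEM[clicked_graphic]


def send(customer, item, size, color):
    # Step 3: the order-processing package supplies the verb "send".
    return {"customer": customer, "item": item, "size": size, "color": color}


def handle(sentence, clicked_graphic, return_phone):
    # Step 4: the underlying grammar ties the pieces together (here, one pattern).
    match = re.fullmatch(r"Send me one of those in size (\d+), color (\w+)\.", sentence)
    if match is None:
        raise ValueError("sentence not understood")
    return send(
        resolve_me(return_phone),
        resolve_one_of_those(clicked_graphic),
        int(match.group(1)),
        match.group(2),
    )


order = handle("Send me one of those in size 14, color blue.", "graphic-17", 5550142)
print(order)
```

Each function stands for a package bought from a different supplier; none of them needs to know how the others are implemented, only that the item number and the noun phrase are shared.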

The customer's telephone-computer displays the text:

"Mrs. Smith, we are pleased to send you a Calvin Klein dress, catalogue item 08249, size 14, color blue. Your account will be billed for $124.99 plus $7.50 shipping and tax, for a total of $132.49. You can expect delivery in 3 to 5 days. Thank you for your order."

As another example, a furniture store has purchased a software plus equipment package that helps customers visualize potential purchases. In Figure 19., the sales person has put the floor plan of the customer's room on the scanner (seen in the background). As the customer selects furniture items, their identifying numbers are typed in and the mouse is clicked at the appropriate site on the floor plan. The package then displays, perhaps on a giant, room-scaled screen, from any viewpoint, how the room will look. Colors, fabrics, furniture orientation, etc. can be changed and the immediate effect seen.

The scanner package, which identifies the inputs to the ultimate graphic process, would be rather easy to implement using the standard graphics package of the BASE sublanguage. The output graphics pose a considerably more difficult problem. How can the cost of development of such sophisticated software be amortized? Inputs would probably be pictures of each item taken from a prescribed set of angles. The output would likely use ray tracing to get the shadows and textures just right. This package, however, would contain no other aspects, whether it be furniture, automobiles, microscopic pictures of tissue, a "blowup" schematic of the nose cone of an airplane, or whatever. The specialists building such packages would not need to know anything about accounting or linguistics. Their deliverable product would likely be a chip, together with grammar rules and procedures for extending a meta-sublanguage rather than a sublanguage. Providing the result as meta-sublanguage syntax gives the applications programmers that use it a great deal of flexibility in the way their user interfaces can be designed, and ease in using this flexibility. Thus, the resulting package can conveniently be tied in with other packages (such as the scanner package in the application cited here, a catalogue-building package, a medical lab package or one used in developing the graphics for the Boeing maintenance package used by the airplane mechanic who was shown working on the nose cone). This means that the high development cost of such a package could be distributed over a wide market comprised of software houses closer to applications, and labs with their own strong programming groups, each having its own distinct clientele (like the furniture store).

Many exciting capabilities are being demonstrated in academic and industrial laboratories. However, in today's software development environments, the cost is prohibitively high, and it will be a long time before these capabilities will find their way into any but military and space applications. The software development environment presented here sharply reverses that trend by supplying a simple, linguistic linking at the meta-sublanguage level.

10. The Vision and the Realization

So here is our paradigm: It is the constant re-evaluation and adjustment of the relevant view that characterizes human information processing. A succinct expression of this view is a sublanguage which is describable as a formal syntax and denotational semantics. When the basic conceptual structure of a task environment is changing only slowly, a sublanguage can be established that characterizes the task and the considerations relevant to it. A computer can be programmed to understand this sublanguage, providing a natural computer adjunct in accomplishing that task.

Programming languages are the proper sublanguages of system programmers, and their very limited application is the implementation of operating systems and language processors. Possibly, it is only lack of a clear and relevant paradigm that has delayed even applications programmers from having an appropriate way to communicate with the computer. It is now time for the rest of us to be provided with the sublanguages appropriate to our concerns, sublanguages that are natural and efficient for our communications with the computer and for our communications with each other through the computer. Then the computer will find its appropriate niche as a medium of communication, tying people, information resources and processing power together into the efficient focus of our appropriate sublanguage. This is our vision. However, vision alone will not provide the better answer to Dr. Lucky's question. It is a sad commentary on this industry that the many visions, the promises, remain largely just that: visions and promises. The Apple Computer film "Knowledge Navigator" offers no clue for bringing that vision into being.

How do we realize our vision? How do we solve the many technological problems that lie in the path of such a development?

The problems now faced by the software industry are clear and indeed are recognized by that industry; the first of these is by far the most stringent:

  1. The high cost of software development.
  2. The integration of multiple media, of data, and of geographically dispersed people and resources into a coherent user interface.
  3. The computer's current inability to understand references to internal contents of files — to know what we are pointing at in a picture, to be able to answer questions using a table of data from a journal article, or to know to inform others of a change in an item of data.
  4. The difficulty in getting any single relevant item from all the world's information without being blocked by the enormous barriers of ambiguity, volume, and non-focused indexing.

We have faced these problems and have concluded that current software development practices, built as they are on the minimalist philosophy and the current perspective of software engineering, are not conducive to solving these problems. Object-oriented programming is a major step in the correct direction, but is insufficient because it does not deal with the adverse effects of the isolated, monolithic system that is the hallmark of the minimalist and software engineering views. Further basic changes of approach are required.

In seeking a solution to these problems, we have first put forward a clear paradigm: The sublanguage is the proper focus of software development. From the vantage point of this paradigm follow three radical changes in software system architecture:

  1. A single, grammar-driven language processor that includes language extension utilities.
  2. Segmentation at the language-processing level using a page address structure compatible with networking.
  3. Hierarchical sublanguages sharing a common, world wide address space to solve the distributed data and distributed processing problems.
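The second and third changes can be pictured with a small sketch. Nothing below comes from the System itself: the two-field address layout, the `PageAddress` and `Node` classes, the `lookup` function, and the idea of modeling the sublanguage hierarchy as parent pointers are all assumptions made for illustration. The point is only that one address form can name a page whether it is local or remote, so a sublanguage on one machine can build on a more general one held elsewhere.

```python
from dataclasses import dataclass

PAGE_BITS = 32  # assumed field width; the source gives no address layout


@dataclass(frozen=True)
class PageAddress:
    host: int  # hypothetical identifier of the machine that owns the page
    page: int  # page number within that machine's own page space

    def pack(self) -> int:
        """Encode the address as a single integer that can travel in a packet."""
        return (self.host << PAGE_BITS) | self.page

    @staticmethod
    def unpack(value: int) -> "PageAddress":
        return PageAddress(value >> PAGE_BITS, value & ((1 << PAGE_BITS) - 1))


class Node:
    """One machine: holds its own pages and can reach its peers."""

    def __init__(self, host_id, network):
        self.host_id = host_id
        self.pages = {}
        self.network = network  # dict host id -> Node, standing in for a real network
        network[host_id] = self

    def store(self, page_no, content):
        self.pages[page_no] = content
        return PageAddress(self.host_id, page_no)

    def fetch(self, addr):
        if addr.host == self.host_id:
            return self.pages[addr.page]            # local page
        return self.network[addr.host].fetch(addr)  # remote page, same address form


def lookup(node, sublanguage, word):
    """Resolve a word in a sublanguage, falling back to its parent sublanguages."""
    addr = sublanguage
    while addr is not None:
        vocabulary, parent = node.fetch(addr)
        if word in vocabulary:
            return vocabulary[word]
        addr = parent
    raise KeyError(word)


network = {}
site_a, site_b = Node(1, network), Node(2, network)
base = site_a.store(0, ({"ship": "noun"}, None))      # general sublanguage
shipping = site_b.store(0, ({"tanker": "noun"}, base))  # specialized child on another machine
print(lookup(site_b, shipping, "ship"))               # resolved in the parent, across machines
```

A real system would also need naming authorities, access control, and failure handling, none of which this sketch attempts.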

In presenting our vision here, it has been important to us that our ideas really work, and that our words are backed by a solid, working system. The New World of Computing System exists. We have embodied the radical changes in a complete, rounded system. We have gone farther by extending this system to include capabilities that elucidate and amplify the basic concepts. The many technical designs, both indicated in the above presentation and implied by the illustrations, have been fully implemented in this single, integrated system. We have successfully tested and demonstrated every technological capability required to achieve every aspect of our vision. Thus, the basis has now been laid: a technical solution has been achieved for that new world of computing, the era of the telephone-computer.

11. Epilogue

The research phase of the New World of Computing System is completed. It exists today at the level of a commercial prototype. It is now ready to move into product development.

The New World of Computing System is written entirely in standard "C," except for a few hardware interface assembler procedures. It is running under UNIX[3]/OpenWindows[4]. It currently consists of over 400,000 lines of "C", about 3 megabytes of compiled code. Of this only about 300 kilobytes is resident; the rest is on the System's own pages (together with data, text, etc.) and is managed by the System's own paging subsystem. The System's own pages are the packets sent across existing and future digital telecommunication systems. This includes pages containing the digitized voice, and echoed texts and graphics, that will constitute telephone communications. Current PC and workstation hardware and DSL telecommunication standards are completely adequate to fully support the functionality of the New World of Computing System as described in this document.
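The idea of keeping only a small resident portion and paging the rest on demand can be sketched as follows. This is an illustration only: the `Pager` class, the least-recently-used replacement policy, and the dictionary standing in for disk or network storage are assumptions, since the text does not describe how the System's paging subsystem chooses which pages to keep resident.

```python
from collections import OrderedDict


class Pager:
    """Keep at most max_resident pages in memory; fetch the rest on demand."""

    def __init__(self, backing_store, max_resident):
        self.backing = backing_store   # dict page number -> bytes (stands in for disk or network)
        self.max_resident = max_resident
        self.resident = OrderedDict()  # page number -> bytes, least recently used first
        self.faults = 0                # number of pages brought in from the backing store

    def get(self, page_no):
        if page_no in self.resident:
            self.resident.move_to_end(page_no)  # mark as most recently used
            return self.resident[page_no]
        self.faults += 1
        data = self.backing[page_no]            # page fault: load the page
        self.resident[page_no] = data
        if len(self.resident) > self.max_resident:
            self.resident.popitem(last=False)   # evict the least recently used page
        return data


# Hypothetical store of five one-byte pages with room for two resident pages.
store = {n: bytes([n]) for n in range(5)}
pager = Pager(store, max_resident=2)
for n in (0, 1, 0, 2):
    pager.get(n)
print(list(pager.resident), pager.faults)  # prints [0, 2] 3
```

The sketch is read-only; writing modified pages back to the store, which a full paging subsystem would need, is left out.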

"New World of Computing" is a registered trademark of the California Institute of Technology, which holds the copyright to the New World of Computing System.

Notes

1. James, William, The Will to Believe, reprinted in The Philosophy of William James, (ed. Kallen), Chapter I, Modern Library, N.Y.

2. Tarski, Alfred, Der Wahrheitsbegriff in den formalisierten Sprachen, Studia Philosophica, vol. 1, 1936.

3. UNIX is a trademark of AT&T Bell Laboratories.

4. OpenWindows is a registered trademark of Sun Microsystems Corp.
