Chapter 2.1: Object Base

2.1.1 Storage

A fundamental concept of AFS is that all storage internal to the system is managed automatically: the programmer refers to data and other objects by symbolic names rather than by physical addresses. Storage management would extend over levels from high-speed registers and monolithic memories up through Comanche files, optical storage devices, and even cataloged off-line storage such as tape libraries. Logically, all such storage is an integral part of the system, distinctions between levels are invisible to the programmer, and it is considered almost unlimited in size.

When independent formulations of a problem give rise to similar concepts, those concepts probably contain an essential element of the problem that is invariant under change of notation or frame of reference. The problem of distinguishing between objects and the mechanism for referencing them is a fundamental one that every computer system, programming language, and theory of computation must face:

In von Neumann machines, a special type of data called an address is used to refer to other data; although addresses have the useful properties of numbers, they are bound so tightly to physical storage that their logical properties are inextricably confused with problems of allocating storage and devices.
In the definition of CPL, Strachey distinguished L-values and R-values according to whether the value could appear on the left or the right of an assignment statement. The target of an assignment had to be a value with location-like properties.
Algol 60 can be formally defined without the concept of storage only because it has a relatively small number of basic concepts. To deal with pointers and to formalize concepts of assignment, Algol 68 introduced the concept of a reference, which is like an address pointing to a cell capable of holding a given type of object.
In his analysis of APL, Abrams distinguishes selection operators and computational operators. The value of a selection operator is linked to the storage of one of its operands and can transmit changes back to the them.

One of the design principles of AFS is to search for the essential elements underlying all programming languages and to build a new system upon them. The concepts of object and storage cell are fundamental and require careful definition to support a general treatment of assignments, synonyms, ownership, and argument passing to functions.

For defining indices and pointers, storage addresses are useful, but the housekeeping they entail far outweighs their usefulness. The storage cell in AFS is a logical location capable of holding any object or collection of objects, no matter how large; its characteristics of a location simplify the definition of indices and pointers, but it involves no housekeeping burden because the storage management system makes the cell appear as large as necessary and automatically moves it to any device that may need to process its contents.

Definition: A storage cell is a logical location identified by a unique cell name. Each storage cell contains one and only one object; there is no upper limit on the size of a storage cell. The cell name is an internal identifier (abbreviated iid) whose representation is invisible to the user.

This definition is nonconstructive: it defines a storage cell by axioms or characteristics that are visible to the programmer, not by an explicit construction from something more primitive. The advantage of nonconstructive definitions is that the implementer has maximum freedom in his choice of representations and hardware design. The disadvantage of such definitions is that they don't prove that an efficient implementation (or even any implementation) is possible. To remedy that situation, the informal notes between definitions will illustrate the abstract concepts with a sample implementation. Since the illustration will not necessarily be the optimum engineering solution, the implementers are free to choose any design that satisfies the axioms.

Definition: A buffer is a temporary storage cell created for the purpose of holding an object until it can be processed or moved.

Buffers are intimately related to the mechanism for passing messages between objects such as arguments to functions and results from functions. Normally what is passed is the cell name of some storage cell containing the message. In computing X[I], for example, the select function returns the cell name for the storage cell containing X[I]. However, when the sum function computes (A+B), there is no permanently allocated storage cell containing the result. Therefore, the interpreter that is interpreting the function obtains a temporary storage cell, called a buffer, to hold the result. Buffers correspond to I/O buffers in current systems as well as to registers in the CPU or on a pushdown stack.

A particular implementation of the storage cell concept is discussed in the System Architecture Manual. The Storage Management Subsystem (SMS) described there provides spaces identified by unique space numbers; each space is linearly addressable by an offset from the beginning of the space. A collection of storage cells can be implemented as a space divided into a number of fixed-length blocks holding object images, also known as DAPOVs (Descriptor And Pointer Or Value). The cell name corresponds to the space number and offset to the object image; the uniqueness of space numbers guarantees the uniqueness of cell names.

2.1.2 Processes

The concept of process is fundamental to all levels of an information handling system: CPU, channels, operating systems, and external devices. A rationally designed system must have a precise concept of process and of the possible interactions between processes. In AFS, the definition of process is based on the well developed foundation of automata theory and is designed to facilitate the implementation of multiprocessing systems.

Definition: A process is an automaton with a set of states S and a set of states W contained in S, in which it waits for input. Processes can be best described by assuming they have three parts:

A process status record (abbreviated PSR) containing the current state, input, and contents of buffers used for working storage. There is a one-to-one correspondence between processes and PSRs.
A procedural description that encodes a finite set of information defining the states and permissible transitions between those states. Some procedural descriptions may be shared by many processes.
An interpreter that performs state transitions for a process. It examines the procedural description and the PSR and sets the PSR to its next state. An interpreter may be time-shared among a number of processes.

The process status record keeps track of all information that defines the current state of a process. In automata theory, a PSR is analogous to the instaneous description of a Turing machine. In a System/360 CPU, a PSR is analogous to the program status word with the contents of the fixed and floating registers. In the CDC 7600, the exchange jump package is the equivalent of the PSR. In the Burroughs 6700, the pushdown stack together with control words that may be stored in it form the equivalent of a PSR.

Above the SL level, a procedural description could be a read-only program. Beneath that level, procedural descriptions may be in microcode or hard wiring. The reason for separating the procedural description from other parts of a process is to allow a number of re-entrant processes to use the same description simultaneously. For primitive objects, the hardware may take shortcuts during high-speed execution and not separate the three parts of a process. For error logouts or responses to a diagnostic programmer, however, the system must generate a PSR that effectively represents the current state of an object.

The interpreter is the motive power that causes a process to move from one state to the next. It is the logical abstraction of active servers like CPUs and channels, but is more general since it includes software interpreters as well as special devices that may be attached as RPQs. The AFS logical architecture has deliberately avoided the concept of a CPU. Instead, the more general concept of process allows the engineer greater freedom to build distributed execution units, special purpose devices, and multiple processing units to improve performance without changing any logical interfaces.

The definition of process sets the stage for later discussion of wait states, exceptions, and suspensions. When a process needs input, it stays in one of its wait states indefinitely. A waiting process is considered asleep, and sending it input corresponds to a wake-up call. Exceptions are unusual conditions like arithmetic overflow or violations of access rights. When an exception occurs, the process in which it occurs generates a message for another process called a monitor and then goes into a wait state until it receives a message from the monitor. A suspension occurs when the motive force, the interpreter, is removed from a process, and the process stops because there is nothing to make it go. Suspensions result from time-sharing the interpreter among many processes so that only one can be running at any given time, but they can also occur when a process has run out of money (using too much time or space) or when it is stopped because of some other event like an attention signal from the programmer who started it.

Processes occur at all levels of a system. When concepts are not clearly defined, engineers and programmers working on different levels may be unaware that they are working on similar problems and duplicating functions that are performed on other levels. In System/370, for example, there are processes executing in channels and I/O untits, in microcode in the CPU, and at the instruction level for subroutines and tasks. The concepts, terminology, and data formats at the various levels completely obscure any similarities among these processes:

Records of processes in channels and I/O units are maintained in channel status words.
The record of a process at the microlevel is logged out by the DIAGNOSE instruction.
The record of the architecturally defined CPU status record of a process is in the program status word and register contents.
And the record of a process as viewed by OS/360 is in the task control block.

Not only does System/370 use awkward terminology for the various processes, it also uses awkward means for switching status. For subroutine calls, the BAL instruction does only half the job since it only modifies part of the PSW and it doesn't save registers. To call a program with different status, an SVC instruction must be used with considerable overhead from the operating system. The rest of the status, the registers, are at the mercy of the called routine to save or destroy. If the called routine is re-entrant, the simple BAL instruction, which takes one microsecond on a Model 65, must be supported by two SVC instructions to get and free temporary storage, at a cost of over 200 microseconds. In AFS, PSRs maintain the status and working storage for all processes at all levels. Although data formats beneath the SL level are CPU-dependent structures and cannot therefore be identical to formats above that level, the same concepts and terminology are used to emphasize the relationship between similar problems on different levels of the system design.

2.1.3 Objects

In AFS, the object is a generalization from two sources: descriptor/value pairs and resource/process associations. Descriptors are maintained with data in data management systems, APL, EULER, and the dynamically varying parts of PL/I. The type field in a descriptor can be interpreted as the name of a machine for accessing the value part. Although the few bits that describe a floating-point number don't exhibit many characteristics of a procedure, the generality of an access machine or procedure is valuable for complex arrays and structures and is essential for the intricate relationships in a large data base.

The association of a process with every resource derives from Dijkstra's approach in T.H.E. Multiprogramming System and from Ole-Johann Dahl's approach in Simula 67. Dijkstra associates a process with every resource in his system; the process is solely responsible for allocating that resource and acts as a central clearinghouse for all accesses to it. Chapter 2.5 shows that all objects in AFS have the properties of Dijkstra's resources and naturally fit into a general scheme of resource management. Alan Perlis suggested that simulation languages might provide a suitable basis for an operating systems language since they have the best developed concepts of event and process. The AFS concept of objects as processes is a generalization of the objects in the simulation language Simula 67.

Definition: An object is the basic entity in the system. It has an active part called an access machine and a passive part called an owned resource. Its active part responds to requests by other objects and may in turn generate requests of its own.

There is an input queue of cell names that specify buffers containing requests for the object.
The access machine is a process that waits in one of a set of states called ready states when it is ready to respond to input requests. When a cell name for a request appears on its input queue, [the object] assumes ownership of the buffer containing the request, performs whatever action is appropriate, returns a buffer containing the answer, and returns to a ready state.
The owned resource is data that is accessed only by the object's access machine. For objects like clocks and printers, however, the resource may interact with events outside the system.

Since this definition is general enough to accommodate source-sink I/O devices as well as objects as powerfulas a Turing machine, it can include any conceivable device within the standard accessing and allocating method. For a floating-point number, the implementation could specify a fixed-length bit string as the resource and a few bits to identify a hardware unit as the access machine. For I/O devices, the object internal to the system would be called a port, whose resource would be a logical connection to the external device and whose access machine could be a hardware or microcoded control unit. Since the internal structure of an object is invisible to the caller, an object implemented in hardware or microcode on one system could be implemented in software on another.

As in Simula 67, a software access machine is a procedure that defines a potentially infinite class of activations. An object corresponds to a process executing in one such activation. A ready state is a point in the procedure where the process waits for input. And the resource is a set of [local] variables used by the activation. Logically, all objects are processes. Even a floating-point variable is a process that is normally waiting, but must occasionally answer requests to deliver a value or to stow one away.

Definition: A primitive object is one that cannot be constructed from other objects in the system. The PSR, interpreter, and procedural description that make up its access machine are not objects normally defined in the logical architecture.

Somewhere underneath all the logical data structures, there must be primitive building blocks from which everything else can be constructed by software. Although the logical definitions of primitive objects are parallel to the constructions of other objects, their substructure is visible only to the engineers and diagnostic programmers.

Definition: A reducible object is one that can be constructed from other objects. The PSR, interpreter, and procedural description of its access machine are AFS objects that can be manipulated by SL.

Primitive objects are defined axiomatically in terms of their effects on other parts of the system. Sometimes, reducible objects are defined axiomatically, but most reducible objects are defined by an explicit construction in terms of primitive objects. All primitive objects are implementation defined. Many reducible objects are implementation defined, and others can be user defined. For efficiency, reducible implementation-defined objects may be built out of hardware or microcode even though they can be constructed out of more primitive objects. Logically, however, all reducible objects have the same status whether they are implementation defined or user defined.

Definition: The primitive object nil has an access machine with only one state. For every request, nil returns a copy of itself. For operations on lists, nil has the properties of a zero element list.

In APL\360, the empty vectors are similar to nil, but they have additional type information. The empty charcter vector has a description that indicates that it is of type character, and it expands into blanks. The empty numeric vector is of type numeric, and it expands into zeroes. [In AFS,] nil is of type any, and it expands into a list of undefined objects. [Note: a better option might be to have nil expand into a list of nil objects rather than undefined objects.]

Definition: The primitive object undef has an access machine with only one internal state. For every request except destroy, undef raises an error exception.

Logical storage cells can never be empty. If nothing else has been put in them, they contain an undefined variable object. The object nil is a general neutral element. It responds without error exception to any request, although some functions such as + or − may themselves raise error exceptions when given a nil operand. The object undef is a general undefined element; it always raises error exceptions except when being copied or destroyed.

Primitve objects are so basic to the structure of the system that they cannot be constructed by software. Hardware devices may not be primitive in the same sense because a disk drive, for example, could be simulated by a software routine that duplicates the interface and uses the storage management system to perform the same functions; but there is no sequence of instructions that could createa new disk drive in the corner of the machine room and physically attach it to the computer. Therefore, certain objects must be built in from the beginning, and others may be attached as the system expands or be removed when they fail. As long as the physical interface provides circuitry that matches voltage levels and makes the device look like a procedure, the logical interface can make room for it in the object base and can define synonyms and access machines that make it respond to any protocol expected of it.

Definition: A port is an object that communicates with the world outside the system. Its access machine handles the interface, and its owned resource is a logical connection to a physical object.

Since ports are objects, they have the same interface as all other objects. They have a well-defined status with respect to the accessibility graph, environment tree, and dependency graph, and they respond to requests in the same way as other objects. Therefore, it is always possible to replace a port with a software object that has the same interface. Programmers can create logical printers, simulated 2314 disks, and even simulated networks of machines. If a graphic object has an unusual interface, the real port to the device can be replaced by a logical port that behaves like a printer, but that contains a program to massage control information passed with a request and send it to the graphic device in the appropriate format. To make network communication more transparent to the user, the system will provide identical interfaces for a virtual System/370 emulated inside the system and for a real System/370 at the far end of a telephone line.

If communication with a system were in the character set format of typewriters and printers, the internal representation of an object would be of no concern to programmers and could remain totally invisible. But since data may be interchanged between systems, either conversationally or by removable storage media, there must be a standard representation of an object that can be recorded on an external medium and be reconstructed on a different system. This standard representation is called an object image. Every system is free to use its own internal forms, but they must all be directly mappable to the standard form for an object image. [Note: the term "object image" was adopted from the APL\360 system, but the basic idea was implemented in the 1950s for LISP. The currently popular term is "serialization".]

Definition: An object image is an external representation of an object. The object image has two parts corresponding to the two parts of the object: a descriptor that specifies the access machine and a representation of the owned resource.

If the object is primitive, the descriptor indicates that it is primitive, and the representation is a bit string specifying what object it is.
In general, the descriptor specifies the complete access machine by indicating the PSR (which may contain zero bits of information in some simple cases), the object image of the procedural description of the access machine, and the interpreter of the access machine.
If the owned resource contains storage cells holding other objects, the representation includes object images of all those objects.
If the object is a synonym containing the cell name of some storage cell, the object image must contain a path name (see Section 2.1.5) for reconstructing the cell name by indexing from some standard vertex of the accessibility graph.

The object image is an external form of the DAPOV (Descriptor And Pointer Or Value) discussed in the System Architecture Manual. Although a DAPOV on a small system may be different from a DAPOV on a large one, the object images will be the same for all. The object image may be considered as the DAPOV for an abstract implementation of the AFS. It may turn out to be identical to the internal DAPOVs of one or more actual implementations, or it may be a compressed encoding of the internal DAPOVs.

Definition: The object base is the set of all objects in the system.

The term object base is more general than the term data base, since it also includes the logical interfaces to hardware resources. Because of the generality, all hardware devices have descriptors and can have synonyms defined upon them. Whenever a device breaks down, its descriptor can be changed to point to another device or to a software simulator that can replace it. All of the advantages of late binding therefore apply to devices as well as data. Instead of doing a SYSGEN for every configuration, implementers can provide standard logical facilities, make descriptors for nonexistent facilities point to substitutes, and keep the logical appearance constant as descriptors are changed one by one to reflect the current configuration.

The definition of object given above implies that all objects are serially reusable resources. Nonreusable objects can be implemented by making the access machine destroy the object after its first (or n-th) use. No requests can bypass this check, since an object cannot be used except through its access machine.

[In effect,] re-entrant procedures and time-shared devices correspond to a potentially infinite class of serially reusable objects. By subdividing storage, a single re-entrant procedure can provide [local] variables for as many activations as requested. By subdividing time, a time-slicing routine can provide multiple logical devices that all perform the same function as a single physical device. The AFS view of objects as processes treats the problem of resource management as a problem of interprocess communication.

Definition: A request on an object is a triple (T;P;D), where T identifies the request type, P is information proper to that type, and D is the destination or object that is to receive the answer. Normally, the access machine of the object will execute the request and return a result to the object D. In some cases, the access machine will cause an event called an exception. See Section 2.4.1 for a definition of exceptions and the ensuing events.

[Note: If an object x sends a request (T;P;D) to an object y, the destination D of the result may be the sender x or some third object z. This convention allows objects to support any kind of synchronous or asynchronous protocol. In a simple synchronous request, the object x would send a request (T;P;x), in which the destination is the sender; after sending the result, x could wait until the answer came back. To allow more parallelism, the sender x could send a request (T;P;x) and not wait for the answer; at some later time, x could check its input queue to see whether the answer was present. The definition, however, is general enough to support a wide range of protocols for blackboards or whiteboards that may hold messages that might be answered by objects whose identity was unknown to the object that sent the original request. For example, the sender x might generate a request (T;(P;x);y) for a blackboard manager y, which might use some private method for determining which object z could process the request type T. Then y would send the request (T;P;x) to z, which would send the result back to x.]

Definition: The dependency graph is a structure defined over the object base. If an object x has a request on its input queue that specifies an object y as its destination, then y is said to depend on x, and (y,x) is an edge of the dependency graph.

Later chapters will bring out the implications of the dependency graph in resource management, process dispatching, and deadlock determination. Chains of subroutine calls form a subgraph of the dependency graph called the activation tree. If x is an activation of a program that calls a subroutine y, then x is dependent on an activation of y until it returns. [Note that a re-entrant program p may have many activations, each of which is an object that is distinct from p and from any other activation of p.]

2.1.4 Access Machines

Since every object has an access machine, it always has an active element available to perform necessary functions. A typical function is that of monitoring. During debug mode, the programmer may wish to monitor all accesses to a particular variable and then perform a specific action such as recording the access, calling some procedure, or waiting for instructions from the terminal. For sensitive data, all requests on an object may cause its access machine to check the identity of the caller and to notify a security officer of an access attempt by an unauthorized user. For proprietary software on lease, the access machine might destroy the object after thousand uses. All these applications rely on the invisibility of an object's internal structure — when an ordinary variable is replaced by one that is being monitored, its normal interface remains unchanged.

Definition: An access machine has the following external interface:

It must have a set of ready states in which it waits for requests with arguments (T;P;D). After processing a request, it must return to a ready state.
The argument D specifies the destination for the response to the request.
The argument P specifies further information proper to the request type.
The argument T specifies one of the following request types:

Authorize.
Request to obtain a synonym to the storage cell containing the object. (See Section 2.1.5.)
Copy.
Request to obtain a copy of the object. If the object may not be copied, the access machine raises an error exception. If the argument P is nil, then the entire object is copied; otherwise, P must specify some subpart to be copied.
Delete.
Request to delete a storage cell of a collective object. The argument P must be the index of the cell to be deleted. (See Section 2.1.6.) The object contained in the cell is not destroyed, but is returned as the response to the request.
Destroy.
Request to destroy an object. If the object is nondestructible, its access machine raises an error exception. If it is a collective object, it makes destroy requests upon all its elements before finally destroying itself. [After the object has been destroyed, its storage cell contains the object undef.]
Evaluate.
Request upon a simple data object to deliver a value or upon a more complex object to generate a value. The argument P is nil for ordinary data objects, but must be a list for functions (see below).
Identify.
Request to obtain a description of the access machine and structure of an object. If the argument P is nil, the response includes all identifying information; otherwise, P specifies the information requested (see below).
Insert.
Request upon a collective object to insert a new storge cell into its owned resource. (See Section 2.1.6.) P specifies the index to be mapped onto the new cell by select requests. If P is nil, the new cell has no index.
Select.
Request upon a collective object to map P onto its storage cells. P must be a set (possibly nil) of elements in the index set of the object. The response is a set of cell names selected by those indices. (See Section 2.1.5 for further discussion of indices.)
Start.
Request upon an activation of a function to begin interpretation of the procedural description associated with the function. The argument P is a list of objects to be bound to the formal parameters of the function.
Stow.
Request to stow the value P in the owned resource of an object. The access machine will either perform data conversions to make P comply with its conventions or raise error exceptions if P cannot be converted properly or if the current value cannot be modified.
The access machine always reserves the right to tell lies about itself and its resource. This right is essential to data independence because it must always be possible to replace an object with another object that may be different in structure, but appears to be the same.

[The request types correspond to the methods of object-oriented programming languages. That is not a coincidence, since AFS and all other object-oriented systems were directly or indirectly influenced by Simula 67. All AFS objects, primitive or reducible, must recognize and respond to authorize and identify requests. Some of the other request types may not be appropriate for some types of objects. Reducible (user defined) objects may support any number or kind of request types that are appropriate to their purpose.]

Definition: In order to specify requests, a primitive request constant is defined for each of the request types. The names of the request constants are formed by adding 's' to the corresponding request names: authorizes, copies, deletes, destroys, evaluates, identifies, inserts, selects, starts, and stows.

[More to come.]

Definition: A function is

Definition: The triadic function request

Definition: The monadic function evaluate

Definition: The dyadic function stow

2.1.5 Accessibility Graph

Definition: A synonym is

Definition: The dyadic function authorize

Definition: The monadic function syn

Theorem: If a request of type T is made on an object through a chain of synonyms, then T must be in the intersection of the rights of all synonyms in the chain.

Definition: A metonym is

Definition: A collective object is

Definition: If x is a collective object

Definition: An elementary object is

Definition: The ownership relation between collective objects and storage cells has the following properties:

Theorem: Every object except the system root is an element of one and only one collective object.

Theorem: The ownership relation defines a tree structure over the object base. The system root is the root of the tree, collective objects are at branching nodes, and elementary objects are at leaves of the tree. Call this tree the ownership tree.

Definition: The index set of an object x is

Definition: A list L is

Definition: The monadic function ilist

Definition: The dyadic function select

Definition: An object x is directly accessible from y if

Definition: An object x is indirectly accessible from y if

Definition: An object x is accessible from y if

Definition: The accessibility graph is a union of the ownership tree and the chains of synonyms: (x,y) is an edge of the graph if either x is a synonym for y, or y is an element of x.

Theorem: If x is an element of y or if x is indirectly accessible from some object z which is an element of y, then there exists an element n in the index set of y such that x=select(n;y). Call n a simple name for accessing x from y.

Definition: A path from an object y to an object x is

Theorem: If x is accessible from y, then there exists a path name from y to x.

2.1.6 Manipulating Storage Cells

Definition:

Theorem:

2.1.7 Structure

2.1.7.1 Lists

Definition:

2.1.7.2 Structures

Definition:

2.1.7.3 Arrays

Definition:

[There is more to this chapter that has not yet been transcribed.]