A Prolog to Prolog

John F. Sowa

2. Pure Prolog

All versions of Prolog share a common core called pure Prolog. This core is directly derived from symbolic logic and supports the great bulk of all Prolog programming. Although features outside the core are needed for input/output and system interfaces, the pure Prolog subset determines the spirit of the language. This section illustrates Prolog programming style with emphasis on the pure Prolog core.

Solving Problems Stated in English

Translating Prolog into English is relatively straightforward. But translating any natural language into any formal language is much more difficult. Chapter 5 presents formal rules for automatically mapping English into a logical form. This section presents more informal methods of analyzing a problem statement and mapping it into Prolog. Following are some rules of thumb for doing that mapping:

Proper names like "Bill" or "Socrates" map into constants: bill, socrates; unnamed individuals must be assigned unique identifiers, such as i1, i2, i3.
Common nouns like "cat" or "table" map into one-place predicates: cat(X), table(X).
Adjectives like "red" or "heavy" map into one-place predicates: red(X), heavy(X).
Intransitive verbs like "walk" or "sleep" map into one-place predicates: walk(X), sleep(X).
Transitive verbs like "see" or "carry" map into two-place predicates: see(X,Y), carry(X,Y).
Bitransitive verbs like "give" or "tell" (which take both a direct and an indirect object) map into three-place predicates: give(X,Y,Z), tell(X,Y,Z).
The main verb usually maps into the conclusion of a rule (the head) and other words map into the conditions (the body).
The verb "be" in its various forms usually does not map into a predicate: the following noun, adjective, or participle tends to become the main predicate.
The verb "have" sometimes indicates possession, but more often it indicates a relation between two nouns that depends on the context.

These nine rules cover a useful subset of English. But they are only a rough approximation. There are many fine points of language that they do not cover. Some of these points will be discussed later. But to show their applicability, consider Exercise 2 at the end of this chapter:

Washable allergenic things are washed. Nonwashable allergenic things are vacuumed. Everything that is gray and fuzzy is allergenic. Shirts, socks, pajamas, dogs, and llamas are washable. Lamps, sofas, cats, and computers are nonwashable. Following are my gray, fuzzy possessions: my pajamas, my sofa, my cat Thothmes, and my llama Millicent.

In the first sentence, the adjectives "washable" and "allergenic" map into the predicates washable(X) and allergenic(X). The common noun "thing" maps into the predicate thing(X). But what about the verb "are washed"? The active verb form "wash" is transitive, but the passive form is consistently used in this problem without any indication of the one who does the washing. Since the subject is omitted, a one-place passive form washed(X) may be used. Given these predicates, the next step is to identify the condition and conclusion of the rule. The conclusion is washed(X). The condition under which X is washed is that it must be a washable allergenic thing. Therefore the Prolog rule becomes

washed(X) <- washable(X) & allergenic(X) & thing(X).

This rule may be read "X is washed if X is a washable allergenic thing," which is Prolog's closest approximation to "Washable allergenic things are washed." By the same kind of analysis, the second sentence becomes,

vacuumed(X) <-
    nonwashable(X) & allergenic(X) & thing(X).

Note the separate predicate nonwashable(X) instead of ^washable(X). This separate predicate is needed since a later rule uses nonwashable in the conclusion, and Prolog normally uses negations only in the condition or body of a rule. The third sentence has a different syntactic form, but it also maps into a similar Prolog rule:

allergenic(X) <- gray(X) & fuzzy(X) & thing(X).

The next sentence implies that X is washable if X is a shirt, a sock, a pajama, a dog, or a llama:

washable(X) <- shirt(X) | sock(X) | pajama(X) |
               dog(X) | llama(X).

Similarly, the sentence after that becomes,

nonwashable(X) <- lamp(X) | sofa(X) | cat(X) |
                  computer(X).

The last sentence introduces some complexities: "Following are my gray, fuzzy possessions: my pajamas, my sofa, my cat Thothmes, and my llama Millicent." The word "following" can be ignored; it is not about the subject matter itself, but about the form of the statement. The word "my" introduces a new individual, me, who possesses several things. The common noun "possession" does not map into a one-place predicate, since it is actually derived from a transitive verb "possess." The other common nouns map into one-place predicates, and the proper names "Thothmes" and "Millicent" map into the constants thothmes and millicent. But this sentence also mentions two other individuals, my pajamas and my sofa, which do not have names. In order to record facts about these things, they must be identified by constants; since they are not named, unique identifiers such as pj1 for my pajamas and sf1 for my sofa may be introduced. With these predicates and constants, this sentence maps into a series of facts with five individual constants: pj1, sf1, thothmes, millicent, and me. In standard logic, a single formula could assert that I possess pj1 and pj1 is a gray fuzzy pajama:

possess(me,pj1) & pajama(pj1) &
    gray(pj1) & fuzzy(pj1).

But since Prolog facts may contain only one predication, that information must be split into four simple facts (which may be written on a single line). The last sentence of Exercise 2 therefore maps into the following collection of facts:

possess(me,pj1). pajama(pj1). gray(pj1). fuzzy(pj1).
possess(me,sf1). sofa(sf1).   gray(sf1). fuzzy(sf1).
possess(me,thothmes). cat(thothmes). gray(thothmes).
     fuzzy(thothmes).
possess(me,millicent). llama(millicent).
     gray(millicent). fuzzy(millicent).

These Prolog rules, when asserted in a single workspace, represent a complete translation of Exercise 2 into Prolog. Unfortunately, there is something missing: the English statement never said how the noun "thing" is related to the other nouns like "cat" or "sofa". Another rule must be added to say that X is a thing if X is a shirt or sock or whatever:

thing(X) <- shirt(X) | sock(X) | pajama(X) | dog(X)
          | llama(X) | lamp(X) | sofa(X) | cat(X)
          | computer(X).

Every time a new type of entity is introduced, this rule must be augmented with another disjunction. A simpler and more general solution is to say that everything is a thing. That can be stated in one very short unconditional rule:

thing(X).

This rule always succeeds for any value of X.

Suppose all those rules have been typed into the file washable prolog. Then the following is a transcript of a Prolog session at the screen. The first step is to invoke the consult predicate that reads the file and adds all the rules to the current workspace. .sk p8;.in3;.fo off <-consult(washable). goal : <- consult(washable). 8ms success <- consult(washable). .sk p8;.in0;.fo on Here, what we type at the keyboard is in this typeface, while the answers from the machine are in this typeface. The first line is the goal that calls consult. The second line is an echo that repeats the input and confirms that it is a goal. The third line indicates that the goal took 8 milliseconds to complete successfully on an IBM 3090 computer. The fourth line is another echo that repeats the input goal, but now with the variables replaced with the computed values (in this case, there were no variables). The next goal typed at the keyboard asks what is washed; Prolog responds with pj1 substituted for X: .sk p8;.in3;.fo off <-washed(X). goal : <- washed(V1). 0ms success <- washed(pj1). .sk p8;.in0;.fo on The first echo replaced the variable X with the variable V1, which is the standard way that IBM Prolog refers to the first variable in a goal. The execution time is listed as 0 milliseconds. Nothing actually takes zero time to compute, but anything less than half a millisecond is rounded down to 0. Finally, the second echo replaces the variable with the answer pj1, which indicates that the pajamas are washed. After a goal succeeds, typing a semicolon forces the system to backtrack and find another solution. The next thing that is washed turns out to be millicent: .sk p8;.in3;.fo off ; 0ms success <- washed(millicent). .sk p8;.in0;.fo on Typing another semicolon shows that there are no further solutions: .sk p8;.in3;.fo off ; 0ms fail .sk p8;.in0;.fo on Next are the goals that determine what is vacuumed. In this case, the answers are sf1 and thothmes: .sk p8;.in3;.fo off <-vacuumed(X). goal : <- vacuumed(V1). 0ms success <- vacuumed(sf1). ; 0ms success <- vacuumed(thothmes). ; 0ms fail .sk p8;.in0;.fo on Prolog also has other methods for computing all possible solutions without requiring the user to type a semicolon for each one. These methods will be discussed in Section 2.3.2. IBM Prolog also has pragmas that change the input and output formats. They can be used to suppress the echo and the information about execution time, and they can also make goals the default so that the arrow <- can be omitted from an input goal. These are useful features that are discussed in Appendix A. They improve the human factors of the system, but they do not illustrate any fundamental principles about Prolog or logic programming.

Subtle Properties of English

Natural languages are highly systematic structures, but the system is not always marked by obvious syntactic features. For that reason, superficial patterns are often misleading. As an example of a common misunderstanding, consider the sentences "Sam likes Jane" and "Cats like fish." After learning that the first sentence maps into the Prolog fact like(sam,jane), many beginners translate the second sentence into the fact like(cats,fish). But the rules of thumb in the previous section suggest that common nouns should map into predicates. Therefore "Cats like fish" maps into the following rule:

like(C,F) <- cat(C) & fish(F).

This rule says that C likes F if C is a cat and F is a fish. The reason for this difference is that a proper name normally designates a unique individual within a given context. Therefore the fact like(sam,jane) is an unconditional assertion. If there happened to be many people named Sam and Jane, further conditions would be needed to specify the context more narrowly. The following rule, for example, says that X likes Y if X is named sam, Y is named jane, X lives at 23 Main Street, and Y lives at 19 Blossom Lane:

like(X,Y) <- named(X,sam) & named(Y,jane)
           & live(X,'23 Main St.')
           & live(Y,'19 Blossom Lane').

The general principle that each individual must be identified by a unique constant is always true. But when names are not unique, that constant must be something more specific, such as a social security number. Except in unusual circumstances, common nouns apply to multiple individuals. Therefore they typically map into predicates rather than constants.

The rule that common nouns map into predicates is generally true, but they may map into predicates with more than one argument. Consider the sentences "My cat is my friend" and "Friends like each other." The word "friend" does not indicate the type of individual in the same way as "human" or "cat"; instead, it shows the role that some person (or cat) plays with respect to another. In general, nouns that specify the type of individual designate natural types, which correspond to predicates with a single argument; nouns that specify a role that the individual plays designate role types, which correspond to predicates with more than one argument. Examples of natural types include human, cat, dog, beagle, tree, table, house, and number. Examples of role types include father, sister, lawyer, employee, pet, possession, home, and quotient. Note the distinction: Sam may be a human by nature, but a father by virtue of a certain relationship to someone else; Thothmes is a cat by nature, but a pet in relation to some person; 4 is a number by itself, but a quotient as a result of 20 divided by 5. For further discussion of these types, see the book by Sowa (1984). Sowa, J.

To represent the noun "friend," two arguments are needed on the predicate: the first indicates the person who is the friend, and the second indicates the other person involved. Since friendships are normally reciprocal, two facts are needed to make the predicate symmetric. The sentence "My cat is my friend" maps into the following facts:

friend(me,thothmes).  friend(thothmes,me).

The sentence "Friends like each other" uses the phrase "each other" to abbreviate two statements: X likes Y if X is a friend of Y; and Y likes X if X is a friend of Y.

like(X,Y) <- friend(X,Y).
like(Y,X) <- friend(X,Y).

Actually, only one of these rules would be needed if the system guaranteed that the friend predicate were always symmetric. English also uses other words like "respectively" or "reciprocally" to indicate relationships that Prolog would represent with variables.

The sentence "My cat eats everything he likes" raises some interesting issues. In mapping it into Prolog, the main verb becomes the conclusion and the other parts are the conditions. If the cat is the same one mentioned in Exercise 2, the rule becomes,

eat(thothmes,X) <- like(thothmes,X).

When this rule is added to the previous rules, it leads to the conclusion that Thothmes eats me. One way to limit his appetite is to amend the specification to say "My cat eats X if he likes X and X is not a friend of his":

eat(thothmes,X) <- like(thothmes,X)
                 & ^friend(X,thothmes).

This rule seems reasonable, since it would allow Thothmes to make friends with Tweety without eating him while still eating other birds.

Exercise 4 (page &pexer4;) can now be solved with the techniques discussed so far. Some of the sentences are fairly complex. For example, the sentence "Any man who likes a woman who likes him is happy" maps into the following rule:

happy(M) <-
    man(M) & woman(W) & like(M,W) & like(W,M).

The sentence "Bertha likes any man who likes her" causes trouble for about 50% of the people who translate it into Prolog. Those who translate it into the following rule are lucky:

like(bertha,M) <- man(M) & like(M,bertha).

The unfortunate ones map it into the following rule:

like(bertha,M) <- like(M,bertha) & man(M).

Both of these rules are recursive. But the first one checks whether M is a man before calling itself, and the second one calls itself recursively before checking M. In the first case, man(M) is a finder goal that determines some value for M. When the rule calls itself, it issues one of the subgoals

like(norbert,bertha)
like(pierre,bertha)
like(bruno,bertha)

The second rule, however, calls itself immediately. It gets into a recursive loop where it keeps issuing the same subgoal like(bertha,bertha). In general, a recursive rule that calls itself before making any change to its arguments will loop indefinitely (or until storage is exhausted). See page &casp; for a keyboard command to terminate a loop.

In looking for some way to make everybody happy in Exercise 4, think of reasonable principles in English before writing them in Prolog. One reasonable principle is "Any man who is not rich likes any woman who is rich." In fact, this principle makes many people happy for many different reasons. With the goal happy(X), for example, Prolog would keep finding the same names again and again. The reason for the duplicate answers is that backtracking forces the system to examine all possible paths. If it finds that someone is happy for two or more reasons, that person will be listed once for each reason. To find all solutions and eliminate duplicates, IBM Prolog has a built-in predicate named setof, which could be used in the goal setof(X,happy(X),L). This goal would find all X for which the predicate happy is true, accumulate all the values of X in the list L, sort them in alphabetical order, and eliminate duplicates. The setof predicate will be discussed in more detail later, along with a similar predicate named set_of, which has a better way of handling the case when L happens to be empty.

Exercise 5 introduces a new issue: the distinction between individuals and types. That problem arises in many guises, as in the following faulty syllogism:

               Clyde is an elephant.
               Elephant is a species.
               ----------------------
Therefore      Clyde is a species.

A naive mapping of that syllogism into Prolog would lead to exactly the same fallacy:

isa(clyde,elephant).
isa(elephant,species).
isa(X,Z) <- isa(X,Y) & isa(Y,Z).

Given these two facts and rule, Prolog would conclude that the goal isa(clyde,species) is satisfied. The problem here is a nondistributed middle term: the word "elephant" is being used in two different ways. The first sentence says that the individual Clyde is of type elephant, and the second says that "elephant" is the name of a species. No conclusion is possible when the middle term of a syllogism is used in two different ways.

Talking about types in Prolog requires a change of notation. Normally, one would assert the fact elephant(clyde) where the individual maps into a constant clyde, and the type maps into the predicate elephant(X). To talk about types as individuals, however, requires types to be represented as constants. Then a new, more general predicate type(X,T) is needed to assert that individual X has type T. The premises about Clyde and elephants would map into the following Prolog facts:

type(clyde,elephant).
named(es1,elephant).  type(es1,species).

The first line says that clyde is of type elephant. The second line says that there exists some entity identified as es1, which has the name elephant and the type species. This analysis blocks erroneous inferences about Clyde's being a species, but it allows more abstract reasoning about types.

In Exercise 5, the main statements are about types. If Sam likes all kinds of sports, he undoubtedly likes all the subtypes, such as football or boxing, but he might be bored with a particular game or match. The problem also involves other complexities, such as unstated assumptions about the relationship between competing in a sport and playing the sport. There are two approaches to issues as complex as these: adopt a general set of predicates for talking about individuals, types, and subtypes; or simplify the English statements in a way that captures the main points with a minimum of abstraction. Simplifying assumptions may be necessary to solve a particular problem, but they may not generalize to other problems.

Another issue that arises in mapping English to a formal notation is that adjectives may have complex interactions with the nouns they modify. In particular, degree adjectives like "big" and "small" are relative to a standard determined by the noun: a big mouse is much smaller than a small elephant. By blindly mapping adjectives and common nouns into one-place predicates, one system made a serious blunder. It mapped the sentence "Sam is a good musician" into two Prolog facts:

good(sam).  musician(sam).

Then it mapped the sentence "Sam is a bad cook" into the following facts:

bad(sam).  cook(sam).

From these four facts, it answered "yes" to all of the following questions: Is Sam a good cook? Is Sam a bad musician? Is Sam a good bad musician cook? The error here results from the fact that "cook" and "musician" do not designate natural types, but role types. Adjectives like "good" and "bad" do not apply to Sam as a person, but to his role as cook or musician. The mapping into Prolog (or any other logical form) must recognize the roles and the ways of qualifying them.

Besides these complexities, natural languages pose many philosophical problems, especially in talk about knowledge and belief, hopes and fears, intentions and expectations, tenses and modalities. Many of these problems are still unsolved in their full generality; but for many of them, the Prolog programmer can adopt a special-case solution that is adequate for the task at hand. Fortunately, many practical problems lie within the subset that Prolog can easily represent. But the problem solver (and poser) should recognize the areas where difficulties can be expected.

Representing Quantifiers

Standard logic has two basic quantifiers: the universal quantifier (∀ X), read "for all X," and the existential quantifier (∃ X), read "there exists an X." Prolog does not use explicit quantifier symbols like ∀ and ∃. Instead, all variables in a Prolog clause are assumed to be universally quantified. For example, the rule

thing(X).

from page &pthing; could be read "for all X, X is a thing". Because of this convention, universal quantifiers in logic are usually easy to map into Prolog. Existential quantifiers, however, must be handled differently in different contexts:

If the quantifier (∃ X) does not follow any universal quantifiers, the variable X may be replaced by a constant symbol like a.
If the quantifier (∃ X) follows one or more universal quantifiers (∀ Y)(∀ Z)..., the variable X may be replaced with a function symbol f(Y,Z,...) where the variables Y,Z,... become the arguments of the function.
If a variable X occurs only in a negative context (such as the condition part of a rule), it has the effect of being existentially quantified.

Some examples may help to clarify these three principles. With an existential quantifier, the sentence, "I own a fuzzy gray sofa," could be represented,

(∃ X) (own(me,X) & fuzzy(X) & gray(X) & sofa(X)).

In this formula, the sofa has no name. Instead, it is introduced by the quantifier (∃ X) to indicate "there exists an X." Since (∃ X) does not follow any universal quantifiers, X may be replaced by some constant, such as sf1, which was suggested as the name of the sofa in Section 2.2.1. Since Prolog facts have only a single predication, the formula must also be broken up into four separate facts. It therefore becomes,

own(me,sf1).  fuzzy(sf1).  gray(sf1).  sofa(sf1).

When an existential quantifier follows a universal quantifier, it must be represented by a function symbol. As an example, consider the sentence, "Every person has a mother." That may be paraphrased, "For every P, there exists an M such that M is a mother of P if P is a person." If Prolog had quantifiers, that sentence could be written,

(∀ P) (∃ M) mother(M,P) <- person(P).

In standard Prolog, the universal quantifier (∀ P) would be dropped. But the existential quantifier (∃ M) cannot be replaced with a unique identifier such as m1. Otherwise, the resulting Prolog rule would express something quite unintended:

mother(m1,P) <- person(P).

This rule says that for all P, m1 is a mother of P if P is a person. In other words, m1 would be a universal mother of everyone. Since the existential quantifier (∃ M) follows a universal quantifier (∀ P), the variable M depends on P. That dependency can be shown with a special function, say mom(P):

mother(mom(P),P) <- person(P).

This rule may be read, "For all P, mom(P) is a mother of P if P is a person." The function mom(P) shows the dependency on P. Such functions are called Skolem functions in honor of the Norwegian mathematician Thoralf Skolem, who introduced them as a technique for eliminating quantifiers. This method of replacing quantifiers with function symbols is sometimes called skolemizing.

As a more complex example, take the sentence, "Every chicken coop has a chicken that is pecked by every other chicken in the coop." The first part of this sentence, "Every chicken coop has a chicken," has the same structure as "Every person has a mother," but the extra clause at the end makes it more complex. To map it into Prolog, first select predicates coop(X) and chicken(X) for the coop and the chicken, a predicate peck(X,Y) for the verb "peck," and a predicate loc(X,Y) for location "in." According to the rules of Section 2.2.1, the verb "has" typically indicates a highly context-dependent predicate. In this context, it represents the location loc(X,Y). Since the English sentence implies a universal quantifier ranging over chicken coops followed by an existential quantifier ranging over chickens, there must be a special function, say loser(X), which designates the poor chicken that is pecked by every other one in coop X. The English sentence has three implications: the loser of the coop is a chicken; it is located in the coop; and it is pecked by every other chicken located in the coop. Since Prolog rules have only one conclusion each, that sentence must be mapped into three separate rules:

chicken(loser(X)) <- coop(X).
loc(loser(X),X) <- coop(X).
peck(Y,loser(X)) <- coop(X) & chicken(Y)
     & loc(Y,X) & ^(Y = loser(X)).

These three rules may be read "The loser of X is a chicken if X is a coop; the loser of X is located in X if X is a coop; and Y pecks the loser of X if X is a coop, Y is a chicken, Y is located in X, and Y is not the same as the loser of X."

The third method of handling existential quantifiers results from the way quantifiers interact with negation. In standard logic:

(∀ X) ^P(X)    is identical to    ^(∃ X) P(X).
(∃ X) ^P(X)    is identical to    ^(∀ X) P(X).

The first rule says that for all X, P is false if and only if it is false that there exists an X for which P is true. The second rule says that there exists an X for which P is false if and only if it is false that for all X, P is true. These two rules can be summarized in the following principle: when a quantifier moves in or out of a context governed by a negation, ∀ becomes ∃, and ∃ becomes ∀. In applying this principle to Prolog, note that the standard logic conditional "P if Q" is equivalent to "P or not Q." Similarly, the Prolog <- operator contains an implicit negation that governs the condition or body of a rule. As an example, consider the following definition of mother_in_law with the universal quantifiers shown explicitly:

(∀ M) (∀ X) (∀ Y)
    mother_in_law(M,X) <- mother(M,Y) & married(X,Y).

The quantifiers (∀ M) and (∀ X) govern variables that occur in both the condition and the conclusion of the rule. But the variable Y occurs only after the arrow. Therefore it is possible to move (∀ Y) into the condition. But when it moves into the negative context, it becomes (∃ Y):

(∀ M) (∀ X) mother_in_law(M,X) <- (∃ Y) mother(M,Y)
     & married(X,Y).

This may be read, "For all &M and X, M is a mother in law of X if there exists a Y where M is a mother of Y and X is married to Y." Although Prolog does not use explicit quantifier symbols, variables that occur only after the arrow have this existential effect.

Since variables that occur only in the condition part of a Prolog rule are effectively governed by an existential quantifier, a problem occurs in expressing conditions that contain universal quantifiers. Consider the sentence, "Every mother whose children are all above average is a super mom." The following rule does not accurately represent the meaning:

supermom(M) <- mother(M,C) & above_avg(C).

This rule may be read, "M is a super mom if there exists a C where &M is a mother of C and C is above average." Notice that C is effectively bound by an existential quantifier, since it occurs only in the condition. Therefore the rule does not exclude the possibility that other children of M may be below average. To exclude those cases, some negations are needed:

supermom(M) <- mother(M,C) &
    ^(mother(M,D) & ^above_avg(D)).

In this rule, D occurs within a negative context nested inside another negative context (the condition). Therefore the implicit quantifier that governs D flips from ∀ when it is in front of the formula to ∃ when it moves after the arrow and back to ∀ when it occurs in the nested negative context. Therefore this rule may be read, "M is a supermom if there exists a C where M is a mother of C and it is false that for all D where M is a mother of D, D is not above average." This contorted way of expressing universal quantifiers by double negations is also required in certain database query languages.

In order to express quantifiers in a more natural way, McCord introduces the notion of focalizers in Chapter 5. He uses the focalizer all(P,Q) to express the statement that in every case where P is true, Q is also true. Using the focalizer all, the definition of supermom becomes somewhat simpler and more readable:

supermom(M) <- mother(M,C)
             & all(mother(M,D), above_avg(D)).

This rule may be read, "M is a supermom if there exists a C where M is a mother of C and for all cases where M is a mother of D, D is above average." The focalizer all is defined by the following rule:

all(P,Q) <- ^(P & ^Q).

In this rule, P and Q are metavariables, which match arbitrary predications or combinations of predications. The body of the rule invokes P and Q as goals to be proved. This rule says that all(P,Q) succeeds if it is false that there is ever a case when P succeeds and Q does not succeed. Metavariables are discussed in more detail in Chapter 3, and focalizers are discussed in Chapter 5.

Choosing a Data Structure

Before writing rules to define predicates, a Prolog programmer must decide what data structures the rules will process. The choice of structure has a major effect on the way rules are written and the efficiency of their execution. As an example, consider Exercise 6 at the end of this chapter:

A farmer has a wolf, a goat, and a cabbage (a very large one). He wants to get all three of them plus himself across a river, but his boat is only large enough to hold one item plus himself. How can he cross the river without leaving the wolf alone with the goat or the goat alone with the cabbage?

This is a typical state-transition problem where the starting state has all four items (plus the boat) on one side, and the goal state has everything on the other side. In between are various intermediate states where some things are on one side and some things are on the other side.

In the farmer problem, the state of the system is defined by the position of each of the four items: farmer, wolf, goat, and cabbage. (The boat, the fifth item, can be ignored, since it must always be on the same side as the farmer.) There are two major choices of data structures for this problem:

A structure with four variables: each variable represents the location of one item (farmer, wolf, goat, or cabbage).
A pair of sets: all items on the first side in one set; and all items on the second side in the other set.

This description has three words that indicate data structures: "structure," "pair," and "set." Before writing rules for this problem, the programmer must decide how to map these data structures into Prolog terms.

A structure of four variables can be represented as a Prolog function symbol applied to four arguments. A suitable function is loc(F,W,G,C). The variable F represents the location of the farmer, W the wolf, G the goat, and C the cabbage. Single letters for variables keeps the rules short, but they make the programs harder to read. That function could also be written with longer names for the variables:

loc(Farmer_loc, Wolf_loc, Goat_loc, Cabbage_loc)

This data structure is analogous to the record structures used in many programming languages, such as Pascal. The similarity is more striking when each argument in the Prolog structure is written on a separate line: .sk .cp 1i Prolog Structure Pascal Record

loc                  type loc = record
   (Farmer_loc,                   farmer: string,
    Wolf_loc,                     wolf: string,
    Goat_loc,                     goat: string,
    Cabbage_loc)                  cabbage: string
                                end

The Prolog structure has a function named loc applied to four arguments. In the Pascal structure on the right, loc is the name of a record type that has four fields; each field can hold a single character string that represents the location of the corresponding item. Although the syntax is quite different, the two structures serve the same purpose. One difference between them is that Pascal requires the data types to be explicitly declared (as strings in this example), but Prolog permits data of any type to be assigned to its variables. This comparison illustrates a basic point about Prolog: functional forms are used to build data structures, not to do computations; they should be compared to data structures in other languages. Predicates do the actual computations in Prolog; they should be compared to procedures in other languages.

Given the data structure loc(F,W,G,C), the next task is to define a predicate move(Current,Next) that determines a possible move from the state Current to the state Next. The first rule that defines move says that the farmer can move by himself from location F to F2, leaving the other items unchanged, provided that the new location F2 is opposite F and the resulting state is safe:

move(loc(F,W,G,C), loc(F2,W,G,C)) <- opp(F,F2)
     & safe(loc(F2,W,G,C)).

This rule illustrates a powerful feature of Prolog: in most languages, the head of a procedure lists variable names for the parameters, and the body of the procedure uses operators that extract substructures from the input variables. Then it uses other operators to assemble the output. In Prolog, however, the extraction and assembly can be done automatically by pattern matching with structures in the head of a rule. The above rule could be written in a style similar to more conventional languages by using the = operator to take apart the structure Current and assemble the new structure Next:

move(Current,Next) <-
    Current=loc(F,W,G,C) & opp(F,F2) &
    Next=loc(F2,W,G,C) & safe(Next).

In this version, the subgoal Current=loc(F,W,G,C) causes subparts of the structure Current to be matched to the variables F, W, G, and C. But the original version of the rule does that same pattern matching before execution of the rule even starts. It is usually simpler and more efficient to do as much pattern matching as possible in the head of a rule instead of in the body.

There are three other kinds of moves: the farmer takes the wolf across, the goat across, or the cabbage across. Following is the rule for moving the wolf with the farmer:

move(Current,Next) <-
    Current=loc(F,W,G,C) & F=W & opp(F,F2) & W2=F2 &
    Next=loc(F2,W2,G,C) & safe(Next).

This rule says that if the current state is loc(F,W,G,C) and the farmer's location F and the wolf's location W are the same (F=W), then the farmer's new location F2, which is opposite F, and the wolf's new location W2 are the same. All of that can be stated more simply and efficiently by doing the pattern matching in the head of the rule:

move(loc(F,F,G,C), loc(F2,F2,G,C)) <- opp(F,F2)
     & safe(loc(F2,F2,G,C)).

The rules for moving the goat and the cabbage are analogous:

move(loc(F,W,F,C), loc(F2,W,F2,C)) <- opp(F,F2)
     & safe(loc(F2,W,F2,C)).
move(loc(F,W,G,F), loc(F2,W,G,F2)) <- opp(F,F2)
     & safe(loc(F2,W,G,F2)).

The predicate opp can be defined by two facts to say that east is opposite west, and west is opposite east:

opp(east,west).    opp(west,east).

In determining whether a state is safe, the critical item is the goat, which can either eat the cabbage or be eaten by the wolf. The next two rules say that a state is safe if either the farmer is on the same side as the goat or everything else is on the opposite side from the goat:

safe(loc(F,W,F,C)).
safe(loc(F,F,G,F)) <- opp(F,G).

Given these rules, the farmer problem is completely defined. The only thing that is needed is a general search procedure that tries to find a path of safe states from the start to the goal:

Start state:   loc(west,west,west,west)
Goal state:    loc(east,east,east,east)

Suitable search procedures are defined on page &xdepth;.

The other choice of data structure for this problem is a pair of sets. To represent pairs, Prolog provides the dot operator, written as a period between two items. The pair abc.def consists of the atom abc followed by the atom def. Pairs of numbers must have blanks around the dot: (3 . 14159) is a pair of two integers, but 3.14159 is a single floating-point number. Pairs can include other pairs: (a.b).(c.d) is a pair of two pairs. The dot operator is right associative: the structure a.b.c.d is exactly equivalent to (a.(b.(c.d))). The items linked by the dot operator can include any mixture of data structures, such as numbers, character strings, functional forms, other pairs, and even variables. The following are sample Prolog structures:

a.X.c.(Y.Z).'My Old Kentucky Home'. 7037 .xyz

farmer(north).wolf(south).goat(drowned).cabbage(eaten)

f(g(X),39).h(ABC).z(smith).order('One dozen eggs')

Sets are commonly used in logic and mathematics, but very few programming languages support them directly. Instead, most languages provide some kind of ordered list or vector. To represent a true set, the ordering must be ignored. In Prolog, lists are represented as a sequence of items connected with dots followed by the atom nil:

a.b.c.d.e.f.g.h.nil

The atom nil has two purposes in Prolog: it is the name of the empty list, and it serves to mark the end of longer lists. However, there is nothing magic about this atom; the atom end could have been used instead, but nil is the traditional name of the empty list in other languages, such as Lisp. Most implementations of Prolog also provide a bracket notation for lists that omits the final nil (although nil is still present in the underlying representation).

[a,b,c,d,e,f,g,h]

The dot and the bracket notations are exactly equivalent. The dot form is closer to the internal representation, and the bracket form is a permissible alternative. Following are some examples of lists in both the dot form and the bracket form:

nil                              []
abc.nil                          [abc]
nil.nil                          [[]]
abc.def.ghi.nil                  [abc,def,ghi]
nil.nil.nil.nil                  [[],[],[]]
(abc.nil).(def.nil).nil          [[abc],[def]]
((abc.nil).nil).nil              [[[abc]]]

For representing constants, the bracket form tends to be shorter, since it does not require a terminal nil. For combinations of variables, however, the dot form tends to be shorter. Since the dot form shows more clearly how Prolog processes lists, this book will primarily use the dot form in the examples. IBM Prolog supports both notations for lists; the desired form can be specified by a pragma statement — see page &calistpragma;.

Given this notation for pairs and lists, consider the second option for .hw repre-sent-ing states in the farmer problem: a pair of sets, with all items on the first side of the river in the first set and all items on the other side in the second set. Following are the starting state, one of the intermediate states, and the goal state for this option:

(farmer.wolf.goat.cabbage.nil) . nil
(wolf.cabbage.nil) . (farmer.goat.nil)
nil . (farmer.wolf.goat.cabbage.nil)

For nested lists, the bracket notation tends to be more readable. The dot operator for pairs can be combined with the bracket notation for lists:

[farmer,wolf,goat,cabbage] . []
[wolf,cabbage] . [farmer,goat]
[] . [farmer,wolf,goat,cabbage]

Whichever notation is used, these lists are actually supposed to be sets. Therefore the order must be ignored. But ignoring order is not easy, since all possible permutations must be treated as equivalent. Extra processing is needed to compare sets or to search for a particular element.

With the first data structure, the move predicate accesses each item by a direct pattern match. With the pairs of sets, however, it must call another predicate to search for each item. The predicate pick(X,L1,L2), which will be defined later, picks one element X from list L1 and puts all of the remaining elements in list L2. If X does not occur in L1, the predicate fails. Given that predicate, the first rule for move can be written:

move(S.G, S2.G2) <- pick(farmer,S,S2) & (G2=farmer.G)
     & safe(S2.G2).

In this rule, S.G represents a state where S is the list of items on the starting side of the river, and G is the list of items of items on the goal side. The next state is the pair of lists S2.G2. To generate the new state, the subgoal pick(farmer,S,S2) takes the farmer out of the list S and leaves the remaining items in S2. Then the next subgoal (G2=farmer.G) puts the farmer in front of the list G to form G2. Finally, the predicate safe checks the new state S2.G2. Actually, this rule is not complete, since it does not allow the farmer to move back from the goal side to the starting side. Another option is necessary to allow the farmer to move both ways:

move(S.G, S2.G2) <-
    (pick(farmer,S,S2) & (G2=farmer.G) |
     pick(farmer,G,G2) & (S2=farmer.S)) &
     safe(S2.G2).

With the pair of lists, the rules for safe are also more complex. The rule that a state is safe if the farmer and the goat are on the same side becomes the following:

safe(S.G) <- member(farmer,S) & member(goat,S)
           | member(farmer,G) & member(goat,G).

This rule uses the predicate member(X,L) to check whether X is a member of the list L; member is also defined later (it is actually the same as the pick predicate without the third argument). As an exercise, the reader should practice writing the other rules for move and safe, using this data structure instead of the first one.

As these examples show, fixed-length structures are easy to access by direct pattern matching. Variable-length lists require more searching with predicates like member or pick. This principle is true of almost all programming languages: if some item is in a known position, it is easier to access than when it must be found by searching through a list. Although lists require some searching to find an element, they are essential for structures that are constantly growing and shrinking. In choosing a data structure, the first question to ask is whether the number of items is fixed or changing:

If the number of items cannot change by the very nature of the problem, choose a fixed data structure, such as a pair X.Y, a triple X.Y.Z, or a functional form loc(F,W,G,C).
If the number of items may change or their location in the structure is unknown, use a list (which is a structure of dotted pairs terminated by nil).

At this point, the reader should turn to Exercises 7 through 10, which are state-transition problems similar to the farmer problem. In Exercise 7 with the two water jugs, the amount of water is variable, but the number of jugs is fixed. Therefore a good representation for the state is a pair of numbers that show the amount of water in each jug. The starting state is (5 . 0), and the goal state is (&asterisk; . 1), where the anonymous variable &asterisk; shows that any amount of water in Jug A is permissible in a goal state. One of the rules for defining the move predicate is the following:

 move(A.B, A2.B2) <- gt(A,0) & lt(B,2)
     & Total := A + B
     & ( ge(Total,2) & (A2 := Total - 2) & B2=2
       | lt(Total,2) & A2=0 & B2=Total).

This rule defines the conditions for pouring water from Jug A to Jug B. The pair A.B is the amount of water in each jug in the current state, and A2.B2 is the amount in the next state. The condition part of the rule says that Jug A must not be empty, gt(A,0), and Jug B must not be filled to the brim, lt(B,2). Then the rule computes the total amount of water in the system, Total. The last two lines consider the cases where the total amount is greater than or equal to the capacity of Jug B, ge(Total,2); or where it is less than the capacity of Jug B, lt(Total,2). If the total is greater than or equal to 2, Jug A receives the total minus 2, and Jug B is filled to its capacity 2. If the total is less than 2, Jug A becomes empty, A2=0; and Jug B has everything, B=Total.

The previous rule also illustrates two operators = and :=. The = operator unifies two things (variables or constants or structures) to make them equal; it fails if they cannot be unified. But the := operator, which is a synonym for =?, evaluates the expression on the right before unifying it to the variable or constant on the left. In A2=0, the variable A2 becomes 0. In (Total := A + B), the expression (A + B) is evaluated before the result is unified with the variable Total. By contrast, (Total = A + B) without the colon would do no evaluation; it would simply bind the unevaluated expression (A + B) to Total. The process of unification is described in more detail in the next section.

In Exercise 8 with missionaries and cannibals, the identities of the individuals are irrelevant; only the number of each kind is significant. Unlike the farmer problem, where the boat was always on the same side as the farmer, this problem requires a separate variable for the boat's location, since the missionaries and the cannibals can both operate it. Therefore a fixed-length structure of five variables could represent the state. One possibility is B.M.C.M2.C2, where B is the position of the boat, M and C are the numbers of missionaries and cannibals on the starting side, and M2 and C2 are the numbers on the goal side. The atom nil could be tacked to the end of this structure, to form the list B.M.C.M2.C2.nil. However, nil is not needed, since the structure has a fixed number of items that will be accessed by a pattern match, not by searching with predicates like member or pick.

In Exercise 9 with the monkey and bananas, the state could be represented by a fixed-length structure with three variables. One possibility is the triple M.B.P, where M is the monkey's location, B is the box's location, and P is the monkey's posture (standing, on box, or reaching). The following rule says that if the monkey is standing next to the box, it can push the box somewhere else:

move(M.M.standing, N.N.standing) <-
     member(N,a.b.c.d.m.nil) & ^(M=N).

The condition that the monkey is next to the box is that the monkey's location M is the same as the box's location. The member predicate is used here as a finder goal to pick some value for N out of the list of possible locations, a.b.c.d.m.nil. Then the test ^(M=N) verifies that the new location N is not the same as the old location M. Note that the predicate member can be used either as a finder goal or as a verifier goal. Negated predicates, however, can only be used as verifiers.

Unification:

Binding Values to Variables

Variables in Prolog are bound to values during pattern matching. The algorithm that binds values to variables is called unification. Special cases of unification have been used in all of the examples given so far: whenever a goal is matched to the head of a rule or fact, the arguments in the goal are unified with the arguments in the head. For example, if the goal is human(X), the variable X is unified or bound to some constant such as socrates. If the goal already has a constant, such as human(cicero), unification consists of checking whether the two constants are identical. Unlike conventional assignments, the binding performed during pattern matching is only tentative. If the value bound to X happens to cause some conflict in a later subgoal, the system backtracks and undoes the tentative binding. Then some other match may bind a new value to X.

Variables other than &asterisk; must have the same value at every occurrence within a clause; but they may have different values in different clauses. In other words, the scope of a variable name extends for just a single clause (rule, goal, or fact). The next two rules both have the same internal form:

sibling(X,Y) <- child(X,Z) & child(Y,Z) & ne(X,Y).
sibling(C1,C2) <- child(C1,Parent) & child(C2,Parent)
                & ne(C1,C2).

Internally, the system numbers the variables that occur in each clause. On output, it prints the first variable in a clause as V1, the second one as V2, and so forth.

In simple pattern matching, one pattern is made up of constants (character strings or integers) and the other is a mixture of constants and variables. The unification algorithm in Prolog is more general: it allows both patterns to contain unbound variables, and it tries to find values for the variables that will make both patterns the same. For example, if g(7,X,Y) is unified with g(Z,W,22), the value 7 is bound to Z, 22 is bound to Y, and X and W are cross-bound to each other. Two variables that are cross-bound do not immediately acquire a value, but if any value is later bound to one of them, the other gets exactly the same value. Unification can cause some complex substitutions. For example, f(g(X),X) matches f(Y,g(3)) with X bound to g(3) and Y bound to g(g(3)). The unification process is most easily seen when the functions are drawn in tree form: .sp

      f                      f                  f
     / \                    / \                / \
    g   X   unified with   Y   g   produces   g   g
    |                          |              |   |
    X                          3              g   3
                                              |

Unification starts by matching the heads of each tree. Since f=f, it continues by matching the left sides of the trees, making Y equal g(X). Then it matches the right sides, making X equal g(3) and thereby causing Y to become g(g(3)). When the final tree is rewritten in linear form, the result is f(g(g(3)),g(3)).

In the above example, unifying two data structures produced a structure that was bigger than either of the starting structures. The two occurrences of the variable X caused the structure to grow by multiple substitutions. With repeated substitution, it is even possible to produce infinite data structures. One way is to assert the rule,

p(X,X).

and then issue the goal p(Y,f(Y)). In trying to satisfy the goal, Prolog attempts to find some binding for the variable Y that will unify both arguments of p. Therefore it tries to substitute f(Y) for Y in the first argument. To be consistent, it must make the second argument f(f(Y)). But then it goes back to substitute f(f(Y)) for Y in the first argument and thereby makes the second argument f(f(f(Y))), and so on. Some versions of Prolog get hung up in a loop with such goals. IBM Prolog, however, can represent infinite terms, with pointers that point back to a higher node. For the above goal, the system produces the answer f(@(1)) to indicate an infinite tree with a loop of length 1 (shown by @(1)). The resulting structure does not actually occupy an infinite amount of storage. It simply contains a pointer that cycles back to a higher node. For most common applications, infinite terms are not needed, but the ability to create them keeps Prolog from getting hung up in a loop.

The unification operator = tries to unify its left- and right-hand arguments. If they are unifiable, it succeeds; otherwise, it fails. Since unification is one of the fundamental processes performed by the underlying inference engine, it could be defined by a single rule:

X = X.

This rule succeeds if the built-in pattern matcher can unify both arguments; otherwise, it fails. Because the = operator is so useful, IBM Prolog provides it as a built-in predicate.

The = operator can be used either to test for equality or to force two terms to become equal. If X were an unbound variable, either goal X=5 or 5=X would bind the value 5 to X. But if X had a previous value, then the same goal would test whether that value is 5. For unbound X, = has the effect of an assignment that can be undone by backtracking; otherwise, it has the effect of a comparison. The evaluation operator ? can be used to evaluate an expression before the unification is performed. Consider the following goals:

X = 4 + 5 .
X = ? 4 + 5  .
3 &asterisk; 3 = 4 + 5 .
? 3 &asterisk; 3 = ? 4 + 5 .

In the first goal, the unevaluated expression (4 + 5) is bound to X. But with ?, the result 9 is bound to X. Whenever evaluation is required, =? or the synonym := must be used. The third goal fails because the expression (3 &asterisk; 3) is not identical to the expression (4 + 5). The fourth goal, however, succeeds because each side evaluates to the same value 9. (Note that a blank space must be typed between an integer and the ending period; otherwise, 5. would be treated as a floating point number.)

Unification is a general tree-matching process. The most commonly used operator for building trees in Prolog is the dot operator, used in forming lists. Suppose the pattern of variables A.B.C.D.E were unified with the following list:

apples.peaches.pumpkin.pie.nil

Then the pattern match would succeed with the result, A=apples, B=peaches, C=pumpkin, D=pie, and E=nil. If the pattern had more than five variables, the pattern match would fail. But if the pattern had less than five variables, the pattern match would still succeed. For example, the single variable A by itself could match the entire list. But what about a pair of variables, say A.B? To see how patterns of variables match a list, draw the list as a tree. Since the dot is a right-associative operator, the fully parenthesized form of the fruit list would be

apples.(peaches.(pumpkin.(pie.nil))).

Following are the tree forms for the pair A.B and the entire fruit list: .in7 .sp .in0

           .                            .
          / \                          / \
         A   B                   apples   .
                                         / \
                                  peaches   .
                                           / \
                                    pumpkin   .
                                             / \
                                          pie   nil

Here A matches apples, and B matches the entire sublist peaches.pumpkin.pie.nil. When a dotted pair of variables, like A.B, matches a simple list, the first one matches a single element, called the head of the list; the second one matches an entire sublist, called the tail. If there are more than two variables, each variable but the last matches a single element, and the last variable matches everything that is left over. If the pattern X.Y.Z were matched to the fruit list, the result would be X=apples, Y=peaches, and Z=pumpkin.pie.nil.

When parentheses are used for grouping, any element could itself be a complex subtree. In the list (a.b.nil).(c.d.nil).(e.f.nil).nil, the head is the list a.b.nil, and the tail is a list of two elements, each of which is itself a list of two elements: (c.d.nil).(e.f.nil).nil. Although the dot notation is convenient for simple lists, the bracket notation tends to be more readable when the lists contain sublists. In the bracket notation, this list is written [[a,b],[c,d],[e,f]]; its head is [a,b], and its tail is [[c,d],[e,f]].

In the bracket notation, the fruit list would be written [apples,peaches,pumpkin,pie]. To represent patterns of variables, the exclamation point ! separates the variables that match single elements at the beginning from the variable that matches the tail. The pair A.B is equivalent to the bracket form [A!B], and the pattern X.Y.Z is equivalent to [X,Y!Z]. Note that for patterns of variables, the dot notation is shorter than the bracket notation.

Ordinary Prolog lists branch to the right because the dot operator is right associative. By means of the special predicate op, new operators can be defined to form different kinds of structures. The dot, for example, is defined by the assertion

op(".",rl,100).

The second argument rl says that it is right-to-left associative, and the third argument defines its precedence as 100 (which gives it tighter binding than the Boolean operators and most of the common predicates). It is possible to declare some other symbol, such as ?, as a kind of list-forming operator that would be left-to-right associative:

op("?",lr,100).

This definition makes ? an infix operator. Its built-in definition as a prefix evaluation operator is not affected: the same symbol can be given different definitions for different numbers of arguments. Then the structure a?b?c?d would be equivalent to the fully parenthesized form ((a?b)?c)?d. Note the difference between the tree for this structure and the tree for the dotted form a.b.c.d: .in7 .sp .in0

             ?                           .
            / \                         / \
           ?   d                       a   .
          / \                             / \
         ?   c                           b   .
        / \                                 / \
       a   b                               c   d

With the left associative ? operator, the rightmost value d is closest to the top of the tree. With the right associative . operator, the leftmost value a is closest to the top. For the pair A?B, the variable A would match the entire subtree a?b?c, and B would match the single element d. Either the dot or the question mark could be used as a basis for list processing, but the right associative dot is the more traditional.

In list processing, the most common operations are to extract the head or the tail of an input list. Lisp, for example, has two selector functions called CAR and CDR to perform those operations. Instead of using special functions, Prolog performs those extractions by pattern matching with a pair H.T where H matches the head and T matches the tail. Pattern matching normally takes place while unifying arguments when a rule is invoked. Sometimes it is more convenient to use the = operator in the body of a rule. Consider the following goal:

L=(H.T).

This goal might be used in three ways: If H and T were unbound variables, it would split the list L in two parts, binding the head to H and the tail to T. If only L were an unbound variable, the same goal would construct a new list L from a head H and a tail T. Finally, if all variables L, H, and T were bound, it would simply check for equality.

One of the strengths of Prolog is its ability to express its own definition. It is not difficult to describe the unification process in Prolog itself. However, most applications of unification can be understood just by drawing trees and doing repeated substitutions.

List-Handling Predicates

Predicates that process lists are usually defined by recursive rules. Their definitions typically require two rules. The first rule processes the empty list, and the second rule processes any list longer than nil. Following is the general scheme:

If the list is nil, return the value expected for nil.
Otherwise, split the list into its head and tail, process the head, call the predicate recursively to process the tail, and combine the result of processing the head with the results of processing the tail.

As an example of this scheme, the predicate total(L,T) computes the total T of a list of numbers L:

total(nil,0).
total(Head.Tail,T) <- total(Tail,Subtotal)
    & T := Head + Subtotal.

In this case, the expected result for nil is 0. For any longer list, the pattern Head.Tail splits the input list with Head matching the first element, and Tail matching all the rest. The recursive call to total computes the subtotal, which is added to the head to form the total T. Note that this computation will always terminate because each recursive call processes a sublist that is shorter than the previous list.

The predicate length(L,N), which computes the number of elements N in the list L, is slightly simpler than total since it only counts elements instead of adding them.

length(nil,0).
length(Head.Tail,N) <- length(Tail,M) & (N := M + 1).

The first fact says that the length of the empty list is 0. The second rule says that the length of any list Head.Tail is one more than the length of Tail.

The predicate member(X,L), which checks whether X is a member of the list L, does not exactly fit the general scheme since it does not specify a rule or fact to deal with nil:

member(X,X.Tail).
member(X,Head.Tail) <- member(X,Tail).

The first rule says that X is a member of a list if it is the head. The second rule says that X is a member of a list if it is a member of the tail of the list. If the list were originally empty (represented by the symbol nil), both of the patterns that contain dots, X.Tail and Head.Tail, would fail to match. Therefore both rules would fail. That is exactly what should happen, since by definition, nothing is ever a member of the empty list.

The predicate pick(X,L1,L2), which picks an element X out of the list L1 and puts the remainder in L2, behaves like member with an extra argument:

pick(X,X.Tail,Tail).
pick(X,Head.Tail,Head.Rem) <- pick(X,Tail,Rem).

The first rule says that if X occurs at the head of a list, the remainder is the tail. The second rule says that X can be removed from a list by picking it out of the tail; the remainder Head.Rem is the old head combined with the remainder Rem of picking X out of the tail. If the list were originally empty, both rules would fail to match, as indeed they should.

Many predicates process two input lists L1 and L2 to generate an output list L3. There are three basic ways of combining the elements of L1 and L2 to generate the elements of L3. Each method is recursive, but they differ in the ways they treat their input and output arguments.

Element-by-element: Each step combines the head of L1 with the head of L2 to form the head of L3. Then the predicate calls itself recursively to process the tails. The predicate listadd is an example of this method.
Appending: Each step processes the head of L1 to form the head of the output L3. When L1 is finally reduced to nil, the second argument L2 is processed to form the tail of L3. The predicate append is the primary example of this method.
Reversing: Each step processes the head of L1 to form the head of L2. When L1 has been reduced to nil, the second argument L2 has the results built up in reverse order. Then the whole list L2 is transferred unchanged to L3 to be passed back as the result. The three-argument form of reverse, given at the end of this section, is an example of this method.

These three methods are language-independent techniques for handling lists. Variations of them are used in Lisp and in pointer-based languages like Pascal and PL/I. Recognizing them as distinct approaches helps to explain why different rules handle their arguments in different ways.

As an example of the first approach, consider the predicate listadd, which does an element-by-element add of L1 to L2 to generate L3. It is defined by the following rules:

listadd(nil,nil,nil).
listadd(H1.T1,H2.T2,H3.T3) <-
    (H3 := H1 + H2) & listadd(T1,T2,T3).

If both of the input lists have been reduced to nil, the result (the third argument) is also nil. The second rule takes care of the general case: it splits the first list into a head and a tail (H1.T1); splits the second list into (H2.T2); and generates the result (H3.T3) by adding the two heads and calling itself recursively to add the two tails.

With the element-by-element approach, a problem would occur if one input were shorter than the other. Therefore two starting rules may be used to say what to do if either one input or the other is reduced to nil:

listadd(nil,L,L).
listadd(L,nil,L).
listadd(H1.T1,H2.T2,H3.T3) <-
    (H3 := H1 + H2) & listadd(T1,T2,T3).

The first rule says that if the first list is nil, the result (the third argument) is the second list unchanged. Then the next rule says that if the second list is nil, the result is the first list. The last rule is the same as before.

Concatenating two lists is not as simple as concatenating two character strings. The reason why lists are more difficult to handle is that they are actually trees, not flat structures like strings and vectors. For flat structures, the position of the end can be computed from the length. For trees, however, a program must scan to the end, looking for the marker nil. As an example, consider the two lists X=a.b.nil and Y=c.d.nil. The result Z should be a.b.c.d.nil. But simply writing the pair X.Y produces a very different structure (a.b.nil).(c.d.nil). The difference is most obvious when the structures are drawn as trees: .sp

       .                                .
      / \                              / \
     a   .                            /   \
        / \                          /     \
       b   .                        .       .
          / \                      / \     / \
         c   .                    a   .   c   .
            / \                      / \     / \
           d  nil                   b  nil  d  nil

     a.b.c.d.nil                (a.b.nil).(c.d.nil)

Note that the dot operator builds a new binary tree with each of the previous trees as branches. A full concatenation (typically called append) must scan through the first tree to find the ending nil and replace it with the second tree. That scanning process takes place while concatenating lists in any language. In pointer-based languages like Pascal and PL/I, that scanning would be done by a while-loop that continues until a null pointer is found. In Lisp and Prolog, the scanning is done by recursive calls that continue until the first argument is reduced to nil.

Given input lists L1 and L2, each recursive call to the append predicate moves the head of L1 to the head of the output list L3. When L1 has been reduced to nil, the second input L2 is passed to the output L3 unchanged. Then as each recursive call returns, the final result is built up as the third argument. Although this process sounds complicated in English, it is much shorter in Prolog:

append(nil,L,L).
append(Head.Tail,L,Head.T2) <- append(Tail,L,T2).

The first rule says that appending nil to any L leaves L unchanged. The second rule says that appending a list of the form Head.Tail to L yields Head.T2 if Tail appended to L yields T2.

To show how Prolog processes these rules, the trace predicate may be used to turn on tracing of all calls and exits. To trace the append predicate, type the following goal:

trace(append(&asterisk;,&asterisk;,&asterisk;)).

This goal turns on tracing for the append predicate; the three asterisks represent the three argument places of append. After tracing has been turned on, the goal

append(a.b.nil, c.d.nil, L)

generates output in the following form: .sk .in2 1 : call ==> append(a.b.nil, c.d.nil, &asterisk;) . 2 : call ==> append(b.nil, c.d.nil, &asterisk;) . 3 : call ==> append(nil, c.d.nil, &asterisk;) . 3 : exit ==> append(nil, c.d.nil, c.d.nil) . 2 : exit ==> append(b.nil, c.d.nil, b.c.d.nil) .
1 : exit ==> append(a.b.nil, c.d.nil, a.b.c.d.nil) . .in0 .sk The numbers on the left indicate the nesting level of calls; indentation makes the nesting more visible. The first call has the original inputs. The second line shows the next recursive call, where the tail of the original list b.nil is now being appended to the second list. By the third call, the first argument has been reduced to nil. Therefore this call matches the first rule for append, which forces the third argument to be equal to the second. At each exit from the recursion, the value of Head at that level is tacked onto the front of the list returned from the previous level. At the end, the third argument contains the final result.

The techniques for handling lists in Prolog are similar to the Lisp techniques for equivalent operations. They also have strong similarities to the techniques used in pointer-based languages like Pascal or PL/I. The best way to become familiar with them is to practice them with various kinds of lists. Try tracing append on different combinations of lists: a one element list a.nil with an empty list; longer lists, such as a.b.c.nil and x.y.z.nil; and lists with complicated sublists, ((a.b.nil).c.nil).(x.nil).nil). Study the output to see which rule is being invoked at each step and which substructure is being matched to each variable in the rule.

Tracing shows how the recursive process works. But the best way to understand a recursive rule is to ignore all of the calling and returning, which can become extremely confusing in complex cases. Instead, recursive rules should be analyzed by the process of mathematical induction:

First show that the rule works for the starting case, such as the empty list nil.
Then assume that it works for any list of length N and show that it must therefore work for any list of length N+1.

To apply these principles to append, note that the first rule gives the correct result of appending nil to any list. Then if the first argument is of length N+1, assume that the recursive call correctly computes the result T2 of appending Tail (which is of length N) to L. If that assumption is correct, then Head.T2 must be the result of appending a list of length N+1. The principle of mathematical induction guarantees that such assumptions are always safe. Therefore the details of how the recursion actually takes place can always be ignored. Programmers who think recursively just "assume away" all of the messy details.

As another example, consider a predicate to reverse a list. In this case, the simplest predicate to define is not the most efficient. But it is interesting to see why it is not efficient and to see how it could be improved. The next two rules define the simple version of reverse:

reverse(nil, nil).
reverse(Head.Tail, R) <-  reverse(Tail, RT)
     & append(RT, Head.nil, R).

The first rule says that the reversal of nil is nil. Then the second rule says that to compute the reverse of a list Head.Tail, first reverse the tail to form RT; then append RT in front of the head. This predicate will work correctly, but it runs in time proportional to the square of the number of elements in the list. The number of calls to reverse itself is just proportional to the length N. But the number of calls to append is proportional to the square of N, since append calls itself many times for each call to reverse.

A slightly more complex version of reverse uses an extra variable as intermediate storage. Each time it calls itself recursively, it builds up a reversed list in the second argument, which is finally returned as the answer in the third argument:

reverse(nil, Answer, Answer).
reverse(Head.Tail, Int, Answer) <-
    reverse(Tail, Head.Int, Answer).

The first rule says that when recursion finally reduces the first argument to nil, the second argument should be returned as the answer. The second rule splits the head from the first argument, and tacks it on the front of the list