. Before using Prolog effectively,
a programmer has to forget the old procedural habits and
learn a new declarative way of thinking.
After programmers recover from the initial shock of learning what
Prolog does not have, they can begin to appreciate the features that it
does have:
- Predicates that express relationships between entities,
- A method of defining predicates by asserting rules and facts,
- A method of asking questions or starting computations by
issuing goals,
- A backtracking search procedure for evaluating goals,
- Data structures that can simulate Pascal-like records or
Lisp-like lists,
- A pattern matcher that builds and analyzes the
data structures, and
- A set of built-in predicates for arithmetic, input/output, and
system services.
These features provide a language that is universal in the sense
that it can simulate a Turing machine—the most general
digital computing device known. In principle, Prolog can
compute anything that can be computed in any other programming language.
But the way that it is done in Prolog is totally different
from conventional languages. For complex problem solving,
the Prolog approach is typically the most convenient.
Facts and Predicates
In Prolog, a fact is a statement that some entity has a particular
property or that some relationship holds between two or more entities.
Facts are expressed by applying a predicate to a list
of arguments. For example, human(socrates) is a
Prolog fact: the predicate is named human, and the
argument is the constant socrates. This fact corresponds
to the English sentence "Socrates is human." Following are
other examples of facts stated in Prolog notation:
odd(5).
married(zeus, hera).
supply(acme, toothbrush, 100, 'July 14, 1989').
The fact odd(5) states that 5 is an odd number. The second
fact says that Zeus and Hera are married; and the third says that Acme
supplied 100 toothbrushes on July 14, 1989.
The single quotes in 'July 14, 1989' are used to enclose
a character string in IBM Prolog.
A constant or a predicate name can be chosen in many ways.
The fact qz3x(gglfn5) is handled by the Prolog system in the
same way as human(socrates). For readability, however,
the names should reflect the intention of the human programmer.
Following are some common naming conventions:
- A predicate with one argument, in the form p(A), usually
means that the entity &A has property or attribute p.
The fact dog(snoopy) means "Snoopy is a dog."
- A predicate with two arguments, in the form r(A,B),
typically means that &A stands in relation &r to B.
The fact child(sam,mary) means "Sam
is a child of Mary."
- For a predicate that does some computation, the input arguments are
usually placed first, and the results are put at the
end: sum(3,5,8) means "The sum
of 3 and 5 is 8."
- For a predicate that takes many arguments, the more
important arguments are put first, and the less important ones later.
For the fact "Acme supplied 100 toothbrushes on July 14, 1989,"
the most important argument is probably the supplier, next the type
of item, next the quantity, and finally the date. Therefore one
would write
supply(acme, toothbrush, 100, 'July 14, 1989')
Although these conventions are commonly followed, they do not always
apply. Some predicates, like married(A,B), are symmetric,
and either argument could be written first. For many computational
predicates, any argument may be treated as the output and the other
arguments as input: sum(X,5,8) computes a number X, which
when added to 5 produces 8; in this case, the first
argument X may be considered the output. The flexibility in
treating any argument as either input or output makes many Prolog
predicates reversible. This property will be used
in a number of examples throughout this book.
Constants, Variables, and Rules
Variables start with an uppercase letter.
Constants, however, are either enclosed in quotes, or they
start with a lower case letter or a numeric digit.
Following are the basic kinds of constants in IBM Prolog:
- An atom is any sequence of characters enclosed in double
quotes, such as
"July 14, 1989" or "apple". For atoms that start
with a lowercase letter and contain only letters and digits (no blanks
or punctuation), the double quotes may be omitted: "apple"
and apple are two identical atoms.
- A character string is any
sequence of characters enclosed in single quotes, such as
'July 14, 1989' or 'apple'. The single quotes can
never be omitted.
- Numbers may be either integers like 3796,
floating point numbers
like 3.14159 and 7.59e-10, or rational numbers like
22/7.
For integers and rational numbers, the only limit on size is the
amount of available storage; the system will keep allocating
storage up to a limit of 16 megabytes per number.
Rational numbers are written without blanks around the slash.
With blanks, 3 / 4 is treated as 3 divided
by 4. All three kinds of numbers may be intermixed
in computable expressions,
such as (3/4 + 2 + 1.9e-4) &asterisk; 5,
which could be evaluated to produce 13.75095.
Atoms and character strings differ in the way they may be processed.
Atoms are stored in a compact way that saves space, but there are no
predicates that operate on their internal structure.
Character strings take more storage space, but they
can be processed by various string-handling predicates.
A string '6xyz' can be converted to an atom A by the built-in
predicate st_to_at('6xyz',A); this is a reversible
predicate that
can also be used to convert an atom "6xyz" to a string
S as in st_to_at(S,"6xyz").
Besides variables like X or T1 that
begin with an uppercase letter, Prolog also supports
anonymous variables represented by just
an asterisk &asterisk;. The asterisk is
truly variable because it can have a different value every time
it appears. For example, p(&asterisk;,&asterisk;)
permits any values to occur in
the two different argument places, such
as p(3,99) or p(socrates,7); but p(X,X)
requires that both arguments be the same, such
as p(3,3) or p(socrates,socrates).
An anonymous variable may be used anywhere that a named variable can
occur; but each time it occurs, its previous value is ignored. For
that reason, the asterisk is sometimes called
a don't-care symbol because it shows that
the value is not needed in future references.
Predicates may be defined in two ways: by a complete list of all
the facts in which they occur; or by general rules that determine their
truth or falsity. As an example, all family relationships are
determined by lists of facts for three basic
predicates: child(C,P) means that C is a child
of P; male(M) means that M is
male; female(F) means that F is female;
and married(X,Y) means that X is married to Y.
Following are examples of facts for the child predicate:
child(oidipous,iokaste).
child(oidipous,laios).
child(antigone,iokaste).
child(antigone,oidipous).
child(eteokles,iokaste).
child(eteokles,oidipous).
child(polyneikes,iokaste).
child(polyneikes,oidipous).
Following are some facts that define
the male and female predicates:
male(laios). male(oidipous).
male(eteokles). male(polyneikes).
female(iokaste). female(antigone).
The married predicate is symmetric: if X is married to
Y, then Y is married to X. Therefore two facts must be stated
for each marriage; one for each ordering of the arguments.
married(laios,iokaste).
married(iokaste,laios).
married(oidipous,iokaste).
married(iokaste,oidipous).
With these four predicates defined by lists of facts, all other family
relations can be defined by rules. The easiest to define
is parent, which is the inverse of child:
parent(P,C) <- child(C,P).
In this rule, which is an example of a conditional,
the arrow <- may be read as the English word if.
Translated into English, the rule says that &P is a parent of &C if C
is a child of P.
Unlike standard logic, which has implications pointing to the right,
the conclusion in Prolog is written on the left. This way of writing
Prolog rules makes it easier for a person to find the rules that
define a predicate: when a predicate appears on the right side
(the body of a rule), it is being used or
called; but when it appears on the left side (the
head of a rule), it is being defined.
By scanning down the left margin of a listing, the Prolog programmer
can quickly find all the rules that define a given predicate.
The predicates father and mother require
a conjunction
of two conditions: M is the mother of C if C is a child of M
and &M is female. Likewise, F is the father of C if C is a
child of F and F is male.
mother(M,C) <- child(C,M) & female(M).
father(F,C) <- child(C,F) & male(F).
The symbol & is the Boolean operator for conjunction; it
may be read as the English word and. The reader should
practice translating these rules into English sentences: the
term mother(M,C) is read as "M is a mother of C";
the arrow is read "if"; and the & symbol
is read "and."
In the first rule above, C
occurs twice. If C has a value at one place in a rule,
then it must have exactly the same value at every other place.
The scope of a variable in Prolog is limited to
the rule in which it occurs;
variables with the same name in different
rules are treated as different variables.
Therefore, the variable C in the first rule has no effect on the
variable C in the second rule.
The next more
complicated predicate is sibling: X is a sibling of Y
if X is a child of some P, Y is a child of the same P,
and X and Y are not the same person.
sibling(X,Y) <- child(X,P) & child(Y,P) & ^(X=Y).
The symbol = represents an operator that tests for
equality, and the
operator ^ represents negation. Both of the
operators = and ^ have further properties that
will be discussed in more detail later.
Rules may be defined in terms of predicates that are themselves
defined by rules. The predicates sister and brother,
for example, can be defined in terms of sibling:
sister(S,X) <- sibling(S,X) & female(S).
brother(B,X) <- sibling(B,X) & male(B).
A recursive rule is one that defines a predicate in
terms of itself. An example is descendant: D is a
descendant of A if either &D is a child of A or D is a child
of a person P who is a descendant of A.
descendant(D,A) <- child(D,A)
| child(D,P) & descendant(P,A).
This rule introduces the symbol | for the Boolean disjunction
or or operator. The & operator has higher
precedence (tighter binding) than the | operator.
Strictly speaking, the | operator is not needed in Prolog,
since every use of | can be eliminated by writing extra
rules. The predicate descendant, for example, can be defined
by two separate rules. The first says that D is a descendant of A
if D is a child of A; the second says that D is a
descendant of A if D is a child of P and P is a
descendant of A:
descendant(D,A) <- child(D,A).
descendant(D,A) <- child(D,P) & descendant(P,A).
Both of these ways of defining descendant are about
equally readable and equally efficient. There is not much reason
for choosing one or the other.
In some cases, however, the | operator can improve
efficiency. Consider the definition of uncle. U is
an uncle of N under either of the following two conditions:
- By blood, N is a child of P, who is a sibling of U,
and U is male;
- By marriage, N is a child of P, who is a sibling of
a person Q, who is married to U, and U is male.
That pair of conditions can be mapped into two Prolog rules:
uncle(U,N) <- child(N,P) & sibling(P,U) & male(U).
uncle(U,N) <- child(N,P) & sibling(P,Q) &
married(Q,U) & male(U).
Since both of these rules have similar beginnings and endings,
the | operator could be used to
combine them in a single rule with the common parts factored out:
uncle(U,N) <- child(N,P)
& (sibling(P,U) | sibling(P,Q) & married(Q,U))
& male(U).
One further simplification is possible. The expression
sibling(P,U) may be replaced by the equivalent expression
sibling(P,Q) & Q=U. After making that substitution,
the two occurrences of sibling(P,Q) could also be factored out:
uncle(U,N) <- child(N,P) & sibling(P,Q)
& (Q=U | married(Q,U)) & male(U).
To understand this rule, read it as an English sentence: U is
an uncle of N if N is a child of P,
and P is a sibling of Q, and
either Q is the same as U or Q is married to U, and U is male.
Parentheses are needed around the phrase (Q=U | married(Q,U))
since | does not bind as tightly as &. This
phrase in the factored form of the rule illustrates an interesting
property of family trees: two people who are
married to each other have equal status in defining relationships like
aunt and uncle. This property holds for the
English kinship terms, but it is not true of the kinship terms in
many other languages and cultures.
Symmetric predicates like married must be defined by two facts
for each pair of arguments. It is tempting to simplify the definition
by asserting only one fact for each pair. Then the following rule
might be used to handle the other case:
married(X,Y) <- married(Y,X).
This rule says that X is married to Y if Y is married to X.
Unfortunately, this rule represents the most dangerous case of
recursion: the predicate calls itself recursively without making
a change to any of its arguments. If this rule is ever invoked,
it could cause a recursive loop that would only stop if
the system ran out of storage.
The definition of descendant also used a recursive rule,
but it used recursion more safely:
descendant(D,A) <- child(D,P) & descendant(P,A).
Before this rule calls itself recursively, it calls the child
predicate to find the parent P of D.
Therefore, each time descendant is called, its first argument
has a person who is higher up in the tree. If there are no loops in
the tree, it will eventually find the desired ancestor, or it will
stop with some individual whose parents are unknown.
A good way to ensure that a predicate is symmetric is to define it in
terms of some other asymmetric predicate. Consider the following
rules and facts:
wife(iokaste,laios).
wife(iokaste,oidipous).
husband(X,Y) <- wife(Y,X).
married(X,Y) <- wife(X,Y) | wife(Y,X).
With these definitions, wife is considered a primitive
predicate defined by a list of facts, and husband is
defined as the inverse of wife. Then X is married
to Y if either X is the wife of Y or Y is the wife of X.
The husband predicate could equally well have been used as
the primitive with wife and married defined in
terms of husband.
All of the rules discussed so far are
conditionals. Unconditional rules look like facts,
but they contain variables. For example,
a predicate related(X,Y) might
indicate that X is related to Y. Since
every person is related to himself or herself,
the following unconditional rule would say so:
related(X,X).
In list processing, which will be discussed later in this chapter,
there is a special list nil, called the empty list.
One might define a predicate sublist(L1,L2), which would
be true if the list L1 happened to be a sublist of
the list L2. Then the following unconditional rule
would state that nil is a sublist of
every list (including itself):
sublist(nil,L).
Another predicate used in list processing is append(L1,L2,L3),
which appends list L2 to the end of list L1 to form
a new list L3. The following unconditional rule says that
the result of appending nil to
any list L
is L unchanged:
append(nil,L,L).
The predicates for list processing are often defined by
two rules: the first rule is an unconditional rule that states what
happens to the empty list nil; and the second rule is a
recursive rule that handles the general case of a nonempty list.
Goals and Backtracking
Facts and rules reside in a Prolog workspace, but no
action takes place until a goal is entered. A simple goal
looks like a fact or an unconditional rule. The difference lies in
its use. For example, human(socrates) could be added to
the workspace as a fact. But it could also be entered as a goal
to ask the question "Is Socrates human?" In
IBM Prolog, the default syntax requires goals to be preceded by the
symbol <-. Therefore, the goal could be entered
by typing <-human(socrates). Since facts are more
commonly entered from a file and goals are more commonly
typed at the keyboard, it is often convenient to omit the
initial arrow for goals. Such conventions are determined
by pragmas, which are special global variables
that govern the default syntax and keyboard conventions.
Appendix A discusses the pragmas, including
<- pragma(allgoal,1),
which allows the initial arrow
to be omitted for goals.
Goals may also contain variables. The goal human(X) asks
the system to find some human and bind its value
to the variable X. Suppose the following facts had been
asserted:
human(socrates). human(cicero). human(voltaire).
greek(socrates). roman(cicero). french(voltaire).
When the goal human(X) is entered, the system looks for
the first fact that matches the goal, in this
case human(socrates),
and binds the value socrates to the
variable X. The other possible values cicero
and voltaire are ignored at first.
Binding is similar to assignment in other programming languages: it
causes a variable like X to acquire a value like socrates.
Yet binding is not as permanent as assignment.
It is possible to force Prolog to undo the binding by
backtracking to find another value for X.
For this example, if the user at the keyboard types the goal
<-human(X), the response is human(socrates).
To force backtracking, the user could then type a semicolon ;
which would cause the system to redo the previous goal and find another
binding for X. The second response would be human(cicero).
Another semicolon would force the system to backtrack and generate
the third response human(voltaire). One more semicolon would
force another backtrack, but no more options would be left.
Therefore, Prolog would report failure.
The semicolon is an operator that allows the user at the keyboard to
force backtracking. But backtracking also occurs when any goal
or subgoal happens to fail. Consider the compound
goal (human(X) & roman(X)). This goal has two
subgoals: human(X) and roman(X).
To process the compound goal, the Prolog system takes
the first subgoal human(X), finds some value for X,
and then processes the second subgoal roman(X) with the new
value of X. As before, the first value found for X is
socrates. It then tries to match the
second goal roman(socrates) to some fact in the workspace,
but no fact matches. Therefore, the system backtracks to
the first goal human(X). It unbinds the old
value of X and binds the next value cicero. With this new
value, the goal roman(cicero) matches one of the facts.
Then both halves of the compound goal succeed,
and the final value of X is cicero.
Backtracking sounds complicated, but the Prolog system makes it
efficient by keeping a stack of choice points.
Whenever it finds a list of alternatives, it always takes the
first one and puts a pointer to the next one on its stack of
choice points. When a failure occurs, it takes the most recent
choice point and continues from there. Since the stacks are
optimized for fast retrieval of choice points, Prolog
is an efficient language for processing alternatives
with complex interactions.
The goal human(socrates) was used to verify the fact
that Socrates is human. The goal human(X), however, was
used to find some