A Prolog to Prolog

John F. Sowa

4. Performance and Optimization

Prolog can be a highly efficient language when its features are used to their best advantage. But the powerful backtracking facilities can lead to a combinatorial explosion when they are not used properly. This section discusses ways of avoiding inefficiencies and improving performance. There are two basic ways of improving performance: finding a better algorithm, and reordering the search.

Choosing an Algorithm

The choice of algorithm is the most important factor in determining performance. A microprocessor running an efficient algorithm can outperform a supercomputer running an inefficient one. As an example, consider the problem of sorting a list in alphabetical order. Several different algorithms might be considered:

  • a permutation sort (psort), which generates permutations of the list until it finds an ordered one;

  • a bubble sort (bsort), which repeatedly interchanges adjacent elements that are out of order;

  • Quicksort (qsort), which partitions the list into sublists and sorts each part;

  • a merge sort (Exercise 28);

  • the built-in compute predicate.

Using the built-in predicate is the most efficient and easiest to write, but the other sorts are interesting to examine, since they illustrate important Prolog techniques. The built-in compute predicate and the merge sort are probably the best ones to use in practice.

The first step in writing the permutation sort is to define the permute predicate for permuting a list. With the pick predicate defined on page &xpick;, permute can be defined in two lines:

permute(nil,nil).
permute(L,H.T) <- pick(H,L,Rem) & permute(Rem,T).

The first line says that the only permutation of the empty list nil is nil itself. The second line says that any other list L can be permuted by picking an arbitrary element H from L as the new head and permuting the remainder Rem to form the new tail T. As an example, consider the following goal:

permute(a.b.c.nil, P).
At the first call to permute, pick selects the first element H=a, and the remainder Rem is b.c.nil. Then the first call to permute Rem leaves the result T=b.c.nil. Therefore the first permutation is the identity: P=a.b.c.nil. Each time backtracking returns to the pick predicate, a different element is selected. The second permutation is a.c.b.nil, and so on up to the sixth permutation c.b.a.nil. Then asking for another permutation causes a failure. To see how the permutations are generated, the reader should type these goals at the keyboard, tracing both the pick and permute predicates.

To sort a list L1, find another list L2 that is an ordered permutation of L1. The following rule defines psort in exactly that way:

psort(L1,L2) <- permute(L1,L2) & ordered(L2).
If L1 is already ordered, permute leaves it unchanged at the first call, and ordered verifies that it is correctly sorted. But if ordered fails, backtracking keeps returning to permute to run through all possible permutations until an ordered one is found. The predicate ordered is defined by one recursive rule together with unconditional rules to handle nil or a one-element list:
ordered(nil).
ordered(A.nil).
ordered(A.B.Tail) <- le(A,B) & ordered(B.Tail).

The ordered predicate is deterministic: it does no backtracking and takes N calls to check that a list of length N is ordered. But the number of permutations increases as the factorial of N. To sort a list of two or three elements, psort is fast. It is also fast if the original list is already ordered. But to sort the 26 letters of the alphabet from typewriter order (q.w.e.r.t.y...) to alphabetical order would take about a trillion centuries.
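Readers who want to trace these definitions without an IBM Prolog system can model them in another language. The following Python sketch is an illustration only: pick, permute, ordered, and psort are named after the Prolog predicates, ordinary Python lists stand in for dotted lists, and generators imitate backtracking.

```python
def pick(lst):
    """Nondeterministic pick: yield each element with the remainder."""
    for i, x in enumerate(lst):
        yield x, lst[:i] + lst[i + 1:]

def permute(lst):
    """Yield every permutation, mirroring backtracking into pick."""
    if not lst:
        yield []          # the only permutation of nil is nil
        return
    for head, rest in pick(lst):
        for tail in permute(rest):
            yield [head] + tail

def ordered(lst):
    """Deterministic check that adjacent elements are in order."""
    return all(a <= b for a, b in zip(lst, lst[1:]))

def psort(lst):
    """Generate and test: return the first ordered permutation."""
    return next(p for p in permute(lst) if ordered(p))
```

As in the Prolog version, the first permutation generated is the identity, so psort is fast on a list that is already ordered but factorial in the worst case.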

The psort predicate is an example of the generate and test method of problem solving: it generates a trial solution and calls ordered to test it. If ordered fails, backtracking goes back to generate another. There are two ways of improving a generate and test approach:

  • Heuristics: Use knowledge about the problem to guide the choice of trials and avoid those that have no chance of succeeding.

  • Divide and conquer: Split the problem into multiple subproblems, each of which is easier to solve; then combine the partial results to produce the final result.
Instead of trying all possible permutations, heuristics could guide the sort program by selecting only permutations that make some "improvement"; i.e., each new trial must be slightly better ordered than the previous one. A divide and conquer approach would split the list into two parts, sort each part, and combine the results.

The bubble sort calls the predicate bubble to improve each pass through the list. Although a bubble sort is not usually considered a heuristic program, it illustrates the use of knowledge about the solution to improve performance. The predicate bsort calls bubble to generate an improved permutation:

bsort(L,L) <- ordered(L).
bsort(L1,L2) <- bubble(L1,L) & bsort(L,L2).
Instead of using backtracking to generate each permutation, bsort calls itself recursively. The reason it does not use backtracking is that each call needs the previous result to generate an improved permutation. Backtracking is good when the previous result is not needed, but it throws away information that algorithms like this may need. The bubble predicate checks pairs of adjacent elements and interchanges them if they are out of order. After each pass by bubble, the list is better sorted than before.
bubble(A.B.Tail, L) <-
     (le(A,B) -> (bubble(B.Tail,T) & L=A.T);
                 (bubble(A.Tail,T) & L=B.T)).
bubble(A.nil, A.nil).
The recursive rule checks whether A and B are in order. If so, it calls itself to check B.Tail; if not, it makes B the new head and calls itself to check A.Tail.
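The same pass can be sketched in Python (again only an illustration; bubble and bsort mirror the Prolog rules, with Python's <= standing in for le):

```python
def bubble(lst):
    """One pass: keep the head if the first pair is in order,
    otherwise promote the second element, then recurse on the rest."""
    if len(lst) <= 1:
        return list(lst)
    a, b, tail = lst[0], lst[1], lst[2:]
    if a <= b:
        return [a] + bubble([b] + tail)
    return [b] + bubble([a] + tail)

def bsort(lst):
    """Call bubble repeatedly until the result is ordered."""
    while any(a > b for a, b in zip(lst, lst[1:])):
        lst = bubble(lst)
    return list(lst)
```

Note that bsort keeps the previous pass's result and feeds it to the next pass, which is exactly why the Prolog version uses recursion rather than backtracking.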

The time for the bubble sort increases as the square of the length N. For the list of 26 letters in QWERTY order, bsort takes 10 milliseconds. The predicate qsort is faster: it takes only 2 milliseconds for the same list. Even more important is the rate at which computation time increases; the time for qsort increases as N*log(N): for 52 letters, bsort takes 38 ms, but qsort takes only 5 ms. The qsort algorithm is a version of divide and conquer, called Quicksort by its originator, C. A. R. Hoare. It uses a predicate partition(M,L,L1,L2) to split a list L into two parts L1 and L2. M is an arbitrary element of L; for best performance, it should be near the midpoint, but any element of L is acceptable. After partitioning, every element of L1 is less than or equal to M, and every element of L2 is greater than M:

partition(M,H.L,H.L1,L2) <-
    le(H,M) & partition(M,L,L1,L2).
partition(M,H.L,L1,H.L2) <-
    gt(H,M) & partition(M,L,L1,L2).
partition(M,nil,nil,nil).

The first two rules compare M to H, the head of the list. Depending on the comparison, they put H in front of L1 or L2. The third rule takes care of nil.

The qsort predicate keeps calling partition to split the input list into smaller and smaller sublists. Then it calls itself to sort the sublists and calls append to combine the results:

qsort(nil,nil).
qsort(M.L,S) <- partition(M,L,L1,L2) &
    qsort(L1,S1) & qsort(L2,S2) &
    append(S1,M.S2,S).
The recursive rule takes the head of the input list M as the partitioning element. Then all the elements no larger than M go to L1, and the others go to L2. Finally, append combines the sorted sublists S1 and S2 with M in the middle. The choice of partitioning element M can have a major impact on performance. For relatively random lists, choosing the head is generally good. But if the input list is almost sorted, the head is the worst choice. If the 26 letters were already in alphabetical order, qsort would take 7 milliseconds, but bsort and even psort would take less than one millisecond. The merge sort given in Exercise 28 is insensitive to the order: it takes 2 ms to sort 26 letters, and 5 ms for 52 letters independent of their original order.
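In Python, the same divide-and-conquer structure looks like this (an illustration only; partition and qsort mirror the Prolog predicates, and list concatenation plays the role of append):

```python
def partition(m, lst):
    """Split lst around m: elements le m go left, gt m go right."""
    left = [x for x in lst if x <= m]
    right = [x for x in lst if x > m]
    return left, right

def qsort(lst):
    """Head as partitioning element; sort the parts and append
    them with the head in the middle."""
    if not lst:
        return []
    left, right = partition(lst[0], lst[1:])
    return qsort(left) + [lst[0]] + qsort(right)
```

Taking the head as the partitioning element reproduces the behavior described above: good on random input, degenerate on input that is already sorted.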

Sort programs rarely sort simple lists of numbers or character strings. Usually, they sort larger records or structures where one part is the key and the rest is some data associated with the key. In Prolog, such structures can be represented as lists with the head element as the key. Note that the only places at which the qsort-partition program deals with items that cannot be structures are le(X,M) and gt(X,M). To handle structures, it is enough to replace these with, say, new_le(X,M) and new_gt(X,M). As long as we provide appropriate rules for these new predicate names, the program will sort lists of any kind of data structure. For our proposed Key.Data structure, we need

new_le(K1.*,K2.*) <- le(K1,K2).
new_gt(K1.*,K2.*) <- gt(K1,K2).
Suppose we make these changes, and then give the goal
qsort((4 .door).(2 .shoe).(3 .tree).(1 .bun).nil,S).
Then S gets the value
(1 .bun).(2 .shoe).(3 .tree).(4 .door).nil.
For this example, the key is a number, and the associated data is a single word. The program would work just as well with data of varying length, such as:
(4 .door.knob.lock.key.nil).(2 .shoe.sock.nil)
   .(3 .tree.leaf.nil).(1 .bun.nil).nil.
In rearranging the lists, structure sharing versions of Prolog move pointers rather than the actual data. Therefore qsort takes a similar amount of execution time for lists with one word of data per key as for lists with hundreds of data items per key.
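The key-comparison variant is easy to imitate as well. In this Python sketch (an illustration; new_le corresponds to the rule above, and tuples stand in for the Key.Data lists), only the keys are compared while whole records are moved:

```python
def new_le(r1, r2):
    """Compare two (key, data...) records by key only."""
    return r1[0] <= r2[0]

def qsort_records(recs):
    """The qsort/partition program with le/gt replaced by
    key-only comparisons, sorting (key, data...) records."""
    if not recs:
        return []
    m, rest = recs[0], recs[1:]
    left = [r for r in rest if new_le(r, m)]
    right = [r for r in rest if not new_le(r, m)]
    return qsort_records(left) + [m] + qsort_records(right)
```

As in the structure-sharing Prolog implementations, Python moves references rather than copying the data, so the cost is nearly independent of the size of each record.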

Finally, the most efficient sort is the one that relies on the built-in compute predicate:

sort(L1,L2) <- compute(set,X,member(X,L1),nil,L2).
The compute predicate in this form generates the set of all X's that are members of L1 and accumulates them in L2 in sorted order. IBM Prolog simply calls a highly optimized machine language program to do the computation. Unlike the other sorts, this one will discard duplicates. If the list L2 needs to retain all duplicates in L1, then the merge sort in Exercise 28 is probably the best.
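The effect of this rule, collecting the members of L1 as a sorted set, can be imitated in one line of Python (an illustration only; compute itself is specific to IBM Prolog):

```python
def set_sort(lst):
    """Imitate compute(set, X, member(X, L1), nil, L2):
    collect the distinct members in sorted order."""
    return sorted(set(lst))
```

Like the compute version, this discards duplicates, so it is a set former rather than a general-purpose sort.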

Generate and Test

In a large class of artificial intelligence problems, values must be assigned to a number of variables to satisfy certain constraints. Exercise 16 on marching bands and Exercise 17 on assigning offices are examples of constraint satisfaction problems. Other examples include cryptarithmetic problems that assign digits to letters to satisfy equations like SEND+MORE=MONEY or DONALD+GERALD=ROBERT. A database query is an example of constraint satisfaction where the body of the query states constraints on the desired combinations of values that are selected from the database. Whereas a database system merely selects pre-existing values, an expert system for design may generate new values that meet given constraints. The R1 computer configurator, for example, assigns values to about 1,000 different variables in determining a complete system configuration (McDermott 1982). Scheduling problems like Exercise 25 at the end of this chapter have a great deal in common with configurators and require similar techniques of constraint satisfaction.

A common approach to solving such problems is generate and test: pick some value for each design variable, then test to see if that combination of values satisfies the constraints. If so, the problem is solved. If not, pick a new set of values, and try again. The psort program on page &psort; is a particularly simple example of this.

This method is the epitome of the fixed-control approach: it fits all problems; all constraints may be stated in pure logic; and the control is clearly separated from the logic. Its only drawback is inefficiency: for the R1 computer configurator, there are about 1,000 variables to be defined with an average of 3 possible values for each one. The number of combinations to be tested is 3**1000 — an array of supercomputers calculating since the beginning of the universe could not finish this computation.

To reduce the combinatorial explosion, R1 uses flexible control. The knowledge engineer explicitly adds control information that determines the order of assigning values and testing constraints. With such ordering, R1 can reduce the exponential term to a nearly linear one. As a result, it finds a satisfactory assignment of values to the 1,000 variables in about two minutes of CPU time. The efficiency of R1 is a strong argument for a judicious blending of logic and control. Yet it does not show whether the blend should be determined by a person or by an optimizing compiler.

To illustrate the issues, consider a simple cryptarithmetic problem with 8 variables — in this case, the puzzle SEND+MORE=MONEY. Each letter represents a single digit; no digit is repeated; and the initial digits S and M must be positive. (See Raphael (1976) for an analysis of the problem BEST+MADE=MASER, which is the same problem with some variables renamed.) Newell and Simon (1972) extensively studied protocols of human problem solvers who addressed such puzzles. As a result, they recommended production systems as a model of human performance. The OPS5 production system used to implement R1 (Forgy 1981) is a direct descendant of their original proposals. This section, however, argues that a conscious imitation of human approaches is unnecessary; an optimal blend of logic and control can be determined by an analysis of the problem structure itself. We can determine a good sequence of execution from the declarative information that defines the problem.

For the sample problem, the declarative information is derived from the statement SEND+MORE=MONEY. Let C1 represent a carry from the units position; C2, a carry from the tens position; C3, a carry from the hundreds position; and C4, a carry from the thousands position. Then the puzzle determines five constraint equations:

     D + E = Y + 10 * C1
N + R + C1 = E + 10 * C2
E + O + C2 = N + 10 * C3
S + M + C3 = O + 10 * C4
        C4 = M
The carries C1, C2, C3, and C4 are restricted to be 0 or 1, with possible repetitions; the other variables are restricted to 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with no repetitions; and the leading digits must be positive: S>0 and M>0.

To solve the problem in Prolog, the first step is to define a nondeterministic predicate pick(X,L,REM) for picking an arbitrary element X out of a list L and putting all the remaining elements of L in the third argument REM. As we saw on page &ppick;, the predicate is defined by the following two rules in Prolog:

pick(X,X.REM,REM).
pick(X,HD.TL,HD.REM) <- pick(X,TL,REM).

Given the pick predicate, the following Prolog rule defines the solve predicate for the puzzle. This predicate has 3 arguments, each of which is a list of variables connected by dots. As the rule is executed, each variable is bound to an integer. At the end, each letter acquires an integer value to give 9567+1085=10652 as the solution to the original puzzle. Note that in a list of numbers, blanks are needed around the dots to avoid confusion with floating point numbers.

solve(S.E.N.D, M.O.R.E, M.O.N.E.Y) <-
    pick(M, 0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . nil, R9) &
    pick(S, R9, R8) &
    pick(O, R8, R7) &
    pick(E, R7, R6) &
    pick(N, R6, R5) &
    pick(R, R5, R4) &
    pick(D, R4, R3) &
    pick(Y, R3, R2) &
    gt(M, 0) & (C4=0 | C4=1) & M=C4 &
    gt(S, 0) & (C3=0 | C3=1) &
    O := S + M + C3 - 10 * C4 &
    (C2=0 | C2=1) &
    N := E + O + C2 - 10 * C3 &
    (C1=0 | C1=1) &
    R := 10 * C2 + E - N - C1 &
    Y := D + E - 10 * C1.

This rule first picks a value for M from the 10 digits 0 through 9 and puts the remaining 9 digits in R9. Then it picks a value for S from R9 and puts the remaining 8 digits in R8. After it has picked a value for each of the variables, it begins testing the constraints to see if the choice of values constitutes a valid solution to the puzzle.

As the body of the solve rule is executed, the first choice of values is M=0, S=1, O=2, E=3, N=4, R=5, D=6, and Y=7. When the constraint M>0 is tested, a failure occurs, and the system backs up to the most recent choice point — in this case, the predicate pick(Y,R3,R2). It then picks the new value Y=8 and tests the constraints once more. Again the test M>0 causes a failure, and the system backs up to try Y=9. At the next failure, it tries D=7 and Y=6. It continues exhaustively testing all 181,440 combinations of values for variables other than M before it finally backs up to try M=1. After more brute-force searching, it eventually finds the correct combination M=1, S=9, O=0, E=5, N=6, R=8, D=7, and Y=2.
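The same exhaustive search is easy to reproduce outside Prolog. In this Python sketch (an illustration only; solve_brute is an invented name), itertools.permutations enumerates the assignments that the chain of pick goals would generate, and the single equation SEND+MORE=MONEY is tested in place of the five carry equations, which is equivalent:

```python
from itertools import permutations

def solve_brute():
    """Generate and test: try every assignment of distinct digits to
    the eight letters, testing the constraints only after all values
    have been picked, as the unoptimized solve rule does."""
    for m, s, o, e, n, r, d, y in permutations(range(10), 8):
        if m == 0 or s == 0:
            continue
        send = 1000*s + 100*e + 10*n + d
        more = 1000*m + 100*o + 10*r + e
        money = 10000*m + 1000*o + 100*n + 10*e + y
        if send + more == money:
            return {'S': s, 'E': e, 'N': n, 'D': d,
                    'M': m, 'O': o, 'R': r, 'Y': y}
```

Like the Prolog rule, it churns through the entire M=0 block of assignments before reaching M=1.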

This rule takes about 13 seconds of CPU time on an IBM 3090. Although 13 seconds is not a long time for a complex problem, this problem could be solved in just a few milliseconds with a different ordering of the predicates in the rule. But to decide how to reorder the predicates, a fair amount of analysis is required, and the reordered rule is less readable than the one above. Furthermore, for dynamically changing inputs, it may not be possible to do that analysis in advance. The next section will show how to speed up the execution by reordering the predicates by hand. Chapter 3, on page &c3money;, will show how to achieve close to the same performance automatically by using special features of IBM Prolog.

Reordering the Generate and Test

The ideal shape for a search tree is a straight line that leads directly to a solution. The search tree that corresponds to the above program is a bush with 1,814,400 short branches, of which only one is correct. For the SEND+MORE=MONEY problem, there is no way to convert the search tree into the ideal, but it is possible to prune most of the unnecessary branches and make it more tree-like than bush-like. The optimization follows two general principles:

  • Prune branches on the search tree as early as possible.

  • Reduce the number of new branches that are allowed to sprout.

These principles give a graphic image of the approach, but they are not detailed enough to implement in an optimizing compiler. For generate-and-test problems, however, they map into the following more detailed principles:

  • Whenever all the variables in a constraint have values, that constraint should be tested before any new values are picked for other variables.

  • If every constraint has one or more unbound variables, select the constraint that has the smallest choice space and pick some value for each of its unbound variables.

To define the size of a choice space, let X1, X2, ..., Xn be the unbound variables in a constraint whose values cannot be computed from any other variables in that constraint. Then let choice(Xi) be the number of possible values that Xi can assume. The size of the choice space is defined as the product choice(X1)*...*choice(Xn). This method is a common technique in artificial intelligence: it was described by Kowalski (1979) as a way of optimizing logic programs and implemented by Warren (1981).

To illustrate these rules, apply them systematically to optimize the SEND+MORE=MONEY problem. Of the five constraint equations, the one with the fewest choices is C4=M. Since M is determined by the value of C4, the choice space is limited to the two possible values of C4. The further constraint M>0 eliminates the value 0 and completely determines the values M=1 and C4=1. The rule for solve can therefore begin without any nondeterminism:

    M=1 & C4=1
    & R9=(0 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . nil)

When C4 and M have values, the next most restricted constraint is S+M+C3=O+10*C4. In this equation, the three variables S, O, and C3 are without values; a choice for any two of them determines the third. Since C3 is limited to 0 or 1 and S is restricted by the further constraint that S>0, the choice space is 2 times 8, or 16. All the other equations have a choice space of 112 or more. The procedure should next pick a value for S, test S>0, pick a value for C3, compute the value of O from the choices for S and C3, and check whether the computed value for O is one of the possible values left to be assigned. In Prolog, that sequence becomes

    pick(S, R9, R8)
    & gt(S, 0) & (C3=0 | C3=1)
    & O  :=  S + M + C3 - 10*C4
    & pick(O, R8, R7)

After these choices have been made, the next most restricted constraint is E+O+C2=N+10*C3, which has unbound variables E, N, and C2. There are two choices for C2, and seven for E and N. Since E and N have the same choice space, either one can be picked first to determine a value for the other. Choosing E gives

    pick(E, R7, R6) & (C2=0 | C2=1)
    & N  :=  E + O + C2 - 10*C3
    & pick(N, R6, R5)

Now the most restricted constraint is N+R+C1=E+10*C2. Only R and C1 are unbound variables, and picking 0 or 1 for C1 determines R:

    (C1=0 | C1=1)
    & R  :=  10*C2 + E - N - C1
    & pick(R, R5, R4)

Finally, the last constraint to be tested is D+E=Y+10*C1, in which only D and Y are left unbound. Picking a value for either one determines the other. The end of the solve rule is therefore

    pick(D, R4, R3)
    & Y  :=  D + E - 10*C1
    & pick(Y, R3, R2).

Putting the pieces together produces the following Prolog rule. Except for the first line of the body, which simplifies parts of the previous rule, the rest of the rule is just a reordering of the previous one. The improvement in performance, however, is dramatic: it finds the correct solution in only 4 milliseconds — over 3200 times faster.

    solve(S.E.N.D, M.O.R.E, M.O.N.E.Y) <-
       M=1 & C4=1
       & R9=(0 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . nil)
       & pick(S, R9, R8)
       & gt(S, 0) & (C3=0 | C3=1)
       & O  :=  S + M + C3 - 10 * C4
       & pick(O, R8, R7)
       & pick(E, R7, R6) & (C2=0 | C2=1)
       & N  :=  E + O + C2 - 10 * C3
       & pick(N, R6, R5)
       & (C1=0 | C1=1)
       & R  :=  10 * C2 + E - N - C1
       & pick(R, R5, R4)
       & pick(D, R4, R3)
       & Y  :=  D + E - 10 * C1
       & pick(Y, R3, R2).
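The reordered rule translates almost line by line into nested loops. In this Python sketch (an illustration only; solve_reordered is an invented name), each for loop plays the role of a pick or a choice of a carry, and each computed value is checked against the digits still available:

```python
def solve_reordered():
    """Mirror the reordered Prolog rule: fix M=1 and C4=1, then pick
    and compute values in order of smallest choice space, testing
    each constraint as soon as its variables are bound."""
    m, c4 = 1, 1
    r9 = [0, 2, 3, 4, 5, 6, 7, 8, 9]
    for s in r9:
        if s == 0:                      # gt(S, 0)
            continue
        r8 = [x for x in r9 if x != s]
        for c3 in (0, 1):
            o = s + m + c3 - 10 * c4
            if o not in r8:             # pick(O, R8, R7)
                continue
            r7 = [x for x in r8 if x != o]
            for e in r7:
                r6 = [x for x in r7 if x != e]
                for c2 in (0, 1):
                    n = e + o + c2 - 10 * c3
                    if n not in r6:     # pick(N, R6, R5)
                        continue
                    r5 = [x for x in r6 if x != n]
                    for c1 in (0, 1):
                        r = 10 * c2 + e - n - c1
                        if r not in r5:
                            continue
                        r4 = [x for x in r5 if x != r]
                        for d in r4:
                            r3 = [x for x in r4 if x != d]
                            y = d + e - 10 * c1
                            if y not in r3:
                                continue
                            return {'S': s, 'E': e, 'N': n, 'D': d,
                                    'M': m, 'O': o, 'R': r, 'Y': y}
```

Because every computed value is checked as soon as it is known, almost all branches are pruned and the solution appears nearly instantly.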

Observations on the Method

The improved program is still a form of generate and test, but a more distributed form. Instead of having a single block that generates all combinations followed by another block that tests all constraints, it repeatedly generates a few combinations and immediately tests them. Reordering the sequence of generating and testing improves performance in three ways:

  1. Dependencies in the constraints allow certain values to be computed deterministically from the values already bound to other variables. In this example, the choice spaces for M, O, N, R, and Y were narrowed down to 1.

  2. Some constraints, such as M>0 and S>0, do not determine a unique value, but they at least reduce the size of the choice spaces.

  3. Other constraints serve as filters that break up the monolithic generate block into a sequence of smaller generate blocks — in this example, five blocks.

With weak constraints that are almost always true, the filtering effect is of little value. The ideal case occurs when the constraints that separate the generate blocks filter out all but one combination. With ideal filtering, the large problem breaks down into separable subproblems with no possibility of backtracking from a later block to an earlier one. If the SEND+MORE=MONEY problem were completely separable, the choice space for the reordered program would be the sum 1+16+14+2+4 or 37; with the maximum amount of backtracking, the choice space would be the product 1*16*14*2*4 or 1792. Since the generate block in the original program had a single choice space of 1,814,400 combinations, the expected improvement would be 1,814,400/37 or about 49,000 for the separable case and 1,814,400/1792 or about 1012 for the nonseparable case. The observed improvement in execution time was a factor of 3200, which is somewhere between the two extremes.

Unfortunately, not all problems can be optimized. The problem of assigning values to variables to meet certain constraints belongs to a class of problems called NP-complete, for which the only known algorithms take time that increases exponentially with the size of the problem. In the general case, no optimizer — human or machine — can derive a program that avoids the combinatorial explosion. Such cases arise when every constraint requires values for all variables before it can be evaluated. Then reordering does not help.

Fortunately, the constraints for many practical problems involve small subsets of the variables. The generate block can then be partitioned into a separate block for each subset. An ideal partitioning should have a large number of subblocks with just a few variables in each one. The partitioning is improved further when the constraints are so strong that they filter out all but a single option in the passage from one block to the next. With such strong filtering, the total number of combinations that have to be tested is the sum of the choice spaces in each subblock rather than their product. To improve performance without reordering the options by hand, IBM Prolog provides a way of delaying the execution of a subgoal until one or more of its variables have been assigned values. This technique is extremely valuable for constraint satisfaction problems, and its application to this puzzle will be given on page &c3money; in Chapter 3.

Copyright ©2001 by John F. Sowa.