Numbers in Prolog


English speakers know how to express numbers using a series of English words. This act is sometimes so natural that it is done without a second thought. There may be software out there that can do the same thing, but I think it is worth examining the logic behind this in more detail. This example is a good opportunity to learn some Prolog, (I'll assume you don't know much about Prolog at this point), and linguistics. I also would like to spend some time with programming methodology for prolog. I like examining the linguistic of numbers because one can view it as a "sub-grammar" within a language. Sometimes, a language will demonstrate that sub-grammar of numbers is simply borrowed from other languages and demonstrates the language's origins. Once you can handle numbers, you can move onto ordinals, ("first", "second", "third"), time, dates, and adjectives in a language.

Goals

Prolog works with goals and searches for logic to reach this goal. For this exercise, we want to construct a Prolog "functor" which takes an integer value and generates a sequence of words, (using "atoms" in Prolog), that express the numeric value in a given language.

Prolog is a good languge to work on a problem from a top-down approach. We must start with the goal and work down toward details. So the first step is to specify what we want the program to do at the top level. We want a program that will do something like this:

Facts

?- num(0,[zero]).
?- num(1,[one]).
?- num(123,[one,hundred,and,twenty,three]).

In Prolog terms, the word "num" in each case is a "functor". It is operates on its arguments until it has resolved each unbound variable or fails. The above statements are goals, meaning that they have been fully resolved and "num/2", (an "arity" description saying that the fuctor is called "num" and has 2 arguments), only needs to decide whether it is true or false.

Many prolog books start out with "facts" like "cat(fluffy)". This is okay in a few cases, where you always work with cats, and just want to know attributes, but it is limited. as a general rule, a more flexible functor is "functor(IN,OUT)". That is, the first arguments are the input, the second is the output. An even better functor is one that is reversible -- meaning that you can enter an input and get results for the possible outputs. Enter an output and get all possible inputs. Most builtin functors can be used this way. so ... num(0,[zero]) ... has an input argument of integer, (and prolog is very generous to allow us a large precision). The output is a list with a set of words (atoms). this is the typical way words are handled in parsers in Prolog. We can give it to a "write" statement and it will generate all the words separated by spaces.

Okay, now for the code. We can easily code the first 10 digits:

num(0,[zero]) :- !.
num(1,[one]) :- !.
num(2,[two]) :- !.
num(3,[three]) :- !.
num(4,[four]) :- !.
num(5,[five]) :- !.
num(6,[six]) :- !.
num(7,[seven]) :- !.
num(8,[eight]) :- !.
num(9,[nine]) :- !.

I put ":- !." at the end of each "fact". This is the "cut" operator. It isn't strictly necessary. It is unlikely that any other rule will be found that matches, and since numbers are fairly deterministic -- but the cut improves the performance since it is useless to continue searching once we match one of these rules. Later, we will have a good reason for a cut. Linguistically, the word associated with an integer in this range can't be derived from base words, so we have to code them as facts. This will be an important pattern to establish that works for all languages.

There is an implied "rule" in English that numbers between 10 and 19 are "teens" and are formed by appending the suffix "-teen", however there are many exceptions to this rule: we say "eleven", not "oneteen". so we should state these exceptions as facts, (with a cut operator). In general, exceptions to grammatical rules should are "facts" that must be memorized, just like the base numbers.

num(10,[ten]) :- !.
num(11,[eleven]) :- !.
num(12,[twelve]) :- !.
num(13,[thirteen]) :- !.
num(15,[fifteen]) :- !.
num(18,[eighteen]) :- !.

Here is where the "cut" is more important. Later on, I need to declare a rule that "1" + digit => + "-teen". If i didn't write the cut, then both "eleven" and "oneteen" would be considered correct. With the cut, only "eleven" is valid, "oneteen" will never be considered.

Finishing up the "facts", we notice that there is also an implied rule that two-digit numbers ending in "0" are formed by appending "-ty" to the end of the single-digit word. there are more exceptions which we must state:

num(20,[twenty]) :- !.
num(30,[thirty]) :- !.
num(40,[forty]) :- !.
num(50,[fifty]) :- !.
num(80,[eighty]) :- !.

Patterns

Okay, and thats it for explicit facts for this exercises, (there are no more "exceptions" in the English language for numbers). We now cover the rest of the grammar with rules. In prolog, though, "clarity is achieved via utilities". That is, in most other languages, writing explicit steps over and over again is readable. In prolog, it takes a while to unravel what was written by someone else. Therefore, it is useful to break a problem down into smaller steps which can be solved separately, and cover more cases. Notice that I mentioned the English grammar rules about matching digits "1" and "0" above, and using associated digits to generate words. Writing something out explicit to figure out each digit is possible and may even be "elegant", but would become unreadable and unmanagable really quickly. Solving smaller problems is more stable, and easier to read.

I want to be able to (1) match digits and (2) collect "other" matching digits. I have to pay attention to the number of digits so I can match certain lengths of digits. For example, in the "teen" pattern a number must also be 2-digits long, have a "1" in the front, and then figure out the word for the second digit and add "-teen" to the end. I decided to write a "mini-regular expression" language to solve these patterns rather than write out the logic for individual number patterns. In my expression, I want a "d" to match a digit and any other character (a digit) to match that digit. For the "teen" rule, my patter will be "1d". For "-ty" rules, "d0", for hundreds, I want to match "d00" and "dd{2}". Since we are examining patterns character-by-character, I want to split a number and the pattern into lists of characters. "atom_chars/2" does this. I will then use a helper functor called "numpat0" to deal with the pattern as a list. So I define a "numpat" functor to simply split the pattern and the number into lists of characters:

numpat(N,P,L) :- atom_chars(N,NL), atom_chars(P,PL), numpat0(NL,PL,L).

Notice that the 3 arguments: 2 are "IN" and 1 "OUT". the "IN" arguments are the number, N, and the pattern, P, the output is a list of matched "d" arguments. For example, if I query for "numpat(16,'1d',L)", Prolog will convert this into "numpat0(['1','6'],['1','d'],L)". L will later be bound to the list ['6'].

Now for the logic of "numpat0". Since I am dealing with a list, this functor will be recursive and that usually implies two things: (1) there must be at least two rules (2) one rule handles the "stop" case which stops recursion. For numpat0, I can write the 3 most general rules:

numpat0([],[],[]) :- !.
numpat0([N|T1],[N|T2],T) :- !,numpat0(T1,T2,T).
numpat0([N|T1],[d|T2],[NN|T]) :- !,
    atom_number(N,NN),
    numpat0(T1,T2,T).

The first rule is the "stop" and simply means "stop and succeed if both lists are exhausted". We generally expect that no pattern and no number will return no matches.

The second rule is the digit matching rule. If the heads of both lists are the same (digit), then recursively call numpat0 with the tails of both lists for more matches. (we don't store exact character matches because we already know what is being matched here). For example "numpat0([1,...],[1,...],L)" matches this rule.

The third rule matches a digit with a "d". The functor "atom_number/2" is a builtin that converts the char digit into an number. We want to return real integers in our match results so we get 0 instead of "000" later on. finally, call "numpat0/3" on the rest of the lists for more matches.

Now, I also want to use patterns like "d{#}" and "d{#,#}" for longer numbers. I would rather write "d{9}" to match 9 digits rather than "ddddddddd", which is error prone and difficult to read. The "d{#,#}" pattern allows us to automatically match a range of digits. "d{1,3}" will match "d" or "dd" or "ddd". The first pattern can be coded similar to the last rule:

numpat0(L,[d,'{',C,'}'|T1],[N|T2]) :- !,
    atom_number(C,CN),
    cntlist(L,CN,HL,TL),
    concat_atom(HL,NS),
    atom_number(NS,N),
    numpat0(TL,T1,T2).

The "C" will get bound to a count digit, (we are assuming single digit counts), which we convert into a number, CN, and pass to a helper functor "cntlist". "cntlist/4" will take a list and splits it into two lists with C elements in the first list and the rest in the second output list. this hides all the work needed to pull out a certain number of digits for matching. The functor "concat_atom" combines the digits we pulled using "cntlist" ... and "atom_number" converts the digits into an integer. We then call recursively on the rest of the list.

The functor "cntlist/4" is actually very easy to write. we only need the stop rule and a rule to (1) count backwards to zero, (2) collect elements up to the number of elements for each call:

cntlist(L,0,[],L) :- !.
cntlist([H|T],N,[H|HL],TL) :- succ(N1,N),cntlist(T,N1,HL,TL).

The first rule says, "when count is zero, then the first OUT is the empty list (zero elements), the second OUT is the input list.

The second rule says, "for every other count, reduce the count by one and call with the tail of the input list". In Prolog, "succ" is another builtin which returns the "successor" number. That is, the second argument will be the first argument + 1. This is the way to increment and decrement arguments. Notice that we call it with N as the second argument, so it solves "succ" backwards and binds N1 to N - 1. "succ" can be used like x++ and x-- in C.

Okay, now for the "d{C,D}". This pattern means "when the number of digits is between C and D. In Prolog terms, this pattern is non-deterministic -- meaning it has more than one solution. We want "d{1,3}" to act like "d{1} or d{2} or d{3}". We will use another custom routine, similar to cntlist, called "rnglist" which can match a range of lengths. The Prolog unification engine will automatically try to figure out which length satisfies our queries.

numpat0(L,[d,'{',C,',',D,'}'|T1],[N|T2]) :- !,
    atom_number(C,CN),
    atom_number(D,DN),
    rnglist(L,CN,DN,HL,TL),
    concat_atom(HL,NS),
    atom_number(NS,N),
    numpat0(TL,T1,T2).

In this rule, we use "atom_number" to convert the char digits C and D into actual integers CN and DN. we call "rnglist" to match *each* of the combinations of numbers of digits between C and D. Which query succeeds will concatenate the digits, convert to a number and combine with the result of the recursive call.

rnglist(L,_,N2,HL,TL) :- cntlist(L,N2,HL,TL).
rnglist(L,N1,N2,HL,TL) :- N2 > N1,succ(NN,N2),rnglist(L,N1,NN,HL,TL).

This is "rnglist". notice there are no cuts. This time, we want rnglist to return the result of cntlist for each value (the first rule), and call itself for each value between the first and second values. This particular definition of rnglist will return the longest match first, and shorter ones afterward.

Rules

Now that we have our pattern language defined, we use it to code the "num" functor for "-teens" and "-tys":

num(N,[T]) :-
    numpat(N,'1d',[N1]),!,
    num(N1,[T1]),
    concat_atom([T1,teen],T).
num(N,[T]) :-
    numpat(N,'d0',[N1]),!,
    num(N1,[T1]),
    concat_atom([T1,ty],T).

If the number matches the pattern, then take the matching "d" value, look up its word sequence and concatenate "teen" or "ty".

For any other 2-digit number, we simply combine the tens word with the units word, with a "-" in between: like "thirty-five".

num(N,[T]) :-
      numpat(N,'dd',[N1,N2]),!,
      concat_atom([N1,'0'],N1A),
      num(N1A,[T1]),
      num(N2,[T2]),
      concat_atom([T1,-,T2],T).

We can do similar things to create rules for patterns up to the billions:

num(N,[T,hundred]) :-
    numpat(N,'d00',[N1]),!,
    num(N1,[T]).
num(N,L) :-
    numpat(N,'dd{2}',[N1,N2]),!,
    concat_atom([N1,'00'],N1A),
    num(N1A,L1),
    num(N2,[T2]),
    append(L1,[and,T2],L).
num(N,L) :-
    numpat(N,'d{1,3}000',[N1]),!,
    num(N1,L1),
    append(L1,[thousand],L).
num(N,L) :-
    numpat(N,'d{1,3}d{3}',[N1,N2]),!,
    concat_atom([N1,'000'],N1A),
    num(N1A,L1),
    num(N2,L2),
    append(L1,L2,L).
num(N,L) :-
    numpat(N,'d{1,3}000000',[N1]),!,
    num(N1,L1),
    append(L1,[million],L).
num(N,L) :-
    numpat(N,'d{1,3}d{6}',[N1,N2]),!,
    concat_atom([N1,'000000'],N1A),
    num(N1A,L1),
    num(N2,L2),
    append(L1,L2,L).
num(N,L) :-
    numpat(N,'d{1,3}000000000',[N1]),!,
    num(N1,L1),
    append(L1,[billion],L).
num(N,L) :-
    numpat(N,'d{1,3}d{9}',[N1,N2]),!,
    concat_atom([N1,'000000000'],N1A),
    num(N1A,L1),
    num(N2,L2),
    append(L1,L2,L).

Extending the Pattern Language

What if we want to get into trillions? Our "numpat" rule assumes that there is only one digit between {} marks. We want to have "d{12}" work as well. Well, it isn't difficult to extend numpat0 in such a way that if it doesn't match our existing number pattern rules, we can simply try concatentating two or more sequential digits:

numpat0(L,[d,'{',C,',',D,E|T1],T) :- !,
    concat_atom([D,E],DE),
    numpat0(L,[d,'{',C,',',DE|T1],T).
numpat0(L,[d,'{',C,D|T1],T) :- !,
    concat_atom([C,D],CD),
    numpat0(L,[d,'{',CD|T1],T).

Notice that we just fixed the program without modifying any code, just extending the existing rules. We don't need to worry about order too much. The "," makes sure we match a range and assumes the unfinished "{}" blocks must be made up of subsequent digits. We simply concatenate the two characters together and call ourselves recursively until we match a "}" to the block.

With the extended pattern we can now add more rules for numbers beyond "trillion":

num(N,L) :-
    numpat(N,'d{1,3}000000000000',[N1]),!,
    num(N1,L1),
    append(L1,[trillion],L).
num(N,L) :-
    numpat(N,'d{1,3}d{12}',[N1,N2]),!,
    concat_atom([N1,'000000000000'],N1A),
    num(N1A,L1),
    num(N2,L2),
    append(L1,L2,L).
num(N,L) :-
    numpat(N,'d{1,3}000000000000000',[N1]),!,
    num(N1,L1),
    append(L1,[quadrillion],L).
num(N,L) :-
    numpat(N,'d{1,3}d{15}',[N1,N2]),!,
    concat_atom([N1,'000000000000000'],N1A),
    num(N1A,L1),
    num(N2,L2),
    append(L1,L2,L).

User Friendly Enhancement

There is one small "user friendly" feature we can add: We are assuming that we are calling a "num" functor with an atom, (in quotes), not an integer, which is what most people would expect. We can add a rule to automatically check if our argument is an atom and convert to an integer and re-call ourself. This can be done with a single rule:

num(N,L) :- atom(N),!,atom_number(N,N1),num(N1,L).

Modules

Ultimately, we might like to have a Prolog module for each target language. To finish up this article, we should place all the utility functions into a script called "numutils.pro" (or "numutils.pl"), and add the declaration:

:- module(numutil,
[
    cntlist/4,
    rnglist/5,
    numpat/3
]).

Which makes our custom, generic, routines public.

In "english.pro" (or "english.pl"), we add:

:- use_module(numutil).

... and it will include utility definitions.

Finally, here are is a test:

1 ?- [english].
%  numutil compiled into numutil 0.00 sec, 3,560 bytes
% english compiled 0.00 sec, 15,212 bytes
Yes
2 ?- num(1234567,W).
W = [one, million, two, hundred, and, 'thirty-four', thousand, five, hundred|...] [write]
W = [one, million, two, hundred, and, 'thirty-four', thousand, five, hundred, and, 'sixty-seven']
Yes