% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % Extracts from the book "Natural Language Processing in Prolog" % % published by Addison Wesley % % Copyright (c) 1989, Gerald Gazdar & Christopher Mellish. % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % Finite state techniques It is straightforward to represent FSTNs in Prolog and to write programs to perform various operations with them. Our approach here will be to represent networks declaratively as data structures (facts in the Prolog database) that can be examined and manipulated. We will then write general purpose recognition and generation programs that will work on any network represented as a data structure in the approved way. Before we start writing general network traversal programs, we need to consider how we are going to represent transition networks as Prolog data structures. We need to know the following about a network: - what its initial nodes are - what its final nodes are - what its arcs are, where each arc is defined by: - the departure node - the destination node - the label on the arc This can be represented most straightforwardly in Prolog by using simple predicates of the appropriate kind: initial(Node). final(Node). arc(DEPARTURE_NODE,DESTINATION_NODE,LABEL). Here are some example networks (the ENGLISH1 example appears in the appendix in the file "fstnarcs.pl"): % Swahili-1 % initial(1). final(5). arc(1,2,subj). arc(2,3,tense). arc(3,4,obj). arc(4,5,stem). % ENGLISH1 % initial(1). final(9). arc(1,3,np). arc(1,2,det). arc(2,3,n). arc(3,4,bv). arc(4,5,adv). arc(4,5,'#'). arc(5,6,det). arc(5,7,det). arc(5,8,'#'). arc(6,7,adj). arc(6,6,mod). arc(7,9,n). arc(8,9,adj). arc(8,8,mod). arc(9,4,cnj). arc(9,1,cnj). Something is missing from these network definitions (at least, under their intended interpretations), namely a statement of what such symbols as 'subj' abbreviate. We could readily adopt the following style of declaring these abbreviations: abbreviates(subj,[ni,u,a,tu,wa]). But, for reasons of consistency with our treatment of the lexicon in subsequent example programs, we will adopt here a more verbose, but equivalent form of declaration: word(CATEGORY,WORD). Here the CATEGORY argument corresponds to the abbreviation symbol, and WORD corresponds to one of the items subsumed under this abbreviation. Under this scheme, here are the abbreviations for ENGLISH1: word(np,kim). word(np,sandy). word(np,lee). word(det,a). word(det,the). word(det,her). word(n,consumer). word(n,man). word(n,woman). word(bv,is). word(bv,was). word(cnj,and). word(cnj,or). word(adj,happy). word(adj,stupid). word(mod,very). word(adv,often). word(adv,always). word(adv,sometimes). Although this is verbose, it is not inefficient: it may involve a Prolog interpreter in just as much work to search down a list checking membership as it does to search through its data base looking for the appropriate 'word' entry. In this section and some subsequent sections, we will develop a number of programs for performing useful tasks with various kinds of networks. These programs will have a certain "family resemblance" but will differ in their details. The task we will consider initially is that of using a network for recognition. The predicate 'recognize(NODE,STRING)' will be true if the given STRING can be accepted by our FSTN with the traversal starting at the given NODE. Firstly, we want to recognize the empty string (a string is a list of words, for present purposes) if we are at a final node: recognize(Node,[]) :- final(Node). This clause says that 'recognize(Node,String)' will succeed when String is empty and Node is a final Node. Secondly, we need to make provision for a traversal of an arc leading from the node we are at, followed by a continuation of the traversal from that point: recognize(Node_1,String) :- arc(Node_1,Node_2,Label), traverse(Label,String,NewString), recognize(Node_2,NewString). Notice that we call 'recognize' recursively, starting it off at the destination node Node_2 of the chosen arc. The predicate 'traverse' is in charge of telling us whether we can traverse that arc and, if so, what change is made to the input string. Provided with the arc label and the initial string in its first two arguments, it returns the new value of the string in its third argument. The definition of 'traverse' reflects the three ways in which we can make progress through the network. The first way (SYM) involves crossing an arc labelled with the next word in the string: traverse(Label,[Label|Words],Words) :- not(special(Label)). This says that we can traverse the arc if the label is the same as the first word in the input, as long as it is not a 'special' symbol (an abbreviation or '#'). The second way (ABB) involves continuing down a string whose first word is subsumed under the category given by the arc label: traverse(Label,[Word|Words],Words) :- word(Label,Word). This says that we can traverse the arc if the label is an abbreviation and the first word on the input is covered by that abbreviation. As in the last case, the new string returned is simply the rest of the string. The final clause (JMP), for 'jump' arcs, says we can recognize the string from a node if there is a jump to another node. In this case we return the original string unchanged: traverse('#',String,String). To complete our definition, we need to define the predicate 'special', which detects abbreviations and the "jump" symbol: special('#'). special(Category) :- word(Category,_). Finally, in checking recognition of a whole string, we need to make sure that we start out from an initial node, and we can put this requirement in a simple test predicate that calls 'recognize': test(Words) :- initial(Node), recognize(Node,Words). The predicate 'test', given a list of words as argument, will succeed exactly when the string of words is accepted by the automaton encoded in the 'initial', 'final', 'arc' and 'word' statements. The definition of 'test' completes the code of our recognizer, which also appears in the appendix as "fstnrecg.pl". The FSTN recognizer program has been described as if there is no search involved in recognition, but obviously we would like the program to work with non-deterministic automata as well as deterministic ones. When we call 'recognize' with a particular string and node, we are entrusting that call with a particular state in the search space which is to be explored. Such a call is in charge of checking whether the state could be a final state, and simply succeeding if so. It is also in charge of computing the possible next states and calling 'recognize' again to investigate them. Search arises if there is more than one possible next state that arises from a given state. Our program handles this thanks to the ability of 'recognize' to do different things on backtracking. Thus a given call may first of all invoke another 'recognize' for one possible next state. If that recursive call does not succeed, it may then be able (by selecting an alternative 'traverse' or 'arc' clause) to make a call corresponding to an alternative next state. Notice that because of the way Prolog works only one path down the search tree is investigated at a time. So, for instance, the call recognize(4,[o,a,h,w]) first of all calls recognize(5,[a,h,w]) and only if this fails (because this is not a final state and yet cannot be taken further) does it call recognize(5,[o,a,h,w]) This is the characteristic behaviour of depth-first search - a given possibility is followed up as far as possible before any alternative is considered. In this case, Prolog's backtracking facility means that we do not need to explicitly represent the "pool" of alternative states to explore. The set of alternatives is represented implicitly in Prolog's memory of places to backtrack to. Let us consider now what has to be done if we wish to adapt our recognition program for the exhaustive generation of all the sequences that are allowed by a transition network. We still need to search all possible paths through the network, but now we are not constrained to traverse arcs that are compatible with an input string. In fact, we do not really need to make any modifications to the program, since it will generate as it stands, given the goal 'test(X), write(X), nl, fail.', but we can package this up into a 'generate' predicate if we wish: generate :- test(X), write(X), nl, fail. This provides us with exhaustive generation but it cannot be allowed to run to completion if the network allows an infinite number of different strings. Nevertheless, it could still be useful, in principle at least, to run such a system for a while, to get an idea of the range of the strings allowed by a network. If we try generating from our example network for a fragment of English, however, we do not get a representative sample of the strings - the strings generated go as follows: [kim, is, often, a, happy, consumer] [kim, is, often, a, happy, consumer, and, often, a, happy, consumer] [kim, is, often, a, happy, consumer, and, often, a, happy, consumer, and, often, a, happy, consumer] [kim, is, often, a, happy, consumer, and, often, a, happy, consumer, and, often, a, happy, consumer] [kim, is, often, a, happy, consumer, and, often, a, happy, consumer, and, often, a, happy, consumer] ... The trouble is that, once the program has made a decision about an initial set of words, it is happy to go on investigating other decisions that it encounters later. At node 9, the only final node, there is always the option to loop back to node 4 instead of finishing, and it is this decision, rather than any other, that is remade when we ask for alternative solutions. Only if for some reason all future possibilities fail to work out will it remake earlier decisions. Because of the way one call of 'recognize' has to fail (exhaust all its possibilities) before an alternative can run, the program will always focus on one alternative and further developments from it before it tries other alternatives. This depth-first search behaviour is all right when the set of possibilities to be investigated is finite, but when the set is infinite, as in this case, it can lead to a very narrow exploration of alternatives, or, worse still an infinite loop where longer and longer alternatives are tried and the program never actually stops with any solutions. Indeed, if we were to rearrange our 'word' database in this example so that entry for the mod "very" came first, the generation would get into an infinite loop without generating any possibilities. Our recognition program does just that - it recognizes whether or not a string is accepted by a given FSTN. But although it discovers quite a lot about the string in the course of a successful recognition, this information is lost. All we get by way of result is a truth value (reported by Prolog as "yes" or "no"). If we wish to recover the information about the sequence of nodes and arc labels that were involved in the successful recognition, then we need to introduce a further argument to collect this information. Equipped with such an argument, 'recognize' ceases to be merely a recognizer and becomes instead a rather primitive form of parser. Here is the modified part of the program. The definition of 'traverse' and 'special' are as before: parse(Node,[],[Node]) :- final(Node). parse(Node_1,String,[Node_1,Label|Path]) :- arc(Node_1,Node_2,Label), traverse(Label,String,NewString), parse(Node_2,NewString,Path). test(Words) :- initial(Node), parse(Node,Words,Path), write(Path), nl. One way of thinking about what this program (which also appears in the appendix as "fstnpars.pl") does is simply in terms of it recording the history of its successful arc transitions in a list. But another, equally good, way of thinking about it is as a program which maps one string of symbols (the sentence) into another string of symbols (the parse), in other words, as a program which transduces one string into another. A finite state transducer is just like a finite state recognizer except that it deals with two tapes. Thus we can amend our FSTN traversal programs to deal with finite state transducers, mainly by changing the arc statements to specify a pair of symbols (one from each tape) instead of just one. Where a transition is allowed by an abbreviation, however, the label can just be a single symbol, as before. Here is an example, the ENG_FRE-2 transducer (which appears in "fstrans.pl" in the appendix): initial(1). final(5). arc(1,2,'WHERE'). arc(2,3,'BE'). arc(3,4,'FDET'). arc(4,5,'FNOUN'). arc(3,6,'MDET'). arc(6,5,'MNOUN'). word('WHERE',[where,ou]). word('BE',[is,est]). word('FDET',[the,la]). word('MDET',[the,le]). word('FNOUN',[exit,sortie]). word('FNOUN',[shop,boutique]). word('FNOUN',[toilet,toilette]). word('MNOUN',[policeman,gendarme]). Notice that abbreviations now stand for possible pairs of words, rather than simply possible single words. Given such arc statements, our 'transduce' predicate can be defined along very similar lines to our earlier 'recognize' and 'parse' predicates. The only additional complication is that we need a modified version of 'traverse', 'traverse2', which takes two input tape arguments and produces two modified tape arguments. The definition of 'traverse2' needs to consider the three possible situations: left tape but not right tape, right tape but not left tape, and both tapes. transduce(Node,[],[]) :- final(Node). transduce(Node_1,String1,String2) :- arc(Node_1,Node_2,Label), traverse2(Label,String1,NewString1,String2,NewString2), transduce(Node_2,NewString1,NewString2). traverse2([Word1,Word2],[Word1|RestString1],RestString1,[Word2|RestString2],RestString2) :- not(special(Word1)), not(special(Word2)). traverse2(Abbrev,String1,NewString1,String2,NewString2) :- word(Abbrev,NewLabel), traverse2(NewLabel,String1,NewString1,String2,NewString2). traverse2(['#',Word2],String1,String1,[Word2|RestString2],RestString2). traverse2([Word1,'#'],[Word1|RestString1],RestString1,String2,String2). traverse2('#',String1,String1,String2,String2). Notice that this definition of 'transduce' is wholly reversible: it makes no difference to the program if we think of it as transducing from its second argument to its third, or conversely. Indeed, we may even wish to use it to match inputs on both tapes or to generate output on both tapes. As before, we need an additional predicate to ensure that we start out from an initial node, and we can conveniently use this predicate to print the transduced string out, as well: test(String_A) :- initial(Node), transduce(Node,String_A,String_B), write(String_B), nl.