% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
%                                                                     %
%  Extracts from the book "Natural Language Processing in Prolog"    %
%  published by Addison Wesley                                        %
%  Copyright (c) 1989, Gerald Gazdar & Christopher Mellish.           %
%                                                                     %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Parsing, search and ambiguity

The essence of bottom-up recognition can be captured quite elegantly in a simple Prolog program. As above, we can see the problem as that of successively transforming the initial string into a string consisting of a single category (e.g. "S"). The predicate 'burecog' below is given a string of words and categories, implemented as a Prolog list, and succeeds if, using the rules of the grammar, it can transform it to the list [s].

    burecog([s]).
    burecog(String) :-
        append(Front,Back,String),
        append(RHS,Rest,Back),
        rule(LHS,RHS),
        append(Front,[LHS|Rest],NewString),
        burecog(NewString).

This definition makes use of the 'append' predicate and assumes that the grammar rules are represented in the following way:

    rule(s,[np,vp]).
    rule(np,[john]).
    rule(vp,[v]).
    rule(vp,[v,np]).
    rule(v,[saw]).
    rule(np,[mary]).

How does 'burecog' work? First of all, 'append' is used twice to find a way in which the input String decomposes into the Front, followed by the RHS, followed by the Rest. On backtracking, the 'append' goals will find every possible way that this can be done. Given such a decomposition, an attempt is made to match the RHS part to the right hand side of a rule. If this succeeds, a new list is constructed by putting together the items in the string that appear before the matched portion, the LHS of the rule, and the items appearing after the matched portion. Then an attempt is made to further transform the resulting NewString. Although this definition is a clear and concise description of bottom-up recognition, the way that 'append' is used to investigate all possible decompositions of the string at each stage means that it is hopelessly inefficient.

In the Prolog programs which result from the standard translation, DCGs implement top-down, left-to-right recognizers or parsers. This can be readily inferred from observing the execution of a DCG parser. When the standard translation of a particular DCG rule is tried, for instance:

    s([s,NP,VP],S1,S3) :-
        np(NP,S1,S2),
        vp(VP,S2,S3).

a hypothesis has been made about how a particular type of phrase ("s") might be found in terms of smaller phrases (of category "np" and "vp"). Thus the goal to recognize (or parse) a sentence decomposes into goals to recognize an NP and a VP. These themselves decompose into goals to find smaller phrases, and the process continues until the word level is reached. This is pure top-down behaviour, as no account is taken of the input string until a hypothesis actually predicts that a word of a particular kind might appear there. The fact that the recognition proceeds left-to-right is a simple consequence of the Prolog execution strategy. Since Prolog attempts the goals in the body of a clause in the order they are written, in this case it will only start looking for a VP after it has found an initial NP. Similarly, for any rule that has multiple elements on the RHS, the corresponding Prolog program will look for them in the order written. The result is a pure left-to-right strategy.
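To make this concrete, here is a sketch of the clauses that the standard translation produces for a complete miniature DCG, using the non-tree-building version of the rules (the tree-building 's' clause was shown above). This is the usual textbook translation; details vary slightly between Prolog systems:

    % DCG source:
    %   s  --> np, vp.       np --> [john].
    %   vp --> v.            np --> [mary].
    %   vp --> v, np.        v  --> [saw].

    % Standard translation: each category becomes a predicate over a
    % difference list, and body goals appear in right-hand-side order.
    s(S1,S3)  :- np(S1,S2), vp(S2,S3).
    vp(S1,S2) :- v(S1,S2).
    vp(S1,S3) :- v(S1,S2), np(S2,S3).
    np([john|S],S).
    np([mary|S],S).
    v([saw|S],S).

A query such as '?- s([john,saw,mary],[]).' succeeds by finding the NP john first and only then looking for a VP in the remainder of the string - exactly the left-to-right order described above.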
If you want to do recognition or parsing in a Prolog environment, then the most straightforward way of going about it is simply to use the ready-made DCG interpreter provided with most Prolog systems and discussed in the previous chapter. But, if such an interpreter is not available, then it is still trivial to implement left-to-right, top-down, depth-first recognition or parsing, requiring barely half-a-dozen lines of code in addition to the grammar. Here, for example, is a recognizer - as usual, we exploit difference lists:

    recognize(Category,S1,S2) :-
        rule(Category,Daughters),
        matches(Daughters,S1,S2).

A string can be recognized as an instance of Category if there is a rule which allows Category to dominate Daughters and if the Daughters match the string.

    matches([],S,S).
    matches([Word],[Word|S],S).
    matches([Category|Categories],S1,S3) :-
        recognize(Category,S1,S2),
        matches(Categories,S2,S3).

A list of categories matches a string (i) if both are empty, or (ii) if both consist just of a Word, or (iii) if an initial portion of the string can be recognized as the first category in the list, and the remainder of the category list matches the remainder of the string. This recognizer expects rules in the following format:

    rule(s,[np,vp]).
    rule(vp,[iv]).
    rule(vp,[tv,np]).
    rule(np,['MediCenter']).
    rule(np,[nurses]).
    rule(iv,[died]).
    rule(tv,[employed]).

Since DCGs (under the standard translation) give rise to top-down, left-to-right parsers, both DCG parsers and this recognizer will suffer from the problem of left recursion. That is, one can no more expect to write the DCG rule:

    np --> np, pp.

than one can expect to be able to write, say, the clause:

    human(X) :- human(X), animate(X).

In practice, it is not usually a problem to rewrite a DCG grammar to avoid left recursion. Moreover, since the structures built by a DCG are not required to correspond exactly to the trace of the parser, in practice one can also arrange for the rewritten grammar to produce the same structures as the original, left recursive, grammar would have produced. For example:

    np(np(Det,N)) --> det(Det), n(N).
    np(np(NP,PP)) --> np(NP), pp(PP).

can be rewritten as this:

    np(np(Det,N)) --> det(Det), n(N).
    np(PPs) --> det(Det), n(N), pps(np(Det,N),PPs).

    pps(NP,np(NP,PP)) --> pp(PP).
    pps(NP,PPs) --> pp(PP), pps(np(NP,PP),PPs).

Although the standard translation of DCGs into Prolog programs gives rise to top-down parsers, this is by no means the only possible translation available. We will now show how CF-PSG rules can be mapped into rather more complex clauses in a way that converts the CF-PSG into a bottom-up parser, although, for simplicity, we will start out by considering just recognition. Let us assume that our lexicon is encoded as follows:

    word(np,'MediCenter').
    word(np,nurses).
    word(tv,employed).
    word(iv,died).

As before, the translation will associate a distinct predicate with each category, although the interpretation of this predicate will be different from that implicit in the standard translation. The translation will also make use of a special predicate 'goal', which also serves as a kind of top-level predicate:

    goal(Goal,[Word0|Words1],WordsN) :-
        word(Category,Word0),
        Term =.. [Category,Goal,Words1,WordsN],
        call(Term).

The first argument here is the category you are looking for, and the second and third arguments are used for the standard difference list representation of the string to be recognized.
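Note, in passing, the use of '=..' (pronounced 'univ') in this clause: this standard built-in converts between a term and the list consisting of the term's functor followed by its arguments, and it is what allows 'goal' to construct, and then 'call', a goal whose predicate name is only determined at run time. For example:

    ?- Term =.. [np,s,[died],[]].
    Term = np(s,[died],[])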
Thus we might call 'goal' like this:

    ?- goal(s,[nurses,died],[]).

If we do, then 'word(np,nurses)' will succeed and Term will get bound to 'np(s,[died],[])', which is then called. When we call the predicate 'goal', Prolog attempts a bottom-up analysis of the relevant part of the string and will succeed if there is a way of recognizing a phrase of type Goal. Bottom-up analysis consists of looking at the next word (seeing what category it could have) and then attempting to complete an analysis of the required Goal using this material. The crucial thing to note here is that the 'np' predicate gets called after an NP has been recognized. Thus the 'np' predicate is not a predicate for recognizing NPs, but rather a predicate for recognizing the material that follows an NP. So the interpretation of 'np' is as follows:

    % np(Goal,S1,S2) -
    %   The words in S1-S2 are a possible way of
    %   extending an NP to a phrase of type Goal
    %   (completing a phrase of type Goal, after an initial NP)

Given a two daughter rule like this:

    c0 -> c1 c2

we will translate it into this:

    c1(Goal,Words0,WordsN) :-
        goal(c2,Words0,Words1),
        c0(Goal,Words1,WordsN).

So our rule for expanding S,

    s -> np vp

should be translated into this:

    np(Goal,Words0,WordsN) :-
        goal(vp,Words0,Words1),
        s(Goal,Words1,WordsN).

In words, if we are looking for a phrase of type Goal and have found an NP, we can complete the analysis by finding a VP and after that completing the analysis of a phrase of type Goal, working from the newly-found S. Another way of saying this is 'we can extend an NP to a given Goal by first of all finding a VP and then extending the resulting S to make the Goal'.

Pursuing our example, 'np(s,[died],[])' will succeed if 'goal(vp,[died],[])' and 's(s,[],[])' succeed. We make provision for the latter by providing our recognizer with a set of termination clauses, one for each syntactic category in our grammar, as follows:

    s(s,Words,Words).
    np(np,Words,Words).
    vp(vp,Words,Words).
    iv(iv,Words,Words).
    tv(tv,Words,Words).

These clauses have the obvious interpretation - for instance, if you are looking for an S and have found an S, then one possibility is simply to succeed, without consuming any further words. So that just leaves 'goal(vp,[died],[])' to consider: when we call 'goal' with those arguments, 'word(iv,died)' will succeed and Term will get bound to 'iv(vp,[],[])', which is then called. This is not going to succeed in virtue of any of the termination clauses listed above, so clearly a clause that exists in virtue of the grammar translation is what is required. Given a single daughter rule like this:

    c0 -> c1

we will translate it into this:

    c1(Goal,Words0,WordsN) :-
        c0(Goal,Words0,WordsN).

So a rule for expanding VP as just an intransitive verb,

    vp -> iv

should be translated into this:

    iv(Goal,Words0,WordsN) :-
        vp(Goal,Words0,WordsN).

In words, if you want to complete a phrase of type Goal and have found an IV, then you can use any way of completing such a phrase assuming that you have found a VP. Given this, 'iv(vp,[],[])' will plainly succeed if 'vp(vp,[],[])' succeeds, and the latter will succeed thanks to the presence of the relevant termination clause. So our original goal, of recognizing nurses died as an instance of S, succeeds, as we would want it to.
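Putting the pieces together, here is the whole bottom-up recognizer for the little grammar in one place. The clause for 'tv' is not spelt out in the text; it is our application of the two daughter schema to the rule vp -> tv np:

    % rule translations
    np(Goal,Words0,WordsN) :-          % s -> np vp
        goal(vp,Words0,Words1),
        s(Goal,Words1,WordsN).
    iv(Goal,Words0,WordsN) :-          % vp -> iv
        vp(Goal,Words0,WordsN).
    tv(Goal,Words0,WordsN) :-          % vp -> tv np
        goal(np,Words0,Words1),
        vp(Goal,Words1,WordsN).

    % termination clauses
    s(s,Words,Words).
    np(np,Words,Words).
    vp(vp,Words,Words).
    iv(iv,Words,Words).
    tv(tv,Words,Words).

Loaded together with 'goal' and the 'word' lexicon, this also handles the transitive case: '?- goal(s,['MediCenter',employed,nurses],[]).' succeeds, as the reader can check by tracing the calls as we did for nurses died.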
In some ways, the name of the predicate 'goal' is slightly misleading in a recognizer of this kind, because it suggests that the program is actively seeking a particular kind of phrase, a very top-down idea. In fact, though, if you look at how 'goal' is defined, you will see that the first argument is not used at all to influence how bottom-up recognition proceeds from the first word. Instead, the Goal argument simply serves to ensure that any bottom-up analyses with the right category are recognized as possible solutions to the original problem.

Having considered the recognition case in some detail, we will now show how to provide a parser by an analogous translation. The top-level predicate of our parser is still 'goal', only now defined as follows:

    goal(Goal,Parse2,[Word0|Words1],WordsN) :-
        word(Category,Word0),
        Parse1 =.. [Category,Word0],
        Term =.. [Category,Goal,Parse1,Parse2,Words1,WordsN],
        call(Term).

The second argument here is the parse tree that gets returned. Thus we might call 'goal' like this:

    ?- goal(s,Parse2,['MediCenter',employed,nurses],[]).

If we do, then 'word(np,'MediCenter')' will succeed, Parse1 will get bound to 'np('MediCenter')' and Term will get bound to 'np(s,np('MediCenter'),Parse2,[employed,nurses],[])', which is then called. We can now interpret a predicate like 'np' as follows:

    % np(Goal,Parse1,Parse2,S1,S2) -
    %   The words in S1-S2 are a possible way of
    %   extending an NP with parse tree Parse1 to a
    %   phrase of type Goal.
    %   The parse tree of the Goal phrase is Parse2.

Given a single daughter rule like this:

    c0 -> c1

we will translate it into this:

    c1(Goal,Parse1,Parse,Words0,WordsN) :-
        c0(Goal,c0(Parse1),Parse,Words0,WordsN).

And given a two daughter rule like this:

    c0 -> c1 c2

we will translate it into this:

    c1(Goal,Parse1,Parse,Words0,WordsN) :-
        goal(c2,Parse2,Words0,Words1),
        c0(Goal,c0(Parse1,Parse2),Parse,Words1,WordsN).

Termination rules will now look like this:

    c0(c0,Parse,Parse,String,String).
    c1(c1,Parse,Parse,String,String).
    c2(c2,Parse,Parse,String,String).

As pure bottom-up parsers, programs of this kind suffer from various kinds of inefficiencies. Consider what would happen if we had multiple lexical entries for the same word:

    word(iv,strikes).
    word(np,strikes).

If we are trying to recognize a sentence starting with this word, we will first look for its category and then try to extend a phrase of this category to a phrase of type S. The first possible category is IV, and so we will try to extend this analysis. A rule like:

    vp -> iv

tells us that we can extend an analysed IV into an analysed phrase of some category if we can find a way of extending a VP to a phrase of the given category. So we will look for ways of extending a VP into an S. If there are rules with VP as the first category on the RHS, this will involve looking for further phrases that can combine with a VP. All this activity is, however, wasted if in fact an IV can never appear as the first word in a sentence. The trouble is that these bottom-up parsers only consider the goal category when they return a complete solution of some kind - the goal information is not used to actively guide what alternatives are considered beforehand.

We can improve the efficiency of this kind of left-to-right, depth-first, bottom-up parser by precompiling a 'link' relation: a category c1 is linked to a category c2 if and only if c1 can appear as an initial constituent in c2. The relation is reflexive, so we have:

    link(Category,Category).

and, for the little grammar assumed in the discussion above, we also have:

    link(np,s).
    link(iv,vp).
    link(tv,vp).

The relation is transitive, so if our grammar had included an "np -> det n" rule, then we would have had both of the following:

    link(det,np).
    link(det,s).
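Rather than writing the 'link' facts out by hand, they can be computed from the grammar. The sketch below assumes that the CF-PSG rules are also available as 'rule/2' facts in the format used earlier in the chapter, and uses a hypothetical helper 'left_corner/2'; note that this naive closure will loop on a left recursive grammar, which is one more reason to precompile the relation rather than compute it during parsing:

    % C1 is the leftmost daughter of some rule expanding C0
    left_corner(C1,C0) :-
        rule(C0,[C1|_]).

    % link/2 is the reflexive, transitive closure of left_corner/2
    link(C,C).
    link(C1,C0) :-
        left_corner(C1,C2),
        link(C2,C0).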
Given such 'link' statements, we can now modify our parser to exploit them:

    goal(Goal,Parse2,[Word0|Words1],WordsN) :-
        word(Category,Word0),
        link(Category,Goal),
        Parse1 =.. [Category,Word0],
        Term =.. [Category,Goal,Parse1,Parse2,Words1,WordsN],
        call(Term).

This now expresses the rather obvious fact that there is no point in exploring a parse based on a particular category assignment to the first word in the string if that category cannot appear as a first constituent in the category of the string assumed by the Goal. Likewise, we shall want to insert a 'link' check at the start of each of our rule translations:

    np(Goal,NP,Parse,Words0,WordsN) :-
        link(s,Goal),
        goal(vp,VP,Words0,Words1),
        s(Goal,s(NP,VP),Parse,Words1,WordsN).

There is no point seeking to complete a sentence in this position if sentences cannot appear in this position. Analogously with the single-daughter rules:

    iv(Goal,IV,Parse,Words0,WordsN) :-
        link(vp,Goal),
        vp(Goal,vp(IV),Parse,Words0,WordsN).

All our Prolog recognizers and parsers, of course, use depth-first search, as they rely on the depth-first backtracking facility of Prolog to do their searching. For this reason, it is obviously much easier in Prolog to write depth-first search programs than other kinds. The use of breadth-first search requires programs that explicitly manipulate lists of alternative states, rather than leave the enumeration and remembering of states to the Prolog execution mechanism.
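To illustrate, here is a minimal sketch of such a program: a breadth-first version of 'burecog' from the start of this section, run against a grammar in the 'rule/2' format. It is our own illustration, not from the text. A state is a string of words and categories; 'reduce/2' performs a single bottom-up reduction, exactly as in the body of 'burecog'; and the agenda of alternative states is kept as a queue, so that all strings reachable in n reductions are tried before any string reachable in n+1:

    % one bottom-up reduction step, as in the body of burecog
    reduce(String,NewString) :-
        append(Front,Back,String),
        append(RHS,Rest,Back),
        rule(LHS,RHS),
        append(Front,[LHS|Rest],NewString).

    % breadth-first search over an agenda of candidate strings
    bfrecog([[s]|_]).
    bfrecog([State|States]) :-
        findall(Next,reduce(State,Next),Successors),
        append(States,Successors,Agenda),    % successors join the back
        bfrecog(Agenda).

A call like '?- bfrecog([[nurses,died]]).' then succeeds just as 'burecog([nurses,died])' would, but the enumeration and remembering of alternative states is done explicitly by the program, in the Agenda list, rather than implicitly by Prolog's backtracking stack.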