%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Extracts from the book "Natural Language Processing in Prolog" %
% published by Addison Wesley                                    %
% Copyright (c) 1989, Gerald Gazdar & Christopher Mellish.       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Grammars

Diagrams are not, given the limitations of present computers, a convenient data structure to use for NLP. The simplest way to handle trees is to manipulate them as lists where the first element is the root category and subsequent elements are the daughter subtrees in order of occurrence. Thus the tree above would be:

   [s,[np,'MediCenter'],[vp,[v,employed],[np,nurses]]]

This kind of list would appear in a "pretty-printed" format as follows:

   [s [np 'MediCenter']
      [vp [v employed]
          [np nurses]]]

The general principle used here is that a tree is represented by a list whose first element is the top label and whose subsequent elements are the immediate subtrees in order. These elements are then themselves lists, representing the subtrees according to the same convention. In this scheme, a leaf node of a tree is treated as a tree with a label but no immediate subtrees. Thus it will be represented by a list with one element - the label. An alternative and equivalent representation of a tree is as a term:

   s(np('MediCenter'),vp(v(employed),np(nurses)))

It is as straightforward to represent a CF-PSG in Prolog data structures as it is to represent transition networks. One way is to have a predicate 'rule' which takes two arguments, one for the LHS category and one for the sequence of categories appearing as the RHS (represented as a list). Here is how the rules of Grammar1 would appear in this representation:

   rule(s,[np,vp]).
   rule(vp,[v,np]).
   rule(vp,[v]).

For lexical entries, we can adopt the same convention as with transition networks, introducing assertions like:

   word(np,'Dr. Chan').
   word(np,nurses).
   word(np,'MediCenter').
   word(v,died).

A grammar is a formal set of rules describing what it is to be a string in a particular language. Prolog, as a language based on logic, is eminently suitable for writing formal descriptions and definitions of various kinds. It is therefore natural to consider expressing a grammar directly as a logical specification in Prolog. In fact, it is extremely simple to translate the little grammar that we have been using as an example into Prolog clauses talking directly about strings of words and about which ones are grammatical. As it happens, the resulting Prolog code can be used to recognize exactly the strings of words that the grammar claims to be grammatical and to generate example grammatical sentences.

The translation of grammar rules into Prolog clauses just requires thinking carefully about what each rule is actually saying. Our sentence rule can be readily paraphrased as claiming that a string of words, Z, counts as a sentence if there is a string of words, X, that counts as a noun phrase, and a string of words, Y, that counts as a verb phrase, and Z is merely Y appended to X. And this paraphrase can be more concisely expressed as a Prolog Horn clause:

   s(Z) :- np(X), vp(Y), append(X,Y,Z).

The rule for verb plus object verb phrases is isomorphic:

   vp(Z) :- v(X), np(Y), append(X,Y,Z).

And the intransitive case is trivial:

   vp(Z) :- v(Z).

We need a lexicon, of course, and this can be provided as follows:

   np(['Dr. Chan']).
   np(['MediCenter']).
   np([nurses]).
   np([patients]).
   v([died]).
   v([employed]).
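The only auxiliary predicate this recognizer relies on is the standard list-concatenation predicate append/3. Most Prolog systems provide it ready-made; if yours does not, the usual two-clause definition will serve:

   % append(X,Y,Z): Z consists of the elements of X followed by those of Y
   append([],L,L).
   append([H|T],L,[H|R]) :-
      append(T,L,R).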
And that is it -- we have a recognizer for a tiny fragment of English, which can be invoked thus:

   ?- s([patients,died]).
   ?- s(['MediCenter',employed,nurses]).

Thanks to its wholly declarative character, we can use this very same code to do generation as well as recognition. Try the following:

   ?- s(Sentence).

Although elegant in comparison with the code one would need to perform the same task in a procedural language, this scheme for writing recognizers is less than ideal. Considerations of both efficiency and aesthetics suggest that we should eliminate the explicit 'append' goals if we can. The difference list technique, which we have already exploited in our net traversal programs, allows us both to eliminate the offending 'append' goals and to make the Prolog code close to isomorphic to the rules of the grammar whose language is recognized, thus addressing our efficiency and aesthetic concerns simultaneously. We recode the recognizer as follows:

   s(X,Z) :- np(X,Y), vp(Y,Z).
   vp(X,Z) :- v(X,Z).
   vp(X,Z) :- v(X,Y), np(Y,Z).

   np(['Dr. Chan'|X],X).
   np(['MediCenter'|X],X).
   np([nurses|X],X).
   np([patients|X],X).
   v([died|X],X).
   v([employed|X],X).

   ?- s([patients,died],[]).
   ?- s(['MediCenter',employed,nurses],[]).

The first rule here can be paraphrased as saying that we can find an S at the start of X (with Z the portion of string left behind after it) if we can find an NP at the beginning of X, leaving Y behind, and a VP at the beginning of Y, with Z left behind after that. As before, we can use this very same code to do generation as well as recognition. Try the following:

   ?- s(Sentence,[]).

It is straightforward to convert the recognizer into a parser, simply by the addition of extra arguments. Each predicate in the program captures the notion of a particular grammatical category, and each is provided with an extra argument (appearing as the first argument, say) which describes (as a list) the structure of the phrase. In the logic translation of each phrase structure rule, the tree corresponding to the LHS category will simply be a list consisting of the category name followed by the trees of the RHS categories. For example:

   s([s,NP,VP],X,Z) :- np(NP,X,Y), vp(VP,Y,Z).
   np([np,nurses],[nurses|X],X).

Translated into English (and forgetting about the arguments concerned with portions of the string), the first clause says that one form of a sentence is an NP followed by a VP. If a sentence has this form, then its parse tree will be a list of the form [s,NP,VP], where NP is the parse tree of the noun phrase and VP is the parse tree of the verb phrase.

The code for our difference list recognizer above bears such a close relationship to the corresponding PSG that one might reasonably expect to be able to get from the latter to the former automatically. We would like to be able to get a program to convert clauses written essentially as phrase structure rules into clauses that would implement difference list recognition. Thus our input to the program might be expressed thus:

   s ---> np, vp.
   np ---> [nurses].

(given an appropriate operator declaration for '--->', for instance ?- op(1150,xfx,--->).), where the annotated ("variabilised") difference list output we want is:

   s(_1,_2) :- np(_1,_3), vp(_3,_2).
   np([nurses|_4],_4).

This turns out not to be too difficult. We define a predicate 'translate' as follows:

   translate((LHS_in ---> RHS_in), (LHS_out :- RHS_out)) :-
      LHS_out =.. [LHS_in,S0,Sn],
      add_vars(RHS_in,S0,Sn,RHS_out).
This sets up templates for the input rule and output clause, forms the LHS of the output by wrapping a couple of difference list variables up with the LHS of the input, and concludes by handing the RHS over to 'add_vars':

   add_vars((RHS_in1,RHS_in2),S0,Sn,RHS_out) :- !,
      add_vars(RHS_in1,S0,S1,RHS_out1),
      add_vars(RHS_in2,S1,Sn,RHS_out2),
      combine(RHS_out1,RHS_out2,RHS_out).

If there is more than one item on the RHS of the input, then split it into two at the first comma, recursively reapply 'add_vars' to the two pieces, and then combine the results.

   add_vars(RHS_in,S0,Sn,true) :-
      islist(RHS_in), !,
      append(RHS_in,Sn,S0).

If the RHS just consists of a list, and is thus a lexical entry, append the terminating difference list variable to it, and then unify the resulting list with the initial difference list variable. (The predicate 'islist' simply tests whether its argument is a proper list; many Prolog systems provide this test as a built-in called is_list.)

   add_vars(RHS_in,S0,Sn,RHS_out) :-
      atom(RHS_in),
      RHS_out =.. [RHS_in,S0,Sn].

If the RHS is just a single atom, then treat it in the same way as the LHS symbol. It only remains to define 'combine':

   combine(true,RHS_out2,RHS_out2) :- !.
   combine(RHS_out1,true,RHS_out1) :- !.
   combine(RHS_out1,RHS_out2,(RHS_out1,RHS_out2)).

This just says that RHS results which unify with 'true' can be ignored -- otherwise you get what you expect.

Definite Clause Grammars

Although translating CF-PSGs directly into Prolog clauses is straightforward, there are important new issues to be faced in translating general feature-based grammars. In such grammars, a category is a bundle of features and so cannot be represented simply by an atomic predicate relating strings of words. What we have to do is translate each category into a predicate together with a sequence of arguments which encode the features in the bundle, as well as saying the right things about portions of the string. As before, it would be useful to have a more abbreviated notation that would allow us to write such grammars as Prolog programs without having to worry about the difference lists. Fortunately, many Prolog systems come complete with a built-in interpreter for such a notation, the notation of Definite Clause Grammars (DCGs). In fact, we have already almost seen DCGs in a preceding section -- the following expressions are DCG rules:

   s --> np, vp.
   np --> [kim].

Prolog interpreters equipped with a DCG translator will automatically convert these expressions into the difference list recognition format that we have already discussed. It might seem from this example as if "DCG" is just a fancy name for a minor variant on the standard PSG formalism. But the example is misleading, in that the DCG formalism allows one to do something not countenanced in PSG - namely, to associate arguments (terms and/or logical variables, as many as you want) with categories. Hence the following rule is also a legitimate statement in the DCG formalism:

   s --> np(Person,Number,nominative), vp(Person,Number,tensed).

one which a DCG-equipped Prolog system will translate into something equivalent to this:

   s(_12,_13) :-
      np(_14,_15,nominative,_12,_16),
      vp(_14,_15,tensed,_16,_13).

Assuming that the variables and constants in the original DCG rule are not perversely named, that rule claims that a sentence can consist of a nominative NP followed by a tensed VP, where the NP and VP share the same values for both person and number. In PATR, this would be rendered by something like:

   Rule S -> NP VP:
      <NP person> = <VP person>
      <NP number> = <VP number>
      <NP case> = nominative
      <VP verb_form> = tensed.

Note that in the DCG version the number and order of arguments is fixed and crucial.
Thus whilst 'np(third,plural,Case)' is probably both meaningful and useful in the context of the rule just given, the superficially similar 'np(third,plural)' and 'np(plural,Case,third)' are most likely to be both useless and misleading. (We hedge our claims here, since their truth depends on how the rest of the grammar is set up. One could use 'Case' as a variable to range over number values, or use 'tensed' as the name of a case, but it would be thoroughly perverse to do so.) The PATR notation does not suffer from these fixed arity and fixed order requirements on the equations that annotate its rules.

Notice also that whilst we have chosen mnemonic names for the variables in our example rule, these names will typically not be maintained internally by the Prolog system, and so the effect of tracing a parse, say, may be to exhibit expressions like:

   np(_43,_76,accusative,_15,_16), vp(_43,_56,_97,_16,_19)

thus requiring one to remember that the third argument of 'vp' deals with tense, the second argument of 'np' deals with number, and so on. This latter problem can be overcome, albeit at the cost of a little prolixity, by using terms separated by a colon (or "=" or some other symbol which is declared as an infix operator) in place of bare variables. The first component of these terms can then be a constant which provides a mnemonic name for the function of the slot in which it appears, thus:

   s --> np(person:P,number:N,sex:S,case:nominative),
         vp(person:P,number:N,sex:S,verb_form:tensed).

   vp(person:P1,number:N1,sex:S1,verb_form:V1) -->
         xv(person:P1,number:N1,verb_form:V1),
         np(person:P2,number:N2,sex:S2,case:accusative),
         vp(person:P2,number:N2,sex:S2,verb_form:infinitive).

And, in a grammar along these lines, the lexical entries need to look like this:

   np(person:3,number:singular,sex:_,case:_) --> [kim].
   np(person:3,number:singular,sex:female,case:nominative) --> [she].
   det(number:singular) --> [a].
   det(number:_) --> [the].
   nn(number:plural,sex:none) --> [scissors].
   nn(number:_,sex:_) --> [sheep].
   bv(person:_,number:plural,verb_form:tensed) --> [are].
   tv(person:_,number:_,verb_form:tensed) --> [ate].
   xv(person:3,number:singular,verb_form:tensed) --> [sees].

Since DCGs allow one to nest terms and variables to arbitrary depth, rules such as the following are quite permissible:

   s --> np(person:P,number:N,sex:S,case:C),
         s(slash:np(person:P,number:N,sex:S,case:C)).

   vp(person:P,number:N,sex:_,verb_form:V,
      slash:np(person:_,number:_,sex:_,case:accusative)) -->
         tv(person:P,number:N,verb_form:V).

Notice that the ":" notation, although it leads to more perspicuous grammars and traces, does not liberate the grammar writer from having to worry about always using the same argument position of a predicate consistently for the same feature (something the PATR user does not have to worry about).

Our examples so far have concentrated on using DCGs as feature-based recognizers, but nothing about the formalism constrains them to mere recognition. Since the arguments to the categories can contain arbitrary terms, we can use them for parsing and, more generally, for syntactic translation. Thus, to parse, we could write rules like this:

   s([s,NP,VP]) --> np(NP), vp(VP).
   vp([vp,XV,NP,VP]) --> xv(XV), np(NP), vp(VP).

or like this:

   s(s(NP,VP)) --> np(NP), vp(VP).
   vp(vp(XV,NP,VP)) --> xv(XV), np(NP), vp(VP).

depending on whether we want the parse returned in the form of a list or a term.
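To see the tree-building rules in action, here is a minimal self-contained sketch; the lexical entries and the simplified transitive VP rule are our own illustrative assumptions rather than part of the grammar developed above:

   % a tiny tree-building fragment (illustrative only)
   s(s(NP,VP)) --> np(NP), vp(VP).
   vp(vp(XV,NP)) --> xv(XV), np(NP).   % simplified: no embedded VP
   np(np(kim)) --> [kim].
   np(np(nurses)) --> [nurses].
   xv(xv(sees)) --> [sees].

Given this fragment, the query:

   ?- s(Tree,[kim,sees,nurses],[]).

succeeds, binding Tree to s(np(kim),vp(xv(sees),np(nurses))).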
And, since there is no limit to the number of permissible arguments, we can, if we wish, simply add this parse-construction argument to the feature arguments already illustrated.

The formulation of feature-based grammar in DCG that has been illustrated relies on a fundamental property of the grammars concerned, namely that there is a feature (we have called it cat) whose value is atomic and always provided in any description of a category given in the grammar. This property has been needed because we have consistently used the cat value as the Prolog predicate in the formulation, and in Prolog it is impossible to have a goal whose predicate is unknown (a variable). Of course, in a rule like:

   Rule {coordination}
   X0 -> X1 C X2:
   ...

the whole point is not to have to specify the cat features of X0, X1 and X2. With a grammar having rules of this kind, we have little choice but to treat cat just like every other feature and provide a dummy predicate, 'p' say, that is used with every category. This predicate must always take the same number of arguments (have an argument for every possible feature) in order that there can be category descriptions analogous to X0 above that will match with every possible category. The result of translating a PATR grammar of this kind into a DCG will be a set of rules such as the following:

   p(s,Person1,Number1,Case1,Vform1) -->
      p(np,Person,Number,nominative,Vform2),
      p(vp,Person,Number,Case2,tensed).

(where the 'p' arguments represent cat, person, number, case and vform, in that order).

In summary, the DCG formalism represents a rather natural extension of CF-PSG within the Prolog context. It is convenient, being standardly provided with Prolog systems, and perspicuous; it has a semantics that is as declarative as that of Prolog and, of course, it integrates well with Prolog. The arguments associated with categories permit both parsing and, more generally, translation (for example, translation from English to logic, as we shall see in Chapter 8). As we will see in the next chapter, the parser that the standard interpretation of the formalism provides is a left-to-right, top-down, backtracking one, a parser, in fact, whose search strategy is exactly that of Prolog itself. For this reason, DCG parsers are fast (since they employ Prolog very directly) but inefficient (since they fail to store intermediate results), though no worse in this respect than a typical ATN of comparable coverage. From the point of view of the grammar writer, the major disadvantage of the DCG formalism is its assumption of "term unification" as opposed to the "graph unification" of PATR. This means that all the slots for featural information have to be explicitly provided, in a fixed order, on every category to which they apply, even when no information is available and the slot just contains a variable. This makes for grammatical and lexical verbosity, whereas graph unification based formalisms allow unknown or irrelevant featural information to be suppressed. This is a topic to which we shall be returning in Chapter 7.

The use of arguments and variables in DCGs allows them to be used to define recognizers for languages that fall outside the CFL class. As we saw earlier, one of the distinguishing features of a context-free rule is that it makes no restriction on the context in which its LHS can be realised as its RHS.
There is no reason, however, why we should not consider other kinds of rules in describing languages, for instance rules like 'LHS can be realised as RHS if it is preceded by an A and followed by a B'. If we allow ourselves to use such rules, then we will be able to describe languages that cannot be described by CF-PSGs. One such language is the language a^n b^n c^n. Since the rule format for DCGs is that of the CF-PSGs, and since there is no apparent reference to syntactic context, it may not be immediately obvious how DCGs are able to provide recognizers for non-CFLs. The key to this puzzle lies in the fact that DCGs can take account of context by recording information in extra arguments to the predicates.

We begin by considering a grammar for the a^n b^n c^n language, which is standardly used as an example of a non-CFL. In this grammar, the extra arguments are used to remember how many bs and cs are to be accepted after the as. This information is stored as a term of the form:

   i(i(i(....i(i)....)))

where the number of is is the number of symbols to be accepted. The first a seen starts the i-stack with a single i:

   s --> [a], s(i).

Every subsequent a seen pushes an i onto the i-stack kept in the category's argument slot:

   s(I) --> [a], s(i(I)).

When we are through with as, we copy the entire i-stack, which now records the exact number of as seen, onto two new categories:

   s(I) --> bn(I), cn(I).

and use the first of these to check the length of the b-string by popping one i off the stack for each b seen:

   bn(i(I)) --> [b], bn(I).
   bn(i) --> [b].

and then do exactly the same with the c-string:

   cn(i(I)) --> [c], cn(I).
   cn(i) --> [c].

Another familiar chestnut from the formal language literature, one which perhaps has more potential relevance to NLP than the case we have just looked at, is the string copying language xx, with x in {a,b,c}*. This is the language of strings made out of the symbols a, b and c, where each string consists of two halves, the second being an exact copy of the first. This succumbs to a rather similar technique. Get the first character in the string and put it on the (empty) stack:

   x --> [a], x(a).
   x --> [b], x(b).
   x --> [c], x(c).

Push subsequent characters onto the stack as they occur:

   x(I) --> [a], x(a(I)).
   x(I) --> [b], x(b(I)).
   x(I) --> [c], x(c(I)).

Copy the stack to a new category:

   x(I) --> y(I).

and start popping the characters off as you encounter them in the string:

   y(a(I)) --> y(I), [a].
   y(b(I)) --> y(I), [b].
   y(c(I)) --> y(I), [c].

until you reach the last character in the string:

   y(a) --> [a].
   y(b) --> [b].
   y(c) --> [c].

It is worth reflecting on the efficiency of this grammar used as a DCG recognizer under the standard interpretation of DCGs. Arranged with the order of clauses used above, the program will begin by putting the entire string on the stack, reach the end, backtrack one character, apply the 'x(I) --> y(I)' rule, try to accept the final character, backtrack over the penultimate character, retry the 'x(I) --> y(I)' rule, and so forth, until it has backtracked to the exact middle of the string. Only then will it be able to make it through to the end of the string. Compare this with the operation of the DCG recognizer for a^n b^n c^n given above: that one does not need to backtrack at all when presented with grammatical strings. The moral is a simple one: the fact that DCGs make it easy to write elegant grammars for such languages does not entail that the recognition process they induce will be efficient.
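Both grammars can be tried out directly once they have been consulted into a DCG-equipped Prolog. The following queries (our own illustrations) behave just as the discussion predicts:

   ?- s([a,a,b,b,c,c],[]).     % succeeds, with no backtracking
   ?- s([a,a,b,c,c],[]).       % fails: the counts do not match
   ?- x([a,b,c,a,b,c],[]).     % succeeds, after backtracking to the middle
   ?- x([a,b,c,c,b,a],[]).     % fails: a mirror image is not a copy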
One problem that has much interested linguists in the last few years, and even attracted the attention of some computational linguists, is that posed by languages with very free word or constituent order. One of the simplest free word order languages that one can imagine is the language consisting of those strings in {a,b}* such that n(a) = n(b), that is, strings containing equal numbers of as and bs. This simple language is outside the class of regular languages, but inside the class of CFLs. Here is a DCG version of a CF-PSG for the language:

   s1 --> [a], y1.
   s1 --> [b], x1.
   x1 --> [a], s1.
   x1 --> [b], x1, x1.
   x1 --> [a].
   y1 --> [b], s1.
   y1 --> [a], y1, y1.
   y1 --> [b].

This works, but its workings are remarkably opaque for such a simple grammar. Much more perspicuous is the following DCG, which exploits to the full the possibilities offered by using arguments as stacks in order to count occurrences:

   % start two counters:
   x --> x(a,b).

   % if there is an a, then increment the b counter:
   x(A,B) --> [a], x(A,b(B)).

   % alternatively, if there is an a,
   % then decrement the a counter:
   x(a(A),B) --> [a], x(A,B).

   % if there is a b, then increment the a counter:
   x(A,B) --> [b], x(a(A),B).

   % alternatively, if there is a b,
   % then decrement the b counter:
   x(A,b(B)) --> [b], x(A,B).

   % eliminate the nonterminal:
   x(a,b) --> [].

This section has served to illustrate some of the formal techniques that can be used to handle grammatical phenomena such as sequences of matching length, isomorphic sequences, and very free order, which sometimes prove awkward, or even mathematically impossible, to handle with just the resources of CF-PSG. To the extent that such phenomena show up in the languages of the world, and it is a very limited extent, these techniques are of potential relevance to NLP. We have restricted ourselves to relatively simple formal language examples, since the background necessary to provide grammar fragments from languages like Swiss German (isomorphic sequences) or Dyirbal (free order) would take more space than these relatively exotic topics warrant.