Coding Dialogs with the DAMSL Annotation Scheme

Mark G. Core and James F. Allen
Department of Computer Science
University of Rochester
Rochester, NY 14627
mcore, james@cs.rochester.edu

Abstract

This paper describes the DAMSL annotation scheme for communicative acts in dialog. The scheme has three layers: Forward Communicative Functions, Backward Communicative Functions, and Utterance Features. Each layer allows multiple communicative functions of an utterance to be labeled. The Forward Communicative Functions form a taxonomy similar in style to the actions of traditional speech act theory. The Backward Communicative Functions indicate how the current utterance relates to the previous dialog, such as accepting a proposal, confirming understanding, or answering a question. The Utterance Features include information about an utterance’s form and content, such as whether an utterance concerns the communication process itself or deals with the subject at hand. The kappa inter-annotator reliability scores for the first test of DAMSL with human annotators show promise, but are on average 0.15 lower than the accepted kappa scores for such annotations. However, the slight revisions to DAMSL discussed here should increase accuracy on the next set of tests and produce a reliable, flexible, and comprehensive utterance annotation scheme.

Introduction

There are two classes of applications that require the automatic analysis of dialogs: a computer system may act as a participant in a dialog with users, or it may act as an observer attempting to interpret human-human dialogs. In both cases, the system must keep track of how each utterance changes the commonly agreed upon knowledge (common ground (CS89)), including the conversational agents’ obligations and plans. Dialog text annotated with the communicative actions of each utterance would aid in training and testing such systems.
In addition, linguists studying dialog would greatly benefit from annotated corpora that could be used to reveal the underlying structures of dialogs.

DAMSL (Dialog Act Markup in Several Layers) defines a set of primitive communicative actions that can be used to analyze dialogs. For the purposes of this paper, we define communicative actions as explicit manipulations of the common ground, and exclude more subtle phenomena such as listeners forming opinions about speakers based on the tone and style of their speech.

Speech act theory (Sea75) was one of the first attempts at developing a set of communicative actions. Searle’s action classification included Representatives, which introduce information into the common ground; Directives, which attempt to create an obligation on the listener; and Commissives, in which speakers attempt to introduce an obligation on themselves. Over the years, many researchers (All95; CL90; Han79) have noticed that a major problem with speech act theory is that it attempts to capture an utterance’s purpose(s) with one label. DAMSL addresses this problem by allowing multiple labels in multiple layers to be applied to an utterance. Thus an utterance might simultaneously perform actions such as responding to a question, confirming understanding, promising to perform an action, and informing.

The classes of communicative actions discussed here are high-level and designed to be applicable to various types of dialogs. The idea is that for a particular domain, these classes could be further subdivided into acts that are relevant to the domain. The common level of abstraction across domains, however, would allow researchers to share data in a way that would not be possible if everyone developed their own scheme.

The overall structure of DAMSL has been developed by the Multiparty Discourse Group in Discourse Research Initiative (DRI) meetings.
(See the DRI home page for more details: http://www.georgetown.edu/luperfoy/Discourse-Treebank/dri-home.html) The DAMSL annotation manual and annotation tools have been developed at Rochester. The annotation manual describing each action class in DAMSL and when it applies is available at “ftp://ftp.cs.rochester.edu/pub/packages/dialog-annotation/manual.ps.gz”. It is important to note that this is a working document rather than a completed project, and the scheme is sure to be refined and extended in subsequent meetings once we have more experience with using DAMSL. In addition, the focus of DAMSL has primarily been on task-oriented dialogs, where the participants are focused on accomplishing a specific task. While we believe the taxonomy is applicable to all dialogs, the distinctions made here are the ones most prevalent and important to task-oriented situations.

The following sections of this paper give a short description of the DAMSL scheme and discuss some preliminary inter-annotator reliability scores.

The DAMSL Annotation Scheme

Speech act theories generally only allow an utterance to have one speech act and maybe an additional indirect speech act. This is a problem because utterances can simultaneously respond, promise, request, and inform. To handle responses, researchers have created subclasses of Representative/Inform such as Accept and Reject (ASF+94). However, consider the two dialogs below. Note that the labels u and s are used to refer to different speakers.

    u: let’s finish the report today
    s: okay

    u: it is raining
    s: oh no

In the contexts above, it seems strange that the utterance “okay” would be labeled with the same category as “it is raining” (both would be Informs; more specifically, “okay” would be an Accept). The accepting and rejecting character of an utterance seems to belong in a separate action class dealing with a speaker’s reactions to previous utterances.
You can find other types of phenomena that fit into this class, such as signaling understanding with acknowledgments and answering questions. These phenomena will be called Backward Communicative Functions, while speech act categories not related to responses will be called Forward Communicative Functions since they affect the future portions of the dialog. For example, a request for information will cause you to give an answer. A third set of labels, Utterance Features, includes features that characterize the content and structure of utterances.

Forward Communicative Function

The Forward Communicative Functions include the speech act categories: Representatives, Directives, and Commissives. However, the categories are now independent, so an utterance can simultaneously give information, make a request, and make a promise (although it is unlikely one utterance will do all of these). All the Forward Communicative Functions are shown below.

Representatives, utterances making claims about the world, are now called Statements. This class is further subdivided based on whether the speaker is trying to affect the beliefs of the hearer, or is repeating information for emphasis or acknowledgment. Directives fit under the more general category, Influencing-Addressee-Future-Action, which includes all utterances that discuss potential actions of the addressee. Directives are subdivided into two categories: Info-Request, which consists of questions and requests such as “tell me the time”, and Action-Directive, which covers requests for action such as “please take out the trash” and “close the door”. Influencing-Addressee-Future-Action also includes Open-Option, where a speaker gives a potential course of action but does not show preference toward it, as in “how about going to Joey’s Pizza”. Commissives are given the more descriptive name, Committing-Speaker-Future-Action, and are subdivided into Offers and Commit(ments).
The Performative category includes utterances that make a fact true in virtue of their content, such as your boss firing you by saying “you are fired”. Since the Performative category is an independent component of the Forward Function, such utterances can be marked in other categories (such as Statement) as well. The Other Forward Function category is a default choice for communicative actions that influence the future of the dialog in a way not captured by the other categories. Sentence-initial words such as “okay” are often split off as separate utterances and marked as Other Forward Function. These words may have Forward Communicative Functions such as signaling a repair or change in topic or holding the turn (while the person is thinking), as well as Backward Communicative Functions such as Accepting and Acknowledging. Future work in this annotation effort will include developing classes of Other Forward Functions.

• Statement
    - Assert
    - Reassert
    - Other-Statement
• Influencing Addressee Future Action
    - Open-option
    - Directive
        - Info-Request
        - Action-Directive
• Committing Speaker Future Action
    - Offer
    - Commit
• Performative
• Other Forward Function

Backward Communicative Function

The Backward Communicative Functions in the DAMSL scheme are shown below. The classes Agreement, Understanding, Answer, and Information-Relation are independent, so an utterance may simultaneously accept information and acknowledge that the information was understood as well as answer a question.

Agreement has several subclasses; Accept and Reject refer to fully accepting or rejecting an utterance or set of utterances. Accept-Part and Reject-Part refer to partially accepting or rejecting a proposal.
In the next version of DAMSL, a label such as Accept-and-Reject will be added to deal with utterances such as “I’ll take everything except the curtains”, which both accept and reject parts of an offer (assume that this is a response to an offer such as “what would you like to take to school”). Note that it is difficult to break this into accepting and rejecting pieces since separating “I’ll take everything” from the rest changes its meaning. Hold refers to utterances such as clarification questions that delay the listener’s reaction to a proposal or question. Maybe refers to cases where the listener refuses to make a judgment at this point. The examples in figure 1 illustrate each type of agreement in response to the offer “Would you like the book and its review?”.

The Understanding dimension concerns whether the listener understood the speaker. The listener may signal understanding or non-understanding or attempt to correct the speaker (showing that they either did not understand or that they did understand but that the speaker misspoke). Non-understanding can be indicated by utterances such as “huh?”, clarification questions (“To Dansville?”), and by explicit questions about what the speaker said or meant. Understanding can be indicated by acknowledgments such as “right” or “okay”, by repeating some of the speaker’s utterance, or by continuing or completing the speaker’s sentence.

    Context: A: Would you like the book and its review?

    Accept        B: Yes please.
    Accept-Part   B: I’d like the book.
    Maybe         B: I’ll have to think about it (intended literally)
    Reject-Part   B: I don’t want the review.
    Reject        B: No thank you.
    Hold          B: Do I have to pay for them?

Figure 1: Example annotations using the Agreement Label

The Answer dimension indicates that an utterance is supplying information explicitly requested by a previous Info-Request act.
This is a highly specific function that you might expect could be generalized into some other form of response, but we have not as yet been able to identify what the generalization would be.

Information-Relations are intended to be like the Rhetorical Relations of (MT87) and describe how the information in the current utterance relates to previous utterances in the dialog: “does the utterance provide evidence for a claim in a previous utterance?”, “is it giving an example of something mentioned previously?”. So an utterance can certainly have Information-Relations as well as answering a question, accepting a proposal, and acknowledging understanding. A set of information relations for DAMSL has not been constructed yet.

• Agreement
    - Accept
    - Accept-Part
    - Maybe
    - Reject-Part
    - Reject
    - Hold
• Understanding
    - Signal-Non-Understanding
    - Signal-Understanding
        - Acknowledge
        - Repeat-Rephrase
        - Completion
    - Correct-Misspeaking
• Answer
• Information-Relation

Utterance Features

The third part of DAMSL consists of the Utterance Features, which capture features of the content and form of utterances. The Information Level dimension encodes whether the utterance deals with the dialog task, the communication process, or metalevel discussion about the task. This dimension eliminates the need to have tags such as Communication-Info-Request, for utterances such as “What did you say?”, and Task-Info-Request, for utterances such as “What times are available?”. With this information, we can identify three independent subdialogs within a single dialog. The topic motivating the dialog is developed and discussed in the Task part of the dialog. The Task-Management part of a dialog involves explicit planning and monitoring of how well the task is being accomplished. The physical requirements of the dialog (such as being able to hear one another) are maintained in the Communication-Management part of the dialog.
Note that in some sense all utterances have a Communication-Management component. It is only marked, however, when the utterance has no Task or Task-Management component.

Communicative Status and Syntactic Features are hints about the possible communicative acts of an utterance. Communicative Status labels of Abandoned and Uninterpretable suggest that an utterance has little effect on the dialog because it was broken off or garbled beyond recognition. Syntactic Features currently only flag conventional sentences such as “hello” and “may I help you”, and exclamations such as “wow”. Conventional utterances are often at the Communication Management level, and Exclamations are usually Statements about the speaker’s feelings.

• Information Level
    - Task
    - Task Management
    - Communication Management
    - Other
• Communicative Status
    - Abandoned
    - Uninterpretable
• Syntactic Features
    - Conventional Form
    - Exclamatory Form

Utterance Segmentation

This paper assumes an utterance is a set of words by one speaker that is homogeneous with respect to Information Level and Forward and Backward Communicative Functions. This means that, in a case like the one below, a new utterance begins when the set of communicative acts being conveyed changes:

    utt1 u: we’ll get that couch
    utt2    how about that end table?

Utterances are not required to be single clauses, and if the set of communicative acts being conveyed stays the same, several clauses may form one utterance:

    utt1 u: we’ll take the train to Corning | then we’ll pick up boxcars in Avon | and go on to Dansville to pick up oranges

Usually the only utterances shorter than a clause are sentence-initial words such as “okay”. Words such as “um” and “er” and phrases such as “I mean” have communicative functions separate from the clauses in which they appear. However, utterances are not hierarchical, so labeling “I mean” as a separate utterance below would mean cutting off “Friday” from “we’ll go Tuesday”.
DAMSL is not designed for annotating speech repairs, reference, or other intra-clause relations, so we decided to use a simple definition of utterance that leaves out such phenomena.

    utt1 u: we’ll go Tuesday I mean Friday

Short interruptions by another speaker do not break up an incomplete utterance (incomplete meaning an interruption in the syntax). In the example below, “take the product to to Corning” is treated as one utterance. So this is a functional notion of utterance as opposed to a definition based on prosody.

    u: take the product to
    s: yes?
    u: to Corning

Experiments

One of the key requirements for any annotation scheme is that the scheme can be used reliably by trained annotators. To explore this, we performed a reliability experiment on the current DAMSL scheme using test dialogs from the TRAINS 91-93 dialogs (GAT93; HA95), a corpus of discussions between humans on transportation problems involving trains. One person (the user) was given a problem to solve, such as shipping boxcars to a city, and the other person was instructed to act as a problem solving system. In addition, this system had information (the times to travel various paths) that the user did not. An excerpt from a TRAINS dialog is shown in figure 2.

    u: we have to ship a boxcar of oranges to Bath by 8 AM
     : and it is now midnight
    s: okay
    u: okay all right so there are two boxcars at Bath and one at Dansville and there’s
    s: and there’s
    u: wait I’ve forgotten where the oranges are where are the oranges
    s: the oranges are in the warehouse at Corning
    u: okay so we need to get a boxcar to Corning
    s: right
    u: alright so why don’t we take one of the ones from Bath

Figure 2: An excerpt from a TRAINS 91 dialog (d91-7.1)

Three undergraduates and a graduate student were given informal training: they annotated some dialogs, compared their results against canonical annotations, and compared their results against one another. A GUI-based annotation tool, DAT (available at http://www.cs.rochester.edu/research/trains/annotation/), was developed to test the DAMSL scheme. This tool displays the dialogs and can play audio for individual utterances, so annotators can listen to the actual dialogs as well as study the transcripts. DAT also gives warnings to users when a suspicious pattern of inputs is entered and allows them to correct the annotation if desired. Here is a list of what the tool defined as suspicious.

• Question and answer have different Info Levels
• An acceptance that is not an acknowledgment
• An acknowledgment (but not acceptance) that is not at the Communication Management Information Level
• Answers that are not Asserts
• A check question whose answer does not have an Agreement label
• A response to an Action-Directive or Open-Option that does not have an Agreement label
• A response to a question that is not an answer

A check question is defined in the annotation manual as a statement about the world made for the purposes of confirmation, as in “We’re using the blue sofa, right?”. Check questions are labeled as both Asserts and Info-Requests, and their answers are both Asserts and Accepts (or possibly Rejects).

After training, the students independently annotated a series of dialogs as shown in table 1.

    Dialog | Utts | Annotators | Total Annotations/Tag
    d1     | 133  | 2 UG       | 266
    d2     | 72   | 2 UG       | 144
    d3     | 40   | 2 UG, 1 GR | 120
    d4     | 41   | 1 UG, 1 GR | 82
    d5     | 19   | 1 UG, 1 GR | 38
    d6     | 88   | 1 UG, 1 GR | 176
    d7     | 159  | 1 UG, 1 GR | 318
    d8     | 52   | 1 UG, 1 GR | 104
    total  | 604  |            | 1248

    UG = undergraduate, GR = graduate student

Table 1: Experimental Setup

Results

The statistics used to measure interannotator reliability are percent pairwise agreement (PA), expected pairwise agreement (PE), and kappa (PA adjusted by PE): $K = \frac{PA - PE}{1 - PE}$. These statistics are defined formally in (SJ88). Statistics were collected for each tag over each dialog.
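A minimal sketch of the kappa computation, together with an annotation-weighted average for combining per-dialog scores, may make the arithmetic concrete. The function names and the per-dialog kappa values are illustrative assumptions; only the formula and the annotation counts from Table 1 come from the paper.

```python
from typing import Sequence


def kappa(pa: float, pe: float) -> float:
    """Kappa: observed pairwise agreement (PA) corrected for expected agreement (PE)."""
    if pe >= 1.0:
        raise ValueError("kappa is undefined when PE is 1")
    return (pa - pe) / (1.0 - pe)


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average per-dialog values, weighting each dialog by its annotation count."""
    if len(values) != len(weights) or not values:
        raise ValueError("need one weight per value and at least one value")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Total annotations per tag for dialogs d1-d8, taken from Table 1.
TAPT = [266, 144, 120, 82, 38, 176, 318, 104]

# Hypothetical per-dialog kappas for one tag; the paper reports only averages.
example_kappas = [0.70, 0.65, 0.72, 0.60, 0.55, 0.68, 0.71, 0.66]

if __name__ == "__main__":
    print(round(kappa(0.88, 0.60), 2))
    print(round(weighted_average(example_kappas, TAPT), 3))
```

Averaging per-dialog kappas is generally not the same as applying the formula to averaged PA and PE. For example, (0.82 - 0.49) / (1 - 0.49) is roughly 0.65, while the averaged kappa reported for Statement in Table 2 is 0.66; the difference is presumably due to this order of operations.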
An average PA, PE, and kappa for each tag were then computed as follows: $\text{average} = \sum_i (d_i \times TAPT_i) / \sum_i TAPT_i$, where $TAPT_i$ is the total annotations per tag for dialog $i$ and $d_i$ is the PA, PE, or kappa for a tag over dialog $i$. (Dialogs d1-d8 correspond to TRAINS dialogs d92a-2.1, d92a-2.2, d92a-3.1, d92a-4.1, d92a-4.3, d93-13.2, d93-13.3, and d93-16.1.)

According to (Car96), kappas must be above 0.67 for even tentative conclusions to be drawn, with values above 0.8 considered reliable. The results suggest that with revisions to the annotation manual, annotators should be able to produce labelings of at least usable quality (between 0.67 and 0.8). The results are shown in tables 2, 3, and 4 (note that IAF is Influencing Addressee Future Action and CSF is Committing Speaker Future Action). The Resp-to abbreviation refers to Response-to, an annotation of which utterances a response responds to. Note that Exclamation was only labeled yes three times in the test set and Performative was never labeled yes in the test set, so both labels are left out of consideration.

    Measure | Statement | IAF  | CSF  | Other For Funct
    PA      | 0.82      | 0.88 | 0.88 | 0.93
    PE      | 0.49      | 0.60 | 0.87 | 0.85
    Kappa   | 0.66      | 0.70 | 0.15 | 0.48

Table 2: Reliability for Main Forward Function Labels

    Measure | Understand | Agree | Ans  | Resp-to
    PA      | 0.83       | 0.78  | 0.95 | 0.84
    PE      | 0.60       | 0.62  | 0.73 | 0.29
    Kappa   | 0.57       | 0.42  | 0.76 | 0.77

Table 3: Reliability for Backward Function Labels

Two of the lowest kappa scores of the annotations occur in the Committing-Speaker-Future-Action and Agreement dimensions. The major reason for disagreements in these dimensions is that annotators have a hard time deciding whether a response is an acceptance (labeled under the Agreement dimension) or just an acknowledgment. In the example below, it is unclear whether u thinks going through Corning is a good idea or is waiting to hear more before making a judgment.

    s: so we’ll take the train through Corning
    u: okay
    s: and on to Elmira.
Hearing the audio sometimes helps, but there are many cases where the annotator would have to be able to read the speaker’s mind in order to make the distinction. To make matters worse, this one decision also affects two other dimensions: the Committing-Speaker-Future-Action dimension, because acceptances often imply commitment while acknowledgments do not, and the Information Level dimension, since acknowledgments are at the Communication Management level while agreements are at the Task level. Thus, we have differences in at least three dimensions based on a single subtle distinction that often cannot be made. The two interpretations are summarized in table 5.

    Measure | Info level | Abandoned | Unintelligible
    PA      | 0.83       | 0.98      | 0.99
    PE      | 0.57       | 0.94      | 0.98
    Kappa   | 0.60       | 0.64      | 0.14

Table 4: Reliability for Utterance Features

    Dimension     | Interp 1    | Interp 2
    Understanding | ACK         | ACK
    Agreement     | N/A         | ACCEPT
    CSF           | N/A         | COMMIT
    Info Level    | COMM-MANAGE | TASK

Table 5: Two interpretations of an utterance such as “okay”.

This problem, where a slight change in interpretation causes major changes in the annotation, clearly indicates a need for revision. One possibility would be to introduce some labels that capture the ambiguity, but this would have to be done in each dimension and might serve to aggravate the problem by introducing additional choices. The other possibility is to force an agreement reading based on how the proposal/request is eventually treated in the dialog. Thus in the example above, unless the speaker goes on to reject or question the proposal, the response would count as an implicit accept, Interpretation 1 would not be allowed, and the response would have to be labeled with some Agreement tag. Following this rule could be encouraged by having DAT give the user a warning every time an utterance is tagged an Acknowledgment but no Agreement tag is specified.
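The proposed warning, alongside two of the existing suspicious-pattern checks, could be sketched roughly as follows. This is an illustration only: the `Annotation` record, its field names, and `suspicious_patterns` are hypothetical and do not describe DAT's actual data model or implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical record of the tags on one utterance (not DAT's real data model)."""
    understanding: Optional[str] = None  # e.g. "Acknowledge"
    agreement: Optional[str] = None      # e.g. "Accept", "Reject", "Hold"
    statement: Optional[str] = None      # e.g. "Assert", "Reassert"
    is_answer: bool = False


def suspicious_patterns(a: Annotation) -> List[str]:
    """Return a warning message for each suspicious tag combination found."""
    warnings: List[str] = []
    # Existing check: an acceptance that is not an acknowledgment.
    if a.agreement == "Accept" and a.understanding != "Acknowledge":
        warnings.append("acceptance that is not an acknowledgment")
    # Existing check: answers that are not Asserts.
    if a.is_answer and a.statement != "Assert":
        warnings.append("answer that is not an Assert")
    # Proposed check: an acknowledgment with no Agreement tag at all.
    if a.understanding == "Acknowledge" and a.agreement is None:
        warnings.append("acknowledgment without an Agreement tag")
    return warnings


if __name__ == "__main__":
    okay_as_ack_only = Annotation(understanding="Acknowledge")
    print(suspicious_patterns(okay_as_ack_only))
```

Under this sketch, Interpretation 1 of “okay” from table 5 would trigger the proposed warning, while Interpretation 2 (Acknowledge plus Accept) would pass silently.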
The Other-Forward-Function category also has a low kappa score; this is partially due to the fact that the expected agreement for it is high since its value is usually Not-Present. This category applies most often to words such as “okay” that are very ambiguous in their meaning even when heard in context. It will be interesting to develop subcategories of Other Forward Function such as “turn holding” and “signaling a repair” to give us a better idea of what phenomena annotators are having trouble labeling.

Most of the other labels have kappas around 0.6, meaning the annotations are fairly reliable but that some problems still remain. One problem that affects several labels involves check questions. Check questions are statements about the world made for the purposes of confirmation, as in “We’re using the blue sofa, right?”. Check questions are labeled as both Asserts and Info-Requests, and their answers are both Asserts and Accepts (or possibly Rejects). However, it is difficult for annotators to consistently recognize a check question, leading to disagreements in the Statement and Influencing Addressee Future Action dimensions (is it an assert, is it a question?), and disagreements about whether the next utterance is an Answer and Assert or simply an Accept (or Reject).

Another problem arises with indirect speech acts such as requests made by statements such as “it would be nice to have some light”. There is a continuum of interpretations for such an utterance, ranging from a pure Assert act through to a pure Action-Directive act, depending on the annotator’s view of what the speaker intended and how the utterance was taken in the dialog. The DAMSL scheme alleviates this problem somewhat by not forcing an annotator to choose between the two options: they can mark an utterance as both acts. In practice, however, we still see a fair amount of inconsistency, and some more specific guidance appears to be needed.
This may have to be done on a domain-by-domain basis, however. For instance, in the TRAINS domain, the users often state their goals, as in “I have to get trains there by noon”. We have been taking these utterances simply as Asserts, but this is somewhat arbitrary, as there is a sense in which such utterances influence the hearer’s future action as with Action-Directives. Another difficult example in TRAINS occurs when the speaker summarizes a plan that has already been developed, as in:

    utt1: s: we’ll go through Corning
    utt2: u: mm-hm
    utt3: s: pick up the oranges, and unload at Dansville

If utt1 and utt3 are really just descriptions of what has been agreed upon, they would be Reasserts, but annotators often want to add an Action-Directive interpretation as well because of their surface form. Such cases may be resolved with domain-specific instruction, but it is unclear whether unambiguous generic instructions can be found.

Another problem with the Statement dimension is the label Reassert. When information is asserted that has been discussed previously, the annotators have to decide whether the information was forgotten by the hearer (and thus constitutes an Assert) or whether the speaker is trying to reintroduce the information to make a point (and hence it would be a Reassert). A similar confusion occurs with the Repeat-Rephrase tag of the Understanding dimension, where annotators have to decide how far back a Repeat-Rephrase utterance can refer and how close the paraphrase must be. Annotators also get confused if a speaker simultaneously makes a repetition and goes on to make a correction or completion. Some work needs to be done to clarify the definitions of these labels.

Another label that confuses annotators is the Task Management label of the Information Level dimension.
In TRAINS, the domain is planning, so an utterance such as “we can’t do that because there is a train already on that track” is Task level, but something like “we could do that another way. do you want to change the plan?” would be considered Task-Management since it explicitly discusses the course of the dialog, while the first only implicitly signals a possible change in the course of the dialog. The difference is very subtle and hard to annotate.

Conclusions

For the interpretation of a dialog, it is critical to have a primitive abstraction of the purpose of each utterance. The general strategies of a system trying to participate in a dialog or understand a dialog will be tied to these primitives. For example, a system might have a rule such as “a statement is something to add to the database”. The system will then use a more detailed representation of the utterance in its processing. As another example, if an utterance is an Information Request, the system will process the semantic interpretation of the sentence to determine what information is being asked for. As the system adds utterances to its data structures, it will create higher level forms such as hierarchical multi-agent plans and discourse structures analogous to paragraphs and chapters.

The representation driving the creation of such data structures needs to be extremely flexible. Speech act theory is currently the most popular representation used; however, it is a set of mutually exclusive categories and does not allow utterances to perform multiple actions simultaneously. Unfortunately, it is common in dialogs, especially problem solving dialogs, for an utterance to perform several actions such as signaling understanding and accepting a task. The DAMSL annotation scheme has many independent layers that allow the labeling of all these actions.
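The independence of the layers can be illustrated with a small record type that holds one optional label per dimension. This is only a sketch under stated assumptions: the class name `UtteranceAnnotation`, its field names, and the `active_functions` helper are hypothetical and are not part of DAMSL or its tools; only the label names come from the scheme described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UtteranceAnnotation:
    """Hypothetical container: one independent slot per DAMSL dimension."""
    text: str
    # Forward Communicative Functions
    statement: Optional[str] = None              # Assert, Reassert, Other-Statement
    influencing_addressee: Optional[str] = None  # Open-option, Info-Request, Action-Directive
    committing_speaker: Optional[str] = None     # Offer, Commit
    # Backward Communicative Functions
    agreement: Optional[str] = None              # Accept, Accept-Part, Maybe, ...
    understanding: Optional[str] = None          # Acknowledge, Repeat-Rephrase, ...
    answer: bool = False
    # Utterance Features
    info_level: Optional[str] = None             # Task, Task Management, ...

    def active_functions(self) -> List[str]:
        """List every communicative function this utterance is labeled with."""
        slots = [
            self.statement,
            self.influencing_addressee,
            self.committing_speaker,
            self.agreement,
            self.understanding,
        ]
        active = [label for label in slots if label is not None]
        if self.answer:
            active.append("Answer")
        return active


if __name__ == "__main__":
    # "okay" read as an acceptance: it acknowledges, accepts, and commits at once.
    okay = UtteranceAnnotation(
        text="okay",
        committing_speaker="Commit",
        agreement="Accept",
        understanding="Acknowledge",
        info_level="Task",
    )
    print(okay.active_functions())
```

A single-label scheme would need one slot for the whole utterance; here each dimension is filled or left empty on its own, which is the property the text attributes to DAMSL.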
The annotation scheme also separates utterances into those that deal with the communication process, those that deal with the task at hand, and those that deal with how to solve the task. This type of annotation is not typically seen in speech act theory, but it is critical to interpreting dialogs since the utterances at these levels must be processed using different strategies. Dealing with the communication process might mean repeating a previous utterance or changing the volume of the speech output. Utterances discussing how to solve the task can be viewed as direct messages to a system’s planner, such as “let’s solve this subgoal first” or “is that the best solution”.

The DAMSL annotation scheme makes reference to linguistic phenomena such as “check questions” and “acknowledgments by repetition”. A serious question is whether these phenomena can be defined precisely enough for humans to recognize them and annotate them reliably in a corpus of dialogs. A corpus reliably annotated with DAMSL labels would provide a valuable resource in the study of discourse as well as a source of training and testing for a dialog system using DAMSL labels in its utterance representation. The experiments in this paper show reliability results close to those considered usable for drawing scientific conclusions. Given that this is the first major test of DAMSL, it seems likely that the revisions mentioned in the Results section will allow reliable annotation with DAMSL.

Acknowledgments

This work was supported in part by National Science Foundation grants IRI-95-03312 and IRI-95-28998, the latter of which was under a subcontract from Columbia University. Thanks to Teresa Sikorski for her help in running the experiments and interpreting the results. Thanks also to Lenhart Schubert for his helpful comments, and George Ferguson for his annotation tool.

References

(All95) J. Allwood. An activity based approach to pragmatics. Technical report, Dept of Linguistics, Univ. of Goteborg, 1995.
(ASF+94) J. F. Allen, L. K. Schubert, G. M. Ferguson, P. A. Heeman, C. H. Hwang, T. Kato, M. N. Light, N. G. Martin, B. W. Miller, M. Poesio, and D. R. Traum. The TRAINS project: A case study in building a conversational planning agent. Technical Report 532, Department of Computer Science, University of Rochester, Rochester, NY 14627-0226, September 1994.

(Car96) J. Carletta. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, 22(2), 1996.

(CL90) Philip R. Cohen and Hector J. Levesque. Rational interaction as the basis for communication. In Philip R. Cohen, Jerry Morgan, and Martha E. Pollack, editors, Intentions in Communication, SDF Benchmark Series, pages 221-255. MIT Press, 1990.

(CS89) H. H. Clark and E. F. Schaefer. Contributing to discourse. Cognitive Science, 13:259-294, 1989.

(GAT93) D. Gross, J. Allen, and D. Traum. The TRAINS 91 dialogues. TRAINS Technical Note 92-1, Department of Computer Science, University of Rochester, Rochester, NY 14627-0226, 1993.

(HA95) P. Heeman and J. Allen. The TRAINS 93 dialogues. TRAINS Technical Note 94-2, Department of Computer Science, University of Rochester, Rochester, NY 14627-0226, 1995.

(Han79) M. Hancher. The classification of cooperative illocutionary acts. Language in Society, 8(1):1-14, 1979.

(MT87) W. C. Mann and S. A. Thompson. Rhetorical structure theory: a theory of text organization. Technical Report ISI/RS-87-190, Univ. of Southern CA - Information Sciences Institute, 1987.

(Sea75) J. R. Searle. Language, Mind, and Knowledge. Minnesota Studies in the Philosophy of Science, chapter A Taxonomy of Illocutionary Acts. University of Minnesota Press, 1975.

(SJ88) S. Siegel and N. J. Castellan Jr. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 1988.