Manny Rayner, Beth Ann Hockey,
Pierrette Bouillon
mrayner@riacs.edu,
bahockey@mail.arc.nasa.gov, pierrette.bouillon@issco.unige.ch
Regulus is a compiler, written in SICStus Prolog, that turns unification grammars written in a
Prolog-based feature-value notation into Nuance GSL grammars. It is open source
software, so you can use, modify and extend it in any way you like subject to
the restrictions of the LGPL license.
Regulus 1, the first version, was developed by Netdecisions Ltd and Fluency
Voice Technology Ltd at the Netdecisions Technology Centre.
Contents of this document:
· How to install and run Regulus
o Setting up the Regulus development environment
o Using the Regulus development environment
§ Parsing utterances in the Regulus development environment
§ Parsing non-top constituents in the Regulus development environment
§ Running the development environment with speech input
o Compiling Regulus to Nuance from the command line
o Calling the Regulus parser from Prolog
· Some simple Regulus grammars
o Toy0 - a minimal Regulus grammar
o Toy1 - a slightly larger Regulus grammar
· Building recognisers using grammar specialisation
o Building on top of the general English grammar
o Invoking the grammar specialiser
o Writing operationality definitions
o Defining multiple top-level specialised grammars
§ Ignoring subdomains to speed up compilation
o Handling ambiguity when constructing the treebank
o Making Regulus to Nuance compilation more efficient by ignoring features
o Including lexicon entries directly into the specialised grammar
§ Overriding include_lex declarations
§ Conditional include_lex declarations
o Creating class N-gram grammars from specialised grammars
o Files used for surface processing
· Using Regulus for generation
o Specialised generation grammars
· Using Regulus for translation
o Running translation in batch mode
o Calling translation from Prolog
§ Resolving conflicts between transfer rules
§ Bidirectional transfer rules and transfer lexicon entries
o Interlingua and interlingua declarations
o Interlingua structure grammars
o Using macros in translation rule files
o Using generation in translation
· Using Regulus for dialogue applications
o Examples of dialogue processing applications
o Running dialogue applications from the Regulus top-level
o Dialogue processing commands
o Regression testing for dialogue applications
o Using LF patterns for dialogue applications
· Adding intelligent help to Regulus applications
· Formal description of Regulus grammar notation
o Comments
o Labels
o Macros
§ macro
§ feature
§ category
§ feature_instantiation_schedule
§ feature_value_space_substitution
o Rules
§ RHS
§ Sequence
§ Optional
§ Category
§ List
§ Unary GSL function expression
§ Binary GSL function expression
§ Atomic syntactic feature value
§ Variable syntactic feature value
§ Disjunctive syntactic feature value
§ Conjunctive syntactic feature value
§ Negated syntactic feature value
o Interfacing the RegServer to a Prolog program
o Interfacing the RegServer to a Java program
o Sample RegServer applications
What is Regulus good for?
The main point of Regulus is to make it possible to write large, complex GSL grammars, which would be difficult to code by hand. You are most likely to find it useful if you want to build a user-initiative or mixed-initiative speech application on top of the Nuance platform.
So what does it do exactly?
You write your language model in unification grammar format. Regulus will then compile it down to a normal GSL grammar.
What's the difference between unification grammar and GSL?
You can think of a unification grammar as parameterised GSL. Try looking at some of the example grammars to get the flavor of it...
Why is unification grammar better than ordinary GSL?
Parameterised GSL is better than straight GSL for the usual reasons; it's more compact and general, and it's easier to write maintainable and modular grammars.
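For a first taste, here is a rule borrowed from the Toy1 example grammar (it reappears in the stepper session later in this document). One np rule, parameterised by the features singplur and sem_np_type, stands in for the whole family of unparameterised GSL rules that Regulus generates from it:
np:[sem=concat(Noun, Loc), singplur=N, sem_np_type=SemType] -->
   the,
   noun:[sem=Noun, singplur=N, sem_np_type=SemType],
   ?location_pp:[sem=Loc].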
Can Regulus produce any other kind of grammar formats?
The current version of Regulus can only produce Nuance GSL. Future versions may be able to produce other formats, for example ScanSoft or GrXML.
Do I need to know Prolog?
You need to know Prolog syntax and how to run Prolog programs. No knowledge of Prolog application programming is required.
Do I need to know linguistics?
You will probably find it easier to use Regulus
if you have some basic knowledge of feature grammars and how they are used by
linguists, but this is not strictly necessary if you are intending to build
your grammars from scratch. Try looking at some of the example grammars and see if they make sense to you.
If you want to use the grammar specialisation tools, some understanding of
linguistics and feature grammars is recommended.
How do I use Regulus as part of developing a speech application?
The most straightforward way to use Regulus is
to build a static GSL grammar. This grammar can then be used as part of any
normal Nuance application.
If you are building a Prolog-based application, you may find it convenient to
use the RegServer, a simple utility which
lets a Prolog program use a Regulus-based Nuance recognition package as though
it were a Prolog predicate.
What can you do in the development environment?
The real point of Regulus is to compile unification grammars into Nuance GSL grammars. Most people, however, find it easier to first debug their grammars in text mode, as unification grammars. There is a compiler which allows you to do this by converting the Regulus grammar into a set of left-corner parser tables. These tables can be loaded and run inside the Regulus development environment.
What's this stuff about grammar specialisation?
You may simply want to use Regulus to compile your own unification grammars
into Nuance GSL. Experience shows, however, that complex natural language
grammars tend to have a lot of common structure, since they ultimately have to
model general linguistic facts about English and other natural languages. There
are consequently good reasons for wanting to save effort by implementing a
SINGLE domain-independent core grammar, and producing domain-dependent versions
out of it using some kind of specialisation process.
Regulus includes an experimental system which attempts to deliver this
functionality. There is a general
unification grammar for English , containing about 180 rules, and an
accompanying core lexicon. For a given domain, you will need to supplement
these with a domain-specific lexicon that you will write yourself. You will
then be able to use the grammar
specialisation tools to transform a small training corpus into a
specialised version of the grammar.
1. Unpack the file Regulus.zip to an appropriate place. Set the environment variable $REGULUS to this place.
2. Make sure SICStus Prolog is installed on your system. Make sure that sicstus.exe is visible along your path. (I have C:\Program Files\SICStus Prolog\bin included in my path).
3. If you want to be able to give speech input to Regulus, do the following:
1. Make sure that /usr/bin (UNIX) or c:/cygwin/bin (Windows/Cygwin) is in your path.
2. Create a file called $REGULUS/scripts/run_license.bat, whose contents
are a single line invoking the Nuance License Manager. This will require
obtaining a license manager code from Nuance. A typical line would be something
like the following (the license code is not genuine):
nlm C:/Nuance/Vocalizer4.0/license.txt ntk12-1234-a-1234-a1bc12de1234
If you want to build and compile a Regulus grammar, the first step is to write a config file . The config file specifies the various files and parameters associated with your grammar. You can then start up the development environment as follows:
1. Start SICStus Prolog.
2.
Load the Regulus system code by
typing
:- ['$REGULUS/Prolog/load'].
at Prolog top-level.
3.
Start the Regulus top-loop with
the specified config file <Config> by typing
:- regulus('<Config>').
at Prolog top-level.
Note: It is often convenient to specify the pathnames in the config file using Prolog file_search_path declarations; in particular, this has been done for the examples provided with this release. If you are using file_search_path declarations, you must load them before loading the config file. For example, the PSA application keeps its file_search_path declarations in the file $REGULUS/Examples/PSA/scripts/library_declarations.pl. To start the Regulus development environment with the PSA example, you thus need to carry out the following sequence of commands:
1. Start SICStus Prolog.
2. :- ['$REGULUS/Prolog/load'].
3. :- ['$REGULUS/Examples/PSA/scripts/library_declarations'].
4. :- regulus('$REGULUS/Examples/PSA/scripts/psa.cfg').
Similar declaration files exist for the other example applications.
Once you are in the development environment, you can get a listing of all
the top-level Regulus commands by typing HELP:
>> HELP
(Print this message)
BATCH_DIALOGUE (Process dialogue corpus)
BATCH_DIALOGUE <Arg> (Process dialogue corpus with specified ID)
BATCH_DIALOGUE_SPEECH (Process dialogue speech corpus)
BATCH_DIALOGUE_SPEECH <Arg> (Process dialogue speech corpus with specified ID)
BATCH_DIALOGUE_SPEECH_AGAIN (Process dialogue speech corpus, using recognition results from previous run)
BATCH_DIALOGUE_SPEECH_AGAIN <Arg> (Process dialogue speech corpus with specified ID, using recognition results from previous run)
CHECK_ALTERF_PATTERNS (Check the consistency of the current Alterf patterns file)
COMPILE_ELLIPSIS_PATTERNS (Compile patterns used for ellipsis processing)
DIALOGUE (Do dialogue-style processing on input sentences)
DCG (Use DCG parser)
EBL (Do all EBL processing: equivalent to LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_NUANCE)
EBL_ANALYSIS (Do all EBL processing, except for creation of Nuance grammar: equivalent to LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS)
EBL_GEMINI (Compile current specialised Regulus grammar into Gemini form)
EBL_GENERATION (Do main generation EBL processing: equivalent to LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_LOAD_GENERATION)
EBL_GRAMMAR_PROBS (Create Nuance grammar probs training set from current EBL training set)
EBL_LOAD (Load current specialised Regulus grammar in DCG and left-corner form)
EBL_LOAD_GENERATION (Compile and load current specialised Regulus grammar for generation)
EBL_LOAD_GENERATION <Arg> (Compile and load designated version of current specialised Regulus grammar for generation)
EBL_NUANCE (Compile current specialised Regulus grammar into Nuance GSL form)
EBL_POSTPROCESS (Postprocess results of EBL training into specialised Regulus grammar)
EBL_TREEBANK (Parse all sentences in current EBL training set into treebank form)
EBL_TRAIN (Do EBL training on current treebank)
ECHO_ON (Echo input sentences (normally useful only in batch mode))
ECHO_OFF (Don't echo input sentences (default))
GEMINI (Compile current Regulus grammar into Gemini form)
GENERATION (Generate from parsed input sentences)
HELP (Print this message)
INIT_DIALOGUE (Initialise the dialogue state)
INTERLINGUA (Perform translation through interlingua)
LC (Use left-corner parser)
LINE_INFO_ON (Print line and file info for rules and lex entries in parse trees (default))
LINE_INFO_OFF (Don't print line and file info for rules and lex entries in parse trees)
LOAD (Load current Regulus grammar in DCG and left-corner form)
LOAD_TRANSLATE (Load translation-related files)
LOAD_GENERATION (Compile and load current generator grammar)
LOAD_GENERATION <Arg> (Compile and load current generator grammar, and store as designated subdomain grammar)
LOAD_SURFACE_PATTERNS (Load current surface patterns and associated files)
LOAD_DIALOGUE (Load dialogue-related files)
NO_INTERLINGUA (Perform translation directly, i.e. not through interlingua)
NORMAL_PROCESSING (Do normal processing on input sentences)
NOTRACE (Switch off tracing for DCG grammar)
NUANCE (Compile current Regulus grammar into Nuance GSL form)
NUANCE_COMPILE (Compile Nuance grammar into recogniser package)
SPLIT_SPEECH_CORPUS <GrammarName> <InCoverageId> <OutOfCoverageId> (Split speech corpus into in-coverage and out-of-coverage pieces with respect to the specified grammar)
STEPPER (Start grammar stepper)
SURFACE (Use surface pattern-matching parser)
TRACE (Switch on tracing for DCG grammar)
TRANSLATE (Do translation-style processing on input sentences)
TRANSLATE_TRACE_ON (Switch on translation tracing)
TRANSLATE_TRACE_OFF (Switch off translation tracing (default))
TRANSLATE_CORPUS (Process text translation corpus)
TRANSLATE_CORPUS <Arg> (Process text translation corpus with specified ID)
TRANSLATE_SPEECH_CORPUS (Process speech translation corpus)
TRANSLATE_SPEECH_CORPUS <Arg> (Process speech translation corpus with specified ID)
TRANSLATE_SPEECH_CORPUS_AGAIN (Process speech translation corpus, using recognition results from previous run)
TRANSLATE_SPEECH_CORPUS_AGAIN <Arg> (Process speech translation corpus with specified ID, using recognition results from previous run)
UPDATE_DIALOGUE_JUDGEMENTS (Update dialogue judgements file from annotated dialogue corpus output)
UPDATE_DIALOGUE_JUDGEMENTS <Arg> (Update dialogue judgements file with specified ID from annotated dialogue corpus output)
UPDATE_DIALOGUE_JUDGEMENTS_SPEECH (Update dialogue judgements file from annotated speech dialogue corpus output)
UPDATE_DIALOGUE_JUDGEMENTS_SPEECH <Arg> (Update dialogue judgements file with specified ID from annotated speech dialogue corpus output)
UPDATE_TRANSLATION_JUDGEMENTS (Update translation judgements from annotated translation corpus output)
UPDATE_TRANSLATION_JUDGEMENTS <Arg> (Update translation judgements file from annotated translation corpus output with specified ID)
UPDATE_TRANSLATION_JUDGEMENTS_SPEECH (Update translation judgements file from annotated speech translation corpus output)
UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg> (Update translation judgements file from annotated speech translation corpus output with specified ID)
UPDATE_RECOGNITION_JUDGEMENTS (Update recognition judgements file from temporary translation corpus recognition judgements)
UPDATE_RECOGNITION_JUDGEMENTS <Arg> (Update recognition judgements file from temporary translation corpus recognition judgements with specified ID)
The meanings of these commands are defined below.
Print the help message
Load the current Regulus grammar. You need to do this first to be able to do
parsing or training.
The current Regulus grammar is defined by the regulus_grammar
config file entry.
Use DCG parser. The grammar can be parsed using either the left-corner parser (the default) or the DCG parser. The left-corner parser is faster, but the DCG parser can be useful for debugging. In particular, it can be used to parse non-top constituents; the left-corner parser lacks this capability.
Use left-corner parser. You can use this command to restore normal parsing after using the DCG parser for debugging.
Turn on normal processing, i.e. not translation mode processing or generation mode processing.
Compile current Regulus grammar into Nuance GSL form. You won't be able to
use this command in conjunction with the large general grammar, since it
currently runs out of memory during compilation - this is why we need EBL. The
NUANCE command is useful for smaller Regulus grammars, e.g. the original
Medical SLT and House grammars.
The current Regulus grammar is defined by the regulus_grammar
config file entry.
The location of the generated Nuance grammar is defined by the nuance_grammar config file entry.
Compile current Regulus grammar into Gemini form.
The current Regulus grammar is defined by the regulus_grammar
config file entry.
The base name of the Gemini grammar <Gemini> is defined by the gemini_grammar config file entry.
Four files are created, called respectively <Gemini>.syn, <Gemini>.sem,
<Gemini>.features and <Gemini>.lex. Regulus semantics is translated
into Gemini semantics in a straightforward way, so that Nuance functions simply
become Prolog functors.
Switch on Prolog tracing for the predicates in the DCG grammar representing categories. Occasionally useful.
Switch off Prolog tracing for DCG grammar
Do translation-style processing on input sentences. In this mode, the sentence is parsed using the current parser. If any parses are found, the first one is processed through translation and generation. Translation is performed using interlingual rules if the INTERLINGUA command has been applied, otherwise using direct transfer.
Do dialogue-style processing on input sentences. In this mode, the sentence is parsed using the current parser. If any parses are found, the first one is processed through the code defined by the dialogue_files config file entry.
Make translation processing go through interlingua. This applies both to interactive processing when the TRANSLATE command is in effect, and to batch processing using the commands TRANSLATE_CORPUS, TRANSLATE_SPEECH_CORPUS and TRANSLATE_SPEECH_CORPUS_AGAIN.
Make translation processing use direct transfer. This applies both to interactive processing when the TRANSLATE command is in effect, and to batch processing using the commands TRANSLATE_CORPUS, TRANSLATE_SPEECH_CORPUS and TRANSLATE_SPEECH_CORPUS_AGAIN.
Load all translation-related files defined in the currently valid config file. These consist of a subset of the following:
· One or more transfer rules files (optional) defined by the transfer_rules config file entry.
· An interlingua declarations file (optional) defined by the interlingua_declarations config file entry.
· One or more to_interlingua rules files (optional) defined by the to_interlingua_rules config file entry.
· One or more from_interlingua rules files (optional) defined by the from_interlingua_rules config file entry.
· An ellipsis classes file (optional) defined by the ellipsis_classes config file entry. If this is defined, you need to compile it first using the COMPILE_ELLIPSIS_PATTERNS command.
· A generation grammar file (required) defined by the generation_rules config file entry. This should be the compiled form of a Regulus grammar for the target language. The compiled generation grammar must first be created using the LOAD_GENERATION command.
· A generation preferences file (optional) defined by the generation_preferences config file entry.
· A collocations file (optional) defined by the collocation_rules config file entry.
· An orthography rules file (optional) defined by the orthography_rules config file entry.
If the config file entries wavfile_directory and wavfile_recording_script are defined, implying that output speech will be produced using recorded wavfiles, this command also produces a new version of the file defined by wavfile_recording_script.
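As an illustrative sketch, a translation-oriented config file might contain declarations like the following. The entry names are the ones listed above; the file names and the myapp_prolog/myapp_runtime path aliases are hypothetical:
regulus_config(transfer_rules, [myapp_prolog(eng_to_fre_transfer_rules)]).
regulus_config(ellipsis_classes, myapp_prolog(eng_ellipsis_classes)).
regulus_config(generation_rules, myapp_runtime(fre_generation_grammar)).
regulus_config(generation_preferences, myapp_prolog(fre_generation_preferences)).
regulus_config(orthography_rules, myapp_prolog(fre_orthography_rules)).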
Compile the patterns used for ellipsis processing, which are defined by the ellipsis_classes config file entry. The compiled patterns will be loaded next time you invoke LOAD_TRANSLATE.
Process the default text mode translation corpus, defined by the translation_corpus config file entry. The output file, defined by the translation_corpus_results config file entry, contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the translation judgements file (defined by the translation_corpus_judgements config file entry) using the command UPDATE_TRANSLATION_JUDGEMENTS.
Parameterised version of TRANSLATE_CORPUS. Process the text mode translation corpus with ID <Arg>, defined by the parameterised config file entry translation_corpus(<Arg>). The output file, defined by the parameterised config file entry translation_corpus_results(<Arg>), contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the translation judgements file (defined by the translation_corpus_judgements config file entry) using the parameterised command UPDATE_TRANSLATION_JUDGEMENTS <Arg>.
Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the default text translation corpus output file, defined by the translation_corpus_results config file entry. This command should be used after editing the output file produced by the TRANSLATE_CORPUS command. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.
Parameterised version of UPDATE_TRANSLATION_JUDGEMENTS. Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the text translation corpus output file with ID <Arg>, defined by the parameterised config file entry translation_corpus_results(<Arg>). This command should be used after editing the output file produced by the parameterised command TRANSLATE_CORPUS <Arg>. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.
Process speech mode translation corpus, defined by the translation_speech_corpus config file entry. The output file, defined by the translation_speech_corpus_results config file entry, contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the stored translation judgements file using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH. A second output file, defined by the translation_corpus_tmp_recognition_judgements config file entry, contains "blank" recognition judgements: here, the question marks should be replaced with either 'y' (acceptable recognition), or 'n' (unacceptable recognition). Recognition judgements can be updated using the UPDATE_RECOGNITION_JUDGEMENTS command.
Parameterised version of TRANSLATE_SPEECH_CORPUS. Process the speech mode translation corpus with ID <Arg>, defined by the translation_speech_corpus(<Arg>) config file entry. The output file, defined by the translation_speech_corpus_results(<Arg>) config file entry, contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the stored translation judgements file using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg>. A second output file, defined by the translation_corpus_tmp_recognition_judgements(<Arg>) config file entry, contains "blank" recognition judgements: here, the question marks should be replaced with either 'y' (acceptable recognition), or 'n' (unacceptable recognition). Recognition judgements can be updated using the UPDATE_RECOGNITION_JUDGEMENTS <Arg> command.
Process speech mode translation corpus, starting from the results saved from the most recent invocation of the TRANSLATE_SPEECH_CORPUS command. This is useful if you are testing speech translation performance, but have only changed the translation or generation files. The output files are the same as for the TRANSLATE_SPEECH_CORPUS command.
Parameterised version of TRANSLATE_SPEECH_CORPUS_AGAIN. Process speech mode translation corpus, starting from the results saved from the most recent invocation of the TRANSLATE_SPEECH_CORPUS <Arg> command. This is useful if you are testing speech translation performance, but have only changed the translation or generation files. The output files are the same as for the TRANSLATE_SPEECH_CORPUS <Arg> command.
Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the speech translation corpus output file, defined by the translation_speech_corpus_results config file entry. This command should be used after editing the output file produced by the TRANSLATE_SPEECH_CORPUS or TRANSLATE_SPEECH_CORPUS_AGAIN command. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.
Parameterised version of UPDATE_TRANSLATION_JUDGEMENTS_SPEECH. Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the speech translation corpus output file, defined by the translation_speech_corpus_results(<Arg>) config file entry. This command should be used after editing the output file produced by the TRANSLATE_SPEECH_CORPUS <Arg> or TRANSLATE_SPEECH_CORPUS_AGAIN <Arg> command. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.
Update recognition judgements file, defined by the translation_corpus_recognition_judgements config file entry, from the temporary translation corpus recognition judgements file, defined by the translation_corpus_tmp_recognition_judgements config file entry and produced by the TRANSLATE_SPEECH_CORPUS or TRANSLATE_SPEECH_CORPUS_AGAIN commands. This command should be used after editing the temporary translation corpus recognition judgements file. Editing should replace question marks by valid judgements, currently 'y' or 'n'.
Parameterised version of UPDATE_RECOGNITION_JUDGEMENTS. Update recognition judgements file, defined by the translation_corpus_recognition_judgements config file entry, from the temporary translation corpus recognition judgements file, defined by the translation_corpus_tmp_recognition_judgements(<Arg>) config file entry and produced by the TRANSLATE_SPEECH_CORPUS <Arg> or TRANSLATE_SPEECH_CORPUS_AGAIN <Arg> commands. This command should be used after editing the temporary translation corpus recognition judgements file. Editing should replace question marks by valid judgements, currently 'y' or 'n'.
Splits the speech translation corpus file, defined by the translation_speech_corpus config file
entry, into an in-coverage part defined by a translation_speech_corpus(<InCoverageId>)
config file entry, and an out-of-coverage part defined by a translation_speech_corpus(<OutOfCoverageId>)
config file entry. Coverage is with respect to the top-level grammar
<GrammarName>, which must be loaded.
Typical call:
SPLIT_SPEECH_CORPUS .MAIN in_coverage out_of_coverage
Compile and load the current generation grammar, defined by the regulus_grammar or generation_regulus_grammar config file entry. The resulting compiled generation grammar is placed in the file defined by the generation_grammar config file entry.
Compile and load the current generation grammar, defined by the generation_grammar config file entry. The resulting compiled generation grammar is placed in the file defined by the generation_grammar(<Arg>) config file entry. This can be useful if you are normally using grammar specialisation to build the generation grammar.
Run the system in "generation mode". Each
input sentence is analysed. If any parses are found, the first one is generated
back using the currently loaded generation grammar, showing all possible
generated strings. This is normally used for debugging the generation grammar.
ECHO_ON
Echo utterances at top-level. This is often useful when running the system in batch mode.
ECHO_OFF
Don't echo utterances at top-level (default).
LINE_INFO_ON
Print line and file info for rules and lex entries in parse trees (default). A
typical parse tree will look like this:
.MAIN [TOY1_RULES:1-5]
utterance [TOY1_RULES:6-10]
command [TOY1_RULES:11-15]
/  verb lex(switch) [TOY1_LEXICON:7-9]
|  onoff null lex(on) [TOY1_LEXICON:23-24]
|  np [TOY1_RULES:26-30]
|  /  lex(the)
|  |  noun lex(light) [TOY1_LEXICON:15-16]
|  |  location_pp [TOY1_RULES:31-34]
|  |  /  lex(in)
|  |  |  np [TOY1_RULES:26-30]
|  |  |  /  lex(the)
|  |  |  |  noun lex(kitchen) [TOY1_LEXICON:20-21]
\  \  \  \  null
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
LINE_INFO_OFF
Don't print line and file info for rules and lex entries in parse trees. A
typical parse tree will look like this:
.MAIN
utterance
command
/  verb lex(switch)
|  onoff null lex(on)
|  np
|  /  lex(the)
|  |  noun lex(light)
|  |  location_pp
|  |  /  lex(in)
|  |  |  np
|  |  |  /  lex(the)
|  |  |  |  noun lex(kitchen)
\  \  \  \  null
Load the current surface patterns and associated files. You can then parse in surface mode using the SURFACE command. The following config file entries must be defined:
Parse utterances using the surface parser. This assumes that surface pattern files have been loaded, using the LOAD_SURFACE_PATTERNS command.
Initialise the dialogue state when running in dialogue mode.
Parse all sentences in current EBL training set, defined by the ebl_corpus config file entry, into treebank form. Sentences that fail to parse are printed out with warning messages, and a summary statistic is produced at the end of the run. This is very useful for checking where you are with coverage.
Do EBL training on current treebank. You need to build the treebank first using the EBL_TREEBANK command.
Postprocess results of EBL training into a specialised Regulus grammar. You need to create these results first using the EBL_TRAIN command.
Load current specialised Regulus grammar in DCG and left-corner form. Same as the LOAD command, but for the specialised grammar. The specialised grammar needs to be created using the EBL_TREEBANK, EBL_TRAIN and EBL_POSTPROCESS commands.
Compile and load the current specialised generation grammar. This will be the file <prefix>_specialised_no_binarise_default.regulus, where <prefix> is the value of the config file entry working_file_prefix. The resulting compiled generation grammar is placed in the file defined by the generation_rules config file entry. Note that EBL_LOAD_GENERATION places the compiled generation grammar in the same place as LOAD_GENERATION.
Parameterised version of EBL_LOAD_GENERATION. Compile and load the specialised generation grammar for the subdomain tag <SubdomainTag>. This will be the file <prefix>_specialised_no_binarise_<SubdomainTag>.regulus, where <prefix> is the value of the config file entry working_file_prefix. The resulting compiled generation grammar is placed in the file defined by the generation_grammar(<SubdomainTag>) config file entry. Note that EBL_LOAD_GENERATION <SubdomainTag> places the compiled generation grammar in the same place as LOAD_GENERATION <SubdomainTag>.
Compile current specialised Regulus grammar into Nuance GSL form. Same as the NUANCE command, but for the specialised grammar. The input is the file created by the EBL_POSTPROCESS command; the output Nuance GSL grammar is placed in the file defined by the ebl_nuance_grammar config file entry.
Compile current specialised Regulus grammar into Gemini form. Same as the GEMINI command, but for the specialised grammar. The base name of the Gemini files produced is defined by the ebl_gemini_grammar config file entry.
Do all EBL processing, except for creation of Nuance grammar: equivalent to the sequence LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS
Do all EBL processing: equivalent to the sequence LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_NUANCE
Do all EBL processing for generation:
equivalent to the sequence LOAD, EBL_TREEBANK,
EBL_TRAIN, EBL_POSTPROCESS,
EBL_LOAD_GENERATION
EBL_GRAMMAR_PROBS
Convert the current EBL training set, defined by the ebl_corpus config file entry, into a form that can be
used as training data by the Nuance compute-grammar-probs utility. The output
training data is placed in the file defined by the ebl_grammar_probs
config file entry.
CHECK_ALTERF_PATTERNS
Check the consistency of the current Alterf patterns
file, defined by the alterf_patterns_file
config file entry.
NUANCE_COMPILE
Compile the generated Nuance grammar, defined by the nuance_grammar
config file entry, into a recognition package with the same name. This will be
done using the Nuance language pack defined by the nuance_language_pack config
file entry and the extra parameters defined by the nuance_compile_params config
file entry. Typical values for these parameters are as follows:
regulus_config(nuance_language_pack, 'English.America').
regulus_config(nuance_compile_params, ['-auto_pron', '-dont_flatten']).
BATCH_DIALOGUE
Process the default dialogue mode development corpus, defined by the dialogue_corpus
config file entry. The output file, defined by the dialogue_corpus_results
config file entry, contains question marks for dialogue processing steps that
have not yet been judged. If these are replaced by valid judgements, currently
'good', or 'bad', the new judgements can be incorporated into the dialogue
judgements file (defined by the dialogue_corpus_judgements config file entry)
using the command UPDATE_DIALOGUE_JUDGEMENTS.
BATCH_DIALOGUE <Arg>
Parameterised version of BATCH_DIALOGUE.
Process the dialogue mode development corpus with ID <Arg>, defined by the dialogue_corpus(<Arg>) config file entry. The output file,
defined by the dialogue_corpus_results(<Arg>)
config file entry, contains question marks for dialogue processing steps that
have not yet been judged. If these are replaced by valid judgements, currently
'good', or 'bad', the new judgements can be incorporated into the dialogue
judgements file (defined by the dialogue_corpus_judgements config file entry)
using the command UPDATE_DIALOGUE_JUDGEMENTS
<Arg>.
BATCH_DIALOGUE_SPEECH
Speech mode version of BATCH_DIALOGUE.
Process the default dialogue mode speech corpus, defined by the dialogue_speech_corpus
config file entry. The output file, defined by the dialogue_speech_corpus_results
config file entry, contains question marks for dialogue processing steps that
have not yet been judged. If these are replaced by valid judgements, currently
'good', or 'bad', the new judgements can be incorporated into the dialogue
judgements file (defined by the dialogue_corpus_judgements config file entry)
using the command UPDATE_DIALOGUE_JUDGEMENTS_SPEECH.
BATCH_DIALOGUE_SPEECH <Arg>
Parameterised speech mode version of BATCH_DIALOGUE. Process the dialogue
mode speech corpus with ID <Arg>, defined by the dialogue_speech_corpus(<Arg>)
config file entry. The output file, defined by the dialogue_speech_corpus_results(<Arg>) config file entry, contains question
marks for dialogue processing steps that have not yet been judged. If these are
replaced by valid judgements, currently 'good', or 'bad', the new judgements
can be incorporated into the dialogue judgements file (defined by the dialogue_corpus_judgements
config file entry) using the command UPDATE_DIALOGUE_JUDGEMENTS_SPEECH
<Arg>.
BATCH_DIALOGUE_SPEECH_AGAIN
Version of BATCH_DIALOGUE_SPEECH that
skips the speech recognition stage, and instead uses stored results from the
previous run.
BATCH_DIALOGUE_SPEECH_AGAIN <Arg>
Version of BATCH_DIALOGUE_SPEECH
<Arg> that skips the speech recognition stage, and instead uses
stored results from the previous run.
Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the default text dialogue corpus output file, defined by the dialogue_corpus_results config file entry. This command should be used after editing the output file produced by the BATCH_DIALOGUE command. Editing should replace question marks by valid judgements, currently 'good', or 'bad'.
Parameterised version of UPDATE_DIALOGUE_JUDGEMENTS. Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the dialogue corpus output file with ID <Arg>, defined by the parameterised config file entry dialogue_corpus_results(<Arg>). This command should be used after editing the output file produced by the parameterised command BATCH_DIALOGUE <Arg>. Editing should replace question marks by valid judgements, currently 'good' or 'bad'.
Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the default speech dialogue corpus output file, defined by the dialogue_speech_corpus_results config file entry. This command should be used after editing the output file produced by the BATCH_DIALOGUE_SPEECH command. Editing should replace question marks by valid judgements, currently 'good' or 'bad'.
Parameterised version of UPDATE_DIALOGUE_JUDGEMENTS_SPEECH. Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the speech dialogue corpus output file with ID <Arg>, defined by the parameterised config file entry dialogue_speech_corpus_results(<Arg>). This command should be used after editing the output file produced by the parameterised command BATCH_DIALOGUE_SPEECH <Arg>. Editing should replace question marks by valid judgements, currently 'good' or 'bad'.
You can parse full utterances by typing them in at Regulus top-level. If
parsing is successful, Regulus returns the output logical form and the parse
tree, e.g.
>> switch on the light in the kitchen
(Parsing with left-corner parser)
Analysis time: 0.00 seconds
Return value:
[[utterance_type,command],[action,switch],[onoff,on],[device,light],[location,kitchen]]
Global value: []
Syn features: []
Parse tree:
.MAIN [TOY1_RULES:1-5]
utterance [TOY1_RULES:6-10]
command [TOY1_RULES:11-15]
/  verb lex(switch) [TOY1_LEXICON:7-9]
|  onoff null lex(on) [TOY1_LEXICON:23-24]
|  np [TOY1_RULES:26-30]
|  /  lex(the)
|  |  noun lex(light) [TOY1_LEXICON:15-16]
|  |  location_pp [TOY1_RULES:31-34]
|  |  /  lex(in)
|  |  |  np [TOY1_RULES:26-30]
|  |  |  /  lex(the)
|  |  |  |  noun lex(kitchen) [TOY1_LEXICON:20-21]
\  \  \  \  null
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
The formatting of the parse tree can be controlled using the LINE_INFO_ON and LINE_INFO_OFF
commands.
For small grammars, you can run in DCG mode, using the DCG command.
When the DCG parser is being used, you can also parse non-top constituents
using the command syntax
>> <NameOfConstituent>: <Words>
so for example
>> DCG
(Use DCG parser)
>> np: the light in the kitchen
(Parsing with DCG parser)
Analysis time: 0.00 seconds
Return value:
[[device,light],[location,kitchen]]
Global value: []
Syn features:
[sem_np_type=switchable\/dimmable,singplur=sing]
Parse tree:
np [TOY1_RULES:26-30]
/  lex(the)
|  noun lex(light) [TOY1_LEXICON:15-16]
|  location_pp [TOY1_RULES:31-34]
|  /  lex(in)
|  |  np [TOY1_RULES:26-30]
|  |  /  lex(the)
|  |  |  noun lex(kitchen) [TOY1_LEXICON:20-21]
\  \  \  null
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
The DCG mode is not suitable for large grammars, and is often inconvenient even for small ones. A better way to debug grammars is to use the grammar stepper, which is invoked using the STEPPER command. This enters a special version of the top loop, with its own set of commands. The basic functionality of the stepper is to support manipulation of parse trees. Trees can be created, examined, cut and pasted. It is usually quite easy to find feature bugs by carrying out a short sequence of these operations.
The stepper commands are as follows:
HELP
Print help message describing commands
LEX WordOrWords
Add item for WordOrWords, e.g. 'LEX fan' or 'LEX living room'
GAP
Add item for gap expression
PARSE <WordOrWords>
Add item formed by parsing <WordOrWords>, e.g. 'PARSE switch on the light'
COMBINE <IDOrIDs>
Combine items into a new item, e.g. 'COMBINE 1' or 'COMBINE 1 3'
CUT <ID> <Node>
Cut item <ID> at <Node>, e.g. 'CUT 2 3'
JOIN <ID1> <Node> <ID2>
Attach item <ID2> under <Node> of <ID1>, e.g. 'JOIN 1 15 4'
JOIN <ID1> <ID2>
Attach item <ID2> under <ID1>, e.g. 'JOIN 1 4'
SHOW <ID>
Show item <ID>, e.g. 'SHOW 1'
SHOW <ID> <Node>
Show material under <Node> of item <ID>, e.g. 'SHOW 1 15'
RULE <ID> <Node>
Show rule at <Node> of item <ID>, e.g. 'RULE 1 15'
DELETE <IDOrIDs>
Delete item <IDorIDs>, e.g. 'DELETE 1' or 'DELETE 1 2'
DELETE_ALL
Delete all items
SUMMARY
Print summary for each item
EXIT
Leave stepper
The following annotated session using the Toy1 grammar illustrates use of the stepper.
Load Regulus environment
| ?- ['$REGULUS/Prolog/load'].
<snip>
Start Regulus with Toy1 grammar
| ?- regulus('$REGULUS/Examples/Toy1/scripts/toy1.cfg').
Loading settings from Regulus config file c:/cygwin/home/speech/regulus/examples/toy1/scripts/toy1.cfg
Loading settings from Regulus config file c:/cygwin/home/speech/regulus/examples/toy1/scripts/file_search_paths.cfg
>> LOAD
<snip>
Enter stepper loop. Note that the prompt changes.
>> STEPPER
(Start grammar stepper)
Print help message.
STEPPER>> HELP
Available stepper commands:
HELP - print this message
LEX WordOrWords - add item for WordOrWords, e.g. 'LEX pain' or 'LEX bright light'
GAP - add item for gap expression
PARSE WordOrWords - add item formed by parsing WordOrWords, e.g. 'PARSE where is the pain'
COMBINE IDOrIDs - combine items into a new item, e.g. 'COMBINE 1' or 'COMBINE 1 3'
CUT ID Node - cut item ID at Node, e.g. 'CUT 2 3'
JOIN ID1 Node ID2 - attach item ID2 under Node of ID1, e.g. 'JOIN 1 15 4'
JOIN ID1 ID2 - attach item ID2 under ID1, e.g. 'JOIN 1 4'
SHOW ID - show item ID, e.g. 'SHOW 1'
SHOW ID Node - show material under Node of item ID, e.g. 'SHOW 1 15'
DELETE IDOrIDs - delete item ID or IDs, e.g. 'DELETE 1' or 'DELETE 1 2'
DELETE_ALL - delete all items
SUMMARY - print summary for each item
EXIT - leave stepper
Parse a sentence.
STEPPER>> PARSE switch on the light
Added item 1:
.MAIN-->switch,on,the,light
Look at the resulting item.
STEPPER>> SHOW 1
Form: .MAIN-->switch,on,the,light
Sem:
concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],[[device,light]])))
Feats: []
Tree:
.MAIN (node 1) [TOY1_RULES:1-5]
utterance (node 2) [TOY1_RULES:6-10]
command (node 3) [TOY1_RULES:11-15]
/  verb lex(switch) (node 4) [TOY1_LEXICON:7-9]
|  onoff lex(on) (node 5) [TOY1_LEXICON:23-24]
|  np (node 6) [TOY1_RULES:26-30]
|  /  lex(the)
\  \  noun lex(light) (node 7) [TOY1_LEXICON:15-16]
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
We can look at constituents inside the item. Here, we
examine node 6:
STEPPER>> SHOW 1 6
Form: np-->the,light
Sem: [[device,light]]
Feats:
[sem_np_type=switchable\/dimmable,singplur=sing]
Tree:
np (node 1) [TOY1_RULES:26-30]
/  lex(the)
\  noun lex(light) (node 2) [TOY1_LEXICON:15-16]
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
We can also print the rule at this node:
STEPPER>> RULE 1 6
np:[sem=concat(Noun, Loc), singplur=N, sem_np_type=SemType] -->
   the,
   noun:[sem=Noun, singplur=N, sem_np_type=SemType],
   ?location_pp:[sem=Loc].
We want to find out why the sentence "switch on the
kitchen" doesn't parse. We try to cut and paste a tree for it. First, we
create a tree containing the NP "the kitchen":
STEPPER>> PARSE switch on the light in the kitchen
Added item 2:
.MAIN-->switch,on,the,light,in,the,kitchen
STEPPER>> SHOW 2
Form: .MAIN-->switch,on,the,light,in,the,kitchen
Sem:
concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],concat([[device,light]],[[location,kitchen]]))))
Feats: []
Tree:
.MAIN (node 1) [TOY1_RULES:1-5]
utterance (node 2) [TOY1_RULES:6-10]
command (node 3) [TOY1_RULES:11-15]
/  verb lex(switch) (node 4) [TOY1_LEXICON:7-9]
|  onoff lex(on) (node 5) [TOY1_LEXICON:23-24]
|  np (node 6) [TOY1_RULES:26-30]
|  /  lex(the)
|  |  noun lex(light) (node 7) [TOY1_LEXICON:15-16]
|  |  location_pp (node 8) [TOY1_RULES:31-34]
|  |  /  lex(in)
|  |  |  np (node 9) [TOY1_RULES:26-30]
|  |  |  /  lex(the)
\  \  \  \  noun lex(kitchen) (node 10) [TOY1_LEXICON:20-21]
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
We cut at node 9, to create two pieces.
STEPPER>> CUT 2 9
Added item 3:
.MAIN-->switch,on,the,light,in,np
Added item 4:
np-->the,kitchen
Item 3 has the missing node marked as "cut".
STEPPER>> SHOW 3
Form: .MAIN-->switch,on,the,light,in,np
Sem:
concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],concat([[device,light]],(Sem for node 9)))))
Feats: []
Tree:
.MAIN (node 1) [TOY1_RULES:1-5]
utterance (node 2) [TOY1_RULES:6-10]
command (node 3) [TOY1_RULES:11-15]
/  verb lex(switch) (node 4) [TOY1_LEXICON:7-9]
|  onoff lex(on) (node 5) [TOY1_LEXICON:23-24]
|  np (node 6) [TOY1_RULES:26-30]
|  /  lex(the)
|  |  noun lex(light) (node 7) [TOY1_LEXICON:15-16]
|  |  location_pp (node 8) [TOY1_RULES:31-34]
|  |  /  lex(in)
\  \  \  np (node 9) *cut*
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Item 4 is the one we want.
STEPPER>> SHOW 4
Form: np-->the,kitchen
Sem: [[location,kitchen]]
Feats:
[sem_np_type=location,singplur=sing]
Tree:
np (node 1)
[TOY1_RULES:26-30]
/ lex(the)
\ noun lex(kitchen) (node 2)
[TOY1_LEXICON:20-21]
-------------------------------
FILES -------------------------------
TOY1_LEXICON:
c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Use the SUMMARY command to see what we have available.
STEPPER>> SUMMARY
1: .MAIN-->switch,on,the,light
2: .MAIN-->switch,on,the,light,in,the,kitchen
3: .MAIN-->switch,on,the,light,in,np
4: np-->the,kitchen
Take another look at item 1
STEPPER>> SHOW 1
Form: .MAIN-->switch,on,the,light
Sem:
concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],[[device,light]])))
Feats: []
Tree:
.MAIN (node 1) [TOY1_RULES:1-5]
utterance (node 2) [TOY1_RULES:6-10]
command (node 3) [TOY1_RULES:11-15]
/  verb lex(switch) (node 4) [TOY1_LEXICON:7-9]
|  onoff lex(on) (node 5) [TOY1_LEXICON:23-24]
|  np (node 6) [TOY1_RULES:26-30]
|  /  lex(the)
\  \  noun lex(light) (node 7) [TOY1_LEXICON:15-16]
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Cut out the NP.
STEPPER>> CUT 1 6
Added item 5:
.MAIN-->switch,on,np
Added item 6:
np-->the,light
We are going to try and paste together item 5 and item 4.
Take a look at them:
STEPPER>> SHOW 5
Form: .MAIN-->switch,on,np
Sem:
concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],(Sem for node 6))))
Feats: []
Tree:
.MAIN (node 1) [TOY1_RULES:1-5]
utterance (node 2) [TOY1_RULES:6-10]
command (node 3) [TOY1_RULES:11-15]
/  verb lex(switch) (node 4) [TOY1_LEXICON:7-9]
|  onoff lex(on) (node 5) [TOY1_LEXICON:23-24]
\  np (node 6) *cut*
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
STEPPER>> SHOW 4
Form: np-->the,kitchen
Sem: [[location,kitchen]]
Feats:
[sem_np_type=location,singplur=sing]
Tree:
np (node 1) [TOY1_RULES:26-30]
/  lex(the)
\  noun lex(kitchen) (node 2) [TOY1_LEXICON:20-21]
------------------------------- FILES -------------------------------
TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Now try and join them together. Item 4 is supposed to fit
into the *cut* node in item 5.
STEPPER>> JOIN 5 4
Incompatible syntactic feats in categories:
np:[sem_np_type=switchable,singplur=A]
np:[sem_np_type=location,singplur=sing]
Feature clash: sem_np_type=switchable, sem_np_type=location
*** Error processing stepper command: "JOIN 5 4"
It didn't work, and we can see why: the sem_np_type
features don't match.
We can also build items bottom-up, out of lexical
entries. This is usually less efficient, but can be necessary if there is no
way to cut and paste.
Make an item for the lexical entry "light":
STEPPER>> LEX light
Added item 7:
noun-->light
Find a rule that can dominate item 7, and apply it. If there
are several such rules, the stepper will present a menu.
STEPPER>> COMBINE 7
Using rule between lines 26 and 30 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Added item 8:
np-->the,light
Same for "the living room".
STEPPER>> LEX living room
Added item 9:
noun-->living,room
Make it into an NP.
STEPPER>> COMBINE 9
Using rule between lines 26 and 30 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Added item 10:
np-->the,living,room
Make that into a location_pp.
STEPPER>> COMBINE 10
Using rule between lines 31 and 34 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Added item 11:
location_pp-->in,the,living,room
We can combine this with item 7 to make the NP "the
light in the living room".
STEPPER>> COMBINE 7 11
Using rule between lines 26 and 30 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus
Added item 12:
np-->the,light,in,the,living,room
… and if we want, we can successfully paste it into the
cut in item 5, to make "switch on the light in the living room".
STEPPER>> JOIN 5 12
Added item 13:
.MAIN-->switch,on,the,light,in,the,living,room
The config file specifies the various files and parameters referred to by a
Regulus application. Each config item is defined by a declaration of the form
regulus_config(<ConfigItem>, <Value>).
You can include Prolog file_search_path
declarations directly in the config file, using the syntax
file_search_path(<Name>, <Value>).
You can also allow one config file to load information from another one, using
the syntax
include(<Pathname>).
Recursive includes are permitted.
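Putting the three kinds of declaration together, a sketch of a config file might look like the following (the paths and the myapp_runtime alias are hypothetical; the pattern follows the Toy0 example below):
file_search_path(myapp_runtime, '$REGULUS/Examples/MyApp/Generated').
include('$REGULUS/Examples/MyApp/scripts/shared_settings.cfg').
regulus_config(top_level_cat, '.MAIN').
regulus_config(nuance_grammar, myapp_runtime(recogniser)).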
The full set of possible config items is listed immediately below. For
most applications, you will not need to specify more than a small fraction of
the items in this set. For example, the config file for the Toy0
application is as follows:
regulus_config(regulus_grammar, [domain_specific_regulus_grammars(toy0)]).
regulus_config(top_level_cat, '.MAIN').
regulus_config(nuance_grammar, toy0_runtime(recogniser)).
regulus_config(working_file_prefix, toy0_runtime(toy0)).
This config file says that the Regulus grammar consists of the single
file domain_specific_regulus_grammars(toy0),
that its top-level category is .MAIN,
that the generated Nuance grammar is to be placed in the location toy0_runtime(recogniser), and that
working files are to be placed in the location toy0_runtime(toy0).
Note that pathnames are specified here using Prolog file_search_path declarations.
· ebl_regulus_component_grammar
· generation_incremental_deepening_parameters
· translation_corpus_judgements
· translation_corpus_recognition_judgements
· translation_corpus_results(<Arg>)
· translation_corpus_tmp_recognition_judgements
· translation_corpus_tmp_recognition_judgements(<Arg>)
· translation_speech_corpus(<Arg>)
· translation_speech_corpus_results
· translation_speech_corpus_results(<Arg>)
alterf_patterns_file
Relevant if you are doing Alterf processing with LF patterns. Points to a file
containing Alterf LF patterns, that can be tested using the CHECK_ALTERF_PATTERNS command.
collocation_rules
Relevant to translation applications.
Points to a file containing rules for post-transfer collocation processing.
compiled_ellipsis_classes
Relevant to translation applications.
Points to a file containing the compiled form of the ellipsis processing rules.
dialogue_files
Relevant to dialogue applications.
Points to a list of files defining dialogue processing behaviour.
discriminants
Relevant to applications using surface processing.
Points to a file of Alterf discriminants.
ebl_context_use_threshold
Relevant to applications using grammar specialisation.
Defines the minimum number of examples of a rule that must be present if the
system is to use rule context anti-unification to further constrain that rule.
ebl_corpus
Points to the file of training examples used as input to the EBL_TREEBANK operation. Intended originally for use
for grammar specialisation,
but can also be used simply to parse a set of examples to get information about
coverage. The format is sent(Atom),
so for example a typical line would be
sent('switch off the light').
(note the closing period).
If the application compiles multiple
top-level specialised grammars, the grammars relevant to each example are
defined in an optional second argument. For example, if a home control domain
had separate grammars for each room, a typical line in the training file might
be
sent('switch off the light', [bedroom, kitchen, living_room]).
ebl_gemini_grammar
Relevant to applications using grammar specialisation. Specifies the base name of the Gemini files generated by the EBL_GEMINI command.
ebl_grammar_probs
Relevant to applications using grammar specialisation.
Specifies the file where the EBL_GRAMMAR_PROBS
command places its output.
ebl_ignore_feats
Relevant to applications using grammar specialisation. The
value should be a list of unification grammar features: these features will be ignored
in the specialised grammar. A suitable choice of value can greatly speed up
Regulus to Nuance compilation for the specialised grammar. This is documented
further here.
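For example, a declaration of the following form could be used (the feature names here are purely illustrative, taken from the Toy1 grammar; choose the features that matter least in your own grammar):
regulus_config(ebl_ignore_feats, [sem_np_type, singplur]).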
ebl_include_lex
Relevant to applications using grammar specialisation.
Specifies a file or list of files containing EBL include lex declarations.
ebl_nuance_grammar
Relevant to applications using grammar specialisation. Points to the specialised Nuance GSL grammar file produced by the EBL_NUANCE operation.
ebl_operationality
Relevant to applications using grammar specialisation. Specifies the operationality criterion, which must be defined in
the file $REGULUS/Prolog/ebl_operational.pl
ebl_regulus_component_grammar
Relevant to applications using grammar specialisation that
define multiple top-level specialised
grammars. Identifies which specialised Regulus grammar will be
loaded by the EBL_LOAD command.
ellipsis_classes
Relevant to translation applications.
Points to a file defining classes of intersubstitutable
phrases that can be used in ellipsis processing.
from_interlingua_rules
Relevant to translation applications.
Points to a file, or list of files, containing rules that transfer interlingual
representations into target language representations.
gemini_grammar
Specifies the base name of the Gemini files generated by the GEMINI
command.
generation_grammar
Relevant to applications that use generation
(typically translation applications).
Points to the file containing the compiled generation grammar.
generation_grammar(<Arg>)
Relevant to applications that use generation
(typically translation applications)
and also grammar
specialisation. Points to the file containing the compiled specialised
generation grammar for the subdomain tag <Arg>.
generation_incremental_deepening_parameters
Relevant to applications that use generation (typically translation
applications). Value should be a list of three positive numbers [<Start>, <Increment>,
<Max>], such that both <Start>
and <Increment> are less
than or equal to <Max>.
Generation uses an iterative deepening algorithm, which initially sets a
maximum derivation length of <Start>,
and increases it in increments of <Increment>
until it exceeds <Max>.
Default value is [5, 5, 50].
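For example, the default behaviour corresponds to the declaration:
regulus_config(generation_incremental_deepening_parameters, [5, 5, 50]).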
generation_module_name
Relevant to applications that use generation
(typically translation applications).
Specifies the module name in the compiled generation grammar file. Default is generator.
generation_preferences
Relevant to applications that use generation
(typically translation applications).
Points to the file containing the generation preference declarations.
generation_regulus_grammar
Relevant to applications that use generation
(typically translation applications).
If there is no regulus_grammar entry, points to
the Regulus file, or list of Regulus files, that are to be compiled into the
generation file.
generation_rules
Relevant to translation applications.
Points to the file containing the generation grammar. Normally this will be a
Regulus grammar compiled for generation.
The translation code currently assumes that this file will define the module generator, and that the top-level
predicate will be of the form
generator:generate(Representation, Tree, Words)
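For example, using the Toy1-style representation shown earlier in this document, a call from the Prolog top-level might look like the following (the representation is illustrative; the exact format and the resulting bindings depend on your grammar's semantics):
?- generator:generate([[utterance_type,command],[action,switch],[onoff,on],[device,light]], Tree, Words).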
global_context
Relevant to translation applications.
Defines a value that can be accessed by conditional transfer rules; this is useful if
transfer rules are to be shared across several applications defined by multiple
config files.
ignore_subdomain
Relevant to applications using grammar specialisation.
Sometimes, you will have defined multiple subdomains, but you will only be
carrying out development in one of them. In this case, you can speed up EBL
training by temporarily adding ignore_subdomain
declarations in the config file. An ignore_subdomain
declaration has the form
regulus_config(ignore_subdomain,
<Tag>).
The effect is to remove all references to <Tag>
when performing training, and not build any specialised grammar for <Tag>. You may include any
number of ignore_subdomain
declarations.
interlingua_declarations
Relevant to translation applications.
Points to the file containing the interlingua
declarations, which define the constants that may be used at the
interlingual representation level.
lf_patterns
Relevant to dialogue applications.
Points to a file of LF patterns.
lf_patterns_modules
Relevant to dialogue applications.
Value should be a list of modules referenced by the compiled LF patterns.
lf_postproc_pred
Defines a post-processing predicate that is applied after Regulus analysis. If
you are using the riacs_sem semantic macros, you must set this parameter to the
value riacs_postproc_lf.
nuance_grammar
Points to the Nuance GSL grammar produced by the NUANCE
command.
nuance_compile_params
Specifies a list of extra compilation parameters to be passed to Nuance
compilation by the NUANCE_COMPILE command. A
typical value is
['-auto_pron',
'-dont_flatten']
nuance_language_pack
Specifies the Nuance language pack to be used by Nuance compilation in the NUANCE_COMPILE command.
orthography_rules
Relevant to translation applications.
Points to a file containing rules for post-transfer orthography processing.
parse_preferences
Can be used to define default analysis preferences.
regulus_grammar
Points to the Regulus file, or list of Regulus files, that constitute the main
grammar.
regulus_no_sem_decls
Points to a file which removes the sem feature from the main grammar.
surface_constituent_rules
Relevant to applications using surface processing.
Points to the surface
constituent rules file.
surface_patterns
Relevant to applications using surface processing.
Points to the surface patterns file.
surface_postprocessing
Relevant to applications using surface processing.
Points to a file that defines a post-processing predicate that can be applied
to the results of surface processing. The file should define a predicate surface_postprocess(Representation,
PostProcessedRepresentation).
tagging_grammar
Relevant to applications using surface processing.
Points to a file that defines a tagging grammar, in DCG form. The top-level
rule should be of the form
tagging_grammar(Item) -->
<Body>.
target_model
Relevant to applications using surface processing.
Points to a file defining a target model. The file should define the predicates
target_atom/1 and target_atom_excludes/2.
to_interlingua_rules
Relevant to translation applications.
Points to a file, or list of files, containing rules that transfer source
discourse representations into interlingual representations.
to_source_discourse_rules
Relevant to translation applications.
Points to a file, or list of files, containing rules that transfer source
representations into source discourse representations.
top_level_cat
Defines the top-level category of the grammar.
top_level_generation_cat
Relevant to applications that use generation
(typically translation applications).
Defines the top-level category of the generation grammar. Default is .MAIN.
top_level_generation_feat
Relevant to applications that use generation
(typically translation applications).
Defines the semantic feature in the top-level rule which holds the semantic
value. Normally, the rule will be of the form
'.MAIN':[gsem=[value=Sem]] --> Body
and the value of this parameter will be value
(default if not specified).
top_level_generation_pred
Relevant to applications that use generation
(typically translation applications).
Defines the top-level predicate in the compiled generation grammar. For translation
applications, the value should be generate
(default if not specified).
transfer_rules
Relevant to translation applications.
Points to a file, or list of files, containing rules that transfer source
language representations into target language representations.
translation_corpus
Relevant to translation applications.
Points to a file of examples used as input to the TRANSLATE_CORPUS
command. The format is sent(Atom),
so for example a typical line would be
sent('switch off the light').
(note the closing period).
translation_corpus(<Arg>)
Relevant to translation applications.
Points to a file of examples used as input to the parameterised command TRANSLATE_CORPUS <Arg>. The format is
the same as for the translation_corpus file.
translation_corpus_judgements
Relevant to translation applications.
Points to a file of translation judgements. You should not normally edit this
file directly, but update it using the commands UPDATE_TRANSLATION_JUDGEMENTS and UPDATE_TRANSLATION_JUDGEMENTS_SPEECH.
translation_corpus_recognition_judgements
Relevant to translation applications.
Points to a file of recognition judgements. You should not normally edit this
file directly, but update it using the command UPDATE_RECOGNITION_JUDGEMENTS.
translation_corpus_results
Relevant to translation applications.
Points to the file containing the result of running the TRANSLATE_CORPUS command. You can then edit
this file to update judgements, and incorporate them into the translation_corpus_judgements file by
using the command UPDATE_TRANSLATION_JUDGEMENTS.
translation_corpus_results(<Arg>)
Relevant to translation applications.
Points to the file containing the result of running the parameterised command TRANSLATE_CORPUS <Arg>. You can then
edit this file to update judgements, and incorporate them into the translation_corpus_judgements file by
using the parameterised command UPDATE_TRANSLATION_JUDGEMENTS
<Arg>.
translation_corpus_tmp_recognition_judgements
Relevant to translation applications.
Points to the file of new recognition results generated by
running the TRANSLATE_SPEECH_CORPUS
command. You can then edit this file to update the judgements, and
incorporate them into the translation_corpus_recognition_judgements file using
the command UPDATE_RECOGNITION_JUDGEMENTS.
translation_corpus_tmp_recognition_judgements(<Arg>)
Relevant to translation applications.
Points to the file of new recognition results generated by
running the TRANSLATE_SPEECH_CORPUS
<Arg> command. You can then edit this file to update the
judgements, and incorporate them into the
translation_corpus_recognition_judgements file using the command UPDATE_RECOGNITION_JUDGEMENTS
<Arg>.
translation_rec_params
Relevant to translation applications.
Specifies the list of Nuance parameters that will be used
when carrying out recognition for the TRANSLATE_SPEECH_CORPUS
command. These parameters must at a minimum specify the recognition
package and the top-level Nuance grammar, for example
[package=med_runtime(recogniser),
grammar='.MAIN']
translation_speech_corpus
Relevant to translation applications.
Points to a file of examples used as input to the TRANSLATE_SPEECH_CORPUS command. The format
is <Wavfile> <Words>,
so for example a typical line would be
C:/Regulus/data/utt03.wav switch off
the light
translation_speech_corpus(<Arg>)
Relevant to translation applications.
Points to a file of examples used as input to the TRANSLATE_SPEECH_CORPUS <Arg>
command. Format is as for the translation_speech_corpus
parameter.
translation_speech_corpus_results
Relevant to translation applications.
Points to the file containing the result of running the TRANSLATE_SPEECH_CORPUS command. You
can then edit this file to update judgements, and incorporate them into the
translation_corpus_judgements file by using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH.
translation_speech_corpus_results(<Arg>)
Relevant to translation applications.
Points to the file containing the result of running the TRANSLATE_SPEECH_CORPUS <Arg>
command. You can then edit this file to update judgements, and
incorporate them into the translation_corpus_judgements file by using the
command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH
<Arg>.
wavfile_directory
Relevant to translation applications.
If output speech is to be produced using recorded wavfiles, points to the
directory that holds these files.
wavfile_recording_script
Relevant to translation applications.
If output speech is to be produced using recorded wavfiles, points to an
automatically created file that holds a script which can be used to create the
missing wavfiles. This script is produced by finding all the lexical items in
the file referenced by generation_rules, and
creating an entry for every item not already in wavfile_directory. The file is
created as part of the processing carried out by the LOAD_TRANSLATE
command.
Due to limitations of some operating systems, the script contains some latin-1
characters translated to the character sequences shown in the table below.
Original character | Translates to
á | a1
â | a2
à | a3
ä | a4
å | a5
ç | c1
é | e1
ê | e2
è | e3
ë | e4
æ | e6
ñ | n1
ó | o1
ô | o2
ò | o3
ö | o4
ú | u1
û | u2
ù | u3
ü | u4
working_directory
Working files will have names starting with this prefix.
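All of the parameters above are declared in the config file as regulus_config/2 facts, one fact per parameter. As a purely illustrative sketch (the file names and values are hypothetical, not taken from any distributed example), a minimal config file might contain entries such as
regulus_config(regulus_grammar, '$MY_APP/Regulus/my_grammar.regulus').
regulus_config(top_level_cat, '.MAIN').
regulus_config(nuance_grammar, '$MY_APP/Generated/recogniser.grammar').
regulus_config(working_directory, '$MY_APP/Generated/working').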
You can invoke Regulus in batch mode using the predicate regulus_batch/2, defined in $REGULUS/Prolog/regulus_top.pl. A call
to regulus_batch/2 is of the form
regulus_batch(ConfigFile,
Commands)
where ConfigFile is the name of
a Regulus config file, and Commands is
a list of Regulus commands, written as strings.
There is also a three-argument version of this predicate, where the call is of the form
regulus_batch(ConfigFile,
Commands, Errors)
Here, Errors will be
instantiated to a list consisting of all the error messages that may have been
printed out while the Commands were
executed.
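For example, the following call (a sketch using the Toy1 config file from the example below) compiles the grammar and collects any error messages:
regulus_batch('$REGULUS/Examples/Toy1/scripts/toy1.cfg', ["NUANCE"], Errors)
If compilation completes without problems, Errors should be instantiated to the empty list.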
regulus_batch/2 can be used
to write scripts that invoke Regulus to compile grammars. For example, loading
the following file into Prolog invokes Regulus to compile a recogniser for the
Toy1 grammar:
% Load following file to define library directories etc
:- ['$REGULUS/Examples/Toy1/prolog/library_declarations'].
% Compile the main Regulus code
:- compile('$REGULUS/Prolog/load').
% Do Regulus to Nuance compilation
:- regulus_batch('$REGULUS/Examples/Toy1/scripts/toy1.cfg',
["NUANCE"]).
:- halt.
Similarly, loading the following file into Prolog invokes Regulus to compile a
recogniser for the PSA grammar, using grammar specialisation.
% Define library directories etc
:- ['$REGULUS/Examples/PSA/prolog/library_declarations'].
% Compile the main Regulus code
:- compile('$REGULUS/Prolog/load').
% Load, do EBL training and post-processing, and compile specialised grammar to Nuance
:- regulus_batch('$REGULUS/Examples/PSA/scripts/psa.cfg',
["LOAD", "EBL_TREEBANK", "EBL_TRAIN",
"EBL_POSTPROCESS", "EBL_NUANCE"]).
:- halt.
You can run these Prolog files from the command line by calling SICStus with
the -l flag. For example, if the first of the two files above is saved as $REGULUS/Examples/Toy1/scripts/compile_to_nuance.pl,
then we can perform the compilation from the command line with the invocation
sicstus -l
$REGULUS/Examples/Toy1/scripts/compile_to_nuance.pl
It is possible to give spoken instead of written
input in the Regulus top loop. This first requires performing the following
steps:
1. Make sure that /usr/bin (UNIX) or c:/cygwin/bin (Windows/Cygwin) is in your path.
2. Create a file called $REGULUS/scripts/run_license.bat, whose contents
are a single line invoking the Nuance License Manager. This will require
obtaining a license manager code from Nuance. A typical line would be something
like the following (the license code is not genuine):
nlm C:/Nuance/Vocalizer4.0/license.txt ntk12-1234-a-1234-a1bc12de1234
It should now be possible to load the
recognition package using the command LOAD_RECOGNITION. This should start
processes for the Nuance license manager, the Nuance recserver, and the Regulus
Speech Server (regserver), and will normally take about a minute.
Once the speech resources have been loaded,
the RECOGNISE command will take spoken input from the microphone, performing
recognition using the loaded package.
It is possible to call the Regulus parser from Prolog using
the predicate
atom_to_parse_using_current_parser(+Sent, +Grammar, -Parse)
Here, Sent is the utterance to be parsed and Grammar is the top-level grammar to use, both represented as Prolog atoms; Parse is the resulting parse.
The predicate assumes that some grammar is currently loaded. Most often, this has been done using an invocation of regulus_batch/2; Grammar is usually the atom '.MAIN'. A typical invocation sequence is thus something like the following:
regulus_batch('$REGULUS/Examples/Toy1/scripts/toy1.cfg',
["LOAD"])
(…intervening code…)
atom_to_parse_using_current_parser('switch on the light', '.MAIN', Parse)
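As a minimal sketch, the two calls can be packaged into a single predicate; the predicate name parse_toy1_example is hypothetical, while the config file and grammar name are taken from the example above:
parse_toy1_example(Sent, Parse) :-
    regulus_batch('$REGULUS/Examples/Toy1/scripts/toy1.cfg', ["LOAD"]),
    atom_to_parse_using_current_parser(Sent, '.MAIN', Parse).
A call such as parse_toy1_example('switch on the light', Parse) should then bind Parse to the analysis of the utterance.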
This section presents some illustrative Regulus grammars and their translations into GSL. In a later section we describe Regulus syntax more formally. Comments in the grammars are Prolog-style, introduced by a percent sign.
We start with a minimal toy grammar, which covers a few phrases like "a dog" or "two cats". The only point of interest is that we want to block phrases like *"a dogs" or *"two cat", which combine a singular specifier and a plural noun, or vice versa.
% "num_value" is a feature value
space with possible values "sing" and "plur"
feature_value_space(num_value, [[sing, plur]]).
% "num" is a feature taking values in "num_value"
feature(num, num_value).
% ".MAIN" is a category which allows global slot-filling
and has no syntactic features
category('.MAIN', [gsem]).
% "np", "spec" and "n" are all
categories which allow a semantic return value and have one syntactic feature,
"num"
category(np, [sem, num]).
category(spec, [sem, num]).
category(n, [sem, num]).
% ".MAIN" is a top-level grammar
top_level_category('.MAIN').
% ".MAIN" can be rewritten to "np"
'.MAIN':[gsem=[value=S]] -->
np:[sem=S].
% "np" can be rewritten to "spec" followed by
"n". The "spec" and "num" have to agree on the
value of the "num" feature.
np:[sem=[spec=S, num=N], num=Num] -->
spec:[sem=S, num=Num], n:[sem=N, num=Num].
% Lexicon entries
% "a" is a singular "spec"
spec:[sem=a, num=sing] --> a.
% "two" is a plural "spec"
spec:[sem=2, num=plur] --> two.
% "the" is a "spec" that can be either singular
or plural
spec:[sem=the, num=(sing\/plur)] -->
the.
% "cat" and "dog" are singular "n"s
n:[sem=cat, num=sing] --> cat.
n:[sem=dog, num=sing] --> dog.
% "cat" and "dog" are plural "n"s
n:[sem=cat, num=plur] --> cats.
n:[sem=dog, num=plur] --> dogs.
This grammar compiles to the following GSL grammar:
.MAIN
[ ( NP_ANY:v_0 ) { < value $v_0 > } ]
NP_ANY
[
( NP_PLUR:v_0 ) {return( $v_0 )}
( NP_SING:v_0 ) {return( $v_0 )}
]
NP_PLUR
[ ( SPEC_PLUR:v_0 N_PLUR:v_1 ) {return( [ < spec $v_0 > < num $v_1 > ] )}]
NP_SING
[ ( SPEC_SING:v_0 N_SING:v_1 ) {return( [ < spec $v_0 > < num $v_1 > ] )}]
SPEC_SING
[
( a ) {return( a )}
( the ) {return( the )}
]
SPEC_PLUR
[
( two ) {return( 2 )}
( the ) {return( the )}
]
N_SING
[
( cat ) {return( cat )}
( dog ) {return( dog )}
]
N_PLUR
[
( cats ) {return( cat )}
( dogs ) {return( dog )}
]
Our second example is a little more realistic, and shows how complex structures can be built up using the GSL "concat" operator.
% Declarations
feature_value_space(number_value, [[sing, plur]]).
feature_value_space(vform_value, [[imperative, finite]]).
feature_value_space(vtype_value, [[transitive, switch, be]]).
feature_value_space(sem_np_type_value, [[n, location, switchable, dimmable]]).
feature(number, number_value).
feature(vform, vform_value).
feature(vtype, vtype_value).
feature(sem_np_type, sem_np_type_value).
feature(obj_sem_np_type, sem_np_type_value).
top_level_category('.MAIN').
category('.MAIN', [gsem]).
category(utterance, [sem]).
category(command, [sem]).
category(yn_question, [sem]).
category(np, [sem, number, sem_np_type]).
category(location_pp, [sem]).
category(noun, [sem, number, sem_np_type]).
category(spec, [sem, number]).
category(verb, [sem, number, vform, vtype, obj_sem_np_type]).
category(onoff, [sem]).
% Grammar
'.MAIN':[gsem=[value=S]] -->
utterance:[sem=S].
utterance:[sem=S] -->
( command:[sem=S] ;
yn_question:[sem=S]
).
command:[sem=concat([[type, command]], concat(Op, concat(OnOff, Np)))] -->
verb:[sem=Op, vform=imperative, vtype=switch, obj_sem_np_type=ObjType],
onoff:[sem=OnOff],
np:[sem=Np, sem_np_type=ObjType].
command:[sem=concat([[type, command]], concat(Op, Np))] -->
verb:[sem=Op, vform=imperative, vtype=transitive, obj_sem_np_type=ObjType],
np:[sem=Np, sem_np_type=ObjType].
yn_question:[sem=concat([[type, query]], concat(Verb, concat(OnOff, Np)))]
-->
verb:[sem=Verb, vform=finite, vtype=be, number=N, obj_sem_np_type=n],
np:[sem=Np, number=N, sem_np_type=switchable],
onoff:[sem=OnOff].
% Discard semantic contribution of spec...
np:[sem=concat(Noun, Loc), number=N,
sem_np_type=SemType] -->
spec:[sem=Spec, number=N],
noun:[sem=Noun, number=N, sem_np_type=SemType],
?location_pp:[sem=Loc].
location_pp:[sem=Loc] -->
in,
np:[sem=Loc, sem_np_type=location].
% Lexicon
verb:[sem=[[state, be]], vform=finite,
vtype=be, number=sing,
obj_sem_np_type=n] --> is.
verb:[sem=[[state, be]], vform=finite, vtype=be, number=plur,
obj_sem_np_type=n] --> are.
verb:[sem=[[action, switch]], vform=imperative, vtype=switch, number=sing,
obj_sem_np_type=switchable] --> switch.
verb:[sem=[[action, switch]], vform=imperative, vtype=switch, number=sing,
obj_sem_np_type=switchable] --> turn.
verb:[sem=[[action, dim]], vform=imperative, vtype=transitive, number=sing,
obj_sem_np_type=dimmable] --> dim.
noun:[sem=[[device, light]], sem_np_type=switchable\/dimmable, number=sing]
--> light.
noun:[sem=[[device, light]], sem_np_type=switchable\/dimmable, number=plur]
--> lights.
noun:[sem=[[device, fan]], sem_np_type=switchable, number=sing] --> fan.
noun:[sem=[[device, fan]], sem_np_type=switchable, number=plur] --> fans.
noun:[sem=[[location, kitchen]], sem_np_type=location, number=sing] -->
kitchen.
noun:[sem=[[location, living_room]], sem_np_type=location, number=sing] -->
living, room.
spec:[sem=the, number=sing] --> the.
spec:[sem=all, number=plur] --> the.
spec:[sem=all, number=plur] --> all, ?((?of, the)).
onoff:[sem=[[onoff=on]]] --> ?switched, on.
onoff:[sem=[[onoff=off]]] --> ?switched, off.
This grammar compiles to the following GSL grammar:
.MAIN
[( UTTERANCE:v_0 ) { < value $v_0 > }]
UTTERANCE
[
( COMMAND:v_0 ) {return( $v_0 )}
( YN_QUESTION:v_0 ) {return( $v_0 )}
]
COMMAND
[
( VERB_ANY_DIMMABLE_IMPERATIVE_TRANSITIVE:v_0 NP_ANY_DIMMABLE:v_1 )
{return( concat( ( ( type command ) ) concat( $v_0 $v_1 ) ) )}
( VERB_ANY_SWITCHABLE_IMPERATIVE_SWITCH:v_0 ONOFF:v_1 NP_ANY_SWITCHABLE:v_2 )
{return( concat( ( ( type command ) ) concat( $v_0 concat( $v_1 $v_2 ) ) ) )}
]
NP_ANY_DIMMABLE
[
( NP_PLUR_DIMMABLE:v_0 ) {return( $v_0 )}
( NP_SING_DIMMABLE:v_0 ) {return( $v_0 )}
]
NP_ANY_LOCATION
[( NP_SING_LOCATION:v_0 ){return( $v_0 )}]
NP_ANY_SWITCHABLE
[
( NP_PLUR_SWITCHABLE:v_0 ) {return( $v_0 )}
( NP_SING_SWITCHABLE:v_0 ) {return( $v_0 )}
]
VERB_ANY_SWITCHABLE_IMPERATIVE_SWITCH
[( VERB_SING_SWITCHABLE_IMPERATIVE_SWITCH:v_0 ) {return( $v_0 )}]
VERB_ANY_DIMMABLE_IMPERATIVE_TRANSITIVE
[( VERB_SING_DIMMABLE_IMPERATIVE_TRANSITIVE:v_0 ) {return( $v_0 )}]
YN_QUESTION
[
( VERB_PLUR_N_FINITE_BE:v_0 NP_PLUR_SWITCHABLE:v_2 ONOFF:v_1 )
{return( concat( ( ( type query ) ) concat( $v_0 concat( $v_1 $v_2 ) ) ) )}
( VERB_SING_N_FINITE_BE:v_0 NP_SING_SWITCHABLE:v_2 ONOFF:v_1 )
{return( concat( ( ( type query ) ) concat( $v_0 concat( $v_1 $v_2 ) ) ) )}
]
NP_PLUR_DIMMABLE
[( SPEC_PLUR:v_2 NOUN_PLUR_DIMMABLE:v_0 ?(LOCATION_PP:v_1) ) {return( concat(
$v_0 $v_1 ) )}]
NP_PLUR_SWITCHABLE
[( SPEC_PLUR:v_2 NOUN_PLUR_SWITCHABLE:v_0 ?(LOCATION_PP:v_1) ) {return( concat(
$v_0 $v_1 ))}]
NP_SING_DIMMABLE
[( SPEC_SING:v_2 NOUN_SING_DIMMABLE:v_0 ?(LOCATION_PP:v_1) ) {return( concat(
$v_0 $v_1 ) )}]
NP_SING_LOCATION
[( SPEC_SING:v_2 NOUN_SING_LOCATION:v_0 ?(LOCATION_PP:v_1) ) {return( concat(
$v_0 $v_1 ) )}]
NP_SING_SWITCHABLE
[( SPEC_SING:v_2 NOUN_SING_SWITCHABLE:v_0 ?(LOCATION_PP:v_1) ){return( concat(
$v_0 $v_1 ) )}]
LOCATION_PP
[( in NP_ANY_LOCATION:v_0 ) {return( $v_0 )}]
VERB_SING_N_FINITE_BE
[( is ) {return( ( ( state be ) ) )}]
VERB_PLUR_N_FINITE_BE
[( are ) {return( ( ( state be ) ) )}]
VERB_SING_SWITCHABLE_IMPERATIVE_SWITCH
[
( switch ) {return( ( ( action switch ) ) )}
( turn ) {return( ( ( action switch ) ) )}
]
VERB_SING_DIMMABLE_IMPERATIVE_TRANSITIVE
[( dim ) {return( ( ( action dim ) ) )}]
NOUN_SING_DIMMABLE
[( light ) {return( ( ( device light ) ) )}]
NOUN_SING_SWITCHABLE
[
( fan ) {return( ( ( device fan ) ) )}
( light ) {return( ( ( device light ) ) )}
]
NOUN_PLUR_DIMMABLE
[( lights ) {return( ( ( device light ) ) )}]
NOUN_PLUR_SWITCHABLE
[
( fans ) {return( ( ( device fan ) ) )}
( lights ) {return( ( ( device light ) ) )}
]
NOUN_SING_LOCATION
[
( kitchen ) {return( ( ( location kitchen ) ) )}
( living room ) {return( ( ( location living_room ) ) )}
]
SPEC_SING
[( the ) {return( the )}]
SPEC_PLUR
[
( the ) {return( all )}
( all ?(?(of) the) ) {return( all )}
]
ONOFF
[
( ?(switched) off ) {return( ( [ < onoff off > ] ) )}
( ?(switched) on ) {return( ( [ < onoff on > ] ) )}
]
You may simply want to use Regulus to compile your own unification grammars
into Nuance GSL. Experience shows, however, that complex natural language
grammars tend to have a lot of common structure, since they ultimately have to
model general linguistic facts about English and other natural languages. There
are consequently good reasons for wanting to save effort by implementing a
SINGLE domain-independent core grammar, and producing domain-specific
versions of it using some kind of specialisation process.
Regulus includes an experimental system which attempts to deliver this functionality.
There is a general unification grammar
for English, containing about 145 rules, and an accompanying core lexicon.
For a given domain, you will need to extend the general grammar
in some way. In particular, you will need to supplement the core lexicon with a
domain-specific lexicon that you will write yourself. You will then be able to
use the grammar specialisation
tools to transform a small training corpus into a specialised version of
the grammar.
The general English grammar is in the directory $REGULUS/Grammar. It contains the following files:
· general_eng.regulus. The grammar rules and declarations.
· gen_eng_lex.regulus. Core function-word lexicon and some macro definitions useful for writing lexicon entries.
· riacs_sem.regulus. Definitions for semantics macros that produce a QLF-like semantics based on those used in the RIACS PSA system.
· nested_sem.regulus. Definitions for semantics macros that produce a minimal list-based recursive semantics.
· linear_sem.regulus. Definitions for semantics macros that produce a minimal list-based non-recursive semantics.
The multiple semantics files reflect the fact that the semantics in general_eng.regulus and gen_eng_lex.regulus are all defined in terms of macros.
A grammar based on the general English grammar should include
general_eng.regulus, gen_eng_lex.regulus and exactly one of the semantics
macros files. It will typically also contain a domain-specific lexicon file.
These files are declared in the config file for
the application.
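For example, a config file entry along the following lines declares the general grammar, the core lexicon, one semantics macro file and a domain-specific lexicon (this is a sketch: the domain lexicon file name is hypothetical):
regulus_config(regulus_grammar,
               ['$REGULUS/Grammar/general_eng.regulus',
                '$REGULUS/Grammar/gen_eng_lex.regulus',
                '$REGULUS/Grammar/riacs_sem.regulus',
                '$MY_APP/Regulus/my_domain_lex.regulus']).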
If riacs_sem.regulus is used, the config file must contain the line
regulus_config(lf_postproc_pred,
riacs_postproc_lf).
This specifies that the initial results from Nuance recognition are to be
postprocessed using the predicate riacs_postproc_lf.
You are strongly advised to define domain-specific lexical entries using the
macros from $REGULUS/Grammar/gen_eng_lex_entries.regulus.
This file includes documentation and examples, using the format illustrated by
the following example for the macro v_intransitive:
% Intransitive
% e.g. "John sleeps"
%
% @v_intransitive([sleep, sleeps, slept, slept, sleeping],
%                 [action, sleep], [agent],
%                 [takes_time_pp=y, takes_frequency_pp=y, takes_duration_pp=y]).

macro(v_intransitive(SurfaceForms, [SemType, SemConstant], [SubjSortalType], OtherFeats),
      @verb(SurfaceForms,
            [@verb_sem(SemType, SemConstant)],
            [subcat=nx0v,
             inv=n,
             subj_sem_n_type=SubjSortalType | OtherFeats])).
Simple examples of how to use lexicon macros are in the domain-specific
lexicon file for the Toy1Specialised application, $REGULUS/Examples/Toy1Specialised/Regulus/toy1_lex.regulus.
A wider range of examples can be found in the English lexicon in the Open
Source MedSLT project: look at the file MedSLT2/Eng/Regulus/med_lex.regulus.
A grammar built on top of the general grammar is transformed into a specialised Nuance grammar in the following processing stages:
1. The EBL training corpus (defined by the config file parameter ebl_corpus) is parsed into a "treebank" of parsed representations. This is done using the Regulus command EBL_TREEBANK.
2. The treebank is used to produce a "raw" specialised Regulus grammar, using the EBL algorithm. This is done using the Regulus command EBL_TRAIN. The granularity of the learned rules is defined by the config file parameter ebl_operationality. This parameter should have the value file(<File>), where <File> identifies a file containing operationality definitions.
3. The "raw" specialised Regulus grammar is post-processed into the final specialised grammar. This is done using the Regulus command EBL_POSTPROCESS. The post-processing stage consists of three steps:
o Duplicate rules are merged, keeping only the different training examples as documentation.
o Specialised rules are sorted by number of training examples.
o If there are enough training examples for a rule, it is further constrained to unify with the least common generalisation of all the contexts in which it has occurred. The threshold which determines when this happens is defined by the config file parameter ebl_context_use_threshold.
4. The final specialised Regulus grammar is compiled into the Nuance grammar. This is done using the Regulus command EBL_NUANCE.
The operationality definitions file contains declarations which specify how
example trees from the "treebank" file are to be cut up into smaller
trees, which are then flattened into specialised rules. The basic idea is to
traverse the tree in a downward direction, starting with the root node:
throughout the traversal process, the system maintains a value called the
"context", which by default remains the same when moving downwards
over a node. Each node is labelled with the associated category. Operationality
declarations can do one of two things. Usually they will cut the tree, starting
a new rule and simultaneously changing the value of the context. Occasionally,
they will just change the value of the context without cutting the tree.
Operationality rules of both types syntactically have the form of Prolog rules.
They are respectively of the forms
change_rule_and_context(<OldContext>,
<NewContext>) :- <Conditions>
change_context(<OldContext>, <NewContext>) :-
<Conditions>
where <OldContext>, <NewContext> are respectively
the old and new values of the context, and <Conditions>
is a set of conditions on the node.
The <Conditions> have the
form of the body of a Prolog rule, and can include the usual Prolog logical
connectors: conjunction (","),
disjunction (";") and
negation ("\+"). Four
different primitives are currently available for constraining the node:
· cat(<CatName>). The category symbol for the node is <CatName>.
· dominates(<CatName>). The node dominates (directly or indirectly) another node whose category symbol is <CatName>.
· immediately_dominates(<CatName>). The node immediately dominates another node whose category symbol is <CatName>.
· lexical. The node is a lexical node, i.e. dominates only terminal symbols.
· gap. The node has null yield, i.e. dominates no terminal symbols.
A simple example of an operationality definitions file can be found in $REGULUS/Examples/Toy1Specialised/Prolog/operationality.pl.
Derivation trees are cut up and flattened to produce a simple grammar with
rules for UTTERANCE (the top
category), NP, POST_MODS and lexical items. The
definitions are as follows:
% Start new rule at UTTERANCE
change_rule_and_context(_Context, utterance) :-
    cat(utterance),
    \+ gap.

% Start new rule at NP or POST_MODS if under UTTERANCE
change_rule_and_context(utterance, np) :-
    cat(np),
    \+ gap.

change_rule_and_context(utterance, post_mods) :-
    cat(post_mods),
    \+ gap.

% Start new rule at NP if under POST_MODS
change_rule_and_context(post_mods, np) :-
    cat(np),
    \+ gap.

% Start new rule at POST_MODS if under NP
change_rule_and_context(np, post_mods) :-
    cat(post_mods),
    \+ gap.

% Always start new rule at lexical node
change_rule_and_context(_Context, lexical) :-
    lexical.
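The definitions above only use change_rule_and_context. For completeness, the following is a hypothetical change_context rule (the category pp and the context name pp_context are illustrative, and are not part of the Toy1Specialised example); it changes the context when the traversal passes a PP node, without cutting the tree at that point:
% Change the context, but do not cut, at a PP node
change_context(_Context, pp_context) :-
    cat(pp),
    \+ gap.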
It is possible to use the grammar specialisation mechanism to produce multiple
top-level specialised grammars. If you want to do this, you must first define a
set of tags, which will label the different grammars. Each example in the EBL
training corpus (defined by the config file parameter ebl_corpus ) must then be labelled
with some subset of the grammar tags, to indicate which grammar or grammars it
applies to. For example, if a home control domain had separate grammars for
each room, the tags would be the names of the rooms (bedroom, kitchen, living_room and so on), and typical
lines in the training file might be
sent('switch off the light', [bathroom,
bedroom, kitchen, living_room, wc]).
sent('turn on the tap', [bathroom, kitchen, wc]).
sent('flush the toilet', [wc]).
The specialised Nuance grammar file produced by the EBL_NUANCE command will contain one top-level grammar
for each tag. The name of the top-level Nuance grammar corresponding to the tag
<Tag> will be .MAIN__<Tag> (note the double
underscore), so for example the grammar for kitchen
will be .MAIN__kitchen. The tag default is treated specially, and
produces the top-level grammar .MAIN.
Sometimes, you will have defined multiple subdomains, but you will only be
carrying out development in one of them. In this case, you can speed up EBL
training by temporarily adding ignore_subdomain
declarations in the config file. An ignore_subdomain
declaration has the form
regulus_config(ignore_subdomain,
<Tag>).
The effect is to remove all references to <Tag>
when performing training, and not build any specialised grammar for <Tag>. You may include any
number of ignore_subdomain
declarations.
In general, the corpus utterances used as input to the treebank may be ambiguous. In most cases, the first analysis produced will be the intended one. When the first analysis is not the intended one, it is possible to annotate the training corpus so as to choose a different analysis, by using the optional third argument of the sent record. This third argument should be a list of constraints on the logical form. Constraints may currently be of the following forms:
· lf_includes_structure=<Structure>. The logical form must contain a subterm that unifies with <Structure>
· lf_doesnt_include_structure=<Structure>. The logical form may not contain any subterm that unifies with <Structure>
· tree_includes_structure=<Structure>. The parse tree must contain a subterm that unifies with <Structure>
· tree_doesnt_include_structure=<Structure>. The parse tree may not contain any subterm that unifies with <Structure>
For example, suppose that the training utterance is 'i read the meter', which is ambiguous since 'read' can be either present or past tense. If we want to choose the present tense interpretation and reject the past tense interpretation, we can write the training corpus example in either of the following ways:
· sent('i read the meter', [default], [lf_includes_structure=[tense, present]]).
· sent('i read the meter', [default], [lf_doesnt_include_structure=[tense, past]]).
Tree-based preferences assume an encoding of the parse tree illustrated by the following example. Suppose that we have the training example "what is your temperature". With the general English grammar, this can be analysed in at least two ways, depending on whether the word-order is inverted or uninverted. Suppose that we want to block the inverted word-order. This analysis will have the parse-tree
.MAIN [GENERAL_ENG:504-509]
  top [GENERAL_ENG:515-521]
    utterance_intro null [GENERAL_ENG:529-531]
    utterance [GENERAL_ENG:578-583]
      s [GENERAL_ENG:658-663]
        s [GENERAL_ENG:689-700]
          np [GENERAL_ENG:1852-1859]
            d lex(what) [GEN_ENG_LEX:364-364]
          s [GENERAL_ENG:774-783]
            vp [GENERAL_ENG:1252-1269]
              vp [GENERAL_ENG:999-1008]
                vbar [GENERAL_ENG:833-855]
                  v lex(is) [MED_LEX:314-322]
                  np [GENERAL_ENG:1942-1957]
                    np [GENERAL_ENG:1818-1826]
                      possessive lex(your) [GEN_ENG_LEX:377-377]
                      nbar [GENERAL_ENG:1982-1992]
                        n lex(temperature) [MED_LEX:501-501]
                    post_mods null [GENERAL_ENG:1383-1389]
                np [GENERAL_ENG:1942-1957]
                  np null [GENERAL_ENG:2180-2191]
                  post_mods null [GENERAL_ENG:1383-1389]
              post_mods null [GENERAL_ENG:1383-1389]
    utterance_coda null [GENERAL_ENG:560-562]
which for the purposes of tree-based preferences will be represented as the
Prolog term
(.MAIN
<
[(top <
[utterance_intro<null,
(utterance <
[(s <
[(s <
[np<[d<lex(what)],
(s <
[(vp <
[(vp <
[(vbar <
[v<lex(is),
np<[np<[possessive<lex(your),nbar<[n<lex(temperature)]],post_mods<null]]),
np<[np<null,post_mods<null]]),
post_mods<null])])])])]),
utterance_coda<null])])
We can choose not to allow this analysis by using a negative tree-based
constraint matching a suitable substructure. For example, the constraint
tree_doesnt_include_structure=(s
< [(np < _), (s < _)])
will block trees containing a fronted NP, and
tree_doesnt_include_structure=(np
< null)
will block trees containing an NP gap.
It will often be the case that the same type of constraint is required for many similar items in the treebank. For example, in Spanish the determiner "un" can be either an indefinite article ("a") or a number ("one"). In general, we will prefer the indefinite article reading, but in the presence of some words, such as time-units, the number reading is more likely to be the intended one.
The best way to handle situations like these is to define default parse preferences, which are declared in the parse_preferences file. A record in the parse_preferences file is of the form
parse_preference_score(<Pattern>,
<Score>).
where <Pattern> is an LF pattern and <Score> is a numerical score. Patterns can be either constraints of the type shown above, or Boolean combinations of these constraints formed using the usual Prolog operators "," (conjunction), ";" (disjunction) and "\+" (negation). The Spanish determiner example can be handled as follows using LF-based preferences:
% By default, disprefer readings where "un/una" is interpreted as a number
parse_preference_score(lf_includes_structure=[number,1], -1).

% But prefer to interpret "un/una" as number if it occurs with a timeunit
parse_preference_score((lf_includes_structure=[number,1], lf_includes_structure=[timeunit,_]), 5).
For any given specialised grammar, there will probably be several features
deriving from the general grammar which have no appreciable positive effect in
terms of constraining the language model. Although these features are
essentially useless, they can still slow down Regulus to Nuance compilation
very substantially, or even cause it to exceed resource limits.
It is possible to force the compiler to ignore features, by using the ebl_ignore_feats config file parameter. For
example, the declaration
regulus_config(ebl_ignore_feats,
[syn_type, subj_syn_type, obj_syn_type, indobj_syn_type]).
says that all the features in the "syn_type" group are to be ignored.
If you want to try to optimise performance by ignoring features, we recommend
that you start by looking at the following groups:
1. "syn_type" features: syn_type, subj_syn_type, obj_syn_type, indobj_syn_type. These features are only useful if you are using the ebl_context_use_threshold parameter, and can otherwise be safely ignored.
2. "def" features: def, subj_def, obj_def, indobj_def. These features can be used to constrain NPs with respect to definiteness, but we have found them to be of very limited value. They can probably also be ignored in most applications (see the example declaration after this list).
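For example, a single declaration along the following lines (a sketch combining the two groups listed above) would tell the compiler to ignore both the "syn_type" and the "def" features:
regulus_config(ebl_ignore_feats,
               [syn_type, subj_syn_type, obj_syn_type, indobj_syn_type,
                def, subj_def, obj_def, indobj_def]).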
One way to add lexicon entries to the specialised grammar is just to add
suitable training examples in the corpus. You can also include lexicon entries
directly from the general grammar. To do this, you need to write one or more
files of include_lex entries, and add an ebl_include_lex
entry to the config file, which points to them. So for example if your include_lex entries are in the file $MY_APP/my_lex_includes.pl, your
config file needs the line
regulus_config(ebl_include_lex,
'$MY_APP/my_lex_includes.pl').
The format of an include_lex
entry is
include_lex(<Cat>:[words=<Words>,
sem=<Sem>], <Tags>).
This says to include all lexicon entries of category <Cat>, whose surface form is <Words> and whose logical form
contains a subterm matching <Sem>,
in the specialised grammars whose tags are in the list <Tags>. Thus for example the declaration
include_lex(v:[words=start,
sem=start_happening], [gram1, gram2]).
says to include in gram1 and gram2 the v entries whose surface form is start, and whose logical forms contain the atom start_happening. Note that <Sem> can be partially
instantiated: for example
include_lex(v:[sem=[event, _]],
[gram1]).
says to include in gram1 all the
v entries whose logical forms
contain a term matching [event, _].
You can use this feature to include all entries of a particular semantic class.
<Words>, <Sem> and <Tags> are optional, and in
practice you will usually omit some or all of them. So for example
include_lex(v:[words=start,
sem=start_happening]).
says to include the v entries
whose surface form is start, and
whose logical forms contain the atom start_happening
in the single default specialised grammar;
include_lex(v:[sem=start_happening]).
says to include all v entries
whose logical forms contain the atom start_happening
in the single default specialised grammar; and
include_lex(v:[]).
says to include all v entries in
the single default specialised grammar.
In practice, you often want to include nearly all of the entries matching
some pattern, omitting just a few problem cases. You can do this with dont_include_lex entries. The format
of a dont_include_lex entry is
the same as that of an include_lex
entry, i.e.
dont_include_lex(<Cat>:[words=<Words>,
sem=<Sem>], <Tags>).
dont_include_lex entries take
precedence over include_lex
entries. Thus for example, the following entries say to include all English
verb entries except those for the reduced forms of "is",
"are", "am", "has", "had" and
"have":
% Add all entries for verbs
include_lex(v:[]).
% ... except a few auxiliaries that cause problems
dont_include_lex(v:[words='\'s']).
dont_include_lex(v:[words='\'re']).
dont_include_lex(v:[words='\'m']).
dont_include_lex(v:[words='\'d']).
dont_include_lex(v:[words='\'ve']).
It is also possible to write conditional include_lex
declarations. These are intended to be used to cover the case where you want in
effect to say "include all the inflected forms of any entry matching this
pattern, if you see any instance of it".
A conditional include_lex
declaration has the specific form
include_lex(<Cat>:[sem=<Pattern1>],
Tags) :-
rule_exists(<Cat>:[sem=<Pattern2>],
Tags).
where normally <Pattern1>
and <Pattern2> will share
variables. So for example
include_lex(v:[sem=[Type, Word]], Tags)
:-
rule_exists(v:[sem=[[tense, Tense], [Type, Word]]], Tags).
says "include any v entry
with a sem value including the subterm [Type,
Word], if you have learned a rule for a v
whose sem value exactly matches [[tense,
Tense], [Type, Word]]".
Nuance provides tools for creating class N-gram grammars, using the
SayAnything package. For details of how to use SayAnything, see the Nuance documentation;
the least trivial step, however, is usually writing the "tagging grammar",
which defines the backoff classes. Sometimes, you may be in the situation of
having already constructed a specialised Regulus grammar, and wanting to build
a class N-gram grammar with similar coverage. Regulus provides a tool that
allows you to define the classes by example: each class is specified by naming
two or more lexical items, and consists of all the lexical items that match the
common generalisation of the examples. The tool is packaged as the Prolog
predicate
specialised_regulus2nuance_tagging(+RegulusGrammarFile,
+SpecFile, +TaggingGrammarFile, +Debug, +TopGrammar)
defined in the file $REGULUS/Prolog/specialised_regulus2nuance_tagging.pl.
The arguments are as follows:
· RegulusGrammarFile is a "no_binarise" specialised Regulus grammar file
· SpecFile is a file of items of the form
tagging_class(<ClassId>, <Examples>)
where
o <ClassId> is an atom that can be used as a Nuance grammar name
o <Examples> is a list of at least two lexical items, either atoms or comma-lists
(an illustrative SpecFile is shown after this list)
· TaggingGrammarFile is an output Nuance GSL tagging grammar file
· Debug is one of {debug, nodebug}
· TopGrammar is the name of the top-level generated grammar
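As an illustration, a SpecFile might contain declarations such as the following (the class names and lexical items here are hypothetical, and are not taken from the MedSLT example below):
tagging_class(body_part, [head, stomach]).
tagging_class(time_expression, [morning, evening]).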
TaggingGrammarFile is created from RegulusGrammarFile and SpecFile by constructing one tagging grammar for each tagging_class declaration in SpecFile. The tagging grammar for tagging_class(<Grammar>, <Examples>) is constructed as follows:
1. Go through RegulusGrammarFile finding the lexicon entries matching <Examples>
2. Construct the anti-unification of the LHS categories in all these lexicon entries, to create a category Pattern.
3. Find all Words such that there is a lexicon entry matching Pattern --> Words
4. The generated GSL grammar <Grammar> is
<Grammar>
[
Words_1
Words_2
...
Words_n
]
The top-level GSL grammar is
.MAIN
[
<Grammar_1>
<Grammar_2>
...
<Grammar_n>
]
Here is an example of using the tool, taken from the Japanese version of the
MedSLT system:
:- use_module('$REGULUS/Prolog/specialised_regulus2nuance_tagging').
:- specialised_regulus2nuance_tagging(
'$MED_SLT2/Jap/GeneratedFiles/japanese_recognition_specialised_no_binarise_default.regulus',
'$MED_SLT2/Jap/SLM/scripts/headache_tagging_grammar_spec.pl',
'$MED_SLT2/Jap/SLM/med_generated_tagging_headache.grammar',
debug,
'.MAIN_tagging_headache').
:- halt.
Regulus permits a surface parsing method that can be used as an alternative to grammar-based parsing. Semantic representations are lists of elements produced by simple surface pattern matching. The surface parsing mode is switched on using the SURFACE command, and interacts cleanly with translation mode.
In order to use surface parsing, you need to define the following config file entries: surface_patterns, surface_postprocessing, tagging_grammar and target_model (and, if you want nested constituents, surface_constituent_rules).
The actual patterns are in the surface_patterns file, which is the only one
described here; we currently recommend that the other files be filled with the
placeholder values defined in the directory $MED_SLT2/Eng/Alterf. A later
version of this documentation may describe how to develop non-trivial versions
of these files.
If you want to use the surface processing rules to produce nested structures,
you must also define a value for the config file entry surface_constituent_rules. This is
described further below.
The surface_patterns file contains a set of declarations of the form
alterf_surface_pattern(<Pattern>,
<Element>, <Doc>).
where the semantics are that if <Pattern>
is matched in the surface string then <Element>
is added to the semantic representation. The <Doc>
field should be set to null or
contain an example. The pattern language is illustrated by the following
examples:
alterf_surface_pattern([pressing],[adj,pressing],null).
The word "pressing" produces the semantic element [adj, pressing].
alterf_surface_pattern([in,'...',morning/afternoon/evening],[prep,in_time],null).
The word "in", followed by a gap or zero or more words and one of the
words "morning", "afternoon" or "evening",
produces the semantic element [prep,
in_time].
alterf_surface_pattern([not_word(least/than),once],[frequency,once],null).
The word "once", preceded by a word that is not "least" or
"than", produces the semantic element [frequency,
once].
alterf_surface_pattern(['is'/are/was,
not(['...',increasing/decreasing/becoming])],[verb,be],null).
The words "is", "are" or "was", not followed by a
gap of zero or more words and one of the words "increasing",
"decreasing" or "becoming", produces the semantic element [verb, be].
alterf_surface_pattern(['*start*',when],[time,when],null).
The word "when", occurring at the start of the utterance, produces
the semantic element [time, when].
It is possible to use the surface pattern rules to produce nested
constituents. To do this, you need to define a value for the config file entry surface_constituent_rules. The file this
entry points to contains surface_constituent_boundary
rules, which define when nested constituents start and end.
A surface_constituent_boundary rule
is of the form
surface_constituent_boundary(<BeforePattern>,
<AfterPattern>, <StartOrEnd>, <Tag>).
where
· <BeforePattern> and <AfterPattern> are surface patterns of the type defined in the preceding section.
· <StartOrEnd> is either start or end
· <Tag> is the tag attached to the nested constituent. Currently this must have the value clause.
If the surface parser reaches a point in the string where the immediately
preceding words match <BeforePattern>,
the immediately following words match <AfterPattern>,
and <StartOrEnd> is start, then it will open a nested
constituent of type <Tag>.
The nested constituent is closed off either by reaching the end of the string,
or by reaching a point where there is an end
rule whose before- and after-patterns match.
For example, the rule
surface_constituent_boundary([when],
[not_word(do/does/have/has/can)], start, clause).
says that a nested constituent of type clause
is started at a point where the preceding word is when, and the following word is not one of do, does,
have, has or can.
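An end rule has the same form, with end as the third argument. As a purely hypothetical sketch (the words chosen are illustrative), the following rule would close off a clause constituent at a point where the preceding word is pain and the following word is and:
surface_constituent_boundary([pain], [and], end, clause).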
It is possible to compile a Regulus grammar into a generation grammar, using the LOAD_GENERATION command. The files and parameters involved are specified in the config file, as follows:
· The Regulus grammar to be compiled is specified using the regulus_grammar config file item. For historical reasons, you can also use the generation_regulus_grammar config item.
· The compiled version of the generator is placed in the Prolog file specified by the generation_grammar config item.
· The top-level rule in the Regulus grammar must be of the form
<TopLevelCat>:[gsem=[<TopLevelFeature>=Sem]] --> <Body>
The value of <TopLevelCat> is specified using the top_level_generation_cat config item. Default is .MAIN.
The value of <TopLevelFeature> is specified using the top_level_generation_feat config item. Default is value.
· The Prolog module for the compiled generator file is specified by the generation_module_name config item. Default is generator.
· The top-level Prolog predicate in the compiled generator file is specified by the top_level_generation_pred config item. Default is generate.
· Generation uses an iterative deepening algorithm, which initially sets a maximum derivation length of <Start>, and increases it in increments of <Increment> until it exceeds <Max>. These parameters are specified using the generation_incremental_deepening_parameters config item. The default value is [5, 5, 50].
A typical config file for compiling a generator looks like this:
% Use same grammar for analysis and generation (analysis just for development)
regulus_config(regulus_grammar, [french_generation_grammars(french_generation)]).
regulus_config(top_level_cat, '.MAIN').

% Where to put the compiled generation grammar
regulus_config(generation_grammar, fre_runtime('generator.pl')).

% Trivial settings for iterative deepening - perform one iteration, and allow anything of depth =< 50
regulus_config(generation_incremental_deepening_parameters, [0, 50, 50]).

regulus_config(working_file_prefix, fre_runtime(french_generation)).
If you are developing a generation grammar, you will often find it useful to
run the top-level in generation mode, using the GENERATION
command. When in generation mode, the system attempts to parse each utterance,
and then generates back from the result using the generation grammar. If it is
possible to generate several different strings, all of them will be displayed.
In order for this to work, you have to first load the analysis grammar using
the LOAD or EBL_LOAD commands, and
also load the generation grammar using the LOAD_GENERATION
command.
Note that if you are building a translation application, it will usually be
convenient to have a separate config file for the target language generation
grammar, which you will just use for developing this grammar.
You can compile grammars that have been created using grammar specialisation into generation form, using the command EBL_LOAD_GENERATION <SubdomainTag>. If you omit the argument, it is assumed to have the value 'default'. The specialised grammar with tag <SubdomainTag> is compiled into generation form, and the compiled version is stored in the location referenced by the config file entry generation_grammar(<SubdomainTag>).
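The corresponding config file entry might look something like the following sketch (the path alias and file name are hypothetical):
regulus_config(generation_grammar(default), my_runtime('specialised_generator.pl')).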
If the generation grammar is ambiguous, in the sense that several surface
strings can be generated from one logical form, the order in which the strings
are generated is in general not defined. You can induce a specific ordering by
adding a generation preferences file. The
format of entries in this file is
generation_preference(<WordList>,
<PreferenceScore>).
where <WordList> is a
Prolog list of surface words, and <PreferenceScore>
is a positive or negative number. The effect of the generation preferences is
to define a score on each generated string, calculated by summing the
preference scores for all substrings. So for example, if the grammar generates
both "on the evening" and "in the evening" as semantically
indistinguishable expressions, we could prefer "in the evening" by
adding one or both of the following declarations:
% Prefer "in the evening"
generation_preference([in, the,
evening], 1).
% Disprefer "on the evening"
generation_preference([on, the,
evening], -1).
Regulus possesses an extensive infrastructure allowing it to be used for
building speech translation systems. Most of this part of the system has been
developed under the Open Source MedSLT
project, which contains further documentation and numerous examples. To use
the translation mechanisms, you need to declare the necessary translation-related files in the config file. You can then run the top-level Regulus
development environment in translation
mode. Translation can be run both interactively and in batch. It is also possible to call the translation routines from
Prolog, so as to incorporate them into a speech translation application.
Translation can be performed using either a transfer-based or an interlingual
framework. In a transfer-based framework, source-language representations are
transferred directly into target-language representations. In an interlingual
framework, translation goes through the following levels of representation:
· Source level. The representation produced by the source language grammar.
· Source discourse level. This is intended to be a slightly regularised version of the source representation, suitable for carrying out ellipsis processing. Ellipsis processing makes it possible to translate elliptical phrases in the context of the preceding dialogue.
· Interlingual level. This is intended to act as a neutral representation that reduces the differences between source and target language representations.
· Target level. The level from which the target language grammar generates surface form.
All kinds of transformations (source to target in the transfer based
framework; source to source discourse, source discourse to interlingua
and interlingua to target in the interlingual one) are implemented using the
same transfer rule formalism.
When the target language representation has been produced, it needs to be
converted into a surface string using a generation grammar. If the
generation grammar is ambiguous (i.e. one representation can produce multiple
surface strings), it is possible to define generation preferences. The output of
the generation grammar can be further post-processed using collocation rules and orthography rules.
In order to build a translation application, you need to declare at least some of the following files.
· A transfer rules file (optional) defined by the transfer_rules config file entry.
· An interlingua_declarations file (optional) defined by the interlingua_declarations config file entry.
· A to_source_discourse_rules file (optional) defined by the to_source_discourse_rules config file entry.
· A to_interlingua rules file (optional) defined by the to_interlingua_rules config file entry.
· A from_interlingua rules file (optional) defined by the from_interlingua_rules config file entry.
· An ellipsis classes file (optional) defined by the ellipsis_classes config file entry. If this is defined, you need to compile it first using the COMPILE_ELLIPSIS_PATTERNS command.
· A generation grammar file (required) defined by the generation_rules config file entry. This should be the compiled form of a Regulus grammar for the target language. The compiled generation grammar must first be created using the LOAD_GENERATION command.
· A collocations file (optional) defined by the collocation_rules config file entry.
· An orthography rules file (optional) defined by the orthography_rules config file entry
You must define EITHER a transfer_rules file OR both a to_interlingua rules file and a from_interlingua rules file.
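As an illustrative sketch (all the file names below are hypothetical), the relevant part of a config file for an interlingua-based translation application might look like this:
regulus_config(to_source_discourse_rules, '$MY_APP/Rules/to_source_discourse.pl').
regulus_config(to_interlingua_rules, '$MY_APP/Rules/to_interlingua.pl').
regulus_config(from_interlingua_rules, '$MY_APP/Rules/from_interlingua.pl').
regulus_config(interlingua_declarations, '$MY_APP/Rules/interlingua_declarations.pl').
regulus_config(ellipsis_classes, '$MY_APP/Rules/ellipsis_classes.pl').
regulus_config(generation_rules, '$MY_APP/Generated/target_generator.pl').
regulus_config(orthography_rules, '$MY_APP/Rules/orthography.pl').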
You can put the Regulus top-level into translation mode using the TRANSLATE command. You can then go back into normal mode
using the NO_TRANSLATE command. You can switch
between the transfer-based framework and the interlingual framework using the INTERLINGUA and NO_INTERLINGUA
commands. The default in translation mode is to assume the transfer-based
framework. Translation mode is compatible with surface parsing, so if you
invoke the SURFACE command while in translation mode,
you will perform translation using surface processing. This of course assumes
that you have defined and loaded the files required for surface parsing.
The following example, which uses the English to French language version of the
MedSLT system, illustrates interaction in translation mode. Note that the
second sentence is translated in the context set up by the first one, with the
ellipsis resolution being carried out at source_discourse level.
>> do you have headaches in the evening
Source: do you have headaches in the evening+*no_preceding_utterance*
Target: avez-vous vos maux de tête le soir
Other info:
n_parses=1
source_representation=[[prep,in_time],[pronoun,you],[spec,the_sing],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
source_discourse=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
resolved_source_discourse=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
resolution_processing=trivial
interlingua=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
target_representation=[[pronoun,vous],[path_proc,avoir],[symptom,mal_de_tête],[tense,present],[temporal,soir],[utterance_type,sentence],[voice,active]]
n_generations=1
other_translations=[]
>> in the morning
Source: in the morning+avez-vous vos maux de tête le soir
Target: avez-vous vos maux de tête le matin
Other info:
n_parses=1
source_representation=[[prep,in_time],[spec,the_sing],[time,morning],[utterance_type,phrase]]
source_discourse=[[time,morning],[utterance_type,phrase]]
resolved_source_discourse=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,morning],[utterance_type,ynq],[voice,active]]
resolution_processing=ellipsis_substitution(when_pain_appears)
interlingua=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,morning],[utterance_type,ynq],[voice,active]]
target_representation=[[pronoun,vous],[path_proc,avoir],[symptom,mal_de_tête],[tense,present],[temporal,matin],[utterance_type,sentence],[voice,active]]
n_generations=1
other_translations=[]
It is possible to perform batch translation, using input data in both text
and speech form. The results can then be judged, and the judgements stored to
use for future regression testing.
Batch translation of text data
Input text data for batch translation should be placed in the file defined by
the translation_corpus config file entry. The
format is sent(Atom), so for
example a typical line would be
sent('switch off the light').
(note the closing period). Translation is performed using the TRANSLATE_CORPUS command. The output file is
defined by the translation_corpus_results
config file entry.
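As a concrete illustration, the relevant config file entries might look like the following, assuming the regulus_config(Key, Value) format used for the other config entries in this document; the file names are purely invented:
regulus_config(translation_corpus, '$MY_APP/corpora/translation_corpus.pl').
regulus_config(translation_corpus_results, '$MY_APP/corpora/translation_corpus_results.pl').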
Batch translation of speech data
Input speech data for batch translation is in the form of recorded wavfiles.
The names of these wavfiles, together with accompanying transcriptions, should
be placed in the file defined by the translation_speech_corpus
config file entry. The format is <Wavfile>
<Words>, so for example a typical line would be
C:/Regulus/data/utt03.wav switch off the light
Translation is performed using the TRANSLATE_SPEECH_CORPUS command. The output
file is defined by the translation_speech_corpus_results
config file entry. A second output file, defined by the translation_corpus_tmp_recognition_judgements
config file entry, contains "blank" recognition judgements.
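Again purely as an illustration, with invented file names and assuming the same regulus_config format, the speech-mode entries might be:
regulus_config(translation_speech_corpus, '$MY_APP/corpora/speech_translation_corpus.txt').
regulus_config(translation_speech_corpus_results, '$MY_APP/corpora/speech_translation_results.pl').
regulus_config(translation_corpus_tmp_recognition_judgements, '$MY_APP/corpora/tmp_recognition_judgements.pl').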
Since it usually takes much longer to perform speech recognition on a batch
file than to translate the resulting text, the TRANSLATE_SPEECH_CORPUS_AGAIN command
makes it possible to re-run translation on the saved recognition results. This
is useful if you are testing speech translation performance, but have only
changed the translation or generation files.
Judging translations
To judge the results of translation, manually edit the output file. This should
contain question marks for translations that have not yet been judged. If
these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the
new judgements can be incorporated into the translation judgements file
(defined by the translation_corpus_judgements
config file entry) using the commands UPDATE_TRANSLATION_JUDGEMENTS
(results of text translation) and UPDATE_TRANSLATION_JUDGEMENTS_SPEECH
(results of speech translation).
You can revise judgements by simply changing them in the output file. If you
then use UPDATE_TRANSLATION_JUDGEMENTS
or a similar command, the translation judgements file will be updated
appropriately.
Judging recognition
In speech mode, the second output file, defined by the
translation_corpus_tmp_recognition_judgements
config file entry, contains recognition judgements. This file should also be
manually edited, and the question marks replaced with either 'y' (acceptable
recognition) or 'n' (unacceptable recognition). The recognition judgements file
can be updated using the UPDATE_RECOGNITION_JUDGEMENTS
command. Sentences judged as unacceptably recognised are explicitly
categorised as such, and not, for example, as translation errors.
[Not yet written. How to incorporate translation into an application. What to load, in terms of both code and rules. Pointer to MedSLT.]
Translation files can be used for both transfer and interlingual processing.
In the case of transfer, there should be one file, specified by the transfer_rules config file entry. In the case of interlingua, there should be two files, specified by the from_interlingua_rules and to_interlingua_rules config file entries. The
format of the files is the same in both cases. The intent is to define a
mapping from a source representation to a target representation. Both
representations are lists of items.
Translation files can contain two kinds of entries, respectively for transfer_lexicon items and transfer_rule items. A transfer_lexicon item is of the form
transfer_lexicon(SourceItem,
TargetItem).
where the assumption is that SourceItem
can appear as an ELEMENT of the list that makes up the source
representation. The semantics is that the SourceItem
is to be replaced in the target-language representation by the TargetItem. Here are some examples of
English to French transfer lexicon items from the MedSLT system:
transfer_lexicon([adj, deep],
[body_part, profond]).
transfer_lexicon([body_part, face],
[body_part, visage]).
transfer_lexicon([event, relieve],
[event, soulager]).
transfer_lexicon([freq, often],
[frequency, souvent]).
A transfer_rule item is of the form
transfer_rule(SourceItemList,
TargetItemList).
where the assumption is that SourceItemList
is a LIST OF ONE OR MORE ELEMENTS in the list that makes up the source
representation, and TargetItemList is
a list of zero or more elements. The semantics are that the SourceItemList is to be replaced in
the target-language representation by the TargetItemList.
Here are some examples of English to French transfer rule items from the MedSLT
system:
transfer_rule([[tense,present],
[aspect,perfect]], [[tense, passé_composé]]).
transfer_rule([[event, make_adj], [adj,
worse]], [[event, aggraver]]).
transfer_rule([[state, spread], [prep,
to_loc]], [[path_proc, irradier]]).
transfer_rule([[spec, the_sing]], []).
The order of the elements in the left- and right-hand sides of a transfer
rule DOES NOT matter.
The ordering of transfer rules in the file DOES matter. If more than one
rule can be applied, the choice is made as follows:
· If it is possible to apply a transfer rule and a transfer lexicon entry, then the transfer rule is chosen.
· If more than one transfer rule can be applied, then the rule with the longer left-hand side is chosen.
· If the criterion of choosing the longest left-hand side still results in a tie, then the rule appearing first in the file is chosen.
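For instance, suppose the source representation contains both [event, make_adj] and [adj, worse], and the rule file contains the entries below (the second rule and the lexicon entry are invented for the sake of the example). A transfer rule is preferred to the transfer lexicon entry, and the two-element left-hand side beats the one-element one, so the first rule is applied and the output contains [event, aggraver]:
transfer_rule([[event, make_adj], [adj, worse]], [[event, aggraver]]).
transfer_rule([[adj, worse]], [[adj, pire]]).
transfer_lexicon([adj, worse], [adj, pire]).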
Transfer rules can optionally include conditions, which most often will be
conditions on the context in which the rule is called. A transfer rule with
conditions is of the form
transfer_rule(SourceItemList,
TargetItemList) :- Conditions.
A simple condition is of one of the forms
· number(<Term>) where <Term> is any term.
· context(<Element>) where <Element> is a single representation element.
· context_above(<Element>) where <Element> is a single representation element.
· global_context(<Term>) where <Term> is any term.
The first case is number(<Term>).
This constrains <Term> to
be a number. For example, the rule
transfer_rule([[spec, N]], [[spec, N]])
:- number(N).
says that [spec, N] should be
translated into [spec, N] in the
case that N is a number.
The context element refers to
the context of the current clause. For example, the following rule says that [tense,present] should be translated
into [tense, passé_composé]
in a context which includes the element [frequency,
ever]:
transfer_rule([[tense,present]],
[[tense, passé_composé]]) :- context([frequency, ever]).
context_above is similar, but is
used for rules that will be invoked inside representations of subordinate
clauses. In this case, the context referred to is that of the main clause in
which the subordinate clause appears. For example, the next rule says that [tense,present] should be translated
into [tense, past] in a
subordinate clause which appears inside a main clause that includes the element
[tense, past]:
transfer_rule([[tense,present]],
[[tense, past]]) :- context_above([tense, past]).
Context elements may be partially uninstantiated, for example:
transfer_rule([[event, relieve]],
[[event, soulager]]) :- context([cause, _]).
It is also possible to define rules dependent on global context. Global
context is defined in the config file, using the global_context
declaration. It can be accessed using a context condition of the form global_context(<Element>)
where <Element> is a term
potentially defined by a global_context
declaration. For example, the following rule says that "it" should be
translated into "headache" if there is a global context declaration
of the form subdomain(headache):
transfer_rule([[pronoun, it]],
[[symptom, headache]]) :- global_context(subdomain(headache)).
You can combine context elements using conjunction, disjunction and negation,
as in usual Prolog syntax:
"Translate [event, relieve] into
[event, soulager] if there is BOTH
something matching [cause, _] AND
something matching [tense, present] in
the context"
transfer_rule([[event,
relieve]], [[event, soulager]]) :- context([cause, _]), context([tense,
present]).
"Translate [event, relieve] into
[event, soulager] if there is EITHER
something matching [cause, _] OR
something matching [symptom, _] in
the context"
transfer_rule([[event,
relieve]], [[event, soulager]]) :- context([cause, _]) ; context([symptom, _]).
"Translate [tense, present] into
[tense, present] if there
is NOT anything matching [frequency, ever] in the context"
transfer_rule([[tense,present]],
[[tense, present]]) :- \+ context([frequency, ever]).
Transfer rules may also contain constrained transfer variables. The syntax
of a constrained transfer variable is different depending on whether it occurs
on the left hand or the right hand side of the rule.
On the left hand side, the variable is written as tr(<Id>, <Constraint>), where <Id> is an atomic identifier,
and <Constraint> is a
pattern matching a single LHS element. Thus the following are valid LHS
transfer variables:
tr(device, [device, _])
tr(tense_var, [tense, _])
On the right hand side, the variable is written as simply tr(<Id>), where
<Id> is an atomic identifier. Thus the following are valid RHS transfer
variables:
tr(device)
tr(tense_var)
Each LHS variable must be associated with at least one RHS variable (normally,
it will be associated with exactly one). Conversely, each RHS variable must be
associated with exactly one LHS variable. The semantics are that if an element
on the LHS which matches the constraint in the transfer variable tr(<Id>, <Constraint>) can
be translated using other transfer rules, the translation will appear in the
RHS in the place marked by the associated RHS variable(s) tr(<Id>). Thus the following example
transfer_rule([[action, switch],
[onoff, off], tr(device, [device, _])],
[[action, switch_off], tr(device)]).
can be read: "Match the the elements [action,
switch], [onoff, off] and
[device, _] in the LHS, and
replace them with the element [action,
switch_off] together with the translation of the element matching [device, _]".
Transfer lexicon entries and unconditional transfer rules can be
bidirectional. A bidirectional lexicon entry has the form
bidirectional_transfer_lexicon(LHSItem,
RHSItem).
where the two arguments are of the same forms as those in a normal transfer
lexicon entry. Similarly, a bidirectional transfer rule is of the form
bidirectional_transfer_rule(LHSItem,
RHSItem).
where the arguments again have the same form as in a normal transfer rule.
By default, a bidirectional lexicon entry or rule is compiled in the same way
as its normal counterpart. However, if the list of transfer rules contains the
dummy element
transfer_direction(backward)
it is compiled as though the arguments had appeared in the reverse
order, i.e. with RHSItem first
and LHSItem second.
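For example, the transfer lexicon entry and transfer rule shown earlier could be made bidirectional by writing them as follows:
bidirectional_transfer_lexicon([body_part, face], [body_part, visage]).
bidirectional_transfer_rule([[event, make_adj], [adj, worse]],
[[event, aggraver]]).
If the same list of rules also contains the dummy element transfer_direction(backward), these entries are compiled as French to English rules, i.e. as though they had been written with visage on the left and face on the right, and similarly for the rule.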
The commands TRANSLATION_TRACE_ON and TRANSLATION_TRACE_OFF can be used to toggle
translation tracing, which is off by default. When translation tracing is on,
translation prints out a trace associating the source and target atoms used by
each rule. Here is a typical example:
>> where is the pain
Transfer trace, "to_source_discourse"
[[loc,where]] --> [[loc,where]]
[ENG_TO_NEW_SOURCE_DISCOURSE:417-417]
[[secondary_symptom,pain]] --> [[symptom,pain]] [ENG_TO_NEW_SOURCE_DISCOURSE:1-3]
[[tense,present]] --> [[tense,present]]
[ENG_TO_NEW_SOURCE_DISCOURSE:474-474]
[[utterance_type,whq]] --> [[utterance_type,whq]] [ENG_TO_NEW_SOURCE_DISCOURSE:502-502]
[[verb,be]] --> [[verb,be]]
[ENG_TO_NEW_SOURCE_DISCOURSE:505-505]
[[voice,active]] --> [[voice,active]]
[ENG_TO_NEW_SOURCE_DISCOURSE:836-837]
------------------------------- FILES -------------------------------
ENG_TO_NEW_SOURCE_DISCOURSE:
c:/cygwin/home/speech/speechtranslation/medslt2/eng/prolog/eng_to_new_source_discourse.pl
Transfer trace, "to_interlingua"
[[loc,where]] --> [[loc,where]]
[ENG_TO_NEW_INTERLINGUA:634-634]
[[symptom,pain]] --> [[symptom,pain]]
[ENG_TO_NEW_INTERLINGUA:103-104]
[[tense,present]] --> [[tense,present]]
[ENG_TO_NEW_INTERLINGUA:689-689]
[[utterance_type,whq]] --> [[utterance_type,whq]] [ENG_TO_NEW_INTERLINGUA:702-702]
[[verb,be]] --> [[verb,be]]
[ENG_TO_NEW_INTERLINGUA:705-705]
[[voice,active]] --> [[voice,active]]
[ENG_TO_NEW_INTERLINGUA:1021-1022]
------------------------------- FILES -------------------------------
ENG_TO_NEW_INTERLINGUA:
c:/cygwin/home/speech/speechtranslation/medslt2/eng/prolog/eng_to_new_interlingua.pl
Transfer trace, "from_interlingua"
[[loc,where],[symptom,pain],[verb,be]] -->
[[locative,où],[path_proc,avoir],[symptom,mal],[pronoun,vous]] [INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:293-294]
[[tense,present]] --> [[tense,present]]
[INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:96-96]
[[utterance_type,whq]] --> [[utterance_type,wh]] [INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:15-15]
[[voice,active]] --> [[voice,active]]
[INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:311-312]
------------------------------- FILES -------------------------------
INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:
c:/cygwin/home/speech/speechtranslation/medslt2/fre/prolog/interlingua_fre_bidirectional_main.pl
Source: where is the pain+où avez-vous mal
Target: où avez-vous mal
Other info:
n_parses=1
source_representation=[[loc,where],[secondary_symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
source_discourse=[[loc,where],[symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
resolved_source_discourse=[[loc,where],[symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
resolution_processing=trivial
interlingua=[[loc,where],[symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
target_representation=[[locative,où],[path_proc,avoir],[pronoun,vous],[symptom,mal],[tense,present],[utterance_type,wh],[voice,active]]
n_generations=1
other_translations=[]
If you are using interlingual translation, all the constants that refer to
the interlingual level must be declared in the interlingua_declarations file. This file
contains a set of Prolog clauses of one of the forms
interlingua_constant([<Key>,
<Value>]).
interlingua_constant([<Key>, <Value>]) :- <Body>.
Here, <Key> and <Value> can be any Prolog terms.
In the second case, <Body>
can be any expression that can legitimately appear in the body of a Prolog
rule.
For example, if you want to write the English-to-interlingua rule
transfer_rule([[event, make_adj], [adj,
worse]],
[[event, make_worse]]).
you need to add the declaration
interlingua_constant([event,
make_worse]).
Similarly, if you want to write the interlingua-to-French rule
transfer_rule([[spec, [more_than, N]]],
[[comparative, plus_de], [number, N]]).
you need to add the declaration
interlingua_constant([spec, [more_than,
N]]) :- ( number(N) ; var(N) ).
The body of the declaration needs to be written this way because the
interlingua declarations may be invoked either at rule-compilation time or at
runtime. At rule-compilation time, the N
in the left hand side of the interlingua-to-French rule will be an unbound
variable. At runtime, it will be instantiated to a number.
In general, you should not be adding many items to the interlingua
declarations file; the whole point is to limit the set of constants that appear
in a multi-lingual application, and encourage developers working on different
languages to use the same constants.
It is possible to specify the interlingua more tightly by defining an interlingua structure grammar. This is a Regulus grammar whose purpose is explicitly to define the range of semantic forms which constitute valid interlingua. The interlingua structure grammar is used as follows.
1. The interlingua structure grammar itself is
compiled in generation mode. The config file used to specify it must contain
the following lines:
regulus_config(generation_module_name,
check_interlingua).
regulus_config(top_level_generation_pred,
check_interlingua).
It must also include a line of the form
regulus_config(generation_grammar,
<CompiledFile>).
where <CompiledFile>
is the name of the file used to hold the generation-compiled version of the
interlingua structure grammar.
2. The translation application must contain a declaration of the form
regulus_config(interlingua_structure,
<CompiledFile>).
where <CompiledFile>
is the name of the generation-compiled interlingua structure grammar.
If an interlingua structure grammar is being used in a translation application, an extra line is produced in the translation trace, giving the surface form generated by the interlingua structure grammar. If the interlingua structure grammar fails to generate anything (i.e. the interlingua form is outside its coverage), this line will contain the value WARNING: INTERLINGUA REPRESENTATION FAILED STRUCTURE CHECK.
It is possible to get informative feedback about generation failures in the interlingua structure grammar using the top-level command INTERLINGUA_DEBUGGING_ON. When this mode is enabled, generation failures are followed by attempts to generate from variants of the interlingua formed by inserting, deleting or substituting elements. This will often enable the developer to identify the reason why an interlingua form fails to respect the constraints imposed by the interlingua structure grammar. Interlingua debugging mode is disabled using the command INTERLINGUA_DEBUGGING_OFF.
You can define and use macros in translation rule files. These macros have
the same syntax and semantics as the macros used in grammar
files. To take a simple example, suppose that a file of English to French
transfer rules contains the macro definition
macro(device(Eng, Fre),
transfer_lexicon([device,
Eng], [device, Fre])).
The macro call
@device(light, lampe).
would then be equivalent to the transfer lexicon entry
transfer_lexicon([device, light],
[device, lampe]).
Just as with grammar macros, transfer macros may be non-deterministic. The
following more complex example illustrates this. Suppose that we have included
the following macro definitions in a set of English to Interlingua transfer
rules:
macro(onoff(on), [[onoff,
on]]).
macro(onoff(off), [[onoff, off]]).
macro(switch_onoff(on), [action,
switch_on]).
macro(switch_onoff(off), [action,
switch_off]).
The macro call
transfer_rule([[action, switch],
@onoff(OnOff)],
[@switch_onoff(OnOff)]).
would then be equivalent to the two transfer rules
transfer_rule([[action, switch],
[onoff, on]],
[[action, switch_on]]).
transfer_rule([[action, switch],
[onoff, off]],
[[action, switch_off]]).
Macro definitions DO NOT need to precede associated macro calls, but
macro definitions MUST be included in the set of transfer files where
the calls appear.
If you are using the interlingual framework, it is possible to perform
simple context-dependent translation by defining an ellipsis_classes
file. This file contains ellipsis class definitions. Each definition is of the
form
ellipsis_class(<Id>,
<Examples>).
where <Id> is an arbitrary
identifier, and <Examples>
is a list of source-language phrases having the property that one phrase in the
list could reasonably be substituted for another if it appeared as an
elliptical phrase. Thus the definition
ellipsis_class(since_when,
['for months',
'for more than a day',
'for several days']).
specifies that phrases of the forms 'for months', 'for more than a day' and
'for several days' are all intersubstitutable as elliptical phrases. So for
example if the preceding utterance was "have you had headaches for
months", the phrase "for several days" would be translated as
though it were "have you had headaches for several days".
The ellipsis-processing mechanism assumes that source discourse level semantic
representations will be lists of elements of the form [Type, Value]. Except in the case of WH-ellipsis, examples are generalised by keeping only
the Type part of the
representation. For example, in the MedSLT system the interlingual
representation produced by the phrase 'for several days' is
[[prep,duration_time],[spec,several],[timeunit,day]]
and this is generalised to
[[prep,_],[spec,_],[timeunit,_]]
When more than one ellipsis processing pattern can be applied, the interpreter
chooses the one that matches THE LONGEST POSSIBLE SUBLIST of the current
context.
The ellipsis_classes file is compiled using the command COMPILE_ELLIPSIS_PATTERNS. The file used to hold the compiled version of the ellipsis classes can be specified explicitly using the compiled_ellipsis_classes config file entry. This is normally only useful if you for some reason want to maintain two different sets of compiled ellipsis rules for the same domain.
WH-ellipsis
It is also possible to perform ellipsis resolution on WH-questions. For
example, if the preceding question is "how often does the headache
occur", then "several times a day" can reasonably be interpreted
as meaning "does the headache occur several times a day".
The ellipsis declarations mechanism contains an extension that can be used to support ellipsis on WH-questions. An example in a class can be tagged as a WH-element, e.g.
ellipsis_class(frequency,
['several times a day',
'once a day',
'at least once a day',
'every day',
wh-'how often'
]).
Here, the example 'how often' is tagged as a WH-element. WH-ellipsis elements are not generalized, and must be matched exactly.
WH-elements can also be made dependent on a context. For example, if the preceding question was "what is your temperature", then "over ninety nine degrees" can reasonably be interpreted as meaning "is your temperature over ninety nine degrees". Here, "over ninety nine degrees" substitutes for "what", but we don't always want it to substitute for "what"; we only want this to happen in a context containing a phrase like "your temperature". We specify this using the declaration
ellipsis_class(temperature,
['over one hundred degrees',
wh_with_context-['what', 'your
temperature']]).
Here, 'what' is the substituted element, and 'your temperature' is the context. Context elements are not generalized.
A generation grammar file must be defined, using the generation_rules config file entry. This should
normally be the compiled form of a Regulus grammar for the target
language. When compiling the generation grammar, the following config file
entries are required:
regulus_config(generation_module_name,
generator).
regulus_config(top_level_generation_pred,
generate).
If the generation grammar is ambiguous, in the sense that one representation
can produce multiple surface strings, it may be desirable to define generation
preferences, to force some strings to be chosen ahead of others. The generation
preference file is defined using the generation_preferences
config file entry, and should contain entries of the form
generation_preference(<ListOfWords>,
<Score>).
where <ListOfWords> is a
list of one or more surface words, and <Score>
is an associated numerical score. For example, if the intent is to
prefer the collocation "in the night" and disprefer the collocation
"at the night", it might be appropriate to add the two declarations
generation_preference([in, the, night],
1).
generation_preference([at, the, night],
-1).
If multiple generation results are produced, each result S is assigned a score
calculated by adding the generation preferences for all substrings of S, and
the result with the highest total score is preferred.
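The following is a minimal sketch, not the actual Regulus implementation, of how this scoring scheme could be realised in Prolog. The predicates score_candidate/2, contiguous_sublist/2 and sum_scores/2 are invented helpers; the two generation_preference facts are the ones from the example above:
:- use_module(library(lists)).   % for append/3
generation_preference([in, the, night], 1).
generation_preference([at, the, night], -1).
% Score is the sum of the preference scores of all declared word
% sequences that occur contiguously in the candidate string Words.
score_candidate(Words, Score) :-
        findall(S,
                ( generation_preference(Sub, S),
                  contiguous_sublist(Sub, Words) ),
                Scores),
        sum_scores(Scores, Score).
% Sub occurs as a contiguous sublist of List.
contiguous_sublist(Sub, List) :-
        append(_, Suffix, List),
        append(Sub, _, Suffix).
sum_scores([], 0).
sum_scores([X|Xs], Sum) :-
        sum_scores(Xs, Rest),
        Sum is X + Rest.
% Example: score_candidate([do, you, sleep, in, the, night], Score)
% gives Score = 1, so this candidate would be preferred over one
% containing "at the night", which would score -1.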
The surface form produced by generation can optionally be post-processed by
a set of collocation rules, defined using the collocation_rules config file entry. A
collocation rule file consists of a list of entries of the form
better_collocation(<LHSWords>,
<RHSWords>).
where the semantics are that sequences of words matching <LHSWords> will be replaced by <RHSWords>. Here are some examples from the English to French
version of MedSLT:
better_collocation("de
un", "d'un").
better_collocation("de
une", "d'une").
better_collocation("dans
le côté", "sur le côté").
A second optional post-processing stage can be defined using the orthography_rules config file entry. An
orthography rule file consists of a list of entries of the form
orthography_rewrite(<LHSString>,
<RHSString>).
where the semantics are that sequences of characters matching <LHSString> will be replaced by <RHSString>. If more than one
rule matches, the first one is used. Here are some examples from the English to
French version of MedSLT:
orthography_rewrite("t --
il", "t-il").
orthography_rewrite(" -- il",
"-t-il").
orthography_rewrite(" -- ",
"-").
You can define simple context-sensitive orthography rules by adding one or more
letter_class declarations, and
then using variables that range over letters of a specified type. For example,
we can capture the English rule that "a" becomes "an"
before a word starting with a vowel as follows:
letter_class('V', "aeiou").
orthography_rewrite(" a V1",
" an V1").
Here, the letter_class
declaration defines 'V' to be
the class of vowels, and the occurrences of V1
in the rule mean "a variable which can be any letter in the class 'V'". Note the spaces before the
"a" on the left-hand side and the "an" on the right-hand
side: these are necessary in order to match only the word "a", as
opposed to any word ending in an "a".
The syntax of the letter_class
declaration is
letter_class(<ClassID>,
<Letters>).
where <ClassID> is a
one-letter Prolog atom, and <Letters>
is a Prolog string. Variables in rules are written as a letter-class letter
followed by a single-digit number.
Regulus provides support for building Prolog dialogue applications, using a version of the "update semantics" model. The basic requirement is that all dialogue state must be represented as a Prolog term: there are no constraints on the form of this term. Dialogue processing code is supplied in the list of files which the dialogue_files config entry points to. These files need to define the following four predicates:
1.
lf_to_dialogue_move(+LF,
-DialogueMove)
Converts a logical form produced by a Regulus grammar into a "dialogue
move", the internal representation used by the dialogue manager. If you do
not wish to distinguish between LFs and dialogue moves, this predicate can be
trivial.
2.
initial_dialogue_state(?DialogueState)
Defines the initial value of the dialogue state object used by the DM.
3.
update_dialogue_state(+DialogueMove,
+InState, -AbstractAction, -OutState)
Takes as input a dialogue move and the previous dialogue state; returns an
"abstract action" and the new dialogue state.
4.
abstract_action_to_action(+AbstractAction,
-ConcreteAction)
Converts abstract actions into concrete actions. If you do not want to
distinguish between abstract actions and concrete actions, this predicate can
be trivial.
It is optionally possible to define two more top-level dialogue processing predicates:
1.
resolve_lf(+LF, +InState, -ResolvedLF, -Substitutions)
Performs context-dependent resolution on the logical form produced by the
Regulus grammar, with respect to the dialogue state. It returns a term ResolvedLF, and a possibly empty list Substitutions of items of the form Term1 --> Term2, detailing which substitutions
have been carried out to effect the transformation. This predicate will
typically be used to implement some kind of ellipsis resolution. It is called
first in the processing sequence, immediately before lf_to_dialogue_move/2.
2.
resolve_dialogue_move(+DialogueMove, +InState, -ResolvedDialogueMove)
Performs context-dependent resolution on the dialogue move produced by lf_to_dialogue_move/2, and is called
immediately after it. This predicate will typically be used to implement some
kind of reference resolution.
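To make the interface concrete, here is a minimal sketch, not the code from any of the example applications, of the four required predicates for an imaginary application controlling a single light. The state representation and the LF patterns used here are simplified inventions:
:- use_module(library(lists)).   % for memberchk/2
% Treat the logical form itself as the dialogue move.
lf_to_dialogue_move(LF, LF).
% The dialogue state is just the current status of the single light.
initial_dialogue_state(light(off)).
% A command containing an onoff element sets the light;
% a query reports its current status.
update_dialogue_state(Move, light(_), say(light(OnOff)), light(OnOff)) :-
        memberchk([type, command], Move),
        memberchk([onoff, OnOff], Move).
update_dialogue_state(Move, light(Status), say(light(Status)), light(Status)) :-
        memberchk([type, query], Move).
% Turn the abstract action into a concrete string to speak.
abstract_action_to_action(say(light(on)), say_string("the light is on")).
abstract_action_to_action(say(light(off)), say_string("the light is off")).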
Examples of simple dialogue processing applications are provided in the directories $REGULUS/Examples/Toy1 and $REGULUS/Examples/Toy1Specialised. In each case, the file Prolog/input_manager.pl defines the predicate lf_to_dialogue_move/2; Prolog/dialogue_manager.pl defines the predicates initial_dialogue_state/1 and update_dialogue_state/4; and Prolog/output_manager.pl defines the predicate abstract_action_to_action/2.
A more complex example of dialogue processing can be found in $REGULUS/Examples/Calendar. As before, Prolog/input_manager.pl defines the predicate lf_to_dialogue_move/2, Prolog/dialogue_manager.pl defines the predicates initial_dialogue_state/1 and update_dialogue_state/4, and Prolog/output_manager.pl defines the predicate abstract_action_to_action/2. There are however several more files: Prolog/resolve_lf.pl defines the predicate resolve_lf/4, and Prolog/resolve_dialogue_move.pl defines resolve_dialogue_move/3. Note also that the main body of the input manager is specified using the LF-patterns mechanism. The patterns themselves can be found in the file Prolog/lf_patterns.pl.
Dialogue processing files can be compiled directly from the Regulus
top-level using the LOAD_DIALOGUE command, and the DIALOGUE command
puts the Regulus top-level into a dialogue processing mode. In this mode, text
utterances are parsed and the output is passed to the dialogue processing code.
The following edited session shows an example of how to use these commands to
run the dialogue application defined in $REGULUS/Examples/Toy1:
| ?- ['$REGULUS/Prolog/load'].
(... loads Regulus code files ...)
| ?-
regulus('$REGULUS/Examples/Toy1/scripts/toy1_dialogue.cfg').
Loading settings from Regulus config file
c:/home/speech/regulus/examples/toy1/scripts/toy1_dialogue.cfg
>> LOAD
(... loads Toy1 grammar ...)
>> LOAD_DIALOGUE
(... compiles dialogue processing files ...)
>> DIALOGUE
(Do dialogue-style processing on input sentences)
>> switch on the light in the kitchen
Old state:
[device(light,kitchen,off,0),device(light,living_room,off,0),device(fan,kitchen,off,0)]
LF:
[[type,command],[action,switch],[onoff,on],[device,light],[location,kitchen]]
Dialogue move: [command,device(light,kitchen,on,100)].
Abstract action: say(device(light,kitchen,on,100))
Concrete action: say_string("the light in the kitchen is on")
New state:
[device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]
Dialogue processing time: 0.00 seconds
>> is the light switched on
Old state:
[device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]
LF:
[[type,query],[state,be],[onoff,on],[device,light]]
Dialogue move: [query,device(light,_,on,_)].
Abstract action: say(device(light,kitchen,on,100))
Concrete action: say_string("the light in the kitchen is on")
New state:
[device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]
Dialogue processing time: 0.01 seconds
>> switch off the light
Old state:
[device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]
LF:
[[type,command],[action,switch],[onoff,off],[device,light]]
Dialogue move: [command,device(light,_,off,0)].
Abstract action: say(device(light,kitchen,off,0))
Concrete action: say_string("the light in the kitchen is off")
New state:
[device(light,kitchen,off,0),device(light,living_room,off,0),device(fan,kitchen,off,0)]
Dialogue processing time: 0.00 seconds
There are dialogue processing commands to perform most obvious kinds of corpus-based regression testing, in both text and speech mode. These commands are extremely similar to the corresponding ones for translation.
Regression testing files for dialogue applications may contain the following types of items:
The logical forms produced by a Regulus grammar typically contain a lot of useful structure, but none the less are not suitable for direct use in a dialogue application. In most cases, the input manager is responsible for performing some kind of non-trivial transformation that turns the logical form into a dialogue move.
It is often possible for some or all of the structure of a
dialogue move to be a list of feature-value pairs, whose values are determined
by searching for combinations of patterns in the LF. The lf_pattern mechanism
is designed to help build applications of this type. The implementor needs to
supply the following pieces of information:
An lf_pattern
declaration is of the following basic form:
lf_pattern(<MainPattern>,
<ConditionPattern>,
<Feature>=<Value>)
:-
<Body>.
Here, <MainPattern>
is a Prolog term that is to match some part of the LF, <ConditionPattern> is a Boolean combination of
Prolog terms, <Feature> is
an atom, <Value> is an
arbitrary Prolog term, and <Body>
is arbitrary Prolog code. The semantics are "If the LF contains an
occurrence of <MainPattern>
at a given place P, the combination of patterns <ConditionPattern>
matches anywhere in the LF, and <Body>
evaluates to true, then the assignment <Feature>=<Value>
is added at place P." It is possible to omit either the <ConditionPattern> or the <Body> or both. Boolean combinations are produced
using the operators ','
(conjunction), 'or'
(disjunction) and 'not'
(negation).
For example, the Calendar lf_pattern
declaration
lf_pattern([around_time,
time(H, M, PartOfDay)],
in_interval=referent(approximate_time(H, M,
PartOfDay))) :-
number(H),
number(M).
says that if the piece of logical form [around_time, time(H, M, PartOfDay)] is matched at some point in the LF, and H and M are both numbers, then this gives rise to the feature-value assignment in_interval=referent(approximate_time(H, M, PartOfDay)).
It is also possible to create nested representations using the lf_boundary declaration, which is of the form
lf_boundary(<Pattern>,
X^<Form>) :-
<Body>.
where <Form> is a Prolog term which contains an occurrence of the variable X. This says that a piece of logical form matching <Pattern> should give rise to an occurrence of <Form> in the output dialogue move; also, if any other patterns are matched on terms inside <Pattern>, then they are included in a list which becomes the value of the variable X. The effect is to say that <Form> "wraps around" all the feature-value pairs corresponding to structure inside <Pattern>. For example, the declaration
lf_boundary(term([ordinal, the_sing, N], meeting, _Body),
X^aggregate(nth_meeting(N), X)) :-
number(N).
says that a piece of logical form matching term([ordinal, the_sing, N], meeting, _Body), where N is a number, gives rise to a piece of dialogue move aggregate(nth_meeting(N), X), where X is a list containing all the feature-value pairs corresponding to structure inside term([ordinal, the_sing, N], meeting, _Body).
Many examples of lf_pattern and lf_boundary declarations can be found in the file $REGULUS/Examples/Calendar/Prolog/lf_patterns.pl
It is generally desirable to add some kind of help component to a grammar-based application, to give users feedback about the system's supported coverage. Experience shows that systems lacking a help component are generally very hard to use, while addition of even simple help functionality tends to create a dramatic improvement in usability. The Regulus platform contains tools that make it easy to add help functionality to a speech translation or spoken dialogue system based on a Regulus grammar.
The basic model assumed is as follows. The user provides two resources:
At runtime, the system carries out recognition using both the main Regulus-based recognizer, and also a backup statistical recognizer. The output from the statistical recognizer is matched against the help corpus, also taking account of the equivalence classes. The system returns the N examples which match most closely.
The config file needs to contain the following three entries:
The source data from which the help examples are extracted can be of several possible kinds:
The help class declarations can be of one of two possible types. Simple help class declarations are of the form
help_class_member(<SurfaceForm>, <ClassId>).
for example
help_class_member(show, list_verb).
help_class_member((what, people), who_expression).
Here, <SurfaceForm> is either a single atom, or a list of atoms enclosed in parentheses; <ClassId> is an arbitrary Prolog atom. The effect is to declare that <SurfaceForm> belongs to the equivalence class <ClassId>.
Complex class declarations make use of the Regulus lexicon's feature system, and have the syntax
help_class_member(<V>, <ClassId>) :-
lex_entry((<Cat>:<Feats>
--> <V>)).
where <V> is an arbitrary Prolog variable, <ClassId> is an arbitrary Prolog atom, <Cat> is a Regulus category symbol, and <Feats> is a possibly empty list of Regulus feature-value assignments valid for <Cat>. For example, the following complex declarations are used in the Calendar application:
% All words of
category 'p' belong to the class 'preposition'.
help_class_member(Surface, preposition) :-
lex_entry((p:[]
--> Surface)).
% All words of category 'd' such that article=n and det_type=ordinal belong
to the class 'ordinal_det'.
help_class_member(Surface, ordinal_det) :-
lex_entry((d:[article=n,
det_type=ordinal] --> Surface)).
% All words of
category 'name' with sem_n_type=agent belong to the class 'person_name'
help_class_member(Surface, person_name) :-
lex_entry((name:[sem_n_type=agent]
--> Surface)).
Help class files may also contain macros, and as usual it may be easier to structure declarations by careful use of macros. For example, the following declarations from the English MedSLT help file define all nouns with semantics of the form [[timeunit, _]] to belong to the class time_period:
macro(sem_help_class_member(Category, Sem, Class),
( help_class_member(Surface, Class) :-
lex_entry((Category:[sem=Sem]
--> Surface))
)).
%
Time periods: seconds, minutes, etc.
@sem_help_class_member(n, [[timeunit, _]], time_period).
The class stop_word has a special meaning: all words which are backed off to this class are ignored. So for example the English help class declaration
help_class_member(Surface,
stop_word) :-
lex_entry((d:[article=y] -->
Surface)).
says that all words in category d such that article=y should be ignored.
COMPILE_HELP. Compile the help resources.
LOAD_HELP. Load compiled help resources.
HELP_RESPONSE_ON. Switch on help responses in the Regulus top loop. In this mode, the help module is called for each input sentence, and the top 5 help matches are printed.
HELP_RESPONSE_OFF. Switch off help responses in the Regulus top loop.
LIST_MISSING_HELP_DECLARATIONS. Write out a list of lexical items that are not listed in targeted help declarations.
The predicate get_help_matches/3, defined in $REGULUS/Prolog/help.pl, can be used to retrieve a help response. This assumes that the help resources are loaded. A call is of the form
get_help_matches(Sent, N, Matches)
where Sent is an atom representing an utterance, and N is the required number of help responses. Matches will be instantiated to a list of the N most closely matching help responses.
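For example, assuming help resources have been compiled and loaded, a query might look like this (the input sentence is invented):
?- get_help_matches('switch on the light in the kitchen', 3, Matches).
Matches would then be bound to a list of the three help sentences from the loaded help corpus that match the input most closely.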
Regulus syntax is Prolog-based - that is, every well-formed Regulus expression is a Prolog term. There are two basic kinds of Regulus expressions: declarations and rules, both of which can be enclosed inside an optional label. Note that there is no formal distinction between a grammar rule and a lexical entry. It is also possible to use comments, macros and include statements.
Comments are Prolog-style; anything between a percent-sign (%) and the end of a
line is a comment; alternatively, comments can be enclosed between an initial
/* and a closing */.
Examples
% This is a comment
yesno:[sem=no] --> no, fucking, way. % This line needs a comment...
% This is a
% multi-line
% comment
/* And this is a
multi-line comment
too */
A rule or declaration can optionally be given a (possibly non-unique)
identifier. This makes it possible to add ignore_item
declarations to ignore specified labelled rules. Note that Prolog syntax
requires an extra pair of parentheses around a rule if it is enclosed in a
label.
Syntax
labelled_item(<Label>,
<RuleOrDeclaration>).
where <Label> is a Prolog atom
and <RuleOrDeclaration> is any
rule or declaration.
Examples
labelled_item(foo_macro,
macro(foo(X), [f1=X, f2=@bar])
).
labelled_item(foo_rule,
(foo:[] --> bar)
).
Regulus permits definition of macros . These
macros may be used in rules and category declarations. (It is possible that they
may also be used in other declarations in later versions of Regulus). The
syntax of a macro invocation is
@<Term>
where <Term> is a Prolog term that unifies with the head of a macro
definition. Macro definitions can be of two forms: macros
, and default_macros , with the
semantics that macros take precedence over default_macros.
The semantics of macro invocation are as follows.
1. All terms of the form @<Term> which unify with the head of a macro definition are non-deterministically replaced with the bodies of the matching macro definitions.
2. If no definition matches, an error is signalled.
3. If the macro invocation appears in the context of a feature/value list, and the macro body expands to a list, then the body is appended to the rest of the feature/value list.
4. If the macro body itself contains macro invocations, it is recursively expanded until the result contains no macro invocations.
5. If macro invocation results in a cycle, an error is signalled.
6. If there are any matching definitions of the form "macro(<Term>, <Body>)", then the "macro" definitions are used.
7. If there are no matching definitions of the form "macro(<Term>, <Body>)", but there are definitions of the form "default_macro(<Term>, <Body>)" then the "default_macro" definitions are used.
Examples
Suppose we have the following macro and default_macro definitions:
macro(foo(X), [f1=X, f2=@bar]).
macro(bar, c).
macro(bar, d).
default_macro(bar, e).
default_macro(frob, z).
Then the rule
cat1:[f3= @bar] --> word1.
expands to the two rules
cat1:[f3=c] --> word1.
cat1:[f3=d] --> word1.
Note that the default_macro definition for bar
is not used, since there are normal macro rules available. Similarly, the
rule
cat2:[f3=a, f4= @frob, @foo(b)] -->
word2.
expands to
cat2:[f3=a, f4=z, f1=b, f2=c] --> word2.
cat2:[f3=a, f4=z, f1=b, f2=d] --> word2.
Note here that the default_macro definition for frob has been used, since there is no
macro definition available.
It is possible for Regulus files to include other Regulus files. The syntax is
include(<Pathname>).
where <Pathname> is a Prolog-syntax
pathname. If no extension is given, it is assumed to be ".regulus".
Included files may themselves include files, to any depth of nesting. If the
pathname is not absolute, it is interpreted relative to the directory of the
including file.
Examples
include(foo).
include('foo.regulus').
include('$REGULUS_GRAMMAR/foo').
include(regulus_grammar(foo)).
include('more_grammars/foo').
include('../foo').
include('../other_grammars/foo').
The following types of declarations are permitted:
· macro
· feature
· category
· feature_instantiation_schedule
· feature_value_space_substitution
Syntax
ignore_item(<Label>).
Conditions
<Label> is an atom.
Effect
All rules and/or declarations with label <Label> are ignored.
Examples
ignore_item(foo_macro).
ignore_item(foo_rule).
Syntax
macro(<MacroHead>, <MacroBody>).
Conditions
<MacroHead> is a non-variable
term.
<MacroBody> is an arbitrary term
Effect
<MacroHead> is defined as a macro
pattern. This means that any term in a rule which unifies with @<MacroHead> is expanded to <MacroBody> . If there are common
variables in <MacroHead> and <MacroBody>, then the variables in <MacroBody> are instantiated from
those in <MacroHead> . Macros
are described in more detail here.
Examples
macro(foo(X), [f1=X, f2=@bar]).
macro(bar, c).
macro(bar, d).
Syntax
default_macro(<MacroHead>,
<MacroBody>).
Conditions
<MacroHead> is a non-variable
term.
<MacroBody> is an arbitrary term
Effect
<MacroHead> is defined as a
default macro pattern. This means that any term in a rule which unifies with @<MacroHead> is expanded to <MacroBody> , as long as there
are no macro declarations that match. If there
are common variables in <MacroHead> and <MacroBody>,
then the variables in <MacroBody>
are instantiated from those in <MacroHead>
. Macros are described in more detail here.
Examples
macro(foo(X), [f1=X, f2=@bar]).
macro(bar, c).
macro(bar, d).
default_macro(bar, e).
default_macro(frob, z).
Syntax
feature_value_space(<ValueSpaceId>,
<ValueSpace>).
Conditions
<ValueSpaceId> is an atom.
<ValueSpace> is a list of lists
of atoms
Effect
<ValueSpaceId> is defined as the
name of the feature value space <ValueSpace>
. The lists in <ValueSpace>
represent the range of possible values along each dimension (one per list) of
the value space. Usually, <ValueSpace>
will be a singleton list, i.e. the space will be one-dimensional.
It is possible to have multiple feature_value_space declarations for the same
value_space_id, as long as the forms of the declarations are compatible in
terms of number of dimensions. In this case, the lists of possibilities along
each dimension are unioned.
Examples
feature_value_space(sem_np_value, [[n,
device, location]]).
feature_value_space(number_value, [[sing, plur]]).
feature_value_space(agr_value, [[sing, plur], [1, 2, 3]]).
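For example, assuming the unioning behaviour just described, the two declarations
feature_value_space(sem_np_value, [[n, device]]).
feature_value_space(sem_np_value, [[location]]).
should together be equivalent to the single declaration
feature_value_space(sem_np_value, [[n, device, location]]).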
Syntax
feature(<FeatName>,
<ValueSpaceID>).
Conditions
<FeatName> and <ValueSpaceID> are both atoms. <ValueSpaceID> must be declared as a feature_value_space .
Effect
<FeatName> is defined as a
feature taking values in <ValueSpaceID>
.
Examples
feature(number, number_value).
feature(sem_np_type, sem_np_value).
feature(obj_np_type, sem_np_value).
Syntax
category(<CategoryName>,
<FeatsList>).
Conditions
<CategoryName> is an atom. <FeatsList> is a list of atoms, all of
which must be declared as features , except
for the pre-defined features 'sem' and 'gsem'.
Effect
<CategoryName> is declared as a
category with features <FeatsList>
.
Examples
category('.MAIN', [gsem]).
category(noun, [sem, number, sem_np_type]).
category(verb, [sem, number, vform, vtype, obj_sem_np_type]).
Syntax
top_level_category(<CategoryName>).
Conditions
<CategoryName> is an atom
that has been declared as a category .
Effect
Declares that <CategoryName> is
a top-level category, i.e. a start symbol in the grammar. In the GSL
translation, rules for <CategoryName>
will use the symbol <CategoryName>
exactly as it is specified in the Regulus grammar, e.g. without changing
capitalisation. This may mean that category names specified by top_level_category may need to start with a
period.
Example
top_level_category('.MAIN').
Syntax
feature_instantiation_schedule(<Schedule>).
Conditions
<Schedule> is a list of
lists of atoms, all of which must be declared as features
. Every declared feature must appear in one and only one of the lists.
Effect
This declaration can be used to control the way in which feature expansion is
carried out. Feature expansion is initially invoked only on the first group of
features, after which the rule space is filtered to remove irrelevant rules.
Then expansion and filtering is performed using the second group of features,
and so on until all the features have been expanded.
If no feature_instantiation_schedule
declaration is supplied, the compiler performs expansion and filtering on the
whole set of features at once.
Example
feature_instantiation_schedule([[number,
vform, vtype], [sem_np_type, obj_sem_np_type]]).
Syntax
specialises(<FeatVal1>,
<FeatVal2>, <ValueSpaceId>).
Conditions
<FeatVal1>, <FeatVal2> and <ValueSpaceId> are all atoms. <FeatVal1> and <FeatVal2>
must be declared as possible values of the feature value space <ValueSpaceId>.
Effect
Declares that <FeatVal1> is a
specialisation of <FeatVal2> .
At compile-time, <FeatVal2> will
be replaced by the disjunction of all the values that specialise it.
Example
specialises(switchable, device,
sem_np_type_value).
Syntax
ignore_feature(<FeatName>).
Conditions
<FeatName> is an atom
that has been declared as a feature .
Effect
All occurrences of <FeatName> in
rules and lexical entries are ignored.
Example
ignore_feature(number).
Syntax
ignore_specialises(<FeatVal1>,
<FeatVal2>, <ValueSpaceId>).
Conditions
<FeatVal1>, <FeatVal2> and <ValueSpaceId> are all atoms. <FeatVal1> and <FeatVal2>
are declared as possible values of the feature value space <ValueSpaceId>.
Effect
Cancels the effect of the specialises
declaration specialises(<FeatVal1>, <FeatVal2>,
<ValueSpaceId>), if there is one.
Example
ignore_specialises(switchable, device,
sem_np_type_value).
Syntax
feature_value_space_substitution(<FeatVal1>,
<FeatVal2>, <ValueSpaceId>).
Conditions
<FeatVal1>, <FeatVal2> and <ValueSpaceId> are all atoms. <FeatVal1> and <FeatVal2>
are both possible values in the feature
value space <ValueSpaceId>.
Effect
<FeatVal1> is substituted by <FeatVal2> wherever it appears as part
of the value of a feature taking values in <ValueSpaceId>
.
Example
feature_value_space_substitution(switchable,
device, sem_np_type_value).
Syntax
external_grammar(<TopLevelGrammar>,
<GSLGrammar>).
Conditions
<TopLevelGrammar> is an atom. <GSLGrammar> is an atom representing a
full GSL grammar. <TopLevelGrammar>
should NOT be defined as a category.
Effect
The GSL rule
<TopLevelGrammar> <GSLGrammar>
is added to the output grammar. This is mainly useful for including SLM
grammars in Regulus-generated grammars.
Example
external_grammar('.EXTERNAL', '[foo bar]').
Syntax
<Category> --> <RHS>
Conditions
<Category> is a category . <RHS>
is an RHS .
Semantics
Declares that <Category> can be
rewritten to <RHS> .
Examples
utterance:[sem=S] --> command:[sem=S].
np:[sem=[spec=S, noun=N]] --> spec:[sem=S, number=Num], noun:[sem=N,
number=Num].
noun:[sem=light, number=sing] --> light.
An RHS is of one of the following forms:
· lexical item
· sequence
· disjunction
· optional
· category
LEXICAL ITEM
Syntax
Prolog atom
Conditions
Print name of atom starts with a lower-case letter.
Semantics
Specific word
Examples
light
the
switch
Syntax
( <RHS1>, <RHS2> )
Conditions
<RHS1> and <RHS2> are RHSs .
Semantics
Sequence consisting of <RHS1>
followed by <RHS2>.
Examples
( all, of, the )
( spec:[sem=S, number=N], device_noun:[sem=D,
number=N] )
( at, least, number:[sem=S] )
Syntax
( <RHS1> ; <RHS2> )
Conditions
<RHS1> and <RHS2> are RHSs .
Semantics
Either <RHS1> or <RHS2>.
Examples
( under ; over ; ( at, least ) )
( adj:[sem=S] ; np:[sem=S] )
( a ; an ; number:[sem=S] )
Syntax
?<RHS>
Conditions
<RHS> is an RHS
.
Semantics
Either <RHS> or nothing.
Examples
?the
?pp:[sem=S, type=loc]
?( (of, the) )
Syntax
<CategorySymbol>:<FeatValList>
Conditions
<CategorySymbol> is an atom
defined as a category . <FeatValList> is a feature value list .
Semantics
Non-terminal in GSL grammar.
Examples
onoff:[sem=O]
device_noun:[sem=light, number=sing]
switch_verb:[]
A feature value list is a (possibly empty) list of feature
value pairs .
A feature value pair is either a semantic
feature value pair or a syntactic
feature value pair .
Syntax
<SemFeat> = <SemVal>
Conditions
<SemFeat> is either sem or gsem
. <SemVal> is a semantic value .
Semantics
sem translates into a GSL return
value. gsem translates into GSL
slot-filling.
Examples
gsem=[operation=Op, spec=S, device=D,
onoff=O, location=L]
sem=[and, N1, N2]
Syntax
<SynFeat>
= <SynVal>
Conditions
<SynFeat> is an atom declared as
a feature for the category in which the
feature value pair occurs. <SynVal>
is a syntactic feature value .
Semantics
<SynFeat> has a value compatible
with <SynVal>.
Examples
number=plur
gender=Gen,
vp_modifiers_type=(location\/n)
A semantic value is of one of the following:
· Atom
· Variable
· Feature value list
· List
· Unary GSL function expression
· Binary GSL function expression
Syntax
Prolog atom
Conditions
Can only be used in expressions occurring in LHS category.
Semantics
Translates into atomic GSL value.
Examples
light
device
Syntax
Prolog variable
Conditions
If in LHS category, same variable must also occur in RHS.
Semantics
Value from RHS is passed up into LHS.
Examples
X
Sem
Syntax
[<Atom1> = <SemVal1>,
<Atom2> = <SemVal2>, ...]
Conditions
<Atom1>, <Atom2> etc are Prolog atoms.
<SemVal1>,
<SemVal2> etc are semantic values .
Semantics
If in LHS category, becomes a GSL feature value list.
If in RHS, giving <SemVal1>
variable values allows access to components of feature value lists in RHS.
Examples
[spec=S, device=D]
[op=command, device=D]
Syntax
[<SemVal1>, <SemVal2>, ...]
Conditions
Can only be used in LHS category.
<SemVal1>,
<SemVal2> etc are semantic values .
Semantics
Creates a GSL list.
Examples
[device, light]
[[operation, command], [device, D]]
Syntax
<UnaryGSLFunction>(<SemVal>)
Conditions
Can only be used in LHS category.
<SemVal> is a semantic value .
<UnaryGSLFunction> is one of the
following unary GSL functions: neg , first, last
, rest .
Semantics
Translates into corresponding GSL function expression.
Examples
neg(X)
first(List)
Syntax
<BinaryGSLFunction>(<SemVal1>,
<SemVal2>)
Conditions
Can only be used in LHS category.
<SemVal1> and <SemVal2> are semantic
values .
<BinaryGSLFunction> is one of
the following binary GSL functions: add
, sub, mul
, div , strcat
, insert_begin , insert_end , concat
Semantics
Translates into corresponding GSL function expression.
Examples
add(X, Y)
strcat(A, B)
concat(L1, L2)
A syntactic feature value is of one of the following forms:
· Atomic syntactic feature value
· Variable syntactic feature value
· Disjunctive syntactic feature value
· Conjunctive syntactic feature value
· Negated syntactic feature value
Syntax
Prolog atom <Atom>
Conditions
<Atom> must be declared as a
member of a dimension of the feature
value space for the appropriate feature
.
Semantics
The value of the feature is restricted to be consistent with <Atom> . If the feature value space is
one-dimensional (the usual case) then the value must be equal to <Atom>.
If the space is multi-dimensional, then the value must be of the form <Atom>/\<OtherValues> where <OtherValues> is a conjunction of
values along the remaining dimensions of the feature value space.
Examples
sing
device
no
Syntax
Prolog variable <Var>
Conditions
<Var> can only occur as a value
of other features if they have the same feature value space .
Semantics
The value of the feature will be the same as that of any other feature in the
same rule whose value is <Var> .
Examples
V
Number
Syntax
<SynVal1> \/ <SynVal2>
Conditions
<SynVal1> and <SynVal2> are syntactic feature values belonging to the
same feature value space .
Semantics
The value of the feature is constrained to be compatible with either <SynVal1> or <SynVal2> .
Examples
( yes \/ no )
( switchable \/ dimmable \/ null )
Syntax
<SynVal1> /\ <SynVal2>
Conditions
<SynVal1> and <SynVal2> are syntactic feature values belonging to
different dimensions of the same feature
value space. Note that if the space is one-dimensional (the usual case),
a conjunctive value makes no sense.
Semantics
The value of the feature is constrained to be compatible with both <SynVal1> and <SynVal2>.
Examples
( sing /\ 3 )
( plur /\ 1 )
Syntax
( \ ( <SynVal> ) )
Conditions
<SynVal> is a syntactic feature value .
Semantics
The value of the feature is constrained to be NOT compatible with <SynVal> .
Examples
( \ ( device ) )
( \ ( sing /\ 3 ) )
The compiler can produce the following error messages:
·
Arg in
feature_instantiation_schedule declaration not a list of lists of atoms
See feature instantiation
schedule .
·
Arg in ignore_feature
declaration not an atom
See ignore_feature .
·
Arg in ignore_feature
declaration not declared as feature
See ignore_feature .
·
Bad category <cat>
Something unspecific is wrong with this category.
·
Bad subterm <term> in
rule
Something unspecific is wrong with this subterm
·
Cannot have both sem and gsem
as features
The compiler only allows a rule to use one out of 'sem' and 'gsem'.
The rule can have a semantic return value (sem), or do global slot-filling
(gsem), but not both at once.
·
Circular chain of
specialisations
specialisation declarations must
form a hierarchy.
·
First arg in category
declaration not an atom
See category declaration .
·
First arg in external_grammar
declaration not an atom
See external grammar declaration
.
·
First arg in feature
declaration not an atom
See feature declaration .
·
First arg in
feature_value_space declaration not an atom
See feature value space
declaration
·
First arg in top_level_category
declaration not an atom
See top level category
declaration .
·
First arg in top_level_category
declaration not declared as category
See top level category
declaration .
·
Following atoms in
feature_instantiation_schedule declaration not declared as features...
See feature
instantiation schedule declaration .
·
Following features not listed
in feature_instantiation_schedule declaration...
See feature instantiation
schedule declaration .
·
Gsem feature is meaningless
except in head of rule
Since the gsem feature corresponds to global slot-filling, it needs to be
in the head.
·
List in body of rule not
allowed yet
If you want to extract elements from a list
produced by an RHS category, use the unary
GSL functions first, last and rest.
·
Meaningless for top-level
category to use sem feature. Probably gsem intended?
The sem feature corresponds to a return value, but a top-level category can
only pass out information using global slot-filling (gsem).
·
More than one
feature_instantiation_schedule declaration
It is meaningless to have more than one feature_instantiation_schedule
declaration .
·
No top-level categories left at
start of top-down filtering phase.
During the compilation process, the compiler performs "bottom-up filtering", removing all categories that cannot be expanded down to lexical entries. If all of the top-level categories are removed in this way, the grammar becomes equivalent to the null language. In practice, this usually means that you have not added enough lexical entries yet.
·
No top-level category defined.
There must be at least one top
level category definition .
·
Not meaningful to use GSL
function <func> in body of rule
Unary and binary GSL functions can only be
used in the LHS of a rule.
·
Second arg in category
declaration not a list
See category declaration .
·
Second arg in external_grammar
declaration not an atom
See external grammar declaration.
·
Second arg in feature
declaration not an atom
See feature declaration .
·
Second arg in feature
declaration not declared as feature value space
See feature declaration .
·
Second arg in
feature_value_space declaration not a list of lists of category values
See feature value space
declaration .
·
Semantic variable assigned
value in body of rule
GSL provides no mechanism to check a semantic value, so it is meaningless
to give semantic variables specific values on the RHS.
·
Semantic variable in rule head
doesn't appear in body
The point of having a semantic variable in the rule head is that it should
get its value from another occurrence of the same variable on the RHS. Note
that the occurrence in the body must be a semantic value - using the
variable as a syntactic value is not permitted.
·
Semantic variable occurs twice
in body of rule
Since there is no way to check the value of a semantic variable, it makes
no sense to include one twice on the RHS.
·
"specialises"
declaration must be of form specialises(Val1, Val2, Space)
See specialises declaration.
·
Third arg in
"feature_value_space_substitution" declaration not an atom
See feature value space
substitution declaration .
·
Third arg in "ignore_specialises"
declaration not an atom
See ignore specialises
declaration .
·
Third arg in
"ignore_specialises" declaration not declared as feature value space
See ignore specialises
declaration .
·
Third arg in
"specialises" declaration not an atom
See specialises declaration .
·
Top-level category may not have
syntactic features
Since a top-level category has to have external significance to Nuance, it
may not have syntactic features.
·
Unable to combine
feature_value_space declarations...
See feature value space
declaration .
·
Unable to internalise regulus
declaration
Something unspecific is wrong with this declaration.
·
Unable to interpret
<file1> (included in <file2>) as the name of a readable file with
.regulus extension
See include statements .
·
Unable to interpret <file>
as the name of a readable file with .regulus extension
See compiling Regulus grammars into GSL .
·
Undeclared feature(s)
<feats> in category <cat>
All features must be declared using a feature declaration .
·
Undeclared features in category
declaration: <decl>
All features must be declared using a feature declaration .
·
Value of gsem feature can only
be a feature/value list.
Since gsem translates to global slot-filling, its value must be a
feature/value list.
·
Variable used as file name
File names must be instantiated.
The RegServer is a C++ application that provides a simple Prolog-compatible
interface to a Regulus grammar. In effect, it lets the developer access Nuance
speech functionality from a Prolog program as though the low-level speech input
and output calls were Prolog predicates. Communication between the Prolog
program and the C++ app is through a simple socket-based interface.
The rest of the section is organised as follows:
· Interfacing the RegServer to a Prolog application
· Interfacing the RegServer to a Java application
· Sample RegServer applications
All files are in the directory $REGULUS/RegulusSpeechServer.
If you are running under Windows, you do not need to do anything more once you
have unpacked the Regulus directory; simply invoke the RegServer exe file, as
described below.
If you are running in some other environment, or you want to recompile the
SpeechServer for some reason, the C++ source code is in the subdirectory C_src.
If you are using Visual C++, the SpeechServer .dsp and .dsw files are in the
directory VC++.
The executable for the RegServer app is in the file runtime/regserver.exe. Usage is as follows:
c:\home\speech\Regulus\RegulusSpeechServer\runtime\regserver.exe \
    -package <package dir> \
    [nuance parameters] \
    [-port <tcp port on which the server listens for connections - default is 1974>] \
    [-v] \
    [-f <log file>]
In order to run the RegServer on a Regulus grammar, you need to do the
following:
· Compile the Regulus grammar to a GSL grammar.
· Compile the GSL grammar into a recognition package <package>, as described in the Nuance documentation.
· Start a Nuance license manager, as described in the Nuance documentation.
· Start a recserver on <package>, as described in the Nuance documentation.
· Finally, invoke the RegServer executable, specifying <package> as the package.
An invocation of the RegServer from the command-line needs to specify at least the following parameters:
· A port
· A Nuance recognition package derived from a Regulus grammar
· audio.Provider and other Nuance parameters, if any: please consult the Nuance documentation.
Usually, you will also want to supply a
parameter which specifies a port for a TTS engine. A typical invocation looks
like this:
%REGULUS%\RegulusSpeechServer\runtime\regserver.exe
-port 1975 -package ..\GeneratedFiles\recogniser
client.TTSAddresses=localhost:32323 -f C:/tmp/regserver_log.txt
A Prolog program can communicate with the SpeechServer using the following
predicates, all defined in the file Prolog/regulus_sockettalk.pl:
· regulus_sockettalk_init/1
· regulus_sockettalk_exit_client/0
· regulus_sockettalk_exit_server/0
· regulus_sockettalk_say_file/1
· regulus_sockettalk_say_tts/1
· regulus_sockettalk_say_list/1
· regulus_sockettalk_set_output_volume/1
· regulus_sockettalk_set_parameter/2
· regulus_sockettalk_get_parameter/2
· regulus_sockettalk_recognise/2
· regulus_sockettalk_recognise_file/3
· regulus_sockettalk_interpret/3
regulus_sockettalk_init(+Port)
Effect
Initialise socket connection; call before invoking any of the other calls.
Example
regulus_sockettalk_init(1975)
regulus_sockettalk_exit_client
Conditions
None
Effect
Closes connection to regserver.
Example
regulus_sockettalk_exit_client
regulus_sockettalk_exit_server
Conditions
None
Effect
Exits regserver.
Example
regulus_sockettalk_exit_server
regulus_sockettalk_say_file(+File)
Conditions
File is an atom whose print name is
the name of a .wav file in the current RegServer prompt directory
Effect
The wavfile File is appended to the
prompt queue
Example
regulus_sockettalk_say_file(hello)
regulus_sockettalk_say_tts(+String)
Conditions
String is a Prolog string
Effect
A request to say String using TTS is
appended to the prompt queue
Example
regulus_sockettalk_say_string("hello
world")
regulus_sockettalk_say_list(+ItemList)
Conditions
ItemList is a list of items, each of one of the following forms:
· file(FileAtom) where FileAtom is an atom representing a wavfile.
· tts(StringAtom) where StringAtom is an atom representing a string.
Effect
An ordered list of output requests to play wavfile and/or perform TTS is
appended to the prompt queue.
Example
regulus_sockettalk_say_list([file('hello.wav'),
file('world.wav'), tts('OK, did that')])
regulus_sockettalk_set_output_volume(+Number)
Conditions
Number is an integer
Effect
A request to set the output volume to Number is
sent to the server
Example
regulus_sockettalk_set_output_volume(255)
regulus_sockettalk_set_parameter(+ParamName, +Value)
Conditions
ParamName is an atom
Value is an atom or number
Effect
A request to set ParamName to Value is sent to the server
Example
regulus_sockettalk_set_parameter('audio.OutputVolume', 255)
regulus_sockettalk_get_parameter(+ParamName, -Value)
Conditions
ParamName is an atom
Effect
A request to get ParamName is sent to the server. Value is unified with the result, which can be an integer, a float or an atom.
Example
regulus_sockettalk_get_parameter('audio.OutputVolume', Volume)
regulus_sockettalk_recognise(+Grammar, -Response)
regulus_sockettalk_recognise_file(+Wavfile, +GrammarName, -Result)
regulus_sockettalk_interpret(+StringAtom, +GrammarName, -Result)
Conditions
Grammar (or GrammarName) is a Prolog atom representing a top-level grammar present in the current recognition package
Wavfile is an atom representing a wavfile
StringAtom is an atom representing a text string
Effect
The process sends a recognition request to the server, using the top-level
grammar Grammar.
For regulus_sockettalk_recognise_file/3,
recognition is performed on the designated wavfile.
For regulus_sockettalk_interpret,
the designated string is parsed using the specified grammar.
The response can be one of the following:
· recognition_succeeded(Confidence, Words, Result) where
o Confidence is the Nuance confidence score ;
o Words are the recognised words, expressed as a Prolog atom ;
o Result is a Regulus semantic expression, expressed as a Prolog term.
· recognition_failed(FailureType) where FailureType is a Prolog term.
Example
regulus_sockettalk_recognise('.MAIN', Result)
regulus_sockettalk_recognise_file('C:/tmp/utt1.wav', '.MAIN', Result)
regulus_sockettalk_interpret("switch
on the light', '.MAIN', Result)
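As a minimal sketch of how these predicates fit together, the following Prolog fragment performs a single prompt-and-recognise cycle. It assumes that Prolog/regulus_sockettalk.pl has already been loaded, and that a RegServer is listening on port 1975 with a recognition package containing the top-level grammar .MAIN, as in the invocation shown earlier; the prompt wording and the predicate name demo_recognition_cycle are invented for the example.
% Minimal sketch: one TTS prompt followed by one recognition request
demo_recognition_cycle :-
    regulus_sockettalk_init(1975),
    regulus_sockettalk_say_tts("Please say a command"),
    regulus_sockettalk_recognise('.MAIN', Response),
    (   Response = recognition_succeeded(Confidence, Words, Sem) ->
        format('Heard "~w" (confidence ~w), semantics ~w~n', [Words, Confidence, Sem])
    ;   Response = recognition_failed(Reason) ->
        format('Recognition failed: ~w~n', [Reason])
    ),
    regulus_sockettalk_exit_client.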
There is a Java client library which provides functionality similar to that of the Prolog library. You can use this to construct Java-based applications that use the RegServer. Documentation is available in the file regServer.html in the $REGULUS/RegulusSpeechServer directory.
There is a sample Prolog-based RegServer dialogue application in $REGULUS/Examples/Toy1/Prolog/toy1_app.pl. You can run this application as follows:
1. Compile the Toy1 recognition package by executing a 'make' in the directory $REGULUS/Examples/Toy1/scripts.
2. Start a Nuance license manager.
3. Start a Nuance recserver by invoking the script $REGULUS/Examples/Toy1/scripts/run_recserver.bat.
4. Start an English Nuance Vocalizer by invoking the script $REGULUS/Examples/Toy1/scripts/run_vocalizer3.bat.
5. Start a RegServer by invoking the script $REGULUS/Examples/Toy1/scripts/run_regserver.bat.
6. Start the top-level app by invoking the script $REGULUS/Examples/Toy1/scripts/run_app.bat.
There is a second sample Prolog-based RegServer application, which uses a French Vocalizer, in $REGULUS/Examples/Toy1/Prolog/toy1_slt_app.pl. You can run this application as follows:
1. Compile the Toy1 recognition package by executing a 'make' in the directory $REGULUS/Examples/Toy1/scripts.
2. Start a Nuance license manager.
3. Start a Nuance recserver by invoking the script $REGULUS/Examples/Toy1/scripts/run_recserver.bat.
4. Start a French Nuance Vocalizer by invoking the script $REGULUS/Examples/Toy1/scripts/run_vocalizer3_fre.bat.
5. Start a RegServer by invoking the script $REGULUS/Examples/Toy1/scripts/run_regserver.bat.
6. Start the top-level app by invoking the script $REGULUS/Examples/Toy1/scripts/run_slt_app.bat.
[Not yet documented]