% link-parser internal command documentation.
%
% The internal help system first displays the hard-coded one-line description
% of each variable (or command) and its current and default values, and then
% the matching text entry from this file.

[graphics]
The meanings of the marked-up displayed words are as follows:

  [word]             Null-linked (unlinked) word
  word[!]            word classified by a regex
  word[!REGEX_NAME]  word classified by REGEX_NAME (turn on by !morphology=1)
  word[~]            word generated by spell guessing (unknown original word)
  word[&]            word run-on separated by spell guessing
  word[?].POS        word is unknown (POS is found by the parser)
  word.POS           word found in the dictionary as word.POS
  word.#CORRECTION   word is likely a typo - got linked as CORRECTION

For dictionaries that support morphology (enable with !morphology=1):

  word=              A prefix morpheme
  =word              A suffix morpheme
  word.=             A stem

For more details see:
https://www.abisource.com/projects/link-grammar/dict/

[constituents]
Accepted values are:
  0  Disabled (no constituent tree display)
  1  Treebank-style constituent tree
  2  Flat, bracketed tree [A like [B this B] A]
  3  Flat, treebank-style tree (A like (B this))

[spell]
If zero, spell-guessing corrections and run-on corrections of unknown words
are not performed. Otherwise, this indicates the number of spelling-correction
guesses per unknown word. The number of run-on corrections (word splits) of
unknown words is not limited when spell-guessing is enabled.

[width]
The terminal width, used for wrapping the printing of long sentence diagrams.
Normally, this is not needed, as the terminal width is automatically adjusted
when the terminal window is resized.

[verbosity]
The level of descriptive debug messages that will be printed.
Values 1-4 are appropriate for use by the program/library user.
Higher values are intended for LG dictionary authors and library developers.
For each level, unless otherwise noted, messages of lower verbosity levels
are included.
Some useful values:
  0      No prompt, minimal library messages
  1      Normal verbosity (its messages are included in all higher levels)
  2      Show times of the parsing steps
  3      More info messages
  4      Display data file search and locale setup

In the levels below, the messages of levels 2-4 are not included:
  5-9    Tokenizer and parser debugging
  10-19  Dictionary debugging
The output of these levels may be restricted to particular files and/or
functions that are listed (comma-separated) in the !debug variable.

The following levels are for particular information.
The messages of levels greater than 1 are not included in their output:
  101    Print all the dictionary connectors, along with their length limit
  102    Print all the disjuncts, before and after pruning
  103    Show unsubscripted dictionary words and subscripted ones which share
         the same base word
  104    Memory pool statistics

[morphology]
When False, whole words are displayed, without indicating any morphological
analysis that might have been performed. When True, morphemes are shown as
separate tokens, together with the link types between them.

See "!help graphics" for additional info on morpheme markup.

The English dictionaries do not do morphological markup, so this flag has
almost no effect on English sentences.

This flag has one side effect: if it is set to True and a word is matched by
a RegEx, then the matching dictionary entry is shown.

[limit]
The maximum number of linkages that are considered for post-processing.
Up to this many linkages are generated; if there are fewer parses than this
limit, then they will all be printed, in deterministic, cost-ranked order.
If there are more parses than this limit, then a random subset will be
printed. The !rand option is used to control whether this sampling will use
a repeatable (deterministic) random sequence, or not.

[cost-max]
Determines the largest disjunct cost considered during parsing.
That is, only disjuncts with a cost less than this are used during the parse;
higher-cost disjuncts are ignored. Raising the max allowed cost will typically
produce more parses, although these are (far) less likely to be correct.

[bad]
When True, also display linkages that are rejected by post-processing, along
with the name of the rule that resulted in the rejection. This mode is useful
when editing the dictionary or the post-processing rule-set.

The invalid linkages will be printed after the valid ones. The parser will
only output the linkages it finds at whatever stage it had gotten to when it
found a valid linkage. For example, if it had gotten to null-link stage 2
before finding its first valid linkage, it will also output invalid linkages
found at null-link stage 2. There is no way of seeing invalid linkages found
at earlier stages.

[short]
Determines the maximum allowed length for certain connectors. The intended
use is to speed up parsing by not considering very long links for most
connectors, since they are rarely needed in a correct parse. Setting this too
low will prevent valid parses; setting this too high will slow the system,
and occasionally generate unlikely parses. The limit applies only to those
connectors not exempted by the UNLIMITED-CONNECTORS dictionary entry.

[timeout]
Determines the approximate maximum time (in seconds) that parsing is allowed
to take. If a parse is not found before this time, normal parsing is halted,
and a "panic parse" mode is entered. During the panic parse, a looser, less
restrictive set of parameters is used (primarily, a larger !cost-max), in an
effort to find some, any parse. Panic mode can be enabled and disabled with
the !panic option.

This option has no effect on the SAT parser (see "!help use-sat").

[memory]
This variable no longer has any effect; it is obsolete.

[null]
When False, only linkages without null links are considered.
When True, the parser tries to find linkages with the minimal possible number
of null links.

[panic]
If enabled, then a "panic-mode" will be entered when a parse cannot be found
within the time limit set by !timeout. When in panic mode, various parse
options are loosened so that a less accurate parse can be found quickly.
See "!help panic_variables" for info on these parse options.

[use-sat]
Use the Boolean-SAT parser instead of the traditional parser.

The SAT parser was an experimental alternative to the traditional parser.
This parser has several limitations, and offers no real advantages over the
traditional parser. In particular, it is not able to find linkages with
null-links, and it does not honor the `!timeout` option.

[walls]
Alters the display of parsed sentences (see "!help graphics").
When True, the RIGHT-WALL and LEFT-WALL are always displayed. When False,
they are not displayed if their links are not considered "interesting"
(by a hard-coded criterion in the LG library).

[islands-ok]
This option determines whether or not "islands" of links are allowed.
For example, the following linkage has an island:

linkparser> this sentence is false this sentence is true
No complete linkages found.
Found 16 linkages (8 had no P.P. violations) at null count 1
    Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=11)

    +----------->WV---------->+
    +------->Wd-------+       |
    |        +--Dsu*c-+--Ss*s-+-Paf-+       +--Dsu*c-+--Ss*s-+--Pa-+
    |        |        |       |     |       |        |       |     |
LEFT-WALL this.d sentence.n is.v false.a this.d sentence.n is.v true.a

[postscript]
Generate postscript output. The generated postscript requires a header in
order to be properly displayed; the header is printed by setting
!ps-header=True. The postscript output currently malfunctions for sentences
longer than a page width.

[ps-header]
When set, and when !postscript=True is set, then the postscript header will
be printed.

%[cost-model]
%The only allowed value is 1 for now (the source code may need fixes).
[links]
When enabled, this will display each link, one per line, with the words and
connectors at each end of the link. The post-processing domains are also
displayed. This mode is set to True when the standard input is not a terminal.

[disjuncts]
When True, display the disjuncts that are used for each word, together with
their cost.

[batch]
When True, the program processes sentences in batch mode. During batch mode,
the usual parse printing is suppressed; only errors are reported.

In batch mode, a leading * in the first column can be used to indicate
a non-grammatical sentence. If such a sentence parses, an error is printed.
Conversely, an error is reported if no parses are found for a valid sentence.

Batch testing is typically performed by piping a file to the parser;
for example
    link-parser [dictionary name] [arguments] < input-file
or
    cat input-file | link-parser [dictionary name] [arguments]

The !batch flag is then usually placed at the beginning of the input-file
(other options may be specified, as well). Setting the !echo flag can be
useful, as it will echo the input sentence.

Our GitHub repository contains several large batch-files used during testing
and development; for English, the three most important ones are
"corpus-basic.batch", "corpus-fixes.batch" and "corpus-fix-long.batch".
See: https://github.com/opencog/link-grammar/tree/master/data/en

For more details see BATCH-MODE in:
https://www.abisource.com/projects/link-grammar/dict/introduction.html

[echo]
Print the original input sentence. This is primarily useful when working
in !batch mode, which otherwise suppresses output.
This mode is set to True when the standard input is not a terminal.

[rand]
If set to True, then a repeatable random sequence will be used whenever
a random number is required. The parser almost never uses random numbers;
currently they are only used in one place: to sample a subset of linkages,
if there are more parses than the linkage limit.
See "!help limit" for info on the linkage limit.
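
The sampling behavior described above for !rand (together with "!help limit")
can be pictured with a small Python sketch. It is an illustration only, not
the library's code: the function name sample_linkages, the seed value 0 and
the use of Python's random module are all assumptions made for this example.

```python
import random


def sample_linkages(linkages, limit, repeatable=True, seed=0):
    """Return at most `limit` linkages (hypothetical helper, not an LG API).

    If everything fits within the limit, all linkages are returned in their
    original order. Otherwise a random subset of size `limit` is drawn.
    With repeatable=True a fixed seed is used (the seed value is an
    assumption), so repeated calls pick the same subset.
    """
    if len(linkages) <= limit:
        return list(linkages)
    rng = random.Random(seed) if repeatable else random.Random()
    return rng.sample(list(linkages), limit)


if __name__ == "__main__":
    pool = [f"linkage-{i}" for i in range(50)]
    # Two repeatable runs select the same 5 linkages.
    print(sample_linkages(pool, 5) == sample_linkages(pool, 5))
```

With repeatable=False the generator is seeded from the system, so two runs
will usually select different subsets.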
[debug]
This variable is for LG library development. Its purpose is to limit the
quantity of debug output, of which there may otherwise be too much.
For example:
    $ link-parser -verbosity=6 -debug=flatten_wordgraph,print.c
will only show messages from the `flatten_wordgraph()` function or the
print.c file.
For more details see debug/README.md in the LG library source code directory.

[test]
This variable is used to enable features that are used for debugging or that
do not yet have any other variable to control them. For example, to show all
the linkages without a need to press RETURN, use:
    !test=auto-next-linkage
For more details, see debug/README.md and link-grammar/README.md in our
GitHub repository https://github.com/opencog/link-grammar .

[file]
Read text from this file. The file is assumed to contain sentences and/or
option settings. It is typically used for reading in batch-mode files
(see "!help batch") but can also be useful in other scripting situations.

[variables]
Variables can be set as follows:
  !<name>          Toggle the specified Boolean variable.
  !<name>=<value>  Assign that value to that variable.

[wordgraph]
This variable controls displaying the word-graph of the sentence.
The word-graph is a representation of the relations between the sentence
tokens, as set by the library tokenizer before the parsing step.
Its value may be:
  0  Disabled
  1  Default display
  2  Display parent tokens as subgraphs
  3  Use esoteric display flags as set by !test=wg:FLAGS

% FLAGS documentation:
% These flags are defined in wordgraph.h.
% Below, unsplit-word means a token before getting split.
% 1 and 2 mark the flags that are enabled in those modes.
%
% c      Compact display
% d  1   Display debug labels
% h      Display hex node numbers (for "dot" command debug)
% l  1,2 Add a legend
% p      Display back-pointing links
% s  2   Display unsplit-words as subgraphs
% u  1   Display unsplit-word links
% x      Display using X11 even on Windows (if supported)

[dialect]
This variable allows parsing according to predefined dialects (defined in
the "4.0.dialect" file), by modifying the disjunct set of dictionary words
whose expressions contain symbolic cost specifications - aka "dialect
components". It does that by controlling the cost values of these dialect
components.

The value of this variable consists of comma-separated names, with an
optional cost value after a ":" delimiter (which can be empty). White space
is not allowed. Names without a cost value are dialect names from the
"4.0.dialect" file, and the dialect components are assigned costs as defined
there. Names with values are dialect components and their values. A missing
value after a ":" delimiter denotes a "very high" cost (to disable the
related disjuncts).

Examples:
    !dialect=irish
    !dialect=irish,headline
    !dialect=instructions,bad-spelling:2.2

[!]
This command is for debugging the dictionary or the library.
It takes a word as an argument, and optionally a regex and/or flags.
It splits the given word into tokens according to the current language, and
for each token it prints its matching dictionary words along with its
expression or disjunct list. The word may include a wildcard * to find
multiple matches, and a subscript can be used to limit the matches to this
subscript only.

Examples ("test.n" is an example word):

Show the expression:
    !!test.n
Show the expression using macro tags:
    !!test.n/m
Each macro tag is followed by its content on the same line. The other lines
are direct expression components (before and after a macro).
Show also low-level memory details of the expression:
    !!test.n/l

Show the disjuncts (without duplicates):
    !!test.n//
Show disjunct connector expression source macros:
    !!test.n//m
The above command is more useful for a single disjunct (1234 is an example
of a disjunct number, see below for the disjunct print format):
    !!test.n/1234/m

Show selected disjuncts according to the supplied string (* and + are
automatically escaped if there are no other regex metacharacters in the
string):
    !!test.n/Ds**x+/
Show selected disjuncts according to the supplied regex:
    !!test.n/ Wd-.*<>.*@M\+/
    !!test.n/ J[sk]- D[\w*]+c\-/
Regexes are automatically detected. The r flag forces a regex interpretation
but it is not needed in normal use.

Show a particular disjunct from the output of !disjuncts:
    !!test.n/Ds**c- Os-/f
The f flag means a full specification of a disjunct. It is most useful along
with the m flag:
    !!test.n/Ds**c- Os-/fm
Search for connectors in any order:
    !!test.n/Os- Ds**c-/a
Regretfully, adding the f flag is not supported yet.

Display all the words that start with "test":
    !!test*
Display all the words that start with "test" and have subscript ".q":
    !!test*.q

A sample output of a disjunct-list display:

    Token "test.n" matches:
        test.n                      8509  disjuncts

    Token "test.n" disjuncts:
        test.n                      4273/4501  disjuncts
        ...
        test.n: [3493]2.600= @AN- @A- Ds**x- <> NM+ R+ Bs+ Bsm+
        ...

In this sample output:
  8509   Number of disjuncts in the dictionary expression.
  4501   Number of disjuncts after applying cost-max.
  4273   Number of disjuncts without duplicates.
  3493   Disjunct ordinal number.
  2.600  Disjunct cost.
  =      A separator to enable regex anchoring.
  <>     A separator of the "-" (LHS) and "+" (RHS) connector lists.

These variables affect the output:
  Disjuncts, expressions:  !dialect
  Disjuncts only:          !cost-max

[panic_variables]
Show the variables that may take effect in panic mode (see "!help panic")
and their current values.
They may replace the values of the similarly named variables (without the
prefix "panic_") according to certain replacement rules - see
"!help panic_timeout".

[panic_short,panic_cost-max,panic_limit,panic_max-null-count,panic_spell,panic_timeout]
The following variables are panic mode variables, meaning that they may take
effect in panic mode according to the rules that are listed below.
Here are their names, roles and initial values, as shown by !panic_variables:

  Variable              Controls                                        Default value
  --------              --------                                        -------------
  panic_short           Max length of all links                         12
  panic_cost-max        Largest cost to be considered                   4.00 [*]
  panic_limit           The maximum linkages processed                  1000
  panic_max-null-count  Max number of null links allowed                10
  panic_spell           Up to this many spell-guesses per unknown word  0
  panic_timeout         Abort panic parsing after this many seconds     30

Their values are used in panic mode according to the following rules:
  !panic_short           - replaces !short if its value is lower. [**]
  !panic_cost-max        - replaces !cost-max if its value is higher.
  !panic_limit           - replaces !limit if its value is lower.
  !panic_max-null-count  - the maximum allowed null links.
  !panic_spell           - replaces !spell if its value is lower.
  !panic_timeout         - the timeout of the panic mode parsing.

  [*]  Unless set in the dictionary to another value.
  [**] The library all_short_connectors parse option is always set in panic
       mode, so !panic_short determines the maximum length of all the links.
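
The replacement rules listed above can be summarized in a short Python
sketch. It only restates those rules and is not taken from the library
source: the class names ParseOptions and PanicVariables, the function
panic_options and the "normal" values in the usage lines are hypothetical.
The panic defaults are the ones from the table above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ParseOptions:
    """Hypothetical container for the normal-mode variables."""
    short: int
    cost_max: float
    limit: int
    spell: int
    timeout: int
    max_null_count: int
    all_short_connectors: bool = False


@dataclass(frozen=True)
class PanicVariables:
    """Panic-mode variables, with the defaults listed in the table above."""
    short: int = 12
    cost_max: float = 4.00
    limit: int = 1000
    max_null_count: int = 10
    spell: int = 0
    timeout: int = 30


def panic_options(normal, panic=PanicVariables()):
    """Apply the documented replacement rules and return the panic options."""
    return replace(
        normal,
        short=min(normal.short, panic.short),           # replaces if lower
        cost_max=max(normal.cost_max, panic.cost_max),  # replaces if higher
        limit=min(normal.limit, panic.limit),           # replaces if lower
        spell=min(normal.spell, panic.spell),           # replaces if lower
        max_null_count=panic.max_null_count,            # used as given
        timeout=panic.timeout,                          # used as given
        all_short_connectors=True,                      # always set in panic mode
    )


if __name__ == "__main__":
    # The normal-mode values below are made up for the illustration.
    normal = ParseOptions(short=16, cost_max=2.7, limit=500, spell=7,
                          timeout=30, max_null_count=0)
    print(panic_options(normal))
```

In this made-up case the panic parse would run with short=12, cost_max=4.0,
limit=500 (the normal value is already lower), spell=0 and at most 10 null
links.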