\input texinfo @c -*-texinfo-*- @settitle Wisi @copying Copyright @copyright{} 1999 - 2020 Free Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover texts being ``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled ``GNU Free Documentation License''. (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and modify this GNU manual. Buying copies from the FSF supports it in developing GNU and promoting software freedom.'' @end quotation @end copying @dircategory Emacs @direntry * Wisi: (wisi). Error-correcting LR parsers and project integration. @end direntry @titlepage @sp 10 @title Wisi Version 3.1.2 @page @vskip 0pt plus 1filll @insertcopying @end titlepage @contents @ifnottex @node Top @top Top Wisi Version 3.1.2 @end ifnottex @menu * Overview:: * Grammar actions:: * Project extension:: * GNU Free Documentation License:: * Index:: @end menu @node Overview @chapter Overview ``wisi'' used to be an acronym, but now it's just a name. The wisi package provides an elisp interface to an external parser. It assumes the parser generator package WisiToken (@url{http://stephe-leake.org/ada/wisitoken.html}, implemented in Ada), but can use any parser that meets the same API. wisi provides several grammar actions, to implement indentation, navigating, and syntax highlighting (fontification). wisi also provides an extension to Emacs @file{project.el}, providing operations useful for compilation and cross-reference. @node Grammar actions @chapter Grammar Actions Grammar actions are specified in the grammar file, in a nonterminal declaration. We assume the user is familiar with parser grammars and grammar actions. For example, a simple ``if'' statement can be declared as: @example if_statement : IF expression THEN statements elsif_list ELSE statements END IF SEMICOLON %((wisi-statement-action [1 statement-start 3 motion 6 motion 10 statement-end]) (wisi-motion-action [1 3 5 6 10]) (wisi-indent-action [nil [(wisi-hanging% ada-indent-broken (* 2 ada-indent-broken)) ada-indent-broken] nil [ada-indent ada-indent] nil nil [ada-indent ada-indent] nil nil nil]))% @end example The item before @code{:} is the ``left hand side'', or ``nonterminal''. The list of tokens after @code{:} is the ``right hand side''; in general there can be more than one right hand side for each nonterminal (separated by @code{|}). The items enclosed in ``%()%'' are the grammar actions. They are specified as list of elisp forms; an earlier version of the wisi package generated an elisp parser. We keep the elisp form because it is compact, and easier to read and write than the equivalent Ada code. The @code{wisi-bnf-generate} tool converts the elisp into the required Ada statements. There are two classes of actions; in-parse and post-parse. WisiToken calls these ``semantic checks'' and ``user actions''. The in-parse actions are done as parsing procedes; they provide extra checks that can cause the parse to fail. Currently the only one provided is @code{match-names}; it is used to check that the declaration and end names in named Ada blocks are the same (which can aid significantly in error correction). In the grammar file, in-parse actions are specified in a second @code{%()%} block, which can be omitted if empty. In this document, the term ``action'' means ``post-parse action'', we use ``in-parse action'' unless the meaning is clear from context. Executing the wisi grammar actions creates text properties in the source file; those text properties are then used by elisp code for various purposes. The text properties applied are: @table @code @item wisi-cache This should be named @code{wisi-navigate}, but isn't for historical reasons (there used to be only one kind of text property). The property contains a @code{wisi-cache} object, containing: @table @code @item nonterm The nonterminal in the grammar production that specified the action that produced this text property. @item token A token identifier naming a token in the production right hand side containing the text this text property is applied to. @item last The position of the last character in the token, relative to the first character (0 indexed). The text property is only applied to the first character in the token (mostly for historical reasons). @item class A token class; see the list of possible values in @code{wisi-statement-action} below. @item containing A marker pointing to the start of the containing token for this token; only @code{nil} for the outermost containing token in a file. @item prev A marker pointing to the previous ``motion token'' in the statement or declaration. These are normally language keywords, but can be other things. @item next A marker pointing to the next ``motion token'' in the statement or declaration. @item end A marker pointing to the end of the statement or declaration. @end table wisi provides motion commands for going to the various markers. @item wisi-name Contains no data, applied to a ``name'' of some sort. wisi provides commands for finding the next/previous name, and returning the text. Useful for the names of subprograms, which can then be used to build a completion table; see @code{wisi-xref-identifier-completion-table}. @item font-lock-face The standard font-lock property, specifying the face for the text. Some major modes do not use this for simple keywords; they use font-lock regular expressions instead. One reason for this is so keywords are still highlighted when the parser fails, which can happen if there are severe syntax errors. Other items, like function and package names, are typically marked with @code{font-lock-face} by the parser. @item fontified Another standard font-lock text property; applied whenever @code{font-lock-face} is. @item wisi-indent Contains the indent (in characters) for the next line; applied to the newline character on the preceding line. The first line in a buffer is assumed to have indent 0. @end table Each action is classified as one of @code{navigate, face, indent, in-parse}; when actions are executed, only one of the first three classes is executed (in-parse is always executed). This reflects the reasons the parser is run; to figure out how to go somehere (end of current statement, start of current procedure, etc), to apply faces for syntax highlighting, or to indent the code. @menu * Navigate actions:: * Face actions:: * Indent actions:: * In-parse actions:: @end menu @node Navigate actions @section Navigate actions @table @code @item wisi-statement-action [TOKEN CLASS ...] The argument is a vector; alternating items are a token index (an integer or label indicating a token in the right hand side) and a ``token class''; one of: @table @code @item motion Create a @code{wisi-cache} text property on the token, for use in a subsequent @code{wisi-motion-action}. @item statement-end Create a @code{wisi-cache} text property on the token, enter a pointer to it in the other @code{wisi-cache} objects in the statement or declaration. @item statement-start Create a @code{wisi-cache} text property on the token, enter a pointer to it in the other @code{wisi-cache} objects (in the @code{containing} slot) in the statement or declaration. @item statement-override Same as @code{statement-start}; marks the token to be used as the statement start if the first token is optional. @item misc Create a @code{wisi-cache} text property on the token, to be used for some other purpose. It is good style to indicate the purpose in a comment. For example, ada-mode uses a 'misc' property on left parentheses that start a subprogram parameter list; this distinquishes them from other left parentheses, and makes it possible to automatically call @code{ada-format-paramlist} to format the parameter list, instead of using the standard Emacs @code{align}. @end table @item wisi-motion-action [TOKEN ...] The argument is a vector, where each element is either a token index or a vector [INDEX ID]. Each terminal token must already have a @code{wisi-cache} created by a @code{wisi-statement-action} (this is checked at action execution, not during grammar generation). This action sets the @code{prev, next} slots for the chain of tokens, creating a chain of motion tokens. If TOKEN is a nonterminal without an ID specified, the @code{wisi-cache} must be on the first token in the nonterminal, and it is assumed to have a valid pointer in the @code{next} slot, indicating a chain of motion tokens. That chain is linked into the chain for the current right hand side. If TOKEN is a nonterminal with an ID, the region contained by the nonterminal is searched for all @code{wisi-cache} with that token ID, and for each one where prev/next is not already set, it is linked into the motion chain. Note that the ``search'' described here is done in the parser process, on a tree data structure containing the data that will eventually be stored in Emacs text properties. @item wisi-name-action TOKEN TOKEN is a token index. Create a @code{wisi-name} text property on the token. @end table @node Face actions @section Face actions @table @code @item wisi-face-mark-action [INDEX CLASS ...] The argument is a vector; alternating elements form pairs of INDEX CLASS, where class is one of @code{prefix, suffix}. Mark the tokens as part of a compound name, for use by later face actions. @item wisi-face-apply-action [TOKEN PREFIX-FACE SUFFIX-FACE ...] The argument is a vector; triples of items specify TOKEN, PREFIX-FACE, SUFFIX-FACE. The faces are the elisp names of face objects (which must declared by an @code{%elisp_face} declaration). If the token is a nonterminal, and it has been marked by a previous @code{wisi-face-mark-action}, the specified faces are applied to the prefix and suffix in the token as @code{font-lock-face} text properties. If the token is a terminal, or a non-terminal with no face mark, the suffix face is applied to the entire text contained by the token. @item wisi-face-apply-list-action [TOKEN PREFIX-FACE SUFFIX-FACE ...] Similar to ’wisi-face-apply-action’, but applies faces to all tokens marked by @code{wisi-face-mark-action} in each indicated production token, and does not apply a face if there are no such marks. @end table @node Indent actions @section Indent actions Indents are computed for each line in a cumulative way as the grammar actions are executed. Initially, each indent is set to @code{nil}, which means ``not computed''; this is not the same as the value @code{0}. The grammar actions are executed in a bottom-up fashion; low level productions are executed before higher level ones. In general, the indent action for a production specifies a ``delta indent''; the indent is incremented by that amount. When all productions have been processed, the indent has been computed for all lines. Indents are often given as a function call; the arguments to the function can be other function calls, or integer expressions. @code{wisitoken-bnf-generate} supports only simple integer expressions; those using integers, integer-valued variables, parenthesis, + (plus), - (minus), and * (multiply). @table @code @item wisi-indent-action [INDENT ...] The argument is a vector, giving an indent for each token in the production right-hand side. For terminals, the indents only have meaning, and are only computed, if the token is the first on a line. For nonterminals where the indent is not a variant of @code{wisi-hanging}, the indent is only computed if the first terminal token in the nonterminal is the first on a line. See @code{wisi-hanging} in @ref{Indent functions} for the remaining case. An indent can have several forms. In the descriptions below, the ``current token'' is given by the position of the indent expression in the @code{wisi-indent-action} argument list. @table @asis @item An integer This gives a delta indent; it is added to the total indent for the line. @item A variable name The name is translated to an Ada identifier by replacing ``-'' with ``_'', and applying @code{Camel_Case}. The translated name must identify a directly visible run-time Ada integer variable; this is checked at Ada compile time. It provides an integer delta indent. For example, in Ada two indent variable names are @code{ada-indent} and @code{ada-indent-broken}, giving the basic ident, and the continuation line indent. They are runtime variables so different projects can specify them as part of a coding standard. @item A function call A function that computes a delta indent. See @ref{Indent functions}. @item [CODE-INDENT , COMMENT-INDENT] A vector giving separate indents for code and comments. Normally, the indent for trailing comments (on lines with no code, after all code in the token) is given by the indent of the following token in the production. When the current token is the last, or the following tokens may be empty, or the indent of the following token would be wrong for some reason (for example, it is a block end), the comment indent may be specified separately. If it is not specified, and the indent from the next token is not available, the indent for the current token is used for code and comments. Comment lines that are not trailing are indented by CODE-INDENT. @item (label . INDENT) If token labels are used in a right hand side, they must be given explicitly in the indent arguments, using he lisp ``cons'' syntax. Labels are normally only used with EBNF grammars, which expand into multiple right hand sides, with optional tokens simply left out. Explicit labels on the indent arguments allow them to be left out as well. @end table @end table @menu * Indent functions:: * Indent example:: @end menu @node Indent functions @subsection Indent functions @table @code @item wisi-anchored TOKEN OFFSET Sets the indent for the current token to be OFFSET (an integer expression) from the start of TOKEN (a token index); the current token is ``anchored to'' TOKEN. @item wisi-anchored* TOKEN OFFSET Sets the indent for the current token to be OFFSET from the start of TOKEN, but only if TOKEN is the first token on a line; otherwise no indent @item wisi-anchored*- TOKEN OFFSET Sets the indent for the current token to be OFFSET from the start of TOKEN, but only if TOKEN is the first token on a line and the indent for the current token accumulated so far is nil. @item wisi-anchored% TOKEN OFFSET If there is an opening parenthesis containing TOKEN in the line containing TOKEN, set the current indent to OFFSET from that parenthesis. Otherwise, OFFSET gives an indent delta. @item wisi-anchored%- TOKEN OFFSET Same as @code{wisi-anchored%}, but only if the current token accumulated indent is nil. @item wisi-hanging DELTA-1 DELTA-2 The current token is assumed to be a nonterminal. If the text it contains spans multiple lines, use DELTA-1 for the first line, DELTA-2 for the rest. If the current token is only on one line, use DELTA-1. DELTA-1 and DELTA-2 can be any IDENT expression, except a variant of @code{wisi-hanging}. @item wisi-hanging% DELTA-1 DELTA-2 Similar to @code{wisi-hanging}; if the first terminal token in the current nonterminal is the first token on the first line, use DELTA-1 for the first line and DELTA-2 for the rest. Otherwise, use DELTA-1 for all lines. @item wisi-hanging%- DELTA-1 DELTA-2 Same as @code{wisi-hanging%}, except applied only if the current token accumulated indent is nil. @item Language-specific function Language-specific indent functions are specified by an @code{%elisp_indent} declaration in the grammar file. Each function specifies how many arguments it accepts; this is checked at action runtime, not during grammar generation. Each argument is an INDENT as described above, or a token ID prefixed by @code{'} (to allow distinguishing token IDs from variable names). @end table @node Indent example @subsection Indent example The example @code{if_statement} grammar nonterminal is: @example if_statement : IF expression THEN statements elsif_list ELSE statements END IF SEMICOLON %((wisi-indent-action [nil [(wisi-hanging% ada-indent-broken (* 2 ada-indent-broken)) ada-indent-broken] nil [ada-indent ada-indent] nil nil [ada-indent ada-indent] nil nil nil]))% @end example We trace how the indent is computed for this sample Ada code: @example 1: if A < B and 2: C < D 3: -- comment on expression 4: then 5: if E then 6: Do_E; 7: -- comment on statement 8: elsif F then 9: G := A + Compute_Something 10: (arg_1, arg_2); 11: end if; 12: end if; @end example First, the indent for the lower-level nonterminals (@code{expression, statements, elsif_list}) are computed. Assume they set the indent for line 10 to 2 (for the hanging expression) and leave the rest at nil. Next, the action for the inner @code{if_statement} is executed. Most of the tokens specify an indent of @code{nil}, which means the current accumulated indent is not changed. For the others, the action is as follows: @table @code @item expression: The expression @code{E} is contained on one line, and it is not the first token on that line, so the indent for line 5 is not changed. @item statements: [ada-indent ada-indent] This specifies separate indents for code and trailing comments, because otherwise the trailing comments would be indented with the following @code{THEN}; instead they are indented with the expression code; see the comment on line 7. Here @code{ada-indent} is 3, so the indent for lines 6 and 7 (for the first occurence of @code{statments}) is incremented from @code{nil} to @code{3}. For the second occurence of @code{statements}, line 9 is incremented from @code{nil} to @code{3}, and line 10 from @code{2} to @code{5}. @end table At this point, the accumulated indents are (the indent is given after the line number): @example 1: nil : if A < B and 2: nil : C < D 3: nil : -- comment on expression 4: nil : then 5: nil : if E then 6: 3 : Do_E; 7: 3 : -- comment on statement 8: nil : elsif F then 9: 3 : G := A + Compute_Something 10: 5 : (arg_1, arg_2); 11: nil : end if; 12: nil : end if; @end example Then the action is executed for the outer @code{if_statement}: @table @code @item expression: [(wisi-hanging% ada-indent-broken (* 2 ada-indent-broken)) ada-indent-broken] This specifies separate indents for code and trailing comments, because otherwise the trailing comments would be indented with the following @code{THEN}; instead they are indented with the expression code; see the comment on line 3. In this case, @code{wisi-hanging%} returns DELTA-1, which is @code{ada-indent-broken}, which is 2. So the indent for line 2 is incremented from @code{nil} to @code{2}. The indent for line 3 is also incremented from @code{nil} to @code{2}. @item statements: [ada-indent ada-indent] Here there is only one statement; the nested @code{if_statement}. The indent for lines 5 .. 11 are each incremented by 3. @end table The final result is: @example 1: nil : if A < B and 2: 2 : C < D 3: 2 : -- comment on expression 4: nil : then 5: 3 : if E then 6: 6 : Do_E; 7: 6 : -- comment on statement 8: 3 : elsif F then 9: 6 : G := A + Compute_Something 10: 8 : (arg_1, arg_2); 11: 6 : end if; 12: nil : end if; @end example In a full grammar, the top production should specify an indent of 0, not nil, for tokens that are not indented; then every line will have a non-nil indent. However, in normal operation a nil indent is treated as 0; the @code{wisi-indent} text property is not set for lines that have nil indent, and @code{wisi-indent-region} detects that and uses 0 for the indent. You can set the variable @code{wisi-debug} to a value > 0 to signal an error for nil indents; this is useful to catch indent errors during grammar development. @node In-parse actions @section In-parse actions @table @code @item wisi-propagate-name TOKEN The argument is a token index. Set the @code{name} component of the left-hand-side parse-time token object to the @code{name} component of the identified token, if it is not empty. Otherwise use the @code{byte_region} component. @item wisi-merge-name FIRST-TOKEN, LAST-TOKEN The arguments are token indices, giving a range of tokens. LAST-TOKEN may be omitted if it is the same as FIRST-TOKEN. Set the @code{name} component of the left-hand-side to the merger of the @code{name} or @code{byte-region} components of the identified tokens. @item wisi-match-name START-TOKEN END-TOKEN The arguments are token indices. Compare the text contained by the @code{name} (or @code{byte_region} if @code{name} is empty) token components for START-TOKEN and END-TOKEN; signal a parse error if they are different. The behavior when a name is missing is determined by the runtime language variable given in the @code{%end_names_optional_option} declaration; if True, a missing name that is supposed to match a present name is an error. Both names missing is not an error (assuming that is allowed by the grammar). @end table @node Project extension @chapter Project extension wisi defines the @code{cl-defstuct} @code{wisi-prj}, with operations suitable for compilation and cross-reference. In order to use wisi projects, the user must write project files and customize @code{project-find-functions} and @code{xref-backend-functions}. @menu * Project files:: * Selecting projects:: * Casing exception files:: * Other project functions:: @end menu @node Project files @section Project files Project file names must have an extension given by @code{wisi-prj-file-extensions} (default @file{.adp, .prj}). Project files have a simple syntax; they may be edited directly. Each line specifies a project variable name and its value, separated by ``='': @example src_dir=/Projects/my_project/src_1 src_dir=/Projects/my_project/src_2 @end example There must be no space between the variable name and ``='', and no trailing spaces after the value. Any line that does not have an ``='' is a comment. Some variables (like @code{src_dir}) are lists; each line in the project file specifies one element of the list. The value on the last line is the last element in the list. A variable name that starts with @code{$} is set as a process environment variable, for processes launched from Emacs for the project. In values, process environment variables can be referenced using the normal @code{$var} syntax. In values, relative file names are expanded relative to the directory containing the project file. Here is the list of project variables defined by wisi; major modes may add more. @table @asis @item @code{casing} [slot: @code{case-exception-files}] List of files containing casing exceptions. @xref{Casing exception files}. @item @code{src_dir} [slot: @code{source-path}] A list of directories to search for source files. @end table @node Selecting projects @section Selecting projects The current project can either be indicated by a global variable (called a ``selected project''), or depend on the current buffer. In addition, the project file can be parsed each time it is needed, or the result cached to improve response time, One reason to use a selected project is to handle a hierarchy of projects; if projects B and C both depend on library project A, then when in a file of project A, there is no way to determine which of the three projects to return. So the user must indicate which is active, by using one of @code{wisi-prj-select-file} or @code{wisi-prj-select-cache}. In addition, if changing from one project to another requires setting global resources that must also be unset (such as a syntax propertize hook or compilation filter hook), then the project should define @code{wisi-prj-deselect} in addition to @code{wisi-prj-select}. Such projects require having a selected current project, so it can be deselected before a new one is selected. One example of such projects is ada-mode. One way to declare each project is to add a Local Variables section in the main Makefile for the project; when the Makefile is first visited, the project is declared. In the examples here, we assume that approach is used; each gives an :eval line. Note that @code{wisi-prj-current-parse} and @code{wisi-prj-current-cached} always succeed after some project is selected; no functions after them on @code{project-find-functions} will be called. That's why the depth is 90 for those in the examples. @table @asis @item No caching, current project depends on current buffer @example (add-hook 'project-find-functions #'wisi-prj-find-dominating-parse 0) :eval (wisi-prj-set-dominating "foo.prj" (foo-prj-default "prj-name")) @end example @code{wisi-prj-set-dominating} declares the name of a project file with a default project object, and ensures that the current buffer file name is in @code{wisi-prj--dominating}. @code{wisi-prj-find-dominating-parse} looks for the filenames in @code{wisi-prj--dominiating} in the parent directories of the current buffer. When one is found, the associated project file is parsed, using the default project object to dispatch to the appropriate parsers. Then the final project object is returned. @item Caching, current project depends on current buffer @example (add-hook 'project-find-functions #'wisi-prj-find-dominating-cached 0) :eval (wisi-prj-cache-dominating "foo.prj" (foo-prj-default "prj-name")) @end example @code{wisi-prj-cache-dominating} declares the project file, parses it, and saves the project object in a cache indexed by the absolute project file name. @code{wisi-prj-find-dominating-cached} finds the dominating project file, and retrieves the object from the cache. @item No caching, last selected project is current @example (add-hook 'project-find-functions #'wisi-prj-current-parse 90) :eval: (wisi-prj-select-file (foo-prj-default "prj-name")) @end example @code{wisi-prj-select-file} sets the project file as the current project, and saves the default project object. @code{wisi-prj-current-parse} parses the current project file, using the saved default project object, and returns the project object. @item Caching, last selected project is current @example (add-hook 'project-find-functions #'wisi-prj-current-cached 90) :eval: (wisi-prj-select-cache (foo-prj-default "prj-name")) @end example @code{wisi-prj-select-cache} parses the project file, caches the project object. @code{wisi-prj-current-cached} returns the cached current project object. @end table In addition, the user should set @code{xref-backend-functions}. Currently, there is only one choice for wisi projects: @example (add-to-list 'xref-backend-functions #'wisi-prj-xref-backend 90) @end example @code{wisi-prj-xref-backend} returns the current wisi project object. @node Casing exception files @section Casing exception files Each line in a case exception file specifies the casing of one word or word fragment. If an exception is defined in multiple files, the first occurrence is used. If the word starts with an asterisk (@code{*}), it defines the casing of a word fragment (or ``substring''); part of a word between two underscores or word boundary. For example: @example DOD *IO GNAT @end example The word fragment @code{*IO} applies to any word containing ``_io''; @code{Text_IO}, @code{Hardware_IO}, etc. @node Other project functions @section Other project functions @table @code @item wisi-refresh-prj-cache (not-full) Refreshes all cached data in the project, and re-selects the project. If NOT-FULL is non-nil, slow refresh operations are skipped. This reparses the project file, and any cross reference information. @item wisi-prj-select-dominating (dominating-file) Find a wisi-prj matching DOMINATING-FILE (defaults to the current buffer file). If the associated project is current, do nothing. If it is not current, select it. This is useful before running `compilation-start', to ensure the correct project is current. @end table @node GNU Free Documentation License @appendix GNU Free Documentation License @include doclicense.texi @node Index, , GNU Free Documentation License, Top @unnumbered Index @printindex fn @bye