% This LaTeX document was generated using the LaTeX backend of PlDoc,
% The SWI-Prolog documentation system


\section{library(pcre): Perl compatible regular expression matching for SWI-Prolog}

\label{sec:pcre}

\begin{tags}
    \tag{See also}
`man pcre2api` or \url{https://www.pcre.org/current/doc/html/pcre2api.html}
for details of the PCRE2 syntax and options.
\end{tags}

This module provides an interface to the \href{http://www.pcre.org/}{PCRE2}
(Perl Compatible Regular Expression) library. This Prolog interface
provides an almost complete wrapper around PCRE2 (the successor to
PCRE) with as much backward compatibility to PCRE as possible, because
the original implementation was for PCRE (also known as PCRE1).

Regular expressions are created from a pattern and options and
represented as a SWI-Prolog \textit{blob}. This implies they are subject to
(atom) garbage collection. Compiled regular expressions can safely be
used in multiple threads. Most predicates accept both an explicitly
compiled regular expression, a pattern, or a term Pattern/Flags. The
semantics of the pattern can be additionally modified by options. In the
latter two cases a regular expression \textit{blob} is created and stored in a
cache. The cache can be cleared using \predref{re_flush}{0}.

Most of the predicates in this library take both a regular expression
represented as a string with optional flags, e.g., \verb$'aap'/i$ or a
\textit{compiled regular} expression. If a string (+flags) alternative is used,
the library maintains a cache of compiled regular expressions. See also
\predref{re_flush}{0}. The library can be asked to rewrite the \predref{re_match}{2} and
\predref{re_match}{3} goals to use inlined compiled regular expression objects
using

\begin{code}
:- set_prolog_flag(re_compile, true).
\end{code}

This has some consequences:

\begin{itemize}
    \item Performance is considerable better.
    \item Compiled regular expressions are currently incompatible with
\textit{Quick Load Files} (`.qlf`, see \predref{qcompile}{1}) and \textit{Saved States}
(see \predref{qsave_program}{2} and the \verb$-c$ command line option.
    \item Debugging may be harder.
\end{itemize}

\vspace{0.7cm}

\begin{description}
    \predicate[semidet]{re_match}{2}{+Regex, +String}
\nodescription
    \predicate[semidet]{re_match}{3}{+Regex, +String, +Options}
Succeeds if \arg{String} matches \arg{Regex}. For example:

\begin{code}
?- re_match("^needle"/i, "Needle in a haystack").
true.
\end{code}

Defined \arg{Options} are given below. For details, see the PCRE
documentation. If an option is repeated, the first value is used
and subsequent values are ignored. Unrecognized options are
ignored. Unless otherwise specified, boolean options default to
\const{false}.

If \arg{Regex} is a text pattern (optionally with flags), then any of the
\arg{Options} for \predref{re_compile}{3} can be used, in addition to the \arg{Options}
listed below. If \arg{Regex} is the result of \predref{re_compile}{3}, then only the
following execution-time \arg{Options} are recognized and any others are
ignored. Some options may not exist on your system, depending on the
PCRE2 version and how it was built - these unsupported options are
silently ignored.

\begin{itemize}
    \item \verb$start(From)$
Start at the given character index
    \item \verb$anchored(Bool)$
If \const{true}, match only at the first position
    \item \verb$bol(Bool)$
\arg{String} is the beginning of a line (default \const{true}) -
affects behavior of circumflex metacharacter (\verb$^$).
    \item \verb$empty(Bool)$
An empty string is a valid match (default \const{true})
    \item \verb$empty_atstart(Bool)$
An empty string at the start of the subject is a valid match
(default \const{true})
    \item \verb$eol(Bool)$
\arg{String} is the end of a line -
affects behavior of dollar metacharacter (\verb|$|)
(default \const{true}).
    \item \verb$newline(Mode)$
If \const{any}, recognize any Unicode newline sequence,
if \const{anycrlf}, recognize CR, LF, and CRLF as newline
sequences, if \const{cr}, recognize CR, if \const{lf}, recognize
LF, if \const{crlf} recognize CRLF as newline.
The default is determined by how PCRE was built, and
can be found by \verb$re_config(newline2(NewlineDefault))$.
    \item \verb$newline2(Mode)$ - synonym for \verb$newline(Mode)$.
    \item \verb$utf_check(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
You should not need this because SWI-Prolog ensures that the UTF8 strings are valid,
so the default is \const{false}.
    \item \verb$endanchored(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$partial_soft(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$partial_hard(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$dfa_restart(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$dfa_shortest(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
\end{itemize}

\begin{arguments}
\arg{Regex} & is the output of \predref{re_compile}{3}, a pattern or a term
Pattern/Flags, where Pattern is an atom or string. The defined flags
and their related option for \predref{re_compile}{3} are below.

\begin{shortlist}
    \item \textbf{x}: \verb$extended(true)$
    \item \textbf{i}: \verb$caseless(true)$
    \item \textbf{m}: \verb$multiline(true)$
    \item \textbf{s}: \verb$dotall(true)$
    \item \textbf{a}: \verb$capture_type(atom)$
    \item \textbf{r}: \verb$capture_type(range)$
    \item \textbf{t}: \verb$capture_type(term)$
\end{shortlist}

If \arg{Regex} is the output of \predref{re_compile}{3}, any compile-time options in
\arg{Options} or Flags are ignored and only match-time options are used.

The options that are derived from flags take precedence over the
options in the \arg{Options} list. In the case of conflicting flags, the
first one is used (e.g., \const{ra} results in \verb$capture_type(range)$). \\
\end{arguments}

    \predicate[semidet]{re_matchsub}{3}{+Regex, +String, -Sub:dict}
\nodescription
    \predicate[semidet]{re_matchsub}{4}{+Regex, +String, -Sub:dict, +Options}
Match \arg{String} against \arg{Regex}. On success, \arg{Sub} is a dict containing
integer keys for the numbered capture group and atom keys for the
named capture groups. The entire match string has the key \verb$0$. The
associated value is determined by the \verb$capture_type(Type)$ option
passed to \predref{re_compile}{3}, or by flags if \arg{Regex} is of the form
Pattern/Flags; and may be specified at the level of individual
captures using a naming convention for the caption name. See
\predref{re_compile}{3} for details.

The example below exploits the typed groups to parse a date
specification:

\begin{code}
?- re_matchsub("(?<date> (?<year_I>(?:\\d\\d)?\\d\\d) -
                (?<month_I>\\d\\d) - (?<day_I>\\d\\d) )"/x,
               "2017-04-20", Sub, []).
Sub = re_match{0:"2017-04-20", date:"2017-04-20",
               day:20, month:4, year:2017}.
\end{code}

\begin{arguments}
\arg{Both} & compilation and execution options are processed. See
\predref{re_compile}{3} and \predref{re_match}{3} for the set of options. In addition,
some compilation options may passed as \verb$/Flags$ to \arg{Regex} - see
\predref{re_match}{3} for the list of flags. \\
\arg{Regex} & See \predref{re_match}{2} for a description of this argument. \\
\end{arguments}

    \predicate[semidet]{re_foldl}{6}{:Goal, +Regex, +String, ?V0, ?V, +Options}
Fold all matches of \arg{Regex} on \arg{String}. Each match is represented by a
dict as specified for \predref{re_matchsub}{4}. \arg{V0} and \arg{V} are related using a
sequence of invocations of \arg{Goal} as illustrated below.

\begin{code}
call(Goal, Dict1, V0, V1),
call(Goal, Dict2, V1, V2),
...
call(Goal, Dictn, Vn, V).
\end{code}

This predicate is used to implement \predref{re_split}{4} and \predref{re_replace}{4}. For
example, we can count all matches of a \arg{Regex} on \arg{String} using this
code:

\begin{code}
re_match_count(Regex, String, Count) :-
    re_foldl(increment, Regex, String, 0, Count, []).

increment(_Match, V0, V1) :-
    V1 is V0+1.
\end{code}

After which we can query

\begin{code}
?- re_match_count("a", "aap", X).
X = 2.
\end{code}

Here is an example \arg{Goal} for extracting all the matches with their
offsets within the string:

\begin{code}
range_match(Dict, StringIndex-[MatchStart-Substring|List], StringIndex-List) :-
    Dict.(StringIndex.index) = MatchStart-MatchLen,
    sub_string(StringIndex.string, MatchStart, MatchLen, _, Substring).
\end{code}

And can be used with this query (note the \verb$capture_type(range)$ option,
which is needed by \predref{range_match}{3}, and \verb$greedy(false)$ to invert the
meaning of \verb$*?$):

\begin{code}
?- String = "{START} Mary {END} had a {START} little lamb {END}",
   re_foldl(range_match,
            "{START} *?(?<piece>.*) *?{END}",
            String, _{string:String,index:piece}-Matches, _-[],
            [capture_type(range),greedy(false)]).
Matches = [8-"Mary", 33-"little lamb"].
\end{code}

    \predicate[det]{re_split}{3}{+Pattern, +String, -Splits:list}
\nodescription
    \predicate[det]{re_split}{4}{+Pattern, +String, -Splits:list, +Options}
Split \arg{String} using the regular expression \arg{Pattern}. \arg{Splits} is a list
of strings holding alternating matches of \arg{Pattern} and skipped parts
of the \arg{String}, starting with a skipped part. The \arg{Splits} lists ends
with a string of the content of \arg{String} after the last match. If
\arg{Pattern} does not appear in \arg{String}, \arg{Splits} is a list holding a copy
of \arg{String}. This implies the number of elements in \arg{Splits} is \textit{always}
odd. For example:

\begin{code}
?- re_split("a+", "abaac", Splits, []).
Splits = ["","a","b","aa","c"].
?- re_split(":\\s*"/n, "Age: 33", Splits, []).
Splits = ['Age', ': ', 33].
\end{code}

\begin{arguments}
\arg{Pattern} & is the pattern text, optionally follows by /Flags.
Similar to \predref{re_matchsub}{4}, the final output type can be controlled by
a flag \const{a} (atom), \const{s} (string, default) or \const{n} (number if possible,
atom otherwise). \\
\end{arguments}

    \predicate[det]{re_replace}{4}{+Pattern, +With, +String, -NewString}
\nodescription
    \predicate[det]{re_replace}{5}{+Pattern, +With, +String, -NewString, +Options}
Replace matches of the regular expression \arg{Pattern} in \arg{String} with
\arg{With} (possibly containing references to captured substrings).

Throws an error if \arg{With} uses a name that doesn't exist in the
\arg{Pattern}.

\begin{arguments}
\arg{Pattern} & is the pattern text, optionally followed by /Flags.
Flags may include \const{g}, replacing all occurences of \arg{Pattern}. In
addition, similar to \predref{re_matchsub}{4}, the final output type can be
controlled by a flag \const{a} (atom) or \const{s} (string, default). The
output type can also be specified by the \verb$capture_type$ option.
Capture type suffixes can modify behavior; for example, the
following will change an ISO 8601 format date (YYYY-MM-DD) to
American style (m/d/y), and also remove leading zeros by using the
\arg{_I} suffix:

\begin{code}
re_replace("(?<date> (?<year_I>(?:\\d\\d)?\\d\\d) -
            (?<month_I>\\d\\d) - (?<day_I>\\d\\d) )"/x,
           "$month-$day-$year",
           ISODate, AmericanDate)`
\end{code}

 \\
\arg{With} & is the replacement text. It may reference captured
substrings using \Sneg{}N or \$Name. Both N and Name may be written as \{N\}
and \{Name\} to avoid ambiguities. If a substring is named, it cannot
be referenced by its number. The single chracters \verb|$| and \verb$\$ can be
escaped by doubling (e.g., \verb|re_replace(".","$$","abc",Replaced)|
results in \verb|Replaced="$bc"|). (Because \verb$\$ is an escape character
inside strings, you need to write "\bsl{}\bsl{}\BB{}" to get a single backslash.) \\
\arg{Options} & See \predref{re_match}{3} for the set of options.

The options that are derived from flags take precedence over the
options in the \arg{Options} list. In the case of conflicting flags, the
first one is used (e.g., \const{as} results in \verb$capture_type(string)$).
If a \verb$capture_type$ is meaningless (\const{range} or \const{term}), it is
ignored. \\
\end{arguments}

    \predicate[det]{re_compile}{3}{+Pattern, -Regex, +Options}
Compiles \arg{Pattern} to a \arg{Regex} \textit{blob} of type \const{regex} (see \predref{blob}{2}).
Defined \arg{Options} are given below. Please consult the \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API
documentation}
for details. If an option is repeated, the first value is used and
subsequent values are ignored. Unrecognized options are ignored.
Unless otherwise specified, boolean options default to \const{false}. Some
options may not exist on your system, depending on the PCRE2 version
and how it was built - these unsupported options are silently
ignored.

The various matching predicates can take either a \arg{Regex} \textit{blob} or a
string pattern; if they are given a string pattern, they call
\predref{re_compile}{3} and cache the result; so, there is little reason to use
\predref{re_compile}{3} directly.

\begin{itemize}
    \item \verb$anchored(Bool)$
If \const{true}, match only at the first position
    \item \verb$auto_capture(Bool)$
Enable use of numbered capturing parentheses.
(default \const{true})
    \item \verb$bsr(Mode)$
If \const{anycrlf}, \Sneg{}R only matches CR, LF or CRLF; if \const{unicode},
\Sneg{}R matches all Unicode line endings.
    \item \verb$bsr2(Mode)$ - synonym for \verb$bsr(Mode)$.
    \item \verb$caseless(Bool)$
If \const{true}, do caseless matching.
    \item \verb$compat(With)$
Error - PCRE1 had \verb$compat(javascript)$ for JavaScript
compatibility, but PCRE2 has removed that.
    \item \verb$dollar_endonly(Bool)$
If \const{true}, \$ not to match newline at end
    \item \verb$dotall(Bool)$
If \const{true}, . matches anything including NL
    \item \verb$dupnames(Bool)$
If \const{true}, allow duplicate names for subpatterns
    \item \verb$extended(Bool)$
If \const{true}, ignore white space and \# comments
    \item \verb$firstline(Bool)$
If \const{true}, force matching to be before newline
    \item \verb$greedy(Bool)$
If \const{true}, operators such as \verb$+$ and \verb$*$ are greedy unless
followed by \verb$?$; if \const{false}, the operators are not greedy and \verb$?$
has the opposite meaning. It can also beset by a `(?U)` within the
pattern - see the \href{https://www.pcre.org/current/doc/html/pcre2pattern.html\#SEC13}{PCRE2 pattern internal option setting
documentation}
for details and note that the PCRE2 option is \arg{UNGREEDY}, which is
the inverse of this packages \const{greedy} options. (default \const{true})
    \item \verb$compat(With)$
Raises an errr - PCRE1 had \verb$compat(javascript)$ for JavaScript
compatibility, but PCRE2 has removed that option . Consider using
the \verb$alt_bsux$ and \verb$extra_alt_bsux$ options.
    \item \verb$multiline(Bool)$
If \const{true}, \Shat{} and \$ match newlines within data
    \item \verb$newline(Mode)$
If \const{any}, recognize any Unicode newline sequence; if \const{anycrlf}
(default), recognize CR, LF, and CRLF as newline sequences; if
\const{cr}, recognize CR; if \const{lf}, recognize LF; \const{crlf} recognize CRLF
as newline; if \const{nul}, recognize the NULL character (0x00) as
newline.
    \item \verb$newline2(Mode)$ - synonym for \verb$newline(Mode)$.
    \item \verb$ucp(Bool)$
If \const{true}, use Unicode properties for \Sneg{}d, \Sneg{}w, etc.
    \item \verb$utf_check(Bool)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
You should not need this because SWI-Prolog ensures that the UTF8 strings are valid,
    \item \verb$endanchored(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$allow_empty_class(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$alt_bsux(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$auto_callout(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$match_unset_backref(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$never_ucp(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$never_utf(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$auto_possess(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
(default \const{true})
    \item \verb$dotstar_anchor(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
(default \const{true})
    \item \verb$start_optimize(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
(default \const{true})
    \item \verb$utf(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$never_backslash_c(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$alt_circumflex(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$alt_verbnames(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$use_offset_limit(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$extended_more(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$literal(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$match_invalid_utf(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$jit_complete(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$jit_partial_soft(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$jit_partial_hard(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$jit_invalid_utf(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
    \item \verb$jit(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
(default \const{true})
    \item \verb$copy_matched_subject(boolean)$ - see \href{https://www.pcre.org/current/doc/html/pcre2api.html}{PCRE2 API documentation}
\end{itemize}

In addition to the options above that directly map to PCRE flags the
following options are processed:

\begin{itemize}
    \item \verb$optimise(Bool)$ or \verb$optimize(Bool)$
Turns on the JIT compiler for additional optimization that greatly
that speeds up the matching performance of many patterns. (Note
that he meaning has changed slightly from the PCRE1 implementation
    \item PCRE2 always optimises where possible; this is an additional
optimisation.)
    \item \verb$capture_type(+Type)$
How to return the matched part of the input and possibly captured
groups in there. Possible values are:

\begin{description}
    \termitem{string}{}
Return the captured string as a string (default).
    \termitem{atom}{}
Return the captured string as an atom.
    \termitem{range}{}
Return the captured string as a pair \verb$Start-Length$. Note that
we use \verb$Start-Length$ rather than the more conventional
\verb$Start-End$ to allow for immediate use with \predref{sub_atom}{5} and
\predref{sub_string}{5}.
    \termitem{term}{}
Parse the captured string as a Prolog term. This is notably
practical if you capture a number.
\end{description}
\end{itemize}

The \verb$capture_type$ specifies the default for this pattern. The
interface supports a different type for each \textit{named} group using
the syntax `(?$<$name_T$>$...)`, where \arg{T} is one of \verb$S$ (string),
\verb$A$ (atom), \verb$I$ (integer), \verb$F$ (float), \verb$N$ (number), \verb$T$
(term) and \verb$R$ (range). In the current implementation \verb$I$,
\verb$F$ and \verb$N$ are synonyms for \verb$T$. Future versions may act
different if the parsed value is not of the requested numeric type.

Note that \predref{re_compile}{3} does not support the \arg{Pattern}/Flags form that
is supported by \predref{re_match}{3}, \predref{re_replace}{4}, etc.; the \arg{Pattern} must be
text and all compile options specified in \arg{Options}.

    \predicate{re_flush}{0}{}
Clean pattern and replacement caches.

\begin{tags}
    \tag{To be done}
Flush automatically if the cache becomes too large.
\end{tags}

    \predicate{re_config}{1}{?Term}
Extract configuration information from the pcre library. \arg{Term} is of
the form \verb$Name(Value)$. Name is derived from the
\verb$PCRE_CONFIG_*$ constant after removing \verb$PCRE_CONFIG_$ and
mapping the name to lower case, e.g. \const{utf8}, \verb$unicode_properties$,
etc. Value is a Prolog boolean, integer, or atom. For boolean (1 or
0) values, \const{true} or \const{false} is returned.

\predref{re_config}{1} will backtrack through all the possible configuration
values if its argument is a variable. If an unknown option is
specified, \predref{re_config}{1} fails.

Non-compatible changes between PCRE1 and PCRE2 because numeric
values changed: \const{bsr} and \const{newline} have been replaced by \const{bsr2} and
\const{newline2}:

\begin{itemize}
    \item \const{bsr2} - previously \const{bsr} returned 0 or 1; now returns \const{unicode}
or \const{anycrlf}
    \item \const{newline2} - previously \const{newline} returned an integer, now
returns \const{cr}, \const{lf}, \const{crlf}, \const{any}, \const{anycrlf}, \const{nul}
\end{itemize}

\arg{Term} values are as follows. Some values might not exist, depending on
the version of PCRE2 and the options it was built with.

\begin{itemize}
    \item bsr2
The character sequences that the \verb$\R$ escape sequence matches by
default. Replaces \const{bsr} option from PCRE1, which is not compatible.
    \item compiled_widths
An integer whose lower bits indicate which code unit widths were
selected when PCRE2 was built. The 1-bit indicates 8-bit support,
and the 2-bit and 4-bit indicate 16-bit and 32-bit support,
respectively. The 1 bit should always be set because the wrapper
code requires 8 bit support.
    \item depthlimit
    \item heaplimit
    \item jit
\const{true} if just-in-time compiling is available.
    \item jittarget
A string containing the name of the architecture for which the JIT
compiler is configured. e.g., 'x86 64bit (little endian + unaligned)'.
    \item linksize
    \item matchlimit
    \item never_backslash_c
    \item newline2
An atom whose value specifies the default character sequence that
is recognized as meaning "newline" (\const{cr}, \const{lf}, \const{crlf}, \const{any},
\const{anycrlf}, \const{nul}). Replaces \const{newline} option from PCRE1, which is
not compatible.
    \item parenslimit
    \item stackrecurse
    \item unicode
Always \const{true}
    \item unicode_version
The unicode version as an atom, e.g. '12.1.0'.
    \item utf8 - synonym for \const{unicode}
    \item parens_limit
    \item version
The version information as an atom, containing the PCRE version
number and release date, e.g. '10.34 2019-11-21'.

For backwards compatibility with PCRE1, the following are accepted,
but are deprecated:

\begin{shortlist}
    \item \const{utf8} - synonym for \const{unicode}
    \item \verb$link_size$ - synonym for \const{linksize}
    \item \verb$match_limit$ - synonym for \const{matchlimit}
    \item \verb$parens_limit$ - synonym for \const{parenslimit}
    \item \verb$unicode_properties$ - always true
\end{shortlist}

The following have been removed because they don't exist in PCRE2
and don't seem to have any meaningful use in PCRE1:

\begin{shortlist}
    \item \verb$posix_malloc_threshold$
    \item \verb$match_limit_recursion$
\end{shortlist}
\end{itemize}
\end{description}