PCRE - Perl-compatible regular expressions

NAME

       PCRE - Perl-compatible regular expressions

PCRE NATIVE API


       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       void pcre_free_substring(const char *stringptr);

       void pcre_free_substring_list(const char **stringptr);

       const unsigned char *pcre_maketables(void);

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       int pcre_refcount(pcre *code, int adjust);

       int pcre_config(int what, void *where);

       const char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW


       PCRE has its own native API, which is described in this document. There
       are also some wrapper functions that correspond to  the  POSIX  regular
       expression  API.  These  are  described in the pcreposix documentation.
       Both of these APIs define a set of C function calls. A C++  wrapper  is
       distributed with PCRE. It is documented in the pcrecpp page.

       The  native  API  C  function prototypes are defined in the header file
       pcre.h, and on Unix systems the library itself is called  libpcre.   It
       can normally be accessed by adding -lpcre to the command for linking an
       application  that  uses  PCRE.  The  header  file  defines  the  macros
       PCRE_MAJOR  and  PCRE_MINOR  to  contain  the  major  and minor release
       numbers for the library.  Applications can use these to include support
       for different releases of PCRE.

       The   functions   pcre_compile(),  pcre_compile2(),  pcre_study(),  and
       pcre_exec() are used for compiling and matching regular expressions  in
       a  Perl-compatible  manner.  A  sample  program  that  demonstrates the
       simplest way of using them is provided in the file called pcredemo.c in
       the PCRE source distribution. A listing of this program is given in the
       pcredemo documentation, and the pcresample documentation describes  how
       to compile and run it.

       A  second  matching  function,  pcre_dfa_exec(),  which  is  not  Perl-
       compatible, is also provided. This uses a different algorithm  for  the
       matching.  The  alternative  algorithm finds all possible matches (at a
       given point in the subject), and scans the subject  just  once  (unless
       there  are  lookbehind  assertions).  However,  this algorithm does not
       return  captured  substrings.  A  description  of  the   two   matching
       algorithms  and  their  advantages  and  disadvantages  is given in the
       pcrematching documentation.

       In addition to the main compiling and  matching  functions,  there  are
       convenience functions for extracting captured substrings from a subject
       string that is matched by pcre_exec(). They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()
         pcre_get_stringtable_entries()

       pcre_free_substring() and pcre_free_substring_list() are also provided,
       to free the memory used for extracted strings.

       The  function  pcre_maketables()  is  used  to build a set of character
       tables  in  the  current  locale   for   passing   to   pcre_compile(),
       pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
       provided for specialist use.  Most  commonly,  no  special  tables  are
       passed,  in  which case internal tables that are generated when PCRE is
       built are used.

       The function pcre_fullinfo() is used to find out  information  about  a
       compiled  pattern; pcre_info() is an obsolete version that returns only
       some of the  available  information,  but  is  retained  for  backwards
       compatibility.   The  function  pcre_version()  returns  a pointer to a
       string containing the version of PCRE and its date of release.

       The function pcre_refcount() maintains a  reference  count  in  a  data
       block  containing  a compiled pattern. This is provided for the benefit
       of object-oriented applications.

       The global variables pcre_malloc and pcre_free  initially  contain  the
       entry   points   of   the   standard  malloc()  and  free()  functions,
       respectively. PCRE calls the  memory  management  functions  via  these
       variables,  so  a  calling  program  can  replace  them if it wishes to
       intercept the calls. This  should  be  done  before  calling  any  PCRE
       functions.

       The  global  variables  pcre_stack_malloc  and pcre_stack_free are also
       indirections to memory management functions.  These  special  functions
       are  used  only  when  PCRE is compiled to use the heap for remembering
       data, instead of recursive function calls, when running the pcre_exec()
       function.  See  the  pcrebuild  documentation  for details of how to do
       this.  It  is  a  non-standard  way  of  building  PCRE,  for  use   in
       environments  that  have  limited stacks. Because of the greater use of
       memory management, it runs more slowly. Separate functions are provided
       so  that  special-purpose external code can be used for this case. When
       used, these functions are always called in a  stack-like  manner  (last
       obtained,  first freed), and always for memory blocks of the same size.
       There is a  discussion  about  PCRE’s  stack  usage  in  the  pcrestack
       documentation.
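
       Because  the  blocks are requested in a stack-like manner and are all of
       the same size, a replacement pair can keep freed blocks  on  a  list  and
       hand  the  most  recently freed one back on the next request. The sketch
       below shows one possible way to do that. The names frame_hdr, free_list,
       frame_malloc  and  frame_free  are  hypothetical;  the  code  is  not
       thread-safe and never returns cached blocks to the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every block. The union forces
   an alignment for the payload that suits the usual C types. */
typedef union frame_hdr {
    struct {
        union frame_hdr *next;  /* link while the block is on the free list */
        size_t size;            /* usable size of the payload */
    } h;
    long double align_ld;
    void *align_p;
} frame_hdr;

static frame_hdr *free_list = NULL;  /* most recently freed block first */

/* Candidate for pcre_stack_malloc: reuse the last freed block if it is
   large enough, otherwise obtain a new one from malloc(). */
static void *frame_malloc(size_t size)
{
    frame_hdr *b = free_list;

    if (b != NULL && b->h.size >= size) {
        free_list = b->h.next;
        return b + 1;               /* payload follows the header */
    }
    b = malloc(sizeof(frame_hdr) + size);
    if (b == NULL)
        return NULL;
    b->h.size = size;
    return b + 1;
}

/* Candidate for pcre_stack_free: push the block onto the free list. */
static void frame_free(void *p)
{
    frame_hdr *b;

    if (p == NULL)
        return;
    b = (frame_hdr *)p - 1;
    b->h.next = free_list;
    free_list = b;
}

/* To install (only meaningful when PCRE was built to use the heap):
     pcre_stack_malloc = frame_malloc;
     pcre_stack_free = frame_free;                                     */
```

       With last-obtained, first-freed usage, each request after the first  few
       is  satisfied  from  the head of the list without calling malloc() again.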

       The global variable pcre_callout initially contains NULL. It can be set
       by the caller to a "callout" function, which PCRE  will  then  call  at
       specified  points during a matching operation. Details are given in the
       pcrecallout documentation.

NEWLINES


       PCRE supports five different conventions for indicating line breaks  in
       strings:   a  single  CR  (carriage  return)  character,  a  single  LF
       (linefeed) character, the two-character sequence CRLF, any of the three
       preceding,  or  any  Unicode  newline  sequence.  The  Unicode  newline
       sequences are the three just mentioned, plus the single  characters  VT
       (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085),
       LS (line separator, U+2028), and PS (paragraph separator, U+2029).
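
       The  five  conventions can be restated as a small function that reports
       how long a newline is at a given position in a subject. This is only  a
       sketch  of  the  definitions  above,  not PCRE's internal code. The enum
       nl_convention and the function newline_length are hypothetical names. It
       takes  a  single-byte (non-UTF-8) view of the subject, so LS and PS are
       left out, and it assumes an ASCII-based character set.

```c
#include <stddef.h>

/* Hypothetical labels for the five conventions; these are not PCRE options. */
enum nl_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return the length in bytes of the newline that starts at s[pos], or 0 if
   no newline starts there under the given convention. */
static size_t newline_length(const char *s, size_t len, size_t pos,
                             enum nl_convention conv)
{
    unsigned char c, next;

    if (pos >= len)
        return 0;
    c = (unsigned char)s[pos];
    next = (pos + 1 < len) ? (unsigned char)s[pos + 1] : 0;

    switch (conv) {
    case NL_CR:
        return c == '\r' ? 1 : 0;
    case NL_LF:
        return c == '\n' ? 1 : 0;
    case NL_CRLF:
        /* Only the two-character sequence counts; a lone CR or LF does not. */
        return (c == '\r' && next == '\n') ? 2 : 0;
    case NL_ANYCRLF:
    case NL_ANY:
        if (c == '\r')
            return next == '\n' ? 2 : 1;   /* prefer CRLF over a lone CR */
        if (c == '\n')
            return 1;
        /* VT, FF and NEL are newlines only under the "any" convention. */
        if (conv == NL_ANY && (c == 0x0B || c == 0x0C || c == 0x85))
            return 1;
        return 0;
    }
    return 0;
}
```

       For the subject "a\r\nb", position 1 is a two-byte  newline  under  the
       CRLF  and  ANYCRLF  conventions  and a one-byte newline under CR, while
       position 2 is not the start of a newline under CRLF.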

       Each of the first three conventions is used by at least  one  operating
       system  as its standard newline sequence. When PCRE is built, a default
       can be specified. If none is given, the default is LF, which is the Unix
       standard.  When PCRE is run, the default can be overridden, either when
       a pattern is compiled, or when it is matched.

       At compile time, the newline convention can be specified by the options
       argument  of  pcre_compile(), or it can be specified by special text at
       the start of the pattern itself; this overrides any other settings. See
       the pcrepattern page for details of the special character sequences.

       In  the  PCRE  documentation  the  word  "newline" is used to mean "the
       character or pair of characters that indicate a line break". The choice
       of  newline convention affects the handling of the dot, circumflex, and
       dollar metacharacters, the handling of #-comments in /x mode, and, when
       CRLF   is  a  recognized  line  ending  sequence,  the  match  position
       advancement for a non-anchored pattern. There is more detail about this
       in the section on pcre_exec() options below.

       The  choice of newline convention does not affect the interpretation of
       the \n or \r escape sequences, nor does  it  affect  what  \R  matches,
       which is controlled in a similar way, but by separate options.

MULTITHREADING


       The PCRE functions can be used  in  multithreaded  applications,  with
       the  proviso  that  the  memory  management  functions  pointed  to  by
       pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
       callout function pointed to by pcre_callout, are shared by all threads.

       The  compiled  form  of  a  regular  expression  is  not altered during
       matching, so the same compiled pattern can safely be  used  by  several
       threads at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE


       The compiled form of a regular expression can be saved and re-used at a
       later time, possibly by a different program, and even on a  host  other
       than  the  one  on  which  it  was  compiled.  Details are given in the
       pcreprecompile documentation. However, compiling a  regular  expression
       with  one  version  of  PCRE  for  use  with a different version is not
       guaranteed to work and may cause crashes.

CHECKING BUILD-TIME OPTIONS


       int pcre_config(int what, void *where);

       The function pcre_config() makes it  possible  for  a  PCRE  client  to
       discover  which  optional  features  have  been  compiled into the PCRE
       library. The pcrebuild  documentation  has  more  details  about  these
       optional features.

       The  first  argument  for pcre_config() is an integer, specifying which
       information is required; the second argument is a pointer to a variable
       into  which  the  information  is  placed. The following information is
       available:

         PCRE_CONFIG_UTF8

       The output is an integer that  is  set  to  one  if  UTF-8  support  is
       available; otherwise it is set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The  output  is  an  integer  that is set to one if support for Unicode
       character properties is available; otherwise it is set to zero.

         PCRE_CONFIG_NEWLINE

       The output is an integer whose value specifies  the  default  character
       sequence  that is recognized as meaning "newline". The five values that
       are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
       and  -1  for  ANY.  Though they are derived from ASCII, the same values
       are returned  in  EBCDIC  environments.  The  default  should  normally
       correspond to the standard sequence for your operating system.
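
       The value 3338 for CRLF is the CR code (13) shifted left by eight  bits,
       combined  with the LF code (10). A caller that wants to report the build
       default might translate the number as in the following sketch. The  name
       newline_config_name  is  hypothetical;  the  numeric values are the ones
       listed above.

```c
#include <stddef.h>

/* Translate a PCRE_CONFIG_NEWLINE result into a short name.
   Returns NULL for a value that is not one of the five listed ones. */
static const char *newline_config_name(int value)
{
    switch (value) {
    case 10:             return "LF";
    case 13:             return "CR";
    case (13 << 8) | 10: return "CRLF";     /* 3338: CR code, then LF code */
    case -2:             return "ANYCRLF";
    case -1:             return "ANY";
    default:             return NULL;
    }
}
```

       The  input  would  normally  come  from  a  call  such  as
       pcre_config(PCRE_CONFIG_NEWLINE, &value) with value declared as an int.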

         PCRE_CONFIG_BSR

       The output is an integer whose value indicates what character sequences
       the \R escape sequence matches by default. A value of 0 means  that  \R
       matches  any  Unicode  line ending sequence; a value of 1 means that \R
       matches only CR, LF, or CRLF. The default  can  be  overridden  when  a
       pattern is compiled or matched.

         PCRE_CONFIG_LINK_SIZE

       The  output  is  an  integer that contains the number of bytes used for
       internal linkage in compiled regular expressions. The value is 2, 3, or
       4.  Larger  values  allow larger regular expressions to be compiled, at
       the expense of slower matching. The default value of  2  is  sufficient
       for  all  but  the  most massive patterns, since it allows the compiled
       pattern to be up to 64K in size.

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The output is an integer that contains the threshold  above  which  the
       POSIX  interface  uses malloc() for output vectors. Further details are
       given in the pcreposix documentation.

         PCRE_CONFIG_MATCH_LIMIT

       The output is a long integer that  gives  the  default  limit  for  the
       number  of internal matching function calls in a pcre_exec() execution.
       Further details are given with pcre_exec() below.

         PCRE_CONFIG_MATCH_LIMIT_RECURSION

       The output is a long integer that gives the default limit for the depth
       of   recursion  when  calling  the  internal  matching  function  in  a
       pcre_exec() execution.  Further  details  are  given  with  pcre_exec()
       below.

         PCRE_CONFIG_STACKRECURSE

       The  output is an integer that is set to one if internal recursion when
       running pcre_exec() is implemented by recursive function calls that use
       the  stack  to remember their state. This is the usual way that PCRE is
       compiled. The output is zero if PCRE was compiled to use blocks of data
       on  the  heap  instead  of  recursive  function  calls.  In  this case,
       pcre_stack_malloc and  pcre_stack_free  are  called  to  manage  memory
       blocks on the heap, thus avoiding the use of the stack.

COMPILING A PATTERN


       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       Either of the functions pcre_compile() or pcre_compile2() can be called
       to compile a pattern into an internal form. The only difference between
       the  two interfaces is that pcre_compile2() has an additional argument,
       errorcodeptr, via which a numerical error  code  can  be  returned.  To
       avoid  too  much repetition, we refer just to pcre_compile() below, but
       the information applies equally to pcre_compile2().

       The pattern is a C string terminated by a binary zero, and is passed in
       the  pattern  argument.  A  pointer to a single block of memory that is
       obtained via pcre_malloc is returned. This contains the  compiled  code
       and related data. The pcre type is defined for the returned block; this
       is a typedef for a structure whose contents are not externally defined.
       It is up to the caller to free the memory (via pcre_free) when it is no
       longer required.

       Although the compiled code of a PCRE regex is relocatable, that is,  it
       does not depend on memory location, the complete pcre data block is not
       fully relocatable, because it  may  contain  a  copy  of  the  tableptr
       argument, which is an address (see below).

       The  options  argument  contains  various  bit settings that affect the
       compilation. It  should  be  zero  if  no  options  are  required.  The
       available  options  are  described  below. Some of them (in particular,
       those that are compatible with Perl, but some others as well) can  also
       be  set and unset from within the pattern (see the detailed description
       in the pcrepattern  documentation).  For  those  options  that  can  be
       different  in  different  parts  of  the  pattern,  the contents of the
       options argument specifies their settings at the start  of  compilation
       and  execution.  The  PCRE_ANCHORED, PCRE_BSR_xxx, and PCRE_NEWLINE_xxx
       options can be set at the time of matching as well as at compile  time.

       If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
       if compilation of a pattern fails,  pcre_compile()  returns  NULL,  and
       sets  the  variable  pointed  to  by errptr to point to a textual error
       message. This is a static string that is part of the library. You  must
       not  try  to  free it. The byte offset from the start of the pattern to
       the character that was being processed when the error was discovered is
       placed in the variable pointed to by erroffset, which must not be NULL.
       If it is, an immediate error is given. Some  errors  are  not  detected
       until  checks  are carried out when the whole pattern has been scanned;
       in this case the offset is set to the end of the pattern.

       If  pcre_compile2()  is  used  instead  of  pcre_compile(),   and   the
       errorcodeptr  argument  is  not  NULL,  a non-zero error code number is
       returned via this argument in  the  event  of  an  error.  This  is  in
       addition  to  the  textual  error message. Error codes and messages are
       listed below.

       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
       character  tables  that  are  built  when  PCRE  is compiled, using the
       default C locale. Otherwise, tableptr must be an address  that  is  the
       result  of  a  call to pcre_maketables(). This value is stored with the
       compiled pattern, and used again by pcre_exec(), unless  another  table
       pointer is passed to it. For more discussion, see the section on locale
       support below.

       This  code  fragment  shows   a   typical   straightforward   call   to
       pcre_compile():

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */

       The  following  names  for option bits are defined in the pcre.h header
       file:

         PCRE_ANCHORED

       If this bit is set, the pattern is forced to be "anchored", that is, it
       is  constrained to match only at the first matching point in the string
       that is being searched (the "subject string"). This effect can also  be
       achieved  by appropriate constructs in the pattern itself, which is the
       only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts callout items,
       all  with  number  255, before each pattern item. For discussion of the
       callout facility, see the pcrecallout documentation.

         PCRE_BSR_ANYCRLF
         PCRE_BSR_UNICODE

       These options (which are mutually exclusive) control what the \R escape
       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
       or to match any Unicode newline sequence. The default is specified when
       PCRE  is  built.  It  can  be overridden from within the pattern, or by
       setting an option when a compiled pattern is matched.

         PCRE_CASELESS

       If this bit is set, letters in the pattern match both upper  and  lower
       case  letters.  It  is  equivalent  to  Perl’s /i option, and it can be
       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
       always  understands the concept of case for characters whose values are
       less than 128, so caseless matching is always possible. For  characters
       with  higher  values,  the  concept  of  case  is  supported if PCRE is
       compiled with Unicode property support, but not otherwise. If you  want
       to  use caseless matching for characters 128 and above, you must ensure
       that PCRE is compiled with Unicode property support  as  well  as  with
       UTF-8 support.

         PCRE_DOLLAR_ENDONLY

       If  this bit is set, a dollar metacharacter in the pattern matches only
       at the end of the subject string. Without this option,  a  dollar  also
       matches  immediately before a newline at the end of the string (but not
       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
       Perl, and no way to set it within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot  metacharacter  in  the  pattern  matches all
       characters,  including  those  that indicate newline. Without it, a dot
       does not match when the current position is at a newline.  This  option
       is  equivalent  to  Perl’s  /s  option,  and it can be changed within a
       pattern by a (?s) option setting. A negative class such as [^a]  always
       matches  newline characters, independent of the setting of this option.

         PCRE_DUPNAMES

       If this bit is set, names used to identify capturing  subpatterns  need
       not be unique. This can be helpful for certain types of pattern when it
       is known that only one instance of the named  subpattern  can  ever  be
       matched.  There  are  more details of named subpatterns below; see also
       the pcrepattern documentation.

         PCRE_EXTENDED

       If this bit is set, whitespace  data  characters  in  the  pattern  are
       totally  ignored  except  when  escaped  or  inside  a character class.
       Whitespace does not include the VT character (code  11).  In  addition,
       characters  between  an  unescaped  # outside a character class and the
       next newline, inclusive, are also ignored. This is equivalent to Perl’s
       /x  option,  and  it  can  be changed within a pattern by a (?x) option
       setting.

       This option makes it possible to include  comments  inside  complicated
       patterns.   Note,  however,  that this applies only to data characters.
       Whitespace  characters  may  never  appear  within  special   character
       sequences  in  a  pattern,  for  example  within the sequence (?( which
       introduces a conditional subpattern.

         PCRE_EXTRA

       This option was invented in order to turn on  additional  functionality
       of  PCRE  that  is  incompatible with Perl, but it is currently of very
       little use. When set, any backslash in a pattern that is followed by  a
       letter  that  has  no  special  meaning causes an error, thus reserving
       these combinations for future expansion. By  default,  as  in  Perl,  a
       backslash  followed by a letter with no special meaning is treated as a
       literal. (Perl can, however, be persuaded to give a warning for  this.)
       There  are  at  present no other features controlled by this option. It
       can also be set by a (?X) option setting within a pattern.

         PCRE_FIRSTLINE

       If this option is set, an  unanchored  pattern  is  required  to  match
       before  or  at  the  first  newline  in  the subject string, though the
       matched text may continue over the newline.

         PCRE_JAVASCRIPT_COMPAT

       If this option is set, PCRE’s behaviour is changed in some ways so that
       it  is  compatible with JavaScript rather than Perl. The changes are as
       follows:

       (1) A lone closing square bracket in a pattern  causes  a  compile-time
       error,  because this is illegal in JavaScript (by default it is treated
       as a data character). Thus, the pattern AB]CD becomes illegal when this
       option is set.

       (2)  At run time, a back reference to an unset subpattern group matches
       an  empty  string  (by  default  this  causes  the   current   matching
       alternative  to  fail).  A  pattern  such as (\1)(a) succeeds when this
       option is set (assuming it can find an "a" in the subject), whereas  it
       fails by default, for Perl compatibility.

         PCRE_MULTILINE

       By  default,  PCRE  treats the subject string as consisting of a single
       line of characters (even if it actually contains newlines). The  "start
       of  line"  metacharacter  (^)  matches only at the start of the string,
       while the "end of line" metacharacter ($) matches only at  the  end  of
       the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
       is set). This is the same as Perl.

       When  PCRE_MULTILINE  is  set,  the "start of line" and "end  of  line"
       constructs  match  immediately following or immediately before internal
       newlines in the subject string, respectively, as well as  at  the  very
       start  and  end.  This is equivalent to Perl’s /m option, and it can be
       changed within a pattern by a (?m) option  setting.  If  there  are  no
       newlines in a subject string, or no occurrences of ^ or $ in a pattern,
       setting PCRE_MULTILINE has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANYCRLF
         PCRE_NEWLINE_ANY

       These options override the default newline definition that  was  chosen
       when  PCRE  was built. Setting the first or the second specifies that a
       newline is indicated by a single character (CR  or  LF,  respectively).
       Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
       two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies
       that any of the three preceding sequences should be recognized. Setting
       PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be
       recognized. The Unicode newline sequences are the three just mentioned,
       plus the single characters VT (vertical  tab,  U+000B),  FF  (formfeed,
       U+000C),  NEL  (next line, U+0085), LS (line separator, U+2028), and PS
       (paragraph separator, U+2029). The last  two  are  recognized  only  in
       UTF-8 mode.

       The  newline  setting  in  the  options  word  uses three bits that are
       treated as a number, giving eight possibilities. Currently only six are
       used  (default  plus the five values above). This means that if you set
       more than one newline  option,  the  combination  may  or  may  not  be
       sensible.   For   example,   PCRE_NEWLINE_CR  with  PCRE_NEWLINE_LF  is
       equivalent to  PCRE_NEWLINE_CRLF,  but  other  combinations  may  yield
       unused numbers and cause an error.

       The  only time that a line break is specially recognized when compiling
       a pattern is if PCRE_EXTENDED is set, and  an  unescaped  #  outside  a
       character  class  is  encountered.  This indicates a comment that lasts
       until after the next line break sequence. In other circumstances,  line
       break   sequences   are   treated  as  literal  data,  except  that  in
       PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
       and are therefore ignored.

       The newline option that is set at compile time becomes the default that
       is used for pcre_exec() and pcre_dfa_exec(), but it can be  overridden.

         PCRE_NO_AUTO_CAPTURE

       If  this  option  is  set,  it  disables  the use of numbered capturing
       parentheses in  the  pattern.  Any  opening  parenthesis  that  is  not
       followed  by  ?  behaves  as  if  it  were  followed  by  ?:  but named
       parentheses can still be used for capturing (and they  acquire  numbers
       in the usual way). There is no equivalent of this option in Perl.

         PCRE_UNGREEDY

       This  option  inverts  the "greediness" of the quantifiers so that they
       are not greedy by default, but become greedy if followed by "?". It  is
       not  compatible  with Perl. It can also be set by a (?U) option setting
       within the pattern.

         PCRE_UTF8

       This option causes PCRE to regard both the pattern and the  subject  as
       strings  of  UTF-8 characters instead of single-byte character strings.
       However, it is available only when  PCRE  is  built  to  include  UTF-8
       support.  If  not, the use of this option provokes an error. Details of
       how this option changes the behaviour of PCRE are given in the  section
       on UTF-8 support in the main pcre page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
       automatically checked. There is a  discussion  about  the  validity  of
       UTF-8  strings  in  the main pcre page. If an invalid UTF-8 sequence of
       bytes is found, pcre_compile() returns an error. If  you  already  know
       that  your  pattern  is  valid,  and  you  want  to skip this check for
       performance reasons, you can set the PCRE_NO_UTF8_CHECK option. When it
       is  set,  the effect of passing an invalid UTF-8 string as a pattern is
       undefined. It may cause your program to crash. Note  that  this  option
       can  also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the
       UTF-8 validity checking of subject strings.

COMPILATION ERROR CODES


       The following table lists the error  codes  that  may  be  returned  by
       pcre_compile2(),  along with the error messages that may be returned by
       both compiling functions. As PCRE has developed, some error codes  have
       fallen out of use. To avoid confusion, they have not been re-used.

          0  no error
          1  \ at end of pattern
          2  \c at end of pattern
          3  unrecognized character follows \
          4  numbers out of order in {} quantifier
          5  number too big in {} quantifier
          6  missing terminating ] for character class
          7  invalid escape sequence in character class
          8  range out of order in character class
          9  nothing to repeat
         10  [this code is not in use]
         11  internal error: unexpected repeat
         12  unrecognized character after (? or (?-
         13  POSIX named classes are supported only within a class
         14  missing )
         15  reference to non-existent subpattern
         16  erroffset passed as NULL
         17  unknown option bit(s) set
         18  missing ) after comment
         19  [this code is not in use]
         20  regular expression is too large
         21  failed to get memory
         22  unmatched parentheses
         23  internal error: code overflow
         24  unrecognized character after (?<
         25  lookbehind assertion is not fixed length
         26  malformed number or name after (?(
         27  conditional group contains more than two branches
         28  assertion expected after (?(
         29  (?R or (?[+-]digits must be followed by )
         30  unknown POSIX class name
         31  POSIX collating elements are not supported
         32  this version of PCRE is not compiled with PCRE_UTF8 support
         33  [this code is not in use]
         34  character value in \x{...} sequence is too large
         35  invalid condition (?(0)
         36  \C not allowed in lookbehind assertion
         37  PCRE does not support \L, \l, \N, \U, or \u
         38  number after (?C is > 255
         39  closing ) for (?C expected
         40  recursive call could loop indefinitely
         41  unrecognized character after (?P
         42  syntax error in subpattern name (missing terminator)
         43  two named subpatterns have the same name
         44  invalid UTF-8 string
         45  support for \P, \p, and \X has not been compiled
         46  malformed \P or \p sequence
         47  unknown property name after \P or \p
         48  subpattern name is too long (maximum 32 characters)
         49  too many named subpatterns (maximum 10000)
         50  [this code is not in use]
         51  octal value is greater than \377 (not in UTF-8 mode)
         52  internal error: overran compiling workspace
         53  internal error: previously-checked referenced subpattern
               not found
         54  DEFINE group contains more than one branch
         55  repeating a DEFINE group is not allowed
         56  inconsistent NEWLINE options
         57  \g is not followed by a braced, angle-bracketed, or quoted
               name/number or by a plain number
         58  a numbered reference must not be zero
         59  (*VERB) with an argument is not supported
         60  (*VERB) not recognized
         61  number is too big
         62  subpattern name expected
         63  digit expected after (?+
         64  ] is an invalid data character in JavaScript compatibility mode

       The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
       values may be used if the limits were changed when PCRE was built.

STUDYING A PATTERN


       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       If a compiled pattern is going to be used several times, it is worth
       spending more time analyzing it in order to reduce the time taken for
       matching.  The  function  pcre_study()  takes  a  pointer to a compiled
       pattern as  its  first  argument.  If  studying  the  pattern  produces
       additional  information  that will help speed up matching, pcre_study()
       returns a pointer to a pcre_extra block, in which the study_data  field
       points to the results of the study.

       The  returned  value  from  pcre_study()  can  be  passed  directly  to
       pcre_exec()  or  pcre_dfa_exec().  However,  a  pcre_extra  block  also
       contains other fields that can be set by the caller before the block is
       passed; these are described below in the section on matching a pattern.

       If  studying  the  pattern  does  not  produce  any useful information,
       pcre_study() returns NULL. In that circumstance, if the calling program
       wants   to   pass   any   of   the   other  fields  to  pcre_exec()  or
       pcre_dfa_exec(), it must set up its own pcre_extra block.

       The second argument of pcre_study() contains option bits.  At  present,
       no options are defined, and this argument should always be zero.

       The  third argument for pcre_study() is a pointer for an error message.
       If studying succeeds (even if no data is  returned),  the  variable  it
       points  to  is  set  to NULL. Otherwise it is set to point to a textual
       error message. This is a static string that is part of the library. You
       must  not  try  to  free it. You should test the error pointer for NULL
       after calling pcre_study(), to be sure that it has run successfully.

       This is a typical call to pcre_study():

         const char *error;
         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or points to a message */

       Studying a pattern does two things: first, a lower bound for the length
       of subject string that is needed to match the pattern is computed. This
       does not mean that there are any strings of that length that match, but
       it  does  guarantee that no shorter strings match. The value is used by
       pcre_exec() and pcre_dfa_exec() to avoid  wasting  time  by  trying  to
       match  strings  that are shorter than the lower bound. You can find out
       the value in a calling program via the pcre_fullinfo() function.

       Second, studying a pattern is useful for non-anchored patterns that do not
       have  a  single fixed starting character. A bitmap of possible starting
       bytes is created. This speeds up finding a position in the  subject  at
       which to start matching.

LOCALE SUPPORT


       PCRE  handles  caseless matching, and determines whether characters are
       letters, digits, or whatever, by reference to a set of tables,  indexed
       by  character  value.  When running in UTF-8 mode, this applies only to
       characters with codes less than 128. Higher-valued  codes  never  match
       escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
       with Unicode character  property  support.  The  use  of  locales  with
       Unicode  is  discouraged.  If  you  are  handling characters with codes
       greater than 127, you should either use UTF-8 and Unicode, or use
       locales, but not try to mix the two.

       PCRE  contains  an  internal set of tables that are used when the final
       argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
       applications.   Normally,  the  internal  tables  recognize  only ASCII
       characters. However, when PCRE is built, it is possible  to  cause  the
       internal  tables  to  be rebuilt in the default "C" locale of the local
       system, which may cause them to be different.

       The internal tables can always be overridden by tables supplied by  the
       application that calls PCRE. These may be created in a different locale
       from the default.  As  more  and  more  applications  change  to  using
       Unicode, the need for this locale support is expected to die away.

       External  tables  are  built by calling the pcre_maketables() function,
       which has no arguments, in the relevant locale. The result can then  be
       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
       example, to build and use tables that are appropriate  for  the  French
       locale (where accented characters with values greater than 127 are
       treated as letters), the following code could be used:

         setlocale(LC_CTYPE, "fr_FR");
         tables = pcre_maketables();
         re = pcre_compile(..., tables);

       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
       if you are using Windows, the name for the French locale is "french".

       When  pcre_maketables()  runs,  the  tables are built in memory that is
       obtained via pcre_malloc. It is the caller’s responsibility  to  ensure
       that  the memory containing the tables remains available for as long as
       it is needed.

       The pointer that is passed to pcre_compile() is saved with the compiled
       pattern,  and the same tables are used via this pointer by pcre_study()
       and normally also by pcre_exec(). Thus,  by  default,  for  any  single
       pattern,  compilation,  studying  and  matching  all happen in the same
       locale, but different patterns can be compiled in different locales.

       It is possible to pass a table pointer or NULL (indicating the  use  of
       the  internal  tables)  to  pcre_exec(). Although not intended for this
       purpose, this facility could be used to match a pattern in a  different
       locale from the one in which it was compiled. Passing table pointers at
       run time is discussed below in the section on matching a pattern.

INFORMATION ABOUT A PATTERN


       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       The pcre_fullinfo()  function  returns  information  about  a  compiled
       pattern.  It  replaces  the  obsolete  pcre_info()  function,  which is
       nevertheless retained for backwards compatibility (and is documented
       below).

       The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
       pattern. The second argument is the result of pcre_study(), or NULL  if
       the  pattern  was not studied. The third argument specifies which piece
       of information is required, and the fourth argument is a pointer  to  a
       variable  to  receive  the  data. The yield of the function is zero for
       success, or one of the following negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
                               the argument where was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found
         PCRE_ERROR_BADOPTION  the value of what was invalid

       The "magic number" is placed at the start of each compiled  pattern  as
       a simple check against passing an arbitrary memory pointer. Here is a
       typical call of pcre_fullinfo(), to obtain the length of  the  compiled
       pattern:

         int rc;
         size_t length;
         rc = pcre_fullinfo(
           re,               /* result of pcre_compile() */
           pe,               /* result of pcre_study(), or NULL */
           PCRE_INFO_SIZE,   /* what is required */
           &length);         /* where to put the data */

       The  possible  values for the third argument are defined in pcre.h, and
       are as follows:

         PCRE_INFO_BACKREFMAX

       Return the number of the highest back reference  in  the  pattern.  The
       fourth  argument  should  point to an int variable. Zero is returned if
       there are no back references.

         PCRE_INFO_CAPTURECOUNT

       Return the number of capturing subpatterns in the pattern.  The  fourth
       argument should point to an int variable.

         PCRE_INFO_DEFAULT_TABLES

       Return  a pointer to the internal default character tables within PCRE.
       The fourth argument should point to an unsigned char *  variable.  This
       information  call  is  provided  for  internal  use by the pcre_study()
       function. External callers can cause PCRE to use its internal tables by
       passing a NULL table pointer.

         PCRE_INFO_FIRSTBYTE

       Return  information  about  the first byte of any matched string, for a
       non-anchored pattern. The  fourth  argument  should  point  to  an  int
       variable.  (This  option used to be called PCRE_INFO_FIRSTCHAR; the old
       name is still recognized for backwards compatibility.)

       If there is a fixed first byte, for example, from  a  pattern  such  as
       (cat|cow|coyote), its value is returned. Otherwise, if either

       (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
       branch starts with "^", or

       (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
       set (if it were set, the pattern would be anchored),

       -1  is  returned, indicating that the pattern matches only at the start
       of a subject string or after any newline within the  string.  Otherwise
       -2 is returned. For anchored patterns, -2 is returned.

         PCRE_INFO_FIRSTTABLE

       If  the pattern was studied, and this resulted in the construction of a
       256-bit table indicating a fixed set of bytes for the first byte in any
       matching  string, a pointer to the table is returned. Otherwise NULL is
       returned. The fourth argument  should  point  to  an  unsigned  char  *
       variable.

         PCRE_INFO_HASCRORLF

       Return  1  if  the  pattern  contains any explicit matches for CR or LF
       characters, otherwise 0. The fourth argument should  point  to  an  int
       variable.  An explicit match is either a literal CR or LF character, or
       \r or \n.

         PCRE_INFO_JCHANGED

       Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
       otherwise  0. The fourth argument should point to an int variable. (?J)
       and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.

         PCRE_INFO_LASTLITERAL

       Return the value of the rightmost literal byte that must exist  in  any
       matched  string,  other  than  at  its  start,  if such a byte has been
       recorded. The fourth argument should point to an int variable. If there
       is  no such byte, -1 is returned. For anchored patterns, a last literal
       byte is recorded only if it follows something of variable  length.  For
       example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
       /^a\dz\d/ the returned value is -1.

         PCRE_INFO_MINLENGTH

       If the pattern was studied and a minimum length  for  matching  subject
       strings  was  computed,  its  value is returned. Otherwise the returned
       value is -1. The value is a number of characters, not bytes  (this  may
       be  relevant in UTF-8 mode). The fourth argument should point to an int
       variable. A non-negative value is a lower bound to the  length  of  any
       matching  string.  There  may not be any strings of that length that do
       actually match, but every string that does match is at least that long.

         PCRE_INFO_NAMECOUNT
         PCRE_INFO_NAMEENTRYSIZE
         PCRE_INFO_NAMETABLE

       PCRE   supports  the  use  of  named  as  well  as  numbered  capturing
       parentheses. The names are just an additional way  of  identifying  the
       parentheses, which still acquire numbers. Several convenience functions
       such as pcre_get_named_substring() are provided for extracting captured
       substrings  by  name. It is also possible to extract the data directly,
       by first converting the name to a number in order to access the correct
       pointers in the output vector (described with pcre_exec() below). To do
       the conversion, you need  to  use  the  name-to-number  map,  which  is
       described by these three values.

       The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
       gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
       of  each  entry;  both  of  these  return  an int value. The entry size
       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
       a  pointer  to  the  first  entry of the table (a pointer to char). The
       first two  bytes  of  each  entry  are  the  number  of  the  capturing
       parenthesis,  most significant byte first. The rest of the entry is the
       corresponding name, zero terminated.

       The names are in alphabetical order. Duplicate names may appear if  (?|
       is used to create multiple groups with the same number, as described in
       the section on duplicate subpattern numbers in  the  pcrepattern  page.
       Duplicate  names  for  subpatterns with different numbers are permitted
       only if PCRE_DUPNAMES is set. In all cases  of  duplicate  names,  they
       appear  in  the  table  in  the  order  in which they were found in the
       pattern. In the absence of (?| this is the order of increasing  number;
       when  (?|  is  used  this  is  not  necessarily  the case because later
       subpatterns may have lower numbers.

       As a simple example of the name/number table,  consider  the  following
       pattern  (assume  PCRE_EXTENDED  is  set,  so  white  space - including
       newlines - is ignored):

         (?<date> (?<year>(\d\d)?\d\d) -
         (?<month>\d\d) - (?<day>\d\d) )

       There are four named subpatterns, so the table has  four  entries,  and
       each  entry  in the table is eight bytes long. The table is as follows,
       with non-printing bytes shown in hexadecimal, and undefined bytes shown
       as ??:

         00 01 d  a  t  e  00 ??
         00 05 d  a  y  00 ?? ??
         00 04 m  o  n  t  h  00
         00 02 y  e  a  r  00 ??

       When  writing  code  to  extract  data from named subpatterns using the
       name-to-number map, remember that the length of the entries  is  likely
       to be different for each compiled pattern.
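
       A lookup over the name-to-number map might be sketched as follows.
       The function name lookup_group_number() is an assumption; its three
       table arguments are expected to be the values obtained from
       PCRE_INFO_NAMETABLE (cast to unsigned char *), PCRE_INFO_NAMECOUNT,
       and PCRE_INFO_NAMEENTRYSIZE. A linear scan is used for simplicity.

```c
#include <string.h>

/* Hypothetical helper: find the number of a named capturing parenthesis.
   table      - pointer obtained via PCRE_INFO_NAMETABLE
   count      - value obtained via PCRE_INFO_NAMECOUNT
   entry_size - value obtained via PCRE_INFO_NAMEENTRYSIZE
   Returns the group number, or -1 if the name is not in the table. */
int lookup_group_number(const unsigned char *table, int count,
                        int entry_size, const char *name)
{
    int i;

    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entry_size;

        /* Bytes 0-1 hold the number, most significant byte first;
           the zero-terminated name starts at byte 2. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

       Applied to the example table above (four entries of eight bytes),
       a lookup of "month" gives 4 and a lookup of "day" gives 5. In real
       programs pcre_get_stringnumber() performs this conversion.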

         PCRE_INFO_OKPARTIAL

       Return  1  if  the  pattern  can  be  used  for  partial  matching with
       pcre_exec(), otherwise 0. The fourth argument should point  to  an  int
       variable.  From  release  8.00,  this  always  returns  1,  because the
       restrictions that previously applied  to  partial  matching  have  been
       lifted.   The   pcrepartial  documentation  gives  details  of  partial
       matching.

         PCRE_INFO_OPTIONS

       Return a copy of the options with which the pattern was  compiled.  The
       fourth  argument  should  point to an unsigned long int variable. These
       option bits are those specified in the call to pcre_compile(), modified
       by any top-level option settings at the start of the pattern itself. In
       other words, they are the options that will be in force  when  matching
       starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with
       the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,
       and PCRE_EXTENDED.

       A  pattern  is  automatically  anchored by PCRE if all of its top-level
       alternatives begin with one of the following:

         ^     unless PCRE_MULTILINE is set
         \A    always
         \G    always
         .*    if PCRE_DOTALL is set and there are no back
                 references to the subpattern in which .* appears

       For such patterns, the PCRE_ANCHORED bit is set in the options returned
       by pcre_fullinfo().

         PCRE_INFO_SIZE

       Return  the  size  of the compiled pattern, that is, the value that was
       passed as the argument to pcre_malloc() when PCRE was getting memory in
       which to place the compiled data. The fourth argument should point to a
       size_t variable.

         PCRE_INFO_STUDYSIZE

       Return the size of the data block pointed to by the study_data field in
       a  pcre_extra  block.  That  is,  it  is  the  value that was passed to
       pcre_malloc() when PCRE was getting memory into which to place the data
       created  by  pcre_study().  If pcre_extra is NULL, or there is no study
       data, zero is returned. The fourth argument should point  to  a  size_t
       variable.

OBSOLETE INFO FUNCTION


       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       The  pcre_info()  function is now obsolete because its interface is too
       restrictive to return all the available data about a compiled  pattern.
       New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
       pcre_info() is the number of  capturing  subpatterns,  or  one  of  the
       following negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found

       If  the  optptr  argument is not NULL, a copy of the options with which
       the pattern was compiled is placed in the integer  it  points  to  (see
       PCRE_INFO_OPTIONS above).

       If  the  pattern  is  not anchored and the firstcharptr argument is not
       NULL, it is used to pass back information about the first character  of
       any matched string (see PCRE_INFO_FIRSTBYTE above).

REFERENCE COUNTS


       int pcre_refcount(pcre *code, int adjust);

       The  pcre_refcount()  function is used to maintain a reference count in
       the data block that contains a compiled pattern. It is provided for the
       benefit  of  applications  that  operate  in an object-oriented manner,
       where different parts of the application may be using the same compiled
       pattern, but you want to free the block when they are all done.

       When a pattern is compiled, the reference count field is initialized to
       zero.  It is changed only by calling this function, whose action is  to
       add  the  adjust  value  (which may be positive or negative) to it. The
       yield of the function is the new value. However, the value of the count
       is  constrained to lie between 0 and 65535, inclusive. If the new value
       is outside these limits, it is forced to the appropriate limit value.

       Except when it is zero, the reference count is not correctly  preserved
       if  a  pattern  is  compiled on one host and then transferred to a host
       whose byte-order is different. (This seems a highly unlikely scenario.)

MATCHING A PATTERN: THE TRADITIONAL FUNCTION


       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       The  function pcre_exec() is called to match a subject string against a
       compiled pattern, which is passed in the code argument. If the  pattern
       was  studied,  the  result  of  the study should be passed in the extra
       argument. This function is the main matching facility of  the  library,
       and it operates in a Perl-like manner. For specialist use there is also
       an alternative matching function,  which  is  described  below  in  the
       section about the pcre_dfa_exec() function.

       In  most  applications,  the  pattern  will  have  been  compiled  (and
       optionally  studied)  in  the  same  process  that  calls  pcre_exec().
       However,  it  is possible to save compiled patterns and study data, and
       then use them later in different processes, possibly even on  different
       hosts.   For   a   discussion   about   this,  see  the  pcreprecompile
       documentation.

       Here is an example of a simple call to pcre_exec():

         int rc;
         int ovector[30];
         rc = pcre_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn’t study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           30);            /* number of elements (NOT size in bytes) */

   Extra data for pcre_exec()

       If the extra argument is not NULL, it must point to a  pcre_extra  data
       block.  The pcre_study() function returns such a block (when it doesn’t
       return NULL), but you can  also  create  one  for  yourself,  and  pass
       additional  information  in  it.  The  pcre_extra  block  contains  the
       following fields (not necessarily in this order):

         unsigned long int flags;
         void *study_data;
         unsigned long int match_limit;
         unsigned long int match_limit_recursion;
         void *callout_data;
         const unsigned char *tables;

       The flags field is a bitmap that specifies which of  the  other  fields
       are set. The flag bits are:

         PCRE_EXTRA_STUDY_DATA
         PCRE_EXTRA_MATCH_LIMIT
         PCRE_EXTRA_MATCH_LIMIT_RECURSION
         PCRE_EXTRA_CALLOUT_DATA
         PCRE_EXTRA_TABLES

       Other  flag  bits should be set to zero. The study_data field is set in
       the pcre_extra block that is returned by  pcre_study(),  together  with
       the appropriate flag bit. You should not set this yourself, but you may
       add to the block by setting the other fields  and  their  corresponding
       flag bits.

       The match_limit field provides a means of preventing PCRE from using up
       a vast amount of resources when running patterns that are not going  to
       match,  but  which  have  a very large number of possibilities in their
       search trees. The  classic  example  is  a  pattern  that  uses  nested
       unlimited repeats.

       Internally,  PCRE  uses  a  function  called  match()  which  it  calls
       repeatedly (sometimes recursively). The limit  set  by  match_limit  is
       imposed  on the number of times this function is called during a match,
       which has the effect of limiting the amount of  backtracking  that  can
       take place. For patterns that are not anchored, the count restarts from
       zero for each position in the subject string.

       The default value for the limit can be set  when  PCRE  is  built;  the
       built-in default is 10 million, which handles all but the most extreme
       cases. You can override the default by supplying pcre_exec() with a
       pcre_extra     block    in    which    match_limit    is    set,    and
       PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is
       exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.

       The  match_limit_recursion field is similar to match_limit, but instead
       of limiting the total number of times that match() is called, it limits
       the  depth  of  recursion. The recursion depth is a smaller number than
       the total number of  calls,  because  not  all  calls  to  match()  are
       recursive.   This  limit  is  of  use  only  if  it is set smaller than
       match_limit.

       Limiting the recursion depth limits the amount of  stack  that  can  be
       used, or, when PCRE has been compiled to use memory on the heap instead
       of the stack, the amount of heap memory that can be used.

       The default value for match_limit_recursion can be  set  when  PCRE  is
       built; the built-in default is the same value as the default for
       match_limit. You can override the default by supplying pcre_exec() with
       a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and
       PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the
       limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.

       The  callout_data  field  is  used  in  conjunction  with the "callout"
       feature, and is described in the pcrecallout documentation.

       The tables field  is  used  to  pass  a  character  tables  pointer  to
       pcre_exec();  this overrides the value that is stored with the compiled
       pattern. A non-NULL value is stored with the compiled pattern  only  if
       custom   tables  were  supplied  to  pcre_compile()  via  its  tableptr
       argument.  If NULL is passed to pcre_exec() using  this  mechanism,  it
       forces PCRE’s internal tables to be used. This facility is helpful when
       re-using patterns that have been saved after compiling with an external
       set  of  tables,  because  the  external tables might be at a different
       address  when   pcre_exec()   is   called.   See   the   pcreprecompile
       documentation  for  a  discussion of saving compiled patterns for later
       use.

   Option bits for pcre_exec()

       The unused bits of the options argument for pcre_exec() must  be  zero.
       The only bits that may be set are PCRE_ANCHORED, PCRE_BSR_xxx,
       PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
       PCRE_NOTEMPTY_ATSTART, PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK,
       PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.

         PCRE_ANCHORED

       The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
       matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
       turned out to be anchored by virtue of its contents, it cannot be  made
       unanchored at matching time.

         PCRE_BSR_ANYCRLF
         PCRE_BSR_UNICODE

       These options (which are mutually exclusive) control what the \R escape
       sequence matches. The choice is either to match only CR, LF,  or  CRLF,
       or  to  match  any Unicode newline sequence. These options override the
       choice that was made or defaulted when the pattern was compiled.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANYCRLF
         PCRE_NEWLINE_ANY

       These options override  the  newline  definition  that  was  chosen  or
       defaulted   when  the  pattern  was  compiled.  For  details,  see  the
       description of  pcre_compile()  above.  During  matching,  the  newline
       choice  affects  the  behaviour  of  the  dot,  circumflex,  and dollar
       metacharacters. It may  also  alter  the  way  the  match  position  is
       advanced after a match failure for an unanchored pattern.

       When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is
       set, and a match attempt for  an  unanchored  pattern  fails  when  the
       current  position  is  at  a CRLF sequence, and the pattern contains no
       explicit matches for  CR  or  LF  characters,  the  match  position  is
       advanced by two characters instead of one, in other words, to after the
       CRLF.

       The above rule is a compromise that makes the most common cases work as
       expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL
       option is not set), it does not match the string "\r\nA" because, after
       failing  at the start, it skips both the CR and the LF before retrying.
       However, the  pattern  [\r\n]A  does  match  that  string,  because  it
       contains  an  explicit  CR or LF reference, and so advances only by one
       character after the first failure.

       An explicit match for CR or LF is either a literal appearance of one of
       those  characters,  or  one  of the \r or \n escape sequences. Implicit
       matches such as [^X] do not count, nor does \s (which includes  CR  and
       LF in the characters that it matches).

       Notwithstanding  the above, anomalous effects may still occur when CRLF
       is a valid newline sequence and explicit \r or \n escapes appear in the
       pattern.

         PCRE_NOTBOL

       This option specifies that the first character of the subject string is not
       the beginning of a line, so the  circumflex  metacharacter  should  not
       match  before it. Setting this without PCRE_MULTILINE (at compile time)
       causes  circumflex  never  to  match.  This  option  affects  only  the
       behaviour of the circumflex metacharacter. It does not affect \A.

         PCRE_NOTEOL

       This option specifies that the end of the subject string is not the end
       of a line, so the dollar metacharacter should not match it nor  (except
       in  multiline  mode)  a  newline  immediately  before  it. Setting this
       without PCRE_MULTILINE (at compile time) causes dollar never to  match.
       This  option affects only the behaviour of the dollar metacharacter. It
       does not affect \Z or \z.

         PCRE_NOTEMPTY

       An empty string is not considered to be a valid match if this option is
       set.  If  there are alternatives in the pattern, they are tried. If all
       the alternatives match the empty string, the entire  match  fails.  For
       example, if the pattern

         a?b?

       is  applied  to  a  string not beginning with "a" or "b", it matches an
       empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
       match  is  not  valid,  so  PCRE  searches  further into the string for
       occurrences of "a" or "b".

         PCRE_NOTEMPTY_ATSTART

       This is like PCRE_NOTEMPTY, except that an empty string match  that  is
       not  at  the  start  of  the  subject  is  permitted. If the pattern is
       anchored, such a match can occur only if the pattern contains \K.

       Perl    has    no    direct    equivalent    of    PCRE_NOTEMPTY     or
       PCRE_NOTEMPTY_ATSTART,  but  it  does  make a special case of a pattern
       match of the empty string within its split() function, and  when  using
       the  /g  modifier.  It  is  possible  to emulate Perl’s behaviour after
       matching a null string by first trying the  match  again  at  the  same
       offset  with  PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then if that
       fails, by advancing the starting  offset  (see  below)  and  trying  an
       ordinary  match  again.  There is some code that demonstrates how to do
       this in the pcredemo sample program.
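
       The retry logic described above can be sketched as a small planning
       helper. This is an illustrative sketch, not code taken from pcredemo:
       struct next_attempt, plan_after_match() and advance_after_failed_retry()
       are hypothetical names, and the sketch assumes a non-UTF-8 subject in
       which advancing by a single byte is always valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: what to pass to the next pcre_exec() call. */
struct next_attempt {
    int startoffset;       /* byte offset at which to start the next call */
    int nonempty_anchored; /* nonzero: set PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED */
};

/* Plan the call that follows a successful match whose overall offsets
   are ovector[0] (start) and ovector[1] (end). */
static struct next_attempt plan_after_match(const int *ovector)
{
    struct next_attempt next;
    assert(ovector != NULL);
    next.startoffset = ovector[1];
    /* An empty match has equal start and end offsets. */
    next.nonempty_anchored = (ovector[0] == ovector[1]);
    return next;
}

/* Called when the anchored, non-empty retry at startoffset has failed.
   Returns the offset for an ordinary match attempt, or -1 when the end
   of the subject has been reached and the search is over. */
static int advance_after_failed_retry(int startoffset, int length)
{
    return (startoffset < length) ? startoffset + 1 : -1;
}
```

       A caller would set the two option bits only when nonempty_anchored is
       nonzero, and fall back to advance_after_failed_retry() only when that
       retry returns PCRE_ERROR_NOMATCH.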

         PCRE_NO_START_OPTIMIZE

       There are a number of optimizations that pcre_exec() uses at the  start
       of  a  match,  in  order to speed up the process. For example, if it is
       known that a match must start with a specific  character,  it  searches
       the subject for that character, and fails immediately if it cannot find
       it, without actually running the main matching function. When  callouts
       are  in  use,  these  optimizations  can cause them to be skipped. This
       option disables the "start-up" optimizations,  causing  performance  to
       suffer, but ensuring that the callouts do occur.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set at compile time, the validity of the subject as a
       UTF-8 string is automatically checked when pcre_exec() is  subsequently
       called.   The  value  of  startoffset is also checked to ensure that it
       points to the start of a UTF-8 character. There is a  discussion  about
       the  validity  of  UTF-8 strings in the section on UTF-8 support in the
       main pcre page. If  an  invalid  UTF-8  sequence  of  bytes  is  found,
       pcre_exec()   returns  the  error  PCRE_ERROR_BADUTF8.  If  startoffset
       contains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.

       If you already know that your subject is valid, and you  want  to  skip
       these    checks    for   performance   reasons,   you   can   set   the
       PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to
       do  this  for the second and subsequent calls to pcre_exec() if you are
       making repeated calls to find all  the  matches  in  a  single  subject
       string.  However,  you  should  be  sure  that the value of startoffset
       points to the start of a UTF-8 character.  When  PCRE_NO_UTF8_CHECK  is
       set,  the  effect of passing an invalid UTF-8 string as a subject, or a
       value of startoffset that does not  point  to  the  start  of  a  UTF-8
       character, is undefined. Your program may crash.
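
       Before setting PCRE_NO_UTF8_CHECK, a caller may want its own cheap
       test that startoffset lands on a character boundary. The following
       sketch assumes the subject is already known to be valid UTF-8;
       is_utf8_char_start() is a hypothetical helper, not part of the PCRE
       API, and treating an offset equal to the length as acceptable is a
       choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if offset can be the start of a UTF-8 character in a
   subject that is already known to be valid UTF-8, otherwise 0. */
static int is_utf8_char_start(const char *subject, int length, int offset)
{
    unsigned char byte;
    assert(subject != NULL && length >= 0);
    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;              /* end of subject: no byte to inspect */
    byte = (unsigned char)subject[offset];
    /* Continuation bytes have the bit pattern 10xxxxxx; any other byte
       begins a character in a valid UTF-8 string. */
    return (byte & 0xC0) != 0x80;
}
```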

         PCRE_PARTIAL_HARD
         PCRE_PARTIAL_SOFT

       These  options  turn  on  the  partial  matching feature. For backwards
       compatibility, PCRE_PARTIAL  is  a  synonym  for  PCRE_PARTIAL_SOFT.  A
       partial  match  occurs  if  the  end  of  the subject string is reached
       successfully, but there are not enough subject characters  to  complete
       the  match.  If this happens when PCRE_PARTIAL_HARD is set, pcre_exec()
       immediately returns PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT
       is  set,  matching continues by testing any other alternatives. Only if
       they   all   fail   is   PCRE_ERROR_PARTIAL   returned   (instead    of
       PCRE_ERROR_NOMATCH).  The portion of the string that was inspected when
       the partial match was found is set as the first matching string.  There
       is a more detailed discussion in the pcrepartial documentation.

   The string to be matched by pcre_exec()

       The  subject string is passed to pcre_exec() as a pointer in subject, a
       length (in bytes) in length, and a starting byte offset in startoffset.
       In  UTF-8  mode,  the  byte  offset  must point to the start of a UTF-8
       character. Unlike the pattern string, the subject  may  contain  binary
       zero  bytes.  When  the starting offset is zero, the search for a match
       starts at the beginning of the subject, and this is  by  far  the  most
       common case.

       A  non-zero  starting offset is useful when searching for another match
       in the same subject by  calling  pcre_exec()  again  after  a  previous
       success.    Setting  startoffset  differs  from  just  passing  over  a
       shortened string and setting PCRE_NOTBOL in the case of a pattern  that
       begins with any kind of lookbehind. For example, consider the pattern

         \Biss\B

       which  finds  occurrences  of "iss" in the middle of words. (\B matches
       only if the current position in the subject is not  a  word  boundary.)
       When applied to the string "Mississippi", the first call to pcre_exec()
       finds the first occurrence. If pcre_exec() is called again with just
       the remainder of the subject, namely "issippi", it does not match,
       because \B is always false at the start of the subject, which is deemed
       to  be  a  word  boundary. However, if pcre_exec() is passed the entire
       string again, but with startoffset  set  to  4,  it  finds  the  second
       occurrence  of  "iss"  because  it  is able to look behind the starting
       point to discover that it is preceded by a letter.
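
       The difference can be seen by modelling only the \B test at the
       starting point. The sketch below is not PCRE's matcher; it assumes
       the default C locale, where a word character is a letter, a digit or
       an underscore, and is_word_char() and is_word_boundary() are
       hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Word character in the sense of \w for the default C locale. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Returns 1 if position pos in subject[0..length) is a word boundary,
   so that \B fails there, and 0 if it is not, so that \B succeeds.
   Positions outside the subject count as non-word characters. */
static int is_word_boundary(const char *subject, int length, int pos)
{
    int before, after;
    assert(subject != NULL && pos >= 0 && pos <= length);
    before = (pos > 0) && is_word_char((unsigned char)subject[pos - 1]);
    after = (pos < length) && is_word_char((unsigned char)subject[pos]);
    return before != after;
}
```

       With the whole subject and a starting position of 4, the character
       before the starting point is visible and the position is not a
       boundary. With the shortened subject, position 0 has nothing before
       it and is therefore a boundary, so \B cannot match there.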

       If a non-zero starting offset is passed when the pattern  is  anchored,
       one attempt to match at the given offset is made. This can only succeed
       if the pattern does not require the match to be at  the  start  of  the
       subject.

   How pcre_exec() returns captured substrings

       In  general, a pattern matches a certain portion of the subject, and in
       addition, further substrings from the subject  may  be  picked  out  by
       parts  of  the  pattern.  Following the usage in Jeffrey Friedl’s book,
       this is called "capturing" in what follows, and the  phrase  "capturing
       subpattern"  is  used  for  a  fragment  of  a pattern that picks out a
       substring.  PCRE  supports  several  other   kinds   of   parenthesized
       subpattern that do not cause substrings to be captured.

       Captured substrings are returned to the caller via a vector of integers
       whose address is passed in ovector.  The  number  of  elements  in  the
       vector  is  passed  in  ovecsize,  which must be a non-negative number.
       Note: this argument is NOT the size of ovector in bytes.

       The first two-thirds of the  vector  is  used  to  pass  back  captured
       substrings,  each  substring  using  a  pair of integers. The remaining
       third of the vector is used as workspace by pcre_exec() while  matching
       capturing   subpatterns,   and   is  not  available  for  passing  back
       information. The number passed in ovecsize should always be a  multiple
       of three. If it is not, it is rounded down.

       When  a  match  is successful, information about captured substrings is
       returned in pairs of integers, starting at the  beginning  of  ovector,
       and  continuing  up  to two-thirds of its length at the most. The first
       element of each pair is set to the byte offset of the  first  character
       in  a  substring, and the second is set to the byte offset of the first
       character after the end of a substring. Note: these values  are  always
       byte offsets, even in UTF-8 mode. They are not character counts.

       The  first  pair  of  integers, ovector[0] and ovector[1], identify the
       portion of the subject string matched by the entire pattern.  The  next
       pair  is  used for the first capturing subpattern, and so on. The value
       returned by pcre_exec() is one more than the highest numbered pair that
       has  been  set.  For example, if two substrings have been captured, the
       returned value is 3. If there are no capturing subpatterns, the  return
       value from a successful match is 1, indicating that just the first pair
       of offsets has been set.
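
       As a minimal sketch of reading these pairs, the helper below turns a
       pair number into a pointer and a byte length within the subject.
       struct capture and get_capture() are hypothetical names introduced
       for illustration; the subject is not copied, so the result is not a
       zero-terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: start is NULL when the pair is not available. */
struct capture {
    const char *start;  /* first byte of the substring within the subject */
    int length;         /* length in bytes, not in characters */
};

/* rc is the positive value returned by pcre_exec(); pair 0 is the whole
   match, pair 1 the first capturing subpattern, and so on. */
static struct capture get_capture(const char *subject, const int *ovector,
                                  int rc, int pair)
{
    struct capture result = { NULL, 0 };
    assert(subject != NULL && ovector != NULL);
    /* rc is one more than the highest numbered pair that has been set. */
    if (pair < 0 || pair >= rc)
        return result;
    /* A pair of -1 values marks a subpattern that did not take part. */
    if (ovector[2 * pair] < 0)
        return result;
    result.start = subject + ovector[2 * pair];
    result.length = ovector[2 * pair + 1] - ovector[2 * pair];
    return result;
}
```

       The result can be printed with a precision, for example
       printf("%.*s", c.length, c.start), provided the substring contains no
       binary zero bytes.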

       If a capturing subpattern is matched repeatedly, it is the last portion
       of the string that it matched that is returned.

       If  the vector is too small to hold all the captured substring offsets,
       it is used as far as possible (up to two-thirds of its length), and the
       function  returns  a value of zero. If the substring offsets are not of
       interest, pcre_exec() may be called with ovector  passed  as  NULL  and
       ovecsize  as zero. However, if the pattern contains back references and
       the ovector is not big enough to remember the related substrings,  PCRE
       has  to  get  additional  memory  for  use  during matching. Thus it is
       usually advisable to supply an ovector.

       The pcre_fullinfo() function can be used to find out how many capturing
       subpatterns  there  are  in  a  compiled pattern. The smallest size for
       ovector that will allow for n captured substrings, in addition  to  the
       offsets of the substring matched by the whole pattern, is (n+1)*3.
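
       The sizing rule is simple arithmetic, sketched here with two
       hypothetical helpers, ovector_elements_for() and
       ovector_pairs_available(). The value of n would typically come from
       pcre_fullinfo().

```c
#include <assert.h>

/* Smallest ovecsize that can return n captured substrings in addition
   to the offsets of the overall match. */
static int ovector_elements_for(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Number of offset pairs that can be returned for a given ovecsize.
   One third of the vector is workspace, and a size that is not a
   multiple of three is rounded down. */
static int ovector_pairs_available(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```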

       It  is  possible for capturing subpattern number n+1 to match some part
       of the subject when subpattern n has not been used at all. For example,
       if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
       return from the function is 4, and subpatterns 1 and 3 are matched, but
       2  is  not.  When  this  happens,  both  values  in  the  offset  pairs
       corresponding to unused subpatterns are set to -1.

       Offset values that correspond to unused subpatterns at the end  of  the
       expression  are  also  set  to  -1. For example, if the string "abc" is
       matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
       matched.  The  return  from the function is 2, because the highest used
       capturing subpattern number is 1. However, you can refer to the offsets
       for  the  second  and third capturing subpatterns if you wish (assuming
       the vector is large enough, of course).
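
       A caller that needs to know whether a given subpattern took part in
       the match can test its first offset. The helper group_is_set() below
       is a hypothetical name; paircount is assumed to be the number of
       pairs the vector can hold (ovecsize divided by three), not the value
       returned by pcre_exec().

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if capturing subpattern n was used in the match, 0 if its
   offsets are unset (-1) or n lies outside the returned pairs. */
static int group_is_set(const int *ovector, int paircount, int n)
{
    assert(ovector != NULL);
    if (n < 0 || n >= paircount)
        return 0;
    return ovector[2 * n] >= 0;
}
```

       For the two examples above, subpattern 2 is reported as unset in the
       first case, and subpatterns 2 and 3 are reported as unset in the
       second.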

       Some convenience functions are provided  for  extracting  the  captured
       substrings as separate strings. These are described below.

   Error return values from pcre_exec()

       If  pcre_exec()  fails, it returns a negative number. The following are
       defined in the header file:

         PCRE_ERROR_NOMATCH        (-1)

       The subject string did not match the pattern.

         PCRE_ERROR_NULL           (-2)

       Either code or subject was passed as NULL,  or  ovector  was  NULL  and
       ovecsize was not zero.

         PCRE_ERROR_BADOPTION      (-3)

       An unrecognized bit was set in the options argument.

         PCRE_ERROR_BADMAGIC       (-4)

       PCRE  stores a 4-byte "magic number" at the start of the compiled code,
       to catch the case when it is passed a junk pointer and to detect when a
       pattern that was compiled in an environment of one endianness is run in
       an environment with the other endianness. This is the error  that  PCRE
       gives when the magic number is not present.

         PCRE_ERROR_UNKNOWN_OPCODE (-5)

       While running the pattern match, an unknown item was encountered in the
       compiled pattern. This error could be caused by a bug  in  PCRE  or  by
       overwriting of the compiled pattern.

         PCRE_ERROR_NOMEMORY       (-6)

       If  a  pattern contains back references, but the ovector that is passed
       to pcre_exec() is not big enough to remember the referenced substrings,
       PCRE  gets  a  block of memory at the start of matching to use for this
       purpose. If the call via pcre_malloc() fails, this error is given.  The
       memory is automatically freed at the end of matching.

         PCRE_ERROR_NOSUBSTRING    (-7)

       This  error is used by the pcre_copy_substring(), pcre_get_substring(),
       and  pcre_get_substring_list()  functions  (see  below).  It  is  never
       returned by pcre_exec().

         PCRE_ERROR_MATCHLIMIT     (-8)

       The  backtracking  limit,  as  specified  by the match_limit field in a
       pcre_extra structure (or defaulted) was reached.  See  the  description
       above.

         PCRE_ERROR_CALLOUT        (-9)

       This error is never generated by pcre_exec() itself. It is provided for
       use by callout functions that want to yield a distinctive  error  code.
       See the pcrecallout documentation for details.

         PCRE_ERROR_BADUTF8        (-10)

       A  string  that contains an invalid UTF-8 byte sequence was passed as a
       subject.

         PCRE_ERROR_BADUTF8_OFFSET (-11)

       The UTF-8 byte sequence that was passed as a subject was valid, but the
       value  of  startoffset  did  not  point  to  the  beginning  of a UTF-8
       character.

         PCRE_ERROR_PARTIAL        (-12)

       The subject string did not match, but it did match partially.  See  the
       pcrepartial documentation for details of partial matching.

         PCRE_ERROR_BADPARTIAL     (-13)

       This  code  is  no  longer  in  use.  It was formerly returned when the
       PCRE_PARTIAL option was used with a compiled pattern  containing  items
       that  were  not  supported  for  partial  matching.  From  release 8.00
       onwards, there are no restrictions on partial matching.

         PCRE_ERROR_INTERNAL       (-14)

       An unexpected internal error has occurred. This error could  be  caused
       by a bug in PCRE or by overwriting of the compiled pattern.

         PCRE_ERROR_BADCOUNT       (-15)

       This  error is given if the value of the ovecsize argument is negative.

         PCRE_ERROR_RECURSIONLIMIT (-21)

       The internal recursion limit, as specified by the match_limit_recursion
       field  in  a  pcre_extra  structure (or defaulted) was reached. See the
       description above.

         PCRE_ERROR_BADNEWLINE     (-23)

       An invalid combination of PCRE_NEWLINE_xxx options was given.

       Error numbers -16 to -20 and -22 are not used by pcre_exec().
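
       One way to handle these return values is to sort them into a few
       broad classes before deciding what to do. The grouping below is a
       suggestion made for this sketch, not something PCRE defines; enum
       exec_outcome and classify_exec_result() are hypothetical names, and
       the numeric values are those listed above. Real code would use the
       macros from pcre.h instead of literal numbers.

```c
#include <assert.h>

enum exec_outcome {
    EXEC_MATCHED,            /* return value greater than zero */
    EXEC_MATCHED_TRUNCATED,  /* zero: ovector too small for all offsets */
    EXEC_NO_MATCH,           /* PCRE_ERROR_NOMATCH */
    EXEC_PARTIAL,            /* PCRE_ERROR_PARTIAL */
    EXEC_BAD_ARGUMENT,       /* something wrong with what the caller passed */
    EXEC_LIMIT_OR_MEMORY,    /* a limit was reached or memory ran out */
    EXEC_INTERNAL,           /* possible bug or corrupted compiled pattern */
    EXEC_OTHER               /* any other negative value */
};

static enum exec_outcome classify_exec_result(int rc)
{
    if (rc > 0)
        return EXEC_MATCHED;
    if (rc == 0)
        return EXEC_MATCHED_TRUNCATED;
    switch (rc) {
    case -1:                       /* NOMATCH */
        return EXEC_NO_MATCH;
    case -12:                      /* PARTIAL */
        return EXEC_PARTIAL;
    case -2: case -3: case -4:     /* NULL, BADOPTION, BADMAGIC */
    case -10: case -11:            /* BADUTF8, BADUTF8_OFFSET */
    case -15: case -23:            /* BADCOUNT, BADNEWLINE */
        return EXEC_BAD_ARGUMENT;
    case -6: case -8: case -21:    /* NOMEMORY, MATCHLIMIT, RECURSIONLIMIT */
        return EXEC_LIMIT_OR_MEMORY;
    case -5: case -14:             /* UNKNOWN_OPCODE, INTERNAL */
        return EXEC_INTERNAL;
    default:
        assert(rc < 0);            /* non-negative values were handled above */
        return EXEC_OTHER;
    }
}
```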

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER


       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       Captured substrings can be  accessed  directly  by  using  the  offsets
       returned  by  pcre_exec()  in  ovector.  For convenience, the functions
       pcre_copy_substring(),            pcre_get_substring(),             and
       pcre_get_substring_list()   are   provided   for   extracting  captured
       substrings as new, separate, zero-terminated strings.  These  functions
       identify substrings by number. The next section describes functions for
       extracting named substrings.

       A substring that contains a binary zero is correctly extracted and  has
       a  further zero added on the end, but the result is not, of course, a C
       string.  However, you can process such a string  by  referring  to  the
       length     that    is    returned    by    pcre_copy_substring()    and
       pcre_get_substring().      Unfortunately,     the     interface      to
       pcre_get_substring_list()   is   not   adequate  for  handling  strings
       containing binary zeros, because the end of the  final  string  is  not
       independently indicated.

       The  first  three  arguments  are  the  same  for  all  three  of these
       functions:  subject  is  the  subject  string  that   has   just   been
       successfully  matched,  ovector  is  a pointer to the vector of integer
       offsets that was passed to pcre_exec(), and stringcount is  the  number
       of  substrings that were captured by the match, including the substring
       that matched the entire regular expression. This is the value  returned
       by  pcre_exec()  if  it  is  greater than zero. If pcre_exec() returned
       zero, indicating that it ran out of space in ovector, the value  passed
       as  stringcount  should be the number of elements in the vector divided
       by three.
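
       The rule for choosing stringcount can be written as a small helper.
       stringcount_from_exec() is a hypothetical name used only for this
       sketch.

```c
#include <assert.h>

/* Value to pass as stringcount, given the return value of pcre_exec()
   and the ovecsize that was passed to it. Returns -1 if the match
   failed, in which case there is nothing to extract. */
static int stringcount_from_exec(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;             /* the normal case */
    if (rc == 0)
        return ovecsize / 3;   /* ovector was too small: use what fits */
    return -1;
}
```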

       The functions pcre_copy_substring() and pcre_get_substring() extract  a
       single  substring,  whose  number  is given as stringnumber. A value of
       zero extracts the substring that matched the  entire  pattern,  whereas
       higher     values     extract     the    captured    substrings.    For
       pcre_copy_substring(), the string is placed in buffer, whose length  is
       given  by  buffersize,  while  for  pcre_get_substring() a new block of
       memory is obtained via pcre_malloc, and its  address  is  returned  via
       stringptr.  The  yield of the function is the length of the string, not
       including the terminating zero, or one of these error codes:

         PCRE_ERROR_NOMEMORY       (-6)

       The buffer was too small for pcre_copy_substring(), or the  attempt  to
       get memory failed for pcre_get_substring().

         PCRE_ERROR_NOSUBSTRING    (-7)

       There is no substring whose number is stringnumber.
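
       To make the documented behaviour concrete, here is a stand-alone
       sketch that mimics what pcre_copy_substring() is described as doing.
       It is a reimplementation for illustration only, not the library's
       source; copy_substring_sketch() and the SKETCH_ macros are
       hypothetical names, with values copied from the error codes above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ERROR_NOMEMORY    (-6)  /* same value as PCRE_ERROR_NOMEMORY */
#define SKETCH_ERROR_NOSUBSTRING (-7)  /* same value as PCRE_ERROR_NOSUBSTRING */

static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    int start, length;
    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;
    start = ovector[2 * stringnumber];
    /* An unset substring has offsets (-1, -1), giving a length of zero. */
    length = ovector[2 * stringnumber + 1] - start;
    /* Room is needed for the substring and its terminating zero. */
    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;
    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```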

       The   pcre_get_substring_list()   function   extracts   all   available
       substrings and builds a list of pointers to them. All this is done in a
       single block of memory that is obtained via pcre_malloc. The address of
       the memory block is returned via listptr, which is also  the  start  of
       the  list  of  string pointers. The end of the list is marked by a NULL
       pointer. The yield of the function is zero if all  went  well,  or  the
       error code

         PCRE_ERROR_NOMEMORY       (-6)

       if the attempt to get the memory block failed.

       When any of these functions encounters a substring that is unset, which
       can happen when capturing subpattern number n+1 matches  some  part  of
       the  subject, but subpattern n has not been used at all, they return an
       empty string. This can be  distinguished  from  a  genuine  zero-length
       substring  by  inspecting  the  appropriate offset in ovector, which is
       negative for unset substrings.

       The    two    convenience    functions    pcre_free_substring()     and
       pcre_free_substring_list() can be used to free the memory returned by a
       previous call  of  pcre_get_substring()  or  pcre_get_substring_list(),
       respectively. They do nothing more than call the function pointed to by
       pcre_free, which of course could be called directly from a  C  program.
       However,  PCRE  is  used  in  some  situations where it is linked via a
       special interface to  another  programming  language  that  cannot  use
       pcre_free  directly;  it  is  for  these  cases  that the functions are
       provided.

EXTRACTING CAPTURED SUBSTRINGS BY NAME


       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       To extract a substring by name, you first have to find the associated
       number. For example, for this pattern

         (a+)b(?<xxx>\d+)...

       the number of the subpattern called "xxx" is 2. If the name is known to
       be unique (PCRE_DUPNAMES was not set), you can find the number from the
       name  by  calling  pcre_get_stringnumber().  The  first argument is the
       compiled pattern, and the second is the name. The yield of the function
       is the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no
       subpattern of that name.

       Given the number, you can extract the substring directly, or use one of
       the functions described in the previous section. For convenience, there
       are also two functions that do the whole job.

       Most   of   the   arguments    of    pcre_copy_named_substring()    and
       pcre_get_named_substring()  are  the  same  as  those for the similarly
       named functions that extract by number. As these are described  in  the
       previous  section,  they  are not re-described here. There are just two
       differences:

       First, instead of a  substring  number,  a  substring  name  is  given.
       Second,  there  is  an  extra  argument, given at the start, which is a
       pointer to the compiled pattern. This is needed in order to gain access
       to the name-to-number translation table.

       These  functions call pcre_get_stringnumber(), and if it succeeds, they
       then   call   pcre_copy_substring()   or    pcre_get_substring(),    as
       appropriate.  NOTE:  If  PCRE_DUPNAMES  is  set and there are duplicate
       names, the behaviour may not be what you want (see the next section).

       Warning: If the pattern  uses  the  (?|  feature  to  set  up  multiple
       subpatterns  with  the  same  number,  as  described  in the section on
       duplicate subpattern numbers in the pcrepattern page,  you  cannot  use
       names  to  distinguish the different subpatterns, because names are not
       included in the compiled code. The matching process uses only  numbers.
       For this reason, the use of different names for subpatterns of the same
       number causes an error at compile time.

DUPLICATE SUBPATTERN NAMES


       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
       subpatterns  are not required to be unique. (Duplicate names are always
       allowed for subpatterns with the same number, created by using the  (?|
       feature.  Indeed,  if  such subpatterns are named, they are required to
       use the same names.)

       Normally, patterns with duplicate names are such that in any one match,
       only  one of the named subpatterns participates. An example is shown in
       the pcrepattern documentation.

       When   duplicates   are   present,   pcre_copy_named_substring()    and
       pcre_get_named_substring()  return the first substring corresponding to
       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
       function returns one of the numbers that are associated with the  name,
       but it is not defined which it is.

       If  you want to get full details of all captured substrings for a given
       name, you must use  the  pcre_get_stringtable_entries()  function.  The
       first argument is the compiled pattern, and the second is the name. The
       third and fourth are pointers to variables which  are  updated  by  the
       function. After it has run, they point to the first and last entries in
       the name-to-number table  for  the  given  name.  The  function  itself
       returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
       there are none. The format of the  table  is  described  above  in  the
       section  entitled  Information about a pattern.  Given all the relevant
       entries for the name, you can extract each of their numbers, and  hence
       the captured data, if any.
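
       Walking the entries can be sketched as follows. The sketch relies on
       the table format given in the section entitled Information about a
       pattern: each entry starts with the subpattern number in two bytes,
       most significant byte first, followed by the zero-terminated name.
       collect_group_numbers() is a hypothetical helper; first, last and
       entrysize are assumed to come from a successful call to
       pcre_get_stringtable_entries().

```c
#include <assert.h>
#include <stddef.h>

/* Stores the subpattern number of each table entry from first to last
   (inclusive) in numbers[], up to max_numbers of them. Returns the
   total number of entries visited. */
static int collect_group_numbers(const char *first, const char *last,
                                 int entrysize, int *numbers,
                                 int max_numbers)
{
    const unsigned char *entry;
    int count = 0;
    assert(first != NULL && last != NULL && numbers != NULL);
    assert(entrysize > 2);   /* two number bytes plus at least a zero */
    for (entry = (const unsigned char *)first;
         entry <= (const unsigned char *)last;
         entry += entrysize) {
        if (count < max_numbers)
            numbers[count] = (entry[0] << 8) | entry[1];
        count++;
    }
    return count;
}
```

       Each number obtained this way can then be checked against ovector to
       see which of the duplicate subpatterns, if any, was set.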

FINDING ALL POSSIBLE MATCHES


       The  traditional  matching  function  uses a similar algorithm to Perl,
       which stops when it finds the first match, starting at a given point in
       the  subject.  If you want to find all possible matches, or the longest
       possible match, consider using the alternative matching  function  (see
       below)  instead.  If you cannot use the alternative function, but still
       need to find all possible matches, you can kludge it up by  making  use
       of  the  callout  facility,  which  is  described  in  the  pcrecallout
       documentation.

       What you have to do is to insert a callout right  at  the  end  of  the
       pattern.   When  your  callout function is called, extract and save the
       current matched substring. Then return 1, which forces  pcre_exec()  to
       backtrack  and  try other alternatives. Ultimately, when it runs out of
       matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.

MATCHING A PATTERN: THE ALTERNATIVE FUNCTION


       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       The function pcre_dfa_exec()  is  called  to  match  a  subject  string
       against  a  compiled pattern, using a matching algorithm that scans the
       subject string just once, and does not backtrack.  This  has  different
       characteristics  to  the  normal  algorithm, and is not compatible with
       Perl. Some  of  the  features  of  PCRE  patterns  are  not  supported.
       Nevertheless, there are times when this kind of matching can be useful.
       For a discussion of the two matching algorithms, and a list of features
       that   pcre_dfa_exec()   does   not   support,   see  the  pcrematching
       documentation.

       The arguments for the pcre_dfa_exec() function  are  the  same  as  for
       pcre_exec(),  plus  two  extras.  The  ovector  argument  is  used in a
       different way, and this is described below. The other common  arguments
       are  used  in  the same way as for pcre_exec(), so their description is
       not repeated here.

       The two additional arguments provide workspace for  the  function.  The
       workspace  vector  should  contain at least 20 elements. It is used for
       keeping  track  of  multiple  paths  through  the  pattern  tree.  More
       workspace  will  be  needed for patterns and subjects where there are a
       lot of potential matches.

       Here is an example of a simple call to pcre_dfa_exec():

         int rc;
         int ovector[10];
         int wspace[20];
         rc = pcre_dfa_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn’t study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           10,             /* number of elements (NOT size in bytes) */
           wspace,         /* working space vector */
           20);            /* number of elements (NOT size in bytes) */

   Option bits for pcre_dfa_exec()

       The unused bits of the options argument  for  pcre_dfa_exec()  must  be
       zero.   The   only   bits   that   may   be   set   are  PCRE_ANCHORED,
       PCRE_NEWLINE_xxx,     PCRE_NOTBOL,     PCRE_NOTEOL,      PCRE_NOTEMPTY,
       PCRE_NOTEMPTY_ATSTART,      PCRE_NO_UTF8_CHECK,      PCRE_PARTIAL_HARD,
       PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the
       last  four  of  these are exactly the same as for pcre_exec(), so their
       description is not repeated here.

         PCRE_PARTIAL_HARD
         PCRE_PARTIAL_SOFT

       These have the same general effect as they do for pcre_exec(), but  the
       details  are  slightly  different.  When  PCRE_PARTIAL_HARD  is set for
       pcre_dfa_exec(), it  returns  PCRE_ERROR_PARTIAL  if  the  end  of  the
       subject is reached and there is still at least one matching possibility
       that requires additional characters. This happens even if some complete
       matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
       code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
       of  the  subject  is  reached, there have been no complete matches, but
       there is still at least one matching possibility. The  portion  of  the
       string  that  was inspected when the longest partial match was found is
       set as the first matching string in both cases.

         PCRE_DFA_SHORTEST

       Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
       stop  as  soon  as  it  has  found  one  match.  Because of the way the
       alternative algorithm works, this is necessarily the shortest  possible
       match at the first possible matching point in the subject string.

         PCRE_DFA_RESTART

       When pcre_dfa_exec() returns a partial match, it is possible to call it
       again, with additional subject characters, and have  it  continue  with
       the  same match. The PCRE_DFA_RESTART option requests this action; when
       it is set, the workspace and wscount arguments must reference the same
       vector  as  before  because data about the match so far is left in them
       after a partial match. There is more discussion of this facility in the
       pcrepartial documentation.

   Successful returns from pcre_dfa_exec()

       When  pcre_dfa_exec()  succeeds,  it  may  have  matched  more than one
       substring in the subject. Note, however, that all the matches from  one
       run of the function start at the same point in the subject. The shorter
       matches are all initial substrings of the longer matches. For  example,
       if the pattern

         <.*>

       is matched against the string

         This is <something> <something else> <something further> no more

       the three matched strings are

         <something>
         <something> <something else>
         <something> <something else> <something further>

       On  success,  the  yield of the function is a number greater than zero,
       which is the number of matched substrings.  The  substrings  themselves
       are  returned  in  ovector. Each string uses two elements; the first is
       the offset to the start, and the second is the offset to  the  end.  In
       fact,  all  the  strings  have the same start offset. (Space could have
       been saved by giving this only once, but it was decided to retain  some
       compatibility  with  the  way pcre_exec() returns data, even though the
       meaning of the strings is different.)

       The strings are returned in reverse  order  of  length;  that  is,  the
       longest  matching string is given first. If there were too many matches
       to fit into ovector, the yield of the function is zero, and the  vector
       is filled with the longest matches.
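
       Because the matches are ordered from longest to shortest, picking
       one out is a matter of indexing. The sketch below handles only a
       positive return value; dfa_match_length() and dfa_shortest_index()
       are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of match number i (0 is the longest) after a call to
   pcre_dfa_exec() that returned rc > 0. Returns -1 if i is out of range. */
static int dfa_match_length(const int *ovector, int rc, int i)
{
    assert(ovector != NULL);
    if (rc <= 0 || i < 0 || i >= rc)
        return -1;
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Index of the shortest match, which is the last one returned. */
static int dfa_shortest_index(int rc)
{
    return (rc > 0) ? rc - 1 : -1;
}
```

       For the <.*> example above, the three matches all start at byte
       offset 8, so a return value of 3 with ovector holding the pairs
       (8,56), (8,36) and (8,19) corresponds to the three strings listed.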

   Error returns from pcre_dfa_exec()

       The  pcre_dfa_exec()  function returns a negative number when it fails.
       Many of the errors are the same  as  for  pcre_exec(),  and  these  are
       described  above.   There are in addition the following errors that are
       specific to pcre_dfa_exec():

         PCRE_ERROR_DFA_UITEM      (-16)

       This return is given if  pcre_dfa_exec()  encounters  an  item  in  the
       pattern that it does not support, for instance, the use of \C or a back
       reference.

         PCRE_ERROR_DFA_UCOND      (-17)

       This return is given if pcre_dfa_exec()  encounters  a  condition  item
       that  uses  a back reference for the condition, or a test for recursion
       in a specific group. These are not supported.

         PCRE_ERROR_DFA_UMLIMIT    (-18)

       This return is given if pcre_dfa_exec() is called with an  extra  block
       that contains a setting of the match_limit field. This is not supported
       (it is meaningless).

         PCRE_ERROR_DFA_WSSIZE     (-19)

       This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
       workspace vector.

         PCRE_ERROR_DFA_RECURSE    (-20)

       When  a  recursive subpattern is processed, the matching function calls
       itself recursively, using private vectors for  ovector  and  workspace.
       This  error  is  given  if  the output vector is not large enough. This
       should be extremely rare, as a vector of size 1000 is used.

AUTHOR


       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION


       Last updated: 03 October 2009
       Copyright (c) 1997-2009 University of Cambridge.
