NAME

       uwildmat, uwildmat_simple, uwildmat_poison - Perform wildmat matching

SYNOPSIS

       #include <inn/libinn.h>

       bool uwildmat(const char *text, const char *pattern);

       bool uwildmat_simple(const char *text, const char *pattern);

       enum uwildmat uwildmat_poison(const char *text, const char *pattern);

DESCRIPTION

       uwildmat compares text against the wildmat expression pattern,
       returning true if and only if the expression matches the text.  "@" has
       no special meaning in pattern when passed to uwildmat.  Both text and
       pattern are assumed to be in the UTF-8 character encoding, although
       malformed UTF-8 sequences are treated in a way that attempts to be
       mostly compatible with single-octet character sets like ISO 8859-1.
       (In other words, if you try to match ISO 8859-1 text with these
       routines everything should work as expected unless the ISO 8859-1 text
       contains valid UTF-8 sequences, which thankfully is somewhat rare.)
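
       The fallback for malformed sequences can be pictured with the
       following minimal sketch.  It is an assumption about the general
       approach, not INN's actual code: char_length is a hypothetical helper
       that reports how many octets the next character occupies, and it
       treats anything that is not a structurally valid UTF-8 sequence as a
       single octet.  It makes no attempt to reject overlong encodings.

```c
#include <stddef.h>

/*
 * Return the number of octets occupied by the character starting at s.
 * A structurally valid multibyte UTF-8 sequence yields 2, 3, or 4; ASCII
 * and anything malformed is treated as a single octet.
 * Hypothetical helper for illustration; not part of libinn.
 */
size_t
char_length(const unsigned char *s)
{
    size_t len, i;

    if (s[0] < 0x80)
        return 1;               /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return 1;               /* stray continuation or invalid lead octet */

    /*
     * Every trailing octet must look like 10xxxxxx.  A NUL fails this
     * check, so the loop never reads past the end of the string.
     */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 1;           /* malformed: fall back to one octet */
    return len;
}
```

       Under this sketch the ISO 8859-1 octet 0xE9 followed by "t" counts as
       one single-octet character, while the octets 0xC3 0xA9 count as one
       two-octet character, which is the rare case mentioned above.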

       uwildmat_simple is identical to uwildmat except that neither "!" nor
       "," has any special meaning and pattern is always treated as a single
       pattern.  This function exists solely to support legacy interfaces like
       NNTP’s XPAT command, and should be avoided when implementing new
       features.

       uwildmat_poison works similarly to uwildmat, except that "@" as the
       first character of one of the patterns in the expression (see below)
       "poisons" the match if it matches.  uwildmat_poison returns
       UWILDMAT_MATCH if the expression matches the text, UWILDMAT_FAIL if it
       doesn’t, and UWILDMAT_POISON if the expression doesn’t match because a
       poisoned pattern matched the text.  These enumeration constants are
       defined in the inn/libinn.h header.

WILDMAT EXPRESSIONS

       A wildmat expression follows rules similar to those of shell filename
       wildcards but with some additions and changes.  A wildmat expression is
       composed of one or more wildmat patterns separated by commas.  Each
       character in the wildmat pattern matches a literal occurrence of that
       same character in the text, with the exception of the following
       metacharacters:

       ?       Matches any single character (including a single UTF-8
               multibyte character, so "?" can match more than one byte).

       *       Matches any sequence of zero or more characters.

       \       Turns off any special meaning of the following character; the
               following character will match itself in the text.  "\" will
               escape any character, including another backslash or a comma
               that otherwise would separate a pattern from the next pattern
               in an expression.  Note that "\" is not special inside a
               character set (no metacharacters are).

       [...]   A character set, which matches any single character that falls
               within that set.  The presence of a character between the
               brackets adds that character to the set; for example, "[amv]"
               specifies the set containing the characters "a", "m", and "v".
               A range of characters may be specified using "-"; for example,
               "[0-5abc]" is equivalent to "[012345abc]".  The order of
               characters is as defined in the UTF-8 character set, and if the
               start character of such a range falls after the ending
               character of the range in that ranking, the results of
               attempting a match with that pattern are undefined.

               In order to include a literal "]" character in the set, it must
               be the first character of the set (possibly following "^"); for
               example, "[]a]" matches either "]" or "a".  To include a
               literal "-" character in the set, it must be either the first
               or the last character of the set.  Backslashes have no special
               meaning inside a character set, nor do any other of the wildmat
               metacharacters.

       [^...]  A negated character set.  Follows the same rules as a character
               set above, but matches any character not contained in the set.
               So, for example, "[^]-]" matches any character except "]" and
               "-".

       In addition, "!" (and possibly "@") have special meaning as the first
       character of a pattern; see below.
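
       The metacharacters above can be illustrated with a small recursive
       matcher.  This is a simplified sketch, not the libinn implementation:
       match_one and match_set are hypothetical names, the code compares
       single octets (so "?" matches one byte rather than one UTF-8
       character), and it handles a single pattern only, giving no special
       treatment to ",", "!", or "@".

```c
#include <stdbool.h>

/*
 * Test octet c against the character set that starts just after '['.
 * On return, *end points just past the closing ']'.
 * Hypothetical helper for illustration; not part of libinn.
 */
static bool
match_set(unsigned char c, const char *p, const char **end)
{
    bool negate = false, found = false, first = true;

    if (*p == '^') {
        negate = true;
        p++;
    }
    /* A ']' in first position is a member of the set, not its end. */
    while (*p != '\0' && (first || *p != ']')) {
        unsigned char lo = (unsigned char) *p;
        unsigned char hi = lo;

        first = false;
        /* "x-y" is a range unless '-' is the last character of the set. */
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            hi = (unsigned char) p[2];
            p += 2;
        }
        if (c >= lo && c <= hi)
            found = true;
        p++;
    }
    *end = (*p == ']') ? p + 1 : p;
    return found != negate;
}

/* Return true if pat matches the whole of text (anchored at both ends). */
bool
match_one(const char *text, const char *pat)
{
    const char *next;

    switch (*pat) {
    case '\0':
        return *text == '\0';
    case '*':
        /* Let '*' absorb 0, 1, 2, ... octets until the rest matches. */
        for (;; text++) {
            if (match_one(text, pat + 1))
                return true;
            if (*text == '\0')
                return false;
        }
    case '?':
        return *text != '\0' && match_one(text + 1, pat + 1);
    case '[':
        if (*text == '\0')
            return false;
        if (!match_set((unsigned char) *text, pat + 1, &next))
            return false;
        return match_one(text + 1, next);
    case '\\':
        if (pat[1] != '\0')
            pat++;              /* the escaped octet is taken literally */
        /* fall through */
    default:
        return *text == *pat && match_one(text + 1, pat + 1);
    }
}
```

       With this sketch, match_one("news.misc", "news.*") and
       match_one("]", "[]a]") are true, while match_one("-", "[^]-]") is
       false.  The naive handling of "*" can take exponential time on
       patterns with many stars, which a production matcher would avoid.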

       When matching a wildmat expression against some text, each comma-
       separated pattern is matched in order from left to right.  In order to
       match, the pattern must match the whole text; in regular expression
       terminology, it’s implicitly anchored at both the beginning and the
       end.  For example, the pattern "a" matches only the text "a"; it
       doesn’t match "ab" or "ba" or even "aa".  If none of the patterns
       match, the whole expression doesn’t match.  Otherwise, whether the
       expression matches is determined entirely by the rightmost matching
       pattern; the expression matches the text if and only if the rightmost
       matching pattern is not negated.

       For example, consider the text "news.misc".  The expression "*" matches
       this text, of course, as does "comp.*,news.*" (because the second
       pattern matches).  "news.*,!news.misc" does not match this text because
       both patterns match, meaning that the rightmost takes precedence, and
       the rightmost matching pattern is negated.  "news.*,!news.misc,*.misc"
       does match this text, since the rightmost matching pattern is not
       negated.

       Note that the expression "!news.misc" can’t match anything.  Either the
       pattern doesn’t match, in which case no patterns match and the
       expression doesn’t match, or the pattern does match, in which case the
       expression doesn’t match because the pattern is negated.
       "*,!news.misc", on the other hand, is a useful expression that matches
       anything except "news.misc".

       "!" has significance only as the first character of a pattern; anywhere
       else in the pattern, it matches a literal "!" in the text like any
       other non-metacharacter.
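
       The rightmost-match rule described above can be sketched as follows.
       Several things here are assumptions made for illustration:
       expression_matches is a hypothetical name, "@" is not handled, and
       fnmatch(3) with no flags stands in for the single-pattern matcher.
       fnmatch agrees with wildmat for the literal and "*" patterns used in
       the examples above but differs in other details, such as how a
       negated character set is written.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Evaluate a comma-separated expression against text.  The result is
 * decided by the rightmost pattern that matches; if nothing matches, the
 * expression does not match.
 * Hypothetical helper for illustration; not part of libinn.
 */
bool
expression_matches(const char *text, const char *expr)
{
    bool result = false;        /* no pattern has matched yet */
    const char *start = expr;

    for (;;) {
        const char *end = start;
        bool negated = (*start == '!');     /* only special in first place */
        const char *body = negated ? start + 1 : start;
        char *pattern;
        size_t len;

        /* Find the next unescaped comma or the end of the expression. */
        while (*end != '\0' && *end != ',') {
            if (*end == '\\' && end[1] != '\0')
                end++;          /* skip the escaped character */
            end++;
        }

        /* Copy out this pattern, without any leading '!'. */
        len = (size_t) (end - body);
        pattern = malloc(len + 1);
        if (pattern == NULL)
            return false;       /* sketch only: treat failure as no match */
        memcpy(pattern, body, len);
        pattern[len] = '\0';

        /* A later match overrides whatever an earlier pattern decided. */
        if (fnmatch(pattern, text, 0) == 0)
            result = !negated;
        free(pattern);

        if (*end == '\0')
            return result;
        start = end + 1;
    }
}
```

       For the text "news.misc", this sketch returns true for
       "comp.*,news.*" and "news.*,!news.misc,*.misc", and false for
       "news.*,!news.misc" and "!news.misc".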

       If the uwildmat_poison interface is used, then "@" behaves the same as
       "!" except that if an expression fails to match because the rightmost
       matching pattern began with "@", UWILDMAT_POISON is returned instead of
       UWILDMAT_FAIL.
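
       The three-way result can be sketched as a left-to-right scan that
       remembers the outcome attached to the most recent matching pattern.
       This is a sketch under stated assumptions, not the libinn code:
       evaluate_poison is a hypothetical function that takes patterns
       already split at the commas, enum sketch_result and its constants
       are local stand-ins for the real ones in inn/libinn.h, and fnmatch(3)
       again stands in for the single-pattern matcher.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Local stand-ins for illustration; not the constants from inn/libinn.h. */
enum sketch_result { SKETCH_FAIL, SKETCH_MATCH, SKETCH_POISON };

/*
 * Scan the patterns in order.  Each matching pattern replaces the result
 * so far, so the rightmost matching pattern decides the outcome.
 */
enum sketch_result
evaluate_poison(const char *text, const char *const *patterns, size_t count)
{
    enum sketch_result result = SKETCH_FAIL;    /* nothing matched yet */
    size_t i;

    for (i = 0; i < count; i++) {
        const char *p = patterns[i];
        enum sketch_result if_match = SKETCH_MATCH;

        if (*p == '!') {
            if_match = SKETCH_FAIL;     /* negated pattern */
            p++;
        } else if (*p == '@') {
            if_match = SKETCH_POISON;   /* negated and poisoned */
            p++;
        }
        if (fnmatch(p, text, 0) == 0)
            result = if_match;
    }
    return result;
}
```

       Given the patterns "misc.*" and "@misc.test" in that order, the
       sketch yields SKETCH_POISON for "misc.test", SKETCH_MATCH for
       "misc.jobs", and SKETCH_FAIL for "comp.lang.c".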

       If the uwildmat_simple interface is used, the matching rules are the
       same as above except that none of "!", "@", or "," have any special
       meaning at all and only match those literal characters.

BUGS

       All of these functions internally convert the passed arguments to const
       unsigned char pointers.  The only reason they take regular char
       pointers instead of unsigned char is for the convenience of INN and
       other callers that may not be using unsigned char everywhere they
       should.  In a future revision, the public interface should be changed
       to just take unsigned char pointers.
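
       The conversion matters because plain char may be a signed type, in
       which case octets of 0x80 and above compare as negative numbers and
       a range such as "a" through 0xE9 would appear empty.  A minimal
       sketch of the idea follows; octet_in_range is a hypothetical helper,
       not a libinn function.

```c
#include <stdbool.h>

/*
 * Report whether c lies in [lo, hi] when all three are viewed as unsigned
 * octets, regardless of whether plain char is signed on this platform.
 * Hypothetical helper for illustration; not part of libinn.
 */
bool
octet_in_range(char c, char lo, char hi)
{
    unsigned char uc = (unsigned char) c;

    return uc >= (unsigned char) lo && uc <= (unsigned char) hi;
}
```

       With the casts in place, octet_in_range('z', 'a', '\xE9') is true on
       every platform, whereas a direct comparison of the char values would
       be false wherever char is signed.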

HISTORY

       Written by Rich $alz <rsalz@uunet.uu.net> in 1986, and posted to Usenet
       several times since then, most notably in comp.sources.misc in March,
       1991.

       Lars Mathiesen <thorinn@diku.dk> enhanced the multi-asterisk failure
       mode in early 1991.

       Rich and Lars increased the efficiency of star patterns and reposted it
       to comp.sources.misc in April, 1991.

       Robert Elz <kre@munnari.oz.au> added minus sign and close bracket
       handling in June, 1991.

       Russ Allbery <rra@stanford.edu> added support for comma-separated
       patterns and the "!" and "@" metacharacters to the core wildmat
       routines in July, 2000.  He also added support for UTF-8 characters,
       changed the default behavior to assume that both the text and the
       pattern are in UTF-8, and largely rewrote this documentation to expand
       and clarify the description of how a wildmat expression matches.

       Please note that the interfaces to these functions are named uwildmat
       and the like rather than wildmat to distinguish them from the wildmat
       function provided by Rich $alz’s original implementation.  While this
       code is heavily based on Rich’s original code, it has substantial
       differences, including the extension to support UTF-8 characters, and
       has noticeable functionality changes.  Any bugs present in it aren’t
       Rich’s fault.

       $Id: uwildmat.pod 8567 2009-08-15 07:03:37Z iulius $

SEE ALSO

       grep(1), fnmatch(3), regex(3), regexp(3).