Man Linux: Main Page and Category List

NAME

       agrep  -  search  a  file  for  a  string  or  regular expression, with
       approximate matching capabilities

SYNOPSIS

       agrep  [  -#cdehiklnpstvwxBDGIS  ]  pattern  [  -f  patternfile   ]   [
       filename... ]

DESCRIPTION

       agrep  searches the input filenames (standard input is the default, but
       see a warning under LIMITATIONS) for records containing  strings  which
       either  exactly  or  approximately  match  a  pattern.   A record is by
       default a line, but it can be defined differently using the  -d  option
       (see  below).   Normally,  each  record found is copied to the standard
       output.  Approximate matching allows finding records that  contain  the
       pattern  with  several  errors including substitutions, insertions, and
       deletions.  For example, Massechusets matches  Massachusetts  with  two
       errors   (one  substitution  and  one  insertion).   Running  agrep  -2
       Massechusets foo outputs all lines in foo containing any string with at
       most 2 errors from Massechusets.

       agrep  supports  many  kinds of queries including arbitrary wild cards,
       sets of patterns, and in general, regular  expressions.   See  PATTERNS
       below.   It  supports  most of the options supported by the grep family
       plus several more (but it is not 100% compatible with grep).  For  more
       information  on  the  algorithms used by agrep see Wu and Manber, "Fast
       Text Searching With Errors," Technical  report  #91-11,  Department  of
       Computer  Science,  University  of  Arizona,  June  1991  (available by
       anonymous ftp from cs.arizona.edu  in  agrep/agrep.ps.1),  and  Wu  and
       Manber, "Agrep -- A Fast Approximate Pattern Searching Tool", To appear
       in USENIX Conference 1992 January  (available  by  anonymous  ftp  from
       cs.arizona.edu in agrep/agrep.ps.2).

       As with the rest of the grep family, the characters ‘$’, ‘^’, ‘∗’, ‘[,],^’, ‘|’, ‘(’, ‘)’, ‘!’, and ‘\’ can cause unexpected results when
       included in the pattern, as these characters are also meaningful to the
       shell.  To avoid these problems, one should always enclose  the  entire
       pattern  argument in single quotes, i.e., ’pattern’.  Do not use double
       quotes (").

       When agrep is applied to more than one input file, the name of the file
       is  displayed  preceding  each  line  which  matches  the pattern.  The
       filename is not displayed when processing a  single  file,  so  if  you
       actually want the filename to appear, use /dev/null as a second file in
       the list.

OPTIONS

       -#     # is a non-negative integer (at most 8) specifying  the  maximum
              number  of  errors  permitted in finding the approximate matches
              (defaults to zero).  Generally,  each  insertion,  deletion,  or
              substitution  counts as one error.  It is possible to adjust the
              relative cost of insertions, deletions and substitutions (see -I
              -D and -S options).

       -c     Display only the count of matching records.

       -ddelim’
              Define  delim  to  be  the  separator  between two records.  The
              default value is ’$’, namely a record  is  by  default  a  line.
              delim  can be a string of size at most 8 (with possible use of ^
              and $), but not a regular expression.  Text between two delim’s,
              before  the  first delim, and after the last delim is considered
              as one record.  For  example,  -d  ’$$’  defines  paragraphs  as
              records and -d ’^From ’ defines mail messages as records.  agrep
              matches each record separately.  This option does not  currently
              work with regular expressions.

       -e pattern
              Same  as  a simple pattern argument, but useful when the pattern
              begins with a ‘-’.

       -f patternfile
              patternfile contains a set of (simple) patterns.  The output  is
              all   lines   that  match  at  least  one  of  the  patterns  in
              patternfile.  Currently, the -f  option  works  only  for  exact
              match and for simple patterns (any meta symbol is interpreted as
              a regular character); it is compatible only with -c, -h, -i, -l,
              -s, -v, -w, and -x options.  see LIMITATIONS for size bounds.

       -h     Do not display filenames.

       -i     Case-insensitive  search  —  e.g.,  "A"  and  "a" are considered
              equivalent.

       -k     No symbol in the pattern is treated as a  meta  character.   For
              example,  agrep  -k  ’a(b|c)*d’ foo will find the occurrences of
              a(b|c)*d  in  foo  whereas  agrep  ’a(b|c)*d’  foo   will   find
              substrings  in foo that match the regular expression ’a(b|c)*d’.

       -l     List only the files that contain a match.  This option is useful
              for  looking  for  files  containing  a  certain  pattern.   For
              example, " agrep -l ’wonderful’  * "  will  list  the  names  of
              those   files   in  current  directory  that  contain  the  word
              ’wonderful’.

       -n     Each line that is printed is prefixed by its  record  number  in
              the file.

       -p     Find  records  in  the  text that contain a supersequence of the
              pattern.  For example,
               agrep -p DCS foo will match "Department of Computer Science."

       -s     Work silently, that is, display nothing except  error  messages.
              This is useful for checking the error status.

       -t     Output  the  record  starting  from  the  end  of  delim to (and
              including) the next delim.  This is useful for cases where delim
              should come at the end of the record.

       -v     Inverse  mode  —  display only those records that do not contain
              the pattern.

       -w     Search for the pattern as a word  —  i.e.,  surrounded  by  non-
              alphanumeric characters.  The non-alphanumeric must surround the
              match;  they cannot be counted as errors.  For example, agrep -w
              -1 car will match cars, but not characters.

       -x     The pattern must match the whole line.

       -y     Used with -B option. When -y is on, agrep will always output the
              best matches without giving a prompt.

       -B     Best match mode.  When -B is specified and no exact matches  are
              found,  agrep  will continue to search until the closest matches
              (i.e., the ones with minimum number of  errors)  are  found,  at
              which point the following message will be shown: "the best match
              contains x errors, there are y matches, output them? (y/n)"  The
              best  match  mode  is  not  supported  for standard input, e.g.,
              pipeline input.  When the -#, -c, or -l options  are  specified,
              the -B option is ignored.  In general, -B may be slower than -#,
              but not by very much.

       -Dk    Set the cost of a deletion to k (k is a positive integer).  This
              option does not currently work with regular expressions.

       -G     Output the files that contain a match.

       -Ik    Set  the  cost  of  an insertion to k (k is a positive integer).
              This option does not currently work with regular expressions.

       -Sk    Set the cost of a substitution to k (k is a  positive  integer).
              This option does not currently work with regular expressions.

PATTERNS

       agrep  supports  a large variety of patterns, including simple strings,
       strings with classes of characters, sets of strings,  wild  cards,  and
       regular expressions.

       Strings
              any  sequence  of  characters, including the special symbols ‘^’
              for beginning of line and ‘$’ for  end  of  line.   The  special
              characters  listed  above  (  ‘$’, ‘^’, ‘∗’, ‘[,^’, ‘|’, ‘(’,
              ‘)’, ‘!’, and ‘\’ ) should be preceded by ‘\’ if they are to  be
              matched as regular characters.  For example, \^abc\\ corresponds
              to the string ^abc\, whereas ^abc corresponds to the string  abc
              at the beginning of a line.

       Classes of characters
              a  list  of  characters  inside [] (in order) corresponds to any
              character from the list.  For example, [a-ho-z] is any character
              between  a  and  h or between o and z.  The symbol ‘^’ inside []
              complements the list.  For example, [^i-n] denote any  character
              in  the  character  set except character ’i’ to ’n’.  The symbol
              ‘^’ thus has two meanings, but this is  consistent  with  egrep.
              The  symbol  ‘.’  (don’t care) stands for any symbol (except for
              the newline symbol).

       Boolean operations
              agrep supports an ‘and’ operation ‘;’ and an ‘or’ operation ‘,’,
              but  not  a  combination  of  both.  For example, ’fast;network’
              searches for all records containing both words.

       Wild cards
              The symbol ’#’ is used to denote a wild card.  # matches zero or
              any  number  of arbitrary characters.  For example, ex#e matches
              example.  The symbol # is equivalent to .* in egrep.   In  fact,
              .*  will work too, because it is a valid regular expression (see
              below), but unless this is part of an actual regular expression,
              # will work faster.

       Combination of exact and approximate matching
              any pattern inside angle brackets <> must match the text exactly
              even if the match is with errors.   For  example,  <mathemat>ics
              matches  mathematical  with one error (replacing the last s with
              an a), but mathe<matics> does not match mathematical  no  matter
              how many errors we allow.

       Regular expressions
              The  syntax  of  regular  expressions in agrep is in general the
              same as that for egrep.  The union operation ‘|’, Kleene closure
              ‘*’, and parentheses () are all supported.  Currently ’+’ is not
              supported.   Regular  expressions  are  currently   limited   to
              approximately    30   characters   (generally   excluding   meta
              characters).  Some options (-d, -w, -f, -t, -x, -D, -I,  -S)  do
              not currently work with regular expressions.  The maximal number
              of errors for regular expressions that use ’*’ or ’|’ is 4.

EXAMPLES

       agrep -2 -c ABCDEFG foo
              gives the number of lines  in  file  foo  that  contain  ABCDEFG
              within two errors.

       agrep -1 -D2 -S2 ’ABCD#YZ’ foo
              outputs  the  lines  containing  ABCD followed, within arbitrary
              distance, by YZ, with up to one additional  insertion  (-D2  and
              -S2 make deletions and substitutions too "expensive").

       agrep -5 -p abcdefghij /usr/share/dict/words
              outputs the list of all words containing at least 5 of the first
              10 letters of  the  alphabet  in  order.   (Try  it:   any  list
              starting  with  academia  and ending with sacrilegious must mean
              something!)

       agrep -1 ’abc[0-9](de|fg)*[x-z]’ foo
              outputs the lines containing, within up to one error, the string
              that  starts with abc followed by one digit, followed by zero or
              more repetitions of either de or fg, followed by either x, y, or
              z.

       agrep -d ’^From ’ ’breakdown;internet’ mbox
              outputs  all  mail messages (the pattern ’^From ’ separates mail
              messages in a mail file) that contain keywords  ’breakdown’  and
              ’internet’.

       agrep -d ’$$’ -1 ’<word1> <word2>’ foo
              finds  all  paragraphs that contain word1 followed by word2 with
              one error in place of the blank.  In particular, if word1 is the
              last  word  in  a  line  and word2 is the first word in the next
              line, then the space will be substituted by a newline symbol and
              it  will match.  Thus, this is a way to overcome separation by a
              newline.  Note that -d ’$$’ (or another delim which  spans  more
              than  one  line)  is necessary, because otherwise agrep searches
              only one line at a time.

       agrep ’^agrep’ <this manual>
              outputs all the examples of the use of agrep in this man  pages.

SEE ALSO

       ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS/LIMITATIONS

       Any  bug  reports or comments will be appreciated!  Please mail them to
       sw@cs.arizona.edu or udi@cs.arizona.edu

       Regular expressions do not support the ’+’ operator (match  1  or  more
       instances  of the preceding token).  These can be searched for by using
       this syntax in the pattern:

          ’pattern(pattern)*’

       (search for strings containing one instance of the pattern, followed by
       0 or more instances of the pattern).

       The   following   can  cause  an  infinite  loop:  agrep  pattern  *  >
       output_file.  If the number of matches is high, they may  be  deposited
       in  output_file before it is completely read leading to more matches of
       the pattern within output_file  (the  matches  are  against  the  whole
       directory).   It’s  not clear whether this is a "bug" (grep will do the
       same), but be warned.

       The maximum size of the patternfile is limited to  be  250Kb,  and  the
       maximum number of patterns is limited to be 30,000.

       Standard  input  is the default if no input file is given.  However, if
       standard input is keyed in directly (as opposed to through a pipe,  for
       example) agrep may not work for some non-simple patterns.

       There  is no size limit for simple patterns.  More complicated patterns
       are currently  limited  to  approximately  30  characters.   Lines  are
       limited  to  1024  characters.   Records are limited to 48K, and may be
       truncated if they are larger than that.  The limit of record length can
       be changed by modifying the parameter Max_record in agrep.h.

DIAGNOSTICS

       Exit  status  is  0  if  any matches are found, 1 if none, 2 for syntax
       errors or inaccessible files.

AUTHORS

       Sun Wu and Udi Manber, Department of Computer  Science,  University  of
       Arizona, Tucson, AZ 85721.  {sw|udi}@cs.arizona.edu.

                                 Jan 17, 1992