NAME

       bibclean  - prettyprint and syntax check BibTeX and Scribe bibliography
       data base files

SYNOPSIS

       bibclean [ -author ] [ -error-log filename ] [ -help ] [ -? ]
                [ -init-file filename ] [ -long-field fieldname ]
                [ -max-width nnn ] [ -[no-]align-equals ]
                [ -[no-]check-values ] [ -[no-]delete-empty-values ]
                [ -[no-]file-position ] [ -[no-]fix-font-changes ]
                [ -[no-]fix-initials ] [ -[no-]fix-names ]
                [ -[no-]German-style ] [ -[no-]keep-linebreaks ]
                [ -[no-]keep-parbreaks ] [ -[no-]keep-preamble-spaces ]
                [ -[no-]keep-spaces ] [ -[no-]keep-string-spaces ]
                [ -[no-]parbreaks ] [ -[no-]prettyprint ]
                [ -[no-]print-patterns ] [ -[no-]read-init-files ]
                [ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ]
                [ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ]
                ( <infile | bibfile1 bibfile2 bibfile3 ... ) >outfile

       All options can be abbreviated to a unique leading prefix.

       An explicit file name of ‘‘-’’ represents standard input; it is assumed
       if no input files are specified.

DESCRIPTION

       bibclean prettyprints input BibTeX files  to  stdout,  and  checks  the
       brace balance and bibliography entry syntax as well.  It can be used to
       detect problems in BibTeX files  that  sometimes  confuse  even  BibTeX
       itself,  and  importantly,  can  be used to normalize the appearance of
       collections of BibTeX files.

       Here is a summary of the formatting actions:

       ·  BibTeX items are formatted into  a  consistent  structure  with  one
          field  = "value" pair per line, and the initial @ and trailing right
          brace in column 1.

       ·  Tabs are expanded into  blank  strings;  their  use  is  discouraged
          because  they  inhibit  portability,  and  can  suffer corruption in
          electronic mail.

       ·  Long string values are split at a blank and continued onto the  next
          line with leading indentation.

       ·  A single blank line separates adjacent bibliography entries.

       ·  Text outside BibTeX entries is passed through verbatim.

       ·  Outer parentheses around entries are converted to braces.

       ·  Personal  names  in author and editor field values are normalized to
          the form ‘‘P. D. Q. Bach’’, from ‘‘P.D.Q. Bach’’ and ‘‘Bach,
          P.D.Q.’’.

       ·  Hyphen sequences in page numbers are converted to en-dashes.

       ·  Month  values are converted to standard BibTeX string abbreviations.

       ·  In titles, sequences of upper-case characters at  brace  level  zero
          are  braced  to  protect  them  from  being  converted to lower-case
          letters by some bibliography styles.

       ·  CODEN,  ISBN  (International  Standard   Book   Number)   and   ISSN
          (International  Standard Serial Number) entry values are examined to
          verify the  checksums  of  each  listed  number,  and  correct  ISBN
          hyphenation is automatically supplied.
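
       The ISBN and ISSN checksum tests mentioned above can be illustrated
       with the published mod-11 rules for ISBN-10 and ISSN numbers.  The
       following Python sketch is not taken from bibclean; the function names
       are hypothetical, and CODEN and 13-digit ISBN checks are left out.

```python
def _mod11_ok(number: str, length: int) -> bool:
    """Check a weighted mod-11 checksum with weights length, ..., 2, 1."""
    # Keep only digits and the check character X; hyphens and spaces are ignored.
    chars = [c for c in number.upper() if c in "0123456789X"]
    if len(chars) != length:
        return False
    # X (value 10) is only legal as the final check character.
    if "X" in chars[:-1]:
        return False
    total = 0
    for weight, c in zip(range(length, 0, -1), chars):
        total += weight * (10 if c == "X" else int(c))
    return total % 11 == 0


def isbn10_ok(isbn: str) -> bool:
    """True if a 10-character ISBN has a valid check digit."""
    return _mod11_ok(isbn, 10)


def issn_ok(issn: str) -> bool:
    """True if an 8-character ISSN has a valid check digit."""
    return _mod11_ok(issn, 8)
```

       For instance, isbn10_ok("0-201-13448-9") and issn_ok("0378-5955") are
       both true, while changing any single digit makes the test fail.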

       The standardized format of the output of bibclean facilitates the later
       application  of  simple  filters,  such  as   bibcheck(1),   bibdup(1),
       bibextract(1),   bibindex(1),   bibjoin(1),   biblabel(1),  biblook(1),
       biborder(1), bibsort(1), citefind(1), and citetags(1), to  process  the
       text,  and  also  is  the  one expected by the GNU Emacs BibTeX support
       functions.

OPTIONS

       Command-line switches may be abbreviated to a  unique  leading  prefix,
       and  letter case is not significant.  All options are parsed before any
       input bibliography files are read, no matter what their  order  on  the
       command  line.   Options  that correspond to a yes/no setting of a flag
       have a form with a prefix "no-" to  set  the  flag  to  no.   For  such
       options,  the  last  setting  determines  the flag value used.  This is
       significant when options are also  specified  in  initialization  files
       (see the INITIALIZATION FILES manual section).

       The  leading hyphen that distinguishes an option from a filename may be
       doubled, for compatibility  with  GNU  and  POSIX  conventions.   Thus,
       -author and --author are equivalent.

       To avoid confusion with options, if a filename begins with a hyphen, it
       must be disguised by a leading absolute  or  relative  directory  path,
       e.g., /tmp/-foo.bib or ./-foo.bib.
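
       The abbreviation rules above (unique leading prefix, letter case
       ignored, single or doubled hyphen) can be sketched in Python as
       follows.  This is an illustration only, not bibclean's own parser;
       the function name resolve_option is hypothetical, and letting an
       exact match win over a longer candidate is an assumption.

```python
# Option names taken from the SYNOPSIS; FLAGS also exist in a "no-" form.
PLAIN = ["author", "error-log", "help", "?", "init-file", "long-field",
         "max-width", "version"]
FLAGS = ["align-equals", "check-values", "delete-empty-values",
         "file-position", "fix-font-changes", "fix-initials", "fix-names",
         "German-style", "keep-linebreaks", "keep-parbreaks",
         "keep-preamble-spaces", "keep-spaces", "keep-string-spaces",
         "parbreaks", "prettyprint", "print-patterns", "read-init-files",
         "remove-OPT-prefixes", "scribe", "trace-file-opening", "warnings"]
ALL_OPTIONS = PLAIN + FLAGS + ["no-" + flag for flag in FLAGS]


def resolve_option(arg: str) -> str:
    """Map an abbreviated switch such as '-auth' to its full option name."""
    if not arg.startswith("-") or arg == "-":
        # A bare "-" means standard input, and other words are file names.
        raise ValueError("not an option: " + arg)
    # One or two leading hyphens are equivalent.
    name = arg[2:] if arg.startswith("--") else arg[1:]
    wanted = name.lower()
    matches = [opt for opt in ALL_OPTIONS if opt.lower().startswith(wanted)]
    exact = [opt for opt in matches if opt.lower() == wanted]
    if len(exact) == 1:          # assumption: an exact name always wins
        return exact[0]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown option: " + arg)
    raise ValueError("ambiguous option " + arg + ": " + ", ".join(matches))
```

       With this table, -v resolves to version and -no-fix-n to no-fix-names,
       but -fix is rejected because it is a prefix of three different options.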

       -author   Display  an author credit on the standard error unit, stderr,
                 and then exit with  a  success  return  code.   Sometimes  an
                 executable  program  is  separated from its documentation and
                 source code; this option provides a way to recover from that.

       -error-log filename
                 Redirect  stderr  to  the  indicated  file,  which  will then
                 contain all of the error and warning messages.   This  option
                 is   provided   for   those   systems  that  have  difficulty
                 redirecting stderr.

       -help or -?
                 Display a help message on stderr, giving a usage description,
                 similar  to  this  section of the manual pages, and then exit
                 with a success return code.

       -init-file filename
                 Provide an explicit value pattern  initialization  file.   It
                 will   be   processed  after  any  system-wide  and  job-wide
                 initialization files, and may override them.  It in turn  may
                 be  overridden  by  a subsequent file-specific initialization
                 file.  For further  details,  see  the  INITIALIZATION  FILES
                 manual section.

       -long-field fieldname
                 Suppress warnings that fields named fieldname have lengths
                 exceeding the standard BibTeX limits.  NB! This is a Debian-
                 specific extension!

       -max-width nnn
                 bibclean normally limits output line widths to 72 characters,
                 and in the interests of consistency, that value should not be
                 changed.    Occasionally,  special-purpose  applications  may
                 require  different  maximum  line  widths,  so  this   option
                 provides  that  capability.   The number following the option
                 name can be specified in decimal, octal (starting with 0), or
                 hexadecimal  (starting with 0x).  A zero or negative value is
                 interpreted to mean unlimited, so -max-width 0 can be used to
                 ensure that each field/value pair appears on a single line.

                 When  -no-prettyprint  requests  bibclean to act as a lexical
                 analyzer,  the  default  line  width  is  unlimited,   unless
                 overridden by this option.

                 When  bibclean  is prettyprinting, line wrapping will be done
                 only at a space. Consequently,  a  long  non-blank  character
                 sequence  may  result  in  the output exceeding the requested
                 line width.

                 When bibclean is lexing, line wrapping is done by inserting a
                 backslash-newline pair when the specified maximum is reached,
                 so no line length will ever exceed the maximum.
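
                 The number syntax accepted by -max-width can be sketched as
                 follows.  This Python fragment is only an illustration of
                 the rules stated above; the function name parse_max_width
                 is hypothetical, and None is used here to stand for
                 ‘‘unlimited’’.

```python
def parse_max_width(text: str):
    """Return a positive line width, or None when the width is unlimited."""
    s = text.strip().lower()
    negative = s.startswith("-")
    if s.startswith(("+", "-")):
        s = s[1:]
    if s.startswith("0x"):
        value = int(s[2:], 16)      # hexadecimal, e.g. 0x48
    elif s.startswith("0") and len(s) > 1:
        value = int(s, 8)           # octal, e.g. 0110
    else:
        value = int(s, 10)          # decimal, e.g. 72
    if negative:
        value = -value
    # Zero or a negative value means that no limit is applied.
    return value if value > 0 else None
```

                 The three spellings 72, 0110, and 0x48 all denote the same
                 width.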

       -[no-]align-equals
                 With the positive form, align the equals  sign  in  key/value
                 assignments  at  the same column, separated by a single space
                 from the value string.  Otherwise, the  equals  sign  follows
                 the key, separated by a single space.  Default: no.

       -[no-]check-values
                 With  the  positive form, apply heuristic pattern matching to
                 field values in order to detect possible errors (e.g., ‘‘year
                 =  "192"’’  instead of ‘‘year = "1992"’’), and issue warnings
                 when unexpected patterns are found.

                 This checking is usually beneficial, but if it  produces  too
                 many  bogus  warnings for a particular bibliography file, you
                 can disable  it  with  the  negative  form  of  this  option.
                 Default: yes.

       -[no-]delete-empty-values
                 With  the  positive  form,  remove  all field/value pairs for
                 which the value is an  empty  string.   This  is  helpful  in
                 cleaning   up   bibliographies  generated  from  text  editor
                 templates. Compare this option with -[no-]remove-OPT-prefixes
                 described below.  Default: no.

       -[no-]file-position
                 With   the   positive   form,  give  detailed  file  position
                 information in warning and error messages.  Default: no.

       -[no-]fix-font-changes
                 With the positive form,  supply  an  additional  brace  level
                 around  font  changes in titles to protect against downcasing
                 by some BibTeX styles.  Font changes that already  have  more
                 than one level of braces are not modified.

                 For  example,  if  a  title  contains  the  Latin phrase {\em
                 Dictyostelium    Discoideum}    or    {\em    {D}ictyostelium
                 {D}iscoideum},  then  downcasing will incorrectly convert the
                 phrase  to  lower-case  letters.   Most  BibTeX   users   are
                 surprised  that  bracing the initial letters does not prevent
                 the  downcase  action.    The   correct   coding   is   {{\em
                 Dictyostelium   Discoideum}}.    However,   there   are  also
                 legitimate cases where an  extra  level  of  bracing  wrongly
                 protects   from   downcasing.   Consequently,  bibclean  will
                 normally not supply an extra level of braces, but if you have
                 a  bibliography where the extra braces are routinely missing,
                 you can use this option to supply them.

                 If you think that  you  need  this  option,  it  is  strongly
                 recommended that you apply bibclean to your bibliography file
                 with and without  -fix-font-changes,  then  compare  the  two
                 output  files  to  ensure  that  extra  braces  are not being
                 supplied in titles where they should  not  be  present.   You
                 will  have  to  decide  which  of the two output files is the
                 better choice, then repair the  incorrect  title  bracing  by
                 hand.

                 Since  font  changes in titles are uncommon, except for cases
                 of the type which this option  is  designed  to  correct,  it
                 should do more good than harm.  Default: no.

       -[no-]fix-initials
                 With  the  positive  form,  insert  a  space  after  a period
                 following author initials.  Default: yes.

       -[no-]fix-names
                 With the positive form, reorder author and editor name  lists
                 to  remove commas at brace level zero, placing first names or
                 initials before last names.  Default: yes.

       -[no-]German-style
                 With the positive form, interpret quote characters ["] inside
                 braced  value  strings  at  brace  level  1  according to the
                 conventions of the TeX style file german.sty, which overloads
                 quote  to  simplify input and representation of German umlaut
                 accents, sharp-s  (es-zet),  ligature  separators,  invisible
                 hyphens,   raised/lowered   quotes,  French  guillemets,  and
                 discretionary  hyphens.   Recognized  character  combinations
                 will  be braced to prevent BibTeX from interpreting the quote
                 as a string delimiter.

                 Quoted strings receive no special handling from this  option,
                 and  since  German  nouns  in titles must anyway be protected
                 from the downcasing operation  of  most  BibTeX  bibliography
                 styles,  German  value  strings that use the overloaded quote
                 character can always be entered in the form "{...}",  without
                 the need to specify this option at all.

                 Default: no.

       -[no-]keep-linebreaks
                 Normally, line breaks inside value strings are collapsed into
                 a single space, so that  long  value  strings  can  later  be
                 broken to provide lines of reasonable length.

                 With  the  positive  form,  linebreaks are preserved in value
                 strings.  If -max-width is set to zero,  this  preserves  the
                 original  line breaks.  Spacing outside value strings remains
                 under bibclean’s control, and is not affected by this option.

                 Default: no.

       -[no-]keep-parbreaks
                 With  the  positive  form,  preserve paragraph breaks (either
                 formfeeds, or lines containing only spaces) in value strings.
                 Normally, paragraph breaks are collapsed into a single space.
                 Spacing  outside  value  strings  remains  under   bibclean’s
                 control, and is not affected by this option.  Default: no.

       -[no-]keep-preamble-spaces
                 With   the   positive   form,   preserve  all  whitespace  in
                 @Preamble{...} entries.  Default: no.

       -[no-]keep-spaces
                 With the positive form, preserve all spaces in value strings.
                 Normally,  multiple spaces are collapsed into a single space.
                 This option  can  be  used  together  with  -keep-linebreaks,
                 -keep-parbreaks,  and  -max-width  0  to preserve the form of
                 value  strings  while  still  providing  syntax   and   value
                 checking.    Spacing  outside  value  strings  remains  under
                 bibclean’s control, and  is  not  affected  by  this  option.
                 Default: no.

       -[no-]keep-string-spaces
                 With   the   positive   form,   preserve  all  whitespace  in
                 @String{...} entries.  Default: no.

       -[no-]parbreaks
                 With the negative form, a paragraph break (either a formfeed,
                 or  a  line containing only spaces) is not permitted in value
                 strings, or between field/value pairs.  This may be useful to
                 quickly   trap   runaway   strings  arising  from  mismatched
                 delimiters.  Default: yes.

       -[no-]prettyprint
                 Normally, bibclean functions as  a  prettyprinter.   However,
                 with  the  negative form of this option, it acts as a lexical
                 analyzer instead, producing a stream of lexical tokens.   See
                 the  LEXICAL  ANALYSIS  manual  section  for further details.
                 Default: yes.

       -[no-]print-patterns
                 With the positive form, print the value  patterns  read  from
                 initialization  files  as  they are added to internal tables.
                 Use this option to check newly-added patterns, or to see what
                 patterns are being used.

                 These  patterns  are  the  ones that will be used in checking
                 value strings for valid syntax, and all of them are specified
                 in  initialization  files,  rather  than  hard-coded into the
                 program.  For further details, see the  INITIALIZATION  FILES
                 manual section.  Default: no.

       -[no-]read-init-files
                 With  the  negative form, suppress loading of system-, user-,
                 and file-specific initialization files.  Initializations will
                 come  only  from  those  files explicitly given by -init-file
                 filename options.  Default: yes.

       -[no-]remove-OPT-prefixes
                 With the positive form, remove the ‘‘OPT’’ prefix  from  each
                 field  name  where  the  corresponding  value is not an empty
                 string.  The prefix ‘‘OPT’’ must be entirely in upper-case to
                 be recognized.

                 This  option is for bibliographies generated with the help of
                 the  GNU  Emacs  BibTeX  editing  support,  which   generates
                 templates  with  optional  fields  identified  by the ‘‘OPT’’
                 prefix.  Although the function M-x bibtex-remove-OPT normally
                 bound  to  the  keystrokes  C-c C-o does the job, users often
                 forget, with the result that BibTeX does  not  recognize  the
                 field  name,  and  ignores  the  value  string.  Compare this
                 option  with   -[no-]delete-empty-values   described   above.
                 Default: no.

       -[no-]scribe
                 With the positive form, accept input syntax conforming to the
                 Scribe document system.  The  output  will  be  converted  to
                 conform to BibTeX syntax.  See the SCRIBE BIBLIOGRAPHY FORMAT
                 manual section for further details.  Default: no.

       -[no-]trace-file-opening
                 With the positive form, record in  the  error  log  file  the
                 names of all files which bibclean attempts to open.  Use this
                 option to identify where initialization  files  are  located.
                 Default: no.

       -[no-]warnings
                 With  the  positive  form,  allow  all warning messages.  The
                 negative form is not recommended since it may  mask  problems
                 that should be repaired.  Default: yes.

       -version  Display  the  program version number on stderr, and then exit
                 with a success  return  code.   This  will  also  include  an
                 indication  of  who  compiled  the  program, the host name on
                 which it was compiled, the time of compilation, and the  type
                 of string-value matching code selected, when that information
                 is available to the compiler.

ERROR RECOVERY AND WARNINGS

       When bibclean detects an error, it issues  an  error  message  to  both
       stderr  and  stdout.   That  way, the user is clearly notified, and the
       output bibliography also contains the message at the point of error.

       Error messages begin with a distinctive pair of queries, ??,  beginning
       in  column  1, followed by the input file name and line number.  If the
       -file-position option was specified, they also contain  the  input  and
       output  positions of the current file, entry, and value.  Each position
       includes the file byte number, the line number, and the column  number.
       In  the  event  of  a  runaway  string  argument,  the  entry and value
       positions should precisely pinpoint the erroneous  bibliography  entry,
       and  the  file positions will indicate where it was detected, which may
       be rather later in the files.

       Warning messages identify possible problems,  and  are  therefore  sent
       only  to  stderr, and not to stdout, so they never appear in the output
       file.  They are identified by  a  distinctive  pair  of  percents,  %%,
       beginning  in  column 1, and as with error messages, may be followed by
       file position messages if the -file-position option was specified.

       For convenience, the first line of each error and warning message  sent
       to  stderr  is formatted according to the expectations of the GNU Emacs
       next-error command.   You  can  invoke  bibclean  with  the  Emacs  M-x
       compile<RET>bibclean  filename.bib  >filename.new command, then use the
       next-error command, normally bound to C-x ` (that’s a grave,  or  back,
       accent), to move to the location of the error in the input file.

       If  error  messages  are  ignored,  and left in the output bibliography
       file, they will precipitate an error  when  the  bibliography  is  next
       processed with BibTeX.
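
       Because error messages start with ?? in column 1, an output file can
       be checked for leftover messages before it is handed to BibTeX.  The
       following Python sketch shows one way to do that; the function name is
       hypothetical, and the sample message text in the usage note is made up
       for illustration.

```python
def leftover_errors(lines):
    """Return (line number, text) for every line that starts with '??'."""
    found = []
    for number, line in enumerate(lines, start=1):
        # Error messages begin with a pair of queries in column 1.
        if line.startswith("??"):
            found.append((number, line.rstrip("\n")))
    return found
```

       Applied to the lines of a cleaned file, for example
       leftover_errors(open("filename.new")), an empty result means that no
       error message was left behind.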

       After  issuing an error message, bibclean then resynchronizes its input
       by copying it verbatim to stdout until  a  new  bibliography  entry  is
       recognized  on  a line in which the first non-blank character is an at-
       sign (@).  This ensures that nothing is lost from  the  input  file(s),
       allowing  corrections  to  be  made  in  either the input or the output
       files.  However, if bibclean detects an  internal  error  in  its  data
       structures,  it will terminate abruptly without further input or output
       processing; this kind of error should never happen, and if it does,  it
       should be reported immediately to the author of the program.  Errors in
       initialization files, and running out  of  dynamic  memory,  will  also
       immediately terminate bibclean.

INITIALIZATION FILES

       bibclean  can  be compiled with one of three different types of pattern
       matching; the choice is made by the installer at compile time:

              ·  The original version uses explicit hand-coded tests of value-
                 string syntax.

              ·  The  second  version uses regular-expression pattern-matching
                 host  library  routines  together   with   regular-expression
                 patterns that come entirely from initialization files.

              ·  The  third  version  uses special patterns that come entirely
                 from initialization files.

       This Debianized version of bibclean uses the third  version.   However,
       command-line  options can also be specified in initialization files, no
       matter which pattern matching choice was selected.

       When bibclean starts, it searches for initialization files,  using  the
       first    one   of   $(HOME)/.bibcleanrc,   /usr/share/bibcleanrc,   and
       /etc/bibcleanrc  that  exists.   Afterwards,   it   reads   the   first
       .bibcleanrc  found  in the BIBINPUTS search path.  The name .bibcleanrc
       can be changed at  run  time  through  a  setting  of  the  environment
       variable  BIBCLEANINI.   If  the  name  starts  with  a dot, it will be
       stripped when looking in /usr/share and /etc.

       Then, when command-line arguments are processed, any  additional  files
       specified by -init-file filename options are also processed.  Finally,
       immediately before  each  named  bibliography  file  is  processed,  an
       attempt  is  made to process an initialization file with the same name,
       but with the extension changed to .ini.  The default extension  can  be
       changed  by  a  setting  of the environment variable BIBCLEANEXT.  This
       scheme permits system-wide, user-wide, session-wide, and  file-specific
       initialization files to be supported.
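
       The search order described above can be summarized in a short Python
       sketch.  It only computes candidate file names and does not open
       anything.  The function names are hypothetical, and two details are
       assumptions not stated in this manual: that BIBINPUTS is a colon-
       separated directory list, and that the value of BIBCLEANEXT includes
       its leading dot.

```python
import posixpath


def startup_candidates(env):
    """Names tried at startup; only the first one that exists is used."""
    name = env.get("BIBCLEANINI", ".bibcleanrc")
    # A leading dot is stripped for the /usr/share and /etc locations.
    system_name = name[1:] if name.startswith(".") else name
    candidates = []
    if "HOME" in env:
        candidates.append(posixpath.join(env["HOME"], name))
    candidates.append(posixpath.join("/usr/share", system_name))
    candidates.append(posixpath.join("/etc", system_name))
    return candidates


def bibinputs_candidates(env):
    """Names tried next, one per BIBINPUTS directory; the first hit is read."""
    name = env.get("BIBCLEANINI", ".bibcleanrc")
    # Assumption: directories are separated by colons.
    dirs = [d for d in env.get("BIBINPUTS", "").split(":") if d]
    return [posixpath.join(d, name) for d in dirs]


def file_specific_init(bibfile, env):
    """Initialization file tried just before bibfile itself is processed."""
    # Assumption: BIBCLEANEXT holds the extension with its dot.
    extension = env.get("BIBCLEANEXT", ".ini")
    return posixpath.splitext(bibfile)[0] + extension
```

       For a user whose HOME is /home/u, the startup candidates are
       /home/u/.bibcleanrc, /usr/share/bibcleanrc, and /etc/bibcleanrc, and
       the file-specific candidate for refs/foo.bib is refs/foo.ini.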

       When   input   is   taken   from   stdin,  there  is  no  file-specific
       initialization.

       For precise control,  the  -no-read-init-files  option  suppresses  all
       initialization files except those explicitly named by -init-file
       filename options, either on the command line, or in requested
       initialization files.

       Recursive  execution  of  initialization  files  with nested -init-file
       options is permitted; if the recursion is circular, bibclean will
       eventually get a non-fatal initialization file open failure after
       opening too many files.  This terminates further initialization file
       processing.  As the recursion unwinds, the files are all closed, then
       execution proceeds normally.

       An initialization file may contain empty lines, comments  from  percent
       to  end  of line (just like TeX), option switches, and field/pattern or
       field/pattern/message assignments.  Leading  and  trailing  spaces  are
       ignored.  This is best illustrated by a short example:

       % This is a small bibclean initialization file

       -init-file /u/math/bib/.bibcleanrc %% departmental patterns

       chapter = "\"D\""                 %% 23

       pages   = "\"D--D\""              %% 23--27

       volume  = "\"D \\an\\d D\""       %% 11 and 12

       year    = \
          "\"dddd, dddd, dddd\"" \
          "Multiple years specified."      %% 1989, 1990, 1991

       -no-fix-names   %% do not modify author/editor lists

       Long  logical  lines  can  be  split  into  multiple  physical lines by
       breaking at a backslash-newline pair;  the  backslash-newline  pair  is
       discarded.   This  processing  happens while characters are being read,
       before any further interpretation of the input stream.

       Each logical line must contain a complete option  (and  its  value,  if
       any),  or  a  complete  field/pattern  pair, or a field/pattern/message
       triple.

       Comments are stripped during the parsing of  the  field,  pattern,  and
       message  values.   The  comment  start  symbol is not recognized inside
       quoted strings, so it can be freely used in such strings.

       Comments on logical lines that were input as  multiple  physical  lines
       via  the  backslash-newline convention must appear on the last physical
       line; otherwise, the remaining physical lines will become part  of  the
       comment.
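
       The interaction between line continuation and comments can be seen in
       a small Python sketch.  It is a simplified model of the rules above
       rather than bibclean's reader, and both function names are
       hypothetical.

```python
def logical_lines(text):
    """Join physical lines: every backslash-newline pair is discarded."""
    return text.replace("\\\n", "").split("\n")


def strip_comment(line):
    """Remove a comment that starts at a percent outside a quoted string."""
    in_string = False
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\" and in_string:
            i += 2                  # skip the escaped character, e.g. \" or \%
            continue
        if c == '"':
            in_string = not in_string
        elif c == "%" and not in_string:
            return line[:i].strip()
        i += 1
    # Leading and trailing spaces are ignored.
    return line.strip()
```

       If a comment is placed on a physical line that is itself continued,
       the joined logical line loses everything after the percent sign,
       including the text that came from the following physical lines.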

       Pattern  strings  must  be  enclosed  in  quotation  marks; within such
       strings, a backslash starts an escape mechanism that is  commonly  used
       in UNIX software.  The recognized escape sequences are:

              \a     alarm bell (octal 007)

              \b     backspace (octal 010)

              \f     formfeed (octal 014)

              \n     newline (octal 012)

              \r     carriage return (octal 015)

              \t     horizontal tab (octal 011)

              \v     vertical tab (octal 013)

              \ooo   character number octal ooo (e.g., \012 is linefeed).  Up
                     to 3 octal digits may be used.

              \0xhh  character  number  hexadecimal   hh   (e.g.,   \0x0a   is
                     linefeed).  xhh may be in either letter case.  Any number
                     of hexadecimal digits may be used.

       Backslash followed by any other character produces just that character.
       Thus,  \%  gets  a  literal  percent  into  a  string  (preventing  its
       interpretation as a comment), \" produces  a  quotation  mark,  and  \\
       produces a single backslash.
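
       The escape rules above can be modelled by the following Python sketch.
       It is an approximation written for this manual, not bibclean's code;
       the function name decode_escapes is hypothetical.  Keeping only the
       low 8 bits of a numeric escape is an assumption about how an
       over-long value would be stored.  Truncation at NUL imitates the C
       behaviour described in the next paragraph.

```python
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
          "r": "\r", "t": "\t", "v": "\v"}
OCT_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(s):
    """Expand backslash escapes in the text between the quotation marks."""
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if c != "\\" or i + 1 == len(s):
            out.append(c)
            i += 1
            continue
        nxt = s[i + 1]
        if s[i + 1:i + 3] in ("0x", "0X") and s[i + 3:i + 4] and s[i + 3] in HEX_DIGITS:
            j = i + 3               # \0xhh: any number of hexadecimal digits
            while j < len(s) and s[j] in HEX_DIGITS:
                j += 1
            out.append(chr(int(s[i + 3:j], 16) & 0xFF))
            i = j
        elif nxt in OCT_DIGITS:
            j = i + 1               # \ooo: at most three octal digits
            while j < len(s) and j < i + 4 and s[j] in OCT_DIGITS:
                j += 1
            out.append(chr(int(s[i + 1:j], 8) & 0xFF))
            i = j
        elif nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        else:
            out.append(nxt)         # any other character stands for itself
            i += 2
    # A NUL character ends the string, as it would in C.
    return "".join(out).split("\0")[0]
```

       For example, the pattern text \"D\" decodes to "D" with its quotation
       marks, and both \012 and \0x0a decode to a linefeed.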

       An  ASCII  NUL (\0) in a string will terminate it; this is a feature of
       the C programming language in which bibclean is implemented.

       Field/pattern  pairs  can  be  separated  by   arbitrary   space,   and
       optionally, either an equals sign or colon functioning as an assignment
       operator.  Thus, the following are equivalent:

       pages="\"D--D\""
       pages:"\"D--D\""
       pages "\"D--D\""
         pages = "\"D--D\""
         pages : "\"D--D\""
       pages   "\"D--D\""
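
       A Python sketch of how such a line could be split into its parts is
       shown below.  It assumes that comments have already been removed, and
       that field names consist of letters, digits, hyphens, and underscores;
       neither the regular expression nor the function name parse_assignment
       comes from bibclean.

```python
import re

# A quoted string: any run of escaped characters or non-quote characters.
_STRING = r'"((?:\\.|[^"\\])*)"'
_ASSIGNMENT = re.compile(
    r'\s*([A-Za-z][\w-]*)'          # field name (assumed character set)
    r'\s*[=:]?\s*'                  # optional equals sign or colon
    + _STRING +                     # pattern string
    r'(?:\s*' + _STRING + r')?'     # optional message string
    r'\s*$')


def parse_assignment(line):
    """Return (field, pattern, message) or None if the line does not fit.

    The pattern and message are returned undecoded, exactly as written
    between the outer quotation marks; message is None when absent.
    """
    m = _ASSIGNMENT.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)
```

       All six spellings shown above yield the same field name and the same
       undecoded pattern text.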

       Each field name can have an arbitrary  number  of  patterns  associated
       with  it;  however,  they  must  be specified in separate field/pattern
       assignments.

       An empty pattern string  causes  previously-loaded  patterns  for  that
       field  name  to  be  forgotten.  This feature permits an initialization
       file to completely discard patterns from earlier initialization  files.

       Patterns  for  value  strings are represented in a tiny special-purpose
       language that is both convenient and suitable for  bibliography  value-
       string  syntax  checking.   While  not  as  powerful as the language of
       regular-expression patterns, its parsing can be portably implemented in
       less  than  3%  of  the code in a widely-used regular-expression parser
       (the GNU regexp package).

       The patterns are represented by the following special characters:

              <space>  one or more spaces

              a        exactly one letter

              A        one or more letters

              d        exactly one digit

              D        one or more digits

              r        exactly one Roman numeral

              R        one or more Roman numerals (i.e., a Roman number)

              w        exactly one word (one or more letters and digits)

              W        one or more space-separated words, beginning and ending
                       with a word

              .        one   ‘special’   character,   one  of  the  characters
                       <space>!#()*+,-./:;?[]~,  a   subset   of   punctuation
                       characters that are typically used in string values

              :        one or more ‘special’ characters

              X        one  or  more  ‘special’-separated words, beginning and
                       ending with a word

              \x       exactly one x (x is any character),  possibly  with  an
                       escape sequence interpretation given earlier

              x        exactly the character x (x is anything but one of these
                       pattern characters: aAdDrRwW.:<space>\)

       The X pattern character is very powerful,  but  generally  inadvisable,
       since  it  will  match  almost  anything likely to be found in a BibTeX
       value string.  The reason for providing pattern matching on  the  value
       strings is to uncover possible errors, not mask them.

       There   is  no  provision  for  specifying  ranges  or  repetitions  of
       characters, but this can usually be done with separate patterns.  It is
       a good idea to accompany the pattern with a comment showing the kind of
       thing it is expected to match.  Here is a portion of an  initialization
       file giving a few of the patterns used to match number value strings:

       number  =       "\"D\""         %% 23
       number  =       "\"A AD\""      %% PN LPS5001
       number  =       "\"A D(D)\""    %% RJ 34(49)
       number  =       "\"A D\""       %% XNSS 288811
       number  =       "\"A D\\.D\""   %% Version 3.20
       number  =       "\"A-A-D-D\""   %% UMIAC-TR-89-11
       number  =       "\"A-A-D\""     %% CS-TR-2189
       number  =       "\"A-A-D\\.D\"" %% CS-TR-21.7

       For a bibliography that contains only article entries, this list should
       probably be reduced to just the first pattern, so that  anything  other
       than  a digit string fails the pattern-match test.  This is easily done
       by keeping bibliography-specific patterns in a corresponding file  with
       extension .ini, since that file is read automatically.

       You should be sure to use empty pattern strings in this pattern file to
       discard patterns from earlier initialization files.

       The value strings passed to the  pattern  matcher  contain  surrounding
       quotes,  so the patterns should also.  However, you could use a pattern
       specification like "\"D" to match an initial digit string  followed  by
       anything  else;  the  omission  of  the  final quotation mark \" in the
       pattern allows the match to succeed  without  checking  that  the  next
       character in the value string is a quotation mark.
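
       To make the pattern language concrete, the following Python sketch
       translates a pattern into a regular expression and matches it from the
       start of the value, without requiring the match to reach the end.
       This swaps in Python's re module for bibclean's own matcher, so
       backtracking behaviour may differ.  The names are hypothetical, the
       patterns are assumed to have had their string escapes decoded already,
       and the choice of Roman numeral letters and of one or more separators
       for X are assumptions.

```python
import re

SPECIALS = " !#()*+,-./:;?[]~"
_SPECIAL = "[" + re.escape(SPECIALS) + "]"
_WORD = "[A-Za-z0-9]+"
_ROMAN = "[IVXLCDMivxlcdm]"         # assumption: both letter cases accepted

TOKENS = {
    " ": " +",                      # one or more spaces
    "a": "[A-Za-z]", "A": "[A-Za-z]+",
    "d": "[0-9]", "D": "[0-9]+",
    "r": _ROMAN, "R": _ROMAN + "+",
    "w": _WORD,
    "W": _WORD + "(?: +" + _WORD + ")*",
    ".": _SPECIAL, ":": _SPECIAL + "+",
    "X": _WORD + "(?:" + _SPECIAL + "+" + _WORD + ")*",
}


def compile_pattern(pattern):
    """Translate a decoded bibclean-style pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))   # \x: exactly one x
            i += 2
        else:
            # Pattern characters expand; everything else matches itself.
            parts.append(TOKENS.get(c, re.escape(c)))
            i += 1
    return re.compile("".join(parts))


def value_matches(pattern, value):
    """True if the pattern matches a leading part of the value string."""
    return compile_pattern(pattern).match(value) is not None
```

       With these definitions, the decoded pattern "A-A-D-D" accepts the
       value "UMIAC-TR-89-11", the pattern "dddd" rejects "192", and the
       unterminated pattern "D accepts any value that begins with a quotation
       mark and a digit string.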

       Because  the  value  strings  are  intended to be processed by TeX, the
       pattern matching ignores braces, and TeX  control  sequences,  together
       with any space following those control sequences.  Spaces around braces
       are preserved.  This convention allows the pattern fragment  A-AD-D  to
       match   the   value  string  TN-K\slash 27-70,  because  the  value  is
       implicitly collapsed to TN-K27-70 during the matching operation.
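
       The matching rules above can be illustrated with  a  minimal  Python
       sketch.   It  is  not  bibclean's  own  code.   It assumes that the
       init-file string escapes have already been processed (so the pattern
       written  as  "\"D\\.D\""  has  become  "D\.D"  with literal quotation
       marks), it handles only the pattern characters a, A, d, D, space, and
       backslash  escapes  (anything  else is treated as a literal), and it
       approximates control-sequence  removal  with  a  regular  expression.
       The function names are hypothetical.

```python
import re

# Subset of the pattern characters: a/A = one / one-or-more letters,
# d/D = one / one-or-more digits, space = one or more spaces.
# Assumption: every other character is treated as a literal here.
_CLASSES = {
    "a": "[A-Za-z]",
    "A": "[A-Za-z]+",
    "d": "[0-9]",
    "D": "[0-9]+",
    " ": " +",
}


def pattern_to_regex(pattern):
    """Translate a (subset) bibclean pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escape: the next character matches itself.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        else:
            parts.append(_CLASSES.get(ch, re.escape(ch)))
            i += 1
    return re.compile("".join(parts))


def collapse_value(value):
    """Drop TeX control sequences (plus following space) and braces."""
    value = re.sub(r"\\(?:[A-Za-z]+|.)[ \t]*", "", value)
    return value.replace("{", "").replace("}", "")


def value_matches(pattern, value):
    """Prefix match: text after the end of the pattern is not checked."""
    return pattern_to_regex(pattern).match(collapse_value(value)) is not None
```

       With these definitions, value_matches('"D', '"23a"')  succeeds  while
       value_matches('"D"', '"23a"')  fails,  which  mirrors the effect of
       omitting the final quotation mark from the pattern.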

       bibclean’s normal action when a string value fails to match any of  the
       corresponding  patterns  is  to  issue a warning message something like
       this:  Unexpected  value  in ‘‘year = "192"’’.  In most cases, that is
       sufficient  to alert the user to a problem.  In some cases, however, it
       may be desirable to associate a different  message  with  a  particular
       pattern.   This can be done by supplying a message string following the
       pattern string.  Format items %% (single percent), %e (entry name),  %f
       (field name), %k (citation key), and %v (string value) are available to
       get current values expanded in the messages.  Here is an example:

       chapter = "\"D:D\"" "Colon found in ‘‘%f = %v’’" %% 23:2

       To be consistent with other messages output by  bibclean,  the  message
       string should not end with punctuation.

       If  you  wish to make the message an error, rather than just a warning,
       begin it with a question mark (?), like this:

       chapter = "\"D:D\"" "?Colon found in ‘‘%f = %v’’" %% 23:2

       The question mark will not be included in the output message.
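
       The expansion of format items and the handling of the leading ? marker
       can  be  sketched  in  Python  as  follows.   This  is an illustration
       only, not bibclean's implementation; the function name, its arguments,
       and  the  returned  (severity,  text)  tuple are hypothetical, and the
       real program adds its own prefixes and file  position  information  to
       the message.

```python
def expand_message(template, entry, field, key, value):
    """Expand %%, %e, %f, %k and %v in a user-supplied message string.

    A leading "?" marks the message as an error and is removed;
    otherwise the message is a warning.
    """
    severity = "warning"
    if template.startswith("?"):
        severity = "error"
        template = template[1:]

    items = {"%": "%", "e": entry, "f": field, "k": key, "v": value}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in items:
            out.append(items[template[i + 1]])
            i += 2
        else:
            # Assumption: unknown format items are copied through unchanged.
            out.append(ch)
            i += 1
    return severity, "".join(out)
```

       For   example,   expand_message("?Colon found in %f = %v",  "Article",
       "chapter", "Knuth:TB86", '"23:2"') yields an error with the text Colon
       found in chapter = "23:2".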

       Escape sequences are supported in message strings, just as they are  in
       pattern  strings.  You can use this to advantage for fancy things, such
       as terminal display mode control.  If you rewrite the previous  example
       as

       chapter = "\"D:D\"" \
                 "?\033[7mColon found in ‘‘%f = %v’’\033[0m" %% 23:2

       the  error message will appear in inverse video on display screens that
       support ANSI terminal control sequences.  Such practice is not normally
       recommended,  since  it  may  have  undesirable  effects on some output
       devices.   Nevertheless,  you  may  find  it  useful   for   restricted
       applications.

       For  some  types  of  bibliography  fields,  bibclean contains special-
       purpose code to supplement or replace the pattern matching:

              ·  CODEN, ISBN and  ISSN  field  values  are  handled  this  way
                 because  their  validation  requires  evaluation of checksums
                 that cannot be expressed by simple patterns; no patterns  are
                 even used in these three cases.

              ·  chapter, number, pages, and volume values are checked only by
                 pattern matching.

              ·  month values are first checked against  the  standard  BibTeX
                 month  abbreviations,  and  only  if  no  match  is found are
                 patterns then used.

              ·  year values are first checked against patterns,  then  if  no
                 match  is  found, the year numbers are found and converted to
                 integer values for testing against reasonable bounds.
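
       To show why checksums cannot be expressed as simple patterns, here  is
       a  minimal  Python sketch of the standard published modulus-11 check-
       digit rules for 10-digit ISBNs and for ISSNs.  It is  not  taken  from
       bibclean's  source  code,  the  function  names  are hypothetical, and
       CODEN check characters and 13-digit ISBNs are not covered.

```python
def _mod11_valid(text, length):
    """Weighted modulus-11 check shared by ISBN-10 and ISSN.

    Hyphens, spaces and any other characters are ignored.  Digits are
    weighted length, length-1, ..., 1; a final X stands for the value 10.
    """
    chars = [c for c in text.upper() if c in "0123456789X"]
    if len(chars) != length or "X" in chars[:-1]:
        return False
    total = sum(
        (length - i) * (10 if c == "X" else int(c))
        for i, c in enumerate(chars)
    )
    return total % 11 == 0


def isbn10_valid(text):
    """True if text holds a 10-digit ISBN with a correct check digit."""
    return _mod11_valid(text, 10)


def issn_valid(text):
    """True if text holds an 8-digit ISSN with a correct check digit."""
    return _mod11_valid(text, 8)
```

       A pattern such as "d-ddd-ddddd-d" can confirm only  the  shape  of  an
       ISBN; a single mistyped digit changes the weighted sum and is caught
       only by arithmetic of this kind.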

       Values for other fields are checked only  against  patterns.   You  can
       provide  patterns  for  any field you like, even ones bibclean does not
       already know about.  New ones are simply added  to  an  internal  table
       that is searched for each string to be validated.

       The  special field, key, represents the bibliographic citation key.  It
       can be given patterns, like any other field.  Here is an initialization
       file  pattern  assignment  that  will match an author name, a colon, an
       alphabetic string, and a two-digit year:

       key = "A:Add"                     %% Knuth:TB86

       Notice that no quotation marks are included in the pattern, because the
       citation  keys  are  not  quoted.   You  can  use such patterns to help
       enforce  uniform  naming  conventions  for  citation  keys,  which   is
       increasingly important as your bibliography data base grows.

LEXICAL ANALYSIS

       When  -no-prettyprint is specified, bibclean acts as a lexical analyzer
       instead of a prettyprinter, producing output in lines of the form

              <token-number><tab><token-name><tab>"<token-value>"

       Each output line contains a single  complete  token,  identified  by  a
       small  integer  number for use by a computer program, a token type name
       for human readers, and a string value in quotes.

       Special characters in the  token  value  string  are  represented  with
       ANSI/ISO  Standard C escape sequences, so all characters other than NUL
       are representable, and multi-line values can be represented in a single
       line.

       Here  are the token numbers and token type names that can appear in the
       output when -no-prettyprint is specified:

               0   UNKNOWN
               1   ABBREV
               2   AT
               3   COMMA
               4   COMMENT
               5   ENTRY
               6   EQUALS
               7   FIELD
               8   INCLUDE
               9   INLINE
              10   KEY
              11   LBRACE
              12   LITERAL
              13   NEWLINE
              14   PREAMBLE
              15   RBRACE
              16   SHARP
              17   SPACE
              18   STRING
              19   VALUE

       Programs that parse such output  should  also  be  prepared  for  lines
       beginning with the warning prefix, %%, or the error prefix, ??, and for
       ANSI/ISO Standard C line number directives of the form
              # line 273 "texbook1.bib"
       which record the line number and file name of the current input file.

       If a -max-width nnn command-line  option  was  specified,  long  output
       lines  will  be  wrapped at a backslash-newline pair, and consequently,
       software that processes the lexical token stream should be prepared  to
       collapse such wrapped lines back into single lines.
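
       A reader for the token stream that follows the advice above  might  be
       sketched  in  Python  like  this.   It  is  an illustration under the
       assumptions stated in the comments, not code distributed with bibclean;
       the names unescape_c and parse_tokens are hypothetical.

```python
import re

_SIMPLE = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "\\": "\\", '"': '"', "'": "'", "?": "?",
}
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]+|[0-7]{1,3}|.)", re.DOTALL)


def unescape_c(text):
    """Decode Standard C escape sequences (simple, octal, and hex)."""
    def replace(match):
        body = match.group(1)
        if body[0] == "x" and len(body) > 1:
            return chr(int(body[1:], 16))
        if body[0] in "01234567":
            return chr(int(body, 8))
        return _SIMPLE.get(body, body)
    return _ESCAPE.sub(replace, text)


def parse_tokens(lines):
    """Yield (number, name, value) for each token line.

    Assumptions: a line ending in a backslash continues on the next
    line; lines starting with %%, ?? or # carry warnings, errors, or
    line-number directives and are skipped.
    """
    pending = ""
    for raw in lines:
        line = pending + raw.rstrip("\n")
        if line.endswith("\\"):
            pending = line[:-1]      # rejoin a wrapped line
            continue
        pending = ""
        if not line or line.startswith(("%%", "??", "#")):
            continue
        number, name, value = line.split("\t", 2)
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]      # drop the surrounding quotes
        yield int(number), name, unescape_c(value)
```

       A  caller  would  typically  pass an open file or a pipe from bibclean
       -no-prettyprint as the lines argument.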

       As an example of the use of -no-prettyprint, the UNIX command pipeline
              bibclean -no-prettyprint mylib.bib | \
                  awk '$2 == "KEY" {print $3}' | \
                  sed -e 's/"//g' | \
                  sort
       will  extract a sorted list of all citation keys in the file mylib.bib.

       A certain amount of processing will have been done on the  tokens.   In
       particular,  delimiters equivalent to braces will have been replaced by
       braces, and braced strings will have become quoted strings.

       The LITERAL token type is used for arbitrary text  that  bibclean  does
       not  examine  further,  such  as  the contents of a @Preamble{...} or a
       @Comment{...}.

       The UNKNOWN token type should never appear in the output stream.  It is
       used internally to initialize token type variables.

SCRIBE BIBLIOGRAPHY FORMAT

       bibclean’s  support  for the Scribe bibliography format is based on the
       syntax description  in  the  Scribe  Introductory  User’s  Manual,  3rd
       Edition,  May  1980.   Scribe was originally developed by Brian Reid at
       Carnegie-Mellon University, and is now marketed by Unilogic, Ltd.

       The BibTeX bibliography format was strongly influenced by  Scribe,  and
       indeed,  with  care, it is possible to share bibliography files between
       the two systems.  Nevertheless, there are some differences, so here  is
       a summary of features of the Scribe bibliography file format:

       (1)   Letter  case  is  not significant in field names and entry names,
             but case is preserved in value strings.

       (2)   In field/value pairs, the field and value may be separated by one
             of  three  characters:  =,  /,  or  space.   Space may optionally
             surround these separators.

       (3)   Value delimiters are any of these seven pairs: { }   [  ]    (  )
             < >   ' '   " "   ` `

       (4)   Value  delimiters  may  not be nested, even though with the first
             four  delimiter  pairs,  nested  balanced  delimiters  would   be
             unambiguous.

       (5)   Delimiters  can  be  omitted  around  values  that  contain  only
             letters, digits,  sharp  (#),  ampersand  (&),  period  (.),  and
             percent (%).

       (6)   Outside of delimited values, a literal at-sign (@) is represented
             by doubled at-signs (@@).

       (7)   Bibliography entries begin with @name, as for BibTeX, but any  of
             the  seven  Scribe  value delimiter pairs may be used to surround
             the values in field/value pairs.  As in  (4),  nested  delimiters
             are forbidden.

       (8)   Arbitrary  space  may  separate  entry  names  from the following
             delimiters.

       (9)   @Comment is a special command whose delimited value is discarded.
             As in (4), nested delimiters are forbidden.

       (10)  The special form

             @Begin{comment}
              ...
             @End{comment}

             permits encapsulating arbitrary text containing any characters or
             delimiters, other  than  ‘‘@End{comment}’’.   Any  of  the  seven
             delimiter pairs may be used around the word ‘‘comment’’ following
             the ‘‘@Begin’’ or ‘‘@End’’; the delimiters in the two cases  need
             not        be       the       same,       and       consequently,
             ‘‘@Begin{comment}’’/‘‘@End{comment}’’ pairs may not be nested.

       (11)  The key field is required in each bibliography entry.

       (12)  A backslashed quote in a string will  be  assumed  to  be  a  TeX
             accent,  and  braced  appropriately.   While  such accents do not
             conform to Scribe syntax, Scribe-format bibliographies have  been
             found that appear to be intended for TeX processing.

       Because  of  this  loose  syntax,  bibclean’s  normal  error  detection
       heuristics are less effective, and consequently, Scribe mode  input  is
       not the default; it must be explicitly requested.
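
       Rules (1) through (5) can be made concrete with a small Python  sketch
       that  parses  one  Scribe field/value pair.  It is an illustration of
       the rules as listed here, not bibclean's parser.  The  function  name
       is  hypothetical, field names are assumed to consist of letters and
       digits only, and the last  delimiter  pair  is  assumed  to  be  the
       backquote pair.

```python
import re

# The seven delimiter pairs of rule (3), opener mapped to closer.
DELIMITERS = {
    "{": "}", "[": "]", "(": ")", "<": ">",
    "'": "'", '"': '"', "`": "`",
}

# Rule (2): separator is =, / or space, optionally surrounded by space.
_FIELD = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*)(?:\s*[=/]\s*|\s+)")

# Rule (5): characters allowed in an undelimited value.
_BARE = re.compile(r"[A-Za-z0-9#&.%]+")


def parse_scribe_pair(text):
    """Return (field, value) for one Scribe field/value pair."""
    match = _FIELD.match(text)
    if match is None:
        raise ValueError("no field name and separator found")
    rest = text[match.end():]
    opener = rest[:1]
    if opener in DELIMITERS:
        # Rule (4): no nesting, so the first closer ends the value.
        end = rest.find(DELIMITERS[opener], 1)
        if end < 0:
            raise ValueError("unterminated value")
        value = rest[1:end]
    else:
        bare = _BARE.match(rest)
        if bare is None:
            raise ValueError("value needs delimiters")
        value = bare.group(0)
    # Rule (1): field names are case-insensitive, values keep their case.
    return match.group(1).lower(), value
```

       Because the first closing delimiter always ends the value, an  input
       such  as  title={A {nested} value} is cut short after the word nested,
       which is one reason error detection is weaker in Scribe mode.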

ENVIRONMENT VARIABLES

       BIBCLEANEXT  File  extension  of  bibliography-specific  initialization
                    files.  Default: .ini.

       BIBCLEANINI  Name   of   bibclean   initialization   files.    Default:
                    .bibcleanrc.

       BIBINPUTS    Search  path for bibclean and BibTeX input files.  This is
                    a colon-separated list of directories that are searched in
                    order  from  first  to  last.   It  is  not an error for a
                    specified directory to not exist.
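
       The BIBINPUTS lookup described above can be  sketched  in  Python  as
       follows.   This  shows  the  general technique only; the function name
       is hypothetical, and skipping empty path components is  an  assumption
       rather than documented bibclean behavior.

```python
import os


def find_input_file(filename, search_path):
    """Return the first match for filename along a colon-separated path.

    Directories are tried from first to last.  A directory that does
    not exist is silently passed over, since that is not an error.
    """
    for directory in search_path.split(":"):
        if not directory:
            continue  # assumption: empty components are ignored
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

       A typical call would be find_input_file("mylib.bib",
       os.environ.get("BIBINPUTS", "")).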

FILES

       *.bib          BibTeX and Scribe bibliography data base files.

       *.ini          File-specific initialization files.

       /usr/share/bibcleanrc, /etc/bibcleanrc
                      System-wide initialization files.

       .bibcleanrc    User-specific initialization files.

SEE ALSO

       bibcheck(1),   bibdup(1),   bibextract(1),   bibindex(1),   bibjoin(1),
       biblabel(1),    biblex(1),    biblook(1),   biborder(1),   bibparse(1),
       bibsort(1),   bibtex(1),    bibunlex(1),    citefind(1),    citesub(1),
       citetags(1), latex(1), scribe(1), tex(1).

AUTHOR

       Nelson H. F. Beebe
       Center for Scientific Computing
       University of Utah
       Department of Mathematics, 322 INSCC
       155 S 1400 E RM 233
       Salt Lake City, UT 84112-0090
       USA
       Tel: +1 801 581 5254
       FAX: +1 801 585 1640, +1 801 581 4148
       Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet)
       URL: http://www.math.utah.edu/~beebe

       This Debianization of bibclean was done by Henning Makholm
       <henning@makholm.net>.  It differs from the upstream source in where it
       looks for the system-wide initialization file (vanilla bibclean expects
       to find it in $PATH), and it has also been patched to ignore the
       built-in BibTeX field-length limit for abstract fields.