## NAME

       bibclean  - prettyprint and syntax check BibTeX and Scribe bibliography
data base files



## SYNOPSIS

       bibclean [ -author ] [ -error-log filename ] [ -help ] [ -? ]
[ -init-file filename ] [ -long-field fieldname ]
[ -max-width nnn ] [ -[no-]align-equals ]
[ -[no-]check-values ] [ -[no-]delete-empty-values ]
[ -[no-]file-position ] [ -[no-]fix-font-changes ]
[ -[no-]fix-initials ] [ -[no-]fix-names ]
[ -[no-]German-style ] [ -[no-]keep-linebreaks ]
[ -[no-]keep-parbreaks ] [ -[no-]keep-preamble-spaces ]
[ -[no-]keep-spaces ] [ -[no-]keep-string-spaces ]
[ -[no-]parbreaks ] [ -[no-]prettyprint ]
[ -[no-]print-patterns ] [ -[no-]read-init-files ]
[ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ]
[ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ]
( <infile | bibfile1 bibfile2 bibfile3 ... ) >outfile

All options can be abbreviated to a unique leading prefix.

An explicit file name of ‘‘-’’ represents standard input; it is assumed
if no input files are specified.



## DESCRIPTION

       bibclean prettyprints input BibTeX files  to  stdout,  and  checks  the
brace balance and bibliography entry syntax as well.  It can be used to
detect problems in BibTeX files  that  sometimes  confuse  even  BibTeX
itself,  and  importantly,  can  be used to normalize the appearance of
collections of BibTeX files.

Here is a summary of the formatting actions:

·  BibTeX items are formatted into  a  consistent  structure  with  one
field  = "value" pair per line, and the initial @ and trailing right
brace in column 1.

·  Tabs are expanded into  blank  strings;  their  use  is  discouraged
because  they  inhibit  portability,  and  can  suffer corruption in
electronic mail.

·  Long string values are split at a blank and continued onto the  next
line with leading indentation.

·  A single blank line separates adjacent bibliography entries.

·  Text outside BibTeX entries is passed through verbatim.

·  Outer parentheses around entries are converted to braces.

·  Personal  names  in author and editor field values are normalized to
the form ‘‘P. D.  Q.   Bach’’,  from  ‘‘P.D.Q.  Bach’’  and  ‘‘Bach,
P.D.Q.’’.

·  Hyphen sequences in page numbers are converted to en-dashes.

·  Month  values are converted to standard BibTeX string abbreviations.

·  In titles, sequences of upper-case characters at  brace  level  zero
are  braced  to  protect  them  from  being  converted to lower-case
letters by some bibliography styles.

·  CODEN,  ISBN  (International  Standard   Book   Number)   and   ISSN
(International  Standard Serial Number) entry values are examined to
verify the  checksums  of  each  listed  number,  and  correct  ISBN
hyphenation is automatically supplied.
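
The checksum verification mentioned in the last item can be illustrated with a short Python sketch. This is not bibclean's own code (bibclean is written in C); the function names are hypothetical, and only the standard ISBN-10 and ISSN modulo-11 rules are shown. CODEN checking and ISBN hyphenation are omitted.

```python
def _mod11_is_valid(text, length):
    """Weighted modulo-11 check shared by ISBN-10 and ISSN.

    Weights run from `length` down to 1; a final 'X' stands for 10.
    """
    chars = [c for c in text.upper() if c not in "- "]
    if len(chars) != length:
        return False
    total = 0
    for position, c in enumerate(chars):
        if c == "X" and position == length - 1:
            value = 10                      # 'X' is valid only as check digit
        elif c in "0123456789":
            value = int(c)
        else:
            return False
        total += (length - position) * value
    return total % 11 == 0


def isbn10_is_valid(isbn):
    """Return True if a 10-digit ISBN has a correct check digit."""
    return _mod11_is_valid(isbn, 10)


def issn_is_valid(issn):
    """Return True if an 8-digit ISSN has a correct check digit."""
    return _mod11_is_valid(issn, 8)


print(isbn10_is_valid("0-201-13448-9"))
print(issn_is_valid("0378-5955"))
```

Because hyphens and spaces are ignored, the same number passes whether or not it is hyphenated.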

The standardized format of the output of bibclean facilitates the later
application  of  simple  filters,  such  as   bibcheck(1),   bibdup(1),
bibextract(1),   bibindex(1),   bibjoin(1),   biblabel(1),  biblook(1),
biborder(1), bibsort(1), citefind(1), and citetags(1), to  process  the
text,  and  also  is  the  one expected by the GNU Emacs BibTeX support
functions.



## OPTIONS

       Command-line switches may be abbreviated to a  unique  leading  prefix,
and  letter case is not significant.  All options are parsed before any
input bibliography files are read, no matter what their  order  on  the
command  line.   Options  that correspond to a yes/no setting of a flag
have a form with a prefix "no-" to  set  the  flag  to  no.   For  such
options,  the  last  setting  determines  the flag value used.  This is
significant when options are also  specified  in  initialization  files
(see the INITIALIZATION FILES manual section).

The  leading hyphen that distinguishes an option from a filename may be
doubled, for compatibility  with  GNU  and  POSIX  conventions.   Thus,
-author and --author are equivalent.

To avoid confusion with options, if a filename begins with a hyphen, it
must be disguised by a leading absolute  or  relative  directory  path,
e.g., /tmp/-foo.bib or ./-foo.bib.
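
The abbreviation rules above (unique leading prefix, letter case ignored, optional doubled hyphen) can be sketched in Python as follows. The function name and the shortened option list are hypothetical, and raising an error for an ambiguous prefix is an assumption; the manual does not say how bibclean itself reports that case.

```python
# A small subset of the option names from the SYNOPSIS, for illustration.
KNOWN_OPTIONS = [
    "author", "align-equals", "no-align-equals",
    "fix-initials", "no-fix-initials",
    "fix-names", "no-fix-names",
    "max-width", "version",
]


def resolve_option(arg, known=KNOWN_OPTIONS):
    """Expand a possibly abbreviated option to its full name."""
    if arg.startswith("--"):
        name = arg[2:]                  # doubled hyphen, GNU/POSIX style
    elif arg.startswith("-"):
        name = arg[1:]
    else:
        raise ValueError("not an option: " + arg)
    name = name.lower()                 # letter case is not significant
    by_lower = {option.lower(): option for option in known}
    if name in by_lower:                # an exact name always wins
        return by_lower[name]
    matches = [full for low, full in by_lower.items() if low.startswith(name)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous option: " + arg)
    return matches[0]


print(resolve_option("--AUTH"))
print(resolve_option("-no-fix-i"))
```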

-author   Display  an author credit on the standard error unit, stderr,
and then exit with  a  success  return  code.   Sometimes  an
executable  program  is  separated from its documentation and
source code; this option provides a way to recover from that.

-error-log filename
Redirect  stderr  to  the  indicated  file,  which  will then
contain all of the error and warning messages.   This  option
is   provided   for   those   systems  that  have  difficulty
redirecting stderr.

-help or -?
Display a help message on stderr, giving a usage description,
similar  to  this  section of the manual pages, and then exit
with a success return code.

-init-file filename
Provide an explicit value pattern  initialization  file.   It
will   be   processed  after  any  system-wide  and  job-wide
initialization files, and may override them.  It in turn  may
be  overridden  by  a subsequent file-specific initialization
file.  For further  details,  see  the  INITIALIZATION  FILES
manual section.

-long-field fieldname
Suppress warnings that fields named fieldname  have  lengths
exceeding the standard BibTeX limits.  NB! This is a Debian-
specific extension!

-max-width nnn
bibclean normally limits output line widths to 72 characters,
and in the interests of consistency, that value should not be
changed.    Occasionally,  special-purpose  applications  may
require  different  maximum  line  widths,  so  this   option
provides  that  capability.   The number following the option
name can be specified in decimal, octal (starting with 0), or
hexadecimal  (starting with 0x).  A zero or negative value is
interpreted to mean unlimited, so -max-width 0 can be used to
ensure that each field/value pair appears on a single line.

When  -no-prettyprint  requests  bibclean to act as a lexical
analyzer,  the  default  line  width  is  unlimited,   unless
overridden by this option.

When  bibclean  is prettyprinting, line wrapping will be done
only at a space. Consequently,  a  long  non-blank  character
sequence  may  result  in  the output exceeding the requested
line width.

When bibclean is lexing, line wrapping is done by inserting a
backslash-newline pair when the specified maximum is reached,
so no line length will ever exceed the maximum.
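
The number forms accepted by -max-width can be illustrated with a short Python sketch. The helper name parse_width is hypothetical and this is not bibclean's own parser; it only mirrors the decimal, octal, and hexadecimal rules stated above, and returns None for the "unlimited" case.

```python
def parse_width(text):
    """Parse a -max-width argument; return None when the width is unlimited."""
    s = text.strip()
    sign = 1
    if s.startswith(("+", "-")):
        if s[0] == "-":
            sign = -1
        s = s[1:]
    if s.lower().startswith("0x"):
        value = int(s[2:], 16)          # hexadecimal, e.g. 0x48
    elif s.startswith("0") and len(s) > 1:
        value = int(s, 8)               # octal, e.g. 0110
    else:
        value = int(s, 10)              # plain decimal
    value *= sign
    return None if value <= 0 else value    # zero or negative: unlimited


for argument in ("72", "0110", "0x48", "0"):
    print(argument, parse_width(argument))
```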

-[no-]align-equals
With the positive form, align the equals  sign  in  key/value
assignments  at  the same column, separated by a single space
from the value string.  Otherwise, the  equals  sign  follows
the key, separated by a single space.  Default: no.

-[no-]check-values
With  the  positive form, apply heuristic pattern matching to
field values in order to detect possible errors (e.g., ‘‘year
=  "192"’’  instead of ‘‘year = "1992"’’), and issue warnings
when unexpected patterns are found.

This checking is usually beneficial, but if it  produces  too
many  bogus  warnings for a particular bibliography file, you
can disable  it  with  the  negative  form  of  this  option.
Default: yes.

-[no-]delete-empty-values
With  the  positive  form,  remove  all field/value pairs for
which the value is an  empty  string.   This  is  helpful  in
cleaning   up   bibliographies  generated  from  text  editor
templates. Compare this option with -[no-]remove-OPT-prefixes
described below.  Default: no.

-[no-]file-position
With   the   positive   form,  give  detailed  file  position
information in warning and error messages.  Default: no.

-[no-]fix-font-changes
With the positive form,  supply  an  additional  brace  level
around  font  changes in titles to protect against downcasing
by some BibTeX styles.  Font changes that already  have  more
than one level of braces are not modified.

For  example,  if  a  title  contains  the  Latin phrase {\em
Dictyostelium    Discoideum}    or    {\em    {D}ictyostelium
{D}iscoideum},  then  downcasing will incorrectly convert the
phrase  to  lower-case  letters.   Most  BibTeX   users   are
surprised  that  bracing the initial letters does not prevent
the  downcase  action.    The   correct   coding   is   {{\em
Dictyostelium   Discoideum}}.    However,   there   are  also
legitimate cases where an  extra  level  of  bracing  wrongly
protects   from   downcasing.   Consequently,  bibclean  will
normally not supply an extra level of braces, but if you have
a  bibliography where the extra braces are routinely missing,
you can use this option to supply them.

If you think that  you  need  this  option,  it  is  strongly
recommended that you apply bibclean to your bibliography file
with and without  -fix-font-changes,  then  compare  the  two
output  files  to  ensure  that  extra  braces  are not being
supplied in titles where they should  not  be  present.   You
will  have  to  decide  which  of the two output files is the
better choice, then repair the  incorrect  title  bracing  by
hand.

Since  font  changes in titles are uncommon, except for cases
of the type which this option  is  designed  to  correct,  it
should do more good than harm.  Default: no.

-[no-]fix-initials
With  the  positive  form,  insert  a  space  after  a period
following author initials.  Default: yes.

-[no-]fix-names
With the positive form, reorder author and editor name  lists
to  remove commas at brace level zero, placing first names or
initials before last names.  Default: yes.
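
A simplified Python sketch of what -fix-initials and -fix-names describe is given below. The function names are hypothetical and the logic is an approximation, not bibclean's implementation: names are split on the word "and" without regard to braces, and only names with exactly one comma at brace level zero are reordered.

```python
import re


def fix_initials(name):
    """Insert a space after a period that follows an initial."""
    return re.sub(r"(?<=[A-Z]\.)(?=[A-Z])", " ", name)


def fix_name(name):
    """Turn 'Last, First' into 'First Last' when the comma is unbraced."""
    depth = 0
    commas = []
    for index, char in enumerate(name):
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        elif char == "," and depth == 0:
            commas.append(index)
    if len(commas) != 1:
        return name.strip()             # leave other forms untouched
    split_at = commas[0]
    last = name[:split_at].strip()
    first = name[split_at + 1:].strip()
    return first + " " + last


def fix_name_list(value):
    """Apply both fixes to an author or editor value string."""
    names = re.split(r"\s+and\s+", value)
    return " and ".join(fix_initials(fix_name(n)) for n in names)


print(fix_name_list("Bach, P.D.Q. and Knuth, Donald E."))
```

A braced name such as {Smith, Inc.} is left alone because its comma is not at brace level zero.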

-[no-]German-style
With the positive form, interpret quote characters ["] inside
braced  value  strings  at  brace  level  1  according to the
conventions of the TeX style file german.sty, which overloads
quote  to  simplify input and representation of German umlaut
accents, sharp-s  (es-zet),  ligature  separators,  invisible
hyphens,   raised/lowered   quotes,  French  guillemets,  and
discretionary  hyphens.   Recognized  character  combinations
will  be braced to prevent BibTeX from interpreting the quote
as a string delimiter.

Quoted strings receive no special handling from this  option,
and  since  German  nouns  in titles must anyway be protected
from the downcasing operation  of  most  BibTeX  bibliography
styles,  German  value  strings that use the overloaded quote
character can always be entered in the form "{...}",  without
the need to specify this option at all.

Default: no.

-[no-]keep-linebreaks
Normally, line breaks inside value strings are collapsed into
a single space, so that  long  value  strings  can  later  be
broken to provide lines of reasonable length.

With  the  positive  form,  linebreaks are preserved in value
strings.  If -max-width is set to zero,  this  preserves  the
original  line breaks.  Spacing outside value strings remains
under bibclean’s control, and is not affected by this option.

Default: no.

-[no-]keep-parbreaks
With  the  positive  form,  preserve paragraph breaks (either
formfeeds, or lines containing only spaces) in value strings.
Normally, paragraph breaks are collapsed into a single space.
Spacing  outside  value  strings  remains  under   bibclean’s
control, and is not affected by this option.  Default: no.

-[no-]keep-preamble-spaces
With   the   positive   form,   preserve  all  whitespace  in
@Preamble{...} entries.  Default: no.

-[no-]keep-spaces
With the positive form, preserve all spaces in value strings.
Normally,  multiple spaces are collapsed into a single space.
This option  can  be  used  together  with  -keep-linebreaks,
-keep-parbreaks,  and  -max-width  0  to preserve the form of
value  strings  while  still  providing  syntax   and   value
checking.    Spacing  outside  value  strings  remains  under
bibclean’s control, and  is  not  affected  by  this  option.
Default: no.

-[no-]keep-string-spaces
With   the   positive   form,   preserve  all  whitespace  in
@String{...} entries.  Default: no.

-[no-]parbreaks
With the negative form, a paragraph break (either a formfeed,
or  a  line containing only spaces) is not permitted in value
strings, or between field/value pairs.  This may be useful to
quickly   trap   runaway   strings  arising  from  mismatched
delimiters.  Default: yes.

-[no-]prettyprint
Normally, bibclean functions as  a  prettyprinter.   However,
with  the  negative form of this option, it acts as a lexical
analyzer instead, producing a stream of lexical tokens.   See
the  LEXICAL  ANALYSIS  manual  section  for further details.
Default: yes.

-[no-]print-patterns
With the positive form, print the value  patterns  read  from
initialization  files  as  they are added to internal tables.
Use this option to check newly-added patterns, or to see what
patterns are being used.

These  patterns  are  the  ones that will be used in checking
value strings for valid syntax, and all of them are specified
in  initialization  files,  rather  than  hard-coded into the
program.  For further details, see the  INITIALIZATION  FILES
manual section.  Default: no.

-[no-]read-init-files
With the negative form, suppress loading of  system-,  user-,
and file-specific initialization files.  Initializations will
come  only  from  those  files explicitly given by -init-file
filename options.  Default: yes.

-[no-]remove-OPT-prefixes
With the positive form, remove the ‘‘OPT’’ prefix  from  each
field  name  where  the  corresponding  value is not an empty
string.  The prefix ‘‘OPT’’ must be entirely in upper-case to
be recognized.

This  option is for bibliographies generated with the help of
the  GNU  Emacs  BibTeX  editing  support,  which   generates
templates  with  optional  fields  identified  by the ‘‘OPT’’
prefix.  Although the function M-x bibtex-remove-OPT normally
bound  to  the  keystrokes  C-c C-o does the job, users often
forget, with the result that BibTeX does  not  recognize  the
field  name,  and  ignores  the  value  string.  Compare this
option  with   -[no-]delete-empty-values   described   above.
Default: no.

-[no-]scribe
With the positive form, accept input syntax conforming to the
Scribe document system.  The  output  will  be  converted  to
conform to BibTeX syntax.  See the SCRIBE BIBLIOGRAPHY FORMAT
manual section for further details.  Default: no.

-[no-]trace-file-opening
With the positive form, record in  the  error  log  file  the
names of all files which bibclean attempts to open.  Use this
option to identify where initialization  files  are  located.
Default: no.

-[no-]warnings
With  the  positive  form,  allow  all warning messages.  The
negative form is not recommended since it may  mask  problems
that should be repaired.  Default: yes.

-version  Display  the  program version number on stderr, and then exit
with a success  return  code.   This  will  also  include  an
indication  of  who  compiled  the  program, the host name on
which it was compiled, the time of compilation, and the  type
of string-value matching code selected, when that information
is available to the compiler.



## ERROR RECOVERY AND WARNINGS

       When bibclean detects an error, it issues  an  error  message  to  both
stderr  and  stdout.   That  way, the user is clearly notified, and the
output bibliography also contains the message at the point of error.

Error messages begin with a distinctive pair of queries, ??,  beginning
in  column  1, followed by the input file name and line number.  If the
-file-position option was specified, they also contain  the  input  and
output  positions of the current file, entry, and value.  Each position
includes the file byte number, the line number, and the column  number.
In  the  event  of  a  runaway  string  argument,  the  entry and value
positions should precisely pinpoint the erroneous  bibliography  entry,
and  the  file positions will indicate where it was detected, which may
be rather later in the files.

Warning messages identify possible problems,  and  are  therefore  sent
only  to  stderr, and not to stdout, so they never appear in the output
file.  They are identified by  a  distinctive  pair  of  percents,  %%,
beginning  in  column 1, and as with error messages, may be followed by
file position messages if the -file-position option was specified.
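
Because the two message kinds have fixed prefixes in column 1, a captured stderr log is easy to sort into errors and warnings. The Python sketch below assumes only the ?? and %% prefixes described above; the function names and the sample lines are made up for illustration, and any continuation lines produced by -file-position are simply counted as "other".

```python
from collections import Counter


def classify(line):
    """Label one line of a bibclean stderr log by its leading marker."""
    if line.startswith("??"):
        return "error"
    if line.startswith("%%"):
        return "warning"
    return "other"


def summarize(lines):
    """Count how many lines fall into each class."""
    return Counter(classify(line) for line in lines)


# Made-up sample lines; real messages carry a file name and line number.
sample = [
    "?? (made-up error text)",
    "%% (made-up warning text)",
    "%% (another made-up warning)",
]
print(summarize(sample))
```

This applies to stderr only: in the output file, a line beginning with %% may also be ordinary text that was passed through verbatim.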

For convenience, the first line of each error and warning message  sent
to  stderr  is formatted according to the expectations of the GNU Emacs
next-error command.   You  can  invoke  bibclean  with  the  Emacs  M-x
compile<RET>bibclean  filename.bib  >filename.new command, then use the
next-error command, normally bound to C-x ` (that’s a grave,  or  back,
accent), to move to the location of the error in the input file.

If  error  messages  are  ignored,  and left in the output bibliography
file, they will precipitate an error  when  the  bibliography  is  next
processed with BibTeX.

After  issuing an error message, bibclean then resynchronizes its input
by copying it verbatim to stdout until  a  new  bibliography  entry  is
recognized  on  a line in which the first non-blank character is an at-
sign (@).  This ensures that nothing is lost from  the  input  file(s),
allowing  corrections  to  be  made  in  either the input or the output
files.  However, if bibclean detects an  internal  error  in  its  data
structures,  it will terminate abruptly without further input or output
processing; this kind of error should never happen, and if it does,  it
should be reported immediately to the author of the program.  Errors in
initialization files, and running out  of  dynamic  memory,  will  also
immediately terminate bibclean.



## INITIALIZATION FILES

       bibclean  can  be compiled with one of three different types of pattern
matching; the choice is made by the installer at compile time:

·  The original version uses explicit hand-coded tests of value-
string syntax.

·  The  second  version uses regular-expression pattern-matching
host  library  routines  together   with   regular-expression
patterns that come entirely from initialization files.

·  The  third  version  uses special patterns that come entirely
from initialization files.

This Debianized version of bibclean uses the third  version.   However,
command-line  options can also be specified in initialization files, no
matter which pattern matching choice was selected.

When bibclean starts, it searches for initialization files,  using  the
first one of $(HOME)/.bibcleanrc, /usr/share/bibcleanrc, and
/etc/bibcleanrc that exists.  Afterwards, it reads the first .bibcleanrc
found in the BIBINPUTS search path.  The name .bibcleanrc can be changed
at run time through a setting of the environment variable BIBCLEANINI.
If the name starts with a dot, that dot is stripped when looking in
/usr/share and /etc.

Then, when command-line arguments are processed, any additional files
specified by -init-file filename options are also processed.

Finally, immediately before each named bibliography file is processed,
an attempt is made to process an initialization file with the same
name, but with the extension changed to .ini.  The default extension can
be changed by a setting of the environment variable BIBCLEANEXT.  This
scheme permits system-wide, user-wide, session-wide, and file-specific
initialization files to be supported.

When input is taken from stdin, there is no file-specific
initialization.

For precise control, the -no-read-init-files option suppresses all
initialization files except those explicitly named by -init-file
filename options, either on the command line, or in requested
initialization files.

Recursive execution of initialization files with nested -init-file
options is permitted; if the recursion is circular, bibclean will
finally get a non-fatal initialization file open failure after opening
too many files.  This terminates further initialization file
processing.  As the recursion unwinds, the files are all closed, then
execution proceeds normally.

An initialization file may contain empty lines, comments from percent to
end of line (just like TeX), option switches, and field/pattern or
field/pattern/message assignments.  Leading and trailing spaces are
ignored.
This is best illustrated by a short example:

    % This is a small bibclean initialization file
    -init-file /u/math/bib/.bibcleanrc  %% departmental patterns
    chapter = "\"D\""                   %% 23
    pages   = "\"D--D\""                %% 23--27
    volume  = "\"D \\an\\d D\""         %% 11 and 12
    year    = \
            "\"dddd, dddd, dddd\"" \
            "Multiple years specified." %% 1989, 1990, 1991
    -no-fix-names                       %% do not modify author/editor lists

Long logical lines can be split into multiple physical lines by breaking
at a backslash-newline pair; the backslash-newline pair is discarded.
This processing happens while characters are being read, before any
further interpretation of the input stream.

Each logical line must contain a complete option (and its value, if
any), or a complete field/pattern pair, or a field/pattern/message
triple.

Comments are stripped during the parsing of the field, pattern, and
message values.  The comment start symbol is not recognized inside
quoted strings, so it can be freely used in such strings.

Comments on logical lines that were input as multiple physical lines via
the backslash-newline convention must appear on the last physical line;
otherwise, the remaining physical lines will become part of the comment.

Pattern strings must be enclosed in quotation marks; within such
strings, a backslash starts an escape mechanism that is commonly used in
UNIX software.  The recognized escape sequences are:

    \a      alarm bell (octal 007)
    \b      backspace (octal 010)
    \f      formfeed (octal 014)
    \n      newline (octal 012)
    \r      carriage return (octal 015)
    \t      horizontal tab (octal 011)
    \v      vertical tab (octal 013)
    \ooo    character number octal ooo (e.g., \012 is linefeed).
            Up to 3 octal digits may be used.
    \0xhh   character number hexadecimal hh (e.g., \0x0a is linefeed).
            xhh may be in either letter case.  Any number of
            hexadecimal digits may be used.

Backslash followed by any other character produces just that character.
Thus, \% gets a literal percent into a string (preventing its
interpretation as a comment), \" produces a quotation mark, and \\
produces a single backslash.

An ASCII NUL (\0) in a string will terminate it; this is a feature of
the C programming language in which bibclean is implemented.

Field/pattern pairs can be separated by arbitrary space, and optionally,
either an equals sign or colon functioning as an assignment operator.
Thus, the following are equivalent:

    pages="\"D--D\""
    pages:"\"D--D\""
    pages "\"D--D\""
    pages = "\"D--D\""
    pages : "\"D--D\""
    pages   "\"D--D\""

Each field name can have an arbitrary number of patterns associated with
it; however, they must be specified in separate field/pattern
assignments.

An empty pattern string causes previously-loaded patterns for that field
name to be forgotten.  This feature permits an initialization file to
completely discard patterns from earlier initialization files.

Patterns for value strings are represented in a tiny special-purpose
language that is both convenient and suitable for bibliography
value-string syntax checking.  While not as powerful as the language of
regular-expression patterns, its parsing can be portably implemented in
less than 3% of the code in a widely-used regular-expression parser (the
GNU regexp package).

The patterns are represented by the following special characters:

    <space>  one or more spaces
    a        exactly one letter
    A        one or more letters
    d        exactly one digit
    D        one or more digits
    r        exactly one Roman numeral
    R        one or more Roman numerals (i.e., a Roman number)
    w        exactly one word (one or more letters and digits)
    W        one or more space-separated words, beginning and ending
             with a word
    .        one ‘special’ character, one of the characters
             <space>!#()*+,-./:;?[]~, a subset of punctuation
             characters that are typically used in string values
    :        one or more ‘special’ characters
    X        one or more ‘special’-separated words, beginning and
             ending with a word
    \x       exactly one x (x is any character), possibly with an
             escape sequence interpretation given earlier
    x        exactly the character x (x is anything but one of these
             pattern characters: aAdDrRwW.:<space>\)

The X pattern character is very powerful, but generally inadvisable,
since it will match almost anything likely to be found in a BibTeX value
string.  The reason for providing pattern matching on the value strings
is to uncover possible errors, not mask them.

There is no provision for specifying ranges or repetitions of
characters, but this can usually be done with separate patterns.  It is
a good idea to accompany the pattern with a comment showing the kind of
thing it is expected to match.  Here is a portion of an initialization
file giving a few of the patterns used to match number value strings:

    number = "\"D\""          %% 23
    number = "\"A AD\""       %% PN LPS5001
    number = "\"A D(D)\""     %% RJ 34(49)
    number = "\"A D\""        %% XNSS 288811
    number = "\"A D\\.D\""    %% Version 3.20
    number = "\"A-A-D-D\""    %% UMIAC-TR-89-11
    number = "\"A-A-D\""      %% CS-TR-2189
    number = "\"A-A-D\\.D\""  %% CS-TR-21.7

For a bibliography that contains only article entries, this list should
probably be reduced to just the first pattern, so that anything other
than a digit string fails the pattern-match test.  This is easily done
by keeping bibliography-specific patterns in a corresponding file with
extension .ini, since that file is read automatically.  You should be
sure to use empty pattern strings in this pattern file to discard
patterns from earlier initialization files.

The value strings passed to the pattern matcher contain surrounding
quotes, so the patterns should also.
However, you could use a pattern specification like "\"D" to match an
initial digit string followed by anything else; the omission of the
final quotation mark \" in the pattern allows the match to succeed
without checking that the next character in the value string is a
quotation mark.

Because the value strings are intended to be processed by TeX, the
pattern matching ignores braces, and TeX control sequences, together
with any space following those control sequences.  Spaces around braces
are preserved.  This convention allows the pattern fragment A-AD-D to
match the value string TN-K\slash 27-70, because the value is implicitly
collapsed to TN-K27-70 during the matching operation.

bibclean’s normal action when a string value fails to match any of the
corresponding patterns is to issue a warning message something like
this: ‘‘Unexpected value in year = "192"’’.  In most cases, that is
sufficient to alert the user to a problem.  In some cases, however, it
may be desirable to associate a different message with a particular
pattern.  This can be done by supplying a message string following the
pattern string.  Format items %% (single percent), %e (entry name), %f
(field name), %k (citation key), and %v (string value) are available to
get current values expanded in the messages.  Here is an example:

    chapter = "\"D:D\"" "Colon found in ‘‘%f = %v’’" %% 23:2

To be consistent with other messages output by bibclean, the message
string should not end with punctuation.

If you wish to make the message an error, rather than just a warning,
begin it with a query (?), like this:

    chapter = "\"D:D\"" "?Colon found in ‘‘%f = %v’’" %% 23:2

The query will not be included in the output message.

Escape sequences are supported in message strings, just as they are in
pattern strings.  You can use this to advantage for fancy things, such
as terminal display mode control.
If you rewrite the previous example as

    chapter = "\"D:D\"" \
        "?\033[7mColon found in ‘‘%f = %v’’\033[0m" %% 23:2

the error message will appear in inverse video on display screens that
support ANSI terminal control sequences.  Such practice is not normally
recommended, since it may have undesirable effects on some output
devices.  Nevertheless, you may find it useful for restricted
applications.

For some types of bibliography fields, bibclean contains special-purpose
code to supplement or replace the pattern matching:

·  CODEN, ISBN and ISSN field values are handled this way because their
validation requires evaluation of checksums that cannot be expressed by
simple patterns; no patterns are even used in these three cases.

·  chapter, number, pages, and volume values are checked only by pattern
matching.

·  month values are first checked against the standard BibTeX month
abbreviations, and only if no match is found are patterns then used.

·  year values are first checked against patterns, then if no match is
found, the year numbers are found and converted to integer values for
testing against reasonable bounds.

Values for other fields are checked only against patterns.  You can
provide patterns for any field you like, even ones bibclean does not
already know about.  New ones are simply added to an internal table that
is searched for each string to be validated.

The special field, key, represents the bibliographic citation key.  It
can be given patterns, like any other field.  Here is an initialization
file pattern assignment that will match an author name, a colon, an
alphabetic string, and a two-digit year:

    key = "A:Add" %% Knuth:TB86

Notice that no quotation marks are included in the pattern, because the
citation keys are not quoted.  You can use such patterns to help enforce
uniform naming conventions for citation keys, which is increasingly
important as your bibliography data base grows.
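
To make the pattern language of this section more concrete, here is a minimal Python sketch that translates a pattern into a regular expression. It is not bibclean's matcher: the function names are hypothetical, letters and digits are assumed to be ASCII, Roman numerals are assumed to be the letters IVXLCDM in either case, \x is treated simply as a literal x, and the stripping of braces and TeX control sequences from the value is omitted. The pattern is taken after initialization-file escapes have been processed, so "\"D--D\"" in a file corresponds to the pattern "D--D" including its quotation marks.

```python
import re

SPECIALS = " !#()*+,-./:;?[]~"
_SPECIAL = "[" + re.escape(SPECIALS) + "]"
_WORD = "[A-Za-z0-9]+"
_ROMAN = "[IVXLCDMivxlcdm]"

# One regular-expression fragment per pattern character.
_FRAGMENTS = {
    " ": " +",
    "a": "[A-Za-z]",
    "A": "[A-Za-z]+",
    "d": "[0-9]",
    "D": "[0-9]+",
    "r": _ROMAN,
    "R": _ROMAN + "+",
    "w": _WORD,
    "W": _WORD + "(?: +" + _WORD + ")*",
    ".": _SPECIAL,
    ":": _SPECIAL + "+",
    "X": _WORD + "(?:" + _SPECIAL + "+" + _WORD + ")*",
}


def pattern_to_regex(pattern):
    """Translate a value pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))   # \x: literal x
            i += 2
            continue
        parts.append(_FRAGMENTS.get(char, re.escape(char)))
        i += 1
    return re.compile("".join(parts))


def value_matches(pattern, value):
    """True if the value begins with text that matches the pattern."""
    return pattern_to_regex(pattern).match(value) is not None


print(value_matches('"D--D"', '"23--27"'))
print(value_matches('A:Add', 'Knuth:TB86'))
```

The match is anchored only at the start of the value, which is why a pattern without its closing quotation mark accepts any trailing text, as described earlier for "\"D".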

## LEXICAL ANALYSIS

When -no-prettyprint is specified, bibclean acts as a lexical analyzer
instead of a prettyprinter, producing output in lines of the form

    <token-number><tab><token-name><tab>"<token-value>"

Each output line contains a single complete token, identified by a small
integer number for use by a computer program, a token type name for
human readers, and a string value in quotes.

Special characters in the token value string are represented with
ANSI/ISO Standard C escape sequences, so all characters other than NUL
are representable, and multi-line values can be represented in a single
line.

Here are the token numbers and token type names that can appear in the
output when -no-prettyprint is specified:

     0  UNKNOWN
     1  ABBREV
     2  AT
     3  COMMA
     4  COMMENT
     5  ENTRY
     6  EQUALS
     7  FIELD
     8  INCLUDE
     9  INLINE
    10  KEY
    11  LBRACE
    12  LITERAL
    13  NEWLINE
    14  PREAMBLE
    15  RBRACE
    16  SHARP
    17  SPACE
    18  STRING
    19  VALUE

Programs that parse such output should also be prepared for lines
beginning with the warning prefix, %%, or the error prefix, ??, and for
ANSI/ISO Standard C line number directives of the form

    # line 273 "texbook1.bib"

which record the line number and file name of the current input file.

If a -max-width nnn command-line option was specified, long output lines
will be wrapped at a backslash-newline pair, and consequently, software
that processes the lexical token stream should be prepared to collapse
such wrapped lines back into single lines.

As an example of the use of -no-prettyprint, the UNIX command pipeline

    bibclean -no-prettyprint mylib.bib | \
        awk '$2 == "KEY" {print $3}' | \
        sed -e 's/"//g' | \
        sort

will extract a sorted list of all citation keys in the file mylib.bib.

A certain amount of processing will have been done on the tokens.  In
particular, delimiters equivalent to braces will have been replaced by
braces, and braced strings will have become quoted strings.
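
The precautions listed above for programs that read the token stream (skipping %%, ?? and # line lines, and rejoining lines wrapped at a backslash-newline pair) can be sketched in Python. The function names are hypothetical and the sample stream is hand-written to follow the documented line format, so it may differ in detail from real bibclean output; C escape sequences inside values are not decoded here.

```python
def join_wrapped(lines):
    """Collapse lines wrapped at a backslash-newline pair."""
    logical = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]        # drop the backslash, keep collecting
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        logical.append(pending)
    return logical


def parse_tokens(lines):
    """Return (number, name, value) tuples from a token stream."""
    tokens = []
    for line in join_wrapped(lines):
        if not line or line.startswith(("%%", "??", "#")):
            continue                    # warnings, errors, line directives
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue
        number, name, value = parts
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]         # strip the surrounding quotes
        tokens.append((int(number), name, value))
    return tokens


def citation_keys(lines):
    """Sorted citation keys, like the awk/sed/sort pipeline above."""
    return sorted(v for _, name, v in parse_tokens(lines) if name == "KEY")


# Hand-written sample in the documented format (not captured output).
sample = [
    '# line 1 "mylib.bib"',
    '2\tAT\t"@"',
    '5\tENTRY\t"Book"',
    '11\tLBRACE\t"{"',
    '10\tKEY\t"Lamport:\\',
    'LDP85"',
    '3\tCOMMA\t","',
    '%% (made-up warning text)',
    '10\tKEY\t"Knuth:TB86"',
]
print(citation_keys(sample))
```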
The LITERAL token type is used for arbitrary text that bibclean does not
examine further, such as the contents of a @Preamble{...} or a
@Comment{...}.

The UNKNOWN token type should never appear in the output stream.  It is
used internally to initialize token type variables.



## SCRIBE BIBLIOGRAPHY FORMAT

bibclean’s support for the Scribe bibliography format is based on the
syntax description in the Scribe Introductory User’s Manual, 3rd
Edition, May 1980.  Scribe was originally developed by Brian Reid at
Carnegie-Mellon University, and is now marketed by Unilogic, Ltd.

The BibTeX bibliography format was strongly influenced by Scribe, and
indeed, with care, it is possible to share bibliography files between
the two systems.  Nevertheless, there are some differences, so here is a
summary of features of the Scribe bibliography file format:

(1)  Letter case is not significant in field names and entry names, but
case is preserved in value strings.

(2)  In field/value pairs, the field and value may be separated by one
of three characters: =, /, or space.  Space may optionally surround
these separators.

(3)  Value delimiters are any of these seven pairs:
{ }   [ ]   ( )   < >   ' '   " "   ` `

(4)  Value delimiters may not be nested, even though with the first four
delimiter pairs, nested balanced delimiters would be unambiguous.

(5)  Delimiters can be omitted around values that contain only letters,
digits, sharp (#), ampersand (&), period (.), and percent (%).

(6)  Outside of delimited values, a literal at-sign (@) is represented
by doubled at-signs (@@).

(7)  Bibliography entries begin with @name, as for BibTeX, but any of
the seven Scribe value delimiter pairs may be used to surround the
values in field/value pairs.  As in (4), nested delimiters are
forbidden.

(8)  Arbitrary space may separate entry names from the following
delimiters.

(9)  @Comment is a special command whose delimited value is discarded.
As in (4), nested delimiters are forbidden.

(10) The special form @Begin{comment} ... @End{comment} permits
encapsulating arbitrary text containing any characters or delimiters,
other than ‘‘@End{comment}’’.  Any of the seven delimiter pairs may be
used around the word ‘‘comment’’ following the ‘‘@Begin’’ or ‘‘@End’’;
the delimiters in the two cases need not be the same, and consequently,
‘‘@Begin{comment}’’/‘‘@End{comment}’’ pairs may not be nested.

(11) The key field is required in each bibliography entry.

(12) A backslashed quote in a string will be assumed to be a TeX accent,
and braced appropriately.  While such accents do not conform to Scribe
syntax, Scribe-format bibliographies have been found that appear to be
intended for TeX processing.

Because of this loose syntax, bibclean’s normal error detection
heuristics are less effective, and consequently, Scribe mode input is
not the default; it must be explicitly requested.



## ENVIRONMENT VARIABLES

BIBCLEANEXT
File extension of bibliography-specific initialization files.
Default: .ini.

BIBCLEANINI
Name of bibclean initialization files.  Default: .bibcleanrc.

BIBINPUTS
Search path for bibclean and BibTeX input files.  This is a
colon-separated list of directories that are searched in order from
first to last.  It is not an error for a specified directory to not
exist.



## FILES

*.bib
BibTeX and Scribe bibliography data base files.

*.ini
File-specific initialization files.

/usr/share/bibcleanrc, /etc/bibcleanrc
System-wide initialization files.

.bibcleanrc
User-specific initialization files.



## SEE ALSO

bibcheck(1), bibdup(1), bibextract(1), bibindex(1), bibjoin(1),
biblabel(1), biblex(1), biblook(1), biborder(1), bibparse(1),
bibsort(1), bibtex(1), bibunlex(1), citefind(1), citesub(1),
citetags(1), latex(1), scribe(1), tex(1).



## AUTHOR

Nelson H. F. Beebe
Center for Scientific Computing
University of Utah
Department of Mathematics, 322 INSCC
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
USA
Tel: +1 801 581 5254
FAX: +1 801 585 1640, +1 801 581 4148
Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet)
URL: http://www.math.utah.edu/~beebe

This Debianization of bibclean was done by Henning Makholm
<henning@makholm.net>, and differs from the upstream source in where it
looks for the system-wide initialization file (vanilla bibclean expects
to find it in $PATH), and has also been patched to ignore the built-in
BibTeX field-length limit for abstract fields.