antlr - ANother Tool for Language Recognition

NAME

       antlr - ANother Tool for Language Recognition

SYNTAX

       antlr [options] grammar_files

DESCRIPTION

       Antlr converts an extended form of context-free grammar into a set of C
       functions which directly implement an efficient form  of  deterministic
       recursive-descent LL(k) parser.  Context-free grammars may be augmented
       with predicates to allow semantics to influence parsing; this allows  a
       form  of  context-sensitive  parsing.   Selective  backtracking is also
       available to handle non-LL(k) and even non-LALR(k)  constructs.   Antlr
       also  produces  a  definition  of  a  lexer  which can be automatically
       converted into C code for a DFA-based lexer by dlg.  Hence, antlr
       serves a function much like that of yacc; however, it is notably more
       flexible and is more integrated with a lexer generator (antlr directly
       generates dlg code, whereas yacc and lex are given independent
       descriptions).  Unlike yacc, which accepts LALR(1) grammars, antlr
       accepts LL(k) grammars in an extended BNF notation, which eliminates
       the need for precedence rules.
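       To give a feel for this kind of parser, the following is a minimal
       hand-written C sketch of a recursive-descent LL(1) recognizer (LL(k)
       with k=1) for a hypothetical grammar, written here in generic EBNF
       rather than exact antlr syntax:

              expr   : term ( ("+" | "-") term )* ;
              term   : factor ( ("*" | "/") factor )* ;
              factor : NUM | "(" expr ")" ;

       It is not antlr output: the token codes, the Token type, and the
       LA1(), match(), and parse() helpers are assumptions made for the
       illustration.  Each rule becomes a C function, each (...)* closure
       becomes a while loop driven by one token of lookahead, and operator
       precedence follows from which rule calls which, so no precedence
       declarations are needed.  Each function returns the value it
       computed, much as a rule can pass a value back to its caller.

```c
/* Hypothetical token codes; a real grammar would get these from tokens.h. */
enum { END, NUM, PLUS, MINUS, STAR, SLASH, LPAREN, RPAREN };

typedef struct { int type; int value; } Token;

static const Token *toks;   /* token stream being parsed        */
static int pos;             /* index of the current token       */
static int syntax_error;    /* set when a token does not match  */

static int LA1(void) { return toks[pos].type; }   /* one token of lookahead */

static void match(int t)
{
    if (LA1() == t) pos++;
    else syntax_error = 1;
}

static int expr(void);

/* factor : NUM | "(" expr ")" ;   the alternative is chosen by LA(1) */
static int factor(void)
{
    int v = 0;
    if (LA1() == NUM) { v = toks[pos].value; match(NUM); }
    else if (LA1() == LPAREN) { match(LPAREN); v = expr(); match(RPAREN); }
    else syntax_error = 1;
    return v;
}

/* term : factor ( ("*" | "/") factor )* ;   the closure is a while loop */
static int term(void)
{
    int v = factor();
    while (!syntax_error && (LA1() == STAR || LA1() == SLASH)) {
        int op = LA1();
        match(op);
        int rhs = factor();
        if (op == STAR) v *= rhs;
        else if (rhs != 0) v /= rhs;
        else syntax_error = 1;           /* division by zero */
    }
    return v;
}

/* expr : term ( ("+" | "-") term )* ; */
static int expr(void)
{
    int v = term();
    while (!syntax_error && (LA1() == PLUS || LA1() == MINUS)) {
        int op = LA1();
        match(op);
        int rhs = term();
        v = (op == PLUS) ? v + rhs : v - rhs;
    }
    return v;
}

/* Parse a whole token stream ending in END; returns 0 on success. */
int parse(const Token *stream, int *result)
{
    toks = stream; pos = 0; syntax_error = 0;
    *result = expr();
    match(END);
    return syntax_error;
}
```

       For the tokens 2 + 3 * 4, parse() stores 14, because term binds
       the multiplication before expr sees the addition.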

       Like yacc grammars, antlr  grammars  can  use  automatically-maintained
       symbol  attribute  values  referenced  as  dollar  variables.  Further,
       because antlr generates  top-down  parsers,  arbitrary  values  may  be
       inherited  from  parent rules (passed like function parameters).  Antlr
       also has a mechanism for creating and manipulating abstract syntax
       trees.
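       Because each rule is a C function called from its parent, an
       inherited value is just a function argument.  The hand-written sketch
       below, which does not reproduce antlr's generated code or its
       attribute syntax, parses a hypothetical rule
       decl : ( "int" | "float" ) idlist ";" in which decl passes the
       declared type down to idlist, so every name in the list is entered
       into a toy symbol table with that type.  The token codes, the symbol
       table, and the function names are assumptions.

```c
/* Hypothetical token codes and token record. */
enum { T_END, T_INT, T_FLOAT, T_ID, T_COMMA, T_SEMI };
typedef struct { int type; const char *text; } Tok;

/* A tiny symbol table filled in by the parser (illustrative only). */
#define MAX_SYMS 16
static struct { const char *name; int type; } syms[MAX_SYMS];
static int nsyms;

static const Tok *in;   /* token stream */
static int at;          /* current position */
static int failed;      /* syntax error flag */

static void match(int t) { if (in[at].type == t) at++; else failed = 1; }

/* idlist : ID ( "," ID )* ;   'type' is the value inherited from decl */
static void idlist(int type)
{
    for (;;) {
        if (in[at].type != T_ID || nsyms == MAX_SYMS) { failed = 1; return; }
        syms[nsyms].name = in[at].text;
        syms[nsyms].type = type;        /* every name gets the parent's type */
        nsyms++;
        match(T_ID);
        if (in[at].type != T_COMMA) return;
        match(T_COMMA);
    }
}

/* decl : ( "int" | "float" ) idlist ";" ; */
static void decl(void)
{
    int type = in[at].type;
    if (type != T_INT && type != T_FLOAT) { failed = 1; return; }
    match(type);
    idlist(type);                       /* inherited attribute = argument */
    if (!failed) match(T_SEMI);
}

/* Parse one declaration followed by T_END; returns 0 on success. */
int parse_decl(const Tok *stream)
{
    in = stream; at = 0; failed = 0; nsyms = 0;
    decl();
    if (!failed) match(T_END);
    return failed;
}
```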

       There  are  various  other  niceties in antlr, including the ability to
       spread one grammar over multiple files or even multiple grammars  in  a
       single  file,  the  ability  to  generate a version of the grammar with
       actions stripped out (for documentation purposes), and lots more.

OPTIONS

       -ck n  Use up to n symbols of lookahead when using  compressed  (linear
              approximation)  lookahead.  This type of lookahead is very cheap
              to compute and is attempted before full LL(k)  lookahead,  which
              is of exponential complexity in the worst case.  In general, the
              compressed lookahead can be much deeper (e.g., -ck 10) than the
              full lookahead (which usually must be less than 4).
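              As a sketch of the trade-off, suppose (hypothetically) that an
              alternative can begin only with the two-token sequences (A,B)
              or (C,D).  Full LL(2) lookahead keeps that exact set of
              sequences; the linear approximation keeps one token set per
              depth, {A,C} at depth 1 and {B,D} at depth 2.  That needs only
              k sets rather than up to |T|^k sequences for a vocabulary T,
              but it also admits the spurious sequences (A,D) and (C,B).
              The C code below only illustrates this; the token codes
              (TOK_A to TOK_D) and tables are assumptions, not antlr's
              internal representation.

```c
#include <stdbool.h>

enum { TOK_A, TOK_B, TOK_C, TOK_D, NTOKENS };   /* hypothetical tokens */
#define K 2                                     /* lookahead depth */

/* Full LL(2) lookahead: the exact set of 2-token sequences. */
static const int full_set[][K] = { {TOK_A, TOK_B}, {TOK_C, TOK_D} };
#define NFULL (sizeof full_set / sizeof full_set[0])

/* Linear approximation: one token set per lookahead depth. */
static const bool linear_set[K][NTOKENS] = {
    /* depth 1 */ { [TOK_A] = true, [TOK_C] = true },
    /* depth 2 */ { [TOK_B] = true, [TOK_D] = true },
};

/* Exact test: is la[0..K-1] one of the listed sequences? */
bool predict_full(const int la[K])
{
    for (unsigned i = 0; i < NFULL; i++) {
        bool same = true;
        for (int d = 0; d < K; d++)
            if (full_set[i][d] != la[d]) same = false;
        if (same) return true;
    }
    return false;
}

/* Approximate test: is each token in the set for its depth? */
bool predict_linear(const int la[K])
{
    for (int d = 0; d < K; d++)
        if (!linear_set[d][la[d]]) return false;
    return true;
}
```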

       -CC    Generate C++ output from both ANTLR and DLG.

       -cr    Generate  a cross-reference for all rules.  For each rule, print
              a list of all other rules that reference it.

       -e1    Ambiguities/errors shown in low detail (default).

       -e2    Ambiguities/errors shown in more detail.

       -e3    Ambiguities/errors shown in excruciating detail.

       -fe file
              Rename err.c to file.

       -fh file
              Rename stdpccts.h header (turns on -gh) to file.

       -fl file
              Rename lexical output, parser.dlg, to file.

       -fm file
              Rename file with lexical mode definitions, mode.h, to file.

       -fr file
              Rename file which remaps globally visible symbols,  remap.h,  to
              file.

       -ft file
              Rename tokens.h to file.

       -ga    Generate ANSI-compatible code (default case).  This has not been
              rigorously tested to be ANSI X3J11 C compliant, but it is close.
              The normal output of antlr is currently compilable under K&R C,
              ANSI C, and C++; this option does nothing because antlr
              generates a bunch of #ifdefs to do the right thing depending on
              the language.

       -gc    Indicates that antlr should  generate  no  C  code,  i.e.,  only
              perform analysis on the grammar.

       -gd    C  code  is  inserted  in  each  of  the antlr generated parsing
              functions to provide for user-defined  handling  of  a  detailed
              parse  trace.   The inserted code consists of calls to the user-
              supplied macros or functions called  zzTRACEIN  and  zzTRACEOUT.
              The only argument is a char * pointing to a C-style string which
              is the grammar rule recognized by the current parsing  function.
              If  no  definition  is  given for the trace functions, upon rule
              entry and exit, a message will  be  printed  indicating  that  a
              particular rule has been entered or exited.
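              For example, a user might supply the hooks as ordinary C
              functions taking the rule name.  In the sketch below the hooks
              append to a buffer rather than printing, and the rule functions
              expr and factor are hypothetical stand-ins that only show where
              -gd places the calls; they are not actual antlr output.  How a
              user definition replaces the default trace (for example, as a
              macro in the grammar's header action) is not covered here and
              may depend on the PCCTS version.

```c
#include <stdio.h>
#include <string.h>

static char trace[256];   /* accumulated trace text */

/* User-supplied hooks: the argument is the name of the rule. */
void zzTRACEIN(char *rule)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof trace - used, ">%s ", rule);
}

void zzTRACEOUT(char *rule)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof trace - used, "<%s ", rule);
}

/* Hypothetical rule functions showing where the calls are placed. */
static void factor(void)
{
    zzTRACEIN("factor");
    /* ... code that matches the tokens of factor ... */
    zzTRACEOUT("factor");
}

static void expr(void)
{
    zzTRACEIN("expr");
    factor();
    zzTRACEOUT("expr");
}
```

              Calling expr() leaves ">expr >factor <factor <expr " in the
              buffer, showing the nesting of rule entries and exits.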

       -ge    Generate an error class for each non-terminal.

       -gh    Generate  stdpccts.h  for  non-ANTLR-generated files to include.
              This file contains all defines needed to describe  the  type  of
              parser  generated  by antlr (e.g. how much lookahead is used and
              whether or not trees are constructed) and  contains  the  header
              action specified by the user.

       -gk    Generate  parsers  that  delay  lookahead  fetches until needed.
              Without this option, antlr generates parsers which always have k
              tokens of lookahead available.
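              A hypothetical k-token lookahead buffer shows the difference.
              In the eager version the buffer is topped up to k tokens after
              every consume, so LA(1) through LA(k) are always ready; in the
              lazy version, in the spirit of -gk, a token is read from the
              lexer only when LA(i) first asks for it.  One case where this
              matters is interactive input, where an eager parser can block
              waiting for a token it does not yet need.  The names
              next_token(), la(), consume(), and the ring-buffer layout are
              assumptions for the sketch, not the code antlr emits; the stub
              lexer simply returns 1, 2, 3, ... and counts its calls.

```c
#include <stdbool.h>

#define K 2   /* lookahead depth, as would be set by -k */

/* Stub lexer: returns successive integers and counts how often it ran. */
static int lexer_calls;
static int next_token(void) { return ++lexer_calls; }

typedef struct {
    int buf[K];     /* ring buffer of up to K pending tokens */
    int head;       /* index of LA(1) in buf                 */
    int count;      /* number of tokens currently buffered   */
    bool lazy;      /* true: fetch on demand (-gk style)     */
} Lookahead;

/* Ensure at least n tokens are buffered. */
static void fill_to(Lookahead *L, int n)
{
    while (L->count < n) {
        L->buf[(L->head + L->count) % K] = next_token();
        L->count++;
    }
}

void la_init(Lookahead *L, bool lazy)
{
    L->head = 0; L->count = 0; L->lazy = lazy;
    if (!lazy) fill_to(L, K);           /* eager: read K tokens up front */
}

/* LA(i) for 1 <= i <= K; reads from the lexer only if needed. */
int la(Lookahead *L, int i)
{
    fill_to(L, i);
    return L->buf[(L->head + i - 1) % K];
}

void consume(Lookahead *L)
{
    fill_to(L, 1);                      /* make sure LA(1) exists */
    L->head = (L->head + 1) % K;
    L->count--;
    if (!L->lazy) fill_to(L, K);        /* eager: top the buffer back up */
}
```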

       -gl    Generate line info about grammar actions in the C parser, in the
              form # line "file", which makes error messages from the C/C++
              compiler make more sense because they point into the grammar
              file rather than the resulting C file.  Debugging is easier as
              well, because you step through the grammar rather than the C
              file.
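              The directive involved is the standard C #line directive.  In
              the sketch below, the grammar file name calc.g and the line
              number 42 are hypothetical, and the exact text antlr emits may
              differ; the point is that after the directive the compiler,
              and the __LINE__ and __FILE__ macros, attribute the following
              code to calc.g.

```c
/* Code as it might appear in a generated parser: the directive tells the
   compiler that the next line came from line 42 of calc.g. */
int action_line(void)
{
#line 42 "calc.g"
    return __LINE__;             /* reported as line 42 of calc.g */
}

const char *action_file(void)
{
    return __FILE__;             /* "calc.g": the directive stays in effect */
}
```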

       -gs    Do  not  generate  sets  for  token  expression  lists;  instead
              generate a ||-separated sequence  of  LA(1)==token_number.   The
              default is to generate sets.
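              For a hypothetical token expression list ( ID | NUM | STRING ),
              the two shapes of test look roughly like the hand-written C
              below.  The token numbers, the one-byte bit set, and the
              function names are assumptions; antlr's actual set layout and
              macro names may differ.  The set test is a single indexed bit
              lookup whatever the length of the list, while the -gs form is
              easier to read in the generated code but grows by one
              comparison per token.

```c
#include <stdbool.h>

/* Hypothetical token numbers. */
enum { ID = 1, NUM = 2, STRING = 3, PLUS = 4 };

/* Default: a precomputed set with one bit per token number
   (token numbers 0..7 fit in a single byte). */
static const unsigned char id_num_string_set[] = {
    (1u << ID) | (1u << NUM) | (1u << STRING)
};

static bool in_set(const unsigned char *set, int tok)
{
    return (set[tok / 8] >> (tok % 8)) & 1u;
}

bool test_with_set(int LA1)
{
    return in_set(id_num_string_set, LA1);
}

/* With -gs: a ||-separated sequence of comparisons. */
bool test_with_gs(int LA1)
{
    return LA1 == ID || LA1 == NUM || LA1 == STRING;
}
```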

       -gt    Generate code for Abstract-Syntax Trees.

       -gx    Do  not  create  the lexical analyzer files (dlg-related).  This
              option should be  given  when  the  user  wishes  to  provide  a
              customized  lexical  analyzer.   It  may  also  be  used in make
              scripts to cause only the parser to be rebuilt when a change not
              affecting the lexical structure is made to the input grammars.

       -k n   Set k of LL(k) to n, i.e., the number of tokens of lookahead
              (default 1).

       -o dir Directory where output files should go (default=".").   This  is
              very  nice  for  keeping the source directory clear of ANTLR and
              DLG spawn.

       -p     The complete grammar, collected from all input grammar files and
              stripped  of  all  comments  and  embedded actions, is listed to
              stdout.  This is intended to aid in viewing the  entire  grammar
              as  a  whole and to eliminate the need to keep actions concisely
              stated so that the grammar is easier  to  read.   Hence,  it  is
              preferable  to  embed  even  complex  actions  directly  in  the
              grammar, rather than to call  them  as  subroutines,  since  the
              subroutine call overhead will be saved.

       -pa    This  option  is  the  same  as  -p  except  that  the output is
              annotated with the first sets determined from grammar  analysis.

       -prc on
              Turn on the computation and hoisting of predicate context.

       -prc off
              Turn  off  the  computation  and  hoisting of predicate context.
              This option makes 1.10 behave like the 1.06 release with  option
              -pr on.  Context computation is off by default.
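              The following is a conceptual sketch, not antlr's generated
              code, of why predicate context matters.  Assume a hypothetical
              grammar in which rule stat chooses between a declaration, which
              starts either with the keyword "int" or with a user-defined
              type name guarded by a semantic predicate is_type(), and an
              assignment, which starts with an ordinary ID.  If the predicate
              is hoisted into the decision without its context, it is also
              evaluated when LA(1) is "int", rejects it, and the wrong
              alternative is predicted; with context, the predicate is
              consulted only for the ID tokens it was written to
              disambiguate.  Token codes and function names are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token codes and a one-entry table of user type names. */
enum { INT_KW, ID, NUM };
static bool is_type(const char *name) { return strcmp(name, "size_t") == 0; }

enum { ALT_DECL, ALT_ASSIGN };

/* Without context: the hoisted predicate is ANDed with the whole
   alternative's lookahead, so it also runs when LA(1) is "int". */
int choose_no_context(int la1, const char *text)
{
    bool decl_ok = (la1 == INT_KW || la1 == ID) && is_type(text);
    return decl_ok ? ALT_DECL : ALT_ASSIGN;
}

/* With context: the predicate only gates the ID path it belongs to. */
int choose_with_context(int la1, const char *text)
{
    bool decl_ok = (la1 == INT_KW) || (la1 == ID && is_type(text));
    return decl_ok ? ALT_DECL : ALT_ASSIGN;
}
```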

       -rl n  Limit the maximum number of tree nodes used by grammar analysis
              to n.  Occasionally, antlr is unable to analyze a grammar
              submitted by the user.  This rare situation can only occur when
              the grammar is large and the amount of lookahead is greater than
              one.  A nonlinear analysis algorithm is used by PCCTS to handle
              the general case of LL(k) parsing.  The average complexity of
              analysis, however, is near linear due to some fancy footwork in
              the implementation which reduces the number of calls to the full
              LL(k) algorithm.  If this limit is reached, antlr displays an
              error message indicating the grammar construct it was analyzing
              when it hit a non-linearity.  Use this option if antlr seems to
              go out to lunch and your disk starts thrashing; try n=10000 to
              start.  Once the offending construct has been identified, try to
              remove the ambiguity that antlr was trying to overcome with large
              lookahead analysis.  The introduction of (...)? backtracking
              blocks eliminates some of these problems: antlr does not analyze
              alternatives that begin with (...)? (it simply backtracks, if
              necessary, at run time).
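              A hand-written sketch of the run-time behavior behind a (...)?
              block, assuming the hypothetical rule
              stat : ( ID+ "=" )? ID+ "=" ID ";" | ID+ ";" ; which no fixed k
              can decide, because both alternatives begin with an arbitrarily
              long run of ID tokens.  The parser remembers its position,
              tries the guarded prefix, rewinds unconditionally, and then
              parses the alternative the guess selected.  This is not antlr's
              generated code; the token codes and function names are
              assumptions.

```c
#include <stdbool.h>

enum { END, ID, ASSIGN, SEMI };   /* hypothetical token codes */

static const int *toks;   /* token stream ending in END */
static int pos;           /* current position           */

static bool match(int t)
{
    if (toks[pos] == t) { pos++; return true; }
    return false;
}

/* ID+ */
static bool id_list(void)
{
    if (!match(ID)) return false;
    while (toks[pos] == ID) pos++;
    return true;
}

/* The guess ( ID+ "=" )? : try the prefix, then rewind. */
static bool guess_assignment(void)
{
    int mark = pos;                      /* where the guess started  */
    bool ok = id_list() && match(ASSIGN);
    pos = mark;                          /* always rewind            */
    return ok;
}

/* stat : ( ID+ "=" )? ID+ "=" ID ";" | ID+ ";" ;
   Returns 1 for an assignment, 2 for a plain list, 0 on a syntax error. */
int parse_stat(const int *stream)
{
    toks = stream; pos = 0;
    if (guess_assignment())
        return (id_list() && match(ASSIGN) && match(ID) && match(SEMI)) ? 1 : 0;
    return (id_list() && match(SEMI)) ? 2 : 0;
}
```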

       -w1    Set low warning level.   Do  not  warn  if  semantic  predicates
              and/or   (...)?   blocks   are   assumed   to   cover  ambiguous
              alternatives.

       -w2    Ambiguous parsing decisions  yield  warnings  even  if  semantic
              predicates or (...)? blocks are used.  Warn if predicate context
              is computed and semantic predicates incompletely disambiguate
              alternative productions.

       -      Read  grammar  from  standard  input and generate stdin.c as the
              parser file.

SPECIAL CONSIDERATIONS

       Antlr works...  we think.  There is no implicit guarantee of  anything.
       We reserve no legal rights to the software known as the Purdue Compiler
       Construction Tool Set (PCCTS); PCCTS is in the public domain.  An
       individual  or  company  may  do  whatever  they  wish with source code
       distributed with PCCTS or the code generated by  PCCTS,  including  the
       incorporation  of  PCCTS,  or its output, into commercial software.  We
       encourage users to develop software with PCCTS.   However,  we  do  ask
       that  credit is given to us for developing PCCTS.  By "credit", we mean
       that if you incorporate our source  code  into  one  of  your  programs
       (commercial   product,   research   project,  or  otherwise)  that  you
       acknowledge this fact somewhere in the documentation, research  report,
       etc.  If you like PCCTS and have developed a nice tool with the
       output, please mention that you developed it using PCCTS.  As  long  as
       these  guidelines  are  followed,  we expect to continue enhancing this
       system and expect to make other tools available as they are  completed.

FILES

       *.c    output C parser.

       *.cpp  output C++ parser when C++ mode is used.

       parser.dlg
              output dlg lexical analyzer.

       err.c  token  string array, error sets and error support routines.  Not
              used in C++ mode.

       remap.h
              file that redefines all globally visible  parser  symbols.   The
              use of the #parser directive creates this file.  Not used in C++
              mode.

       stdpccts.h
              list of definitions needed by C files, not generated  by  PCCTS,
              that reference PCCTS objects.  This is not generated by default.
              Not used in C++ mode.

       tokens.h
              output #defines for tokens  used  and  function  prototypes  for
              functions generated for rules.

SEE ALSO