NAME
antlr - ANother Tool for Language Recognition
SYNTAX
antlr [options] grammar_files
DESCRIPTION
Antlr converts an extended form of context-free grammar into a set of C
functions which directly implement an efficient form of deterministic
recursive-descent LL(k) parser. Context-free grammars may be augmented
with predicates to allow semantics to influence parsing; this allows a
form of context-sensitive parsing. Selective backtracking is also
available to handle non-LL(k) and even non-LALR(k) constructs. Antlr
also produces a definition of a lexer which can be automatically
converted into C code for a DFA-based lexer by dlg. Hence, antlr
serves a function much like that of yacc, however, it is notably more
flexible and is more integrated with a lexer generator (antlr directly
generates dlg code, whereas yacc and lex are given independent
descriptions). Unlike yacc which accepts LALR(1) grammars, antlr
accepts LL(k) grammars in an extended BNF notation — which eliminates
the need for precedence rules.
Like yacc grammars, antlr grammars can use automatically-maintained
symbol attribute values referenced as dollar variables. Further,
because antlr generates top-down parsers, arbitrary values may be
inherited from parent rules (passed like function parameters). Antlr
also has a mechanism for creating and manipulating abstract-syntax-
trees.
There are various other niceties in antlr, including the ability to
spread one grammar over multiple files or even multiple grammars in a
single file, the ability to generate a version of the grammar with
actions stripped out (for documentation purposes), and lots more.
OPTIONS
-ck n Use up to n symbols of lookahead when using compressed (linear
approximation) lookahead. This type of lookahead is very cheap
to compute and is attempted before full LL(k) lookahead, which
is of exponential complexity in the worst case. In general, the
compressed lookahead can be much deeper (e.g, -ck 10) than the
full lookahead (which usually must be less than 4).
-CC Generate C++ output from both ANTLR and DLG.
-cr Generate a cross-reference for all rules. For each rule, print
a list of all other rules that reference it.
-e1 Ambiguities/errors shown in low detail (default).
-e2 Ambiguities/errors shown in more detail.
-e3 Ambiguities/errors shown in excruciating detail.
-fe file
Rename err.c to file.
-fh file
Rename stdpccts.h header (turns on -gh) to file.
-fl file
Rename lexical output, parser.dlg, to file.
-fm file
Rename file with lexical mode definitions, mode.h, to file.
-fr file
Rename file which remaps globally visible symbols, remap.h, to
file.
-ft file
Rename tokens.h to file.
-ga Generate ANSI-compatible code (default case). This has not been
rigorously tested to be ANSI XJ11 C compliant, but it is close.
The normal output of antlr is currently compilable under both
K&R, ANSI C, and C++—this option does nothing because antlr
generates a bunch of #ifdef’s to do the right thing depending on
the language.
-gc Indicates that antlr should generate no C code, i.e., only
perform analysis on the grammar.
-gd C code is inserted in each of the antlr generated parsing
functions to provide for user-defined handling of a detailed
parse trace. The inserted code consists of calls to the user-
supplied macros or functions called zzTRACEIN and zzTRACEOUT.
The only argument is a char * pointing to a C-style string which
is the grammar rule recognized by the current parsing function.
If no definition is given for the trace functions, upon rule
entry and exit, a message will be printed indicating that a
particular rule as been entered or exited.
-ge Generate an error class for each non-terminal.
-gh Generate stdpccts.h for non-ANTLR-generated files to include.
This file contains all defines needed to describe the type of
parser generated by antlr (e.g. how much lookahead is used and
whether or not trees are constructed) and contains the header
action specified by the user.
-gk Generate parsers that delay lookahead fetches until needed.
Without this option, antlr generates parsers which always have k
tokens of lookahead available.
-gl Generate line info about grammar actions in C parser of the form
# line "file" which makes error messages from the C/C++ compiler
make more sense as they will point into the grammar file not the
resulting C file. Debugging is easier as well, because you will
step through the grammar not C file.
-gs Do not generate sets for token expression lists; instead
generate a ||-separated sequence of LA(1)==token_number. The
default is to generate sets.
-gt Generate code for Abstract-Syntax Trees.
-gx Do not create the lexical analyzer files (dlg-related). This
option should be given when the user wishes to provide a
customized lexical analyzer. It may also be used in make
scripts to cause only the parser to be rebuilt when a change not
affecting the lexical structure is made to the input grammars.
-k n Set k of LL(k) to n; i.e. set tokens of look-ahead (default==1).
-o dir Directory where output files should go (default="."). This is
very nice for keeping the source directory clear of ANTLR and
DLG spawn.
-p The complete grammar, collected from all input grammar files and
stripped of all comments and embedded actions, is listed to
stdout. This is intended to aid in viewing the entire grammar
as a whole and to eliminate the need to keep actions concisely
stated so that the grammar is easier to read. Hence, it is
preferable to embed even complex actions directly in the
grammar, rather than to call them as subroutines, since the
subroutine call overhead will be saved.
-pa This option is the same as -p except that the output is
annotated with the first sets determined from grammar analysis.
-prc on
Turn on the computation and hoisting of predicate context.
-prc off
Turn off the computation and hoisting of predicate context.
This option makes 1.10 behave like the 1.06 release with option
-pr on. Context computation is off by default.
-rl n Limit the maximum number of tree nodes used by grammar analysis
to n. Occasionally, antlr is unable to analyze a grammar
submitted by the user. This rare situation can only occur when
the grammar is large and the amount of lookahead is greater than
one. A nonlinear analysis algorithm is used by PCCTS to handle
the general case of LL(k) parsing. The average complexity of
analysis, however, is near linear due to some fancy footwork in
the implementation which reduces the number of calls to the full
LL(k) algorithm. An error message will be displayed, if this
limit is reached, which indicates the grammar construct being
analyzed when antlr hit a non-linearity. Use this option if
antlr seems to go out to lunch and your disk start thrashing;
try n=10000 to start. Once the offending construct has been
identified, try to remove the ambiguity that antlr was trying to
overcome with large lookahead analysis. The introduction of
(...)? backtracking blocks eliminates some of these problems —
antlr does not analyze alternatives that begin with (...)? (it
simply backtracks, if necessary, at run time).
-w1 Set low warning level. Do not warn if semantic predicates
and/or (...)? blocks are assumed to cover ambiguous
alternatives.
-w2 Ambiguous parsing decisions yield warnings even if semantic
predicates or (...)? blocks are used. Warn if predicate context
computed and semantic predicates incompletely disambiguate
alternative productions.
- Read grammar from standard input and generate stdin.c as the
parser file.
SPECIAL CONSIDERATIONS
Antlr works... we think. There is no implicit guarantee of anything.
We reserve no legal rights to the software known as the Purdue Compiler
Construction Tool Set (PCCTS) — PCCTS is in the public domain. An
individual or company may do whatever they wish with source code
distributed with PCCTS or the code generated by PCCTS, including the
incorporation of PCCTS, or its output, into commercial software. We
encourage users to develop software with PCCTS. However, we do ask
that credit is given to us for developing PCCTS. By "credit", we mean
that if you incorporate our source code into one of your programs
(commercial product, research project, or otherwise) that you
acknowledge this fact somewhere in the documentation, research report,
etc... If you like PCCTS and have developed a nice tool with the
output, please mention that you developed it using PCCTS. As long as
these guidelines are followed, we expect to continue enhancing this
system and expect to make other tools available as they are completed.
FILES
*.c output C parser.
*.cpp output C++ parser when C++ mode is used.
parser.dlg
output dlg lexical analyzer.
err.c token string array, error sets and error support routines. Not
used in C++ mode.
remap.h
file that redefines all globally visible parser symbols. The
use of the #parser directive creates this file. Not used in C++
mode.
stdpccts.h
list of definitions needed by C files, not generated by PCCTS,
that reference PCCTS objects. This is not generated by default.
Not used in C++ mode.
tokens.h
output #defines for tokens used and function prototypes for
functions generated for rules.
SEE ALSO
dlg(1), pccts(1)