NAME
sid - Syntax Improving Device; parser generator.
SYNTAX
sid [option]... file...
DESCRIPTION
The sid command is used to turn descriptions of a language into a
program for recognising that language. This manual page details the
command line syntax; for more information, consult the sid user
documentation. The number of files specified on the command line
depends on the output language; the description of the --language
option gives the number of files for each language.
SWITCHES
The new version of sid accepts both short form and long form command
line switches.
Short form switches are single characters, and begin with a ’-’ or ’+’
character. They can be concatenated into a single command line word,
e.g.:
-vdl dump-file language-name
which contains three different switches (-v, which takes no arguments;
-d, which takes one argument: dump-file; and -l, which takes one
argument: language-name).
Long form switches are strings, and begin with ’--’ or ’++’. With long
form switches, only the shortest unique prefix need be entered. The
long form of the above example would be:
--version --dump-file dump-file --language language-name
In most cases the arguments to the switch should follow the switch as a
separate word. In the case of short form switches, the arguments to
the short form switches in a single word should follow the word in the
order of the switches (as in the first example). For some options, the
argument may be part of the same word as the switch (such options are
shown without a space between the switch and the argument in the switch
summaries below). In the case of short form switches, such a switch
would terminate any concatenation of switches (either a character
would follow it, which would be treated as its argument, or it would be
the end of the word, and its argument would follow as normal).
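The expansion rules for short form switches can be sketched in C. This is a minimal illustration, not sid's own option parser. The switch table is hypothetical, and the entry that lets -s take its argument in the same word is an assumption made only to show how such a switch ends a concatenation. The names parse_short_word, find_switch and struct parsed are also hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of short switches: does the switch take an argument,
 * and may that argument be attached to the same word? */
struct short_switch {
    char name;
    int takes_arg;
    int attached_ok;
};

static const struct short_switch switches[] = {
    {'v', 0, 0},    /* --version: no argument */
    {'d', 1, 0},    /* --dump-file FILE */
    {'l', 1, 0},    /* --language LANGUAGE */
    {'s', 1, 1},    /* assumed, for illustration only, to allow -sOPTION */
};

/* One parsed switch; ARG is NULL when the switch takes no argument. */
struct parsed {
    char name;
    const char *arg;
};

static const struct short_switch *find_switch(char c)
{
    for (size_t i = 0; i < sizeof switches / sizeof switches[0]; i++)
        if (switches[i].name == c)
            return &switches[i];
    return NULL;
}

/* Expands one short form word such as "-vdl" (or "+vdl").  Arguments that
 * live in separate words are taken from argv[*next], argv[*next + 1], ...
 * in the order of the switches.  Returns the number of switches parsed, or
 * -1 for an unknown switch, a missing argument or a full OUT array. */
static int parse_short_word(const char *word, char **argv, int argc,
                            int *next, struct parsed *out, int max)
{
    int n = 0;

    for (const char *p = word + 1; *p != '\0'; p++) {
        const struct short_switch *sw = find_switch(*p);

        if (sw == NULL || n == max)
            return -1;
        out[n].name = *p;
        out[n].arg = NULL;
        if (sw->takes_arg) {
            if (sw->attached_ok && p[1] != '\0') {
                out[n++].arg = p + 1;   /* rest of the word is the argument */
                break;                  /* and it ends the concatenation */
            }
            if (*next >= argc)
                return -1;              /* argument word is missing */
            out[n].arg = argv[(*next)++];
        }
        n++;
    }
    return n;
}
```

Called on the words "-vdl", "dump-file", "language-name" with *next set to 1, the sketch yields -v with no argument, -d with "dump-file" and -l with "language-name", matching the first example above.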
For binary switches, the ’-’ or ’--’ switch prefixes set (enable) the
switch, and the ’+’ or ’++’ switch prefixes reset (disable) the switch.
This is probably back to front, but is in keeping with other programs.
The switches ’--’ or ’++’ by themselves terminate option parsing.
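The prefix rules in this section (unique abbreviation of long names, '-' to set and '+' to reset, a bare '--' or '++' to stop) can be sketched as two small C helpers. This is an illustrative sketch, not sid's implementation; match_long and switch_prefix are hypothetical names, and only the option names are taken from the OPTIONS section below.

```c
#include <stddef.h>
#include <string.h>

/* Long option names, as listed in the OPTIONS section of this page. */
static const char *const long_names[] = {
    "dump-file", "factor-limit", "help", "inline", "language",
    "show-errors", "switch", "tab-width", "version"
};
#define N_LONG (sizeof long_names / sizeof long_names[0])

/* Returns the index of the option that NAME names or uniquely abbreviates,
 * -1 if nothing matches, or -2 if the abbreviation is ambiguous. */
static int match_long(const char *name)
{
    size_t len = strlen(name);
    int found = -1;

    for (size_t i = 0; i < N_LONG; i++)
        if (strcmp(long_names[i], name) == 0)
            return (int) i;             /* an exact name always wins */
    for (size_t i = 0; i < N_LONG; i++) {
        if (strncmp(long_names[i], name, len) == 0) {
            if (found != -1)
                return -2;              /* more than one candidate */
            found = (int) i;
        }
    }
    return found;
}

/* Interprets the prefix of WORD.  Returns 0 for a bare "--" or "++" (end of
 * options), -1 if WORD is not a switch, and otherwise 1, setting *ENABLE
 * ('-' sets, '+' resets) and *IS_LONG ("--" or "++" prefix). */
static int switch_prefix(const char *word, int *enable, int *is_long)
{
    char c = word[0];

    if (c != '-' && c != '+')
        return -1;
    *enable = (c == '-');
    *is_long = (word[1] == c);
    if (*is_long && word[2] == '\0')
        return 0;
    return 1;
}
```

With these names, "--lang" is enough to select --language, while "--s" is rejected because it could mean either --show-errors or --switch.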
ERROR FILE SYNTAX
It is possible to change the error messages that sid uses. In order to
do this, set the environment variable SID_ERROR_FILE to the name of a
file containing the new error messages.
The error file consists of zero or more sections. Each section begins
with a section marker (one of %prefix%, %errors% or %strings%). The
prefix section takes a single string (this is to be the prefix for all
error messages). The other sections take zero or more pairs of names
and strings. A name is a sequence of characters surrounded by single
quotes. A string is a sequence of characters surrounded by double
quotes. In the case of the prefix and error sections, the strings may
contain variables of the form ${variable name}. These variables will
be replaced by suitable information when the error occurs. The
backslash character can be used to escape characters. The following C
style escape sequences are recognised: ’\n’, ’\r’, ’\t’, ’\0’. Also,
the sequence ’\xNN’ represents the character with code NN in hex. The
hash character (’#’) starts a comment that runs to the end of the line.
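The escape rules above can be illustrated with a small decoder for the text between the quotes of an error file string. It is a sketch under stated assumptions, not sid's reader: decode_escapes and hexval are hypothetical, and the treatment of any other escaped character (it stands for itself, so \" gives a double quote) and of a malformed \x is assumed rather than taken from this page.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a single hexadecimal digit (caller has checked isxdigit). */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char) c) - 'a' + 10;
}

/* Decodes backslash escapes in IN into OUT, which must be at least as large
 * as IN.  Returns the number of bytes written; OUT is not NUL-terminated
 * because "\0" may legitimately appear in the decoded text. */
static size_t decode_escapes(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (*in != '\\' || in[1] == '\0') {
            out[n++] = *in++;           /* ordinary (or trailing '\') char */
            continue;
        }
        in++;                           /* skip the backslash */
        switch (*in) {
        case 'n': out[n++] = '\n'; in++; break;
        case 'r': out[n++] = '\r'; in++; break;
        case 't': out[n++] = '\t'; in++; break;
        case '0': out[n++] = '\0'; in++; break;
        case 'x':
            if (isxdigit((unsigned char) in[1])
                && isxdigit((unsigned char) in[2])) {
                out[n++] = (char) (hexval(in[1]) * 16 + hexval(in[2]));
                in += 3;                /* 'x' plus two hex digits */
            } else {
                out[n++] = *in++;       /* assumption: keep a lone 'x' */
            }
            break;
        default:
            out[n++] = *in++;           /* assumption: \c means c */
            break;
        }
    }
    return n;
}
```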
The --show-errors option may be used to get a copy of the current error
messages.
OPTIONS
sid accepts the following command line options:
--dump-file FILE
-d FILE
This option causes intermediate dumps of the grammar to be
written to the file FILE.
--factor-limit LIMIT
-f LIMIT
This option limits the number of rules that can be created
during the factorisation process. It is probably best not to
change this.
--help
-?
Write an option summary to the standard error.
--inline INLINES
-i INLINES
This option controls what inlining will be done in the output
parser. The INLINES argument should be a comma separated list
of the following words:
SINGLES
This causes single alternative rules to be inlined.
This inlining is no longer performed as a modification
to the grammar (it was in version 1.0).
BASICS This causes rules that contain only basics (and no
exception handlers or empty alternatives) to be
inlined. The restriction on exception handlers and
empty alternatives is rather arbitrary, and may be
changed later.
TAIL This causes tail recursive calls to be inlined.
Without this, tail recursion elimination will not be
performed.
OTHER This causes other calls to be inlined wherever
possible. Unless the "MULTI" inlining is also
specified, this will be done only for productions that
are called once.
MULTI This causes calls to be inlined, even if the rule
being called is called more than once. Turning this
inlining on implies "OTHER". Similarly turning off
"OTHER" inlining will turn off "MULTI" inlining. For
grammars of any size, this is probably best avoided;
if used the generated parser may be huge (e.g. a C
grammar has produced a file that was several hundred
MB in size).
ALL
This turns on all inlining.
In addition, prefixing a word with "NO" turns off that inlining
phase. The words may be given in any case. They are evaluated
in the order given, so:
--inline noall,singles
would turn on single alternative rule inlining only, whilst:
--inline singles,noall
would turn off all inlining. The default is as if sid were
invoked with the option:
--inline noall,basics,tail
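The evaluation rules for INLINES (left to right, case insensitive, "NO" to turn a phase off, MULTI implying OTHER and NOOTHER also clearing MULTI) can be modelled with a bit set. This is a sketch of the documented behaviour, not sid's code; the INLINE_* bits and apply_inlines are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit for each inlining phase. */
enum {
    INLINE_SINGLES = 1 << 0,
    INLINE_BASICS  = 1 << 1,
    INLINE_TAIL    = 1 << 2,
    INLINE_OTHER   = 1 << 3,
    INLINE_MULTI   = 1 << 4,
    INLINE_ALL     = (1 << 5) - 1
};

/* Case-insensitive test: is the LEN-byte word W equal to upper-case NAME? */
static int word_is(const char *w, size_t len, const char *name)
{
    if (strlen(name) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (toupper((unsigned char) w[i]) != name[i])
            return 0;
    return 1;
}

/* Applies the comma separated LIST to FLAGS strictly left to right and
 * returns the result, or -1 if a word is not recognised. */
static int apply_inlines(int flags, const char *list)
{
    static const struct { const char *name; int bits; } words[] = {
        {"SINGLES", INLINE_SINGLES}, {"BASICS", INLINE_BASICS},
        {"TAIL", INLINE_TAIL},       {"OTHER", INLINE_OTHER},
        {"MULTI", INLINE_MULTI},     {"ALL", INLINE_ALL},
    };

    while (*list != '\0') {
        size_t len = strcspn(list, ",");
        const char *next = list + len;
        const char *w = list;
        int on = 1, bits = 0;

        if (len > 2 && toupper((unsigned char) w[0]) == 'N'
                    && toupper((unsigned char) w[1]) == 'O') {
            on = 0;                     /* "NO" prefix turns the phase off */
            w += 2;
            len -= 2;
        }
        for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
            if (word_is(w, len, words[i].name))
                bits = words[i].bits;
        if (bits == 0)
            return -1;
        if (on) {
            flags |= bits;
            if (bits & INLINE_MULTI)
                flags |= INLINE_OTHER;  /* MULTI implies OTHER */
        } else {
            flags &= ~bits;
            if (bits & INLINE_OTHER)
                flags &= ~INLINE_MULTI; /* no OTHER means no MULTI */
        }
        list = (*next == ',') ? next + 1 : next;
    }
    return flags;
}
```

Because evaluation is strictly left to right, the starting value of FLAGS does not matter for lists that begin with "noall", which is why the default above can be written that way.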
--language LANGUAGE
-l LANGUAGE
This option specifies the output language. Currently this
should be one of "ansi-c", "pre-ansi-c", "ossg-c", or "test".
The default is "ansi-c".
The "ansi-c" and "pre-ansi-c" languages are basically the same.
The only difference is that "ansi-c" initially uses function
prototypes, and "pre-ansi-c" doesn’t. The "ossg-c" language
uses macros to declare and define functions which may be defined
to give either prototypes or non-prototypes. Each language
takes two input files, a grammar file and an actions file, and
produces two output files, a C source file containing the
generated parser and a C header file containing the external
declarations for the parser. The C language specific options
are:
prototypes proto ossg-prototypes ossg-proto no-prototypes no-proto
These enable or disable the use of function prototypes or
the OSSG prototype macros.
split split=NUMBER no-split
These enable or disable the output file split option.
The generated files can be very large even without
inlining. This option splits the main output file into a
number of components containing about NUMBER lines each
(the default being 50000). These components are
distinguished by successively substituting 1, 2, 3, ...
for the character ’@’ in the output file name.
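The naming of split components can be sketched as a string substitution. The sketch assumes that only the first '@' in the output file name is replaced and treats a name without '@' as an error; this page does not say what sid does in either case. split_name and the file name parser@.c are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Builds the name of split component INDEX (1, 2, 3, ...) by replacing the
 * first '@' in PATTERN with the decimal index.  Returns 0 on success, or -1
 * if PATTERN has no '@' or the result does not fit in OUT (SIZE bytes). */
static int split_name(const char *pattern, unsigned index,
                      char *out, size_t size)
{
    const char *at = strchr(pattern, '@');
    int len;

    if (at == NULL)
        return -1;
    /* text before '@', the index, then everything after '@' */
    len = snprintf(out, size, "%.*s%u%s",
                   (int) (at - pattern), pattern, index, at + 1);
    return (len < 0 || (size_t) len >= size) ? -1 : 0;
}
```

Under these assumptions, an output name of parser@.c would give parser1.c, parser2.c, and so on.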
numeric-ids numeric no-numeric-ids no-numeric
These enable or disable the use of numeric identifiers.
Numeric identifiers replace the identifier name with a
number, which is mainly of use in stopping identifier
names getting too long. The disadvantage is that the
code becomes less readable, and more difficult to debug.
Numeric identifiers are not used by default and are never
used for terminal numbers.
casts cast no-casts no-cast
These enable or disable casting of the immutable parameters
of actions and assignment operators. If enabled, a parameter
is cast to its own type when it is substituted into the
action. This will cause some compilers to complain about
attempts to modify the parameter (which can help pick out
attempts at mutating parameters that should not be
mutated). The disadvantage is that not all compilers
will reject attempts at mutation, and that ANSI doesn’t
allow casting to structure and union types, which means
that some code may be illegal. Parameter casting is
disabled by default.
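The effect of parameter casting can be seen in plain C. The exact text sid substitutes is not given on this page; the sketch assumes an int parameter named count appears as ((int) (count)) in the action body, and double_count is a hypothetical stand-in for an action. A cast yields a value rather than an lvalue, so assigning through it is a constraint violation that a conforming compiler must diagnose; some older compilers accepted it as an extension, which is why not all compilers will reject it.

```c
/* A hypothetical action body after substitution of the immutable
 * parameter COUNT with casting enabled. */
static int double_count(int count)
{
    int result = ((int) (count)) * 2;   /* reading through the cast is fine */

    /* ((int) (count)) = 0;
     *     would be rejected: a cast expression is not an lvalue. */

    /* struct point { int x, y; } p; ((struct point) (p));
     *     would also be rejected: ISO C only allows casts to scalar types
     *     (or void), so casting a structure parameter is illegal. */
    return result;
}
```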
unreachable-macros unreachable-macro unreachable-comments unreachable-comment
These choose whether unreachable code is marked by a
macro or a comment. The default is to mark unreachable
code with a comment "/*UNREACHED*/"; however, a macro
"UNREACHED;" may be used instead, if desired.
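If the macro form is chosen, the code that compiles the generated parser has to define UNREACHED itself. The assert-based definition below is only one possibility and an assumption, not something sid provides; sign_class is a hypothetical stand-in for generated code ending in "UNREACHED;".

```c
#include <assert.h>

/* One possible definition: trap in debug builds if control ever reaches a
 * point marked as unreachable.  With NDEBUG defined it expands to nothing
 * useful, leaving just an empty statement. */
#define UNREACHED assert(!"unreachable code reached")

/* Stand-in for generated code: every case returns, so the end of the
 * switch is unreachable and is marked with the macro. */
static int sign_class(int value)
{
    switch ((value > 0) - (value < 0)) {
    case -1: return 0;
    case 0:  return 1;
    case 1:  return 2;
    }
    UNREACHED;
    return -1;
}
```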
lines line no-lines no-line
These determine whether "#line" directives should be
output to relate the output file to the actions file.
These are generated by default.
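A "#line" directive makes the compiler attribute the following lines to another file and line number, so that diagnostics in copied action code point back into the actions file. The function and the file name example.act below are hypothetical, and the layout is not what sid actually emits; the sketch only shows what the directive does.

```c
#include <string.h>

/* A hypothetical generated function wrapping a copied action.  The line
 * after the directive is treated as line 42 of "example.act", and that
 * attribution continues until the next #line directive. */
static int add_action(int a, int b, int *line)
{
    int r;
#line 42 "example.act"
    r = a + b; *line = __LINE__;
    return r;
}

/* __FILE__ now also names the actions file rather than the C file. */
static int attributed_to_actions_file(void)
{
    return strcmp(__FILE__, "example.act") == 0;
}
```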
The "test" language only takes one input file, and produces no
output file. It may be used to check that a grammar is valid.
In conjunction with the dump file, it may be used to check the
transformations that would be applied to the grammar. There are
no language specific options for the "test" language.
--show-errors
-e
Write the current error message list to the standard output.
--switch OPTION
-s OPTION
Pass through OPTION as a language specific option.
--tab-width NUMBER
-t NUMBER
This option specifies the number of spaces that a tab occupies.
It defaults to 8. It is only used when indenting output.
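One way a tab width can be used when indenting output is to reach a given column with as many tabs as fit and then pad with spaces. Whether sid mixes tabs and spaces in exactly this way is an assumption; make_indent is a hypothetical helper showing the arithmetic.

```c
#include <stddef.h>

/* Writes indentation reaching column COLUMN (0-based) into OUT, assuming
 * each tab advances TAB_WIDTH columns.  OUT must hold COLUMN + 1 bytes.
 * Returns the length of the resulting string. */
static size_t make_indent(char *out, unsigned column, unsigned tab_width)
{
    unsigned tabs = tab_width ? column / tab_width : 0;
    unsigned spaces = tab_width ? column % tab_width : column;
    size_t n = 0;

    while (tabs-- > 0)
        out[n++] = '\t';
    while (spaces-- > 0)
        out[n++] = ' ';
    out[n] = '\0';
    return n;
}
```

With the default width of 8, column 12 becomes one tab and four spaces; with a width of 4 it becomes three tabs.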
--version
-v
This option causes the version number and supported languages to
be written to the standard error stream.
SEE ALSO
SID users’ guide.
sid(1)