pretzel - the universal prettyprinter generator

NAME

       pretzel - the universal prettyprinter generator

SYNOPSIS

       pretzel [-qtgdh] [-o outfile] fileprefix

       pretzel [-qtgdh] [-o outfile] file1 file2

DESCRIPTION

       Pretzel  is  a  program  that  generates  a prettyprinter module from a
       formal  description  of  the  way  a   certain   language   should   be
       prettyprinted.    A   prettyprinter  is  a  function  or  program  that
       rearranges source code  to  enhance  its  readability.   Prettyprinters
       generated  by  pretzel output LaTeX source code that can be used within
       your own documents.  Note that pretzel produces modules, not programs.

       You have to provide two input files to pretzel  that  specify  the  way
       given  source  code should be prettyprinted. These two files are called
       the formatted token file (suffix .ft) and the  formatted  grammar  file
       (suffix .fg).

       From  this  input,  pretzel  generates two things: a valid flex(1) file
       that forms the prettyprinting scanner and a valid bison(1)  input  file
       that  can  be  used  to  build  the prettyprinting parser (which is the
       actual  prettyprinter).   There  is  a  shell  script  pretzel-it  that
       facilitates using pretzel (see pretzel-it(1)).  This man page is only
       meant as a quick reference  to  pretzel  usage.   Look  into  the  main
       documentation of pretzel if you are new to all this.

   Invoking pretzel
       Pretzel can be invoked in two ways: either specify only the common
       prefix of the two input files, or specify both files separately on
       the command line. If you specify both files, the formatted token
       file comes first.

   Examples
       Say your input files are called foo.ft and foo.fg.  Then you can say

              pretzel foo

       to invoke pretzel properly. If your files are called foo.ft and  bar.fg
       then you would have to say

              pretzel foo.ft bar.fg

       to do the job.

OPTIONS

       Pretzel recognizes the following options:

              -q     Run quietly.

              -t     Process formatted token file only.

              -g     Process  formatted  grammar  file only (options -t and -g
                     are mutually exclusive).

              -d     Print debug information to the screen.

              -h     Print full usage message.

              -o name
                     Use name as prefix of the generated output files.

THE INPUT FILES

       This section summarizes the format of the input files  and  the  format
       command primitives that pretzel supports.

   The formatted token file
       The  formatted  token  file  contains  a list of token definitions with
       their corresponding "prettyprinted" form. The prettyprinted form  of  a
       token will be called an attribute or a translation.

       The general outline of the formatted token file is

              declarations

              %%

              token definitions

       Normally, the declarations part is empty. You can put a general
       description of the file here (as a C comment), as well as
       redefinitions of the default interface.

       The  token  definitions  section of the formatted token file contains a
       series of token definitions of the form:

              pattern token attribute

       The pattern must be a valid regular expression (in  terms  of  flex(1))
       and  must  be  unindented. The token specifies the symbolic name of the
       token for the pattern and begins at the first non-whitespace  character
       after the pattern. The token name must be a legal identifier in Pascal
       notation and must be written entirely in upper case.  (Underscores
       are allowed, but not at the beginning of a word.)

       The attribute for this token, that is, its prettyprinted form, consists
       of all text between the two curly braces { and }.  Attributes can be
       simple strings (surrounded by double quotes), format commands (see
       below), your own C++ code (enclosed in square brackets [ and ], see
       below), or a combination of these joined together by an optional +
       sign. Attribute definitions can span several lines, and the opening {
       need not be on the same line as the token definition; however,
       subsequent lines must be indented by at least one blank or one tab.

       Strings in an attribute definition are written in C style, i.e. you
       can insert newlines and tabs with \n and \t.  To insert a backslash
       into a string, you must write two backslashes \\ in the input file.
       This matters especially if you are using TeX as the typesetter, since
       every TeX command begins with a backslash.

       If the definition of the attribute is omitted, pretzel creates an
       attribute for this pattern by default. The default attribute consists
       of the string containing the text matched by the corresponding pattern.

       You may also refer to the matched text explicitly by using the
       sequence **.  Thus

              "foo"       BAR

              "foo"       BAR     { ** }

              "foo"       BAR     { "foo" }

       all have the same meaning.

       You  can  use  a  | sign as a token name; this signals that the current
       regular  expression  has  the  same  token  name  (and  also  the  same
       attribute)  as  the  token specified in the following line (empty lines
       are ignored). An attribute definition after a | is illegal.  However,
       you  may  specify  regular expressions with neither a token name nor an
       attribute to give a default rule or to eat up whitespace.

       The declarations and the token definitions must be separated by a  line
       containing only the two characters %%.

   Examples
       The following examples are all legal token definitions:

              [0-9]           DIGIT

              "{"             OPEN           { "\\{" indent force }

              [a-z][a-z0-9]*  ID             { "{\\it " ** "}" }

              "function"      |

              "procedure"     PROC_INTRO     { big_force + ** }

              [\t\ \n]        |

              .

   The formatted grammar file
       In   the   formatted   grammar   file  the  user  encodes  the  general
       prettyprinting grammar for the programming language. This  is  done  by
       specifying  a  context  free  grammar  of  the  language  and by adding
       information about the creation of new attributes in  every  rule.   Its
       general outline looks like this:

              token declarations

              %%

              grammar rules

       The  token  declarations section may be empty and the separator between
       the two parts of the file %% must appear unindented on a single line by
       itself.

       The  grammar  rules  section  contains  the  collection of rules of the
       context  free  grammar  that  can  be  accompanied  by   an   attribute
       definition.   A  rule  is  specified  by stating the resulting token, a
       colon and then the series of tokens which will be reduced by this rule.
       The  rule  is  ended  by  a semicolon. A block definition in Pascal for
       example might look like this:

                block : BEGIN stmt_list END ;

       The token list on the right side of the colon can be followed by an
       attribute definition; this definition states how the translation of
       the produced symbol is obtained from the tokens on the right side of
       the rule.

       An attribute definition is enclosed in curly braces { and } and can
       again consist of strings (in double quotes), format commands or C++
       code (enclosed in square brackets [ and ], see below) joined together
       by an optional +.  But here you can also refer to the attributes of the
       tokens on the right side of the rule. This is done with a number
       preceded by a dollar sign $.  The numbers refer to the order of
       appearance of the symbols on the right side of the rule. So $1 refers
       to the first token of the rule, $2 to the second, and so on.

       Again  attribute  definitions  are  allowed  to  span several lines and
       strings must be specified in C manner.

       The attribute definition may be omitted. If this is so, pretzel will by
       default  form  the  attribute  of  the  produced symbol from the simple
       concatenation of the attributes on the right  side  of  the  rule.   Of
       course you may also have empty right sides of a rule (to produce things
       out of nothing) or simply concatenate two or more  rules  resulting  in
       the same symbol with a |.

       For  every  terminal  token that appears in the grammar rules a special
       line has to be written into the declarations section of the file. These
       definitions are of the form

              %token tokenname

       It is very important not to forget this.

   Examples
       For  example,  here  again  is  the  possible  definition of a block in
       Pascal, now with an example attribute definition:

                block : BEGIN stmt_list END   { $1 $2 force $3 } ;

       The attribute of a block will therefore consist of  the  attributes  of
       the  BEGIN  and  stmt_list tokens, joined together with a force command
       and the translation of the END token.

       These two lines mean the same:

              stmt : block SEMI ;

              stmt : block SEMI       { $1 $2 } ;

       These are legal rules too:

              stmt_list   :                      { force }
                          | stmt_list stmt SEMI  { $1 $2 $3 force };

   Comments and Code
       There is a very simple way of putting comments into the formatted token
       and  formatted  grammar  files. This is done in a C++ kind of manner by
       preceding the comment with a double slash //.  All  characters  between
       this sign and the end of the line are ignored by pretzel.

       In both files you can put additional C/C++ code before and after the
       definitions/grammar sections.  If you want to insert code at the end of
       your file, you have to put a second %% on a line by itself and put the
       code after it. C/C++ code before the definitions/rules section has to
       be enclosed in a %{, %} pair. Inserting extra code is useful if you
       want to access it from within the attribute definitions.

   Code within attribute definitions
       From version 2.0 onwards, pretzel allows you to insert C++ code into
       attribute definitions. This is how pretzel expects you to write code
       inside your pretzel input files:

       Code fragments are enclosed in square brackets [ and ]. Any square
       brackets that appear within the code itself must be escaped with a
       backslash. There can be blocks of code before and after the attribute
       definition, which are called starting code and ending code.  At most
       one starting code block and one ending code block are allowed.  Both
       are optional, but if you specify either of them, you need an attribute
       definition.  Starting code is executed before the attribute of the new
       token is built; ending code is executed after building the attribute
       and before returning to the calling function (in the scanner).

       Code parts within attribute definitions must return a pointer to an
       Attribute class object (see file attr/attr.nw in the pretzel
       distribution for details).  Within the formatted token file, the
       matched text is visible to you in the form of a char* yytext variable.
       The symbolic names of the tokens are available under the same names
       that pretzel gives them.  Starting code, code within attribute
       definitions, and ending code are all optional, but at any place where
       they are allowed, only one bracketed code fragment may be placed.
       Here’s an example from the formatted grammar file:

              id : ID  { [lookup($1) ? create("{\\bf ") :
                            create("{\\it ")] $1 "}" };

       This example shows how to format an identifier depending on whether  it
       is  in  a  lookup  table  or not. Identifiers could be installed in the
       table for example like this:

              typedef : TYPEDEF_LIKE INT_LIKE ID
                         [ install($3); ]
                         { $1 $2 "{\\bf " $3 "}" };

       More examples can be found  in  the  Pretzelbook.  Common  routines  to
       escape  identifiers,  to  build and manage lookup tables, to convert to
       and from Attribute* or to output debug information can be found in  the
       files  belonging  to the C prettyprinter in the directory languages/cee
       of the pretzel distribution.
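
       The lookup-table idea behind the two examples above can be sketched
       in plain C++.  This is only an illustration, not the code shipped in
       languages/cee: the names install and lookup are borrowed from the
       examples, but their string-based signatures and the helper typeset
       are assumptions made here so that the sketch compiles without the
       Attribute class.

```cpp
#include <cassert>
#include <set>
#include <string>

namespace sketch {

// Hypothetical table of identifiers that were introduced by a typedef.
inline std::set<std::string>& table() {
    static std::set<std::string> names;
    return names;
}

// Remember an identifier (assumed signature; the real helper is likely
// to take an Attribute* instead of a string).
inline void install(const std::string& id) { table().insert(id); }

// Report whether an identifier was installed earlier.
inline bool lookup(const std::string& id) { return table().count(id) != 0; }

// Mirror the "id : ID" rule: bold for known type names, italic otherwise.
// Note the doubled backslashes needed to get a single one into the output.
inline std::string typeset(const std::string& id) {
    assert(!id.empty());
    return (lookup(id) ? "{\\bf " : "{\\it ") + id + "}";
}

}  // namespace sketch
```

       With this sketch, an identifier is set in italics until it has been
       installed and in bold afterwards, which is the behaviour the two
       grammar rules above aim for.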

   The set of format commands
       Here’s a list of the format commands supported  by  pretzel  and  their
       meaning:
       null   empty command.
       indent indents the next line a little more.
       outdent
              takes back the last indentation (de-indent).
       force  forces a line break.
       break_space
              denotes a possible space for a line break.
       opt1...opt9
              denotes an optional line break with the continuation line
              indented a little with respect to the normal starting position.
       backup denotes a small backspace.
       big_force
              forces a line break and inserts a little extra space.
       no_indent
              causes the current line to be output flushleft.
       cancel obliterates any break_space, opt, force or big_force command
              that immediately precedes or follows it and also cancels any
              backup command that follows it.

       For a complete reference on how to write pretzel input, look into the
       Pretzelbook which is included in the pretzel distribution.

   Format command preprocessing
       The format commands are preprocessed according  to  the  following  two
       rules:

       1. A sequence of consecutive break_space, force, and/or big_force
          commands is replaced by a single command (the maximum of the
          given ones).

       2. The cancel command cancels any break_space, opt, force or big_force
          command that immediately precedes or follows it and also cancels
          any backup command that follows it.
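
       The two rules can be modelled with a short C++ sketch.  It is not
       taken from the pretzel sources and rests on several assumptions: the
       Cmd enumeration is a hypothetical stand-in for the format commands,
       the ranking break_space < force < big_force is assumed for "the
       maximum", the rules are applied once each in the order listed, and
       the cancel command itself is dropped from the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for format commands. The first three enumerators
// are ordered so that "maximum" means BreakSpace < Force < BigForce.
enum class Cmd { BreakSpace, Force, BigForce, Opt, Backup, Cancel, Text };

static bool is_break(Cmd c) {
    return c == Cmd::BreakSpace || c == Cmd::Force || c == Cmd::BigForce;
}

// Commands that an adjacent cancel removes on either side (rule 2).
static bool cancellable(Cmd c) { return is_break(c) || c == Cmd::Opt; }

std::vector<Cmd> preprocess(const std::vector<Cmd>& in) {
    // Rule 1: collapse each run of break commands into its maximum.
    std::vector<Cmd> merged;
    for (Cmd c : in) {
        if (is_break(c) && !merged.empty() && is_break(merged.back()))
            merged.back() = std::max(merged.back(), c);
        else
            merged.push_back(c);
    }

    // Rule 2: a cancel removes its cancellable neighbours and a backup
    // that directly follows it.
    std::vector<bool> drop(merged.size(), false);
    for (std::size_t i = 0; i < merged.size(); ++i) {
        if (merged[i] != Cmd::Cancel) continue;
        drop[i] = true;  // assumption: the cancel itself leaves no trace
        if (i > 0 && cancellable(merged[i - 1])) drop[i - 1] = true;
        if (i + 1 < merged.size() &&
            (cancellable(merged[i + 1]) || merged[i + 1] == Cmd::Backup))
            drop[i + 1] = true;
    }

    std::vector<Cmd> out;
    for (std::size_t i = 0; i < merged.size(); ++i)
        if (!drop[i]) out.push_back(merged[i]);
    assert(out.size() <= in.size());  // preprocessing never adds commands
    return out;
}
```

       Under these assumptions the sequence break_space big_force force
       collapses to a single big_force, and a force that directly precedes
       a cancel disappears together with the cancel.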

THE OUTPUT FILES

       If pretzel runs without error, you will obtain the definition of a C++
       prettyprinter class in the form of two files. The first file is a
       valid bison(1) file from which the actual prettyprinting parser class
       can be obtained. The second file (generated from the formatted token
       file) can be processed with the flex(1) scanner generator to form the
       prettyprinting scanner class used by the parser.

   The bison file
       The  generated bison file contains the definitions for a prettyprinting
       parser class that is a subclass of the following  abstract  base  class
       (contained in the file Pparse.h within the pretzel include directory):

              #include<iostream>

              #include"attr.h"

              #include"output.h"

              class Pparse {

              public:
                     Pparse() {};

                     ~Pparse() {};

                     virtual int prettyprint(istream*, ostream*) = 0;

                     virtual int prettyprint(istream*, Output*) = 0;
              };

       The  prettyprinter  generated  by  pretzel  will  be  a subclass of the
       following form:

              #include "Pparse.h" // include abstract base class

              class PPARSE_NAME : public Pparse {

              public:
                     PPARSE_NAME(); ~PPARSE_NAME();

                     int prettyprint(istream*, ostream*);

                     int prettyprint(istream*, Output*);

                     void debug_on(); void debug_off();
              };

       The name of the class may be changed  by  redefining  the  preprocessor
       macro  PPARSE_NAME  within  the  formatted  grammar  file.  The  actual
       prettyprinting function is prettyprint that reads text  from  an  input
       stream (i.e. a C++ istream object) and outputs the results to an output
       stream  (i.e.  a  C++  ostream  object,  see  ios(3C++)).   The  second
       overloaded  version of prettyprint takes an Output object (see the file
       output/output.nw and the Pretzelbook in the  pretzel  distribution  for
       details)  and  uses  this  to  output the prettyprinted code. The debug
       functions can be used to turn debugging output to cerr on and off.
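
       Client code only needs the interface of the abstract base class.  The
       following sketch shows the calling pattern with stand-ins: the Pparse
       class below is a reduced copy of the one in Pparse.h (only the
       ostream overload), and EchoParser is a hypothetical substitute for a
       generated PPARSE_NAME class that merely copies its input.  The
       function name format_source and the meaning of the return value are
       assumptions as well.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Reduced stand-in for the abstract base class declared in Pparse.h.
class Pparse {
public:
    virtual ~Pparse() {}
    virtual int prettyprint(std::istream*, std::ostream*) = 0;
};

// Hypothetical replacement for a generated parser class: it copies the
// input line by line instead of producing LaTeX.
class EchoParser : public Pparse {
public:
    int prettyprint(std::istream* in, std::ostream* out) override {
        assert(in != nullptr && out != nullptr);
        std::string line;
        while (std::getline(*in, line)) *out << line << '\n';
        return 0;  // assumption: 0 signals success, as with bison's yyparse()
    }
};

// Run any prettyprinter over a string and collect what it writes.
std::string format_source(Pparse& printer, const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    printer.prettyprint(&in, &out);
    return out.str();
}
```

       A real program would link against the generated parser and scanner
       and pass, for example, &std::cin and &std::cout instead of the
       string streams used here.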

   The flex file
       The  prettyprinting  parser  class  relies  on   the   service   of   a
       prettyprinting  scanner  that  can be produced using the second pretzel
       file. It contains a complete definition of a scanner subclass of  this
       abstract   base   class  (see  file  Pscan.h  in  the  pretzel  include
       directory):

              #include<iostream>

              #include"attr.h"

              class Pscan {

              public:
                     Pscan(istream*) {}; ~Pscan() {};

                     virtual int scan(Attribute**) = 0;
              };

       The scanner must be initialized with a C++ istream pointer  from  which
       it  takes  its  input.  A  call  to the actual scan function returns an
       integer (the token code of the token just scanned or 0 on  end-of-file)
       plus a call by reference attribute containing the contents of the token
       (see file attr/attr.nw from the pretzel distribution).
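
       The calling convention described above (token code as return value,
       0 on end-of-file, attribute passed back through a pointer) can be
       sketched as follows.  Attribute and Pscan below are simplified
       stand-ins for the classes in the pretzel distribution, WordScanner
       and collect_tokens are hypothetical, and the ownership rule (the
       caller deletes the attribute) is an assumption of this sketch.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for pretzel's Attribute class (see attr/attr.nw).
struct Attribute { std::string text; };

// Stand-in for the abstract base class declared in Pscan.h.
class Pscan {
public:
    explicit Pscan(std::istream*) {}
    virtual ~Pscan() {}
    virtual int scan(Attribute**) = 0;
};

// Hypothetical scanner: each whitespace-separated word is one token.
class WordScanner : public Pscan {
public:
    enum { WORD = 258 };  // assumption: an arbitrary nonzero token code
    explicit WordScanner(std::istream* in) : Pscan(in), in_(in) {}
    int scan(Attribute** attr) override {
        std::string word;
        if (!(*in_ >> word)) return 0;  // 0 signals end-of-file
        *attr = new Attribute{word};    // handed to the caller in this sketch
        return WORD;
    }
private:
    std::istream* in_;
};

// Pull tokens until the scanner reports end-of-file.
std::vector<std::string> collect_tokens(Pscan& scanner) {
    std::vector<std::string> texts;
    Attribute* attr = nullptr;
    while (scanner.scan(&attr) != 0) {
        assert(attr != nullptr);
        texts.push_back(attr->text);
        delete attr;
        attr = nullptr;
    }
    return texts;
}
```

       In a generated prettyprinter this loop is not written by hand; the
       parser calls scan itself each time it needs the next token.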

       The produced prettyprinting scanner class is a subclass and looks  like
       this:

              #include "Pscan.h" // include abstract base class

              class PSCAN_NAME : public Pscan {

              public:
                     PSCAN_NAME(istream*);

                     ~PSCAN_NAME();

                     int scan(Attribute**);
              };

       The  name of the scanner can be changed within the formatted token file
       by redefining the PSCAN_NAME macro within the declarations section. The
       scanner  class  expects to find token definitions common to the scanner
       and the parser in a file called ptokdefs.h and will try to include this
       file.  You  either  have  to  provide  this file yourself or use the -d
       option of Bison to create  one  that  fits  a  formatted  grammar  (see
       bison(1)).   You  may  change  the  name  of  the file that the scanner
       expects by redefining  the  PTOKDEFS_NAME  macro  in  the  declarations
       section  of  the  formatted  token  file.   Common header files for the
       abstract base classes and the default subclasses reside in the  pretzel
       include directory.

FILES

       /usr/lib/pretzel/libpretzel.a pretzel runtime library.
       /usr/include/pretzel          directory  for  runtime  library  include
                                     files (pretzel include directory).
       /usr/local/lib/pretzel/include/Pscan.h
       /usr/include/pretzel/Pparse.h headers for abstract base classes.
       /usr/include/pretzel/Ppscan.h
       /usr/include/pretzel/Ppparse.h
                                     default headers for generated subclasses.
       /usr/lib/texmf/tex/latex/pretzel/pretzel-latex.sty
                                     LaTeX style to typeset pretzel output.

AUTHOR

       Felix Gaertner, email: fcg@acm.org

                                 June 11, 1998                      pretzel(1)
