Man Linux: Main Page and Category List

NAME

       ragel - compile regular languages into executable state machines

SYNOPSIS

       ragel [options] file

DESCRIPTION

       Ragel compiles executable finite state machines from regular languages.
       Ragel can generate C, C++, Objective-C, D, or Java  code.  Ragel  state
       machines  can  not  only recognize byte sequences as regular expression
       machines do, but can also execute  code  at  arbitrary  points  in  the
       recognition  of a regular language.  User code is embedded using inline
       operators that do not disrupt the regular language syntax.

       The core language consists of standard  regular  expression  operators,
       such  as  union,  concatenation  and kleene star, accompanied by action
       embedding operators. Ragel also provides operators that let you control
       any  non-determinism  that  you  create,  construct  scanners using the
       longest match paradigm, and build state machines using  the  statechart
       model.  It  is  also  possible  to  influence  the execution of a state
       machine from inside an embedded action by jumping or calling  to  other
       parts of the machine and reprocessing input.

       Ragel  provides  a  very  flexibile interface to the host language that
       attempts to place minimal restrictions on how  the  generated  code  is
       used  and  integrated  into  the application. The generated code has no
       dependencies.

OPTIONS

       -h, -H, -?, --help
              Display help and exit.

       -v     Print version information and exit.

       -o  file
              Write output to file. If -o is not given, a default file name is
              chosen  by  replacing the file extenstion of the input file. For
              source files ending in .rh the suffix .h is used. For all  other
              source  files a suffix based on the output language is used (.c,
              .cpp, .m, etc.). If -o is not  given  for  Graphviz  output  the
              generated dot file is written to standard output.

       -s     Print some statistics on standard error.

       --error-format=gnu
              Print   error  messages  using  the  format  "file:line:column:"
              (default)

       --error-format=msvc
              Print error messages using the format "file(line,column):"

       -d     Do not remove duplicate actions from action lists.

       -I  dir
              Add dir to the list of directories to search  for  included  and
              imported files

       -n     Do not perform state minimization.

       -m     Perform  minimization  once,  at  the  end  of the state machine
              compilation.

       -l     Minimize after nearly every operation. Lists of like  operations
              such  as  unions  are  minimized  once  at  the end. This is the
              default minimization option.

       -e     Minimize after every operation.

       -x     Compile the state machines and emit an XML representation of the
              host data and the machines.

       -V     Generate a dot file for Graphviz.

       -p     Display printable characters on labels.

       -S <spec>
              FSM specification to output.

       -M <machine>
              Machine definition/instantiation to output.

       -C     The  host  language  is  C,  C++,  Obj-C or Obj-C++. This is the
              default host language option.

       -D     The host language is D.

       -J     The host language is Java.

       -R     The host language is Ruby.

       -L     Inhibit writing of #line directives.

       -T0    (C/D/Java/Ruby/C#) Generate a table  driven  FSM.  This  is  the
              default  code  style.  The table driven FSM represents the state
              machine as static data. There are tables of states, transitions,
              indicies and actions. The current state is stored in a variable.
              The execution is a loop that looks that given the current  state
              and current character to process looks up the transition to take
              using a binary search, executes any actions  and  moves  to  the
              target  state.  In  general,  the  table  driven  FSM produces a
              smaller binary and  requires  a  less  expensive  host  language
              compile but results in slower running code. The table driven FSM
              is suitable for any FSM.

       -T1    (C/D/Ruby/C#) Generate a faster table driven  FSM  by  expanding
              action lists in the action execute code.

       -F0    (C/D/Ruby/C#)  Generate a flat table driven FSM. Transitions are
              represented  as  an  array  indexed  by  the  current   alphabet
              character.  This  eliminates  the  need  for  a binary search to
              locate transitions and produces faster code, however it is  only
              suitable for small alphabets.

       -F1    (C/D/Ruby/C#)  Generate  a  faster  flat  table  driven  FSM  by
              expanding action lists in the action execute code.

       -G0    (C/D/C#) Generate  a  goto  driven  FSM.  The  goto  driven  FSM
              represents  the  state  machine  as a series of goto statements.
              While in the  machine,  the  current  state  is  stored  by  the
              processor’s   instruction  pointer.  The  execution  is  a  flat
              function where control is  passed  from  state  to  state  using
              gotos. In general, the goto FSM produces faster code but results
              in a larger binary and a more expensive host language compile.

       -G1    (C/D/C#) Generate a faster goto driven FSM by  expanding  action
              lists in the action execute code.

       -G2    (C/D) Generate a really fast goto driven FSM by embedding action
              lists in the state machine control code.

       -P<N>  (C/D) N-Way Split really fast goto-driven FSM.

RAGEL INPUT

       NOTE: This is a  very  brief  description  of  Ragel  input.  Ragel  is
       described  in more detail in the user guide available from the homepage
       (see below).

       Ragel normally passes input files straight to the output. When it  sees
       an  FSM  specification that contains machine instantiations it stops to
       generate the state machine. If there  are  write  statements  (such  as
       "write exec") then ragel emits the corresponding code. There can be any
       number of FSM  specifications  in  an  input  file.  A  multi-line  FSM
       specification  starts with ’%%{’ and ends with ’}%%’. A single line FSM
       specification starts with %% and ends at the first newline.

FSM STATEMENTS

       Machine Name:
              Set the the name of the machine. If given, it must be the  first
              statement.

       Alphabet Type:
              Set the data type of the alphabet.

       GetKey:
              Specify  how to retrieve the alphabet character from the element
              type.

       Include:
              Include a machine of same name as the current or of a  different
              name in either the current file or some other file.

       Action Definition:
              Define an action that can be invoked by the FSM.

       Fsm Definition, Instantiation and Longest Match Instantiation:
              Used to build FSMs. Syntax description in next few sections.

       Access:
              Specify how to access the persistent state machine variables.

       Write: Write some component of the machine.

       Variable:
              Override the default variable names (p, pe, cs, act, etc).

BASIC MACHINES

       The  basic  machines  are  the  base  operands  of the regular language
       expressions.

       hello
              Concat literal. Produces a concatenation of  the  characters  in
              the  string.   Supports  escape  sequences with ’\’.  The result
              will have a start state and a transition to a new state for each
              character  in the string. The last state in the sequence will be
              made final. To make the string case-insensitive, append  an  ’i’
              to the string, as in ’cmd’i.

       "hello"
              Identical to single quote version.

       [hello]
              Or  literal. Produces a union of characters.  Supports character
              ranges with ’-’, negating the sense of the union with an initial
              ’^’  and  escape  sequences  with  ’\’. The result will have two
              states with a transition between  them  for  each  character  or
              range.

       NOTE:  ’’,  "",  and [] produce null FSMs. Null machines have one state
       that is both a start state and a final state and match the zero  length
       string. A null machine may be created with the null builtin machine.

       integer
              Makes  a  two  state  machine  with  one transition on the given
              integer number.

       hex    Makes a two state machine  with  one  transition  on  the  given
              hexidecimal number.

       /simple_regex/
              A  simple regular expression. Supports the notation ’.’, ’*’ and
              ’[]’, character ranges with ’-’, negating the  sense  of  an  OR
              expression  with  and initial ’^’ and escape sequences with ’\’.
              Also supports one trailing flag: i. Use it to  produce  a  case-
              insensitive regular expression, as in /GET/i.

       lit .. lit
              Specifies  a  range.  The  allowable  upper and lower bounds are
              concat literals of length one and number machines.  For example,
              0x10..0x20,  0..63, and ’a’..’z’ are valid ranges.

       variable_name
              References  the machine definition assigned to the variable name
              given.

       builtin_machine
              There are several builtin machines available. They are  all  two
              state  machines  for  the  purpose of matching common classes of
              characters. They are:

              any    Any character in the alphabet.

              ascii  Ascii characters 0..127.

              extend Ascii extended characters. This is  the  range  -128..127
                     for  signed  alphabets  and the range 0..255 for unsigned
                     alphabets.

              alpha  Alphabetic characters /[A-Za-z]/.

              digit  Digits /[0-9]/.

              alnum  Alpha numerics /[0-9A-Za-z]/.

              lower  Lowercase characters /[a-z]/.

              upper  Uppercase characters /[A-Z]/.

              xdigit Hexidecimal digits /[0-9A-Fa-f]/.

              cntrl  Control characters 0..31.

              graph  Graphical characters /[!-~]/.

              print  Printable characters /[ -~]/.

              punct  Punctuation. Graphical characters  that  are  not  alpha-
                     numerics /[!-/:-@\[-‘{-~]/.

              space  Whitespace /[\t\v\f\n\r ]/.

              null   Zero length string. Equivalent to ’’, "" and [].

              empty  Empty set. Matches nothing.

BRIEF OPERATOR REFERENCE

       Operators are grouped by precedence, group 1 being the lowest and group
       6 the highest.

       GROUP 1:

       expr , expr
              Join machines together without drawing any transitions,  setting
              up  a  start  state  or  any  final  states. Start state must be
              explicitly specified with the "start" label. Final states may be
              specified  with  the  an  epsilon  transitions to the implicitly
              created "final" state.

       GROUP 2:

       expr | expr
              Produces a machine that matches any string  in  machine  one  or
              machine two.

       expr & expr
              Produces  a  machine  that  matches  any  string that is in both
              machine one and machine two.

       expr - expr
              Produces a machine that matches any string that  is  in  machine
              one but not in machine two.

       expr -- expr
              Strong  Subtraction. Matches any string in machine one that does
              not have any string in machine two as a substring.

       GROUP 3:

       expr . expr
              Produces a machine that matches all the strings in  machine  one
              followed by all the strings in machine two.

       expr :> expr
              Entry-Guarded  Concatenation:  terminates machine one upon entry
              to machine two.

       expr :>> expr
              Finish-Guarded  Concatenation:  terminates  machine   one   when
              machine two finishes.

       expr <: expr
              Left-Guarded  Concatenation:  gives a higher priority to machine
              one.

       NOTE: Concatenation is the default operator. Two machines next to  each
       other  with  no  operator  between  them  results  in the concatenation
       operation.

       GROUP 4:

       label: expr
              Attaches a label to an expression. Labels can be used by epsilon
              transitions and fgoto and fcall statements in actions. Also note
              that the referencing of a machine definition causes the implicit
              creation of label by the same name.

       GROUP 5:

       expr -> label
              Draws an epsilon transition to the state defined by label. Label
              must be a name in the current  scope.  Epsilon  transitions  are
              resolved  when  comma operators are evaluated and at the root of
              the expression tree of machine assignment/instantiation.

       GROUP 6: Actions

       An action may be a name predefined with an action statement or  may  be
       specified directly with ’{’ and ’}’ in the expression.

       expr > action
              Embeds action into starting transitions.

       expr @ action
              Embeds action into transitions that go into a final state.

       expr $ action
              Embeds action into all transitions. Does not include pending out
              transitions.

       expr % action
              Embeds action into pending out transitions from final states.

       GROUP 6: EOF Actions

       When a machine’s finish routine  is  called  the  current  state’s  EOF
       actions are executed.

       expr >/ action
              Embed an EOF action into the start state.

       expr </ action
              Embed an EOF action into all states except the start state.

       expr $/ action
              Embed an EOF action into all states.

       expr %/ action
              Embed an EOF action into final states.

       expr @/ action
              Embed an EOF action into all states that are not final.

       expr <>/ action
              Embed an EOF action into all states that are not the start state
              and that are not final (middle states).

       GROUP 6: Global Error Actions

       Global error actions are stored in states until the final state machine
       has  been  fully  constructed.  They  are  then  transferred  to  error
       transitions, giving the effect of a default action.

       expr >! action
              Embed a global error action into the start state.

       expr <! action
              Embed a global error action into all  states  except  the  start
              state.

       expr $! action
              Embed a global error action into all states.

       expr %! action
              Embed a global error action into the final states.

       expr @! action
              Embed a global error action into all states which are not final.

       expr <>! action
              Embed a global error action into all states which  are  not  the
              start state and are not final (middle states).

       GROUP 6: Local Error Actions

       Local  error  actions  are  stored in states until the named machine is
       fully constructed. They are  then  transferred  to  error  transitions,
       giving  the  effect  of  a  default  action  for a section of the total
       machine. Note that the name may be omitted, in which  case  the  action
       will  be  transferred to error actions upon construction of the current
       machine.

       expr >^ action
              Embed a local error action into the start state.

       expr <^ action
              Embed a local error action into  all  states  except  the  start
              state.

       expr $^ action
              Embed a local error action into all states.

       expr %^ action
              Embed a local error action into the final states.

       expr @^ action
              Embed  a local error action into all states which are not final.

       expr <>^ action
              Embed a local error action into all states  which  are  not  the
              start state and are not final (middle states).

       GROUP 6: To-State Actions

       To state actions are stored in states and executed any time the machine
       moves into a state. This includes regular transitions, and transfers of
       control such as fgoto. Note that setting the current state from outside
       the machine (for example during initialization) does  not  count  as  a
       transition into a state.

       expr >~ action
              Embed a to-state action action into the start state.

       expr <~ action
              Embed  a to-state action into all states except the start state.

       expr $~ action
              Embed a to-state action into all states.

       expr %~ action
              Embed a to-state action into the final states.

       expr @~ action
              Embed a to-state action into all states which are not final.

       expr <>~ action
              Embed a to-state action into all states which are not the  start
              state and are not final (middle states).

       GROUP 6: From-State Actions

       From  state actions are executed whenever a state takes a transition on
       a character.  This includes the error transition and  a  transition  to
       self.

       expr >* action
              Embed a from-state action into the start state.

       expr <* action
              Embed  a  from-state  action  into  every state except the start
              state.

       expr $* action
              Embed a from-state action into all states.

       expr %* action
              Embed a from-state action into the final states.

       expr @* action
              Embed a from-state action into all states which are not final.

       expr <>* action
              Embed a from-state action into all  states  which  are  not  the
              start state and are not final (middle states).

       GROUP 6: Priority Assignment

       Priorities are assigned to names within transitions. Only priorities on
       the same name are allowed to interact. In the first form of  priorities
       the name defaults to the name of the machine definition the priority is
       assigned in.  Transitions do not have default priorities.

       expr > int
              Assigns the priority int in all transitions  leaving  the  start
              state.

       expr @ int
              Assigns the priority int in all transitions that go into a final
              state.

       expr $ int
              Assigns the priority int in all existing transitions.

       expr % int
              Assigns the priority int in all pending out transitions.

       A second form of priority assignment allows the programmer  to  specify
       the  name  to  which the priority is assigned, allowing interactions to
       cross machine definition boundaries.

       expr > (name,int)
              Assigns the priority int to name in all transitions leaving  the
              start state.

       expr @ (name, int)
              Assigns the priority int to name in all transitions that go into
              a final state.

       expr $ (name, int)
              Assigns the priority int to name in all existing transitions.

       expr % (name, int)
              Assigns the priority int to name in all pending out transitions.

       GROUP 7:

       expr * Produces  the  kleene  star  of  a machine. Matches zero or more
              repetitions of the machine.

       expr **
              Longest-Match Kleene Star. This version of kleene  star  puts  a
              higher  priority  on staying in the machine over wrapping around
              and starting over. This operator is equivalent to ( ( expr )  $0
              %1 )*.

       expr ? Produces  a  machine  that accepts the machine given or the null
              string. This operator is equivalent to  ( expr | ’’ ).

       expr + Produces the machine concatenated with the kleen star of itself.
              Matches  one  or more repetitions of the machine.  This operator
              is equivalent to ( expr . expr* ).

       expr {n}
              Produces a machine that matches exactly n repetitions of expr.

       expr {,n}
              Produces  a  machine  that  matches  anywhere  from  zero  to  n
              repetitions of expr.

       expr {n,}
              Produces a machine that matches n or more repetitions of expr.

       expr {n,m}
              Produces a machine that matches n to m repetitions of expr.

       GROUP 8:

       ! expr Produces  a  machine  that matches any string not matched by the
              given machine.  This operator is equivalent to ( *extend -  expr
              ).

       ^ expr Character-Level  Negation.  Matches  any  single  character  not
              matched by the single character machine expr.

       GROUP 9:

       ( expr )
              Forces precedence on operators.

VALUES AVAILABLE IN CODE BLOCKS

       fc     The current character. Equivalent to *p.

       fpc    A pointer to the current character. Equivalent to p.

       fcurs  An integer value representing the current state.

       ftargs An integer value representing the target state.

       fentry(<label>)
              An integer value representing the entry point <label>.

STATEMENTS AVAILABLE IN CODE BLOCKS

       fhold; Do not advance over the current character. Equivalent to --p;.

       fexec <expr>;
              Sets the current character to something else. Equivalent to p  =
              (<expr>)-1;

       fgoto <label>;
              Jump to the machine defined by <label>.

       fgoto *<expr>;
              Jump  to  the  entry  point given by <expr>. The expression must
              evaluate to an integer value representing a state.

       fnext <label>;
              Set the next state to be the entry  point  defined  by  <label>.
              The  fnext  statement does not immediately jump to the specified
              state. Any action code following the statement is executed.

       fnext *<expr>;
              Set the next state to be the entry point given  by  <expr>.  The
              expression  must  evaluate  to  an  integer value representing a
              state.

       fcall <label>;
              Call the machine defined by <label>. The next fret will jump  to
              the target of the transition on which the action is invoked.

       fcall *<expr>;
              Call the entry point given by <expr>. The next fret will jump to
              the target of the transition on which the action is invoked.

       fret;  Return to the target state of the transition on which  the  last
              fcall was made.

       fbreak;
              Save the current state and immediately break out of the machine.

CREDITS

       Ragel  was  written   by   Adrian   Thurston   <thurston@complang.org>.
       Objective-C  output contributed by Erich Ocean. D output contributed by
       Alan West. Ruby output contributed by Victor Hugo Borja. C  Sharp  code
       generation  contributed  by  Daniel  Tang.  Contributions  to Java code
       generation by Colin Fleming.

SEE ALSO

       re2c(1), flex(1)

       Homepage: http://www.complang.org/ragel/