NAME
ragel - compile regular languages into executable state machines
SYNOPSIS
ragel [options] file
DESCRIPTION
Ragel compiles executable finite state machines from regular languages.
Ragel can generate C, C++, Objective-C, D, Java, Ruby, or C# code. Ragel
state machines not only recognize byte sequences, as regular expression
machines do, but can also execute code at arbitrary points in the
recognition of a regular language. User code is embedded using inline
operators that do not disrupt the regular language syntax.
The core language consists of standard regular expression operators,
such as union, concatenation and Kleene star, accompanied by action
embedding operators. Ragel also provides operators that let you control
any non-determinism that you create, construct scanners using the
longest match paradigm, and build state machines using the statechart
model. It is also possible to influence the execution of a state
machine from inside an embedded action by jumping or calling to other
parts of the machine and reprocessing input.
Ragel provides a very flexible interface to the host language that
attempts to place minimal restrictions on how the generated code is
used and integrated into the application. The generated code has no
dependencies.
OPTIONS
-h, -H, -?, --help
Display help and exit.
-v Print version information and exit.
-o file
Write output to file. If -o is not given, a default file name is
chosen by replacing the file extension of the input file. For
source files ending in .rh the suffix .h is used. For all other
source files a suffix based on the output language is used (.c,
.cpp, .m, etc.). If -o is not given for Graphviz output, the
generated dot file is written to standard output.
-s Print some statistics on standard error.
--error-format=gnu
Print error messages using the format "file:line:column:"
(default)
--error-format=msvc
Print error messages using the format "file(line,column):"
-d Do not remove duplicate actions from action lists.
-I dir
Add dir to the list of directories to search for included and
imported files.
-n Do not perform state minimization.
-m Perform minimization once, at the end of the state machine
compilation.
-l Minimize after nearly every operation. Lists of like operations
such as unions are minimized once at the end. This is the
default minimization option.
-e Minimize after every operation.
-x Compile the state machines and emit an XML representation of the
host data and the machines.
-V Generate a dot file for Graphviz.
-p Display printable characters on labels.
-S <spec>
FSM specification to output.
-M <machine>
Machine definition/instantiation to output.
-C The host language is C, C++, Obj-C or Obj-C++. This is the
default host language option.
-D The host language is D.
-J The host language is Java.
-R The host language is Ruby.
-L Inhibit writing of #line directives.
-T0 (C/D/Java/Ruby/C#) Generate a table driven FSM. This is the
default code style. The table driven FSM represents the state
machine as static data. There are tables of states, transitions,
indices and actions. The current state is stored in a variable.
The execution is a loop that, given the current state and the
current character to process, looks up the transition to take
using a binary search, executes any actions and moves to the
target state. In general, the table driven FSM produces a
smaller binary and requires a less expensive host language
compile but results in slower running code. The table driven FSM
is suitable for any FSM.
-T1 (C/D/Ruby/C#) Generate a faster table driven FSM by expanding
action lists in the action execute code.
-F0 (C/D/Ruby/C#) Generate a flat table driven FSM. Transitions are
represented as an array indexed by the current alphabet
character. This eliminates the need for a binary search to
locate transitions and produces faster code; however, it is only
suitable for small alphabets.
-F1 (C/D/Ruby/C#) Generate a faster flat table driven FSM by
expanding action lists in the action execute code.
-G0 (C/D/C#) Generate a goto driven FSM. The goto driven FSM
represents the state machine as a series of goto statements.
While in the machine, the current state is stored by the
processor’s instruction pointer. The execution is a flat
function where control is passed from state to state using
gotos. In general, the goto FSM produces faster code but results
in a larger binary and a more expensive host language compile.
-G1 (C/D/C#) Generate a faster goto driven FSM by expanding action
lists in the action execute code.
-G2 (C/D) Generate a really fast goto driven FSM by embedding action
lists in the state machine control code.
-P<N> (C/D) Generate an N-way split really fast goto-driven FSM.
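To make the difference between the code styles concrete, the following is a hand-written C sketch of one small machine, [0-9]+ ( '.' [0-9]+ )?, rendered three ways: in the spirit of -T0 (ranges per state, binary search), -F0 (a row per state indexed directly by the character) and -G0 (each state is a goto label, so the current state is the instruction pointer). It is not Ragel's generated code: the array layout and the names table_exec, flat_exec and goto_exec are hypothetical, and actions and EOF handling are omitted. Note how the flat rows only cover '.' to '9'; a row spanning a large alphabet would grow accordingly.

```c
/* Hypothetical machine for  [0-9]+ ( '.' [0-9]+ )?
   state 0: start, 1: integer digits (final), 2: after '.', 3: fraction digits (final) */
static const int is_final[4] = { 0, 1, 0, 1 };

/* ---- Table style (cf. -T0): sorted key ranges per state, binary search ---- */
typedef struct { unsigned char lo, hi; int target; } Trans;
static const Trans trans[] = {
    { '0', '9', 1 },                  /* state 0 */
    { '.', '.', 2 }, { '0', '9', 1 }, /* state 1 ('.' sorts before '0') */
    { '0', '9', 3 },                  /* state 2 */
    { '0', '9', 3 },                  /* state 3 */
};
/* state s owns trans[trans_off[s]] .. trans[trans_off[s + 1] - 1] */
static const int trans_off[5] = { 0, 1, 3, 4, 5 };

int table_exec(const char *p, const char *pe) {
    int cs = 0;                               /* current state lives in a variable */
    for (; p < pe; p++) {
        unsigned char ch = (unsigned char)*p;
        int lo = trans_off[cs], hi = trans_off[cs + 1] - 1, found = -1;
        while (lo <= hi) {                    /* binary search over this state's ranges */
            int mid = lo + (hi - lo) / 2;
            if (ch < trans[mid].lo) hi = mid - 1;
            else if (ch > trans[mid].hi) lo = mid + 1;
            else { found = mid; break; }
        }
        if (found < 0) return 0;              /* no transition: error */
        /* a real machine would execute the transition's actions here */
        cs = trans[found].target;
    }
    return is_final[cs];
}

/* ---- Flat style (cf. -F0): one row per state, indexed by character ---- */
#define KEY_LO '.'                            /* rows cover '.' .. '9' only */
#define KEY_HI '9'
static const signed char flat[4][KEY_HI - KEY_LO + 1] = {
    /* '.' '/'  '0' .......................... '9' */
    { -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* state 0 */
    {  2, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* state 1 */
    { -1, -1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 }, /* state 2 */
    { -1, -1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 }, /* state 3 */
};

int flat_exec(const char *p, const char *pe) {
    int cs = 0;
    for (; p < pe; p++) {
        unsigned char ch = (unsigned char)*p;
        if (ch < KEY_LO || ch > KEY_HI) return 0;   /* outside every row: error */
        int next = flat[cs][ch - KEY_LO];           /* direct index, no search */
        if (next < 0) return 0;
        cs = next;
    }
    return is_final[cs];
}

/* ---- Goto style (cf. -G0): the current state is the position in the code ---- */
int goto_exec(const char *p, const char *pe) {
    /* state 0 (only entered at the start, so it needs no label) */
    if (p == pe || *p < '0' || *p > '9') return 0;
    p++;
st1:
    if (p == pe) return 1;                    /* final */
    if (*p >= '0' && *p <= '9') { p++; goto st1; }
    if (*p != '.') return 0;
    p++;
    /* state 2 */
    if (p == pe || *p < '0' || *p > '9') return 0;
    p++;
st3:
    if (p == pe) return 1;                    /* final */
    if (*p >= '0' && *p <= '9') { p++; goto st3; }
    return 0;
}
```

All three functions accept exactly the same strings; they differ only in how the next state is found and where the current state is kept.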
RAGEL INPUT
NOTE: This is a very brief description of Ragel input. Ragel is
described in more detail in the user guide available from the homepage
(see below).
Ragel normally passes input files straight to the output. When it sees
an FSM specification that contains machine instantiations it stops to
generate the state machine. If there are write statements (such as
"write exec") then ragel emits the corresponding code. There can be any
number of FSM specifications in an input file. A multi-line FSM
specification starts with ’%%{’ and ends with ’}%%’. A single-line FSM
specification starts with %% and ends at the first newline.
FSM STATEMENTS
Machine Name:
Set the name of the machine. If given, it must be the first
statement.
Alphabet Type:
Set the data type of the alphabet.
GetKey:
Specify how to retrieve the alphabet character from the element
type.
Include:
Include a machine of the same name as the current one or of a different
name in either the current file or some other file.
Action Definition:
Define an action that can be invoked by the FSM.
Fsm Definition, Instantiation and Longest Match Instantiation:
Used to build FSMs. Syntax description in next few sections.
Access:
Specify how to access the persistent state machine variables.
Write: Write some component of the machine.
Variable:
Override the default variable names (p, pe, cs, act, etc).
BASIC MACHINES
The basic machines are the base operands of the regular language
expressions.
’hello’
Concat literal. Produces a concatenation of the characters in
the string. Supports escape sequences with ’\’. The result
will have a start state and a transition to a new state for each
character in the string. The last state in the sequence will be
made final. To make the string case-insensitive, append an ’i’
to the string, as in ’cmd’i.
"hello"
Identical to single quote version.
[hello]
Or literal. Produces a union of characters. Supports character
ranges with ’-’, negating the sense of the union with an initial
’^’ and escape sequences with ’\’. The result will have two
states with a transition between them for each character or
range.
NOTE: ’’, "", and [] produce null FSMs. Null machines have one state
that is both a start state and a final state and match the zero length
string. A null machine may be created with the null builtin machine.
integer
Makes a two state machine with one transition on the given
integer number.
hex Makes a two state machine with one transition on the given
hexadecimal number.
/simple_regex/
A simple regular expression. Supports the notation ’.’, ’*’ and
’[]’, character ranges with ’-’, negating the sense of an OR
expression with an initial ’^’ and escape sequences with ’\’.
Also supports one trailing flag: i. Use it to produce a
case-insensitive regular expression, as in /GET/i.
lit .. lit
Specifies a range. The allowable upper and lower bounds are
concat literals of length one and number machines. For example,
0x10..0x20, 0..63, and ’a’..’z’ are valid ranges.
variable_name
References the machine definition assigned to the variable name
given.
builtin_machine
There are several builtin machines available. They are all two
state machines for the purpose of matching common classes of
characters. They are:
any Any character in the alphabet.
ascii ASCII characters 0..127.
extend ASCII extended characters. This is the range -128..127
for signed alphabets and the range 0..255 for unsigned
alphabets.
alpha Alphabetic characters /[A-Za-z]/.
digit Digits /[0-9]/.
alnum Alpha numerics /[0-9A-Za-z]/.
lower Lowercase characters /[a-z]/.
upper Uppercase characters /[A-Z]/.
xdigit Hexadecimal digits /[0-9A-Fa-f]/.
cntrl Control characters 0..31.
graph Graphical characters /[!-~]/.
print Printable characters /[ -~]/.
punct Punctuation. Graphical characters that are not
alphanumerics /[!-/:-@\[-`{-~]/.
space Whitespace /[\t\v\f\n\r ]/.
null Zero length string. Equivalent to ’’, "" and [].
empty Empty set. Matches nothing.
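The character classes of the builtin machines can be cross-checked against the C <ctype.h> predicates in the default "C" locale. The sketch below transcribes each class from the list above into a C predicate; the rl_ names are hypothetical and are not part of any Ragel interface. On codes 0..127 they agree with isalpha, isdigit and the others, with one difference: iscntrl also accepts DEL (127), while cntrl covers only 0..31.

```c
/* Builtin machine classes transcribed as C predicates on ASCII codes 0..127.
   Hand-written illustrations only; the rl_ prefix is hypothetical. */
static int in_range(int c, int lo, int hi) { return c >= lo && c <= hi; }

int rl_alpha(int c)  { return in_range(c, 'A', 'Z') || in_range(c, 'a', 'z'); }
int rl_digit(int c)  { return in_range(c, '0', '9'); }
int rl_alnum(int c)  { return rl_alpha(c) || rl_digit(c); }
int rl_lower(int c)  { return in_range(c, 'a', 'z'); }
int rl_upper(int c)  { return in_range(c, 'A', 'Z'); }
int rl_xdigit(int c) { return rl_digit(c) || in_range(c, 'A', 'F') || in_range(c, 'a', 'f'); }
int rl_cntrl(int c)  { return in_range(c, 0, 31); }        /* excludes DEL (127) */
int rl_graph(int c)  { return in_range(c, '!', '~'); }
int rl_print(int c)  { return in_range(c, ' ', '~'); }
/* punct: [!-/:-@\[-`{-~], i.e. graphical characters that are not alphanumerics */
int rl_punct(int c)  { return in_range(c, '!', '/') || in_range(c, ':', '@')
                           || in_range(c, '[', '`') || in_range(c, '{', '~'); }
int rl_space(int c)  { return c == '\t' || c == '\v' || c == '\f'
                           || c == '\n' || c == '\r' || c == ' '; }
```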
BRIEF OPERATOR REFERENCE
Operators are grouped by precedence, group 1 being the lowest and group
9 the highest.
GROUP 1:
expr , expr
Join machines together without drawing any transitions, setting
up a start state or any final states. Start state must be
explicitly specified with the "start" label. Final states may be
specified with epsilon transitions to the implicitly
created "final" state.
GROUP 2:
expr | expr
Produces a machine that matches any string in machine one or
machine two.
expr & expr
Produces a machine that matches any string that is in both
machine one and machine two.
expr - expr
Produces a machine that matches any string that is in machine
one but not in machine two.
expr -- expr
Strong Subtraction. Matches any string in machine one that does
not have any string in machine two as a substring.
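The difference between ordinary and strong subtraction is easiest to see by brute force. The C sketch below models a language as a membership predicate on a byte span, which is an assumption made purely for illustration (Ragel operates on state machines, not predicates). The example languages, lowercase words for machine one and the single string "ab" for machine two, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A language is modelled here as a membership test on a byte span. */
typedef bool (*Lang)(const char *s, size_t n);

/* expr - expr: s is in A, and s itself is not in B. */
bool subtract(Lang a, Lang b, const char *s) {
    size_t n = strlen(s);
    return a(s, n) && !b(s, n);
}

/* expr -- expr: s is in A, and no substring of s (s included) is in B. */
bool strong_subtract(Lang a, Lang b, const char *s) {
    size_t n = strlen(s);
    if (!a(s, n)) return false;
    for (size_t i = 0; i <= n; i++)
        for (size_t j = i; j <= n; j++)
            if (b(s + i, j - i)) return false;   /* substring s[i..j) is in B */
    return true;
}

/* Hypothetical example languages: A = lowercase words, B = the string "ab". */
static bool lower_word(const char *s, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (s[i] < 'a' || s[i] > 'z') return false;
    return true;
}
static bool just_ab(const char *s, size_t n) {
    return n == 2 && s[0] == 'a' && s[1] == 'b';
}
```

With these languages, "cab" survives ordinary subtraction (it is not equal to "ab") but is removed by strong subtraction (it contains "ab").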
GROUP 3:
expr . expr
Produces a machine that matches all the strings in machine one
followed by all the strings in machine two.
expr :> expr
Entry-Guarded Concatenation: terminates machine one upon entry
to machine two.
expr :>> expr
Finish-Guarded Concatenation: terminates machine one when
machine two finishes.
expr <: expr
Left-Guarded Concatenation: gives a higher priority to machine
one.
NOTE: Concatenation is the default operator. Two machines next to each
other with no operator between them result in the concatenation
operation.
GROUP 4:
label: expr
Attaches a label to an expression. Labels can be used by epsilon
transitions and fgoto and fcall statements in actions. Also note
that referencing a machine definition causes the implicit
creation of a label of the same name.
GROUP 5:
expr -> label
Draws an epsilon transition to the state defined by label. Label
must be a name in the current scope. Epsilon transitions are
resolved when comma operators are evaluated and at the root of
the expression tree of machine assignment/instantiation.
GROUP 6: Actions
An action may be a name predefined with an action statement or may be
specified directly with ’{’ and ’}’ in the expression.
expr > action
Embeds action into starting transitions.
expr @ action
Embeds action into transitions that go into a final state.
expr $ action
Embeds action into all transitions. Does not include pending out
transitions.
expr % action
Embeds action into pending out transitions from final states.
GROUP 6: EOF Actions
When a machine’s finish routine is called the current state’s EOF
actions are executed.
expr >/ action
Embed an EOF action into the start state.
expr </ action
Embed an EOF action into all states except the start state.
expr $/ action
Embed an EOF action into all states.
expr %/ action
Embed an EOF action into final states.
expr @/ action
Embed an EOF action into all states that are not final.
expr <>/ action
Embed an EOF action into all states that are not the start state
and that are not final (middle states).
GROUP 6: Global Error Actions
Global error actions are stored in states until the final state machine
has been fully constructed. They are then transferred to error
transitions, giving the effect of a default action.
expr >! action
Embed a global error action into the start state.
expr <! action
Embed a global error action into all states except the start
state.
expr $! action
Embed a global error action into all states.
expr %! action
Embed a global error action into the final states.
expr @! action
Embed a global error action into all states which are not final.
expr <>! action
Embed a global error action into all states which are not the
start state and are not final (middle states).
GROUP 6: Local Error Actions
Local error actions are stored in states until the named machine is
fully constructed. They are then transferred to error transitions,
giving the effect of a default action for a section of the total
machine. Note that the name may be omitted, in which case the action
will be transferred to error actions upon construction of the current
machine.
expr >^ action
Embed a local error action into the start state.
expr <^ action
Embed a local error action into all states except the start
state.
expr $^ action
Embed a local error action into all states.
expr %^ action
Embed a local error action into the final states.
expr @^ action
Embed a local error action into all states which are not final.
expr <>^ action
Embed a local error action into all states which are not the
start state and are not final (middle states).
GROUP 6: To-State Actions
To-state actions are stored in states and executed any time the machine
moves into a state. This includes regular transitions, and transfers of
control such as fgoto. Note that setting the current state from outside
the machine (for example during initialization) does not count as a
transition into a state.
expr >~ action
Embed a to-state action into the start state.
expr <~ action
Embed a to-state action into all states except the start state.
expr $~ action
Embed a to-state action into all states.
expr %~ action
Embed a to-state action into the final states.
expr @~ action
Embed a to-state action into all states which are not final.
expr <>~ action
Embed a to-state action into all states which are not the start
state and are not final (middle states).
GROUP 6: From-State Actions
From-state actions are executed whenever a state takes a transition on
a character. This includes the error transition and a transition to
self.
expr >* action
Embed a from-state action into the start state.
expr <* action
Embed a from-state action into every state except the start
state.
expr $* action
Embed a from-state action into all states.
expr %* action
Embed a from-state action into the final states.
expr @* action
Embed a from-state action into all states which are not final.
expr <>* action
Embed a from-state action into all states which are not the
start state and are not final (middle states).
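Where to-state and from-state actions run relative to a transition can be sketched as a hand-written C loop for the machine ’ab’ with one of each action in every state. It illustrates the ordering described above: the from-state action runs before a transition is taken, including the error transition; the to-state action runs after the machine moves into a state; setting the initial state runs neither. This is not Ragel's generated code, and the logging helper and function names are hypothetical.

```c
#include <stddef.h>

/* Event log: 'F' + state for a from-state action, 'T' + state for a to-state action. */
static char log_buf[64];
static size_t log_len;

static void log_event(char kind, int state) {
    if (log_len + 2 < sizeof log_buf) {
        log_buf[log_len++] = kind;
        log_buf[log_len++] = (char)('0' + state);
        log_buf[log_len] = '\0';
    }
}

/* Hand-written machine for 'ab' (states 0 -> 1 -> 2, state 2 final) with a
   from-state and a to-state action in every state. */
int exec_ab(const char *p, const char *pe) {
    int cs = 0;               /* initialization: no to-state action runs here */
    log_len = 0;
    log_buf[0] = '\0';
    for (; p < pe; p++) {
        log_event('F', cs);   /* from-state: before the transition, even an error one */
        int next = -1;
        if (cs == 0 && *p == 'a') next = 1;
        else if (cs == 1 && *p == 'b') next = 2;
        if (next < 0) return 0;   /* error transition */
        cs = next;
        log_event('T', cs);   /* to-state: after moving into the new state */
    }
    return cs == 2;
}
```

For input "ax" the log ends with F1: the from-state action of state 1 still runs even though 'x' leads to the error transition.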
GROUP 6: Priority Assignment
Priorities are assigned to names within transitions. Only priorities on
the same name are allowed to interact. In the first form of priorities
the name defaults to the name of the machine definition the priority is
assigned in. Transitions do not have default priorities.
expr > int
Assigns the priority int in all transitions leaving the start
state.
expr @ int
Assigns the priority int in all transitions that go into a final
state.
expr $ int
Assigns the priority int in all existing transitions.
expr % int
Assigns the priority int in all pending out transitions.
A second form of priority assignment allows the programmer to specify
the name to which the priority is assigned, allowing interactions to
cross machine definition boundaries.
expr > (name, int)
Assigns the priority int to name in all transitions leaving the
start state.
expr @ (name, int)
Assigns the priority int to name in all transitions that go into
a final state.
expr $ (name, int)
Assigns the priority int to name in all existing transitions.
expr % (name, int)
Assigns the priority int to name in all pending out transitions.
GROUP 7:
expr * Produces the Kleene star of a machine. Matches zero or more
repetitions of the machine.
expr **
Longest-Match Kleene Star. This version of Kleene star puts a
higher priority on staying in the machine over wrapping around
and starting over. This operator is equivalent to ( ( expr ) $0
%1 )*.
expr ? Produces a machine that accepts the machine given or the null
string. This operator is equivalent to ( expr | ’’ ).
expr + Produces the machine concatenated with the Kleene star of itself.
Matches one or more repetitions of the machine. This operator
is equivalent to ( expr . expr* ).
expr {n}
Produces a machine that matches exactly n repetitions of expr.
expr {,n}
Produces a machine that matches anywhere from zero to n
repetitions of expr.
expr {n,}
Produces a machine that matches n or more repetitions of expr.
expr {n,m}
Produces a machine that matches n to m repetitions of expr.
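For a single-character expression, such as digit, the bounded repetition operators reduce to a chain of states in which state k means "k repetitions seen" and states n..m are final. The C sketch below is hand-written; the function name and the convention that m < 0 stands for {n,} are hypothetical. For a longer expr the same shape holds, with each link of the chain being a copy of expr instead of a single transition.

```c
#include <stdbool.h>

typedef bool (*CharClass)(int c);   /* a single-character machine, e.g. digit */

/* expr{n,m} for a single-character expr: a chain of states 0..m where state k
   means "k repetitions seen"; states n..m are final. {n} is {n,n} and {,n} is
   {0,n}. m < 0 (hypothetical encoding) means {n,}: state n loops on itself. */
bool match_repeat(CharClass expr, int n, int m, const char *p, const char *pe) {
    int cs = 0;
    for (; p < pe; p++) {
        if (!expr((unsigned char)*p)) return false;   /* no transition: error */
        if (m >= 0 && cs == m) return false;          /* end of the chain: no way out */
        if (m >= 0 || cs < n) cs++;                   /* advance along the chain */
        /* otherwise ({n,} with cs == n) stay in state n */
    }
    return cs >= n;
}

static bool is_digit(int c) { return c >= '0' && c <= '9'; }
```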
GROUP 8:
! expr Produces a machine that matches any string not matched by the
given machine. This operator is equivalent to
( extend* - expr ).
^ expr Character-Level Negation. Matches any single character not
matched by the single character machine expr.
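The complement computed by ! expr can be pictured on a complete deterministic machine: give every state a transition on every character by routing missing ones to a dead (sink) state, then swap final and non-final states. The C sketch below applies this textbook construction to the literal ’ab’; it is an illustration, not a description of how Ragel implements the operator, and step, final_ab and run_ab are hypothetical names.

```c
#include <stdbool.h>

/* Complete DFA for the literal 'ab' over all bytes. State 3 is an explicit
   dead (sink) state, so every state has a transition on every character. */
static int step(int s, unsigned char c) {
    switch (s) {
    case 0:  return c == 'a' ? 1 : 3;
    case 1:  return c == 'b' ? 2 : 3;
    default: return 3;               /* states 2 and 3 both fall into the sink */
    }
}
static const bool final_ab[4] = { false, false, true, false };

/* Runs the DFA over a NUL-terminated string. With negate set, final and
   non-final states are swapped, which yields the complement language. */
bool run_ab(const char *s, bool negate) {
    int cs = 0;
    for (; *s; s++) cs = step(cs, (unsigned char)*s);
    return final_ab[cs] != negate;
}
```

Swapping finality is only correct because the machine is complete; without the sink state, strings such as "abc" would fall off the machine and be rejected by both ’ab’ and its complement.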
GROUP 9:
( expr )
Forces precedence on operators.
VALUES AVAILABLE IN CODE BLOCKS
fc The current character. Equivalent to *p.
fpc A pointer to the current character. Equivalent to p.
fcurs An integer value representing the current state.
ftargs An integer value representing the target state.
fentry(<label>)
An integer value representing the entry point <label>.
STATEMENTS AVAILABLE IN CODE BLOCKS
fhold; Do not advance over the current character. Equivalent to --p;.
fexec <expr>;
Sets the next character to process. Equivalent to p =
(<expr>)-1;
fgoto <label>;
Jump to the machine defined by <label>.
fgoto *<expr>;
Jump to the entry point given by <expr>. The expression must
evaluate to an integer value representing a state.
fnext <label>;
Set the next state to be the entry point defined by <label>.
The fnext statement does not immediately jump to the specified
state. Any action code following the statement is executed.
fnext *<expr>;
Set the next state to be the entry point given by <expr>. The
expression must evaluate to an integer value representing a
state.
fcall <label>;
Call the machine defined by <label>. The next fret will jump to
the target of the transition on which the action is invoked.
fcall *<expr>;
Call the entry point given by <expr>. The next fret will jump to
the target of the transition on which the action is invoked.
fret; Return to the target state of the transition on which the last
fcall was made.
fbreak;
Save the current state and immediately break out of the machine.
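The statements above can be pictured in a hand-written C loop that, like the equivalences given for fhold and fexec, advances p after every character. This is a sketch under that assumption, not Ragel's generated code. Part 1 counts words and skips ’#’ comments: the WORD state uses fhold (--p) to hand a non-letter back to the SEP state, and SEP uses fexec (p = (e)-1) to resume at the next newline. Part 2 models fcall and fret with an explicit stack of return states; the names stack, top, count_words and paren_depth are used here for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Part 1: fc, fhold and fexec. Counts runs of ASCII letters and skips '#'
   comments up to the next newline. The loop advances p after every character,
   so fhold is --p and "fexec e;" is p = (e) - 1. */
int count_words(const char *p, const char *pe) {
    enum { SEP, WORD } cs = SEP;
    int words = 0;
    for (; p < pe; p++) {
        char fc = *p;                       /* fc; fpc would be p itself */
        int letter = (fc >= 'a' && fc <= 'z') || (fc >= 'A' && fc <= 'Z');
        if (cs == WORD) {
            if (!letter) {
                cs = SEP;
                --p;                        /* fhold: let SEP see this character */
            }
            continue;
        }
        if (letter) {
            words++;
            cs = WORD;
        } else if (fc == '#') {
            const char *nl = memchr(p, '\n', (size_t)(pe - p));
            if (nl == NULL) break;          /* comment runs to the end of input */
            p = nl - 1;                     /* fexec nl: resume at the newline */
        }
    }
    return words;
}

/* Part 2: fcall and fret with an explicit stack of return states.
   Returns the maximum nesting depth of parentheses, or -1 if unbalanced. */
#define STACK_MAX 32

int paren_depth(const char *p, const char *pe) {
    enum { MAIN, INNER } cs = MAIN;
    int stack[STACK_MAX];
    int top = 0, max_depth = 0;
    for (; p < pe; p++) {
        if (*p == '(') {                    /* fcall inner; */
            if (top == STACK_MAX) return -1;
            stack[top++] = cs;              /* '(' loops back to cs, so cs is the target */
            cs = INNER;
            if (top > max_depth) max_depth = top;
        } else if (*p == ')') {             /* fret; only valid inside a call */
            if (cs != INNER) return -1;
            cs = stack[--top];
        }
    }
    return cs == MAIN ? max_depth : -1;
}
```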
CREDITS
Ragel was written by Adrian Thurston <thurston@complang.org>.
Objective-C output contributed by Erich Ocean. D output contributed by
Alan West. Ruby output contributed by Victor Hugo Borja. C Sharp code
generation contributed by Daniel Tang. Contributions to Java code
generation by Colin Fleming.
SEE ALSO
re2c(1), flex(1)
Homepage: http://www.complang.org/ragel/