NAME
stalin - A global optimizing compiler for Scheme
SYNOPSIS
stalin [-version]
[-I include-directory]*
[[-s|-x|-q|-t]]
[[-treat-all-symbols-as-external|
-do-not-treat-all-symbols-as-external]]
[[-index-allocated-string-types-by-expression|
-do-not-index-allocated-string-types-by-expression]]
[[-index-constant-structure-types-by-slot-types|
-do-not-index-constant-structure-types-by-slot-types]]
[[-index-constant-structure-types-by-expression|
-do-not-index-constant-structure-types-by-expression]]
[[-index-allocated-structure-types-by-slot-types|
-do-not-index-allocated-structure-types-by-slot-types]]
[[-index-allocated-structure-types-by-expression|
-do-not-index-allocated-structure-types-by-expression]]
[[-index-constant-headed-vector-types-by-element-type|
-do-not-index-constant-headed-vector-types-by-element-type]]
[[-index-constant-headed-vector-types-by-expression|
-do-not-index-constant-headed-vector-types-by-expression]]
[[-index-allocated-headed-vector-types-by-element-type|
-do-not-index-allocated-headed-vector-types-by-element-type]]
[[-index-allocated-headed-vector-types-by-expression|
-do-not-index-allocated-headed-vector-types-by-expression]]
[[-index-constant-nonheaded-vector-types-by-element-type|
-do-not-index-constant-nonheaded-vector-types-by-element-type]]
[[-index-constant-nonheaded-vector-types-by-expression|
-do-not-index-constant-nonheaded-vector-types-by-expression]]
[[-index-allocated-nonheaded-vector-types-by-element-type|
-do-not-index-allocated-nonheaded-vector-types-by-element-type]]
[[-index-allocated-nonheaded-vector-types-by-expression|
-do-not-index-allocated-nonheaded-vector-types-by-expression]]
[[-no-clone-size-limit|
-clone-size-limit number-of-expressions]]
[-split-even-if-no-widening]
[[-fully-convert-to-CPS|
-no-escaping-continuations]]
[-du]
[-Ob] [-Om] [-On] [-Or] [-Ot]
[-d0] [-d1] [-d2] [-d3] [-d4] [-d5] [-d6] [-d7]
[-closure-conversion-statistics]
[-dc] [-dC] [-dH] [-dg] [-dh]
[-d]
[-architecture name]
[[-baseline|
-conventional|
-lightweight]]
[[-immediate-flat|
-indirect-flat|
-immediate-display|
-indirect-display|
-linked]]
[[-align-strings|-do-not-align-strings]]
[-de] [-df] [-dG] [-di] [-dI] [-dp] [-dP]
[-ds] [-dS] [-Tmk]
[-no-tail-call-optimization]
[-db] [-c] [-k]
[-cc C-compiler]
[-copt C-compiler-option]*
[pathname]
Compiles the Scheme source file pathname.sc first into a C file
pathname.c and then into an executable image pathname. Also produces a
database file pathname.db. The pathname argument is required unless
-version is specified.
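The file naming described above can be sketched in Python. The helper
name stalin_outputs is hypothetical and is not part of Stalin; it only
restates which files the text says are read and written for a given
pathname argument, and it does not invoke the compiler.

```python
from pathlib import Path


def stalin_outputs(pathname: str) -> dict[str, Path]:
    """Map a Stalin pathname argument to the files named in the synopsis.

    Hypothetical helper for illustration only; it does not run Stalin.
    """
    base = Path(pathname)
    return {
        "source": base.with_name(base.name + ".sc"),    # Scheme input
        "c_file": base.with_name(base.name + ".c"),     # intermediate C file
        "executable": base,                             # executable image
        "database": base.with_name(base.name + ".db"),  # database file
    }


if __name__ == "__main__":
    for role, path in stalin_outputs("hello").items():
        print(f"{role}: {path}")
```

Which of these files remain afterwards depends on options documented
under OPTIONS: -db disables the database file, and the generated C file
is normally deleted unless -k or -c is specified.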
DESCRIPTION
Stalin is an extremely efficient compiler for Scheme. It is designed
to be used not as a development tool but rather as a means to generate
efficient executable images either for application delivery or for
production research runs. In contrast to traditional Scheme
implementations, Stalin is a batch-mode compiler. There is no
interactive READ-EVAL-PRINT loop. Stalin compiles a single Scheme
source file into an executable image (indirectly via C). Running that
image has equivalent semantics to loading the Scheme source file into a
virgin Scheme interpreter and then terminating its execution. The
chief limitation is that it is not possible to LOAD or EVAL new
expressions or procedure definitions into a running program after
compilation. In return for this limitation, Stalin does substantial
global compile-time analysis of the source program under this closed-
world assumption and produces executable images that are small, stand-
alone, and fast.
Stalin incorporates numerous strategies for generating efficient code.
Among them, Stalin does global static type analysis using a soft type
system that supports recursive union types. Stalin can determine a
narrow or even monomorphic type for each source code expression in
arbitrary Scheme programs with no type declarations. This allows
Stalin to reduce, or often eliminate, run-time type checking and
dispatching. Stalin also does low-level representation selection on a
per-expression basis. This allows the use of unboxed base machine data
representations for all monomorphic types resulting in extremely high-
performance numeric code. Stalin also does global static life-time
analysis for all allocated data. This allows much temporary allocated
storage to be reclaimed without garbage collection. Finally, Stalin
has very efficient strategies for compiling closures. Together, these
compilation techniques synergistically yield efficient object code.
Furthermore, the executable images created by Stalin do not contain
(user-defined or library) procedures that aren’t called, variables and
parameters that aren’t used, and expressions that cannot be reached.
This encourages a programming style whereby one creates and uses very
general library procedures without fear that executable images will
suffer from code bloat.
OPTIONS
-version
Prints the version of Stalin and exits immediately.
The following options control preprocessing:
-I Specifies the directories to search for Scheme include files.
This option can be repeated to specify multiple directories.
Stalin first searches for include files in the current
directory, then each of the directories specified in the command
line, and finally in the default installation include directory.
-s Includes the macros from the Scheme->C compatibility library.
Currently, this defines the WHEN and UNLESS syntax.
-x Includes the macros from the Xlib and GL library. Currently,
this defines the FOREIGN-FUNCTION and FOREIGN-DEFINE syntax.
This implies -s.
-q Includes the macros from the QobiScheme library. Currently,
this defines the DEFINE-STRUCTURE syntax, among other things.
This implies -x.
-t Includes the macros needed to compile Stalin with itself. This
implies -q.
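The include search order and the chain of implied macro options can be
sketched as follows. This is an illustrative Python model, not Stalin
code. All function names are hypothetical. It assumes that -I
directories are searched in command-line order, and it takes the
default directory from the FILES section.

```python
from pathlib import Path

# Default include directory as listed under FILES.
DEFAULT_INCLUDE_DIR = Path("/usr/local/stalin/include")

# Each macro option implies the one before it: -t => -q => -x => -s.
MACRO_CHAIN = ["-s", "-x", "-q", "-t"]


def include_search_order(cli_dirs, cwd=Path("."),
                         default_dir=DEFAULT_INCLUDE_DIR):
    """Directories in the order the text says they are searched.

    Assumption: -I directories are tried in command-line order.
    """
    return [Path(cwd), *(Path(d) for d in cli_dirs), Path(default_dir)]


def find_include(name, cli_dirs, cwd=Path(".")):
    """Return the first existing match for an include file, or None."""
    for directory in include_search_order(cli_dirs, cwd):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def implied_macro_options(option):
    """Expand a macro option into itself plus everything it implies."""
    index = MACRO_CHAIN.index(option)  # ValueError for any other option
    return MACRO_CHAIN[: index + 1]
```

For example, implied_macro_options("-q") returns the list -s, -x, -q,
which matches the statements that -q implies -x and -x implies -s.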
The following options control the precision of flow analysis:
-treat-all-symbols-as-external
During flow analysis, generate a single abstract external symbol
that is shared among all symbols.
-do-not-treat-all-symbols-as-external
During flow analysis, when processing constant expressions that
contain symbols, generate a new abstract internal symbol for
each distinct symbol constant in the program. This is the
default.
-index-allocated-string-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate strings, generate a new abstract string for
each such expression. This is the default.
-do-not-index-allocated-string-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate strings, generate a single abstract string
that is shared among all such expressions.
Note that there are no versions of the above options for element type
because the element type of a string is always char. Furthermore,
there are no versions of the above options for constant expressions
because there is always only a single abstract constant string.
-index-constant-structure-types-by-slot-types
During flow analysis, when processing constant expressions that
contain structures, generate a new abstract structure for each
set of potential slot types for that structure.
-do-not-index-constant-structure-types-by-slot-types
During flow analysis, when processing constant expressions that
contain structures, generate a single abstract structure that is
shared among all sets of potential slot types for that
structure. This is the default.
-index-constant-structure-types-by-expression
During flow analysis, when processing constant expressions that
contain structures, generate a new abstract structure for each
such expression. This is the default.
-do-not-index-constant-structure-types-by-expression
During flow analysis, when processing constant expressions that
contain structures, generate a single abstract structure that is
shared among all such expressions.
-index-allocated-structure-types-by-slot-types
During flow analysis, when processing procedure-call expressions
that can allocate structures, generate a new abstract structure
for each set of potential slot types for that structure.
-do-not-index-allocated-structure-types-by-slot-types
During flow analysis, when processing procedure-call expressions
that can allocate structures, generate a single abstract
structure that is shared among all sets of potential slot types
for that structure. This is the default.
-index-allocated-structure-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate structures, generate a new abstract structure
for each such expression. This is the default.
-do-not-index-allocated-structure-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate structures, generate a single abstract
structure that is shared among all such expressions.
Note that, currently, pairs are the only kind of structure that can
appear in constant expressions. This may change in the future, if the
reader is extended to support other kinds of structures.
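The effect of the structure indexing options on the number of abstract
structures can be illustrated with a toy model. The Python below is
not Stalin's flow analysis. It assumes a single structure type (pairs),
fixed slot types per allocation, and that the two kinds of indexing
combine by forming a compound key. All names are hypothetical.

```python
from typing import NamedTuple


class Allocation(NamedTuple):
    """One place where a structure is created (toy model)."""
    expression_id: int  # which source expression creates it
    slot_types: tuple   # potential types of its slots, fixed here


def abstract_key(alloc, by_expression, by_slot_types):
    """Key of the abstract structure that an allocation maps to.

    Allocations with equal keys share one abstract structure.
    Assumption: both kinds of indexing combine into a compound key.
    """
    key = []
    if by_expression:
        key.append(alloc.expression_id)
    if by_slot_types:
        key.append(alloc.slot_types)
    return tuple(key)  # () means one structure shared by everything


def count_abstract_structures(allocs, by_expression=True,
                              by_slot_types=False):
    """Number of distinct abstract structures; defaults follow the text."""
    return len({abstract_key(a, by_expression, by_slot_types)
                for a in allocs})


if __name__ == "__main__":
    allocs = [
        Allocation(1, ("integer", "null")),
        Allocation(2, ("integer", "null")),
        Allocation(3, ("string", "null")),
    ]
    print(count_abstract_structures(allocs))                       # 3
    print(count_abstract_structures(allocs, by_expression=False))  # 1
```

More abstract structures generally mean a more precise but more
expensive analysis; the same trade-off applies to the string and vector
variants of these options.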
-index-constant-headed-vector-types-by-element-type
During flow analysis, when processing constant expressions that
contain headed vectors, generate a new abstract headed vector
for each potential element type for that headed vector.
-do-not-index-constant-headed-vector-types-by-element-type
During flow analysis, when processing constant expressions that
contain headed vectors, generate a single abstract headed vector
that is shared among all potential element types for that headed
vector. This is the default.
-index-constant-headed-vector-types-by-expression
During flow analysis, when processing constant expressions that
contain headed vectors, generate a new abstract headed vector
for each such expression. This is the default.
-do-not-index-constant-headed-vector-types-by-expression
During flow analysis, when processing constant expressions that
contain headed vectors, generate a single abstract headed vector
that is shared among all such expressions.
-index-allocated-headed-vector-types-by-element-type
During flow analysis, when processing procedure-call expressions
that can allocate headed vectors, generate a new abstract headed
vector for each potential element type for that headed vector.
-do-not-index-allocated-headed-vector-types-by-element-type
During flow analysis, when processing procedure-call expressions
that can allocate headed vectors, generate a single abstract
headed vector that is shared among all potential element types
for that headed vector. This is the default.
-index-allocated-headed-vector-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate headed vectors, generate a new abstract headed
vector for each such expression. This is the default.
-do-not-index-allocated-headed-vector-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate headed vectors, generate a single abstract
headed vector that is shared among all such expressions.
-index-constant-nonheaded-vector-types-by-element-type
During flow analysis, when processing constant expressions that
contain nonheaded vectors, generate a new abstract nonheaded
vector for each potential element type for that nonheaded
vector.
-do-not-index-constant-nonheaded-vector-types-by-element-type
During flow analysis, when processing constant expressions that
contain nonheaded vectors, generate a single abstract nonheaded
vector that is shared among all potential element types for that
nonheaded vector. This is the default.
-index-constant-nonheaded-vector-types-by-expression
During flow analysis, when processing constant expressions that
contain nonheaded vectors, generate a new abstract nonheaded
vector for each such expression. This is the default.
-do-not-index-constant-nonheaded-vector-types-by-expression
During flow analysis, when processing constant expressions that
contain nonheaded vectors, generate a single abstract nonheaded
vector that is shared among all such expressions.
-index-allocated-nonheaded-vector-types-by-element-type
During flow analysis, when processing procedure-call expressions
that can allocate nonheaded vectors, generate a new abstract
nonheaded vector for each potential element type for that
nonheaded vector.
-do-not-index-allocated-nonheaded-vector-types-by-element-type
During flow analysis, when processing procedure-call expressions
that can allocate nonheaded vectors, generate a single abstract
nonheaded vector that is shared among all potential element
types for that nonheaded vector. This is the default.
-index-allocated-nonheaded-vector-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate nonheaded vectors, generate a new abstract
nonheaded vector for each such expression. This is the default.
-do-not-index-allocated-nonheaded-vector-types-by-expression
During flow analysis, when processing procedure-call expressions
that can allocate nonheaded vectors, generate a single abstract
nonheaded vector that is shared among all such expressions.
Note that, currently, constant expressions cannot contain nonheaded
vectors and nonheaded vectors are never allocated by any procedure-call
expression. ARGV is the only nonheaded vector. These options are
included only for completeness and in case future extensions to the
language allow nonheaded vector constants and procedures that allocate
nonheaded vectors.
-no-clone-size-limit
Allow unlimited polyvariance, i.e. make copies of procedures of
any size.
-clone-size-limit
Specify the polyvariance limit, i.e. make copies of procedures
that have fewer than this many expressions. Must be a
nonnegative integer. Defaults to 80. Specify 0 to disable
polyvariance.
-split-even-if-no-widening
Normally, polyvariance will make a copy of a procedure only if
it is called with arguments of different types. Specify this
option to make copies of procedures even when they are called
with arguments of the same type. This will allow them to be in-
lined.
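The interaction of -clone-size-limit, -no-clone-size-limit, and
-split-even-if-no-widening can be summarized as a predicate. This
Python sketch only restates the rules given above. The name may_clone
is hypothetical, and it assumes that the size limit still applies when
-split-even-if-no-widening is specified.

```python
DEFAULT_CLONE_SIZE_LIMIT = 80  # default stated for -clone-size-limit


def may_clone(num_expressions, arg_types_differ,
              clone_size_limit=DEFAULT_CLONE_SIZE_LIMIT,
              split_even_if_no_widening=False):
    """Toy predicate: is a procedure eligible to be copied?

    clone_size_limit=None models -no-clone-size-limit.
    clone_size_limit=0 disables polyvariance entirely.
    """
    if clone_size_limit is not None and num_expressions >= clone_size_limit:
        return False  # only procedures with fewer expressions are copied
    # Normally a copy is made only for differing argument types.
    return arg_types_differ or split_even_if_no_widening
```

With the default limit, a procedure of 79 expressions called with
differing argument types is eligible, while one of 80 expressions is
not.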
-fully-convert-to-CPS
Normally, lightweight CPS conversion is applied, converting only
those expressions and procedures needed to support escaping
continuations. When this option is specified, the program is
fully converted to CPS.
-no-escaping-continuations
Normally, full continuations are supported. When this option is
specified, the only continuations that are supported are those
that cannot be called after the procedure that created the
continuation has returned.
-du Normally, after flow analysis, Stalin forces each type set to
have at most one structure-type member of a given name, at most
one headed-vector-type member, and at most one nonheaded-vector-
type member. This option disables this, allowing type sets to
have multiple structure-type members of a given name, multiple
headed-vector-type members, and multiple nonheaded-vector-type
members. Sometimes yields more efficient code and sometimes
yields less efficient code.
The following options control the amount of run-time error-checking
code generated. Note that, independent of the settings of these
options, Stalin will always generate code that obeys the semantics of
the Scheme language for correct programs. These options only control
the level of safety, that is the degree of run-time error checking for
incorrect programs.
-Ob Specifies that code to check for out-of-bound vector or string
subscripts is to be suppressed. If not specified, a run-time
error will be issued if a vector or string subscript is out of
bounds. If specified, the behavior of programs that have an
out-of-bound vector or string subscript is undefined.
-Om Specifies that code to check for out-of-memory errors is to be
suppressed. If not specified, a run-time error will be issued
if sufficient memory cannot be allocated. If specified, the
behavior of programs that run out of memory is undefined.
-On Specifies that code to check for exact integer overflow is to be
suppressed. If not specified, a run-time error will be issued
on exact integer overflow. If specified, the behavior of
programs that cause exact integer overflow is undefined.
Currently, Stalin does not know how to generate overflow
checking code so this option must be specified.
-Or Specifies that code to check for various run-time file-system
errors is to be suppressed. If not specified, a run-time error
will be issued when an unsuccessful attempt is made to open or
close a file. If specified, the behavior of programs that make
such unsuccessful file-access attempts is undefined.
-Ot Specifies that code to check that primitive procedures are
passed arguments of the correct type is to be suppressed. If not
specified, a run-time error will be issued if a primitive
procedure is called with arguments of the wrong type. If
specified, the behavior of programs that call a primitive
procedure with data of the wrong type is undefined.
The following options control the verbosity of the compiler:
-d0 Produces a compile-time backtrace upon a compiler error.
-d1 Produces commentary during compilation describing what the
compiler is doing.
-d2 Produces a decorated listing of the source program after flow
analysis.
-d3 Produces a decorated listing of the source program after
equivalent types have been merged.
-d4 Produces a call graph of the source program.
-d5 Produces a description of all nontrivial native procedures
generated.
-d6 Produces a list of all expressions and closures that allocate
storage along with a description of where that storage is
allocated.
-d7 Produces a trace of the lightweight closure-conversion process.
-closure-conversion-statistics
Produces a summary of the closure-conversion statistics. These
are automatically processed by the program bcl-to-latex.sc which
is run by the bcl-benchmark script (both in the
/usr/local/stalin/benchmarks directory) to produce tables II,
III, and IV of the paper Flow-Directed Lightweight Closure
Conversion.
The following options control the storage management strategy used by
compiled code:
-dc Disables the use of alloca(3). Normally, the compiler will use
alloca(3) to allocate on the call stack when possible.
-dC Disables the use of the Boehm conservative garbage collector.
Normally, the compiler will use the Boehm collector to allocate
data whose lifetime is not known to be short. Note that the
compiler will still use the Boehm collector for some data if it
cannot allocate that data on the stack or on a region.
-dH Disables the use of regions for allocating data.
-dg Generate code to produce diagnostic messages when region
segments are allocated and freed.
-dh Disables the use of expandable regions and uses fixed-size
regions instead.
The following options control code generation:
-d Specifies that inexact reals are represented as C doubles.
Normally, inexact reals are represented as C floats.
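To give a feel for the precision difference between the two
representations, the following Python sketch rounds values to single
precision. It assumes that a C float is a 32-bit IEEE 754 value and a
C double a 64-bit one, which this page does not state. The helper name
as_c_float is hypothetical.

```python
import struct


def as_c_float(x: float) -> float:
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    # 0.5 survives unchanged; 0.1 and 2**24 + 1 lose precision.
    for value in (0.5, 0.1, 16777217.0):
        print(value, "->", as_c_float(value))
```

Programs that depend on more than about seven significant decimal
digits in inexact reals are candidates for -d.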
-architecture
Specify the architecture for which to generate code. The
default is to generate code for whatever architecture the
compiler is run on. Currently, the known architectures are
IA32, IA32-align-double, SPARC, SPARCv9, SPARC64, MIPS, Alpha,
ARM, M68K, PowerPC, and S390.
-baseline
Do not perform lightweight closure conversion. Closures are
created for all procedures. The user would not normally specify
this option. It is only intended to measure the effectiveness
of lightweight closure conversion. It is used by the
bcl-benchmark script (in the /usr/local/stalin/benchmarks directory)
to produce tables II, III, and IV of the paper Flow-Directed
Lightweight Closure Conversion.
-conventional
Perform a simplified version of lightweight closure conversion
that does not rely on interprocedural analysis. Attempts to
mimic what ‘conventional’ compilers do (whatever that is). The
user would not normally specify this option. It is only
intended to measure the effectiveness of lightweight closure
conversion. It is used by the bcl-benchmark script (in the
/usr/local/stalin/benchmarks directory) to produce tables II,
III, and IV of the paper Flow-Directed Lightweight Closure
Conversion.
-lightweight
Perform lightweight closure conversion. This is the default.
-immediate-flat
Generate code using immediate flat closures. This is not (yet)
implemented.
-indirect-flat
Generate code using indirect flat closures. This is not (yet)
implemented.
-immediate-display
Generate code using immediate display closures.
-indirect-display
Generate code using indirect display closures. This is not
(yet) implemented.
-linked
Generate code using linked closures. This is the default.
-align-strings
Align all strings to fixnum alignment. This will not work when
foreign procedures return strings that are not aligned to fixnum
alignment. It will also not work when ARGV is used, since the
strings in ARGV are also not aligned to fixnum alignment. This
is the default.
-do-not-align-strings
Do not align strings to fixnum alignment. This must be
specified when strings returned by foreign procedures are not
aligned to fixnum alignment.
-de Enables the compiler optimization known as EQ? forgery.
Sometimes yields more efficient code and sometimes yields less
efficient code.
-df Disables the compiler optimization known as forgery.
-dG Pass arguments using global variables instead of parameters
whenever possible.
-di Generate if statements instead of switch statements for
dispatching.
-dI Enables the use of immediate structures.
-dp Enables representation promotion. Promotes some type sets from
squeezed to squished or squished to general if this will
decrease the amount of run-time branching or dispatching
representation coercions. Sometimes yields more efficient code
and sometimes yields less efficient code.
-dP Enables copy propagation. Sometimes yields more efficient code
and sometimes yields less efficient code.
-ds Disables the compiler optimization known as squeezing.
-dS Disables the compiler optimization known as squishing.
-Tmk Enables generation of code that works with the Treadmarks
distributed-shared-memory package. Currently this option is not
fully implemented and is not known to work.
-no-tail-call-optimization
By default, Stalin generates code that is properly tail
recursive in all but the rarest of circumstances. Appropriate
options can coerce it into generating properly tail-recursive
code in all circumstances. Some tail-recursive
calls, those where the call site is in-lined in the target, are
translated as C goto statements and always result in properly
tail-recursive code. The rest are translated as C function
calls in tail position. This relies on the C compiler to
perform tail-call optimization. gcc(1) versions 2.96 and 3.0.2
(and perhaps other versions) perform tail-call optimization on
IA32 (and perhaps other architectures) when
-foptimize-sibling-calls is specified. (-O2 implies
-foptimize-sibling-calls.)
gcc(1) only performs tail-call optimization on IA32 in certain
circumstances. First, the target and the call site must have
compatible signatures. To guarantee compatible signatures,
Stalin passes parameters to C functions that are part of tail-
recursive loops in global variables. Second, the target must
not be declared __attribute__ ((noreturn)). Thus Stalin will
not generate a __attribute__ ((noreturn)) declaration for a
function that is part of a tail-recursive loop even if Stalin
knows that it never returns. Third, the function containing the
call site cannot call alloca(3). gcc(1) does no flow analysis.
Any call to alloca(3) in the function containing the call site,
no matter whether the allocated data escapes, will disable tail-
call optimization. Thus Stalin disables stack allocation of
data in any procedure in-lined in a procedure that is part of a
tail-recursive loop. Finally, the call site cannot contain a
reentrant region because reentrant regions are freed upon
procedure exit and a tail call would require an intervening
region reclamation. Thus Stalin disables allocation of data on
a reentrant region in any procedure that is part of a tail-
recursive loop. Disabling these optimizations incurs a cost for
the benefit of achieving tail-call optimization. If your C
compiler does not perform tail-call optimization then you may
wish not to pay the cost. The -no-tail-call-optimization option
causes Stalin not to take the four measures above to generate
code on which gcc(1) would perform tail-call optimization. Even
when specifying this option, Stalin still translates calls,
where the call site is in-lined in the target, as C goto
statements. There are three rare occasions that can still foil
proper tail recursion. First, if you specify -dC you may force
Stalin to use stack or region allocation even in a tail-call
cycle. You can avoid this by not specifying -dC. Second,
gcc(1) will not perform tail-call optimization when the function
containing the call site applies unary & to a local variable.
gcc(1) does no flow analysis. Any application of unary & to a
local variable in the function containing the call site, no
matter whether the pointer escapes, will disable tail-call
optimization. Stalin can generate such uses of unary & when you
specify -de or don’t specify -df. You can avoid such cases by
specifying -df and not specifying -de. Finally, gcc(1) will not
perform tail-call optimization when the function containing the
call site calls setjmp(3). gcc(1) does no flow analysis. Any
call to setjmp(3) in the function containing the call site, no
matter whether the jmp_buf escapes, will disable tail-call
optimization. Stalin translates certain calls to
call-with-current-continuation as calls to setjmp(3). You can force
Stalin not to do so by specifying -fully-convert-to-CPS. Stalin
will generate a warning in the first and third cases, namely,
when tail-call optimization is foiled by reentrant-region
allocation or calls to alloca(3) or setjmp(3). So you can hold
off on specifying -fully-convert-to-CPS, or on dropping -dC,
until you see such warnings. No such warning is
generated, however, when uses of unary & foil tail-call
optimization. So you might want to always specify -df and
refrain from specifying -de if you desire your programs to be
properly tail recursive.
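The three rare cases described above can be condensed into a small
checklist. The Python function below is a reading aid rather than part
of Stalin, and its name is hypothetical. It reports which cases remain
possible for a given set of options, without knowing whether a
particular program actually triggers them, and it does not model
-no-tail-call-optimization.

```python
def tail_call_hazards(options):
    """List what the text says can still foil proper tail recursion.

    `options` is an iterable of Stalin option strings. Returns
    (description, warned) pairs, where `warned` tells whether the
    text says Stalin issues a warning for that case.
    """
    opts = set(options)
    hazards = []
    if "-dC" in opts:
        hazards.append(
            ("stack or region allocation in a tail-call cycle", True))
    if "-de" in opts or "-df" not in opts:
        hazards.append(
            ("unary & applied to a local variable", False))
    if "-fully-convert-to-CPS" not in opts:
        hazards.append(
            ("setjmp(3) used for call-with-current-continuation", True))
    return hazards


if __name__ == "__main__":
    for description, warned in tail_call_hazards(["-On"]):
        note = "warning issued" if warned else "no warning"
        print(f"{description} ({note})")
```

Only the combination of -df and -fully-convert-to-CPS, without -dC and
-de, yields an empty list.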
The following options control the C-compilation phase:
-db Disables the production of a database file.
-c Specifies that the C compiler is not to be called after
generating the C code. Normally, the C compiler is called after
generating the C code to produce an executable image. This
implies -k.
-k Specifies that the generated C file is not to be deleted.
Normally, the generated C file is deleted after it is compiled.
-cc Specifies the C compiler to use. Defaults to gcc(1).
-copt Specifies the options that the C compiler is to be called with.
Normally the C compiler is called without any options. This
option can be repeated to allow passing multiple options to the
C compiler.
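As a usage illustration, the following Python sketch assembles a
command line from the options documented above. It is a hypothetical
wrapper, not something shipped with Stalin. It assumes that the
[[a|b]] groups in the SYNOPSIS denote mutually exclusive alternatives,
checks only a few of those groups, and emits arguments in the order
the SYNOPSIS lists them, provided the caller passes options in that
order.

```python
import shlex

# A subset of the [[a|b|...]] groups shown in the SYNOPSIS.
# Assumption: the members of each group are mutually exclusive.
EXCLUSIVE_GROUPS = [
    {"-s", "-x", "-q", "-t"},
    {"-baseline", "-conventional", "-lightweight"},
    {"-align-strings", "-do-not-align-strings"},
    {"-fully-convert-to-CPS", "-no-escaping-continuations"},
]


def build_stalin_command(pathname, options=(), include_dirs=(),
                         c_compiler=None, c_options=()):
    """Assemble an argv list for stalin from documented options."""
    chosen = set(options)
    for group in EXCLUSIVE_GROUPS:
        clash = sorted(chosen & group)
        if len(clash) > 1:
            raise ValueError("mutually exclusive options: " + " ".join(clash))
    argv = ["stalin"]
    for directory in include_dirs:
        argv += ["-I", directory]   # -I may be repeated
    argv += list(options)
    if c_compiler is not None:
        argv += ["-cc", c_compiler]
    for opt in c_options:
        argv += ["-copt", opt]      # one -copt per C-compiler option
    argv.append(pathname)           # pathname comes last in the synopsis
    return argv


if __name__ == "__main__":
    argv = build_stalin_command("hello", options=["-On", "-d"],
                                c_options=["-O2"])
    print(shlex.join(argv))  # stalin -On -d -copt -O2 hello
```

The example passes -On because the text states that it must currently
be specified, and -copt -O2 because -O2 implies
-foptimize-sibling-calls.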
FILES
/usr/local/stalin/include/
default directory for Scheme include files and library archive
files
/usr/local/stalin/include/Scheme-to-C-compatibility.sc
include file for Scheme->C compatibility
/usr/local/stalin/include/QobiScheme.sc
include file for QobiScheme
/usr/local/stalin/include/xlib.sc
include file for Xlib FPI
/usr/local/stalin/include/xlib-original.sc
include file for Xlib FPI
/usr/local/stalin/include/libstalin.a
library archive for Xlib FPI
/usr/local/stalin/include/gc.h
include file for the Boehm conservative garbage collector
/usr/local/stalin/include/libgc.a
library archive for the Boehm conservative garbage collector
/usr/local/stalin/include/stalin.architectures
the known architectures and their code-generation parameters
/usr/local/stalin/include/stalin-architecture-name
shell script that determines the architecture on which Stalin is
running
/usr/local/stalin/stalin-architecture.c
program to construct a new entry for stalin.architectures with
the code-generation parameters for the machine on which it is run
/usr/local/stalin/benchmarks
directory containing benchmarks from the paper Flow-Directed
Lightweight Closure Conversion
/usr/local/stalin/benchmarks/bcl-benchmark
script for producing tables II, III, and IV from the paper
Flow-Directed Lightweight Closure Conversion
/usr/local/stalin/benchmarks/bcl-to-latex.sc
Scheme program for producing tables II, III, and IV from the
paper Flow-Directed Lightweight Closure Conversion
SEE ALSO
sci(2), scc(2), gcc(1), ld(1), alloca(3), setjmp(3), gc(8)
BUGS
Version 0.11 is an alpha release and contains many known bugs. Not
everything is fully implemented. Bug mail should be addressed to
Bug-Stalin@AI.MIT.EDU and not to the author. Please include the version
number (0.11) in the message. Periodic announcements of bug fixes,
enhancements, and new releases will be made to Info-Stalin@AI.MIT.EDU.
Send mail to Info-Stalin-Request@AI.MIT.EDU to be added to the
Info-Stalin@AI.MIT.EDU mailing list.
AUTHOR
Jeffrey Mark Siskind
THANKS
Rob Browning packaged version 0.11 for Debian Linux.