NAME
lmbench - benchmarking toolbox
SYNOPSIS
#include "lmbench.h"
typedef u_long iter_t
typedef void (*benchmp_f)(iter_t iterations, void* cookie)
void benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f
cleanup, int enough, int parallel, int warmup, int repetitions, void*
cookie)
uint64 get_n()
void milli(char *s, uint64 n)
void micro(char *s, uint64 n)
void nano(char *s, uint64 n)
void mb(uint64 bytes)
void kb(uint64 bytes)
DESCRIPTION
Creating benchmarks using the lmbench timing harness is easy. Since it
is so easy to measure performance using lmbench, it is possible to
quickly answer questions that arise during system design, development,
or tuning. For example, image processing
There are two attributes that are critical for performance, latency and
bandwidth, and lmbench's timing harness makes it easy to measure and
report results for both. Latency is usually important for frequently
executed operations, and bandwidth is usually important when moving
large chunks of data.
There are a number of factors to consider when building benchmarks.
The timing harness requires that the benchmarked operation be
idempotent so that it can be repeated indefinitely.
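The idempotence requirement can be illustrated with a small sketch. It does not use lmbench; the fixed-size stack and the names push, pop, run_push_only, and run_push_pop are hypothetical and exist only for this illustration. A body that only pushes stops working once the stack fills up, while a body that pairs each push with a pop leaves the state unchanged and can be repeated any number of times.

```c
#include <stddef.h>

#define STACK_CAP 16

/* Hypothetical fixed-size stack used only for this illustration. */
struct stack {
    int items[STACK_CAP];
    size_t depth;
};

static int push(struct stack *s, int value)
{
    if (s->depth == STACK_CAP)
        return -1;              /* full: the operation fails */
    s->items[s->depth++] = value;
    return 0;
}

static int pop(struct stack *s)
{
    if (s->depth == 0)
        return -1;              /* empty: nothing to remove */
    s->depth--;
    return 0;
}

/* Not idempotent: each iteration changes the state, so the body
 * starts failing after STACK_CAP iterations.
 * Returns the number of iterations that succeeded. */
unsigned long run_push_only(struct stack *s, unsigned long iterations)
{
    unsigned long ok = 0;

    while (iterations-- > 0)
        if (push(s, 1) == 0)
            ok++;
    return ok;
}

/* Idempotent: each iteration restores the state it started with,
 * so the body can be repeated indefinitely.
 * Returns the number of iterations that succeeded. */
unsigned long run_push_pop(struct stack *s, unsigned long iterations)
{
    unsigned long ok = 0;

    while (iterations-- > 0)
        if (push(s, 1) == 0 && pop(s) == 0)
            ok++;
    return ok;
}
```

Starting from an empty stack, run_push_only succeeds only STACK_CAP times regardless of the iteration count, whereas run_push_pop succeeds on every iteration and leaves the stack empty.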
The timing subsystem, benchmp, is passed up to three function pointers.
Some benchmarks may need as few as one function pointer (for
benchmark).
void benchmp(initialize, benchmark, cleanup, enough, parallel, warmup,
repetitions, cookie)
measures the performance of benchmark repeatedly and reports the
median result. benchmp creates parallel sub-processes, which run
benchmark concurrently. This allows lmbench to measure the
system’s ability to scale as the number of client processes
increases. Before starting the benchmarking cycle, each sub-process
calls initialize with iterations set to 0. It then calls
initialize, benchmark, and cleanup, with iterations set to the
number of iterations in the timing loop, several times in order
to collect repetitions results. The calls to benchmark are
surrounded by start and stop calls that time how long it
takes to perform the benchmarked operation iterations times. After
all the benchmark results have been collected, cleanup is called
with iterations set to 0 to clean up any resources that may have
been allocated by initialize or benchmark. cookie is a void
pointer to a block of memory that can be used to store any
parameters or state needed by the benchmark.
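The call sequence described above can be sketched as a simplified, single-process stand-in. This is not lmbench's implementation. The names run_sketch, bench_f, struct call_log, and the log_* callbacks are hypothetical. The sketch also assumes that the iteration count is supplied by the caller and that clock() is an adequate timer, neither of which is stated in this page.

```c
#include <stdlib.h>
#include <time.h>

typedef unsigned long iter_t;   /* stand-in for lmbench's iter_t */
typedef void (*bench_f)(iter_t iterations, void *cookie);

/* Counts callback invocations, split by iterations == 0 or not. */
struct call_log {
    int init_zero, init_n, bench_n, clean_n, clean_zero;
};

void log_initialize(iter_t iterations, void *cookie)
{
    struct call_log *calls = cookie;
    if (iterations == 0) calls->init_zero++; else calls->init_n++;
}

void log_benchmark(iter_t iterations, void *cookie)
{
    struct call_log *calls = cookie;
    volatile unsigned long sink = 0;

    while (iterations-- > 0)
        sink++;                 /* placeholder for the measured work */
    calls->bench_n++;
}

void log_cleanup(iter_t iterations, void *cookie)
{
    struct call_log *calls = cookie;
    if (iterations == 0) calls->clean_zero++; else calls->clean_n++;
}

static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical stand-in for the benchmp() call sequence.
 * Returns the median time per iteration in seconds, or -1.0 on error.
 * For an even number of repetitions the upper middle value is used. */
double run_sketch(bench_f initialize, bench_f benchmark, bench_f cleanup,
                  iter_t iterations, int repetitions, void *cookie)
{
    double *results, median;
    int i;

    if (benchmark == NULL || iterations == 0 || repetitions <= 0)
        return -1.0;
    results = malloc((size_t)repetitions * sizeof *results);
    if (results == NULL)
        return -1.0;

    if (initialize) initialize(0, cookie);      /* once, before the cycle */
    for (i = 0; i < repetitions; i++) {
        clock_t start, stop;

        if (initialize) initialize(iterations, cookie);
        start = clock();
        benchmark(iterations, cookie);          /* only this call is timed */
        stop = clock();
        if (cleanup) cleanup(iterations, cookie);
        results[i] = (double)(stop - start) / CLOCKS_PER_SEC
                     / (double)iterations;
    }
    if (cleanup) cleanup(0, cookie);            /* once, after all results */

    qsort(results, (size_t)repetitions, sizeof *results, compare_doubles);
    median = results[repetitions / 2];
    free(results);
    return median;
}
```

Running run_sketch with the log_* callbacks and five repetitions leaves the counters at one call each with iterations set to 0 and five calls each with a non-zero iteration count, which mirrors the order given in the description.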
void* benchmp_getstate()
returns a void pointer to the lmbench-internal state used during
benchmarking. The state is not to be used or accessed directly
by clients, but rather would be passed into benchmp_interval.
iter_t benchmp_interval(void* state)
returns the number of times the benchmark should execute its
benchmark loop during this timing interval. This is used only
for unusual benchmarks that cannot implement the benchmark body
in a function that can return, such as the page fault handler.
Please see lat_sig.c for sample usage.
uint64 get_n()
returns the number of times loop_body was executed during the
timing interval.
void milli(char *s, uint64 n)
print out the time per operation in milli-seconds. n is the
number of operations during the timing interval, which is passed
as a parameter because each loop_body can contain several
operations.
void micro(char *s, uint64 n)
print the time per operation in micro-seconds.
void nano(char *s, uint64 n)
print the time per operation in nano-seconds.
void mb(uint64 bytes)
print the bandwidth in megabytes per second.
void kb(uint64 bytes)
print the bandwidth in kilobytes per second.
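The arithmetic behind these reporting routines can be sketched as follows. The helpers time_per_op and bandwidth are hypothetical and are not part of lmbench. The sketch assumes the elapsed time of the interval is available in seconds and that a megabyte is 1024 * 1024 bytes, matching the MB macro used in the examples on this page; the units lmbench itself uses are not stated here.

```c
typedef unsigned long long u64;     /* stand-in for lmbench's uint64 */

/* Time per operation, expressed in the unit selected by
 * units_per_second: 1e3 for milliseconds, 1e6 for microseconds,
 * 1e9 for nanoseconds. n is the total number of operations, which can
 * exceed the loop count when one loop body performs several
 * operations. Returns 0.0 when n is 0. */
double time_per_op(double elapsed_seconds, u64 n, double units_per_second)
{
    if (n == 0)
        return 0.0;
    return elapsed_seconds * units_per_second / (double)n;
}

/* Bandwidth in units per second, where bytes_per_unit is
 * 1024.0 * 1024.0 for megabytes or 1024.0 for kilobytes (assumed).
 * Returns 0.0 when no time has elapsed. */
double bandwidth(double elapsed_seconds, u64 bytes, double bytes_per_unit)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return (double)bytes / bytes_per_unit / elapsed_seconds;
}
```

For example, 1000 operations in 2 seconds is 2000 microseconds per operation, and copying 8 megabytes in 2 seconds is 4 megabytes per second.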
USING lmbench
Here is an example of a simple benchmark that measures the latency of
the random number generator lrand48():
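#include "lmbench.h"

/* Benchmark body: call lrand48() once per iteration. */
void
benchmark_lrand48(iter_t iterations, void* cookie) {
    while (iterations-- > 0)
        lrand48();
}

int
main(int argc, char *argv[])
{
    /* No initialize or cleanup function; one process, no warmup. */
    benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES,
        NULL);
    /* Report the time per lrand48() call in micro-seconds. */
    micro("lrand48()", get_n());
    exit(0);
}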
Here is a simple benchmark that measures and reports the bandwidth of
bcopy:
#include "lmbench.h"

#define MB (1024 * 1024)
#define SIZE (8 * MB)

/* Parameters and buffers shared by the callbacks through cookie. */
struct _state {
    int size;
    char* a;
    char* b;
};

/* Allocate the source and destination buffers. */
void
initialize_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    /* Nothing to do for the call with iterations set to 0. */
    if (!iterations) return;
    state->a = malloc(state->size);
    state->b = malloc(state->size);
    if (state->a == NULL || state->b == NULL)
        exit(1);
}

/* Benchmark body: copy size bytes once per iteration. */
void
benchmark_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    while (iterations-- > 0)
        bcopy(state->a, state->b, state->size);
}

/* Release the buffers allocated by initialize_bcopy. */
void
cleanup_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    if (!iterations) return;
    free(state->a);
    free(state->b);
}

int
main(int argc, char *argv[])
{
    struct _state state;

    state.size = SIZE;
    benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
        0, 1, 0, TRIES, &state);
    /* Total bytes moved: copies performed times bytes per copy. */
    mb(get_n() * state.size);
    exit(0);
}
A slightly more complex version of the bcopy benchmark might measure
bandwidth as a function of memory size and parallelism. The main
procedure in this case might look something like this:
int
main(int argc, char *argv[])
{
    int size, par;
    struct _state state;

    for (size = 64; size <= SIZE; size <<= 1) {
        for (par = 1; par < 32; par <<= 1) {
            state.size = size;
            benchmp(initialize_bcopy, benchmark_bcopy,
                cleanup_bcopy, 0, par, 0, TRIES, &state);
            /* Print size and parallelism ahead of the bandwidth. */
            fprintf(stderr, "%d\t%d\t", size, par);
            mb(par * get_n() * state.size);
        }
    }
    exit(0);
}
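The parameter sweep performed by the nested loops above can be sketched on its own, without lmbench, to show which combinations are measured. The function sweep_points and the struct sweep_point are hypothetical names introduced for this illustration. With sizes doubling from 64 bytes to 8 megabytes and parallelism doubling from 1 while it stays below 32, the sweep covers 18 sizes and 5 parallelism levels, for 90 benchmp runs in total.

```c
#include <stddef.h>

/* One (size, parallelism) combination visited by the sweep. */
struct sweep_point {
    int size;
    int par;
};

/* Enumerate the combinations visited by
 *   for (size = min_size; size <= max_size; size <<= 1)
 *     for (par = 1; par < par_limit; par <<= 1)
 * Stores at most cap points in out (out may be NULL when cap is 0)
 * and returns the total number of combinations. */
size_t sweep_points(int min_size, int max_size, int par_limit,
                    struct sweep_point *out, size_t cap)
{
    size_t n = 0;
    int size, par;

    if (min_size <= 0)
        return 0;
    for (size = min_size; size <= max_size; size *= 2) {
        for (par = 1; par < par_limit; par *= 2) {
            if (n < cap) {
                out[n].size = size;
                out[n].par = par;
            }
            n++;
            if (par > par_limit / 2)
                break;          /* doubling again would reach the limit */
        }
        if (size > max_size / 2)
            break;              /* doubling again would pass max_size */
    }
    return n;
}
```

Calling sweep_points(64, 8 * 1024 * 1024, 32, points, 128) fills points in the same order the benchmark loop runs: every parallelism level for the smallest size first, then the next size.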
VARIABLES
There are three environment variables that can be used to modify the
lmbench timing subsystem: ENOUGH, TIMING_O, and LOOP_O.
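This page does not say how these variables are parsed or what units they use. As a generic illustration only, a program can read an integer-valued environment variable with the standard getenv and strtol calls. The helper env_long below is hypothetical, is not part of lmbench, and makes no claim about how lmbench itself interprets ENOUGH, TIMING_O, or LOOP_O.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: return the value of the environment variable
 * name as a long, or fallback when the variable is unset, empty, or
 * not a complete decimal integer. */
long env_long(const char *name, long fallback)
{
    const char *text = getenv(name);
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return fallback;        /* out of range or trailing garbage */
    return value;
}
```

A caller might write env_long("ENOUGH", 0) and treat a result of 0 as "not set", leaving the choice of default to the surrounding program.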
FUTURES
Development of lmbench is continuing.
SEE ALSO
lmbench(8), timing(3), reporting(3), results(3).
AUTHOR
Carl Staelin and Larry McVoy
Comments, suggestions, and bug reports are always welcome.
(c)1998-2000 Larry McVoy and Carl Staelin