lmbench - benchmarking toolbox

NAME

       lmbench - benchmarking toolbox

SYNOPSIS

       #include "lmbench.h"

       typedef u_long iter_t

       typedef void (*benchmp_f)(iter_t iterations, void* cookie)

       void benchmp(benchmp_f   initialize,   benchmp_f  benchmark,  benchmp_f
       cleanup, int enough, int parallel, int warmup, int  repetitions,  void*
       cookie)

       uint64    get_n()

       void milli(char *s, uint64 n)

       void micro(char *s, uint64 n)

       void nano(char *s, uint64 n)

       void mb(uint64 bytes)

       void kb(uint64 bytes)

DESCRIPTION

       Creating benchmarks using the lmbench timing harness is easy.  Since it
       is so easy to measure performance using lmbench,  it  is  possible  to
       quickly  answer questions that arise during system design, development,
       or tuning.  For example, image processing

       Two attributes are critical for performance: latency and bandwidth.
       lmbench's timing harness makes it easy to measure and report results
       for both.  Latency is usually important for frequently executed
       operations, and bandwidth is usually important when moving large
       chunks of data.
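
       As a rough illustration of the two quantities, the sketch below
       computes an average latency and a bandwidth from an elapsed time and
       a count.  The helper names latency_per_op_us and
       bandwidth_bytes_per_sec are hypothetical and are not part of
       lmbench; the only assumption is that the elapsed time is known in
       microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an lmbench function.
 * Average time per operation, in microseconds, given the elapsed
 * time in microseconds and the number of operations completed. */
double latency_per_op_us(double elapsed_us, uint64_t operations)
{
    assert(operations > 0);
    return elapsed_us / (double)operations;
}

/* Hypothetical helper, not an lmbench function.
 * Bandwidth in bytes per second, given the elapsed time in
 * microseconds and the number of bytes moved in that time. */
double bandwidth_bytes_per_sec(double elapsed_us, uint64_t bytes)
{
    assert(elapsed_us > 0.0);
    return (double)bytes / (elapsed_us / 1e6);
}
```

       For instance, 4,000,000 operations in 2 seconds gives 0.5
       microseconds per operation, and 8 MB copied in 1 second gives
       8,388,608 bytes per second.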

       There are a number of factors to consider when building benchmarks.

       The  timing  harness  requires  that  the  benchmarked   operation   be
       idempotent so that it can be repeated indefinitely.

       The timing subsystem, benchmp, is passed up to three function pointers.
       Some  benchmarks  may  need  as  few  as  one  function  pointer   (for
       benchmark).

       void benchmp(initialize,  benchmark, cleanup, enough, parallel, warmup,
       repetitions, cookie)
              measures the performance of benchmark repeatedly and reports the
              median result.  benchmp creates parallel sub-processes which run
              benchmark in parallel.  This allows lmbench to measure the
              system’s ability to scale as the number of client processes
              increases.  Before starting the benchmarking cycle, each
              sub-process calls initialize with iterations set to 0.  It then
              calls initialize, benchmark, and cleanup, each with iterations
              set to the number of iterations in the timing loop, several
              times in order to collect repetitions results.  The calls to
              benchmark are surrounded by start and stop calls that time how
              long it takes to do the benchmarked operation iterations times.
              After all the benchmark results have been collected, cleanup is
              called with iterations set to 0 to clean up any resources which
              may have been allocated by initialize or benchmark.  cookie is
              a void pointer to a hunk of memory that can be used to store
              any parameters or state needed by the benchmark.

       void* benchmp_getstate()
              returns a void pointer to the lmbench-internal state used during
              benchmarking.   The state is not to be used or accessed directly
              by clients, but rather would be passed into benchmp_interval.

       iter_t    benchmp_interval(void* state)
              returns the number of times the  benchmark  should  execute  its
              benchmark  loop  during this timing interval.  This is used only
              for weird benchmarks which cannot implement the  benchmark  body
              in  a function which can return, such as the page fault handler.
              Please see lat_sig.c for sample usage.

       uint64    get_n()
              returns the number of times the benchmarked operation was
              executed during the timing interval.

       void milli(char *s, uint64 n)
              print the time per operation in milliseconds.  n is the number
              of operations during the timing interval, which is passed as a
              parameter because each iteration of the benchmark loop can
              contain several operations.

       void micro(char *s, uint64 n)
              print the time per operation in microseconds.

       void nano(char *s, uint64 n)
              print the time per operation in nanoseconds.

       void mb(uint64 bytes)
              print the bandwidth in megabytes per second.

       void kb(uint64 bytes)
              print the bandwidth in kilobytes per second.
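
       The order in which benchmp is described as calling its three
       callbacks can be sketched with a toy driver.  This is a simplified
       model based only on the description above, not lmbench's
       implementation: it runs in a single process, does no timing, and
       takes the iteration count as a parameter.  The names toy_harness,
       struct trace, and the trace_* callbacks are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned long iter_t;
typedef void (*benchmp_f)(iter_t iterations, void *cookie);

/* Hypothetical cookie: records which callback ran with which count. */
struct trace {
    char   buf[256];
    size_t len;
};

static void record(struct trace *t, char tag, iter_t n)
{
    size_t room = sizeof t->buf - t->len;
    int w = snprintf(t->buf + t->len, room, "%c%lu ", tag, n);
    if (w > 0 && (size_t)w < room)
        t->len += (size_t)w;
}

void trace_initialize(iter_t n, void *cookie) { record(cookie, 'I', n); }
void trace_benchmark(iter_t n, void *cookie)  { record(cookie, 'B', n); }
void trace_cleanup(iter_t n, void *cookie)    { record(cookie, 'C', n); }

/* Toy model of the documented call order (NOT the real benchmp):
 * one setup call with 0, then initialize/benchmark/cleanup once per
 * repetition, then one teardown call with 0. */
void toy_harness(benchmp_f initialize, benchmp_f benchmark,
                 benchmp_f cleanup, iter_t iterations,
                 int repetitions, void *cookie)
{
    int i;

    assert(benchmark != NULL);
    if (initialize) initialize(0, cookie);
    for (i = 0; i < repetitions; i++) {
        if (initialize) initialize(iterations, cookie);
        /* the real harness would start its timer here */
        benchmark(iterations, cookie);
        /* ... and stop it here */
        if (cleanup) cleanup(iterations, cookie);
    }
    if (cleanup) cleanup(0, cookie);
}
```

       With two repetitions of three iterations, the trace buffer ends up
       holding "I0 I3 B3 C3 I3 B3 C3 C0 ", which makes it easy to see
       which calls a callback can expect with iterations set to 0.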

USING lmbench

       Here is an example of a simple benchmark that measures the  latency  of
       the random number generator lrand48():

              #include "lmbench.h"

              void
              benchmark_lrand48(iter_t iterations, void* cookie) {
                   while(iterations-- > 0)
                        lrand48();
              }

              int
              main(int argc, char *argv[])
              {
                   benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
                   micro("lrand48()", get_n());
                   exit(0);
              }

       Here is a simple benchmark that measures and reports the  bandwidth  of
       bcopy:

              #include "lmbench.h"

              #define MB (1024 * 1024)
              #define SIZE (8 * MB)

              struct _state {
                   int size;
                   char* a;
                   char* b;
              };

              void
              initialize_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   /* Skip the call made with iterations == 0. */
                   if (!iterations) return;
                   state->a = malloc(state->size);
                   state->b = malloc(state->size);
                   if (state->a == NULL || state->b == NULL)
                        exit(1);
              }

              void
              benchmark_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   while(iterations-- > 0)
                        bcopy(state->a, state->b, state->size);
              }

              void
              cleanup_bcopy(iter_t iterations, void* cookie) {
                   struct _state* state = (struct _state*)cookie;

                   /* Skip the call made with iterations == 0. */
                   if (!iterations) return;
                   free(state->a);
                   free(state->b);
              }

              int
              main(int argc, char *argv[])
              {
                   struct _state state;

                   state.size = SIZE;
                   benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
                        0, 1, 0, TRIES, &state);
                   mb(get_n() * state.size);
                   exit(0);
              }

       A  slightly  more  complex version of the bcopy benchmark might measure
       bandwidth as a function of  memory  size  and  parallelism.   The  main
       procedure in this case might look something like this:

              int
              main(int argc, char *argv[])
              {
                   int  size, par;
                   struct _state state;

                   for (size = 64; size <= SIZE; size <<= 1) {
                        for (par = 1; par < 32; par <<= 1) {
                             state.size = size;
                             benchmp(initialize_bcopy, benchmark_bcopy,
                                  cleanup_bcopy, 0, par, 0, TRIES, &state);
                             fprintf(stderr, "%d\t%d\t", size, par);
                             mb(par * get_n() * state.size);
                        }
                   }
                   exit(0);
              }

VARIABLES

       There  are  three  environment variables that can be used to modify the
       lmbench timing subsystem: ENOUGH, TIMING_O, and LOOP_O.
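
       This page does not say what values these variables take.  As a
       minimal sketch of how a program could read such an override,
       assuming the value is a plain decimal integer, the hypothetical
       helpers below fall back to a default when the variable is unset or
       malformed.  They are not part of lmbench.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal integer, or return fallback
 * if text is NULL, empty, out of range, or has trailing characters. */
long parse_long_or(const char *text, long fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return fallback;
    return value;
}

/* Hypothetical helper: read an integer override from the environment.
 * The integer format is an assumption, not documented behavior. */
long env_long_or(const char *name, long fallback)
{
    assert(name != NULL);
    return parse_long_or(getenv(name), fallback);
}
```

       A caller might write env_long_or("ENOUGH", 0) and treat 0 as "no
       override given".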

FUTURES

       Development of lmbench is continuing.

AUTHOR

       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1998-2000 Larry McVoy and Carl Staelin
