lat_ctx - context switching benchmark

NAME

       lat_ctx - context switching benchmark

SYNOPSIS

       lat_ctx [ -P <parallelism> ] [ -W <warmups> ] [ -N <repetitions> ] [ -s
       <size_in_kbytes> ] #procs [ #procs ...  ]

DESCRIPTION

       lat_ctx measures context switching time for any  reasonable  number  of
       processes  of  any  reasonable  size.  The processes are connected in a
       ring of Unix pipes.  Each process reads a token from its pipe, possibly
       does some work, and then writes the token to the next process.
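
       The ring described above can be sketched as follows.  This is a
       minimal illustration in Python, not the lat_ctx implementation:
       the name run_ring is hypothetical, the token is a single byte,
       no timing is done, and no work is performed between the read and
       the write.  It assumes a Unix system where os.fork is available.

```python
import os


def run_ring(nprocs, laps):
    """Pass a one-byte token around a ring of nprocs processes, laps times.

    Hypothetical helper, not part of lat_ctx. Returns the number of laps
    the parent (process 0) saw complete.
    """
    if nprocs < 2:
        raise ValueError("a ring needs at least two processes")

    # pipes[i] is the pipe process i reads from; process i writes to
    # the pipe of its successor, pipes[(i + 1) % nprocs].
    pipes = [os.pipe() for _ in range(nprocs)]

    def keep_only(rfd, wfd):
        # Close every descriptor except this process's two ring ends,
        # so that closing one write end delivers EOF downstream.
        for r, w in pipes:
            if r != rfd:
                os.close(r)
            if w != wfd:
                os.close(w)

    children = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            try:
                rfd, wfd = pipes[i][0], pipes[(i + 1) % nprocs][1]
                keep_only(rfd, wfd)
                while True:
                    token = os.read(rfd, 1)
                    if not token:  # EOF: the upstream process is done
                        break
                    os.write(wfd, token)
            finally:
                os._exit(0)
        children.append(pid)

    # The parent acts as process 0 in the ring.
    rfd, wfd = pipes[0][0], pipes[1][1]
    keep_only(rfd, wfd)

    completed = 0
    for _ in range(laps):
        os.write(wfd, b"t")          # inject the token
        if os.read(rfd, 1) == b"t":  # it has visited every process
            completed += 1

    os.close(wfd)  # EOF now propagates around the ring
    for pid in children:
        os.waitpid(pid, 0)
    os.close(rfd)
    return completed
```

       Each lap forces one switch per process in the ring, because only
       the process holding the token is runnable; the others are blocked
       in read.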

       Processes  may  vary in number.  Smaller numbers of processes result in
       faster context switches.  Rings of more than 20 processes are not
       supported.

       Processes may vary in size.  A size of zero  is  the  baseline  process
       that  does  nothing  except  pass  the token on to the next process.  A
       process size of greater than zero means that the process does some work
       before  passing  on the token.  The work is simulated as the summing up
       of an array of the specified size.  The summing is an unrolled loop  of
       about 2.7 thousand instructions.

       The effect is that both the data and the instruction cache get polluted
       by some amount before the token is passed  on.   The  data  cache  gets
       polluted  by  approximately  the  process "size".  The instruction cache
       gets  polluted  by  a  constant  amount,  approximately  2.7   thousand
       instructions.
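
       The per-process work can be sketched as below.  This is only an
       illustration under stated assumptions: make_work_buffer and
       do_work are hypothetical names, and Python's built-in sum is not
       an unrolled loop, so the sketch touches roughly the requested
       amount of data but does not reproduce the instruction cache
       footprint described above.

```python
from array import array


def make_work_buffer(size_kbytes):
    """Allocate an integer array occupying about size_kbytes kilobytes.

    Hypothetical helper: a size of zero yields an empty array, which
    corresponds to the baseline process that does no work.
    """
    nwords = (size_kbytes * 1024) // array("i").itemsize
    return array("i", [1]) * nwords


def do_work(buf):
    """Sum every element, reading the whole buffer once."""
    return sum(buf)
```

       A process in the ring would call do_work on its own buffer after
       reading the token and before writing it on.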

       The  pollution  of the caches results in larger context switching times
       for the larger processes.  This may be confusing because the  benchmark
       takes  pains to measure only the context switch time, not including the
       overhead of doing the work.  The subtle point is that the  overhead  is
       measured using hot caches.  As the number and size  of  the  processes
       increase,  the  caches become more and more polluted until the set of
       processes no longer fits in them.  The context switch  times  go  up
       because a context switch is defined as the switch time plus the time
       it takes to restore all of the process state, including cache state.
       This means that the switch includes the time for the cache misses on
       larger processes.
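
       The arithmetic implied here can be sketched as follows.  The text
       says only that the overhead of doing the work is excluded; the
       sketch assumes this is done by subtracting the separately
       measured overhead from the average time per token pass.  The
       function name and the example total are hypothetical.

```python
def context_switch_usec(total_usec, passes, overhead_usec):
    """Estimate the cost of one context switch in microseconds.

    total_usec    -- elapsed time for all token passes (assumed input)
    passes        -- number of token passes timed
    overhead_usec -- cost of the work and pipe I/O, measured with hot caches
    """
    per_pass = total_usec / passes
    return per_pass - overhead_usec
```

       Because the overhead is measured with hot caches, any extra cache
       misses a process takes after being switched in remain in the
       result, which is why larger processes show larger switch times.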

OUTPUT

       The output format is intended as input to xgraph or a similar program.
       The format is multi-line.  The first line is a title that specifies the
       size and non-context-switching overhead of the test.  Each  subsequent
       line  is  a  pair of numbers giving the number of processes and the cost
       of a context switch.  The overhead and the context switch times are  in
       microseconds.  The numbers below are for a SPARCstation 2.

       "size=0 ovr=179
       2 71
       4 104
       8 134
       16 333
       20 438
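
       For readers who want to post-process this output without xgraph,
       a small parser might look like the sketch below.  The function
       name parse_lat_ctx is hypothetical, and the sketch assumes the
       output contains exactly one data set in the layout shown above:
       a title line of name=value fields with a leading double quote,
       followed by pairs of numbers.

```python
SAMPLE = '''"size=0 ovr=179
2 71
4 104
8 134
16 333
20 438
'''


def parse_lat_ctx(text):
    """Parse one lat_ctx data set into (header, points).

    header is a dict of the title fields as strings, for example
    {"size": "0", "ovr": "179"}; points is a list of
    (number_of_processes, microseconds) tuples.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no output to parse")

    # The title line starts with a double quote, which is dropped here.
    title = lines[0].lstrip('"')
    header = dict(field.split("=", 1) for field in title.split())

    points = []
    for ln in lines[1:]:
        nprocs, usec = ln.split()
        points.append((int(nprocs), float(usec)))
    return header, points
```

       Calling parse_lat_ctx(SAMPLE) yields the header fields size and
       ovr along with five (processes, microseconds) pairs.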

BUGS

       The numbers produced by this benchmark are  somewhat  inaccurate;  they
       vary  by about 10 to 15% from run to run.  A series of runs may be done
       and the lowest  numbers  reported.   The  lower  the  number  the  more
       accurate the results.
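
       Taking the lowest numbers from a series of runs can be sketched
       as below.  The function name is hypothetical, and each run is
       assumed to be a list of (number_of_processes, microseconds)
       pairs already extracted from the benchmark output.

```python
def lowest_per_nprocs(runs):
    """Return the lowest context switch time seen for each process count.

    runs is an iterable of runs; each run is an iterable of
    (nprocs, usec) pairs. The result maps nprocs to the minimum usec.
    """
    best = {}
    for run in runs:
        for nprocs, usec in run:
            if nprocs not in best or usec < best[nprocs]:
                best[nprocs] = usec
    return best
```

       The minimum is used rather than the mean because, as noted above,
       the lower numbers are considered the more accurate ones.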

       The inaccuracies are possibly caused by interactions  between  the  VM
       system  and  the  processor  caches.   It is possible that the benchmark
       processes are sometimes laid out in memory such that  there  are  fewer
       TLB/cache  conflicts than at other times.  This is pure speculation on
       our part.

ACKNOWLEDGEMENT

       Funding   for  the  development  of  this  tool  was  provided  by  Sun
       Microsystems Computer Corporation.

AUTHOR

       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1994-2000 Carl Staelin and Larry McVoy
