Man Linux: Main Page and Category List


       rand_rate  -  A  testing  and  benchmarking  tool for GSL random number


       dieharder [-a] [-b bits] [-d diehard test number] [-f filename]
                 [-g generator number] [-h] [-i iterations] [-l]
                 [-n ntuple] [-p number of p samples] [-q] [-N]
                 [-r rgb test number] [-s sts test number]
                 [-t number of test samples] [-u user test number]
                 [-v verbose flag] [-x xvalue] [-y yvalue] [-z zvalue]


       -a Runs all the tests with standard/default options to create a report

       -b bits - sets the number of bits to be used in tests that act on a bit
              string of variable length, e.g. the rgb bitdist test.

       -d test number -  selects specific diehard test.

       -f filename - when used with -g for raw or (ascii formatted) input,
              allows random numbers to be read in from a file.  Note that  the
              stdin  (raw)  interface  does  not  require a -f entry, only the
              appropriate -g.

       -g generator number - selects a specific generator for testing.  A
              generator number of -1  (or  dieharder  entered  alone  with  no
              arguments)  causes all known generators to be printed out to the

       -h prints context-sensitive help -- usually Usage (this message) or a
              test synopsis if entered as e.g. dieharder -d 3 -h.

       -i iterations - sets iteration count for timing runs (should not be

       -l list all known tests.

       -n ntuple - set ntuple length for  tests  on  short  bit  strings  that
              the length to be varied (e.g. rgb bitdist).

       -p count - sets the number of p-value samples per test (default 100).

       -q selects
              per test (where applicable).  This is a way of  getting  a  very
              compact report.

       -N selects non-overlapping samples for overlapping tests where the
              non-overlapping   target   statistic   is   known.    Note  that
              overlapping sample tests generally have internal covariance that
              is  quite  difficult  to  remove.   Consequently they on average
              contain LESS than one

              where non-overlapping  samples  are  more  tightly  distributed.
              With  built-in generators, non-overlapping tests take longer and
              may have a different sensitivity.

       -r test number - selects specific rgb test.

       -s test number - selects specific sts test.

       -t count - sets the number of random entities used in each test, where
              possible.  Be warned -- some  tests  have  fixed  sample  sizes;
              others  are  variable  but  have practical minimum sizes.  It is
              suggested you begin with the values used in  -a  and  experiment
              carefully on a test by test basis.

       -u test number - selects specific user-developed test, if you’ve added
              one  or more to this tool.  dieharder provides this interface to
              make it easy to add your own tests.

       -v verbose flag -- controls the verbosity of the output for debugging
              only.  Probably of little use to non-developers, and  developers
              can  read the enum(s) in dieharder.h and the test sources to see
              which flag values turn on output on which routines.  1 is result
              in a fairly detailed trace of program activity.

       -x,-y,-z number - Some tests have parameters that can safely be varied
              from their default value.  For example, in the diehard birthdays
              test, one can vary the number  of  length,  which  can  also  be
              varied.   -x 2048 -y 30 alters these two values but should still
              run fine.

              NOTE WELL: The "bogomegarates" (number of  rands  generated  per
              second)  returned  by  this  tool  are BOGUS and may not be even
              approximately  correct  in  your  context.   Also,  the  quality
              assessment(s) for the rngs may, in fact, be completely incorrect
              (due to errors in the tests) or misleading.  Learn what p-values
              mean (see below)! Use them at your Own Risk!  Be Warned!


       DieHarder  This  is the current snapshot of the dieharder random number
       tester.  It encapsulates all  of  the  Gnu  Scientific  Library  random
       number  generators  as well as /dev/random and /dev/urandom in a single
       harness that can time them  and  subject  them  to  various  tests  for
       randomness.   These  tests  are variously drawn from George Marsaglia’s
       "Diehard battery of random number tests",  the  NIST  Statistical  Test
       Suite,  and  include  a  few useful e.g. binning or spectral tests that
       I’ve found doing research into random number  tests  or  tests  that  I
       myself  have  made  up  or  that  are  improvements on tests from other

       The primary point of this tester is to make it easy to  time  and  test
       (pseudo)random  number  generators  OR  hardware  or  other generators.
       Three examples are provided of wrapping a random number  generator  and
       inserting  it  so  that  it is can be called via the GSL interface.  An
       interface is planned that would allow random numbers to be read in from
       a  file,  allowing the tool to be used with any generator (wrappable or
       not) that can generate a table of random numbers.

       Another important motivation for  writing  dieharder  is  to  have  the
       entire  test  code  and  documentation be fully Gnu Public Licensed and
       hence openly available for adaptation, testing, modification, including
       the  addition  of  new  tests.   The  primary objections I have towards
       diehard and sts are not that they are or are not adequate and complete;
       it  is that the code is obscure and not explicitly published for reuse.
       It is my hope that by providing this tool in autodocumenting source, it
       makes  it  easy to add new tests, critique older tests, and improve the
       suite in general.


       DieHarder returns p-values . To understand what a p-value is and how to
       use it, it is essential to understand the null hypothesis.

       The  null  hypothesis  for  random  number  generator  testing is "This
       generator is a perfect random number generator, and for any  choice  of
       seed  produces  a infinitely long, unique sequence of numbers that have
       all the expected statistical  properties  of  random  numbers,  to  all
       orders".   Note  well  that we know that this hypothesis is technically
       false for all software generators as they are periodic and do not  have
       the  correct  entropy  content  for this statement to be strictly true.
       Many hardware generators fail a priori as well, as they contain  subtle
       bias  or  correlations  due to the deterministic physics that underlies

       It can be  practically  true,  however.   Both  software  and  hardware
       generators  can  be  "random"  enough  that  their  sequences cannot be
       distinguished from  random  ones.   Hence  the  null  hypothesis  as  a
       practical, not a theoretically pure, statement.

       To  test  it,  one  uses  the rng in question to generate a sequence of
       presumably random numbers.  Using these numbers one  can  generate  any
       one  of a wide range of test statistics -- empirically computed numbers
       that are considered random samples that may or may not be  independent,
       depending  on  whether overlapping sequences of random numbers are used
       to generate successive samples while generating the statistic(s), drawn
       from a known distribution.  From a knowledge of the target distribution
       of the statistic(s) and the associated cumulative distribution function
       (CDF)  and  the empirical value of the randomly generated statistic(s),
       one can read off the probability of obtaining the empirical  result  if
       the sequence was truly random, that is, if the null hypothesis is true!
       This probability is the "p-value" for the particular test run.

       The usual dogma is that if this p-value is low -- typically  less  than
       0.05 -- one "rejects" the null hypothesis.  In a word, it is improbable
       that one would get the result obtained if the generator is a good  one.
       If  it is any other value, one does not "accept" the generator as good,
       one "fails to reject" the generator as bad for this particular test.  A
       "good  random  number generator" is hence one that we haven’t been able
       to make fail yet!

       This criterion is, of course, naive in the extreme and cannot  be  used
       with dieharder!  It makes just as much sense to reject a generator that
       has p-values of 0.95 or more!  Both of these p-value ranges are equally
       unlikely on any given test run, and should be returned for (on average)
       5% of all test runs by a perfect random number generator.  A  generator
       that  fails  to  produce  p-values  less than 0.05 5% of the time it is
       tested with different seeds is a bad random number generator, one  that
       fails the test of the null hypothesis.

       p-values themselves, as it turns out, are test statistics.  They should
       be uniformly distributed on the range  0-1.   In  100  test  runs  with
       independent  seeds,  one  should not be surprised to obtain 0, 1, 2, or
       even 3 p-values less than 0.01.  On the other hand obtaining 7 p-values
       in  the  range 0.54-0.55 should make the generator highly suspect!  One
       can in fact convert a set of p-values into a p-value by comparing their
       distribution  to  the  expected  one,  using a Kolmogorov-Smirnov test,
       usually either Anderson-Darling or Kuiper KS (where for  a  variety  of
       reasons we use Kuiper instead of Anderson-Darling although in the limit
       of many samples it should not matter).

       The p-values obtained should in turn be uniformly distributed  and  can
       be  subjected to still more KS tests.  The distribution of p-values for
       a good generator should  be  idempotent,  even  across  different  test
       statistics and multiple runs.

       A  failure  of the distribution of p-values at any level of aggregation
       signals trouble.  In fact, if  the  p-values  of  any  given  test  are
       subjected  to  a KS test, and those p-values are then subjected to a KS
       test, as we add more p-values to either level we  will  either  observe
       idempotence  of  the  resulting  distribution of p to uniformity, or we
       will observe idempotence to a single p-value of zero!  That is, a  good
       generator  will  produce a roughly uniform distribution of p-values, in
       the specific sense that the p-values of the distributions  of  p-values
       are  themselves  roughly  uniform  and  so on ad infinitum, while a bad
       generator will produce a non-uniform distribution of p-values,  and  as
       more  p-values drawn from the non-uniform distribution are added to its
       KS test, at some point the failure will be absolutely unmistakeable  as
       the resulting p-value approaches 0 in the limit.  Trouble indeed!

       The question is, trouble with what?  Random number tests are themselves
       complex computational objects, and there is a  probability  that  their
       code  is  incorrectly framed or that roundoff or other numerical -- not
       methodical  --  errors  are  contributing  to  a  distortion   of   the
       distribution  of  some  of  the p-values obtained.  This is not an idle
       observation; when one works on writing random number generator  testing
       programs,  one  is  always testing the tests themselves with "good" (we
       hope) random number generators so that egregious failures of  the  null
       hypothesis  signal  not  a bad generator but an error in the test code.
       The null hypothesis above is correctly framed from a theoretical  point
       of  view,  but  from a real and practical point of view it should read:
       "This generator is a perfect  random  number  generator,  and  for  any
       choice  of  seed produces a infinitely long, unique sequence of numbers
       that have all the expected statistical properties of random numbers, to
       all  orders  and  this  test  is  a  perfect test and returns precisely
       correct p-values from the test  computation."   Observed  "failure"  of
       this  null  hypothesis can come from failure of either or both of these
       disjoint components, and comes from the second as often or  more  often
       than  the first during the development process.  When one cranks up the
       "resolution" of the test (discussed next) to where a  generator  starts
       to  fail  some  test  one realizes, or should realize, that development
       never ends and that new test regimes will always  reveal  new  failures
       not only of the generators but of the code.

       With  that  said, one of dieharder’s most significant advantages is the
       control that it gives you over a critical  test  parameter.   From  the
       remarks  above, we can see that we should feel very uncomfortable about
       "failing" any given random number generator on the basis of  a  5%,  or
       even  a  1%,  criterion,  especially  when  we  apply a test suite like
       dieharder that returns  76  distinct  test  p-values  as  of  the  last
       snapshot.  We want failure to be unambiguous and reproducible!

       To  accomplish this, one can simply crank up its resolution.  If we ran
       any given test against a random number generator and it returned  a  p-
       value of (say) 0.007328, we’d be perfectly justified in wondering if it
       is really a good generator.  However, the probability of  getting  this
       result isn’t really all that small -- when one uses dieharder for hours
       at a time numbers like this will definitely happen  and  mean  nothing.
       If  one  runs the same test again (with a different seed or part of the
       random sequence) and gets a p-value of 0.009122, and a third  time  and
       gets  0.002669  --  well,  that’s three 1% (or less) shots in a row and
       that should happen only one in a million times.   One  way  to  clearly
       resolve failures, then, is to increase the number of p-values generated
       in a test run.  If the actual distribution of p being returned  by  the
       test is not uniform, a KS test will eventually return a p-value that is
       not some ambiguous 0.035517 but is instead 0.000001,  with  the  latter
       produced time after time as we rerun.

       dieharder  permits one to increase the number of p-values generated for
       any test, subject only to the availability  of  enough  random  numbers
       (for  file  based  tests)  and  time,  to  make  failures  unambiguous.
       However, because it is a research tool it is  strongly  suggested  that
       one always consider the alternative null hypothesis -- that the failure
       is a failure of the test code in some limit of  large  numbers  --  and
       take at least some steps to ensure that the failure is in the generator
       and not the dieharder code.

       The steps I myself take  during  development  is  to  rely  heavily  on
       several "presumed good" random number generators.  These are generators
       that we have theoretical reasons to expect are extraordinarily good and
       to  lack  correlations out to some known underlying dimensionality, and
       that also test out extremely well.  By using several generators and not
       just  one,  one can hope that those generators have (at the very least)
       different correlations and should not all uniformly fail a test in  the
       same  way  and  with  the  same  number of p-values.  When all of these
       generators consistently fail a test, I tend to suspect that the problem
       is in the test code, not the generators.  One advantage of dieharder is
       that it has a number of these "good generators"  immediately  available
       for  comparison  runs,  courtesy  of the Gnu Scientific Library.  I use
       mt19937_1999, gfsr4, ranldx2  and  taus2  (as  well  as  "true  random"
       numbers  from  for  this purpose, and I try to ensure that
       dieharder will "pass" in particular the Mersenne Twister  generator  at
       any reasonable p-value resolution out to as much as -p 1000.

       Tests (such as the diehard operm5 test as of 3/27/08) that consistently
       fail at  sufficiently  high  resolution  are  often  flagged  as  being
       "suspect"  --  possible  failures of the alternative null hypothesis --
       and their results should not be used to test random  number  generators
       pending  agreement  in  the statistics and random number community that
       those tests are in fact valid and correct so that observed failures can
       indeed  safely  be  attributed  to  a  failure  of  the  intended  null

       dieharder is community  supported.   Bear  in  mind  that  I  (rgb)  am
       professionally  a  physicist and that I have taken precisely one course
       in statistics and that one was in high school.  That does not mean I am
       incompetent  to  be  the  primary  keeper of the dieharder code as I am
       quite competent in programming and have actively studied statistics  on
       my own to a level of at least enough competence to explain the above in
       a hopefully extremely clear way (something that has eluded many better-
       trained  workers  in  this  arena).   It  does,  however,  mean that my
       knowledge of statistics contains holes and that I  have  to  work  very
       hard  to  understand the theory underlying the more difficult tests.  I
       am not proud about this and would cherish instruction and correction by
       real experts.

       I  therefore  openly  invite  experts  in statistics and programming to
       examine the source code of those tests and help  me  as  I  attempt  to
       determine  whether  or  not  the tests are in fact valid or whether the
       failure is due to a bug in the code, and to help me  fix  the  code  or
       algorithms  being  implemented.   I  would  like to see this test suite
       ultimately be validated by the general statistics community in hard use
       in  an  open  environment,  where every possible failure of the testing
       mechanism is subject to scrutiny and eventual correction.  In this  way
       we  will eventually achieve a very powerful suite of tools indeed, ones
       that may well give us very specific information not just about  failure
       but  of  the  mode  of  failure  as  well, just how the sequence tested
       deviates from randomness.

       This will be a lot of work (but  perhaps  work  suitable  for  directed
       research  by  statistics  undergrad  or  grad  students  who  also  are
       competent programmers).  Note well  --  many  of  the  tests  that  use
       overlapping  samples  must  deal with covariance by means of e.g.  weak
       inverses of a covariance matrix in order to convert a  vector  of  test
       statistics  into  a  multinomial  form  suitable  for a chi square test
       returning a p-value.  The code for doing so is nontrivial and  requires
       that  the  covariance  matrix (or its suitable decomposition) be either
       computed in situ or entered in data.

       Checking the code for overlapping sample tests  in  particular  is  not
       easy,  in  other  words.   The  theory is difficult, the programming is
       difficult, and the numerics are difficult.  This is why it  is  (in  my
       opinion)  essential  that the testing tool be fully open source and why
       most sophisticated users of the tool will wish to work using a copy  of
       dieharder  built from source rather than "just" installed from a binary
       package.  The alternative null  hypothesis  is  there  for  all  random
       number   generator  testers,  but  only  with  an  open  source  tester
       integrated with some known-excellent generators  to  test  against  and
       compare to is it possible to consider it and take steps to validate not
       only the generator but the tool used to test  it  at  the  source  code
       level.  "Black box" commercial testers are to be eschewed at all costs,
       although it is  perfectly  reasonable  to  consider  employing  skilled
       consultants  using  an  open  product  to  test  e.g. a new hardware or
       software generator for its suitability in some given application.

       Thus far, dieharder has benefitted  tremendously  from  the  community.
       Individuals have openly contributed tests, new generators to be tested,
       and fixes for existing tests that were revealed by their own work  with
       the  testing  instrument.   Efforts are underway to make dieharder more
       portable so that it will build on more platforms  and  faster  so  that
       more thorough testing can be done.  Please feel free to participate.


       To  test  your own random number generator or to test the randomness of
       some file of presumably random numbers, two  approaches  are  possible.
       First  of  all,  there  are  five generators that are wrapped up in GSL
       compatible clothes and linked to the  GSL  so  that  the  standard  GSL
       interface  works  for them.  Using these as prototypes and working with
       the fully GPL dieharder sources, any software or hardware random number
       generator  that one wishes to test can be added for testing using these
       as prototypes, and can likely be submitted to the GSL for inclusion  if
       they  pass  the  tests  as  well or better than the generators that are
       already there.   Dieharder  is  designed  to  (ultimately)  be  a  very
       convenient tool for testing new software RNGs.

       However,  the last two non-gsl generators are "universal" generators in
       the sense that they permit you to input a random number stream  from  a
       file (but not from /dev/random or /dev/urandom, be warned).

       The  file_input  generator requires a file of "cooked" (ascii readable)
       random numbers, one per line, with a header that describes  the  format
       to dieharder.  This interface is still somewhat experimental -- not all
       ascii formats have been tested.  However, it has been tested and should
       work  for  32 bit unsigned integers represented directly in ascii or as
       32 bits of binary.  Example  files  for  a  couple  of  possible  input
       formats are included in the sources.

       Finally,  the  type file_input_raw accepts a file of raw bits as input,
       such as might be generated by:

        dd if=/dev/urandom of=testrands.raw bs=4 count=1000000

       (to generate 1,000,000  four-byte  ints  directly  from  the  software-
       augmented  kernel  entropy generator).  That is, running the tests from
       such a file should be approximately the same  as  testing  /dev/urandom

       The main (important!) difference is that some of the test require a lot
       of random numbers -- far more than were  needed  by  diehard.   Indeed,
       dieharder  typically  runs  many  of  the diehard tests 100 independent
       times, generating a p-value for each, and plots a histogram of  the  p-
       values  and generates a p-value for the (presumed uniform) distribution
       of p-values!  This approach mimics the histogram presented in  the  STS
       suite but augments it with a hard number.

       This  protects  one from the "p happens" problem described by Marsaglia
       (and described in even more detail above) where it is remarkably common
       to see very low or high p-values returned from any given test of even a
       "perfect" random number generator, but where over time  (or  many  test
       runs) a good generator will generate a uniform, idempotent distribution
       of p-values.

       File input rands are delivered to the tests on demand, but if the  test
       needs  more  than  are  available it simply rewinds the file and cycles
       through it again, and again,  and  again  as  needed.   Obviously  this
       significantly  reduces  the  sample  space  and  can lead to completely
       incorrect results for the p-value histograms unless  there  are  enough
       rands  to run EACH test without repetition (it is harmless to reuse the
       sequence for different tests).  Let the user beware!

       One last warning for those who are testing  files  of  random  numbers.
       dieharder  is  a tool that tests random number generators, not files of
       random numbers!  It is extremely inappropriate to try  to  "certify"  a
       file  of random numbers as being random just because it fails to "fail"
       any of the dieharder tests in e.g. a  dieharder  -a  run.   To  put  it
       bluntly,  if  one rejects all such files that fail any test at the 0.05
       level (or any other), the one thing one can be certain of is  that  the
       files in question are not random, as a truly random sequence would fail
       any given test at the 0.05 level 5% of the time!

       To put it another way, any file of numbers produced by a generator that
       "fails to fail" the dieharder suite should be considered "random", even
       if it contains sequences that might well "fail" any given test at  some
       specific  cutoff.  One has to presume that passing the broader tests of
       the generator itself, it was determined that the p-values for the  test
       involved  was  globally  correctly distributed, so that e.g. failure at
       the 0.01 level occurs neither more nor less than 1%  of  the  time,  on
       average,  over  many  many  tests.   If one particular file generates a
       failure at this level, one can therefore safely presume that  it  is  a
       random  file  pulled from many thousands of similar files the generator
       might create that have the correct  distribution  of  p-values  at  all
       levels of testing and aggregation.

       To  sum  up,  use  dieharder to validate your generator (via input from
       files or an embedded stream.  Then by all means use your  generator  to
       produce files or streams of random numbers.  Do not use dieharder as an
       accept/reject tool to validate the files themselves.


       To demonstrate all tests, run on the default GSL rng, enter:

         dieharder -a

       To demonstrate a test of an external generator of a raw  binary  stream
       of bits, use the stdin (raw) interface:

         cat /dev/urandom | dieharder -g 75 -a

       (be sure to run dieharder by itself on a line to verify that the number
       of the stdin generator is still 75).

       To use it with an ascii formatted file:

         dieharder -g 65 -f testrands.txt -a

       (testrands.txt should consist of a header such as:

        # generator mt19937_1999  seed = 1274511046
        type: d
        count: 100000
        numbit: 32


       To use it with a binary file

         dieharder -g 66 -f testrands.bin -a

       (or cat testrands.bin | dieharder -g 75 -a).


       DieHarder is entirely original code and can be  modified  and  used  at
       will by any user, provided that:

         a) The original copyright notices are maintained and that the source,
       including all modifications, is made publically available at  the  time
       of  any derived publication.  This is open source software according to
       the  precepts  and  spirit  of  the  Gnu  Public  License.    See   the
       accompanying    file   COPYING,   which   also   must   accompany   any

         b) The  author  of  the  code  (Robert  G.  Brown)  is  appropriately
       acknowledged and referenced in any derived publication.  It is strongly
       suggested that George Marsaglia and the Diehard suite and  the  various
       authors  of  the  Statistical  Test  Suite  be  similarly acknowledged,
       although this suite shares no actual code with these random number test

         c)   Full   responsibility   for   the   accuracy,  suitability,  and
       effectiveness of the program rests with the users and/or modifiers.  As
       is clearly stated in the accompanying copyright.h:



       The  author of this suite gratefully acknowledges George Marsaglia (the
       author of the diehard test suite)  and  the  various  authors  of  NIST
       Special  Publication 800-22 (which describes the Statistical Test Suite
       for  testing   pseudorandom   number   generators   for   cryptographic
       applications),  for excellent descriptions of the tests therein.  These
       descriptions enabled this suite to be developed with a clean copyright,
       licensable under the GPL.

       The  author  also wishes to reiterate that the academic correctness and
       accuracy  of  the  implementation  of   these   tests   is   his   sole
       responsibility  and  not  that  of  the  authors  of the Diehard or STS
       suites.  This is especially true where he has seen fit to modify  those
       tests from their strict original descriptions.


       GPL  2b;  see  the  file  COPYING  that  accompanies the source of this
       program.  This is the "standard Gnu General Public License version 2 or
       any   later   version",   with  the  one  minor  (humorous)  "Beverage"
       modification listed below.  Note that this modification is probably not
       legally  defensible and can be followed really pretty much according to
       the honor rule.

       As to my personal preferences in beverages, red wine is great, beer  is
       delightful,  and  Coca Cola or coffee or tea or even milk acceptable to
       those who for religious or personal reasons wish to avoid stressing  my

       The Beverage Modification to the GPL:

       Any  satisfied  user  of  this software shall, upon meeting the primary
       author(s) of this software for the first  time  under  the  appropriate
       circumstances,  offer  to  buy  him  or  her  or them a beverage.  This
       beverage may or may not be alcoholic, depending on the personal ethical
       and  moral views of the offerer.  The beverage cost need not exceed one
       U.S. dollar (although it certainly may at the whim  of  the  offerer:-)
       and  may be accepted or declined with no further obligation on the part
       of the offerer.  It is not necessary to  repeat  the  offer  after  the
       first meeting, but it can’t hurt...