
israndom - randomness testing using data compressors over fixed-size alphabets

israndom [-a alphasize] [-c compressor] [-s samplelen] [-qhnr] [filename]

israndom tests a sequence of symbols for randomness. It tries to determine whether a given sequence of trials could reasonably be assumed to come from a uniform random distribution over a fixed-size alphabet of 2-256 symbols. israndom assumes that each symbol (or sample trial) is represented by exactly one byte. The only exceptions to this rule are the -n and -r options, which ignore newlines and carriage returns, respectively (see below).

israndom is based on the mathematical ideas of Shannon, Kolmogorov, and Cilibrasi. It uses the following formula to determine an expected size for a sample of k trials from a uniform distribution over an alphasize-symbol alphabet. Each symbol takes log(alphasize) bits (logarithm to base 2), so the total cost c for the ensemble of samples is k log(alphasize) bits. This number is rounded up to the nearest byte and increased by one to arrive at the final estimate of the expected communication cost on the assumption of uniform randomness.

If the compressed size of the k samples is less than c, this represents a randomness deficiency and the randomness test fails; israndom exits with a nonzero exit status. If israndom indicates that a source is nonrandom, this fact is effectively certain, provided the compression module is correct and invertible. If the compressed size is at least the threshold value c, the file appears to be random and passes the test, and israndom exits with a return value of 0. In either case, it prints the alphabet size, expected compressed size, sample count, and randomness difference before exiting with the appropriate return code.

The default number of samples is 393216. Larger sample counts should increase accuracy, whereas using too few samples will leave the method unable to resolve randomness in certain situations. This is a theoretically unavoidable fact for all effective randomness tests.

If a filename is given, it is read to find the samples to analyze. If the filename "-" is given, or no filename is given at all, then israndom reads from standard input. If text files are to be used, it is important to specify one or both of -n and -r, since without these, end-of-line characters will be misinterpreted as samples.
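
The size test described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not israndom's actual source: the function names are hypothetical, the rounding is interpreted as ceil(bits / 8) + 1 bytes, and Python's bz2 module stands in for the bzlib compression module.

```python
import bz2
import math

def expected_size_bytes(k: int, alphasize: int) -> int:
    """Expected cost in bytes of k uniform trials over alphasize symbols.

    Interpretation (an assumption): k * log2(alphasize) bits, rounded up
    to whole bytes, plus one.
    """
    bits = k * math.log2(alphasize)
    return math.ceil(bits / 8) + 1

def looks_random(samples: bytes, alphasize: int) -> bool:
    """Pass if the compressed size is at least the expected size."""
    threshold = expected_size_bytes(len(samples), alphasize)
    compressed = len(bz2.compress(samples))  # bz2 stands in for bzlib
    return compressed >= threshold

# With the default sample count and a full byte alphabet:
print(expected_size_bytes(393216, 256))  # 393217

# A constant sequence compresses far below the threshold, so it fails.
print(looks_random(b"\x00" * 4096, 256))  # False
```

With an alphabet of 256 symbols, each trial costs a full byte, so the threshold is simply the sample count plus one. A compressor that saves even a couple of bytes on the input is then enough to signal a randomness deficiency.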

-c compressor_name
    Set the compressor explicitly to compressor_name instead of the default, bzlib. For basic analysis, bzlib is usually sufficient. For detecting complex or subtle biases, a more powerful compression module such as lzma (lzmax) or ppmd (ppmdx) will detect more types of non-randomness. Because Lempel-Ziv types are universal, all effective randomness tests can be captured as a kind of compression discriminant function.

-n  Ignore newlines (so that text files may be used).

-r  Ignore carriage returns (so that text files may be used).

-a alphasize
    Set the alphabet size to alphasize, an integer between 2 and 256. If you do not specify an alphabet size, it is automatically determined by the contents of the samples.

-s samplecount
    Use samplecount samples instead of the default of 393216. Using a number that is too small here will reduce the accuracy of the test, causing everything to appear to be random. If 0 is used, it means to read until EOF.

-q  Quiet mode, with no extra status messages.

-h  Print help and exit.

EXAMPLES

First, we can verify that the cryptographically strong random number generator is correct:

    israndom /dev/urandom

Next, we can notice that the output of the "od" command, without extra options, is not random because it prints out addresses and spaces predictably. Most compressors can tell by the regular spaces that it is not random:

    od /dev/urandom | israndom -n -r

But if we remove the spaces using 'tr', then a more powerful compressor, lzmax, is required to demonstrate the non-randomness of the sequence:

    od /dev/urandom | tr -d ' ' | israndom -n -r -c lzmax

Removing the address lines using an od option yields the expected result once again that the sequence is effectively random:

    od -An /dev/urandom | tr -d ' ' | israndom -n -r -c lzmax

The above sequence is not actually random, because every third octal digit only ranges from 0 to 3, since 377 octal is the same as 255 decimal, the largest value a byte can hold.
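
The restricted digit can be illustrated with a small Python sketch. It assumes each byte is rendered as one three-digit octal group (the layout that od -b produces); other od output formats group bytes differently, so the restricted positions would differ. The function names are hypothetical and the sketch does not call israndom.

```python
import math
import os

def octal_digits(data: bytes) -> str:
    """Render each byte as exactly three octal digits, e.g. 255 -> '377'."""
    return "".join(f"{b:03o}" for b in data)

def leading_digit_values(digits: str) -> set:
    """Collect the characters seen in the first position of each 3-digit group."""
    return {digits[i] for i in range(0, len(digits), 3)}

digits = octal_digits(os.urandom(4096))
seen = leading_digit_values(digits)

# The leading digit of each group never exceeds 3, because 377 octal is 255.
print(sorted(seen))

# Three digits carry only 8 bits, so each digit averages 8/3 bits,
# below the log2(8) = 3 bits a uniform octal digit would need.
actual_bits_per_digit = 8 / 3
uniform_bits_per_digit = math.log2(8)
print(actual_bits_per_digit, uniform_bits_per_digit)
```

Under these assumptions the digit stream carries about 11% less information than a uniform octal source. A sufficiently strong compressor given enough samples could exploit that gap.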
This subtle pattern is detectable using 10 million samples and the advanced ppmdx compressor:

    od -An /dev/urandom | tr -d ' ' | israndom -n -r -c ppmdx -s 10000000

As a sanity check, we see that even under extreme analysis as above, /dev/urandom still checks out as random, even with newlines and carriage returns removed for good measure:

    cat /dev/urandom | israndom -n -r -c ppmdx -s 10000000

ENVIRONMENT

No environment variables.

Please report bugs to the Debian BTS.

Rudi Cilibrasi <cilibrar@cilibrar.com>

complearn(5),ncd(1)