NAME

       ncd - compute the Normalized Compression Distance

SYNOPSIS

       ncd  [ -c compressor ] [ -o filestem ] [ -bhLnqsv ] [ -d|f|l|p|t string ]
       ... [arg1] [arg2]

DESCRIPTION

       The Normalized Compression Distance between two objects is defined as

           NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))

       where

       C(a,b)  means "the compressed size of the concatenation of a and b"

       C(a)    means "the compressed size of a"

       C(b)    means "the compressed size of b"

       ncd prints a non-negative number (typically, but not always, 0 <= x
       < 1.1) representing how different the two objects are.  Smaller numbers
       represent more similar objects.  The largest value is somewhere near 1.
       It is not exactly 1 because of imperfections in compression techniques
       or other irregularities in the underlying compressor, but for most
       standard compression algorithms you are unlikely to see a number above
       1.1 in any case.
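
       The formula can be tried outside of ncd with a short Python sketch.
       This is an illustration only, not CompLearn's implementation: it
       assumes Python's standard zlib and bz2 modules as stand-ins for the
       zlib and bzlib compressors, and the names compressed_size and ncd
       below are hypothetical helpers.  The values it prints need not match
       those reported by ncd.

```python
import bz2
import os
import zlib


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """Return C(data): the length in bytes of the compressed input."""
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    if compressor == "bz2":
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown compressor: {compressor}")


def ncd(a: bytes, b: bytes, compressor: str = "zlib") -> float:
    """NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))."""
    c_a = compressed_size(a, compressor)
    c_b = compressed_size(b, compressor)
    c_ab = compressed_size(a + b, compressor)  # C(a,b): concatenation
    return (c_ab - min(c_a, c_b)) / max(c_a, c_b)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    noise = os.urandom(len(text))  # incompressible, unrelated to text
    print("text vs. itself:", ncd(text, text))
    print("text vs. noise: ", ncd(text, noise))
```

       Comparing an object with itself should give a value close to 0, and
       comparing it with unrelated random data should give a value near 1.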

       Three compressors are available by default: bzlib, zlib, and blocksort.
       The compressor may be selected with an option in complearn.conf; see
       complearn(5) for more details.

ENUMERATION MODES

       -f, --file-mode=FILE
              select file mode

       -l, --literal-mode=STRING
              select string literal mode;  this  is  the  default.   The  next
              argument  is  a  string which, if containing white space, may be
              enclosed in double-quotes (")

       -p, --plainlist-mode=FILE
              select plain list mode; argument is a file which contains a list
              of files to be individually evaluated

       -t, --termlist-mode=FILE
              select  term list mode; argument is a file which contains string
              literals to be individually evaluated

       -d, --directory-mode=DIR
              select directory mode; argument is a path which  contains  files
              to be individually evaluated
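
       When a mode supplies several items, the pairwise distances form a
       distance matrix.  The Python sketch below shows that idea for a list
       of string literals held in memory.  It is an assumption-laden
       illustration, not a description of how ncd enumerates its input: the
       helper names csize and ncd_matrix are hypothetical, and Python's zlib
       module stands in for the compressor.

```python
import zlib
from typing import List, Sequence


def csize(data: bytes) -> int:
    """Compressed size of data, using zlib as a stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd_matrix(items: Sequence[bytes]) -> List[List[float]]:
    """Return the square matrix of NCD values over all ordered pairs."""
    sizes = [csize(x) for x in items]  # compute each C(x) only once
    matrix = []
    for i, a in enumerate(items):
        row = []
        for j, b in enumerate(items):
            c_ab = csize(a + b)  # C(a,b): size of the concatenation
            lo, hi = min(sizes[i], sizes[j]), max(sizes[i], sizes[j])
            row.append((c_ab - lo) / hi)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    terms = [b"hello world " * 100, b"hello there " * 100]
    for row in ncd_matrix(terms):
        print(" ".join(f"{value:.3f}" for value in row))
```

       The matrix need not be exactly symmetric, because compressing a
       followed by b can yield a slightly different size than b followed by
       a.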

OPTIONS

       -c, --compressor=compressor
              set the compressor to use

       -L, --list
              list   available   builtin  compressors  as  well  as  available
              compression  modules.   Modules  are  loaded  from  the  modules
              subdirectory of /usr/lib/complearn.

       -s, --size
              print the compressed size of a single FILE, STRING, or DIR
              instead of the NCD

       -n, --nexus
              Nexus output format for distance matrix

       -o, --output=FILE
              specify the output filestem, if different from the default,
              distmatrix.  An extension (.clb, .nex, or .txt) is added as
              appropriate to the output file type.

       -b, --binary
              output  results  to   binary   file;   the   default   name   is
              distmatrix.clb

       -q, --quiet
              suppress ASCII output and messages

       -v, --verbose
              activate verbose mode

       -h, --help
              show help options and exit

FILES

       $HOME/.complearn/complearn.conf

       /usr/share/complearn/complearn.conf

       /usr/local/share/complearn/complearn.conf

              per-user and system configuration files; see complearn(5) for
              further details.

       $HOME/.complearn/modules

       /usr/lib/complearn/modules

              standard module automatic loading area.  Any shared object
              compressor modules found here will be loaded on startup.

ENVIRONMENT

       COMPLEARNMODPATH
              If this environment variable is set, CompLearn will search the
              given directory and load any CompLearn compression modules it
              finds there (such as the libart.so example included with the
              CompLearn source distribution).

DIAGNOSTICS

       none

SEE ALSO

       anycompress(1), anydecompress(1), complearn(5), maketree(1)