NAME
ncd - compute the Normalized Compression Distance
SYNOPSIS
ncd [ -c compressor ] [ -o filestem ] [ -bhLnqsv ] [ -d|f|l|p|t string ] ...
[arg1] [arg2]
DESCRIPTION
The Normalized Compression Distance between two objects is defined as
NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))
where
C(a,b) means "the compressed size of the concatenation of a and b"
C(a) means "the compressed size of a"
C(b) means "the compressed size of b"
ncd prints a non-negative number (typically, but not always, 0 <= x
< 1.1) representing how different the two objects are. Smaller numbers
represent more similar objects. The largest value is somewhere near 1.
It is not exactly 1 because of imperfections in compression techniques
or other irregularities in the underlying compressor, but for most
standard compression algorithms you are unlikely to see a number above
1.1 in any case.
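The definition above can be sketched in a few lines of Python. This is an illustration of the formula only, not the ncd implementation: the function names compressed_size and ncd are hypothetical, and Python's standard zlib and bz2 modules stand in for the CompLearn compressors, so the numbers will not match ncd output exactly.

```python
import bz2
import zlib


def compressed_size(data: bytes, compress=zlib.compress) -> int:
    """C(x): size in bytes of the compressed form of data."""
    return len(compress(data))


def ncd(a: bytes, b: bytes, compress=zlib.compress) -> float:
    """NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))."""
    ca = compressed_size(a, compress)
    cb = compressed_size(b, compress)
    cab = compressed_size(a + b, compress)  # C(a,b): compress the concatenation
    return (cab - min(ca, cb)) / max(ca, cb)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4  # unrelated, byte-valued data

    print("identical inputs:", ncd(text, text))   # expected to be small
    print("unrelated inputs:", ncd(text, other))  # expected to be near 1
    print("same pair, bz2:  ", ncd(text, other, bz2.compress))
```

Comparing an object with itself usually gives a value slightly above 0 rather than exactly 0, because the compressor still spends a few bytes encoding the repeated copy.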
Three compressors are available by default: bzlib, zlib and blocksort.
One may be selected with an option in complearn.conf; see
complearn(5) for more details.
ENUMERATION MODES
-f, --file-mode=FILE
select file mode
-l, --literal-mode=STRING
select string literal mode; this is the default. The next
argument is a string which, if containing white space, may be
enclosed in double-quotes (")
-p, --plainlist-mode=FILE
select plain list mode; argument is a file which contains a list
of files to be individually evaluated
-t, --termlist-mode=FILE
select term list mode; argument is a file which contains string
literals to be individually evaluated
-d, --directory-mode=DIR
select directory mode; argument is a path which contains files
to be individually evaluated
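When a mode supplies several items, each item is evaluated individually and the pairwise distances form a distance matrix. The following Python sketch shows the idea for a list of string literals, as in term list mode. It is an assumption-laden illustration, not a description of how ncd builds its matrix: the names ncd and distance_matrix are hypothetical, and zlib from the Python standard library stands in for the configured compressor.

```python
import zlib


def ncd(a: bytes, b: bytes) -> float:
    """NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b)), using zlib."""
    ca = len(zlib.compress(a))
    cb = len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)


def distance_matrix(terms):
    """Return an n x n list of lists with the NCD of every pair of terms."""
    data = [t.encode("utf-8") for t in terms]
    return [[ncd(a, b) for b in data] for a in data]


if __name__ == "__main__":
    # Hypothetical term list; in term list mode these would come from a file.
    terms = [
        "normalized compression distance",
        "normalized information distance",
        "a completely unrelated sentence",
    ]
    for term, row in zip(terms, distance_matrix(terms)):
        print(" ".join(f"{d:.3f}" for d in row), " ", term)
```

With inputs this short the fixed overhead of the compressor dominates, so the values are only loosely meaningful; longer inputs give more useful distances.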
OPTIONS
-c, --compressor=compressor
set the compressor to use
-L, --list
list available builtin compressors as well as available
compression modules. Modules are loaded from the modules
subdirectory of /usr/lib/complearn.
-s, --size
print the compressed size of a single FILE, STRING, or DIR
instead of the NCD
-n, --nexus
output the distance matrix in Nexus format
-o, --output=FILE
specify the output filestem, if different from the default,
distmatrix. An extension (.clb, .nex, or .txt) will be added,
as appropriate to the output file type.
-b, --binary
output results to binary file; the default name is
distmatrix.clb
-q, --quiet
suppress ASCII output and messages
-v, --verbose
activate verbose mode
-h, --help
show help options and exit
FILES
$HOME/.complearn/complearn.conf
/usr/share/complearn/complearn.conf
/usr/local/share/complearn/complearn.conf
per-user and system configuration files; see complearn(5) for
further details.
$HOME/.complearn/modules
/usr/lib/complearn/modules
standard module automatic loading area. Any shared object compressor
modules found here will be loaded on startup.
ENVIRONMENT
COMPLEARNMODPATH
If this environment variable is set, CompLearn will search the
given directory and load any CompLearn compression modules it
finds there (such as the libart.so example included with the
CompLearn source distribution).
DIAGNOSTICS
none
SEE ALSO
anycompress(1), anydecompress(1), complearn(5), maketree(1)