
NAME

       genctd - Compiles word list into ICU compact trie dictionary

SYNOPSIS

       genctd  [  -h, -?, --help ] [ -V, --version ] [ -c, --copyright ] [ -v,
       --verbose ] [ -d, --destdir destination ] [ -i, --icudatadir  directory
       ] -o, --out output-file  dictionary-file

DESCRIPTION

       genctd  reads  the word list from dictionary-file and creates a compact
       trie dictionary file. Normally this data file has the .ctd extension.

       Words begin at the beginning of a line and are terminated by the  first
       whitespace.  Lines that begin with whitespace are ignored.
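
       The parsing rules above can be illustrated with a minimal sketch.
       This is not the genctd source; read_word_list is a hypothetical
       helper, and skipping empty lines is an assumption, since the text
       above covers only lines that begin with whitespace.

```python
def read_word_list(lines):
    """Collect words from an iterable of text lines using the rules above."""
    words = []
    for line in lines:
        # Lines that begin with whitespace are ignored.
        # Assumption: empty lines are skipped too, since they hold no word.
        if not line or line[0].isspace():
            continue
        # A word starts at the beginning of the line and ends at the
        # first whitespace character; anything after that is dropped.
        words.append(line.split(None, 1)[0])
    return words


sample = ["hello\n", "  ignored line\n", "world trailing text\n", "\n"]
print(read_word_list(sample))  # prints ['hello', 'world']
```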

OPTIONS

       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of genctd and exit.

       -c, --copyright
              Embed the standard ICU copyright notice in the output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -d, --destdir destination
              Set the destination directory of the output-file to destination.

       -i, --icudatadir directory
              Look for  any  necessary  ICU  data  files  in  directory.   For
              example,  the file pnames.icu must be located when ICU’s data is
              not built as a shared library.  The default ICU  data  directory
              is   specified  by  the  environment  variable  ICU_DATA.   Most
              configurations of ICU do not require this argument.

       dictionary-file
              The source file to read.

       -o, --out output-file
              The output data file to write.
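
       As a sketch of how the options above combine, the following helper
       assembles a genctd argument list in the order given in the SYNOPSIS.
       The function name, its parameters, and the file names words.txt and
       words.ctd are hypothetical; only the flags come from this page. The
       helper builds the command and does not run it.

```python
def build_genctd_command(dictionary_file, output_file, destdir=None,
                         icudatadir=None, verbose=False,
                         embed_copyright=False):
    """Return an argv list for genctd using only the flags documented above."""
    argv = ["genctd"]
    if embed_copyright:
        argv.append("-c")            # -c, --copyright
    if verbose:
        argv.append("-v")            # -v, --verbose
    if destdir is not None:
        argv += ["-d", destdir]      # -d, --destdir destination
    if icudatadir is not None:
        argv += ["-i", icudatadir]   # -i, --icudatadir directory
    # The output file is required; the dictionary file is the last operand.
    argv += ["-o", output_file, dictionary_file]
    return argv


# Hypothetical file names, for illustration only.
print(build_genctd_command("words.txt", "words.ctd", verbose=True))
```

       On a system where genctd is installed, the resulting list could be
       passed to subprocess.run.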

CAVEATS

       When the dictionary-file begins with a byte order mark (BOM), that is,
       the Unicode character U+FEFF, the file is interpreted as Unicode.
       Without the BOM, the file is interpreted in the operating system's
       current default codepage.  To eliminate any ambiguity about the
       encoding in which the dictionary-file was written, it is recommended
       that you write this file in UTF-8 with the BOM.
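
       One way to produce such a file is sketched below. It relies on
       Python's "utf-8-sig" codec, which writes the UTF-8 encoded BOM
       (bytes EF BB BF) before the text. The helper names and the file name
       words.txt are hypothetical.

```python
import os
import tempfile


def write_word_list(path, words):
    """Write one word per line as UTF-8 with a leading BOM."""
    # "utf-8-sig" emits the BOM once, at the start of the file.
    with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
        for word in words:
            f.write(word + "\n")


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 encoded U+FEFF."""
    with open(path, "rb") as f:
        return f.read(3) == b"\xef\xbb\xbf"


# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
write_word_list(path, ["hello", "world"])
print(starts_with_utf8_bom(path))  # prints True
```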

ENVIRONMENT

       ICU_DATA  Specifies the directory  containing  ICU  data.  Defaults  to
                 ${prefix}/share/icu/4.2.1/.   Some tools in ICU depend on the
                 presence of the trailing slash. It is thus important to  make
                 sure that it is present if ICU_DATA is set.
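
       A minimal sketch of guarding against a missing trailing slash when
       setting ICU_DATA for a child process follows. The helper names are
       hypothetical, and the directory shown assumes a prefix of /usr, which
       this page does not state.

```python
import os


def with_trailing_slash(directory):
    """Return directory unchanged if it ends in '/', else append one."""
    return directory if directory.endswith("/") else directory + "/"


def icu_data_environment(directory, base_env=None):
    """Return a copy of the environment with ICU_DATA set to directory."""
    env = dict(os.environ if base_env is None else base_env)
    env["ICU_DATA"] = with_trailing_slash(directory)
    return env


# Assumed location for illustration; substitute the real ICU data directory.
env = icu_data_environment("/usr/share/icu/4.2.1", base_env={})
print(env["ICU_DATA"])  # prints /usr/share/icu/4.2.1/
```

       The returned mapping could be supplied as the env argument of
       subprocess.run when launching an ICU tool.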

AUTHORS

       Deborah Goldsmith

VERSION

       1.0

COPYRIGHT

       Copyright  (C)  2006  International  Business  Machines Corporation and
       others

SEE ALSO

       http://www.icu-project.org/userguide/boundaryAnalysis.html