NAME
trietool-0.2 - trie manipulation tool
SYNOPSIS
trietool-0.2 [ options ] trie command arg ...
DESCRIPTION
trietool-0.2 is the command-line tool for manipulating double-array
trie data. It can be used to query, add, remove, and list words in a trie.
The Trie
The trie argument specifies the name of the trie to manipulate. A trie
is stored in a file with the ‘.tri’ extension. However, to create a new
trie, one needs to prepare a file with the ‘.abm’ extension, describing
the Unicode ranges of the trie's alphabet set. The ABM (alphabet map)
defines a set of ranges that map Unicode characters into a contiguous
range of integers. The mapped integers are used as the internal
alphabet of the trie. Because the mapped range is always contiguous,
this mapping keeps space allocation within the trie data compact even
when the character set being used is not contiguous in Unicode.
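To make the mapping concrete, here is a minimal Python sketch (not libdatrie's implementation) that numbers every code point covered by a set of inclusive ranges consecutively, so that gaps between ranges take up no internal alphabet values. The function names build_alpha_map and encode are hypothetical, and starting the numbering at 1 is an assumption of this sketch; the actual internal values used by the library are not described here.

```python
def build_alpha_map(ranges):
    """Assign consecutive integers to every code point in the given ranges.

    Hypothetical sketch: ranges are inclusive (start, end) pairs, and
    numbering starts at 1, which is an assumption, not libdatrie's spec.
    """
    mapping = {}
    for start, end in sorted(ranges):
        for cp in range(start, end + 1):
            if cp not in mapping:  # overlapping ranges are counted once
                mapping[cp] = len(mapping) + 1
    return mapping


def encode(word, mapping):
    """Translate a word into the contiguous internal alphabet."""
    codes = []
    for ch in word:
        if ord(ch) not in mapping:
            raise ValueError(f"U+{ord(ch):04X} is not in the alphabet")
        codes.append(mapping[ord(ch)])
    return codes


if __name__ == "__main__":
    # The ranges A-Z and a-z: 52 characters mapped to 1..52.
    alpha = build_alpha_map([(0x0041, 0x005A), (0x0061, 0x007A)])
    print(encode("Az", alpha))  # [1, 52]: the gap 0x5B-0x60 is skipped
```

The six code points between ‘Z’ and ‘a’ receive no number, which is what keeps the internal alphabet dense.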
The ABM file is a plain text file, with each line listing a range of
32-bit Unicode code points to be added to the alphabet set, in the format:
[0xSSSS,0xTTTT]
where ‘0xSSSS’ and ‘0xTTTT’ are the hexadecimal values of the starting
and ending character codes of the range, respectively. Both ends of the
range are inclusive.
For example, for a dictionary that contains only English words without
any punctuation, one may prepare ‘trie.abm’ as:
[0x0041,0x005a]
[0x0061,0x007a]
The first line lists the ASCII codes for A-Z, and the second for a-z.
No more than 255 characters are allowed in a trie's alphabet.
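The following Python sketch shows one way an ABM file in this format could be read and checked against the 255-character limit. parse_abm, alphabet_size and check_abm are hypothetical helpers, not part of trietool. Skipping blank lines and counting overlapping ranges only once are assumptions of the sketch; the page does not say how trietool itself treats them.

```python
import re

# Matches one ABM line such as "[0x0041,0x005a]" (hex digits in any case).
ABM_LINE = re.compile(r"^\s*\[\s*0x([0-9A-Fa-f]+)\s*,\s*0x([0-9A-Fa-f]+)\s*\]\s*$")
MAX_ALPHABET = 255


def parse_abm(text):
    """Return the inclusive (start, end) ranges listed in ABM text."""
    ranges = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        m = ABM_LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: expected [0xSSSS,0xTTTT], got {line!r}")
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start > end:
            raise ValueError(f"line {lineno}: start 0x{start:X} exceeds end 0x{end:X}")
        ranges.append((start, end))
    return ranges


def alphabet_size(ranges):
    """Count distinct code points by merging overlapping or adjacent ranges."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(ranges):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def check_abm(text):
    """Parse ABM text and reject alphabets larger than MAX_ALPHABET."""
    ranges = parse_abm(text)
    size = alphabet_size(ranges)
    if size > MAX_ALPHABET:
        raise ValueError(f"alphabet has {size} characters; at most {MAX_ALPHABET} allowed")
    return ranges


if __name__ == "__main__":
    example = "[0x0041,0x005a]\n[0x0061,0x007a]\n"
    print(alphabet_size(check_abm(example)))  # 52
```

Merging intervals rather than building a set of code points keeps the size check cheap even for very wide ranges.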
The created ‘.tri’ file incorporates the ABM data, so the ‘.abm’ file
is not required after the trie is first created; once the ‘.tri’ file
exists, the ‘.abm’ file is ignored.
COMMANDS
Available commands are:
add word data ...
Add word to trie, associated with integer data. Any number of
word-data pairs can be given. Arguments are read two at a time:
the first is treated as the word, and the second as its data.
add-list [ options ] list-file
Add words with associated data listed in list-file to trie. The
list-file must be a text file listing one word per line. The
associated data can be put after the word on the same line,
separated by a tab (‘\t’) character. If the data field is
omitted, the default value -1 is used instead.
The following option is available for this command:
-e, --encoding enc
Specify the character encoding of the list-file contents,
such as ‘UTF-8’. If omitted, the codeset of the current
locale is assumed.
delete word ...
Delete word from trie. Any number of words to delete can
be given.
delete-list [ options ] list-file
Delete words listed in list-file from trie. The list-file must
be a text file listing one word per line.
The following option is available for this command:
-e, --encoding enc
Specify the character encoding of the list-file contents,
such as ‘UTF-8’. If omitted, the codeset of the current
locale is assumed.
query word
Search for word in trie. If word exists, its associated data is
printed to standard output. Otherwise, an error message is printed
to standard error, and nothing is printed to standard output.
list List all words in trie to standard output. The output lists one
word-data pair per line, separated by a tab (‘\t’) character,
which is the format expected of a list-file by the add-list
command.
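The list-file format shared by add-list and list can be illustrated with a small Python sketch. The helpers below are hypothetical and only mirror the rules stated above: one word per line, an optional tab-separated integer data field, and -1 when that field is omitted. Passing encoding=None makes Python's open() use the locale's preferred encoding, loosely mirroring the default of the -e option; skipping blank lines is an assumption of the sketch.

```python
DEFAULT_DATA = -1  # value used when the data field is omitted (per add-list)


def parse_list_line(line):
    """Split one list-file line into (word, data)."""
    line = line.rstrip("\r\n")
    word, sep, data = line.partition("\t")
    return word, (int(data) if sep and data else DEFAULT_DATA)


def read_list_file(path, encoding=None):
    """Read (word, data) pairs; encoding=None uses the locale's encoding."""
    pairs = []
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.strip():  # assumption: blank lines are skipped
                pairs.append(parse_list_line(line))
    return pairs


def format_list(pairs):
    """Render pairs as 'word<TAB>data' lines, as the list command does."""
    return "".join(f"{word}\t{data}\n" for word, data in pairs)


if __name__ == "__main__":
    print(parse_list_line("apple\t42\n"))  # ('apple', 42)
    print(parse_list_line("banana\n"))     # ('banana', -1)
```

Because format_list always writes the data field, its output can be fed back through parse_list_line unchanged, which matches the page's note that list output is suitable as an add-list list-file.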
OPTIONS
This program follows the usual GNU command-line syntax, with long
options starting with two dashes (‘--’). A summary of options is
included below.
-p, --path dir
Set trie directory to dir [default=‘.’]
-h, --help
Show summary of options.
-V, --version
Show version of program.
AUTHOR
libdatrie was written by Theppitak Karoonboonyanan.
This manual page was written by Theppitak Karoonboonyanan
<thep@linux.thai.net>.
DECEMBER 2008