
NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename
       dictfmt  -i|-I [options]

DESCRIPTION

       dictfmt takes a file, FILE, on stdin, and creates a dictionary database
       named basename.dict, that conforms  to  the  DICT  protocol.   It  also
       creates an index file named basename.index.  By default, the index is
       sorted according to the C locale, and only alphanumeric characters and
       spaces are used in sorting; this may be changed with the --locale and
       --allchars options.  (basename is commonly chosen to correspond to the
       basename of FILE, but this is not mandatory.)

       Unless  the  database is extremely small, it is highly recommended that
       basename.dict   be   compressed   with   /usr/bin/dictzip   to   create
       basename.dict.dz.  (dictzip is included in the dictd source package.)

       FILE  may  be  in  any  of  the several formats described by the format
       options -c5, -t, -e, -f, -h, -j, -p, -i or -I.  Exactly  one  of  these
       options must be given.

       dictfmt prepends several headers to the .dict file.  The
       00-database-url header gives the value of the -u option as the URL of
       the site from which the original database was obtained.  The
       00-database-short header gives the value of the -s option as the short
       name of the dictionary.  (This "short name" is the identifying name
       given by the "dict -D" option.)  If the -u and/or -s options are
       omitted, these values will be shown as "unknown", which is undesirable
       for a publicly distributed database.

       The date of conversion (formatting) is given  in  the  00-database-info
       header.   All  text  in  the input file prior to the first headword (as
       defined by the appropriate  formatting  option)  is  appended  to  this
       header.   All  text  in  the input file following a headword, up to the
       next headword, is copied unchanged to the .dict file.

FORMATTING OPTIONS

       -c5    FILE  is  formatted  with  headwords  preceded  by  5  or   more
              underscore  characters (_) and a blank line.  All text until the
              next headword is considered the  definition.   Any  leading  ‘@’
              characters   are   stripped  out,  but  the  file  is  otherwise
              unchanged. This option was  written  to  format  the  CIA  WORLD
              FACTBOOK 1995.

       -t     Implies the -c5, --without-info and --without-headword options.
              Use this option if the input database comes from the
              dictunformat utility.

       -e     FILE  is  in  html  format,  with  the  headword tagged as bold.
              (<B>headword - </B>)
              This  option  was  written  to  format   EASTON’S   1897   BIBLE
              DICTIONARY.  A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one  of  the  seven  eunuchs  in Ahasuerus’s court (Esther 1:10;
              2:21).

              This is converted to:
              Abagtha
                 one of the seven eunuchs in Ahasuerus’s court  (Esther  1:10;
              2:21).

              The heading <A NAME="T0000005"> is omitted, and the headword
              ‘Abagtha’ is indexed.

              NOTE: This option should be used with caution.  It removes
              several HTML tags (enough to format Easton properly), but not
              all.  The Makefile that was originally written to format
              dict-easton uses sed scripts to modify certain cross reference
              tags.  It may be necessary to pipe the input file through a sed
              script, or hack the source of dictfmt, in order to properly
              format other HTML databases.

       -f     FILE is formatted with the headwords starting in column 0,  with
              the definition indented at least one space (or tab character) on
              subsequent lines.  The third line starting in column 0 is taken
              as the first headword, and the first two lines starting in
              column 0 are treated as part of the 00-database-info header.
              This option was written to format the F.O.L.D.O.C.
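
              The layout described for -f can be illustrated with a minimal
              sketch that renders entries as input text.  The function name
              to_f_format and the sample data are hypothetical and are not
              part of dictfmt; the three-space indent is one arbitrary
              choice that satisfies "at least one space".

```python
def to_f_format(header_lines, entries):
    """Render (headword, definition) pairs in the layout described for -f.

    Hypothetical helper for illustration only; it is not part of dictfmt.
    """
    # The first two column-0 lines are treated as 00-database-info text,
    # so the third column-0 line is the first headword.
    if len(header_lines) != 2:
        raise ValueError("expected exactly two column-0 header lines")
    out = list(header_lines)
    for headword, definition in entries:
        out.append(headword)              # headword starts in column 0
        for line in definition.splitlines():
            out.append("   " + line)      # definition is indented
    return "\n".join(out) + "\n"


sample = to_f_format(
    ["Example glossary", "Hypothetical data for illustration"],
    [("foo", "a placeholder name"),
     ("bar", "another placeholder\nused after foo")],
)
print(sample, end="")
```

              Text produced this way could then be supplied to dictfmt on
              stdin together with the -f option and a basename.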

       -h     FILE  is  formatted  with  the  headwords  starting in column 0,
              followed by a comma, with the definition continuing on the  same
              line.   All  text  before  the  first  single  character line is
              included in 00-database-info header, and  lines  with  only  one
              character  are  omitted from the .dict file.  The first headword
              is on the line following the first single character  line.   The
              headword  is indexed; the text of the file is not changed.  This
              option was written to format HITCHCOCK’S BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with headwords starting in column 0, enclosed in
              colons, followed by the definition.  The colons surrounding  the
              headword  are  removed,  and  the  headword  is  indexed.  Lines
              beginning with ’*’, ’=’, or ’-’  are  also  removed.   All  text
              before  the  first  headword  is  included in the headers.  This
              option was written to format the JARGON FILE.
              NOTE: Some recent versions of the JARGON FILE had  three  blanks
              inserted before the first colon at each headword.  These must be
              removed before processing with dictfmt.  (sed scripts have  been
              used  for  this  purpose.  ed,  awk,  or  perl  scripts are also
              possible.)
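
              The preprocessing mentioned in the NOTE can also be sketched
              in Python.  This is a minimal example, assuming that an
              affected headword line begins with exactly three blanks
              followed by a colon-enclosed headword; the function name
              strip_headword_indent is hypothetical.

```python
import re

# Three leading blanks, but only when a colon-enclosed headword follows.
# The lookahead keeps the headword itself out of the match.
_LEADING_BLANKS = re.compile(r"^ {3}(?=:[^:]+:)")


def strip_headword_indent(text):
    """Remove the three blanks inserted before ':headword:' lines.

    Hypothetical helper; lines that are merely indented are left alone.
    """
    return "\n".join(_LEADING_BLANKS.sub("", line)
                     for line in text.split("\n"))


raw = "   :foo: n. a placeholder name\n   used in examples\n"
print(strip_headword_indent(raw), end="")
```

              The cleaned text could then be passed to dictfmt with the -j
              option.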

       -p     FILE is formatted with ‘%h’ in column 0, followed by a blank,
              followed by the headword, optionally followed by a line
              containing ‘%d’ in column 0.  The definition starts on the
              following line.  The first line beginning ‘%h’ and any lines
              beginning ‘%d’ are stripped from the .dict file, and ‘%h ’ is
              stripped from in front of the headword.  All text before the
              first headword is included in the headers.  The second line
              beginning ‘%h’ is taken as the first headword.
              This  option  was  written  to  format  Jay  Kominek’s  elements
              database.

       -i -I  These two options differ from all other formatting options.
              They re-sort (according to dictd requirements) an .index file
              given on stdin.  No .dict file is generated; only the
              re-sorting is performed.  Input resembling a three- or
              four-column .index file is expected.  -i expects decimal
              offset and length, while -I expects them in base64 format.
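
              The difference between the two number formats can be sketched
              as follows.  This example assumes that columns are separated
              by tabs and that base64 numbers use the standard base64
              alphabet with the most significant digit first and no
              padding; the function names are hypothetical and the code is
              not taken from dictfmt.

```python
# Assumption: the standard base64 alphabet, one digit per 6 bits.
B64_ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                "abcdefghijklmnopqrstuvwxyz"
                "0123456789+/")


def b64_to_int(text):
    """Decode a base64 number, most significant digit first."""
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 64 + B64_ALPHABET.index(ch)
    return value


def index_line_to_decimal(line):
    """Rewrite the offset and length columns of one index line as decimal."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (3, 4):
        raise ValueError("expected a three- or four-column index line")
    fields[1] = str(b64_to_int(fields[1]))   # offset
    fields[2] = str(b64_to_int(fields[2]))   # length
    return "\t".join(fields)


print(index_line_to_decimal("apple\tBh\tU"))
```

              Under these assumptions, a line in the form expected by -I is
              rewritten into the form expected by -i.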

OPTIONS

       -u url Specifies the URL of the site from which the raw database was
              obtained.  If this option is specified, a 00-database-url
              headword in the input, and its definition, will be ignored.

       -s name
              Specifies the name and, optionally, the version and date, of the
              database.  (If this contains spaces, it must be quoted.)  If
              this option is specified, a 00-database-short headword in the
              input, and its definition, will be ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              Specifies the locale used for sorting.  If no locale is
              specified, the "C" locale is used.  UTF-8 mode also requires
              the --utf8 option.

       --8bit Generates the database in 8-bit mode; see also the --locale
              option.  Note: This option is deprecated.  Use it only for
              creating 8-bit (non-UTF-8) dictionaries.  To create a UTF-8
              dictionary, use the --utf8 option instead.

       --utf8 If specified, UTF-8 database is created.

       --allchars
              Specifies that all characters should be used for the search.
              By default, only alphabetic characters, numeric characters and
              spaces are put into the .index file and are therefore used in
              the search.  Creates the special entry 00-database-allchars.

       --case-sensitive
              makes  the  search  case  sensitive.   Creates the special entry
              00-database-case-sensitive.

       --headword-separator sep
              sets the headword separator, which allows several words to have
              the same definition.  For example, if ‘--headword-separator %%%’
              is given, and the input file contains ‘autumn%%%fall’, both
              ‘autumn’ and ‘fall’ will be indexed as headwords, with the same
              definition.
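
              The effect of the separator can be sketched as follows.  This
              is an illustration of the idea only, not dictfmt's code; the
              function names and the offset and length values are
              hypothetical.

```python
def split_headwords(line, separator):
    """Split one headword line into its individual headwords."""
    return [part for part in line.split(separator) if part]


def index_entries(line, separator, offset, length):
    """Give every headword the same (offset, length) of one definition."""
    return [(headword, offset, length)
            for headword in split_headwords(line, separator)]


# Both headwords point at the same definition in the .dict file.
for entry in index_entries("autumn%%%fall", "%%%", 1200, 85):
    print(entry)
```

              Because each headword carries the same offset and length, a
              lookup of either word reaches the same definition.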

       --index-data-separator sep
              sets the index/data separator, which allows the first and
              fourth columns of the .index file to be set independently.
              That is, the first column can be treated as an index column
              (which the MATCH command searches) and the fourth column as a
              result column (from which MATCH takes the text it returns);
              the two columns are completely independent of each other.  The
              default value for this separator is the ASCII character "\034".

       --break-headwords
              multiple headwords will be written on separate lines in the
              .dict file.  For use with ‘--headword-separator’.

       --index-keep-orig
              When --utf8 is specified, headwords are lowercased and
              non-alphanumeric characters are removed from them before they
              are saved to the .index file, in order to simplify the search.
              When the --index-keep-orig option is used, a fourth column is
              created (if necessary) in the .index file; it contains the
              original headword, which is returned by the MATCH command.
              This option may be useful to prevent converting "AT&T" to
              "ATT" or to keep proper nouns with an uppercased first letter.
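
              The relationship between the search key and the original
              headword can be sketched as follows.  This is a rough
              approximation under stated assumptions, not dictfmt's actual
              normalization: it keeps spaces, reads "if necessary" as "only
              when the key differs from the original", and shows offsets
              and lengths as plain integers for readability.  The function
              names are hypothetical.

```python
def index_key(headword):
    """Lowercase the headword and drop characters that are neither
    alphanumeric nor a space (approximation of the described behavior)."""
    return "".join(ch for ch in headword.lower()
                   if ch.isalnum() or ch == " ")


def index_columns(headword, offset, length, keep_orig=False):
    """Build the columns of one index line.

    Assumption: the fourth column is added only when keep_orig is set
    and the normalized key differs from the original headword.
    """
    key = index_key(headword)
    columns = [key, str(offset), str(length)]
    if keep_orig and key != headword:
        columns.append(headword)
    return columns


print(index_columns("AT&T", 0, 10))
print(index_columns("AT&T", 0, 10, keep_orig=True))
```

              In this sketch the first column is what a search compares
              against, while the optional fourth column preserves the form
              to be returned.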

       --without-headword
              headwords will not be included in .dict file

       --without-header
              header will not be copied to DB info entry

       --without-url
              URL will not be copied to DB info entry

       --without-time
              time of creation will not be copied to DB info entry

       --without-ver
              By    default    dictfmt     creates     a     special     entry
              00-database-dictfmt-X.Y.Z  that contains (in .dict file) dictfmt
              version in format dictfmt-X.Y.Z. This option suppresses this.

       --without-info
              DB info entry will not  be  created.   This  may  be  useful  if
              00-database-info  headword  is expected from stdin (dictunformat
              outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72  columns.
              This  option  changes  this  default.  If  it  is set to zero or
              negative value, wrapping is off.

       --default-strategy strategy
              Sets the default search strategy for the database.  It will be
              used instead of strategy ‘.’.  The special entry
              00-database-default-strategy is created for this purpose.  This
              option may be useful, for example, for dictionaries containing
              mainly phrases rather than single words.  In any case, use this
              option only if you are absolutely sure what you are doing.

       --mime-header mime_header
              When a client sends the OPTION MIME command to dictd,
              definitions found in this database are prepended with the
              specified MIME header.  Creates the special entry
              00-database-mime-header.

CREDITS

       dictfmt  was  written  by  Rik  Faith (faith@cs.unc.edu) as part of the
       dict-misc package.  dictfmt is distributed under the terms of  the  GNU
       General  Public  License.  If you need to distribute under other terms,
       write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard
       <hilliard@debian.org>.

SEE ALSO

       dict(1),  dictd(8),  dictzip(1),  dictunformat(1), http://www.dict.org,
       RFC 2229

                               25 December 2000