NAME

       swath - General-purpose Thai word segmentation utility

SYNOPSIS

       swath [options] < infile > outfile

DESCRIPTION

       Thai script has no word delimiter. Applications need a Thai word list
       to recognize word boundaries before they can do useful things with
       Thai text, such as line wrapping.

       Swath provides a word analysis filter that inserts word delimiters
       into a text stream. It reads text from standard input, analyzes it
       for word boundaries by consulting a Thai word list, and writes the
       same text to standard output with the predefined word delimiters
       inserted.

       Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (Unicode
       version of LaTeX with the Omega typesetter kernel) documents, and it
       inserts the commonly used word delimiter for each format (pipe ‘|’ for
       plain text). The user can always override this with a preferred
       delimiter.

OPTIONS

       -b [delimiter]
              Define the string to be used as the word delimiter in the
              output text.

       -d [dict-dir]
              Specify an alternative dictionary location. dict-dir must
              contain the swath dictionary files ‘swathdic.br’ and
              ‘swathdic.tl’.

       -f [format]
              Specify the format of the input. Possible formats are: html,
              rtf, latex, lambda.

       -m [scheme]
              Choose the word matching scheme used when analyzing word
              boundaries. Possible schemes are ‘long’ (for longest, or
              greedy, matching) and ‘max’ (for maximal matching, which
              prefers the segmentation with the fewest words). Maximal
              matching is the default.

       -u input-enc,output-enc
              Specify the encodings of input and output. Each of input-enc
              and output-enc can be either ‘u’ (for UTF-8) or ‘t’ (for
              TIS-620). Swath converts the character encoding as necessary.
              If the option is omitted, TIS-620 is assumed for both input
              and output.

       -v, --verbose
              Turn on verbose mode.

       -help, --help
              Show help.
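
       The -d requirement above can be checked before swath is invoked. The
       following is a minimal sketch, not part of swath itself: the function
       name swath_with_dict is hypothetical, and the two file names are the
       ones listed under -d.

```shell
# Hypothetical helper (not shipped with swath): verify that a directory
# holds the dictionary files named under -d, then run swath against it.
swath_with_dict() {
    dict_dir=$1
    infile=$2
    for f in swathdic.br swathdic.tl; do
        if [ ! -f "$dict_dir/$f" ]; then
            # Report the first missing file and stop without calling swath.
            echo "missing $dict_dir/$f" >&2
            return 1
        fi
    done
    # Both files are present: segment the input with the alternative dictionary.
    swath -d "$dict_dir" < "$infile"
}
```

       It could be called as: swath_with_dict /path/to/dict thaifile.txt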

EXAMPLES

       For LaTeX (to be used with the thailatex package):

       $ swath -f latex < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       For HTML (to provide web pages to web browsers that  cannot  wrap  Thai
       lines properly, but support the <wbr> tag):

       $ swath -f html < myweb.html > myweb-wbr.html

       To  preprocess  a  Thai  UTF-8  encoded LaTeX file for thailatex, which
       always works with TIS-620:

       $ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex

       This is equivalent to filtering with iconv(1):

       $ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
       $ latex thaifile.ttex
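
       The -u and -b options can also be combined for plain text. The sketch
       below assumes that plain text is the format used when -f is omitted;
       the function name segment_utf8 is hypothetical, and the default
       delimiter simply restates the pipe that swath uses for plain text.

```shell
# Hypothetical helper (not shipped with swath): segment a UTF-8 plain-text
# file, optionally overriding the word delimiter.
segment_utf8() {
    infile=$1
    # Fall back to the pipe character when no delimiter is given.
    delim=${2:-"|"}
    # -u u,u: UTF-8 on both input and output; -b: delimiter string.
    swath -u u,u -b "$delim" < "$infile"
}
```

       It could be called as: segment_utf8 thaifile.txt "/" > thaifile.seg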

       To use the longest matching scheme with a LaTeX document:

       $ swath -f latex -m long < thaifile.tex > thaifile.ttex
       $ latex thaifile.ttex
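
       To see where the two matching schemes disagree on a given document,
       their outputs can be compared with diff(1). This is only a sketch:
       the function name compare_schemes is hypothetical, and any extra
       arguments (such as -f latex) are passed to swath unchanged.

```shell
# Hypothetical helper (not shipped with swath): run both matching schemes
# on the same input and show the lines where the segmentation differs.
compare_schemes() {
    infile=$1
    shift
    long_out=$(mktemp) || return 2
    max_out=$(mktemp) || return 2
    # Remaining arguments are forwarded to swath before the -m option.
    swath "$@" -m long < "$infile" > "$long_out"
    swath "$@" -m max < "$infile" > "$max_out"
    # diff exits with 0 when the outputs match and 1 when they differ.
    diff "$long_out" "$max_out"
    status=$?
    rm -f "$long_out" "$max_out"
    return "$status"
}
```

       It could be called as: compare_schemes thaifile.tex -f latex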

AUTHOR

       This   manual   page   was   written   by   Theppitak   Karoonboonyanan
       <thep@linux.thai.net>.

                                 January 2008