NAME

       utf8trans - Transliterate UTF-8 characters according to a table

SYNOPSIS

       utf8trans charmap [file]...

DESCRIPTION

       utf8trans transliterates characters in the specified files (or standard
       input, if no files are specified) and writes the output to standard
       output. All input and output is in the UTF-8 encoding.

       This program is usually used to render characters in Unicode text files
       as markup escapes or ASCII transliterations. (It is not intended for
       general charset conversions.) It provides functionality similar to
       the character maps in XSLT 2.0 (Extensible Stylesheet Language
       Transformations, version 2.0).

OPTIONS

       -m, --modify
              Modifies  the  given  files  in-place  with their transliterated
              output, instead of sending it to standard output.

              This option is useful  for  efficient  transliteration  of  many
              files at once.

       --help Show brief usage information and exit.

       --version
              Show version and exit.

USAGE

       The translation is done according to the rules in the ‘character map’,
       which is read from the file named by the charmap argument. It has the
       following format:

       1.  Each line represents a translation entry, except  for  blank  lines
           and comment lines, which are ignored.

       2.  Any amount of whitespace (space or tab) may precede the start of an
           entry.

       3.  Comment lines begin with #. The rest of the line is ignored.

       4.  Each entry consists of the Unicode codepoint of the character to
           translate, in hexadecimal, followed by one space or tab, followed
           by the translation string, up to the end of the line.

       5.  The translation string is taken literally, including any leading
           and trailing spaces (except the delimiter between the codepoint and
           the translation string), and all types of characters. The newline
           at the end is not included.
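
       The format rules above can be sketched in Python. This is only an
       illustration of the rules as stated here, not the utf8trans
       implementation. The names parse_charmap, transliterate and
       SAMPLE_CHARMAP are hypothetical, and rejecting an entry that has no
       delimiter is an assumption, since that case is not specified.

```python
import re

# One entry: hexadecimal codepoint, exactly one space or tab, then the
# translation string taken literally up to the end of the line.
ENTRY = re.compile(r"([0-9A-Fa-f]+)[ \t](.*)")


def parse_charmap(text):
    """Return a dict mapping each listed character to its translation."""
    table = {}
    for line in text.split("\n"):
        entry = line.lstrip(" \t")  # leading whitespace is allowed
        if not entry or entry.startswith("#"):
            continue  # blank lines and comment lines are ignored
        match = ENTRY.fullmatch(entry)
        if match is None:
            # Assumption of this sketch: malformed entries are rejected.
            raise ValueError(f"bad charmap entry: {line!r}")
        table[chr(int(match.group(1), 16))] = match.group(2)
    return table


def transliterate(text, table):
    """Replace every mapped character; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical sample map. The last entry maps U+00A0 to a single space:
# the first space is the delimiter, the second is the translation.
SAMPLE_CHARMAP = "# sample\n  A9 (c)\n2026 ...\nA0  \n"

if __name__ == "__main__":
    table = parse_charmap(SAMPLE_CHARMAP)
    print(transliterate("\u00a9 2024\u2026", table))  # prints "(c) 2024..."
```

       In the sample, the entry for A0 shows rule 5 at work: only the first
       space after the codepoint is the delimiter, so the translation string
       is the single space that follows it.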

       The above format is intended to be restrictive, to keep utf8trans
       simple. But if an XML-based format is desired, there is an
       xmlcharmap2utf8trans script that comes with the docbook2X distribution,
       that converts character maps in XSLT 2.0 format to the utf8trans
       format.
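
       To show how the two formats relate, the following Python sketch turns
       the xsl:output-character elements of an XSLT 2.0 character map into
       utf8trans entries. It is not the xmlcharmap2utf8trans script and makes
       no claim about how that script behaves; the function xslt_to_charmap
       and the sample stylesheet are hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical XSLT 2.0 character map used only for this example.
SAMPLE_XSLT = """\
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:character-map name="sample">
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#169;" string="(c)"/>
  </xsl:character-map>
</xsl:stylesheet>
"""


def xslt_to_charmap(xml_text):
    """Return utf8trans-style entries, one 'HEX string' line per mapping."""
    lines = []
    for elem in ET.fromstring(xml_text).iter(XSL_NS + "output-character"):
        char = elem.get("character")
        string = elem.get("string")
        if "\n" in string:
            # The utf8trans format cannot hold a newline in a translation.
            raise ValueError("newline cannot appear in a translation string")
        # Codepoint in hexadecimal, one space as delimiter, then the string.
        lines.append(f"{ord(char):X} {string}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(xslt_to_charmap(SAMPLE_XSLT), end="")
```

       For the sample stylesheet this yields two entries, A0 mapped to the
       literal text &nbsp; and A9 mapped to (c).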

LIMITATIONS

       · utf8trans does not work with binary files,  because  malformed  UTF-8
         sequences  in  the  input  are  substituted  with  U+FFFD characters.
         However, null characters in the input  are  handled  correctly.  This
         limitation may be removed in the future.

       · There  is  no  way  to  include a newline or null in the substitution
         string.
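
       The first limitation can be illustrated with Python's own UTF-8
       decoder, used here as a stand-in for utf8trans. The helper decode_lossy
       is hypothetical, and the number of U+FFFD characters produced for
       longer malformed sequences may differ between implementations.

```python
def decode_lossy(data):
    """Decode bytes as UTF-8, substituting U+FFFD for malformed input."""
    return data.decode("utf-8", errors="replace")


# The byte 0xFF never occurs in well-formed UTF-8.
binary = b"ab\xffcd"
decoded = decode_lossy(binary)

# Re-encoding does not restore the original byte, so binary data is damaged.
round_trip = decoded.encode("utf-8")

# A null character is valid UTF-8 and passes through unchanged.
with_null = decode_lossy(b"a\x00b")

if __name__ == "__main__":
    print(repr(decoded))       # 'ab\ufffdcd'
    print(round_trip == binary)  # False
    print(repr(with_null))     # 'a\x00b'
```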

AUTHOR

       Steve Cheng <stevecheng@users.sourceforge.net>.