Man Linux: Main Page and Category List

NAME

       ascii2uni - convert 7-bit ASCII representations to UTF-8 Unicode

SYNOPSIS

       ascii2uni [options] (<input file name>)

DESCRIPTION

       ascii2uni  converts  various  7-bit ASCII representations to UTF-8.  It
       reads from the standard input and writes to the  standard  output.  The
       representations  understood  are  listed  below  under the command line
       options. If no format is specified, standard hexadecimal  format  (e.g.
       0x00e9) is assumed.

COMMAND LINE OPTIONS

       -a <format> Convert from the specified format. Formats may be specified
       by means of the following arbitrary single character codes, by means of
       names such as "SGML_decimal", and by examples of the desired format.

              A  Convert  hexadecimal  numbers with prefix U in angle-brackets
              (<U00E9>).

              B Convert \x-escaped hex (e.g. \x00E9)

              C  Convert  \x  escaped  hexadecimal  numbers  in  braces  (e.g.
              \x{00E9}).

              D  Convert  decimal  HTML  numeric  character  references  (e.g.
              &#0233;)

              E Convert hexadecimal with prefix U (U00E9).

              F Convert hexadecimal with prefix u (u00E9).

              G Convert hexadecimal in  single  quotes  with  prefix  X  (e.g.
              X'00E9').

              H  Convert  hexadecimal  HTML numeric character references (e.g.
              &#x00E9;)

              I Convert hexadecimal UTF-8 with each byte's hex preceded by  an
              =-sign  (e.g.  =C3=A9)  .  This  is  the Quoted Printable format
              defined by RFC 2045.

              J Convert hexadecimal UTF-8 with each byte's hex preceded  by  a
              %-sign  (e.g.   %C3%A9). This is the URIescape format defined by
              RFC 2396.

              K Convert octal UTF-8 with each  byte  escaped  by  a  backslash
              (e.g.  \303\251)

              L  Convert \U-escaped hex outside the BMP, \u-escaped hex within
              the BMP (U+0000-U+FFFF).

              M Convert hexadecimal SGML numeric  character  references  (e.g.
              \#xE9;)

              N  Convert  decimal  SGML  numeric  character  references  (e.g.
              \#233;)

              O Convert octal escapes for the three low  bytes  in  big-endian
              order(e.g. \000\000\351))

              P Convert hexadecimal numbers with prefix U+ (e.g. U+00E9)

              Q Convert HTML character entities (e.g. &eacute;).

              R Convert raw hexadecimal numbers (e.g. 00E9)

              S  Convert  hexadecimal  escapes for the three low bytes in big-
              endian order (e.g. \x00\x00\xE9)

              T Convert decimal escapes for the three low bytes in  big-endian
              order (e.g. \d000\d000\d233)

              U Convert \u-escaped hexadecimal numbers (e.g. \u00E9).

              V Convert \u-escaped decimal numbers (e.g. \u00233).

              X Convert standard hexadecimal numbers (e.g. 0x00E9).

              Y  Convert  all  three  types  of  HTML  escape: hexadecimal and
              decimal character references and character entities.

              0 Convert hexadecimal UTF-8 with each byte's hex enclosed within
              angle brackets (e.g.  <C3><A9>).

              1  Convert Common Lisp format hexadecimal numbers (e.g. #x00E9).

              2 Convert Perl format decimal numbers with prefix v (e.g. v233).

              3 Convert hexadecimal numbers with prefix $ (e.g. $00E9).

              4  Convert Postscript format hexadecimal numbers with prefix 16#
              (e.g. 16#00E9).

              5 Convert Common Lisp format  hexadecimal  numbers  with  prefix
              #16r (e.g. #16r00E9).

              6  Convert  ADA  format  hexadecimal numbers with prefix 16# and
              suffix # (e.g. 16#00E9#).

              7 Convert Apache log format hexadecimal UTF-8 with  each  byte's
              hex preceded by a backslash-x (e.g.  \xC3\xA9).

              8 Convert Microsoft OOXML format hexadecimal numbers with prefix
              _x and suffix _ (e.g. _x00E9_).

              9 Convert %\u-escaped hexadecimal numbers (e.g. %\u00E9).

       -h     Help. Print the usage message and exit.

       -v     Print program version information and exit.

       -m     Accept deprecated HTML entities lacking  final  semicolon,  e.g.
              "&#x00E9" in place of "&#x00E9;".

       -p     Pure.  Assume that the input consists entirely of escapes except
              for arbitrary (but non-null) amounts of separating whitespace.

       -q     Be quiet. Do not chat unnecessarily.

       -Z <format>
              Convert input using the supplied format.  The  format  specified
              will  be used as the format string in a call to sscanf(3) with a
              single argument consisting of a  pointer  to  an  unsigned  long
              integer.  For example, to obtain the same results as with the -U
              flag, the format would be: \u%04X.

       If the format is Quoted-Printable, although it is not strictly speaking
       conversion  of an ASCII escape to Unicode, in accordance with RFC 2045,
       if an equal-sign occurs at the end of an input line,  both  the  equal-
       sign and the immediately following newline are skipped.

       All  options  that  accept  hexadecimal input recognize both upper- and
       lower-case hexadecimal digits.

EXIT STATUS

       The following values are returned on exit:

       0 SUCCESS
              The input was successfully converted.

       3 INFO The user requested information such as  the  version  number  or
              usage synopsis and this has been provided.

       5 BAD OPTION
              An incorrect option flag was given on the command line.

       7 OUT OF MEMORY
              Additional memory was unsuccessfully requested.

       8 BAD RECORD
              An ill-formed record was detected in the input.

SEE ALSO

       uni2ascii(1)

AUTHOR

       Bill Poser <billposer@alum.mit.edu>

LICENSE

       GNU General Public License

                                 August, 2009                     ascii2uni(1)