
NAME

       hxunent - replace HTML predefined character entities by UTF-8

SYNOPSIS

       hxunent [ -b ] [ -f ] [ file ]

DESCRIPTION

       The hxunent command reads the file (or standard input) and copies it to
       standard output with &-entities replaced by their equivalent characters
       (encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is replaced
       by <.
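       The substitution described above can be sketched in a few lines of
       Python. This is not hxunent's own implementation: it assumes the HTML 4
       entity table that Python ships as html.entities.name2codepoint, handles
       only named entities, and the function name unent is hypothetical.
       Unknown names and lone ampersands are copied unchanged, which is the
       default behavior described under -f.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# A named entity reference such as &quot; or &eacute;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unent(text: str) -> str:
    """Replace known HTML named entities with the characters they denote.

    Unknown names (e.g. &foo;) and lone ampersands are left as they are.
    """
    def repl(m):
        cp = name2codepoint.get(m.group(1))
        return chr(cp) if cp is not None else m.group(0)

    return ENTITY.sub(repl, text)
```

       Producing UTF-8 output is then a matter of encoding the resulting
       string, e.g. sys.stdout.buffer.write(unent(data).encode("utf-8")).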

OPTIONS

       The following options are supported:

       -b        The builtin XML entities &lt; &gt; &quot; and &amp; are not
                 replaced but copied unchanged. This is necessary if the
                 output has to be valid XML or SGML, since a bare < or &
                 would otherwise be read as markup.
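       The effect of -b can be sketched by skipping those four names during
       substitution. As above, this is an illustrative Python sketch, not the
       tool's code; the keep_builtins parameter and the name unent are
       hypothetical, and the entity table is assumed to be Python's
       html.entities.name2codepoint.

```python
import re
from html.entities import name2codepoint

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# The entities that -b leaves untouched.
XML_BUILTINS = {"lt", "gt", "quot", "amp"}


def unent(text: str, keep_builtins: bool = False) -> str:
    """Replace HTML named entities; with keep_builtins, behave like -b."""
    def repl(m):
        name = m.group(1)
        if keep_builtins and name in XML_BUILTINS:
            return m.group(0)  # -b: copy &lt; &gt; &quot; &amp; unchanged
        cp = name2codepoint.get(name)
        return chr(cp) if cp is not None else m.group(0)

    return ENTITY.sub(repl, text)
```

       For example, with keep_builtins=True the input "&lt;b&gt; &eacute;"
       becomes "&lt;b&gt; é"; without it the result would be "<b> é", which
       an XML parser would read as a start tag.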

       -f        This option changes how unknown entities and lone ampersands
                 are handled. Normally they are copied unchanged, but this
                 option tries to "fix" them by replacing each such ampersand
                 with &amp;. Such stray ampersands often result from pasting
                 URLs into a document, and in that case this option indeed
                 fixes them and makes the document valid.
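       A rough Python sketch of -f, optionally combined with -b, follows. It
       is an illustration under stated assumptions, not hxunent's code: the
       function name unent_fix is hypothetical, the entity table is Python's
       html.entities.name2codepoint, and numeric character references such
       as &#38; are assumed to be valid and copied unchanged (this page does
       not say how hxunent treats them).

```python
import re
from html.entities import name2codepoint

# An ampersand, optionally followed by a named reference (group 1) or a
# numeric reference such as &#38; or &#x26; (group 2).
AMP = re.compile(r"&(?:([A-Za-z][A-Za-z0-9]*);|(#(?:[0-9]+|[xX][0-9A-Fa-f]+);))?")

# The entities that -b leaves unchanged.
XML_BUILTINS = {"lt", "gt", "quot", "amp"}


def unent_fix(text: str, keep_builtins: bool = False) -> str:
    """Sketch of -f: escape every ampersand that does not start a known entity."""
    def repl(m):
        name, numeric = m.group(1), m.group(2)
        if name is not None:
            if keep_builtins and name in XML_BUILTINS:
                return m.group(0)  # -b: keep &lt; &gt; &quot; &amp;
            if name in name2codepoint:
                return chr(name2codepoint[name])
            return "&amp;" + name + ";"  # unknown entity: escape its "&"
        if numeric is not None:
            return m.group(0)  # assumption: numeric references kept as-is
        return "&amp;"  # lone ampersand, e.g. from a pasted URL

    return AMP.sub(repl, text)
```

       For example, unent_fix("page?a=1&b=2", keep_builtins=True) returns
       "page?a=1&amp;b=2", which is the escaped form a validator expects.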

DIAGNOSTICS

       The program’s exit value is 0 if all went well, otherwise:

       1         The input couldn’t be read (e.g., file not found or not
                 readable).

       2         Wrong command line arguments.
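       The exit codes above can be mapped onto a command-line wrapper as in
       the following Python sketch. The function main is hypothetical and
       the entity replacement itself is omitted (the input is copied as-is).
       It relies on the fact that Python's argparse exits with status 2 on
       wrong arguments, which happens to match code 2 here.

```python
import argparse
import sys


def main(argv=None) -> int:
    """Return hxunent's documented exit code for the given arguments."""
    parser = argparse.ArgumentParser(prog="hxunent")
    parser.add_argument("-b", action="store_true", help="keep the XML builtin entities")
    parser.add_argument("-f", action="store_true", help="escape stray ampersands")
    parser.add_argument("file", nargs="?", help="input file (default: stdin)")
    try:
        args = parser.parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 2 on wrong arguments (and 0 after printing -h).
        return exc.code
    try:
        if args.file is None:
            data = sys.stdin.buffer.read()
        else:
            with open(args.file, "rb") as fh:
                data = fh.read()
    except OSError as exc:  # file not found, permission denied, ...
        print(f"hxunent: {exc}", file=sys.stderr)
        return 1
    # Entity replacement omitted in this sketch; the input is copied unchanged.
    sys.stdout.buffer.write(data)
    return 0
```

       A script would call it as sys.exit(main()) so that the shell sees the
       returned status.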

SEE ALSO

       asc2xml(1), xml2asc(1), UTF-8 (RFC 2279)

BUGS

       The program assumes entities are as defined by HTML. It doesn’t read a
       document’s DTD to find the definitions actually in use in a document.
       With -f, it will even escape the ampersand of every entity that is not
       an HTML entity, so that such entities end up as literal text.