NAME
hxunent - replace HTML predefined character entities by UTF-8
SYNOPSIS
hxunent [ -b ] [ -f ] [ file ]
DESCRIPTION
The hxunent command reads the file (or standard input) and copies it to
standard output with &-entities replaced by their equivalent characters
(encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is replaced by <.
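The substitution described above can be sketched in a few lines of Python. This is an illustration, not hxunent's own code (hxunent is written in C). It assumes that Python's html.entities.name2codepoint table (the HTML 4 named entities) stands in for hxunent's entity table. The function name unent is hypothetical, and the sketch handles only named entities, not numeric references such as &#233;.

```python
import re
from html.entities import name2codepoint

# A named entity reference such as &eacute; or &frac12; (names are case-sensitive).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unent(text: str) -> str:
    """Replace known HTML named entities by their characters (hypothetical helper).

    Unknown entities are copied unchanged, mirroring hxunent's default behavior.
    """
    def repl(m: re.Match) -> str:
        cp = name2codepoint.get(m.group(1))
        return chr(cp) if cp is not None else m.group(0)

    return ENTITY_RE.sub(repl, text)
```

For example, unent("caf&eacute;") returns "café". Writing the result with .encode("utf-8") gives the UTF-8 byte stream that hxunent produces.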
OPTIONS
The following options are supported:
-b The XML built-in entities (&lt; &gt; &quot; &amp;) are
not replaced but copied unchanged. This is necessary if the
output has to be valid XML or SGML.
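Here is a minimal sketch of what -b changes. It assumes the same Python entity table as a stand-in for hxunent's, and the function name unent_keep_builtins is hypothetical. The four markup-significant entities are passed through, because decoding &lt; or &amp; would turn escaped text into markup, or into a bare ampersand, and the output would no longer be well-formed.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Entities that -b leaves untouched, as listed in the option description.
XML_BUILTINS = {"lt", "gt", "quot", "amp"}


def unent_keep_builtins(text: str) -> str:
    """Decode HTML entities except the XML built-ins (hypothetical helper)."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name in XML_BUILTINS:
            return m.group(0)  # copy &lt; etc. unchanged to keep the markup valid
        cp = name2codepoint.get(name)
        return chr(cp) if cp is not None else m.group(0)

    return ENTITY_RE.sub(repl, text)
```

Escaped markup therefore stays escaped, while other entities such as &eacute; are still decoded.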
-f This option changes how unknown entities and lone ampersands
are handled. Normally they are copied unchanged, but this
option tries to "fix" them by replacing each such ampersand by &amp;.
Such stray ampersands are often the result of copying and pasting
URLs into a document; in that case this option does fix
them and makes the document valid.
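The "fix" behavior can be sketched as follows. Several assumptions apply. The function name unent_fix is hypothetical. Python's HTML 4 entity table decides which names are "known". Numeric character references (&#233;, &#xE9;) are treated as valid and copied unchanged, which is a guess about hxunent, not documented behavior. Every other ampersand, whether bare or starting an unknown entity, becomes &amp;.

```python
import re
from html.entities import name2codepoint

# An ampersand, optionally followed by a named reference (group 1)
# or a numeric reference (group 2).
AMP_RE = re.compile(r"&(?:([A-Za-z][A-Za-z0-9]*);|(#[0-9]+;|#[xX][0-9A-Fa-f]+;))?")


def unent_fix(text: str) -> str:
    """Decode known entities and escape stray ampersands (hypothetical helper)."""
    def repl(m: re.Match) -> str:
        name, numeric = m.group(1), m.group(2)
        if name is not None and name in name2codepoint:
            return chr(name2codepoint[name])  # known entity: decode it
        if numeric is not None:
            return m.group(0)  # assumption: numeric references are kept as-is
        # Lone "&" or unknown entity: escape the ampersand, keep the rest.
        return "&amp;" + (name + ";" if name is not None else "")

    return AMP_RE.sub(repl, text)
```

A pasted URL such as a.cgi?x=1&y=2 becomes a.cgi?x=1&amp;y=2, which is what makes the document valid again.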
DIAGNOSTICS
The program’s exit value is 0 if all went well, otherwise:
1 The input couldn’t be read (e.g., the file was not found
or is not readable).
2 Wrong command line arguments.
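A caller can map these exit values to error messages. The following is a small sketch that assumes hxunent is installed and on PATH. The helper names describe_exit and run_hxunent are hypothetical, and only exit values 0, 1 and 2 come from this page.

```python
import subprocess

# Exit values documented under DIAGNOSTICS.
EXIT_MESSAGES = {
    0: "ok",
    1: "input could not be read",
    2: "wrong command line arguments",
}


def describe_exit(code: int) -> str:
    """Translate an hxunent exit value into a message (hypothetical helper)."""
    return EXIT_MESSAGES.get(code, f"unexpected exit status {code}")


def run_hxunent(path: str, *options: str) -> str:
    """Run hxunent on path and return its UTF-8 output; raise on failure."""
    # Options precede the file name, following the SYNOPSIS.
    proc = subprocess.run(["hxunent", *options, path], capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError(f"hxunent: {describe_exit(proc.returncode)}")
    return proc.stdout.decode("utf-8")
```

For instance, run_hxunent("page.html", "-b") would return the converted text, or raise with "input could not be read" if page.html is missing.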
SEE ALSO
asc2xml(1), xml2asc(1), UTF-8 (RFC 2279, since obsoleted by RFC 3629)
BUGS
The program assumes entities are as defined by HTML. It doesn’t read a
document’s DTD to find the actual definitions in use in a document.
With -f, it will even break all entities that are not HTML entities,
because their leading ampersand is replaced by &amp;.