NAME

       unhtml  -  strip  the  HTML  formatting from a document or the standard
       input stream and display it to the standard output

SYNOPSIS

       unhtml -version
       unhtml [ filename ]

DESCRIPTION

       unhtml reads text from the named file or, if no file name is
       supplied, from the standard input, and removes any HTML formatting it
       finds.  The cleaned text is written to the standard output, so it can
       easily be redirected to a file or piped into another command.  The
       version described by this man page has been improved to handle HTML
       comments and scripts.
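       As a rough illustration of what stripping tags involves, the sketch
       below approximates the basic behavior with sed(1).  It is not how
       unhtml is implemented, and strip_tags is a hypothetical name.  The sed
       expression only deletes text from a "<" to the next ">" on the same
       line, so it misses tags that span lines, leaves entities such as &amp;
       untouched, and gives no special treatment to the comments and scripts
       that this version of unhtml is described as handling.

```shell
# strip_tags: a crude, hypothetical stand-in for unhtml, not its real code.
# It deletes every run of text from '<' to the next '>' on a single line.
# Limitations: tags split across lines survive, entities such as &amp; are
# left as-is, and HTML comments and scripts get no special treatment.
strip_tags() {
    # With a file name, sed reads that file; with none, it reads stdin,
    # mirroring the way unhtml chooses its input.
    sed -e 's/<[^>]*>//g' "$@"
}

# Example: prints "Hello, world!"
printf '<p>Hello, <b>world</b>!</p>\n' | strip_tags
```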

OPTIONS

       -version
              Display the version of unhtml and exit.

EXAMPLES

       This example reads a file called "index.html" and prints its contents
       to the standard output with the HTML formatting removed.  The standard
       output is redirected to a file called "index.txt", which afterwards
       contains the plain text of index.html.

              example% unhtml index.html > index.txt
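       The same command extends to many files at once.  The shell function
       below is a hypothetical convenience, not part of unhtml: it assumes
       unhtml is on your PATH and writes each name.html in the current
       directory to a name.txt beside it.  Because unhtml reads the standard
       input when no file name is given, "unhtml < index.html > index.txt"
       is equivalent to the example above.

```shell
# convert_all_html: hypothetical helper, not shipped with unhtml.
# Assumes the unhtml command is on $PATH.
convert_all_html() {
    for f in *.html; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # "${f%.html}" drops the .html suffix, so index.html -> index.txt.
        unhtml "$f" > "${f%.html}.txt"
    done
}
```

       Each output file has a different name from its input, so the
       problem described under BUGS does not arise.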

BUGS

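       If the output is redirected to the same file as the input, the result
       is an empty file.  This is caused by the shell's redirect operator, not
       by unhtml: the shell truncates the output file before unhtml starts,
       so by the time unhtml opens its input there is nothing left to read.
       It therefore cannot be corrected in the program; write the output to a
       different file instead.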
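       The truncation can be reproduced with any filter, which shows that the
       shell rather than unhtml is responsible.  The first part of the sketch
       below uses sed(1) as a stand-in for unhtml.  The second part is a
       hypothetical helper, filter_in_place, that writes to a temporary file
       created with mktemp(1) and copies the result back only if the filter
       succeeds.  Copying with cat, rather than moving the temporary file,
       keeps the original file's permissions.

```shell
# Reproduce the problem with sed standing in for unhtml.
# The shell truncates demo.html when it sets up '>', before sed runs,
# so sed reads an empty file.
printf '<p>hello</p>\n' > demo.html
sed -e 's/<[^>]*>//g' demo.html > demo.html
wc -c < demo.html    # 0 bytes: the input is gone

# filter_in_place FILE COMMAND [ARGS...]: hypothetical helper.
# Runs COMMAND with FILE on stdin and replaces FILE with the output
# only after COMMAND exits successfully.
filter_in_place() {
    file=$1
    shift
    tmp=$(mktemp) || return 1
    if "$@" < "$file" > "$tmp"; then
        # Copy back rather than mv, so FILE keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # The filter failed: leave FILE untouched.
        rm -f "$tmp"
        return 1
    fi
}

# Usage: filter_in_place index.html unhtml
```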

DEVELOPMENT

       This document is Copyright (C) 1998 by Kevin Swan.

                                3 February 1998                      UNHTML(1)