html2pdbtxt(1)                                                  html2pdbtxt(1)

NAME

       html2pdbtxt - HTML to Doc Text converter for Palm Pilots

SYNOPSIS

       html2pdbtxt [ -bchars ] [ -ttitle ] [ -uURL ] file.html [ file.txt ]
       html2pdbtxt -v

DESCRIPTION

       html2pdbtxt  converts  HTML to text suitable for conversion to a Doc(4)
       file via txt2pdbdoc(1).  If no text filename is  given,  the  generated
       text is sent to standard output.

   HTML Tags
       The following HTML tags (and corresponding ending tags) are recognized:
       ADDRESS, A NAME, BLOCKQUOTE, BR, CENTER, DIV, DL, DT, H1, H2,  H3,  H4,
       H5,  H6,  OL,  OPTION, PRE, P, SELECT, SCRIPT, STYLE, TABLE, TITLE, UL.
       In  all  cases,  the  most  ‘‘reasonable’’  thing  is  done  given  the
       constraints  of the Doc(4) format which is essentially plain text.  ALT
       attributes (typically found in IMG tags) have their text extracted  and
       placed between brackets [like this].  All other HTML tags are stripped.

   Character Entities
       Both HTML  character  and  numeric  (decimal  and  hexadecimal)  entity
       references  are  converted  to  their  byte  value according to the ISO
       8859-1 (Latin 1) character set so they appear properly  on  the  Pilot.
       For  example,  ‘‘r&eacute;sum&eacute;’’  becomes ‘‘résumé’’, with each
       ‘‘&eacute;’’ replaced by the single Latin 1 byte for ‘‘é’’.
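
       As a rough illustration of the byte values involved (this  is  not  the
       converter  itself),  the  bytes  that  a  converted  ‘‘résumé’’  should
       contain can be printed with standard tools.  The  octal  escape  \351
       is the ISO 8859-1 code for e-acute (hexadecimal E9); the variable name
       latin1_hex is made up for this sketch.

```shell
# Illustration only: write "resume" with both e's as the Latin 1 byte
# 0xE9 (octal 351), then dump the bytes as hexadecimal.
latin1_hex=$(printf 'r\351sum\351' | od -An -tx1 | tr -d ' \n')
echo "$latin1_hex"
```

       Each accented letter occupies exactly one byte, which is why the  text
       is six bytes long rather than the eight it would need in UTF-8.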

   Document Title
       Unless specified with the -t option,  the  HTML  file  is  scanned  for
       <TITLE> ... </TITLE> tags and, if found, the title is extracted and put
       on line 1 of the generated file.
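
       The effect of the title scan can be approximated with sed(1).  This  is
       only  a  stand-in  for  the  behavior  described above, assuming a title
       that fits on one line; it is not how html2pdbtxt is  implemented,  and
       the sample HTML and variable names are invented for the sketch.

```shell
# Made-up one-line HTML document used as sample input.
html='<html><head><TITLE>Alice in Wonderland</TITLE></head><body></body></html>'

# Print the text between <TITLE> and </TITLE>, matching the tag names
# case-insensitively; lines without a title produce no output.
title=$(printf '%s\n' "$html" |
    sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee]>\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p')
echo "$title"
```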

   Bookmarks
       Bookmarks are placed into the generated  file  wherever  <A NAME="...">
       tags are found in the HTML file.

OPTIONS

       -bchars   Specify  the  character  sequence  that  is  to  serve as the
                 bookmark indicator.  The default is (*).  (See CAVEATS.)

       -ttitle   Specify the title of the document that is to appear on line 1
                 of the generated file overriding any title found  inside  the
                 HTML file between <TITLE> ... </TITLE> tags.

       -uURL     Specify the URL the HTML file supposedly came from and put it
                 on the line after the title, if any, in the generated file.

       -v        Print the version number to standard output and exit.

EXAMPLE

       To convert an HTML file to Doc:

            html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
            txt2pdbdoc "`head -1 alice.txt`" alice.txt alice.pdb
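
       A further sketch, using only the options listed under OPTIONS, shows  a
       title  override, a non-default bookmark indicator, and output to stan‐
       dard output.  The title, the indicator (+), and the  file  names  are
       hypothetical;  the  guard  simply  skips  the commands where the tool or
       the input file is absent.

```shell
title="Alice in Wonderland"   # hypothetical title
mark="(+)"                    # hypothetical bookmark indicator

# Run only where the tool and the (hypothetical) input file are present.
if command -v html2pdbtxt >/dev/null 2>&1 && [ -f alice.html ]; then
    # Override the title and the bookmark indicator; write to alice.txt.
    html2pdbtxt -t"$title" -b"$mark" alice.html alice.txt
    # With no text filename, the generated text goes to standard output.
    html2pdbtxt alice.html | head -2
fi
```

       The option values are attached directly to the option letters, as  in
       the SYNOPSIS.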

CAVEATS

       1.  Some Doc readers have a ‘‘feature’’ whereby, while scanning for
           bookmarks, they recognize the bookmark sequence of characters
           anywhere in the text and not just at the beginning of a line.

       2.  Some Doc readers do not allow the bookmark sequence to contain  the
           >  character  since  they interpret that as the sequence delimiter,
           e.g., <->> will be interpreted as the sequence being merely -.

       3.  Ordered lists (via the OL tag) are treated as unordered lists (like
           the UL tag).  Numbering the items would require actually parsing
           the HTML rather than performing simple substitutions, which would
           greatly complicate the code.
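
       Given the first caveat, it can be worth checking generated  text  for
       the bookmark sequence in places other than the start of a line before
       converting it.  The following is one possible check with grep(1); the
       sample text is made up and merely stands in for real output.

```shell
# Sample text standing in for generated output (invented for illustration).
sample='(*) Chapter 1
Alice saw a footnote marker (*) in the margin.
(*) Chapter 2'

# Count lines where "(*)" is preceded by at least one character, that is,
# where it occurs somewhere other than the very beginning of the line.
stray=$(printf '%s\n' "$sample" | grep -c '.(\*)')
echo "$stray"
```

       A nonzero count suggests choosing a different sequence with -b.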

SEE ALSO

       pdbtxt2html(1), txt2pdbdoc(1), doc(4), pdb(4)

       International Organization for Standardization.  ‘‘ISO 8859-1:
       Information Processing -- 8-bit single-byte coded graphic character
       sets -- Part 1: Latin alphabet No. 1.’’  1987.

       World  Wide  Web  Consortium.   ‘‘Character  entity  references in HTML
       4.0.’’  HTML 4.0 Specification, http://www.w3.org/

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

html2pdbtxt                    January 21, 2005                 html2pdbtxt(1)