
WWW(3)                                                                  WWW(3)

NAME

       WWW - World Wide Web Package

SYNOPSIS

       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION

       This package provides utility functions for the World Wide Web: they
       extract descriptions of, or meta information from, files, and add
       hyperlinks to text.

SUBROUTINES

       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts  a description from an HTML or plain text file given by
              the FILE name; FILE should  be  an  absolute  path.   The  first
              $description::chars (default: 2048) characters are read.  If the
              file ends in one of the extensions htm, html, or  shtml,  it  is
              presumed  to  be  an  HTML  file; if the file ends in txt, it is
              presumed to be a plain text  file.   Other  extensions  are  not
              recognized and no description is returned for them.
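              A minimal Perl sketch of the file-type check and the bounded
              read described above.  It assumes a plain, case-sensitive suffix
              match, which the package may not use.  The subroutine names
              file_kind and read_head are hypothetical, not part of this
              package, and the 2048 default only mirrors $description::chars.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a file by extension as the text describes.
# htm, html, or shtml => 'html'; txt => 'text'; anything else => undef,
# meaning no description.  Case-sensitive matching is an assumption.
sub file_kind {
    my ($file) = @_;
    return 'html' if $file =~ /\.(?:htm|html|shtml)$/;
    return 'text' if $file =~ /\.txt$/;
    return undef;
}

# Hypothetical helper: read at most $n characters (default 2048, mirroring
# $description::chars) from the start of $file; undef if it cannot be opened.
sub read_head {
    my ($file, $n) = @_;
    $n //= 2048;
    open(my $fh, '<', $file) or return undef;
    my $buf = '';
    read($fh, $buf, $n);
    close($fh);
    return $buf;
}
```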

              For HTML files, the package first looks for a
              <META NAME="description" CONTENT="..."> or a
              <META NAME="DC.description" CONTENT="..."> (Dublin Core)
              element; if one is found, the words given as the value of its
              CONTENT attribute are returned as the description.
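              A minimal sketch of the META lookup, assuming NAME precedes
              CONTENT inside the tag and both values are double-quoted; real
              markup can violate either assumption, and the package's own
              matching may differ.  meta_description is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the CONTENT of a <META NAME="description"> or
# <META NAME="DC.description"> element, or undef if neither is present.
# Assumes NAME comes before CONTENT and both values are double-quoted.
sub meta_description {
    my ($html) = @_;
    if ($html =~ /<meta\s+name\s*=\s*"(?:DC\.)?description"\s+content\s*=\s*"([^"]*)"/i) {
        return $1;
    }
    return undef;
}
```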

              Otherwise, all HTML comments, the text between <SCRIPT>,
              <STYLE>, and <TITLE> tags, and all other HTML tags are stripped.
              If <AREA ... ALT="..."> or <IMG ... ALT="..."> elements are
              found, the words given as the values of their ALT attributes are
              extracted.
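              The stripping step can be approximated with a few regular
              expression substitutions.  This is a sketch, not the package's
              code: strip_html is a hypothetical name, the ALT text is assumed
              to replace its tag in place, and regex-based stripping is not a
              full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the stripping step with regexes.
# Assumption: ALT words replace their AREA/IMG tag in place.
sub strip_html {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;                                  # comments
    $html =~ s{<(script|style|title)\b[^>]*>.*?</\1\s*>}{}gis;  # element bodies
    # Keep the ALT words of AREA and IMG elements before tags are dropped.
    $html =~ s/<(?:area|img)\b[^>]*?\balt\s*=\s*"([^"]*)"[^>]*>/ $1 /gi;
    $html =~ s/<[^>]*>/ /g;                                     # all other tags
    $html =~ s/\s+/ /g;                                         # collapse whitespace
    $html =~ s/^ //;
    $html =~ s/ $//;
    return $html;
}
```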

              Finally, for either HTML or plain text files, at most
              $description::words (default: 50) words are returned.
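              Word truncation might look like the following sketch, where
              first_words is a hypothetical name, $max_words is a hypothetical
              stand-in for $description::words, and words are assumed to be
              whitespace-separated.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $description::words (documented default: 50).
our $max_words = 50;

# Hypothetical helper: keep at most $limit whitespace-separated words.
sub first_words {
    my ($text, $limit) = @_;
    $limit //= $max_words;
    my @words = split ' ', $text;          # awk-style split skips leading space
    splice(@words, $limit) if @words > $limit;
    return join ' ', @words;
}
```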

       extract_meta( FILE, NAME )
              Extracts the value of the CONTENT attribute of the META element
              whose NAME attribute matches NAME, from the HTML file given by
              the FILE name; FILE should be an absolute path.  The file must
              end in one of the extensions htm, html, or shtml to be
              considered an HTML file.  The first $description::chars
              (default: 2048) characters are read, and they are cached between
              consecutive calls that use the same filename.
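              A sketch of the caching behavior, assuming the cache holds only
              the most recently read file and that NAME precedes CONTENT in
              the tag.  extract_meta_sketch, its optional character-count
              argument, and the $reads counter are hypothetical; the counter
              exists only to show that two calls on one file read it once.

```perl
use strict;
use warnings;

# Hypothetical single-entry cache: the most recently read file and its text.
my ($cached_file, $cached_text) = ('', '');
our $reads = 0;   # hypothetical: counts real file reads, for illustration only

sub extract_meta_sketch {
    my ($file, $name, $chars) = @_;
    $chars //= 2048;                       # mirrors $description::chars
    return undef unless $file =~ /\.(?:htm|html|shtml)$/;
    if ($file ne $cached_file) {           # re-read only when the file changes
        open(my $fh, '<', $file) or return undef;
        my $buf = '';
        read($fh, $buf, $chars);
        close($fh);
        ($cached_file, $cached_text) = ($file, $buf);
        $reads++;
    }
    # Assumption: NAME precedes CONTENT and both values are double-quoted.
    my $re = qr/<meta\s+name\s*=\s*"\Q$name\E"\s+content\s*=\s*"([^"]*)"/i;
    return $cached_text =~ $re ? $1 : undef;
}
```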

       hyperlink( LIST )
              Adds hyperlinks to strings: any substring that is a valid URL
              (according to RFC 1630) has the appropriate HTML tags ‘‘wrapped’’
              around it so that it is selectable when displayed in a browser.
              URLs with the ftp, gopher, http, https, mailto, news, telnet, and
              wais schemes are recognized.
              Example:

                 Read all about it at
                 http://www.usatoday.com/

              becomes:

                 Read all about it at
                 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>
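              A minimal sketch of the wrapping step.  The URL pattern (a
              scheme name, a colon, then a run of characters other than
              whitespace, <, >, and ") is an assumption, far looser than the
              RFC 1630 grammar the package documents, and it does not strip
              trailing punctuation.  hyperlink_sketch is hypothetical and
              returns modified copies rather than changing its arguments.

```perl
use strict;
use warnings;

# The recognized schemes listed in the text.
my $schemes = qr/(?:ftp|gopher|https?|mailto|news|telnet|wais)/;

# Hypothetical helper: wrap each URL-like substring in an HTML anchor.
# Assumption: a URL is a scheme, a colon, then non-space, non-<>" characters.
sub hyperlink_sketch {
    my @out = @_;                          # copy; caller's strings untouched
    for (@out) {
        s{\b($schemes:[^\s<>"]+)}{<A HREF="$1">$1</A>}g;
    }
    return wantarray ? @out : $out[0];
}
```

              In scalar context the sketch returns the first converted string;
              in list context it returns all of them.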

SEE ALSO

       perl(1)

       Tim Berners-Lee.  ‘‘Universal Resource Identifiers  in  WWW,’’  Request
       for  Comments  1630,  Network Working Group of the Internet Engineering
       Task Force, June 1994.

       Tim Berners-Lee, Larry Masinter, and Mark McCahill.  ‘‘Uniform Resource
       Locators  (URL),’’  Request  for  Comments 1738, Network Working Group,
       December 1994.

       Dave Raggett, Arnaud Le Hors,  and  Ian  Jacobs.   ‘‘Notes  on  helping
       search  engines index your Web site,’’ HTML 4.0 Specification, Appendix
       B: Performance,  Implementation,  and  Design  Notes,  World  Wide  Web
       Consortium, April 1998.

       --.   ‘‘Objects,  Images, and Applets: How to specify alternate text,’’
       HTML 4.0 Specification, 13.8, World Wide Web Consortium, April 1998.

       Dublin  Core  Directorate.   ‘‘The  Dublin  Core:  A   Simple   Content
       Description Model for Electronic Resources.’’

       Larry  Wall,  et al.  Programming Perl, 3rd ed., O’Reilly & Associates,
       Inc., Sebastopol, CA, 2000.

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

WWW                            February 12, 2000                        WWW(3)