NAME
sitemap - make a site map from meta tags in an HTML tree
SYNOPSIS
sitemap [start-dir | config-file]
DESCRIPTION
sitemap indexes all pages under the start directory and writes an HTML
map page to standard output. The code looks for description information
for each page in a META DESCRIPTION header; if it doesn’t find one, the
page is omitted from the index. That is, HTML pages to be indexed
should have a meta tag with its name attribute set to description and
its content attribute set to a brief description of the contents. For
example,
<head>
<title>Sitemap documentation</title>
<meta name="description"
content="Documentation for sitemap program to index HTML pages.">
</head>
The output of sitemap is an HTML page that contains a list of
descriptions and links to the indexed pages. This output can be
configured via an rc file (see below).
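The description lookup can be sketched with Python's standard html.parser module. This is a minimal illustration, not the code sitemap itself uses; the names DescriptionParser and find_description are hypothetical.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower() == "description":
            self.description = attributes.get("content")


def find_description(html_text):
    """Return the description string, or None if the page has none."""
    parser = DescriptionParser()
    parser.feed(html_text)
    return parser.description


page = """<head>
<title>Sitemap documentation</title>
<meta name="description"
content="Documentation for sitemap program to index HTML pages.">
</head>"""
print(find_description(page))
```

A page for which find_description returns None corresponds to a page that would be omitted from the index.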
ARGUMENTS
If no arguments are supplied, the start directory is the directory
indicated by the DOCUMENT_ROOT or HOME environment variables, in that
order. If neither variable is set on a UNIX system, the effective
user’s home directory (as indicated in the passwd file) will be used.
If a start-dir directory is supplied as an argument, then sitemap will
look inside that directory for a .sitemaprc. (The effective start
directory can still be overridden with the Startdir directive inside
the configuration file.) If the configuration file does not exist,
sitemap will run with a set of default parameters, which is usually not
what you want.
If a config-file configuration file is specified, then the
configuration for sitemap will be read from that file.
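The argument rules above can be sketched roughly as follows. This is an assumption-laden illustration rather than sitemap's actual logic: resolve_start and passwd_home are hypothetical names, the sketch does not decide which configuration file applies when no argument is given, and a later Startdir directive could still override the start directory.

```python
import os


def passwd_home():
    """Home directory of the effective user from the passwd database (UNIX only)."""
    import pwd
    return pwd.getpwuid(os.geteuid()).pw_dir


def resolve_start(argument, environ):
    """Return (start_dir, config_file) for one optional command-line argument."""
    if argument is None:
        # No argument: DOCUMENT_ROOT first, then HOME, then the passwd entry.
        start = environ.get("DOCUMENT_ROOT") or environ.get("HOME") or passwd_home()
        return start, None
    if os.path.isdir(argument):
        # A start-dir argument: look for .sitemaprc inside that directory.
        return argument, os.path.join(argument, ".sitemaprc")
    # Otherwise treat the argument as a config-file; its directory is the
    # default start directory unless Startdir says otherwise.
    return os.path.dirname(os.path.abspath(argument)), argument


print(resolve_start(None, {"DOCUMENT_ROOT": "/var/www", "HOME": "/home/user"}))
```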
CONFIGURATION FILE
sitemap is a Python script. To configure the strings used in the index
page header and footer, you can create a configuration file in your
home directory called .sitemaprc (or as indicated by the command-line
parameter). A skeleton of a configuration file is provided with the
program. The file should start with the text [sitemap] on a line by
itself. Subsequent lines should be name=value pairs. Lines beginning
with the # character are treated as comments and are ignored. The
possible field names in the configuration file are listed below:
Hometitle=title
The title of your homepage. The generated site map will contain
a link with this text.
Homepage=url
The URL of your homepage. The generated site map will contain a
link back to this page.
Indextitle=title
The title for the generated site map page.
Headinfo=any HTML text
Any additional HTML you want to include in the <head> section of
the site map. Use with care - only certain tags are legal in the
<head> of a page.
Encoding=encoding
The HTML encoding, such as iso-8859-1 or utf-8. If it is not
specified, iso-8859-1 is used for all languages but Czech, where
iso-8859-2 is used.
Startdir=directory
The root directory of the site to index. If it is not specified,
the directory of the .sitemaprc configuration file is used.
Body=attributes
Any additional attributes to be included in the <body> tag.
Prefix=url
An optional URL prefix to put before each pathname (sitemap
outputs each filename as a site-relative path beginning with a
‘‘/’’). If it is not provided, sitemap tries to compute it
as follows. If the environment variable DOCUMENT_ROOT is
set, and the start directory is a subdirectory of the document
root, the prefix is the relative path from the document root to
the start directory. Otherwise, sitemap assumes that the
start directory can be accessed with the URL ‘‘/’’. (That is,
the start directory would be the directory indicated by the web
server’s DOCUMENT_ROOT.) If this is incorrect (e.g. you are
indexing a user’s home page whose URL begins with
‘‘/~username’’) you can supply the alternative URL prefix here.
Dirtitle=title
The title string to use for directories. Directories are listed
and linked in the generated site map page with this text.
Fullname=name
Your full name. This name will be included in one corner of the
generated site map page. You may want to list a company name or
a copyright statement instead, for example.
Mailaddr=address
E-mail address of a contact person. Since the e-mail address
will be linked on the generated site map page, you may want to
set this parameter to the e-mail address of a contact person or
a webmaster.
Language=language
The language for the boilerplate text included in the output
(Czech, English, French, German, Italian, Norwegian, Spanish, or
Swedish).
Icondirs=icon path
The path (relative to the start directory, or a URL) of the icon
for directories. The icon must be 33 pixels wide (or scalable
to that size). If omitted, no icon will be displayed next to
site map entries for directories.
Icontext=icon path
The path (relative to the start directory, or a URL) of the icon
for HTML files. The icon must be 33 pixels wide (or scalable to
that size). If omitted, no icon will be displayed next to site
map entries for HTML pages.
Indexfiles=file1 file2 file3
A space-separated list of files to treat as index or main pages
for a directory. Any file with a filename exactly equal to one
of the indicated filenames will be treated as an index page.
Index pages sort to the top of the list of files in a directory.
For example, index.html or default.htm might be good candidates
for this parameter.
Exclude=word1 word2
A space-separated list of words to ignore when scanning files
and directories. sitemap will skip any file or entire
subdirectory that contains any of the words in its path. For
example, Test or CVS may be good candidates for this parameter.
Debug=y
Set this parameter to view the computed configuration file name,
start directory, document root, and prefix in the generated site
map page. You’ll need to view the source of the generated HTML
file because these values will be listed within an HTML
comment. Search for the word Debugging in the generated HTML
page.
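The file format and the Indexfiles and Exclude directives can be illustrated with Python's standard configparser module. This is only a sketch under stated assumptions: the manual page does not say how sitemap parses its file, the sample values are invented, and the helper names (read_settings, is_excluded, sort_files) are hypothetical. Treating Exclude words as case-sensitive substrings and sorting the remaining files alphabetically are also assumptions.

```python
import configparser

# Invented sample values; only the field names come from the list above.
SAMPLE = """\
[sitemap]
# Lines beginning with # are comments.
Hometitle=Example home
Homepage=/index.html
Indextitle=Site map
Language=English
Indexfiles=index.html default.htm
Exclude=Test CVS
"""


def read_settings(text):
    """Parse name=value pairs from the [sitemap] section into a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep field names exactly as written
    parser.read_string(text)
    return dict(parser["sitemap"])


def is_excluded(path, exclude_words):
    """True if any Exclude word occurs anywhere in the path."""
    return any(word in path for word in exclude_words)


def sort_files(filenames, index_files):
    """Index pages first (False sorts before True), then by name."""
    return sorted(filenames, key=lambda name: (name not in index_files, name))


settings = read_settings(SAMPLE)
index_files = settings["Indexfiles"].split()
exclude_words = settings["Exclude"].split()
print(sort_files(["about.html", "index.html", "contact.html"], index_files))
print(is_excluded("docs/Test/draft.html", exclude_words))
```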
USE UNDER CGI
You can use sitemap to generate site maps on the fly. Any command-line
argument can be passed as the query string (i.e. a string immediately
following the URL of the CGI script and a ’?’ character).
sitemap detects that it is running under CGI when the REMOTE_ADDR
environment variable is defined. In that case, it
outputs a content-type header (text/html) ahead of the HTML page.
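The CGI behaviour described above can be sketched as follows. The function names are hypothetical, and reading the argument from the standard CGI variable QUERY_STRING (and URL-decoding it) is an assumption; the manual page only says that the argument is passed as the query string.

```python
import sys
from urllib.parse import unquote


def running_under_cgi(environ):
    """CGI is assumed whenever REMOTE_ADDR is defined."""
    return "REMOTE_ADDR" in environ


def cgi_argument(environ):
    """Return the decoded query string as the argument, or None if empty."""
    query = environ.get("QUERY_STRING")
    return unquote(query) if query else None


def write_page(html, environ, out=None):
    """Write the page, preceded by a content-type header under CGI."""
    if out is None:
        out = sys.stdout
    if running_under_cgi(environ):
        # A blank line separates the header from the body.
        out.write("Content-Type: text/html\n\n")
    out.write(html)


write_page("<html></html>\n", {"REMOTE_ADDR": "127.0.0.1"})
```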
When running as a CGI script, sitemap does not assume that the document
root is necessarily identical with the start directory. It inspects the
DOCUMENT_ROOT environment variable and constructs a prefix in an
attempt to get from the server document root to the start directory.
This will fail if the start directory is not a subdirectory under the
document root, in which case the Prefix directive in the configuration
file should be used.
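The prefix computation can be sketched as below, assuming POSIX-style absolute paths. compute_prefix is a hypothetical name, and representing "no prefix" as an empty string (so that prefix plus ‘‘/page.html’’ gives the final URL path) is an assumption about the output format.

```python
import os


def compute_prefix(start_dir, document_root):
    """Return the URL prefix from the document root to the start directory.

    An empty string means the start directory is served as "/".
    """
    if document_root:
        root = os.path.abspath(document_root)
        start = os.path.abspath(start_dir)
        # The start directory must be the root itself or lie beneath it.
        if os.path.commonpath([root, start]) == root:
            relative = os.path.relpath(start, root)
            if relative == ".":
                return ""
            return "/" + relative.replace(os.sep, "/")
    # DOCUMENT_ROOT unset or start directory outside it: a Prefix
    # directive would be needed if "/" is not the right URL.
    return ""


print(compute_prefix("/var/www/docs", "/var/www"))
```

When the start directory lies outside the document root, the sketch falls back to the empty prefix, which is the situation in which the Prefix directive has to be set by hand.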
AUTHORS
Eric S. Raymond <esr@thyrsus.com>.
Immo Huneke <HunekeI@Logica.Com>.
Tom Bryan <tbryan@python.net>.
Modified for Debian by Aaron Isotton <aaron@isotton.com>.