NAME
Linklint - fast link checker and website maintenance tool
SYNOPSIS
linklint [-cache directory] [-case] [-checksum] [-concise_url]
[-db1..9] [-delay d] [-doc] [-docbase base] [-dont_output xxxx]
[-error] [-flush] [-forward] [-help] [-help_all] [-host hostname:port]
[-host hostname] [-htmlonly] [-http] [-http_header name:value] [-ignore
ignoreset] [-index file] [-language zz] [-limit n] [-list] [-local
linkset] [-map /a=[/b]] [-net] [-netmod] [-netset] [-no_anchors]
[-no_query_string] [-no_warn_index] [-orphan] [-out file]
[-output_frames] [-output_index filename] [-password realm
user:password] [-proxy hostname[:port]] [-quiet] [-redirect] [-retry]
[-root dir] [-silent] [-skip skipset] [-textonly] [-timeout t]
[-url_doc_prefix
url/] [-version] [-warn] [-xref] linkset
VERSION
2.3.5 August 13, 2001
DESCRIPTION
This manual page briefly documents Linklint, an Open Source Perl
program that checks local and remote HTML links.
This manual page was written for the Debian distribution because the
original program does not have a manual page. Instead, it has
documentation in the HTML format; see below.
OPTIONS
Input File Selection
Whether you are doing a local site check or an HTTP site check, you
specify which directories (presumably containing HTML files) to check
with one or more linksets. A linkset uses two wildcard characters, @ and
#. Each linkset specifies one or more directories, much as the standard
* and ? wildcard characters specify the names of files in one
directory.
The @ character matches any string of characters (similar to "*"), and
the # character (similar to "?") matches any string of characters that
does not contain "/". The best way to understand how @ and # work is to
look at a few examples:
the entire site                          /@
the homepage only (default)              /
files in the root directory only         /#
. . . and one directory down             /#/#
files in the sub directory only          /sub/#
files in the sub directory and below     /sub/@
specific files                           /file1 /file2 ...
specific subdirectories                  /sub1/@ /sub2/@ ...
If you specify more than one linkset, files matching any of the
linksets will be checked. HTML files that don’t match any of the
linksets will be skipped. Linklint will see if they exist but won’t
check any of their links.
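The matching rules above can be sketched in Python. This is only an illustration of the documented behaviour, not Linklint's own (Perl) implementation. The function names linkset_to_regex and is_checked are hypothetical, and the sketch assumes that a linkset must match the whole path.

```python
import re


def linkset_to_regex(linkset):
    """Translate a linkset into a compiled regular expression.

    Assumed semantics, taken from the description above:
    "@" matches any string, "#" matches any string without "/".
    """
    parts = []
    for ch in linkset:
        if ch == "@":
            parts.append(".*")
        elif ch == "#":
            parts.append("[^/]*")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def is_checked(path, linksets):
    """Return True if path matches any of the linksets (whole-path match)."""
    return any(linkset_to_regex(ls).fullmatch(path) for ls in linksets)


print(is_checked("/sub/page.html", ["/sub/#"]))       # True
print(is_checked("/sub/deep/page.html", ["/sub/#"]))  # False
print(is_checked("/sub/deep/page.html", ["/sub/@"]))  # True
```

A path is checked as soon as one linkset matches it, which mirrors the rule that files matching any of the linksets are checked.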
Other File Selection Options
-skip skipset
Skips HTML files that match skipset. "Linklint" will make sure
these files exist but won’t add any of their links to the list of
files to check. Multiple skipsets are allowed, but each must be
preceded with -skip on the command line. Skipsets use the same
wildcard characters as linksets.
-ignore ignoreset
Ignores files matching ignoreset. "Linklint" doesn’t even check to
see if these files exist. Multiple ignoresets are allowed, but
each must be preceded with -ignore on the command line. Ignoresets
use the same wildcard characters as linksets.
-limit n
Limits checking to n HTML files (default 500). All HTML files
after the first n are skipped.
Local Site Checking
If you are developing HTML pages on a computer that does not have an
http server, or if you are developing a simple site that does not use
Server Redirection or extensive CGI, you should use local site
checking.
linklint /@
Checks all HTML files in the current directory and below. Assumes that
the current directory is the server root directory so links starting
with "/" default to this directory. You must specify /@ to check the
entire site. See Which Files to Check for details.
linklint -root dir /@
Checks all HTML files in dir and below. This is useful if you want to
check several sites on the same machine or if you don’t want to run
Linklint in your public HTML directory.
Other Local Site Options
-host hostname
By default "Linklint" assumes all links on your site that start
with "http://" are remote links to other sites. If you have
absolute links to your own site, give "Linklint" your hostname and
links starting with "http://hostname" will be treated as local
files. If you specify -host hostname:port, only http links to this
hostname and port will be treated as local files.
-case
Makes sure that the (upper/lower) case of filenames used in links
inside HTML tags matches the case used by the file system. This is
for Windows only and is very handy if you are porting a site to a Unix
host.
-orphan
Checks all directories that contain files used on the site for
unused (orphan) files.
-index file
Uses file as the default index file instead of the default list
used by "Linklint". You can specify more than one file but each one
must be preceded by -index on the command line. If a default index
file is not found, "Linklint" uses a listing of the entire
directory. See the Default File section for details.
-map /a=[/b]
Substitutes leading /a with /b. For server-side image maps or to
simulate Server Redirection.
-no_warn_index
Turns off the "index file not found" warning. Applies to local site
checking only.
-no_anchors
Tells "Linklint" to ignore named anchors. This could ease memory
problems for people with large sites who are primarily interested
in missing pages and not missing named anchors. This option works
for both HTTP and local site checks.
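The substitution performed by the -map option described above can be pictured with a short Python sketch. It assumes a plain string-prefix replacement and an empty replacement when /b is omitted. The function name apply_map is hypothetical, and Linklint's actual handling (for example at directory boundaries) may differ.

```python
def apply_map(path, mapping):
    """Apply one -map expression of the form "/a=/b" (or "/a=") to path."""
    old, sep, new = mapping.partition("=")
    if not sep:
        raise ValueError("expected a mapping of the form /a=[/b]")
    if path.startswith(old):
        # Replace only the leading occurrence of /a.
        return new + path[len(old):]
    return path


print(apply_map("/cgi-bin/maps/nav.map", "/cgi-bin/maps=/maps"))  # /maps/nav.map
print(apply_map("/old/page.html", "/old="))                        # /page.html
print(apply_map("/other/page.html", "/old=/new"))                  # /other/page.html
```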
HTTP Site Checking
If you have a complicated site that uses lots of CGI or Server
Redirection, you should use HTTP site checking. Even though an HTTP
site check reads pages via your HTTP server, you will get the best
performance if you do your checking on a machine that has a high speed
connection to your server.
linklint -http -host www.site.com /@
The -http flag tells "Linklint" to check HTML files on the site
www.site.com via a remote http connection. You must specify a -host
whenever you do an HTTP site check (otherwise Linklint won’t know where
to get your pages). You can specify /@ to check the entire site. See
Which Files to Check for details.
HTTP Site Check Options
-http
This flag tells Linklint to perform an HTTP site check instead of a
local site check. All files (except server side image maps) will
be read via the HTTP protocol from your web server.
-host hostname:port
If you include :port at the end of your hostname, Linklint uses
this port for the HTTP site check.
-password realm user:password
Uses user and password as authorization to enter password protected
realm. Realms are named areas of a site that share a common set of
usernames and passwords. If passwords are needed to check your
site, Linklint will tell you which realms need passwords in warning
messages. Enclose the realm in double quotes if it contains
spaces. If no password is given for a specific realm, Linklint
will try using the password for the "DEFAULT" realm if it was
provided.
-timeout t
Times out after t seconds (default 15) when getting files via http.
Once data is received, an additional t seconds is allowed. The
timeout is disabled on Windows machines since the Windows port of
Perl does not support the "alarm()" function.
-delay d
Delays d seconds between requests to the same host (default 0).
This is a friendly thing to do especially if you are checking many
links on the same host.
-local linkset
Gets files that match linkset locally. The default -local linkset
is @.map (which matches any link ending in .map). This allows
Linklint to follow links through server-side image maps. The
default is ignored if you specify your own -local expressions. You
need to specify the -root directory for this option to work
properly.
-map /a=[/b]
Substitutes leading /a with /b. For server-side image maps or to
simulate Server Redirection.
-no_anchors
Tells "Linklint" to ignore named anchors.
-no_query_string
Up until version 2.3.4, Linklint did not use query strings while
doing HTTP site checks. Query strings were removed before making
HTTP requests. As of 2.3.4 query strings in links are used in the
requests. Use the -no_query_string flag to get back the "old"
behavior.
-http_header Name:value
Adds the HTTP header Name: value to all HTTP requests generated by
Linklint. You will need to use quotation marks to hide spaces in
the header line from the command line interpreter. Linklint will
automatically add a space after the first colon if there is not one
there already. Multiple (unique) header lines are allowed.
-language zz
This option is only useful if you are checking a site that uses
content negotiation to present the same URL in different languages.
Creates an HTTP Request header of the form Accept-Language: zz that
is included as part of all HTTP requests generated by Linklint.
Multiple -language specifications are allowed. This will result in
a single Accept-Language: header that lists all of the languages
you have specified in alphabetical order. Some web sites can use
this information to return pages to you in a specific language.
If you need to get more complicated than this, use the more general
purpose -http_header to create your own header. There is a partial
list of language abbreviations (taken from Debian) included as part
of the Linklint documentation.
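The header handling described for -http_header and -language can be sketched as follows. This is an illustration under stated assumptions, not Linklint's code: the function names are hypothetical, the languages are joined with ", " (the usual HTTP list separator), and duplicates are dropped.

```python
def normalize_header(header):
    """Turn "Name:value" into "Name: value", adding the space if missing."""
    name, sep, value = header.partition(":")  # split at the first colon only
    if not sep:
        raise ValueError("expected a header of the form Name:value")
    if not value.startswith(" "):
        value = " " + value
    return name + ":" + value


def accept_language(languages):
    """Build one Accept-Language header listing the languages alphabetically."""
    # Assumption: repeated languages are listed once.
    return "Accept-Language: " + ", ".join(sorted(set(languages)))


print(normalize_header("X-Checker:linklint"))  # X-Checker: linklint
print(accept_language(["fr", "de", "en"]))     # Accept-Language: de, en, fr
```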
Remote URL Checking
A remote URL check is used to see if a remote URL exists (or has been
recently modified). Links in the remote pages are not checked nor does
Linklint look for named anchors in remote URLs.
Remote URL checking can be used to check all of the "remote" links on
your site (those that link to pages on other sites) or it can check a
list of URLs. There are several ways to specify which remote URLs to
check:
linklint http://somehost/file.html
Checks to see if /file.html exists on somehost. Multiple URLs can be
entered on the command line, in an @commandfile, or in an @@httpfile.
Every URL to be checked must begin with "http://". This will disable
site checking.
linklint @@httpfile
Checks all the remote http URLs found in httpfile. Anything in the file
starting with "http://" is considered to be a URL. If the file looks
like a remoteX.txt file generated by Linklint then all failed URLs will
be cross referenced.
linklint @@ -doc linkdoc
Assuming you have already done a site check and used -doc linkdoc to
put all of your output files in the linkdoc directory, Linklint will
check all the remote links that were found on your site and cross
reference all failed URLs without doing a site check. You can use the
-netmod or -netset flags to enable the status-cache.
linklint -net [site check options]
The -net flag tells Linklint to check all remote links after doing
either a local or an HTTP site check. If you are having memory
problems, don’t use the -net option; instead, use one of the @@ options
above.
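The way an @@httpfile is scanned ("anything in the file starting with http:// is considered to be a URL") can be sketched in Python. The exact rule Linklint uses to decide where a URL ends is not stated here, so the sketch assumes that a URL stops at whitespace, a quote, or an angle bracket. The name extract_urls is hypothetical.

```python
import re

# Assumption: a URL ends at whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"http://[^\s\"'<>]+")


def extract_urls(text):
    """Return the http:// URLs found in text, in order, without repeats."""
    urls = []
    for url in URL_RE.findall(text):
        if url not in urls:
            urls.append(url)
    return urls


sample = "home http://www.example.com/index.html\nsee <http://example.org/a>\n"
print(extract_urls(sample))
# ['http://www.example.com/index.html', 'http://example.org/a']
```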
Other Remote URL Options
-timeout t
Times out after t seconds (default 15) when getting files via http.
Once data is received, an additional t seconds is allowed. The
timeout is disabled on Windows machines since the Windows port of
Perl does not support the "alarm()" function.
-delay d
Delays d seconds between requests to the same host (default 0).
This is a friendly thing to do especially if you are checking many
links on the same host.
-redirect
Checks for <meta> redirects in the headers of remote URLs that are
html files. If a redirect is found it is followed. This feature
is disabled if the status cache is used.
-proxy hostname[:port]
Sends all remote HTTP requests through the proxy server hostname
and the optional port. This allows you to check remote URLs or
(new with version 2.3.1) your entire site from within a firewall
that has an http proxy server. Some error messages (relating to
host errors) may not be available through a proxy server.
-concise_url
Turns off printing successful URLs to STDOUT during remote link
checking.
Status Cache Options
The Status Cache is a very powerful feature. It allows you to keep
track of recent changes in all of the remote (off-site) pages you link
to. You can then use the Linklint output files to quickly check changed
pages to see if they still meet your needs.
The flags below make use of the status cache file linklint.url (kept in
your HOME or LINKLINT directory). This file keeps track of the
modification dates of all the remote URLs that you check.
-netmod
Operates just like -net but makes use of the status cache. Newly
checked URLs will be entered in the cache. Linklint will tell you
which (previously cached) URLs have been modified since the last
-netset.
-netset
Like -netmod but also resets the last modified status in the cache
for all URLs that checked ok. If you always use -netset, modified
URLs will be reported just once.
-retry
Only checks URLs that have a host fail status in the cache.
Sometimes a URL fails because its host is temporarily down. This
flag enables you to recheck just those links. An easy way to
recheck all the cached URLs with host failures is "linklint @@
-retry". Use "linklint @@linkdoc/remoteX.txt -retry" if you want
failed URLs to be cross referenced.
-flush
Removes all URLs from the cache that are not currently being
checked. The -retry flag has no effect on which URLs are flushed.
-checksum
Ensures that every URL that has been modified is reported as such.
This flag can make the remote checking take longer. Many of the
pages that require a checksum are dynamically generated and will
always be reported as modified.
-cache directory
Reads and writes the linklint.url cache file in this directory.
The default directory is set by your LINKLINT or HOME environment
variables.
Output Options
No output files are generated by default; only progress and a brief
summary of the results are printed to the screen. You can produce
complete documentation (split up into separate files) in a -doc
directory, put selected output in a single -out file, or redirect the
standard output to a file. See the Output File Specification section
for a detailed description of all output files.
Multi File Output
-doc linkdoc
Sends all output to the linkdoc directory. The output is divided
into separate .txt and .html files. Complete documentation is
always produced regardless of the single file flags.
The file index.txt contains an index to all the other files;
index.html is an HTML version of the index. The index files for
remote URL checking are url_index.txt and url_index.html.
-textonly
Prevents any HTML files from being created in the -doc directory.
-htmlonly
Erases redundant text files in the -doc directory after they have
been used to create the HTML output files. The files remote.txt
and remoteX.txt are not erased since they can be used by Linklint
to recheck remote URLs.
-docbase base
Overrides the default base expression used for directing a browser
to the resources listed in the output HTML files. The base is
prepended to local links in the output HTML files. This only
affects the links in HTML output files; it has no effect on what is
displayed in these files. Ordinarily this flag would only be used
during a local site check to set the base to "http://host".
-output_frames
All HTML output data files are linked to from index.html. If you
use this flag, the data files will be opened in a new frame (window),
which can be handy since it always leaves the index.html file open in
its own window.
-output_index filename
The output index files were previously named linklint.txt and
linklint.html. These have now been changed to index.txt and
index.html. You can use the -output_index option to change this
name back to "linklint" or to something else.
-url_doc_prefix url/
By default, the output files associated with remote URL checking all
start with "url". You can change this with the -url_doc_prefix
option. If the url_doc_prefix contains a "/" character then the
appropriate directory will be created (as a subdirectory of the
-doc directory).
-dont_output xxxx
Don’t create output files that contain "xxxx". Can be repeated.
Example:
-dont_output "X$"
will suppress the output of all cross reference files.
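The "X$" example suggests that -dont_output expressions behave like regular expressions tested against output file names without their extension (cross reference files such as remoteX.txt end in "X"). That reading is an inference from the example, not something stated here. Under that assumption, the filtering could be sketched in Python as follows, with the hypothetical helper is_suppressed:

```python
import re


def is_suppressed(name, patterns):
    """Return True if any -dont_output pattern matches the output name.

    Assumption: patterns are regular expressions searched in the file
    name with its .txt/.html extension removed.
    """
    return any(re.search(p, name) for p in patterns)


names = ["index", "remote", "remoteX"]
print([n for n in names if not is_suppressed(n, ["X$"])])  # ['index', 'remote']
```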
Single File Output
-error
Lists missing files and other errors.
-out file
Sends list output and summary information to file.
-list
Lists all found files, links, directories etc.
-warn
Lists all warnings.
-xref
Adds cross references to the lists.
-forward
Sorts lists by referring file.
Debug and other Flags
-db1
Debugs command line input and linkset expressions.
-db2
Prints the name of every file that gets checked (not just HTML
files).
-db3
Debugs HTML parser, prints out tags and resulting links.
-db4
Debugs socket connection (kind of).
-db5
Not used.
-db6
Details last-modified status for remote URLs (requires -netset or
-netmod).
-db7
Prints brief debug information while checking remote URLs.
-db8
Prints all http headers while checking remote URLs.
-db9
Generates random http errors.
-version
Gives version information.
-help
Lists a few simple examples of how to use Linklint.
-help_all
Lists all help (contained in program) including every input option.
-quiet
Disables printing progress to the screen.
-silent
Disables printing summaries to the screen.
AUTHOR
Linklint is written by James B. Bowlin <jbowlin@linklint.org>. This
manual page was written by Denis Barbier <barbier@debian.org> for the
Debian system (but may be used by others) by cut’n’paste from original
documentation written in HTML.