NAME

       webalizer - A web server log file analysis tool.

SYNOPSIS

       webalizer [ option ... ] [ log-file ]

       webazolver [ option ... ] [ log-file ]

DESCRIPTION

       The  Webalizer is a web server log file analysis program which produces
       usage statistics in HTML  format  for  viewing  with  a  browser.   The
       results  are  presented  in  both  columnar and graphical format, which
       facilitates interpretation.  Yearly, monthly, daily  and  hourly  usage
       statistics  are  presented,  along with the ability to display usage by
       site, URL, referrer, user agent (browser),  username,  search  strings,
       entry/exit  pages,   and country (some information may not be available
       if not present in the log file being processed).

       The Webalizer supports CLF (common log format) log files, as well as
       Combined log formats as defined by NCSA and others, and variations of
       these which it attempts to handle intelligently.  It also supports
       wu-ftpd xferlog formatted log files, allowing analysis of ftp servers,
       and squid proxy logs.  Logs may be compressed with gzip.  A compressed
       log file is detected and uncompressed automatically while it is read,
       provided it has the standard gzip extension of .gz.

       webazolver is normally just a symbolic link to webalizer.  When run
       as webazolver, only DNS cache file creation/updates are performed, and
       the program exits once they are complete.  All normal options and
       configuration directives are accepted, although many have no effect
       in this mode.  In addition, a DNS cache file must be specified.  If
       the number of DNS child processes is not specified, webazolver
       defaults to 5.

       This documentation applies to The Webalizer Version 2.01

RUNNING THE WEBALIZER

       The Webalizer was designed to be run from a Unix command line prompt or
       as a crond(8) job. Once executed, the general flow of the program is:

       o       A default configuration file is searched for.  A file named
               webalizer.conf is looked for in the current directory and, if
               found, its configuration data is parsed.  If the file is not
               present in the current directory, the file /etc/webalizer.conf
               is searched for and, if found, is used instead.

       o       Any  command  line  arguments  given to the program are parsed.
               This may include the specification  of  a  configuration  file,
               which is processed at the time it is encountered.

       o       If  a  log  file was specified, it is opened and made ready for
               processing.  If no log file was given, STDIN is used for input.
               If the log filename ’-’ is specified, STDIN will be forced.

       o       If  an  output  directory  was  specified,  the  program does a
               chdir(2)  to  that  directory  in  preparation  for  generating
               output.    If  no  output  directory  was  given,  the  current
               directory is used.

       o       If a non-zero number of DNS child processes was specified,
               they are started, and the specified log file is processed,
               creating or updating the specified DNS cache file.

       o       If no hostname was given,  the  program  attempts  to  get  the
               hostname   using  a  uname(2)  system  call.   If  that  fails,
               localhost is used.

       o       A history file is searched for in the current directory (output
               directory) and read if found.  This file keeps totals for
               previous months, which are used in the main index.html HTML
               document.  Note: The file location can be specified with the
               HistoryName configuration option.

       o       If  incremental  processing  was  specified,  a  data  file  is
               searched  for  and  loaded  if  found, containing the ’internal
               state’ data of the program at the end of a previous run.  Note:
               The file location can now be specified with the IncrementalName
               configuration option.

       o       Main processing begins on the  log  file.   If  the  log  spans
               multiple  months,  a separate HTML document is created for each
               month.

       o       After main processing, the main index.html page is created,
               which has totals by month and links to each month’s HTML
               document.

       o       A new history file is saved  to  disk,  which  includes  totals
               generated by The Webalizer during the current run.

       o       If incremental processing was specified, a data file is written
               that contains the ’internal state’ data at the end of this run.
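
       The first step of the flow above, locating a default configuration
       file, can be sketched in Python as follows.  This is an illustration
       of the documented search order only, not The Webalizer's own code;
       the function name find_default_config and its etc_dir parameter are
       hypothetical and exist only so the sketch can be tried on a scratch
       directory tree.

```python
from pathlib import Path
from typing import Optional


def find_default_config(cwd: Path, etc_dir: Path = Path("/etc")) -> Optional[Path]:
    """Return the default configuration file that would be used, or None."""
    # Search order from the first step above: the current directory, then /etc.
    # etc_dir is a parameter only so the sketch can be run against a test tree.
    for directory in (cwd, etc_dir):
        candidate = directory / "webalizer.conf"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the path that would be picked up, or None if neither file exists.
    print(find_default_config(Path.cwd()))
```

       A configuration file named with -c is handled separately, at the
       point where that option is encountered on the command line.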

INCREMENTAL PROCESSING

       Version  1.2x of The Webalizer adds incremental run capability.  Simply
       put, this allows processing large log files by breaking  them  up  into
       smaller  pieces,  and processing these pieces instead.  What this means
       in real terms is that you can now rotate your log files as often as you
       want, and still be able to produce monthly usage statistics without the
       loss of any detail.  Basically, The Webalizer saves  and  restores  all
       internal  data  in  a  file  named  webalizer.current.  This allows the
       program to ’start where it left  off’  so  to  speak,  and  allows  the
       preservation  of  detail  from  one  run to the next.  The data file is
       placed in the current output directory, and is a plain ASCII text  file
       that can be viewed with any standard text editor.  Its location and
       name may be changed using the IncrementalName configuration keyword.

       Some special precautions need to be taken when  using  the  incremental
       run  capability  of The Webalizer.  Configuration options should not be
       changed between runs, as that could cause corruption  of  the  internal
       data  stored.   For example, changing the MangleAgents level will cause
       different representations  of  user  agents  to  be  stored,  producing
       invalid  results in the user agents section of the report.  If you need
       to change configuration options, do it at the end of  the  month  after
       normal  processing  of  the  previous  month  and before processing the
       current month.  You may also want to delete the webalizer.current
       file.

       The  Webalizer  also  attempts  to  prevent data duplication by keeping
       track of the timestamp of the last record processed.  This timestamp is
       then  compared to current records being processed, and any records that
       were logged previous to that timestamp are ignored.  This,  in  theory,
       should  allow  you to re-process logs that have already been processed,
       or process logs that contain  a  mix  of  processed/not  yet  processed
       records, and not produce duplication of statistics.  The only time this
       may break is if you have duplicate timestamps in two separate log
       files: any records in the second log file that have the same timestamp
       as the last record in the previously processed log file will be
       discarded as if they had already been processed.  There are many ways
       to prevent this; for example, stopping the web server before rotating
       logs avoids the situation.  This scheme also requires that you always
       process logs in chronological order, otherwise data loss will occur
       as a result of the timestamp comparison.
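
       The timestamp comparison described above can be illustrated with a
       short Python sketch.  It is a model of the documented behaviour, not
       the program's implementation: the record shape (a timestamp paired
       with the raw line) and the name drop_already_processed are
       assumptions made for the example.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical record shape for this sketch: (timestamp, raw log line).
Record = Tuple[datetime, str]


def drop_already_processed(
    records: Iterable[Record], last_seen: Optional[datetime]
) -> Tuple[List[Record], Optional[datetime]]:
    """Keep only records newer than the timestamp saved by the previous run."""
    # A record at or before the saved timestamp is treated as already
    # processed, so an equal timestamp in a later log file is discarded too.
    kept = [
        (stamp, line)
        for stamp, line in records
        if last_seen is None or stamp > last_seen
    ]
    # Assumes chronological input, as the text above requires.
    new_last = kept[-1][0] if kept else last_seen
    return kept, new_last
```

       Feeding the function a second log whose first record carries the
       same timestamp as the last record of the first log shows the caveat:
       that record is dropped even though it was never counted.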

REVERSE DNS LOOKUPS

       The  Webalizer  supports  reverse  DNS lookups through a DNS cache file
       that is either created/updated at  run-time,  or  has  been  previously
       created,  either  by a previous run of the webalizer, or by running the
       stand-alone version, webazolver.   In  order  to  perform  reverse  DNS
       lookups,   a   DNSCache  filename  must  be  specified.   In  order  to
       create/update the cache file at run-time, the DNSChildren  number  must
       be non-zero.  The DNSChildren value specifies the number of child
       processes to fork, each of which will perform reverse DNS lookups in
       order to create/update the DNS cache file.  See the file DNS.README for
       additional information.

COMMAND LINE OPTIONS

       The Webalizer supports many different configuration options  that  will
       alter  the way the program behaves and generates output.  Most of these
       can be specified on the command line, while some can only be  specified
       in  a  configuration  file.  The command line options are listed below,
       with references to the corresponding configuration file keywords.

       General Options

       -h      Display all available command line options and exit program.

       -V      Display program version and exit program.

       -d      Debug.  Display debugging information for errors and  warnings.

       -i      IgnoreHist.  Ignore history.  USE WITH CAUTION.  This causes
               The Webalizer to ignore only the previous monthly history file.
               Incremental data (if present) is still processed.

       -p      Incremental.  Preserve internal data between runs.

       -q      Quiet.   Suppress  informational  messages.   Does not suppress
               warnings or errors.

       -Q      ReallyQuiet.  Suppress  all  messages  including  warnings  and
               errors.

       -T      TimeMe.    Force  display  of  timing  information  at  end  of
               processing.

       -c file Use configuration file file.

       -n name HostName.  Use the hostname name.

       -o dir  OutputDir.  Use output directory dir.

       -t name ReportTitle.  Use name for report title.

       -F ( clf | ftp | squid )
               LogType.  Specify log type  to  be  processed.   Value  can  be
               either  clf,  ftp  or  squid  format.   If  not specified, will
               default to CLF format.  FTP logs must be  in  standard  wu-ftpd
               xferlog format.

       -f      FoldSeqErr.   Fold  out  of  sequence  log  records  back  into
               analysis, by treating as if they were the same date/time as the
               last  good  record.   Normally, out of sequence log records are
               simply ignored.

       -Y      CountryGraph. Suppress country graph.

       -G      HourlyGraph.  Suppress hourly graph.

       -x name HTMLExtension.  Defines HTML file extension  to  use.   If  not
               specified,  defaults  to  html.   Do  not  include  the leading
               period.

       -H      HourlyStats.  Suppress hourly statistics.

       -L      GraphLegend.  Suppress color coded graph legends.

       -l num  GraphLines.  Specify number of background lines. Default is  2.
               Use zero (’0’) to disable the lines.

       -P name PageType.   Specify  file extensions that are considered pages.
               Sometimes referred to as pageviews.

       -m num  VisitTimeout.  Specify the Visit timeout period.  Specified  in
               number of seconds.  Default is 1800 seconds (30 minutes).

       -I name IndexAlias.  Use the filename name as an additional alias for
               index.*.

       -M num  MangleAgents.  Mangle user agent names according to the  mangle
               level specified by num.  Mangle levels are:

               5   Browser name and major version.

               4   Browser name, major and minor version.

               3   Browser  name,  major version, minor version to two decimal
                   places.

               2   Browser name, major and minor versions and sub-version.

               1   Browser name, version and machine type if possible.

               0   All information (left unchanged).

       -g num  GroupDomains.  Automatically  group  sites  by   domain.    The
               grouping  level  specified  by  num  can  be thought of as ’the
               number of dots’ to display in the grouping.  The default  value
               of 0 disables any domain grouping.

       -D name DNSCache.  Use the DNS cache file name.

       -N num  DNSChildren.   Use  num  DNS  children processes to perform DNS
               lookups, either  creating  or  updating  the  DNS  cache  file.
               Specify  zero  (0)  to disable cache file creation/updates.  If
               given, a DNS cache filename must be specified.
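
       The ’number of dots’ rule for -g (GroupDomains) can be read as
       "keep the rightmost num+1 labels of the hostname".  The Python sketch
       below shows that reading.  It is an interpretation of the description
       above, the function name group_domain and the sample hostname are
       hypothetical, and the handling of numeric IP addresses is not
       modelled.

```python
from typing import Optional


def group_domain(hostname: str, level: int) -> Optional[str]:
    """Return the grouping key for hostname, or None when grouping is off."""
    if level <= 0:
        return None  # the default level of 0 disables domain grouping
    labels = hostname.split(".")
    # level dots in the result means level + 1 labels, taken from the right.
    return ".".join(labels[-(level + 1):])


if __name__ == "__main__":
    for level in (0, 1, 2):
        print(level, group_domain("host1.dialup.example.com", level))
```

       Under this reading, level 1 groups the sample host under example.com
       and level 2 under dialup.example.com.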

       Hide Options

       -a name HideAgent.  Hide user agents matching name.

       -r name HideReferrer.  Hide referrer matching name.

       -s name HideSite.  Hide site matching name.

       -X name HideAllSites.  Hide all individual sites (only display groups).

       -u name HideURL.  Hide URL matching name.

       Table size options

       -A num  TopAgents.  Display the top num user agents table.

       -R num  TopReferrers.  Display the top num referrers table.

       -S num  TopSites.  Display the top num sites table.

       -U num  TopURLs.  Display the top num URLs table.

       -C num  TopCountries.  Display the top num countries table.

       -e num  TopEntry.  Display the top num entry pages table.

       -E num  TopExit.  Display the top num exit pages table.

CONFIGURATION FILES

       Configuration files are standard ascii(7) text files that may be
       created or edited using any standard editor.  Blank lines and lines
       that begin with a pound sign (’#’) are ignored.  Any other lines are
       considered to be configuration lines, and have the form "Keyword
       Value", where ’Keyword’ is one of the currently available
       configuration keywords defined below, and ’Value’ is the value to
       assign to that particular option.  Any text found after the keyword up
       to the end of the line is considered the keyword’s value, so do not
       put anything on the line after the value that is not part of the
       value being assigned (a trailing comment, for example).  The file
       sample.conf provided with the distribution contains further
       documentation and examples.
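
       The line format described above is simple enough to model in a few
       lines of Python.  The sketch below is not the program's parser; it
       only applies the stated rules (blank lines and lines beginning with
       ’#’ are skipped, and everything after the keyword is the value).  The
       name parse_config is hypothetical, and returning a list of pairs
       rather than a dictionary is a design choice made here because
       keywords such as HideURL may appear more than once.

```python
from typing import List, Tuple


def parse_config(text: str) -> List[Tuple[str, str]]:
    """Split configuration text into (keyword, value) pairs, in file order."""
    entries: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        # Blank lines and lines that begin with '#' are ignored.
        if not raw.strip() or raw.startswith("#"):
            continue
        parts = raw.split(None, 1)
        keyword = parts[0]
        # Everything after the keyword, up to end of line, is the value;
        # a '#' inside the value is therefore NOT treated as a comment.
        value = parts[1].rstrip() if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


if __name__ == "__main__":
    sample = "# nightly run\n\nOutputDir /var/www/usage\nQuiet yes\n"
    print(parse_config(sample))
```

       The paths and values in the sample text are placeholders chosen for
       the example.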

       General Configuration Keywords

       LogFile name
               Use log file named name.  If  none  specified,  STDIN  will  be
               used.

       LogType name
               Specify  log file type as name. Values can be either web, squid
               or ftp, with the default being web.

       OutputDir dir
               Create output in the directory dir.   If  none  specified,  the
               current directory will be used.

       HistoryName name
               Filename to use for the history file.  Relative to the output
               directory unless an absolute name is given (i.e. starts with
               ’/’).  Defaults to ’webalizer.hist’ in the standard output
               directory.

       ReportTitle name
               Use the title string name for the report title.  If none is
               specified, use the default of (in English) "Usage Statistics
               for ".

       Hostname name
               Set the hostname for the report as name.  If none specified, an
               attempt will be made to gather  the  hostname  via  a  uname(2)
               system call.  If that fails, localhost will be used.

       UseHTTPS ( yes | no )
               Use https:// on links to URLs, instead of the default http://,
               in the ’Top URLs’ table.

       Quiet ( yes | no )
               Suppress informational messages.  Warning and Error messages
               will not be suppressed.

       ReallyQuiet ( yes | no )
               Suppress all messages, including Warning and Error messages.

       Debug ( yes | no )
               Print extra debugging information on Warnings and Errors.

       TimeMe ( yes | no )
               Force timing information at end of processing.

       GMTTime ( yes | no )
               Use GMT (UTC) time instead of local timezone for reports.

       IgnoreHist ( yes | no )
               Ignore  previous monthly history file.  USE WITH CAUTION.  Does
               not prevent Incremental file processing.

       FoldSeqErr ( yes | no )
               Fold out of sequence log records back into analysis by treating
               them as if they had the same date/time as the last good record.
               Normally, out of sequence log records are ignored.

       CountryGraph ( yes | no )
               Display Country Usage Graph in output report.

       DailyGraph ( yes | no )
               Display Daily Graph in output report.

       DailyStats ( yes | no )
               Display Daily Statistics in output report.

       HourlyGraph ( yes | no )
               Display Hourly Graph in output report.

       HourlyStats ( yes | no )
               Display Hourly Statistics in output report.

       PageType name
               Define the file extensions to consider as a page.  If a file is
               found to have the same extension as name, it will be counted as
               a page (sometimes called a pageview).

       GraphLegend ( yes | no )
               Allows the color coded graph legends to be enabled/disabled.

       GraphLines num
               Specify the number of background reference lines  displayed  on
               the  graphs  produced.  Disable by using zero (’0’), default is
               2.

       VisitTimeout num
               Specifies the visit timeout value.  Default is 1800 seconds (30
               minutes).   A  visit is determined by looking at the difference
               in time between the current and last request  from  a  specific
               site.  If the difference is greater than or equal to the
               timeout value, the request is counted as a new visit.
               Specified in seconds.

       IndexAlias name
               Use name as an additional alias for index.*.

       MangleAgents num
               Mangle  user agent names based on mangle level num.  See the -M
               command line switch for mangle levels and their  meaning.   The
               default is 0, which doesn’t mangle user agents at all.

       SearchEngine name variable
               Allows  the  specification  of  search  engines and their query
               strings.  The name is the name to match  against  the  referrer
               string  for  a  given  search  engine.  The variable is the cgi
               variable that the search engine  uses  for  queries.   See  the
               sample.conf  file for example usage with common search engines.

       Incremental ( yes | no )
               Enable Incremental mode processing.

       IncrementalName name
               Filename to use for incremental data.  Relative to the output
               directory unless an absolute name is given (i.e. starts with
               ’/’).  Defaults to ’webalizer.current’ in the standard output
               directory.

       DNSCache name
               Filename to use for the DNS cache.  Relative to the output
               directory unless an absolute name is given (i.e. starts with
               ’/’).

       DNSChildren num
               Number of DNS child processes to run in order to create/update
               the DNS cache file.  Specify zero (0) to disable.
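
       The VisitTimeout rule described above (a request starts a new visit
       when the gap since the same site's previous request is greater than
       or equal to the timeout) can be modelled as follows.  This is a
       sketch of the stated rule only; the function name count_visits and
       the input shape, (site, seconds) pairs in chronological order, are
       assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def count_visits(requests: Iterable[Tuple[str, int]], timeout: int = 1800) -> int:
    """Count visits in (site, unix_seconds) pairs given in time order."""
    last_request: Dict[str, int] = {}
    visits = 0
    for site, stamp in requests:
        previous = last_request.get(site)
        # The first request from a site, or a gap >= timeout, is a new visit.
        if previous is None or stamp - previous >= timeout:
            visits += 1
        last_request[site] = stamp
    return visits
```

       Note that the gap is measured from the site's most recent request,
       not from the start of the visit, so a steady stream of requests less
       than the timeout apart counts as a single visit however long it
       lasts.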

       Top Table Keywords

       TopAgents num
               Display the top num User Agents table. Use zero to disable.

       AllAgents ( yes | no )
               Create separate HTML page with All User Agents.

       TopReferrers num
               Display the top num Referrers table. Use zero to disable.

       AllReferrers ( yes | no )
               Create separate HTML page with All Referrers.

       TopSites num
               Display the top num Sites table. Use zero to disable.

       TopKSites num
               Display  the  top  num  Sites  (by  KByte)  table.  Use zero to
               disable.

       AllSites ( yes | no )
               Create separate HTML page with All Sites.

       TopURLs num
               Display the top num URLs table. Use zero to disable.

       TopKURLs num
               Display the top  num  URLs  (by  KByte)  table.   Use  zero  to
               disable.

       AllURLs ( yes | no )
               Create separate HTML page with All URLs.

       TopCountries num
               Display  the  top  num  Countries  in  the  table.  Use zero to
               disable.

       TopEntry num
               Display the top num Entry Pages in  the  table.   Use  zero  to
               disable.

       TopExit num
               Display  the  top  num  Exit  Pages  in the table.  Use zero to
               disable.

       TopSearch num
               Display the top num Search Strings in the table.  Use  zero  to
               disable.

       AllSearchStr ( yes | no )
               Create separate HTML page with All Search Strings.

       TopUsers num
               Display the top num Usernames in the table.  Use zero to
               disable.  Usernames are only available if using HTTP-based
               authentication.

       AllUsers ( yes | no )
               Create separate HTML page with All Usernames.

       Hide/Ignore/Group/Include Keywords

       HideAgent name
               Hide User Agents that match name.

       HideReferrer name
               Hide Referrers that match name.

       HideSite name
               Hide Sites that match name.

       HideAllSites ( yes | no )
               Hide  all  individual sites.  This causes only grouped sites to
               be displayed.

       HideURL name
               Hide URLs that match name.

       HideUser name
               Hide Usernames that match name.

       IgnoreAgent name
               Ignore User Agents that match name.

       IgnoreReferrer name
               Ignore Referrers that match name.

       IgnoreSite name
               Ignore Sites that match name.

       IgnoreURL name
               Ignore URLs that match name.

       IgnoreUser name
               Ignore Usernames that match name.

       GroupAgent name [Label]
               Group User Agents that  match  name.   Display  Label  in  ’Top
               Agent’ table if given (instead of name).

       GroupReferrer name [Label]
               Group  Referrers  that  match  name.   Display  Label  in  ’Top
               Referrer’ table if given (instead of name).

       GroupSite name [Label]
               Group Sites that match name.  Display Label in ’Top Site’ table
               if given (instead of name).

       GroupDomains num
               Automatically  group  sites by domain.  The value num specifies
               the level of grouping, and can be thought of as the ’number  of
               dots’  to be displayed.  The default value of 0 disables domain
               grouping.

       GroupURL name [Label]
               Group URLs that match name.  Display Label in ’Top URL’ table
               if given (instead of name).

       GroupUser name [Label]
               Group  Usernames  that  match  name.   Display  Label  in  ’Top
               Usernames’ table if given (instead of name).

       IncludeSite name
               Force inclusion of sites that match name.  Takes precedence
               over Ignore* keywords.

       IncludeURL name
               Force inclusion of URLs that match name.  Takes precedence
               over Ignore* keywords.

       IncludeReferrer name
               Force inclusion of Referrers that match name.  Takes precedence
               over Ignore* keywords.

       IncludeAgent name
               Force   inclusion  of  User  Agents  that  match  name.   Takes
               precedence over Ignore* keywords.

       IncludeUser name
               Force inclusion of Usernames that match name.  Takes precedence
               over Ignore* keywords.
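
       The precedence of the Include* keywords over the Ignore* keywords
       can be shown with a small Python model.  It is only a sketch: how
       name is matched is not specified in this page, so plain substring
       matching is ASSUMED here, and the function name is_ignored and the
       sample URLs are hypothetical.

```python
from typing import Sequence


def is_ignored(value: str, ignore: Sequence[str], include: Sequence[str]) -> bool:
    """Decide whether a record field is dropped by the Ignore* keywords."""

    def matches(patterns: Sequence[str]) -> bool:
        # ASSUMPTION: a pattern matches when it occurs anywhere in the value.
        return any(pattern in value for pattern in patterns)

    # An Include* match wins, whatever the Ignore* keywords say.
    if matches(include):
        return False
    return matches(ignore)


if __name__ == "__main__":
    print(is_ignored("/images/logo.png", ["/images/"], []))
    print(is_ignored("/images/logo.png", ["/images/"], ["logo.png"]))
```

       With only the Ignore pattern the sample URL is dropped; adding an
       Include pattern that also matches it brings it back into the
       analysis.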

       HTML Generation Keywords

       HTMLExtension text
               Defines  the  HTML file extension to use.  Default is html.  Do
               not include the leading period!

       HTMLPre text
               Insert text at the very beginning of the generated  HTML  file.
               Defaults to a standard HTML 3.2 DOCTYPE record.

       HTMLHead text
               Insert text within the <HEAD></HEAD> block of the HTML file.

       HTMLBody text
               Insert  text  in  HTML  page, starting with the <BODY> tag.  If
               used, the first line must be a <BODY ...> tag.  Multiple  lines
               may be specified.

       HTMLPost text
               Insert text at top (before horizontal rule) of HTML pages.
               Multiple lines may be specified.

       HTMLTail text
               Insert text at bottom of the HTML page.  The text  is  top  and
               right aligned within a table column at the end of the report.

       HTMLEnd text
               Insert  text  at  the  very  end  of  the  HTML  page.   If not
               specified, the default is to  insert  the  ending  </BODY>  and
               </HTML> tags.  If used, you must supply these tags yourself.

       Dump Object Keywords

       The  Webalizer allows you to export processed data to other programs by
       using tab delimited text files.  The Dump* commands specify which files
       are to be written, and where.
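
       Because the dump files are tab delimited text, another program can
       read them with any tab-separated-values parser.  The Python sketch
       below assumes the file was written with DumpHeader enabled, so that
       the first record names the columns.  The column names and rows in
       the sample are invented for illustration and are not the actual
       layout of any dump file; check a real file for its columns.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_dump(stream: TextIO) -> List[Dict[str, str]]:
    """Read a tab delimited dump whose first record is a column header."""
    # QUOTE_NONE: treat quote characters in fields as ordinary text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


if __name__ == "__main__":
    # Hypothetical content standing in for a real dump file.
    sample = io.StringIO("Hits\tURL\n12\t/index.html\n3\t/about.html\n")
    for row in read_dump(sample):
        print(row["URL"], int(row["Hits"]))
```

       For a file on disk, pass an open text file object, opened with
       newline="" as the csv module recommends, in place of the StringIO
       sample.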

       DumpPath name
               Save dump files in directory name.  If not specified, the
               default output directory will be used.  Do not specify a
               trailing slash (/).

       DumpExtension name
               Use  name  as  the  filename  extension for dump files.  If not
               given, the default of tab will be used.

       DumpHeader ( yes | no )
               Print a column header as the first record of the file.

       DumpSites ( yes | no )
               Dump the sites data to a tab delimited file.

       DumpURLs ( yes | no )
               Dump the URL data to a tab delimited file.

       DumpReferrers ( yes | no )
               Dump the referrer data to a tab delimited file.  This data is
               only available if using a log that contains referrer
               information (i.e. a combined format web log).

       DumpAgents ( yes | no )
               Dump the user agent data to a tab delimited file.  This data is
               only available if using a log that contains user agent
               information (i.e. a combined format web log).

       DumpUsers ( yes | no )
               Dump the username data to a tab delimited file.  This  data  is
               only  available  if  processing  a wu-ftpd xferlog or a web log
               that contains http authentication information.

       DumpSearchStr ( yes | no )
               Dump the search string data to a tab delimited file.  This data
               is  only  available  if  processing  a  web  log  that contains
               referrer information and had search string information present.

       ColorHit ( rrggbb | 00805c )
               Sets  the  graph’s  hit-color  to  the specified html color (no
               ’#’).

       ColorFile ( rrggbb | 0000ff )
               Sets the graph’s file-color to the  specified  html  color  (no
               ’#’).

       ColorSite ( rrggbb | ff8000 )
               Sets  the  graph’s  site-color  to the specified html color (no
               ’#’).

       ColorKbyte ( rrggbb | ff0000 )
               Sets the graph’s kilobyte-color to the specified html color (no
               ’#’).

       ColorPage ( rrggbb | 00c0ff )
               Sets  the  graph’s  page-color  to the specified html color (no
               ’#’).

       ColorVisit ( rrggbb | ffff00 )
               Sets the graph’s visit-color to the specified  html  color  (no
               ’#’).

       PieColor1 ( rrggbb | 800080 )
               Sets the pie’s first optional color to the specified html color
               (no ’#’).

       PieColor2 ( rrggbb | 80ffc0 )
               Sets the pie’s second optional  color  to  the  specified  html
               color (no ’#’).

       PieColor3 ( rrggbb | ff00ff )
               Sets the pie’s third optional color to the specified html color
               (no ’#’).

       PieColor4 ( rrggbb | ffc480 )
               Sets the pie’s fourth optional  color  to  the  specified  html
               color (no ’#’).

FILES

       webalizer.conf      Default configuration file.  Is searched for in the
                           current directory and if not found,  in  the  /etc/
                           directory.

       webalizer.hist      Monthly  history file for previous 12 months.  (can
                           be changed)

       webalizer.current   Current state data file  (Incremental  processing).
                           (can be changed)

       xxxxx_YYYYMM.html   Various   monthly   HTML   output  files  produced.
                           (extension can be changed)

       xxxxx_YYYYMM.png    Various monthly image files used in the reports.

       xxxxx_YYYYMM.tab    Monthly tab delimited text files.   (extension  can
                           be changed)

BUGS

       Report bugs to brad@mrunix.net.

COPYRIGHT

       Copyright  (C) 1997-2000 by Bradford L. Barrett.  Distributed under the
       GNU GPL.  See the files "COPYING" and "Copyright",  supplied  with  all
       distributions for additional information.

AUTHOR

       Bradford L. Barrett <brad@mrunix.net>