NAME

       pmdaweblog  -  performance  metrics  domain agent (PMDA) for Web server
       logs

SYNOPSIS

       $PCP_PMDAS_DIR/weblog/pmdaweblog [-Cp] [-d domain]  [-h  helpfile]  [-i
       port]  [-l  logfile]  [-n  idlesec]  [-S  num]  [-t  delay] [-u socket]
       configfile

DESCRIPTION

       pmdaweblog is a Performance Metrics Domain Agent (PMDA(3))  that  scans
       Web  server logs to extract metrics characterizing Web server activity.
       These  performance  metrics  are  then  made  available   through   the
       infrastructure of the Performance Co-Pilot (PCP).

       The  configfile  specifies which Web servers are to be monitored, their
       associated access logs and error logs, and a  regular-expression  based
       scheme for extracting detailed information about each Web access.  This
       file is  maintained  as  part  of  the  PMDA  installation  and/or  de-
       installation  by  the  scripts  Install  and  Remove  in  the directory
       $PCP_PMDAS_DIR/weblog.  For more details, refer to  the  section  below
       covering installation.

       Once started, pmdaweblog monitors a set of log files.  In response to
       a request for information, it processes any new data that has been
       appended to the log files, in the manner of tail(1).  There is also a
       periodic "catch up" that processes new information from all log files,
       and a scheme to detect the rotation of log files.
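
       The tail-like behaviour described above can be pictured with a small
       sketch.  It is an illustration only, not the pmdaweblog
       implementation: the class name LogTail is hypothetical, and rotation
       is detected here by comparing inode numbers and file sizes, whereas
       the PMDA itself closes and re-opens idle files (see the -n option).

```python
import os


class LogTail:
    """Track a read offset into a log file and return only newly appended lines."""

    def __init__(self, path):
        self.path = path
        self.offset = 0      # bytes already consumed from the current file
        self.inode = None    # inode of the file the offset refers to

    def read_new_lines(self):
        """Return the complete lines appended since the previous call."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return []  # the log does not exist yet; try again on the next call
        # A different inode or a shrunken file suggests the log was rotated,
        # so start again from the beginning of the new file.
        if self.inode != st.st_ino or st.st_size < self.offset:
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume only up to the last newline; a trailing partial line is
        # left in place and picked up by a later call.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

       Calling read_new_lines() once per request, and again on a timer,
       corresponds loosely to the on-demand and "catch up" processing
       described above.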

       Like all other PMDAs, pmdaweblog is launched by pmcd(1) using command
       line options specified in $PCP_PMCDCONF_PATH.  The Install script
       prompts for appropriate values for the command line options and
       updates $PCP_PMCDCONF_PATH.

       A brief description of the pmdaweblog command line options follows:

       -C     Check the configuration and exit.

       -d domain
              Specify the domain number.  It is absolutely  crucial  that  the
              performance  metrics  domain number specified here is unique and
              consistent.  That is, domain should be different for every  PMDA
              on  the  one host, and the same domain number should be used for
              the pmdaweblog PMDA on all hosts.

              For most installations, the default domain as encapsulated in
              the file $PCP_PMDAS_DIR/weblog/domain.h will suffice.  For
              alternate values, check $PCP_PMCDCONF_PATH for the domain values
              already in use on this host; the file $PCP_VAR_DIR/pmns/stdpmid
              contains a repository of ‘‘well known’’ domain assignments that
              should probably be avoided.

       -h helpfile
              Get  the  help  text from the supplied helpfile rather than from
              the default location.

       -i port
              Communicate with pmcd(1) on the specified Internet  port  (which
              may be a number or a name).

       -l logfile
              Location  of  the  log  file.   By  default,  a  log  file named
              weblog.log is written in the current directory of  pmcd(1)  when
              pmdaweblog is started, i.e.  $PCP_LOG_DIR/pmcd.  If the log file
              cannot be created or is not writable, output is written  to  the
              standard error instead.

       -n idlesec
              If  a  Web  server  log  file  has not been modified for idlesec
              seconds, then the file will be closed and  re-opened.   This  is
              the  only way pmdaweblog can detect any asynchronous rotation of
              the logs by Web  server  administrative  scripts.   The  default
              period  is  20  seconds.   This value may be changed dynamically
              using pmstore(1) to modify the value of the  performance  metric
              web.config.check.

       -p     Communicate with pmcd(1) via a pipe.

       -S num Specify  the maximum number of Web servers per sproc.  It may be
              desirable (from a latency and  load  balancing  perspective)  or
              necessary   (due   to   file   descriptor  limits)  to  delegate
              responsibility for scanning the Web server log files to  several
              sprocs.   pmdaweblog will ensure that each sproc handles the log
              files for at most num Web servers.  The default value is 80  Web
              servers per sproc.

       -t delay
              To  avoid  the  need  to  scan a lot of information from the Web
              server logs in response to  a  single  request  for  performance
              metrics, all log files will be checked at least once every delay
              seconds.  The default is 15 seconds.  This value may be  changed
              dynamically   using  pmstore(1)  to  modify  the  value  of  the
              performance metric web.config.catchup.

       -u socket
              Communicate with pmcd(1) via the given Unix domain socket.
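
       As an illustration of the -S option above, the following sketch
       splits a list of Web servers into groups of at most num entries.  The
       function name partition_servers is hypothetical, and the man page does
       not say how pmdaweblog actually assigns servers to sprocs beyond the
       upper bound, so the consecutive grouping used here is an assumption.

```python
def partition_servers(servers, max_per_worker=80):
    """Split servers into consecutive groups of at most max_per_worker.

    The default of 80 mirrors the documented default for -S.
    """
    if max_per_worker < 1:
        raise ValueError("max_per_worker must be at least 1")
    return [servers[i:i + max_per_worker]
            for i in range(0, len(servers), max_per_worker)]
```

       With 200 configured servers and the default limit, this grouping
       yields three groups.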

INSTALLATION

       The PCP framework allows metrics  to  be  collected  on  one  host  and
       monitored  from  another.  These hosts are referred to as collector and
       monitor hosts, respectively.  A host may be  both  a  collector  and  a
       monitor.

       Collector hosts require the installation of the agent, while monitoring
       hosts require no agent installation at all.

       For collector hosts do the following as root:

         # cd $PCP_PMDAS_DIR/weblog
         # ./Install

       The  installation  procedure  prompts  for  a  default  or  non-default
       installation.   A  default  installation  will  search for known server
       configurations and automatically configure the PMDA for any server  log
       files  that  are  found.   A non-default installation will step through
       each server, prompting the user for  other  server  configurations  and
       arguments to pmdaweblog.  The end result of a collector installation is
       to build a configuration file that is  passed  to  pmdaweblog  via  the
       configfile argument.

       If you want to undo the installation, do the following as root:

         # cd $PCP_PMDAS_DIR/weblog
         # ./Remove

       pmdaweblog  is  launched  by  pmcd(1)  and  should  never  be  executed
       directly.  The Install and Remove scripts notify pmcd(1) when the agent
       is installed or removed.

CONFIGURATION

       The configuration file for the weblog PMDA is an ASCII file that can be
       easily modified.  Empty lines and lines beginning with ’#’ are ignored.
       All  other  lines  must  be  either  a  regular  expression  or  server
       specification.

       Regular expressions, which are used on both the access  and  error  log
       files, must be of the form:

         regex regexName regexp
       or

         regex_posix regexName ordering regexp_posix

       The   regexName  is  a  word  which  uniquely  identifies  the  regular
       expression.  This is the reference used in  the  server  specification.
       The  regexp  for  access logs is in the format described for regcmp(3).
       The regexp_posix for access logs is in the format described for
       regcomp(3).  On IRIX releases after 6.2, the POSIX-compliant form is
       not recommended, for performance reasons.  The POSIX form should be
       available on all platforms.  The ordering argument is explained
       below.

       The regular  expression  requires  the  specification  of  up  to  four
       arguments  to  be  extracted from each line of a Web server access log,
       depending on the type of server. In the most common case there are  two
       arguments representing the method and the size.

       For the non-POSIX version, argument $0 should contain the method: GET,
       HEAD, POST or PUT.  The method PUT is treated as a synonym for POST,
       and anything else is categorized as OTHER.

       The  second  argument,  $1,  should contain the size of the request.  A
       size of ‘‘-’’ or ‘‘ ’’ is treated as unknown.

       Argument $2 should contain the status code returned to the client
       browser and argument $3 should contain the status code returned to the
       server from a remote host.  These latter two arguments are used for
       caching servers and must be specified as a pair (or $2 will be
       ignored).  For further information on status codes, refer to the web
       site http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
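
       The treatment of the method and size arguments described above can be
       sketched as follows.  The function names are hypothetical.  Matching
       the method case-insensitively is an assumption, suggested only by the
       FTP pattern shown later, which captures lower-case words.

```python
def categorize_method(method):
    """Map an extracted method string onto GET, HEAD, POST or OTHER."""
    method = method.strip().upper()  # case-insensitivity is an assumption
    if method == "PUT":
        return "POST"  # PUT is treated as a synonym for POST
    if method in ("GET", "HEAD", "POST"):
        return method
    return "OTHER"


def parse_size(size):
    """Return the size as an int, or None when it is unknown ("-" or blank).

    Any other non-numeric text raises ValueError.
    """
    size = size.strip()
    if size in ("", "-"):
        return None
    return int(size)
```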

       Some legal non-POSIX regular expression specifications for monitoring
       an access log are:

         # pattern for CERN, NCSA, Netscape etc Access Logs
         regex CERN ] "([A-Za-z][-A-Za-z]+)$0 .*" [-0-9]+ ([-0-9]+)$1

         # pattern for FTP Server access logs (normally in SYSLOG)
         regex SYSLOG_FTP ftpd[.*]: ([gp][-A-Za-z]+)$0( )$1

       There is one special type of access log, with the regexName SQUID.
       This format extracts four parameters, but since the Squid log file
       uses text-based status codes, it is handled as a special case.

       In the examples below, NS_PROXY parses the Netscape/W3C Common Extended
       Log  Format  and SQUID parses the default Squid Object Cache format log
       file.

         # pattern for Netscape Proxy Server Extended Logs
         regex NS_PROXY ] "([A-Za-z][-A-Za-z]+)$0 .*" ([-0-9]+)$2 \
              ([-0-9]+)$1 ([-0-9]+)$3

         # pattern for Squid Cache logs
         regex SQUID [0-9]+.[0-9]+[ ]+[0-9]+ [a-zA-Z0-9.]+ \
              ([_A-Z]+)$3([0-9]+)$2 ([0-9]+)$1 ([A-Z]+)$0

       The regexp for the error logs does not require any  arguments,  only  a
       match.  Some legal expressions are:

         # pattern for CERN, NCSA, Netscape etc Error Logs
         regex CERN_err .

         # pattern for FTP Server error logs (normally in SYSLOG)
         regex SYSLOG_FTP_err FTP LOGIN FAILED

       If POSIX compliant regular expressions are used, additional information
       is required since the order of parameters cannot be specified in the
       regular expression.  For backwards compatibility, in the common case of
       two parameters the order may be specified as method,size or
       size,method.  In the general case, the ordering is specified by one of
       the following methods:

       n1,n2,n3,n4
            where nX is a digit between 1 and 4.  Each comma-separated field
            represents (in order) the argument number for
            method,size,client_status,server_status

       -    Used for cases like the error logs where the content is ignored.

       As with the non-POSIX format, the SQUID regexName is treated as a
       special case to match the non-numerical status codes.
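
       One way to read the ordering argument is as a map from each field to
       the number of the parenthesized group that captures it.  The sketch
       below takes that reading.  The name parse_ordering is hypothetical,
       and two points are assumptions rather than documented behaviour: that
       the numbers must form a permutation, and that a two-number form such
       as 1,2 is accepted (it appears in one of the examples).

```python
FIELDS = ("method", "size", "client_status", "server_status")


def parse_ordering(spec):
    """Return {field: group_number} for a regex_posix ordering argument."""
    if spec == "-":
        return {}  # content is ignored, e.g. for error logs
    if spec == "method,size":
        return {"method": 1, "size": 2}
    if spec == "size,method":
        return {"size": 1, "method": 2}
    # Numeric form: position i names the group holding FIELDS[i].
    numbers = [int(n) for n in spec.split(",")]
    count = len(numbers)
    if count not in (2, 4) or sorted(numbers) != list(range(1, count + 1)):
        raise ValueError(f"bad ordering: {spec!r}")
    return dict(zip(FIELDS, numbers))
```

       Under this reading, an ordering of 4,3,2,1 means the method is taken
       from the fourth group and the server status from the first.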

       Some legal POSIX regular expression specifications for monitoring an
       access log are:

         # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
         regex_posix CERN method,size ][ \]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" [-0-9]+ ([-0-9]+)

         # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
         regex_posix CERN 1,2 ][ \]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" [-0-9]+ ([-0-9]+)

         # pattern for FTP Server access logs (normally in SYSLOG)
         regex_posix SYSLOG_FTP method,size ftpd[.*]: \
              ([gp][-A-Za-z]+)( )

         # pattern for Netscape Proxy Server Extended Logs
         regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)

         # pattern for Squid Cache logs
         regex_posix SQUID 4,3,2,1 [0-9]+.[0-9]+[ ]+[0-9]+ \
              [a-zA-Z0-9.]+ ([_A-Z]+)([0-9]+) ([0-9]+) ([A-Z]+)

         # pattern for CERN, NCSA, Netscape etc Error Logs
         regex_posix CERN_err - .

         # pattern for FTP Server error logs (normally in SYSLOG)
         regex_posix SYSLOG_FTP_err - FTP LOGIN FAILED
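
       To see how a two-argument access-log pattern pulls out the method and
       size, the sketch below applies an adaptation of the CERN pattern to a
       sample line.  Python's re module stands in for regcomp(3) here, so
       escaping and syntax details may differ from what the PMDA accepts.
       The sample log line and the function name are hypothetical.

```python
import re

# Adapted from the CERN access-log pattern above, with the leading bracket
# escaped for Python's re module (an assumption about the intended pattern).
CERN_ACCESS = re.compile(r'\] "([A-Za-z][-A-Za-z]+) [^"]*" [-0-9]+ ([-0-9]+)')


def extract_method_and_size(line):
    """Return (method, size) as strings, or None if the line does not match."""
    m = CERN_ACCESS.search(line)
    if m is None:
        return None
    # Groups 1 and 2 correspond to an ordering of method,size.
    return m.group(1), m.group(2)


# Hypothetical Common Log Format line, not taken from a real server.
SAMPLE = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
```

       For the sample line, extract_method_and_size(SAMPLE) yields the
       method GET and the size 2326, both as strings.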

       A Web server can be specified using this syntax:

         server serverName on|off accessRegex accessFile errorRegex errorFile

       The serverName must be unique for each server, and is the name given to
       the  instance for the associated performance metrics.  See PMAPI(3) for
       a discussion of PCP instance domains.  The on  or  off  flag  indicates
       whether the server is to be monitored when the PMDA is installed.  This
       can be altered dynamically using pmstore(1) for the metric
       web.perserver.watched, which has one instance for each Web server named
       in configfile.

       Two files are monitored for each Web server, the access and  the  error
       log.   Each  file  requires  the  name of a previously declared regular
       expression, and a file name.  The log files specified for  each  server
       do  not have to exist when the weblog PMDA is installed.  The PMDA will
       continue to check  for  non-existent  log  files  and  open  them  when
       possible.  Some legal server specifications are:

         # Netscape Server on Port 80 at IP address 127.55.555.555
         server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors

         # FTP Server.
         server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
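
       The three kinds of configuration line (regex, regex_posix and server)
       can be read with a short parser such as the sketch below.  It is not
       the parser used by pmdaweblog.  The function name and the returned
       dictionary layout are hypothetical, and two behaviours are
       assumptions: leading white space is ignored, and a trailing backslash
       is not treated as a line continuation.

```python
def parse_config(text):
    """Parse weblog-style configuration text into (regexes, servers)."""
    regexes = {}
    servers = []
    for lineno, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # empty lines and comments are ignored
        parts = stripped.split(None, 1)
        keyword = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "regex":
            fields = rest.split(None, 1)  # regexName, then the pattern
            if len(fields) != 2:
                raise ValueError(f"line {lineno}: expected name and regexp")
            regexes[fields[0]] = {"ordering": None, "pattern": fields[1]}
        elif keyword == "regex_posix":
            fields = rest.split(None, 2)  # regexName, ordering, pattern
            if len(fields) != 3:
                raise ValueError(f"line {lineno}: expected name, ordering, regexp")
            regexes[fields[0]] = {"ordering": fields[1], "pattern": fields[2]}
        elif keyword == "server":
            fields = rest.split()
            if len(fields) != 6 or fields[1] not in ("on", "off"):
                raise ValueError(f"line {lineno}: malformed server line")
            name, flag, access_re, access_file, error_re, error_file = fields
            # Each log file must name a previously declared regular expression.
            for regex_name in (access_re, error_re):
                if regex_name not in regexes:
                    raise ValueError(f"line {lineno}: undeclared regex {regex_name}")
            servers.append({
                "name": name,
                "watched": flag == "on",
                "access": (access_re, access_file),
                "error": (error_re, error_file),
            })
        else:
            raise ValueError(f"line {lineno}: unknown keyword {keyword!r}")
    return regexes, servers
```

       Rejecting a server line that names an undeclared regular expression
       reflects the requirement that each log file refer to a previously
       declared expression.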

CAVEATS

       Specifying regular expressions with an incorrect number of arguments
       (anything other than 2 for access logs, and none for error logs) may
       cause the PMDA to behave incorrectly and even crash.  This is due to
       limitations in the interface of regex(3).

FILES

       $PCP_PMDAS_DIR/weblog
                 installation directory for the weblog PMDA

       $PCP_PMDAS_DIR/weblog/Install
                 installation script for the weblog PMDA

       $PCP_PMDAS_DIR/weblog/Remove
                 de-installation script for the weblog PMDA

       $PCP_LOG_DIR/pmcd/weblog.log
                 default log file for error reporting

       $PCP_PMCDCONF_PATH
                 pmcd configuration  file  that  specifies  the  command  line
                 options to be used when pmdaweblog is launched

       $PCP_LOG_DIR/NOTICES
                 log of PMDA installations and removals

       $PCP_VAR_DIR/config/web/weblog.conf
                 likely location of the weblog PMDA configuration file

       $PCP_DOC_DIR/pcpweb/index.html
                 the online HTML documentation for PCPWEB

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation,  the  file
       /etc/pcp.conf  contains  the  local  values  for  these variables.  The
       $PCP_CONF variable may be used to specify an alternative  configuration
       file, as described in pcp.conf(4).

SEE ALSO

       pmcd(1),  pmchart(1), pmdawebping(1), pminfo(1), pmstore(1), pmview(1),
       tail(1), weblogvis(1), webvis(1), PMAPI(3), PMDA(3) and regcmp(3).