NAME

       pmie - inference engine for performance metrics

SYNOPSIS

       pmie [-bCdefHVvWxz] [-A align] [-a archive] [-c filename] [-h host] [-l
       logfile] [-j stompfile] [-n pmnsfile] [-O offset]  [-S  starttime]  [-T
       endtime] [-t interval] [-Z timezone] [filename ...]

DESCRIPTION

       pmie  accepts a collection of arithmetic, logical, and rule expressions
       to be evaluated at  specified  frequencies.   The  base  data  for  the
       expressions  consists  of performance metrics values delivered in real-
       time from any host running the Performance  Metrics  Collection  Daemon
       (PMCD),  or  using  historical  data  from  Performance  Co-Pilot (PCP)
       archive logs.

       As well as computing arithmetic and logical values, pmie can execute
       actions (raise popup alarms, write system log messages, and launch
       programs) in response to specified conditions.  Such actions are useful
       for detecting, monitoring and correcting performance-related problems.

       The  expressions  to  be  evaluated  are  read from configuration files
       specified by one or more filename arguments.  In  the  absence  of  any
       filename, expressions are read from standard input.

       A description of the command line options specific to pmie follows:

       -a   archive  is  the  base  name  of  a  PCP  archive  log  written by
            pmlogger(1).  Multiple instances of the -a flag may appear on  the
            command  line  to  specify a set of archives.  In this case, it is
            required that only one archive be present for any one host.  Also,
            any  explicit host names occurring in a pmie expression must match
            the host name recorded in one of the archive labels.  In the  case
            of multiple archives, timestamps recorded in the archives are used
            to ensure temporal consistency.

       -b   Output will be line buffered and standard output  is  attached  to
            standard  error.   This is most useful for background execution in
            conjunction with the -l option.  The -b option is always used  for
            pmie instances launched from pmie_check(1).

       -C   Parse  the  configuration  file(s)  and exit before performing any
            evaluations.  Any errors in the configuration file are reported.

       -c   An alternative to specifying filename at the end  of  the  command
            line.

       -d   Normally  pmie  would  be launched as a non-interactive process to
            monitor and manage the performance of one or  more  hosts.   Given
            the  -d  flag  however,  execution  is interactive and the user is
            presented with a menu of  options.   Interactive  mode  is  useful
            mainly for debugging new expressions.

       -e   When used with -V, -v or -W, this option forces timestamps to be
            reported with each expression.  The timestamps are in ctime(3)
            format, enclosed in parentheses, and appear after the expression
            name and before the expression value, e.g.
                 expr_1 (Tue Feb  6 19:55:10 2001): 12

       -f   If the -l option is specified and there is no -a option (i.e.
            real-time monitoring) then pmie is run as a daemon in the
            background (in all other cases foreground is the default).  The -f
            option forces pmie to be run in the foreground, independent of any
            other options.

       -H   The default hostname written to the stats file will not be  looked
            up  via  gethostbyname(3),  rather it will be written as-is.  This
            option can be useful when host name aliases are in use at a  site,
            and  the  logical  name  is  more important than the physical host
            name.

       -h   By default performance data is fetched from  the  local  host  (in
            real-time  mode)  or  the  host for the first named archive on the
            command line (in archive mode).  The host argument overrides  this
            default.   It  does  not  override  hosts  explicitly named in the
            expressions being evaluated.

       -l   Standard error is sent to logfile.

       -j   An  alternative  STOMP  protocol  configuration  is  loaded   from
            stompfile.   If  this  option is not used, and the stomp action is
            used     in     any      rule,      the      default      location
            $PCP_VAR_DIR/pmie/config/stomp will be used.

       -n   An  alternative  Performance  Metrics  Name Space (PMNS) is loaded
            from the file pmnsfile.

       -t   The interval argument follows the syntax described in PCPIntro(1),
            and  in  the simplest form may be an unsigned integer (the implied
            units in this case are seconds).  The value is used  to  determine
            the  sample  interval  for  expressions that do not explicitly set
            their sample interval using  the  pmie  variable  delta  described
            below.  The default is 10.0 seconds.

       -v   Unless one of the verbose options -V, -v or -W appears on the
            command line, expressions are evaluated silently; the only output
            is the result of any actions being executed.  In the verbose
            mode, specified using the -v flag, the value of each expression is
            printed as it is evaluated.  The values are in canonical units:
            bytes in the dimension of ‘‘space’’, seconds in the dimension of
            ‘‘time’’ and events in the dimension of ‘‘count’’.  See
            pmLookupDesc(3) for details of the supported dimension and scaling
            mechanisms for performance metrics.  The verbose mode is useful
            for monitoring the value of given expressions, evaluating derived
            performance metrics, passing these values on to other tools for
            further processing, and debugging new expressions.

       -V   This option has the same effect as the -v option, except that  the
            name  of the host and instance (if applicable) are printed as well
            as expression values.

       -W   This option has the same effect as the -V option described  above,
            except  that  for boolean expressions, only those names and values
            that make the expression true are printed.   These  are  the  same
            names  and  values accessible to rule actions as the %h, %i and %v
            bindings, as described below.

       -x   Execute in domain agent  mode.   This  mode  is  used  within  the
            Performance Co-Pilot product to derive values for summary metrics,
            see pmdasummary(1).  Only restricted functionality is available in
            this mode (expressions with actions may not be used).

       -Z   Change  the  reporting  timezone  to timezone in the format of the
            environment variable TZ as described in environ(5).

       -z   Change the reporting timezone to the timezone of the host that  is
            the  source  of  the performance metrics, as identified via either
            the -h option or the first named archive (as described  above  for
            the -a option).

       The  -S,  -T, -O, and -A options may be used to define a time window to
       restrict the samples retrieved, set an initial origin within  the  time
       window,  or  specify a ‘‘natural’’ alignment of the sample times; refer
       to PCPIntro(1) for a complete description of these options.

       Output from pmie is directed to standard output and standard  error  as
       follows:

       stdout
            Expression values printed in the verbose -v mode and the output of
            print actions.

       stderr
            Error and warning messages for any syntactic or semantic  problems
            during expression parsing, and any semantic or performance metrics
            availability problems during expression evaluation.
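
       Verbose output on stdout is often passed to other tools.  As a rough
       illustration, the Python sketch below splits one timestamped line of
       the form shown for the -e option into its parts.  The function name
       parse_verbose_line is hypothetical, and the sketch assumes the
       single-value layout of that one example line and English day and month
       names; output produced with -V or -W may carry extra host and instance
       fields that it does not handle.

```python
import re
from datetime import datetime

# Pattern for one line of "-v -e" output: name, ctime(3) timestamp, value.
LINE_RE = re.compile(r"^(?P<name>\S+) \((?P<stamp>[^)]+)\): (?P<value>.*)$")


def parse_verbose_line(line):
    """Split a timestamped verbose line into (name, datetime, value text)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    # ctime(3) layout, e.g. "Tue Feb  6 19:55:10 2001".
    stamp = datetime.strptime(match["stamp"], "%a %b %d %H:%M:%S %Y")
    return match["name"], stamp, match["value"]


name, stamp, value = parse_verbose_line("expr_1 (Tue Feb  6 19:55:10 2001): 12")
```

       The value is left as text so that the caller can decide how to
       interpret it.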

EXAMPLES

       The following example expressions demonstrate some of the  capabilities
       of the inference engine.

       The  directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
       examples of pmie expressions.

       The variable delta controls expression evaluation  frequency.   Specify
       that  subsequent  expressions be evaluated once a second, until further
       notice:

            delta = 1 sec;

       If total syscall rate exceeds 5000 per second per CPU, then display  an
       alarm notifier:

            kernel.all.syscall / hinv.ncpu > 5000 count/sec
            -> alarm "high syscall rate";

       If  the high syscall rate is sustained for 10 consecutive samples, then
       launch top(1) in an xwsh(1G) window to monitor processes, but  do  this
       at most once every 5 minutes:

            all_sample (
                kernel.all.syscall @0..9 > 5000 count/sec * hinv.ncpu
            ) -> shell 5 min "xwsh -e 'top'";
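
       The "at most once every 5 minutes" behaviour in the rule above can be
       pictured with the following Python sketch.  It is only a model of the
       idea of a suppression time, not pmie's implementation; the class name
       SuppressedAction and the use of plain seconds for time are assumptions
       made for illustration.

```python
class SuppressedAction:
    """Run an action at most once per holdoff interval (in seconds)."""

    def __init__(self, holdoff_seconds, action):
        self.holdoff = holdoff_seconds
        self.action = action
        self.last_fired = None  # time of the most recent execution

    def trigger(self, now):
        """Call when the rule condition is true at time `now` (seconds)."""
        if self.last_fired is not None and now - self.last_fired < self.holdoff:
            return False  # still inside the suppression window
        self.last_fired = now
        self.action()
        return True


launched = []
top = SuppressedAction(5 * 60, lambda: launched.append("top"))
# Condition true at t=0s, t=60s and t=310s: the second trigger is suppressed.
results = [top.trigger(t) for t in (0, 60, 310)]
```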

       The following rules are evaluated once every 20 seconds:

            delta = 20 sec;

       If  any  disk  is performing more than 60 I/Os per second, then print a
       message identifying  the  busy  disk  to  standard  output  and  launch
       dkvis(1):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "disk %i busy " &
                 shell 5 min "dkvis";

       Refine  the  preceding  rule to apply only between the hours of 9am and
       5pm, and to require 3 of 4 consecutive samples to exceed the  threshold
       before executing the action:

            $hour >= 9 && $hour <= 17 &&
            some_inst (
              75 %_sample (
                disk.dev.total @0..3 > 60 count/sec
              )
            ) -> print "disk %i busy ";
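
       To make the "3 of 4 consecutive samples" condition concrete, here is a
       small Python model of the percentile test applied per disk.  The
       function names and the sample data are hypothetical; the sketch only
       mirrors the logic of the rule above and returns the instances that
       would satisfy it.

```python
def percent_true(flags, percent):
    """True if at least `percent` percent of the flags are true."""
    flags = list(flags)
    return sum(flags) * 100 >= percent * len(flags)


def busy_disks(samples_by_disk, threshold=60, percent=75):
    """Return the disks for which enough recent samples exceed the threshold."""
    return [
        disk
        for disk, samples in samples_by_disk.items()
        if percent_true((s > threshold for s in samples), percent)
    ]


# Hypothetical I/O rates (per second) for the last 4 samples of each disk.
rates = {
    "dks0d1": [75, 80, 40, 90],  # 3 of 4 above 60: satisfies 75 %_sample
    "dks0d2": [75, 30, 40, 90],  # 2 of 4 above 60: does not
}
busy = busy_disks(rates)
```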

       The following rules are evaluated once every 10 minutes:

            delta = 10 min;

       If  either  the / or the /usr filesystem is more than 95% full, display
       an alarm popup, but not if it has already  been  displayed  during  the
       last 4 hours:

            filesys.free #'/dev/root' /
                filesys.capacity #'/dev/root' < 0.05
            -> alarm 4 hour "root filesystem (almost) full";

            filesys.free #'/dev/usr' /
                filesys.capacity #'/dev/usr' < 0.05
            -> alarm 4 hour "/usr filesystem (almost) full";

       The following rule requires a machine that supports the PCP environment
       metrics.  If the machine environment  temperature  rises  more  than  2
       degrees over a 10 minute interval, write an entry in the system log:

            environ.temp @0 - environ.temp @1 > 2
            -> alarm "temperature rising fast" &
               syslog "machine room temperature rise alarm";

       And  last,  something interesting if you have performance problems with
       your Oracle database:

            db = "oracle.ptg1";
            host = ":moomba.melbourne.sgi.com";
            lru = "#'cache buffers lru chain'";
            gets = "$db.latch.gets $host $lru";
            total = "$db.latch.gets $host $lru +
                     $db.latch.misses $host $lru +
                     $db.latch.immisses $host $lru";

            $total > 100 && $gets / $total < 0.2
            -> alarm "high lru latch contention";

QUICK START

       The pmie specification language is powerful and large.

       To expedite rapid development  of  pmie  rules,  the  pmieconf(1)  tool
       provides a facility for generating a pmie configuration file from a set
       of generalized pmie rules.  The supplied set of  rules  covers  a  wide
       range of performance scenarios.

       The  pmrules(1)  tool provides a GUI-based facility for generating pmie
       rules from parametrized templates.  The supplied templates cover a wide
       range of performance scenarios.

       The  development  efforts  of  the  PCP engineering team are focused on
       pmieconf rather than pmrules, and thus pmieconf is the recommended tool
       for quickly deploying useful pmie rules.

       The Performance Co-Pilot User's and Administrator's Guide provides a
       detailed tutorial-style chapter covering pmie.

EXPRESSION SYNTAX

       This description is terse and informal.  For a more comprehensive
       description see the Performance Co-Pilot User's and Administrator's
       Guide.

       A pmie specification is a sequence of semicolon terminated expressions.

       Basic  operators  are modeled on the arithmetic, relational and Boolean
       operators of the C  programming  language.   Precedence  rules  are  as
       expected,  although  the  use  of  parentheses is encouraged to enhance
       readability and remove ambiguity.

       Operands are performance metric names  (see  pmns(4))  and  the  normal
       literal constants.

       Operands involving performance metrics may produce sets of values, as a
       result of enumeration in the dimensions of hosts, instances  and  time.
       Special qualifiers may appear after a performance metric name to define
       the enumeration in each dimension.  For example,

           kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

       defines 6 values corresponding to the time spent executing in user mode
       on  CPU  0 on the hosts ‘‘foo’’ and ‘‘bar’’ over the last 3 consecutive
       samples.  The default interpretation in the  absence  of  :  (host),  #
       (instance)  and @ (time) qualifiers is all instances at the most recent
       sample time for the default source of PCP performance metrics.
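
       The count of 6 values in the example above is the product of the three
       dimensions: 2 hosts, 1 instance and 3 sample times.  The Python sketch
       below simply enumerates the combinations; the variable names are
       illustrative only.

```python
from itertools import product

hosts = ["foo", "bar"]
instances = ["cpu0"]
sample_offsets = [0, 1, 2]  # @0..2: the most recent sample and the two before it

# One value per (host, instance, sample) combination.
value_keys = list(product(hosts, instances, sample_offsets))
```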

       Host and instance names that do not follow the rules for variables in
       programming languages, i.e. an alphabetic character optionally followed
       by alphanumerics, should be enclosed in single quotes.

       Expression evaluation follows the law of  ‘‘least  surprises’’.   Where
       performance  metrics  have  the  semantics  of  a  counter,  pmie  will
       automatically convert to a rate based upon consecutive samples and  the
       time  interval between these samples.  All expressions are evaluated in
       double precision, and  where  appropriate,  automatically  scaled  into
       canonical units of ‘‘bytes’’, ‘‘seconds’’ and ‘‘counts’’.
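
       The automatic rate conversion for counters amounts to dividing the
       change in value by the elapsed time between two consecutive samples.
       A minimal Python sketch of that arithmetic follows; the function name
       and the sample figures are hypothetical, and counter wrap-around is
       deliberately not handled.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Rate of change of a counter between two samples, per second."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed


# Hypothetical samples of a system-call counter taken 10 seconds apart.
rate = counter_rate(1_000_000, 0.0, 1_060_000, 10.0)
```

       With these figures the counter advanced by 60000 in 10 seconds, so the
       rate is 6000 per second.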

       A  rule  is  a special form of expression that specifies a condition or
       logical expression, a special operator (->) and actions to be performed
       when the condition is found to be true.

       The following table summarizes the basic pmie operators:

           +----------------+--------------------------------------------+
           |   Operators    |                Explanation                 |
           +----------------+--------------------------------------------+
           |+ - * /         | Arithmetic                                 |
           |< <= == >= > != | Relational (value comparison)              |
           |! && ||         | Boolean                                    |
           |->              | Rule                                       |
           |rising          | Boolean, false to true transition          |
           |falling         | Boolean, true to false transition          |
           |rate            | Explicit rate conversion (rarely required) |
           +----------------+--------------------------------------------+

       Aggregate operators may be used to aggregate or summarize along one
       dimension of a set-valued expression.  The following aggregate
       operators map from a logical expression to a logical expression of
       lower dimension.

         +-------------------------+-------------+--------------------------+
         |       Operators         |    Type     |       Explanation        |
         +-------------------------+-------------+--------------------------+
         |some_inst                | Existential | True if at least one set |
         |some_host                |             | member is true in the    |
         |some_sample              |             | associated dimension     |
         +-------------------------+-------------+--------------------------+
         |all_inst                 | Universal   | True if all set members  |
         |all_host                 |             | are true in the          |
         |all_sample               |             | associated dimension     |
         +-------------------------+-------------+--------------------------+
         |N%_inst                  | Percentile  | True if at least N       |
         |N%_host                  |             | percent of set members   |
         |N%_sample                |             | are true in the          |
         |                         |             | associated dimension     |
         +-------------------------+-------------+--------------------------+

       The following instantial operators may be used to filter or limit a
       set-valued logical expression, based on regular expression matching of
       instance names.  The logical expression must be a set involving the
       dimension of instances, and the regular expression is of the form used
       by egrep(1) or the Extended Regular Expressions of regcomp(3G).

              +-------------+------------------------------------------+
              | Operators   |               Explanation                |
              +-------------+------------------------------------------+
              |match_inst   | For each value of the logical expression |
              |             | that is ‘‘true’’, the result is ‘‘true’’ |
              |             | if the associated instance name matches  |
              |             | the regular expression.  Otherwise the   |
              |             | result is ‘‘false’’.                     |
              +-------------+------------------------------------------+
              |nomatch_inst | For each value of the logical expression |
              |             | that is ‘‘true’’, the result is ‘‘true’’ |
              |             | if the associated instance name does not |
              |             | match the regular expression.  Otherwise |
              |             | the result is ‘‘false’’.                 |
              +-------------+------------------------------------------+

       For example, the expression below will be ‘‘true’’ for disks attached
       to controllers 2 or 3 performing more than 20 operations per second:

            match_inst "^dks[23]d" disk.dev.total > 20;
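
       The filtering performed by match_inst and nomatch_inst can be modelled
       in a few lines of Python.  This is a sketch of the semantics described
       in the table above, not of pmie itself: the function names reuse the
       operator names for readability, the instance names are invented, and
       Python's re module stands in for the extended regular expressions that
       pmie uses (the pattern shown is valid in both).

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep a true value only when the instance name matches the pattern."""
    regex = re.compile(pattern)
    return {
        name: bool(is_true and regex.search(name))
        for name, is_true in truth_by_instance.items()
    }


def nomatch_inst(pattern, truth_by_instance):
    """Keep a true value only when the instance name does not match."""
    regex = re.compile(pattern)
    return {
        name: bool(is_true and not regex.search(name))
        for name, is_true in truth_by_instance.items()
    }


# Hypothetical result of "disk.dev.total > 20" for four disks.
over_20 = {"dks1d1": True, "dks2d1": True, "dks3d2": False, "dks3d4": True}
matched = match_inst("^dks[23]d", over_20)
unmatched = nomatch_inst("^dks[23]d", over_20)
```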

       The following aggregate operators map from an arithmetic expression  to
       an arithmetic expression of lower dimension.

          +-------------------------+-----------+--------------------------+
          |       Operators         |   Type    |       Explanation        |
          +-------------------------+-----------+--------------------------+
          |min_inst                 | Extrema   | Minimum value across all |
          |min_host                 |           | set members in the       |
          |min_sample               |           | associated dimension     |
          +-------------------------+-----------+--------------------------+
          |max_inst                 | Extrema   | Maximum value across all |
          |max_host                 |           | set members in the       |
          |max_sample               |           | associated dimension     |
          +-------------------------+-----------+--------------------------+
          |sum_inst                 | Aggregate | Sum of values across all |
          |sum_host                 |           | set members in the       |
          |sum_sample               |           | associated dimension     |
          +-------------------------+-----------+--------------------------+
          |avg_inst                 | Aggregate | Average value across all |
          |avg_host                 |           | set members in the       |
          |avg_sample               |           | associated dimension     |
          +-------------------------+-----------+--------------------------+

       The aggregate operators count_inst, count_host and count_sample map
       from a logical expression to an arithmetic expression of lower
       dimension by counting the number of set members for which the
       expression is true in the associated dimension.

       For action rules, the following actions are defined:

                +----------+----------------------------------------+
                |Operators |              Explanation               |
                +----------+----------------------------------------+
                |alarm     | Raise a visible alarm with xconfirm(1) |
                |print     | Display on standard output             |
                |shell     | Execute with sh(1)                     |
                |stomp     | Send a STOMP message to a JMS server   |
                |syslog    | Append a message to system log file    |
                +----------+----------------------------------------+

       Multiple actions may be separated by the & and | operators to specify
       respectively sequential execution (both actions are executed) and
       alternate execution (the second action will only be executed if the
       execution of the first action returns a non-zero error status).
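
       The difference between the two action separators can be sketched in
       Python as follows.  Actions are modelled as hypothetical callables
       that return an exit status, and each helper returns the statuses of
       the actions it actually ran; how pmie combines those statuses is not
       described here and is not modelled.

```python
def run_sequential(first, second):
    """'&': run both actions, whatever the first one returns."""
    return [first(), second()]


def run_alternate(first, second):
    """'|': run the second action only if the first returns non-zero."""
    statuses = [first()]
    if statuses[0] != 0:
        statuses.append(second())
    return statuses


def ok():
    return 0  # hypothetical action that succeeds


def failed():
    return 1  # hypothetical action that fails
```

       For instance, run_alternate(ok, failed) runs only the first action,
       while run_alternate(failed, ok) falls through to the second.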

       Arguments  to actions are an optional suppression time, and then one or
       more expressions (a string is an expression in this context).   Strings
       appearing  as  arguments to an action may include the following special
       selectors that will be replaced at the time the action is executed.

       %h  Host(s)  that  make  the  left-most  top-level  expression  in  the
           condition true.

       %i  Instance(s)  that  make  the  left-most top-level expression in the
           condition true.

       %v  Value(s) from the left-most top-level expression in the condition
           subject  to  the  host  and  instance  assignments  that  make  the
           condition true.

       Note that expansion of the special selectors is done by repeating the
       whole argument once for each unique binding to any of the qualifying
       special selectors.  For example, if a rule were true for the host
       mumble with instances grunt and snort, and for host fumble the instance
       puff makes the rule true, then the action

            ...
            -> shell myscript "Warning: %h-%i busy ";

       will execute myscript with the argument string "Warning: mumble-grunt
       busy Warning: mumble-snort busy Warning: fumble-puff busy".

       By comparison, if the action

            ...
            -> shell myscript "'Warning! busy:" " %i@%h" "'";

       were executed under the same circumstances, then myscript would be
       executed with the argument string "'Warning! busy: grunt@mumble
       snort@mumble puff@fumble'".

       The semantics of the expansion of the special selectors lead to a
       common usage, where the first argument is a constant (contains no
       special selectors), the second argument contains the desired special
       selectors with minimal separator characters, and an optional third
       argument provides a constant postscript (e.g. to terminate any argument
       quoting from the first argument).  If necessary, post-processing (e.g.
       in myscript) can provide the necessary enumeration over each unique
       expansion of the string containing just the special selectors.
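
       The expansion rule described above can be reproduced with a short
       Python model.  It is a sketch built only from the two worked examples
       in this section: the function names are hypothetical, bindings are
       given as (host, instance) pairs, %v is left out, and arguments are
       assumed to be joined without any separator, as those examples suggest.

```python
def expand_argument(template, bindings):
    """Repeat `template` once per unique binding of the selectors it uses."""
    uses_host = "%h" in template
    uses_inst = "%i" in template
    if not (uses_host or uses_inst):
        return template  # constant argument: emitted once, unchanged
    seen = []
    for host, inst in bindings:
        # Only the selectors present in this argument count towards uniqueness.
        key = (host if uses_host else None, inst if uses_inst else None)
        if key not in seen:
            seen.append(key)
    return "".join(
        template.replace("%h", host or "").replace("%i", inst or "")
        for host, inst in seen
    )


def expand_action(arguments, bindings):
    """Expand each argument in turn and concatenate the results."""
    return "".join(expand_argument(arg, bindings) for arg in arguments)


bindings = [("mumble", "grunt"), ("mumble", "snort"), ("fumble", "puff")]
first = expand_action(["Warning: %h-%i busy "], bindings)
second = expand_action(["'Warning! busy:", " %i@%h", "'"], bindings)
```

       The second call shows why the three-argument form is convenient: the
       constant prefix and postscript appear once, and only the middle
       argument is repeated.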

       For complex conditions, the bindings to these selectors are not obvious.
       It  is  strongly  recommended  that  pmie be used in the debugging mode
       (specify  the  -W  command  line  option  in  particular)  during  rule
       development.

SCALE FACTORS

       Scale  factors  may  be  appended  to  arithmetic expressions and force
       linear scaling of the value to canonical units.  Simple  scale  factors
       are   constructed   from   the  keywords:  nanosecond,  nanosec,  nsec,
       microsecond, microsec, usec, millisecond, millisec, msec, second,  sec,
       minute,  min,  hour,  byte,  Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount,
       Mcount, Gcount and Tcount, and the operator /, for example  ‘‘Kbytes  /
       hour’’.
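
       As an illustration of linear scaling to canonical units, the Python
       sketch below converts a value expressed in Kbytes per hour to bytes
       per second.  The multiplier tables are assumptions made for this
       example (in particular, that Kbyte means 1024 bytes) and cover only a
       few of the keywords listed above; the function name is hypothetical.

```python
# Assumed multipliers relative to the canonical units (bytes and seconds).
SPACE = {"byte": 1, "Kbyte": 1024, "Mbyte": 1024**2, "Gbyte": 1024**3}
TIME = {"msec": 1e-3, "sec": 1, "min": 60, "hour": 3600}


def to_bytes_per_second(value, space_unit, time_unit):
    """Convert a value in space_unit per time_unit to bytes per second."""
    return value * SPACE[space_unit] / TIME[time_unit]


# 3600 Kbytes per hour is 1 Kbyte per second, i.e. 1024 bytes per second.
rate = to_bytes_per_second(3600, "Kbyte", "hour")
```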

MACROS

       Macros are defined using expressions of the form:

            name = constexpr;

       Where  name  follows  the  normal  rules  for  variables in programming
       languages,  ie.  alphabetic  optionally  followed   by   alphanumerics.
       constexpr  must  be a constant expression, either a string (enclosed in
       double quotes) or an arithmetic expression  optionally  followed  by  a
       scale factor.

       Macros  are  expanded when their name, prefixed by a dollar ($) appears
       in an expression, and macros may be nested within a constexpr string.

       The following reserved macro names are understood.

       minute    Current minute of the hour.

       hour      Current hour of the day, in the range 0 to 23.

       day       Current day of the month, in the range 1 to 31.

       month     Current month of the year, in the range  0  (January)  to  11
                 (December).

       year      Current year.

       day_of_week
                 Current  day  of  the  week,  in  the  range  0 (Sunday) to 6
                 (Saturday).

       delta     Sample interval in effect for this expression.

       Dates  and  times  are  presented  in  the  reporting  time  zone  (see
       description of -Z and -z command line options above).
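
       The ranges of the reserved macros differ from some common calendar
       conventions (months start at 0, and the week starts on Sunday).  The
       Python sketch below maps a timestamp onto the documented ranges; the
       function name is hypothetical, delta is omitted because it depends on
       the expression, and reporting the year as a full four-digit number is
       an assumption.

```python
from datetime import datetime


def reserved_macros(now):
    """Map a datetime to the ranges documented for the reserved macros."""
    return {
        "minute": now.minute,
        "hour": now.hour,                        # 0 to 23
        "day": now.day,                          # 1 to 31
        "month": now.month - 1,                  # 0 (January) to 11 (December)
        "year": now.year,                        # assumed to be the full year
        "day_of_week": (now.weekday() + 1) % 7,  # 0 (Sunday) to 6 (Saturday)
    }


# Tuesday 6 February 2001, 19:55:10.
macros = reserved_macros(datetime(2001, 2, 6, 19, 55, 10))
```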

AUTOMATIC RESTART

       It is often useful for pmie processes to be started and stopped when
       the local host is booted or shut down, or when they have been detected
       as no longer running (when they have unexpectedly exited for some
       reason).   Refer  to  pmie_check(1)  for  details  on  automating  this
       process.

EVENT MONITORING

       It  is  common  for  production  systems  to  be monitored in a central
       location.  Traditionally on UNIX systems this has been performed by the
       system  log  facilities  -  see logger(1), and syslogd(1).  On Windows,
       communication with the system event log is handled by  pcp-eventlog(1).

       pmie  fits into this model when rules use the syslog action.  Note that
       if the action string begins with -p (priority)  and/or  -t  (tag)  then
       these  are  extracted from the string and treated in the same way as in
       logger(1) and pcp-eventlog(1).

       However, it is common to have other event monitoring  frameworks  also,
       into  which  you  may wish to incorporate performance events from pmie.
       You can often use the shell action to send events to these  frameworks,
       as they usually provide their own program for injecting events into the
       framework from external sources.

       A final option is use of the stomp (Streaming Text Oriented Messaging
       Protocol) action, which allows pmie to connect to a central JMS (Java
       Message Service) server and send events to the PMIE topic.  Tools can
       be written to extract these text messages and present them to
       operations people (via desktop popup windows, etc).  Use of the stomp
       action requires a stomp configuration file to be set up, which
       specifies the location of the JMS server host, port number, and
       username/password.

       The format of this file is as follows:

            host=messages.sgi.com   # this is the JMS server (required)
            port=61616              # and it's listening here (required)
            timeout=2               # seconds to wait for server (optional)
            username=joe            # (required)
            password=j03ST0MP       # (required)
            topic=PMIE              # JMS topic for pmie messages (optional)

       The timeout value specifies the time (in seconds) that pmie should wait
       for acknowledgements from the JMS server after sending a message (as
       required by the STOMP protocol).  Note that on startup, pmie will wait
       indefinitely for a connection, and will not begin rule evaluation until
       that initial connection has been established.  Should the connection to
       the JMS server be lost at any time while pmie is running, pmie will
       attempt to reconnect each time a rule with a stomp action subsequently
       evaluates to true, but not more than once per minute.  This is to
       avoid contributing to network congestion.  In this situation, where the
       STOMP connection to the JMS server has been severed, the stomp action
       will return a non-zero error value.
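
       When preparing or checking a stomp configuration file, it can help to
       read it the way the format above suggests: key=value pairs with
       trailing # comments.  The Python sketch below is one possible reader
       written for illustration; it is not pmie's parser, the function name
       is hypothetical, and treating every # as the start of a comment is an
       assumption (so a password containing # would not survive).

```python
REQUIRED_KEYS = {"host", "port", "username", "password"}


def parse_stomp_config(text):
    """Parse key=value lines, ignoring '#' comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {raw!r}")
        settings[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - set(settings)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return settings


example = """
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
topic=PMIE              # JMS topic for pmie messages (optional)
"""
config = parse_stomp_config(example)
```

       Values are kept as strings; converting port and timeout to numbers is
       left to the caller.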

FILES

       $PCP_DEMOS_DIR/pmie/*
                 annotated example rules
       $PCP_VAR_DIR/pmns/*
                 default PMNS specification files
       $PCP_TMP_DIR/pmie
                 pmie  maintains  files  in  this  directory  to  identify the
                 running pmie instances  and  to  export  runtime  information
                 about  each  instance  -  this  data  forms  the basis of the
                 pmcd.pmie performance metrics
       $PCP_PMIECONTROL_PATH
                 the default set of pmie instances to start  at  boot  time  -
                 refer to pmie_check(1) for details
       $PCP_VAR_DIR/config/pmie/*
                 the  predefined  alarm  action scripts (email, log, popup and
                 syslog), the example action script (sample) and the concurrent
                 action control file (control.master, see also pmrules(1)).
       /usr/pcp/lib/pmie-common
                 common  shell  procedures  for  the  predefined  alarm action
                 scripts

BUGS

       The lexical scanner and parser will attempt to recover after  an  error
       in  the  input expressions.  Parsing resumes after skipping input up to
       the next semi-colon (;),  however  during  this  skipping  process  the
       scanner  is ignorant of comments and strings, so an embedded semi-colon
       may cause parsing to resume at an unexpected place.  This  behavior  is
       largely  benign,  as  until the initial syntax error is corrected, pmie
       will not attempt any expression evaluation.

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file  and  directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the  local  values  for  these  variables.   The
       $PCP_CONF  variable may be used to specify an alternative configuration
       file, as described in pcp.conf(4).

UNIX SEE ALSO

       logger(1).

WINDOWS SEE ALSO

       pcp-eventlog(1).

SEE ALSO

       PCPIntro(1),   pmcd(1),   pmdumplog(1),   pmieconf(1),   pmie_check(1),
       pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(4) and pcp.env(4).

USER GUIDE

       For a more complete description of the pmie language, refer to the
       Performance Co-Pilot User's and Administrator's Guide.  This is
       distributed in insight(1) format as part of the pcp.books subsystem, or
       in HTML format from:
           http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?\
               db=bks&fname=/SGI_Admin/books/PCP_IRIX/sgi_html/ch.html