log_analysis - Analyze various system logs

NAME

       log_analysis - Analyze various system logs

SYNOPSIS

       log_analysis [-h] [-r] [-g] [-f config_file] [-o file] [-O] [-n
       nodename] [-U] [-u unknownsdir] [-D var1,var2=value,...] [-d days_ago]
       [-a] [-F] [-i] [-m mail_address] [-M mail_prog] [-s] [-S] [-t
       forced_type] [required_files...]

       log_analysis -I info_type

DESCRIPTION

       log_analysis analyzes and summarizes system log files.  It also runs
       some other commands (e.g. w, df -k) to show the system state.  It’s
       intended to be run on a daily basis out of cron.

       log_analysis supports several major modes.  The default mode is report
       mode, which scans through your logs, produces a text report, and exits.
       There is also real mode, which lets you monitor your logs continuously;
       gui mode, which is a gui sitting on top of real mode; and daemon mode,
       which is a daemonized variant of real mode.

OPTIONS

       -a all
           Show all logs, not just the ones from yesterday.

       -A daemon mode
           Start in daemon mode.  Daemon mode is like real mode, except that
           the process daemonizes, and there is no regular output, just
           actions.  Daemon mode is useful if you want to start log_analysis
           at system boot time to run actions.  It’s also useful if you have
           actions configured, and you have multiple copies of log_analysis
           running in real/gui mode, and you only want the actions to happen
           once.

           See -r for more info on real mode.  In general, anything that
           applies to real mode applies to daemon mode unless it explicitly
           says otherwise.

           The variables specific to daemon mode are daemon_mode and
           daemon_mode_pid_file.  One variable that is not specific to daemon
           mode but is really useful with daemon mode is
           real_mode_no_actions_unless_is_daemon.

       -b real mode backlogs
           By default, real mode and gui mode ignore all existing log messages
           and only show new logs.  With this option, real mode shows logs as
           indicated by days_ago.  See -r for more info.

       -d days_ago
           Show logs from days_ago days ago.  Defaults to 1 (ie. show
           yesterday’s logs.)  In -a mode, this option only affects the
           heading, and it defaults to 0.

           You can also provide an absolute date in the form YYYY_MM_DD, ie.
           2001_03_02.  And you can provide the symbolic names today
           (equivalent to 0) and yesterday (equivalent to 1).

           And you can even provide a date range in the form
           YYYY_MM_DD-YYYY_MM_DD or ago1-ago2 to get output for a range of
           days.  Each day is output individually, so if you use the -o
           option, you get a separate file for each day, and if you use the -m
           option, you get a separate mail for each day.

           You can also set this in the config with the days_ago variable.

           See -r for how days_ago is handled under real mode and gui mode.
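
           The accepted forms of days_ago can be sketched in Python as
           below.  This is only an illustration of the rules above, not the
           tool's own code: the function names are hypothetical, and it
           assumes that a range may be given in either order and that the
           symbolic names are also accepted inside a range.

```python
from datetime import date, timedelta


def parse_day(token, today):
    """Turn one days_ago token into a calendar date."""
    if token == "today":
        return today
    if token == "yesterday":
        return today - timedelta(days=1)
    if "_" in token:  # absolute form YYYY_MM_DD
        year, month, day = (int(part) for part in token.split("_"))
        return date(year, month, day)
    return today - timedelta(days=int(token))  # relative form: N days ago


def expand_days_ago(spec, today):
    """Return every day a spec covers, oldest first (one report per day)."""
    ends = [parse_day(token, today) for token in spec.split("-")]
    if len(ends) > 2:
        raise ValueError("expected one day or a start-end range: %r" % spec)
    first, last = min(ends), max(ends)
    return [first + timedelta(days=n) for n in range((last - first).days + 1)]
```

           With these definitions, a spec of 2001_03_02-2001_03_04 expands
           to three separate days, which matches the "one file or one mail
           per day" behaviour described for -o and -m.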

       -D var1,var2=value,var3,...
           This option lets you define preprocessor constants.  Its argument
           is a comma-separated list of constants to define.  To set a
           constant to a particular value, say "constant=value".

       -f config_file
           Read config_file in addition to the internal config and the
           internal config files.  See "CONFIG FILE" for details.

       -F  Instead of loading the whole internal config, just use a minimal
           subset.

       -g  "gui mode", ie. monitor log files continuously.  Currently
           conflicts with many other modes and options.  Has built-in support
           for log file rollover.  This is basically real mode (see -r) with a
           GUI; variables that apply to real mode also apply to gui mode, but
           not vice versa.

           See variables gui_mode, gui_mode_modifier, and window_command for
           gui mode specifics.  See -r for many things that also apply to gui
           mode.

       -h help
           Show command summary and exit.

       -i includes suppress
           Don’t include the standard include files, ie.
           /etc/log_analysis.conf, /usr/etc/log_analysis.conf, and the others
           listed in "FILES".  Note that this option does not stop the
           inclusion of $HOME/.log_analysis.conf in gui mode.

       -I info
           This option is used for obtaining internal information about
           log_analysis.  log_analysis exits immediately after outputting the
           information.

           If info is help, log_analysis outputs the list of things you can
           use for info.

           If info is categories, all categories (those mentioned in the
           various configs and implicit categories) will be listed.

           If info is colors, all colors that work for real_mode and gui_mode
           will be listed.

           If info is config_versions, all config files will be listed with
           their config_version and file_version (if defined).

           If info is evals, the evals built from the config (internal and
           local) are output.

           If info is internal_config, the internal config is output.

           If info is log_files, the log files that would have been read are
           output.

           If info is log_types, the known log types are output.

           If info is nothing, log_analysis just exits.  Useful for testing
           configs.

           If info is pats, the known subpatterns will be listed.

           If info is patterns, the various patterns defined for the log types
           are output.

       -m mail_address
           Mail output to mail_address.  This can also be specified in the
           config; see mail_address in "VARIABLES".

       -M mail_command
           Use mail_command to send the mail.  This can also be specified in
           the config; see mail_command in "VARIABLES" for more info,
           including the default.

       -n nodename
           Use nodename as the nodename (AKA hostname) instead of the default.
           This is more than just cosmetic: entries in syslogged files will be
           processed differently if they didn’t come from this nodename.  This
           can also be specified in the config file; see nodename in
           "VARIABLES".

       -N process all nodenames
           If the logs contain entries for nodes other than nodename, (ie. if
           the host is a syslog server), analyze them anyway.

       -o file
           Output to file instead of to standard output.  Works with -m, so
           you can save to a file and send mail with one command.

       -O  With -o file, causes the output to go both to the file and to
           standard output.  NB: this does not currently work with -m, so you
           can’t output to a file, standard output, and to email.

       -p pgp_type
           Encrypts the mail output.  Uses pgp_type to determine the
           encryption command.  For use with -m or mail_address.  See pgp_type
           in the list of global variables for info on encryption types.

       -r  "Real mode", ie. monitor log files continuously.  Currently
           conflicts with many other modes and options.  Has built-in support
           for log file rollover.  See -g for a GUI that can sit on top of
           this mode, and -A to run real mode as a daemon.

           See variables real_mode, real_mode_output_format,
           real_mode_sleep_interval, real_mode_check_interval,
           real_mode_backlogs (or the -b option), and keep_all_raw_logs in the
           list of global variables for more configurables.

           WARNING: in real mode and gui mode, only the most recent file per
           glob in optional_log_files is monitored.  This means that you
           should set it to something like /var/log/messages* and
           /var/log/syslog* rather than /var/log/*.

           WARNING: in real mode and in gui mode, log_analysis treats days_ago
           differently; if it’s a simple number, it is treated as the number
           of days ago to start looking at logs.  So, if days_ago is 7,
           log_analysis looks through the past 7 days’ worth of logs.
           HOWEVER, even if -d is set, log_analysis doesn’t actually show
           these logs unless -b is specified or the corresponding variable
           real_mode_backlogs is set.

           NOTE: The primary feature of log_analysis is its reporting
           capability.  Using it for continuous monitoring makes sense if you
           want a single config for reporting and for continuous monitoring.
           If you just want continuous monitoring then you may be better off
           with some of the other software out there, such as swatch(1).

       -s suppress other commands
           Usually, log_analysis runs assorted commands that show system state
           (ie. w, df -k).  This option doesn’t run those commands.  See
           commands_to_run in "VARIABLES" for the list of extra commands.  The
           suppress_commands variable does the same thing as this option.

       -S suppress output footer
           Usually, log_analysis will include its version number, the time it
           spent running, and its arguments at the end of the output.  This
           option suppresses that output.  The suppress_footer variable does
           the same thing as this option.

       -t forced_type
           log_analysis usually determines the type of logfiles by looking at
           the per-type log_filenames extension.  This option and the
           type_force variable let you bypass that check.

       -U unknowns-only
           Output logfile unknowns to stdout and exit.  If unknownsdir exists,
           also wipe it and then write out raw unknown lines to files in
           unknownsdir.  This exists to make writing custom rules easier.

       -u unknownsdir
           Use unknownsdir as the unknownsdir.  If unknownsdir already exists,
           and contains files, its files will be used as the input for
           log_analysis regardless of any other command line options.  If -U
           is also specified, after all processing unknownsdir will be wiped
           out and its files rewritten with the current unknowns.  This is
           useful for writing your own configs.

       -v version
           Output version and exit.

       required-files
           If files are specified on the command line, log_analysis ignores
           its built-in list of optional and required log files, and processes
           the files on the command line.  If one of the files doesn’t exist,
           it’s a fatal error.

CONFIG FILE

       The script has an embedded config file.  It will also read various
       external config files if they exist; see "FILES" for a list.  Later
       directives (from later in the file or from a file read later) override
       earlier directives.

       You can make comments with ’#’ at the beginning of a line.  If you want
       a ’#’ or ’=’ at the beginning of a line, you usually need to quote it
       with backslash.

       Some directives take a "block" as argument.  A block is a collection of
       lines that ends with a line that is empty or only contains whitespace.
       ’#’ at the beginning of a line still comments out the line.  Leading
       whitespace on a line is ignored.
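
       The block rules can be pictured with the following Python sketch.  It
       is an illustration only: read_block is a hypothetical helper, input
       lines are assumed to have no trailing newline, the comment check is
       assumed to happen after leading whitespace is dropped, and backslash
       quoting is not modelled.

```python
def read_block(lines, start):
    """Collect block lines beginning at index start.

    Returns (block, next_index), where next_index points just past the
    blank line that terminated the block.
    """
    block = []
    index = start
    while index < len(lines):
        line = lines[index]
        index += 1
        if line.strip() == "":        # empty or whitespace-only: block ends
            break
        stripped = line.lstrip()      # leading whitespace is ignored
        if stripped.startswith("#"):  # comment line, dropped from the block
            continue
        block.append(stripped)
    return block, index
```

       Parsing would then resume at next_index with the directive that
       follows the block.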

       Before the config is parsed, it is passed through a preprocessor
       inspired by the aide(1) preprocessor.

       Pattern directives

       These directives describe your logs, and are the main point of this
       program.  The basic idea here is that you first declare what logtype
       you are working with, and then you specify a bunch of perl patterns
       that describe different kinds of log messages, and that save parts of
       the message.  For each perl pattern, you specify one or more
       destinations that describe what you want done with it.

       logtype: type
           Future patterns should be applied to this logtype (ie. sulog,
           syslog, wtmp.)  Example:

           logtype: syslog

       pattern: pattern
           pattern is a perl regex (see perlre(1)) that implicitly starts with
           ^ (beginning of the line) and implicitly ends with \s*$ (optional
           whitespace and the end of the line.)  This should only be issued
           after a logtype: has been issued in the same config file.  Wildcard
           parts of the pattern should be surrounded with parentheses, to save
           these parts for later use in the format:.  Note that there are some
           tokens with special meanings that can be used here in the format
           $pat{something}, ie.  $pat{ip}, $pat{file}, etc. (see "pat" for
           details, and run log_analysis -I pats for the current list).
           Examples:

           pattern: popper: Stats: ($pat{mail_user}) (\d+) (\d+) (\d+) (\d+)

           pattern: login: LOGIN ON ($pat{file}) BY ($pat{user})

           The order of precedence for patterns is undefined, except that
           user-defined patterns always have precedence over the patterns of
           the internal config.
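
           The implicit anchoring and the $pat{...} expansion can be modelled
           roughly as follows.  This is a Python sketch rather than the
           tool's Perl: the two subpattern definitions in PATS are
           hypothetical stand-ins (the real ones are whatever log_analysis
           -I pats prints), and Python's re syntax differs from perlre(1) in
           places.

```python
import re

# Hypothetical stand-ins for the tool's subpatterns.
PATS = {
    "file": r"[\w./-]+",
    "user": r"[\w.-]+",
}


def compile_pattern(pattern):
    """Expand $pat{name} tokens and add the implicit ^ and \\s*$ anchors."""
    expanded = re.sub(r"\$pat\{(\w+)\}", lambda m: PATS[m.group(1)], pattern)
    # The non-capturing group keeps any top-level alternation inside the anchors.
    return re.compile(r"^(?:" + expanded + r")\s*$")


rule = compile_pattern(r"login: LOGIN ON ($pat{file}) BY ($pat{user})")
match = rule.match("login: LOGIN ON tty1 BY root  ")
print(match.groups())
```

           The two parenthesized parts become $1 and $2 for a later format:
           directive.  A line with extra text before "login:" or after the
           user name would not match, because of the implicit anchors.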

       format: format
           format is treated as a string that contains the useful information
           from a pattern.  Note that it should not actually be quoted.  A
           format is mandatory for category destinations, but should not be
           used with SKIP or LAST destinations.

           For example, if we had a pattern that was login: LOGIN ON
           ($pat{file}) BY ($pat{user}), we would probably just want $2, so we
           might say:

           format: $2

           Similarly, if we had a pattern that was kernel: deny (\d+) packets
           from ($pat{ip}) to ($pat{ip}), we might want to say:

           format: $2 => $3

       use_sprintf
           use_sprintf is optional.  If this directive is present for a given
           format, then instead of the format being treated as a string, it is
           treated as the arguments for sprintf(3).  For example, if you have
           a source IP address in $2 and a destination IP address in $3, you
           could just have format as $2 => $3, but you would have things
           lining up better if you did this:

           format: "%-15s => $3", $2

           use_sprintf
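
           To see why the sprintf form lines up better, here is a small
           Python sketch of the two formats.  The helper names and sample
           addresses are made up for illustration, and Python's % operator
           stands in for sprintf(3).

```python
import re


def expand(template, groups):
    """Replace $1, $2, ... with the corresponding captured text."""
    return re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1)) - 1], template)


def plain_format(groups):
    # format: $2 => $3
    return expand("$2 => $3", groups)


def sprintf_format(groups):
    # format: "%-15s => $3", $2   (with use_sprintf)
    return expand("%-15s => $3", groups) % (groups[1],)


# Captures from: kernel: deny (\d+) packets from ($pat{ip}) to ($pat{ip})
rows = [("4", "10.0.0.1", "192.168.10.20"),
        ("1", "172.16.200.1", "192.168.10.20")]
for groups in rows:
    print(plain_format(groups))
for groups in rows:
    print(sprintf_format(groups))
```

           In the second loop the source address is left-justified in a
           15-character field, so the arrows fall in the same column on
           every row.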

       delete_if_unique
           delete_if_unique is optional.  This feature can be used when you
           have multiple dests for one pattern, one of which is a regular
           category and one of which is a UNIQUE with a filter.  You want the
           one that is a regular category to be deleted if the UNIQUE category
           meets its filter, ie.  because it’s a scan.  See "UNIQUE
           DESTINATION" for more info.

       count: count
           count is optional.  The default is that a log line that matches a
           pattern causes the category to increment by 1.  But sometimes, a
           single log line corresponds to multiple events, ie. if you have a
           log message of the form "5 packets denied by firewall" or "last
           message repeated 3 times", you can extract the event count to
           count.  For example, if you’re using the pattern kernel: deny (\d+)
           packets from ($pat{ip}) to ($pat{ip}), you might say:

           count: $1
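
           The effect of count can be illustrated with a short Python
           sketch.  The IP subpattern and the tally helper are hypothetical;
           the sketch only shows how the increment changes when count is
           taken from $1.

```python
import re
from collections import Counter

# Hypothetical stand-in for $pat{ip}.
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
DENY = re.compile(
    r"^kernel: deny (\d+) packets from (" + IP + r") to (" + IP + r")\s*$"
)


def tally(lines, use_count):
    """Count events per "source => destination" item."""
    totals = Counter()
    for line in lines:
        m = DENY.match(line)
        if m is None:
            continue
        data = "%s => %s" % (m.group(2), m.group(3))         # format: $2 => $3
        totals[data] += int(m.group(1)) if use_count else 1  # count: $1
    return totals


lines = [
    "kernel: deny 5 packets from 10.0.0.1 to 10.0.0.2",
    "kernel: deny 2 packets from 10.0.0.1 to 10.0.0.2",
]
```

           Without count the two sample lines contribute 2 to the item; with
           count: $1 they contribute 7.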

       color: colors
           space-separated list of colors to display this message in when in
           real-mode or gui-mode.  For a list of colors that will work in both
           modes, run log_analysis -I colors.  Note that "bell" is among the
           available colors, because it didn’t fit anywhere else.  See the
           colors entry for more info.

           NOTE: if multiple dest configs with conflicting color settings
           result in delivery to the same line in gui mode, the result is
           currently undefined.  There is only one line to be displayed, after
           all.

       description: description_text
           This is a simple text description of the event, to explain the
           problem to your operators.  It can be accessed via gui mode.  The
           note above by color applies.

       do_action: action
           Run "action" (described elsewhere in the config with the "action:"
           keyword) if this event is seen in real mode or gui mode.

       priority: priority
           Assign priority priority to action.  Currently, the only priority
           that does anything is "IGNORE".  It can be used to ignore events.

       dest: dest
           This describes what you want done with the data in a pattern.  If
           dest is the special token SKIP the data is discarded.  If dest is
           the special token LAST, the data is assumed to be of the form "last
           message repeated N times", and we pretend as though the last
           message we saw occurred, using count as a multiplier.  If dest
           starts with the special token UNIQUE, we do special "unique"
           handling, which is covered in "UNIQUE DESTINATION".  If dest starts
           with the special token CATEGORY or is any other string, it is
           treated as a category that the pattern data should be saved to.
           Ie. if pattern was login: LOGIN ON ($pat{file}) BY ($pat{user}),
           and format was $2, then one might set dest to login: successful
           local login.  You must have a format defined before the dest.

           You can have multiple dest directives for a single pattern, if all
           of the dests are category destinations.  Each one needs its own
           format.  Similarly, if you set count or use_sprintf, they are tied
           to the particular dest you set them with.

           Note that dest "closes" the description of a destination, so you
           need to have any other related directives (ie. format, count,
           use_sprintf, delete_if_unique) before the dest directive.  This
           ordering is necessary to avoid ambiguity in the multiple-
           destination case.
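
           One way to picture the SKIP, LAST, and category destinations is
           the Python sketch below.  It rests on assumptions that the text
           does not settle: that LAST replays the previous increment
           multiplied by N, and that a LAST following a skipped message is
           itself discarded.

```python
from collections import Counter


def summarize(events):
    """events: (dest, data, count) tuples in log order.

    dest is "SKIP", "LAST", or a category name; for LAST, count is the N
    taken from "last message repeated N times".
    """
    totals = Counter()
    previous = None  # the last message seen, if it reached a category
    for dest, data, count in events:
        if dest == "SKIP":
            previous = None  # assumption: replaying a skipped message skips too
            continue
        if dest == "LAST":
            if previous is not None:
                category, old_data, old_count = previous
                totals[(category, old_data)] += old_count * count
            continue
        totals[(dest, data)] += count
        previous = (dest, data, count)
    return totals
```

           For example, one "login: successful local login" event for root
           followed by "last message repeated 3 times" yields a total of 4
           for that item.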

       Event directives

       You can configure what happens for incoming events based on certain
       criteria.  Currently, those criteria are a simple string match of one
       or more of the category, data, or hostname.  So, for example, you can
       ignore all messages from "roguehost", or color "user logged in"
       messages for a certain user in bright red.  Here are the useful
       directives:

       event:
           Starts a new event config.

       match category: value
       match data:     value
       match hostname: value
           This event config applies when the "category" is "value", or the
           "data" is value, or the "hostname" is "value".  If multiple match
           lines are supplied, they are ANDed together.

       color: color
       description: description_text
       do_action: action
       priority: priority
           color, description, do_action, and priority work the same way as
           they do in a "dest" config or in a "category" config.

           If "event", "dest", and "category" configs all apply to a given
           event then "event" has highest precedence, followed by "dest",
           followed by "category".
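
       The matching and precedence rules can be sketched in Python as
       follows.  This is a model, not the tool's code: it assumes that
       "simple string match" means exact equality, that the first applicable
       event config wins, and the colour names are placeholders.

```python
def event_applies(event, record):
    """All match lines of an event config must hold (they are ANDed)."""
    return all(record.get(field) == value
               for field, value in event["match"].items())


def resolve(setting, record, events, dest_config, category_config):
    """Look up one setting, trying event, then dest, then category configs."""
    for event in events:
        if event_applies(event, record) and setting in event:
            return event[setting]
    for config in (dest_config, category_config):
        if setting in config:
            return config[setting]
    return None


events = [{"match": {"category": "user logged in", "data": "morty"},
           "color": "red"}]
dest_config = {"color": "green"}
category_config = {"color": "blue", "priority": "IGNORE"}
```

       Here a "user logged in" event for morty resolves color from the event
       config, while the same category for another user falls back to the
       dest config, and priority comes from the category config because
       nothing more specific sets it.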

       Category directives

       Several patterns can lead to the same category, so category-specific
       directives are associated with the category, not with a pattern.  Here
       are the category directives:

       category: category
           Specifies which category subsequent directives will define.

       filter: filter commands
           By default, log_analysis will output all the data it finds in a
           category.  Filters let you specify, say, that only the top 10 items
           should be output, or that only the items that occurred fewer than 5
           times should be output.  If a category has data, but none of the
           data meet the filter rules, then the category will be completely
           skipped.  See "FILTERS" for more info.

       sort: sorting keywords
           Specifies how this category should be sorted in the output.
           Examples are "funky", "string", "value", "reverse value", etc.  The
           default is "funky".  See "SORTING" for more info.

       derive: derive commands
           The usual way to populate categories is via the pattern config.
           But sometimes, you want to combine two or more elemental categories
           to make a new category.  Any categories derived in this manner may
           not be a destination for simple patterns.

           There are currently three subcommands for this (the quotes are
           literal):

           "category1" add "category2"
           "category1" subtract "category2"
                   These do what you expect: take the values for the items in
                   category2 and add or subtract them from the values for the
                   items in category1.  Any item defined in either category
                   will be in the new category.  Subtract can cause the values
                   in the new category to be negative or 0.

           "category1" remove "category2"
                   The new category will contain items in category1 that are
                   not in category2.  This is very different from subtract.

                   Example: if category1 contains A with a value of 2 and B
                   with a value of 2, while category2 contains A with a value
                   of 1 and C with a value of 1, ’"category1" subtract
                   "category2"’ will contain A with a value of 1, B with a
                   value of 2, and C with a value of -1, while ’"category1"
                   remove "category2"’ will only contain B with a value of 2.
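
           The three subcommands can be modelled with plain dictionaries.
           The Python below is only a sketch of the semantics described
           above (the function names are hypothetical), using the same
           category1 and category2 as the example.

```python
def derive_add(cat1, cat2):
    """Sum values; any item defined in either category is kept."""
    return {k: cat1.get(k, 0) + cat2.get(k, 0) for k in set(cat1) | set(cat2)}


def derive_subtract(cat1, cat2):
    """Subtract values; results may be zero or negative."""
    return {k: cat1.get(k, 0) - cat2.get(k, 0) for k in set(cat1) | set(cat2)}


def derive_remove(cat1, cat2):
    """Keep only the items of cat1 that do not appear in cat2 at all."""
    return {k: v for k, v in cat1.items() if k not in cat2}


category1 = {"A": 2, "B": 2}
category2 = {"A": 1, "C": 1}

print(derive_subtract(category1, category2))
print(derive_remove(category1, category2))
```

           subtract keeps A, B, and C (with C at -1), while remove drops A
           entirely because it appears in category2, whatever its value.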

       color: color
       description: description_text
       do_action: action
       priority: priority
           color, description, do_action, and priority work the same way as
           they do in a "dest" config or in an "event" config.

           If "event", "dest", and "category" configs all apply to a given
           event then "event" has highest precedence, followed by "dest",
           followed by "category".

       Action directives

       In real mode and in gui mode, sometimes you want an "action" (like
       paging someone) to automatically happen when a particular message is
       seen.  And in gui mode, you might want to run a command on a message
       interactively (ie. to telnet or ssh into the host it came from.)  The
       directives to do that (inspired by swatch(1)) are:

       action: action_name
           Starts defining a new action named action_name.

       command: command
           The command to run for the current action.  command uses the same
           tags as real_mode_output_format.

           WARNING: you can potentially shoot yourself in the foot by passing
           data that has not been sanitized to a command on your system.  Be
           careful!

       window: title
           Performing the action will require creating a window using title as
           the title.  The title will be passed to window_command as the "%t"
           tag.  title itself uses the same tags as real_mode_output_format.
           This only makes sense for gui mode.

           WARNING: you can potentially shoot yourself in the foot by passing
           data that has not been sanitized to a command on your system.  Be
           careful!

       use_pipe:
           The data in the event will be sent to the command via standard
           input.  The format used will be that specified by the
           default_action_format variable, unless overridden locally by the
           action_format: directive.  These formats allow the same tags as
           real_mode_output_format.

       action_format: format
           See use_pipe above.

       throttle: throttle_time
           Automatically-triggered actions can potentially result in a slew of
           events.  The "throttle" option lets you specify a minimum amount of
           time before the action should recur with this event.  The time can
           be specified as seconds, as minutes:seconds, or as
           hours:minutes:seconds.

           Throttles do not apply to actions and logins that are explicitly
           invoked via the GUI.

           By default, the throttle is triggered on unique category and data.
           That is, if the event was category "user logged in" and the data
           was "morty", then the throttle will keep "user logged in", "morty"
           events from causing the action to run again, but won’t stop "user
           logged in", "esther" or "no such user", "morty" events from
           triggering the action.  This default is set with the
           default_throttle_format variable, which defaults to "%c\n%d".  It
           can be overridden on a per-action basis with the throttle_format:
           directive, which takes the same tags as real_mode_output_format.
           If you want the throttle to be global to the action (say, a pager
           action), set throttle_format to a simple scalar value (like 1).
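
           A rough Python model of the throttle behaviour is given below.
           It is a sketch under assumptions: the class and function names
           are hypothetical, only the %c and %d tags are modelled (taken
           here to mean category and data, as the default format suggests),
           and times are plain seconds supplied by the caller.

```python
import re


def parse_throttle(text):
    """Convert seconds, minutes:seconds, or hours:minutes:seconds to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


class Throttle:
    def __init__(self, throttle_time, key_format="%c\n%d"):
        self.interval = parse_throttle(throttle_time)
        self.key_format = key_format
        self.last_run = {}  # throttle key -> time the action last ran

    def key(self, category, data):
        values = {"c": category, "d": data}
        return re.sub(r"%([cd])", lambda m: values[m.group(1)], self.key_format)

    def should_run(self, category, data, now):
        """Return True and record the run unless the key is still throttled."""
        key = self.key(category, data)
        last = self.last_run.get(key)
        if last is not None and now - last < self.interval:
            return False
        self.last_run[key] = now
        return True
```

           With the default format, each distinct category and data pair
           gets its own timer.  With a constant format such as 1, every
           event maps to the same key, so the throttle becomes global to the
           action.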

       throttle_format: format
           See throttle: above.

       Other directives

       config_version version-number
           Declare that the config is compatible with version version-number.
           This is for version-control purposes.  Every config file should
           have one of these.  You can scan your config files’ config versions
           with -I config_versions.

       file_version revision-information
           Your own version control information.  revision-information can be
           arbitrary text.  You can list your config files’ file versions
           with -I config_versions.

       include file
           Read in configuration from file.  Dies if file doesn’t exist.  file
           is subject to usual tag substitutions; see "TAG SUBSTITUTION".

       include_if_exists file
           Just like include, but doesn’t die if the file doesn’t exist.

       include_dir dir
           Read in all files in dir, and include them.  Die if the directory
           doesn’t exist, or if a file in the directory isn’t readable.  dir
           is subject to the usual tag substitutions; see "TAG SUBSTITUTION".
           Any filenames that match a pattern in filename_ignore_patterns will
           be skipped.

       include_dir_if_exists dir
           Just like include_dir, but doesn’t die if the directory doesn’t
           exist.  Does still die if any of the files in dir isn’t readable.

       block_comment
           Throws out the block immediately after it.

       set var varname =value
           Set scalar variable varname to value value.  If the variable
           already exists, this will overwrite it.

           See "VARIABLES" for the list of variables you can play with.

       add var varname =value
           If scalar variable varname already exists, append value to the end
           of its current value.  If it doesn’t yet exist, create it and set
           it to value.

           See "VARIABLES" for the list of variables you can play with.

       prepend var varname =value
           If scalar variable varname already exists, prepend value to the
           current value.  If it doesn’t yet exist, create it and set it to
           value.

           See "VARIABLES" for the list of variables you can play with.

       set arr arrname =
           Read in the block that follows this declaration, make the lines
           into an array, and set the array variable arrname to that array.

           See "VARIABLES" for the list of variables you can play with.

       add arr arrname =
           Read in the block that follows this declaration, make the lines
           into an array, and append that array to the array named arrname.

           See "VARIABLES" for the list of variables you can play with.

       prepend arr arrname =
           Read in the block that follows this declaration, make the lines
           into an array, and prepend that array to the array named arrname.

           See "VARIABLES" for the list of variables you can play with.

       remove arr arrname =
           Read in the block that follows this declaration, and for each line,
           look for and delete that line from array arrname.  If one of these
           lines cannot be found, the result is a warning, not death.

           See "VARIABLES" for the list of variables you can play with.
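
       The four array directives behave roughly like the list operations in
       this Python sketch.  The helper names are hypothetical, and two
       details are assumptions: add and prepend create the array if it does
       not exist yet, and remove deletes only the first matching line.

```python
import warnings


def set_arr(arrays, name, block):
    arrays[name] = list(block)


def add_arr(arrays, name, block):
    arrays.setdefault(name, []).extend(block)


def prepend_arr(arrays, name, block):
    arrays[name] = list(block) + arrays.get(name, [])


def remove_arr(arrays, name, block):
    """Delete each block line from the array; a miss warns, it is not fatal."""
    current = arrays.setdefault(name, [])
    for line in block:
        if line in current:
            current.remove(line)
        else:
            warnings.warn("%r not found in array %s" % (line, name))


arrays = {}
set_arr(arrays, "optional_log_files", ["/var/log/messages*"])
add_arr(arrays, "optional_log_files", ["/var/log/syslog*"])
prepend_arr(arrays, "optional_log_files", ["/var/adm/messages*"])
```

       After these three calls the array holds the /var/adm glob first,
       followed by the two /var/log globs in the order they were added.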

       local OTHER DIRECTIVE
           Putting "local" in front of another directive means that this
           directive should be saved when gui_mode_config_savelocal is in
           effect.

       nowarn OTHER DIRECTIVE
           Putting "nowarn" in front of another directive means that this
           directive should not generate a config warning, i.e. for redefining
           a category filter.

VARIABLES

       Some variables are scalar, which means they are strings or numbers.
       Some variables are arrays, which are lists of scalars.

       Some variables are mandatory, which means they must be defined
       somewhere in one of the config files, while some variables are
       optional.

       Some variables are global, while some are per-log-type extensions.
       Some examples of per-log-type extensions are date_pattern and filenames.
       Extensions should actually appear in the format "TYPE_EXTENSION", ie.
       date_pattern would actually appear as syslog_date_pattern for the
       syslog log-type and sulog_date_pattern for sulog.

       To see examples of many of the possibilities, as well as the default
       values, run log_analysis -I internal_config.

       PER-LOG-TYPE VARIABLE EXTENSIONS

       filenames
           This mandatory extension is an array of file basenames that apply
           to the log type.  For example, if you wanted /var/adm/messages.1 to
           be processed by the syslog rules, you might add messages to
           syslog_filenames.

       open_command
           Some log files (ie. wtmp log types) are in a binary format that
           needs to be interpreted by external commands.  This optional scalar
           extension specifies a command to be run to interpret the file.  The
           command is subject to the usual tag substitutions (see "TAG
           SUBSTITUTION"), plus the %f tag maps to the file.  For example,
           the wtmp log type defines wtmp_open_command as "last -f %f".  If
           both decompression_rules and open_command apply to a given file,
           the intermediate data will be stored in a temp file unless
           pipe_decompress_to_open is used.  See "pipe_decompress_to_open" for
           more info.

       pipe_decompress_to_open
           If both decompression_rules and open_command apply to a given file,
           the intermediate data will be stored in a temporary file by default
           to avoid problems with some commands that can’t handle input from a
           pipe.  If this optional scalar extension is set to 1 (or any
           "true" value), then instead, the output of the decompression rule
           will be piped to the open command, and the open command’s %f tag
           will be mapped to "-".

       open_command_is_continuous
           If an open_command has been specified and the command is the sort
           that never exits (ie. tcpdump or the like) you should set this to
           let log_analysis know what to expect.  Such commands should only
           ever be used in real mode or gui mode.

       pre_date_hook
           This optional extension is an array of arbitrary perl commands that
           are run for each log line, before the date processing (or any other
           processing) is done.

       date_pattern
           This mandatory extension is a scalar that contains a pattern with
           at least one parenthesized subpattern.  Before any rules are
           applied to a log line, the engine strips off the date pattern.  If
           the engine is only looking at one day (ie. the default), it takes
           the part of the string that matched the parenthesized subpattern,
           and if it isn’t equal to the right date, it skips the line.  The
           date_format extension (next) describes what the date should look
           like.
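
           As a rough illustration of the date check, the following Python
           sketch (not the program's own perl code) strips a syslog-style
           date from a line and skips lines from other days.  The regex, the
           helper names, and the choice to skip lines that do not match the
           pattern at all are assumptions made for the example; the date is
           formatted the way the strftime(3) format "%b %e" would format it.

```python
import re
from datetime import date

# Hypothetical stand-in for a syslog-style date_pattern: the parenthesized
# subpattern captures the "Mon dd" part, the rest of the timestamp is dropped.
DATE_PATTERN = re.compile(r"^([A-Z][a-z]{2} [ \d]\d) \d\d:\d\d:\d\d ")


def format_syslog_date(day):
    """Format a date like strftime "%b %e" (day of month padded with a space)."""
    return f"{day:%b} {day.day:2d}"


def strip_date(line, wanted):
    """Return the line without its date, or None if the line should be skipped."""
    match = DATE_PATTERN.match(line)
    if match is None:
        return None  # assumption: lines with no recognizable date are skipped
    if match.group(1) != wanted:
        return None  # line is from a different day
    return line[match.end():]


wanted = format_syslog_date(date(2024, 1, 5))
rest = strip_date("Jan  5 03:14:07 conan sshd[1]: hello", wanted)
```

           Here rest holds the line with the date removed, ready for the
           nodename check and the rules.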

       date_format
           This mandatory extension is a scalar that describes the date using
           the same format as strftime(3).  For example, syslog_date_format is
           "%b %e".

       nodename_pattern
           This optional extension is a pattern with at least one
           parenthesized subpattern.  If it exists, then after the
           date_pattern is stripped from the line, this pattern is stripped,
           and the part that matched the subpattern is compared to the
           nodename.  If they’re not equal, then the relevant counter for the
           category named by the other_host_message variable is incremented.
           Note that all nodenames are subject to having the local domain
           stripped from them; see domain and leave_FQDNs_alone for details.

       pre_skip_list_hook
           This optional extension is an array of perl commands to be run
           after the nodename check, just before the skip_list check.

       skip_list
           This optional extension is obsolete and deprecated, but still works
           for backwards compatibility.

       raw_rules
           This optional extension is obsolete and deprecated, but still works
           for backwards compatibility.

       GLOBAL VARIABLES

       These variables are all globals.

       log_type_list
           This variable is a mandatory global array that contains the list of
           all known log-types, ie. syslog, sulog, wtmpx, etc.

       pat This variable is a mandatory global array that contains a list of
           subpattern names followed by a comma, optional whitespace, and a
           perl regex that represents that subpattern.  Some of the predefined
           patterns include "ip", "zone", "user", "mail_user", etc.  Run
           log_analysis -I pats for a list.

       host_pat
       file_pat
       ip_pat
       mail_user_pat
       user_pat
       word_pat
       zone_pat
           Legacy variables.  Please don’t use them.

       other_host_message
       output_message_one_day
       output_message_all_days
       output_message_all_days_in_range
           Assorted mandatory scalars that are used for human-readable output.
           other_host_message defaults to "Other hosts syslogging to us",
           output_message_one_day defaults to "Logs for %n on %d",
           output_message_all_days defaults to "All logs for %n as of %d", and
           output_message_all_days_in_range defaults to "All logs for %n for
           %s through %e".

       date_format
           This variable is a mandatory global scalar that describes how you
           want the date printed in the output.  Uses the format of
           strftime(3).  Note that you probably shouldn’t use characters that
           you wouldn’t want in a filename (ie. whitespace or ’/’) if you want
           to use the %d tag for output_file.

       output_file
           Equivalent to -o file.  This variable is an optional global scalar
           that lists a filename that will be output to instead of to standard
           output.  Works with mail_address (if specified.)  Note that this
           variable is subject to the usual tag substitutions (see "TAG
           SUBSTITUTIONS"), plus you can use the %d tag for the date, so you
           can set it to something like "/var/log_analysis/archive/%n-%d".
           See output_file_and_stdout.

       output_file_and_stdout
           Equivalent to -O.  This variable is an optional global scalar that
           changes the behavior of -o or output_file.  By default, -o or
           output_file causes output to go only to the named file.
           With this variable, output also goes to standard output.  Note:
           this does not currently work with -m.

       nodename
           This variable is an optional global scalar that is used in a bunch
           of places: in checking to see whether a message from syslog (or
           other log type that defines nodename_pattern) originated on this
           host; in reading in various default config files; etc.  If left
           unset in the config, its value is set from the output of uname(2).
           Its value is used to set the n tag.  Note that unless
           leave_FQDNs_alone is set, log_analysis will try to strip the local
           domain name from nodename.

       osname
       osrelease
           These two optional global scalars default to the equivalent of
           uname -s and uname -r, respectively.  They are only used for
           reading in default config files.  Their values set the s and r
           tags, respectively.

       domain
           This variable is an optional global scalar.  If you don’t set it,
           log_analysis will try to set it by looking for a domain line in
           /etc/resolv.conf.  If log_analysis has domain set, it will attempt
           to strip away the local domain name from all nodenames it
           encounters, unless leave_FQDNs_alone is set.  See leave_FQDNs_alone
           for details.

       leave_FQDNs_alone
           This variable is an optional global scalar.  By default, if
           log_analysis has domain set (either explicitly or implicitly), it
           will attempt to strip away the domain name in domain, or
           "localdomain", from all nodenames it encounters.  If you set this
           to 1, or to some other true value, log_analysis will not attempt to
           strip the domain name in domain.
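
           The interaction between domain and leave_FQDNs_alone can be
           pictured with a small Python sketch.  This is an illustration
           only: the function name is hypothetical, and details such as
           case-sensitive matching are assumptions rather than documented
           behavior.

```python
def strip_local_domain(nodename, domain, leave_fqdns_alone=False):
    """Drop a trailing ".<domain>" or ".localdomain" from a nodename."""
    if leave_fqdns_alone:
        return nodename
    for suffix in (domain, "localdomain"):
        if suffix and nodename.endswith("." + suffix):
            # Remove the suffix and the dot that precedes it.
            return nodename[: -(len(suffix) + 1)]
    return nodename


short = strip_local_domain("conan.example.com", "example.com")
```

           Names in other domains are left as they are, so they can still be
           told apart from local hosts.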

       PATH
           This variable is an optional global scalar that sets the PATH
           environment variable.  This doesn’t help the initial setting of
           nodename, osname, or osrelease, which are set from uname(2).

       umask
           This variable is an optional global scalar that sets the umask.
           See umask(2).

       priority
           This variable is an optional global scalar that sets the priority,
           or "niceness."  See nice(1).  Setting this to zero means run
           unchanged from the current niceness.  Setting this negative is a
           bad idea unless you really know what you’re doing, and is
           forbidden to non-root users.

       decompression_rules
           This variable is an optional global array of rules to decompress
           compressed files, in the format: compression-extension, comma,
           space, command to decompress to stdout.  The command is subject to
           the usual tag substitutions (see "TAG SUBSTITUTIONS"), plus %f
           stands for the filename.  For example, the rule for gzipped files
           is:

           "gz, gzip -dc %f"

           The default rules support: .gz .Z .bz2
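
           To make the rule format concrete, here is a minimal Python sketch
           of how such rules might be parsed and applied.  It is not the
           program's own code: the function names are hypothetical, the bz2
           rule is an assumed example (only the gz rule is quoted above), and
           only the %f tag is expanded, with no shell quoting.

```python
def parse_rules(rules):
    """Turn "extension, command" strings into an extension -> command table."""
    table = {}
    for rule in rules:
        extension, command = rule.split(",", 1)
        table[extension.strip()] = command.strip()
    return table


def decompress_command(filename, table):
    """Return the command for a compressed file, or None if no rule applies."""
    if "." not in filename:
        return None
    extension = filename.rsplit(".", 1)[-1]
    command = table.get(extension)
    if command is None:
        return None
    return command.replace("%f", filename)  # only %f is handled in this sketch


table = parse_rules(["gz, gzip -dc %f", "bz2, bzip2 -dc %f"])
command = decompress_command("/var/log/messages.1.gz", table)
```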

           If both decompression_rules and open_command apply to a given file,
           the default is to use a temp file for the intermediate results
           unless pipe_decompress_to_open is used.  See
           "pipe_decompress_to_open" for more info.

       pgp_rules
           This variable is an optional global array of rules for PGP
           encrypting messages, in the format: PGP type (user defined), comma,
           space, command to PGP encrypt stdin to stdout.  The command is
           subject to the usual tag substitutions, plus %m stands for the
           email address.  For use with the "-p" and "-m" options.  For
           example, the rule for gnupg is:

           "g, gpg -aer %m 2>&1"

           Internally defined rules are "g" for "gnupg", "2" for PGP 2.x, and
           "5" for PGP 5.x.

           WARNING: The user who runs log_analysis must have already imported
           the mail destination’s key for this to work.  Make sure to test
           this before you put it in a cronjob.

       filename_ignore_patterns
           This variable is an optional global array of patterns that describe
           filenames to be skipped in an include_dir/include_dir_if_exists
           context, such as emacs backup files (".*~") or vim backup files
           ("\..*\.swp").  Only the file component of the path is examined,
           not the directory component.  Patterns implicitly begin with ^ and
           implicitly end with $.
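
           The matching rule can be sketched in Python as follows.  This is
           only an illustration: the function name is hypothetical, and
           Python's re module stands in for perl regexes, which agree for
           simple patterns like these.

```python
import os
import re

IGNORE_PATTERNS = [r".*~", r"\..*\.swp"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """True if the file component of path matches any pattern in full."""
    name = os.path.basename(path)
    # fullmatch models the implicit ^ and $ around each pattern.
    return any(re.fullmatch(pattern, name) for pattern in patterns)


skipped = is_ignored("/etc/log_analysis.d/local.conf~")
```

           Because only the file component is examined, a "~" in a directory
           name does not cause the files inside it to be skipped.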

       mail_address
           This variable is an optional global scalar that can consist of an
           email address.  If set, the output of the script will be mailed to
           the address it is set to.  The -m option does the same thing, and
           overrides this.

       mail_command
           This variable is an optional global scalar that is the command used
           to send mail if -m is used or mail_address is set.  The -M option
           does the same thing, and overrides this.  This variable is subject
           to the usual tag substitutions, plus %m stands for mail_address and
           %o stands for the relevant output message.  The default is:

           "Mail -s '%o' %m"

       memory_size_command
           This variable is an optional global scalar that is the command used
           to determine the process’ memory size.  Subject to the usual tag
           substitutions, plus %p stands for the PID (process ID) in question.
           If set, the command is run at the end of the report, and the output
           is included in the footer.

           The default value for Linux is:

           "ps -p %p -o vsz | tail -n +2"

           The default value for Solaris/SunOS is:

           "ps -p %p -o vsz | tail -n +2"

       optional_log_files
           This variable is an optional array of file globs that are to be
           processed.  Note that, unlike required_log_files, these are globs
           rather than literal filenames, although literal filenames will also
           work.  [Globs are filenames with wildcards, ie.
           /var/adm/messages*.]

           See -r for an issue specific to real mode and gui mode.

       commands_to_run
           This variable is an optional array of commands that are also
           supposed to be run to give a snapshot of the system state.  These
           are currently: w, df -k, and cat /etc/dumpdates.

       rcs_command
           This variable is an optional global scalar that is the command used
           to do RCS check-in on files (i.e. when
           gui_mode_config_save_does_rcs is set).  This variable is subject to
           the usual tag substitutions, plus %f stands for the file in
           question.  The default is intended for RCS, although SCCS, CVS,
           SVN, or other systems could be substituted.  The default is:

           "ci -q -l -t-%f -m'automatic check-in' %f"

       suppress_commands
           If set, the commands in commands_to_run are NOT run during report
           mode.  This is equivalent to the -s option.

       suppress_footer
           If set, the various report mode footers are not displayed.  This is
           equivalent to the -S option.

       ignore_categories
           This variable is an optional array of categories that you don’t
           want to see.  Rather than try to remove all the rules for these
           categories, you can just list them here.

       priority_categories
           This variable is an optional array of categories that will be
           listed first in the output.

       days_ago
           This optional scalar variable is the config equivalent of the -d
           option.

       process_all_nodenames
           This optional scalar variable is the config equivalent of the -N
           option.

       type_force
           This optional scalar is the config equivalent of the -t option.

       allow_nodenames
           This variable is an optional array of nodenames that can log to
           this host.  Usually, logs labelled as being from another host will
           not be analyzed, and each such line will be listed in a special
           category; if you choose to allow some nodenames (or if you choose to
           process all nodenames by setting -N or setting
           process_all_nodenames) then these log messages will also be
           processed.

       real_mode
           This variable is the config equivalent of the -r option; see the -r
           option for more details.

       real_mode_output_format
           This is a required global scalar.  It describes the per-output
           format for real mode and gui mode.  It is subject to normal tag
           substitution (see "TAG SUBSTITUTIONS"); in addition to the normal
           tags, "%c" is replaced with the category, "%#" is replaced with the
           count, "%d" is replaced with the formatted data, "%h" is replaced
           with the nodename of the message, and "%R" is the raw, original log
           line without the trailing newline.  If keep_all_raw_logs is set,
           you also get "%A" for all the raw log lines.  WARNING: you usually
           want "%h" (nodename of the message), not "%n" (nodename of the host
           you’re running on, which is one of the default tag substitutions.)
           Defaults to "%c: (loghost %n, from host %h)\n%-10# %d\n\n".

       real_mode_sleep_interval
           This optional global scalar is for use with real mode and gui mode.
           In these modes, log_analysis reads log files for more data, sleeps
           for a little while, and then reads again.  The sleep interval
           controls how long log_analysis sleeps (in seconds).  It defaults to
           1.

       real_mode_check_interval
           This optional global scalar is for use with real mode and gui mode.
           In these modes, log_analysis sits in a loop reading from the log
           files.  Periodically, it wants to check if the log files have
           rolled over or if newer log files have appeared.  If at least this
           long (in seconds) goes by since the last time we’ve checked, we
           check again.

       keep_all_raw_logs
           This optional global scalar is a boolean for use with real mode and
           gui mode.  It enables a %A tag that contains all the raw logs for a
           given entry.  That is, if you have multiple log lines that contain
           essentially the same data, only the first line shows up in %R, and
           the rest are thrown out.  This variable lets you keep them all.  It
           can eat up a lot of memory, so it’s disabled by default.

       real_mode_backlogs
           This optional global scalar is equivalent to -b.

       colors
           This variable is an optional global array for use with real mode
           and gui mode.  It defines the colors available on console, using
           "name, string" pairs.  The usual tag substitution rules apply to
           the string, plus the special tag %a stands for octal character 007
           (ASCII BEL) and %e stands for octal character 033 (ASCII ESC).
           Some of the colors are actually mode changes (ie. "normal",
           "inverse", "reverse", "blink", etc.)  If you define any colors, you
           should also define a "normal" color.  Note that "bell" is among the
           colors; it didn’t belong anywhere else.  You can list colors with
           log_analysis -I colors.

       gui_mode
           This variable is the config equivalent of the -g option; see the -g
           option for more details.  It is an optional scalar.

       gui_mode_modifier
           In gui mode, the default modifier to do things with the keyboard is
           "alt", ie. "alt-q" to exit.  This lets you change it.  It is an
           optional scalar.

       report_mode_output_node_per_category
       report_mode_combine_nodes
       report_mode_combine_shows_nodes
       report_mode_combine_is_partway
           These are assorted options for dealing with output for multiple
           node situations (ie. logservers.)  They are all optional scalars.
           See "LOGSERVER CONSIDERATIONS" for details.

       window_command
           In gui mode, if we need a window to run a command, say an action,
           this will be the command that is used.  The tags are the same as
           real_mode_output_format, plus we have "%t" as the title and "%C" as
           the command.  It is an optional scalar.

       login_action
           This optional array lets you specify what action should be used to
           log in to a given host in gui mode, overriding default_login_action.
           Lines are in the format host, login_action.

       default_login_action
           This optional scalar specifies which login action should be used to
           log in to hosts by default in gui mode.

       default_throttle_format
           See the throttle: directive in the action group.

       default_action_format
           See the use_pipe directive in the action group.

       print_command
       print_format
       save_format
       gui_mode_config_autosave
       gui_mode_config_savelocal
       gui_mode_config_save_does_rcs
       gui_mode_config_file
       gui_mode_print_all
       gui_mode_save_all
       gui_mode_save_events_file
           These are for GUI use.

       default_sort
           This variable is an optional global scalar that describes how
           certain things will be sorted.  See "SORTING" for info on what this
           can be set to.  Defaults to funky.

       default_filter
           This variable is an optional global scalar that describes the
           default category filter.  See "FILTERS" for info on what this can
           be set to.

PREPROCESSOR DIRECTIVES

       NB: these get completely processed before all other directives, so they
       don’t care about other syntax elements.  Except as noted, these should
       appear at the beginning of the line after optional whitespace.

       @@end
           End of config file.

       @@define var val
           Define var as value val.  var should contain only alphanumerics and
           underscores, and start with an alphanumeric.  val may contain no
           whitespace.

       @@undef var
           Undo any previous definition of var.

       @@ifdef var
       @@ifndef var
       @@else
       @@endif
           If variable var is defined, even defined as a false value, the
           lines after the @@ifdef are used, otherwise the lines are
           effectively commented out.  @@ifndef is the logical reverse.
           @@ifdef and @@ifndef must be terminated by an @@endif.   They may
           contain an @@else section that works in the usual way.

       @@ifhost name
       @@ifnhost name
           These are just like @@ifdef and @@ifndef above, except that they
           test if the variable nodename is equal to the value supplied for
           name.

       @@ifos name
       @@ifnos name
           These are just like @@ifdef and @@ifndef above, except that they
           test if the variable osname is equal to the value supplied for
           name.

       @@{var}
           If this string appears anywhere on any line, then if var is a
           defined variable, its value is substituted.  If var is not a
           defined variable, the string is left literally.  Note that this
           behaviour is different from that of aide(1).
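
           The substitution rule can be modelled in a few lines of Python.
           This is a sketch of the behaviour described above, not the
           program's own code; the function name and the sample line are
           hypothetical.

```python
import re


def expand_defines(line, defines):
    """Replace @@{var} with its value; leave undefined names untouched."""
    return re.sub(
        r"@@\{(\w+)\}",
        lambda match: defines.get(match.group(1), match.group(0)),
        line,
    )


expanded = expand_defines("archive/@@{site}/@@{missing}", {"site": "dmz"})
```

           In this example the defined name is replaced and the undefined one
           stays in the line exactly as written.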

       @@warn message
           Print out message as soon as the config is read.

       @@error message
           Print out message and exit as soon as the config is read.

SORTING

       You can sort category items using several different criteria.  You can
       set the default_sort, and then on a per-category basis, you can use the
       sort: keyword to control things even closer.  If you don’t override it,
       default_sort defaults to funky.  Sorts stack, so you can use "reverse
       string" or "reverse value".  In theory, you can stack all of them, ie.
       "reverse value reverse funky", but there is no guarantee that sorts are
       stable.

       The available sorts are:

       string
           Simple string "lexicographical" sort.  Does not handle numbers
           well.

       numeric
           Sorts numbers, including decimal numbers, correctly, but cannot
           handle non-numeric characters, and cannot handle IPs correctly.

       funky
           Tries to do the right thing with mixed integers and strings.
           Handles IP addresses correctly.  It does not handle decimal numbers
           correctly.

       reverse
           Reverses the current order.  Can be used in conjunction with
           another sort, ie. "reverse string".

       value
           Sorts by count (ascending) instead of by item.

       none
           Does no additional sorting.
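
       The differences between the sorts are easiest to see on IP addresses.
       The Python sketch below is an approximation for illustration only: the
       "natural" key imitates the kind of mixed integer and string ordering
       that funky is described as providing, but it is not the program's
       algorithm, and modelling "reverse value" as a value sort followed by a
       reversal is an assumption.

```python
import re


def natural_key(item):
    """Split into text and digit runs so that digit runs compare as integers."""
    parts = re.split(r"(\d+)", item)
    # With a capturing split, odd positions always hold the digit runs.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


counts = {"10.0.0.9": 1, "10.0.0.10": 4, "9.1.1.1": 7}

by_string = sorted(counts)                    # like "string"
by_natural = sorted(counts, key=natural_key)  # roughly like "funky"
by_value = sorted(counts, key=counts.get)     # like "value" (ascending count)
by_reverse_value = list(reversed(by_value))   # like "reverse value"
```

       The string sort places 10.0.0.10 before 10.0.0.9 and both before
       9.1.1.1, while the natural key orders the addresses numerically.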

FILTERS

       Sometimes, you don’t want to see all the information in a category,
       just the top few items, or whatever.  Filters let you do this.  You can
       set a default filter using default_filter (defaults to "none") or you
       can set filters on a per-category basis using the filter: keyword.

       Some commands you can use:

       >= N
           Only show items whose count is greater than or equal to N.

       <= N
       > N
       < N
       = N, == N
       != N, <> N, >< N
           These are analogous to >=.

       top N
       top N%
       top_strict N
       top_strict N%
           Only show those items whose count is in the top N or top N%.  The
           difference between top and top_strict is what happens when there’s
           a tie to be in the top N.  top will include all the items that tie,
           even if this means there will be more than N.  top_strict always
           cuts off after N.

       bottom N
       bottom N%
       bottom_strict N
       bottom_strict N%
           Analogous to top.

       subfilter and subfilter
       subfilter or subfilter
           Lets you "and" or "or" two or more subfilters together (ie. "top 10
           and >= 4").
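
       The difference between top and top_strict, and the effect of "and",
       can be sketched in Python.  This is an illustration under assumptions,
       not the program's code: the function names are hypothetical, which
       tied items top_strict keeps is decided here by input order, and "and"
       is modelled as keeping the items that pass both subfilters.

```python
def top(counts, n, strict=False):
    """Keep the n highest counts; unless strict, also keep ties for last place."""
    if n <= 0:
        return {}
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    if strict or len(ranked) <= n:
        return dict(ranked[:n])
    cutoff = ranked[n - 1][1]
    return {item: count for item, count in ranked if count >= cutoff}


def at_least(counts, n):
    """The ">= N" filter."""
    return {item: count for item, count in counts.items() if count >= n}


def both(first, second):
    """Model of "subfilter and subfilter": items passing both filters."""
    return {item: count for item, count in first.items() if item in second}


counts = {"a": 5, "b": 3, "c": 3, "d": 1}
loose = top(counts, 2)                # "top 2" keeps a, b and c
strict = top(counts, 2, strict=True)  # "top_strict 2" keeps two items
combined = both(top(counts, 3), at_least(counts, 4))  # "top 3 and >= 4"
```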

UNIQUE DESTINATION

       log_analysis has a relatively simple counting mechanism that is usually
       effective.  One exception is when you want to track how often one value
       occurs in your log uniquely with another value.  For example, suppose
       you’re watching firewall logs, $1 is the source IP, $2 is the
       destination IP, and you want to know if you’re being scanned.  Tracking
       counts of "$1 $2" requires you to manually count how many times $1
       occurs.  Tracking just "$1" doesn’t really tell you what you want,
       because you don’t know if the source IP is really scanning a bunch of
       different hosts, or just has a renegade process that’s banging away at
       a single destination.  What you want to track is how many times $1
       occurs with a unique $2.

       To do this sort of thing in a pattern config, set format: to value1,
       value2 and set dest: to "UNIQUE category-name".  In our example, we
       might say:

         format: $1, $2
         dest:   UNIQUE scans

       The fields in format are not evaluated in a string context, and only
       the last comma acts as a separator.  So, if $3 contains the protocol
       information, you might say this:

         format: sprintf("%-15s %s", $1, $3), $2
         dest:   UNIQUE scans

       When detecting scans in particular, it makes sense to specify an event
       filter, ie.:

         category: scans
           filter: >= 5

       Note that it’s often useful to specify multiple dests with a firewall
       pattern, ie. one regular category dest, one UNIQUE dest with a filter
       threshold to detect a scan.  If so, you might want to add
       delete_if_unique to the regular dest, so if it turns out you have a
       scan, you don’t have to wade through lots of garbage.  Ie.:

         pattern: kernel: block from ($pat{ip}):($pat{port}) to ($pat{ip}):($pat{port})

           format: $1 => $3:$4
           delete_if_unique
           dest: kernel block

           format: $1, $3
           dest: UNIQUE scans

         category: scans
           filter: >=5
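
       The counting that UNIQUE performs can be pictured with the following
       Python sketch.  It illustrates the idea only and is not the program's
       implementation; the function name and the sample addresses are made up
       for the example.

```python
from collections import defaultdict


def unique_counts(events):
    """For each source, count how many distinct destinations it was seen with."""
    seen = defaultdict(set)
    for source, destination in events:
        seen[source].add(destination)
    return {source: len(destinations) for source, destinations in seen.items()}


# One source touching six different hosts, another hitting one host 50 times.
events = [("10.9.9.9", f"192.168.1.{i}") for i in range(1, 7)]
events += [("10.1.1.1", "192.168.1.5")] * 50

# Equivalent of "filter: >= 5" on the scans category.
scans = {src: n for src, n in unique_counts(events).items() if n >= 5}
```

       Only the source that touched many distinct destinations survives the
       filter; the noisy source with a single destination counts as 1.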

TAG SUBSTITUTIONS

       A few items are subject to "tag substitutions".  These are kind of like
       printf’s "%" sequences: a sequence like "%n" gets replaced with the
       nodename.  You can optionally specify field widths, which default to
       right-justified (ie.  "%10n") or can be preceded with a "-" to make
       them left-justified (ie. "%-10n").  Also, a few of the basic C-style
       backslash sequences are understood (ie. \n for newline, \t for tab, \\
       for backslash).  Anything subject to tag substitutions will be listed
       as such.

       Here are the standard tag sequences:

       %% literal %
       %n nodename (ie. the output of uname -n.)
       %r OS release (ie. the output of uname -r.)
       %s OS name (ie. the output of uname -s.)

       There are also other tag sequences that apply in special situations.
       They are listed where they apply.

       If you try to use an undefined sequence (ie. "%Z" or something else),
       you’ll get an error.
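
       A minimal Python model of tag substitution is shown below.  It is a
       sketch of the rules described in this section, not the program's own
       code: the function name is hypothetical, the tag table is supplied by
       the caller, and the backslash sequences are not handled.

```python
import re

TAG = re.compile(r"%(-?\d+)?(.)")


def substitute(template, tags):
    """Expand %-style tags; tags maps one-character tag names to values."""
    def expand(match):
        width, tag = match.group(1), match.group(2)
        if tag == "%":
            return "%"
        if tag not in tags:
            raise KeyError(f"undefined tag sequence %{tag}")
        value = str(tags[tag])
        if width is None:
            return value
        size = int(width)
        # A negative width means left-justified, as with printf.
        return value.ljust(-size) if size < 0 else value.rjust(size)

    return TAG.sub(expand, template)


header = substitute("Logs for %n on %d", {"n": "conan", "d": "2024-01-05"})
```

       With the same table, "%10n" pads the nodename on the left to ten
       characters and "%-10n" pads it on the right.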

LOGSERVER CONSIDERATIONS

       log_analysis defaults to single host operation.  If you have a
       logserver that allows logs from multiple hosts (ie. centralized
       logging) then you potentially have two concerns: configuring what
       hostnames to allow, and how to display multi-node logs in report mode.

       By default, log_analysis will only allow logs from the nodename of the
       logserver, so if you want to allow other nodes, you need to tell
       log_analysis which hostnames it should allow logs from.  Either set
       allow_nodenames to a list of nodenames to allow logs from, or set
       process_all_nodenames (AKA option -N) to accept everything.  Another
       useful variable here is leave_FQDNs_alone.

       Once you’ve accepted multiple nodes, there are a number of ways
       log_analysis can display them.  Let’s say I received two "Accepted
       publickey for morty from 192.168.1.1 port 50000 ssh2" events from "red-
       sonja" and three from "conan".  In the default mode, that would look
       like this:

         Logs found for other hosts.  For host conan:

         ...
         sshd: accepted publickey:
         3          morty from 192.168.1.1
         ...

         Logs found for other hosts.  For host red-sonja:

         ...
         sshd: accepted publickey:
         2          morty from 192.168.1.1
         ...

       You can get the categories listed together more compactly by setting
       report_mode_output_node_per_category.  Ie:

         ...
         sshd: accepted publickey: (host conan)
         3          morty from 192.168.1.1

         sshd: accepted publickey: (host red-sonja)
         2          morty from 192.168.1.1
         ...

       If you set report_mode_combine_nodes, the category will be combined
       into a single category.  Ie.:

         ...
         sshd: accepted publickey:
         5          morty from 192.168.1.1
         ...

       If you set both report_mode_combine_nodes and
       report_mode_combine_shows_nodes, you get the combined messages along
       with a list of applicable hostnames.  Ie.:

         ...
         sshd: accepted publickey:
         5          morty from 192.168.1.1 (conan red-sonja)
         ...

       If you set both report_mode_combine_nodes and
       report_mode_combine_is_partway, the messages are listed like so:

         ...
         sshd: accepted publickey:
         3          morty from 192.168.1.1 (conan)
         2          morty from 192.168.1.1 (red-sonja)
         ...

       Other combinations of the variables
       report_mode_output_node_per_category, report_mode_combine_nodes,
       report_mode_combine_shows_nodes, and report_mode_combine_is_partway
       produce undefined results.
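
       The combined displays amount to summing per-node counts for identical
       items.  The Python sketch below reproduces the two combined layouts
       from the example data above; it is an illustration only, and the
       function name and data layout are assumptions, not the program's
       internals.

```python
from collections import defaultdict

per_node = {
    ("conan", "morty from 192.168.1.1"): 3,
    ("red-sonja", "morty from 192.168.1.1"): 2,
}


def combine_nodes(per_node, show_nodes=False):
    """Sum counts across nodes, optionally listing the contributing nodes."""
    totals = defaultdict(int)
    nodes = defaultdict(list)
    for (node, item), count in sorted(per_node.items()):
        totals[item] += count
        nodes[item].append(node)
    lines = []
    for item, total in totals.items():
        suffix = f" ({' '.join(nodes[item])})" if show_nodes else ""
        lines.append(f"{total:<10} {item}{suffix}")
    return lines


combined = combine_nodes(per_node)
with_nodes = combine_nodes(per_node, show_nodes=True)
```

       The first call models report_mode_combine_nodes alone; the second
       models it together with report_mode_combine_shows_nodes.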

EXAMPLES

       log_analysis -m root@whatever

       Analyze yesterday’s logs and mail the results to root@whatever.  You
       might want to put this in a cronjob.

       log_analysis -p5 -m root@whatever

       Same as the last one, but PGP encrypt the logs using PGP 5 before
       mailing.

       log_analysis -a

       Look at all the logs, not just yesterday’s.

       log_analysis -sa /var/adm/sulog

       Analyze all the contents of sulog, don’t bother with local state.

       log_analysis -san otherhost syslog-file

       Analyze all the contents of syslog-file, which was created on
       "otherhost".  Don’t run the local state commands.

       log_analysis -sd1 -f foo.conf -U

       This style of command is useful while developing local configs to
       handle log messages unknown to the internal config.

       Use foo.conf as a config file in addition to the internal config.
       Output only the unknowns.

COMPATIBILITY

       Written for Solaris and Linux.  May work for other OSs.

       Written for perl 5.00503.  May work with some earlier perl versions.

NOTES

       You often need to be root to read interesting log files.

       It is customary to regularly "rollover" log files.  Many log file
       formats don’t include year information; among other benefits, rollover
       makes the dates in such logfiles unambiguous.  log_analysis by default
       looks for log lines that match a particular day of the year, but does
       not even try to guess the year.  If the OS you’re using doesn’t
       rollover some logfiles by default (ie. Solaris doesn’t rollover
       /var/adm/wtmpx, /var/adm/wtmp, or /var/adm/sulog), you will need to
       rollover these files yourself to get valid output from this program.

       On some OSes, ’%’ (ie. the percent symbol) has a special meaning in
       crontabs, and needs to be escaped.  See crontab(1).

       When there are a lot of unknowns, log_analysis can take a lot longer to
       run.  This is particularly a problem when you’re first running it,
       before you customize for your site.  To get around this problem, if you
       send log_analysis a SIGINT (ie. if you hit control-C), it will stop
       going through your logs and immediately output the results.

FILES

       /etc/log_analysis.conf
       /etc/log_analysis.conf-%n
       /etc/log_analysis.conf-%s-%r
       /etc/log_analysis.conf-%s
       /usr/etc/log_analysis.conf
       /usr/etc/log_analysis.conf-%n
       /usr/etc/log_analysis.conf-%s-%r
       /usr/etc/log_analysis.conf-%s
           Config files, in order of precedence.  "%n", "%s", and "%r" have
           the usual tag substitution meanings; see "TAG SUBSTITUTIONS".

       /etc/log_analysis.d
       /usr/etc/log_analysis.d
           Plug-in directories.  All files in these directories will be
           treated as config files and include’d.

       $HOME/.log_analysis.conf
           If you start log_analysis with the "-g" option, this file will be
           loaded as a config file after all other config files, except those
           specified by -f.  This is also the default file for the "save
           config" menu option.

AUTHOR

       Mordechai T. Abzug <morty@frakir.org>

