NAME
log_analysis - Analyze various system logs
SYNOPSIS
log_analysis [-h] [-r] [-g] [-f config_file] [-o file] [-O] [-n
nodename] [-U] [-u unknownsdir] [-D var1,var2=value,...] [-d days_ago]
[-a] [-F] [-i] [-m mail_address] [-M mail_prog] [-s] [-S] [-t
forced_type] [required_files...]
log_analysis -I info_type
DESCRIPTION
log_analysis analyzes and summarizes system log files. It also runs
some other commands (e.g. w, df -k) to show the system state. It’s
intended to be run on a daily basis out of cron.
log_analysis supports several major modes. The default mode is report
mode, which scans through your logs, produces a text report, and exits.
There is also real mode, which lets you monitor your logs continuously;
gui mode, which is a gui sitting on top of real mode; and daemon mode,
which is a daemonized variant of real mode.
OPTIONS
-a all
Show all logs, not just the ones from yesterday.
-A daemon mode
Start in daemon mode. Daemon mode is like real mode, except that
the process daemonizes, and there is no regular output, just
actions. daemon mode is useful if you want to start log_analysis
at system boot time to run actions. It’s also useful if you have
actions configured, and you have multiple copies of log_analysis
running in real/gui mode, and you only want the actions to happen
once.
See -r for more info on real mode. In general, anything that
applies to real mode applies to daemon mode unless it explicitly
says otherwise.
The variables specific to daemon mode are daemon_mode and
daemon_mode_pid_file. One variable that is not specific to daemon
mode but is really useful with daemon mode is
real_mode_no_actions_unless_is_daemon.
-b real mode backlogs
By default, real mode and gui mode ignore all existing log messages
and only show new logs. With this option, real mode shows logs as
indicated by days_ago. See -r for more info.
-d days_ago
Show logs from days_ago days ago. Defaults to 1 (ie. show
yesterday’s logs.) In -a mode, this option only affects the
heading, and it defaults to 0.
You can also provide an absolute date in the form YYYY_MM_DD, e.g.
2001_03_02. And you can provide the symbolic names today
(equivalent to 0) and yesterday (equivalent to 1).
And you can even provide a date range in the form
YYYY_MM_DD-YYYY_MM_DD or ago1-ago2 to get output for a range of
days. Each day is output individually, so if you use the -o
option, you get a separate file for each day, and if you use the -m
option, you get a separate mail for each day.
You can also set this in the config with the days_ago variable.
See -r for how days_ago is handled under real mode and gui mode.
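The single-value forms of days_ago described above can be sketched in
Python. This is only an illustration of the documented forms, not
log_analysis’s own code; the function name resolve_days_ago is
hypothetical, and date ranges are deliberately left out.

```python
from datetime import date, timedelta


def resolve_days_ago(value: str, today: date) -> date:
    """Map one days_ago value to the calendar day it selects."""
    symbolic = {"today": 0, "yesterday": 1}
    if value in symbolic:
        return today - timedelta(days=symbolic[value])
    if value.isdigit():
        # A plain number counts days back from today.
        return today - timedelta(days=int(value))
    # Absolute form YYYY_MM_DD, e.g. 2001_03_02.
    year, month, day = (int(part) for part in value.split("_"))
    return date(year, month, day)


today = date(2001, 3, 3)
print(resolve_days_ago("1", today))           # 2001-03-02
print(resolve_days_ago("yesterday", today))   # 2001-03-02
print(resolve_days_ago("2001_03_02", today))  # 2001-03-02
```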
-D var1,var2=value,var3,...
This option lets you define preprocessor constants. Its argument
is a comma-separated list of constants to define. To set a
constant to a particular value, say "constant=value".
-f config_file
Read config_file in addition to the internal config and the
internal config files. See "CONFIG FILE" for details.
-F Instead of loading the whole internal config, just use a minimal
subset.
-g "gui mode", i.e. monitor log files continuously. Currently
conflicts with many other modes and options. It has built-in
support for log file rollover. This is basically real mode (see
-r) with a GUI; variables that apply to real mode also apply to gui
mode, but not vice versa.
See variables gui_mode, gui_mode_modifier, and window_command for
gui mode specifics. See -r for many things that also apply to gui
mode.
-h help
Show command summary and exit.
-i includes suppress
Don’t include the standard include files, ie.
/etc/log_analysis.conf, /usr/etc/log_analysis.conf, and the others
listed in "FILES". Note that this option does not stop the
inclusion of $HOME/.log_analysis.conf in gui mode.
-I info
This option is used for obtaining internal information about
log_analysis. log_analysis exits immediately after outputting the
information.
If info is help, log_analysis outputs the list of things you can
use for info.
If info is categories, all categories (those mentioned in the
various configs and implicit categories) will be listed.
If info is colors, all colors that work for real_mode and gui_mode
will be listed.
If info is config_versions, all config files will be listed with
their config_version and file_version (if defined).
If info is evals, the evals built from the config (internal and
local) are output.
If info is internal_config, the internal config is output.
If info is log_files, the log files that would have been read are
output.
If info is log_types, the known log types are output.
If info is nothing, log_analysis just exits. Useful for testing
configs.
If info is pats, the known subpatterns will be listed.
If info is patterns, the various patterns defined for the log types
are output.
-m mail_address
Mail output to mail_address. This can also be specified in the
config; see mail_address in "VARIABLES".
-M mail_command
Use mail_command to send the mail. This can also be specified in
the config; see mail_command in "VARIABLES" for more info,
including the default.
-n nodename
Use nodename as the nodename (AKA hostname) instead of the default.
This is more than just cosmetic: entries in syslogged files will be
processed differently if they didn’t come from this nodename. This
can also be specified in the config file; see nodename in
"VARIABLES".
-N process all nodenames
If the logs contain entries for nodes other than nodename, (ie. if
the host is a syslog server), analyze them anyway.
-o file
Output to file instead of to standard output. Works with -m, so
you can save to a file and send mail with one command.
-O With -o file, causes the output to go both to the file and to
standard output. NB: this does not currently work with -m, so you
can’t output to a file, standard output, and to email.
-p pgp_type
Encrypts the mail output. Uses pgp_type to determine the
encryption command. For use with -m or mail_address. See pgp_type
in the list of global variables for info on encryption types.
-r "Real mode", i.e. monitor log files continuously. Currently
conflicts with many other modes and options. It has built-in
support for log file rollover. See -g for a GUI that can sit on
top of this mode, and -A to run real mode as a daemon.
See variables real_mode, real_mode_output_format,
real_mode_sleep_interval, real_mode_check_interval,
real_mode_backlogs (or the -b option), and keep_all_raw_logs in the
list of global variables for more configurables.
WARNING: in real mode and gui mode, only the most recent file per
glob in optional_log_files is monitored. This means that you
should set it to something like /var/log/messages* and
/var/log/syslog* rather than /var/log/*.
WARNING: in real mode and in gui mode, log_analysis treats days_ago
differently; if it’s a simple number, it is treated as the number
of days ago to start looking at logs. So, if days_ago is 7,
log_analysis looks through the past 7 days’ worth of logs.
HOWEVER, even if -d is set, log_analysis doesn’t actually show
these logs unless -b is specified or the corresponding variable
real_mode_backlogs is set.
NOTE: The primary feature of log_analysis is its reporting
capability. Using it for continuous monitoring makes sense if you
want a single config for reporting and for continuous monitoring.
If you just want continuous monitoring then you may be better off
with some of the other software out there, such as swatch(1).
-s suppress other commands
Usually, log_analysis runs assorted commands that show system state
(e.g. w, df -k). With this option, those commands are not run. See
commands_to_run in "VARIABLES" for the list of extra commands. The
suppress_commands variable does the same thing as this option.
-S suppress output footer
Usually, log_analysis will include its version number, the time it
spent running, and its arguments at the end of the output. This
option suppresses that output. The suppress_footer variable does
the same thing as this option.
-t forced_type
log_analysis usually determines the type of logfiles by looking at
the per-type log_filenames extension. This option and the
type_force variable let you bypass that check.
-U unknowns-only
Output logfile unknowns to stdout and exit. If unknownsdir exists,
also wipe it and then write out raw unknown lines to files in
unknownsdir. This option exists to make writing custom
rules easier.
-u unknownsdir
Use unknownsdir as the unknownsdir. If unknownsdir already exists,
and contains files, its files will be used as the input for
log_analysis regardless of any other command line options. If -U
is also specified, after all processing unknownsdir will be wiped
out and its files rewritten with the current unknowns. This is
useful for writing your own configs.
-v version
Output version and exit.
required-files
If files are specified on the command line, log_analysis ignores
its built-in list of optional and required log files, and processes
the files on the command line. If one of the files doesn’t exist,
it’s a fatal error.
CONFIG FILE
The script has an embedded config file. It will also read various
external config files if they exist; see "FILES" for a list. Later
directives (from later in the file or from a file read later) override
earlier directives.
You can make comments with ’#’ at the beginning of a line. If you want
a ’#’ or ’=’ at the beginning of a line, you usually need to quote it
with backslash.
Some directives take a "block" as argument. A block is a collection of
lines that ends with a line that is empty or only contains whitespace.
’#’ at the beginning of a line still comments out the line. Leading
whitespace on a line is ignored.
Before the config is parsed, it is passed through a preprocessor
inspired by the aide(1) preprocessor.
Pattern directives
These directives describe your logs, and are the main point of this
program. The basic idea here is that you first declare what logtype
you are working with, and then you specify a bunch of perl patterns
that describe different kinds of log messages, and that save parts of
the message. For each perl pattern, you specify one or more
destinations that describe what you want done with it.
logtype: type
Future patterns should be applied to this logtype (ie. sulog,
syslog, wtmp.) Example:
logtype: syslog
pattern: pattern
pattern is a perl regex (see perlre(1)) that implicitly starts with
^ (beginning of the line) and implicitly ends with \s*$ (optional
whitespace and the end of the line.) This should only be issued
after a logtype: has been issued in the same config file. Wildcard
parts of the pattern should be surrounded with parentheses, to save
these parts for later use in the format:. Note that there are some
tokens with special meanings that can be used here in the format
$pat{something}, ie. $pat{ip}, $pat{file}, etc. (see "pat" for
details, and run log_analysis -I pats for the current list).
Examples:
pattern: popper: Stats: ($pat{mail_user}) (\d+) (\d+) (\d+) (\d+)
pattern: login: LOGIN ON ($pat{file}) BY ($pat{user})
The order of precedence for patterns is undefined, except that
user-defined patterns always have precedence over the patterns of
the internal config.
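The implicit anchoring and the $pat{...} tokens can be illustrated with
a small Python sketch. It is not how log_analysis itself (which uses
perl regexes) is implemented: the PAT table holds hypothetical
stand-ins for the real subpatterns, the non-capturing group around the
pattern is an assumption, and Python’s re syntax differs from perl’s in
some details.

```python
import re

# Hypothetical subpatterns standing in for the real $pat{...} table.
PAT = {"file": r"[\w/.\-]+", "user": r"[\w.\-]+"}


def compile_pattern(pattern: str) -> re.Pattern:
    """Expand $pat{name} tokens and add the implicit ^ and \\s*$ anchors."""
    expanded = re.sub(r"\$pat\{(\w+)\}", lambda m: PAT[m.group(1)], pattern)
    return re.compile(r"^(?:" + expanded + r")\s*$")


rule = compile_pattern(r"login: LOGIN ON ($pat{file}) BY ($pat{user})")

# Trailing whitespace is absorbed by the implicit \s*$.
print(rule.match("login: LOGIN ON tty1 BY morty   ").groups())
# ('tty1', 'morty')

# The implicit ^ means the pattern must match from the start of the line.
print(rule.match("sshd: login: LOGIN ON tty1 BY morty"))
# None
```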
format: format
format is treated as a string that contains the useful information
from a pattern. Note that it should not actually be quoted. A
format is mandatory for category destinations, but should not be
used with SKIP or LAST destinations.
For example, if we had a pattern that was login: LOGIN ON
($pat{file}) BY ($pat{user}), we would probably just want $2, so we
might say:
format: $2
Similarly, if we had a pattern that was kernel: deny (\d+) packets
from ($pat{ip}) to ($pat{ip}), we might want to say:
format: $2 => $3
use_sprintf
use_sprintf is optional. If this directive is present for a given
format, then instead of the format being treated as a string, it is
treated as the arguments for sprintf(3). For example, if you have
a source IP address in $2 and a destination IP address in $3, you
could just have format as $2 => $3, but you would have things lining
up better if you did this:
format: "%-15s => $3", $2
use_sprintf
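The effect of the padding can be seen in a short Python sketch, whose
% operator follows the same conversion rules as sprintf(3) for %-15s.
The addresses are made up, and the helper names plain and aligned are
hypothetical.

```python
def plain(src: str, dst: str) -> str:
    """Equivalent of 'format: $2 => $3' without use_sprintf."""
    return f"{src} => {dst}"


def aligned(src: str, dst: str) -> str:
    """Equivalent of 'format: "%-15s => $3", $2' with use_sprintf."""
    return "%-15s => %s" % (src, dst)


# Made-up (source, destination) pairs captured as $2 and $3.
pairs = [("10.0.0.1", "192.168.1.10"), ("172.16.254.200", "192.168.1.10")]

for src, dst in pairs:
    print(plain(src, dst))
for src, dst in pairs:
    print(aligned(src, dst))
```

In the second loop the source address is left-justified in a
15-character field, so the arrows line up regardless of address length.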
delete_if_unique
delete_if_unique is optional. This feature can be used when you
have multiple dests for one pattern, one of which is a regular
category and one of which is a UNIQUE with a filter. You want the
one that is a regular category to be deleted if the UNIQUE category
meets its filter, ie. because it’s a scan. See "UNIQUE
DESTINATION" for more info.
count: count
count is optional. The default is that a log line that matches a
pattern causes the category to increment by 1. But sometimes, a
single log line corresponds to multiple events, ie. if you have a
log message of the form "5 packets denied by firewall" or "last
message repeated 3 times", you can extract the event count to
count. For example, if you’re using the pattern kernel: deny (\d+)
packets from ($pat{ip}) to ($pat{ip}), you might say:
count: $1
color: colors
Space-separated list of colors to display this message in when in
real-mode or gui-mode. For a list of colors that will work in both
modes, run log_analysis -I colors. Note that "bell" is among the
available colors, because it didn’t fit anywhere else. See the
colors entry for more info.
NOTE: if multiple dest configs with conflicting color settings
result in delivery to the same line in gui mode, the result is
currently undefined. There is only one line to be displayed, after
all.
description: description_text
This is a simple text description of the event, to explain the
problem to your operators. It can be accessed via gui mode. The
note above by color applies.
do_action: action
Run "action" (described elsewhere in the config with the "action:"
keyword) if this event is seen in real mode or gui mode.
priority: priority
Assign priority priority to action. Currently, the only priority
that does anything is "IGNORE". It can be used to ignore events.
dest: dest
This describes what you want done with the data in a pattern. If
dest is the special token SKIP the data is discarded. If dest is
the special token LAST, the data is assumed to be of the form "last
message repeated N times", and we pretend as though the last
message we saw occurred, using count as a multiplier. If dest
starts with the special token UNIQUE, we do special "unique"
handling, which is covered in "UNIQUE DESTINATION". If dest starts
with the special token CATEGORY or is any other string, it is
treated as a category that the pattern data should be saved to.
Ie. if pattern was login: LOGIN ON ($pat{file}) BY ($pat{user}),
and format was $2, then one might set dest to login: successful
local login. You must have a format defined before the dest.
You can have multiple dest directives for a single pattern, if all
of the dests are category destinations. Each one needs its own
format. Similarly, if you set count or use_sprintf, they are tied
to the particular dest you set them with.
Note that dest "closes" the description of a destination, so you
need to have any other related directives (ie. format, count,
use_sprintf, delete_if_unique) before the dest directive. This
ordering is necessary to avoid ambiguity in the multiple-
destination case.
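How pattern, format, count, and a category dest work together can be
sketched in Python. This is a simplified model, not the program’s own
code: the IP regex is a hypothetical stand-in for $pat{ip}, the
category name "kernel: denied packets" is invented for the example, and
the lines are assumed to have had their date and nodename stripped
already.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for $pat{ip}; the real subpattern may differ.
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
RULE = re.compile(
    r"^kernel: deny (\d+) packets from (" + IP + r") to (" + IP + r")\s*$"
)


def tally(lines, category="kernel: denied packets"):
    """Accumulate per-item counts in a category, as a category dest would."""
    categories = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = RULE.match(line)
        if m is None:
            continue  # in the real tool this line would be an unknown
        item = f"{m.group(2)} => {m.group(3)}"          # format: $2 => $3
        categories[category][item] += int(m.group(1))   # count: $1
    return categories


logs = [
    "kernel: deny 5 packets from 10.0.0.1 to 10.0.0.2",
    "kernel: deny 2 packets from 10.0.0.1 to 10.0.0.2",
    "kernel: deny 1 packets from 10.0.0.9 to 10.0.0.2",
]
for item, count in tally(logs)["kernel: denied packets"].items():
    print(count, item)
```

Without the count directive each matching line would add 1, so the
first item would total 2 rather than 7.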
Event directives
You can configure what happens for incoming events based on certain
criteria. Currently, those criteria are a simple string match of one
or more of the category, data, or hostname. So, for example, you can
ignore all messages from "roguehost", or color "user logged in"
messages for a certain user in bright red. Here are the useful
directives:
event:
Starts a new event config.
match category: value
match data: value
match hostname: value
This event config applies when the "category" is "value", or the
"data" is value, or the "hostname" is "value". If multiple match
lines are supplied, they are ANDed together.
color: color
description: description_text
do_action: action
priority: priority
color, description, do_action, and priority work the same way as
they do in a "dest" config or in a "category" config.
If "event", "dest", and "category" configs all apply to a given
event, then "event" has highest precedence, followed by "dest",
followed by "category".
Category directives
Several patterns can lead to the same category, so category-specific
directives are associated with the category, not with a pattern. Here
are the category directives:
category: category
Specifies which category subsequent directives will define.
filter: filter commands
By default, log_analysis will output all the data it finds in a
category. Filters let you specify, say, that only the top 10 items
should be output, or that only the items that occurred fewer than 5
times should be output. If a category has data, but none of the
data meet the filter rules, then the category will be completely
skipped. See "FILTERS" for more info.
sort: sorting keywords
Specifies how this category should be sorted in the output.
Examples are "funky", "string", "value", "reverse value", etc. The
default is "funky". See "SORTING" for more info.
derive: derive commands
The usual way to populate categories is via the pattern config.
But sometimes, you want to combine two or more elemental categories
to make a new category. Any categories derived in this manner may
not be a destination for simple patterns.
There are currently three subcommands for this (the quotes are
literal):
"category1" add "category2"
"category1" subtract "category2"
These do what you expect: take the values for the items in
category2 and add or subtract them from the values for the
items in category1. Any item defined in either category
will be in the new category. Subtract can cause the values
in the new category to be negative or 0.
"category1" remove "category2"
The new category will contain items in category1 that are
not in category2. This is very different from subtract.
Example: if category1 contains A with a value of 2 and B
with a value of 2, while category2 contains A with a value
of 1 and C with a value of 1, ’"category1" subtract
"category2"’ will contain A with a value of 1, B with a
value of 2, and C with a value of -1, while ’"category1"
remove "category2"’ will only contain B with a value of 2.
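The three derive subcommands can be modeled with plain dictionaries.
The Python sketch below reproduces the example above; the function
names are hypothetical, and treating an item missing from one category
as having value 0 is an assumption consistent with that example.

```python
def derive_add(cat1: dict, cat2: dict) -> dict:
    """Items from either category; values are summed."""
    keys = cat1.keys() | cat2.keys()
    return {k: cat1.get(k, 0) + cat2.get(k, 0) for k in keys}


def derive_subtract(cat1: dict, cat2: dict) -> dict:
    """Items from either category; cat2's values are subtracted."""
    keys = cat1.keys() | cat2.keys()
    return {k: cat1.get(k, 0) - cat2.get(k, 0) for k in keys}


def derive_remove(cat1: dict, cat2: dict) -> dict:
    """Only the items of cat1 that do not appear in cat2 at all."""
    return {k: v for k, v in cat1.items() if k not in cat2}


category1 = {"A": 2, "B": 2}
category2 = {"A": 1, "C": 1}

print(sorted(derive_subtract(category1, category2).items()))
# [('A', 1), ('B', 2), ('C', -1)]
print(derive_remove(category1, category2))
# {'B': 2}
```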
color: color
description: description_text
do_action: action
priority: priority
color, description, do_action, and priority work the same way as
they do in a "dest" config or in an "event" config.
If "event", "dest", and "category" configs all apply to a given
event, then "event" has highest precedence, followed by "dest",
followed by "category".
Action directives
In real mode and in gui mode, sometimes you want an "action" (like
paging someone) to automatically happen when a particular message is
seen. And in gui mode, you might want to run a command on a message
interactively (ie. to telnet or ssh into the host it came from.) The
directives to do that (inspired by swatch(1)) are:
action: action_name
Starts defining a new action named action_name.
command: command
The command to run for the current action. command uses the same
tags as real_mode_output_format.
WARNING: you can potentially shoot yourself in the foot by passing
data that has not been sanitized to a command on your system. Be
careful!
window: title
Performing the action will require creating a window using title as
the title. The title will be passed to window_command as the "%t"
tag. title itself uses the same tags as real_mode_output_format.
This only makes sense for gui mode.
WARNING: you can potentially shoot yourself in the foot by passing
data that has not been sanitized to a command on your system. Be
careful!
use_pipe:
The data in the event will be sent to the command via standard
input. The format used will be that specified by the
default_action_format variable, unless overridden locally by the
action_format: directive. These formats allow the same tags as
real_mode_output_format.
action_format: format
See use_pipe above.
throttle: throttle_time
Automatically-triggered actions can potentially result in a slew of
events. The "throttle" option lets you specify a minimum amount of
time before the action should recur with this event. The time can
be specified as seconds, as minutes:seconds, or as
hours:minutes:seconds.
Throttles do not apply to actions and logins that are explicitly
invoked via the GUI.
By default, the throttle is triggered on unique category and data.
That is, if the event was category "user logged in" and the data
was "morty", then the throttle will keep "user logged in", "morty"
events from causing the action to run again, but won’t stop "user
logged in", "esther" or "no such user", "morty" events from
triggering the action. This default is set with the
default_throttle_format variable, which defaults to "%c\n%d". It
can be overridden on a per-action basis with the throttle_format:
directive, which takes the same tags as real_mode_output_format.
If you want the throttle to be global to the action (say, a pager
action), set throttle_format to a simple scalar value (like 1).
throttle_format: format
See throttle: above.
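The throttle behavior described under throttle: can be sketched in
Python. This is a model of the documented behavior, not the actual
implementation: the class and function names are hypothetical, only the
%c (category) and %d (data) tags are handled, and it assumes that a
suppressed event does not restart the timer.

```python
import re


def throttle_seconds(spec: str) -> int:
    """Convert seconds, minutes:seconds, or hours:minutes:seconds to seconds."""
    total = 0
    for field in spec.split(":"):
        total = total * 60 + int(field)
    return total


class Throttle:
    def __init__(self, spec: str, key_format: str = "%c\n%d"):
        self.interval = throttle_seconds(spec)
        self.key_format = key_format
        self.last_run = {}  # throttle key -> time the action last ran

    def allow(self, category: str, data: str, now: float) -> bool:
        """Return True if the action may run for this event at time now."""
        tags = {"c": category, "d": data}
        key = re.sub(r"%([cd])", lambda m: tags[m.group(1)], self.key_format)
        last = self.last_run.get(key)
        if last is not None and now - last < self.interval:
            return False
        self.last_run[key] = now
        return True


per_event = Throttle("1:30")
print(per_event.allow("user logged in", "morty", now=0))    # True
print(per_event.allow("user logged in", "morty", now=60))   # False
print(per_event.allow("user logged in", "esther", now=60))  # True

# A constant throttle_format makes the throttle global to the action.
pager = Throttle("10", key_format="1")
print(pager.allow("user logged in", "morty", now=0))        # True
print(pager.allow("no such user", "esther", now=5))         # False
```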
Other directives
config_version version-number
Declare that the config is compatible with version version-number.
This is for version-control purposes. Every config file should
have one of these. You can scan your config files’ config versions
with -I config_versions.
file_version revision-information
Your own version control information. revision-information can be
arbitrary text. You can scan your config files’ file versions
with -I config_versions.
include file
Read in configuration from file. Dies if file doesn’t exist. file
is subject to usual tag substitutions; see "TAG SUBSTITUTION".
include_if_exists file
Just like include, but doesn’t die if the file doesn’t exist.
include_dir dir
Read in all files in dir, and include them. Die if the directory
doesn’t exist, or if a file in the directory isn’t readable. dir
is subject to the usual tag substitutions; see "TAG SUBSTITUTION".
Any filenames that match a pattern in filename_ignore_patterns will
be skipped.
include_dir_if_exists dir
Just like include_dir, but doesn’t die if the directory doesn’t
exist. Does still die if any of the files in dir isn’t readable.
block_comment
Throws out the block immediately after it.
set var varname =value
Set scalar variable varname to value value. If the variable
already exists, this will overwrite it.
See "VARIABLES" for the list of variables you can play with.
add var varname =value
If scalar variable varname already exists, append value to the end
of its current value. If it doesn’t yet exist, create it and set
it to value.
See "VARIABLES" for the list of variables you can play with.
prepend var varname =value
If scalar variable varname already exists, prepend value to the
current value. If it doesn’t yet exist, create it and set it to
value.
See "VARIABLES" for the list of variables you can play with.
set arr arrname =
Read in the block that follows this declaration, make the lines
into an array, and set the array variable arrname to that array.
See "VARIABLES" for the list of variables you can play with.
add arr arrname =
Read in the block that follows this declaration, make the lines
into an array, and append that array to the array named arrname.
See "VARIABLES" for the list of variables you can play with.
prepend arr arrname =
Read in the block that follows this declaration, make the lines
into an array, and prepend that array to the array named arrname.
See "VARIABLES" for the list of variables you can play with.
remove arr arrname =
Read in the block that follows this declaration, and for each line,
look for and delete that line from array arrname. If one of these
lines cannot be found, the result is a warning, not death.
See "VARIABLES" for the list of variables you can play with.
local OTHER DIRECTIVE
Putting "local" in front of another directive means that this
directive should be saved when gui_mode_config_savelocal is in
effect.
nowarn OTHER DIRECTIVE
Putting "nowarn" in front of another directive means that this
directive should not generate a config warning, e.g. for redefining
a category filter.
VARIABLES
Some variables are scalar, which means they are strings or numbers.
Some variables are arrays, which are lists of scalars.
Some variables are mandatory, which means they must be defined
somewhere in one of the config files, while some variables are
optional.
Some variables are global, while some are per-log-type extensions.
Some examples of per-log-type extensions are date_pattern and filenames.
Extensions should actually appear in the format "TYPE_EXTENSION", ie.
date_pattern would actually appear as syslog_date_pattern for the
syslog log-type and sulog_date_pattern for sulog.
To see examples of many of the possibilities, as well as the default
values, run log_analysis -I internal_config.
PER-LOG-TYPE VARIABLE EXTENSIONS
filenames
This mandatory extension is an array of file basenames that apply
to the log type. For example, if you wanted /var/adm/messages.1 to
be processed by the syslog rules, you might add messages to
syslog_filenames.
open_command
Some log files (ie. wtmp log types) are in a binary format that
needs to be interpreted by external commands. This optional scalar
extension specifies a command to be run to interpret the file. The
command is subject to the usual tag substitutions (see "TAG
SUBSTITUTIONS"), plus the %f tag maps to the file. For example,
the wtmp log type defines wtmp_open_command as "last -f %f". If
both decompression_rules and open_command apply to a given file,
the intermediate data will be stored in a temp file unless
pipe_decompress_to_open is used. See "pipe_decompress_to_open" for
more info.
pipe_decompress_to_open
If both decompression_rules and open_command apply to a given file,
the intermediate data will be stored in a temporary file by default
to avoid problems with some commands that can’t handle input from a
pipe. If this optional scalar extension is set to 1 (or any
"true" value), then instead, the output of the decompression rule
will be piped to the open command, and the open command’s %f tag
will be mapped to "-".
open_command_is_continuous
If an open_command has been specified and the command is the sort
that never exits (ie. tcpdump or the like) you should set this to
let log_analysis know what to expect. Such commands should only
ever be used in real mode or gui mode.
pre_date_hook
This optional extension is an array of arbitrary perl commands that
are run for each log line, before the date processing (or any other
processing) is done.
date_pattern
This mandatory extension is a scalar that contains a pattern with
at least one parenthesized subpattern. Before any rules are
applied to a log line, the engine strips off the date pattern. If
the engine is only looking at one day (ie. the default), it takes
the part of the string that matched the parenthesized subpattern,
and if it isn’t equal to the right date, it skips the line. The
date_format extension (next) describes what the date should look
like.
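The single-day date check can be sketched in Python. This is an
illustration only: DATE_PATTERN is a hypothetical syslog-style pattern
(not the one shipped in the internal config), the function names are
invented, and the wanted date is rendered the way a strftime(3) format
of "%b %e" would render it, with English month abbreviations assumed.

```python
import re
from datetime import date

# Hypothetical syslog-style date_pattern; the parenthesized part is the day.
DATE_PATTERN = re.compile(r"^(\w{3}\s+\d{1,2})\s+\d\d:\d\d:\d\d\s+")

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_b_e(day: date) -> str:
    """Render a date like "%b %e": month name, day padded with a space."""
    return f"{MONTHS[day.month - 1]} {day.day:2d}"


def strip_date(line: str, wanted: date):
    """Return the line without its date, or None if it should be skipped."""
    m = DATE_PATTERN.match(line)
    if m is None or m.group(1) != format_b_e(wanted):
        return None
    return line[m.end():]


print(strip_date("Mar  2 04:05:06 host sshd[1]: hello", date(2001, 3, 2)))
# host sshd[1]: hello
print(strip_date("Mar  1 04:05:06 host sshd[1]: hello", date(2001, 3, 2)))
# None
```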
date_format
This mandatory extension is a scalar that describes the date using
the same format as strftime(3). For example, syslog_date_format is
"%b %e".
nodename_pattern
This optional extension is a pattern with at least one
parenthesized subpattern. If it exists, then after the
date_pattern is stripped from the line, this pattern is stripped,
and the part that matched the subpattern is compared to the
nodename. If they’re not equal, then the relevant counter for the
category named by the other_host_message variable is incremented.
Note that all nodenames are subject to having the local domain
stripped from them; see domain and leave_FQDNs_alone for details.
pre_skip_list_hook
This optional extension is an array of perl commands to be run
after the nodename check, just before the skip_list check.
skip_list
This optional extension is obsolete and deprecated, but still works
for backwards compatibility.
raw_rules
This optional extension is obsolete and deprecated, but still works
for backwards compatibility.
GLOBAL VARIABLES
These variables are all globals.
log_type_list
This variable is a mandatory global array that contains the list of
all known log-types, ie. syslog, sulog, wtmpx, etc.
pat This variable is a mandatory global array that contains a list of
subpattern names followed by a comma, optional whitespace, and a
perl regex that represents that subpattern. Some of the predefined
patterns include "ip", "zone", "user", "mail_user", etc. Run
log_analysis -I pats for a list.
host_pat
file_pat
ip_pat
mail_user_pat
user_pat
word_pat
zone_pat
Legacy variables. Please don’t use them.
other_host_message
output_message_one_day
output_message_all_days
output_message_all_days_in_range
Assorted mandatory scalars that are used for human-readable output.
other_host_message defaults to "Other hosts syslogging to us",
output_message_one_day defaults to "Logs for %n on %d",
output_message_all_days defaults to "All logs for %n as of %d", and
output_message_all_days_in_range defaults to "All logs for %n for
%s through %e".
date_format
This variable is a mandatory global scalar that describes how you
want the date printed in the output. Uses the format of
strftime(3). Note that you probably shouldn’t use characters that
you wouldn’t want in a filename (ie. whitespace or ’/’) if you want
to use the %d tag for output_file.
output_file
Equivalent to -o file. This variable is an optional global scalar
that lists a filename that will be output to instead of to standard
output. Works with mail_address (if specified.) Note that this
variable is subject to the usual tag substitutions (see "TAG
SUBSTITUTIONS"), plus you can use the %d tag for the date, so you
can set it to something like "/var/log_analysis/archive/%n-%d".
See output_file_and_stdout.
output_file_and_stdout
Equivalent to -O. This variable is an optional global scalar that
changes the behavior of -o or output_file. By default, -o or
output_file causes output to go only to the named file.
With this variable, output also goes to standard output. Note:
this does not currently work with -m.
nodename
This variable is an optional global scalar that is used in a bunch
of places: in checking to see whether a message from syslog (or
other log type that defines nodename_pattern) originated on this
host; in reading in various default config files; etc. If left
unset in the config, its value is set from the output of uname(2).
Its value is used to set the n tag. Note that unless
leave_FQDNs_alone is set, log_analysis will try to strip the local
domain name from nodename.
osname
osrelease
These two optional global scalars default to the equivalent of
uname -s and uname -r, respectively. They are only used for
reading in default config files. Their values set the s and r
tags, respectively.
domain
This variable is an optional global scalar. If you don’t set it,
log_analysis will try to set it by looking for a domain line in
/etc/resolv.conf. If log_analysis has domain set, it will attempt
to strip away the local domain name from all nodenames it
encounters, unless leave_FQDNs_alone is set. See leave_FQDNs_alone
for details.
leave_FQDNs_alone
This variable is an optional global scalar. By default, if
log_analysis has domain set (either explicitly or implicitly), it
will attempt to strip away the domain name in domain, or
"localdomain", from all nodenames it encounters. If you set this
to 1, or to some other true value, log_analysis will not attempt to
strip the domain name in domain.
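The stripping described for domain and leave_FQDNs_alone can be
modeled in Python. This is a sketch under assumptions rather than the
program’s own logic: the function name is hypothetical, matching is
assumed to be a case-sensitive suffix match, and leave_FQDNs_alone is
assumed to disable all stripping.

```python
def strip_domain(nodename: str, domain: str,
                 leave_fqdns_alone: bool = False) -> str:
    """Strip the local domain (or "localdomain") from a nodename."""
    if leave_fqdns_alone:
        return nodename
    for suffix in (domain, "localdomain"):
        if suffix and nodename.endswith("." + suffix):
            return nodename[: -len(suffix) - 1]
    return nodename


print(strip_domain("web1.example.com", "example.com"))        # web1
print(strip_domain("web1.localdomain", "example.com"))        # web1
print(strip_domain("web1.other.org", "example.com"))          # web1.other.org
print(strip_domain("web1.example.com", "example.com", True))  # web1.example.com
```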
PATH
This variable is an optional global scalar that sets the PATH
environment variable. This doesn’t help the initial setting of
nodename, osname, or osrelease, which are set from uname(2).
umask
This variable is an optional global scalar that sets the umask.
See umask(2).
priority
This variable is an optional global scalar that sets the priority,
or "niceness." See nice(1). Setting this to zero means run
unchanged from the current niceness. Setting this negative is a
bad idea unless you really know what you’re doing, and is
forbidden to non-root users.
decompression_rules
This variable is an optional global array of rules to decompress
compressed files, in the format: compression-extension, comma,
space, command to decompress to stdout. The command is subject to
the usual tag substitutions (see "TAG SUBSTITUTIONS"), plus %f
stands for the filename. For example, the rule for gzipped files
is:
"gz, gzip -dc %f"
The default rules support: .gz .Z .bz2
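As a rough illustration of the rule format described above, the
following Python sketch splits a rule at the first comma-space and
fills in %f. The function names are hypothetical, only the %f tag is
handled, and shell quoting of the filename is deliberately ignored.

```python
def parse_rule(rule):
    """Split 'extension, command' at the first comma-space pair."""
    extension, command = rule.split(", ", 1)
    return extension, command


def decompress_command(filename, rules):
    """Return the command for filename, or None if no extension matches."""
    for rule in rules:
        extension, command = parse_rule(rule)
        if filename.endswith("." + extension):
            # %f stands for the filename; other tags are left untouched here.
            return command.replace("%f", filename)
    return None


rules = ["gz, gzip -dc %f"]
print(decompress_command("/var/log/messages.1.gz", rules))
```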
If both decompression_rules and open_command apply to a given file,
the default is to use a temp file for the intermediate results
unless pipe_decompress_to_open is used. See
"pipe_decompress_to_open" for more info.
pgp_rules
This variable is an optional global array of rules for PGP
encrypting messages, in the format: PGP type (user defined), comma,
space, command to PGP encrypt stdin to stdout. The command is
subject to the usual tag substitutions, plus %m stands for the
email address. For use with the "-p" and "-m" options. For
example, the rule for gnupg is:
"g, gpg -aer %m 2>&1"
Internally defined rules are "g" for "gnupg", "2" for PGP 2.x, and
"5" for PGP 5.x.
WARNING: The user who runs log_analysis must have already imported
the mail destination’s key for this to work. Make sure to test
this before you put it in a cronjob.
filename_ignore_patterns
This variable is an optional global array of patterns that describe
filenames to be skipped in an include_dir/include_dir_if_exists
context, such as emacs backup files (".*~") or vim swap files
("\..*\.swp"). Only the file component of the path is examined,
not the directory component. Patterns implicitly begin with ^ and
implicitly end with $.
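The matching rule can be sketched in Python as below. This assumes
Python's re syntax is close enough to the Perl patterns for these
simple cases; is_ignored is a hypothetical name. re.fullmatch supplies
the implicit ^ and $, and only the basename is tested.

```python
import os
import re


def is_ignored(path, patterns):
    """True if the file component of path fully matches any pattern."""
    name = os.path.basename(path)
    return any(re.fullmatch(pattern, name) for pattern in patterns)


patterns = [r".*~", r"\..*\.swp"]
print(is_ignored("/etc/log_analysis.d/local.conf~", patterns))
```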
mail_address
This variable is an optional global scalar that can consist of an
email address. If set, the output of the script will be mailed to
the address it is set to. The -m option does the same thing, and
overrides this.
mail_command
This variable is an optional global scalar that is the command used
to send mail if -m is used or mail_address is set. The -M option
does the same thing, and overrides this. This variable is subject
to the usual tag substitutions, plus %m stands for mail_address and
%o stands for the relevant output message. The default is:
"Mail -s '%o' %m"
memory_size_command
This variable is an optional global scalar that is the command used
to determine the process’ memory size. Subject to the usual tag
substitutions, plus %p stands for the PID (process ID) in question.
If set, the command is run at the end of the report, and the output
is included in the footer.
The default value for Linux is:
"ps -p %p -o vsz | tail -n +2"
The default value for Solaris/SunOS is:
"ps -p %p -o vsz | tail -n +2"
optional_log_files
This variable is an optional array of file globs that are to be
processed. Note that, unlike required_log_files, these are globs
rather than literal filenames, although literal filenames will also
work. [Globs are filenames with wildcards, ie.
/var/adm/messages*.]
See -r for an issue specific to real mode and gui mode.
commands_to_run
This variable is an optional array of commands that are also
supposed to be run to give a snapshot of the system state. These
The defaults are currently: w, df -k, and cat /etc/dumpdates.
rcs_command
This variable is an optional global scalar that is the command used
to do RCS check-in on files (i.e. when
gui_mode_config_save_does_rcs is set). This variable is subject to
the usual tag substitutions, plus %f stands for the file in
question. The default is intended for RCS, although SCCS, CVS,
SVN, or other systems could be substituted. The default is:
"ci -q -l -t-%f -m'automatic check-in' %f"
suppress_commands
If set, the commands in commands_to_run are NOT run during report
mode. This is equivalent to the -s option.
suppress_footer
If set, the various report mode footers are not displayed. This is
equivalent to the -S option.
ignore_categories
This variable is an optional array of categories that you don’t
want to see. Rather than try to remove all the rules for these
categories, you can just list them here.
priority_categories
This variable is an optional array of categories that will be
listed first in the output.
days_ago
This optional scalar variable is the config equivalent of the -d
option.
process_all_nodenames
This optional scalar variable is the config equivalent of the -N
option.
type_force
This optional scalar is the config equivalent of the -t option.
allow_nodenames
This variable is an optional array of nodenames that can log to
this host. Usually, logs labelled as being from another host will
not be analyzed, and each such line will be listed in a special
category; if you choose to allow some nodenames (or if you choose to
process all nodenames by setting -N or setting
process_all_nodenames) then these log messages will also be
processed.
real_mode
This variable is the config equivalent of the -r option; see the -r
option for more details.
real_mode_output_format
This is a required global scalar. It describes the per-output
format for real mode and gui mode. It is subject to normal tag
substitution (see "TAG SUBSTITUTIONS"); in addition to the normal
tags, "%c" is replaced with the category, "%#" is replaced with the
count, "%d" is replaced with the formatted data, "%h" is replaced
with the nodename of the message, and "%R" is the raw, original log
line without the trailing newline. If keep_all_raw_logs is set,
you also get "%A" for all the raw log lines. WARNING: you usually
want "%h" (nodename of the message), not "%n" (nodename of the host
you’re running on, which is one of the default tag substitutions.)
Defaults to "%c: (loghost %n, from host %h)\n%-10# %d\n\n".
real_mode_sleep_interval
This optional global scalar is for use with real mode and gui mode.
In these modes, log_analysis reads log files for more data, sleeps
for a little while, and then reads again. The sleep interval
controls how long log_analysis sleeps (in seconds). It defaults to
1.
real_mode_check_interval
This optional global scalar is for use with real mode and gui mode.
In these modes, log_analysis sits in a loop reading from the log
files. Periodically, it wants to check if the log files have
rolled over or if newer log files have appeared. If at least this
long (in seconds) goes by since the last time we’ve checked, we
check again.
keep_all_raw_logs
This optional global scalar is a boolean for use with real mode and
gui mode. It enables a %A tag that contains all the raw logs for a
given entry. That is, if you have multiple log lines that contain
essentially the same data, only the first line shows up in %R, and
the rest are thrown out. This variable lets you keep them all. It
can eat up a lot of memory, so it’s disabled by default.
real_mode_backlogs
This optional global scalar is equivalent to -b.
colors
This variable is an optional global array for use with real mode
and gui mode. It defines the colors available on console, using
"name, string" pairs. The usual tag substitution rules apply to
the string, plus the special tag %a stands for octal character 007
(ASCII BEL) and %e stands for octal character 033 (ASCII ESC).
Some of the colors are actually mode changes (ie. "normal",
"inverse", "reverse", "blink", etc.) If you define any colors, you
should also define a "normal" color. Note that "bell" is among the
colors; it didn’t belong anywhere else. You can list colors with
log_analysis -I colors.
gui_mode
This variable is the config equivalent of the -g option; see the -g
option for more details. It is an optional scalar.
gui_mode_modifier
In gui mode, the default modifier to do things with the keyboard is
"alt", ie. "alt-q" to exit. This lets you change it. It is an
optional scalar.
report_mode_output_node_per_category
report_mode_combine_nodes
report_mode_combine_shows_nodes
report_mode_combine_is_partway
These are assorted options for dealing with output for multiple
node situations (ie. logservers.) They are all optional scalars.
See "LOGSERVER CONSIDERATIONS" for details.
window_command
In gui mode, if we need a window to run a command, say an action,
this will be the command that is used. The tags are the same as
real_mode_output_format, plus we have "%t" as the title and "%C" as
the command. It is an optional scalar.
login_action
This optional array lets you specify what action should be used to
log in to a given host in gui mode, overriding default_login_action.
Lines are in the format host, login_action.
default_login_action
This optional scalar specifies which login action should be used to
log in to hosts by default in gui mode.
default_throttle_format
See the throttle: directive in the action group.
default_action_format
See the use_pipe directive in the action group.
print_command
print_format
save_format
gui_mode_config_autosave
gui_mode_config_savelocal
gui_mode_config_save_does_rcs
gui_mode_config_file
gui_mode_print_all
gui_mode_save_all
gui_mode_save_events_file
These are for GUI use.
default_sort
This variable is an optional global scalar that describes how
certain things will be sorted. See "SORTING" for info on what this
can be set to. Defaults to funky.
default_filter
This variable is an optional global scalar that describes the
default category filter. See "FILTERS" for info on what this can
be set to.
PREPROCESSOR DIRECTIVES
NB: these get completely processed before all other directives, so they
don’t care about other syntax elements. Except as noted, these should
appear at the beginning of the line after optional whitespace.
@@end
End of config file.
@@define var val
Define var as value val. var should contain only alphanumerics and
underscores, and start with an alphanumeric. val may contain no
whitespace.
@@undef var
Undo any previous definition of var.
@@ifdef var
@@ifndef var
@@else
@@endif
If variable var is defined, even defined as a false value, the
lines after the @@ifdef are used, otherwise the lines are
effectively commented out. @@ifndef is the logical reverse.
@@ifdef and @@ifndef must be terminated by an @@endif. They may
contain an @@else section that works in the usual way.
@@ifhost name
@@ifnhost name
These are just like @@ifdef and @@ifndef above, except that they
test if the variable nodename is equal to the value supplied for
name.
@@ifos name
@@ifnos name
These are just like @@ifdef and @@ifndef above, except that they
test if the variable osname is equal to the value supplied for
name.
@@{var}
If this string appears anywhere on any line, then if var is a
defined variable, its value is substituted. If var is not a
defined variable, the string is left literally. Note that this
behaviour is different from that of aide(1).
@@warn message
Print out message as soon as the config is read.
@@error message
Print out message and exit as soon as the config is read.
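To make the directive semantics concrete, here is a minimal Python
sketch of a preprocessor for a subset of them (@@define, @@undef,
@@ifdef, @@ifndef, @@else, @@endif, @@end, and @@{var}). It is not the
program's implementation. The function name preprocess is
hypothetical, nested conditionals are assumed to behave in the usual
way, and @@{var} is only substituted on non-directive lines, which is a
simplification.

```python
import re

VAR_REF = re.compile(r"@@\{(\w+)\}")


def preprocess(lines, defines=None):
    """Return the lines that survive preprocessing, with @@{var} expanded."""
    defines = dict(defines or {})
    stack = []    # one bool per open @@ifdef/@@ifndef: is that branch active?
    output = []
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        active = all(stack)
        if directive in ("@@ifdef", "@@ifndef"):
            defined = words[1] in defines    # a false value still counts
            stack.append(defined if directive == "@@ifdef" else not defined)
        elif directive == "@@else":
            stack[-1] = not stack[-1]
        elif directive == "@@endif":
            stack.pop()
        elif not active:
            continue                          # effectively commented out
        elif directive == "@@end":
            break
        elif directive == "@@define":
            defines[words[1]] = words[2]
        elif directive == "@@undef":
            defines.pop(words[1], None)
        else:
            # Undefined variables are left literally, as described above.
            output.append(VAR_REF.sub(
                lambda m: defines.get(m.group(1), m.group(0)), line))
    return output


config = [
    "@@define loghost conan",
    "@@ifdef loghost",
    "host is @@{loghost}",
    "@@else",
    "no loghost",
    "@@endif",
    "@@undef loghost",
    "@@ifndef loghost",
    "still @@{loghost}",
    "@@endif",
    "@@end",
    "ignored",
]
print(preprocess(config))
```

Here the first conditional keeps "host is conan", and after @@undef
the reference @@{loghost} is left as literal text.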
SORTING
You can sort category items using several different criteria. You can
set the default_sort, and then on a per-category basis, you can use the
sort: keyword to control things even closer. If you don’t override it,
default_sort defaults to funky. Sorts stack, so you can use "reverse
string" or "reverse value". In theory, you can stack all of them, ie.
"reverse value reverse funky", but there is no guarantee that sorts are
stable.
The available sorts are:
string
Simple string "lexicographical" sort. Does not handle numbers
well.
numeric
Sorts numbers, including decimal numbers, correctly, but cannot
handle non-numeric characters, and cannot handle IPs correctly.
funky
Tries to do the right thing with mixed integers and strings.
Handles IP addresses correctly. It does not handle decimal numbers
correctly.
reverse
Reverses the current order. Can be used in conjunction with
another sort, ie. "reverse string".
value
Sorts by count (ascending) instead of by item.
none
Does no additional sorting.
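The difference between string and funky can be illustrated with a
"natural sort" key in Python. This is only an approximation of the
behaviour described above, not the actual funky algorithm; funky_key is
a hypothetical name.

```python
import re


def funky_key(item):
    """Split into digit and non-digit runs so integers compare numerically."""
    parts = re.split(r"(\d+)", item)
    # Tuples keep ints and strings apart so they never compare directly.
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


addresses = ["192.168.1.10", "192.168.1.9", "10.0.0.1"]
print(sorted(addresses))                  # plain string sort
print(sorted(addresses, key=funky_key))   # integer-aware sort
```

With this key, "192.168.1.9" sorts before "192.168.1.10", whereas the
plain string sort puts it after. The same key puts "1.5" before
"1.10", which shows why a sort of this kind mishandles decimal numbers.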
FILTERS
Sometimes, you don’t want to see all the information in a category,
just the top few items, or whatever. Filters let you do this. You can
set a default filter using default_filter (defaults to "none") or you
can set filters on a per-category basis using the filter: keyword.
Some commands you can use:
>= N
Only show items whose count is greater than or equal to N.
<= N
> N
< N
= N, == N
!= N, <> N, >< N
These are analogous to >=.
top N
top N%
top_strict N
top_strict N%
Only show those items whose count is in the top N or top N%. The
difference between top and top_strict is what happens when there’s
a tie to be in the top N. top will include all the items that tie,
even if this means there will be more than N. top_strict always
cuts off after N.
bottom N
bottom N%
bottom_strict N
bottom_strict N%
Analogous to top.
subfilter and subfilter
subfilter or subfilter
Lets you "and" or "or" two or more subfilters together (ie. "top 10
and >= 4").
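The tie handling of top versus top_strict can be sketched in Python as
follows. The function names are hypothetical, and treating "and" as the
intersection of the two subfilters is an assumption. Which tied item
top_strict keeps is arbitrary in this sketch.

```python
def top(counts, n, strict=False):
    """Return (item, count) pairs with the n highest counts.

    With strict=False, items tied with the n-th count are kept as well.
    """
    if n <= 0:
        return []
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    if strict or len(ranked) <= n:
        return ranked[:n]
    cutoff = ranked[n - 1][1]
    return [pair for pair in ranked if pair[1] >= cutoff]


def at_least(pairs, n):
    """The '>= N' filter: keep pairs whose count is at least n."""
    return [pair for pair in pairs if pair[1] >= n]


counts = {"a": 5, "b": 3, "c": 3, "d": 1}
print(top(counts, 2))                 # "top 2": b and c tie, both kept
print(top(counts, 2, strict=True))    # "top_strict 2": cut off after 2
print(at_least(top(counts, 2), 4))    # "top 2 and >= 4"
```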
UNIQUE DESTINATION
log_analysis has a relatively simple counting mechanism that is usually
effective. One exception is when you want to track how often one value
occurs in your log uniquely with another value. For example, suppose
you’re watching firewall logs, $1 is the source IP, $2 is the
destination IP, and you want to know if you’re being scanned. Tracking
counts of "$1 $2" requires you to manually count how many times $1
occurs. Tracking just "$1" doesn’t really tell you what you want,
because you don’t know if the source IP is really scanning a bunch of
different hosts, or just has a renegade process that’s banging away at
a single destination. What you want to track is how many times $1
occurs with a unique $2.
To do this sort of thing in a pattern config, set format: to value1,
value2 and set dest: to "UNIQUE category-name". In our example, we
might say:
format: $1, $2
dest: UNIQUE scans
The fields in format are not evaluated in a string context, and only
the last comma acts as a separator. So, if $3 contains the protocol
information, you might say this:
format: sprintf("%-15s %s", $1, $3), $2
dest: UNIQUE scans
When detecting scans in particular, it makes sense to specify an event
filter, ie.:
category: scans
filter: >= 5
Note that it’s often useful to specify multiple dests with a firewall
pattern, ie. one regular category dest, one UNIQUE dest with a filter
threshold to detect a scan. If so, you might want to add
delete_if_unique to the regular dest, so if it turns out you have a
scan, you don’t have to wade through lots of garbage. Ie.:
pattern: kernel: block from ($pat{ip}):($pat{port}) to ($pat{ip}):($pat{port})
format: $1 => $3:$4
delete_if_unique
dest: kernel block
format: $1, $3
dest: UNIQUE scans
category: scans
filter: >=5
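Conceptually, a UNIQUE dest counts distinct second values per first
value. The Python sketch below illustrates the idea with sets. It is
not the program's implementation, and unique_counts and the sample
addresses are made up for illustration.

```python
from collections import defaultdict


def unique_counts(events):
    """Map each value1 to the number of distinct value2s seen with it."""
    seen = defaultdict(set)
    for value1, value2 in events:
        seen[value1].add(value2)
    return {value1: len(value2s) for value1, value2s in seen.items()}


# (source IP, destination IP) pairs, as in "format: $1, $2".
events = [
    ("10.0.0.5", "192.168.1.1"),
    ("10.0.0.5", "192.168.1.2"),
    ("10.0.0.5", "192.168.1.3"),
    ("10.0.0.9", "192.168.1.1"),
    ("10.0.0.9", "192.168.1.1"),
    ("10.0.0.9", "192.168.1.1"),
]
scans = unique_counts(events)
# Equivalent of "filter: >= 3": only the host touching 3 distinct targets.
print({source: n for source, n in scans.items() if n >= 3})
```

The second source appears three times but always with the same
destination, so its unique count is 1 and the filter drops it.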
TAG SUBSTITUTIONS
A few items are subject to "tag substitutions". These are kind of like
printf’s "%" sequences: a sequence like "%n" gets replaced with the
nodename. You can optionally specify field widths, which default to
right-justified (ie. "%10n") or can be preceded with a "-" to make
them left-justified (ie. "%-10n"). Also, a few of the basic C-style
backslash sequences are understood (ie. \n for newline, \t for tab, \\
for backslash). Anything subject to tag substitutions will be listed
as such.
Here are the standard tag sequences:
%% literal %
%n nodename (ie. the output of uname -n.)
%r OS release (ie. the output of uname -r.)
%s OS name (ie. the output of uname -s.)
There are also other tag sequences that apply in special situations.
They are listed where they apply.
If you try to use an undefined sequence (ie. "%Z" or something else),
you’ll get an error.
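A minimal Python sketch of the expansion rules above follows. It is an
illustration under assumptions, not the program's code: expand is a
hypothetical name, backslash sequences are processed before tags, and
unknown backslash sequences are left as they are.

```python
import re

TAG = re.compile(r"%(-?)(\d*)(.)", re.DOTALL)
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def expand(template, tags):
    """Expand backslash sequences and %-tags in template."""
    def replace_tag(match):
        left, width, name = match.groups()
        if name == "%":
            value = "%"
        elif name in tags:
            value = str(tags[name])
        else:
            raise ValueError("undefined tag sequence: %" + name)
        if not width:
            return value
        # A leading "-" left-justifies; the default is right-justified.
        return value.ljust(int(width)) if left else value.rjust(int(width))

    text = re.sub(r"\\(.)",
                  lambda m: ESCAPES.get(m.group(1), m.group(0)),
                  template, flags=re.DOTALL)
    return TAG.sub(replace_tag, text)


tags = {"n": "conan", "s": "Linux", "r": "5.10"}
print(expand("[%10n] [%-10n] %s %r 100%%", tags))
```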
LOGSERVER CONSIDERATIONS
log_analysis defaults to single host operation. If you have a
logserver that allows logs from multiple hosts (ie. centralized
logging) then you potentially have two concerns: configuring what
hostnames to allow, and how to display multi-node logs in report mode.
By default, log_analysis will only allow logs from the nodename of the
logserver, so if you want to allow other nodes, you need to tell
log_analysis which hostnames it should allow logs from. Either set
allow_nodenames to a list of nodenames to allow logs from, or set
process_all_nodenames (AKA option -N) to accept everything. Another
useful variable here is leave_FQDNs_alone.
Once you’ve accepted multiple nodes, there are a number of ways
log_analysis can display them. Let’s say I received two "Accepted
publickey for morty from 192.168.1.1 port 50000 ssh2" events from
"red-sonja" and three from "conan". In the default mode, that would
look like this:
Logs found for other hosts. For host conan:
...
sshd: accepted publickey:
3 morty from 192.168.1.1
...
Logs found for other hosts. For host red-sonja:
...
sshd: accepted publickey:
2 morty from 192.168.1.1
...
You can get the categories listed together more compactly by setting
report_mode_output_node_per_category. Ie:
...
sshd: accepted publickey: (host conan)
3 morty from 192.168.1.1
sshd: accepted publickey: (host red-sonja)
2 morty from 192.168.1.1
...
If you set report_mode_combine_nodes, the category will be combined
into a single category. Ie.:
...
sshd: accepted publickey:
5 morty from 192.168.1.1
...
If you set both report_mode_combine_nodes and
report_mode_combine_shows_nodes, you get the combined messages along
with a list of applicable hostnames. Ie.:
...
sshd: accepted publickey:
5 morty from 192.168.1.1 (conan red-sonja)
...
If you set both report_mode_combine_nodes and
report_mode_combine_is_partway, the messages are listed like so:
...
sshd: accepted publickey:
3 morty from 192.168.1.1 (conan)
2 morty from 192.168.1.1 (red-sonja)
...
Other combinations of the variables
report_mode_output_node_per_category, report_mode_combine_nodes,
report_mode_combine_shows_nodes, and report_mode_combine_is_partway
produce undefined results.
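The combining modes shown above amount to different ways of
aggregating per-host counts. The Python sketch below reproduces the
item lines from the examples. The function names and the data layout
are hypothetical and are not taken from the program.

```python
from collections import defaultdict

# (host, item) -> count for the "sshd: accepted publickey" category.
counts = {
    ("conan", "morty from 192.168.1.1"): 3,
    ("red-sonja", "morty from 192.168.1.1"): 2,
}


def combine_nodes(counts, shows_nodes=False):
    """Sum counts across hosts; optionally append the contributing hosts."""
    totals = defaultdict(int)
    hosts = defaultdict(set)
    for (host, item), count in counts.items():
        totals[item] += count
        hosts[item].add(host)
    lines = []
    for item in sorted(totals):
        line = f"{totals[item]} {item}"
        if shows_nodes:
            line += " (" + " ".join(sorted(hosts[item])) + ")"
        lines.append(line)
    return lines


def combine_partway(counts):
    """One category, but each host keeps its own line."""
    return [f"{count} {item} ({host})"
            for (host, item), count in sorted(counts.items())]


print(combine_nodes(counts))
print(combine_nodes(counts, shows_nodes=True))
print(combine_partway(counts))
```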
EXAMPLES
log_analysis -m root@whatever
Analyze yesterday’s logs and mail the results to root@whatever. You
might want to put this in a cronjob.
log_analysis -p5 -m root@whatever
Same as the last one, but PGP encrypt the logs using PGP 5 before
mailing.
log_analysis -a
Look at all the logs, not just yesterday’s.
log_analysis -sa /var/adm/sulog
Analyze all the contents of sulog, don’t bother with local state.
log_analysis -san otherhost syslog-file
Analyze all the contents of syslog-file, which was created on
"otherhost". Don’t run the local state commands.
log_analysis -sd1 -f foo.conf -U
This style of command is useful while developing local configs to
handle log messages unknown to the internal config.
Use foo.conf as a config file in addition to the internal config.
Output only the unknowns.
COMPATIBILITY
Written for Solaris and Linux. May work for other OSs.
Written for perl 5.00503. May work with some earlier perl versions.
NOTES
You often need to be root to read interesting log files.
It is customary to regularly "rollover" log files. Many log file
formats don’t include year information; among other benefits, rollover
makes the dates in such logfiles unambiguous. log_analysis by default
looks for log lines that match a particular day of the year, but does
not even try to guess the year. If the OS you’re using doesn’t
rollover some logfiles by default (ie. Solaris doesn’t rollover
/var/adm/wtmpx, /var/adm/wtmp, or /var/adm/sulog), you will need to
rollover these files yourself to get valid output from this program.
On some OSes, ’%’ (ie. the percent symbol) has a special meaning in
crontabs, and needs to be escaped. See crontab(1).
When there are a lot of unknowns, log_analysis can take a lot longer to
run. This is particularly a problem when you’re first running it,
before you customize for your site. To get around this problem, if you
send log_analysis a SIGINT (ie. if you hit control-C), it will stop
going through your logs and immediately output the results.
FILES
/etc/log_analysis.conf
/etc/log_analysis.conf-%n
/etc/log_analysis.conf-%s-%r
/etc/log_analysis.conf-%s
/usr/etc/log_analysis.conf
/usr/etc/log_analysis.conf-%n
/usr/etc/log_analysis.conf-%s-%r
/usr/etc/log_analysis.conf-%s
Config files, in order of precedence. "%n", "%s", and "%r" have
the usual tag substitution meanings; see "TAG SUBSTITUTIONS".
/etc/log_analysis.d
/usr/etc/log_analysis.d
Plug-in directories. All files in these directories will be
treated as config files and include’d.
$HOME/.log_analysis.conf
If you start log_analysis with the "-g" option, this file will be
loaded as a config file after all other config files, except those
specified by -f. This is also the default file for the "save
config" menu option.
AUTHOR
Mordechai T. Abzug <morty@frakir.org>
SEE ALSO
syslogd(8), last(1), perlre(1)