NAME
stda - Simple Tools for Data Analysis (STDA)
DESCRIPTION
STDA provides basic tools for data analysis. You can evaluate
sums, averages, integrals, histograms, and probability distribution
functions of one-dimensional data, and optionally plot the results. The
individual programs are stand-alone tools that read standard input and
write standard output, so they can be combined in UNIX pipelines for
data processing from the command line. All but one of the scripts need
only awk and core system utilities; plotting additionally requires
Gnuplot (http://gnuplot.info), since 'muplot' is a wrapper around it. In
summary, the package provides utilities for straightforward analysis in
situations where a complex analytical approach is not necessary and
where high floating-point precision is not critical. Typical
applications include evaluating usage statistics from server logfiles,
determining a response time distribution from a series of queries to a
(remote) service, and producing a plot from multiple data files.
This software should be considered an open project, to be extended
with new command-line utilities that help with common data analysis
tasks. Contributions and suggestions are welcome.
EXAMPLES
- Evaluate the current apache2 logfile and make a unique list of the
hostnames (or IP addresses) sorted by the total number of their HTTP
requests:
maphimbu -rs2 /var/log/apache2/access.log
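For comparison, the per-host request count described above can be approximated with awk and sort alone. This is a minimal sketch, not the implementation of maphimbu: the function name count_hosts is hypothetical, it assumes the client host is the first whitespace-separated field (as in the common and combined Apache log formats), and its output layout is not meant to match that of maphimbu.

```shell
# Hypothetical helper: count requests per client host and list the hosts
# by descending request count (ties are ordered by host name).
# Assumption: the host or IP address is field 1 of every log line.
count_hosts() {
    awk '{ n[$1]++ } END { for (h in n) print n[h], h }' "$@" |
        sort -k1,1nr -k2,2
}
```

It could be called as count_hosts /var/log/apache2/access.log, or placed at the end of a pipeline, since it reads standard input when no file is given.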
- On an X terminal, plot the probability function and the cumulative
distribution function of a sin(x) data sample:
nnum -3.14159 3.14159 0.00001 %.6g |awk '{ print $1, sin($1) }' \
|maphimbu -d0.01 -x2 -ns1 |mintegrate -d0.01 -x1 -y3 -S |muplot lp - 1:3,4
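The idea behind this pipeline (bin the values, normalize the counts, then accumulate them) can be sketched with awk and sort alone. This is only an illustration under stated assumptions, not the implementation of maphimbu or mintegrate: the function name hist_cdf is hypothetical, the default bin width of 0.01 is taken from the -d0.01 options above on the assumption that they set the bin width, and the second output column is a normalized frequency per bin rather than a density.

```shell
# Hypothetical helper: read one value per line (column 1) and print, for
# each non-empty bin: lower bin edge, normalized frequency, cumulative
# frequency. The bin width is the first argument (assumed default: 0.01).
hist_cdf() {
    awk -v w="${1:-0.01}" '
        {
            q = $1 / w
            b = int(q)              # int() truncates toward zero ...
            if (b > q) b--          # ... so step down for negative values
            n[b]++
            total++
        }
        END { for (b in n) print b, n[b] / total }
    ' | sort -n | awk -v w="${1:-0.01}" '
        { cum += $2; print $1 * w, $2, cum }
    '
}
```

Sorting on the integer bin index before accumulating keeps the cumulative column in ascending order of the bin edges; values that fall exactly on a bin edge may land in either neighbouring bin because of floating-point rounding.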
COPYRIGHT
Copyright © 2009 Dimitar Ivanov <dimitar.ivanov@mirendom.net>
License: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.