NAME
cmstat - display summary statistics for a CM
SYNOPSIS
cmstat [options] cmfile
DESCRIPTION
cmstat calculates and displays various types of statistics describing
the covariance models (CMs) in cmfile.
CMs are profiles of RNA consensus sequence and secondary structure. A
CM file is produced by the cmbuild program, from a given RNA sequence
alignment of known consensus structure. CM files can be calibrated
with the cmcalibrate program. Searches with calibrated CM files will
include E-values and will use appropriate filter thresholds for faster
speed. It is strongly recommended to calibrate your CM files before
using cmsearch. CM calibration is described in more detail below and
in chapters 5 and 6 of the User’s Guide. cmstat is useful for
determining statistics on calibrated or non-calibrated CM files.
By default, cmstat prints general statistics of the model and the
alignment it was built from. If the model(s) in cmfile have been
calibrated with cmcalibrate, the --le and --ge options can be used to
print statistics on the exponential tails used for calculating
E-values for the various possible search modes for locally (--le) and
globally configured (--ge) models in cmsearch. If cmfile is
calibrated, HMM filter threshold statistics can be printed for local
inside CM search with --lfi, for glocal inside CM search with --gfi,
for local CYK CM search with --lfc, and for glocal CYK CM search with
--gfc.
The --search option causes cmstat to perform a timing experiment for
homology search. Statistics will be printed on how many kilobases can
be scanned per second for the different possible algorithms in
cmsearch.
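As a minimal sketch of the invocation pattern in the SYNOPSIS, the
following Python helper assembles a cmstat command line from flags
described above. The helper name build_cmstat_argv and the file name
my.cm are hypothetical. The helper only builds the argument list; it
does not run cmstat or check whether the CM file has been calibrated.

```python
def build_cmstat_argv(cmfile, local_evalues=False, glocal_evalues=False, search=False):
    """Assemble 'cmstat [options] cmfile' as an argument list (hypothetical helper)."""
    argv = ["cmstat"]
    if local_evalues:
        argv.append("--le")      # exponential tail statistics, locally configured models
    if glocal_evalues:
        argv.append("--ge")      # exponential tail statistics, glocally configured models
    if search:
        argv.append("--search")  # timing experiment for homology search
    argv.append(cmfile)          # the CM file comes last, as in the SYNOPSIS
    return argv


# "my.cm" is a placeholder file name used only for illustration.
print(" ".join(build_cmstat_argv("my.cm", local_evalues=True)))
```

If cmstat is installed and on the PATH, the returned list could be
passed to subprocess.run to execute the command.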
OPTIONS
-h Print brief help; includes version number and summary of all
options, including expert options.
-g Turn on the ’glocal’ alignment algorithm, local with respect to
the target database, and global with respect to the model. By
default, the model is configured for local alignment which is
local with respect to both the target sequence and the model.
-m Print general statistics on the model(s) in cmfile and the
alignment(s) they were built from.
-Z <x> Calculate E-values as if the target database size was <x>
megabases (Mb). Ignore the actual size of the database. This
option is only valid if the CM file has been calibrated.
--all Print all available statistics.
--le Print local E-value statistics. This option only works if cmfile
has been calibrated with cmcalibrate.
--ge Print glocal E-value statistics. This option only works if
cmfile has been calibrated with cmcalibrate.
--beta <x>
With the --search option, set the beta parameter for the
query-dependent banding algorithm stages to <x>. Beta is the
probability mass considered negligible during band calculation.
The default is 1E-7.
--qdbfile <f>
Save the query-dependent bands (QDBs) for each state to file <f>.
EXPERT OPTIONS
--lfi Print the HMM filter thresholds for the range of relevant CM bit
score cutoffs for searches with locally configured models using
the Inside algorithm.
--gfi Print the HMM filter thresholds for the range of relevant CM bit
score cutoffs for searches with globally configured models using
the Inside algorithm.
--lfc Print the HMM filter thresholds for the range of relevant CM bit
score cutoffs for searches with locally configured models using
the CYK algorithm.
--gfc Print the HMM filter thresholds for the range of relevant CM bit
score cutoffs for searches with globally configured models using
the CYK algorithm.
-E <x> Print filter threshold statistics for an HMM filter if a final
CM E-value cutoff of <x> were to be used for a run of cmsearch
on 1 Mb of sequence. (Remember that cmsearch counts a 500,000
nucleotide sequence file as 1 Mb of sequence because, by default,
both strands of the sequence are searched.) The 1 Mb default can
be replaced with the size of a given database in file <f> using
the --seqfile <f> option.
-T <x> Print filter threshold statistics for an HMM filter if a final
CM bit score cutoff of <x> were to be used for a run of
cmsearch.
--nc Print filter threshold statistics for an HMM filter if a CM bit
score cutoff equal to the Rfam NC cutoff were to be used for a
run of cmsearch. The NC cutoff is defined as <x> bits in the
original Stockholm alignment the model was built from with a
line: #=GF NC <x> positioned before the sequence alignment. If
such a line existed in the alignment provided to cmbuild then
the --nc option will be available in cmstat. If no such line
existed when cmbuild was run, then using the --nc option to
cmstat will cause the program to print an error message and
exit.
--ga Print filter threshold statistics for an HMM filter if a CM bit
score cutoff equal to the Rfam GA cutoff value were to be used
for a run of cmsearch. The GA cutoff is defined in a Stockholm
file used to build the model in the same way as the NC cutoff
(see above), but with a line: #=GF GA <x>
--tc Print filter threshold statistics for an HMM filter if a CM bit
score cutoff equal to the Rfam TC cutoff value were to be used
for a run of cmsearch. The TC cutoff is defined in a Stockholm
file used to build the model in the same way as the NC cutoff
(see above), but with a line: #=GF TC <x>
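The #=GF NC, GA and TC lines described above can be checked before
running cmstat. The following Python sketch scans Stockholm text for
those lines and returns the cutoffs it finds. The function name
read_rfam_cutoffs and the example alignment are hypothetical, and
stripping a trailing semicolon from the value is a defensive
assumption rather than something stated in this page.

```python
def read_rfam_cutoffs(stockholm_text):
    """Collect '#=GF NC|GA|TC <x>' values from Stockholm text (hypothetical helper)."""
    cutoffs = {}
    for line in stockholm_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "#=GF" and fields[1] in ("NC", "GA", "TC"):
            # Assumption: tolerate a trailing ';' after the bit score.
            cutoffs[fields[1]] = float(fields[2].rstrip(";"))
    return cutoffs


# Made-up alignment for illustration; it defines GA and TC but not NC.
example = """# STOCKHOLM 1.0
#=GF GA 25.0
#=GF TC 27.3
seq1 GGGAAACCC
#=GC SS_cons <<<...>>>
//
"""
cutoffs = read_rfam_cutoffs(example)
print(cutoffs)
print("NC defined:", "NC" in cutoffs)
```

For a model built from an alignment like this one, which has no
#=GF NC line, the text above says that --nc would cause cmstat to
print an error message and exit.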
--seqfile <f>
With the -E option, use the size of the database in <f>
instead of the default database size of 1 Mb.
--toponly
In combination with the --seqfile <f> option, only consider the
top strand of the database in <f> instead of both strands.
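The strand arithmetic behind the -E, --seqfile and --toponly
descriptions can be sketched in Python as follows. The function name
effective_db_size_mb is hypothetical. The sketch assumes only what is
stated above: both strands count toward the database size unless the
top strand alone is considered.

```python
def effective_db_size_mb(total_nucleotides, top_strand_only=False):
    """Database size in megabases as counted for a search (hypothetical helper)."""
    strands = 1 if top_strand_only else 2  # both strands are searched by default
    return total_nucleotides * strands / 1_000_000


# A 500,000 nucleotide file counts as 1 megabase when both strands are searched.
print(effective_db_size_mb(500_000))
# The same file counts as half that when only the top strand is considered.
print(effective_db_size_mb(500_000, top_strand_only=True))
```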
--search Perform an experiment to determine how fast the CM(s)
can search with different search algorithms.
--cmL <n>
With the --search option, set the length of sequence to search
with CM algorithms to <n> residues. By default, <n> is 1000.
--hmmL <n>
With the --search option, set the length of sequence to search
with HMM algorithms to <n> residues. By default, <n> is 100,000.
--efile <f>
Save a plot of cmsearch HMM filter E-value cutoffs versus CM
E-value cutoffs in xmgrace format to file <f>. This option must
be used in combination with --lfi, --gfi, --lfc or --gfc.
--bfile <f>
Save a plot of cmsearch HMM bit score cutoffs versus CM bit
score cutoffs in xmgrace format to file <f>. This option must
be used in combination with --lfi, --gfi, --lfc or --gfc.
--sfile <f>
Save a plot of cmsearch predicted survival fraction from the HMM
filter versus CM E-value cutoff in xmgrace format to file <f>.
This option must be used in combination with --lfi, --gfi, --lfc
or --gfc.
--xfile <f>
Save a plot of ’xhmm’ versus CM E-value cutoff in xmgrace format
to file <f>.
’xhmm’ is the ratio of the number of dynamic programming
calculations predicted to be required for the HMM filter plus the
CM search of the filter survivors to the number of dynamic
programming calculations for the filter alone. So, an ’xhmm’
value of 2.0 means the filter stage of a search requires the
same number of calculations as the CM search of the filter
survivors does. This option must be used in combination with
--lfi, --gfi, --lfc or --gfc.
--afile <f>
Save a plot of the predicted acceleration for an HMM-filtered
search versus CM E-value cutoff in xmgrace format to file <f>.
This option must be used in combination with --lfi, --gfi, --lfc
or --gfc.
--bits With --efile, --sfile, --xfile, and --afile, use CM bit score
cutoffs instead of CM E-value cutoffs for the x-axis values of
the plot.
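The ’xhmm’ ratio described under --xfile can be illustrated with a
short Python sketch. The function name xhmm and the calculation
counts used below are hypothetical; only the definition of the ratio
comes from the text above.

```python
def xhmm(filter_calcs, survivor_cm_calcs):
    """Ratio of (HMM filter + CM search of survivors) to the HMM filter alone."""
    if filter_calcs <= 0:
        raise ValueError("filter_calcs must be positive")
    return (filter_calcs + survivor_cm_calcs) / filter_calcs


# Equal work in the filter and in the CM search of survivors gives 2.0.
print(xhmm(1_000_000, 1_000_000))
# If no sequence survives the filter, the ratio is 1.0.
print(xhmm(1_000_000, 0))
```

Values close to 1.0 therefore indicate that the filter dominates the
predicted running time, while larger values indicate that the CM
search of the filter survivors dominates.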
SEE ALSO
For complete documentation, see the User’s Guide (Userguide.pdf) that
came with the distribution; or see the Infernal web page,
http://infernal.janelia.org/.
COPYRIGHT
Copyright (C) 2009 HHMI Janelia Farm Research Campus.
Freely distributed under the GNU General Public License (GPLv3).
See the file COPYING that came with the source for details on
redistribution conditions.
AUTHOR
Eric Nawrocki, Diana Kolbe, and Sean Eddy
HHMI Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147
http://selab.janelia.org/