cmstat - display summary statistics for a CM

NAME

       cmstat - display summary statistics for a CM

SYNOPSIS

       cmstat [options] cmfile

DESCRIPTION

       cmstat  calculates  and displays various types of statistics describing
       the covariance models (CMs) in cmfile.

       CMs are profiles of RNA consensus sequence and secondary  structure.  A
       CM  file  is produced by the cmbuild program, from a given RNA sequence
       alignment of known consensus structure.  CM  files  can  be  calibrated
       with  the  cmcalibrate  program. Searches with calibrated CM files will
       include E-values and will use appropriate filter thresholds for faster
       searches. It is strongly recommended to calibrate your CM files before
       using cmsearch.  CM calibration is described in more detail  below  and
       in  chapters  5  and  6  of  the  User’s  Guide.   cmstat is useful for
       determining statistics on calibrated or non-calibrated CM files.

       By default, cmstat prints general statistics of the model and the
       alignment it was built from. If the model(s) in cmfile have been
       calibrated with cmcalibrate, the --le and --ge options can be used to
       print statistics on the exponential tails used for calculating
       E-values for the various possible search modes for locally (--le) and
       globally (--ge) configured models in cmsearch. If cmfile is
       calibrated, HMM filter threshold statistics can be printed for local
       Inside CM search with --lfi, for glocal Inside CM search with --gfi,
       for local CYK CM search with --lfc, and for glocal CYK CM search with
       --gfc.

       The --search option causes cmstat to perform a timing experiment for
       homology search. Statistics will be printed on how many kilobases can
       be scanned per second by the different possible algorithms in
       cmsearch.
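
       As a rough illustration of how a kilobases-per-second figure can be
       used, the sketch below converts a timing into a scan rate and then
       estimates how long a database of a given size would take at that
       rate. The function names and the numbers are hypothetical and are not
       part of cmstat or its output.

```python
def kilobases_per_second(residues_scanned: int, elapsed_seconds: float) -> float:
    """Convert a timing measurement into a scan rate in kb/s."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (residues_scanned / 1000.0) / elapsed_seconds


def estimated_search_seconds(database_megabases: float, rate_kb_per_s: float) -> float:
    """Estimate the time to scan a database at a given rate (1 Mb = 1000 kb)."""
    if rate_kb_per_s <= 0:
        raise ValueError("rate_kb_per_s must be positive")
    return database_megabases * 1000.0 / rate_kb_per_s


if __name__ == "__main__":
    # Hypothetical timing: 100,000 residues scanned in 4 seconds.
    rate = kilobases_per_second(100_000, 4.0)
    print(rate, estimated_search_seconds(1.0, rate))
```

       The estimate assumes the rate stays constant over the whole database,
       which is a simplification.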

OPTIONS

       -h     Print brief help; includes version number  and  summary  of  all
              options, including expert options.

       -g     Turn on the ’glocal’ alignment algorithm, which is local with
              respect to the target database and global with respect to the
              model. By default, the model is configured for local alignment,
              which is local with respect to both the target sequence and the
              model.

       -m     Print general statistics on the models in cmfile and the
              alignments they were built from.

       -Z <x> Calculate  E-values  as  if  the  target  database  size was <x>
              megabases (Mb). Ignore the actual size  of  the  database.  This
              option is only valid if the CM file has been calibrated.

       --all  Print all available statistics.

       --le   Print local E-value statistics. This option only works if cmfile
              has been calibrated with cmcalibrate.

       --ge   Print glocal E-value statistics. This option only works if
              cmfile has been calibrated with cmcalibrate.

       --beta <x>
              With the --search option, set the beta parameter for the
              query-dependent banding algorithm stages to <x>. Beta is the
              probability mass considered negligible during band calculation.
              The default is 1E-7.

       --qdbfile <f>
              Save the query-dependent bands (QDBs) for each state to file
              <f>.
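
       To make the role of beta more concrete, the sketch below derives a
       band (dmin, dmax) from a length distribution by discarding tail
       probability mass. It is only an illustration: the function name, the
       toy distribution, and the choice to allow beta/2 of the mass in each
       tail are assumptions made here, not a description of the code in
       cmstat or cmsearch.

```python
from typing import Sequence, Tuple


def band_from_length_distribution(
    probs: Sequence[float], beta: float = 1e-7
) -> Tuple[int, int]:
    """Return (dmin, dmax) after trimming negligible tail mass.

    probs[d] is a hypothetical probability that a state accounts for a
    subsequence of length d. ASSUMPTION: each tail may lose strictly less
    than beta/2 of the total probability mass.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be between 0 and 1")
    total = sum(probs)
    if total <= 0.0:
        raise ValueError("probs must contain some probability mass")
    tail_limit = beta / 2.0 * total

    # Move dmin right while the discarded left-tail mass stays under the limit.
    dmin, discarded = 0, 0.0
    while dmin < len(probs) - 1 and discarded + probs[dmin] < tail_limit:
        discarded += probs[dmin]
        dmin += 1

    # Move dmax left in the same way for the right tail.
    dmax, discarded = len(probs) - 1, 0.0
    while dmax > dmin and discarded + probs[dmax] < tail_limit:
        discarded += probs[dmax]
        dmax -= 1

    return dmin, dmax


if __name__ == "__main__":
    toy = [0.0, 0.001, 0.2, 0.598, 0.2, 0.001, 0.0]  # made-up distribution
    print(band_from_length_distribution(toy, beta=1e-7))
    print(band_from_length_distribution(toy, beta=0.01))
```

       In this toy case a larger beta gives a tighter band, which is the
       trade-off the parameter controls: fewer lengths to consider, at the
       cost of ignoring more probability mass.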

EXPERT OPTIONS

       --lfi  Print the HMM filter thresholds for the range of relevant CM bit
              score cutoffs for searches with locally configured models  using
              the Inside algorithm.

       --gfi  Print the HMM filter thresholds for the range of relevant CM bit
              score cutoffs for searches with globally configured models using
              the Inside algorithm.

       --lfc  Print the HMM filter thresholds for the range of relevant CM bit
              score cutoffs for searches with locally configured models  using
              the CYK algorithm.

       --gfc  Print the HMM filter thresholds for the range of relevant CM bit
              score cutoffs for searches with globally configured models using
              the CYK algorithm.

       -E <x> Print filter threshold statistics for an HMM filter if a final
              CM E-value cutoff of <x> were to be used for a run of cmsearch
              on 1 Mb of sequence. (Remember that cmsearch considers a
              500,000 nucleotide sequence file as 1 Mb of sequence because,
              by default, both strands of the sequence are searched.) The
              default size of 1 Mb can be changed to the size of a given
              database in file <f> using the --seqfile <f> option.

       -T <x> Print  filter  threshold statistics for an HMM filter if a final
              CM bit score cutoff of  <x>  were  to  be  used  for  a  run  of
              cmsearch.

       --nc   Print filter threshold statistics for an HMM filter if a CM bit
              score cutoff equal to the Rfam NC cutoff were to be used for a
              run of cmsearch. The NC cutoff is defined as <x> bits in the
              original Stockholm alignment the model was built from, with a
              line: #=GF NC <x> positioned before the sequence alignment. If
              such a line existed in the alignment provided to cmbuild, then
              the --nc option will be available in cmstat. If no such line
              existed when cmbuild was run, then using the --nc option to
              cmstat will cause the program to print an error message and
              exit.

       --ga   Print filter threshold statistics for an HMM filter if a CM bit
              score cutoff equal to the Rfam GA cutoff value were to be used
              for a run of cmsearch. The GA cutoff is defined in the
              Stockholm file used to build the model in the same way as the
              NC cutoff (see above), but with a line: #=GF GA <x>

       --tc   Print filter threshold statistics for an HMM filter if a CM bit
              score cutoff equal to the Rfam TC cutoff value were to be used
              for a run of cmsearch. The TC cutoff is defined in the
              Stockholm file used to build the model in the same way as the
              NC cutoff (see above), but with a line: #=GF TC <x>

       --seqfile <f>
              With the -E option, use the size of the database in file <f>
              instead of the default database size of 1 Mb.

       --toponly
              In combination with the --seqfile option, only consider the top
              strand of the database instead of both strands.

       --search
              Perform an experiment to determine how fast the CM(s) can
              search with different search algorithms.

       --cmL <n>
              With the --search option, set the length of sequence to search
              with CM algorithms to <n> residues. By default, <n> is 1000.

       --hmmL <n>
              With the --search option, set the length of sequence to search
              with HMM algorithms to <n> residues. By default, <n> is
              100,000.

       --efile <f>
              Save a plot of cmsearch HMM filter E-value cutoffs versus CM
              E-value cutoffs in xmgrace format to file <f>. This option must
              be used in combination with --lfi, --gfi, --lfc or --gfc.

       --bfile <f>
              Save a plot of cmsearch HMM bit  score  cutoffs  versus  CM  bit
              score  cutoffs  in xmgrace format to file <f>.  This option must
              be used in combination with --lfi, --gfi, --lfc or --gfc.

       --sfile <f>
              Save a plot of cmsearch predicted survival fraction from the HMM
              filter versus CM E-value cutoff in xmgrace format to file <f>.
              This option must be used in combination with --lfi, --gfi, --lfc
              or --gfc.

       --xfile <f>
              Save a plot of ’xhmm’ versus CM E-value cutoff in xmgrace format
              to file <f>. ’xhmm’ is the ratio of the number of dynamic
              programming calculations predicted to be required for the HMM
              filter plus the CM search of the filter survivors to the number
              of dynamic programming calculations for the filter alone. So, an
              ’xhmm’ value of 2.0 means the filter stage of a search requires
              the same number of calculations as the CM search of the filter
              survivors does. This option must be used in combination with
              --lfi, --gfi, --lfc or --gfc.

       --afile <f>
              Save a plot of the predicted acceleration for an HMM filtered
              search versus CM E-value cutoff in xmgrace format to file <f>.
              This option must be used in combination with --lfi, --gfi, --lfc
              or --gfc.

       --bits With --efile, --sfile, --xfile, and --afile, use CM bit score
              cutoffs instead of CM E-value cutoffs for the x-axis values of
              the plot.
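
       Two pieces of arithmetic from the options above can be checked by
       hand: the search-space size assumed by -E (both strands counted
       unless --toponly is given) and the ’xhmm’ ratio plotted by --xfile.
       The sketch below restates both definitions; the function names and
       example numbers are hypothetical and are not part of cmstat.

```python
def search_space_megabases(num_nucleotides: int, top_strand_only: bool = False) -> float:
    """Size of the search space in megabases.

    Both strands are counted by default, so a 500,000 nucleotide file
    counts as 1 megabase; top_strand_only mirrors the effect described
    for --toponly.
    """
    strands = 1 if top_strand_only else 2
    return num_nucleotides * strands / 1_000_000


def xhmm(filter_calcs: float, survivor_cm_calcs: float) -> float:
    """Ratio of (filter + CM search of survivors) to the filter alone."""
    if filter_calcs <= 0:
        raise ValueError("filter_calcs must be positive")
    return (filter_calcs + survivor_cm_calcs) / filter_calcs


if __name__ == "__main__":
    print(search_space_megabases(500_000))                        # both strands
    print(search_space_megabases(500_000, top_strand_only=True))  # top strand only
    # Made-up calculation counts: filter and survivor search cost the same.
    print(xhmm(3.0e9, 3.0e9))
```

       An ’xhmm’ of 1.0 would mean no sequence survives the filter, so the
       filter accounts for all of the predicted work.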

COPYRIGHT

       Copyright (C) 2009 HHMI Janelia Farm Research Campus.
       Freely distributed under the GNU General Public License (GPLv3).
       See the  file  COPYING  that  came  with  the  source  for  details  on
       redistribution conditions.

AUTHOR

       Eric Nawrocki, Diana Kolbe, and Sean Eddy
       HHMI Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147
       http://selab.janelia.org/
