
NAME

       compstruct - calculate accuracy of RNA secondary structure predictions

SYNOPSIS

       compstruct [options] trusted_file test_file

DESCRIPTION

       compstruct  evaluates  the  accuracy   of   RNA   secondary   structure
       predictions   on   a   per-base-pair   basis.   The   trusted_file
       contains  one  or  more  sequences  with  trusted (known) RNA secondary
       structure annotation. The test_file contains the same sequences, in the
       same   order,   with  predicted  RNA  secondary  structure  annotation.
       compstruct reads the structures, compares them,  and  calculates  both
       the  sensitivity  (the fraction of true base pairs that are correctly
       predicted) and the specificity (positive predictive value, the fraction
       of  predicted base pairs that are true).  Results are reported for each
       individual sequence, and in summary for all sequences together.
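
       The two measures above can be sketched in a few lines of  Python.  This
       is  only  an  illustration  of the exact-match definition, not the code
       compstruct itself uses; the function name accuracy and the  choice  to
       return 0.0 for an empty structure are assumptions made for the example.

```python
def accuracy(true_pairs, pred_pairs):
    """Return (sensitivity, ppv) under the exact-match rule.

    Both arguments are iterables of (i, j) base-pair tuples.
    Hypothetical helper: returning 0.0 when a structure has no pairs
    is an assumption, not documented compstruct behaviour.
    """
    true_set = set(true_pairs)
    pred_set = set(pred_pairs)
    # A predicted pair is correct only if the identical (i, j) is trusted.
    correct = len(true_set & pred_set)
    sensitivity = correct / len(true_set) if true_set else 0.0
    ppv = correct / len(pred_set) if pred_set else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    trusted = [(1, 10), (2, 9), (3, 8), (4, 7)]
    predicted = [(1, 10), (2, 9), (3, 7)]
    print(accuracy(trusted, predicted))
```

       In this example two of the four trusted pairs are recovered, and two of
       the three predicted pairs are true.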

       Both  files  must  contain  secondary  structure  annotation  in   WUSS
       notation.  Only SELEX and Stockholm formats support structure markup at
       present.
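
       To compare structures, a WUSS string first has to  be  turned  into  a
       list  of base pairs. The following Python sketch shows one way to do
       that. It assumes that the bracket characters <>, (), [] and {}  denote
       nested  base  pairs  sharing  a  single  nesting,  and  that  matching
       uppercase/lowercase letters (A...a, B...b) denote pseudoknotted pairs;
       every  other  character is treated as unpaired. The name wuss_to_pairs
       and its include_pseudoknots parameter are hypothetical.

```python
OPENERS = set("<([{")
CLOSERS = set(">)]}")


def wuss_to_pairs(ss, include_pseudoknots=False):
    """Convert a WUSS structure string to a set of 1-based (i, j) pairs.

    Assumption: all bracket types share one nesting stack, and each
    pseudoknot letter has its own stack (uppercase opens, lowercase closes).
    """
    bracket_stack = []
    knot_stacks = {}
    pairs = set()
    for pos, ch in enumerate(ss, start=1):
        if ch in OPENERS:
            bracket_stack.append(pos)
        elif ch in CLOSERS:
            if not bracket_stack:
                raise ValueError(f"unmatched closing bracket at {pos}")
            pairs.add((bracket_stack.pop(), pos))
        elif ch.isalpha():
            if not include_pseudoknots:
                continue  # pseudoknot positions are treated as unpaired
            if ch.isupper():
                knot_stacks.setdefault(ch, []).append(pos)
            else:
                stack = knot_stacks.get(ch.upper())
                if not stack:
                    raise ValueError(f"unmatched pseudoknot letter at {pos}")
                pairs.add((stack.pop(), pos))
        # anything else (for example '_', '-', ',', ':', '.') is unpaired
    if bracket_stack or any(knot_stacks.values()):
        raise ValueError("unmatched opening symbol in structure string")
    return pairs


if __name__ == "__main__":
    print(sorted(wuss_to_pairs("<<<___>>>")))
    print(sorted(wuss_to_pairs("<<AA>>aa", include_pseudoknots=True)))
```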

       The default definition of a correctly predicted base  pair  is  that  a
       true pair (i,j) must exactly match a predicted pair (i,j).

       Mathews,  Zuker,  Turner  and  colleagues  (see:  Mathews  et  al., JMB
       288:911-940, 1999) use  a  more  relaxed  definition.  Mathews  defines
       "correct"  as  follows: a true pair (i,j) is correctly predicted if any
       of the following pairs are predicted: (i,j), (i+1,j), (i-1,j), (i,j+1),
       or  (i,j-1).  This  rule  allows for "slipped helices" off by one base.
       The -m option activates this rule for both sensitivity and specificity.
       For  specificity,  the  rule  is  reversed:  a   predicted   pair
       (i,j) is considered to be true if the true structure  contains  one  of
       the five pairs (i,j), (i+1,j), (i-1,j), (i,j+1), or (i,j-1).
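
       The relaxed rule can be sketched in Python as follows. This is a minimal
       illustration of the rule as described above, not compstruct's own code;
       the names slipped_variants and mathews_accuracy are hypothetical, and
       returning 0.0 for an empty structure is an assumption.

```python
def slipped_variants(i, j):
    """The five pairs that count as a match for (i, j) under the relaxed rule."""
    return {(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def mathews_accuracy(true_pairs, pred_pairs):
    """Return (sensitivity, ppv) under the relaxed one-base-slip rule."""
    true_set = set(true_pairs)
    pred_set = set(pred_pairs)
    # Sensitivity: a true pair counts if any slipped variant was predicted.
    found = sum(1 for (i, j) in true_set if slipped_variants(i, j) & pred_set)
    # Specificity: the rule is reversed, checking predictions against truth.
    valid = sum(1 for (i, j) in pred_set if slipped_variants(i, j) & true_set)
    sensitivity = found / len(true_set) if true_set else 0.0
    ppv = valid / len(pred_set) if pred_set else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    trusted = [(1, 10), (2, 9), (3, 8)]
    predicted = [(1, 9), (2, 8), (5, 20)]  # helix slipped by one base
    print(mathews_accuracy(trusted, predicted))
```

       In this example no predicted pair matches a trusted pair exactly, so the
       default rule would score both measures as zero, whereas the relaxed rule
       credits the slipped helix.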

OPTIONS

       -h     Print  brief  help;  includes  version number and summary of all
              options, including expert options.

       -m     Use the Mathews relaxed accuracy rule (see  above),  instead  of
              requiring exact prediction of base pairs.

       -p     Count  pseudoknotted  base pairs towards the accuracy, in either
              trusted or predicted structures.  By  default,  pseudoknots  are
              ignored.

              Normally,   only   the   trusted_file   would   have  pseudoknot
              annotation,  since  most  RNA  secondary  structure   prediction
              programs  do not predict pseudoknots. Using the -p option allows
              you to penalize the prediction program for not predicting  known
              pseudoknots.  In  a  case  where  both  the trusted_file and the
              test_file have pseudoknot annotation,  the -p  option  lets  you
              count pseudoknots in evaluating the prediction accuracy. Beware,
              however,  of  the  case  where  you  use  a   pseudoknot-capable
              prediction   program   to   generate   the  test_file,  but  the
              trusted_file does not have pseudoknot annotation. In this  case,
              -p  will  penalize  any  predicted pseudoknots when it calculates
              specificity, even if they’re right, because they don’t appear in
              the trusted annotation; this is probably not what you want.

EXPERT OPTIONS

       --informat <s>
              Specify  that  the two sequence files are in format <s>. In this
              case, both files must be in the same format. The default  is  to
              autodetect  the  file  formats,  in  which  case  they  could be
              different (one SELEX, one Stockholm).

       --quiet
              Don’t print any verbose header information.

SEE ALSO

       afetch(1),   alistat(1),   compalign(1),    revcomp(1),    seqsplit(1),
       seqstat(1),    sfetch(1),    shuffle(1),    sindex(1),    sreformat(1),
       stranslate(1), weight(1).

AUTHOR

       Biosquid  and   its   documentation   are   Copyright   (C)   1992-2003
       HHMI/Washington University School of Medicine.  Freely distributed under
       the GNU General Public License (GPL).  See COPYING in the  source  code
       distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu