NAME
compstruct - calculate accuracy of RNA secondary structure predictions
SYNOPSIS
compstruct [options] trusted_file test_file
DESCRIPTION
compstruct evaluates the accuracy of RNA secondary structure
predictions on a per-base-pair basis. The trusted_file
contains one or more sequences with trusted (known) RNA secondary
structure annotation. The test_file contains the same sequences, in the
same order, with predicted RNA secondary structure annotation.
compstruct reads the structures, compares them, and calculates both
the sensitivity (the fraction of true base pairs that are correctly
predicted) and the specificity (positive predictive value, the fraction
of predicted base pairs that are true). Results are reported for each
individual sequence, and in summary for all sequences together.
Both files must contain secondary structure annotation in WUSS
notation. Only SELEX and Stockholm formats support structure markup at
present.
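As an illustration of what comparing structures per base pair involves,
here is a minimal Python sketch that turns a structure string into a
list of (i,j) pairs. It is not compstruct's own code. It assumes the
usual WUSS conventions: nested pairs written with matching <>, (), [],
or {} brackets, pseudoknotted pairs written with matching upper/lower
case letters (A...a, B...b), and every other character treated as
unpaired. The function name wuss_to_pairs and its include_pseudoknots
parameter are hypothetical.

```python
# Sketch only: parse a WUSS-style structure string into 1-based (i, j) pairs.
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def wuss_to_pairs(ss, include_pseudoknots=False):
    pairs = []
    stack = []        # (position, bracket) of unmatched opening brackets
    knot_stacks = {}  # uppercase letter -> positions awaiting a partner
    for pos, ch in enumerate(ss, start=1):
        if ch in OPEN_TO_CLOSE:
            stack.append((pos, ch))
        elif ch in CLOSERS:
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            open_pos, open_ch = stack.pop()
            if OPEN_TO_CLOSE[open_ch] != ch:
                raise ValueError(f"mismatched brackets at {open_pos},{pos}")
            pairs.append((open_pos, pos))
        elif "A" <= ch <= "Z":
            # Assumed convention: uppercase letter opens a pseudoknot pair.
            if include_pseudoknots:
                knot_stacks.setdefault(ch, []).append(pos)
        elif "a" <= ch <= "z":
            # Assumed convention: lowercase letter closes a pseudoknot pair.
            if include_pseudoknots:
                waiting = knot_stacks.get(ch.upper())
                if not waiting:
                    raise ValueError(f"unmatched {ch!r} at position {pos}")
                pairs.append((waiting.pop(), pos))
        # Any other character is treated as an unpaired position.
    if stack or any(knot_stacks.values()):
        raise ValueError("unclosed base pair in structure string")
    return sorted(pairs)


nested = wuss_to_pairs("<<<...>>>")
knotted = wuss_to_pairs("<<.AA.>>.aa", include_pseudoknots=True)
```

With include_pseudoknots left at False, the letter-annotated positions
are simply skipped, which mirrors the default behavior of ignoring
pseudoknots.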
The default definition of a correctly predicted base pair is that a
true pair (i,j) must exactly match a predicted pair (i,j).
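The exact-match calculation can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not compstruct's
implementation: the function name accuracy is hypothetical, pairs are
given as (i,j) tuples, and returning 0.0 when a structure has no pairs
is an arbitrary choice made here to avoid division by zero.

```python
# Sketch only: exact-match sensitivity and PPV from two sets of base pairs.
def accuracy(true_pairs, predicted_pairs):
    true_set = set(true_pairs)
    pred_set = set(predicted_pairs)
    correct = len(true_set & pred_set)  # pairs present in both structures
    # Assumption: report 0.0 when a denominator would be zero.
    sensitivity = correct / len(true_set) if true_set else 0.0
    ppv = correct / len(pred_set) if pred_set else 0.0
    return sensitivity, ppv


true_pairs = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted_pairs = [(1, 20), (2, 19), (3, 17)]
sens, ppv = accuracy(true_pairs, predicted_pairs)
```

In this example two of the four true pairs are predicted exactly, and
two of the three predicted pairs are true.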
Mathews, Zuker, Turner and colleagues (see: Mathews et al., JMB
288:911-940, 1999) use a more relaxed definition. Mathews defines
"correct" as follows: a true pair (i,j) is correctly predicted if any
of the following pairs are predicted: (i,j), (i+1,j), (i-1,j), (i,j+1),
or (i,j-1). This rule allows for "slipped helices" off by one base.
The -m option activates this rule for both sensitivity and
specificity. For specificity, the rule is reversed: a predicted pair
(i,j) is considered to be true if the true structure contains one of
the five pairs (i,j), (i+1,j), (i-1,j), (i,j+1), or (i,j-1).
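The relaxed rule can be sketched in Python as follows. This is a
minimal illustration of the rule as described above, not the code
compstruct runs: the names slipped and relaxed_accuracy are
hypothetical, and returning 0.0 for an empty structure is an assumption
made here.

```python
# Sketch only: Mathews-style relaxed matching, allowing a one-base slip.
def slipped(i, j):
    """The five pairs that count as a match for pair (i, j)."""
    return {(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def relaxed_accuracy(true_pairs, predicted_pairs):
    true_set = set(true_pairs)
    pred_set = set(predicted_pairs)
    # Sensitivity: a true pair counts if any of its five variants is predicted.
    found = sum(1 for (i, j) in true_set if slipped(i, j) & pred_set)
    # Specificity: the rule is reversed, checking predictions against truth.
    valid = sum(1 for (i, j) in pred_set if slipped(i, j) & true_set)
    sensitivity = found / len(true_set) if true_set else 0.0
    ppv = valid / len(pred_set) if pred_set else 0.0
    return sensitivity, ppv


true_pairs = [(1, 20), (2, 19), (3, 18)]
predicted_pairs = [(1, 20), (2, 18), (4, 16)]
relaxed_sens, relaxed_ppv = relaxed_accuracy(true_pairs, predicted_pairs)
```

Here the predicted pair (2,18) is within one base of both true pairs
(2,19) and (3,18), so all three true pairs count as found, while the
predicted pair (4,16) matches nothing and lowers the specificity.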
OPTIONS
-h Print brief help; includes version number and summary of all
options, including expert options.
-m Use the Mathews relaxed accuracy rule (see above), instead of
requiring exact prediction of base pairs.
-p Count pseudoknotted base pairs towards the accuracy, in either
trusted or predicted structures. By default, pseudoknots are
ignored.
Normally, only the trusted_file would have pseudoknot
annotation, since most RNA secondary structure prediction
programs do not predict pseudoknots. Using the -p option allows
you to penalize the prediction program for not predicting known
pseudoknots. In a case where both the trusted_file and the
test_file have pseudoknot annotation, the -p option lets you
count pseudoknots in evaluating the prediction accuracy. Beware,
however, of the case where you use a pseudoknot-capable
prediction program to generate the test_file, but the
trusted_file does not have pseudoknot annotation. In this case,
-p will penalize any predicted pseudoknots when it calculates
specificity, even if they’re right, because they don’t appear in
the trusted annotation. This is probably not what you want.
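The effect described in that last case can be shown with a small
Python sketch. The pair coordinates are made up for illustration, the
helper name ppv is hypothetical, and the calculation assumes exact
matching; it is not compstruct's own code.

```python
# Sketch only: how counting pseudoknots can lower specificity when the
# trusted structure carries no pseudoknot annotation.
def ppv(true_set, pred_set):
    # Assumption: report 0.0 when there are no predicted pairs.
    return len(true_set & pred_set) / len(pred_set) if pred_set else 0.0


trusted_nested = {(1, 30), (2, 29), (3, 28)}    # no pseudoknots annotated
predicted_nested = {(1, 30), (2, 29), (3, 28)}  # nested part of prediction
predicted_knot = {(10, 40), (11, 39)}           # predicted pseudoknot pairs

# Default behavior: pseudoknotted pairs are ignored on both sides.
ppv_default = ppv(trusted_nested, predicted_nested)

# With -p: predicted pseudoknot pairs are counted, but none can match.
ppv_with_p = ppv(trusted_nested, predicted_nested | predicted_knot)
```

The nested prediction is perfect in both runs, yet the specificity
drops from 1.0 to 0.6 once the two unmatched pseudoknot pairs are
counted.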
EXPERT OPTIONS
--informat <s>
Specify that the two sequence files are in format <s>. In this
case, both files must be in the same format. The default is to
autodetect the file formats, in which case they could be
different (one SELEX, one Stockholm).
--quiet
Don’t print any verbose header information.
SEE ALSO
afetch(1), alistat(1), compalign(1), revcomp(1), seqsplit(1),
seqstat(1), sfetch(1), shuffle(1), sindex(1), sreformat(1),
stranslate(1), weight(1).
AUTHOR
Biosquid and its documentation are Copyright (C) 1992-2003
HHMI/Washington University School of Medicine. Freely distributed under
the GNU General Public License (GPL). See COPYING in the source code
distribution for more details, or contact me.
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX : 1-314-362-2157
Email: eddy@genetics.wustl.edu