NAME
RNAhybrid - calculate secondary structure hybridisations of RNAs
SYNOPSIS
RNAhybrid [-h] [-b hit_number] [-e energy_cutoff] [-p p-value_cutoff]
[-c] [-d xi,theta] [-s set_name] [-f from,to] [-m max_target_length]
[-n max_query_length] [-u iloop_upper_limit] [-v bloop_upper_limit] [-g
(ps|png|jpg|all)] [-t target_file] [-q query_file] [target] [query]
DESCRIPTION
RNAhybrid is a tool for finding minimum free energy (mfe)
hybridisations of a long (target) and a short (query) RNA. The
hybridisation is performed in a kind of domain mode, i.e. the short
sequence is hybridised to the best-fitting parts of the long one. The
tool is primarily meant as a means for microRNA target prediction. In
addition to mfes, the program calculates p-values based on extreme
value distributions of length-normalised energies.
OPTIONS
-h Give a short summary of command line options.
-b hit_number
Maximal number of hits to show. Up to hit_number hits are shown
in order of increasing minimum free energy (reminder: larger
energies are worse). Output stops earlier if the -e option is
used and the energy cut-off has been exceeded (see -e option
below) or if there are no more hits.
Hits may only overlap at dangling bases (5’ or 3’ unpaired end
of target).
-c Produce compact output. For each target/query pair one line of
output is generated. Each line is a colon (:) separated list of
the following fields: target name, query name, minimum free
energy, position in target, alignment line 1, line 2, line 3,
line 4. If a target or a query is given on the command line (i.e.
no -t or -q, respectively), its name in the output will be
"command line".
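As an illustration of the field layout described for -c, the following is a minimal Python sketch that splits one compact output line into its eight fields. It assumes that names contain no colon, that the energy parses as a float and that the position is an integer; the names CompactHit and parse_compact_line, and the example line itself, are hypothetical and not part of RNAhybrid.

```python
from typing import NamedTuple, Tuple


class CompactHit(NamedTuple):
    target: str
    query: str
    mfe: float
    position: int
    alignment: Tuple[str, ...]  # alignment lines 1 to 4, in order


def parse_compact_line(line: str) -> CompactHit:
    """Split one -c output line into the eight fields listed for -c."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 8:
        # Assumption: names never contain ':'; otherwise the count is off.
        raise ValueError(f"expected 8 colon-separated fields, got {len(fields)}")
    target, query, mfe, position = fields[:4]
    return CompactHit(target, query, float(mfe), int(position), tuple(fields[4:]))


# Hypothetical line; real alignment rows depend on the sequences.
example = "command line:command line:-25.3:17:A   G:  CGU :  GCA :U   C"
hit = parse_compact_line(example)
print(hit.target, hit.mfe, hit.position)
```

The alignment rows are kept verbatim, including spaces, so that they can be printed one under another again.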
-d xi,theta
xi and theta are the position and shape parameters,
respectively, of an extreme value distribution (evd).
Length-normalised duplex energies are assumed to be distributed
according to such an evd, from which p-values are computed. For
a length-normalised energy en, we have P[X <= en] = 1 -
exp(-exp(-(-en-xi)/theta)), where en = e/log(m*n), e is the
duplex energy, and m and n are the lengths of the target and
the query, respectively. If the -d option is omitted, xi and
theta are estimated from the maximal duplex energy of the query,
assuming a linear dependence. The parameters of this linear
dependence are coded into the program, but the option -s has to
be given to choose the appropriate set. Note that the evd is
mirrored, since good mfes are large negative values.
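The p-value formula given for -d can be written out as a short Python sketch. It assumes that log denotes the natural logarithm, which this page does not state; the function name evd_p_value and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def evd_p_value(energy: float, m: int, n: int, xi: float, theta: float) -> float:
    """P[X <= en] = 1 - exp(-exp(-(-en - xi) / theta)), with en = e / log(m * n)."""
    # Length normalisation; natural log is an assumption of this sketch.
    en = energy / math.log(m * n)
    # Mirrored evd: more negative energies give smaller p-values.
    return 1.0 - math.exp(-math.exp(-(-en - xi) / theta))


# Hypothetical xi and theta; real values come from -d or from the -s estimate.
p_weak = evd_p_value(-25.0, m=2000, n=22, xi=1.8, theta=0.2)
p_strong = evd_p_value(-30.0, m=2000, n=22, xi=1.8, theta=0.2)
print(p_weak, p_strong)
```

With fixed xi and theta, a lower (better) energy yields a smaller p-value, which is the behaviour the -p cut-off relies on.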
-s set_name
Used for quick estimate of extreme value distribution parameters
(see -d option above). Tells RNAhybrid which target dataset to
assume. Valid parameters are 3utr_fly, 3utr_worm and 3utr_human.
-e energy_cutoff
Hits with a minimum free energy less than or equal to
energy_cutoff are shown in order of increasing energy (reminder:
larger energies are worse), unless the -b option is used and the
number of already
reported hits has reached the maximal hit_number (see -b option
above). Hits may only overlap at dangling bases (5’ or 3’
unpaired end of target).
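How the -b and -e limits combine can be sketched in Python as a stopping rule over a list of hit energies. This models only the rule described for -b and -e; it does not reproduce how RNAhybrid finds suboptimal hits or prevents overlaps. The function name select_hits is hypothetical, and passing None means the limit is not applied in this sketch, which says nothing about the program's own defaults.

```python
from typing import Iterable, List, Optional


def select_hits(energies: Iterable[float],
                hit_number: Optional[int] = None,
                energy_cutoff: Optional[float] = None) -> List[float]:
    """Return the energies that would be reported, best (most negative) first."""
    reported: List[float] = []
    for e in sorted(energies):
        if hit_number is not None and len(reported) >= hit_number:
            break  # -b limit reached
        if energy_cutoff is not None and e > energy_cutoff:
            break  # -e cut-off exceeded; all remaining hits are worse
        reported.append(e)
    return reported


# Hypothetical energies for four candidate hits.
print(select_hits([-10.0, -30.0, -20.0, -25.0], hit_number=3, energy_cutoff=-22.0))
```

Whichever limit is reached first ends the output, so a generous -b does not override a strict -e and vice versa.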
-p p-value_cutoff
Only hits with p-values not larger than p-value_cutoff are
reported. See also options -d and -s.
-f from,to
Forces all structures to have a helix spanning the query
positions given by from and to. The first base has position 1.
-m max_target_length
The maximum allowed length of a target sequence. The default
value is 2000. This option only has an effect if a target file
is given with the -t option (see below).
-n max_query_length
The maximum allowed length of a query sequence. The default
value is 30. This option only has an effect if a query file is
given with the -q option (see below).
-u iloop_upper_limit
The maximally allowed number of unpaired nucleotides on either
side of an internal loop.
-v bloop_upper_limit
The maximally allowed number of unpaired nucleotides in a bulge
loop.
-g (ps|png|jpg|all)
Produce a plot of the hybridisation, either in ps, png or jpg
format, or in all formats together. The plots are saved in
files whose names are created from the target and query names
("command_line" if given on the command line). This option only
works if the appropriate graphics libraries are present.
-t target_file
Each of the target sequences in target_file is hybridised with
each of the queries (which come either from the query_file or
from the single query given on the command line; see -q below).
The sequences in the target_file have to be in FASTA format,
i.e. one line starting with a > directly followed by a name,
then one or more lines with the sequence itself. No individual
sequence line may have more than 1000 characters. If no -t is
given, either the last argument (if a -q is given) or the
second-to-last argument (if no -q is given) to RNAhybrid is
taken as the target.
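The FASTA requirements for target files can be checked before a run with a small Python sketch such as the one below. It enforces the 1000-character line limit and a maximum sequence length corresponding to -m. Raising an error on violation is a choice of this sketch; this page does not say how RNAhybrid itself reacts. The function name read_fasta is hypothetical.

```python
from typing import Dict, Iterable, Optional

MAX_LINE = 1000  # per-line limit stated for the -t option


def read_fasta(lines: Iterable[str], max_length: int = 2000) -> Dict[str, str]:
    """Collect name -> sequence while checking line and sequence lengths."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # Header: the name directly follows the '>' character.
            name = line[1:].strip()
            records[name] = ""  # a repeated name would overwrite the earlier record
        elif line:
            if name is None:
                raise ValueError("sequence data before the first '>' header")
            if len(line) > MAX_LINE:
                raise ValueError(f"line longer than {MAX_LINE} characters in {name!r}")
            records[name] += line
            if len(records[name]) > max_length:
                raise ValueError(f"sequence {name!r} exceeds {max_length} characters")
    return records


# Hypothetical two-record input, given as a list of lines.
targets = read_fasta([">t1\n", "ACGU\n", "ACG\n", ">t2\n", "UUU\n"])
print(targets)
```

For query files, the same function could be called with max_length=30, the default stated for -n.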
-q query_file
See -t option above.
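To show how the options fit together, the following Python sketch assembles an argument list from the flags documented in the OPTIONS section. It only builds the list and does not run the program; the file names targets.fa and queries.fa and the helper name build_command are hypothetical.

```python
import shlex
from typing import List, Optional


def build_command(target_file: str, query_file: str, set_name: str,
                  energy_cutoff: Optional[float] = None,
                  hit_number: Optional[int] = None,
                  compact: bool = False) -> List[str]:
    """Build an RNAhybrid argument list from flags documented in this page."""
    argv = ["RNAhybrid", "-s", set_name]
    if hit_number is not None:
        argv += ["-b", str(hit_number)]
    if energy_cutoff is not None:
        argv += ["-e", str(energy_cutoff)]
    if compact:
        argv.append("-c")
    argv += ["-t", target_file, "-q", query_file]
    return argv


argv = build_command("targets.fa", "queries.fa", "3utr_fly",
                     energy_cutoff=-20, hit_number=5, compact=True)
print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run on a system where RNAhybrid is installed and on the PATH.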
REFERENCES
The energy parameters are taken from:
Mathews DH, Sabina J, Zuker M, Turner DH. "Expanded sequence
dependence of thermodynamic parameters improves prediction of RNA
secondary structure" J Mol Biol., 288 (5), pp 911-940, 1999
The graphical output uses code from the Vienna RNA package:
Hofacker IL. "Vienna RNA secondary structure server." Nucleic Acids
Research, 31 (13), pp 3429-3431, 2003
VERSION
This man page documents version 2.0 of RNAhybrid.
AUTHORS
Marc Rehmsmeier, Peter Steffen, Matthias Hoechsmann.
LIMITATIONS
Character-dependent energy values are only defined for [acgtuACGTU].
All other characters lead to values of zero in these cases.
BUGS
In suboptimal hits, dangling ends appear as Ns if they were in the
first or last hybridising position of a previous hit.
SEE ALSO
RNAcalibrate, RNAeffective