NAME

       SIBsim4 - align RNA sequences with a DNA sequence, allowing for introns

SYNOPSIS

       SIBsim4 [ options ] dna rna_db

DESCRIPTION

       SIBsim4 is  a  similarity-based  tool  for  aligning  a  collection  of
       expressed sequences (EST, mRNA) with a genomic DNA sequence.

       Launching  SIBsim4  without  any arguments will print the options list,
       along with their default values.

       SIBsim4 employs a BLAST-based technique to first  determine  the  basic
       matching blocks representing the "exon cores".  In this first stage, it
       detects all possible exact matches of W-mers (i.e., DNA words  of  size
       W)  between  the two sequences and extends them to maximal scoring gap-
       free segments.  In the second stage, the exon cores are  extended  into
       the   adjacent   as-yet-unmatched   fragments  using  greedy  alignment
       algorithms, and  heuristics  are  used  to  favor  configurations  that
       conform  to  the  splice-site  recognition  signals  (e.g.,  GT-AG). If
       necessary, the process is repeated with less  stringent  parameters  on
       the unmatched fragments.
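
       The first stage described above can be sketched as follows. This is
       a minimal illustration, not SIBsim4's actual code: the function names
       are hypothetical, and the stopping rule (give up once the running
       score falls a fixed amount below the best score seen) is an assumed
       reading of the X option. The defaults mirror the r, q and X options.

```python
from collections import defaultdict

def find_seeds(dna, rna, w):
    """Return (dna_pos, rna_pos) pairs where a W-mer matches exactly."""
    index = defaultdict(list)
    for i in range(len(dna) - w + 1):
        index[dna[i:i + w]].append(i)
    seeds = []
    for j in range(len(rna) - w + 1):
        for i in index.get(rna[j:j + w], ()):
            seeds.append((i, j))
    return seeds

def extend_one_way(dna, rna, i, j, step, match, mismatch, xdrop):
    """Gap-free extension from (i, j); return (best score gain, its length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(dna) and 0 <= j < len(rna):
        score += match if dna[i] == rna[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= xdrop:
            break  # assumed stopping rule: score dropped too far below the best
        i += step
        j += step
    return best, best_len

def extend_seed(dna, rna, i, j, w, match=1, mismatch=-5, xdrop=12):
    """Extend a W-mer hit both ways; return (dna_start, rna_start, length, score)."""
    right_gain, right_len = extend_one_way(dna, rna, i + w, j + w, 1,
                                           match, mismatch, xdrop)
    left_gain, left_len = extend_one_way(dna, rna, i - 1, j - 1, -1,
                                         match, mismatch, xdrop)
    score = w * match + left_gain + right_gain
    return (i - left_len, j - left_len, w + left_len + right_len, score)
```

       Each seed returned by find_seeds can be passed to extend_seed; the
       resulting segment plays the role of an "exon core" in the text above.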

       By default, SIBsim4 searches both strands and reports the best matches,
       measured by the number of matching nucleotides found in the  alignment.
       The  R  command  line  option can be used to restrict the search to one
       orientation (strand) only.

       Currently,  four  major  alignment  display  options   are   supported,
       controlled by the A option. By default, only the exon endpoints, overall
       similarity, and orientation of the introns are reported. An arrow sign
       ('->' or '<-') indicates the orientation of the intron. The sign '=='
       marks the absence from the alignment of a cDNA fragment starting at
       that position.

       In  the description below, the term MSP denotes a maximal scoring pair,
       that is, a pair of highly  similar  fragments  in  the  two  sequences,
       obtained  during  the  BLAST-like procedure by extending a W-mer hit by
       matches and perhaps a few mismatches.

OPTIONS

       -A <int>
              output format
                0: exon endpoints only
                1: alignment text
                3: both exon endpoints and alignment text
                4: both exon endpoints and alignment text with polyA info

              Note that 2 is unimplemented.

              Default value is 0.

       -C <int>
              MSP score threshold for the second pass.

              Default value is 12.

       -c <int>
              minimum score cutoff value.  Alignments which have scores  below
              this value are not reported.

              Default value is 50.

       -E <int>
              cutoff value.

              Default value is 3.

       -f <int>
              score  filter  in  percent.  When multiple hits are detected for
              the same RNA element, only those  having  a  score  within  this
              percentage  of  the  maximal  score  for  that  RNA  element are
              reported.  Setting this value to 0 disables  filtering  and  all
              hits  will be reported, provided their score is above the cutoff
              value specified through the c option.

              Default value is 75.
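
              A minimal sketch of how the c and f options could interact.
              It assumes that "within this percentage" means a score of at
              least f percent of the best score for the same RNA element;
              the function name is hypothetical and this is not SIBsim4's
              code.

```python
def filter_hits(scores, cutoff=50, filter_pct=75):
    """Keep hit scores that pass the -c cutoff and the -f percentage filter."""
    # Scores below the cutoff are never reported.
    passing = [s for s in scores if s >= cutoff]
    # A filter of 0 disables the percentage test.
    if not passing or filter_pct == 0:
        return passing
    best = max(passing)
    # Assumed reading: keep scores that reach filter_pct percent of the best.
    return [s for s in passing if s * 100 >= best * filter_pct]
```

              With the defaults, scores of 200, 160, 140 and 40 for one RNA
              element would leave 200 and 160 under this reading.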

       -g <int>
              join exons when the gaps on the genomic and RNA sequences have
              lengths that differ by at most this percentage.

              Default value is 10.
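
              One possible reading of this test, as a sketch only. The
              description does not say which length the percentage is
              relative to; the code below assumes the longer of the two
              gaps, and the function name is hypothetical.

```python
def gaps_are_similar(dna_gap, rna_gap, max_diff_pct=10):
    """True if the two gap lengths differ by at most max_diff_pct percent."""
    longer = max(dna_gap, rna_gap)
    if longer == 0:
        return True  # no gap on either sequence
    # Assumption: the difference is measured relative to the longer gap.
    return abs(dna_gap - rna_gap) * 100 <= longer * max_diff_pct
```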

       -H <int>
              report  chimeric  transcripts  when the best score is lower than
              this percentage of the overall  RNA  coverage  and  the  chimera
              score  is  greater  than  this  percentage  of the RNA length (0
              disables this report).

              Default value is 75.

       -I <int>
              window width in which to search for intron splicing.

              Default value is 6.

       -K <int>
              MSP score threshold for the first pass.

              Default value is 16.

       -L <str>
              a comma-separated list of forward splice-types.

              Default value is "GTAG,GCAG,GTAC,ATAC".
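
              A sketch of how such a list might be interpreted, assuming
              each four-letter entry is the first two intron bases (donor)
              followed by the last two (acceptor), as in GT-AG. The helper
              names are hypothetical. The reverse complement shows what a
              forward type looks like when read on the opposite strand.

```python
def parse_splice_types(spec="GTAG,GCAG,GTAC,ATAC"):
    """Split a comma-separated list into (donor, acceptor) pairs."""
    pairs = []
    for item in spec.split(","):
        item = item.strip().upper()
        if len(item) != 4:
            raise ValueError(f"expected a 4-letter splice type, got {item!r}")
        pairs.append((item[:2], item[2:]))
    return pairs

def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def intron_matches(dna, start, end, pairs):
    """True if dna[start:end] begins with a donor and ends with its acceptor."""
    intron = dna[start:end]
    if len(intron) < 4:
        return False
    return any(intron.startswith(d) and intron.endswith(a) for d, a in pairs)
```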

       -M <int>
              when scoring splice sites, evaluate the match within M nucleotides.

              Default value is 10.

       -o <int>
              when printing results, offset nt positions in  dna  sequence  by
              this amount.

              Default value is 0.

       -q <int>
              penalty for a nucleotide mismatch.

              Default value is -5.

       -R <int>
              direction of search
                0: search the '+' (direct) strand only
                1: search the '-' strand only
                2: search both strands

              Default value is 2.

       -r <int>
              reward for a nucleotide match.

              Default value is 1.

       -S <int>
              splice  site  indels search breadth.  While determining the best
              position of a splice site, SIBsim4 will evaluate adding at  most
              this  number  of  insertions  and deletions on the DNA strand on
              each side of the splice junction.

              Default value is 2.

       -s <int>
              split score in percent.  While linking MSPs, if two consecutive
              groups of exons appear as if they could be part of two different
              copies of the same gene, they will be tested to see if the score
              of  each  individual group relative to the best overall score is
              greater than this value.  If both groups have a  relative  score
              above this threshold they will be split.

              Default value is 75.
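
              The split test described above can be sketched as follows.
              The function name and arguments are hypothetical; only the
              comparison itself is taken from the description.

```python
def should_split(score_a, score_b, best_score, split_pct=75):
    """True if both exon groups score above split_pct percent of the best score."""
    if best_score <= 0:
        return False
    a_ok = score_a * 100 > best_score * split_pct
    b_ok = score_b * 100 > best_score * split_pct
    return a_ok and b_ok
```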

       -W <int>
              word size.

              Default value is 12.

       -X <int>
              value for terminating word extensions.

              Default value is 12.
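
EXAMPLES

       A small sketch that assembles a command line following the synopsis
       above. The file names genome.fa and ests.fa are placeholders, and
       the helper build_command is hypothetical; only the option letters
       come from this page.

```python
def build_command(dna, rna_db, **options):
    """Return an argv list of the form: SIBsim4 [options] dna rna_db."""
    argv = ["SIBsim4"]
    for letter, value in options.items():
        if len(letter) != 1:
            raise ValueError(f"option names are single letters, got {letter!r}")
        argv += [f"-{letter}", str(value)]
    return argv + [dna, rna_db]

# Exon endpoints plus alignment text, '+' strand only, score filter disabled.
cmd = build_command("genome.fa", "ests.fa", A=3, R=0, f=0)
print(" ".join(cmd))  # SIBsim4 -A 3 -R 0 -f 0 genome.fa ests.fa
```

       The resulting list can be handed to subprocess.run(cmd) on a system
       where SIBsim4 is installed.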