gsnap - Genomic Short-read Nucleotide Alignment Program

NAME

       gsnap - Genomic Short-read Nucleotide Alignment Program

SYNOPSIS

       gsnap -dDB [OPTION]... [QUERY]...

DESCRIPTION

       Align  the  sequences  QUERY  to the reference DB.  With no QUERY, read
       standard input.
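
       As an illustration of the synopsis, the sketch below assembles a
       gsnap argument list using only long options documented on this page
       (--db, --dir, --nthreads, --format).  The helper name
       build_gsnap_command, the database name "mygenome", and the query
       file names are hypothetical placeholders, not part of gsnap.

```python
import shlex


def build_gsnap_command(db, queries, genome_dir=None, nthreads=None, sam=False):
    """Assemble a gsnap argument list (hypothetical helper, not part of gsnap)."""
    argv = ["gsnap", f"--db={db}"]
    if genome_dir is not None:
        argv.append(f"--dir={genome_dir}")
    if nthreads is not None:
        argv.append(f"--nthreads={nthreads}")
    if sam:
        argv.append("--format=sam")
    # Query files come last; with none given, gsnap reads standard input.
    return argv + list(queries)


# Placeholder database and file names, for illustration only.
cmd = build_gsnap_command("mygenome", ["reads_1.txt", "reads_2.txt"],
                          nthreads=4, sam=True)
print(shlex.join(cmd))
```

       The list form can be handed to subprocess.run as-is, which avoids
       shell quoting problems in file names.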

OPTIONS

   Input options
       -D, --dir=directory
              Genome directory

       -d, --db=STRING
              Genome database

       -q, --part=INT/INT
              Process only the i-th out of every n sequences, e.g., 0/100 or
              99/100
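
              The selection rule can be pictured with the sketch below.  It
              assumes that i is zero-based (as the examples 0/100 and 99/100
              suggest) and that sequence k is selected when k modulo n equals
              i; the page does not state the exact rule, and the function
              names are hypothetical.

```python
def parse_part(spec):
    """Parse an 'i/n' string into (i, n); assumes 0 <= i < n."""
    i_text, n_text = spec.split("/")
    i, n = int(i_text), int(n_text)
    if n <= 0 or not 0 <= i < n:
        raise ValueError(f"invalid part specification: {spec!r}")
    return i, n


def select_part(sequences, spec):
    """Keep every sequence whose zero-based position k satisfies k % n == i."""
    i, n = parse_part(spec)
    return [seq for k, seq in enumerate(sequences) if k % n == i]
```

              Under this assumption, running the parts 0/n through (n-1)/n
              covers every input sequence exactly once, which is what makes
              the option useful for splitting a job across machines.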

       -c, --circular-input
              Circular-end data (paired reads are on same strand)

   Computation options
       Note: GSNAP has an ultrafast algorithm for calculating mismatches up to
       and including ((readlength+2)/12 - 2)  ("ultrafast  mismatches").   The
       program  will run fastest if max-mismatches (plus suboptimal-score)  is
       within that value.  Also, indels, especially end indels, take longer to
       compute, although the algorithm is still designed to be fast.
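
       The ultrafast level can be worked out for a given read length as in
       the sketch below.  It assumes that the division in
       ((readlength+2)/12 - 2) is integer division, which the page does not
       state explicitly; the function names are hypothetical.

```python
def ultrafast_level(readlength):
    """Ultrafast mismatch level, assuming integer division in the formula."""
    return (readlength + 2) // 12 - 2


def is_ultrafast(readlength, max_mismatches, suboptimal=0):
    """True if max-mismatches plus the suboptimal setting stays within the level."""
    return max_mismatches + suboptimal <= ultrafast_level(readlength)
```

       Under that assumption a 36-base read gives a level of 1 and a
       100-base read gives a level of 6, so longer reads tolerate more
       mismatches before leaving the fast path.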

       -B, --batch=INT
              Batch mode (0 = no pre-loading, 1 = pre-load only indices, 2
              (default) = pre-load both indices and genome)

       -m, --max-mismatches=FLOAT
              Maximum number of mismatches allowed (if not specified, then
              defaults to the ultrafast level of ((readlength+2)/12 - 2)).  If
              specified between 0.0 and 1.0, then treated as a fraction of
              each read length.  Otherwise, treated as an integral number of
              mismatches (including indel and splicing penalties).
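
              The three cases above (unspecified, fraction, integer) can be
              sketched as follows.  The boundary handling (strictly between
              0.0 and 1.0 counts as a fraction), the rounding down of the
              fractional result, and the integer division in the default are
              all assumptions made for illustration; the function name is
              hypothetical.

```python
def effective_max_mismatches(readlength, max_mismatches=None):
    """Resolve a --max-mismatches value into a mismatch count for one read."""
    if max_mismatches is None:
        # Default: the ultrafast level (integer division assumed).
        return (readlength + 2) // 12 - 2
    if 0.0 < max_mismatches < 1.0:
        # Fraction of the read length (rounding down is an assumption).
        return int(max_mismatches * readlength)
    # Otherwise an integral number of mismatches.
    return int(max_mismatches)
```

              For example, a value of 0.5 on a 40-base read resolves to 20
              under these assumptions, while a value of 3 stays 3 regardless
              of read length.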

       -i, --indel-penalty=INT
              Penalty for an indel (default 1000, essentially turning it off).
              Counts against mismatches allowed.  To find indels, make
              indel-penalty less than or equal to max-mismatches.  For 2-base
              reads, indel-penalty needs to be set somewhat high.

       -I, --indel-endlength=INT
              Minimum length at end required for indel alignments (default 3)

       -y, --max-middle-insertions=INT
              Maximum number of middle insertions allowed (default 9)

       -z, --max-middle-deletions=INT
              Maximum number of middle deletions allowed (default 30)

       -Y, --max-end-insertions=INT
              Maximum number of end insertions allowed (default 3)

       -Z, --max-end-deletions=INT
              Maximum number of end deletions allowed (default 6)

       -M, --suboptimal-score=INT
              Report suboptimal hits beyond best hit (default 0).  All hits
              scoring up to the best score plus suboptimal-score are reported.
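
              A minimal sketch of this filter follows.  It assumes that a
              hit's score is a mismatch-style penalty where lower is better,
              and that a hit is a (label, score) pair; neither detail is
              specified on this page, and the function name is hypothetical.

```python
def select_hits(hits, suboptimal_score=0):
    """Keep hits whose score is within suboptimal_score of the best score.

    hits is a list of (label, score) pairs; lower score is assumed better.
    """
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [hit for hit in hits if hit[1] <= best + suboptimal_score]
```

              With the default of 0 only hits tied for the best score
              survive; raising the value widens the window by that many
              score units.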

       -R, --masking=INT
              Masking of frequent/repetitive oligomers to avoid spending  time
              on non-unique or repetitive reads
               0  =  no  masking  (will  try  to find non-unique or repetitive
              matches)
               1 = mask frequent oligomers
               2 = mask frequent and repetitive oligomers (fastest) (default)
               3 = greedy frequent: mask frequent oligomers first, then try no
              masking if alignments not found
               4  =  greedy repetitive: mask frequent and repetitive oligomers
              first, then try no masking if alignments not found

       -T, --trim=INT
              Trim mismatches at ends (0 = no (default), 1 = yes)

       -2, --dibase
              Input is 2-base encoded (e.g., SOLiD), with database built
              previously using dibaseindex

       -C, --cmet
              Use database for methylcytosine experiments, built previously
              using cmetindex

       -V, --usesnps=STRING
              Use database  containing  known  SNPs  (in  <STRING>.iit,  built
              previously using snpindex) for tolerance to SNPs

       -g, --geneprob=STRING
              Use IIT file containing geneprob (in <STRING>.iit, of cumulative
              format >(count) (genomicpos)) to resolve ties

       -t, --nthreads=INT
              Number of worker threads

   Splicing options for RNA-Seq
       -s, --splicesites=STRING
              Look   for   splicing   involving   known   splice   sites   (in
              <STRING>.iit), at short or long distances

       -N, --novelsplicing=INT
              Look for novel splicing, not in known splice sites (if -s
              provided), within localsplicedist (-w flag) or with
              novelspliceprob (-x flag)

       -w, --localsplicedist=INT
              Definition of local novel splicing event (default 200000)

       -e, --local-splice-penalty=INT
              Penalty   for  a  local  splice  (default  2).   Counts  against
              mismatches allowed

       -E, --distant-splice-penalty=INT
              Penalty for  a  distant  splice  (default  3).   Counts  against
              mismatches allowed
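
              Because indel and splice penalties all count against the
              mismatches allowed, an alignment can be pictured as spending
              from a single budget.  The sketch below assumes the penalties
              add linearly, once per event, which this page implies but does
              not spell out; the defaults are the ones listed for
              --indel-penalty, --local-splice-penalty and
              --distant-splice-penalty, and the function name is
              hypothetical.

```python
def within_mismatch_budget(mismatches, max_mismatches, indels=0,
                           local_splices=0, distant_splices=0,
                           indel_penalty=1000, local_splice_penalty=2,
                           distant_splice_penalty=3):
    """True if mismatches plus all event penalties fit in max_mismatches.

    Assumes each indel or splice adds its penalty once (linear sum).
    """
    total = (mismatches
             + indels * indel_penalty
             + local_splices * local_splice_penalty
             + distant_splices * distant_splice_penalty)
    return total <= max_mismatches
```

              With the default indel penalty of 1000, any alignment
              containing an indel exceeds a realistic budget, which is why
              the indel penalty has to be lowered to max-mismatches or below
              before indels are found.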

       -k, --local-splice-endlength=INT
              Minimum  length  at  end  required  for local spliced alignments
              (default 15, min is 14)

       -K, --distant-splice-endlength=INT
              Minimum length at end required for  distant  spliced  alignments
              (default 16, min is 14)

       -J, --distant-splice-identity=FLOAT
              Minimum  identity at end required for distant spliced alignments
              (default 0.95)

   Options for paired-end reads
       -P, --pairmax=INT
              Max total  genomic  length  for  paired  reads  (default  1000).
              Should increase for RNA-Seq reads.

       -p, --pairlength=INT
              Expected paired-end length (default 200)

   Output options
       -n, --npaths=INT
              Maximum number of paths to print (default 100).

       -Q, --quiet-if-excessive
              If  more than maximum number of paths are found, then nothing is
              printed.
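
              How --npaths and --quiet-if-excessive interact can be sketched
              as below.  The sketch assumes that, without -Q, output is cut
              off after the first npaths paths; this page does not say which
              paths are kept, and the function name is hypothetical.

```python
def paths_to_print(paths, npaths=100, quiet_if_excessive=False):
    """Return the paths that would be printed under the two options."""
    if len(paths) > npaths:
        # Too many paths: print nothing with -Q, otherwise truncate (assumed).
        return [] if quiet_if_excessive else list(paths[:npaths])
    return list(paths)
```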

       -O, --ordered
              Print output in same order as input (relevant only if  there  is
              more than one worker thread)

       -S, --print-snps=INT
              Print detailed information about SNPs in reads (works only if -V
              also selected) (0=no (default), 1=positions and labels)

       -F, --failsonly
              Print only failed alignments, those with no results

       -f, --nofails
              Exclude printing of failed alignments

       -A, --format=STRING
              Another format type, other than default.  Currently implemented:
              sam

   Help options
       -v, --version
              Show version

       -?, --help
              Show this help message

ENVIRONMENT

       GMAPDB genome directory (equivalent to -D)

FILES

       ~/.gmaprc
              configuration file

AUTHOR

       Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

       Report bugs to Thomas Wu <twu@gene.com>.

COPYRIGHT

       Copyright 2005 Genentech, Inc. All rights reserved.
