NAME
gsnap - Genomic Short-read Nucleotide Alignment Program
SYNOPSIS
gsnap -dDB [OPTION]... [QUERY]...
DESCRIPTION
Align the sequences QUERY to the reference DB. With no QUERY, read
standard input.
OPTIONS
Input options
-D, --dir=directory
Genome directory
-d, --db=STRING
Genome database
-q, --part=INT/INT
Process only the i-th out of every n sequences (e.g., 0/100 or
99/100)
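The --part option can be read as a round-robin split of the input. The following is a minimal sketch, assuming that sequences are numbered from 0 in input order and that part i/n keeps every sequence whose index modulo n equals i. The helper `select_part` is hypothetical and not part of GSNAP.

```python
def select_part(reads, part):
    """Return the reads a hypothetical --part=i/n run would process.

    Assumes 0-based input order and round-robin assignment (index % n == i).
    """
    i_str, n_str = part.split("/")
    i, n = int(i_str), int(n_str)
    if not 0 <= i < n:
        raise ValueError("part must satisfy 0 <= i < n")
    return [read for index, read in enumerate(reads) if index % n == i]


reads = [f"read{k}" for k in range(10)]
print(select_part(reads, "0/3"))  # ['read0', 'read3', 'read6', 'read9']
```

Under this assumption, running all n parts (0/n through (n-1)/n) covers every sequence exactly once, which is what makes the option useful for splitting a job across machines.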
-c, --circular-input
Circular-end data (paired reads are on the same strand)
Computation options
Note: GSNAP has an ultrafast algorithm for calculating mismatches up to
and including ((readlength+2)/12 - 2) ("ultrafast mismatches"). The
program will run fastest if max-mismatches (plus suboptimal-score) is
within that value. Also, indels, especially end indels, take longer to
compute, although the algorithm is still designed to be fast.
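The ultrafast threshold can be computed directly from the read length. This is a minimal sketch, assuming C-style integer division in ((readlength+2)/12 - 2); for positive read lengths Python's `//` gives the same result. The names `ultrafast_mismatches` and `runs_at_ultrafast_speed` are hypothetical helpers, not GSNAP functions.

```python
def ultrafast_mismatches(readlength):
    """Ultrafast level ((readlength+2)/12 - 2), assuming integer division."""
    return (readlength + 2) // 12 - 2


def runs_at_ultrafast_speed(readlength, max_mismatches, suboptimal_score=0):
    """True if max-mismatches plus suboptimal-score stays within the ultrafast level."""
    return max_mismatches + suboptimal_score <= ultrafast_mismatches(readlength)


for length in (36, 50, 75, 100):
    print(length, ultrafast_mismatches(length))
# prints: 36 1 / 50 2 / 75 4 / 100 6 (one pair per line)
```

For example, 100-base reads give an ultrafast level of 6 under this reading, so -m 5 with -M 1 stays within it while -m 6 with -M 1 does not.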
-B, --batch=INT
Batch mode (0 = no pre-loading, 1 = pre-load only indices, 2
(default) = pre-load both indices and genome)
-m, --max-mismatches=FLOAT
Maximum number of mismatches allowed (if not specified, then
defaults to the ultrafast level of ((readlength+2)/12 - 2)). If
specified between 0.0 and 1.0, then treated as a fraction of
each read length. Otherwise, treated as an integral number of
mismatches (including indel and splicing penalties).
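The two ways of reading --max-mismatches can be sketched as follows. This is a minimal sketch under stated assumptions: values strictly between 0 and 1 are treated as fractions, and a fraction is converted by multiplying by the read length and rounding up. The manual does not state the rounding rule, and `resolve_max_mismatches` is a hypothetical helper.

```python
import math


def ultrafast_mismatches(readlength):
    # Ultrafast level ((readlength+2)/12 - 2), assuming integer division.
    return (readlength + 2) // 12 - 2


def resolve_max_mismatches(readlength, value=None):
    """Hypothetical helper: turn a --max-mismatches setting into a count."""
    if value is None:
        # Not specified: fall back to the ultrafast level.
        return ultrafast_mismatches(readlength)
    if 0.0 < value < 1.0:
        # Assumed: fraction of the read length, rounded up.
        return math.ceil(value * readlength)
    # Otherwise an integral number of mismatches (penalties count against it).
    return int(value)


print(resolve_max_mismatches(100))        # 6 (ultrafast default)
print(resolve_max_mismatches(36, 0.1))    # 4 (10% of 36 bases, rounded up)
print(resolve_max_mismatches(100, 3))     # 3
```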
-i, --indel-penalty=INT
Penalty for an indel (default 1000, essentially turning it off).
Counts against mismatches allowed. To find indels, make
indel-penalty less than or equal to max-mismatches. For 2-base
reads, indel-penalty needs to be set somewhat high.
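The statement that the indel penalty "counts against mismatches allowed" can be illustrated with a simple budget check. This is a minimal sketch, assuming each indel adds indel-penalty to the mismatch count and an alignment is kept when the total does not exceed max-mismatches; `alignment_cost` and `is_allowed` are hypothetical names.

```python
def alignment_cost(n_mismatches, n_indels=0, indel_penalty=1000):
    """Assumed scoring: each indel adds indel_penalty to the mismatch count."""
    return n_mismatches + n_indels * indel_penalty


def is_allowed(n_mismatches, n_indels, max_mismatches, indel_penalty=1000):
    """Keep the alignment only if its total cost fits the mismatch budget."""
    return alignment_cost(n_mismatches, n_indels, indel_penalty) <= max_mismatches


# With the default penalty of 1000, any indel exceeds a realistic mismatch limit.
print(is_allowed(1, 1, max_mismatches=5))                   # False
# Lowering the penalty to at most max-mismatches lets indel alignments through.
print(is_allowed(1, 1, max_mismatches=5, indel_penalty=2))  # True
```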
-I, --indel-endlength=INT
Minimum length at end required for indel alignments (default 3)
-y, --max-middle-insertions=INT
Maximum number of middle insertions allowed (default 9)
-z, --max-middle-deletions=INT
Maximum number of middle deletions allowed (default 30)
-Y, --max-end-insertions=INT
Maximum number of end insertions allowed (default 3)
-Z, --max-end-deletions=INT
Maximum number of end deletions allowed (default 6)
-M, --suboptimal-score=INT
Report suboptimal hits beyond best hit (default 0). All hits with
a score up to the best score plus suboptimal-score are reported.
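The suboptimal-score rule amounts to a filter relative to the best hit. This is a minimal sketch, assuming each hit is a (location, score) pair where a lower score (fewer mismatches plus penalties) is better; `report_hits` and the example locations are hypothetical.

```python
def report_hits(hits, suboptimal_score=0):
    """Keep hits whose score is within suboptimal_score of the best.

    Assumes each hit is a (location, score) pair and lower scores are better.
    """
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [hit for hit in hits if hit[1] <= best + suboptimal_score]


hits = [("chr1:100", 1), ("chr2:500", 2), ("chr3:900", 4)]
print(report_hits(hits))     # [('chr1:100', 1)]
print(report_hits(hits, 1))  # [('chr1:100', 1), ('chr2:500', 2)]
```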
-R, --masking=INT
Masking of frequent/repetitive oligomers to avoid spending time
on non-unique or repetitive reads
0 = no masking (will try to find non-unique or repetitive
matches)
1 = mask frequent oligomers
2 = mask frequent and repetitive oligomers (fastest) (default)
3 = greedy frequent: mask frequent oligomers first, then try no
masking if alignments not found
4 = greedy repetitive: mask frequent and repetitive oligomers
first, then try no masking if alignments not found
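The five masking levels combine two choices: which oligomers are masked on the first attempt, and whether an unmasked retry follows when that attempt finds nothing. This is a minimal sketch of that control flow; `align_with_masking`, the masking labels, and the toy aligner are hypothetical stand-ins for GSNAP's internal search.

```python
NO_MASK = "none"
FREQUENT = "frequent"
FREQUENT_AND_REPETITIVE = "frequent+repetitive"

# For each --masking value: masking used on the first attempt, and whether
# a second, unmasked attempt is made when the first finds no alignment.
MASKING_MODES = {
    0: (NO_MASK, False),
    1: (FREQUENT, False),
    2: (FREQUENT_AND_REPETITIVE, False),
    3: (FREQUENT, True),
    4: (FREQUENT_AND_REPETITIVE, True),
}


def align_with_masking(read, mode, align):
    """Apply a --masking strategy; `align(read, masking)` is a hypothetical aligner."""
    masking, greedy = MASKING_MODES[mode]
    hits = align(read, masking)
    if not hits and greedy:
        # Greedy modes retry without masking only when nothing was found.
        hits = align(read, NO_MASK)
    return hits


def toy_align(read, masking):
    # Toy aligner: pretend this read aligns only when nothing is masked.
    return ["chr1:100"] if masking == NO_MASK else []


print(align_with_masking("ACGT", 2, toy_align))  # []
print(align_with_masking("ACGT", 4, toy_align))  # ['chr1:100']
```

This makes the speed trade-off visible: modes 3 and 4 pay for a second search only on reads that the masked search could not place.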
-T, --trim=INT
Trim mismatches at ends (0 = no (default), 1 = yes)
-2, --dibase
Input is 2-base encoded (e.g., SOLiD), with the database built
previously using dibaseindex
-C, --cmet
Use database for methylcytosine experiments (built previously
using cmetindex)
-V, --usesnps=STRING
Use database containing known SNPs (in <STRING>.iit, built
previously using snpindex) for tolerance to SNPs
-g, --geneprob=STRING
Use IIT file containing geneprob (in <STRING>.iit, of cumulative
format >(count) (genomicpos)) to resolve ties
-t, --nthreads=INT
Number of worker threads
Splicing options for RNA-Seq
-s, --splicesites=STRING
Look for splicing involving known splice sites (in
<STRING>.iit), at short or long distances
-N, --novelsplicing=INT
Look for novel splicing not in known splice sites (if -s is
provided), within localsplicedist (-w flag) or with
novelspliceprob (-x flag)
-w, --localsplicedist=INT
Maximum distance defining a local novel splicing event (default
200000)
-e, --local-splice-penalty=INT
Penalty for a local splice (default 2). Counts against
mismatches allowed
-E, --distant-splice-penalty=INT
Penalty for a distant splice (default 3). Counts against
mismatches allowed
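The local and distant splice penalties, together with localsplicedist, can be sketched as a classification followed by the same mismatch-budget check used for indels. This is a minimal sketch, assuming a splice counts as local when its genomic span is at most localsplicedist and as distant otherwise; the manual does not say how spans exactly at the limit, or splices between chromosomes, are handled. The helper names are hypothetical.

```python
def splice_penalty(span, localsplicedist=200000,
                   local_penalty=2, distant_penalty=3):
    """Assumed rule: a splice spanning at most localsplicedist is local."""
    if span <= localsplicedist:
        return local_penalty
    return distant_penalty


def spliced_alignment_allowed(n_mismatches, spans, max_mismatches, **kwargs):
    """Splice penalties count against the mismatch budget."""
    total = n_mismatches + sum(splice_penalty(s, **kwargs) for s in spans)
    return total <= max_mismatches


print(splice_penalty(5000))                                         # 2
print(splice_penalty(1500000))                                      # 3
print(spliced_alignment_allowed(1, [5000], max_mismatches=3))       # True
print(spliced_alignment_allowed(1, [1500000], max_mismatches=3))    # False
```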
-k, --local-splice-endlength=INT
Minimum length at end required for local spliced alignments
(default 15, min is 14)
-K, --distant-splice-endlength=INT
Minimum length at end required for distant spliced alignments
(default 16, min is 14)
-J, --distant-splice-identity=FLOAT
Minimum identity at end required for distant spliced alignments
(default 0.95)
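The -K and -J thresholds for distant splices can be read as a length check and an identity check on each end of the read. This is a minimal sketch, assuming the end is compared as two equal-length, ungapped strings and that identity is the fraction of matching positions; `distant_splice_end_ok` is a hypothetical helper, not GSNAP's implementation.

```python
def distant_splice_end_ok(read_end, genome_end, min_endlength=16, min_identity=0.95):
    """Check one end of a distant spliced alignment against -K and -J.

    Assumes equal-length, ungapped strings; identity = matches / length.
    """
    if len(read_end) != len(genome_end):
        raise ValueError("ends must have equal length")
    if len(read_end) < min_endlength:
        return False
    matches = sum(a == b for a, b in zip(read_end, genome_end))
    return matches / len(read_end) >= min_identity


genome = "ACGTACGTACGTACGTACGT"                               # 20 bases
print(distant_splice_end_ok("ACGTACGTACGTACGTACGA", genome))  # True  (19/20 = 0.95)
print(distant_splice_end_ok("ACGTACGTACGTACGTACAA", genome))  # False (18/20 = 0.90)
print(distant_splice_end_ok(genome[:15], genome[:15]))        # False (shorter than 16)
```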
Options for paired-end reads
-P, --pairmax=INT
Max total genomic length for paired reads (default 1000).
Should be increased for RNA-Seq reads.
-p, --pairlength=INT
Expected paired-end length (default 200)
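The pairmax limit can be pictured as a cap on the genomic span covered by both mates, which is why spliced RNA-Seq pairs need a larger value. This is a minimal sketch, assuming 1-based inclusive coordinates on the same chromosome; `pair_span` and `within_pairmax` are hypothetical helpers.

```python
def pair_span(start1, end1, start2, end2):
    """Total genomic length covered by two mates (1-based, inclusive)."""
    return max(end1, end2) - min(start1, start2) + 1


def within_pairmax(start1, end1, start2, end2, pairmax=1000):
    """Assumed rule: a pair is acceptable if its total span is at most pairmax."""
    return pair_span(start1, end1, start2, end2) <= pairmax


print(pair_span(1000, 1035, 1165, 1200))                        # 201
print(within_pairmax(1000, 1035, 6000, 6035))                   # False (span 5036)
print(within_pairmax(1000, 1035, 6000, 6035, pairmax=10000))    # True
```

In the last two calls the mates sit on either side of a hypothetical 5 kb intron: the default limit rejects the pair, while a larger -P accepts it.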
Output options
-n, --npaths=INT
Maximum number of paths to print (default 100).
-Q, --quiet-if-excessive
If more than maximum number of paths are found, then nothing is
printed.
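The interaction of -n and -Q can be sketched as a single selection step. This is a minimal sketch, assuming that without -Q the first npaths paths are printed when more are found; the manual does not say which paths are kept in that case. `paths_to_print` is a hypothetical helper.

```python
def paths_to_print(paths, npaths=100, quiet_if_excessive=False):
    """Select which alignment paths to print under -n and -Q (assumed behavior)."""
    if len(paths) > npaths:
        # -Q: print nothing when the limit is exceeded; otherwise truncate.
        return [] if quiet_if_excessive else paths[:npaths]
    return paths


paths = ["path1", "path2", "path3", "path4", "path5"]
print(paths_to_print(paths, npaths=3))                           # ['path1', 'path2', 'path3']
print(paths_to_print(paths, npaths=3, quiet_if_excessive=True))  # []
```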
-O, --ordered
Print output in same order as input (relevant only if there is
more than one worker thread)
-S, --print-snps=INT
Print detailed information about SNPs in reads (works only if -V
is also selected) (0 = no (default), 1 = positions and labels)
-F, --failsonly
Print only failed alignments, those with no results
-f, --nofails
Exclude printing of failed alignments
-A, --format=STRING
Output format other than the default. Currently implemented:
sam
Help options
-v, --version
Show version
-?, --help
Show this help message
ENVIRONMENT
GMAPDB genome directory (equivalent to -D)
FILES
~/.gmaprc
configuration file
AUTHOR
Thomas D. Wu and Colin K. Watanabe
REPORTING BUGS
Report bugs to Thomas Wu <twu@gene.com>.
COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
SEE ALSO
gmap_setup(1), gmap(1)
http://research-pub.gene.com/gmap/