mummer - package for sequence alignment of multiple genomes

NAME

       mummer - package for sequence alignment of multiple genomes

SYNOPSIS

       mummer-annotate <gapfile> <datafile>
       combineMUMs <RefSequence> <MatchSequences> <GapsFile>
       delta-filter [options] <deltafile>
       dnadiff [options] <reference> <query>
       dnadiff [options] -d <deltafile>
       exact-tandems <file> <min-match-len>
       gaps
       mapview [options] <coordsfile> [UTRcoords] [CDScoords]
       mgaps [-d <DiagDiff>] [-f <DiagFactor>] [-l <MatchLen>] [-s <MaxSeparation>]
       mummer [options] <reference-file> <query-files>
       mummerplot [options] <matchfile>
       nucmer [options] <Reference> <Query>
       nucmer2xfig
       promer [options] <Reference> <Query>
       repeat-match [options] <genome-file>
       run-mummer1 <fastareference> <fastaquery> <prefix> [-r]
       run-mummer3 <fastareference> <multi-fastaquery> <prefix>
       show-aligns [options] <deltafile> <refID> <qryID>
       show-coords [options] <deltafile>
       show-snps [options] <deltafile>
       show-tiling [options] <deltafile>

       For show-aligns, input is the .delta output of either the "nucmer" or
       the "promer" program, passed on the command line. Output is to stdout
       and consists of all the alignments between the query and reference
       sequences identified on the command line.

       NOTE: show-aligns does no sorting by default, so the alignments are
       ordered as found in the <deltafile> input.

DESCRIPTION

OPTIONS

       All tools (except gaps) accept the -h, --help, -V and --version
       options as one would expect. The built-in help is thorough and makes
       these man pages largely redundant.

       combineMUMs
         Combines MUMs in <GapsFile> by extending matches off ends and
         between MUMs. <RefSequence> is a FASTA file of the reference
         sequence. <MatchSequences> is a multi-FASTA file of the sequences
         matched against the reference.

         -D      Only output to stdout the difference positions
                 and characters
         -n      Allow matches only between nucleotides, i.e., ACGTs
         -N num  Break matches at <num> or more consecutive non-ACGTs
         -q tag  Used to label query match
         -r tag  Used to label reference match
         -S      Output all differences in strings
         -t      Label query matches with query fasta header
         -v num  Set verbose level for extra output
         -W file Reset the default output filename witherrors.gaps
         -x      Don’t output .cover files
         -e      Set error-rate cutoff to e (e.g. 0.02 is two percent)

       dnadiff
         Runs a comparative analysis of two sequence sets using nucmer and
         its associated utilities with recommended parameters. See the MUMmer
         documentation for a more detailed description of the output.
         Produces the following output files:

           .report  - Summary of alignments, differences and SNPs
           .delta   - Standard nucmer alignment output
           .1delta  - 1-to-1 alignment from delta-filter -1
           .mdelta  - M-to-M alignment from delta-filter -m
           .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
           .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
           .snps    - SNPs from show-snps -rlTHC .1delta
           .rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
           .qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
           .unref   - Unaligned reference IDs and lengths (if applicable)
           .unqry   - Unaligned query IDs and lengths (if applicable)

       MANDATORY:
           reference       Set the input reference multi-FASTA filename
           query           Set the input query multi-FASTA filename
             or
           delta file      Unfiltered .delta alignment file from nucmer

       OPTIONS:
           -d|delta        Provide precomputed delta file for analysis
           -h
           --help          Display help information and exit
           -p|prefix       Set the prefix of the output files (default "out")
           -V
           --version       Display the version information and exit
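
       The dnadiff output list above can be turned into a small helper that
       predicts which files a run should leave behind. This is only a sketch:
       the function name dnadiff_outputs and the directory parameter are
       hypothetical, and it assumes each file is named <prefix><extension>
       with the default prefix "out" documented for -p.

```python
from pathlib import Path

# Extensions copied from the dnadiff output list in this man page.
DNADIFF_EXTENSIONS = (
    "report", "delta", "1delta", "mdelta", "1coords", "mcoords",
    "snps", "rdiff", "qdiff", "unref", "unqry",
)


def dnadiff_outputs(prefix="out", directory="."):
    """Return the paths dnadiff is documented to write for a given prefix.

    Assumption: files are named '<prefix>.<extension>'. The .unref and
    .unqry files are only written "if applicable", so they may be absent.
    """
    return [Path(directory) / f"{prefix}.{ext}" for ext in DNADIFF_EXTENSIONS]


if __name__ == "__main__":
    for path in dnadiff_outputs("run1"):
        status = "present" if path.exists() else "missing"
        print(f"{path}: {status}")
```

       Such a check is useful in pipelines that must confirm a dnadiff run
       finished before reading the .report or .snps files.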

       delta-filter
         -e float    For switches -g -r -q, keep repeats within e percent
                     of the best LIS score [0, 100], no repeats by default
         -g          Global alignment using length*identity weighted LIS.
                     For every reference-query pair, leave only the aligns
                     which form the longest mutually consistent set
         -h          Display help information
         -i float    Set the minimum alignment identity [0, 100], default 0
         -l int      Set the minimum alignment length, default 0
         -q          Query alignment using length*identity weighted LIS.
                     For each query, leave only the aligns which form the
                     longest consistent set for the query
         -r          Reference alignment using length*identity weighted LIS.
                     For each reference, leave only the aligns which form
                     the longest consistent set for the reference
         -u float    Set the minimum alignment uniqueness, i.e. percent of
                     the alignment matching to unique reference AND query
                     sequence [0, 100], default 0
         -o float    Set the maximum alignment overlap for -r and -q options
                     as a percent of the alignment length [0, 100],
                     default 100

         Reads a delta alignment file from either nucmer or promer and filters
       the alignments based on the command-line  switches,  leaving  only  the
       desired  alignments which are output to stdout in the same delta format
       as the input. For multiple switches, order of operations is as follows:
       -i  -l  -u  -q  -r  -g.  If  an  alignment  is  excluded by a preceding
       operation, it will be ignored by the succeeding operations.

         An important distinction between the -g option and the -r -q  options
       is  that  -g requires the alignments to be mutually consistent in their
       order, while the  -r  -q  options  are  not  required  to  be  mutually
       consistent  and  therefore  tolerate  translocations,  inversions, etc.
       Thus, -r provides a one-to-many, -q a many-to-one, -r -q  a  one-to-one
       local  mapping,  and  -g  a  one-to-one global mapping of reference and
       query bases respectively.
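
       As an illustration of the switches described above, the following
       sketch assembles a delta-filter command line and checks the documented
       [0, 100] ranges before anything is run. The function name
       delta_filter_argv and its keyword names are hypothetical; only the
       flags themselves come from this page. The flags are emitted in the
       documented order of operations purely for readability.

```python
def _percent(name, value):
    """Validate a value documented as lying in [0, 100]."""
    if not 0 <= value <= 100:
        raise ValueError(f"{name} must be within [0, 100], got {value}")
    return str(value)


def delta_filter_argv(deltafile, min_identity=None, min_length=None,
                      min_uniqueness=None, query=False, reference=False,
                      global_lis=False):
    """Build an argument list for delta-filter (hypothetical helper).

    Flags follow the documented evaluation order: -i -l -u -q -r -g.
    """
    argv = ["delta-filter"]
    if min_identity is not None:
        argv += ["-i", _percent("min_identity", min_identity)]
    if min_length is not None:
        if min_length < 0:
            raise ValueError("min_length must not be negative")
        argv += ["-l", str(min_length)]
    if min_uniqueness is not None:
        argv += ["-u", _percent("min_uniqueness", min_uniqueness)]
    if query:
        argv.append("-q")
    if reference:
        argv.append("-r")
    if global_lis:
        argv.append("-g")
    argv.append(deltafile)
    return argv


if __name__ == "__main__":
    # One-to-one local mapping (-r -q) of alignments with at least 90%
    # identity and a length of at least 1000.
    print(" ".join(delta_filter_argv("out.delta", min_identity=90,
                                     min_length=1000, query=True,
                                     reference=True)))
```

       The resulting list can be passed to subprocess.run with stdout
       redirected to a file, since delta-filter writes the filtered delta to
       stdout.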

       mapview
         -h
         --help     Display help information and exit
         -m|mag     Set the magnification at which the figure is rendered;
                    this is an option for fig2dev, which is used to generate
                    the PDF and PS files (default 1.0)
         -n|num     Set the number of output files used to partition the
                    output; this is to avoid generating files that are too
                    large to display (default 10)
         -p|prefix  Set the output file prefix
                    (default "PROMER_graph" or "NUCMER_graph")
         -v
         --verbose  Verbose logging of the processed files
         -V
         --version  Display the version information and exit
         -x1 coord  Set the lower coordinate bound of the display
         -x2 coord  Set the upper coordinate bound of the display
         -g|ref     If the input file is provided by ’mgaps’, set the
                    reference sequence ID (as it appears in the first column
                    of the UTR/CDS coords file)
         -I         Display the name of query sequences
         -Ir        Display the name of reference genes

       mummer
         Find and output (to stdout) the positions and length of all
         sufficiently long maximal matches of a substring in <query-file>
         and <reference-file>.

         -mum           compute maximal matches that are unique in both
                        sequences
         -mumcand       same as -mumreference
         -mumreference  compute maximal matches that are unique in the
                        reference-sequence but not necessarily in the
                        query-sequence (default)
         -maxmatch      compute all maximal matches regardless of their
                        uniqueness
         -n             match only the characters a, c, g, or t
                        they can be in upper or in lower case
         -l             set the minimum length of a match
                        if not set, the default value is 20
         -b             compute forward and reverse complement matches
         -r             only compute reverse complement matches
         -s             show the matching substrings
         -c             report the query-position of a reverse complement
                        match relative to the original query sequence
         -F             force 4 column output format regardless of the number
                        of reference sequence inputs
         -L             show the length of the query sequences on the header
                        line
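
       To make the notion of a "maximal match" concrete, here is a naive
       sketch that lists every exact match that cannot be extended to the
       left or to the right, roughly what -maxmatch reports on the forward
       strand. It is a quadratic-time illustration only and is not the
       algorithm mummer uses. The function name maximal_matches is
       hypothetical, positions are 0-based, and comparison is case-sensitive;
       these are assumptions of the sketch, not statements about mummer's
       output.

```python
def maximal_matches(reference, query, min_len=20):
    """Naively list maximal exact matches as (ref_pos, qry_pos, length).

    A match is kept only if it cannot be extended in either direction and
    is at least min_len characters long (20 mirrors the documented default
    of the -l option).
    """
    matches = []
    for i in range(len(reference)):
        for j in range(len(query)):
            if reference[i] != query[j]:
                continue
            # A start that could be extended to the left is not maximal.
            if i > 0 and j > 0 and reference[i - 1] == query[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            length = 0
            while (i + length < len(reference)
                   and j + length < len(query)
                   and reference[i + length] == query[j + length]):
                length += 1
            if length >= min_len:
                matches.append((i, j, length))
    return matches


if __name__ == "__main__":
    for ref_pos, qry_pos, length in maximal_matches("acgtacgt", "cgta", 3):
        print(ref_pos, qry_pos, length)
```

       Restricting the result to matches whose substring occurs once in the
       reference, or once in both sequences, would correspond to the idea
       behind -mumreference and -mum respectively.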

       nucmer
           nucmer generates nucleotide alignments between two multi-FASTA
           input files. Two output files are generated. The .cluster output
           file lists clusters of matches between each sequence. The .delta
           file lists the distance between insertions and deletions that
           produce maximal scoring alignments between each sequence.

       MANDATORY:
           Reference     Set the input reference multi-FASTA filename
           Query         Set the input query multi-FASTA filename

         --mum           Use anchor matches that are unique in both the
                         reference and query
         --mumcand       Same as --mumreference
         --mumreference  Use anchor matches that are unique in the reference
                         but not necessarily unique in the query (default
                         behavior)
         --maxmatch      Use all anchor matches regardless of their uniqueness

         -b|breaklen     Set the distance an alignment extension will attempt
                         to extend poor scoring regions before giving up
                         (default 200)
         -c|mincluster   Sets the minimum length of a cluster of matches
                         (default 65)
         --[no]delta     Toggle the creation of the delta file (default
                         --delta)
         --depend        Print the dependency information and exit
         -d|diagfactor   Set the clustering diagonal difference separation
                         factor (default 0.12)
         --[no]extend    Toggle the cluster extension step (default --extend)
         -f
         --forward       Use only the forward strand of the Query sequences
         -g|maxgap       Set the maximum gap between two adjacent matches in a
                         cluster (default 90)
         -h
         --help          Display help information and exit
         -l|minmatch     Set the minimum length of a single match (default 20)
         -o
         --coords        Automatically generate the original NUCmer1.1 coords
                         output file using the ’show-coords’ program
         --[no]optimize  Toggle alignment score optimization, i.e. if an
                         alignment extension reaches the end of a sequence,
                         it will backtrack to optimize the alignment score
                         instead of terminating the alignment at the end of
                         the sequence (default --optimize)
         -p|prefix       Set the prefix of the output files (default "out")
         -r
         --reverse       Use only the reverse complement of the Query
                         sequences
         --[no]simplify  Simplify alignments by removing shadowed clusters.
                         Turn this option off if aligning a sequence to itself
                         to look for repeats (default --simplify)

       promer
           promer generates amino acid alignments between two multi-FASTA DNA
           input files. Two output files are generated. The .cluster output
           file lists clusters of matches between each sequence. The .delta
           file lists the distance between insertions and deletions that
           produce maximal scoring alignments between each sequence. The DNA
           input is translated into all 6 reading frames in order to generate
           the output, but the output coordinates reference the original DNA
           input.
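
       The six-frame translation mentioned above can be sketched as follows.
       This is not promer's code: it uses the standard genetic code, marks
       codons containing anything other than A, C, G or T as "X", and numbers
       the frames +1..+3 and -1..-3 by offset into the forward strand and the
       reverse complement. That numbering and the function names are
       assumptions of the sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' is a stop."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[k:k + 3], "X")
                   for k in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Map frame number (+1..+3, -1..-3) to its amino acid string."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translation("ATGGCC").items()):
        print(f"{frame:+d} {protein}")
```

       Because matches are found in translated sequence, lengths such as
       -l, -c and -g below are measured in amino acids, while reported
       coordinates refer back to the DNA input.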

       MANDATORY:
           Reference     Set the input reference multi-FASTA DNA file
           Query         Set the input query multi-FASTA DNA file

         --mum           Use anchor matches that are unique in both the
                         reference and query
         --mumcand       Same as --mumreference
         --mumreference  Use anchor matches that are unique in the reference
                         but not necessarily unique in the query (default
                         behavior)
         --maxmatch      Use all anchor matches regardless of their uniqueness

         -b|breaklen     Set the distance an alignment extension will attempt
                         to extend poor scoring regions before giving up,
                         measured in amino acids (default 60)
         -c|mincluster   Sets the minimum length of a cluster of matches,
                         measured in amino acids (default 20)
         --[no]delta     Toggle the creation of the delta file (default
                         --delta)
         --depend        Print the dependency information and exit
         -d|diagfactor   Set the clustering diagonal difference separation
                         factor (default 0.11)
         --[no]extend    Toggle the cluster extension step (default --extend)
         -g|maxgap       Set the maximum gap between two adjacent matches in a
                         cluster, measured in amino acids (default 30)
         -l|minmatch     Set the minimum length of a single match, measured in
                         amino acids (default 6)
         -m|masklen      Set the maximum bookend masking length, measured in
                         amino acids (default 8)
         -o
         --coords        Automatically generate the original PROmer1.1
                         ".coords" output file using the "show-coords" program
         --[no]optimize  Toggle alignment score optimization, i.e. if an
                         alignment extension reaches the end of a sequence,
                         it will backtrack to optimize the alignment score
                         instead of terminating the alignment at the end of
                         the sequence (default --optimize)
         -p|prefix       Set the prefix of the output files (default "out")
         -x|matrix       Set the alignment matrix number to 1 [BLOSUM 45],
                         2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2)

       repeat-match
         Find all maximal exact matches in <genome-file>.

         -E    Use exhaustive (slow) search to find matches
         -f    Forward strand only, don’t use reverse complement
         -n #  Set minimum exact match length to #
         -t    Only output tandem repeats
         -V #  Set level of verbose (debugging) printing to #

       show-aligns
         -h      Display help information
         -q      Sort alignments by the query start coordinate
         -r      Sort alignments by the reference start coordinate
         -w int  Set the screen width - default is 60
         -x int  Set the matrix type - default is 2 (BLOSUM 62),
                 other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
                 note: only has effect on amino acid alignments

       show-coords
         -b          Merges overlapping alignments regardless of match dir
                     or frame and does not display any identity information.
         -B          Switch output to btab format
         -c          Include percent coverage information in the output
         -d          Display the alignment direction in the additional
                     FRM columns (default for promer)
         -g          Deprecated option. Please use ’delta-filter’ instead
         -h          Display help information
         -H          Do not print the output header
         -I float    Set minimum percent identity to display
         -k          Knockout (do not display) alignments that overlap
                     another alignment in a different frame by more than 50%
                     of their length, AND have a smaller percent similarity
                     or are less than 75% of the size of the other alignment
                     (promer only)
         -l          Include the sequence length information in the output
         -L long     Set minimum alignment length to display
         -o          Annotate maximal alignments between two sequences, i.e.
                     overlaps between reference and query sequences
         -q          Sort output lines by query IDs and coordinates
         -r          Sort output lines by reference IDs and coordinates
         -T          Switch output to tab-delimited format

         Input is the .delta output of either the  "nucmer"  or  the  "promer"
       program passed on the command line.

         Output  is  to stdout, and consists of a list of coordinates, percent
       identity, and other useful information  regarding  the  alignment  data
       contained in the .delta file used as input.

         NOTE: No sorting is done by default, therefore the alignments will be
       ordered as found in the <deltafile> input.

       show-snps
         -C            Do not report SNPs from alignments with an ambiguous
                       mapping, i.e. only report SNPs where the [R] and [Q]
                       columns equal 0 and do not output these columns
         -h            Display help information
         -H            Do not print the output header
         -I            Do not report indels
         -l            Include sequence length information in the output
         -q            Sort output lines by query IDs and SNP positions
         -r            Sort output lines by reference IDs and SNP positions
         -S            Specify which alignments to report by passing
                       ’show-coords’ lines to stdin
         -T            Switch to tab-delimited format
         -x int        Include x characters of surrounding SNP context in the
                       output, default 0

         Input is the .delta output of either the  nucmer  or  promer  program
       passed on the command line.

         Output  is  to  stdout, and consists of a list of SNPs (or amino acid
       substitutions for promer) with positions and other useful info.  Output
       will  be  sorted  with  -r by default and the [BUFF] column will always
       refer to the sequence whose positions  have  been  sorted.  This  value
       specifies  the  distance  from this SNP to the nearest mismatch (end of
       alignment, indel, SNP, etc) in the same  alignment,  while  the  [DIST]
       column  specifies  the  distance  from this SNP to the nearest sequence
       end. SNPs for which the [R] and [Q] columns are greater than  0  should
       be evaluated with caution, as these columns specify the number of other
       alignments which overlap this position. Use -C to ensure SNPs are only
       reported from unique alignment regions.

       show-tiling
         -a          Describe the tiling path by printing the tab-delimited
                     alignment region coordinates to stdout
         -c          Assume the reference sequences are circular, and allow
                     tiled contigs to span the origin
         -g int      Set maximum gap between clustered alignments
                     [-1, INT_MAX]
                     A value of -1 will represent infinity
                     (nucmer default = 1000)
                     (promer default = -1)
         -i float    Set minimum percent identity to tile [0.0, 100.0]
                     (nucmer default = 90.0)
                     (promer default = 55.0)
         -l int      Set minimum length contig to report [-1, INT_MAX]
                     A value of -1 will represent infinity
                     (common default = 1)
         -p file     Output a pseudo molecule of the query contigs to ’file’
         -R          Deal with repetitive contigs by randomly placing them
                     in one of their copy locations (implies -V 0)
         -t file     Output a TIGR style contig list of each query sequence
                     that sufficiently matches the reference (non-circular)
         -u file     Output the tab-delimited alignment region coordinates
                     of the unusable contigs to ’file’
         -v float    Set minimum contig coverage to tile [0.0, 100.0]
                     (nucmer default = 95.0) sum of individual alignments
                     (promer default = 50.0) extent of syntenic region
         -V float    Set minimum contig coverage difference [0.0, 100.0]
                     i.e. the difference needed to determine one alignment
                     is ’better’ than another alignment
                     (nucmer default = 10.0) sum of individual alignments
                     (promer default = 30.0) extent of syntenic region
         -x          Describe the tiling path by printing the XML contig
                     linking information to stdout

         Input is the .delta output of the nucmer program, run on very similar
       sequence  data,  or  the  .delta  output  of the promer program, run on
       divergent sequence data.

         Output is to stdout, and consists of the predicted location  of  each
       aligning  query  contig  as  mapped  to the reference sequences.  These
       coordinates reference the extent of the entire query contig, even  when
       only  a  certain  percentage of the contig was actually aligned (unless
       the -a option is used). Columns are, start in ref, end in ref, distance
       to  next  contig,  length of this contig, alignment coverage, identity,
       orientation, and ID respectively.
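
       The column order listed above maps naturally onto a small record
       type. The sketch below assumes that each data line holds exactly
       eight whitespace-separated fields in that order; the class and
       function names are hypothetical, and the sample line is made up for
       illustration rather than taken from real show-tiling output.

```python
from dataclasses import dataclass


@dataclass
class TilingRow:
    ref_start: int       # start in ref
    ref_end: int         # end in ref
    gap_to_next: int     # distance to next contig
    contig_length: int   # length of this contig
    coverage: float      # alignment coverage
    identity: float      # identity
    orientation: str     # orientation
    contig_id: str       # ID


def parse_tiling_line(line):
    """Parse one data line, assuming eight whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, found {len(fields)}")
    start, end, gap, length, coverage, identity, orientation, cid = fields
    return TilingRow(int(start), int(end), int(gap), int(length),
                     float(coverage), float(identity), orientation, cid)


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    row = parse_tiling_line("1\t5000\t120\t5000\t98.50\t99.10\t+\tcontig_7")
    print(row.contig_id, row.ref_start, row.ref_end, row.identity)
```

       Lines that do not have eight fields, such as any header lines, are
       rejected with ValueError and should be filtered out by the caller.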

AUTHOR

       mummer was written by S. Kurtz, A. Phillippy, A.L. Delcher,  M.  Smoot,
       M. Shumway, C. Antonescu, and S.L. Salzberg.

                                 May 21, 2005
