gmap_setup - create a genome database for GMAP or GSNAP

NAME

       gmap_setup - create a genome database for GMAP or GSNAP

SYNOPSIS

       gmap_setup -d <genome> [-D <destdir>] [-o <Makefile>] [options] <fasta_file>...

OPTIONS

       -d     genome name

       -D     destination  directory  for  installation  (defaults  to  gmapdb
              directory specified at configure time)

       -o     name of output Makefile (default is "Makefile.<genome>")

       -M     use coordinates from an .md file (e.g., seq_contig.md file  from
              NCBI)

       -C     try to parse chromosomal coordinates from each FASTA header

       -E     interpret  argument  as  a  command,  instead of a list of FASTA
              files

       -O     order chromosomes in numeric/alphabetic order (0 = no, 1  =  yes
              (default))

   Advanced options
       -W     write  some  output  directly to file, instead of using RAM (use
              only if RAM is limited)

       -q     GMAP indexing interval (default: 3 nt)

       -Q     PMAP indexing interval (default: 6 aa)

DESCRIPTION

       If you want to treat each FASTA entry as a separate chromosome  (either
       because  it is in fact an entire chromosome or because you have contigs
       without any chromosomal information), you can  simply  call  gmap_setup
       like this:

              gmap_setup -d <genome> <fasta_file>...

       The  accession  of each FASTA header (the word following each ">") will
       be the name of each chromosome.  GMAP can handle an unlimited number of
       "chromosomes", with arbitrarily long names.  In this way, GMAP could be
       used as a general search program for near-identity  matches  against  a
       FASTA file.
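
       Before building a database, it can help to preview the names that the
       chromosomes would receive. The sketch below is not part of gmap_setup;
       fasta_accessions is a hypothetical helper that prints the first word
       after each ">" in the given FASTA files, on the assumption that this
       matches the accession rule described above.

```shell
# Hypothetical helper (not part of gmap_setup): print the accession of
# every FASTA header, i.e. the first word following each ">".
# Reads the files given as arguments, or standard input if none are given.
fasta_accessions() {
    sed -n 's/^>[[:space:]]*\([^[:space:]]*\).*/\1/p' "$@"
}
```

       Piping the output through "sort | uniq -d" lists any accession that
       appears more than once.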

       -M and -C
              If   your   sequences   represent   contigs  that  have  mapping
              information to specific chromosomal regions, then you  can  have
              gmap_setup  try to read each header to determine its chromosomal
              region  (the  -C  flag)  or  read  an  .md  file  that  contains
              information about chromosomal regions (the -M flag). The .md
              files are often provided in NCBI releases, but because their
              format changes often, gmap_setup will prompt you to confirm that
              it is parsing the file correctly.

       -E     If you need to pre-process the FASTA files  before  using  these
              programs,  perhaps  because  they  are compressed or because you
              need to insert chromosomal information in the header lines,  you
              can  specify  a  command  instead  of multiple fasta_files, like
              these examples:

               gmap_setup -d <genome> -E 'gunzip -c genomefiles.gz'
               gmap_setup -d <genome> -E 'cat *.fa | ./add-chromosomal-info.pl'
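
              Since the argument to -E is an arbitrary command, it may be
              worth checking that the command really emits FASTA before
              handing it to gmap_setup. The following is a minimal sketch,
              not a gmap_setup feature; emits_fasta is a hypothetical helper
              that runs the command with sh and tests only the first line of
              its output.

```shell
# Hypothetical pre-flight check (not a gmap_setup feature): succeed only if
# the first line printed by the given command starts with ">".
emits_fasta() {
    first_line=$(sh -c "$1" | head -n 1)
    case $first_line in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

              For example, running emits_fasta 'gunzip -c genomefiles.gz'
              and checking its exit status shows whether the decompressed
              stream begins with a header line.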

       -W     The  gmap_setup  process  works best if you have a computer with
              enough RAM to hold the entire genome (e.g., 3  gigabytes  for  a
              human- or mouse-sized genome).  Since the resulting genome files
              work across all machine architectures, you can find any  machine
              with  sufficient RAM to build the genome files and then transfer
              the files  to  another  machine.   (GMAP  itself  runs  fine  on
              machines with limited RAM.)  If you cannot find any machine with
              sufficient RAM for gmap_setup, you can run the program with  the
              -W  flag to write the files directly, but this can be very slow.
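
              To judge whether a machine has enough RAM, a rough estimate of
              the genome size can be taken from the FASTA files themselves.
              The sketch below assumes about one byte of RAM per base, which
              is only an approximation suggested by the 3-gigabyte figure
              above; fasta_base_count is a hypothetical helper, not part of
              gmap_setup.

```shell
# Hypothetical helper: count sequence characters (everything outside the
# ">" header lines) in one or more FASTA files. Whitespace is not counted.
fasta_base_count() {
    cat "$@" | grep -v '^>' | tr -d ' \t\r\n' | wc -c | tr -d ' '
}
```

              Comparing the printed count with the installed memory gives a
              first indication of whether the -W flag is likely to be needed.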

       -q and -Q
              If you specify a smaller interval (for example, 3 for the GMAP
              interval), you can create a higher-resolution database, which
              can be useful for mapping small oligomers (shorter than 18 nt).
              However, the corresponding genome index files will be larger
              (twice as big if you specify -q 3). These index files may
              exceed the 2-gigabyte file offset limit on some computers and
              will then fail to work on those computers.
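
              After building, the generated files can be checked against the
              2-gigabyte limit. This page does not list the index file names,
              so the sketch below takes whatever paths you pass in;
              files_over_limit and TWO_GB_LIMIT are hypothetical names, and
              the limit is assumed to be the largest signed 32-bit offset.

```shell
# Assumed limit: the largest offset a signed 32-bit value can hold.
TWO_GB_LIMIT=2147483647

# Hypothetical helper: print each given regular file whose size in bytes
# exceeds the limit. Usage: files_over_limit <limit_bytes> <file>...
files_over_limit() {
    limit=$1
    shift
    find "$@" -maxdepth 0 -type f -size +"${limit}c"
}
```

              Any path printed by files_over_limit "$TWO_GB_LIMIT" followed
              by the files in the destination directory is a candidate for
              failing on machines with that offset limit.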

AUTHOR

       Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

       Report bugs to Thomas Wu <twu@gene.com>.

COPYRIGHT

       Copyright 2005 Genentech, Inc. All rights reserved.
