NAME

       sindex - index a sequence database for sfetch

SYNOPSIS

       sindex [options] seqfile1 [seqfile2...]

DESCRIPTION

       sindex indexes one or more sequence files for later retrieval by
       sfetch. An SSI ("squid sequence index") file is created in the same
       directory as the sequence files. By default, this file is called
       <seqfile>.ssi.

       If there is more than one sequence file on the command  line,  the  SSI
       filename will be constructed from the last sequence file name. This may
       not be what you want; see the -o option to specify your  own  name  for
       the SSI file.
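
       The naming rule above can be sketched in Python. This is an
       illustration only, not part of sindex: the helper names
       default_ssi_path and build_sindex_command are hypothetical, and the
       sketch assumes that "<seqfile>.ssi" means the sequence file name with
       ".ssi" appended. It only builds an argument list and does not run
       sindex.

```python
from pathlib import Path


def default_ssi_path(seqfiles):
    """Default index path as described above: last seqfile name + '.ssi'.

    Assumption: '.ssi' is appended to the full file name, not swapped in
    for an existing extension.
    """
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    return Path(str(seqfiles[-1]) + ".ssi")


def build_sindex_command(seqfiles, outfile=None):
    """Build an argv list for sindex, using -o when an explicit name is given."""
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    cmd = ["sindex"]
    if outfile is not None:
        # -o names the SSI file explicitly instead of relying on the default.
        cmd += ["-o", str(outfile)]
    cmd += [str(f) for f in seqfiles]
    return cmd
```

       With several input files, passing outfile (for example a hypothetical
       "all.ssi") avoids an index named after whichever file happens to come
       last on the command line.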

       sindex can index large files (>2 GB) if optional LFS support was
       enabled at compile time. See the INSTALL instructions that came with
       the source distribution.

OPTIONS

       -h     Print  brief  help;  includes  version number and summary of all
              options, including expert options.

       -o <ssi outfile>
              Direct the SSI index to a file named <ssi outfile>. By
              default, the SSI file is written to <seqfile>.ssi.

EXPERT OPTIONS

       --64   Force the SSI file into 64-bit (large seqfile) mode, even if the
              seqfile is small. You  don’t  want  to  do  this  unless  you’re
              debugging.

       --external
              Force  sindex  to  do  its  record sorting by external (on-disk)
              sorting. This is only useful for debugging, too.

       --informat <s>
              Specify that the sequence file is definitely in format <s>;
              this disables sequence file format autodetection. It is
              useful in automated pipelines because it improves robustness
              (autodetection can occasionally go wrong on a perversely
              malformed file). Common examples include genbank, embl, gcg,
              pir, stockholm, clustal, msf, and phylip; see the printed
              documentation for a complete list of accepted format names.
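
              As a sketch of the pipeline use described above, a Python
              wrapper might always pass --informat so that autodetection is
              never relied on. The function name index_with_format and its
              injectable run parameter are assumptions made for this
              illustration; only the sindex option itself comes from this
              page.

```python
import subprocess


def index_with_format(seqfile, fmt, run=subprocess.run):
    """Index one sequence file with an explicitly declared format.

    `run` defaults to subprocess.run; it is injectable so that the command
    can be inspected or tested without sindex being installed.
    """
    cmd = ["sindex", "--informat", fmt, str(seqfile)]
    # check=True makes subprocess.run raise CalledProcessError when sindex
    # exits with a non-zero status, so a pipeline stops on failure.
    run(cmd, check=True)
    return cmd
```

              A call such as index_with_format("seqs.gb", "genbank"), where
              "seqs.gb" is a hypothetical file name, would run sindex with
              autodetection disabled for that file.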

       --pfamseq
              A hack for Pfam; indexes a FASTA file  that  is  known  to  have
              identifier   lines  in  format  ">[name]  [accession]  [optional
              description]". Normally only the sequence name would be  indexed
              as  a  primary key in a FASTA SSI file, but this allows indexing
              both the name (as a primary key) and accession (as  a  secondary
              key).
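
              The identifier-line layout named above can be illustrated
              with a short Python sketch. This is not sindex's own parser;
              parse_pfamseq_header is a hypothetical helper that only shows
              which field would serve as the primary key (name) and which
              as the secondary key (accession), assuming the fields are
              separated by whitespace.

```python
def parse_pfamseq_header(line):
    """Split '>[name] [accession] [optional description]' into its fields.

    Returns (name, accession, description); description is '' when absent.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA identifier line")
    # Split on whitespace at most twice so the description stays intact.
    fields = line[1:].strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected at least a name and an accession")
    name, accession = fields[0], fields[1]
    description = fields[2] if len(fields) == 3 else ""
    return name, accession, description
```

              For a made-up line such as ">NAME_X ACC1 example protein",
              the name NAME_X corresponds to the primary key and ACC1 to
              the secondary key.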

SEE ALSO

       afetch(1),   alistat(1),   compalign(1),   compstruct(1),   revcomp(1),
       seqsplit(1),   seqstat(1),   sfetch(1),    shuffle(1),    sreformat(1),
       stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003
       HHMI/Washington University School of Medicine. Freely distributed
       under the GNU General Public License (GPL). See COPYING in the source
       code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu