NAME
sindex - index a sequence database for sfetch
SYNOPSIS
sindex [options] seqfile1 [seqfile2...]
DESCRIPTION
sindex indexes one or more seqfiles for future sequence retrievals by
sfetch. An SSI ("squid sequence index") file is created in the same
directory as the sequence files. By default, this file is called
<seqfile>.ssi.
If there is more than one sequence file on the command line, the SSI
filename will be constructed from the last sequence file name. This may
not be what you want; see the -o option to specify your own name for
the SSI file.
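The naming rule above can be sketched in a few lines of Python. This is
only an illustration of the documented behavior, not code taken from
sindex itself; the function name default_ssi_name and the example file
names are hypothetical.

```python
def default_ssi_name(seqfiles, outfile=None):
    """Return the SSI file name implied by the rules described above.

    seqfiles: sequence file names in command-line order.
    outfile:  name given with -o, or None if -o was not used.
    """
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    if outfile is not None:
        # An explicit -o name overrides the default.
        return outfile
    # Default: <seqfile>.ssi, built from the LAST file on the command line.
    return seqfiles[-1] + ".ssi"


if __name__ == "__main__":
    print(default_ssi_name(["part1.fa", "part2.fa"]))
    print(default_ssi_name(["part1.fa", "part2.fa"], outfile="all.ssi"))
```

With two input files and no -o, the sketch names the index after the
second file, which is why giving an explicit name is usually the safer
choice when indexing several files at once.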
sindex can index large files (>2 GB) if optional large file support
(LFS) was enabled at compile time. See the INSTALL instructions that
came with Biosquid.
OPTIONS
-h Print brief help; includes version number and summary of all
options, including expert options.
-o <outfile>
Direct the SSI index to a file named <outfile>. By default, the
SSI file goes to <seqfile>.ssi.
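For scripted use, the command line from the SYNOPSIS can be assembled
as an argument list. The sketch below is a hypothetical helper, not
part of the package: it only uses the -o option described above and
the --informat option described under EXPERT OPTIONS, and the file
names are made up for illustration.

```python
def build_sindex_command(seqfiles, outfile=None, informat=None):
    """Build an argv list of the form: sindex [options] seqfile1 [seqfile2...]"""
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    cmd = ["sindex"]
    if informat is not None:
        # Name the input format explicitly instead of relying on autodetection.
        cmd += ["--informat", informat]
    if outfile is not None:
        # Choose the SSI file name rather than taking the default.
        cmd += ["-o", outfile]
    # Options come first, followed by the sequence files.
    cmd += list(seqfiles)
    return cmd


if __name__ == "__main__":
    print(build_sindex_command(["part1.gb", "part2.gb"],
                               outfile="all.ssi", informat="genbank"))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on
a system where sindex is installed; passing a list rather than a shell
string avoids quoting problems with unusual file names.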
EXPERT OPTIONS
--64 Force the SSI file into 64-bit (large seqfile) mode, even if the
seqfile is small. You don’t want to do this unless you’re
debugging.
--external
Force sindex to sort its records externally (on disk). Like --64,
this is only useful for debugging.
--informat <s>
Specify that the sequence file is definitely in format <s>;
this disables sequence file format autodetection. This is useful
in automated pipelines, because it improves robustness
(autodetection can occasionally go wrong on a badly malformed
file). Common examples include genbank, embl, gcg, pir,
stockholm, clustal, msf, and phylip; see the printed
documentation for a complete list of accepted format names.
--pfamseq
A hack for Pfam; indexes a FASTA file that is known to have
identifier lines in format ">[name] [accession] [optional
description]". Normally only the sequence name would be indexed
as a primary key in a FASTA SSI file, but this allows indexing
both the name (as a primary key) and accession (as a secondary
key).
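To make the identifier line format used by --pfamseq concrete, here is
a minimal Python sketch that splits such a line into its parts. It
illustrates the format only and is not how sindex is implemented; the
function name parse_pfamseq_header and the sample line are
hypothetical.

```python
def parse_pfamseq_header(line):
    """Split '>[name] [accession] [optional description]' into its parts.

    Returns (name, accession, description); description is '' if absent.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA identifier line")
    # Split on whitespace at most twice so the description stays intact.
    fields = line[1:].strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected at least a name and an accession")
    name, accession = fields[0], fields[1]
    description = fields[2] if len(fields) > 2 else ""
    return name, accession, description


if __name__ == "__main__":
    name, acc, desc = parse_pfamseq_header(">SEQ1_TEST ACC00001 example protein\n")
    # name would serve as the primary key, acc as the secondary key.
    print(name, acc, desc)
```

In this sketch the first field plays the role of the primary key and
the second the role of the secondary key, mirroring the description of
--pfamseq above.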
SEE ALSO
afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1),
seqsplit(1), seqstat(1), sfetch(1), shuffle(1), sreformat(1),
stranslate(1), weight(1).
AUTHOR
Biosquid and its documentation are Copyright (C) 1992-2003
HHMI/Washington University School of Medicine. Freely distributed under
the GNU General Public License (GPL). See COPYING in the source code
distribution for more details, or contact me.
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX : 1-314-362-2157
Email: eddy@genetics.wustl.edu