NAME

       sfetch - get a sequence from a flatfile database.

SYNOPSIS

       sfetch [options] seqname

DESCRIPTION

       sfetch retrieves the sequence named seqname from a sequence database.

       The database to search is selected with the -d option (for "little
       databases") or the -D option (for "big databases").  The directory
       location of a "big database" is given by an environment variable,
       such as $SWDIR for Swissprot or $GBDIR for Genbank (see -D for the
       complete list).  A complete file path must be specified for a
       "little database".  If neither option is specified and the name
       looks like a Swissprot identifier (e.g. it has a _ character),
       sfetch uses the $SWDIR environment variable and attempts to
       retrieve the sequence seqname from Swissprot.
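
       The selection rules above can be sketched roughly as follows.  This
       is an illustrative Python outline only, not sfetch's actual code:
       the function name choose_source and its return values are
       hypothetical, giving -d precedence over -D is an assumption, and the
       table of codes is copied from the -D option description.

```python
import os

# Database codes and their environment variables, as listed under -D.
BIG_DATABASES = {
    "sw": "SWDIR",
    "pir": "PIRDIR",
    "em": "EMBLDIR",
    "gb": "GBDIR",
    "wp": "WORMDIR",
    "owl": "OWLDIR",
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return a (kind, location) pair saying where seqname would be looked up.

    Hypothetical helper that mirrors the documented rules only.
    """
    env = os.environ if env is None else env
    if seqfile is not None:
        # -d <seqfile>: a "little database" given by its file path.
        # Assumption: -d wins if both -d and -D are supplied.
        return ("file", seqfile)
    if dbcode is not None:
        # -D <database>: directory taken from the matching environment variable.
        var = BIG_DATABASES[dbcode]  # raises KeyError for an unrecognized code
        return ("directory", env.get(var))
    if "_" in seqname:
        # Name looks like a Swissprot identifier: fall back to $SWDIR.
        return ("directory", env.get("SWDIR"))
    # No documented rule applies.
    return (None, None)
```

       For example, choose_source("CALM_HUMAN", env={"SWDIR": "/db/sw"})
       falls through to the Swissprot default because the name contains an
       underscore; the path /db/sw is a made-up value.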

       A  variety  of  other  options  are  available which allow retrieval of
       subsequences (-f,-t); retrieval by accession number instead of by  name
       (-a);  reformatting  the  extracted  sequence  into  a variety of other
       formats (-F); etc.

       If the database has been SSI indexed, sequence retrieval will be
       extremely efficient; otherwise, retrieval may be painfully slow (the
       entire database may have to be read into memory to find seqname).
       SSI indexing is recommended for all large or permanent databases.
       The program sindex creates SSI indexes for any sequence file.

       sfetch was originally named getseq, and was renamed because it  clashed
       with a GCG program of the same name.

OPTIONS

       -a     Interpret seqname as an accession number, not an identifier.

       -d <seqfile>
              Retrieve the sequence from a sequence file named <seqfile>.  If
              an SSI index <seqfile>.ssi exists, it is used to speed up the
              retrieval.

       -f <from>
              Extract a subsequence starting from position <from>, rather than
              from 1. See -t.  If <from> is greater than <to> (as specified by
              the -t option), the sequence is extracted as its reverse
              complement (it is assumed to be a nucleic acid sequence).

       -h     Print brief help; includes version number  and  summary  of  all
              options, including expert options.

       -o <outfile>
              Direct the output to a file named <outfile>.  By default, output
              goes to stdout.

       -r <newname>
              Rename the sequence to <newname> in the output after extraction.
              By default, the original sequence identifier is retained.
              Useful, for instance, when retrieving a sequence fragment: the
              coordinates of the fragment can be added to the name (this is
              what Pfam does).

       -t <to>
              Extract a subsequence that ends at position <to>, rather than at
              the end of the sequence. See -f.  If <to> is less than <from>
              (as specified by the -f option), the sequence is extracted as
              its reverse complement (it is assumed to be a nucleic acid
              sequence).

       -D <database>
              Retrieve the sequence from  the  main  sequence  database  coded
              <database>. For each code, there is an environment variable that
              specifies the directory path to that database.  Recognized codes
              and   their   corresponding   environment   variables  are  -Dsw
              (Swissprot,  $SWDIR);  -Dpir   (PIR,   $PIRDIR);   -Dem   (EMBL,
              $EMBLDIR); -Dgb (Genbank, $GBDIR); -Dwp (Wormpep, $WORMDIR); and
              -Dowl (OWL, $OWLDIR).  Each  database  is  read  in  its  native
              flatfile format.

       -F <format>
              Reformat  the  extracted  sequence into a different format.  (By
              default, the sequence is extracted from the database in the same
              format  as  the  database.)  Available  formats are embl, fasta,
              genbank, gcg, strider, zuker, ig, pir, squid, and raw.
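
       The coordinate behaviour described for -f and -t above can be
       illustrated with a short Python sketch.  It is not taken from the
       sfetch source: the function name extract is hypothetical, treating
       both end positions as inclusive is an assumption, and only DNA
       letters A, C, G and T are complemented (other characters are passed
       through unchanged).

```python
# Complement table for DNA; characters not listed (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(seq, start=None, end=None):
    """Return positions start..end of seq, counting from 1, both ends included.

    start defaults to 1 (no -f given); end defaults to len(seq) (no -t given).
    If start > end, the region end..start is returned reverse-complemented.
    """
    start = 1 if start is None else start
    end = len(seq) if end is None else end
    if start <= end:
        return seq[start - 1:end]
    region = seq[end - 1:start]
    # Complement each base, then reverse the string.
    return region.translate(_COMPLEMENT)[::-1]


print(extract("AACCGGTT", 2, 4))  # forward region: ACC
print(extract("AACCGGTT", 4, 2))  # same region, reverse complement: GGT
```

       Swapping the two coordinates selects the same region but on the
       opposite strand, which is why the reverse-complement case only
       makes sense for nucleic acid sequences.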

EXPERT OPTIONS

       --informat <s>
              Specify that the sequence file is in format <s>, rather than the
              default  FASTA  format.   Common examples include Genbank, EMBL,
              GCG, PIR, Stockholm, Clustal, MSF, or PHYLIP;  see  the  printed
              documentation  for  a  complete  list  of accepted format names.
              This option overrides the default  format  (FASTA)  and  the  -B
              Babelfish autodetection option.

SEE ALSO

       afetch(1),   alistat(1),   compalign(1),   compstruct(1),   revcomp(1),
       seqsplit(1),   seqstat(1),   shuffle(1),    sindex(1),    sreformat(1),
       stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003
       HHMI/Washington University School of Medicine.  Freely distributed
       under the GNU General Public License (GPL).  See COPYING in the
       source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu