hmmpfam - search one or more sequences against an HMM database

NAME

       hmmpfam - search one or more sequences against an HMM database

SYNOPSIS

       hmmpfam [options] hmmfile seqfile

DESCRIPTION

       hmmpfam reads a sequence file seqfile and compares each sequence in it,
       one  at  a  time,  against  all  the  HMMs  in  hmmfile   looking   for
       significantly similar sequence matches.

       hmmfile is looked for first in the current working directory, then in
       the directory named by the environment variable HMMERDB.  This lets
       administrators install HMM libraries such as Pfam in a common
       location.
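
       The lookup order described above can be sketched in Python.  This is
       only an illustration of the documented order, not HMMER's own code:
       the helper name find_hmmfile is hypothetical, and the sketch assumes
       HMMERDB names a single directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hmmfile(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing path for `name`, or None if it is not found.

    Hypothetical helper that mirrors the documented search order only.
    """
    env = os.environ if env is None else env

    # 1. Try the name relative to the current working directory.
    candidate = Path(name)
    if candidate.is_file():
        return candidate

    # 2. Fall back to the directory named by HMMERDB, if it is set.
    #    Assumption: HMMERDB holds exactly one directory.
    hmmerdb = env.get("HMMERDB")
    if hmmerdb:
        candidate = Path(hmmerdb) / name
        if candidate.is_file():
            return candidate

    return None
```

       Passing an explicit mapping for env, rather than reading os.environ,
       makes the lookup easy to exercise without changing the real
       environment.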

       There is a separate output report for each sequence in  seqfile.   This
       report  consists  of  three sections: a ranked list of the best scoring
       HMMs, a list of the best scoring domains in order of  their  occurrence
       in  the  sequence,  and alignments for all the best scoring domains.  A
       sequence score may be higher than a domain score for the same  sequence
       if  there  is  more than one domain in the sequence; the sequence score
       takes into account all the domains.  All HMMs that pass the -E and -T
       cutoffs are shown in the first list, then every domain found for the
       HMMs in this list is shown in the second list of domain hits.  If
       desired, E-value and bit score thresholds may also be applied to the
       domain list using the --domE and --domT options.
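
       The two-stage filtering can be sketched as follows.  The Hit and
       Domain records and the build_report function are hypothetical names
       used for illustration; sorting the first list by descending bit
       score and treating the cutoffs as inclusive are assumptions, not
       statements about HMMER's implementation.

```python
from dataclasses import dataclass
from math import inf
from typing import List, Tuple


@dataclass
class Hit:
    hmm: str       # HMM name
    score: float   # per-sequence bit score
    evalue: float  # per-sequence E-value


@dataclass
class Domain:
    hmm: str       # HMM this domain belongs to
    start: int     # start position in the query sequence
    score: float   # per-domain bit score
    evalue: float  # per-domain E-value


def build_report(
    hits: List[Hit],
    domains: List[Domain],
    E: float = 10.0,     # -E default from this page
    T: float = -inf,     # -T default from this page
    domE: float = inf,   # --domE default from this page
    domT: float = -inf,  # --domT default from this page
) -> Tuple[List[Hit], List[Domain]]:
    # First list: HMMs passing both per-sequence cutoffs.
    # Assumption: ranked by descending bit score, cutoffs inclusive.
    ranked = sorted(
        (h for h in hits if h.evalue <= E and h.score >= T),
        key=lambda h: h.score,
        reverse=True,
    )
    shown = {h.hmm for h in ranked}

    # Second list: domains of the HMMs shown above, optionally thinned by
    # the per-domain cutoffs, in order of occurrence in the sequence.
    domain_list = sorted(
        (d for d in domains
         if d.hmm in shown and d.evalue <= domE and d.score >= domT),
        key=lambda d: d.start,
    )
    return ranked, domain_list
```

       With the defaults, every domain of a reported HMM appears in the
       second list; tightening domE or domT removes domains from the second
       list only.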

OPTIONS

       -h     Print brief help; includes version number  and  summary  of  all
              options, including expert options.

       -n     Specify that models and sequences are nucleic acid, not protein.
              Other HMMER programs autodetect this, but because of the order
              in which hmmpfam accesses data, it can’t reliably determine the
              correct "alphabet" by itself.

       -A <n> Limits the alignment output to the  <n>  best  scoring  domains.
              -A0 shuts off the alignment output and can be used to reduce the
              size of output files.

       -E <x> Set the E-value cutoff for the per-sequence ranked hit  list  to
              <x>,  where  <x> is a positive real number. The default is 10.0.
              Hits with E-values better than (less than) this  threshold  will
              be shown.

       -T <x> Set the bit score cutoff for the per-sequence ranked hit list to
              <x>, where <x> is  a  real  number.   The  default  is  negative
              infinity; by default, the threshold is controlled by E-value and
              not by bit score.  Hits with bit  scores  better  than  (greater
              than) this threshold will be shown.

       -Z <n> Calculate  the  E-value  scores  as  if  we  had seen a sequence
              database of <n> sequences. The default  is  arbitrarily  set  to
              59021, the size of Swissprot 34.
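
       As a rough illustration of what -Z changes: an E-value is the
       number of hits expected by chance at or above a given score, which
       scales linearly with the assumed database size.  The sketch below
       assumes the E-value is the per-comparison P-value multiplied by
       <n>; the function name evalue is hypothetical.

```python
DEFAULT_Z = 59021  # default database size stated on this page


def evalue(pvalue: float, z: int = DEFAULT_Z) -> float:
    """Scale a P-value to an E-value for an assumed database of z sequences."""
    if not 0.0 <= pvalue <= 1.0:
        raise ValueError("pvalue must lie in [0, 1]")
    if z <= 0:
        raise ValueError("z must be a positive number of sequences")
    # Assumption: E-value = P-value * database size.
    return pvalue * z
```

       Under this assumption, the same bit score yields a larger (worse)
       E-value as <n> grows, so a hit that passes -E with a small <n> may
       fail it with a large one.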

EXPERT OPTIONS

       --acc  Report  HMM  accessions  instead of names in the output reports.
              Useful for high-throughput annotation, where the data are  being
              parsed for storage in a relational database.

       --compat
              Use  the  output  format  of  HMMER  2.1.1, the 1998-2001 public
              release; provided so 2.1.1 parsers don’t have to be rewritten.

       --cpu <n>
              Sets the maximum number of CPUs that the program  will  run  on.
              The  default  is  to  use all CPUs in the machine. Overrides the
              HMMER_NCPU environment variable. Only affects threaded  versions
              of HMMER (the default on most systems).

       --cut_ga
              Use Pfam GA (gathering threshold) score cutoffs.  Equivalent to
              -T <GA1> --domT <GA2>, but the GA1 and GA2 cutoffs are read
              from each HMM in hmmfile individually.  hmmbuild puts these
              cutoffs there if the alignment file was annotated in a
              Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional GA annotation line was present.  If
              these cutoffs are not set in the HMM file, --cut_ga doesn’t
              work.

       --cut_tc
              Use Pfam TC (trusted cutoff) score cutoffs.  Equivalent to
              -T <TC1> --domT <TC2>, but the TC1 and TC2 cutoffs are read
              from each HMM in hmmfile individually.  hmmbuild puts these
              cutoffs there if the alignment file was annotated in a
              Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional TC annotation line was present.  If
              these cutoffs are not set in the HMM file, --cut_tc doesn’t
              work.

       --cut_nc
              Use Pfam NC (noise cutoff) score cutoffs.  Equivalent to
              -T <NC1> --domT <NC2>, but the NC1 and NC2 cutoffs are read
              from each HMM in hmmfile individually.  hmmbuild puts these
              cutoffs there if the alignment file was annotated in a
              Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional NC annotation line was present.  If
              these cutoffs are not set in the HMM file, --cut_nc doesn’t
              work.

       --domE <x>
              Set  the  E-value  cutoff  for the per-domain ranked hit list to
              <x>, where <x> is  a  positive  real  number.   The  default  is
              infinity;  by  default, all domains in the sequences that passed
              the first threshold will be reported in the second list, so that
              the  number  of  domains  reported  in  the per-sequence list is
              consistent with the number that appear in the per-domain list.

       --domT <x>
              Set the bit score cutoff for the per-domain ranked hit  list  to
              <x>,  where  <x>  is  a  real  number.  The  default is negative
              infinity; by default, all domains in the sequences  that  passed
              the first threshold will be reported in the second list, so that
              the number of domains  reported  in  the  per-sequence  list  is
              consistent  with  the number that appear in the per-domain list.
              Important note: only one domain in a sequence is absolutely
              controlled by this parameter, or by --domE.  The second and
              subsequent domains in a sequence have a de facto bit score
              threshold of 0 because of the details of how HMMER works.
              HMMER requires at least one pass through the main model per
              sequence; to do more than one pass (more than one domain) the
              multidomain alignment must have a better score than the single
              domain alignment, and hence the extra domains must contribute
              positive score.  See the User’s Guide for more detail.

       --forward
              Use the Forward algorithm instead of the  Viterbi  algorithm  to
              determine  the  per-sequence scores. Per-domain scores are still
              determined by the  Viterbi  algorithm.  Some  have  argued  that
              Forward  is  a  more  sensitive  algorithm  for detecting remote
              sequence  homologues;  my  experiments  with  HMMER   have   not
              confirmed this, however.

       --informat <s>
              Assert that the input seqfile is in format <s>; do not run
              Babelfish format autodetection.  This increases the reliability
              of the program somewhat, because the Babelfish can make
              mistakes; particularly recommended for unattended,
              high-throughput runs of HMMER.  Valid format strings include
              FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF, CLUSTAL,
              and PHYLIP.  See the User’s Guide for a complete list.

       --null2
              Turn  off  the  post  hoc  second  null  model. By default, each
              alignment is rescored by a postprocessing step that  takes  into
              account  possible  biased  composition  in either the HMM or the
              target sequence.  This is almost essential in database searches,
              especially  with  local  alignment models. There is a very small
              chance that this postprocessing might remove real  matches,  and
              in these cases --null2 may improve sensitivity at the expense of
              reducing specificity by letting biased composition hits through.

       --pvm  Run on a Parallel Virtual Machine (PVM). The PVM must already be
              running. The client program hmmpfam-pvm must be installed on all
              the  PVM  nodes.  The HMM database hmmfile and an associated GSI
              index file hmmfile.gsi must also be installed  on  all  the  PVM
              nodes.   (The  GSI  index  is produced by the program hmmindex.)
              Because the PVM  implementation  is  I/O  bound,  it  is  highly
              recommended  that  each node have a local copy of hmmfile rather
              than NFS mounting a shared copy.  Optional PVM support must have
              been compiled into HMMER for --pvm to function.

       --xnu  Turn on XNU filtering of target protein sequences. Has no effect
              on nucleic acid sequences. In trial experiments,  --xnu  appears
              to perform less well than the default post hoc null2 model.
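
       For unattended, high-throughput runs, the options documented above
       are often assembled by a script.  The following Python sketch
       builds an argument list using only flags from this page (-E, --acc,
       --cpu, --informat); the function name build_hmmpfam_command and its
       parameter names are hypothetical, and the order in which options
       are emitted is an arbitrary choice.

```python
from typing import List, Optional


def build_hmmpfam_command(
    hmmfile: str,
    seqfile: str,
    evalue_cutoff: Optional[float] = None,  # maps to -E <x>
    use_accessions: bool = False,           # maps to --acc
    cpus: Optional[int] = None,             # maps to --cpu <n>
    informat: Optional[str] = None,         # maps to --informat <s>
) -> List[str]:
    """Assemble an hmmpfam argument list: options first, then hmmfile, seqfile."""
    cmd = ["hmmpfam"]
    if evalue_cutoff is not None:
        if evalue_cutoff <= 0:
            raise ValueError("-E expects a positive real number")
        cmd += ["-E", str(evalue_cutoff)]
    if use_accessions:
        cmd.append("--acc")
    if cpus is not None:
        if cpus < 1:
            raise ValueError("--cpu expects at least one CPU")
        cmd += ["--cpu", str(cpus)]
    if informat is not None:
        # Asserting the format skips format autodetection.
        cmd += ["--informat", informat]
    # Positional arguments follow the SYNOPSIS: hmmfile, then seqfile.
    cmd += [hmmfile, seqfile]
    return cmd
```

       The resulting list can be handed to subprocess.run(cmd, check=True)
       on a machine where hmmpfam is installed; passing a list rather than
       a shell string avoids quoting problems with file names.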

COPYRIGHT

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution
       conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/
