NAME
hmmpfam - search one or more sequences against an HMM database
SYNOPSIS
hmmpfam [options] hmmfile seqfile
DESCRIPTION
hmmpfam reads a sequence file seqfile and compares each sequence in it,
one at a time, against all the HMMs in hmmfile looking for
significantly similar sequence matches.
hmmfile will be looked for first in the current working directory, then
in a directory named by the environment variable HMMERDB. This lets
administrators install HMM library(s) such as Pfam in a common
location.
There is a separate output report for each sequence in seqfile. This
report consists of three sections: a ranked list of the best scoring
HMMs, a list of the best scoring domains in order of their occurrence
in the sequence, and alignments for all the best scoring domains. A
sequence score may be higher than a domain score for the same sequence
if there is more than one domain in the sequence; the sequence score
takes into account all the domains. All sequences scoring above the -E
and -T cutoffs are shown in the first list, then every domain found in
this list is shown in the second list of domain hits. If desired, E-
value and bit score thresholds may also be applied to the domain list
using the --domE and --domT options.
OPTIONS
-h Print brief help; includes version number and summary of all
options, including expert options.
-n Specify that models and sequence are nucleic acid, not protein.
Other HMMER programs autodetect this; but because of the order
in which hmmpfam accesses data, it can’t reliably determine the
correct "alphabet" by itself.
-A <n> Limits the alignment output to the <n> best scoring domains.
-A0 shuts off the alignment output and can be used to reduce the
size of output files.
-E <x> Set the E-value cutoff for the per-sequence ranked hit list to
<x>, where <x> is a positive real number. The default is 10.0.
Hits with E-values better than (less than) this threshold will
be shown.
-T <x> Set the bit score cutoff for the per-sequence ranked hit list to
<x>, where <x> is a real number. The default is negative
infinity; by default, the threshold is controlled by E-value and
not by bit score. Hits with bit scores better than (greater
than) this threshold will be shown.
-Z <n> Calculate the E-value scores as if we had seen a sequence
database of <n> sequences. The default is arbitrarily set to
59021, the size of Swissprot 34.
EXPERT OPTIONS
--acc Report HMM accessions instead of names in the output reports.
Useful for high-throughput annotation, where the data are being
parsed for storage in a relational database.
--compat
Use the output format of HMMER 2.1.1, the 1998-2001 public
release; provided so 2.1.1 parsers don’t have to be rewritten.
--cpu <n>
Sets the maximum number of CPUs that the program will run on.
The default is to use all CPUs in the machine. Overrides the
HMMER_NCPU environment variable. Only affects threaded versions
of HMMER (the default on most systems).
--cut_ga
Use Pfam GA (gathering threshold) score cutoffs. Equivalent to
--globT <GA1> --domT <GA2>, but the GA1 and GA2 cutoffs are read
from each HMM in hmmfile individually. hmmbuild puts these
cutoffs there if the alignment file was annotated in a Pfam-
friendly alignment format (extended SELEX or Stockholm format)
and the optional GA annotation line was present. If these
cutoffs are not set in the HMM file, --cut_ga doesn’t work.
--cut_tc
Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to
--globT <TC1> --domT <TC2>, but the TC1 and TC2 cutoffs are read
from each HMM in hmmfile individually. hmmbuild puts these
cutoffs there if the alignment file was annotated in a Pfam-
friendly alignment format (extended SELEX or Stockholm format)
and the optional TC annotation line was present. If these
cutoffs are not set in the HMM file, --cut_tc doesn’t work.
--cut_nc
Use Pfam NC (noise cutoff) score cutoffs. Equivalent to --globT
<NC1> --domT <NC2>, but the NC1 and NC2 cutoffs are read from
each HMM in hmmfile individually. hmmbuild puts these cutoffs
there if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and the
optional NC annotation line was present. If these cutoffs are
not set in the HMM file, --cut_nc doesn’t work.
--domE <x>
Set the E-value cutoff for the per-domain ranked hit list to
<x>, where <x> is a positive real number. The default is
infinity; by default, all domains in the sequences that passed
the first threshold will be reported in the second list, so that
the number of domains reported in the per-sequence list is
consistent with the number that appear in the per-domain list.
--domT <x>
Set the bit score cutoff for the per-domain ranked hit list to
<x>, where <x> is a real number. The default is negative
infinity; by default, all domains in the sequences that passed
the first threshold will be reported in the second list, so that
the number of domains reported in the per-sequence list is
consistent with the number that appear in the per-domain list.
Important note: only one domain in a sequence is absolutely
controlled by this parameter, or by --domT. The second and
subsequent domains in a sequence have a de facto bit score
threshold of 0 because of the details of how HMMER works. HMMER
requires at least one pass through the main model per sequence;
to do more than one pass (more than one domain) the multidomain
alignment must have a better score than the single domain
alignment, and hence the extra domains must contribute positive
score. See the Users’ Guide for more detail.
--forward
Use the Forward algorithm instead of the Viterbi algorithm to
determine the per-sequence scores. Per-domain scores are still
determined by the Viterbi algorithm. Some have argued that
Forward is a more sensitive algorithm for detecting remote
sequence homologues; my experiments with HMMER have not
confirmed this, however.
--informat <s>
Assert that the input seqfile is in format <s>; do not run
Babelfish format autodection. This increases the reliability of
the program somewhat, because the Babelfish can make mistakes;
particularly recommended for unattended, high-throughput runs of
HMMER. Valid format strings include FASTA, GENBANK, EMBL, GCG,
PIR, STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the User’s
Guide for a complete list.
--null2
Turn off the post hoc second null model. By default, each
alignment is rescored by a postprocessing step that takes into
account possible biased composition in either the HMM or the
target sequence. This is almost essential in database searches,
especially with local alignment models. There is a very small
chance that this postprocessing might remove real matches, and
in these cases --null2 may improve sensitivity at the expense of
reducing specificity by letting biased composition hits through.
--pvm Run on a Parallel Virtual Machine (PVM). The PVM must already be
running. The client program hmmpfam-pvm must be installed on all
the PVM nodes. The HMM database hmmfile and an associated GSI
index file hmmfile.gsi must also be installed on all the PVM
nodes. (The GSI index is produced by the program hmmindex.)
Because the PVM implementation is I/O bound, it is highly
recommended that each node have a local copy of hmmfile rather
than NFS mounting a shared copy. Optional PVM support must have
been compiled into HMMER for --pvm to function.
--xnu Turn on XNU filtering of target protein sequences. Has no effect
on nucleic acid sequences. In trial experiments, --xnu appears
to perform less well than the default post hoc null2 model.
SEE ALSO
Master man page, with full list of and guide to the individual man
pages: see hmmer(1).
For complete documentation, see the user guide that came with the
distribution (Userguide.pdf); or see the HMMER web page,
http://hmmer.wustl.edu/.
COPYRIGHT
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
Freely distributed under the GNU General Public License (GPL).
See the file COPYING in your distribution for details on redistribution
conditions.
AUTHOR
Sean Eddy
HHMI/Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
http://www.genetics.wustl.edu/eddy/