formatrpsdb - Build databases for RPS Blast
formatrpsdb [-] [-E N] [-G N] [-S X] [-U str] [-b] [-f X] -i filename
[-l filename] [-n str] [-o] [-t str] [-v N]
Formatrpsdb is a utility that converts a collection of input sequences
into a database suitable for use with Reverse Position Specific (RPS)
Blast. Each input sequence, together with its position-specific
scoring matrix (PSSM), is ASN.1 encoded into a PssmWithParameters (or
‘scoremat’) object and resides in a separate file. Scoremat objects
can be created using blastpgp. Formatrpsdb is given a list of these
files and produces the corresponding database.
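
The list file that formatrpsdb reads can be generated with a short script. The following is a minimal sketch, not part of the formatrpsdb distribution. It assumes that the list holds one scoremat filename per line and that the scoremat files share a common extension; the `.asn` extension and the helper name `write_scoremat_list` are hypothetical.

```python
from pathlib import Path


def write_scoremat_list(scoremat_dir, list_path, pattern="*.asn"):
    """Write the scoremat files found in scoremat_dir to list_path.

    Assumptions (not stated in this manual page): the list holds one
    filename per line, and scoremat files match `pattern`.
    """
    # Sort so that the order of entries is reproducible between runs.
    paths = sorted(Path(scoremat_dir).glob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


if __name__ == "__main__":
    found = write_scoremat_list("scoremats", "scoremats.list")
    print(f"listed {len(found)} scoremat file(s)")
```

The resulting file would then be passed to formatrpsdb with the -i option.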
Formatrpsdb is designed to perform the work of formatdb, makemat and
copymat simultaneously, without generating the large number of
intermediate files these utilities would need to create an RPS Blast
database. Further, scoremat objects are in more general use than the
binary format makemat requires. It is hoped that direct manipulation
of scoremat objects will encourage conversion of more diverse sequence
collections into RPS Blast databases.
Databases generated by formatrpsdb are binary compatible with databases
generated by formatdb/makemat/copymat, although the database files will
in general not be byte-for-byte identical.
A summary of options is included below.
- Print usage message
-E N The gap extension penalty (if not specified in the scoremat;
default = 1)
-G N The gap opening penalty (if not specified in the scoremat;
default = 11)
-S X For scoremats that contain only residue frequencies, the scaling
factor to apply when creating PSSMs (default = 100)
-U str Underlying score matrix (if not specified in the scoremat;
default = BLOSUM62)
-b Scoremat files are binary (vs. text) ASN.1
-f X Threshold for extending hits for RPS database (default = 11)
-i filename Input file containing list of ASN.1 Scoremat filenames
-l filename Log file name (default = formatrpsdb.log)
-n str Base name of output database (same as input file if not
specified)
-o Create index files for database
-t str Title for database file
-v N Database volume size in millions of letters (default = 0, which
means no limit)
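
As an illustration of how the options combine, the sketch below assembles a formatrpsdb argument list without running it. It uses only options listed above. The helper name `build_formatrpsdb_args` and the example file names are hypothetical, and passing -b and -o as bare flags follows the synopsis; whether a given build also expects an explicit value for them is an assumption to check against the usage message.

```python
import shlex


def build_formatrpsdb_args(list_file, base_name=None, title=None,
                           binary=False, create_index=False,
                           volume_millions=None, log_file=None):
    """Return an argv list for formatrpsdb (hypothetical helper)."""
    args = ["formatrpsdb", "-i", str(list_file)]  # -i is the only required option
    if base_name is not None:
        args += ["-n", base_name]                 # base name of output database
    if title is not None:
        args += ["-t", title]                     # title for database file
    if binary:
        args.append("-b")                         # scoremats are binary ASN.1
    if create_index:
        args.append("-o")                         # create index files
    if volume_millions is not None:
        args += ["-v", str(volume_millions)]      # volume size, millions of letters
    if log_file is not None:
        args += ["-l", str(log_file)]             # log file name
    return args


if __name__ == "__main__":
    argv = build_formatrpsdb_args("scoremats.list", base_name="mydb",
                                  title="My profiles", create_index=True)
    # Print a shell-quoted command line for inspection.
    print(shlex.join(argv))
```

If formatrpsdb is installed, the returned list could be handed to `subprocess.run(argv, check=True)`. Building a list rather than a single string avoids quoting problems with titles that contain spaces.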
The National Center for Biotechnology Information.
blast(1), copymat(1), formatdb(1), makemat(1),