
NAME

       alistat - show statistics for a multiple alignment file

SYNOPSIS

       alistat [options] alignfile

DESCRIPTION

       alistat reads a multiple sequence alignment from the file alignfile in
       any supported format (including SELEX, GCG MSF, and CLUSTAL), and shows
       a number of simple statistics about it. These statistics include the
       name of the format, the number of sequences, the total number of
       residues, the average and range of the sequence lengths, and the
       alignment length (i.e. including gap characters).

       Also shown are some percent identities. A  percent  pairwise  alignment
       identity  is  defined as (idents / MIN(len1, len2)) where idents is the
       number of exact identities and len1, len2 are the unaligned lengths  of
       the two sequences. The "average percent identity", "most related pair",
       and "most unrelated pair" of the alignment are  the  average,  maximum,
       and  minimum  of all (N)(N-1)/2 pairs, respectively.  The "most distant
       seq" is calculated by  finding  the  maximum  pairwise  identity  (best
       relative)  for  all  N  sequences,  then finding the minimum of these N
       numbers (hence, the most outlying sequence).
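
       The definitions above can be sketched in a few lines of Python. This
       is an illustration only, not alistat's source code: the function
       names, the set of gap characters, and the case-insensitive comparison
       are assumptions made for the example. Identities are returned as
       fractions, following the formula as written; multiply by 100 for a
       percentage.

```python
from itertools import combinations

# Assumption: characters treated as gaps; the man page does not list them.
GAP_CHARS = set("-._~")


def unaligned_length(row):
    """Number of residues in an aligned row, ignoring gap characters."""
    return sum(1 for c in row if c not in GAP_CHARS)


def pairwise_identity(a, b):
    """idents / MIN(len1, len2) for two aligned rows of equal width."""
    idents = sum(
        1
        for x, y in zip(a, b)
        if x not in GAP_CHARS and y not in GAP_CHARS and x.upper() == y.upper()
    )
    shorter = min(unaligned_length(a), unaligned_length(b))
    return idents / shorter if shorter else 0.0


def identity_summary(rows):
    """Average, maximum, and minimum over all N(N-1)/2 pairs, plus the
    'most distant seq' value: the minimum over sequences of each
    sequence's best (maximum) pairwise identity."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two sequences")
    best = [0.0] * n          # best relative found so far for each sequence
    pair_ids = []
    for i, j in combinations(range(n), 2):
        pid = pairwise_identity(rows[i], rows[j])
        pair_ids.append(pid)
        best[i] = max(best[i], pid)
        best[j] = max(best[j], pid)
    return {
        "average": sum(pair_ids) / len(pair_ids),
        "most_related": max(pair_ids),
        "most_unrelated": min(pair_ids),
        "most_distant": min(best),
    }
```

       For instance, identity_summary(["ACGT", "ACGA", "A-GT"]) compares
       three pairs. The second row's best relative has an identity of only
       0.75, so that row determines the "most distant seq" value.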

OPTIONS

       -a     Show additional verbose information: a table with one line per
              sequence showing its name, length, and highest and lowest
              pairwise identity. These lines are prefixed with a * character
              so that they can easily be extracted with grep and sorted. For
              example, alistat -a foo.slx | grep -F '*' | sort -n -k 4 gives
              a ranked list of the most distant sequences in the alignment.
              Incompatible with the -f option.

       -f     Fast; use a sampling method to estimate the average %id. When
              this option is chosen, alistat does not show the other three
              pairwise identity numbers. This option is useful for very large
              alignments, for which the full (N)(N-1)/2 calculation of all
              pairs would be prohibitive (e.g. Pfam’s GP120 alignment, with
              over 10,000 sequences). Incompatible with the -a option.
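
              The man page does not say which sampling method alistat uses.
              As a rough illustration of the general idea, the sketch below
              estimates the average identity by drawing random pairs
              uniformly; the function names, the default sample count, and
              the gap characters are all assumptions for the example, not
              alistat's actual behavior.

```python
import random

# Assumption: characters treated as gaps; the man page does not list them.
GAP_CHARS = set("-._~")


def pairwise_identity(a, b):
    """idents / MIN(len1, len2) for two aligned rows of equal width."""
    idents = sum(
        1
        for x, y in zip(a, b)
        if x not in GAP_CHARS and y not in GAP_CHARS and x.upper() == y.upper()
    )
    len_a = sum(1 for c in a if c not in GAP_CHARS)
    len_b = sum(1 for c in b if c not in GAP_CHARS)
    shorter = min(len_a, len_b)
    return idents / shorter if shorter else 0.0


def estimate_average_identity(rows, samples=1000, seed=None):
    """Estimate the average pairwise identity from randomly drawn pairs
    instead of visiting all N(N-1)/2 pairs."""
    if len(rows) < 2:
        raise ValueError("need at least two sequences")
    if samples < 1:
        raise ValueError("samples must be positive")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Two distinct row indices, chosen uniformly at random.
        i, j = rng.sample(range(len(rows)), 2)
        total += pairwise_identity(rows[i], rows[j])
    return total / samples
```

              The cost grows with the number of samples rather than with
              the square of the number of sequences, which is why only an
              average (and not the maximum, minimum, or most distant
              sequence) can be reported this way.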

       -h     Print  brief  help;  includes  version number and summary of all
              options, including expert options.

       -q     Be quiet; suppress the verbose header (program name, release
              number and date, the parameters and options in effect).

       -B     (Babelfish).  Autodetect  and  read a sequence file format other
              than the default (FASTA). Almost any common sequence file format
              is recognized (including Genbank, EMBL, SWISS-PROT, PIR, and GCG
              unaligned sequence formats, and Stockholm, GCG MSF, and  Clustal
              alignment formats). See the printed documentation for a complete
              list of supported formats.

EXPERT OPTIONS

       --informat <s>
              Specify that the sequence file is in format <s>, rather than the
              default  FASTA  format.   Common examples include Genbank, EMBL,
              GCG, PIR, Stockholm, Clustal, MSF, or PHYLIP;  see  the  printed
              documentation  for  a  complete  list  of accepted format names.
              This option overrides the default  format  (FASTA)  and  the  -B
              Babelfish autodetection option.

SEE ALSO

       afetch(1),   compalign(1),   compstruct(1),   revcomp(1),  seqsplit(1),
       seqstat(1),    sfetch(1),    shuffle(1),    sindex(1),    sreformat(1),
       stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003
       HHMI/Washington University School of Medicine. Freely distributed
       under the GNU General Public License (GPL). See COPYING in the source
       code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu