NAME

       parseblast  -  Filtering High-scoring Segment Pairs (HSPs) from WU/NCBI
       BLAST.

SYNOPSIS

       parseblast [options] <results.from.blast>

DESCRIPTION

       This manual page briefly documents the parseblast command.

       Several output options are available. The most important ones write
       HSPs in GFF format (GFFv1, GFFv2 or APLOT). Sequences can be included
       in the GFF records as a comment field. The script can also output the
       alignment for each HSP in ALN, MSF or tabular format.

       NOTE - If the first line of the BLAST output (the one stating which
       flavour was run: BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX) is
       missing, the program assumes that the file contains BLASTN HSP
       records. Make sure, therefore, that you feed parseblast a well-formed
       BLAST file. Some outputs, such as those from Web-Blast or
       Paracel-Blast, have no space between the HSP coordinates and the
       sequence. Those records are now processed correctly, and their HSPs
       are retrieved just like "standard" ones.

       WARNING - Frame fields in GFF records generated by parseblast contain
       the BLAST frame (".","1","2","3") instead of the GFF standard values
       (".","0","1","2"). Because the frame for the reverse strand must be
       recalculated from the original sequence length, we suggest
       post-processing the GFF output of this script with a suitable filter
       that fixes the frames, in case the program that will consume the GFF
       records does not work with the original BLAST frames. The
       command-line option "--no-frame" sets all frames to "." (meaning that
       there is no frame).
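
       A post-processing filter of the kind suggested above might look
       like the following Python sketch. It is not part of parseblast. It
       assumes whitespace-separated GFF records with the strand ("+" or
       "-") in the seventh field and the frame in the eighth, and it
       assumes that forward-strand frames map as 1,2,3 to 0,1,2, as the
       value lists above suggest. Reverse-strand frames are set to "."
       because the sequence length needed to recalculate them is not in
       the record. The name fix_frame is hypothetical.

```python
import re

# Six leading fields, then strand, separator, frame, and the rest of the line.
# Original separators are captured so the line is rebuilt unchanged around
# the frame field. (Field positions are an assumption, see the text above.)
GFF_FIELDS = re.compile(r"^((?:\S+\s+){6})(\S+)(\s+)(\S+)(.*)$", re.DOTALL)


def fix_frame(line):
    """Return the GFF line with a BLAST frame rewritten as a GFF frame."""
    if line.startswith("#"):
        return line  # comment lines pass through untouched
    match = GFF_FIELDS.match(line)
    if match is None:
        return line  # not a record with at least eight fields
    head, strand, sep, frame, rest = match.groups()
    if frame in ("1", "2", "3"):
        if strand == "+":
            frame = str(int(frame) - 1)  # assumed mapping 1,2,3 -> 0,1,2
        else:
            frame = "."  # cannot recalculate without the sequence length
    return head + strand + sep + frame + rest
```

       To use it as a filter, apply fix_frame to every line read from
       standard input and write the result to standard output.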

OPTIONS

       parseblast prints output in "HSP" format by default (see below). It
       reads its input from <STDIN> or from one or more files and writes its
       output to <STDOUT>, so the output can be redirected to a file or the
       program can be used as a filter within a pipe. The "-N", "-M", "-P",
       "-G", "-F", "-A" and "-X" options (and their long-name versions) are
       mutually exclusive; their precedence order is the one in which they
       are listed here.

       GFF OPTIONS:

       -G, --gff
              Prints output in GFFv1 format.

       -F, --fullgff
              Prints output in GFFv2 "alignment" format ("target").

       -A, --aplot
              Prints output in pseudo-GFF APLOT "alignment" format.

       -S, --subject
              Projects GFF output by SUBJECT (default is by QUERY).

       -Q, --sequence
              Append query and subject sequences to GFF record.

       -b, --bit-score
              Set <score> field to Bits (default Alignment Score).

       -i, --identity-score
              Set <score> field to Identities (default Alignment).

       -s, --full-scores
              Include all scores for each HSP in each GFF record.

       -u, --no-frame
              Set all frames to "." (the GFF value for unavailable frames).

       -t, --compact-tags
              Target coords+strand+frame in short form (NO GFFv2!).

       ALIGNMENT OPTIONS:

       -P, --pairwise
              Prints pairwise alignment for each HSP in TBL format.

       -M, --msf
              Prints pairwise alignment for each HSP in MSF format.

       -N, --aln
              Prints pairwise alignment for each HSP in ALN format.

       -W, --show-coords
              Adds start/end positions to alignment output.

       GENERAL OPTIONS:

       -X, --expanded
              Expanded output (producing multiline output records).

       -c, --comments
              Include parameters from blast program as comments.

       -n, --no-comments
              Do not print "#" lines (raw output without comments).

       -v, --verbose
              Warnings sent to <STDERR>.

       --version
              Prints program version and exits.

       -h, --help
              Shows this help and exits.
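
       Because parseblast reads <STDIN> and writes <STDOUT>, it can be
       driven from another program. The Python sketch below builds a
       command line from the long options listed above and, if parseblast
       is installed, pipes BLAST text through it. The helper names
       build_command and run_parseblast are hypothetical; only the option
       names come from this page.

```python
import shutil
import subprocess

# Output-format options from this page; they are mutually exclusive,
# so the helper accepts at most one of them per call.
FORMAT_FLAGS = {
    "aln": "--aln",
    "msf": "--msf",
    "pairwise": "--pairwise",
    "gff": "--gff",
    "fullgff": "--fullgff",
    "aplot": "--aplot",
    "expanded": "--expanded",
}


def build_command(blast_files, output_format=None, extra_flags=()):
    """Return the argument list for one parseblast call."""
    argv = ["parseblast"]
    if output_format is not None:
        argv.append(FORMAT_FLAGS[output_format])  # KeyError on unknown names
    argv.extend(extra_flags)
    argv.extend(blast_files)  # no files means: read from standard input
    return argv


def run_parseblast(blast_text, output_format=None, extra_flags=()):
    """Feed BLAST text on stdin and return what parseblast writes to stdout."""
    if shutil.which("parseblast") is None:
        raise FileNotFoundError("parseblast is not installed on this system")
    argv = build_command([], output_format, extra_flags)
    done = subprocess.run(argv, input=blast_text,
                          capture_output=True, text=True)
    return done.stdout
```

       For example, build_command(["results.blast"], "gff", ["--no-frame"])
       yields the arguments for GFFv1 output with all frames set to ".".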

OUTPUT FORMATS

        "S_" stands for  "Subject_Sequence"  and  "Q_"  for  "Query_Sequence".
       <Program> name is taken from input blast file. <Strands> are calculated
       from the <start> and <end> positions in the original BLAST file.
       <Frame> is taken from the BLAST file if it is present; otherwise it
       is set to ".". <SCORE> is set to the Alignment Score by default; you
       can change it with "-b" and "-i".
        If "-S" or "--subject" is given, the QUERY fields refer to the
       SUBJECT and the SUBJECT fields refer to the QUERY (this is only
       available for GFF output records).
        Dots ("...") mean that the record description continues on the
       following line; parseblast itself prints each such record as a
       single line.

       [HSP] <- (This is the DEFAULT OUTPUT FORMAT)
        <Program> <DataBase> : ...
          ... <IdentityMatches> <Min_Length> <IdentityScore> ...
          ... <AlignmentScore> <BitScore> <E_Value> <P_Sum> : ...
          ... <Q_Name> <Q_Start> <Q_End> <Q_Strand> <Q_Frame> : ...
          ... <S_Name> <S_Start> <S_End> <S_Strand> <S_Frame> : ...
          ... <S_FullDescription>
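
       The default HSP record can be split back into named fields. The
       Python sketch below follows the field order given above and
       assumes that the groups are separated by a colon surrounded by
       whitespace and that fields inside a group contain no whitespace
       (apart from the database name and the subject description). The
       function name parse_hsp_line is hypothetical, and all values are
       kept as strings because their exact numeric formats are not
       stated here.

```python
import re

SCORE_KEYS = ["IdentityMatches", "Min_Length", "IdentityScore",
              "AlignmentScore", "BitScore", "E_Value", "P_Sum"]
QUERY_KEYS = ["Q_Name", "Q_Start", "Q_End", "Q_Strand", "Q_Frame"]
SUBJECT_KEYS = ["S_Name", "S_Start", "S_End", "S_Strand", "S_Frame"]


def parse_hsp_line(line):
    """Split one default-format HSP line into a dict of named fields."""
    # At most four splits, so colons inside the description are preserved.
    groups = re.split(r"\s+:\s+", line.strip(), maxsplit=4)
    if len(groups) == 4:
        groups.append("")  # record without a subject description
    if len(groups) != 5:
        raise ValueError("not an HSP record: %r" % line)
    program, database = groups[0].split(None, 1)
    scores = groups[1].split()
    query = groups[2].split()
    subject = groups[3].split()
    if len(scores) != 7 or len(query) != 5 or len(subject) != 5:
        raise ValueError("unexpected field count in: %r" % line)
    record = {"Program": program, "DataBase": database}
    record.update(zip(SCORE_KEYS, scores))
    record.update(zip(QUERY_KEYS, query))
    record.update(zip(SUBJECT_KEYS, subject))
    record["S_FullDescription"] = groups[4]
    return record
```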

       [GFF]
        <Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> ...
          ... <Q_Strand> <Q_Frame> <S_Name>

       [FULL GFF] <- (GFF showing alignment data)
        <Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> ...
          ... <Q_Strand> <Q_Frame> ...
          ... Target "<S_Name>" <S_Start> <S_End> ...
          ... E_value <E_Value> Strand <S_Strand> Frame <S_Frame>

       [APLOT] <- (GFF format enhanced for APLOT program)
        <Q_Name>:<S_Name> <Program> hsp <Q_Start>:<S_Start> <Q_End>:<S_End> ...
          ... <SCORE> <Q_Strand>:<S_Strand> <Q_Frame>:<S_Frame> ...
          ... <BitScore>:<HSP_Number> ...
          ... # E_value <E_Value>

       [EXPANDED]
        MATCH(<HSP_Number>): <Q_Name> x <S_Name>
        SCORE(<HSP_Number>): <AlignmentScore>
        BITSC(<HSP_Number>): <BitScore>
        EXPEC(<HSP_Number>): <E_Value> Psum(<P_Sum>)
        IDENT(<HSP_Number>): <IdentityMatches>/<Min_Length> : <IdentityScore> %
        T_GAP(<HSP_Number>): <TotalGaps(BothSeqs)>
        FRAME(<HSP_Number>): <Q_Frame>/<S_Frame>
        STRND(<HSP_Number>): <Q_Strand>/<S_Strand>
        MXLEN(<HSP_Number>): <Max_Length>
        QUERY(<HSP_Number>): length <Q_Length> : gaps <Q_TotalGaps> : ...
          ... <Q_Start> <Q_End> : <Q_Strand> : <Q_Frame> : <Q_FullSequence>
        SBJCT(<HSP_Number>): length <S_Length> : gaps <S_TotalGaps> : ...
          ... <S_Start> <S_End> : <S_Strand> : <S_Frame> : <S_FullSequence>
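
       Expanded output spreads each HSP over several tagged lines, so a
       consumer has to regroup them. The Python sketch below collects
       the lines by HSP number. It assumes that <HSP_Number> is an
       integer that is unique within the input and that every line
       starts with an upper-case tag followed by the number in
       parentheses, as in the template above. The function name
       group_expanded is hypothetical.

```python
import re
from collections import defaultdict

# TAG(number): value, for example "SCORE(3): 450"
TAG_LINE = re.compile(r"^\s*([A-Z_]+)\((\d+)\):\s*(.*)$")


def group_expanded(lines):
    """Group expanded-format lines into {hsp_number: {tag: value}}."""
    records = defaultdict(dict)
    for line in lines:
        match = TAG_LINE.match(line)
        if match is None:
            continue  # comments, blank lines, anything without a tag
        tag, number, value = match.groups()
        records[int(number)][tag] = value.strip()
    return dict(records)
```

       The values stay as raw strings. Splitting the QUERY and SBJCT
       values on " : " would recover their individual fields.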

SEE ALSO

       ali2gff(1), blat2gff(1), gff2aplot(1), sim2gff(1).

AUTHOR

       parseblast was written by Josep F. Abril <abril@imim.es>.

       This manual page was written by Nelson A. de Oliveira
       <naoliv@gmail.com>, for the Debian project (but may be used by others).

                        Mon, 21 Mar 2005 21:44:15 -0300