tigr-glimmer - Creates and outputs an interpolated Markov model (IMM)

NAME

       tigr-glimmer - Creates and outputs an interpolated Markov model (IMM)

SYNOPSIS

       tigr-build-icm

DESCRIPTION

       The program build-icm (build-icm.c) creates and outputs an interpolated
       Markov model (IMM) as described in the paper: A.L. Delcher, D. Harmon,
       S. Kasif, O. White, and S.L. Salzberg, "Improved Microbial Gene
       Identification with Glimmer", Nucleic Acids Research, 1999 (in press).
       Please cite this paper if you use the system as part of any published
       research.

       Input comes from the file named on the command line. The format is one
       sequence per line: each line has an ID string, followed by white
       space, followed by the sequence itself. The script run-glimmer3
       generates an input file in the correct format using the 'extract'
       program.
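
       As an illustration of the input format described above, the following
       Python sketch splits each line into an ID and a sequence. It is not
       part of Glimmer; the function names, the sample IDs and the choice to
       lower-case the sequence are assumptions made for this example only.

```python
def parse_training_line(line):
    """Split one input line into (ID, sequence)."""
    # Split on the first run of white space only; the rest is the sequence.
    fields = line.split(None, 1)
    if len(fields) != 2:
        raise ValueError("expected '<id> <sequence>', got %r" % line)
    seq_id, seq = fields
    # Lower-casing is an assumption of this sketch, not a documented rule.
    return seq_id, seq.strip().lower()


def read_training_set(lines):
    """Collect (ID, sequence) pairs, skipping blank lines."""
    return [parse_training_line(ln) for ln in lines if ln.strip()]


# Hypothetical sample input: two records and one blank line.
example = ["orf00001  atggctaaatga\n", "\n", "orf00002\tATGCCCTAG\n"]
records = read_training_set(example)
```

       A real training file would be read with open() and passed to
       read_training_set() in the same way.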

       The  IMM is constructed as follows: For a given context, say acgtta, we
       want to estimate the probability distribution of  the  next  character.
       We  shall  do  this as a linear combination of the observed probability
       distributions for this context and all of its  suffixes,  i.e.,  cgtta,
       gtta,  tta,  ta,  a  and  empty.   By observed distributions I mean the
       counts of the number of occurrences of these strings  in  the  training
       set.   The  linear combination is determined by a set of probabilities,
       lambda, one for each context string.  For  context  acgtta  the  linear
       combination coefficients are:

       lambda (acgtta)
       (1 - lambda (acgtta)) x lambda (cgtta)
       (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x lambda (gtta)
       (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))
            x lambda (tta)
       ...
       (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))
            x (1 - lambda (tta)) x (1 - lambda (ta)) x (1 - lambda (a))
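
       The coefficient pattern above can be sketched in Python as follows.
       This is a minimal illustration, not the build-icm code: the function
       names and the lambda values in the example are made up. Each context
       receives its lambda times the weight left over by all longer
       contexts, and the empty context receives whatever weight remains, so
       the coefficients sum to 1.

```python
def suffix_chain(context):
    """Return the context and all of its suffixes, longest first, ending with ''."""
    return [context[i:] for i in range(len(context) + 1)]


def mixing_coefficients(context, lam):
    """Map each suffix of `context` to its linear-combination coefficient.

    `lam` maps every non-empty suffix to its lambda value in [0, 1].
    """
    coeffs = {}
    remaining = 1.0  # product of (1 - lambda) over all longer contexts
    for ctx in suffix_chain(context):
        if ctx == "":
            # The empty context takes all of the leftover weight.
            coeffs[ctx] = remaining
        else:
            coeffs[ctx] = remaining * lam[ctx]
            remaining *= 1.0 - lam[ctx]
    return coeffs


# Hypothetical lambdas: 0.5 for every suffix of acgtta.
lam = {ctx: 0.5 for ctx in suffix_chain("acgtta") if ctx}
coeffs = mixing_coefficients("acgtta", lam)
```

       With these made-up lambdas, the weight halves at each step from
       acgtta down to a, and the empty context receives the same weight
       as a.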

       We compute the lambda values for each context as follows:

       -  If the number of observations in the training set is >= the
          constant SAMPLE_SIZE_BOUND, the lambda for that context is 1.0.

       -  Otherwise, do a chi-square test on the observations for this
          context compared to the distribution predicted for the
          one-character shorter suffix context. If the chi-square
          significance < 0.5, set the lambda for this context to 0.0.
          Otherwise, set the lambda for this context to:

              (chi-square significance) x (# observations) / SAMPLE_WEIGHT
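
       The rule above might be sketched in Python as shown below. Several
       points are assumptions of this sketch rather than facts from this
       page: the two constants are given placeholder values, the test is
       taken to be a Pearson chi-square test with 3 degrees of freedom (four
       nucleotides), and "significance" is read as the chi-square cumulative
       probability, that is, 1 minus the p-value. The function names are
       hypothetical.

```python
import math

SAMPLE_SIZE_BOUND = 400  # placeholder value, an assumption of this sketch
SAMPLE_WEIGHT = 400      # placeholder value, an assumption of this sketch


def chi2_cdf_3df(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def chi_square_stat(counts, predicted):
    """Pearson statistic of observed counts against predicted probabilities."""
    total = sum(counts)
    stat = 0.0
    for obs, p in zip(counts, predicted):
        expected = total * p
        if expected > 0.0:  # skip categories the prediction rules out
            stat += (obs - expected) ** 2 / expected
    return stat


def lambda_for_context(counts, predicted):
    """Apply the lambda rule to the next-character counts of one context.

    `counts` holds the observed counts for a, c, g, t after this context;
    `predicted` is the distribution from the one-character shorter suffix.
    """
    n = sum(counts)
    if n >= SAMPLE_SIZE_BOUND:
        return 1.0
    significance = chi2_cdf_3df(chi_square_stat(counts, predicted))
    if significance < 0.5:
        return 0.0
    return significance * n / SAMPLE_WEIGHT


uniform = [0.25, 0.25, 0.25, 0.25]
well_sampled = lambda_for_context([100, 100, 100, 100], uniform)
no_new_info = lambda_for_context([10, 10, 10, 10], uniform)
skewed = lambda_for_context([40, 0, 0, 0], uniform)
```

       In this sketch, a context with enough observations is trusted
       fully. A sparse context whose counts match the shorter suffix's
       prediction gets lambda 0.0. A sparse context that differs strongly
       gets a weight proportional to its number of observations.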

       To run the program:

       build-icm <train.seq > train.model

       This will use the training  data  in  train.seq  to  produce  the  file
       train.model, containing your IMM.

AUTHOR

       This  manual  page  was  quickly  copied  from the glimmer web site and
       readme file  by  Steffen  Moeller  moeller@debian.org  for  the  Debian
       system.

                                                    TIGR-GLIMMER     (1)   (1)
