
This directory contains C language source code for the SEG program of Wootton
and Federhen, for identifying and masking segments of low compositional
complexity in amino acid sequences.  This program is inappropriate for
masking nucleotide sequences and, in fact, may strip some nucleotide
ambiguity codes from nt. sequences as they are being read.

The SEG program can be used as a plug-in filter of query sequences used in the
NCBI BLAST programs.  See the -filter and -echofilter options described in the
BLAST software's manual page.

Input to SEG must be sequences in FASTA format.  Output can be produced in a
variety of formats, with FASTA format being one of them when the -x option is
used.  The file seg.doc includes a copy of the man page for the seg program.


References:
Wootton, J. C. and S. Federhen (1993).  Statistics of local complexity in amino
acid sequences and sequence databases.  Computers and Chemistry 17:149-163.


MODIFICATION HISTORY
10/18/94
Fixed a bug in the boundary conditions for the alphabet assignments
(colorings) calculations. This condition seems not to arise in the
current protein sequence databases, but does appear when the algorithm
is customized for the nucleic acid alphabet.

4/2/94
Fixed a bug in the reading of input sequence files.  B, Z, and U letters found
in the IUB amino acid alphabet and the NCBI standard amino acid alphabet
were being stripped.

3/30/94
WRG improved speed by about 3X (roughly 5X overall since 3/21/94), due in part
to the elimination of nearly all log() function calls, plus the removal of much
unused or unnecessary code.

3/21/94
Included support for the special characters "*" (translation stop) and "-"
(gap) which are found in some NCBI standard amino acid alphabets.

WRG replaced repetitive dynamic calls to log(2.) and log(20.) with precomputed
values, yielding a 33-50% speed improvement.

WRG added EOF checks in several places, the lack of which could produce
infinite looping.

The previous version of seg is archived beneath the archive subdirectory.

9/30/97
HMF5 plugged a memory leak.
