6 1993
CABIOS Pages 735-740
System and methods
DCSE was written following the ANSI specification of the C programming language. However, some parts are necessarily platform specific. These parts lie mainly in routines that interface with the operating system, such as screen manipulation, keyboard and filing system routines. DCSE has been compiled for the following environments: VMS for VAXstations, Ultrix on DECstations, DOS on IBM-compatible PCs and RISC OS on the Acorn Archimedes range.
Under VMS, DCSE has been compiled using VAX C. It uses ANSI escape sequences to control the display and the SMG
Algorithms
Editing
DCSE looks at an alignment as if it were an abacus. The alignment has a rod, or sequence line, for every organism. All rods have the same length, which is set by the number of positions. This number can be reduced by removing positions that are empty in all sequence lines, or increased by inserting new positions. Every rod has a fixed number of beads (characters) in a fixed order. Because the number of characters is smaller than the number of positions, every position contains either a nucleotide symbol or a gap symbol, and the characters can be shifted along the rod. If a character is pushed leftward or rightward and makes contact with another one, the latter is pushed in the same direction. In this way the fixed order of the characters, or primary structure, always remains correct.
Just as in an ordinary screen editor, DCSE uses the screen as a window on a part of the alignment. This window can be moved in several ways to display other parts of the alignment, and the screen can be split to show two windows on different parts of the alignment. A pointer shows the current position in the alignment. This pointer is moved with the arrow keys, and the window scrolls as needed to keep the pointer on the screen. The pointer acts as a finger that can push characters leftward or rightward. It can not only push characters; it can also move characters to the other end of a gap, get a character from the other end of a gap, or move a continuous block of characters to either side. The pointer can also be resized so that it covers a number of sequences and positions. The resized pointer can perform the same actions as the small one, but all characters covered by the pointer keep their relative positions during the process.
The order of the sequence lines is not rigid. One or more lines can be locked in a given position on the screen, while the rest of the alignment can still be moved up or down. This makes it easy to compare one sequence to several others, or to rearrange the order of the sequences temporarily.
Checking the primary structure
DCSE uses reference files to store a version of the sequences that is never changed. The reference file can also contain extra information about the sequence, such as the taxonomic position of the organism and a literature reference. The sequences in the reference file are generally not edited, and thus remain correct. DCSE can check whether the sequence in an alignment,
Aligning sequences
One sequence can be aligned automatically to another by DCSE. The alignment algorithm creates an array the size of one sequence, in which the corresponding positions in the other sequence are stored. To build this array it combines two methods. The program starts by comparing the two sequences with a recursive method: subsequences of a specified size that appear in both sequences are matched, and the unmatched stretches between two matched subsequences are analysed with the same method using a smaller block size. When the subsequence size reaches a certain minimum, corresponding positions in the remaining stretches are matched using an algorithm that minimizes an 'alignment distance' between the stretches (Sellers, 1974). This distance is calculated by adding a penalty for every mismatch or gap; gaps are penalized using affine gap costs (Spouge, 1991).
The sequence that has served as a reference is not changed by the alignment routine. The characters of the other sequence are shifted relative to this sequence to reflect the calculated correspondence array. However, if the newly aligned sequence contains an insertion at a spot where the reference sequence has no corresponding gap, the insertion cannot be properly accommodated in the alignment. DCSE's alignment routine can handle this situation in three ways. It can create a global insert in the entire alignment, or it can carry out the insert by pushing the surrounding characters in the newly aligned sequence aside, thereby possibly disrupting the alignment locally. The third option is to leave the insertion out. This produces an error in the primary structure, which can easily be detected later by the primary structure checking routine. The user can then decide whether a global insert should be created, or whether the problem can be solved by a local sequence realignment.
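The two-stage alignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DCSE's C implementation: the block sizes, the penalty values, the rule for choosing which shared block to anchor on (here simply the first one found) and all function names are hypothetical. The distance-minimizing stage is written as a Gotoh-style dynamic programme with affine gap costs (a gap-opening penalty plus a per-position extension penalty), which is one standard way to compute such a distance. The result is an array the size of the reference sequence holding, for each reference position, the matched position in the other sequence, or None where the reference faces a gap.

```python
from typing import List, Optional

# Hypothetical parameters: the paper does not state DCSE's actual values.
BLOCK_SIZES = (12, 8, 5)   # decreasing block sizes for the recursive stage
MISMATCH = 1               # penalty for aligning two different symbols
GAP_OPEN = 3               # penalty for the first position of a gap
GAP_EXTEND = 1             # penalty for each further position of the same gap
INF = float("inf")


def affine_align(a: str, b: str, a_off: int, b_off: int,
                 corr: List[Optional[int]]) -> None:
    """Minimise mismatch + affine gap cost between stretches a and b and
    record matched positions (shifted by the offsets) in corr."""
    n, m = len(a), len(b)
    # State 0: a[i-1] paired with b[j-1]; 1: a[i-1] against a gap;
    # 2: b[j-1] against a gap. back[] remembers the best previous state.
    cost = [[[INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    cost[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                d = 0 if a[i - 1] == b[j - 1] else MISMATCH
                s = min(range(3), key=lambda t: cost[t][i - 1][j - 1])
                cost[0][i][j], back[0][i][j] = cost[s][i - 1][j - 1] + d, s
            if i:
                opts = [cost[t][i - 1][j] + (GAP_EXTEND if t == 1 else GAP_OPEN)
                        for t in range(3)]
                s = min(range(3), key=opts.__getitem__)
                cost[1][i][j], back[1][i][j] = opts[s], s
            if j:
                opts = [cost[t][i][j - 1] + (GAP_EXTEND if t == 2 else GAP_OPEN)
                        for t in range(3)]
                s = min(range(3), key=opts.__getitem__)
                cost[2][i][j], back[2][i][j] = opts[s], s
    # Trace back from the cheapest final state, recording paired positions.
    s = min(range(3), key=lambda t: cost[t][n][m])
    i, j = n, m
    while i or j:
        prev = back[s][i][j]
        if s == 0:
            corr[a_off + i - 1] = b_off + j - 1
            i, j = i - 1, j - 1
        elif s == 1:
            i -= 1
        else:
            j -= 1
        s = prev


def block_match(a: str, b: str, a_off: int, b_off: int,
                corr: List[Optional[int]], sizes: tuple) -> None:
    """Recursive stage: anchor on a block of size sizes[0] shared by a and b,
    then treat the stretches on either side the same way; stretches with no
    shared block of that size are retried with the next, smaller size."""
    if not a or not b:
        return
    if not sizes:                       # minimum size reached: use the DP
        affine_align(a, b, a_off, b_off, corr)
        return
    k = sizes[0]
    for i in range(len(a) - k + 1):
        j = b.find(a[i:i + k])          # naive choice: first shared block found
        if j >= 0:
            for t in range(k):
                corr[a_off + i + t] = b_off + j + t
            block_match(a[:i], b[:j], a_off, b_off, corr, sizes)
            block_match(a[i + k:], b[j + k:], a_off + i + k, b_off + j + k,
                        corr, sizes)
            return
    block_match(a, b, a_off, b_off, corr, sizes[1:])


def align_to_reference(ref: str, other: str) -> List[Optional[int]]:
    """Array the size of ref: corr[p] is the position in other matched to
    ref[p], or None where ref[p] faces a gap in other."""
    corr: List[Optional[int]] = [None] * len(ref)
    block_match(ref, other, 0, 0, corr, BLOCK_SIZES)
    return corr
```

For example, `align_to_reference("GCAUGCUAGGCUAAC", "GCAUGCUAUGGCUAAC")` matches the first eight positions as one block, finds `GGCUA` only at the five-position block size, and leaves the extra `U` in the second sequence unmatched. That unmatched character is the insertion case that the text above says DCSE resolves in one of three ways.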
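The abacus model described under Editing, in which characters can only be pushed along their rod and therefore never change order, can be illustrated with a small Python sketch. It is an illustration under assumptions rather than DCSE's code: each rod is a plain list of characters, the gap symbol `-` and all function names are hypothetical, and the primary structure check simply compares a row, read without gaps, with the stored reference sequence, because the full description of DCSE's check is not available here.

```python
from typing import List

GAP = "-"  # hypothetical gap symbol


def push(row: List[str], pos: int, step: int) -> bool:
    """Push the character at row[pos] one position left (step=-1) or right
    (step=+1). Characters it touches are pushed along, and the nearest gap in
    that direction is consumed. Returns False, leaving the row unchanged, if
    row[pos] is a gap or no gap is available before the end of the rod."""
    g = pos
    while 0 <= g < len(row) and row[g] != GAP:
        g += step
    if g == pos or not 0 <= g < len(row):
        return False
    while g != pos:                     # shift the touching run toward the gap
        row[g] = row[g - step]
        g -= step
    row[pos] = GAP
    return True


def insert_position(alignment: List[List[str]], col: int) -> None:
    """Lengthen every rod by inserting an empty position at index col."""
    for row in alignment:
        row.insert(col, GAP)


def remove_empty_positions(alignment: List[List[str]]) -> None:
    """Remove positions that are empty in all sequence lines."""
    keep = [c for c in range(len(alignment[0]))
            if any(row[c] != GAP for row in alignment)]
    for row in alignment:
        row[:] = [row[c] for c in keep]


def primary_structure_ok(row: List[str], reference: str) -> bool:
    """True if the beads on this rod, read without gaps, equal the reference."""
    return "".join(ch for ch in row if ch != GAP) == reference
```

Pushing the `A` of `AC-G` to the right also pushes the `C` it touches, giving `-ACG`. A second push to the right fails because no gap is left; at that point a new position would have to be added with `insert_position`. Since no operation reorders characters, `primary_structure_ok` can only fail if a character was dropped or added outside these operations.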
Dedicated Comparative Sequence Editor
[Figure 1: secondary structure drawings of two imaginary RNA sequences, Organism 1 and Organism 2]
Fig. 1. Illustration of secondary structure symbols. Two imaginary RNA sequences and drawings of corresponding secondary structures are shown. In the linear
representation square brackets designate the beginning and end of one strand of a helix and a circumflex separates adjacent helix strands. Braces are used to
indicate the beginning and end of an internal loop or bulge loop, and a base taking part in a non-standard pair is enclosed in parentheses. The 'Helix numbering'
line identifies the helices.
P.De Rijk and R.De Wachter
illustrated in Figure 3. The next line is the help line. The other lines contain sequences, preceded by an abbreviation of the species name. The pointer is shown as a rectangular area in inverse video or with a differently coloured background. Display of the structure symbols can be switched off, in which case the number of nucleotides visible on the screen is doubled. Every character can be given a specific colour, which makes it possible to display bases or amino acids, and/or secondary structure elements, in a different colour.
An extensive range of functions allows location of certain features in the alignment. DCSE can go to a specified sequence
been described in the literature. Olsen's SEQEDT is available from the Ribosomal Database Project (Olsen et al., 1991). The GCG sequence analysis package (Genetics Computer Group, Inc., Madison, WI) provides the multiple sequence editor LINEUP. With the exception of SEQEDT and LINEUP, these editors were originally developed to edit protein alignments, though they can be used for nucleic acid alignments as well. DCSE was developed from the start to deal with structural RNA molecules, though it can also be used on protein alignments. DCSE provides the tools to investigate higher-order structure on a comparative basis. DCSE also has several other advantages
Acknowledgements
This research was supported in part by the Program on Interuniversity Poles
of Attraction (contract 23) of the Office for Science Policy Programming of
the Belgian State, and by the Fund for Collective Fundamental Research. It
was performed in the framework of the Institute for the Study of Biological
Evolution of the University of Antwerp. Peter De Rijk is a research assistant
References
Chan,S.C., Wong,A.K.C. and Chiu,D.K.Y. (1992) A survey of multiple sequence comparison methods. Bull. Math. Biol., 54, 563-598.
Clark,S.P. (1992) MALIGNED: a multiple sequence alignment editor. Comput. Applic. Biosci., 8, 535-538.
Depiereux,E. and Feytmans,E. (1992) MATCH-BOX: a fundamentally new algorithm for the simultaneous alignment of several protein sequences. Comput. Applic. Biosci., 8, 501-509.
De Rijk,P., Neefs,J.M., Van de Peer,Y. and De Wachter,R. (1992) Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res., 20, 2075-2089.
Faulkner,D.V. and Jurka,J. (1988) MASE: multiple aligned sequence editor. Trends Biochem. Sci., 12, 279-280.
Olsen,G.J., Larsen,N. and Woese,C.R. (1991) The ribosomal RNA database project. Nucleic Acids Res., 19, 2017-2021.
Parry-Smith,D.J. and Attwood,T.K. (1991) SOMAP: a novel interactive approach to multiple protein sequences alignment. Comput. Applic. Biosci., 7, 233-235.
Rechid,R., Vingron,M. and Argos,P. (1989) A new interactive protein sequence alignment program and comparison of its results with widely used algorithms. Comput. Applic. Biosci., 5, 107-113.
Schuler,G.D., Altschul,S.F. and Lipman,D.J. (1991) A workbench for multiple alignment construction and analysis. Proteins Struct. Funct. Genet., 9, 180-190.
Sellers,P.H. (1974) On the theory and computation of evolutionary distances. SIAM J. Appl. Math., 26, 787-793.
Spouge,J.L. (1991) Fast optimal alignment. Comput. Applic. Biosci., 7, 1-7.
Stockwell,P.A. and Petersen,G.B. (1987) HOMED: a homologous sequence editor. Comput. Applic. Biosci., 3, 37-43.
Thirup,S. and Larsen,N.E. (1990) ALMA, an editor for large sequence alignments. Proteins Struct. Funct. Genet., 7, 291-295.
Van de Peer,Y. and De Wachter,R. (1993) TREECON: a software package for the construction and drawing of evolutionary trees. Comput. Applic. Biosci., 9, 177-182.