
CABIOS Vol.9 no.6 1993, Pages 735-740

DCSE, an interactive tool for sequence alignment and secondary structure research

Peter De Rijk and Rupert De Wachter1

Departement Biochemie, Universiteit Antwerpen (UIA), Universiteitsplein 1, B-2610 Antwerpen, Belgium
1 To whom reprint requests should be sent

Abstract

DCSE provides a user-friendly package for the creation and editing of sequence alignments. The program runs on different platforms, including microcomputers and workstations. Apart from available hardware, the program is not limited in the size of the alignment it can handle. It deviates more from classical text editors than other available sequence editors because it uses a different approach towards editing. It shifts characters or entire blocks of aligned characters, rather than inserting or deleting gaps in the sequences. Alignment of a new sequence to an existing alignment is partly automated. Although DCSE can be used on protein sequence alignments, it is especially targeted at the examination of RNA. The secondary structure for every sequence can be incorporated easily in the alignment. DCSE also has extensive built-in support for finding and checking secondary structure elements. A sophisticated system of markers allows notation of special positions in an alignment. This system can be used to store information such as the position of hidden breaks, introns and tertiary structure interactions.

Introduction

Comparison of biological macromolecules is gaining in importance in studies on the structure and evolution of those molecules. Due to the development of rapid methods for sequence determination, there has been an explosive growth in published sequence data. Since a sequence alignment forms the basis of most comparative studies, much effort has been put into programs that aid in the creation of multiple sequence alignments.

Several programs were developed to automate totally the creation of an optimal alignment (reviewed by Chan et al., 1992). However, the creation, maintenance and exploration of a large, meaningful alignment remains a challenge, and the advantages of an interactive approach have been advocated by several authors (Rechid et al., 1989; Schuler et al., 1991; Depiereux and Feytmans, 1992). Indeed, multiple sequence alignment programs use a theoretical mathematical model to define what is considered an optimal alignment. The rules followed by these programs are, by necessity, rather limited and strict. This makes it difficult to incorporate extra information such as a description of secondary structure into the alignment process. As a result the alignments produced by these programs are not always optimal from a biological point of view. Also, the number of sequences that can be aligned by these programs is usually quite limited.

Another approach to the problem of making an alignment is manual editing. Ordinary text editors are not very well suited for this task, mainly due to the limited number of characters that can be placed on one line. Several multiple sequence alignment editors have been developed in response to these problems (Stockwell and Petersen, 1987; Faulkner and Jurka, 1988; Thirup and Larsen, 1990; Olsen et al., 1991; Parry-Smith and Attwood, 1991; Clark, 1992). Most of these editors seem to be aimed primarily at protein sequences, and though they can be used on nucleic acid sequences, they do not take into account the peculiarities involved in aligning a large set of structural nucleic acid sequences. They are modelled after existing full-screen text editors but overcome the line length problem, and feature functions typically used on sequence alignments. These editors use the classical methods for editing text to align sequences. Alignment is achieved by inserting or deleting gap symbols. While these methods are very suitable for entering and editing text, a different approach seems more suited for editing an alignment. Biopolymer sequences have been experimentally determined, and should not be changed unless errors are discovered. Alignment does not involve changing, but shifting characters and groups of characters in the sequences.

In the case of molecules such as ribosomal RNA, alignment of more variable areas can be guided by secondary structure information. A sequence editor for these molecules should provide an easy and straightforward method to incorporate this information in the alignment. It should also provide tools to locate structural elements, and to check whether the encoded structure is consistent.

DCSE (Dedicated Comparative Sequence Editor) originated from the need to maintain a rapidly growing alignment of small subunit ribosomal RNAs (De Rijk et al., 1992). This file currently contains 2000 sequences and has 4760 alignment positions. DCSE was developed from the start to cope with large alignments. It provides a user-friendly, menu-driven environment to create and maintain sequence alignments. It is especially suited for the study of sequences following a common secondary structure pattern.

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Michigan on September 19, 2015

Oxford University Press 735



System and methods

DCSE was written following the ANSI specification of the C programming language. However, some parts are necessarily platform specific. These parts lie mainly in routines interfacing with the operating system, such as screen manipulation, keyboard and filing system routines. DCSE has been compiled for the following environments: VMS for VAXstations, Ultrix on DECstations, DOS on IBM-compatible PCs and RISC OS on the Acorn Archimedes range.

Under VMS, DCSE has been compiled using VAX C. It uses ANSI escape sequences to control the display and the SMG routines for keyboard input. The version on DECstations was compiled with Ultrix C, and uses Curses for both screen output and keyboard input. Implementations on Turbo C for DOS and Desktop C for RISC OS use commands provided by these packages to handle the necessary screen output and keyboard input.

Algorithms

Editing

DCSE looks at an alignment as if it were an abacus. The alignment has a rod or sequence line for every organism. All rods have the same length, which is set by the number of positions. This number can be reduced by removing positions that are empty in all sequence lines, or increased by inserting new positions. Every rod has a fixed number of beads (characters) in a fixed order. The number of characters is smaller than the number of positions, so every position contains either a nucleotide symbol or a gap symbol. This way the characters can be shifted. If a character is pushed leftward or rightward, and makes contact with another one, the latter is pushed in the same direction. In this way the fixed order of characters, or primary structure, will always remain correct.

Just as in an ordinary screen editor, DCSE uses the screen as a window on a part of the alignment. This window can be moved in several ways to display other parts of the alignment. The screen can be split to show two windows on different parts of the alignment. It also features a pointer to show the current position in the alignment. This pointer can be moved by the arrow keys, and the window will scroll appropriately in order to keep the pointer on the screen. It is used as a finger, which can push characters leftward or rightward. The pointer can not only push characters, but can also move characters to the other end of a gap, get a character from the other end of a gap, or move a continuous block of characters to either side. The pointer can also be resized so that it covers a number of sequences and positions. The resized pointer can perform the same actions as the small one, but all characters covered by the pointer will keep their relative positions during the process.

The order of the sequence lines is not rigid. One or more lines can be locked in a given position on the screen, while the rest of the alignment can still be moved up- or downward. This makes it easy to compare one sequence to several others, or to rearrange the order of the sequences temporarily.

Checking the primary structure

DCSE uses reference files to store a version of the sequences that is never changed. The reference file can also contain extra information about the sequence, such as the taxonomic position of the organism, and a literature reference. The sequences in the reference file are generally not edited, and thus remain correct. DCSE can check whether the sequence in an alignment, disregarding the gaps, is equal to the corresponding sequence in a reference file. It does this by aligning the two sequences. DCSE will show the differences on the screen, and offer to correct them. Users can decide to make a correction themselves, if the alignment might be disturbed by automatic insertion or deletion of characters.

Aligning sequences

One sequence can be automatically aligned to another one by DCSE. The alignment algorithm will create an array the size of one sequence. Corresponding positions in the other sequence will be stored in this array. For the creation of the array it uses a combination of two methods. The program starts by comparing the two sequences using a recursive method. Subsequences of specified size that appear in both sequences are matched. The unmatched stretches between two matched subsequences are analysed using the same method with a smaller block size. When the subsequence size reaches a certain minimum, corresponding positions in the remaining stretches are matched using an algorithm that works by minimizing an 'alignment distance' between the stretches (Sellers, 1974). This distance is calculated by adding a penalty for every mismatch or gap. Gaps are penalized using affine gap costs (Spouge, 1991).

The sequence that has served as a reference will not be changed by the alignment routine. The characters of the other sequence will be shifted relative to this sequence in order to reflect the calculated correspondence array. However, if the newly aligned sequence contains an insertion in a spot where the reference sequence does not have a corresponding gap, this cannot be properly accommodated in the alignment. DCSE's alignment routine can handle this situation in three ways. It can create a global insert in the entire alignment, or it can carry out the insert by pushing the surrounding characters in the newly aligned sequence aside, thereby possibly disrupting the alignment locally. The other option is to leave the insertion out. This option will produce an error in the primary structure, which can be detected easily later on by the primary structure checking routine. This will leave it up to the user to decide whether a global insert should be created, or whether the problem can be solved by a local sequence realignment.


Markers

In its simplest form, a marker is a position in the alignment that can be given a name. The user can go to a marked position by selecting its name, or by going to the next or previous marker. A whole range of options and functions to control markers provides a flexible system to note special positions in an alignment. A marker can be located at a certain alignment or sequence position. A sequence name can be associated with a marker. In this case, subsequent selection of this marker will move the pointer to the given position in this particular sequence. Otherwise it will move to the position on the current sequence line. A marker can also refer to two correlated positions. A range of markers can be inactivated or activated. A range is selected by typing in a part of the name. All markers containing this part in their name will be subsequently activated or inactivated. This feature can be used to create different groups of markers that can be used and saved separately.

During editing, markers can be used to keep track of interesting subsequences. Marker files can also be used to keep all sorts of extra positional information at hand. It could be used, for example, to mark the position of hidden breaks, introns or tertiary structure elements. Markers can be saved in, or loaded from, a marker file. A marker file is an ASCII text in a simple format, and it can easily be generated by other programs or functions.

Indication and checking of RNA secondary structure

In each sequence line the symbols for nucleotides or gaps alternate with positions that are either blank or contain a special symbol. These can be special characters which aid in the alignment process. However, their main use is to delimit secondary structure elements. These symbols move together with the characters. Their use is illustrated in Figure 1. 'Helix numbering' lines are intercalated between the sequences to allow the identification of secondary structure elements. For homologous sequences with sufficiently similar secondary structures, one helix numbering line will suffice. Three such lines are present in our SSU RNA database, describing the secondary structure of eukaryotic, prokaryotic and mitochondrial sequences. The coding system allows practically all possible structures, including pseudoknots.

The secondary structure indicated can also be tested. DCSE uses the first helix numbering line to make a table containing the approximate positions of both strands of each helix. When checking the secondary structure, it searches for the first occurrence of a helix strand starting from the pointer. The helix position table is checked to find out which helix this strand is a part of. The opposite strand is found, and is checked for complementarity. When the program encounters an error in the proposed secondary structure it stops, gives an appropriate error message, and indicates the location of this error in the alignment. There are options to check only one helix, and to check whether helices can be extended. DCSE has a function to copy a secondary structure from another sequence line. In order to find the structure of a newly aligned sequence, the secondary structure description of a closely related organism can be copied, and its consistency checked.

Implementation

There are few limitations to the size of the alignments DCSE can handle. As memory is allocated dynamically, the only limit to the size and number of sequences that can be loaded is available memory. DCSE can even be used to edit an alignment larger than memory relatively easily; it allows the user to load and work on just a part of the total alignment. Some of the changes made to this part, such as global inserts, are applied to the entire alignment, and not only the part loaded.

Editors can be fairly complicated programs to use. DCSE

[Figure 1: linear representation of two RNA sequences (Organism 1 and Organism 2) with secondary structure symbols and a 'Helix numbering' line, alongside drawings of the corresponding secondary structures.]

Fig. 1. Illustration of secondary structure symbols. Two imaginary RNA sequences and drawings of corresponding secondary structures are shown. In the linear
representation square brackets designate the beginning and end of one strand of a helix and a circumflex separates adjacent helix strands. Braces are used to
indicate the beginning and end of an internal loop or bulge loop, and a base taking part in a non-standard pair is enclosed in parentheses. The 'Helix numbering'
line identifies the helices.
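
The complementarity test applied to the two strands of a helix can be sketched as follows. This is an illustration in Python, not DCSE's own code (DCSE is written in C). The function name `helix_mismatches`, the decision to accept G·U next to the Watson-Crick pairs, and the premise that both strands have already been extracted without gaps, bulges or internal loops are all assumptions made for the example.

```python
# Pairs accepted as complementary in this sketch (assumption: G-U wobble
# pairs are allowed in addition to the Watson-Crick pairs).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def helix_mismatches(strand5, strand3, allowed=ALLOWED_PAIRS):
    """List the positions where two helix strands fail to pair.

    Both strands are written 5' to 3', so base i of the first strand
    faces base -(i + 1) of the second (antiparallel pairing).
    Returns (index, base5, base3) tuples; an empty list means the
    proposed helix is consistent under the pairing rules above.
    """
    if len(strand5) != len(strand3):
        raise ValueError("strands of a gap-free helix must have equal length")
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(strand5, reversed(strand3)))
        if (x, y) not in allowed
    ]


if __name__ == "__main__":
    print(helix_mismatches("GAUGC", "GCAUC"))  # []
    print(helix_mismatches("GAC", "GAC"))      # [(1, 'A', 'A')]
```

A checker built on such a function would stop at the first helix for which the list is not empty and report the offending position, which matches the behaviour described in the text for an error in the proposed secondary structure.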


tries to provide a user-friendly interface. Menus tend to be preferred by users who are new to a program, since they show the current range of options available. However, the more experienced user can work faster and easier when often-used functions are available at the stroke of a key. DCSE therefore offers both possibilities. All functions, except the basic moving and shifting routines, can be called from a hierarchical menu system. Menu items can be selected using arrow keys. Most functions have keyboard shortcuts as well. A help screen can be called to list the meaning of the most important keys and a constantly updated help line shows the purpose of the currently selected menu item or of certain keys. A user manual has also been written to provide the user with a full description of all principles underlying DCSE, and of all parameters and functions used.

When DCSE starts up, the current directory is checked for the presence of a file containing defaults. It is recognized by its name. In this file, users can put their preferred default values for several parameters, such as the name of the alignment file and the reference file, or an alignment position to go to. Every parameter can have several defaults. Any default can be edited before it is selected.

The first thing DCSE asks is the name of the alignment file. A list of all organisms present in this file is created. The selection screen shown in Figure 2 is used to select which organisms and positions will be loaded into memory. Organisms can be selected individually or in block. A search function allows users to find and select a specific organism. The current selection can be saved and retrieved later on when work on the same set of organisms is resumed.

[Figure 2: screen capture of the DCSE selection screen.]

Fig. 2. The selection screen of DCSE under DOS. It shows the current alignment file (arteub.ali), followed by the currently selected positions. The upper box is a dialogue box which allows the user to find a certain organism or to select a set of organisms designated by a filename. The lower box displays a part of the organisms present in the alignment file. These can be scrolled up and down. Selected organisms are shown in inverse video. Options are shown on the screen and can be selected by pressing the highlighted characters.

[Figure 3: three screen captures, (a) to (c), of the DCSE editing screen.]

Fig. 3. DCSE's editing screen in several display modes under DOS. The top line shows information about the current position of the pointer, which is displayed as a rectangular area with a grey background. The second line is used for messages and menus. The next line displays help. The following are sequence lines, preceded by an abbreviation of the organism name: (a) DCSE's editing screen in 'colour' mode. Secondary structure is not shown. Different characters are displayed in different colours, seen here as shades of gray. (b) DCSE in split-screen mode. The left and right windows display different parts of an RNA sequence alignment with secondary structure information. (c) The editing screen with the secondary structure indicated using colour. The same helix as in (b) is displayed. Double-stranded areas are indicated by a different background colour, shown here as gray. Still other colours are used for bulges, internal loops and non-standard basepairs.

The two top lines on the screen show information and menus. DCSE can display the sequences in several modes, which are


illustrated in Figure 3. The next line is the help line. The other lines contain sequences, preceded by an abbreviation of the species name. The pointer is shown as a rectangular area in inverse video or with a differently coloured background. Display of the structure symbols can be switched off, in which case the number of nucleotides visible on the screen is doubled. Every character can be given a specific colour, which makes it possible to display bases or amino acids, and/or secondary structure elements in a different colour.

An extensive range of functions allows location of certain features in the alignment. DCSE can go to a specified sequence or alignment position. It can position the pointer on a specific organism or helix. It can also find a certain sequence or its complement, allowing for a specified maximum number of differences.

For large alignments it takes a considerable amount of time to save the entire file after alterations have been made, therefore DCSE saves corrections directly into the file originally loaded. It is thus possible just to save a few sequence lines while leaving the rest unchanged. A global insertion or deletion will necessarily need a complete rewrite of the alignment. A flexible output routine enables the user to save any part of an old alignment as a new partial one. This routine has an option to remove all positions that do not have nucleotides in the selected sequence lines. This way the new alignment can be compacted. The output alignment can also be sliced into blocks of given size in order to print it. The secondary structure signs can be removed, and the names of the organisms abbreviated.

CONVERS is a program that complements DCSE. Several options can be chosen from a central menu. All options are menu driven as well. CONVERS currently has the possibility to change files from IG Suite format (IntelliGenetics, Inc., Mountain View, CA) to EMBL format (European Molecular Biology Laboratory, Heidelberg). It has an option to find and extract sequence information from files obtained from EMBL, and to convert them to the reference format (see Algorithms, primary structure) used by DCSE. It can convert an alignment or parts of the alignment to the format used by TREECON (Van de Peer and De Wachter, 1993) for deriving phylogenetic trees. It can also append sequences from a reference file to an alignment. Information such as literature reference and taxonomic position can be extracted in user-configurable formats from a reference file. Other options include sorting of an alignment or reference file, and appending different alignments of the same length.

Discussion

Several sequence alignment editors, such as HOMED (Stockwell and Petersen, 1987), MASE (Faulkner and Jurka, 1988), ALMA (Thirup and Larsen, 1990), SOMAP (Parry-Smith and Attwood, 1991) and MALIGNED (Clark, 1992) have been described in the literature. Olsen's SEQEDT is available from the Ribosomal Database Project (Olsen et al., 1991). The GCG sequence analysis package (Genetics Computer Group, Inc., Madison, WI) provides the multiple sequence editor LINEUP. With the exception of SEQEDT and LINEUP, these editors were originally developed to edit protein alignments, though they can be used for nucleic acid alignments as well. DCSE was developed from the start to deal with structural RNA molecules, though it can also be used on protein alignments. DCSE provides the tools to investigate higher-order structure on a comparative basis. DCSE also has several other advantages over other editors due to its special approach towards editing and its portability.

While available editors are limited either to VMS or Unix workstations, DCSE is available for different platforms. Its function is not limited to workstations, and its easy approach towards working on a part of an alignment makes handling of large alignments possible, even on microcomputers. The number of rows and columns visible on the screen is limited only by what the system can handle, and not by DCSE. All data files used by DCSE are essentially plain ASCII files in a simple format. This makes data interchange easy. In contrast to SEQEDT, DCSE does not incorporate functions to create evolutionary trees. However, the auxiliary program CONVERS can easily create data files which are used by TREECON (Van de Peer and De Wachter, 1993) to produce trees. It would be equally straightforward to create data for other phylogenetic packages.

DCSE's strength lies in its unique support, unrivalled by the other editors, for secondary structure research. DCSE allows incorporation of structural information into an alignment, so that it can be used to refine the alignment. Incorporated RNA structures can be copied, checked and displayed in several modes (cf. Figure 3). Further tools for secondary structure research are the possibility to search for the complement of a sequence, the ability to split the screen, and a routine that helps to locate possible helices by detection of compensating substitutions. Protein structure information can also be incorporated in an alignment. In fact any symbol not reserved for RNA secondary structure can be used to indicate any property tied to a certain residue. Giving hydrophobic and hydrophilic residues a different colour can aid the detection of structure elements. However, DCSE has no special support to seek protein structures, such as site/signature searches.

DCSE offers a powerful and user-friendly environment to create and maintain sequence alignments. It has been used extensively in our research group for the maintenance of an up-to-date alignment of small ribosomal subunit RNA sequences (De Rijk et al., 1992). An alignment of large subunit ribosomal RNA is also being created. A number of routines could still be improved, such as the automatic alignment routine. A criterion to choose whether a global insert is needed to


accommodate extra residues in a newly aligned sequence should be considered. The different versions of DCSE for supported platforms are available from the authors. They will also be made available on-line through anonymous FTP on host uiam3.uia.ac.be (143.169.8.1).

Acknowledgements
This research was supported in part by the Program on Interuniversity Poles
of Attraction (contract 23) of the Office for Science Policy Programming of
the Belgian State, and by the Fund for Collective Fundamental Research. It
was performed in the framework of the Institute for the Study of Biological
Evolution of the University of Antwerp. Peter De Rijk is a research assistant
of the NFWO.

References
Chan,S.C., Wong,A.K.C. and Chiu,D.K.Y. (1992) A survey of multiple
sequence comparison methods. Bull. Math. Biol., 54, 563-698.
Clark,S.P. (1992) MALIGNED: a multiple sequence alignment editor. Comput.
Applic. Biosci., 8, 535-538.
Depiereux,E. and Feytmans,E. (1992) MATCH-BOX: a fundamentally new
algorithm for the simultaneous alignment of several protein sequences.
Comput. Applic. Biosci., 8, 501-509.
De Rijk,P., Neefs,J.M., Van de Peer,Y. and De Wachter,R. (1992) Compilation
of small ribosomal subunit RNA sequences. Nucleic Acids Res., 20,
2075-2089.
Faulkner,D.V. and Jurka,J. (1988) MASE: multiple aligned sequence editor.
Trends Biochem. Sci., 12, 279-280.
Olsen,G.J., Larsen,N. and Woese,C.R. (1991) The ribosomal RNA database
project. Nucleic Acids Res., 19, 2017-2021.
Parry-Smith,D.J. and Attwood,T.K. (1991) SOMAP: a novel interactive
approach to multiple protein sequences alignment. Comput. Applic. Biosci.,
7, 233-235.
Rechid,R., Vingron,M. and Argos,P. (1989) A new interactive protein sequence
alignment program and comparison of its results with widely used algorithms.
Comput. Applic. Biosci., 5, 107-113.
Schuler,G.D., Altschul,S.F. and Lipman,D.J. (1991) A workbench of multiple
alignment construction and analysis. Proteins Struct. Funct. Genet., 9,
180-190.
Sellers,P.H. (1974) On the theory and computation of evolutionary distances.
SIAM J. Appl. Math., 26, 787-793.
Spouge,J.L. (1991) Fast optimal alignment. Comput. Applic. Biosci., 7, 1-7.
Stockwell,P.A. and Petersen,G.B. (1987) HOMED: a homologous sequence
editor. Comput. Applic. Biosci., 3, 37-43.
Thirup,S. and Larsen,N.E. (1990) ALMA, an editor for large sequence
alignments. Proteins Struct. Funct. Genet., 7, 291-295.
Van de Peer,Y. and De Wachter,R. (1993) TREECON: a software package
for the construction and drawing of evolutionary trees. Comput. Applic.
Biosci., 9, 177-182.

Received on April 30, 1993; accepted on June 23, 1993
