
Alignment II

Dynamic Programming
2
Pair-wise sequence alignments
A: C A T - T C A - C
   |   |     | |   |
B: C - T C G C A G C

Idea: Display one sequence above
another with spaces inserted in both
to reveal similarity
3
Two types of alignment
S = CTGTCGCTGCACG
T = TGCCGTG
Global alignment:
CTGTCG-CTGCACG
-TGC-CG-TG----
Local alignment:
CTGTCGCTGCACG--
-------TGC-CGTG
4
Global alignment: Scoring
CTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: $\alpha$
Mismatch penalty: $\beta$
Space penalty: $\gamma$

$$\text{score}(A) = \alpha w - \beta x - \gamma y$$
w = #matches, x = #mismatches, y = #spaces
5
Global alignment: Scoring
S:       C   T   G   T   C   G   -   C   T   G   C
T:       -   T   G   C   -   C   G   -   T   G   -
Score:  -5  10  10  -2  -5  -2  -5  -5  10  10  -5
Total = 11
Reward for matches: 10
Mismatch penalty: 2
Space penalty: 5
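
The scoring rule can be checked with a few lines of Python. This is a minimal sketch, assuming the alignment is given as two equal-length strings with `-` for a space; the function name `alignment_score` and its parameter names are illustrative, not from the slides.

```python
def alignment_score(row_s, row_t, match=10, mismatch=2, space=5):
    """Score two equal-length alignment rows; '-' denotes a space."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            total -= space          # a character lined up with a space
        elif a == b:
            total += match          # reward for a match
        else:
            total -= mismatch       # penalty for a mismatch
    return total


# The 11-column alignment scored on this slide.
print(alignment_score("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

Here w = 4 matches, x = 2 mismatches and y = 5 spaces, so the score is 10·4 − 2·2 − 5·5 = 11.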
6
Optimum Alignment
The score of an alignment is a measure of
its quality
Optimum alignment problem: Given a pair
of sequences X and Y, find an alignment
(global or local) with maximum score
The similarity between X and Y, denoted
sim(X,Y), is the maximum score of an
alignment of X and Y
7
Alignment algorithms
Global: Needleman-Wunsch
Local: Smith-Waterman
NW and SW use dynamic
programming
Variations:
Gap penalty functions
Scoring matrices
8
Global Alignment: Algorithm
$C(i,j)$ = cost of an optimum alignment of $S_{1..i}$ and $T_{1..j}$
$S_{1..i}$ = prefix of length $i$ of $S$
$T_{1..j}$ = prefix of length $j$ of $T$

$$w(a,b) = \begin{cases} \alpha & \text{if } a = b \\ -\beta & \text{if } a \neq b \end{cases}$$
9



Theorem. $C(i,j)$ satisfies the following relationships:
Initial conditions:
$$C(i,0) = -\gamma i, \qquad C(0,j) = -\gamma j$$
Recurrence relation: For $1 \le i \le n$, $1 \le j \le m$:
$$C(i,j) = \max \begin{cases} C(i-1,j-1) + w(S_i,T_j) \\ C(i-1,j) - \gamma \\ C(i,j-1) - \gamma \end{cases}$$
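
The initial conditions and the recurrence translate directly into a table-filling loop. The sketch below assumes that a space costs $\gamma$ (subtracted in the two gap cases) and uses the penalties from the worked example as defaults; `nw_table` is an illustrative name.

```python
def nw_table(s, t, match=10, mismatch=2, space=5):
    """Fill the global alignment table C, with C[i][j] for prefixes s[:i], t[:j]."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    # Initial conditions: a prefix aligned against nothing costs one space per character.
    for i in range(1, n + 1):
        C[i][0] = -space * i
    for j in range(1, m + 1):
        C[0][j] = -space * j
    # Recurrence: best of diagonal (match/mismatch), up (space in t), left (space in s).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] - space,
                          C[i][j - 1] - space)
    return C


table = nw_table("CATTCAC", "CTCGCAGC")
print(table[-1][-1])  # 33
```

Both loops together visit each of the (n+1)(m+1) cells once, which is where the O(mn) time and space bounds come from.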
10
Justification
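The last column of an optimum alignment of $S_{1..i}$ and $T_{1..j}$ has one of three forms:

$S_1\ S_2\ \ldots\ S_{i-1}\ \ S_i$
$T_1\ T_2\ \ldots\ T_{j-1}\ \ T_j$
Score: $C(i-1,j-1) + w(S_i,T_j)$

$S_1\ S_2\ \ldots\ S_{i-1}\ \ S_i$
$T_1\ T_2\ \ldots\ T_j\ \ -$
Score: $C(i-1,j) - \gamma$

$S_1\ S_2\ \ldots\ S_i\ \ -$
$T_1\ T_2\ \ldots\ T_{j-1}\ \ T_j$
Score: $C(i,j-1) - \gamma$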
11
Example
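Case 1: Line up $S_i$ with $T_j$
S: C A T T C A C
T: C - T T C A G
The last column holds $S_i$ over $T_j$; the columns before it align $S_{1..i-1}$ with $T_{1..j-1}$.

Case 2: Line up $S_i$ with space
S: C A T T C A - C
T: C - T T C A G -
The last column holds $S_i$ over a space; the columns before it align $S_{1..i-1}$ with $T_{1..j}$.

Case 3: Line up $T_j$ with space
S: C A T T C A C -
T: C - T T C A - G
The last column holds a space over $T_j$; the columns before it align $S_{1..i}$ with $T_{1..j-1}$.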
12
Computation Procedure
The table is filled from $C(0,0)$ to $C(n,m)$. Each entry $C(i,j)$ depends only on its three neighbors $C(i-1,j-1)$, $C(i-1,j)$ and $C(i,j-1)$:
$$C(i,j) = \max\{\, C(i-1,j-1) + w(S_i,T_j),\ C(i-1,j) - \gamma,\ C(i,j-1) - \gamma \,\}$$
13
+10 for match, -2 for mismatch, -5 for space

          C    T    C    G    C    A    G    C
     0   -5  -10  -15  -20  -25  -30  -35  -40
C   -5   10    5
A  -10
T  -15
T  -20
C  -25
A  -30
C  -35

14
          C    T    C    G    C    A    G    C
     0   -5  -10  -15  -20  -25  -30  -35  -40
C   -5   10    5    0   -5  -10  -15  -20  -25
A  -10    5    8    3   -2   -7    0   -5  -10
T  -15    0   15   10    5    0   -5   -2   -7
T  -20   -5   10   13    8    3   -2   -7   -4
C  -25  -10    5   20   15   18   13    8    3
A  -30  -15    0   15   18   13   28   23   18
C  -35  -20   -5   10   13   28   23   26   33

Traceback can yield both optimum alignments (score 33); the two paths differ in how they reach the entry 8 in the second T row, under the column G.
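
A traceback follows, from $C(n,m)$ back to $C(0,0)$, every neighbor that could have produced the current entry. The sketch below enumerates all optimum alignments that way. It assumes the row sequence is CATTCAC (which is what the table values imply) and a space penalty subtracted in the gap cases; `optimal_alignments` is an illustrative name, and the enumeration can grow exponentially for sequences with many ties.

```python
def optimal_alignments(s, t, match=10, mismatch=2, space=5):
    """Return (optimum score, sorted list of all optimum global alignments)."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = -space * i
    for j in range(1, m + 1):
        C[0][j] = -space * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(C[i - 1][j - 1] + w, C[i - 1][j] - space, C[i][j - 1] - space)

    def walk(i, j):
        """Yield every alignment of s[:i] and t[:j] whose score equals C[i][j]."""
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else -mismatch
            if C[i][j] == C[i - 1][j - 1] + w:      # came from the diagonal
                for a, b in walk(i - 1, j - 1):
                    yield a + s[i - 1], b + t[j - 1]
        if i > 0 and C[i][j] == C[i - 1][j] - space:  # came from above
            for a, b in walk(i - 1, j):
                yield a + s[i - 1], b + "-"
        if j > 0 and C[i][j] == C[i][j - 1] - space:  # came from the left
            for a, b in walk(i, j - 1):
                yield a + "-", b + t[j - 1]

    return C[n][m], sorted(walk(n, m))


score, alignments = optimal_alignments("CATTCAC", "CTCGCAGC")
print(score)
for row_s, row_t in alignments:
    print(row_s)
    print(row_t)
    print()
```

For this pair the sketch finds two alignments, `CAT-TCA-C` and `CATT-CA-C`, each over `C-TCGCAGC`.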
15
End-gap free alignment
Gaps at the start or end of alignment
are not penalized
Match: +2   Mismatch and space: -1
Best global: Score = 1
Best end-gap free: Score = 9
16
Motivation: Shotgun assembly
Shotgun assembly produces a large set of
partially overlapping subsequences from
many copies of one unknown DNA sequence.
Problem: Use the overlapping sections to
paste the subsequences together.
Overlapping pairs will have a low global
alignment score but a high end-gap free
score because of the overlap.
18
Algorithm
Same as global alignment, except:
Initialize the first row and first column with zeros (free gaps at start)
Take the maximum over the last row and last column (free
gaps at end)
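
The two changes can be sketched as follows. The code assumes the same match, mismatch and space values as the worked example, and `end_gap_free_score` is an illustrative name.

```python
def end_gap_free_score(s, t, match=10, mismatch=2, space=5):
    """Best alignment score when leading and trailing spaces are free."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at zero: gaps at the start cost nothing.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] - space,
                          C[i][j - 1] - space)
    # Gaps at the end cost nothing: take the best entry of the last row or column.
    return max(max(C[n]), max(row[m] for row in C))


print(end_gap_free_score("CATTCAG", "CTCGCAGC"))  # 38
```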
19
+10 for match, -2 for mismatch, -5 for gap

        C   T   C   G   C   A   G   C
    0   0   0   0   0   0   0   0   0
C   0  10   5  10   5  10   5   0  10
A   0   5   8   5   8   5  20  15  10
T   0   0  15  10   5   6  15  18  13
T   0  -2  10  13   8   3  10  13  16
C   0  10   5  20  15  18  13   8  23
A   0   5   8  15  18  13  28  23  18
G   0   0   3  10  25  20  23  38  33

Best end-gap free score: 38 (maximum over the last row and last column)
20
Local Alignment: Motivation
Ignoring stretches of non-coding DNA:
Non-coding regions are more likely to be
subjected to mutations than coding regions.
Local alignment between two sequences is likely
to be between two exons.
Locating protein domains:
Proteins of different kinds and from different
species often exhibit local similarities
Local similarities may indicate functional
subunits.
21
Local alignment: Example
Match: +2   Mismatch and space: -1
S = g g t c t g a g
T = a a a c g a
g g t c t g a g
a a a c - g a -
Best local alignment (columns 4 to 7):
c t g a
c - g a
Score = 5
22
Local Alignment: Algorithm
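Initialize top row and leftmost column to
zero.
$$C[i,j] = \max \begin{cases} C[i-1,j-1] + \text{score}(s[i],t[j]) \\ C[i-1,j] - \gamma \\ C[i,j-1] - \gamma \\ 0 \end{cases}$$
where $\gamma$ is the space penalty.
$C[i,j]$ = score of optimally aligning a
suffix of $s[1..i]$ with a suffix of $t[1..j]$.
The best local alignment score is the largest entry anywhere in the table.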
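
A minimal Smith-Waterman score computation might look like this. It is a sketch under the assumption that a space costs a fixed penalty, with the +2/−1/−1 values from the local alignment example as defaults; `smith_waterman_score` is an illustrative name.

```python
def smith_waterman_score(s, t, match=2, mismatch=1, space=1):
    """Best local alignment score of s and t."""
    n, m = len(s), len(t)
    # Top row and leftmost column are zero.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            # The extra 0 lets an alignment start afresh at any cell.
            C[i][j] = max(0,
                          C[i - 1][j - 1] + w,
                          C[i - 1][j] - space,
                          C[i][j - 1] - space)
            best = max(best, C[i][j])
    return best


print(smith_waterman_score("ggtctgag", "aaacga"))  # 5
```

Unlike the global case, the answer is read from the largest cell rather than from the bottom-right corner.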
23
+1 for a match, -1 for a mismatch, -5 for a space

       C  T  C  G  C  A  G  C
    0  0  0  0  0  0  0  0  0
C   0  1  0  1  0  1  0  0  1
A   0  0  0  0  0  0  2  0  0
T   0  0  1  0  0  0  0  1  0
T   0  0  1  0  0  0  0  0  0
C   0  1  0  2  0  1  0  0  1
A   0  0  0  0  1  0  2  0  0
C   0  1  0  1  0  2  0  1  1
24
Some Results
Most pairwise sequence alignment problems
can be solved in O(mn) time.
Space requirement can be reduced to
O(m+n), while keeping run-time fixed
[Myers88].
Highly similar sequences can be aligned in
O(dn) time, where d measures the distance
between the sequences [Landau86].
25
Reducing space requirements
O(mn) tables are often the limiting
factor in computing large alignments
There is a linear space technique that
only doubles the time required
[Hirschberg77]
26
IDEA: We only need the previous row to calculate the next

        C   T   C   G   C   A   G   C
    0   0   0   0   0   0   0   0   0
C   0  10   5  10   5  10   5   0  10
A   0   5   8   5   8   5  20  15  10
T
T
C
A
G
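
The same idea applied to the global alignment recurrence gives the optimum score with only two rows in memory. This is a sketch, assuming only the score (not the alignment itself) is needed; `nw_score_two_rows` is an illustrative name.

```python
def nw_score_two_rows(s, t, match=10, mismatch=2, space=5):
    """Optimum global alignment score using O(len(t)) memory."""
    m = len(t)
    prev = [-space * j for j in range(m + 1)]        # row 0 of the table
    for i in range(1, len(s) + 1):
        curr = [-space * i] + [0] * m                # column 0 of row i
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            curr[j] = max(prev[j - 1] + w,           # diagonal
                          prev[j] - space,           # from the row above
                          curr[j - 1] - space)       # from the left
        prev = curr                                  # older rows are discarded
    return prev[m]


print(nw_score_two_rows("CATTCAC", "CTCGCAGC"))  # 33
```

Because the earlier rows are discarded, a traceback is no longer possible from this table alone.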
27
Linear-space Alignments
$$mn + \tfrac{1}{2}mn + \tfrac{1}{4}mn + \tfrac{1}{8}mn + \tfrac{1}{16}mn + \cdots = 2mn$$
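
The sum arises from a divide-and-conquer scheme in the style of [Hirschberg77]: split S in the middle, use two-row passes to find where the optimum alignment crosses that split, and recurse on the two halves. The sketch below is one way to write it, not the slides' own code; all function names are illustrative, and the default penalties are those of the worked example.

```python
def alignment_score(row_s, row_t, match=10, mismatch=2, space=5):
    """Score two equal-length alignment rows; '-' denotes a space."""
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            total -= space
        else:
            total += match if a == b else -mismatch
    return total


def last_row(s, t, match, mismatch, space):
    """Last row of the global alignment table of s and t, keeping two rows."""
    prev = [-space * j for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [-space * i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            curr[j] = max(prev[j - 1] + w, prev[j] - space, curr[j - 1] - space)
        prev = curr
    return prev


def hirschberg(s, t, match=10, mismatch=2, space=5):
    """Return one optimum global alignment (row_s, row_t)."""
    if not s:
        return "-" * len(t), t
    if not t:
        return s, "-" * len(s)
    if len(s) == 1:
        k = t.find(s)                     # index of a matching character, or -1
        if k < 0:
            if mismatch > 2 * space:      # two spaces are cheaper than a mismatch
                return s + "-" * len(t), "-" + t
            k = 0                         # otherwise accept a mismatch in column 0
        return "-" * k + s + "-" * (len(t) - k - 1), t
    mid = len(s) // 2
    # Top half of s against every prefix of t.
    top = last_row(s[:mid], t, match, mismatch, space)
    # Bottom half of s against every suffix of t, computed on reversed strings.
    bottom = last_row(s[mid:][::-1], t[::-1], match, mismatch, space)
    m = len(t)
    split = max(range(m + 1), key=lambda j: top[j] + bottom[m - j])
    left = hirschberg(s[:mid], t[:split], match, mismatch, space)
    right = hirschberg(s[mid:], t[split:], match, mismatch, space)
    return left[0] + right[0], left[1] + right[1]


row_s, row_t = hirschberg("CATTCAC", "CTCGCAGC")
print(row_s)
print(row_t)
print(alignment_score(row_s, row_t))  # 33
```

Each level of recursion works on tables whose total area is half that of the level above, which gives the geometric series.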
28
Affine Gap Penalty Functions
Gap penalty = h + gk

where

k = length of gap
h = gap opening penalty
g = gap continuation penalty
Can also be solved
in O(nm) time
using dynamic
programming
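
One common way to reach O(nm) with affine gaps is to keep three tables, one per kind of last column, so that opening a gap (cost h + g) can be told apart from extending one (cost g). The sketch below follows that scheme; it is an assumption about how the slide's claim would be implemented, the function name is illustrative, and the values h = 4 and g = 1 are made up for the example.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=10, mismatch=2, h=4, g=1):
    """Optimum global score when a gap of length k costs h + g*k."""
    n, m = len(s), len(t)
    # M: last column pairs s[i] with t[j]
    # X: last column puts s[i] over a space
    # Y: last column puts a space over t[j]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)           # one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)           # one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + w
            # Extending an existing gap costs g; opening a new one costs h + g.
            X[i][j] = max(M[i - 1][j] - h - g, X[i - 1][j] - g, Y[i - 1][j] - h - g)
            Y[i][j] = max(M[i][j - 1] - h - g, Y[i][j - 1] - g, X[i][j - 1] - h - g)
    return max(M[n][m], X[n][m], Y[n][m])


# A--T style alignment: two matches and one gap of length 2, 20 - (4 + 2*1).
print(affine_global_score("ACGT", "AT"))  # 14
```

With h = 0 the penalty reduces to a fixed cost g per space, and the result matches the plain global alignment score.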
