Dynamic Programming

Alignment II
2
Pair-wise sequence alignments
A: C A T - T C A - C
   |   |     | |   |
B: C - T C G C A G C

Idea: Display one sequence above
another with spaces inserted in both
to reveal similarity
3
Two types of alignment
S = CTGTCGCTGCACG
T = TGCCGTG
Global alignment:
CTGTCG-CTGCACG
-TGC-CG-TG----
Local alignment:
CTGTCGCTGCACG--
-------TGC-CGTG
4
Global alignment: Scoring
CTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: $\alpha$
Mismatch penalty: $\beta$
Space penalty: $\gamma$

$\mathrm{score}(A) = \alpha w - \beta x - \gamma y$
$w$ = #matches, $x$ = #mismatches, $y$ = #spaces
5
Global alignment: Scoring
S:   C   T   G   T   C   G   -   C   T   G   C
T:   -   T   G   C   -   C   G   -   T   G   -
    -5  10  10  -2  -5  -2  -5  -5  10  10  -5
Total = 11
Reward for matches: 10
Mismatch penalty: 2
Space penalty: 5
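The scoring rule can be checked with a short sketch. The function name `score_alignment` and its parameter names are illustrative assumptions, not part of the slides; the two rows are the first 11 columns of the global alignment CTGTCG-CTGCACG / -TGC-CG-TG---- shown earlier, with `-` standing for a space.

```python
def score_alignment(row_s, row_t, match=10, mismatch=2, space=5):
    """Score a gapped alignment given as two equal-length rows ('-' = space)."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have equal length")
    w = x = y = 0  # number of matches, mismatches, spaces
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            y += 1          # column contains a space
        elif a == b:
            w += 1          # match
        else:
            x += 1          # mismatch
    return match * w - mismatch * x - space * y


print(score_alignment("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

With 4 matches, 2 mismatches and 5 spaces this gives 40 - 4 - 25 = 11, the total on the slide.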
6
Optimum Alignment
The score of an alignment is a measure of
its quality
Optimum alignment problem: Given a pair
of sequences X and Y, find an alignment
(global or local) with maximum score
The similarity between X and Y, denoted
sim(X,Y), is the maximum score of an
alignment of X and Y
7
Alignment algorithms
Global: Needleman-Wunsch
Local: Smith-Waterman
NW and SW use dynamic
programming
Variations:
Gap penalty functions
Scoring matrices
8
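Global Alignment: Algorithm
$C(i,j)$ = Cost of optimum alignment of $S_{1..i}$ and $T_{1..j}$
$S_{1..i}$ = Prefix of length $i$ of $S$
$T_{1..j}$ = Prefix of length $j$ of $T$
$$w(a,b) = \begin{cases} +\alpha & \text{if } a = b \text{ (match reward)} \\ -\beta & \text{if } a \neq b \text{ (mismatch penalty)} \end{cases}$$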
9

Theorem. $C(i,j)$ satisfies the following relationships:
Initial conditions:
$$C(i,0) = -\gamma i, \qquad C(0,j) = -\gamma j$$
Recurrence relation: For $1 \le i \le n$, $1 \le j \le m$:
$$C(i,j) = \max \begin{cases} C(i-1,j-1) + w(S_i,T_j) \\ C(i-1,j) - \gamma \\ C(i,j-1) - \gamma \end{cases}$$
where $\gamma$ is the space penalty.
10
Justification
$S_1\ S_2 \ldots S_{i-1}\quad S_i$
$T_1\ T_2 \ldots T_{j-1}\quad T_j$
$C(i-1,j-1) + w(S_i,T_j)$

$S_1\ S_2 \ldots S_{i-1}\quad S_i$
$T_1\ T_2 \ldots T_j\quad -$
$C(i-1,j) - \gamma$

$S_1\ S_2 \ldots S_i\quad -$
$T_1\ T_2 \ldots T_{j-1}\quad T_j$
$C(i,j-1) - \gamma$
11
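Example
Case 1: Line up $S_i$ with $T_j$
S: C A T T C A C      (positions 1..i-1, then i)
T: C - T T C A G      (positions 1..j-1, then j)
Case 2: Line up $S_i$ with space
S: C A T T C A - C    (positions 1..i-1, then i)
T: C - T T C A G -    (positions 1..j, then a space)
Case 3: Line up $T_j$ with space
S: C A T T C A C -    (positions 1..i, then a space)
T: C - T T C A - G    (positions 1..j-1, then j)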
12
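Computation Procedure
Fill the table from $C(0,0)$ to $C(n,m)$; each entry $C(i,j)$ depends only on $C(i-1,j-1)$, $C(i-1,j)$ and $C(i,j-1)$, with space penalty $\gamma$:
$$C(i,j) = \max\{\, C(i-1,j-1) + w(S_i,T_j),\; C(i-1,j) - \gamma,\; C(i,j-1) - \gamma \,\}$$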
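The table-filling procedure can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `global_alignment_table` and the keyword names are assumptions, and the default values are the +10 / -2 / -5 scores used in the worked example.

```python
def global_alignment_table(s, t, match=10, mismatch=2, space=5):
    """Return the (n+1) x (m+1) table C for global alignment of s and t."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    # Initial conditions: aligning a prefix with the empty prefix costs one space per symbol.
    for i in range(1, n + 1):
        C[i][0] = -space * i
    for j in range(1, m + 1):
        C[0][j] = -space * j
    # Recurrence: best of diagonal (match/mismatch), up (space in t), left (space in s).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] - space,
                          C[i][j - 1] - space)
    return C


table = global_alignment_table("CATTCAC", "CTCGCAGC")
print(table[7][8])  # 33
```

Rows are filled top to bottom and left to right, so every entry a cell depends on is already available when the cell is computed.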
13
+10 for match, -2 for mismatch, -5 for space
          C    T    C    G    C    A    G    C
     0   -5  -10  -15  -20  -25  -30  -35  -40
C   -5   10    5
A  -10
T  -15
T  -20
C  -25
A  -30
C  -35
14
          C    T    C    G    C    A    G    C
     0   -5  -10  -15  -20  -25  -30  -35  -40
C   -5   10    5    0   -5  -10  -15  -20  -25
A  -10    5    8    3   -2   -7    0   -5  -10
T  -15    0   15   10    5    0   -5   -2   -7
T  -20   -5   10   13    8    3   -2   -7   -4
C  -25  -10    5   20   15   18   13    8    3
A  -30  -15    0   15   18   13   28   23   18
C  -35  -20   -5   10   13   28   23   26   33

Traceback can yield both optimum alignments: the paths split at C(4,4) = 8, which is reached from both C(3,3) and C(4,3).
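A traceback can be sketched as follows. The function name `global_align` and the tie-breaking order (diagonal first, then up, then left) are assumptions made for this illustration; a different order would recover the other optimum alignment.

```python
def global_align(s, t, match=10, mismatch=2, space=5):
    """Return (score, row_s, row_t) for one optimum global alignment."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = -space * i
    for j in range(1, m + 1):
        C[0][j] = -space * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(C[i - 1][j - 1] + w, C[i - 1][j] - space, C[i][j - 1] - space)

    # Walk back from C(n,m) to C(0,0), re-checking which case produced each entry.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else -mismatch
            if C[i][j] == C[i - 1][j - 1] + w:      # S_i lined up with T_j
                row_s.append(s[i - 1]); row_t.append(t[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and C[i][j] == C[i - 1][j] - space:  # S_i lined up with a space
            row_s.append(s[i - 1]); row_t.append("-")
            i -= 1
        else:                                         # T_j lined up with a space
            row_s.append("-"); row_t.append(t[j - 1])
            j -= 1
    return C[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))


score, a, b = global_align("CATTCAC", "CTCGCAGC")
print(score)  # 33
print(a)      # CAT-TCA-C
print(b)      # C-TCGCAGC
```

With this tie-breaking order the sketch returns the alignment shown on the first pair-wise alignment slide.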
15
End-gap free alignment
Gaps at the start or end of alignment
are not penalized
Match: +2 Mismatch and space: -1
Best global: Score = 1
Best end-gap free: Score = 9
16
Motivation: Shotgun assembly
Shotgun assembly produces a large set of
partially overlapping subsequences from
many copies of one unknown DNA sequence.
Problem: Use the overlapping sections to
paste the subsequences together.
Overlapping pairs will have a low global
alignment score, but a high end-gap free
score because of the overlap.
18
Algorithm
Same as global alignment, except:
Initialize with zeros (free gaps at start)
Locate max in the last row/column (free
gaps at end)
19
+10 for match, -2 for mismatch, -5 for gap
          C    T    C    G    C    A    G    C
     0    0    0    0    0    0    0    0    0
C    0   10    5   10    5   10    5    0   10
A    0    5    8    5    8    5   20   15   10
T    0    0   15   10    5    6   15   18   13
T    0   -2   10   13    8    3   10   13   16
C    0   10    5   20   15   18   13    8   23
A    0    5    8   15   18   13   28   23   18
G    0    0    3   10   25   20   23   38   33
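The two changes to the global algorithm (zero initialization, maximum over the last row and column) can be sketched as below. The function name `end_gap_free_score` is an assumption for illustration; the defaults are the +10 / -2 / -5 scores of the example table.

```python
def end_gap_free_score(s, t, match=10, mismatch=2, space=5):
    """Best alignment score when spaces at the start or end are not penalized."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at zero: leading spaces are free.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] - space,
                          C[i][j - 1] - space)
    # Trailing spaces are free: take the best entry in the last row or last column.
    last_row = max(C[n])
    last_col = max(row[m] for row in C)
    return max(last_row, last_col)


print(end_gap_free_score("CATTCAG", "CTCGCAGC"))  # 38
```

For the example sequences the maximum, 38, sits in the last row one column before the end, so the final C of the second sequence is left unaligned at no cost.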
20
Local Alignment: Motivation
Ignoring stretches of non-coding DNA:
Non-coding regions are more likely to be
subjected to mutations than coding regions.
Local alignment between two sequences is likely
to be between two exons.
Locating protein domains:
Proteins of different kind and of different
species often exhibit local similarities
Local similarities may indicate functional
subunits.
21
Local alignment: Example
S = g g t c t g a g
T = a a a c g a
Match: +2 Mismatch and space: -1
Best local alignment (Score = 5): c t g a aligned with c - g a
g g t c t g a g
a a a c - g a -
22
Local Alignment: Algorithm
Initialize top row and leftmost column to
zero.
$$C[i,j] = \max \begin{cases} C[i-1,j-1] + \mathrm{score}(s[i], t[j]) \\ C[i-1,j] - \gamma \\ C[i,j-1] - \gamma \\ 0 \end{cases}$$
where $\gamma$ is the space penalty.
$C[i,j]$ = Score of optimally aligning a
suffix of $s[1..i]$ with a suffix of $t[1..j]$.
23
+1 for a match, -1 for a mismatch, -5 for a space
      C  T  C  G  C  A  G  C
   0  0  0  0  0  0  0  0  0
C  0  1  0  1  0  1  0  0  1
A  0  0  0  0  0  0  2  0  0
T  0  0  1  0  0  0  0  1  0
T  0  0  1  0  0  0  0  0  0
C  0  1  0  2  0  1  0  0  1
A  0  0  0  0  1  0  2  0  0
C  0  1  0  1  0  2  0  1  1
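The local variant differs from the global one only in the zero initialization, the extra 0 in the maximum, and taking the best entry anywhere in the table. A minimal sketch, with the function name `local_alignment_score` and its parameter names chosen for illustration:

```python
def local_alignment_score(s, t, match=1, mismatch=1, space=5):
    """Best score over all alignments of a substring of s with a substring of t."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]  # top row and leftmost column are zero
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            C[i][j] = max(0,                      # start a fresh local alignment here
                          C[i - 1][j - 1] + w,
                          C[i - 1][j] - space,
                          C[i][j - 1] - space)
            best = max(best, C[i][j])             # the optimum may end anywhere
    return best


print(local_alignment_score("CATTCAC", "CTCGCAGC"))                          # 2
print(local_alignment_score("ggtctgag", "aaacga", match=2, mismatch=1, space=1))  # 5
```

The first call reproduces the largest entry of the table above; the second reproduces the score of the earlier example with match +2 and mismatch/space -1.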
24
Some Results
Most pairwise sequence alignment problems
can be solved in O(mn) time.
Space requirement can be reduced to
O(m+n) while keeping the O(mn) run-time
[Myers88].
Highly similar sequences can be aligned in
O(dn) time, where d measures the distance
between the sequences [Landau86].
25
Reducing space requirements
O(mn) tables are often the limiting
factor in computing large alignments
There is a linear space technique that
only doubles the time required
[Hirschberg77]
26
IDEA: We only need the previous row to calculate the next
          C    T    C    G    C    A    G    C
     0    0    0    0    0    0    0    0    0
C    0   10    5   10    5   10    5    0   10
A    0    5    8    5    8    5   20   15   10
T
T
C
A
G
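The previous-row idea can be sketched for the global score as follows. The function name `global_score_two_rows` is an assumption; this version returns only the optimum score, not the alignment itself, which is what the full linear-space technique of [Hirschberg77] recovers in addition.

```python
def global_score_two_rows(s, t, match=10, mismatch=2, space=5):
    """Global alignment score using O(len(t)) memory instead of a full table."""
    m = len(t)
    prev = [-space * j for j in range(m + 1)]      # row 0 of the table
    for i in range(1, len(s) + 1):
        curr = [-space * i] + [0] * m              # column 0 of row i
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            curr[j] = max(prev[j - 1] + w,         # diagonal neighbour, previous row
                          prev[j] - space,         # neighbour above, previous row
                          curr[j - 1] - space)     # left neighbour, current row
        prev = curr                                # row i becomes the previous row
    return prev[m]


print(global_score_two_rows("CATTCAC", "CTCGCAGC"))  # 33
```

Only two rows are alive at any time, so memory is proportional to the length of one sequence while the running time stays O(mn).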
27
Linear-space Alignments
mn + 1/2 mn + 1/4 mn + 1/8 mn + 1/16 mn + ... = 2 mn
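The factor of two follows from a geometric series. Assuming the usual divide-and-conquer scheme behind the linear-space technique, where the subproblems at each level of recursion cover half the table area of the level before, the total work is bounded by:

```latex
\[
  mn \sum_{k=0}^{\infty} \frac{1}{2^{k}}
  \;=\; mn \cdot \frac{1}{1 - \tfrac{1}{2}}
  \;=\; 2mn
\]
```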
28
Affine Gap Penalty Functions
Gap penalty = h + gk
where
k = length of gap
h = gap opening penalty
g = gap continuation penalty
Can also be solved in O(nm) time using dynamic programming
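One common way to reach O(nm) time with affine penalties is to keep three tables, one per type of last column (often attributed to Gotoh). The slides do not say which method they have in mind, so the sketch below is an assumption; the function name `affine_global_score` and the default values of `h` and `g` are illustrative only.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=10, mismatch=2, h=4, g=1):
    """Global alignment score where a gap of length k costs h + g*k."""
    n, m = len(s), len(t)
    # M: last column pairs s[i] with t[j]
    # X: last column pairs s[i] with a space; Y: last column pairs t[j] with a space
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)   # one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)   # one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else -mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + w
            # Extending an existing gap costs g; opening a new one costs h + g.
            X[i][j] = max(M[i - 1][j] - (h + g), X[i - 1][j] - g, Y[i - 1][j] - (h + g))
            Y[i][j] = max(M[i][j - 1] - (h + g), Y[i][j - 1] - g, X[i][j - 1] - (h + g))
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAAA", "AA"))  # 14
```

In the usage line, two matches earn 20 and the single gap of length 2 costs h + 2g = 6. Setting `h=0` makes every space cost `g`, which reduces the sketch to the constant space penalty used earlier.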
