Sequence Alignment
Definition: Procedure for comparing two or more sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences
Pair-wise alignment: compare two sequences
Multiple sequence alignment: compare more than two sequences
Task: align abcdef with abdgf
Write the second sequence below the first:
abcdef
abdgf
Move the sequences to give the maximum match between them
Show characters that match using a vertical bar
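As a rough illustration of the last step, the sketch below draws the vertical bars for two sequences that have already been lined up. The function names `match_line` and `show_alignment`, and the gapped form `ab-dgf` (a gap written as `-` opposite `c`), are assumptions made for this example and are not part of the task statement.

```python
def match_line(top, bottom, gap="-"):
    """Return a string with '|' under each column where the two aligned
    sequences carry the same character, and ' ' elsewhere."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(top, bottom)
    )


def show_alignment(top, bottom):
    """Format the alignment as three lines: top, match bars, bottom."""
    return "\n".join([top, match_line(top, bottom), bottom])


# Hypothetical alignment of the task's sequences, with one gap placed by hand.
print(show_alignment("abcdef", "ab-dgf"))
```

With this hand-made alignment, the bars fall under a, b, d and f.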
We distinguish:
Global alignment algorithms, which optimize the overall alignment between two sequences
Local alignment algorithms, which seek only relatively conserved pieces of sequence
Alignment: Global vs. Local
-------TGKG-------
        |||
-------AGKG-------
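The difference between the two kinds of alignment can be sketched with a small dynamic-programming scorer. This is only an illustration: the function name `alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example. The global branch follows the Needleman-Wunsch style of recurrence and the local branch the Smith-Waterman style, neither of which is named in the slides.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b by dynamic programming.

    local=False: global score (Needleman-Wunsch style); both sequences
                 are aligned end to end.
    local=True:  local score (Smith-Waterman style); only the best
                 matching pair of segments counts.
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                cell = max(cell, 0)          # a local alignment may restart
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("TGKG", "AGKG"))              # global score
print(alignment_score("TGKG", "AGKG", local=True))  # local score
```

Under these assumed scores, the global alignment of TGKG with AGKG has to pay for the T/A mismatch, while the local alignment keeps only the conserved GKG.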
Similar genes arise by gene duplication:
A copy of a gene is inserted next to the original
The two copies mutate independently
Each can take on separate functions
All or part can be transferred from one part of the genome to another
Goal: Graphically display regions of similarity between two sequences (e.g., domains in common between two proteins of suspected similar function)
Basic Method: For two sequences of lengths M and N, lay out an M by N grid (matrix) with one sequence across the top and one sequence down the left side. For each position in the grid, compare the sequence elements at the top (column) and to the left (row). If and only if they are the same, place a dot at that position.
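The basic method can be sketched in a few lines. This is a minimal text rendering, assuming a `.` for each dot and a space for each empty cell; the function names `dot_matrix` and `format_plot` are made up for this example.

```python
def dot_matrix(seq_top, seq_left):
    """Return one text row per character of seq_left; each row has one
    cell per character of seq_top, '.' where the two characters are equal."""
    return [
        "".join("." if left == top else " " for top in seq_top)
        for left in seq_left
    ]


def format_plot(seq_top, seq_left):
    """Add the top sequence as a header and the left sequence as row labels."""
    lines = ["  " + seq_top]
    for label, row in zip(seq_left, dot_matrix(seq_top, seq_left)):
        lines.append(label + " " + row)
    return "\n".join(lines)


# The two sequences from the alignment task, reused here as sample input.
print(format_plot("abcdef", "abdgf"))
```

Runs of dots along a diagonal mark stretches where the two sequences agree; a shift from one diagonal to another marks an insertion or deletion.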
(Demonstration
Can't
Repeats
Compare character by character within a window (the window size has to be chosen)
Require a certain fraction of matches within the window in order to display it with a dot
(Demonstration A7)
The window size can be chosen to match the size of an average exon, an average protein structural element, a gene promoter, or an enzyme active site
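A windowed comparison along these lines might look like the sketch below. The name `windowed_dots` is hypothetical, the required fraction is expressed as a whole number of matches (`min_matches` out of `window`) to avoid rounding issues, and the dot is recorded at the start of the window; all three are assumptions of this example.

```python
def windowed_dots(seq_top, seq_left, window=3, min_matches=2):
    """Return (row, column) positions that would receive a dot.

    For every pair of window start positions, the two windows are compared
    character by character. A dot is recorded when at least min_matches of
    the window positions agree, i.e. the matching fraction is at least
    min_matches / window. Recording the dot at the window start is an
    assumption of this sketch.
    """
    dots = []
    for i in range(len(seq_left) - window + 1):
        for j in range(len(seq_top) - window + 1):
            matches = sum(
                seq_left[i + k] == seq_top[j + k] for k in range(window)
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# A repeated motif on the top sequence gives two dots in the same row.
print(windowed_dots("ACGTACGT", "ACGT", window=4, min_matches=4))
```

Larger windows and stricter thresholds suppress isolated chance matches, at the cost of missing short regions of similarity.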
Find the average (m) and s.d. (σ) of the match scores of the shuffled sequence
Convert the original (unshuffled) scores (x) to Z scores:
Z = (x - m)/σ
Use a threshold Z of 3 to 6
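The shuffling procedure can be sketched as follows. Several details are assumptions of this example and are not given in the slides: the match score is taken to be the number of identical positions between two equal-length sequences, the second sequence is the one shuffled, 100 shuffles are used, and the population standard deviation (`statistics.pstdev`) stands in for the s.d. The names `match_score`, `z_from_scores` and `z_score` are hypothetical.

```python
import random
import statistics


def match_score(a, b):
    """Count positions at which the two sequences carry the same character."""
    return sum(x == y for x, y in zip(a, b))


def z_from_scores(x, shuffled_scores):
    """Convert an unshuffled score x to Z = (x - m) / s.d., where m and
    s.d. are taken over the scores obtained with shuffled sequences."""
    m = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)  # assumption: population s.d.
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; Z is undefined")
    return (x - m) / sd


def z_score(a, b, n_shuffles=100, seed=0):
    """Score a against b, then against n_shuffles shuffled copies of b."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(match_score(a, letters))
    return z_from_scores(match_score(a, b), shuffled_scores)


z = z_score("ACGT" * 25, "ACGT" * 25)
print(z >= 3)  # compare against a threshold chosen from the 3 to 6 range
```

Shuffling keeps the composition of the sequence but destroys its order, so the shuffled scores estimate how many matches arise by chance alone.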
using
provides
Note the set of diagonals in the lower right that do not line up, due to an insertion near 475 on cI
[Figure: dot plots, with axes ticked every 100 up to 800]