18.417
Introduction to Computational Molecular Biology
Foundations of Structural Bioinformatics
Sebastian Will
MIT, Math Department
Fall 2011
Before we start
Instructor: Sebastian Will
Contact: wills@mit.edu
Office hours: by appointment, Office: 2-155
Lecture: Tuesday, Thursday, 9:30-11:00 am
Room: 8-205
Web: http://math.mit.edu/classes/18.417/
(slides, further information)
Final Project:
(minimize energy)
DNA
Transcription
RNA
Translation
Protein
Genetic code
Transcription: A,C,G,T → A,C,G,U
Translation: Triplets from alphabet {A,C,G,U} (= codons)
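The two mappings above can be sketched in a few lines of Python. This is only an illustration: the function names `transcribe` and `codons` are made up for this example, and the input is assumed to be the coding strand, so that transcription reduces to replacing T by U.

```python
def transcribe(dna: str) -> str:
    """Map a coding-strand DNA string over {A,C,G,T} to RNA over {A,C,G,U}."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("DNA must use only the letters A, C, G, T")
    return dna.replace("T", "U")


def codons(rna: str) -> list[str]:
    """Split an RNA string into consecutive triplets; an incomplete tail is dropped."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


rna = transcribe("ATGGCCTTTTAA")
print(rna)          # AUGGCCUUUUAA
print(codons(rna))  # ['AUG', 'GCC', 'UUU', 'UAA']
```

Translating each codon to an amino acid would additionally need the genetic code table, which is left out here.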
Protein Bio-Synthesis
Evolution (..., insertion, ... )
[Figure: tree of life with the groups Animals, Slime moulds, Fungi, Plants, Algae, Protozoa, Gram-positives, Chlamydiae, Green nonsulfur bacteria, Actinobacteria, Planctomycetes, Spirochaetes, Fusobacteria, Cyanobacteria (blue-green algae), Thermophilic sulfate-reducers, Acidobacteria, Proteobacteria, Crenarchaeota, Nanoarchaeota, Euryarchaeota; annotated with example sequence variants ACCGA, ACCTA, ACCCGA, TCCTA, ACTA]
molecules/fragments of molecules
Structural relation between molecules
Relation between sequence and structure
Interaction between molecules
Interaction networks, Regulatory networks, Metabolic networks
Structure of genomes, Relation between genomes
S.Will, 18.417, Fall 2011
...
Areas of Bioinformatics
1. Genomics: Study of entire genomes.
Huge amount of data, fast algorithms,
limited to sequence.
covalent bond:
[Figure: two hydrogen atoms (nucleus +1, 1e each) share an electron pair (2e) and form H-H]
e.g. Methane
small size
Non-covalent bonds
Covalent
[Figure: two hydrogen atoms sharing an electron pair (H-H)]
Non-covalent
Van der Waals (sum of the attractive or repulsive forces
between molecules, caused by correlations in the fluctuating
polarizations of nearby particles)
hydrogen bonds (attractive interaction of a hydrogen atom
with an electronegative atom)
[Figure: energy scale in kcal/mol, logarithmic from 0.1 to 1000, marking noncovalent bond, C-C bond, and complete glucose oxidation]
Functional groups
organic molecules: carbon skeleton + functional groups
functional groups are involved in specific chemical reactions
Alcohol: hydroxyl group (-OH)
Ketone/Aldehyde: carbonyl group (C=O)
Carboxylic Acid: carboxyl group (-COOH)
Amine: amino group (-NH2)
4 families:
sugars
proteins
Sugars
component of building blocks, main energy source
general formula (CH2O)n
[Figure: ring structures of sugars]
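As a small worked example of the general formula (CH2O)n, the sketch below expands it for a given n and sums standard atomic masses. The helper names and the rounded atomic masses are choices made for this illustration, not part of the slides.

```python
# Rounded standard atomic masses in g/mol (illustrative values).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def sugar_formula(n: int) -> dict[str, int]:
    """Atom counts of (CH2O)n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return {"C": n, "H": 2 * n, "O": n}


def molar_mass(formula: dict[str, int]) -> float:
    """Sum of atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


hexose = sugar_formula(6)   # n = 6, e.g. glucose C6H12O6
print(hexose)               # {'C': 6, 'H': 12, 'O': 6}
print(molar_mass(hexose))   # roughly 180 g/mol
```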
Fats
Amino Acids
Nucleotides
nucleoside: pentose + base, linked by a glycosidic bond
pentose: OH = ribose, H = deoxyribose
nucleotide: nucleoside + phosphate (nucleotide monophosphate, nucleotide diphosphate, nucleotide triphosphate)
Purines: Adenine, Guanine
Pyrimidines: Cytosine, Uracil, Thymine
[Figure: base pairs Adenine-Thymine and Guanine-Cytosine]
DNA structure
Primary structure: chain of nucleotides
Tertiary Structure: antiparallel double helix
[Figure: chemical structure of a DNA double strand with phosphate-deoxyribose backbone, 5' and 3' ends, and the base pairs Adenine-Thymine and Guanine-Cytosine]
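Because the two strands are antiparallel and pair A with T and G with C, the partner strand read in its own 5' to 3' direction is the reverse complement. A minimal sketch, with `reverse_complement` as a name chosen for this example:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5' to 3'."""
    # Reverse first (antiparallel orientation), then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ACCGA"))  # TCGGT
```

Applying the function twice returns the original strand, which is a quick sanity check for the pairing table.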
RNA structure
Hammerhead Ribozyme
tRNA
linear representation
GGGCGUGUGGCGUAGUCGGUAGCGCGCUCCCUUAGCAUGGAGAGGUCUCCGGUUCGAUUCCGGACACGCCCACCA
(((((((..((((........)))).(((((.......)).)))...(((((.......))))))))))))....
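The linear (dot-bracket) representation can be turned into an explicit list of base pairs with a stack. The sketch below assumes the plain notation with only `(`, `)` and `.`, i.e. no pseudoknots; `base_pairs` is a name made up for this example.

```python
def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based index pairs (i, j) of matching brackets."""
    stack = []   # positions of currently unmatched '('
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(base_pairs("((..))."))  # [(0, 5), (1, 4)]
```

The same function can be applied to the tRNA string above to list which positions of the sequence are paired.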
and so on . . .
Features (α-helix):
3.6 amino acids per turn
hydrogen bond between residues n and n + 4
local motif
approximately 40% of the structure
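The two numeric features, hydrogen bonds between residues n and n + 4 and 3.6 amino acids per turn, can be made concrete with a short sketch. The function names are illustrative, and the idealized helix is assumed to span a contiguous residue range.

```python
RESIDUES_PER_TURN = 3.6


def helix_hbonds(first: int, last: int) -> list[tuple[int, int]]:
    """Hydrogen-bonded pairs (n, n + 4) inside a helix covering residues first..last."""
    return [(n, n + 4) for n in range(first, last - 3)]


def rotation_per_residue() -> float:
    """Rotation about the helix axis per residue, in degrees (about 100)."""
    return 360.0 / RESIDUES_PER_TURN


print(helix_hbonds(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
print(rotation_per_residue())
```

A helix of fewer than five residues yields no pairs, which matches the n to n + 4 spacing.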
Features (β-sheet):
2 amino acids per turn
hydrogen bond between ... interactions
approximately 20% of the structure
Features (turn):
Up to 5 residue length
hydrogen bonds depend on type
local interactions
approximately 5-10% of the structure
DNA sequencing
A very incomplete overview
genome published
Sequence Alignment
pairwise alignment
Sequence A: ACGTGAACT
Sequence B: AGTGAGT
align A and B
Sequence A: ACGTGAACT
Sequence B: A-GTGA-GT
global and local alignment
multiple alignment (NP-complete, hence heuristics)
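For the pairwise case, a global alignment of Sequence A and Sequence B can be computed by dynamic programming (Needleman-Wunsch). The sketch below is one possible formulation: the scoring values (match +1, mismatch -1, gap -1) and the name `global_align` are assumptions for this example, not taken from the slides.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = global_align("ACGTGAACT", "AGTGAGT")
print(best)   # 3
print(row_a)
print(row_b)
```

Several alignments can share the optimal score, so the printed rows may differ from the alignment shown above while scoring the same. Local alignment (Smith-Waterman) changes the recurrence by clamping scores at zero and starting the traceback at the maximum cell.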
sequences:
Simultaneous Alignment and Folding
        ((..((((((((...(((.................))).))))))))..))
fdhA    CGC-CACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAG-GUGG-CG   48
fwdB    AUG-UUGGAGGGGAACCCGU-------------AAGGGACCCUCCAAG-AU   36
selD    UUACGAUGUGCCGAACCCUU------------UAAGGGAGGCACAUCGAAA   39
vhuD    GU--UCUCUCGGGAACCCGU------------CAAGGGACCGAGAGA--AC   35
vhuU    AGC-UCACAACCGAACCCAU-------------UUGGGAGGUUGUGAG-CU   36
fruA    CC--UCGAGGG-GAACCCGA-------------AA-GGGACCCGAGA--GG   32
hdrA    GG--CACCACUCGAAGGCUA-------------AG-CCAAAGUGGUG--CU   33
        .........10........20........30........40........50
[Figure: secondary structure drawings (2D diagrams) of the aligned RNAs]
RNA-RNA Interaction
Prediction of interaction complex of two RNAs
Similar to pseudoknot prediction, the unrestricted problem is NP-complete
MC-Fold / MC-Sym
MC-Sym:
input multiple alignment:
[structure] . : : <<< _ _ _ _ > - >> : << - < . _ _ _ . >>> .
human . A A G A C U U C G G A U C U G G C G . A C A . C C C .
mouse a U A C A C U U C G G A U G - C A C C . A A A . G U G a
orc . A G G U C U U C - G C A C G G G C A g C C A c U U C .
[Figure: covariance model built from the alignment: example structure; guide tree with nodes ROOT, MATL, MATR, MATP, BIF, BEGL, BEGR, END; expansion of the nodes into states S, IL, IR, ML, MR, MP, D, B, E, grouped into "split sets" and inserts]
EvoFold
Protein-Protein Interaction
protein models
optimal ab-initio prediction in HP-lattice protein models (3D cubic and fcc)
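To make "minimize energy" concrete for HP-lattice models, the sketch below evaluates the energy of a given conformation on the 3D cubic lattice. It assumes the usual HP energy function, which is not spelled out on the slide: each pair of H monomers that are lattice neighbors but not chain neighbors contributes -1. The name `hp_energy` is made up for this example, and the fcc lattice is not covered.

```python
def manhattan(p: tuple[int, int, int], q: tuple[int, int, int]) -> int:
    """Manhattan distance; lattice neighbors on the cubic lattice have distance 1."""
    return sum(abs(x - y) for x, y in zip(p, q))


def hp_energy(sequence: str, coords: list[tuple[int, int, int]]) -> int:
    """Energy of an HP sequence placed on cubic-lattice points (assumed HP model)."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice point per monomer is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    if any(manhattan(p, q) != 1 for p, q in zip(coords, coords[1:])):
        raise ValueError("consecutive monomers must be lattice neighbors")

    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):   # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H" and manhattan(coords[i], coords[j]) == 1:
                contacts += 1
    return -contacts


fold = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", fold))  # -1
```

Structure prediction in this model means searching over all self-avoiding walks for one with minimal energy; the function above only scores a single candidate.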
vs.