Beruflich Dokumente
Kultur Dokumente
• Gene Recognition
The GENSCAN hidden Markov model
Comparative Gene Recognition – Twinscan, SLAM
…ACGTGACTGAGGACCGTG
CGACTGAGACTGACTGGGT
CTAGCTAGACTACGTTTTA
TATATATATACGTCGTCGT
ACTGATGACTAGATTACAG
ACTGATTTAGATACCTGAC
TGATTTTAAAAAAATATT…
Which representative of the species?
Which human?
Answer one:
http://info.med.yale.edu/genetics/kkidd/point.html
Migration of human variation
http://info.med.yale.edu/genetics/kkidd/point.html
Migration of human variation
http://info.med.yale.edu/genetics/kkidd/point.html
Is that all?
• Multiregional Evolution
Fossil records show a continuous change
of morphological features
Proponents of the theory doubt mtDNA
and other genetic evidence
DNA Sequencing
Goal:
Find the complete sequence of A, C, G, T’s in DNA
Challenge:
There is no machine that takes long DNA as an input, and gives the
complete sequence as output
Shake
DNA fragments
Known
Vector location
Circular genome
(bacterium, plasmid) + = (restriction
site)
Different types of vectors
Plasmid 2,000-10,000
Can control the size
Cosmid 40,000
3. Include dideoxynucleoside
(modified a, c, g, t)
1. Filtering
2. Smoothening
3. Correction for length compressions
4. A method for calling the letters – PHRED
A C G A A T C A G …A
16 18 21 23 25 15 28 30 32 …21
Double-barreled sequencing:
Both leftmost & rightmost ends are
sequenced
Method to sequence longer regions
genomic segment
~500 bp ~500 bp
Reconstructing the Sequence
(Fragment Assembly)
reads
Lander-Waterman model:
Assuming uniform distribution of reads, C=10 results in 1 gapped
region /1,000,000 nucleotides
Challenges with Fragment Assembly
• Sequencing errors
~1-2% of bases are wrong
• Repeats
1. Hierarchical – Clone-by-clone
i. Break genome into many long pieces
ii. Map each long piece onto the genome
iii. Sequence each piece with shotgun
map
genome
Goal:
Methods:
• Hybridization
• Digestion
1. Hybridization
p1 pn
C1 C2 ……………….Cn
Matrix: 0 0 1 …………………..1
m probes × n clones
0 0 0 0 0 0………1 1 1 0
0 0 0 0 0 0………0 1 1 1
C1 C2 ……………….Cn
0 0 1 …………………..1
A probe (short word) can hybridize in many
places in the genome
Computational Problem:
1 1 0 …………………..0
Find the order of probes that implies the minimal
1 0 1…………………...0
probe repetition
Solutions:
Greedy, Probabilistic, Lots of manual curation
2. Digestion
Double digestion:
Cut with enzyme A, enzyme B, then enzymes A + B
Online Clone-by-clone
The Walking Method
The Walking Method
Hierarchical Sequencing
ADV. Easy assembly
DIS. Build library & physical map;
redundant sequencing
Whole Genome Shotgun (WGS)
ADV. No mapping, no redundant sequencing
DIS. Difficult to assemble and resolve repeats
1. Grow clone
2. Prepare & Shear DNA
3. Prepare shotgun library & perform shotgun
4. Assemble in a computer
5. Close remaining gaps
Efficient Inefficient
~500 bp ~500 bp