
DNA Sequencing

DNA sequencing
Determination of the precise sequence of nucleotides in a sample of DNA.
Two similar methods:
1. Maxam and Gilbert method
2. Sanger method
Both depend on producing a mixture of oligonucleotides, labeled either radioactively or with a fluorescent dye, that share one common end and differ in length by a single nucleotide at the other end.
This mixture is separated by high-resolution electrophoresis on polyacrylamide gels, and the positions of the bands are determined.

Why sequence DNA?


Provides all the genes available for an organism to use, a very important tool for biologists
Gives not just the sequence of genes, but also the positioning of genes and the sequences of regulatory regions
New recombinant DNA constructs must be sequenced to verify construction or the positions of mutations
Etc.

History of DNA sequencing


MC chapter 12

Maxam-Gilbert
Walter Gilbert
Harvard physicist
Knew James Watson
Became intrigued with the biological side
Became a biophysicist

Allan Maxam

The Maxam-Gilbert Technique


Principle - Chemical Degradation of Purines
Purines (A, G) are damaged by dimethyl sulfate
Methylation of the base
Heat releases the base
Alkali cleaves G
Dilute acid cleaves A > G

Maxam-Gilbert Technique
Principle - Chemical Degradation of Pyrimidines
Pyrimidines (C, T) are damaged by hydrazine
Piperidine cleaves the backbone
2 M NaCl inhibits the reaction with T

Maxam and Gilbert Method


Chemical degradation of purified fragments
The single-stranded DNA fragment to be sequenced is end-labeled: treatment with alkaline phosphatase removes the 5′-phosphate
Reaction with 32P-labeled ATP in the presence of polynucleotide kinase then attaches a 32P label to the 5′ terminus
The labeled DNA fragment is divided into four aliquots, each treated with a reagent that modifies a specific base:
1. Aliquot A + dimethyl sulphate, which methylates guanine residues
2. Aliquot B + formic acid, which modifies adenine and guanine residues
3. Aliquot C + hydrazine, which modifies thymine and cytosine residues
4. Aliquot D + hydrazine + 5 mol/l NaCl, which makes the reaction specific for cytosine

The four aliquots are then incubated with piperidine, which cleaves the sugar-phosphate backbone of the DNA next to the residue that has been modified

Maxam-Gilbert sequencing - modifications

Chemical cleavage Method (Maxam & Gilbert 1977)


Chemical cleavage Method (Maxam & Gilbert 1980)

End labelling of DNA strand by 32P
Terminal phosphate hydrolyzed by BAP (bacterial alkaline phosphatase)
32P added by [32P]ATP and polynucleotide kinase

Chemical cleavage Method (Maxam & Gilbert 1980)
End-labeled DNA

Base-specific cleavage (DNA cleaved prior to specific base)

1. G only : DMS + piperidine
2. A + G : DMS (acidic) + piperidine
3. C + T : Hydrazine + piperidine
4. C only : Hydrazine (in 1.5 M NaCl) + piperidine

(1) DMS + Piperidine Cleavage

[Figure: structure of methylated G attached to its sugar, with the labels "Depurination" and "Bond breakage"]

Base specific Cleavage

For G: DMS + Piperidine

[Figure: gel lanes A, G, T, C]

Base specific Cleavage
(DNA cleaved prior to specific base)

Reaction                                    Fragments
1. G only : DMS + piperidine                4
2. A + G  : DMS (acidic) + piperidine       6
3. C + T  : Hydrazine + piperidine          4/3
4. C only : Hydrazine (in 1.5 M NaCl)       1
Total                                       14


T G T A G G A G C T

Base specific Cleavage

Sequence: T G T A G G A G C T
Fragments per lane: G (4), A + G (6), C + T (3), C (1)

[Figure: gel with lanes G, A+G, C+T and C; bases read alongside the ladder: T C G A G G A T G]
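The fragment counts on the slides can be reproduced with a short script. This is only a sketch: it assumes the 32P label sits on the 5′ end of the sequence as written, and that each cleavage removes the modified base and everything after it, so a hit on the very first base leaves no labeled fragment. The function and variable names are made up for illustration.

```python
# Bases attacked in each of the four reactions, as listed on the slides.
REACTIONS = {
    "G": "G",
    "A+G": "AG",
    "C+T": "CT",
    "C": "C",
}


def labeled_fragment_lengths(seq, bases):
    """Lengths of the 5'-labeled fragments left after cleavage before each hit.

    A hit at 0-based index i leaves the i nucleotides that precede it.
    A hit at index 0 leaves nothing, so it is skipped (assumption).
    """
    return [i for i, base in enumerate(seq) if base in bases and i > 0]


def fragment_table(seq):
    """Map each reaction name to its list of labeled fragment lengths."""
    return {name: labeled_fragment_lengths(seq, bases)
            for name, bases in REACTIONS.items()}


if __name__ == "__main__":
    table = fragment_table("TGTAGGAGCT")
    for name, lengths in table.items():
        print(name, len(lengths), lengths)
    print("Total", sum(len(v) for v in table.values()))
```

Under these assumptions the C + T reaction has four target bases but only three visible fragments, which is one possible reading of the "4/3" entry on the slide.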

Maxam-Gilbert sequencing - summary

Advantages/disadvantages Maxam-Gilbert sequencing

Requires lots of purified DNA and many intermediate purification steps
Relatively short readings
Automation not available (sequencers)
Remaining use: footprinting (proteins bound to specific regions partially protect the DNA against modification, which produces holes in the sequence ladder)

In contrast, the Sanger sequencing methodology requires little if any DNA purification, no restriction digests, and no labeling of the DNA sequencing template

Sanger Method
Fred Sanger, 1958
Was originally a protein chemist
Made his first mark in sequencing proteins
Made his second mark in sequencing RNA

1980 dideoxy sequencing

Original Sanger Method


Random incorporation of a dideoxynucleoside triphosphate into a growing strand of DNA
Requires DNA polymerase I
Requires a cloning vector with initial primer (M13, a high-yield bacteriophage, modified by adding β-galactosidase screening and a polylinker)
Uses 32P-deoxynucleoside triphosphates

Sanger Method
In vitro DNA synthesis using terminators: dideoxynucleotides that do not permit chain elongation after their incorporation
DNA synthesis with deoxy- and dideoxynucleotides results in termination of synthesis at specific nucleotides
Requires a primer, DNA polymerase, a template, a mixture of nucleotides, and a detection system
Incorporation of dideoxynucleotides into the growing strand terminates synthesis
Synthesized strand sizes are determined for each dideoxynucleotide by gel or capillary electrophoresis
An enzymatic method

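Dideoxynucleotide
[Figure: dideoxynucleoside triphosphate, showing the triphosphate (PPP) on the 5′ CH2 of the sugar, the base, and the 3′ position]
No hydroxyl group at the 3′ end prevents strand extension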

The principles
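Partial copies of DNA fragments are made with DNA polymerase
ddNTPs give a collection of DNA fragments that terminate with A, C, G or T
Separate by gel electrophoresis
Read DNA sequence

[Figure: primer annealed to the template 3′-CCGTAC-5′ and extended with dNTPs plus ddATP, ddCTP, ddGTP or ddTTP. Terminated products: ddA gives GGCA; ddT gives GGCAT; ddC gives GGC; ddG gives G, GG and GGCATG]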

Dideoxy Chain Terminator


Template
Primer
Extension chemistry: polymerase, termination, labeling
Separation
Detection

Chain Terminator Basics


Target: TGCA
Template-primer
Extend with labeled terminators: ddA gives A, ddC gives AC, ddG gives ACG, ddT gives ACGT

dN : ddN = 100 : 1
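A rough feel for what a 100 : 1 ratio means can be had from a simple geometric model. The sketch below assumes the polymerase picks a dNTP or a ddNTP purely in proportion to their concentrations, with no preference for either, which real enzymes do not strictly obey. The function names are illustrative.

```python
def termination_probability(dn_to_ddn_ratio=100.0):
    """Chance that a given incorporation is a terminator (ddNTP)."""
    return 1.0 / (dn_to_ddn_ratio + 1.0)


def fraction_terminated_at(n, p):
    """Fraction of chains that end exactly at the n-th candidate position.

    The chain must escape termination n - 1 times, then terminate once.
    """
    return p * (1.0 - p) ** (n - 1)


def fraction_still_growing(n, p):
    """Fraction of chains that pass n candidate positions without stopping."""
    return (1.0 - p) ** n


if __name__ == "__main__":
    p = termination_probability(100.0)
    for n in (1, 50, 100, 500):
        print(n, fraction_terminated_at(n, p), fraction_still_growing(n, p))
```

In this model the mean number of candidate positions before termination is 1/p, about 101 for a 100 : 1 ratio, so raising the ddNTP share shortens the ladder and lowering it lengthens the ladder.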

Chain-Terminator Method (Sanger, 1977)

OVERVIEW

Dideoxy sequencing relies on copying a single fragment of DNA many times, with each copy prematurely terminated.

This produces a series of nested DNA copies, each differing by a single base in length.
These fragments are separated on a polyacrylamide gel, and the DNA sequence is determined by reading the terminating base of each fragment on the gel.

ddNTP
Dideoxynucleoside triphosphates lack an -OH group at the 3′-carbon position and cannot add another nucleoside at that position, thus preventing further DNA synthesis.

Chain-Terminator Method (Sanger, 1977)

Normal nucleoside triphosphates (A, G, T, C) carry a 3′ -OH group
Dideoxy nucleoside triphosphates are tagged with fluorescent dyes

[Figure: template G A G C C T G G A G C with normal (-OH) nucleotides and dye-tagged dideoxy nucleotides]

Chain-Terminator Method (Sanger, 1977) DNA synthesis


The enzyme DNA polymerase is used to copy a fragment of DNA. DNA polymerase requires a primer to attach to before it can copy DNA. The first step in the copying process is the pairing of a primer with the complementary sequence on a segment of DNA.

Chain-Terminator Method (Sanger, 1977) DNA synthesis


DNA polymerase attaches to the primer and begins copying the DNA strand.
DNA polymerase uses a mixture of nucleoside triphosphates to synthesize a new DNA strand.

Chain-Terminator Method (Sanger, 1977) DNA synthesis

The first DNA copy is extended

Chain-Terminator Method (Sanger, 1977) DNA synthesis & termination


DNA synthesis is terminated when one of the dideoxy nucleotides is incorporated in place of a normal nucleoside triphosphate.

Chain-Terminator Method (Sanger, 1977) DNA fragment released

The DNA copy is released and the original DNA strand is available to be copied again

Chain-Terminator Method (Sanger, 1977) DNA synthesis resumed


The DNA strand is copied a second time, until a dideoxy nucleotide is incorporated

Chain-Terminator Method (Sanger, 1977) Second DNA fragment released


The second DNA copy is released

Chain-Terminator Method (Sanger, 1977) DNA synthesis terminated third time


The DNA strand is copied and terminated a third time as another dideoxy nucleoside is incorporated

Chain-Terminator Method (Sanger, 1977) Third DNA fragment released


The third strand is released. This process is repeated until thousands of copies have been made, each terminated at a different base.

Chain-Terminator Method (Sanger, 1977) Three DNA fragments


The three DNA copies shown above are representative of the fragments that result from this process.

Chain-Terminator Method (Sanger, 1977) DNA fragments


This random premature termination creates a set of nested fragments of DNA, each differing by a single base in length.
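The nested set can be written out explicitly for a short template. The sketch below enumerates every possible termination product instead of simulating random termination, and uses the six-base template CCGTAC from the principle slide. The function names are illustrative, not part of any sequencing software.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def nested_fragments(template_3_to_5):
    """All chain-terminated copies of a template written 3'->5'.

    The new strand grows 5'->3', so its n-th base pairs with the n-th
    template base in this orientation. Each prefix is one product.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_ladder(fragments):
    """Read the sequence from the terminal base of each fragment.

    Sorting by length mimics electrophoresis: shortest fragments run
    furthest, so the gel is read from the bottom up.
    """
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


if __name__ == "__main__":
    frags = nested_fragments("CCGTAC")
    print(frags)
    print(read_ladder(frags))
```

Because only the last base of each fragment carries the terminator label, the order in which fragments were produced does not matter; sorting by length alone recovers the sequence.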

Chain-Terminator Method (Sanger, 1977) Electrophoresis

Electrophoresis
[Figure: gel between spacing plates; template DNA loaded in lanes A, G, T, C]

Electrophoresis
Fluorescent bands after electrophoresis
The fluorescent bands in each lane of the gel are read from the bottom up to determine the sequence of the DNA segment.
Sequence read from the gel, bottom up: GATCTTGCGTCAGTCA
The same bands listed top to bottom: ACTGACTGCGTTCTAG
Complementary sequence: TGACTGACGCAAGATC
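The relationship between the three sequences on this slide can be checked in a few lines. This is a minimal sketch with illustrative helper names; it only handles the four unambiguous bases.

```python
# Translation table for base-by-base complementation.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, same left-to-right order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement of the sequence written in the opposite direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    bottom_up = "GATCTTGCGTCAGTCA"   # read from the gel, bottom up
    top_down = bottom_up[::-1]       # the same bands, top to bottom
    print(top_down)
    print(complement(top_down))
```

Complementing the top-to-bottom listing gives the same string as taking the reverse complement of the bottom-up read.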


Electrophoresis
Documentation
A printout of the laser scan is also prepared as a permanent record

Electrophoresis

Sanger Method Sequencing Gel

Template
ssDNA vectors
M13 pUC

PCR dsDNA (+/- PCR)

Primers
Universal primers
cheap, reliable, easy, fast, parallel BULK sequencing

Custom primers
expensive, slow, one-at-a-time ADAPTABLE

Extension
Polymerase
Sequenase Thermostable (Cycle Sequencing)

Terminators
Dye labels (Big Dye)
spectrally different, high fluorescence

ddA,C,G,T with primer labels

Separation
Gel Electrophoresis Capillary Electrophoresis
suited to automation
rapid (2 hrs vs 12 hrs) re-usable simple temperature control 96 well format

Sample Output

1 lane

Sequencing of DNA by the Sanger method

Comparison
Sanger Method
Enzymatic
Requires DNA synthesis
Termination of chain elongation

Maxam Gilbert Method
Chemical
Requires DNA
Requires long stretches of DNA
Breaks DNA at different nucleotides

Current trends in sequencing:


It is rare for labs to do their own sequencing:
--costly, perishable reagents
--time consuming
--success rate varies
Instead most labs send out for sequencing:
--You prepare the DNA (usually plasmid, M13, or PCR product) and supply the primer; a company or university sequencing center does the rest
--The sequence is recorded by an automated sequencer as an electropherogram

BREAK UP THE GENOME, PUT IT BACK TOGETHER

[Figure: sequence reads (~1 kbp) are assembled by matching overlaps into a BAC sequence (~160 kbp); BAC overlaps give the genome sequence]
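Matching overlaps can be illustrated with a toy greedy assembler. This is a sketch only: real assemblers handle sequencing errors, repeats and both strands, none of which is attempted here, and the function names and the `min_overlap` parameter are assumptions made for the example.

```python
def overlap_length(left, right, min_overlap=3):
    """Longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    reads = ["GTCAGTCA", "GATCTTGC", "TTGCGTCA"]
    print(greedy_assemble(reads))
```

The three short reads in the usage example overlap by four bases each and merge back into a single 16-base contig. The same idea applies one level up when overlapping BAC sequences are joined.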

Sequencing large pieces of DNA: the shotgun method


Break DNA into small pieces (sizes of around 1000 base pairs are preferable)
Clone the pieces of DNA into M13
Sequence enough M13 clones to ensure complete coverage (e.g. a 3 million base pair genome requires sequencing 5x to 10x 3 million base pairs for a reliable representation of the genome)
Assemble the genome through overlap analysis using computer algorithms; also polish sequences using mapping information from individual clones, characterized genes, and genetic markers
This process is assisted by robotics
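The coverage rule of thumb translates directly into a number of clones to sequence. The sketch below assumes every read yields the full 1000 bases mentioned above; the function name and default are illustrative.

```python
import math


def reads_needed(genome_bp, coverage, read_bp=1000):
    """Number of reads required to sequence `coverage` x the genome length."""
    return math.ceil(genome_bp * coverage / read_bp)


if __name__ == "__main__":
    genome = 3_000_000  # 3 million base pair genome from the example
    for fold in (5, 10):
        print(f"{fold}x coverage: {reads_needed(genome, fold)} reads")
```

For the 3 million base pair example this gives 15,000 reads at 5x and 30,000 reads at 10x, before allowing for failed reactions or shorter reads.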

Sequencing done by TIGR (Maryland) and The Sanger Institute (Cambridge, UK)
Here we report an analysis of the genome sequence of P. falciparum clone 3D7, including descriptions of chromosome structure, gene content, functional classification of proteins, metabolism and transport, and other features of parasite biology.

Sequencing strategy
A whole chromosome shotgun sequencing strategy was used to determine the genome sequence of P. falciparum clone 3D7. This approach was taken because a whole genome shotgun strategy was not feasible or cost-effective with the technology that was available at the beginning of the project. Also, high-quality large insert libraries of (A + T)-rich P. falciparum DNA have never been constructed in Escherichia coli, which ruled out a clone-by-clone sequencing strategy. The chromosomes were separated on pulsed field gels, and chromosomal DNA was extracted

The shotgun sequences were assembled into contiguous DNA sequences (contigs), in some cases with low coverage shotgun sequences of yeast artificial chromosome (YAC) clones to assist in the ordering of contigs for closure. Sequence tagged sites (STSs) [10], microsatellite markers [11,12] and HAPPY mapping [7] were also used to place and orient contigs during the gap closure process. The high (A + T) content of the genome made gap closure extremely difficult [7-9]. Chromosomes 1-5, 9 and 12 were closed, whereas chromosomes 6-8, 10, 11, 13 and 14 contained 3-37 gaps (most less than 2.5 kb) per chromosome at the beginning of genome annotation. Efforts to close the remaining gaps are continuing.

Methods: Sequencing, gap closure and annotation
The techniques used at each of the three participating centres for sequencing, closure and annotation are described in the accompanying Letters [7-9]. To ensure that each centre's annotation procedures produced roughly equivalent results, the Wellcome Trust Sanger Institute (Sanger) and the Institute for Genomic Research (TIGR) annotated the same 100-kb segment of chromosome 14. The number of genes predicted in this sequence by the two centres was 22 and 23, the discrepancy being due to the merging of two single genes by one centre. Of the 74 exons predicted by the two centres, 50 (68%) were identical, 9 (12%) overlapped, 6 (8%) overlapped and shared one boundary, and the remainder were predicted by one centre but not the other. Thus 88% of the exons predicted by the two centres in the 100-kb fragment were identical or overlapped.
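The exon agreement figures can be checked with simple arithmetic on the counts given in the passage. The variable names below are illustrative.

```python
def percent(part, total):
    """Share of `total`, rounded to the nearest whole percent."""
    return round(100 * part / total)


TOTAL_EXONS = 74
identical = 50
overlapped = 9
overlapped_shared_boundary = 6

# Exons on which the two centres agree at least partly.
agreeing = identical + overlapped + overlapped_shared_boundary
# Exons predicted by one centre but not the other.
one_centre_only = TOTAL_EXONS - agreeing

if __name__ == "__main__":
    print("identical:", percent(identical, TOTAL_EXONS), "%")
    print("overlapped:", percent(overlapped, TOTAL_EXONS), "%")
    print("overlapped, one shared boundary:",
          percent(overlapped_shared_boundary, TOTAL_EXONS), "%")
    print("identical or overlapping:", percent(agreeing, TOTAL_EXONS), "%")
    print("predicted by one centre only:", one_centre_only, "exons")
```

With 74 exons in total, 9 overlapping exons correspond to about 12%, and the three agreeing categories add up to the 88% quoted in the text.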

The $1000 genome


Venter Foundation (2003): the first group to produce a technology capable of a $1000 human genome will win $500,000
X Prize Foundation: no, $5 - 20 million
National Institutes of Health (2004): $70 million grant program to reach the $1000 genome

Previous sequencing techniques: one DNA molecule at a time
Needed: many DNA molecules at a time (arrays)

One of these: pyrosequencing


Cut a genome into DNA fragments 300 - 500 bases long
Immobilize single strands on very small plastic beads (one piece of DNA per bead)
Amplify the DNA on each bead so that copies cover the bead and boost the signal
Separate the beads on a plate with up to 1.6 million wells

Sequence by DNA polymerase-dependent chain extension, one base at a time, in the presence of a reporter (luciferase)
Luciferase is an enzyme that emits light in response to the pyrophosphate (PPi) released upon nucleotide addition by DNA polymerase
Flashes of light and their intensity are recorded

Extension with individual dNTPs gives a readout
[Figure: panels A and B]
The readout is recorded by a detector that measures the position and intensity of light flashes

25 million bases in about 4 hours

APS = Adenosine phosphosulfate

From www.454.com

Height of peak indicates the number of dNTPs added

This sequence: TTTGGGGTTGCAGTT
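Since peak height indicates how many identical nucleotides were added in one step, decoding a readout amounts to repeating each dispensed nucleotide by its rounded peak height. The sketch below uses made-up peak heights chosen to reproduce the sequence on the slide; they are not measured values, and the function name is illustrative.

```python
def decode_flowgram(flows):
    """Turn (nucleotide, peak_height) pairs into a sequence.

    Each peak height is rounded to the nearest whole number of
    incorporated nucleotides; a height near zero adds nothing.
    """
    return "".join(nt * round(height) for nt, height in flows)


if __name__ == "__main__":
    # Hypothetical peak heights, in units of a single incorporation.
    flows = [("T", 3.1), ("G", 3.9), ("T", 2.0), ("G", 1.1),
             ("C", 0.9), ("A", 1.0), ("G", 1.0), ("T", 2.1)]
    print(decode_flowgram(flows))
```

Long homopolymer runs are the weak point of this readout, because the difference between, say, seven and eight incorporations is a small relative change in peak height.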

New Sequencing Methods


1. Sequencing by MALDI-TOF Mass Spectrometry 2. Sequencing by Hybridization 3. Pyrosequencing 4. Atomic-Force Microscopy 5. Single-Molecule Fluorescence Microscopy 6. Nanopore Sequencing

Sequencing by MALDI-TOF Mass Spectrometry


MALDI was first coupled to time-of-flight mass spectrometry in 1988 by Karas & Hillenkamp. Their remarkable innovation was that virtually any macromolecule could be desorbed as an intact gas-phase ion by embedding it in the crystal of a low-molecular-weight molecule that strongly absorbs energy from a pulse of laser light. Prior to this landmark paper, laser desorption mass spectrometry was limited to peptides with specific volatility or photo-absorption properties.

Although originally applied to analysis of protein samples, MALDI-TOF-MS is now widely used for oligonucleotides and DNA as well. The essential features of MALDI-TOF-MS DNA analysis are summarized as follows. The DNA sample is typically dried at room temperature on a flat surface in a matrix of 3-hydroxypicolinic acid. The 3-hydroxypicolinic acid matrix serves the critical purpose of absorbing UV light while interacting very little with DNA. The sample is then treated with a short pulse of UV laser light that is absorbed by the 3-hydroxypicolinic acid, causing ablation of DNA ions into the gas phase. The DNA ions are generally monovalent and intact.

After a specified time delay, the charged gas-phase DNA molecules are extracted by a high-voltage pulse and accelerated in an electric field so that they attain a common kinetic energy. They are subsequently passed into a flight tube approximately 1 m long. Under vacuum at a common kinetic energy, the relative time required for a given molecule to travel the flight path is dependent on its mass. At the end of the flight tube, the molecules collide with an ion-to-electron conversion detector, thus registering the TOF from the original laser pulse (t). The mass of a given analyte can then be calculated from the relationship $m = 2qVt^2/l^2$, where $q$ is the ion charge, $V$ the accelerating voltage and $l$ the length of the flight path. In practice, internal standards are often relied on to confirm peak identities.
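The time-of-flight relationship follows from setting the kinetic energy of the ion, m(l/t)²/2, equal to the energy gained in the accelerating field, qV, which gives m = 2qVt²/l². The sketch below turns that into a small calculator. It ignores the extraction delay and any field-free corrections, and the 20 kV accelerating voltage in the usage example is an assumed value, not one taken from the text.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def mass_from_tof(t_seconds, volts, length_m=1.0, charge_units=1):
    """Ion mass in daltons from flight time: m = 2 q V t^2 / l^2."""
    q = charge_units * ELEMENTARY_CHARGE
    mass_kg = 2.0 * q * volts * t_seconds ** 2 / length_m ** 2
    return mass_kg / DALTON


def tof_from_mass(mass_da, volts, length_m=1.0, charge_units=1):
    """Flight time in seconds for an ion of the given mass (inverse relation)."""
    q = charge_units * ELEMENTARY_CHARGE
    return length_m * math.sqrt(mass_da * DALTON / (2.0 * q * volts))


if __name__ == "__main__":
    volts = 20_000.0  # assumed accelerating voltage
    for mass in (5_000.0, 10_000.0, 10_300.0):
        t = tof_from_mass(mass, volts)
        print(f"{mass:8.0f} Da -> {t * 1e6:.2f} microseconds")
```

Because flight time scales with the square root of mass, fragments that differ by one nucleotide arrive closer and closer together as the fragments get longer, which is one reason internal standards help in assigning peaks.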

The electrospray process

The soft laser desorption process

Sequencing by MALDI-TOF Mass Spectrometry

MW spectrum of 33-mer 5′-ACT AAT GGC AGT TCA TTG CAT GAA TTT TAA AAG-3′

DNA Sequencing by Hybridization

One early rationale for developing hybridization arrays was de novo sequencing. As originally conceived, this strategy [sequencing by hybridization (SBH)] involved annealing a labeled unknown DNA fragment to a complete array of short oligonucleotides (e.g. all 65,536 combinations of 8-mers) and deciphering the unknown sequence from the annealing pattern. Over the past decade, SBH has largely been eclipsed by the use of DNA arrays for single nucleotide polymorphisms (SNP) and expression analysis. This is partly due to the amount of diagnostic or biological information to be gained per feature on the array. For example, expression monitoring of the entire human genome could be performed using a microarray composed of 100,000 gene-specific sequences (or possibly many fewer), whereas the same number of features would allow resequencing of only 25,000 bases. Another stumbling block for sequencing applications is the use of short oligonucleotide probes. These present such problems as ambiguous reads associated with repeat regions within the unknown target sequence, formation of secondary structures in some oligonucleotide probes that result in little or no detectable signal, and hybridization of oligonucleotides with single mismatches (false positives), which can be especially common at the terminal base pair.

DNA Sequencing by Hybridization
This strategy involved annealing a labeled unknown DNA fragment to a complete array of short oligonucleotides (e.g. all 65,536 combinations of 8-mers) and deciphering the unknown sequence from the annealing pattern.
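The size of a complete probe array and the annealing pattern it would ideally report can be sketched in a few lines. This assumes perfect hybridization with no mismatches and no missing signals, which is exactly what real arrays fail to deliver; the function names are illustrative.

```python
def probe_count(k):
    """Number of distinct k-mers over the four bases: 4**k."""
    return 4 ** k


def spectrum(target, k):
    """Set of k-mers present in the target, i.e. the ideal annealing pattern.

    Because it is a set, a k-mer that occurs twice in the target is
    reported only once, which is where repeat ambiguity comes from.
    """
    return {target[i:i + k] for i in range(len(target) - k + 1)}


if __name__ == "__main__":
    print(probe_count(8))  # size of a complete 8-mer array
    print(sorted(spectrum("ACTAATGG", 3)))
    # Two targets with different repeat copy numbers look identical.
    print(spectrum("ATATAT", 3) == spectrum("ATATATAT", 3))
```

A complete 8-mer array therefore needs 4^8 = 65,536 features, and the last line of the usage example shows two targets of different length that give the same pattern.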

Principle of Pyrosequencing
(http://www.pyrosequencing.com/pages/technology.html)
Pyrosequencing sequences DNA by enzymatic DNA synthesis; the DNA sequence is determined from the signal peaks of photons released during the synthesis. It includes the following 5 steps:

Step 1 A sequencing primer is hybridized to a single-stranded, PCR-amplified DNA template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and the substrates adenosine 5′ phosphosulfate (APS) and luciferin.

Step 2 The first of four deoxynucleotide triphosphates (dNTP) is added to the reaction. DNA polymerase catalyzes the incorporation of the deoxynucleotide triphosphate into the DNA strand if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide.

Step 3 ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5′ phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP. The light produced in the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and seen as a peak in a pyrogram. Each light signal is proportional to the number of nucleotides incorporated.

Step 4 Apyrase, a nucleotide-degrading enzyme, continuously degrades unincorporated dNTPs and excess ATP. When degradation is complete, another dNTP is added.

Step 5 Addition of dNTPs is performed one at a time. It should be noted that deoxyadenosine α-thio triphosphate (dATPαS) is used as a substitute for the natural deoxyadenosine triphosphate (dATP), since it is efficiently used by the DNA polymerase but not recognized by the luciferase. As the process continues, the complementary DNA strand is built up and the nucleotide sequence is determined from the signal peaks in the pyrogram.
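The five steps can be mimicked by a toy simulation that dispenses one nucleotide at a time and records how many bases are incorporated. This is a sketch under simplifying assumptions: incorporation is complete and error-free, apyrase clears everything between additions, and the cyclic dispensation order "TACG" is an arbitrary choice made for the example.

```python
def pyrogram(new_strand, dispensation="TACG", max_cycles=100):
    """Simulate one-at-a-time dNTP addition.

    `new_strand` is the strand being synthesized (the complement of the
    template). Returns (nucleotide, n_incorporated) for every addition,
    where n_incorporated stands in for the light signal (peak height).
    """
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for nt in dispensation:
            if pos >= len(new_strand):
                return flows
            run = 0
            # The polymerase extends through the whole homopolymer run.
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            flows.append((nt, run))
    return flows


if __name__ == "__main__":
    for nt, signal in pyrogram("TTTGGGGTTGCAGTT"):
        print(nt, signal)
```

Additions that give a signal of zero are the ones where the dispensed nucleotide was not complementary to the next template base; the sequence is recovered by reading the non-zero signals in order.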

Summary of Pyrosequencing
Pyrosequencing sequences DNA by enzymatic DNA synthesis; the DNA sequence is determined from the signal peaks of photons released during the synthesis.

Zipper-sequencing of DNA A DNA construct was engineered such that one of its extremities had one strand anchored to a surface via a long DNA fragment, and the other strand was bound to a small bead, itself stuck to a flexible glass fiber used as a force sensor. As the surface is displaced, the molecule is unzipped and the force to unpair two bases measured by the force sensor.

Nanopore Sequencing of Polynucleotides


An interesting idea for sequencing DNA, proposed by D. Branton, is to monitor the variation of ionic current as an applied electric field drives single-stranded polynucleotides through a nanopore in a thin film. Preliminary results have shown that the method can distinguish long stretches of the same nucleotide, such as 30 adenines followed by 70 cytosines. Although this single-molecule method offers a great advantage for sequencing long DNA, detection of monovalent ion current through the α-hemolysin pore is not likely to yield DNA sequence at single-nucleotide resolution. First, the translocation time through the pore for each nucleotide is 1 microsecond, which is too short to resolve. Second, thermal fluctuation of the translocation time makes it impossible to determine the number of repeated nucleotides in each blockade current segment. Finally, the narrowest portion of the channel pore is 50 Å long, meaning that approximately seven nucleotides occupy that space at a given instant. Each of those seven nucleotides would contribute to resistance against ionic current, thus obscuring the influence

Nanopore Sequencing of Polynucleotides

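Sequencing DNA with a Rotating Field

[Figure: DNA passing through a 2 nm pore in a rotating electric field E]

$\mathbf{E} = E\cos(\omega t)\,\mathbf{i} + E\sin(\omega t)\,\mathbf{j}$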

The bond fluctuation model

Moving probability of each nucleic acid: $w = \min[1, \exp(-\Delta U/kT)]$
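The moving probability has the form of the Metropolis acceptance rule used in Monte Carlo simulations. The sketch below shows only that acceptance step; how the energy change ΔU is computed for a trial move in the bond fluctuation model is not given on the slides and is left out. The function names, and the use of kT = 1 as the energy unit, are assumptions.

```python
import math
import random


def move_probability(delta_u, kT=1.0):
    """w = min[1, exp(-dU / kT)] for a trial move with energy change dU."""
    if delta_u <= 0.0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_u / kT)


def accept_move(delta_u, kT=1.0, rng=random):
    """Accept the trial move with probability w."""
    return rng.random() < move_probability(delta_u, kT)


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10_000
    accepted = sum(accept_move(1.0, kT=1.0, rng=rng) for _ in range(trials))
    print(accepted / trials, math.exp(-1.0))
```

Moves that lower the energy are always taken, while uphill moves are taken with a probability that falls off exponentially with ΔU/kT, so the accepted fraction in the usage example should be close to exp(-1).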

Translocation of DNA with Time

Translocation Time versus Frequency

Quantization of Translocation Time

Off-lattice simulations

Translocation Time versus Amplitude

Time Series of DNA Translocation


AAAAAAAAAAACGTACTTCGCGTGTAGTCATTTAATCCACCCCCCCCCCC

Prediction Error in Sequencing


AAAAAAAAAAAC (GTACTTCGCGTGTAGTCATTTAATCC) ACCCCCCCCCCC

Fabricating Nanopore by Ion Beam

A Sequencing Array

Conclusions
1. The traditional Sanger method of sequencing DNA is slow, costly, and inaccurate. It takes about 15 years to sequence human DNA and costs about 3 billion US dollars. The overlap of predicted novel gene sets between Celera and Ensembl is about 20%.
2. The new method of nanopore sequencing can be fast, inexpensive, and accurate. It takes about 1 day to sequence 100 million bases by using a sequencing array, and high accuracy can be achieved by analyzing several time series.
3. The translocation time of polynucleotide chains is well controlled by the frequency of the rotating electric field. Specifically, it increases linearly with the rotating period for frequencies below 10 kHz.
4. The translocation time of each nucleotide is quantized in units of a quarter of the rotating period, which can be used to predict the sequence accurately.
