
DNA COMPUTING

A SEMINAR REPORT

Submitted by

JOBIN RAJ

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING

SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, COCHIN 682022

NOVEMBER 2008

DIVISION OF COMPUTER ENGINEERING
SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, COCHIN 682022

Bonafide Certificate

Certified that this seminar report titled DNA Computing is the bona fide work done by Jobin Raj, who carried out the work under my supervision.

Preetha S
SEMINAR GUIDE
Lecturer, Division of Computer Science, SOE, CUSAT

Dr. David Peter S
Head of the Department
Division of Computer Science, SOE, CUSAT

Date

ACKNOWLEDGEMENT

I thank my seminar guide Mrs. Preetha S, Lecturer, CUSAT, for her proper guidance and valuable suggestions. I am indebted to Mr. David Peter, the HOD, Computer Science division, and other faculty members for giving me an opportunity to learn and present the seminar. If not for the above mentioned people, my seminar would never have been completed successfully. I once again extend my sincere thanks to all of them.

Jobin Raj

Abstract

DNA (deoxyribonucleic acid) molecules, the material our genes are made of, have the potential to perform calculations many times faster than the world's most powerful human-built computers. DNA might one day be integrated into a computer chip to create a so-called biochip that will make computers even faster. DNA molecules have already been harnessed to solve complex mathematical problems. While still in their infancy, DNA computers will be capable of storing billions of times more data than your personal computer. The technology is still in development, and didn't even exist as a concept a few years ago. DNA computers have the potential to take computing to new levels, picking up where Moore's Law leaves off. DNA computers are unlikely to feature word processing, e-mailing and solitaire programs. Instead, their computing power will be used by national governments for cracking secret codes, or by airlines wanting to map more efficient routes. Studying DNA computers may also lead us to a better understanding of a more complex computer: the human brain.

Table of Contents

Chapter No.   Title

              List of Tables
              List of Figures
              List of Symbols, Abbreviations and Nomenclature
1             Introduction
2             DNA
2.1           What is DNA?
2.2           Structure of DNA
2.3           Operations on DNA
2.4           DNA as Software
3             Significance of DNA
3.1           DNA: A unique data structure
3.2           Operations in parallel
4             DNA vs. Silicon
5             The Adleman Experiment
6             Conclusion
7             References

List of Tables

Sl. No.   Table

2.1       Operations on DNA
4.1       Comparison of a DNA computer and a Conventional Computer
5.1       TSP - City Encoding

List of Figures

Sl. No.   Image

2.1       Structure of DNA
5.1       Travelling Salesman Problem - Graph
5.2       TSP - City Encoding
5.3       TSP - Route Encoding
5.4       Gel Electrophoresis
5.5       Affinity Purification

List of Symbols, Abbreviations and Nomenclature

Sl. No.   Item    Definition

1         DNA     Deoxyribonucleic Acid
2         RNA     Ribonucleic Acid
3         mRNA    Messenger RNA
4         A       Adenine
5         T       Thymine
6         G       Guanine
7         C       Cytosine
8         U       Uracil
9         RAID    Redundant Array of Inexpensive Disks
10        TSP     Travelling Salesman Problem

DNA Computing

1. Introduction

In 1994, Leonard M. Adleman solved an unremarkable computational problem with a remarkable technique. It was a problem that a person could solve in a few moments, or that an average desktop machine could solve in the blink of an eye. It took Adleman, however, seven days to find a solution. Nevertheless, this work was exceptional because he solved the problem with DNA. It was a landmark demonstration of computing on the molecular level.

The type of problem that Adleman solved is a famous one. It's formally known as a directed Hamiltonian Path (HP) problem, but is more popularly recognized as a variant of the so-called "travelling salesman problem." In Adleman's version of the travelling salesman problem, or "TSP" for short, a hypothetical salesman tries to find a route through a set of cities so that he visits each city only once. As the number of cities increases, the problem becomes more difficult until its solution is beyond analytical treatment altogether, at which point it requires brute force search methods. TSPs with a large number of cities quickly become computationally expensive, making them impractical to solve on even the latest super-computer. Adleman's demonstration only involves seven cities, making it in some sense a trivial problem that can easily be solved by inspection. Nevertheless, his work is significant for a number of reasons.

- It illustrates the possibilities of using DNA to solve a class of problems that is difficult or impossible to solve using traditional computing methods.
- It's an example of computation at a molecular level, potentially a size limit that may never be reached by the semiconductor industry.
- It demonstrates unique aspects of DNA as a data structure.
- It demonstrates that computing with DNA can work in a massively parallel fashion.

In 2001, scientists at the Weizmann Institute of Science in Israel announced that they had manufactured a computer so small that a single drop of water would hold a trillion of the machines. The devices used DNA and enzymes as their software and hardware and could collectively perform a billion operations a second. Now the same team, led by Ehud Shapiro, has announced a novel model of its biomolecular machine that no longer requires an external energy source and performs 50 times faster than its predecessor did. The Guinness Book of World Records has crowned it the world's smallest biological computing device.

Many designs for minuscule computers aimed at harnessing the massive storage capacity of DNA have been proposed over the years. Earlier schemes have relied on a molecule known as ATP, which is a common source of energy for cellular reactions, as a fuel source. But in the new setup, a DNA molecule provides both the initial data and sufficient energy to complete the computation.

We propose a new class of algorithms to be implemented on a DNA computer. The algorithms we are going to introduce will not be affected much by a change in the initial conditions. This will give DNA computers great extensibility. Knapsack problems are classical problems solvable by this method. It is unrealistic to solve these problems using conventional electronic computers when their size gets large, because these problems are NP-complete. DNA computers using our method can solve substantially larger problems because of their massive parallelism.

Division of Computer Science, School of Engineering, CUSAT


2. DNA

2.1 What is DNA?

DNA (deoxyribonucleic acid) is the primary genetic material in all living organisms: a molecule composed of two complementary strands that are wound around each other in a double helix formation. The strands are connected by base pairs that look like rungs in a ladder. Each base will pair with only one other: adenine (A) pairs with thymine (T), guanine (G) pairs with cytosine (C). The sequence of each single strand can therefore be deduced from the identity of its partner.

Genes are sections of DNA that code for a defined biochemical function, usually the production of a protein. The DNA of an organism may contain anywhere from a dozen genes, as in a virus, to tens of thousands of genes in higher organisms like humans. The structure of a protein determines its function. The sequence of bases in a given gene determines the structure of a protein. Thus the genetic code determines what proteins an organism can make and what those proteins can do. It is estimated that only 1-3% of the DNA in our cells codes for genes; the rest may be used as a decoy to absorb mutations that could otherwise damage vital genes.

mRNA (Messenger RNA) is used to relay information from a gene to the protein synthesis machinery in cells. mRNA is made by copying the sequence of a gene, with one subtle difference: thymine (T) in DNA is substituted by uracil (U) in mRNA. This allows cells to differentiate mRNA from DNA so that mRNA can be selectively degraded without destroying DNA. The DNA-o-gram generator simplifies this step by taking mRNA out of the equation.

The genetic code is the language used by living cells to convert information found in DNA into information needed to make proteins. A protein's structure, and therefore function, is determined by the sequence of amino acid subunits. The amino acid sequence of a protein is determined by the sequence of the gene encoding that protein. The "words" of the genetic code are called codons. Each codon consists of three adjacent bases in an mRNA molecule. Using combinations of A, U, C and G, there can be sixty-four different three-base codons. There are only twenty amino acids that need to be coded for by these sixty-four codons. This excess of codons is known as the redundancy of the genetic code. By allowing more than one codon to specify each amino acid, mutations can occur in the sequence of a gene without affecting the resulting protein. The DNA-o-gram generator uses the genetic code to specify letters of the alphabet instead of coding for proteins.

2.2 Structure of DNA

Fig 2.1 Structure of DNA


2.3 Operations on DNA

Melting (Denaturing)
  - Double-stranded DNA is separated into single strands.
  - Heating breaks the hydrogen bonds between complementary strands.

Hybridization (Annealing)
  - Base-pairing between two complementary single-strand molecules to form a double-stranded DNA molecule (on cooling).

Ligation
  - Joining DNA molecules together.
  - Ligase enzymes are used to concatenate free-floating double-stranded DNA.
  - Often invoked after the hybridization operation.

Replication (Amplify)
  - DNA can also be replicated, taking a single molecule and multiplying it a thousandfold.
  - Made possible by the Polymerase Chain Reaction (PCR).
  - PCR alternates between two phases: separate DNA into single strands using heat; convert into double strands using a primer and the polymerase reaction.
  - PCR rapidly amplifies a single DNA molecule into billions of molecules.
  - Makes 2^n copies (n: number of iterations).

Sorting (Gel Electrophoresis)
  - Electrophoresis is the movement of charged molecules in an electric field.
  - Technique for sorting DNA strands by size.
  - Based on the fact that DNA molecules are negatively charged.
  - The rate of migration of a molecule through the gel depends on its shape (size) and electrical charge.
  - Smaller molecules migrate faster through the gel, which sorts them according to size.
  - The gel is made of agarose, polyacrylamide or a combination of both.

Filtering (Affinity Purification)
  - Filtering of DNA containing a specific sequence from a sample of mixed DNA.
  - Attach the complement of the sequence to be filtered to a substrate such as a magnetic bead.
  - Beads are mixed with the DNA.
  - DNA that contains the specific sequence hybridizes with its complement on the bead.
  - Beads are then retrieved and the DNA is isolated.

Table 2.1 Operations on DNA

2.4 DNA as Software

Think of DNA as software, and enzymes as hardware. Put them together in a test tube. The way in which these molecules undergo chemical reactions with each other allows simple operations to be performed as a by-product of the reactions. The scientists tell the devices what to do by controlling the composition of the DNA software molecules. It's a completely different approach from pushing electrons around a dry circuit in a conventional computer.

To the naked eye, the DNA computer looks like a clear water solution in a test tube. There is no mechanical device. A trillion bio-molecular devices could fit into a single drop of water. Instead of showing up on a computer screen, results are analyzed using a technique that allows scientists to see the length of the DNA output molecule.

"Once the input, software, and hardware molecules are mixed in a solution it operates to completion without intervention," said David Hawksett, the science judge at Guinness World Records. "If you want to present the output to the naked eye, human manipulation is needed."
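To make the "DNA as software" picture concrete, the operations listed in Table 2.1 can be mimicked on ordinary strings. The following is only a toy sketch: the function names are illustrative, strand orientation is ignored, and real chemistry is stochastic rather than exact.

```python
# Toy model: DNA strands as Python strings over the alphabet A, C, G, T.
# All names below are illustrative assumptions, not a real library API.

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base Watson-Crick complement of a strand."""
    return "".join(PAIR[base] for base in strand)


def hybridizes(strand_a, strand_b):
    """Two single strands pair up if every base faces its partner."""
    return strand_b == complement(strand_a)


def ligate(*fragments):
    """Ligation joins fragments end to end into one longer strand."""
    return "".join(fragments)


def pcr_copies(cycles):
    """Ideal PCR doubles the number of copies each cycle: 2**n."""
    return 2 ** cycles


def sort_by_length(pool):
    """Gel electrophoresis separates strands by size; model it as a sort."""
    return sorted(pool, key=len)


def affinity_filter(pool, target):
    """Affinity purification keeps strands that contain the target sequence."""
    return [strand for strand in pool if target in strand]
```

In this model a "program" is simply a sequence of such calls applied to a pool of strings, which mirrors how a laboratory protocol is a sequence of operations applied to a test tube.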

3. Significance of DNA

3.1 DNA: A unique data structure

The amount of information gathered on the molecular biology of DNA over the last 40 years is almost overwhelming in scope. So instead of getting bogged down in biochemical and biological details of DNA, we'll concentrate on only the information relevant to DNA computing.

The data density of DNA is impressive. Just as a string of binary data is encoded with ones and zeros, a strand of DNA is encoded with four bases, represented by the letters A, T, C, and G. The bases (also known as nucleotides) are spaced every 0.35 nanometres along the DNA molecule. Since each base is one of four letters it carries two bits, so this spacing gives DNA a remarkable data density of about 145 Mbits (nearly 18 Mbytes) per inch. In two dimensions, if you assume one base per square nanometre, the data density is over one million Gbits per square inch. Compare this to the data density of a typical high performance hard drive, which is about 7 Gbits per square inch: a factor of over 100,000 smaller.

Another important property of DNA is its double stranded nature. The bases A and T, and C and G, can bind together, forming base pairs. Therefore every DNA sequence has a natural complement. For example, if sequence S is ATTACGTCG, its complement, S', is TAATGCAGC. Both S and S' will come together (or hybridize) to form double stranded DNA. This makes DNA a unique data structure for computation, and it can be exploited in many ways. Error correction is one example. Errors in DNA happen due to many factors. Occasionally, DNA enzymes simply make mistakes, cutting where they shouldn't, or inserting a T for a G. DNA can also be damaged by thermal energy and UV energy from the sun. If the error occurs in one of the strands of double stranded DNA, repair enzymes can restore the proper DNA sequence by using the complement strand as a reference. In this sense, double stranded DNA is similar to a RAID 1 array, where data is mirrored on two drives, allowing data to be recovered from the second drive if errors occur on the first. In biological systems, this facility for error correction means that the error rate can be quite low. For example, in DNA replication, there is one error for every 10^9 copied bases, or in other words an error rate of 10^-9. (In comparison, hard drives have read error rates of only 10^-13 with Reed-Solomon correction.)

3.2 Operations in parallel

In the cell, DNA is modified biochemically by a variety of enzymes, which are tiny protein machines that read and process DNA according to nature's design. There is a wide variety and number of these "operational" proteins, which manipulate DNA on the molecular level. For example, there are enzymes that cut DNA and enzymes that paste it back together. Other enzymes function as copiers and others as repair units. Molecular biology, biochemistry, and biotechnology have developed techniques that allow us to perform many of these cellular functions in the test tube. It's this cellular machinery, along with some synthetic chemistry, that makes up the palette of operations available for computation. Just as a CPU has a basic suite of operations like addition, bit-shifting, and logical operators (AND, OR, NOT, NOR) that allow it to perform even the most complex calculations, DNA has cutting, copying, pasting, repairing, and many others. And note that in the test tube, enzymes do not function sequentially, working on one DNA molecule at a time. Rather, many copies of the enzyme can work on many DNA molecules simultaneously. This is the power of DNA computing: it can work in a massively parallel fashion.
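The data-density figures in Section 3.1 can be reproduced with a short back-of-envelope calculation. The sketch below assumes two bits per base (four possible letters) and uses the 0.35 nm spacing and the 7 Gbit per square inch hard drive figure quoted in the text; the variable names are illustrative.

```python
# Back-of-envelope check of the DNA data-density figures.
NM_PER_INCH = 2.54e7          # 1 inch = 2.54 cm = 2.54e7 nanometres
BASE_SPACING_NM = 0.35        # spacing between bases quoted in the text
BITS_PER_BASE = 2             # four possible bases -> log2(4) = 2 bits

# One dimension: bases laid end to end along one inch of DNA.
bases_per_inch = NM_PER_INCH / BASE_SPACING_NM
linear_bits = bases_per_inch * BITS_PER_BASE
linear_bytes = linear_bits / 8

# Two dimensions: one base per square nanometre, as assumed in the text.
bases_per_sq_inch = NM_PER_INCH ** 2
areal_gbits = bases_per_sq_inch * BITS_PER_BASE / 1e9

HARD_DRIVE_GBITS_PER_SQ_INCH = 7   # figure quoted in the text
ratio = areal_gbits / HARD_DRIVE_GBITS_PER_SQ_INCH

print(f"linear density: {linear_bits / 1e6:.0f} Mbit per inch "
      f"({linear_bytes / 1e6:.0f} MB per inch)")
print(f"areal density:  {areal_gbits:.2e} Gbit per square inch")
print(f"ratio to hard drive: {ratio:.0f}x")
```

Under these assumptions the linear figure works out to roughly 145 Mbit (about 18 MB) per inch, and the areal figure to a little over a million Gbit per square inch.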


4. DNA vs. Silicon

DNA, with its unique data structure and ability to perform many parallel operations, allows you to look at a computational problem from a different point of view. Transistor-based computers typically handle operations in a sequential manner. Of course there are multi-processor computers, and modern CPUs incorporate some parallel processing, but in general, in the basic von Neumann architecture computer, instructions are handled sequentially. A von Neumann machine, which is what all modern CPUs are, basically repeats the same "fetch and execute cycle" over and over again; it fetches an instruction and the appropriate data from main memory, and it executes the instruction. It does this many, many times in a row, really, really fast. The great Richard Feynman, in his Lectures on Computation, summed up von Neumann computers by saying, "the inside of a computer is as dumb as hell, but it goes like mad!" DNA computers, however, are non-von Neumann, stochastic machines that approach computation in a different way from ordinary computers for the purpose of solving a different class of problems.

Typically, increasing the performance of silicon computing means faster clock cycles (and larger data paths), where the emphasis is on the speed of the CPU and not on the size of the memory. For example, will doubling the clock speed or doubling your RAM give you better performance? For DNA computing, though, the power comes from the memory capacity and parallel processing. If forced to behave sequentially, DNA loses its appeal. For example, let's look at the read and write rate of DNA. In bacteria, DNA can be replicated at a rate of about 500 base pairs a second. Biologically this is quite fast (10 times faster than human cells) and, considering the low error rates, an impressive achievement. But at two bits per base pair this is only 1000 bits/sec, which is a snail's pace when compared to the data throughput of an average hard drive.

But look what happens if you allow many copies of the replication enzymes to work on DNA in parallel. First of all, the replication enzymes can start on the second replicated strand of DNA even before they're finished copying the first one. So already the data rate jumps to 2000 bits/sec. But look what happens after each replication is finished: the number of DNA strands increases exponentially (2^n after n iterations). With each additional strand, the data rate increases by 1000 bits/sec. So after 10 iterations, the DNA is being replicated at a rate of about 1 Mbit/sec; after 30 iterations it increases to 1000 Gbits/sec. This is beyond the sustained data rates of the fastest hard drives.

Now let's consider how you would solve a nontrivial example of the travelling salesman problem (number of cities > 10) with silicon vs. DNA. With a von Neumann computer, one naive method would be to set up a search tree, measure each complete branch sequentially, and keep the shortest one. Improvements could be made with better search algorithms, such as pruning the search tree when one of the branches you are measuring is already longer than the best candidate. A method you certainly would not use would be to first generate all possible paths and then search the entire list. Why? Well, consider that the entire list of routes for a 20 city problem could theoretically take 45 million GBytes of memory (18! routes with 7 byte words)! Also, for a 100 MIPS computer, it would take two years just to generate all paths (assuming one instruction cycle to generate each path). However, using DNA computing, this method becomes feasible! 10^15 molecules is on the order of a nanomole of material, a relatively small amount for biochemistry. Also, routes no longer have to be searched through sequentially. Operations can be done all in parallel.
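The arithmetic behind this section can be checked with a few lines of Python. This is a rough sketch under stated assumptions: two bits per base pair, an ideal doubling of strands per replication round, one instruction per generated route on the 100 MIPS machine, and one DNA molecule per route. The names are illustrative.

```python
import math

# Replication throughput: 500 base pairs/s at 2 bits per base pair.
BITS_PER_BASE_PAIR = 2
BASE_PAIRS_PER_SECOND = 500
single_rate = BASE_PAIRS_PER_SECOND * BITS_PER_BASE_PAIR  # bits per second


def parallel_rate(iterations):
    """Aggregate copy rate once 2**n strands are replicating in parallel."""
    return single_rate * 2 ** iterations


# Brute-force route list for the 20 city example: 18! routes, 7 bytes each.
routes = math.factorial(18)
storage_gbytes = routes * 7 / 1e9

# Time to emit every route at 100 million instructions per second,
# assuming one instruction per route.
years = routes / 100e6 / (365 * 24 * 3600)

# Amount of substance for 10**15 molecules.
AVOGADRO = 6.022e23
nanomoles = 1e15 / AVOGADRO * 1e9

print(f"after 10 rounds: {parallel_rate(10) / 1e6:.2f} Mbit/s")
print(f"after 30 rounds: {parallel_rate(30) / 1e9:.0f} Gbit/s")
print(f"route list: {storage_gbytes / 1e6:.1f} million GB, {years:.1f} years")
print(f"1e15 molecules: {nanomoles:.2f} nanomoles")
```

The sequential machine pays for every route in time and memory, while in the test tube the same count of candidates is simply a small quantity of material.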

                          DNA Computers             Conventional Computers
Storage Media             Nucleic acids             Semiconductors
Memory Capacity           Ultra-High                High
Type of Operations        Biochemical operations    Logical operations (and, or, not)
Nature of Operations      Parallel                  Sequential
Speed of each Operation   Slow                      Fast
Process                   Stochastic                Deterministic

Table 4.1 Comparison of a DNA computer and a Conventional Computer

5. The Adleman Experiment

There is no better way to understand how something works than by going through an example step by step. So let's solve our own directed Hamiltonian Path problem, using the DNA methods demonstrated by Adleman. The concepts are the same but the example has been simplified to make it easier to follow and present. Suppose that I live in LA, and need to visit four cities: Dallas, Chicago, Miami, and NY, with NY being my final destination. The airline I'm taking has a specific set of connecting flights that restrict which routes I can take (e.g. there is a flight from L.A. to Chicago, but no flight from Miami to Chicago). What should my itinerary be if I want to visit each city only once?

Fig 5.1 Travelling Salesman Problem - Graph

It should take you only a moment to see that there is only one route. Starting from L.A. you need to fly to Chicago, Dallas, Miami and then to N.Y. Any other choice of cities will force you to miss a destination, visit a city twice, or not make it to N.Y. For this example you obviously don't need the help of a computer to find a solution. For six, seven, or even eight cities, the problem is still manageable. However, as the number of cities increases, the problem quickly gets out of hand. Assuming a random distribution of connecting routes, the number of itineraries you need to check increases exponentially. Pretty soon you will run out of pen and paper listing all the possible routes, and it becomes a problem for a computer...


...or perhaps DNA. The method Adleman used to solve this problem is basically the shotgun approach mentioned previously. He first generated all the possible itineraries and then selected the correct itinerary. This is the advantage of DNA. It's small and there are combinatorial techniques that can quickly generate many different data strings. Since the enzymes work on many DNA molecules at once, the selection process is massively parallel. Specifically, the method based on Adleman's experiment would be as follows:

1. Generate all possible routes.
2. Select itineraries that start with the proper city and end with the final city.
3. Select itineraries with the correct number of cities.
4. Select itineraries that contain each city only once.

All of the above steps can be accomplished with standard molecular biology techniques.

Part I: Generate all possible routes

Strategy: Encode city names in short DNA sequences. Encode itineraries by connecting the city sequences for which routes exist.

DNA can simply be treated as a string of data. For example, each city can be represented by a "word" of six bases:

City          DNA encoding
Los Angeles   GCTACG
Chicago       CTAGTA
Dallas        TCGTAC
Miami         CTACGG
New York      ATGCCG

Table 5.1 TSP - City Encoding

The entire itinerary can be encoded by simply stringing together the DNA sequences that represent specific cities. For example, the route L.A. -> Chicago -> Dallas -> Miami -> New York would simply be GCTACG CTAGTA TCGTAC CTACGG ATGCCG, or equivalently it could be represented in double stranded form with its complement sequence.

So how do we generate this? Synthesizing short single stranded DNA is now a routine process, so encoding the city names is straightforward. The molecules can be made by a machine called a DNA synthesizer or even custom ordered from a third party. Itineraries can then be produced from the city encodings by linking them together in proper order. To accomplish this you can take advantage of the fact that DNA hybridizes with its complementary sequence. Specifically, you can encode a route between two cities as the complement of the second half (last three letters) of the departure city followed by the first half (first three letters) of the arrival city. For example, the route between Miami (CTACGG) and NY (ATGCCG) can be made by taking the second half of the coding for Miami (CGG) and the first half of the coding for NY (ATG). This gives CGGATG. By taking the complement of this you get GCCTAC, which not only uniquely represents the route from Miami to NY, but will connect the DNA representing Miami and NY by hybridizing itself to the second half of the code representing Miami (...CGG) and the first half of the code representing NY (ATG...), as illustrated in Fig 5.2.
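The encoding scheme of Part I can be sketched in a few lines of Python. The city words are those of Table 5.1; the function names are illustrative assumptions, and strand orientation is ignored just as it is in the text.

```python
# Part I sketch: city words from Table 5.1 and the linker ("route") strands.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

CITY_CODE = {
    "LA": "GCTACG",
    "Chicago": "CTAGTA",
    "Dallas": "TCGTAC",
    "Miami": "CTACGG",
    "NY": "ATGCCG",
}


def complement(strand):
    """Base-by-base complement (A<->T, C<->G)."""
    return "".join(PAIR[base] for base in strand)


def route_linker(departure, arrival):
    """Complement of: last 3 bases of departure + first 3 bases of arrival."""
    junction = CITY_CODE[departure][3:] + CITY_CODE[arrival][:3]
    return complement(junction)


def encode_itinerary(cities):
    """String the six-base city words together in visiting order."""
    return "".join(CITY_CODE[city] for city in cities)


print(route_linker("Miami", "NY"))
print(encode_itinerary(["LA", "Chicago", "Dallas", "Miami", "NY"]))
```

Because the linker is complementary to the tail of one city word and the head of the next, it can only bridge that specific ordered pair of cities, which is how the flight map ends up constraining which itineraries get assembled.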

Fig 5.2 TSP - City Encoding

Random itineraries can be made by mixing city encodings with the route encodings. Finally, the DNA strands can be connected together by an enzyme called ligase. What we are left with are strands of DNA representing itineraries with a random number of cities and a random set of routes. For example:


Fig 5.3 TSP - Route Encoding

We can be confident that we have all possible combinations, including the correct one, by using an excess of DNA encodings, say 10^13 copies of each city and each route between cities. Remember DNA is a highly compact data format, so numbers are on our side.

Part II: Select itineraries that start and end with the correct cities

Strategy: Selectively copy and amplify only the section of the DNA that starts with LA and ends with NY by using the Polymerase Chain Reaction.

After Part I, we now have a test tube full of various lengths of DNA that encode possible routes between cities. What we want are routes that start with LA and end with NY. To accomplish this we can use a technique called Polymerase Chain Reaction (PCR), which allows you to produce many copies of a specific sequence of DNA. PCR is an iterative process that cycles through a series of copying events using an enzyme called polymerase. Polymerase will copy a section of single stranded DNA starting at the position of a primer, a short piece of DNA complementary to one end of a section of the DNA that you're interested in. By selecting primers that flank the section of DNA you want to amplify, the polymerase preferentially amplifies the DNA between these primers, doubling the amount of DNA containing this sequence. After many iterations of PCR, the DNA you're working on is amplified exponentially. So to selectively amplify the itineraries that start and stop with our cities of interest, we use primers that are complementary to LA and NY. What we end up with after PCR is a test tube full of double stranded DNA of various lengths, encoding itineraries that start with LA and end with NY.
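Parts I and II can be imitated in software by enumerating every walk along the available flights and then keeping only the strands that begin with the LA word and end with the NY word. This is a sketch, not Adleman's procedure: the flight map below is hypothetical (only the LA to Chicago flight and the missing Miami to Chicago flight are stated in the text), and an exhaustive enumeration stands in for random ligation with an excess of material.

```python
# Sketch of Part I (generate) and Part II (select by start and end city).
CITY_CODE = {
    "LA": "GCTACG",
    "Chicago": "CTAGTA",
    "Dallas": "TCGTAC",
    "Miami": "CTACGG",
    "NY": "ATGCCG",
}

# Hypothetical flight map, chosen so that LA -> Chicago -> Dallas -> Miami -> NY
# is the only route visiting every city exactly once.
FLIGHTS = {
    "LA": ["Chicago", "Dallas"],
    "Chicago": ["Dallas", "NY"],
    "Dallas": ["Chicago", "Miami", "NY"],
    "Miami": ["NY"],
    "NY": [],
}

MAX_CITIES = 6  # longest itinerary we bother to assemble


def all_walks(max_cities):
    """Every walk along existing flights with 1..max_cities stops."""
    walks = [[city] for city in FLIGHTS]
    frontier = walks
    for _ in range(max_cities - 1):
        frontier = [walk + [nxt] for walk in frontier for nxt in FLIGHTS[walk[-1]]]
        walks = walks + frontier
    return walks


def encode(walk):
    """Concatenate the six-base words of the visited cities."""
    return "".join(CITY_CODE[city] for city in walk)


# Part I: the "test tube" of candidate itineraries.
pool = [encode(walk) for walk in all_walks(MAX_CITIES)]

# Part II: PCR with LA and NY primers, modelled as a start/end filter.
selected = [strand for strand in pool
            if strand.startswith(CITY_CODE["LA"])
            and strand.endswith(CITY_CODE["NY"])]

print(len(pool), "candidates,", len(selected), "start in LA and end in NY")
```

The filter here discards non-matching strands outright, whereas PCR works by amplifying the matching ones until they dominate the mixture; the end result for the later steps is the same.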


Part III: Select itineraries that contain the correct number of cities

Strategy: Sort the DNA by length and select the DNA whose length corresponds to 5 cities.

Our test tube is now filled with DNA encoded itineraries that start with LA and end with NY, where the number of cities in between LA and NY varies. We now want to select those itineraries that are five cities long. To accomplish this we can use a technique called Gel Electrophoresis, which is a common procedure used to resolve the size of DNA. The basic principle behind Gel Electrophoresis is to force DNA through a gel matrix by using an electric field. DNA is a negatively charged molecule under most conditions, so if placed in an electric field it will be attracted to the positive potential. However, since the charge density of DNA (charge per length) is constant, long pieces of DNA move as fast as short pieces when suspended in a fluid. This is why you use a gel matrix. The gel is made up of a polymer that forms a meshwork of linked strands. The DNA is now forced to thread its way through the tiny spaces between these strands, which slows down the DNA at different rates depending on its length. What we typically end up with after running a gel is a series of DNA bands, with each band corresponding to a certain length. We can then simply cut out the band of interest to isolate DNA of a specific length. Since we know that each city is encoded with 6 base pairs of DNA, knowing the length of the itinerary gives us the number of cities. In this case we would isolate the DNA that was 30 base pairs long (5 cities times 6 base pairs).

Fig 5.4 Gel Electrophoresis
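The length selection of Part III amounts to grouping strands into bands by size and cutting out the 30-base band. A minimal sketch follows; the small pool of itineraries is made up for illustration, and the function name is an assumption.

```python
# Part III sketch: gel electrophoresis modelled as grouping strands by length.
CITY_CODE = {
    "LA": "GCTACG",
    "Chicago": "CTAGTA",
    "Dallas": "TCGTAC",
    "Miami": "CTACGG",
    "NY": "ATGCCG",
}
WORD_LENGTH = 6  # bases per city


def encode(cities):
    return "".join(CITY_CODE[city] for city in cities)


# Illustrative pool: itineraries that already start in LA and end in NY.
pool = [encode(cities) for cities in [
    ["LA", "Chicago", "NY"],
    ["LA", "Dallas", "Miami", "NY"],
    ["LA", "Chicago", "Dallas", "Miami", "NY"],
    ["LA", "Chicago", "Dallas", "Chicago", "NY"],
    ["LA", "Dallas", "Chicago", "Dallas", "Miami", "NY"],
]]


def run_gel(strands):
    """Return bands keyed by strand length, shortest band first."""
    bands = {}
    for strand in sorted(strands, key=len):
        bands.setdefault(len(strand), []).append(strand)
    return bands


bands = run_gel(pool)
target_length = 5 * WORD_LENGTH          # 5 cities x 6 bases = 30
five_city = bands.get(target_length, [])

print(sorted(bands), "->", len(five_city), "strands in the 30-base band")
```

Note that the 30-base band can still contain itineraries that repeat one city and skip another, which is why a further filtering step is needed.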



Part IV: Select itineraries that have a complete set of cities

Strategy: Successively filter the DNA molecules by city, one city at a time. Since the DNA we start with contains five cities, we will be left with strands that encode each city once.

DNA containing a specific sequence can be purified from a sample of mixed DNA by a technique called affinity purification. This is accomplished by attaching the complement of the sequence in question to a substrate like a magnetic bead. The beads are then mixed with the DNA. DNA that contains the sequence you're after then hybridizes with the complement sequence on the beads. These beads can then be retrieved and the DNA isolated.

Fig 5.5 Affinity purification

So we now affinity purify five times, using a different city complement for each run. For example, for the first run we use L.A.'-beads (where the ' indicates the complement strand) to fish out DNA sequences which contain the encoding for L.A. (which should be all of the DNA because of step 2), for the next run we use Dallas'-beads, and then Chicago'-beads, Miami'-beads, and finally NY'-beads. The order isn't important. If an itinerary is missing a city, then it will not be "fished out" during one of the runs and will be removed from the candidate pool. What we are left with are the itineraries that start in LA, visit each city once, and end in NY. This is exactly what we are looking for. If the answer exists we would retrieve it at this step.
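The successive filtering of Part IV can be sketched as five passes over the candidate pool, one per city. In this simplified model a strand is "fished out" if the city's six-base word appears at a word boundary; real hybridization is not restricted to word boundaries, so this is an assumption made to keep the example clean. The candidate list and function names are illustrative.

```python
# Part IV sketch: affinity purification as one filtering pass per city.
CITY_CODE = {
    "LA": "GCTACG",
    "Chicago": "CTAGTA",
    "Dallas": "TCGTAC",
    "Miami": "CTACGG",
    "NY": "ATGCCG",
}
WORD_LENGTH = 6


def encode(cities):
    return "".join(CITY_CODE[city] for city in cities)


def words(strand):
    """Split a strand into its six-base city words."""
    return [strand[i:i + WORD_LENGTH] for i in range(0, len(strand), WORD_LENGTH)]


def affinity_purify(pool, city):
    """Keep only strands that contain the word for the given city."""
    return [strand for strand in pool if CITY_CODE[city] in words(strand)]


# Illustrative five-city candidates left over from the length selection.
candidates = [
    encode(["LA", "Chicago", "Dallas", "Miami", "NY"]),
    encode(["LA", "Chicago", "Dallas", "Chicago", "NY"]),   # no Miami
    encode(["LA", "Dallas", "Chicago", "Dallas", "NY"]),    # no Miami
]

survivors = candidates
for city in ["LA", "Dallas", "Chicago", "Miami", "NY"]:
    survivors = affinity_purify(survivors, city)

print(survivors)
```

Since every candidate has exactly five words, a strand that contains all five cities cannot contain any city twice, which is the argument the strategy relies on.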


Reading out the answer

One possible way to find the result would be to simply sequence the DNA strands. However, since we already have the sequence of the city encodings we can use an alternate method called graduated PCR. Here we do a series of PCR amplifications using the primer corresponding to L.A., with a different primer for each city in succession. By measuring the various lengths of DNA for each PCR product we can piece together the final sequence of cities in our itinerary. For example, we know that the DNA itinerary starts with LA and is 30 base pairs long, so if the PCR product for the LA and Miami primers was 24 base pairs long, you know Miami is the fourth city in the itinerary (24 divided by 6). Finally, if we were careful in our DNA manipulations the only DNA left in our test tube should be the DNA itinerary encoding LA, Chicago, Dallas, Miami, and NY. So if the succession of primers used is LA & Chicago, LA & Dallas, LA & Miami, and LA & NY, then we would get PCR products with lengths 12, 18, 24, and 30 base pairs.

Caveats

Adleman's experiment solved a seven city problem, but there are two major shortcomings preventing a large scaling up of his computation. The complexity of the travelling salesman problem simply doesn't disappear when applying a different method of solution: it still increases exponentially. For Adleman's method, what scales exponentially is not the computing time, but rather the amount of DNA. Unfortunately this places some hard restrictions on the number of cities that can be solved; after the Adleman article was published, more than a few people pointed out that using his method to solve a 200 city HP problem would take an amount of DNA that weighed more than the earth.

Another factor that places limits on his method is the error rate for each operation. Since these operations are not deterministic but stochastically driven (we are doing chemistry here), each step contains statistical errors, limiting the number of iterations you can do successively before the probability of producing an error becomes greater than that of producing the correct result. For example, an error rate of 1% is fine for 10 iterations, giving less than 10% error, but after 100 iterations this error grows to 63%.
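The error figures quoted above follow from compounding a per-step error rate. A minimal sketch, assuming each operation fails independently with the same probability (an idealization; the function name is illustrative):

```python
def cumulative_error(per_step_error, steps):
    """Probability that at least one of `steps` independent operations fails."""
    return 1 - (1 - per_step_error) ** steps


for steps in (10, 100):
    print(f"{steps} steps at 1% error each: "
          f"{cumulative_error(0.01, steps):.1%} chance of at least one error")
```

The same function shows how quickly the budget shrinks: keeping the overall error below a fixed bound over more steps requires a proportionally lower per-step error rate.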


6. Conclusion
DNA computers may prove to be among the most significant technologies in the future of engineering. DNA is what makes up your genes and stores all the information about you inside your cells. It holds the instructions for what you look like and how you function. Each microscopic cell in your body contains all the DNA needed to build you, which is a lot of information. DNA not only has huge data storage potential but also the potential to solve complicated calculations and mathematical problems. DNA computers are a very new concept. The idea was conceived just a few years ago. But in these few years, scientists have already been able to use DNA to solve moderately difficult math problems. DNA computers are still decades away from being able to compete with silicon-based computers, but could eventually be much more powerful than them. The first DNA computers will not be like a home PC. They will be used to solve huge, complicated mathematical problems, such as breaking codes. DNA computers could be thousands of times smaller and more powerful than silicon-based computers. One pound of DNA has the capacity to store more data than all the electronic devices ever made. A DNA computer the size of a water droplet could have more computing power than today's most powerful supercomputers. Another advantage of DNA computing over silicon-based computers is the ability to do parallel calculations. Silicon-based microprocessors handle calculations largely one at a time, while a DNA computer will be able to do many calculations simultaneously.


7. References

1. Adleman L. M., Molecular computation of solutions to combinatorial problems, 1994.

2. Paun G., Rozenberg G. and Salomaa A., DNA Computing, Springer, 1998.
