DNA COMPUTING
Prepared by
C. Himanshu Reddy
Monsoon 2010
Department of Electronics and Communication Engineering
NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
CERTIFICATE
This is to certify that this seminar report entitled “DNA Computing” is a bona
fide record of the seminar presented by C. Himanshu Reddy, Roll No. B070319EC, during
Monsoon 2010 in partial fulfillment of the requirements for the award of the B. Tech degree in
Electronics and Communication Engineering by the National Institute of Technology Calicut,
India.
Computer chip manufacturers are furiously racing to make the next microprocessor that
will topple speed records. Sooner or later, though, this competition is bound to hit a wall.
Microprocessors made of silicon will eventually reach their limits of speed and miniaturization.
Chip makers need a new material to produce faster computing speeds. We have seen the
evolution of microprocessors from the basic Intel 4004, TMS 1000 and CADC to RISC and
multi-core designs. Recent research has led to yet another kind of processor. This
processor, unlike the others, is not based on the familiar silicon chip. Instead, this new
technology draws on biomolecular science and computes with DNA
(deoxyribonucleic acid). In this talk, we discuss the evolution of this technology and
develop a general but precise understanding of it.
1. INTRODUCTION TO DNA
DNA computing is a form of computing which uses DNA, biochemistry and molecular
biology, instead of the traditional silicon-based computer technologies. DNA computing, or,
more generally, bio-molecular computing, is a fast developing interdisciplinary area. Research
and development in this area concerns theory, experiments and applications of DNA computing.
The best-known demonstration of DNA computing is Leonard Adleman's 1994 experiment, and
the type of problem he solved is a famous one. It is formally known as the directed
Hamiltonian Path (HP) problem, but is more popularly recognized as a variant of the so-called
"travelling salesman problem" (TSP). In Adleman's version, a hypothetical salesman tries to
find a route through a set of cities so that he visits each city exactly once. As the number of
cities increases, the problem becomes more difficult until no shortcut is known and brute-force
search is required. Problems with a large number of cities quickly become computationally
expensive, making them impractical to solve even on the latest supercomputers. Adleman’s
demonstration involved only seven cities, making it in some sense a trivial problem that can
easily be solved by inspection. Nevertheless, his work is significant for a number of reasons: [1]
It illustrates the possibilities of using DNA to solve a class of problems that is difficult or
impossible to solve using traditional computing methods.
It's an example of computation at a molecular level, potentially a size limit that may
never be reached by the semiconductor industry.
It demonstrates unique aspects of DNA as a data structure.
It demonstrates that computing with DNA can work in a massively parallel fashion.
2. UNIQUENESS OF DNA
DNA has many unique features, some of which can be exploited to make it an
exceptional computing medium [4]. They are described below.
The amount of information gathered on the molecular biology of DNA over the last 40
years is almost overwhelming in scope. So instead of getting bogged down in biochemical and
biological details of DNA, we'll concentrate on only the information relevant to DNA
computing.
The data density of DNA is very impressive. Just as a string of binary data is encoded
with ones and zeros, a strand of DNA is encoded with four bases, represented by the letters A, T,
C, and G. The bases (also known as nucleotides) are spaced every 0.35 nanometers along the
DNA molecule, giving DNA a remarkable data density of nearly 18 Mbytes per inch at two bits
per base. In two dimensions, if you assume one base per square nanometer, the data density is
over one million Gbits per square inch. Compare this to the data density of a typical
high-performance hard drive, which is about 7 Gbits per square inch, a factor of over 100,000
smaller. [4]
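The density figures can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation; it assumes two bits per base (four symbols), which the text does not state explicitly, and the function names are illustrative. Under that assumption the linear figure works out to about 145 Mbits, or roughly 18 Mbytes, per inch.

```python
# Back-of-the-envelope check of the DNA density figures quoted in the text.
NM_PER_INCH = 2.54e7       # 1 inch = 2.54 cm = 2.54e7 nanometers
BASE_SPACING_NM = 0.35     # spacing between bases, from the text
BITS_PER_BASE = 2          # assumption: four symbols -> log2(4) = 2 bits
HARD_DRIVE_GBITS = 7       # hard-drive areal density quoted in the text


def linear_density_bits_per_inch():
    """Bits stored along one inch of a single DNA strand."""
    bases_per_inch = NM_PER_INCH / BASE_SPACING_NM
    return bases_per_inch * BITS_PER_BASE


def areal_density_gbits_per_sq_inch(bases_per_sq_nm=1.0):
    """Gbits per square inch, assuming a given number of bases per nm^2."""
    bases = bases_per_sq_nm * NM_PER_INCH ** 2
    return bases * BITS_PER_BASE / 1e9


linear_bits = linear_density_bits_per_inch()
areal_gbits = areal_density_gbits_per_sq_inch()
ratio = areal_gbits / HARD_DRIVE_GBITS

print(f"linear: {linear_bits / 8 / 1e6:.1f} Mbytes per inch")
print(f"areal:  {areal_gbits:.3g} Gbits per square inch")
print(f"ratio to the hard drive: {ratio:.3g}")
```

The ratio comes out above 100,000, in line with the comparison made in the text.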
Let us make a practical comparison between the storage density of DNA and that of
currently available storage devices by comparing a gram of DNA with an ordinary Compact
Disc (CD). A CD can hold up to about 800 Mbytes of data. From the figures above, we can
calculate that one gram of DNA can hold about 1 × 10^14 Mbytes of data.
The number of CDs required to hold all the information stored in a gram of DNA would
be so enormous that, placed edge to edge in a line, they would circle the Earth 375 times,
and it would take 163,000 centuries to listen to all that information.
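The "375 times around the Earth" figure can be reproduced as a rough sketch. The disc diameter and the Earth's circumference below are assumptions not given in the text (a standard 120 mm disc and the equatorial circumference); the capacities are the ones quoted above.

```python
# Rough check of the CD comparison, using the capacities quoted in the text.
DNA_MBYTES_PER_GRAM = 1e14          # from the text
CD_MBYTES = 800                     # from the text
CD_DIAMETER_M = 0.12                # assumption: standard 120 mm disc
EARTH_CIRCUMFERENCE_M = 4.0075e7    # assumption: equatorial circumference

cds_needed = DNA_MBYTES_PER_GRAM / CD_MBYTES
line_length_m = cds_needed * CD_DIAMETER_M
laps_around_earth = line_length_m / EARTH_CIRCUMFERENCE_M

print(f"CDs needed:        {cds_needed:.3g}")
print(f"line of CDs:       {line_length_m:.3g} m")
print(f"laps around Earth: {laps_around_earth:.0f}")
```

With these assumptions the result lands within a lap or so of the figure quoted in the text.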
2.2 Enormous Parallelism
A test tube of DNA can contain a huge number of DNA strands, on the order of trillions.
As with any operation performed on a chemical compound, when an operation is applied to the
test tube, every one of these strands undergoes the same operation simultaneously. In effect,
all the strands produce their results at the same time. This property is very useful when
the same operation must be repeated a large number of times to obtain an average result.
Dr. Adleman, in his experiments, estimated that his DNA computer could run 2 × 10^19
operations while consuming only one joule of energy. As far as silicon-based personal
computers are concerned, the maximum efficiency achieved so far (by an Intel Core i7
processor) is barely 2 × 10^17 operations per joule. When a supercomputer of the 2006 era is
taken into consideration, its efficiency stands at 2 × 10^19 operations per joule of
energy*. The DNA computer beats these computers by a long way.
Another important property of DNA is its double stranded nature. The bases A and T, and
C and G, can bind together, forming base pairs. Therefore every DNA sequence has a natural
complement. For example if sequence S is ATTACGTCG, its complement, S', is
TAATGCAGC. Both S and S' will come together (or hybridize) to form double stranded DNA.
This complementarity makes DNA a unique data structure for computation and can be
exploited in many ways. Error correction is one example. Errors in DNA arise from many
factors.
Occasionally, DNA enzymes simply make mistakes, cutting where they shouldn't, or
inserting a T for a G. DNA can also be damaged by thermal energy and UV energy from the sun.
If the error occurs in one of the strands of double stranded DNA, repair enzymes can restore the
proper DNA sequence by using the complement strand as a reference.
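As a minimal sketch of the two ideas above, the code below computes the complement of the example sequence S and restores a damaged strand from its partner. The `repair` helper is illustrative only: it assumes we already know which strand is damaged, which real repair enzymes have to determine for themselves, and it writes the complement in the same left-to-right order the text uses.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement, written in the same order."""
    return "".join(PAIR[base] for base in strand)


def repair(damaged, partner):
    """Rebuild a strand from its intact partner (illustrative helper).

    Returns the restored strand and the positions that were corrected.
    """
    restored = complement(partner)
    corrected = [i for i, (old, new) in enumerate(zip(damaged, restored))
                 if old != new]
    return restored, corrected


s = "ATTACGTCG"
s_prime = complement(s)
print(s_prime)                 # complement of the example sequence

damaged = "ATTACGGCG"          # a G inserted in place of the T at index 6
restored, corrected = repair(damaged, s_prime)
print(restored, corrected)
```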
In this sense, double stranded DNA is similar to a RAID 1 array, where data is mirrored
on two drives, allowing data to be recovered from the second drive if errors occur on the first. In
biological systems, this facility for error correction means that the error rate can be quite low.
For example, in DNA replication, there is one error for every 10^9 copied bases, or in other words
an error rate of 10^-9. In comparison, hard drives have read error rates of only 10^-13 for
Reed-Solomon correction. [3]
* Intel documentation
3. HAMILTONIAN PATH PROBLEM
The Hamiltonian Path Problem (HPP) that Dr. Adleman solved in his laboratory was a
seven-city, or seven-point, problem, as shown in the accompanying figure. It has seven
vertices, or cities, connected to one another by unidirectional and bidirectional paths, and it
has designated START and END points. To simplify the explanation of the algorithm he used
for the seven-point HPP, we apply the same algorithm to a four-point HPP. [1]
The algorithm proceeds as follows:
1. Generate all possible itineraries.
2. Select itineraries that start with the proper city and end with the final city.
3. Select itineraries with the correct number of cities.
4. Select itineraries that contain each city only once.
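The selection steps above can be sketched in software as a generate-and-test search. This is only an illustration of the logic, not of the chemistry: the flight map below is an assumption made for the four-city example (it is not listed in this report), and the candidates are enumerated deterministically rather than formed at random in a test tube.

```python
CITIES = ["Atlanta", "Boston", "Chicago", "Detroit"]

# Assumed one-way flights for the four-city example (illustrative only).
FLIGHTS = {
    ("Atlanta", "Boston"), ("Atlanta", "Detroit"),
    ("Boston", "Atlanta"), ("Boston", "Chicago"),
    ("Boston", "Detroit"), ("Chicago", "Detroit"),
}


def candidate_itineraries(max_cities):
    """Enumerate every walk along the flights, up to max_cities long."""
    frontier = [[city] for city in CITIES]
    found = list(frontier)
    for _ in range(max_cities - 1):
        frontier = [walk + [dst]
                    for walk in frontier
                    for (src, dst) in sorted(FLIGHTS)
                    if walk[-1] == src]
        found.extend(frontier)
    return found


def hamiltonian_paths(start, end, max_cities=6):
    pool = candidate_itineraries(max_cities)
    # Keep itineraries with the proper start and end city.
    pool = [w for w in pool if w[0] == start and w[-1] == end]
    # Keep itineraries with the correct number of cities.
    pool = [w for w in pool if len(w) == len(CITIES)]
    # Keep itineraries that contain each city only once.
    pool = [w for w in pool if len(set(w)) == len(CITIES)]
    return pool


paths = hamiltonian_paths("Atlanta", "Detroit")
print(paths)
```

With the assumed flight map, a single itinerary survives all three filters.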
For this particular problem, only one Hamiltonian path exists, and it passes through
Atlanta, Boston, Chicago and Detroit in that order. In this computation, the path is
represented by GCAGTCGGACTGGGCTATGTCCGA, a DNA sequence of length 24.
Now let’s understand how this works. The second half of the Atlanta-Boston flight
(TCGG) is complementary to the first half of the Boston complement (AGCC). These two
hybridize and form a pair.
GCAGTCGG
    AGCCTGAC
Once this pair is formed, the second half of the Boston complement (TGAC) is free to
bind the first half of the Boston-Chicago flight (ACTG), which extends the
complementary pair.
GCAGTCGGACTGGGCT
    AGCCTGAC
In a similar fashion, the rest of the path is formed with the Chicago complement
and the Chicago-Detroit flight. The final path is given below:
GCAGTCGGACTGGGCTATGTCCGA
    AGCCTGACCCGATACA
Thus the final path from Atlanta to Detroit passing through Boston and Chicago is
formed. There is a problem, however. When this operation is carried out in a test tube, many
other DNA strands are present alongside the ones in this example, and they combine to form
many other, undesired paths. The big problem now is extracting the desired path.
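The hybridization walk-through can be mimicked with plain strings. The sketch below uses the three 8-base flight sequences implied by the 24-base path given above and the Boston and Chicago complements; the helper names are hypothetical, and complements are written left to right as in the text rather than as reverse complements.

```python
PAIR = str.maketrans("ATCG", "TAGC")


def complement(strand):
    """Base-by-base complement, written in the same order as the input."""
    return strand.translate(PAIR)


# Flight strands implied by the 24-base Hamiltonian path in the text.
FLIGHTS = {
    ("Atlanta", "Boston"): "GCAGTCGG",
    ("Boston", "Chicago"): "ACTGGGCT",
    ("Chicago", "Detroit"): "ATGTCCGA",
}
# City complements that hold consecutive flights together.
CITY_COMPLEMENTS = {"Boston": "AGCCTGAC", "Chicago": "CCGATACA"}


def bridges(left, right, splint):
    """True if the splint pairs with the end of left and the start of right."""
    half = len(splint) // 2
    return (complement(left[-half:]) == splint[:half]
            and complement(right[:half]) == splint[half:])


def assemble(route):
    """Join the flights along a route, checking each city complement."""
    strand = FLIGHTS[(route[0], route[1])]
    for city, nxt in zip(route[1:], route[2:]):
        next_flight = FLIGHTS[(city, nxt)]
        if not bridges(strand, next_flight, CITY_COMPLEMENTS[city]):
            raise ValueError(f"{city} complement does not bridge the flights")
        strand += next_flight
    return strand


path = assemble(["Atlanta", "Boston", "Chicago", "Detroit"])
print(path, len(path))
```

The assembled strand matches the 24-base sequence quoted for the Hamiltonian path.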
4. METHODS OF EXTRACTION
In order to extract the desired path, that is, the desired DNA strand, several
methods have to be applied in a particular order. These methods rely on a number of
biochemicals. The following are the important ones, whose functions form the basis of the
extraction methods:
1. Polymerases. Polymerases copy information from one molecule into another. For
example, DNA polymerase will make a Watson-Crick complementary DNA strand from
a DNA template. In fact, DNA polymerase needs a “start signal” to tell it where to begin
making the complementary copy. This signal is provided by a primer—a (possibly short)
piece of DNA that is annealed to the template by Watson-Crick complementarity.
Wherever such a primer-template pair is found, DNA polymerase will begin adding bases
to the primer to create a complementary copy of the template.
2. Ligases. Ligases bind molecules together. For example, DNA ligase will take two strands
of DNA in proximity and covalently bond them into a single strand. DNA ligase is used
by the cell to repair breaks in DNA strands that occur, for instance, after skin cells are
exposed to ultraviolet light.
3. Nucleases. Nucleases cut nucleic acids. For example, restriction endonucleases will
“search” a strand of DNA for a predetermined sequence of bases and, when found, will
cut the molecule into two pieces. EcoRI (from Escherichia coli) is a restriction enzyme
that will cut DNA after the G in the sequence GAATTC; it will almost never cut a strand
of DNA anywhere else. It has been suggested that restriction enzymes evolved to protect
bacteria from viruses. For example, E. coli has a means (methylation) of protecting its
own DNA from EcoRI, but an invading virus with the deadly GAATTC sequence will be
cut to pieces. Adleman's DNA computer did not use restriction enzymes, but they have been
used in subsequent experiments by many other research groups.
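The cutting rule described for EcoRI can be sketched as a string operation. This is a simplification under stated assumptions: it treats a single strand as text and cuts after the G of every GAATTC site, whereas the real enzyme acts on double-stranded DNA. The `digest` function and the sample strand are made up for illustration.

```python
SITE = "GAATTC"     # EcoRI recognition sequence, from the text
CUT_OFFSET = 1      # the cut falls after the leading G


def digest(strand, site=SITE, offset=CUT_OFFSET):
    """Split a strand at every occurrence of the recognition site."""
    fragments = []
    start = 0
    pos = strand.find(site)
    while pos != -1:
        fragments.append(strand[start:pos + offset])
        start = pos + offset
        pos = strand.find(site, pos + 1)
    fragments.append(strand[start:])
    return fragments


sample = "TTGAATTCAAGGAATTCC"   # made-up strand with two sites
fragments = digest(sample)
print(fragments)
```

A strand without the site comes back as a single uncut fragment, which mirrors the remark that the enzyme almost never cuts anywhere else.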
We now have a test tube full of various lengths of DNA that encode possible routes
between cities. What we want are routes that start with Atlanta and end with Detroit. To
accomplish this we can use a technique called Annealing followed by Polymerase Chain
Reaction (PCR). Once this is done, next step involves sorting the DNA by length and selecting
the DNA whose length corresponds to the 4 cities. This is done by Gel Electrophoresis. [1] [2] [3]
4.1 Annealing
Typically, PCR consists of a series of 20-40 repeated temperature changes, called cycles,
with each cycle commonly consisting of 2-3 discrete temperature steps, usually three. The
cycling is often preceded by a single temperature step (called hold) at a high temperature
(>90°C), and followed by one hold at the end for final product extension or brief storage. The
temperatures used and the length of time they are applied in each cycle depend on a variety of
parameters. These include the enzyme used for DNA synthesis, the concentration of divalent
ions and dNTPs in the reaction, and the melting temperature of the primers.
The following are the critical steps involved in each cycle of the polymerase chain reaction.
Denaturation step: This step is the first regular cycling event and consists of heating the
reaction to 94°C for 20–30 seconds. It causes DNA melting by disrupting the hydrogen
bonds between complementary bases, yielding single-stranded DNA molecules.
Annealing step: The reaction temperature is lowered to about 50°C for 20–40 seconds
allowing annealing of the primers to the single-stranded DNA template. Stable DNA-
DNA hydrogen bonds are only formed when the primer sequence very closely matches
the template sequence. The polymerase binds to the primer-template hybrid and begins
DNA synthesis.
Final hold: This step at 4–15 °C for an indefinite time may be employed for short-term
storage of the reaction.
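To give a feel for why 20 to 40 cycles are enough, the sketch below counts copies under an idealised model in which every cycle multiplies the number of target molecules by a fixed factor. The `efficiency` parameter is an assumption introduced here (1.0 means perfect doubling); real reactions fall short of it and eventually plateau.

```python
def copies_after(cycles, start=1, efficiency=1.0):
    """Idealised PCR yield: each cycle multiplies copies by (1 + efficiency)."""
    return start * (1 + efficiency) ** cycles


for n in (20, 30, 40):
    print(f"{n} cycles: {copies_after(n):.3g} copies (ideal), "
          f"{copies_after(n, efficiency=0.8):.3g} copies (80% efficient)")
```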
4.3 Gel Electrophoresis
Electrophoresis refers to the electromotive force (EMF) that is used to move the
molecules through the gel matrix. By placing the molecules in wells in the gel and applying an
electric field, the molecules will move through the matrix at different rates, determined largely
by their mass when the charge to mass ratio (Z) of all species is uniform, toward the anode if
negatively charged or toward the cathode if positively charged.
The DNA is forced to thread its way through the tiny spaces between the strands of the
gel, which slows it down to a degree that depends on its length. This process results in the
formation of DNA bands, with each band corresponding to a certain length. By simply cutting
out the band of interest, DNA of a specific length can be isolated. Since it is known that each city
is encoded with 8 base pairs of DNA, knowing the length of the itinerary gives us the number of
cities. In this case we would isolate the DNA that is 24 base pairs long (three flights of 8 base
pairs each, linking the four cities).
1. Use Annealing and PCR to replicate the DNA with the correct start and end city.
2. Put one primer on Atlanta and one on Detroit.
3. The right answer is replicated exponentially, while the wrong paths are replicated linearly
or not at all.
4. Use Gel Electrophoresis to identify the molecules with the right length.
5. Use the affinity probe separation procedure to weed out paths without all cities.
6. The DNA that remains is the required solution.
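The extraction steps listed above can be mimicked on strings as three successive filters. This is a sketch under stated assumptions: the strands in `pool` are hypothetical examples, Atlanta's first half (ACTT) is assumed rather than taken from this report, and a substring test stands in for affinity probes, which in the laboratory bind the complementary sequence.

```python
START_TAG = "GCAG"     # second half of Atlanta's code, from the 24-base path
END_TAG = "CCGA"       # first half of Detroit's code, from the 24-base path
CITY_CODES = {"Boston": "TCGGACTG", "Chicago": "GGCTATGT"}
TARGET_LENGTH = 24     # three flights of 8 bases each


def pcr_select(pool):
    """Keep strands that carry both the start and the end primer site."""
    return [s for s in pool if s.startswith(START_TAG) and s.endswith(END_TAG)]


def gel_select(pool):
    """Keep strands of the length expected for a four-city itinerary."""
    return [s for s in pool if len(s) == TARGET_LENGTH]


def affinity_select(pool):
    """Keep strands that contain every intermediate city code."""
    for code in CITY_CODES.values():
        pool = [s for s in pool if code in s]
    return pool


pool = [
    "GCAGTCGGACTGGGCTATGTCCGA",   # Atlanta-Boston-Chicago-Detroit
    "GCAGCCGA",                   # Atlanta-Detroit direct
    "GCAGTCGGACTGCCGA",           # Atlanta-Boston-Detroit
    "ACTGGGCTATGTCCGA",           # Boston-Chicago-Detroit (wrong start)
    "GCAGTCGGACTGACTTGCAGCCGA",   # Atlanta-Boston-Atlanta-Detroit (assumed code)
]

survivors = affinity_select(gel_select(pcr_select(pool)))
print(survivors)
```

Each filter removes a different class of wrong itinerary, and only the Hamiltonian path passes all three.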
5. CAVEATS OF ADLEMAN’S EXPERIMENT
Now that we understand the process by which Adleman solved a seven-city problem,
there are some significant aspects to look into.
The entire process differs from the usual procedure for solving the Hamiltonian Path
Problem. On a conventional silicon computer, this kind of problem generally requires
generating each possibility in turn, from the initial city to the destination city. This is a
brute-force algorithm: a candidate path is created and checked, and it is either presented as
the solution or discarded, in which case a new candidate is generated. In Adleman’s computer,
by contrast, all the candidate paths are generated simultaneously and the desired path is then
fished out. This is a generate-and-test algorithm in which all possible solutions are produced at
once and the correct one is selected.
Though Adleman was successful in finding the solution of a seven point HPP, there are
two major shortcomings preventing a large scaling up of his computation. The complexity of the
travelling salesman problem simply doesn’t disappear when applying a different method of
solution - it still increases exponentially. For Adleman’s method, what scales exponentially is not
the computing time, but rather the amount of DNA. [9]
Unfortunately, this places hard restrictions on the number of cities that can be handled.
After the Adleman article was published, more than a few people pointed out that using his
method to solve a 200-city HP problem would take an amount of DNA that weighed more than
the Earth.
Another factor that places limits on his method is the error rate for each operation. Since
these operations are not deterministic but stochastically driven, each step contains statistical
errors, limiting the number of iterations that can be done successively before the probability of
producing an error becomes greater than producing the correct result. For example an error rate
of 1% is fine for 10 iterations, giving less than 10% error, but after 100 iterations this error grows
to 63%.
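The figures in this example follow from compounding the per-step error. As a small sketch, assuming each step fails independently with the same probability, the chance of at least one error after n steps is 1 - (1 - p)^n; the function names are illustrative.

```python
import math


def failure_probability(per_step_error, steps):
    """Chance of at least one error in a chain of independent steps."""
    return 1 - (1 - per_step_error) ** steps


def steps_until(threshold, per_step_error):
    """Smallest number of steps at which failure exceeds the threshold."""
    return math.ceil(math.log(1 - threshold) / math.log(1 - per_step_error))


p10 = failure_probability(0.01, 10)
p100 = failure_probability(0.01, 100)
half_point = steps_until(0.5, 0.01)

print(f"after 10 steps:  {p10:.1%}")
print(f"after 100 steps: {p100:.1%}")
print(f"failure becomes more likely than success after {half_point} steps")
```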
Since then, many new techniques have been developed that allow DNA computing to
give more precise and less bulky solutions.
6. DIFFERENT METHODS OF DNA COMPUTING
There are multiple new methods for building a computing device based on DNA, each
with its own advantages and disadvantages. Most of these build the basic logic gates (AND, OR,
NOT) associated with digital logic from a DNA basis. Some of the different bases include
DNAzymes, deoxy-oligo-nucleotides, enzymes, DNA tiling, and polymerase chain reaction.
6.1 DNAzymes
A stem loop is a single strand of DNA with a loop at one end. It is a dynamic structure
that opens and closes when a piece of DNA bonds to the loop part. This effect has been
exploited to create several logic gates. These logic gates have been used to
create the computers MAYA I and MAYA II, which can play tic-tac-toe to some extent. [4]
6.2 Enzymes
Enzyme-based DNA computers usually take the form of a simple Turing machine, with an
enzyme acting as the hardware and DNA as the software. Shapiro has
demonstrated a DNA computer using the FokI enzyme and expanded on his work by going on to
show automata that diagnose and react to prostate cancer. His automata evaluated the expression
of each gene, one gene at a time, and on positive diagnosis then released a single strand DNA
molecule (ssDNA) that is an antisense for MDM2. [4]
DNA computers have also been constructed using the concept of toehold exchange. In
this system, an input DNA strand binds to a sticky end, or toehold, on another DNA molecule,
which allows it to displace another strand segment from the molecule. This allows the creation of
modular logic components such as AND, OR, and NOT gates and signal amplifiers, which can
be linked into arbitrarily large computers. This class of DNA computers does not require
enzymes or any chemical capability of the DNA. [4]
DNA nanotechnology has been applied to the related field of DNA computing. DNA tiles
can be designed to contain multiple sticky ends with sequences chosen so that they act as Wang
tiles. A DX array has been demonstrated whose assembly encodes an XOR operation; this allows
the DNA array to implement a cellular automaton which generates a fractal called the Sierpinski
gasket. This shows that computation can be incorporated into the assembly of DNA arrays,
increasing its scope beyond simple periodic arrays. [4]
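The link between an XOR rule and the Sierpinski gasket can be seen in a short simulation. This is only a software sketch of the cellular automaton, not of the tile chemistry: each new cell is the XOR of the two cells above it, starting from a single 1.

```python
def sierpinski_rows(n):
    """Build n rows where each cell is the XOR of its two upper neighbours."""
    row = [1]
    rows = [row]
    for _ in range(n - 1):
        padded = [0] + row + [0]
        row = [padded[i] ^ padded[i + 1] for i in range(len(padded) - 1)]
        rows.append(row)
    return rows


def render(rows):
    """Draw the rows as a centred triangle of '#' and '.' characters."""
    width = 2 * len(rows) - 1
    lines = []
    for row in rows:
        text = " ".join("#" if cell else "." for cell in row)
        lines.append(text.center(width))
    return "\n".join(lines)


print(render(sierpinski_rows(16)))
```

Printing 16 rows already shows the nested triangular holes characteristic of the gasket.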
7. DNA vs. SILICON
Silicon microprocessors have been the heart of the computing world for more than 40
years. In that time, manufacturers have crammed more and more electronic devices onto their
microprocessors. In accordance with Moore's Law, the number of transistors on a
microprocessor has doubled roughly every two years. Moore's Law is named after Intel
co-founder Gordon Moore, who observed in 1965 that the number of components on an
integrated circuit was doubling every year, and who revised the rate in 1975 to a doubling
every two years. Many have predicted that Moore's Law will soon reach its end, because of the
physical speed and miniaturization limitations of silicon microprocessors.
DNA computers have the potential to take computing to new levels, picking up where
Moore's Law leaves off. There are several advantages to using DNA instead of silicon [9]:
As long as there are cellular organisms, there will always be a supply of DNA.
The large supply of DNA makes it a cheap resource.
Unlike the toxic materials used to make traditional microprocessors, DNA biochips can
be made cleanly.
DNA computers are many times smaller than today's computers.
DNA's key advantage is that it will make computers smaller than any that have come
before, while at the same time holding more data. One pound of DNA has the capacity to
store more information than all the electronic computers ever built, and a teardrop-sized
DNA computer built from DNA logic gates will be more powerful than the
world's most powerful supercomputer. More than 10 trillion DNA molecules can fit into a volume
no larger than 1 cubic centimetre (0.06 cubic inches). With this small amount of DNA, a
computer would be able to hold 10 terabytes of data and perform 10 trillion calculations at a
time. By adding more DNA, more calculations could be performed.
8. CURRENT SCENARIO OF DNA COMPUTING
In recent years, reversible logic has emerged as a promising computing paradigm having
its applications in low power computing, quantum computing, nanotechnology, optical
computing and DNA computing. The classical set of gates such as AND, OR, and EXOR are not
reversible. Recently, it has been shown how to encode information in DNA and use DNA
amplification to implement Fredkin gates. Furthermore, in the past Fredkin gates have been
constructed using DNA, whose outputs are used as inputs for other Fredkin gates. Thus, it can be
concluded that arbitrary circuits of Fredkin gates can be constructed using DNA. [7]
In 2001, scientists at the Weizmann Institute of Science in Israel announced that they had
manufactured a computer so small that a single drop of water would hold a trillion of the
machines. The devices used DNA and enzymes as their software and hardware and could
collectively perform a billion operations a second. Now the same team, led by Ehud Shapiro, has
announced a novel model of its bio-molecular machine that no longer requires an external energy
source and performs 50 times faster than its predecessor did. The Guinness Book of World
Records has crowned it the world's smallest biological computing device. [8]
In the military sphere, DNA computing has brought the communication channel to a
new level: [9] [10]
DNA manipulation technology has improved rapidly in recent years, and future advances
may make DNA computers more efficient.
Researchers in the U.S. military are experimenting with DNA’s massive parallelism to crack
cryptographic codes.
Japanese military scientists have successfully stored the country's military doctrines in a
bacterium’s DNA.
In medicine, research is under way on DNA microchips, which will help to identify
BRCA1 and BRCA2 mutations in human beings (especially women).
IBM (International Business Machines) has already started research on a DNA
microprocessor.
9. CONCLUSION
So will DNA ever be used to solve a travelling salesman problem with a higher number
of cities than can be done with traditional computers? Well, considering that the record is a
whopping 13,509 cities, it certainly will not be done with the procedure described above. This
first demonstration of DNA computing used a rather unsophisticated algorithm, but as the
formalism of DNA computing becomes refined, new algorithms perhaps will one day allow
DNA to overtake conventional computation and set a new record.
And of course we are talking about DNA here, the genetic code of life itself. It certainly
has been the molecule of this century and most likely will be of the next. Considering all the
attention that DNA has garnered, it isn’t too hard to imagine that one day we might have the
tools and talent to produce a small integrated desktop machine that uses DNA, or a DNA-like
biopolymer, as a computing substrate along with a set of designer enzymes.
The most significant technology in the future of engineering is the DNA computer. DNA is
what makes up your genes and stores all the information about you inside your cells. It holds the
instructions for what you look like and how you function. Each microscopic cell in your body
contains all the DNA needed to build you, which is a lot of information. DNA not only has
huge data storage potential but also the potential to solve complicated calculations and
mathematical problems.
DNA computers will be thousands of times smaller and more powerful than silicon based
computers. One pound of DNA has the ability to store more data than all the electronic devices
ever made to date. A water-droplet-sized DNA computer will have more computing power than
today's most powerful supercomputers. Another advantage of DNA computing over silicon
based computers is the ability to do parallel calculations. Silicon based microprocessors can only
do one calculation at a time, while a DNA computer will be able to do many simultaneous
calculations.
REFERENCES
[1] Leonard M. Adleman, Computing with DNA, Scientific American, August 1998.
[2] L. Adleman. On constructing a molecular computer. 1st DIMACS workshop on DNA based
computers, Princeton, 1995. In DIMACS series, vol.27 (1996)
[4] Junghuei Chen, John H. Reif, DNA Computing: 9th International Workshop on DNA Based
Computers, DNA9 Madison, WI, USA June 2003, Revised Papers.
[5] Paun, G., Rozenberg, G., and Salomaa, A., DNA Computing, Springer, 1998.
[6] IEEE - J.H.M Dassen, DNA Computing – Promises, Problems and Perspective `97-`98
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=00645835.
[7] Himanshu Thapliyal & M. B. Srinivas, An Extension to DNA Based Fredkin Gate Circuits:
Design of Reversible Sequential Circuits using Fredkin Gates.
[8] National Geographic Article, Computer Made from DNA and Enzymes.
[10] Lipton R, DNA Solution of Hard Computational Problems. Science. Vol. 268 (1995). 542-
545.
APPENDIX
EcoRI (from Escherichia coli): Escherichia coli is a Gram-negative, rod-shaped bacterium that
is commonly found in the lower intestine of warm-blooded organisms (endotherms).
Sierpinski Gasket: Also called the Sierpinski Triangle or the Sierpinski Sieve, this is a fractal
and attractive fixed set named after the Polish mathematician Wacław Sierpiński. Originally
constructed as a curve, it is one of the basic examples of self-similar sets, i.e. a mathematically
generated pattern that is reproducible at any magnification or reduction.
BRCA1 and BRCA2: A pair of genes involved in breast cancer (BRCA mutation).
MEMS: Micro-Electro-Mechanical Systems.