
Analysing the splicing and forbidding-enforcing model of DNA Computing


A Synopsis Report

Submitted by

SONIKA KATTA

in partial fulfillment for the award of the degree of

Master Of Technology

IN

Software Engineering
AT

Suresh Gyan Vihar University


2009-2011
TABLE OF CONTENTS

CHAPTER NO. TITLE

ABSTRACT
ACKNOWLEDGMENT

1. INTRODUCTION

1.1 What is DNA?


1.2 Structure of DNA
1.3 Operations on DNA
1.4 DNA as Software
1.5 DNA Computing

2. Brief Literature survey


1. Introduction

In 1994, Leonard M. Adleman solved an unremarkable computational problem with a
remarkable technique. It was a problem that a person could solve in a few moments, or that an
average desktop machine could solve in the blink of an eye. It took Adleman, however, seven
days to find a solution. Nevertheless, this work was exceptional because he solved the
problem with DNA. It was a landmark demonstration of computing at the molecular level.
The type of problem that Adleman solved is a famous one. It is formally known as the directed
Hamiltonian Path (HP) problem, but is more popularly recognized as a variant of the
so-called "travelling salesman problem." In Adleman's version of the travelling salesman
problem, or "TSP" for short, a hypothetical salesman tries to find a route through a set of
cities so that he visits each city exactly once. As the number of cities increases, the problem
becomes more difficult until its solution is beyond analytical treatment altogether, at which
point it requires brute-force search methods.
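The brute-force search mentioned above can be sketched in a few lines of Python. This is only an illustration on a conventional computer, not Adleman's laboratory procedure: the function name `hamiltonian_paths` and the five-city graph `EDGES` are hypothetical and are not taken from his experiment.

```python
from itertools import permutations

def hamiltonian_paths(n_vertices, edges, start, end):
    """Yield every directed path from start to end that visits each vertex exactly once."""
    edge_set = set(edges)
    middle = [v for v in range(n_vertices) if v not in (start, end)]
    # Try every ordering of the intermediate cities (factorial growth).
    for order in permutations(middle):
        path = (start, *order, end)
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            yield path

# Hypothetical 5-city directed graph (NOT Adleman's seven-city instance).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3), (2, 1), (3, 1)]

if __name__ == "__main__":
    for p in hamiltonian_paths(5, EDGES, start=0, end=4):
        print(" -> ".join(map(str, p)))
```

The loop examines (n - 2)! orderings one after another, which is why the approach stops being practical as the number of cities grows.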
TSPs with a large number of cities quickly become computationally expensive, making them
impractical to solve even on the latest supercomputers. Adleman’s demonstration involved
only seven cities, making it in some sense a trivial problem that can easily be solved by
inspection. Nevertheless, his work is significant for a number of reasons.
• It illustrates the possibility of using DNA to solve a class of problems that is difficult or
impossible to solve using traditional computing methods.
• It is an example of computation at the molecular level, a size limit that the semiconductor
industry may never reach.
• It demonstrates unique aspects of DNA as a data structure.
• It demonstrates that computing with DNA can work in a massively parallel fashion.

In 2001, scientists at the Weizmann Institute of Science in Israel announced that they had
manufactured a computer so small that a single drop of water would hold a trillion of the
machines. The devices used DNA and enzymes as their software and hardware and could
collectively perform a billion operations a second. Now the same team, led by Ehud Shapiro,
has announced a novel model of its biomolecular machine that no longer requires an external
energy source and performs 50 times faster than its predecessor did. The Guinness Book of
World Records has crowned it the world's smallest biological computing device.
Many designs for minuscule computers aimed at harnessing the massive storage capacity of
DNA have been proposed over the years. Earlier schemes relied on a molecule known
as ATP, a common source of energy for cellular reactions, as fuel. In the new setup,
however, a DNA molecule provides both the initial data and sufficient energy to
complete the computation.
We propose a new class of algorithms to be implemented on a DNA computer. The
algorithms we introduce are not much affected by changes in the initial conditions,
which gives DNA computers great extensibility. Knapsack problems are classical
problems solvable by this method. Because these problems are NP-complete, it is
unrealistic to solve large instances of them on conventional electronic computers.
DNA computers using our method can solve substantially larger problems because of
their massive parallelism.

1.1 What is DNA?

DNA (deoxyribonucleic acid) is the primary genetic material in all living organisms - a
molecule composed of two complementary strands that are wound around each other in a
double helix formation. The strands are connected by base pairs that look like rungs in a
ladder. Each base will pair with only one other: adenine (A) pairs with thymine (T), guanine
(G) pairs with cytosine (C). The sequence of each single strand can therefore be deduced by
the identity of its partner.
Genes are sections of DNA that code for a defined biochemical function, usually the
production of a protein. The DNA of an organism may contain anywhere from a dozen genes,
as in a virus, to tens of thousands of genes in higher organisms like humans. The structure of
a protein determines its function. The sequence of bases in a given gene determines the
structure of a protein. Thus the genetic code determines what proteins an organism can make
and what those proteins can do. It is estimated that only 1-3% of the DNA in our cells codes
for genes; the rest may be used as a decoy to absorb mutations that could otherwise damage
vital genes.
mRNA (Messenger RNA) is used to relay information from a gene to the protein synthesis
machinery in cells. mRNA is made by copying the sequence of a gene, with one subtle
difference: thymine (T) in DNA is substituted by uracil (U) in mRNA. This allows cells to
differentiate mRNA from DNA so that mRNA can be selectively degraded without
destroying DNA. The DNA-o-gram generator simplifies this step by taking mRNA out of the
equation.
The genetic code is the language used by living cells to convert information found in DNA
into information needed to make proteins. A protein's structure, and therefore function, is
determined by the sequence of amino acid subunits. The amino acid sequence of a protein is
determined by the sequence of the gene encoding that protein. The "words" of the genetic
code are called codons. Each codon consists of three adjacent bases in an mRNA molecule.
Using combinations of A, U, C and G, there can be sixty four different three-base codons.
There are only twenty amino acids that need to be coded for by these sixty four codons. This
excess of codons is known as the redundancy of the genetic code. By allowing more than one
codon to specify each amino acid, mutations can occur in the sequence of a gene without
affecting the resulting protein.
The DNA-o-gram generator uses the genetic code to specify letters of the alphabet instead of
coding for proteins.
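The codon count above follows from 4 x 4 x 4 = 64, and the T-to-U substitution is a simple character replacement. A minimal Python sketch follows, assuming the mRNA is obtained by copying the gene sequence letter for letter, as the text describes. The helper names `transcribe` and `codons` are hypothetical.

```python
from itertools import product

RNA_BASES = "AUCG"

def transcribe(dna):
    """Return the mRNA-style string for a gene sequence (T is replaced by U)."""
    return dna.upper().replace("T", "U")

def codons(mrna):
    """Split an mRNA string into consecutive three-base codons, dropping any remainder."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

# Every possible three-base word over A, U, C, G.
ALL_CODONS = ["".join(c) for c in product(RNA_BASES, repeat=3)]

if __name__ == "__main__":
    print(len(ALL_CODONS))               # number of distinct codons
    print(codons(transcribe("ATGGCT")))  # codons of a short example sequence
```

With 64 codons and only twenty amino acids, several codons must map to the same amino acid, which is the redundancy described above.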
1.2 Structure of DNA

Fig 2.1 Structure of DNA

1.3 Operations on DNA


Denaturing (Melting)
• Double-stranded DNA is separated into single strands.
• Heating breaks the hydrogen bonds between complementary strands.

Hybridization (Annealing)
• Base-pairing between two complementary single-strand molecules forms a double-stranded
DNA molecule (on cooling).

Ligation
• Joins DNA molecules together.
• Ligase enzymes are used to concatenate free-floating double-stranded DNA.
• Often invoked after the annealing (hybridization) operation.

Replication (Amplify)
• DNA can also be replicated, taking a single molecule and multiplying it a thousandfold.
• Made possible by the Polymerase Chain Reaction (PCR).
• PCR alternates between two phases: separating DNA into single strands using heat, and
converting them back into double strands using primers and the polymerase reaction.
• PCR rapidly amplifies a single DNA molecule into billions of molecules.
• Makes 2^n copies (n: number of iterations).

Sorting (Gel Electrophoresis)
• Electrophoresis is the movement of charged molecules in an electric field.
• A technique for sorting DNA strands by size.
• Based on the fact that DNA molecules are negatively charged.
• The rate of migration of a molecule in aqueous solution (gel) depends on its shape (size)
and electrical charge.
• Smaller molecules migrate faster through the gel, which sorts them according to size.
• The gel is made of agarose, polyacrylamide, or a combination of both.

Filtering (Affinity Purification)
• Filters DNA containing a specific sequence from a sample of mixed DNA.
• The complement of the sequence to be filtered is attached to a substrate such as a
magnetic bead.
• The beads are mixed with the DNA.
• DNA that contains the specific sequence hybridizes with its complement on the bead.
• The beads are then retrieved and the DNA is isolated.
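Three of the operations above have simple string-level analogues, sketched below in Python. This is a toy model under strong assumptions: strands are plain strings, PCR doubles perfectly on every cycle, and the bead probe is represented by the target sequence itself rather than by its complement. The function names are hypothetical.

```python
def pcr_copies(initial_molecules, cycles):
    """Ideal PCR: every cycle doubles the number of molecules (2**n growth)."""
    return initial_molecules * 2 ** cycles

def gel_sort(strands):
    """Order strands as on a gel: shorter strands migrate faster, so they come first."""
    return sorted(strands, key=len)

def affinity_filter(strands, target):
    """Split strands into (captured, rest) by whether they contain the target sequence."""
    captured = [s for s in strands if target in s]
    rest = [s for s in strands if target not in s]
    return captured, rest

if __name__ == "__main__":
    print(pcr_copies(1, 30))  # one molecule after 30 ideal cycles
    print(gel_sort(["ACGTAC", "AC", "ACGT"]))
    print(affinity_filter(["ACGT", "TTTT", "GGACGG"], "AC"))
```

Thirty ideal cycles already turn one molecule into more than a billion, which matches the "billions of molecules" figure in the list.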

1.4 DNA as Software:

Think of DNA as software, and enzymes as hardware. Put them together in a test tube. The
way in which these molecules undergo chemical reactions with each other allows simple
operations to be performed as a by-product of the reactions. The scientists tell the devices
what to do by controlling the composition of the DNA software molecules. It's a completely
different approach to pushing electrons around a dry circuit in a conventional computer.
To the naked eye, the DNA computer looks like a clear water solution in a test tube. There is
no mechanical device. A trillion bio-molecular devices could fit into a single drop of water.
Instead of showing up on a computer screen, results are analyzed using a technique that
allows scientists to see the length of the DNA output molecule.
"Once the input, software, and hardware molecules are mixed in a solution it operates to
completion without intervention," said David Hawksett, the science judge at Guinness World
Records. "If you want to present the output to the naked eye, human manipulation is needed."

1.5 DNA Computing

DNA computing, also called ‘Nano Computing’, is a rising interdisciplinary field that uses
the four DNA bases (A, T, G, C) to perform computation. DNA, a genetic material, is a
polymer of deoxyribonucleotides. The components of individual monomers –
deoxyribonucleotides (nucleotides) are:
(1) deoxyribose (pentose sugar),
(2) phosphate group,
(3) nitrogen base.

The phosphate group is attached to deoxyribose at the 5’ and 3’ carbon positions. The
alternation of deoxyribose-phosphate-deoxyribose, and so on, is referred to as the
phosphate-sugar backbone of DNA. The backbone is held together by phosphodiester bonds,
which are strong covalent bonds. The nitrogen base is attached to deoxyribose at the first
carbon position. Four types of bases are found in DNA, viz. adenine (A), cytosine (C),
guanine (G) and thymine (T). The bases are classified into two structural families:
• purine - adenine and guanine
• pyrimidine - thymine and cytosine
The DNA molecule is a polymer of four kinds of deoxyribonucleotides which are attached by
phosphodiester bond. The two long chains of DNA molecule are held together by
complementary base pairs. Three hydrogen bonds are present between the complementary
base pair G and C. Two hydrogen bonds are present between the complementary base pair A
and T.
DNA’s ability to store and process information inspires the idea of DNA computing.
DNA computing can be carried out both in vivo and in vitro, independently of traditional
silicon-based technology. The advantages of DNA computing are as follows:
• massive parallelism: in an in vitro assay, 10^18 processors working in parallel can be
easily handled.
• potential for information storage: existing storage media store data at a density of
approximately 1 bit per 10^12 cubic nanometers, while DNA requires only 1 cubic nanometer
to store 1 bit of data, that is, a trillion times less space.
• speed: although the elementary operations of a DNA computer are slow compared with those
of an electronic computer, their parallelism strongly prevails, so that in certain models 330
trillion operations per second can be performed, which is more than 100,000 times the speed
of the fastest supercomputer existing today.
• energy efficiency: DNA can perform 10^19 operations using 1 joule of energy, while a
supercomputer manages only 10^10 operations, making it 10^9 times less energy efficient. In
DNA computers the energy consumed by DNA strand synthesis and PCR should also be small
compared with that used by a supercomputer.
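The ratios quoted in the list can be checked with a few lines of arithmetic. The Python sketch below only restates the figures given above; the constant names are hypothetical, and the last value is merely the supercomputer speed implied by the "100,000 times" claim, not a measured number.

```python
# Figures taken from the list above; names are illustrative only.
DNA_OPS_PER_JOULE = 10 ** 19
SUPERCOMPUTER_OPS_PER_JOULE = 10 ** 10
ENERGY_RATIO = DNA_OPS_PER_JOULE // SUPERCOMPUTER_OPS_PER_JOULE

CONVENTIONAL_NM3_PER_BIT = 10 ** 12
DNA_NM3_PER_BIT = 1
DENSITY_RATIO = CONVENTIONAL_NM3_PER_BIT // DNA_NM3_PER_BIT

DNA_OPS_PER_SECOND = 330 * 10 ** 12
# Speed a supercomputer would have if DNA were exactly 100,000 times faster.
IMPLIED_SUPERCOMPUTER_OPS_PER_SECOND = DNA_OPS_PER_SECOND // 100_000

if __name__ == "__main__":
    print(ENERGY_RATIO, DENSITY_RATIO, IMPLIED_SUPERCOMPUTER_OPS_PER_SECOND)
```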

1.6 The Structure of DNA, Bonding, and Operations on DNA

Deoxyribonucleic acid, or DNA, serves as the chemical vocabulary in which the genetic
makeup of every living organism on Earth is expressed. It is composed of long strings of
nucleotides, which themselves are simple molecules, each made up of a phosphate group, a
sugar, and a base. While each nucleotide’s sugar and phosphate play an important but
supporting role—namely, bonding, respectively, with the phosphate group or the sugar of
another nucleotide, forming part of a DNA strand’s “backbone”—it is by its base that each
nucleotide is characterized, and it is in the bases of the DNA nucleotides that genetic
information is stored. Unfortunately, a detailed discussion of DNA’s role in genetics is far
beyond the scope of this report.
For DNA computation, we are mainly interested in DNA’s structure and its binding
properties.
For simplicity, let us identify DNA nucleotides simply by their bases. There are four of these:
adenine (A), thymine (T), cytosine (C), and guanine (G). As nucleotides bind together to
form DNA strands, sequences of these bases are formed. A short DNA strand is called an
oligonucleotide or an n-mer (where n is the number of nucleotides in the strand). Here, we
represent an oligonucleotide as the string of the letters corresponding to its base sequence
(e.g. ACTG, as in Figure 1). The end of an oligonucleotide with a free phosphate group is
referred to as the 5’ end, while the end with a free sugar is called the 3’ end. One of DNA’s
interesting and important properties is its propensity to form double strands via bonding
between the nucleotide bases of two single DNA strands. However, this bonding occurs only
in a very specific manner. In particular, A can bond only to T and vice versa,
while, likewise, C can bond only to G and vice versa. Thus, any given single-stranded n-mer
has a unique complementary single-stranded n-mer—called its Watson-Crick complement—
to which it can bond. This concept is illustrated in Figure 1. For a given oligonucleotide O,
we represent its Watson-Crick complement as Ō (O with an overbar).
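The pairing rule can be written as a lookup table. The Python sketch below is an illustration with hypothetical function names: `complement` lists the partner of each base in the same left-to-right order, which reproduces the ACTG/TGAC pair of Figure 1, while `reverse_complement` writes the same partner strand in its own 5’-to-3’ direction, since the two strands of a double helix run in opposite orientations.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base partner of a strand, listed in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """The partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

def can_anneal(top, bottom):
    """True if two equal-length strands pair correctly at every position."""
    return len(top) == len(bottom) and complement(top) == bottom

if __name__ == "__main__":
    print(complement("ACTG"))
    print(reverse_complement("ACTG"))
    print(can_anneal("ACTG", "TGAC"))
```

Applying `complement` twice returns the original strand, which reflects the uniqueness of the Watson-Crick complement.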
In any model of computation, there must exist a set of operations, and in the case of DNA
computation, the fundamental operations are used to perform various manipulations of DNA
strands. Without being too specific, the operations most commonly used in the DNA
computing literature are:
• Synthesize: synthesize a desired oligonucleotide
• Mix: mix together two test tubes of DNA to perform a union
• Anneal: bond together single-stranded Watson-Crick complements to form double-stranded
DNA
• Denature: break apart double-stranded DNA into single strands
• Ligate: bond the 3’ and 5’ ends of two single DNA strands
• Sort: separate DNA strands by length
• Extract: extract DNA strands containing a given contiguous base sequence
• Amplify: make copies of DNA strands
• Cut: cut DNA strands at a specific site
• Detect: determine whether a test tube contains any DNA strands
• Sequence: read the base sequence of a DNA strand

Figure 1: A 4-mer ACTG bonded to its Watson-Crick complement TGAC to form a double
strand of DNA. Here, the pentagons represent each nucleotide’s sugar, and the small circles
represent their phosphate groups. The puzzle-piece parts are the bases. Note how only A and
T fit together and likewise with C and G.

2. Brief Literature survey


Models of DNA Computation
There are various models of DNA computation. Some of them are mathematical models that
capture the real-world situation; others are purely mathematical, and their
experimental feasibility remains unresolved. Here we describe three models which have been
used for the purpose of DNA cryptology.

2.1 Operations on the Sticker Model

Before specifying the operations on the sticker model, we define a test tube to be a multiset
of memory complexes. The general operations on the memory complexes in a
test tube are merge, separate, set and clear.
merge: Two test tubes are combined into one. This is just mixing the solutions of two test
tubes.
separate: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces two test tubes
+(T, i) and −(T, i), where +(T, i) (resp. −(T, i)) contains all memory complexes whose ith
substrand is on (resp. off).
set: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces another test tube
set(T, i) in which each memory complex has its ith substrand on.
clear: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces another test tube
clear(T, i) in which each memory complex has its ith substrand off.
The input or initial test tube will be a library of memory complexes. In particular a (k, l)
library, 1 ≤ l ≤ k, consists of memory complexes with k substrands, in which the last k − l
substrands are off whereas the first l substrands are on and off in all possible ways. Thus, a
(k, l) library contains 2^l different memory complexes.
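The four operations and the library construction can be sketched as operations on a multiset of bit tuples. This is a minimal Python model, not a description of the laboratory protocol: each memory complex is a tuple of 0/1 values (off/on), a test tube is a `collections.Counter`, the function names are hypothetical, and the last k − l substrands of a library are assumed to start off, as in the standard sticker model.

```python
from collections import Counter
from itertools import product

def library(k, l):
    """(k, l) library: first l substrands in every on/off pattern, last k - l off (assumed)."""
    return Counter(bits + (0,) * (k - l) for bits in product((0, 1), repeat=l))

def merge(tube1, tube2):
    """Pour two tubes together: multiset union with multiplicities added."""
    return tube1 + tube2

def separate(tube, i):
    """Return (+(T, i), -(T, i)): complexes with substrand i on, and those with it off."""
    on = Counter({c: n for c, n in tube.items() if c[i - 1] == 1})
    off = Counter({c: n for c, n in tube.items() if c[i - 1] == 0})
    return on, off

def _write(tube, i, value):
    """Copy the tube with substrand i (1-indexed) forced to the given value."""
    out = Counter()
    for c, n in tube.items():
        out[c[:i - 1] + (value,) + c[i:]] += n
    return out

def set_bit(tube, i):
    """set(T, i): turn substrand i on in every memory complex."""
    return _write(tube, i, 1)

def clear_bit(tube, i):
    """clear(T, i): turn substrand i off in every memory complex."""
    return _write(tube, i, 0)

if __name__ == "__main__":
    tube = library(4, 2)
    print(len(tube))          # number of distinct memory complexes
    print(separate(tube, 1))  # split on the first substrand
```

Substrand positions are 1-indexed to match the text. Using a multiset keeps track of how many copies of each complex a tube holds, so `clear_bit` can merge two previously distinct complexes without losing molecules.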

2.2 Splicing System

The splicing system was proposed by Tom Head [8]. It captures mathematically the
two molecular operations, cut and ligate, introduced in Chapter 2. The mathematical model was
introduced and studied before Adleman’s experiment. Many results, including the universality
of splicing systems, were obtained in [7, 19, 20]. In several organisms the DNA present is
circular. If both circular and linear DNA strands are present, the mathematical analysis
becomes much more complicated because several different possibilities must be handled. Despite
that, various results concerning circular splicing have been obtained in [25, 21]. In Chapter 5
we describe a special type of circular splicing introduced in [kn:head], which has been used to
solve various combinatorial problems. From a practical point of view this splicing seems
feasible because of the easy availability of the particular type of circular DNA strands and
the simple operations on them. This splicing has been used in [24] to break a public key
cryptosystem.
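The cut-and-ligate idea can be illustrated on linear strings. The report does not spell out the rule format, so the Python sketch below assumes the form commonly used in the splicing literature: a rule is a quadruple (u1, u2, u3, u4), the first string is cut between u1 and u2, the second between u3 and u4, and the prefix of the first is ligated to the suffix of the second. The function name `splice` is hypothetical, and circular strands are not modelled.

```python
def splice(x, y, rule):
    """Apply one splicing rule (u1, u2, u3, u4) to linear strings x and y.

    Wherever x contains u1u2 and y contains u3u4, cut each string between the
    two halves of its site and join the prefix of x to the suffix of y.
    """
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    results = set()
    i = x.find(site_x)
    while i != -1:
        j = y.find(site_y)
        while j != -1:
            results.add(x[:i + len(u1)] + y[j + len(u3):])
            j = y.find(site_y, j + 1)
        i = x.find(site_x, i + 1)
    return results

if __name__ == "__main__":
    rule = ("G", "AT", "G", "AT")  # hypothetical site: cut between G and AT
    print(splice("CCCGATTT", "AAAGATGGG", rule))
    print(splice("AAAGATGGG", "CCCGATTT", rule))
```

Calling `splice` with the arguments swapped gives the second recombinant, so one cut-and-ligate event on two molecules yields two new strings.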

2.3 Algorithmic Self-Assembly


Self-assembly seems to be a powerful tool for the DNA computing paradigm. Apart from
creating the building blocks, self-assembly involves only annealing and ligation for
computation. The building blocks are various complex nanoscopic structures of DNA
molecules. Some of these nanoscopic structures have been studied and created by Seeman et al.
Winfree showed in [29] that self-assembly has the power of universal computation.
In [15] it has been shown that nanoscopic structures can be made from DNA, which in
turn serve as the building blocks of self-assembly. This gives greater flexibility while
retaining the advantage of massive parallelism inherent in DNA computing. Such a nanoscopic
structure can act like a Wang tile [28]. Wang tiles are squares with colored edges. If the
Wang tiles are allowed to cover the plane according to the additional rule that only edges of
the same color can face each other, then they can simulate a Turing machine. Thus they have
the power of universal computation. The DNA Wang tiles are "Double Crossover" or "Triple
Crossover" tiles made up of several interwoven DNA strands that form a square body with
sticky ends coming out of the sides or corners. Such tiles have been used to compute XOR in
[12], and multiplication and circular convolution in [18]. The problems of attacking the
cryptosystem NTRU and creating a DNA one-time pad have been addressed in [18] and [5]
respectively. The stability and error resistance of DNA tiles seem promising for DNA
computing.
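The edge-matching rule, and the way a row of tiles can carry a running XOR, can be sketched in Python. This is a hypothetical encoding chosen for illustration, not the tile set used in [12]: edge colors are the bits 0 and 1, the south edge of a tile reads an input bit, the west edge reads the value carried from the left, and the north and east edges both show the XOR of the two.

```python
from collections import namedtuple

Tile = namedtuple("Tile", "north east south west")

def valid_tiling(grid):
    """Check that every pair of touching edges in a rectangular grid has the same color."""
    for r, row in enumerate(grid):
        for c, tile in enumerate(row):
            if c + 1 < len(row) and tile.east != row[c + 1].west:
                return False
            if r + 1 < len(grid) and tile.south != grid[r + 1][c].north:
                return False
    return True

# One tile per (input bit x, carried bit c); the east edge passes x XOR c onward.
XOR_TILES = [Tile(north=x ^ c, east=x ^ c, south=x, west=c)
             for x in (0, 1) for c in (0, 1)]

def cumulative_xor(bits):
    """Assemble one row of XOR tiles over the input bits; return (row, output bits)."""
    carry, row = 0, []
    for x in bits:
        # Only the tile whose south and west edges match can attach here.
        tile = next(t for t in XOR_TILES if t.south == x and t.west == carry)
        row.append(tile)
        carry = tile.east
    return row, [t.north for t in row]

if __name__ == "__main__":
    row, outputs = cumulative_xor([1, 0, 1, 1])
    print(outputs, valid_tiling([row]))
```

Each position accepts exactly one tile, so the assembled row is determined by the input bits alone. In this toy model that is the sense in which matching sticky ends perform the computation.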

2.4 Forbidding-Enforcing Systems


A variant of forbidding-enforcing systems for graphs, which models DNA self-assembly, was
proposed in [G.F., N.Jonoska, Nanotechnology, Science and Computation, 105-118, 2005].
Self-assembly is a process in which substructures are spontaneously self-ordered into
superstructures. It is driven by the selective affinity of the substructures.
DNA substructures are formed by DNA filaments partially attached to each other by
Watson-Crick complementarity between strands having opposite orientation. DNA self-assembly
leads to the formation of 3D structures, but DNA is used not only as a structural material but
also as ’fuel’ for the construction of DNA machines, walkers, switchers (Nanoday, DNA14), and
DNA transducers. Franco DNA Computation: Results, Trends, and P
6. References

1. Adleman L. M., Molecular computation of solutions to combinatorial problems, 1994.
2. Paun G., Rozenberg G. and Salomaa A., DNA Computing, Springer, 1998