The Growing Convergence of DNA and Information Security

by
Joseph M. Cocchini
Master of Science: Information Security

14 November, 2012

ABSTRACT
The generation and use of sensitive digital data continues to grow at an aggressive rate (Craglia,
et al., 2008). In the face of this mounting pressure, workhorse cryptographic algorithms risk
becoming suspect at best and obsolete at worst. DNA-based and DNA-derived solutions are emerging
as viable platforms on which to regain lost ground and from which to advance the state of the art
(Bardou, et al., 2012). DNA's massively parallel processing capability, storage capacity and
density, computational abilities and natural one-way transcription properties make it a contender
for archival storage, parallel processing and the encryption and decryption of digital data
("The emerging science," 2009). Real-time (lab-less, in situ) reading of DNA remains one of the
largest hurdles to practical implementation, but progress is being made toward that end (Toumazou,
et al., 2012). This paper looks at the history of human cryptography, the characteristics of DNA
that make it suitable for carrying digital data, the mechanics of commoditizing DNA for use as a
digital data storage and encryption medium, practical strides made toward using DNA as a storage
and retrieval environment, and commercial DNA reading technologies.

TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
TABLE OF FIGURES
TABLE OF TABLES
INTRODUCTION
HISTORICAL PERSPECTIVE
DNA AS AN INFORMATION MEDIUM
    Codons
    Longevity of DNA
    DNA Digital Storage Capacity
    DNA Energy Efficiency
    Data Error Rates in DNA
    DNA Microarrays
    Reading and Writing in DNA
    DNA Barcoding of Living Organisms
    Biomolecular or DNA Computing
    Breaking DES Using a Molecular Computer
DNA-BASED CRYPTOGRAPHY
    DNA Encoding
    DNA-Based Steganography and Watermarking
    DNA-Based Data Encryption Using Yet Another Encryption Algorithm (YAEADNA)
        Inputs Pseudo-Code
        Algorithm Pseudo-Code
        Output Pseudo-Code
    DNA/Amino Acid-Based Encryption of the Playfair Cipher
        Inputs Pseudo-Code
        Algorithm Pseudo-Code
        Processing Pseudo-Code
        Output Pseudo-Code
        Experiment
        Opportunities for Additional Security (Sabry, et al., 2010)
    DNA-Based Encryption using the DNA-Crypt Algorithm
    One-Time Pads
    DNA-Based Data Encryption Using Traditional (RSA) Encryption Methodology
        Key Generation Detail
        Data Pretreatment Detail
        Encryption Detail
        Decryption Detail
        Data Post-Treatment Detail
        Security Implications
    DNA Detection and Analysis
        Lab-Free Contact-Based DNA Testing
        Lab-Free Contact-Less (Remote) Molecular Analysis
CONCLUSION
REFERENCES
TABLE OF FIGURES
Figure 1. Example of a Spartan-Greek scytale (Ribeiro, 2012).
Figure 2. Example of a Demaratus-like steganographic tablet. The wax layer is removed and text
    imprinted (and colored for demonstration purposes) on the wood substrate. The wax is then
    replaced (and inked in white for demonstration purposes) (Schovanek, 2010).
Figure 3. Example of a simple character substitution schema (Stallings, 2010).
Figure 4. Example of an Alberti Cipher wheel (Ribeiro, 2012).
Figure 5. Example of a Jefferson wheel (Ribeiro, 2012).
Figure 6. A basic one-time pad encryption/decryption schema (Stallings, 2010).
Figure 7. Example of an Enigma machine (Ribeiro, 2012).
Figure 8. Example of MIT CTSS system (Lelusz, 2009).
Figure 9. DES encryption schema (Smyth, 2007).
Figure 10. DES encryption schema (Stallings, 2010).
Figure 11. Schematic representation of a single DNA nucleotide or monomer.
Figure 12. Organic representation (dyed) of a segment of DNA polynucleotide structure (Van
    Voorst, Finzel, 2012).
Figure 13. Nucleotide sequences dictating specific amino acid outcomes.
Figure 14. Electron photomicrograph of bacterium isolated from ~25 million-year-old Dominican
    amber (Orkand, et al., 1998).
Figure 15. DNA microarray showing magnification of a subsection (DNA Microarray Virtual Lab,
    2012).
Figure 16. Schematic of oligonucleotide probes within a DNA microarray.
Figure 17. DNA microarray droplet depositing from inkjet-like printhead (Lausted, et al., 2004).
Figure 18. DNA microarray printing platform (Lausted, et al., 2004).
Figure 19. DNA information storage decoding/encoding schema (Church, et al., 2012).
Figure 20. Illuma 2000 real time DNA sequencer used to read Church and team's encoded
    microarray (Church, et al., 2012).
Figure 21. Sample DNA sequencer output of nucleotide readings from a single land fragment
    (How do we sequence DNA?, 2012).
Figure 22. Example of a UPC barcode.
Figure 23. Example DNA barcoding of animals and insect species and the subtle variations
    between seemingly identical species (Luoma, 2008).
Figure 24. Graphic schema of a seven-node Hamiltonian Path Problem showing fourteen possible
    routes, with the red line representing the only correct path (Chen, Wood, 2000).
Figure 26. Function schema of steganographic algorithms (Heider, Barnekow, 2007).
Figure 27. Schema of a secret-key stegosystem with passive adversary showing embedded text E,
    covertext C, stegotext S, Alice's private random source R, and secret key K shared by Alice
    and Bob, with Alice sending either covertext C or stegotext S (Cachin, 1998).
Figure 28. A basic steganographic implementation in DNA. The message synthesizing process is
    shown (a). The encoding rule is shown (b). The PCR result is shown (c). The DNA-based
    ciphertext and corresponding plaintext is shown (d) (Cachin, 1998).
Figure 29. Computational schema of one YAEADNA encryption round (Amin, et al., 2006).
Figure 30. Pearson correlation analysis between ciphertext and corresponding plaintext (Amin,
    et al., 2006).
Figure 31. Test image results (not to scale) before and after processing [from left to right] of
    the underlying data to show randomness of octet distribution within a given DNA sequence
    (Amin, et al., 2006).
Figure 32. Flowchart of DNA-based Playfair algorithm (Sabry, et al., 2010).
Figure 33. Sample encryption processing (Sabry, et al., 2010).
Figure 34. Sample decryption processing (Sabry, et al., 2010).
Figure 35. Structure of the encoder and decoder for a Hamming code (Bandakkanavar, 2010).
Figure 36. A basic one-time pad encryption/decryption schema (Stallings, 2010).
Figure 37. A DNA-based one-time pad A, plain text B, cipher text and primer DNA polymerase
    primer (black box), (Wong, et al., 2003).
Figure 38. Example of DNA tiles carrying data representing binary 1 (light blue), binary 0
    (white), start block or attachment point (dark blue fading to lower right) and end marker or
    attachment point (dark blue fading to upper left). The sequences can be ligated (attached by
    an end or ends) to longer DNA strands by using the start and stop points, or sticky ends
    (Leier, et al., 2000).
Figure 39. Assembly of DNA binary strands. Strands are composed of shorter, concatenated tiles
    by overlapping sticky start (s) and end (e) terminators with arbitrary quantities of DNA bits
    in between. Bit strands containing up to 32 bits were yielded in this process (Leier, et al.,
    2000).
Figure 40. Example of gel-electrophoresis (Electrophoresis, 2011).
Figure 41. Self assembly schema of a random DNA tile assembly (Hirabayashi, et al., 2010).
Figure 42. Scaffold construction schema by sample input message: M = 00011011 (Hirabayashi,
    et al., 2010).
Figure 43. Encryption scheme flowchart (Cui, 2008).
Figure 44. Data pre/post treatment schema (Cui, 2008).
Figure 45. Genalysis proprietary microchip for real-time DNA analysis (Toumazou, et al., 2012).
Figure 46. Excitation schema of Genalysis proprietary ISFET mechanism (Toumazou, et al., 2012).
Figure 47. Schema of hydrogen ion release upon extending an existing DNA strand with one or
    more additional nucleotides (Toumazou, et al., 2012).
Figure 48. Genalysis external testing process overview (Toumazou, et al., 2012).
Figure 49. Schematic representation of single molecule experiment to detect malachite green
    adsorbed on a planar metal surface (Neacsu, et al., 2006).
Figure 50. Scanning electron microscopy image of a wrinkled Raman-active gold surface (Zhang,
    et al., 2011).

TABLE OF TABLES
Table 1. Amino acids, corresponding codes and codons that initiate their generation.
Table 2. ASCII table of Western type characters.
Table 3. Comparative sequential numbering represented in Quaternary, Binary and Decimal.
Table 4. Viable lifespans of different types of digital data storage mediums (Conway, 1996).
Table 5. Viable type character densities over a variety of mediums (Conway, 1996).
Table 6. Data bit densities across a variety of mediums (Church, Kosuri, 2012).
Table 7. Raw, uncorrected data bit error rates across a variety of mediums.
Table 8. Associations of DNA nucleotide bases to binary number equivalents.
Table 9. Plain text character frequency in DNA strand (Amin, et al., 2006).
Table 10. DNA representation of bits (Sabry, et al., 2006).
Table 11. Amino acids and their corresponding 64 codons (Sabry, et al., 2006).
Table 12. Distribution of the alphabet with corresponding, interchangeable codons (Sabry, et
    al., 2010).
Table 13. New distribution of the alphabet with corresponding codons, after application of the
    Playfair Cipher (Sabry, et al., 2010).
Table 14. Performance results of experiment processing stages (Sabry, et al., 2010).
Table 15. Associations of DNA nucleotide bases to binary number equivalents.

INTRODUCTION
In the area of information security, the need for reliable protection, storage and consumption of
sensitive data has been codified into the CIA Triad: Confidentiality, Integrity and Availability. In this
context, confidentiality implies two overarching classes of information consumer: those subscribers that
merit access to a given data set within given circumstances, and those that do not. Integrity implies
that a given data set can be relied upon to be free of adulteration. Availability suggests a minimum
standard of secure, reliable functionality in all conduit systems serving a given data set to its vetted
subscribers (Confidentiality, Integrity & Availability, 2009).
Mankind has developed a wide variety of methods over time to enforce and make available the
elements of the Triad. But with each advance comes the opportunity to render that advance null
(Mathai, 2012) (Ribeiro, 2012). As modern workhorse algorithms such as DES and, more recently, MD5
are broken or become suspect, new solutions for information security are being sought to protect
sensitive data (Bardou, et al., 2012). As a result, DNA-based digital data storage, cryptography and
steganography are emerging as technologies with significant potential, both for more capable versions
of existing encryption implementations and for new and potentially unbreakable algorithms (Cui, et al.,
2009).

HISTORICAL PERSPECTIVE
Modern data protection has its roots in all manner of mechanical, arithmetical and
mathematical constructs (Ribeiro, 2012).
In ~700BC, the Spartans and Greeks used scytales to send sensitive battle-related military
messages back and forth. Both sender and receiver possessed identical, hexagonal sticks, around which
a strip of leather or parchment was wrapped in a spiral. The sender would write out a message on the
spiral-wound material, unwind it and send it along to the intended receiver, who upon receipt would
wrap the material around a stick of identical proportion in order to decode the message (Ribeiro, 2012).

Figure 1. Example of a Spartan-Greek scytale (Ribeiro, 2012).
Herodotus refers to an example of steganography in his Histories of ~440BC, in which
Demaratus sent a warning to Greece about a forthcoming attack by writing it directly on the
wooden backing of a wax tablet before covering the backing over with the usual beeswax writing
surface (Arnold, 2000).
Figure 2. Example of a Demaratus-like steganographic tablet. The wax layer is removed and text imprinted (and
colored for demonstration purposes) on the wood substrate. The wax is then replaced (and inked in white for
demonstration purposes) (Schovanek, 2010).
Julius Caesar employed a simple substitution cipher in the first century BC to secure military
communications. This relatively weak encryption method shifts each plaintext character a fixed
number of positions along the alphabet, using the same offset for every character (Mathai, 2012).
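The shift described above can be illustrated with a minimal Python sketch. The function name caesar_shift, the sample message and the offset of 3 are illustrative choices made here, not details taken from the cited sources.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A-Z


def caesar_shift(text: str, shift: int) -> str:
    """Shift every letter by the same fixed offset; leave other characters alone."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            # Wrap around the end of the alphabet with modulo 26.
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


ciphertext = caesar_shift("ATTACK AT DAWN", 3)
plaintext = caesar_shift(ciphertext, -3)  # decrypting is shifting back
print(ciphertext)
print(plaintext)
```

Because only 25 non-trivial offsets exist, an attacker can simply try them all, which is why the method is considered weak.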

Figure 3. Example of a simple character substitution schema (Stallings, 2010).
In 1467, Leon Battista Alberti developed what appears to have been the first polyalphabetic
substitution cipher. The underlying mechanism consisted of two concentric metal discs of differing
diameters, allowing for the alignment of characters in the plaintext and ciphertext (Ribeiro, 2012).

Figure 4. Example of an Alberti Cipher wheel (Ribeiro, 2012).
Thomas Jefferson invented what is known as the Jefferson Wheel in 1797. The wheel
consisted of 36 discs on a shared spindle, each with the letters of the alphabet embossed on its rim.
Turning the discs would scramble and unscramble the plaintext message (Ribeiro, 2012).

Figure 5. Example of a Jefferson wheel (Ribeiro, 2012).
Auguste Kerckhoffs was a military cryptologist who, in the late 19th century, proposed what was
then a sea change in thinking about encryption practice. Kerckhoffs suggested that
an encryption algorithm should be assumed to be known, and that the key alone should be assumed to
be secret. In this way, if a key is compromised, it can be changed, reestablishing secure communications
without the need to abandon the encryption method itself (Mathai, 2012).

Figure 6. A basic one-time pad encryption/decryption schema (Stallings, 2010).
The One Time Pad (OTP) is a technique that offers 'perfect secrecy' by using a truly random key,
as long as the plaintext in question, only once per communication. While the initial
theory has been attributed to a paper written by Frank Miller in 1882, the practical implementation of
the OTP is attributed to U.S. Army Signal Corps Commander Joseph Mauborgne and Bell Labs' Gilbert
Vernam some thirty-five years later. This approach offers perfect secrecy because even if a single key
is compromised, it reveals nothing about future or past transmissions. The strength of the
technique lies in the randomness and one-time use of the keys (Stallings, 2010).
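A minimal Python sketch of the one-time pad follows, assuming a bytewise XOR as the combining operation. The name otp_xor and the sample message are illustrative. Note that secrets.token_bytes draws from the operating system's cryptographic random source, which stands in here for the truly random key the technique calls for.

```python
import secrets


def otp_xor(data: bytes, key: bytes) -> bytes:
    """Combine data with a key of identical length using bytewise XOR."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"MEET AT NOON"
key = secrets.token_bytes(len(message))  # fresh key, used for this message only
ciphertext = otp_xor(message, key)
recovered = otp_xor(ciphertext, key)     # XOR with the same key undoes itself
print(recovered)
```

The same function serves for encryption and decryption. Reusing a key for a second message would forfeit the secrecy property.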

Figure 7. Example of an Enigma machine (Ribeiro, 2012).
Invented by Arthur Scherbius ~1918, Enigma was Germany's main cryptographic technology
during WW II. It consisted of a basic keyboard, a display that would reveal the cipher text letter and a
scrambling mechanism such that each plain text letter entered as input via the keyboard was
transcribed to its corresponding cipher text letter (Mathai, 2012).

Figure 8. Example of MIT CTSS system (Lelusz, 2009).
MIT's Compatible Time-Sharing System (CTSS) was the first known system to employ a formal
username/password combination to control access. In addition, CTSS may have been the first system to
be compromised by a password breach. In 1966, a software bug resulted in the exposure of the CTSS
master password table (Ribeiro, 2012).

Figure 9. DES encryption schema (Smyth, 2007).
The National Bureau of Standards (NBS) published the Data Encryption Standard (DES) in 1977,
using what was then a state-of-the-art 56-bit key. Not even supercomputers of the time could
crack DES, which remained a standard for roughly twenty years until it was broken in ~fifty-six hours in
1998 by the Electronic Frontier Foundation. Cable television content providers HBO, Cinemax and
others introduced the Videocipher II in 1985, a video scrambling system based on DES
(Ribeiro, 2012).

Figure 10. DES encryption schema (Stallings, 2010).
The National Institute of Standards and Technology (NIST) published the Advanced Encryption
Standard (AES) in 2001. AES uses keys of at least 128 bits, and it is estimated that cracking it would
require 2^55 (more than 36 quadrillion) years using non-quantum computing resources. AES is
actively in use today (Stallings, 2010).
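The scale of these numbers can be checked with a few lines of Python arithmetic. The quoted estimate corresponds to 2^55 years, which is indeed a little over 36 quadrillion. The variable names below are illustrative, and the comparison covers only the number of possible keys, not any particular attack.

```python
DES_KEY_BITS = 56   # DES key length given in the text
AES_KEY_BITS = 128  # shortest AES key length

des_keys = 2 ** DES_KEY_BITS  # number of possible DES keys
aes_keys = 2 ** AES_KEY_BITS  # number of possible AES-128 keys
years_quoted = 2 ** 55        # the cracking estimate, in years

print(f"DES keyspace:     {des_keys:,}")
print(f"AES-128 keyspace: {aes_keys:,}")
print(f"2^55 years:       {years_quoted:,}")

# Every extra key bit doubles the search space, so AES-128 has
# 2^(128-56) = 2^72 times as many keys as DES.
ratio = aes_keys // des_keys
print(f"AES-128 / DES keyspace ratio: 2^{ratio.bit_length() - 1}")
```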
DNA AS AN INFORMATION MEDIUM

First observed by Swiss biochemist Friedrich Miescher in the late 1800s, DNA, or
deoxyribonucleic acid, effectively represents a data message medium in which an enormous but finite
set of conditions and characteristics are contained. That medium is composed of molecules
(nucleotides, or monomers) that organize into chains composed of phosphates, sugars, and various
nitrogen-based compounds. The lowest common denominators of coding, or bases, within a nucleotide
grouping are Adenine (A), Cytosine (C), Thymine (T) and Guanine (G). These four codes
form the basis for a quaternary, or base-4, numbering system, the natural grouping of which is the codon:
a form of minimal codeword formed from triplets of bases, from which more complex compounds and
instruction sets are derived (DNA Factsheet, 2012).
Note that every DNA strand has two distinct termination points: a free 5' phosphate (PO4) group
at one end and a free 3' hydroxyl (OH) group at the other, referred to as the 5' and 3' ends,
respectively. These ends provide a natural orientation for individual standalone nucleotides, as well
as functioning as joining points when nucleotides are combined in groups of two or more (Boneh, et al.,
1995).

Figure 11. Schematic representation of a single DNA nucleotide or monomer.
DNA nucleotides do not generally exist in nature as freestanding molecules, but more commonly
pair off, forming a twisting or helix structure as they do so. DNA monomers pair off by virtue of mutual,
complementary attraction (hydrogen bonding) between connective points to form polymers, or
polynucleotides. It is this cooperative chaining (3' is attracted to 5', and 5' is attracted to 3') of
nucleotides that allows for the representation of a given data set. It is also the mechanism whereby the
sugar and phosphate groups combine to produce what appear to be the rails of a twisting ladder or
double helix, with the base compounds organizing in such a manner as to resemble the rungs of that
ladder (Boneh, et al., 1995).

Figure 12. Organic representation (dyed) of a segment of DNA polynucleotide structure (Van Voorst, Finzel,
2012).
Codons
Abstractly speaking, a strand of DNA is roughly equivalent to a tape measure onto which
groupings of symbols (bases) have been printed in a repetitive pattern a great many times. Through a
lengthy and complex process, these bases are eventually translated into chains of amino acids
(Shimanovsky, et al., 2003).

Amino acid name                Code   Nucleotide codons
Alanine                        A      GCT GCC GCA GCG
Arginine                       R      CGT CGC CGA CGG AGA AGG
Asparagine                     N      AAT AAC
Aspartic acid                  D      GAT GAC
Cysteine                       C      TGT TGC
Glutamine                      Q      CAA CAG
Glutamic acid                  E      GAA GAG
Glycine                        G      GGT GGC GGA GGG
Histidine                      H      CAT CAC
Isoleucine                     I      ATT ATC ATA
Leucine                        L      TTA TTG CTT CTC CTA CTG
Lysine                         K      AAA AAG
Methionine                     M      ATG
Phenylalanine                  F      TTT TTC
Proline                        P      CCT CCC CCA CCG
Serine                         S      TCT TCC TCA TCG AGT AGC
Threonine                      T      ACT ACC ACA ACG
Tryptophan                     W      TGG
Tyrosine                       Y      TAT TAC
Valine                         V      GTT GTC GTA GTG
Asparagine or aspartic acid    B      Random codon from D and N
Glutamine or glutamic acid     Z      Random codon from E and Q
Unknown amino acid             X      Random codon
Translation stop               *      TAA TAG TGA
Table 1. Amino acids, corresponding codes and codons that initiate their generation.
In this scenario, each subgroup of triplet bases (codons) is treated as being representative of a
character in a finite alphabet. This is not unlike the use of groups of bits of computer data to represent
the characters of the Western alphabet. In fact, the amino acid code and codon table bears a strong
visual and functional resemblance to the ASCII table of Western type characters (Shimanovsky, et al.,
2003).
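The analogy between codons and character codes can be made concrete with a short Python sketch. It builds a lookup from the standard genetic code (with asparagine taken as AAT and AAC) and reads a DNA string three bases at a time. The names CODONS_BY_AMINO_ACID, CODON_TO_AMINO_ACID and translate, and the sample sequence, are illustrative and do not come from the cited sources.

```python
# One-letter amino acid code -> codons that produce it (standard genetic code).
CODONS_BY_AMINO_ACID = {
    "A": "GCT GCC GCA GCG",
    "R": "CGT CGC CGA CGG AGA AGG",
    "N": "AAT AAC",
    "D": "GAT GAC",
    "C": "TGT TGC",
    "Q": "CAA CAG",
    "E": "GAA GAG",
    "G": "GGT GGC GGA GGG",
    "H": "CAT CAC",
    "I": "ATT ATC ATA",
    "L": "TTA TTG CTT CTC CTA CTG",
    "K": "AAA AAG",
    "M": "ATG",
    "F": "TTT TTC",
    "P": "CCT CCC CCA CCG",
    "S": "TCT TCC TCA TCG AGT AGC",
    "T": "ACT ACC ACA ACG",
    "W": "TGG",
    "Y": "TAT TAC",
    "V": "GTT GTC GTA GTG",
    "*": "TAA TAG TGA",  # translation stop
}

# Invert the table so each of the 64 codons maps to exactly one letter.
CODON_TO_AMINO_ACID = {
    codon: amino
    for amino, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}


def translate(dna: str) -> str:
    """Read a DNA string in triplets and return the amino acid letters."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TO_AMINO_ACID[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGGCTTGGTAA"))
```

Unlike ASCII, the mapping is many-to-one: several codons yield the same amino acid, so translation cannot be reversed without extra information.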

Character   Decimal number   Binary number
A           65               0100 0001
B           66               0100 0010
C           67               0100 0011
D           68               0100 0100
E           69               0100 0101
F           70               0100 0110
G           71               0100 0111
H           72               0100 1000
I           73               0100 1001
J           74               0100 1010
K           75               0100 1011
L           76               0100 1100
M           77               0100 1101
N           78               0100 1110
O           79               0100 1111
P           80               0101 0000
Q           81               0101 0001
R           82               0101 0010
S           83               0101 0011
T           84               0101 0100
U           85               0101 0101
V           86               0101 0110
W           87               0101 0111
X           88               0101 1000
Y           89               0101 1001
Z           90               0101 1010
Table 2. ASCII table of Western type characters.
DNA is, in essence, a natural platform for the encoding of data, with the lowest coding
denominator being the codon: a form of minimal codeword from which more complex instruction
sets are derived. Codons are the fundamental unit of code storage within DNA, composed of
discrete triplets of nucleotides that represent data values based on the sequence within each
triplet (Shimanovsky, et al., 2003).

Figure 13. Nucleotide sequences dictating specific amino acid outcomes.
Adenine (A), Cytosine (C), Thymine (T) and Guanine (G) form the basis for a quaternary or Base4
numbering system.
Decimal   Quaternary   Binary
0         0            0
1         1            1
2         2            10
3         3            11
4         10           100
5         11           101
6         12           110
7         13           111
8         20           1000
9         21           1001
10        22           1010
11        23           1011
12        30           1100
13        31           1101
14        32           1110
15        33           1111
16        100          10000
17        101          10001
18        102          10010
19        103          10011
20        110          10100
21        111          10101
22        112          10110
Table 3. Comparative sequential numbering represented in Quaternary, Binary and Decimal.
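The relationship between the three number systems, and the way two binary digits map onto one base, can be sketched in Python as follows. The digit-to-base assignment used here (A=0, C=1, G=2, T=3) is an assumption made purely for illustration, and the helper names to_base and byte_to_bases are hypothetical.

```python
BASES = "ACGT"  # assumed assignment: A=0, C=1, G=2, T=3


def to_base(n: int, base: int) -> str:
    """Return the digits of a non-negative integer in the given base (2 to 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits))


def byte_to_bases(value: int) -> str:
    """Encode one byte as four bases, two bits per base, high bits first."""
    return "".join(BASES[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


for n in (4, 15, 22):
    print(n, to_base(n, 4), to_base(n, 2))

# ASCII 'A' is decimal 65, binary 0100 0001, i.e. quaternary digits 1 0 0 1.
print(byte_to_bases(ord("A")))
```

Each quaternary digit carries exactly two bits, which is why one byte fits in four bases.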
Longevity of DNA
Data at rest remains accessible [and therefore of value] to the extent that the storage medium
in which it is contained remains both viable and accessible. The ideal in terms of professional archiving
is for a given storage medium to remain viable and accessible ad infinitum (Conway, 1996).
DNA has stood the test of time, in that it has maintained the same underlying molecular
construct during the billions of years since life emerged (Bancroft, et al., 2001).

Medium          Demonstrated years of viability
Clay Tablet     10000
Papyrus         5000
Printed Book    100
Newspaper       90
Magnetic Disk   15
Optical         5-10
DNA             >25 million
Table 4. Viable lifespans of different types of digital data storage mediums (Conway, 1996).
Stored under favorable conditions, DNA can exhibit an effectively limitless shelf life. In 1995,
microbiologist Raul Cano announced the retrieval of two different viable bacterial spores (that is,
spores able to be made to reproduce) from a piece of Dominican amber, the origins of which are
thought to be 25 to 40 million years old (Lambert, et al., 1998).

Figure 14. Electron photomicrograph of bacterium isolated from ~25 million-year-old Dominican amber (Orkand,
et al., 1998)
DNA Digital Storage Capacity

Within the context of secure storage of digital data, DNA represents an appealing medium due
to the very large amount of data (capacity) that can be stored when compared to other data storage
mediums, combined with DNA's effectively limitless theoretical longevity (Lambert, et al., 1998).
Encryption of data at rest is a significant concern in today's information security landscape (Securing
Data, 2003). In that context, DNA offers significant potential for the secure encryption of data stored to
it (Gehani, LaBean, 2000).

Medium          Characters per cubic inch
Clay Tablet     ~102*
Papyrus         ~11,766*
Printed Book    ~55,600*
Newspaper       ~136,000*
Magnetic Disk   1 trillion
Optical         2,150,000,000*
DNA             >227 x 10^15**
Table 5. Viable type character densities over a variety of mediums (Conway, 1996).
* Extrapolated from Conway's figures for characters per square inch using the average caliper-measured
thickness of the medium in question.
** Not part of Conway's figures.
The data densities that are currently possible with DNA vastly exceed comparable volumetric
storage capacities of current electronic, magnetic, optical and experimental media, making it a
potentially attractive storage medium for encrypted archival data (Church, Kosuri, 2012).
Medium                             Date   Bit density (bits per mm^3)
Compact Disk (CD)                  1982   413,000
DVD-QL                             2000   10,100,000
Blu-Ray (QL)                       2010   75,200,000
Oracle StorageTek Magnetic Tape    2010   559,000,000
Hard Disk                          2012   3,100,000,000
12-atom memory                     2012   1,110,000,000,000
Xe positioning                     1991   10,000,000,000,000
Quantum Holography                 2008   13,800,000,000,000
Super-resolution GFP               2011   40,000,000,000
DNA in E. coli                     2005   1,440,000,000,000
Mycoplasma                         2010   88,000,000,000,000
Single Strand DNA (ssDNA)          2012   5,490,000,000,000,000
Table 6. Data bit densities across a variety of mediums (Church, Kosuri, 2012).
Based on a demonstrable data density approaching 5.5 million billion bits (petabits) per cubic
millimeter, all the data in the world could theoretically be stored using several grams of DNA (Church,
Kosuri, 2012).

DNA Energy Efficiency

The growing demand for information processing and storage has led to an overwhelming call for
information systems that consume far less energy than current systems offer. As the need for digital
processing and information storage grows, pressure to develop more energy efficient systems will
increase (Harizopoulos, et al., 2009).
A team led by doctoral student Yaakov Benenson of the Weizmann Institute of Science in
Rehovot determined that a single DNA molecule can yield all the energy needed to perform a
computation (Benenson, et al., 2004). The form factor in question is so small that a tiny droplet could
hold up to three trillion of these DNA computers, performing 66 billion operations a second (Beale,
2003).
Data Error Rates in DNA

A digital storage medium offers little practical value if it cannot be made relatively resistant to
random error (Kamali, 1995). Random mutations occur in most types of DNA replication, with recent
studies suggesting a single-base substitution error rate in bacterial DNA in vivo in the range of 10^-8 to
10^-7 per replication (Balado, 2010).
Medium                        Raw error rate
Optical Compact Disc          10^-5 to 10^-6
In vitro (microarray) DNA     1.8x10^-6
In vivo (live) cell DNA       10^-7 to 10^-10
Magnetic Disk Subsystem       ~10^-14
Table 7. Raw, uncorrected data bit error rates across a variety of mediums.
Uncorrected data error rates in DNA are also favorable when compared to current data storage
technologies. Because DNA as a storage medium permits each segment of information to be stored in
an enormous number of identical molecules, the resulting redundancy tends to mitigate data losses that
could foreseeably take place due to progressive, random decay (Bancroft, et al., 2001). In the case of
the 2012 Church-Kosuri experiment in which an entire book, including images and formatting, was
written to DNA, the observable error rate was 10 bits per 5.7 million bits encoded, a rate of
approximately 1.8x10^-6, with the bulk of those errors occurring in areas not containing payload
information (splice areas), making the effective error rate lower still (Church, Kosuri, 2012).
By comparison, the raw bit error rate on a typical optical compact disc, which stores data at a
significantly lower bit density, is between 10^-5 and 10^-6 before electronic correction. That is to say, the
demonstrated, uncorrected (raw) bit error rate in the Church book experiment is nearly two times lower
than the raw bit error rate of a typical optical compact disc before electronic correction (National Chung
Hsing University, 2012).
DNA Microarrays
In the context of digital data storage and retrieval, DNA digital data microarrays are composed
of typically planar (flat) substrates onto which an extremely fine grid of DNA material is deposited in the
form of dots, accompanied by chemistry required to enable the process (Chan, et al., 2009).

Figure 15. DNA microarray showing magnification of a subsection (DNA Microarray Virtual Lab, 2012).
A microarray substrate may be composed of glass, silicon chips or nylon membranes. The DNA
is then printed, spotted, or synthesized directly onto the substrate in the form of DNA probes. In this
context, a DNA probe is a relatively small (stretches of 50-100 nucleotides), single-stranded molecule of
DNA bases (Microarrays Factsheet, 2007).

Figure 16. Schematic of oligonucleotide probes within a DNA microarray.
Reading and Writing in DNA

The most productive proof of concept in the area of storing digital data in DNA to date took
place earlier in 2012 (Church, Kosuri, 2012). George Church is the Robert Winthrop Professor of
Genetics at Harvard Medical School and a founding core faculty member of the Wyss Institute for
Biomedical Engineering at Harvard University. Church and his team encoded seventy billion copies of
the book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, into Single Strand DNA
(ssDNA) (Leo, 2012). Church and his team were able to preserve the text, images and formatting of the
book within this effort.
The oligonucleotide library (storage medium) used in the Church-Kosuri experiment was
synthesized using ink-jet printed, high-fidelity DNA microchips or microarrays, effectively printing
oligonucleotide sequences onto a glass substrate using well understood ink-jet printer technologies
(Lausted et al., 2004).

Figure 17. DNA microarray droplet depositing from inkjet-like printhead (Lausted, et al., 2004).
The printing system is enhanced by control systems that ensure the intended series of
nucleotide dots are deposited on the glass substrate, thereby limiting catastrophic data storage media
errors. Using this printing process, researchers have been able to synthesize (print) oligonucleotide
probes (stretches of 50-100 nucleotides) in a manner analogous to using an inkjet printer to print
words on a page of paper.

Figure 18. DNA microarray printing platform (Lausted, et al., 2004).
Church and his team converted an HTML-coded draft of a book that included 53,426 words, 11
JPG images and 1 JavaScript program into a 5.27 megabit bit stream. That bit stream was then encoded
onto 54,898 oligonucleotides (oligos) each of which in turn encoded a 96-bit data block, a 19-bit address
specifying the location of the data block in the bit stream, and flanking common sequences (start and
stop blocks) for amplification and sequencing.
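The block layout described above can be checked with a little arithmetic. The following Python sketch is illustrative only: the 96-bit block size, the 19-bit address and the oligo count come from the text, while the helper names (split_into_blocks, reassemble) and the zero-padding of the final block are assumptions, and the mapping of bits onto nucleotides is deliberately left out.

```python
# Sketch only: sizes are taken from the text; helper names are hypothetical.
DATA_BITS = 96      # payload bits carried by each oligo
ADDRESS_BITS = 19   # address bits carried by each oligo


def split_into_blocks(bitstream: str) -> list[tuple[str, str]]:
    """Split a string of '0'/'1' characters into (address, data) pairs."""
    blocks = []
    for offset in range(0, len(bitstream), DATA_BITS):
        # Assumption: the final block is zero-padded to a full 96 bits.
        data = bitstream[offset:offset + DATA_BITS].ljust(DATA_BITS, "0")
        address = format(offset // DATA_BITS, f"0{ADDRESS_BITS}b")
        blocks.append((address, data))
    return blocks


def reassemble(blocks: list[tuple[str, str]]) -> str:
    """Rebuild the bit stream from blocks received in any order."""
    # Fixed-width binary addresses sort correctly as plain strings.
    return "".join(data for _, data in sorted(blocks))


oligo_count = 54_898
total_bits = oligo_count * DATA_BITS   # payload capacity across all oligos
addressable = 2 ** ADDRESS_BITS        # distinct blocks a 19-bit address can label
print(total_bits, addressable)
```

The product of 54,898 oligos and 96 bits is consistent with the 5.27 megabit figure, and a 19-bit address leaves ample room to label every block, which is what allows the blocks to be sequenced in any order and sorted afterward.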

Figure 19. DNA information storage decoding/encoding schema (Church, et al., 2012).
To read the encoded book, Church's team applied limited-cycle Polymerase Chain Reaction
(PCR) amplification to the oligos in order to produce a pool of DNA replications large enough to draw
from. Developed by Kary Mullis in the early 1980s, PCR is a biochemical process whereby one or more
strands of DNA are amplified by several orders of magnitude, producing thousands to millions of copies of
a particular DNA sequence (Bartlett, Stirling, 2003).
The results were then sequenced, decoded and read back on an Illumina HiSeq DNA
sequencer.

Figure 20. Illumina HiSeq 2000 real time DNA sequencer used to read Church and team's encoded microarray (Church, et
al., 2012).
In this application, the sequencer plots nucleotide sequences detected in one 'lane' of a sample.
The sequencer computer interprets and prints the nucleotide sequence across the top of the plot (How
do we Sequence DNA?, 2012).
Figure 21. Sample DNA sequencer output of nucleotide readings from a single lane fragment (How do we
sequence DNA?, 2012).
Once the sequence readings were retrieved and decoded, Church and his team recovered all
data blocks, with a total of 10 bit errors out of 5.27 million, which were predominantly located within
non-information bearing areas of the sequences (Church, Kosuri, 2012).
The bit density of Church's effort approaches 5.5 petabits, or 5.5 million billion bits, per cubic
millimeter. This level of data concentration within the given physical footprint effectively eliminates
near-term concerns as to the amount of data and information that can be coded into a DNA construct.
DNA Barcoding of Living Organisms

The phrase "DNA barcode" is effectively a slang term referring to the identification of short-
strand DNA sequences using a standardized region of a genome for species identification and discovery
(Hollingsworth, et al., 2009).
DNA barcodes offer non-experts the ability to objectively identify species even from small
quantities of damaged or industrially processed material. In the same way that a unique pattern of bars
in a Universal Product Code (UPC) identifies a consumer product, a DNA barcode is defined as a short,
uniquely patterned DNA sequence (oligonucleotide) of approximately 700 nucleotides in length used to
unequivocally identify an organism by species, genus, family, order, etc. (Hollingsworth, et al., 2009).

Figure 22. Example of a UPC barcode.
These barcodes can then be quickly processed from thousands of specimens and
unambiguously analyzed. In relatively prolific use since 2000, DNA barcoding offers a level of
granularity such that what was once thought to be one species of butterfly has been shown to be
ten species (Hollingsworth, et al., 2009).

Figure 23. Example DNA barcoding of animals and insect species and the subtle variations between seemingly
identical species (Luoma, 2008).
The International Barcode of Life (iBOL) organizes collaborators from more than 150 countries to
participate in a variety of campaigns to census diversity among plant and animal groups. The 10-year
Census of Marine Life, completed in 2010, provided the first comprehensive list of more than 190,000
marine species and identified 6,000 potentially new species (Hollingsworth, et al., 2009).
DNA barcoding relies on short, highly variable regions of the mitochondrial and chloroplast
genomes; the complete sets of genetic material present in living cells. With thousands of copies per cell,
mitochondrial and chloroplast sequences are readily amplified by polymerase chain reaction (PCR), even
from very small or degraded specimens (Hollingsworth, et al., 2009).
A sample of tissue is collected, preserving the specimen whenever possible and noting its
geographical location and local environment. A small leaf disc, a whole insect, or a sample of muscle is
a suitable source. DNA is extracted from the tissue sample, and the barcode portion of the DNA is
amplified by PCR. The sequencing results are then used to search a DNA database (Hollingsworth, et al.,
2009).
Biomolecular or DNA Computing

In an environment without the ability to perform computing operations, modern digital data
encryption cannot exist. The most commonly recognized of the early DNA computing experiments
were performed in 1994 by Len Adleman, who used DNA to compute an accurate solution to the
Hamiltonian Path Problem (Adleman, 1994). Adleman is also a co-developer of the
Rivest-Shamir-Adleman (RSA) encryption protocol.
A Hamiltonian path is a path in an undirected graph that visits each identified point exactly once
(Trummel, Weisinger, 1986).
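To make the problem concrete, the following Python sketch searches a small graph for Hamiltonian paths by generating every candidate ordering of the vertices and discarding those that use a missing edge. This generate-and-filter pattern is offered as a loose software analogy to a DNA computation, in which vast numbers of candidate strands form in parallel and are then filtered; it is not a model of the laboratory procedure. The four-node graph, its directed edges and the function name hamiltonian_paths are all hypothetical.

```python
from itertools import permutations


def hamiltonian_paths(nodes, edges, start, end):
    """Return every path from start to end that visits each node exactly once."""
    edge_set = set(edges)
    inner = [n for n in nodes if n not in (start, end)]
    found = []
    for middle in permutations(inner):      # generate each candidate ordering
        path = (start, *middle, end)
        # Filter: keep the candidate only if every consecutive pair is an edge.
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            found.append(path)
    return found


# Hypothetical example graph (edges are treated as directed pairs).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
paths = hamiltonian_paths(nodes, edges, start=0, end=3)
print(paths)
```

The number of candidate orderings grows factorially with the number of nodes, which is why a medium that can test enormous numbers of candidates simultaneously is attractive for this class of problem.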

Figure 24. Graphic schema of a seven-node Hamiltonian Path Problem showing fourteen possible routes, with
the red line representing the only correct path (Chen, Wood, 2000).
Whereas modern day, electrically supplied computers produce electrical impulses to represent
bits of information, the DNA computer examines the patterns of combination or growth of nucleotide
molecules or strings (probes). DNA computers use the bases A, C, G and T as their memory units, along
with recombinant DNA techniques to carry out the fundamental operations. A program on a DNA
computer is executed as a series of biochemical operations which have the effect of synthesizing,
extracting, modifying and cloning the DNA strands in question (Boneh, et al., 1995).
Breaking DES Using a Molecular Computer
By observing and building on Adleman's Hamiltonian Path work, these DNA computing
principles were applied to the task of breaking the Data Encryption Standard (DES) on a theoretical basis
using a molecular (DNA-based) computer as early as 1995. Based on the availability of a plaintext-
ciphertext pair, DES was predicted to be recoverable within four months from experiment beginning to
end. Additionally, the parallel processing characteristics of the DNA computer model made it likely that
even with only a pool of plaintext candidates to choose from (with no 1:1 match), it would still be
possible to recover the DES key in question in the same period of time (Boneh, et al., 1995).

Figure 25. DES circuit schema (Boneh, et al., 1995).

DNA-BASED CRYPTOGRAPHY
Through the use of DNA computing, the DES cryptographic protocol has been shown, in theory, to be
capable of being broken (Boneh, et al., 1995). As more modern cryptographic algorithms are broken,
new directions are being sought to provide needed protection for sensitive data. Whereas DNA-based
computing has been used to solve traditional mathematical problems, DNA-based cryptography
addresses the issues of using a biological system as a practical support for any given cryptographic
system. Developing DNA computing into a viable cryptography and steganography platform offers the
potential for a new generation of powerful or even unbreakable, algorithms (Sabry, et al., 2010).
As of this writing, research in DNA cryptography remains represented more by theory than
practical application, constrained by high tech lab requirements and labor-intensive procedures. These
factors currently prevent DNA-based computing from entering the mainstream of the current
information security environment, but are dissipating as limiting factors (Sabry, et al., 2010).
DNA Encoding
DNA-encoded digital information can be copied such that the possibility of successful theft of
intellectual property stored in this manner is relatively low (Shimanovsky, et al., 2003).
In its most simplistic form, adding or hiding digital data in a DNA sequence requires only a flat
encoding of 2 bits per nucleotide.
Nucleotide Base    Base2 Numbering
Adenine (A)        00
Thymine (T)        01
Cytosine (C)       10
Guanine (G)        11
Table 8. Associations of DNA nucleotide bases to binary number equivalents.
Using this reference system, binary segments can be added to DNA for purposes of hiding data,
annotating a given DNA sequence, watermarking, and so on. The application of data compression, data
encryption and checksum verification on such data would work in the same fundamental manner in
which compression, encryption and verification work in an electronically-based digital data storage and
conveyance system (Shimanovsky, et al., 2003).
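The flat 2-bits-per-nucleotide encoding can be sketched in a few lines of Python. The mapping below follows Table 8 (A=00, T=01, C=10, G=11); the function names bytes_to_dna and dna_to_bytes are hypothetical, and the sketch ignores biological constraints that a laboratory encoding would need to respect.

```python
# Mapping taken from Table 8; everything else is an illustrative assumption.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna(b"Hi")
print(strand)                  # TACATCCT
print(dna_to_bytes(strand))    # b'Hi'
```

Because the strand is simply a re-expression of the underlying bits, any compression, encryption or checksum applied to the bytes before encoding carries over unchanged.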
DNA-Based Steganography and Watermarking

Steganography represents a form of data hiding and encryption in which a specific message is
made difficult to discern by virtue of the noise present in its medium. Steganography allows secret
information to be embedded into a cover message without significantly damaging the content of the
cover message. The secret information to be embedded is called the stego message (Cachin, 1998).

Figure 26. Function schema of steganographic algorithms (Heider, Barnekow, 2007).
The first published research on hiding an artificial message in living, in situ DNA was authored by
Taylor et al. in 1999 (Jiao, Goutte, 2009). This research involved the storing of a message in a sample of
human DNA. The same researchers subsequently published materials in 2001 describing the potential
for long-term storage of artificial information in DNA using codon encoding to represent the Western
alphabet (Shimanovsky, et al., 2003).
Research was published in 2009 identifying living, in situ DNA as a viable medium for long-term
and ultra compact information storage and a steganographic message medium for hidden messages. In
this model, message-encoded artificial DNA is added to the genome of one or more living organisms,
such as common bacteria. This approach yields a medium for the storage and conveyance of very high
densities of information. These information sets can take the form of digital watermarks, secure public
keys for decrypting hidden information in steganocryptography, and so forth (Shimanovsky, et al., 2003).

Along with the camouflaging effect afforded by the cover medium, information encoding can be
used to secure the message in a further effort to conceal its contents from unauthorized access
(watermarking). Encoding and decoding of that information are generally based on Kerckhoffs's
Principle, which states that the security of a cryptosystem should depend solely on the secrecy of the
respective key and the respective private randomizer (Cayre, Bas, 2008).

Figure 27. Schema of a secret-key stegosystem with passive adversary showing embedded text E, covertext C,
stegotext S, Alice's private random source R, and secret key K shared by Alice and Bob, with Alice sending either
covertext C or stegotext S (Cachin, 1998).
The mechanisms behind DNA-based steganographic watermarking are (a) using traditional
key-based encryption techniques on the target information and (b) concealing and encrypting the target
information in large numbers of irrelevant DNA sequence chains. This approach makes it extraordinarily
difficult to ascertain the correct beginning and end points of target information, thereby making it
difficult (in combination with encryption) to decode as a result. Only a recipient with advance
knowledge of the correct DNA indexing and decryption information would be able to reliably find the
correct DNA fragment, thereby decoding the message (Cui, 2009).

Figure 28. A basic steganographic implementation in DNA. The message synthesizing process is shown (a). The
encoding rule is shown (b). The PCR result is shown (c). The DNA-based ciphertext and corresponding plaintext
is shown (d) (Cachin, 1998).
DNA-Based Data Encryption Using Yet Another Encryption Algorithm (YAEADNA)

In this data encryption model, a symmetric DNA-based cipher approach is used to produce a
plaintext-ciphertext pairing. The algorithm uses a search technique to locate and return the position of a
quadruple DNA sequence representing a series of binary octets that, in turn, represent plaintext
characters. This method is scalable in order to produce very large ciphertext outputs (Amin, et al., 2006).

Figure 29. Computational schema of one YAEADNA encryption round (Amin, et al., 2006).
Inputs Pseudo-Code
The inputs for this model are:
A (plaintext character)
M (random binary file)
RND (G)
DNA nucleotide sequence
Algorithm Pseudo-Code
1. In a plain text file, each character is sequentially replaced by its corresponding ASCII
code:
A -> ASCII[A]
2. The ASCII code is replaced with a corresponding DNA sequence:
ASCII[A] -> DNA sequence
3. Starting from a random location in a binary file contained within a single strand DNA
sequence, a search is performed for a quadruple DNA sequence representing a plain
text character. This sequence has the same nucleotide sequence as the ASCII code of
the respective plain-text character.
4. A sequential search is performed starting from a random location X:
X = RND[G]
5. If the desired pattern is found, its location is then written to a pointer locations (PTR)
output file:
Start sequential search for sequence SEQ from location X
If DNA[A] = DNA[SEQ] then generate PTR file entry
6. Repeat the procedure for all other characters, beginning with Step 2
Output Pseudo-Code
The outputs of this model are one or more pointer coordinates of found quadruple DNA
nucleotide sequences representing the binary octets in question (Amin, et al., 2006).
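The steps above can be sketched in Python as follows. This is a minimal illustration rather than the published implementation: the 2-bit mapping (A=00, C=01, G=10, T=11), the wrap-around when a search reaches the end of the sequence, the function names, and the randomly generated stand-in for the shared DNA sequence are all assumptions. A seeded generator is used only so the demonstration is repeatable.

```python
import random

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def char_to_quad(ch: str) -> str:
    """Turn one 8-bit character into its four-nucleotide (quadruple) form."""
    code = ord(ch)
    return "".join(BASES[(code >> shift) & 0b11] for shift in (6, 4, 2, 0))


def quad_to_char(quad: str) -> str:
    """Turn a four-nucleotide quadruple back into its character."""
    code = 0
    for base in quad:
        code = (code << 2) | BASES.index(base)
    return chr(code)


def encrypt(plaintext: str, dna: str, rng: random.Random) -> list[int]:
    """Return one pointer per character: where its quadruple occurs in dna."""
    pointers = []
    for ch in plaintext:
        quad = char_to_quad(ch)
        start = rng.randrange(len(dna))       # X = RND[G]
        position = dna.find(quad, start)      # sequential search from X
        if position == -1:
            position = dna.find(quad)         # assumption: wrap to the start
        if position == -1:
            raise ValueError(f"quadruple {quad} not present in sequence")
        pointers.append(position)             # PTR file entry
    return pointers


def decrypt(pointers: list[int], dna: str) -> str:
    """Read the quadruple at each pointer and convert it back to text."""
    return "".join(quad_to_char(dna[p:p + 4]) for p in pointers)


rng = random.Random(0)                          # seeded for repeatability only
dna = "".join(rng.choices(BASES, k=200_000))    # stand-in for the shared sequence
pointers = encrypt("Uncle Tom", dna, rng)
recovered = decrypt(pointers, dna)
print(pointers, recovered)
```

Because each search begins at a fresh random location, the same plaintext character generally maps to different pointers on each occurrence, while the shared DNA sequence acts as the symmetric key.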
Character  Frequency in DNA    Character  Frequency in DNA    Character  Frequency in DNA
[l]        23618               [-]        164485              [P]        231686
[f]        25392               [,]        164592              [7]        247484
[a]        27895               [s]        166484              [N]        249825
[6]        34295               [2]        167323              [;]        253181
[c]        34984               [p]        171488              [?]        253181
[o]        35244               [9]        171570              [3]        259677
[m]        39016               [>]        185988              [_]        261141
[k]        40134               [C]        187115              [8]        264718
[d]        41598               [L]        190084              []]        265877
[h]        43075               [5]        194813              [y]        267487
[[]        43545               [M]        198309              [O]        274382
[X]        46915               [!]        199178              [x]        277367
[&]        47593               [U]        202537              [D]        281713
[g]        47631               [']        208559              [t]        282656
[v]        50643               [E]        210222              [u]        287690
[b]        50722               [/]        210841              [W]        291964
[i]        51632               [.]        216379              [(]        306117
[Y]        60091               [B]        216498              [z]        313004
[e]        61618               [G]        217607              [R]        313137
[n]        62262               [K]        217891              [T]        313743
[F]        62918               [:]        223088              [0]        313988
[j]        91953               [S]        223518              [@]        332166
[V]        92378               [I]        224830              [w]        336562
[q]        118728              [%]        225581              ["]        338352
[r]        130831              [4]        225977              [^]        347070
[A]        150386              [)]        229001              [J]        349870
[1]        150665              [Q]        229883              []         391919
                                                              [H]        391919
Table 9. Plain text character frequency in DNA strand (Amin, et al., 2006).
The full text of the novel Uncle Tom's Cabin was used as the basis of the encryption effort.
The table above illustrates the 82 characters occurring in the novel, along with their frequency of
occurrence (Amin, et al., 2006).
A correlation analysis was performed between randomly selected locations for all 1,015,120
characters of the plaintext in an effort to determine indirect relationships (Amin, et al., 2006).

Figure 30. Pearson correlation analysis between ciphertext and corresponding plaintext (Amin, et al., 2006).
The results of the Pearson correlation analysis indicate a majority of the locations in the
plaintext lack any significant relationship to their respective locations in the related DNA sequence, with
71 characters having a correlation coefficient between -0.1 and 0.1, and the other 11 characters having a
coefficient between -0.12 and 0.48 (Amin, et al., 2006).
The encryption process was subsequently tested on images in order to exemplify the
randomness of the encrypted octet locations within the respective DNA sequence (Amin, et al., 2006).

Figure 31. Test image results (not to scale) before and after processing [from left to right] of the underlying data
to show randomness of octet distribution within a given DNA sequence (Amin, et al., 2006).
The 350X258 pixel image above and to the left was used as the test subject. The test image data
was then translated into a single DNA strand. The locations of the DNA octets were cataloged and
inserted into an 88,410,189-nucleotide sequence. The location data were then retrieved and
reshaped into a 350X258 pixel matrix and their values rescaled to represent a full range color map. The
result of this latter effort is seen in the image above and to the right (Amin, et al., 2006).
DNA/Amino Acid-Based Encryption of the Playfair Cipher

In this data encryption model, a significant modification is introduced to the traditional Playfair
Cipher through the application of a DNA and amino acid-based structure at the core of the ciphering
process in order to make the Cipher more resistant to attack. Plaintext messages are converted to
binary form and transcribed into DNA nucleotide sequences. Those DNA sequences are subsequently
processed through a traditional Playfair Cipher encryption process based on a supporting amino acid
structure. The net result is a holistic encryption platform that elevates the demonstrable level of data
confusion and diffusion, while preserving the simplicity of the core cipher (Sabry, et al., 2006).
In order to represent all 26 characters of the English alphabet for this exercise, single-letter
abbreviations of the DNA bases (A, C, G and T) were employed (Sabry, et al., 2006).
Bit 1    Bit 2    DNA
0        0        A
0        1        C
1        0        G
1        1        T
Table 10. DNA representation of bits (Sabry, et al., 2006).
In addition to the DNA bases, their twenty respective amino acid elements were employed.
Three additional codons were used to represent the stopping point for the coding region (Sabry, et al.,
2006).

Amino acid          Nucleotide codons
Alanine/A           GCU, GCC, GCA, GCG
Arginine/R          CGU, CGC, CGA, CGG, AGA, AGG
Asparagine/N        AAU, AAC
Aspartic acid/D     GAU, GAC
Cysteine/C          UGU, UGC
Glutamine/Q         CAA, CAG
Glutamic acid/E     GAA, GAG
Glycine/G           GGU, GGC, GGA, GGG
Histidine/H         CAU, CAC
Isoleucine/I        AUU, AUC, AUA
START               AUG
Leucine/L           UUA, UUG, CUU, CUC, CUA, CUG
Lysine/K            AAA, AAG
Methionine/M        AUG
Phenylalanine/F     UUU, UUC
Proline/P           CCU, CCC, CCA, CCG
Serine/S            UCU, UCC, UCA, UCG, AGU, AGC
Threonine/T         ACU, ACC, ACA, ACG
Tryptophan/W        UGG
Tyrosine/Y          UAU, UAC
Valine/V            GUU, GUC, GUA, GUG
STOP                UAA, UGA, UAG
Table 11. Amino acids and their corresponding 64 codons (Sabry, et al., 2006).
Because this model starts with the binary form of the plaintext, any number or special
characters may be represented, in contrast to the original Playfair Cipher, which allowed for capital
letters of the alphabet only (Sabry, et al., 2006).

Figure 32. Flowchart of DNA-based Playfair algorithm (Sabry, et al., 2010).
In the table below, there are only 20 amino acids along with one START and one STOP, whereas
25 letters are needed to construct the Playfair matrix (I and J are assigned to one cell). As a result,
the letters B, O, U, X, Z will share certain codons. Since the START and amino acid Methionine/M have
the same codon (AUG), the Methionine/M amino acid will not be used. Counting the number of codons
of each character, we will find the number of codons that can be used interchangeably to represent a given
character varies between 1 and 4 codons. This variable (between 1 and 4) represents the 'ambiguity' of
the character, or [AMBIG] variable; a significant modification to the traditional Playfair Cipher (Sabry, et
al., 2006).

Table 12. Distribution of the alphabet with corresponding, interchangeable codons (Sabry, et al., 2010).
This completes the assignment of codons required to service the characters in the English
alphabet, so an amino acid-based plaintext message can be processed using the traditional Playfair
Cipher process using the secret key. The output from the Playfair Cipher being applied to the amino acid
table represents the ciphertext in the table below (Sabry, et al., 2006).

Table 13. New distribution of the alphabet with corresponding codons, after application of the Playfair Cipher
(Sabry, et al., 2010).
The idea that one character can have more than one DNA representation itself adds to the
confusion property, which enhances the algorithm's strength (Sabry, et al., 2006).

Figure 33. Sample encryption processing (Sabry, et al., 2010).
As with the traditional Playfair Cipher, the decryption process is the inverse of the encryption
process. There is, however, the additional complication of the [AMBIG] variable in this DNA-based
version, which, left unaddressed, could make it impossible to choose which codon to use for
a given amino acid character. The problem of codon-amino acid mapping is addressed here by
adding two additional identification bits for each amino acid character indicating the correct codon to
choose in the decryption effort. These 2 bits can be converted to DNA form using the variables in Table
10. Since the [AMBIG] factor can range from 1 to 4, the identification bits must be able to represent a
number range from 0 to 3 (Sabry, et al., 2006).

Figure 34. Sample decryption processing (Sabry, et al., 2010).
Inputs Pseudo-Code
[P] Plaintext (characters with spaces, numbers or any special characters)

[K] Secret key (English characters without any numbers or special characters)
Algorithm Pseudo-Code
1. Prepare the secret key:
Remove any spaces or repeated characters from [K]
Put the remaining characters in upper case form: [K] -> UPPER[K]
2. Prepare the plaintext:
Remove the spaces from [P] so that an attacker cannot trace a character
that is repeated many times within the message
Processing Pseudo-Code
1. Binary form [BP] = BINARY [P] (Replace each character by its 8-bit binary
representation)
2. DNA form [DP] = DNA [BP] (Replace each 2 bits by their DNA character; see Table 10)
3. Amino acids form [AP] = AMINO [DP] (Replace each 3 DNA characters by their amino
acid character, keeping track of the ambiguity of each amino acid [AMBIG])
4. Construct the Playfair 5X5 matrix and add [K] row by row, then add the rest of the alphabet
characters
5. Amino acid form of cipher text [AC] = PLAYFAIR [AP]
6. DNA form of cipher text [DC] = DNA [AC]
Output Pseudo-Code
Add [DC] and [AMBIG] together in a suitable form to produce the final cipher text [C]
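The preparation stages of this algorithm lend themselves to a short Python sketch. Only key preparation, the binary-to-DNA conversion of Table 10 and the construction of the 5X5 matrix are shown; the amino acid substitution and the Playfair step itself depend on the letter-to-codon distribution of Table 12 and are not reproduced here. The function names are hypothetical, and the plaintext is assumed to consist of 8-bit characters.

```python
import string

# Table 10 mapping: 00=A, 01=C, 10=G, 11=T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def prepare_key(key: str) -> str:
    """Upper-case the key, dropping spaces, non-letters and repeated letters."""
    kept = []
    for ch in key.upper():
        if ch in string.ascii_uppercase and ch not in kept:
            kept.append(ch)
    return "".join(kept)


def text_to_dna(plaintext: str) -> str:
    """Remove spaces, then map each 8-bit character to four nucleotides."""
    stripped = plaintext.replace(" ", "")
    bits = "".join(f"{ord(ch):08b}" for ch in stripped)  # assumes 8-bit characters
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def playfair_matrix(key: str) -> list[list[str]]:
    """Fill a 5X5 matrix with the key, then the remaining letters (I and J share a cell)."""
    cells = []
    for ch in prepare_key(key) + string.ascii_uppercase:
        ch = "I" if ch == "J" else ch
        if ch not in cells:
            cells.append(ch)
    return [cells[row * 5:(row + 1) * 5] for row in range(5)]


print(prepare_key("CHARLES DICKENS"))
print(text_to_dna("A B"))
for row in playfair_matrix("CHARLES DICKENS"):
    print(" ".join(row))
```

Sharing one cell between I and J is what brings the 26-letter alphabet down to the 25 cells of the matrix.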
Experiment
The first 150 kilobytes (KB) of paragraph content of the book, A Tale of Two Cities was
selected. The secret key selected was CHARLES DICKENS, resulting in an 11 byte (11B) key (Sabry, et
al., 2010).
The amino acid table was loaded (see Table 12), and the secret key formatted, removing spaces
and non-English characters, if present. The plaintext was formatted, removing spaces and separating
any repeated, adjacent characters through the addition of a tilde (~) between pairs (Sabry, et al., 2010).
The characters were converted to binary form, which form was then converted to DNA, and
finally to an amino acid sequence, recording data for the [AMBIG] variables. The Playfair Cipher was
then applied, converting the amino acid sequence back into DNA form, with the [AMBIG] bit embedded
therein (Sabry, et al., 2010).

Table 14. Performance results of experiment processing stages (Sabry, et al., 2010).
The time required to load the amino acids table and prepare the secret key was omitted due to
the short duration of the events (Sabry, et al., 2010).
Opportunities for Additional Security (Sabry, et al., 2010)
Secret Key: the more random and lengthy the key, the more difficult the cipher will be
to break.
Replacing the English alphabet with an amino acid sequence: an amino acid sequence
may be used to finish populating the Playfair matrix after secret key application.
Insert the ciphertext into a host DNA strand for insertion into a microdot: once in DNA
form, the ciphertext can be converted to a steganographic medium.
Use of conventional XOR-ing: an additional key may be defined, which can then be XOR-ed
with either the amino acid or DNA versions of the ciphertext, assuring randomness.
DNA-Based Encryption using the DNA-Crypt Algorithm

In this data encryption model, the DNA-Crypt algorithm uses the least significant base, much like
the use of the least significant bit in traditional steganography, and can be combined with binary
encryption algorithms such as AES, RSA or Blowfish. The DNA-Crypt algorithm is also able to correct
mutations in target DNA using correction codes such as the linear error-correcting Hamming-code
(Heider, Barnekow, 2007).

Figure 35. Structure of the encoder and decoder for a Hamming code (Bandakkanavar, 2010).
Mutations occur infrequently, but when they do occur they can destroy encrypted information. An
integrated fuzzy controller applies a set of heuristics based on three input dimensions and recommends
whether or not to use a correction code (Heider, Barnekow, 2007).
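To illustrate how a linear error-correcting code can repair a single altered symbol, the following Python sketch implements the textbook Hamming(7,4) code on bits. It is offered as a generic example of the Hamming family and not as the specific variant used within DNA-Crypt; a flipped bit stands in for a mutation, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode four data bits as a seven-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the four data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1          # flip the offending bit back
    return [c[2], c[4], c[5], c[6]]


original = [1, 0, 1, 1]
codeword = hamming74_encode(original)
damaged = list(codeword)
damaged[4] ^= 1                             # simulate a single "mutation"
print(codeword, damaged, hamming74_decode(damaged))
```

The cost of this protection is three extra bits for every four bits of data, which is the kind of overhead the fuzzy controller described above weighs against the likelihood of mutation.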
One-Time Pads
The concept of the one-time pad (OTP) was first introduced by U.S. Army Signal Corps officer
Joseph Mauborgne as an improvement to the Vernam cipher. The scheme uses a random key that is as
long as the message itself, eliminating the need to repeat the key (Stallings, 2010).
Figure 36. A basic one-time pad encryption/decryption schema (Stallings, 2010).
Additionally, the key is used to encrypt or decrypt only one message, after which it is discarded.
The result is an encrypted output that bears no statistical relationship whatever to the
plaintext. Without a statistical thread to pull, breaking an OTP-encrypted message is
considered impossible, a concept referred to in cryptography as perfect secrecy (Stallings, 2010).
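A one-time pad is simple to express in software. The Python sketch below assumes the common binary form of the scheme, in which each message byte is combined with the corresponding key byte using XOR; the function names are hypothetical. The key is drawn from the operating system's secure random source, is exactly as long as the message, and is meant to be used once and then discarded.

```python
import secrets


def otp_keygen(length: int) -> bytes:
    """Generate a random key exactly as long as the message."""
    return secrets.token_bytes(length)


def otp_xor(message: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XOR-ing each message byte with its key byte."""
    if len(key) != len(message):
        raise ValueError("one-time pad key must match the message length")
    return bytes(m ^ k for m, k in zip(message, key))


plaintext = b"ATTACK AT DAWN"
key = otp_keygen(len(plaintext))
ciphertext = otp_xor(plaintext, key)
print(otp_xor(ciphertext, key))   # applying the same key again restores the plaintext
```

The same function serves for both directions because XOR is its own inverse. The practical burden lies in generating, distributing and safeguarding key material equal in volume to all of the traffic it protects.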
OTPs are largely impractical using modern hardware and software due to required
key lengths and distribution requirements. They have, however, become far more relevant with the
advent of theoretical quantum computing. In the quantum-computing model, the possibility [if not
likelihood] exists that any traditional encryption system can be broken in short order, with the exception
of those schemes based on OTP encryption. As a result, it is the OTP, combined with DNA's
extraordinary storage and computing capabilities, that makes the combination an attractive encryption
model (Xiao, et al., 2006).
Pak Chung Wong et al. developed an OTP algorithm for storing data in the DNA of living
organisms. The data are translated into a DNA sequence, which is then inserted into the nucleotide
sequences of living organisms. The insert sequence is flanked by two primer sequences that are not
pre-existent in the target genome. The sequence is introduced to a living cell where it coexists, and by
association is replicated along with the genome's native DNA (Wong, et al., 2003).

Figure 37. A DNA-based one-time pad A, plain text B, cipher text and primer DNA polymerase primer (black box),
(Wong, et al., 2003).
Wong et al. employed a substitution cipher to encode a song text into the genome of a living
organism, Deinococcus radiodurans. This organism was chosen specifically for its ability to withstand
ionizing radiation, to show that the song information could foreseeably remain intact and retrievable for
several centuries (Wong, et al., 2003).

Figure 38. Example of DNA tiles carrying data representing binary 1 (light blue), binary 0 (white), start block or
attachment point (dark blue fading to lower right) and end marker or attachment point (dark blue fading to
upper left). The sequences can be ligated (attached by an end or ends) to longer DNA strands by using the start
and stop points, or sticky ends (Leier, et al., 2000).
In this model, DNA strands were assembled by concatenating short, double-stranded DNA
molecules representing 0 (0-DNA bit), 1 (1-DNA bit), the sequence start point and the sequence end point.
The start and end points are considered sticky, allowing the binary strands to be polymerized
(combined) through DNA annealing (reformation of DNA strands from heat-exposed DNA fragments)
and ligation (joining of strand ends) (Leier, et al., 2000).
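The assembly of a binary strand from tiles can be modeled symbolically. The sketch below is an assumption-laden illustration: the tile labels "s", "0", "1" and "e" are hypothetical stand-ins for the physical start, 0-DNA bit, 1-DNA bit and end molecules, and no chemistry (annealing, ligation, PCR or gel readout) is simulated.

```python
START, END = "s", "e"  # symbolic labels for the sticky terminator tiles


def assemble_strand(bits):
    """Concatenate a start tile, one tile per bit, and an end tile."""
    if not set(bits) <= {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    return [START] + list(bits) + [END]


def read_strand(tiles):
    """Check the terminators and return the bit string between them."""
    if len(tiles) < 2 or tiles[0] != START or tiles[-1] != END:
        raise ValueError("strand must begin with a start tile and finish with an end tile")
    return "".join(tiles[1:-1])


strand = assemble_strand("0110")
bits_read = read_strand(strand)
```

The terminator check mirrors the role of the sticky start and end points: a strand lacking either one is treated as incomplete rather than decoded.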

Figure 39. Assembly of DNA binary strands. Strands are composed of shorter, concatenated tiles by overlapping
sticky start (s) and end (e) terminators with arbitrary quantities of DNA bits in between. Bit strands containing
up to 32 bits were yielded in this process (Leier, et al., 2000).
Using this method the information content can be decrypted and read directly by PCR
(amplification) and subsequent gel-electrophoresis, requiring no additional work such as subcloning or
sequencing (Leier, et al., 2000).

Figure 40. Example of gel-electrophoresis (Electrophoresis, 2011).
Miki Hirabayashi, et al. outlined a process in 2010 whereby self-assembling DNA is made to
function as a truly random number generator for purposes of producing OTPs (Hirabayashi, et al., 2010).

Figure 41. Self assembly schema of a random DNA tile assembly (Hirabayashi, et al., 2010).
In this DNA-based cryptosystem, each tile is randomly set throughout the entire DNA assembly.
Each tile has both XOR calculation and random number capabilities: one sticky end for XOR
calculation, one sticky end for random number generation, and two sticky ends for connection
(Hirabayashi, et al., 2010).

Figure 42. Scaffold construction schema by sample input message: M = 00011011 (Hirabayashi, et al., 2010).
When the tile assembly begins, the value of the random operation tile at each slot is determined
with a probability of 0.5 of being either a 1 or a 0. In other words, the two outcomes are
equally likely. This probability is achieved by adding the same
concentration of each type of random operation tile (Hirabayashi, et al., 2010).
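The logical behavior of this scheme can be imitated in software. The following minimal sketch assumes that each slot independently receives a 0 or a 1 with probability 0.5 and that the pad bit is XORed with the corresponding message bit; it uses the sample message M = 00011011 from Figure 42. The function names are hypothetical, and a software random source only stands in for the physical randomness of tile self-assembly.

```python
import secrets


def assemble_random_pad(n_slots):
    """Pick a 0-tile or 1-tile for each slot with equal probability."""
    return [secrets.randbelow(2) for _ in range(n_slots)]


def xor_tiles(bits, pad):
    """XOR each message bit with the pad bit occupying the same slot."""
    return [b ^ k for b, k in zip(bits, pad)]


M = [int(ch) for ch in "00011011"]   # sample input message from Figure 42
pad = assemble_random_pad(len(M))    # one random pad bit per message bit
C = xor_tiles(M, pad)                # encrypted bits
M_again = xor_tiles(C, pad)          # XOR with the same pad restores M
```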
DNA-Based Data Encryption Using Traditional (RSA) Encryption Methodology

In this data encryption model, sender Alice owns encryption key KA, and her intended receiver
Bob owns decryption key KB (KA = KB or KA ≠ KB). Alice uses KA to translate a plaintext M into ciphertext C
by a translation E. Bob uses KB to translate the ciphertext C into the plaintext M by a translation D. E is
the encryption process and D is the decryption process (Cui, 2008).
The encryption process is: $C = E_{KA}(M)$
The decryption process is: $D_{KB}(C) = D_{KB}(E_{KA}(M)) = M$

Figure 43. Encryption scheme flowchart (Cui, 2008).
It is difficult to obtain M from C unless one has KB. It is important to note here that KA, KB and C
are not limited to digital data, but can be any method, material, data, etc., such as a DNA sequence. E
and D are also not limited to mathematical calculations, but can be any physical, chemical, biological
or mathematical process, such as RSA cryptography. The intended receiver Bob has a pair of keys (e, d)
(Cui, 2008).
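For the case in which E and D are RSA operations, the role of Bob's key pair (e, d) can be shown with a toy example. This is a sketch only: the primes, the exponent and the function names are illustrative assumptions, the numbers are far too small for real use, and no padding is applied. The source does not specify the RSA parameters used by Cui.

```python
# Toy RSA key pair for Bob (illustrative values, not secure).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)


def encrypt(m, e, n):
    """C = E(M): Alice raises M to Bob's public exponent modulo n."""
    return pow(m, e, n)


def decrypt(c, d, n):
    """M = D(C): Bob raises C to his private exponent modulo n."""
    return pow(c, d, n)


M = 65                     # a message block, must be smaller than n
C = encrypt(M, e, n)
M_again = decrypt(C, d, n)
```

The round trip D(E(M)) = M is the same identity given in the decryption equation above, with KA corresponding to (e, n) and KB to (d, n) in this purely mathematical instance.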
Key Generation Detail

On a practical level, message-sender Alice designs a DNA sequence, realized in the form
of a 20-mer oligonucleotide (a chain of twenty nucleotides, that is, a DNA probe), to serve as
a forward primer for PCR amplification, and transmits it to intended receiver Bob over a secure channel.
The message-receiver Bob likewise designs a 20-mer oligonucleotide DNA probe to function as a
reverse primer for PCR amplification, and transmits it to Alice over a secure channel. After that pair of
PCR primers has been exchanged over a secure communication channel, an encryption key KA and decryption
key KB can be derived (Cui, 2008).

Figure 44. Data pre/post treatment schema (Cui, 2008).
The set of two primer pairs identifying the critical start and stop points in the overall message
payload are cooperatively generated by both Alice and Bob. Because of this, should an attacker acquire
one primer pair, the amplification process required to bring the message payload to a stage at which it
can be extracted will not work correctly, because of the omission of the correct, corresponding primer
pair (Cui, 2008).
Data Pretreatment Detail

Choosing the phrase GENECRYPTOGRAPHY as the plaintext to encrypt, the phrase is first
converted into hexadecimal code using the computer's built-in (ASCII) character code, that is: 47 45 4E 45 43 52 59 50
54 4F 47 52 41 50 48 59. That hexadecimal code sequence is then converted into binary plaintext M by
purpose-built software, producing the binary sequence:
01000111 01000101 01001110 01000101
01000011 01010010 01011001 01010000
01010100 01001111 01000111 01010010
01000001 01010000 01001000 01011001
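The pretreatment step can be reproduced with a short sketch, assuming plain ASCII encoding of the phrase. The function names are hypothetical and merely stand in for the purpose-built software mentioned above.

```python
def text_to_hex(text):
    """Return the ASCII codes of the text as two-digit hexadecimal values."""
    return " ".join(f"{byte:02X}" for byte in text.encode("ascii"))


def text_to_binary(text):
    """Return the ASCII codes of the text as eight-bit binary groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))


phrase = "GENECRYPTOGRAPHY"
hex_code = text_to_hex(phrase)
binary_plaintext = text_to_binary(phrase)
```

Sixteen characters yield sixteen eight-bit groups, or 128 bits in total.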
Sender Alice then translates the binary plaintext M into binary ciphertext C, using Bob's public
key e. Alice then translates the binary ciphertext C into the DNA sequence using the nucleotide bases A,
T, C and G as binary entities (Cui, 2008).
Nucleotide Base     Base2 Numbering
Adenine (A)         00
Thymine (T)         01
Cytosine (C)        10
Guanine (G)         11
Table 15. Associations of DNA nucleotide bases to binary number equivalents.
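The mapping in Table 15 translates directly into a lookup table. The sketch below is a minimal illustration with hypothetical function names; it converts a bit string into bases two bits at a time and back again. In the scheme described here the input would be the binary ciphertext C; the sample input is simply the eight-bit group for the letter G from the pretreatment example.

```python
BASE_FOR_BITS = {"00": "A", "01": "T", "10": "C", "11": "G"}  # Table 15
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def bits_to_dna(bits):
    """Translate a bit string (spaces ignored) into a nucleotide sequence."""
    bits = bits.replace(" ", "")
    if len(bits) % 2:
        raise ValueError("bit string must contain an even number of bits")
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna):
    """Translate a nucleotide sequence back into a bit string."""
    return "".join(BITS_FOR_BASE[base] for base in dna)


sample_dna = bits_to_dna("01000111")   # 01 00 01 11
sample_bits = dna_to_bits(sample_dna)
```

Each base carries two bits, so a 128-bit sequence becomes 64 nucleotides.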
Encryption Detail
Coding the binary ciphertext C into a binary DNA format composed of 64 nucleotides flanked by
forward and reverse PCR primer pairs constitutes the construction of the secret-message DNA sequence.
The primer pairs are required in order to enable the insertion and acceptance of the secret-message
sequence into a larger DNA sequence, and to delineate the encrypted message's beginning and end
points from within that larger environment (Cui, 2008).

Alice then generates an overall message payload in the form of a number of dummy DNA
sequences that have the same overarching structure as the secret-message DNA sequence. The secret-
message DNA sequence is then placed in among the dummy DNA sequences for camouflage. The
message payload is then sent to Bob using an open communications channel (Cui, 2008).
Decryption Detail
After the intended receiver (Bob) gets the message payload, he has the means for locating the
starting and ending points (associated primer pairs) for the secret-message DNA sequence within the
overall message payload. Bob then translates the secret-message DNA sequence into the binary
ciphertext C, using his primer pair information to tell him where the correct start and stop points of the
secret-message DNA sequence are within the overall message payload. From there, Bob can decrypt the
binary ciphertext C into binary plaintext M, using his secret key d (Cui, 2008).
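The camouflage and retrieval steps can be imitated with strings. The sketch below rests on several loud simplifications: primers are treated as literal 20-base prefixes and suffixes rather than as complementary PCR primers, dummy strands are random sequences of the same length, and all names and sequences are hypothetical. It only illustrates why knowing both primers singles out the secret-message strand.

```python
import random

BASES = "ATCG"


def random_dna(length, rng):
    """Return a random nucleotide string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def build_strand(forward, message, reverse):
    """Flank the message sequence with the forward and reverse primers."""
    return forward + message + reverse


def build_payload(secret_strand, n_dummies, rng):
    """Hide the secret strand among dummy strands of identical structure."""
    payload = [
        build_strand(random_dna(20, rng), random_dna(64, rng), random_dna(20, rng))
        for _ in range(n_dummies)
    ]
    payload.append(secret_strand)
    rng.shuffle(payload)
    return payload


def extract_message(payload, forward, reverse):
    """Return the sequence flanked by both primers, or None if absent."""
    for strand in payload:
        if strand.startswith(forward) and strand.endswith(reverse):
            return strand[len(forward):len(strand) - len(reverse)]
    return None


rng = random.Random(0)                 # seeded only to make the sketch repeatable
forward = random_dna(20, rng)          # stands in for Alice's 20-mer primer
reverse = random_dna(20, rng)          # stands in for Bob's 20-mer primer
message = random_dna(64, rng)          # stands in for the 64-nucleotide ciphertext
payload = build_payload(build_strand(forward, message, reverse), 50, rng)
found = extract_message(payload, forward, reverse)
```

A party holding only one of the two primers cannot distinguish the secret strand by this test, which parallels the cooperative key generation described earlier.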
Data Post-Treatment Detail

After generating the binary plaintext M, Bob is in a position to recover the original plaintext from the
binary plaintext by reversing the pretreatment step (Cui, 2008).
Security Implications
As with traditional encryption mediums, it must be assumed that an attacker has knowledge of
the biological base on which DNA-based data encryption is premised, and has access to the necessary
tooling required to manipulate the encoded medium, and the data it contains. What is not known by
the attacker is the encryption key KA (one half of the associated primer pairs), Bob's public key e, the
decryption key KB (the other half of the associated primer pairs) and Bob's secure key d (Cui, 2008).
So in this encryption model, security is derived from two areas:
The complexity of the biological environment required to manipulate and observe
payload data without knowing the correct start or stop (or both) points of the
secret-message DNA sequence. Without knowing these start and stop points, the attacker
must attempt to derive them from 10^23 (100,000,000,000,000,000,000,000)
possible sequences, illustrating the effectiveness of the data obfuscation characteristics
of this model (Cui, 2008).
The difficulty introduced by a traditional cryptography algorithm. Without Bob's secret
key d, an attacker still needs an extraordinary level of computing power to attempt to
break the traditional encryption present (Cui, 2008).
DNA Detection and Analysis

The ability to store data, encrypted or otherwise, in any storage medium for any given length of
time is effectively useless if that medium cannot be effectively accessed and read in a timely, usable and
accurate manner (Conway, 1996).
Lab-Free Contact-Based DNA Testing
DNA Electronics (DNAE) is a United Kingdom-based firm founded by Professor Chris Toumazou,
FRS, FREng. DNAE has developed a proprietary, commercial, lab-free nucleic acid detection and analysis
system (Genalysis) capable of being operated by unskilled personnel (Toumazou, et al., 2012).
Genalysis is a three-step system that requires no specialized laboratory equipment and is not
environment dependent; it can be used on any physical site that is hospitable to human life and generic,
mobile computing platforms (Toumazou, et al., 2012).

Figure 45. Genalysis proprietary microchip for real-time DNA analysis (Toumazou, et al., 2012).
By performing simultaneous amplification and detection of nucleic acids, the Genalysis CMOS
microchip reduces the time to product significantly (Toumazou, et al., 2012).

Figure 46. Excitation schema of Genalysis proprietary ISFET mechanism (Toumazou, et al., 2012).
DNAE has developed a variation on the Field Effect Transistor (FET) called an Ion-sensitive Field
Effect Transistor (ISFET), in which real-time sensitivity to released hydrogen ions has been cultivated. In
the Genalysis system scenario, the presence of hydrogen ions drives electrical signal generation from
the ISFET. The greater the number of hydrogen ions detected by the ISFET, the greater the electrical
signal the ISFET produces. The ability to produce ISFET sensor arrays in densities measured in
the tens of millions is the primary factor in the ability to detect and analyze target DNA sequences in a
relatively short period of time (Toumazou, et al., 2012).

Figure 47. Schema of hydrogen ion release upon extending an existing DNA strand with one or more additional
nucleotides (Toumazou, et al., 2012).
The increase in hydrogen ions is achieved through DNA amplification: hydrogen
ions are released whenever a strand of DNA is extended by the addition of a nucleotide (Toumazou, et
al., 2012).
A sample preparation kit is first opened, then used to acquire a sample of the target DNA in the
form of a swab or saliva sample. The balance of the preparation kit is used to prepare the sample for
deposition of a purified target DNA sample onto an electronic substrate (CMOS microchip).

Figure 48. Genalysis external testing process overview (Toumazou, et al., 2012).
That substrate is then interfaced to an available external Universal Serial Bus (USB) port
on a computing device running the Genalysis analysis software. Test results are typically available within
approximately 30 minutes from the time analysis begins (Toumazou, et al., 2012).
Lab-Free Contact-Less (Remote) Molecular Analysis

The technological maturing of scanning-probe Raman spectroscopy has resulted in the
demonstrable ability to make conclusive informational observations at a single molecule level of
resolution, even under ambient and physiological conditions (Neacsu, et al., 2006).

Figure 49. Schematic representation of single molecule experiment to detect malachite green adsorbed on a
planar metal surface (Neacsu, et al., 2006).
In the example shown above, a helium-neon (HeNe) laser is focused on a gold (Au) probe. The
reflected (scattered) light is then detected using a charge-coupled device (CCD) sensor.

Figure 50. Scanning electron microscopy image of a wrinkled Raman-active gold surface (Zhang, et al., 2011).
The development of wrinkly film nanoporous gold surfaces from which to perform Raman
spectroscopy analysis has largely addressed low stability and poor reproducibility in single molecule
identification, analysis and test results (Zhang, et al., 2011).

CONCLUSION
Mankind has been dependent on DNA for its information and related personnel security
identification, verification and authentication efforts since civilized life began. In the beginning, only
DNA-based outcomes and resultants were available for use: hair and skin color; eye color and features;
height; bipedal gait; facial profile and features; geometry of the hand and its features; vocal features;
and so on. With the passage of time we are in the process of coming full circle, able now to read and write
from and to the DNA medium itself, in theory and in practice, bypassing the outcomes and resultants of
DNA in favor of direct access to the information it contains, and can be made to contain. With the
advent of demonstrably secure approaches to reading and writing from and to DNA, the cycle is
accelerating. The ever-decreasing cost of these technologies will further accelerate real-time use of
DNA as a secure identification, authentication and information storage medium.
Modern-day societal norms have associated with DNA an almost unique sense of certainty that is
not entirely possessed by other information security identification, verification and authentication
systems currently available. Whereas these other systems seem at least potentially vulnerable to
subversion, DNA is commonly viewed as being absolutely elemental and, as a result, undeniable. That
this perception is not entirely accurate is perhaps of little consequence, since information security is
often composed as much of sense and circumstance as it is of reality.
Ever-increasing technical strides, combined with the ever-spreading availability of the
technology, will make DNA more pertinent than ever as a tool for establishing identity, authenticity and
integrity in information security. The rate of progress in many related areas of research is in fact
so rapid as to make the description of these ideas and systems in the future tense
something of a challenge. The massive storage capacity and density of DNA, combined with its parallel
computing capabilities and energy efficiency, will make it possible to apply traditional, hybrid-traditional
and entirely new encryption schemes to areas and applications that were previously too impractical,
unwieldy or resource-intensive to contemplate:
The relatively low temperature required to manufacture paper will make DNA-infused
stock for use in high security hardcopy documents a reality.
The unparalleled data diffusion and obfuscation opportunities in DNA will allow for
multi-access level electronic documents in which a single distribution produces different
readable versions of the document based on recipient security clearance.
What were once wholly impractical OTP applications will become relatively
commonplace, significantly altering the electronic distribution mechanics of for-pay
periodicals, books, manuscripts, etc.
DNA data tagging of animal and plant kingdoms is already underway in full. With the
advent and refinement of near-real-time and real-time, close-proximity, non-contact
methods of reading human DNA, reading Homo sapiens barcodes for security purposes
will become viable.
The sheer, secure digital data storage capacity of DNA will contribute to the gradual
obsolescence of mechanically-based storage mediums for archival storage of sensitive
data.
Real-time, close-proximity, non-contact methods of reading human DNA barcodes
(predetermined short-sequence areas of the human genome) will eventually become
practical for high security personnel access to sensitive data.
Close-proximity, non-contact confirmation systems will be used to read non-organic
DNA inks on sensitive hardcopy-based documents for authentication.
REFERENCES
(2009). The emerging science of dna cryptography. MIT Technology Review, Retrieved from
http://www.technologyreview.com/view/412610/the-emerging-science-of-dna-cryptography/
Adleman, L. (1994). Molecular Computation of Solutions to Combinatorial Problem. Science, 266(5187),
1021-1024.
Amin, S., Saeb, M., & El-Gindi, S. (2006, November). A DNA-Based Implementation of YAEA Encryption
Algorithm. IASTED International Conference On Computational Intelligence, San Francisco, CA.
Arnold, J. (2000). History: A Very Short Introduction. Oxford: Oxford University Press.
Aron, J. (2011, October 16). Keeping a Lid On Your Digital DNA. New Scientist, 2834. Retrieved October
14, 2012, from http://www.newscientist.com/article/mg21228346.500-keeping-a-lid-on-your-
digital-dna.html?DCMP=OTC-rss&nsref=online-news
Balado, F. (2010). On the Shannon Capacity of DNA Data Embedding. 2010 IEEE International Conference
on Acoustics, Speech, and Signal Processing: Proceedings, 1, 1766-1769.
Bandakkanavar, R. (2010, December 10). Implementation of Hamming Code. KRAZYTECH. Retrieved
November 4, 2012, from http://krazytech.com/projects/implementation-of-hamming-code
Bancroft, C., Bowler, T., Bloom, B., & Clelland, C. T. (2001). Long-Term Storage of Information in
DNA. Science, 293(5536), 1763-1765.
Bartlett, J., & Stirling, D. (2003). A Short History of the Polymerase Chain Reaction. PCR Protocols, 226, 3-
6.
Bardou, R., Focardi, R., Kawamoto, Y., Simionato, L., Steel, G., & Tsay, J. (2012). Efficient Padding Oracle
Attacks on Cryptographic Hardware. Advances in Cryptology CRYPTO 2012 32nd Annual
Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings, 7417, 608-
625.
Baum, E. (1995). Building an Associative Memory Vastly Larger Than The Brain. Science, 268(5210), 583-
585.
Beale, B. (2003, February 25). Tiny Self-Powered DNA Computer Unveiled News in Science (ABC
Science). ABC.net.au. Retrieved October 14, 2012, from
http://www.abc.net.au/science/articles/2003/02/25/792007.htm
Benenson, Y., Gil, B., Ben-Dor, U., Adar, R., & Shapiro, E. (2004). An Autonomous Molecular Computer
for Logical Control Of Gene Expression. Nature, 429(6990), 423-429.
Boneh, D., Dunworth, C., Lipton, R. (1995). Breaking DES Using a Molecular Computer. Technical Report
CS-TR-489-95, Department of Computer Science, Princeton University.
Brown, K. (Director) (1999, October 20). How Long Will It Last? Life Expectancy of Information Media
Information Media. ARMA International Conference. Lecture conducted from Image
Permanence Institute, Cincinnati, Ohio.
Cachin, C. (1998, May). An Information Theoretic Model for Steganography. Proceedings of 2nd
Workshop on Information Hiding, Cambridge, MA.
Cayre, F., & Bas, P. (2008). Kerckhoffs-Based Embedding Security Classes for WOA Data Hiding. IEEE
Transactions on Information Forensics and Security, 3(1), 1-15.
Chan, Y., Nguyen, A., Niu, L., & Corn, R. (2009). Fabrication of DNA Microarrays with poly-L-Glutamic
Acid Monolayers on Gold Substrates for SPR Imaging Measurements. PMC, 25(9), 5054-5060.
Chen, J., & Wood, D. (2000, February). Computation with Biomolecules. Proceedings of the National
Academy of Sciences of the United States of America.
Church, G., Gao, Y., & Kosuri, S. (2012). Next-Generation Digital Information Storage in DNA. Science, 10,
1-2. Retrieved September 5, 2012, from
http://www.sciencemag.org/content/early/2012/08/15/science.1226355.full.pdf
Confidentiality, Integrity & Availability. (2009). IS Handbook. Retrieved November 7, 2012, from
http://ishandbook.bsewall.com/risk/Methodology/CIA.html
Conway, P. (1996). Preservation in the Digital World. Washington, D.C.: Commission on Preservation and
Access.
Craglia, M., Goodchild, M., Annoni, A., & Camara, G. (2008). Next-Generation Digital Earth. International
Journal of Spatial Data Infrastructures Research, 3, 146-167.
Cui, G., Li, C., Li, H., & Li, X. (2009, August). DNA Computing and Its Application to Information Security
Field. 2009 Fifth International Conference on Natural Computation.
Cui, G., Qin, D., Wang, Y., & Zhang, X. (2008). An Encryption Scheme Using DNA Technology. College of
Electrical Information Engineering, Zhengzhou University of Light Industry.
Deoxyribonucleic Acid (DNA) Fact Sheet. (n.d.). National Human Genome Research Institute (NHGRI) -
Homepage. Retrieved April 30, 2012, from http://www.genome.gov/25520880
Di Cristofaro, E. (2011). Sharing Sensitive Information with Privacy. (Master's thesis, University of
California, Irvine)Retrieved from http://www.emilianodc.com/PAPERS/dissertation.pdf
DNA Barcoding 101. (2012). Retrieved October 7, 2012, from
http://www.dnabarcoding101.org/introduction.html
DNA Microarray Virtual Lab. (2012). Learn Genetics. Retrieved November 10, 2012, from
http://learn.genetics.utah.edu/content/labs/
Electrophoresis | Labplanet Blog. (2011, January 3). Labplanet Blog | Just another WordPress site.
Retrieved November 15, 2012, from http://blog.labplanet.com/2011/01/03/electro
Fullerton, E., Margulies, D., Moser, A., & Takano, K. (2012). Advanced Magnetic Recording Media for
High-Density Data Storage - Electroiq. Solid State Technology - Electronics Manufacturing
Industry News for Semiconductors, Advanced Packaging and Nanotechnology. Retrieved
September 29, 2012, from http://www.electroiq.com/articles/sst/print/volume-44/issue-
9/features/thin-film-technology/advanced-magnetic-recording-media-for-high-density-data-
storage.html
Gehani, A., & LaBean, T. (2000). DNA-Based Cryptography. Informally published manuscript, Department
of Computer Science, Duke University, Durham, NC, North Carolina, United States.
Hall, D., & Seaton, S. (2006). Flexible Protein Microarray Inkjet Printing. GEN - Genetic Engineering and
Biotechnology News, 26(18). Retrieved October 1, 2012, from
http://www.genengnews.com/gen-articles/flexible-protein-microarray-inkjet-printing/1897/
Harizopoulos, S., Shah, M., Meza, J., & Ranganathan, P. (2009, January). Energy Efficiency: The New Holy
Grail of Data Management Systems Research. 4th Biennial Conference on Innovative Data
Systems Research, Asilomar, CA.
Heider, D., & Barnekow, A. (2007). DNA-Based Watermarks Using the DNA-Crypt Algorithm. BMC
Bioinformatics, 8(176), 1-11.
Hirabayashi, M., Kojima, H., & Oiwa, K. (2010). Design of True Random One-Time Pads in DNA XOR
Cryptosystems. Natural Computing, 2, 174-183.
Hollingsworth, P., et al. (2009). A DNA Barcode for Land Plants. Proceedings of the National Academy of
Sciences of the United States of America, 106, 12794-12797.
How do we Sequence DNA?. (2012). University of Michigan DNA Sequencing Core. Retrieved October 7,
2012, from http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html
Jambhekar, N. (2011). Steganography: To Preserve Document Security. Golden Research Thoughts, 1(6).
Retrieved November 4, 2012, from http://www.aygrt.net/PublishArticles/107.aspx
Jiao, S., & Goutte, R. (2009). Hiding Data in DNA of Living Organisms. Natural Science , 1(3), 181-184.
Kamali, B. (1995). Error Control Coding. IEEE Potentials, 14(2), 15-19.
Lambert, L., Cox, T., Mitchell, K., Rossello-Mora, R., Cueto, C. D., Dodge, D., et al. (1998). Staphylococcus
Succinus Sp. Nov., Isolated From Dominican Amber. International Journal of Systemic and
Evolutionary Biology, 48, 511-518.
Lausted, C., Dahl, T., Warren, C., King, K., Smith, K., Johnson, M., et al. (2004). Posam: A Fast, Flexible,
Open-Source, Inkjet Oligonucleotide Synthesizer and Microarrayer. Genome Biology, 5(8), R58 1-
17.
Leier, A., Banzhaf, W., & Rauhe, H. (2000). Cryptography with DNA Binary Strands. Biosystems, 57(1), 13-
22.
Lelusz, M. (2009, May 9). inleo.blog wirtualizacja, cloud computing, storage, serwery, it
Zastosowanie maszyn wirtualnych w przedsiebiorstwie Rys historyczny technologii
wirtualizacji. inleo.blog wirtualizacja, cloud computing, storage, serwery, it . Retrieved
November 15, 2012, from http://blog.inleo.pl/?p=68
Leo, R. A. (2012, August 16). Writing the Book in DNA | HMS. Home | HMS. Retrieved September 23,
2012, from http://hms.harvard.edu/content/writing-book-dna
LeProust, E., Peck, B., Spirin, K., McCuen, H. B., Moore, B., Namsaraev, E., et al. (2010). Synthesis of
High-Quality Libraries of Long (150mer) Oligonucleotides by a Novel Depurination Controlled
Process. Nucleic Acids Research, 38(8), 2522-2540.
Liu, C., Shi, L., Xu, X., Li, H., Xing, H., Liang, D., et al. (2012). DNA Barcode Goes Two-Dimensions: DNA QR
Code Web Server. PLoS ONE, 7(5), 1-7.
Luoma, J. (2008, June 3). DNA Technology: Discovering New Species by Jon R. Luoma: Yale Environment
360. Yale Environment 360: Opinion, Analysis, Reporting & Debate. Retrieved November 3, 2012,
from http://e360.yale.edu/feature/dna_technology_dis
Mathai, J. (2012). History of Computer Cryptography and Secrecy Systems. Informally published
manuscript, Computer and Information Science, Fordham University, New York, NY, Retrieved
from http://www.dsm.fordham.edu/~mathai/crypto.html
Microarrays Factsheet. (2007, July 27). National Center for Biotechnology Information. Retrieved
October 1, 2012, from http://www.ncbi.nlm.nih.gov/About/primer/microarrays.html
Neacsu, C., Dreyer, J., Behr, N., & Raschke, M. (2006). Scanning-probe Raman Spectroscopy with Single-
Molecule Sensitivity. Physical Review, 73, 193406-1 through 193406-4.
Ning, K. (2009). A Pseudo Dna Cryptography Method. Manuscript submitted for publication, Library,
Cornell University, Ithaca, NY.
O'Brien, J. (1998, February). SCAA - Electronic Records: Basic Concepts in Preservation and Access. SCAA
- Saskatchewan Council for Archives and Archivists. Retrieved September 23, 2012, from
http://scaa.usask.ca/e-paper.html
Optical Storage Technology - The Compact Disc. (2004, April 12). National Chung Hsing University.
Retrieved September 29, 2012, from benz.nchu.edu.tw/~imtech/course/ods/Chapter%202%20-
%20The%20Compact%20Disc.pdf
Pääbo, S., Gifford, J., & Wilson, A. (1988). Mitochondrial DNA sequences from a 7000-year old brain.
Oxford Journals - Nucleic Acids Research, 16(20), 9775-9787.
Pierik, A., Dijksman, F., Raaijmakers, A., Wismans, T., & Stapert, H. (2008). Quality Control of Inkjet
Technology for DNA Microarray Fabrication.. Biotechnology Journal, 3(12), 1581-1590.
Provos, N., & Honeyman, P. (2003). Hide and Seek: An Introduction to Steganography. Security &
Privacy Magazine, IEEE, 1(3), 32-44.
Sabry, M., Hashem, M., Nazmy, T., & Khalifa, M. (2010). A DNA and Amino Acids-Based Implementation
of Playfair Cipher. International Journal of Computer Science and Information Security, 8(3), 126-
133.
Schovanek, M. (2010, November 7). Marek Schovanek News News. Marek Schovanek. Retrieved
November 12, 2012, from http://www.marekschovanek.com/wordpress/?cat=1&paged=2
Securing Data at Rest: Developing a Database Encryption Strategy. (2002). RSA - Security, Compliance
and Risk Management Solutions. Retrieved November 15, 2012, from
www.rsa.com/products/bsafe/whitepapers/DDES_WP_0702.pdf
Shimanovsky, B., Feng, J., & Potkonjak, M. (2003). Hiding Data in DNA. XAP Corporation, Department of
Computer Science, University of California, Los Angeles.
Smyth, D. (2007, July 18). An inside look at Symmetric Encryption. DotNetSlackers: ASP.NET News and
Articles For Lazy Developers. Retrieved November 15, 2012, from
http://dotnetslackers.com/articles/security/AnInsideLookAtSymmetricEncryption.aspx
RedWeb Technologies | ADNAS. (2012). ADNAS | Crime Fighting Has Never Had a Weapon Like This.
Retrieved October 13, 2012, from http://www.adnas.com/redweb
Ribeiro, R. (2012, May). A History of Encryption Through the Ages [Infographic]. In Biztech. Retrieved
November 12, 2012, from http://www.biztechmagazine.com/article/2012/05/history-
encryption-through-ages-infographic
Reif, J. (2002). The Emergence of the Discipline of Biomolecular Computation in the US. New Generation
Computing, 20(3), 217-236.
Stallings, W. (2010). Cryptography and Network Security: Principles and Practice (2nd ed.). Upper Saddle
River, N.J.: Prentice Hall.
Toumazou, C. (2012). DNA Electronics - Real-Time Disposable Gene Tests. DNA Electronics - Real-time
disposable gene tests. Retrieved November 9, 2012, from
http://dnae.co.uk/platforms/genalysis/lab-free-dna-testing/
Trummel, K., & Weisinger, J. (1986). The Complexity of the Optimal Searcher Path Problem. Operations
Research, 34(2), 324-327.
Van Voorst, J., & Finzel, B. (2012, May). A Searchable Database of Macromolecular Conformation. 2012
chemistry biology interface training symposium, Minneapolis, MN.
Wong, P. C., Wong, K., & Foote, H. (2003, January). Organic Data Memory Using the DNA Approach.
Communications of the ACM, 46, 95-98.
Xiao, G., Lu, M., Qin, L., & Lai, X. (2006). New Field of Cryptography: DNA Cryptography. Chinese Science
Bulletin, 51(12), 1413-1420.
Zhang, L., Lang, X., Hirata, A., & Chen, M. (2011). Single-Molecule Spectroscopy - A New Gold Standard?
ACS Nano, 5, 4407-4413.
