
Lecture Notes on Quantum Computing and Quantum Information Theory for Non-Physicists


Dan C. Marinescu and Gabriela M. Marinescu
Computer Science Department
School of Electrical Engineering and Computer Sciences,
University of Central Florida
Email: [dcm,magda]@cs.ucf.edu
March 4, 2003

Contents
1 Observing the Quantum World 7
1.1 Computing and the Laws of Physics . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 A Qubit of History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Quantum Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Quantum Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 The Wave and the Corpuscular Nature of Light . . . . . . . . . . . . . . . . . 15
1.6 Probabilities and Quantum States . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.7 Superposition and Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Measurements and Collapsing of Superposition States onto Basis States . . . . 23
1.9 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Introduction to Quantum Mechanics 28


2.1 A Brief History of Quantum Ideas . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2 Young's Double-Slit Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 The Stern-Gerlach Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4 Mathematical Foundations of Quantum Mechanics . . . . . . . . . . . . . . . . 39
2.5 The First Postulate. Quantum States . . . . . . . . . . . . . . . . . . . . . . . 43
2.6 Quantum Observables. Quantum Operators . . . . . . . . . . . . . . . . . . . 45
2.7 Eigenvalues and Eigenvectors of a Quantum Operator . . . . . . . . . . . . . . 49
2.8 The Spectral Decomposition of an Operator . . . . . . . . . . . . . . . . . . . 53
2.9 The Second Postulate. The Dynamics of a Quantum System . . . . . . . . . . 55
2.10 The Schematic Derivation of Schrödinger's Equation . . . . . . . . . . . . . . . 56
2.11 The Third Postulate; Measurements of Observables . . . . . . . . . . . . . . . 61

3 Qubits 64
3.1 The Qubit, a Very Small Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 The Bloch Sphere Representation of One Qubit . . . . . . . . . . . . . . . . . 65
3.3 Two Qubits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 The Fragility of Quantum Information. Schrödinger's Cat . . . . . . . . . . . . 68
3.5 Qubits and Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.6 The Physical Realizations of Qubits . . . . . . . . . . . . . . . . . . . . . . . . 70
3.7 Qubits as Spin-1/2 Particles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.8 The Measurement of the Electron Spin Along a Principal Axis . . . . . . . . . 72
3.9 The Measurement of the Electron Spin Along any Spatial Axis . . . . . . . . . 73
3.10 The Implications of the Quantum Mechanics Predictions . . . . . . . . . . . . 74
3.11 The Exchange of Information Using Entangled Particles . . . . . . . . . . . . . 75
3.12 The Qubit as a Polarized Photon . . . . . . . . . . . . . . . . . . . . . . . . . 76

4 Quantum Circuits 78
4.1 Classical Logic Gates and Circuits . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 One Qubit Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 Two Qubit Gates. The CNOT Gate . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.4 Can we Build Quantum Copy Machines? . . . . . . . . . . . . . . . . . . . . . 86
4.5 Three Qubit Gates. The Fredkin Gate . . . . . . . . . . . . . . . . . . . . . . 88
4.6 The Toffoli Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.7 Quantum Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

4.8 The No Cloning Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.9 Mathematical Models of a Quantum Computer . . . . . . . . . . . . . . . . . 97

5 The Entanglement of Computers and Communication with Quantum Mechanics 100
5.1 Uncertainty and Locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2 Possible Explanations of the EPR Paradox . . . . . . . . . . . . . . . . . . . . 102
5.3 The Bell Inequality. Local Realism . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4 EPR pairs and Bell States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.5 Teleportation with Maximally Entangled Particles . . . . . . . . . . . . . . . . 107
5.6 Anti-Correlation and Teleportation . . . . . . . . . . . . . . . . . . . . . . . . 114
5.7 Dense Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.8 Quantum Key Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.9 Quantum Error Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.10 Classic and Quantum Information Theory . . . . . . . . . . . . . . . . . . . . 124
5.11 Quantum Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6 Physical and Logical Reversibility. Reversible Computations 129


6.1 Turing Machines, Reversibility, and Entropy . . . . . . . . . . . . . . . . . . . 129
6.2 Thermodynamic Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.3 Maxwell's Demon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.4 Energy Consumption. Landauer's Principle . . . . . . . . . . . . . . . . . . . . 135
6.5 Low Power Computing. Adiabatic Switching . . . . . . . . . . . . . . . . . . . 136
6.6 Bennett's Information Driven Engine . . . . . . . . . . . . . . . . . . . . . . . 137
6.7 Logically Reversible Turing Machines and Physical Reversibility . . . . . . . . 138

7 Basic Concepts of Information Theory 140


7.1 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.2 Conditional and Joint Entropy. Mutual Information . . . . . . . . . . . . . . . 141
7.3 Binary Symmetric Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.4 Information Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.5 Channel Capacity. Shannon's Theorems . . . . . . . . . . . . . . . . . . . . . 145
7.6 Error Detecting and Error Correcting Codes . . . . . . . . . . . . . . . . . . . 147
7.7 Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.8 Hamming Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.9 Channel Decoding Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.10 Error Correcting and Detecting Capabilities of a Code . . . . . . . . . . . . . 152
7.11 The Hamming Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

8 Quantum Algorithms 157


8.1 Introduction to Quantum Algorithms . . . . . . . . . . . . . . . . . . . . . . . 157
8.2 Quantum Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.3 Quantum Phase Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.4 Order Finding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.5 Quantum Algorithms for Integer Factoring . . . . . . . . . . . . . . . . . . . . 157
8.6 The Hidden Subgroup Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.7 Quantum Search Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.8 Quantum Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

9 Quantum Information Theory 158
9.1 Introduction to Quantum Information Theory . . . . . . . . . . . . . . . . . . 158
9.2 von Neumann's Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.3 Source and Channel Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.4 Quantum Channel Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.5 Quantum Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.6 Quantum Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

10 Appendix I: Algebraic Structures 159


10.1 Commutative Rings, Integral Domains, Fields . . . . . . . . . . . . . . . . . . 159
10.2 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
10.3 Abstract Groups and Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . 162
10.4 Symmetry in a Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
10.5 Groups of Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

11 Appendix II: Linear Algebra 167


11.1 Vectors in a Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.3 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
11.4 Scalar Product, Norm, and Hilbert Spaces . . . . . . . . . . . . . . . . . . . . 172
11.5 Euclidean Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
11.6 Linear Operators and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
11.7 Functions of Operators and Matrices . . . . . . . . . . . . . . . . . . . . . . . 177
11.8 Eigenvectors and Eigenvalues. Hermitian Operators . . . . . . . . . . . . . . . 178
11.9 Bilinear Functions and Tensor Products . . . . . . . . . . . . . . . . . . . . . . 181

12 Acknowledgments 183

Notations
c    The speed of light in vacuum, c = 3 × 10^10 cm/sec.
h    Planck's constant, h = 6.6262 × 10^-34 Joule seconds.
ℏ    Reduced Planck's constant, ℏ = h/2π = 1.054 × 10^-34 Joule seconds.
k_B  Boltzmann's constant, k_B = 1.381 × 10^-23 Joules per degree Kelvin.
G    The universal gravitational constant, G = 6.672 × 10^-8 cm^3 g^-1 s^-2.
R    The field of real numbers.
C    The field of complex numbers.
i    The imaginary unit, i = √-1.
α_0, α_1, ...    Complex numbers; α_i = Real(α_i) + i Imaginary(α_i).
α_0*, α_1*, ...  Complex conjugates; α_i* = Real(α_i) - i Imaginary(α_i).
|α_i|    The modulus of the complex number α_i;
         |α_i| = √([Real(α_i)]^2 + [Imaginary(α_i)]^2).
C^n  n-dimensional vector space over the field of complex numbers.
H_2  Two-dimensional Hilbert space.
H_n  n-dimensional Hilbert space.
|ψ⟩, |φ⟩   Kets (Dirac's notation); e.g., |ψ⟩ and |φ⟩ column vectors in C^3:
         |ψ⟩ = (α_0, α_1, α_2)^T and |φ⟩ = (β_0, β_1, β_2)^T.
⟨ψ|      Bra (Dirac's notation); the dual of |ψ⟩. The row vector ⟨ψ| is the
         transpose of the complex conjugate of |ψ⟩:
         if |ψ⟩ = (α_0, α_1, α_2)^T then ⟨ψ| = (|ψ⟩*)^T = (α_0*, α_1*, α_2*).
⟨ψ|φ⟩    The scalar (inner) product of |ψ⟩ and |φ⟩; a complex number:
         ⟨ψ|φ⟩ = α_0* β_0 + α_1* β_1 + α_2* β_2.
|ψ⟩ ⊗ |φ⟩   The tensor product of |ψ⟩ and |φ⟩; it is a vector:
         |ψ⟩ ⊗ |φ⟩ = (α_0 β_0, α_0 β_1, α_0 β_2, α_1 β_0, α_1 β_1, α_1 β_2, α_2 β_0, α_2 β_1, α_2 β_2)^T.
|ψ⟩⟨φ|   The outer product of |ψ⟩ and |φ⟩; it is a linear operator (a matrix)
         with entries (|ψ⟩⟨φ|)_ij = α_i β_j*.
‖ |ψ⟩ ‖  The norm of the vector |ψ⟩:
         ‖ |ψ⟩ ‖ = √⟨ψ|ψ⟩ = √(|α_0|^2 + |α_1|^2 + |α_2|^2).

A    Linear operator or matrix.
∂A/∂a_i  Partial derivative of the operator A.
tr(A)    The trace of the matrix A; the sum of its diagonal elements.
det(A) = |A|   The determinant of the matrix A.
M_ij^A   Minor obtained by eliminating row i and column j from A.
A^T  Transpose of the m × n matrix A; row i, 1 ≤ i ≤ m, becomes column i.
A*   Complex conjugate of the m × n matrix A; a_ij → a_ij*, 1 ≤ i ≤ m, 1 ≤ j ≤ n.
A†   Hermitian conjugate of A; A† = (A*)^T.
δ_ij Kronecker's delta; δ_ij = 0 if i ≠ j and δ_ij = 1 if i = j.
Δx Δp_x ≥ ℏ/2    A formulation of Heisenberg's uncertainty principle;
         the position of the particle is x and its momentum at x is p_x.
iℏ d|ψ⟩/dt = H|ψ⟩    The Schrödinger equation.
p = h/λ  De Broglie's equation; p is the momentum of a particle and λ is the
         wavelength of the wave associated with it.

1 Observing the Quantum World
When James II, the king of Great Britain, insisted that a Benedictine monk be given a degree
without taking any examinations or swearing the required oaths, Isaac Newton, who was
the Lucasian professor at Trinity College at Cambridge, wrote to the Vice-Chancellor: "Be
courageous and steady to the Laws and you cannot fail." The Vice-Chancellor took Newton's
advice and... was dismissed from his post. Now we undertake the challenge of teaching
quantum computing to non-physicists and expect the wrath of physics-challenged monks,
kings, and... students.
In his marvellous book "A Brief History of Time" Stephen Hawking, the astrophysicist
who is now the Lucasian professor, shares with his readers the warning he got from his editor:
expect the sales to be cut in half for every equation in your book. There are k × 10^2
equations in this series of lectures and 2^100 ≈ 1,000^10 is a very large number; identifying the few
residual student atoms still pursuing the subject by the end of the semester may prove to be a
very challenging task.
For many years computer science students have been led to believe that they can get by
with some knowledge of discrete mathematics and little, if any, understanding of physics.
But times are changing; in some sense we are going back to the time when a strong connection
between physics and computers existed.

1.1 Computing and the Laws of Physics


Computers are systems subject to the laws of physics. One of these laws, the finite speed
of light, limits the potential reliability of future computing systems. The components of
a computer exchange information among themselves, e.g., the processor reads and writes
information from/to memory, data is transferred from the internal registers to the Arithmetic
and Logic Unit (ALU), and so on. Transmission of information is associated with a transport
of energy from the source to the destination, and no physical phenomenon may propagate with
a speed larger than the speed of light. It takes one nanosecond, 10^-9 seconds, for light
to travel a distance of 30 cm in vacuum and about 20 cm in a metallic conductor. Therefore,
the speed of a computer is limited by the size of its components. It is inevitable that in
our quest to increase the speed, at some point the components of a computer will approach
atomic dimensions. When this happens, switching, the change of the state of a component,
will be governed by Heisenberg's uncertainty principle^1. It follows that we may not be
able to determine the state of that component with absolute certainty, and the results of a
computation, though carried out by a very fast computer, will be unreliable.

^1 Heisenberg's uncertainty principle says we cannot determine both the position and the momentum of a
quantum particle with arbitrary precision. In his Nobel prize lecture on December 11, 1954, Max Born said
about this fundamental principle of quantum mechanics: "... It shows that not only the determinism of
classical physics must be abandoned, but also the naive concept of reality which looked upon atomic particles
as if they were very small grains of sand. At every instant a grain of sand has a definite position and velocity.
This is not the case with an electron. If the position is determined with increasing accuracy, the possibility
of ascertaining its velocity becomes less and vice versa."

The technology enabling us to build smaller and faster computing engines encounters other
physical limitations as well. We are limited in our ability to increase the density and the
speed of a computing engine. Indeed, the heat produced by a super dense computing engine
is proportional to the number of elementary computing circuits, thus to the volume of
the engine; the heat dissipated grows as the cube of the linear size of the device. To prevent
the destruction of the engine we have to remove the heat through a surface surrounding the
device. Hence, our ability to remove heat increases as the square of the linear size while the
amount of heat increases with its cube.
Moreover, if there is a minimum amount of energy dissipated to perform an elementary
operation, then to increase the speed, thus the number of operations performed each second
by the computing engine, we require a linear increase in the amount of energy dissipated by
the device. The computer technology of the year 2000 requires some 3 × 10^-18 Joules per
elementary operation. Even if this limit is reduced, say, 100-fold, we shall see a ten-fold
increase in the amount of power needed by devices operating at a speed 10^3 times larger than
the speed of today's devices.
In 1992 Ralph Merkle from Xerox PARC calculated that a 1 GHz computer operating at
room temperature, with 10^18 gates packed in a volume of about 1 cm^3, the size of a sugar
cube, would dissipate 3 MW of power [20]. A small city with 1,000 homes each using 3 kW
would require the same amount of power; a 500 MW nuclear reactor could only power some
166 such circuits.
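As a quick sanity check of these figures, the arithmetic can be reproduced in a few lines of Python (our own illustration; the derived energy-per-operation value is implied by Merkle's numbers, not quoted in the text):

```python
gates = 1e18      # gates packed in ~1 cm^3
clock_hz = 1e9    # 1 GHz clock
power_w = 3e6     # 3 MW dissipated, per Merkle's estimate

# Energy per gate operation implied by these figures (our derivation):
energy_per_op = power_w / (gates * clock_hz)
print(f"energy per gate operation: {energy_per_op:.1e} J")      # 3.0e-21 J

# A small city: 1,000 homes at 3 kW each.
print(f"city load: {1000 * 3e3 / 1e6:.0f} MW")                   # 3 MW

# How many such sugar-cube computers a 500 MW reactor could power:
print(f"circuits per 500 MW reactor: {int(500e6 // power_w)}")   # 166
```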
So classical mechanics and classical physics have built very rigid walls limiting our ability to
compute faster and cheaper. We need to look elsewhere and consider a revolutionary rather than
an evolutionary approach to computing. The direction to explore is quantum mechanics.
Quantum theory does not play only a supporting role by prescribing the limitations of physical
systems used for computing and communication. Quantum properties such as uncertainty,
interference, and entanglement form the foundation of a new brand of theory, quantum
information theory, where computational and communication processes rest upon fundamental
physics. Quantum computing is not a fiction. We have a reasonable hope to build quantum
computers during the next few decades and we expect amazing results.
"Quantum" is a Latin word meaning some quantity, e.g., temperature, or some definite
amount of something, e.g., two teaspoons of sugar, a glass of water, and so on. In physics
it is used with the same meaning as the word "discrete" in mathematics, i.e., some quantity
or variable that can take only sharply defined values as opposed to a continuously varying
quantity.

1.2 A Qubit of History


Quantum computing is the result of a marriage between two great discoveries of the twentieth
century, quantum mechanics and the general purpose computer.
It all started more than one hundred years ago when, in 1900, Max Planck proposed an amazing
solution to a puzzling problem, the so-called ultraviolet catastrophe. Contrary to experimental
evidence and even to common sense, classical physics predicted that the intensity of radiation
emitted by a hot body increases without any limit as the frequency of radiation increases.
A hot body in equilibrium would radiate an infinite amount of energy, and since this is a
physical impossibility it followed that thermal equilibrium was impossible, which was
absurd. Planck calculated the so-called blackbody spectrum assuming that the body emitted
energy in discrete packets called quanta; his calculations agreed with the experiment and
avoided the contradiction of the classical theory. Shortly afterwards, in 1905, Albert Einstein
used Planck's quantum hypothesis to explain what happens when light shines on a negatively
charged metal plate, the phenomenon known as the photoelectric effect. Then, in 1913, Niels
Bohr proposed a quantum model of the atom. In 1925 Werner Heisenberg developed an
astoundingly new way of thinking and created quantum mechanics; Heisenberg's work was followed
shortly, in 1926, by the introduction of a wave equation by Erwin Schrödinger. The term
"quantum mechanics" was introduced by Max Born. During the years 1925 and 1926 he
published, with Heisenberg and Jordan, investigations on the principles of quantum mechanics
(matrix mechanics) and, soon after this, his own studies on the statistical interpretation of
quantum mechanics. All six of them received Nobel prizes in physics for their revolutionary
discoveries: Planck in 1918, Einstein in 1921, Bohr in 1922, Heisenberg in 1932, Schrödinger in
1933, and Max Born in 1954. The Nobel prize lectures of these great scientists are fascinating
reading and can be found at the Nobel prize Web site, http://www.nobel.se/physics.
In April 1936 an eccentric young Cambridge don by the name of Alan Turing dreamed up
an imaginary typewriter-like contraption called the Universal Turing Machine. The Universal
Turing Machine embodies the essential principle of the computer: a single machine which can
be turned to any well-defined task by being supplied with the appropriate program.
Turing reached the conclusion that every function which can be regarded as computable
can be computed by a universal computing machine, at about the same time as the
work of the American logician Alonzo Church was published. Turing's paper, "On Computable
Numbers, with an Application to the Entscheidungsproblem", referred to Church's work and
was published in August 1936 [60].
Almost ten years later, in the Fall of 1945, the world's first general purpose computer, the
brainchild of J. Presper Eckert and John Mauchly, became operational after several years of
development at the Moore School of the University of Pennsylvania. The ENIAC (Electronic
Numerical Integrator and Computer) could perform 5,000 addition cycles a second and do
the work of 50,000 people computing by hand; it could calculate the trajectory of a projectile
in 30 seconds, instead of the 20 hours necessary with a desk calculator. The ENIAC required
174 kW of power for its 17,468 vacuum tubes, 70,000 resistors, and 10,000 capacitors. Even
when the computer was not operational, the cost of electricity to keep the filaments of the
vacuum tubes heated and the fans running to dissipate the heat was about $650 per hour
[35].
Arguably, in the mid 1940s, computer simulation of physical systems and phenomena
was the motivating force for the development of the general purpose computers. It is not a
coincidence that general-purpose computers are based upon the so-called von Neumann
architecture, named after the renowned mathematician and physicist. In the early 1940s scientists
associated with the Manhattan Project at Los Alamos, including John von Neumann and
Richard Feynman, were feverishly developing a fission device. Sophisticated calculations
were necessary and the hundreds of human calculators employed to solve large numerical
problems were simply not sufficient. The Manhattan Project first resorted to the use of
tabulating machines and calculators and then got access to the ENIAC. Nicholas Metropolis
and Stanley Frankel, two other physicists from the Manhattan Project, had the honor of
running the first test programs once the ENIAC became operational; the problem they solved
remains classified even today. Later that year, Stanislaw Ulam doubted Edward Teller's design
of a thermonuclear device and needed to compute the results of a thermonuclear reaction at
increments of one ten-millionth of a second, using the ENIAC.
Some question the paternity of the ideas in the 1946 report co-authored by John von Neumann
[21] which proposed the development of a new computer called the EDVAC (Electronic
Discrete Variable Automatic Computer). This report is probably the reason why we talk
today about the von Neumann architecture rather than the Eckert-Mauchly architecture, but this is
beside the point for our topic. We only want to stress that numerical simulation became a
new investigative tool in the mid twentieth century. Numerical simulation complements the two
traditional exploratory methods of science, experimental work and theoretical modelling.

Table 1: Projected evolution of VLSI technology.

  Year                                            2002   2005   2008   2011
  Minimum feature size (µm, 10^-6 meter)          0.13   0.10   0.07   0.05
  Memory: bits per chip (billions, 10^9)             4     16     64    256
  Logic: transistors per cm^2 (millions, 10^6)      18     44    108    260

In 1948 Claude Shannon published "A Mathematical Theory of Communication" in the
Bell System Technical Journal. This paper founded a new discipline, information theory,
and proposed a linear model of a communications system. Shannon considered a source
of information which generates words composed of a finite number of symbols transmitted
through a channel; if x_n is the n-th symbol produced by the source, then (x_n) is a stationary
stochastic process [50].
The first commercial computer, UNIVAC I, capable of performing 1,900 additions/second,
was introduced in 1951; the first supercomputer, the CDC 6600, designed by Seymour Cray,
was announced in 1963; IBM launched System/360 in 1964; a year later DEC unveiled the first
commercial minicomputer, the PDP 8, capable of performing some 330,000 additions/second.
A very large percentage of the cycles of all these systems was devoted to numerical simulation.
In 1977 the first personal computer, the Apple II, was marketed, and the IBM PC, rated at
about 240,000 additions/second, was introduced 4 years later, in 1981.
Today's computers are very different from the ENIAC. In 2001 a high-end PC had a
1.5 GHz CPU and 256 MB of memory. What about the future?^2 Changes in the VLSI
technologies and computer architecture are projected to lead to a 10-fold increase in
computational capabilities over the next 5 years and a 100-fold increase over the next 10 years.
Towards the end of 2003 the same PC is projected to have an 8 GHz processor and a 12 GB
memory. For 2010 the CPU speed is projected to be 64 GHz and the main memory to increase
to 16 GB.

^2 Predicting the future is a business involving considerable risks and potential ridicule. Cases in point: in
1943, discussing the future of the computer industry, Thomas J. Watson, the chairman of the IBM corporation,
said: "I believe that there is a market for maybe five computers."; in 1949 a widely circulated popular science
magazine speculated: "Computers in the future may weigh no more than 1.5 tons."

By 2002 the minimum feature size was 0.13 µm and it is expected to decrease to 0.05 µm
in 2011. As a result, during this period the density of memory bits will increase 64-fold and
the cost per memory bit will decrease 5-fold. It is projected that during the same period the density
of transistors will increase 7-fold, the density of bits in logic circuits will increase 15-fold, and
the cost per transistor will decrease 20-fold (see Table 1).
While solid-state technology continued to improve at a very fast pace, making our
computers faster and cheaper, theoretical physicists became restless and started asking questions
about the physical limitations of our computational models. They reasoned that if there
is a minimum amount of power dissipated for the execution of a logical step, then the faster
computers become, the more power is needed for their operation. Moreover, the amount of
power dissipated in a given interval of time increases as computers become faster and
faster, and it becomes harder and harder to deal with the heat generated during this process.
This motivated Rolf Landauer to investigate the heat generated during the computational
process, starting from the basic laws of thermodynamics. His results were published in 1961
[34].
Following in Landauer's footsteps, in 1973 Charles Bennett studied the logical reversibility
of computations. This concept is discussed in Section 6.7; in layman's terms, logical reversibility
means that once a computation is finished one can retrace every step and reconstruct the
data used as input of every step. Bennett argued that Turing machines and any other general-
purpose computing automata are logically irreversible [9]. A device is said to be irreversible
if its transfer function does not have a single-valued inverse. Bennett developed a theoretical
framework proving that reversible general-purpose computing automata can be built
and that their construction makes plausible the possibility of building thermodynamically
reversible computers. There is no positive lower bound on the energy dissipated per logical step
by a thermodynamically reversible computer; thus, in principle, such a device could compute
dissipating little, if any, energy at all.
The widespread interest in quantum computing was probably generated by the contributions
of Richard Feynman. In 1981 he gave a talk with the title "Simulating Physics with
Computers" at a meeting held at MIT [20]. Feynman argued that in traditional numerical
simulations such as weather forecasting or aerodynamic calculations, computers model
physical reality only approximately. He advanced the idea that physics was computational
and that a computer could do an exact simulation of a physical system, even of a quantum
system. He identified quantum mechanics as the most important ingredient for constructing
computational models of physics.
Feynman speculated that in many instances computation can be done more efficiently
by using quantum effects [31]. His ideas were inspired by previous work of Bennett [9, 10]
and Benioff [8]. Starting from basic principles of thermodynamics and quantum mechanics,
Feynman suggested that problems for which polynomial time algorithms do not exist could be
solved; computations for which polynomial algorithms exist could be sped up considerably
and even made reversible.
Ed Fredkin, Tommaso Toffoli, and Norman Margolus, associated for some time with the
Laboratory for Computer Science at MIT, contributed to the field of quantum computing.
The inventor of a reversible quantum gate, Fredkin is probably the only college dropout who
was ever appointed as a professor at MIT; he also owns a private island in the British Virgin
Islands.
In 1985 David Deutsch reinterpreted the Church-Turing conjecture as "every finitely
realizable physical system can be perfectly simulated by a universal model computing machine
operating by finite means" and conceived a universal quantum computer [24].
In 1994 Peter Shor developed a clever algorithm for factoring large numbers [51] and
generated a wave of excitement for the newly founded discipline of quantum computing.

1.3 Quantum Information


Until recently, information and computation models had a feeble connection with physics.
Complexity theory addressed the time and space complexity of algorithms. Time and space
are physical attributes, thus a connection with physical reality is still maintained. Information
theory was concerned with entropy as a measure of the uncertainty associated with a random
event and its relationship with information transmission over communication channels. It
is fair to say that our information and computation models lacked physical awareness and
required little understanding of the basic laws of physics. The laws of quantum mechanics
were viewed more as an annoyance than a necessity; they would be needed at a distant point
in the future, when the physical systems used to store and transfer information would consist
of only a few atoms.
Now this view is challenged by very significant results in a new discipline called Quantum
Computing. Quantum computing and quantum information theory are concerned with
the transmission and processing of quantum states and the interactions of such quantum
information with classical information.
A quantum bit, or qubit for short, is a microscopic system used to store quantum information.
As opposed to a bit, which can be in one of the states "0" and "1", a qubit can
exist in a continuum of states. Moreover, we can measure the value of a bit with certainty
without affecting its state, while the result of measuring a qubit is non-deterministic and the
measurement alters its state. In contrast, in existing computers, if a bit is in, say, state "0",
then when it is measured the result is "0" and the state of this bit remains "0".
A classical communication channel allows electromagnetic waves or optical phenomena
to propagate and is characterized by its capacity, the maximum quantity of information
that can be transmitted through the channel per unit of time. A quantum communication
channel is a physical system capable of delivering quantum systems more or less intact from
one place to another and is characterized by (i) a capacity C for transmitting classical data,
(ii) a lower capacity Q for transmitting intact quantum states, and (iii) a capacity Q2 for
transmitting intact quantum states with the assistance of a two-way classical side-channel.
Often Q ≤ Q2 ≤ C.
The path from classical information to quantum information is a process of extension,
refinement, and completion of our knowledge. It follows the evolution of our thinking in other
areas of science. Consider for example number theory: one started with a concept inspired
by physical reality, the positive integers; then it was realized that one needs to define the
additive inverse of a positive integer, and negative integers were born; soon one discovered
that the multiplicative inverse of an integer is needed, and rational numbers were introduced;
after a while irrational numbers, and with them the real numbers, were added to the family of numbers.

1.4 Quantum Computers


Let us take a closer look at our computing machines and get a glimpse of the potential
advantages of their quantum incarnation. A classical computing engine is a deterministic system
evolving from an input state to an output state. All the initial states the system could be in
at the beginning of the computation, as well as all the states traversed by the system during its
dynamic evolution, have some canonical labelling; the label of each state can be measured by an
external observer. The measured output state label is a deterministic function f of the input
state label; we say that the engine computes the function f. Two classical computing engines
are equivalent if, given the same labelling of their input and output states, they compute the
same function f [24].
Quantum computers are stochastic engines because the state of a quantum system is
uncertain; a certain probability is associated with any possible state the system can be in.
The output states of a stochastic engine are random: the label of the output state cannot
be discovered, it is not observable. All we can do is to label a set of pairs consisting of an
output observable and a measured value of that observable. In layman's terms, "observable"
stands for a characteristic or attribute; in quantum mechanics we say that each pair consists
of a Hermitian operator and one of its eigenvalues, as we shall see in Section 2.11. There is an
asymmetry between the input and the output of a quantum engine, reflecting the asymmetry
between preparation and measurement in quantum mechanics.
Now a few words about quantum mechanics. It is a mathematical model of the physical
world. The model allows us to specify states, observables, measurements, and the dynamics
of systems. As we shall see later, a Hilbert space, a space of n-dimensional complex vectors, is
the center stage of quantum mechanics. A Hilbert space is a mix of Trafalgar Square, Place
Pigalle, and Times Square... where you could meet Heisenberg, von Neumann, Schrödinger,
and other luminaries.
A Hilbert space is indeed a very large space... and we need every bit of it. Let us revisit
the statement made earlier that numerical simulation of physical processes was and continues
to be the motivation for the development of increasingly more powerful computing engines.
Whether scientists look up at the sky and try to answer fundamental questions related
to the evolution of the Universe, or try to decipher the structure of matter, they need to
simulate increasingly more complex systems. One measure of the complexity of a system is the
number of states the system can be in. For example, the theory of black hole thermodynamics
predicts [24] that a system enclosed by a surface with an area A has a number N(A) of
observable states given by:

N(A) = e^{Ac^3/4ℏG}

with c = 3 × 10^10 cm/second the speed of light, ℏ = 1.054 × 10^-34 Joule seconds the
reduced Planck's constant, and G = 6.672 × 10^-8 cm^3 g^-1 s^-2 the universal gravitational constant.
Quantum mechanics allows us to accommodate extremely large state spaces. A quantum
bit, or qubit, is a quantum particle used to store information. Mathematically, a qubit |ψ⟩ is
represented as a vector in a two-dimensional complex vector space (the terms used now are
defined in Section 11). In this space a vector has two components and the projections of the
vector on the basis vectors are complex numbers. We use Dirac's notation for the vector |ψ⟩, with α_0
and α_1 complex numbers and with |0⟩ and |1⟩ two vectors forming an orthonormal basis for
this vector space [27]:

|ψ⟩ = α_0 |0⟩ + α_1 |1⟩.

While a classical bit can be in one of two states, 0 or 1, the qubit can be in the states |0⟩
and |1⟩, called computational basis states, and also in any state that is a linear combination
of these states. This phenomenon is called superposition.
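The following sketch (our own illustration, not part of the original text) represents a qubit as a unit vector in C^2 using numpy; the particular amplitudes α_0 and α_1 chosen here are arbitrary:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)   # |0>
ket1 = np.array([0, 1], dtype=complex)   # |1>

# A superposition state |psi> = alpha_0 |0> + alpha_1 |1>
alpha0, alpha1 = 1 / np.sqrt(2), 1j / np.sqrt(2)
psi = alpha0 * ket0 + alpha1 * ket1

# The squared moduli of the amplitudes must sum to 1 ...
assert np.isclose(np.vdot(psi, psi).real, 1.0)

# ... and they give the probabilities of the two measurement outcomes.
print("P(0) =", abs(alpha0) ** 2, " P(1) =", abs(alpha1) ** 2)   # 0.5, 0.5
```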
Consider now a system consisting of n such particles, whose individual states are described
by vectors in the two-dimensional vector space. In classical mechanics the individual states
of particles combine through the Cartesian product. The possible states of a system of n
particles form a vector space of 2n dimensions; given n bits, we can construct 2^n n-tuples
and describe a system with 2^n states.
Individual state spaces of n particles combine quantum mechanically through the tensor product. If
X and Y are vector spaces, then their Cartesian product X × Y has dimension dim(X) + dim(Y), while the
tensor product X ⊗ Y has dimension dim(X) · dim(Y).
In a quantum system with n qubits, the state space therefore has 2^n dimensions. The extra
states that have no classical analog are called entangled states. The catch is that even though a
quantum bit can be in one of infinitely many superposition states, when the qubit is measured,
the measurement changes the state of the particle to one of the two basis states; from one
qubit we can only extract a single classical bit of information.
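A short numpy sketch (our own illustration) makes the dimension count concrete; the choice of the state (|0⟩+|1⟩)/√2 for each qubit is arbitrary:

```python
import numpy as np

# One qubit: a vector of dimension 2.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

state = plus
for _ in range(1, 10):             # compose a 10-qubit state
    state = np.kron(state, plus)   # dimensions multiply: 2 -> 4 -> ... -> 2**10

print(len(state))                  # 1024 = 2**10 complex amplitudes
# A classical register of 10 bits is described by 10 values; the quantum
# state of 10 qubits requires 2**10 amplitudes.
```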

The argument in favor of quantum computing is that in quantum systems the amount
of parallelism increases exponentially with the size of the system; in other words, an
exponential increase in parallelism requires only a linear increase in the amount of space needed.
The major difficulty lies in the fact that access to the results of a quantum computation is
restricted: accessing the results disturbs the quantum state. The process of disturbing the
quantum state due to the interaction with the environment is called decoherence.
Two photons can be in a state of close coupling with each other, an entangled state.
Entanglement is the exact translation of the German term Verschränkung, used by Schrödinger,
who was the first to recognize this quantum effect. It simply means that two quantum particles
share a joint state and it is not possible to describe one of the particles in isolation. Even
when the entangled particles are separated from one another, a change of state of one of the
entangled particles instantaneously affects the other particle and makes it change its state.
Later, in Chapter 2, we see that entanglement occurs for other quantum systems as well. For
example, consider a singlet state, a pair of electrons on the same orbit around the
nucleus of an atom. The Pauli exclusion principle dictates that the two electrons cannot be in
identical states. They must have their spins oriented in opposite directions because they are
on the same orbit, thus have the same energy. If, as a result of an experiment, one of the
electrons is made to change the orientation of its spin, then a simultaneous measurement of
the other finds it in a state with the opposite spin.
Let us try an intuitive example of entanglement taken from... our daily life! Alice and Bob,
two characters of many cryptography texts, get married; their lives become
entangled. After a few months they find out that Alice is pregnant. Shortly afterwards, Bob
takes off for an intergalactic voyage to Andromeda. The very moment Alice gives birth to
their child, named Samantha, Bob's state changes instantly, though he may be at the other
end of our galaxy: Bob becomes a father. An external observer could see the baby and decide
that Bob's state has changed.^3

^3 Of course, Bob learns of his new state only when a message sent through a classical communication channel
reaches him, but this is beside the point.
Entanglement is a very puzzling phenomenon. Feynman writes [29]: "A description of the
world in which an object can apparently be in more than one place at the same time, in which
a particle can penetrate a barrier without breaking it, in which widely separated particles can
cooperate in an almost psychic fashion, is bound to be both thrilling and bemusing."
In a recent paper [14], Charles Bennett and Peter Shor, two of the pioneers of quantum
computing and quantum information theory, discuss the similarities and dissimilarities
between classical and quantum information. They point out that "classical information can be
copied freely, but can only be transmitted forward in time, to a receiver in the sender's forward
light cone. Entanglement, by contrast, cannot be copied, but can connect any two points
in space-time. Conventional data-processing operations destroy entanglement, but quantum
operations can create it, preserve it, and use it for various purposes, notably speeding up
certain computations and assisting in the transmission of classical data (quantum superdense
coding) or intact quantum states (teleportation) from a sender to a receiver."
These facts are intellectually pleasing, but two questions come immediately to mind:
1. Can such a quantum computer be built?
2. Are there algorithms capable of exploiting the unique possibilities opened by quantum
computing?
The answer to the first question is that only five-bit quantum computers have been built so
far; several proposals to build quantum computers using nuclear magnetic resonance, optical
and solid-state techniques, and ion traps have been studied.
The answer to the second question is that problems in integer arithmetic, cryptography,
and search have surprisingly efficient solutions in quantum computing.
We have a decent hope that quantum computers can be built and will solve mathematical
problems that are unsolvable today and will remain so even if we assume that Moore's law^4,
governing the rate of increase of the speed of classical devices, will continue to hold for the
next few decades. But uncertainty, manifested through sophisticated quantum phenomena
such as interference and entanglement, is critical for understanding quantum computing and
quantum information theory.

^4 Moore's law states that the speed of microprocessors doubles every 1.8 years.
So we need quantum mechanics after all; we need to develop gradually the knowledge
and understand concepts such as quantum states, intact quantum states, and entanglement. It is
true that we do not need quantum mechanics to explain most of the phenomena we observe
in our daily life. Yet, there are experiments involving light, with very intriguing results, that
cannot be explained without invoking the principles of quantum mechanics. We discuss several
experiments to set us up in the frame of mind needed for understanding quantum information.
First, we present a simple experiment revealing the granular nature of light. Next, we
address the non-deterministic effects and discuss Feynman's probability rules for quantum
systems. Finally, we present a simple experiment illustrating the effect of measurements and
the collapse of superposition states.

1.5 The Wave and the Corpuscular Nature of Light


The nature of light was a constant source of interest for philosophers and then for physicists.
Aristotle believed that white light was a basic single entity. An early treatise on the subject,
Kepler's Optics, did not challenge this idea. Isaac Newton's first work as the Lucasian
Professor at Cambridge was in optics; in January 1670 he delivered his first lecture on the
subject. The chromatic aberration in a telescope lens convinced Newton that Aristotle's
hypothesis was false. Newton argued that white light is a mixture of many different types
of rays which are refracted at slightly different angles, and that each type of ray produces
a different spectral color. Newton's Opticks appeared in 1704; it dealt with the theory of
light and color; it covered the diffraction of light and Newton's rings. To explain some of his
observations he had to use a wave theory of light in conjunction with his corpuscular theory.
At the end of the 19th century James Clerk Maxwell proved that light is a form of electromagnetic
radiation. Yet, just as Maxwell's theory was gaining universal acceptance, the photoelectric
effect and its quantum explanation involving photons, provided by Einstein in 1905,
seemed to signal a return to the Newtonian model of light. Photons, though massless,
are like any other particles: they carry both momentum and energy and act like... billiard
balls. When colliding with the electrons at the surface of a metal, the photons can knock them
free and leave behind a positive charge. This is the simplified explanation of the photoelectric
effect. The photoelectric effect is often used to measure the intensity of light. A device called
a photomultiplier allows us to detect light of very low intensity, and even individual photons.
The contemporary theory regards light as a flux of photons while, at the same time, light
exhibits the properties of a wave. To understand this duality consider a device called
a beam splitter (BS), a half-silvered mirror; see Figure 1. A beam of light falling on a beam
splitter is split into two components of equal intensity, one transmitted and one reflected.
The color of the light is not altered by the beam splitter, a behavior consistent with a wave.

[Figure 1 about here: (a) a beam splitter with an incident beam, a reflected beam reaching detector D1 and a transmitted beam reaching detector D2; (b) a series of beam splitters with detectors D1, D3, D5, D7 on the reflected paths and D2 after the last splitter.]

Figure 1: (a) A beam splitter. (b) A series of beam splitters.

If we start decreasing the intensity of the incident light we are able to observe the granular
nature of light. Imagine that we send a single photon. Then either detector D1 or detector
D2 in Figure 1(a) senses the photon. If we repeat the experiment involving a single photon
over and over again, we observe that each one of the two detectors records the same number
of events (one event is the detection of a photon).
This is puzzling; could there be hidden information which controls the behavior of a
photon? Does a photon carry a "gene", such that one with a "transmit" gene continues and reaches
detector D2 and another with a "reflect" gene ends up at D1 [37]? If this were true, the two genes
should have an equal probability of occurrence throughout the entire population of photons.
It is not difficult to dismiss this genetic view of photon behavior: consider the setup in
Figure 1(b), with a cascade of beam splitters. As before, we send a single photon, repeat
the experiment many times, and count the number of events registered by each detector.
According to our theory we expect the first beam splitter to decide the fate of an incoming
photon; the photon is either reflected by the first beam splitter or transmitted by all of them.
Thus, only the first and the last detectors in the chain are expected to register an equal number
of events. Amazingly enough, the experiment shows that all the detectors have a chance to
register an event; this result discredits our theory. This leads us to seek another possible
explanation: a photon pauses when reaching a beam splitter, tosses a fair binary coin, and
the result of the toss determines the fate of the photon emerging from that beam splitter.
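Under this coin-tossing hypothesis the statistics are easy to work out; the sketch below (our own, with an arbitrary chain length) shows that every detector in the cascade is expected to fire, detector k with probability (1/2)^k, consistent with the experiment:

```python
# Coin-toss model of the cascade in Figure 1(b): at each beam splitter the
# photon is reflected (toward a detector) with probability 1/2 or transmitted
# (toward the next splitter) with probability 1/2.
n = 4   # number of beam splitters in the chain (arbitrary choice)
for k in range(1, n + 1):
    print(f"detector on splitter {k}: probability {0.5 ** k}")
print(f"final transmitted detector: probability {0.5 ** n}")
```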

1.6 Probabilities and Quantum States


Uncertainty and probabilistic behavior are quintessential properties of quantum systems; see
also Section 5.1. Heisenberg's uncertainty (indetermination) principle states that one cannot
measure both the position x and the momentum p of a particle with arbitrary precision. The
uncertainty in the measurement of the position, Δx, and the uncertainty in the measurement
of the momentum, Δp, must satisfy the following inequality:

Δx · Δp ≥ ℏ/2

with ℏ = h/2π = 1.054 × 10^-34 Joule seconds and h = 6.626 × 10^-34 Joule seconds.
A similar inequality exists for the measurement of the pair (W, t), with W the energy and
t the time:

ΔW · Δt ≥ ℏ/2
These inequalities reflect the impossibility of knowing precisely the state of a quantum system
and the fact that any measurement disturbs the system being measured.
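As a numerical illustration (our own; the electron mass and the 1 Å confinement are not from the text), the bound implies a substantial momentum uncertainty for a particle confined to atomic dimensions:

```python
hbar = 1.054e-34      # J*s
m_e = 9.109e-31       # kg, electron mass (assumed for this example)
dx = 1e-10            # 1 angstrom, roughly atomic size

dp_min = hbar / (2 * dx)                       # Delta p >= hbar / (2 Delta x)
print(f"Delta p >= {dp_min:.2e} kg m/s")       # ~5.3e-25 kg m/s
print(f"velocity scale ~ {dp_min / m_e:.2e} m/s")   # ~5.8e5 m/s
```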
Now we review some basic facts of probability theory. Consider an event A that may
occur in different ways; for example, assume that there are two different routes to reach the
summit of K2, call them B1 and B2, and denote by P(A) the probability of reaching the summit
and by P(B_i) the probability of taking route i, i ∈ {1, 2}. Then, according to Bayes's rule:

P(A) = P(A/B1) P(B1) + P(A/B2) P(B2)

with P(A/B_i) the conditional probability of reaching the summit via route i. In the general
case, when there are n different alternatives:

P(A) = Σ_{i=1}^{n} P(A/B_i) P(B_i).
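A worked instance of this rule, with made-up numbers for the two routes, might look as follows:

```python
# Total probability of reaching the summit (illustrative numbers only).
p_route = [0.6, 0.4]           # P(B1), P(B2): probability of taking each route
p_summit_given = [0.3, 0.5]    # P(A/B1), P(A/B2): success probability per route

p_summit = sum(pa * pb for pa, pb in zip(p_summit_given, p_route))
print(p_summit)                # 0.3*0.6 + 0.5*0.4 = 0.38
```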

Things happen differently in quantum mechanics: Bayes's rule is replaced by a different
composition rule, called Feynman's rule [31]. To illustrate the new probability rules consider
the experiment in Figure 2. A photon emitted by S1 or by S2 can be either reflected (R)
or transmitted (T) by BS1 and BS2 and will eventually be detected either by D1 or by D2.
We observe experimentally that a photon emitted by S1 is always detected by D1 and never
by D2, and one emitted by S2 is always detected by D2 and never by D1. This observation
is justified by Feynman's rule: if an event may occur in two or more indistinguishable ways,
then the probability amplitude of the event is the sum of the probability amplitudes of each
case considered separately.
Let us first concentrate on the first beam splitter. We can distinguish between a photon
coming from S1 and one coming from S2 because they have different directions; call them
"direction 1" and "direction 2". An incoming photon is reflected by BS1 with probability p_R
and transmitted with probability p_T. Obviously p_R + p_T = 1.
After emerging from a beam splitter the state of the photon is described by a unit
vector (its length is one) with random orientation, |ψ⟩. Given two basis vectors, orthogonal
to each other, |0⟩ and |1⟩, corresponding to the states "coming from direction 1" and "coming
from direction 2", and two complex numbers α_0 and α_1, the state of the photon is given by:

|ψ⟩ = α_0 |0⟩ + α_1 |1⟩.

Now:

p_T = |α_0|^2,    p_R = |α_1|^2.
[Figure 2 about here: (a) the arrangement with sources S1 and S2, beam splitters BS1 and BS2, reflecting mirrors U and L, and detectors D1 and D2; (b), (c) the probability amplitude vectors in the (|0⟩, |1⟩) plane for photons arriving from directions 1 and 2.]

Figure 2: (a) Two sources, S1 and S2, generate photons; only one photon is generated at a
time. There are two beam splitters, BS1 and BS2, and two detectors, D1 and D2. According
to Feynman's rule a photon generated by S1 has zero probability of reaching detector D2.
(b) The probability amplitudes of a photon from source S1 emerging from the beam splitter
BS1. (c) The probability amplitudes of a photon from S2 emerging from the beam splitter BS1.
Here q = 1/√2.

α_0 is the probability amplitude for the photon to be transmitted and α_1 is the probability
amplitude for it to be reflected.
In our experiment we assume that the two probabilities are equal: a photon has
an equal chance of being transmitted or reflected; this condition is ensured by the construction
of the beam splitter. The vectors describing the states of the photons emerging from the first
beam splitter are shown in Figures 2(b) and (c): if the photon comes from direction 1, then
its probability amplitudes are α_0 = 1/√2 and α_1 = 1/√2; if it comes from direction 2,
α_0 = 1/√2 and α_1 = -1/√2. The two vectors must be distinct, otherwise the system would
not be reversible and we could not trace a photon back to its source.
We follow the flight of a photon from a source (S1 or S2) to the first beam splitter (BS1),
then to one of the mirrors (U or L), then to the second beam splitter (BS2), and finally to one
of the detectors (D1 or D2). The following sequences of events are possible: TT, the photon is
transmitted by both beam splitters; RR, it is reflected by both beam splitters; TR, it is
transmitted by the first and reflected by the second; and RT, it is reflected by the first and
transmitted by the second. The four cases are illustrated in Figures 3(a)-(d), respectively.

[Figure 3 about here: the four paths of a photon emitted by S1, with the probability amplitude of each beam splitter event marked along the path: (a) TT, amplitude (+q)(+q); (b) RR, amplitude (+q)(+q); (c) TR, amplitude (+q)(-q); (d) RT, amplitude (+q)(+q).]

Figure 3: When a photon is emitted by S1 there are four possible events. (a) TT: the photon
is transmitted by both beam splitters, with probability amplitude q^2. (b) RR: the photon
is reflected by both beam splitters, with probability amplitude q^2. (c) TR: the photon is
transmitted by the first beam splitter and reflected by the second, with probability amplitude
-q^2. (d) RT: the photon is reflected by the first beam splitter and transmitted by the second,
with probability amplitude q^2. As before, q = 1/√2.

Let us now examine the photon coming from S1. It is transmitted at BS1 with a probability
amplitude of 1/√2; then it changes direction at the mirror L and arrives at the beam
splitter BS2 from direction 2, thus its probability amplitude to be reflected is -1/√2.
The probability amplitude of the event TR is the product of the two probability amplitudes,
(1/√2) · (-1/√2) = -1/2.
There is another path for a photon from S1 to reach D2, the one corresponding to the
event RT. This time the direction of the photon is not changed by mirror U; the photon
arrives at BS2 from direction 1, and its probability amplitude to be transmitted is 1/√2.
The probability amplitude of this event is (1/√2) · (1/√2) = 1/2.
Detector D2 cannot distinguish between a photon transmitted by BS1 and reflected by
BS2 and one reflected by BS1 and transmitted by BS2, so we have to add the two probability
amplitudes: TR + RT = -1/2 + 1/2 = 0. The probability of a photon coming from S1
reaching D2 is the squared modulus of this amplitude, so it is also 0.
It is easy to see in Figure 2 that a photon coming from S1 reaches detector D1 when
either the TT or the RR event occurs. The probability amplitude of the event TT is (1/√2) ·
(1/√2) = 1/2; likewise, the probability amplitude of RR is (1/√2) · (1/√2) = 1/2. The two
amplitudes add up to 1, so a photon from S1 is detected by D1 with probability 1.
It follows that a beam splitter is characterized by four numbers: the probability amplitudes
of a photon coming from direction 1 to be transmitted (1/√2) and to be reflected (1/√2), as well
as the probability amplitudes of a photon coming from direction 2 to be transmitted
(1/√2) and to be reflected (-1/√2).
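These four numbers let us reproduce the whole bookkeeping by machine. The sketch below (our own; the path-to-detector assignments follow the text) sums the amplitudes of the indistinguishable paths and squares the result:

```python
import numpy as np

# Amplitudes at each beam splitter for the experiment in Figure 2:
# transmit/reflect from direction 1: +q / +q; from direction 2: +q / -q.
q = 1 / np.sqrt(2)

TT = (+q) * (+q)   # transmitted by BS1, transmitted by BS2 -> reaches D1
RR = (+q) * (+q)   # reflected by BS1, reflected by BS2     -> reaches D1
TR = (+q) * (-q)   # transmitted by BS1, reflected by BS2   -> reaches D2
RT = (+q) * (+q)   # reflected by BS1, transmitted by BS2   -> reaches D2

# Indistinguishable paths: add amplitudes first, then square the modulus.
print("P(D1) =", abs(TT + RR) ** 2)   # 1.0 -- always detected at D1
print("P(D2) =", abs(TR + RT) ** 2)   # 0.0 -- never detected at D2
```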
Another experiment is illustrated in Figure 4. A group at the University of Rochester
created a source of light able to generate two photons simultaneously in two separate beams;
this is by no means trivial but, to keep the presentation focused, we do not discuss such
details. Each photon is directed by a reflecting mirror to a beam splitter [36]. When reaching the
beam splitter each photon has a fifty-fifty chance of being transmitted or reflected.

[Figure 4 about here: a two-photon source, reflecting mirrors U and L, a single beam splitter, and detectors D1 and D2.]

Figure 4: A source of light generates two photons simultaneously. Each photon is reflected
by a mirror and directed to a beam splitter. When reaching the beam splitter each photon
has a fifty-fifty chance of being transmitted or reflected. Two detectors are used to determine
the outcome of the experiment. Both photons may reach the same detector, when one of the
photons is reflected and the other one transmitted; or each detector gets one photon (we
observe a coincidence), when both photons are either reflected or transmitted. Surprise: a
coincidence never occurs!

In the realm of classical physics two outcomes are possible:

(i) The two photons have different fates: one is reflected and the other one is transmitted by
the beam splitter. Both photons end up either at detector D1 or at detector D2. When the
photon from mirror U is reflected and the one from mirror L is transmitted, both photons end
up at detector D1. When the scenario is reversed, both photons end up at detector D2.

(ii) Both photons have identical fates: they are either both reflected or both transmitted by the beam
splitter. In this case detectors D1 and D2 get one photon each and signal a coincidence. If both
photons are transmitted, the one coming from mirror U ends up at detector D2 and vice-versa.
When both photons are reflected, the one from U ends up at detector D1 and the one from L
at detector D2.
Well, to our surprise, the second outcome, the coincidence, is never observed! But
now we can explain why: the beam splitter is characterized by the four numbers giving
the probability amplitudes for the two photons to be transmitted and reflected,
((pT, pR)_1, (pT, pR)_2) = (+q, +q, +q, -q). Call TT the event that both photons are transmitted
by the beam splitter and RR the event that both are reflected. Figure 5(a) shows that the
probability amplitude for the TT event is 1/2 and Figure 5(b) shows that the one for the RR
event is -1/2. The two photons are indistinguishable after the beam splitter, so the probability
amplitude to have both transmitted or both reflected is the sum of the two probability
amplitudes, which is zero. We never observe a coincidence. When only one photon is emitted
there is an equal chance that it is detected either by D1 or by D2, as expected.

Figure 5: (a) The probability amplitude for the TT case is (+q)(+q) = q². (b) The probability amplitude for the RR case is (+q)(−q) = −q². There are two indistinguishable ways for a coincidence to occur, TT and RR, thus the probability amplitude of this event is zero. Here q = 1/√2.
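The same bookkeeping explains the missing coincidences. A minimal Python sketch (ours), using the convention (+q, +q, +q, −q) from above:

from math import sqrt

q = 1 / sqrt(2)

# Two photons arrive at the beam splitter, one from each side.
# A coincidence (one photon per detector) happens when both are
# transmitted (TT) or both are reflected (RR).
amp_TT = (+q) * (+q)    # +1/2
amp_RR = (+q) * (-q)    # -1/2 (reflection from direction 2 carries the minus sign)

# The two ways are indistinguishable, so the amplitudes add first.
p_coincidence = abs(amp_TT + amp_RR) ** 2
print(p_coincidence)    # 0.0 -- a coincidence is never observed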

Milburn [36] calls this a quantum two-up experiment. He describes a game of chance, Australia's very own way to part a fool and his money. Two fair and identical coins are tossed up in the air. Four outcomes are possible: two heads (HH) with probability pHH = 1/4, two tails (TT) with pTT = 1/4, or one head and one tail (HT or TH) with pHT = pTH = 1/4. The spinner aims to toss the HH combination three times before tossing either a TT, or five consecutive odds (HT or TH).
In conclusion, our naive hypothesis that a photon tosses a fair coin after reaching a beam splitter and is transmitted with probability 1/2 and reflected with probability 1/2 is too simplistic. The state of a single photon after a beam splitter is uncertain; it is in a superposition state. Note that the probability amplitudes for transmission and reflection need not be equal, but the squares of their absolute values must sum to 1.

1.7 Superposition and Uncertainty

Figure 6: (a) A color separation system. (b) A hardness separation system. (c) Three measurement systems in cascade. We observe an equal number of white and black particles emerging from the third box, though we expected that no black particles enter the second box and that the second box cannot fabricate black particles.

Once we open the Pandora's box of quantum effects we need to discuss in more depth the process of measuring the state of a system. Let us assume that we have a quantum particle with two properties, color and hardness [3]. The color of a particle may be either white or black, and a particle may be hard or soft. The two properties are attributed to a particle in a totally random fashion: 50% of the particles are white and 50% are black; similarly, 50% are hard and 50% are soft.
Imagine that we can construct a color separation system, a box with a slit on the left hand side, one on the right hand side, and one at the top, see Figure 6(a). A beam of particles enters the box from the slit on the left and the beam is split by a color-based beam splitter; the white particles continue their path and exit the box through the right slit, while the black particles are deflected by 90 degrees and exit through the upper slit. We also construct a hardness separation system, Figure 6(b), similar to the color separation system. This time the beam splitter lets the soft particles continue and deflects the hard ones.

When we use the color separation system exactly 50% of the particles, the white ones, are allowed to continue; when we use the hardness separation system only 50% of incoming particles, the soft ones, are allowed to continue.
Now we design a more sophisticated experiment involving a sequence of three separation systems: first color, then hardness, and finally one more color separation system. They are aligned, allowing the beam of white particles emerging from the first box to enter the second separation box and then the soft particles emerging from the second box to enter the third one. Amazingly enough, we observe an equal number of white and black particles emerging from the third box. It looks like the hardness separation system is able to fabricate

black particles, though we have taken all possible precautions in building the hardness box: it only deflects the hard particles and lets the soft ones continue their path.

No refinement of the experimental set-up can possibly alter this unexpected outcome. We have identified a practical manifestation of the fact that the quantum state is a superposition of state projections on a set of basis vectors. The state of the particles emerging from the second box is a superposition of projections on the white and black basis vectors. Moreover, the measurement of one physical property disrupts the other, according to Heisenberg's uncertainty principle. Properties of quantum systems such as color and hardness are incompatible with each other; the measurement of one disrupts the other.
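A toy simulation makes the cascade quantitative. In the Python sketch below (ours; the basis relation is the standard assumption for this thought experiment, not stated explicitly in the text), hardness is taken as the measurement basis and white and black are the balanced superpositions of hard and soft:

import numpy as np

# Assumed toy model:
#   |white> = (|hard> + |soft>)/sqrt(2),  |black> = (|hard> - |soft>)/sqrt(2)
rng = np.random.default_rng(0)
hard, soft = np.array([1.0, 0.0]), np.array([0.0, 1.0])
white = (hard + soft) / np.sqrt(2)
black = (hard - soft) / np.sqrt(2)

def measure(state, basis):
    """Collapse `state` onto one of the two basis vectors, with
    probabilities given by the squared projections (Born rule)."""
    probs = np.array([abs(np.dot(b, state)) ** 2 for b in basis])
    probs /= probs.sum()
    idx = rng.choice(2, p=probs)
    return idx, basis[idx]

n_black, n_total = 0, 0
for _ in range(10000):
    state = white                              # survived the first color box
    idx, state = measure(state, [hard, soft])  # hardness box
    if idx != 1:                               # keep only the soft particles
        continue
    idx, state = measure(state, [white, black])  # second color box
    n_total += 1
    n_black += (idx == 1)

print(n_black / n_total)                       # ~0.5: half emerge black

The hardness measurement destroys the definite color of the particles: the soft beam entering the third box is again an equal superposition of white and black.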

Figure 7: The effect of changing the basis.

Once we change the set of orthonormal vectors used to represent the state of a quantum system, the projection on one of the basis vectors may itself be projected again on the new basis vectors. Figure 7 shows a vector |v⟩ projected on a basis formed by two orthogonal vectors, the projections being a and b. Then one of the projections, the vector with magnitude b, is projected onto a new basis, giving two distinct projections a′ and b′. This phenomenon is further discussed in the next experiment.

1.8 Measurements and Collapsing of Superposition States onto Basis States
Light is a form of electromagnetic radiation; the wavelength of the radiation in the visible spectrum varies from red to violet. Sunlight consists of light waves with different wavelengths. Light can be filtered by selectively absorbing some color ranges and passing through others. An electromagnetic field consists of an electric and a magnetic field perpendicular to each other and oscillating in a plane perpendicular to the direction of the energy transported by the electromagnetic wave. The line along which the electric field is directed determines the polarization of the light. If this line rotates as the light propagates, the light is circularly polarized; if the line does not change direction the light is plane polarized. A polarization filter is a partially transparent material that transmits light of a particular polarization.
An experiment that sheds some light (pardon the pun) upon the strange effects we are going to present in this series of lectures is discussed in [46]. We have a source S capable of generating randomly polarized light and a screen E where we measure the intensity of the light. We also have three polarized filters: A, polarized vertically (↕); B, polarized horizontally (↔); and C, polarized at 45 degrees.

Figure 8: (a) The polarization of a photon is described by a unit vector |ψ⟩ with projections a0|↕⟩ and a1|↔⟩ on a two-dimensional space with basis |↕⟩ and |↔⟩. Measuring the polarization is equivalent to projecting the random vector |ψ⟩ onto one of the two basis vectors; the two projections are a0 and a1. (b) Source S sends randomly polarized light to the screen; the measured intensity is I. (c) The filter A with vertical polarization is inserted between the source and the screen and the intensity of the light measured at E is about I/2. (d) Filter B with horizontal polarization is inserted between A and E. The intensity of the light measured at E is now 0. (e) Filter C with a 45 degree polarization is inserted between A and B. The intensity of the light measured at E is I/8.

Using this experimental setup we make the following observations:

(i) Without any filter the intensity of the light measured at E is I, see Figure 8(b).

(ii) If we interpose filter A between S and E, the intensity of the light measured at E is I′ = I/2, see Figure 8(c).

(iii) If between A and E in the previous setup we interpose filter B, then the intensity of the light measured at E is I″ = 0, see Figure 8(d).

(iv) If between filters A and B in the previous setup we interpose filter C, the intensity of the light measured at E is I‴ = I/8, see Figure 8(e).
The photons have random polarizations; this means that the vectors describing the polarization of a photon have random orientations, see Figure 8(a). We use the notation introduced by Dirac [27] for vectors (we discuss vectors in detail in Section 11).

Let |ψ⟩ denote a 2-dimensional vector representing the random polarization of a photon. The vector |ψ⟩ can be expressed as a linear combination of a pair of orthogonal basis vectors. We can use different orthonormal bases, and we choose the one denoted as |↕⟩ and |↔⟩:

|ψ⟩ = a0 |↕⟩ + a1 |↔⟩,   with |a0|² + |a1|² = 1.
Measuring the polarization of a photon is equivalent to projecting the random vector |ψ⟩ onto one of the two basis vectors. The measurement performed by a vertically polarized filter, |↕⟩, provides an answer to the question: does the incoming photon have a vertical polarization? Similar statements can be made about horizontal, 45 degree, or filters with any other polarization.

After a measurement, the superposition state |ψ⟩ is resolved as one of the basis states, |↕⟩ or |↔⟩; a photon is forced to choose either the vertical, |↕⟩, or the horizontal, |↔⟩, polarization state. The probability that a photon with random polarization |ψ⟩ is forced to choose the |↕⟩ state is:

p↕ = |a0|²

and the probability that it is forced to choose the |↔⟩ state is:

p↔ = |a1|²

The sum of the two probabilities must be one, since each photon is forced to choose one of the two basis states:

p↕ + p↔ = |a0|² + |a1|² = 1.
Once the choice is made, only those photons which have made a choice agreeing with the polarization of the filter are allowed to pass. This explains the results of (ii) and (iii) above, see Figure 8(c) and (d). Indeed, due to the random orientation of the photons emitted by the source, only about 50% of them emerge as vertically polarized from filter A. Clearly, none of them can make it through filter B: all of them have a vertical polarization, |↕⟩, thus the projection of their polarization on the |↔⟩ basis vector is zero. The probability of each of these photons to reach the screen E is zero.
Until now everything seems clear and reasonable. The interesting fact is the introduction of filter C with a 45 degree polarization between A and B. Filter C measures the quantum state with respect to a different basis than filters A and B; the new basis consists of vectors at 45 and 135 degrees, given by:

{ (1/√2)(|↕⟩ + |↔⟩), (1/√2)(|↕⟩ − |↔⟩) }.
Filter C has a 45 degree polarization and forces incoming photons to choose between the two basis states; 50% of them emerge from C with a 45 degree polarization and the other 50% end up with a 135 degree polarization and are stopped. Recall that only 50% of the photons emitted by the source reach C because of the filtering done by A; therefore only 25% of the photons emitted by the source reach filter B in Figure 8(e).
Now, once again the basis used to measure the polarization of the photons changes; filter B has a horizontal polarization and forces a measurement using the original basis vectors, vertical, |↕⟩, or horizontal, |↔⟩. Again the incoming photons with a 45 degree orientation are forced to make a choice, based upon the projection on the new basis. Again only 50% make it through. Thus only 1/8 = 12.5% of the original number of photons make it to the screen, and all have horizontal polarization.
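The intensities I/2, 0, and I/8 follow from repeated projections. Here is a minimal Python sketch (ours, not from the original notes) with each filter modeled as a projection onto its transmission axis:

import numpy as np

def axis(theta_deg):
    """Polarization direction as a unit vector in the plane."""
    t = np.radians(theta_deg)
    return np.array([np.cos(t), np.sin(t)])

def p_pass(pol, filt):
    """Probability that a photon polarized along `pol` passes a
    filter with axis `filt` (squared projection, Born rule)."""
    return float(np.dot(pol, filt)) ** 2

A, C, B = axis(90), axis(45), axis(0)     # vertical, 45 degrees, horizontal

# Randomly polarized light: on average half passes the first filter,
# and every survivor leaves with the filter's polarization.
f_A = 0.5
print(f_A * p_pass(A, B))                 # A then B: 0.0, case (iii)
print(f_A * p_pass(A, C) * p_pass(C, B))  # A, C, B: 0.125 = I/8, case (iv)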
What did we learn from this simple experiment? First, we identified a candidate to store and transport information: we can probably use the polarization of a photon to store a bit of information. But this bit is unusual; it may take not only the values 0 and 1 but an infinite set of values. This is indeed a very powerful bit. Second, there is something strange about the measurement process: once we measure a bit we affect its state. Even though this bit may take infinitely many values prior to the measurement, when we interact with it during the measurement process we force it to take one of the two possible values, 0 or 1. This is a simple experiment, but what we learned is going to haunt us for the times to come.

1.9 Entanglement
The group of Leonard Mandel at the University of Rochester is one of several studying the possibility of creating systems of entangled particles. In 1986 they started to investigate a phenomenon called parametric down conversion, an elementary quantum process of the decay of a photon (of frequency ω0) into two new photons of frequencies ω1 and ω2 such that:

ω0 = ω1 + ω2.
The emission of the two photons is simultaneous:

t01 = t02.
The process obeys the conservation relations for momentum and energy:

kp = k1 + k2

where kp, k1, and k2 are the wave vectors of the pump photon and of the two emitted photons.
The two photons are highly correlated, have a wide bandwidth and the same polarization, and travel in well defined directions when their bandwidth is much smaller than ω0; they appear simultaneously from the crystal. The photons are reflected by two mirrors and brought together to interfere.
In an experiment from 1986 a coherent beam of light of frequency ω0 from an argon-ion laser (351.1 nm line) falls on an (8 cm long) nonlinear crystal of potassium dihydrogen phosphate. Some incident photons split into two lower frequency signal and idler photons of frequencies ω1 and ω2.
The signal and the idler photons are directed by two mirrors M1 and M2 to pass through a beam splitter BS; the superposed beams interfere and are detected by two photodetectors D1 and D2. The measured quantity is the rate at which the photons are detected in coincidence when the beam splitter is displaced from its symmetry position by various small distances. The signal and idler photons:
(i) Have no definite phase.

(ii) Are mutually incoherent: they exhibit no second-order interference when brought together at detectors D1 or D2.

(iii) Exhibit fourth-order interference effects as demonstrated by the coincidence counting rate between D1 and D2, the so-called cosine modulation.

(iv) The individual frequencies ω1 and ω2 have large uncertainties (though ω0 is very well defined). The large uncertainties are determined by the pass bands of the interference filters, which are of the order of 5 × 10¹² Hz, corresponding to a coherence time for each photon of about 100 fsec. The two-photon probability amplitudes at D1 and D2 are expected to interfere only if they overlap to this accuracy.

2 Introduction to Quantum Mechanics

2.1 A Brief History of Quantum Ideas


Quantum is a Latin word meaning some quantity or some definite amount of something. In physics it is used with the same meaning as the word discrete in mathematics, i.e., some quantity or variable that can take only sharply defined values, as opposed to a continuously varying quantity.
Quantum mechanics is the description of the behavior of matter and light, in particular, of
what is happening on an atomic scale. Quantum physics would be a more appropriate name
because it provides a general framework for the whole of physics rather than dealing with a
special aspect of mechanics.
Things on a very small scale behave like nothing that we have any direct experience of in our surrounding, macroscopic world. They do not behave like waves, they do not behave like particles, or like billiard balls, or like springs, or like anything that we see around us.
Newton thought that light was made up of particles, but then it was discovered that it behaves like waves. Thomas Young (1773 - 1829), a British physician and physicist, conducted (sometime at the beginning of the 19th century) the now famous double-slit experiment on light, demonstrating the wave-theory effect of interference. In his experiment Young had a light source, a barrier with two slits in it, and a screen behind the barrier. He shone light from the source on the barrier with two slits and obtained an interference pattern. An interference pattern is the hallmark of waves. Particles do not interfere with each other.
The notion of quanta was born in 1900 when Max Planck presented his theory of black-body radiation. Black-body radiation is the electromagnetic radiation emitted by a body in thermodynamical equilibrium, or the radiation contained in a cavity when its walls are at a uniform temperature. The radiation is allowed to escape through a small aperture so that its spectrum and energy density can be measured. Classical thermodynamics predicted that the radiation intensity emitted in a small frequency interval Δν should be proportional to the square of the frequency, ν²; when integrated over all frequencies that gives an infinite total intensity. The theoretical predictions were in contradiction with the experimental results at higher frequencies, where the measured intensity, rather than increasing with frequency, was decreasing exponentially.
Max Planck assumed, as a new postulate of physics, that the energy of the emitted radiation does not vary continuously, but by small amounts which are multiples of a basic quantum. Planck denoted this quantum of energy by hν, where ν is the frequency of the radiation (which is a wave) and h is a fundamental constant, now known as Planck's constant. (The value of h is 6.6262 × 10⁻³⁴ Joule·second and represents the product Energy × Time.) Planck proposed the following formula for the energy levels of the black body radiation (approximated as a Maxwell-Hertz oscillator):

E = 0, hν, 2hν, 3hν, 4hν, . . . , nhν

where n is a non-negative integer.
In 1905 Albert Einstein showed that the empirical properties of the photoelectric effect could be explained by assuming that light consists of particles, each one having an energy hν and moving at the velocity of light. Einstein's light particle became known as a photon 5.

5 Photon is derived from the Greek word photos, meaning light.

According to Einstein, the existence of quanta was not due to the emission process but was a property of the light itself. The best evidence of this idea came later, when Compton found that the scattering of gamma rays by electrons has the kinematical characteristics of the collision between two particles (or billiard balls). Two side notes: gamma rays, X-rays, and light are names given to different parts of the energy spectrum of the same radiation; Einstein received the Nobel Prize for his photoelectric effect theory and not for his special and general relativity theories for which he is universally known.
In 1911 Ernest Rutherford established that an atom consists of electrons located around a
positively charged heavy nucleus, based on results of scattering of alpha particles. That was
a planetary model of the atom.
In 1913 Niels Bohr modified the planetary model of the atom based on the 1908 discovery of Walter Ritz that all the frequencies in the spectrum of a given atom can be obtained with a simple formula, ν = νn − νm, where the frequencies νn (n = 1, 2, 3, . . .) characterize the atom. Bohr noted that the angular momentum of the orbiting electron in his model of the hydrogen atom (the hydrogen atom has one electron orbiting the nucleus, and the nucleus consists of a single proton) has the same dimensions as Planck's constant h.
Bohr postulated that the angular momentum of the orbiting electron must be a multiple of Planck's constant divided by 2π, that is:

mvr = h/2π, 2h/2π, 3h/2π, . . .

where mvr is the classical definition of the angular momentum and m is the mass, v is the velocity, and r is the radius of the electron orbit.
The quantization of the angular momentum led Bohr to the quantization of the atom's energy. Bohr also postulated that when the hydrogen atom drops from one energy level to a lower one, the difference between its beginning and ending energies is emitted in the form of a quantum of energy:

Ea − Eb = hνab

Here, Ea is the initial energy level of the electron around the nucleus, Eb is the final energy level after the transition from its prior state, h is Planck's constant, and νab is the frequency of the light quantum emitted during the electron's jump from the first to the second energy level.
The quantum mechanics developed in the 1920s and the 1930s reinforced the view that light is both particle and wave. Light exhibits both phenomena that are characteristic of waves, such as interference and diffraction, and phenomena characteristic of particles in their interaction with matter, as happens in the photoelectric effect.
The next significant step was made by the French physicist Louis (duke) de Broglie in 1923. Drawing an analogy with light and its dual character as wave and particle, de Broglie proposed that a wave should be associated with every kind of particle and particularly with the electron. De Broglie proposed an equation whereby the momentum p of a particle is linked to the wavelength λ of the wave associated with it:

p = h/λ

This was a radically new assumption 6, no longer an idea to correct or to supplement classical physics by quantum constraints. De Broglie's assumption was confirmed (accidentally) in 1927 by Davisson and Germer, who observed the diffraction of electrons by a crystal and thereby proved the wave character of electrons.

6 De Broglie received the Nobel Prize for his contributions in 1929.
De Broglie did not propose an equation to describe the propagation of the wave associated with a particle. In 1926 Erwin Schrodinger gave a precise formulation to de Broglie's wave hypothesis. He proposed the following equation for the wave function ψn(q) of a stationary state of energy En of an atom:

H(q, (h/2πi) ∂/∂q) ψn(q) = En ψn(q)

The Hamilton function H(q, p) is the classical energy of the atom when expressed in terms of position and momentum coordinates. Schrodinger replaces the momentum variable p by the differential operator (h/2πi)(∂/∂q). H is now the Hamiltonian operator, and ψn is the wave function associated with the atom in a stationary state of energy En.
Schrodinger assumed that the time evolution, the dynamics of the wave function, is governed by another partial differential equation:

i (h/2π) ∂ψ(q, t)/∂t = H(q, (h/2πi) ∂/∂q) ψ(q, t)

Two observations are in order:
(i) The explicit presence of the complex number i in this equation implies that the wave function is complex.
(ii) The solution to this equation is a function of time which represents the time evolution of the stationary wave function above:

ψ(q, t) = e^(−2πi En t / h) ψn(q).
Schrodinger realized that the wave function of a many-electron atom was not defined in the ordinary, physical, three-dimensional space, as de Broglie had assumed, but in a much more abstract configuration space (state space), i.e., the space described by the coordinates of the positions of all the electrons. This wave is different from an electromagnetic wave; it exists in a formal space and its values are complex. However, the interpretation of such a complex function, the physical meaning that could be given to it, was a problem.
By the end of the same year, 1926, Max Born suggested the probabilistic meaning of the wave function 7. According to Born, the probability that a particle can be found at a given location is equal to the square of the amplitude of the wave function at that location:

Probability = |ψ|²

Born made an analogy between the scattering of an electron colliding with an atom and the diffraction of X-rays, and concluded that the electron can be anywhere in a place where the wave function is different from zero, but there is no way to say where it is because this is a random event.

This formula is extremely important - it represents the essence of what the quantum theory can give us. In contrast to classical physics, which can measure and predict the location and the speed of an object with 100% certainty (in principle), in the world of quantum physics our predictions are only statistical in nature.

7 Einstein had the intuition of the probabilistic nature of the wave function at about the same time, but had reservations about the possible randomness of the physical world.
Schrodinger's equation allows us to make probabilistic predictions: we can determine where a particle will be (if the position observable is considered) only in terms of probabilities of different outcomes; or, equivalently, what proportion of a large number of particles will be found at a specific location. Quantum theory does not give any indication as to a specific outcome.
Quantum probabilities are very different from the probabilities in classical physics, not so much when it comes to their use, but rather when one considers their conceptual nature. Classical probabilities only express some ignorance, some lack of information about the fine details of a given situation. The randomness in results is due to some uncontrolled causes that are recognized to exist, and if we knew them better, the predictions would also be better. On the other hand, quantum probabilities assume that a more precise knowledge is impossible at the atomic level, as a matter of principle. Many experiments have shown over time that this limitation cannot be avoided. Einstein never completely accepted this limitation and could not admit that God is playing dice.
The uncertainty principle discovered by Werner Heisenberg in 1927 is a remarkable consequence of the probabilistic aspect of the quantum theory. Observing a quantum system is expressed mathematically by applying an operator to the wave function. Some of the operators corresponding to the observation of physical properties of a speck of matter, e.g., the position and the momentum, are not commutative. This means that, given two observables X and Y, if operator X is applied first and then Y (corresponding to first observing property X and then Y) the results are not the same as when we observe first Y and then X. If X and Y correspond to the observation of the position and, respectively, the momentum of the particle, the non-commutativity of the two operators has immense consequences for quantum systems; it means that we cannot measure both the position and the momentum of the same particle accurately.
The uncertainty principle states that the uncertainty Δx in determining the position x and the uncertainty Δpx in determining the momentum px at position x are constrained by the inequality:

Δx Δpx ≥ ħ/2

where ħ = h/2π is a modified form of Planck's constant.
This means that a precise knowledge of basic physical notions such as position and momentum is forbidden; in spite of our tendency to see them intuitively as ordinary properties of a speck of matter, the uncertainty is an intrinsic property of quantum systems.
The superposition principle is the second essential element of the quantum theory brought forth by Schrodinger's equation. The equation is linear; therefore, when two functions ψ1 and ψ2 are among its solutions, their sum ψ1 + ψ2 is another solution. The corresponding probability is proportional to |ψ1 + ψ2|² and it can show interference effects. For example, an electron can be found in a state that is a superposition of two other states. A solution of Schrodinger's equation for the electron is a sine wave, and by linearity a sum of such sine waves is also a solution.
The superposition can be extended to a single particle, i.e., a particle can be superposed with itself. In Young's experiment, when the light is reduced to one photon emitted at a time, we still find an interference pattern on the screen (after enough photons have been detected). The explanation is that a single photon goes through both slits and then it interferes with itself, as two waves do by superposition.
Similar experiments were later performed with electrons, neutrons, atoms, and even bucky balls 8. All these particles behaved like waves and created interference patterns.
Schrodinger himself realized that if a quantum system contains more than one particle, the superposition principle gives rise to the phenomenon of entanglement, in which the system interferes with itself. The result is an entangled system. The term entanglement was first used by Schrodinger himself in 1935 in his discussion of the Einstein, Podolsky, and Rosen (EPR) paper.
At the beginning of the 1920s Heisenberg had developed a theory of quantum mechanics based on matrices, equivalent to Schrodinger's theory, which is based on the wave equation. In Heisenberg's more abstract approach, infinite matrices represent properties of observable entities and the mathematics used is that of matrix algebra. Matrix multiplication is non-commutative and that has important consequences in quantum mechanics. Heisenberg's leading idea was that physics should use only observable quantities; the classical orbits should not be mentioned, since no experiment has ever shown their existence.
Max Born, Pascual Jordan, and Paul Dirac soon realized that the use of non-commutative quantities to replace position and momentum is an essential feature of Heisenberg's theory; they recognized the rules of matrix calculus in their own theories. They concentrated upon a new kind of mechanics where the dynamical variables are not ordinary numbers, but (new) non-commutative mathematical objects.
In 1925 Born and Jordan developed a complete formulation of this new mechanics using infinite matrices to represent the basic physical quantities. The same year, 1925, Dirac introduced abstract mathematical objects, say Qj for position coordinates and Pj for momentum coordinates, without trying to specify them; the multiplication rules were those of the new mechanics. He postulated a general form for the commutator between two quantum dynamical variables using the Poisson brackets. Dirac's abstract quantities Q and P can be identified with operators acting upon the wave functions. Heisenberg's matrices representing position and momentum can be obtained from the wave functions.

In 1926 Schrodinger and Dirac showed the equivalence of Heisenberg's matrix formulation and Dirac's algebraic one with Schrodinger's wave function approach. In 1926 Dirac and, independently, Born, Heisenberg, and Jordan obtained a complete formulation of quantum dynamics that could be applied to any physical system; before that, all calculations had been done for the hydrogen atom.
In 1926 John von Neumann introduced the concept of Hilbert space to quantum mechanics. The idea came to him while attending a lecture where Heisenberg was presenting his matrix mechanics and the difference between it and that of Schrodinger. David Hilbert, the greatest mathematician of the time, didn't understand the presentation and asked for some explanations. John von Neumann decided to explain the quantum theory in terms a mathematician could well understand and used the idea of Hilbert space. A Hilbert space is a vector space with a measure of distance, called the norm, and the property of completeness. Von Neumann wrote a note for Hilbert, who was 64 years old at that time, explaining Heisenberg's version of quantum theory in terms of what would later be known as Hilbert spaces. Then he expanded this explanation into a book, The Mathematical Foundations of Quantum Mechanics, published in 1932 and still required reading for students learning quantum mechanics.

8 A bucky ball is a molecule (also called fullerene) of sixty or seventy carbon atoms arranged in a structure resembling the geodesic domes built by the architect Buckminster Fuller.
Von Neumann demonstrated that the geometry of vectors over the complex plane has the same formal properties as the states of a quantum mechanical system. The states of the quantum systems, the wave functions, are represented as vectors in a Hilbert space, and the operators associated with the position and the momentum act like matrices upon these vectors. He also derived a theorem, using some assumptions about the physical world, which proved that there are no hidden variables whose inclusion could reduce the uncertainty in quantum systems. In 1966 John Bell challenged (successfully) von Neumann's assumptions and proved his own theorem, constraining the properties that any hidden-variable theory could have.
In 1928 Paul Dirac developed relativistic quantum mechanics, which combined quantum mechanics with relativity. He introduced corrections for relativistic effects to the quantum mechanics equations for particles moving at close to the speed of light. This allowed the properties of spin to be obtained in a natural way from the relativistic Schrodinger equation. Dirac had suggested the existence of a quantum number with value ±1/2 in 1925, and Uhlenbeck and Goudsmit had interpreted it as the spin of the electron. In 1930 Dirac predicted the existence of anti-electrons, particles with a negative energy and a charge opposite to that of the electron.
In 1931 Carl Anderson discovered the positron, the anti-electron, in cosmic radiation. In 1949 Madame Chien-Shiung Wu and Irving Shaknov of Columbia University produced positronium, an artificial association between an electron and a positron circling each other. This system lives for a fraction of a second; then the electron and the positron spiral toward each other, causing mutual annihilation, and two photons of gamma radiation are emitted as a result, each with an energy of 0.511 MeV.

This experiment verified an assumption made by John Wheeler in 1946 that the two photons produced when an electron and a positron annihilate each other have opposite polarizations: if one is vertically polarized, the other must be horizontally polarized. Polarization means the orientation in space of either the electric or the magnetic field of light. What really distinguishes this experiment is the fact that it is the first one in history to produce entangled photons. This important fact was recognized only eight years later, in 1957, by David Bohm and Yakir Aharonov.

2.2 Young's Double-Slit Experiment

To emphasize the differences between classical and quantum phenomena, Richard Feynman in his Lectures on Physics [29] presents several double-slit experiments: one with bullets, one with water waves, and one with electrons.
In the first experiment, a gun shoots bullets at random at the barrier with two slits and the bullets are detected with a mobile detector mounted on a backstop positioned behind the barrier, see Figure 9. The probability distribution P12 of the bullets coming through the two slits is a smooth curve with a maximum centered at mid distance between the two slits. This curve is the sum of two similar curves,

P12 = P1 + P2

where P1 is centered around slit 1, and the other, P2, is centered around slit 2. Such curves are obtained if the slits are open one at a time. No interference pattern is observed here.
Figure 9: The double-slit experiment with bullets.

In the second experiment, water waves produced by a wave source arrive at the barrier, penetrate through the slits and reach the mobile detector. The detector is mounted on the absorber wall behind the barrier to avoid reflection of the waves. The detector is a device which measures the instantaneous height of the wave, h e^(iωt), and converts it to the intensity of the wave I as a function of position; h is a complex number, and the intensity is proportional to the square of the height.
The result of the measurement is a curve with maxima and minima. This curve is not the sum of the curves measured by opening one slit at a time; those curves have a maximum centered at the respective slit position. The maxima and minima are the result of constructive and, respectively, destructive interference of the waves. Now, for the wave coming through slit 1 the height is h1 e^(iωt) and the intensity is I1 = |h1|²; the intensity of the wave coming through slit 2 is I2 = |h2|². When both slits are open the height of the wave reaching the detector is the sum of the individual waves coming through slit 1 and slit 2, i.e., (h1 + h2) e^(iωt), and its intensity will be I12 = |h1 + h2|². Now

|h1 + h2|² = |h1|² + |h2|² + 2 |h1| |h2| cos δ

where δ is the phase difference between h1 and h2, and

I12 = I1 + I2 + 2 √(I1 I2) cos δ
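The interference formula is easy to check numerically; the short Python sketch below (ours, with arbitrarily chosen equal heights) compares the direct computation of |h1 + h2|² with the cosine form:

import numpy as np

h1, h2 = 1.0, 1.0                        # wave heights (chosen equal here)
delta = np.linspace(0, 2 * np.pi, 5)     # a few phase differences

I1, I2 = abs(h1) ** 2, abs(h2) ** 2
I12 = np.abs(h1 + h2 * np.exp(1j * delta)) ** 2

# Agrees with I12 = I1 + I2 + 2*sqrt(I1*I2)*cos(delta):
print(np.allclose(I12, I1 + I2 + 2 * np.sqrt(I1 * I2) * np.cos(delta)))  # True
print(I12)   # oscillates between 4 (constructive) and 0 (destructive)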
In the third gedanken experiment, electrons produced by an electron gun are recorded in a similar setup; the detector is a Geiger counter that clicks when recording an electron. The clicks have the same intensity, are well distinguishable, and may sound at different rates. These observations prove that the electrons reaching the screen are identical entities. First, electrons coming through one slit at a time, either slit 1 or 2, are measured. Their probability of arrival at the backstop is proportional to the rate of clicks. The probability distribution for electrons coming either through slit 1 or slit 2 is P1 and P2, respectively. The result of the measurement when both slits are open, the curve P12 representing the probability of arrival of electrons as a function of position, shows an interference pattern, though the electrons are recorded as individual entities, like the bullets in the first experiment, Figure 10.

Figure 10: The double-slit experiment showing electron interference.

The curve observed is the sum of the effects of the electrons coming either through slit 1 or through slit 2, but P12 ≠ P1 + P2. The number of electrons that arrives at a particular point when both slits are open is not equal to the number of electrons coming only through slit 1 plus the number of electrons coming through slit 2. That means that the electrons do not travel either through slit 1 or through slit 2.
The probability P of an event in an ideal experiment is given by the square of the absolute value of a complex number φ which is called the probability amplitude:

P = |φ|²

When an event can occur in several alternative ways, the probability amplitude for the event is the sum of the probability amplitudes for each way considered separately.

Assume that P1 = |φ1|² and P2 = |φ2|², where φ1 and φ2 are complex numbers. The combined effect of the two slits is P12 = |φ1 + φ2|², and the resultant curve is similar to that obtained in the waves case. In quantum mechanics the amplitudes must be complex numbers, not just the real parts as in the case of classical waves.
The only possible explanation is that the electrons arrive like particles and the probability of arrival of these particles is distributed like the distribution of the intensity of a wave, as if each electron were travelling through both slits and then interfering with itself upon arrival at the detector.
Feynman also proposed a variation of this experiment with electrons. A strong light source is placed behind the wall between the two slits and it will help us see which way each electron is coming. The electrons, as electric charges, scatter the photons of light; whenever the detector clicks, flashes of light will be seen and point to electrons coming either through slit 1 or slit 2. In fact, it seems that indeed the electrons select either slit 1 or slit 2 to travel through. If the electrons are identified as such and counted, we obtain the arrival probabilities P′1 and P′2 corresponding to slit 1 and slit 2, respectively. These curves are similar to the ones obtained before when the slits were open one at a time. Whether the slits are open one at a time or both at the same time, the electrons we see coming through slit 1 are distributed in the same way as those coming through slit 2. If we want to calculate the total probability of arrival at the detector, we add the numbers corresponding to slit 1 and to slit 2 and we find

P′12 = P′1 + P′2 = P1 + P2

In this case, when we identify which slit the electrons are coming through, the total probability P′12 does not look like P12; it shows no interference effect. The light photons colliding with the electrons that we see change the momentum and the trajectory of the electrons, not much, but enough to smear the total probability distribution. We can change the characteristics of the light, either its intensity or its wavelength (frequency).
When we reduce the light intensity, we reduce the number of photons flying into the paths of electrons; we can reduce it so far that some electrons coming either through slit 1 or slit 2 are not seen before being detected with a click. Assume we record the electrons detected and seen as coming either through slit 1 or slit 2, as well as the electrons just detected, some even without being seen. The distributions for slit 1 and slit 2 will be similar to P1 and P2, as before, but the total probability will be similar to P12, showing an interference pattern.

When we reduce the frequency of the light (we increase the wavelength) we do not notice any change; we record a smooth curve P′12 until we reach values of the wavelength larger than the distance between the two slits. At that moment we begin to see big fuzzy flashes of light, we can no longer distinguish which slit the electron went through, and we notice the appearance of some interference effect. At much larger wavelength values the change in the electron momentum becomes small enough to observe a total probability curve similar to P12.
In summary, when an event may occur in several alternative ways, the probability amplitude for the event is the sum of the probability amplitudes for each way considered separately. The interference is present:

φ = φ1 + φ2
P = |φ1 + φ2|²

If one of the alternative ways is determined to be actually taken, the probability of the event is the sum of the probabilities for each alternative. The interference is lost:

P = P1 + P2
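The two rules can be contrasted in a few lines of Python (an illustration of ours, with arbitrarily chosen amplitudes):

import numpy as np

# Complex amplitudes for the two slits at one detector position.
phi1 = 0.5 * np.exp(1j * 0.0)
phi2 = 0.5 * np.exp(1j * np.pi)   # opposite phase

# Indistinguishable alternatives: add the amplitudes, then square.
P_interference = abs(phi1 + phi2) ** 2              # ~0: destructive interference

# Which-slit information available: add the probabilities.
P_no_interference = abs(phi1) ** 2 + abs(phi2) ** 2  # 0.5

print(P_interference, P_no_interference)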
Heisenberg had suggested that the laws of quantum mechanics could be consistent only if there were some basic limitations, previously not recognized, on our experimental capabilities. His uncertainty principle sets a lower limit, of the order of Planck's constant h, on the product between the position uncertainty and the momentum uncertainty. In fact, the entire theory of quantum mechanics depends on the correctness of the uncertainty principle.

In terms of this experiment Feynman proposes to state Heisenberg's uncertainty principle in the following way: It is impossible to design an apparatus to determine which slit the electron passes through, that will not at the same time disturb the electrons enough to destroy the interference pattern.
A truly correct interpretation of this experiment would be along the following lines: if one has an apparatus which is capable of determining whether the electrons go through slit 1 or slit 2, then one can say that they go either through slit 1 or slit 2. But, when one does not try to tell which way the electrons go, when there is nothing in the experiment to disturb the electrons, then one may not say that an electron goes either through slit 1 or slit 2.

The motion of all forms of matter must be described in terms of waves. In the experiment with bullets no interference patterns could be observed because the wavelengths of the bullets are extremely short and the finite size of the detector would not allow us to distinguish the separate maxima and minima. The result was an average over all those rapid oscillations of the total probability, the classical curve.

2.3 The Stern - Gerlach Experiment


An experiment first performed in 1922 by Stern and Gerlach to measure a component of the atomic magnetic moment has played a crucial role in signaling the existence of a new intrinsic property of atoms and particles, the spin. At that time physicists understood that in an atom such as hydrogen, the electron orbiting around the nucleus represented an electric current, and this current meant that the atom had a magnetic field, or what is called a magnetic dipole moment. As a result each atom was expected to behave like a little bar magnet with an axis corresponding to the axis the electron is spinning around. Such little bar magnets moving in a magnetic field will be deflected by the field, and that is exactly what Stern and Gerlach expected to see happening during their experiment.
The Stern-Gerlach experimental setup used a magnet with asymmetric polar caps between which silver atoms, evaporated from an oven, were beamed perpendicular to the field gradient. One polar cap is planar and the other one is wedge-shaped. This configuration results in a non-uniform magnetic field whose z-axis component is normal to the planar cap. The components of the magnetic field along the x and y axes are negligible.
The deflection of the atom depends on the atom's magnetic moment and the magnetic field generated by the magnet (the component along the z axis). The atoms exiting the oven were supposed to have their magnetic moments oriented randomly in every direction and, therefore, were expected to be deflected by the magnetic field at all angles, with a continuous distribution. Instead, the atoms were deflected at a discrete set of angles. For the silver atoms, which have a more complicated structure, the distribution was complex. The experiment was repeated in 1927 with hydrogen atoms, which have only one electron orbiting around the nucleus. What was very surprising was the number of peaks in the electron distribution seen in that experiment. The hydrogen atoms were such that they should have had zero magnetic dipole moment, i.e., no orbital motion of the electron, and that was an acceptable notion according to what was known of quantum mechanics. Hydrogen atoms with zero magnetic moment were supposed to exit the magnetic field as an undeflected beam. Instead, two beams were detected, one deflected up and the other deflected down by the magnetic field.

The result was difficult to explain unless it was postulated that the electron in the hydrogen atom had associated with it a quantity called spin, which made an extra contribution to the magnetic dipole moment of the hydrogen atom, in addition to the contribution due to the rotational motion of the electron. The spin is the intrinsic angular momentum of the electron and it is in no way associated with the rotation of the electron around the nucleus. The spin, as the intrinsic angular momentum of the electron, has an intrinsic magnetic moment associated with it, proportional to the spin. It can be shown that a non-uniform magnetic field acts upon a magnetic moment with a force aligned with the direction of the field gradient. The value of the force is proportional to the field gradient and to the component of the magnetic moment (which component is proportional to the spin) in the direction of the field gradient.
Considering that the experimental setup was such that the field gradient was vertical and the initial direction of the beam was horizontal, the electrons had to be deflected upwards or downwards according to the component of their spin in the vertical direction. The electrons whose vertical spin component was up, i.e., positive, were deflected upward, and those whose vertical spin component was down, i.e., negative, were deflected downward.
Photons, protons, and neutrons are also characterized by spin as an intrinsic property. The spin of an atom is related to the spins of the electrons orbiting around its nucleus and to the spin of the nucleus itself. In a Stern-Gerlach type of experiment, we can measure the spin component of an atom along the vertical axis by observing which way the atom is deflected and by how much.

In quantum mechanics the intrinsic angular momentum, the spin, is quantized and the values it can take are multiples of the rationalized Planck constant ħ (ħ = h/2π). The spin of an atom or of a particle is characterized by the spin quantum number s, which may assume integer and half-integer values. For a given value of s, the projection of the spin on any axis may assume 2s + 1 values ranging from −s to +s by unit steps. The electron has spin s = 1/2 and the spin projection can assume the values +1/2, referred to as spin up, and −1/2, referred to as spin down.
At the time of its discovery the spin represented a new physical quantity introduced into Nature. Its unsuspected existence signaled the possible existence of other hidden variables. After its experimental discovery the spin was added by hand to the quantum mechanics of the time. Pauli introduced an additional two-valued degree of freedom for the atomic (bound) electron and with it he postulated the exclusion principle. According to Pauli's exclusion principle, no more than two electrons can occupy the same energy level and those two electrons must have anti-parallel spins. The non-relativistic electron was described by a two-component wave function, where the two components represented the states of spin 1/2.

The electron spin was directly predicted as an intrinsic property by the relativistic Dirac equation in its non-relativistic limit.
The electron spin is the physical basis for the quantum bit, the qubit. The spin is an observable, and the states for which the probability of obtaining a particular result is unity play a central role. In the case of a spin s = 1/2, there are two such states, which are generally denoted by |+⟩ and |−⟩, according to the sign of the corresponding value, +1/2 and −1/2, respectively. These two states are vectors in a complex Hilbert space and have all the properties associated with such a space. They form an orthonormal basis in a two-dimensional Hilbert space. A more general normalized state, for which both positive and negative results are possible, can be constructed by linear superposition of these two states:

|ψ⟩ = c+ |+⟩ + c− |−⟩

A qubit, as a state (vector) in a two-dimensional Hilbert space, can take any value of the form mentioned above. The coefficients c+ and c− are arbitrary complex numbers, subject to the normalization condition

|c+|² + |c−|² = 1

and they contain all the information we can ever obtain about a quantum state of spin 1/2. If this maximum information is available, the system is said to be in a pure state. In such a state, the probabilities for the two possible results of a measurement like the one in the Stern-Gerlach experiment are

P+ = |c+|²  and  P− = |c−|²
and they correspond to the two components of the beam split by the magnetic field, one deflected upwards and the other downwards, respectively. Assume that we have two detectors, one to detect the upward deflected beam and the other to detect the downward deflected beam. Depending on which detector is clicking, the initial general pure state of the system (electron or atom) is reduced to the spin state |+⟩ or |−⟩ through the measurement. In fact, we do not need two detectors. If we have one detector mounted on the upward-deflection path, then if the detector does not click, we may conclude that the electron went downward and, therefore, it had downward spin and was in the spin state |−⟩. When the beam of particles is in a pure spin state it is a completely polarized beam. The beam is unpolarized if the probabilities for the possible values of the spin are equal. If the probabilities for the various possible spin states are unequal, the beam is partially polarized. A modified Stern-Gerlach apparatus containing more than one set of magnets can be used to filter the incident beam of atoms, so that after the last magnet we end up with atoms in a definite spin state, a polarized beam.
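The Born-rule probabilities P+ and P− are straightforward to simulate; a minimal Python sketch (ours, with arbitrarily chosen coefficients c+ and c−):

import numpy as np

rng = np.random.default_rng(42)

# Example pure state |psi> = c_plus|+> + c_minus|->,
# with |c+|^2 + |c-|^2 = 1.
c_plus = np.sqrt(0.8)
c_minus = np.sqrt(0.2) * np.exp(1j * np.pi / 3)   # an arbitrary phase

P_plus = abs(c_plus) ** 2      # probability of upward deflection
P_minus = abs(c_minus) ** 2    # probability of downward deflection
assert np.isclose(P_plus + P_minus, 1.0)

# Simulate many identically prepared atoms passing through the field.
outcomes = rng.choice(["up", "down"], size=100000, p=[P_plus, P_minus])
print((outcomes == "up").mean())   # ~0.8

Note that the relative phase of c+ and c− does not affect these probabilities; it only matters when the spin is measured along a different axis.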

2.4 Mathematical Foundations of Quantum Mechanics


Quantum theory is a mathematical model of the physical world. The model is characterized by the way it represents the states of a physical system, the observables of the system, the measurements of properties of the system, and the dynamics of the system. The development of quantum mechanics was based on three postulates which apply to any isolated quantum system:

(i) The physical states are represented by vectors in a complex Hilbert space, Hn.

(ii) The dynamics of the quantum system is specified by Hermitian operators 9 and the time-evolution is described by the Schrodinger equation:

iħ dψ/dt = Hψ

(iii) Mutually exclusive measurement outcomes correspond to orthogonal projection operators {P0, P1, . . . , Pi, . . .} and the probability of a particular outcome i is given by ||Pi |ψ⟩||².

9 Hermitian operators are discussed in Section 2.6.
Traditionally, quantum mechanics texts use Dirac's notation for vectors and matrices and call the scalar product of two vectors the inner product and their tensor product the outer product. We shall follow this tradition throughout this section, where we review the basic mathematical concepts needed for the study of quantum mechanics.

Consider the Hilbert space Hn, a vector space over the field of complex numbers. The elements of a Hilbert space are n-dimensional vectors; these vectors can be added together or multiplied by scalars and the results of these operations are also elements of the Hilbert space.
If a0, a1, a2 are complex numbers then a state vector |a⟩ is written in Dirac's ket notation as the column vector:

|a⟩ = (a0, a1, a2)ᵀ

For each ket vector |a⟩ there is a dual vector, called the bra vector and denoted by ⟨a|, where:

⟨a| = (a0*, a1*, a2*)

The bra and ket vectors are related by Hermitian conjugation:

|a⟩ = (⟨a|)†,  ⟨a| = (|a⟩)†

Definition 1. The inner product ⟨a|b⟩ of two vectors in Hn is a complex number. The inner product has the following properties:

(i) The inner product of a vector with itself is a non-negative real number:

⟨a|a⟩ ∈ R,  with ⟨a|a⟩ = 0 if |a⟩ = 0 and ⟨a|a⟩ > 0 otherwise.

(ii) Linearity:

⟨a| (c |b⟩) = c ⟨a|b⟩

(a ⟨a| + b ⟨b|) |c⟩ = a ⟨a|c⟩ + b ⟨b|c⟩

⟨c| (a |a⟩ + b |b⟩) = a ⟨c|a⟩ + b ⟨c|b⟩

(iii) Skew symmetry:

⟨a|b⟩ = ⟨b|a⟩*.

The skew symmetry implies a skew linearity in the second factor:

⟨a| (b |b⟩ + c |c⟩) = ((b* ⟨b| + c* ⟨c|) |a⟩)* = (b* ⟨b|a⟩ + c* ⟨c|a⟩)* = b ⟨a|b⟩ + c ⟨a|c⟩

Example 1. The inner product maps an ordered pair of vectors in Hn to a complex number in C. For example, if |a⟩, |b⟩ ∈ H3 then:

⟨a|b⟩ = (a0*, a1*, a2*) (b0, b1, b2)ᵀ = a0* b0 + a1* b1 + a2* b2
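In code, the bra-ket inner product is the conjugated dot product; numpy's vdot conjugates its first argument, which matches the convention above (a small illustration of ours):

import numpy as np

a = np.array([1 + 2j, 0.5j, 3.0])
b = np.array([2.0, 1 - 1j, 1j])

inner = np.vdot(a, b)    # <a|b> = sum_k conj(a_k) * b_k
print(inner)
print(np.isclose(np.vdot(b, a), inner.conjugate()))  # skew symmetry: True
print(np.vdot(a, a).real >= 0)   # <a|a> is a non-negative real: True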

Note 1. A complete bracket expression ⟨·|·⟩ denotes a number, and an incomplete bracket expression ⟨·| or |·⟩ denotes a vector.

Definition 2. A vector space over the complex numbers where the inner product is defined with the properties of linearity, positiveness, and skew symmetry is called a unitary space.

Definition 3. Two vectors |a⟩ and |b⟩ in Hn are orthogonal, |a⟩ ⊥ |b⟩, if:

⟨a|b⟩ = 0.

By the skew symmetry, |a⟩ ⊥ |b⟩ implies |b⟩ ⊥ |a⟩.
The inner product is the generalization to the set of complex numbers of the dot product of two vectors in a real Hilbert space. In a real Hilbert space the dot product of vectors a and b is:

a · b = ax bx + ay by = |a| |b| cos α

where α is the angle between the two vectors. In the complex Hilbert space the inner product of two state vectors |a⟩ and |b⟩ plays the same role, but it is in general a complex number.

Definition 4. A normal unitary basis of an n-dimensional space is a set of n vectors |ψ1⟩, . . . , |ψn⟩ where each vector has norm (length) one and any two vectors are orthogonal:

||ψ1|| = . . . = ||ψn|| = 1

and

⟨ψi|ψj⟩ = 0 for i ≠ j.

The unit vectors ⟨ψ1| = (1, 0, . . . , 0), . . . , ⟨ψn| = (0, 0, . . . , 1) have unit length and are mutually orthogonal; they form a normal unitary basis.

It can be proven that any set of m < n mutually orthogonal vectors of length one of a unitary space forms part of a normal unitary basis of the space.

These properties are similar to those of a scalar product in Euclidean space, except for the occurrence of complex numbers. A complex Euclidean space could be considered an analogue of a Hilbert space.

Definition 5. A Hilbert space is complete in the norm:

||a|| = ⟨a|a⟩^(1/2).

Note 2. Completeness is an important characteristic in infinite-dimensional functional spaces, since it ensures the convergence of certain eigenfunction expansions (e.g., Fourier analysis). In most cases we are going to work with finite-dimensional inner product spaces.

Definition 6. The Schwarz inequality is satisfied by the inner product operation:

⟨a|a⟩ ⟨b|b⟩ ≥ |⟨a|b⟩|²

Definition 7. The outer product of a ket vector and a bra vector,

|a⟩⟨b|

is a linear operator, de facto a matrix:

|a⟩⟨b| = (a0, a1, a2)ᵀ (b0*, b1*, b2*) =
( a0 b0*  a0 b1*  a0 b2* )
( a1 b0*  a1 b1*  a1 b2* )
( a2 b0*  a2 b1*  a2 b2* )
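Numerically, the outer product is an ordinary matrix built from the ket and the conjugated bra; a small numpy illustration (ours):

import numpy as np

a = np.array([1.0, 2j, 0.0])
b = np.array([1j, 1.0, 2.0])

op = np.outer(a, b.conjugate())   # the matrix |a><b|
print(op.shape)                   # (3, 3)

# Applying |a><b| to a ket |c> gives <b|c> |a>:
c = np.array([0.5, 0.5j, 1.0])
print(np.allclose(op @ c, np.vdot(b, c) * a))   # True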

Definition 8. The Hermitian conjugate of an outer product operator is:

(|a⟩⟨b|)† = |b⟩⟨a|

Definition 9. Two Hermitian operators O1 and O2 commute if

[O1, O2] = O1 O2 − O2 O1 = 0

This is a necessary and sufficient condition for the existence of a complete set of basis vectors that are simultaneous eigenvectors of O1 and O2.
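A quick numerical check of Definition 9 (our illustration, using the Pauli matrices as example Hermitian operators):

import numpy as np

def commutator(O1, O2):
    return O1 @ O2 - O2 @ O1

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

print(np.allclose(commutator(sigma_x, sigma_x), 0))  # True: commutes with itself
print(np.allclose(commutator(sigma_x, sigma_z), 0))  # False: no common eigenbasis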

Definition 10. The density operator for an ensemble of mixed states |ψi⟩ of a multipartite system is:

ρ = Σi pi |ψi⟩⟨ψi|

A mixed ensemble of states is produced by an incoherent admixture of states, while a coherent superposition produces a single pure state (represented by a ray).

The density operator is an operator on the Hilbert space and has the form of a linear combination of projection operators. The various vectors |ψi⟩ that constitute the ensemble used to define the density operator are not necessarily orthogonal; therefore, the ensemble representation of the density operator is not a spectral decomposition.
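A small numpy sketch (ours) of Definition 10 for a two-state ensemble, checking the unit trace and the purity of the resulting mixed state:

import numpy as np

# An ensemble of two (non-orthogonal) qubit states with equal weights.
psi_0 = np.array([1.0, 0.0])                   # |0>
psi_plus = np.array([1.0, 1.0]) / np.sqrt(2)   # (|0> + |1>)/sqrt(2)
probs = [0.5, 0.5]

rho = sum(p * np.outer(psi, psi.conjugate())
          for p, psi in zip(probs, [psi_0, psi_plus]))

print(np.allclose(np.trace(rho), 1.0))   # trace one
print(np.trace(rho @ rho).real)          # purity 0.75 < 1: a mixed state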

2.5 The First Postulate. Quantum States
A state is a complete description of a physical system. In quantum mechanics, a state is a vector, in fact, a ray in a Hilbert space. By convention state vectors are assumed to be normal(ized), i.e., ⟨a|a⟩ = 1. Therefore,

Σi |ai|² = 1

The length of a bra vector ⟨a| or of the corresponding ket vector |a⟩ is defined as the square root of the positive number ⟨a|a⟩. For a given state, the bra or ket vector corresponding to it is defined only as a direction; its length is undetermined up to a factor, and the factor is usually chosen so that the vector length is equal to unity. Even then the vector is undetermined, because it can be multiplied by a quantity of modulus 1; such a quantity is the complex number e^(iγ), where γ is real. This number is called a phase factor.
The inner product of two state vectors represents the generalized angle between the
states and gives an estimate of the overlap between the states | a  and | b .
The interpretation of a | b  = 0 as representing orthogonal states and the implication
of a | b  = 1 that a and b are one and the same state are immediately evident.
The inner product of two state vectors is a complex number, but the square of the inner
product |a | b |2 can be thought of as a quantitative measure of the relative orthogonal-
ity between these states.

Note 3. Superposition: each state of a dynamical system at a particular time corresponds to a ket vector; if the state is the result of a superposition of certain other states, its corresponding ket vector can be expressed linearly in terms of the corresponding ket vectors of the other states, and conversely.

The states involved in a superposition are said to be dependent. The superposition has several properties derived from the properties of linear transformations:
(i) Symmetry - the order of the superposition is not important.
(ii) Each state in a superposition can be expressed as a superposition of the other states; e.g., if | R⟩ = c1 | a⟩ + c2 | b⟩, then

| a⟩ = (1/c1)( | R⟩ − c2 | b⟩ )

Note 4. The superposition of a state with itself results in the original state:

c1 | a⟩ + c2 | a⟩ = (c1 + c2) | a⟩

(a) If c1 + c2 = 0, there is no superposition and the two components cancel each other by an interference effect.
(b) If c1 + c2 = c3 ≠ 0 and we assume that the result of the superposition is the original state itself, we can conclude that if the ket vector corresponding to a state is multiplied by any nonzero complex number, the resulting ket vector corresponds to the same state.

Note 5. A state is specified by the direction of a ket vector; the length of the vector is irrelevant.

Note 6. The states of a dynamical system are in a one-to-one correspondence with all the possible orientations of a ket vector.

Note 7. The directions of the ket vectors | a⟩ and −| a⟩ are not distinct.

Note 8. When a state | R⟩ is the result of a superposition of two other states, the ratio of the complex coefficients c1 and c2 effectively determines the state | R⟩.

Therefore, the state is determined by one complex number, or by two real parameters, and from two given states a twofold infinity of states may be obtained by superposition. These two parameters may be interpreted as the ratio of the amplitudes of the two wave functions added together and their phase relationship, respectively.

There is a fundamental difference between a quantum superposition and a classical superposition. For example, a superposition of a membrane vibration state with itself results in a different state with a different magnitude of the oscillation. There is no physical characteristic of a quantum state corresponding to the magnitude of the classical oscillation. A classical state with amplitude of oscillation zero everywhere is a membrane at rest. No corresponding state exists for a quantum system, since a zero ket vector corresponds to no state at all.

We have mentioned that a quantum state is a ray in Hilbert space. A ray is an equivalence class of vectors that differ by multiplication by a nonzero complex scalar. In fact, we can choose an element of this class (for any non-vanishing vector) to have unit norm:

⟨a | a⟩ = 1.

For such a normalized vector we can say that | a⟩ and e^{iα} | a⟩, where | e^{iα} | = 1, describe the same physical state; the factor e^{iα} is an overall phase.

Note 9. The superposition principle: every ray corresponds to a possible state, so that given two states | φa⟩ and | φb⟩, we can form another state as a | φa⟩ + b | φb⟩, a superposition of the original two states.

This structure of the state vector space, which comes from adding vectors, is suited for the description of interference effects.

While the overall phase of such a superposition is not observable, i.e., e^{iα}( a | φa⟩ + b | φb⟩ ) describes the same state, the relative phase between the two terms, as in a | φa⟩ + e^{iα} b | φb⟩, is physically significant.

A set of unit vectors | 0⟩, | 1⟩, . . . , | n − 1⟩ forms a normal unitary basis in the n-dimensional state vector space.

In a 3-dimensional state vector space a state can be represented using the unit vector basis:

| ψa⟩ = a0 | 0⟩ + a1 | 1⟩ + a2 | 2⟩

where

| 0⟩ = (1, 0, 0)ᵀ,   | 1⟩ = (0, 1, 0)ᵀ,   | 2⟩ = (0, 0, 1)ᵀ

We can work with basis bras instead of kets. Then:

⟨ψa | = a0* ⟨0 | + a1* ⟨1 | + a2* ⟨2 |

The unit vectors satisfy the following relation:

⟨i | j⟩ = δi,j

where δi,j is the Kronecker delta:

δi,j = 0 (i ≠ j),
δi,j = 1 (i = j).

The same physical state | ψa⟩ can be expressed in many different bases. For example, if the state is expressed in the basis {| 0⟩, | 1⟩} as:

| ψa⟩ = a0 | 0⟩ + a1 | 1⟩

and we want to represent it in the new basis

{ | x⟩ = (1/√2)( | 0⟩ + | 1⟩ ),   | y⟩ = (1/√2)( | 0⟩ − | 1⟩ ) }

then

| ψa⟩ = (1/√2)(a0 + a1) | x⟩ + (1/√2)(a0 − a1) | y⟩
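The basis change can be verified numerically. A minimal Python/numpy sketch, with purely illustrative values for a0 and a1:

    import numpy as np

    a0, a1 = 0.6, 0.8j                                  # illustrative amplitudes
    ket_x = np.array([1, 1], dtype=complex) / np.sqrt(2)
    ket_y = np.array([1, -1], dtype=complex) / np.sqrt(2)
    psi_a = np.array([a0, a1], dtype=complex)           # components in the |0>, |1> basis

    # New components are the inner products <x|psi_a> and <y|psi_a>.
    cx = ket_x.conj() @ psi_a                           # (a0 + a1)/sqrt(2)
    cy = ket_y.conj() @ psi_a                           # (a0 - a1)/sqrt(2)
    print(np.allclose(cx * ket_x + cy * ket_y, psi_a))  # True: same state, new basis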

2.6 Quantum Observables. Quantum Operators

An observable is a property of a physical system that, in principle, can be measured. In the formalism of quantum mechanics, an observable is a self-adjoint operator. An operator is a linear map taking state vectors to state vectors.

The rule is that operators act on ket state vectors from the left, i.e.,

O : | ψ⟩ → O | ψ⟩

and on bra state vectors from the right, i.e.,

O : ⟨ψ | → ⟨ψ | O

Operators are linear:

O ( a | ψ⟩ + b | φ⟩ ) = a O | ψ⟩ + b O | φ⟩

What does this mean? If we take a physical state and apply a transformation to it, such as rotating it, or waiting for some time t, we get a different state. Thus, an operation applied to one state produces another state. This idea can be expressed by an equation

| φ⟩ = O | ψ⟩.

The operator O stands for some operation (such as the interaction of an atom with a test apparatus). When this operation is performed on a state | ψ⟩ of a quantum system, e.g., an atom, it produces some other state | φ⟩ of that atom.

Let us manipulate the equation above like an algebraic equation and try to understand what it means. We multiply this equation by ⟨i |:

⟨i | φ⟩ = ⟨i | O | ψ⟩

We write the state | ψ⟩ as a linear combination of base states as

| ψ⟩ = Σ_i Ci | i⟩

Here the coefficients Ci are a set of complex numbers; they represent the amplitudes and are expressed as Ci = ⟨i | ψ⟩. The | 1⟩, | 2⟩, | 3⟩, . . . , | i⟩ stand for the base states in some base, or representation.

After this expansion, our equation becomes

⟨i | φ⟩ = Σ_j ⟨i | O | j⟩⟨j | ψ⟩

where the states | j⟩ are from the same set as | i⟩.

This is now an algebraic equation. The numbers ⟨i | φ⟩ give the amount of each base state | i⟩ that we find in | φ⟩. This amount is given in terms of a linear superposition of the amplitudes ⟨j | ψ⟩ with which we find | ψ⟩ in each base state. The numbers ⟨i | O | j⟩ are the coefficients which tell how much of each amplitude ⟨j | ψ⟩ goes into each sum. The operator O is described numerically by the set of these numbers, the matrix

Oij = ⟨i | O | j⟩

The contracted equation we started with, where the base states are not specified, is convenient to use. In general we do not have to specify the base states because they can be any set. When we want results we have to give the components with respect to some set of axes and we have to be able to identify the operator O by its matrix Oij in terms of some set of base states. Once we know the matrix for one particular set of base states, we can calculate the corresponding matrix for another base; the matrix can be transformed from one representation to another.
In the three-dimensional state space used above, for example, an operator can be written using orthonormal basis vectors as a linear combination of outer products, e.g.,

O = | 0⟩⟨1 | + | 1⟩⟨0 |

Then, when applied to a ket or a bra state vector we obtain, respectively,

O | ψa⟩ = ( | 0⟩⟨1 | + | 1⟩⟨0 | ) ( a0 | 0⟩ + a1 | 1⟩ + a2 | 2⟩ ) = a1 | 0⟩ + a0 | 1⟩

⟨ψa | O = ( a0* ⟨0 | + a1* ⟨1 | + a2* ⟨2 | ) ( | 0⟩⟨1 | + | 1⟩⟨0 | ) = a1* ⟨0 | + a0* ⟨1 |

The outer product of any unit vector with itself is a projection operator:

| a⟩⟨a | = Pa

A projection operator has the following property (idempotence):

( Pa )² = | a⟩⟨a | a⟩⟨a | = | a⟩⟨a | = Pa
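A short numerical sketch (Python/numpy, with an arbitrary normalized vector | a⟩) verifying the idempotence and Hermiticity of a projector:

    import numpy as np

    a = np.array([0.6, 0.8j], dtype=complex)   # |a>, already of unit norm
    P_a = np.outer(a, a.conj())                # outer product |a><a|

    print(np.allclose(P_a @ P_a, P_a))         # True: projector is idempotent
    print(np.allclose(P_a, P_a.conj().T))      # True: projector is Hermitian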

Example 2. Quantum mechanics operators.

(i) The rotation operator Ry(θ) takes a state | ψ⟩ into a new state which is in fact the old state, but as seen in a rotated coordinate system.
(ii) The inversion (parity) operator P, which creates a new state by reversing all coordinates.
(iii) The operators for spin 1/2: σx, σy, σz, the Pauli spin matrices.
(iv) The operator of the z-component of the angular momentum, Jz. This operator is defined in terms of the rotation operator for a small angle ε:

Rz(ε) = 1 + (i/ℏ) ε Jz

When applied to a state | ψ⟩:

Rz(ε) | ψ⟩ = ( 1 + (i/ℏ) ε Jz ) | ψ⟩

Among the new states resulting from application of the rotation operator, some are the same as the initial state, except for a phase factor. The phase is proportional to the angle ε, which is a very small, infinitesimally small, angle.

Rz(ε) | ψ0⟩ = e^{imε} | ψ0⟩ ≈ (1 + imε) | ψ0⟩

When we compare with the definition of Jz above, we get that

Jz | ψ0⟩ = mℏ | ψ0⟩

where mℏ is the amount of the z-component of the angular momentum. The expression above can be interpreted in the following way: if we operate with Jz on a state with a definite angular momentum about the z-axis, we get mℏ times the same state.

(v) The displacement operator Dx(L) along the x-axis, by distance L. If we make a small displacement δx along x, a state | ψ⟩ will transform into another state | ψ′⟩, where

| ψ′⟩ = Dx(δx) | ψ⟩ = ( 1 + (i/ℏ) px δx ) | ψ⟩

Here, if δx goes to zero, | ψ′⟩ should become the initial state | ψ⟩, that is, Dx(0) = 1. For infinitesimally small δx, the change of Dx from its value 1 should be proportional to δx. That is how the proportionality quantity px, the momentum operator for the x-component, is defined.

Example 3. Rotation Matrices

(a) Rotation matrices for spin 1/2.
Two base states:
| +⟩, spin up along the z-axis, m = +1/2
| −⟩, spin down along the z-axis, m = −1/2

Rz(φ) in the basis { | +⟩, | −⟩ }:

Rz(φ) = [ e^{+iφ/2}      0       ]
        [     0      e^{−iφ/2}   ]

(b) Rotation matrices for spin 1.
Three base states:
| +⟩, m = +1
| 0⟩, m = 0
| −⟩, m = −1

Rz(φ) in the basis { | +⟩, | 0⟩, | −⟩ }:

Rz(φ) = [ e^{+iφ}    0      0     ]
        [    0       1      0     ]
        [    0       0   e^{−iφ}  ]

Ry(θ) in the same basis:

Ry(θ) = [ (1 + cos θ)/2    +(sin θ)/√2    (1 − cos θ)/2 ]
        [  −(sin θ)/√2        cos θ       +(sin θ)/√2   ]
        [ (1 − cos θ)/2    −(sin θ)/√2    (1 + cos θ)/2 ]

(c) Photons of circular polarization in the xy-plane.
Two base states:
| R⟩ = (1/√2)( | x⟩ + i | y⟩ ), m = +1 (RHC polarized)
| L⟩ = (1/√2)( | x⟩ − i | y⟩ ), m = −1 (LHC polarized)

Rz(φ) in the basis { | R⟩, | L⟩ }:

Rz(φ) = [ e^{+iφ}     0     ]
        [    0     e^{−iφ}  ]
2.7 Eigenvalues and Eigenvectors of a Quantum Operator

Definition 11. Certain elements (kets) among the vectors of a Hilbert space on which the action of an operator is simply a rescaling are called eigenkets or eigenvectors of that operator:

O | V⟩ = λ | V⟩

Here | V⟩ is an eigenket (eigenvector) of the operator O and λ is the corresponding eigenvalue. If | V⟩ is an eigenvector of the operator O, so is the vector a | V⟩; the eigenvectors are determined only up to a scale factor. If we require that ⟨V | V⟩ = 1, i.e., that the eigenvectors be normalized, then the ambiguity is removed up to a phase factor e^{iα}.

Example 4. Every vector is an eigenvector of the identity operator I with eigenvalue 1:

I | V⟩ = | V⟩

Example 5. The operator PV = | V⟩⟨V |, where ⟨V | V⟩ = 1, is the projection operator onto the subspace spanned by | V⟩.

Its eigenvalues and eigenvectors can be found in the following way. Assume

PV | V′⟩ = a | V′⟩
| V⟩⟨V | V′⟩ = a | V′⟩

Here ⟨V | V′⟩ is a number, so whenever it is nonzero | V′⟩ has to be proportional to | V⟩, that is, | V′⟩ = b | V⟩.
If ⟨V | V′⟩ = 0, then a = 0.
If ⟨V | V′⟩ ≠ 0, then a = 1.
The eigenvalues of PV are either 0 or 1. The eigenvectors of PV are either perpendicular or parallel to | V⟩, with eigenvalues 0 and 1, respectively.
An arbitrary linear operator eigenvalue problem,

O | ψ⟩ = λ | ψ⟩,

can be written as

O | ψ⟩ = λ I | ψ⟩
( O − λ I ) | ψ⟩ = 0

This equation can be transformed into a matrix equation by choosing a basis {| ui⟩}, such that

| ψ⟩ = Σ_i ci | ui⟩

Then

( O − λ I ) Σ_i ci | ui⟩ = 0

Σ_j ( Oij − λ δij ) cj = 0

A trivial solution of this matrix equation is that all cj = 0. A nontrivial solution exists only if det(O − λI) = 0. This can be written as

| O11 − λ     O12        O13      . . . |
|   O21     O22 − λ      O23      . . . |
|   O31       O32      O33 − λ    . . . |  =  0
|    .          .          .       . .  |

This is called the characteristic equation.
In general, such a determinant can be expanded as a polynomial; for example, in three dimensions

| a1  a2  a3 |
| b1  b2  b3 |  =  Σ_{ijk} ε_{ijk} ai bj ck
| c1  c2  c3 |

Here
ε_{ijk} = +1 for even permutations of the indices,
ε_{ijk} = −1 for odd permutations,
ε_{ijk} = 0 for repeated indices.

Therefore, the characteristic equation above is a polynomial of order N in λ, where N is the dimension of the vector space. The polynomial has N roots, which can be real or complex, distinct or identical. These N roots are the eigenvalues of the operator; they are computed in a particular basis, but are independent of the choice of basis. If the vector space is defined over the field of complex numbers, then each operator O has at least one eigenvalue and the characteristic equation has at least one root.
Let us consider the operator O rotating a vector clockwise through an angle θ in two dimensions. The matrix of O in the {| x⟩, | y⟩} basis is

O = [  cos θ   sin θ ]
    [ −sin θ   cos θ ]

The eigenvalues of O are found from the equation

| cos θ − λ     sin θ   |
|  −sin θ     cos θ − λ |  =  0,

respectively, from

( λ − cos θ )² + sin²θ = 0

Here we have

( λ − cos θ )² = −sin²θ
λ = cos θ ± i sin θ

If sin θ ≠ 0 there is no real solution, but two complex ones, λ± = e^{±iθ}.
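A quick numerical check (Python/numpy, with an arbitrary illustrative angle θ) that the eigenvalues of this rotation matrix are indeed e^{±iθ}:

    import numpy as np

    theta = 0.7                                    # an arbitrary illustrative angle
    O = np.array([[np.cos(theta),  np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])

    eigvals = np.linalg.eigvals(O)
    expected = np.array([np.exp(1j * theta), np.exp(-1j * theta)])
    # Sort both sets by complex phase before comparing.
    print(np.allclose(sorted(eigvals, key=np.angle),
                      sorted(expected, key=np.angle)))   # True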

The eigenvectors of the operator O associated with the eigenvalue λ can be found by solving the system of N equations

Σ_j ( Oij − λ δij ) cj = 0,   (i = 1, . . . , N)

for the unknowns cj. Due to the linearity of the operator O, the eigenvectors, respectively the coefficients cj, can be defined only up to a multiplicative constant. If the characteristic equation has N distinct roots, we can find (up to a phase factor) N unique, normalized, linearly independent eigenvectors which form a basis of the space V. If the characteristic equation does not have N distinct roots, we are in the situation where we have a multiple root; the eigenvalue is said to be degenerate. In that case, we may find that:
(i) more than one linearly independent eigenvector is associated with the multiple root and, thus, we still find N linearly independent eigenvectors which form a basis of V, or
(ii) only one eigenvector is associated with the multiple root, and then the operator does not have a basis of eigenvectors.

Proposition 1. The eigenvalues of a Hermitian operator are real numbers.

Proof: Assume

O | ω⟩ = ω | ω⟩

then,

⟨ω | O | ω⟩ = ω ⟨ω | ω⟩

The adjoint of this expression is

⟨ω | O† | ω⟩ = ω* ⟨ω | ω⟩

Since the operator is Hermitian, O† = O, and then

⟨ω | O | ω⟩ = ⟨ω | O† | ω⟩

ω ⟨ω | ω⟩ = ω* ⟨ω | ω⟩

( ω − ω* ) ⟨ω | ω⟩ = 0

Since ⟨ω | ω⟩ ≠ 0, it follows that ω = ω*, i.e., the eigenvalue ω is real.
Proposition 2. Every Hermitian operator has at least one basis of orthonormal eigenvectors. The matrix of the operator is diagonal in this basis and has its eigenvalues as its diagonal terms:

(Ω) = [ ω1   0    0   . . .   0  ]
      [ 0    ω2   0   . . .   0  ]
      [ 0    0    ω3  . . .   0  ]
      [ .    .    .    . .    .  ]
      [ 0    0    0   . . .   ωn ]

This matrix (Ω) of the operator O is in the basis | ω1⟩, | ω2⟩, . . . , | ωn⟩. Every vector | ωi⟩ is chosen from a subspace that is orthogonal to the previous ones; therefore the basis {| ωi⟩} is orthogonal.

If the eigenvalues are degenerate (not distinct), then there are many bases of eigenvectors that diagonalize O. Assume ω is a degenerate eigenvalue,

O | ω1⟩ = ω | ω1⟩,   O | ω2⟩ = ω | ω2⟩

Then, for any a1, a2 in the complex number field F,

O ( a1 | ω1⟩ + a2 | ω2⟩ ) = ω ( a1 | ω1⟩ + a2 | ω2⟩ )

We can say that there is a whole subspace, spanned by the vectors | ω1⟩ and | ω2⟩, the elements of which are eigenvectors of O with eigenvalue ω.

Proposition 3. When two Hermitian operators O1 and O2 commute with each other,

[O1, O2] = O1 O2 − O2 O1 = 0,

there is a complete set of basis vector states that are simultaneously eigenstates of O1 and O2.

Proof of the necessary condition: Assume that it is possible to find a complete set of basis states {| ψk⟩} in the Hilbert space that are simultaneously eigenstates of both operators:

O1 | ψk⟩ = ω1k | ψk⟩,
O2 | ψk⟩ = ω2k | ψk⟩

If these relations are true and we apply the operator O2 to the first equation and the operator O1 to the second equation,

O2 O1 | ψk⟩ = O2 ω1k | ψk⟩ = ω1k ω2k | ψk⟩
O1 O2 | ψk⟩ = O1 ω2k | ψk⟩ = ω2k ω1k | ψk⟩

then,

( O1 O2 − O2 O1 ) | ψk⟩ = 0

The states {| ψk⟩} form a complete basis and any state | ψ⟩ in the Hilbert space can be written as a linear combination of them:

| ψ⟩ = Σ_k αk | ψk⟩

If we repeat the procedure for | ψ⟩ we obtain

( O1 O2 − O2 O1 ) | ψ⟩ = Σ_k αk ( O1 O2 − O2 O1 ) | ψk⟩ = 0

This relation is true if the commutator

[O1, O2] = O1 O2 − O2 O1 = 0.

This condition is necessary and also sufficient.
In summary:
(i) By definition, an observable is any Hermitian operator whose eigenvectors can form a basis.
(ii) The eigenvalues of a unitary operator are complex numbers of unit magnitude.
(iii) Eigenvectors corresponding to different eigenvalues are mutually orthogonal.
(iv) If two Hermitian operators commute they have a common basis of orthonormal eigenvectors (an eigenbasis). If they do not commute, then no common eigenbasis exists.
(v) A complete set of commuting observables is the minimal set of Hermitian operators with a unique common eigenbasis.

2.8 The Spectral Decomposition of an Operator

As we have mentioned before, every physical quantity q, such as energy, position, or a spin component, is represented by a Hermitian operator Oq, typically called an observable. An operator which is Hermitian, i.e., O† = O, or unitary, i.e., O O† = O† O = 1, is also normal, i.e., [O, O†] = O O† − O† O = 0.

In a finite-dimensional vector space, every normal operator has a complete set of orthonormal eigenvectors. If | iq⟩ is an eigenstate of the operator Oq and qi is the associated eigenvalue, then we can write

Oq | iq⟩ = qi | iq⟩

If Oq is a normal operator, its eigenstates {| iq⟩} can serve as a complete set of orthonormal basis kets for the Hilbert space. That means every state in the state space can be decomposed as

| ψc⟩ = Σ_i ci | iq⟩

where Σ_i | ci |² = 1.
The action of the operator Oq on such a state represented in this basis is

Oq | ψc⟩ = Oq Σ_i ci | iq⟩ = Σ_i ci Oq | iq⟩ = Σ_i ci qi | iq⟩
Now, let us consider the action of the projection operators

Piq ≡ | iq⟩⟨iq |

on a state represented in this basis:

Piq | ψc⟩ = | iq⟩⟨iq | Σ_{i′} c_{i′} | i′q⟩
          = Σ_{i′} c_{i′} | iq⟩⟨iq | i′q⟩
          = Σ_{i′} c_{i′} | iq⟩ δ(i, i′)
          = ci | iq⟩

We find this last result in the expression for the action of the operator Oq,

Oq | ψc⟩ = Σ_i ci qi | iq⟩,

replace it with the action of the projectors, and we get

Oq | ψc⟩ = Σ_i qi Piq | ψc⟩

This expression is true for any state and it is basis independent; thus we can write for the operator Oq

Oq = Σ_i qi Piq

Note 10. This is the so-called spectral decomposition of the operator Oq.

The spectral decomposition of the operator can be expressed in matrix notation. Let us assume a two-dimensional Hilbert space H where we choose a pair of fixed basis kets | 0⟩ and | 1⟩ to represent an arbitrary state as

| ψc⟩ = c0 | 0⟩ + c1 | 1⟩

with | c0 |² + | c1 |² = 1. For these fixed basis kets we can use the vector notation

| ψc⟩ = (c0, c1)ᵀ

Assume that we have a normal operator O with eigenvalues λa and λb and corresponding orthonormal eigenstates | a⟩ and | b⟩, where

| a⟩ = a0 | 0⟩ + a1 | 1⟩ = (a0, a1)ᵀ
| b⟩ = b0 | 0⟩ + b1 | 1⟩ = (b0, b1)ᵀ

The projection operators corresponding to these eigenvectors | a⟩ and | b⟩ can be written, respectively,

Pa = | a⟩⟨a | = (a0, a1)ᵀ (a0*, a1*) = [ | a0 |²    a0 a1* ]
                                       [ a1 a0*    | a1 |² ]

Pb = | b⟩⟨b | = (b0, b1)ᵀ (b0*, b1*) = [ | b0 |²    b0 b1* ]
                                       [ b1 b0*    | b1 |² ]

We can use the spectral decomposition to write the operator in matrix notation:

O = λa [ | a0 |²    a0 a1* ]  +  λb [ | b0 |²    b0 b1* ]
       [ a1 a0*    | a1 |² ]        [ b1 b0*    | b1 |² ]
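A small Python/numpy sketch of the spectral decomposition in matrix form; the eigenvalues λa, λb and the eigenvectors are illustrative choices, not taken from the text:

    import numpy as np

    lam_a, lam_b = 1.0, -1.0
    ket_a = np.array([1, 1], dtype=complex) / np.sqrt(2)    # |a>
    ket_b = np.array([1, -1], dtype=complex) / np.sqrt(2)   # |b>, orthogonal to |a>

    P_a = np.outer(ket_a, ket_a.conj())
    P_b = np.outer(ket_b, ket_b.conj())
    O = lam_a * P_a + lam_b * P_b                 # spectral decomposition of O

    # Direct diagonalization recovers the same eigenvalues.
    print(np.allclose(sorted(np.linalg.eigvalsh(O)), [lam_b, lam_a]))   # True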

In the case of mixed quantum ensembles, the density operator is a simplifying notation introduced for performing calculations. Assume that the states that constitute a mixed ensemble are | ψi⟩ (they need not be orthogonal) and have probabilities pi; then the density operator is written as

ρ = Σ_i pi | ψi⟩⟨ψi | = Σ_i pi Pi

where Pi is the projection onto the state | ψi⟩; the state | ψi⟩ itself is a normalized pure state and occurs in the ensemble with relative probability pi. This linear combination of projection operators is not a spectral decomposition when the states are not mutually orthogonal.

2.9 The Second Postulate. The Dynamics of a Quantum System

The dynamics of a quantum system is specified by a Hermitian operator H, called the Hamiltonian, and the time evolution of the system is described by the Schrödinger equation:

iℏ (d/dt) | ψ(t)⟩ = H(t) | ψ(t)⟩

where ℏ is the reduced Planck constant, ℏ = h/2π.

Note 11. For a finite-dimensional system the Schrödinger equation is in fact a coupled system of linear differential equations.

This equation represents the relation between the states of a quantum system observed at different instants of time. When we make an observation on a dynamical quantum system, the state of the quantum system is changed in an unpredictable way; but between observations, between interferences with the system, we expect causality to apply in a similar way as in classical mechanics. That means that between interferences the evolution of a quantum system is expected to be governed by equations of motion which make the state at one time determine the state at a later time. The equations of motion apply as long as the quantum system is left undisturbed by any observation.

The Schrödinger equation as written above gives the general law for the variation with time of the ket corresponding to the state at any time. The operator H(t), the Hamiltonian, is a real (Hermitian), linear operator, characteristic of the dynamical system under consideration. This operator is assumed to be the total energy of the system for two reasons: (i) this operator was introduced in analogy with classical mechanics, and (ii) the theory of relativity puts energy in relation to time (the same way as momentum to distance).

It is also assumed, on physical grounds, that the total energy of a system is always an observable.

2.10 The Schematic Derivation of Schrödinger's Equation

Assume that a system is prepared in a state | ψ⟩ at a moment t1, after coming out of an apparatus. Next, our system goes through another apparatus which represents a delay in time from t1 to t2. During this delay our system is acted upon in a certain way, so that the amplitude to find it in some state | χ⟩ is no longer the same as it would have been without the delay. We represent the waiting from moment t1 to moment t2 by the unitary time-evolution operator U(t2, t1). We can describe what has happened during this delay by giving an amplitude of the general form

⟨χ | U(t2, t1) | ψ⟩

This amplitude can be represented in some vector base as

Σ_{i,j} ⟨χ | i⟩⟨i | U(t2, t1) | j⟩⟨j | ψ⟩

The time operator U is completely described by the whole set of amplitudes, the matrix

Uij = ⟨i | U(t2, t1) | j⟩

In non-relativistic quantum mechanics any time interval can be analyzed as a sequence of short time intervals. For such short intervals there are assumptions that can be made to simplify the calculations (much as the adiabatic hypothesis does in classical thermodynamics).
Assume t2 = t1 + Δt.
The state at time t is | ψ(t)⟩; after the small time interval Δt it changes and becomes

| ψ(t + Δt)⟩ = U(t + Δt, t) | ψ(t)⟩

We can project the state at time t + Δt onto a given representation; for this, let us multiply both sides by ⟨i |:

⟨i | ψ(t + Δt)⟩ = ⟨i | U(t + Δt, t) | ψ(t)⟩

If we resolve | ψ(t)⟩ into some base states, we write

⟨i | ψ(t + Δt)⟩ = Σ_j ⟨i | U(t + Δt, t) | j⟩⟨j | ψ(t)⟩

Let us look at the left side of this equation and consider that Ci(t) = ⟨i | ψ(t)⟩ is the amplitude for the system to be in base state | i⟩ at time t. We can think of each such amplitude, which is just a number, as varying with time, i.e., each Ci becomes a function of time. The equation describes how the amplitudes vary with time. At time (t + Δt) each amplitude is proportional to all of the other amplitudes at time t multiplied by a set of coefficients. These coefficients are related to the elements of the U matrix,

Uij = ⟨i | U | j⟩

Then, the variation in time of the amplitudes is described by

Ci(t + Δt) = Σ_j Uij(t + Δt, t) Cj(t)

About Uij we know the following: if Δt → 0, nothing can happen and we should get the original state, i.e., Uii → 1 and Uij → 0 if i ≠ j. Thus

Uij → δij   for   Δt → 0

We assume that for small Δt each of the coefficients Uij should differ from δij by amounts proportional to Δt, and then

Uij(t + Δt, t) = δij + Kij Δt

The coefficients Kij are usually represented as −(i/ℏ) Hij and the equation becomes

Uij(t + Δt, t) = δij − (i/ℏ) Hij(t) Δt

The new terms Hij in this expression represent (up to the factor −i/ℏ) the derivatives of the coefficients Uij(t2, t1) with respect to time, evaluated at t2 = t1 = t. We replace Uij by this expression and we have

Ci(t + Δt) = Σ_j [ δij − (i/ℏ) Hij(t) Δt ] Cj(t)

Here, Σ_j δij Cj(t) = Ci(t). We move Ci(t) to the other side of the equation and divide by Δt. The equation is now written as

[ Ci(t + Δt) − Ci(t) ] / Δt = −(i/ℏ) Σ_j Hij(t) Cj(t)

where we recognize a derivative with respect to time, or

iℏ (dCi(t)/dt) = Σ_j Hij(t) Cj(t)

The term Ci(t) is the amplitude ⟨i | ψ⟩ to find the state | ψ⟩ in one of the base states | i⟩ at the time t. The equation above shows how each of these amplitudes varies with time. In fact, since the state | ψ⟩ is described in terms of the amplitudes ⟨i | ψ⟩, the equation shows how the state varies with time.

The matrix Hij contains all the physics of the actions applied to the system which cause it to change, and as such it depends on time. If we know this matrix, we have a complete description of the behavior of the system in time. The base state vectors are considered to be fixed in time.

The coefficients Hij are called the Hamiltonian matrix, or just the Hamiltonian. The name comes from the mathematician and astronomer Hamilton who, working in the 1830s, had nothing to do personally with quantum mechanics but happened to restate Lagrange's equations with emphasis on momenta instead of forces. Hamilton's equations are first-order differential equations with respect to time and involve the Hamiltonian function H, which is the total energy expressed as a function of the generalized coordinates qi and momenta pi:

dqi/dt = ∂H/∂pi,   dpi/dt = −∂H/∂qi
The Hamiltonian function H expresses the total energy of the system in terms of momenta and positional coordinates, and for different physical systems it takes different forms. For example, the energy of a body in simple harmonic motion^{10} is:

H = p²/2m + q²/2

Therefore, in order to describe a quantum system we need to select a set of base states and to express the physical laws which apply to that particular system through the matrix Hij. The rule is to find the Hamiltonian corresponding to a particular physical situation, such as a magnetic field, an electric field, and so on. For nonrelativistic phenomena and for some other special cases there are excellent approximations. For example, the form of the Hamiltonian describing the motion of electrons in atoms is a very good approximation for describing chemistry phenomena.

The following property of the Hamiltonian,

Hij = Hji*,

results from the condition that the total probability that the system is in some state (whichever that state is) does not change in time. If the system is a particle, as time goes on we will still have it. The probability to find it somewhere is

Σ_i | Ci(t) |²

and that must not vary with time.

Example 6. Consider the case of a hydrogen atom at rest; to a good approximation, this is a system which can be described with only one base state. Assume the external conditions do not change in time, so that the Hamiltonian H is independent of time; then

iℏ (dC1/dt) = H11 C1

The system is described by one differential equation and, since H11 is constant, its solution is

C1 = (const) e^{−(i/ℏ) H11 t}

The solution expresses the time dependence of a state characterized by a definite energy E = H11. The Hamiltonian matrix is a generalized form of the energy of the system.

Example 7. A system with two base states | 1⟩ and | 2⟩ is in the state | ψ⟩. If the amplitude to be in state | 1⟩ is C1 = ⟨1 | ψ⟩ and the amplitude to be in state | 2⟩ is C2 = ⟨2 | ψ⟩, then the state vector | ψ⟩ can be written as:

| ψ⟩ = | 1⟩⟨1 | ψ⟩ + | 2⟩⟨2 | ψ⟩ = | 1⟩ C1 + | 2⟩ C2
^{10} The motion of a point which represents the projection on a diameter of a body rotating on the corresponding circle is called simple harmonic motion.
58
If we assume that the system changes its state at any given moment in time, the C1, C2 coefficients will change in time according to the equations:

iℏ (dC1/dt) = H11 C1 + H12 C2
iℏ (dC2/dt) = H21 C1 + H22 C2
Solution: We have to make some assumptions about the H matrix to solve the equations:
(i) Stationary states hypothesis: once the system is in one of the states, say | 1⟩, there is no chance that it could change to state | 2⟩. Then H12 = 0 and H21 = 0 and the equations become

iℏ (dC1/dt) = H11 C1
iℏ (dC2/dt) = H22 C2

The solutions of these equations are

C1 = (const) e^{−(i/ℏ) H11 t}
C2 = (const) e^{−(i/ℏ) H22 t}

These are the amplitudes for stationary states with energies E1 = H11 and E2 = H22, respectively.

Note 12. The two states | 1⟩ and | 2⟩ are symmetrical and the two energies are equal:

E1 = E2 = E0

Note 13. There is a small probability (amplitude) that the system could tunnel through the energy barrier separating the two states:

H12 = H21* = H21 (up to a phase) = −A

The two equations become

iℏ (dC1/dt) = E0 C1 − A C2
iℏ (dC2/dt) = E0 C2 − A C1

We first take the sum of the two equations,

iℏ (d/dt)(C1 + C2) = (E0 − A)(C1 + C2),

and the solution is

C1 + C2 = a e^{−(i/ℏ)(E0 − A) t}

Then we take the difference of the two equations and we get

iℏ (d/dt)(C1 − C2) = (E0 + A)(C1 − C2)

with the solution

C1 − C2 = b e^{−(i/ℏ)(E0 + A) t}

The integration constants a and b are chosen to give the appropriate initial conditions for a particular system. By adding and subtracting the equations for the amplitudes, we get

C1(t) = (a/2) e^{−(i/ℏ)(E0 − A) t} + (b/2) e^{−(i/ℏ)(E0 + A) t}
C2(t) = (a/2) e^{−(i/ℏ)(E0 − A) t} − (b/2) e^{−(i/ℏ)(E0 + A) t}

What is the interpretation of these solutions?
If b = 0, then both terms are equal and have the same frequency ω = (E0 − A)/ℏ in the exponent. The system is in a state of definite energy (E0 − A) at this frequency. This is a stationary state of the system in which the two amplitudes C1 and C2 are equal. We can say that the system is in a state of definite energy (E0 − A) if the amplitudes for the system to be in state | 1⟩ or | 2⟩ are equal.

If a = 0, then another stationary state is possible. This time the two amplitudes have the frequency (E0 + A)/ℏ. The system is in a state of definite energy (E0 + A) if the two amplitudes are equal but of opposite sign, i.e., C1 = −C2.

Assume that at t = 0 the amplitudes to be in state | 1⟩ or | 2⟩ are, respectively,

C1(0) = (a + b)/2 = 1,   C2(0) = (a − b)/2 = 0

The result is that a = b = 1 and the amplitudes become

C1(t) = e^{−(i/ℏ) E0 t} [ e^{(i/ℏ) A t} + e^{−(i/ℏ) A t} ] / 2
C2(t) = e^{−(i/ℏ) E0 t} [ e^{(i/ℏ) A t} − e^{−(i/ℏ) A t} ] / 2

and we can rewrite them as

C1(t) = e^{−(i/ℏ) E0 t} cos(At/ℏ)
C2(t) = i e^{−(i/ℏ) E0 t} sin(At/ℏ)
The probability that the system is in state | 2⟩ at time t is the square of the absolute value of C2(t):

| C2(t) |² = sin²(At/ℏ)

The probability that the system is found in state | 1⟩ at time t is the square of the absolute value of C1(t):

| C1(t) |² = cos²(At/ℏ)

The probability associated with the term C2 is zero at the initial moment t = 0, increases to one, and continues to oscillate between zero and one in time. The probability associated with the term C1 is one at the initial moment, decreases to zero, and then oscillates in time. We say that the magnitude of the two amplitudes varies harmonically with time: the probability of finding the system sloshes back and forth between the two states.
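The oscillation can be reproduced numerically. A minimal Python/numpy sketch, in units with ℏ = 1 and with illustrative values for E0 and A (neither is prescribed by the text):

    import numpy as np

    E0, A = 1.0, 0.5
    H = np.array([[E0, -A], [-A, E0]], dtype=complex)   # the two-state Hamiltonian above

    t = 2.0
    # Propagator U(t) = exp(-i H t), built by diagonalizing the Hermitian H.
    w, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

    C = U @ np.array([1.0, 0.0], dtype=complex)         # start in state |1>
    print(np.allclose(abs(C[0])**2, np.cos(A * t)**2))  # True
    print(np.allclose(abs(C[1])**2, np.sin(A * t)**2))  # True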

2.11 The Third Postulate; Measurements of Observables

In quantum mechanics, the numerical outcome of a measurement of the observable O is an eigenvalue of the operator O; immediately after the measurement, the quantum state is an eigenstate of O with the measured eigenvalue.

Another formulation of this postulate: mutually exclusive measurement outcomes correspond to orthogonal projection operators (projectors) {P0, P1, . . .}.

Proposition 4. Two projectors Pi, Pj are orthogonal if, for every state | ψ⟩ in the Hilbert space,

Pi Pj | ψ⟩ = 0

This condition is often written as Pi Pj = 0.

Definition 12. A complete or exhaustive set of propositions is a set for which at least one proposition must be true. Correspondingly, we define a complete set of orthogonal projectors to be a set {P0, P1, P2, . . .} such that

Σ_i Pi = 1

From this definition it follows that the number of projectors in a complete orthogonal set must be less than or equal to the dimension of the Hilbert space. The postulate can be reformulated in terms of completeness of a set of projectors, that is, a complete set of orthogonal projectors specifies an exhaustive measurement.

Let us see what the probability is to obtain a measurement outcome an if the quantum state just prior to the measurement is | ψ⟩:

Probability(an) = ‖ Pn | ψ⟩ ‖²
               = ( Pn | ψ⟩ )† Pn | ψ⟩
               = ⟨ψ | Pn† Pn | ψ⟩
               = ⟨ψ | (Pn)² | ψ⟩
               = ⟨ψ | Pn | ψ⟩


The total probability for all possible measurement outcomes is Σ_n Probability(an) = 1, by completeness of the set of projectors. If the outcome an is obtained, then the post-measurement normalized pure quantum state is

Pn | ψ⟩ / ( ⟨ψ | Pn | ψ⟩ )^{1/2}
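A minimal Python/numpy sketch of this postulate, using an illustrative state | ψ⟩ and the projector P0 = | 0⟩⟨0 |:

    import numpy as np

    psi = np.array([0.6, 0.8j], dtype=complex)   # a normalized illustrative |psi>
    P0 = np.diag([1.0, 0.0]).astype(complex)     # projector onto |0>

    prob0 = (psi.conj() @ P0 @ psi).real         # <psi|P_0|psi>, probability of outcome 0
    post = P0 @ psi / np.sqrt(prob0)             # normalized post-measurement state
    print(prob0)                                 # 0.36
    print(np.allclose(np.linalg.norm(post), 1.0))   # True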

Example 8. Let us assume the case of a two-dimensional Hilbert space where we have chosen the orthogonal basis kets | x⟩ and | y⟩.
We define two possible state vectors of a system in this space:

| ψA⟩ = ax | x⟩ + ay | y⟩

| ψB⟩ = bx | x⟩ + by | y⟩

On this system we perform a large number N of measurements corresponding to the projectors

Px = | x⟩⟨x |,
Py = | y⟩⟨y |

Before the measurement, the initial state of the system could sometimes be | ψA⟩ (with probability p) and sometimes | ψB⟩ (with probability 1 − p). We say that the system is a mixed ensemble of quantum states. We try to predict the number of times nx, out of the total number N of measurements, when we expect to obtain the measurement outcome corresponding to the basis vector | x⟩.
According to probability theory,

nx = N [ Prob(| ψA⟩) Prob(x | ψA) + Prob(| ψB⟩) Prob(x | ψB) ]
   = N [ p ⟨ψA | Px | ψA⟩ + (1 − p) ⟨ψB | Px | ψB⟩ ]
   = N [ p | ax |² + (1 − p) | bx |² ]

Since 0 ≤ p ≤ 1, the quantity nx/N is bounded from below by the smaller of | ax |² and | bx |².

Assume instead that the initial state of the system is a coherent superposition of the states | ψA⟩ and | ψB⟩, corresponding to the pure state | ψ(pA, pB)⟩, where

| ψ(pA, pB)⟩ = pA | ψA⟩ + pB | ψB⟩ = (pA ax + pB bx) | x⟩ + (pA ay + pB by) | y⟩

Here, pA and pB are chosen such that the state | ψ(pA, pB)⟩ is normalized. In this case, the expected number of measurement outcomes corresponding to the basis vector | x⟩ is

nx = N ⟨ψ(pA, pB) | Px | ψ(pA, pB)⟩ = N | pA ax + pB bx |²

We notice that in certain cases it is possible to choose pA and pB such that

pA ax + pB bx = 0

and then,

nx = 0,

strictly below the mixed-ensemble lower bound even when | ax |² and | bx |² are both nonzero, through the phenomenon of destructive interference. This is the truly important distinction between coherent superpositions (of the type that produce a single pure state) and incoherent admixtures (of the type that produce a mixed ensemble of quantum states).

3 Qubits

A classical bit is an abstraction for a physical system capable of being in one of two states, 1 or 0. To be able to perform any computation we should be able to store information in the physical system implementing a bit, retrieve the information from the physical system, and be able to change the state of the physical system. To perform a meaningful computation we need a lot of bits and the ability to access and modify them very quickly. That is why a computer using electro-mechanical relays is more useful than the traditional abacus, computers using electronic circuits are more capable than ones based upon electro-mechanical relays, and so on.

From solid state studies we already know that the smaller and simpler the physical systems used to realize a bit, the less energy is necessary to perform the operations described above, and the faster is the resulting contraption. By pushing the limits of the physical systems used to realize a bit we inevitably end at the level of atomic or even sub-atomic particles, where we enter the realm of quantum systems governed by quantum mechanics.

3.1 The Qubit, a Very Small Bit

A quantum bit or qubit is an elementary quantum object used to store information. Since it is difficult at this stage to explain what a quantum object is, for the time being we view a qubit as a mathematical abstraction and then hint at possible physical implementations of this abstract object. The mathematical concepts used below are introduced later, in Section 11, but for now you do not need to worry too much if you do not know their precise meanings as long as you remember what a vector is.

A qubit is a vector in a two-dimensional complex vector space. In this space a vector has two components and the projections of the vector on the basis are complex numbers. We use Dirac's notation for the vector | ψ⟩, with α0 and α1 complex numbers and with | 0⟩ and | 1⟩ two vectors forming an orthonormal basis for this vector space [27]:

| ψ⟩ = α0 | 0⟩ + α1 | 1⟩.

While a classical bit can be in one of two states, 1 or 0, the qubit can be in the states | 0⟩ and | 1⟩, called computational basis states, and also in any state that is a linear combination of these states; α0 and α1 are the coefficients of the linear expression describing the actual state the qubit is in. This phenomenon is called superposition.

Now the real surprise! When we observe or measure a classical bit we determine its state with probability 1; the bit is either in state 0 or in state 1, and the result of a measurement is strictly deterministic. On the other hand, when we observe or measure a qubit we get the result:

| 0⟩ with probability | α0 |²

| 1⟩ with probability | α1 |².

For these expressions to be true we need the vector length, or the norm of the vector, to be one; otherwise the probabilities do not sum to one. This means that:

| α0 |² + | α1 |² = 1.

We say that a qubit can be in a continuum of states between | 0⟩ and | 1⟩ until we measure it. For example, a qubit can be in the state:

(1/√2) | 0⟩ + (1/√2) | 1⟩

and we get the result | 0⟩ with probability 1/2 and | 1⟩ with probability 1/2; alternatively, the qubit can be in the state:

(1/2) | 0⟩ + (√3/2) | 1⟩

and be found in the state | 0⟩ with probability 1/4 and | 1⟩ with probability 3/4.

The superposition and the effect of the measurement of a quantum state (the state of the qubit) really mean that there is hidden information that is preserved in a closed quantum system until it is forced to reveal itself to an external observer. We say that the system is closed until it interacts with the outside world, e.g., until we decide to perform an observation of the system. The fundamental question for us is how to use this hidden information.

So far we have used the states | 0⟩ and | 1⟩ to represent a qubit. But this is one of many choices; as we shall see later, we can choose a different set of vectors as an orthonormal basis. For example, we can choose:

| +⟩ ≡ ( | 0⟩ + | 1⟩ )/√2   and   | −⟩ ≡ ( | 0⟩ − | 1⟩ )/√2.

In this case a qubit can be represented as:

| ψ⟩ = α0 | 0⟩ + α1 | 1⟩ = α0 ( | +⟩ + | −⟩ )/√2 + α1 ( | +⟩ − | −⟩ )/√2

| ψ⟩ = ( (α0 + α1)/√2 ) | +⟩ + ( (α0 − α1)/√2 ) | −⟩
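A small simulation (Python/numpy) of repeated measurements of the second example state above; the sample frequency of outcome | 1⟩ should approach | α1 |² = 3/4:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([0.5, np.sqrt(3) / 2], dtype=complex)   # (1/2)|0> + (sqrt(3)/2)|1>

    probs = np.abs(alpha)**2                # measurement probabilities |alpha_i|^2
    outcomes = rng.choice([0, 1], size=100_000, p=probs)
    print(outcomes.mean())                  # close to 0.75, the probability of outcome 1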
Now let us briefly mention the physical incarnations of a qubit. A qubit can be realized as the polarization of a photon, as we have seen in the example discussed in Section 1.1. A laser and a polarizing lens form a source of polarized photons. Another possible physical realization of a qubit is the spin state of an electron orbiting an atom. Assuming that the electron can be in one of two states, one can provide enough energy to move the electron from a ground state to an excited state, e.g., by shining light. Other physical means to realize a qubit are discussed later.

3.2 The Bloch Sphere Representation of One Qubit

It is always useful to have a pictorial or graphic representation of an abstract concept and this is what we are going to attempt next. To this end we express a qubit using three real numbers, θ, φ, γ, as follows:

α0 = e^{iγ} cos(θ/2)

α1 = e^{iγ} e^{iφ} sin(θ/2)
Figure 11: A qubit is a vector from the origin to a point on the three-dimensional sphere with a radius of one, the so-called Bloch sphere.

It is left as an exercise to prove that in this representation:

| α0 |² + | α1 |² = 1.

This representation lends itself to an interesting geometrical interpretation: a qubit is a vector from the origin to a point on the three-dimensional sphere with a radius of one, the so-called Bloch sphere, see Figure 11. θ is the angle of the vector with the z-axis, and φ is the angle of the projection of the vector in the x-y plane with the x-axis; γ has no observable effect.
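A minimal sketch (Python/numpy) of the map from the Bloch angles (θ, φ) to the amplitudes, dropping the unobservable overall phase γ; the angles used are arbitrary:

    import numpy as np

    def bloch_to_state(theta, phi):
        # alpha_0 = cos(theta/2), alpha_1 = e^{i phi} sin(theta/2), with gamma = 0
        return np.array([np.cos(theta / 2),
                         np.exp(1j * phi) * np.sin(theta / 2)], dtype=complex)

    psi = bloch_to_state(theta=np.pi / 3, phi=np.pi / 4)
    print(np.allclose(np.abs(psi[0])**2 + np.abs(psi[1])**2, 1.0))   # True: unit norm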

3.3 Two Qubits

We now consider a system of two qubits. We extend our mathematical support system to a four-dimensional complex vector space where we have a basis consisting of four vectors: | 00⟩, | 01⟩, | 10⟩ and | 11⟩. In this space a vector is a linear combination of the basis vectors with complex coefficients α00, α01, α10, α11:

| ψ⟩ = α00 | 00⟩ + α01 | 01⟩ + α10 | 10⟩ + α11 | 11⟩.

By extension of the previous observations, when we measure a pair of qubits we find that the system is in one of the four states | 00⟩, | 01⟩, | 10⟩ and | 11⟩ with probabilities | α00 |², | α01 |², | α10 |², and | α11 |², respectively. The normalization condition reflects the fact that the sum of probabilities must be one:

| α00 |² + | α01 |² + | α10 |² + | α11 |² = 1.

Note that before the measurement of the two qubits the state is uncertain; it is given by the previous equation and the corresponding probabilities. After the measurement the state is certain: it is | 00⟩, | 01⟩, | 10⟩, or | 11⟩, like in the case of a classical two-bit system.

But nobody forces us to observe both qubits; what if we observe only the first qubit, what conclusions can we draw? Intuitively, we expect the system to be left in an uncertain state, because we did not measure the second qubit, which can still be in a continuum of states.

The first qubit can be 0 with probability | α00 |² + | α01 |², or 1 with probability | α10 |² + | α11 |². The normalization condition is satisfied: the sum of the two probabilities is one.

Call | ψ0I⟩ the post-measurement state when the first qubit is measured to be 0 and | ψ1I⟩ the one when the first qubit is measured to be 1. The two post-measurement states are:

| ψ0I⟩ = ( α00 | 00⟩ + α01 | 01⟩ ) / √( | α00 |² + | α01 |² )

| ψ1I⟩ = ( α10 | 10⟩ + α11 | 11⟩ ) / √( | α10 |² + | α11 |² )

Now let us measure the second qubit only. The second qubit can be 0 with probability | α00 |² + | α10 |², or 1 with probability | α01 |² + | α11 |². The two post-measurement states are:

| ψ0II⟩ = ( α00 | 00⟩ + α10 | 10⟩ ) / √( | α00 |² + | α10 |² )

| ψ1II⟩ = ( α01 | 01⟩ + α11 | 11⟩ ) / √( | α01 |² + | α11 |² )

Let us now consider a special state of a two-qubit system, when α00 = α11 = 1/√2 and α01 = α10 = 0. This state is called a Bell state and the pair of qubits is called an EPR pair. In this state, when we measure the first qubit the two possible outcomes are 0 with probability 1/2 and 1 with probability 1/2. The corresponding post-measurement states are:

| ψ0I⟩ = | 00⟩

| ψ1I⟩ = | 11⟩.

When we measure the second qubit, the two possible outcomes are 0 with probability 1/2 and 1 with probability 1/2. The corresponding post-measurement states are:

| ψ0II⟩ = | 00⟩

| ψ1II⟩ = | 11⟩.

This is quite an amazing result! The two measurements are correlated: once we measure the first qubit we get exactly the same result as when we measure the second one. The two qubits need not be physically constrained to be at the same location and yet, because of the strong coupling between them, measurements performed on the second one allow us to determine the state of the first.

Einstein, Podolsky, and Rosen (thus the name EPR pair) were the first to discover the strange behavior of states like the Bell states. This strange behavior hints at possible applications of quantum information that are well beyond what we could possibly envision in a classical universe. This is the basis of a phenomenon called teleportation.
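A short numerical sketch (Python/numpy) of measuring only the first qubit, applied to the Bell state above:

    import numpy as np

    # Amplitudes (alpha_00, alpha_01, alpha_10, alpha_11) of the Bell state.
    alpha = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

    p_first_0 = np.abs(alpha[0])**2 + np.abs(alpha[1])**2   # first qubit reads 0
    p_first_1 = np.abs(alpha[2])**2 + np.abs(alpha[3])**2   # first qubit reads 1
    print(p_first_0, p_first_1)                             # 0.5 0.5

    # Post-measurement state when the first qubit is measured to be 0: it is |00>,
    # so a subsequent measurement of the second qubit is guaranteed to give 0.
    post_0 = np.array([alpha[0], alpha[1], 0, 0]) / np.sqrt(p_first_0)
    print(post_0)                                           # [1, 0, 0, 0] = |00>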

3.4 The Fragility of Quantum Information. Schrödinger's Cat

By now we should be convinced that classical information can be encoded into a quantum system. Having a potentially infinite number of states, a qubit offers dazzling possibilities; two qubits are truly amazing, as we have seen in the case of EPR pairs. This seems too good to be true; is there a catch?

Yes, the catch is that quantum information is encoded into very fragile nonlocal correlations between different parts of a physical system. Why do we say that these correlations are fragile? Because a quantum system interacts continually with its environment, and these interactions with the environment destroy the correlations encoded into the quantum system. In time the internal correlations of the quantum system are transferred into correlations between the quantum system and the environment, and the information encoded into the quantum system is lost.

Schrödinger gave an extreme example of the apparent ambiguity of quantum information. Consider the quantum description of a physical entity we are familiar with, a cat:

| cat⟩ = (1/√2)( | dead⟩ + | alive⟩ ).

Since all cats we have ever seen are either dead or alive, a layperson would view Schrödinger's cat as a definitive proof of foolishness; even Schrödinger himself considered this example a blemish on his theory [44]. Today we are more sophisticated and realize that the state | cat⟩, though possible, is extremely rare; it can be constructed as a superposition of the two possible states a cat can be in, dead and alive, but it would immediately be transformed into correlations between the cat and the environment and become inaccessible. All the cats we have ever seen are in fact projections of the cat, generated by the environment, into either state.
What can possibly go wrong with quantum information? A very serious problem is that we disturb the state of the quantum system when we attempt to measure it, as we discussed above. Another concern is that quantum information cannot be copied with fidelity. What about errors, what type of errors should we expect?

With classical bits we are aware of bit-flip errors, a 0 becoming a 1 and a 1 becoming a 0. We should expect the same to happen to qubits:

| 0⟩ → | 1⟩   and   | 1⟩ → | 0⟩.

In addition, a qubit may experience a phase error; the phase may flip and then:

| 0⟩ → | 0⟩   and   | 1⟩ → −| 1⟩.

Quantum information is continuous; if the state of a qubit is:

| ψ⟩ = α0 | 0⟩ + α1 | 1⟩

either α0 or α1, or both, may change by an infinitesimal quantity ε, and then we experience a bit error of a new type, one we cannot encounter when dealing with classical bits.

3.5 Qubits and Hilbert Spaces

Now is the time to be more rigorous in our discussion of qubits and gradually introduce the mathematical formalism discussed in more detail in Section 11. A Hilbert space is a vector space H with a scalar product ⟨ψ | φ⟩ and with the norm defined by:

‖ψ‖ = √⟨ψ | ψ⟩

As pointed out before, we use Dirac's notation for the inner product between two n-dimensional vectors | ψ⟩ and | φ⟩ [27]:

⟨ψ | φ⟩ = Σ_{i=0}^{n−1} ψi* φi,

where the asterisk (*) denotes complex conjugation. We can think of this as the product of the row vector ⟨ψ | = (ψ0*, ψ1*, . . . , ψ_{n−1}*) by the column vector:

| φ⟩ = (φ0, φ1, . . . , φ_{n−1})ᵀ.
A qubit is a microscopic system, e.g., the spin of an electron or the polarization of a photon, and may exist in a continuum of intermediate states, or superpositions. We have already seen that individual states of a quantum system consisting of one qubit can be represented as unit vectors in a complex vector space with two dimensions, denoted as H2.

We can choose different basis vectors, associated with basis states, to describe the intermediate states of a single qubit. For example, the polarization of a photon can be described using as basis vectors | 0⟩ and | 1⟩, or (1/√2)( | 0⟩ + | 1⟩ ) and (1/√2)( | 0⟩ − | 1⟩ )^{11}, or any other pair of orthogonal vectors in H2.

Our ability to distinguish between the states of a single qubit is limited [14]. The intermediate states of one qubit cannot be reliably distinguished from basis states. We have already seen that the superposition | ψ⟩ = α0 | 0⟩ + α1 | 1⟩ behaves like | 0⟩ with probability | α0 |² and like | 1⟩ with probability | α1 |². This gets even more complicated: two quantum states of one qubit can be distinguished if and only if their vector representations are orthogonal. We can distinguish | 0⟩ from | 1⟩, but we cannot distinguish | 0⟩ from (1/√2)( | 0⟩ − | 1⟩ ).

One last observation: quantum states form equivalence classes; multiplication by e^{iγ} does not alter a unit vector in any observable way. Therefore, the quantum state of one qubit is a ray in H2, the equivalence class of a vector under multiplication by a complex constant.

We expect more complications for quantum systems with more than one qubit. We have seen in Section 3.3 that a two-qubit system may be in one of four basis states, | 00⟩, | 01⟩, | 10⟩ and | 11⟩, or in a superposition of them, in a Hilbert space with four dimensions, H4. Now the two qubits can either be in states where each qubit has a well defined state, such as:

| 1⟩ ⊗ (1/√2)( | 0⟩ + | 1⟩ ) = (1/√2)( | 10⟩ + | 11⟩ )
^{11} In the example in Section 1.8 we used a more intuitive (polarization arrow) notation for these basis vectors instead of | 0⟩ and | 1⟩ and instead of (1/√2)( | 0⟩ + | 1⟩ ) and (1/√2)( | 0⟩ − | 1⟩ ).

or in an entangled state, when neither qubit has a definite state, though the pair does have a well defined state:

(1/√2)( | 00⟩ + | 11⟩ ).

In general, a system of n qubits is represented by a complex unit vector in a 2^n-dimensional Hilbert space, H_{2^n}, defined as a tensor product of n two-dimensional Hilbert spaces:

H_{2^n} = (H2)^{⊗n}.

Classical systems consisting of many components can be described by describing separately the state of each individual component. It follows that the complexity of the description of a classical system, hence the number of parameters needed to specify its state, grows linearly with the number of components. For a quantum system, the previous equation shows that the dimensionality of the state space grows exponentially with the number of components, or qubits, as the sketch below illustrates. The majority of the states of a quantum system are entangled, and they are responsible for the immense power of a quantum computer and, at the same time, for the difficulties of building one.
The evolution of a quantum system in isolation is unitary; it is linear and conserves the inner product in the Hilbert space. This means that the evolution of the system preserves superposition and distinguishability of the system states. A superposition of the input states of a quantum system consisting of a number n > 1 of qubits evolves into a corresponding superposition of output states, as we see in the next section, devoted to quantum gates and quantum circuits. Most gates map unentangled initial states of the quantum system into entangled output states. Conventional computations and communication destroy the entanglement, while quantum operations can create entanglement, and can preserve and use the entanglement to speed up computations and to transmit information over quantum channels.

3.6 The Physical Realizations of Qubits

Physical systems that are considered as possible embodiments of the qubit are: (a) the electron, with two independent spin values, ±1/2, and (b) the photon, with two independent polarizations, horizontal and vertical.

As the unit of quantum information, the qubit is a state in a two-dimensional Hilbert space. An orthonormal basis for a two-dimensional vector space can be denoted as {| 0⟩, | 1⟩}. The qubit can be expressed in this basis as the most general normalized state, or

α0 | 0⟩ + α1 | 1⟩

where α0, α1 are complex numbers that satisfy the equality

| α0 |² + | α1 |² = 1

and the overall phase is irrelevant.

We want to find more information about the qubit and we perform a measurement. The measurement itself represents a projection of the qubit state onto the basis {| 0⟩, | 1⟩}. As a result of this experiment we obtain the outcome | 0⟩ with probability | α0 |² and the outcome | 1⟩ with probability | α1 |².

3.7 Qubits as Spin 1/2 Particles

In physics it is natural to interpret the general normalized state of a qubit as the spin state of a particle with spin 1/2, such as the electron. The electron spin is found to have either the value +1/2 or −1/2 in the direction of the measurement, regardless of the direction chosen by the observer. The qubit states | 0⟩ and | 1⟩ are the spin up, | ↑⟩, and spin down, | ↓⟩, states along a chosen axis such as the z-axis. It is convenient to represent these states as orthogonal unit vectors:

| ↑⟩ = (1, 0)ᵀ   | ↓⟩ = (0, 1)ᵀ
What is the spin of a particle, electron or otherwise? We can envision the particle spin as a manifestation of a rotation about the particle's own axis. The observable associated with this rotation is the intrinsic angular momentum of the particle. The values taken by the projection of this angular momentum along an axis are proportional to ℏ and to the spin quantum number. The spin quantum number is the spin that everybody is talking about. The spin is a positive half-integer or integer number and its orientation (the orientation of the intrinsic angular momentum) is specified by a + or − sign. The electron can rotate clockwise or counterclockwise about a given axis and its spin is limited to two values. Electrons exhibit probabilities to spin about a certain axis. However, when a measurement is performed for any chosen axis of rotation, the electron is found to spin in one or the other direction about that axis, which has been defined by the observer. Before the measurement the electron does not have a definite axis of rotation; it only has a probability, or a potentiality, to assume any axis of rotation. The act of making a measurement about a defined axis provides the electron with a definite axis of rotation. In the Stern - Gerlach experiment the direction of the measurement was determined by the selection of the magnets and the orientation of the magnetic field, which acted upon the intrinsic angular momentum of the beam particles and separated the particles into states of given spin.

At any instant, an electron's spin state | ψ⟩ can be represented by a linear combination of those two possible observable states. The choice of a measurement direction is equivalent to choosing a basis for expressing the spin components of the state | ψ⟩. Each component of the measuring apparatus that contributes to the overall definition of the measurement direction has its own vector basis, and the change of bases contributes to the overall probability of finding the electron in a particular final state.
For any particular basis the electron spin state can be expressed as

| ψ⟩ = α0 (1, 0)ᵀ + α1 (0, 1)ᵀ = (α0, α1)ᵀ

where α0 and α1 are complex constants; they encode the probability that the chosen measurement will yield either result. Each probability is the squared magnitude of the respective coefficient. For example, writing α0 = a + ib, the probability that a measurement in this basis finds spin up is equal to the product

α0* α0 = (a − ib)(a + ib) = a² + b²

The probability of finding spin down in this basis is α1* α1. Since these are the only two possibilities,

α0* α0 + α1* α1 = 1.

3.8 The Measurement of the Electron Spin Along a Principal Axis

In quantum mechanics, each possible measurement basis is associated with an operator whose eigenvalues are the possible results of the respective measurement. For a given xyz basis of orthogonal space axes we can represent the three principal measurements of the spin (measurements along the three axes) by the matrices

Sx = (ℏ/2) [ 0  1 ]    Sy = (ℏ/2) [ 0  −i ]    Sz = (ℏ/2) [ 1   0 ]
           [ 1  0 ]               [ i   0 ]               [ 0  −1 ]

The matrices used here,

σx = [ 0  1 ]    σy = [ 0  −i ]    σz = [ 1   0 ]
     [ 1  0 ]         [ i   0 ]         [ 0  −1 ]

together with the identity matrix I,

I = [ 1  0 ]
    [ 0  1 ]

are the so-called Pauli spin matrices.
Any two-by-two matrix (e.g., the Hamiltonian of any two-state system) can be written as a linear combination of them:

[ a11  a12 ]  =  c0 I + c1 σx + c2 σy + c3 σz
[ a21  a22 ]

with complex coefficients ci. In the case of an electron in a magnetic field these matrices have a special geometrical significance, but in the general case they can be used simply as matrices.
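A small Python/numpy check of this decomposition; the test matrix M and the use of the trace formula ck = Tr(σk M)/2 (which follows from Tr(σk σl) = 2δkl) are illustrative, not taken from the text:

    import numpy as np

    I  = np.eye(2, dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    M = np.array([[2.0, 1 - 1j], [1 + 1j, -0.5]])     # an arbitrary test matrix
    coeffs = [np.trace(s @ M) / 2 for s in (I, sx, sy, sz)]
    rebuilt = sum(c * s for c, s in zip(coeffs, (I, sx, sy, sz)))
    print(np.allclose(rebuilt, M))                    # True: M = c0*I + c1*sx + c2*sy + c3*sz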
The possible outcomes (+1/2 or −1/2) of whichever measurement direction we choose (x, y, or z) are the eigenvalues of the corresponding measurement operator (the spin operator, in this case); the coefficients α0 and α1 in that operator's eigenbasis represent the probabilities of these outcomes.
Let us see how this works. Suppose | ψ1⟩ is the initial state vector of the electron and we decide to perform a spin measurement corresponding to a particular operator S. That is equivalent to applying S to | ψ1⟩, using ordinary matrix multiplication, to give the new state vector | ψ2⟩:

S | ψ1⟩ = | ψ2⟩

The new state | ψ2⟩ is either pure spin up or pure spin down in the direction of measurement represented by S. A subsequent measurement in the same direction must yield the same result, and | ψ2⟩ must be such that

S | ψ2⟩ = λ | ψ2⟩

for some constant λ (state vectors are equivalent up to length).
The constant λ is an eigenvalue of the measurement operator S and | ψ2⟩ is the corresponding eigenvector. The outcome of the last equation is unambiguous, since the eigenvector on the right is the same as the eigenvector on the left side. Nevertheless, since the arbitrary initial state | ψ1⟩ in the previous equation is not an eigenvector of S, the measurement can yield either of the eigenvectors of S. That brings forth the probabilistic aspect of quantum mechanics.
The eigenvectors of $S$ constitute a basis for the space of possible state vectors, and the initial state vector $|\psi_1\rangle$ can be expressed as a linear combination of those eigenvectors. If we denote the eigenvectors by $|\psi_2^{\uparrow}\rangle$ and $|\psi_2^{\downarrow}\rangle$ we can write

$$|\psi_1\rangle = \alpha^{\uparrow}|\psi_2^{\uparrow}\rangle + \alpha^{\downarrow}|\psi_2^{\downarrow}\rangle$$

Here the squared modulus of each complex coefficient gives the probability that the measurement $S$ applied to $|\psi_1\rangle$ will lead to the respective eigenstate.
The eigenvectors of the spin operators along the three principal directions $S_x$, $S_y$, $S_z$, corresponding to $+\hbar/2$ and $-\hbar/2$ respectively, are the following:

$$S_x: \quad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

$$S_y: \quad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \quad \text{and} \quad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}$$

$$S_z: \quad \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
Each pair of eigenvectors constitutes a basis for the state space; the electron state vector can be expressed as a linear combination of the basis vectors for the desired measurement (along one of the principal axes), and the coefficients give the probability of that measurement yielding either "spin up" or "spin down". These probabilities are the squared magnitudes of the projections of the initial state vector onto the orthogonal axes of the chosen measurement basis.
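These statements can be verified numerically. The sketch below is ours (with $\hbar$ set to 1 for brevity): it diagonalizes the three spin operators and computes the outcome probabilities for an arbitrary state as squared projections onto the eigenvectors:

```python
import numpy as np

hbar = 1.0
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar / 2 * np.array([[0, -1j], [1j, 0]])
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)

psi = np.array([0.6, 0.8], dtype=complex)        # an arbitrary normalized state

for name, S in (("Sx", Sx), ("Sy", Sy), ("Sz", Sz)):
    values, vectors = np.linalg.eigh(S)          # eigenvalues -1/2 and +1/2
    probs = np.abs(vectors.conj().T @ psi) ** 2  # squared projections
    print(name, values, probs, probs.sum())      # probabilities sum to 1
```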

3.9 The Measurement of the Electron Spin Along Any Spatial Axis
The spin of the electron can be measured along any spatial axis, and each such measurement is represented by an operator. Any such direction is purely relative to the state of the particle in question. Let us assume an experiment, similar to the Stern - Gerlach experiment, where the electrons are moving along the $y$-axis and their spins are measured in the $z$ direction; that means the $z$-component of the electron spin vector is pinned down. If we filter out the electrons with "spin down" in the $z$-direction, we are left with electrons all with "spin up" in the $z$-direction. Assume that now we perform a new spin measurement on the remaining "spin up" electrons along a direction in the $xz$-plane at an angle $\theta$ with the positive $z$-axis. The measurement spin operator in this direction, $S_\theta$, will be given by the projections of the $S_x$ and $S_z$ operators onto this new direction (the probabilities to obtain the spin values along the basis axes are interpreted as the projections of the electron state vector onto those axes):

$$S_\theta = \sin(\theta) S_x + \cos(\theta) S_z = \frac{\hbar}{2}\begin{pmatrix} \cos(\theta) & \sin(\theta) \\ \sin(\theta) & -\cos(\theta) \end{pmatrix}$$

The eigenvalues of this operator are $+\hbar/2$ and $-\hbar/2$ and the corresponding eigenvectors are, respectively,

$$\begin{pmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -\sin(\theta/2) \\ \cos(\theta/2) \end{pmatrix}$$

Each electron measured in this stage of the experiment has as its initial state vector "spin up" in the $z$-direction; now, that initial state vector can be expressed as a linear combination of these new basis vectors:

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} = c_1 \begin{pmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{pmatrix} + c_2 \begin{pmatrix} -\sin(\theta/2) \\ \cos(\theta/2) \end{pmatrix}$$

The coefficients are found to be $c_1 = \cos(\theta/2)$ and $c_2 = -\sin(\theta/2)$. The probabilities of "spin up" and "spin down" for the measurement of such an electron along the $\theta$-direction are $|\cos(\theta/2)|^2$ and $|\sin(\theta/2)|^2$, respectively, where $\theta$ is the angle between the two measurement directions.
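A short numerical check of this result (our own sketch, with $\hbar = 1$ and an arbitrarily chosen angle) confirms that an electron prepared "spin up" along $z$ yields "spin up" along the $\theta$-direction with probability $\cos^2(\theta/2)$:

```python
import numpy as np

theta = 0.7                                      # an arbitrary angle (radians)
S_theta = 0.5 * np.array([[np.cos(theta), np.sin(theta)],
                          [np.sin(theta), -np.cos(theta)]])  # hbar = 1

values, vectors = np.linalg.eigh(S_theta)        # eigenvalues -1/2 and +1/2
up_z = np.array([1.0, 0.0])                      # the initial "spin up along z" state

p_up = abs(vectors[:, values.argmax()] @ up_z) ** 2
print(p_up, np.cos(theta / 2) ** 2)              # the two values agree
```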

3.10 The Implications of the Quantum Mechanics Predictions


Such quantum mechanical predictions have been well supported by numerous experiments. Assume that an equivalent experiment is performed by observing the spins of two particles of spin $\frac{1}{2}$ emitted in opposite directions following the decay of a singlet state with zero total spin. For such a state the conservation of the angular momentum requires that the spin vectors of the two particles are oriented in opposite directions; if we measure the spin of one of the particles along a certain direction and find "spin up", then the other particle must have pure "spin down" in that direction. The conclusion is that by measuring the spin of one particle and reducing its state vector to one of the eigenvectors of the measurement basis, we automatically collapse the wavefunction of the other particle onto the same basis. Instead of a set of probabilistically possible states we obtain one well-defined state.
Now, assume that we perform the following experiment on two spin $\frac{1}{2}$ particles which are in a singlet state (one particle is "spin up" and the other particle is "spin down"). We measure the spin of one particle (call it I) along a fixed direction in the $xz$-plane and assume we find it in the "spin down" state; then, the other particle (call it II) is in a pure "spin up" state along that direction. Next, we perform a measurement on particle II along a direction which is at an angle $\theta$ with the direction of the first measurement, and we find that the spin state is a combination of "spin up" and "spin down" states with probabilities $|\cos(\theta/2)|^2$ and $|\sin(\theta/2)|^2$, respectively.
If the first measurement on particle I yields "spin up", then particle II is in a pure "spin down" state along that direction. A measurement of the spin of particle II along a new direction at angle $\theta$ with that of the first measurement will find particle II in a combination of "spin up" and "spin down" states with probabilities $|\sin(\theta/2)|^2$ and $|\cos(\theta/2)|^2$, respectively.
The probability that the successive measurements of each of the two particles along directions differing by angle $\theta$ will both give the same result (up-up or down-down) is $|\sin(\theta/2)|^2$, and the probability that they yield opposite results (up-down or down-up) is $|\cos(\theta/2)|^2$.
The two particles emitted from the singlet are said to be entangled; regardless of how far apart they travel before the spin measurements are made, the joint results will exhibit these joint probabilities.

These probabilities can be evaluated from the results of multiple measurements; such measurements have to be performed over several particle pairs, not just one pair. In principle a large number of particle pairs can be prepared in an identical way, in space-separated locations (in a string-like formation), and the measurements can be performed independently on the pairs. According to quantum mechanics the results are expected to satisfy the same correlations.
A quantum computer may be imagined (at this time) as being based on a collection of prepared pairs of particles whose entanglement (correlations) will satisfy all the requirements for executing a certain parallel algorithm, or the algorithm can be designed to take advantage of a specific entanglement.
Entanglement is the exact translation of the German term "Verschränkung" used by Schrödinger, who was the first to recognize this quantum effect.
Two entangled electrons have total spin 0 because the spins of the individual electrons about any axis are always opposite; their spins exist only as potentialities until an observer chooses a definite axis and performs the measurement. The measurement axis becomes the axis of rotation for both electrons, and the instant electron I is measured, electron II acquires a definite spin along the chosen axis, instantaneously. If we let a vertical axis represent 1 and a horizontal axis represent 0, it is possible to pass 1s and 0s instantaneously across large distances by selecting the appropriate axis of measurement. The measurement at the receiving end must be made after the initial measurement at the sending end. Such temporal aspects and the process of information transmission can be addressed by using polarized entangled particles.
An entangled pair is a single quantum system in a superposition of equally possible states. The entangled state contains no information about the individual particles, only that they are in opposite states. The important property of an entangled pair is that the measurement of one particle influences the state of the other particle. Einstein called that "spooky action at a distance".

3.11 The Exchange of Information Using Entangled Particles


The method described in [12] can be used in exchanging information instantaneously at a distance. Assume that we have prepared three particles in the following way: particle I is in the initial state $|\psi\rangle_1$ and resides in Paris; particles II and III are entangled particles which have been separated; particle II is in Paris with particle I, and particle III is in Orlando. The essential point is to perform specific measurements on particles I and II which project them onto the entangled state

$$|\psi\rangle_{12} = \frac{1}{\sqrt{2}}\left(|0\rangle_1|1\rangle_2 - |1\rangle_1|0\rangle_2\right)$$

This is one of the four possible maximally entangled states into which any state of two particles can be decomposed, and it is distinguished from the other three by the fact that it changes sign when particles I and II are interchanged. This anti-symmetric feature plays an important role in the experimental identification of this state.
Quantum physics predicts that once particles I and II are projected onto $|\psi\rangle_{12}$, particle III is instantaneously projected into the initial state of particle I. What is the explanation? Since we have entangled particles I and II, no matter what state particle I is in, particle II must be in the opposite state, a state which is orthogonal to the state of particle I. Initially, particles II and III were prepared in the state $|\psi\rangle_{23}$, and that means that the state of particle II is also orthogonal to the state of particle III. That is only possible if particle III in Orlando is in the same state as particle I was initially. Particle I loses its identity during the measurement entangling it to particle II; the state $|\psi\rangle_1$, where the information was inscribed, is destroyed on the Paris side of the message transmission, but the information has been communicated to particle III on the Orlando side of the message reception.

3.12 The Qubit as a Polarized Photon


A photon can have two independent polarizations; it is another important two-state system recommended to represent a qubit. Photons differ from the spin-$\frac{1}{2}$ particles in two ways:
(1) they are massless, and
(2) they have spin 1.
The spin of a particle classifies how it transforms under the group of transformations that preserve the particle's momentum. For a particle with mass, which has a rest frame, this is the rotation group. For a massless particle, there is no rest frame and the unitary transformations are representations of the rotation group in two dimensions, i.e., the rotations about the axis determined by the momentum. For a photon, this corresponds to the familiar property of light: the waves are polarized transverse to the direction of propagation.
Under a rotation by an angle $\theta$ about the axis of propagation, the two linear polarization states $|h\rangle$ and $|v\rangle$ (for horizontal and vertical polarization, respectively) transform as

$$|h\rangle \rightarrow \cos(\theta)|h\rangle + \sin(\theta)|v\rangle$$
$$|v\rangle \rightarrow -\sin(\theta)|h\rangle + \cos(\theta)|v\rangle$$

The matrix

$$\begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}$$

has the eigenstates

$$|R\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \qquad |L\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix}$$

with eigenvalues $e^{-i\theta}$ and $e^{i\theta}$ for the states of right and left polarization, respectively. These are the eigenstates of the rotation operator

$$\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = \frac{2}{\hbar} S_y$$

with eigenvalues $\pm 1$. Because the eigenvalues are $\pm 1$ (rather than $\pm\frac{1}{2}$) we say that the photon has spin 1.
Remember the experiment with polarized light (photons) presented before. We had a polarization analyzer that allowed only one of the two linear photon polarizations, say $h$, to pass through. Only 1/2 of the photons could get through. We added an analyzer for polarization $v$ and no photon could get through. Then we interposed a 45° rotated polarizer; then an $h$ polarized photon had probability 1/2 of passing through it, and a 45° polarized photon had probability 1/2 of passing through the $v$ polarizer. A device can be constructed that rotates the linear polarization of a photon and in this way applies the first transformation mentioned above to our qubit; the qubit is now in a superposition state. If we use a device that alters the relative phase of the two orthogonal linear polarization states,

$$|h\rangle \rightarrow e^{i\phi/2}|h\rangle \qquad |v\rangle \rightarrow e^{-i\phi/2}|v\rangle$$

the two devices can be used together to apply an arbitrary $2 \times 2$ unitary transformation to the photon polarization state.
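The composition of the two devices is easy to model with matrices. In the sketch below (ours; the function names rotator and phase_shifter are our own), a rotation and a relative phase shift are applied to an $|h\rangle$ photon; both operations preserve the norm, as unitary transformations must:

```python
import numpy as np

def rotator(theta):
    # rotates the linear polarization by theta in the (|h>, |v>) basis
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def phase_shifter(phi):
    # shifts the relative phase of the two linear polarizations
    return np.diag([np.exp(1j * phi / 2), np.exp(-1j * phi / 2)])

h = np.array([1, 0], dtype=complex)                   # |h> = (1, 0)
state = phase_shifter(0.3) @ rotator(np.pi / 4) @ h   # rotate, then shift phase

print(state, np.linalg.norm(state))   # the norm stays 1
```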

4 Quantum Circuits
In Section 6.7 we discuss models of computations and introduce Turing machines. To establish if a function $F(x)$ is computable or not, we have to find a Turing machine able to carry out the computation prescribed by the function $F(x)$.
Church's thesis, that all computing devices can be simulated by a Turing machine, has profound implications for computability theory: it tells us that it is sufficient to restrict ourselves to Turing machines, instead of investigating a potentially infinite set of computing devices. A quantitative version of this thesis is: any physical computing device can be simulated by a Turing machine in a number of steps polynomial in the resources used by the computing device. While no one has been able to find counter-examples for this thesis, the search has been limited to systems constructed based upon the laws of classical mechanics.
But the universe is essentially quantum mechanical, therefore there is a possibility that the computing power of quantum mechanical computing devices might be greater [52]. If this is true, then problems such as factoring integers¹², or finding discrete logarithms, for which no polynomial time classical algorithms are known, could be solved in polynomial time by a quantum device. Therefore, the investigation of quantum computing devices is well motivated.
We learned from the previous section that quantum particles can be used to store information. The state space of $n$ qubits, a $2^n$-dimensional complex vector space, is considerably larger than the set of $2^n$ discrete states of a classical system with $n$ bits. We could simulate very complex physical systems with an $n$-qubit quantum computer able to manipulate qubits. Now we ask ourselves how the qubits can be transformed inside such a quantum computing device.

4.1 Classical Logic Gates and Circuits


Boolean variables have values of either 0 or 1. The Boolean operations are: NOT, AND, NAND, OR, NOR, and XOR. Boolean algebra deals with Boolean variables and Boolean operations.
Logic gates are the active elements of a computer; they transform the information using the laws of Boolean algebra, see Figure 12. Logic gates implement Boolean operations. We only describe standard logic gates with one or two inputs and one output; gates with more than two inputs exist. Figure 12 presents six classic logic gates. For each gate we show the output as a Boolean function of the input. Each logic function is characterized by a truth table giving its output for different combinations of inputs.
We denote by $\oplus$ the addition modulo two; the output is 1 when the two inputs differ and it is 0 if they are identical. If one of the inputs is fixed, say $y = 0$, then $x \oplus y = x$. The truth table of a modulo two adder is the same as the one of an XOR gate:

x  y  x ⊕ y
0  0  0
0  1  1
1  0  1
1  1  0
It is not very difficult to prove that NAND gates are universal; any logic function can be expressed using only NAND Boolean operations, thus one can construct a logic circuit using only NAND gates.

¹² A paper presenting a polynomial time algorithm which determines if a number is prime or composite was posted on a Web site on August 6, 2002 [2].

NOT gate: y = NOT(x)

x  y
0  1
1  0

AND gate: z = x AND y

x  y  z
0  0  0
0  1  0
1  0  0
1  1  1

NAND gate: z = x NAND y

x  y  z
0  0  1
0  1  1
1  0  1
1  1  0

OR gate: z = x OR y

x  y  z
0  0  0
0  1  1
1  0  1
1  1  1

NOR gate: z = x NOR y

x  y  z
0  0  1
0  1  0
1  0  0
1  1  0

XOR gate: z = x XOR y

x  y  z
0  0  0
0  1  1
1  0  1
1  1  0

Figure 12: Classic logic gates. The truth table of each logic gate gives its output as a function of its input(s).

By contrast, XOR is not universal; indeed, it does not change the parity of its input. If the input has odd parity (01 or 10) the output is 1; if the input has even parity (00 or 11) the output is 0. Thus the class of Boolean functions constructed with XOR alone is limited.
All gates with two inputs from Figure 12 are irreversible, or non-invertible. This means that knowing the output we cannot determine the input for all possible combinations of input values. For example, knowing that the output of an AND gate is 0 we cannot identify the input combination; 0 can be produced by three possible combinations of inputs: 00, 01, and 10. Clearly, the NOT gate is reversible, and two cascaded NOT gates recover the input of the first one.

The irreversibility of classical gates means that there is an irretrievable loss of information, and this has very serious consequences regarding the energy consumption of classical gates.
Now we give an example of a logic circuit using some of the logic gates presented in this section: the full adder, Figure 13. This circuit has three inputs, a, b, and CarryIn, and two outputs, Sum and CarryOut, see Figure 13(a).

Figure 13: (a) A full one-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut. (b) The circuit for the CarryOut.

The truth table of the full adder is:

a b CarryIn Sum CarryOut
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
You may recall from an introductory course in computer architecture that one can derive the Boolean equations giving the outputs of a logic circuit as a function of the inputs from the truth table of the circuit, as a sum of products [41]. Here "sum" stands for the Boolean OR and "product" stands for the Boolean AND. In this section $\bar{a}$ denotes the negation of the Boolean variable $a$. Each term of the sum corresponds to an entry in the truth table where the output variable is 1; each term is the product of the corresponding input variables in that row, negated if the value of the variable is 0, or without negation if the value is 1.
From the truth table of the full adder it is easy to see that:

$$Sum = \bar{a}\,\bar{b}\,CarryIn + \bar{a}\,b\,\overline{CarryIn} + a\,\bar{b}\,\overline{CarryIn} + a\,b\,CarryIn$$

and

$$CarryOut = \bar{a}\,b\,CarryIn + a\,\bar{b}\,CarryIn + a\,b\,\overline{CarryIn} + a\,b\,CarryIn$$

A manipulation of the last Boolean expression shows that:

$$CarryOut = ab + a\,CarryIn + b\,CarryIn.$$

Indeed, the truth table of the last expression is:

a b CarryIn ab aCarryIn bCarryIn CarryOut
0 0 0 0 0 0 0
0 0 1 0 0 0 0
0 1 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
1 0 1 0 1 0 1
1 1 0 1 0 0 1
1 1 1 1 1 1 1
If we compare the last column of the two truth tables we verify that the two Boolean
expressions are equivalent. Figure 13(b) shows the circuit implementing the last Boolean
expression for the CarryOut.
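The equivalence can also be checked by exhaustive enumeration; the following sketch (our own, not part of the original notes) evaluates both CarryOut expressions for all eight input combinations:

```python
from itertools import product

for a, b, c in product((0, 1), repeat=3):
    na, nb, nc = 1 - a, 1 - b, 1 - c   # negations
    # sum-of-products form derived from the truth table
    sop = (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)
    # simplified form: ab + a*CarryIn + b*CarryIn
    simplified = (a & b) | (a & c) | (b & c)
    assert sop == simplified
print("the two CarryOut expressions agree on all 8 inputs")
```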

Exercise 1. Prove that the two expressions giving the CarryOut are equivalent using Boolean
algebra manipulation rather than truth tables.

Exercise 2. Using the truth table method prove de Morgan's Laws:

$$\overline{a + b} = \bar{a}\,\bar{b}$$

$$\overline{a\,b} = \bar{a} + \bar{b}$$

where $a$ and $b$ are two Boolean variables.

4.2 One Qubit Gates


A qubit gate is a black box transforming an input qubit $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ into an output qubit $|\psi'\rangle = \alpha_0'|0\rangle + \alpha_1'|1\rangle$.
Mathematically, a gate $G$ is represented by a $2 \times 2$ transfer matrix with complex elements $g_{i,j}$, $(i, j) \in \{1, 2\}$:

$$G = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}$$

Recall that the normalization condition requires that $|\alpha_0|^2 + |\alpha_1|^2 = 1$ and similarly $|\alpha_0'|^2 + |\alpha_1'|^2 = 1$. This implies that $G$ must be a unitary matrix, in other words that $G^\dagger G = I$. Here $G^\dagger$ is the adjoint of $G$, a matrix obtained from $G$ by first constructing $G^T$, the transpose of $G$, and then taking the complex conjugate of each element (or by first taking the complex conjugate of each element and then transposing the matrix), see Section 11.8:

$$g_{i,j} = \mathrm{Real}(g_{i,j}) + i\,\mathrm{Imaginary}(g_{i,j})$$

$$g_{i,j}^* = \mathrm{Real}(g_{i,j}) - i\,\mathrm{Imaginary}(g_{i,j}).$$
The transpose of a matrix has as rows the columns of the original matrix, thus:

$$G^T = \begin{pmatrix} g_{11} & g_{21} \\ g_{12} & g_{22} \end{pmatrix}$$

It follows that:

$$G^\dagger = \begin{pmatrix} g_{11}^* & g_{21}^* \\ g_{12}^* & g_{22}^* \end{pmatrix}$$

The condition for $G$ to be unitary is:

$$G^\dagger G = \begin{pmatrix} g_{11}^* g_{11} + g_{21}^* g_{21} & g_{11}^* g_{12} + g_{21}^* g_{22} \\ g_{12}^* g_{11} + g_{22}^* g_{21} & g_{12}^* g_{12} + g_{22}^* g_{22} \end{pmatrix} = I$$

Exercise 3. Derive the four equations relating the real and imaginary parts of the elements of the matrix $G$ implied by the previous condition.
The inverse of a unitary matrix $G$ is also unitary; for a unitary matrix $G^{-1} = G^\dagger$ and $GG^{-1} = I$, with $I$ the identity matrix. This implies that a quantum gate with a unitary transfer matrix can always be inverted by another quantum gate. This is extremely important: it shows that quantum gates are reversible, as opposed to classical gates, which are irreversible.
Given the transfer matrix $G$ of a quantum gate, and the input and the output qubits represented as column vectors, the transformation performed by the gate is given by the equation:

$$|\psi'\rangle = G|\psi\rangle$$

For a single qubit gate this equation can be written as:

$$\begin{pmatrix} \alpha_0' \\ \alpha_1' \end{pmatrix} = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}$$

Thus:

$$\alpha_0' = g_{11}\alpha_0 + g_{12}\alpha_1 \qquad \text{and} \qquad \alpha_1' = g_{21}\alpha_0 + g_{22}\alpha_1.$$


We examine a few important one qubit gates:
(i) I, the identity gate; leaves a qubit unchanged.
(ii) X, or NOT gate; transposes the components of an input qubit.
(iii) Y gate.
(iv) Z gate; flips the sign of a qubit.
(v) H, the Hadamard gate.
The transfer matrices of the first four gates, I, X, Y, and Z, are called the Pauli matrices. The traditional notations for the Pauli matrices in quantum mechanics are: $\sigma_0$, $\sigma_1$ or $\sigma_X$, $\sigma_2$ or $\sigma_Y$, $\sigma_3$ or $\sigma_Z$. The transfer matrices, and the output $|\psi'\rangle = G|\psi\rangle$ of these gates given the input $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$, are listed below.
$$\sigma_0 = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \qquad |\psi'\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle.$$

$$\sigma_1 = X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_0 \end{pmatrix} \qquad |\psi'\rangle = \alpha_1|0\rangle + \alpha_0|1\rangle.$$

$$\sigma_2 = Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} -i\alpha_1 \\ i\alpha_0 \end{pmatrix} \qquad |\psi'\rangle = -i\alpha_1|0\rangle + i\alpha_0|1\rangle.$$

$$\sigma_3 = Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \alpha_0 \\ -\alpha_1 \end{pmatrix} \qquad |\psi'\rangle = \alpha_0|0\rangle - \alpha_1|1\rangle.$$

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} \alpha_0 + \alpha_1 \\ \alpha_0 - \alpha_1 \end{pmatrix} \qquad |\psi'\rangle = \alpha_0\frac{|0\rangle + |1\rangle}{\sqrt{2}} + \alpha_1\frac{|0\rangle - |1\rangle}{\sqrt{2}}.$$
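The five transfer matrices are small enough to experiment with directly. The sketch below (ours; numpy assumed) applies each gate to a generic qubit and reproduces the outputs listed above:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

psi = np.array([0.8, 0.6], dtype=complex)   # alpha_0 = 0.8, alpha_1 = 0.6

for name, G in (("I", I), ("X", X), ("Y", Y), ("Z", Z), ("H", H)):
    # e.g. X swaps the amplitudes, Z flips the sign of alpha_1
    print(name, G @ psi)
```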

It is probably time now to reinforce our understanding of the formalism, the Dirac ket and bra notations, by deriving the Pauli matrices from the descriptions of the transformations they perform. Throughout this derivation we use the outer product of vectors defined in Section 2.4.
Let us start with the basics:

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \langle 0| = \begin{pmatrix} 1 & 0 \end{pmatrix}$$

$$|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \langle 1| = \begin{pmatrix} 0 & 1 \end{pmatrix}$$

The I gate performs an identity transformation, $|0\rangle \rightarrow |0\rangle$ and $|1\rangle \rightarrow |1\rangle$. This can be written as:

$$I = |0\rangle\langle 0| + |1\rangle\langle 1| = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

The X gate negates, or flips, a qubit: $|0\rangle \rightarrow |1\rangle$ and $|1\rangle \rightarrow |0\rangle$. This can be written as:

$$X = |0\rangle\langle 1| + |1\rangle\langle 0| = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

The Z gate performs a phase shift operation, $|0\rangle \rightarrow |0\rangle$ and $|1\rangle \rightarrow -|1\rangle$. This can be written as:

$$Z = |0\rangle\langle 0| - |1\rangle\langle 1| = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

The Hadamard gate, H, when applied to a pure state, $|0\rangle$ or $|1\rangle$, creates a superposition state: $|0\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
It is worth observing that single qubit gate operations correspond to rotations and reflections on the Bloch sphere.

Exercise 4. (i) Verify that the transfer matrices of the X, Z, and H gates are unitary. (ii)
Verify that the outputs of the three gates are the ones given above.

Exercise 5. Prove that $H^2 = I$, where $I$ is the identity matrix. Thus applying H twice to an input does not change it.

4.3 Two Qubit Gates. The CNOT Gate
Now we describe a quantum gate with two inputs and two outputs called CNOT, the controlled-NOT gate, see Figure 14. One of the inputs is called the control input, the other one is the target input. The first output is called the control and the second the target. The classical equivalent of the quantum CNOT gate is the XOR gate discussed at the beginning of this section: its target output is the sum modulo two of its two inputs.

Figure 14: The CNOT quantum gate. The control input $a$ is transferred unchanged; the target output is $a \oplus b$ (addition modulo 2). The target output is equal to the target input if the control input is $|0\rangle$ and flipped if the control input is not $|0\rangle$.

The operation of the CNOT gate is informally described as follows: the control input is transferred directly to the control output of the gate. The target output is equal to the target input if the control input is $|0\rangle$, and it is flipped if the control input is not $|0\rangle$. Flipping a bit $a$ means complementing it, transforming it into $\bar{a}$: if $a = 0$, it becomes 1 and vice versa. Flipping a qubit $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ results in $|\psi'\rangle = \alpha_1|0\rangle + \alpha_0|1\rangle$; the projections on the two basis vectors are swapped.
The two input qubits and the two output qubits of a CNOT quantum gate can be represented as vectors in a four dimensional vector space, the $\mathcal{H}_4$ Hilbert space. The two qubits applied to the input of the CNOT gate in Figure 14 are a control qubit $|\psi\rangle$ and a target qubit $|\varphi\rangle$:

$$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle \qquad |\varphi\rangle = \beta_0|0\rangle + \beta_1|1\rangle.$$

The input vector of the quantum CNOT gate is:

$$|V_{CNOT}\rangle = |\psi\rangle \otimes |\varphi\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \alpha_0\beta_0 \\ \alpha_0\beta_1 \\ \alpha_1\beta_0 \\ \alpha_1\beta_1 \end{pmatrix}$$

The components of the input vector are transformed by the CNOT quantum gate as follows:

$$|00\rangle \rightarrow |00\rangle \qquad |01\rangle \rightarrow |01\rangle \qquad |10\rangle \rightarrow |11\rangle \qquad |11\rangle \rightarrow |10\rangle.$$


Thus the transfer matrix of the CNOT quantum gate, $G_{CNOT}$, is:

$$G_{CNOT} = |00\rangle\langle 00| + |01\rangle\langle 01| + |10\rangle\langle 11| + |11\rangle\langle 10|$$


Let us start again from the basics:

$$|00\rangle = |0\rangle \otimes |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad |01\rangle = |0\rangle \otimes |1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$$

$$|10\rangle = |1\rangle \otimes |0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \qquad |11\rangle = |1\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

$$|00\rangle\langle 00| = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad |01\rangle\langle 01| = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

$$|10\rangle\langle 11| = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad |11\rangle\langle 10| = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Therefore:

$$G_{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
It is easy to determine the output $|W_{CNOT}\rangle$ given the input $|V_{CNOT}\rangle$ and the transfer matrix of the CNOT gate:

$$|W_{CNOT}\rangle = G_{CNOT}|V_{CNOT}\rangle = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} \alpha_0\beta_0 \\ \alpha_0\beta_1 \\ \alpha_1\beta_0 \\ \alpha_1\beta_1 \end{pmatrix} = \begin{pmatrix} \alpha_0\beta_0 \\ \alpha_0\beta_1 \\ \alpha_1\beta_1 \\ \alpha_1\beta_0 \end{pmatrix}.$$
We see that the circuit in Figure 14 preserves the control qubit (the first and the second components of the input vector are replicated in the output vector) and flips the target qubit (the third and fourth components of the input vector $|V_{CNOT}\rangle$ become the fourth and, respectively, the third components of the output vector).
The CNOT gate is reversible. The control qubit $|\psi\rangle$ is replicated at the output and, knowing it, we can reconstruct the target input qubit $|\varphi\rangle$ given the target output qubit of the CNOT gate, see Figure 14.
Later we show that CNOT is a universal quantum gate; any multiple qubit gate can be constructed from single qubit and CNOT gates.
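The computation above can be reproduced numerically: the sketch below (ours) builds the input vector with a Kronecker product and applies $G_{CNOT}$:

```python
import numpy as np

G_cnot = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=complex)

control = np.array([0.6, 0.8], dtype=complex)   # alpha_0, alpha_1
target = np.array([0.8, 0.6], dtype=complex)    # beta_0, beta_1

v = np.kron(control, target)   # (a0 b0, a0 b1, a1 b0, a1 b1)
w = G_cnot @ v

print(v)   # [0.48, 0.36, 0.64, 0.48]
print(w)   # [0.48, 0.36, 0.48, 0.64]: the last two components are swapped
```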

Exercise 6. Prove the following relationship between the CNOT gate and the I and X single qubit gates:

$$G_{CNOT} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X.$$

4.4 Can we Build Quantum Copy Machines?


There are several ways to replicate an input signal using classical gates. In Figure 15(a) we see a classical analog of the CNOT gate, a binary circuit with two inputs, a control bit $x$ and a target bit $y$, and two outputs, $x$ and $x \oplus y$. The second output is produced by an XOR gate. If the target bit is zero, $y = 0$, the circuit simply replicates input $x$ on both output lines, as shown in Figure 15(b).
Now that we have a two qubit CNOT gate and have figured out how to replicate an input classical bit, let us see if we can copy qubits. In Figure 15(c) we show a CNOT with an arbitrary control qubit as input:

$$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle.$$

We wish to replicate $|\psi\rangle$ on its output lines and try to determine the target qubit that will allow us to do so. To replicate the input implies that the output of this gate should be a vector $|W\rangle$ in $\mathcal{H}_4$:

$$|W\rangle = |\psi\rangle \otimes |\psi\rangle = (\alpha_0|0\rangle + \alpha_1|1\rangle)(\alpha_0|0\rangle + \alpha_1|1\rangle) = \alpha_0^2|00\rangle + \alpha_0\alpha_1|01\rangle + \alpha_0\alpha_1|10\rangle + \alpha_1^2|11\rangle.$$
Alternatively:

$$|W\rangle = |\psi\rangle \otimes |\psi\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \alpha_0^2 \\ \alpha_0\alpha_1 \\ \alpha_1\alpha_0 \\ \alpha_1^2 \end{pmatrix}.$$

We do not know yet what the second input should be, but, based upon the analogy with the classical case, we suspect that $|0\rangle$ may do it. Let us try to determine the actual output state of the CNOT gate in Figure 15(d). First we determine the components of its input vector:

Figure 15: (a) A classical binary circuit with two inputs $x$ and $y$ and two outputs $x$ and $x \oplus y$. (b) When $y = 0$ the circuit in (a) simply replicates input $x$ on both output lines. (c) A quantum CNOT with an arbitrary input $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$; we would like it to replicate $|\psi\rangle$ on its output lines. We know its desired output state, but we do not know yet what the second input should be. (d) If we select the second input to be $|0\rangle$ then the output is $\alpha_0|00\rangle + \alpha_1|11\rangle$, not exactly what we wished for.

$$|V\rangle = (|\psi\rangle)(|0\rangle) = (\alpha_0|0\rangle + \alpha_1|1\rangle)(|0\rangle) = \alpha_0|00\rangle + \alpha_1|10\rangle.$$

$$|V\rangle = \begin{pmatrix} \alpha_0 \\ 0 \\ \alpha_1 \\ 0 \end{pmatrix}$$

The actual output vector is:



$$|W\rangle = G_{CNOT}|V\rangle = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ 0 \\ \alpha_1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_1 \end{pmatrix}.$$

The actual output vector is different from the desired one. The only conclusion we can draw from the exercise described above is that the CNOT gate in Figure 15(d) cannot be used to copy qubits. As we shall see when we discuss the no-cloning theorem, we simply cannot copy unknown quantum states.
An informal explanation of the limitations of the circuit in Figure 15(d) is that once we measure one of the qubits of $\alpha_0|00\rangle + \alpha_1|11\rangle$ we obtain 0 with probability $|\alpha_0|^2$ or 1 with probability $|\alpha_1|^2$, and once we have measured one qubit the other is completely determined.
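The failed copying attempt is easy to reproduce. The sketch below (ours) compares the actual output of the gate with the desired product state for a particular choice of amplitudes:

```python
import numpy as np

G_cnot = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=complex)

a, b = 0.6, 0.8
psi = np.array([a, b], dtype=complex)
zero = np.array([1, 0], dtype=complex)

actual = G_cnot @ np.kron(psi, zero)   # a|00> + b|11>
desired = np.kron(psi, psi)            # a^2|00> + ab|01> + ab|10> + b^2|11>

print(actual)                          # [0.6, 0, 0, 0.8]
print(desired)                         # [0.36, 0.48, 0.48, 0.64]
print(np.allclose(actual, desired))    # False: the gate does not clone
```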

4.5 Three Qubit Gates. The Fredkin Gate


Three qubit gates have three inputs and three outputs. One or two of the inputs are referred to as control qubit(s) and are transferred directly to the output.
The gate in Figure 16 is called a Fredkin gate. There are quantum Fredkin gates, which are reversible, as well as classical ones that perform similar functions but are not reversible. We use the classical version of this gate to construct the truth table and derive the Boolean expressions relating the inputs and outputs.
The Fredkin gate has two regular inputs, $a$ and $b$, and a control input $c$, and three outputs, $a'$, $b'$, and $c'$. The control input $c$ is transferred directly to the output, $c' = c$. The control input determines the regular outputs as follows:
(i) When $c = 0$ the two regular inputs are transferred without modification to the output, $a' = a$ and $b' = b$, see Figure 16(a). The truth table for this case has four entries, one for each possible combination of values of $a$ and $b$.
(ii) When $c = 1$ the two regular inputs are swapped, $a' = b$ and $b' = a$, see Figure 16(b) for the circuit and its truth table.
The full truth table of the Fredkin gate in Figure 16(c) is obtained by concatenating the truth tables for the two configurations in Figures 16(a) and (b). The logic expressions for the outputs of the Fredkin gate are derived following the rules we used for the CarryOut of the full adder presented at the beginning of this section. The last equation below confirms that the control input is transferred directly to the output:

$$a' = a\bar{b}\bar{c} + ab\bar{c} + \bar{a}bc + abc = a\bar{c}(\bar{b} + b) + bc(\bar{a} + a) = a\bar{c} + bc.$$

$$b' = \bar{a}b\bar{c} + ab\bar{c} + a\bar{b}c + abc = b\bar{c}(\bar{a} + a) + ac(\bar{b} + b) = b\bar{c} + ac.$$

$$c' = \bar{a}\bar{b}c + \bar{a}bc + a\bar{b}c + abc = \bar{a}c(\bar{b} + b) + ac(\bar{b} + b) = \bar{a}c + ac = c.$$
Now we discuss several properties of the Fredkin gate. First, we show that indeed the Fredkin gate is reversible: knowing $a'$, $b'$, and $c'$ we can determine $a$, $b$, and $c$. This is easy to prove; for example, one can express $a$ and $b$ as functions of $a'$, $b'$, and $c'$ using the truth table in Figure 16(c). If we apply two consecutive Fredkin gates, with inputs and outputs $(a, b, c) \rightarrow (a', b', c')$ and $(a', b', c') \rightarrow (a'', b'', c'')$, respectively (the second one has as input the output of the first), then $a'' = a$, $b'' = b$, and $c'' = c$ (the output of the second gate is the same as the input of the first one).
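The reversibility, and the conservative property discussed below, can be checked by brute force over all eight inputs; the sketch below (ours) does exactly that for the classical Fredkin gate:

```python
from itertools import product

def fredkin(a, b, c):
    # when c = 1 the regular inputs are swapped, otherwise passed through
    return (b, a, c) if c == 1 else (a, b, c)

for a, b, c in product((0, 1), repeat=3):
    # reversible: applying the gate twice returns the original inputs
    assert fredkin(*fredkin(a, b, c)) == (a, b, c)
    # conservative: the number of 1s is preserved
    assert sum(fredkin(a, b, c)) == a + b + c
print("Fredkin is reversible and conservative on all 8 inputs")
```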

The full truth table of the Fredkin gate is:

a  b  c | a' b' c'
0  0  0 | 0  0  0
0  1  0 | 0  1  0
1  0  0 | 1  0  0
1  1  0 | 1  1  0
0  0  1 | 0  0  1
0  1  1 | 1  0  1
1  0  1 | 0  1  1
1  1  1 | 1  1  1

Figure 16: The Fredkin gate has three inputs, $a$, $b$, and the control $c$; it also has three outputs, $a'$, $b'$, and $c' = c$. (a) When $c = 0$ the inputs appear at the output, $a' = a$ and $b' = b$. (b) When $c = 1$ the inputs are swapped, $a' = b$ and $b' = a$. (c) The truth table of the Fredkin gate. (d) The Fredkin gate becomes an AND gate when $a = 0$; then $a' = bc$ and $b' = b\bar{c}$. (e) The Fredkin gate becomes a NOT gate when $a = 1$ and $b = 0$.

We examine the truth table once again and observe that the Fredkin gate conserves the number of 1s between its input and its output; for this reason it is called a conservative logic gate. This property suggests analogies with other physical laws regarding the conservation of mass, energy, and momentum.
Last, but not least, we show that the Fredkin gate is universal. The Fredkin gate can
simulate an AND gate and a NOT gate. Consider the case shown in Figure 16(d) when $a = 0$. In this case we have an AND gate (recall that the product of $b$ and $c$ corresponds to the Boolean AND):

$$a' = bc \qquad \text{and} \qquad b' = b\bar{c}.$$

When $a = 1$ and $b = 0$, see Figure 16(e), we have a NOT gate:

$$a' = \bar{c} \qquad \text{and} \qquad b' = c.$$

The configuration in Figure 16(e) generates two copies of the input $c$ (on $b'$ and $c'$) and can be used as a FANOUT gate. The Fredkin gate can perform a switching function called CROSSOVER when it swaps its two inputs, as in Figure 16(b).
The input and output vectors of a quantum Fredkin gate are related by a set of linear equations:

$$|W_{Fredkin}\rangle = G_{Fredkin}|V_{Fredkin}\rangle$$

where $G_{Fredkin}$ is the transfer matrix of the gate.
A system of three qubits requires an eight-dimensional complex vector space with a basis consisting of eight vectors: $|000\rangle$, $|001\rangle$, $|010\rangle$, $|011\rangle$, $|100\rangle$, $|101\rangle$, $|110\rangle$, and $|111\rangle$. In this space a vector $|V\rangle$ is a linear combination of the basis vectors with complex coefficients $\alpha_{000}, \alpha_{001}, \alpha_{010}, \alpha_{011}, \alpha_{100}, \alpha_{101}, \alpha_{110}, \alpha_{111}$:

$$|V_{Fredkin}\rangle = \alpha_{000}|000\rangle + \alpha_{001}|001\rangle + \alpha_{010}|010\rangle + \alpha_{011}|011\rangle + \alpha_{100}|100\rangle + \alpha_{101}|101\rangle + \alpha_{110}|110\rangle + \alpha_{111}|111\rangle.$$

Thus $G_{Fredkin} = [\,g_{ij}\,]$, $1 \le i, j \le 8$.

Exercise 7. Construct $G_{Fredkin}$, the transfer matrix of a Fredkin gate.

Exercise 8. Show that $G_{Fredkin}$ is unitary. This implies that the normalization condition holds:

$$|\alpha_{000}|^2 + |\alpha_{001}|^2 + |\alpha_{010}|^2 + |\alpha_{011}|^2 + |\alpha_{100}|^2 + |\alpha_{101}|^2 + |\alpha_{110}|^2 + |\alpha_{111}|^2 = 1.$$

Exercise 9. Prove the following relationship between the Fredkin gate, the swap gate, and the I gate:

$$G_{Fredkin} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes G_{swap}$$

with

$$G_{swap} = |00\rangle\langle 00| + |01\rangle\langle 10| + |10\rangle\langle 01| + |11\rangle\langle 11|.$$

Figure 17: (a) The Toffoli gate has three inputs, two control inputs $a$ and $b$, and a target $c$. The outputs are $a' = a$, $b' = b$, and $c'$. If both control bits or qubits are 1 then $c$ is flipped, otherwise its state is unchanged. (b) A classical Toffoli gate can be used to implement a NAND gate; when $c = 1$ then $c' = 1 \oplus (a \text{ AND } b) = \text{NOT}(a \text{ AND } b)$. (c) A quantum Toffoli gate performs a FANOUT function.

4.6 The Toffoli Gate

While the Fredkin gate has only one control input, the Toffoli gate has two control inputs, $a$ and $b$, and one target input $c$. The outputs are: $a' = a$, $b' = b$, and $c'$, see Figure 17(a). The Toffoli gate is a universal gate and it is reversible.
There are both classical and quantum versions of the Toffoli gate. The truth table and the transfer matrix of the classical and quantum Toffoli gates are identical. For the sake of clarity we describe first the function of the classical Toffoli gate. If both control bits are 1 then $c$ is flipped, otherwise its state is unchanged. The truth table of the classical Toffoli gate reflects this functional description: $c' = c$ for the first six entries and $c' = \bar{c}$ for the last two, when $a = b = 1$:

a  b  c | a' b' c'
0  0  0 | 0  0  0
0  0  1 | 0  0  1
0  1  0 | 0  1  0
0  1  1 | 0  1  1
1  0  0 | 1  0  0
1  0  1 | 1  0  1
1  1  0 | 1  1  1
1  1  1 | 1  1  0
From the truth table we see that $c' = c \oplus (a \text{ AND } b)$. From the definition of addition modulo two (denoted by $\oplus$) it follows immediately that for any binary variable $a$, $1 \oplus a = \bar{a}$ and $0 \oplus a = a$.
If $c = 1$ then $c' = a$ NAND $b$. If $c = 0$ then $c' = a$ AND $b$. The Toffoli gate implements both the NAND and AND functions, see Figure 17(b). Figure 17(c) shows a FANOUT circuit; when $a = 1$ and $c = 0$, the target output is $c' = c \oplus (a \text{ AND } b) = 0 \oplus b = b$. The second control input bit is replicated.
Let us now examine the quantum Toffoli gate. Let the three input qubits corresponding to $a$, $b$, and $c$ be:

$$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$$
$$|\varphi\rangle = \beta_0|0\rangle + \beta_1|1\rangle$$
$$|\chi\rangle = \gamma_0|0\rangle + \gamma_1|1\rangle$$

Then the input is $|V_{Toffoli}\rangle = |\psi\rangle \otimes |\varphi\rangle \otimes |\chi\rangle$. If we substitute the expressions for the three qubits and carry out the vector multiplication we get:

$$|V_{Toffoli}\rangle = \alpha_0\beta_0\gamma_0|000\rangle + \alpha_0\beta_0\gamma_1|001\rangle + \alpha_0\beta_1\gamma_0|010\rangle + \alpha_0\beta_1\gamma_1|011\rangle + \alpha_1\beta_0\gamma_0|100\rangle + \alpha_1\beta_0\gamma_1|101\rangle + \alpha_1\beta_1\gamma_0|110\rangle + \alpha_1\beta_1\gamma_1|111\rangle.$$
The vector describing the input is:

$$|V_{Toffoli}\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} \otimes \begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix} = \begin{pmatrix} \alpha_0\beta_0\gamma_0 \\ \alpha_0\beta_0\gamma_1 \\ \alpha_0\beta_1\gamma_0 \\ \alpha_0\beta_1\gamma_1 \\ \alpha_1\beta_0\gamma_0 \\ \alpha_1\beta_0\gamma_1 \\ \alpha_1\beta_1\gamma_0 \\ \alpha_1\beta_1\gamma_1 \end{pmatrix}$$

The transfer matrix of the Toffoli gate is:

$$G_{Toffoli} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
The output of the Toffoli gate is $|W_{Toffoli}\rangle = G_{Toffoli}|V_{Toffoli}\rangle$:

$$|W_{Toffoli}\rangle = G_{Toffoli}\begin{pmatrix} \alpha_0\beta_0\gamma_0 \\ \alpha_0\beta_0\gamma_1 \\ \alpha_0\beta_1\gamma_0 \\ \alpha_0\beta_1\gamma_1 \\ \alpha_1\beta_0\gamma_0 \\ \alpha_1\beta_0\gamma_1 \\ \alpha_1\beta_1\gamma_0 \\ \alpha_1\beta_1\gamma_1 \end{pmatrix} = \begin{pmatrix} \alpha_0\beta_0\gamma_0 \\ \alpha_0\beta_0\gamma_1 \\ \alpha_0\beta_1\gamma_0 \\ \alpha_0\beta_1\gamma_1 \\ \alpha_1\beta_0\gamma_0 \\ \alpha_1\beta_0\gamma_1 \\ \alpha_1\beta_1\gamma_1 \\ \alpha_1\beta_1\gamma_0 \end{pmatrix}.$$
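A quick numerical check (our own sketch) confirms that this $8 \times 8$ matrix is the permutation implementing $c' = c \oplus (a \text{ AND } b)$ on the basis states:

```python
import numpy as np

G_toffoli = np.eye(8)
G_toffoli[[6, 7]] = G_toffoli[[7, 6]]   # swap the |110> and |111> rows

for i in range(8):
    a, b, c = (i >> 2) & 1, (i >> 1) & 1, i & 1
    j = (a << 2) | (b << 1) | (c ^ (a & b))   # c' = c XOR (a AND b)
    basis = np.zeros(8)
    basis[i] = 1
    assert np.argmax(G_toffoli @ basis) == j
print("the matrix implements c' = c XOR (a AND b)")
```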

Exercise 10. Prove the following relationship between the Toffoli gate and the CNOT and I gates:

$$G_{Toffoli} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes G_{CNOT}$$

where $I$ is here the $4 \times 4$ identity matrix.

4.7 Quantum Circuits
Quantum circuits are built by interconnecting quantum gates. Several limitations are imposed on the realization of quantum circuits. First, the circuits are acyclic; feedback from one part of the circuit to another is not allowed, there are no loops. Second, we cannot copy qubits, as we have seen earlier.
In this section we give two examples of quantum circuits. Figure 18 presents a two-bit adder circuit constructed with reversible gates. Figure 19 displays a circuit for swapping two qubits.

Figure 18: A two-bit adder made of reversible gates. The inputs are $a$, $b$, and $c = 0$; the outputs are $a' = a$, $b' = \text{Sum} = a \text{ XOR } b$, and $c' = \text{CarryOut} = a \text{ AND } b$.

The two-bit adder consists of a Toffoli gate followed by a CNOT gate. The two control inputs to the Toffoli gate are $a$ and $b$ and its target input is $c = 0$. The target output of the Toffoli gate is $c' = c \oplus (a \text{ AND } b) = a \text{ AND } b$, because $c = 0$; this propagates to the output, so the CarryOut is $a$ AND $b$. The control input $a$ of the CNOT propagates to its output unchanged, $a' = a$. Finally, $b$ becomes the target input of the CNOT gate, and the target output of the CNOT gate is $b' = a \text{ XOR } b$. Thus the two outputs of the circuit are the Sum and the CarryOut of the two inputs $a$ and $b$.
The circuit for swapping two qubits consists of three CNOT gates.

Figure 19: A circuit for swapping two qubits, built from three CNOT gates. The inputs and outputs of each stage are shown.

We observe that $a \oplus (a \oplus b) = (a \oplus a) \oplus b = b$ and $b \oplus (a \oplus b) = (b \oplus b) \oplus a = a$. It is easy to see that $a \oplus a = 0$ and that $0 \oplus b = b$.
To show that the two inputs are swapped we determine the output of each stage. The first stage leaves the first qubit unchanged and replaces the second by $a \oplus b$, so the inputs to the second stage are $a$ and $a \oplus b$. The second stage replaces its first input by $a \oplus (a \oplus b) = b$, so the inputs to the third stage are $b$ and $a \oplus b$. Based upon the previous observations, the outputs of the third stage are $b$ and $(a \oplus b) \oplus b = a$; thus the circuit swaps its two inputs.
If $H$ is the Hadamard gate, the Walsh-Hadamard transform is defined recursively as:

$$W_1 = H$$

$$W_{n+1} = H \otimes W_n.$$

When applied to $n$ qubits, each in state $|0\rangle$, the Walsh-Hadamard transform creates a superposition of $2^n$ states:

$$(H \otimes H \otimes \cdots \otimes H)|000\ldots 0\rangle = \frac{1}{\sqrt{2^n}}\left[(|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle) \otimes \cdots \otimes (|0\rangle + |1\rangle)\right].$$
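The sketch below (ours) builds $W_n$ for $n = 3$ as an iterated Kronecker product and applies it to $|000\rangle$; all $2^n$ amplitudes come out equal to $1/\sqrt{2^n}$:

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
n = 3

W = reduce(np.kron, [H] * n)             # H tensor H tensor ... tensor H
state = np.zeros(2 ** n)
state[0] = 1                             # |00...0>

print(W @ state)   # all 2^n amplitudes equal 1/sqrt(2^n) ~ 0.3536
```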

Figure 20: A full one-qubit adder constructed with Toffoli and CNOT gates. The inputs are $|CarryIn\rangle$, $|x\rangle$, $|y\rangle$, and two ancilla qubits in state $|0\rangle$; the outputs include $|Sum\rangle$ and $|CarryOut\rangle$. The CNOT stages are marked 1 and 2 and the Toffoli stages 3 and 4.

Example 9. We show that the circuit in Figure 20, constructed with Toffoli and CNOT gates, implements a full one qubit adder.
First, we observe that the first three qubits from the top, namely $|c\rangle = |CarryIn\rangle$, $|x\rangle$, and $|y\rangle$, are control qubits on all gates, thus they are transferred without any change to the output. We only have to compute expressions for $|Sum\rangle$ and $|CarryOut\rangle$.
For simplicity we drop the ket notations and denote qubit $|x\rangle$ simply as $x$.
If we call $s_1$ and $s_2$ the results of the CNOT transformations at the stages marked as 1 and 2 in Figure 20, we see that:

$$s_1 = c \oplus 0 = c,$$
$$s_2 = s_1 \oplus x = c \oplus x,$$
$$Sum = s_2 \oplus y = (c \oplus x) \oplus y.$$

It is easy to show that:

$$(c \oplus x) \oplus y = \bar{x}\,\bar{y}\,c + \bar{x}\,y\,\bar{c} + x\,\bar{y}\,\bar{c} + x\,y\,c.$$

This is precisely the expression for the Sum derived in Section 4.1. To prove this equality we use the fact that $x \oplus y = \bar{x}y + x\bar{y}$ and de Morgan's Laws: $\overline{x + y} = \bar{x}\,\bar{y}$ and $\overline{xy} = \bar{x} + \bar{y}$.
If we call $c_3$ and $c_4$ the results of the transformations performed by the Toffoli gates at the stages marked as 3 and 4 in Figure 20, we see that:

$$c_3 = 0 \oplus (xy) = xy,$$
$$c_4 = c_3 \oplus (cx) = (xy) \oplus (cx),$$
$$CarryOut = c_4 \oplus (cy) = (xy) \oplus (cx) \oplus (cy).$$

It is easy to show that:

$$(xy) \oplus (cx) \oplus (cy) = x\,y\,\bar{c} + \bar{x}\,y\,c + x\,\bar{y}\,c + x\,y\,c.$$

This is precisely the expression for the CarryOut derived in Section 4.1.
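The two expressions can be verified exhaustively; the sketch below (ours) checks the adder equations against ordinary binary addition for all eight inputs:

```python
from itertools import product

for c, x, y in product((0, 1), repeat=3):
    Sum = (c ^ x) ^ y                       # the CNOT stages
    CarryOut = (x & y) ^ (c & x) ^ (c & y)  # the Toffoli stages
    assert Sum == (x + y + c) % 2           # parity of the three bits
    assert CarryOut == (x + y + c) // 2     # carry of the three-bit sum
print("Sum and CarryOut are correct for all 8 inputs")
```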

Figure 21: A three-qubit circuit with a CNOT gate; the control is the topmost qubit $|a\rangle$ and the target is the bottom qubit $|c\rangle$.

Example 10. The transfer matrix of the circuit in Figure 21 is:

$$G_Q = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
We observe that the topmost qubit $|a\rangle$ is the control qubit of a CNOT gate, thus it is transferred to the output unchanged, $|a'\rangle = |a\rangle$. The second qubit is not affected, $|b'\rangle = |b\rangle$. The third qubit is flipped when $|a\rangle = |1\rangle$ and left alone when $|a\rangle = |0\rangle$. Hence, the truth table of this gate is:

a  b  c | a' b' c'
0  0  0 | 0  0  0
0  0  1 | 0  0  1
0  1  0 | 0  1  0
0  1  1 | 0  1  1
1  0  0 | 1  0  1
1  0  1 | 1  0  0
1  1  0 | 1  1  1
1  1  1 | 1  1  0
This means that the gate maps its input to the output as follows:

$$|000\rangle \rightarrow |000\rangle \qquad |001\rangle \rightarrow |001\rangle \qquad |010\rangle \rightarrow |010\rangle \qquad |011\rangle \rightarrow |011\rangle$$
$$|100\rangle \rightarrow |101\rangle \qquad |101\rangle \rightarrow |100\rangle \qquad |110\rangle \rightarrow |111\rangle \qquad |111\rangle \rightarrow |110\rangle$$

Therefore:

$$G_Q = |000\rangle\langle 000| + |001\rangle\langle 001| + |010\rangle\langle 010| + |011\rangle\langle 011| + |100\rangle\langle 101| + |101\rangle\langle 100| + |110\rangle\langle 111| + |111\rangle\langle 110|.$$
It is easy to compute the outer products in the expression above. As an example we compute the last term of this sum:

$$|111\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}^T$$

Now

$$|111\rangle\langle 110| = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

4.8 The No Cloning Theorem


The transformations carried out by quantum circuits are unitary. As a consequence of this fact, unknown quantum states cannot be copied, or cloned. Several proofs of this theorem are available [38].
Here we present a proof by contradiction, presented first in [63]. Let us assume that a two input gate capable of cloning one of its inputs exists. By now we know that a gate performs a linear transformation of its input vector $V$ into an output vector $W = GV$, with $G$ the transfer matrix of the gate. In Section 11 we show that the transformations performed by matrices can be represented by linear operators. Let us call $U$ the unitary transformation corresponding to the unitary matrix $G$ of a two input gate that allows us to replicate an input qubit.

Let $a$ and $b$ be two orthogonal quantum states, or qubits. If we apply each one of them at the first input of the gate independently, the fact that the gate clones its input means that:

$$U(|a0\rangle) = |aa\rangle$$

and

$$U(|b0\rangle) = |bb\rangle.$$

These two equations simply state that when the second input qubit is 0 the output vector of the gate (a vector in $\mathcal{H}_4$ obtained by composing the two output qubits) consists of two replicas of the first qubit, $|a\rangle$ for the first case and $|b\rangle$ for the second case.
Now consider another state $|c\rangle = \frac{1}{\sqrt{2}}(|a\rangle + |b\rangle)$. $U$ is a linear transformation, thus:

$$U(|c0\rangle) = \frac{1}{\sqrt{2}}\left[U(|a0\rangle) + U(|b0\rangle)\right] = \frac{1}{\sqrt{2}}\left[|aa\rangle + |bb\rangle\right].$$

We could try to apply $c$ at the input of our cloning gate. Then we expect $c$ to be cloned:

$$U(|c0\rangle) = |cc\rangle.$$

But

$$|cc\rangle = \left[\frac{1}{\sqrt{2}}(|a\rangle + |b\rangle)\right]\left[\frac{1}{\sqrt{2}}(|a\rangle + |b\rangle)\right] = \frac{1}{2}(|aa\rangle + |ab\rangle + |ba\rangle + |bb\rangle).$$

This contradicts the expression we got earlier assuming linearity. We have to conclude that there is no linear operation to reliably clone unknown quantum states. It is possible to clone known states, after a measurement has been performed. Before the measurement of a quantum system in an unknown state the outcome of the measurement is uncertain. After the measurement the outcome is the projection of the state on one of the basis vectors, thus it is well determined.
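The contradiction can be displayed numerically; the sketch below (ours) compares the state required by linearity with the state required by cloning:

```python
import numpy as np

a = np.array([1, 0], dtype=complex)     # |a>
b = np.array([0, 1], dtype=complex)     # |b>, orthogonal to |a>
c = (a + b) / np.sqrt(2)                # |c> = (|a> + |b>) / sqrt(2)

by_linearity = (np.kron(a, a) + np.kron(b, b)) / np.sqrt(2)
if_cloned = np.kron(c, c)               # (|aa> + |ab> + |ba> + |bb>) / 2

print(by_linearity)                     # [0.707, 0, 0, 0.707]
print(if_cloned)                        # [0.5, 0.5, 0.5, 0.5]
print(np.allclose(by_linearity, if_cloned))   # False: no unitary can clone |c>
```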

4.9 Mathematical Models of a Quantum Computer


Several mathematical models for a quantum computer have been proposed, including the quantum Turing machine model [5] and the quantum cellular automata model [23]. In a recent publication Peter Shor provides a succinct description of the quantum circuit model [55].
A quantum computer consists of input and output wires carrying qubits and quantum circuits, which in turn are made out of quantum gates. The number of input and output bits of a classical gate may differ, while a quantum gate maps $q$ input qubits into precisely $q$ output qubits. This is a necessary but not a sufficient condition for reversibility.
The mathematical representation of such a quantum gate is a unitary matrix with $2^q$ rows and $2^q$ columns, $G = [g_{i,j}]$, $1 \le i, j \le 2^q$. If $V$ is an input vector, then the output vector $W$ is given by:

$$W = GV.$$

The input vector $V$ is the product of $q$ two-dimensional vectors, each one of them representing the state of one qubit. Each qubit can be in a superposition state $|\psi\rangle = \alpha_0|B_0\rangle + \alpha_1|B_1\rangle$, where $B_0$ and $B_1$ are orthonormal basis vectors in $\mathcal{H}_2$; $|0\rangle$ and $|1\rangle$ are often used as basis vectors of $\mathcal{H}_2$.
The input vector of the gate represents a superposition state of a quantum system in a Hilbert space $\mathcal{H}_{2^q}$, see Section 3.5. Similarly, the output of the gate is a vector representing the superposition state of a quantum system in the Hilbert space $\mathcal{H}_{2^q}$ and can be regarded as the product of $q$ qubits.
Let us now consider an $n$ qubit quantum computer. The joint space of $n$ qubits is the tensor product of the individual state spaces of the $n$ qubits; $\mathcal{H}_{2^n}$ is a tensor product of $n$ two-dimensional Hilbert spaces:

$$\mathcal{H}_{2^n} = (\mathcal{H}_2)^{\otimes n}.$$

If $b_1 b_2 b_3 \ldots b_n$ is a binary string and $|V_{b_1}\rangle, |V_{b_2}\rangle, |V_{b_3}\rangle, \ldots, |V_{b_n}\rangle$ are vectors in $\mathcal{H}_2$, then the tensor product of these vectors is:

$$V_{b_1 b_2 b_3 \ldots b_n} = |V_{b_1}\rangle \otimes |V_{b_2}\rangle \otimes |V_{b_3}\rangle \otimes \cdots \otimes |V_{b_n}\rangle.$$

An equivalent notation is:

$$V_{b_1 b_2 b_3 \ldots b_n} = |V_{b_1} V_{b_2} V_{b_3} \ldots V_{b_n}\rangle.$$

The basis vectors of the $\mathcal{H}_{2^n}$ Hilbert space are:

$$B_0 = |000\ldots 000\rangle, \quad B_1 = |000\ldots 001\rangle, \quad B_2 = |000\ldots 010\rangle, \quad B_3 = |000\ldots 011\rangle, \quad \ldots \quad B_{2^n-2} = |111\ldots 110\rangle, \quad B_{2^n-1} = |111\ldots 111\rangle.$$
The input to a quantum computer is classical information represented as a binary string of, say, $k \le n$ bits. This string is mapped to an input vector with the last $n - k$ bits set to zero:

$$V = b_0 b_1 \ldots b_{k-1} 0 0 \ldots 0.$$

The output of the $n$-qubit quantum computer is:

$$W = \sum_{i=0}^{2^n - 1} \alpha_i B_i.$$

The $\alpha_i$ are complex numbers called probability amplitudes and they must satisfy the condition:

$$\sum_{i=0}^{2^n - 1} |\alpha_i|^2 = 1.$$

According to Heisenberg's uncertainty principle we cannot measure the complete state of a quantum system. If we observe the output of the quantum computer we obtain the result $i$ (represented by the binary string $b_1 b_2 \ldots b_n$) with probability $|\alpha_i|^2$. For example, if $n = 3$, the output value $i = 6$ corresponds to the binary string 110, with $b_1 = 1$, $b_2 = 1$, $b_3 = 0$. If $\alpha_{110} = 0.3 + 0.4i$ then the probability of observing the value 6 at the output is $(0.3^2 + 0.4^2) = 0.25$.
Each gate of a quantum computer acting on $q$ qubits induces a transformation of the state space $\mathcal{H}_{2^n}$ of the entire quantum computer. For example, the action of a 2-qubit quantum gate with a $4 \times 4$ transfer matrix $G = [g_{k,l}]$, $1 \le k, l \le 4$, acting on two qubits, say $i$ and $j$, of an $n$-qubit quantum computer is represented by the tensor product of the matrix $G$ with $n - 2$ identity matrices, each acting on one of the remaining qubits.
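For two adjacent qubits this construction is a direct Kronecker product; the sketch below (ours; the helper name lift is our own, and it handles only adjacent qubit pairs) builds the induced transformation:

```python
import numpy as np
from functools import reduce

def lift(G, i, n):
    # identities on qubits 0..i-1, then G on qubits i and i+1, then identities
    factors = [np.eye(2)] * i + [G] + [np.eye(2)] * (n - i - 2)
    return reduce(np.kron, factors)

G_cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
U = lift(G_cnot, 1, 4)    # CNOT on qubits 1 and 2 of a 4-qubit computer
print(U.shape)            # (16, 16): a unitary acting on all of H_16
```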

5 The Entanglement of Computers and Communication with Quantum Mechanics

5.1 Uncertainty and Locality


Earlier we noted that Heisenberg's uncertainty principle reflects the fact that measurements of quantum systems disturb the system being measured. This fundamental law of physics is even more profound. Outcomes of experiments, which we call observables, have a minimum level of uncertainty that cannot possibly be removed with the help of a theoretical model.
Let us assume that we are interested in two observables of a quantum particle, call them $A$ and $B$; for example, the time and the energy of a photon, or the position and the momentum of an electron. We prepare a large number of quantum systems (in our example photons or electrons) in identical states $|\psi\rangle$ and measure first the observable $A$ and then the observable $B$ on all particles. Since all systems were initially in the same state we obtain the same value for the observable $A$ and we see a large standard deviation of observable $B$.
Alternatively, we can measure the observable $A$ on some systems and the observable $B$ on the others. Call $\Delta(A)$ and $\Delta(B)$ the standard deviations of the measurements of $A$ and $B$, respectively. Heisenberg's uncertainty principle states that:

$$\Delta(A)\,\Delta(B) \geq \frac{1}{2}\left|\langle\psi|[A, B]|\psi\rangle\right|.$$
Uncertainty is a fundamental property of quantum systems. The classical interpretation of probabilities as lack of knowledge regarding future events is no longer true for quantum systems, where probabilities reflect our limited ability to acquire knowledge about the present. After observing the weather in Orlando, Florida, for many years we may determine that the conditional probability of rain on a fifth consecutive day, given that there were already four consecutive rainy days, is, say, one in one hundred. So after four rainy days we can say with high probability, 0.99, that there will be no rain during the fifth day. But, very seldom, this prediction of the future will turn out to be false. On the other hand, for quantum systems, once we have measured the position of a particle there is a certain distribution of its momentum; it cannot be precisely determined, regardless of our ingenuity.
Leading the camp of those who did not accept the uncertainty at the core of quantum mechanics there is no more imposing figure than Albert Einstein. He is one of the pioneers of the new theory; in 1905 the little-known clerk of the patent office in Bern published three papers that attracted the world's attention: one on the photoelectric effect, one on the special theory of relativity, and one on statistical thermodynamics. In his explanation of the photoelectric effect Einstein considered the photons as discrete packets carrying an energy $E = h\nu$, with $h$ the Planck constant and $\nu$ the frequency of the light. His explanation was in perfect agreement with the experiments and with Max Planck's expression for the energy produced by a light-emitting system, $E = nh\nu$ with $n = 0, 1, 2, \ldots$.
Until his death in 1955, Albert Einstein argued with Bohr, Heisenberg, Schrödinger, Dirac, and other supporters of the Copenhagen doctrine that non-determinism cannot possibly be the basis for the laws of nature. The famous pronouncement "God does not play dice" reflects Einstein's strong conviction that the quantum theory was missing something, that it was an incomplete scientific theory. Einstein believed that some hidden variables are probably missing from quantum mechanics and that if one could discover the values of these variables the randomness would disappear.

Albert Einstein and other physicists believed that there are three fundamental principles for a theory attempting to accurately describe nature: (i) A deterministic view allowing variables to be known with great precision, like the one supported by Isaac Newton's theory. Probabilities are acceptable to describe the outcomes of experiments, but only under special conditions, e.g., when boundary conditions limit our ability to get a complete description of reality. (ii) Locality, the fact that systems far apart in space can influence each other only by exchanging signals subject to the limitations caused by the finite speed of light. (iii) Completeness, the inclusion of all elements of reality.
For years Einstein, together with the most preeminent scientists of the time, met regularly at the Solvay Conferences¹³. Regularly, at these meetings, Einstein attempted to illustrate the fallacy of Heisenberg's uncertainty through so-called "gedanken" or thought experiments, to the exasperation of his friends and foes alike. Bohr & Co. were forced to find explanations for the apparent contradictions brought forth by Einstein. One of these experiments is outlined below: a box containing radiation is weighed both before and after releasing several photons. The time of the release is measured precisely by a clock controlling the door of the box, and the energy of the photons can be deduced precisely using Einstein's formula $E = mc^2$, where $c \approx 3 \times 10^{10}$ cm/sec is the speed of light. Thus both the time of the release of the photons and their energy can be precisely determined, argued Einstein. It took Niels Bohr a sleepless night in Bruxelles to find the flaws in Einstein's argument.

Exercise 11. Find out what Bohr's arguments were, or provide your own arguments showing that the experiment suggested by Einstein does not contradict the uncertainty principle.
But the real test for uncertainty came later, in 1933, when Albert Einstein imagined the following gedanken experiment designed to put the matter to rest once and for all. The idea of this experiment is to have two particles related to each other, measure one, and gather knowledge about the other.
Consider two particles A and B with known momenta flying towards each other and interacting with each other at a known position, for a very brief period of time. An observer, far away from the place where the two particles interacted with each other, measures the momentum of particle A and, based upon this measurement, is able to deduce the momentum of particle B. The observer may choose to measure the position of particle A instead of its momentum. According to the principles of quantum mechanics this would be a perfectly legitimate proposition, but it is in flagrant violation of common sense. How could the final state of particle B be influenced by a measurement performed on particle A long after the physical interaction between the two particles has terminated?
A year later Einstein, Podolsky, and Rosen (thus the abbreviation EPR) wrote a paper published in the Physical Review that stimulated great interest and seemed at that time to have definitely settled the dispute between Niels Bohr and Albert Einstein in favor of the author of the relativity theory [28]. The ingenuity of the EPR experiment is that the position and the momentum of one particle are determined precisely by measurements performed on its entangled twin. The authors of the EPR paper write: "If, without disturbing the system, we can predict with certainty (i.e., with probability one) the value of a physical quantity, then there is an element of physical reality corresponding to this physical quantity." They concluded that though the position and the momentum of the entangled twin are "elements of the physical reality", because quantum mechanics does not allow both to be part of the description of the state of the particle, quantum mechanics is an incomplete theory.

¹³ Ernest Solvay was an industrialist from Bruxelles who financed scientific meetings with the hope of having an audience for his own scientific theories.
In 1952 David Bohm, a former student of Robert Oppenheimer at Berkeley¹⁴, suggested a change of the EPR thought experiment, this time involving two particles but with one variable of interest, the spin, with two possible values, "Up" and "Down", instead of two variables of interest, the position and the momentum. Two entangled particles, A and B, are generated by the same source and move away from each other. The spin of particle A is measured by Alice, say, in the x-direction; the spin of particle B is measured by Bob, say, in the y-direction. Once Alice measures the spin of A and finds it to be "Up", the spin measured by Bob must be "Down", for all directions x and y.

5.2 Possible Explanations of the EPR Paradox


In the previous section we learned that an EPR experiment consists of generating a pair of maximally entangled particles and demonstrating their strange properties with the aid of the main characters of cryptographic texts, Alice and Bob, with Caroll in a supporting role. Alice and Bob are at different locations and need to communicate with one another. Caroll sends one of the entangled particles to Alice and the other to Bob. The state of these particles is described by the vector

$$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$

Let us assume that when Alice measures her qubit she observes the state $|0\rangle$. This means that the combined state is $|00\rangle$ and Bob will observe the same state, $|0\rangle$, on his qubit. When Alice observes her qubit in state $|1\rangle$, the combined state is $|11\rangle$ and Bob observes the state of his qubit to be $|1\rangle$.
This behavior appears to violate several laws of physics; it seems to indicate that communication with a speed greater than the speed of light is possible and that non-local effects may occur. Einstein, Podolsky, and Rosen proposed a hidden variable theory. They argued that the state of the particle is hidden from us; both particles are either in state $|0\rangle$ or in state $|1\rangle$, but we do not know which. Later, John Bell showed that the hidden variable theory predicts that measurements performed on any system must satisfy the so-called Bell inequality [6, 7]. But measurements performed on quantum systems indicate that the Bell inequality is violated, thus the hidden variable theory must be false. The results of measurements performed with respect to different bases confirm the fallacy of the hidden variable theory.
Another attempt to explain the EPR paradox is to assume that the measurement on one of the
particles affects the other. But this contradicts the relativity theory, as we shall see shortly.
Imagine two external characters, say Samantha and Hector, who observe Alice and Bob.
Samantha reports that Alice measures her particle first (Alice may have observed the state
$|1\rangle$ of her particle, forcing the same state on Bob's particle). In turn, Hector reports that Bob
measures his particle first (Bob may have measured the state $|0\rangle$ of his particle, forcing the
same state on Alice's particle). Yet the laws of physics must be invariant with respect to the observer. In
our case we must provide equally consistent explanations of the observations reported by both
Samantha and Hector. Therefore, causality does not explain the EPR paradox either.
14 David Bohm was investigated in 1949 by the House Un-American Activities Committee during
the McCarthy era. Bohm lost his position at Princeton, went to Sao Paulo, and finally to the University of
London.
5.3 The Bell Inequality. Local Realism
We now describe an experiment similar to EPR that sets the stage for deriving the
Bell inequality. This new experiment is consistent with common sense and does not involve
quantum particles or quantum mechanics. The only assumptions we need to derive our results
are intuitive, common-sense ones, collectively called local realism:
(i) Locality - measurements of different physical properties of different objects, carried out by
different individuals at distinct locations, cannot influence each other.
(ii) Realism - physical properties are independent of observations.
Assume that Caroll prepares two particles and sends one of them to Alice and the other to
Bob, see Figure 22. There are two physical properties that Alice could measure on her particle,
Q and R. The results of these measurements are the values of the two physical properties, Q
and R respectively. We impose the condition that the results of the measurements can only
be $Q, R = \pm 1$.
Similarly, Bob can measure on his particle two physical properties S and T. The results
of these measurements can be $S, T = \pm 1$. All four properties are objective and
the results of the measurements, $(Q, R, S, T)$, have a well defined physical interpretation.

Figure 22: The experimental set-up for the Bell inequality. Caroll sends one particle to Alice,
who measures Q = +/-1 or R = +/-1, and one particle to Bob, who measures S = +/-1 or T = +/-1.

Neither Alice nor Bob knows in advance which property they will measure. They perform
the measurements simultaneously. Before a measurement, each of them tosses a fair coin
to decide which property to measure. Q, R, S, T are random variables with a
joint probability distribution function $p(q, r, s, t) = \mathrm{Prob}(Q = q, R = r, S = s, T = t)$. We
shall prove that:

$$E(QS) + E(RS) + E(RT) - E(QT) \le 2,$$

where $E(\cdot)$ denotes the expected value of the random variable.
Define:

$$\Delta = QS + RS + RT - QT = S(R + Q) + T(R - Q).$$

It is relatively easy to see that $\Delta = \pm 2$. Indeed, either the first term, $S(R + Q)$, or the
second one, $T(R - Q)$, must be equal to zero because $Q, R, S, T = \pm 1$. If one of the terms
is zero, then the other one must be equal to $\pm 2$.

Now:

$$E(\Delta) = \sum_{(q,r,s,t)} p(q, r, s, t)(qs + rs + rt - qt) \le 2 \sum_{(q,r,s,t)} p(q, r, s, t).$$

But

$$\sum_{(q,r,s,t)} p(q, r, s, t) = 1.$$

Since the expected value of a sum of random variables is the sum of their expected values
(linearity of expectation), we have:

$$E(QS + RS + RT - QT) = E(QS) + E(RS) + E(RT) - E(QT) \le 2.$$


This completes our proof of the Bell inequality, and it is time to leave the world
of classical systems and turn our attention again to quantum systems. Consider a pair of
entangled qubits in the state $|\psi\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$. Caroll sends one of the entangled
particles to Alice and the other to Bob. We have several one-qubit gates that can be used to
observe a qubit; among them are the X gate (it transposes the components of a qubit) and
the Z gate (it flips the sign of the $|1\rangle$ component of a qubit). Alice and Bob measure the following observables:

Alice: $Q = Z_1$, $R = X_1$.

Bob: $S = -\frac{1}{\sqrt{2}}(Z_2 + X_2)$, $T = \frac{1}{\sqrt{2}}(Z_2 - X_2)$.
Let $\langle QS \rangle$ denote the average value of the observable QS. We will show later, when we
have a better grasp of quantum mechanics, that the average values of the pairs of observables are:

$$\langle QS \rangle = \langle RS \rangle = \langle RT \rangle = \frac{1}{\sqrt{2}}, \qquad \langle QT \rangle = -\frac{1}{\sqrt{2}}.$$

It follows immediately that:

$$\langle QS \rangle + \langle RS \rangle + \langle RT \rangle - \langle QT \rangle = 2\sqrt{2}.$$
This means that quantum mechanics predicts a value for this sum of averages of observables
in violation of the Bell inequality. When we obtain two contradictory results using two
different theoretical models, one of the models must be wrong, and we must turn to
experiments to determine which one.
In this case the experiments prove quantum mechanics to be correct. This means that
at least one of the two common-sense assumptions presented at the beginning of this section
is wrong.
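The contrast between the two predictions is easy to check numerically. The following sketch (a minimal illustration, not part of the original argument) computes the quantum averages $\langle\psi| A \otimes B |\psi\rangle$ for the observables above and also verifies the classical bound, namely that every assignment of $\pm 1$ values satisfies $QS + RS + RT - QT = \pm 2$:

```python
import numpy as np

# Pauli operators and the entangled state (|01> - |10>)/sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
psi = np.array([0, 1, -1, 0]) / np.sqrt(2)

def avg(A, B):
    """Quantum average <psi| A (x) B |psi>, A on Alice's qubit, B on Bob's."""
    return float(psi @ np.kron(A, B) @ psi)

Q, R = Z, X
S = -(Z + X) / np.sqrt(2)
T = (Z - X) / np.sqrt(2)
print(avg(Q, S) + avg(R, S) + avg(R, T) - avg(Q, T))  # 2.828... = 2*sqrt(2)

# Classical side: any +/-1 assignment gives QS + RS + RT - QT = +/-2, so the
# average over any joint distribution p(q,r,s,t) can never exceed 2
vals = np.random.default_rng(0).choice([-1, 1], size=(100_000, 4))
q, r, s, t = vals.T
assert np.all(np.abs(q*s + r*s + r*t - q*t) == 2)
```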

5.4 EPR pairs and Bell States


Some of the applications presented in this chapter are based upon the miraculous properties
of entangled particles, the so-called EPR pairs. An EPR pair can be in one of four states
called Bell states. These states form an orthonormal basis:

$$|\beta_{00}\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}$$

$$|\beta_{01}\rangle = \frac{|01\rangle + |10\rangle}{\sqrt{2}}$$

$$|\beta_{10}\rangle = \frac{|00\rangle - |11\rangle}{\sqrt{2}}$$

$$|\beta_{11}\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}$$
The Bell states can be distinguished from one another. The first one is often called a
maximally entangled state; the last one is called the anti-correlated state.

Figure 23: (a) A quantum circuit to create Bell states: the first qubit $|a\rangle$ passes through a
Hadamard gate H and controls a CNOT gate acting on the second qubit $|b\rangle$. (b) The two stages
of the circuit on the left, showing the I gate for the second qubit in the first stage; stage 1
produces $|V\rangle$ and stage 2 produces $|W\rangle$.

The circuit in Figure 23(a) takes as input two particles in pure states $|0\rangle$ and $|1\rangle$ and
creates a pair of entangled particles. The circuit consists of a CNOT gate with a Hadamard gate
on its control input.
It is easy to show that the truth table of the quantum circuit in Figure 23(a) is:

In              Out
$|00\rangle$    $(|00\rangle + |11\rangle)/\sqrt{2} = |\beta_{00}\rangle$
$|01\rangle$    $(|01\rangle + |10\rangle)/\sqrt{2} = |\beta_{01}\rangle$
$|10\rangle$    $(|00\rangle - |11\rangle)/\sqrt{2} = |\beta_{10}\rangle$
$|11\rangle$    $(|01\rangle - |10\rangle)/\sqrt{2} = |\beta_{11}\rangle$
The drawing in Figure 23(b) helps us understand how the output of the circuit that
generates entangled states is obtained. Here we distinguish two stages:
(i) A first stage in which the two input qubits are transformed separately. The first qubit, $|a\rangle$, is
applied to the input of a Hadamard gate, H, and produces an output $|a'\rangle$. The second qubit,
$|b\rangle$, is applied to the input of an identity gate, I, and produces an output $|b'\rangle = |b\rangle$.
(ii) A second stage consisting of a CNOT gate with input $|V\rangle$ and output $|W\rangle$. Here:

$$|V\rangle = |a'\rangle \otimes |b'\rangle, \qquad |W\rangle = G_{CNOT} |V\rangle.$$

Thus the output of the circuit is $|W\rangle = G_{CNOT} |V\rangle = G_{CNOT}(|a'\rangle \otimes |b'\rangle)$.
The output of the Hadamard gate with $|\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle$ as input is
$\alpha_0 (|0\rangle + |1\rangle)/\sqrt{2} + \alpha_1 (|0\rangle - |1\rangle)/\sqrt{2}$.
We now derive the output of the first and the second stage of the circuit in Figure 23(b).
There are four possible cases depending upon the input qubits $|a\rangle$ and $|b\rangle$:
(i) $|a\rangle = |0\rangle$ and $|b\rangle = |0\rangle$.

$$|a'\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |b'\rangle = |0\rangle$$

$$|V\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$

$$|W\rangle = G_{CNOT}|V\rangle = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} = |\beta_{00}\rangle.$$

(ii) $|a\rangle = |0\rangle$ and $|b\rangle = |1\rangle$.

$$|a'\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |b'\rangle = |1\rangle$$

$$|V\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix}$$

$$|W\rangle = G_{CNOT}|V\rangle = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} = |\beta_{01}\rangle.$$

(iii) $|a\rangle = |1\rangle$ and $|b\rangle = |0\rangle$.

$$|a'\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}}, \qquad |b'\rangle = |0\rangle$$

$$|V\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}$$

$$|W\rangle = G_{CNOT}|V\rangle = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix} = |\beta_{10}\rangle.$$
(iv) $|a\rangle = |1\rangle$ and $|b\rangle = |1\rangle$.

$$|a'\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}}, \qquad |b'\rangle = |1\rangle$$

$$|V\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix}$$

$$|W\rangle = G_{CNOT}|V\rangle = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix} = |\beta_{11}\rangle.$$
This completes the derivation of the output of the quantum circuit used to create entangled
particles from particles in pure states.
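The four cases above can be checked numerically in a few lines. The sketch below (assuming the matrix forms of H, I, and $G_{CNOT}$ used throughout these notes) applies the two stages to each basis vector:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
G_CNOT = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]])

basis = {'00': [1, 0, 0, 0], '01': [0, 1, 0, 0],
         '10': [0, 0, 1, 0], '11': [0, 0, 0, 1]}
for label, v in basis.items():
    V = np.kron(H, I2) @ v    # stage 1: H on the first qubit, I on the second
    W = G_CNOT @ V            # stage 2: the CNOT gate
    print(label, '->', np.round(W * np.sqrt(2), 3))
# each line lists the amplitudes (times sqrt(2)) of the corresponding Bell state
```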

Exercise 12. Assume that $G_1$ is the transfer matrix of stage 1, consisting of the H and I
gates, and $G_2 = G_{CNOT}$ is the transfer matrix of stage 2 of the quantum circuit in Figure
23(b). Construct $G_{Bell} = G_2 G_1$. Apply $G_{Bell}$ to an input to the circuit in Figure 23(a) and
calculate the output. Explain the results.

5.5 Teleportation with Maximally Entangled Particles


The Einstein-Podolsky-Rosen (EPR) paradox reveals that if two particles interact they become
correlated, and when measuring one particle we gather information about the wave
function of the other.
In 1992 a group of scientists were discussing the impact of entanglement upon information
transmission, with application to the distribution of encryption keys. As you might already
know, Alice and Bob are the traditional names used in cryptography for two individuals
wishing to exchange information in a secure manner. An embellished version of the problem
follows: assume that Alice and Bob are given as a wedding present an EPR pair, a pair of
entangled particles called particle 1 and particle 2. After several years Bob alone
takes part in an expedition on K9 15 and takes particle 2 with him, while Alice remains in
London to take care of their newborn infant and keeps particle 1 with her. A third party,
Caroll, asks Alice to deliver a secret message from the Royal K9 Society to Bob. The message
is encoded in the state of particle 3.
Alice cannot send the quantum state of particle 3 directly because of the risks associated
with sending quantum information over fiber optics or quantum channels. What someone
confined to classical thinking might suggest to Alice is to perform a measurement of particle
3 and deliver the information to Bob via a classical communication channel. Then Bob
could reconstruct the quantum state by manipulating a particle similar to particle 3.
This is not going to work because Alice can only get partial state information as a result of
a measurement on particle 3. For example, let us assume that the information is in the
polarization of a photon. Alice may be given a photon polarized at 45 degrees and, not knowing
the orientation, she might measure its horizontal polarization. In this case Alice would not
only get the wrong answer, but she would also alter the quantum state of the photon.

15 K9 is a fictitious mountain not far from K2. A number of canine expeditions to the summit are planned.
The solution for Alice is to perform a joint measurement on her own half of the EPR pair,
particle 1, and on the particle given by Caroll, particle 3. Then she sends Bob, over a
classical communication channel, the result of her measurement. At his end, upon receiving
Alice's results, Bob performs upon his own particle, particle 2, one of four types
of transformations, the one communicated by Alice. The four transformations are carried out by
applying his own qubit to the input of an I, X, Y, or Z gate. The last three transformations
are in fact 180-degree rotations about the x, y, and z axes. As a result of these transformations
particle 2 will be a perfect replica of particle 3.
At first sight this seems to violate the principle of no-cloning discussed earlier. This is
not true because when Alice measures the joint state of particle 1 and particle 3, the
state of particle 3 is altered; thus the original copy of the particle given to Alice by Caroll
is destroyed.
In summary, Alice is able to transfer the quantum state, not the actual particle, sending
only classical information to Bob. The transfer of the hidden quantum information appears to
happen instantly, though Bob needs first to receive the classical information regarding the result
of Alice's measurement.
As we have seen earlier, there are several entangled states of two particles. Let us assume
first that the particles of Alice and Bob (particle 1 and particle 2) are in the maximally
entangled state:

$$|\phi^{+}\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}},$$

and that particle 3 is in the state:

$$|\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle \quad \text{with} \quad |\alpha_0|^2 + |\alpha_1|^2 = 1.$$

The joint state of particle 3 and the entangled pair (particles 1 and 2) is a vector in $H_8$:

$$|\Psi\rangle = |\psi\rangle \otimes |\phi^{+}\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ \alpha_1 \\ 0 \\ 0 \\ \alpha_1 \end{pmatrix},$$

or $|\Psi\rangle = \frac{1}{\sqrt{2}}(\alpha_0 |000\rangle + \alpha_0 |011\rangle + \alpha_1 |100\rangle + \alpha_1 |111\rangle)$, where the first
position is Caroll's particle 3 and the next two are the entangled particles 1 and 2.
Alice carries out two operations on the two qubits in her possession, the particle from
Caroll and her own half of the entangled pair:
(i) Alice applies a CNOT to the pair; she uses Caroll's qubit as control and her own as
target, i.e., she applies $G_{CNOT} \otimes I$ to the state $|\Psi\rangle$:

$$|\Psi_1\rangle = (G_{CNOT} \otimes I)|\Psi\rangle = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ \alpha_1 \\ 0 \\ 0 \\ \alpha_1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ 0 \\ \alpha_1 \\ \alpha_1 \\ 0 \end{pmatrix},$$

or $|\Psi_1\rangle = \frac{1}{\sqrt{2}}(\alpha_0 |000\rangle + \alpha_0 |011\rangle + \alpha_1 |101\rangle + \alpha_1 |110\rangle)$.
(ii) Alice applies a Hadamard gate to the first qubit, the one received from Caroll, and leaves
the other two qubits untouched; that is, she applies $H \otimes I \otimes I$ to $|\Psi_1\rangle$:

$$|\Psi_2\rangle = (H \otimes I \otimes I)|\Psi_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} I_4 & I_4 \\ I_4 & -I_4 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ 0 \\ \alpha_1 \\ \alpha_1 \\ 0 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_1 \\ \alpha_0 \\ \alpha_0 \\ -\alpha_1 \\ -\alpha_1 \\ \alpha_0 \end{pmatrix},$$

or

$$|\Psi_2\rangle = \frac{1}{2}\left[\alpha_0 (|000\rangle + |011\rangle + |100\rangle + |111\rangle) + \alpha_1 (|001\rangle + |010\rangle - |101\rangle - |110\rangle)\right].$$

Therefore, the transformation of the original state of the three particles results
in a new state:

$$|\Psi_2\rangle = (H \otimes I \otimes I)(G_{CNOT} \otimes I)|\Psi\rangle.$$

We can isolate the first two qubits in the expression of the new state:

$$|\Psi_2\rangle = \frac{1}{2}\left[|00\rangle(\alpha_0 |0\rangle + \alpha_1 |1\rangle) + |01\rangle(\alpha_0 |1\rangle + \alpha_1 |0\rangle) + |10\rangle(\alpha_0 |0\rangle - \alpha_1 |1\rangle) + |11\rangle(\alpha_0 |1\rangle - \alpha_1 |0\rangle)\right].$$
From this expression it follows that when Alice performs a joint measurement of her two
qubits, she gets the results $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$ with equal probability. She then sends Bob,
over a classical communication channel, the result: 00, 01, 10, or 11. At the same time, the
measurement performed by Alice forces the qubit in Bob's possession into one of four states:
(i) $\alpha_0 |0\rangle + \alpha_1 |1\rangle$ when the result is 00.
(ii) $\alpha_0 |1\rangle + \alpha_1 |0\rangle$ when the result is 01.
(iii) $\alpha_0 |0\rangle - \alpha_1 |1\rangle$ when the result is 10.
(iv) $\alpha_0 |1\rangle - \alpha_1 |0\rangle$ when the result is 11.
Bob applies to his qubit the following transformations:
(i) If he receives the string 00 he uses the I gate. The qubit becomes:
$I(\alpha_0 |0\rangle + \alpha_1 |1\rangle) = \alpha_0 |0\rangle + \alpha_1 |1\rangle = |\psi\rangle$.
(ii) If he receives the string 01 he uses the X gate. The qubit becomes:
$X(\alpha_0 |1\rangle + \alpha_1 |0\rangle) = \alpha_0 |0\rangle + \alpha_1 |1\rangle = |\psi\rangle$.
(iii) If he receives the string 10 he uses the Z gate. The qubit becomes:
$Z(\alpha_0 |0\rangle - \alpha_1 |1\rangle) = \alpha_0 |0\rangle + \alpha_1 |1\rangle = |\psi\rangle$.
(iv) If he receives the string 11 he uses the Y gate. The qubit becomes:
$Y(\alpha_0 |1\rangle - \alpha_1 |0\rangle) = -i(\alpha_0 |0\rangle + \alpha_1 |1\rangle)$, which equals $|\psi\rangle$ up to an overall phase factor.
Now the qubit in Bob's possession is in the same state as Caroll's original qubit. As
pointed out earlier, the state of Caroll's qubit has been altered, hence the no-cloning
theorem is not violated.
The discussion of teleportation with maximally entangled particles gives us the opportunity
to observe some of the subtleties of handling entangled particles. We already know
that the state of a pair of entangled particles is a vector in $H_4$. Therefore, even though Alice
possesses only two particles, Caroll's and her own half of the entangled pair, her qubits are
entangled with Bob's, and the state of the three particles is a vector in $H_8$. This explains why
she applies $H \otimes I \otimes I$ to transform the first qubit.
Figure 24 shows a circuit involving several CNOT and Hadamard gates able to perform
teleportation. The circuit has three inputs, a, b, c, and three outputs a', b', c'. The unknown
input $|a\rangle$ appears at output $|c'\rangle = |a\rangle$.

Figure 24: A quantum teleportation circuit with inputs $|a\rangle$, $|b\rangle = |0\rangle$, $|c\rangle = |0\rangle$,
outputs $|a'\rangle$, $|b'\rangle$, $|c'\rangle = |a\rangle$, and ten stages. The left side shows Alice's transformations,
the right side Bob's.

We denote $|a\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle$ and identify ten stages in the circuit in Figure 24.
The corresponding state vectors are $|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_{10}\rangle$. We compute the state vectors stage by
stage as follows:

$$|\psi_1\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ 0 \\ \alpha_1 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$

$$|\psi_2\rangle = (I \otimes H \otimes I)|\psi_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1&0&1&0&0&0&0&0 \\ 0&1&0&1&0&0&0&0 \\ 1&0&-1&0&0&0&0&0 \\ 0&1&0&-1&0&0&0&0 \\ 0&0&0&0&1&0&1&0 \\ 0&0&0&0&0&1&0&1 \\ 0&0&0&0&1&0&-1&0 \\ 0&0&0&0&0&1&0&-1 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ 0 \\ \alpha_1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ \alpha_0 \\ 0 \\ \alpha_1 \\ 0 \\ \alpha_1 \\ 0 \end{pmatrix}.$$

$$|\psi_3\rangle = (I \otimes G_{CNOT})|\psi_2\rangle = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&1&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ \alpha_0 \\ 0 \\ \alpha_1 \\ 0 \\ \alpha_1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ \alpha_1 \\ 0 \\ 0 \\ \alpha_1 \end{pmatrix}.$$

$$|\psi_4\rangle = (G_{CNOT} \otimes I)|\psi_3\rangle = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ \alpha_1 \\ 0 \\ 0 \\ \alpha_1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ 0 \\ \alpha_1 \\ \alpha_1 \\ 0 \end{pmatrix}.$$


$$|\psi_5\rangle = (H \otimes I \otimes I)|\psi_4\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} I_4 & I_4 \\ I_4 & -I_4 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha_0 \\ 0 \\ 0 \\ \alpha_0 \\ 0 \\ \alpha_1 \\ \alpha_1 \\ 0 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_1 \\ \alpha_0 \\ \alpha_0 \\ -\alpha_1 \\ -\alpha_1 \\ \alpha_0 \end{pmatrix}.$$

Note that $|\psi_5\rangle$ can be written as:

$$|\psi_5\rangle = \frac{1}{2}\left[|00\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle) + |01\rangle(\alpha_0|1\rangle + \alpha_1|0\rangle) + |10\rangle(\alpha_0|0\rangle - \alpha_1|1\rangle) + |11\rangle(\alpha_0|1\rangle - \alpha_1|0\rangle)\right].$$
Now

$$|\psi_6\rangle = |\psi_5\rangle.$$

$$|\psi_7\rangle = (I \otimes G_{CNOT})|\psi_6\rangle = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&1&0 \end{pmatrix} \frac{1}{2} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_1 \\ \alpha_0 \\ \alpha_0 \\ -\alpha_1 \\ -\alpha_1 \\ \alpha_0 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ -\alpha_1 \\ \alpha_0 \\ -\alpha_1 \end{pmatrix}.$$

$$|\psi_8\rangle = (I \otimes I \otimes H)|\psi_7\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1&1&0&0&0&0&0&0 \\ 1&-1&0&0&0&0&0&0 \\ 0&0&1&1&0&0&0&0 \\ 0&0&1&-1&0&0&0&0 \\ 0&0&0&0&1&1&0&0 \\ 0&0&0&0&1&-1&0&0 \\ 0&0&0&0&0&0&1&1 \\ 0&0&0&0&0&0&1&-1 \end{pmatrix} \frac{1}{2} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ -\alpha_1 \\ \alpha_0 \\ -\alpha_1 \end{pmatrix} = \frac{1}{2\sqrt{2}} \begin{pmatrix} \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \end{pmatrix}.$$
We now use a result from Section 4.7 giving the transfer matrix of the gate involved in
this stage, $G_Q$ (the matrix of a CNOT gate with the first qubit as control and the third
qubit as target):

$$|\psi_9\rangle = G_Q |\psi_8\rangle = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&1&0 \end{pmatrix} \frac{1}{2\sqrt{2}} \begin{pmatrix} \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \end{pmatrix} = \frac{1}{2\sqrt{2}} \begin{pmatrix} \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \end{pmatrix}.$$

$$|\psi_{10}\rangle = (I \otimes I \otimes H)|\psi_9\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1&1&0&0&0&0&0&0 \\ 1&-1&0&0&0&0&0&0 \\ 0&0&1&1&0&0&0&0 \\ 0&0&1&-1&0&0&0&0 \\ 0&0&0&0&1&1&0&0 \\ 0&0&0&0&1&-1&0&0 \\ 0&0&0&0&0&0&1&1 \\ 0&0&0&0&0&0&1&-1 \end{pmatrix} \frac{1}{2\sqrt{2}} \begin{pmatrix} \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \\ \alpha_0+\alpha_1 \\ \alpha_0-\alpha_1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ \alpha_1 \\ \alpha_0 \\ \alpha_1 \end{pmatrix}.$$

$$|\psi_{10}\rangle = \frac{1}{2}\left[|00\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle) + |01\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle) + |10\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle) + |11\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle)\right]$$

or

$$|\psi_{10}\rangle = \frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle) \otimes (\alpha_0|0\rangle + \alpha_1|1\rangle).$$

Regardless of the outcome of the measurement of the first two qubits, the third qubit, Bob's,
is left in the state $\alpha_0|0\rangle + \alpha_1|1\rangle = |a\rangle$; the unknown input has indeed been teleported.

5.6 Anti-Correlation and Teleportation


The two entangled qubits could be in another EPR state. We now discuss briefly the so-called
anti-correlation. Consider the following two-qubit state, called a spin singlet:

$$|\psi\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}.$$

This is an entangled state and we shall prove later that a measurement reveals that the
two particles involved are in opposite states. For example, if we measure the spin of the
first particle and find it to be Up, then the spin of the second is Down, and vice versa.
Miraculously, the second qubit changes its state as if knowing the result of the measurement
on the first qubit. This state is distinguished from the three other entangled states; it changes
sign when particle 1 and particle 2 of the entangled pair are interchanged.
Let us analyze the anti-correlation case described above. For the sake of clarity we consider
the three particles to be photons and use the symbol $\leftrightarrow$ for horizontal polarization and $\updownarrow$ for
vertical polarization. The corresponding basis vectors are $|{\leftrightarrow}\rangle$ and $|{\updownarrow}\rangle$. The subscript identifies
the particle in the pair. The entangled pair (1, 2) forms a single quantum system with a shared
state:

$$|\psi\rangle_{12} = \frac{1}{\sqrt{2}}(|{\leftrightarrow}\rangle_1 |{\updownarrow}\rangle_2 - |{\updownarrow}\rangle_1 |{\leftrightarrow}\rangle_2).$$
This entangled state indicates only that the two particles are in opposite states but provides
no information about the state of each particle of the pair. Once we make a measurement of
one of the particles, projecting it onto one of the basis vectors, say $|{\updownarrow}\rangle$, the state of
the other particle becomes instantaneously $|{\leftrightarrow}\rangle$. If particle 1 is measured and found to have
vertical polarization, then particle 2 will have horizontal polarization.
Now we perform a specific joint measurement on particle 1 and particle 3 which
projects them onto the entangled state:

$$|\psi\rangle_{13} = \frac{1}{\sqrt{2}}(|{\leftrightarrow}\rangle_1 |{\updownarrow}\rangle_3 - |{\updownarrow}\rangle_1 |{\leftrightarrow}\rangle_3).$$
Now particle 1 and particle 3 are anti-correlated. If particle 3 has horizontal polarization
it forces particle 1 to have vertical polarization. In turn, particle 1 forces particle
2 to have the opposite polarization; thus particle 2 ends up with the same polarization as
particle 3.
A demonstration of quantum teleportation was carried out in 1997 at the University of
Rome by Francesco de Martini, based upon an idea of Sandu Popescu, and, at about the same
time, at Innsbruck, by Anton Zeilinger. In both experiments the quantum state was teleported a
few meters.

Figure 25: The teleportation experiment at the University of Rome. The source generates a
photon with horizontal polarization for Alice and one with vertical polarization for Bob. The
entanglement is in the path selection: if Alice gets her photon via path A, then Bob gets his
via path C; if Alice gets the photon via path B, then Bob gets his via path D. Caroll encodes
her quantum information using a polarizer. Alice measures the polarization of the photon she
receives and sends this classical information to Bob.

The experiment of de Martini is illustrated in Figure 25 [20]. In this experiment the
information is doubly encoded into a single photon instead of two. The source generates two
parametric down-converted 16 photons with opposite polarizations: photon 1, with horizontal
polarization, h, for Alice, and photon 2, with vertical polarization, v, for Bob. The polarization
entanglement of the two photons sent to Alice and Bob is converted into an entanglement
of the paths followed by the two photons. A calcite crystal performs this conversion.
If photon 1 travels to Alice via path A, then photon 2 travels to Bob via path C; if
photon 1 travels via path B, then photon 2 travels via path D. Caroll encodes her message
in the polarization of the photon sent to Alice, photon 1. Alice measures the polarization
of the photon she receives from the source and sends the classical result to Bob. Finally,
Bob performs the measurement suggested by Alice's result and he gets a photon with the
polarization imposed by Caroll.
In this experiment the polarizer forces a certain polarization on photon 1 and, because
of the anti-correlation of photon 1 and photon 2, the latter is forced to the opposite
polarization.

16 A parametric down-conversion source uses a UV laser beam which, upon interaction with a non-linear
medium - a crystal of ammonium dihydrogen phosphate (ADP) - generates two photons for one input photon.

5.7 Dense Coding


Coding is the process of transforming information during a communication process. The
sender of a message encodes the message, then transmits the encoded information over a
classical communication channel. The recipient of the message decodes the encoded information.
The question we address now is whether there is an advantage to exchanging quantum
information, qubits, over a quantum communication channel instead of sending classical bits
over a classical communication channel.
The main characters of the following example are again Alice and Bob, with Caroll in a
supporting role. Alice and Bob have been married for some time and Alice, inspired by the
wonderful pictures taken by Bob during his last trip to K9, decides to join a new expedition
to that remote part of the world. They want to exchange daily messages and compress them
as much as possible to reduce communication costs. They agree prior to Alice's departure to
exchange daily information about the temperature and the cloud cover on K9. They decide
that Alice will construct two-bit messages. The first bit will describe the temperature (0 if the
temperature is below 0 degrees F, 1 if it is above 0 degrees F); the second bit will describe the
cloud cover (0 if there are no clouds and 1 if there is a cloud cover). The sentence "At noon
today the temperature on the summit of K9 is below zero and there are no clouds. Love, Alice"
will be encoded as the binary string 00. The other possible messages are 01 (...below
zero... clouds...), 10 (...above zero... no clouds...), and 11 (...above zero... clouds...).
To send one of the four messages Alice must transmit two bits of classical information. But
Alice and Bob are already acquainted with quantum information. They decide to exchange
a single qubit of information over a quantum channel to encode and decode the four possible
messages. Here is the intricate story of how they succeed.
Assume that there is a source of entangled particles and Caroll is able to send to Alice
one qubit (one of the two entangled particles) and to Bob the other entangled particle of the
pair, see Figure 26. By now we know that the pair is in the state:

$$|\psi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$
Alice uses the one-qubit transformations given by the Pauli matrices I, X, Y, and Z. Recall
from Section 4.2 that these matrices are:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Alice prepares her qubit as follows:

Figure 26: Dense coding. Caroll sends one qubit of an entangled pair to Alice and the other
to Bob. Alice applies I, Z, X, or iY to her qubit to encode 00, 01, 10, or 11 and sends her
modified but still entangled qubit to Bob over a quantum channel. Bob applies a CNOT to the
pair, then a Hadamard gate to the first qubit. Alice sends Bob one qubit instead of two bits.

(i) To send 00 she applies to her qubit the transformation produced by I, the identity
matrix, and transmits:

$$|\psi_{00}\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

(ii) To send 01 she applies to her qubit the transformation produced by the Z matrix and
transmits:

$$|\psi_{01}\rangle = \frac{|00\rangle - |11\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}.$$

(iii) To send 10 she applies to her qubit the transformation produced by the X matrix and
transmits:

$$|\psi_{10}\rangle = \frac{|01\rangle + |10\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}.$$

(iv) To send 11 she applies to her qubit the transformation produced by the iY matrix and
transmits:

$$|\psi_{11}\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix}.$$
You already know that X, Y, Z, and I are one-qubit gates used to transform
vectors in $H_2$, but this time we have an entangled pair, a vector in $H_4$. Each of the
four matrices used to transform the first qubit of the entangled pair, the qubit in Alice's
possession, is obtained as the tensor product of the corresponding single-qubit gate and the
identity matrix. We transform only the first qubit and leave the second qubit of the entangled pair
untouched.
The transfer matrices and the outputs for the four transformations Alice is expected to
carry out on her qubit are:

$$G_{00} = I \otimes I = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}, \qquad |\psi_{00}\rangle = G_{00}|\psi\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

$$G_{01} = Z \otimes I = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&-1&0 \\ 0&0&0&-1 \end{pmatrix}, \qquad |\psi_{01}\rangle = G_{01}|\psi\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}.$$

$$G_{10} = X \otimes I = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \end{pmatrix}, \qquad |\psi_{10}\rangle = G_{10}|\psi\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}.$$

$$G_{11} = iY \otimes I = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&1 \\ -1&0&0&0 \\ 0&-1&0&0 \end{pmatrix}, \qquad |\psi_{11}\rangle = G_{11}|\psi\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix}.$$
After getting the qubit from Alice, Bob is in possession of both qubits of the entangled
pair. Let us see how he decodes the message. First he applies a CNOT to the pair and gets
the following results:

$$G_{CNOT}|\psi_{00}\rangle = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \frac{|00\rangle + |10\rangle}{\sqrt{2}} = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes |0\rangle.$$

$$G_{CNOT}|\psi_{01}\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix} = \frac{|00\rangle - |10\rangle}{\sqrt{2}} = \frac{|0\rangle - |1\rangle}{\sqrt{2}} \otimes |0\rangle.$$

$$G_{CNOT}|\psi_{10}\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} = \frac{|01\rangle + |11\rangle}{\sqrt{2}} = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes |1\rangle.$$

$$G_{CNOT}|\psi_{11}\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix} = \frac{|01\rangle - |11\rangle}{\sqrt{2}} = \frac{|0\rangle - |1\rangle}{\sqrt{2}} \otimes |1\rangle.$$
Amazingly enough, Bob can now measure the second qubit without affecting the state of
the pair, which is no longer entangled. If the second qubit is 0 it means that Alice sent either 00 or 01. If
the second qubit is 1 it means that Alice sent either 10 or 11.
Now Bob applies the Hadamard gate to the first qubit. The results are:
       
For 00: $H\left(\dfrac{|0\rangle + |1\rangle}{\sqrt{2}}\right) = \dfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \dfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \dfrac{1}{2} \begin{pmatrix} 2 \\ 0 \end{pmatrix} = |0\rangle.$

For 01: $H\left(\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right) = \dfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \dfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \dfrac{1}{2} \begin{pmatrix} 0 \\ 2 \end{pmatrix} = |1\rangle.$

For 10: the first qubit is $(|0\rangle + |1\rangle)/\sqrt{2}$, so the Hadamard gate yields $|0\rangle$.

For 11: the first qubit is $(|0\rangle - |1\rangle)/\sqrt{2}$, so the Hadamard gate yields $|1\rangle$.
Now Bob knows precisely which one of the four messages was sent. If the second qubit is
0 and the result of the transformation (performed by the Hadamard gate on the first qubit
of the pair) is $|0\rangle$, then the message sent is 00; if the result of the H transformation is $|1\rangle$,
then the message sent is 01. If the second qubit is 1 and the result of the transformation
is $|0\rangle$, then the message sent is 10; if the result of the H transformation is $|1\rangle$, then the
message sent is 11.
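The entire dense coding exchange fits in a short simulation. The sketch below (a minimal illustration of the encoding and decoding steps above) checks that each two-bit message is recovered; the decoded message consists of the value of the second qubit followed by the result of the Hadamard transformation:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
iY = np.array([[0, 1], [-1, 0]])                 # i*Y keeps the amplitudes real
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)        # (|00> + |11>)/sqrt(2)
encode = {'00': I2, '01': Z, '10': X, '11': iY}  # Alice's gate for each message

for msg, gate in encode.items():
    sent = np.kron(gate, I2) @ psi               # Alice transforms her qubit only
    out = np.kron(H, I2) @ (CNOT @ sent)         # Bob: CNOT, then H on qubit 1
    idx = int(np.argmax(np.abs(out)))            # out is a basis vector |b1 b2>
    b1, b2 = idx >> 1, idx & 1                   # b1 = H result, b2 = second qubit
    print(msg, '->', f'{b2}{b1}')                # 00->00, 01->01, 10->10, 11->11
```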

Exercise 13. Can the strategy for encoding and decoding two bits of classical information
into one qubit be extended? If you believe that this is possible, describe an algorithm for dense
coding of 3 bits into one qubit. If not, justify your answer.

5.8 Quantum Key Distribution


System security is a critical concern in the design of networked computer systems. Confidentiality
is a property of a system that guarantees that only agents with proper credentials have
access to information. Confidentiality can be compromised during transmission over insecure
communication channels or while information is stored on sites that allow multiple agents to modify
it.
The common method to support confidentiality is based on encryption. Data, or plaintext
in cryptographic terms, is mathematically transformed into ciphertext, and only agents with
the proper key are able to decrypt the ciphertext and transform it into plaintext.
The algorithms used to transform plaintext into ciphertext and back form a cipher. A
symmetric cipher uses the same key for encryption and decryption. Asymmetric, or public key,
ciphers involve a public key that can be freely distributed and a secret private key. Data is
encrypted by the sender using the public key of the intended recipient of the message and it
is decrypted using the private key of the recipient, see Figure 27.
The problem of distributing the key of a symmetric cipher is known as the key distribution
problem. Encryption and decryption using an asymmetric cipher are time consuming. Typically, the parties
involved use the public key system to exchange a session key, and then they use a symmetric
Figure 27: (a) Secret key cryptography: the plaintext is encrypted and decrypted with the
same secret key. (b) Public key cryptography: the plaintext is encrypted with the public key
of the recipient and decrypted with the private key of the recipient.

cipher based upon this key to communicate. To make it harder for a third party to break
the cipher, the encryption key should be changed as frequently as possible.
The key distribution problem has an ingenious solution when, in addition to the classical
communication channel, a quantum communication channel is available. We show now that the two
parties involved in the exchange of the cipher key can detect if a third party eavesdrops.
As before, the main characters are Alice, who wants to send the encryption key to Bob, and
Caroll, who attempts to eavesdrop.
The exchange without a third party is captured in Figure 28(a). Alice sends over the
quantum communication channel n qubits, each encoded randomly in one of two bases, the
rectilinear basis $(|{\leftrightarrow}\rangle, |{\updownarrow}\rangle)$ or the diagonal basis $(|{\nearrow}\rangle, |{\nwarrow}\rangle)$. Bob randomly chooses one of the two bases when receiving and measuring
each qubit. Bob ends up guessing the basis correctly for about n/2 of the qubits, but he does
not know which qubits he measured correctly. Then Alice uses the classical communication
channel to send Bob the basis used for each qubit, and Bob sends Alice the basis he guessed
for each qubit. Now both Alice and Bob know precisely the qubits they have agreed upon.
They use the approximately n/2 such qubits as the encryption key.
The exchange when Caroll eavesdrops on both the quantum and the classical communication
channels is illustrated in Figure 28(b). Caroll intercepts each of the n qubits. She
randomly picks one of the two bases to measure each qubit and then re-sends the
qubit to Bob. Approximately n/2 of the qubits received by Bob have their state altered by
Caroll. When Alice and Bob exchange information over the classical communication channel
they realize that they agree on considerably fewer than n/2 of the qubits. Thus they detect the
presence of Caroll.

Figure 28: (a) Alice sends over the quantum communication channel n qubits encoded randomly
in one of two bases. Bob randomly chooses one of the two bases when receiving each
qubit. Bob ends up guessing the basis correctly for about n/2 of the qubits. Then Alice and
Bob exchange over the classical communication channel the basis used by each of them
for each qubit. Now both Alice and Bob know precisely on which qubits they have agreed
and use them as the encryption key. (b) Caroll intercepts each of the n qubits. She randomly
picks one of the two bases to measure each qubit and then re-sends the qubit to Bob.
Approximately n/2 of the qubits received by Bob have their state altered by Caroll. Bob
guesses the correct basis for only about half of the qubits whose state has not been altered
by Caroll. When Alice and Bob exchange information over the classical communication channel
they realize that they agree on considerably fewer than n/2 qubits. Thus they detect the
presence of Caroll.

We give without proof the following proposition 17:
Proposition 5. To distinguish between two non-orthogonal quantum states we can only gain
information by introducing an additional disturbance of the system.
Protocols for quantum key distribution are based upon the idea of transmitting non-orthogonal
qubit states and then checking for disturbances in the transmitted states. To
prove the correctness of a quantum key distribution (QKD) protocol we have to show that
Alice and Bob can agree on a key about which Caroll can obtain only an exponentially small
amount of information.
Eavesdropping must be distinguished from the noise on the communication channel. To
do so, check bits must be interspersed randomly among the data bits used to construct
the encryption key.
17 The proof requires an understanding of quantum measurements and will be given in Section 2.
We now outline the first QKD protocol, BB84, proposed by Bennett and Brassard in 1984
[11]. A proof of security of this protocol can be found in [54]. Here we follow the description
of the protocol in [38].
(i) Alice selects n, the approximate length of the desired encryption key. Then she generates
two random strings of bits, a and b, of length $(4 + \delta)n$.
(ii) Alice encodes the bits in string a using the bits in string b to choose the basis (either X
or Z) for each qubit. She generates $|\psi\rangle$, a block of $(4 + \delta)n$ qubits:

$$|\psi\rangle = \bigotimes_{k=1}^{(4+\delta)n} |\psi_{a_k b_k}\rangle,$$

where $a_k$ and $b_k$ are the k-th bits of strings a and b respectively. Each qubit is in one of four
pure states in two bases, $[|0\rangle, |1\rangle]$ and $[\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)]$.
The four states used are:

$$|\psi_{00}\rangle = |0\rangle$$
$$|\psi_{10}\rangle = |1\rangle$$
$$|\psi_{01}\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$$
$$|\psi_{11}\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$
(iii) If E describes the combined effect of the channel noise and Caroll's interference, then the
block of qubits received by Bob is $E(|\psi\rangle\langle\psi|)$.
(iv) Bob constructs a random string of bits, b', of length $(4 + \delta)n$. He then measures every
qubit either in basis X or in basis Z depending upon the value of the corresponding bit of
b'. As a result of his measurements he constructs the binary string a'. He tells Alice over the
classical channel that he now expects information about b.
(v) Alice uses the classical channel to disclose b.
(vi) Alice and Bob exchange information over the classical channel and keep only the bits
in the pair {a, a'} for which the corresponding bits of the strings b and b' are equal. Let us
assume that Alice and Bob keep 2n bits. By choosing $\delta$ sufficiently large Alice and Bob
can ensure that the number of bits kept is close to 2n with very high probability.
(vii) Alice and Bob perform several tests to determine the level of noise and eavesdropping
on the channel. The set of 2n bits is split into two subsets of n bits each. One subset provides
the check bits used to estimate the level of noise and eavesdropping, and the other consists
of the data bits used for the quantum key. Alice selects n check bits at random and sends the
positions of the selected bits over the classical channel to Bob. Then Alice and Bob compare
the values of the check bits. If more than, say, t bits disagree then they abort and re-try the
protocol.
In summary, the attempt of an intruder to eavesdrop increases the level of disturbance of
a signal on a quantum communication channel. The two parties wishing to communicate in
a secure manner establish an upper bound for the tolerable level of disturbance. They use
a set of check bits to estimate the level of noise and/or eavesdropping. Then they reconcile
their information and distill a shared secret key.
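A classical simulation of the sifting and eavesdropping detection is straightforward, because all that matters is whether the preparation and measurement bases match. The sketch below is a simplified model (it omits the $\delta$ bookkeeping and the check-bit splitting of steps (vi) and (vii)); a measurement in the wrong basis yields a uniformly random bit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 16, 1.0
length = int((4 + delta) * n)

a = rng.integers(0, 2, length)         # Alice's raw key bits
b = rng.integers(0, 2, length)         # Alice's bases (0 = Z, 1 = X)
b_prime = rng.integers(0, 2, length)   # Bob's bases
c = rng.integers(0, 2, length)         # Caroll's bases

def measure(bit, prep_basis, meas_basis):
    """A wrong-basis measurement returns a uniformly random bit."""
    return bit if prep_basis == meas_basis else rng.integers(0, 2)

eavesdrop = True
if eavesdrop:   # Caroll measures each qubit and re-sends it in her own basis
    received = np.array([measure(x, pb, cb) for x, pb, cb in zip(a, b, c)])
    resend_basis = c
else:
    received, resend_basis = a, b

a_prime = np.array([measure(x, pb, mb)
                    for x, pb, mb in zip(received, resend_basis, b_prime)])

keep = b == b_prime                    # sifting: keep matching-basis positions
errors = np.mean(a[keep] != a_prime[keep])
print(f'kept {keep.sum()} bits, disagreement rate {errors:.2f}')
# about 0.25 with an eavesdropper; 0.00 on a noiseless channel without one
```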

5.9 Quantum Error Correction

5.10 Classic and Quantum Information Theory


In the late 1940s Claude Shannon of Bell Labs developed an abstract model of communication;
in this model there are three entities involved: a source of information, a communication
channel, and a receiver. The source and the receiver share a common alphabet, A, a finite set
of symbols that can be transmitted over the channel, e.g., the English alphabet. The symbols
of the alphabet $A = \{A_1, A_2, \ldots, A_n\}$ are selected according to probabilities $p_1, p_2, \ldots, p_n$.

Figure 29: The relationship between Information Theory and Quantum Mechanics. Information
Theory covers the statistical theory of communication and information encoding, including
error correcting codes, data compression, and cryptography; their quantum counterparts
(quantum error correction, quantum data compression, and quantum key distribution) rest on
decoherence and entanglement, and ultimately on quantum mechanics.

Shannon wanted to define a measure of the quantity of information a source could generate.
Earlier, in 1927, another scientist from Bell Labs, Ralph Hartley, had proposed to
take the logarithm of the total number of possible messages as a measure of the amount of
information in a message, arguing that the logarithm tells us how many digits or characters
are required to convey the message.
Shannon took a hint from classical thermodynamics and decided to relate the information
content of a message to its probability. The motivation is quite intuitive: assume that
Bob wants to transmit a message to Alice conveying the fact that no meteorite struck Tibet
in the last 24 hours. A meteorite striking the Earth is a very rare event, thus Bob's message
would convey little information; on the other hand, the message that a meteorite struck Tibet
would contain a much larger amount of information, because it would report such a rare event.

5.11 Quantum Parallelism


In 1985 David Deutsch recognized that a quantum computer has capabilities well beyond
those of a classical computer and suggested that such capabilities can be exploited by cleverly crafted
algorithms. Deutsch realized that a quantum computer can evaluate a function f(x) for many
values of x simultaneously and called this strikingly new feature quantum parallelism [24].
In 1994 Peter Shor [51] found a polynomial time algorithm for factoring n-bit numbers on
quantum computers and generated a wave of enthusiasm for quantum computing. Like most
factoring algorithms, Shor's algorithm reduces the factoring problem to the problem of finding
the period of a function, but uses quantum parallelism to find a superposition of all values
of the function in one step. Then the algorithm calculates the quantum Fourier transform
of the function, which concentrates the amplitudes at multiples of the fundamental frequency, the
reciprocal of the period. To factor an integer, the Shor algorithm then measures the period of
the function.
In Section 4.9 we showed that an arbitrary function f can be computed by a quantum
computer. Figure 30 illustrates a quantum gate array characterized by a linear transformation
given by $G_f$. This gate array is a generalization of the CNOT gate; its inputs are $|x\rangle \in H_m$, a
vector describing m qubits acting as a control input, and $|y\rangle \in H_k$, a vector describing k qubits. The
outputs are $|x\rangle$ and $|y \oplus f(x)\rangle \in H_n$, with n = m + k. When $|y\rangle = |0\rangle$ the second
output becomes $|y \oplus f(x)\rangle = |f(x)\rangle$.

Figure 30: The transformation $G_f : |x, y\rangle \to |x, y \oplus f(x)\rangle$ performed by the gate array;
$|x\rangle$ is the m-qubit control input, $|y\rangle$ the k-qubit input, and the second output $|y \oplus f(x)\rangle$
involves n = m + k qubits.

Now assume that the input vector $|x\rangle$ is in a superposition state and can be expressed
as a linear combination of the $2^m$ vectors forming an orthonormal basis in $H_m$. The gate array
performs a linear transformation. Hence the transformation is applied simultaneously to all basis vectors
used to express the input superposition, and it generates a superposition of
results. In other words, the values of the function f(x) for the $2^m$ possible values of its
argument x are computed simultaneously. This effect is called quantum parallelism and it
justifies our statement in Section 1.4 that quantum computers can provide an exponential
amount of computational space in a linear amount of physical space.
Quantum parallelism allows us to construct the entire truth table of a quantum gate array
having $2^n$ entries at once. In a classical system we can compute the truth table in one
time step with $2^n$ gate arrays running in parallel, or we need $2^n$ time steps with a single gate
array.
Typically we start with n qubits, each in state $|0\rangle$, and we apply a Walsh-Hadamard
transformation. Each qubit is transformed by a Hadamard gate; recall that the Hadamard
gate transforms $|0\rangle$ as follows:

$$H : |0\rangle \to \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle).$$

Thus:

$$(H \otimes H \otimes \ldots \otimes H)|00\ldots0\rangle = \frac{1}{\sqrt{2^n}}(|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle) \otimes \ldots \otimes (|0\rangle + |1\rangle) = \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle.$$

The output of the gate array when we add a k-qubit register to the superposition state of
integers in the range 0 to $2^n - 1$ is:

$$G_f \left( \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x, 0\rangle \right) = \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} G_f |x, 0\rangle = \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x, f(x)\rangle.$$
When we measure the output of the quantum gate array we can only observe one value.
Hence we need some level of algorithmic sophistication to exploit quantum parallelism.
Quantum parallelism is best illustrated by the solution to the so-called Deutsch's problem.
Consider a black box characterized by a transfer function that maps a single input bit
x into an output f(x). The transformation performed by the black box, f(x), is a general
function and might not be invertible. We assume that it takes the same amount of time, T,
to carry out each of the four possible mappings performed by the transfer function f(x) of
the black box:

$$f(0) = 0, \quad f(0) = 1, \quad f(1) = 0, \quad f(1) = 1.$$


The problem posed is to determine whether $f(0) = f(1)$ or $f(0) \neq f(1)$.
Using a classical computer, one alternative is to compute sequentially f(0) and f(1) and
then compare the results, see Figure 31(a), with a total time 2T. A classical parallel solution
is illustrated in Figure 31(b), where we feed 0 as input to one replica of the black box
and 1 to the other and then compare the partial results. In this case we obtain the answer
after time T.
Consider now a quantum computer with a transfer function $U_f$ that takes as input two
qubits, $|x\rangle$ (control) and $|y\rangle$ (target), and has two outputs, $|x\rangle$ and $|y \oplus f(x)\rangle$. We have the
choice of selecting the states of the two qubits $|x\rangle$ and $|y\rangle$. First, let us choose for the second
qubit the state $|y\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
We know that $|0 \oplus f(x)\rangle = |f(x)\rangle$, thus:

$$|y \oplus f(x)\rangle = \frac{1}{\sqrt{2}}(|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle) = \frac{1}{\sqrt{2}}(|f(x)\rangle - |1 \oplus f(x)\rangle).$$

But $|1 \oplus f(x)\rangle$ is equal to $|0\rangle$ when f(x) = 1 and to $|1\rangle$ when f(x) = 0, thus:

Figure 31: Classical and quantum parallelism. (a) Sequential solution to Deutsch's problem
using a classical computer, total time 2T. (b) A parallel solution to Deutsch's problem using a
classical computer, total time T. (c) The quantum black box with transfer function $U_f$; it
evaluates f(0) and f(1) simultaneously. (d) The control qubit of the quantum black box is in
state $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$ and the target qubit is in state $\frac{|0\rangle - |1\rangle}{\sqrt{2}}$, the output of a
Hadamard gate with input $|1\rangle$.

$$\frac{1}{\sqrt{2}}(|f(x)\rangle - |1 \oplus f(x)\rangle) = (-1)^{f(x)} \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$

The quantum black box performs the following transformation of the two qubits:

$$|x\rangle \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \xrightarrow{U_f} (-1)^{f(x)} |x\rangle \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$

Let us now assume that the first qubit is in the state $|x\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The
transformation performed by the black box is:

$$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \xrightarrow{U_f} \frac{1}{\sqrt{2}}\left[(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right] \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$
127
The procedure described above can be generalized to a function f(x) where x is an n-tuple
and can take any of the $2^n$ values. A quantum black box allows us to compute at once the
entire table giving all $2^n$ possible values of the function f(x). The transfer function is
then:

$$|x\rangle \otimes |0\rangle \xrightarrow{U_f} |x\rangle \otimes |f(x)\rangle.$$

We select as control input the state:

$$|x\rangle = \left[\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\right]^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle.$$

We compute f(x) only once and generate a state that encodes global properties of f(x):

$$\frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle \otimes |f(x)\rangle.$$
The truly amazing result is that we compute the entire table of $2^n$ values at once, regardless
of the value of n. This gives a totally different meaning to the concept of massive parallelism.
But, as always, there is a catch: unfortunately, as soon as we perform a measurement of
the state we can only recover one entry in the table. This parallelism is not very useful as such;
we must discover more clever ways of using it.

6 Physical and Logical Reversibility. Reversible Computations

6.1 Turing Machines, Reversibility, and Entropy


We are now concerned with the relationships between the wonderful abstractions used by
theoretical computer science and the physical systems surrounding us. We want to understand
the physical support of information and determine the energy required to store and to
transform information.
A general purpose computer is a system capable of storing and of transforming information.
Information is a primitive concept and we defer a formal definition of this concept until
Chapter 7. For the time being we consider familiar objects such as strings of characters, images,
and numbers, and refer to them collectively as information.
Models of computations and computing machines have been developed that abstract the properties of the thought
process necessary to solve a problem, as well as the properties of a physical system able to carry
out these steps automatically. Turing machines, effective procedures,
finite-state machines, virtual memory, and virtual machines are familiar examples of such high-level
models. These abstractions generalize knowledge extracted from particular cases.
A Turing machine is an abstraction: a system with a finite number of internal states and
a read-write head that moves over a tape consisting of cells. Each cell contains a symbol.
The Turing machine starts in a certain state and looks at the symbol currently under the head;
depending upon the internal state of the machine and the current symbol, it might erase
that symbol and replace it with another symbol, or leave it as it is, and then move either
to the left or to the right and change its internal state. Let $Q_i$ and $S_i$ denote the state and
the symbol read at time t; then $Q_j$, $S_j$, and D, the new state, the symbol written, and the
direction of movement, are given by:

$$Q_j = F(Q_i, S_i)$$
$$S_j = G(Q_i, S_i)$$
$$D = D(Q_i, S_i).$$

The evolution of a computation carried out by a Turing machine is given by the set of
quintuples $(Q_i, S_i, Q_j, S_j, D)$ and it is completely specified by the original tape and this set.
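The quintuple model is easy to make concrete. The sketch below is a minimal interpreter; the example machine, the tape alphabet, and the blank symbol '_' are illustrative choices, not taken from the text:

```python
def run_turing_machine(quintuples, tape, start_state, halt_state, pos=0):
    """Run a machine given as quintuples (Qi, Si, Qj, Sj, D): in state Qi
    reading Si, enter Qj, write Sj, and move D (-1 left, +1 right)."""
    rules = {(qi, si): (qj, sj, d) for qi, si, qj, sj, d in quintuples}
    cells = dict(enumerate(tape))          # a sparse, unbounded tape
    state = start_state
    while state != halt_state:
        symbol = cells.get(pos, '_')       # '_' denotes a blank cell
        state, cells[pos], move = rules[(state, symbol)]
        pos += move
    return ''.join(cells[i] for i in sorted(cells))

# A machine that flips every bit, halting at the first blank
flip = [('s', '0', 's', '1', +1),
        ('s', '1', 's', '0', +1),
        ('s', '_', 'halt', '_', +1)]
print(run_turing_machine(flip, '0110', 's', 'halt'))   # prints 1001_
```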
There exists a Universal Turing machine capable of mimicking any other Turing machine.
Assume that the set of quintuples describing the computation carried out by the original Turing
machine is available. We feed the Universal Turing machine the description of the original
Turing machine (the set of quintuples), as well as the input tape of the original Turing machine
and the indication of where to start and where to end.
Church's thesis is that all computing devices can be simulated by a Turing machine.
Though not a mathematical theorem, this thesis greatly simplifies the study of computations;
it reduces the potentially infinite set of computing devices to a relatively simple model, the
Universal Turing machine.
Let us turn our attention to the functions that can be computed by a Turing machine.
We expect that the process of finding a solution to some problems cannot be automated.
Formulas to calculate the integrals of many functions exist, see for example [32], but no
general rules to obtain the analytic expression of the integral of an arbitrary function are
known. Proving theorems and solving Euclidean geometry problems is an art; no general
rules to prove theorems and to construct the solution of a Euclidean geometry problem exist.
On the other hand, whenever a function has a derivative, the analytic expression of its
derivative can be obtained by applying a finite set of rules to the original expression. Analytic
geometry reduces Euclidean geometry problems to problems in a branch of algebra.
There are effective procedures to solve some classes of problems, while effective procedures may
not exist for other classes of problems. Having an effective procedure to do some computation
amounts to finding a Turing machine able to carry out the same computation. Let F(x)
be a function of x. F(x) is Turing computable if there is a Turing machine $T_F$ which, fed a tape
containing a description of x, will eventually halt with a description of F(x) on the tape.
Understanding how a computer works is a non-trivial task, while a description of a Turing
machine needs only a few paragraphs and yet allows us to reason about fundamental issues
such as the halting problem 18.
The distinction between computable and non-computable functions is not sufficient, and
complexity theory is concerned with the efficiency with which a specific function F(x) is
computed; it is concerned with the time and space complexity of algorithms. Yet, when addressing
the problem of resources necessary for computations, besides time and space we have to
examine also the energy consumed by the physical devices performing a computation.
Entangled in abstractions, we have moved further and further from physical realities,
and now is the time to come back; information must have a physical substrate, so a daunting
question is the relationship between information and energy. The connection between energy
dissipation and computations was examined by Leo Szilard in 1929 [59] and later by von
Neumann in a lecture given in 1949 [61].

18 For some input x a Turing machine may not halt. It is not possible to construct a computable function
which predicts whether or not the Turing machine $T_F$ with x as input will ever halt.
The models discussed above view a computation as a unidirectional process taking a
system from an initial state to a final state, transforming some input data into results. As
such, a computation is an irreversible process inexorably condemned to consume energy, in
strong contrast with microscopic physical processes, which are reversible.
A physical process is said to be reversible if it can evolve forwards as well as backwards
in time. A reversible system can always be forced back to its original state from a new state
reached during an evolutionary process.
Reversibility is a fundamental property of nature. This may not be obvious at the macroscopic
level: a crystal glass dropped on the floor breaks and the pieces cannot be glued back together,
rare red wine spilled on a silk tablecloth cannot be recovered, a missile fired from a launcher
cannot be brought back. Yet, at the atomic level, all processes are reversible. This means
that all equations of classical and quantum mechanics are symmetric in time. If we replace
time, t, with -t, the equations are not altered.
Macroscopic systems can behave reversibly as well. Of course, if you drop a piano from
the 25th floor of an apartment building to the street below, it will break into pieces. But
if you have a pulley and lower it slowly, it will reach street level in one piece; its potential
energy will be gradually transferred to the counterweight. The trick is to make slow, subtle
changes in the environment rather than sudden, dramatic changes. The system must be in
equilibrium with its environment at all times.
Sadi Nicolas Leonard Carnot published in June 1824 a book, Réflexions sur la puissance
motrice du feu et sur les machines propres à développer cette puissance 19, showing that a
heat engine can behave reversibly. Carnot's engine consists of an idealized gas in a cylinder
with a piston. The system can be heated and cooled by placing it in contact with hot and
cold reservoirs, respectively. In contact with the hot reservoir the gas within the cylinder
expands, pushes the piston, and does some useful work. In contact with the cold reservoir the
gas contracts, restoring the piston to its most compressed state. The motion of the piston
must be very slow to ensure that the gas in the cylinder is always in equilibrium with its
surroundings.
Carnot came to the conclusion that if an engine is reversible, it makes no difference how
the engine is actually designed. The amount of work obtained if the engine absorbs a given
amount of heat at temperature T1 and delivers heat at temperature T2 is a property of the
world and not of that particular engine [29].

19 Reflections upon the motive power of fire and upon the machines capable of developing this power.
In 1865, in conjunction with the study of heat engines, the German physicist Rudolph
Clausius abandoned the idea that heat is conserved and stated formally the First Law of
Thermodynamics. He reconciled the results of Joule with the theories of Sadi Carnot.
Clausius defined entropy as a measure of energy unavailable for doing useful work. He
discovered that entropy can never decrease in a physical process and can only remain
constant in a reversible process. This result became known as the Second Law of Thermodynamics.
Amazingly enough, our Universe started in a state of perfect order and its entropy
is steadily increasing, leading to a... heat death. Of course, this sad perspective could be
billions or trillions of years away...
Statistical arguments show that systems tend to become more disordered. Indeed, our
immediate experience and intuition show that disordered states vastly outnumber the highly
ordered states of a system. After any change of state a system is more likely to settle into
a disordered state than into an ordered one. In the days of punched computer cards, almost
never did a deck of cards dropped on the floor remain in order.
Any student of history recognizes that revolutions and wars contribute greatly to an increase
of social disorder and the destruction of civilization, thus to an increase of social
entropy. The Roman empire imposed order over a significant portion of the Western world
and contributed greatly to Western civilization. After the fall of the Roman Empire, in
476 AD, Europe was thrown into a period of bloody fights, and little progress in arts and
science is noticeable for almost a millennium.
The higher the entropy of a system, the less information we have about the system.
Hence, information is a form of negative entropy. Claude Shannon recognized the
relationship between the two and, on von Neumann's advice, he called the negative logarithm
of the probability of an event entropy. It is rumored that von Neumann told Shannon: "It is
already in use under that name... and besides it will give you a great edge in debates because
nobody really knows what entropy is anyway" [20].

6.2 Thermodynamic Entropy


We now turn back from abstractions to properties of matter. The question we pose is whether
there is a price to pay for the lack of physical awareness of our computational models and what
the limitations of such models are. We ask ourselves what the relationship of information
with energy and matter is.
Fortunately, some properties of materials do not require a detailed knowledge of the structure
of matter, and they were studied during the 19th century, well before the atomic
structure of matter was fully understood. The subject of the branch of physics called
thermodynamics is the statistical behavior of ensembles of molecules. For example, we wish to
characterize a gas by familiar macroscopic properties such as volume, pressure, and temperature,
without knowing its microscopic properties, e.g., the vectors describing the velocities
of the individual molecules of the gas. A comprehensive discussion of the subject is well beyond
our intentions; the interested reader will certainly enjoy the presentation in physics texts, our
favorite being Feynman's Lectures on Physics [29]. Here we only present several concepts
useful to illustrate the relationship between information and energy dissipation.
The laws of thermodynamics are relationships between macroscopic quantities such as the
temperature, T, the pressure, p, the heat, Q, the energy of the system, U, the free energy, F,
and the work done on the system, W, all of them defined as statistical averages. The First Law
of Thermodynamics is a conservation law; it states that the total change in the energy of a
system is the sum of the heat put into the system and the work done on the system:

ΔU = ΔQ + ΔW.

Throughout this section Δ signifies a finite change, while δ is an infinitesimal change of a
variable.
The thermodynamic entropy of a gas, S, is also defined statistically, but it does not reflect
a macroscopic property. The entropy quantifies the notion that a gas is a statistical ensemble
and it measures the randomness, or the degree of disorder, of the ensemble. The entropy is
larger when the vectors describing the individual movements of the molecules of gas are in
a higher state of disorder than when all of them are well organized and moving in the same
direction with the same speed:

S = kB ln(W),

where kB is Boltzmann's constant, kB = 1.381 × 10⁻²³ Joules per degree Kelvin, and W is the
number of microstates compatible with a given state (its thermodynamic probability). A version
of this equation is engraved on Ludwig Boltzmann's tombstone 20.
The Second Law of Thermodynamics tells us that the entropy of an isolated system never
decreases. Indeed, differentiating the previous equation we get:

δS = kB δW/W ≥ 0.
It is relatively easy to see that when we compress a volume containing N molecules of gas
from V1 to V2, maintaining the temperature of the system constant (isothermal compression),
the work done on the system is:

W = −∫_{V1}^{V2} (N kB T / V) dV = N kB T ln(V1/V2).

Indeed, the pressure, the variation of volume δV, and the work are related by δW = −p δV. But
for an ideal gas at temperature T, pressure p, and with volume V we know that pV = N kB T.
It follows immediately that:
20
Boltzmann took his own life in 1906, not knowing the impact of his findings upon the world of physics.
ΔS = N kB ln(V2/V1).
Let us now consider a gas consisting of a single molecule. When N = 1 and we reduce the
volume of the gas in half, V2 = V1/2, the previous expression becomes:

ΔS = −kB ln(2).
The reduction of an ensemble to a single molecule requires a leap of faith; yet it allows us
to relate information and entropy.
By reducing the volume where the molecule can be located we have increased our information
about the system and we have decreased the entropy. Now the molecule can hide in
a volume only half as large as before.
Was the Second Law of Thermodynamics violated by this experiment? No, because we do
not have an isolated system; we have in fact increased the amount of free energy by kB T ln(2)
and decreased the entropy by kB ln(2).
Now we can use the gas cylinder with a molecule of gas to store information. We reset
our bit by compressing the volume; then we let the volume expand, and the molecule will
be in one or the other half of the cylinder depending upon its energy, so the bit will be
either 0 or 1. But we need to expend energy to compress the gas, and this means that erasing
information is the moment when we expend energy.
Two terms frequently used in thermodynamics are adiabatic, meaning without transfer
of heat, and isothermal, meaning at constant temperature. For example, if we open the
valve of a gas canister, the gas rushes out and expands without having time to equalize
its temperature with the environment; the rushing gas feels cool.

6.3 Maxwell's Demon


According to the Second Law of Thermodynamics the entropy of a system is a non-decreasing
function of time. Now we describe an experiment that seems to violate this law.
James Clerk Maxwell is best known for the equations of the electromagnetic field and for the
kinetic theory of gases. By treating gases statistically, in 1866 he formulated, independently
of Ludwig Boltzmann, the Maxwell-Boltzmann kinetic theory of gases. This theory showed
that temperature and heat involve only molecular movement.
In 1871 Maxwell proposed a thought experiment with puzzling results. Imagine the
molecules of a gas in a cylinder separated in two by a slit covered with a door controlled by
a little demon. The demon examines every molecule of gas and determines its velocity;
those of high velocity on the left side are allowed to migrate to the right side and those with
low velocity on the right side are allowed to migrate to the left side. As a result of these
measurements, the demon separates hot from cold in blatant violation of the Second Law of
Thermodynamics. According to the Second Law of Thermodynamics, the entropy of a system
can only increase. The entropy is a measure of the degree of disorder of a system, and
Maxwell's demon creates order by separating hot molecules (those with high velocity) from
cold molecules (those with low velocity).
For more than a century physicists tried to spot the flaw in Maxwell's argument without
great success. In 1929 Leo Szilard had the intuition to relate the demon to binary
information. He imagined a simplified version of Maxwell's thought experiment. Consider

Figure 32: Maxwell's demon separates fast-moving molecules of gas from slow-moving
ones; the fast-moving ones end up on the right-hand side of the box and the slow-moving ones
on the left-hand side. The demon separates hot from cold in blatant violation of the Second
Law of Thermodynamics... or so it seems.

Figure 33: Szilard's gedanken experiment considers a single molecule of gas. The demon
performs a measurement of the position of the molecule in the cylinder. (a) If the molecule
is on the right, the piston is pushed in halfway without any energy consumption. (b) As the
molecule moves to the right, the piston is pushed back and lifts the weight.

a horizontal cylinder with a piston and a single molecule of gas inside the cylinder. If the
demon waits until the molecule is on the right side, the piston can be pushed in halfway without
any energy consumption, Figure 33(a). As the molecule moves to the right, the piston is pushed
back and lifts the weight, Figure 33(b).
The great insight brought by Leo Szilard is that the demon has to expend energy and
reduce the entropy of the system by performing the measurements. Many regard Leo Szilard
as the father of information theory; he identified the measurement, the information, and the
memory as critical aspects of the thorny problem posed by Maxwell.
In 1950 Leon Brillouin advanced the idea that the demon needs a torchlight to see where
the molecule is located, and that the energy of the photons emitted by the torchlight must
exceed the energy of the background photons.

6.4 Energy Consumption. Landauer's Principle
Is there an analogy between a heat engine and a computing engine? Is it sensible to think
that only irreversible processes in computations require energy consumption? These were the
main questions addressed by Rudolf Landauer, a physicist working at IBM Research.
The precise physical phenomenon leading to energy consumption in a computing device
was identified in 1961 by Landauer [34]. Landauer discovered that erasing information is the
process that requires energy dissipation. At first sight this seems a strange statement, so
we need a simple model to justify it. We already know that if a process is irreversible it
requires some energy consumption. Thus, we only need to show that erasing information is
an irreversible process. This is easy to grasp: we all know that once we have
erased the blackboard in our office the information on it is lost, and that we need to back up our
files because once a disk drive fails the information stored on it cannot be retrieved.
Let us consider a slightly more formal justification and consider a storage system for binary
information. The double well in Figure 34 allows us to store a bit of information: if the ball
is in the left well, then we have stored a 0, and if it is in the right well we have stored a 1. Erasing
information means that wherever the ball happens to be in the current state, we should end
in a state with the ball in the left well. In other words, we have a mapping from
two different initial states, the ball in the left well and the ball in the right well, to a single
final state, the ball in the left well. Clearly, such a process is irreversible; hence erasing
information is always associated with energy dissipation.

Figure 34: Landauer's double-well information storage model.

There are two equivalent formulations of the so-called Landauer's principle, one in terms
of energy consumption and the other in terms of entropy:
Landauer's principle: Suppose a computer erases a bit of information. Then the amount of
energy dissipated into the environment is at least kB T ln(2), with kB Boltzmann's constant
and T the temperature of the environment.
Landauer's principle: Suppose a computer erases a bit of information. Then the entropy of
the environment increases by at least kB ln(2), with kB Boltzmann's constant.
Landauer's principle traces the energy consumption in a computation to the need to erase
information. This seems a bit counterintuitive; we would expect that writing information also
requires energy dissipation, that some symmetry between these two operations exists.
Landauer provides only a lower bound on the energy consumption. It turns out that the
logic circuits in microprocessors of vintage year 2000 need roughly 500 kB T ln(2) in energy for
each logic operation [38]. Since 1970 the computing power of microprocessors has doubled

every 18 to 24 months, following very closely the prediction of Intel's Gordon Moore. To limit
the energy consumption of increasingly more powerful microprocessors, the energy consumed
for every logical operation had to decrease at a rate comparable to, or exceeding, the one given
by Moore's law.
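The gap between real circuits and the Landauer bound is easy to quantify. The sketch below
compares the bound kB T ln(2) at room temperature with the roughly 500 kB T ln(2) per logic
operation quoted above for year-2000 microprocessors (the choice T = 300 K is our illustrative
assumption):

    import math

    kB = 1.381e-23   # Boltzmann's constant, J/K
    T = 300.0        # room temperature, Kelvin (illustrative)

    landauer = kB * T * math.log(2)   # minimum energy to erase one bit
    year2000 = 500 * landauer         # rough per-operation figure quoted above

    print(f"Landauer bound:        {landauer:.2e} J per erased bit")  # ~2.9e-21 J
    print(f"circa-2000 logic gate: {year2000:.2e} J per operation")   # ~1.4e-18 J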
A consequence of Landauer's principle is that no strictly positive lower bound on energy
consumption exists for a reversible computer. If we could build a reversible computer that does
not erase any information, then we could compute, in principle, without any energy loss. This is
in itself less shocking than it seems at first glance; all laws of physics are reversible, and if
we know the final state of a closed physical system then we can determine its initial state.
After understanding Landauer's principle, the flaw in the Maxwell's demon gedanken
experiment is obvious: the demon has to perform some measurement to allow a molecule of gas
approaching the slit to cross over to the other side. The results of measurements must be stored
in the demon's memory. But his memory is finite, so the demon must start erasing information
after a while. According to Landauer's principle, the entropy of the entire system, including
the gas cylinder and the demon, increases as a result of erasing information. This increase
is large enough to compensate for the decrease in entropy caused by the separation of
molecules with high velocity from the ones with low velocity, and to vindicate the Second
Law of Thermodynamics.
Leo Szilard noticed in 1929 the role of the measurement process in this gedanken
experiment. Bennett was the first to point out that the erasure of information, and not the
measurements, is the source of the entropy created in the process of separating high-velocity
molecules from the low-velocity ones in this experiment. His model for the demon's behavior
allows the demon to make the measurements with zero energy expenditure: the demon is
initially in a state of uncertainty, let us call this state U. After measuring the velocity of a
molecule, the demon enters a state H for an approaching high-velocity molecule, or
state L for a low-velocity one, and overwrites U with either H or L. This can be done without
any energy expenditure; the energy is dissipated in the next step, when the demon has to
erase the H or L and set the value back to U, to prepare for the next measurement.

6.5 Low Power Computing. Adiabatic Switching


von Neumann [61] was the first to reflect on the absolute minimum amount of energy required
for an elementary operation of an abstract computing device capable of making binary decisions
and of transmitting information. He advanced the idea that this energy is of the order
of kB T. At room temperature kB T ≈ 3 × 10⁻²¹ Joules. von Neumann reasoned that if
a capacitor is used to store a bit of information, one would need an amount of energy large
enough to guarantee that the level of the signal is above the noise level.
As early as 1978, Fredkin and Toffoli discussed a scheme to implement reversible logic
circuits using switches, capacitors, and inductors. Under normal circumstances a capacitor
with capacitance C at voltage V, if charged or discharged instantly, dissipates an amount of energy
equal to (1/2)CV² as heat. In their scheme, a capacitor used to store a bit of information could
be discharged without losing the energy: the electricity is transferred to an inductor and
from there to another capacitor. Unfortunately, the scheme is not practical; inductors cannot
be accommodated on a silicon substrate.
Adiabatic switching is a term coined for a switching device that does not produce heat.
Charles Seitz from Caltech invented a scheme called hot-clocking, where energy is saved by
varying the power supply voltages. Ralph Merkle from Xerox PARC, William Athas from
USC, and Storrs Hall from Rutgers pioneered a reversible adiabatic switching scheme.

6.6 Bennett's Information-Driven Engine


Bennett imagined an information-driven engine; instead of using electricity or gas, Bennett's
engine consumes a tape with information stored on it and converts this information into
energy, see Figure 35. This sounds very exciting, but do not jump to conclusions yet; it may
be a while until you'll be able to create a tape recording your kids in the evening and feed
the tape next morning to your Honda Civic instead of filling it up with gas, hydrogen, or
electricity.


Figure 35: Bennett's information-driven engine. The input tape contains information; the
output tape is randomized. The entropy of the system increases, and the energy produced is
used to move a piston.

The engine vaguely resembles a Turing machine; the input tape contains cells, each cell
similar to the cylinder with one molecule of gas in it. The machine spits out a randomized
tape where the molecule can be anywhere in the cylinder. This system is just the opposite
of the one presented earlier, where we used energy to force the molecule of gas into one half
of the cylinder and reduced the entropy. In this engine the entropy, which is a measure of
randomness, increases, and this means that some energy is produced.
Let us now describe the setup carefully. The engine itself is submerged in a heat bath,
a system able to keep the temperature T of the engine constant. When a
cell enters the engine, its contents are spilled into a cylinder with a piston halfway into the
cylinder. The molecule heats up to the temperature of the heat bath. When the system
has reached thermal equilibrium, the molecule pushes the piston isothermally and, with some
imagination, we are able to extract the energy of this movement. If we have a tape with n
bits of information on it, then the work produced by the engine is n kB T ln(2).

6.7 Logically Reversible Turing Machines and Physical Reversibility
An ordinary Turing machine has a control unit and a read-write head; it performs a sequence
of read-write-shift operations on an infinite tape divided into squares. The dynamics of the
Turing machine is described by quintuples (A, T, A′, T′, σ) of the form:

A T → T′ σ A′.

The significance of this notation is that when the control unit is in state A and the symbol
currently scanned by the read-write head is T, the machine first writes T′ in place of
T and then shifts left one square, right one square, or remains on the same square, depending
upon the value of σ (σ = −, +, 0, respectively), and the new state of the control unit becomes
A′. An n-tape Turing machine is one where the T, T′, and σ of each quintuple are themselves
n-tuples.
Figure 36: (a) A Universal Reversible Computer. (b) A Zero Entropy Loss Reversible Computer.

A Turing machine performs a mapping of its entire current state into a successor state
given by its transition function. The entire state of a Turing machine is given by the state of
its control unit, the tape contents, and the position of the read-write head. When a Turing
machine traverses a set of states, we have a set of mappings associated with this evolution.
A Turing machine is deterministic iff 21 the quintuples defining the mappings have non-overlapping
domains. This is guaranteed by requiring that the portion of each quintuple to the left
of the arrow be different for different quintuples.
21
iff is an abbreviation for if and only if
A Turing machine is reversible iff the mappings have non-overlapping ranges. An ordinary
Turing machine is not reversible. Indeed, the write and shift operations in a cycle do not
commute; the inverse of a read-write-shift cycle is shift-read-write.
The problem of constructing a reversible computing automaton is non-trivial. A tempting
solution is to add to the Turing machine a history tape, initially blank, and save on this tape
the details of every operation performed. Then we would be able to retrace the steps of the
direct computation: starting from the last record on the tape we would determine the previous
state and keep going back until we reach the first record on the tape.
The history tape must be left blank, as it was when we started the process. We know by
now that erasing information requires energy dissipation. Thus, we require that the reversible
computer, if it halts, have erased all intermediate results, leaving only the original input and
the desired output; so the process of building a reversible automaton is more intricate than
we have anticipated.
Charles Bennett was able to prove in 1973 that, given an ordinary Turing machine S, one
can construct a reversible three-tape machine R which emulates S on any standard input,
and which leaves behind, at the end of its computation, only the original input and the desired
output. The formal proof of this statement can be found in [9].
Here we present Bennett's informal argument. Imagine that at the end of the original
computation, which is deterministic and reversible, we continue with a stage in which the machine
uses the inverse of the original transfer function, making the machine carry out the
entire computation backwards. Like the forward computation, the backwards computation is
deterministic and reversible.
We have to be a bit careful and create a copy of the results on a new tape immediately
after the completion of the forward computation and before the backward one starts, because
the backward computation destroys the results. We also have to stop recording on the history
tape during the process of copying the results. The copy operation of the results can be done
reversibly if we start with a blank tape.
After this three-stage process the system will consist of a copy of the results obtained
during the first stage, the copy being made during the second stage, and the original input
tape reconstructed during the third stage. A vast amount of storage on the history tape was
used, but it is returned to its original blank condition after the third stage.
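The three-stage discipline can be illustrated with a toy program. The sketch below is our own
construction, not Bennett's formal machine: it uses XOR updates, which are self-inverse, as
the reversible steps. Note how the history tape ends up blank while the input survives and the
results live on their own tape.

    # A toy sketch of Bennett's compute / copy / uncompute pattern.
    # The "machine" applies reversible steps (XOR updates on a list of
    # bits), logging each step on a history tape.

    def forward(state, program, history):
        """Stage one: run the computation, recording every step."""
        for (i, j) in program:          # each step: state[j] ^= state[i]
            state[j] ^= state[i]
            history.append((i, j))      # log the step on the history tape
        return state

    def uncompute(state, history):
        """Stage three: undo every step in reverse order, blanking the history."""
        while history:
            i, j = history.pop()        # XOR is its own inverse
            state[j] ^= state[i]
        return state

    state = [1, 0, 1, 0]                # input tape
    program = [(0, 1), (2, 3), (1, 3)]  # a reversible "computation"
    history = []

    forward(state, program, history)
    results = list(state)               # stage two: copy results to a blank tape
    uncompute(state, history)           # stage three: restore the input

    print(results)   # the desired output
    print(state)     # the original input, reconstructed: [1, 0, 1, 0]
    print(history)   # history tape back to blank: []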
Once convinced that logically reversible automata exist, we can think of thermodynamically
reversible physical computers operating very slowly, near thermodynamic equilibrium. For
example, a reversible chemical computer could consist of DNA encoding logical states and
reactants able to change the logical state.
Figures 36(a) and (b) depict a Universal Reversible Computer and a Zero Entropy Loss
Reversible Computer based upon Bennett's arguments discussed above. They are inspired
by Feynman [31].

7 Basic Concepts of Information Theory
A communication channel is used to transmit information from a source to a destination. The
information can be in analog or digital format. In this chapter we are only concerned with
digital information.
The information is transported through a physical channel by a carrier such as electromag-
netic or light waves. The process of inscribing the information on a physical carrier is called
modulation; the process of extracting the information from the carrier is called demodulation.

7.1 Entropy

The entropy is a measure of the uncertainty of a single random variable X before it is observed,
or the average uncertainty removed by observing it. This quantity is called entropy due to
its similarity with the thermodynamic entropy.

Definition 13. The entropy of a random variable X with a probability density function pX(x)
is:

H(X) = −Σ_x pX(x) log2 pX(x).

A binary random variable X can take only two values, x = 0 and x = 1. The entropy
of a binary random variable is measured in bits. The probability is a real number with values
0 ≤ pX(x) ≤ 1. Thus, log2 pX(x) ≤ 0 and H(X) ≥ 0.

Example 11. Consider a binary random variable X and let p = pX(x = 1) be the probability
that the random variable X takes the value 1. Then the entropy of X is:

H(X) = −p log2(p) − (1 − p) log2(1 − p).


Figure 37 shows H(X) as a function of p. The entropy has a maximum of 1 bit when p = 1/2
and goes to zero when p = 0 or p = 1. Intuitively, we expect the entropy to be zero when the
outcome is certain and to reach its maximum when both outcomes are equally likely.
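A direct way to reproduce the shape of Figure 37 numerically is to evaluate the binary entropy
at a few values of p; a minimal sketch:

    import math

    def binary_entropy(p):
        """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"p = {p:4.2f}  H = {binary_entropy(p):.4f} bits")
    # maximum H = 1 bit at p = 0.5; H = 0 at p = 0 and p = 1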

Example 12. The same group of eight cars takes part in several Formula 1 races. The
probabilities of winning for each of the eight cars are, respectively:

1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64.

The entropy of the random variable X indicating the winner of one of the races is:

H(X) = −(1/2) log2(1/2) − (1/4) log2(1/4) − (1/8) log2(1/8) − (1/16) log2(1/16) − 4 · (1/64) log2(1/64) = 2 bits.

If we want to send a binary message to reveal the winner of a particular race, we could
encode the identity of the winning car in several ways. For example, we can use three
bits and encode the identity of each car as the binary representation of the integers 0 to 7:
000, 001, 010, 011, 100, 101, 110, 111.

Figure 37: The entropy of a binary random variable as a function of the probability of an outcome.

An optimal encoding means that the average number of bits transmitted is minimal. Given
the individual probabilities of winning a race, the optimal encoding of the identities of the
individual cars is:

0, 10, 110, 1110, 111100, 111101, 111110, 111111


and the corresponding lengths of the strings encoding the identity of each car are:

l1 = 1, l2 = 2, l3 = 3, l4 = 4, l5 = l6 = l7 = l8 = 6.
To prove that this encoding is optimal, we have to show that the expected length of the string
designating the winner, over a large number of races, is smaller than for the obvious encoding
presented above. The probabilities of winning the race are p1 = 1/2 for the car encoded
as 0, p2 = 1/4 for 10, and so on. The average length of the string, l̄, we have to send to
communicate the winner is:

l̄ = Σ_{i=1}^{8} li pi = 1 · (1/2) + 2 · (1/4) + 3 · (1/8) + 4 · (1/16) + 6 · (4 · (1/64)) = 2 bits.

The average length of the string identifying the outcome of a race for this particular encoding
scheme is equal to the entropy. This example shows that indeed the entropy provides the
average information obtained by observing an outcome, or the average uncertainty removed
by observing X.
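Both numbers in this example are easy to verify; a short sketch computing the entropy and the
average code length for the given probabilities and code word lengths:

    import math
    from fractions import Fraction as F

    probs   = [F(1,2), F(1,4), F(1,8), F(1,16), F(1,64), F(1,64), F(1,64), F(1,64)]
    lengths = [1, 2, 3, 4, 6, 6, 6, 6]   # lengths of 0, 10, 110, 1110, 111100, ...

    entropy = -sum(float(p) * math.log2(p) for p in probs)
    avg_len = sum(l * p for l, p in zip(lengths, probs))

    print(entropy)         # 2.0 bits
    print(float(avg_len))  # 2.0 bits: the code meets the entropy bound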

7.2 Conditional and Joint Entropy. Mutual Information

Definition 14. The joint entropy of two random variables X and Y with the joint probability
density function p(x, y) is:

H(X, Y) = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log2 p(x, y).

Table 2: The joint probability distribution matrix of random variables X and Y in the conditional
entropy example.

Y\X    a       b       c       d
a      1/8     1/16    1/32    1/32
b      1/16    1/8     1/32    1/32
c      1/16    1/16    1/16    1/16
d      1/4     0       0       0

Definition 15. The conditional entropy of random variable Y given X is:

H(Y|X) = Σ_{x∈X} p(X = x) H(Y|X = x).

Example 13. Based on [22].

Consider two random variables X and Y. Each of them takes values over a four-letter
alphabet consisting of the symbols a, b, c, d. The joint distribution of the two random variables
is given in Table 2.
The marginal distribution of X is (1/2, 1/4, 1/8, 1/8); it gives the probability of x = a, x = b, x = c,
and x = d regardless of the value y of Y, and it is obtained by summing the corresponding
columns of the joint probability matrix. The corresponding marginal distribution of Y is
(1/4, 1/4, 1/4, 1/4).
The actual value of H(X|Y) is:

H(X|Y) = (1/4) H(1/2, 1/4, 1/8, 1/8) + (1/4) H(1/4, 1/2, 1/8, 1/8) + (1/4) H(1/4, 1/4, 1/4, 1/4) + (1/4) H(1, 0, 0, 0)

or

H(X|Y) = (1/4) · (7/4) + (1/4) · (7/4) + (1/4) · 2 + (1/4) · 0 = 11/8 bits.

Proposition 6. The joint entropy of two random variables X and Y, H(X, Y), the entropies of
X and Y, H(X) and H(Y), and the conditional entropies, H(X|Y) and H(Y|X), are related:

H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).

The proof of this proposition is left as an exercise for the reader.
The reduction in uncertainty of a random variable X due to another random variable Y
is called the mutual information.

Definition 16. The mutual information of two random variables X and Y is:

I(X; Y) = H(X) − H(X|Y) = Σ_{x,y} p(x, y) log2 [ p(x, y) / (p(x) p(y)) ].

The mutual information I(X; Y) is a measure of the dependence between the two random
variables; it is symmetric in X and Y and always non-negative.

Example 14. If in the previous example the four outcomes of X were equally likely, then:

H(X) = −4 · (1/4) log2(1/4) = log2 4 = 2 bits

and

I(X; Y) = H(X) − H(X|Y) = 2 − 11/8 = 5/8 bits.
The relations between entropy, conditional entropy, joint entropy, and mutual information
are illustrated in Figure 38. As we can see:

H(X) = H(X|Y) + I(X; Y),

H(Y) = H(Y|X) + I(X; Y),

H(X, Y) = H(X) + H(Y) − I(X; Y).


Figure 38: The relations between entropy, conditional entropy, joint entropy, and mutual
information.

7.3 Binary Symmetric Channels


The input X and the output Y of a communication channel are random variables that take
values over the channel alphabet. A communication channel is characterized by a probability
transition matrix that determines the conditional distribution of the output given the input.
If the input symbols are independent, then the information per symbol at the input of a
channel is H(X) and the information for n symbols is n H(X).
The question we address now is how much of this information goes through the channel.
To answer this question we first examine very simple channel models.

A binary channel is one where X = {0, 1} and Y = {0, 1}. A unidirectional binary
communication channel is one where the information propagates in one direction only, from
the source to the destination.
Figure 39: (a) A noiseless binary symmetric channel maps a 0 at the input into a 0 at the
output and a 1 into a 1. (b) A noisy symmetric channel maps a 0 into a 1, and a 1 into a 0,
with probability p. An input symbol is mapped into itself with probability 1 − p.

A noiseless binary channel transmits each symbol in the input alphabet without errors,
as shown in Figure 39(a). The noiseless channel model is suitable in some cases for performance
analysis, but it is not useful for reliability analysis, when transmission errors have to be
accounted for.
In the case of a noisy binary symmetric channel, let p > 0 be the probability that one
input symbol is received in error: a 1 at the input becomes a 0 at the output and a 0 at the
input becomes a 1 at the output, as shown in Figure 39(b).
Assume that the two input symbols occur with probabilities q and 1 − q. In this case:

H(Y|X) = Σ_{x∈X} p(X = x) H(Y|X = x)

H(Y|X) = q (−p log2 p − (1 − p) log2(1 − p)) + (1 − q)(−p log2 p − (1 − p) log2(1 − p))

H(Y|X) = −p log2 p − (1 − p) log2(1 − p).

Then the mutual information is:

I(X; Y) = H(Y) − H(Y|X) = H(Y) + p log2 p + (1 − p) log2(1 − p).

We can maximize I(X; Y) over q to get the channel capacity per symbol in the input
alphabet:

Cs = 1 + p log2 p + (1 − p) log2(1 − p).

When p = 1/2 this capacity is 0, because the output is independent of the input. When
p = 0 or p = 1 the capacity is 1, and we have in fact a noiseless channel.
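A sketch evaluating the capacity per symbol of the binary symmetric channel for a few error
probabilities:

    import math

    def bsc_capacity(p):
        """Cs = 1 + p log2 p + (1-p) log2 (1-p) bits per symbol."""
        if p in (0.0, 1.0):
            return 1.0           # deterministic channel: full capacity
        return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.25, 0.5):
        print(f"p = {p:4.2f}  Cs = {bsc_capacity(p):.4f} bits/symbol")
    # Cs = 0 at p = 0.5 (output independent of input); Cs = 1 at p = 0 or 1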

7.4 Information Encoding
The problem of transforming, repackaging, or encoding information is a major concern in
modern communication. Encoding is used to:
(i) make transmission resilient to errors,
(ii) reduce the amount of data transmitted through a channel, and
(iii) ensure information confidentiality.
A first reason for information encoding is error control. The error control mechanisms
transform a noisy channel into a noiseless one; they are built into communication protocols
to eliminate the effect of transmission errors.
An error occurs when an input symbol is distorted during transmission and interpreted by
the destination as another symbol from the alphabet. Coding theory is concerned with this
aspect of information theory.
Another reason for encoding is the desire to reduce the amount of data transferred through
communication channels and to eliminate redundant or less important information. The
discipline covering this type of encoding is called data compression; encoding and decoding
are called compression and decompression, respectively.
Last, but not least, we want to ensure information confidentiality, to restrict access to
information to only those who have the proper authorization. The discipline covering this
facet of encoding is called cryptography, and the processes of encoding/decoding are called
encryption/decryption. In all these cases there is a mapping from an original message to a
transformed one done at the source and an inverse mapping done at the destination. The
first process is called encoding and the second one decoding.
Source Encoding and Channel Encoding. Information may be subject to multiple
stages of encoding, as shown in Figure 40 for the error control case on a binary channel. In
Figure 40(a) the source uses a four-letter alphabet and the source encoder maps these four
symbols into two-bit strings. The source decoder performs the inverse mapping. In Figure
40(b) the channel encoder increases the redundancy of each symbol encoded by the source
encoder by mapping each two-bit string into a five-bit string. This mapping allows the channel
decoder to detect a single-bit error in the transmission of a symbol from the source alphabet.
In general, source encoding is the process of transforming the information produced by
the source into messages. The source may produce a continuous stream of symbols from the
source alphabet, and the source encoder cuts this stream into blocks of fixed size. The channel
encoder accepts as input a set of messages of fixed length, maps the source alphabet into
a channel alphabet, then adds a set of redundancy symbols, and finally sends the message
through the channel. The channel decoder first determines if the message is in error and
takes corrective actions. Then it removes the redundancy symbols, maps the channel alphabet
back into the source alphabet, and hands each message to the source decoder, which in turn
processes the message and passes it to the receiver.

7.5 Channel Capacity. Shannon's Theorems


Now we address a more subtle question: How much information may be transmitted through
a communication channel?

Figure 40: Encoding. (a) The source alphabet consists of four symbols: A, B, C, and D. The
source encoder maps each input symbol into a two-bit code: A is mapped to 00, B to 10, C
to 01, and D to 11. The source decoder performs an inverse mapping and delivers a string
consisting of the four input alphabet symbols. If a one-bit error occurs when the sender
generates the symbol D, the source decoder may get 01 or 10 instead of 11 and decode it
as either C or B instead of D. (b) The channel encoder maps a two-bit string into a five-bit
string. If a one-bit or a two-bit error occurs, the channel decoder receives a string that does
not map to any of the valid five-bit strings and detects an error. For example, when 10110
is transmitted and errors in the second and third bit positions occur, the channel decoder
detects the error because 11010 is not a valid code word.

Definition 17. Given a communication channel with input X and output Y, the channel
capacity is defined as the highest rate at which information can be transmitted through the channel:

C = max I(X; Y),

where the maximum is taken over the possible input distributions.
We are interested in two fundamental questions regarding communication over noisy channels:
(i) Is it possible to encode the information transmitted over a noisy channel to minimize
the probability of errors? (ii) How does the noise affect the capacity of a channel?
The intuition behind the answer to the first question is illustrated by the following analogy.
If we want to send a delicate piece of china using a parcel delivery service, we have to package
it properly. The more packaging material we add, the more likely it is that the delicate item
will arrive at the destination in its original condition, but, at the same time, we increase the
weight of the parcel and add to the cost of shipping.
As far as the second question is concerned, we know that whenever the level of noise in a
room or on a phone line is high, we have to repeat words and sentences several times before
the other party understands what we are saying. Thus, the actual rate at which we are able to
transmit information through a communication channel is lowered by the presence of the noise.
Rigorous answers to both questions, consistent with our intuition, are provided by two
theorems due to Claude Shannon [48, 49, 50], who founded information theory in the
late 1940s. Shannon uses the simple models of a communication channel presented earlier to
establish the first fundamental result, the so-called Shannon coding theorem.
Theorem 1. Given a noisy channel with capacity C, for any ε with 0 < ε < 1 there is a coding
scheme that allows us to transmit information through the channel at a rate arbitrarily close
to the channel capacity, C, and with a probability of error less than ε.
This result constitutes what mathematicians call an existence theorem; it only states
that a solution for transforming a noisy communication channel into a noiseless one exists,
without giving a hint of how to achieve this result.
In real life various sources of noise distort transmission and lower the channel capacity.
This effect is expressed by Shannon's channel capacity theorem for noisy channels:
Theorem 2. The capacity C of a noisy channel is:

C = B log2(1 + Signal/Noise),

where B is the bandwidth, Signal is the average power of the signal, and Noise is the average
noise power.
Achieving Shannon's limit is a challenging task for any modulation scheme.
The signal-to-noise ratio is usually expressed in decibels (dB), given by the formula
10 log10(Signal/Noise). A signal-to-noise ratio of 10³ corresponds to 30 dB, and one of 10⁶
corresponds to 60 dB.

Example 15. Consider a phone line that allows transmission of frequencies in the range 500
Hz to 4000 Hz and has a signal-to-noise ratio of 30 dB. The maximum data rate through the
phone line is:

C = (4000 − 500) log2(1 + 1000) = 3500 log2(1001) ≈ 35 Kbps.

If the signal-to-noise ratio improves to 60 dB, the maximum data rate doubles, to about
70 Kbps. However, improving the signal-to-noise ratio of a phone line by three orders of
magnitude, from 10³ to 10⁶, is extremely difficult or even unfeasible from a technical
standpoint.

7.6 Error Detecting and Error Correcting Codes


Coding is a discipline of information theory building on Shannon's results. Error detection is
the process of determining if a received message is in error. Error correction is the process of
restoring a message in error to its original content.
The approach taken in coding is to add additional information to increase the redundancy
of a message. The intuition behind error detection and error correction is to artificially
increase the distance between code words so that transmission errors cannot possibly
transform one valid code word into another valid code word.
Suppose we want to transmit a text written in an alphabet with 32 letters. Assuming the
channel alphabet is binary, we can assign five bits to each letter of the source alphabet and
rewrite the entire text in the binary alphabet accepted by the channel. Using this
encoding strategy, we cannot correct errors during transmission. However, if we use more
than five bits to represent each letter of the original alphabet, we have a chance to correct a
small number of errors.
Example 16. A simple example of an error detection scheme is the addition of a parity check
bit to a word of a given length.
This simple scheme is very powerful; it allows us to detect any odd number of errors but
fails if an even number of errors occurs. For example, consider a system that enforces even
parity for each eight-bit word. Given the string (10111011), we add one more bit to ensure
that the total number of 1s is even, in this case a 0, and we transmit the nine-bit string
(101110110). The error detection procedure is to count the number of 1s; we decide that the
string is in error if this number is odd.
This example also hints at the limitations of error detection mechanisms. A code is
designed with certain error detection capabilities and fails to detect error patterns not covered
by the original design of the code.
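A sketch of the even-parity scheme from this example, showing both the detection of one error
and the failure mode on two errors:

    def add_even_parity(bits):
        """Append a bit so the total number of 1s is even."""
        return bits + [sum(bits) % 2]

    def parity_ok(bits):
        """Accept the word iff the number of 1s is even."""
        return sum(bits) % 2 == 0

    word = [1, 0, 1, 1, 1, 0, 1, 1]          # six 1s, so the parity bit is 0
    sent = add_even_parity(word)             # (101110110)

    one_error  = sent.copy(); one_error[2] ^= 1
    two_errors = sent.copy(); two_errors[2] ^= 1; two_errors[5] ^= 1

    print(parity_ok(sent))        # True  : accepted
    print(parity_ok(one_error))   # False : single error detected
    print(parity_ok(two_errors))  # True  : two errors slip through undetected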

7.7 Block Codes


Now we provide a number of definitions necessary for understanding the basic concepts of
coding theory.
Let A be an alphabet of q symbols. For example, A = {0, 1} is the binary alphabet. The
set of symbols that can be transmitted through a communication channel constitutes the
channel alphabet.

Definition 18. A block code of length n over the alphabet A is a set of M n-tuples, where
each n-tuple takes its components from A and is called a code word. We call the block code
an [n, M]-code over A.
Figure 41 illustrates the encoding process for a block code. The source encoder maps
messages, the information that the source wants to transmit, into groups of k symbols from the
channel alphabet. For example, assume that the channel alphabet consists of q symbols, the
integers 0 to q − 1 prefixed by the # sign, and that we want to transmit a text consisting of s < q
sentences. The source encoder maps each sentence to one of the symbols and then creates
tuples of k such symbols. Next, the channel encoder maps each tuple of k symbols into a tuple of
n = k + r symbols. The channel encoder transmits the code words, but the n-tuples received
by the channel decoder may be affected by transmission errors; thus, they may or may not
be code words.
The quantity r = n − k > 0 is called the redundancy of the code. The channel
encoder adds redundancy, and this leads to the expansion of a message. While the added
redundancy is desirable from the point of view of error control, it decreases the efficiency
of the communication channel by reducing its effective capacity. The ratio k/n measures the
efficiency of a code.

Definition 19. The rate of an [n, M]-code that encodes k-tuples into n-tuples is:

R = k/n.

Figure 41: Coding using block codes. The source encoder maps messages into blocks of k
symbols from the code alphabet; the channel encoder adds r symbols and transmits n-tuples,
with n = k + r. At the destination the channel decoder maps n-tuples into k-tuples, and the
source decoder reconstructs the messages.

7.8 Hamming Distance


We now introduce a metric necessary to establish the error detecting and error correcting
properties of a code.

Definition 20. The Hamming distance d(x, y) between two code words x and y is the number
of coordinate positions in which they differ.

Example 17. Given two binary code words x = (010110) and y = (011011), their Hamming
distance is d(x, y) = 3.
Indeed, if we number the bit positions in each code word from left to right as 1 to 6, the
two code words differ in bit positions 3, 4, and 6.
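Computing the Hamming distance is a one-line comparison; a sketch verifying the example:

    def hamming_distance(x, y):
        """Number of coordinate positions in which two words differ."""
        assert len(x) == len(y)
        return sum(a != b for a, b in zip(x, y))

    print(hamming_distance("010110", "011011"))   # 3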

Proposition 7. The Hamming distance is a metric. For all n-tuples x, y, and z over an
alphabet A:

1. d(x, y) ≥ 0, with equality if and only if x = y.

2. d(x, y) = d(y, x).

3. d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality).

We leave the proof of this proposition as an exercise for the reader.

Definition 21. Let C be an [n, M]-code. The Hamming distance d of the code C is:

d = min{d(x, y) : x, y ∈ C, x ≠ y}.

The Hamming distance of a code is the minimum distance between any pair of code
words.

Example 18. The distance of a code.

Consider C = {c0, c1, c2, c3} where c0 = (000000), c1 = (101101), c2 = (010110), c3 =
(111011). This code has distance d = 3. Indeed, d(c0, c1) = 4, d(c0, c2) = 3, d(c0, c3) =
5, d(c1, c2) = 5, d(c1, c3) = 3, d(c2, c3) = 4.

To compute the Hamming distance for an [n, M]-code C, it is necessary to compute the
distance between the M(M − 1)/2 pairs of code words and then to find the pair with the minimum
distance.

7.9 Channel Decoding Policy


We now define a policy for the channel decoder of an [n, M]-code C with distance d. This policy
tells the channel decoder what actions to take when it receives an n-tuple r, and consists of
two phases:
(1) The recognition phase, when the received n-tuple r is compared with all the code
words in the code until a match is found or we decide that r is not a code word.
(2) The error correction phase, when the received tuple r is mapped into a code word.
The actions taken by the channel decoder are:

(i) If r is a code word, conclude that no errors have occurred and accept that the code word
sent was c = r.

(ii) If r is not a code word, conclude that errors have occurred and either correct r to a code
word c or declare that correction is not possible.

Before continuing our discussion we should observe that this strategy fails when c, the
code word sent, was affected by errors and transformed into another valid code word, r. This
is a fundamental problem with error detecting and error correcting codes, and we return to it
later. Once we accept the possibility that the channel decoder may fail to decode properly
in some cases, our goal is to take the course of action with the greatest probability of being
correct.
The decoding strategy called nearest neighbor decoding requires that a received vector be
decoded to the code word closest to it with respect to the Hamming distance: if an n-tuple r
is received and there is a unique code word c ∈ C such that d(r, c) is a minimum, then correct
r to c. If no such c exists, report that errors have been detected but no correction is
possible. If multiple code words are at the same minimum distance from the received n-tuple,
we select one of them at random.
For the following analysis we make the assumption that errors are introduced by the
channel at random, and that the probability of an error in one coordinate is independent of
errors in adjacent coordinates. Consider a code over an alphabet of q symbols and let the
probability that an error occurs on symbol transmission be p. The probability that a symbol
is correctly transmitted over the channel is then 1 − p. We assume that if an error occurs,
each of the q − 1 symbols aside from the correct one is equally likely; its probability of being
received is p/(q − 1). This hypothetical channel is called the q-ary symmetric channel.
To justify the nearest-neighbor decoding policy we discuss a strategy known as maximum
likelihood decoding. Under this strategy, of all possible code words, r is decoded to the code
word c which maximizes the probability P(r, c) that r is received, given that c is sent. If
d(r, c) = d, then

P(r, c) = (1 − p)^{n−d} (p/(q − 1))^d.

Indeed, n − d coordinate positions in c are not altered by the channel; the probability
of this event is (1 − p)^{n−d}. In each of the remaining d coordinate positions, the symbol in
c is altered and transformed into the corresponding symbol in r; the probability of this is
(p/(q − 1))^d.
Suppose now that c1 and c2 are two code words. Assume that we receive r such that:

d(r, c1) = d1, d(r, c2) = d2.

Without loss of generality we assume that d1 ≤ d2 and that:

P(r, c1) > P(r, c2).

It follows that:

(1 − p)^{n−d1} (p/(q − 1))^{d1} > (1 − p)^{n−d2} (p/(q − 1))^{d2},

(1 − p)^{d2−d1} > (p/(q − 1))^{d2−d1},

and

(p / ((1 − p)(q − 1)))^{d2−d1} < 1.

If d1 = d2, this is false, and in fact P(r, c1) = P(r, c2). Otherwise, d2 − d1 ≥ 1 and the
inequality is true if and only if

p / ((1 − p)(q − 1)) < 1, i.e., p < (q − 1)/q.

In conclusion, when the probability of error p < (q − 1)/q and we receive an n-tuple r, we
decide that the code word c at the minimum distance from r is the one sent by the source.

Example 19. Maximum likelihood decoding.

Consider the binary code C = {(000000), (101100), (010111), (111011)} and assume that
p = 0.15. If r = (111111) is received, r is decoded to (111011) because the probability
P(r, (111011)) is the largest. Indeed:
P(r, (000000)) = (0.15)^6 ≈ 0.000011
P(r, (101100)) = (0.15)^3 (0.85)^3 ≈ 0.002076
P(r, (010111)) = (0.15)^2 (0.85)^4 ≈ 0.011745
P(r, (111011)) = (0.15)^1 (0.85)^5 ≈ 0.066555
A code is capable of correcting e errors if the channel decoder is capable of correcting any
pattern of e or fewer errors, using the algorithm above.
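Under the q-ary symmetric channel model the likelihoods in Example 19 can be recomputed
directly (binary case, q = 2); a sketch of maximum likelihood decoding:

    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def likelihood(r, c, p, q=2):
        """P(r received | c sent) = (1-p)^(n-d) * (p/(q-1))^d, d = d(r, c)."""
        d = hamming(r, c)
        return (1 - p) ** (len(c) - d) * (p / (q - 1)) ** d

    code = ["000000", "101100", "010111", "111011"]
    r, p = "111111", 0.15

    for c in code:
        print(c, f"{likelihood(r, c, p):.6f}")

    best = max(code, key=lambda c: likelihood(r, c, p))
    print("decoded as:", best)    # 111011, the most likely code word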

7.10 Error Correcting and Detecting Capabilities of a Code


The ability of a code to detect and/or correct errors is a function of its distance, d, previously
defined as the minimum Hamming distance between any pair of code words.

Definition 22. Let ci, 1 ≤ i ≤ M, be the code words of C, an [n, M]-code. Let S be the set of
all n-tuples over the alphabet of C, and let

Sci = {x ∈ S : d(x, ci) ≤ e}.

Sci is called the sphere of radius e around the code word ci. It consists of all n-tuples
within distance e of the code word ci, which is at the center of the sphere.

Example 20. The sphere of radius 1 around the code word (000000):
The sphere consists of the center and all binary six-tuples that differ from (000000) in
exactly one bit position:

{(000000), (100000), (010000), (001000), (000100), (000010), (000001)}.

Proposition 8. Let C be an [n, M ]-code with an odd distance, d = 2e + 1. Then C can correct
e errors and can detect 2e errors.

Proof. We first show that for ci ≠ cj we must have Sci ∩ Scj = ∅. Assume by contradiction
that x ∈ Sci ∩ Scj. Then d(x, ci) ≤ e and d(x, cj) ≤ e. The triangle inequality gives:

d(ci, x) + d(x, cj) ≥ d(ci, cj)

and hence

d(ci, cj) ≤ 2e.

But every pair of distinct code words has a distance of at least 2e + 1. Thus we conclude
that

Sci ∩ Scj = ∅.

Hence if code word ci is transmitted and t ≤ e errors are introduced, the received word r
is an n-tuple in the sphere Sci, and thus ci is the unique code word closest to r. The decoder
can always correct any error pattern of this type. If we use the code for error detection, then
at least 2e + 1 errors must occur in the code word to carry it into another code word. If at

Figure 42: A geometric illustration of the error detection and error correction capabilities of a
code. A code C = {c1, c2, c3, c4, ...} with minimum distance d = 2e + 1 is used simultaneously
for error correction and for error detection. According to the channel decoding rules, any
n-tuple t in the sphere of radius e about code word c1 is decoded as the center of the sphere,
c1. If at most 2e errors occur when c1 is transmitted, the received n-tuple r cannot be
masquerading as a valid code word c2, since the distance between c1 and c2 is at least 2e + 1.
Some n-tuples, such as q, are not located within any sphere; thus, they cannot be corrected
to any valid code word. Patterns of e + 1 errors escape detection and are decoded incorrectly
if the code is used simultaneously for error correction and for error detection. This is the case
of cu, obtained from c3 when e + 1 errors occur.

least 1 and at most 2e errors are introduced, the received word will never be a code word and
error detection is always possible.
Figure 42 provides a geometric interpretation of the previous proposition. The code C =
{c1, c2, c3, c4, ...} with distance d = 2e + 1 is used simultaneously for error correction and for
error detection.
The spheres of radius e around all code words do not intersect, because the minimum
distance between any pair of code words is 2e + 1. According to the channel decoding rules,
any n-tuple t in the sphere of radius e around code word c1 is decoded as the center of the
sphere, c1.
If at most 2e errors occur when the code word c1 is transmitted, the received n-tuple r
cannot be masquerading as a valid code word c2, since the distance between c1 and c2 is at
least 2e + 1.

The distance of a code may be even or odd. The case of even distance is proved in a
similar manner. Let ⌊a⌋ denote the largest integer smaller than or equal to a.

Proposition 9. Let C be an [n, M]-code with distance d. Then C can correct ⌊(d − 1)/2⌋ errors
and can detect d − 1 errors.

From Figure 42 we see that there are n-tuples that are not contained in any of the spheres
around the code words of a code C. If an n-tuple q is received and the decoder cannot place
it in any of the spheres, then the decoder knows that at least e + 1 errors have occurred and
no correction is possible.
If a code C with distance 2e + 1 is used simultaneously for error detection and error
correction, some patterns of fewer than 2e errors can escape detection: for example, patterns
of e + 1 errors that transform a code word c3 into a received n-tuple cu in the sphere Sc4 and
force the decoder to correct the received n-tuple to c4. In this case the (e + 1)-error pattern
is undetected and a false correction is performed.
In conclusion, when d = 2e + 1, the code C can correct e errors in general, but is unable
to simultaneously detect additional errors. If the distance of the code is even, the situation
changes slightly.

Proposition 10. Let C be an [n, M]-code with distance d = 2k. Then C can correct k − 1
errors and simultaneously detect k errors.

Proof. By the previous proposition, C can correct up to

⌊(d − 1)/2⌋ = ⌊(2k − 1)/2⌋ = ⌊k − 1/2⌋ = k − 1

errors. Since the spheres around code words have radius k − 1, any pattern of k errors
cannot take a code word into a word contained in some sphere around another code word.
Otherwise, the code words at the centers of these two spheres would be at distance at most
k + k − 1 = 2k − 1, which is impossible since d = 2k. Hence, a received word obtained from
a code word by introducing k errors cannot lie in any code word sphere, and the decoder can
detect the occurrence of errors. The decoder cannot detect k + 1 errors in general, since a
vector at distance k + 1 from a given code word may be in the sphere of another code word,
and the decoder would erroneously correct such a vector to the code word at the center of
the second sphere.

Example 21. Decoding rule.

Consider the code C = {c1, c2, c3, c4} where

c1 = (000000), c2 = (101100), c3 = (010111), c4 = (111011).

C has distance d = 3, and hence can correct 1 error. The set S of all possible words
over the alphabet {0, 1} consists of all possible binary 6-tuples; hence |S| = 64. Let us now
construct the four spheres of radius 1 about each code word:

Sc1 = {(000000), (100000), (010000), (001000), (000100), (000010), (000001)}
Sc2 = {(101100), (001100), (111100), (100100), (101000), (101110), (101101)}
Sc3 = {(010111), (110111), (000111), (011111), (010011), (010101), (010110)}

Sc4 = {(111011), (011011), (101011), (110011), (111111), (111001), (111010)}

These spheres cover 28 of the 64 6-tuples in S. Let S′ be the set of 6-tuples not in any
sphere; |S′| = 64 − 28 = 36.
Suppose the decoder receives r = (000111). The distance to each code word is computed:

d(c1, r) = 3, d(c2, r) = 4, d(c3, r) = 1, d(c4, r) = 4.

Since there is a unique minimum distance, r is decoded to c3. Notice that r lies in the sphere
Sc3.
To evaluate the reliability of the nearest neighbor decoding strategy we have to compute
the probability that a code word c sent over the channel is decoded correctly. Let us consider
an [n, M]-code C with distance d over an alphabet A with q symbols, |A| = q. Assume a
probability of error p such that 0 ≤ p ≤ (q − 1)/q.
When code word c is sent, the received n-tuple will be decoded correctly if it is inside the
sphere Sc of radius e = ⌊(d − 1)/2⌋ about c. The probability of this event is

Σ_{r∈Sc} P(r, c) = Σ_{i=0}^{e} (n choose i) p^i (1 − p)^{n−i}.

Indeed, the probability of receiving an n-tuple with i positions in error is (n choose i) p^i (1 − p)^{n−i},
and i takes values in the range from 0 (no errors) to a maximum of e errors. This expression
gives a lower bound on the probability that a transmitted code word is correctly decoded.

7.11 The Hamming Bound


In the previous section we established that the ability of a code to detect and correct errors
is determined by the distance of the code. To construct a binary code with k information
bits and r redundancy bits we have to select a subset of n-tuples with n = k + r so that the
minimum distance between any pair of code words is d.
Now we examine the question of the minimum number of redundancy bits necessary to
construct a code able to correct any single error, the so-called Hamming bound.
To establish this bound we make several observations:
(i) To construct an [n, M] block code means to select M = 2^k n-tuples as code words out of
the 2^n possible n-tuples.
(ii) To correct all 1-bit errors, the spheres of radius one around the M code words must not
intersect each other.
(iii) A sphere of radius one around an n-tuple consists of (n + 1) n-tuples: in addition to the
center of the sphere, there are n n-tuples at distance one from the center, one for each bit
position; see also the first example in Section 7.10.

M (n + 1) 2n .
This relation says that the total number of n-tuples in the M spheres of radius one cannot
exceed the total number on n-tuples. This can be written as:

155
2k (k + r + 1) 2(k+r) .
or

2r r k + 1.
For example, if k = 15, the minimum value of r is 5.
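The bound is easy to explore numerically; a sketch that finds the minimum r for a given number
of information bits k:

    def min_redundancy(k):
        """Smallest r with 2**r >= k + r + 1 (the single-error Hamming bound)."""
        r = 1
        while 2 ** r < k + r + 1:
            r += 1
        return r

    for k in (4, 11, 15, 26):
        print(f"k = {k:2d}  ->  r = {min_redundancy(k)}")
    # k = 15 gives r = 5, as stated above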

8 Quantum Algorithms

8.1 Introduction to Quantum Algorithms

8.2 Quantum Fourier Transform

8.3 Quantum Phase Estimation

8.4 Order Finding

8.5 Quantum Algorithms for Integer Factoring

8.6 The Hidden Subgroup Problem

8.7 Quantum Search Algorithms

8.8 Quantum Simulation

9 Quantum Information Theory

9.1 Introduction to Quantum Information Theory

9.2 von Neumann's Entropy

9.3 Source and Channel Coding

9.4 Quantum Channel Capacity

9.5 Quantum Cryptography

9.6 Quantum Codes

10 Appendix I: Algebraic Structures
The treatment in this section follows closely [17]. An algebraic structure consists of a set of
elements, R, and one or more binary laws of composition (operations), say ∘ and ∗, such that the
set is closed under these laws of composition: if a, b ∈ R then a ∘ b ∈ R and a ∗ b ∈ R.

10.1 Commutative Rings, Integral Domains, Fields

Definition 23. Commutative Ring. Let R be a set of elements R = {a, b, c, . . .} and let the two
binary operations be + (addition) and · (multiplication), with the following properties:
(1) Closure: if a, b ∈ R then the sum and the product are also in R: (a + b) ∈ R and (a · b) ∈ R.
(2) Uniqueness: if {a, a′, b, b′} ⊂ R and a = a′ and b = b′, then:

a + b = a′ + b′ and a · b = a′ · b′.

(3) Commutative law: ∀ a, b ∈ R

a + b = b + a and a · b = b · a.

(4) Associative laws: ∀ a, b, c ∈ R

a + (b + c) = (a + b) + c and a · (b · c) = (a · b) · c.

(5) Distributive law: ∀ a, b, c ∈ R

a · (b + c) = a · b + a · c.

(6) Zero element: There is an additive zero element 0 ∈ R such that:

∀ a ∈ R, a + 0 = a.

(7) Unity element: There is a multiplicative unity 1 ∈ R, 1 ≠ 0, such that:

∀ a ∈ R, a · 1 = a.

(8) Additive inverse: ∀ a ∈ R the equation a + x = 0 has a solution x ∈ R.

Denition 24. An integral domain D is a commutative ring in which the following postulate
holds:
(9) Cancellation law: if c = 0 and c a = c b then a = b.

Example 22. Integral Domain. The set of integers, Z = {. . . 3, 2, 1, 0, +1, +2, +3, . . .}
is an integral domain.

159
Denition 25. A subdomain of an integral domain D is a subset of D which is also an
integral domain for the same operations of addition and multiplications.

Denition 26. A eld F is an integral domain where each element except 0 has a multiplica-
tive inverse.

a F, a = 0, a1 and a1 a = 1.

A subdomain F  of a eld F which is also a eld is called a subeld. In this case F is also
called an extension of the eld F  .
Q, R, and C, the sets of rational numbers, real numbers, and complex numbers respec-
tively, equipped with the obvious operations of addition and multiplication are examples of
elds. Rational numbers are contained inside the real numbers as a subeld and the real
numbers are contained inside the complex numbers as a subeld.
The rst two are ordered elds, the last two are complete elds (we do not dene these
concepts, the reader should consult [17]). The third is an algebraically closed eld. This
means that any polynomial equation with coecients in the eld has solutions in the eld.
Q and R are not algebraically closed since the equation x2 + 1 = 0 has no solutions among
the real numbers. (The property complete is necessary for the denition and the study of
the concept of continuity of functions f (x) when x varies in the eld.)

10.2 Complex Numbers

The equation x^2 = -1 has no root among the real numbers. An imaginary number i was
invented satisfying the equality i^2 = -1. The complex numbers form the smallest field which
contains the field of real numbers and the imaginary number i. It has the property that any
polynomial equation of degree n has n solutions (the fundamental theorem of algebra).
A complex number can be regarded as a pair (x, y) of real numbers, with x called the real
component and y called the imaginary component. A complex number is written as x + iy.
Addition and multiplication of complex numbers are defined as follows:

(x + iy) + (x' + iy') = (x + x') + i(y + y')

(x + iy) · (x' + iy') = (x x' - y y') + i(x y' + y x')

Clearly, the complex numbers form a field.

Complex Plane. Complex numbers can be identified one-to-one with the points of a Cartesian
plane. z = x + iy is mapped onto a point P(x, y) in the plane, where x = Real(z) is the
abscissa and y = Imaginary(z) is the ordinate.

Definition 27. Polar coordinates (r, θ) uniquely determine any point P(x, y), except the origin,
in the plane and, therefore, they uniquely determine a complex number z different from zero.
All pairs (0, θ) represent the origin or, equivalently, the complex number zero. r is called the
absolute value and θ the argument of z:

r ≡ |z| = (x^2 + y^2)^(1/2).

arg(z) = arctan(y/x) = tan^(-1)(y/x).

Thus:

x = r cos(θ), y = r sin(θ).

z = r cos θ + i r sin θ = r(cos θ + i sin θ).
One introduces the imaginary powers of the (famous) Euler number e by

cos θ + i sin θ = e^(iθ),

and notices the consistency of all familiar algebraic operations involving e if we put:

e^(x+iy) := e^x e^(iy).

(Euler's number e = 2.71828... is the only positive real number with the property that the
function f(x) = e^x is the same as its derivative.)
It is clear that:
(i) A complex number z can be written as:

z = r e^(iθ)

(ii) Euler's formulae:

sin(θ) = (e^(iθ) - e^(-iθ)) / (2i)

cos(θ) = (e^(iθ) + e^(-iθ)) / 2.

(iii) Moivre's formulae give the absolute value and the argument of a product of complex
numbers:

|z z'| = |z| · |z'|.

arg(z z') = arg(z) + arg(z').

Proof:

z = r(cos θ + i sin θ)

z' = r'(cos θ' + i sin θ')

z z' = r r' [(cos θ cos θ' - sin θ sin θ') + i (cos θ sin θ' + sin θ cos θ')]

Thus:

z z' = r r' [cos(θ + θ') + i sin(θ + θ')].

(iv) The absolute value of a sum of complex numbers is smaller than or equal to the sum of the
absolute values:

|z + z'| ≤ |z| + |z'|.

(v) The absolute value of a complex number is a positive real number: |z| > 0 unless z = 0,
since only |0| = 0.
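These relations are easy to verify numerically; a short sketch using Python's cmath module
(our own illustration, with arbitrarily chosen values):

    import cmath

    z = complex(3, 4)                  # z = 3 + 4i
    r, theta = abs(z), cmath.phase(z)  # polar form: r = |z|, theta = arg(z)
    print(cmath.isclose(z, r * cmath.exp(1j * theta)))  # True: z = r e^(i theta)

    w = complex(1, -2)
    print(abs(z * w), abs(z) * abs(w))                          # equal: |z w| = |z| |w|
    print(cmath.phase(z * w), cmath.phase(z) + cmath.phase(w))  # equal (mod 2 pi)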

Theorem 3. Fundamental theorem of algebra: Every polynomial p(z) of positive degree m
with complex coefficients has a complex root and can be written as:

p(z) = c(z - z1)(z - z2) · · · (z - zm).

Definition 28. Given a complex number z = x + iy, its conjugate is z* = x - iy.

The conjugation has the following properties:

(z1 + z2)* = z1* + z2*

(z1 z2)* = z1* z2*

(z*)* = z

|z|^2 = z z*

z^(-1) = z* / |z|^2

10.3 Abstract Groups and Isomorphisms

Definition 29. A group G is a system of elements closed under a single binary operation ◦,
called multiplication, with three properties:
(i) Associative law: ∀ a, b, c ∈ G, a ◦ (b ◦ c) = (a ◦ b) ◦ c.
(ii) Identity element: There is an identity element e ∈ G such that ∀ a ∈ G, a ◦ e = e ◦ a = a.
(iii) Inverse law: ∀ a ∈ G, ∃ a^(-1) such that a ◦ a^(-1) = a^(-1) ◦ a = e.
It is not hard to check that if a ∈ G the inverse of a, a^(-1), is unique, and that there is only
one identity element in G.
Definition 30. A group G whose operation satisfies the commutative law (i.e. a ◦ b = b ◦ a)
is called commutative.
We can now restate the definition of a field:
Definition 31. A field is a set F of elements closed under two uniquely defined binary operations,
addition and multiplication, such that:
(i) under addition F is an Abelian group with identity 0.
(ii) under multiplication, the nonzero elements form an Abelian group.
(iii) the distributive law holds: a(b + c) = ab + ac.

Definition 32. The power of an element. Given a ∈ G and m > 0 a positive integer, then

a^m = a ◦ a ◦ . . . ◦ a (to m factors), a^0 = e, a^(-m) = (a^(-1))^m.

The following holds:

a^r ◦ a^s = a^(r+s).

Definition 33. If a ∈ G, the order of a is the least positive integer m such that a^m = e. If no
positive m exists then a has order infinity. The group G is cyclic if it contains one element x
whose powers exhaust G. This element is called the generator of the group.

Figure 43: If an isomorphism between two groups G and G' exists then given two elements
(a, b) ∈ G and their transforms (a', b') ∈ G', we can: (i) construct the product a ◦ b and then
transform the product a ◦ b to (a ◦ b)' ∈ G', or (ii) first transform the elements into (a', b') ∈ G'
and then construct the product a' ◦ b' ∈ G'.

Definition 34. Given two groups G and G', an isomorphism is a one-to-one correspondence
(denoted by ↔) between their elements that preserves the group multiplication (Figure 43).
This means that if a, b ∈ G and a', b' ∈ G' and

(a ↔ a') and (b ↔ b')

then

(a ◦ b) ↔ (a' ◦ b').

Exercise 14. Prove that the relation "G is isomorphic to G'" is an equivalence relation: it
is reflexive, symmetric, and transitive.

Exercise 15. Prove that under an isomorphism between two groups the identity elements
correspond and the inverses of corresponding elements correspond.

Definition 35. A non-void subset S of a group G is a subgroup iff (if and only if) it is
closed under the composition law and under the operation of taking an inverse. The symbol ⇒
means implies.
(i) a, b ∈ S ⇒ (a ◦ b) ∈ S
(ii) a ∈ S ⇒ a^(-1) ∈ S.
Observe that the intersection S ∩ T of two subgroups S and T of a group G is a subgroup of
G:

(S subgroup of G) and (T subgroup of G) ⇒ (S ∩ T) subgroup of G.

10.4 Symmetry in a Plane

Figure 44: A square in a plane has four rotational symmetries and four reflections. The vertices
of the square are labelled 1, 2, 3, and 4 and its center is O. There are four axes of symmetry: V
(vertical), H (horizontal), and D and D' (the two diagonals).

Consider a square in a plane, see Figure 44. There are eight transformations that preserve
distances:
(i) Four counter-clockwise rotations around its center O by multiples of 90°:
R (90°): (1 → 2, 2 → 3, 3 → 4, 4 → 1)
R' (180°): (1 → 3, 2 → 4, 3 → 1, 4 → 2)
R'' (270°): (1 → 4, 2 → 1, 3 → 2, 4 → 3)
I (360°): (1 → 1, 2 → 2, 3 → 3, 4 → 4)
(ii) Four reflections:
H: reflection in the horizontal axis through O: (1 → 4, 2 → 3, 3 → 2, 4 → 1)
V: reflection in the vertical axis through O: (1 → 2, 2 → 1, 3 → 4, 4 → 3)
D: reflection in the diagonal through quadrants I and III: (1 → 1, 2 → 4, 3 → 3, 4 → 2)
D': reflection in the diagonal through quadrants II and IV: (1 → 3, 2 → 2, 3 → 1, 4 → 4)
We can compose symmetry operations, e.g., H R is obtained by first reflecting the square
in the horizontal axis, then rotating counter-clockwise by 90°. H sends vertex 1 into 4 and
R sends 4 into 1. Thus H R sends 1 into 1. On the other hand, D sends 1 into 1. After
checking all vertices we can see that:

H R = D.

Following a similar argument it follows that:

R H = D'.

Thus we see that R H ≠ H R; the multiplication is not commutative.
Exercise 16. Prove that the group of integers modulo four with the usual addition operation
is isomorphic with the group of rotational symmetries of the square.
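These compositions can be checked mechanically. A small sketch in Python (our own
illustration) encodes each symmetry as a tuple p, where p[v - 1] is the image of vertex v:

    R  = (2, 3, 4, 1)   # rotation by 90 degrees
    H  = (4, 3, 2, 1)   # reflection in the horizontal axis
    D  = (1, 4, 3, 2)   # reflection in the diagonal through quadrants I and III
    D2 = (3, 2, 1, 4)   # reflection in the diagonal through quadrants II and IV (D')

    def compose(first, then):
        # Apply 'first', then 'then': vertex v goes to then[first[v - 1] - 1].
        return tuple(then[first[v - 1] - 1] for v in (1, 2, 3, 4))

    print(compose(H, R) == D)              # True:  H R = D
    print(compose(R, H) == D2)             # True:  R H = D'
    print(compose(H, R) == compose(R, H))  # False: the group is not commutative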

10.5 Groups of Transformations

Let S be a set. Consider a rule φ which assigns to an element p ∈ S a unique image element
φ(p) ∈ T:

φ : S → T.

The transformation φ is similar to a function defined on elements of S with values in T. S is
the domain and T is the codomain of φ. The set φ(S) of all images under the transformation φ
of elements in S is called the range of φ; it may comprise only part of the codomain T.

Example 23. Transformations.

(i) The function f(x) = e^(2πix) is a transformation of the set R of all real numbers into the set
C of all complex numbers. The range of this transformation is a circle with unit radius.
(ii) The function g(z) = |z| is a transformation of the set C of complex numbers into the set R
of real numbers. The range of the transformation consists of all nonnegative real numbers.

Definition 36. A transformation φ : S → T is:
onto - if the codomain T equals the range of φ. This means that every q ∈ T is the image
q = φ(p) of at least one p ∈ S.
one-to-one - from S into T if φ carries distinct elements of S into distinct elements of T, so
that each q ∈ T is the image of at most one p ∈ S.

Definition 37. Equality of two transformations. Given two transformations φ and φ' with
the same domain S and codomain T,

φ : S → T

φ' : S → T,

then

φ = φ'

means that the two transformations have the same effect for every p ∈ S:

φ(p) = φ'(p).

Definition 38. Product of two transformations. Given two transformations φ and ψ such
that:

φ : S → T

ψ : T → U,

the product ψφ is defined as the result of performing them in succession: first φ, then ψ,
provided that the codomain of φ is the domain of ψ.
The product ψφ is the transformation of S into U given by the equation:

(ψφ)(p) = ψ(φ(p)).

166
11 Appendix II: Linear Algebra

11.1 Vectors in a Plane
Vectors in a plane are quantities that have a magnitude and a direction; they are mathematical
abstractions used to describe physical quantities such as forces, velocities, and accelerations.
Vectors in a plane are represented by pairs of real numbers. The vector α = (a1, a2) is represented
by an arrow with origin at (0, 0) and terminus at (a1, a2).
Vector sums and scalar products are computed coordinate by coordinate. Vector addition
and multiplication by a scalar are defined as follows:
(1) Vector addition: Given α = (a1, a2) and β = (b1, b2) then

α + β = (a1 + b1, a2 + b2).

(2) Multiplication by a scalar: Given α = (a1, a2) and a real c ∈ R then

c α = (c a1, c a2).

From (1) and (2) it is easy to derive various laws of vector algebra. Given two vectors α
and β and a scalar c ∈ R:
(3) Commutativity:

α + β = β + α.

(4) Distributivity of scalar multiplication:

c (α + β) = c α + c β.

The vectors in a plane as defined above can be generalized in two ways. First, the number
of dimensions can be arbitrary instead of two. Second, the components of vectors can be the
elements of any field instead of real numbers.

11.2 Vector Spaces

Definition 39. A vector space assumes three objects:

1. An Abelian group (V, +) whose elements are called vectors and whose binary operation
+ is called addition,
2. A field F of numbers (in our case either R, the real numbers, or C, the complex
numbers), whose elements are called scalars, and
3. A multiplication-with-scalars operation, denoted by ·, which associates to any scalar
c ∈ F and vector α ∈ V a new vector c · α with the following properties:

c · (α + β) = c · α + c · β,
(c + c') · α = c · α + c' · α,

and

(c c') · α = c · (c' · α), 1 · α = α.

We abbreviate a vector space as (V, +, ·).

Example 24. Vector spaces:
a) The field is R. Then V = R^n is the n-dimensional canonical real vector space.
b) The field is C. Then V = C^n is the n-dimensional canonical complex vector space.
We use Dirac's notation for vectors [26]:

| v1 ⟩ = (z1, z2, . . . , zn)^T and | v2 ⟩ = (z1', z2', . . . , zn')^T

with c, z1, z2, . . . , zn, z1', z2', . . . , zn' complex numbers. (Here ^T denotes the transpose;
the kets are column vectors.) Then addition and multiplication by a scalar c are:

| v1 ⟩ + | v2 ⟩ = (z1 + z1', z2 + z2', . . . , zn + zn')^T

c | v1 ⟩ = (c z1, c z2, . . . , c zn)^T

c) The field F = R or F = C and V = C_F(I), where I is an interval (a subset of R) and
C_F(I) denotes the set of continuous functions defined on I with values in F = R or F = C,
see Figure 45.
d) The field F = R or F = C and V = L2_F(I) (see footnote 23), see Figure 45.

11.3 Linear Independence


Given the vectors α1, α2, . . . , αn ∈ V and the scalars c1, c2, . . . , cn ∈ F, the set of all linear
combinations is:

c1 α1 + c2 α2 + . . . + cn αn.

Definition 40. Given n vectors α1, α2, . . . , αn ∈ V, they are linearly independent iff for all
scalars c1, c2, . . . , cn ∈ F:

c1 α1 + c2 α2 + . . . + cn αn = 0 ⇒ c1 = c2 = . . . = cn = 0.

Footnote 23: Here is the definition of L2_F(I). The elements of L2_F(I) are equivalence classes
of functions f : I → F such that ∫_I |f(x)|^2 dx < ∞, with the equivalence relation defined below.
|f(x)| means the modulus of a complex-valued function. First, note that given two functions
f(x), g(x) as above, ∫_I f(x) g(x)* dx < ∞. Here g(x)* is the complex conjugate of the
complex-valued function g(x). We say that two functions as above, f1(x) and f2(x), are
equivalent if for any other function g(x) as above one has

∫_I f1(x) g(x)* dx = ∫_I f2(x) g(x)* dx.

168
CF(I)
II F

LF2(I)
I
I F

Figure 45: Vector spaces: (a) V = CF (I), the set of continuous functions dened on I with
value in F and (b) V = L2F (I) functions so that I | f (x) |2 dx < , .

Vectors that are not linearly independent are called linearly dependent.

Example 25. Linear dependency. Show that the following three vectors in C^2 are linearly
dependent:

| v1 ⟩ = (1, 1)^T, | v2 ⟩ = (1, 2)^T, | v3 ⟩ = (2, 1)^T.

Proof: We look for three complex numbers a, b, c ∈ C, not all zero, such that:

a | v1 ⟩ + b | v2 ⟩ + c | v3 ⟩ = 0.

This requires the complex numbers a, b, c to satisfy two linear equations:

a + b + 2c = 0 and a + 2b + c = 0.

If we subtract the two equations we get b - c = 0, or b = c. Substituting c = b in the original
equations we get a + 3b = 0, or a = -3b. Thus any choice b = c = t, a = -3t with t ≠ 0 gives a
non-trivial linear combination equal to zero; for example

-3 | v1 ⟩ + | v2 ⟩ + | v3 ⟩ = 0.

The three vectors are therefore linearly dependent (as any three vectors in the two-dimensional
space C^2 must be; see Proposition 14 below).
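A numerical cross-check (Python with numpy; our own illustration, not part of the notes):

    import numpy as np

    # The three vectors of Example 25 as the columns of a 2 x 3 matrix.
    A = np.array([[1, 1, 2],
                  [1, 2, 1]])
    print(np.linalg.matrix_rank(A))          # 2 < 3 columns: the vectors are dependent
    print(-3 * A[:, 0] + A[:, 1] + A[:, 2])  # [0 0], the dependence found above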

Definition 41. A subspace S of a vector space V is a subset of V which is itself a vector
space with respect to the operations of addition and scalar multiplication in V.

Example 26. Subspaces
(i) The set of polynomials of degree at most m is a subspace of the vector space of all
polynomials.
(ii) The set of all continuous functions f(x) defined for 0 ≤ x ≤ 2π is a subspace of the linear
space of all functions defined on the same domain.

Proposition 11. The set of all linear combinations of any set of vectors of a vector space V
is a subspace of V.
Given c', c1, c2, . . . , cm, c1', c2', . . . , cm' ∈ F, the following two identities allow us to prove
the proposition:
(13) (c1 α1 + c2 α2 + . . . + cm αm) + (c1' α1 + c2' α2 + . . . + cm' αm) =
(c1 + c1') α1 + (c2 + c2') α2 + . . . + (cm + cm') αm
(14) c' (c1 α1 + c2 α2 + . . . + cm αm) = (c' c1) α1 + (c' c2) α2 + . . . + (c' cm) αm

The subspace consisting of all linear combinations of a set of vectors of V is the smallest
subspace containing all the given vectors. The set of vectors spans the subspace.

Definition 42. A linearly independent subset of vectors which spans the whole space is called
a basis of the vector space.
A vector space is finite dimensional iff it has a finite basis.

Definition 43. The minimum number of vectors in any basis of a finite-dimensional vector
space V is called the dimension of the vector space; it is denoted by dim(V).

Example 27. Spanning Set. The ordinary space V3 (R) can be spanned by three vectors:
(1, 0, 0), (0, 1, 0), (0, 0, 1).

Example 28. Spanning Set. Let us consider C^2, the vector space of two-dimensional complex
vectors. We use Dirac's notation for such vectors. Prove that the following two vectors form
a spanning set for C^2:

| v1 ⟩ = (1, 0)^T and | v2 ⟩ = (0, 1)^T.

Proof:
Let a, b ∈ C be two arbitrary complex numbers and let | v ⟩ = (a, b)^T be a two-dimensional
vector in C^2. Then:

| v ⟩ = (a, b)^T = a (1, 0)^T + b (0, 1)^T = (a, 0)^T + (0, b)^T = (a + 0, 0 + b)^T = a | v1 ⟩ + b | v2 ⟩.

Example 29. Spanning Set. Prove that the following two vectors form a spanning set for
C^2:

| v1 ⟩ = (1/√2) (1, 1)^T and | v2 ⟩ = (1/√2) (1, -1)^T.

Proof:

| v1 ⟩ + | v2 ⟩ = (1/√2) (1, 1)^T + (1/√2) (1, -1)^T = (1/√2) (2, 0)^T = √2 (1, 0)^T.

Similarly:

| v1 ⟩ - | v2 ⟩ = (1/√2) (1, 1)^T - (1/√2) (1, -1)^T = (1/√2) (0, 2)^T = √2 (0, 1)^T.

We want to show that any | v ⟩ in C^2 can be expressed as:

| v ⟩ = (a, b)^T = ((a + b)/√2) | v1 ⟩ + ((a - b)/√2) | v2 ⟩.

Indeed:

| v ⟩ = (a/√2) [ | v1 ⟩ + | v2 ⟩ ] + (b/√2) [ | v1 ⟩ - | v2 ⟩ ] = (a/√2) √2 (1, 0)^T + (b/√2) √2 (0, 1)^T = (a, b)^T.

Exercise 17. Show that any vector x = (x1, x2, . . . , xn) ∈ Vn(F) is a linear combination of
the n unit vectors:

ε1 = (1, 0, 0, . . . , 0), ε2 = (0, 1, 0, . . . , 0), . . . , εn = (0, 0, 0, . . . , 1).

Proposition 12. Let n vectors span a vector space V containing r linearly independent vectors.
Then n ≥ r.

Proposition 13. All bases of any finite-dimensional vector space V have the same finite
number of elements.

Proposition 14. If a vector space V has dimension n, then (i) any set of (n + 1) elements
of V is linearly dependent, and (ii) no set of (n - 1) elements can span V.

Exercise 18. Prove the previous propositions.

11.4 Scalar Product, Norm, and Hilbert Spaces

Definition 44. Let V and W be two vector spaces over the same field F. Consider variable
vectors ξ, ξ' ∈ V and η, η' ∈ W and scalars a, b, c, d ∈ F. The function f(ξ, η) with values in
F and with the two properties:
(i) f(a ξ + b ξ', η) = a f(ξ, η) + b f(ξ', η) and
(ii) f(ξ, c η + d η') = c f(ξ, η) + d f(ξ, η')
is a bilinear function.

Definition 45. The scalar product in a vector space V over a field F is a bilinear map
⟨·, ·⟩ : V × V → F which in addition to the bilinearity property satisfies (see footnote 24):

⟨α, β⟩ = ⟨β, α⟩*,
⟨α, α⟩ ≥ 0, and
⟨α, α⟩ = 0 iff α = 0.

Observation 1. A scalar product permits one to measure the length of a vector and the angle
between two vectors.

Definition 46. A norm is a nonnegative function || · || : V → [0, ∞) which satisfies the
following properties:

||α + β|| ≤ ||α|| + ||β||,
||c α|| = |c| ||α||,
||α|| = 0 ⇒ α = 0.

A norm, and therefore a scalar product, permits one to define convergence. However, not
every norm comes from a scalar product; there is a simple condition to recognize when a norm
comes from a scalar product, and when this happens the scalar product is provided by a simple
formula. For example, if F = R, the scalar product of α and β should be given by:

⟨α, β⟩ = (||α + β||^2 - ||α||^2 - ||β||^2) / 2,

with ||α|| = √⟨α, α⟩ the length of the vector α.

Observation 2. Any scalar product ⟨α, β⟩ induces a norm.

Definition 47. A pair consisting of a vector space V and a scalar product ⟨α, β⟩ is called
a Hilbert space if V is complete with respect to the norm induced from the scalar product.
Complete means that any Cauchy sequence is convergent.

Observation 3. If a vector space is finite dimensional it is automatically complete, hence
Hilbert (see footnote 25). If not, one can add elements and complete it to a Hilbert space
(see footnote 26), see Figure 46.

Footnote 24: We assume that F = C and denote by ⟨α, β⟩* the complex conjugate of the
complex number ⟨α, β⟩.

Footnote 25: A vector space which is complete and equipped with a norm is called a Banach
space. A Hilbert space is a space with a scalar product which is Banach with respect to the
induced norm. A finite dimensional vector space equipped with a norm is always a Banach
space. An infinite dimensional vector space can be completed to a Banach space.

Footnote 26: The procedure is similar to the completion of the rational numbers to the real
numbers.

Figure 46: A finite dimensional vector space with a scalar product is a Hilbert space. A finite
dimensional vector space with a norm is a Banach space.


Example 30. Scalar product. Consider < f, g >:= I f (x)g(x)dx. This formula denes a
scalar product on C(I), but with this scalar product C(I) is not a Hilbert space. C(I) is the
set of continous functions on the domain I. If one completes it, then one obtains L2 (I) 27 .

Denition 48. Given a vector space V and two vectors and they are said to be orthogonal
( ) if and only i their scalar product is zero,   = 0.

Observation 4. The orthogonality relation is symmetric: α ⊥ β ⇒ β ⊥ α.

Observation 5. If a vector is orthogonal to the vectors α1, α2, . . . , αn then it is orthogonal to
every vector in the subspace spanned by α1, α2, . . . , αn.

Definition 49. The vectors α1, α2, . . . , αn are normal orthogonal (orthonormal) when:

(i) ||αi|| = 1, ∀ i ∈ {1, n}.
(ii) αi ⊥ αj if i ≠ j.

Example 31. Normal orthogonal basis. The vectors

ε1 = (1, 0, 0, . . . , 0), ε2 = (0, 1, 0, . . . , 0), . . . , εn = (0, 0, 0, . . . , 1)

have unit length and are mutually orthogonal, thus they form a normal orthogonal basis.

Observation 6. A basis in a vector space defines a unique scalar product which makes this
basis orthonormal.

Definition 50. Given a vector space of finite dimension equipped with a scalar product, one
can construct another basis which is orthonormal (the Gram-Schmidt orthogonalization
procedure).

Footnote 27: If I is a closed interval then the formula ||f|| := sup_{x ∈ I} |f(x)| defines a norm.
When equipped with this norm, C(I) is a Banach space. This norm does not come from any
scalar product.
11.5 Euclidean Vector Space

Definition 51. A Euclidean vector space E is a vector space over the set of real numbers R
with a scalar product.
It is easy to see that the length, or the norm, |α| of a vector in a Euclidean vector space
E has the following properties:
(i) |c α| = |c| |α|.
(ii) |α| > 0 unless α = 0.
(iii) |⟨α, β⟩| ≤ |α| |β| (Schwarz inequality).
(iv) |α + β| ≤ |α| + |β| (triangle inequality).

Observation 7. The distance between two vectors α, β in a Euclidean vector space E is

|α - β|.

Observation 8. The Euclidean distance has the metric properties of ordinary distance:
(1) |α - α| = 0 and |α - β| > 0 if α ≠ β.
(2) |α - β| = |β - α| (symmetry).
(3) |α - β| + |β - γ| ≥ |α - γ|.

Observation 9. The non-zero orthogonal vectors α1, α2, . . . , αm of a Euclidean vector space
E are linearly independent.
Proof: Assume that they are not; then there are c1, c2, . . . , cm ∈ R, not all zero, such that:

c1 α1 + c2 α2 + . . . + cm αm = 0.

Then for k = 1, 2, . . . , m

0 = ⟨0, αk⟩ = c1 ⟨α1, αk⟩ + c2 ⟨α2, αk⟩ + . . . + cm ⟨αm, αk⟩ = ck ⟨αk, αk⟩.

The last equality comes from the orthogonality assumption. But αk ≠ 0, thus ⟨αk, αk⟩ > 0.
This implies that ck = 0 for every k, a contradiction.

Corollary 1. Normal orthogonal vectors spanning E form a normal orthogonal basis.

Lemma 1. Gram-Schmidt orthogonalization process. Given a finite-dimensional Euclidean
space E and a finite sequence of independent vectors α1, α2, . . . , αn ∈ E, we can form an
orthogonal sequence of non-zero vectors β1, β2, . . . , βn which spans the same subspace of E
as α1, α2, . . . , αn, as follows:

βi = αi - Σ_{k<i} ci,k βk,

with ci,k = ⟨αi, βk⟩ / ⟨βk, βk⟩ ∈ R.

Observation 10. Given a Euclidean vector space E and an m-dimensional subspace S ⊆ E
with a normal orthogonal basis α1, α2, . . . , αm. If β is a vector not in S, β ∉ S, the Gram-
Schmidt orthogonalization represents β as a sum β = γ + δ with γ ∈ S and δ perpendicular
to every vector in S. The vector γ is called the orthogonal projection of β on S.
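A minimal numerical sketch of the Gram-Schmidt process (Python with numpy; the function
and the test vectors are our own illustration, not taken from [17]):

    import numpy as np

    def gram_schmidt(vectors):
        # beta_i = alpha_i - sum_{k<i} c_{i,k} beta_k,
        # with c_{i,k} = <alpha_i, beta_k> / <beta_k, beta_k>.
        basis = []
        for a in vectors:
            b = a.astype(float)
            for q in basis:
                b = b - (a @ q) / (q @ q) * q
            basis.append(b)
        return basis

    b1, b2 = gram_schmidt([np.array([1.0, 1.0]), np.array([1.0, 0.0])])
    print(b1, b2, b1 @ b2)  # [1. 1.] [0.5 -0.5] 0.0 -- orthogonal, same span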

Observation 11. The scalar product of two vectors in a Euclidean vector space E is expressed
by a bilinear relation:

⟨α, β⟩ = ⟨ Σ_i ci αi, Σ_k ck' αk ⟩ = Σ_{i,k} ci ck' ⟨αi, αk⟩.

Here α1, α2, . . . , αn is a basis of E and the two vectors can be expressed as:

α = c1 α1 + c2 α2 + . . . + cn αn

β = c1' α1 + c2' α2 + . . . + cn' αn.

11.6 Linear Operators and Matrices


A linear operator between two vector spaces V and W is defined as any function, or mapping,
from V to W, f : V → W, which is linear in its inputs:

f( Σ_k ak vk ) = Σ_k ak f(vk).

Transformations between vector spaces can be described using either linear operators or
matrix representations. The matrix A = [Ai,j], 1 ≤ i ≤ n, 1 ≤ j ≤ m, in which the coefficients
are arranged in an n × m table, is the matrix representation of the linear operator f.
Traditionally the same notation, either f or A, is used for both the linear operator and for
its matrix representation. We distinguish between the two for illustrative purposes only.
The linear transformation is thus represented as the product of the matrix A with a
vector v = (v1, v2, . . . , vm). In this section we introduce matrices and then discuss
operator functions.
Let us start with a field F and V and W vector spaces over F. F could be the set of real
numbers, F = R, or of complex numbers, F = C. Consider two vectors α, β ∈ V and two
scalars a, b ∈ F.

Definition 52. A linear transformation is a map f : V → W with the following properties:

(i) f(α + β) = f(α) + f(β)
(ii) f(a α) = a f(α),
or equivalently:

f(a α + b β) = a f(α) + b f(β).

If f and g are linear functions on V then the sum of the two linear functions is defined as:

(f + g)(α) = f(α) + g(α), ∀ α ∈ V(F).

If f is a linear function on V and c ∈ F then:

(c f)(α) = c f(α), ∀ α ∈ V(F).

Proposition 15. The set V* of all linear transformations of V is also a vector space over F,
under the operations f + g and c f defined above.
Hint: we have to verify that the axioms for a vector space hold for the two operations.

Definition 53. The vector space V* of all linear functions over V is called the
algebraic dual space of V.

Observation 12. Suppose that V is equipped with a scalar product, hence a norm; the linear
maps which are bounded (see footnote 28) form a linear subspace called the bounded dual. If V
has finite dimension any linear map is bounded with respect to any norm and therefore the
algebraic dual and the bounded dual are the same. We denote the dual of V by V* and mean
the algebraic dual in case there is no scalar product, and the bounded dual in case there is one.
In the case of finite dimensional vector spaces the notation V* for the dual does not create
any problem given the fact that both dual spaces of V are the same. Moreover, in this case V
and V* are isomorphic and any scalar product provides a precise isomorphism from V to V*.
If V is equipped with a scalar product and it is a Hilbert space then the bounded dual has
an induced scalar product and with respect to this scalar product it is also a Hilbert space.
The scalar product provides a precise isomorphism from V to V* (which means the bounded
dual) which is an isomorphism of Hilbert spaces, i.e. an isomorphism which identifies the
scalar products. This is the duality in linear algebra (in the class of finite dimensional vector
spaces as well as in the class of arbitrary Hilbert spaces).

Definition 54. A linear transformation f : V → W between vector spaces of finite dimension
induces a linear transformation f* : W* → V* called the adjoint of f.
Examples: matrices, the adjoint matrix.

Proposition 16. If an n-dimensional vector space Vn(F) has a finite basis α1, α2, . . . , αn, then
its dual space V* has a basis f1, f2, . . . , fn consisting of n linear functions fi, i ∈ {1, n}, with

fi(c1 α1 + c2 α2 + . . . + cn αn) = ci, i ∈ {1, n}.

The n linear functions are uniquely determined by:

fi(αj) = δi,j = { 0 if i ≠ j; 1 if i = j }.

Here δi,j are the Kronecker delta symbols.

Proposition 17. The dual V* of an n-dimensional vector space V has the same dimension
n as Vn(F):

dim(V) = dim(V*) = n.

The transformation T : V → V* which maps each vector α = Σ_{i=1}^{n} ci αi ∈ V into the
function Σ_{i=1}^{n} ci fi is an isomorphism of V onto V*. This isomorphism depends upon
the choice of the basis α1, α2, . . . , αn of V.

Footnote 28: i.e. there exists a constant K so that |f(α)| ≤ K ||α||.
Definition 55. A rectangular array of elements of a field F with m rows and n columns is
called a matrix:

    A = [ a11 a12 . . . a1n
          a21 a22 . . . a2n
          . . . . . . . . .
          am1 am2 . . . amn ]

Let α1 = (a11, a12, . . . , a1n), α2 = (a21, a22, . . . , a2n), . . . , αm = (am1, am2, . . . , amn) be
a set of vectors spanning a subspace of dimension at most m of a vector space Vn(F), called
the row space of the m × n matrix A.

Definition 56. The elementary row operations on a matrix are: (i) the interchange of any
two rows, (ii) multiplication of all the elements of a row by a constant c ∈ F, and (iii) addition
of any multiple of a row to any other row. Two matrices are row-equivalent if one is obtained
from the other by a finite sequence of row operations.

Exercise 19. Prove that row-equivalent matrices have the same row space.

11.7 Functions of Operators and Matrices


Given an operator or a matrix we can define functions of it. For example, given a positive
operator we can define its logarithm or its square root. Some important functions are: the
determinant, the characteristic function, and the trace of a matrix.

Definition 57. The determinant of an n × n matrix A = [aij] is the polynomial:

det(A) = |A| = Σ_σ sgn(σ) a(1, σ(1)) a(2, σ(2)) . . . a(n, σ(n)).

σ denotes one of the n! different permutations of the integers 1, 2, . . . , n. If σ is an even
permutation then sgn(σ) = +1, and sgn(σ) = -1 for an odd permutation.
The determinant can be written as:

det(A) = Ai1 ai1 + Ai2 ai2 + . . . + Ain ain.

Here Aij is a polynomial in the entries of the remaining rows of A called the cofactor of aij.
A cofactor contains only entries in the sub-matrix Mij (also called a minor) obtained from
A by eliminating row i and column j. The cofactor can be described as the partial derivative
∂|A| / ∂aij of |A|.

Observation 13. If we permute two rows of A then the sign of the determinant | A | changes.

Proposition 18. The determinant of the transpose of a matrix is equal to the determinant
of the original matrix:

|A^T| = |A|.

Proposition 19. If two rows of A are identical then:

|A| = 0.

Definition 58. A square (n × n) matrix is triangular if all entries below the diagonal are
zero.

Proposition 20. The determinant of a triangular matrix is the product of its diagonal
elements.

Observation 14. To compute the determinant |A| one should perform elementary row
operations on the matrix A and reduce it to a triangular form.
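A quick numerical illustration of Proposition 20 (Python with numpy; the matrix is our own
example):

    import numpy as np

    T = np.array([[2.0, 1.0, 3.0],
                  [0.0, 5.0, 4.0],
                  [0.0, 0.0, 0.5]])  # a triangular matrix
    print(np.linalg.det(T))          # 5.0 (up to rounding)
    print(np.prod(np.diag(T)))       # 5.0 = 2 * 5 * 0.5, the diagonal product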

Definition 59. The characteristic function of matrix A is:

c(λ) ≡ det |A - λI|.

Definition 60. The trace of matrix A is the sum of its diagonal elements:

tr(A) = Σ_i aii.

Given any two matrices A and B over F and a scalar c ∈ F it is easy to show that the
trace has the following properties:
(i) Cyclic: tr(AB) = tr(BA).
(ii) Linear: tr(A + B) = tr(A) + tr(B), tr(cA) = c tr(A).
(iii) Invariance under the unitary similarity transformation: tr(U A U†) = tr(U† U A) = tr(A).

Exercise 20. Prove that the trace has properties (i), (ii), and (iii).
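A numerical spot-check of the three properties (Python with numpy; the random matrices
are our own illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))            # cyclic
    print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))  # linear
    U, _ = np.linalg.qr(B)  # the Q factor of a QR decomposition is unitary
    print(np.isclose(np.trace(U @ A @ U.conj().T), np.trace(A)))   # invariance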

Denition 61. The trace of an operator A is the trace of the matrix representation of A.

11.8 Eigenvectors and Eigenvalues. Hermitian Operators


We restrict our discussion to vector spaces over the field of complex numbers, though the
definitions and the results apply to vector spaces over any field. To distinguish between
scalars and vectors we use Dirac's notation for vectors. One can formulate the results in
terms of linear operators or, equivalently, using the matrix representation of linear operators;
we choose to use operators. For example, A is an operator, | a ⟩ a vector in a vector space C^n,
and a a scalar (a complex number).

Definition 62. A non-zero vector | a ⟩ is an eigenvector and the scalar a is an eigenvalue of
the linear operator A if the following equation is satisfied:

A | a ⟩ = a | a ⟩.

The eigenvalues of the operator A are the solutions of the characteristic equation:

c(λ) = 0.

According to the fundamental theorem of algebra a polynomial over C has at least one
complex root. Thus every operator A has at least one eigenvalue.

Definition 63. The eigenspace corresponding to an eigenvalue a of the operator A is the set
of vectors which have eigenvalue a. It is a subspace of the vector space A operates on.

Definition 64. A diagonal representation of an operator A on the vector space C^n is

A = Σ_i ai | bi ⟩⟨ bi |

where the vectors | bi ⟩ form an orthonormal set of eigenvectors of A corresponding to the
eigenvalues ai.

Observation 15. An operator is diagonalizable if it has a diagonal representation.

Observation 16. When the eigenspace of an operator A is more than one dimensional we
say that it is degenerate.

Definition 65. Given a linear operator A there is a unique linear operator A† called the
adjoint, or the Hermitian conjugate, of A such that ∀ | a ⟩, | a' ⟩ ∈ C^n:

( | a ⟩, A | a' ⟩ ) = ( A† | a ⟩, | a' ⟩ ).

If | a ⟩ is a vector in the vector space C^n, by definition | a ⟩† = ⟨ a | and:

(A | a ⟩)† = ⟨ a | A†.

Definition 66. If A = A† then A is called a Hermitian, or self-adjoint, operator.

Observation 17. If A is the matrix representation of the linear operator A and we want
to construct the Hermitian conjugate A†, we first construct the complex conjugate matrix A*
and then take its transpose:

A† = (A*)^T.

Example 32. The Hermitian conjugate of the matrix

    A = [ 1 - 5i   1 + i
          1 + 3i     7i ]

is obtained by conjugating,

    A* = [ 1 + 5i   1 - i
           1 - 3i    -7i ]

and then transposing:

    A† = (A*)^T = [ 1 + 5i   1 - 3i
                    1 - i     -7i  ]
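The same computation in Python with numpy (our own illustration; .conj().T implements
the conjugate transpose):

    import numpy as np

    A = np.array([[1 - 5j, 1 + 1j],
                  [1 + 3j, 0 + 7j]])
    print(A.conj().T)
    # [[1.+5.j 1.-3.j]
    #  [1.-1.j 0.-7.j]]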

179
Definition 67. An operator A is normal iff:

A A† = A† A.

Observation 18. If A is Hermitian (self-adjoint) it is also normal.

Definition 68. A matrix U is unitary iff:

U† U = I

where I is the identity matrix:

    I = [ 1 0 0 . . . 0
          0 1 0 . . . 0
          0 0 1 . . . 0
          . . .
          0 0 0 . . . 1 ]

Observation 19. Unitary operators preserve the scalar (inner) product of vectors.
Let | α ⟩, | β ⟩ ∈ C^n. Then

( U | α ⟩, U | β ⟩ ) = ⟨ α | U† U | β ⟩ = ⟨ α | I | β ⟩ = ⟨ α | β ⟩.

Theorem 4. Spectral decomposition. If A is a normal operator over C^n it is diagonal with
respect to some orthonormal basis for C^n; conversely, any diagonalizable operator is normal.

Definition 69. If ( | a ⟩, A | a ⟩ ) ≥ 0 ∀ | a ⟩ ∈ C^n then A is a positive operator. If
( | a ⟩, A | a ⟩ ) > 0 for all | a ⟩ ≠ 0 in C^n then A is positive definite.

Observation 20. Any positive operator is Hermitian.

Exercise 21. Prove that:

(A†)† = A.

Exercise 22. Prove that a normal matrix is Hermitian iff it has real eigenvalues.
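A numerical illustration of Exercise 22 (Python with numpy; the matrix is our own example):

    import numpy as np

    A = np.array([[2.0, 1 - 1j],
                  [1 + 1j, 3.0]])
    print(np.allclose(A, A.conj().T))  # True: A is Hermitian
    print(np.linalg.eigvals(A))        # approximately [1, 4]: real eigenvalues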

11.9 Bilinear Functions and Tensor Products

Definition 70. Let V(F) and W(F) be two vector spaces over the same field F. Consider
variable vectors ξ, ξ' ∈ V and η, η' ∈ W and scalars a, b, c, d ∈ F. The function f(ξ, η) with
values in F and with the two properties:
(i) f(a ξ + b ξ', η) = a f(ξ, η) + b f(ξ', η) and
(ii) f(ξ, c η + d η') = c f(ξ, η) + d f(ξ, η')
is a bilinear function.

Proposition 21. If vector spaces Vm(F) and Wn(F) have finite bases α1, α2, . . . , αm and
β1, β2, . . . , βn respectively, then the most general bilinear function of two vectors ξ = x1 α1 +
x2 α2 + . . . + xm αm, with xi ∈ F, i ∈ {1, m}, and η = y1 β1 + y2 β2 + . . . + yn βn, with
yj ∈ F, j ∈ {1, n}, has the form:

f(ξ, η) = Σ_{i=1}^{m} Σ_{j=1}^{n} xi ai,j yj, with ai,j = f(αi, βj).

There is a one-to-one correspondence between bilinear functions and m × n matrices
A = [ai,j] with the elements ai,j given by the previous equation.

Definition 71. Given two bilinear functions f and g and any two vectors ξ ∈ V and η ∈ W,
the sum h = f + g is defined by:

(1) h(ξ, η) = f(ξ, η) + g(ξ, η), ∀ ξ ∈ V and η ∈ W.

Definition 72. Given a bilinear function f and a scalar c ∈ F, the product k = c f is defined
by:

(2) k(ξ, η) = c f(ξ, η).

Proposition 22. If V(F) and W(F) are vector spaces, the set B(V, W) of all bilinear functions
f(ξ, η) with ξ ∈ V and η ∈ W is also a vector space over F under the operations defined
by (1) and (2) above.

Definition 73. The tensor product V ⊗ W of two vector spaces V and W over the same field
F is the dual B(V, W)* of the space B(V, W) of bilinear functions from V and W to F.

Example 33. If A is an m × n matrix and B is a p × q matrix then, using the Kronecker
product representation for the tensor product, we have:

    A ⊗ B = [ a11 B  a12 B  . . .  a1n B
              a21 B  a22 B  . . .  a2n B
              a31 B  a32 B  . . .  a3n B
              . . .
              am1 B  am2 B  . . .  amn B ]

with:

    A = [ a11 a12 . . . a1n          B = [ b11 b12 . . . b1q
          a21 a22 . . . a2n                b21 b22 . . . b2q
          a31 a32 . . . a3n                b31 b32 . . . b3q
          . . .                            . . .
          am1 am2 . . . amn ]              bp1 bp2 . . . bpq ]

Here aij B, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is a p × q submatrix whose entries are the elements of the
matrix B multiplied by aij.

Observation 21. The tensor product of an m × n matrix and a p × q matrix is an mp × nq
matrix.

Example 34. The tensor product of the vectors (a, b) and (c, d) is the vector:

(a, b)^T ⊗ (c, d)^T = (ac, ad, bc, bd)^T.

Example 35. The tensor product of the Pauli matrices Y and Z, with

    Y = [ 0  -i        Z = [ 1   0
          i   0 ],           0  -1 ],

is:

    Y ⊗ Z = [ 0·Z  -i·Z      [ 0   0  -i   0
              i·Z   0·Z ]  =   0   0   0   i
                               i   0   0   0
                               0  -i   0   0 ]
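Both examples can be reproduced with numpy's Kronecker product (our own illustration,
not part of the notes):

    import numpy as np

    Y = np.array([[0, -1j], [1j, 0]])  # Pauli Y
    Z = np.array([[1, 0], [0, -1]])    # Pauli Z
    print(np.kron(Y, Z))               # the 4 x 4 matrix of Example 35

    print(np.kron([1, 2], [3, 4]))     # [3 4 6 8] = (a, b) x (c, d) of Example 34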

Exercise 23. Consider two vectors over C^2:

| ψ ⟩ = α0 | 0 ⟩ + α1 | 1 ⟩

| φ ⟩ = β0 | 0 ⟩ + β1 | 1 ⟩.

Show that:

| ψ ⟩ ⊗ | φ ⟩ = α0 β0 | 00 ⟩ + α0 β1 | 01 ⟩ + α1 β0 | 10 ⟩ + α1 β1 | 11 ⟩.

Exercise 24. Consider vectors α, β, γ and scalars m, n. Prove the following equalities:

m α ⊗ n β = m n (α ⊗ β)

(α + β) ⊗ γ = α ⊗ γ + β ⊗ γ

α ⊗ (β + γ) = α ⊗ β + α ⊗ γ

Exercise 25. Consider matrices V, W, X, Y, Z and vectors α, β. Prove the following equalities:

(V ⊗ W)(X ⊗ Y) = V X ⊗ W Y

(V ⊗ W)(α ⊗ β) = (V α) ⊗ (W β)

    [ V  W ]         [ V ⊗ Z   W ⊗ Z ]
    [ X  Y ] ⊗ Z  =  [ X ⊗ Z   Y ⊗ Z ]

Exercise 26. Consider n unitary matrices Vi, 1 ≤ i ≤ n (this means that Vi Vi† = I, with
Vi† the conjugate transpose of Vi). Let W = V1 ⊗ V2 ⊗ . . . ⊗ Vn. Show that W is unitary.

12 Acknowledgments
Professor Dan Burghelea from the Mathematics Department at Ohio State University and
Professor Robert E. Lynch from the Mathematics and the Computer Science Departments at
Purdue University have gone through an early version of the notes and have made suggestions
that have significantly contributed to increase the quality of this manuscript. Of course, the
authors are responsible for all the remaining errors.

References
[1] A. D. Aczel. Entanglement: The Greatest Mystery in Physics. Four Walls Eight Windows
Publishing House, New York, NY, ISBN 1-56858-232-3, 2001.

[2] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. http://www.cse.iitk.ac.in, 2002.

[3] D. Z. Albert. Quantum Mechanics and Experience. Harvard University Press, Cambridge,
Mass, ISBN 0-674-74112-9, 1992.

[4] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator,
J. Smolin, and H. Weinfurter. Elementary Gates for Quantum Computation. Preprint,
http://arxiv.org/archive/quant-ph/9503016 v1, March 1995.

[5] E. Bernstein and U. Vazirani. Quantum Complexity Theory. SIAM J. Computing, 26:
1411-1473, 1997.

[6] J. S. Bell. On the Einstein-Podolsky-Rosen Paradox. Physics, 1:195-200, 1964.

[7] J. S. Bell. Speakable and Unspeakable in Quantum Mechanics: Collected Papers on
Quantum Philosophy. Cambridge University Press, Cambridge, 1987.

[8] P. Benioff. Quantum Mechanical Models of Turing Machines that Dissipate no Energy.
Physical Review Letters, 48:1581-1584, 1982.

[9] C. H. Bennett. Logical Reversibility of Computation. IBM Journal of Research and
Development, 17:525-535, 1973.

[10] C. H. Bennett. The Thermodynamics of Computation - a Review. International Journal
of Theoretical Physics, 21:905-928, 1982.

[11] C. H. Bennett and G. Brassard. Quantum Cryptography: Public Key Distribution and
Coin Tossing. Proc. IEEE Conf. on Computers, Systems, and Signal Processing, IEEE
Press, 175-179, 1984.

[12] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters.
Teleporting an Unknown Quantum State via Dual Classical and Einstein-Podolsky-Rosen
Channels. Physical Review Letters, 70(13):1895-1899, 1993.

[13] C. H. Bennett. Quantum Information and Computation. Physics Today, 24-30, October
1995.

[14] C. H. Bennett and P. W. Shor. Quantum Information Theory. IEEE Trans. on
Information Theory, 44(6):2724-2742, 1998.

[15] C. H. Bennett, P. W. Shor, J. A. Smolin, and A. V. Thapliyal. Entanglement-Assisted
Capacity of a Quantum Channel and the Reverse Shannon Theorem. IEEE Trans. on
Information Theory, 48(10):2637-2655, 2002.

[16] C. H. Bennett, T. Mor, and J. A. Smolin. The Parity Bit in Quantum Cryptography.
arXiv:quant-ph/9604040, July 5, 2002.

[17] G. Birkhoff and S. Mac Lane. A Survey of Modern Algebra. Macmillan Publishing,
New York, NY, 1965.

[18] M. Born. The Statistical Interpretations of Quantum Mechanics. Nobel Lecture,
December 11, 1954. From Nobel Lectures, Physics 1942-1962, pages 256-267.
http://www.nobel.se/physics/laureates/1954/born-lecture.html.

[19] L. de Broglie. The Wave Nature of the Electron. Nobel Lecture, December 12, 1929.
From Nobel Lectures, Physics 1922-1941, pages 244-256.
http://www.nobel.se/physics/laureates/1929/broglie-lecture.html.

[20] J. Brown. The Quest for the Quantum Computer. Simon and Schuster, New York, NY,
ISBN 0-648-87004-5, 1999.

[21] A. W. Burks, H. H. Goldstine, and J. von Neumann. Preliminary Discussion of the
Logical Design of an Electronic Computer Instrument. Report to the US Army Ordnance
Department, 1946. Also in: Papers of John von Neumann, W. Asprey and A. W. Burks,
editors, MIT Press, Cambridge, Mass., 97-146, 1987.

[22] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley Series in
Telecommunications. John Wiley & Sons, New York, ISBN 0-471-06259-6, 1991.

[23] W. van Dam. A Universal Quantum Cellular Automaton. Proc. PhysComp96, T. Toffoli,
M. Biafore, and J. Leao, Eds. New England Complex Systems Institute, 323-331, 1996.

[24] D. Deutsch. Quantum Theory, the Church-Turing Principle and the Universal Quantum
Computer. Proc. R. Soc. London A, 400:97-117, 1985.

[25] D. Deutsch. The Fabric of Reality. Penguin Books, New York, NY, ISBN 0-14-027541-x,
1997.

[26] P. A. M. Dirac. Theory of Electrons and Positrons. Nobel Lecture, December 12, 1933.
From Nobel Lectures, Physics 1922-1941, pages 320-325.
http://www.nobel.se/physics/laureates/1933/dirac-lecture.html.

[27] P. A. M. Dirac. The Principles of Quantum Mechanics. Fourth ed. Oxford: Clarendon
Press, Sec. 2, pp. 4-7, 1967.

[28] A. Einstein, B. Podolsky, and N. Rosen. Can Quantum-Mechanical Description of
Physical Reality Be Considered Complete? Physical Review, 47:777, 1935.

[29] R. P. Feynman, R. B. Leighton, and M. Sands. The Feynman Lectures on Physics,
Volumes 1, 2, and 3. Addison-Wesley, Reading, Mass., ISBN 0-201-02116-1, 1977.

[30] R.P. Feynman. QED - The Strange Theory of Light and Matter. Princeton University
Press, Princeton NJ, ISBN 0-691-02417-0 1985.

[31] R.P. Feynman. Lectures on Computation. Addison Wesley, Reading, Mass., ISBN 9-
780201-48991-0, 1996.

[32] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products Academic
Press, Orlando, Fl., ISBN 0-12-294760-6, 1980.

[33] W. Heisenberg. The Development of Quantum Mechanics. Nobel Lecture, December 11,
1933. From Nobel Lectures, Physics 1922-1941, pages 290-301.

[34] R. Landauer. Irreversibility and Heat Generation in the Computing Process. IBM Journal
of Research and Development, 5:182-192, 1961.

[35] S. McCartney. ENIAC: The Triumphs and Tragedies of the World's First Computer.
Walker and Company Publishing House, New York, NY, ISBN 0-8027-1348-3, 1999.

[36] G. J. Milburn. Schrödinger's Machines. Perseus Books, Cambridge, Mass., ISBN
0-7382-0173-1, 1998.

[37] G. J. Milburn. The Feynman Processor. W. H. Freeman and Company, New York, NY,
ISBN 0-7167-3106-1, 1996.

[38] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information.
Cambridge University Press, ISBN 0-521-63245-8, 2000.

[39] B. W. Ogburn and J. Preskill. Topological Quantum Computation. Lecture Notes in
Computer Science, Springer-Verlag, 1509:341-359, 1999.

[40] R. Omnès. The Interpretation of Quantum Mechanics. Princeton Series in Physics.
Princeton University Press, Princeton, NJ, ISBN 0-691-03336-6, 1994.

[41] D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The
Hardware/Software Interface, Second Edition. Morgan Kaufmann, San Francisco, Ca., 1998.

[42] W. Pauli. Exclusion Principle and Quantum Mechanics. Nobel Lecture,
December 13, 1946. From Nobel Lectures, Physics 1942-1962, pages 27-43.
http://www.nobel.se/physics/laureates/1945/pauli-lecture.html.

[43] M. K. E. L. Planck. The Genesis and Present State of Development of the Quantum
Theory. Nobel Lecture, June 2, 1920. From Nobel Lectures, Physics 1901-1922.
http://www.nobel.se/physics/laureates/1918/planck-lecture.html.

[44] J. Preskill. Lecture Notes for Physics 229: Quantum Information and Computing.
California Institute of Technology, 1998.

[45] J. Preskill. Quantum Clock Synchronization and Quantum Error Correction. Preprint,
http://arxiv.org/archive/quant-ph/0010098 v1, October 2000.

[46] E. Rieffel and W. Polak. An Introduction to Quantum Computing for Non-Physicists.
ACM Computing Surveys, 32(3):300-335, 2000.

[47] E. Schrödinger. The Fundamental Idea of Wave Mechanics. Nobel Lecture, December
12, 1933. From Nobel Lectures, Physics 1922-1941, pages 305-314.

[48] C. E. Shannon. Communication in the Presence of Noise. Proceedings of the IRE,
37:10-21, 1949.

[49] C. E. Shannon. Certain Results in Coding Theory for Noisy Channels. Information and
Control, 1(1):6-25, 1957.
[50] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University
of Illinois Press, Urbana, 1963.

[51] P. W. Shor. Algorithms for Quantum Computation: Discrete Log and Factoring. Proc.
35th Annual Symp. on Foundations of Computer Science, pages 124-134, IEEE Press,
Piscataway, New Jersey, 1994.

[52] P. W. Shor. Polynomial-Time Algorithms for Prime Factorization and Discrete
Logarithms on a Quantum Computer. Preprint, http://arxiv.org/archive/quant-ph/9508027
v2, January 1996.

[53] P. W. Shor. Fault-Tolerant Quantum Computation. 37th Ann. Symp. on Foundations
of Computer Science, pages 56-65, IEEE Press, Piscataway, New Jersey, 1996.

[54] P. W. Shor and J. Preskill. Simple Proof of Security of the BB84 Quantum Key
Distribution Protocol. arXiv:quant-ph/0003004, May 12, 2000.

[55] P. W. Shor. Introduction to Quantum Algorithms. Preprint, arXiv:quant-ph/0005003,
July 6, 2001.

[56] A. Steane. The Ion Trap Quantum Information Processor. Preprint,
http://arxiv.org/archive/quant-ph/9608011 v2, August 1996.

[57] A. Steane. Quantum Computing. Preprint, http://arxiv.org/archive/quant-ph/9708022
v2, September 1997.

[58] O. Stern. The Method of Molecular Rays. Nobel Lecture, December 12, 1946. From
Nobel Lectures, Physics 1942-1962, pages 8-16.
http://www.nobel.se/physics/laureates/1943/stern-lecture.html.

[59] L. Szilard. Über die Entropieverminderung in einem thermodynamischen System bei
Eingriffen intelligenter Wesen. Zeitschrift für Physik, 53:840-856, 1929.

[60] A. M. Turing. On Computable Numbers with an Application to the Entscheidungsproblem.
Proc. London Math. Soc., Series 2, 42:230-265, 1936.

[61] J. von Neumann. Fourth University of Illinois Lecture. In A. W. Burks, editor, Theory
of Self-Reproduced Automata, page 66, University of Illinois Press, Urbana, 1966.

[62] J. von Neumann. Mathematical Foundations of Quantum Mechanics. Trans. R. T. Beyer.
Princeton University Press, Princeton, 1955. (First published in German in 1932.)

[63] W. K. Wootters and W. H. Zurek. A Single Quantum Cannot Be Cloned. Nature,
299:802-803, 1982.