
INTRODUCTION

TO BIO-GEOMETRY

Herbert Edelsbrunner
Departments of Computer Science and Mathematics
Duke University
Table of Contents

Prologue i

I Bio-molecules 1
II Geometric Models 17
III Surface Meshing 35
IV Connectivity 53
V Shape Features 71
VI Density Maps 89
VII Match and Fit 101
VIII Deformation 117
IX Measures 125
X Derivatives 141

Subject Index 147
Author Index 149
Preface

[Mention the pioneers who early on recognized the importance of geometry in structural molecular biology: Fred Richards, Michael Levitt, Michael Connolly.]

[Mention that my book on "Geometry and Topology for Mesh Generation" is complementary to, or a prerequisite for, this book. In particular, it covers the construction of Delaunay triangulations in detail, and it describes the simulation of simplicity as a general idea to deal with non-generic situations.]

[This book is really about alpha shapes in a broad sense. It might be useful to describe the history of that research briefly.

1981. Vancouver. Conception of idea with Kirkpatrick and Seidel.
1985-89. Graz and Urbana. SoS, Delaunay software, Alpha Shape software with Ernst Mücke, Harald Rosenberger, and Patrick Moran.
1990-93. Urbana and Berlin. Surface triangulations, Betti numbers, inclusion-exclusion, CAVE with Ping Fu, Ernst Mücke, Cecil Delfinado, Nataraj Akkiraju, and Jiang Qian.
1994-95. Hong Kong. Morphing, molecular skin, with Ping Fu, Siu-Wing Cheng, Ka-Po Lam, and Ho-Lun Cheng.
1995-98. Urbana. Flow and pockets, skin surfaces with Ho-Lun Cheng, Tamal Dey, Michael Facello, Jie Liang, Shankar Subramaniam, Claire Woodworth.
1999-2001. Duke. Skin triangulation, hierarchy, Morse complexes with Ho-Lun Cheng, Alper Üngör, Afra Zomorodian, David Letscher, John Harer, Vijay Natarajan.
2002-2003. Duke and Livermore. Docking, Reeb graphs, Jacobian manifolds with Johannes Rudolph, Sergei Bespamyatnikh, Vicky Choi, John Harer, Valerio Pascucci, Vijay Natarajan, Ajith Mascarenhas.
2000-2005. ITR Project. Derivatives, interfaces, software with Robert Bryant, Patrice Koehl, Michael Levitt, Andrew Ban, Johannes Rudolph, Lutz Kettner, Rachel Brady, and Daniel Filip.]

[This book is based on notes developed while teaching the courses on "Sphere Geometry" in the Spring of 2000 and on "Bio-geometric Modeling" in the Spring of 2001 and the Fall of 2002, all at Duke University. These courses were either taken for credit or audited at least occasionally by Luis von Ahn, Tammy Bailey, Yih-En (Andrew) Ban, Robert Bryant, Ho-Lun Cheng, Vicky Choi, Anne Collins, Abhijit Guria, Tingting Jiang, Looren Looger, Ajith Mascarenhas, Gopi Meenakshisundaram, Nabil Mustafa, Vijay Natarajan, Xiuwen Ouyang, Anindya Patthak, Ken Roberts, Apratim Roy, Scott Schmidler, Xiaobai Sun, Yusu Wang, Shumin Wu, Alper Üngör, Peng Yin and Afra Zomorodian.]

Herbert Edelsbrunner
Durham, North Carolina, 2002
To do or think about (March 15, 2004).

General       Fix the software for creating the index and glossary.
              Should the Exercise sections be labeled so the page heading is more uniform?
Chapter III   In Section III.3, mention new results on scheduling.
              Exercises: add a few more questions.
Chapter V     Should Section V.2 on Topological Persistence be reorganized by first presenting the algebra and second the algorithm?
              In Section V.3, replace 23- by 03-, 13- and 23-collapses.
              Add the interface software description to Section V.4.
Chapter VI    Write Section VI.3 on Construction and Simplification.
              Write Section VI.4 on Simultaneous Critical Points.
              Exercises: come up with questions.
Chapter VII   In Section VII.2, find out about finding the best bi-chromatic matching in .
Chapter VIII  Write the introduction to Deformation.
              Write Section VIII.1 on Molecular Dynamics.
              Write Section VIII.2 on Spheres in Motion.
              Write Section VIII.3 on Rigidity.
              Write Section VIII.4 on Shape Space.
              Exercises: come up with questions.
Chapter IX    Exercises: come up with questions.
Chapter X     Write a new chapter on area and volume derivatives and related topics.
              Write a section on the Weighted Area Derivative.
              Write a section on the Weighted Volume Derivative.
              Exercises: come up with questions.
Chapter I

Bio-molecules

This chapter discusses the three main classes of organic macromolecules involved in the hereditary and life maintenance mechanisms of living beings: DNA, RNA, and proteins. According to the central dogma of biology, proteins are created in two steps from DNA, which carries the genetic information:

    DNA --(replication)--> DNA
    DNA --(transcription)--> RNA --(translation)--> Protein

We talk briefly about the processes indicated by the three arrows and focus on the structure of the players involved. DNA is the stuff that genetic material is made of. RNA is mostly, but not entirely, an intermediate product: it copies portions of the DNA (transcription), and this information is then turned into working proteins (translation). Proteins act like machines that define the cell cycle as an ongoing process. Each cell is like a society whose members have specialized tasks, which they accomplish in a complicated net of interactions. All these molecules range from large to huge. They are relatively simple locally but exceedingly complicated in their totality. Because of the complexity and the large variety, it should not be surprising that there are exceptions to almost everything meaningful that can be said about them. Perhaps it is more surprising that anything of broad validity can be said at all.

We begin by describing the chemical structure of DNA and RNA in Section I.1. We then explain the translation from RNA to proteins in Section I.2 and talk about the structural organization of proteins in Section I.3. Finally, we present some of the fundamental premises and results of molecular mechanics in Section I.4.

I.1 DNA and RNA
I.2 Proteins and Amino Acids
I.3 Structural Organization
I.4 Molecular Mechanics
Exercises

2 I B IO - MOLECULES

I.1 DNA and RNA

DNA (or deoxyribonucleic acid) is the material that forms the genome, which is a complete set of the genetic material of a living organism. As discovered by Watson and Crick in 1953, DNA consists of two strands of nucleotides twisted into the shape of a double helix, as depicted in Figure I.1. We begin at the smallest scale and work our way up the multi-scale structure of DNA. Compared to standard genomics texts, the treatment of DNA in this section is coarse and lacks many important details.

Figure I.1: A short piece of the DNA double-helix, with atoms shown as tightly packed and partially overlapping spheres.

Chemical structure of DNA. DNA has three chemical components: phosphate, deoxyribose sugar, and four nitrogenous bases, namely adenine, guanine, cytosine, and thymine. The first two bases are double-ring and the last two are single-ring structures. The chemical components are arranged in groups called nucleotides, each composed of a phosphate group, a deoxyribose sugar, and one of the four bases. A nucleotide is conveniently referred to by the first letter of its base. Figure I.2 sketches the chemical structure of the nucleotide A and shows the chemical structures of the remaining three bases. We obtain the nucleotides G, C and T by substituting the corresponding base for adenine in Figure I.2. We use boldface edges to connect atoms that are joined by two covalent bonds. The covalent bonding in the ring structures of the nitrogenous bases is more interesting. All atoms in the ring share electrons as a group, and we draw some double bonds just to indicate the total number of extra shared electrons. For example, the hexagonal ring of cytosine has a total of eight covalent bonds along its six edges, which we may think of as 8/6 = 4/3 of a covalent bond between every contiguous pair of ring atoms.

Figure I.2: The chemical structure of the DNA nucleotide with adenine as the nitrogenous base above (phosphate, deoxyribose sugar, adenine), and the chemical structures of the other three nitrogenous bases below (guanine, cytosine, thymine).

Double helix. The two strands of DNA are held together by weak hydrogen bonds between complementary bases, forming the structure of a spiraling staircase. The backbone of each strand is a repeating phosphate-deoxyribose sugar polymer. The phosphate and the sugar groups in the backbone are connected by phosphodiester bonds. The attachment of these bonds to the sugar groups is illustrated in Figure I.3. The carbons of the sugar group are numbered from 1' to 5'. One part of the phosphodiester bond is between the phosphate and the 3'-carbon, and the other is between the phosphate and the 5'-carbon. We think of the backbone as oriented in the direction of the path that starts at the 5'-carbon, passes through the 4'-carbon, and ends at the 3'-carbon. In the double-stranded DNA molecule, the two backbones are in opposite, or anti-parallel, orientation.

The bases are attached to the 1'-carbons. Interactions between base pairs hold the two strands together. Adenine interacts with thymine and guanine with cytosine. The two bases of a pair are said to be complementary. This implies that the sequence of bases along one strand determines the sequence of bases along the other: reverse the reading direction and replace each base by its complement,

    5' AATCGCGTACGCG 3'
    3' TTAGCGCATGCGC 5'

Figure I.3: Chemical structure of a very short segment of DNA. The numbers 1' to 5' order the carbon atoms of each sugar group, giving each strand an orientation. The dotted connections between the nitrogenous bases indicate hydrogen bonds.

Replication is based on this simple rule of complementarity and makes essential use of the relatively weak bonds between the two strands. A protein machine builds new DNA strands by separating the two old strands and complementing each by a new anti-parallel strand.

Chromosomes. Each cell of an organism contains a copy of the entire genome. In the case of a human cell, this amounts to about two meters of DNA partitioned into twenty-three pairs of chromosomes. The body has about 10¹³ cells, totaling about 2 · 10¹³ meters of DNA, which is more than a hundred times the distance between the earth and the sun. Since humans are small relative to that distance, the DNA must be thin and efficiently packed. Indeed, each chromosome is a long thread (a double strand) that is densely folded around protein scaffolds.

How is a long thread of DNA converted into the relatively thick and worm-like structure visible through the electron microscope? On the lowest level, the DNA is wrapped twice around a configuration of eight histones (a special kind of protein). The beads of wrapped histones assume a coiled structure (a solenoid) stabilized by another type of histone that runs along its central axis. It takes one more level of packaging to convert the solenoid into the three-dimensional structure we call a chromosome. This higher level uses a core scaffold made of an enzyme, topoisomerase II. This enzyme has the ability to pass a strand of DNA through another, which is a much needed operation during the packing and unpacking of DNA. The best evidence suggests that the solenoid arranges in loops emanating from the scaffold, which itself assumes the form of a spiral.

Chemical structure of RNA. A gene is a subsequence of the DNA capable of being transcribed to produce a functional RNA molecule. Note that this definition depends on the rather complicated process of transcription, which can fail for a variety of reasons. We begin by looking at the chemical features of RNA. There are three main differences from DNA:

1. RNA is a single-stranded nucleotide chain and can therefore assume a much greater variety of geometric shapes than DNA.
2. RNA has ribose sugar in its nucleotides, which differs from deoxyribose sugar by one additional oxygen atom.
3. RNA nucleotides carry the bases adenine, guanine, and cytosine, but substitute uracil for the thymine found in DNA. Uracil forms hydrogen bonds with adenine just as thymine does.

Figure I.4 illustrates the chemical difference between RNA and DNA by showing a ribonucleotide containing uracil.

Figure I.4: Chemical structure of the RNA nucleotide with uracil as the nitrogenous base (phosphate, ribose sugar, uracil).
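The complementarity rule (reverse the reading direction and replace each base by its complement) and the substitution of uracil for thymine can both be stated directly on strings. The following Python sketch is a toy illustration under simplifying assumptions. Strands are plain strings over A, C, G, T written from the 5'- to the 3'-end. The function names `reverse_complement` and `transcribe_template` are hypothetical and do not come from any library. The sketch reproduces the two-strand example given earlier in this section.

```python
# Watson-Crick pairing of DNA bases: A with T, G with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner of a DNA strand, both written 5' to 3'.

    Reversing accounts for the anti-parallel backbones,
    complementing accounts for base pairing.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))


def transcribe_template(template: str) -> str:
    """Return the RNA (5' to 3') made along a DNA template strand (5' to 3').

    The RNA is complementary and anti-parallel to the template, so it equals
    the non-template strand with U in place of T.
    """
    return reverse_complement(template).replace("T", "U")


if __name__ == "__main__":
    top = "AATCGCGTACGCG"              # the 5' to 3' strand of the example
    bottom = reverse_complement(top)    # partner strand, read 5' to 3'
    print(bottom)                       # CGCGTACGCGATT
    print(bottom[::-1])                 # TTAGCGCATGCGC, read 3' to 5' as shown above
    print(transcribe_template(bottom))  # AAUCGCGUACGCG
```

Reversing the string is what encodes the anti-parallel orientation. Complementing alone would produce the partner strand read from its 3'- to its 5'-end.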
RNA is classified into different types depending on its function. The vast majority is messenger RNA (or mRNA), which acts as an intermediary structure in the synthesis of proteins. There is also functional RNA produced by a small number of genes, which is not translated into protein. Examples are transfer RNA (or tRNA), which brings amino acids to the mRNA during the translation process, and ribosomal RNA (or rRNA), which helps coordinate the assembly of amino acids into proteins.

Transcription. The transcription process, which makes RNA, is similar to the replication process of DNA. During the transcription of a gene, the two strands of DNA are separated locally, and one strand acts as a template for RNA synthesis. Free ribonucleotides align along the DNA template. The process is catalyzed by another protein machine, the RNA polymerase complex, which moves along the DNA adding ribonucleotides to the growing RNA, as sketched in Figure I.5. The resulting RNA sequence is the same as the non-template sequence of the gene, except that U replaces T. Electron microscope pictures show that the transcription of DNA to RNA is a highly parallel process in which a row of RNA polymerase complexes follow each other along the gene and produce RNA concurrently. Each individual transcription works in three steps.

Figure I.5: The RNA grows in the 5' to 3' direction, in this case by adding a nucleotide carrying uracil to the chain. (The RNA strand, with bases U, A, G, C, is shown along the DNA template strand, with bases A, T, C, G; P and S mark phosphate and sugar groups.)

Initiation. RNA polymerase binds to a promoter segment of DNA located in front of the gene. It then unwinds the DNA and begins the synthesis of an RNA molecule.
Elongation. RNA polymerase moves along the DNA, maintaining a transcription bubble to expose the template strand. It compares free ribonucleotides with the next exposed DNA base and adds a complementary match.
Termination. Specific sequences in the DNA signal the chain termination by triggering the release of the RNA strand and the polymerase.

A gene is thus not only marked but indeed defined by the promoter segment preceding it and the terminating sequence succeeding it.

Bibliographic notes. The idea that traits are hereditary is old, but the detailed mechanism by which it comes about started to unfold only recently. The groundwork for our current understanding was laid in the nineteenth century by Gregor Mendel, when he discovered the basic rules of the hereditary mechanism [2]. An English translation of this work can be found in [3]. It was long known that DNA is critically involved in that mechanism, but it took until the work of Watson and Crick in 1953 to discover the chemical structure of DNA [5, 6]. The book by Watson [4] is an enjoyable personal account of the years preceding the discovery of that structure. Today there are many books on the subject, and most of the material in this section is taken from [1, Chapters 2 and 3].

[1] A. J. F. Griffiths, W. M. Gelbart, J. H. Miller and R. C. Lewontin. Modern Genetic Analysis. Freeman, New York, 1999.
[2] G. Mendel. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereines, Abhandlungen, Brünn 4 (1866), 3–47.
[3] C. Stern and E. R. Sherwood. The Origin of Genetics: A Mendel Source Book. Freeman, 1966.
[4] J. D. Watson. The Double Helix. Atheneum, New York, 1981.
[5] J. D. Watson and F. H. C. Crick. Molecular structure of nucleic acids. A structure for deoxyribose nucleic acid. Nature 171 (1953), 737–738.
[6] J. D. Watson and F. H. C. Crick. Genetic implications of the structure of deoxyribonucleic acid. Nature 171 (1953), 964–967.
I.2 Proteins and Amino Acids

Proteins are polypeptide chains obtained by translation from strands of messenger RNA. In this section, we sketch the translation process and discuss the chemical structure of proteins.

Chemical structure. A protein is a linear sequence of amino acids connected to each other by peptide bonds. Each amino acid consists of a central carbon atom, the α-carbon, linked to an amino group, a carboxyl group, one hydrogen atom, and a side-chain. Amino acids that are linked into a polypeptide chain are referred to as residues. Different residues are distinguished by their side-chains. As shown in Figure I.6, two amino acids are linked by a peptide bond whose creation releases water. The resulting repeating sequence of nitrogen, α-carbon and carbon atoms is the backbone of the protein.

    H2N-CHR-COOH + H2N-CHR-COOH  ->  H2N-CHR-CO-NH-CHR-COOH + H2O

Figure I.6: Two amino acid residues joined by a peptide bond.

The four neighbors of an α-carbon, Cα, are at the vertex positions of a tetrahedron around Cα. This tetrahedron has two orientations, one being the mirror image of the other, as illustrated in Figure I.7. The two oriented forms are referred to as isomers and distinguished by the letters L and D. Only L-amino acids occur in nature as building blocks of proteins.

Figure I.7: The two isomers, L and D, of an amino acid.

Amino acids. Among a much larger variety of amino acids, nature uses only twenty to build proteins. We list their names together with their three-letter codes and single-letter abbreviations in Table I.1.

    Alanine        Ala  A      Methionine   Met  M
    Cysteine       Cys  C      Asparagine   Asn  N
    Aspartate      Asp  D      Proline      Pro  P
    Glutamate      Glu  E      Glutamine    Gln  Q
    Phenylalanine  Phe  F      Arginine     Arg  R
    Glycine        Gly  G      Serine       Ser  S
    Histidine      His  H      Threonine    Thr  T
    Isoleucine     Ile  I      Valine       Val  V
    Lysine         Lys  K      Tryptophan   Trp  W
    Leucine        Leu  L      Tyrosine     Tyr  Y

Table I.1: Names, codes and abbreviations of the twenty amino acids that occur as building blocks of natural proteins.

Figure I.8: The fifteen amino acids without a cycle in their chemical structure: valine, isoleucine, leucine, asparagine, glycine, alanine, threonine, aspartate, serine, cysteine, arginine, lysine, methionine, glutamate, and glutamine. The shaded circle is the α-carbon on the backbone. All unlabeled nodes are either carbon or hydrogen atoms.

As can be seen in Figures I.8 and I.9, residues differ widely in size and structure. The fifteen amino acids sketched in Figure I.8 may be viewed as trees rooted at the α-carbon, which is part of the backbone. Most of the internal nodes are carbon atoms, with rare occurrences of oxygen, nitrogen and sulfur atoms. As before, we mark double and partially double bonds by boldface edges. Four of the five amino acids sketched in Figure I.9 have pentagonal and hexagonal ring
structures. The fifth amino acid is proline, which forms a cycle by having its chain connect back to the nitrogen next to the α-carbon along the backbone. This unique feature locally restricts the flexibility of the backbone, as will be discussed in Section I.3.

Figure I.9: The five amino acids with cyclic chemical structure: proline, tryptophan, tyrosine, phenylalanine, and histidine.

Genetic code. The translation process is more involved than transcription because it converts information between two languages that use different alphabets. The sequence of nucleotides is read consecutively in groups of three, called codons. Since there are four different types of nucleotides, we have 4³ = 64 codons. There are only twenty amino acid residues, which implies that the map is not injective; it uses redundancy, with several codons mapping to the same residue. The complete map is shown in Table I.2. The codon XYZ is mapped to one of the residues in the row of X and the column of Y. The four positions inside that slot correspond to A, G in the first row and C, U in the second row.

           A          G          C          U
    A    Lys Lys    Arg Arg    Thr Thr    Ile Met
         Asn Asn    Ser Ser    Thr Thr    Ile Ile
    G    Glu Glu    Gly Gly    Ala Ala    Val Val
         Asp Asp    Gly Gly    Ala Ala    Val Val
    C    Gln Gln    Arg Arg    Pro Pro    Leu Leu
         His His    Arg Arg    Pro Pro    Leu Leu
    U    --- ---    --- Trp    Ser Ser    Leu Leu
         Tyr Tyr    Cys Cys    Ser Ser    Phe Phe

Table I.2: The genetic code. The start codon is AUG and maps to methionine. Entries marked --- correspond to the stop codons, which are UAA, UAG, and UGA.

The translation is accomplished by transfer RNA molecules that recognize codons through the same binding mechanism used for replication and transcription. Some residues correspond to more codons than others. The redundancy is in part due to multiple tRNA molecules carrying the same residue and in part because there is flexibility in how the tRNA reads the codons. In many cases, an accurate match at the first two positions suffices and a mismatch at the third position can be tolerated. This explains the relative uniformity among the four residues in any one slot of Table I.2.

Since codons are triplets of nucleotides, there are apparently three possible reading frames, each producing an entirely different residue sequence. The correct reading frame is identified by always starting the translation at a start codon, AUG. The initiator tRNA is a specific transfer RNA that recognizes this sequence and binds to methionine. Incidentally, it differs from the tRNA that binds to the AUG codon in the middle of the sequence, although that one also binds to methionine.

Translation. As mentioned above, the tRNA molecules are instrumental in translating codons into residues. Each tRNA is a short sequence of about 80 nucleotides. Complementary subsequences form double-helix substructures that further fold up into characteristic 'clover leaf' formations, one of which is sketched in Figure I.10. A tRNA molecule matches the exposed codon of the mRNA with its anti-codon and contributes its residue to the polypeptide chain that grows at the other end. The codon and anti-codon are matched in anti-parallel orientation, as always.

Figure I.10: Transfer RNA with anti-codon (GAA) at the bottom, covalently attached amino acid at the top, and complementary substrings shown.

The translation process is facilitated by the ribosome, which is a large complex made from more than 50 different proteins and several RNA molecules. It consists of a small subunit and a large subunit, which come together around an mRNA strand with the help of the initiator tRNA that contributes the first residue. The ribosome scans through the strand like a tape reader. For each codon, it finds a tRNA with matching anti-codon and appends its amino acid as a residue to the carboxyl end of the growing polypeptide chain. The orientation of the mRNA strand from the 5'- to the 3'-end is thus preserved by the orientation of the polypeptide chain from the amino group of the first to the carboxyl group of the last residue. The translation process ends when a stop codon is read. The protein chain and the mRNA are released and the ribosome dissociates into its two subunits.

Similar to transcription, the translation of an mRNA strand into a protein happens in parallel, with several ribosomes working concurrently and in sequence along the strand. In some cases, the translation even starts during transcription, before the mRNA strand is complete.

Bibliographic notes. Most of the twenty amino acids that occur in proteins were identified in the nineteenth century. After the determination of the DNA structure in 1953, it took only a few years for the community to agree on the central dogma, and a few more years to decipher the genetic code on which the dogma is based. The atomic structure of the large ribosomal subunit has recently been resolved by x-ray crystallography [2]. The material of this section is taken from [1, 3, 6], all three of which are comprehensive texts in their respective fields. Considerably shorter and more focused descriptions of proteins and protein structures can be found in [4, 5].

[1] B. Alberts, D. Bray, A. Johnson, J. Lewis, M. Raff, K. Roberts and P. Walter. Essential Cell Biology. An Introduction to the Molecular Biology of the Cell. Garland, New York, 1998.
[2] N. Ban, P. Nissen, J. Hansen, P. B. Moore and T. A. Steitz. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289 (2000), 905–920.
[3] T. E. Creighton. Proteins: Structures and Molecular Properties. Second edition, Freeman, New York, 1993.
[4] N. J. Darby and T. E. Creighton. Protein Structure. Oxford Univ. Press, England, 1993.
[5] P. C. E. Moody and A. J. Wilkinson. Protein Engineering. Oxford Univ. Press, England, 1990.
[6] L. Stryer. Biochemistry. Third edition, Freeman, New York, 1988.
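Table I.2, together with the start and stop rules of Section I.2, translates directly into a lookup table. The Python sketch below is a simplification and does not model the ribosome. It encodes the rows of Table I.2 as written, with the first line of each slot for third base A, G and the second line for C, U. It assumes that translation begins at the first AUG of the string and proceeds in that reading frame until the first stop codon, and it ignores everything else about initiation. The names `TABLE_ROWS`, `build_codon_table`, and `translate` are hypothetical.

```python
BASES = "AGCU"  # order of rows (first base) and columns (second base) in Table I.2

# Each row of Table I.2 as two lines of eight entries. Within the slot of
# column Y, the first line holds third bases A, G and the second line C, U.
# "Stop" marks the entries of the three stop codons.
TABLE_ROWS = {
    "A": (["Lys", "Lys", "Arg", "Arg", "Thr", "Thr", "Ile", "Met"],
          ["Asn", "Asn", "Ser", "Ser", "Thr", "Thr", "Ile", "Ile"]),
    "G": (["Glu", "Glu", "Gly", "Gly", "Ala", "Ala", "Val", "Val"],
          ["Asp", "Asp", "Gly", "Gly", "Ala", "Ala", "Val", "Val"]),
    "C": (["Gln", "Gln", "Arg", "Arg", "Pro", "Pro", "Leu", "Leu"],
          ["His", "His", "Arg", "Arg", "Pro", "Pro", "Leu", "Leu"]),
    "U": (["Stop", "Stop", "Stop", "Trp", "Ser", "Ser", "Leu", "Leu"],
          ["Tyr", "Tyr", "Cys", "Cys", "Ser", "Ser", "Phe", "Phe"]),
}


def build_codon_table() -> dict[str, str]:
    """Map each of the 4**3 = 64 codons XYZ to a residue code or 'Stop'."""
    table = {}
    for x, (upper, lower) in TABLE_ROWS.items():
        for col, y in enumerate(BASES):
            slot = {"A": upper[2 * col], "G": upper[2 * col + 1],
                    "C": lower[2 * col], "U": lower[2 * col + 1]}
            for z, residue in slot.items():
                table[x + y + z] = residue
    return table


CODON_TABLE = build_codon_table()


def translate(mrna: str) -> list[str]:
    """Translate an mRNA string (5' to 3') from its first AUG to the first stop codon."""
    begin = mrna.find("AUG")  # the start codon fixes the reading frame
    if begin < 0:
        return []  # no start codon: nothing is translated
    residues = []
    for i in range(begin, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


if __name__ == "__main__":
    print(len(CODON_TABLE))               # 64
    print(translate("GGAUGUUUGGCUAAUU"))  # ['Met', 'Phe', 'Gly']
```

In the usage example, the translator skips the two leading bases, reads AUG, UUU and GGC, and stops at UAA. Starting one or two positions later would read a different frame and give an entirely different residue sequence. This is why anchoring the frame at the start codon matters.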
I.3 Structural Organization

We cannot hope to understand proteins without a good grasp of their multi-level structural organization. Most surprisingly, identical proteins fold up into identical shapes, and this is really the reason why geometry plays an important role in their study.

Bond rotation. Consider the three bonds from one α-carbon to the next along a protein backbone, and refer to them collectively as a peptide unit. Figure I.6 shows its chemical and Figure I.11 its geometric structure. Because of its partial double-bond character, there is no freedom to rotate around the peptide bond, which is the link between the carbon and the nitrogen atoms. There are, however, two possible planar configurations: the trans form, in which Cα-C-N-Cα is relatively stretched (zig-zag), and the cis form, in which it curves in one direction (zig-zig). The two forms are distinguished by the rotation angle around the C-N bond, ω, which by convention is 180° for the trans and 0° for the cis form. In contrast, the links between the α-carbon and the carbon and nitrogen atoms are single bonds with one-dimensional rotational degrees of freedom. As shown in Figure I.11, φ measures the rotation around the N-Cα bond, and ψ measures the rotation around the Cα-C bond. Again by convention, φ = 180° and ψ = 180° for the two coplanar trans forms.

Figure I.11: The planarity of a peptide bond is caused by its partial double-bond character. The φ and ψ angles measure rotations around the bonds preceding and succeeding every α-carbon atom.

Ramachandran plot. The conformation of the backbone is completely determined, for fixed bond lengths and bond angles, when φ, ψ, and ω are specified for each residue in the chain. A given residue prohibits some angles because of steric hindrances, which are physically prohibited collisions between atoms. A larger residue will generally prohibit a larger range of angles than a smaller one. The realizable angle pairs are visualized as a subset of the square of angle pairs, [−180°, 180°]². This picture, the so-called Ramachandran plot, is sketched for glycine in Figure I.12. The side-chain of glycine is only H, which is the reason that a relatively large portion of the square of angle pairs is realizable. An interesting residue in this respect is proline, which differs from all others because it binds back to the backbone, and in this way restricts the rotational degree of freedom to a small region.

Figure I.12: The square represents all angle pairs (φ, ψ) and the shading indicates the region of disallowed pairs for glycine.

Two common motifs. A motif that is commonly observed in proteins is the α-helix, whose backbone forms a right-handed helix. Contiguous α-carbons are separated by about 100° in the rotation direction and 1.5 Å rise, which is measured along the axis. A full turn takes about 3.6 residues and produces an axial separation of about 5.4 Å. The structure is stabilized by hydrogen bonds between every CO group and the NH group four residues later. All side-chains lie outside the helix structure. The characteristic dihedral angles for a right-handed α-helix are roughly φ = −57° and ψ = −47°. Cartoon representations of protein structures usually draw α-helices as tubes. In Figure I.13, the tubes are visible as spiral sections of the ribbon.

Another recurring motif is the β-sheet, which is flat and made up of several strands. A strand can be obtained by stretching the α-helix until the axial distance between two contiguous α-carbons reaches about 3.5 Å. The stabilizing hydrogen bonds are between neighboring strands, which can run in the same direction (parallel) or in opposite directions (anti-parallel). These bonds combine strands into sheets.
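The angles φ, ψ and ω are dihedral (torsion) angles, each defined by four consecutive backbone atoms: C, N, Cα, C for φ; N, Cα, C, N for ψ; and Cα, C, N, Cα for ω. The Python sketch below does two things. It computes such an angle with a common atan2-based torsion formula, and it places α-carbons on an ideal right-handed helix using the 100° rotation and 1.5 Å rise per residue quoted for the α-helix. The helix radius of 2.3 Å is an assumed textbook value that does not appear in this section, and the function names `dihedral` and `helix_ca_trace` are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees, between -180 and 180, around the bond p1-p2.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1);
    for omega pass CA(i), C(i), N(i+1), CA(i+1).
    """
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def helix_ca_trace(n: int, turn_deg: float = 100.0, rise: float = 1.5,
                   radius: float = 2.3) -> list[Vec]:
    """Alpha-carbon positions on an ideal right-handed helix along the z-axis.

    radius = 2.3 Angstrom is an assumed value, not taken from the text.
    """
    t = math.radians(turn_deg)
    return [(radius * math.cos(k * t), radius * math.sin(k * t), k * rise)
            for k in range(n)]


if __name__ == "__main__":
    # Planar configurations: trans (zig-zag) gives 180, cis (zig-zig) gives 0.
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # 180.0
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # 0.0
    ca = helix_ca_trace(8)
    print(math.dist(ca[0], ca[1]))  # distance between consecutive alpha-carbons
```

With these parameters, consecutive α-carbons end up about 3.83 Å apart. This is close to the roughly 3.8 Å Cα-Cα distance across a trans peptide unit, so the quoted rotation and rise are geometrically consistent. The same numbers give 360/100 = 3.6 residues and 3.6 · 1.5 = 5.4 Å per turn.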
Both options are illustrated in Figure I.14.

Figure I.13: Ribbon diagrams visualize proteins by emphasizing the backbone as it winds its way through the structure.

Figure I.14: Two parallel β-strands to the left and two anti-parallel ones to the right. The dotted edges represent stabilizing hydrogen bonds.

Protein architecture and function. It is common to distinguish four levels of organization in the description of protein architecture:

Primary structure refers to the sequence of residues along the oriented polypeptide chain.
Secondary structure refers to the spatial arrangement of residues that are near each other along the chain.
Tertiary structure refers to the spatial arrangement of residues that are far from each other along the chain.
Quaternary structure refers to the spatial arrangement of subunits of a protein.

A single protein may indeed contain more than one polypeptide chain. Each chain forms what we call a subunit, and quaternary structure addresses questions about their relative position and interaction. The description of quaternary structure includes the rather weak van der Waals forces, which act only between atoms at short distances. Although this force is weak compared to others, its accumulated influence is significant if two subunits have geometrically complementary shapes that permit a large number of atom pairs within the reach of the force. This accumulated effect thus favors interactions between geometrically complementary shapes. In biology, this fact is expressed by saying that the van der Waals force creates specificity in the interaction. That specificity also plays a dominant role in protein-protein and in protein-ligand interactions. A protein typically has a few regions embedded in its surface, so-called active sites, that are specific to interactions with other molecules. While active sites usually occupy only a small fraction of the surface, they decide protein function. Evidence for that claim can be provided by mutating a protein and distinguishing between mutations that preserve and mutations that change the active sites.

Structure determination. Even though proteins are large molecules that typically consist of a few thousand atoms, they are not visible under an electron microscope. How do we then know anything about the structural organization of proteins? The primary source today is x-ray diffraction from protein crystals, but there are others, most notably nuclear magnetic resonance (or NMR) experiments. Both methods are complicated and laborious. We only scratch the surface by explaining the principal steps in the reconstruction of protein structures from x-ray diffractions:

1. Prepare a protein crystal.
2. Expose the crystal to x-ray beams and collect the diffractions.
3. Compute the electron density and from it derive the structure.

The x-ray experiment does not determine the element identities of the atoms, which have to be obtained from the known chemical structure threaded into the density. Since there are probably hundreds of thousands of different proteins, it would be desirable to automate the process. It seems that Step 1 is the main obstacle in reaching this goal,
in part because some proteins are not known to form crystals at all. Step 2 requires an x-ray source, a device to rotate the crystal by small angles ([…] or less), and a detection device. For each angle, we get a two-dimensional picture of diffractions. The three-dimensional electron density is computed from a whole array of such pictures. A typical level surface of an electron density is shown in Figure I.15. The main mathematical tool in the construction of the electron density is the Fourier transform. A fundamental difficulty in this step is that only the amplitudes (intensities) of the waveforms are observable, while the phase information must be obtained by different means.

Figure I.15: The so-called chicken wire representation of a level surface of a three-dimensional density.

Protein data banks. After completing the structural study of a crystallized protein, investigators usually send their results to the Protein Data Bank, which is a public repository of protein structures described in so-called PDB files. At the beginning of each file we find ancillary information, including the header, the name of the protein, the authors, the reference to the corresponding journal article, etc. There is also information about non-standard components and about secondary structure elements. The main body of the file lists the coordinates of the observed atoms. They are always given in an orthonormal coordinate system, in which the length unit is one angstrom. Table I.3 illustrates the format by showing a small portion of a PDB file for hemoglobin, listing the coordinates of the atoms of an arginine residue. Note that there are no hydrogen atoms, since they are too small to be resolved by an x-ray experiment.

ATOM  N    ARG
ATOM  CA   ARG
ATOM  C    ARG
ATOM  O    ARG
ATOM  CB   ARG
ATOM  CG   ARG
ATOM  CD   ARG
ATOM  NE   ARG
ATOM  CZ   ARG
ATOM  NH1  ARG
ATOM  NH2  ARG

Table I.3: Incomplete records of the atoms that belong to an arginine residue. CA is the α-carbon atom, CB the β-carbon, etc.

Bibliographic notes. The Ramachandran plot for realizable bond rotations goes back to work by Ramachandran and Sasisekharan [6]. The α-helix was suggested as a common motif in proteins by Pauling and collaborators in 1951 [4], and in the same year they also identified the β-sheet [3]. This was a few years before these motifs had been observed in x-ray experiments. In the late 1950s, Max Perutz reconstructed the structure of hemoglobin from x-ray diffraction data [5], and John Kendrew did the same for myoglobin. A classic text on the x-ray crystallography method is [2]. The material on x-ray crystallography and PDB files presented in this section is taken from [1].

[1] L. J. BANASZAK. Foundations of Structural Biology. Academic Press, San Diego, California, 2000.
[2] T. BLUNDELL AND L. JOHNSON. Protein Crystallography. Academic Press, New York, 1976.
[3] L. PAULING AND R. B. COREY. Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc. Natl. Acad. Sci. USA 37 (1951), 729–740.
[4] L. PAULING, R. B. COREY AND H. R. BRANSON. The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. USA 37 (1951), 205–211.
[5] M. F. PERUTZ. X-ray analysis of hemoglobin. Les Prix Nobel, Stockholm, 1963.
[6] G. N. RAMACHANDRAN AND V. SASISEKHARAN. Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7 (1963), 95–99.
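The ATOM records excerpted in Table I.3 use fixed character columns rather than separators. The following minimal sketch assumes the column layout of the wwPDB format specification (atom name in columns 13-16, residue name in 18-20, chain identifier in 22, residue sequence number in 23-26, and the x, y, z coordinates in angstroms in columns 31-38, 39-46 and 47-54). The names `AtomRecord`, `parse_atom_record` and `parse_atoms` are hypothetical, and the sample line uses invented coordinates; it is not taken from the hemoglobin file of Table I.3.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    name: str       # atom name, e.g. "CA" for the alpha-carbon
    res_name: str   # three-letter residue name, e.g. "ARG"
    chain: str      # chain identifier
    res_seq: int    # residue sequence number
    x: float        # orthonormal coordinates, unit is one angstrom
    y: float
    z: float


def parse_atom_record(line: str) -> AtomRecord:
    """Parse one ATOM line; slices are the 1-based PDB columns shifted to 0-based."""
    if not line.startswith("ATOM  "):
        raise ValueError("not an ATOM record")
    return AtomRecord(
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def parse_atoms(lines):
    """Keep only ATOM records, skipping the header and all other record types."""
    return [parse_atom_record(l) for l in lines if l.startswith("ATOM  ")]


if __name__ == "__main__":
    # Hypothetical record, assembled field by field so the columns are explicit:
    # record, serial, blank, atom name, altLoc, residue, blank, chain, resSeq,
    # iCode, three blanks, then x, y, z in 8-character fields.
    sample = ("ATOM  " + "    2" + " " + " CA " + " " + "ARG" + " " + "A"
              + "   1" + " " + "   " + "  11.104" + "   6.134" + "  -6.504")
    print(parse_atom_record(sample))
```

Slicing by column, rather than splitting on whitespace, matters because adjacent numeric fields in real files can run into each other (for example, a negative coordinate that fills its whole field).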
I.4 Molecular Mechanics

After a protein has been created by translation, it folds into a shape, or conformation, that is determined by its sequence of residues. The folding process is a reaction to a multitude of forces that simultaneously act on every part of the protein. This section presents some of the current knowledge and efforts to model these forces. We begin by studying atoms and then discuss covalent and non-covalent forces.

Atoms. Each atom has a positively charged massive nucleus, which is surrounded by a cloud of negatively charged electrons. The nucleus consists of protons, each contributing a unit positive charge, and of electrically neutral neutrons. The electrons are held in orbit by electrostatic attraction to the nucleus. Each electron has one unit of negative charge, which exactly neutralizes the positive charge of one proton. In total, we have the same number of protons and electrons and thus an electrically neutral atom, as illustrated in Figure I.16. Different elements consist of atoms with different numbers of protons. The atomic number is by definition the number of protons, which is also the number of electrons. The number of neutrons is usually about the same because too few or too many neutrons destabilize the nucleus. The atomic weight of an atom is the ratio of its mass over the mass of a single hydrogen atom. Because the mass of an electron is negligible, the atomic weight is almost exactly the number of protons plus the number of neutrons.

Figure I.16: A schematic picture of a hydrogen atom to the left and a carbon atom to the right.

Avogadro's number is useful in translating from the minuscule world of single atoms into a humanly more accessible scale. It is the number of hydrogen atoms in one gram of hydrogen, which is roughly $6.02 \times 10^{23}$. The mass of one hydrogen atom is therefore $1/(6.02 \times 10^{23}) \approx 1.66 \times 10^{-24}$ gram which, by definition, is one dalton. One mole of an element consists of Avogadro's number of its atoms. In other words, if the mass of one atom of that element is $m$ daltons then the mass of one mole is $m$ grams. Table I.4 lists properties of elements that are commonly found in organic matter.

element      symbol   #p   #n   electrons per shell
Hydrogen     H         1    0   1
Carbon       C         6    6   2, 4
Nitrogen     N         7    7   2, 5
Oxygen       O         8    8   2, 6
Sodium       Na       11   12   2, 8, 1
Magnesium    Mg       12   12   2, 8, 2
Phosphorus   P        15   16   2, 8, 5
Sulfur       S        16   16   2, 8, 6
Chlorine     Cl       17   18   2, 8, 7
Potassium    K        19   20   2, 8, 8, 1
Calcium      Ca       20   20   2, 8, 8, 2

Table I.4: Some elements together with their numbers of protons (#p), neutrons (#n) and electrons distributed in the shells around the nucleus, listed from the innermost shell outward.

Covalent bonds. According to the Bohr model, electrons live in shells around the nucleus and populate inner shells before using outer ones. The first three shells from inside out can hold up to 2, 8 and 8 electrons, as indicated in Table I.4. The chemical properties of an atom are defined by its tendency to either empty or complete its partially filled outermost shell, if any. One way of doing that is by sharing electrons. The shared electrons complete the outermost non-empty shells of both atoms involved. According to Table I.4, carbon, nitrogen and oxygen need four, three and two electrons to fill their outer shells. As illustrated in Figure I.17, this can for example be done by covalently binding to the same number of hydrogen atoms.

Figure I.17: The geometry of covalent bonding for carbon, nitrogen, and oxygen.

We can now define a molecule as a connected component of the graph whose vertices are the atoms and whose edges are the covalent bonds. When an atom covalently bonds to more than one other atom, then there is a preferred angle between pairs of bonds. For example for carbon, this angle is what we get by connecting the centroid of a regular tetrahedron with two of the vertices. Using elementary geometry we find this angle is $\arccos(-1/3) \approx 109.47^\circ$. Two atoms can also form a covalent double bond, which forces the nuclei closer together and is stronger than the corresponding single bond. It also prevents any torsional rotation around that bond, which is possible for single bonds. We need a sequence of four atoms and three covalent bonds to define the torsional angle of the middle bond. It is generally parametrized such that […] corresponds to the trans (zig-zag) coplanar configuration. For example for H$_3$C-CH$_3$, we have three bonds on each side of the middle bond. There is an energetic preference for staggering the covalent bonds on the two sides, which corresponds to three torsional angles spaced $120^\circ$ apart.

When two atoms that covalently bond are of different types, they generally attract the shared electrons to different degrees. The shared electrons will therefore have a bias towards one end of the structure or the other. We then have a polar structure in which the positive charge is concentrated on one end and the negative charge on the other. Examples of polar covalent bonds are between hydrogen and oxygen and between hydrogen and nitrogen, as illustrated in Figure I.17. In contrast, the bond between hydrogen and carbon attracts the electrons much more equally and is relatively non-polar.

Non-covalent bonds. An atom can also donate an electron to another atom and thus create a complete outer shell. An example is sodium donating the only electron in its third shell to chlorine, which uses it to complete its third shell. As a result we get positively charged sodium cations and negatively charged chloride anions. Both are attracted to each other by electrostatic force and form a regular grid packing, in which each sodium cation is surrounded by six chloride anions, and vice versa. These arrangements are known as table salt. A weaker interaction, also based on electrostatic force, is generated by polar molecules. A prime example is water, which is partially positively charged at the two hydrogen ends. Water molecules thus tend to aggregate in small semi-regular structures, but this force is weak and bonds of this kind are constantly formed and broken. The polarity of water molecules is the basis for the difference between hydrophilic molecules, which are polar and therefore attract water, and hydrophobic molecules, which are non-polar and do not attract water.

Another non-covalent force is responsible for the van der Waals interaction. Experimental observations point to a potential energy function roughly as graphed in Figure I.18. The corresponding force is the negative derivative, which is interpreted as a balance between an attractive and a repulsive force. The attraction is due to a dispersive force that can be explained using quantum mechanics. The repulsion also has a quantum mechanical explanation in terms of the Pauli principle, which prohibits any two electrons from having the same set of quantum numbers.

Figure I.18 (energy plotted against distance): The van der Waals force is obtained by adding the attractive force (derivative of the dashed curve) and the repulsive force (derivative of the dotted curve).

It is useful to keep the relative strengths of the various forces in mind. Table I.5 gives estimates of the amount of energy necessary to break one mole of bonds.

bond type        strength in vacuum   strength in water
covalent               90.0                 90.0
ionic                  90.0                  3.0
hydrogen                4.0                  1.0
van der Waals           0.1                  0.1

Table I.5: Relative strength, measured in kilocalories per mole, necessary to break the bonds. Water molecules interfere with ionic and hydrogen bonds, which are therefore considerably weaker in a solution than in a vacuum.

Force field. To get a handle on how molecules move, we define the potential energy of a system of atoms. The general assumption is that the system develops towards a minimum. To model the potential energy accurately, we would have to work with quantum mechanics, which is beyond the scope of this book and also beyond the capabilities of current computations for large organic molecules. The alternative is molecular mechanics, which uses classical mechanics to model the forces that act on atoms. The
simplest such model sums five contributions to the potential energy, three accounting for covalent bonds and two for non-covalent bonds. We use a vector $X \in \mathbb{R}^{3n}$ to describe the state of a system of $n$ atoms and define the potential energy as a function $U : \mathbb{R}^{3n} \to \mathbb{R}$. In its simplest form, that energy is written as

$$U(X) = \sum_{\text{bonds}} \frac{k_i}{2}\,(\ell_i - \ell_{i,0})^2 + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2 + \sum_{\text{torsions}} \frac{V_p}{2}\,\bigl(1 + \cos(p\,\omega - \gamma)\bigr) + \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon\, r_{ij}} + \sum_{i<j} 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right].$$

This formula contains various constants that depend on the type of atom or interaction involved. We briefly look at each one of the five terms.

Bond length. The first sum approximates the energy penalty for differing from the reference length, $\ell_{i,0}$, by a quadratic function. The strength, $k_i$, is relatively large, namely several hundred kilocalories per mole.

Bond angle. The second sum approximates the energy penalty for differing from the reference angle, $\theta_{i,0}$, again by a quadratic function. The strength, $k_i$, is considerably less than for bond length, namely about one one-hundredth or even less.

Torsional rotation. The third sum approximates the energy for different torsional angles $\omega$ around a bond. Angles that lead to staggered arrangements of bonds at both sides are energetically preferred. This preference is modeled by a cosine function with $p$ minima and the same number of maxima.

Electrostatic interaction. The fourth sum adds the electrostatic potential between every pair of atoms in the system. The constants $q_i$ and $q_j$ are the charges, $\varepsilon$ is the dielectric constant of the medium, and $r_{ij}$ is the distance between the two atoms.

Van der Waals interaction. The fifth sum approximates the van der Waals potential by the Lennard-Jones 12-6 function. The collision constant, $\sigma_{ij}$, marks where the function crosses the zero line, and $-\varepsilon_{ij}$ is the value at the unique minimum. As before, $r_{ij}$ is the distance between the two atoms.

It is clear that $U$ as defined is only a rough approximation of the real potential energy that drives the behavior of the system. Whether or not that approximation suffices depends on what we use it for.

Molecular dynamics. One of the applications of force fields is the simulation of molecular motion. Let $x : \mathbb{R} \to \mathbb{R}^3$ be the trajectory of a point with mass $m$. Its location at time $t$ is $x(t)$, its velocity is $v(t) = \dot{x}(t)$, and its momentum is $p(t) = m\,v(t)$. Recall Newton's three laws of motion:

1. A body continues to move in a straight line at constant velocity unless a force acts upon it.
2. The rate of change of the momentum equals the force.
3. To every action there is an equal and opposing reaction.

The rate of change of the velocity is also referred to as the acceleration, $a(t) = \dot{v}(t) = \ddot{x}(t)$. Newton's second law can now be written as $m\,a(t) = F(x(t))$, where $F(x(t))$ is the force acting upon $x(t)$. Suppose we write the force as the negative gradient of a potential function: $F = -\nabla U$, for some $U : \mathbb{R}^3 \to \mathbb{R}$. Using this notation, Newton's second law is expressed by the differential equation $m\,\ddot{x}(t) = -\nabla U(x(t))$. A trajectory is a solution to this equation. In simple cases, the trajectory can be computed analytically. For example, if the potential is stationary and equal to minus one over the norm, $U(x) = -1/\|x\|$, then $F(x) = -x/\|x\|^3$. In this case, the generic trajectory is an ellipse with one focus at the origin, as illustrated in Figure I.19. Both the gravitational and the electrostatic potentials have this form.

Figure I.19: A generic trajectory when the magnitude of the attraction to the origin is inversely proportional to the squared distance.

The problem in molecular dynamics is significantly more involved. We have $n$ bodies (atoms), and the energy potential and force depend on the momentary locations of all bodies. As before, we represent the collection of $n$ atoms by a point $X \in \mathbb{R}^{3n}$. The energy potential is the function $U : \mathbb{R}^{3n} \to \mathbb{R}$ defined earlier, and the force acting on $X$ is $F(X) = -\nabla U(X)$. Newton's second law of motion can now be written as

$$M\,\ddot{X}(t) = -\nabla U(X(t)),$$

where the mass vector $M$ multiplies each component of the acceleration vector with the mass of the corresponding atom. The classic two-body problem is the special case in which $n = 2$ and $U$ is the sum of the two corresponding gravitational potentials. In this case, the generic trajectories are again ellipses. Already for three bodies, there is no general analytic solution, and one has to resort to numerical methods to approximate the trajectories. The problem in molecular dynamics is even more difficult because the potential function is considerably more complicated than a sum of gravitational potentials. The currently available numerical solutions are inadequate to simulate the entire folding process even for small proteins. One of the difficulties in the simulation is the near cancellation of large forces, so that relatively weak residuals gain a decisive influence. Even small inaccuracies in the model or the computation can lead to false decisions and possibly spoil the entire remainder of the simulation.

Bibliographic notes. The first half of this section is a highly simplified introduction to atoms and bonds. The material on force fields is taken from Leach [4]. The van der Waals potential derives its name from the work of van der Waals, who quantified the deviation of rare gases from ideal gas behavior. The origin of the force is a fluctuation of electrostatic charge in atoms, and we refer to physics texts such as [1, Chapters 19 and 20] for further details. The explanation of the dispersive contribution in terms of quantum mechanics is due to London [5].

Determining the constants needed to parametrize the mathematical formulation of a force field is far from trivial. The definition of the van der Waals radii used to parametrize the Lennard-Jones functions is just one example. There are various approaches to determine these radii. Bondi [2] looks for the distances of closest approach between atoms to determine van der Waals radii. Jorgensen and Tirado-Rives [3] derive parameters in an attempt to reproduce thermodynamic properties in computer simulations. Finally, Tsai et al. [7] analyse the most common distances between atoms in small molecule crystals in the Cambridge Structural Database. Simulating motion with molecular dynamics is an important topic in computational biology. Numerical algorithms for molecular dynamics can be found in Leach [4] and Schlick [6].

[1] N. W. ASHCROFT AND N. D. MERMIN. Solid State Physics. Harcourt Brace, Orlando, Florida, 1976.
[2] A. BONDI. Molecular Crystals, Liquids and Glasses. Wiley, New York, 1968.
[3] W. L. JORGENSEN AND J. TIRADO-RIVES. The OPLS potential functions for proteins. Energy minimization for crystals of cyclic peptides and crambin. J. Amer. Chem. Soc. 110 (1988), 1657–1666.
[4] A. R. LEACH. Molecular Modeling. Principles and Applications. Longman, Harlow, England, 1996.
[5] F. LONDON. Zur Theorie und Systematik der Molekularkräfte. Zeitschrift für Physik 63 (1930), 245–279.
[6] T. SCHLICK. Molecular Modeling and Simulation. Springer-Verlag, New York, 2002.
[7] J. TSAI, R. TAYLOR, C. CHOTHIA AND M. GERSTEIN. The packing density in proteins: standard radii and volumes. J. Mol. Biol. 290 (1999), 253–266.
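The Lennard-Jones term of the force field and Newton's second law $M\ddot{X}(t) = -\nabla U(X(t))$ combine into the smallest possible molecular dynamics experiment. The following is a minimal sketch, not a production integrator: it assumes atoms that interact only through the 12-6 term, with hypothetical parameters `eps = sigma = 1` in arbitrary units, and it advances the equation of motion with the velocity Verlet scheme, which is one common choice among the numerical methods discussed in Leach [4] and Schlick [6]. All function names are illustrative.

```python
import numpy as np


def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 12-6 term: zero at r = sigma, minimum -eps at r = 2**(1/6) * sigma."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)


def lj_forces(X, eps=1.0, sigma=1.0):
    """Forces F = -grad U(X) summed over all atom pairs; X has shape (n, 3)."""
    F = np.zeros_like(X)
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = X[i] - X[j]
            r2 = d @ d
            s6 = (sigma * sigma / r2) ** 3
            # -dU/dr times the unit vector d / r, written in terms of r^2
            f = 24.0 * eps * (2.0 * s6 * s6 - s6) / r2 * d
            F[i] += f
            F[j] -= f  # equal and opposite reaction (Newton's third law)
    return F


def total_energy(X, V, masses, eps=1.0, sigma=1.0):
    """Kinetic energy plus Lennard-Jones potential energy."""
    kinetic = 0.5 * np.sum(masses[:, None] * V * V)
    n = len(X)
    potential = sum(lj_energy(np.linalg.norm(X[i] - X[j]), eps, sigma)
                    for i in range(n) for j in range(i + 1, n))
    return kinetic + potential


def velocity_verlet(X, V, masses, dt, steps, eps=1.0, sigma=1.0):
    """Integrate M X'' = -grad U(X) with the velocity Verlet scheme."""
    X = np.array(X, dtype=float)  # work on copies
    V = np.array(V, dtype=float)
    A = lj_forces(X, eps, sigma) / masses[:, None]
    for _ in range(steps):
        X += V * dt + 0.5 * A * dt * dt
        A_new = lj_forces(X, eps, sigma) / masses[:, None]
        V += 0.5 * (A + A_new) * dt
        A = A_new
    return X, V


if __name__ == "__main__":
    X0 = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])  # two atoms 1.5 sigma apart
    V0 = np.zeros_like(X0)
    m = np.array([1.0, 1.0])
    X1, V1 = velocity_verlet(X0, V0, m, dt=1e-3, steps=2000)
    print(total_energy(X0, V0, m), total_energy(X1, V1, m))
```

The two atoms start at rest outside the minimum and oscillate in the potential well. Velocity Verlet is used here because it is time-reversible and keeps the total energy close to its initial value over long runs; since every pair force is applied with opposite signs to the two atoms, the total momentum also stays zero.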
Exercises

1. Palindromic Sequences. Call a single strand of DNA a palindromic sequence if it is the same as the complementary strand read backwards.
(i) Given a strand, how would you determine whether or not it is a palindromic sequence?
(ii) Give an algorithm that finds the longest subsequence that is palindromic.

2. Counting strings. A double strand of DNA has no preferred direction, but we can orient it so one direction is forward and the other is backward. In either direction, we read the strand in the 5′ to 3′ direction, as usual. Call two linear or cyclic pieces of double-stranded DNA the same if they can be oriented so we read the same string of nucleotides in the two forward directions.
(i) How many different linear pieces of double-stranded DNA of length $n$ are there?
(ii) How many different cyclic pieces of double-stranded DNA of length $n$ are there?
[Beware of palindromic sequences.]

3. Amino Acids. Draw the graph whose nodes are the 15 acyclic amino acids and that has an arc connecting two nodes iff one amino acid can be obtained from the other by the replacement or addition of a single atom.
(i) Is the graph connected?
(ii) Does every connected component have a path that passes through every node exactly once?

4. Lattices. The arrangement of atoms in a folded protein is often compared to that in a crystal lattice. Sketch two such lattices by drawing the atoms as points and connecting neighboring atoms by straight edges.
(i) The face-centered cubic (or FCC) lattice consisting of all points with integer coordinates whose sum is even: $x = (x_1, x_2, x_3) \in \mathbb{Z}^3$ such that $x_1 + x_2 + x_3 \equiv 0 \pmod 2$.
(ii) The body-centered cubic (or BCC) lattice consisting of all points with all even or all odd integer coordinates: $x \in \mathbb{Z}^3$ such that $x_1 \equiv x_2 \equiv x_3 \pmod 2$.

5. Structure Repositories. Descriptions of protein structures are publicly available at the Protein Data Bank (www.rcsb.org/pdb) and the Swiss Bioinformatics Center (expasy.hcuge.ch).
(i) Download a PDB file from either database and extract the string of single-letter abbreviations describing the amino acid sequence.
(ii) Is the relative frequency of amino acids you observe related to the relative number of codons that encode them?

6. Ramachandran Plot. Download a PDB file and extract the sequence of φ and ψ angles along the backbone. Draw the result in the form of a Ramachandran plot.

7. Regular Tetrahedron. A regular tetrahedron has four equilateral triangles as faces, which meet along six equally long edges.
(i) Determine the dihedral angle formed by two faces meeting along a common edge.
(ii) Determine the solid angle formed by three faces meeting at a common vertex.
[By convention, the full dihedral angle is $2\pi$, which is the length of the unit circle, and the full solid angle is $4\pi$, which is the area of the unit sphere.]

8. Elliptic Trajectory. Let the energy potential be defined by $U(x) = -1/\|x\|$. The force it exerts on a point $x$ is $F(x) = -\nabla U(x) = -x/\|x\|^3$. Prove that the generic trajectory in this force field is an ellipse with one focus at the origin.
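Exercise 6 relies on the torsional angle of Section I.4: the backbone angles φ and ψ of a residue are the torsional angles of its N-Cα and Cα-C bonds, each defined by four consecutive backbone atoms (C of the previous residue, N, Cα, C for φ, and N, Cα, C, N of the next residue for ψ). The sketch below computes such an angle from four atom positions. It assumes the IUPAC sign convention, in which the cis configuration is 0°, the trans (zig-zag) configuration is 180°, and the angle is positive when, looking along the middle bond, the first bond has to be turned clockwise to eclipse the last one; other parametrizations differ by a constant shift or a sign. The names `torsion_angle` and `phi_psi` and the input layout (one (N, Cα, C) triple of coordinates per residue) are hypothetical.

```python
import numpy as np


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], of the bond p1-p2 defined by four atoms."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n1 = np.cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = np.cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = np.dot(n1, n2)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples in chain order.

    Returns one (phi, psi) pair per residue; phi is None for the first
    residue and psi is None for the last, where a neighbour is missing.
    """
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = torsion_angle(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = torsion_angle(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles


if __name__ == "__main__":
    # Four points forming a right-angle torsion about the z-axis.
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `arctan2` rather than an arccosine of the normalized dot product recovers the sign of the angle and stays numerically stable near 0° and 180°.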
Chapter II

Geometric Models

A surprising finding in the research on proteins is the importance of geometric shape in their functioning. By and large, the shape seems to determine how proteins interact with each other and with other molecules. This finding is usually expressed as a causal chain of responsibilities:

SEQUENCE → SHAPE → FUNCTION

A protein is a peptide chain of amino acids that folds up and forms a shape. In a natural environment, proteins with the same sequence fold up to the same shape, but this might be a result of evolutionary selection. The details of that shape in terms of its cavities, protrusions, dynamics, and energetics determine how it interacts with other molecules.

At the current stage of our biological knowledge, there is an overwhelming accumulation of sequence information, which is due, in part, to the near completion of several large-scale genome projects. Although the number of proteins for which the three-dimensional structure has been resolved and is stored in the Protein Data Bank is in the thousands, this is only a small fraction of the wealth of available sequence information. The goal of studying the geometry of proteins is therefore two-fold: the development of new computational tools to help determine or refine structure information, and understanding the relationship between shape and function.

In this chapter, we introduce some of the basic geometric models useful in representing molecular shape. In Chapter I, we have seen the view of the biochemist, who aims at pruning the immense variety by limiting attention to physically or chemically likely configurations. The rest of this book takes a complementary view by concentrating on mathematical models and computational data structures that arise in the study of proteins. In Section II.1, we introduce space-filling diagrams as the primary geometric model of molecules. In Section II.2, we use Voronoi diagrams to decompose space-filling diagrams, and in doing so, we develop a language suitable for studying details of our models. In Section II.3, we introduce alpha shapes, which are dual to space-filling diagrams and are our preferred computational representation. Finally, in Section II.4, we talk about the Alpha Shape software and discuss how it can be used.

II.1 Space-filling Diagrams
II.2 Power Diagrams
II.3 Alpha Shapes
II.4 Alpha Shape Software
Exercises
II.1 Space-filling Diagrams

A space-filling diagram associates a molecule with a portion of the three-dimensional space it occupies. The tacit assumption in constructing such a diagram is that the locations of the atoms in three-dimensional space are known. An atom is represented by a ball (a solid sphere) and a molecule is the union of the balls of its atoms. We study such unions first in the plane and then in space.

Union of disks. Let $B$ be a finite set of disks in the Euclidean plane, which we denote as $\mathbb{R}^2$. We specify each disk $B_i \in B$ by its center $z_i \in \mathbb{R}^2$ and its radius $r_i \in \mathbb{R}$. An example is shown in Figure II.1. The union of the disks, $\bigcup B$, has a boundary that consists of circular arcs meeting at common vertices. It is also possible that an arc is an entire circle, which has no endpoints. A single disk can contribute any non-negative number of arcs.

Figure II.1: Union of disks in the plane. Four of the eight disks contribute two arcs each to the boundary.

The total number of arcs is, however, rather limited. If there are $n$ disks whose union is a simply connected region, as in Figure II.1, then the number of arcs cannot exceed […]. Even if we allow more general configurations, we cannot get more than […] arcs. Hints towards proving the […] upper bound can be found among the exercises at the end of this chapter. The […] upper bound is a consequence of the relationship between arcs in the boundary of the union and angles in the Delaunay triangulation, which will be explained in Section II.2.

Rolling circle. We can make the boundary of the disk union smoother by substituting blending curves for the vertices where the circular arcs meet. To this end we roll a circle of radius $r$ on the outside about the boundary. At any moment during the motion, the circle touches the boundary but never intersects the interior. The center of the circle thus traces out a curve at distance $r$ away from the boundary. This curve is the boundary of $\bigcup B(r)$, obtained by growing every disk $B_i$ to radius $r_i + r$. The construction is illustrated in Figure II.2. The front of the rolling circle describes the rounded boundary, which consists of convex and reflex circular arcs. More formally, this new curve is the boundary of the portion of $\bigcup B(r)$ that is not covered by any placement of the open disk bounded by the rolling circle. We can imagine creating that portion with a milling machine whose material-removing stylus has the shape of the rolling circle.

Figure II.2: On the outside, the boundary of the union of uniformly grown disks, and on the inside the rounded boundary of the original union.

We note that the rounded boundary of $\bigcup B$ is by and large tangent continuous but can have cusps at places where the rolling circle cannot quite squeeze through between two disks. There are no cusps in Figure II.2, but there would be if the two disks to the lower left were just a little smaller. In cases where tangent continuity is important, we may turn the cusps into crossings by adding arcs connecting the cusps. We thus obtain a tangent continuous immersion of a curve in $\mathbb{R}^2$.

Union of balls. Let now $B$ be a finite set of balls (solid spheres) in three-dimensional Euclidean space, which we denote as $\mathbb{R}^3$. Similar to the two-dimensional case, we specify each ball $B_i \in B$ by its center $z_i \in \mathbb{R}^3$ and its radius $r_i \in \mathbb{R}$. Figure II.3 shows the union of balls that represent gramicidin, which is a small protein of barely more than 300 atoms. To understand the structure of the boundary of the union, $\bigcup B$, we study the portion contributed by a single sphere. The sphere bounding $B_i$ intersects the other balls in a finite collection of caps. The interior of each cap lies in the interior of the union, and the portion of the sphere not covered by any cap is the
II.1 Space-filling Diagrams 19

cally tight. However, the numbers for well packed sets of


spheres, which are common for proteins, are much smaller
and typically only a constant times .


Rolling sphere. We can again get a smoother bound-
ary by rolling a sphere of radius about 
. The cen-
ter of that sphere moves along the boundary of the union
of grown balls,   , and its front sweeps out blend-
ing surfaces that cover cusps and crevices of the origi-
nal boundary. Figure II.4 shows such a rounded surface

spheres in Figure II.3 have radii


  
representation of gramicidin. Relative to that surface, the
. There are convex
sphere patches that correspond to faces of  
, reflex
torus patches that correspond to arcs of  
, and reflex
sphere patches that correspond to vertices of  
. The
Figure II.3: A union of balls representation of the gramicidin union of convex patches is sometimes referred to as the
contact surface because that is where the rolling sphere
protein.
touches  . Similarly, the union of reflex patches (tori
and spheres) is referred to as the re-entrant surface. When
we look carefully, can can detect a self-intersection of the
contribution of the sphere to the boundary of the union. surface in Figure II.4. There is a hole whose rounded sur-
The caps form the same structure as the disks discussed face penetrates through the outer surface roughly in the

earlier, only that they live on a (two-dimensional) sphere middle of the picture. This happens because the tunnel
instead of  . The structural description of a finite union connecting the hole to the outside is slightly too narrow
of balls is thus recursive in the dimension. The same type for the rolling sphere to squeeze through.
of symmetry can also be observed in dimensions beyond
three.

  and vertices in the boundary of a


The number of arcs

 
union of balls in can be quite a bit higher than the
same numbers for a union of disks in  . To count the


faces, arcs and vertices, we first note that a single sphere
intersects the other balls in fewer than caps. By analogy

to disks in the plane, the number of arcs in the bound-
ary of the union of caps is less than . Since each arc 
has at most two endpoints (if it is a full circle then it has
 vertices. To count the faces

no endpoints) and each endpoint belongs to two arcs, we
also have no more than
contributed by our sphere, we recall that these are the
connected components of the complement of the union of
caps. We will see that these components are related to the


triangles of the Delaunay triangulation, which implies that
Figure II.4: A molecular surface representation of the gramicidin
there are fewer than  faces on this one sphere. To get protein.


bounds on the total number of faces, arcs and vertices, we
multiply by and note that each arc belongs to at least


two and each vertex belongs to at least three spheres. We In the application of space-filling diagrams to biology,
conclude that there are fewer than   faces, fewer than 
@   
the radii of the balls are usually the van der Waals radii
  arcs, and fewer than   vertices. It can be shown
 
of the atoms, and the boundary of is referred to as


that for each value of , there are configurations of balls the van der Waals surface. The radius is chosen so that
with at least some constant times  faces, edges and ver- the rolling sphere approximates a water molecule, and the
tices. This shows that the upper bounds are asymptoti- boundary of   is referred to as the solvent accessible
20 II G EOMETRIC M ODELS


surface. The rounded surface is usually referred to as the molecular surface.

Uniform growth. The boundaries of $\bigcup \mathcal{B}$ and of $\bigcup \mathcal{B}_r$ do not necessarily have the same combinatorial structure. We can understand structural changes by observing how they are introduced while we continuously grow the balls. Each face of the boundary sweeps out a (three-dimensional) cell in $\mathbb{R}^3$, each arc sweeps out a (two-dimensional) membrane separating two cells, and each vertex sweeps out a curved edge in the common boundary of generically three membranes and three cells.

We describe the same complex as a Voronoi diagram of the set of points $z_i$ with weights $r_i$. Define the weighted distance of a point $x$ from $z_i$ as the Euclidean distance minus the weight: $\pi_i(x) = \|x - z_i\| - r_i$. The Voronoi cell of $z_i$ is the set of points at least as close to $z_i$ as to any other weighted point,
$$V_i = \{x \in \mathbb{R}^3 \mid \pi_i(x) \le \pi_j(x) \text{ for all } j\}.$$
Figure II.5 illustrates the definition in two dimensions. Consider the case of two weighted points, $z_1$ and $z_2$, and let $V_{12}$ be the set of points $x$ with $\pi_1(x) \le \pi_2(x)$. If one ball is contained in the interior of the other then its cell is empty. Otherwise, we have two non-empty cells separated by a two-dimensional membrane. The points of this membrane satisfy
$$\|x - z_1\| - \|x - z_2\| = r_1 - r_2,$$
which is the equation of one sheet of a two-sheeted hyperboloid. Observe that for every point $x \in V_{12}$, the line segment connecting $x$ and $z_1$ lies entirely in $V_{12}$. In geometry, this property is expressed by saying that $V_{12}$ is star-shaped and that $z_1$ lies in its kernel. Since $V_1$ is the common intersection of the $V_{1j}$, $j \ne 1$, this implies that $V_1$ is also star-shaped and that $z_1$ also lies in its kernel. It follows in particular that $V_1$ is a connected cell.

Since the membranes bounding the $V_i$ are all sheets of two-sheeted hyperboloids, the boundary of $V_i$ consists of patches of such hyperboloids. All these patches are visible in their entirety if viewed from $z_i$. We get the boundary of $\bigcup \mathcal{B}$ by drawing the sphere bounding each ball $B_i$ only inside its own Voronoi cell, which is $V_i$. By construction, the arcs of the patches meet up in pairs along the membranes and in triplets along the curved edges of the Voronoi diagram. The same is true for $\bigcup \mathcal{B}_r$ and every $r$. We can now see how structural differences between $\bigcup \mathcal{B}$ and $\bigcup \mathcal{B}_r$ arise: when we grow the balls, the boundary of the union sweeps out the Voronoi diagram, and we get a structural re-arrangement whenever we sweep over a vertex of the Voronoi diagram.

Figure II.5: Two-dimensional Voronoi diagram generated by uniformly growing the disks.

Bibliographic notes. Space-filling diagrams have a long tradition in biochemistry and are similar to the CPK mechanical models named after Corey, Pauling and Koltun [5, Chapter 1]. The variations of these models discussed in this section were introduced by Lee and Richards [6, 7]. The molecular surface is sometimes referred to as the Connolly surface, named after Michael Connolly who wrote early software constructing this surface [3]. The solvent accessible surface in Figure II.3 and the molecular surface in Figure II.4 are computed using the software described in [1].

Increasing all radii of a set of circles or spheres continuously and at the same rate is referred to as the Johnson-Mehl model of growth [4]. It leads to the Voronoi diagram of this section, which is sometimes referred to as the additively weighted Voronoi diagram. We refer to Aurenhammer [2] for a survey of Voronoi diagrams, their algorithms and applications. An algorithm that computes cells of the additively weighted Voronoi diagram in $\mathbb{R}^3$ has been developed and implemented by Will [8].

[1] N. Akkiraju, H. Edelsbrunner, P. Fu and J. Qian. Viewing geometric protein structures from inside a CAVE. IEEE Comput. Graphics Appl. 16 (1996), 58–61.

[2] F. Aurenhammer. Voronoi diagrams — a study of a fundamental geometric data structure. ACM Comput. Surveys 23 (1991), 345–405.

[3] M. L. Connolly. Analytic molecular surface calculation. J. Appl. Crystallogr. 6 (1983), 548–558.

[4] W. A. Johnson and R. F. Mehl. Reaction kinetics in processes of nucleation and growth. Trans. Am. Inst. Mining Metall. AIMME 135 (1939), 416–458.

[5] A. R. Leach. Molecular Modeling. Principles and Applications. Longman, Harlow, England, 1996.

[6] B. Lee and F. M. Richards. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55 (1971), 379–400.

[7] F. M. Richards. Areas, volumes, packing and protein structures. Ann. Rev. Biophys. Bioeng. 6 (1977), 151–176.

[8] H.-M. Will. Computation of Additively Weighted Voronoi Cells for Applications in Molecular Biology. Diss. ETH 13188, ETH Zürich, Switzerland, 1999.
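The additively weighted distance of Section II.1 is easy to evaluate directly. The following Python sketch is a brute-force illustration, not the algorithm of Will [8]; it assigns a query point $x$ to the ball $B_i = B(z_i, r_i)$ that minimizes $\|x - z_i\| - r_i$, and the function names `weighted_distance`, `cell_index` and `cover_time` are our own. Under uniform growth, where every radius increases by the same amount $t$, the minimum weighted distance is also the time at which $x$ is first covered by the union (negative if $x$ already lies inside some ball).

```python
import math

def weighted_distance(x, z, r):
    """Additively weighted distance: Euclidean distance to z minus the weight r."""
    return math.dist(x, z) - r

def cell_index(x, centers, radii):
    """Index i of the additively weighted Voronoi cell V_i containing x.

    Points on a membrane (ties) are assigned to the smallest index."""
    return min(range(len(centers)),
               key=lambda i: weighted_distance(x, centers[i], radii[i]))

def cover_time(x, centers, radii):
    """Growth amount t at which x is first covered when all radii grow by t."""
    return min(weighted_distance(x, z, r) for z, r in zip(centers, radii))

centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
radii = [2.0, 1.0]
# On the segment from z_1 to z_2 the membrane |x - z_1| - |x - z_2| = r_1 - r_2 = 1
# is crossed at (3, 0, 0); points beyond it belong to the second cell.
print(cell_index((3.5, 0.0, 0.0), centers, radii))   # 1
print(cover_time((3.5, 0.0, 0.0), centers, radii))   # 0.5
```

The brute-force minimum costs time proportional to the number of balls per query; it only serves to make the definition of the cells $V_i$ concrete.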

II.2 Power Diagrams

If we grow the square radii of a finite collection of spheres or balls, we get a decomposition of space into convex polyhedra. This decomposition is known as the power diagram and has a variety of applications in molecular modeling.

Growing square radii. As in Section II.1, we let $\mathcal{B}$ be a finite set of balls $B_i = B(z_i, r_i)$. The square of the radius, $w_i = r_i^2$, is sometimes referred to as the weight of the point $z_i$. We grow each ball to radius $r_i(t) = \sqrt{r_i^2 + t}$ at time $t$. The set of balls at time $t$ is denoted as $\mathcal{B}(t)$. The Taylor series expansion of the radius as a function of time is
$$r_i(t) = r_i + \frac{t}{2r_i} - \frac{t^2}{8r_i^3} + \frac{t^3}{16r_i^5} - \cdots$$
The first order approximation of the growth is one half the inverse of the radius. Hence, larger balls grow slower than smaller ones. Of course, smaller balls never really catch up except in the limit:
$$\lim_{t \to \infty} \big(r_i(t) - r_j(t)\big) = 0.$$
We are interested in the surface swept out by the intersection of the spheres bounding $B_i(t)$ and $B_j(t)$ and claim it is a plane. The points $x$ that belong to both spheres at time $t$ satisfy
$$\|x - z_i\|^2 - r_i^2 - t = 0 \quad\text{and}\quad \|x - z_j\|^2 - r_j^2 - t = 0.$$
Varying $t$ has the same effect as dropping the requirement that the two expressions vanish. Instead we just require that they both be equal, so we get
$$\|x - z_i\|^2 - r_i^2 = \|x - z_j\|^2 - r_j^2,$$
which simplifies to the linear equation $2\langle x, z_j - z_i\rangle = \|z_j\|^2 - r_j^2 - \|z_i\|^2 + r_i^2$. We see that the circle at which the two spheres intersect sweeps out a plane. It follows that the membranes swept out by the arcs of the boundary of $\bigcup \mathcal{B}(t)$ are pieces of planes.

Power distance. We can describe the decomposition of space implied by the square radius growth model as a Voronoi diagram for yet another weighted distance function. The appropriate function in this case is the power distance of a point $x$ from a ball $B_i$, defined as the square distance from the center minus the weight, $\pi_i(x) = \|x - z_i\|^2 - r_i^2$. We have
$$\pi_i(x) \begin{cases} < 0 & \text{if } x \text{ lies inside } B_i, \\ = 0 & \text{if } x \text{ lies on the boundary of } B_i, \\ > 0 & \text{if } x \text{ lies outside } B_i. \end{cases}$$
If $x$ lies outside $B_i$, the power distance of $x$ is the square length of a tangent line segment from $x$ to the bounding sphere. Using the same algebraic manipulations as above, we can show that the set of points with equal power distance from two balls forms a plane. The two planes are indeed the same. As indicated in Figure II.6, this plane may separate the two bounding spheres, intersect both, or lie on the same side of both. Think of the three configurations as snap-shots in an animation in which the center of the small circle moves towards the center of the large circle. At first, the line moves in the same direction but then comes to a halt and reverses its direction, moving away from the center of the large circle.

Figure II.6: The line of equal power distance separates the two circles if they are disjoint and not nested, it passes through their intersection if that is non-empty, and it passes outside both circles if they are nested.

Power diagram. The power or (weighted) Voronoi cell of a ball $B_i$ is the set of points at least as close to $B_i$ as to any other ball, measured by the power distance,
$$V_i = \{x \in \mathbb{R}^3 \mid \pi_i(x) \le \pi_j(x) \text{ for all } j\}.$$
If we denote by $H_{ij}$ the set of points whose power distance from $B_i$ is at most as large as the power distance from $B_j$, then $V_i = \bigcap_{j \ne i} H_{ij}$. In words, $V_i$ is the intersection of a finite number of half-spaces and thus a convex polyhedron. This polyhedron may be bounded or unbounded, and it is even possible that it is empty. The power or (weighted) Voronoi diagram of $\mathcal{B}$ is the collection of cells $V_i$ together with the polygons, edges, and vertices shared by the cells. Every polygon is shared by two cells, and in the generic case every edge is shared by exactly three and every vertex is shared by exactly four cells. Figure II.7 illustrates the definitions in two dimensions by showing the Voronoi diagram of the same eight disks used in earlier figures.
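The power distance and the plane of equal power distance can be checked numerically. The Python sketch below, with function names of our own choosing, evaluates $\pi_i(x) = \|x - z_i\|^2 - r_i^2$, finds the power cell containing a query point by brute force, and returns the coefficients of the plane $2\langle x, z_j - z_i\rangle = \|z_j\|^2 - r_j^2 - \|z_i\|^2 + r_i^2$ obtained by equating two power distances. It illustrates the definitions and is not an efficient construction of the power diagram.

```python
def power_distance(x, z, r):
    """Square distance from the center z minus the weight r**2."""
    return sum((a - b) ** 2 for a, b in zip(x, z)) - r * r

def power_cell_index(x, centers, radii):
    """Index i of the power cell V_i containing x (ties go to the smallest index)."""
    return min(range(len(centers)),
               key=lambda i: power_distance(x, centers[i], radii[i]))

def equal_power_plane(zi, ri, zj, rj):
    """Return (a, b) such that the plane of equal power distance is <a, x> = b."""
    a = tuple(2 * (q - p) for p, q in zip(zi, zj))
    b = sum(q * q for q in zj) - rj * rj - sum(p * p for p in zi) + ri * ri
    return a, b

centers, radii = [(0, 0, 0), (4, 0, 0)], [3, 1]
print(equal_power_plane(centers[0], radii[0], centers[1], radii[1]))  # ((8, 0, 0), 24)
# A point outside the ball: power distance = squared tangent length, here 4**2.
print(power_distance((5, 0, 0), (0, 0, 0), 3))  # 16
```

In this example the plane is $8x_1 = 24$, that is $x_1 = 3$, so `power_cell_index((2, 0, 0), centers, radii)` returns 0 and `power_cell_index((3.5, 0, 0), centers, radii)` returns 1.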

Figure II.7: Power or weighted Voronoi diagram of eight disks in the plane.

Delaunay triangulation. The (weighted) Delaunay triangulation of $\mathcal{B}$ is dual to the (weighted) Voronoi diagram. It is obtained by connecting $z_i$ and $z_j$ by an edge if the cells $V_i$ and $V_j$ share a common polygon. Similarly, $z_i$, $z_j$ and $z_k$ are connected by a triangle if $V_i$, $V_j$ and $V_k$ share a common edge, and $z_i$, $z_j$, $z_k$ and $z_l$ are connected by a tetrahedron if $V_i$, $V_j$, $V_k$ and $V_l$ share a common vertex. Assuming the balls in $\mathcal{B}$ are in general position, this exhausts all possible types of overlap among the Voronoi cells. Since complexes of tetrahedra are difficult to draw, we illustrate the definitions by showing a two-dimensional Delaunay triangulation in Figure II.8. If the balls are not in general position, we can perturb them ever so slightly to move them into general position.

Figure II.8: Delaunay triangulation drawn over the dual Voronoi diagram of eight disks in the plane. The Delaunay triangles are transparent so they do not obstruct the structure of the Voronoi diagram underneath.

Observe that we reverse dimensions when we go from the Voronoi diagram to the Delaunay triangulation: cells become vertices, polygons become edges, edges become triangles, and vertices become tetrahedra. Similarly, we reverse the inclusion direction. For example, a Voronoi polygon belongs to a Voronoi cell iff the corresponding Delaunay edge contains the corresponding Delaunay vertex.

Number of simplices. We refer to an element of a Delaunay triangulation as a simplex, which can be a vertex, an edge, a triangle or a tetrahedron. We can count the simplices using the Euler relation, which says that the alternating sum of simplices is always equal to 1. Writing $n_0$, $n_1$, $n_2$ and $n_3$ for the numbers of vertices, edges, triangles and tetrahedra, we have
$$n_0 - n_1 + n_2 - n_3 = 1.$$
Before counting the simplices in three dimensions, let us warm up to the challenge by counting the simplices of a two-dimensional Delaunay triangulation. The Euler relation here is $n_0 - n_1 + n_2 = 1$. Observe that every triangle has three edges and every edge belongs to at most two triangles, hence $3n_2 \le 2n_1$. Combining this inequality with the Euler relation implies $n_1 \le 3n_0 - 3$ and $n_2 \le 2n_0 - 2$. The number of vertices is at most the number of disks, hence $n_1 \le 3n - 3$ and $n_2 \le 2n - 2$.

In three dimensions, we note that each tetrahedron has four triangles and each triangle belongs to at most two tetrahedra, hence $4n_3 \le 2n_2$. Combining this with the Euler relation implies $n_2 \le 2n_1 - 2n_0 + 2$ and $n_3 \le n_1 - n_0 + 1$. The number of vertices is at most the number of balls, $n_0 \le n$, and the number of edges is at most the number of pairs of vertices, $n_1 \le \binom{n_0}{2}$. Hence
$$n_2 \le (n-1)(n-2) \quad\text{and}\quad n_3 \le \frac{(n-1)(n-2)}{2}.$$
There are Delaunay triangulations that have almost this many simplices, but they require a placement of the balls that would be rather unlike the configurations we observe for proteins. Typically, each atom is surrounded by its neighbors in the Delaunay triangulation. The neighbors are near the central atom and are therefore packed in a small amount of space, implying there can only be a small constant number of them. It follows that the number of edges in the Delaunay triangulation is at most some constant times $n$, and as a consequence, the numbers of triangles and tetrahedra are also at most some constant times $n$.
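The Euler-relation argument can be checked on small examples. The Python sketch below (the helper names are ours) counts the vertices, edges and triangles, written $n_0$, $n_1$ and $n_2$, of a planar triangulation given as a list of vertex triples, and evaluates the upper bounds that follow from $n_0 - n_1 + n_2 = 1$ with $3n_2 \le 2n_1$ in the plane, and from $n_0 - n_1 + n_2 - n_3 = 1$ with $4n_3 \le 2n_2$ and $n_1 \le \binom{n}{2}$ in space. It assumes the input triangulates a disk, so that the planar Euler relation applies; it does not build a Delaunay triangulation.

```python
from itertools import combinations

def count_simplices(triangles):
    """Return (n0, n1, n2) for a triangulation given as vertex-index triples."""
    vertices, edges = set(), set()
    for tri in triangles:
        vertices.update(tri)
        edges.update(frozenset(e) for e in combinations(tri, 2))
    return len(vertices), len(edges), len({frozenset(t) for t in triangles})

def bounds_2d(n):
    """Bounds n1 <= 3n - 3 and n2 <= 2n - 2 for a planar triangulation on n vertices."""
    return 3 * n - 3, 2 * n - 2

def bounds_3d(n):
    """Bounds on the numbers of edges, triangles and tetrahedra for n vertices in R^3."""
    return n * (n - 1) // 2, (n - 1) * (n - 2), (n - 1) * (n - 2) // 2

# A square split into four triangles around a center vertex 4.
square = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
n0, n1, n2 = count_simplices(square)
print(n0, n1, n2, n0 - n1 + n2)   # 5 8 4 1
print(bounds_2d(n0))              # (12, 8)
```

The product $(n-1)(n-2)$ is always even, so the integer division in `bounds_3d` is exact.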

Orthospheres. Suppose for a moment that the balls $B_i$ all have zero radius. Then each Voronoi vertex is equally far from four points $z_i$ and coincides with the center of the circumsphere of these points. We will use the concept of orthogonality to generalize this property to the case where the $B_i$ have not necessarily zero and not necessarily equal radii. Two spheres or balls $B_i = B(z_i, r_i)$ and $B_j = B(z_j, r_j)$ are orthogonal if
$$\|z_i - z_j\|^2 = r_i^2 + r_j^2.$$
The name is justified because the two tangent planes defined at any point common to the bounding spheres of $B_i$ and $B_j$ form a right angle between them.

Let now $x$ be a vertex of the Voronoi diagram of $\mathcal{B}$. Assuming the generic case, $x$ has equal power distance from four balls, $B_i$, $B_j$, $B_k$ and $B_l$, and larger power distance from all others. Let $S$ be the sphere with center $x$ and weight $w = \pi_i(x) = \pi_j(x) = \pi_k(x) = \pi_l(x)$. Algebraically, there is no difficulty at all if $w$ is negative and the radius $\sqrt{w}$ is therefore imaginary. That sphere is orthogonal to $B_i$, $B_j$, $B_k$ and $B_l$, and we refer to it as the orthosphere of the four balls. If the four balls had zero radius, $S$ would be their circumsphere. Note that $S$ is further than orthogonal from all other balls, that is, $\|x - z_m\|^2 > w + r_m^2$ for all $m \ne i, j, k, l$. This property can be used to characterize Delaunay tetrahedra for a generic set of balls. Specifically, a tetrahedron connecting points $z_i$, $z_j$, $z_k$ and $z_l$ belongs to the Delaunay triangulation of $\mathcal{B}$ iff the orthosphere of $B_i$, $B_j$, $B_k$ and $B_l$ is further than orthogonal from all other balls in $\mathcal{B}$.

Acyclicity. Given a fixed viewpoint, we can order two tetrahedra if one lies in front of the other, as seen from the viewpoint. We call this the visibility ordering with respect to the given viewpoint. It turns out that this relation can in general have cycles but is acyclic for Delaunay triangulations. We need some notation. Let $y$ be the viewpoint and write $\tau \prec \sigma$ if there is a half-line that emanates from $y$ and passes through the interior of the Delaunay tetrahedron $\tau$ before it passes through the interior of the Delaunay tetrahedron $\sigma$. We use orthospheres to prove that the relation is acyclic.

ACYCLICITY LEMMA. The visibility ordering of the Delaunay tetrahedra with respect to any fixed viewpoint is acyclic.

PROOF. Let $L$ be a half-line that emanates from $y$ and passes through the interiors of $\tau$ and $\sigma$. We may assume that $L$ does not intersect any edge of the Delaunay triangulation. The half-line passes through a sequence of Delaunay tetrahedra, $\tau_0, \tau_1, \ldots, \tau_m$, and we have $\tau = \tau_a$ and $\sigma = \tau_b$ for some $a < b$. Any two consecutive tetrahedra share a triangle. It follows that the orthospheres $S_{c-1}$ of $\tau_{c-1}$ and $S_c$ of $\tau_c$ are orthogonal to the three balls whose centers span that triangle. Each of these three centers thus has the same power distance from $S_{c-1}$ as from $S_c$, so the plane of points with equal power distance from $S_{c-1}$ and $S_c$ contains the shared triangle. The viewpoint is on $\tau_{c-1}$'s side of that plane, which implies that the power distance of $y$ from $S_{c-1}$ is less than that from $S_c$. By transitivity, the power distance of $y$ from the orthosphere of $\tau_a$ is less than its power distance from the orthosphere of $\tau_b$ whenever $a < b$, and in particular the same is true for $\tau$ and $\sigma$. In other words, the power distance increases along chains of the relation $\prec$. Since real numbers are totally ordered, we conclude that $\prec$ is acyclic.

Bibliographic notes. Power diagrams of discrete sets of weighted points were studied by Carl Friedrich Gauss more than 150 years ago in the context of quadratic forms [6]. In reference to subsequent work by Dirichlet [3] and Voronoi [8], these diagrams are often referred to as Dirichlet tessellations or Voronoi diagrams. The dual triangulations were introduced considerably later by Boris Delaunay (also Delone) [2]. It is common to reserve the name Delaunay triangulation for unweighted points and to refer to the duals of power diagrams as regular triangulations [1] or coherent triangulations [7]. We prefer to be economical with terms and refer to them as (weighted) Delaunay triangulations. Algorithms for constructing weighted Delaunay triangulations in $\mathbb{R}^2$ and $\mathbb{R}^3$ are discussed in [4, Chapters I and V]. That reference also explains how to computationally cope with ambiguities in the construction caused by non-generic input sets. Upper bounds on the number of Delaunay simplices for "well-spaced" points in $\mathbb{R}^3$ can be found in [5].

[1] L. J. Billera and B. Sturmfels. Fiber polytopes. Ann. Math. 135 (1992), 527–549.

[2] B. Delaunay. Sur la sphère vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7 (1934), 793–800.

[3] P. G. L. Dirichlet. Über die Reduktion der positiven quadratischen Formen mit drei unbestimmten ganzen Zahlen. J. Reine Angew. Math. 40 (1850), 209–227.

[4] H. Edelsbrunner. Geometry and Topology for Mesh Generation. Cambridge Univ. Press, England, 2001.

[5] J. Erickson. Dense point sets have sparse Delaunay triangulations. In "Proc. 13th Ann. ACM-SIAM Sympos. Discrete Alg., 2002", 125–134.

[6] C. F. Gauss. Recursion der Untersuchungen über die Eigenschaften der positiven ternären quadratischen Formen von Ludwig August Seeber. J. Reine Angew. Math. 20 (1840), 312–320.

[7] I. M. Gelfand, M. M. Kapranov and A. V. Zelevinsky. Discriminants, Resultants and Multidimensional Determinants. Birkhäuser, Boston, 1994.

[8] G. Voronoi. Nouvelles applications des paramètres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 133 (1907), 97–178, and 134 (1908), 198–287.
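The orthosphere of four balls can be computed from its defining equations $\|x - z_k\|^2 - r_k^2 = w$: subtracting the first equation from the other three leaves a $3 \times 3$ linear system for the center $x$, after which $w$ follows from any one equation. The Python sketch below solves that system with Cramer's rule and then tests whether another ball is further than orthogonal, which is the criterion for Delaunay tetrahedra stated in Section II.2. The function names are our own, plain floating-point arithmetic is used, and coplanar centers simply raise an error, so this is an illustration rather than a robust geometric predicate.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def orthosphere(centers, radii):
    """Center x and weight w of the sphere orthogonal to four balls B(z_k, r_k).

    The weight w = |x - z_k|^2 - r_k^2 may be negative (imaginary radius)."""
    z0, r0 = centers[0], radii[0]
    rows, rhs = [], []
    for z, r in zip(centers[1:], radii[1:]):
        # 2 <x, z - z0> = |z|^2 - r^2 - |z0|^2 + r0^2
        rows.append([2 * (p - q) for p, q in zip(z, z0)])
        rhs.append(sum(p * p for p in z) - r * r - sum(q * q for q in z0) + r0 * r0)
    d = det3(rows)
    if d == 0:
        raise ValueError("the four centers are coplanar")
    # Cramer's rule: replace column c by the right-hand side, one coordinate at a time.
    x = tuple(det3([row[:c] + [rhs[k]] + row[c + 1:] for k, row in enumerate(rows)]) / d
              for c in range(3))
    w = sum((a - b) ** 2 for a, b in zip(x, z0)) - r0 * r0
    return x, w

def further_than_orthogonal(x, w, z, r):
    """True iff the ball B(z, r) is further than orthogonal from the sphere (x, w)."""
    return sum((a - b) ** 2 for a, b in zip(x, z)) > w + r * r

# Zero radii: the orthosphere is the circumsphere of the four points.
x, w = orthosphere([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], [0, 0, 0, 0])
print(x, w)                                          # (0.5, 0.5, 0.5) 0.75
print(further_than_orthogonal(x, w, (2, 2, 2), 0))   # True
```

A point on the circumsphere, such as $(1, 1, 1)$ in this example, is exactly orthogonal and therefore not further than orthogonal; in a generic input such ties do not occur.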

II.3 Alpha Shapes

Recall that the Delaunay triangulation is the dual of the Voronoi diagram. In this section, we generalize this construction and consider the dual of the Voronoi diagram restricted to within the union of the defining balls.

Dual complex. Observe that the Voronoi cells decompose the union of balls in $\mathcal{B}$ into convex cells $R_i = B_i \cap V_i$. Let $J \subseteq \{1, 2, \ldots, n\}$ be a subset of the index set. The dual complex records the non-empty common intersections among these cells,
$$K = \Big\{\operatorname{conv} z_J \;\Big|\; \bigcap_{j \in J} R_j \ne \emptyset\Big\},$$
where $\operatorname{conv} z_J$ is the convex hull of the centers of the balls with index in $J$. Equivalently, $\operatorname{conv} z_J \in K$ iff the common intersection of Voronoi cells has a non-empty intersection with the union of balls: $\bigcap_{j \in J} V_j \cap \bigcup \mathcal{B} \ne \emptyset$. Note that this is just a more formal way of explaining the duality transformation we used in the last section to construct the Delaunay triangulation from the Voronoi diagram. The underlying space is the set of points contained in simplices of $K$. In this context, we refer to it as the dual shape of $\mathcal{B}$. Figure II.9 illustrates the definition for the set of disks used in many of the previous figures.

Figure II.9: The dual complex is drawn on top of the Voronoi decomposition of the union of disks. The nine edges correspond to the pairwise intersections and the two triangles to the triple-wise intersections of the clipped Voronoi cells.

In the special case in which the balls have non-empty pairwise but no non-empty triple-wise intersections, $K$ looks like the ball-and-stick diagram common in chemistry and biology. There, each stick represents a covalent bond, while here, it represents the geometric overlap between two balls.

Independence. Recall that a simplex belongs to the dual complex iff the corresponding clipped balls (the $R_i$) have a non-empty common intersection. This condition has an interesting consequence on how the $B_i$ themselves may intersect. In a nutshell, there can be at most four balls (one more than the dimension of the space), and they can form only one combinatorially distinct intersection pattern. We first discuss this pattern for general sets that are not necessarily balls. Call a collection of sets $X$ independent if for every subcollection $Y \subseteq X$ there is a point inside every set in $Y$ and outside every set not in $Y$:
$$\bigcap Y - \bigcup (X - Y) \ne \emptyset.$$
A collection of size $k$ has $2^k$ subcollections. For this collection to be independent, there must be $2^k$ points whose patterns of inclusion in the sets are pairwise different. We use the pigeonhole principle to show that the maximum number of independent disks in the plane is three. Let $f(k)$ be the maximum number of regions we can get by drawing $k$ circles in the plane. We have $f(1) = 2$ and $f(k+1) \le f(k) + 2k$ because the $(k+1)$-st circle intersects the other circles in at most two points each. These points cut the $(k+1)$-st circle into at most $2k$ arcs, and each arc cuts at most one region into two. The number of regions is therefore
$$f(k) \le 2 + \sum_{i=1}^{k-1} 2i = k^2 - k + 2.$$
Hence, $f(4) \le 14 < 16 = 2^4$, which implies that at most three disks can be independent. For each $k \le 3$ there is a (combinatorially) unique independent configuration shown in Figure II.10. The same argument also works in three dimensions, where it can be used to show that the maximum number of independent balls is four. Again, there is only one possible intersection pattern for four independent balls.

Figure II.10: The independent configurations of one, two, and three disks in the plane.
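The pigeonhole argument can be checked numerically. The Python sketch below evaluates the bound $f(k) \le k^2 - k + 2$ and, as a purely heuristic complement, samples a grid of points to collect the inclusion patterns realized by a few disks. The function names and the default grid step are our own assumptions; a negative answer from the sampling can be an artifact when some region is very small, and it is the bound, not the sampling, that proves four disks are never independent.

```python
def max_regions(k):
    """Upper bound f(k) <= k^2 - k + 2 on the regions formed by k >= 1 circles."""
    return k * k - k + 2

def sampled_patterns(disks, step=0.05):
    """Inclusion patterns realized at grid points; disks are ((x, y), r) pairs."""
    lo_x = min(x - r for (x, y), r in disks) - 1.0
    hi_x = max(x + r for (x, y), r in disks) + 1.0
    lo_y = min(y - r for (x, y), r in disks) - 1.0
    hi_y = max(y + r for (x, y), r in disks) + 1.0
    patterns = set()
    for i in range(int((hi_x - lo_x) / step) + 1):
        for j in range(int((hi_y - lo_y) / step) + 1):
            px, py = lo_x + i * step, lo_y + j * step
            # One boolean per disk: is the sample point inside that (open) disk?
            patterns.add(tuple((px - x) ** 2 + (py - y) ** 2 < r * r
                               for (x, y), r in disks))
    return patterns

def looks_independent(disks, step=0.05):
    """Heuristic: True iff all 2^k inclusion patterns were sampled.

    A False answer can be a sampling artifact when some region is tiny."""
    return len(sampled_patterns(disks, step)) == 2 ** len(disks)

print([max_regions(k) for k in range(1, 5)], [2 ** k for k in range(1, 5)])
# [2, 4, 8, 14] [2, 4, 8, 16]
venn = [((0.0, 0.0), 1.0), ((1.0, 0.0), 1.0), ((0.5, 0.866), 1.0)]
print(looks_independent(venn))  # True
```

The three unit disks in `venn` form the familiar three-set Venn configuration, which realizes all eight patterns, while any four disks realize at most 14 of the 16 patterns.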

 
Independent simplices. Recall that each simplex in the Delaunay triangulation is spanned by the centers of a small collection of balls, four for a tetrahedron, three for a triangle, and so on. In discussions of combinatorial properties, we sometimes forget the difference and think of the simplex as this collection of balls. In this spirit, we call the simplex independent if the collection of balls is independent. We will prove shortly that all simplices in the dual complex are independent. This is a fairly strong statement since it limits the balls to a single intersection pattern. The following lemma is the key to proving that all simplices in the dual complex are independent. The lemma holds in any dimension, and can be proved by induction over the dimension. To avoid the complications of a discussion for general dimensions, we assume the lemma for disks in $\mathbb{R}^2$ (or rather for caps on a sphere) and prove it for balls in $\mathbb{R}^3$.

INDEPENDENCE LEMMA. A collection $X$ of four balls in $\mathbb{R}^3$ is independent iff the (unique) vertex $x$ of the corresponding Voronoi diagram is contained in the union: $x \in \bigcup X$.

PROOF. Assume first that $x \notin \bigcup X$, for example $x \notin B_1$. The sphere $S_1$ bounding $B_1$ intersects the other balls in three caps. The circles bounding these caps lie in the three planes bounding the Voronoi cell of $B_1$, and because $x$ lies outside $B_1$, the three caps are not independent. A particular such configuration is illustrated in Figure II.11. So there exists a subset $Y \subseteq X - \{B_1\}$ not represented by any point on the sphere, that is, $\big(\bigcap Y - \bigcup (X - Y - \{B_1\})\big) \cap S_1 = \emptyset$. It can still be that there is a point outside $B_1$ contained in $\bigcap Y - \bigcup (X - Y)$, but then $\bigcap (Y \cup \{B_1\}) - \bigcup (X - Y - \{B_1\}) = \emptyset$. In other words, $X$ is not independent.

To prove the reverse, we assume that $X$ is not independent. Then $S_1$ intersects the other three balls in three non-independent caps. But this implies that the Voronoi vertex lies outside the sphere: $x \notin B_1$.

Figure II.11: The planes bounding the Voronoi cell intersect the sphere in three circles. The three planes meet at $x$, and because $x$ lies outside the sphere, the three caps are not independent.

As mentioned above, the Independence Lemma also holds for three disks in the plane. Given three balls, we get three disks of maximum size by intersecting them with the plane that passes through the centers. This plane intersects the Voronoi diagram of the balls in the Voronoi diagram of the disks. But this implies that three balls are independent iff the (unique) line in the corresponding Voronoi diagram has a non-empty intersection with the union of the three balls. Similarly, two balls are independent iff the (unique) plane in the corresponding Voronoi diagram has a non-empty intersection with their union. But this is exactly the criterion for a simplex to belong to the dual complex. It follows that each simplex in $K$ is independent, as claimed.

Filtration. We return to the idea of growing the balls continuously and watch how the union changes. We let time go from $-\infty$ to $+\infty$ and grow the weight of each ball $B_i$ to $r_i^2 + t$ at time $t$. Each ball has zero weight at time $t = -r_i^2$, and negative weight and therefore imaginary radius before that time. By construction, the Voronoi cells of the balls are unchanged at all times. It follows that the dual complexes that arise throughout time are subcomplexes of one and the same Delaunay triangulation. Furthermore, since the portions of the Voronoi cells covered by the balls can only grow, the dual complexes can also only get larger in time.

Instead of time, we use the square root, $\alpha = \sqrt{t}$, as the index for time-varying sets. The main reason for this convention is that for $r_i = 0$, the radius of the ball at time $t$ is $\alpha$. We need some notation. Let $\mathcal{B}_\alpha$ be the collection of balls at time $t = \alpha^2$ and $K_\alpha$ the dual complex of $\mathcal{B}_\alpha$. We refer to $K_\alpha$ as the $\alpha$-complex and to its underlying space $|K_\alpha|$ as the $\alpha$-shape of $\mathcal{B}$. For small enough (large enough negative) time, all radii are imaginary, $\bigcup \mathcal{B}_\alpha = \emptyset$, and the dual complex is empty. For large enough time, $\bigcup \mathcal{B}_\alpha$ covers all Voronoi vertices, and the dual complex is equal to the Delaunay triangulation, $D$. We thus have a sequence of complexes that begins with the empty complex and ends with the Delaunay triangulation, $K_\alpha \subseteq K_\beta \subseteq D$ for every $\alpha \le \beta$. There are only finitely many simplices and therefore only finitely many subcomplexes of $D$ that arise as dual complexes during the growth process. We refer to this sequence as a filtration of the Delaunay triangulation,
$$\emptyset = K_0 \subset K_1 \subset \cdots \subset K_m = D.$$
Figure II.12 illustrates the construction by showing three complexes in the filtration generated by eight disks in the plane. To translate between continuous time and discrete

rank, we define a function $\rho\colon \mathbb{R} \to \{0, 1, \ldots, m\}$ such that $\rho(\alpha) = j$ if $K_\alpha = K_j$.

Figure II.12: Three unions of disks and the corresponding dual complexes. The first complex contains all vertices but only two edges and no triangles. From the first to the third complex, the edges become thinner and the triangles become lighter.

Ordering simplices. We can sort the Delaunay simplices in the order in which they enter the dual complex. Define the birth-time of a simplex $\sigma$ as the minimum time $\alpha_\sigma$ such that $\sigma \in K_\alpha$ for all $\alpha \ge \alpha_\sigma$. The difference between two contiguous complexes in the filtration consists of all simplices whose birth-time coincides with the creation of the second complex,
$$K_j - K_{j-1} = \{\sigma \in D \mid \alpha_\sigma = \alpha_j\},$$
where $\alpha_j$ is the time at which $K_j$ is created. Often two contiguous complexes $K_{j-1}$ and $K_j$ differ by only one simplex, $\sigma$. In this case, the birth-time of $\sigma$ coincides with the time it becomes independent. Let the orthosphere of $\sigma$ be the smallest sphere orthogonal to all balls whose centers are vertices of $\sigma$. The time $\sigma$ becomes independent is also the time the orthosphere of $\sigma$ dies or shrinks to a point. Geometrically, this case is characterized by a non-empty common intersection between the affine hull of $\sigma$ and the Voronoi cells of its vertices. Sometimes, however, the difference between $K_{j-1}$ and $K_j$ consists of two or more simplices. In the generic case, all these simplices are faces of a single simplex, $\tau$, that also belongs to the difference. All these simplices are born at the same time, $\alpha_\tau = \alpha_j$. In the absence of any degeneracy, their orthospheres die at different times, with the orthosphere of $\tau$ dying last at time $\alpha_j$. Figure II.13 illustrates this case. The triangle connecting all three centers and the edge connecting the centers of the two larger disks are born at the same time, namely when all three disks reach the shared Voronoi vertex. This is also the time when the three disks become independent, but the pair of larger disks became independent earlier.

Figure II.13: The two larger disks are independent, but the dual edge does not belong to the dual complex because their common intersection is disjoint from the corresponding Voronoi edge.

We represent the filtration by sorting the Delaunay simplices by birth-time, and in case of a tie by dimension. Remaining ties are broken arbitrarily. Every dual complex is a prefix of this ordering, and because of the tie breaking rule, every prefix is a complex, even if it does not coincide with a dual complex. This property of the ordering will be crucial for the algorithm in Chapter IV that computes the connectivity of the $K_\alpha$.

Bibliographic notes. Alpha shapes and alpha complexes were introduced by Edelsbrunner, Kirkpatrick and Seidel [3] in 1983 for finite sets of points in the plane. About a decade later, the concept was generalized to three dimensions and made available as a software package with a graphical user interface [4]. The unexpected popularity of that software in structural biology triggered the development of further geometric concepts useful in structural biology, some of which are explained in this book. The main reason for the popularity is the duality between space-filling diagrams and alpha shapes as explained in this and the two preceding sections. To fully develop that duality, alpha shapes had to be extended to take weights into account, and this has been described in complete generality in [2]. That generalization benefited from adopting the language of simplicial complexes, which had been developed decades earlier in the area of combinatorial topology [1, 5].

[1] P. S. Alexandrov. Combinatorial Topology. Dover, New York, 1998 (republication of translation of the original Russian edition from 1947).

[2] H. Edelsbrunner. The union of balls and its dual shape. Discrete Comput. Geom. 13 (1995), 415–440.

[3] H. Edelsbrunner, D. G. Kirkpatrick and R. Seidel. On the shape of a set of points in the plane. IEEE Trans. Inform. Theory IT-29 (1983), 551–559.

[4] H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes. ACM Trans. Graphics 13 (1994), 43–72.

[5] P. J. Giblin. Graphs, Surfaces and Homology. Second edition, Chapman and Hall, London, 1981.
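The ordering of the Delaunay simplices by birth-time, with ties broken by dimension, can be sketched in a few lines of Python. The birth-times below are hypothetical values chosen to mimic Figure II.13, where an edge and the triangle containing it are born at the same time; computing real birth-times from a weighted Delaunay triangulation is not shown, and the function names are our own. The sketch also checks the property claimed in Section II.3, namely that every prefix of the ordering is a complex.

```python
from itertools import combinations

def filtration_order(birth):
    """Sort simplices (vertex tuples) by birth-time, then by dimension.

    Python's sort is stable, so remaining ties keep their input order."""
    return sorted(birth, key=lambda s: (birth[s], len(s)))

def is_complex(simplices):
    """True iff every proper face of every simplex is also present."""
    present = {frozenset(s) for s in simplices}
    return all(frozenset(f) in present
               for s in simplices
               for k in range(1, len(s))
               for f in combinations(s, k))

def alpha_complex(birth, alpha):
    """Prefix of the ordering consisting of the simplices born by time alpha."""
    return [s for s in filtration_order(birth) if birth[s] <= alpha]

# Hypothetical birth-times: edge (0, 1) and triangle (0, 1, 2) are born together.
birth = {(0,): 0.0, (1,): 0.0, (2,): 0.0,
         (0, 1): 2.0, (0, 2): 1.0, (1, 2): 1.0, (0, 1, 2): 2.0}
order = filtration_order(birth)
print(order)
# [(0,), (1,), (2,), (0, 2), (1, 2), (0, 1), (0, 1, 2)]
print(all(is_complex(order[:i]) for i in range(len(order) + 1)))  # True
```

Sorting by birth-time alone could place the triangle $(0, 1, 2)$ before its edge $(0, 1)$ when the two tie, and that prefix would not be a complex; the secondary key on dimension rules this out.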

II.4 Alpha Shape Software tains a line for each atom listing its three coordinates and
the van der Waals radius. The -r option allows for the
This section introduces the basic Alpha Shape software specification of a radius increment that is applied to every
and explains how to go from a standard descriptions of atom in the file. In our example, this radius increment is
protein structures to the visualization of their alpha shapes. 1.4 Å, which is the most common approximation used for
The discussion is more descriptive and less analytical the size of water molecules. The resulting set of balls thus
than in the previous three sections. Given a pdb-file, name.pdb, we take four steps to construct and visualize alpha shapes in an interactive graphical user interface:

> pdb2alf name.pdb name
> delcx name
> mkalf name
> alvis name

The details of the discussion apply to Version 4.1 of the Alpha Shape software executed on an SGI workstation running under the UNIX operating system and may differ for other versions and platforms.

Data format. The main public source for structural protein data is the Protein Data Bank (pdb) mentioned in Section I.3. Only a fraction of the information is needed to construct alpha shapes. Specifically, for each atom we only need its coordinates in three-dimensional space and its radius. The coordinates are explicitly given in the file, but the radius must be inferred from the atom type. This is done according to published translation tables that map atoms to van der Waals radii. Unfortunately, there is no universally agreed upon table. Some differences are due to different methods used to derive radii, including measurements of closest approach, molecular mechanics calculations, etc. One of the most problematic elements is hydrogen (H), which accounts for almost 50% of the number of atoms found in organic matter. Hydrogen atoms sometimes donate their electrons to complete the shells of other atoms and thus can exist without any shell and radius to speak of. Hydrogen atoms are generally not represented in pdb-files, but can be inferred to some accuracy from the types and relative positions of the other atoms in the protein. In the common unified atom model, the van der Waals radii of larger atoms are adjusted to include the bonded hydrogen atoms.

We can extract the coordinates and the radii using software that is part of the Alpha Shapes distribution. Specifically, we call

> pdb2alf -r 1.4 name.pdb name

to read name.pdb and create a new file name that contains the coordinates and radii of the atoms, with every radius increased by 1.4 Ångstrom. This defines the solvent accessible diagram representing the interaction with the surrounding water; see Section II.1.

Delaunay triangulation. The first step towards computing alpha shapes is to construct the Delaunay triangulation of the set of balls. This is accomplished by the command

> delcx name

The delcx (Delaunay complex) program creates a file name.dt that represents the Delaunay triangulation. The efficient and robust construction of the Delaunay triangulation in $\mathbb{R}^3$ is not entirely straightforward. We briefly mention the algorithmic ingredients used. The basic strategy is incremental, adding one ball at a time to the triangulation. Using an arbitrary ordering of the balls, $b_1, b_2, \ldots, b_n$, we write $B_i = \{b_1, \ldots, b_i\}$ for the set of the first $i$ balls and $D_i$ for the Delaunay triangulation of $B_i$, for $0 \le i \le n$. With this notation, the algorithm can be written as follows.

    $D_0 = \emptyset$;
    for $i = 1$ to $n$ do
      $\mathrm{INSERT}(b_i)$
    endfor.

The $i$-th ball is inserted through a sequence of flip operations, which turns $D_{i-1}$ into $D_i$. The flips are performed depending on the outcomes of only two types of primitive tests needed in the construction of the Delaunay triangulation:

ORTHOGONALITY: decide whether a ball is closer or further than orthogonal to the orthosphere of four other balls.
ORIENTATION: decide whether a ball center is on the positive or negative side of the oriented plane spanned by three other ball centers.

Both tests reduce to the sign of the determinant of a small matrix and can be decided without computing intermediate geometric information. The tests are ambiguous if the balls are in non-generic position, and so is the Delaunay triangulation. To cope with the related robustness problem, we use exact arithmetic and simulated perturbation. Exact arithmetic guarantees the correct execution of flips in all generic and therefore unambiguous cases.
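The two primitive tests above can be sketched with exact rational arithmetic. This is a minimal illustration under our own assumptions and not code from the Alpha Shapes distribution. Each ball is given as ((x, y, z), r). Python's fractions module converts floating-point inputs without rounding. ORTHOGONALITY is evaluated as the sign of a 4-by-4 determinant of lifted points (x, y, z, x² + y² + z² − r²), taken relative to the query ball and multiplied by the orientation of the four centers. The names exact_det, orientation and orthogonality, as well as the sign convention, are hypothetical. Simulated perturbation is not implemented, so degenerate inputs simply return 0.

```python
from fractions import Fraction


def exact_det(rows):
    """Determinant of a square matrix, computed exactly by Gaussian elimination over Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def sign(x):
    return (x > 0) - (x < 0)


def orientation(a, b, c, d):
    """ORIENTATION: +1 if d lies on the side of the plane through a, b, c into which
    the normal (b - a) x (c - a) points, -1 on the other side, 0 if the four are coplanar."""
    rows = [[Fraction(p[i]) - Fraction(a[i]) for i in range(3)] for p in (b, c, d)]
    return sign(exact_det(rows))


def orthogonality(four_balls, ball):
    """ORTHOGONALITY: compare `ball` with the orthosphere of `four_balls`.
    Returns -1 if it is closer than orthogonal, 0 if orthogonal, +1 if further.
    Each ball is ((x, y, z), r); the four centers must not be coplanar."""
    def lift(b):
        center, r = b
        x, y, z = (Fraction(v) for v in center)
        r = Fraction(r)
        return [x, y, z, x * x + y * y + z * z - r * r]

    q = lift(ball)
    rows = [[p[i] - q[i] for i in range(4)] for p in map(lift, four_balls)]
    orient = orientation(*(center for center, _ in four_balls))
    if orient == 0:
        raise ValueError("the four centers are coplanar")
    # The determinant equals the orientation determinant of the four centers times
    # ||z - o||^2 - r^2 - rho^2, where (o, rho) is the orthosphere and (z, r) is `ball`.
    return sign(exact_det(rows)) * orient
```

For the unit tetrahedron with zero radii, the orthosphere is the circumsphere. Its center therefore tests as closer than orthogonal (-1), and a far-away point tests as further (+1). Reordering the four balls flips both signs, so the answer does not depend on the order.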
Simulated perturbation reduces ambiguous cases in a consistent manner to unambiguous ones. The use of exact rather than floating-point arithmetic poses a challenge to the efficiency of the code. A common remedy is to use so-called floating-point filters: calculate in floating-point arithmetic, bound the error, and redo the computation in exact arithmetic if the error is too large to guarantee a correct decision.

Another challenge to the efficiency of the code is the inherent size of the Delaunay triangulation. As mentioned in Section II.2, the Delaunay triangulation in $\mathbb{R}^3$ can have a number of simplices that is quadratic in $n$. For example, if the centers of the balls lie on the moment curve, $t \mapsto (t, t^2, t^3)$, and all radii are equal, then every pair of vertices forms an edge in the Delaunay triangulation, as shown in Figure II.14. Fortunately, the balls of organic molecules are usually well packed and have Delaunay triangulations of size at most proportional to $n$. The danger remains that one of the intermediate triangulations is large. Then we spend a lot of time constructing that triangulation, only to destroy most of it before arriving at the final triangulation. This danger is quite real, because systematic enumerations of the data tend to generate subconfigurations with relatively large Delaunay triangulations. The remedy here is to add the balls in a random sequence. In other words, we apply a random permutation to the input sequence and construct the Delaunay triangulation following this permutation.

Figure II.14: Edge-skeleton of the Delaunay triangulation of twenty one points on the moment curve in $\mathbb{R}^3$.

Filtration. As explained in Section II.3, dual complexes obtained by growing the square radii form a nested sequence $K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m$ of subcomplexes of the Delaunay triangulation, with $K_m$ the entire Delaunay triangulation. This is the filtration of $\alpha$-complexes, $K_i = K_{\alpha_i}$ for $0 \le i \le m$. We represent the filtration by the sequence of Delaunay simplices ordered by birth-time. The sequence is generated by calling

> mkalf name

The mkalf (make alpha shape filtration) program reads the Delaunay triangulation in name.dt and generates a new file, name.alf, that stores the filtration along with some auxiliary data structures.

The software refers to the sorted sequence of simplices as the 'masterlist'. It stores each simplex $\sigma$ several times, marking when $\sigma$ is born, when $\sigma$ becomes a face of another simplex, and when $\sigma$ becomes interior to the alpha complex. Suppose the three events happen at times $a_\sigma \le b_\sigma \le c_\sigma$. Then $\sigma$ is

    not in $K_\alpha$   if $\alpha < a_\sigma$,
    singular            if $a_\sigma \le \alpha < b_\sigma$,
    regular             if $b_\sigma \le \alpha < c_\sigma$,
    interior            if $c_\sigma \le \alpha$.

The combinatorial topology term for being singular is principal and means that $\sigma$ is not a face of any other simplex. The simplex is regular if it belongs to the boundary but is not principal, and it is interior if it is completely surrounded by other simplices. Some of the three events may coincide. For example, a tetrahedron is interior as soon as it is born, so $a_\sigma = b_\sigma = c_\sigma$. A simplex in the boundary of the convex hull can never become interior, so $c_\sigma = \infty$. Finally, a simplex whose orthosphere dies strictly before the simplex is born is never singular, so $a_\sigma = b_\sigma$. The main reason for recording all this information is to determine how to draw $\sigma$ in the graphical interface, but there are others.

Figure II.15 shows four alpha complexes of the relatively small gramicidin protein. In each case, we only show the singular simplices together with the regular triangles. Given a value of $\alpha$, we need quick access to the simplices of the various types in $K_\alpha$. For this purpose, we store the existence intervals in a number of interval trees. Each such tree stores some number $\ell$ of intervals in space $O(\ell)$. For a given moment $\alpha$, it enumerates the $k$ simplices whose intervals contain $\alpha$ in time $O(k + \log \ell)$.

Visualization. We finally discuss the visualization interface of the Alpha Shapes software. The necessary support structures are computed and the graphics user interface is opened by executing

> alvis name

The alvis (alpha shape visualization) program uses both the Delaunay triangulation file, name.dt, and the filtration file, name.alf. The interface consists of a visualization panel, a scene panel, and a signature panel.
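The masterlist bookkeeping and the interval-tree retrieval described above under Filtration can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the Alpha Shapes implementation. Each simplex comes with its three event times (born, becomes a face, becomes interior), and these split its existence into half-open singular, regular and interior intervals. Each kind of interval is stored in a simple centered interval tree, which is one standard variant of the structure. It holds m intervals in O(m) space and, with the median split used here, reports the k intervals containing a query value in O(k + log m) time. The names IntervalTree, build_trees and classify are hypothetical.

```python
import math


class IntervalTree:
    """Centered interval tree storing labelled half-open intervals (lo, hi, label)."""

    def __init__(self, intervals):
        self.center = None
        if not intervals:
            return
        los = sorted(lo for lo, _, _ in intervals)
        self.center = los[len(los) // 2]  # median left endpoint
        here = [iv for iv in intervals if iv[0] <= self.center < iv[1]]
        self.by_lo = sorted(here, key=lambda iv: iv[0])
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalTree([iv for iv in intervals if iv[1] <= self.center])
        self.right = IntervalTree([iv for iv in intervals if iv[0] > self.center])

    def stab(self, x):
        """Labels of all stored intervals [lo, hi) with lo <= x < hi."""
        found, node = [], self
        while node.center is not None:
            if x < node.center:
                # Every interval stored here ends after center > x; report those starting by x.
                for lo, _, label in node.by_lo:
                    if lo > x:
                        break
                    found.append(label)
                node = node.left
            else:
                # Every interval stored here starts at or before center <= x; report those ending after x.
                for _, hi, label in node.by_hi:
                    if hi <= x:
                        break
                    found.append(label)
                node = node.right
        return found


def build_trees(events):
    """events maps a simplex to its event times (a, b, c) with a <= b <= c; c may be math.inf."""
    kinds = {"singular": [], "regular": [], "interior": []}
    for simplex, (a, b, c) in events.items():
        for kind, lo, hi in (("singular", a, b), ("regular", b, c), ("interior", c, math.inf)):
            if lo < hi:  # coinciding events give an empty interval, which is skipped
                kinds[kind].append((lo, hi, simplex))
    return {kind: IntervalTree(ivs) for kind, ivs in kinds.items()}


def classify(trees, alpha):
    """Simplices of each type in the alpha complex at time alpha."""
    return {kind: sorted(tree.stab(alpha)) for kind, tree in trees.items()}
```

For example, a tetrahedron with events (3, 3, 3) only ever appears in the interior tree. An edge on the convex hull boundary with events (1, 2, inf) is singular on [1, 2) and regular from then on.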
The alpha complexes are shown in the first panel, but which complex is shown and how it is shown is decided in the other two panels. The visualized complex is selected in the signature panel. To support that selection, the panel displays a variety of functions (or signatures) that illustrate how the complexes change with time. For example, the three default signatures map each index $i$ to the number of singular edges, the area of the boundary, and the volume of the underlying space of $K_i$. Figure II.16 shows the signature panel and the three default signatures for gramicidin. All signatures that count rather than measure are displayed in log-scale. Instead of mapping the time $\alpha$ to a property of $K_\alpha$, the signatures map the index $i$ to the property of $K_i$. To facilitate the reconstruction of the map from time, the panel contains a signature that maps the index to time. Specifically, it shows the log-scale graph of $i \mapsto \alpha_i$. A particular index, $i$, is selected by the position of a vertical bar in the signature panel and by clicking the Alpha Shape button in the scene panel, as shown in Figure II.17. The buttons in the middle of the scene panel provide control over how simplices are drawn: colored, shaded, in wireframe, seamless, or with gaps created through a slow explosion. The matrix on the right hand side can be used to select the types of displayed simplices. By default, only the singular vertices, edges, triangles and the regular triangles are shown. Different settings can be used to highlight different aspects of an alpha complex. For example, the 1-skeleton of the Delaunay triangulation shown in Figure II.14 is obtained by drawing all edges of the last alpha complex while suppressing the display of all triangles and tetrahedra.

Figure II.15: Four alpha complexes of gramicidin.

Figure II.16: Signature panel of the Alpha Shape visualizer.

Figure II.17: Scene panel of the Alpha Shape visualizer.

Bibliographic notes. The Alpha Shape software was created by Ernst Mücke as part of his doctoral work at Urbana-Champaign. The best documentation of the algorithm and data structures used in the software is still his thesis [6] and the original paper on the topic [4]. After a period of rapid development directed by Ping Fu at the National Center for Supercomputing Applications, the software reached version 4.1 in 1996, which is still the most recent version distributed on the web [7]. The Delaunay triangulation software in the Alpha Shapes distribution is based on a variety of algorithmic techniques described in a recent text by Edelsbrunner [3]. The interval tree used for fast retrieval of simplices is explained in [2].

As mentioned earlier, the largest resource for structural protein data is the Protein Data Bank [1], which can be accessed via the web [8]. A survey of geometric measurements of proteins, including a discussion of different tables for van der Waals radius assignment, can be found in [5].

[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne. The Protein Data Bank. Nucleic Acids Res. 28 (2000), 235–242.

[2] H. Edelsbrunner. A new approach to rectangle intersections – part I. Internat. J. Comput. Math. 13 (1983), 209–219.

[3] H. Edelsbrunner. Geometry and Topology for Mesh Generation. Cambridge Univ. Press, England, 2001.

[4] H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes. ACM Trans. Graphics 13 (1994), 43–72.

[5] M. Gerstein and F. M. Richards. Protein geometry: distances, areas, and volumes. Chapter 22 in The International Tables for Crystallography, Vol. F, M. G. Rossmann and E. Arnold (eds.), Kluwer, Dordrecht, the Netherlands, 2001, 531–539.

[6] E. P. Mücke. Shapes and Implementations in Three-dimensional Geometry. Rept. UIUCDCS-R-93-1836, Dept. Comput. Sci., Univ. Illinois, Urbana, 1993.

[7] Alpha Shapes web-site at www.alphashapes.org; see also the software collection in biogeometry.duke.edu.

[8] Protein Data Bank web-site at www.rcsb.org/pdb.


Exercises

1. Tree-like sequences. Given an alphabet of $n$ letters, form a sequence but refrain from placing any letter twice in a row. The sequence is tree-like if there are no two letters that alternate more than twice. In other words, subsequences of the form $\ldots a \ldots b \ldots a \ldots b \ldots$ are prohibited. Examples of tree-like sequences of four letters are … and ….
(i) Prove that a tree-like sequence over an alphabet of $n$ letters has length at most $2n - 1$. Is this bound tight?
(ii) Define a tree-like cyclic sequence by prohibiting cyclic subsequences of the form $\ldots a \ldots b \ldots a \ldots b \ldots$. Prove that a tree-like cyclic sequence over an alphabet of $n$ letters has length at most $2n - 2$. Is this bound tight?

2. Number of arcs. Let $B$ be a set of $n$ disks in the plane. The boundary of the union of the disks consists of circular arcs contributed by the circles.
(i) Assuming the boundary of $\bigcup B$ is a single closed curve, use tree-like cyclic sequences to prove that it consists of at most $2n - 2$ (maximal) circular arcs. Is this bound tight?
(ii) Prove that in general the number of (maximal) circular arcs in the boundary of the union is at most $6n - 12$, for $n \ge 3$. Is this bound tight?

3. Empty Voronoi cell. Call a disk $b$ in a finite collection of disks redundant if its Voronoi cell is empty.
(i) Prove that if there are disks $b_i$, $b_j$ and $b_k$ in the collection such that
(a) $\pi_b(z) > \pi_{b_i}(z) = \pi_{b_j}(z) = \pi_{b_k}(z)$ for the orthocenter $z$ of $b_i$, $b_j$, $b_k$, and
(b) the center of $b$ lies in the triangle spanned by the centers of $b_i$, $b_j$, $b_k$,
then $b$ is redundant.
(ii) Prove that the sufficient conditions given in (i) are also necessary. In other words, prove that if $b$ is redundant then there exist disks $b_i$, $b_j$ and $b_k$ that satisfy Conditions (a) and (b).

4. Binomial coefficients. Let $k \le n$ be two positive integers and recall that the binomial coefficient $\binom{n}{k}$ is the number of ways we can choose $k$ elements from a collection of $n$ elements. Recall also that …
(i) Show that ….
(ii) Show that $\sum_{j=k}^{n} \binom{j}{k} = \binom{n+1}{k+1}$.
[We note that the relation in (ii) neatly generalizes the formula $\sum_{j=1}^{n} j = \binom{n+1}{2}$. The generalization is not quite as neat if we sum powers rather than binomial coefficients.]

5. Sphere arrangements. Let $f(n, d)$ be the maximum number of cells we get by drawing $n$ spheres in $\mathbb{R}^d$.
(i) Show that $f(n, d) = 2^n$ unless $n > d + 1$.
(ii) Give a formula for $f(n, d)$ that works for all positive $n$ and $d$.
[You might consider answering question (ii) before question (i).]

6. Independent half-spaces. A half-plane is the set of points on or on one side of a line in $\mathbb{R}^2$. Similarly, a half-space is the set of points on or on one side of a plane in $\mathbb{R}^3$, and a cap is the intersection of a sphere with a half-space. What is the maximum number of independent
(i) half-planes in $\mathbb{R}^2$,
(ii) half-spaces in $\mathbb{R}^3$,
(iii) caps on a sphere in $\mathbb{R}^3$?

7. The filtration of water. A water molecule consists of one oxygen and two hydrogens: H₂O.
(i) Look up the standard geometric model (determined by radii, bond length and bond angle).
(ii) Describe the Voronoi diagram and the sequence of alpha complexes of the model.

8. Barycentric subdivision. The barycentric subdivision of a simplex $\sigma$ is obtained by adding the barycenter of $\sigma$ (also known as the centroid or center of mass) as a new vertex and connecting it to the simplices in the barycentric subdivisions of the faces.
(i) How many vertices, edges, triangles and tetrahedra are in the barycentric subdivision of a tetrahedron?
(ii) Use the Alpha Shape software to create the barycentric subdivision of a regular tetrahedron.
[You will need to use weights to make the barycentric subdivision of the tetrahedron the Delaunay triangulation of the points.]
Chapter III

Surface Meshing

Recall the different types of space-filling diagrams we discussed in Chapter II. The van der Waals and the solvent accessible models are both unions of finitely many balls in three-dimensional space and differ only in the radii. We have also discussed the molecular surface model that is obtained by rolling a sphere about the van der Waals model. Corners and crevices are filled up and the surface consists of spheres connected by blending torus patches and inverted sphere patches.

In this chapter, we introduce a model that is similar to the molecular surface. Its surface consists of spheres connected by blending hyperboloid patches and inverted sphere patches. We call this the molecular skin model. The surface is piecewise quadratic and has a number of attractive properties not shared by the other space-filling models. One is the continuity of the normal direction, another the continuity of the maximum principal curvature. Both properties are crucial for the construction of good quality meshes, which may be used to support numerical computations over the surface. Another interesting property is an inside-outside symmetry that implies the existence of locally perfectly complementary molecular skin models. In other words, for each cavity we may construct a molecular skin representation whose boundary matches that of the molecule. The molecular skin also lends itself to representing deformations, and some of the possibilities along these lines will be discussed in Chapter VIII.

This chapter is organized in four sections. In Section III.1, we give the geometric definition of the molecular skin and show how it can be decomposed into quadratic patches. In Section III.2, we discuss various notions of curvature of a surface, and we show that the maximum principal curvature is a continuous map over the molecular skin. In Section III.3, we describe the algorithm that constructs a molecular skin in terms of a triangle mesh. Finally, in Section III.4, we present software for constructing molecular skin in two- and three-dimensional space, and we use that software to illustrate some of the properties of these curves and surfaces.

III.1 Molecular Skin
III.2 Curvature
III.3 Adaptive Meshing
III.4 Skin Software
Exercises
36 III S URFACE M ESHING

III.1 Molecular Skin

Almost everything we will say in this section applies equally well to spheres of any fixed dimension. Even though the case of spheres in $\mathbb{R}^3$ is most relevant for the study of molecules, there is sufficient pedagogical advantage to first talk about circles in $\mathbb{R}^2$.

Circles and paraboloids. Recall that the weighted square distance function of a circle $b = (z, r)$ is the map $\pi_b : \mathbb{R}^2 \to \mathbb{R}$ defined by $\pi_b(x) = \|x - z\|^2 - r^2$. As illustrated in Figure III.1, its graph is a paraboloid of revolution in $\mathbb{R}^3$ that intersects $\mathbb{R}^2$ in the circle. In other words, the circle is the zero-set of the weighted square distance function, $b = \pi_b^{-1}(0)$. All paraboloids that arise as weighted square distance functions have the form $\pi(x) = \|x\|^2 - 2\langle x, z \rangle + \|z\|^2 - r^2$. The three parameters correspond to the three degrees of freedom represented by the center and the radius.

Figure III.1: A circle in $\mathbb{R}^2$ is the zero-set of its weighted square distance function.

Functions form a vector space under the usual notions of scaling and addition. We will use only a subspace of that vector space, namely the one consisting of functions of the above form. Given a collection of such functions, $\pi_i$, we can generate another such function by affine combination, $\pi = \sum_i \gamma_i \pi_i$, where the $\gamma_i$ are real numbers with $\sum_i \gamma_i = 1$. The new function is a convex combination of the $\pi_i$ if all $\gamma_i$ are non-negative. Given a collection of circles, $B = \{b_i\}$, the affine hull is the set of zero-sets of affine combinations of the corresponding weighted square distance functions, and similarly the convex hull is the subset of zero-sets of convex combinations,

    $\operatorname{aff} B = \{\pi^{-1}(0) \mid \pi = \sum_i \gamma_i \pi_{b_i},\ \sum_i \gamma_i = 1\}$,
    $\operatorname{conv} B = \{\pi^{-1}(0) \mid \pi = \sum_i \gamma_i \pi_{b_i},\ \sum_i \gamma_i = 1,\ \gamma_i \ge 0 \text{ for all } i\}$.

Pencils. It is possibly easier to develop an intuition for combining circles than for combining paraboloids. Given two intersecting circles $b_0 = (z_0, r_0)$ and $b_1 = (z_1, r_1)$, the affine hull consists of all circles that pass through the same two intersection points. Indeed, if $\pi_0(x) = \pi_1(x) = 0$ then $(1 - \gamma)\pi_0(x) + \gamma\pi_1(x) = 0$ for all coefficients $1 - \gamma$ and $\gamma$. We call the resulting family a pencil of circles. If $b_0$ and $b_1$ are disjoint then the affine hull is again a pencil, but this time of pairwise disjoint circles, like the vertical family sketched in Figure III.2. We compute the center and radius of the zero-set of $\pi_\gamma = (1 - \gamma)\pi_0 + \gamma\pi_1$. We have

    $\pi_\gamma(x) = (1 - \gamma)(\|x - z_0\|^2 - r_0^2) + \gamma(\|x - z_1\|^2 - r_1^2)$
    $\qquad = \|x\|^2 - 2\langle x, (1 - \gamma) z_0 + \gamma z_1 \rangle + (1 - \gamma)(\|z_0\|^2 - r_0^2) + \gamma(\|z_1\|^2 - r_1^2)$.

The center is therefore $z_\gamma = (1 - \gamma) z_0 + \gamma z_1$ and the square radius is $r_\gamma^2 = \|z_\gamma\|^2 - (1 - \gamma)(\|z_0\|^2 - r_0^2) - \gamma(\|z_1\|^2 - r_1^2)$.

Figure III.2: Circles sampled from a coaxal system consisting of two orthogonal pencils.

The centers of the circles in the affine hull are therefore the points on the line that passes through $z_0$ and $z_1$. If instead of the affine hull we take the convex hull, then we get the subset of circles whose centers are the points on the line segment with endpoints $z_0$ and $z_1$.

Recall that a circle $b = (z, r)$ is orthogonal to $b_0 = (z_0, r_0)$ if $\|z - z_0\|^2 = r^2 + r_0^2$. If $b$ is orthogonal to $b_0$ and to $b_1$ then it is also orthogonal to every circle $b_\gamma = (z_\gamma, r_\gamma)$ in the affine hull of $b_0$ and $b_1$. To see this elementary fact, note that

    $\|z - z_\gamma\|^2 - r^2 - r_\gamma^2 = \pi_\gamma(z) - r^2 = (1 - \gamma)(\|z - z_0\|^2 - r^2 - r_0^2) + \gamma(\|z - z_1\|^2 - r^2 - r_1^2)$.

Both brackets on the right-hand side vanish by assumption, and thus so does the left-hand side, as required.
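The pencil computations above are easy to check mechanically. The sketch below is our own illustration. It stores a circle as a pair (center, squared radius), so imaginary members of a pencil (negative squared radius) need no special treatment, and it uses exact rational arithmetic so that equalities can be tested directly. It computes the zero-set of $(1 - \gamma)\pi_0 + \gamma\pi_1$ from the center and square-radius formulas. For a sample pencil through (0, −1) and (0, 1), it then confirms that every member passes through both points and stays orthogonal to a circle orthogonal to $b_0$ and $b_1$. The function names are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def power(circle, x):
    """Weighted square distance pi_b(x) = ||x - z||^2 - r^2 for a circle b = (z, r^2)."""
    z, r2 = circle
    d = [p - q for p, q in zip(x, z)]
    return dot(d, d) - r2


def affine_combination(b0, b1, gamma):
    """Zero-set of (1 - gamma) pi_0 + gamma pi_1 as (center, squared radius).
    A negative squared radius describes an imaginary circle."""
    (z0, r20), (z1, r21) = b0, b1
    z = tuple((1 - gamma) * p + gamma * q for p, q in zip(z0, z1))
    c = (1 - gamma) * (dot(z0, z0) - r20) + gamma * (dot(z1, z1) - r21)
    return z, dot(z, z) - c


def orthogonal(b, c):
    """Circles (z, r^2) and (w, s^2) are orthogonal iff ||z - w||^2 = r^2 + s^2."""
    (z, r2), (w, s2) = b, c
    d = [p - q for p, q in zip(z, w)]
    return dot(d, d) == r2 + s2


# Two circles through (0, -1) and (0, 1), and a circle orthogonal to both.
b0, b1 = ((-1, 0), 2), ((1, 0), 2)
c = ((0, 2), 3)
for gamma in (Fraction(1, 4), Fraction(3), Fraction(-2)):
    bg = affine_combination(b0, b1, gamma)
    print(gamma, bg, power(bg, (0, 1)), orthogonal(c, bg))
```

The coefficient 3 lies outside [0, 1], so that circle belongs to the affine hull but not to the convex hull. It still passes through both points, with center (5, 0) and squared radius 26.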
Suppose we are now given two circles $b_0$ and $b_1$ and two more circles $c_0$ and $c_1$, both orthogonal to $b_0$ and $b_1$. Then every circle in the affine hull of $c_0$ and $c_1$ is orthogonal to both $b_0$ and $b_1$, and thus to every circle in the affine hull of $b_0$ and $b_1$. In other words, we have two pencils in which each circle in the first pencil is orthogonal to each circle in the second pencil. Such a configuration is illustrated in Figure III.2 and is referred to as a coaxal system.

Envelopes. The convex hull of two circles is an infinite family of circles, but the union of their disks is just the union of the two original disks. We introduce a shrinking operation that reduces small circles less than big ones and this way generates a smooth envelope. Specifically, for a circle $b = (z, r)$ we define $b^{1/2} = (z, r/\sqrt{2})$. Similarly, for a family of circles $F$ we define $F^{1/2} = \{b^{1/2} \mid b \in F\}$. An example can be seen in Figure III.3, which sketches a shrunken pencil of circles.

Figure III.3: The dotted circles belong to the affine hull and the solid circles are reduced.

We are interested in the envelope of a shrunken pencil. Suppose $F$ is a pencil and all its circles pass through the points $(0, -1)$ and $(0, 1)$. We parametrize $F$ by the first coordinate, $t$, of the circle centers. The corresponding radius is $\sqrt{t^2 + 1}$. The same parametrization of the family of reduced circles, $F^{1/2}$, gives radius $\sqrt{(t^2 + 1)/2}$. The reduced circle with center $(t, 0)$ is the zero-set of

    $f(x, y, t) = (x - t)^2 + y^2 - \frac{t^2 + 1}{2}$

for fixed value of $t$. The collection of all reduced circles is the projection of the entire zero-set, $f^{-1}(0)$. It can be visualized as a leaning hour-glass of circles, as in Figure III.4. The envelope of $F^{1/2}$ is the projection of the silhouette of $f^{-1}(0)$ as viewed along the $t$ direction. It is the set of points for which

    $\frac{\partial f}{\partial t} = -2(x - t) - t = t - 2x$

vanishes. From $t = 2x$ we get $f(x, y, 2x) = x^2 + y^2 - \frac{4x^2 + 1}{2} = y^2 - x^2 - \frac{1}{2}$. The envelope is therefore the zero-set of $y^2 - x^2 - \frac{1}{2}$, which is a hyperbola.

Figure III.4: Sections of the zero-set of $f$ viewed from the positive $t$ direction.

Skin and body. More general curves than just hyperbolas can be constructed by taking the convex hull of a finite collection of circles, then shrinking every circle in the family, and finally taking the envelope. Formally, the skin of the collection $B$ of circles is the envelope of the reduced circles, $\operatorname{skn} B = \operatorname{env} (\operatorname{conv} B)^{1/2}$. The body is the union of disks bounded by circles in $(\operatorname{conv} B)^{1/2}$. It is the region in $\mathbb{R}^2$ bounded by the skin, and symmetrically, the skin is the boundary of the body. The smallest non-trivial example is the skin of two circles. If these circles intersect in two points then the skin is a dumbbell, as shown in Figure III.5. It consists of arcs of the two reduced circles connected by two blending hyperbola arcs.

Figure III.5: The skin of two intersecting circles is the envelope of a reduced line segment of circles.

The skin of three circles is already more difficult to understand, at least directly. We thus take an indirect approach and first study what happens when orthogonal circles shrink.

Orthogonality and complementarity. Let $b_0 = (z_0, r_0)$ and $b_1 = (z_1, r_1)$ be two orthogonal circles. We thus have

    $\|z_0 - z_1\|^2 = r_0^2 + r_1^2 = \frac{(r_0 + r_1)^2}{2} + \frac{(r_0 - r_1)^2}{2} \ge \left( \frac{r_0}{\sqrt{2}} + \frac{r_1}{\sqrt{2}} \right)^2$.

Taking roots left and right implies that the radii of $b_0^{1/2}$ and $b_1^{1/2}$ add up to at most the distance between the two centers. Furthermore, we have equality iff $r_0 = r_1$. In other words, the reduced versions of any two orthogonal circles touch if they are of the same size, and they are disjoint in all other cases.
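Both facts just derived can be checked numerically. This is a small sketch under our own assumptions: it uses the pencil through (0, −1) and (0, 1), whose reduced circle with center (t, 0) has squared radius (t² + 1)/2. Setting ∂f/∂t = t − 2x to zero gives the touching parameter t = 2x, and the envelope is y² − x² = 1/2. The point (7/8, 9/8) lies on that hyperbola because 81/64 − 49/64 = 1/2. The second function measures the gap between the reduced versions of two orthogonal circles, which is zero exactly when the radii agree. Function names are illustrative only.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def reduced_power(x, y, t):
    """f(x, y, t): weighted square distance from (x, y) to the reduced circle with center (t, 0)
    of the pencil through (0, -1) and (0, 1); that circle has squared radius (t^2 + 1) / 2."""
    return (x - t) ** 2 + y ** 2 - (t * t + 1) * HALF


def touching_parameter(x):
    """Solve df/dt = t - 2x = 0 for t."""
    return 2 * x


def on_envelope(x, y):
    """min over t of f(x, y, t) is f(x, y, 2x) = y^2 - x^2 - 1/2; it is zero exactly on the hyperbola."""
    return reduced_power(x, y, touching_parameter(x)) == 0


def reduced_gap(r0, r1):
    """Center distance of two orthogonal circles with radii r0, r1 minus the sum of the
    radii of their reduced versions; zero means the reduced circles touch."""
    return math.hypot(r0, r1) - (r0 + r1) / math.sqrt(2)


x, y = Fraction(7, 8), Fraction(9, 8)
print(on_envelope(x, y), touching_parameter(x))   # the circle with center (7/4, 0) touches
print(reduced_gap(1.0, 1.0), reduced_gap(1.0, 2.0))
```

At (7/8, 9/8) the function f equals (t − 7/4)²/2. This value is never negative, so the point lies outside or on every reduced circle and on exactly one of them, which is what being on the envelope means.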

We apply this result to the coaxal system consisting of orthogonal pencils $F$ and $G$. Suppose $F$ contains only circles with real radii, or equivalently, $F$ is the affine hull of two intersecting circles. As shown earlier, the envelope of $F^{1/2}$ is a hyperbola. We claim that the envelope of $G^{1/2}$ is the exact same hyperbola. To see this, we first note that a circle in $G^{1/2}$ can at most touch the hyperbola. If it crossed the hyperbola, it would cross a circle of $F^{1/2}$, and we would have two crossing reduced circles, contradicting the orthogonality of the two corresponding original circles. Furthermore, every circle in $G^{1/2}$ for which there is an equally large circle in $F^{1/2}$ touches the hyperbola, because it touches that circle. The two envelopes are therefore the same hyperbola. As shown in Figure III.6, the two asymptotic lines of the hyperbola intersect at a right angle. The smallest separating circle that touches both branches belongs to $F^{1/2}$ and has the same size as the two osculating circles, which both belong to $G^{1/2}$. These circles touch the hyperbola and have the same curvature as the hyperbola at the point of contact.

Figure III.6: Hyperbola with orthogonal asymptotic lines, smallest separating circle, and two osculating circles.

The complementarity of the bodies extends from the case of two orthogonal pencils to the case in which $F$ consists of a single circle $b$ and $G$ contains all circles orthogonal to $b$. The set $G$ is a two-parameter family spanned by three circles. The skin of $F$ is trivially a circle, namely $b^{1/2}$, which implies that the skin of $G$ is the same circle.

Decomposition. The skin of any finite set of circles can be decomposed into simple pieces, each defined by at most three of the circles. A single circle defines a (smaller) circle, a pair of circles defines a hyperbola, and a triplet of circles defines an inverted circle. We thus claim that the skin of $B$ consists of circle arcs, connected to each other by blending hyperbola and inverted circle arcs. We will not prove this claim and instead give an explicit construction of the decomposition, which is facilitated by a complex assembled from Voronoi and Delaunay polyhedra.

As usual, we let $X \subseteq B$ be an index set and use it to denote the Voronoi polyhedron $\nu_X = \bigcap_{b \in X} \nu_b$. The corresponding Delaunay simplex is $\delta_X = \operatorname{conv} \{z_b \mid b \in X\}$. The corresponding mixed cell is the Minkowski sum of shrunken copies of both, $\mu_X = \frac{1}{2}\nu_X + \frac{1}{2}\delta_X$. If $\operatorname{card} X = 1$ then $\mu_X$ is a shrunken and translated copy of a two-dimensional Voronoi cell. If $\operatorname{card} X = 2$ then $\mu_X$ is the Minkowski sum of two orthogonal edges and therefore a rectangle. If $\operatorname{card} X = 3$ then $\mu_X$ is a shrunken and translated copy of a Delaunay triangle. The mixed complex consists of all mixed cells and their faces. Figure III.7 illustrates the construction by showing the mixed complex decomposing the skin into circle and hyperbola arcs. A rather intuitive explanation of the construction can be obtained by drawing the Voronoi diagram and the Delaunay triangulation on two parallel planes in $\mathbb{R}^3$. We decompose the slab between the two planes into pyramids and tetrahedra, which are the convex hulls of corresponding Voronoi polyhedra and Delaunay simplices. The mixed complex is then obtained by intersecting the pyramids and tetrahedra with the plane parallel to and halfway between the other two planes, as sketched in Figure III.8.

Figure III.7: The mixed complex and the skin of four circles.

Symmetry. Note that the construction of the mixed complex is symmetric in the Voronoi diagram and the Delaunay triangulation. In other words, the mixed complex of $B$ is the same as the mixed complex of the collection $C$ of circles introduced in Section V.1. [The order of the chapters on skin and pockets has changed now, which requires a local rewrite here and in Section III.4.] As explained there, $C$ contains a circle centered at each
Voronoi vertex (including those at infinity), with the radius chosen so that it is orthogonal to the circles that define that vertex. The Voronoi diagram of $C$ is then the Delaunay triangulation of $B$, the Delaunay triangulation of $C$ is the Voronoi diagram of $B$, and the mixed complexes of $B$ and $C$ are the same. We have seen that the skins of two orthogonal pencils are the same hyperbola. Similarly, the skins of one circle and of the affine hull of three circles orthogonal to it are the same circle. Since the mixed complex decomposes the entire skin of $B$ into such cases, it follows that the skin of $B$ is the same as that of $C$. Note, however, that the two bodies are not the same but rather complementary: the union of the body of $B$ and the body of $C$ is all of $\mathbb{R}^2$, and their intersection is the common skin of $B$ and $C$.

Figure III.8: The top, middle, and bottom planes carry the Delaunay triangulation, the mixed complex, and the Voronoi diagram.

Bibliographic notes. There is another interpretation of the vector space of circles exploited in this section. It identifies each circle $b = (z, r)$ in $\mathbb{R}^2$ with the point $(z, \|z\|^2 - r^2)$ in $\mathbb{R}^3$. Under this interpretation, the convex hull of a set of circles corresponds to the usual convex hull of points in $\mathbb{R}^3$, and the symmetry between $B$ and $C$ can be explained as a polarity between two convex polyhedra. This interpretation is prominently used in the geometry text by Pedoe [5]. It was discovered in the nineteenth century and published at more or less the same time in three different languages by Clifford [1], Darboux [2], and Frobenius [4].

The material of this section is taken from [3], where skin surfaces are introduced as orientable 2-manifolds in $\mathbb{R}^3$. That paper also proves that the body of a finite collection of spheres has the same homotopy type as the dual complex.

[1] W. K. Clifford. Problem 1748. Mathematical Questions and Solutions from the Educational Times 44 (1865), 144.

[2] M. G. Darboux. De points, de cercles et de sphères. Annales de l'École Normale, Series 2 (1872), 323–392.

[3] H. Edelsbrunner. Deformable smooth surface design. Discrete Comput. Geom. 21 (1999), 87–115.

[4] G. Frobenius. Anwendungen der Determinantentheorie auf die Geometrie des Masses. J. Reine Angew. Math. 79 (1875), 185–247.

[5] D. Pedoe. Geometry: a Comprehensive Course. Dover, New York, 1988.
40 III S URFACE M ESHING

III.2 Curvature
an open set in  , and


 
a parametrization.


Derivatives are taken along curves on the surface. For ex-


ample, to compute the tangent plane at

 , we take
 *

The skin curves introduced in Section III.1 generalize
the tangent vectors of two curves that cross at . They span
straightforwardly to surfaces in . In this section, we
study the curvature of these surfaces. The Curvature Vari- the tangent plane, as illustrated in Figure III.10. Similarly,
ation Lemma proved at the end of this section will play
a major role in the meshing algorithm to be discussed in
Section III.3. There are several notions of curvature of a x y
surface, and all are obtained by considering the curvature f
of curves drawn on the surface.

Curves. A closed space curve is a  map


three-dimensional space,   
 
 of a circle to
. It is smooth if Figure III.10: Construction of tangent plane from two tangent
vectors.
the derivatives of all orders exist. Usually we need only

  
a small number of derivatives, and the assumption of the we define the curvature at in sections. For
each curve
existence of infinitely many is convenient but not neces-
 *
in the plane we consider the space curve

. It is a
sary. Note that a curve has a parametrization and the
counter-clockwise orientation of the circle gives a sense
  ,
geodesic at 
normal at . The curvature of

if its normal agrees with the surface


consists of a portion
 
 
 
F - - , , F 1 ,
of direction. The velocity vector at the point forced by how the surfaces 
is $\gamma'(t)$, and the speed is the length of that vector, $\|\gamma'(t)\|$. The tangent vector is the normalized velocity vector, $T(t) = \gamma'(t)/\|\gamma'(t)\|$, which is defined as long as the speed is non-zero. We can think of $T$ as the Gauss map from the curve to the unit sphere $\mathbb{S}^2$, as illustrated in Figure III.9.

Figure III.9: A closed space curve to the left and its Gauss map to the right.

It is often convenient to assume unit speed. In this case, $\|\gamma'(t)\| = 1$ and the second derivative, $\gamma''(t)$, is normal to the first, because differentiating $\langle \gamma'(t), \gamma'(t) \rangle = 1$ gives $2\langle \gamma''(t), \gamma'(t) \rangle = 0$. The curvature is the length of that second derivative, $\kappa(t) = \|\gamma''(t)\|$. The normal vector is the normalized second derivative, $N(t) = \gamma''(t)/\|\gamma''(t)\|$, which is defined as long as $\kappa(t) \neq 0$. Geometrically, the curvature is one over the radius of the osculating circle at $\gamma(t)$, which is the circle in the plane spanned by the tangent vector and the normal vector that best approximates the curve at $\gamma(t)$.

Surfaces. Let $\mathbb{M}$ be a smooth surface or 2-manifold in $\mathbb{R}^3$. For a point $x \in \mathbb{M}$, we let $U \subseteq \mathbb{M}$ be a neighborhood of $x$ and consider a unit-speed curve $\gamma$ in $U$ with $\gamma(0) = x$. The curvature of $\gamma$ at $x$ has one portion accounting for how $\mathbb{M}$ curves in space and another portion accounting for how $\gamma$ curves within the surface. The second contribution vanishes for geodesics, and if it does, we call the curvature of $\gamma$ at $x$ the normal curvature of $\mathbb{M}$ at $x$ in the direction of the tangent vector $t = \gamma'(0)$, denoted $\kappa(t)$. There is a circle of tangent vectors, and for each one we get a normal curvature. The principal curvatures at $x$ are the minimum and maximum normal curvatures,
$$\kappa_1 = \min_t \kappa(t), \qquad \kappa_2 = \max_t \kappa(t).$$
Let $t_1$ and $t_2$ be the corresponding tangent directions. By a result of Euler, the principal curvatures determine all other normal curvatures at $x$.

EULER'S THEOREM. The directions $t_1$ and $t_2$ are orthogonal, and if $t = \cos\varphi \, t_1 + \sin\varphi \, t_2$ then $\kappa(t) = \kappa_1 \cos^2\varphi + \kappa_2 \sin^2\varphi$.

This implies that if $\kappa_1 < \kappa_2$ then all other normal curvatures lie strictly between the two principal curvatures, and the principal directions are therefore unique up to sign. If $\kappa_1 = \kappa_2$ then all normal curvatures are the same and the point is an umbilic point of the surface. Two other common notions of curvature are the mean curvature, $H = (\kappa_1 + \kappa_2)/2$, and the Gaussian curvature, $K = \kappa_1 \kappa_2$. In contrast to the other notions, the Gaussian curvature is intrinsic. In other words, it is preserved by isometries, which are transformations that preserve the distance between points measured as lengths of connecting paths within the surface. This is a famous result of Gauss.

THEOREMA EGREGIUM. $K$ is an isometric invariant.

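Euler's Theorem can be checked numerically on a surface whose principal curvatures are known by construction. The following Python sketch is our own illustration (the function names are hypothetical and not taken from any cited software). It uses the graph $z = (\kappa_1 x^2 + \kappa_2 y^2)/2$, whose tangent plane at the origin is horizontal, so the normal curvature in direction $\varphi$ equals the second directional derivative of the height there; a central finite difference recovers $\kappa_1 \cos^2\varphi + \kappa_2 \sin^2\varphi$.

```python
import math


def height(x, y, k1, k2):
    """Graph z = (k1*x^2 + k2*y^2)/2; its principal curvatures at the origin are k1, k2."""
    return 0.5 * (k1 * x * x + k2 * y * y)


def normal_curvature_fd(phi, k1, k2, h=1e-3):
    """Second directional derivative of the height at the origin in direction phi.

    The gradient of the height vanishes at the origin, so this equals the
    normal curvature of the graph in that tangent direction."""
    c, s = math.cos(phi), math.sin(phi)

    def along(r):
        return height(r * c, r * s, k1, k2)

    return (along(h) - 2.0 * along(0.0) + along(-h)) / (h * h)


def euler(phi, k1, k2):
    """Normal curvature predicted by Euler's Theorem."""
    return k1 * math.cos(phi) ** 2 + k2 * math.sin(phi) ** 2


if __name__ == "__main__":
    k1, k2 = -0.5, 2.0  # a saddle point, so K = k1 * k2 < 0
    for deg in (0, 30, 60, 90, 135):
        phi = math.radians(deg)
        print(deg, normal_curvature_fd(phi, k1, k2), euler(phi, k1, k2))
    print("H =", (k1 + k2) / 2, " K =", k1 * k2)
```

The two printed columns agree up to rounding, and all values stay between $\kappa_1$ and $\kappa_2$, as the theorem predicts.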
Skin surfaces. Recall that the skin defined by a finite set of circles in $\mathbb{R}^2$ is the envelope of the infinite family of circles in the convex hull, each with its radius reduced by a factor $\sqrt{2}$. Furthermore, the mixed complex defined by the circles decomposes the skin into circle and hyperbola arcs. Similarly, the skin of a finite set of spheres $B$ in $\mathbb{R}^3$ is the envelope of the infinite family of spheres in the convex hull, each with its radius reduced by the same factor. The mixed complex that decomposes the surface consists of the four types of cells illustrated in Figure III.11. Within each mixed cell $\mu_I$, we have a sphere or a hyperboloid patch. The hyperboloid can either be one-sheeted (an hour-glass) or two-sheeted. The cases are summarized in Table III.1.

Figure III.11: Typical mixed cells $\mu_I$. From left to right we have $|I| = 1, 2, 3$, and 4.

  $|I|$   $\dim V_I$   $\dim \delta_I$   mixed cell           skin patch
  1       3            0                 convex polyhedron    sphere
  2       2            1                 polygonal prism      hyperboloid
  3       1            2                 triangular prism     hyperboloid
  4       0            3                 tetrahedron          sphere

Table III.1: The cardinality of $I$ listed in the first column determines the dimensions of the corresponding Voronoi polyhedron and Delaunay simplex as well as the type of the mixed cell and of the skin patch.

The two sphere cases are symmetric and differ from each other by the surface orientation: in the case $|I| = 1$, the body lies locally inside, and in the case $|I| = 4$, it lies locally outside the sphere. Similarly, the two hyperboloid cases are symmetric and differ from each other by the surface orientation. In the case $|I| = 2$, the symmetry axis of the hyperboloid is the affine hull of the Delaunay edge and the (orthogonal) symmetry plane is the affine hull of the Voronoi polygon. We have a one-sheeted hyperboloid if the two spheres intersect in a circle and a two-sheeted one if they are disjoint. The common limiting case is a double-cone defined by two touching spheres. Either way, the body is on the side of the infinite ends of the symmetry axis. In the case $|I| = 3$, the symmetry plane is the affine hull of the Delaunay triangle and the symmetry axis is the affine hull of the Voronoi edge. Whether the hyperboloid is one-sheeted, a double-cone, or two-sheeted depends on whether the two spheres orthogonal to the three spheres with indices in $I$ intersect in a circle, touch in a point, or are disjoint. Either way, the body lies on the side of the infinite circle in the symmetry plane.

Maximum normal curvature. We can translate and rotate every sphere and hyperboloid to standard form, which we define as
$$x_1^2 + x_2^2 + x_3^2 = r^2, \qquad x_1^2 + x_2^2 - x_3^2 = c.$$
The second equation defines a hyperboloid with the apex at the origin, the symmetry axis along $x_3$, and the symmetry plane $x_3 = 0$. We have a one-sheeted hyperboloid for $c > 0$ and a two-sheeted one for $c < 0$, as illustrated in Figure III.12. For the sphere, the normal curvature at every point is $1/r$ in every tangent direction. The situation is more complicated for the hyperboloid. Consider the hyperbola in standard form in $\mathbb{R}^2$, as shown in Figure III.13, and note that both the one-sheeted and the two-sheeted hyperboloid can be obtained by rotating the hyperbola about a symmetry axis. In either case, the maximum normal curvature at a point $x$, $\kappa(x)$, is one over the radius of the largest sphere that passes through $x$ and touches but does not cross the hyperboloid. As shown in Figure III.13, this radius is the same as the distance of $x$ from the origin. In short, $\kappa(x) = 1/\|x\|$ for every point $x$ of a sphere or hyperboloid in standard form.

Figure III.12: The sphere, the one-sheeted hyperboloid, and the two-sheeted hyperboloid.

Figure III.13: Every point of the hyperbola is sandwiched between two equally large circles.

Curvature variation. The maximum normal curvature varies continuously over the skin because the common radius of the sandwiching spheres varies continuously. We strengthen the result by showing that $\kappa$ varies rather slowly. In fact, we extend $1/\kappa$ to a function defined on all of $\mathbb{R}^3$ and show that $1/\kappa$ has Lipschitz constant one. We have seen that within a mixed cell $\mu_I$, $1/\kappa(x)$ is simply the distance to the center, $\|x - z_I\|$. By the definition of the mixed complex, this is a continuous function on $\mathbb{R}^3$. Within the mixed cell, the triangle inequality gives the Lipschitz bound,
$$\left| \frac{1}{\kappa(x)} - \frac{1}{\kappa(y)} \right| = \big|\, \|x - z_I\| - \|y - z_I\| \,\big| \leq \|x - y\|.$$
By applying this to the pieces of the line segment from $x$ to $y$ contained in different mixed cells and summing, we obtain the result, since the lengths of the pieces add up to $\|x - y\|$.

CURVATURE VARIATION LEMMA. For all points $x, y \in \mathbb{R}^3$ we have $|1/\kappa(x) - 1/\kappa(y)| \leq \|x - y\|$.

We note that the extension of $1/\kappa$ to a function $\mathbb{R}^3 \to \mathbb{R}$ gives the reciprocal maximum normal curvature of all skin surfaces in the family defined by the power growth model of the spheres, as introduced in Section II.2.

Bibliographic notes. The books by Bruce and Giblin [1] and by O'Neill [4] are good introductory texts to curves and surfaces and other topics in differential geometry. The skin surfaces in $\mathbb{R}^3$ are obtained by extending the results of Section III.1 by one dimension, from $\mathbb{R}^2$ to $\mathbb{R}^3$. A more direct treatment of the general-dimensional case can be found in [3]. The specific results on the curvature and the curvature variation of skin surfaces are taken from [2].

[1] J. W. Bruce and P. J. Giblin. Curves and Singularities. Second edition, Cambridge Univ. Press, England, 1992.

[2] H.-L. Cheng, T. K. Dey, H. Edelsbrunner and J. Sullivan. Dynamic skin triangulation. Discrete Comput. Geom. 25 (2001), 525–568.

[3] H. Edelsbrunner. Deformable smooth surface design. Discrete Comput. Geom. 21 (1999), 87–115.

[4] B. O'Neill. Elementary Differential Geometry. Second edition, Academic Press, San Diego, 1997.
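The formula $\kappa(x) = 1/\|x\|$ for spheres and hyperboloids in standard form can be confirmed numerically. The Python sketch below is our own check, not code from [2]. It writes the standard forms as $x_1^2 + x_2^2 + s\,x_3^2 = c$, with $s = +1$ for a sphere and $s = -1$ for a hyperboloid (the names $s$ and $c$ are ours), evaluates the normal curvature of the implicit surface $F(x) = 0$ in a unit tangent direction $t$ as $t^{\mathsf T} \nabla^2 F \, t / \|\nabla F\|$ up to sign, and takes the largest absolute value over many tangent directions, so the surface orientation is ignored.

```python
import math


def tangent_basis(n):
    """Two orthonormal vectors spanning the plane orthogonal to the unit vector n."""
    k = min(range(3), key=lambda i: abs(n[i]))  # axis least aligned with n
    a = [0.0, 0.0, 0.0]
    a[k] = 1.0
    d = sum(a[i] * n[i] for i in range(3))
    u = [a[i] - d * n[i] for i in range(3)]
    lu = math.sqrt(sum(x * x for x in u))
    u = [x / lu for x in u]
    v = [n[1] * u[2] - n[2] * u[1],  # v = n x u is already a unit vector
         n[2] * u[0] - n[0] * u[2],
         n[0] * u[1] - n[1] * u[0]]
    return u, v


def max_normal_curvature(p, s, samples=3600):
    """Largest |normal curvature| at p of the quadric x1^2 + x2^2 + s*x3^2 = c."""
    grad = (2.0 * p[0], 2.0 * p[1], 2.0 * s * p[2])
    hess = (2.0, 2.0, 2.0 * s)  # the Hessian of the quadric is diagonal
    g = math.sqrt(sum(x * x for x in grad))
    n = [x / g for x in grad]
    u, v = tangent_basis(n)
    best = 0.0
    for k in range(samples):  # directions t and -t give the same curvature
        phi = math.pi * k / samples
        t = [math.cos(phi) * u[i] + math.sin(phi) * v[i] for i in range(3)]
        kappa = sum(hess[i] * t[i] * t[i] for i in range(3)) / g
        best = max(best, abs(kappa))
    return best


def point_on_quadric(s, c, z, theta):
    """Point of x1^2 + x2^2 + s*x3^2 = c at height z and angle theta about the x3-axis."""
    rho = math.sqrt(c - s * z * z)
    return (rho * math.cos(theta), rho * math.sin(theta), z)


if __name__ == "__main__":
    cases = [(1, 4.0, 0.5, 0.3),    # sphere of radius 2
             (-1, 1.0, 1.0, 1.1),   # one-sheeted hyperboloid
             (-1, 0.0, 1.0, 0.7),   # double-cone
             (-1, -1.0, 2.0, 2.0)]  # two-sheeted hyperboloid
    for s, c, z, theta in cases:
        p = point_on_quadric(s, c, z, theta)
        print(max_normal_curvature(p, s), 1.0 / math.sqrt(sum(x * x for x in p)))
```

In each printed pair the two numbers agree closely; the curvature along the meridian never exceeds the one along the circle of rotation, which is why the maximum is $1/\|x\|$.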
III.3 Adaptive Meshing

In this section, we focus on constructing an explicit representation of a molecular skin surface. We choose a triangle mesh realized in $\mathbb{R}^3$ that is a good approximation of the surface and has good numerical properties.

Triangulations. Recall that a triangulation of a surface $F \subseteq \mathbb{R}^3$ is a simplicial complex $K$ whose underlying space is homeomorphic to $F$. Since $F$ is a 2-manifold, it follows that the simplicial complex is the closure of its triangle set, every edge belongs to exactly two triangles, and the star of every vertex forms a disk. Note that the last property implies the first two. We construct a triangulation by first selecting points on $F$ and second connecting these points with edges and triangles. Given the Delaunay triangulation of $B$, we have sufficient information to sample points and to compute their maximum normal curvature values. Specifically, for each Delaunay simplex $\delta_I$, we construct the mixed cell $\mu_I$. The center of this cell is the point $z_I$ at which the affine hull of $V_I$ intersects the affine hull of $\delta_I$. It is also the center of the corresponding sphere or the apex of the corresponding hyperboloid. Next, we translate the mixed cell so its center moves to the origin. Furthermore, if $V_I$ or $\delta_I$ is an edge then we rotate it into vertical position. The sphere or hyperboloid defined by $I$ is then in standard form, which can be sampled. For each sampled point we compute the maximum normal curvature from its distance to the origin, and we obtain the corresponding point on $F$ by the inverse rotation and translation.

Let $X$ be the set of points sampled on $F$. We use it as the vertex set of the triangulation, which we construct as the dual of a decomposition of $F$. Specifically, for each point $p \in X$, the restricted Voronoi cell is
$$F_p = \{ x \in F \mid \|x - p\| \leq \|x - q\| \text{ for all } q \in X \},$$
where distance is measured in $\mathbb{R}^3$, as usual. It is the intersection of $F$ with the Voronoi polyhedron $V_p$ of $p$ in $\mathbb{R}^3$. The restricted cells decompose $F$ into closed regions that overlap along common pieces of their boundaries. Locally the picture is rather similar to that of a Voronoi diagram in $\mathbb{R}^2$. The restricted Delaunay triangulation, $R_X$, is the collection of simplices $\operatorname{conv} Y$, with $Y \subseteq X$, whose restricted Voronoi cells have a non-empty common intersection, $\bigcap_{p \in Y} F_p \neq \emptyset$. The construction is illustrated in Figure III.14. We note that $R_X$ is a subcomplex of the (unrestricted) Delaunay triangulation of $X$ in $\mathbb{R}^3$.

Figure III.14: Local decomposition into restricted Voronoi cells and dotted dual restricted Delaunay triangulation.

Closed ball property. One trouble with the restricted Delaunay triangulation is that it may not be homeomorphic to $F$ and thus not triangulate the surface. Indeed, it is easy to come up with cases where $R_X$ is not even a 2-manifold. A sufficient condition for $R_X$ to triangulate $F$ is what we call the closed ball property. It requires that each common intersection of restricted Voronoi cells is topologically a closed ball of the appropriate dimension. We formulate this condition in terms of the three-dimensional Voronoi polyhedra defined by $X$. Assuming general position, the Voronoi polyhedron $V_Y = \bigcap_{p \in Y} V_p$ has dimension $4 - |Y|$, and we require that $V_Y \cap F$ is either empty or homeomorphic to a closed ball of dimension $3 - |Y|$. Depending on the cardinality of $Y$ we have a closed disk, a closed interval, or a single point.

Figure III.15: To the left a barycentric subdivision of a portion of a Voronoi diagram drawn with solid lines. To the right the isomorphic barycentric subdivision of the corresponding portion of the dual Delaunay triangulation drawn with dashed lines.

Proving that the closed ball property implies that $R_X$ triangulates $F$ is not difficult. Decompose the restricted Voronoi diagram by adding a point in the middle of each
arc and inside each cell, and connect each point inside a cell to the points on its boundary. The star of every point inside a restricted cell is a triangular decomposition of that cell. The star of every restricted Voronoi vertex consists of six triangular regions that can be homeomorphically mapped to the six triangles in the barycentric subdivision of the dual restricted Delaunay triangle. By construction of $R_X$, the triangles in the two barycentric subdivisions are connected the same way, so we have a homeomorphism between $F$ and the underlying space of $R_X$, which is illustrated in Figure III.15.

$\varepsilon$-sampling. The question remains how we sample the points such that the restricted Voronoi diagram has the closed ball property. Since $F$ is smooth, small neighborhoods are fairly flat and the restricted Voronoi diagram behaves locally similar to the (unrestricted) Voronoi diagram of a set of points in the plane. In other words, a dense enough sample of points should have the closed ball property. This intuition can be made precise by formalizing the concept of density. Recall that $\kappa(x)$ is the maximum normal curvature at a point $x \in F$. Around $x$ we would spread points at distance roughly proportional to $1/\kappa(x)$. We therefore define $\varrho(x) = 1/\kappa(x)$ and call it the length scale at $x$. The Curvature Variation Lemma of Section III.2 states that for any two points $x, y \in F$, the difference in length scale is at most the distance between them in $\mathbb{R}^3$, $|\varrho(x) - \varrho(y)| \leq \|x - y\|$.

An $\varepsilon$-sampling is a subset $X \subseteq F$ such that for each point $x \in F$ there exists a point $p \in X$ at distance $\|x - p\| \leq \varepsilon \varrho(x)$. Showing that a sufficiently small $\varepsilon$ implies the closed ball property for the restricted Voronoi diagram is rather tedious, and we omit the proof.

HOMEOMORPHISM THEOREM. If $X$ is an $\varepsilon$-sampling of $F$ with $\varepsilon < \varepsilon_0$, then the restricted Delaunay triangulation of $X$ is homeomorphic to $F$.

The precise upper bound $\varepsilon_0$ is a root of a function that arises in the proof of the Homeomorphism Theorem.

Even sampling. The points of an $\varepsilon$-sampling cannot be too far apart locally, but they can be arbitrarily close together. In other words, on a microscopic scale, the points can be placed any way one likes and the mesh can be arbitrarily ugly. To improve the mesh, we impose conditions on the size of edges and triangles that imply both upper and lower bounds on the spacing between sampled points.

Let the size of an edge $ab$ be half its length, $R_{ab} = \|a - b\|/2$, and the size of a triangle $abc$ be the radius of its circumcircle, $R_{abc}$. For edges we worry about them getting too short, so we compare size with the larger length scale at the endpoints, $\varrho_{ab} = \max\{\varrho(a), \varrho(b)\}$. For triangles we worry about them getting too large, so we compare size with the minimum length scale at the vertices, $\varrho_{abc} = \min\{\varrho(a), \varrho(b), \varrho(c)\}$. We use two constants, $\varepsilon$ and $Q$, to express the conditions on the size. The constant $\varepsilon$ controls how closely the triangulation approximates $F$, and $Q$ controls the quality of the triangles. We refer to the two conditions as the Lower and Upper Size Bounds,

[L] $R_{ab}/\varrho_{ab} > \varepsilon/Q$ for every edge $ab \in R_X$,
[U] $R_{abc}/\varrho_{abc} \leq \varepsilon$ for every triangle $abc \in R_X$.

It is not necessary to bound the edge lengths from above because an edge with $R_{ab}/\varrho_{ab} > \varepsilon$ would belong to two triangles that both violate [U]: the circumradius of a triangle is at least half the length of each of its edges, and $\varrho_{abc} \leq \varrho_{ab}$. Symmetrically, we do not need to bound the triangle sizes from below because a triangle with $R_{abc}/\varrho_{abc} \leq \varepsilon/Q$ would have three edges that violate [L].

Mesh quality. The constants $\varepsilon$ and $Q$ have to be chosen judiciously. For example, $Q \leq 1$ would immediately lead to irreconcilable requirements on edge and triangle sizes. Furthermore, $\varepsilon$ cannot be too large, else we would contradict the $\varepsilon$-sampling condition stated in the Homeomorphism Theorem. Without going into details, we state that a sufficiently small $\varepsilon$ together with a suitably chosen $Q$ are feasible choices. In particular, these constants imply that $X$ is an $\varepsilon$-sampling for a sufficiently small value of $\varepsilon$. More precisely, they imply that $X$ is either such an $\varepsilon$-sampling or it grossly violates the condition for $\varepsilon$-sampling. An example of such a gross violation is four points close together on a sphere. The points form a tetrahedron whose edges and triangles may very well satisfy the Size Bounds, but the boundary of the tetrahedron is a miserable approximation of the much larger sphere. Fortunately, such a gross violation of the condition cannot be created from an $\varepsilon$-sampling without the intermediate generation of triangles that grossly violate [U]. The algorithm discussed below is unable to generate such triangles.

The two Size Bounds together imply a reasonably large lower bound on the angles inside triangles of the restricted Delaunay triangulation.
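The Size Bounds translate directly into predicates on coordinates. The Python sketch below is a minimal illustration under our own conventions (points as coordinate tuples, a dictionary `rho` holding the length scale of each vertex, and hypothetical function names). It computes $R_{ab}$ and $R_{abc}$, tests [L] and [U], and checks on random triangles that any triangle passing both bounds has all angles larger than $\arcsin(1/Q)$. This follows because the shortest edge, opposite the smallest angle $\alpha$, has length $2R_{abc}\sin\alpha$, so [L] and [U] give $\sin\alpha > 1/Q$.

```python
import math
import random


def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def edge_size(a, b):
    """R_ab: half the length of edge ab."""
    return norm(sub(a, b)) / 2.0


def triangle_size(a, b, c):
    """R_abc: circumradius, from R = |ab| |bc| |ca| / (4 * area)."""
    area = norm(cross(sub(b, a), sub(c, a))) / 2.0
    return norm(sub(a, b)) * norm(sub(b, c)) * norm(sub(c, a)) / (4.0 * area)


def satisfies_L(a, b, rho, eps, Q):
    """Lower Size Bound [L]: R_ab / max(rho(a), rho(b)) > eps / Q."""
    return edge_size(a, b) / max(rho[a], rho[b]) > eps / Q


def satisfies_U(a, b, c, rho, eps):
    """Upper Size Bound [U]: R_abc / min(rho(a), rho(b), rho(c)) <= eps."""
    return triangle_size(a, b, c) / min(rho[a], rho[b], rho[c]) <= eps


def min_angle(a, b, c):
    """Smallest interior angle of triangle abc, in radians."""
    def angle_at(p, q, r):
        u, v = sub(q, p), sub(r, p)
        cos_t = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        return math.acos(max(-1.0, min(1.0, cos_t)))
    return min(angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b))


def check_minimum_angle_bound(trials=20000, eps=0.6, Q=2.0, seed=1):
    """Random triangles and length scales: every triangle meeting [U], with all
    three edges meeting [L], must have minimum angle > arcsin(1/Q)."""
    rng = random.Random(seed)
    bound = math.asin(1.0 / Q)
    passed = 0
    for _ in range(trials):
        a, b, c = [tuple(rng.uniform(0.0, 1.0) for _ in range(3)) for _ in range(3)]
        rho = {p: rng.uniform(0.5, 2.0) for p in (a, b, c)}
        edges_ok = all(satisfies_L(p, q, rho, eps, Q) for p, q in ((a, b), (b, c), (c, a)))
        if edges_ok and satisfies_U(a, b, c, rho, eps):
            passed += 1
            assert min_angle(a, b, c) > bound
    return passed


if __name__ == "__main__":
    print(check_minimum_angle_bound(), "random triangles met both bounds")
```

With these parameters an equilateral triangle of unit side and unit length scales passes both bounds, while any triangle with an angle below $\arcsin(1/Q)$ fails at least one of them.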
MINIMUM ANGLE LEMMA. A triangle that satisfies [U] and whose edges satisfy [L] has minimum angle larger than $\arcsin(1/Q)$.

PROOF. Let $abc$ be the triangle and $R = R_{abc}$ its circumradius. Assuming $\alpha$ is the smallest angle, the opposite edge, which we call $ab$, is the shortest edge, and by the law of sines it has length $\|a - b\| = 2R\sin\alpha$. We have $\varrho_{ab} \geq \varrho_{abc}$ by definition of length scale. Using [L] and [U] we thus get
$$\sin\alpha = \frac{R_{ab}}{R_{abc}} > \frac{\varepsilon \varrho_{ab}/Q}{\varepsilon \varrho_{abc}} \geq \frac{1}{Q}.$$
Hence $\alpha > \arcsin(1/Q)$, because $\alpha \leq 60^\circ$ as the smallest of three angles summing to $180^\circ$. The other two angles are at least as large, so the maximum angle is smaller than $180^\circ - 2\arcsin(1/Q)$. $\Box$

Density modification. Given an $\varepsilon$-sampling, we can enforce the Size Bounds by contracting short edges and inserting points near the circumcenters of large triangles. Given a triangle $abc$ that violates [U], we add the dual restricted Voronoi vertex $x$ as a new point to $X$. The insertion may cause new violations of [U] and thus trigger new point insertions.

  void VERTEXINSERTION:
    while ∃ triangle abc ∈ R_X violating [U] do
      X := X ∪ {x}, where x is the dual restricted Voronoi vertex of abc
    endwhile.

The details of the algorithm that modifies the restricted Delaunay triangulation to reflect the addition of $x$ are omitted. A vertex insertion may cause other vertex insertions, but this cannot go on forever because infinitely many insertions on the compact surface $F$ would eventually create arbitrarily short edges, which the No-Short-Edge Lemma below rules out. Given an edge $ab$ that violates [L], we contract it by removing one of its endpoints. We are not able to exclude the possibility that the removal creates new violations of [L], and it certainly can create new violations of [U].

  void EDGECONTRACTION:
    while ∃ edge ab ∈ R_X violating [L] do
      choose the endpoint a to be removed;
      X := X − {a}; VERTEXINSERTION
    endwhile.

The details of the algorithm are again omitted. An edge contraction may perhaps cause other edge contractions, but this cannot go on forever because we will eventually violate the Upper Size Bound. It is possible that an edge contraction causes a vertex insertion, but a vertex insertion cannot create edges of size below the allowed threshold. This is what prevents infinite loops in spite of the algorithm's partially conflicting efforts to simultaneously avoid short edges and large triangles. To prove this claim, we consider a triangle $abc$ that causes the addition of its dual restricted Voronoi vertex $x$.

NO-SHORT-EDGE LEMMA. Every edge $xp$ created during the addition of $x$ has ratio $R_{xp}/\varrho_{xp} > \varepsilon/(2 + 4\varepsilon)$.

PROOF. We have $R_{abc} > \varepsilon \varrho_{abc}$ because $abc$ violates [U]. Since $x$ belongs to the restricted Voronoi cells of $a$, $b$, and $c$, the sphere with center $x$ that passes through $a$, $b$, and $c$ contains no vertices in its interior, and its radius is $r = \|x - a\| \geq R_{abc}$. Every new edge $xp$ therefore has length $\|x - p\| \geq r$. Assume without loss of generality that $\varrho(a) = \varrho_{abc}$. We use the Curvature Variation Lemma to derive upper bounds for the length scales at $x$ and $p$:
$$\varrho(x) \leq \varrho(a) + \|x - a\| = \varrho_{abc} + r,$$
$$\varrho(p) \leq \varrho(a) + \|p - a\| \leq \varrho_{abc} + r + \|x - p\|.$$
Hence $\varrho_{xp} \leq \varrho_{abc} + r + \|x - p\|$. The ratio $(\|x - p\|/2)/(\varrho_{abc} + r + \|x - p\|)$ increases with $\|x - p\|$, so for $\|x - p\| \geq r$ and $\varrho_{abc} < R_{abc}/\varepsilon \leq r/\varepsilon$ we have
$$\frac{R_{xp}}{\varrho_{xp}} \geq \frac{r/2}{\varrho_{abc} + 2r} > \frac{r/2}{r/\varepsilon + 2r} = \frac{\varepsilon}{2 + 4\varepsilon},$$
as claimed. $\Box$

In particular, the new edges satisfy [L] whenever $Q \geq 2 + 4\varepsilon$.

Scheduling. [Summarize the results on scheduling edge contractions and vertex insertions described in [5].]

Bibliographic notes. The restricted Delaunay triangulation is a generalization of the dual complex of a ball union. It can be used to triangulate surfaces and other spaces embedded in a Euclidean space. Besides the dual complex literature, there are several other partially dependent roots of the idea, namely the surface meshing method by Chew [3], the work on neural networks by Martinetz and Schulten [6], the formulation of the closed ball property by Edelsbrunner and Shah [4], and the surface reconstruction algorithm by Amenta and Bern [1]. The last of the four papers also introduces $\varepsilon$-samplings of surfaces, although in a slightly different formulation in which the distance to the medial axis replaces the length scale. All results that are specific to skin surfaces are taken from [2]. The algorithm in that paper is more general than what is explained in this section and maintains the surface mesh while it moves in space.

[1] N. Amenta and M. Bern. Surface reconstruction by Voronoi filtering. Discrete Comput. Geom. 22 (1999), 481–504.

[2] H.-L. Cheng, T. K. Dey, H. Edelsbrunner and J. Sullivan. Dynamic skin triangulation. Discrete Comput. Geom. 25 (2001), 525–568.

[3] L. P. Chew. Guaranteed-quality mesh generation for curved surfaces. In "Proc. 9th Ann. Sympos. Comput. Geom., 1993", 274–280.

[4] H. Edelsbrunner and N. R. Shah. Triangulating topological spaces. Internat. J. Comput. Geom. Appl. 7 (1997), 365–378.

[5] H. Edelsbrunner and A. Üngör. Relaxed scheduling in dynamic skin triangulation. In "Japanese Conf. Comput. Geom., 2002", to appear.

[6] T. Martinetz and K. Schulten. Topology representing networks. Neural Networks 7 (1994), 507–522.
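The interplay of VERTEXINSERTION and EDGECONTRACTION from Section III.3 can be tried out on a one-dimensional toy model. In the Python sketch below, which is our own analog and not the algorithm of [2], the sample points lie in the interval [0, 1], the edges are the gaps between consecutive points, the midpoint of a gap plays the role of the dual restricted Voronoi vertex, and, since there are no triangles, [U] is applied to the gaps themselves. The length scale `rho` is a hypothetical function with Lipschitz constant one, mimicking the Curvature Variation Lemma. The argument of the No-Short-Edge Lemma carries over: a gap violating [U] splits into halves whose ratio exceeds $\varepsilon/(2 + 4\varepsilon)$, so for the assumed constants $\varepsilon = 0.1$ and $Q = 2.5 \geq 2 + 4\varepsilon$ insertions never create gaps that violate [L], and both loops terminate.

```python
def rho(x):
    """Hypothetical length scale on [0, 1] with Lipschitz constant one."""
    return 0.05 + abs(x - 0.5)


def violates_L(x0, x1, eps, Q):
    """Gap [x0, x1] has size (x1 - x0)/2; [L] asks size / max(rho) > eps / Q."""
    return (x1 - x0) / 2.0 / max(rho(x0), rho(x1)) <= eps / Q


def violates_U(x0, x1, eps):
    """[U], applied to gaps in this 1D analog: size / min(rho) <= eps."""
    return (x1 - x0) / 2.0 / min(rho(x0), rho(x1)) > eps


def vertex_insertion(pts, eps):
    """Split every gap that violates [U] at its midpoint until none is left."""
    i = 0
    while i < len(pts) - 1:
        if violates_U(pts[i], pts[i + 1], eps):
            pts.insert(i + 1, (pts[i] + pts[i + 1]) / 2.0)  # recheck the left half
        else:
            i += 1


def edge_contraction(pts, eps, Q):
    """Remove an endpoint of a gap violating [L], then restore [U]."""
    while True:
        bad = next((i for i in range(len(pts) - 1)
                    if violates_L(pts[i], pts[i + 1], eps, Q)), None)
        if bad is None:
            return
        # The interval ends 0 and 1 stay fixed; remove an interior endpoint.
        del pts[bad + 1 if bad + 1 < len(pts) - 1 else bad]
        vertex_insertion(pts, eps)


def remesh(points, eps=0.1, Q=2.5):
    """Enforce both Size Bounds on a sorted copy of the points."""
    pts = sorted(points)
    vertex_insertion(pts, eps)
    edge_contraction(pts, eps, Q)
    return pts


if __name__ == "__main__":
    mesh = remesh([0.0, 0.3, 0.5, 0.5005, 1.0])
    print(len(mesh), "points")
    print([round(x, 4) for x in mesh])
```

The input contains the too-close pair 0.5 and 0.5005, which EDGECONTRACTION resolves by removing 0.5005; everywhere else the points end up spaced roughly in proportion to `rho`, densest near 0.5 where the length scale is smallest.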
III.4 Skin Software

In this section, we use two pieces of software to visualize the various geometric concepts introduced earlier in this chapter.

Skin curves. The Morfi software is two-dimensional and constructs skin curves from finite sets of circles. In Figure III.16 we see seven disks whose union is decomposed into convex regions by the Voronoi diagram. Superimposed on this decomposition is the skin curve with shaded body and the dual complex. Note that the disk union contains the body and the body contains the dual complex. Furthermore, the disk union, the body, and the dual complex all have the same homotopy type. This is always true. The skin shrinks the arcs in the boundary of the disk union and smoothly blends between the shrunken arcs using pieces of hyperbolas and inverted circles. Most striking is the blending for the quadrangular hole roughly in the middle of the figure, which is converted into an almost entirely circular hole in the body.

Figure III.16: Voronoi decomposition of disk union with superimposed skin, body, and dual complex.

Mixed complex. Using the Morfi software, we can visualize concepts that are difficult if not impossible to show in $\mathbb{R}^3$. An example is the mixed complex illustrated in Figure III.17. It decomposes the skin into circular and hyperbolic arcs. As explained in Section III.1, it consists of shrunken Voronoi polygons, rectangles, and shrunken Delaunay polygons. The collection of circles generating the diagram in Figure III.17 is degenerate, which can be seen from the fact that there are three shrunken Delaunay triangles but also two shrunken Delaunay quadrangles. One of the quadrangles contains most of the hole in the body. The portion of the hole boundary inside that quadrangle is circular, while the portions outside the quadrangle are hyperbolic. Observe also that the five Delaunay polygons visible within the mixed complex apparently have eight vertices (not double-counting the shared ones). We see only seven of them in Figure III.16 because one of the eight radii is imaginary. Where is its center in Figure III.16?

Figure III.17: Decomposition of the skin and body by the mixed complex.

Simulated smoothing. We return to an issue left open in Section V.1, where we considered the minimum weighted square distance function $f: \mathbb{R}^2 \to \mathbb{R}$ of a collection of circles $B$. The zero-set of $f$ is the envelope of the circles, and the preimage of any real value $t$ is the envelope of the circles whose squared radii have all been increased by $t$. Following the notation in Section II.3, we think of $t$ as time and denote the collection of circles at time $t$ by $B_t$. In Section V.1 we claimed that there is an infinite family of smooth approximations $g_\nu$ of $f$ that all have the same critical points, namely the points where dually corresponding Voronoi and Delaunay polyhedra intersect. We construct the family such that $g_\nu$ approaches $f$ as $\nu$ goes to 1. One function in this family is the trajectory of the skin curve, $g: \mathbb{R}^2 \to \mathbb{R}$, that maps each point $x$ to the moment in time $t$ at which $x$ belongs to the skin of $B_t$. We generalize this construction by letting $g_\nu$ be the trajectory of the modified skin curves. Specifically, the $\nu$-skin is the envelope of the circles in the convex hull, each reduced by a factor that depends on $\nu$.
Note that one member of this family is the skin as defined in Section III.1, and that as $\nu$ goes to 1 the $\nu$-skin approaches the envelope of the original disks. Figure III.18 illustrates the construction by showing the modified skins for several values of $\nu$. Observe that the bodies bounded by the $\nu$-skins are nested. As it turns out, the innermost $\nu$-skin is also the envelope of the orthogonal circles as defined in Section III.1. The function $g_\nu: \mathbb{R}^2 \to \mathbb{R}$ maps every point $x$ to the moment in time $t$ at which $x$ belongs to the $\nu$-skin of $B_t$, as usual. For $\nu < 1$, the height function $g_\nu$ is differentiable and, assuming non-degeneracy of the input circles, it is twice differentiable at the critical points. This is sufficient to justify the Morse theoretic reasoning about the non-smooth function used in Section V.1 to define pockets.

Figure III.18: From inside out, the sequence of $\nu$-skins for increasing values of $\nu$.

Meshed skin surfaces. In $\mathbb{R}^3$, we compute triangulated skin surfaces using the Skin Meshing software. It takes as input a set of spheres $B$ and constructs a mesh by maintaining a triangulation of the skin of $B_t$, with the time $t$ continuously increasing from minus infinity to zero. At the beginning, all spheres are imaginary, the skin is the empty surface, and the mesh is the empty complex. As time increases, the surface moves and the software updates the mesh accordingly. At time $t = 0$, we have the mesh of the skin of $B$. Figure III.19 shows a portion of this mesh for a small molecule. The image is created by slicing the surface with a plane and removing the front portion of the surface. The complete surface has genus one, and the slicing plane is chosen to cut right through the narrow part of the tunnel. The image of the mesh in Figure III.19 should be compared with the rendering of the same surface in Figure III.20. The apparent smoothness is an illusion created by Gouraud shading, a graphics technique that computes the shading at the vertices from their normal directions and interpolates it across the triangles to generate the smooth impression. Note that highly curved areas detectable in Figure III.20 correspond to high-density regions in Figure III.19.

Figure III.19: Cut-away view of the mesh of a small molecule of about forty atoms. Only the edges of the mesh and the cut boundary are shown.

Growing the mesh. As mentioned earlier, the mesh is constructed by maintaining it while growing the spheres. The algorithm thus reduces to executing a sequence of elementary operations. We classify the operations according to the adaptation purpose they serve.

Shape adaptation. The growth of the spheres implies a deformation of the surface, which is facilitated by a motion of the mesh vertices in $\mathbb{R}^3$. The algorithm moves vertices normal to the surface, along the integral lines of the skin trajectory function. We use edge flips to maintain the mesh as the restricted Delaunay triangulation of the moving vertices.

Curvature adaptation. Recall that the conditions [L] and [U] given in Section III.3 guarantee that the mesh adapts its local density to the maximum normal curvature. We use edge contractions to eliminate edges that violate [L] and vertex insertions to eliminate triangles that violate [U].
Figure III.20: Smoothly shaded rendering of the mesh in Figure III.19.

Topology adaptation. There are four types of topological changes that occur, and they correspond to the four types of generic critical points of three-dimensional Morse functions. A component is born at a minimum, a handle is created at an index-1 saddle, a tunnel is closed at an index-2 saddle, and a void is filled at a maximum. We use metamorphoses to change the mesh connectivity accordingly.

Two of the four types of metamorphoses can be seen at work in Figure III.21. From the first snapshot to the second, we see two new handles appear. Each handle creates a tunnel in the complement. From the second snapshot to the third, we see both tunnels disappear again. By closing a tunnel we also remove the handle that forms it. Observe that the surface around a handle is the same as that around a tunnel, namely a two-sheeted hyperboloid that flips over to a one-sheeted hyperboloid, or vice versa. The only difference is the reversal of inside and outside.

Quantification. The Skin Meshing software comes with a quantification panel that displays parameters used in the meshing algorithm, provides various measurements of mesh quality, and indicates the number of operations executed during the construction. The two most important parameters are $\varepsilon$, which controls the numerical approximation of the surface, and $Q$, which controls the size of the angles. The three other parameters shown in the panel control how the metamorphoses are performed. The correctness of the algorithm is guaranteed only if the inequalities referred to as Conditions (I) to (V) are all satisfied. The software permits other parameter settings since a violation of the inequalities does not necessarily imply a failure of the algorithm. In our experience, the software works fine for small violations but breaks down for moderate ones.

Figure III.22: The quantification panel of the Skin Meshing software. The quality measures do not include the special edges and triangles that facilitate topological changes and purposely violate some of the properties required for the rest of the mesh. [This panel needs to be updated to fit the text.]

Figure III.22 shows the panel after the construction of a mesh. It displays measurements of mesh quality, including size versus length scale ratios of edges and triangles and the angles inside and between triangles. Note that in Figure III.22, the ratios all lie inside the allowed interval, between $\varepsilon/Q$ and $\varepsilon$. As proved in Section III.3, the algorithm guarantees that the smallest angle inside any (non-special) triangle in the mesh is larger than $\arcsin(1/Q)$. For the standard setting of $Q$, the smallest angle observed in the mesh indeed exceeds this bound.

Bibliographic notes. The two-dimensional Morfi software has been developed by Ka-Po (Patrick) Lam and is described in his Master's thesis [4]. The software has been used in [2] to explain two-dimensional skin geometry and its application to deforming two-dimensional shapes into each other. The three-dimensional Skin Meshing software has been developed by Ho-Lun Cheng [1, 5]. Computer graphics techniques used in displaying shapes, including Gouraud shading, can be found in [3].
Figure III.21: Three snapshots of the deforming triangulation of a molecular skin defined by continuously growing spheres. From left to center, we note two metamorphoses that each add a handle in the front. From center to right, we note a metamorphosis that closes a tunnel on the left.

[1] H.-L. Cheng. Dynamic and Adaptive Surface Meshing under Motion. Ph.D. thesis, Dept. Comput. Sci., Univ. Illinois, Urbana, 2001.

[2] S.-W. Cheng, H. Edelsbrunner, P. Fu and K. P. Lam. Design and analysis of planar shape deformation. Internat. J. Comput. Geom. Appl. 19 (2001), 205–218.

[3] J. Foley, A. van Dam, S. Feiner and J. Hughes. Computer Graphics. Principles and Practice. Second edition, Addison-Wesley, Reading, Massachusetts, 1990.

[4] K. P. Lam. Two-dimensional geometric morphing. Master's thesis, Dept. Comput. Sci., Hong Kong University of Science and Technology, 1996.

[5] Molecular Skin web-site in the software collection at biogeometry.duke.edu.
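The function $f$ from the Simulated smoothing paragraph and its relation to the growing circles can be explored in a few lines. The Python sketch below is our own illustration with hypothetical names. It evaluates $f(x) = \min_i (\|x - z_i\|^2 - r_i^2)$ for circles with centers $z_i$ and radii $r_i$, and confirms that $x$ lies on the boundary of the union of the circles at time $t = f(x)$, when every squared radius has grown by $t$. For two equal circles, the point where the Voronoi edge crosses the Delaunay edge is a critical point of $f$, and its value is the moment at which the grown circles first touch.

```python
import math


def f(x, circles):
    """Minimum weighted square distance: min over circles of |x - z|^2 - r^2."""
    return min((x[0] - z[0]) ** 2 + (x[1] - z[1]) ** 2 - r * r for z, r in circles)


def grown_radius(r, t):
    """Radius at time t when every squared radius grows by t; None while imaginary."""
    w = r * r + t
    return math.sqrt(w) if w >= 0.0 else None


def on_envelope(x, circles, t, tol=1e-9):
    """True if x lies on the boundary of the union of the circles at time t."""
    gaps = []
    for z, r in circles:
        radius = grown_radius(r, t)
        if radius is not None:
            gaps.append(math.hypot(x[0] - z[0], x[1] - z[1]) - radius)
    # Outside or on every real circle, and on at least one of them.
    return bool(gaps) and min(gaps) > -tol and min(abs(g) for g in gaps) < tol


if __name__ == "__main__":
    circles = [((0.0, 0.0), 1.0), ((3.0, 0.0), 1.0)]
    # The Voronoi edge x1 = 1.5 meets the Delaunay edge at (1.5, 0); the two
    # grown circles first touch there, at time f((1.5, 0)).
    print(f((1.5, 0.0), circles))  # 1.25
    for x in [(-1.5, 0.0), (0.2, 2.0), (4.0, -1.0)]:
        t = f(x, circles)
        print(x, t, on_envelope(x, circles, t))
```

Each point reports `True`, reflecting that the preimage $f^{-1}(t)$ is the envelope of the circles in $B_t$.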
Exercises

1. Pencils of spheres. Let us extend the concept of a coaxal system of circles to three dimensions. For this purpose, assume $S_0$ and $S_1$ are two spheres that are both orthogonal to the spheres $A_1$, $A_2$, and $A_3$.
(i) Prove that every affine combination of $S_0$ and $S_1$ is orthogonal to $A_1$, $A_2$, and $A_3$.
(ii) Prove that every affine combination of $A_1$, $A_2$, and $A_3$ is orthogonal to $S_0$ and $S_1$.
(iii) In the light of (i) and (ii), what is the analog of a coaxal system in $\mathbb{R}^3$?

2. Curvature in the plane. Note that the curvature $\kappa: F \to \mathbb{R}$ of a molecular skin curve $F$ in $\mathbb{R}^2$ is not continuous.
(i) Give an example illustrating that $\kappa$ is not continuous.
(ii) Introduce a new function (perhaps similar to $\kappa$) that is continuous over $F$.

3. Total curvature. Define the total curvature of a surface $F$ as the integral of the maximum principal curvature:
$$T(F) = \int_{x \in F} \kappa(x) \, dx.$$
(i) Calculate $T(S)$ for a sphere $S$.
(ii) Calculate $T$ for the portion of a double-cone within a unit sphere around its apex.

4. Total square curvature. Define the total square curvature of a surface $F$ as the integral of the maximum principal curvature squared:
$$T_2(F) = \int_{x \in F} \kappa^2(x) \, dx.$$
(i) Calculate $T_2(S)$ for a sphere $S$.
(ii) Let $F$ be the portion of a hyperboloid of revolution within a unit sphere around the apex. Show that $T_2(F)$ goes to infinity as the hyperboloid approaches its asymptotic double-cone.
(iii) Prove that the number of points in a minimal $\varepsilon$-sampling of $F$ (as defined in Section III.3) is proportional to $T_2(F)/\varepsilon^2$.

5. Something about triangles. Let $abc$ be a triangle in the plane. We write $h_a$ for the height of $a$, defined as the distance of $a$ from the closest point on the line passing through $b$ and $c$. Similarly, we write $h_b$ and $h_c$ for the heights of $b$ and $c$. Prove that the radius $R$ of the circumcircle satisfies
$$R = \frac{\|a - b\| \, \|a - c\|}{2 h_a} = \frac{\|b - a\| \, \|b - c\|}{2 h_b} = \frac{\|c - a\| \, \|c - b\|}{2 h_c}.$$
Chapter IV

Connectivity

Given a shape or a space, we can ask whether or how it is connected. It might not be immediately obvious what this question means, but we can draw on precise definitions developed in topology to answer it. However, we need to be aware that there are different precise notions, each perfectly well-defined and reasonable, that correspond to the intuitive idea of connectivity. For example, for two spaces $X$ and $Y$ to be “connected the same way” could mean they are topologically equivalent ($X \approx Y$), they are homotopy equivalent ($X \simeq Y$), or they have isomorphic homology groups ($H_k(X) \cong H_k(Y)$ for all $k$). The three notions are progressively weaker:
$$X \approx Y \;\Longrightarrow\; X \simeq Y \;\Longrightarrow\; H_k(X) \cong H_k(Y) \text{ for all } k.$$
In words, the classification of spaces by homology groups is coarser than that by homotopy equivalence, which in turn is coarser than that defined by topological equivalence. [We should stress that homology in this topological context has a precise algebraic meaning, which is in sharp contrast to how the term is used in biology (e.g. homology modeling of proteins), where it indicates a vague notion of similarity.]

Given two triangulated spaces, there is a polynomial-time algorithm that computes and compares their homology groups. If the groups are not isomorphic then we know that the two spaces are different, meaning they are neither homotopy equivalent nor topologically equivalent. However, if their homology groups are isomorphic then we still do not know whether the two spaces are the same under the two stricter definitions of sameness. In spite of this apparent weakness, homology is the most important tool for studying connectivity. In this chapter, we focus on algorithms computing the homology groups of molecules represented by space-filling diagrams. In Section IV.1, we prove that space-filling diagrams are homotopy equivalent to their dual alpha shapes, which implies the two have isomorphic homology groups. In Section IV.2, we formally define homology groups and their ranks, the Betti numbers. In Section IV.3, we describe an incremental algorithm for Betti numbers, which is fast but limited to complexes in three dimensions. In Section IV.4, we present the classic matrix algorithm for Betti numbers, which is significantly slower but not limited to three-dimensional space.

IV.1 Equivalence of Spaces
IV.2 Homology Groups
IV.3 Incremental Algorithm
IV.4 Matrix Algorithm
Exercises
IV.1 Equivalence of Spaces

The space-filling diagram of a molecule is a subset of $\mathbb{R}^3$, and with the induced subspace topology it is a topological space. We study the connectivity of this space by considering equivalence classes defined by continuous maps between spaces.

Topological spaces. Recall that a map $f: X \to Y$ is continuous if for every $x \in X$ and every $\varepsilon > 0$ there is a $\delta > 0$ such that for every $y \in X$ at distance less than $\delta$ from $x$, the points $f(x)$ and $f(y)$ have distance less than $\varepsilon$. To check whether or not $f$ is continuous, we thus have to be able to measure the distance between points in both sets. According to a more general definition, $f$ is continuous if the preimage of every open set in $Y$ is open in $X$. Here we only need to distinguish between open and non-open sets. This distinction is the motivation for the following definition. A topological space is a set $X$ together with a system $\mathcal{U}$ of subsets of $X$ such that

(i) $\emptyset \in \mathcal{U}$ and $X \in \mathcal{U}$,
(ii) $\bigcup \mathcal{V} \in \mathcal{U}$ for every subsystem $\mathcal{V} \subseteq \mathcal{U}$, and
(iii) $\bigcap \mathcal{V} \in \mathcal{U}$ for every finite subsystem $\mathcal{V} \subseteq \mathcal{U}$.

The system $\mathcal{U}$ is called the topology of $X$ and the sets in $\mathcal{U}$ are the open sets of $X$. If $Y \subseteq X$, we can induce the subspace topology, which is the system $\mathcal{V} = \{U \cap Y \mid U \in \mathcal{U}\}$. The space $Y$ together with the system $\mathcal{V}$ is a topological subspace of the pair $(X, \mathcal{U})$.

To get comfortable with these abstract ideas requires a number of concrete examples. Here is one. Let $X = \mathbb{R}^3$ be the three-dimensional Euclidean space. An open ball is the set of points at distance less than some $r > 0$ from a fixed point, and an open set is a union of open balls. Note that the common intersection of finitely many open sets is again open, but this is not necessarily true for infinitely many open sets. For example, the common intersection of the open balls of points at distance less than $1/n$ from the origin, for $n = 1, 2, \ldots$, is just the origin itself, which is not an open set. We thus see that the restriction to finite subsystems in condition (iii) is necessary. The two-dimensional sphere, $\mathbb{S}^2 = \{x \in \mathbb{R}^3 \mid \|x\| = 1\}$, is a subset of $\mathbb{R}^3$, and if we choose its intersections with open sets in $\mathbb{R}^3$ as the open sets in its topology, then it is a topological subspace of $\mathbb{R}^3$. Another topological subspace of $\mathbb{R}^3$ is the two-dimensional Euclidean plane, $\mathbb{R}^2 = \{x \in \mathbb{R}^3 \mid x_3 = 0\}$.

Topological equivalence. Now that we know what a topological space is, we can define when two are the same. A homeomorphism is a bijective map $f: X \to Y$ that is continuous and whose inverse is continuous. We write $X \approx Y$ if a homeomorphism exists and say that $X$ and $Y$ are homeomorphic or topologically equivalent, and that they have the same topological type. Note that the identity is a homeomorphism, the inverse of a homeomorphism is a homeomorphism, and the composition of two homeomorphisms is a homeomorphism. In other words, being homeomorphic is reflexive, symmetric and transitive, so $\approx$ is indeed an equivalence relation for topological spaces.

As suggested by Figure IV.1, there are spaces that have the same topological type and look vastly different, and there are spaces that look quite similar and do not have the same topological type.

Figure IV.1: The circle on the left is topologically equivalent to the trefoil knot in the middle, but neither is topologically equivalent to the annulus on the right.

An interesting example of a pair of non-homeomorphic spaces are the sphere and the plane. After embedding both in $\mathbb{R}^3$, we can map points from the sphere to the plane by stereographic projection from the north-pole, $N = (0, 0, 1)$, as illustrated in Figure IV.2. This map between $\mathbb{S}^2 - \{N\}$ and $\mathbb{R}^2$ is indeed a homeomorphism, but there is no homeomorphism between $\mathbb{S}^2$ and $\mathbb{R}^2$.

Figure IV.2: The stereographic projection maps the sphere (minus the north-pole) to the plane. The lower hemisphere maps to the shaded disk and the upper hemisphere to the complement of that disk.
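The stereographic projection can be written out explicitly. The following is a minimal Python sketch, assuming that $\mathbb{S}^2$ is the unit sphere centered at the origin, that $N = (0, 0, 1)$, and that the image plane is $x_3 = 0$; the function names `stereo` and `stereo_inverse` are illustrative. The forward map sends $(x_1, x_2, x_3)$ to $(x_1, x_2)/(1 - x_3)$ and the inverse sends $(u, v)$ to $(2u, 2v, u^2 + v^2 - 1)/(u^2 + v^2 + 1)$. Both formulas are continuous on their domains and compose to the identity, which is what makes the map a homeomorphism between $\mathbb{S}^2 - \{N\}$ and $\mathbb{R}^2$.

```python
import math

# Sketch: stereographic projection from N = (0, 0, 1), assuming the unit
# sphere centered at the origin and the image plane x3 = 0.

def stereo(p):
    """Map a point p = (x1, x2, x3) of the unit sphere, p != N, to the plane."""
    x1, x2, x3 = p
    if math.isclose(x3, 1.0):
        raise ValueError("the north pole N has no image in the plane")
    return (x1 / (1.0 - x3), x2 / (1.0 - x3))

def stereo_inverse(q):
    """Map a point q = (u, v) of the plane back to the unit sphere minus N."""
    u, v = q
    s = u * u + v * v
    return (2.0 * u / (s + 1.0), 2.0 * v / (s + 1.0), (s - 1.0) / (s + 1.0))

if __name__ == "__main__":
    print(stereo((0.0, 0.0, -1.0)))          # the south pole maps to the origin
    p = stereo_inverse((3.0, -4.0))
    print(math.isclose(sum(c * c for c in p), 1.0))  # p lies on the sphere
    print(stereo(p))                         # recovers (3, -4) up to rounding
```

Points of the equator stay fixed, points of the lower hemisphere land inside the unit disk, and points close to $N$ are sent far from the origin, which is one way to see why $N$ itself has no image.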
Homotopy equivalence. Next we introduce an equivalence relation that is less sensitive to the local dimension of spaces than topological equivalence. We begin by comparing maps between the same spaces. Two continuous maps $h, k: X \to Y$ are homotopic if there is a continuous map $H: X \times [0, 1] \to Y$ with $H(x, 0) = h(x)$ and $H(x, 1) = k(x)$, for all $x \in X$. We write $h \simeq k$ and call $H$ a homotopy between $h$ and $k$. This definition is illustrated in Figure IV.3. We may think of the parameter $t \in [0, 1]$ as time and of $H$ as sweeping out its image by the images of the maps $x \mapsto H(x, t)$. The only requirements $H$ has to satisfy are that it starts with $h$, ends with $k$, and that it is a continuous map. For example, $H$ is not required to be injective, which is the same as saying that the image of $H$ may be self-intersecting.

Figure IV.3: In this example, $h$ and $k$ both map the circle into three-dimensional space, and $H$ maps the circle times $[0, 1]$ to the cylinder connecting the two images of the circle.

Two spaces $X$ and $Y$ are homotopy equivalent if there are continuous maps $f: X \to Y$ and $g: Y \to X$ such that $g \circ f$ is homotopic to the identity on $X$ and $f \circ g$ is homotopic to the identity on $Y$. We write $X \simeq Y$ and say that the two spaces have the same homotopy type. Note that $\simeq$ is reflexive, symmetric, and transitive and is therefore indeed an equivalence relation for topological spaces. It is easy to show that two topologically equivalent spaces are also homotopy equivalent. To see that the reverse is not true we note that the annulus in Figure IV.1 is homotopy equivalent to the circle, but the two are not topologically equivalent.

Deformation retraction. If $Y$ is a topological subspace of $X$ then we may prove that the two spaces are homotopy equivalent by constructing a map that retracts $X$ to $Y$. A deformation retraction from $X$ to $Y$ is a continuous map $H: X \times [0, 1] \to X$ with

(i) $H(x, 0) = x$, for all $x \in X$,
(ii) $H(x, 1) \in Y$, for all $x \in X$, and
(iii) $H(y, t) = y$, for all $y \in Y$ and all $t \in [0, 1]$.

Note that $H$ is a homotopy between $H(\cdot, 0)$, which is the identity on $X$, and $H(\cdot, 1)$, which maps $X$ to $Y$. As illustrated in Figure IV.4, there is a deformation retraction from the double annulus to the figure-8 curve, but there is no deformation retraction to the circle. (Why not?)

Figure IV.4: The arrows indicate a deformation retraction from the double annulus to the figure-8 curve.

DEFORMATION RETRACTION LEMMA. If $H$ is a deformation retraction from $X$ to $Y$ then $X$ and $Y$ are homotopy equivalent.

PROOF. We construct maps $f: X \to Y$ and $g: Y \to X$ with the required properties. Define $f(x) = H(x, 1)$ and $g(y) = y$. Then $g \circ f$ is homotopic to the identity on $X$ because $H$ is a homotopy between the two maps. Furthermore, $f \circ g$ is equal to the identity on $Y$, by condition (iii), and therefore certainly homotopic to it.

The simplest homotopy type is that of a point. A space is contractible if it is homotopy equivalent to a point. For example, a disk is contractible but a circle is not. Similarly, a ball is contractible but a sphere is not.

Decomposition into joins. We construct a deformation retraction between a union of balls and its dual complex using a decomposition into joins. In general, a join between two sets $P$ and $Q$ in some Euclidean space is the union of closed line segments that connect points in $P$ with points in $Q$,
$$P * Q = \{\lambda p + (1 - \lambda) q \mid p \in P,\ q \in Q,\ \lambda \in [0, 1]\},$$
and it is defined iff any two such line segments are either disjoint or meet at a common endpoint. Figure IV.5 uses two kinds of joins to decompose the difference between the union and the dual complex of a set of disks, namely triangles and disk sectors. A triangle is the join between a
point and an edge and a sector is the join between a circular arc and a vertex.

Figure IV.5: The union of disks is decomposed into the underlying space of the dual complex and two types of joins connecting that complex to the boundary of the union.

Let $B$ be a finite collection of closed balls in $\mathbb{R}^3$. We assume general position and construct a deformation retraction from the union, $\bigcup B$, to the underlying space of the dual complex, $|K|$. Recall that the boundary of $\bigcup B$ consists of sphere patches separated by circular arcs connecting corners. To be specific, we define a patch as the contribution of the sphere bounding a ball $b \in B$ to the boundary of $\bigcup B$. It does not have to be connected or simply connected. Similarly, we define an arc and a corner as the contribution of the intersection of two and of three spheres to the boundary of $\bigcup B$. An arc may be a full circle, or any number of intervals along the circle. A corner may be empty, a point, or a pair of points. The decomposition is constructed by forming the join between every patch, arc, and corner and its dual vertex, edge, and triangle. Figure IV.5 illustrates the construction in the plane. There are four corners that are point pairs, and they correspond to the four principal edges of the dual complex. (As defined in Section II.4, an edge is principal if it is not a face of any other simplex in the complex. In the Alpha Shape software, such an edge is referred to as singular.) There are also four arcs that consist of more than one component each, and they correspond to the vertices on the boundary of the dual complex that are exposed to the outside in more than one interval of directions.

Shrinking joins. We get a deformation retraction $H: \bigcup B \times [0, 1] \to \bigcup B$ from $\bigcup B$ to $|K|$ by shrinking joins from outside in. Each join is the union of line segments $xy$ with $x$ on the boundary of $\bigcup B$ and $y$ on the boundary of $|K|$. We shrink $xy$ by defining
$$H(z, t) = \begin{cases} z & \text{if } \|z - y\| \le (1 - t)\,\|x - y\|, \\ (1 - t)\,x + t\,y & \text{otherwise}, \end{cases}$$
for every point $z$ on the line segment $xy$ and every $t \in [0, 1]$. A triangle in the decomposition shrinks from its outer vertex towards the opposite edge, which belongs to the dual complex. It turns into a trapezium whose height decreases and reaches zero at time $t = 1$. A disk sector shrinks from its outer arc towards its center, which is a vertex of the dual complex. It maintains its shape while getting smaller until it reaches the size of a point. The deformation retraction is obtained by shrinking all joins simultaneously. It is illustrated in Figure IV.6, which shows the image of the retraction at time $t = 1/2$. Figure IV.7 shows an entire sequence of shapes during the deformation retraction visualized for the model of gramicidin also shown in Figure II.3.

Figure IV.6: The decomposition after shrinking the joins half way to zero.

There is a technical problem at the very beginning of the shrinking process that arises already in two dimensions. Specifically, the outer vertex of each triangle join belongs to more than one line segment and thus retracts towards more than one point of the dual complex. To finesse this difficulty, we choose a small $\varepsilon > 0$ and move the points differently in the time interval $[0, \varepsilon]$. In the assumed case in which $B$ is in general position, this initial motion needs to bridge the non-zero gap between the boundary of $\bigcup B$ and the boundary of the image of $\bigcup B$ at time $\varepsilon$. By choosing $\varepsilon$ small, we can make the gap arbitrarily small and easy to bridge.

Bibliographic notes. Homeomorphisms, homotopies, and deformation retractions are covered in most texts of algebraic topology, including Seifert and Threlfall [6] and Munkres [5]. Subtleties of the definitions of a topology
and of a topological space are discussed in texts on general topology, including Kelley [2] and Munkres [4].

The particular deformation retraction used to prove the homotopy equivalence between a union of balls and its dual complex is taken from Edelsbrunner [1]. That equivalence can also be derived from general theorems about coverings. The Nerve Lemma says that a space is homotopy equivalent to the nerve of a finite open cover whose sets have either empty or contractible common intersections. We can turn the Voronoi cells of a union of balls into such a cover and get the homotopy equivalence result from that lemma. The history of the Nerve Lemma is complicated because different versions have been discovered independently by different people. Maybe the paper by Leray [3] is the first publication on that topic.

Figure IV.7: Six snap-shots of the deformation retraction from the union of balls representation of gramicidin to the dual complex.

[1] H. Edelsbrunner. The union of balls and its dual shape. Discrete Comput. Geom. 13 (1995), 415–440.
[2] J. L. Kelley. General Topology. Springer-Verlag, New York, 1955.
[3] J. Leray. Sur la forme des espaces topologiques et sur les points fixes des représentations. J. Math. Pures Appl. 24 (1945), 95–167.
[4] J. R. Munkres. Topology. A First Course. Prentice Hall, Englewood Cliffs, New Jersey, 1975.
[5] J. R. Munkres. Elements of Algebraic Topology. Addison-Wesley, Redwood City, 1984.
[6] H. Seifert and W. Threlfall. A Textbook of Topology. Academic Press, San Diego, California, 1980.
IV.2 Homology Groups

This section introduces homology groups as an algebraic means to characterize the connectivity of a topological space. To keep the discussion reasonably elementary, we restrict it to triangulated spaces and to addition modulo 2.

Triangulations. In the preceding chapters, we have talked about triangulations in an intuitive geometric sense. In topology, the term has a precise meaning, which we now develop. A simplex is the convex hull of an affinely independent point set, $\sigma = \operatorname{conv} S$. If $S$ has cardinality $k + 1$ then $\sigma$ has dimension $k$ and is also referred to as a $k$-simplex. A face of $\sigma$ is the convex hull of a subset $T \subseteq S$, and we write $\tau \le \sigma$. Since $S$ has $2^{k+1}$ subsets, $\sigma$ has the same number of faces, including the empty set and $\sigma$ as its two improper faces. A simplicial complex is a finite collection of simplices $K$ with pairwise proper intersections that is closed under the face relation, that is,

(i) if $\sigma \in K$ and $\tau \le \sigma$ then $\tau \in K$, and
(ii) if $\sigma, \sigma' \in K$ then $\sigma \cap \sigma'$ is either empty or a face of both.

Recall that the underlying space of $K$ is the union of all simplices, $|K| = \bigcup K$. A simplicial complex can be used to represent a topological space, and we have seen an example in Section II.3, where the dual complex of a space-filling diagram was used to represent a molecule. We proved in Section IV.1 that the underlying space of the dual complex is homotopy equivalent to the space-filling diagram. A topologically more accurate representation would have a homeomorphic underlying space. We thus define a triangulation of a topological space $X$ as a simplicial complex $K$ whose underlying space is topologically equivalent, $|K| \approx X$. The remainder of this section introduces the algebraic concepts we will use to define homology groups of triangulated spaces.

Abelian groups. A group is a set $G$ together with an associative operation $+: G \times G \to G$ for which there is a zero and an inverse for every group element. The group is abelian if the operation is commutative. Examples are the infinite group of integers with addition, $(\mathbb{Z}, +)$, and the finite cyclic group of $n$ elements, $(\mathbb{Z}_n, + \bmod n)$. A subset $H \subseteq G$ forms a subgroup if $(H, +)$ is a group. Suppose $(G, +)$ is abelian and $H \subseteq G$ is a subgroup. We define the coset of $x \in G$ as $x + H = \{x + y \mid y \in H\}$. We have $0 + H = H$, and because $x + y = x + z$ implies $y = z$, there is a bijection between $H$ and each coset $x + H$. The quotient $G$ divided by $H$, denoted as $G/H$, is the collection of cosets. Addition in the quotient group is defined by $(x + H) + (y + H) = (x + y) + H$. We note that it does not matter which representatives we choose in computing the sum of the two cosets. The resulting coset is always the same, so addition is indeed well defined. Observe that $z \in x + H$ implies $z + H = x + H$. So if $z \in x + H$ and $z \in y + H$ then $x + H = z + H = y + H$. In words, two cosets are either disjoint or the same. If $G$ is finite this implies that all cosets have the same cardinality and $|G/H| = |G| / |H|$.

Figure IV.8: Partition of $G$ into cosets defined by $H$ for the case in which $H$ contains a quarter of the elements.

A homomorphism between groups $G$ and $G'$ is a function $f: G \to G'$ that commutes with addition, $f(x + y) = f(x) + f(y)$. The kernel of $f$ is the subset of $G$ whose elements map to $0 \in G'$, and the image is the subset of $G'$ whose elements have preimages in $G$:
$$\ker f = \{x \in G \mid f(x) = 0\}, \qquad \operatorname{im} f = \{y \in G' \mid y = f(x) \text{ with } x \in G\}.$$
An isomorphism is a bijective homomorphism. Its kernel consists of only the zero element of $G$ and its image is the entire $G'$.

Chain complex. Let $K$ be a simplicial complex. We construct groups by defining what it means to add sets of simplices. Call a set of $k$-simplices a $k$-chain. By definition, the sum of two $k$-chains is the symmetric difference of the two sets,
$$c + d = (c \cup d) - (c \cap d).$$
This is like adding modulo 2 where $1 + 1 = 0$, since a simplex belongs to $c + d$ iff it belongs to exactly one of the two chains. $C_k$ is the set of $k$-chains and $(C_k, +)$ is the group of $k$-chains. The zero of this chain group is the empty set. We connect chain groups of different dimensions by
homomorphisms that map chains to their boundary. For this purpose we define the boundary of a $k$-simplex $\sigma$ as the set of its $(k-1)$-dimensional faces, $\partial \sigma = \{\tau \le \sigma \mid \dim \tau = k - 1\}$. The boundary of a chain is the sum of boundaries of its simplices, $\partial c = \sum_{\sigma \in c} \partial \sigma$. Observe that the boundary of the sum of two chains is the sum of their boundaries, $\partial(c + d) = \partial c + \partial d$. This assumes of course that $c$ and $d$ have the same dimension, else $c + d$ would not be defined. We thus have a boundary homomorphism $\partial_k: C_k \to C_{k-1}$ for every $k$. The sequence of chain groups connected by boundary homomorphisms is the chain complex of $K$,
$$\cdots \xrightarrow{\partial_{k+2}} C_{k+1} \xrightarrow{\partial_{k+1}} C_k \xrightarrow{\partial_k} C_{k-1} \xrightarrow{\partial_{k-1}} \cdots.$$
Figure IV.9 illustrates the sequence but contains information about subgroups that will be introduced shortly.

Figure IV.9: The chain complex and the groups of cycles and boundaries contained in the chain groups.

Cycles and boundaries. There are two types of chains that are particularly important to us: the ones without boundary and the ones that bound. A $k$-cycle is a $k$-chain $c$ with $\partial c = 0$. The set of $k$-cycles is the kernel of the $k$-th boundary homomorphism, $Z_k = \ker \partial_k$. Two $k$-cycles add up to another $k$-cycle, which implies that $Z_k$ is a subgroup of $C_k$. A $k$-boundary is a $k$-chain $c$ for which there exists a $(k+1)$-chain $d$ with $c = \partial d$. The set of $k$-boundaries is the image of the $(k+1)$-st boundary homomorphism, $B_k = \operatorname{im} \partial_{k+1}$. Two $k$-boundaries add up to another $k$-boundary, which implies that $B_k$ is a subgroup of $C_k$. We prove that $B_k$ is a subgroup of $Z_k$. Equivalently, the boundary of every boundary is empty.

FUNDAMENTAL LEMMA OF HOMOLOGY. $\partial_k \partial_{k+1} d = 0$ for every $(k+1)$-chain $d$.

PROOF. Note that $\partial_k \partial_{k+1} \tau = 0$ for every $(k+1)$-simplex $\tau$. This is because every $(k-1)$-face of $\tau$ belongs to exactly two $k$-faces of $\tau$. The rest follows because taking boundary commutes with adding:
$$\partial_k \partial_{k+1} d = \partial_k \Big( \sum_{\tau \in d} \partial_{k+1} \tau \Big) = \sum_{\tau \in d} \partial_k \partial_{k+1} \tau,$$
which is the empty set, as required.

We can therefore draw the relationship between the sets of chains, cycles, and boundaries as sketched in Figure IV.9.

Homology groups. The $k$-th homology group is the quotient of the $k$-th cycle group divided by the $k$-th boundary group, $H_k = Z_k / B_k$. If $Z_k = B_k$ then $H_k$ is the trivial group consisting of only one element. The size of $H_k$ is a measure of how many $k$-cycles are not $k$-boundaries. The cosets are the elements of $H_k$ and are referred to as homology classes.

As an example consider a triangulated torus, as sketched in Figure IV.10. All 0-chains are 0-cycles and half of them are 0-boundaries, namely the ones with even cardinality. Hence $H_0 \cong \mathbb{Z}_2$. The two non-bounding 1-cycles labeled $a$ and $b$ generate a first homology group of four elements, as shown in Figure IV.10. It is isomorphic to $\mathbb{Z}_2 \times \mathbb{Z}_2$, which is the group of elements $(x, y)$ with $x, y \in \mathbb{Z}_2$ and component-wise addition modulo 2. There is only one non-empty 2-cycle, namely the set of all triangles, and no non-empty 2-boundary, $B_2 = \{0\}$. Hence $H_2 \cong \mathbb{Z}_2$.

   +   |  0     a     b     a+b
  -----+------------------------
   0   |  0     a     b     a+b
   a   |  a     0     a+b   b
   b   |  b     a+b   0     a
   a+b |  a+b   b     a     0

Figure IV.10: The curves $a$ and $b$ represent the homology classes $a + B_1$ and $b + B_1$, which generate the homology group $H_1$. The table gives the addition in $H_1$, with the four classes abbreviated as $0$, $a$, $b$, $a + b$.

An important property of homology groups is that they are the same for triangulations of homeomorphic and of homotopy equivalent spaces. In particular, we get the same homology groups for different triangulations of a topological space. Similarly, the homology groups of (any triangulation of) a union of balls are the same as the homology groups of the dual complex. In other words, the homology groups are properties of the space and not artifacts of the complexes used to represent that space. Proving that this is really the case is beyond the scope of this book.
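The chain-level definitions of this section are easy to try out by machine. The Python sketch below is one possible encoding, not code from the text: it assumes that a $k$-simplex is a frozenset of $k + 1$ integer vertex labels and that a $k$-chain is a Python set of such simplices, so the sum of two chains is the symmetric difference (the `^` operator). It checks the Fundamental Lemma on a tetrahedron and computes Betti numbers as $\beta_k = n_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$, which is the rank form of $H_k = Z_k / B_k$ with $Z_k = \ker \partial_k$ and $B_k = \operatorname{im} \partial_{k+1}$; ranks are taken over $\mathbb{Z}_2$ by elimination on bit vectors. The helper names (`boundary`, `closure`, `rank_mod2`, `betti_numbers`, `euler_characteristic`, `grid_torus`) are illustrative.

```python
from itertools import combinations

def boundary(chain):
    """Boundary of a chain over Z_2: the sum of the boundaries of its simplices."""
    result = set()
    for simplex in chain:
        if len(simplex) < 2:
            continue                      # the boundary of a vertex is empty
        for face in combinations(sorted(simplex), len(simplex) - 1):
            result ^= {frozenset(face)}   # adding modulo 2 = symmetric difference
    return result

def closure(top_simplices):
    """All non-empty faces of the given simplices, grouped by dimension."""
    faces = {}
    for s in top_simplices:
        for r in range(1, len(s) + 1):
            for f in combinations(sorted(s), r):
                faces.setdefault(r - 1, set()).add(frozenset(f))
    return faces

def rank_mod2(vectors):
    """Rank over Z_2 of bit vectors stored as Python ints (XOR elimination)."""
    basis = {}                            # leading bit -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

def betti_numbers(top_simplices):
    """beta_k = n_k - rank(boundary_k) - rank(boundary_{k+1}) for each k."""
    faces = closure(top_simplices)
    top = max(faces)
    index = {k: {s: i for i, s in enumerate(faces[k])} for k in faces}

    def rank_boundary(k):
        if k < 1 or k > top:
            return 0                      # boundary_0 and boundary_{top+1} are zero
        columns = []
        for s in faces[k]:
            bits = 0
            for f in boundary({s}):
                bits |= 1 << index[k - 1][f]
            columns.append(bits)
        return rank_mod2(columns)

    return [len(faces[k]) - rank_boundary(k) - rank_boundary(k + 1)
            for k in range(top + 1)]

def euler_characteristic(top_simplices):
    """Alternating sum of the numbers of k-simplices."""
    faces = closure(top_simplices)
    return sum((-1) ** k * len(faces[k]) for k in faces)

def grid_torus(n=3):
    """Torus from an n-by-n grid with each square split by a diagonal (n >= 3)."""
    def v(i, j):
        return (i % n) * n + (j % n)
    triangles = []
    for i in range(n):
        for j in range(n):
            triangles.append((v(i, j), v(i + 1, j), v(i + 1, j + 1)))
            triangles.append((v(i, j), v(i, j + 1), v(i + 1, j + 1)))
    return triangles

if __name__ == "__main__":
    print(boundary(boundary({frozenset({0, 1, 2, 3})})))   # empty set
    sphere = list(combinations(range(4), 3))               # hollow tetrahedron
    print(betti_numbers(sphere), euler_characteristic(sphere))
    print(betti_numbers(grid_torus()), euler_characteristic(grid_torus()))
```

With these definitions, `betti_numbers` returns `[1, 0, 1]` for the boundary of a tetrahedron (a triangulated sphere) and `[1, 2, 1]` for the 3-by-3 grid torus, matching the Betti numbers of the sphere and the torus, and the Euler characteristics come out as 2 and 0. This brute-force rank computation recomputes everything from scratch, in contrast to the incremental algorithm of Section IV.3.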
Betti numbers. The most useful aspects of homology groups are their ranks, which have intuitive interpretations in terms of the connectivity of the space. The concept of a rank applies equally well to chain, cycle, boundary and homology groups. All these groups are idempotent, that is, $x + x = 0$ for every $x \in G$. Given a subset $S$ of such a group $G$, we can form all sums of elements in $S$ and thus generate a subgroup. This operation can also be expressed in the terminology of linear algebra, where the subgroup is known as the linear hull, $\operatorname{lin} S$, consisting of all sums $\sum_i \lambda_i x_i$, with $\lambda_i \in \{0, 1\}$ and $x_i \in S$. This subset $S$ is a basis if it is minimal and generates the entire group, $\operatorname{lin} S = G$. Even though there is no unique basis, all bases have the same size, and because $G$ is idempotent, that size is the binary logarithm of the number of group elements. By definition, the rank of $G$ is the size of a basis: $\operatorname{rank} G = \log_2 |G|$. If the group is the $k$-th homology group of a space, $G = H_k$, the rank is known as the $k$-th Betti number of that space: $\beta_k = \operatorname{rank} H_k$. Since $|H_k| = |Z_k| / |B_k|$ we have
$$\beta_k = \operatorname{rank} Z_k - \operatorname{rank} B_k.$$

Revisiting the example above, we see that the Betti numbers of the torus are $\beta_0 = 1$, $\beta_1 = 2$ and $\beta_2 = 1$. The homology groups of dimensions $k \ge 3$ are all trivial and the corresponding Betti numbers are all zero. For the closed disk we have $\beta_0 = 1$ and $\beta_1 = \beta_2 = 0$. Similarly, for the two-dimensional sphere we have $\beta_0 = 1$, $\beta_1 = 0$, and $\beta_2 = 1$. As for the torus, all other Betti numbers vanish.

In general, the 0-th Betti number is the number of connected components. To see this remember that a 0-cycle bounds iff it contains an even number of vertices in each component. Note also that exactly half of the subsets of a finite set have even cardinality. If there are $c$ components and $n$ vertices then $|Z_0| = 2^n$ and, losing a factor of two per component, $|B_0| = 2^{n-c}$. It follows that $\beta_0 = n - (n - c) = c$. Similar to $\beta_0$, the 1-st and 2-nd Betti numbers have intuitive interpretations as the number of independent non-bounding loops and the number of independent non-bounding shells.

Euler characteristic. Consider a simplicial complex $K$ and let $n_k$ be the number of its $k$-simplices. By definition, the Euler characteristic is the alternating sum of these numbers:
$$\chi = \sum_{k \ge 0} (-1)^k n_k.$$
We show that $\chi$ is also the alternating sum of Betti numbers. Note that if $f: G \to G'$ is a homomorphism, then the rank of $G$ is equal to the sum of ranks of the kernel and the image. Since $\partial_k: C_k \to C_{k-1}$ is a homomorphism, $Z_k = \ker \partial_k$, and $B_{k-1} = \operatorname{im} \partial_k$, we have
$$\operatorname{rank} C_k = \operatorname{rank} Z_k + \operatorname{rank} B_{k-1}.$$
Using corresponding lowercase letters for ranks, we rewrite this relation as $c_k = z_k + b_{k-1}$. Earlier we derived $\beta_k = z_k - b_k$. The number of $k$-simplices in the complex is also the rank of the chain group, $n_k = c_k$, hence
$$\chi = \sum_{k \ge 0} (-1)^k (z_k + b_{k-1}) = \sum_{k \ge 0} (-1)^k (z_k - b_k) = \sum_{k \ge 0} (-1)^k \beta_k,$$
where the middle equality shifts the index of the boundary ranks and uses $b_{-1} = 0$. We state this result because it is important and so we can use it for later reference.

EULER–POINCARÉ THEOREM. $\chi = \sum_{k \ge 0} (-1)^k \beta_k$.

This relation can often be used to quickly find the Euler characteristic of a space without constructing a triangulation and counting simplices. For example, the closed disk has one component, no non-bounding loop, no shell, and therefore $\chi = 1$. Similarly, the Euler characteristic of the two-dimensional sphere is $\chi = 1 - 0 + 1 = 2$ and that of the torus is $\chi = 1 - 2 + 1 = 0$. Note that this implies that the disk, the sphere and the torus are pairwise non-homeomorphic. This is hardly surprising but not easy to prove with elementary means. Indeed, two spaces with different Euler characteristics have homology groups that are different in at least one dimension. In this case, the spaces are neither homotopy equivalent nor topologically equivalent.

Bibliographic notes. Homology groups were developed at the end of the nineteenth and the beginning of the twentieth centuries. The French mathematician Henri Poincaré is usually credited with the conception of the idea [4]. He named the ranks of the homology groups after the Italian mathematician Enrico Betti, who introduced a slightly different version of the numbers years earlier. The beginning of the twentieth century witnessed parallel developments of homology groups that differed in the elements they added (simplices, cubes, general cells, ...) and the coefficient groups they used. Eventually, all this work was unified by axiomatizing the assumptions under which homology groups exist [1]. Today, homology is a general method within algebraic topology. We refer
to Giblin [2] for an intuitive introduction to that area and to Munkres [3] and Rotman [5] for more comprehensive sources.

[1] S. Eilenberg and N. Steenrod. Foundations of Algebraic Topology. Princeton Univ. Press, New Jersey, 1952.
[2] P. J. Giblin. Graphs, Surfaces and Homology. Chapman and Hall, London, 1981.
[3] J. R. Munkres. Elements of Algebraic Topology. Addison-Wesley, Redwood City, 1984.
[4] H. Poincaré. Complément à l’analysis situs. Rendiconti del Circolo Matematico di Palermo 13 (1899), 285–343.
[5] J. J. Rotman. An Introduction to Algebraic Topology. Springer-Verlag, New York, 1988.
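Section IV.2 observes that $\beta_0$ counts connected components. For a graph, that is, a complex without triangles, also $\beta_1 = z_1 = n_1 - \operatorname{rank} \partial_1$, and $\operatorname{rank} \partial_1 = n_0 - \beta_0$, so $\beta_1 = n_1 - n_0 + \beta_0$. Both numbers can be maintained with a union-find data structure, the device Section IV.3 uses to classify vertices and edges. The Python sketch below is a minimal illustration, not the book's code: the class `UnionFind` (union by rank with path halving, one standard choice) and the function `graph_betti` are hypothetical names, and the edge list is the one of the dunce-cap triangulation in Table IV.1, with single-character vertex labels 1 to 9 and A to D.

```python
# Sketch: maintain beta_0 and beta_1 of a graph with a union-find structure.

class UnionFind:
    """Disjoint sets with union by rank and path halving."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def add(self, u):
        """ADD(u): add {u} as a new singleton set."""
        self.parent[u] = u
        self.rank[u] = 0

    def find(self, u):
        """FIND(u): return the root representing the set that contains u."""
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u

    def union(self, a, b):
        """UNION(a, b): merge the sets with distinct roots a and b."""
        if self.rank[a] < self.rank[b]:
            a, b = b, a
        self.parent[b] = a
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1


def graph_betti(vertices, edges):
    """Add all vertices, then the edges in the given order.

    Returns the final beta_0, beta_1 and the pair (beta_0, beta_1) after each edge.
    """
    uf = UnionFind()
    beta0, beta1 = 0, 0
    for u in vertices:
        uf.add(u)
        beta0 += 1              # every vertex creates a component
    history = []
    for u, v in edges:
        a, b = uf.find(u), uf.find(v)
        if a == b:
            beta1 += 1          # same component: the edge closes a loop (creates)
        else:
            beta0 -= 1          # two components: the edge merges them (destroys)
            uf.union(a, b)
        history.append((beta0, beta1))
    return beta0, beta1, history


# Edges of the dunce-cap triangulation in lexicographic order (Table IV.1).
DUNCE_CAP_EDGES = ("12 13 16 17 19 1A 1C 1D 23 25 28 29 2A 2B 2D 35 36 37 38 3B "
                   "3C 45 46 47 48 49 4A 4B 4C 4D 56 5D 67 78 89 9A AB BC CD").split()

if __name__ == "__main__":
    b0, b1, history = graph_betti("123456789ABCD",
                                  [(e[0], e[1]) for e in DUNCE_CAP_EDGES])
    print(b0, b1)  # 1 27
```

Scanning the 39 edges in lexicographic order reproduces the rows of Table IV.1 and ends with $\beta_0 = 1$ and $\beta_1 = 27$; the 27 triangles of the dunce cap then each decrement $\beta_1$, as described in Section IV.3.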
IV.3 Incremental Algorithm

The Betti numbers of a simplicial complex can be computed incrementally, by adding one simplex at a time. In this section, we describe the details of this algorithm, which is particularly well-suited for filtrations.

Adding a simplex. We analyze what happens to the Betti numbers when we add a simplex $\sigma$ to a complex $K$. Let $L = K \cup \{\sigma\}$ and assume that all proper faces of $\sigma$ belong to $K$, so $L$ is also a complex. By observing how $\sigma$ fits into $K$, we can determine the Betti numbers of $L$ from those of $K$. In the case analysis, we mention only the Betti numbers that change.

Case $\dim \sigma = 0$ and $\sigma$ is thus a vertex. Being a vertex, $\sigma$ cannot connect to anything in $K$ and thus forms a component by itself. Therefore, $\beta_0(L) = \beta_0(K) + 1$.

Case $\sigma$ is an edge. There are two sub-cases depending on whether the endpoints of $\sigma$ belong to the same component or to two different components. Both cases are illustrated in Figure IV.11. In the first case, we have $\beta_1(L) = \beta_1(K) + 1$, and in the second case $\beta_0(L) = \beta_0(K) - 1$.

Figure IV.11: The edge $uv$ closes a loop on the left and connects two components on the right.

Case $\sigma$ is a triangle. Again we have two sub-cases, both illustrated in Figure IV.12. If $\sigma$ completes a 2-cycle then $\beta_2(L) = \beta_2(K) + 1$. Otherwise, $\sigma$ closes a tunnel and we have $\beta_1(L) = \beta_1(K) - 1$.

Figure IV.12: To the left, the triangle completes a surface, while to the right, it just closes a tunnel formed by the surface holes.

Case $\sigma$ is a tetrahedron. Assuming $K$ is a complex in $\mathbb{R}^3$, it cannot have any 3-cycle. Adding $\sigma$ can therefore only turn a non-bounding 2-cycle (its boundary) into a 2-boundary. Hence, $\beta_2(L) = \beta_2(K) - 1$.

Observe that the four cases follow one and the same rule: if $\sigma$ belongs to a non-bounding cycle in $L$ then we increment the Betti number of the dimension of $\sigma$ and, otherwise, we decrement the Betti number of dimension one less than that of $\sigma$. This is justified by the equation $\operatorname{rank} C_k = \operatorname{rank} Z_k + \operatorname{rank} B_{k-1}$ developed in Section IV.2: adding a $k$-simplex always increments the rank of the $k$-th chain group, and it does this by either incrementing the rank of the $k$-th cycle group or that of the $(k-1)$-st boundary group. Since $B_k$ and $Z_{k-1}$ do not change, the first possibility increments $\beta_k = z_k - b_k$ and the second decrements $\beta_{k-1} = z_{k-1} - b_{k-1}$.

Algorithm. To compute the Betti numbers of a complex, we form a filtration that ends with that complex:
$$\emptyset = K_0 \subset K_1 \subset \cdots \subset K_m = K.$$
All $K_j$ are complexes, and it is convenient to assume that any two contiguous complexes differ by only one simplex: $K_j - K_{j-1} = \{\sigma_j\}$. For example, we may sort the simplices in non-decreasing order of dimension and take all prefixes of that sequence. Alternatively, we may use the filtration of a Delaunay triangulation introduced in Section II.3. In the latter case, the filtration contains all alpha complexes and we get the Betti numbers of all of them in one sweep. The algorithm is but a simple scan along the filtration.

    integer BETTI:
      $\beta_0 = \beta_1 = \beta_2 = 0$;
      for $j = 1$ to $m$ do
        $k = \dim \sigma_j$;
        if $\sigma_j$ belongs to a $k$-cycle in $K_j$ then $\beta_k$++
        else $\beta_{k-1}$--
        endif
      endfor;
      return $(\beta_0, \beta_1, \beta_2)$.

The only difficult part of the algorithm is deciding whether or not $\sigma_j$ belongs to a $k$-cycle. We study this problem after illustrating the algorithm for a small example.

Betti numbers of the dunce cap. The dunce cap is best created from a triangular piece of soft cloth. As illustrated in Figure IV.13, all three sides are equally long and are glued to each other with matching orientations. To run our algorithm, we need a triangulation of the dunce cap. It is not difficult to construct one, but we have to avoid pitfalls such as creating edges that share more than one endpoint and triangles that share more than one edge. A valid triangulation is shown in Figure IV.14. When we run our algorithm, we first add all vertices, then all edges,
and finally all triangles. After adding the thirteen vertices, we have $\beta_0 = 13$, $\beta_1 = 0$ and $\beta_2 = 0$. The evolution of Betti numbers while adding the edges in lexicographic order is shown in Table IV.1. There are 27 triangles in the triangulation, each closing a tunnel and thus decrementing $\beta_1$. Indeed, no collection of triangles has zero boundary, which can be proved by observing that three edges belong to three triangles each and all other edges belong to two triangles each. The final result is therefore $\beta_0 = 1$ and $\beta_1 = \beta_2 = 0$. Indeed, the dunce cap is connected, all its closed curves bound, and the surface formed by the triangles does not enclose any volume in $\mathbb{R}^3$.

    edge   12  13  16  17  19  1A  1C  1D  23  25
    β0     12  11  10   9   8   7   6   5   5   4
    β1      0   0   0   0   0   0   0   0   1   1

    edge   28  29  2A  2B  2D  35  36  37  38  3B
    β0      3   3   3   2   2   2   2   2   2   2
    β1      1   2   3   3   4   5   6   7   8   9

    edge   3C  45  46  47  48  49  4A  4B  4C  4D
    β0      2   1   1   1   1   1   1   1   1   1
    β1     10  10  11  12  13  14  15  16  17  18

    edge   56  5D  67  78  89  9A  AB  BC  CD
    β0      1   1   1   1   1   1   1   1   1
    β1     19  20  21  22  23  24  25  26  27

Table IV.1: Evolution of $\beta_0$ and $\beta_1$ while adding the edges of the triangulation in Figure IV.14.

Figure IV.13: In the first step, we glue two sides of the triangle, thus forming a cone with a seam. In the second step, we glue the seam along the rim of the cone (not shown).

Figure IV.14: A triangulation of the dunce cap, with vertices labeled 1 to 9 and A to D.

Classifying vertices and edges. We now return to the problem of deciding whether the addition of a simplex increases the rank of a cycle group or that of a boundary group. In the former case, we say the simplex creates, and in the latter case it destroys. All vertices create, but edges can create or destroy. For example, the edge $uv$ in Figure IV.11 creates on the left and destroys on the right. To distinguish between the two cases, we maintain the components of the complex throughout the filtration using a union-find data structure, which represents a system of pairwise disjoint sets: the elements are the vertices and the sets are the components of the complex at any moment in time. The data structure supports three types of operations:

    FIND$(u)$: return the set that contains vertex $u$.
    UNION$(A, B)$: substitute $A \cup B$ for the sets $A$ and $B$ in the system.
    ADD$(u)$: add $\{u\}$ as a new singleton set to the system.

The algorithm scans the filtration from left to right and classifies each vertex and each edge as either creating or destroying:

    for $j = 1$ to $m$ do
      case $\sigma_j$ is a vertex $u$:
        $\sigma_j$ creates; ADD$(u)$;
      case $\sigma_j$ is an edge $uv$:
        $A$ = FIND$(u)$; $B$ = FIND$(v)$;
        if $A = B$ then $\sigma_j$ creates
        else $\sigma_j$ destroys; UNION$(A, B)$
        endif
    endfor.

Standard implementations of the union-find data structure take barely more than constant time per operation. To be more precise, let $A(n)$ be the extremely fast growing Ackermann function. Its inverse, $\alpha(n)$, is extremely slow growing. To get a faint idea of how slow the inverse grows, we note that $\alpha(n)$ cannot be bounded from above by any constant, but $\alpha(n) \le 4$ unless $n$ is larger than the estimated number of electrons in the universe. Any sequence of $n$ operations takes time at most proportional to $n\,\alpha(n)$. For all practical purposes, this means that each operation takes only constant time.

Classifying triangles and tetrahedra. In three-dimensional Euclidean space, every tetrahedron destroys but triangles can destroy or create. Deciding whether or not a triangle belongs to a cycle is not quite as straightforward
64 IV C ONNECTIVITY

as it is for an edge. However, with an extra assumption tetrahedra, but this is exactly what compactification does
on the filtration, we can use the dual graph of the com- for us when it adds tetrahedra outside the boundary tri-
plement to classify triangles and tetrahedra the same way angles of the Delaunay triangulation. The running time
as we classified edges and vertices. The most convenient for classifying all triangles and tetrahedra is again propor-
 
 
tional to     .
 
version of this assumption is that the last complex in the
filtration,
 
, is a triangulation of . Think of as the

one-point compactification of . Given a Delaunay tri-
Summary. The entire algorithm consists of three passes
angulation in , we can construct such a triangulation by
adding a dummy vertex  and connecting it to all bound- over the filtration:
ary simplices of the Delaunay triangulation.
   1. a forward pass to classify all vertices and edges,


In and also in , every closed surface bounds a vol- 2. a backward pass to classify all triangles and tetrahe-
ume. In other words, a triangle completes a 2-cycle dra,
iff it decomposes a component of the complement into
two. We keep track of the connectivity of the complement 3. a forward pass to compute the Betti numbers.
through its dual graph, whose nodes are the tetrahedra and
Figure IV.16 illustrates the result of the algorithm. In the
whose arcs are the triangles. Figure IV.15 illustrates this
first two passes, we maintain a union-find

data structure,
construction in two dimensions. Adding a triangle to the 
which takes time proportional to      . The third
pass does only a constant amount of work per step, namely
incrementing or decrementing a counter. The 
total running

time is therefore at most proportional to     .

Figure IV.15: A subcomplex of the Delaunay triangulation and


the dual graph of the complement. The region outside the Delau-
nay triangulation is represented by a single node.

complex effectively removes an arc from the dual graph


of the complement. Deciding whether removing an arc
splits a component is more difficult than deciding whether
adding an arc connects two components. We therefore
scan the filtration backward, from right to left:
Figure IV.16: The evolution of the Betti number  (the num-

 downto do
 
for
ber of tunnels) in the filtration of gramicidin, which is shown in
case is a tetrahedron:
 Figures II.3 and II.15.

 
destroys, unless

 , in which case it creates;


A DD  ;

 
case is a triangle:


let and be the tetrahedra that share ; Bibliographic notes. The incremental algorithm for

 
F IND  ; F IND  ; computing Betti numbers described in this section is taken



if then destroys from [2]. It exploits the fact that the connectivity of
 
else creates; U NION   the complex determines the connectivity of the comple-
endif ment. This relation is a manifestation of Alexander dual-
endfor. ity, which is studied in algebraic topology [3, Chapter 3].
This algorithm has been implemented as part of the Al-
The algorithm requires that each triangle is shared by two pha Shape software, which computes the Betti numbers of
IV.3 Incremental Algorithm 65

typically thousands of complexes in the filtration of a pro-


tein structure in less than a second. The key to achieving
this performance is a fast implementation of the union-find
data structure,
 namely one with running time proportional

to      for  operations. The details of such an
implementation can be found in most algorithm texts, in-
cluding [1, Chapter 22]. A proof that the running time

cannot be improved from      to  has been given
by Tarjan [4].

[1] T. H. CORMEN, C. E. LEISERSON AND R. L. RIVEST. Introduction to Algorithms. MIT Press, Cambridge, Massachusetts, 1990.

[2] C. J. A. DELFINADO AND H. EDELSBRUNNER. An incremental algorithm for Betti numbers of simplicial complexes on the 3-sphere. Comput. Aided Geom. Design 12 (1995), 771–784.

[3] A. HATCHER. Algebraic Topology. Cambridge Univ. Press, England, 2002.

[4] R. E. TARJAN. A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. System Sci. 18 (1979), 110–127.
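The forward pass and the union-find structure it relies on can be sketched in Python. This is a minimal illustration under our own assumptions, not the implementation inside the Alpha Shape software. The class `UnionFind` and the function `forward_pass` are hypothetical names, and union by rank with path compression is one standard way to obtain the $n\alpha(n)$ bound quoted in the bibliographic notes. The edge list is read off Table IV.1, so running the sketch on the dunce cap should reproduce the counts in that table.

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def add(self, u):
        """ADD(u): insert {u} as a new singleton set."""
        self.parent[u] = u
        self.rank[u] = 0

    def find(self, u):
        """FIND(u): return the representative of the set containing u."""
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:   # path compression
            nxt = self.parent[u]
            self.parent[u] = root
            u = nxt
        return root

    def union(self, a, b):
        """UNION(A, B) for two distinct representatives a and b."""
        if self.rank[a] < self.rank[b]:
            a, b = b, a
        self.parent[b] = a              # attach the shallower tree below the deeper one
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1


def forward_pass(vertices, edges):
    """Classify vertices, then edges; return (beta_0, beta_1) after each edge."""
    uf = UnionFind()
    beta0 = beta1 = 0
    for u in vertices:                  # every vertex creates a component
        uf.add(u)
        beta0 += 1
    history = []
    for u, v in edges:
        a, b = uf.find(u), uf.find(v)
        if a == b:                      # endpoints already connected: the edge creates a cycle
            beta1 += 1
        else:                           # the edge joins two components: it destroys
            beta0 -= 1
            uf.union(a, b)
        history.append((beta0, beta1))
    return history


# Edges of the dunce cap triangulation in the order of Table IV.1.
DUNCE_EDGES = ("12 13 16 17 19 1A 1C 1D 23 25 28 29 2A 2B 2D 35 36 37 38 3B "
               "3C 45 46 47 48 49 4A 4B 4C 4D 56 5D 67 78 89 9A AB BC CD").split()
history = forward_pass("123456789ABCD", [(e[0], e[1]) for e in DUNCE_EDGES])
print(history[-1])  # (1, 27)
```

The sketch adds all vertices before any edge, as in the dunce cap example. A general filtration would interleave the two cases exactly as in the pseudocode. Adding the 27 triangles afterwards brings $\beta_1$ back to zero.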

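The backward pass can be sketched in the same way. For illustration only, we assume that a filtration is a Python list of sorted vertex tuples in which every face precedes its cofaces, and whose last complex triangulates $\mathbb{S}^3$. The boundary of the 4-simplex is the smallest such example and needs no dummy vertex. The helper names are hypothetical. The sketch tracks only $\beta_2$ and $\beta_3$, because the triangles that destroy lower $\beta_1$, which also depends on the forward pass.

```python
from itertools import combinations


def find(parent, x):
    """Root of x's set, with path halving (a minimal union-find)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def backward_pass(filtration):
    """Label each triangle and tetrahedron as 'creates' or 'destroys'."""
    # Dual graph of the complement: tetrahedra are nodes, triangles are arcs.
    cofaces = {}
    for s in filtration:
        if len(s) == 4:
            for t in combinations(s, 3):
                cofaces.setdefault(t, []).append(s)
    parent, label = {}, {}
    m = len(filtration)
    for j in range(m - 1, -1, -1):          # scan from right to left
        s = filtration[j]
        if len(s) == 4:
            # Only the last tetrahedron creates (it closes the 3-sphere).
            label[s] = "creates" if j == m - 1 else "destroys"
            parent[s] = s                    # ADD(sigma_j)
        elif len(s) == 3:
            t1, t2 = cofaces[s]              # assumes exactly two tetrahedra per triangle
            a, b = find(parent, t1), find(parent, t2)
            if a == b:
                label[s] = "destroys"        # the complement stays connected
            else:
                label[s] = "creates"         # the triangle splits a component of the complement
                parent[b] = a                # UNION(A, B)
    return label


def beta2_beta3(filtration, label):
    """Third pass restricted to (beta_2, beta_3), recorded after each simplex."""
    beta2 = beta3 = 0
    history = []
    for s in filtration:
        if len(s) == 3 and label[s] == "creates":
            beta2 += 1
        elif len(s) == 4:
            if label[s] == "creates":
                beta3 += 1
            else:
                beta2 -= 1
        history.append((beta2, beta3))
    return history


# Boundary of the 4-simplex: 5 vertices, 10 edges, 10 triangles, 5 tetrahedra.
filtration = [c for k in range(1, 5) for c in combinations(range(5), k)]
label = backward_pass(filtration)
history = beta2_beta3(filtration, label)
print(history[-1])  # (0, 1)
```

Just before the first tetrahedron, the complex is the 2-skeleton of the 4-simplex, and the sketch reports $\beta_2 = 4$ there.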
IV.4 Matrix Algorithm

In this section, we develop the linear algebra view of homology and formulate a matrix algorithm for computing Betti numbers. After explaining the algorithm for addition modulo two, we extend it to integer addition.

Incidence matrices. Let $K$ be a simplicial complex with $n_k$ $k$-simplices $\sigma_k^i$ and $n_{k-1}$ $(k-1)$-simplices $\sigma_{k-1}^j$. The $k$-th incidence matrix is
$$A_k = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1 n_{k-1}} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n_k 1} & a_{n_k 2} & \cdots & a_{n_k n_{k-1}} \end{pmatrix},$$
where $a_{ij} = 1$ iff $\sigma_{k-1}^j$ is a face of $\sigma_k^i$, and $a_{ij} = 0$ otherwise. Using this notation, we can write the $k$-th boundary homomorphism in matrix form:
$$\partial_k \begin{pmatrix} \sigma_k^1 \\ \vdots \\ \sigma_k^{n_k} \end{pmatrix} = A_k \begin{pmatrix} \sigma_{k-1}^1 \\ \vdots \\ \sigma_{k-1}^{n_{k-1}} \end{pmatrix}.$$
Recall that the $\sigma_k^i$ form a basis of the $k$-th chain group, $C_k$, and similarly the $\sigma_{k-1}^j$ form a basis of $C_{k-1}$. The above formula thus expresses the boundary of every basis element of $C_k$ as a sum of basis elements of $C_{k-1}$. To make this interpretation of the incidence matrix useful for computing Betti numbers, we need to consider more general bases. These can be generated by performing elementary row and column operations:

    exchange row $i$ with row $r$;
    add row $i$ to row $r$;
    exchange column $j$ with column $s$;
    add column $j$ to column $s$.

Exchanging two rows or columns is equivalent to re-indexing the $\sigma_k^i$ or the $\sigma_{k-1}^j$. Write $g_i$ for the element of $C_k$ that corresponds to row $i$, and $h_j$ for the element of $C_{k-1}$ that corresponds to column $j$. As illustrated in Figure IV.17, adding row $i$ to row $r$ has the effect of replacing $g_r$ by $g_r + g_i$. Adding column $j$ to column $s$ has the effect of replacing $h_j$ by $h_j - h_s$. (Since we deal with idempotent groups, subtraction is the same as addition.) Note that the effect is not symmetric: the basis of $C_k$ changes at the modified row, while the basis of $C_{k-1}$ changes at the modifying column.

Figure IV.17: The effect of elementary row and column operations on the bases of $C_k$ and $C_{k-1}$.

Normal form algorithm. After a few elementary row and column operations, $A_k$ is no longer the $k$-th incidence matrix, but it still describes a correspondence between bases of $C_k$ and $C_{k-1}$. The matrix is in normal form if its non-zero entries are lined up along an initial segment of the main diagonal, as illustrated in Figure IV.18. We can use Gaussian elimination to transform the incidence matrix into normal form:

    for $i = 1$ to $\min\{n_k, n_{k-1}\}$ do
      if NONZERO$(i)$ then
        forall rows $r > i$ do
          if $a_{ri} = 1$ then row $r$ = row $r$ + row $i$ endif
        endfor;
        forall columns $s > i$ do
          if $a_{is} = 1$ then col $s$ = col $s$ + col $i$ endif
        endfor
      endif
    endfor.

The algorithm uses a boolean function NONZERO that makes sure that during the $i$-th iteration the $i$-th diagonal entry, $a_{ii}$, is non-zero. It does this by exchanging rows and columns. The function fails to make $a_{ii}$ non-zero iff all entries in the remaining sub-matrix are zero:

    boolean NONZERO$(i)$:
      if $a_{rs} = 0$ for all $r \ge i$ and $s \ge i$ then return FALSE endif;
      find $r \ge i$ and $s \ge i$ with $a_{rs} = 1$;
      exchange row $i$ with row $r$; exchange column $i$ with column $s$;
      return TRUE.

The algorithm consists of three nested loops. Letting $n = \max\{n_k, n_{k-1}\}$, the running time is therefore at most proportional to $n^3$.

Deriving the Betti numbers. Suppose we have transformed all incidence matrices of $K$ into normal form. As illustrated in Figure IV.18, the $k$-th matrix has $n_k$ rows and $n_{k-1}$ columns. The zero-rows correspond to $k$-cycles, of which we have $z_k = \operatorname{rank} Z_k$ many. It follows that the number of non-zero entries along the main diagonal is $n_k - z_k$, which is the rank of the $(k-1)$-st boundary group, $b_{k-1} = \operatorname{rank} B_{k-1}$. The $k$-th Betti number is the rank of the $k$-th cycle group minus the rank of the $k$-th boundary group:
$$\beta_k = z_k - b_k.$$
We can thus derive the Betti numbers from the sizes and numbers of non-zero entries in the normal form matrices.

Figure IV.18: The normal form of the $k$-th incidence matrix.

We note that the ranks of the incidence matrices suffice for computing the Betti numbers, and it is not necessary to go all the way to normal form. Either way, the running time of the algorithm is cubic in the number of simplices in the complex.

Integer coefficients. The matrix algorithm can be extended to coefficients in $\mathbb{Z}$ instead of $\mathbb{Z}_2$. Before discussing the necessary modifications, we talk about what this means in terms of adding simplices and chains. We start at the beginning.

An ordered $k$-simplex is an ordering of the vertices of a $k$-simplex, and we write $\sigma = [u_0, u_1, \ldots, u_k]$. Two ordered simplices have the same orientation if their orderings differ by an even number of transpositions. Each simplex has two orientations, except if it is a vertex, in which case it has only one. To set the stage, we give each simplex in $K$ an arbitrary but fixed orientation, and for a given oriented simplex $\sigma$, we write $-\sigma$ for the other orientation of the otherwise same simplex. A $k$-chain is a function from the $k$-simplices to the integers. It is convenient to write this function as a formal polynomial:
$$c = \sum_i \gamma_i \sigma_i,$$
where $\gamma_i$ is the function value of $\sigma_i$. We add two $k$-chains componentwise, by adding the coefficients of like simplices:
$$\sum_i \gamma_i \sigma_i + \sum_i \gamma'_i \sigma_i = \sum_i (\gamma_i + \gamma'_i)\,\sigma_i.$$
By definition, the boundary of $\sigma = [u_0, u_1, \ldots, u_k]$ is the alternating sum of ordered $(k-1)$-simplices obtained by dropping one vertex at a time:
$$\partial_k \sigma = \sum_{i=0}^{k} (-1)^i \,[u_0, \ldots, \hat{u}_i, \ldots, u_k],$$
where the hat marks the deleted vertex. We can check that the boundary is independent of the ordering, as long as it belongs to the same orientation. We can also check that it is the negative boundary for an ordering of the opposite orientation: $\partial_k(-\sigma) = -\partial_k \sigma$. Similarly, we can check that the Fundamental Lemma of Homology still holds: $\partial_{k-1} \partial_k c = 0$. As before, we define the group of $k$-chains, $C_k$, the group of $k$-cycles, $Z_k$, and the group of $k$-boundaries, $B_k$. The $k$-th homology group is again $H_k = Z_k / B_k$, and the $k$-th Betti number is the rank of that homology group: $\beta_k = \operatorname{rank} H_k$.

Torsion. A curious new phenomenon that arises with the use of integer addition is algebraic torsion. It does not occur for spaces that can be embedded in $\mathbb{R}^3$, so it is not part of people's immediate experience. Maybe the simplest topological space whose homology groups have torsion is the Klein bottle. It can be constructed from a rectangular piece of paper by gluing opposite sides as shown in Figure IV.19. Since it has torsion, we know that the Klein bottle cannot be embedded in $\mathbb{R}^3$, and when we draw it, we have to allow for a self-intersection. The 1-cycle marked around the neck of the bottle does not bound, but twice that 1-cycle bounds. This is what causes torsion.

To describe the phenomenon more generally, we need the fact that every finitely generated abelian group is isomorphic to a direct sum (Cartesian product) of copies of $\mathbb{Z}$ and of cyclic groups:
$$G \cong \underbrace{\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}}_{\beta} \oplus \mathbb{Z}_{t_1} \oplus \cdots \oplus \mathbb{Z}_{t_p}.$$
Furthermore, we may require that all $t_i$ are larger than one and that $t_i$ divides $t_{i+1}$ for each $i$. This extra condition fixes $p$ and the indices $t_i$. The abelian group $G$ is thus the direct sum of a free subgroup, namely $\mathbb{Z}^\beta$, and the rest, namely $\mathbb{Z}_{t_1} \oplus \cdots \oplus \mathbb{Z}_{t_p}$, which is referred to as its torsion subgroup. The $t_i$ are the torsion coefficients. The rank of the group is the number of copies of $\mathbb{Z}$, which is $\beta$.

For the Klein bottle, we have $\beta_0 = 1$, $\beta_1 = 2$ and $\beta_2 = 1$ for addition modulo 2, and $\beta_0 = 1$, $\beta_1 = 1$ and $\beta_2 = 0$ for integer addition. We thus get different Betti numbers for addition modulo 2 and for integer addition, but their alternating sums are both equal to the Euler characteristic: $\chi = 1 - 2 + 1 = 1 - 1 + 0 = 0$. Indeed, the Euler-Poincaré Theorem is true independent of the type of coefficients we choose to define homology groups and Betti numbers.

Figure IV.19: A triangulated rectangular piece of paper glued to form a Klein bottle.

Algorithm revisited. The normal form of a bases transition matrix is the same as before, except that we now allow entries in the main diagonal that are neither zero nor one. Specifically, the initial sequence of ones is followed by integers $t_1, t_2, \ldots, t_p$, all larger than one, such that $t_i$ divides $t_{i+1}$, for each $i$. We modify the above algorithm to transform the incidence matrix into normal form. First we extend the elementary row and column operations. We allow the multiplication of entire rows or columns by $-1$, and the addition of integer multiples of one row or column to another. Multiplying by any other non-zero integer would not preserve a basis.

A more substantial modification is needed within the function NONZERO. It now attempts to turn the next diagonal entry, $a_{ii}$, into the smallest positive entry achievable by row and column operations. Unless the entire remaining sub-matrix is zero, this attempt will be successful and $a_{ii}$ will divide every entry in the sub-matrix. To see this property, assume there is an entry $a_{rs}$, with $r, s \ge i$, that is not an integer multiple of $a_{ii}$:
$$\begin{pmatrix} a_{ii} & \cdots & a_{is} \\ \vdots & & \vdots \\ a_{ri} & \cdots & a_{rs} \end{pmatrix}.$$
If $r = i$, we get a positive integer smaller than $a_{ii}$ in a single column operation. Symmetrically, if $s = i$, we get such a positive integer in a single row operation. Otherwise, we may assume that $a_{ii}$ divides both $a_{is}$ and $a_{ri}$, and we can make $a_{ri}$ zero with a row operation. This operation changes $a_{rs}$ by a multiple of $a_{is}$, so $a_{rs}$ is still not an integer multiple of $a_{ii}$. By adding row $r$ to row $i$, we keep $a_{ii}$ unchanged and we change $a_{is}$ to $a_{is} + a_{rs}$, which is not an integer multiple of $a_{ii}$. Now we get a positive integer smaller than $a_{ii}$ in a single column operation, as before. Since $a_{ii}$ divides every entry in the remaining sub-matrix, it will also divide the future non-zero diagonal entries. Hence, the algorithm generates the torsion coefficients with the required properties.

The running time of the algorithm is no longer guaranteed to be at most cubic in the number of simplices. Indeed, the sequence of operations is sensitive to the size of the integers that arise, and it is not even clear whether or not it is polynomial in the input size. As for $\mathbb{Z}_2$ coefficients, we can determine the homology groups directly from the normal forms of all incidence matrices. We get the rank of the $k$-th homology group from the $k$-th and the $(k+1)$-st normal form matrices: $\beta_k = z_k - b_k$. We get the torsion coefficients from the $(k+1)$-st normal form matrix: they are the diagonal entries that exceed one.

Bibliographic notes. The matrix algorithm presented in this section is taken from [2, Chapter 1]. The normal form it uses is sometimes referred to as the Smith normal form [3], and similarly, the algorithm is sometimes called the Smith normal form algorithm. For integer coefficients, it is unclear whether or not its running time is polynomial in the input size. However, it is possible to modify the algorithm to guarantee polynomial running time [1, 4]. The Betti numbers obtained for $\mathbb{Z}_2$ and $\mathbb{Z}$ (or other coefficient groups) are not necessarily the same, but their differences are predictable and described by the Universal Coefficient Theorem of Homology [2, Chapter 7].

[1] R. KANNAN AND A. BACHEM. Polynomial algorithms for computing the Smith and Hermite normal forms of an integer matrix. SIAM J. Comput. 8 (1979), 499–507.

[2] J. R. MUNKRES. Elements of Algebraic Topology. Addison-Wesley, Redwood City, California, 1984.

[3] H. J. SMITH. On systems of indeterminate equations and congruences. Philos. Trans. 151 (1861), 293–326.

[4] A. STORJOHANN. Near optimal algorithm for computing Smith normal forms of integer matrices. In "Proc. Internat. Sympos. Symbol. Algebraic Comput., 1997", 267–274.
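Since the ranks of the incidence matrices suffice, the $\mathbb{Z}_2$ computation can be tried by storing each row as a bit mask and eliminating with exclusive-or. The sketch below is our own illustration, not code from the text, and the function names are hypothetical. Rows index $k$-simplices and columns index $(k-1)$-simplices, as in the incidence matrix above. The sketch computes $\beta_k = n_k - \operatorname{rank} A_k - \operatorname{rank} A_{k+1}$, which is $z_k - b_k$ with $z_k = n_k - \operatorname{rank} A_k$ and $b_k = \operatorname{rank} A_{k+1}$. As test input we use the boundary of a tetrahedron and a six-vertex triangulation of the projective plane; the latter is our assumption and is not taken from the text.

```python
from itertools import combinations


def faces(simplex):
    """The (k-1)-faces of a k-simplex given as a sorted vertex tuple."""
    return [simplex[:i] + simplex[i + 1:] for i in range(len(simplex))]


def rank_mod2(rows):
    """Rank over Z_2 of a matrix whose rows are int bit masks (Gaussian elimination)."""
    pivots = {}                          # leading bit -> reduced row owning that bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]          # adding two rows modulo 2 is exclusive-or
    return len(pivots)


def betti_mod2(top_simplices):
    """Betti numbers modulo 2 of the complex generated by the given simplices."""
    cx = set()
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            cx.update(combinations(s, k))   # close the collection under taking faces
    dim = max(len(s) for s in cx) - 1
    levels = [sorted(s for s in cx if len(s) == k + 1) for k in range(dim + 1)]
    column = [{s: i for i, s in enumerate(level)} for level in levels]
    rank = [0] * (dim + 2)               # rank[k] = rank of A_k; A_0 and A_{dim+1} are zero
    for k in range(1, dim + 1):
        rows = [sum(1 << column[k - 1][f] for f in faces(s)) for s in levels[k]]
        rank[k] = rank_mod2(rows)
    return [len(levels[k]) - rank[k] - rank[k + 1] for k in range(dim + 1)]


# Six-vertex triangulation of the projective plane (10 triangles).
RP2 = [(1, 2, 3), (1, 3, 4), (1, 4, 5), (1, 5, 6), (1, 2, 6),
       (2, 3, 5), (3, 4, 6), (2, 4, 5), (3, 5, 6), (2, 4, 6)]
print(betti_mod2(combinations(range(4), 3)))   # [1, 0, 1]
print(betti_mod2(RP2))                          # [1, 1, 1]
```

Modulo 2, the projective plane has $\beta_1 = \beta_2 = 1$. Over the integers, its first homology group is $\mathbb{Z}_2$, so its integer Betti numbers differ from these, much as described for the Klein bottle.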

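For integer coefficients, the modified NONZERO step can be sketched as follows. This is a straightforward, unoptimized rendering of the argument in "Algorithm revisited": it picks a smallest pivot, takes remainders via integer division, and adds a row whose entries the pivot does not divide. It is not one of the polynomial-time methods cited in the notes, and the function names are hypothetical. The signed incidence matrix uses the sorted vertex order of each simplex as its fixed orientation. The test input is a six-vertex triangulation of the projective plane, which we assume for illustration; it is not taken from the text.

```python
from itertools import combinations


def signed_incidence(rows, cols):
    """Integer incidence matrix: rows are k-simplices, columns (k-1)-simplices.

    Simplices are sorted vertex tuples; the sorted order is the fixed orientation,
    so deleting vertex i contributes the coefficient (-1)**i.
    """
    index = {s: c for c, s in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for r, s in enumerate(rows):
        for i in range(len(s)):
            matrix[r][index[s[:i] + s[i + 1:]]] = (-1) ** i
    return matrix


def smith_diagonal(matrix):
    """Diagonal of the Smith normal form of an integer matrix (unoptimized sketch)."""
    a = [list(row) for row in matrix]
    m = len(a)
    n = len(a[0]) if a else 0
    diag = []
    for t in range(min(m, n)):
        # NONZERO: move a non-zero entry of smallest absolute value to (t, t).
        candidates = [(abs(a[i][j]), i, j)
                      for i in range(t, m) for j in range(t, n) if a[i][j] != 0]
        if not candidates:
            break                                    # remaining sub-matrix is zero
        _, i, j = min(candidates)
        a[t], a[i] = a[i], a[t]
        for row in a:
            row[t], row[j] = row[j], row[t]
        while True:
            changed = False
            for i in range(t + 1, m):                # row operations clear column t
                q = a[i][t] // a[t][t]
                for j in range(t, n):
                    a[i][j] -= q * a[t][j]
                if a[i][t] != 0:                     # non-zero remainder: smaller pivot
                    a[t], a[i] = a[i], a[t]
                    changed = True
            for j in range(t + 1, n):                # column operations clear row t
                q = a[t][j] // a[t][t]
                for i in range(t, m):
                    a[i][j] -= q * a[i][t]
                if a[t][j] != 0:
                    for row in a:
                        row[t], row[j] = row[j], row[t]
                    changed = True
            if changed:
                continue
            # The pivot must divide every entry of the remaining sub-matrix.
            bad = [r for r in range(t + 1, m) for s in range(t + 1, n)
                   if a[r][s] % a[t][t] != 0]
            if not bad:
                break
            for s in range(t, n):                    # add row bad[0] to row t
                a[t][s] += a[bad[0]][s]
        diag.append(abs(a[t][t]))
    return diag


RP2 = [(1, 2, 3), (1, 3, 4), (1, 4, 5), (1, 5, 6), (1, 2, 6),
       (2, 3, 5), (3, 4, 6), (2, 4, 5), (3, 5, 6), (2, 4, 6)]
EDGES = sorted({e for t in RP2 for e in combinations(t, 2)})
VERTS = [(v,) for v in range(1, 7)]
A1 = signed_incidence(EDGES, VERTS)
A2 = signed_incidence(RP2, EDGES)
print(smith_diagonal(A2))   # nine 1s followed by a single 2
```

The single diagonal entry 2 is the torsion coefficient of the first homology group. Combined with $z_1 = 15 - 5 = 10$ and $b_1 = 10$, this gives integer Betti numbers $1, 0, 0$ for this triangulation.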
Exercises

1. Equivalence classes. Consider the following topological spaces: a circle, a trefoil knot, a Möbius strip, a sphere with north-pole and south-pole removed, and a plane with origin removed.
   (i) Partition the collection into classes of same topological type.
   (ii) Partition the collection into classes of same homotopy type.

2. Amino acids. Take the graphs drawn in Figures I.8 and I.9 as definitions of the amino acids as (one-dimensional) topological spaces. Here an atom is a vertex and a bond is an edge, no matter whether or not it has (partial) double bond character.
   (i) Are there any two amino acids with isomorphic graphs? If yes, which ones?
   (ii) Calculate the Betti numbers and Euler characteristics of the graphs.
   (iii) Partition the collection of graphs into classes of the same homotopy type.

3. Joins and simplices. A tetrahedron can be defined as the join of two skew line segments in space. The halfway plane is parallel to both line segments and lies exactly halfway between them. Since the line segments are skew, the halfway plane separates the two line segments.
   (i) Show that the halfway plane intersects the tetrahedron in a parallelogram.
   (ii) Decomposing the line segments into $k$ and $\ell$ pieces implies a decomposition of the tetrahedron into $k\ell$ joins, which are smaller tetrahedra. Draw the decomposition and highlight the intersection with the halfway plane.

4. Stars and links. Let $K$ be the dual complex of a finite collection of balls $B$ in $\mathbb{R}^3$. Define the star of a vertex $u \in K$ as the collection of simplices that contain $u$. Define the link as the collection of faces of simplices in the star that do not belong to the star:
$$\operatorname{St} u = \{\sigma \in K \mid u \in \sigma\},$$
$$\operatorname{Lk} u = \{\tau \in K \mid \tau \le \sigma \in \operatorname{St} u,\ \tau \notin \operatorname{St} u\}.$$
   (i) Show that $\operatorname{Lk} u$ is a complex, that is, every face of a simplex in the link also belongs to the link.
   (ii) Assume $u$ is the center of $b \in B$. The sphere bounding $b$ intersects all other balls in caps. Show that $\operatorname{Lk} u$ is isomorphic to the dual complex of that collection of caps.

5. Torus and projective plane. Take a rectangular piece of paper and orient the left and right sides from top to bottom and the top and bottom sides from left to right. You get a torus if you glue the left side to the right side and the top side to the bottom side, each time with matching orientations. You get a projective plane if you again glue the left to the right side and the top to the bottom side, but now with opposing orientations.
   (i) Triangulate the rectangle such that you get a valid triangulation for both ways of gluing its sides.
   (ii) Compute the Betti numbers of the torus and the projective plane by running either the incremental or the matrix algorithm (by hand) on your triangulations.

6. Simple graphs. A simple graph is a simplicial complex that consists of vertices and edges but has no triangles or higher-dimensional simplices. Let $n$ be the number of vertices and $m$ the number of edges. Use the language of homology groups to re-confirm the following formulas, which are well-known for simple graphs:
   (i) $m = n - 1$ if the graph is a tree.
   (ii) $m \ge n - 1$ if the graph is connected.
   (iii) … in general.

7. Protein structure. Download a protein structure from the pdb database and use the Alpha Shape software to compute the Betti numbers of its van der Waals and its solvent accessible diagrams.
Chapter V

Shape Features

The topological analysis of spaces, as discussed in Chapter IV, is an important first step, but by itself is insufficient to appropriately characterize the shape of protein structures. To decide what is appropriate, we need to have a purpose. The goal we have in mind is understanding how proteins interact with each other and with other molecules. There is overwhelming evidence that interesting events in such interactions happen preferably in cavities, which are partially protected regions in the protein or molecular assembly. There is equally strong evidence that local shape complementarity plays a significant role in making such events happen. It appears that organic life is based on computations performed by dynamically matching the (changing) pieces of a three-dimensional puzzle. A statement like this needs to be accompanied by a series of disclaimers: not every interaction is based on shape complementarity; interactions that are based on shape complementarity are not entirely so; and the relevant shape complementarity is local and imperfect. In other words, the situation is hopelessly complicated.

Our goal in this chapter is to introduce mathematical and computational methods that allow us to start talking about the real problem in more precise terms. We do this by introducing three essentially new concepts. In Section V.1, we make an attempt to give a precise meaning to cavities in proteins. The main idea here is to combine the topological concept of a hole with a minimum amount of geometric information, and this information is the evolution of the shape under growth. In Section V.2, we return to homology groups and introduce the concept of topological persistence. It is a measure of how important a topological feature is during the evolution. We see this as a tool to cope with imperfections, as it permits us to distinguish topological features from topological noise. In Section V.3, we make an attempt to give a precise meaning to interfaces between interacting molecules. We define an interface as a two-dimensional sheet separating the molecules. While this idea seems simple enough, the details are tricky and require that we use what we learned about pockets and topological persistence. Finally, in Section V.4, we illustrate the concepts using the Alpha Shape software and extensions.

V.1 Pockets
V.2 Topological Persistence
V.3 Molecular Interfaces
V.4 Software for Shape Features
Exercises
V.1 Pockets

In this section, we formalize the idea of a cavity in a protein by introducing the concept of a pocket in a space-filling diagram.

Voids. The simplest type of pocket is a void, which we define as a bounded connected component of the complement. Suppose, for example, that $B$ is a finite collection of closed balls in $\mathbb{R}^3$ and $\bigcup B$ is the space-filling representation of a molecule. Since $B$ is finite, the balls cannot cover the entire space, which implies that the complement, $\mathbb{R}^3 - \bigcup B$, consists of one or more connected components. Exactly one component is unbounded (infinitely large), and all other components are voids. See Figure V.1 for an illustration of the definition in two dimensions.

Recall that in Chapter II, we described a deformation retraction from the space-filling diagram, $\bigcup B$, to the dual complex, $K$. The plain existence of that retraction implies that for each void in $\bigcup B$ we have a void in $K$ that contains the void in $\bigcup B$. Indeed, we can reverse the deformation retraction to show that the two voids have the same homotopy type. Since the dual complex is a subcomplex of the Delaunay triangulation, $D$, we may think of each void in $K$ as a collection $V$ of tetrahedra in $D - K$. The boundary, $\partial V$, is a collection of triangles in $K$. This collection bounds in $D$ but not in $K$. It follows that $\partial V$ represents a homology class in the second homology group of $K$. Indeed, the boundaries of the voids form a basis of that homology group. Hence, $\beta_2$ is the number of voids in $K$, which is the same as the number of voids in $\bigcup B$.

Figure V.1: The union of disks has a single (shaded) void. The corresponding void in the dual complex consists of five triangles.

Definition of pockets. A pocket generalizes the concept of a void by relaxing the requirement that it be disconnected from infinity. All we require is that a pocket be wider on the inside than at possible entrances from the outside. To make this idea concrete, we grow the space-filling diagram and observe how it changes: the relatively narrow entrances close before the inside disappears. In other words, a pocket is a maximal portion of space outside the space-filling diagram that turns into a void before it is subsumed by the growing diagram. To formalize this intuition, we need to settle on a growth model. It is convenient to use the one that gave rise to the sequence of alpha complexes, but we should keep in mind that this choice does affect what we do and do not call a pocket. According to this model, the center $z_i$ of the ball $b_i \in B$ remains fixed, and its radius at time $\alpha$ is equal to the square root of $r_i^2 + \alpha$.

We may think of the growth as pushing the points on the boundary of the space-filling diagram outwards, in the direction normal to the surface. Figure V.2 illustrates this view in two dimensions. In the interior of the Voronoi cells, the vector field is defined by the sweeping spheres. We extend it to the rest of space by using the circles that sweep out the Voronoi polygons and the intervals that sweep out the Voronoi edges. Starting at a point outside the space-filling diagram, we follow vectors and thus form a path that may or may not go to infinity. We define a pocket as a connected component of the set of points $x \in \mathbb{R}^3 - \bigcup B$ whose paths do not go to infinity. The points that flow to infinity form a single component, which we refer to as the outside. Each pocket is open where it borders the space-filling diagram and closed where it borders the outside. The latter set of points may formally be defined as the intersection of the pocket with the closure of the outside. Its connected components are open two-dimensional sets, which we refer to as the mouths of the pocket. Note that voids are pockets without mouths.

Figure V.2: The growing disks push the points on the boundary outwards, in normal direction. Following the vectors, the points in the shaded region have paths that end at Voronoi vertices.
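The definition of a void can be tried out numerically in the plane, as in Figure V.1. The sketch below is not the dual-complex computation described in this section. It is a pixel approximation we assume for illustration: it marks grid cells whose centers lie in some disk and counts the 4-connected components of uncovered cells that do not reach the border of the grid. The grid bounds `lo`, `hi` and the spacing `h` are hypothetical parameters. The count is only trustworthy when `h` is small compared with the narrowest gaps and necks of the union.

```python
import math
from collections import deque


def count_voids(disks, lo=-4.0, hi=4.0, h=0.1):
    """Approximate number of voids of a union of disks (x, y, r) on an h-spaced grid."""
    n = int(round((hi - lo) / h))
    center = [lo + (k + 0.5) * h for k in range(n)]
    covered = [[any((cx - x) ** 2 + (cy - y) ** 2 <= r * r for x, y, r in disks)
                for cy in center] for cx in center]
    seen = [[False] * n for _ in range(n)]
    voids = 0
    for i0 in range(n):
        for j0 in range(n):
            if covered[i0][j0] or seen[i0][j0]:
                continue
            # Breadth-first search over one component of the complement.
            seen[i0][j0] = True
            queue = deque([(i0, j0)])
            bounded = True
            while queue:
                i, j = queue.popleft()
                if i in (0, n - 1) or j in (0, n - 1):
                    bounded = False          # reaches the grid border: part of the outside
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= a < n and 0 <= b < n and not covered[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        queue.append((a, b))
            if bounded:
                voids += 1
    return voids


# Eight unit disks on a circle of radius 2 enclose one void; without one disk it opens up.
ring = [(2 * math.cos(k * math.pi / 4), 2 * math.sin(k * math.pi / 4), 1.0) for k in range(8)]
print(count_voids(ring), count_voids(ring[:7]))
```

Treating a component that touches the grid border as the unbounded one is a stand-in for "unbounded" that works only if the grid comfortably contains the union.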
V.1 Pockets 73

Evolution of dual complex. Similar to voids, we may


associate a pocket of the space-filling diagram with a
pocket of the dual complex. The latter is defined com-
binatorially, again by observing how the space-filling di-
agram changes as it grows. The dual complex changes
only at discrete moment, namely when the space-filling
diagram encounters a new vertex, edge, polygon or cell of


the Voronoi diagram. There are ten cases distinguished by
the dimension of the dual Delaunay simplex, , and the C2


relative position of its orthocenter, . We recall that is
the point at which the affine hull of intersects the affine M2 C2
hull of its dual in the Voronoi diagram.

 
Figure V.4: The thin solid lines represent polygons that meet
 along a common edge in space. That edge appears as a solid dot,
Case M : is a vertex and the orthocenter lies


which marks the orthocenter of the triangle. From left to right,
,
in the interior of the corresponding Voronoi cell, .
 the orthocenter lies inside the triangle, lies outside and sees one
This cell is encountered at time   , which is the
edge, lies outside and sees two edges and their shared vertex.
moment when the -th ball changes from imaginary

 
to real radius.

Case is an edge and lies in the interior of the cor- Case C :

. Here we have two sub-cases de-
pending on whether sees one or two edges
responding Voronoi polygon. There are two generic
sub-cases, both illustrated in Figure V.3. from the outside. In the first case, the three
balls touch the Voronoi edge at the same mo-
ment they encounter the Voronoi polygon dual
to the visible edge. In the second case, the balls
touch the edge at the same moment they en-
counter the two polygons and one cell dual to
the two visible edges and the vertex they share.

M1 C1

Case is a tetrahedron. Its orthocenter is necessarily the
corresponding Voronoi vertex.

Figure V.3: The vertical lines are side views of polygons in Case M : 



. The four balls completely sur-

 
space. The solid dot marks the orthocenter of the Delaunay edge. round the Voronoi vertex before they reach it.
On the left, this edge intersects its dual Voronoi polygon, while Case C : . Here we have three sub-cases de-
on the right, it lies on ones side of the polygon. pending on whether sees one, two or three tri-
angles from the outside. The four balls touch
Case M :

 

. The two balls approach the
the Voronoi vertex at the same moment they
touch the Voronoi edges, polygons and cells
Voronoi polygon from both sides, eventually

 
that correspond to the triangles, edges and ver-
touching it at . tices visible from .
Case C : . The two balls approach the poly-



gon from the same side. At the moment they In Case C and in the last sub-case each of Cases C
 
touch, the smaller ball breaks through the outer and C  , sees a vertex of from the outside. Assuming
sphere and starts sweeping out the Voronoi cell lies outside the space-filling diagram, this is only pos-


on the other side of the polygon. sible if the ball centered at that vertex is contained inside


Case is a triangle and lies in the interior of the corre-
the union of the balls centered at the other vertices of .
This is unlikely to happen for molecular data and usually
sponding Voronoi edge. There are three generic sub- indicates a measurement or modeling mistake.
cases, all illustrated in Figure V.4.

Case M :





. The three balls completely sur- Metamorphoses and collapses. In four of the ten cases,
round the Voronoi edge before they touch at . only one simplex is added to the dual complex, namely in
74 V S HAPE F EATURES


Cases M , M , M and M . Consistent with the discussion the flow along normal vectors. We are only interested in
 
tetrahedra. As noted in Case C , if the orthocenter of
 
in Chapter III, we call these operations metamorphoses,
since they change the homotopy type. We will see shortly that the remaining six cases do not affect the homotopy type. They can be understood as inverses of the six types of collapses illustrated in Figure V.5. Recall that a principal simplex is not a face of any other simplex in the complex. A proper face $\tau$ of a principal simplex $\sigma$ is free if all simplices that contain $\tau$ are faces of $\sigma$. Such a pair $(\tau, \sigma)$ defines a collapse, which is the operation that removes all simplices between and including $\tau$ and $\sigma$. Formally, the complex obtained from $K$ by collapsing the pair $(\tau, \sigma)$ is
$$K - \{\upsilon \in K \mid \tau \subseteq \upsilon \subseteq \sigma\}.$$
It is convenient to specify the type using the dimensions $k = \dim \tau$ and $\ell = \dim \sigma$ and to talk about $k\ell$-collapses, for $0 \le k$ and $k < \ell \le 3$. With this notation, the changes in the dual complex described in Case C are caused by inverses of $k\ell$-collapses, for $0 \le k < \ell \le 3$.

Each collapse can be realized as a deformation retraction that pushes a portion of the boundary of $\sigma$ through $\sigma$ toward the remaining portion of the boundary. In the process, the retraction removes $\sigma$ and all faces of $\sigma$ that contain $\tau$. Being a deformation retraction, the operation does not affect the homotopy type of the complex, and neither does its inverse.

Figure V.5: From left to right, top to bottom: collapsing a tetrahedron from a triangle, an edge and a vertex (23-, 13- and 03-collapse), collapsing a triangle from an edge and a vertex (12- and 02-collapse), and collapsing an edge from a vertex (01-collapse). In each case, the collapse removes the tetrahedron, the transparent triangles, the dashed edges, and the dotted vertices, if any.

Partial order. Using the classification into ten different operations, we may introduce a partial order on the Delaunay simplices, which we think of as a discretization of

When the orthocenter of a Delaunay tetrahedron $\sigma$ lies outside $\sigma$, it sees either one, two or three of the triangles of $\sigma$. For each triangle visible from the orthocenter, we define $\sigma \prec \tau$, where $\tau$ is the tetrahedron on the other side of the shared triangle. To cover the case in which the triangle lies on the boundary of the Delaunay triangulation, we introduce a dummy tetrahedron, $\omega$, that represents the space outside the triangulation. By definition, its orthocenter is at infinity, so $\omega$ can only be a successor but not a predecessor of other tetrahedra. This is what we call a sink of the relation. The other sinks are the tetrahedra that contain their orthocenters; they define metamorphoses in the evolution of the dual complex.

Note that $\sigma \prec \tau$ implies that the square radius of the orthosphere of $\sigma$ is less than that of the orthosphere of $\tau$. If $\tau = \omega$, this is true because the orthoradius of $\omega$ is infinite, by definition. If $\sigma$ and $\tau$ are both (finite) Delaunay tetrahedra, this is true because their orthocenters are Voronoi vertices that lie on the same side of the plane separating $\sigma$ and $\tau$. As illustrated in Figure V.6, the two orthospheres intersect in a circle that lies in the separating plane, and the orthocenter of $\tau$ is further from that plane than the orthocenter of $\sigma$. Since the square radius of a sphere through a fixed circle is the square radius of the circle plus the square distance of the sphere's center from the plane of the circle, the orthosphere of $\tau$ has the larger square radius. This implies that the square radius increases along every chain of the relation. Hence, $\prec$ is acyclic and its transitive closure is a partial order.

Figure V.6: Think of the triangles as projections of tetrahedra and the circles as projections of spheres. The centers of both (dotted) orthospheres lie on the right of the separating plane.

Pockets of dual complex. We are now ready to define and compute the pockets of the dual complex using the partial order over the tetrahedra. The ancestor set of a tetrahedron $\tau$ contains $\tau$, its predecessors, the predecessors of the predecessors, and so on:
$$\mathrm{An}(\tau) = \{\sigma \mid \sigma \preceq \tau\},$$
where $\preceq$ denotes the reflexive and transitive closure of $\prec$.
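The definition of a collapse is easy to try out on small examples. Below is a minimal Python sketch, offered as an illustration only and not as code from the Alpha Shapes software. It assumes a simplicial complex is stored as a set of frozensets of vertex labels, and the helper names `closure`, `is_free_pair` and `collapse` are hypothetical. The sketch checks that $\tau$ is a free face of the principal simplex $\sigma$ and then removes every simplex $\upsilon$ with $\tau \subseteq \upsilon \subseteq \sigma$.

```python
from itertools import combinations


def closure(*top_simplices):
    """Return the complex (set of frozensets) generated by the given simplices."""
    complex_ = set()
    for s in top_simplices:
        verts = tuple(s)
        for k in range(1, len(verts) + 1):
            for face in combinations(verts, k):
                complex_.add(frozenset(face))
    return complex_


def is_free_pair(K, tau, sigma):
    """True if tau is a proper face of the principal simplex sigma and every
    simplex of K that contains tau is a face of sigma."""
    if not (tau < sigma and tau in K and sigma in K):
        return False
    if any(sigma < u for u in K):  # sigma must be principal
        return False
    return all(u <= sigma for u in K if tau <= u)


def collapse(K, tau, sigma):
    """Return K minus all simplices between tau and sigma, inclusive."""
    if not is_free_pair(K, tau, sigma):
        raise ValueError("(tau, sigma) is not a free pair")
    return {u for u in K if not (tau <= u <= sigma)}
```

For example, collapsing the edge $\{a, b\}$ from the triangle $\{a, b, c\}$ is a 12-collapse. It leaves the two edges $\{a, c\}$ and $\{b, c\}$ with their vertices, which is a contractible complex just like the triangle. The same edge is not free once a second triangle $\{a, b, d\}$ shares it.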


We have seen that a tetrahedron can have more than one successor. It is also possible that it belongs to more than one ancestor set, although this is not the common case. The pockets in the dual complex are defined by the tetrahedra that belong neither to the dual complex nor to the ancestor set of $\omega$. Note that this is more conservative than collecting all tetrahedra outside $K$ that belong to ancestor sets of finite sinks. We compute the pockets in two steps:

Step 1. Collect the Delaunay tetrahedra that belong neither to $K$ nor to $\mathrm{An}(\omega)$.
Step 2. Partition this collection into components.

To collect the tetrahedra, we assume the Delaunay simplices are given in a list ordered by birth-time. As illustrated in Figure V.7, the relation over the tetrahedra is acyclic and goes monotonically from left to right. We mark the tetrahedra in the dual complex, which form a prefix of the sub-list of tetrahedra. Next, we mark the tetrahedra in the ancestor set of $\omega$ by searching backward from $\omega$ along the pairs of the relation. To complete Step 1, we now collect all unmarked tetrahedra in a single scan through the list. See Figure V.8 for a two-dimensional illustration. The resulting collection contains the tetrahedra of all pockets. Call two tetrahedra in this collection adjacent if they share a triangle that is not in the dual complex. Based on this adjacency information, we can compute the connected components using standard graph algorithms, such as depth-first search or union-find.

Figure V.7: Ordered list of simplices with relation over the tetrahedra indicated by arrows.

Figure V.8: The eight disks form one pocket, which connects to the outside along one mouth. The corresponding pocket in the dual complex consists of four triangles and a single mouth edge.

Computing mouths is similar to computing pockets, only one dimension lower:

Step 1. Collect the boundary triangles not in $K$.
Step 2. Partition this collection into components.

We may do the computation for individual pockets or for all pockets at once. In Step 1, we collect the triangles not in $K$ that belong to exactly one pocket tetrahedron. In Step 2, we call two triangles adjacent if they share an edge that does not belong to $K$. Finally, we use the same standard graph algorithms to compute components.

Bibliographic notes. The importance of cavities in drug design and discovery has been known for a while [4]. The formalization as pockets introduced in this section has been described in [3] and implemented as part of the Alpha Shapes software. The definition of a pocket is not purely topological and requires a crucial geometric component, namely the growth model of the input balls. This growth model forms the basis of the partial order over the Delaunay tetrahedra. An extension to include simplices of all dimensions has been used for reconstructing the surface of scanned point sets [2] and might have further applications in the analysis of protein shape.

In everyday language we barely make any difference between pockets and other holes, such as the ones counted by the Betti numbers. This has also been noticed by the philosophers Casati and Varzi [1], who introduce a concept they call a hollow, which is similar at least in spirit to our formal notion of a normal pocket.

[1] R. CASATI AND A. C. VARZI. Holes and Other Superficialities. MIT Press, Cambridge, Massachusetts, 1994.

[2] H. EDELSBRUNNER. Surface reconstruction by wrapping finite sets in space. Discrete and Computational Geometry: The Goodman-Pollack Festschrift, eds. B. Aronov, S. Basu, J. Pach and M. Sharir, Springer-Verlag, Berlin, to appear.

[3] H. EDELSBRUNNER, M. A. FACELLO AND J. LIANG. On the definition and the construction of pockets in macromolecules. Discrete Appl. Math. 88 (1998), 83–102.

[4] I. D. KUNTZ. Structure-based strategies for drug design and discovery. Science 257 (1992), 1078–1082.
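The two-step pocket computation of Section V.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the Alpha Shapes implementation. Tetrahedra are assumed to be hashable ids. The relation $\prec$ is assumed to be given as (predecessor, successor) pairs, with the hypothetical label `OMEGA` standing in for the dummy tetrahedron $\omega$. The set `in_K` holds the tetrahedra of the dual complex, and `adjacency` lists the pairs of tetrahedra that share a triangle not in $K$. All of these names are illustrative.

```python
from collections import defaultdict

OMEGA = "omega"  # hypothetical label for the dummy tetrahedron


def ancestor_set(relation, sink):
    """Tetrahedra from which `sink` is reachable along the relation,
    including `sink` itself, found by searching backward from the sink."""
    preds = defaultdict(list)
    for a, b in relation:  # a precedes b
        preds[b].append(a)
    seen, stack = {sink}, [sink]
    while stack:
        t = stack.pop()
        for p in preds[t]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def pockets(tetrahedra, relation, in_K, adjacency):
    """Step 1: keep tetrahedra in neither K nor An(omega).
    Step 2: group them into components with union-find."""
    outside = ancestor_set(relation, OMEGA)
    pocket_tets = [t for t in tetrahedra if t not in in_K and t not in outside]
    parent = {t: t for t in pocket_tets}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in adjacency:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    components = defaultdict(set)
    for t in pocket_tets:
        components[find(t)].add(t)
    return sorted(components.values(), key=min)
```

In the small example used in the test, tetrahedra 2 and 3 drain to $\omega$ and tetrahedron 1 lies in $K$. The remaining tetrahedra 4, 5 and 6 form two pockets because only 4 and 5 are adjacent. Mouths could be handled the same way, one dimension lower.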

V.2 Topological Persistence

In this section, we measure the life-time or persistence of a topological feature in an evolving topological space. The measure can be used to distinguish between pockets with relatively wide and narrow entrances, and it is essential in the definition of molecular interfaces discussed in the next section.

The intuition. A prime example of an evolving topological space is a space-filling diagram that grows in the way discussed in the preceding section. As before, we write
$$K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m$$
for the corresponding filtration. The $K_i$ are the complexes that arise during the evolution and, in the generic case, any two contiguous complexes differ either by a metamorphosis or an anti-collapse. Each anti-collapse may be viewed as a sequence of metamorphoses in which the later simplices destroy the topological features created by the earlier simplices. For example, the anti-collapse that inverts a 23-collapse consists of a triangle creating a void and a tetrahedron filling the same. The life-time of this void is zero because the triangle and the tetrahedron are added at the same moment. We will see that even if a triangle and a tetrahedron are added at different moments, it is possible to decide in an unambiguous manner whether or not the tetrahedron destroys what the triangle created. If it does, then we are talking about a void with positive life-time, and we may interpret that life-time as a measure of significance of the void. We may also interpret it as a shape measure of the corresponding pocket.

Figure V.9: The region grows from two vertices, the two components merge twice, and the second merge creates a void that eventually disappears.

The idea of creation and destruction is the same as in Section IV.3 and depends on the effect on the Betti numbers: a $k$-simplex creates if its addition increases $\beta_k$, and it destroys if its addition decreases $\beta_{k-1}$. Consider the evolving two-dimensional space illustrated in Figure V.9 as an example. There are three events at which homology classes are created, namely when the two components get born at the points labeled $M_0$ and when the components merge the second time at the upper point labeled $M_1$. The labels indicate the types of metamorphoses that correspond to the topological changes. When the components merge the first time, a component gets destroyed, and when the hole gets filled, a 1-cycle gets destroyed. It should be clear that $M_2$ destroys what the upper $M_1$ created, and that the lower $M_1$ destroys what the right $M_0$ created. Nobody destroys the component created by the left $M_0$.

Incremental algorithm revisited. We will formalize the idea of pairing creations with destructions by revisiting the incremental algorithm for Betti numbers presented in Section IV.3. We study the algorithm in terms of matrices of boundary homomorphisms. Recall that a single step in that algorithm computes the Betti numbers of a complex $K_j = K_{j-1} \cup \{\sigma_j\}$ from the Betti numbers of $K_{j-1}$. Let the dimension of $\sigma_j$ be $k = \dim \sigma_j$. The only matrices affected by adding $\sigma_j$ to the complex are the ones of $\partial_k$ and of $\partial_{k+1}$, which are displayed in Figure V.10. The new column of the matrix of $\partial_{k+1}$ is zero because $\sigma_j$ is not a face of any $(k+1)$-simplex in $K_j$. Hence $\mathrm{rank}\, B_k$, the rank of the $k$-th boundary group, is the same for $K_j$ as it is for $K_{j-1}$. On the other hand, $\mathrm{rank}\, Z_k$ may remain the same or it may increase.

Figure V.10: The addition of $\sigma_j$ to the complex appends a column to the matrix of $\partial_{k+1}$ (rows indexed by $C_{k+1}$, columns by $C_k$) and a row to the matrix of $\partial_k$ (rows indexed by $C_k$, columns by $C_{k-1}$).

Case $\sigma_j$ creates. Then $\sigma_j$ belongs to a $k$-cycle, which implies that its row in the matrix of $\partial_k$ can be zeroed out. We can thus write the Betti numbers of $K_j$ in terms of the ranks of various groups defined for $K_{j-1}$ as follows:
$$\beta_k(K_j) = (\mathrm{rank}\, Z_k + 1) - \mathrm{rank}\, B_k,$$
$$\beta_{k-1}(K_j) = \mathrm{rank}\, Z_{k-1} - \mathrm{rank}\, B_{k-1}.$$

In words, the $(k-1)$-st Betti number remains unchanged and the $k$-th Betti number increases by one.

Case $\sigma_j$ destroys. Then $\sigma_j$ does not belong to a $k$-cycle. Its row in the matrix of $\partial_k$ can therefore not be zeroed out, and we get a new non-zero entry in the normal form of that matrix. Hence,
$$\beta_k(K_j) = \mathrm{rank}\, Z_k - \mathrm{rank}\, B_k,$$
$$\beta_{k-1}(K_j) = \mathrm{rank}\, Z_{k-1} - (\mathrm{rank}\, B_{k-1} + 1).$$
In words, the $(k-1)$-st Betti number decreases by one and the $k$-th Betti number remains unchanged.

The case analysis confirms that the incremental algorithm as described in Section IV.3 computes the Betti numbers correctly.

Recognizing creations. Besides re-proving the correctness of the incremental algorithm, the above analysis points the way to an alternative procedure for distinguishing creating from destroying simplices. Instead of a union-find data structure, we use elementary row operations, which are slower but more general. Since we only use row operations, columns in the matrix of $\partial_k$ correspond to individual $(k-1)$-simplices, and rows represent $k$-chains; a row that has been zeroed out represents a $k$-cycle. When we add $\sigma_j$, we attempt to zero out its row from right to left. To describe how this is done, we call the column of the rightmost non-zero entry in a row its last column, and we assume a function LASTCOL that returns the index of the last column; it returns zero if that last column does not exist. Clearly, each row has at most one last column. Conversely, we maintain inductively that each column is last for at most one row. For example, this property is satisfied by the matrix in Figure V.11 before the shaded last row is added. After that addition, we use row operations to reinstate the property before adding the next row. To explain the algorithm, we let $j$ be the index of the row that corresponds to the new simplex $\sigma_j$. Given a column $i$, we also assume a function ROW($i$) that returns the index of the row, among the first $j-1$ rows, for which $i$ is the last column. It returns zero if the row is not defined.

Figure V.11: The shaded rightmost non-zero entries identify last columns of rows.

    boolean DOESCREATE(int j)
      while i = LASTCOL(j) ≠ 0 do
        if ROW(i) ≠ 0 then row j = row j + row ROW(i)
        else return FALSE
        endif
      endwhile;
      return TRUE.

The matrix entries are taken modulo 2, so adding two rows amounts to a bitwise exclusive-or. After running Function DOESCREATE for the $j$-th row, that row is either zero, in which case $\sigma_j$ creates, or it has a unique last column, in which case $\sigma_j$ destroys.

Persistent homology. We argue below that Function DOESCREATE computes more than just Betti numbers: it also determines how long a homological feature lasts along the filtration. To make this precise, we return to the situation in which the filtration represents meaningful information, such as scale in the case of alpha shapes. In general, we define persistence so it depends on the time when simplices are added to the complex in the filtration, but to simplify matters here, we re-define time equal to the index. In other words, we say $\sigma_j$ is added at time $j$. Keeping this convention in mind, we now define the $p$-persistent $k$-th homology group of $K_j$ as the cycle group divided by the boundary group at $p$ positions later in the filtration:
$$H_k^{j,p} = Z_k^j / \left(B_k^{j+p} \cap Z_k^j\right).$$
Taking the intersection of the boundary group with the cycle group is necessary for technical reasons to define the quotient group. Figure V.12 illustrates the difference between the $p$-persistent homology group and the usual or 0-persistent homology group. The $p$-persistent $k$-th Betti number of $K_j$ is the rank of the $p$-persistent $k$-th homology group: $\beta_k^{j,p} = \mathrm{rank}\, H_k^{j,p}$.

Figure V.12: The cycle group and its decompositions into solid $p$-persistent homology classes and dotted 0-persistent homology classes.

Interval property of persistence. We develop an intuitive picture of persistence using the distinction between creating and destroying simplices. Note that the number of creating $k$-simplices until position $j$ in the filtration is the rank of the cycle group: $\mathrm{rank}\, Z_k^j$. Similarly, the number of destroying $(k+1)$-simplices is the rank of the boundary group: $\mathrm{rank}\, B_k^j$. The Betti number is the surplus of creating versus destroying simplices: $\beta_k^j = \mathrm{rank}\, Z_k^j - \mathrm{rank}\, B_k^j$. Because Betti numbers are non-negative, the creating $k$-simplices and destroying $(k+1)$-simplices are arranged like opening and closing parentheses in an expression, except that some closing parentheses may be missing at the end. In particular, every prefix contains at least as many creating $k$-simplices as destroying $(k+1)$-simplices. We can therefore pair them up and form vertex-disjoint intervals, each starting at the position of a creating $k$-simplex and ending at the position of a destroying $(k+1)$-simplex (or extending to infinity if there are no destroying simplices left). We use intervals that are closed to the left and open to the right. The Betti number at position $j$ is then the number of intervals that contain $j$. Any arbitrary pairing that creates vertex-disjoint intervals has this property for Betti numbers. (Can you prove that?) In contrast, there is exactly one pairing that has the following stronger property for persistent Betti numbers:

INTERVAL PROPERTY. The $p$-persistent $k$-th Betti number at position $j$ is the number of intervals that simultaneously contain $j$ and $j + p$.

We illustrate this property by drawing a right-angled isosceles triangle below every interval, as shown in Figure V.13. Each triangle is closed along the top and left edges but open along the hypotenuse. The $p$-persistent $k$-th Betti number of $K_j$ is represented by the point $(j, p)$ in the index-persistence plane. According to the Interval Property, it is the number of right-angled isosceles triangles that contain this point.

Figure V.13: Each right-angled isosceles triangle in the index-persistence plane represents a non-bounding cycle that persists over the complexes covered by its interval.

Pairing. The pairing of simplices to obtain intervals satisfying the Interval Property is done using Function DOESCREATE explained above. Specifically, each destroying $(k+1)$-simplex corresponds to a non-zero row in the matrix of $\partial_{k+1}$ and is paired with the $k$-simplex that corresponds to the last column in that row. Note that this $k$-simplex indeed creates, as is witnessed by the cycle represented by the row. The persistence of a pair $(\sigma_i, \sigma_j)$ is the time-lag between the additions of the two simplices to the complex in the filtration. In the assumed simplified case in which $\sigma_j$ is added at time $j$, the persistence is the difference between indices: $j - i$. This is the convention we used to generate Figure IV.16, which shows the persistent first Betti numbers of the space-filling diagram modeling the gramicidin protein.

Figure V.14: Graph of $\beta_1^{j,p}$, the number of tunnels in log-scale for gramicidin. The index in the filtration varies from left to right and the persistence from back to front. Observe the large triangular plateau, which corresponds to the dominant tunnel that passes through gramicidin.

The running time of the pairing algorithm is roughly the same as that of the normal form algorithm described in Section IV.4, namely cubic in the number $m$ of simplices, which for $n$ balls is at most some constant times $n^2$. Indeed, Function DOESCREATE spends fewer than $m$ row operations per simplex, each taking time at most proportional to $m$.
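Function DOESCREATE and the resulting pairing can be sketched in Python as follows. This is a minimal illustration that assumes coefficients modulo 2, time equal to index, and a filtration given as a list of frozensets of vertex labels in which every simplex appears after its faces. The function names are hypothetical. Indices start at 0, so "no last column" is signalled by an empty row rather than by zero. Rows are stored as Python sets of column indices, which is a sparse representation: adding two rows modulo 2 becomes a symmetric difference. Persistent Betti numbers are then read off with the Interval Property.

```python
from itertools import combinations


def boundary_rows(filtration):
    """Row j holds the filtration indices of the codimension-1 faces of simplex j."""
    index = {s: j for j, s in enumerate(filtration)}
    rows = []
    for s in filtration:
        if len(s) == 1:
            rows.append(set())  # vertices have zero boundary
        else:
            rows.append({index[frozenset(f)] for f in combinations(s, len(s) - 1)})
    return rows


def persistence_pairs(filtration):
    """Reduce each new row from right to left, as in DOESCREATE."""
    rows = boundary_rows(filtration)
    row_of_last = {}  # last column -> row index (plays the role of ROW)
    creates, pairs = [], []
    for j in range(len(filtration)):
        row = set(rows[j])
        while row:
            i = max(row)  # LASTCOL
            if i in row_of_last:
                row ^= rows[row_of_last[i]]  # row addition modulo 2
            else:
                row_of_last[i] = j  # simplex j destroys what simplex i created
                pairs.append((i, j))
                break
        rows[j] = row  # keep the reduced row for later additions
        creates.append(not row)
    return creates, pairs


def intervals(filtration, k):
    """Half-open intervals [i, j) of k-dimensional classes; unpaired ones never end."""
    creates, pairs = persistence_pairs(filtration)
    death = dict(pairs)
    return [(i, death.get(i, float("inf")))
            for i, s in enumerate(filtration)
            if len(s) == k + 1 and creates[i]]


def persistent_betti(filtration, k, j, p):
    """Interval Property: count the intervals containing both j and j + p."""
    return sum(1 for a, b in intervals(filtration, k) if a <= j and j + p < b)
```

On the filtration a, b, c, ab, bc, ac, abc, the edges ab and bc merge components, ac closes a 1-cycle, and the triangle abc fills it one step later. Hence $\beta_1^{5,0} = 1$ and $\beta_1^{5,1} = 0$.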

Bibliographic notes. The material for this section is


taken from [1], where we find the definition of persis-
tent Betti numbers, the algorithm and its correctness proof.
The algorithm has been implemented and experimental re-
sults suggest it is considerably faster than the obvious cu-
bic time bound. We should note, however, that the imple-
mentation in [1] differs in two possibly significant aspects
from the algorithm described in this section. First, the im-
plementation uses a union-find data structure to classify
simplices as creating or destroying, and second, it uses a
sparse matrix representation that permits row operations
in time proportional to the number of non-zero entries.
Persistent Betti numbers have been defined independently
by Robins [3], who uses them to study the fractal nature
of two-dimensional point patterns. Persistent homology
groups are embedded in spectral sequences, which are spe-
cial tables of related homology groups [2]. It might be
interesting to explore the other groups in that table and
to find meaningful interpretations in the context of alpha
complexes.

[1] H. EDELSBRUNNER, D. LETSCHER AND A. ZOMORODIAN. Topological persistence and simplification. Discrete Comput. Geom. 28 (2002), 511–533.

[2] J. MCCLEARY. A User's Guide to Spectral Sequences. Second edition, Cambridge Univ. Press, England, 2001.

[3] V. ROBINS. Toward computing homology from finite approximations. Topology Proceedings 24 (1999), 503–532.

V.3 Molecular Interfaces

The interface between two or more interacting molecules is the location of that interaction. In this section, we present a proposal for a surface or complex of surfaces that geometrically represents that interface. One of its applications is to display functions defined over the interface.

Interfaces without boundary. Our definition of a molecular interface is a formalization of two intuitions, namely that the best separation of two or more molecules is part of the Voronoi diagram, and that the interesting portion of that separation is protected by a relatively tight seal. We will come back to the second intuition later and formalize the first intuition now.

Consider an assembly of $\ell$ molecules, each represented by a collection of balls $B_i$ in $\mathbb{R}^3$, and let $B = B_1 \cup \cdots \cup B_\ell$ be the collection of all balls. Recall that the Voronoi diagram of $B$ consists of a polyhedral cell for each ball $b \in B$ and of the polygons, edges and vertices shared by the cells. We use colors to keep track of the correspondence between balls and molecules. Specifically, if $b$ belongs to $B_i$, then we say $b$ and its Voronoi cell have the color $i$. The polygons, edges and vertices get their colors from the cells they belong to. While all cells are mono-chromatic, a polygon can be mono-chromatic or bi-chromatic depending on whether the two cells that share the polygon have the same or different colors. The interface between the $B_i$ is the subcomplex of the Voronoi diagram consisting of all bi-chromatic polygons and their edges and vertices. Figure V.15 illustrates the definition by showing the interface of two collections of disks in the plane.

Figure V.15: The solid bi-chromatic edges form the interface of the two collections of disks. The dotted mono-chromatic edges show the rest of the Voronoi diagram.

Local structure. In the generic case, every edge belongs to three and every vertex to four Voronoi cells. This implies that for $\ell = 2$ colors, the interface has a particularly simple local geometric structure. An interface edge belongs to two cells of one color and to one cell of the other color, and exactly two of the three polygons sharing the edge are bi-chromatic and thus belong to the interface. There are two types of interface vertices: those that belong to three cells of one color and one cell of the other color, and those that belong to two cells of each color. As illustrated in Figure V.16, the local neighborhood of both types of vertices is a topological disk. We conclude that in the generic case the interface for $\ell = 2$ colors is a 2-manifold, which is a topological space in which every point has an open neighborhood homeomorphic to $\mathbb{R}^2$. By construction, that 2-manifold is orientable, with the cells of one color on one side and the cells of the other color on the other side.

Figure V.16: The shaded polygons and their edges belong to the interface. On the left, we have three cells of one and one cell of the other color. On the right, we have two cells of each color.

For $\ell \ge 3$ colors, the local structure of the interface can be more complicated because we may have tri-chromatic edges and tri- and four-chromatic vertices. For any two colors, we get a 2-manifold, but now these 2-manifolds meet along curves formed by tri-chromatic edges. In other words, the interface is a two-dimensional complex of sheets, curves and vertices. Every sheet is a maximal component consisting of bi-chromatic polygons, edges and vertices of a given color pair. Similarly, every curve is a maximal component consisting of tri-chromatic edges and vertices of a given color triplet. Finally, every interface vertex is a four-chromatic vertex in the Voronoi diagram. Together, the sheets, curves and vertices form a complex in the sense that the boundary of every sheet consists of finitely many pairwise disjoint curves and vertices, and the boundary of every curve consists of finitely many interface vertices.
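Under the duality between Voronoi diagram and Delaunay triangulation, the coloring rules above can be checked simplex by simplex. A Voronoi polygon is dual to a Delaunay edge, a Voronoi edge to a Delaunay triangle, and a Voronoi vertex to a Delaunay tetrahedron. The minimal Python sketch below assumes the generic case and assumes that Delaunay simplices are given as tuples of ball ids, with a dictionary `color` mapping each ball to its molecule. It selects the Delaunay simplices whose dual Voronoi elements lie on the interface. The names are hypothetical, and computing the Delaunay triangulation itself is outside the sketch. A Voronoi edge or vertex is on the interface exactly when its dual simplex carries at least two colors, because then at least one of its Delaunay edges is bi-chromatic.

```python
def chromaticity(simplex, color):
    """Number of distinct colors (molecules) among the balls of a Delaunay simplex."""
    return len({color[b] for b in simplex})


def interface_elements(color, edges, triangles, tetrahedra):
    """Return the Delaunay simplices dual to interface polygons, edges and vertices.

    polygons:     Delaunay edges joining balls of two different colors
    vor_edges:    triangle -> chromaticity (2 = bi-, 3 = tri-chromatic Voronoi edge)
    vor_vertices: tetrahedron -> chromaticity (2, 3 or 4 colors)
    """
    polygons = [e for e in edges if chromaticity(e, color) == 2]
    vor_edges = {}
    for t in triangles:
        c = chromaticity(t, color)
        if c >= 2:
            vor_edges[t] = c
    vor_vertices = {}
    for v in tetrahedra:
        c = chromaticity(v, color)
        if c >= 2:
            vor_vertices[v] = c
    return polygons, vor_edges, vor_vertices
```

With three molecules, the triangle (1, 2, 3) in the test is tri-chromatic, so its dual Voronoi edge lies on a curve where sheets of different color pairs meet.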

Retraction. As defined above, the interface may go to infinity, which is sometimes a disadvantage. Our goal here is to shrink the interface back to where the molecules are sufficiently close to interact. It seems natural to do this with a distance threshold, but this would most certainly lead to the deletion of interior portions and produce fractured surfaces. We therefore shrink from outside in, and we use relative rather than absolute distance measurements to decide where to stop the process. In the first step, we retract the interface back to the multi-chromatic dual of the dual complex and its pockets. In the second step, we use topological persistence to shrink the interface even further. We will return to the second step later.

To describe the shrinking process, we consider the Delaunay triangulation $D$ of the collection of balls $B$. We have mono-chromatic vertices and mono- as well as multi-chromatic edges, triangles and tetrahedra. The interface as defined above is dual to the subset of multi-chromatic simplices in $D$. Note that the first step of the shrinking process is equivalent to removing all tetrahedra outside the dual complex that belong to the ancestor set of the dummy tetrahedron, which represents the space outside the Delaunay triangulation. We use 23-collapses to remove these tetrahedra. We simplify the algorithm by ignoring principal triangles, edges and vertices; in other words, we delete principal triangles, edges and vertices as soon as they arise. Let $K$ denote the dual complex.

    void COLLAPSE(τ, σ):
      if σ ∉ K and (τ, σ) is collapsible then
        forall faces υ of σ with τ ⊆ υ do delete υ endfor
      endif.

In this context, we consider $(\tau, \sigma)$ collapsible if the pair is part of an anti-collapse in the construction of the filtration and the collapse of $\tau$ and $\sigma$ renders the other simplices in this anti-collapse principal. This is equivalent to saying that the effect of the 23-collapse is the inverse of that anti-collapse. We define a retraction as a maximal sequence of collapses. In other words, we collapse as long as we can. In the implementation of this operation, we maintain a stack of candidate pairs. Initially, this stack contains all boundary triangles of the Delaunay triangulation together with their incident tetrahedra. During the process, we take pairs from the stack and add new pairs whenever we create new boundary triangles by collapsing.

    Complex RETRACT:
      while the stack is non-empty do
        (τ, σ) = POP; COLLAPSE(τ, σ)
      endwhile.

We may think of a retraction as successively removing sinks from an acyclic directed graph. It follows that the result of the operation is independent of the sequence in which the collapses are performed.

Clipping. The result of the retraction is the collection of tetrahedra in the dual complex together with the tetrahedra in the pockets. We further remove all mono-chromatic tetrahedra and let $T$ denote the remaining collection of multi-chromatic tetrahedra. The interface is now obtained as the dual of $T$. More specifically, for each bi-chromatic edge of the tetrahedra in $T$, we add the dual polygon to the interface. There are, however, complications because such a bi-chromatic edge may be either completely or only partially surrounded by tetrahedra in $T$. In the latter case, we clip the polygon before adding it to the interface. Figure V.17 illustrates this idea in two dimensions, but we should keep in mind that the situation in three dimensions is more complicated. A partially surrounded bi-chromatic edge corresponds to a polygon with two types of vertices: those dual to tetrahedra in $T$ and the others. We clip the polygon by cutting each edge connecting vertices of different types with the plane of the corresponding boundary triangle. If that plane does not intersect the dual Voronoi edge, which happens in rare cases, we clip at the endpoint that is closer to the plane. Finally, we connect the cut points in contiguous pairs and retain the portions of the polygon with vertices of the first type.

Figure V.17: The triangles drawn with solid edges are the bi-chromatic triangles constructed by the contraction algorithm. The boldface interface is dual to and clipped at the boundary of this collection.
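The clipping rule can be sketched in plain Python for a single Voronoi polygon. This is a minimal illustration under stated assumptions. The polygon vertices are given in cyclic order as coordinate tuples, and `keep[i]` flags whether vertex $i$ is dual to a tetrahedron in $T$. For each polygon edge, `planes[i]` holds a point and a normal of the plane of the boundary triangle dual to that Voronoi edge (edge $i$ joins vertex $i$ to vertex $i+1$). The function names and this input format are hypothetical. The sketch cuts each mixed edge with its plane and, when the plane misses the edge, falls back to the endpoint that is closer to the plane.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def clip_point(inside, outside, plane_point, normal):
    """Cut the segment from `inside` to `outside` with the plane through
    `plane_point` with normal `normal`; if the plane misses the segment,
    return the endpoint closer to the plane."""
    d_in = dot([a - b for a, b in zip(inside, plane_point)], normal)
    d_out = dot([a - b for a, b in zip(outside, plane_point)], normal)
    if d_in != d_out and d_in * d_out <= 0:  # endpoints on opposite sides or on the plane
        t = d_in / (d_in - d_out)
        return tuple(a + t * (b - a) for a, b in zip(inside, outside))
    return tuple(inside) if abs(d_in) <= abs(d_out) else tuple(outside)


def clip_polygon(vertices, keep, planes):
    """Walk the polygon boundary, keeping flagged vertices and inserting one
    cut point on every edge that joins a kept vertex to a discarded one."""
    out = []
    n = len(vertices)
    for i in range(n):
        a, b = vertices[i], vertices[(i + 1) % n]
        ka, kb = keep[i], keep[(i + 1) % n]
        if ka:
            out.append(tuple(a))
        if ka != kb:
            inside, outside = (a, b) if ka else (b, a)
            point, normal = planes[i]
            out.append(clip_point(inside, outside, point, normal))
    return out
```

For instance, a square with its two left vertices dual to tetrahedra in $T$, clipped by the plane $x = 1$, keeps its left half. The sketch assumes the kept vertices form one contiguous run, as in the two-dimensional picture of Figure V.17.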

Further retraction. We now take the shrinking process beyond the retraction from the dummy tetrahedron. Recall that the topological persistence algorithm of Section V.2 generates simplex pairs $(\tau, \sigma)$ with the property that $\sigma$ destroys what $\tau$ created. The dimension of $\sigma$ is one larger than that of $\tau$, but we are only interested in the case in which $\tau$ is a triangle and $\sigma$ is a tetrahedron. We think of the operation that removes $\tau$ and $\sigma$ as a generalization of a 23-collapse, but it is more complicated because $\tau$ is generally not a face of $\sigma$, although it can be. We do the operation only if $\tau$ is a boundary triangle of $T$ and does not belong to the dual complex. We first delete $\sigma$ and then retract from $\sigma$. As before, we remove principal triangles, edges and vertices as soon as they get created.

    void REMOVE(τ, σ):
      if σ ∈ T then delete σ;
        forall triangles υ ⊆ σ do PUSH(υ, υ′) endfor;
        RETRACT
      endif.

Here, $\upsilon'$ is the tetrahedron that shares $\upsilon$ with $\sigma$. If the retraction from $\sigma$ reaches far enough, $\tau$ gets deleted just because it becomes principal. However, it can happen that the retraction does not reach all the way, in which case we recurse for other pairs of simplices before deleting $\tau$. This is done implicitly during the retraction. To decide whether or not to remove $\tau$ and $\sigma$ in the first place, we compare their persistence with a constant threshold $\lambda$ and remove only if
$$f(\tau, \sigma) = \frac{1}{a_\sigma - a_\tau} > \lambda,$$
with $f(\tau, \sigma) = \infty$ when $a_\sigma = a_\tau$. Here, $a_\tau$ and $a_\sigma$ are the moments when $\tau$ and $\sigma$ are born. Note that for $a_\tau \le a_{\tau'} < a_{\sigma'} \le a_\sigma$ we have
$$f(\tau', \sigma') \ge f(\tau, \sigma). \qquad \text{(V.1)}$$
This monotonicity property is important for the correctness of the algorithm. If the retraction from $\sigma$ does not reach $\tau$, then this can only be because there is a triangle $\tau'$ between $\tau$ and $\sigma$ that split the void created by $\tau$ before it was destroyed by $\sigma$. But then the other part of the void must have been destroyed by a tetrahedron $\sigma'$ preceding $\sigma$ in the filtration. In other words, $a_\tau < a_{\tau'} < a_{\sigma'} < a_\sigma$, where $a_{\tau'}$ and $a_{\sigma'}$ are the moments when $\tau'$ and $\sigma'$ are born. The monotonicity guarantees that the simplices between $\tau$ and $\sigma$ are removed by recursive deletions, so that $\tau$ can eventually be deleted.

We now restate the algorithm and simplify its description by declaring a 23-collapse as a special case of a removal. Because of our policy to delete principal triangles, edges and vertices, all other collapses can be ignored. The algorithm maintains a stack of triangle-tetrahedron pairs formed by the topological persistence algorithm. Initially, the stack contains all pairs $(\tau, \sigma)$ with $\tau$ on the boundary of the current set $T$. We may start with the set of all Delaunay tetrahedra.

    Complex RETRACTMORE(λ):
      while the stack is non-empty do
        (τ, σ) = POP;
        if f(τ, σ) > λ then REMOVE(τ, σ)
        endif
      endwhile.

As before, we get the interface by duality from the computed collection of tetrahedra. The running time is dominated by the topological persistence algorithm, which takes cubic time to form the triangle-tetrahedron pairs. With some care, we can implement the rest of the algorithm so it takes only constant time per simplex in the Delaunay triangulation.

We note that it is possible to use other functions that satisfy the monotonicity property (V.1). For example, we may bias the shrinking process against large triangles and tetrahedra by using a suitable function of their sizes. A second potential advantage of such a function over the inverse of the persistence is that it can be dimensionless and thus amenable to the use of universally meaningful constant thresholds.

Global structure. Note that we may get different interfaces for different values of the threshold $\lambda$. Since a smaller threshold permits as many or more removals than a larger threshold, the interface shrinks with decreasing $\lambda$. Indeed, if we write $T_\lambda$ for the collection of tetrahedra obtained with threshold $\lambda$, we get a filtration, with $T_\lambda \subseteq T_\mu$ whenever $\lambda \le \mu$, that is parametrized in a way similar to the sequence of alpha shapes. For $\lambda = \infty$, the interface is the original surface or complex defined by the set of bi-chromatic Voronoi polygons. For $\lambda = 0$, the interface is empty, unless the dual complex of $B$ contains bi-chromatic triangles, which would remain. In this case, we can further decrease the interface by making $\lambda$ negative, but we have to modify the retraction to allow for collapses of simplices in the dual complex. Eventually, for $\lambda = -\infty$, the interface is guaranteed to be empty.

For a fixed $\lambda$, the interface is a two-dimensional complex. Its two-dimensional elements are sheets defined by bi-chromatic Voronoi polygons. There are two kinds of one-dimensional elements: the original tri-chromatic curves, and the new bi-chromatic curves outlining the sheet boundary created by shrinking. Finally, there are two kinds of zero-dimensional elements, namely the original four-chromatic vertices and the new tri-chromatic vertices forming the curve boundary created by shrinking. We take all sheets and curves as open sets, so the complex is a collection of pairwise disjoint open elements. Note, however, that the elements are not necessarily simply connected. To explore this further, we excise thin strips along the curves to turn each sheet into a connected 2-manifold with boundary. Each component of the boundary is a closed curve outlining a hole in the 2-manifold. A classic result in topology says that two orientable 2-manifolds with boundary are homeomorphic if and only if they have the same genus and the same number of holes. Furthermore, the Euler characteristic of a 2-manifold with genus $g$ and $h$ holes is
$$\chi = v - e + t = 2 - 2g - h,$$
where $v$, $e$ and $t$ are the numbers of vertices, edges and triangles of any triangulation of the 2-manifold. Given a sheet, it is easy to compute its Euler characteristic and to determine its number of holes. We then get the genus as $g = (2 - \chi - h)/2$. We may think of this manifold as obtained by punching $h$ holes into a $g$-fold torus.
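Computing the genus of a sheet from a triangulation takes only a few lines of Python. This is a minimal sketch that assumes the sheet is given as a list of vertex triples forming a connected, orientable triangulated 2-manifold, possibly with boundary. The function name `sheet_topology` is hypothetical. Boundary edges are the edges that lie in exactly one triangle. For a 2-manifold the boundary curves are pairwise disjoint, so the number of holes is the number of connected components of the boundary edges.

```python
from collections import Counter, defaultdict


def sheet_topology(triangles):
    """Return (chi, holes, genus) of a connected orientable triangulated surface."""
    edge_count = Counter()
    vertices = set()
    for tri in triangles:
        vertices.update(tri)
        for i in range(3):
            edge_count[frozenset((tri[i], tri[(i + 1) % 3]))] += 1
    v, e, t = len(vertices), len(edge_count), len(triangles)
    chi = v - e + t
    # Boundary edges lie in exactly one triangle; each hole is one boundary cycle.
    adj = defaultdict(list)
    for ed, c in edge_count.items():
        if c == 1:
            a, b = tuple(ed)
            adj[a].append(b)
            adj[b].append(a)
    seen, holes = set(), 0
    for start in list(adj):
        if start in seen:
            continue
        holes += 1
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return chi, holes, (2 - chi - holes) // 2
```

A single triangle is a disk ($\chi = 1$, one hole, genus 0). A triangulated annulus has $\chi = 0$ and two holes. The boundary of a tetrahedron is a sphere with $\chi = 2$ and no holes.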

Bibliographic notes. The material in this section is taken from the recent manuscript by Ban et al. [1]. There is evidence that the geometric interfaces shed new light on the hot-spot theory of protein-protein interaction [4]. A competing proposal for a geometric definition of molecular interfaces can be found in [3], where two independent real parameters are used to define the interface as a portion of the molecular surfaces of the two or more molecules. In topology, 2-manifolds with and without boundary have been studied for more than a century. The fact that the topological type of a connected orientable 2-manifold is determined by the genus and the number of holes can be found in a number of texts, including [2].

[1] Y.-E. Ban, H. Edelsbrunner and J. Rudolph. A definition of interfaces for protein oligomers. Manuscript, Duke Univ., Durham, North Carolina, 2002.

[2] W. S. Massey. Algebraic Topology: an Introduction. Springer-Verlag, New York, 1967.

[3] A. Varshney, F. P. Brooks, Jr., D. C. Richardson, W. V. Wright and D. Minocha. Defining, computing and visualizing molecular interfaces. In "Proc. IEEE Visualization, 1995", 36–43.

[4] J. A. Wells. Binding in the growth hormone receptor complex. Proc. Natl. Acad. Sci. 93 (1996), 1–6.
V.4 Software for Shape Features

In this section, we explore extensions of the Alpha Shape software that are concerned with connectivity information and shape features. We begin with signatures, then proceed to pockets, and finally look at molecular interfaces.

Betti number signatures. As explained in Section IV.2, the components, tunnels and voids of a complex in $\mathbb{R}^3$ are counted by the Betti numbers $\beta_0$, $\beta_1$ and $\beta_2$. They are computed by the algorithm explained in Section IV.3 and displayed to the right of the correspondingly labeled buttons in the signature panel shown in Figure V.19. To the left of each button we can toggle the display of the evolution of the number as a function of the index in the filtration. We refer to these functions as signatures of the data set. As an example consider the zeolite data shown in Figure V.18. Two of the three views are taken along tunnel systems that intersect orthogonally and give rise to a rather complicated cave system. Note that the tunnels shown in the second view are smaller in diameter than those shown in the third view. It follows that there are complexes in the filtration in which the narrower tunnels of the second view are already closed while the wider tunnels of the third view are still open. The two systems can be detected in the tunnel signature shown in Figure V.19. Figure V.20 shows the two-dimensional tunnel signature with filtration index increasing from left to right and persistence increasing from back to front. The persistence of the tunnels is formally defined in Section V.2.

Figure V.18: Three axis-parallel views of the 2,354-th dual complex in the filtration of a periodic zeolite molecule consisting of 1,296 atoms.

Figure V.19: The signature panel with the tunnel signature displayed in log-scale. The index 2,354 belongs to the higher of the two plateaus, which implies that both tunnel systems are open in the displayed complex.

Figure V.20: The graph of the number of tunnels, in log-scale, of the zeolite data. The noise in the signature decreases from back to front. The two persistent tunnel systems are visible as plateaus that escape the noise removal the longest.

Displaying pockets. Prior to developing and implementing pockets, we experimented with other, more simple-minded ideas aimed at getting a handle on cavities in molecular data. One such idea was to display the difference between the Delaunay triangulation and the dual complex, or more generally the difference
between two dual complexes. This difference can be computed in the Alpha Shape software by first selecting the two complexes and second pushing the ‘Difference’ button in the scene panel. The results are not encouraging because typically a large number of inessential simplices clutters the view of important cavities. In contrast, the dual set of a pocket usually gives a clear indication of the cavity, as in Figure V.21. The interface also supports the display of individual pockets, and Figure V.22 shows the largest of the pockets in Figure V.21 from a different angle. We should keep in mind that the pocket in the dual complex is geometrically considerably larger than the pocket in the corresponding space-filling diagram. This effect is the reverse of that for the molecule, whose dual complex is considerably smaller than a corresponding space-filling diagram.

Figure V.21: All pockets in the dual complex of the zeolite data for index 2,926.

Figure V.22: Side view of the largest pocket of the collection shown in Figure V.21.

Remember that pockets in the dual complex are not closed under the face relation. The software indicates the presence or absence of boundary triangles by the choice of color. The mouth regions are therefore visually easily identifiable. However, the internal connectivity of the pockets is not immediately visible, which may lead to confusion. For example, two pockets may appear connected but are not because of missing shared triangles. It is possible to visually inspect the connectivity by turning on the display of simplices of all dimensions in the scene panel, as shown in Figure II.17, and using the explosion function to separate all simplices. We observe the same phenomenon for the mouths of a pocket. Two boundary triangles that share a common edge may or may not belong to the same mouth depending on which shared edges belong to the pocket.

Pocket panel. Pockets can be computed without opening the pocket panel, but a more detailed exploration requires interaction with the software, which is facilitated by that panel. A useful feature is the ‘Shapewire’ button, which can be used to display the edge skeleton of the dual complex together with the pockets. The skeleton does not block the view and helps positioning the pockets relative to the complex. The panel also provides a means to step through the sequence of individual pockets and to select pockets by their number of mouths. The main design of the pocket panel, shown in Figure V.23, is similar to that of the signature panel. It contains a window for its own signatures, which start after the index of the first chosen complex. The second index can be chosen anywhere between the first index and the maximum. It is used to eliminate ancestor sets of tetrahedra whose indices are larger than or equal to the second index. In other words, all tetrahedra whose index is at least the second index are treated like … in the computation of pockets. This elimination of large pockets helps in the exploration of detailed structures, such as side pockets of larger pockets. An example is shown in Figure V.24, which shows the pockets filling the system of narrow tunnels visible in the second view in Figure V.18, but with the second index set such that the system of wider tunnels visible in the third view of Figure V.18 is still open. The pockets thus only fill the remains of the narrow tunnels, and as can be seen in the first view, these remains are not connected.

Figure V.23: Pocket panel of the Alpha Shape software.

Figure V.24: Three axis-parallel views of the pockets representing the narrow tunnel system decomposed into pieces by opening up the wide tunnel system. Both systems are shown as holes in Figure V.18.

Displaying interfaces. [The input is a complexed collection of proteins.] [Mention the issue of water molecules, which we remove for simplicity.] [Talk about the weighted square distance function over the interface.] [Show one figure with iso-lines of that function.]

A human growth hormone example. [Say a few words about the particular two proteins.] [Show the sequence of figures illustrating the interface filtration.]

Bibliographic notes. The persistence software has been developed by Afra Zomorodian and is described in his dissertation [4]. It is currently not part of the Alpha Shape software. The pocket software has been developed by Michael Facello and is described in his dissertation [1]. Using this software, Liang and collaborators [3] studied fifty-one proteins and their cavity structure. The most interesting outcome of that study is perhaps that in about 80% of the cases, the pocket with the largest volume is also the biologically active site of the molecule. In many instances, the largest pocket is assisted in its function by smaller auxiliary pockets in the vicinity. In another application, Liang and Dill [2] provide numerical evidence that proteins are packed tighter in the core than near the outside. The interface software has been developed by Yih-En (Andrew) Ban but is not yet complete. It is built on top of the Alpha Shape software but requires a variety of additional features to be useful to biologists. Some of these features can be seen in visualizations of interfaces presented in this section.

[1] M. A. Facello. Geometric Techniques for Molecular Shape Analysis. Ph.D. thesis, Dept. Comput. Sci., Univ. Illinois, Urbana, 1996.

[2] J. Liang and K. A. Dill. Are proteins well-packed? Biophys. J. 81 (2001), 751–766.

[3] J. Liang, H. Edelsbrunner and C. Woodward. Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand binding. Protein Science 7 (1998), 1884–1897.

[4] A. Zomorodian. Analyzing and Comprehending the Topology of Spaces and Morse Functions. Ph.D. thesis, Dept. Comput. Sci., Univ. Illinois, Urbana, 2001.
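A signature records how a Betti number evolves along the filtration. As a deliberately simplified illustration, and not the algorithm of Section IV.3 (which handles three-dimensional complexes), the following Python sketch computes the signatures of $\beta_0$ and $\beta_1$ for a filtration of a graph only. It assumes a hypothetical input format: a list of simplices in filtration order, where a vertex is a 1-tuple and an edge a 2-tuple whose endpoints appear earlier. A vertex creates a component; an edge either merges two components, so $\beta_0$ drops, or closes a cycle, so $\beta_1$ grows.

```python
def graph_signatures(filtration):
    """Betti-number signatures of a filtered graph.

    filtration: simplices in filtration order; a vertex is a 1-tuple (v,),
    an edge is a 2-tuple (u, v) whose endpoints were added earlier.
    Returns two lists holding beta_0 and beta_1 after each index.
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    b0 = b1 = 0
    sig0, sig1 = [], []
    for simplex in filtration:
        if len(simplex) == 1:          # new vertex: a new component is born
            parent[simplex[0]] = simplex[0]
            b0 += 1
        else:                          # new edge
            ru, rv = find(simplex[0]), find(simplex[1])
            if ru != rv:               # edge joins two components
                parent[ru] = rv
                b0 -= 1
            else:                      # edge closes a cycle
                b1 += 1
        sig0.append(b0)
        sig1.append(b1)
    return sig0, sig1
```

For the four vertices and four edges of a square added in the order `[(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (3, 0)]`, the signatures are `[1, 2, 3, 4, 3, 2, 1, 1]` and `[0, 0, 0, 0, 0, 0, 0, 1]`; only the last edge closes a cycle. Tunnels and voids of a three-dimensional complex need more than union-find on vertices, which is why this sketch stops at graphs.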
Exercises

1. Gabriel graph. Let $S$ be a finite set of points in $\mathbb{R}^2$. The Gabriel graph of $S$ consists of all edges $ab$ for which $\|a - x\|^2 + \|b - x\|^2 > \|a - b\|^2$ for all points $x \in S - \{a, b\}$.
   (i) Prove that all edges in the Gabriel graph belong to the Delaunay triangulation of $S$.
   (ii) Prove that the Gabriel graph is connected.

2. Ancestor sets in the plane. Consider the Delaunay triangulation of a finite point set in $\mathbb{R}^2$. Write $\sigma \prec \tau$ if the two Delaunay triangles $\sigma$ and $\tau$ share an edge and both orthocenters lie on $\tau$'s side of that edge.
   (i) Prove that $\prec$ is a partial order.
   (ii) Prove that the ancestor sets of any two different sinks in the order are disjoint.
   (iii) Explain how the Gabriel graph relates to the ancestor sets of the sinks.

3. Collapsible complexes. Recall that a contractible topological space has the homotopy type of a point. We call a simplicial complex $K$ collapsible if there is a sequence of collapses that reduces it to a single vertex. Clearly, if $K$ is collapsible then its underlying space is contractible.
   (i) Prove that if $K$ is embedded in $\mathbb{R}^2$ then $K$ is collapsible iff its underlying space is contractible.
   (ii) Give an example of a simplicial complex embedded in $\mathbb{R}^3$ that is not collapsible but whose underlying space is contractible.

4. Barycentric subdivision. Let $K$ be a simplicial complex and let $\operatorname{Sd} K$ denote its barycentric subdivision.
   (i) Show that each $d$-simplex in $K$ gives rise to $(d+1)!$ $d$-simplices in $\operatorname{Sd} K$, for every $d \ge 0$.
   (ii) Prove that the Euler characteristics of $K$ and $\operatorname{Sd} K$ are the same.

5. Connectivity of voids. A void of a space-filling diagram is by definition connected but can have handles and islands.
   (i) How would you define the Betti numbers of a void?
   (ii) Following your definition, can the Euler characteristic of a void be any integer or are there restrictions?

6. Paired parentheses. Consider the sequence of parentheses of a well-formed expression. A pairing is a perfect matching between the opening and closing parentheses such that the opening parenthesis precedes the closing parenthesis in every pair. Each parenthesis has an integer position in the sequence, and the length of a pair is the position of the closing minus the position of the opening parenthesis.
   (i) Given a pairing, let $L$ be the sum of lengths of the pairs. Prove that … .
   (ii) Prove that $L$ depends on the given sequence but not on the pairing.

7. Sperner's Lemma. Let $abc$ be a triangle and $K$ a triangulation of $abc$. The label of a vertex in $K$ that lies on the edge $xy$ is either $x$ or $y$, for every edge $xy$ of $abc$, and the label of a vertex in the interior is either $a$, $b$ or $c$.
   (i) Prove that there exists at least one triangle in $K$ whose vertices have three different labels.
   (ii) Strengthen the result in (i) by proving that the number of triangles with three different labels is odd.
   (iii) What would be a natural generalization of these results from a triangle to a tetrahedron?

8. 2-manifolds. Recall that a 2-manifold is a topological space in which every point has an open neighborhood homeomorphic to $\mathbb{R}^2$.
   (i) Show that a two-dimensional simplicial complex in which every edge belongs to exactly two triangles is not necessarily a 2-manifold.
   (ii) Show that a simplicial complex in which the closed star of every vertex is the triangulation of a disk is necessarily a 2-manifold.
Chapter VI

Density Maps

Morse theory grew out of the study of the variational


methods in analysis. The initial interest focused on high-
and possibly infinite-dimensional settings. In this chapter,
we introduce Morse theory with an emphasis on the two-
and three-dimensional cases. Possibly the best known re-
sult in Morse theory is the relation between the critical
points of a smooth real-valued function over a manifold
and the Euler characteristic of that manifold. Because of
this relation, Morse theory is sometimes also referred to as
critical point theory.
We use two sections to introduce the basic setting of
Morse theory and one to explain the concept of molecu-
lar pockets in Morse theoretic terms. In the second sec-
tion, we make an effort to relate the Morse theoretic con-
cepts with the discussion on connectivity. While Morse
theory requires differentiable spaces and thus seems to
be built on rather specialized assumptions, we will see
that many themes are familiar from Chapter IV. In some
ways, Morse theory is but a different language or frame-
work to talk about connectivity. The differentiability as-
sumption allows the introduction of otherwise undefined
concepts. Together with suitable non-degeneracy assump-
tions, it brings order into the complicated world of ge-
ometric form. [The material will have to be partially re-
arranged according to the following plan of sections:]

VI.1 Morse Functions


VI.2 Critical Points
VI.3 Morse-Smale Complexes
VI.4 Jacobian Submanifolds
Exercises


VI.1 Smooth vs. Piecewise Linear

A Morse function is a smooth real-valued map over a manifold that satisfies certain non-degeneracy assumptions. This section introduces Morse functions as a crucial piece in the basic mathematical framework of Morse theory.

Sweeping a torus. Morse theory talks about manifolds and smooth functions over these manifolds. The primary goal is to find out about the topological type of the manifolds through a differential analysis of the functions. The standard introductory example is the torus $\mathbb{T}$ embedded in upright position in $\mathbb{R}^3$ and the height function this embedding defines. Formally, $f: \mathbb{T} \to \mathbb{R}$ maps each point $x \in \mathbb{T}$ to its distance from the $x_1x_2$-plane. For each $t \in \mathbb{R}$, we consider the set of points with height less than or equal to $t$,
$$\mathbb{T}_t = \{x \in \mathbb{T} \mid f(x) \le t\}.$$
As illustrated in Figure VI.1, $\mathbb{T}_t$ changes its topology only at certain critical values of $t$.

Figure VI.1: Evolution of the torus in the sweep from bottom to top and the corresponding construction by attaching a 0-cell, two 1-cells, and a 2-cell.

It is instructive to look at the evolution of the homotopy type of $\mathbb{T}_t$. A $k$-cell, $e^k$, is a space homeomorphic to the $k$-dimensional ball, $\mathbb{B}^k = \{x \in \mathbb{R}^k \mid \|x\| \le 1\}$. Each time the homotopy type of $\mathbb{T}_t$ changes, we can interpret this event as attaching a cell of some dimension. The evolution of the torus during the sweep and the interpretation of attaching cells is illustrated in Figure VI.1. To define what attaching a cell exactly means, note that the boundary of a $k$-dimensional ball is a $(k-1)$-sphere, $\partial \mathbb{B}^k = \mathbb{S}^{k-1}$. The attachment of $e^k$ to a space $Y$ requires a continuous map $g: \mathbb{S}^{k-1} \to Y$, which we refer to as the gluing map. Then $Y$ with $e^k$ attached by $g$ is the space $Y \cup_g e^k$ obtained by identifying every point $x \in \mathbb{S}^{k-1}$ with $g(x) \in Y$. The interior points of $e^k$ do not belong to $Y$. For $k = 0$ we have empty boundary, $\mathbb{S}^{-1} = \emptyset$, so attaching a point or 0-cell is the same as taking the disjoint union.

Smooth manifolds. In order to relate the topological type to differential properties, we need to restrict ourselves to sets for which such properties are defined. We need some basic definitions from differential geometry to express these restrictions.

A map $f: U \to V$ from an open set $U \subseteq \mathbb{R}^k$ to another open set $V \subseteq \mathbb{R}^l$ is smooth if the partial derivatives of all orders exist and are continuous. For general and not necessarily open sets $X \subseteq \mathbb{R}^k$ and $Y \subseteq \mathbb{R}^l$, the map $f: X \to Y$ is smooth if for every $x \in X$ there exists an open set $U \subseteq \mathbb{R}^k$ containing $x$ and a smooth map $F: U \to \mathbb{R}^l$ that coincides with $f$ throughout $U \cap X$. Note that the composition of two smooth maps is smooth. A diffeomorphism is a smooth homeomorphism $f: X \to Y$ whose inverse is also smooth, and two spaces are diffeomorphic if there is a diffeomorphism between them. A subset $\mathbb{M} \subseteq \mathbb{R}^k$ is a smooth manifold of dimension $d$ if each $x \in \mathbb{M}$ has a neighborhood $W \cap \mathbb{M}$, with $W$ open in $\mathbb{R}^k$, that is diffeomorphic to an open subset $U \subseteq \mathbb{R}^d$. A particular diffeomorphism $\varphi: U \to W \cap \mathbb{M}$ is called a parametrization of $W \cap \mathbb{M}$, and its inverse $\varphi^{-1}: W \cap \mathbb{M} \to U$ is called a coordinate system on $W \cap \mathbb{M}$. As an example we may consider the 2-sphere $\mathbb{S}^2 = \{x \in \mathbb{R}^3 \mid \|x\| = 1\}$. We can cover $\mathbb{S}^2$ with six open hemispheres defined by $x_i > 0$ and $x_i < 0$ for $1 \le i \le 3$. As shown in Figure VI.2, each hemisphere can be parametrized by orthogonal projection to one of the coordinate planes.

Figure VI.2: The upper open hemisphere is parametrized by projection to the $x_1x_2$-plane.

For a point $x \in \mathbb{M}$, we can construct the $d$-dimensional hyperplane in $\mathbb{R}^k$ that best approximates $\mathbb{M}$ near $x$. The tangent space at $x$ is the $d$-dimensional hyperplane $T_x\mathbb{M}$ through the origin of $\mathbb{R}^k$ that is parallel to this best approximating hyperplane. The elements of the vector space $T_x\mathbb{M}$ are called tangent vectors to $\mathbb{M}$ at
$x$. Note that for every smooth curve $\gamma: (-1, 1) \to \mathbb{M}$ with $\gamma(0) = x$, the velocity vector $\gamma'(0)$ is a tangent vector and thus an element of $T_x\mathbb{M}$.

Critical points. The homotopy type of the partial torus $\mathbb{T}_t$ changes when $t$ passes the height values of the four points marked in Figure VI.1. These are the points with horizontal tangent planes. Assuming a local coordinate system in a neighborhood, a point $x$ is a critical point of $f$ if all derivatives vanish,
$$\frac{\partial f}{\partial x_1}(x) = \frac{\partial f}{\partial x_2}(x) = \cdots = \frac{\partial f}{\partial x_d}(x) = 0.$$
If $x$ is a critical point then $f(x)$ is a critical value. Non-critical points and non-critical values are also referred to as regular points and regular values.

Just like the first derivative can be used to compute the best linear approximation to $f$, the second derivative can be used to compute the best quadratic approximation. Specifically, the Hessian of $f$ at $x$ is the matrix of second derivatives,
$$H(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_d}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_d \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_d \partial x_d}(x) \end{pmatrix}.$$
A critical point $x$ is non-degenerate if $H(x)$ is non-singular, that is, $\det H(x) \ne 0$. Non-degenerate critical points are isolated, which means there is an open neighborhood without other critical points. We call $f$ a Morse function if all critical points are non-degenerate.

A quadratic function in two variables has only three types of critical points: maxima, saddles, and minima. The origin is a critical point of $f(x_1, x_2) = \pm x_1^2 \pm x_2^2$ for every possible assignment of signs, and it is a maximum for $-x_1^2 - x_2^2$, a saddle for $-x_1^2 + x_2^2$ or $x_1^2 - x_2^2$, and a minimum for $x_1^2 + x_2^2$. The saddle is the most interesting case of the three because a small circle drawn around it has two peaks alternating with two pits. In contrast, a small circle drawn around a regular point has only one peak and one pit. Critical points with small circles that oscillate more often than twice are necessarily degenerate.

Index. The Hessian is symmetric and we can compute its eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_d$, where $d$ is the dimension of the manifold $\mathbb{M}$. Assuming the Hessian is non-singular, all eigenvalues are non-zero. The index of $f$ at a non-degenerate critical point $x$ is the number of negative eigenvalues and is denoted as $\operatorname{index}(x)$. Recall that the eigenvectors define an orthogonal coordinate system in the tangent space of $x$. The index is then the number of eigenvector directions along which $f$ decreases. For example, the indices of the four critical points in Figure VI.1 are, from bottom to top, 0, 1, 1, and 2. This fact is also expressed in the lemma of Morse.

MORSE LEMMA. Let $p$ be a non-degenerate critical point with index $i$ of $f: \mathbb{M} \to \mathbb{R}$. There is a neighborhood $N$ of $p$ and a local coordinate system $x_1, x_2, \ldots, x_d$ in $N$ with $x_j(p) = 0$ for all $j$ and
$$f(x) = f(p) - x_1^2 - \cdots - x_i^2 + x_{i+1}^2 + \cdots + x_d^2$$
throughout $N$.

Note that the dimensions of the cells attached to the evolving torus in Figure VI.1 are equal to the indices of the corresponding critical points. This is generally the case because a critical point with index $i$ connects to the past along $i$ directions. These directions span the $i$-dimensional cell needed to realize the connections.

Figure VI.3: Graphs of three functions of one variable, with critical points marked. The middle function has a degenerate critical point at 0, which is unfolded in different ways by the other two functions.

Degenerate critical points. A compact connected 1-dimensional manifold is a closed curve, and a connected open proper subset of such a curve is an open interval, which is homeomorphic to $\mathbb{R}$. Consider the height function $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$. The derivative vanishes at 0. The second derivative vanishes too, $f''(0) = 0$, which identifies 0 as a degenerate critical point. Geometrically, the degeneracy is manifested by the fact that an arbitrarily small perturbation can remove the critical point or turn it into two non-degenerate ones, a maximum and a minimum. Figure VI.3 illustrates the instability of the degenerate critical point. A similar degenerate critical point exists for the monkey saddle shown in Figure VI.4. It may be specified as the graph
of $f(x_1, x_2) = x_1^3 - 3x_1x_2^2$, which is the real part of $(x_1 + \mathrm{i}\,x_2)^3$. As we go around a circle centered at the origin, the function has three peaks at angles $0$, $2\pi/3$, and $4\pi/3$, and three pits at angles $\pi/3$, $\pi$, and $5\pi/3$. The only critical point is the origin. The matrix of second derivatives is
$$H(x) = \begin{pmatrix} 6x_1 & -6x_2 \\ -6x_2 & -6x_1 \end{pmatrix},$$
which is zero at the origin.

Figure VI.4: Monkey saddle with degenerate critical point.

All critical points in the above examples are isolated, but there are others that are not. For example, for $f(x_1, x_2) = x_1^2$ the entire $x_2$-axis is critical, but none of its points are isolated. Similarly, if we lay down the torus on its side, the height function has a circle of minima and another circle of maxima.

Euler characteristic. Let $\mathbb{M}$ be a compact and smooth manifold without boundary and $f: \mathbb{M} \to \mathbb{R}$ a Morse function. We will see in Section VI.2 that we can construct an $i$-cell for each index-$i$ critical point so that $\mathbb{M}$ can be constructed by successive attachment of these cells. Let $c_i$ be the number of critical points of index $i$. As always, the Euler characteristic is the alternating sum of cells, which is also the alternating sum of critical points,
$$\chi(\mathbb{M}) = \sum_{i \ge 0} (-1)^i c_i.$$
For example, for the torus we get $c_0 - c_1 + c_2 = 0$. In words, for every minimum and maximum we get exactly one (non-degenerate) saddle point, no matter what Morse function we use. For the sphere we get $c_0 - c_1 + c_2 = 2$. This implies that every Morse function of the sphere has at least two (non-degenerate) critical points. An example achieving this minimum is the ordinary height function, which has a minimum at the south-pole and a maximum at the north-pole.

Bibliographic notes. The original development of Morse theory from its variational background is described by Morse [3] and by Seifert and Threlfall [4]. Milnor's later book [2] emphasizes the topological analysis of manifolds and has since become a standard reference in Morse theory. Good introductory texts to the related subject of differential topology are the books by Guillemin and Pollack [1] and by Wallace [6]. A good introduction to linear algebra including an intuitive discussion of eigenvalues and eigenvectors is the book by Strang [5].

[1] V. Guillemin and A. Pollack. Differential Topology. Prentice-Hall, Englewood Cliffs, New Jersey, 1974.

[2] J. Milnor. Morse Theory. Princeton Univ. Press, New Jersey, 1963.

[3] M. Morse. The Calculus of Variations in the Large. Amer. Math. Soc., New York, 1934.

[4] H. Seifert and W. Threlfall. Variationsrechnung im Großen. Published in the United States by Chelsea, New York, 1951.

[5] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, Wellesley, Massachusetts, 1993.

[6] A. Wallace. Differential Topology. First Steps. Benjamin, New York, 1968.


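As a small numerical companion to the index and to the Euler characteristic formula of Section VI.1, the following Python sketch classifies a critical point of a function of two variables from its symmetric $2 \times 2$ Hessian, using the closed-form eigenvalues, and evaluates $\sum_i (-1)^i c_i$ from counts of critical points. The function names and the tolerance `eps` are assumptions of this sketch; restricting it to two variables keeps it free of linear-algebra libraries.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smaller first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def classify(hessian, eps=1e-12):
    """Return (kind, index) for a critical point with the given symmetric
    2x2 Hessian, or ('degenerate', None) if an eigenvalue is (near) zero."""
    (a, b), (b2, c) = hessian
    if abs(b - b2) > eps:
        raise ValueError("Hessian must be symmetric")
    lams = eigenvalues_2x2(a, b, c)
    if any(abs(lam) <= eps for lam in lams):
        return "degenerate", None
    index = sum(1 for lam in lams if lam < 0)  # number of negative eigenvalues
    return ("minimum", "saddle", "maximum")[index], index


def euler_characteristic(counts):
    """Alternating sum of critical points, where counts[i] = c_i."""
    return sum((-1) ** i * c for i, c in enumerate(counts))
```

The Hessian of $x_1^2 - x_2^2$ is `[[2, 0], [0, -2]]`, which `classify` reports as `("saddle", 1)`, while the monkey saddle's Hessian vanishes at the origin and is reported as degenerate. With the counts `[1, 2, 1]` of the upright torus, `euler_characteristic` returns 0.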
VI.2 Morse-Smale Complexes

In this section, we introduce the gradient of a Morse function and use it to construct the $i$-cells whose inductive attachment reproduces the evolution of the homotopy type of $\mathbb{M}_t = \{x \in \mathbb{M} \mid f(x) \le t\}$, for continuously increasing real threshold $t$.

Gradient flow. The gradient of a linear map $f(x) = a_1x_1 + a_2x_2 + \cdots + a_dx_d$ is the vector $\nabla f = (a_1, a_2, \ldots, a_d)$. It is the projection of a normal vector of the graph of $f$ and points in the direction of the steepest ascent. The same concept can also be defined for a Morse function $f: \mathbb{M} \to \mathbb{R}$. Assuming an orthonormal local coordinate system at $x$, the gradient of $f$ is
$$\nabla f(x) = \left(\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \ldots, \frac{\partial f}{\partial x_d}(x)\right),$$
same as for linear maps. We can define it also without reference to a coordinate system. A vector field, $X$, maps every point $x \in \mathbb{M}$ to a tangent vector $X(x) \in T_x\mathbb{M}$. The gradient is the particular vector field that satisfies $\langle \nabla f, X \rangle = D_X f$ for every vector field $X$, where $D_X f$ is the directional derivative of $f$ along $X$. For example, if we have a smooth curve $\gamma: \mathbb{R} \to \mathbb{M}$ with velocity vector $\gamma'(s)$ then the derivative of $f \circ \gamma$ can be computed using the gradient as
$$\frac{d}{ds} f(\gamma(s)) = \langle \nabla f(\gamma(s)), \gamma'(s) \rangle.$$
The gradient vanishes precisely at the critical points of $f$. If we start at a regular point and follow the gradient we trace out a path, which is a solution to the ordinary differential equation $\gamma'(s) = \nabla f(\gamma(s))$. This path is called an integral line. It depends smoothly on the initial condition, which is its regular starting point. Two integral lines can therefore not cross. Neither can an integral line fork, and because we can reverse the gradient vector field by considering $-f$, two integral lines can also not merge. The patterns of integral lines in the neighborhoods of a regular and several critical points on a smooth 2-manifold are shown in Figure VI.5.

Figure VI.5: From left to right, the flow in the neighborhoods of a regular point, a minimum, a saddle, and a maximum.

Stable manifolds. Every regular point belongs to an integral line, and two maximal integral lines are either disjoint or the same. Every maximal integral line is open at both ends and thus a map of an open interval or, equivalently, of the real line, $\gamma: \mathbb{R} \to \mathbb{M}$. It approaches two critical points, which we refer to as its origin and destination,
$$\operatorname{org}\gamma = \lim_{s \to -\infty} \gamma(s) \quad\text{and}\quad \operatorname{dest}\gamma = \lim_{s \to \infty} \gamma(s).$$
It is convenient to consider each critical point as an integral line by itself so that the collection of integral lines partitions $\mathbb{M}$. The stable manifold of a critical point $p$ is the union of integral lines with destination $p$ and, symmetrically, the unstable manifold is the union of integral lines with origin $p$,
$$S(p) = \bigcup_{\operatorname{dest}\gamma = p} \gamma \quad\text{and}\quad U(p) = \bigcup_{\operatorname{org}\gamma = p} \gamma.$$
The stable manifold of a minimum is the minimum itself. The stable manifold of a saddle is an open curve, which is the union of two integral lines and the saddle itself. In a 2-manifold $\mathbb{M}$, the stable manifold of a maximum is an open disk, which is the union of a circle of integral lines and the maximum itself. All three cases are illustrated in Figure VI.6. Note that the dimension of each stable manifold is the index of the critical point that defines it, $\dim S(p) = \operatorname{index}(p)$.

Figure VI.6: From left to right, the stable manifold of a minimum, a saddle, and a maximum of a two-dimensional Morse function.

Each stable manifold is the injective image of an open ball. However, as indicated by the examples in Figure VI.6, the closure of a stable manifold is not necessarily homeomorphic to a closed ball. Nevertheless, the closure of each stable manifold is the union of (open) stable manifolds. The collection of stable manifolds thus satisfies the two conditions of an open complex: its cells partition $\mathbb{M}$ and the boundary of every cell is a union of other cells. By symmetry, everything we said about stable manifolds is also true for unstable manifolds. The dimension of the unstable manifold of a critical point $p$ is the co-dimension of the stable manifold, $\dim U(p) = d - \dim S(p)$.
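Integral lines and their destinations can be approximated numerically by following the gradient in small steps. The Python sketch below uses the explicit Euler method, a choice made for simplicity rather than a method prescribed here, on the hypothetical example $f(x_1, x_2) = -(x_1^2 - 1)^2 - x_2^2$. This function has maxima at $(\pm 1, 0)$ and a saddle at the origin, and the $x_2$-axis is the stable manifold of the saddle. Starting points on either side of that axis flow to different maxima, illustrating how the stable manifolds of the two maxima, two open half-planes, partition the plane together with the stable manifold of the saddle.

```python
def grad_f(x1, x2):
    """Gradient of the example function f(x1, x2) = -(x1**2 - 1)**2 - x2**2."""
    return -4.0 * x1 * (x1 * x1 - 1.0), -2.0 * x2


def destination(x1, x2, step=0.01, tol=1e-10, max_steps=1_000_000):
    """Follow the gradient from (x1, x2) with explicit Euler steps until the
    gradient norm drops below tol; return the approximate destination."""
    for _ in range(max_steps):
        g1, g2 = grad_f(x1, x2)
        if g1 * g1 + g2 * g2 < tol * tol:  # (numerically) at a critical point
            break
        x1, x2 = x1 + step * g1, x2 + step * g2
    return x1, x2
```

Starting at `(0.5, 0.3)` the walk ends near the maximum `(1, 0)`, starting at `(-0.2, 0.5)` it ends near `(-1, 0)`, and starting exactly on the $x_2$-axis, for instance at `(0.0, 0.7)`, it ends at the saddle. A fixed step size is only adequate because this gradient is tame; near steep regions a smaller or adaptive step would be needed.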
Morse-Smale functions. We may refine the complexes of stable and unstable manifolds by forming unions of integral lines that agree on both limiting critical points. This amounts to overlaying the two complexes. In doing so, it is convenient to assume that the stable and unstable manifolds intersect in a generic manner. To explain what this means, we consider a point $x$ common to $S(p)$ and $U(q)$. The intersection is transversal at $x$ if the tangent spaces $T_xS(p)$ and $T_xU(q)$ span the tangent space $T_x\mathbb{M}$. Equivalently, the dimension of the intersection of the two tangent spaces is
$$\dim\left(T_xS(p) \cap T_xU(q)\right) = \dim S(p) + \dim U(q) - d.$$
A Morse-Smale function is a Morse function $f: \mathbb{M} \to \mathbb{R}$ whose stable and unstable manifolds intersect only transversally. For example, the height function of the upright torus in Figure VI.1 is Morse but not Morse-Smale because the stable 1-manifold of the upper saddle meets the unstable 1-manifold of the lower saddle along entire one-dimensional integral lines, whereas transversality would require an intersection of dimension $1 + 1 - 2 = 0$. Morse-Smale functions are again dense in the set of maps from $\mathbb{M}$ to $\mathbb{R}$. In the case of the upright torus, it suffices to tilt it ever so slightly sideways in order to get transversality. Assuming a Morse-Smale function, we define the Morse-Smale complex as the collection of connected components of intersections of stable and unstable manifolds. We can see in Figure VI.7 that it is indeed necessary to take components.

Figure VI.7: Solid stable and dashed unstable 1-manifolds with overlaid dotted iso-lines of a rectangular portion of a Morse-Smale function. The two bold 2-cells share the same origin and destination.

Shape of Morse-Smale cells. Note that all 2-cells in Figure VI.7 have four sides, provided we count an arc twice if it bounds the cell on both sides. In other words, all two-dimensional Morse-Smale cells are quadrangles.

QUADRANGLE LEMMA. Every 2-cell of a two-dimensional Morse-Smale complex is a quadrangle.

PROOF. The vertices of a 2-cell alternate between saddles and other critical points, and the non-saddles alternate between minima and maxima. Any such cyclic sequence has length $4k$, for $k \ge 1$. We take two copies of a $4k$-gon and glue them together along the shared boundary. Saddles become regular points, minima remain minima, and maxima remain maxima. The result is a topological 2-sphere with $k$ minima and $k$ maxima. The Euler characteristic of the 2-sphere is $2 = k + k$, which implies $k = 1$.

The 3-cells of a Morse-Smale complex may have the structure of a cube, but they can also assume more general shapes with arbitrarily many saddles alternating between index-1 and index-2 separating the minimum from the maximum. The common features of all 3-cells are that they have one minimum and one maximum, and all 2-cells in the boundary are quadrangles. A few examples of 3-cells are shown in Figure VI.8.

Figure VI.8: Three 3-cells of a three-dimensional Morse-Smale complex. From left to right they have one, two, and three index-1 saddles and the same number of index-2 saddles.

Piecewise linear height functions. Height functions over manifolds occur in many practical problems, but they are rarely smooth in the mathematical sense of the word. An example is a surface $\mathbb{M}$ of a molecule model and the electrostatic potential on this surface. The surface would typically be given as a triangulating simplicial complex $K$, as shown in Figure VI.9, and the function would be specified by its values at the vertices. Using linear interpolation, we can extend these values to a continuous function over the entire surface.

We need some definitions to explain the linear interpolation. Each point $x$ of a triangle $abc$ is a convex combination of the three vertices, $x = \lambda_0 a + \lambda_1 b + \lambda_2 c$ with
$\lambda_0 + \lambda_1 + \lambda_2 = 1$ and $\lambda_0, \lambda_1, \lambda_2 \geq 0$. The three parameters are unique and referred to as the barycentric coordinates of $x$. The value at $x$ is now defined as the analogous combination of values at the vertices,

$$f(x) = \lambda_0 f(a) + \lambda_1 f(b) + \lambda_2 f(c).$$

Note that the barycentric coordinates of the vertex $a$ of $abc$ are $(1, 0, 0)$, which implies that the linearly interpolated $f(a)$ agrees with the value specified at $a$. Furthermore, for points $x$ along the edge $ab$ we have $\lambda_2 = 0$, so $f(x)$ depends only on the values at $a$ and $b$. The values computed for $x$ within the two triangles that share $ab$ thus agree, which implies that $f$ is continuous.

Figure VI.9: Portion of a triangulated surface of a molecule.

Lower stars. The height function $f: \mathbb{M} \to \mathbb{R}$ is continuous but not smooth. It still shares many characteristics with Morse functions. Define the star of a vertex $u$ as the collection of simplices that contain $u$, and the lower star as the subset for which $u$ is the highest vertex,

$$\mathrm{St}\,u = \{\sigma \in K \mid u \in \sigma\},$$
$$\mathrm{St}_{-}u = \{\sigma \in \mathrm{St}\,u \mid x \in \sigma \Rightarrow f(x) \leq f(u)\}.$$

It is convenient to assume pairwise different height values at all vertices so that each simplex belongs to exactly one lower star. With this assumption, the lower stars partition the complex $K$. Figure VI.10 illustrates the definitions by showing the lower stars of vertices that behave like regular points, minima, saddles, and maxima. More complicated lower stars are possible, and we cannot remove them just by perturbing the height values. Instead, we may consider a vertex whose circle of neighbors alternates $2k + 2$ times between lower and higher values of $f$ as a $k$-fold saddle. This interpretation is consistent with the result that the alternating sum of critical points is equal to the Euler characteristic of $\mathbb{M}$. The alternating sums of simplices in the lower stars of a regular point, minimum, saddle, maximum, and $k$-fold saddle are $0$, $1$, $-1$, $1$, and $-k$. It follows immediately that the Euler characteristic $\chi(\mathbb{M})$ is the number of minima and maxima minus the number of saddles counted with multiplicity.

Figure VI.10: The star of every vertex in the triangulation of a 2-manifold is an open disk. The shaded portions are lower stars, shown for a regular point, a minimum, a saddle, and a maximum.

Another similarity between smooth and piecewise linear height functions arises when we sweep in the direction of increasing height. Assuming $f(u_i) \neq f(u_j)$ for all $i \neq j$, we sort the vertices in the order of increasing height. Indexing the vertices accordingly, we define $K_i$ as the union of the first $i$ lower stars and note that $K_i$ is a simplicial complex. The sequence of complexes

$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n = K$$

is a filtration and a discrete version of the evolution of the sublevel set $f^{-1}(-\infty, t]$ during the sweep. Adding the lower star of a regular point does not change the homotopy type of $K_i$, and adding the lower star of a critical point is similar to attaching a cell in the smooth case.

Bibliographic notes. The gradient and related concepts from vector calculus are intuitively described in the booklet by Schey [3]. The transversality condition for stable and unstable manifolds has its origin in dynamical systems and is named after Steve Smale [4]. The Morse-Smale complex was introduced recently in [2] along with algorithms for piecewise linear height functions over 2-manifolds. The idea of writing a triangulated manifold as the disjoint union of lower stars goes back to Banchoff [1].

[1] T. F. BANCHOFF. Critical points and curvature for embedded polyhedra. J. Differential Geometry 1 (1967), 245–256.

[2] H. EDELSBRUNNER, J. HARER AND A. ZOMORODIAN. Hierarchy of Morse-Smale complexes for piecewise linear 2-manifolds. Discrete Comput. Geom., to appear.

[3] H. M. SCHEY. Div, Grad, Curl and All That. An Informal Text on Vector Calculus. Second edition, Norton, New York, 1992.

[4] S. SMALE. The Mathematics of Time. Essays on Dynamical Systems, Economic Processes, and Related Topics. Springer-Verlag, New York, 1980.
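The piecewise linear notions of Section VI.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm of [2]: triangles are given by planar $(x, y)$ coordinates rather than as part of a surface in $\mathbb{R}^3$, the link of a vertex $u$ is given as the cyclic list of its neighbors' height values (all different from $f(u)$), and the function names are hypothetical. The code computes barycentric coordinates and the interpolated value, the alternating sum of simplices in a lower star, and the vertex type from the number of arcs in the lower link.

```python
def barycentric(x, a, b, c):
    """Return (l0, l1, l2) with x = l0*a + l1*b + l2*c and l0 + l1 + l2 = 1.

    Points are planar (x, y) pairs; the 2-by-2 system is solved by Cramer's rule.
    """
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    l1 = ((x[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (x[1] - a[1])) / det
    l2 = ((b[0] - a[0]) * (x[1] - a[1]) - (x[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - l1 - l2, l1, l2


def interpolate(x, triangle, values):
    """f(x) = l0 f(a) + l1 f(b) + l2 f(c) for a point x in triangle abc."""
    lams = barycentric(x, *triangle)
    return sum(lam * v for lam, v in zip(lams, values))


def lower_star_euler(f_u, link):
    """Alternating sum of simplices in the lower star of a vertex u.

    `link` lists the heights of u's neighbors in cyclic order. The lower star
    holds u, one edge per lower neighbor, and one triangle per pair of
    cyclically consecutive lower neighbors.
    """
    n = len(link)
    lower = [f_w < f_u for f_w in link]
    edges = sum(lower)
    triangles = sum(1 for i in range(n) if lower[i] and lower[(i + 1) % n])
    return 1 - edges + triangles


def classify(f_u, link):
    """Name the vertex type from the number of arcs in its lower link."""
    lower = [f_w < f_u for f_w in link]
    if not any(lower):
        return "minimum"
    if all(lower):
        return "maximum"
    # A new arc starts wherever a lower neighbor follows a higher one;
    # lower[i - 1] wraps around to the last neighbor when i == 0.
    arcs = sum(1 for i in range(len(lower)) if lower[i] and not lower[i - 1])
    if arcs == 1:
        return "regular"
    return "saddle" if arcs == 2 else f"{arcs - 1}-fold saddle"


# Neighbors alternate 6 = 2k + 2 times between lower and higher, so k = 2.
print(classify(5, [1, 6, 2, 7, 3, 8]), lower_star_euler(5, [1, 6, 2, 7, 3, 8]))  # 2-fold saddle -2
```

The arc count and the alternating sum are tied together: a lower link with $m \geq 1$ arcs that is not the whole circle gives $1 - m$, which reproduces the values $0$, $-1$, and $-k$ listed above for regular points, saddles, and $k$-fold saddles.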

VI.3 Construction and Simplification

[Explain the sweep construction for two-dimensional Morse-Smale complexes using the simulation of differentiability.] [The most important part of the algorithm is maybe the handle slide, which is the only restructuring operation necessary to go between different complexes.] [That operation has been used in early work on Morse theory, maybe the first time by Smale(?).]
[Build a hierarchy through prioritized cancellation.] [We can describe the cancellation as a combinatorial restructuring operation and we only need this one to go up the hierarchy.] [Again, there should be reference to the early mathematics literature on the topic of cancellation.]

Bibliographic notes.

[1] C. L. BAJAJ, V. PASCUCCI AND D. R. SCHIKORE. Visualization of scalar topology for structural enhancement. In “Proc. 9th Ann. IEEE Conf. Visualization, 1998”, 18–23.

[2] H. EDELSBRUNNER, J. HARER AND A. ZOMORODIAN. Hierarchy of Morse-Smale complexes for piecewise linear 2-manifolds. Discrete Comput. Geom., to appear.

[3] H. EDELSBRUNNER, J. HARER, V. NATARAJAN AND V. PASCUCCI. Hierarchy of Morse-Smale complexes for piecewise linear 3-manifolds. Manuscript, Dept. Comput. Sci., Duke Univ., Durham, North Carolina, 2001.

[4] M. VAN KREFELD, R. VAN OOSTRUM, C. L. BAJAJ, V. PASCUCCI AND D. R. SCHIKORE. Contour trees and small seed sets for iso-surface traversal. In “Proc. 13th Ann. Sympos. Comput. Geom., 1997”, 212–220.

VI.4 Simultaneous Critical Points

[Explain the work with John on the topic and mention papers by Hassler Whitney and books in Catastrophe Theory.]

Bibliographic notes.

[1] V. I. ARNOL’D. Catastrophe Theory. Third edition, Springer-Verlag, Berlin, Germany, 1992.

[2] H. EDELSBRUNNER AND J. HARER. Jacobian submanifolds of multiple Morse functions. Manuscript, Duke Univ., Durham, North Carolina, 2002.

[3] T. POSTON AND I. STEWART. Catastrophe Theory and Its Applications. Dover, Mineola, New York, 1978.

Exercises

The credit assignment reflects a subjective assessment of difficulty. Every question can be answered using the material presented in this chapter.

1. Section of triangulation. (2 credits). Let $K$ be a triangulation of a set of $n$ points in the plane. Let $L$ be a line that avoids all points. Prove that $L$ intersects at most $2n - 4$ edges of $K$ and that this upper bound is tight for every $n \geq 3$.
Chapter VII

Match and Fit

As a general theme in biology, questions are almost always about populations and rarely about individuals. This is particularly true on the molecular level. The molecules that participate in the mechanism of life tend to be large and composed of small molecules. Minor variations in the type or arrangement of the components are frequently inessential and do not alter the role of a molecule within the larger organization. But then again, there are seemingly small variations that do have significant consequences. The underlying question is one of definition: when do we call two molecules the same or of the same type, and how do we quantify and assess that notion of sameness? There are various approaches to the question applied to proteins, including the comparison of amino acid sequences, space curves modeling backbones, and shapes formed by space-filling diagrams. Instead of asking how similar two shapes are, we may also ask the related question of how well two shapes fit side by side. The complementarity question is a similarity question between one shape and (a portion of) the complement of another shape. It really makes sense only for space-filling diagrams and does not seem to apply to information expressed in terms of sequences and space curves. The similarity question is at the core of human understanding, which crucially relies on classification to simplify and create order. The complementarity question, on the other hand, is at the root of natural and other reproduction processes, and it takes part in protein interaction, which forms the basis of functioning life.

As always in this book, we focus on mathematical and algorithmic methods that shed light on the broader biological issues. In Section VII.1, we explore rigid motions in three-dimensional Euclidean space and introduce quaternions as a tool to specify and compute with rotations. In Section VII.2, we study the problem of finding the best rigid motion for matching one point set with another. The measure of choice is the root mean square distance between the two sets. In Section VII.3, we look at the related problems of sampling a rigid motion and of covering the space of such motions with small neighborhoods. In Section VII.4, we apply the methods to questions of similarity and complementarity. In particular, we look at the problem of identifying matching subsequences with minimum root mean square distance and at score functions that assess the shape complementarity of two space-filling diagrams.

VII.1 Rigid Motions
VII.2 Optimum Motion
VII.3 Sampling and Covering
VII.4 Alignment
Exercises

VII.1 Rigid Motions

A motion in three-dimensional Euclidean space can be decomposed into a rotation and a translation. In this section we consider different ways to mathematically represent rotations, and we focus on quaternions, which provide a particularly elegant mathematical framework.

Rotation and translation. A rigid motion in $\mathbb{R}^3$ is an orientation-preserving isometry of three-dimensional Euclidean space. More formally, it is an orientation-preserving map $\mu: \mathbb{R}^3 \to \mathbb{R}^3$ such that

$$\|\mu(x) - \mu(y)\| = \|x - y\|$$

for every pair $x, y \in \mathbb{R}^3$. As illustrated in Figure VII.1, a rotation is a rigid motion that preserves the origin, and a translation is a rigid motion that preserves difference vectors. Every rigid motion can be written as the composition of a rotation and a translation: $\mu = \tau \circ \rho$. Using matrix notation, we can write $\mu(x) = Rx + t$, where $R$ is an orthonormal 3-by-3 matrix with unit determinant and $t$ is a 3-vector:

$$R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}, \qquad t = \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix}.$$

The rotation matrix moves the unit coordinate vectors to the three vectors that make up the columns of $R$. A rotation about a coordinate axis has a comparatively simple rotation matrix. For example, rotating by the angle $\varphi$ about the $x_3$-axis gives

$$R = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The angle of rotation about a coordinate axis is referred to as an Euler angle. Leonhard Euler proved that any rotation can be obtained by a sequence of three rotations about coordinate axes. In general, the composition of any two rotations is another rotation. Indeed, the rotations form the so-called special orthogonal group of 3-by-3 matrices, abbreviated as $\mathrm{SO}(3)$. Note, however, that this group is not abelian, because the multiplication of matrices, and therefore the composition of rotations, is not commutative. It is important to specify the Euler angles in a fixed sequence, as other sequences of the same angles usually specify different rotations. It is mostly true that two different triplets of angles specify different rotations, but there are exceptions. Consider for example a rotation by $\varphi$ about the $x_3$-axis, followed by a rotation by $\pi/2$ about the $x_2$-axis, followed by a rotation by $\psi$ about the $x_1$-axis, and note that we get the same composite rotation if we switch $\varphi$ and $\psi$. In other words, the map $\mathbb{S}^1 \times \mathbb{S}^1 \times \mathbb{S}^1 \to \mathrm{SO}(3)$ is not injective. This suggests that the Cartesian product of three circles is not an appropriate model, and we will indeed see shortly that $\mathbb{S}^1 \times \mathbb{S}^1 \times \mathbb{S}^1$ is not homeomorphic to the space of rotations.

Figure VII.1: The translation of the boldface original coordinate system (axes $x_1$, $x_2$, $x_3$) preserves the directions of the axes while the rotation preserves their anchor point.

Quaternions. As an alternative to orthonormal 3-by-3 matrices, we may use quaternions to represent rotations. Quaternions can be viewed as a generalization of complex numbers:

$$a = a_0 + a_1 i + a_2 j + a_3 k,$$

where $a_0$, $a_1$, $a_2$ and $a_3$ are real numbers and $i$, $j$ and $k$ are three different imaginary units. In preparation of an operation that multiplies two quaternions, we specify how to multiply the imaginary units:

$$i^2 = j^2 = k^2 = -1, \qquad ij = k, \quad jk = i, \quad ki = j, \qquad ji = -k, \quad kj = -i, \quad ik = -j.$$

Note that reversing two different imaginary units changes the sign of the result. If $b = b_0 + b_1 i + b_2 j + b_3 k$ is another quaternion then the product of $a$ and $b$ is

$$\begin{aligned} ab = {} & (a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3) + (a_0 b_1 + a_1 b_0 + a_2 b_3 - a_3 b_2)\, i \\ & + (a_0 b_2 - a_1 b_3 + a_2 b_0 + a_3 b_1)\, j + (a_0 b_3 + a_1 b_2 - a_2 b_1 + a_3 b_0)\, k. \end{aligned}$$

The product $ba$ has a similar form but six of the terms have their signs changed.
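The exception to injectivity is easy to confirm numerically. The plain-Python sketch below (helper names are hypothetical) builds right-handed rotation matrices about the coordinate axes and composes a rotation by $\varphi$ about $x_3$, then by $\beta$ about $x_2$, then by $\psi$ about $x_1$. For $\beta = \pi/2$ the composite equals a rotation by $\pi/2$ about $x_2$ followed by a rotation by $\varphi + \psi$ about $x_1$, so swapping $\varphi$ and $\psi$ changes nothing; for a generic $\beta$ the swap gives a different rotation.

```python
import math


def axis_rotation(axis, t):
    """Right-handed rotation by angle t about coordinate axis 1, 2 or 3."""
    c, s = math.cos(t), math.sin(t)
    if axis == 1:
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == 2:
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Product of two 3-by-3 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_rotation(phi, beta, psi):
    """First rotate by phi about x3, then by beta about x2, then by psi about x1."""
    return matmul(axis_rotation(1, psi),
                  matmul(axis_rotation(2, beta), axis_rotation(3, phi)))


def max_difference(p, q):
    return max(abs(p[i][j] - q[i][j]) for i in range(3) for j in range(3))


# beta = pi/2: the two orders agree up to round-off.
print(max_difference(euler_rotation(0.3, math.pi / 2, 1.1),
                     euler_rotation(1.1, math.pi / 2, 0.3)))
# beta = 0.7: the upper-left entries cos(beta)cos(phi) and cos(beta)cos(psi) differ.
print(max_difference(euler_rotation(0.3, 0.7, 1.1),
                     euler_rotation(1.1, 0.7, 0.3)))
```

This degenerate middle angle is the situation often called gimbal lock: only the sum $\varphi + \psi$ is determined by the rotation.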
Sometimes it is more convenient to think of a quaternion as a vector in $\mathbb{R}^4$. We can express the product of two quaternions in terms of an orthogonal 4-by-4 matrix and a vector. This can be done by expanding either the first or the second quaternion to a matrix:

$$ab = \begin{pmatrix} a_0 & -a_1 & -a_2 & -a_3 \\ a_1 & a_0 & -a_3 & a_2 \\ a_2 & a_3 & a_0 & -a_1 \\ a_3 & -a_2 & a_1 & a_0 \end{pmatrix} b = A\, b, \qquad ab = \begin{pmatrix} b_0 & -b_1 & -b_2 & -b_3 \\ b_1 & b_0 & b_3 & -b_2 \\ b_2 & -b_3 & b_0 & b_1 \\ b_3 & b_2 & -b_1 & b_0 \end{pmatrix} a = \bar{B}\, a.$$

Take a moment to verify that the matrices $A$ and $\bar{B}$ are indeed orthogonal, in the sense that their columns are pairwise orthogonal and all have the same length. $\bar{B}$ differs from $B$, the matrix obtained by expanding $b$ as the first factor, by having the lower right 3-by-3 submatrix transposed. While the product of two quaternions is another quaternion, the scalar product is a real number:

$$\langle a, b \rangle = a_0 b_0 + a_1 b_1 + a_2 b_2 + a_3 b_3.$$

As usual, we can use the scalar product to define the length of a vector: $\|a\| = \sqrt{\langle a, a \rangle}$. Similar to complex numbers, the conjugate of a quaternion is obtained by negating the imaginary parts: $a^* = a_0 - a_1 i - a_2 j - a_3 k$. Observe that the matrices associated with $a^*$ are the transposes of those associated with $a$. Since the matrices are orthogonal, the products with their transposes are diagonal: $A A^T = A^T A = \langle a, a \rangle I$, where $I$ is the 4-by-4 identity matrix. Similarly, the imaginary parts vanish when we multiply a quaternion with its conjugate: $a a^* = a^* a = \langle a, a \rangle$. This implies that every non-zero quaternion $a$ has an inverse, namely $a^{-1} = a^* / \langle a, a \rangle$. In the special case when $a$ has unit length, we have $\langle a, a \rangle = 1$, $A A^T = I$, and $a^{-1} = a^*$.

Representing rotations. We use purely imaginary quaternions to represent vectors in $\mathbb{R}^3$ and composite multiplication with unit quaternions to represent rotations. We start with a few properties, always assuming $\|q\| = 1$. First, the scalar product is preserved if we multiply with $q$. This is true from either side and we show it for multiplication from the left:

$$\langle qa, qb \rangle = (Qa)^T (Qb) = a^T Q^T Q\, b = \langle a, b \rangle,$$

because $Q^T Q = I$. This implies in particular that multiplying with $q$ also preserves length: $\|qa\| = \|a\|$. Like a rotation, multiplication with a unit quaternion changes neither the angle nor the length. However, we cannot use simple multiplications to represent rotations, because the product of a unit quaternion and a purely imaginary quaternion is not in general purely imaginary. Instead, we use the composite product $q a q^*$. Observe that

$$q a q^* = (Qa)\, q^* = \bar{Q}^T (Qa) = (\bar{Q}^T Q)\, a,$$

where $Q$ and $\bar{Q}$ are the 4-by-4 matrices that correspond to $q$. We expand the product of the two matrices in Table VII.1 and see that $q a q^*$ is purely imaginary whenever $a$ is. Furthermore, since $Q^T Q = \bar{Q}^T \bar{Q} = I$, both $Q$ and $\bar{Q}$ are orthonormal. It follows that the lower right 3-by-3 submatrix of $\bar{Q}^T Q$ is also orthonormal. This 3-by-3 matrix is the familiar rotation matrix that takes $x$ to $q x q^*$.

The justification for $q x q^*$ to represent a rotation is not yet complete. Another possibility is that it represents a reflection, which also preserves scalar products. However, a reflection reverses the orientation of a sequence of three vectors, and we can check that composite multiplication does not. To do this, we think of a quaternion as composed of a scalar and a vector, $a = (a_0, \vec{a})$. The rules for computing with $i$, $j$, $k$ can be rewritten as

$$ab = (a_0 b_0 - \vec{a} \cdot \vec{b},\; a_0 \vec{b} + b_0 \vec{a} + \vec{a} \times \vec{b}).$$

When $a$ and $b$ are purely imaginary, this simplifies to $ab = (-\vec{a} \cdot \vec{b},\, \vec{a} \times \vec{b})$. If we now apply the composite product with a unit quaternion $q$, we get $q(ab)q^* = (q a q^*)(q b q^*)$, because $q^* q = 1$. Both $q a q^*$ and $q b q^*$ are again purely imaginary, and the composite product leaves real numbers unchanged, so comparing imaginary parts on both sides gives

$$q\, (\vec{a} \times \vec{b})\, q^* = (q a q^*) \times (q b q^*).$$

Hence, the cross-product of the rotated vectors is the result of applying the composite product with the unit quaternion to $\vec{a} \times \vec{b}$, which shows that the composite product preserves cross-products, as required.

Axis and angle. The expansion of $\bar{Q}^T Q$ given in Table VII.1 provides an explicit method for computing the orthonormal rotation matrix from the unit quaternion. In the reverse direction, we show that the rotation by an angle $\theta$ about the axis defined by the unit vector $u = (u_1, u_2, u_3)$ can be represented by the unit quaternion

$$q = \cos\frac{\theta}{2} + \sin\frac{\theta}{2}\, (u_1 i + u_2 j + u_3 k).$$

As illustrated in Figure VII.2, an observer who looks against the direction of the axis sees the vector rotate in counterclockwise order.
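The quaternion rules above translate directly into code. The plain-Python sketch below stores a quaternion as a 4-tuple $(a_0, a_1, a_2, a_3)$; all function names are hypothetical. It implements the product, the conjugate, the two matrix expansions $A$ and $\bar{B}$, the unit quaternion of an axis-angle pair, and the composite product $q x q^*$.

```python
import math

# Quaternions are 4-tuples (a0, a1, a2, a3) standing for a0 + a1 i + a2 j + a3 k.


def qmul(a, b):
    """Product ab from i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def conj(a):
    """Conjugate a* = a0 - a1 i - a2 j - a3 k."""
    return (a[0], -a[1], -a[2], -a[3])


def left_matrix(a):
    """Matrix A with ab = A b (a expanded as the first factor)."""
    a0, a1, a2, a3 = a
    return [[a0, -a1, -a2, -a3], [a1, a0, -a3, a2],
            [a2, a3, a0, -a1], [a3, -a2, a1, a0]]


def right_matrix(b):
    """Matrix B-bar with ab = B-bar a (b expanded as the second factor)."""
    b0, b1, b2, b3 = b
    return [[b0, -b1, -b2, -b3], [b1, b0, b3, -b2],
            [b2, -b3, b0, b1], [b3, b2, -b1, b0]]


def apply(m, v):
    """Multiply a 4-by-4 matrix (list of rows) with a 4-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))


def from_axis_angle(u, theta):
    """cos(theta/2) + sin(theta/2)(u1 i + u2 j + u3 k) for a unit vector u."""
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), s * u[0], s * u[1], s * u[2])


def rotate(q, x):
    """Composite product q x q*, with x in R^3 read as a purely imaginary quaternion."""
    return qmul(qmul(q, (0.0, *x)), conj(q))[1:]
```

For example, `rotate(from_axis_angle((0.0, 0.0, 1.0), math.pi / 2), (1.0, 0.0, 0.0))` returns $(0, 1, 0)$ up to round-off: a quarter turn about the $x_3$-axis, counterclockwise for an observer looking against the axis. Replacing $q$ by $-q$ gives the same result, since the two sign changes cancel in $q x q^*$.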
The imaginary part of $q$ gives the direction of the rotation axis, and the real part determines the angle of the rotation. Note that $-q$ represents the same rotation as $q$ and that non-antipodal pairs of unit quaternions represent different rotations. In other words, the unit sphere $\mathbb{S}^3$ in $\mathbb{R}^4$ is a double cover of the space of rotations in $\mathbb{R}^3$. Figure VII.3 illustrates the correspondence with a picture in one lower dimension. The space obtained by identifying antipodal points of $\mathbb{S}^3$ is usually referred to as the real projective three-dimensional space, or $\mathbb{RP}^3$ for short. It is a good model of the set of rotations in $\mathbb{R}^3$, although we usually prefer $\mathbb{S}^3$ because it is easier to imagine.

Table VII.1: Product of matrices in the representation of a rotation by composite multiplication with unit quaternions.

$$\bar{Q}^T Q = \begin{pmatrix} q \cdot q & 0 & 0 & 0 \\ 0 & q_0^2 + q_1^2 - q_2^2 - q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 0 & 2(q_2 q_1 + q_0 q_3) & q_0^2 - q_1^2 + q_2^2 - q_3^2 & 2(q_2 q_3 - q_0 q_1) \\ 0 & 2(q_3 q_1 - q_0 q_2) & 2(q_3 q_2 + q_0 q_1) & q_0^2 - q_1^2 - q_2^2 + q_3^2 \end{pmatrix}$$

Figure VII.2: The rotation of the vector $r$ by an angle of $\theta$ about the line spanned by $u$. The three dotted vectors correspond to the terms in the formula of Rodrigues.

Figure VII.3: The north- and south-poles correspond to the identity, and points on the equator correspond to rotations by $\pi$. The dashed great-circle through the two poles represents the set of rotations about a fixed axis.

To prove the claimed correspondence, we write the vector $r'$ obtained by rotating $r$ by $\theta$ about the axis defined by $u$ using the formula of Rodrigues,

$$r' = \cos\theta\, r + \sin\theta\, (u \times r) + (1 - \cos\theta)(r \cdot u)\, u,$$

which can be seen from Figure VII.2. We show that this can also be written in the form $r' = q r q^*$, where $r$ and $r'$ are identified with purely imaginary quaternions and $q$ is as given above. Writing $q = (q_0, \vec{q})$, tedious but straightforward calculations show

$$q r q^* = (q_0^2 - \vec{q} \cdot \vec{q})\, r + 2 q_0\, (\vec{q} \times r) + 2\, (\vec{q} \cdot r)\, \vec{q}.$$

If we substitute $q_0 = \cos\frac{\theta}{2}$ and $\vec{q} = \sin\frac{\theta}{2}\, u$ and use the identities $2 \sin\frac{\theta}{2} \cos\frac{\theta}{2} = \sin\theta$, $\cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2} = \cos\theta$ and $2 \sin^2\frac{\theta}{2} = 1 - \cos\theta$, then we obtain the formula of Rodrigues.

Composing rotations. The above relationships provide a convenient conversion between unit quaternions $q$ and axis-angle pairs. We have $q_0 = \cos\frac{\theta}{2}$ and $\vec{q} = \sin\frac{\theta}{2}\, u$, and conversely $\theta = 2 \arccos q_0$ and, for $\theta \neq 0$, $u = \vec{q} / \sin\frac{\theta}{2}$. The composition of two rotations represented by the unit quaternions $p$ and $q$ is

$$q\, (p\, r\, p^*)\, q^* = (qp)\, r\, (qp)^*.$$

Thus, composition of rotations corresponds to multiplication of quaternions, and from the product it is easy to again get the axis and the angle. A more direct geometric description of the composition of two rotations uses the fact that every rotation can be written as the composition of two reflections, as illustrated in Figure VII.4. The two planes defining the reflections are not unique; they just need to pass through the axis of rotation and enclose half the angle of rotation. To compose two rotations, we write each as the composition of two reflections, making sure that the second plane of the first rotation is also the first plane of the second rotation, as in Figure VII.4. The middle two reflections cancel and we are left with two reflections. The axis of the corresponding rotation is the line common to the two planes, and the angle of rotation is twice the angle enclosed by the planes.
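The Rodrigues identity, the matrix of Table VII.1, and the composition rule can all be checked numerically. The plain-Python sketch below (hypothetical names; quaternions as 4-tuples, vectors as 3-tuples) compares $q r q^*$ with the formula of Rodrigues and with the rotation matrix, recovers the axis-angle pair from $q$, and confirms that rotating by $p$ and then by $q$ agrees with rotating by the product $qp$.

```python
import math


def qmul(a, b):
    """Quaternion product ab, written out in coordinates."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)


def conj(q):
    return (q[0], -q[1], -q[2], -q[3])


def from_axis_angle(u, theta):
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), s * u[0], s * u[1], s * u[2])


def to_axis_angle(q):
    """theta = 2 arccos(q0) and u = vec(q) / sin(theta/2); assumes q is not +-1."""
    theta = 2 * math.acos(max(-1.0, min(1.0, q[0])))
    s = math.sin(theta / 2)
    return (q[1] / s, q[2] / s, q[3] / s), theta


def rotate(q, r):
    """Composite product q r q* for the purely imaginary quaternion of r."""
    return qmul(qmul(q, (0.0, *r)), conj(q))[1:]


def rodrigues(r, u, theta):
    """r' = cos(theta) r + sin(theta) (u x r) + (1 - cos(theta)) (r . u) u."""
    c, s = math.cos(theta), math.sin(theta)
    dot = sum(ui * ri for ui, ri in zip(u, r))
    cross = (u[1]*r[2] - u[2]*r[1], u[2]*r[0] - u[0]*r[2], u[0]*r[1] - u[1]*r[0])
    return tuple(c*ri + s*ci + (1 - c)*dot*ui for ri, ci, ui in zip(r, cross, u))


def rotation_matrix(q):
    """Lower right 3-by-3 block of the matrix product in Table VII.1."""
    q0, q1, q2, q3 = q
    return [[q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
            [2*(q2*q1 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
            [2*(q3*q1 - q0*q2), 2*(q3*q2 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3]]


def close(x, y, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(x, y))


u, theta, r = (1/3, 2/3, 2/3), 1.0, (0.5, -1.0, 2.0)   # u is a unit vector
q = from_axis_angle(u, theta)
print(close(rotate(q, r), rodrigues(r, u, theta)))     # True
```

The same helpers show the composition rule: with `p = from_axis_angle((1.0, 0.0, 0.0), 0.4)`, the nested call `rotate(q, rotate(p, r))` and the single call `rotate(qmul(q, p), r)` agree up to round-off.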
Figure VII.4: We see three rotations defined by the axis-angle pairs $(u, \varphi)$, $(v, \psi)$ and $(w, \rho)$. Each rotation is the composition of two reflections illustrated by the great-circles at which their planes meet the sphere.

Bibliographic notes. The exposition of quaternions and their connection to rotations chosen for this section follows [2]. It is commonly acknowledged that quaternions were discovered by Hamilton in 1843 and first published in 1844 [1]. It is less well known that a few years earlier, Rodrigues studied the composition of rotations in space and gave a purely geometric explanation that is equivalent to Hamilton’s algebra [5]. Even earlier, Gauss recorded his discovery of quaternions in his unpublished notebook in 1819. We recommend the primer by Kuipers [3] for background on rotations and the text by Needham [4] for background on the more general context provided by complex analysis.

[1] W. R. HAMILTON. On a new species of imaginary quantities connected with the theory of quaternions. Irish Acad. Proc. 2 (1844), 424–434.

[2] B. K. P. HORN. Closed-form solution of absolute orientation using unit quaternions. J. Opt. Soc. Amer. A 4 (1987), 629–642.

[3] J. B. KUIPERS. Quaternions and Rotation Sequences. Princeton Univ. Press, New Jersey, 1999.

[4] T. NEEDHAM. Visual Complex Analysis. Clarendon Press, Oxford, England, 1997.

[5] O. RODRIGUES. Des lois géométriques qui régissent les déplacements d’un système solide dans l’espace, et de la variation des coordonnées provenant de ces déplacements considérés indépendamment des causes qui peuvent les produire. J. Math. Pures Appl. 5 (1840), 380–440.

VII.2 Optimum Motion

In this section, we study an optimization problem that arises when one attempts to match two molecular structures or to fit two structures snugly next to each other. After formulating the optimization problem, we solve it using quaternions representing rotations in three-dimensional space.

Problem specification. Suppose we are given two finite collections of points in $\mathbb{R}^3$ and a bijection between them. While entertaining the possibility that the two collections are structurally the same or at least similar, we are interested in moving one collection so it best matches the other. We need some notation to make this precise. Let $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$ be the two collections and assume that $a_i$ corresponds to $b_i$, for each $i$. We use the root mean square or RMS distance to assess how similar the two collections are. This measure is the square root of the average square distance:

$$\mathrm{RMSD}(A, B) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \|a_i - b_i\|^2}.$$

Given a rigid motion $\mu: \mathbb{R}^3 \to \mathbb{R}^3$, we may apply it to the first collection and recompute the root mean square distance. We are interested in finding the rigid motion $\mu$ that minimizes the root mean square distance between $\mu(A)$ and $B$.

Note that minimizing the root mean square distance is equivalent to minimizing the sum of square distances. Recall also that every rigid motion can be decomposed into a rotation followed by a translation. The space of rigid motions is therefore six-dimensional, namely $\mathrm{SO}(3) \times \mathbb{R}^3$, and it might seem that computing the particular rigid motion that minimizes $\mathrm{RMSD}(\mu(A), B)$ would be hopeless or at least difficult. Quite the opposite is true, and the main reason for this is the convenience provided by quadratic functions. We consider rotations and translations separately.

Optimum translation. Recall that the centroid of a collection of points is the average of the points. More formally, the centroids of $A$ and $B$ are $c_A = \frac{1}{n} \sum_i a_i$ and $c_B = \frac{1}{n} \sum_i b_i$. We begin by showing that the best translation is the one that moves $c_A$ to $c_B$. In other words, the translation $\tau(x) = x + t$ that minimizes the root mean square distance between $\tau(A)$ and $B$ is defined by $t = c_B - c_A$. A crucial insight used in proving this fact is that the centroid is the only point for which the sum of the vectors to the points in the collection vanishes:

$$\sum_{i=1}^{n} (a_i - c_A) = 0.$$

This implies that $c_A$ minimizes the sum of square distances from the points $a_i$. Indeed, $F(x) = \sum_i \|x - a_i\|^2$ is a quadratic function with a unique minimum. That minimum is characterized by a vanishing gradient:

$$\nabla F(x) = 2 \sum_{i=1}^{n} (x - a_i) = 0.$$

As mentioned earlier, the latter sum vanishes iff $x = c_A$.

We are now ready to prove that the best translation is the one that moves $c_A$ to $c_B$. Let us move every point $b_i$ to the origin and move the translated copy of $a_i$ with it, to $a_i + t - b_i$. This operation is illustrated in Figure VII.5. Then the sum of square distances between the corresponding points, $\sum_i \|a_i + t - b_i\|^2$, is also the sum of square distances of the points $a_i + t - b_i$ from the origin. The translation minimizes the sum iff the origin is the centroid of the points $a_i + t - b_i$:

$$0 = \frac{1}{n} \sum_{i=1}^{n} (a_i + t - b_i) = c_A + t - c_B.$$

This implies that the best translation moves $c_A$ to $c_B$, as claimed.

Figure VII.5: After moving the shaded points $b_i$ to the origin, the (solid) difference vectors all radiate out from the origin.

Optimum rotation. Note that rotating and taking the centroid commute. In other words, the centroid of $\rho(A)$ is $\rho(c_A)$. Since every rigid motion can be written as a rotation followed by a translation, $\mu = \tau \circ \rho$, the motion can be optimal only if $\tau$ translates the centroid of $\rho(A)$ to the centroid of $B$. We may therefore simplify our problem by translating $A$ and independently translating $B$ such that
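both centroids lie at the origin. Equivalently, we may assume $c_A = c_B = 0$. Using quaternions, we can express the rotation of a point $a_i$ as $q a_i q^*$, where $q$ is a unit quaternion and $a_i$ also denotes the purely imaginary quaternion that corresponds to the point, as explained in Section VII.1. The sum of the square distances after the rotation is

$$\sum_i \|q a_i q^* - b_i\|^2 = \sum_i \|a_i\|^2 - 2 \sum_i \langle q a_i q^*, b_i \rangle + \sum_i \|b_i\|^2.$$

The sums of the $\|a_i\|^2$ and the $\|b_i\|^2$ are not affected by the rotation, so minimizing the sum of square distances is equivalent to maximizing the sum of the $\langle q a_i q^*, b_i \rangle$. Since multiplication with a unit quaternion preserves scalar products, we have $\langle q a_i q^*, b_i \rangle = \langle q a_i, b_i q \rangle$. Recall from the previous section that

$$q a_i = \bar{A}_i\, q, \qquad b_i q = B_i\, q,$$

where $\bar{A}_i$ expands $a_i$ as the second factor and $B_i$ expands $b_i$ as the first factor. Writing $a_i = (x_i, y_i, z_i)$ and $b_i = (x'_i, y'_i, z'_i)$, these matrices are

$$\bar{A}_i = \begin{pmatrix} 0 & -x_i & -y_i & -z_i \\ x_i & 0 & z_i & -y_i \\ y_i & -z_i & 0 & x_i \\ z_i & y_i & -x_i & 0 \end{pmatrix}, \qquad B_i = \begin{pmatrix} 0 & -x'_i & -y'_i & -z'_i \\ x'_i & 0 & -z'_i & y'_i \\ y'_i & z'_i & 0 & -x'_i \\ z'_i & -y'_i & x'_i & 0 \end{pmatrix}.$$

The two matrices are skew symmetric as well as orthogonal. The sum that we have to maximize can now be rewritten as

$$\sum_i \langle \bar{A}_i q, B_i q \rangle = \sum_i q^T \bar{A}_i^T B_i\, q = q^T \Big( \sum_i N_i \Big) q = q^T N q,$$

where $N_i = \bar{A}_i^T B_i$. Take a moment to verify that each matrix in this sum is symmetric. Since the sum of symmetric matrices is again symmetric, we have $N^T = N$.

Eigenvalues and -vectors. We can interpret $q^T N q$ geometrically as a quadratic function over four-dimensional Euclidean space. Short of being able to draw the graph of this function in $\mathbb{R}^5$, we illustrate the idea in Figure VII.6, which drops two of the dimensions. Our goal is to find a point $q \in \mathbb{S}^3$ for which the quadratic function gives a maximum. We can compute such a $q$ with a modest amount of linear algebra.

Figure VII.6: The plane represents $\mathbb{R}^2$, the partially dotted circle represents $\mathbb{S}^1$, the surface represents the graph of the quadratic function over $\mathbb{R}^2$, the dashed lines represent the zero-set and the boldface curve represents the graph of the restriction of that function to $\mathbb{S}^1$.

Recall that the eigenvalues of a square matrix $N$ are the complex numbers $\lambda$ for which the determinant of $N - \lambda I$ vanishes. The corresponding eigenvectors are the unit vectors $e$ such that $N e = \lambda e$. Since $N$ is a 4-by-4 matrix, we have four eigenvalues, and because $N$ is symmetric, the eigenvalues are all real. It is convenient to order them as $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \lambda_4$. The corresponding eigenvectors $e_1, e_2, e_3, e_4$ can be chosen pairwise orthogonal and therefore span $\mathbb{R}^4$. We can thus write any quaternion as a linear combination of the eigenvectors, $q = \sum_i \alpha_i e_i$, and because we are only interested in unit quaternions, we have $\sum_i \alpha_i^2 = 1$. Hence

$$q^T N q = \sum_{i=1}^{4} \alpha_i^2 \lambda_i.$$

By the assumed ordering of the eigenvalues, we have $q^T N q \leq \lambda_1$, and this maximum is attained for $\alpha_1 = 1$. The corresponding quaternion is $q = e_1$. In other words, the optimum rotation is defined by the unit eigenvector that corresponds to the largest eigenvalue.

Without bijection. If there is no bijection specified between the two sets then the problem of finding the best rigid motion seems significantly more difficult. Assuming $A$ and $B$ contain $n$ points each, we could of course try all $n!$ bijections, but that would take a long time. A more effective algorithm alternates between improving the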
root mean square distance by changing the bijection and by changing the motion. Note that independent of the bijection, the best translation always moves the centroid of $A$ to the centroid of $B$. So we may again assume that both centroids are at the origin and restrict ourselves to rotations. We use three subroutines to describe the iterative algorithm. For a given rotation $q$, MATCH($q$) returns the permutation that minimizes the root mean square distance between $\rho(A)$ and $B$. Given a permutation $\pi$, ROTATE($\pi$) returns the rotation that minimizes the root mean square distance under this permutation. Finally, given a permutation and a rotation, RMSD($\pi$, $q$) returns the root mean square distance.

    r = ∞; q = identity;
    loop π = MATCH(q); q = ROTATE(π);
      if RMSD(π, q) < r then r = RMSD(π, q)
      else exit
      endif
    forever.

After each iteration, the root mean square distance decreases. This implies that no permutation is tried twice. Since there are only finitely many permutations, it follows that the algorithm halts. Note however that we have neither a polynomial bound on the number of iterations nor a guarantee that the algorithm finds the globally optimal solution.

A popular version of the above algorithm uses maps from $A$ to $B$ that need not be bijections, or even injections, in place of permutations. Sometimes this change is motivated by the purpose of the computation, at other times by the fact that finding the best bijection is not entirely straightforward. Given a rotation $q$, we may use a subroutine ASSOCIATE($q$), which determines for each $a_i$ the point $b \in B$ closest to the rotated point $q a_i q^*$. In the algorithm, we replace MATCH by ASSOCIATE and do the remaining operations as before, except that $B$ is replaced by the multi-set of points in $B$ that are closest to some point in the rotated copy of $A$.

Bibliographic notes. The problem of finding the rotation that minimizes the root mean square distance between two point sets with given bijection in $\mathbb{R}^3$ has been studied in various fields, including x-ray crystallography [4] and computer vision [2]. In this section, we follow the exposition of the solution given by Horn [3]. For background on linear algebra and how to compute the eigenvalues and eigenvectors of a symmetric matrix, we refer to Strang [5]. The algorithm that attempts to minimize the root mean square distance between two point sets without specified bijection has also been described in several fields. In computer vision, the version that works with closest-point maps rather than bijections is known as the iterative closest point or ICP algorithm [1].

[1] P. J. BESL AND N. D. MCKAY. A method for registration of 3-D shapes. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-14 (1992), 239–256.

[2] O. D. FAUGERAS AND M. HEBERT. The representation, recognition, and locating of 3-D objects. Int. J. Robotics Res. 5 (1986), 27–52.

[3] B. K. P. HORN. Closed-form solution of absolute orientation using unit quaternions. J. Opt. Soc. Amer. A 4 (1987), 629–642.

[4] W. KABSCH. A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr. Sect. A 34 (1978), 827–828.

[5] G. STRANG. Introduction to Linear Algebra. Wellesley-Cambridge Press, Wellesley, Massachusetts, 1993.
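The closed-form solution of Section VII.2 and the alternating algorithm fit in a short sketch. It assumes NumPy is available, stores each collection as an $n$-by-3 array whose row $i$ holds $a_i$ or $b_i$, and uses hypothetical function names. `optimal_rotation` accumulates $N = \sum_i \bar{A}_i^T B_i$ from the matrix expansions of Section VII.1 and returns the eigenvector of the largest eigenvalue, using `numpy.linalg.eigh`, which lists eigenvalues in ascending order; `best_rigid_motion` adds the centroid translation; `match_and_rotate` follows the MATCH/ROTATE loop, but implements MATCH by brute force over all permutations, which is only sensible for a handful of points.

```python
import itertools

import numpy as np


def left_matrix(p):
    """L(p) with p x = L(p) x, i.e. p expanded as the first factor."""
    p0, p1, p2, p3 = p
    return np.array([[p0, -p1, -p2, -p3], [p1, p0, -p3, p2],
                     [p2, p3, p0, -p1], [p3, -p2, p1, p0]])


def right_matrix(p):
    """R(p) with x p = R(p) x, i.e. p expanded as the second factor."""
    p0, p1, p2, p3 = p
    return np.array([[p0, -p1, -p2, -p3], [p1, p0, p3, -p2],
                     [p2, -p3, p0, p1], [p3, p2, -p1, p0]])


def rotation_matrix(q):
    """Lower right 3-by-3 block of Table VII.1 for a unit quaternion q."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q2*q1 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q3*q1 - q0*q2), 2*(q3*q2 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3]])


def optimal_rotation(A, B):
    """Unit quaternion maximizing sum_i <q a_i q*, b_i> for centered n-by-3 arrays."""
    N = np.zeros((4, 4))
    for a, b in zip(A, B):
        # q a = R(a) q and b q = L(b) q, hence <q a, b q> = q^T R(a)^T L(b) q.
        N += right_matrix((0.0, *a)).T @ left_matrix((0.0, *b))
    _, vectors = np.linalg.eigh(N)       # eigenvalues in ascending order
    return vectors[:, -1]                # eigenvector of the largest eigenvalue


def best_rigid_motion(A, B):
    """(R, t) minimizing the RMSD between the rows of A @ R.T + t and B."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    R = rotation_matrix(optimal_rotation(A - cA, B - cB))
    return R, cB - R @ cA


def rmsd(X, Y):
    return float(np.sqrt(np.mean(np.sum((X - Y) ** 2, axis=1))))


def match_and_rotate(A, B, tol=1e-12):
    """Alternate MATCH and ROTATE on centered copies until the RMSD stops dropping.

    MATCH tries all n! permutations, so this is only usable for tiny n.
    """
    A0, B0 = A - A.mean(axis=0), B - B.mean(axis=0)
    q = np.array([1.0, 0.0, 0.0, 0.0])
    best, best_perm, best_q = np.inf, None, q
    while True:
        moved = A0 @ rotation_matrix(q).T
        perm = list(min(itertools.permutations(range(len(B0))),
                        key=lambda p: np.sum((moved - B0[list(p)]) ** 2)))
        q = optimal_rotation(A0, B0[perm])
        d = rmsd(A0 @ rotation_matrix(q).T, B0[perm])
        if d >= best - tol:
            return best_perm, best_q, best
        best, best_perm, best_q = d, perm, q
```

For larger sets, MATCH is an assignment problem and could be solved in polynomial time, for example with SciPy's `scipy.optimize.linear_sum_assignment`; the tolerance `tol` only guards the exit test against round-off, so the loop still stops as soon as the RMSD fails to decrease.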
VII.3 Sampling and Covering

In this section, we study two questions on rigid motions, namely how to sample uniformly at random and how to cover the space of motions most economically. We treat translations and rotations separately and spend most of our time on the more complicated case of rotations.

The size of a sphere. We prepare the discussion of sampling rotations by measuring the unit 2-sphere and the unit 3-sphere. For $\mathbb{S}^2$ embedded in $\mathbb{R}^3$, we sweep a plane normal to the $x_3$-axis and compute the area by integrating infinitesimal slices. The perimeter of the circle in which the plane cuts the sphere is $2\pi r$, with the square radius $r^2 = 1 - x_3^2$. The slice between $x_3$ and $x_3 + \mathrm{d}x_3$ has width $\mathrm{d}x_3 / r$ on the sphere, so its area is $2\pi r \cdot \mathrm{d}x_3 / r = 2\pi\, \mathrm{d}x_3$. Hence,

$$\mathrm{Area}(\mathbb{S}^2) = \int_{-1}^{1} 2\pi\, \mathrm{d}x_3 = 4\pi.$$

The total area of the 2-sphere is therefore $4\pi$. But note that the derivation shows more, namely that the area of the slice between two parallel planes at a constant distance is the same for all such planes, as long as both intersect the sphere. This fact was already known to Archimedes and is often expressed by saying that the axial projection from the sphere to an enclosing cylinder preserves area. This projection is illustrated in Figure VII.7.

For $\mathbb{S}^3$ embedded in $\mathbb{R}^4$, we sweep a hyperplane normal to the $x_0$-axis and compute the volume by integrating the infinitesimal slices. Each slice is a 2-sphere of radius $r$ thickened by $\mathrm{d}x_0 / r$, so its volume is $4\pi r^2 \cdot \mathrm{d}x_0 / r = 4\pi r\, \mathrm{d}x_0$, with $r^2 = 1 - x_0^2$, as before. Hence,

$$\mathrm{Vol}(\mathbb{S}^3) = \int_{-1}^{1} 4\pi \sqrt{1 - x_0^2}\, \mathrm{d}x_0 = 4\pi \int_{0}^{\pi} \sin^2\alpha\, \mathrm{d}\alpha = 2\pi^2,$$

which we get by substituting $x_0 = \cos\alpha$. The total volume of the 3-sphere is therefore $2\pi^2$. Note also that Archimedes' theorem does not extend to the 3-sphere, at least not in the straightforward manner from sections between parallel planes to sections between parallel hyperplanes.

Uniform sampling. Archimedes' theorem can be used to pick a point uniformly at random on $\mathbb{S}^2$. The method may be viewed as picking a point on the enclosing cylinder and projecting it back to the 2-sphere:

Step 1. Pick $x_3$ uniformly at random in $[-1, 1]$.
Step 2. Pick $\varphi$ uniformly at random in $[0, 2\pi)$.
Define $r = \sqrt{1 - x_3^2}$.
Return $(r \cos\varphi, r \sin\varphi, x_3)$.
We now extend this method to and thus to an algorithm
for picking a rotation uniformly at random. Think of as
the axis of rotation, so we just need to pick the angle of
rotation about this axis. It would not be correct to pick an

* 
angle uniformly at random since this would favor small
dislocations of . Indeed, in the quaternions near the
identity would be more likely than those far away from the
identity. To pick the angle correctly, we return to what we
learned from the above volume computation. The angle of
rotation about the axis is twice the angular distance from
the identity on . In other words,     . We

need to pick the angle  from     , not uniformly but
Figure VII.7: Illustration of Archimedes’ theorem implying that  

from a density that favors angles near the middle of the
the sphere and the enclosing truncated cylinder have the same
interval. Specifically, the density is      , normalized
area. 
to have unit total integral. The corresponding distribution

/
function is
  method to compute the volume of
We use the same
embedded in
*
. Sweeping

a three-dimensional hyper-
plane normal to the -axis, we get the volume by integrat-

* 



 2    
  
110 VII M ATCH AND F IT

 *
 
which monotonically increases and reaches 
*
at
 . To pick
an angle, we pick a number uniformly


at random in  , and we compute its preimage under
 
the distribution function: 
    . To get a point
uniformly at random on , we append Steps 1 and 2
above with


Figure VII.8: From left to right: the cube, the FCC and the BCC


lattices. The points with maximum distance to the lattice points
Step 3. Pick uniformly at random in  .
  are the cube centers, the edge centers and the midpoints between
Let     .
 2  the face and the edge centers.

2       .
Return           
   

We get a random rotation by using $q$ as a unit quaternion. Alternatively, we get the same random rotation by using $u$ as the axis and $\theta$ as the angle of rotation.

Covering the spaces of translations and rotations. We turn our attention to selecting a collection of rigid motions such that every possible motion has a selected motion nearby. It is convenient to measure the distance between translations and between rotations using the Euclidean metric. We will later analyze how these notions of distance relate to the effect of the motion on the root mean square distance between two sets in $\mathbb{R}^3$.

The idea of guaranteeing that every possible motion has a nearby selected motion can be expressed by covering the space of motions with neighborhoods. Consider first translations, which we represent by 3-vectors or points in $\mathbb{R}^3$, and let $\varepsilon > 0$. Let $\mathcal{B}$ be a collection of closed balls, all of radius $\varepsilon$. We call $\mathcal{B}$ a covering if $\bigcup\mathcal{B} = \mathbb{R}^3$, and we call $\varepsilon$ the covering radius. We need infinitely many balls just because $\mathbb{R}^3$ has infinite volume, but we are usually only interested in bounded portions of space. If we use the centers of the covering balls as selected translations, we are guaranteed that every translation $t$ has a selected translation at a distance at most $\varepsilon$ from $t$. We study three lattices of points in some detail. The cube lattice consists of all integer points, the FCC or face-centered cube lattice adds all centers of cube faces, and the BCC or body-centered cube lattice adds all cube centers to the cube lattice. Figure VII.8 shows the portion of each lattice inside a cube of unit side-length and Table VII.2 lists some of their pertinent properties. By counting fractions, we note that the FCC lattice has four times and the BCC lattice has twice as many points as the cube lattice. The packing radius is the largest radius we can assign to the points to get non-overlapping balls, and the volume is the fraction of the space covered by the packed balls. The covering radius is the smallest radius we can assign and still have the balls cover $\mathbb{R}^3$, and the volume is the total volume of the balls divided by the volume of the space they inhabit. We see that the FCC lattice leads to an effective packing while the BCC lattice leads to an effective covering. Indeed, both are known to be the respective best packing and covering lattices.

                       CUBE     FCC     BCC
points per cube           1       4       2
packing radius        0.500   0.353   0.433
volume (fraction)     0.523   0.740   0.680
covering radius       0.866   0.500   0.559
volume (fraction)     2.720   2.094   1.463

Table VII.2: Numerical assessment of how well the cube, the FCC and the BCC lattices pack and cover.

As an exercise, we may estimate the number of balls we need to cover the unit 3-sphere. Recall that its volume is $2\pi^2$. Assuming $\varepsilon$ is very small, the volume of a ball with radius $\varepsilon$ in $\mathbb{S}^3$ is about $\frac{4}{3}\pi\varepsilon^3$. If we believe that we cannot cover more economically than the BCC lattice in $\mathbb{R}^3$, we can use a straightforward volume argument to show that we need at least about

$$1.463 \cdot \frac{2\pi^2}{\frac{4}{3}\pi\varepsilon^3} = 1.463 \cdot \frac{3\pi}{2\varepsilon^3} \approx \frac{6.89}{\varepsilon^3}$$

balls to cover the 3-sphere.

Sensitivity to small translations. Next, we address how translations affect the root mean square distance between two point sets. As in Section VII.2, let $A$ and $B$ be two collections of $n$ points in $\mathbb{R}^3$, with a bijection that maps $a_i$ to $b_i$, for each $i$. To simplify the analysis, we assume that the centroids of the two collections are both at the origin: $\sum_i a_i = \sum_i b_i = 0$. This implies that the vectors $a_i - b_i$ add up to $0$, so the sum of their scalar products with any vector $t$ vanishes: $\sum_i \langle a_i - b_i, t\rangle = 0$. Recall that the root mean square distance between $A$ and $B$ is the square root of the average square distance between corresponding points. After translating $B$ along $t$, the root mean square

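distance is

$$R(t) = \Big(\frac{1}{n}\sum_{i=1}^{n}\|a_i - b_i - t\|^2\Big)^{1/2} = \Big(\frac{1}{n}\sum_{i=1}^{n}\|a_i - b_i\|^2 + \|t\|^2\Big)^{1/2},$$

which implies $R(t) \ge \|t\|$. We have $R(t) = \|t\|$ if and only if $a_i = b_i$ for all $i$. To measure how fast the root mean square distance changes with varying translation vector, we compute the gradient:

$$\nabla R(t) = \frac{t}{R(t)}.$$

The gradient is defined everywhere except at $t = 0$ when $a_i = b_i$ for all $i$, and its length is $\|\nabla R(t)\| = \|t\|/R(t) \le 1$. For $t \neq 0$, the length is $1$ if and only if $a_i = b_i$ for all $i$. Figure VII.9 illustrates this result by comparing the graphs obtained for equal and for non-equal corresponding points.

Figure VII.9: The hyperboloid approaches the graph of the norm function at plus and minus infinity.

Since the length of the gradient never exceeds $1$, the difference between the root mean square distances for two translations is bounded from above by the norm of the difference vector:

$$|R(t) - R(s)| \le \|t - s\|.$$

In words, the root mean square distance as a function over the three-dimensional space of translations satisfies a Lipschitz condition with constant $1$.

Sensitivity to small rotations. We repeat the analysis for rotations. Call the root mean square distance from the centroid the radius of gyration. Since we assume $\sum_i a_i = \sum_i b_i = 0$, the radii of gyration of $A$ and $B$ are

$$R_A = \Big(\frac{1}{n}\sum_{i=1}^{n}\|a_i\|^2\Big)^{1/2} \quad\text{and}\quad R_B = \Big(\frac{1}{n}\sum_{i=1}^{n}\|b_i\|^2\Big)^{1/2}.$$

Let $q = (q_0, q_1, q_2, q_3)$ be a unit quaternion. The effect of the rotation represented by $q$ is best viewed in the direction opposite to the rotation axis. It is geometrically obvious that the total distance increases the fastest when each rotated point moves in a direction straight away from its partner. This is possible in the limit and characterized by the velocity vector of $b_i$ being parallel to $b_i - a_i$, which includes the possibility that $a_i = b_i$. As for translations, the length of the gradient is maximized if $a_i = b_i$ for all $i$. In this case, we have $B = A$, and the root mean square distance between $A$ and the rotated copy of $A$ is

$$R(q) = \Big(2R_A^2 - \frac{2}{n}\sum_{j=0}^{3}\lambda_j q_j^2\Big)^{1/2},$$

where $\lambda_0, \lambda_1, \lambda_2, \lambda_3$ are the eigenvalues of the symmetric $4 \times 4$ matrix defined in the previous section, whose quadratic form in $q$ equals $\sum_i \langle a_i, q\,a_i\,\bar q\rangle$. Here we choose the coordinate axes along the principal axes of $A$, which makes this matrix diagonal. For the purpose of computing the gradient and its length, we consider a function over $\mathbb{R}^4$:

$$F(q) = \Big(2R_A^2\,(q_0^2 + q_1^2 + q_2^2 + q_3^2) - \frac{2}{n}\sum_{j=0}^{3}\lambda_j q_j^2\Big)^{1/2},$$

which agrees with $R(q)$ on the unit 3-sphere. Going back to the definition of the matrix, we observe that the eigenvalues are $\lambda_0 = nR_A^2$ and $\lambda_i = n\,(R_A^2 - 2R_i^2)$ for $1 \le i \le 3$, where $R_i$ is the radius of gyration of $A$ projected into the plane $x_i = 0$. Note that $R_1^2 + R_2^2 + R_3^2 = 2R_A^2$. Using these relations, we simplify the expressions for $F$, its gradient and the length of the gradient:

$$F(q) = 2\,\big(R_1^2 q_1^2 + R_2^2 q_2^2 + R_3^2 q_3^2\big)^{1/2},$$
$$\nabla F(q) = \frac{4}{F(q)}\,\big(0,\; R_1^2 q_1,\; R_2^2 q_2,\; R_3^2 q_3\big),$$
$$\|\nabla F(q)\| = \frac{2\,\big(R_1^4 q_1^2 + R_2^4 q_2^2 + R_3^4 q_3^2\big)^{1/2}}{\big(R_1^2 q_1^2 + R_2^2 q_2^2 + R_3^2 q_3^2\big)^{1/2}} \le 2\max_i R_i \le 2R_A.$$

Since the length of the gradient never exceeds $2R_A$, the difference between the root mean square distances for two rotations is no more than that multiple of the norm of the difference vector:

$$|F(q) - F(p)| \le 2R_A\,\|q - p\|.$$

We see that rotations satisfy a Lipschitz condition that is similar to that for translations, except that the constant now depends on the collection of points, in particular on their radii of gyration.

Bibliographic notes. The problem of sampling motions has been studied in various fields, including statistics,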

crystallography and molecular modeling. Various methods for picking a rotation uniformly at random have been published but not all are correct. In particular, it is important to notice that first picking a rotation axis and second a rotation angle favors quaternions close to the identity if we pick the angle uniformly at random in $[0, 2\pi]$. A popular method that is correct and different from the one described in this section is due to Marsaglia [4] and is reproduced in the exercise section of this chapter.

Packing and covering problems have been studied within mathematics and have generated a large body of literature [2, 3]. Surprisingly, many of the main questions in this area are still open. For example, it is not known whether or not the BCC lattice is the most economical covering of $\mathbb{R}^3$ with congruent balls. Very little is known about optimal packings and coverings in non-Euclidean spaces. The problem is challenging even in the relatively simple case of the 2-sphere, and for most numbers of points (or caps) only approximate solutions are known [1].

[1] J. BERMAN AND K. HANES. Optimizing the arrangement of points on the unit sphere. Math. Comput. 31 (1977), 1006–1008.

[2] J. CONWAY AND N. SLOANE. Sphere Packings, Lattices and Groups. Springer-Verlag, New York, 1988.

[3] L. FEJES TÓTH. Lagerungen in der Ebene, auf der Kugel und im Raum. Second edition, Springer-Verlag, New York, 1972.

[4] G. MARSAGLIA. Choosing a point from the surface of a sphere. Ann. Math. Stat. 43 (1972), 645–646.

VII.4 Alignment

In this section, we briefly discuss the two problems of match and fit for protein structures. We begin by studying how to match proteins and develop an algorithm that measures the similarity between two chains of atoms. Thereafter, we consider the related problem of docking a protein with its substrate.

Longest common subsequence. Consider first the combinatorial (as opposed to geometric) version of the sequence alignment problem. We model a protein as a string over the alphabet of twenty amino acids: $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$. An alignment maps the $a_i$ to the $b_j$ in sequence, but it permits spaces on both sides. As illustrated in Table VII.3, we represent an alignment by a matrix consisting of two rows and $(m + n + s)/2$ columns, where $s$ is the total number of spaces.

Q R A A C C
A Q A C R C R

Table VII.3: The alignment uses  spaces to achieve  matches.

A match is a column of two equal non-space characters, and a mismatch is a column with two different non-space characters. Columns with two spaces are disallowed. An insertion is a column with a space at the top and a deletion is a column with a space at the bottom. The common subsequence between two strings consists of all matches, and its length is the number of matches. For the moment, we restrict ourselves to alignments without mismatches. Letting $\ell$ be the length of the longest common subsequence, $m + n - 2\ell$ is the minimum number of insertions and deletions needed to transform $A$ to $B$. We compute $\ell$ by dynamic programming. Let $L[i, j]$ be the length of the longest common subsequence of $a_1 \ldots a_i$ and $b_1 \ldots b_j$, and define $L[i, 0] = L[0, j] = 0$ for all $i$ and $j$. Then

$$L[i, j] = \begin{cases} L[i-1, j-1] + 1 & \text{if } a_i = b_j, \\ \max\{L[i-1, j],\, L[i, j-1]\} & \text{if } a_i \neq b_j. \end{cases}$$

To verify the recurrence relation, note that every alignment ends with an insertion, a deletion or a match. In each case, removing the last column leaves an optimal alignment of shorter strings. In the third case, we need to show that the length of the common subsequence cannot increase if we do not use the match between $a_i$ and $b_j$. Indeed, without using that match we end with an insertion or a deletion, and we may move the last match to the end without decreasing the length. We turn the recurrence relation into an algorithm:

    integer LCS(A, B)
      for i := 0 to m do L[i, 0] := 0 endfor;
      for j := 0 to n do L[0, j] := 0 endfor;
      for i := 1 to m do
        for j := 1 to n do
          if a_i = b_j then L[i, j] := L[i-1, j-1] + 1
          else L[i, j] := max{L[i-1, j], L[i, j-1]}
          endif
        endfor
      endfor; return L[m, n].

This algorithm is a typical example of the dynamic programming paradigm, which constructs an optimal solution from pre-computed optimal solutions to sub-problems. To store the solutions, the algorithm uses an array of $(m+1)(n+1)$ entries. Each entry takes constant time, which implies that the total running time is proportional to $mn$. Using a second array of the same size, we may keep track of the decisions made by the algorithm, and with this extra information, we can reconstruct the longest common subsequence itself, and not just compute its length.

Sequence alignment. The general alignment problem permits mismatches and assesses the score by rewarding each match and penalizing each mismatch, insertion and deletion. Assuming $\sigma(x, y)$ gives the score for having $x$ and $y$ in a single column, where either of the two may be a space, written $-$, we get

$$S[i, j] = \max\{S[i-1, j-1] + \sigma(a_i, b_j),\; S[i-1, j] + \sigma(a_i, -),\; S[i, j-1] + \sigma(-, b_j)\}.$$

Figure VII.10: The edit graph for the strings in the above example and the path that corresponds to the given alignment. The rows are labeled Q, R, A, A, C, C and the columns A, Q, A, C, R, C, R; the edge types are deletion, insertion, match and mismatch.

We can think of every alignment as a directed path in the so-called edit graph of the two strings, which we illustrate in Figure VII.10. The path starts at the source in the upper left corner, takes vertical, horizontal and diagonal edges

and ends at the sink in the lower right corner. A gap in the alignment is a sequence of contiguous insertions or of contiguous deletions. It is common to penalize a gap separately for its existence and an additional amount that depends on its length. This may be done by penalizing an insertion or deletion an amount $\gamma_0$ when it starts a gap and an amount $\gamma_1$ when it continues a gap. This gives rise to the following recurrence relations:

$$\mathit{Ins}[i, j] = \max\{S[i, j-1] - \gamma_0,\; \mathit{Ins}[i, j-1] - \gamma_1\},$$
$$\mathit{Del}[i, j] = \max\{S[i-1, j] - \gamma_0,\; \mathit{Del}[i-1, j] - \gamma_1\},$$
$$S[i, j] = \max\{S[i-1, j-1] + \sigma(a_i, b_j),\; \mathit{Ins}[i, j],\; \mathit{Del}[i, j]\},$$

where $\mathit{Ins}[i, j]$ is the score of the best alignment that ends with an insertion and $\mathit{Del}[i, j]$ is the score of the best alignment that ends with a deletion. Using three arrays, we can again compute the best alignment with dynamic programming in time proportional to $mn$.

Chains of atoms. We can use the same algorithmic ideas to compute alignments between two sequences of atoms. Let the $a_i$ and the $b_j$ be the centers of the $\alpha$-carbon atoms along the backbones of two proteins. For now, we assume a fixed embedding in $\mathbb{R}^3$ and consider the alignment problem without applying any rigid motion. Using the root mean square distance between two sub-chains is problematic for two reasons. First, it does not lend itself to the dynamic programming algorithm and, second, it prefers shorter over longer sub-chains. Instead, we need a score function that balances the contributions of length and distance. One such function is obtained by combining square distances with gap penalties as follows. Letting $C$ and $D$ be positive constants, we reward a match between $a_i$ and $b_j$ by adding

$$\frac{C}{1 + \|a_i - b_j\|^2 / D} \qquad \text{(VII.1)}$$

to the score, and we penalize for gaps as before. The dynamic programming algorithm can still be used to identify the best in a collection of exponentially many alignments. It does this in time proportional to $mn$.

Next, we permit a rigid motion to be applied to one of the chains, say $B$. Instead of computing the best motion for each alignment, we compute the best alignment for each of a dense sample of motions. We need some notation to formalize this idea. For each alignment $\pi$ between $A$ and $B$, we get a function $f_\pi$ that maps a rigid motion $\mu$ to the score between $A$ and $\mu(B)$. Consider the function $f$ defined as the motion-wise maximum of all $f_\pi$: $f(\mu) = \max_\pi f_\pi(\mu)$. This construction is illustrated in Figure VII.11. Let $\mu^*$ be the motion that maximizes $f$. The score of the best alignment between $A$ and $B$ is then $f(\mu^*)$, and the best alignment is the $\pi$ for which $f_\pi(\mu^*) = f(\mu^*)$.

Figure VII.11: The horizontal axis represents the six-dimensional space of rigid motions. The upper envelope of the graphs is the motion-wise maximum of the score functions.

The idea of the algorithm is to sample the space of motions densely enough to guarantee an alignment with a score at least $f(\mu^*) - \varepsilon$, for some $\varepsilon > 0$. We thus aim at computing an approximately best alignment, but we may decrease $\varepsilon$ and thus get arbitrarily close to the optimum. This strategy makes sense in practice since in any case the locations of atoms are only known up to some precision.

Running time. Improving the approximation by decreasing $\varepsilon$ comes with a cost, namely higher running time because we evaluate $f$ for more rigid motions. We quantify the dependence by analyzing the running time depending on $\varepsilon$. The other parameters entering the analysis are the lengths of the chains, $m$ and $n$, the radii of the smallest spheres enclosing $A$ and $B$, and the radii of gyration of the two sets. Proteins tend to have globular shapes packing their atoms around their centroids. We may therefore assume that the two radii of $A$ are both roughly proportional to $m^{1/3}$ and the two radii of $B$ are both roughly proportional to $n^{1/3}$. We further simplify the discussion by assuming $m = n$. To decide how densely we have to cover the space of rigid motions, we determine the sensitivity of the score function to small motions. We first consider translations $t$. Ignoring penalties for gaps, we get

$$f_\pi(t) = \sum_{i=1}^{k} \frac{C}{1 + \|a_i - b_i - t\|^2 / D},$$

where $k$ is the length of the alignment and the points are re-indexed so that $\pi$ maps $a_i$ to $b_i$, for $1 \le i \le k$. The norm of the gradient of a single term in this sum is bounded by a constant $c$, and hence $\|\nabla f_\pi(t)\| \le c\,k$.
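To make the constant $c$ concrete, assume the single-term score has the form $g(x) = C/(1 + x^2/D)$ of Equation (VII.1), with $x = \|a_i - b_i - t\|$. Since the term depends on $t$ only through $x$, the length of its gradient is $|g'(x)|$, and a short calculation bounds it:

$$|g'(x)| = \frac{2C}{D}\cdot\frac{x}{(1 + x^2/D)^2}, \qquad \frac{\mathrm{d}}{\mathrm{d}x}\,\frac{x}{(1 + x^2/D)^2} = \frac{1 - 3x^2/D}{(1 + x^2/D)^3}.$$

The maximum over $x \ge 0$ is therefore attained at $x^2 = D/3$, where $1 + x^2/D = 4/3$, which gives

$$c = \max_{x \ge 0}|g'(x)| = \frac{2C}{D}\cdot\frac{\sqrt{D/3}}{(4/3)^2} = \frac{9C}{8\sqrt{3D}}.$$

Under this assumed form of the score, the bound $\|\nabla f_\pi(t)\| \le c\,k$ thus holds with a constant that depends only on $C$ and $D$.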


We cover the space of translations with balls of radius $\varepsilon/(2cn)$. It follows that having a translation that is not quite the optimum contributes at most $\varepsilon/2$ to the error. The sensitivity of $f_\pi$ to small rotations depends on the radii of gyration: rotating $B$ about its centroid moves each $b_i$ at a rate of at most $2\|b_i\|$, and we get $\|\nabla f_\pi(q)\| \le 2c\sum_{i=1}^{k}\|b_i\| \le 2c\,n\,R_B$, which is roughly proportional to $n^{4/3}$. By covering the space of rotations with balls of radius $\varepsilon/(4cnR_B)$, we get again a contribution of at most $\varepsilon/2$ to the error.

By assumption on the shape of the protein, the volume of translations we need to cover is proportional to $n$, and the volume of the rotations is a constant. In each case, we need a constant times $n^4/\varepsilon^3$ balls. We cover the space of rigid motions by cross-products of these balls and thus get a constant times $n^8/\varepsilon^6$ rigid motions. Multiplying this with the running time of the dynamic programming algorithm gives a total running time of $O(n^{10}/\varepsilon^6)$. This is of course not practical and we need faster alternatives, some of which will be mentioned at the end of this section.

Protein re-docking. In protein docking, the basic question is how well a protein and its substrate fit to each other. The substrate could be another protein or a small ligand. We interpret this question as asking how similar the substrate is to a portion of the complement of the protein. This question makes sense if we use space-filling representations of the protein and the ligand, but not if we represent them combinatorially or as chains of points in space. This idea is illustrated in Figure VII.12. For protein-protein interactions, the region of local complementarity is frequently fairly large. The geometric fit between the two proteins thus becomes a significant factor in making the interaction possible or, more accurately, in not making that interaction impossible. Instead of protein docking, we consider the simpler re-docking problem. Here we are given the complexed form of a protein and its substrate and we attempt to reconstruct that form while suppressing any knowledge of the solution. We need some notation to lay out the rules for this problem. Let $A$ and $B$ represent the protein and the substrate in complexed form, and let $A' = \rho(A)$ be the protein after applying a random rigid motion $\rho$. The input to the reconstruction algorithm consists of $A'$ and $B$, and not knowing the solution means we cannot use any information on $A$ and on $\rho$. The goal is to find a rigid motion $\mu$ such that $\mu(A')$ and $B$ fit well. After $\mu$ is computed, we can test how well we did by comparing $\mu$ with $\rho^{-1}$, which can be done directly or by computing the root mean square distance between $A$ and $\mu(A')$.

Figure VII.12: The shaded local complement of the left shape is similar to the shaded portion of the right shape.

We cannot use the root mean square distance to guide our reconstruction of the complexed form and thus need a score function $g$ that assesses how well a motion does in generating a good fit. There are many possibilities, and one is the approximation of the van der Waals potential by counting the pairs of spheres at small distance from each other. We think of the $a'_i$ and the $b_j$ as the centers of the spheres in $A'$ and $B$, and write $r_i$ and $s_j$ for their van der Waals radii. For a rigid motion $\mu$, the collections of colliding and of close pairs are

$$\mathrm{Coll}(\mu) = \{(i, j) : \|\mu(a'_i) - b_j\| < r_i + s_j\},$$
$$\mathrm{Close}(\mu) = \{(i, j) : r_i + s_j \le \|\mu(a'_i) - b_j\| \le r_i + s_j + \delta\},$$

where $\delta$ is a small positive constant. As mentioned in Section I.4, the van der Waals force is weakly attractive within small distances of maybe up to four Angstrom, and it is strongly repulsive for colliding van der Waals spheres. We thus define

$$g(\mu) = \begin{cases} \mathrm{card}\,\mathrm{Close}(\mu) & \text{if } \mathrm{Coll}(\mu) = \emptyset, \\ 0 & \text{if } \mathrm{Coll}(\mu) \neq \emptyset. \end{cases}$$

Given a rigid motion $\mu$, we compute $g(\mu)$ by comparing all pairs of spheres in time proportional to $mn$. Improvements of the running time are possible. Experiments show that this score function is a good indicator of good fit, but one weakness is its sensitivity to collisions. Actual proteins are flexible and can avoid minor collisions by small deformations. We may account for this fact by allowing a few collisions in the definition of $g$, but to get a good approximation of reality, we will need to build knowledge about flexibility into the score function.

Analysis. The general algorithm for re-docking is similar to the one for geometric alignment: we explore the space of rigid motions and evaluate the score function at the centers of the balls used to cover the space. By choosing the balls in the cover small enough, we can guarantee that for one of the sampled motions $\mu$, the root mean square distance between $A$ and $\mu(A')$ is less than some


threshold $\varepsilon$. Note that this does not necessarily imply that $g(\mu)$ is large. Indeed, it could be zero because motions with high score value tend to be right next to motions that generate collisions. In other words, whether or not the algorithm recognizes $\mu$ as close to $\rho^{-1}$ depends on the shape of $g$ in this neighborhood. We can design cases in which $g$ has arbitrarily narrow high spikes and our algorithm has little chance to ever recover the complexed form. There is, however, experimental evidence that such configurations either do not exist or are rare for actual proteins.

Let us return to the question of how to cover the space of motions to guarantee a root mean square distance of at most $\varepsilon$. As before, we simplify the analysis by setting $m = n$ and assuming that the radii of the smallest enclosing spheres and the radii of gyration are all roughly proportional to $n^{1/3}$. According to the sensitivity analysis in the previous section, we may cover the space of translations with balls of radius $\varepsilon/2$ and the space of rotations with balls of radius $\varepsilon/(4R)$, where $R$ is the radius of gyration of either $A$ or $B$. For the translations, we need to cover a volume of about $n$, requiring about $n/\varepsilon^3$ balls. For the rotations, we need to cover a constant volume, also requiring about $n/\varepsilon^3$ balls. The total number of rigid motions to be explored is thus proportional to $n^2/\varepsilon^6$, and multiplying this with the quadratic running time for evaluating the score function $g$, we get a total running time proportional to $n^4/\varepsilon^6$. A substantial improvement is possible if we compute $g$ for all translations composed with a single rotation in one sweep. Since $n$ is typically in the thousands, even the improved algorithm is not practical and we need faster alternatives.

Bibliographic notes. The structural alignment problem refers to comparing the backbones modeled as curves or chains of spheres in three-dimensional space. Its importance within structural molecular biology derives from the observation that evolution preserves structure better than amino acid sequences. Among other things, research on this problem has led to the creation of structural databases [6, 8]. There are two main computational approaches to structural alignment: one represents a chain by its matrix of internal distances [5] and the other uses rigid motions to align the chains embedded in space [9]. In this section, we have followed the second approach and presented the work of Kolodny and Linial [7], who explore rigid motions in the outer loop and optimal alignments using dynamic programming [3] in the inner loop of their algorithm. The particular score function given in Equation (VII.1), with specific values of the constants $C$ and $D$, was suggested in [9]. It should be mentioned that the presented algorithm is significantly slower than the currently most commonly used DALI software [5], but it is the only algorithm that guarantees a good approximation of the optimal alignment in polynomial time.

The goal of protein docking is the prediction of whether, where and how proteins interact with each other and with other molecules. In many cases, the surface area of the interface during the interaction is substantial, and in these cases the geometric fit is an important factor. However, there are cases with smaller interaction area in which forces unrelated to geometric shape outweigh the importance of shape [2]. We refer to [4] for a recent survey of the extensive literature on computational approaches to protein docking. The material in this section is based on the work described in [1].

[1] S. BESPAMYATNIKH, V. CHOI, H. EDELSBRUNNER AND J. RUDOLPH. Protein docking by exhaustive search. Manuscript, Duke Univ., Durham, North Carolina, 2003.

[2] A. H. ELCOCK, D. SEPT AND J. A. MCCAMMON. Computer simulation of protein-protein interactions. J. Phys. Chem. B 105 (2001), 1504–1518.

[3] D. GUSFIELD. Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, England, 1997.

[4] I. HALPERIN, B. MA, H. WOLFSON AND R. NUSSINOV. Principles of docking: an overview of search algorithms and a guide to scoring functions. Proteins 47 (2002), 409–443.

[5] L. HOLM AND C. SANDER. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233 (1993), 123–138.

[6] L. HOLM AND C. SANDER. The FSSP database of structurally aligned protein fold families. Nucleic Acids Res. 22 (1994), 3600–3609.

[7] R. KOLODNY AND N. LINIAL. Approximate protein structural alignment in polynomial time. Manuscript, Stanford Univ., Stanford, California, 2002.

[8] A. G. MURZIN, S. E. BRENNER, T. HUBBARD AND C. CHOTHIA. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995), 536–540.

[9] S. SUBBIAH, D. V. LAURENTS AND M. LEVITT. Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Current Biol. 3 (1993), 141–148.

Exercises

1. Reflections. The reflection through a plane $h$ maps every point $x$ to the point $y$ such that $h$ crosses the line segment $xy$ orthogonally at its midpoint. The central reflection maps every point $x$ to its antipodal point $-x$.
(i) Show that every rotation is the composition of two plane reflections.
(ii) How many plane reflections do you need to represent the central reflection?

2. Sizes of spheres. The $(k-1)$-dimensional unit sphere consists of all points at unit distance from the origin of the $k$-dimensional Euclidean space: $\mathbb{S}^{k-1} = \{x \in \mathbb{R}^k : \|x\| = 1\}$. We know that the perimeter of $\mathbb{S}^1$ is $2\pi$, the area of $\mathbb{S}^2$ is $4\pi$, and the volume of $\mathbb{S}^3$ is $2\pi^2$. What is the $(k-1)$-dimensional volume of $\mathbb{S}^{k-1}$?

3. Square distance from planes. The square distance from a point $x = (x_1, x_2, x_3)$ to a point $p = (p_1, p_2, p_3)$, $\|x - p\|^2 = (x_1 - p_1)^2 + (x_2 - p_2)^2 + (x_3 - p_3)^2$, is also the sum of square distances from $x$ to the three planes parallel to the coordinate planes that pass through $p$.
(i) Show that the above claim holds for any three planes that pass through $p$ and pairwise enclose a right angle.
(ii) Are there triplets of planes enclosing non-right angles for which $\|x - p\|^2$ is equal to the sum of square distances from $x$ to the three planes?

4. Sum of square distances. Consider a collection of $n$ points $a_i$ in $\mathbb{R}^3$ and let $c = \frac{1}{n}\sum_i a_i$ be its centroid.
(i) Prove that for every point $x$ in space, the root mean square distance to the $a_i$ is the root of the square distance to the centroid plus a constant:
$$\Big(\frac{1}{n}\sum_{i=1}^{n}\|x - a_i\|^2\Big)^{1/2} = \big(\|x - c\|^2 + \kappa\big)^{1/2}.$$
What exactly is the constant $\kappa$?
(ii) Extend the construction to a collection of $n$ planes in $\mathbb{R}^3$. In other words, prove that there are three planes for which a similar formula gives the sum of square distances to the $n$ planes.
(iii) Further extend the construction to a collection of $n$ lines in $\mathbb{R}^3$.

5. Biased probability. Suppose Function UNIFORM picks a real number uniformly at random in $[0, 1]$.
(i) Show that the minimum of two numbers picked by Function UNIFORM is distributed according to the triangle density function $f(x) = 2 - 2x$.
(ii) How are the minimum, the median and the maximum of three numbers picked by Function UNIFORM distributed?

6. Sampling the 3-sphere. Prove that the following method picks a point uniformly at random on $\mathbb{S}^3$:
(i) Pick numbers $x_1, x_2, x_3, x_4$ uniformly at random in $[-1, 1]$.
(ii) If $s_1 = x_1^2 + x_2^2 \ge 1$ or $s_2 = x_3^2 + x_4^2 \ge 1$, then repeat Step (i), else let $t = \sqrt{(1 - s_1)/s_2}$ and return $(x_1, x_2, t x_3, t x_4)$.

7. Random rotation. Let us mark a point $x$ on the unit 2-sphere. For a rotation $q$, let $x_q$ be the image of $x$ under that rotation. Any density function over the space of rotations implies a density function over the 2-sphere. Prove that the uniform density of quaternions over $\mathbb{S}^3$ implies the uniform density of points over the 2-sphere.

8. Number of alignments. Recall that an alignment between two chains of $m$ and $n$ $\alpha$-carbon atoms that uses $s$ spaces can be represented by a matrix with two rows and $(m + n + s)/2$ columns. Assuming $m \le n$, we define $d = n - m$ and note that we need $d$ insertions just to make up for the difference in length. The remaining spaces are distributed over equally many insertions and deletions, so we define $k = (s - d)/2$.
(i) Show that $k \in \{0, 1, \ldots, m\}$ is a necessary and sufficient condition for the number of spaces in any alignment of the two chains.
(ii) What is the number of different alignments with a fixed number of spaces?
(iii) What is the total number of different alignments?
Chapter VIII

Deformation

VIII.1 Molecular Dynamics


VIII.2 Spheres in Motion
VIII.3 Rigidity
VIII.4 Shape Space
Exercises


VIII.1 Molecular Dynamics


Newton’s second law. [$F = ma$.]

Numerical integration. [Taylor expansion, different numerical methods (Euler, Verlet, leap-frog, Beeman, predictor-corrector).]

Hydrophobic surface area. [Weighted area and derivative (forward pointer to Chapter IX).]

Kinetic data structures. [Close neighbor lists, Delaunay triangulation or dual complex (forward pointers to Section VIII.2 and Chapter IX).]

VIII.2 Spheres in Motion


[Explain the slack in the Pie Volume Formula (with a forward pointer to Chapter IX).] [This topic relates to the possibility of drawing non-straight Voronoi-like decompositions [2].] [Define cross-sections of the complex of independent simplices and prove that each cross-section gives a different pie formula but the same measurement.]



[Dynamic Delaunay triangulations [3]. Linear motion in $\mathbb{R}^4$ instead of $\mathbb{R}^3$.]
[Predict collisions of spheres.]
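Collision prediction for spheres moving along straight lines reduces to a quadratic equation. Assuming centers $p + tu$ and $q + tw$ with radii $r$ and $s$, the spheres touch when $\|(p - q) + t(u - w)\| = r + s$. The sketch below (the function name is hypothetical) returns the smallest non-negative root, or None if the spheres never touch.

```python
import math


def collision_time(p, u, r, q, w, s):
    """Earliest t >= 0 at which spheres (p + t u, r) and (q + t w, s) touch, else None."""
    dp = [pi - qi for pi, qi in zip(p, q)]   # relative position
    dv = [ui - wi for ui, wi in zip(u, w)]   # relative velocity
    a = sum(x * x for x in dv)
    b = sum(x * y for x, y in zip(dp, dv))
    c = sum(x * x for x in dp) - (r + s) ** 2
    if c <= 0:
        return 0.0        # already touching or overlapping
    if a == 0:
        return None       # no relative motion, distance stays constant
    disc = b * b - a * c  # quarter of the discriminant of a t^2 + 2 b t + c
    if disc < 0:
        return None       # the closest approach exceeds r + s
    t = (-b - math.sqrt(disc)) / a
    return t if t >= 0 else None   # negative root: the spheres move apart
```

For example, two unit spheres at distance 10 approaching with relative speed 1 touch after 8 time units.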

Bibliographic notes.

[1] J. BASCH , L. J. G UIBAS AND L. Z HANG . Proximity prob-


lems on moving points. In “Proc. 13th Ann. Sympos. Com-
put. Geom., 1997”, 344–351.

[2] H. E DELSBRUNNER AND E. A. R AMOS . Inclusion-


exclusion complexes for pseudodisk collections. Discrete
Comput. Geom. 17 (1997), 287–306.

[3] M. A. FACELLO . Geometric techniques for molecular


shape analysis. Ph. D. thesis, Report UIUCDCS-R-96-
1967, Dept. Comput. Sci., Univ. Illinios, Urbana, Illinois,
1996.

VIII.3 Rigidity
[Discuss the pebble algorithm that analyzes the rigidity of
a graph in three dimensions.]

Bibliographic notes.
VIII.4 Shape Space

[Explain the mixing of two or more shapes as a generalization of 1-parametrized deformation. The problems of
(1) finding a good basis,
(2) finding the best approximation within the spanned space,
are both difficult. They are similar to fundamental questions on function representation, which are probably discussed in the approximation theory literature.]

The main functionality of the Morfi software is that it can smoothly morph one skin curve into another. In other words, it deforms the skin of one set of circles into the skin of another. The details of this deformation will be explained in Section VIII.4, where we discuss notions of similarity between two molecular skins. In this section, we merely illustrate the deformation and mention some of its features in passing. Figure VIII.1 shows the deformation of a skin curve defined by four circles into one defined by three circles. For each snapshot, we show the skin curve together with the dual complex. We note that any two contiguous bodies, except the last three in the sequence, differ by at least one change in homotopy type. Recall that the homotopy types of the body and the dual complex are always the same, which implies that they change their type the same way and at the same time. For the complex we observe two types of changes, caused by adding an edge or a triangle. The corresponding changes in the body are caused by creating a handle or filling a hole. There is a third type of change not seen in Figure VIII.1, which in the complex is caused by adding a vertex and in the body by creating a component.

Figure VIII.1: Ten snapshots of a deformation with skin and dual complex displayed. The skin in the fifth snapshot is the same as in the figures above.

Figure VIII.2: From left to right and top to bottom: the shapes at successive times of the deformation. The sequence is defined by a set of seven spheres forming a question mark at the initial time and a set of eight spheres forming a human-like figure at the final time.

Bibliographic notes. The Morfi software has been used in [2] to explain two-dimensional skin geometry and to illustrate its use in deforming two-dimensional shapes into each other. We note that these deformations are similar to but also different from the image morphs studied in computer graphics [3]. The goal there is photo realism, and possibly the most difficult problem towards achieving it is the construction of a one-to-one correspondence between features of the initial and the final images. The Morfi software creates a few-to-few correspondence through geometric considerations rather than working towards a one-to-one correspondence, which often does not exist. Similar to two dimensions, we can deform skin surfaces into each other by continuously changing the defining spheres. A canonical such method is explained in [1]. That method can be used to mix $k$ skin surfaces and thus create a shape space that encompasses $(k-1)$-variate deformations.

[1] H.-L. CHENG, P. FU AND H. EDELSBRUNNER. Shape space from deformation. Comput. Geom. Theory Appl. 19 (2001), 191–204.

[2] S.-W. CHENG, H. EDELSBRUNNER, P. FU AND K. P. LAM. Design and analysis of planar shape deformation. Comput. Geom. Theory Appl. 19 (2001), 205–218.

[3] G. WOLBERG. Recent advances in image morphing. In “Proc. Comput. Graphics Internat., 1996”, 64–71.

Exercises
The credit assignment reflects a subjective assessment of difficulty. Every question can be answered using the material presented in this chapter.

1. Section of triangulation. (2 credits). Let $K$ be a triangulation of a set of $n$ points in the plane. Let $\ell$ be a line that avoids all points. Prove that $\ell$ intersects at most $2n - 4$ edges of $K$ and that this upper bound is tight for every $n \geq 3$.
Chapter IX

Measures

There are various reasons why biologists want to measure the size of molecules. Volume is important in the calculation of free energy and in estimates of populations given a bound on the available space. Surface area is a resource consumed by molecular interactions and is probably even more relevant to research in structural biology than volume. This chapter will study three aspects of the size of space-filling diagrams: volume, surface area, and arc length.

Our general approach to measuring the size begins with indicator functions for convex polyhedra in $\mathbb{R}^d$. From these we will derive short inclusion-exclusion formulas for size measurements.

IX.1 Indicator Functions
IX.2 Volume and Surface Area
IX.3 Void Formulas
IX.4 Measuring Software
Exercises
IX.1 Indicator Functions

The Euler relation for convex polyhedra is a special case of the Euler-Poincaré theorem for complexes. There are elementary proofs for this special case, and this section presents one that is inductive.

Convex polyhedra. A convex polyhedron is the intersection of finitely many closed half-spaces. It is either bounded or unbounded, and both cases are illustrated in Figure IX.1. In the first case, the polyhedron is the convex hull of finitely many points, and in the second, it extends to infinity. We study polyhedra in $d$-dimensional space, keeping in mind that $d = 4$ is the most important dimension since polyhedra in $\mathbb{R}^4$ relate to molecules in $\mathbb{R}^3$, as we will see later.

Figure IX.1: A bounded convex polyhedron in $\mathbb{R}^2$ to the left and an unbounded one to the right.

Let $P$ be a convex polyhedron in $\mathbb{R}^d$ and assume it has non-empty interior. A hyperplane $h$ supports $P$ if it intersects the boundary but not the interior, $h \cap P \neq \emptyset$ and $h \cap \operatorname{int} P = \emptyset$. A face of $P$ is the intersection with a supporting hyperplane. The boundary is decomposed into faces of various dimensions, which are usually prefixed for clarity. For example, $P$ is a $d$-face of itself and the facets are the $(d-1)$-faces. Let $f_i$ be the number of $i$-faces. The Euler characteristic of $P$ is the alternating sum of proper faces, $\chi(P) = \sum_{i=0}^{d-1} (-1)^i f_i$. In the bounded case, the boundary is a $(d-1)$-dimensional topological sphere whose only non-zero Betti numbers are $\beta_0 = \beta_{d-1} = 1$. In the unbounded case, the boundary is an open $(d-1)$-dimensional topological ball whose only non-zero Betti number is $\beta_0 = 1$. Assuming general position, the dual of the boundary complex is a simplicial complex and the Euler-Poincaré Theorem stated in Section IV.3 implies the Euler relation for convex polyhedra:
$$\chi(P) = \sum_{i=0}^{d-1} (-1)^i f_i = \begin{cases} 1 + (-1)^{d-1} & \text{if } P \text{ is bounded},\\ (-1)^{d-1} & \text{if } P \text{ is unbounded}. \end{cases}$$

Below we will construct indicator functions of $P$ from Euler characteristics of subcomplexes of the boundary complex. The Euler relation will follow from elementary proofs of properties of these indicator functions.

Inclusion-exclusion. Let $H$ be the finite collection of half-spaces such that $P = \bigcap H$. For a subset $\sigma \subseteq H$ and a point $x \in \mathbb{R}^d$ we define
$$\mathbf{1}_\sigma(x) = \begin{cases} 1 & \text{if } x \notin h \text{ for all } h \in \sigma,\\ 0 & \text{otherwise}. \end{cases}$$
Note that $x$ is outside $P$ iff $\mathbf{1}_\sigma(x) = 1$ for at least one non-empty subset $\sigma$. Namely, if $x \notin P$ then it sees a facet from the outside and we have $\mathbf{1}_\sigma(x) = 1$ for the singleton set containing the half-space whose bounding hyperplane contains that facet. We form an alternating sum of the $\mathbf{1}_\sigma$ that leads to an indicator function for the convex polyhedron. The straightforward way of doing this is called the principle of inclusion-exclusion. Particularly, we define
$$\mathbf{1}_H(x) = \sum_{\sigma \subseteq H} (-1)^{\operatorname{card}\sigma}\, \mathbf{1}_\sigma(x). \qquad \text{(IX.1)}$$
The sum ranges over all subsets of $H$, including the empty set, for which $\mathbf{1}_\emptyset(x) = 1$ for all points $x$. We show that the non-zero terms cancel unless there is only one non-zero contribution to the sum, which comes from the empty set. To see this define $H_x = \{h \in H \mid x \notin h\}$ and $n_x = \operatorname{card} H_x$. Note that $\mathbf{1}_H(x) = \sum_{\sigma \subseteq H_x} (-1)^{\operatorname{card}\sigma}$, which is the alternating sum of subsets of $H_x$. This sum is
$$\sum_{i=0}^{n_x} (-1)^i \binom{n_x}{i} = (1 - 1)^{n_x} = 0,$$
provided $n_x \geq 1$. For $n_x = 0$ we get $H_x = \emptyset$ and $\mathbf{1}_H(x) = 1$. In words, $\mathbf{1}_H$ is an indicator function for $P$,
$$\mathbf{1}_H(x) = \begin{cases} 1 & \text{if } x \in P,\\ 0 & \text{if } x \notin P. \end{cases}$$

Truncation. Most of the terms in the exponentially long formula (IX.1) are redundant and can be removed. Specifically, we only keep the terms that correspond to faces of $P$. Each face is the intersection of the polyhedron with a subset of the hyperplanes bounding half-spaces in $H$,
$$F_\sigma = P \cap \bigcap_{h \in \sigma} \partial h.$$
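For $\sigma = \emptyset$ we get $F_\emptyset = P$, which we consider an improper face but still a face of $P$. It is convenient to assume general position, which in this context means that there are no two subsets of $H$ that define the same non-empty face. Let $\Sigma = \{\sigma \subseteq H \mid F_\sigma \neq \emptyset\}$ be the system of subsets that define non-empty faces. For sets $\sigma \in \Sigma$ there is an intuitive interpretation of $\mathbf{1}_\sigma(x)$. Call the face $F_\sigma$ visible from $x$ if $x$ sees all facets around $F_\sigma$ from the outside. Notice that according to this definition, the faces on the silhouette are not visible. Then $\mathbf{1}_\sigma(x) = 1$ iff $F_\sigma$ is visible from $x$. The restriction of the inclusion-exclusion formula (IX.1) to the system $\Sigma$ is
$$\mathbf{1}_\Sigma(x) = \sum_{\sigma \in \Sigma} (-1)^{\operatorname{card}\sigma}\, \mathbf{1}_\sigma(x). \qquad \text{(IX.2)}$$
We claim that even though $\mathbf{1}_\Sigma$ is much shorter than $\mathbf{1}_H$, it is still an indicator function of $P$. This claim is sufficiently important to warrant a complete proof.

PIE THEOREM A. For every point $x \in \mathbb{R}^d$,
$$\mathbf{1}_\Sigma(x) = \begin{cases} 1 & \text{if } x \in P,\\ 0 & \text{if } x \notin P. \end{cases}$$

PROOF. We use induction over the cardinality of the set $H_x$, which is again defined as the collection of half-spaces that do not contain $x$. The basis of the induction is covered by $n_x = 0$, in which case $x \in P$, only the empty set contributes, and $\mathbf{1}_\Sigma(x) = 1$, as required. Assume $n_x \geq 1$, let $g \in H_x$, and define $\bar g$ as the closed complement of $g$, which is a half-space that contains $x$. Define sets of half-spaces $H' = H - \{g\}$ and $H'' = H' \cup \{\bar g\}$. The corresponding systems are $\Sigma'$ and $\Sigma''$. The convex polyhedron $P' = \bigcap H'$ is obtained by removing the constraint $g$, and therefore $P' = P \cup P''$, where $P'' = \bigcap H''$, as shown in Figure IX.2. We distinguish three types of faces of $P'$: the ones contained in the interior of $g$, the ones crossing the hyperplane shared by $g$ and $\bar g$, and the ones contained in the interior of $\bar g$. The corresponding systems form the partition $\Sigma' = \Sigma_g \cup \Sigma_0 \cup \Sigma_{\bar g}$. Writing $\Sigma_0^g = \{\sigma \cup \{g\} \mid \sigma \in \Sigma_0\}$ and $\Sigma_0^{\bar g} = \{\sigma \cup \{\bar g\} \mid \sigma \in \Sigma_0\}$, the faces of $P$ are defined by sets in $\Sigma_g$, $\Sigma_0$, $\Sigma_0^g$, and the faces of $P''$ are defined by sets in $\Sigma_{\bar g}$, $\Sigma_0$, $\Sigma_0^{\bar g}$. The introduced systems partition $\Sigma$, $\Sigma'$, and $\Sigma''$. We can therefore write their values as sums of values of the subsystems,
$$\mathbf{1}_{\Sigma'} = \mathbf{1}_{\Sigma_g} + \mathbf{1}_{\Sigma_0} + \mathbf{1}_{\Sigma_{\bar g}}, \qquad \mathbf{1}_{\Sigma} = \mathbf{1}_{\Sigma_g} + \mathbf{1}_{\Sigma_0} + \mathbf{1}_{\Sigma_0^g}, \qquad \mathbf{1}_{\Sigma''} = \mathbf{1}_{\Sigma_{\bar g}} + \mathbf{1}_{\Sigma_0} + \mathbf{1}_{\Sigma_0^{\bar g}},$$
and hence
$$\mathbf{1}_{\Sigma} = \bigl(\mathbf{1}_{\Sigma'} - \mathbf{1}_{\Sigma''}\bigr) + \mathbf{1}_{\Sigma_0^{\bar g}} + \bigl(\mathbf{1}_{\Sigma_0} + \mathbf{1}_{\Sigma_0^g}\bigr).$$
We argue that all three terms on the right side of the equation for $\mathbf{1}_\Sigma(x)$ vanish. Both $H'$ and $H''$ have one less half-space not containing $x$ than $H$ does. The induction hypothesis thus applies. By assumption, $x \in \bar g$, so $x \in P'$ iff $x \in P''$, which implies that $\mathbf{1}_{\Sigma'}(x) = \mathbf{1}_{\Sigma''}(x)$ and therefore $\mathbf{1}_{\Sigma'}(x) - \mathbf{1}_{\Sigma''}(x) = 0$. The second term vanishes because all sets in $\Sigma_0^{\bar g}$ contain $\bar g$, which contains $x$. The third term vanishes because $\mathbf{1}_\sigma(x) = \mathbf{1}_{\sigma \cup \{g\}}(x)$ for all $\sigma \in \Sigma_0$, as $x \notin g$. Therefore $\mathbf{1}_{\Sigma_0}(x) + \mathbf{1}_{\Sigma_0^g}(x) = 0$ because the values cancel pairwise. It follows that $\mathbf{1}_\Sigma(x) = 0$, as required for $x \notin P$.

Figure IX.2: The half-spaces $g$ and $\bar g$ share the bounding hyperplane and are complementary to each other. The union of $P$ and $P''$ is $P'$.

Figure IX.3: The point $y$ lies in the intersection of the complements of the half-spaces.

Unbounded convex polyhedra. The Pie Theorem A implies the Euler relation for unbounded polyhedra. To see this, we fix a point $y$ outside all half-spaces in $H$, as in Figure IX.3, and rewrite the formula in the Pie Theorem A in terms of face numbers.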
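By assumption of general position, $f_i$ is the number of sets $\sigma \in \Sigma$ with cardinality $\operatorname{card}\sigma = d - i$. By the choice of $y$, we have $\mathbf{1}_\sigma(y) = 1$ for all $\sigma \in \Sigma$, and since $y \notin P$,
$$0 = \mathbf{1}_\Sigma(y) = \sum_{i=0}^{d} (-1)^{d-i} f_i,$$
where $f_d = 1$. This implies the Euler relation for unbounded convex polyhedra, $\sum_{i=0}^{d-1} (-1)^i f_i = (-1)^{d-1}$.

Restricting body. We need a slightly stronger version of the Pie Theorem A to prove the Euler relation for bounded convex polyhedra. We first weaken the theorem by restricting the points to lie within a convex body $B$, and then strengthen it by further reducing the set system. Define
$$\Sigma_B = \{\sigma \in \Sigma \mid F_\sigma \cap \operatorname{int} B \neq \emptyset\},$$
and let $\mathbf{1}_{\Sigma_B}$ be the corresponding sum of values. We show that for points $x \in B$, $\mathbf{1}_{\Sigma_B}$ is an indicator function for $P$.

PIE THEOREM B. For every point $x \in B$,
$$\mathbf{1}_{\Sigma_B}(x) = \begin{cases} 1 & \text{if } x \in P,\\ 0 & \text{if } x \notin P. \end{cases}$$

PROOF. We construct a convex polyhedron $C$ that contains $B$ and approximates it in the sense that a face of $P$ intersects the interior of $C$ iff it intersects the interior of $B$, as in Figure IX.4. Let $E$ be the set of half-spaces with $C = \bigcap E$, define $Q = P \cap C$, and use the Pie Theorem A to get
$$\mathbf{1}_{\Sigma_Q}(x) = \begin{cases} 1 & \text{if } x \in Q,\\ 0 & \text{if } x \notin Q, \end{cases}$$
where $\Sigma_Q$ is the system of subsets of $H \cup E$ that define non-empty faces of $Q$. By choice of $C$, every point $x \in B$ is contained in all half-spaces of $E$. Hence $\mathbf{1}_\sigma(x) = 0$ if $\sigma$ contains a half-space of $E$. The system $\Sigma_B$ contains exactly all sets $\sigma \in \Sigma_Q$ for which $\sigma \subseteq H$. Hence $\mathbf{1}_{\Sigma_B}(x) = \mathbf{1}_{\Sigma_Q}(x)$ for all points $x \in B$, and since $Q \cap B = P \cap B$, the claim follows for all points $x \in B$.

Figure IX.4: Three edges and one vertex of $P$ intersect the interior of $B$, and the same edges and vertex intersect the interior of $C$.

Bounded convex polyhedra. We return to the computation of the Euler characteristic, this time for a bounded convex polyhedron $P$. We choose a line not parallel to any face of $P$ and points $y$ and $z$ sufficiently far in opposite directions on the line. As illustrated in Figure IX.5, this partitions $H$ into the set $Y$ of half-spaces that do not contain $y$ and the set $Z$ of half-spaces that do not contain $z$. Let $P_Y = \bigcap Y$ and $P_Z = \bigcap Z$, so that $P = P_Y \cap P_Z$. Each proper face of $P$ is defined by a subset of $Y$, by a subset of $Z$, or belongs to the silhouette as seen in a view parallel to the chosen line. Let $f_i^Y$ be the number of $i$-faces of $P_Y$ that have non-empty intersection with the interior of $P_Z$, and define $f_i^Z$ symmetrically. Let $s_i$ be the number of $i$-faces in the silhouette. The projection of the silhouette onto a hyperplane normal to the line is the boundary of a bounded convex polyhedron of dimension $d - 1$. We can now argue inductively that the Euler characteristic of $P$ is $1 + (-1)^{d-1}$. For $d = 1$, $P$ is a closed interval with $f_0 = 2$, which establishes the induction basis. For $d \geq 2$ we have
$$f_i = f_i^Y + f_i^Z + s_i \qquad \text{for } 0 \leq i \leq d - 1.$$
Observe that this sum counts every $i$-face the same number of times on both sides: each proper face of $P$ is counted once on the left and exactly once on the right, in whichever of the three classes it falls. We get
$$\sum_{i=0}^{d-1} (-1)^i f_i^Y = \sum_{i=0}^{d-1} (-1)^i f_i^Z = (-1)^{d-1}$$
by the Pie Theorem B applied at the point $y$ (respectively $z$), using the respective other convex polyhedron as the restricting convex body $B$ and noting that $\mathbf{1}_\sigma(y) = 1$ for every $\sigma \subseteq Y$. Furthermore, $\sum_{i=0}^{d-2} (-1)^i s_i = 1 + (-1)^{d-2}$ by induction hypothesis. Adding the alternating sums of the $f_i^Y$, $f_i^Z$, and $s_i$ implies $\sum_{i=0}^{d-1} (-1)^i f_i = 1 + (-1)^{d-1}$, as required.

Figure IX.5: The boundary of $P_Y$ is dotted, that of $P_Z$ is solid, and the silhouette is indicated by the two hollow vertices.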
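Formulas (IX.1) and (IX.2) can be checked numerically in the plane. The sketch below is a minimal illustration under our own conventions, not code from the text: a half-plane is a triple $(a, b, c)$ standing for $\{x \mid a x_1 + b x_2 \leq c\}$, and the polygon's half-planes are listed in cyclic order, so that the system $\Sigma$ (the empty set for the polygon, singletons for the edges, cyclically adjacent pairs for the vertices) can be written down directly instead of being computed.

```python
import math
from itertools import combinations


def outside(h, x):
    """True if x lies outside the closed half-plane h = (a, b, c): a*x0 + b*x1 <= c."""
    a, b, c = h
    return a * x[0] + b * x[1] > c


def indicator(sigma, x):
    """1_sigma(x): 1 if x lies outside every half-plane in sigma (1 for the empty set)."""
    return 1 if all(outside(h, x) for h in sigma) else 0


def full_sum(H, x):
    """Formula (IX.1): alternating sum over all 2^n subsets of H."""
    return sum((-1) ** k * indicator(sigma, x)
               for k in range(len(H) + 1)
               for sigma in combinations(H, k))


def pie_sum(H, x):
    """Formula (IX.2) for a convex polygon whose half-planes are in cyclic order:
    Sigma = {empty set} + edges (singletons) + vertices (adjacent pairs)."""
    n = len(H)
    system = [()] + [(h,) for h in H] + [(H[i], H[(i + 1) % n]) for i in range(n)]
    return sum((-1) ** len(s) * indicator(s, x) for s in system)


def inside(H, x):
    return not any(outside(h, x) for h in H)


# Unit square and a regular pentagon, half-planes in counterclockwise order.
square = [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
pentagon = [(math.cos(2 * math.pi * i / 5), math.sin(2 * math.pi * i / 5), 1.0)
            for i in range(5)]
```

For a point outside the polygon, the visible edges form a chain of $m$ edges with $m - 1$ visible vertices between them, so the truncated sum is $1 - m + (m - 1) = 0$, matching the Pie Theorem A with only $2n + 1$ terms instead of $2^n$.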

Bibliographic notes. Most of the material in this section is taken from [2], where the inclusion-exclusion approach to measuring the union of balls is laid out. As demonstrated, this principle also yields the Euler relation for convex polyhedra. The discovery of that relation for convex polyhedra in three dimensions is usually attributed to Leonhard Euler [3, 4], although there is evidence that René Descartes knew about it a century earlier. There are many proofs of that relation, and the historically first one for the general $d$-dimensional case goes back to the work of Ludwig Schläfli [7] in the middle of the nineteenth century. He implicitly assumes that the boundary complex of every convex polyhedron is shellable, which was not established until 1972 by Bruggesser and Mani [1], who thus filled the gap left in Schläfli’s proof.
We note that all authors of papers referenced in this section are Swiss, except for one who has a Swiss grandmother. Indeed, finding elementary proofs of the Euler relation for convex polyhedra seems to be a favorite topic for Swiss mathematicians [5, 6].

[1] H. BRUGGESSER AND P. MANI. Shellable decompositions of cells and spheres. Math. Scand. 29 (1972), 197–205.

[2] H. EDELSBRUNNER. The union of balls and its dual shape. Discrete Comput. Geom. 13 (1995), 415–440.

[3] L. EULER. Elementa doctrinae solidorum. Novi Comm. Acad. Sci. Imp. Petropol 4 (1752/53), 109–140.

[4] L. EULER. Demonstratio nonnullarum insignium proprietatum, quibus solida hedris planis inclusa sunt praedita. Novi Comm. Acad. Sci. Imp. Petropol 4 (1752/53), 140–160.

[5] H. HADWIGER. Eulers Charakteristik und kombinatorische Geometrie. J. Reine Angew. Math. 194 (1955), 101–110.

[6] W. NEF. Zur Einführung der Eulerschen Charakteristik. Monatsh. Math. 92 (1981), 41–46.

[7] L. SCHLÄFLI. Theorie der vielfachen Kontinuität. Written 1850–52 and published in Denkschrift der Schweizerischen naturforschenden Gesellschaft 38 (1901), 1–237.
IX.2 Volume and Surface Area

In this section, we use the indicator functions developed in Section IX.1 to derive inclusion-exclusion formulas for the volume, area, and total arc length of a space-filling diagram.

Volume by integration. By definition, the indicator function of a geometric set is 1 inside and 0 outside the set. We can therefore compute its volume by integration. Consider for example a bounded convex body $B$ and a convex polyhedron $P = \bigcap H$. Let $\Sigma_B$ be the system of subsets of $H$ that appears in the statement of the Pie Theorem B in the last section. The volume of the intersection of the two convex bodies is
$$\operatorname{vol}(P \cap B) = \int_B \mathbf{1}_{\Sigma_B}(x)\, dx = \sum_{\sigma \in \Sigma_B} (-1)^{\operatorname{card}\sigma} \operatorname{vol}\Bigl(B \cap \bigcap_{h \in \sigma} \bar h\Bigr),$$
where $\bar h$ is the closed complement of the half-space $h$. Assuming general position, the sets $\sigma \in \Sigma_B$ contain $d$ or fewer half-spaces each. For measuring molecules, we are mostly interested in the case $d = 4$, in which the volume is a sum of terms each involving four or fewer half-spaces.

In $d = 3$ dimensions, the above formula gives a proof of the area formula for spherical triangles. Recall that $\mathbb{S}^2$ is the unit sphere centered at the origin $0$. Let $H$ be a set of three half-spaces whose bounding planes pass through $0$. The half-spaces intersect in an unbounded triangular cone, and the intersection with the ball bounded by $\mathbb{S}^2$ is a pyramid whose base is a spherical triangle, as shown in Figure IX.6. Let $\alpha$, $\beta$, and $\gamma$ be the dihedral angles between the planes, or equivalently, the angles of the spherical triangle. The volume of the pyramid can now be computed by taking the ball, subtracting three half-balls, adding three sectors, and subtracting the reflected pyramid, $-P$. The sector between two planes enclosing the angle $\alpha$ has volume $\frac{\alpha}{2\pi} \cdot \frac{4\pi}{3} = \frac{2\alpha}{3}$. Writing $V$ for the volume of the pyramid, it follows that
$$V = \frac{4\pi}{3} - 3 \cdot \frac{2\pi}{3} + \frac{2}{3}(\alpha + \beta + \gamma) - V, \qquad\text{and therefore}\qquad V = \frac{\alpha + \beta + \gamma - \pi}{3}.$$
The area of the spherical triangle is three times the volume divided by the radius of the sphere. That radius is one, which implies that the area of the spherical triangle is $\alpha + \beta + \gamma - \pi$.

Figure IX.6: A pyramid cut out of a ball by three half-spaces.

Stereographic projection. We now turn to the problem of measuring the union of a finite set of balls $\mathcal{B}$ in $\mathbb{R}^3$. We transform the question into one about half-spaces in $\mathbb{R}^4$. Let $\mathbb{S}^3$ be the unit 3-sphere with center at the origin of $\mathbb{R}^4$ and identify $\mathbb{R}^3$ with the hyperplane $x_4 = 0$. Call $N = (0, 0, 0, 1)$ the north-pole of $\mathbb{S}^3$. The stereographic projection $\pi : \mathbb{S}^3 - \{N\} \to \mathbb{R}^3$ maps a point $x$ to the point $\pi(x) \in \mathbb{R}^3$ collinear with $x$ and $N$. The map is bijective and therefore has an inverse. If applied to all points of a ball $b$ in $\mathbb{R}^3$, $\pi^{-1}$ gives a cap of $\mathbb{S}^3$, which is the intersection of the 3-sphere with a half-space $h$. This is illustrated in Figure IX.7. The half-space $h$ lies on the side of its hyperplane that does not contain the north-pole, so its closed complement $\bar h$ does contain $N$. Let $\bar H$ be the collection of half-spaces $\bar h$ that contain the north-pole, one for each ball, and let $P = \bigcap \bar H$. Then $\bigcup\mathcal{B}$ is the stereographic projection of the portion of $\mathbb{S}^3$ that is not contained in the interior of $P$.

Figure IX.7: Stereographic projection from $\mathbb{S}^3$ to $\mathbb{R}^3$.
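The correspondence between balls and half-spaces can be made explicit. Assuming the unit 3-sphere and the north-pole $N = (0, 0, 0, 1)$ as above, the inverse projection is $\pi^{-1}(p) = (2p, \|p\|^2 - 1)/(\|p\|^2 + 1)$, and substituting it into $\|p - c\|^2 \leq r^2$ gives the half-space $-2c \cdot X' + (1 - \|c\|^2 + r^2) X_4 \leq r^2 - \|c\|^2 - 1$, where $X'$ denotes the first three coordinates of $X$. This derivation is ours rather than the text's; the sketch below, with hypothetical function names, checks it on random points and confirms that $N$ lies outside the half-space.

```python
def lift(p):
    """Inverse stereographic projection of p in R^3 onto the unit 3-sphere in R^4,
    projecting from the north-pole N = (0, 0, 0, 1)."""
    s = sum(x * x for x in p)
    return [2 * x / (s + 1) for x in p] + [(s - 1) / (s + 1)]


def ball_to_halfspace(c, r):
    """Half-space {X : a . X <= beta} of R^4 cutting the lifted image of the ball
    with center c and radius r out of the 3-sphere."""
    cc = sum(x * x for x in c)
    a = [-2 * x for x in c] + [1 - cc + r * r]
    beta = r * r - cc - 1
    return a, beta


def in_halfspace(h, X, eps=1e-12):
    a, beta = h
    return sum(ai * xi for ai, xi in zip(a, X)) <= beta + eps
```

On the sphere, $a \cdot X - \beta$ equals $\frac{2}{\|p\|^2 + 1}(\|p - c\|^2 - r^2)$, so membership in the ball and in the half-space agree exactly, and $a \cdot N - \beta = 2 > 0$ places the north-pole on the other side.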
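Union of balls. Instead of computing the volume of $\bigcup\mathcal{B}$ directly, we compute the volume of $\mathbb{B}^4 - P$, where $\mathbb{B}^4$ is the 4-ball bounded by $\mathbb{S}^3$. Let $\Sigma$ be the system of subsets of $\bar H$ that appears in the Pie Theorem B, with $\mathbb{B}^4$ as the restricting convex body. The volume of the portion of $\mathbb{B}^4$ outside the polyhedron is
$$\operatorname{vol}(\mathbb{B}^4 - P) = \operatorname{vol}(\mathbb{B}^4) - \operatorname{vol}(\mathbb{B}^4 \cap P) = \sum_{\sigma \in \Sigma - \{\emptyset\}} (-1)^{\operatorname{card}\sigma + 1} \operatorname{vol}\Bigl(\mathbb{B}^4 \cap \bigcap_{\bar h \in \sigma} h\Bigr).$$
We could now get a formula for $\operatorname{vol}(\bigcup\mathcal{B})$ by scaling the volume by the distortion factor of $\pi$. A more straightforward derivation of a formula for the ball union translates the inclusion-exclusion formula from $\mathbb{R}^4$ to $\mathbb{R}^3$. Instead of the system of half-spaces we now use a system of balls obtained by substituting each ball $b$ for its half-space $h$. For convenience, we use the same notation, namely $\Sigma$ for the system of balls and $\sigma$ for a generic set in $\Sigma$.

PIE VOLUME FORMULA. The volume of the union of a finite set of balls $\mathcal{B}$ is
$$\operatorname{vol}\Bigl(\bigcup\mathcal{B}\Bigr) = \sum_{\sigma \in \Sigma - \{\emptyset\}} (-1)^{\operatorname{card}\sigma + 1} \operatorname{vol}\Bigl(\bigcap\sigma\Bigr).$$

Dual complex, revisited. We observe that the index system $\Sigma$ in the Pie Volume Formula is an abstraction of the dual complex $K$ of $\mathcal{B}$. Instead of proving this algebraically, we explain the connection in geometric pictures. Start with $\mathbb{R}^3$ and $\mathbb{S}^3$ embedded in $\mathbb{R}^4$ as suggested in Figure IX.7. For each ball $b$ we get a half-space $\bar h$, and the intersection of these half-spaces is a convex polyhedron $P$, which contains the north-pole in its interior. Use $\pi$ to project the boundary complex of $P$ to $\mathbb{R}^3$. This is the weighted Voronoi diagram of $\mathcal{B}$. A subset $\sigma \subseteq \bar H$ belongs to $\Sigma$ iff its corresponding face of $P$ has non-empty intersection with the interior of the ball $\mathbb{B}^4$ bounded by $\mathbb{S}^3$. But this is also the condition for the projection of that face to have non-empty intersection with the interior of $\bigcup\mathcal{B}$. Hence, a non-empty set of half-spaces is in $\Sigma$ iff the corresponding set of balls defines a simplex in the dual complex. We have arrived at a simple interpretation of the Pie Volume Formula: construct the dual complex of $\mathcal{B}$ and do inclusion-exclusion with a term for every simplex in the dual complex. This is illustrated in Figure IX.8.

Figure IX.8: The area of the union is the sum of eight disk areas minus the sum of nine pairwise intersection areas, plus the sum of two triple-wise intersection areas.

Area and length. Similar to volume, we get a Pie Area Formula for the surface area of $\bigcup\mathcal{B}$,
$$\operatorname{area}\Bigl(\bigcup\mathcal{B}\Bigr) = \sum_{\sigma \in \Sigma} (-1)^{\operatorname{card}\sigma + 1} \operatorname{area}\Bigl(\bigcap\sigma\Bigr),$$
where the area of a body is measured on its boundary. For $\sigma = \emptyset$ we get $\bigcap\sigma = \mathbb{R}^3$ and therefore a zero contribution to the area. To prove this formula, we add the contributions of individual spheres. For a single sphere, we use the Pie Volume Formula on the set of caps defined by intersecting balls. Since the caps are two-dimensional, the volume formula becomes an area formula. Letting $S$ be the sphere and $\mathcal{C}$ the set of caps, the area of $S \cap \partial\bigcup\mathcal{B}$ is the area of $S$ minus the alternating sum of the areas of cap intersections,
$$\operatorname{area}(S) - \sum_{\tau \in \Sigma_S - \{\emptyset\}} (-1)^{\operatorname{card}\tau + 1} \operatorname{area}\Bigl(\bigcap\tau\Bigr),$$
where $\Sigma_S$ is the abstraction of the dual complex of $\mathcal{C}$. For each set of caps in the system $\Sigma_S$, we have the corresponding set of balls together with the ball of $S$ in the system $\Sigma$ of $\mathcal{B}$. By summing over all balls, we get the Pie Area Formula given above.

Similarly, we can get a Pie Length Formula that measures the total length of the circular arcs in the boundary of the union of balls,
$$\operatorname{length}\Bigl(\bigcup\mathcal{B}\Bigr) = \sum_{\sigma \in \Sigma} (-1)^{\operatorname{card}\sigma} \operatorname{length}\Bigl(\bigcap\sigma\Bigr),$$
where $\operatorname{length}(\bigcap\sigma)$ is the total length of the circular arcs in the boundary of $\bigcap\sigma$. The sets $\sigma$ with one or no ball are redundant because $\operatorname{length}(\bigcap\sigma) = 0$ in these cases. The proof of the formula is similar to the one for area, except that the summation is done over all circles that are intersections of two spheres forming a pair in $\Sigma$. For each such circle, we apply the (one-dimensional) Pie Volume Formula and thus get an expression whose terms correspond to the simplices in the star of the pair.

We might even go one step further and consider the number of vertices of $\bigcup\mathcal{B}$. The inclusion-exclusion formula suggests that this number is the alternating sum of vertex numbers of common intersections of balls. For each triple in $\Sigma$ we have a three-sided spindle with two vertices, and for each quadruple we have a rounded tetrahedron with four vertices. For two or fewer balls we have no vertices. It follows that in the generic case, the number of vertices of $\bigcup\mathcal{B}$ is twice the number of triangles minus four times the number of tetrahedra in the dual complex.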
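The Pie Volume Formula in the plane can be tried on a configuration small enough to work out the dual complex by hand. In the sketch below (our own example, with hypothetical function names), three disks of radius 1.1 are centered at $(0, 0)$, $(1, 0)$, and $(2, 0)$. Their Voronoi cells are separated by the vertical lines $x = 0.5$ and $x = 1.5$, so the dual complex consists of the three vertices and the two edges 01 and 12; we hard-code it rather than compute it. The outer disks do overlap, but their lens lies inside the middle disk, so the two terms the dual complex omits (the pair 02 and the triple) cancel in the full inclusion-exclusion sum. A grid count serves as an independent check.

```python
import math


def lens_area(r1, r2, d):
    """Area of the intersection of two disks with radii r1, r2 and center distance d."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(k)


def pie_area(disks, simplices):
    """Alternating sum over a hand-supplied index system (vertices and edges only)."""
    total = 0.0
    for s in simplices:
        if len(s) == 1:
            _, r = disks[s[0]]
            total += math.pi * r * r
        elif len(s) == 2:
            (c1, r1), (c2, r2) = disks[s[0]], disks[s[1]]
            total -= lens_area(r1, r2, math.dist(c1, c2))
        else:
            raise ValueError("this sketch handles vertices and edges only")
    return total


def grid_area(disks, h=0.01):
    """Midpoint-rule estimate of the area of the union, used only as a cross-check."""
    x0 = min(c[0] - r for c, r in disks)
    x1 = max(c[0] + r for c, r in disks)
    y0 = min(c[1] - r for c, r in disks)
    y1 = max(c[1] + r for c, r in disks)
    nx, ny = int((x1 - x0) / h) + 1, int((y1 - y0) / h) + 1
    count = 0
    for i in range(nx):
        x = x0 + (i + 0.5) * h
        for j in range(ny):
            y = y0 + (j + 0.5) * h
            if any((x - c[0]) ** 2 + (y - c[1]) ** 2 <= r * r for c, r in disks):
                count += 1
    return count * h * h


disks = [((0.0, 0.0), 1.1), ((1.0, 0.0), 1.1), ((2.0, 0.0), 1.1)]
dual = [(0,), (1,), (2,), (0, 1), (1, 2)]  # edge (0, 2) and the triangle are absent
```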

Bibliographic notes. In 1992, Naiman and Wynn proved that the volume of a finite union of congruent balls can be expressed by an inclusion-exclusion formula whose terms correspond to the simplices in the Delaunay triangulation of the centers [4]. Edelsbrunner generalized the formula to allow for balls of different sizes and strengthened it by using the dual complex as the index system [1]. The material in this section is taken from that paper. The proof of the volume formula uses the inverse $\pi^{-1}$ of the stereographic projection to transform balls in $\mathbb{R}^3$ to half-spaces in $\mathbb{R}^4$. That projection is conformal (preserves angles) and has a number of other nice properties, many of which can be found in the book by Thurston [5].

Just as a union of balls in $\mathbb{R}^3$ corresponds to a convex polyhedron in $\mathbb{R}^4$, a union of intersections of balls corresponds to a union of intersections of half-spaces. The latter is Hadwiger’s notion of a not necessarily convex polyhedron [3]. Inclusion-exclusion formulas for such polyhedra can be found in [2].

[1] H. EDELSBRUNNER. The union of balls and its dual shape. Discrete Comput. Geom. 13 (1995), 415–440.

[2] H. EDELSBRUNNER. Algebraic decomposition of non-convex polyhedra. In “Proc. 36th Ann. IEEE Sympos. Found. Comput. Sci., 1995”, 248–257.

[3] H. HADWIGER. Vorlesungen über Inhalt, Oberfläche und Isoperimetrie. Springer, Berlin, 1957.

[4] D. Q. NAIMAN AND H. P. WYNN. Inclusion-exclusion Bonferroni identities and inequalities for discrete tube-like problems via Euler characteristics. Ann. Statist. 20 (1992), 43–76.

[5] W. P. THURSTON. Three-Dimensional Geometry and Topology, Volume 1. Edited by S. Levy, Princeton Univ. Press, New Jersey, 1997.
IX.3 Void Formulas

This section derives another collection of inclusion-exclusion formulas that express the volume, surface area, and arc length of a union of balls in $\mathbb{R}^3$. The new collection leads to formulas for voids, which are bounded components of the space outside the union.

Angles of revolution. A (one-dimensional) angle is by definition the length of a unit circle arc and can assume any value between 0 and $2\pi$. A two-dimensional angle is the area of a piece of the unit 2-sphere and can assume any value between 0 and $4\pi$. It is convenient to normalize so that in both cases the full angle is 1 and every angle is a fraction of the full angle. This definition can be used in any dimension. For example, the 0-sphere is a pair of points with possible subsets the empty set, a single point, or both points. The only zero-dimensional angles are therefore 0, $\frac{1}{2}$, and 1, and we will see shortly that this convention makes perfect sense when we compute volume using angles.

Consider for example a tetrahedron $\tau$. For each face $\nu$ of $\tau$, we define the angle $\angle_\nu$ as the fraction of directions around $\nu$ along which we enter $\tau$. Equivalently, $\angle_\nu$ is the volume fraction of a sufficiently small ball centered at an interior point of $\nu$ that lies inside the tetrahedron. Figure IX.9 illustrates the definition. In $\mathbb{R}^3$ we refer to the two-dimensional angle at a vertex as a solid angle, and the one-dimensional angle at an edge as a dihedral angle. The zero-dimensional angle of a triangle is always $\frac{1}{2}$. For convenience, we also define the angles of the improper faces of $\tau$ as $\angle_\tau = 1$ and $\angle_\emptyset = 0$.

Figure IX.9: The solid angle at a vertex, the dihedral angle at an edge, and the zero-dimensional angle of a triangle.

Independent triangles and tetrahedra. Recall that a collection of three disks in $\mathbb{R}^2$ is independent if for every subset of the three disks, there is a point inside every disk in the subset and outside every disk not in the subset. This condition is equivalent to the three circles decomposing $\mathbb{R}^2$ into eight regions in the way shown in Figure IX.10. Let $\alpha$, $\beta$, and $\gamma$ be the (normalized) angles at the vertices $a$, $b$, and $c$ of the triangle spanned by the three centers. The left drawing suggests that the area of the triangle is
$$\operatorname{area}(abc) = \alpha A_a + \beta A_b + \gamma A_c - \tfrac{1}{2}\bigl(A_{ab} + A_{ac} + A_{bc}\bigr) + A_{abc},$$
where we write $A_a$ for the area of the disk with center $a$, $A_{ab}$ for the area of the intersection of the disks with centers $a$ and $b$, and so on. If we change the meaning of these symbols from area to perimeter we get
$$0 = \alpha A_a + \beta A_b + \gamma A_c - \tfrac{1}{2}\bigl(A_{ab} + A_{ac} + A_{bc}\bigr) + A_{abc}.$$
Both formulas hold whenever the three disks are independent, but the right drawing in Figure IX.10 indicates that there are cases where the formulas are not as obvious as to the left.

Figure IX.10: Both triangles are spanned by the centers of three independent disks.

We generalize the formulas for independent triangles to independent tetrahedra. To simplify the notation, we drop the distinction between abstract and geometric simplices. Specifically, we let $\tau$ denote an independent set of four balls and, at the same time, the tetrahedron spanned by the four ball centers. We use similar conventions for triangles, edges, and vertices.

INDEPENDENT VOLUME FORMULA. The volume of an independent tetrahedron $\tau$ is
$$\operatorname{vol}(\tau) = \sum_{\nu \subseteq \tau} (-1)^{\dim\nu} \angle_\nu \operatorname{vol}\Bigl(\bigcap\nu\Bigr).$$

The proof of the formula is somewhat technical and omitted. Similar to the two-dimensional case, we get sums that evaluate to zero if we replace volume by area or length,
$$0 = \sum_{\nu \subseteq \tau} (-1)^{\dim\nu} \angle_\nu \operatorname{area}\Bigl(\bigcap\nu\Bigr), \qquad 0 = \sum_{\nu \subseteq \tau} (-1)^{\dim\nu} \angle_\nu \operatorname{length}\Bigl(\bigcap\nu\Bigr).$$
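The area formula for an independent triangle can be checked by integrating its right-hand side pointwise. The sketch below assumes three congruent disks of radius 0.7 centered at the vertices of a unit equilateral triangle, a configuration we chose because it is independent: each of the eight subsets of disks is realized by some point. It accumulates $\alpha\,1_{D_a} + \beta\,1_{D_b} + \gamma\,1_{D_c} - \frac{1}{2}(1_{D_{ab}} + 1_{D_{ac}} + 1_{D_{bc}}) + 1_{D_{abc}}$ on a midpoint grid and compares the total with the triangle area $\sqrt{3}/4$. The function names are hypothetical.

```python
import math


def normalized_angle(p, q, r):
    """Interior angle at p of triangle pqr, as a fraction of the full angle 2*pi."""
    u = (q[0] - p[0], q[1] - p[1])
    v = (r[0] - p[0], r[1] - p[1])
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(math.atan2(cross, dot)) / (2 * math.pi)


def independent_triangle_sum(centers, radius, h=0.004):
    """Grid estimate of  sum_v angle_v A_v - 1/2 sum_e A_e + A_abc  for three
    congruent disks, integrating the weighted indicator functions pointwise."""
    a, b, c = centers
    weights = [normalized_angle(a, b, c), normalized_angle(b, c, a),
               normalized_angle(c, a, b)]
    x0 = min(p[0] for p in centers) - radius
    x1 = max(p[0] for p in centers) + radius
    y0 = min(p[1] for p in centers) - radius
    y1 = max(p[1] for p in centers) + radius
    r2 = radius * radius
    nx, ny = int((x1 - x0) / h) + 1, int((y1 - y0) / h) + 1
    total = 0.0
    for i in range(nx):
        x = x0 + (i + 0.5) * h
        for j in range(ny):
            y = y0 + (j + 0.5) * h
            inside = [(x - px) ** 2 + (y - py) ** 2 <= r2 for px, py in centers]
            k = sum(inside)
            if k == 0:
                continue
            value = sum(w for w, flag in zip(weights, inside) if flag)  # vertex terms
            value -= 0.5 * (k * (k - 1) // 2)  # every pair of disks containing the point
            if k == 3:
                value += 1.0                     # triple intersection, angle 1
            total += value
    return total * h * h


centers = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
estimate = independent_triangle_sum(centers, 0.7)
triangle_area = math.sqrt(3) / 4
```

The grid only approximates each disk and lens, so agreement is expected up to discretization error of order $10^{-3}$ for this step size.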
136 IX M EASURES

Angle weights. We derive a new volume formula for a the same formulas for area and length, except that the first
union of balls by combining the Pie Volume and the In- sum vanishes:
dependent Volume Formulas. We first make the Pie Vol-
 
  

 .     
 

   


ume Formula more complicated and then simplify by can- 

 .   
celling terms. It is convenient to cover the portion of 
outside the Delaunay triangulation with tetrahedra. This 
  

  


can be done by adding four points viewed as degenerate
balls to the set . We start with the Pie Volume Formula,      

= 

 
 . 
  
 =


   
Voids. As defined earlier, a void of a union of balls is a
bounded component of the complement space,  . 



Figure IX.11 illustrates the fact that every void of is 

and decompose into the parts defined by the tetrahedra contained in a void of . From a point inside the void,
that contain as a face,

=  
  
 




We need some notation to continue. Let T(K) denote the set of tetrahedra in a simplicial complex K. Furthermore, for a subcomplex L ⊆ D, let P(L) denote the collection of pairs (σ, τ) with σ ∈ L, τ ∈ T(D), and σ a face of τ. With this notation we can rewrite the Pie Volume Formula as

$$\mathrm{vol}\Big(\bigcup B\Big) = \sum_{(\sigma,\tau) \in P(K)} (-1)^{\dim \sigma}\, \phi_{\sigma,\tau}\, \mathrm{vol}\Big(\bigcap B_\sigma\Big),$$

where D is the Delaunay triangulation of B and φ_{σ,τ} is the angle of τ at σ, measured as a fraction of a full angle. For example, for a tetrahedron σ, the only coface in T(D) is σ itself, the angle is 1, and the contributed term is −vol(∩B_σ), as before. For triangles, edges, and vertices σ, the contribution is split up into as many pieces as there are angles around σ. Whenever τ is a tetrahedron in K, we use the Independent Volume Formula to make a substitution. This results in the new volume formula. We write |K| for the underlying space of K.

ANGLE-WEIGHTED PIE VOLUME FORMULA. The volume of the union of a finite set of balls B in R³ is

$$\mathrm{vol}\Big(\bigcup B\Big) = \mathrm{vol}\,|K| + \sum_{(\sigma,\tau) \in P(K),\ \tau \notin T(K)} (-1)^{\dim \sigma}\, \phi_{\sigma,\tau}\, \mathrm{vol}\Big(\bigcap B_\sigma\Big).$$

The new formula suggests we compute the volume in two steps. First we compute the volume of the underlying space of K itself, and second we add the volume of the fringe, vol(∪B − |K|). Observe that not all pieces considered in the second sum are subsets of the fringe; some might reach into the interior of |K|. Nevertheless, the second sum is exactly the volume of the fringe.

Figure IX.11: Each of the two voids in the union of disks is contained in a corresponding void of the dual complex.

Seen from a point outside all balls and voids, the union of balls looks a lot like |K|. It is therefore not surprising that we can rewrite the Angle-weighted Pie Volume Formula to get an expression for the volume of a void V of ∪B. The corresponding void in |K| is triangulated by a subset Δ of the Delaunay triangulation. Strictly speaking, Δ is not a triangulation because it is not even a complex: it misses the simplices that bound the void in |K|. The most straightforward translation of the angle-weighted formula suggests we compute the volume of V by first computing the volume of the corresponding void in |K| and then subtracting the volume of the fringe that reaches into that void.

VOID VOLUME FORMULA. The volume of a void V of ∪B with dual set Δ is

$$\mathrm{vol}\,V = \sum_{\tau \in T(\Delta)} \mathrm{vol}\,\tau \;-\; \sum_{(\sigma,\tau) \in P(K),\ \tau \in T(\Delta)} (-1)^{\dim \sigma}\, \phi_{\sigma,\tau}\, \mathrm{vol}\Big(\bigcap B_\sigma\Big).$$
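The angle-weighted formulas need, for each tetrahedron τ and each face σ of τ, the angle φ_{σ,τ} as a fraction of a full angle. This is 1 if σ = τ, 1/2 if σ is a triangle, the dihedral angle divided by 2π if σ is an edge, and the solid angle divided by 4π if σ is a vertex. The following Python sketch computes these fractions from the four corner points of a tetrahedron. The function names are our own and not those of the Alpha Shapes software. The solid angle uses the Van Oosterom and Strackee formula, which is one standard choice among several. As a sanity check, the alternating sum of all fractions over vertices, edges, triangles, and the tetrahedron itself vanishes, which is Gram's angle sum relation for a tetrahedron.

```python
import itertools
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def vertex_fraction(p, q, r, s):
    """Solid angle of tetrahedron pqrs at p, as a fraction of 4*pi."""
    a, b, c = sub(q, p), sub(r, p), sub(s, p)
    la, lb, lc = norm(a), norm(b), norm(c)
    numer = abs(dot(a, cross(b, c)))
    denom = la * lb * lc + dot(a, b) * lc + dot(a, c) * lb + dot(b, c) * la
    return 2.0 * math.atan2(numer, denom) / (4.0 * math.pi)


def edge_fraction(p, q, r, s):
    """Dihedral angle of tetrahedron pqrs at edge pq, as a fraction of 2*pi."""
    e = sub(q, p)

    def project(x):
        # Component of x - p orthogonal to the edge direction e.
        d = sub(x, p)
        t = dot(d, e) / dot(e, e)
        return (d[0] - t * e[0], d[1] - t * e[1], d[2] - t * e[2])

    u, v = project(r), project(s)
    return math.atan2(norm(cross(u, v)), dot(u, v)) / (2.0 * math.pi)


def gram_sum(tet):
    """Alternating sum over all faces of the normalized angles; zero by Gram's relation."""
    total = 0.0
    for i in range(4):                                   # vertices, sign +
        others = [tet[j] for j in range(4) if j != i]
        total += vertex_fraction(tet[i], *others)
    for i, j in itertools.combinations(range(4), 2):     # edges, sign -
        k, l = (m for m in range(4) if m not in (i, j))
        total -= edge_fraction(tet[i], tet[j], tet[k], tet[l])
    total += 4 * 0.5                                     # triangles, sign +, angle 1/2
    total -= 1.0                                         # the tetrahedron, sign -, angle 1
    return total


regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
print(edge_fraction(*regular))   # about 0.196, that is, arccos(1/3) / (2*pi)
print(gram_sum(regular))         # zero up to rounding
```

In the angle-weighted sums, a pair (σ, τ) contributes its fraction times vol(∩B_σ). The sketch only covers the angles, not the intersection volumes.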
Similarly, we get formulas for the area and the total arc length of V by substituting area and length for vol in the corresponding formulas.

Proof of Void Volume Formula. The main idea in the proof is to cover the void with small balls and measure the difference between the new and the old union. Let C be the set of balls we add, and consider B' = B ∪ C, its dual complex K', and its Delaunay triangulation D'. We require that

(i) C be finite,
(ii) K be a subcomplex of K', and
(iii) ∪B' = ∪B ∪ V.

Assuming these three conditions, we have vol(∪B') = vol(∪B) + vol(V). We apply the Angle-weighted Pie Volume Formula to both unions, to ∪B with K and D, and to ∪B' with K' and D'. The difference of the two expressions gives the Void Volume Formula.

Finally, we construct C so that (i), (ii), and (iii) are satisfied. Assuming general position, there exists a positive ε such that the dual complex of B_ε is K, where B_ε is obtained from B by reducing every ball with radius r to radius √(r² − ε²). The sets B and B_ε have the same Voronoi diagram and Delaunay triangulation by the way we changed the radii, and they have the same dual complex by the choice of ε. Let C be a finite set of balls of radius ε with centers in the void V that covers V. Let S be the set of centers, viewed as balls of radius zero, and note that the dual complex of B_ε ∪ S is just K together with finitely many isolated vertices. Hence,

$$K \subseteq K(B_\varepsilon \cup S) \subseteq K(B'),$$

where the second containment follows because B' is obtained from B_ε ∪ S by growing every ball of radius ρ to radius √(ρ² + ε²). The first complex in the sequence is K and the last is K', hence K ⊆ K', as required by (ii). Define … and note that its underlying space is the void in the dual complex that corresponds to the void V of ∪B. By choice of ε, the balls in C are contained in ∪B ∪ V and thus cannot contribute to the union of balls in any other way than covering V, as required by (iii).

Bibliographic notes. The material of this section is taken from [1], which also contains a proof of the d-dimensional version of the Independent Volume Formula. Implementations of the formulas are part of the Alpha Shapes software, and their use in structural biology has been described in [2]. The Angle-weighted Pie Volume Formula is related to Gram's angle sum formula, which states that the alternating sum of angles in a bounded convex polyhedron always vanishes,

$$\sum_{F} (-1)^{\dim F}\, \phi_F = 0,$$

where F ranges over all faces of the polyhedron, including the polyhedron itself, and φ_F is the angle at F, measured as a fraction of a full angle. In R², this implies that the sum of angles at the vertices of a convex n-gon is n/2 − 1: plus n/2 for the n edges, each with angle 1/2, and minus 1 for the n-gon itself. Expressed in radians, this is 2π(n/2 − 1) = (n − 2)π. In R³, the sum of angles at the vertices is no longer determined by the combinatorial structure of the polyhedron, but the sum of solid angles minus the sum of dihedral angles is. A treatment of Gram's angle sum formulas can be found in Grünbaum [3, chapter 14].

[1] H. EDELSBRUNNER. The union of balls and its dual shape. Discrete Comput. Geom. 13 (1995), 415–440.

[2] H. EDELSBRUNNER, M. A. FACELLO, P. FU AND J. LIANG. Measuring proteins and voids in proteins. In “Proc. 28th Ann. Hawaii Internat. Conf. System Sciences, 1995”, vol. V: Biotechnology Computing, 256–264.

[3] B. GRÜNBAUM. Convex Polytopes. Wiley, Interscience, London, England, 1967.
IX.4 Measuring Software

[Should we add a short discussion of Patrice's new software that also computes derivatives?] Volbl stands for the volume of a union of balls. It is part of the Alpha Shapes software and can be used to compute the volume, surface area, and total arc length of a ball union and its voids.

Running volbl. The software uses the files generated by delcx and by mkalf that represent the Delaunay triangulation and its filtration, as explained in Sections II.3 and II.4. It is not necessary, but it is a good idea, to execute volbl in parallel with visualizing the alpha shapes of the same data, which we do by typing

> alvis name &
> volbl name

on the command line. The software will start with a dialogue narrowing down the options of what to compute. As an example, consider the measurements of voids in cdk2, which is an enzyme involved in the control of the growth process of a body cell. The voids shown in Figure IX.12 occur for the solvent accessible diagram defined for … Å. In other words, we look at the wireframe of the dual complex defined by the balls with radii …, where r_i is the van der Waals radius of the i-th ball. After entering the index of the α-complex, which we get from alvis, we pick the middle of the corresponding interval of α-values. Measuring voids takes about … seconds on the author's SGI Indigo II, and volbl outputs the measurements of all voids. The output for the largest void in this example is

measurements of void, index 845:
number of tetrahedra: 26
tetra volume: 2.504511e+02
void volume: 1.009809e+01
surface area: 3.880316e+01
arc length: 5.776804e+01
number of corners: 34

The index of the void is a unique but fairly arbitrary integer assigned during the process of collecting the tetrahedra in the dual set. The measurements are in Å³, Å², and Å, as appropriate. While the largest void is more than ten times as large as any of the others (in volume), it is still only of the order of one van der Waals ball. The corresponding void in the dual complex is more than twenty times as large (a ratio of about 25), which confirms our intuition about the size difference between the two representations. While measuring the voids, the software calculates for each ball its contribution to the void area and outputs the result in a new file, name.contrib.

Figure IX.12: There are eight voids in the α-complex of cdk2, for … Å. Some of the voids have (open) dual sets that seem connected in the image but are not, because of missing triangles.

Before exploring any of the other options in volbl, we take a brief look at the algorithms used and the data structures these algorithms require.

Algorithms and data structures. To measure a union of balls using the Pie Volume, Area, and Length Formulas, we need a list of the simplices in the dual complex K of B. This list is a prefix of the masterlist mentioned in Section II.4. We simplify the actual situation insignificantly by assuming that the simplices in K are stored in an array K[1..m]. The following pseudo-code is then a direct implementation of the Pie Volume Formula of Section IX.2.

    V := 0;
    for i := 1 to m do
      σ := K[i]; V := V + (−1)^{dim σ} · vol(∩B_σ)
    endfor.

The implementation of the Area and Length Formulas is similarly straightforward. The Angle-weighted Pie and Void Volume Formulas use the masterlist and in addition require a representation of the voids. We use a partition of the Delaunay tetrahedra into the tetrahedra of the dual complex and those of the various voids, T(K), Δ_0, Δ_1, …, Δ_s, where Δ_0 is
the set of tetrahedra in the unbounded component of the complement of |K|. We have s voids, each represented by a linear list of tetrahedra. We compute the lists by maintaining a union-find data structure while scanning the masterlist from back to front.

    for i := n downto m + 1 do
      σ := the i-th simplex in the masterlist;
      case dim σ = 3: ADD(σ);
      case dim σ = 2: let τ be the first and ν the second Delaunay tetrahedron that has σ as a face;
                      UNION(FIND(τ), FIND(ν))
    endfor.

The only trouble with this algorithm is that tetrahedra in the unbounded component may be scattered in more than one list. We fix this problem by adding a dummy tetrahedron τ_∞ to the system and setting ν = τ_∞ whenever σ is a triangle on the boundary of the Delaunay triangulation.

                               vol    area   lgth   crns
    space-filling diagram      Vsf    Asf    Lsf    Csf
    voids                      Vtv    Atv    Ltv    Ctv
    outside fringe             Vof    Aof    Lof    Cof
    envelope                   Ve     Ae     Le     Ce
    dual complex               Vsh
    dual sets of voids         Vtiv

Table IX.1: Cumulative measurements made by the Volbl software.
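The back-to-front scan, including the dummy tetrahedron, can be sketched in Python as follows. This is a minimal sketch under assumptions of our own. Simplices are tuples of vertex indices, the masterlist is a Python list whose first m entries form K, and a dictionary we call `cofaces` (a hypothetical input) maps every triangle to the one or two Delaunay tetrahedra that have it as a face. The union-find structure uses path halving and union by size, a common variant that the pseudo-code above does not prescribe.

```python
from collections import defaultdict

OUTSIDE = "outside"  # dummy tetrahedron standing for the exterior of the Delaunay triangulation


class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        self.parent[x] = x
        self.size[x] = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.size[x] < self.size[y]:
            x, y = y, x
        self.parent[y] = x
        self.size[x] += self.size[y]


def collect_voids(masterlist, m, cofaces):
    """Group the tetrahedra outside K = masterlist[:m] into connected components.

    Returns (outer, voids): outer holds the tetrahedra of the unbounded component,
    voids is a list of sets of tetrahedra, one set per void.
    """
    uf = UnionFind()
    uf.add(OUTSIDE)
    for sigma in reversed(masterlist[m:]):      # back to front
        if len(sigma) == 4:                     # tetrahedron: ADD
            uf.add(sigma)
        elif len(sigma) == 3:                   # triangle: UNION(FIND(tau), FIND(nu))
            taus = cofaces[sigma]
            tau = taus[0]
            nu = taus[1] if len(taus) == 2 else OUTSIDE  # triangle on the hull boundary
            uf.union(tau, nu)
    groups = defaultdict(set)
    for x in uf.parent:
        groups[uf.find(x)].add(x)
    outer = groups.pop(uf.find(OUTSIDE))
    outer.discard(OUTSIDE)
    return outer, list(groups.values())


# A schematic input (not a geometrically valid complex): tetrahedron (5,6,7,8) is
# enclosed by triangles of K, and the other two tetrahedra reach the outside
# through the hull triangle (0,1,2).
K = [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,),
     (5, 6, 7), (5, 6, 8), (5, 7, 8), (6, 7, 8)]
masterlist = K + [(1, 2, 3), (0, 1, 2), (0, 1, 2, 3), (1, 2, 3, 4), (5, 6, 7, 8)]
cofaces = {(1, 2, 3): [(0, 1, 2, 3), (1, 2, 3, 4)], (0, 1, 2): [(0, 1, 2, 3)]}
outer, voids = collect_voids(masterlist, len(K), cofaces)
print(sorted(outer), voids)
```

Scanning back to front guarantees that a tetrahedron is added before any of its triangles are processed, because in the filtration every simplex follows its faces.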
The following pseudo-code is a direct implementation of the Void Volume Formula of Section IX.3.

    V := 0;
    forall tetrahedra τ ∈ Δ do
      V := V + vol(τ);
      forall faces σ of τ do
        if σ ∈ K then V := V − (−1)^{dim σ} · φ_{σ,τ} · vol(∩B_σ) endif
      endfor
    endfor.

The implementation of the Void Area and Length Formulas is similarly straightforward.

Figure IX.13: The dual complex of the van der Waals diagram of cdk2. The complex has … vertices and no voids.

Options. The software computes the volume, area, length, and also the number of vertices in the boundary, which we refer to as corners. It does this for the space-filling diagram ∪B, its voids, the outside fringe (defined as the portion of the unbounded component of the complement of |K| that is covered by the balls), and the envelope (defined as the space-filling diagram union all voids). Table IX.1 lists the main measurements made. As an example, consider the van der Waals diagram of cdk2, whose dual complex is shown in Figure IX.13. In the checking option, the software computes all terms in Table IX.1 and prints a summary of the results. In the considered example, it reports that there are no voids, and it prints the sizes of the space-filling diagram and the outside fringe as

    Vsf = 3.034036e+04    Vof = 2.962563e+04
    Asf = 3.100959e+04    Aof = 3.100959e+04
    Lsf = 1.915391e+04    Lof = 1.915391e+04
    Csf = 6388            Cof = 6388

Note that the volume of the space-filling diagram is only slightly higher than that of the outside fringe. The difference, about 715 Å³ or 2.4 percent of Vsf, is the volume of the dual complex, which is apparently rather small. The surface area, total arc length, and number of corners are of course the same for both.

The software also checks a few linear relations that should vanish provided the computations are correct. For example, the sum of volumes of the space-filling diagram and its voids should be equal to the volume of the envelope, which in turn should be equal to the sum of volumes of the dual complex, the voids in the dual complex, and the outside fringe. The specific relations checked by the software are

    Vsf + Vtv - Vtiv - Vsh - Vof = 0.0
    Asf - Atv - Aof = 0.0
    Lsf - Ltv - Lof = 0.0
    Csf - Ctv - Cof = 0
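The four relations can be evaluated directly from a printed summary. The Python sketch below is ours, not Volbl's checking code. It parses `name = value` pairs in the format of the output shown above and sets the void terms to zero when they are not printed, which is our assumption for the case in which no voids are reported. It then recovers the dual complex volume Vsh from the volume relation and evaluates all four residuals.

```python
import re

VOID_TERMS = ("Vtv", "Vtiv", "Atv", "Ltv", "Ctv")


def parse_summary(text):
    """Collect 'name = value' pairs such as 'Vsf = 3.034036e+04'."""
    pairs = re.findall(r"(\w+)\s*=\s*([-+0-9.eE]+)", text)
    values = {name: float(value) for name, value in pairs}
    for name in VOID_TERMS:              # assumption: no voids reported, so these are zero
        values.setdefault(name, 0.0)
    return values


def residuals(v):
    """Left-hand sides of the four relations checked by the software."""
    return {
        "volume": v["Vsf"] + v["Vtv"] - v["Vtiv"] - v["Vsh"] - v["Vof"],
        "area": v["Asf"] - v["Atv"] - v["Aof"],
        "length": v["Lsf"] - v["Ltv"] - v["Lof"],
        "corners": v["Csf"] - v["Ctv"] - v["Cof"],
    }


summary = """
Vsf = 3.034036e+04    Vof = 2.962563e+04
Asf = 3.100959e+04    Aof = 3.100959e+04
Lsf = 1.915391e+04    Lof = 1.915391e+04
Csf = 6388            Cof = 6388
"""
v = parse_summary(summary)
# Vsh is not printed in this summary; the volume relation determines it.
v["Vsh"] = v["Vsf"] + v["Vtv"] - v["Vtiv"] - v["Vof"]
print(round(v["Vsh"], 2))   # 714.73, the volume of the dual complex in cubic angstroms
print(residuals(v))
```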
Another form of output is the description of the total measurement as a sum of contributions over individual atoms. This makes sense for volume and area but is done only for the latter. Depending on the type of area measurement, the software outputs a file name.contrib that contains the contribution of each individual atom. In the checking option, the software compares for each atom the area contribution to the space-filling diagram with the sum of contributions to the voids and the outside fringe. It also checks whether the sum of contributions really adds up to the total area, and it does this for the space-filling diagram, the voids, and the outside fringe.

Area formula. All analytic formulas needed to measure the common intersection of up to four balls are straightforward, except possibly the area of the intersection of up to three caps. A formula for the area follows from the Gauss-Bonnet theorem in differential geometry, but we prefer to derive it with elementary means. The cap c_j on a sphere S_i consists of the portion of S_i inside the sphere S_j. Equivalently, the cap contains all points of S_i whose power distance from S_i is no less than that to S_j. Let r_i be the radius of S_i and ρ_j the radius of the circle bounding c_j. We define the width w_j of c_j equal to the distance between the two planes that cut c_j from S_i, as illustrated in Figure IX.14. The area of the cap is then w_j/(2r_i) times the area of the sphere, which is 4πr_i², that is, the area of the cap is 2πr_i w_j.

Figure IX.14: To the left, the shaded cap c_j has radius ρ_j and width w_j. To the right, the shaded bigon has angles ϕ and ϕ and arc lengths ℓ_j and ℓ_k.

Consider now the intersection of two caps. Since all simplices in K are independent, we may assume that the intersection c_j ∩ c_k is a bigon, as shown in Figure IX.14. We let ϕ be the angle at the two vertices and ℓ_j and ℓ_k the lengths of the two arcs, all measured as fractions of a full circle. We approximate the bigon by a spherical n-gon, whose edges are by definition great-circle arcs. The area of that n-gon is

$$4\pi r_i^2 \Big( \frac{1}{2} \sum \psi \;-\; \frac{n-2}{4} \Big),$$

where the sum adds all angles ψ in the n-gon, again measured as fractions of a full circle. This is because a triangulation produces n − 2 spherical triangles, each contributing 4πr_i² times one half the sum of its three angles minus one quarter. To construct the n-gon, we approximate each of the two circles by a regular spherical m-gon. The points are placed slightly outside the circles so that the areas of the m-gons are exactly the areas of the caps. Let ψ_j and ψ_k be the angles in the two m-gons. Assuming that ℓ_j and ℓ_k are rational, we can find infinitely many integers m so that the two m-gons share two vertices near the vertices of the bigon. We then have n = (ℓ_j + ℓ_k)m. The angles at the two shared vertices approach ϕ as m goes to infinity. Furthermore, the n-gon has ℓ_j m − 1 vertices with angle ψ_j and ℓ_k m − 1 vertices with angle ψ_k. To compute ψ_j we recall that the area of the cap is 2πr_i w_j. By construction, the area of the approximating m-gon is the same, namely 4πr_i²(mψ_j/2 − (m − 2)/4). Hence

$$\psi_j = \frac{1}{2} - \frac{1}{m} + \frac{w_j}{m\, r_i},$$

and symmetrically ψ_k = 1/2 − 1/m + w_k/(m r_i). We plug the values for ψ_j and ψ_k into the formula for the area of the n-gon and get

$$\mathrm{area}(c_j \cap c_k) = 4\pi r_i^2 \Big( \varphi - \frac{r_i - w_j}{2 r_i}\,\ell_j - \frac{r_i - w_k}{2 r_i}\,\ell_k \Big)$$

after eliminating the terms that vanish when m goes to infinity. Similarly, for the intersection of three caps with angles ϕ_1, ϕ_2, ϕ_3 and arc lengths ℓ_j, ℓ_k, ℓ_l, we get

$$\mathrm{area}(c_j \cap c_k \cap c_l) = 4\pi r_i^2 \Big( \frac{\varphi_1 + \varphi_2 + \varphi_3}{2} - \frac{1}{4} - \frac{r_i - w_j}{2 r_i}\,\ell_j - \frac{r_i - w_k}{2 r_i}\,\ell_k - \frac{r_i - w_l}{2 r_i}\,\ell_l \Big).$$

Note that the formulas give the precise area of the intersection of two or three caps, since the approximating spherical n-gon is only a tool in the proof and not used in the formulas.

Bibliographic notes. The structural biology literature distinguishes between numerical and analytical approaches to measuring molecules. For the latter approach, we would decompose the molecule into simple pieces and give a formula for the size of each piece. An example is Connolly's work [1] on computing the area of a molecular surface. The idea of using inclusion-exclusion for size computations goes back to Kratky [4], who shows that there is a short inclusion-exclusion formula for the area of the intersection of a finite set of disks in the plane. His proof is existential and superseded by explicit formulas that can be derived by the same methods as described in Sections IX.1 and IX.2. Scheraga and coauthors [5] implement an inclusion-exclusion formula for a union of balls based on Kratky's work, but the lack of an explicit expression occasionally leads to miscalculations [2]. A detailed documentation of the Volbl software is given in [3].
[1] M. L. CONNOLLY. Analytical molecular surface calculation. J. Appl. Cryst. 16 (1983), 548–558.

[2] L. R. DODD AND D. N. THEODOROU. Analytic treatment of the volume and surface area of molecules formed by an arbitrary collection of unequal spheres intersected by planes. Molecular Physics 72 (1991), 1313–1345.

[3] H. EDELSBRUNNER AND P. FU. Measuring space filling diagrams and voids. Rept. UIUC-BI-MB-94-01, Beckman Inst., Univ. Illinois, Urbana, Illinois, 1994.

[4] K. W. KRATKY. The area of intersection of equal circular disks. J. Phys. A: Math. Gen. 11 (1978), 1017–1024.

[5] G. PERROT, B. CHENG, K. D. GIBSON, J. VILA, A. PALMER, A. NAYEEM, B. MAIGRET AND H. A. SCHERAGA. MSEED: a program for rapid determination of accessible surface areas and their derivatives. J. Comput. Chem. 13 (1992), 1–11.
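The cap and bigon formulas of the Area formula paragraph can be checked numerically. The following Python sketch is our own illustration, not part of Volbl, and it makes several simplifying assumptions. It works on the unit sphere, so r_i = 1. It describes a cap by a unit axis n and an angular radius t: the cap is the set of points x with x·n ≥ cos t, and its width is 1 − cos t. It computes the angle ϕ and the arc fractions ℓ_j and ℓ_k of the bigon from this data, and it compares the bigon formula with the fraction of points of a Fibonacci lattice that fall into both caps.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def cap_area(r, w):
    """Area of a cap of width w on a sphere of radius r: w/(2r) times 4*pi*r^2."""
    return w / (2.0 * r) * 4.0 * math.pi * r * r


def bigon_area(r, phi, l_j, w_j, l_k, w_k):
    """Area of the intersection of caps c_j and c_k; phi, l_j, l_k are fractions of a circle."""
    return 4.0 * math.pi * r * r * (phi - (r - w_j) / (2.0 * r) * l_j
                                        - (r - w_k) / (2.0 * r) * l_k)


def arc_fraction(n_a, t_a, n_b, t_b):
    """Fraction of the circle bounding cap a that lies inside cap b."""
    g = dot(n_a, n_b)
    amplitude = math.sin(t_a) * math.sqrt(1.0 - g * g)
    q = (math.cos(t_b) - math.cos(t_a) * g) / amplitude
    return math.acos(q) / math.pi


def bigon_angle(n_j, t_j, n_k, t_k):
    """Angle of the bigon at its vertices, as a fraction of a full circle."""
    g, cj, ck = dot(n_j, n_k), math.cos(t_j), math.cos(t_k)
    a = (cj - g * ck) / (1.0 - g * g)
    b = (ck - g * cj) / (1.0 - g * g)
    axis = cross(n_j, n_k)
    c = math.sqrt((1.0 - a * a - b * b - 2.0 * a * b * g) / dot(axis, axis))
    p = tuple(a * x + b * y + c * z for x, y, z in zip(n_j, n_k, axis))  # a bigon vertex
    in_j = unit(tuple(x - cj * y for x, y in zip(n_j, p)))  # inward normal of c_j at p
    in_k = unit(tuple(x - ck * y for x, y in zip(n_k, p)))  # inward normal of c_k at p
    between = math.acos(max(-1.0, min(1.0, dot(in_j, in_k))))
    return (math.pi - between) / (2.0 * math.pi)


def lattice_fraction(n_j, t_j, n_k, t_k, count=200_000):
    """Fraction of a Fibonacci lattice on the unit sphere that lies in both caps."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    inside = 0
    for i in range(count):
        z = 1.0 - (2.0 * i + 1.0) / count
        rho = math.sqrt(1.0 - z * z)
        x = (rho * math.cos(golden * i), rho * math.sin(golden * i), z)
        if dot(x, n_j) >= math.cos(t_j) and dot(x, n_k) >= math.cos(t_k):
            inside += 1
    return inside / count


n_j, t_j = (0.0, 0.0, 1.0), math.radians(60)
n_k, t_k = (math.sin(math.radians(50)), 0.0, math.cos(math.radians(50))), math.radians(45)
phi = bigon_angle(n_j, t_j, n_k, t_k)
l_j, l_k = arc_fraction(n_j, t_j, n_k, t_k), arc_fraction(n_k, t_k, n_j, t_j)
w_j, w_k = 1.0 - math.cos(t_j), 1.0 - math.cos(t_k)   # widths on the unit sphere
formula = bigon_area(1.0, phi, l_j, w_j, l_k, w_k) / (4.0 * math.pi)
sampled = lattice_fraction(n_j, t_j, n_k, t_k)
print(formula, sampled)
```

Both printed numbers are fractions of the sphere's area and should agree to about three decimal places. A second easy check is the case of two hemispheres: all widths equal r, the arc terms drop out, and the formula reduces to the area of a lune.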

Exercises
The credit assignment reflects a subjective assessment of
difficulty. Every question can be answered using the ma-
terial presented in this chapter.



1. Section of triangulation. (2 credits). Let K be a triangulation of a set of n points in the plane. Let ℓ be a line that avoids all points. Prove that ℓ intersects at most … edges of K and that this upper bound is tight for every … .
Chapter X

Derivatives

The derivative of surface area under deformation is an important term in the simulation of molecular and atomic motion. In the case of a van der Waals or solvent accessible diagram, it is related to the length of the circular arcs in the boundary.

X.1 Implicit Solvent Model


X.2 Weighted Area Derivative
X.3 Weighted Volume Derivative
X.4 Derivative Software
Exercises


X.1 Implicit Solvent Model


[Give a general introduction and work out the relationship
with area and volume derivatives.]

X.2 Weighted Area Derivative


[Talk about the unweighted and the weighted area derivatives.] [Explain the results and discuss the continuity issue of the functions.]

[1] R. BRYANT, H. EDELSBRUNNER, P. KOEHL AND M. LEVITT. The area derivative of a space-filling diagram. Manuscript, Duke Univ., Durham, North Carolina, 2002.

X.3 Weighted Volume Derivative


[Talk about the unweighted and the weighted volume derivatives.] [Explain the results and discuss the continuity issue of the functions.]

[1] H. EDELSBRUNNER AND P. KOEHL. The weighted volume derivative of a space-filling diagram. Manuscript, Duke Univ., Durham, North Carolina, 2003.

X.4 Derivative Software


[Discuss Patrice’s ProShape software.]

Exercises
The credit assignment reflects a subjective assessment of
difficulty. Every question can be answered using the ma-
terial presented in this chapter.



1. Section of triangulation. (2 credits). Let K be a triangulation of a set of n points in the plane. Let ℓ be a line that avoids all points. Prove that ℓ intersects at most … edges of K and that this upper bound is tight for every … .
Subject Index

active site, 7
affine combination, 28
affine hull, 28
alpha complex, 21
alpha shape, 21
Alpha Shape software, 23
amino acid, 5
angle, dihedral, 103
, solid, 103
area, 100
atom, 9
atomic number, 9
atomic weight, 9
attachment, 60

backbone, 5
barycentric coordinates, 65
basis (of a group), 51
Betti number, 51, 114
, persistent, 57
body (inside a skin), 29
boundary group, 49
boundary homomorphism, 49
Brunn-Minkowski theorem, 116

canonical basis, 57
cell (in a complex), 60
central dogma, 1
chain, 48
chain complex, 49
chromosome, 3
closed ball property, 35
coaxal system, 29
codon, 5
coherent triangulation, 19
Connolly surface, 16
continuous function, 44
contractible, 45
convex combination, 28
convex hull, 28
convex polyhedron, 96
coordinate system, 60
Corey-Pauling-Koltun model, 16
coset, 48
critical point, 61
, non-degenerate, 61
critical point theory, 59
curvature (of a curve), 32
, Gaussian, 32
, mean, 32
, normal, 32
, principal, 32
cycle group, 49

deformation retraction, 45
Delaunay triangulation, 23
, restricted, 35
, weighted, 18
diffeomorphism, 60
differential topology, 62
dihedral angle, 103
Dirichlet tessellation, 19
DNA (deoxyribonucleic acid), 2
dual complex, 20, 101
dual set, 69

edge contraction, 40
edge flip, 40
electron, 9
element, 9
ε-sampling, 36
Euler characteristic, 51, 96
Euler relation, 96
Euler-Poincaré theorem, 96
exact arithmetic, 24

face (of a polyhedron), 96
face (of a simplex), 48
facet, 96
filtration, 21, 24
fundamental theorem of linear algebra, 51

Gauss map, 32
Gaussian curvature, 32
gene, 3
genome, 2
geodesic, 32
gluing map, 60
Gouraud shading, 40
gradient, 63
graphical user interface, 23
group, 48

Helly’s theorem, 116
Hessian, 61
homeomorphism, 44
homology class, 49
homology group, 49
, persistent, 57
homomorphism, 48
homotopic map, 44
homotopy equivalence, 44
homotopy type, 45
homotopy, 44

image (of a function), 48
inclusion-exclusion, 96
independent collection, 20
independent simplex, 20, 103
index (of a critical point), 61
indicator function, 96
integral line, 63
interval tree, 24
isomorphism, 48

Johnson-Mehl model, 16
join, 45

kernel, 48

length scale, 36
length, 100
Lennard-Jones function, 11
linear algebra, 62
linear independence, 51
lower star, 65

manifold, 60
map, 44
matrix (of a homomorphism), 55
mean curvature, 32
mesh, 35
metamorphosis, 41
Minkowski sum, 30, 116
mixed cell, 30
mixed complex, 30
molecular mechanics, 10
molecular skin, 27
molecular surface, 15
molecule, 9
Morfi software, 39
morphing, 84
Morse complex, 64
Morse function, 61
Morse theory, 59
Morse-Smale function, 64
mouth (of a pocket), 69

neutron, 9
NMR (nuclear magnetic resonance), 23
normal curvature, 32
normal form, 55
normal form algorithm, 55, 114
normal vector, 32
nucleotide, 2

open ball, 44
open set, 44
open set (of simplices), 72
orthogonal spheres, 18
orthosphere, 18, 22

parametrization, 60
partial order, 69
pdb-file, 23
pencil (of circles), 28
persistent Betti number, 57
persistent homology group, 57
piecewise linear, 65
pocket, 68
polyhedron, 102
, convex, 96
potential energy, 11
power diagram, 17
power distance, 17
principal curvature, 32
principal simplex, 24
principle of inclusion-exclusion, 96
protein, 5
Protein Data Bank, 23
proton, 9

quotient group, 48

Ramachandran plot, 6
rank (of a group), 51
regular point, 61
regular simplex, 24
regular triangulation, 19
replication (of DNA), 3
residue, 5
restricted Delaunay triangulation, 35
restricted Voronoi diagram, 35
ribosome, 6
RNA (ribonucleic acid), 3

signature, 25, 71
simplex, 48
simplicial complex, 48
simulated perturbation, 24
singular simplex, 24
skin, 29
Skin Meshing software, 40
smooth manifold, 60
smooth map, 60
solid angle, 103
solvent accessible surface, 15
space-filling diagram, 14
specificity, 7
speed (of a curve), 32
spherical triangle, 100
stable manifold, 63
star, 65
stereographic projection, 100
subspace topology, 44
supporting hyperplane, 96

tangent space, 60
tangent vector, 32, 60
topological equivalence, 44
topological space, 44
topological subspace, 44
topological type, 44
topology, 44
transcription (of DNA to RNA), 4
transversal, 64
triangulation, 35, 48
, coherent, 19
, regular, 19
, weighted Delaunay, 18

union-find, 56, 107
unstable manifold, 63

van der Waals potential, 10
van der Waals radius, 23
van der Waals surface, 15
vector field, 63
velocity vector, 32
vertex insertion, 40
void, 69, 104
Volbl software, 106
volume, 100
Voronoi diagram, additively weighted, 15
, restricted, 35
, weighted, 17

x-ray crystallography, 23
Author Index

Akkiraju, N., 16
Alberts, B., 8
Alexandrov, P. S., 22
Amenta, N., 38
Ashcroft, N. W., 11
Aurenhammer, F., 16

Bader, R. F. W., 79
Bajaj, C. L., 77
Banchoff, T. F., 65
Basch, J., 83
Berman, H. M., 26
Bern, M., 38
Besl, P. J., 93
Bhat, T. N., 26
Billera, L. J., 19
Bondi, A., 11
Bourne, P. E., 26
Bray, D., 8
Bronson, H. R., 8
Bruce, J. W., 34
Bruggesser, H., 99

Capoyleas, V., 117
Casati, R., 70
Cheng, B., 109
Cheng, H.-L., 34, 38, 42, 84, 87
Cheng, S.-W., 42, 84
Chew, L. P., 38
Chothia, C., 11
Clifford, W. K., 31
Connolly, M. L., 16, 109
Corey, R. B., 8
Cormen, T. H., 54, 114
Creighton, T. E., 8
Crick, F. H. C., 4

Darboux, M. G., 31
Darby, N. J., 8
Delaunay, B. (also Delone), 19
Delfinado, C. J. A., 54
Dey, T. K., 34, 38
Dirichlet, P. G. L., 19
Dodd, L. R., 109

Edelsbrunner, H., 16, 19, 22, 26, 31, 34, 38, 42, 46, 54, 58, 65, 70, 74, 76, 77, 82, 84, 87, 99, 102, 105, 109, 113, 114, 115
Eilenberg, S., 54
Euler, L., 32, 99

Facello, M. A., 70, 74, 105, 115
Feiner, S., 42
Feng, Z., 26
Foley, J., 42
Forman, R., 70, 115
Frobenius, G., 31
Fu, P., 16, 25, 42, 84, 87, 105, 109

Gauss, C. F., 19, 32
Gelbart, W. M., 4
Gelfand, I. M., 19
Gerstein, M., 11, 26, 93
Giblin, P. J., 22, 34, 50
Gibson, K. D., 109
Gilliland, G., 26
Grünbaum, B., 105
Griffiths, A. J. F., 4
Gromov, M., 117
Guibas, L. J., 83
Guillemin, V., 62

Hadwiger, H., 99, 102
Harer, J., 65, 77
Helly, E., 117
Hughes, J., 42

Johnson, A., 8
Johnson, W. A., 16
Jorgensen, W. L., 11

Kapranov, M. M., 19
Kelley, J. E., 46
Kirkpatrick, D. G., 22
Klee, V., 117
Kratky, K. W., 109
Kuntz, I. D., 70

Lam, K. P., 42, 84
Leach, A. R., 11, 16
Lee, B., 16
Leiserson, C. E., 54, 114
Leray, J., 46
Letscher, D., 58, 76, 114
Levitt, M., 93
Lewis, J., 8
Lewontin, R. C., 4
Liang, J., 70, 74, 105, 115
London, F., 11

Mücke, E. P., 22, 26
Maigret, B., 109
Maillot, P.-G., 92
Mani, P., 99
Martinetz, T., 38
McCleary, J., 58
McKay, N. D., 93
Mehl, R. F., 16
Mendel, G., 4
Mermin, N. D., 11
Miller, J. H., 4
Milnor, J., 62
Morse, M., 62
Munkres, J. R., 46, 50, 58

Naiman, D. Q., 102
Nayeem, A., 109
Nef, W., 99

O’Neill, B., 34

Palmer, A., 109
Pascucci, V., 77
Pauling, L., 8
Pedoe, D., 31
Perrot, G., 109
Poincaré, H., 54
Pollack, A., 62

Qian, J., 16

Raff, M., 8
Ramachandran, G. N., 8
Ramos, E. A., 82
Richards, F. M., 16, 26
Rivest, R. L., 54, 114
Roberts, K., 8
Rotman, J. J., 50

Sasisekharan, V., 8
Schütte, K., 113
Scheraga, H. A., 109
Schey, H. M., 66
Schikore, D. R., 77
Schläfli, L., 99
Schneider, R., 117
Schulten, K., 38
Seidel, R., 22
Seifert, H., 46, 62
Shah, N. R., 38
Sharir, M., 113
Sherwood, E. R., 4
Shindyalov, I. N., 26
Smale, S., 66
Steenrod, N., 54
Stern, C., 4
Storjohann, A., 58
Strang, G., 62
Stryer, L., 8
Sturmfels, B., 19
Sullivan, J., 34, 38

Taylor, R., 11
Theodorou, D. N., 109
Threlfall, W., 46, 62
Thurston, W. P., 102
Tirado-Rives, J., 11
Tsai, J., 11

Van Dam, A., 42
Van der Waals, 11
Van der Waerden, B. L., 113
Van Kreveld, M., 77
Van Oostrum, R., 77
Varzi, A. C., 70
Veltkamp, R., 91
Vila, J., 109
Vleugels, J., 91
Voronoi, G., 19

Wagon, S., 117
Wallace, A., 62
Walter, P., 8
Wang, Y., 77
Watson, J. D., 4
Weissig, H., 26
Westbrook, J., 26
Will, H.-M., 16
Woodward, C., 74
Wynn, H. P., 102

Zelevinsky, A. V., 19
Zhang, L., 83
Zomorodian, A., 58, 65, 74, 76, 77, 114
