Series Editor
The titles published in this series are listed at the end of this volume.
Graph Theoretical Approaches
to
Chemical Reactivity
edited by
Danail Bonchev
and
Ovanes Mekenyan
Higher Institute of Chemical Technology,
Burgas, Bulgaria
ISBN 978-94-010-4526-1
TABLE OF CONTENTS
4. POLYHEDRAL DYNAMICS
Robert B. King
1. Introduction
2. The Topology of Polyhedra
3. Polyhedral Isomerizations
4. Microscopic Models: Diamond-Square-Diamond Processes and Gale Diagrams
5. Macroscopic Models: Topological Representations
6. Literature References
5. REACTION GRAPHS
Alexandru T. Balaban
1. Introduction
2. Reaction Graphs of Rearrangements Via Carbocations
3. Automerization of Bullvalene, Other Valence Isomers of Annulenes,
and Azabullvalene
4. Rotation in Molecular Propellers
5. Reaction Graphs for Rearrangements of Metallic Complexes
6. Xenon Hexafluoride
7. Heptaphosphide Trianion
8. Kinetic Graphs, Synthon Graphs, and Graph Transforms
9. Conclusions
10. References
The progress in computer technology during the last 10-15 years has enabled the
performance of ever more precise quantum mechanical calculations related to structure
and interactions of chemical compounds. However, the qualitative models relating
electronic structure to molecular geometry have not progressed at the same pace. There
is a continuing need in chemistry for simple concepts and qualitatively clear pictures that
are also quantitatively comparable to ab initio quantum chemical calculations.
Topological methods and, more specifically, graph theory as a fixed-point topology,
provide in principle a chance to fill this gap.
With its more than 100 years of applications to chemistry, graph theory has proven to
be of vital importance as the most natural language of chemistry. The explosive
development of chemical graph theory during the last 20 years has increasingly
overlapped with quantum chemistry. Besides contributing to the solution of various
problems in theoretical chemistry, this development indicates that topology is an
underlying principle that explains the success of quantum mechanics and goes beyond it,
thus promising to bear more fruit in the future.
As a part of the series "Understanding Chemical Reactivity", this volume is designed
to introduce the reader to the graph-theoretical and, more generally, topological
elucidation of chemical reactivity. The nine chapters of the volume are written by 15
authors from seven countries who have contributed largely to the development of this
area of science. This emphasizes the importance and complexity of chemical reactivity
studies whose elucidation requires the broad cooperation of scientists from all over the
world, as well as from various branches of chemistry. The chapters are well illustrated
and provide an extensive reference to the problems discussed, in line with the scope of
not teaching but intriguing and guiding the reader.
The introductory chapter on graph theory by H. Hosoya is not just a collection of
terms, definitions, and formulae. The basic notions and concepts of graph theory are
specifically conveyed in a way that benefits from the numerous personal contributions of
the author in this area. Emphasis is put on the matrix and polynomial representations,
symmetry, isomorphism, and operations on graphs. Indeed, a single chapter could not
cover all aspects of the very rich graph-theoretical formalism, and the reader may find
additional information on the subject in the introductory sections of the other chapters.
It is traditional to connect chemical reactivity to graph theory via Hückel molecular
orbital theory (HMO), which provided the first reactivity indices (atomic charges, bond
orders, free valences, localization energy, superdelocalizability indices, frontier orbital
indices). In Chapter 2, Trinajstic, Mihalic, and Graovac go beyond reviewing the
isomorphism between graph-spectral theory and the HMO theory, and beyond the
discussion on the structure of the Hückel eigenvalue spectrum. They present the modern
view on the interplay between graph theory and molecular orbital theory by reviewing
the achievements of recent years. Several major topics are included. The TEMO principle
(developed by the late Oskar Polansky, one of the pioneers of chemical graph theory)
allows, among other things, reactivity predictions for a special class of topological isomers
(topomers). The rule of topological charge stabilization of Gimarc is a reliable guide in
predicting relative stabilities of various heteroatomic isomers. The graph-theoretical
assessments of the HOMO-LUMO separation and absolute hardness of alternant
hydrocarbons pave the way for future achievements in this area.
In Chapter 3, Rousseau and Lee present a form of Hückel theory, termed second
moment scaling, which had been developed earlier by Burdett and Lee and has proven
x PREFACE
successful in both rationalizing and optimizing the structure of molecules and solids.
Rousseau and Lee go beyond the previous work reviewed in this area and apply the
method to obtain the actual shape of the electronic energy surface as a function of
geometry for two important classes of compounds (boranes and carboranes). The
minimum energy geometries, electron density contour maps, and reaction paths thus
calculated are shown to be in reasonable accord with the ab initio method. The
topological method used may be regarded as a third-generation Hückel method,
applicable to covalent and metallic (but not ionic) compounds that are formed of main
group atoms and transition metals.
Chapter 4 by R. B. King summarizes topological and graph-theoretical aspects of
isomerization reactions of polyhedral molecules (both coordination and cluster
polyhedra). The microscopic approach is discussed, in which the details of polyhedral
topology are used to help elucidate which types of single isomerization steps are possible.
The reader may gain experience in using specific techniques and processes, such as the
Schlegel diagrams, the Gale diagrams, and the dsd-processes, as well as learn about
exciting developments in this area in which the author is one of the major contributors.
The earlier macroscopic approach, which uses the so-called topological representations
(reaction graphs) to show the relationships between different permutational isomers, is
also reviewed.
The macroscopic approach just mentioned is further detailed by Balaban in Chapter 5,
a chapter devoted to reaction graphs in both organic and inorganic chemistry. The author
was the first chemist to apply graph theory to isomerization processes (interconversions
of carbenium ions), doing so as early as 1966. The reader will find in this chapter the
complete and intriguing story of the rearrangements via carbocations with a particular
emphasis on those leading to diamond hydrocarbons and their derivatives. Reaction
graphs dealing with inorganic compounds describe different classes of rearrangements of
complexes (mainly metallic ones) with various geometries. The chapter is rich in
illustrative examples demonstrating that a chemist applying this graph-theoretical
technique may gain a closer insight into rearrangement mechanisms and be enabled to
indicate likely intermediates.
Chapter 6 by Mezey initially offers a summary of the previously developed topological
methodology for treating three-dimensional molecular shapes and their changes. The
mathematical formalism developed is formulated in terms of contour surfaces of
electronic charge densities or, alternatively, of molecular electrostatic potential contours.
Chemical reactivity is thus regarded as the strongest change in molecular shape within
a series of similarly treated but less pronounced changes like conformational and
vibrational-rotational ones, as well as electronic excitations. The second part of the
chapter presents a newly developed method for representing molecular shapes by the
topological patterns of contour surfaces of three-dimensional nuclear potentials.
Computationally simple, the new technique is extended for the modeling of shape
changes (reaction paths) in chemical processes.
A quite different topological approach to chemical reactions is advanced in Chapter 7
by Babaev. Proceeding from the classical picture of molecules with localized bonds,
described by multigraphs with loops and then by graphoids, Babaev introduces the
concept of two-dimensional manifolds of surfaces. This novel concept characterizes
chemical species in a highly generalized manner by several topological invariants. The
well-known empirical types of chemical similarity (e.g., isoelectronic, isostructural,
homological) are thus shown to result from topological homeomorphisms; they all
conserve the Euler characteristic of the respective surfaces. Moreover, the author proves
the invariance of this topological characteristic in any chemical reaction involving
molecules with localized bonds, a result that might be termed a principle of conservation
of molecular topology in chemical reactions. The reader may thus gain an exciting and
unusually general view on reactivity in chemistry as a whole.
Chapter 8 by Mekenyan and Basak begins by reviewing some of the basic principles
underlying the topological nature of chemical reactivity. Topological indices, one of the
powerful tools of graph theory, are then introduced on this basis. The most common
indices are classified and formulated in several large groups which also distinguish
between global molecular, fragment, and atomic indices. Some of the first electronic
indices of reactivity, derived within the HMO theory, are also mentioned, owing to the
topological origin of the Hückel matrix. Examples are presented of successful modeling
of various reactivity effects in different branches of chemistry including environmental
chemistry, toxicology, and drug-receptor interaction, along with some topology-based
reactivity rules and relationships.
Chapter 9 by Temkin, Zeigarnik, and Bonchev centers on progress in the mechanistic
and kinetic studies of complex reactions as a source of information on the reactivity of
intermediates and their elementary steps. The graph-theoretical concept of the topological
identifier, which produces two general principles (simple bond-change topology and
bond-change compensation topology), is first developed for identifying, classifying, and
enumerating elementary reactions. Then, the formalism of kinetic graphs and bipartite
graphs is applied to the classification, coding, and enumeration of linear and nonlinear
reaction mechanisms. The authors also introduce and discuss the concept of mechanistic
topological structure (the reactant interrelations, the number and kind of reaction routes,
and their mutual connectedness), an aspect of complex chemical reactions that was
largely neglected in the past. A topological characterization of the four major classes of
complex reactions (noncatalytic, noncatalytic conjugated, chain, and catalytic reactions)
is presented on this basis.
In conclusion, we would like to thank our authors and express the hope that the
material presented will find resonance with our readers and prompt their own
contributions to the field. If this proves to be the case, then the aim of this volume will
be fulfilled.
Haruo HOSOYA
Ochanomizu University
Department of Information Sciences
Bunkyo-ku, Tokyo 112, Japan
1. Chemical graph theory
In this book you are invited into the world of applications of graph
theory to chemistry, especially to the problem of how the topology of a
molecule determines its reactivity toward a specific reaction, and of how
graph theory helps you understand these relationships. This introduction to
graph theory was written so that a chemist, or chemist-to-be, will feel at
ease in applying graph theory to his or her own problem.
Many books and monographs have already been published on the
application of graph theory to chemistry from various standpoints
(Balaban, 1976; Graovac et al., 1977; Trinajstic, 1983, 1992;
Balasubramanian, 1985; Tang et al., 1986; Rouvray, 1990; Bonchev and
Rouvray, 1991). Readers can consult them for interesting problems
that could not be introduced in this short article.
As a particular way of guiding readers into graph theory through the
gate of this chapter, you are assumed to have a minimal knowledge of the
Hückel molecular orbital (HMO) theory as applied to conjugated hydrocarbon
molecules (Hückel, 1931). That is, you are assumed to know that the
solutions, $\{\varepsilon_n = \alpha + x_n\beta\}$, of
$$\Delta(\varepsilon) = \det\{(\alpha - \varepsilon)E + \beta A\} = 0, \tag{1-1}$$
represent the energies of the orbitals for accommodating the electrons in
the molecule. Here A and E are, respectively, the adjacency and unit
matrices, which will be explained later, and $\alpha$ and $\beta$ are the
parameters called the Coulomb and resonance integrals, respectively. Further,
you are assumed to know that the set of eigenvectors obtained automatically
in the process of diagonalizing the secular matrix represents the
coefficients ($C_{nr}$) of the molecular orbitals (MO) expressed as
linear combinations of atomic orbitals (LCAO).
In the ground state the lower half (in energy) of the MOs are doubly
occupied by electrons. By summing, over the occupied MOs, the products of
the coefficients at the two ends of an edge rs of the corresponding chemical
graph, one can obtain the bond order ($P_{rs}$), which is a measure of the
contribution of the bond rs to the stabilization of the molecule caused by
the delocalization of π-electrons.
Although the HMO theory was proposed to study the π-electronic system of
unsaturated hydrocarbon molecules, its formal application to other classes
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 1-36.
© 1994 Kluwer Academic Publishers.
2 H. HOSOYA
[Figure: a molecule drawn as structural formula, carbon network, and (molecular) graph; the accompanying matrix and determinant examples are not recoverable.]
number of the least steps from $v_i$ to $v_j$. If a graph G is given, the matrix D
can be reproduced uniquely, while replacing by zero every element of D
other than unity gives A. Thus one can assert that the two objects, A and D,
for G are mathematically equivalent.
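The equivalence of A and D can be tried out on a small case. The following is a minimal sketch, assuming a connected graph stored as a 0/1 adjacency matrix; the function names and the four-vertex path used as input are illustrative choices, not taken from the text.

```python
from collections import deque

def distance_matrix(adj):
    """Build D from A by a breadth-first search started at every vertex.

    Assumes the graph is connected, so every vertex is reached.
    """
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            v, d = queue.popleft()
            dist[source][v] = d          # least number of steps from source to v
            for w in range(n):
                if adj[v][w] and w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist

def adjacency_from_distance(dist):
    """Recover A from D by keeping only the entries equal to unity."""
    return [[1 if d == 1 else 0 for d in row] for row in dist]

# Illustrative input: the path graph with four vertices.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
D = distance_matrix(A)
```

Going from A to D needs a shortest-path search, whereas going from D back to A is a simple element-wise filter.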
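[Figure: a small graph G with its adjacency matrix A and distance matrix D, illustrating the correspondence between A and D; the matrix entries are not recoverable.]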
2.1.3. Characteristic polynomial
Although an adjacency matrix unequivocally represents the topological
structure of a graph, the bits of information increase quadratically with
respect to N. On the other hand, the number of chemical substances which
are duly stored in the accessible CAS (Chemical Abstracts Service)
database has recently exceeded ten million. One therefore needs some
characterization of a graph, that is, a mathematical abstraction from a
complicated representation of a given object. A naive way of
characterizing a given matrix is to take its determinant.
Define the characteristic polynomial $P_G(x)$ of G as
$$P_G(x) = (-1)^N \det(A - xE), \tag{2-1}$$
where E is the N×N unit matrix and x a scalar (Collatz and Sinogowitz,
1957). Although we know that $P_G(x)$ does not have a one-to-one
correspondence with G (Harary et al., 1971), it has been shown that it can
be used for rough characterization of a graph.
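Eq. (2-1) can be evaluated mechanically for small graphs. The sketch below is one possible way of doing so: it uses the Faddeev-LeVerrier recursion, which is a choice made here for convenience (exact arithmetic with the standard library only) and is not a method named in the text. The function name is hypothetical.

```python
from fractions import Fraction

def characteristic_polynomial(adj):
    """Coefficients of P_G(x) = (-1)^N det(A - xE) = det(xE - A), highest power first.

    Faddeev-LeVerrier recursion (assumed here, not from the text):
      M_1 = E,  c_k = -tr(A M_k) / k,  M_{k+1} = A M_k + c_k E.
    """
    n = len(adj)
    A = [[Fraction(x) for x in row] for row in adj]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    # For a 0/1 adjacency matrix all coefficients are integers.
    return [int(c) for c in coeffs]

# Illustrative inputs: the three-vertex path and the triangle.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

With these inputs the function returns the coefficient lists of x^3 - 2x and x^3 - 3x - 2, respectively.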
By substituting $\varepsilon = \alpha + \beta x$ into Eq. (1-1) it is shown that the HMO
energies obtained from the secular equation $\Delta(\varepsilon) = 0$ are nothing else but
the spectrum, i.e., the roots, of $P_G(x)$ for the graph corresponding to the
carbon atom skeleton of a molecule (Cvetkovic et al., 1980). From this
coincidence one can expect that some information from the HMO scheme might
be helpful for analyzing and understanding certain graph-theoretical aspects
of the relevant graph, and vice versa. This is the reason why in this chapter
we have entered the world of graph theory through the HMO gate.
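The substitution can be followed term by term. The short derivation below is a sketch that uses only Eqs. (1-1) and (2-1) and assumes that β is non-zero; the two-vertex example in the comments is added here for illustration.

```latex
\begin{align*}
\Delta(\varepsilon)
  &= \det\{(\alpha-\varepsilon)E + \beta A\} \\
  &= \det\{-\beta x E + \beta A\}
     && (\varepsilon = \alpha + \beta x) \\
  &= \beta^{N}\det(A - xE) \\
  &= (-\beta)^{N} P_G(x)
     && \text{by Eq. (2-1).}
\end{align*}
% Hence \Delta(\varepsilon) = 0 exactly when P_G(x) = 0.
% Example: for the two-vertex path graph, P_G(x) = x^2 - 1, so x = \pm 1
% and \varepsilon = \alpha \pm \beta.
```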
2.1.4. Distance polynomial
By using the distance matrix one can define the distance polynomial
(Hosoya et al., 1973, Graham and Lovasz, 1978) as
$$S_G(x) = (-1)^N \det(D - xE). \tag{2-2}$$
A number of interesting relations are known between the coefficients
of the distance polynomial and the topological structure of a graph (Graham
et al., 1977, Hosoya, 1988). Although the number of digits of those
coefficients rapidly increases with N, it has been shown that the distance
polynomial likewise cannot uniquely determine the structure of a graph.
$$Z_G = \sum_{k=0}^{m} p(G,k) \tag{2-4}$$
where d(i,j) is the shortest distance between the two vertices $v_i$ and $v_j$.
By using the set of p(G,k) numbers the following counting polynomial has
been defined,
$$\alpha_G(x) = \sum_{k=0}^{m} (-1)^k\, p(G,k)\, x^{N-2k}, \tag{2-6}$$
INTRODUCTION TO GRAPH THEORY 7
Now we are ready for realizing and comparing the graph-theoretical aspects
of various series of graphs.
Especially for the path graph, $S_N$, with N vertices, the characteristic
polynomial can be expressed in a closed form (Hosoya, 1971) as
$$P_{S_N}(x) = \sum_{k=0}^{[N/2]} (-1)^k \binom{N-k}{k}\, x^{N-2k} \tag{2-9}$$
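The closed form (2-9) is easy to tabulate. The sketch below reads the binomial coefficients in (2-9) as the non-adjacent numbers p(S_N, k) of the path graph, which is consistent with the statement elsewhere in this chapter that the two polynomials share their coefficients for tree graphs; the function names are hypothetical.

```python
from math import comb

def path_nonadjacent_numbers(n):
    """p(S_N, k) for k = 0 .. [N/2], read off from the closed form (2-9)."""
    return [comb(n - k, k) for k in range(n // 2 + 1)]

def path_characteristic_polynomial(n):
    """Coefficients of P_{S_N}(x), highest power first (length N + 1)."""
    coeffs = [0] * (n + 1)
    for k, p in enumerate(path_nonadjacent_numbers(n)):
        coeffs[2 * k] = (-1) ** k * p    # coefficient of x^(N - 2k)
    return coeffs

# The sum of the p(S_N, k) gives Z for the path graph.
z_values = [sum(path_nonadjacent_numbers(n)) for n in range(8)]
```

For N = 7 the non-adjacent numbers come out as 1, 6, 10, 4, and the Z values for N = 0 to 7 follow the Fibonacci pattern 1, 1, 2, 3, 5, 8, 13, 21.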
Table 1 (path graphs S_N)

                                p(G,k)
N   S_N                   k=0    1    2    3    Z_G     W
0   φ a)                    1                     1
1   o                       1                     1     0
2   o-o                     1    1                2     1
3   o-o-o                   1    2                3     4
4   o-o-o-o                 1    3    1           5    10
5   o-o-o-o-o               1    4    3           8    20
6   o-o-o-o-o-o             1    5    6    1     13    35
7   o-o-o-o-o-o-o           1    6   10    4     21    56

a) Vacant graph.
Table 2 (cycle graphs C_N)

                  p(G,k)
N   C_N     k=0    1    2    3    Z_G     W
1   C_1       1                     1     0
2   C_2       1    2                3     1
3   C_3       1    3                4     3
4   C_4       1    4    2           7     8
5   C_5       1    5    5          11    15
6   C_6       1    6    9    2     18    27
Fig. 3 Relation between the cycle graph Cn and path graphs Sn.
N   G      α_G(x)                        P_G(x)
0   φ
1   K_1    x                             x
2   K_2    x^2 - 1                       x^2 - 1
3   K_3    x^3 - 3x                      x^3 - 3x - 2
4   K_4    x^4 - 6x^2 + 3                x^4 - 6x^2 - 8x - 3
5   K_5    x^5 - 10x^3 + 15x             x^5 - 10x^3 - 20x^2 - 15x - 4
6   K_6    x^6 - 15x^4 + 45x^2 - 15      x^6 - 15x^4 - 40x^3 - 45x^2 - 24x - 5
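The characteristic polynomials of the complete graphs listed above can be cross-checked against a closed form. The relation below is a standard result for complete graphs rather than something stated in this chapter; the all-ones matrix J is a symbol introduced here only for the explanation.

```latex
% For K_N the adjacency matrix is A = J - E, where J is the N x N all-ones
% matrix (eigenvalue N once, eigenvalue 0 with multiplicity N - 1).
% A therefore has the eigenvalue N - 1 once and -1 with multiplicity N - 1:
\[
P_{K_N}(x) = (x - N + 1)\,(x + 1)^{N-1}.
\]
% Check for N = 4:
\[
(x - 3)(x + 1)^{3} = x^{4} - 6x^{2} - 8x - 3 .
\]
```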
3. Realization of a graph
3.1. SYMMETRY OF A GRAPH
First consider the complete graph, K_4, for which there is no a priori way
of drawing. Then, as in Fig. 4, one can draw several graphs whose
geometrical structures belong to different point groups, such as T_d (with
24 symmetry elements), D_4h (16), D_3h (12), etc. Among them the T_d and
D_3h structures are 3-dimensionally, or geometrically, possible, while the
The coefficient, p(G,m), of the last term of $Q_G(x)$ or $\alpha_G(x)$ is equal to the
number of the Kekulé structures, or the perfect matching number, of the
dodecahedron. In general the characteristic polynomial of a highly
symmetrical graph can be factored extensively. On the other hand, the
corresponding matching polynomial is not usually factorable. However, the
coefficient of the last term of the matching polynomial of a highly
symmetrical graph is found to be highly factorable (Hosoya, 1986). For
example, the Kekulé number of the truncated icosahedron, or the perfect
matching number of the pattern of the soccer ball, is equal to
12500 = 2^2 · 5^5, and that of the truncated dodecahedron is 2048 = 2^11.
From these MOs the bond orders of the component bonds can also quite
easily be calculated, just by noticing the fact that each of the triply
degenerate orbitals is to be occupied by 2/3 π-electron to attain the
uniform (or neutral) charge distribution. For example, the bond order of
bond 12 can be calculated as
$$P_{12} = 2\times\tfrac{1}{2}\times\tfrac{1}{2} + \tfrac{2}{3}\times\tfrac{1}{2}\times\tfrac{1}{2}\,(1\times 1 - 1\times 1 - 1\times 1) = \tfrac{1}{2} - \tfrac{1}{6} = \tfrac{1}{3},$$
and the same values can be obtained for all the component bonds, meaning
that the edge-topicity of this graph is unity. Finally one can realize the high
symmetry, K_4, of this graph.
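The bond-order arithmetic can be repeated for every edge at once. The sketch below assumes one conventional set of normalized eigenvectors for the four-vertex complete graph (the text's own figure is not recoverable, so this choice is an assumption) and uses the occupation numbers quoted above, 2 for the lowest MO and 2/3 for each of the three degenerate MOs. Vertices are numbered from 0, so bond 12 of the text is the pair (0, 1).

```python
from fractions import Fraction
from itertools import combinations

half = Fraction(1, 2)

# Assumed eigenvectors of the adjacency matrix of K_4 (rows are MOs).
mos = [
    [half,  half,  half,  half],   # x = 3
    [half,  half, -half, -half],   # x = -1
    [half, -half,  half, -half],   # x = -1
    [half, -half, -half,  half],   # x = -1
]
occupation = [Fraction(2), Fraction(2, 3), Fraction(2, 3), Fraction(2, 3)]

def bond_order(r, s):
    """Occupation-weighted sum of coefficient products over all MOs."""
    return sum(n * c[r] * c[s] for n, c in zip(occupation, mos))

def charge_density(r):
    """Occupation-weighted sum of squared coefficients at vertex r."""
    return sum(n * c[r] ** 2 for n, c in zip(occupation, mos))

bond_orders = {(r, s): bond_order(r, s) for r, s in combinations(range(4), 2)}
charges = [charge_density(r) for r in range(4)]
```

All six bond orders come out equal, and every vertex carries the same charge, which is the numerical signature of the high symmetry discussed in the text.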
In the above case the graph was so small that one can recognize its high
symmetry even without recourse to the above-mentioned procedure. However,
when a rather complicated structure such as the dodecahedron is drawn in so
deformed a way that no proper symmetry element can easily be recognized,
one can use a computer to get the eigenvalues (orbital energies),
eigenvectors (wavefunctions), and related quantities such as the charge
density and bond order. The method outlined here is too much of a
brute-force approach to be introduced in a usual textbook of graph theory.
However, clever readers will realize how well this method works for a
complicated problem.
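[Figure: the graph with vertices numbered 1-4 and its four MOs; MO 1 has x = 3 and MOs 2-4 have x = -1.]
[Figure: the Petersen graph, P(5,2), together with P(5,1), P(6,1), and P(6,2).]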
Continue this process until one gets a cubic graph, i.e., a graph whose
vertex degrees are all three. The caution necessary for the cases
with m=2n and 2n+2 is illustrated in Fig. 7, where P(6,1), P(6,2), P(8,3)
and P(12,5) are also shown. The cube and dodecahedron graphs (Fig. 5)
are, respectively, P(4,1) and P(10,2).
The two graphs, P(8,3) and P(12,5), have very interesting features in
common. Namely, they can be derived from the honeycomb lattice and the
edge topicity is unity (Hosoya, 1993). Although at present no actual
reaction network is known to be associated with these highly symmetrical
graphs, their topological structures are worthy of further study.
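Graphs of the P(m,n) family can be generated programmatically. The sketch below assumes the usual generalized Petersen construction (an outer cycle of m vertices, m spokes, and inner edges joining every n-th inner vertex); the construction in the text is only partly preserved, so this reading of P(m,n), as well as the function and vertex names, are assumptions.

```python
def generalized_petersen(m, n):
    """Edge set of P(m, n) under the assumed construction.

    Outer vertices ("u", i) form a cycle, each is joined by a spoke to the
    inner vertex ("v", i), and ("v", i) is joined to ("v", i + n mod m).
    """
    edges = set()
    for i in range(m):
        outer, inner = ("u", i), ("v", i)
        edges.add(frozenset((outer, ("u", (i + 1) % m))))   # outer cycle
        edges.add(frozenset((outer, inner)))                 # spoke
        edges.add(frozenset((inner, ("v", (i + n) % m))))    # inner edge
    return edges

def is_cubic(edges):
    """True if every vertex has degree three."""
    degree = {}
    for edge in edges:
        for vertex in edge:
            degree[vertex] = degree.get(vertex, 0) + 1
    return all(d == 3 for d in degree.values())

petersen = generalized_petersen(5, 2)
p_12_5 = generalized_petersen(12, 5)
degenerate = generalized_petersen(4, 2)   # m = 2n: inner edges coincide in pairs
```

In this sketch the case m = 2n shows why caution is needed: the inner edges collapse pairwise, the inner vertices end up with degree two, and the result is no longer cubic.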
3.2. ISOMORPHISM OF A GRAPH
3.2.1 Comparison of two graphs
Since a graph is a topological object which merely expresses the adjacency
relation for a given set of vertices or concepts, many different ways of
representing the same set of adjacency relations are possible, as in
Figs. 4 and 5. If a one-to-one correspondence between the adjacency
relations in two given graphs G_1 and G_2 is found, they are called
isomorphic to each other. Examples of isomorphic graphs are given in
Figs. 4 and 5. As stated before, a graph G with N vertices can be expressed
by an adjacency matrix A in N! different ways, one for each numbering of
the vertices. However, the mathematical properties of the matrix are
independent of the numbering of vertices. In other words, isomorphic graphs
have exactly the same topological quantities.
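The definition of isomorphism translates directly into an exhaustive search over the N! vertex numberings. The sketch below is only practical for small N and is meant to illustrate the definition; the helper names and the example graphs are illustrative.

```python
from itertools import permutations

def adjacency(n, edges):
    """0/1 adjacency matrix of a graph with vertices 0 .. n-1."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

def are_isomorphic(adj1, adj2):
    """Try every renumbering of the vertices (N! candidates)."""
    n = len(adj1)
    if n != len(adj2):
        return False
    for perm in permutations(range(n)):
        if all(adj1[i][j] == adj2[perm[i]][perm[j]]
               for i in range(n) for j in range(n)):
            return True
    return False

# The same four-vertex path numbered in two ways, and a star for contrast.
path_a = adjacency(4, [(0, 1), (1, 2), (2, 3)])
path_b = adjacency(4, [(0, 2), (2, 1), (1, 3)])
star = adjacency(4, [(0, 1), (0, 2), (0, 3)])
```

The two differently numbered paths are recognized as isomorphic, while the path and the star, although they have the same numbers of vertices and edges, are not.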
Two graphs are called homeomorphic with each other if they can be
reduced into the same graph by short-cutting without changing their
topological structure. A pair of path graphs, Sj and Sk, with different
number of vertices are not isomorphic, but homeomorphic with each other.
By successive deletion of an inner vertex followed by short-cutting both
the graphs can be reduced into S2 (See Table 1). Similarly any two cycle
graphs, Cj and Ck, can be reduced into C3 (Table 2) and are homeomorphic
with each other. The concept of homeomorphism might be important for the
analysis of a reaction network.
Given a pair of graphs with the same number of vertices and edges, one
often needs to judge if they are isomorphic or not. For a pair of relatively
large graphs no efficient algorithm other than exhaustive search is known
for judging if they are isomorphic. Although the characteristic polynomial
cannot always distinguish the topological structure of a graph, it can
be used for rough discrimination. Indeed, there are many examples
where two or more different graphs have the same characteristic polynomial
(See Table 4). One may say that for larger graphs the redundancy of the
characteristic polynomial, $r(P_G(x))$, is not negligible. This is also the case
with almost all of the topological indices, such as the Z-index, or the
Z-counting polynomial, $Q_G(x)$. Actually, for tree graphs we have
$r(Z) > r(Q_G(x)) = r(P_G(x))$ (Mizutani et al., 1971).
[Table 4: pairs of graphs, numbered 1-5, compared with respect to several topological quantities; the graph drawings, column headings, and entries are not recoverable.]
O means that the pair of graphs have the same value or formula, while X
The equality in the above relation comes from the fact that the set of the
coefficients of $Q_G(x)$ is identical to that of $P_G(x)$ for tree graphs, whereas
the inequality between Z and $Q_G(x)$ arises because the topological
information in the latter quantity is contracted into the former. In the
extensive tabulation of non-tree graphs the following inequality is
observed, $r(Q_G(x)) > r(P_G(x))$ (Kawasaki et al., 1971).
[Figure: isospectral graphs and isospectral points, with the cuts that relate them; the drawings are not recoverable.]
Since the coefficients of the distance polynomial are much larger than those
of the corresponding characteristic polynomial, one can expect it to
discriminate graphs more efficiently than the characteristic polynomial. As
seen in Table 4, all the isospectral and/or iso-Z pairs of graphs can be
discriminated by the distance polynomial, $S_G(x)$. However, as will be
shown below, the distance polynomial likewise does not uniquely determine
the topological structure of a graph.
 N   alkanes C_N H_2N+2   alkyl groups C_N H_2N+1
 1                    1                         1
 2                    1                         1
 3                    1                         2
 4                    2                         4
 5                    3                         8
 6                    5                        17
 7                    9                        39
 8                   18                        89
 9                   35                       211
10                   75                       507
11                  159                      1238
12                  355                      3057
13                  802                      7639
14                 1858                     19241
15                 4347                     48865
16                10359                    124906
17                24894                    321198
18                60523                    830219
19               148248                   2156010
20               366319                   5622109
Every isomeric compound has its own geometrical structure and properties.
The chemical properties, such as reactivities, are largely determined by the
electronic properties of a single molecule, whereas almost all the
thermodynamic properties, e.g., the boiling point and the density of a liquid,
are the outcome of interactions among a vast number of molecules of the same
species, as large as 10^20. Nevertheless, it has empirically been known that
there exist beautiful correlations between the topological structure and
various properties of chemical substances, arousing the curiosity of
scientists and mathematicians. Besides fruitful results in mathematical
chemistry, a new field of QSAR and QSPR studies has been cultivated, with
applications to drug and reaction design (Kier and Hall, 1976, 1986).
In this section only Table 7 is given for the readers. It compares
various topological quantities with the boiling point and liquid density of
the heptane isomers, which were taken from the extensive tabulation of
thermodynamic properties of hydrocarbons (Rossini et al., 1943).
Table 7 Topological properties and chemical properties of heptane isomers.
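[Table 7 column headings: Isomer (G), Z_G, W, bp, and further columns with footnotes a-c; the table body is not recoverable.]
[Figure: panels labelled "non-planar", "on torus", "- 3 edges", "- 1 edge", and "planar", with vertices numbered 1-6; the drawings are not recoverable.]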
4. Operation on graphs
If various relations among existing graphs are clarified through the
concept of operations on graphs, one can either develop a global discussion
or obtain a useful interpretation of information that has piled up in
disorder. Further, consider the case where one has to calculate topological
quantities for infinitely large networks. Unless practically useful
techniques, such as recursive formulas, for calculating those quantities are
known, one would surely be lost in the jungle of the so-called combinatorial
explosion.
One can construct larger graphs by applying various operations to small
component graphs. Four fundamental operations, union, join, product, and
composition, will be explained here. Given two graphs, G_1 and G_2, to be
operated on by the operation F, the third graph G_3 is generated as
$$G_3 = F(G_1, G_2) \tag{4-1}$$
Except for composition, an operation on graphs is usually commutative,
which can be expressed as
$$F(G_1, G_2) = F(G_2, G_1) \tag{4-2}$$
The simplest operation is the union, which does not change the total numbers
of edges and vertices:
$$v_3 = v_1 + v_2 \quad\text{and}\quad e_3 = e_1 + e_2$$
where $v_n$ and $e_n$ mean, respectively, the numbers of vertices and edges of
graph $G_n$. This means that the operation union, U, simply takes two
disjoint graphs, G_1 and G_2, as a disconnected graph G_3, or one can
express this as
$$U(G_1, G_2) = G_1 \cup G_2 \tag{4-3}$$
[Figure: the union G_1 ∪ G_2, join G_1 + G_2, product G_1 × G_2, and compositions G_1[G_2] and G_2[G_1] of G_1 = S_2 = K_2 (vertices a, b) and G_2 = S_3 (vertices 1, 2, 3).]
4.1.3. Composition
This operation is a little complicated. The operation G_1[G_2] is defined as
follows. The two vertices $x_j y_k$ and $x_m y_n$ are connected if and only if
either the edge $(x_j, x_m)$ exists in G_1, or $x_j = x_m$ and the edge $(y_k, y_n)$
exists in G_2. Then one can see that G_2[G_1] yields a different result from
G_1[G_2]. See Fig. 10 for the different effects of G_1[G_2] and G_2[G_1].
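The definition of the composition can be coded almost word for word. The sketch below represents a graph as a pair (vertex list, edge list), which is a convention chosen here, and applies the definition to G_1 = S_2 = K_2 and G_2 = S_3 with the vertex labels a, b and 1, 2, 3.

```python
def composition(g1, g2):
    """Composition G1[G2] of two graphs given as (vertices, edges) pairs.

    Vertices (xj, yk) and (xm, yn) are joined if and only if (xj, xm) is an
    edge of G1, or xj == xm and (yk, yn) is an edge of G2.
    """
    (v1, e1), (v2, e2) = g1, g2
    e1 = {frozenset(e) for e in e1}
    e2 = {frozenset(e) for e in e2}
    vertices = [(x, y) for x in v1 for y in v2]
    edges = set()
    for a in vertices:
        for b in vertices:
            if a == b:
                continue
            (xj, yk), (xm, yn) = a, b
            if frozenset((xj, xm)) in e1 or (xj == xm and frozenset((yk, yn)) in e2):
                edges.add(frozenset((a, b)))
    return vertices, edges

G1 = (["a", "b"], [("a", "b")])          # S_2 = K_2
G2 = ([1, 2, 3], [(1, 2), (2, 3)])       # S_3

g1_of_g2 = composition(G1, G2)
g2_of_g1 = composition(G2, G1)
```

Both compositions have six vertices, but G_1[G_2] has 13 edges while G_2[G_1] has 11, so the operation is not commutative.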
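[Table entries: Z given in closed form as $(\alpha^{N+1} - \beta^{N+1})/\sqrt{5}$ and $\alpha^N + \beta^N$.]
[Figure: a graph G with an edge l, together with G−l (l-exclusive) and G⊖l (l-inclusive).]
There are two and only two possibilities in choosing k disjoint edges:
they either contain l or not. In other words, the sum of the l-inclusive and
l-exclusive countings is equal to the value of p(G,k). They are nothing
else but the terms p(G⊖l,k-1) and p(G−l,k), respectively, where G−l is
obtained from G by deleting the edge l, and G⊖l by deleting l together with
all the edges adjacent to it. Then we have the recurrence relation,
$$p(G,k) = p(G-l,\,k) + p(G\ominus l,\,k-1).$$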
This relation automatically gives the recurrence relations for the two
counting polynomials, i.e., Z-counting and matching polynomials, as
Note that x in the second term of the right hand-side of Eq. (4-6) means a
selection of edge I as an entry among k disjoint edges from G. Figure 3
gives another example of application of these recurrence relations.
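The recurrence for the non-adjacent numbers, in which a chosen edge l is either excluded (giving G−l) or included (giving G⊖l), can be run directly on an edge list. The sketch below is a straightforward recursive implementation; the function name and the example graphs are illustrative, and the running time grows exponentially with the number of edges, so it is meant for small graphs only.

```python
def nonadjacent_numbers(edges):
    """Return [p(G,0), p(G,1), ...] via p(G,k) = p(G-l,k) + p(G(-)l,k-1).

    G-l   : the edge l is deleted.
    G(-)l : l and every edge sharing a vertex with l are deleted.
    """
    edges = [tuple(e) for e in edges]
    if not edges:
        return [1]                       # only the empty selection, p(G,0) = 1
    (u, v), rest = edges[0], edges[1:]
    exclusive = nonadjacent_numbers(rest)
    inclusive = nonadjacent_numbers(
        [e for e in rest if u not in e and v not in e])
    out = [0] * max(len(exclusive), len(inclusive) + 1)
    for k, p in enumerate(exclusive):
        out[k] += p                      # selections that do not contain l
    for k, p in enumerate(inclusive):
        out[k + 1] += p                  # selections that contain l
    return out

hexagon = [(i, (i + 1) % 6) for i in range(6)]   # cycle graph C_6
path7 = [(i, i + 1) for i in range(6)]           # path graph S_7
star = [(0, 1), (0, 2), (0, 3)]                  # a tree with four vertices
```

For the hexagon the result is 1, 6, 9, 2, and summing the entries gives Z = 18. For the four-vertex star there is no entry for k = 2, i.e. the perfect matching number of this tree is zero.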
4.3.2. Operator technique and transfer matrix
Besides the above-mentioned recurrence relation, a few other relations are
known for the p(G,k) numbers and the Z-counting polynomial. By
repeated use of these recurrence relations one can obtain these quantities
for relatively large networks with periodic structure. Whether useful
recurrence relations exist for a given topological quantity depends on
how it is defined. For the characteristic polynomial, recurrence
relations such as the one introduced in Table 9 are known. However, for
polycyclic graphs, even if they have periodic structure, practical
application of the recurrence relation is usually a formidable task,
even with a computer.
In order to overcome this difficulty for polycyclic network
systems, several mathematical techniques have recently been introduced,
such as the operator technique (Hosoya and Ohkami, 1983) and the transfer
matrix method (Randic et al., 1989).
Consider, for example, a series of polyacene graphs, or linearly growing
hexagonal animals, for which the recurrence relations of the Z-counting
(matching) and characteristic polynomials are to be sought. In both
cases we are forced to solve a simultaneous but entangled set of recurrence
relations for the three series of subgraphs, $I_n$, $L_n$, and $N_n$, derived from
the polyacene graphs.
where $I_n$, $L_n$, and $N_n$, respectively, stand for the matching polynomials of
the three series of graphs.
Then let us define a step-up operator $\hat{O}$ for promoting the n-th member of
(any kind of) counting polynomial $F_n$ to the (n+1)-th member as
$$\hat{O} F_n = F_{n+1} \quad (F = I, L, \text{and } N) \tag{4-9}$$
If one assumes that Eq. (4-9) applies commonly to $I_n$, $L_n$, and $N_n$, the
set of simultaneous recurrence formulas for a family of
regularly growing graphs can be transformed into a set of simultaneous
linear equations involving the operator $\hat{O}$ as a variable. Then the necessary
condition for a non-trivial solution is that the coefficient
determinant of Eq. (4-8) is zero. Namely, the determinant vanishes,
and we have
$$\hat{O}^3 - (x^4 - 5x^2 + 3)\,\hat{O}^2 + (x^4 - 3x^2 + 3)\,\hat{O} - 1 = 0,$$
which gives the corresponding recurrence formula for the matching
polynomials of the three series of graphs as
$$F_n = (x^4 - 5x^2 + 3)\,F_{n-1} - (x^4 - 3x^2 + 3)\,F_{n-2} + F_{n-3} \tag{4-11}$$
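The three-term recurrence for the matching polynomials can be checked numerically on the first few polyacenes. The sketch below makes several assumptions that are not in the text: the coefficient polynomials are read as x^4 - 5x^2 + 3 and x^4 - 3x^2 + 3, the n = 0 member of the series is taken to be a single edge, the graphs are built from two rails joined by rungs, and the matching polynomials are obtained by brute-force edge recursion. All function names are hypothetical.

```python
def nonadjacent_numbers(edges):
    """[p(G,0), p(G,1), ...] by recursion on the first edge of the list."""
    if not edges:
        return [1]
    (u, v), rest = edges[0], edges[1:]
    excl = nonadjacent_numbers(rest)
    incl = nonadjacent_numbers([e for e in rest if u not in e and v not in e])
    out = [0] * max(len(excl), len(incl) + 1)
    for k, p in enumerate(excl):
        out[k] += p
    for k, p in enumerate(incl):
        out[k + 1] += p
    return out

def matching_polynomial(edges, n_vertices):
    """Coefficients of the matching polynomial, lowest power first."""
    coeffs = [0] * (n_vertices + 1)
    for k, p in enumerate(nonadjacent_numbers(edges)):
        coeffs[n_vertices - 2 * k] = (-1) ** k * p
    return coeffs

def polyacene_edges(n):
    """n linearly fused hexagons: two rails of 2n+1 vertices plus n+1 rungs."""
    width = 2 * n + 1
    edges = []
    for i in range(width - 1):
        edges.append((("t", i), ("t", i + 1)))
        edges.append((("b", i), ("b", i + 1)))
    for i in range(0, width, 2):
        edges.append((("t", i), ("b", i)))
    return edges

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(*polys):
    out = [0] * max(len(p) for p in polys)
    for p in polys:
        for i, c in enumerate(p):
            out[i] += c
    return out

A_COEFF = [3, 0, -5, 0, 1]    # x^4 - 5x^2 + 3, lowest power first
B_COEFF = [3, 0, -3, 0, 1]    # x^4 - 3x^2 + 3, lowest power first

def F(n):
    """Matching polynomial of the polyacene with n hexagons (4n + 2 vertices)."""
    return matching_polynomial(polyacene_edges(n), 4 * n + 2)

def predicted(n):
    """Right-hand side of the three-term recurrence for n >= 3."""
    minus_b = [-c for c in poly_mul(B_COEFF, F(n - 2))]
    return poly_add(poly_mul(A_COEFF, F(n - 1)), minus_b, F(n - 3))
```

Under these assumptions, `predicted(3)` can be compared with `F(3)`, the brute-force matching polynomial of the three-ring member of the series.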
Not all but many of the graphs with an even number of vertices (N=2m) can
be spanned by m edges without leaving isolated vertices. The number of
ways for this type of selection is called the perfect matching number, and
can be expressed in terms of the non-adjacent number as p(G,m). In
graph theory a graph with p(G,m) ≠ 0 is said to be 1-factorable. A 1-factor
is a set of isolated edges, as the degrees of all its component vertices are
unity, while any set of disjoint cycles is a 2-factor, as all the vertex
degrees are two.
[Figure: butadiene, H2C=CH-CH=CH2, and its carbon skeleton C=C-C=C, with K(G)=1; a second example with K(G)=2.]
It is possible for a tree graph with even N to have zero K(G), as
exemplified below.
Acknowledgment
The author of this chapter is greatly indebted to Prof. Milan Randic for his
numerous useful suggestions for improving the manuscript.
References
New York.
Sinanoglu, O. and Lee, L.-S. (1978) "Finding All Possible a priori
Mechanisms for a Given Type of Overall Reaction", Theor. Chim. Acta
(Berl.) 48, 287-199.
Sinanoglu, O. and Lee, L.-S. (1979) "Finding the Possible Mechanisms
for a Given Type of Overall Reaction", Theor. Chim. Acta (Berl.) 51, 1-9.
Trinajstic, N. (1983) Chemical Graph Theory, 1st Ed., CRC Press, Boca
Raton, FL.
Trinajstic, N. (1992) Chemical Graph Theory, 2nd Ed., CRC Press, Boca
Raton, FL.
1. Introduction
* Dedicated to the memory of those brave Croatian men, women and children who died defending the
freedom and democracy in the Republic of Croatia against the Serbian and Montenegrin fascists.
** Permanent address: The Rugjer Boskovic Institute, P.O.B. 1016, HR-41001 Zagreb, The Republic of
Croatia.
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 37-72.
© 1994 Kluwer Academic Publishers.
38 N. TRINAJSTIC ET AL.
possible. Nevertheless, we will consider in the present article only the interplay between
MO theory at the Hückel level and graph theory. In this way the analysis will be
simple, clear and easily understood by the chemical community at large. Besides,
HMO theory, in spite of all of its shortcomings,11 is still being used by many a chemist,
e.g.,12-21 as a convenient device for qualitative rationalization of a variety of chemical
phenomena.
This article is structured as follows. The next section contains the exposition of the
fundamentals of graph theory needed later in the text. In the third section the equivalence
of graph spectral theory and Hückel molecular orbital theory is presented. In the fourth
section a brief discussion of the characteristics of the Hückel spectrum is given. The fifth
section contains the presentation of the topological effect on molecular orbitals (the
TEMO concept), whilst in the sixth section the graph-theoretical formulae for the
HOMO-LUMO separation and absolute hardness are considered. The concept of
topological charge stabilization is detailed in the seventh section. The eighth section
contains the graph-theoretical analysis of the localization energy. The article ends with
concluding remarks.
This section gives only those graph-theoretical concepts and definitions that
will be utilized in the present article. In doing this we will follow Frank Harary's classic
text "Graph Theory"22 and our own books3,5 on chemical graph theory.
A simple graph G is defined as an ordered pair (V(G), E(G)), where V=V(G) is a
nonempty set of elements called vertices (or points) of G and E=E(G) is a set of
unordered pairs of distinct elements of V called edges (or lines). Whenever we mention
the term graph in the text, we will always refer to only a simple graph.
A graph G can be visualized by means of a diagram when the vertices are drawn as small
circles or dots and the edges as lines or curves joining the appropriate circles. Since a
diagram of a graph fully describes the graph, it is customary to refer to the diagram of
the graph as the graph itself. As an example of a simple graph we give in Figure 1 a
diagram of a labelled simple graph G. A graph G is labelled if a certain numbering of
vertices in G is introduced.
Figure 1 A diagram of a labelled simple graph G.
THE INTERPLAY BETWEEN GRAPH THEORY AND MOLECULAR ORBITAL THEORY 39
Many results which can be proved for simple graphs may be extended without difficulty
to more general graphs in which two vertices may have more than one edge connecting
them (multiple edges) or edges may join vertices to themselves (loops). An example of a
general labelled graph is given in Figure 2.
Figure 2 A diagram of a labelled general graph G.
A subgraph G' of a graph G is a graph all of whose vertices and edges are contained
in G. Subgraphs of G can be generated from G by deleting any number of vertices and/or
edges. By definition a graph is its own subgraph. In Figure 3 we give several subgraphs
of G.
Figure 3 Several subgraphs of a graph G. G2' and G6' are isomorphic subgraphs.
Note that G1' is an acyclic spanning subgraph of G. Any subgraph of a graph G which
contains all its vertices is a spanning subgraph of G. Subgraph G1' can be denoted as G-e
as it is obtained by deletion of the edge e from G. Frequently occurring subgraphs are G3'
and G5' which are denoted as G-r and G-r-s, respectively. Subgraph G-r is obtained by
deletion of the vertex r and its incident edges from G. Subgraph G-r-s is generated by
removal of vertices r and s and their incident edges from G.
Two vertices r and s of a graph G are adjacent (first neighbours) if there is an edge
joining them. Vertex s and edge e in G of Figure 3 are incident as the edge e terminates
at s.
A graph G is planar if it can be drawn in the plane in such a way that no two edges
intersect. Otherwise a graph is non-planar. Examples of a planar graph and a nonplanar
graph are given in Figure 4.
Figure 4 Examples of a planar graph (G1) and a nonplanar graph (G2).
site ↔ vertex
connection ↔ edge
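$$
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\qquad (2)
$$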
Typically, the adjacency matrix is a sparse matrix: amongst its $N^2$ matrix elements there
are $2|E|$ nonzero entries, where $|E|$ denotes the number of edges in G.
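The construction of an adjacency matrix from an edge list, and the count of nonzero entries, can be sketched as follows. The edge list is an assumption read off from the rows of matrix (2) as printed; the function name is hypothetical.

```python
def adjacency_matrix(n_vertices, edges):
    """Symmetric 0/1 matrix of a simple graph; vertices are numbered from 1."""
    a = [[0] * n_vertices for _ in range(n_vertices)]
    for r, s in edges:
        a[r - 1][s - 1] = 1
        a[s - 1][r - 1] = 1      # an edge is an unordered pair, so A is symmetric
    return a

# Edge list assumed from the rows of matrix (2): an eight-vertex graph.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (2, 7), (7, 8)]
A = adjacency_matrix(8, edges)

nonzero = sum(entry != 0 for row in A for entry in row)
print(nonzero, 2 * len(edges))   # both counts are the same
```

Each edge contributes one entry above and one below the diagonal, which is why the number of nonzero elements is twice the number of edges.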
The characteristic (secular, spectral) polynomial P(G;x) of a graph G is the characteristic
polynomial of its adjacency matrix, A = A(G):
$P(G;x) = \det|xI - A|$    (3)
where I is the N×N unit matrix. The characteristic polynomial can be written as
$P(G;x) = \sum_{n=0}^{N} a_n x^{N-n}$    (4)
A graph eigenvalue $x_i$ (i = 1, ..., N) is a zero (root) of its characteristic polynomial. The
collection of all graph eigenvalues $\{x_1, ..., x_N\}$ forms the spectrum of the graph.24 The
eigenvalues are real and the interval they span is bounded. According to the Frobenius
theorem,25 the limits of the graph spectrum are determined by the maximum degree
(valency) of vertices in a graph.
There are many methods available for the construction of the characteristic polynomial,
e.g.,26 We usually use the Le Verrier-Faddeev-Frame method.27,28
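A minimal sketch of the Le Verrier-Faddeev-Frame recursion is given below, assuming the usual form $M_1 = I$, $a_k = -\mathrm{tr}(A M_k)/k$, $M_{k+1} = A M_k + a_k I$, with the coefficients ordered as in Eq. (4). It is written in plain Python with integer arithmetic and is not meant to reproduce the authors' own implementation; the benzene graph is used only as a test case.

```python
def characteristic_coefficients(a):
    """Coefficients [a_0, a_1, ..., a_N] of det(xI - A) = sum a_n x^(N-n)."""
    n = len(a)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    m = identity                       # M_1 = I
    coeffs = [1]                       # a_0 = 1
    for k in range(1, n + 1):
        am = [[sum(a[i][t] * m[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]       # A M_k
        trace = sum(am[i][i] for i in range(n))
        a_k = -trace // k              # exact: the trace is divisible by k here
        coeffs.append(a_k)
        m = [[am[i][j] + a_k * identity[i][j] for j in range(n)]
             for i in range(n)]        # M_(k+1) = A M_k + a_k I
    return coeffs

# Benzene graph: a six-membered cycle.
benzene = [[1 if (r - s) % 6 in (1, 5) else 0 for s in range(6)] for r in range(6)]
print(characteristic_coefficients(benzene))
```

For benzene the result corresponds to $x^6 - 6x^4 + 9x^2 - 4 = (x^2 - 4)(x^2 - 1)^2$, whose zeros are the familiar values ±2, ±1, ±1.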
The simplest form of MO theory is the Hückel molecular orbital (HMO) theory.29
The theoretical framework of the HMO model has been presented very often in the
literature.8,30,31 Therefore we shall repeat it here only briefly. In Hückel theory, only the
π-electrons are considered explicitly. This is the result of the Hückel assumption that σ
and π electrons in a conjugated system are separated, i.e., that the related wavefunctions
are mutually orthogonal:
(5)
H(Hückel) is the effective one-electron Hamiltonian. It is defined only to the extent that
we give its matrix elements (see eqs. (14)-(15)). The quantity $E_i$ is the energy
eigenvalue associated with $\psi_i$. The π MOs are expressed in the usual linear combination
of atomic orbitals (LCAO) form,
$\psi_i = \sum_r c_{ir} \phi_r$    (7)
where $c_{ir}$ is the linear expansion coefficient and $\phi_r$ is a $2p_z$ orbital on atom r. The
summation is over all conjugated centers in a molecule. The functions $\psi_i$ are
orthonormal, that is,
$\langle \psi_i | \psi_j \rangle = \delta_{ij}$    (8)
If the occupation number of $\psi_i$ is denoted by $n_i$, the total π-electron energy $E_\pi$ of such
an electronic configuration is given by,
The coefficients $c_{ir}$ are obtained from the requirement that $E_\pi$ should be a minimum,
which by means of the standard variational procedure leads to a set of secular equations,
$\sum_{s=1}^{N} c_{is} (H_{rs} - E_i S_{rs}) = 0$    (i, r = 1, ..., N)    (12)
The zeros of (13) determine the π MO energies: $E_1, E_2, ..., E_i, ..., E_N$. For a given $E_i$, the
set of eqs. (12) determines the coefficients $c_{is}$ of the i-th MO.
The secular determinant (13) can be simplified by using the set of approximations
originally introduced by Bloch32 and utilized by Hückel.29 Since the Hückel
Hamiltonian is not known explicitly, its matrix elements, eqs. (10) and (11), can be
related to empirical quantities. The diagonal elements ($H_{rr}$) are assumed to be constant
for all identical orbitals; they are called Coulomb integrals and are given an empirical
value of α,
$H_{rr} = \langle \phi_r | H(\text{Hückel}) | \phi_r \rangle = \alpha$    (14)
The off-diagonal elements ($H_{rs}$) are assumed to be zero unless orbitals $\phi_r$ and $\phi_s$ are
located on bonded atoms. For bonded atoms $H_{rs}$ are assumed to be the same for all
similar bonds. They are called resonance integrals and are given an empirical value of β,
$H_{rs} = \langle \phi_r | H(\text{Hückel}) | \phi_s \rangle = \begin{cases} \beta & \text{if atoms } r \text{ and } s \text{ are bonded} \\ 0 & \text{otherwise} \end{cases}$    (15)
Furthermore, zero overlap is assumed between neighbouring atoms, that is,
$S_{rs} = \delta_{rs}$    (16)
In matrix form the secular determinant reads
$\det|H - E_i S| = 0$    (17)
where H is the Hamiltonian matrix and S is the overlap matrix. As a result of the Bloch-
Hückel approximations, the matrices H and S have the following composition,33
$H = \alpha I + \beta A$    (18)
$S = I$    (19)
where A is the adjacency matrix of the Hückel graph. The matrix $[H - E_i S]$ is called the
Hückel matrix. Substitution of H and S by (18) and (19) into (17), and division of each
row of the determinant by β, gives,
$\det\left| \frac{E_i - \alpha}{\beta} I - A \right| = 0$    (i = 1, ..., N)    (20)
If the normalized form of Hückel theory is used, i.e., if β is taken as the energy unit and
α as the zero-energy reference point, or β = 1 and α = 0, then eq. (20) becomes the Hückel
determinant,
$\det|E_i I - A| = 0$    (i = 1, ..., N)    (21)
The comparison between the Hückel determinant and the secular determinant (3):
$\det|x_i I - A| = 0$    (i = 1, ..., N)    (22)
reveals that $E_i$, representing the energies of individual Hückel MOs, are identical to the
elements of the spectrum of the adjacency matrix of a Hückel graph,
$[H, A] = 0$    (24)
they possess the same set of eigenvectors. Therefore, the eigenvectors of the adjacency
matrix are identical to the Hückel MOs. On account of this the HMOs are sometimes
referred to as the topological molecular orbitals.34
Eq. (18) also reveals that the Hückel Hamiltonian is a linear function of the adjacency
matrix,7
$H = H(A)$    (25)
This is due to the particular nature of the Hückel Hamiltonian, with the short-range
forces being dominant in the effective potential.34
The analysis in this section results in two important conclusions: (i) The spacing and
general pattern of Hückel eigenvalues are specified by the skeletal atom-atom
connectivity in the conjugated molecule and (ii) The skeletal atom-atom connectivity in
the conjugated molecule, rather than its geometry, determines the form of Hückel
molecular orbitals. Therefore, what chemists customarily call the Hückel MO theory
is essentially what graph-theoreticians refer to as graph-spectral theory. In fact Hückel
theory and graph-spectral theory are isomorphic theories for the specified class of graphs
(planar graphs with the maximum valency 3).3-9,24,35
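The statement that the Hückel Hamiltonian and the adjacency matrix share their eigenvectors can be illustrated numerically. The sketch below assumes the linear form $H = \alpha I + \beta A$ and uses the benzene cycle, for which the vectors with components $\cos(2\pi k r/N)$ are known eigenvectors of A with eigenvalues $2\cos(2\pi k/N)$. The numerical values of α and β are arbitrary placeholders, not recommended parameters.

```python
import math

def cycle_adjacency(n):
    """Adjacency matrix of an n-membered ring."""
    return [[1 if (r - s) % n in (1, n - 1) else 0 for s in range(n)]
            for r in range(n)]

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def shared_eigenvector_levels(n=6, alpha=-6.0, beta=-2.5, tol=1e-9):
    """Check A v = x v and H v = (alpha + beta x) v for the same vectors v."""
    a = cycle_adjacency(n)
    h = [[alpha * (r == s) + beta * a[r][s] for s in range(n)] for r in range(n)]
    levels = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        v = [math.cos(theta * r) for r in range(n)]
        x = 2 * math.cos(theta)                 # graph eigenvalue
        energy = alpha + beta * x               # Hückel orbital energy
        av, hv = matvec(a, v), matvec(h, v)
        ok = all(abs(av[r] - x * v[r]) < tol and abs(hv[r] - energy * v[r]) < tol
                 for r in range(n))
        levels.append((round(x, 6), round(energy, 6), ok))
    return levels

levels = shared_eigenvector_levels()
```

Every vector that diagonalizes A also diagonalizes H, and each orbital energy is obtained from the corresponding graph eigenvalue by the same linear map, whatever values are chosen for α and β.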
4. Hückel Spectrum
The extrema of the spectrum are defined, as we already said, by the Frobenius theorem.
Since the maximum degree in Hückel graphs is equal to 3, the interval in which the
Hückel spectrum lies is given by
$-3 \le x_i \le +3$    (i = 1, ..., N)    (27)
Hückel graphs with spectra containing only integers are very rare. Actually there are
only five conjugated systems with integer Hückel spectra.36
A more common case is the occurrence of nonisomorphic conjugated molecules with
identical Hückel spectra.37,38 They are named isospectral molecules7 in chemical graph
theory, although the term cospectral is suggested by Harary22 as more appropriate.
The Hückel spectrum consists of three subsets corresponding to bonding ($x_i > 0$, i.e.,
$E_i < 0$), nonbonding ($x_i = 0$, i.e., $E_i = 0$) and antibonding ($x_i < 0$, i.e., $E_i > 0$) energy levels. The
cardinalities of these subsets are denoted by $N_+$, $N_0$ and $N_-$, respectively, and they are
related to the number of conjugated centers (N) as:
$N_+ + N_0 + N_- = N$    (28)
These quantities are important for the chemical behaviour of conjugated molecules. The
presence of nonbonding energy levels (and nonbonding MOs) indicates that such a
molecule should have an open-shell ground state (within the HMO model) and be very
reactive.39 The experimental fact40 is that structures possessing nonbonding MOs are
rarely encountered in the chemistry of conjugated molecules.
For our purpose, related to the scope of this book, the especially important eigenvalues are
$x_n$ and $x_{n+1}$ (where n = N/2 if N is even and n = (N+1)/2 if N is odd) which
correspond to the frontier orbitals, HOMO (highest occupied molecular orbital) and
LUMO (lowest unoccupied molecular orbital), respectively. They are chemically the
most important orbitals,41-46 since they are directly involved in chemical reactions. The
following rule appears to be generally valid: The smaller the HOMO-LUMO separation
the more reactive the molecule is expected to be. Additionally, many molecular
properties such as the UV-vis spectral characteristics, polarographic half-wave oxidation
and reduction potentials, ionization potentials, electron affinities, the charge-transfer
energy in molecular complexes, etc., are largely dependent on the frontier orbitals and
their energy separation.
Figure 6 Examples of topomeric benzenoids. The broken lines indicate bonds which
need to be ruptured in order to transform one topomer into the other.
Figure 7 A schematized pair of topomers.
where $P_{rs}$ and $P_{uv}$ denote paths connecting vertices r and s in A, and u and v in B,
respectively. The difference Δ(x) of polynomials (29) and (30) is given by,
In the special case when the subunits A and B are isomorphic and the sites u and v
coincide with the sites r and s, as is the case for the pair S1 and T1 in Figure 6, (31)
reduces to
Obviously, in this specific case Δ(x) is non-negative for all x and the following
inequality holds
There are several consequences of (33) for those specially constructed S and T isomers.
We present two of them. Firstly, the total π-electron ground state energies, E(S) and
E(T), of those isomers obey,54
(36)
Figure 8 The eigenvalue pattern of phenanthrene (S) and anthracene (T). Both spectra
are symmetric about x = 0; the positive eigenvalues shown are
S: 2.43, 1.95, 1.51, 1.31, 1.14, 0.77, 0.61
T: 2.41, 2.00, 1.41, 1.41, 1.00, 1.00, 0.41
Besides the Hückel molecular orbital theory, the TEMO principle has been tested against
more sophisticated MO theories and/or experimental data. For example, it has been
tested and confirmed in a series of ab initio calculations at the SCF-HF level.55-58 Similarly,
the examination of hundreds of experimental data on topomeric pairs also confirmed the
validity of the TEMO principle.50 Cases of violations are sometimes noticed when
strong steric effects and/or pronounced non-uniformity of heteroatoms are present in
topomers.
For our purpose in the present article it is relevant how the TEMO principle affects the
HOMO-LUMO separation. The following has been found:49 In the case of specially
constructed topomeric pairs of alternant hydrocarbons (AHs) with N = 4n+2 (n ≥ 1)
π-electrons the HOMO-LUMO separation is larger in S than in T. If the topomeric pairs
contain N = 4n (n ≥ 1) π-electrons then the opposite is true, i.e., the HOMO-LUMO
separation is larger in T than in S.
For example, the comparison of the well-known topomeric pair with 4n+2 π-electrons,
phenanthrene (S) and anthracene (T), leads to the prediction that the HOMO-LUMO
separation should be larger in phenanthrene and consequently that this molecule should
be more stable than the related isomer. HMO calculations confirm the TEMO prediction
(HOMO-LUMO(S): 1.21β vs HOMO-LUMO(T): 0.82β) and the experimental evidence
also points to phenanthrene as much the more stable compound of the two.59
The TEMO predictions concerning HOMO-LUMO separation are also supported by the
experimental energies of the p-bands, i.e., bands arising from the π-electron jump from
the HOMO to the LUMO level. This can be illustrated by considering a topomeric pair
with 4n π-electrons: dibenzo[fg,ij]pentaphene (S) and dibenzo[fg,qr]pentacene (T) (see
Figure 9).
Figure 9 A pair of topomeric alternants with 4n π-electrons.
The prediction that the HOMO-LUMO separation should be larger for T is fully
supported by the experimental energies of the p-bands in their absorption spectra: pS =
2.98 eV vs pT = 3.01 eV.60
The TEMO principle also affects the ordering of the ionization potentials and other
physical and chemical properties in S and T isomers.
The importance of the frontier orbitals was already noticed by Hückel in his study of
the alkaline reduction of naphthalene and anthracene.61 Other theoreticians who also
observed early the significance of the frontier orbitals for the outcome of a chemical
reaction were Moffitt62 and Walsh.63,64 However, the systematic and detailed study of
the role of the frontier orbitals and the HOMO-LUMO separation in the theory of
chemical reactivity was carried out by Fukui.41-43,45,46,65,66 These concepts have also
been incorporated into the PMO theory of Dewar67 and the Woodward-Hoffmann
rules.68 In this section we will give approximate formulae for the estimation of the
HOMO-LUMO separation based on graph-theoretical quantities. The HOMO-LUMO
separation, denoted by δ, is given by,
$\delta = x_n - x_{n+1}$    (37)
$x_{n+1} = -x_n$    (38)
$\delta = 2x_n$    (39)
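Eqs. (37)-(39) can be tried out on the linear polyenes, whose graphs are paths and whose eigenvalues have the well-known closed form $x_j = 2\cos(j\pi/(N+1))$, j = 1, ..., N. The sketch below is an illustration under that assumption; the function names are hypothetical and N is taken to be even.

```python
import math

def polyene_spectrum(n_sites):
    """Eigenvalues of the path graph, sorted from x_1 (largest) downwards."""
    return sorted((2 * math.cos(j * math.pi / (n_sites + 1))
                   for j in range(1, n_sites + 1)), reverse=True)

def homo_lumo_separation(n_sites):
    """delta = x_n - x_(n+1) with n = N/2, Eq. (37), for even N."""
    x = polyene_spectrum(n_sites)
    n = n_sites // 2
    return x[n - 1] - x[n]          # x is 0-indexed: x[n-1] is x_n

for n_sites in (4, 6, 8):
    x = polyene_spectrum(n_sites)
    n = n_sites // 2
    # Pairing of the levels, Eq. (38), makes delta equal to 2 x_n, Eq. (39).
    print(n_sites, round(homo_lumo_separation(n_sites), 3), round(2 * x[n - 1], 3))
```

For butadiene the separation is $\sqrt{5} - 1 \approx 1.236$ in β units, and it shrinks as the chain grows, in line with the greater reactivity of longer polyenes.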
In the past, several researchers have been interested in studying the structural factors
which influence the HOMO-LUMO separation.70-75 They have established that the
structural characteristics of the conjugated molecule, such as branching and cyclicity,
influence the HOMO-LUMO separation, though in a very complicated way.76 Later
efforts have produced approximate formulae for the HOMO-LUMO separation.77,78
Gutman and Rouvray77 derived the first approximate formula for the HOMO-LUMO
separation in alternant hydrocarbons (AHs). This formula is given below,
(40)
where N is the number of sites in a graph G depicting an AH, whilst $a_{N-2}$ and $a_N$ are the
last two coefficients of the characteristic polynomial of G. The application of formula
(40) is illustrated in Figure 10.
Figure 10 The naphthalene graph G and the characteristic polynomial of G.
(41)
The symbols in (41) have their previous meaning. The use of this formula for predicting
the HOMO-LUMO separation in naphthalene is demonstrated below (where the data are
taken from Figure 10):
$\delta(\text{naphthalene}) = \frac{3 \cdot 10 - 2}{10} \left( \frac{9}{43} \right)^{1/2} = 1.281\beta$    (42)
This time the agreement between the computed and exact δ-values is much better, the
difference being only 0.045β. Formula (42) has been tested, for example, on a modest set
of benzenoids and has produced reasonable agreement with exact HOMO-LUMO
values (see Table 1 and Figure 11).
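The arithmetic behind the naphthalene estimate can be reproduced in a few lines. The sketch below rests on an assumption: the general formula (41) is read from the worked numbers in (42) as $\delta \approx \frac{3N-2}{N}(|a_N|/|a_{N-2}|)^{1/2}$, with $|a_N| = 9$ and $|a_{N-2}| = 43$ for naphthalene. The exact Hückel value used for comparison is the well-known naphthalene HOMO at $(\sqrt{5}-1)/2$ in β units. The function name is hypothetical.

```python
import math

def estimated_separation(n_sites, a_last, a_second_last):
    """Assumed reading of formula (41): ((3N - 2)/N) * sqrt(|a_N| / |a_(N-2)|)."""
    return (3 * n_sites - 2) / n_sites * math.sqrt(abs(a_last) / abs(a_second_last))

estimate = estimated_separation(10, -9, 43)   # naphthalene data
exact = math.sqrt(5) - 1                      # 2 * 0.618..., exact Hückel gap
print(round(estimate, 3), round(exact, 3), round(estimate - exact, 3))
```

Under this reading the estimate and its deviation from the exact value agree with the figures quoted in the text (1.281β and 0.045β).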
[Plot: exact Hückel HOMO-LUMO gap versus the estimate δ, with the linear fit
HOMO-LUMO = aδ + b; n = 24, s = 0.118, R = 0.926, F = 132.7,
a = 1.228 ± 0.107, b = -0.149 ± 0.101.]
Figure 11 A plot of Hückel vs estimated HOMO-LUMO gap (in β units).
However, regardless of the limited success of these two formulae,77,78 it is easily seen that
Hall's pessimistic prediction in 1977,76 that the exact analytical expression for the
HOMO-LUMO separation will probably be very difficult to obtain, remains
unchallenged.
Although both formulae are rather approximate, they allow several interesting
inferences. For example, the following relationships for alternants containing only 4n+2
(n ≥ 1) cycles hold,80
are given in Figure 12.
In the case of AHs containing 4n+2 and/or 4n (n ≥ 1) cycles, the Kekulé-structure count in
(45) should be replaced by the algebraic-structure count (ASC),81,82
There are a number of methods available for counting the K(G) and K(G-r-s) or ASC(G) and
ASC(G-r-s) numbers of G.23,83-86 However, formulae (45) and (46) are interesting per
se because they reveal the relationship (although only approximate) between an MO
concept (the HOMO-LUMO separation) and a VB concept (the Kekulé-structure
count or the algebraic-structure count). In other words, they disclose that the
HOMO-LUMO gap in AHs is also calculable by use of VB quantities.
The relationship between the HOMO-LUMO separation and graph-theoretical quantities
also allows a similar analysis of the concept of absolute hardness. The absolute hardness η
of a molecule is defined as,87
$\eta = \frac{1}{2} \left( \frac{\partial^2 E}{\partial N^2} \right)_v$    (47)
where E is the electronic energy of the molecule, N is the number of electrons in the
molecule and v is the external potential due to the nuclei. A finite approximation to eq.
(47), within the validity of Koopmans' theorem, is given by,
$\eta = \frac{I - A}{2}$    (48)
where the symbols I and A stand for the ionization potential and electron affinity,
respectively. Note that formula (48) is independent of any molecular model. If MO
theory is used,88 the absolute hardness can be defined in terms of the frontier orbitals,
$\eta = \frac{E_{LUMO} - E_{HOMO}}{2}$    (49)
$\eta = -E_{HOMO}$    (51)
$E_{HOMO}$ and $E_{LUMO}$ may be computed by means of any MO model such as the Hartree-
Fock or Hückel MO model.
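The step from the derivative definition of hardness to the finite approximation in terms of I and A can be made concrete. The sketch below assumes the customary forms $\eta = \frac{1}{2}(\partial^2 E/\partial N^2)_v$ and $\eta \approx (I - A)/2$, with $I = E(N-1) - E(N)$ and $A = E(N) - E(N+1)$, and uses a purely hypothetical quadratic model for E(N); none of the numbers refer to a real molecule.

```python
def hardness_from_energies(e_cation, e_neutral, e_anion):
    """(I - A)/2 from the energies of the N-1, N and N+1 electron systems."""
    ionization_potential = e_cation - e_neutral     # I = E(N-1) - E(N)
    electron_affinity = e_neutral - e_anion         # A = E(N) - E(N+1)
    return (ionization_potential - electron_affinity) / 2

def model_energy(n_electrons, c0=-10.0, c1=-3.0, c2=0.3):
    """Hypothetical quadratic model E(N) = c0 + c1*N + c2*N**2."""
    return c0 + c1 * n_electrons + c2 * n_electrons ** 2

n = 6
eta = hardness_from_energies(model_energy(n - 1), model_energy(n), model_energy(n + 1))
print(eta)   # equals half the second derivative of the model, i.e. c2
```

Since (I - A)/2 is the central second difference of E(N) divided by two, it reproduces half the second derivative exactly for a quadratic E(N) and approximately otherwise.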
The absolute hardness has been used by Parr et al. as a measure of aromaticity.89-91
However, because of its definition in terms of the frontier orbitals, the absolute hardness
is a theoretical quantity which may be regarded as a unifying criterion for both the
aromatic (thermodynamic) stability and reactivity (kinetic stability) of a molecule. In this
sense the harder the polycyclic molecule, the more aromatic and less reactive it is.
(52)
The symbols in eq. (49) have their previous meaning. This formula gives, for example,
for the Hückel absolute hardness of naphthalene,
$\eta = \frac{3N - 2}{2N} \left\{ \frac{[K(G)]^2}{\sum_{r,s} [K(G-r-s)]^2} \right\}^{1/2}$    (54)
or in terms of the algebraic-structure counts for AHs containing 4n+2 and 4n (n ≥ 1)
cycles as
The rule of topological charge stabilization states that the heteroatoms prefer to be
placed at those positions where their electronegativities match the charge distribution as
determined by the uniform reference frame. Charge densities may be computed by
Hückel theory, extended Hückel theory or ab initio MO theory. This rule has been
applied successfully to a number of inorganic and organic, planar and non-planar
systems.94,96-104
The validity of the rule of topological charge stabilization goes beyond Hückel theory and
indeed even the molecular orbital approximation, as the following argument based on
first-order perturbation theory shows.97 (This argument was originally given by Parr in a
private communication to Gimarc.) Consider the uniform reference frame as the
unperturbed system with Hamiltonian $H^0$, wavefunction $\Psi^0$ and total energy $E^0$
connected by the Schrödinger wave equation,
$H^0 \Psi^0 = E^0 \Psi^0$    (56)
If we introduce a heteroatom into the system, keeping the molecular structure and the
number of electrons fixed, this represents a perturbation. The perturbation
Hamiltonian H' may be expressed as a sum of changes in the coulombic nuclear-electron
attraction terms due to alterations in nuclear charges $\Delta Z_a$ which result from substitution
of a heteroatom at position a,
where a and i are used as labels for the nuclei and the electrons, respectively. For the
perturbed system described by,
$H = H^0 + H'$    (58)
the total energy E can be computed as the sum of the unperturbed (zero-order) energy
and the higher-order corrections,
Since the operator H' involves only multiplication, the $\Psi^0$ factors can be joined together
within the integral to give the unperturbed electron density,
(61)
Then,
(62)
Therefore, to achieve maximum stability (lowering of the energy) through the correction
$E^{(1)}$, the heteroatoms with largest ΔZ should occupy those positions in the molecule where
the electron density $\rho^0$ is already largest in the unperturbed or reference frame. For
qualitative considerations it is convenient to take into account valence electrons only and
to replace ΔZ by changes in effective nuclear charge Δζ, or even more simply, to employ
electronegativity as a rough measure of Δζ.
As we already said, the rule of topological charge stabilization has been applied to a wide
selection of systems from both inorganic and organic chemistry. Here we will give only
a few illustrative examples. The reader who is interested in more examples is referred to
the papers on the subject by Gimarc and co-workers.94,96-103
Let us consider the series of isomeric thiophthenes (shown in Figure 13) which are
isoelectronic with the pentalene dianion, i.e., the uniform reference frame with 10
π-electrons in the eight-orbital system. In Hückel MO theory the charge density $q_r$ at atom
r is given by,
$q_r = \sum_i n_i c_{ir}^2$    (63)
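A worked instance of the charge-density sum may help. The sketch below assumes the standard Hückel expression $q_r = \sum_i n_i c_{ir}^2$ and applies it to the allyl anion, chosen here only because its three Hückel MOs are simple enough to write down by hand; it is not one of the systems treated in the text.

```python
import math

def charge_densities(mo_coefficients, occupations):
    """q_r = sum_i n_i * c_ir**2 for every atom r."""
    n_atoms = len(mo_coefficients[0])
    return [sum(n_i * c[r] ** 2 for c, n_i in zip(mo_coefficients, occupations))
            for r in range(n_atoms)]

s = 1 / math.sqrt(2)
allyl_mos = [
    [0.5, s, 0.5],      # bonding MO
    [s, 0.0, -s],       # nonbonding MO
    [0.5, -s, 0.5],     # antibonding MO
]
# Allyl anion: four pi-electrons, two in each of the two lowest MOs.
q = charge_densities(allyl_mos, [2, 2, 0])
print([round(value, 3) for value in q])
```

The charges add up to the number of π-electrons, and the excess density sits on the terminal atoms, which is the kind of nonuniform pattern the rule of topological charge stabilization exploits.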
where $c_{ir}$ is the coefficient of atomic orbital r in the molecular orbital $\Phi_i$ and $n_i$ is the
number of electrons (2, 1 or 0) in orbital $\Phi_i$. The charge density distribution of the pentalene
dianion is also given in Figure 13. Incidentally, the pentalene dianion has been
prepared.105
Figure 13 π-electron charge distribution in the pentalene dianion and diagrams of
four isomeric thiophthenes.
In the series of isomeric thiophthenes 2-5, using the rule of topological charge
stabilization with reference to 1, one concludes with regard to their relative stabilities that
those of 1,4-thiophthene (2) and 1,6-thiophthene (4) should be comparable, but 1,5-thiophthene
(3) and 2,5-thiophthene (5) should be successively less stable. This result
agrees with experimental facts. All four isomeric thiophthenes are known,106-109
although 2,5-thiophthene only as the tetraphenyl-substituted derivative.109 Resonance
energies calculated by various theoretical models (see Table 2) completely support the
stability predictions based on the rule of topological charge stabilization.
a M.J.S. Dewar and N. Trinajstic, J. Amer. Chem. Soc. 92, 1453 (1970).
b B.A. Hess, Jr., L.J. Schaad and C.W. Holyoke, Jr., Tetrahedron 28, 3657
(1972); Tetrahedron 31, 295 (1975).
c I. Gutman, M. Milun and N. Trinajstic, J. Amer. Chem. Soc. 99, 1692 (1977);
M. Milun and N. Trinajstic, Croat. Chem. Acta 49, 107 (1977).
The rule of topological charge stabilization also operates in non-planar systems. The
charge density distribution in the uniform reference frames for non-planar systems is
calculated by means of the extended Hückel MO theory.110 The extended Hückel MO
theory is known to yield exaggerated charges but they appear to be adequate for a
purpose which requires only a qualitative pattern of charge density distribution. In some
test cases it has been found111 that the charge iteration of the extended Hückel MO
theory112 gives charges which are more realistic but with the same pattern as those
obtained by the non-iterative procedure.
The uniform reference frames for non-planar systems are often hypothetical and have
very large total charges Q. The charges $q_r$ on individual atoms r must sum to Q,
$\sum_r q_r = Q$    (64)
Since one is interested only in charge differences, the normalized charges $q'_r$ are
introduced,97
$q'_r = q_r - Q/N$
where N is the number of centers of the uniform reference frame. The normalized
charges $q'_r$ sum to zero.
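The normalization can be sketched as below, under the assumption that the normalized charge is obtained by removing the average charge per center, $q'_r = q_r - Q/N$, which is the simplest shift that makes the charges sum to zero. The two test values are read from the numbers printed with Figure 15 for the heptaphosphide trianion (Q = -3, N = 7); pairing them as charge and normalized charge is an inference, not something the text states.

```python
def normalized_charge(q_r, total_charge, n_centers):
    """Assumed form of the normalization: subtract the average charge per center."""
    return q_r - total_charge / n_centers

def normalize_all(charges):
    """Normalize a full list of atomic charges; the result sums to zero."""
    total = sum(charges)
    return [normalized_charge(q, total, len(charges)) for q in charges]

# Values read from Figure 15, for a frame with Q = -3 spread over N = 7 centers.
print(round(normalized_charge(-0.720, -3, 7), 3),
      round(normalized_charge(-0.194, -3, 7), 3))

# A made-up charge list, only to show that the normalized values sum to zero.
example = normalize_all([-0.5, -0.25, -0.25])
```

With this shift, -0.720 becomes -0.291 and -0.194 becomes +0.235, which are the other two values shown in the figure, so the assumed form is at least consistent with the printed data.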
A beautiful example96 to illustrate the operational power of the rule of topological charge
stabilization is provided by molecules such as P4S3, As4S3 and PS3As3 which are
isostructural and isoelectronic with the heptaphosphorus trianion,113 P7^3-. P7^3- is a cage or
end-capped triangular prism, with a unique apical atom, three equivalent bridging atoms
and three basal atoms in the equilateral triangle.
The anion P7^3- serves as the uniform reference frame for P4S3, As4S3 and PS3As3. In
Figure 15 are given the Mulliken net atomic populations, or more simply, atomic
charges, for P7^3- calculated from extended Hückel MO wavefunctions. In the same figure
we also give the normalized charges of P7^3-. Since the uniform reference frame is no
longer composed of real atoms, its graph-theoretical representation is given in the figure.
[Figure 15 diagram; charges shown: +0.170, -0.720, -0.291, -0.194, +0.235.]
Figure 15 The Mulliken net atomic populations (6) and the normalized
charges (7) of P7^3-.
The normalized charges (see structure 7 in Figure 15) indicate that the bridging positions
are negative compared to apical and basal positions. Consequently, the more
electronegative sulfurs should occupy the bridging sites whilst the less electronegative
phosphorus or arsenic atoms should enter the apex or basal positions. The structures of
P4S3,114,115 As4S3116 and PS3As3117 are in agreement with those predicted above (see
Figure 16).
Figure 16 The structures of P4S3 (8), As4S3 (9) and PS3As3 (10).
The pattern of the charge density distribution in a chain varies considerably with its
geometry. Therefore, in the applications of the rule of topological charge stabilization to
linear systems one must consider various geometries of the uniform reference frame.
Let us consider a symmetric five-atom chain with 24 valence electrons and two of its
geometries: linear and bent. The normalized extended Hückel charge densities for these
two uniform reference frames are given in Figure 17.
Figure 17 The normalized extended Hückel charge densities for the 24-electron
five-atom uniform reference, linear (11) and bent (12), frames. Values shown, from the
terminal to the central atom: -0.90, +0.83, +0.15 (linear) and -0.88, +0.89, -0.02 (bent).
The linear reference frame has negative charges at both ends of the chain. Such a charge
distribution directs the electronegative atoms to terminal sites. The reader should note that
the change from the linear to the bent reference frame increases the electron density at the
central position, thus providing increased stability for more electronegative atoms. In
agreement with the above, C3O2 (O=C=C=C=O) is almost linear with the electronegative
oxygen atoms at the ends of the chain,118 whilst B2O3 and (CN)2S are bent (V-shaped)
molecules119,120 (see Figure 18).
Figure 18 The shapes of boron oxide B2O3 (13) and sulfur dicyanide (CN)2S (14). The
central angles marked in the diagrams are approximately 132° (13) and 100° (14).
In many cases all positions in the uniform reference frame are equivalent. Then because
of symmetry all these positions will have the same charge density and, hence, the rule of
topological charge stabilization is not applicable. For example, π charge densities in
alternant hydrocarbons are all equal to unity.8
This difficulty may be overcome by introducing a heteroatom of larger (smaller)
electronegativity than carbon into one of the positions. So, for example, in the case of
benzene as a uniform reference frame, the introduction of a nitrogen atom gives negative
charges at positions 3 and 5 in pyridine. This is shown in Figure 19.
Figure 19 π charges in benzene and pyridine (values shown for pyridine: 0.946, 1.072,
0.895 and, at N, 1.249). The Hückel parameters used for the
nitrogen atom and CN bonds are taken to be $\delta_N$ = 1.00 and $\beta_{CN}$ = 1.50.
The structures 15 - 17 (see Figure 20) are in agreement with the pattern of charge
distribution in pyridine.
Figure 20 Molecules isoelectronic with benzene and pyridine.
To conclude this section we point out that the rule of topological charge stabilization is
easy to apply and could be used to guide preparative efforts and to expose problems that
are worth further study by both experiment and theory. This rule can serve as a powerful
unifying principle for the organization of chemical information. The impact of the rule,
especially in the field of inorganic chemistry, could be significant.
8. Localization Energy
The reactivity index called the localization energy,121 was introduced by
Wheland122 for predicting aromatic substitution reactions. Wheland postulated that the
transition state (Wheland intermediate state)123 in an aromatic substitution reaction
resembles a σ-complex. The σ-complex is not the transition state for the reaction, but
it is commonly assumed to be close to the transition state on the potential-energy surface.
How close it is depends upon the type and conditions of the
reaction. The structure of the Wheland intermediate is depicted in Figure 21.
Figure 21 The structure of the Wheland intermediate state. X is the attacking atom or
group, whilst S is the part of the original conjugated molecule which is still
able to support π-electrons and is referred to as the residual molecule.123
The Wheland intermediate may or may not have a well-defined structure.124 It is
generally considered as a loose addition complex in which the attacking atom or group X
and the departing hydrogen atom H are on opposite sides of the molecular plane. In this
complex the attacked carbon atom r is in an approximately tetrahedral configuration and
no longer contributes to conjugation within the aromatic ring. Hence, the residual
molecule S has one conjugated center less than the original molecule. Consequently, the
extent of the molecular network available to the π-electrons is smaller in the residual
molecule than in the parent molecule.
The localization energy for aromatic substitution, ΔEπ,
$$\Delta E_\pi = E_\pi(S) - E_\pi(\text{molecule})$$
is the energy needed to form the Wheland intermediate state. The lower this energy, the
lower will be the energy of the transition state, and the aromatic substitution reaction will
proceed with a smaller energy loss. Therefore, the lower the ΔEπ value, the lower the
barrier for substitution at a given position. The most preferred position of attack,
amongst the available positions for substitution, would be the one in which the
localization energy is the lowest. Eπ(molecule) and Eπ(S) may be calculated using MO
theory at various levels of approximation. Ordinarily, Hückel theory or SCF π-MO
theories are used.
There are three kinds of localization energy possible, depending upon whether the
attacking atom or group is neutral, or positively or negatively charged. In other
words, during the substitution reaction either two electrons (electrophilic substitution),
one electron (radical substitution) or no electron (nucleophilic substitution) is localized
on atom r. In the literature the corresponding localization energies are usually denoted124
by Lr+, Lr· and Lr-, respectively.
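As a worked illustration of the three localization energies, the following sketch evaluates them for benzene at the simple Hückel level. It assumes α = 0 and energies in units of β, uses the closed-form spectra of a ring and of a linear chain, and takes the residual molecule of benzene to be the five-centre pentadienyl chain. The function names are illustrative and are not taken from the source.

```python
import math

def cycle_levels(n):
    """Hückel levels x (E = alpha + x*beta) of an n-membered ring, most bonding first."""
    return sorted((2 * math.cos(2 * math.pi * k / n) for k in range(n)), reverse=True)

def path_levels(n):
    """Hückel levels of a linear chain of n conjugated centres, most bonding first."""
    return sorted((2 * math.cos(math.pi * k / (n + 1)) for k in range(1, n + 1)),
                  reverse=True)

def pi_energy(levels, n_electrons):
    """Total pi energy in units of beta (alpha = 0), filling two electrons per level."""
    energy, remaining = 0.0, n_electrons
    for x in levels:
        occ = min(2, remaining)
        energy += occ * x
        remaining -= occ
        if remaining == 0:
            break
    return energy

def localization_energies(parent_levels, residual_levels, n_pi):
    """Localization energies in units of |beta| for the three kinds of attack.

    k electrons are localized on the attacked atom r, so the residual
    molecule keeps n_pi - k pi electrons.
    """
    e_parent = pi_energy(parent_levels, n_pi)
    kinds = (("Lr+", 2), ("Lr.", 1), ("Lr-", 0))
    return {label: e_parent - pi_energy(residual_levels, n_pi - k) for label, k in kinds}

# Benzene (6 pi electrons); deleting one centre leaves a pentadienyl chain.
benzene_L = localization_energies(cycle_levels(6), path_levels(5), 6)
print(benzene_L)
```

Within this approximation the three values coincide for benzene (about 2.54 |β|), because the electrons that distinguish the three cases occupy the non-bonding level of the residual chain.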
It should be noted that the approach based on the localization energy is rather qualitative
and should be used cautiously. Nevertheless, it is still occasionally used,125 because the
users, who are usually experimental chemists with little time at their disposal for exact
theoretical computations, need a method, even if the method is approximate, as an aid in
planning and interpreting experiments.
In the framework of graph theory the structure of the Wheland intermediate may be
depicted as a subgraph G' which is obtained by deletion of the appropriate vertex and
incident edges from the Hückel graph G representing a given conjugated system. In
Figure 22 we give graphs corresponding to the transition states for the substitution upon
the tetracene substratum.
where G is the molecular graph depicting a system with only 4n + 2 (n ≥ 1) cycles and G-r
is a subgraph obtained by deletion of the vertex r and the edges incident to it from G. In
this discussion we will use the following expression for Eπ(G),126
where N and M are the number of vertices and edges, respectively, in G, whereas K(G) is
the number of Kekulé structures of G. Since the structure G-r contains N-1 vertices and
M-2 edges, the approximate formula for Eπ of G-r is given by,127
where Π' denotes multiplication over the non-zero elements of the spectrum of G-r.
Substitution of (68) and (69) into (67) produces the graph-theoretical formula for the
localization energy,
Since a, b and c are constants, the localization energy at position r is determined by the
following graph-theoretical quantity,
$$l_r = \frac{K(G)}{SC(G-r)} \qquad (72)$$
This formula may also be rewritten in terms of the last coefficients (a_N(G) and a_{N-1}(G-r))
of the characteristic polynomials of G and G-r,
(73)
where,
In the case of alternant hydrocarbons with 4n+2 and/or 4n (n ≥ 1) sites, K(G) and SC(G-r)
in the above equations should be replaced by ASC(G) and ASC(G-r), respectively.
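The last coefficients of the characteristic polynomials of G and G-r can be generated directly from the adjacency matrix. The sketch below is a minimal pure-Python example that uses the Faddeev-LeVerrier recursion, with benzene as G. The helper names (`ring`, `delete_vertex`, `char_poly`, `last_nonzero`) are hypothetical, and the recursion is chosen here only for brevity; it is not claimed to be the procedure used by the authors.

```python
from fractions import Fraction

def ring(n):
    """Adjacency matrix of an n-membered ring (e.g. n = 6 for the benzene graph)."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        a[i][j] = a[j][i] = 1
    return a

def delete_vertex(adj, r):
    """Subgraph G-r: remove vertex r and all edges incident to it."""
    return [[v for j, v in enumerate(row) if j != r]
            for i, row in enumerate(adj) if i != r]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(adj):
    """Coefficients [1, a_1, ..., a_N] of det(xI - A), highest power first."""
    n = len(adj)
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(adj, m)
        # M_k = A*M_(k-1) + a_(k-1)*I
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(adj, m)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return [int(c) for c in coeffs]

def last_nonzero(coeffs):
    """Last non-vanishing coefficient (a_N for G, a_(N-1) for an odd graph G-r)."""
    return next(c for c in reversed(coeffs) if c != 0)

benzene = ring(6)
p_G = char_poly(benzene)                       # x^6 - 6x^4 + 9x^2 - 4
p_Gr = char_poly(delete_vertex(benzene, 0))    # x^5 - 4x^3 + 3x
print(last_nonzero(p_G), last_nonzero(p_Gr))
```

For benzene the magnitude of a_N(G) is 4, the square of its two Kekulé structures, while the last non-vanishing coefficient of the pentadienyl graph G-r is 3.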
The relative reactivity of two positions r and s is resolved from the difference in the
corresponding localization energies,
ΔLrs = Lr - Ls (76)
Substitution of (71) into (76) and the use of (72) leads to the expression,
where kr and ks are the rate constants for an aromatic substitution reaction on atoms r
and s, T is the absolute temperature (in K) and R (8.314510 J K⁻¹ mol⁻¹) the ideal gas
constant. By substituting (77) into (78) we obtain the following expression,
(79)
where,
p = c/RT (80)
p is a positive constant (for a given temperature) which does not depend on the type of
substitution reaction. The form of (79) strikingly resembles the Hammett equation.130 To
obtain the Hammett equation it is necessary to make the additional assumption that a linear
free energy relationship holds such that different equations reflect different, but constant,
degrees of approach to the fully localized intermediate.
Since,
$$\frac{l_s}{l_r} = \frac{SC(G-r)}{SC(G-s)} \qquad (81)$$
or,
(82)
it follows that the position with larger SC value (or a_{N-1} coefficient) will be more
reactive.
THE INTERPLAY BETWEEN GRAPH THEORY AND MOLECULAR ORBITAL THEORY 67
Therefore, the topological rule for predicting the substituent orientation is as follows:
The more reactive position towards aromatic substitution is the one which has a larger
(algebraic) structure count of resonance forms in the corresponding transition state. Thus
in the case of the example in Figure 22 the reactivity order of positions on anthracene is
as follows: peri (SC = 4) > α (SC = 3) > β (SC = 1). This order is in accord with
experimental evidence.60
9. Concluding remarks
In this review we have discussed the interplay between graph theory and molecular
orbital theory in the area of chemical reactivity. After pointing out that graph spectral
theory and Hückel molecular orbital theory are isomorphic theories and discussing the
structure of the Hückel spectrum, we presented several selected topics in which the
two theories clearly enrich each other. These topics were the
TEMO principle (which allows, amongst other things, reactivity predictions for a
special class of topomers), the graph-theoretical estimation of the HOMO-LUMO
separation and absolute hardness of AHs, the rule of topological charge stabilization
(which can serve as a unifying principle of chemical systematics) and finally the
graph-theoretical formulation of the localization energy. Many other topics were
left out in order to keep the size of the article within the agreed limits. Thus, we could
not include all reactivity indices that have been developed and analyzed in terms of
graph-theoretical invariants.131-133 Similarly, we could not consider some interesting
classes of molecules such as Möbius molecules,134,135 fractal benzenoids136,137 and
fullerenes.138 However, we are confident that our goal was achieved, that is, to show
how the interplay between graph theory and molecular orbital theory shapes a
theoretical framework which is simple enough to be used by experimental chemists and
which is reliable to a great extent for qualitative predictions. We hope there is still some
room left for simple qualitative models, because in chemistry, due to the complexity of
its problems and enormous combinatorial possibilities, we need, besides exact and
semi-exact computational models, simple concepts which may be used to classify an observed
or a computed result into a coherent system.
Acknowledgements
This work was supported by the Ministry of Science, Technology and Informatics of the
Republic of Croatia via Grants 1-07-159 and 1-07-185. One of us (A.G.) would like to
thank Professors X.G. Viennot and M. Delest (Bordeaux) for their support and
hospitality during his stay at the LaBRI.
We thank Dr Sonja Nikolic (Zagreb) for useful comments.
References
54. Graovac, A., Gutman, I. and Polansky, O.E. (1984) Monat. Chem. 115, 1.
55. Motoc, I., Silverman, J.N. and Polansky, O.E. (1983) Phys. Rev. A28, 3673.
56. Motoc, I., Silverman, J.N. and Polansky, O.E. (1984) Chem. Phys. Lett. 103, 285.
57. Motoc, I. and Polansky, O.E. (1984) Z. Naturforsch. 39b, 1053.
58. Motoc, I., Silverman, J.N., Polansky, O.E. and Olbrich, G. (1985) Theoret. Chim.
Acta 67, 63.
59. Clar, E. (1964) Polycyclic Hydrocarbons, Academic, London.
60. Clar, E. and Schmidt, W. (1977) Tetrahedron 33, 2093.
61. Hückel, E. (1932) Z. Physik 76, 628.
62. Moffitt, W. (1950) Proc. Roy. Soc. (London) A 200, 414.
63. Walsh, A.D. (1953) J. Chem. Soc., 2260.
64. Walsh, A.D. (1953) J. Chem. Soc., 2265; (1953) ibid. 2288.
65. Fukui, K., Yonezawa, T. and Shingu, H. (1952) J. Chem. Phys. 20, 722.
66. Fukui, K., Yonezawa, T., Nagata, C. and Shingu, H. (1954) J. Chem. Phys. 22,
1433.
67. Dewar, M.J.S. (1952) J. Amer. Chem. Soc. 74, 3341; (1952) ibid. 74, 3345;
(1952) ibid. 74, 3350; (1952) ibid. 74, 3353; (1952) ibid. 74, 3357.
68. Woodward, R.B. and Hoffmann, R. (1971) The Conservation of Orbital
Symmetry, VCH, Weinheim.
69. Coulson, C.A. and Rushbrooke, G.S. (1940) Proc. Cambridge Philos. Soc. 36,
193. A nice account of the discovery of the Pairing Theorem is given by
Mallion, R.B. and Rouvray, D.H. (1990) in J. Math. Chem. 5, 1.
70. Ruedenberg, K. and Scherr, C.W. (1953) J. Chem. Phys. 21, 1565.
71. Ruedenberg, K. (1954) J. Chem. Phys. 22, 1878.
72. Gutman, I., Knop, J.V. and Trinajstic, N. (1974) Z. Naturforsch. 29b, 80.
73. Gutman, I. (1979) Z. Naturforsch. 35a, 458.
74. Bonchev, D., Mekenyan, O. and Trinajstic, N. (1980) Int. J. Quantum Chem.
17, 845.
75. Kiang, Y.-s. and Chen, E.-t. (1983) Pure Appl. Chem. 55, 283.
76. Hall, G.G. (1977) Mol. Phys. 33, 551.
77. Gutman, I. and Rouvray, D.H. (1979) Chem. Phys. Lett. 62, 384.
78. Graovac, A. and Gutman, I. (1980) Croat. Chem. Acta 53, 45.
79. Coulson, C.A. and Streitwieser, Jr., A. (1965) Dictionary of π-Electron
Calculations, Freeman, San Francisco.
80. Graovac, A., Gutman, I., Trinajstic, N. and Zivkovic, T. (1972) Theoret. Chim.
Acta 26, 67.
81. Wilcox, Jr., C.F. (1968) Tetrahedron Lett., 795.
82. Wilcox, Jr., C.F. (1969) J. Amer. Chem. Soc. 91, 2732.
83. Klein, D.J., Schmalz, T.G., El-Basil, S., Randic, M. and Trinajstic, N. (1988) J.
Mol. Struct. (Theochem) 179, 99.
1. Introduction
to large basis set (6-31G*) ab-initio Hartree-Fock calculations for clusters
such as B₈H₈²⁻, B₉H₉²⁻ and B₁₀H₁₀²⁻. The electronic structure of these and
related boranes and carboranes forms an active field of research. In the last
year there have been at least three ab-initio calculations on the B₈H₈²⁻ cluster
alone.4
In this paper, we use a form of Hückel theory which includes a
parameterless pairwise repulsive potential. This modification, which we call
second moment scaling, has recently proven successful in both rationalizing
and optimizing the structures of molecules and extended solids. The method
is based on earlier work of D. Pettifor and R. Podloucky and of J. K.
Burdett and the authors of this paper.5 Recently, we have found the method
useful in studying covalent or metallic (but not ionic) compounds where the
valence orbitals are fairly tightly bound (i.e., late transition metal and main
group atoms).6 We have used second moment scaled Hamiltonians to find
reasonable energy minima for the 58 atom unit celled α-Mn structure, the
highly anisotropic gallium structure, the icosahedral packing of elemental
boron and the c/a ratio of hexagonal closest packed metal alloys. Molecular
examples of geometrical optimizations include organometallic clusters such
as Os₅(CO)₁₆, Ir₄(CO)₁₂, [Re₄(CO)₁₆]²⁻, anti-Wade's rule compounds such
as C₄B₄H₈ and C₄B₈H₈R₄ (R = alkyl group) and organic compounds such as
naphthalene, spiropentane, pentalene and C₆₀. In a similar vein we have
rationalized the Hume-Rothery electron concentration rules for main group
and transition metal alloys, Wade's rules for clusters, the VSEPR and the
octet rule. The difference between the current work and these earlier projects
is our interest here, not just in the global energy minimum, but in the actual
shape of the electronic energy surface as a function of geometry. A
knowledge of such surfaces is of course necessary to study vibrations and
reactivity in general.
2. Calculational Method
where we note that Ei is a function, among other things, of the overall size of
the system, r, and where M is the index of the highest occupied molecular
orbital (HOMO). The first term on the right hand side of the equation is the
repulsive energy, U(r), while the second is the attractive energy, -V(r).
We now follow the argument first discussed by D. G. Pettifor.12
We consider two systems which we label 1 and 2. The terms ET1, U1, V1,
ET2, U2 and V2 refer to the various energies of these two systems. We
wish to calculate ΔE, where ΔE = ET1 - ET2. It may be seen that,
$$\Delta E = U_1(r_{1eq}) - V_1(r_{1eq}) - U_2(r_{2eq}) + V_2(r_{2eq})$$
where r1eq and r2eq refer to the respective equilibrium sizes of the two
systems.
We use the fact that we are interested in equilibrium geometries in
the following way. Note that at equilibrium, to first order in distance, ETi(r)
is constant. Therefore,
$$U_2(r_{2eq}) - V_2(r_{2eq}) \approx U_2(r_{2eq}+d) - V_2(r_{2eq}+d) \qquad (1)$$
In particular we choose a value for d such that U2(r2eq + d) = U1(r1eq).
With this choice the repulsive terms cancel, so that ΔE ≈ V2(r2eq + d) - V1(r1eq).
We now find that:
$$\sum_{i=1}^{N} E_{1i}^{2}(r_{1eq}) = \sum_{i=1}^{N} E_{2i}^{2}(r_{2eq}+d)$$
We note that the expressions to the left and right of the equal sign are both
the second moments of the molecular orbital energies, μ2. In particular
equations 2 - 4 state that the differences in energy between two structural
alternatives can be calculated from knowledge of the molecular orbital
energies alone. It should however be noted that the approximation given in
equation (1) breaks down if the deviation in the values of μ2 becomes
significant.
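In practice the comparison described above amounts to rescaling one set of orbital energies until the second moments agree, and then comparing filled-level energies. The sketch below is a deliberately simplified illustration. It assumes that changing the size of a structure rescales the spread of its orbital energies uniformly about their mean (an assumption made here, not a statement from the text), and it uses the simple Hückel levels (β = -1) of a three-membered ring and a three-atom chain as hypothetical inputs.

```python
def second_moment(levels):
    """Second moment of the orbital energies about their mean."""
    mean = sum(levels) / len(levels)
    return sum((e - mean) ** 2 for e in levels) / len(levels)

def scale_to_match(levels, target_mu2):
    """Rescale the level spread so that the second moment equals target_mu2."""
    mean = sum(levels) / len(levels)
    s = (target_mu2 / second_moment(levels)) ** 0.5
    return [mean + s * (e - mean) for e in levels]

def electronic_energy(levels, n_electrons):
    """Fill the lowest levels with two electrons each and sum their energies."""
    energy, remaining = 0.0, n_electrons
    for e in sorted(levels):
        occ = min(2, remaining)
        energy += occ * e
        remaining -= occ
    return energy

def delta_e(levels_1, levels_2, n_electrons):
    """E(structure 1) - E(structure 2) after matching the second moments."""
    scaled_2 = scale_to_match(levels_2, second_moment(levels_1))
    return electronic_energy(levels_1, n_electrons) - electronic_energy(scaled_2, n_electrons)

# Hypothetical inputs: Hückel levels with beta = -1.
triangle = [-2.0, 1.0, 1.0]            # three-membered ring
chain = [-(2 ** 0.5), 0.0, 2 ** 0.5]   # open three-atom chain

for n in (2, 4):
    print(n, "electrons: E(triangle) - E(chain) =", round(delta_e(triangle, chain, n), 3))
```

With these inputs the ring is favoured for two electrons and the chain for four, which illustrates how the preferred structure can change with electron count once the second moments are held equal.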
To calculate these molecular orbital energies we use a minimal
valence basis set. The Hamiltonian diagonal elements equal the energies of
the isolated atomic orbitals while off-diagonal elements are calculated using
the Wolfsberg-Helmholz approximation,13
$$H_{ij} = \frac{K}{2}\, S_{ij}\, (H_{ii} + H_{jj})$$
where K is a constant traditionally set at 1.75 and Sij is the overlap integral
between the ith and jth atomic orbitals. The values for the diagonal Hii
terms are taken from work of the R. Hoffmann group for extended Hückel
calculations.14 The Sij integrals are based on Slater type orbitals (STO)
with single or double zeta expansions. Again the values of the STO
exponents, ζ, are taken in conformity with values used in extended Hückel
theory. It is important to note that the standard literature values for
extended Hückel parameters are quite close to Hartree-Fock calculations.15
For example the extended Hückel parameters for boron are Hii(2s) = -15.2
eV, Hii(2p) = -8.5 eV, ζ(2s) = 1.3 and ζ(2p) = 1.3. These correspond to
the atomic Hartree-Fock parameters which are Hii(2s) = -13.46 eV, Hii(2p)
= -8.43 eV, ζ(2s) = 1.288 and ζ(2p) = 1.211. Except in the case of
contracted valence orbitals (such as the d orbitals in Zn or the s orbitals in
Tl) we do not adjust Hückel or tight-binding parameters to improve our fit
to experiment. In particular, in the current work on boranes and carboranes
we have made no alteration to the literature parameters for boron, carbon or
hydrogen.
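The Wolfsberg-Helmholz off-diagonal element is a one-line formula, sketched below for a boron 2s-2p pair using the extended Hückel diagonal energies quoted above. The overlap value of 0.3 is a hypothetical number chosen only to make the example concrete; in a real calculation it would come from the STO overlap integrals.

```python
def wolfsberg_helmholz(h_ii, h_jj, s_ij, k=1.75):
    """Off-diagonal Hamiltonian element H_ij = (K/2) * S_ij * (H_ii + H_jj)."""
    return 0.5 * k * s_ij * (h_ii + h_jj)

# Boron extended Hückel diagonal energies from the text (eV).
H_2S = -15.2
H_2P = -8.5

# Hypothetical overlap integral between the two orbitals (assumption).
S_EXAMPLE = 0.3

h_sp = wolfsberg_helmholz(H_2S, H_2P, S_EXAMPLE)
print(f"H_ij = {h_sp:.5f} eV")
```

Because the element is proportional to the overlap, it vanishes for orthogonal orbitals and changes sign with the sign of Sij.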
In practice the second moment scaled calculations reduce to the
following. When comparing two structural alternatives we calculate the
molecular orbital energies of one of the structures at its true equilibrium
size. For the second structure we scale its size so that its second moment
exactly equals the second moment of the first. We then fill both molecular
orbital diagrams with the requisite number of electrons and then calculate
the difference in total electronic energies. We note that the constant γ
remains undetermined in this procedure. We therefore study only the
structural shape and not the overall size of the geometries in question. The
chief advantage of this method of calculation is that it allows one to retain
all the insights garnered from simple molecular orbital theory. Important
TOPOLOGICAL CONTROL OF MOLECULAR ORBITAL THEORY 77
based on the following observations. First, knowledge of μn, where
$$\mu_n = \int E^{n} \rho(E,r)\, dE,$$
can be used to exactly determine the function ρ(E,r).
The most advantageous transform technique uses a continued fraction
expansion.16 Second, the μn may be related to specific structural features, as
μn is the sum of all closed paths of n steps in which one hops from one
valence atomic orbital to the next. Third, the earliest μn, i.e., μ0, μ1 and μ2, are
all structure invariants: μ0 is normalized to equal 1, μ1 is just Tr(H) and is
therefore a constant sum of the Hartree-Fock atomic orbital energies, and
finally μ2 is treated as a constant in our variance scaling method (see eq. (2)).
Finally we note that while it is necessary to know all the μn to determine
ρ(E,r) exactly, it is only the first few moments which control the principal
features of the attractive energy, V(r). As we discuss below, knowledge of
μ3 through μ6 is often sufficient in calculating energy differences between
structures. This is particularly true if one uses the continued fraction
expansion in conjunction with the upper and lower energy limits of ρ(E,r)
(which we call respectively Eu and El). This use of Eu and El can be
important. The reason is that the higher moments are increasingly dominated
by these two values. In the absence of exact knowledge of these higher
moments, Eu and El have a significant role. (It should be noted that Eu and
El are also related to local structural features; El depends on the coordination
number, C, and Eu + El depends on the degree of non-alternancy).17
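The link between moments and closed paths can be checked on small graphs. The sketch below counts closed n-step walks as Tr(Aⁿ) for a simple adjacency matrix A, which corresponds to a Hamiltonian with one orbital per site, zero on-site energy and unit hopping. That model, and the helper names, are simplifying assumptions made for this illustration only.

```python
def ring(n):
    """Adjacency matrix of an n-membered ring of sites."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        a[i][j] = a[j][i] = 1
    return a

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def closed_walks(adj, n_steps):
    """Tr(A^n): the number of closed walks of n_steps hops on the graph."""
    size = len(adj)
    power = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for _ in range(n_steps):
        power = matmul(power, adj)
    return sum(power[i][i] for i in range(size))

triangle, square = ring(3), ring(4)
for name, graph in (("triangle", triangle), ("square", square)):
    print(name, [closed_walks(graph, n) for n in (2, 3, 4)])
```

Only the triangle has closed three-step walks, so a structure rich in three-membered rings has a larger third moment, while four-membered rings first contribute at the fourth moment.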
As an example of this method we consider band calculations for the
fourth row of the main group. In particular we consider the elemental
structures of Cu, Zn, Ga, Ge, As and Se (elements 29-34 of the periodic
table).18 Copper and zinc are respectively face centered cubic (fcc) and
hexagonal closest packed (hcp), gallium adopts an unusual seven
coordinate structure, germanium forms in a diamond lattice, arsenic forms a
three coordinate two-dimensional puckered honeycomb sheet, while
selenium adopts an infinite one dimensional helix.19 We therefore need to
compare ρ(E,r) for each of these six structure types.
For meaningful comparisons we need to calculate the electronic
energies of each of these six structures for the same atom type. The
Hartree-Fock energies of the valence 4s orbital range from α(4s) = -6.5 to -22.9 eV
while the 4p orbital energies range from α(4p) = -5.7 to -12.4 eV.17 The ζ(4s)
exponent of the STO's ranges from ζ(4s) = 1.21 for Cu to 2.4 for Se and
similarly the ζ(4p) exponent ranges from ζ(4p) = 1.6 to 2.1. With this great
78 R. ROUSSEAU AND S. LEE
4. Elemental Boron 21
Figure 2. Differences in energy between structures which have triangles, hexagons,
pentagons or squares in their structure as a function of x, the fractional band filling.
Results are taken from ref. 11. See discussion of Figure 1 for figure conventions.
Figure 3. Two views of the icosahedron. On the left it is seen down a 2-fold axis
while on the right it is viewed down a 3-fold axis.
have been reported with unit cells ranging from a twelve atom
rhombohedral cell to a 1708 atom cubic cell. The structures of five of these
polymorphs have been fully resolved. In all five, regular icosahedra play a
significant role. Structures derived from icosahedra play an equally
ubiquitous role in molecular boron chemistry.22 We show this icosahedron
from two perspectives in Figure 3.
In our calculations we will concentrate on the simplest of the boron
polymorphs, the R-12 structure, which contains twelve atoms in a
rhombohedral cell. Each unit cell of this structure contains one icosahedron
whose center can be placed at the cell axes origin. As boron has three
valence electrons, there are a total of 36 valence electrons per primitive unit
cell; of these 36 electrons, 26 are used in intra-icosahedral bonds and 10 in
inter-icosahedral bonds.
There are five crystallographic parameters which control the shape
of the R-12 polymorph. They are the rhombohedral cell angle α and the
atomic x and z fractional coordinates for the two symmetry inequivalent
boron atoms. In Table I we show our optimal values for these parameters
using our variance scaling technique. The six inequivalent bonds in the R-12
structure are found experimentally to have the lengths of 2.021, 1.787,
1.785, 1.777, 1.733 and 1.709 Å. Theoretically (if we assume an optimal
value of a) we find these bond lengths to be respectively 1.90, 1.88, 1.89,
1.79, 1.56 and 1.78 Å. While our calculated bond lengths are roughly in the
correct order in going from longest to shortest lengths, the numerical
agreement is poor. The average error in bond lengths is 0.09 Å.
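The quoted average error can be checked directly from the two lists of bond lengths. The short sketch below pairs the experimental and calculated values in the order given in the text (an assumption implied by "respectively") and evaluates the mean absolute deviation.

```python
# Bond lengths of the R-12 structure in angstroms, in the order quoted in the text.
experiment = [2.021, 1.787, 1.785, 1.777, 1.733, 1.709]
theory = [1.90, 1.88, 1.89, 1.79, 1.56, 1.78]

# Absolute deviation for each of the six inequivalent bonds.
errors = [abs(t - e) for t, e in zip(theory, experiment)]
mean_abs_error = sum(errors) / len(errors)

for e, t, err in zip(experiment, theory, errors):
    print(f"exp {e:.3f}  calc {t:.2f}  |diff| {err:.3f}")
print(f"mean absolute error = {mean_abs_error:.3f}")
```

The result is about 0.096 Å, in line with the quoted average error of roughly 0.09 Å. The largest single deviation belongs to the bond calculated at 1.56 Å.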
It is instructive to compare the R-12 structure with reasonable
crystallographic equivalents in order to elucidate the most significant
structural feature of the R-12 structure. As our primary interest is with the
inter-icosahedral bonds we consider alternative packings of these
icosahedra. In particular, for the sake of numerical simplicity, we consider
systems which have exactly one icosahedron per unit cell. Furthermore we
assume that the icosahedra line up in such a way as to preserve some
portion of the point group symmetry of the individual clusters. Of the three
types of rotational axes (five-fold, three-fold and two-fold) only the
three-fold and two-fold axes are compatible with translational crystalline
symmetry. These symmetry axes are found in the trigonal, orthorhombic
and monoclinic crystal classes. We consider here only the higher symmetry
trigonal and orthorhombic lattices. We recall that there are four types of
orthorhombic Bravais lattices (primitive, face-centered, end-centered and
body-centered) and only two types of trigonal Bravais lattices (primitive
and rhombohedral). We therefore need to explore these six different
Bravais lattices, and we optimized elemental boron assuming that its
structure corresponded to each of these six lattice types in turn. In each
case we maintained a perfect icosahedral shape for the individual clusters.
In Figure 4 we compare the differences in energy of these polymorphs. It
may be seen that at an s and p band-filling of 0.375 (which corresponds to
the fractional band filling of elemental boron) the experimentally observed
Table I
Parameter   Experiment   Theory
a           5.057 Å
α           58.06°       56.7°
B(1)x       0.010        0.00
B(1)z       0.657        0.67
B(2)x       0.221        0.22
B(2)z       0.632        0.62
Figure 4. Differences in energy between the high symmetry Bravais lattices where
there is one icosahedron per unit cell as a function of fractional band-filling. At the fractional
band-filling of boron, 0.375, the rhombohedral form is preferred. See the discussion of
Figure 1 for figure conventions. (Curve labels: prim. trig., C-cent. ortho., rhombohedral,
prim. ortho.)
rhombohedral form (R-12) is the most stable. At lower electron counts the
primitive trigonal and face centered orthorhombic structures are more stable
while at higher band fillings the C-centered and primitive orthorhombic
cells are energetically preferred.
We now apply the method of moments to determine the specific
structural causes for these energy differences. In particular, we will
consider the primitive trigonal form as an example of a polymorph stable at
low band fillings and the C-centered orthorhombic lattice as an example of a
phase stable at high band fillings. It is instructive to first consider in detail
the pertinent structural features of these phases. In Figure 5 we illustrate
the rhombohedral (R-12), C-centered orthorhombic, and primitive trigonal
polymorphs. In the middle of Figure 5, we portray the rhombohedral
structure viewed primarily down the hexagonal [001] axis. It may be seen
that in the (hexagonal) a - b crystallographic plane the inter-icosahedral
bonds form both triangles and squares of bonded atoms. There are two
such triangles and three such squares per unit cell. Also shown in Figure 5
is a triangle of atoms connected to the regular icosahedra by 1.71 Å bonds.
This triangle represents the base of an icosahedron in the next higher plane
in this structure. These 1.71 Å bonds lie on inter-icosahedral hexagons of
bonded atoms. We note that these 1.71 Å bonds point radially outward
from the icosahedron. On the bottom of Figure 5 we illustrate the primitive
trigonal cell viewed primarily down the [001] axis. It is identical to the
rhombohedral cell within the a - b plane. It differs in the positioning of the
out-of-plane icosahedra, which in the primitive trigonal structure lie directly
above the lower icosahedra. The bases of four of the out-of-plane
icosahedra are shown in Figure 5. It may be seen that the interlayer cavities
are octahedra. As octahedral faces are triangles, these out-of-plane
octahedra increase the μ3 value for the primitive trigonal lattice
significantly.
On the top of Figure 5 we illustrate the C-centered orthorhombic
structure viewed primarily down the [100] direction. It may be seen that
there are squares involving inter-icosahedral bonds in this structure normal
to both the a and b directions. Bond angles between these inter- and
intra-icosahedral links therefore are as small as 90°. Harder to see are the
hexagons of bonds normal to the c axis. It is interesting to note that the a -
b plane of the icosahedra found in the C-centered orthorhombic structure is
identical to sheets found in the rhombohedral structure.
In Figure 6 we show the differences in energy between these three
structures using only μ3, μ4, Eu and El. These curves reproduce many of
the qualitative features of the full band calculations. In particular, it may be
seen that the primitive trigonal structure is stable for low fractional band
fillings while the C-centered orthorhombic structure is stable for high band
fillings. These differences in energy can be explained in terms of local
structural features. The difference in energy between the primitive trigonal
and the rhombohedral geometry is due to the larger μ3 in the former
geometry. This difference in μ3 is due to the formation of octahedral
(Figure 5 panel labels: C-centered orthorhombic, rhombohedral, primitive trigonal.)
(Figure 6 plot: energy differences versus band filling, μ3 - μ4 only.)
(Table column headings for B₉H₉²⁻ and B₁₀H₁₀²⁻: Bond, Exp., μ2, 6-31G*.)
Figure 7 The B₈H₈²⁻, B₉H₉²⁻ and B₁₀H₁₀²⁻ clusters. Polyhedral vertices represent BH units.
same for the μ2-theory, the Hartree-Fock calculations and the X-ray
structures.
We now turn to the shape of the electronic surface of these three
molecules near the minimum energy geometries. We show in Figure 8 a
contour map of the three surfaces using μ2- and RHF/6-31G* theory.
(For the sake of simplicity we consider only a two-dimensional surface for
B₈H₈²⁻.) Both theoretical and experimental minima are shown in the
figure. It may be seen that there is fairly good agreement between the
ab-initio and the μ2 electronic surfaces, with the best agreement found for
B₉H₉²⁻ and the worst for B₈H₈²⁻. There are however several major
differences between these results. First we should note that only the
Hartree-Fock theory is based on rigorously defined approximations and therefore only
for Hartree-Fock theory is it known, at least in principle, which additional
effects (such as configuration interaction) need to be included. Second,
there is a significant difference in computer time required for the
Hartree-Fock and the μ2 scaled theory, with the Hartree-Fock calculations being on
the order of 10^3 times slower. Third, it is easiest to rationalize the
geometrical factors responsible for the shape of these curves within the
context of μ2-theory. This is so as we have a fairly large set of useful
molecular orbital techniques, including the fragment formalism and the
concept of orbital mixing, which can be used to explain μ2-Hückel
molecular orbital energies. These approaches are particularly useful in the
context of Hückel theory as it is possible to evaluate the energy of a Hückel
orbital without considering other occupied molecular orbitals (as one has to
do with Hartree-Fock theory). Furthermore, Hückel energies depend
purely on the overlap of valence atomic orbitals. Such overlap can be
deduced by visual inspection of the molecular orbital (MO) shape. Finally,
in Hückel theory one does not need to calculate directly the difference in
energy between large nuclear-nuclear or electron-electron repulsions and
large electron-nuclear attractions and one therefore obviates the need to
explain small differences between large numbers.
(Figure 8 panel label: B₉H₉²⁻ (μ2).)
Table III Comparison of ΔE Sum for All MO's and Sum of
HOMO's of Each Irreducible Representation
θ      ΔE (All MO)a   ΔE (Sum of HOMO)b
45°    22.6 eV        24.3 eV
55°    3.1            3.9
61°    0.0            0.0
65°    2.1            3.5
75°    21.6           31.6
Figure 9 μ2-Hückel theory Walsh diagram for B₁₀H₁₀²⁻ as a function of θ. (See
Figures 1 and 4.)
regime the B₁₀H₁₀²⁻ cluster has divided itself into two more or less isolated
fragments each of C4v symmetry.
We consider first the θ = 90° geometry. We find here that of the
three bond types shown originally in Figure 7 for B₁₀H₁₀²⁻, only the a and
b bonds remain intact. The c-bond is completely broken. In terms of
symmetry, the D8h point group found at θ = 90° has twice as many
symmetry elements as the original D4d point group. One may therefore
define a new irreducible representation sub-label depending on whether the
molecular orbital is symmetric or antisymmetric with respect to the central
mirror plane of the molecule (i.e., the plane which contains the octagon of
boron atoms). We shall use the letters σ and π for respectively the
symmetric and antisymmetric forms. There are comparatively few
molecular orbitals of π symmetry, the majority being π orbitals of the
central octagonal plane. In Figure 11 we illustrate those π-orbitals which
are relevant to our analysis. They are the a1, e1 and e2 LUMO's and the e3
PUMO (for penultimate unoccupied molecular orbital). Lying relatively
near in energy to these four orbitals are the a1, e1 and e2 HOMO's and
the e3 LUMO. These latter four orbitals are all of σ character and they
are also illustrated in Figure 12. As we allow θ to relax from the 90°
geometry, two effects occur. First, the σ and π sub-labels are lost and
hence mixing between the σ and π sets becomes allowed. Secondly, the
role of the c and d-bonds changes (the d bond becomes increasingly strong),
which leads to a corresponding change in the overall bonding character of
individual orbitals. Thus the a1 HOMO and the a1 LUMO, which were
formerly of respectively σ and π character, mix to form strong bonding and
antibonding combinations. The bonding combination is further
strengthened as the geometric distortion eventually converts the original π
$$E_{elec} = 2\sum_{i} h_i + \sum_{i,j} (2J_{ij} - K_{ij}) \qquad (6)$$
$$E_i = h_i + \sum_{j} (2J_{ij} - K_{ij}) \qquad (7)$$
where i and j are indices of filled spatial orbitals. The εi are eigenvalues of
the Fock operator. hi is the electronic kinetic energy plus the electron-nuclear
attractive energy of an electron in the ith orbital. Jij and Kij are respectively
the Coulombic and exchange energies. Etot is the total energy, Eelec is the sum
of all electronic kinetic energy plus electron-nuclear and electron-electron
energies, and Enuc is the repulsive nuclear-nuclear potential energy.
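The gap between the sum of orbital eigenvalues and the electronic energy can be made explicit with a short worked step. The lines below are a sketch that assumes the usual closed-shell reading of equations (6) and (7), in which every filled spatial orbital is doubly occupied; the symbols are those defined above.

```latex
% Sum the eigenvalues of equation (7) over the doubly occupied orbitals:
2 \sum_{i} \varepsilon_i
  = 2 \sum_{i} h_i + 2 \sum_{i,j} (2 J_{ij} - K_{ij})
% Subtract equation (6):
2 \sum_{i} \varepsilon_i - E_{elec}
  = \sum_{i,j} (2 J_{ij} - K_{ij})
% Hence the eigenvalue sum counts the electron-electron term twice:
E_{elec} = 2 \sum_{i} \varepsilon_i - \sum_{i,j} (2 J_{ij} - K_{ij})
```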
It may be seen from equations (5) - (7) that the sum of the filled
molecular orbital eigenvalues (i.e., the sum of all εi) is not equal to Eelec or
TOPOLOGICAL CONTROL OF MOLECULAR ORBITAL THEORY 97
7. Reaction Pathways
In the previous sections of this chapter we examined the electronic
energy surface near the equilibrium geometries of B₈H₈²⁻, B₉H₉²⁻ and
Figure 13 HOMO-LUMO gap energies, E(LUMO) - E(HOMO) (in a.u.), as a function of θ
for RHF 6-31G* B₁₀H₁₀²⁻ calculations.
Figure 14 RHF 6-31G* Eelec (in a.u.) as a function of θ for B₁₀H₁₀²⁻.
Figure 16 Energy as a function of distortion coordinate, q, between the C2v
and D2d isomers of B₈H₈²⁻. Curves are shown for Hückel, 6-31G*, 3-21G*, and
STO-3G calculations.
nido-10 (vi), q = 1

Bond    μ2-Hückel    6-31G*
b       1.589 Å      1.536 Å
c       1.748 Å      1.652 Å
d       1.927 Å      2.093 Å
e       1.871 Å      1.858 Å
f       2.007 Å      2.199 Å
g       1.829 Å      1.844 Å
h       1.903 Å      1.842 Å
Figure 18 Energy as a function of distortion coordinate, q, between the
C₂B₈H₁₀²⁻ nido-10 (iv + iv) and nido-10 (vi) geometries: (a) 6-31G* energy,
(b) μ2 energy, (c) HOMO energy.
interest to find chemical variations which will stabilize the as yet unknown
nido-10 (iv + iv) geometry.
8. Conclusion
In many ways these last results show succinctly the advantages and
disadvantages of the μ2 scaled technique when compared to ab initio
theory. One of the advantages is that because the μ2 method uses Hückel
theory we can readily understand on a qualitative level the precise electronic
factors which influence the total energy. A second advantage is that μ2-
Hückel theory can be carried out quickly and at low cost. This low cost
allows one great latitude in the number of geometries one chooses to
study. Our results for C₂B₈H₁₀²⁻ suggest that μ2-Hückel theory can be
used to find potential new isomers which can then be explicitly tested at a
more accurate ab initio level. The disadvantage of μ2-Hückel theory is
its incomplete modeling of the various factors which control the electronic
energy. For example, μ2-Hückel theory does not as yet contain terms
which model the relation between charge transfer or ionic energies and
structure. This limits the applications of the method for non-covalent
systems. We believe that in the end a combination of both approaches leads
to the clearest picture of the bonding in the boranes as well as other covalent
and nearly covalent compounds. Hartree-Fock calculations will allow the
chemist to assess the full electronic energy. By contrast μ2-Hückel theory
will let one measure the pure covalent forces. It in turn will form a bridge
to such qualitative molecular orbital ideas as the fragment formalism,
symmetry analysis and the isolobal analogy. With these tools the chemist
can form a vivid and accurate picture of the bonding in both molecules and
solids.
Acknowledgemen ts
References
1. Wigner, E.P. and Seitz, F. in Solid State Physics. edited by Seitz,
F. and Turnbull, D. (Academic, New York, 1955), Vol. 1, p. 97.
2. a) Hehre, W. M., Radom, L., Schleyer, P. v. R. and Pople, J. A.
Ab Initio Molecular Orbital Theory.. Wiley-Interscience
Publications, New York, 1986 b) Hafner, J. From Hamiltonians to
Phase Diagrams, Springer-Verlag, New York, 1987.
3. a) Woodward, R. B., and Hoffmann, R. The Conservation of
Orbital Symmetry VCH, New York, 1970. b) Albright, T. A.,
Burdett, J. K. and Whangbo, M. H. Orbital Interactions in
Chemistry, Wiley - Interscience Publications, Toronto, 1985.
4. a) Wales, D. J., Bone, R. G. A. J. Amer. Chern. Soc., 1992,
114, 5394 b) Bausch, J. W., Surga Prakash, G. K., Williams, R.
E. Inor~. Chern., 1992,31, 3763 c) Biihl, M., Mebel, A. M.,
Charkin, O. P., Schleyer, P. v. R. Inor~. Chern. 1992,31,3769.
5. a) This method was proposed independently for AB (main group-
transition metal) by Pettifor, V and Podloucky, R. Phys. Rev. Lett.
1984,53, 1080 and for the Peierls distortion by Burdett, J. K. and
Lee, S. J. Am. Chern. Soc. 1985, J07, 3063. Other papers whose
results, are based on this method include b) Pettifor, D. G. 1. Phys.
C 1986,19, 285 c) Cressoni, J. C. and Pettifor, D. G. J. Phys.
Condens. Matter. 1991,3,495, and the references cited below in
Ref. 6.
6. a) Lee,S. J. Am. Chern Soc. 1991,113,101; 1991,113,8611
b) Hoistad, L. M., Lee, S. and Pasternak, J. illliL.1992, 114,
4790 c) Hoistad, L. M. and Lee, S. illliL.1991, 113,8216 d)
Lee, S. Acc. Chern. Res. 1991,24, 249 e) Lee, S. Inor~. Chern.
1992,31, 3063 t) Lee, S., Hoistad, L. M. and Carter, S. T. New
J. Chern., 1992, 16, 651.
7. a) Chadi, D. J. Phys. Rev. B., 1979, 19, 2074 b) Chadi, D. J.
Phys. Rev. B., 1984,29, 785 c) Harrison, W. A. Phys. Rev. B.,
1981,24, 385 d) Wang, W. R. and Duke, C.S. Phys. Rev. B.,
1987,36, 2736 e) Verges, J. A. Yndurain, F. Phys. Rev. B.,
1988,34, 4333 t) Chadi, P. J. Phys. Rey. Lett. 1978,41, 1062.
8. Foulkes, W. M. C. and Haydock, R. Phys. Rev. B., 1989, 12,
520.
9. See discussion in Phillips, L. S. G. and Williams, R. 1. P.
Inof!~anic ChemistrY Vol. I., Oxford University Press, New York
1965.
10. Heine, V., Robertson, I. J. and Payne, M. C. in Bonding and
Structure of Solids, edited by Haydock, R., Inglesfield, J. E. and
Pendry, J. B. Royal Society, London, 1991.
11. J. Friedel, Adv. Phys. 1954,3, 446 F. Cyrot-Lackmann, L
~ 1970, CC1 67.
12. Pettifor, D. G. J. Phys., 1986, 19, 285.
13. Wolfsberg, M. and Helmholz, L. J. Chern. Phys. 1957,20,83.
14. Many important atomic parameters are used and discussed in a)
Hoffmann, R. J. Chern. Phys.1963,39, 1397 Anderson, A. B.
and Hoffmann, R. i..b..tiL,1974, 60, 4271 b) Rossi, A. R. and
Hoffmann, R. Inor~. Chern. 1975,14, 365 c) Hay, P. J.,
TOPOLOGICAL CONTROL OF MOLECULAR ORBITAL THEORY 107
R. B. KING
Department of Chemistry
University of Georgia
Athens, Georgia 30602, USA
1. Introduction
1. Euler's relationship:
v-e+f=2 (1)
This arises from the properties of ordinary three-dimensional space.
2. Relationship between the edges and faces:
\sum_{i=3}^{v-1} i f_i = 2e    (2)
In equation 2 f_i is the number of faces with i edges (i.e., f_3 is the number of triangular
faces, f_4 is the number of quadrilateral faces, etc.). This relationship arises from the fact
that each edge of the polyhedron is shared by exactly two faces. Since no face can have
fewer edges than the three of a triangle, the following inequality must hold in all cases:
3f ≤ 2e    (3)
3. Relationship between the edges and vertices:
\sum_{i=3}^{v-1} i v_i = 2e    (4)
In equation 4 v_i is the number of vertices of degree i (i.e., having i edges meeting at the
vertex). This relationship arises from the fact that each edge of the polyhedron connects
exactly two vertices. Since no vertex of a polyhedron can have a degree less than three, the
following inequality must hold in all cases:
3v ≤ 2e    (5)
4. Totality of faces:
\sum_{i=3}^{v-1} f_i = f    (6)
5. Totality of vertices:
\sum_{i=3}^{v-1} v_i = v    (7)
In general if a face with f_k edges is capped, the following relationships will be satisfied:
v2 = v1 + 1; e2 = e1 + f_k; f2 = f1 + f_k - 1. An example of such a capping process converts
a square antiprism (v = 8, e = 16, f = 10) into a capped square antiprism (v = 9, e = 20,
f = 13) by capping one of its square faces (f_k = 4).
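The counting relationships (1) to (5) and the capping rule can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `satisfies_counting_rules` and `cap_face` are hypothetical, and the face sizes and vertex degrees listed for the square antiprism and its capped form are the standard ones, entered by hand. The checks are necessary conditions only; passing them does not prove that a polyhedron with those counts exists.

```python
def satisfies_counting_rules(face_sizes, vertex_degrees):
    """Check equations (1)-(5) from lists of face sizes and vertex degrees.

    face_sizes     -- number of edges of each face (sum equals 2e, equation 2)
    vertex_degrees -- degree of each vertex (sum equals 2e, equation 4)
    """
    v, f = len(vertex_degrees), len(face_sizes)
    twice_e = sum(face_sizes)
    if twice_e != sum(vertex_degrees) or twice_e % 2:
        return False
    e = twice_e // 2
    euler = (v - e + f == 2)                       # equation (1)
    bounds = (3 * f <= 2 * e) and (3 * v <= 2 * e)  # inequalities (3) and (5)
    return euler and bounds


def cap_face(v, e, f, k):
    """Counts after capping a face with k edges: one new vertex, k new edges,
    k new triangular faces replacing the capped face."""
    return v + 1, e + k, f + k - 1


# Square antiprism: 8 triangles + 2 squares, all 8 vertices of degree 4.
antiprism_ok = satisfies_counting_rules([3] * 8 + [4] * 2, [4] * 8)
# Capped square antiprism: 12 triangles + 1 square; the cap and the four
# vertices of the uncapped square have degree 4, the other four degree 5.
capped_ok = satisfies_counting_rules([3] * 12 + [4], [4] * 5 + [5] * 4)
capped_counts = cap_face(8, 16, 10, 4)
```

Capping the square face of the antiprism gives `capped_counts == (9, 20, 13)`, which again satisfies Euler's relationship.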
POLYHEDRAL DYNAMICS 113
A given polyhedron P can be converted into its dual P* by locating the centers of the faces
of P* at the vertices of P and the vertices of P* above the centers of the faces of P. Two
vertices in the dual P* are connected by an edge when the corresponding faces in P share an
edge. An example of the process of dualization is the conversion of a trigonal bipyramid
into a trigonal prism.
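The edge rule for the dual can be tried on the trigonal bipyramid. This is a minimal sketch: the function name `dual_edges` and the vertex numbering (0 and 1 axial, 2 to 4 equatorial) are assumptions made for illustration. It builds only the graph of the dual, not its geometry.

```python
from itertools import combinations


def dual_edges(faces):
    """Return the edges of the dual as pairs of face indices.

    Each face is a tuple of vertex labels in cyclic order. Two dual vertices
    (faces of P) are joined when the faces share an edge of P.
    """
    def edges_of(face):
        n = len(face)
        return {frozenset((face[i], face[(i + 1) % n])) for i in range(n)}

    face_edges = [edges_of(face) for face in faces]
    return [(i, j) for i, j in combinations(range(len(faces)), 2)
            if face_edges[i] & face_edges[j]]


# Trigonal bipyramid: vertices 0 and 1 axial, 2, 3, 4 equatorial (assumed labels).
tbp_faces = [(0, 2, 3), (0, 3, 4), (0, 4, 2),
             (1, 2, 3), (1, 3, 4), (1, 4, 2)]
dual = dual_edges(tbp_faces)

# Degree of each dual vertex (one per face of the bipyramid).
dual_degree = [sum(1 for edge in dual if i in edge) for i in range(len(tbp_faces))]
```

The dual has six vertices, nine edges, and every vertex of degree three, which are the counts of the trigonal prism; dualization exchanges v and f and leaves e unchanged.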
octahedron. Such processes can be continued to generate all of the deltahedra in Figure 1
with up to ten vertices (i.e., the bicapped square antiprism). However, there are
discontinuities in this deltahedral growth sequence for the deltahedra with eleven and
twelve vertices depicted in Figure 1.
Tetrahedron → (breaking 12-edge) → Butterfly → (capping 1234-face) → Trigonal Bipyramid
Trigonal Bipyramid → (breaking 45-edge) → Square Pyramid → (capping 1245-face) → Octahedron
Figure 2: Deltahedral growth processes involving edge removal followed by face capping
as applied to the tetrahedron and the trigonal bipyramid
of the three-dimensional plane of the face f0; this projection is called the Schlegel diagram
of the polyhedron P.
Any given polyhedron can have as many different Schlegel diagrams as it has
different faces. The procedure for drawing the Schlegel diagram of the square pyramid
using the square face as the base face f0 is illustrated below.
[Scheme: Schlegel diagram of the square pyramid, with projection point x0]
The following features of Schlegel diagrams are of interest:
(1) The location of the point x0 can always be chosen so that the edges in the Schlegel
diagram can be drawn as non-intersecting straight lines. This is one of the big advantages
of Schlegel diagrams over conventional perspective drawings.
(2) Schlegel diagrams depict the topological but not the metric features of polyhedra.
Thus the vertex neighborhood relationships depicted by edges are preserved. However,
edge lengths and angles are distorted. Since many important chemical relationships are
topological rather than metric, this distortion is not necessarily serious.
(3) Schlegel diagrams may not preserve all symmetry elements of the original
polyhedron because of the metric distortion. The preservation of symmetry elements in
Schlegel diagrams is maximized if a unique face of the polyhedron is selected as the base
face.
The problem of the classification and enumeration of polyhedra is a complicated
one. Thus there appear to be no formulas, direct or recursive, by which the number of
combinatorially distinct polyhedra having a given number of vertices, edges, faces, or any
given combination of these elements can be calculated.28,29 Duijvestijn and Federico have
enumerated by computer the polyhedra having up to 22 edges according to their numbers of
vertices, edges, and faces and their symmetry groups and present a summary of their
methods, results, and literature references to previous work.30 Their work shows that
there are 1, 2, 7, 34, 257, 2606, and 32,300 topologically distinct polyhedra having 4, 5,
6, 7, 8, 9, and 10 faces or vertices, respectively. Tabulations are available for all 301 ( =
1 + 2 + 7 + 34 + 257) topologically distinct polyhedra having eight or fewer faces31 or
eight or fewer vertices.32 These two tabulations are essentially equivalent by the
dualization relationship discussed above.
Polyhedra of greatest significance in coordination chemistry are those that can be
formed by the nine orbitals of the sp³d⁵ valence orbital manifold accessible to transition
metals. There are, however, some polyhedra having fewer than nine vertices which cannot
be formed by these nine orbitals; such polyhedra are called forbidden polyhedra.33 Group
theoretical arguments show that polyhedra of the following types are always forbidden
polyhedra:
(1) Polyhedra having eight vertices, a direct product symmetry group R × Cs or R × Ci
(R contains only proper rotations) and the plane in Cs fixing either 0 or 6 vertices;
(2) Polyhedra having a six-fold or higher Cn rotation axis.
3. Polyhedral Isomerizations
[Scheme: the diamond-square-diamond (dsd) process P1 → P2 → P3 for the diamond of
vertices A, B, C, and D]
In this process a configuration such as P1 can be called a dsd situation and the edge AB can
be called a switching edge. If a, b, c, and d are taken to represent the degrees of the
vertices A, B, C, and D, respectively, in P1, then the dsd type of the switching edge AB
can be represented as ab(cd). In this designation the first two digits refer to the degrees of
the vertices joined by AB (i.e., A and B in P1), and the last two digits refer to the degrees
of the two vertices not joined by AB but contained in the faces (triangles) having AB as the
common edge (i.e., C and D in P1). The quadrilateral face formed in structure P2 may be
called a pivot face.
In his pioneering paper Lipscomb39 described some possible framework
rearrangements of the polyhedra found in cage boranes and carboranes having from five to
twelve vertex atoms. Fifteen years later41 I reexamined this question in light of advances
in known experimental information on polyhedral chemical systems as well as improved
understanding of polyhedral topology. Subsequently42 I developed a mathematical
approach for examining all possible non-planar rearrangements of polyhedra having few
(i.e., ≤ 6) vertices using a method developed by Gale43 in 1956 for studying
d-dimensional polytopes having only a few more than the minimum d + 1 vertices. This
work42 confirmed the crucial role of dsd processes conjectured so successfully by
Lipscomb39 and also provided insight for more detailed study of isomerizations of
polyhedra having seven44 and eight45 vertices.
Consider a polyhedron having e edges. Such a polyhedron has e distinct dsd
situations, one corresponding to each of the e edges acting as the switching edge.
Application of the dsd process at each of the dsd situations in a given polyhedron leads in
each case to a new polyhedron. In some cases the new polyhedron is identical to the
original polyhedron. In such cases the switching edge can be said to be degenerate. A dsd
process involving a degenerate switching edge represents a pathway for a degenerate
polyhedral isomerization of the polyhedron. A polyhedron having one or more degenerate
edges is inherently fluxional whereas a polyhedron without degenerate edges is inherently
rigid.
The dsd type of a degenerate edge ab(cd) can be seen by application of the process
P1 → P2 → P3 to satisfy the following conditions:
c = a - 1 and d = b - 1 or c = b - 1 and d = a - 1    (10)
Using these conditions the chemically significant deltahedra depicted in Figure 1 can be
very easily checked for the presence of one or more degenerate edges with the following
results:
(1) Tetrahedron. No dsd process of any kind is possible since the tetrahedron is the
complete graph K4. A tetrahedron is therefore inherently rigid.
(2) Trigonal bipyramid. The three edges connecting pairs of equatorial vertices are
degenerate edges of the type 44(33). A dsd process using one of these degenerate edges as
the switching edge and involving a square pyramid intermediate corresponds to the Berry
pseudorotation40,46 which is believed to be the mechanism responsible for the
stereochemical nonrigidity of trigonal bipyramidal complexes, even at relatively low
temperatures.47 The single dsd process for the trigonal bipyramid may be depicted as
follows:
Trigonal bipyramid → Square pyramid → Trigonal bipyramid
Note that the trigonal bipyramid rotates through 90° upon rearrangement through a square
pyramid intermediate as a result of the C4 axis in the square pyramid. This is why this
process has been called a pseudorotation.
(3) Octahedron. The highly symmetrical octahedron has no degenerate edges and is
therefore inherently rigid.
(4) Pentagonal bipyramid. The pentagonal bipyramid has no degenerate edges and
thus by definition is inherently rigid. However, a dsd process using a 45(44) edge of the
pentagonal bipyramid (namely an edge connecting an equatorial vertex with an axial vertex)
gives a capped octahedron. The capped octahedron is a low energy polyhedron for ML7
coordination complexes48 but a forbidden polyhedron for boranes and carboranes because
of its tetrahedral chamber.49
(5) Bisdisphenoid. The eight-vertex bisdisphenoid has four pairwise degenerate
edges, which are those of the type 55(44) located in the subtetrahedron consisting of the
degree 5 vertices of the bisdisphenoid (Figure 1). Thus two successive or, more likely,
concerted (parallel) dsd processes involving opposite 55(44) edges (i.e., a pair related by a
C2 symmetry operation) convert one bisdisphenoid into another bisdisphenoid through a
square antiprismatic intermediate. Thus a bisdisphenoid, like the trigonal bipyramid
discussed above, is inherently fluxional.
(6) 4,4,4-Tricapped Trigonal Prism. The three edges of the type 55(44)
corresponding to the "vertical" edges of the trigonal prism are degenerate. A dsd process using
one of these degenerate edges as the switching edge involves a C4v 4-capped square
antiprism intermediate. Nine-vertex systems are therefore inherently fluxional.
(7) 4,4-Bicapped Square Antiprism. This polyhedron has no degenerate edges
and therefore is inherently rigid.
(8) Edge-coalesced Icosahedron. The four edges of the type 56(45) are
degenerate. This eleven-vertex deltahedron is therefore inherently fluxional.
(9) Icosahedron. This highly symmetrical polyhedron, like the octahedron, has no
degenerate edges and is therefore inherently rigid.
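The dsd typing and the degeneracy condition (10) lend themselves to a mechanical check. The code below is a minimal sketch for deltahedra given as lists of triangular faces; the helper names (`dsd_types`, `is_degenerate`, `bipyramid_faces`) and the vertex numbering are assumptions made here, not notation from the chapter. It applies condition (10) only and does not test whether the diamond can actually be switched, as in the tetrahedron, where C and D are already joined.

```python
from collections import defaultdict
from itertools import combinations


def dsd_types(faces):
    """Map each edge of a deltahedron to its dsd type (a, b, c, d).

    a, b are the degrees of the two vertices joined by the edge; c, d are the
    degrees of the remaining vertices of the two triangles sharing that edge.
    Each pair is sorted so that 45(44) and 54(44) are not distinguished.
    """
    edge_faces = defaultdict(list)
    for face in faces:
        for u, w in combinations(face, 2):
            edge_faces[frozenset((u, w))].append(face)
    degree = defaultdict(int)
    for edge in edge_faces:
        for vertex in edge:
            degree[vertex] += 1
    types = {}
    for edge, (f1, f2) in edge_faces.items():
        a, b = sorted(degree[x] for x in edge)
        c, d = sorted(degree[x] for f in (f1, f2) for x in f if x not in edge)
        types[tuple(sorted(edge))] = (a, b, c, d)
    return types


def is_degenerate(a, b, c, d):
    """Condition (10) for a degenerate switching edge of type ab(cd)."""
    return (c == a - 1 and d == b - 1) or (c == b - 1 and d == a - 1)


def bipyramid_faces(n):
    """Faces of an n-gonal bipyramid: axial vertices 0 and 1, equator 2..n+1."""
    ring = list(range(2, n + 2))
    pairs = list(zip(ring, ring[1:] + ring[:1]))
    return [(axial, u, w) for axial in (0, 1) for u, w in pairs]


def degenerate_edges(faces):
    return [e for e, t in dsd_types(faces).items() if is_degenerate(*t)]


tbp = degenerate_edges(bipyramid_faces(3))         # trigonal bipyramid
octahedron = degenerate_edges(bipyramid_faces(4))  # square bipyramid
pbp = degenerate_edges(bipyramid_faces(5))         # pentagonal bipyramid
```

With this labeling the trigonal bipyramid returns its three equatorial edges, all of type 44(33), while the octahedron and the pentagonal bipyramid return none, in line with items (2) to (4) above.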
This simple analysis indicates that in deltahedral structures the 4, 6, 10, and 12
vertex structures are inherently rigid; the 5, 8, 9, and 11 vertex structures are inherently
fluxional; and the rigidity of the seven-vertex structure depends upon the energy difference
between the two most symmetrical seven-vertex deltahedra, namely the pentagonal
bipyramid and the capped octahedron. This can be compared with experimental
fluxionality observations by boron-11 nuclear magnetic resonance on the deltahedral borane
anions BₙHₙ²⁻ (6 ≤ n ≤ 12)50 where the 6, 7, 9, 10, and 12 vertex structures are found to
be rigid and the 8 and 11 vertex structures are found to be fluxional. The only discrepancy
between experiment and these very simple topological criteria for fluxionality arises in the
nine vertex structure B₉H₉²⁻.
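The comparison between the topological predictions and the NMR observations can be laid out as a small lookup. This is only a bookkeeping sketch of the statements in the paragraph above; the dictionary names and the label "undetermined" for the seven-vertex case are choices made here.

```python
# Topological prediction for each vertex count (from the dsd analysis above).
predicted = {
    4: "rigid", 5: "fluxional", 6: "rigid", 7: "undetermined",
    8: "fluxional", 9: "fluxional", 10: "rigid", 11: "fluxional", 12: "rigid",
}

# Boron-11 NMR observations on the deltahedral borane anions, 6 <= n <= 12.
observed = {
    6: "rigid", 7: "rigid", 8: "fluxional", 9: "rigid",
    10: "rigid", 11: "fluxional", 12: "rigid",
}

# Vertex counts where a definite prediction disagrees with experiment.
discrepancies = [n for n, state in sorted(observed.items())
                 if predicted[n] != "undetermined" and predicted[n] != state]
```

Only n = 9 appears in `discrepancies`; the seven-vertex case is left out because the topological argument alone does not decide it.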
The discrepancy between the predictions of this simple topological approach and
experiment for B₉H₉²⁻ has led to the search for more detailed criteria for the rigidity of the
deltahedral boranes. In this connection Gimarc and Ott have studied orbital symmetry
methods particularly for the five,51 seven,52 and nine53 vertex borane and carborane
structures. A topologically feasible dsd process is orbitally forbidden if crossing of
occupied and vacant molecular orbitals (i.e., a "HOMO-LUMO crossing") occurs during
the dsd process, as has been illustrated for the single dsd process of the
trigonal bipyramid.54
For such an orbitally forbidden process, which occurs in the five- and nine-vertex
deltahedral boranes and carboranes, the activation barrier separating initial and final
structures is likely to be large enough to prevent this polyhedral isomerization. However,
the dsd polyhedral rearrangement that is forbidden for the five-vertex B₅H₅²⁻ and
corresponding carboranes is allowed and has been observed for PX5 derivatives such as
PCl5 and PF5 (i.e., the single fluorine-19 resonance in PF5). Guggenberger and
Muetterties55 point out that cage framework rearrangements such as those in the
deltahedral boranes and carboranes involve bond stretches which must require more energy
than the bond angle changes that occur in coordination polyhedra of ligands bound to a
central atom.
Some selection rules have been proposed for distinguishing between symmetry-
allowed and symmetry-forbidden processes in deltahedral boranes, carboranes, and related
structures. Thus Wales and Stone56 distinguish between symmetry-allowed and
symmetry-forbidden processes by observing that a HOMO-LUMO crossing occurs if the
proposed transition state has a single atom lying on a principal Cn rotational axis where
n ≥ 3. A more detailed selection rule was observed by Mingos and Johnston.57 If the four
outer edges of the two fused triangular faces (i.e., the "diamond") are symmetry
equivalent, then a single dsd process results in a pseudorotation of the initial polyhedron by
90° as follows:
C2v → C4v → C2v
However, if the edges are not symmetry equivalent then the rearrangement results in a
pseudoreflection of the initial polyhedron which can be indicated as follows:
C2 → C2v → C2
Pseudorotations are symmetry-forbidden and have larger activation energies than
pseudoreflections, which are symmetry allowed.
Gale diagrams provide an elegant method for the study of microscopic aspects of
rearrangements of polyhedra with relatively few vertices (i.e., for v ≤ 6) by reducing the
dimensionality of allowed vertex motions. In a chemical context Gale diagrams can be
used to study possible rearrangements of six-atom structures by depicting skeletal
rearrangements of six atoms as movements of six points on the circumference of a circle or
from the circumference to the center of the circle subject to severe restrictions that reduce
possible such movements to a manageable number. 58
Consider a polytope P in d-dimensional space ℜ^d. The minimum number of
vertices of such a polyhedron is d + 1 and there is only one such polyhedron, namely the
d-simplex.1 The combinatorially distinct possibilities for polytopes having only d + 2 and
d + 3 vertices (polyhedra with "few" vertices) are also rather limited and through a Gale
transformation59 can be represented faithfully in a space of less than d dimensions. More
specifically, if P is a d-dimensional polytope with v vertices, a Gale transformation leads to
a Gale diagram of P consisting of v points in (v-d-1)-dimensional space ℜ^(v-d-1) in
one-to-one correspondence with the vertices of P. From the Gale diagram it is possible to
determine all of the combinatorial properties of P such as the subsets of the vertices of P
that define faces of P, the combinatorial types of these faces, etc. Of particular significance
in the present context is the fact that the combinatorial properties of a polytope P which can
be determined by the Gale diagram include all possible isomerizations (rearrangements) of
P to other polytopes having the same number of vertices and imbedded in the same number
of dimensions as P. Also of particular importance is the fact that, if v is not much larger
than d (i.e., if v ≤ 2d), then the dimension of the Gale diagram is smaller than that of the
original polytope P.
Now consider polyhedra in the ordinary three-dimensional space of interest in
chemical structures (i.e., d = 3). Gale diagrams of five- and six-vertex polyhedra can be
imbedded into one- or two-dimensional space, respectively, thereby simplifying analysis of
their possible vertex motions leading to non-planar polyhedral isomerizations of these
polyhedra of possible interest in a chemical context.
In order to obtain a Gale diagram for a given polyhedron, the polyhedron is first
subjected to a Gale transformation. Consider a polyhedron with v vertices as a set of v
points x_1, ..., x_v in three-dimensional space ℜ^3. These points may be regarded as three-
dimensional vectors x_n = (x_{n,1}, x_{n,2}, x_{n,3}), 1 ≤ n ≤ v, from the origin to the vertices of the
polyhedron. In addition, consider a set of points D(A) in v-dimensional space ℜ^v, A =
(a_1, ..., a_v), such that the following sums vanish:
\sum_{i=1}^{v} a_i x_{i,k} = 0  for 1 ≤ k ≤ 3    (11a)
\sum_{i=1}^{v} a_i = 0    (11b)
Equation 11a may also be viewed as three orthogonality relationships between the v-
dimensional vector A = (a_1, ..., a_v) and the three v-dimensional vectors (x_{1,k}, x_{2,k}, ...,
x_{v,k}), 1 ≤ k ≤ 3. Now consider the locations of the vertices of the polyhedron as the
following v x 4 matrix:
D_0 = \begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & 1 \\
x_{2,1} & x_{2,2} & x_{2,3} & 1 \\
\vdots  & \vdots  & \vdots  & \vdots \\
x_{v,1} & x_{v,2} & x_{v,3} & 1
\end{pmatrix}    (12)
Consider the columns of D_0 as vectors in ℜ^v. Since D_0 has rank 4, the four columns of
D_0 are linearly independent. Hence the subspace M(X) of ℜ^v represented by these four
linearly independent columns has dimension 4. Its orthogonal complement M(X)^⊥ = {A ∈
ℜ^v | A·X = 0 for all X ∈ M(X)} coincides with D(A) defined above by equations 11a
and 11b. Therefore:
dim D(A) = dim M(X)^⊥ = v - dim M(X) = v - 4    (13)
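Now define the following v x (v-4) matrix:
D_1 = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,v-4} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,v-4} \\
\vdots  & \vdots  &        & \vdots \\
a_{v,1} & a_{v,2} & \cdots & a_{v,v-4}
\end{pmatrix}    (14)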
(1) Gale transforms x̄_j and x̄_k of two or more vertices of a polyhedron may lead to the
same point (i.e., the same v-4 coordinates) in (v-4)-dimensional space (ℜ^(v-4)). In other
words some points of a Gale transform may have a multiplicity greater than one so that the
Gale transform of a polyhedron in such cases contains fewer distinct points than the
polyhedron has vertices.
(2) The Gale transform depends upon the location of the origin in the coordinate
system. Therefore, infinitely many Gale transforms are possible for a given polyhedron.
Geometrically a Gale transform of a polyhedron corresponds to a projection of the v
vertices of a (v-1)-dimensional simplex (i.e., the higher dimension "analogue" of the
tetrahedron in three dimensions) into a (v-4)-dimensional hyperplane.60 Since infinitely
many such projections are possible, the Gale transform for a given polyhedron is not
unique.
In practice, it is easier to work with Gale diagrams corresponding to Gale
transforms of interest. Consider a Gale transform x̄_1, ..., x̄_v of a (three-dimensional)
polyhedron having v vertices x_1, ..., x_v as defined above. The corresponding Gale diagram
x̂_1, ..., x̂_v is defined by the following relationships:
\hat{x}_j = 0  if \bar{x}_j = 0    (15a)
\hat{x}_j = \bar{x}_j / \|\bar{x}_j\|  if \bar{x}_j \neq 0    (15b)
In equation 15b ||x̄_j|| is the length (i.e., \sqrt{a_{j,1}^2 + a_{j,2}^2 + ... + a_{j,v-4}^2}) of the
vector x̄_j. If v-4 = 1 (i.e., v = 5), Gale diagrams can only contain the points 0, 1,
and -1 of the straight line, with varying multiplicities m_0, m_1, and m_-1, respectively,
where m_0 ≥ 0, m_1 ≥ 2, and m_-1 ≥ 2. If v-4 = 2 (i.e., v = 6) Gale diagrams can only
contain the center and circumference of the unit circle. These two types of Gale diagrams
(Figure 3) are of interest for the study of polyhedral isomerizations since they represent
significant structural simplifications of the corresponding polyhedra.
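The construction can be followed numerically for the two five-vertex polyhedra. The code below is a minimal sketch under stated assumptions: it takes the Gale transform of vertex j to be row j of a matrix whose columns span the solutions of equations 11a and 11b, computes that solution space by exact Gauss-Jordan elimination, and uses hand-picked integer coordinates (an affine image of the regular trigonal bipyramid rather than the regular one). The function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def null_space(matrix):
    """Basis of the solutions a of matrix * a = 0, by exact row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols, pivots, r = len(rows[0]), [], 0
    for c in range(n_cols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_index, p in enumerate(pivots):
            vec[p] = -rows[row_index][free]
        basis.append(vec)
    return basis


def gale_transform(vertices):
    """One (v-4)-dimensional point per vertex, from equations 11a and 11b."""
    v = len(vertices)
    # Rows are the three coordinate vectors and the all-ones vector.
    constraints = [[p[k] for p in vertices] for k in range(3)] + [[1] * v]
    basis = null_space(constraints)
    return [tuple(b[i] for b in basis) for i in range(v)]


def gale_diagram(vertices):
    """Scale every nonzero Gale transform point to unit length (eqs 15a, 15b)."""
    diagram = []
    for point in gale_transform(vertices):
        norm = sqrt(sum(float(c) ** 2 for c in point))
        diagram.append(tuple(float(c) / norm if norm else 0.0 for c in point))
    return diagram


# Assumed coordinates: two axial vertices first, then three equatorial ones.
tbp = gale_diagram([(0, 0, 1), (0, 0, -1), (1, 0, 0), (0, 1, 0), (-1, -1, 0)])
# Square pyramid: four base vertices in cyclic order, then the apex.
sq_pyramid = gale_diagram([(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, 1)])
```

For the trigonal bipyramid the two axial vertices fall on one end point of the line and the three equatorial vertices on the other (multiplicities 2 and 3). For the square pyramid the apex falls on the central point and the base vertices alternate between the two end points. Which end carries the sign +1 depends on the basis chosen and has no significance.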
The following properties of Gale diagrams corresponding to three-dimensional
polyhedra are of interest since they impose important restrictions on configurations of
points which can be Gale diagrams:
(1) Any (v-5)-dimensional plane passing through the central point of the Gale diagram
bisects the space of the Gale diagram into two halfspaces. Each such halfspace must
contain at least two vertices (or one vertex of multiplicity 2) of the Gale diagram not
including any vertices actually in the bisecting plane or hyperplane. Such a halfspace is
called an open halfspace. Violation of this condition corresponds to a polyhedron with the
impossible property of at least one pair of vertices not connected by an edge which is
closer in three-dimensional space than another pair of vertices which is connected by an
edge.
(2) The set of vertices of a polyhedron not forming a given face or edge of the
polyhedron is called a coface of the polyhedron. The regular octahedron is unusual since
all of its faces are also cofaces corresponding to other faces. The interior of a figure
formed by connecting the vertices of a Gale diagram corresponding to a coface must
contain the central point.
(3) The central point is a vertex of a Gale diagram if and only if the corresponding
polyhedron is a pyramid. The central vertex of such a Gale diagram corresponds to the
apex of a pyramid which is the coface corresponding to the base of the pyramid.
Figure 3: Standard Gale diagrams for all polyhedra having five and six vertices. Balanced
diameters are indicated by bold lines.
abc---de → ab---c---de → ab---cde
This process interchanges the vertices of multiplicities 2 and 3 and leads to an equivalent
Gale diagram corresponding to an isomeric trigonal bipyramid. The motion through the
center point of the Gale diagram corresponds to the generation of a square pyramid
intermediate in the non-planar degenerate isomerization of a trigonal bipyramid. This, of
course, is the Berry pseudorotation process40,46 which is the prototypical dsd process.
The choice of three points to move away from the vertex of multiplicity 3 in the Gale
diagram of a trigonal bipyramid corresponds to the presence of three degenerate edges in a
trigonal bipyramid. This analysis of the Gale diagrams of the two possible five-vertex
polyhedra shows clearly that the only possible nonplanar isomerizations of five-vertex
polyhedra can be represented as successive dsd processes corresponding to successive
Berry pseudorotations.
The Gale diagrams of six-vertex polyhedra (Figure 3) can be visualized most clearly
if all of the diameters containing vertices are drawn. Some Gale diagrams of six vertex
polyhedra have diameters with vertices of unit multiplicity at each end. Such diameters
may be called balanced diameters and are indicated by bold lines in Figure 3. The two
vertices of a balanced diameter in the Gale diagram of a six-vertex polyhedron form an edge
which is a coface corresponding to a quadrilateral face. Gale diagrams drawn to maximize
the multiplicities of the vertices and the numbers of balanced diameters consistent with the
polyhedral topology are called standard Gale diagrams. The Gale diagrams depicted in
Figure 3 are the standard Gale diagrams for the six-vertex polyhedra in question. The
number of balanced diameters in a standard Gale diagram of a six-vertex polyhedron is
equal to the number of quadrilateral faces of the polyhedron. The pentagonal pyramid is
the only six-vertex polyhedron for which the center of the circle is a vertex of the
corresponding standard Gale diagram.
The standard Gale diagrams of the trigonal prism and octahedron illustrate another
interesting feature of Gale diagrams, namely the ability to draw Gale diagrams so that all
symmetry elements of the corresponding polyhedron are preserved. The C3 symmetry
elements of both the trigonal prism and octahedron are readily apparent in their standard
Gale diagrams passing through the center perpendicular to the plane of the circle (Figure 3).
In the case of the trigonal prism, the three C2 axes of its D3h point group correspond to the
three balanced diameters of the corresponding standard Gale diagram. In the case of the
octahedron, which has the Oh point group, the reflection planes σh correspond to
permuting the two vertices of an octahedron forming a vertex of multiplicity two in the
corresponding standard Gale diagram while keeping the other vertices fixed. The C2 and
C4 rotation axes of the octahedron pass through the center and a vertex of multiplicity two
in the corresponding standard Gale diagram and permute the other four vertices forming the
two other standard Gale diagram vertices of multiplicity two in various ways.
Polyhedral isomerizations in six-vertex polyhedra may be described by allowed
motions of the vertices of their Gale diagrams along the circumference of the unit circle or
through the circle center in the case of polyhedral isomerizations involving a pentagonal
pyramid intermediate. However, vertex motions are not allowed if at any time they
generate one or more forbidden diameters containing three or more vertices. Using these
techniques all non-planar degenerate isomerizations of six-vertex polyhedra can be
decomposed into sequences of eight fundamental processes, namely two processes through
pentagonal pyramid intermediates, five processes which are variations of single diamond-
square-diamond processes, and the triple dsd degenerate isomerization of an octahedron
through a trigonal prism intermediate on which the Bailar61 and Ray and Dutt62 twists of
M(bidentate)3 complexes are based. The Gale diagram for the last process can be depicted
as follows:
[Scheme: Gale diagrams for the sequence octahedron → trigonal prism → octahedron, with
vertices labeled a to f]
Note that in the first (diamond-square) stage of this triple dsd process leading from the
octahedron to the trigonal prism, one vertex from each of the three vertex pairs (i.e., ad,
be, and cf) in the Gale diagram of the octahedron must move in the same direction in a
concerted manner preserving the C3 axis in order to avoid violating the "half-space rule."
The standard Gale diagram of the trigonal prism is reached when three balanced diameters
are formed. Similarly, in the second (square-diamond) stage leading from the trigonal
prism to an isomeric octahedron these three vertices continue to move in a concerted
manner so as to preserve the C3 axis.
The two axial vertices (labeled Td) correspond to the two tetrahedral isomers and the three
equatorial vertices (labeled D4h) correspond to the three square planar isomers. The
connectivities of the tetrahedral (δtet) and square planar (δsq) isomers are 3 and 2,
respectively, in accord with the degrees of the corresponding vertices of the K2,3 graph.
Thus Itet δtet = Isq δsq = 6; this is an example of the closure condition Ia δa = Ib δb required
for a topological representation with vertices representing more than one type of
polyhedron.
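The closure condition can be checked numerically. The following is a minimal sketch, assuming the isomer count formula I = n!/|R| used elsewhere in this chapter and the standard rotation-group orders |T| = 12 for the tetrahedron and |D4| = 8 for the square; the function names are illustrative only.

```python
from math import factorial

def isomer_count(n_vertices: int, rotation_group_order: int) -> int:
    """Permutational isomer count I = n!/|R|."""
    return factorial(n_vertices) // rotation_group_order

# Assumed rotation-group orders: |T| = 12 (tetrahedron), |D4| = 8 (square).
isomers = {"tetrahedron": isomer_count(4, 12), "square": isomer_count(4, 8)}
# Connectivities quoted in the text (degrees of the K2,3 vertices).
connectivity = {"tetrahedron": 3, "square": 2}

# Closure condition: I_a * delta_a must be the same for every polyhedron type.
products = {name: isomers[name] * connectivity[name] for name in isomers}

# K2,3 itself: every tetrahedral vertex is joined to every square planar vertex.
k23_edges = [(t, s) for t in range(isomers["tetrahedron"])
             for s in range(isomers["square"])]

print(isomers, products, len(k23_edges))
```

Each product equals the number of edges of the K2,3 graph, since every edge is counted once from each of its two ends.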
The two combinatorially distinct five-vertex polyhedra are the trigonal bipyramid
and the square pyramid. The conversion of a trigonal bipyramid into an isomeric trigonal
bipyramid through a dsd process involving a square pyramid intermediate has been
discussed above. Some interesting graphs (Figure 4) are found in the topological
representations for this process. The trigonal bipyramid has an isomer count I = 5!/|D3| =
120/6 = 20 corresponding to 10 enantiomeric pairs. A given trigonal bipyramid isomer can
be described by the labels of its two axial positions (i.e., the single pair of vertices not
connected by an edge) with a bar used to distinguish enantiomers. In a single degenerate
dsd isomerization of a trigonal bipyramid through a square pyramid intermediate, both axial
vertices of the original trigonal bipyramid become equatorial vertices in the new trigonal
bipyramid, leading to a connectivity of three for dsd isomerizations of trigonal bipyramids.
The corresponding topological representation thus is a 20-vertex graph in which each vertex
has degree 3. However, additional properties of dsd isomerizations of trigonal bipyramids
exclude the regular (Ih) dodecahedron as a topological representation unless double group
form is used to produce pseudohexagonal faces. A graph suitable for the topological
representation of dsd isomerizations of trigonal bipyramids is the Desargues-Levi graph,
depicted in Figure 4 (top).
Less complicated but still useful topological representations can be obtained by
using each vertex of the graph to represent a set of isomers provided that each vertex
represents sets of the same size and interrelationship and each isomer is included in exactly
one set. A simple example is the use of the Petersen graph (Figure 4, bottom) as a
topological representation of isomerizations of the 10 trigonal bipyramid enantiomer pairs
(E = 5!/|D3h| = 120/12 = 10) by dsd processes. The use of the Petersen graph for this
purpose relates to its being the odd graph O3; an odd graph Ok is defined as follows:64 its
vertices correspond to subsets of cardinality k - 1 of a set S of cardinality 2k - 1 and two
vertices are adjacent if and only if the corresponding subsets are disjoint.
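The odd-graph definition translates directly into a short construction. The sketch below builds Ok from the definition above and checks that O3 has the vertex count, edge count, and degree of the Petersen graph; the function name odd_graph is an illustrative choice, not taken from the text.

```python
from itertools import combinations

def odd_graph(k: int):
    """Odd graph O_k: vertices are the (k-1)-subsets of a (2k-1)-set,
    adjacent exactly when the subsets are disjoint."""
    ground_set = range(1, 2 * k)                      # 2k - 1 labels
    vertices = [frozenset(c) for c in combinations(ground_set, k - 1)]
    edges = {frozenset((u, v))
             for u, v in combinations(vertices, 2) if not (u & v)}
    return vertices, edges

vertices, edges = odd_graph(3)
degree = {v: sum(v in e for e in edges) for v in vertices}

print(len(vertices), len(edges), set(degree.values()))
```

For the trigonal bipyramid, each 2-subset can be read as the pair of axial labels of an enantiomer pair. Disjointness then expresses the fact that both axial vertices become equatorial in a single dsd step.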
The midpoints of the 30 edges of the dodecahedron (designated by triangles) are the 30
octahedron isomers. Line segments across a pentagonal face connecting these edge
midpoints correspond to triple dsd isomerization processes; the midpoints of these lines
(designated by diamonds) correspond to the 120 trigonal prismatic isomers with 10
such isomers being located in each of the 12 faces of the pentagonal dodecahedron. The
ten lines on a face representing isomerization processes form a K5 graph. This system is
closed since the connectivities of the octahedron (δoct) and trigonal prism (δtp) are 8 and 2,
respectively, leading to the closure relationship Ioct δoct = Itp δtp = 240.
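The counts in this closure relationship can be written out step by step. The lines below are a worked restatement only, assuming the rotation-group orders |O| = 24 for the octahedron and |D3| = 6 for the trigonal prism.

```latex
\begin{align*}
I_{\mathrm{oct}} &= \frac{6!}{|O|} = \frac{720}{24} = 30, &
I_{\mathrm{tp}} &= \frac{6!}{|D_3|} = \frac{720}{6} = 120,\\
I_{\mathrm{oct}}\,\delta_{\mathrm{oct}} &= 30 \times 8 = 240, &
I_{\mathrm{tp}}\,\delta_{\mathrm{tp}} &= 120 \times 2 = 240,\\
|E(K_5)| &= \binom{5}{2} = 10, &
12\ \text{faces} \times 10 &= 120 = I_{\mathrm{tp}}.
\end{align*}
```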
interest. The major effect of reducing the symmetry by a factor of 105 (3 × 5 × 7) in going
from S8 to S4[S2] is the deletion of five-fold and seven-fold symmetry elements. Such
symmetry elements are not of interest in this context since none of the 257 eight-vertex
polyhedra has five-fold symmetry elements31,32 and the only eight-vertex polyhedron
having a seven-fold symmetry element is the heptagonal pyramid, which is not of interest
in this particular chemical context. Restricted isomer counts I* = 384/|R| based on
subgroups of the wreath product group S4[S2] rather than the symmetric group S8 are the
more manageable numbers 16, 32, 48, and 96 for the cube, hexagonal bipyramid, square
antiprism, and bisdisphenoid, respectively.
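The restricted counts follow from simple arithmetic. The sketch below reproduces them, assuming the rotation-group orders 24 (O, cube), 12 (D6, hexagonal bipyramid), 8 (D4, square antiprism), and 4 (D2, bisdisphenoid); these orders are standard but are not stated in the text.

```python
from math import factorial

# Order of the wreath product S4[S2]: 2^4 sign choices times 4! pair permutations.
WREATH_ORDER = 2**4 * factorial(4)
reduction_factor = factorial(8) // WREATH_ORDER   # |S8| / |S4[S2]|

# Assumed rotation-group orders |R| of the four polyhedra.
rotation_order = {
    "cube": 24,
    "hexagonal bipyramid": 12,
    "square antiprism": 8,
    "bisdisphenoid": 4,
}

restricted = {name: WREATH_ORDER // r for name, r in rotation_order.items()}
unrestricted = {name: factorial(8) // r for name, r in rotation_order.items()}

for name in rotation_order:
    print(f"{name}: I* = {restricted[name]}, I = {unrestricted[name]}")
```

The unrestricted counts are 105 times larger, which illustrates why the hyperoctahedral restriction makes the representations tractable.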
The concept of restricting vertex permutations in eight-vertex systems to the wreath
product group S4[S2] rather than the fully symmetric S8 group can be restated in
graph-theoretical terms using the hyperoctahedral graph H4.64 Therefore such a restriction of
permutations from S8 to S4[S2] can be called a hyperoctahedral restriction. The
hyperoctahedral graphs underlying this restriction are designated as Hn and have 2n
vertices and 2n(n - 1) edges with every vertex connected to all except one of the remaining
vertices so that each vertex of Hn has degree 2(n - 1). The name "hyperoctahedral" comes
from the fact that an Hn graph is the 1-skeleton of the analogue of the octahedron (called the
"cross-polytope") in n-dimensional space. The hyperoctahedral graphs H2 and H3 thus
correspond to the square and octahedron, respectively. The S4[S2] wreath product group
is the automorphism ("symmetry") group of the hyperoctahedral graph H4 just as the S8
symmetric group is the automorphism group of the complete graph K8.
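These properties of Hn are easy to confirm by brute force for small n. The sketch below is illustrative: the vertex labeling (vertex v is the antipode of v XOR 1) and the function names are assumptions made here, and the exhaustive search over all permutations is practical only for small graphs such as H3 and H4.

```python
from itertools import combinations, permutations

def hyperoctahedral_graph(n: int):
    """Edge set of H_n on vertices 0..2n-1; each vertex is joined to every
    other vertex except its antipode (v XOR 1 in this labeling)."""
    return {frozenset((u, v))
            for u, v in combinations(range(2 * n), 2) if u ^ 1 != v}

def automorphism_count(n: int) -> int:
    """Count vertex permutations that map every edge onto an edge."""
    edges = hyperoctahedral_graph(n)
    edge_list = [tuple(e) for e in edges]
    return sum(
        all(frozenset((p[u], p[v])) in edges for u, v in edge_list)
        for p in permutations(range(2 * n))
    )

h4 = hyperoctahedral_graph(4)
h4_degrees = {v: sum(v in e for e in h4) for v in range(8)}
print(len(h4), set(h4_degrees.values()), automorphism_count(4))
```

The edge count and degree agree with 2n(n - 1) and 2(n - 1) for n = 4, and the automorphism count equals the order of S4[S2].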
Using these ideas, topological representations for isomerizations of eight-vertex
polyhedra are depicted in Figures 5 and 6. Vertex and edge midpoints in these
representations correspond to the E* = 384/|G| hyperoctahedrally restricted enantiomer
pairs (E* = 8, 16, 24, and 48 for the cube, hexagonal bipyramid, square antiprism, and
bisdisphenoid, respectively) except that, because of the hyperoctahedral reduction in symmetry,
the number of points for the square antiprism must be doubled.69 Figure 5 is a K4,4
bipartite graph in which the 8 cube enantiomer pairs are located in the centers of the
hexagons and the 16 hexagonal bipyramid enantiomer pairs are located at the edge
midpoints. Since both the cube and hexagonal bipyramid are forbidden polyhedra (i.e.,
cannot be formed using only s, p, and d orbitals),33 this portion of the topological
representation for hyperoctahedrally restricted eight-vertex systems is not accessible if only
s, p, and d orbitals are available for chemical bonding.
The detailed structure of a hexagon wheel corresponding to a given pair of cube
enantiomers is depicted in Figure 6. The vertices of the hexagon correspond to the square
antiprisms that can be generated from the cube in the center by twisting opposite pairs of
faces. The midpoints of the hexagon edges correspond to bisdisphenoid enantiomer pairs.
Traversing the circumference of a given hexagon corresponds to a sequence of double dsd
processes interconverting the bisdisphenoids located at the midpoints of the two
hexagon edges meeting at a vertex through the square antiprism intermediate represented
by the vertex joining the edges. Since both the bisdisphenoid and square antiprism can be
formed using only s, p, and d orbitals, the circumference of the hexagon is accessible in
ML8 systems in which the central atom M has the usual sp3d5 nine-orbital manifold. Thus
in the usual situation not involving f orbitals, isomerizations are restricted to the
circumference of a given hexagon in Figure 5 and cannot occur by moving from one
hexagon to another.
Figure 5: The K4,4 bipartite graph for the hyperoctahedrally restricted isomerizations of
eight-vertex polyhedra indicating points corresponding to one each of the cube, square
antiprism, hexagonal bipyramid, and bisdisphenoid isomers.
Figure 6: The detailed structure of a hexagonal wheel corresponding to a given pair of cube
enantiomers. Spokes labeled B correspond to cube-square antiprism interconversions
whereas edges labeled C correspond to interconversions from one square antiprism isomer
to another through a bisdisphenoid intermediate. The double dsd isomerization of a
bisdisphenoid to another through a square antiprism intermediate corresponds to movement
from the center of one edge representing the initial bisdisphenoid through a vertex
representing a square antiprism intermediate to the center of an adjacent edge representing
the final bisdisphenoid.
6. Literature References
ALEXANDRU T. BALABAN
Contents
1. Introduction
2. Reaction graphs of rearrangements via carbocations
2.1. ETHYL CARBENIUM IONS
2.2. AUTOMERIZATIONS OF HOMOVALENIUM CATIONS
2.3. PARTLY DEGENERATE REARRANGEMENTS VIA CARBOCATIONS
2.4. MECHANISMS OF REARRANGEMENTS LEADING TO DIAMOND
HYDROCARBONS AND DERIVATIVES
2.4.1. Adamantane
2.4.2. Diamantane
2.4.3. Tetracyclotridecanes
2.4.4. Tricycloundecanes
2.4.5. Tetracycloundecanes
2.4.6. Spiro[adamantane-2,1'-cyclobutane]
3. Automerization of bullvalene, other valence isomers of
annulenes, and azabullvalene
4. Rotation in molecular propellers
5. Reaction graphs for rearrangements of metallic complexes
5.1. PENTACOORDINATE COMPLEXES
5.1.1. Trigonal-bipyramidal complexes
5.1.2. Tetragonal-pyramidal complexes
5.2. TETRACOORDINATE COMPLEXES
5.3. HEXACOORDINATE COMPLEXES
5.3.1. Octahedral complexes
5.3.2. Axially distorted octahedra
5.3.3. Trigonal prismatic complexes
5.4. OCTACOORDINATE COMPLEXES
5.4.1. Square antiprisms
5.5. OTHER INORGANIC COMPLEXES
6. Xenon hexafluoride
7. Heptaphosphide trianion
8. Kinetic graphs, synthon graphs, and graph transforms
9. Conclusions
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 137-180.
© 1994 Kluwer Academic Publishers.
1. Introduction
The first reaction graphs were published in 1966 for representing all
possible interconversions of carbenium ions.1 Thus, a pentasubstituted ethyl
cation can undergo an elementary reaction step via three different pathways
involving a 1,2-shift of any of the three substituents in the β-position relative to
the positive charge (Whitmore's rule), as shown in Fig. 1. Each of the three new
carbenium ions may again react via three different pathways (one of which
reverts the preceding 1,2-shift). The process continues, leading to a total of 20
different carbenium ions if the two carbon atoms of the ethyl group are
distinguishable (e.g., by isotopic labeling), or to 10 different carbenium ions if
they are not.
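The counts of 20 and 10 can be reproduced by exhaustively applying 1,2-shifts. The following is a minimal sketch in which a cation is encoded as a pair of substituent sets, one per carbon atom; this encoding and the function names are assumptions made for illustration.

```python
from collections import deque

def shifts(state):
    """All cations reachable by one 1,2-shift.

    state = (substituents on carbon A, substituents on carbon B);
    the carbon with three substituents is sp3, the other is cationic.
    """
    a, b = state
    if len(a) == 3:                      # A is sp3: a group migrates A -> B
        return [(a - {x}, b | {x}) for x in a]
    return [(a | {x}, b - {x}) for x in b]   # B is sp3: a group migrates B -> A

def reachable(start):
    """Breadth-first search over repeated 1,2-shifts."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for nxt in shifts(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

start = (frozenset({3, 4, 5}), frozenset({1, 2}))
ions = reachable(start)
distinguishable = len(ions)                            # carbons labeled
indistinguishable = len({frozenset(s) for s in ions})  # carbons not labeled
print(distinguishable, indistinguishable)
```

Every state has exactly three neighbors, so the states and shifts form a cubic graph on 20 vertices.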
Fig. 1. A portion of the reaction graph indicating the three possibilities for
the rearrangement of an ethyl carbenium ion with five different
substituents denoted by 1-5. The abbreviated notation for each
carbocation is indicated in brackets. On the arrows one can see the
substituent undergoing the 1,2-shift. All interconversions are reversible.
Fig. 2. The Desargues-Levi graph with the 20 vertices symbolizing
rearranging ethyl carbenium ions. The full notation lists all five
substituents (it can be abbreviated by discarding the triplet of
substituents, and replacing the asterisk by a period).
The notation of Fig. 2 for the 20-vertex reaction graph, called the
Desargues-Levi graph, uses two groups of digits corresponding to the substituents
bonded to the carbon atoms of the ethyl cation: the group of three digits
corresponds to the substituents attached to the sp3-hybridized carbon atom, and
the group of two digits corresponds to the substituents attached to the
cationic center (sp2-hybridized carbon atom). The two groups are separated by
an asterisk or a period. The order of the substituents within each group is
irrelevant (conventionally, the increasing order of digits was chosen), but the
order between the doublet and triplet of digits is essential for distinguishing
the two ethyl carbon atoms.
For abbreviating the notation, the group of three digits may be omitted
because these digits can be deduced when the remaining two digits are known;
a period either after or before the group of two digits distinguishes the order
between the doublet and triplet of digits. We shall use both the full and
abbreviated notation: the abbreviated notation in Fig. 1, and the full notation in
Fig. 2.
The largest graph-theoretical distance between two vertices (i.e., the
diameter) of the Desargues-Levi graph is 5, counted in edges (a shortest path of
five 1,2-shifts passes through six vertices): for any vertex i, there is a unique
vertex at distance 5. This is the "antipode" of vertex i, and it differs only in the
order between the two groups of digits. Thus, the full notation for two
antipodal vertices is 12.345 and 345.12, and the abbreviated notation 12. and .12;
it will be seen in the section on trigonal-bipyramidal complexes that another
abbreviated notation uses a bar above the group of two digits but there is no
direct equivalence with the present notation.
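The antipode property can be verified with a breadth-first search. The sketch below encodes each vertex in the abbreviated notation as a doublet plus a flag for the position of the period, and joins two vertices when their doublets are disjoint and their flags differ; this encoding is an assumption consistent with the 1,2-shift rule, and distances are counted in edges.

```python
from collections import deque
from itertools import combinations

doublets = [frozenset(c) for c in combinations(range(1, 6), 2)]
# side 0 stands for "12." and side 1 for ".12" in the abbreviated notation.
vertices = [(d, side) for d in doublets for side in (0, 1)]

def neighbours(vertex):
    """A 1,2-shift moves the charge to the other carbon; the new doublet
    is drawn from the old triplet, so it is disjoint from the old doublet."""
    d, side = vertex
    return [(e, 1 - side) for e in doublets if not (d & e)]

def distances(source):
    """Edge-count distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

antipodes = {}
eccentricity = {}
for v in vertices:
    dist = distances(v)
    eccentricity[v] = max(dist.values())
    antipodes[v] = [w for w, k in dist.items() if k == eccentricity[v]]

diameter = max(eccentricity.values())
print(diameter, antipodes[(frozenset({1, 2}), 0)])
```

With distances counted in edges, the search returns 5 for every vertex (a path of five shifts visits six vertices), and the single farthest vertex carries the same doublet with the period on the other side.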
Fig. 4. Alternative representations of the Desargues-Levi graph with full
and abbreviated notation, respectively.
Since the terminology may not be clear, definitions will be given for a
few terms that will be used in this section. Degenerate rearrangements or
automerizations12 are those chemical processes in which covalent bonds are
broken and formed, leading to new connectivities, but the structures of the
reactants and products remain unchanged. Examples are Cope rearrangements
(eq. 1), or isotopic rearrangements such as those discovered by R. M. Roberts et
al.13 when n-propylbenzene was treated with aluminum chloride (eq. 2), or the
automerization of phenanthrene-1-13C under the influence of AlCl3 (eq. 3).14
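[Equations (1)-(3): reaction schemes for the Cope rearrangement (1), the AlCl3-induced isotopic rearrangement of labeled n-propylbenzene (2), and the AlCl3-induced automerization of labeled phenanthrene (3); structural drawings not reproduced.]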
Automerizations can be detected by isotopic labeling (in order to see
newly formed bonds) or, when such reactions occur fast enough, by dynamic NMR
spectroscopy: the low number of peaks in the NMR spectra of bullvalene or
semibullvalene at room temperature is due to rapid reversible Cope
rearrangements, as will be discussed in Section 3.
Table 1 summarizes the types of chemical processes that can occur, and
the corresponding changes in chemical or stereochemical formulas.15
Fig. 9. Interconversions of classical bicyclo[2.1.0]pent-2-ene-5-yl and
cyclopentadienyl cations.
Fig. 10. Notation for the bicyclic and monocyclic structures presented in
Fig. 9; letters a-e symbolize digits 1-5.
The complete reaction graph (Fig. 11) has 60 vertices of degree 2 (white
points) and 60 vertices of degree 4 (black points). The degenerate processes
involving only vertices of degree 4 are represented by ten 6-membered
circuits. They are connected pairwise to each other by four paths having a
vertex of degree 2 between each pair of vertices with degree 4. The whole graph
is reminiscent of the Petersen graph, with each vertex being replaced by a
6-circuit. The simplified notation includes only the digit preceding the comma.
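The vertex and edge counts quoted for this graph can be cross-checked against the Petersen-like picture. The sketch below is a consistency check only; it assumes that each pair of circuits adjacent in the underlying Petersen graph (10 vertices, degree 3) is joined by exactly four two-edge paths, which is one reading of the description above.

```python
def edge_budget(n_deg2=60, n_deg4=60, circuits=10, circuit_len=6,
                paths_per_link=4):
    """Compare the handshake count with the circuit-plus-path decomposition."""
    # Handshake lemma: the sum of all degrees is twice the number of edges.
    total_edges = (2 * n_deg2 + 4 * n_deg4) // 2

    # Edges inside the ten 6-circuits (degree-4 vertices only).
    circuit_edges = circuits * circuit_len

    # Petersen graph: 10 vertices of degree 3, hence 15 links between circuits.
    petersen_links = circuits * 3 // 2
    paths = petersen_links * paths_per_link   # one degree-2 vertex per path
    path_edges = 2 * paths                    # each path has two edges

    return {
        "total_edges": total_edges,
        "circuit_vertices": circuits * circuit_len,
        "paths": paths,
        "circuit_plus_path_edges": circuit_edges + path_edges,
    }

budget = edge_budget()
print(budget)
```

Under this reading the 60 paths account for the 60 vertices of degree 2, and each vertex of degree 4 has two circuit edges and two path edges.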
Fig. 13. The 2-bicyclo[2.2.1]heptyl cation with the three possible classical
1,2-shifts of C-C and C-H bonds, and with notation described in the text.
The reaction graph has 7! = 5040 vertices; it has girth 4 and is a cubic,
vertex-transitive and edge-intransitive graph. A portion of this graph is shown
in Fig. 14,17 with the corresponding notation, as shown in Fig. 13: the label of
the one-carbon bridge is followed by the two bridgehead labels starting with
the one that is adjacent to the charged CH+ group. On a lower line, the
remaining four labels follow in the order: the charged CH group, its CH2
neighbor, the CH2 group once removed from CH+, and lastly its CH2 neighbor.
Fig. 14. A portion of the reaction graph for the rearrangement of the
norbornyl cation via classical 1,2-shifts of C-C and C-H bonds.
Fig. 15. Reaction graph for the rearrangement of the barbaralyl cation.
2.4.1. Adamantane
It was shown in 1968 by Whitlock and Siefken31 that the conversion
of 2 into 1 may involve at least 2897 pathways; they constructed a reaction
graph showing plausible intermediates. Fig. 16 selects only a few of these
pathways, based on the fact that all tricyclodecanes studied experimentally (3,
4, 5 and 6) did rearrange to 1. Schleyer and coworkers32 calculated heats of
formation by means of molecular mechanics, and the corresponding values are
included in Fig. 16. They detected at -10° two intermediates in the aluminum
Fig. 16. Reaction graph for the formation of adamantane from various
precursors including the endo/exo tetrahydrogenated dimers of
cyclopentadiene (2). Negative numbers indicate heats of formation;
thermodynamic stability increases approximately from bottom to top.
Fig. 17. The same graph as in Fig. 16, but without attempting to order
structures according to their stability.31
2.4.2. Diamantane
Fig. 20. Reaction graph for the rearrangement of 6 to diamantane 1 (full
lines). Broken lines indicate less probable pathways.
Fig. 21. Reaction mechanism for the formation of diamantane 1 from
tetrahydro-Binor S (2 or 3).
2.4.3. Tetracyclotridecanes
2.4.4. Tricycloundecanes
2.4.5. Tetracycloundecanes
Among all 2486 possible tetracyclic systems which have the molecular
formula C11H16, 2,4-ethanonoradamantane and 2,8-ethanonoradamantane are the
most stable according to empirical force-field calculations; indeed, they appear
as the final products (in 97:3 ratio) of AlBr3-catalyzed isomerizations
starting from several available tetracycloundecanes. A reaction graph34
interconnecting 15 isomers was helpful for identifying several intermediates.
Fig. 24. Reaction graph for interconversions of the carbocation 12 derived
from 5. Small numbers in brackets indicate calculated heats of formation.
Numbers along arrows indicate dihedral angles for the bond alignment (the
lower these angles, the more easily the 1,2-shifts take place).
Azabullvalene (Fig. 27) has a much simpler reaction graph for its
automerization owing to the higher stability of structures with sp2-hybridized
nitrogen; structures where the nitrogen atom is not adjacent to a
double bond are therefore energetically forbidden.
Fig. 29. Valence isomerization of COT into bi- and tetracyclic systems.
Fig. 30. Portion of the reaction graph for the automerization of COT
Fig. 32. Reaction graph for the rotation of aryl rings in triarylboranes
with different substitution patterns in each phenyl ring.
5.1. PENTACOORDINATE COMPLEXES
Fig. 33. Portion of the reaction graph for the pseudorotation of a
phosphorane with five different substituents.
Fig. 35. The mechanism for the Berry pseudorotation; it can be seen that
transition states between TBP configurations are tetragonal pyramids.
Fig. 36 presents the reaction graph for the M1(3) mode of rearrangement
with the Muetterties notation; the resulting Desargues-Levi graph is
illustrated in several isomorphic representations.
Fig. 37. Reaction graph for rearrangement mode M2(6).
For Musher's M4(6) mode one obtains two disjoint regular graphs of degree
six and girth three for the two enantiomeric series, as shown in Fig. 38 for one
of these.63
Ruch and Hasselbarth,66 using double cosets, showed the various modes of
rearrangement for TP complexes. For the standard complex of Fig. 39 there are
seven possible modes. Mode 1 is trivial (identity permutation, no
rearrangement). Mode 2 is also trivial since it involves permuting two opposite
ligands at the base of the TP, i.e., racemization. The reaction graph is a forest
of 15 disconnected edges linking the enantiomers together pairwise. Mode 3
leads to five disconnected regular subgraphs of degree 4 with six vertices each,
since the apex is unchanged. One of these is shown in Fig. 40.
Fig. 40. Subgraph of the disconnected reaction graph for rearrangement
mode 3 of TP complexes.
Mode 4 also leads to a reaction graph which is regular of degree four and
girth five, but now it is a connected graph which is represented in Fig. 41 with
five-fold symmetry, and in Fig. 42 with a Hamiltonian circuit with three-fold
symmetry. Randić et al.67 found a canonical numbering for this reaction graph
which leads to the smallest binary number when reading sequentially its
adjacency matrix. They also determined that the order of the symmetry group is
240.
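The idea of a canonical numbering based on the smallest binary number can be illustrated on a very small graph. The sketch below is a brute-force illustration, not the procedure used for the 30-vertex graph; it assumes that the adjacency matrix is read row by row in full, and the function names are hypothetical.

```python
from itertools import permutations

def matrix_code(edges, order):
    """Binary number obtained by reading the adjacency matrix row by row
    when the vertices are numbered in the given order."""
    n = len(order)
    position = {vertex: i for i, vertex in enumerate(order)}
    adjacency = [[0] * n for _ in range(n)]
    for u, v in edges:
        adjacency[position[u]][position[v]] = 1
        adjacency[position[v]][position[u]] = 1
    bits = "".join(str(bit) for row in adjacency for bit in row)
    return int(bits, 2)

def canonical_numbering(vertices, edges):
    """Try every numbering and keep the first one with the smallest code.
    Feasible only for very small graphs (n! numberings)."""
    return min(permutations(vertices),
               key=lambda order: matrix_code(edges, order))

# Example: a path a-b-c.
path_vertices = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]
best = canonical_numbering(path_vertices, path_edges)
print(best, format(matrix_code(path_edges, best), "09b"))
```

For the path, the smallest code places the two end vertices first, because this pushes the nonzero entries toward the end of the first rows. For graphs of the size discussed here an exhaustive search over all numberings is out of reach.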
Fig. 42. Reaction graph with Hamiltonian circuit and canonical numbering
for mode 4. 65
Fig. 44. Reaction graph with Hamiltonian circuit for rearrangement modes
4 and 5 of TP complexes neglecting enantiomerism.
In both cases, when enantiomerism is disregarded, one obtains one and the
same 15-vertex reaction graph which is regular of degree 8, as shown in Fig. 48.
Fig. 49. Reaction graph for the rearrangement of tetrahedral complexes via
square planar transition states.
4) Indicate the last two digits in each sequence of six digits; for each
pair of enantiomeric structures, compare their numbers, then take the lower
number for one enantiomer and the same number with a bar above it for the
other enantiomer.
The resulting notation may be restated as follows: the first digit
indicates the ligand trans (opposite) to ligand 1, whereas the second digit
indicates the ligand trans to substituent 2, unless this substituent 2 is trans
to ligand 1, in which case the second digit indicates the ligand trans to
substituent 3. Thus, digit 1 never appears in the notation.
These rules, due to Muetterties,51-53 Gielen, and their coworkers,6,56-60
lead to the following 15 two-digit symbols: 24, 25, 26; 34, 35, 36;
43, 45, 46; 53, 54, 56; 63, 64, 65. When bars are added, one obtains the 30
enantiomers depicted in Fig. 50. The distinction between the two hydride
ligands denoted by H5 and H6 (in addition to the four ligands denoted by L1
through L4) leads to dividing the 30 isomers into 6 trans and 24 cis
stereoisomers. Enantiomers are depicted one above the other in Fig. 50.
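The restated rule can be applied mechanically to every way of arranging six ligands into three trans pairs. The sketch below enumerates these arrangements and derives the two-digit symbols; it assumes that ligands 5 and 6 are the two hydrides and that each arrangement of trans pairs corresponds to one pair of enantiomers, and the function names are illustrative.

```python
def trans_pairings(ligands):
    """All ways to split the ligands into unordered trans pairs."""
    if not ligands:
        return [[]]
    first, rest = ligands[0], ligands[1:]
    result = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in trans_pairings(remaining):
            result.append([(first, partner)] + tail)
    return result

def symbol(pairing):
    """First digit: ligand trans to 1. Second digit: ligand trans to 2,
    or trans to 3 when ligand 2 is itself trans to 1."""
    trans = {}
    for a, b in pairing:
        trans[a], trans[b] = b, a
    first = trans[1]
    second = trans[3] if first == 2 else trans[2]
    return f"{first}{second}"

pairings = trans_pairings([1, 2, 3, 4, 5, 6])
symbols = sorted(symbol(p) for p in pairings)

# Arrangements with the two hydrides (assumed to be ligands 5 and 6) trans.
hydrides_trans = [p for p in pairings if (5, 6) in p]
n_trans = 2 * len(hydrides_trans)                   # two enantiomers each
n_cis = 2 * (len(pairings) - len(hydrides_trans))
print(symbols, n_trans, n_cis)
```

The enumeration yields 15 distinct symbols, none containing the digit 1, and reproduces the split into 6 trans and 24 cis stereoisomers once enantiomers are counted.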
[Fig. 50: structural drawings of the 30 octahedral stereoisomers with ligands L1-L4 and two hydride ligands, labeled by their two-digit symbols and grouped into cis and trans panels; drawings not reproduced.]
Two isomerization processes lead to the same result: the Bailar trigonal
twist69 rotates one triangular face of the octahedron relative to the opposite one;
the Ray-Dutt rhombic twist70 involves rotation of four ligands on two faces
of the octahedron sharing one edge. Such isomerizations convert 56 into 25, 26,
34, 35, 43, 45, 63, and 64 (connectivity 8 for the resulting reaction graph of
girth 4). This graph is represented with Hamiltonian circuits in Fig. 51 with
three-fold symmetry; one can find isomorphic representations such that
enantiomers appear in antipodal (diametrically opposite) positions.
Fig. 51. Reaction graph with Muetterties notation for the trigonal (Bailar)
or rhombic (Ray-Dutt) twists.
On reversing the assignments and the notation for 24, 35 and 53 relative
to the Muetterties-Gielen conventions, as was advocated by Balaban,71 one can
obtain for this rearrangement a reaction graph (Fig. 52) which has six-fold
symmetry, a Hamiltonian circuit with diametrically opposed enantiomers, and
two sets of vertices for the bipartite graph which are distinguished by the
presence or absence of a bar.
Fig. 53. Reaction graph for the trigonal or rhombic twists ignoring
enantiomerism (the Muetterties and Balaban notations become equivalent).
Fig. 56. Canonical numbering for the graph from Fig. 54.
Fig. 57. Reaction graph for digonal twists with Balaban's notation.
Trigonal prismatic complexes with 6 different ligands give rise to
6!/|D3| = 720/6 = 120 stereoisomers. The resulting reaction graph is too
complicated to be presented here. Even when enantiomerism is ignored and
the reaction graph is reduced to only 60 vertices, the picture is fairly
intricate.75
The transition states for the Bailar or Ray-Dutt twists have trigonal
prismatic structure, but they are not usually taken into consideration in the
reaction graphs.
There are 257 polyhedra with eight vertices; 14 of these are deltahedra,
having only triangular faces. Square antiprismatic coordination complexes, with
two square and eight triangular faces, can give rise to 14 modes of
rearrangement, consisting of seven pairs of enantiomeric modes. Three reaction
graphs for four of them are shown in Figures 60 and 61.76
Fig. 60. Reaction graphs for modes 9 and 10: they are regular of degree 8,
and have 12 and 24 vertices, respectively.
Fig. 61. Reaction graph for modes 3 and 7. It is regular of degree 4 and has
24 vertices.
6. Xenon hexafluoride
7. Heptaphosphide trianion
8. Kinetic graphs, synthon graphs, and graph transforms
The three title graphs are similar in principle to reaction graphs in that
all these types of graphs have vertices symbolizing chemical species
undergoing reactions. However, vertices of the same graph do not represent
isomeric species as in the case of reaction graphs; in kinetic graphs, vertices
symbolize intermediates, and edges symbolize their interconversions, while in
synthon graphs and graph transforms, vertices symbolize successive steps in
building up a molecule. Another feature differentiating such graphs from
reaction graphs is that the latter are usually simple, non-directed
graphs, whereas kinetic and synthon graphs are directed graphs (digraphs), i.e.,
their edges have a direction (they are, therefore, properly called arcs).
With the aid of kinetic graphs one can derive the kinetic equations,
analyze them, and solve them. Further information on kinetic graphs may be obtained
from reviews by Bonchev and Temkin,80-83 and by Yatsimirskii.84
A few more words on synthon graphs are necessary. This term was
introduced by Hendrickson,85-87 but Corey and Wipke88-90 were the first to
devise programs for computer-assisted organic synthesis. This field has
expanded rapidly, and a rich literature exists.91-98 Synthon graphs indicate how
the synthons are assembled, while a related type of graph (optimal planning
graph) shows the order in which synthons are introduced in order to build up the
target molecule. It is well known that for the same numbers of steps and yields
per step, a convergent synthesis (branched graph) gives a higher overall yield
than a sequential synthesis (linear non-branched graph). Fig. 63 presents two
examples for obtaining a steroid from four synthons; the order in which the
four synthons 1-4 are assembled is shown in the bottom row; the number of
carbon atoms in each synthon is written to the left of the synthon notation
(1-4) on the bottom row where the target molecule is denoted by a white
point. Assuming yields of 90% for each step, the "synthetic tree" on the left has
a global yield of 0.9^4 = 66% while the one on the right has an overall yield of
0.9^6 = 53%.
Fig. 63. Synthon graphs (second row) for the same steroid target molecule
dissected differently.
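The yield comparison is plain exponentiation of the per-step yield. The sketch below reproduces the two percentages, taking the exponents 4 and 6 for the two synthetic trees as given in the text; how those exponents are read off the trees is not modeled here, and the function name is illustrative.

```python
def overall_yield(step_yield: float, n_steps: int) -> float:
    """Overall yield when every counted step proceeds with the same yield."""
    return step_yield ** n_steps

# Exponents for the two synthetic trees, taken as given in the text.
trees = {"left tree": 4, "right tree": 6}
yields = {name: overall_yield(0.9, steps) for name, steps in trees.items()}

for name, value in yields.items():
    print(f"{name}: {value:.0%}")
```

The gap between the two plans widens quickly as the per-step yield drops, which is the practical argument for convergent routes.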
9. Conclusions
References
1. A. T. Balaban, D. Farcasiu, and R. Banica, Rev. Roum. Chim., 1966, 11, 1205.
2. C. J. Collins, C. K. Johnson, and V. F. Raaen, J. Amer. Chem. Soc., 1974, 96,
2524.
3. M. Gielen, in Chemical Applications of Graph Theory (A. T. Balaban, ed.),
Academic Press, London, 1976, p. 261.
4. J. Brocas, M. Gielen, and R. Willem, The Permutational Approach to Dynamic
Stereochemistry, McGraw-Hill, New York, 1983.
5. T. M. Gund, P. v. R. Schleyer, P. H. Gund, and W. T. Wipke, J. Amer. Chem. Soc.,
1975, 97, 743.
6. E. Osawa, Y. Tahara, A. Togashi, T. Iizuka, N. Tanaka, T. Kan, D. Farcasiu,
G. J. Kent, E. M. Engler, and P. v. R. Schleyer, J. Org. Chem., 1982, 47, 1923.
7. J. Petersen, Acta Math., 1891, 15, 193.
8. F. Harary, Graph Theory, Addison-Wesley, Reading, MA, 1969.
9. P. K. Wong, J. Graph Theory, 1982, 6, 1.
10. W. T. Tutte, Connectivity in Graphs, Univ. of Toronto Press, 1966, p. 74.
11. N. L. Biggs, Algebraic Graph Theory, Cambridge University Press, London,
1974.
12. A. T. Balaban and D. Farcasiu, J. Amer. Chem. Soc., 1967, 89, 1958.
PAUL G. MEZEY
Mathematical Chemistry Research Unit
Department of Chemistry and
Department of Mathematics and Statistics
University of Saskatchewan
Saskatoon, Canada, S7N 0W0
However, the above simple picture is very misleading if taken at face value. Molecules
are quantum mechanical objects. They have neither a finite body defined in precise
geometrical terms nor a finite boundary surface that encloses the entire electron density
of the molecule. The electronic density function changes rapidly with distance within a
certain range, but this change is continuous; there is no abrupt change analogous to the
boundary of a macroscopic object like a potato. The cloud-like, fuzzy electronic density
of a molecule is rather different from a macroscopic body, and no finite distance can be
specified beyond which the electronic density of the molecule is precisely equal to zero.
Even within a semiclassical model, the electronic density decreases in a continuous
manner as the distance from the nuclei increases, and no precise distance can be given
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 181-208.
© 1994 Kluwer Academic Publishers.
where the molecule ends. No molecular surface exists in the classical, macroscopic
sense. The peripheral regions of a molecule are described by a continuous, 3D
electronic charge density function that approaches zero value at large distances from the
nuclei of the molecule.
In a less rigorous sense, however, the concepts of a formal molecular body and
molecular surface are very useful within both classical and approximate quantum
chemical models of molecules. A boundary surface of a formal molecular body, or
several such surfaces at various electron density thresholds can be defined by requiring
only that these surfaces enclose an essential part of the molecule. Depending on the
chemical problem, there are many possible choices for what is to be regarded as the
essential part of the molecule; some of the possible choices, as well as many of the
topological shape analysis techniques are described in ref. [1]. Usually, some electron
density threshold is chosen, and all points where the electronic density is equal to or
greater than this threshold are regarded to belong to the "essential part" of the
molecule. This approach can be formulated in terms of contour surfaces of electronic
charge densities, called molecular isodensity contours (MIDCO's).
Alternative choices can also be considered as formal molecular surfaces, for example,
contours of the molecular electrostatic potential (molecular electrostatic potential
contours, MEPCO's), the contours of specified molecular orbitals, such as frontier
orbitals, or simple molecular Van der Waals surfaces generated by fused atomic spheres
(VDWS's), solvent accessible surfaces, and various other surfaces surrounding some or
all of the nuclei of a molecule. For each choice, the part of the 3D space that is
enclosed by the surface can be regarded as an approximate molecular body; for
example, a formal molecular body B(a) can be taken as the part of three-dimensional
space enclosed by a MIDCO G(a) of a density threshold a.
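As an illustration of the threshold construction, the following minimal sketch tests whether a point belongs to a formal body B(a) or lies on its contour G(a). The density used here is a toy stand-in (one exponential per nucleus) and the nuclear positions are hypothetical; a real application would evaluate a computed electron density instead.

```python
import math

# Hypothetical nuclear positions (arbitrary units) for a toy diatomic.
NUCLEI = [(-0.7, 0.0, 0.0), (0.7, 0.0, 0.0)]


def toy_density(r):
    """Toy stand-in for an electron density: one exponential per nucleus."""
    return sum(math.exp(-2.0 * math.dist(r, R)) for R in NUCLEI)


def in_body(r, a):
    """True if point r belongs to the formal body B(a), i.e. density >= a."""
    return toy_density(r) >= a


def on_contour(r, a, tol=1e-6):
    """True if r lies on the contour G(a) within a numerical tolerance."""
    return abs(toy_density(r) - a) <= tol
```

Lowering the threshold a can only enlarge the body in this model, so the bodies B(a) form a nested family as a decreases.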
Molecules of similar functional groups, similar moieties, and molecules having similar
reactions usually have similar molecular isodensity contours, MIDCO's, and similar
molecular electrostatic potential contours, MEPCO's. The similarities are often
restricted to local regions; nevertheless, they are recognizable. The fact that similarities
in reactivity are often associated with local similarities in the MIDCO and MEPCO
surfaces of the reacting molecules indicates the special importance of electronic and
electrostatic interactions during the initial stages of chemical reactions. Biomolecules
and drug molecules of similar biochemical effects also show similarities in their
MEPCO's and MIDCO's. Since the shapes of MEPCO's often suggest mechanistic
explanations of how a given ligand interacts with a receptor site of an enzyme, the
study of the three-dimensional shapes of MEPCO's is an important task in rational
drug design.
There are several important differences between MIDCO's and MEPCO's; one of the
most prominent differences is in the range for the numerical values of the contour
thresholds. Whereas the threshold parameters a of electronic charge density contours,
MIDCO's G(a) can take only non-negative values, the threshold parameter of a
molecular electrostatic potential contour MEPCO G(a) can take both positive and
negative values.
THREE-DIMENSIONAL MOLECULAR BODIES AND THEIR SHAPE CHANGES 183
The conventional approach of direct visual inspection of shapes is one of the simplest
methods for the assessment of shape similarity. Most molecular modeling computer
programs have the capacity to display pictorial shape information on the computer
screen; these models can then be used for a similarity assessment by inspection. Using
advanced quantum chemistry, molecular mechanics, and other computational chemistry
methods and computer graphics techniques, the three-dimensional images of various
molecular models, contour surfaces, or macromolecular representations such as a
pattern of a folded protein chain or a knotted structure of a DNA fragment can be
displayed on the computer screen. These images can be rotated, aligned with other
models, or superimposed on models of other molecules, and simple visual comparisons
can be used to judge molecular similarity. Such visual comparisons are much enhanced
by the chemical knowledge and expectation of the observer, who can incorporate in the
assessment the known or assumed relative importance of various shape features seen
on the computer screen.
However, such visual comparisons have some disadvantages: they are often subjective,
seldom reproducible, and rely on the visual memory of the observer. It is hard to
remember and compare images seen hours ago with one currently on display. Two
different observers are likely to judge the similarity of a sequence of molecular models
differently: when the task is to order a set of molecules according to their degree of
similarity to a target molecule, two different observers are likely to come up with
different orders.
These potentially serious disadvantages of visual similarity evaluation methods can be
circumvented by nonvisual, algorithmic similarity analysis, using automated similarity
assessment by computer. It is possible to use nonvisual computational techniques for
evaluating the degree of similarity by reproducible, algorithmic methods. By such
computer-based algorithmic methods, molecular similarity can be assessed and
evaluated numerically.
enclosed by the large, low electron density MIDCO's of molecules. By contrast, most
of the nonconvex formal molecular bodies and the associated MIDCO's of more
intricate shape features are of more chemical interest. For a detailed study of their
shapes the methods of local convexity analysis and their various generalizations are
important mathematical, algorithmic tools. Some of these generalizations will be
reviewed below, with special emphasis on the topological techniques adapted to treat
fuzzy electronic charge clouds.
A single surface or a single, formal molecular body cannot provide a detailed enough
description of the shape of the actual, fuzzy electron distribution or the entire, 3D
molecular electrostatic potential of any given molecule. Individual surfaces, that is,
individual geometrical models are insufficient for the description of the shape of
molecules, even if one takes a static model and ignores vibrational motions and
conformational flexibility. The validity of simple geometrical models is even more
restricted if the conformational flexibility and the more general, dynamic molecular
properties of molecules are considered.
In fact, most topological shape analysis methods for molecules are based on the two-step
process of Geometrical Classification and Topological Characterization [1].
analysis techniques [1]. Geometrical conditions are used to define ranges of geometrical
objects (e.g., families of points along a MIDCO where the contour surface satisfies
some local geometrical condition, for example, some curvature condition), leading to a
geometrical classification of the surface points into domains. This step is followed by a
topological characterization of the various interrelations among these domains. This
topological characterization is stable for most small geometrical variations that are
regarded as incidental. In fact, topology is used to extract the essential information,
affected only by greater geometry changes.
In order to apply the above principles, the characterization of the shapes of molecular
contour surfaces, such as MIDCO's and MEPCO's, requires the subdivision of the
surface into domains fulfilling some local shape criteria. One may consider two types of
criteria: relative, and absolute shape criteria. These lead to a relative shape domain
or an absolute shape domain subdivision of the molecular contour surface,
respectively.
If one compares two or several surfaces to one another by some direct technique, then
relative shape conditions are used. In one implementation, a pair of contour surfaces of
two molecules or contour surfaces of two different physical properties for the same
molecule are superimposed, generating interpenetration patterns on these surfaces. The
maximum connected subsets of these patterns can be taken as the relative shape
domains on each surface, and the interrelations among these relative shape domains in
the patterns can be used as criteria for local shape characterization. This shape analysis
belongs to the family of relative methods, describing the shape of one molecule relative
to another or one physical property of a molecule relative to another property of the
same molecule. When a topological analysis and subsequent characterization are applied
to these relative shape domains, then a direct comparison is obtained between the two
molecular surfaces.
It is not always necessary to compute the entire contour surface for both molecules or
for both physical properties; for a simple study, it is sufficient to map the
interpenetration pattern on one of these surfaces. For example, if the technique of
interpenetrating contour surfaces is applied for a relative shape domain subdivision of a
pair of MIDCO and MEPCO surfaces of the same molecule, then the MIDCO
surface can be subdivided into domains using the contour value of the MEPCO as
criterion. This procedure is equivalent to generating the interpenetration pattern on the
MIDCO surface [1].
If one compares a molecular contour surface to some standard surface, such as a plane,
or a sphere, or an ellipsoid, or any other closed surface selected as standard, then an
absolute shape characterization is obtained. In the simplest case, a contour surface is
compared to a plane. This plane can be moved along the contour as a tangent plane, and
the local curvature properties of the molecular surface can be compared to the plane.
Each point of the contour surface can be characterized by the local relation between the
tangent plane and the body enclosed by the surface: the tangent plane may fall on the
outside, on the inside, or it may cut into the given molecular body within any small
neighborhood of the surface point. This characterization leads to a subdivision of the
molecular contour surface into locally convex, locally concave, and locally saddle-type
shape domains, usually denoted by the symbols D2, D0, and D1, respectively. These
shape domains are absolute and lead to an absolute shape characterization in the above
sense, since the local curvatures are compared to the plane, selected as standard. If a
different standard, such as a tangent sphere or a tangent ellipsoid is selected then a
similar technique can be used for generating absolute shape domains on the contour
surface.
When a topological analysis and subsequent characterization are applied to these
absolute shape domains, then an absolute shape characterization of the molecular
surface is obtained.
Some of the topological methods of shape analysis of molecular contour surfaces,
designed to take advantage of such relative and absolute shape domain subdivisions of
the contours according to some physical or geometrical conditions are described in
detail in ref. [1].
The justification of the use of shape domains of contour surfaces for facilitating a
topological shape analysis of molecules follows from a simple observation. Most
commonly, molecular contour surfaces defined in terms of various physical properties,
such as the MIDCO or MEPCO surfaces, are topologically rather simple, even trivial
objects. Most MIDCO's in the chemically interesting density ranges are topologically
equivalent to a sphere, or in more special cases to a doughnut or to a few "fused"
doughnuts. A direct topological characterization of such (topologically simple) surfaces
does not provide much useful information about their chemically interesting shape
features, and is of little use in a similarity analysis.
To overcome this problem, one may use various geometrical or physical conditions,
denoted in general by μ, and define various domains on a contour surface G(a), such
as the shape domains Dμ of various curvature types. These domains subdivide the
contour surface G(a). One may formally cut out from G(a) all domains of some
specified properties [for example, all the locally convex domains ("bumps") of the
contour surface G(a)]. In this process, a new, topologically more interesting object, a
truncated contour surface G(a,μ), is obtained that inherits some shape information of
the original surface in a topologically easily accessible way.
The new, truncated surface G(a,μ) is no longer topologically equivalent to the original
contour surface G(a). In spite of this, the surface G(a,μ) carries information on the
shape of the original surface G(a), where shape is understood within the context of the
curvature condition or physical property used to define domains on the surface. In this
scheme, a geometrical or physical shape condition is used to convert the original
molecular surface G(a) into a new, topologically different object G(a,μ). The
topological analysis of the truncated surface G(a,μ) corresponds to a shape analysis
of the original surface G(a). The topological invariants of the truncated surface G(a,μ)
contain information on the pattern and topological interrelations of various shape
domains on the original contour surface G(a). In fact, the geometrical curvature
properties of the surface, considered as a manifestation of the shape of the object, are
characterized by topological tools.
(i) all the possible (in principle, infinitely many) geometrical patterns and
arrangements are classified by some combination of geometrical and
topological criteria, and
This method follows the framework of the GSTE principle: the initial geometrical
classification by curvature properties leads to an eventual topological characterization. If
the above topological characterization gives the same results for two different nuclear
arrangements along the reaction path, that is, if the two patterns are topologically
equivalent, then the two arrangements are similar in a geometrical sense.
One can take the same approach when comparing two different molecules: if their shape
domain characterizations give equivalent topological results, then the two molecules are
similar in a geometrical sense. The geometrical similarities of the nuclear arrangements
within a range of the nuclear configurations along the reaction path are manifested in a
topological equivalence of the shape domain patterns of the corresponding sequence of
One topological technique for a concise representation of the interrelations among the
shape domains is the Shape Group Method (SGM) [1], developed for the analysis of
molecular shapes. Assume that a family of shape domains is determined on a molecular
contour surface G(a). A truncated contour surface G(a,μ) is obtained from G(a) by
excising a selected subfamily of shape domains, for example, all Dμ shape domains
for a specified index μ. The shape groups of the contour surface G(a), with respect
to the given family μ of shape domains, are the homology groups of the truncated
contour surface G(a,μ).
For example, we may assume that the shape domains are defined in terms of local
convexity, leading to locally concave, saddle type and convex domains. If we select the
locally convex domains (that is, the domains of index μ = 2), then the shape groups of
G(a) are the homology groups [1] of the truncated isodensity contour surface G(a,2),
obtained from the molecular contour surface G(a) by cutting out all Dμ domains of
index μ = 2.
Homology groups of algebraic topology are topological invariants, expressing an
important aspect of the topological structure of bodies and surfaces. In general, the
shape groups of an object are the homology groups of a truncated object derived from
the original object by eliminating parts fulfilling some physical, or geometrical or some
other topological criteria [1]. This family of shape groups of the example, obtained by
eliminating all locally convex domains of G(a), has been used for the shape analysis of
many molecules [1].
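For the simplest situation, the ranks of the homology groups of a truncated surface follow from standard results on surfaces with boundary. The sketch below assumes that the original contour is a closed orientable surface of given genus (0 for a sphere, 1 for a doughnut), that the excised domains are pairwise disjoint and each topologically a disc, and that the truncated surface stays connected. The function name is hypothetical, and the code is not taken from ref. [1].

```python
def truncated_surface_betti(genus: int, n_removed: int):
    """Ranks (b0, b1, b2) of the homology groups of a connected, closed,
    orientable surface of the given genus after n_removed disjoint discs
    have been cut out."""
    if genus < 0 or n_removed < 0:
        raise ValueError("genus and n_removed must be non-negative")
    if n_removed == 0:
        # Closed surface: the two-dimensional homology group survives.
        return (1, 2 * genus, 1)
    # Each hole after the first adds one independent one-dimensional cycle.
    return (1, 2 * genus + n_removed - 1, 0)


# A sphere-like contour with three convex "bumps" excised.
print(truncated_surface_betti(0, 3))
```

Two contours of the same genus then share these ranks exactly when the same number of domains has been excised, which is the kind of information that is lost if the untruncated surface is analyzed directly.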
Following the spirit of the GSTE principle, the above shape group methods combine
the advantages of geometry and topology. The shape domains and the subsequent
truncation of the MIDCO's are defined in terms of a geometrical classification of
points of the surfaces, using local curvature properties, whereas the truncated surfaces
are characterized topologically by the shape groups.
The shape groups are not determined by the point symmetry groups of the nuclear
framework. The shape groups provide a symmetry-independent characterization of
molecular shape.
Hessian matrix H_T(r), and points r of G(a) are classified into domains
according to the relative local convexity properties of their neighborhoods on G(a),
relative to the tangent sphere, or according to the oriented relative local convexity
properties, relative to the tangent ellipsoid T. Points with zero, one, and two negative
eigenvalues belong to domains D0(T), D1(T), and D2(T), which are concave, of the
saddle-type, and convex, respectively, relative to the tangent sphere or the oriented
tangent ellipsoid T.
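The eigenvalue rule can be sketched for a single surface point as follows. The input is assumed to be a symmetric 2 x 2 local Hessian, already expressed relative to whatever reference object is in use; how that matrix is obtained is outside the scope of this sketch, and the function names are hypothetical.

```python
import math


def domain_index(h11: float, h12: float, h22: float) -> int:
    """Number of negative eigenvalues of the symmetric matrix
    [[h11, h12], [h12, h22]]; this is the index of the shape domain."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    eigenvalues = (mean - radius, mean + radius)
    return sum(1 for value in eigenvalues if value < 0.0)


def domain_label(index: int) -> str:
    """Verbal description of the domain type, following the text."""
    return {0: "D0 (concave)", 1: "D1 (saddle-type)", 2: "D2 (convex)"}[index]


print(domain_label(domain_index(-1.0, 0.0, -2.0)))
```

Points with a zero eigenvalue fall on the boundaries between domains; this sketch counts them with the non-negative eigenvalues, which is a choice made here for simplicity.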
The test ellipsoid T may be oriented so as to represent an external electromagnetic field.
Alternative choices for orientation include the main direction of a cavity of an enzyme
molecule, the axes of a polarizability ellipsoid of a molecule, an alignment on the
surface of a catalyst, or some other internal or external constraint [1].
If one replaces the ellipsoid T by any other differentiable surface, for example, by a
contour surface of another molecule [1], then a further generalization of the concept of
convexity is obtained. The resulting shape domains on the MIDCO G(a) can be used
for a direct shape comparison and a direct similarity test for these molecules.
The above shape analysis methods all rely on a classification of surface domains based
on local curvature properties. Curvature can be regarded as the second derivative of a
function describing the surface, hence all curvature-based methods of shape analysis
require that the surface be twice differentiable. However, not all models of formal
molecular surfaces fulfill this criterion: some surfaces, such as fused sphere Van der
Waals surfaces, do not even possess first derivatives at the seams of interpenetrating
spheres.
A similar problem arises with simple models of solvent accessible surfaces, obtained by
"rolling" a sphere (representing the solvent molecule) along a formal molecular surface
of the solute. Even if the latter surface is differentiable, the solvent accessible surface,
taken as the surface generated by motion of the center of the rolling sphere, is not
necessarily differentiable [1].
Several methods of shape analysis have been proposed for such surfaces [1]. One of
these methods is applicable for surfaces that are not everywhere differentiable as well as
for the shape characterization of dot representations of molecular surfaces. The latter
models are not only nondifferentiable, but are not even continuous. This technique, the
method of T-hulls [1], is based on a generalization of the concept of convex hull. The
convex hull of a set A is usually defined as the smallest convex set that contains A.
However, an alternative, equivalent definition lends itself to a useful generalization.
Take all possible half spaces (a half space is one side of the three-dimensional space
divided by a plane) that contain the set A, and define the convex hull of A as the
intersection of all these half spaces. A generalization is obtained if one replaces the half
spaces by some other object T. Consider a three-dimensional body T. By definition,
the T-hull of a point set A is the intersection of all rotated and translated versions of
T which contain A.
The T-hull method can be applied for shape comparisons of molecules using a
common reference shape, chosen as that of the body T. In another application, the
shapes of two molecules, T and A, can be compared directly. One of the two
molecular bodies can be chosen as T and the T-hull of the other molecular body A
serves as a tool for a direct shape comparison [1]. The smaller the deviation between
A and the T-hull of A (as measured, for example, by volume differences), the more
similar are T and A.
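The definition of the T-hull as an intersection can be made concrete with a deliberately simplified sketch. It works in two dimensions rather than three, takes T to be a disc (so that rotations play no role), and samples the translated copies of T on a finite grid of centers. All of these are simplifying assumptions made here, so the result is only an outer approximation of the true T-hull.

```python
import math


def in_disc_hull(p, points, radius, step=0.1, extent=3.0):
    """Approximate test of whether p lies in the T-hull of `points`,
    where T is a disc of the given radius.

    Only disc centers on a square grid are tried, so fewer discs are
    intersected than the definition requires: a True answer may be
    wrong close to the boundary of the hull, a False answer is exact.
    """
    n = round(extent / step)
    found_container = False
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            center = (i * step, j * step)
            if all(math.dist(center, q) <= radius for q in points):
                found_container = True  # this copy of T contains the set
                if math.dist(center, p) > radius:
                    return False  # p is cut off by a copy of T containing the set
    if not found_container:
        raise ValueError("no sampled disc contains the point set")
    return True


A = [(-1.0, 0.0), (1.0, 0.0)]
print(in_disc_hull((0.0, 0.2), A, 2.0), in_disc_hull((0.0, 0.5), A, 2.0))
```

For the two-point set used here the T-hull is a lens around the connecting segment: it contains the convex hull (the segment itself) and becomes thinner as the radius of the disc grows.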
Another important class of methods used for molecular similarity analysis is based on
the resolution based similarity measures (RBSM) [1]. The principle of these methods
can be illustrated by a simple example. Consider three objects, A, B, and C, which
appear indistinguishable at a great distance to an observer. For example, for a distant
observer all three objects may appear as mere points, hence the objects cannot be
distinguished. At some closer distance, one of the objects, say object C is already
distinguishable from objects A and B, but the latter two may still appear
indistinguishable. At a close distance, A and B are also distinguishable.
Alternatively, one may view a low resolution photograph of these three objects; if the
resolution is very low, the three objects may appear indistinguishable. One may view a
somewhat better resolved photograph, where object C is distinguishable from the
other two, while A and B are still indistinguishable. However, on a photograph of
high enough resolution, all three objects are distinguishable. Using either approach, it
is natural to conclude that A and B are more similar to each other than A is to C or
B is to C, simply, because the dissimilarity of C from the other two objects is
already evident at a medium distance or at medium resolution, whereas it takes a closer
look or a much higher resolution to distinguish A from B.
In general, the observer may also use a series of binoculars, and in order to distinguish
objects, a higher level of resolution of the observed picture is required if the objects are
more similar. Based on this idea, one can define a similarity measure relying on the
level of resolution required to distinguish objects, leading to Resolution Based
Similarity Measures (RBSM's), described in detail in [1].
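The photograph analogy can be turned into a toy computation. In the sketch below the objects are hypothetical one-dimensional binary images, a lower resolution is simulated by pooling pixels into blocks, and the similarity of two objects is measured by the first level (counted from the coarsest) at which they can be told apart. This illustrates the principle only and is not the RBSM construction of ref. [1].

```python
def coarsen(image, block):
    """Pool a 1D binary image: a block is 1 if any of its pixels is 1."""
    return [int(any(image[i:i + block])) for i in range(0, len(image), block)]


def distinguishing_level(x, y, blocks=(8, 4, 2, 1)):
    """Index of the coarsest resolution level at which x and y differ,
    or None if they are identical even at the finest level."""
    for level, block in enumerate(blocks):
        if coarsen(x, block) != coarsen(y, block):
            return level
    return None


# Hypothetical objects: A and B differ only in fine detail, C differs earlier.
A = [1, 1, 0, 0, 1, 0, 0, 0]
B = [1, 1, 0, 0, 0, 1, 0, 0]
C = [1, 1, 0, 0, 0, 0, 1, 1]
print(distinguishing_level(A, B), distinguishing_level(A, C))
```

A larger level index means that a finer resolution is needed to separate the pair, that is, the two objects are more similar.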
partitioning, that is, for each level of topological resolution. For each such level, the
introduction of a topology turns the MIDCO into a topological space. The
corresponding topologies are related by a cruder-finer relation [1], if the shape
domains of a cruder partitioning can be constructed as unions of the shape domains of a
finer partitioning. If all shape domain partitionings, that is, all the topologies introduced
on the MIDCO's can be ordered by such relations, then the corresponding hierarchy
of topologies also provides a hierarchy of topological resolutions. If two MIDCO's
have identical shape groups using a finer shape domain partitioning than two MIDCO's
which have different shape groups already at a cruder shape domain partitioning, then
the members of the first pair of MIDCO's are more similar to each other than the
MIDCO's of the second pair. The complexity of the partitioning can be characterized
by the defining subbase used, that gives a measure of how fine the corresponding
topology is. In turn, this measure gives a resolution based measure of similarity of the
MIDCO surfaces.
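The cruder-finer relation between two shape domain partitionings can be tested directly from its definition. In this sketch a partitioning is represented as a list of sets of hypothetical surface point labels; the representation and the function name are assumptions made for illustration.

```python
def is_cruder(cruder, finer):
    """True if every domain of `cruder` is a union of domains of `finer`."""
    finer_sets = [frozenset(domain) for domain in finer]
    for domain in cruder:
        target = frozenset(domain)
        covered = set()
        for piece in finer_sets:
            if piece <= target:
                covered |= piece
            elif piece & target:
                return False  # a finer domain straddles the boundary
        if covered != target:
            return False
    return True


fine = [{1}, {2}, {3, 4}]
print(is_cruder([{1, 2}, {3, 4}], fine), is_cruder([{1, 3}, {2, 4}], fine))
```

Partitionings that can be chained by this relation form the hierarchy of resolutions described above; pairs for which the test fails in both directions are not comparable.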
The quantum mechanical uncertainty in nuclear positions and the associated, inherent
vibrational and other internal motions of molecules imply that molecular shapes cannot
be described in detail without taking into account dynamic features of molecules. The
topological aspects of dynamic shape properties have an important role in the study of
conformational motions, chemical reactions, as well as electronic excitations.
Evidently, molecular shape is not a static property. Molecules vibrate even at absolute
zero temperature. According to quantum mechanics, the formal vibrational properties
are manifested in a probabilistic distribution of nuclear positions in any polyatomic
molecule. In a similar manner, rotational states of molecules also influence their shapes.
Since motion is an inherent property of all molecules, molecular shapes cannot be
described in detail without taking into account the dynamic aspects of the motion of
various parts of the molecules relative to one another.
The energy dependence of the accessible shapes and the accessible symmetries of
various molecules obey a family of rules influencing the mechanism and outcome of
conformational changes, electronic excitations, and chemical reactions [1].
Some of the dynamic shape analysis approaches can be formulated in terms of the
dynamic shape space D, defined as a composition of the nuclear configuration space
M, and the space of the parameters involved in the shape representation, for example,
the two-dimensional parameter space defined by the possible values of the density
threshold a, and some reference curvature parameter b of a given MIDCO surface.
Two types of methods for dynamic shape analysis have been distinguished [1]. The
methods belonging to the first class are used to determine which nuclear arrangements
are associated with a given topological shape. The methods belonging to the second
class determine the available topological shapes compatible with some external
conditions, for example, with an energy bound.
A simple formulation of the dynamic shape analysis methods of the first type can be
given in terms of the invariance of topological descriptors within domains of the
dynamic shape space D. The subsets of the dynamic shape space with a common shape
group, called the shape group invariance domains of D, can serve as tools for such
dynamic shape analysis. Within these invariance domains a limited change of nuclear
configurations, hence a limited change in the geometrical shape of the MIDCO surface
is permitted, as long as these changes are small enough so that within the given
topological context the topological shape remains invariant. For example, the
preservation of the shape group is a suitable topological criterion. The dynamic shape
space invariance domains serve as tools for analyzing dynamic shape properties.
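A one-dimensional analogue of the invariance domains can be sketched by sampling configurations along a path and grouping consecutive samples that share the same topological descriptor. The path coordinates and descriptor values below are hypothetical, and a path is of course only a slice through the full dynamic shape space D.

```python
from itertools import groupby


def invariance_domains(samples):
    """Group consecutive (coordinate, descriptor) samples with equal
    descriptor into intervals (descriptor, first_coordinate, last_coordinate)."""
    domains = []
    for descriptor, run in groupby(samples, key=lambda sample: sample[1]):
        coords = [sample[0] for sample in run]
        domains.append((descriptor, coords[0], coords[-1]))
    return domains


# Hypothetical samples: path coordinate and a tuple standing for a shape group descriptor.
path = [(0.0, (1, 2, 0)), (0.1, (1, 2, 0)), (0.2, (1, 1, 0)),
        (0.3, (1, 1, 0)), (0.4, (1, 2, 0))]
print(invariance_domains(path))
```

The same descriptor may occur in several disconnected intervals, as in the example, so the intervals rather than the descriptor values identify the individual domains.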
An upper limit for energy within a family of nuclear arrangements can be selected as
criterion to be used in a dynamic shape analysis method of the second type. The task is
to determine the invariance domains of topological descriptors in the dynamic shape
space D, restricted to these nuclear configurations. In this process, an
energy-dependent family of allowed shapes is obtained, as defined by the given
topological descriptors.
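An energy-bounded selection of shapes can be sketched as a simple filter. The configurations, their energies, and the shape labels below are hypothetical placeholders for the results of an actual calculation.

```python
def accessible_shapes(configurations, energy_bound):
    """Set of shape descriptors found among configurations whose energy
    does not exceed the bound. Each configuration is (energy, descriptor)."""
    return {shape for energy, shape in configurations if energy <= energy_bound}


# Hypothetical data: relative energies (arbitrary units) and shape labels.
configs = [(0.0, "s1"), (2.5, "s1"), (4.0, "s2"), (9.0, "s3")]
print(sorted(accessible_shapes(configs, 5.0)))
```

Raising the bound can only add shapes to the set, so the allowed shapes form a nested, energy-dependent family.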
Alternatively, the energy criterion can be replaced with formal temperature, using
properties of Boltzmann distributions. At a higher temperature more energy is available,
and the molecular vibrations cover a wider range of formal molecular geometries. A
larger accessible conformational domain implies that a greater variety of dynamic
shapes is likely to occur. At some higher temperature, the energy is sufficient for
overcoming the activation barriers to conformational rearrangements or to chemical
reactions. Consequently, at these temperatures a further, more significant increase in
the size of accessible configuration space domains is found, and greater shape
variations are expected.
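The temperature argument can be illustrated with the Boltzmann factor for two states separated by an energy difference. The sketch uses molar units and the molar gas constant; the energy difference of 10 kJ/mol is a hypothetical value chosen only to show the trend.

```python
import math

R_GAS = 8.314462618  # molar gas constant in J/(mol K)


def boltzmann_ratio(delta_e: float, temperature: float) -> float:
    """Population of a state lying delta_e (J/mol) above a reference state,
    relative to the reference, at the given temperature (K)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    return math.exp(-delta_e / (R_GAS * temperature))


for T in (300.0, 600.0):
    print(T, boltzmann_ratio(10_000.0, T))
```

The ratio grows with temperature, in line with the statement that a wider range of formal molecular geometries becomes accessible at higher temperature.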
Whereas these techniques are useful for the study of many chemically relevant aspects
of molecular shape, they require rather time consuming electron density calculations,
and hence are not easily adaptable for large molecules. There is a need for alternative,
approximate techniques.
An early observation of Parr and Berk [2] is the basis for a simple, discrete
representation of molecular shapes and shape changes in chemical reactions, described
below. The method proposed in the present study is much simpler than the original
shape group analysis applied directly to MIDCO's; nevertheless, the approach retains
many of the fundamental features of MIDCO-based molecular shape analysis and
molecular similarity analysis techniques.
The potential Vn(r) generated by the nuclei, also called the "bare nuclear
potential" by Parr and Berk [2], is defined as
Parr and Berk [2] have found that the isopotential contours of the nuclear potential
Vn(r) of simple molecules show a remarkable similarity to the actual MIDCO's of the
electronic ground states of these molecules. These nuclear isopotential contours can
serve as approximations of MIDCO's, and in general, they are suitable for an
approximate shape representation of molecules.
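The defining equation of Vn(r) is not reproduced in this copy of the text. The sketch below therefore assumes the usual Coulomb form, a sum of Z_i / |r - R_i| over the nuclei, with the sign convention that makes the potential non-negative, in atomic units. The nuclear charges and positions in the example are hypothetical.

```python
import math


def nuclear_potential(r, nuclei):
    """Bare nuclear potential at point r, assuming the Coulomb form
    sum_i Z_i / |r - R_i| (atomic units). `nuclei` holds (Z, position) pairs."""
    total = 0.0
    for charge, position in nuclei:
        distance = math.dist(r, position)
        if distance == 0.0:
            return math.inf  # the potential diverges at a nucleus
        total += charge / distance
    return total


# Hypothetical diatomic with two unit charges 1.4 bohr apart.
h2 = [(1.0, (-0.7, 0.0, 0.0)), (1.0, (0.7, 0.0, 0.0))]
print(nuclear_potential((0.0, 0.0, 0.0), h2))
```

Evaluating such a sum requires only the nuclear charges and positions, which is why contours of this potential are much cheaper to generate than MIDCO's.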
The isopotential contour surfaces of the nuclear potential have been referred to as
The level sets F(a) and their NUPCO boundary surfaces G(a) for any constant
nuclear potential value a are defined as
respectively. The value of nuclear potential is always positive or zero; clearly, there are
no NUPCO's G(a) with negative threshold parameter a.
One should note that only some of the essential chemical changes are reflected in a
NUPCO shape analysis; in some instances, electronic excitations have only negligible
Note, however, that electronic excitations are often accompanied by small changes of
the optimum nuclear arrangement K, and NUPCO's can be used for an approximate
description of these contributions to the overall shape changes caused by the electronic
excitations.
The analysis of NUPCO's also provides alternatives for various dot representations of
molecular shapes. A technique, called the fused spheres guided homotopy (FSGH)
method [1] is based on the generation of point sets on spherical surfaces about each
nucleus of a molecule in a systematic manner, designed to facilitate the construction of a
semi-uniform distribution of points ("dots") along MIDCO's, using simple
interpolation. The same technique is directly applicable for dot representations of
NUPCO surfaces. An improvement of the original FSGH technique is obtained if the
supporting, "guiding" set of spheres is replaced by a "guiding" series of NUPCO's
G(ai) for a sequence of nuclear potential values a1, a2, ..., ai, ..., am. Points
along these NUPCO's can then be used for the interpolation of electron density
values, eventually leading to a dot representation of the actual MIDCO. Whereas the
computation of NUPCO's is somewhat more time consuming than the generation of a
sequence of spheres, NUPCO's provide a more faithful representation of the shape of
the MIDCO's, hence the approximate dot representation is also more accurate.
(4)
For the reaction path p considered, take the initial nuclear configuration,
Take a high enough nuclear potential threshold value a1 that fulfills the following
conditions:
(ii) the G(K0,a1) NUPCO surface has the maximum number of maximum
connected components, subject to condition (i).
In other words, the various nuclear neighborhoods are not joined yet (condition (i)),
and G(K0,a1) has the maximum number of such atomic neighborhoods (condition
(ii)).
(6)
At the nuclear potential threshold a1, the NUPCO component enclosing nucleus j is
denoted by G1j(K0,a1). We assume that all these NUPCO components are
topological spheres.
(9)
some of the components G1j(K0,a1) and G1j'(K0,a1) may expand and join to form a
single maximum connected component G2j(K0,a2). Here we assume that
j > j'. (10)
We choose the value a2 as the nuclear potential threshold where the first such joining
of components occurs.
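For a diatomic molecule the first joining threshold can be written in closed form. The sketch below assumes point-charge nuclei with potential Z1/r1 + Z2/r2 in atomic units (an assumption about eq. (1)); all function names are illustrative. Under that assumption the two spherical components merge at the saddle point on the internuclear axis, where the potential equals (sqrt(Z1) + sqrt(Z2))^2 / d for internuclear distance d.

```python
import math


def axis_potential(x, z1, z2, d):
    """Assumed point-charge nuclear potential on the internuclear axis, 0 < x < d."""
    return z1 / x + z2 / (d - x)


def join_threshold(z1, z2, d):
    """Closed-form saddle value: the threshold at which the two components merge."""
    return (math.sqrt(z1) + math.sqrt(z2)) ** 2 / d


def join_threshold_numeric(z1, z2, d, steps=200):
    """Same value from a ternary search for the minimum of the axis potential."""
    lo, hi = 0.0, d
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if axis_potential(m1, z1, z2, d) < axis_potential(m2, z1, z2, d):
            hi = m2  # the minimum lies left of m2
        else:
            lo = m1  # the minimum lies right of m1
    return axis_potential((lo + hi) / 2, z1, z2, d)


# Two unit charges two bohr apart: the components join at threshold 2.0.
a2 = join_threshold(1, 1, 2.0)
```

Above this threshold the level set of the diatomic has two components; at or below it, a single one.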
where
where
(14)
index j' is the smallest index of any nucleus enclosed by the component Gij(K0,ai)
enclosing nucleus j, and where
(15)
One should notice that the imaginary component of each number k11, k12, ..., k1n
of the first sequence is zero, which agrees with the assumption that the initial NUPCO
components are topological spheres. Consequently, the numbers k11,
k12, ..., k1n, which are real integers, also satisfy the conditions of the general
definition given for the numerical sequence ki1, ki2, ..., kin of a generic index i.
In the general case, the numbers ki1, ki2, ..., kin are complex with integer
components, including the possibility of zero imaginary parts. Clearly,
For each nuclear configuration K there are m(K) such ki1, ki2, ..., kin
sequences, i = 1, 2, ..., m(K). These sequences can be arranged into m(K) x n matrices
which can be augmented by mmax - m(K) rows of zeroes, where
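$$N(K) = \begin{pmatrix} k_{11} & k_{12} & \cdots & k_{1n} \\ \vdots & \vdots & & \vdots \\ k_{m(K)1} & k_{m(K)2} & \cdots & k_{m(K)n} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}. \tag{18}$$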
For the static nuclear configuration K the matrix N(K) describes the pattern of the
topological structure of the nuclear potential.
where we consider column vectors and the symbol ' stands for transpose. The matrix
N(K) combined with vector a(K) provides a more detailed description of the shape
of the NUPCO sequence of nuclear configuration K.
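The zero-padding that brings every N(K) to a common number of rows can be sketched as follows. The function name and the sample entries are hypothetical; the k-sequences are simply taken as given lists of complex numbers with integer components, one list per threshold.

```python
def nupco_matrix(rows, m_max):
    """Stack the m(K) k-sequences and pad with zero rows up to m_max rows."""
    if not rows:
        raise ValueError("at least one k-sequence is required")
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all k-sequences must have n entries")
    if len(rows) > m_max:
        raise ValueError("m(K) exceeds m_max")
    matrix = [tuple(complex(k) for k in r) for r in rows]
    # m_max - m(K) rows of zeroes, as in eq. (18)
    matrix += [tuple(0j for _ in range(n))] * (m_max - len(rows))
    return tuple(matrix)


# Hypothetical entries for n = 3 nuclei, m(K) = 2 thresholds, m_max = 4.
N = nupco_matrix([[1, 2, 3], [1, 1, 3]], m_max=4)
```

Returning nested tuples makes the matrices hashable, so that matrices of different configurations can be compared or used as dictionary keys.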
The information stored in matrix N(K) can be represented by a labeled graph d(K).
The n vertices of d(K) are labeled by the serial indices of the nuclei (the column
index j' of matrix N(K)). In addition, each vertex j' is labeled by a sequence
of complex numbers zj't, t = 1, 2, ..., defined by
where i' is the index of potential threshold value ai' where the t-th topological
change of Gi'j'(K0,ai') occurs, and by
of the topological change, as long as for this change j' is the smallest nucleus index
within the NUPCO.
There is an edge from vertex j to vertex j' if at the nuclear potential threshold ai the
nucleus j is contained in the NUPCO component Gij'(K0,ai), where j' is the
smallest index of any nucleus enclosed by the NUPCO component containing nucleus
j, and where ai is the largest potential threshold value where this holds. The (j,j')
edge is labeled by index i.
The edge indices can be used to assign a direction to each edge, for example, the
direction from higher to lower nuclear index as given in list (7), turning the edges into
arcs and d(K) into a digraph. Digraph d(K) is a discrete representation of the
topological pattern of the nuclear potential of the molecule.
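The arc construction can be sketched under two assumptions that the text does not state explicitly: the thresholds decrease with their index (a1 > a2 > ...), so that the largest threshold corresponds to the smallest index i, and pairs with j' = j are not drawn. The input format and the function name are illustrative.

```python
def nupco_digraph(smallest_index_by_level):
    """Return {(j, j'): i}: an arc from nucleus j to j', labeled by the first
    level i at which j lies in the component whose smallest nucleus index is j'.

    smallest_index_by_level[i - 1] maps each nucleus j to that smallest index.
    """
    arcs = {}
    for i, level in enumerate(smallest_index_by_level, start=1):
        for j, j_min in level.items():
            if j_min != j and (j, j_min) not in arcs:
                arcs[(j, j_min)] = i  # first level = largest threshold
    return arcs


# Three nuclei: all separate at a1, nuclei 2 and 3 joined at a2, all joined at a3.
levels = [
    {1: 1, 2: 2, 3: 3},
    {1: 1, 2: 2, 3: 2},
    {1: 1, 2: 1, 3: 1},
]
arcs = nupco_digraph(levels)
```

Because j' is the smallest index in the component containing j, every arc produced this way points from a higher to a lower nuclear index.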
An alternative digraph representation da(K) is obtained from d(K) if one replaces the
integer arc labels i with the real number label ai and the real parts i' of the arc labels
with ai' of the actual threshold values where the topological changes occur. This
approach takes into account all information represented by matrix N(K) and vector
a(K). Note that digraph da(K) is no longer a discrete representation of the nuclear
potential.
In the following discussion we shall use the matrix representations N(K) and vectors
a(K), convenient for computer manipulations.
p: I = [0,1] → M, (22)
parametrized as
p = p(u), 0 ≤ u ≤ 1, (23)
of the formal product. Note that for most nuclear configurations K(u) along the path
p(u), a small displacement du does not alter the topological pattern N(K(u)) of
NUPCO sequences,
although the critical threshold values stored in vector a(K) are
likely to change:
Along the entire path p(u), there are only a finite number of different N(K(u))
matrices of NUPCO sequences, and path p(u) can be decomposed into a finite
number w of invariance intervals,
The topological pattern of nuclear potential and its variation along the path p can be
characterized by the sequence
N(p,1), N(p,2), ..., N(p,w) (29)
marking the endpoints of the first w-1 of the invariance intervals pN,1, pN,2, ...,
pN,w-1 along path p.
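On a sampled path, the decomposition into invariance intervals amounts to grouping consecutive samples that carry the same matrix. The following is a minimal sketch: the function name is illustrative, the matrices are represented by any comparable objects, and the true interval endpoints lie somewhere between neighboring samples of different runs.

```python
from itertools import groupby


def invariance_intervals(samples):
    """samples: list of (u, N) pairs sorted by u.
    Returns [(u_first, u_last, N), ...] for maximal runs of equal N."""
    intervals = []
    for matrix, run in groupby(samples, key=lambda s: s[1]):
        us = [u for u, _ in run]
        intervals.append((us[0], us[-1], matrix))
    return intervals


# Stand-in labels "A" and "B" play the role of two different N matrices.
samples = [(0.0, "A"), (0.2, "A"), (0.4, "B"), (0.6, "B"), (0.8, "A"), (1.0, "A")]
intervals = invariance_intervals(samples)
w = len(intervals)  # number of invariance intervals found on the sampled path
```

Note that the same matrix may recur on non-adjacent intervals; each recurrence counts as a separate member of the sequence (29).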
Note that a reparametrization of path p is always possible that can change the actual
uN,1, uN,2, ..., uN,w-1 values while preserving their monotonic increase along the
path p. If, however, the parametrization (23) reflects some physical condition, for
example, it is defined by proportionality with the distance given in the metric nuclear
configuration space M [3], then the uN,1, uN,2, ..., uN,w-1 values also reflect
physically relevant information.
Two reaction paths p1 and p2 are regarded shapewise equivalent within the above
context (N-shape equivalent) if and only if the numbers w1 and w2 of their
invariance intervals agree,
(31)
and
(33)
$p_1 \overset{N}{\sim} p_2$. (35)
A subclass P1(C(λ,i), C(λ',i')) of class P1 is the family of all paths from class P1
which start at the catchment region C(λ,i), end at the catchment region C(λ',i'), and
are homotopic to one another (continuously deformable into one another) while
preserving these properties. Evidently, the above relation is an equivalence relation
among paths, and P1(C(λ,i), C(λ',i')) is an equivalence class. Such an equivalence
class P1(C(λ,i), C(λ',i')) represents a formal reaction mechanism defined in terms of
shape (N-shape).
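The matrix part of the N-shape equivalence test is a direct comparison of two finite sequences. The sketch below covers only that criterion (equal numbers of invariance intervals and equal matrices in order); the conditions on catchment regions and homotopy are not modeled, and the function name is illustrative.

```python
def n_shape_equivalent(seq1, seq2):
    """True if both paths have the same number of invariance intervals and
    the same matrix on each interval, compared in order along the path."""
    return len(seq1) == len(seq2) and all(a == b for a, b in zip(seq1, seq2))


# Hypothetical 1 x 2 matrices stored as nested tuples.
A = ((1 + 0j, 2 + 0j),)
B = ((1 + 0j, 1 + 1j),)
same = n_shape_equivalent([A, B, A], [A, B, A])
different = n_shape_equivalent([A, B, A], [A, B])
```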
essential connection among the individual shape variations along the ground and excited
electronic state potential surfaces.
For each nuclear configuration K the NUPCO matrix N(K) is defined as in eq.
(18), using a modified choice for mmax,
hence there are only a finite number of NUPCO shape invariance domains by the
above matrix criterion,
(39)
The union of these NUPCO shape invariance domains is the entire nuclear
configuration space M,
$$M = \bigcup_{k} M_{N,k}. \tag{40}$$
The metric of the nuclear configuration space M allows one to introduce a measure of
volume V for subsets of M. Measures of the relative importance sc(k,i) of shape
type k (N-shape type k) for various individual chemical species C(λ,i) and the
contribution cs(i,k) of a chemical species C(λ,i) to a given shape type k can be
specified by the following volume ratios:
and
(42)
$$\sum_{\lambda,i} cs(i,k) = 1. \tag{44}$$
The above two relations can be used as a test for results obtained for individual shape
and species contributions.
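The normalization test can be illustrated numerically. The sketch assumes one plausible form of the two ratios: the volume of the overlap between a catchment region and a shape domain, divided by the volume of the catchment region for sc(k,i) and by the volume of the shape domain for cs(i,k). It further assumes that the catchment regions cover the space up to sets of zero volume. The species labels, volumes and function name are hypothetical.

```python
from fractions import Fraction


def shape_species_ratios(overlap):
    """overlap[(species, k)] = volume of (catchment region of species) ∩ (domain k).
    Integer volumes keep the arithmetic exact."""
    species_vol, shape_vol = {}, {}
    for (s, k), v in overlap.items():
        species_vol[s] = species_vol.get(s, 0) + v
        shape_vol[k] = shape_vol.get(k, 0) + v
    sc = {(k, s): Fraction(v, species_vol[s]) for (s, k), v in overlap.items()}
    cs = {(s, k): Fraction(v, shape_vol[k]) for (s, k), v in overlap.items()}
    return sc, cs


overlap = {("A", 1): 3, ("A", 2): 1, ("B", 1): 1, ("B", 2): 3}
sc, cs = shape_species_ratios(overlap)

# For each shape type k, the species contributions sum to one, as in eq. (44).
total_for_k1 = sum(v for (s, k), v in cs.items() if k == 1)
```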
The similarities between various reaction mechanisms can be analysed and quantified
by direct comparison of their matrix sequences. A numerical similarity measure of
reaction mechanisms is based on a measure of difference between the two matrix
sequences on the two sides of eq. (34): the smaller the difference, the greater the
similarity. The extreme case of shapewise equivalence of reaction mechanisms is
represented by eq. (34).
Distortions and strains in a molecule may alter the identity of a functional group. A
large enough local distortion of some molecular moiety may qualify as an actual
chemical reaction that changes one functional group into another. For example, by an
appropriate (large) local distortion, a -CH2-O-CH3 group may get converted into
the group -CH2-CH2-OH.
A rather general treatment of such problems has been formulated [5] as follows.
Consider a family f of some functional groups ft generated by some or all of the
N atoms associated with the given stoichiometry S that defines the actual nuclear
configuration space M:
(45)
For convenience in the terminology, and for sake of generality, among these w
functional groups we may include two extreme cases as formal "functional groups":
the individual atoms, and entire molecules, possibly containing all the N atoms.
A larger functional group may contain some smaller ones, for example, the
(46)
(47)
ft < fs. (48)
Of course, it is possible that for many other pairs of functional groups of such a family
f no such containment relation exists; for example, this is the case for functional
groups ft' = -CH2-OH and fs' = -CH2-CH3, since neither contains the
other. Since not all functional groups are interrelated, the relation < defines only a
partial order in the family f.
Two cases are of special importance. If there exists a functional group f1 which is
contained in all the other functional groups f2, ..., fw of the family f,
(49)
then this functional group f1 may serve as an infimum, and with relation < as the
partial order, f becomes a lower semilattice. For example, if the family f is such
that each member fs contains a common atom, for example, the H atom, then the <
partial order relation implies that family f is a lower semilattice, with f1 = H as
infimum [5].
If the family f contains only one, unique isomer for the structure involving all the N
nuclei of the given stoichiometry S (where this isomer is regarded as a formal
functional group fw), then fw can be taken as a supremum, provided that
(50)
In this case, family f is an upper semilattice. If both relations (49) and (50) hold,
that is, if both infimum and supremum exist within family f, with respect to the <
partial order relation, then the family f of functional groups forms a lattice. Such
lattices are useful tools for systematic treatments of hierarchies.
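The criteria stated above can be tried out on a toy family. As a simplifying assumption, each functional group is modeled here as a set of labeled nuclei and containment as set inclusion; the atom labels and function names are illustrative. The classification follows the criteria of the text (existence of a member contained in all others, and of a member containing all others).

```python
def least_element(family):
    """Return the member contained in every member of the family, or None."""
    for g in family:
        if all(g <= h for h in family):
            return g
    return None


def greatest_element(family):
    """Return the member that contains every member of the family, or None."""
    for g in family:
        if all(h <= g for h in family):
            return g
    return None


def classify(family):
    lo, hi = least_element(family), greatest_element(family)
    if lo is not None and hi is not None:
        return "lattice"
    if lo is not None:
        return "lower semilattice"
    if hi is not None:
        return "upper semilattice"
    return "partial order only"


# Methanol-like labeling: one C, one O, methyl hydrogens H1-H3, hydroxyl hydrogen H4.
whole = frozenset({"C", "O", "H1", "H2", "H3", "H4"})
family = [frozenset({"H4"}), frozenset({"O", "H4"}), whole]
kind = classify(family)
```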
The goal of the earlier study [5] was to generate a concise scheme for the study of the
presence and interrelations of functional groups among all possible nuclear
configurations of a given stoichiometry S. If two geometrical arrangements of a given
collection of atoms are similar enough, then we may consider these two arrangements
as representations of the same functional group. In order to account for the nonrigid,
flexible nature of functional groups, and for the allowed, minor geometry changes
which do not change their chemical identity, a topological "tolerance" range has been
selected for these groups. For each functional group ft a range Tt of allowed
geometrical arrangements is specified; within the given range each functional group ft
is regarded as preserving its chemical identity. The family of all these ranges Tt has
been denoted by T,
(51)
In terms of NUPCO's a new set of topological criteria can be defined for the
preservation of some essential properties of functional groups. One advantage of a
NUPCO analysis is that molecular fragments can be easily identified, simply by taking
a subset of the nuclei and the nuclear potential generated by this subset. The modeling
of functional groups can be based on such subsets. The entire treatment described for
molecular NUPCO's G(K,aj) in the previous sections is directly applicable for
functional groups, leading to an identifiable NUPCO.
(52)
Another advantage is the fact that the G(ft,K,aj) functional group NUPCO's are
additive, as implied by eq. (1). If the members ft of set f = {f1, f2, ..., ft, ..., fw}
of formal functional groups are mutually disjoint,
(53)
and if the union of these functional groups is the given molecule c(K) of nuclear
configuration K,
$$c(K) = \bigcup_{t=1}^{w} f_t, \tag{54}$$
then the molecular NUPCO G(K,aj) of the molecule c(K) can be obtained as the
sum of the individual functional group NUPCO's G(ft,K,aj),
$$G(K,a_j) = \sum_{t=1}^{w} G(f_t,K,a_j). \tag{55}$$
If the NUPCO analysis techniques described in the earlier sections are applied to
individual functional group NUPCO's G(ft,K,aj), then the new set of criteria is also
given in terms of configuration space invariance domains of these very functional group
NUPCO's G(ft,K,aj). This approach gives a characterization that is finer than the
mere preservation of chemical identity of a functional group ft.
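The additivity underlying eq. (55) can be checked numerically at the level of the potential itself. The sketch assumes that eq. (1) is the point-charge nuclear potential, a sum of Z/|r - R| over the nuclei; the coordinates, charges and function name are illustrative.

```python
import math


def nuclear_potential(r, nuclei):
    """Assumed point-charge potential at r; nuclei = [(Z, (x, y, z)), ...]."""
    return sum(z / math.dist(r, pos) for z, pos in nuclei)


# Two disjoint formal functional groups whose union is the whole molecule.
f1 = [(6, (0.0, 0.0, 0.0)), (1, (1.0, 0.0, 0.0))]
f2 = [(8, (0.0, 2.0, 0.0))]
r = (0.5, 0.5, 0.5)

total = nuclear_potential(r, f1 + f2)
by_groups = nuclear_potential(r, f1) + nuclear_potential(r, f2)
```

Under this assumption the quantity that adds up is the potential; the molecular contour at threshold aj is then the level surface of the summed potential.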
By analogy with the molecular case, for each nuclear configuration K the functional
group NUPCO's G(ft,K,aj) generate a functional group NUPCO matrix
(56)
for each formal functional group ft. Matrix N(ft,K) is defined as in eq. (18), using
an appropriate choice for mmax,
The functional group NUPCO shape invariance domains of the metric nuclear
configuration space M are defined in terms of invariance domains of these N(ft,K)
matrices. Additional conditions can be specified in order to avoid treating dissociated
fragments as formal functional groups, for example, by specifying a subset of the
configuration space M where only very low nuclear potential contours enclose all of
the (distant) nuclei of the formal functional group ft. This subset can be taken as the
collection of nuclear configurations K where ft is not realized as a chemically
recognizable functional group.
With the above provision, for any nonpathological nuclear configuration space M
there are only a finite number q of different functional group NUPCO matrices
N(ft, K v),
(58)
Consequently, there are only a finite number of functional group NUPCO shape
invariance domains by the above matrix criterion, where these invariance domains are
denoted by
The union of these functional group NUPCO shape invariance domains is the entire
nuclear configuration space M,
$$M = \bigcup_{v=1}^{q} M_{N,t,v}. \tag{61}$$
The above partitioning scheme of the nuclear configuration space M can be repeated
for each functional group ft of the family f. When the intersections of all these
invariance domains are considered collectively, then one obtains a (usually) finer
subdivision of the nuclear configuration space M, where each such domain MN,f,t,v
is defined by the condition that within a given MN,f,t,v the invariance is valid for each
functional group ft of the family f, and each MN,f,t,v is a maximum connected set
with this property. Of course, the union of all these collective NUPCO shape
invariance domains of functional groups ft of the family f also generates the entire
nuclear configuration space M,
$$M = \bigcup_{t=1}^{w} \bigcup_{v=1}^{q} M_{N,f,t,v}. \tag{62}$$
The nuclear configuration space M can be characterized by the distribution of either the
MN,t,v subsets or that of the more detailed MN,f,t,v subsets of M. In either case, one
can apply the definition of a neighbor relation n(A,B) between two arbitrary subsets
A and B of a nuclear configuration space M [3],
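$$n(A,B) = \begin{cases} 1 & \text{if } A \cap \mathrm{clos}(B) \neq \emptyset \ \text{or} \ \mathrm{clos}(A) \cap B \neq \emptyset, \\ 0 & \text{otherwise,} \end{cases} \tag{63}$$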
where clos(A) is the closure of set A in the metric of space M (in simple terms, the
set A together with all its boundary points).
Using the above neighbor relation, the pattern of NUPCO invariance domain
distribution of the configuration space M can be characterized by graphs. Two
graphs, gN,t and gN,f are defined as follows. The vertex sets V(gN,t) and V(gN,f)
of these graphs are
and
E(gN,t) = {(MN,t,v, MN,t,v') : n(MN,t,v, MN,t,v') = 1, v,v' = 1, ..., q, v ≠ v'}, (66)
and
respectively.
These two graphs provide a detailed, global description of invariance domains, and
concise information on how various functional groups are interrelated, transformed or
carried through approximately intact during various chemical reactions.
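The neighbor relation of eq. (63) and the edge set of eq. (66) can be illustrated on a one-dimensional toy space in which each invariance domain is an interval. The interval encoding and the function names are illustrative assumptions; each unordered pair of neighboring domains is listed once.

```python
from itertools import combinations

# An interval is (lo, hi, lo_closed, hi_closed) with lo < hi.


def closure(iv):
    lo, hi, _, _ = iv
    return (lo, hi, True, True)


def contains(iv, x):
    lo, hi, lo_closed, hi_closed = iv
    return (lo < x < hi) or (x == lo and lo_closed) or (x == hi and hi_closed)


def intersects(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo < hi:
        return True   # the intervals overlap on an open stretch
    if lo > hi:
        return False  # the intervals are separated by a gap
    # Single candidate point lo == hi: it must belong to both intervals.
    return contains(a, lo) and contains(b, lo)


def neighbor(a, b):
    """n(A,B) of eq. (63) for two intervals."""
    return 1 if intersects(a, closure(b)) or intersects(closure(a), b) else 0


def neighbor_graph(domains):
    """Edges between domains (keyed by index v) that are neighbors."""
    return {(u, v) for u, v in combinations(sorted(domains), 2)
            if neighbor(domains[u], domains[v]) == 1}


# The partition [0,1), [1,2], (2,3] of the toy space [0,3].
domains = {1: (0, 1, True, False), 2: (1, 2, True, True), 3: (2, 3, False, True)}
edges = neighbor_graph(domains)
```

In this encoding, [0,1) and (1,2] are not neighbors, because the point 1 belongs to neither set, although their closures touch.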
Summary
After a review of the basic concepts of the topological shape analysis methods, a simple
technique is described for a discrete representation of molecular shapes by the
topological patterns of contour surfaces of three-dimensional nuclear potentials. This
technique is extended for modeling of shape changes in chemical processes. A family
of matrices of integer elements (and the corresponding graphs of integer labels) as well
as various shape invariance domains of the nuclear configuration space are introduced.
Formal reaction paths are characterized by the finite sequences of matrices occurring
along each path. Equivalence relations among formal reaction paths based on these
topological properties lead to a shape-based definition of reaction mechanisms.
Additional relations are specified for the shape characterization of chemical species. The
equivalence classes of these finite matrix sequences provide a shape-based description
of formal reaction mechanisms. Similarities between reaction mechanisms can be
studied by comparing their matrix sequences. These similarities can be quantified,
leading to numerical similarity measures of reaction mechanisms.
References
EUGENY V. BABAEV
Moscow State University
Department of Organic Chemistry
Moscow 119899 Russia
1. Introduction
There are two different pictures of molecular structure: the classical and the
quantum-mechanical. The classical picture is naive-empirical and is the chemical one; it is
connected with classical structural formulae, ball-and-stick models, the phenomenological
Lewis concept and the Gillespie rules for prediction of molecular geometry. This picture
now endures as the heuristic instrument for the planning of chemical synthesis, for
communication between experimental chemists, and for chemical education. The
quantum-mechanical picture is the physical one; it is based on the application of quantum
mechanical ideas to molecular structures and on quantum-chemical calculations of different
degrees of sophistication. Many attempts have been made in theoretical chemistry to find
some symbiosis between these two different levels of description of molecular structure;
only in recent years does the desired compromise seem to have been found in the topological
nature of both the quantum-mechanical and classical models of the molecular structure.
Topology is not just graph theory, and similarly chemical topology is not just the use of
a graph as an image of a molecular structure or chemical reaction [1a,2], as it is usually
considered [3]. One of the main ideas in classical topology is to study spaces which can
be continuously deformed into one another, and to find the invariants of such spaces.
Some known chemical applications of these ideas (e.g. the topological invariants of
surfaces and their critical points) are used to describe electron density maps [1b] or potential
energy surfaces [1c]; some topological invariants of polyhedra are also used to
understand the electron-counting rules in the chemistry of clusters [7]. In the cited
approaches, the ideas of topology are applied to the quantum-chemical picture of
molecular structure. It seems that there is only one work [1d] devoted to the topological
description of classical structures and the electron-counting rules for usual molecules with
localized bonds.
It is the aim of this paper to introduce special spaces, two-dimensional manifolds or
surfaces, as new images of molecules with localized bonds, starting only from the classical
picture of the molecular structure. One can easily get these surfaces from graphs
corresponding to the usual Lewis diagrams of molecules. Some qualitative chemical
concepts, which are rather poorly formalized in the language of graph theory, seem
clearer from the point of view of surface topology. Moreover, because the topological
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 209-220.
© 1994 Kluwer Academic Publishers.
210 E. V.BABAEV
invariants of the surfaces are based on the usual chemical electron-counting rules, it seems
that the general classical picture of molecular structures and reactions is closer to
manifold topology than to the graph-theoretical description. The suggested approach and
its further development seem to be a new branch of interaction between topology and
chemistry.
Consider a Lewis diagram L(M) = (Z, N, {qi}) of a molecule M with localized bonds [1a] and
with Z valence electrons and N atoms, where the i-th atom contains qi valence electrons
(for the non-transition elements qi coincides with their group number in the Periodic
System). For the given Lewis diagram the unique molecular pseudo-graph (a multi-graph
with loops) G(M) = (V, R, {deg vi}) can be found, where the number of vertices V is the
same as N, the number of the edges R is equal to Z/2, the degree of any i-th vertex deg
vi is equal to qi, and any loop of the graph G(M) corresponds to a lone pair in the starting
diagram L(M) (Chart 1a). This definition (the importance of which has been discussed
earlier from different points of view [10-12]) connects the Euler equation for a (pseudo)graph [9]
with the valence electron count in a molecule:
$$\sum_{V} \deg v_i = 2R = Z \tag{1a}$$
If the starting molecule contains Z valence electrons and if L of them are unpaired, then
the corresponding topological image of a Lewis diagram is no longer a pseudo-graph.
Let us call a graphoid G'(M) = (V, R, L, {deg vi}) the object which one can get from the
(pseudo)graph G(M) = (V+L, R+L, {deg vi}) by deleting L free (terminal) vertices but
not the edges incident to them. Any graphoid has two sorts of edges, R usual edges and L
hemi-edges, as well as two sorts of vertices, V usual and L pricked, i.e., it has as its subset a
(V,R)-(pseudo)graph [14]. It is obvious that the usual edge of a graph in the topological sense
is homeomorphic to the closed interval [a,b], while the hemi-edge (without one vertex or
point) in G'(M) is homeomorphic to the half-open interval [a,b). In Chart 1b this
type of hemi-edge is shown as a line running from a vertex to infinity. Because these
hemi-edges participate L times in the sum of deg vi, Eq. (1a) for the open-shell molecules
and their graphoids should be written as Eq. (1b):
$$\sum_{V} \deg v_i = 2R - L \tag{1b}$$
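The degree bookkeeping can be illustrated with a small sketch; the edge-list encoding and the function name are illustrative. A loop (lone pair) adds 2 to the degree of its vertex, a usual edge adds 1 to each end, and a hemi-edge (unpaired electron) adds 1 to its single vertex. The degree sum then equals Z, that is, twice the number of usual edges plus the number of hemi-edges; this matches Eq. (1b) if R there is read as the total number of edges with the hemi-edges included, which is an assumption about the notation.

```python
from collections import Counter


def degrees(edges, hemi_edges=()):
    """edges: (u, v) pairs, with u == v for a loop (lone pair);
    hemi_edges: vertices that carry an unpaired electron."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a loop, u == v, so the vertex gains 2 in total
    for u in hemi_edges:
        deg[u] += 1
    return deg


# Water: two O-H bonds and two lone pairs on oxygen (Z = 8).
water = degrees([("O", "H1"), ("O", "H2"), ("O", "O"), ("O", "O")])

# Methyl radical: three C-H bonds and one unpaired electron on carbon (Z = 7).
methyl = degrees([("C", "H1"), ("C", "H2"), ("C", "H3")], hemi_edges=["C"])
```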
We want to mention that in both of the above equations the equality qi = deg vi for the
i-th atom is conserved [14]. This means that for any molecule which can be described by
more than one Lewis diagram, only one resonance structure (perhaps a non-octet one)
should be chosen to construct the pseudo-graph (graphoid) due to this equality. In the
case of charged molecules (as well as ylides or betaines) the charges should simply be
localized on the appropriate atoms and the necessary number of protons should be added
or deleted in these nuclei to get a neutral isoelectronic species with the corresponding
THE INV ARIANCE OF MOLECULAR TOPOLOGY IN CHEMICAL REACTIONS 211
[Chart 1a. Molecules and the corresponding pseudographs.]
[Chart 1b. Open-shell molecules and the corresponding graphoids.]
[Chart 1c. Isostructural species with isomorphic pseudographs, e.g. +NH3CH2− and GeH3AsH2.]
change in the qi and deg vi value [11]. Thus, the isovalent molecules C2H5− and CH3OH2+,
as well as their isoster −BH3OH2+ and the ylide +NH3CH2−, which are isostructural to the
neutral CH3NH2 after this "charge annihilation," have isomorphic unlabeled pseudo-graphs
(Chart 1c).
It is easy to get the cyclomatic number C for any connected pseudo-graph G(M) (see the
left-hand equality of Eq. (2)). All the loops and the independent cycles between the multiple
edges are also included in the cyclomatic number [9]. For graphoids G'(M), the L hemi-edges
do not participate in any cycle; that is why one should cut them and calculate the C value
by using the same equation for the (V,R)-subgraph of G'(M). In general, the
cyclomatic number for any Lewis diagram has a simple chemical sense as the sum of the
(independent) cycles, multiple bonds, and lone pairs, and is determined only by the
balance between the numbers of valence electrons, atoms, and unpaired electrons
(see the right-hand equality of Eq. (2)):
$$C = R - V + 1 = \tfrac{1}{2}(Z - L) - N + 1 \tag{2}$$
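The right-hand equality of Eq. (2) is easy to evaluate for familiar molecules. The function name is illustrative; the electron and atom counts in the examples are the usual valence-electron counts.

```python
def cyclomatic_number(z, n, l=0):
    """C = (Z - L)/2 - N + 1 for a connected Lewis diagram with Z valence
    electrons, N atoms and L unpaired electrons."""
    paired = z - l
    if paired < 0 or paired % 2:
        raise ValueError("Z - L must be a non-negative even number")
    return paired // 2 - n + 1


c_methane = cyclomatic_number(8, 5)      # no cycles, multiple bonds or lone pairs
c_water = cyclomatic_number(8, 3)        # two lone pairs
c_ethylene = cyclomatic_number(12, 6)    # one double bond
c_benzene = cyclomatic_number(30, 12)    # one ring plus three double bonds
c_methyl = cyclomatic_number(7, 4, l=1)  # radical: the hemi-edge is cut first
```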
3. From Graph (Graphoid) to Surface
Consider any (pseudo)graph or graphoid to be in the real three-dimensional space R3. Let
us add to any edge and vertex a very small volume of surrounding space. This operation
not only leaves the starting graph(oid) structure completely unchanged, but it also adds
a new interesting property to the starting object. Now a two-dimensional boundary exists
between the internal and external parts of a graph in R3. Consider our graph to consist
of empty rubber tubes (edges) which are also empty at their crossings (i.e. at the
internal vertices), but are closed at the usual terminal vertices and
open at the ends of the hemi-edges.
It is obvious that the resulting object is a two-dimensional manifold in R3, or the
two-dimensional surface S(M) corresponding to the starting Lewis diagram L(M). By a simple
continuous deformation one can easily get some canonical form of this surface, e.g. a
sphere with C handles and L holes, or S(C,L) (see Chart 2a,b). This surface is orientable;
it is closed if L = 0 and open if L differs from zero. It can be found elsewhere that the
pair (C,L) is quite enough to classify all non-homeomorphic orientable and connected
R2-surfaces.
The connected R2-surfaces S(C,L) can be described by their Euler characteristic χ, which
is one of the topological invariants, i.e. it is unchanged on topological deformations.
It is not necessary to make a triangulation of the surface to get the χ value: it depends
only on the number of holes L and handles C (see the left-hand equality of Eq. (3)) [4]. The
use of Eq. (2) shows that for the starting Lewis diagram L(M) its Euler characteristic χ
depends simply on the balance between N and Z (see the right-hand equality of Eq. (3)):
$$\chi = 2 - 2C - L = 2N - Z \tag{3}$$
The resulting map L(Z,N,L,{qi}) => G'(V,R,L,{deg vi}) => S(C,L) partitions all
the Lewis diagrams into equivalence classes according to the homeomorphism of their surfaces S(M).
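The classification by the pair (C,L) can be sketched as a small lookup. It uses the relation χ = 2 - 2C - L for a sphere with C handles and L holes; the function name and the fallback wording are illustrative, and the surface names follow the simplest cases (sphere, hemisphere, torus, cylinder).

```python
def describe_surface(c, l):
    """Return (chi, name) for the surface S(C,L)."""
    chi = 2 - 2 * c - l
    names = {
        (0, 0): "sphere",
        (0, 1): "hemisphere",
        (1, 0): "torus",
        (0, 2): "cylinder",
    }
    name = names.get((c, l), f"sphere with {c} handle(s) and {l} hole(s)")
    return chi, name


methane = describe_surface(0, 0)         # CH4
methyl_radical = describe_surface(0, 1)  # CH3 radical
triplet_carbene = describe_surface(0, 2) # triplet CH2
allyl_radical = describe_surface(1, 1)   # CH2=CH-CH2 radical
```

For the allyl radical (Z = 17, N = 8) the value of χ obtained from (C,L) = (1,1) agrees with 2N - Z.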
[Chart 2a.]
Chart 2b. [graph(oid) and surface drawings not reproduced]
Structural formula    Surface               C    L    χ
CH4                   Sphere                0    0    2
CH3+                  Sphere                0    0    2
CH3•                  Hemisphere (plane)    0    1    1
:CH3−                 Torus                 1    0    0
:CH2 (s)              Torus                 1    0    0
•CH2• (t)             Cylinder              0    2    0
•CH2-CH2•             Cylinder              0    2    0
CH2=CH-CH2•           Handle                1    1   -1
There are some general empirical types of chemical similarity both of organic and
inorganic molecules (see references 11 and 12) which are based on the usual isoelectronic
or π-isoelectronic analogies, isostructural and homological series, etc., for example:
a) isovalent molecules differing only by the number of the period of any atom in
the molecule (e.g. CH3NH2 – SiH3NH2 – CH3PH2 – SiH3PH2 – GeH3AsH2 etc.);
b) isovalent molecules differing in charge (e.g. H3O+ – NH3 – CH3−);
c) isosters (alkanes – borazines; CO – N2; CO2 – N2O etc.);
d) any number of the resonance structures;
e) all types of tautomers and isomers;
f) classical homologs, differing by one or more CH2 groups;
g) π-isoelectronic molecules with the same number of π-electrons (e.g. pyrrole –
benzene – borepine, or "pseudoazulenes": azulene – indolizine –
pyrrolo[1,2-a]imidazole);
h) π-isoelectronic molecules with the same number of π-electrons differing in the
charge (cyclopentadienyl anion – benzene – tropylium cation);
i) members of isostructural series of boron hydrides differing in the BH fragment
(isostructural closo-, nido- or arachno- series, see reference 8b).
All the members of each of these series a) - i) have topologically identical
(homeomorphic) Lewis diagrams.
It should be mentioned that the homeomorphism in the series a) - e) simply follows from our
definition of L(M), G'(M), and S(M) (Chart 1c), while the topological identity of the
molecules in the series f) to i) (differing by the well-known homological fragments -CH2-,
-BH-, and -CH+-) proves that the concept of the homeomorphism is a very natural and
reasonable one for further chemical applications.
On the other hand, the difference in the χ value [or in the genus of the closed surfaces
S(M), i.e., the number of handles C for the non-radicals] permits us to classify the
non-homeomorphic types of molecules in a linear order, as is usual for orientable surfaces in
topology. The simple chemical sense of the C-number is clear: it is a generalization
In the classical Lewis concept of the two-electron, two-center bond there are only
two possibilities to form or to break the bond: the homolytic and the heterolytic. In the
simplest case the hydrogen molecule [for which S(M) is a sphere, χ = 2] could be formed
from two atoms (each of which is homeomorphic to a sphere without a point or to a
hemisphere, χ = 1) or from a proton and a hydride ion (a sphere, χ = 2, plus a torus,
χ = 0). From the surface topology point of view this means gluing the surfaces to a sphere
in this manner: to glue the 1-dimensional cycles of the hemispheres in the first case, or
to glue the sphere into the hole of the handle in the second case. It is important that in
both operations the Euler characteristic χ is additive. Other examples also
confirm this consideration (Chart 3). This principle can be generalized to be the Main
Theorem.
6. The Main Theorem
The total Euler characteristic of the Lewis diagrams with localized bonds stays
unchanged in chemical reactions.
6.1. Proof
Consider an ensemble of Ks molecules (Ns, Zs, and Ls are the total numbers of atoms,
valence electrons, and unpaired electrons in the ensemble) which transforms during the
chemical reaction to a new ensemble of Kf molecules (where s and f indicate starting
and final) with corresponding values Nf, Zf, and Lf. The non-connected graphs (graphoids)
with Ks (Kf) components and corresponding Vs and Rs (Vf and Rf) are determined as
mentioned above for the Lewis diagrams of the starting and final ensembles corresponding
to Eqs. (1a) and (1b).
For any non-connected graph with K components, Eq. (2) should be changed to Eq. (4)
[see the left-hand equality of Eq. (4)], and after mapping from the graph to the surface
with K components [4-6] the left-hand equality of Eq. (3) should be changed to Eq. (5):
$$C = R - V + K = \tfrac{1}{2}(Z - L) - N + K \tag{4}$$
$$\chi = 2K - 2C - L \tag{5}$$
The resulting Euler characteristic χ for the ensemble of the molecules after the
combination of Eq. (5) with Eq. (4) is equivalent to the right-hand equality of Eq. (3):
$$\chi = 2K - 2C - L = 2K - 2\left[\tfrac{1}{2}(Z - L) - N + K\right] - L = 2K - Z + L + 2N - 2K - L = 2N - Z$$
[Chart 3. Heterolytic and homolytic formation of bonds represented as gluing of surfaces. Euler characteristic χ: 2 + 0 = 2 (heterolytic); 1 + 1 = 2 (homolytic).]
Comparing the values χs and χf for the starting and final ensembles of the molecules one
can easily get Eq. (6):
$$\chi_f - \chi_s = 2(N_f - N_s) - (Z_f - Z_s) \tag{6}$$
which is equal to zero due to the conservation of the valence electrons and the atoms in
a chemical reaction. Thus, the Main Theorem is proved.
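The statement can be checked on a few balanced reactions by summing χ = 2N - Z over the molecules on each side. The function name is illustrative; each molecule is given by its valence electron count Z and its number of atoms N.

```python
def ensemble_chi(molecules):
    """molecules: list of (Z, N) pairs, one per molecule; returns the summed chi."""
    return sum(2 * n - z for z, n in molecules)


# Homolytic formation of H2: H• + H• -> H2
homolytic = (ensemble_chi([(1, 1), (1, 1)]), ensemble_chi([(2, 2)]))

# Heterolytic formation of H2: H+ + H- -> H2
heterolytic = (ensemble_chi([(0, 1), (2, 1)]), ensemble_chi([(2, 2)]))

# CH4 + Cl2 -> CH3Cl + HCl
chlorination = (ensemble_chi([(8, 5), (14, 2)]), ensemble_chi([(14, 5), (8, 2)]))
```

In each pair the two sums coincide, because every balanced reaction conserves both the atoms and the valence electrons.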
6.2 Discussion
The principle corresponding to the Main Theorem we call the conservation of molecular
topology in chemical reactions. It is of interest that the conservation of the purely
topological property χ in classical chemistry follows from the conservation of N and Z,
i.e., from the physical conservation of matter and charge. One can say that an imaginary
space with classical chemical structures is mapped onto itself during chemical reactions.
The invariance of χ does not depend on the change in the number of molecules
(ΔK), in the number of unpaired electrons (ΔL), or in the sum of the lone pairs, multiple bonds
and cycles (the degree of saturation, ΔC) taken alone. Because all the members of the triad
(C,L,K) are topological invariants in surface topology, the combination of Eqs. (5) and
(6) gives Eq. (7), which is an important chemical consequence:
$$2\Delta K - 2\Delta C - \Delta L = 0 \tag{7}$$
It follows from Eq. (7) that only five types of interconversions of topological invariants
(K,C,L) are permitted in chemical reactions for molecules with localized bonds:
Chart 4. Permitted interconversions of the topological invariants (structural diagrams not recoverable):
(7a) K ↔ L, with C unchanged
(7b) K ↔ C, with L unchanged
(7c) L ↔ C, with K unchanged
(7d) K ↔ L ↔ C
b) the immediate appearance of a new component [see Eq. (7b)], e.g., the conversion of alkanes to cycloalkanes,
c) corresponding changes in the number of holes and components [see Eq. (7d)], e.g., the formation of cyclobutane from two triplet ethylenes,
d) the appearance of another handle; the appearance of a handle from nothing is forbidden. A good example is the well-known cycle-chain tautomerism: it only seems that a cycle is built from a chain. The cycle which is usually formed, e.g., by an electrophile-nucleophile interaction, already existed in the "chain" as a lone pair (i.e., a hidden cycle) on the nucleophilic center.
The suggested five types of conservation and interconversion of the topological invariants are good starting points for a further topological classification of chemical reactions. Each type should be subdivided into different classes, e.g., according to the redistribution of the invariants between different surfaces, the size of the cycles, etc.
7. Conclusion
The approach discussed here can be considered as the first step in our program of "topologization of chemistry" starting from a classical, and not quantum-mechanical, point of view. It gives physicists the possibility to better understand the logic of classical chemistry; chemists the possibility to prove once more that chemistry is not only a descriptive science, but also an exact science; and mathematicians the possibility to find new fields of application. In our further communications we intend to apply some other ideas of manifold topology (fundamental and homology groups, topological images of hypergraphs, etc.) to other classical concepts of chemistry (localization and delocalization, conjugation and hyperconjugation, π-rich and π-deficient molecules).
8. Acknowledgements
I thank my colleagues, chemists at the Lomonosov University in Moscow, for the third
Lomonosov award, which was awarded for this work. I also thank topologists Professor
A. T. Fomenko (Moscow, Russia), Professor H. Zieschang (Bochum, BRD) and physicist
Professor R. Hefferlin (Collegedale, TN, USA) for fruitful discussions.
1. Introduction
Chemical reactivity can be defined as the ability of molecular structures to take part in electronic rearrangement processes during chemical interactions. As electronic processes, one can consider hard (charge-charge) and soft (charge-transfer) electronic interactions, as suggested by Klopman and Hudson in their polyelectronic perturbation theory [1-3], as well as weaker interactions such as dipole-dipole and hydrogen bonding effects, which can be considered as particular cases of the above two main types of electronic rearrangements. Reactivity determines the interaction of molecules with other chemical species in their environment. For example, the ability of chemicals to take part in charge-charge electrostatic interactions modulates their hard electrophilicity and conditions the nature and extent of alkylation of nucleophiles by electrophiles. The capacity for polar-polar (dipole-dipole) and hydrogen bonding interactions affects the behavior of a solute in the solvent and the partitioning of molecules between different phases. Such consequences of reactivity for the properties of chemical species may be termed primary effects. On the other hand, for many physiologically active molecules, the primary effects have important biological consequences, determining their interactions with critical target biomolecules. The latter can be specified as secondary effects. Thus, the carcinogenicity and mutagenicity of chemicals are believed to be due to the alkylation of critical biomacromolecules by the chemicals themselves or by their reactive metabolites produced in vivo.
Chemists have been interested in discerning the structural factors underlying molecular
reactivity. The relationship of molecular topology to chemical reactivity is of interest for
both theoreticians and experimentalists. The quantifiers of molecular topology (e.g.,
topological indices) have been useful as reactivity parameters for many classes of
chemicals such as acyclic hydrocarbons, alkyl benzenes, benzenoid hydrocarbons, etc. The
focus of the present work is to discuss the basic principles underlying the topological
foundation of molecular reactivity, to give a comprehensive account of topological
invariants which can serve as reactivity indices as well as to demonstrate applicability of
some of these topological parameters.
The basic concept which determines the topological conditioning of reactivity is the first
principle of organic chemistry, viz., the principle of Molecular Structure. According to
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 221-239.
© 1994 Kluwer Academic Publishers.
222 O. MEKENY AN AND S. C. BASAK
this principle, molecules are considered as isolated objects, possessing a relatively rigid
and permanent location of nuclei. Hence, they are assumed to have a structure, which
conditions their physical and chemical properties. As a consequence of this principle, it
is assumed that molecular structure can be adequately described [4,5]. Three components
of molecular structure can be distinguished [6]: topology, metric, and electronic
distribution.
Molecular topology is defined only by the binary relation between atoms in a molecule
determining whether they are bonded or not [7-9]. This relationship is usually termed
molecular connectivity, and it can be derived from so-called molecular graphs [9-11].
Simple chemical graphs are mathematical structures, where the nature of atoms and type
of bonds is neglected. They can be constructed by depicting each atom by a vertex and
connecting a pair of vertices by an edge when the corresponding atoms are bonded in the
constitutional formula. Usually, in this mathematical representation of molecules, the
hydrogen atoms are neglected, thus arriving at the respective skeletal graphs [12].
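The construction of a skeletal graph described above can be sketched in a few lines. The molecule (2-methylbutane), the atom numbering and the helper name `adjacency_matrix` are illustrative assumptions, not taken from the text:

```python
def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 matrix: entry (i, j) is 1 when atoms i and j are bonded."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


# Hydrogen-suppressed (skeletal) graph of 2-methylbutane, assumed numbering:
# chain C0-C1-C2-C3 with a methyl branch C4 attached to C1.
BONDS = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = adjacency_matrix(5, BONDS)

# Vertex degrees are the row sums of the adjacency matrix.
degrees = [sum(row) for row in A]
print(degrees)
```

Atom types and bond orders are deliberately ignored here, in line with the definition of a simple chemical graph.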
For many classes of compounds, the variations of the molecular metric (bond lengths, valence and torsional angles) and of the electronic structure are small (e.g., planar, homonuclear systems). Provided the impact of these factors on many of the molecular properties can be neglected, the latter may be considered as conditioned by topology only. Still, some properties of such compounds are topology-invariant and are strongly conditioned by the non-topological structural characteristics. For example, the tendency to delocalize π-electron density within a symmetric hexagonal σ-framework is primarily due to the steric constraint. These facts are supported by the assumption of a relative orthogonality of topological and non-topological structural parameters with respect to the molecular properties of the compounds considered.
(2)
where $K_1$ and $K_2$ are the reaction constants for two members of the reaction series, while $\Delta E_1$ and $\Delta E_2$ are the respective reaction energies.
Similarly [15,16], one can derive LFERs for reaction rate constants:
TOPOLOGICAL INDICES AND CHEMICAL REACTIVITY 223
(3)
where $k_1$ and $k_2$ stand for the rate constants of two members of the reaction series, while $\Delta E_1'$ and $\Delta E_2'$ are the respective activation energies.
(4)
the difference of activation (reaction) energies of the substituted (X) and reference (H) structures of the series under investigation is proportional to the variation of the electronic structure of the reference molecule in a reference reaction series after introduction of substituent X, as described by the $\sigma_X$ parameter. The proportionality constant (reaction constant), ρ, reflects the specificity of the reaction studied as well as the conditions of the reaction.
(5)
where $R_1$ and $R_2$ stand for the values of the reactivity property of two members of the reaction series (such as reaction rates or equilibrium constants). Their ratio can be modeled as a product of two relatively independent variables: the external parameter, EP, conditioned by non-structural factors (such as reaction conditions and specificity), and the difference of the respective structural parameters, SP. Usually in LFERs the impact of the structural variation on reactivity is analyzed at constant values of the external factors (EP = const).
If one considers reactivity in a broader sense, including primary effects determined by
polar-polar or hydrogen bonding interactions, it is possible to relate the change of the
structural parameters with the variation of molecular properties as partition coefficient,
retention data, etc.
For properties determined predominantly by molecular topology, the above equation can
be written in the following form:
(6)
where $TI_1$ and $TI_2$ are quantitative indices characterizing the topological structures of the two members of the reaction series under consideration. The nature of the topological indices will be discussed in detail in the next part of the work.
The topological indices are numerical quantities derived from molecular graphs
representing molecules. Such graphs could be hydrogen-filled or hydrogen-suppressed.
Sometimes weighted graphs, multigraphs or weighted pseudographs are used to represent the relevant aspects of the chemical species [9]. First, the graph is transformed into a more convenient mathematical representation. As such, one can use the adjacency and distance matrices, characteristic and distance polynomials, etc. These mathematical structures are then transformed by different algorithms in order to derive topological indices (TIs), incorporating in a concise way the topological information of the respective chemical species (see Fig. 1).
[Fig. 1. MOLECULE → GRAPH → MATHEMATICAL REPRESENTATION → (ALGORITHM) → TOPOLOGICAL INDICES]
Next we present the topological indices most frequently used in structure-property analysis, which can be obtained by the above three groups of algorithms (see refs. 9, 17-20 for more details):
Neighborhood Relationships. The set of the vertices in the chemical graph can be
classified according to their degrees as well as the degrees of their first neighbors. This
classification appears to be useful for reactivity purposes [21].
Total Adjacency, A, is the sum of the matrix elements, $a_{ij}$, of the N×N adjacency matrix of the graph [22,23]:
$$A = \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} \qquad (7)$$
Zagreb Group Indices [24,25] are also obtained by simple functions of the adjacency matrix elements:
$$M_1 = \sum_{i} v_i^{2}, \qquad M_2 = \sum_{\text{edges } ij} v_i v_j \qquad (8)$$
where $v_i = \sum_j a_{ij}$ is the degree of vertex i.
Randić Connectivity Index [26]:
$$\chi = \sum_{\text{edges } ij} (v_i v_j)^{-1/2} \qquad (9)$$
In the Generalized Connectivity Index [27,28], the summation is extended over all possible paths of length h:
$$^{h}\chi = \sum_{\text{paths}} (v_i v_j \cdots v_{h+1})^{-1/2} \qquad (10)$$
whereas in the Valence Connectivity Index [29], the vertex degree $v_i$ is replaced by the number of valence electrons of atom i diminished by the number of hydrogen atoms attached to this atom, $\delta_i^{v}$.
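As a small numerical illustration of the adjacency-based indices, the following sketch evaluates the total adjacency, the two Zagreb indices (in their usual form: sum of squared degrees, and sum of degree products over edges) and the Randić index. The test molecule (the skeleton of 2-methylbutane) and all variable names are assumptions chosen for the example:

```python
from math import sqrt

# Skeletal graph of 2-methylbutane (illustrative): C0-C1-C2-C3, methyl C4 on C1.
N_ATOMS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]

a = [[0] * N_ATOMS for _ in range(N_ATOMS)]
for i, j in EDGES:
    a[i][j] = a[j][i] = 1

degrees = [sum(row) for row in a]                  # v_i = sum_j a_ij

total_adjacency = sum(sum(row) for row in a)       # sum of all matrix elements
zagreb_m1 = sum(v * v for v in degrees)            # sum of squared vertex degrees
zagreb_m2 = sum(degrees[i] * degrees[j] for i, j in EDGES)
randic = sum(1 / sqrt(degrees[i] * degrees[j]) for i, j in EDGES)

print(total_adjacency, zagreb_m1, zagreb_m2, round(randic, 3))
```

Because every bond is counted once in each direction, the total adjacency computed this way is twice the number of edges.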
The Wiener Index is defined [30] as the half-sum of the off-diagonal elements, $d_{ij}$, of the distance matrix:
$$W = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij} \qquad (11)$$
Another index defined by the elements of the distance matrix is the Mean Square Distance Topological Index [31]:
(12)
A Randić-type formula (eq. 9) was applied [31,32] to distance sums, $v_{D,i}$, instead of vertex degrees, thus introducing the Average Distance Sum Connectivity Index, J:
$$J = \frac{q}{\mu + 1}\sum_{\text{edges } ij} (v_{D,i}\, v_{D,j})^{-1/2} \qquad (13)$$
where the distance sum is defined as the sum of all entries of the i-th row of the distance matrix [33]:
$$v_{D,i} = \sum_{j=1}^{N} d_{ij} \qquad (14)$$
The normalization term in eq. 13 is based on the cyclomatic number, μ, and the number of edges, q (q = N + μ − 1 for planar graphs).
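The distance-based indices can be checked on a small example. The sketch below assumes a connected graph, builds the topological distance matrix by breadth-first search, and then evaluates the distance sums, the Wiener index and the index J in its usual form (normalization q/(μ + 1), Randić-type sum over edges). The molecule and the function name `distance_matrix` are illustrative choices:

```python
from collections import deque
from math import sqrt

N_ATOMS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]   # 2-methylbutane skeleton (illustrative)


def distance_matrix(n, edges):
    """Topological distances (bonds on the shortest path), one BFS per vertex."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = []
    for source in range(n):
        row = [None] * n
        row[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if row[w] is None:
                    row[w] = row[u] + 1
                    queue.append(w)
        dist.append(row)
    return dist


D = distance_matrix(N_ATOMS, EDGES)
distance_sums = [sum(row) for row in D]     # row sums of the distance matrix
wiener = sum(distance_sums) // 2            # half-sum of off-diagonal elements

q = len(EDGES)
mu = q - N_ATOMS + 1                        # cyclomatic number from q = N + mu - 1
balaban_j = q / (mu + 1) * sum(
    1 / sqrt(distance_sums[i] * distance_sums[j]) for i, j in EDGES
)
print(wiener, distance_sums, round(balaban_j, 4))
```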
Analogously to $v_{D,i}$, another useful topological index is defined [34-36] by summing self-returning walks of length l, $SRW_i^{l}$, starting from point i and passing through other vertices, k, without traversing one and the same bond twice in each step:
(15)
Apparently, the total number of self-returning walks of length 2 is twice the number of edges (single bonds) in the graph, also termed the total adjacency, A:
$$\sum_{i=1}^{N} SRW_i^{2} = 2q = A \qquad (16)$$
The atomic topological indices, $SRW_i^{l}$, were normalized by dividing by the total number of such walks in the molecule [35,36]:
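A minimal sketch of the walk counting follows. It assumes the common graph-theoretical convention that the number of self-returning walks of length l at vertex i is the i-th diagonal element of the l-th power of the adjacency matrix; this may differ in detail from the authors' definition, whose equation is not reproduced here. The helper names are hypothetical:

```python
N_ATOMS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]   # 2-methylbutane skeleton (illustrative)

A = [[0] * N_ATOMS for _ in range(N_ATOMS)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def srw_counts(a, length):
    """Diagonal of a**length: closed walks of the given length at each vertex."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(length):
        power = mat_mul(power, a)
    return [power[i][i] for i in range(n)]


srw2 = srw_counts(A, 2)
total2 = sum(srw2)
fractions2 = [c / total2 for c in srw2]    # normalized atomic indices for l = 2
print(srw2, total2, fractions2)
```

For length 2 the per-atom counts coincide with the vertex degrees, so their total is twice the number of edges, in agreement with the statement in the text.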
Other atomic topological indices can be derived by some of the atom orderings obtained
by molecular coding algorithms, as Morgan extended connectivity (EC) algorithm [37],
Hierarchically Ordered extended Connectivity algorithm (HOC) [38]. The Extended
[Figure: extended connectivity (EC) values assigned to the vertices of a molecular graph; numerical labels not recoverable.]
where LjEC/ = A = N.
Recently Kier and Hall [39] have introduced the electrotopological state, $EST_i$, which is calculated from the intrinsic state value of atom i, $I_i$, and the sum of loge state values vectored from atom i:
(20)
Here, $I_j$ is the intrinsic state value for every other atom and $r_{ij}$ is the topological distance within the loge in which i and j are terminal atoms. $I_i$ is defined by:
(21)
where $\delta_i^{v}$ is the number of valence electrons of atom i and $\delta_i$ is its vertex degree.
The topological indices based on information theory [40,41] can also be classified as indices obtained by using the first group of algorithms. For a system having N elements distributed into k classes of equivalence $N_1, N_2, \ldots, N_k$, a probability distribution $P\{p_1, p_2, \ldots, p_k\}$ is constructed ($p_i = N_i/N$). The entropy of this distribution, calculated by Shannon's formula [40]:
$$I = -\sum_{i=1}^{k} p_i \log_2 p_i \qquad (22)$$
is called the information content. The approach can be applied to the entries of graph representations, thus obtaining the information content of the structure, also called Information Topological Indices [18]. If one uses the entries, $d_{ij}$, of the topological distance matrix, the following Information Index based on Graph Distances can be obtained [42]:
(23)
(24)
Based on the distribution of the graph vertices according to the number of their first, second, etc., neighbors, the Information Index on Neighborhood Symmetry was introduced [44-46]. An appropriate set A of n elements is derived from a molecular graph G depending on certain preselected criteria. On the basis of an equivalence relation defined on A, the set A is partitioned into equivalence classes $A_i$ of order $n_i$ ($i = 1, 2, \ldots, h$; $\sum_i n_i = n$). A probability scheme is then assigned to the set of equivalence classes:
$$A_1, A_2, \ldots, A_h$$
$$p_1, p_2, \ldots, p_h$$
where $p_i = n_i/n$, $n_i$ and n being the cardinalities of $A_i$ and A, respectively. The mean information content (or complexity) of an element of A is defined by Shannon's [40] relation:
$$IC = -\sum_{i=1}^{h} p_i \log_2 p_i \qquad (25)$$
The logarithm is taken at base 2 for measuring the information content in bits. The total complexity of the set A is then n times IC.
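The Shannon relation is easy to evaluate once the sizes of the equivalence classes are known. In the sketch below the partition is an assumed, deliberately simple one (the five vertices of the 2-methylbutane skeleton grouped by vertex degree); the function name `information_content` is hypothetical:

```python
from math import log2


def information_content(class_sizes):
    """Mean information content (bits per element) of a partition."""
    n = sum(class_sizes)
    return -sum((k / n) * log2(k / n) for k in class_sizes if k > 0)


# Degrees of the 2-methylbutane skeleton are [1, 3, 2, 1, 1], which gives
# three classes: {degree 1: 3 atoms}, {degree 2: 1 atom}, {degree 3: 1 atom}.
CLASS_SIZES = [3, 1, 1]

ic = information_content(CLASS_SIZES)
total_complexity = sum(CLASS_SIZES) * ic    # n times IC
print(round(ic, 4), round(total_complexity, 4))
```

A different equivalence relation on the same graph would give different class sizes and hence a different complexity, which is the point made in the following paragraph of the text.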
It is to be noted that the complexity of a real object or a model object is not uniquely defined. While there could be more than one way of defining a model object corresponding to the same chemical species, the complexity of the same model object (chemical graph) may vary depending on the nature of the equivalence relation. In the calculation of indices of neighborhood symmetry, two vertices u and v of a graph G are said to be topologically equivalent if and only if for each neighboring vertex $u_i$ ($i = 1, 2, \ldots, k$) of the vertex u there is a distinct neighboring vertex $v_i$ of the same degree for the vertex v. If v is a vertex of the graph G, then the open r-sphere S(v,r) is defined as the subset of V(G) consisting of all vertices $v_i$ such that $d(v, v_i) < r$. Obviously, $S(v,0) = \emptyset$, $S(v,r) = \{v\}$ for $0 < r < 1$, and $S(v,r) = \{v\} \cup N_1(v)$ for $1 < r < 2$, where $N_1(v)$ is the set of first neighbors of v. One can construct open r-spheres of each vertex of G for all integral values of r, $0 \le r \le \rho$. For a particular value of r the collection of all such open spheres S(v,r), where v runs over the entire vertex set V, forms a neighborhood system of the vertices of G. A suitably defined equivalence relation can then partition V into disjoint subsets based on the equivalence of nature, connectedness, and bonding pattern of neighbors up to r-th order neighborhoods. It is noteworthy that this approach incorporates the effects of distant neighbors (i.e., neighbors of immediately bonded neighbors) on an atom or a reaction center. After partitioning of the vertices for a particular order (r) of neighborhood, $IC_r$ is calculated by Shannon's formula. Subsequently, Basak, Roy and Ghosh [44] defined another information-theoretic measure, structural information content ($SIC_r$), which is calculated as:
$$SIC_r = IC_r / \log_2 n \qquad (26)$$
where $IC_r$ is calculated as above and n is the total number of vertices of the graph.
Another information-theoretic invariant, complementary information content ($CIC_r$), was defined as [45]:
$$CIC_r = \log_2 n - IC_r$$
(28)
Here, one can consider $P_0$ as the total number of possible ways of distributing N particles into k partial bond spaces with $N_i$ particles in the partial space i.
$$Z = \sum_{k=0}^{[N/2]} p(G,k) \qquad (29)$$
where p(G,k) is the number of ways in which k edges are chosen from the graph G so that no two of them are adjacent; [N/2] in the Gauss square brackets is the largest integer not exceeding the real number in them. By definition, p(G,0) = 1, while p(G,1) equals the number q of edges in the graph. For acyclic graphs Z can be defined as the sum of the absolute values of the coefficients in the characteristic polynomial, P(G,x).
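The numbers p(G,k) entering eq. 29 can be obtained by direct enumeration for small graphs. The sketch below counts, for every k, the sets of k mutually non-adjacent edges by a simple recursion; it is meant as an illustration only (the enumeration grows exponentially), and the function name `matching_counts` and the test molecules are assumptions:

```python
def matching_counts(edges):
    """Return [p(G,0), p(G,1), ...]: numbers of sets of k non-adjacent edges."""
    counts = {}

    def extend(start, used_atoms, k):
        counts[k] = counts.get(k, 0) + 1          # the current set is one k-matching
        for idx in range(start, len(edges)):
            i, j = edges[idx]
            if i not in used_atoms and j not in used_atoms:
                extend(idx + 1, used_atoms | {i, j}, k + 1)

    extend(0, frozenset(), 0)
    return [counts[k] for k in sorted(counts)]


ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]     # 2-methylbutane skeleton
N_BUTANE = [(0, 1), (1, 2), (2, 3)]

p_isopentane = matching_counts(ISOPENTANE)
z_isopentane = sum(p_isopentane)
z_butane = sum(matching_counts(N_BUTANE))
print(p_isopentane, z_isopentane, z_butane)
```

As required by the definition, the first entry is 1 and the second equals the number of edges.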
Herndon's structure count ratio [50] can also be considered as derived by a combinatorial algorithm:
$$SCR = SC_{P(\text{or } I)} / SC_R \qquad (30)$$
Here, $SC_R$ is the number of Kekulé structures of the unperturbed molecule and $SC_{P(\text{or } I)}$ that of the transformation product, P, or rate-controlling intermediate, I. It was found that ln SC values are proportional to the resonance energy [51] and eventually to the stability of the systems. A quicker way to obtain SCs is by summing the absolute values of the unnormalized non-bonding molecular orbital (NBMO) coefficients, $c_{0j}$, of the alternant ion or radical [52,53]. The latter can be determined by means of the zero-sum rule of Longuet-Higgins [54]:
$$\sum_{j} A_{jk}\, c_{0j} = 0 \qquad (31)$$
where the summation is over all vertices j joined to the vertex k, and $A_{jk}$ are the corresponding non-zero entries of the k-th row of the adjacency matrix.
The first step of the procedure for counting SC is to produce an odd alternant from the even alternant by deleting one carbon atom and its adjacent bonds from the even system. Then the vertices of the odd alternant system are divided into two sets (starred and unstarred) in such a way that no two vertices from the same set are adjacent. The atoms of one set (starred) have zero coefficients in the NBMO. To the vertices of the other set one assigns integers chosen in such a way that their sum around each starred vertex is zero [54]. In fact, these simple integers are the unnormalized coefficients of the NBMO (Fig. 3).
[Fig. 3. Unnormalized NBMO coefficients assigned to the atoms of an odd alternant system; diagram not recoverable.]
A method for calculating SCR's of non-alternant systems (fluoranthens) has also been
published [55].
Analogously, based on the coefficients of the NBMO, a topological index, $N_i$, termed localization energy or reactivity number, was introduced [52] to assess the relative reactivity of the different positions of an alternant hydrocarbon. The reactivity of a hydrocarbon at a particular position is determined by a procedure similar to the one for deriving SCR. The atom whose reactivity is to be determined is removed from the system together with its adjacent bonds. The NBMO coefficients are determined for the resulting odd alternant system by the Longuet-Higgins approach. The absolute sum of the coefficients of the atoms neighboring the removed one is obtained. Then this sum is normalized by the root of the sum of squares of the unnormalized coefficients, thus producing the reactivity number, $N_i$. Thus, for the β-position of anthracene analyzed in Fig. 4, the respective value of $N_\beta$ is calculated by the relationship:
Corrections are introduced here [56] for the resonance integral, β, according to the branching conditions:
After placing x in the main diagonal of the adjacency matrix, the latter is transformed into the well-known characteristic or topological matrix of Hückel quantum-chemical theory. The respective characteristic polynomial may be obtained readily by expanding the determinant of the topological matrix. Thus, the eigenvalues of the topological (Hückel) matrix obtained after its diagonalization coincide with the graph spectrum [9]. Lovász and Pelikán [57] introduced as a topological index the largest eigenvalue, $x_1$, of the characteristic matrix.
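The largest eigenvalue can be estimated without a full diagonalization. The sketch below uses power iteration on the shifted matrix A + I (the shift is an implementation choice, made because molecular graphs of alternant systems have eigenvalues in ± pairs, for which plain power iteration does not settle); it assumes a connected graph, and the function name and test molecules are illustrative:

```python
from math import sqrt


def largest_eigenvalue(n, edges, iterations=500):
    """Largest adjacency eigenvalue of a connected graph by shifted power iteration."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    x = [1.0] * n
    shifted = 0.0
    for _ in range(iterations):
        # y = (A + I) x ; the largest component of y tends to 1 + x_1
        y = [x[i] + sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        shifted = max(abs(v) for v in y)
        x = [v / shifted for v in y]
    return shifted - 1.0


BUTADIENE = [(0, 1), (1, 2), (2, 3)]
BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

x1_butadiene = largest_eigenvalue(4, BUTADIENE)
x1_benzene = largest_eigenvalue(6, BENZENE)
print(round(x1_butadiene, 4), round(x1_benzene, 4))
```

The two results can be compared with the familiar Hückel values for the lowest π level of butadiene and benzene, expressed in units of the resonance integral.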
Taking into account the topological nature of the Hückel quantum-chemical approach, first-order perturbation theory, the free electron MO model, and valence-bond structure resonance theory, one can classify the reactivity indices obtained by these methods as purely topological, defined by the third type of algorithms from the topological matrix. These are: the atomic charges, $q_\pi$, the index of free valence, $F_r$, the atomic self-polarizability, $\pi_{rr}$, the superdelocalizability indices, $S_r$, Brown's index Z, and the localization energy, $L_r$. Space does not permit a consideration of all these parameters, which are described in detail elsewhere [58].
(33)
Next, this idea was generalized by Carbo and Jenkins [61,62] and Ponec [63-65]. The latter introduced a topological similarity index, $r_{AB}$, assessing the extent of reorganization of the electron density of a molecule during the chemical transformation. In this topological approximation an expression similar to eq. 33 is used, but here $P_A$ and $P_B$ are the density matrices of the reactant, A, and product, B, related by the equation:
(34)
The τ-matrix is the so-called assigning table [66] describing the mutual relation of the basis sets χ and χ′ (HMO-type delocalized π-molecular orbitals) of the reactant and product, respectively:
The τ-matrices are well approximated by diagonal matrices, where $\tau_{jj} = \pm 1$ describe the changes of MO at particular atoms in the course of the reaction.
Most of the indices described above assess the topology of the whole molecule. For this reason they are termed global topological indices. They are significant for structure-reactivity studies either when the molecules interact as a whole (e.g., polar-polar interactions with solvent molecules) or when molecular geometry controls the substrate-receptor complex formation. Extensive experimental results, however, indicate that chemical reactions usually concern a localized position in the interacting species. For these cases it is desirable to characterize topologically a fragment of a molecule instead of the whole molecule. In order to define the fragment topological indices, one can apply to fragments the same general procedures used for whole molecules. The indices so obtained were termed [67] internal fragment topological indices, IFTI(F). In this way, however, one cannot differentiate between isomeric univalent groups with different points of attachment, for example n-butyl and s-butyl moieties. This problem was solved by taking into account the "interaction" between the fragment and the remainder of the molecule. The resulting indices were termed [67] external fragment topological indices, EFTI(F). More precisely, they are specified as the difference in value between the topological index for the whole graph, TI(G), and the internal fragment indices for both the fragment, IFTI(F), and the remainder of the molecule, IFTI(G−F)_k:
$$EFTI(F) = TI(G) - IFTI(F) - \sum_{k} IFTI(G-F)_k \qquad (36)$$
The idea of external fragment indices will be illustrated by the Wiener index derived from the distance (p × p symmetric) matrix of the molecular graph. If the fragment F has p′ vertices, IFTI(F) is defined by operations on the submatrix F, while IFTI(G−F) is similarly specified on the submatrix (G−F) having p − p′ vertices. The EFTI indices are defined by operations on the hatched portions of the matrix (Fig. 4a). When (G−F) comprises two or more disjoint subgraphs, the interaction between these subgraphs (the additional hatched portions in Fig. 4b) is not taken into account in specifying IFTI(G−F), since they are connected only by virtue of the fragment F.
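[Fig. 4. Partition of the distance matrix of G into the submatrices F and (G−F), with the hatched portions defining EFTI: (a) connected (G−F); (b) (G−F) comprising disjoint subgraphs. Diagram not recoverable.]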
Fragment topological indices can be calculated on the basis of different graph invariants by applying the above scheme [67].
When considering a fragment of one non-hydrogen atom, the EFTI(1) value reduces to:
EFTI(1) = TI(G) − IFTI(G′) − a (37)
where G′ is the vertex-excised graph, i.e., the initial graph from which the given vertex and its adjacent edges have been removed; a stands for IFTI(F) and is zero in the majority of cases or is a constant (e.g., a = 1 for the Hosoya index Z).
For example, based on eq. 36 one can derive the distance sum index, $v_{D,i}$ (see eq. 14), if one proceeds from the Wiener index. Though this is a general expression for the atomic topological indices, there are many other original algorithms, as one can see in the preceding section.
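The one-atom case of eq. 37 can be sketched for the Wiener index as follows. The code reflects one reading of the scheme: all distances are taken from the distance matrix of the whole graph G, IFTI(G′) is summed separately over the disjoint pieces left after excising the atom, and a = 0. Function names and the test molecule are assumptions made for the example:

```python
from collections import deque

N_ATOMS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]   # 2-methylbutane skeleton (illustrative)


def neighbor_lists(n, edges):
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def bfs(nbrs, source, blocked=()):
    """Distances from source, never entering the blocked vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in nbrs[u]:
            if w not in dist and w not in blocked:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def half_sum(dist, vertices):
    """Wiener-type half-sum over the submatrix spanned by the given vertices."""
    vs = sorted(vertices)
    return sum(dist[i][j] for k, i in enumerate(vs) for j in vs[k + 1:])


def efti_wiener_single_atom(n, edges, atom):
    nbrs = neighbor_lists(n, edges)
    dist = [bfs(nbrs, v) for v in range(n)]          # distance matrix of G
    ti_g = half_sum(dist, range(n))
    # Disjoint pieces of the vertex-excised graph G'.
    pieces, seen = [], {atom}
    for v in range(n):
        if v not in seen:
            piece = set(bfs(nbrs, v, blocked=(atom,)))
            pieces.append(piece)
            seen |= piece
    ifti_rest = sum(half_sum(dist, piece) for piece in pieces)
    a = 0                                            # IFTI of a one-atom fragment
    return ti_g - ifti_rest - a


efti_terminal = efti_wiener_single_atom(N_ATOMS, EDGES, 3)
efti_branch = efti_wiener_single_atom(N_ATOMS, EDGES, 1)
print(efti_terminal, efti_branch)
```

For the terminal atom, where G′ stays connected, the value equals the distance sum of that atom; for the branching atom the pieces of G′ are treated separately, so the inter-piece distances also count towards the external index.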
To the group of atomic indices one should also add Herndon's structure count ratio (eq. 30), Dewar's reactivity number (eq. 32) and the whole group of HMO reactivity parameters.
A set of topological indices, however, describes significantly the variation of log P for the studied compounds. Thus, the correlation with $^1\chi^v$ can be presented by the equation:
Hall and Kier [72] found a good correlation between aquatic toxicity (LC50) values in fathead minnow and the third-order valence connectivity index:
The topological indices $f_i$ and $\lim_{l\to\infty} f_i$ are considered [35,36] as fractional atomic charges, describing the distribution of one electron over the atoms in the molecule. This assumption is based on the idea that each self-returning walk can be associated with possible electron movements. The larger the number of $SRW_i$ for a specific atom, the larger its fractional electronic charge. By the examination of a number of π-electronic molecules it was also found [35] that $\lim_{l\to\infty} f_i$ is equal to the partial Hückel LOMO charges:
(46)
For this reason, the product of $f_i$ and N is also called the topological charge, TC.
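The limiting behaviour of the normalized walk counts can be checked numerically on butadiene. The sketch assumes that self-returning walks are counted as diagonal elements of powers of the adjacency matrix, and compares the fractions at a large even walk length with the squared coefficients of the lowest Hückel π-MO of butadiene, whose closed forms (5 ∓ √5)/20 are standard textbook values. An even length is used because the odd-length counts vanish for alternant systems:

```python
from math import sqrt

EDGES = [(0, 1), (1, 2), (2, 3)]      # carbon skeleton of 1,3-butadiene
N = 4

A = [[0] * N for _ in range(N)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def srw_fractions(a, length):
    """Per-atom closed-walk counts of the given length, divided by their total."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(length):
        power = mat_mul(power, a)
    diagonal = [power[i][i] for i in range(n)]
    total = sum(diagonal)
    return [d / total for d in diagonal]


f_long = srw_fractions(A, 40)
lomo_squared = [(5 - sqrt(5)) / 20, (5 + sqrt(5)) / 20,
                (5 + sqrt(5)) / 20, (5 - sqrt(5)) / 20]
topological_charge = [N * f for f in f_long]       # product of f_i and N
print([round(f, 4) for f in f_long])
```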
The fact that the indices $f_i$, TC, (N)EC, and EST (see eqs. 19 and 20) are related to atomic topology only explains their correlation with CNDO/2 atomic charges in alkanes [36]:
The correlations with TC and NEC are characterized by r = 0.952, s = 0.007, and F = 298 and 301, respectively.
For a molecule with a discrete spectrum of energy levels, $E_1, E_2, \ldots, E_N$, the n-th moment of energy is specified by:
$$\mu_n = \sum_{i} E_i^{n} = \mathrm{Tr}\,(H^{n}) \qquad (48)$$
where the second equality follows from the invariance of the trace of the corresponding Hamiltonian matrix. The latter has a simple topological interpretation: it equals the weighted sum of all self-returning walks of length n in the molecule, beginning and ending with the same orbital (atom). This explains the above relationship between the indices based on self-returning walks and atomic charges, as well as the relationships with the other energetic molecular and atomic characteristics [73-76].
Recently, using moment analysis, a scheme was proposed [77] for determining the energy and reactivity of conjugated hydrocarbons without recourse to the standard calculations of HMO theory. The π-electron energy of a molecule is expressed here by the equation:
(49)
where the coefficients a are tabulated, being determined by the truncated expansion of function
Based on the presented unified energy scheme [77], energy contributions are assigned to different fragments (point-energy, edge-energy, ring resonance energy), which are used for rationalizing the aromaticity, reactivity and bond lengths of conjugated hydrocarbons.
Herndon's structure count ratio, $SCR_i$, as well as Dewar's reactivity number, $N_i$, are constructed similarly and can be used to compare reactivity at different positions of a molecule (as well as at a specific position in a reaction series). Classic examples here are naphthalene and biphenylene:
[Structural diagrams with NBMO coefficients not recoverable; the surviving values are:]
Naphthalene, α-position: SC_R = 3, SC_I = 7, SCR_α = 2.33
Naphthalene, β-position: SC_R = 3, SC_I = 6, SCR_β = 2.00
Biphenylene, α-position: SC_R = 5, SC_I = 11, SCR_α = 2.20
Biphenylene, β-position: SC_R = 3, SC_I = 8, SCR_β = 2.67
Fig. 6. Calculated structure count ratio, SCR_i, and Dewar's reactivity number, N_i, for α- and β-positions of naphthalene and biphenylene
As one can see from Fig. 6, according to the structure count ratio and the reactivity number (localization energy), the reactivity of the naphthalene α-position is higher than that of the β-position: $SCR_\alpha > SCR_\beta$ and $N_\alpha < N_\beta$. Conversely, for biphenylene the β-position is more reactive than the α-position: $SCR_\beta > SCR_\alpha$ and $N_\beta < N_\alpha$. Both predictions correspond to the experimental observations for local reactivity of these molecules.
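The naphthalene comparison can be reproduced with a short script. It is a sketch under stated assumptions: the NBMO of the odd alternant residue is obtained as the null vector of its adjacency matrix (which is what the zero-sum rule expresses) by exact Gauss-Jordan elimination; the structure count of the intermediate is the sum of the absolute integer coefficients; the reactivity number uses Dewar's customary factor of 2 in front of the normalized neighbor coefficients; and the structure count of naphthalene itself is taken as 3, the value quoted in Fig. 6. Atom numbering and function names are illustrative:

```python
from fractions import Fraction
from math import gcd, sqrt

# Naphthalene skeleton; index 0 = C1 (alpha), 1 = C2 (beta), 4 = C4a, 9 = C8a.
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
               (6, 7), (7, 8), (8, 9), (9, 0), (4, 9)]
N_ATOMS = 10
SC_REACTANT = 3          # Kekule structures of naphthalene, as quoted in Fig. 6


def nbmo_integers(n, edges, removed):
    """Smallest-integer NBMO coefficients after deleting atom `removed`."""
    keep = [v for v in range(n) if v != removed]
    pos = {v: k for k, v in enumerate(keep)}
    m = len(keep)
    rows = [[Fraction(0)] * m for _ in range(m)]
    for i, j in edges:
        if removed not in (i, j):
            rows[pos[i]][pos[j]] = rows[pos[j]][pos[i]] = Fraction(1)
    # Gauss-Jordan elimination of A c = 0 (zero-sum rule around every atom).
    pivots, r = [], 0
    for col in range(m):
        p = next((k for k in range(r, m) if rows[k][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][col]
        rows[r] = [x / pivot for x in rows[r]]
        for k in range(m):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        pivots.append(col)
        r += 1
    free = [c for c in range(m) if c not in pivots]
    if len(free) != 1:
        raise ValueError("expected exactly one non-bonding MO")
    vec = [Fraction(0)] * m
    vec[free[0]] = Fraction(1)
    for k, col in enumerate(pivots):
        vec[col] = -rows[k][free[0]]
    # Clear denominators, then divide by the common factor.
    scale = 1
    for x in vec:
        scale = scale * x.denominator // gcd(scale, x.denominator)
    ints = [int(x * scale) for x in vec]
    common = 0
    for x in ints:
        common = gcd(common, abs(x))
    return {v: ints[pos[v]] // common for v in keep}


def reactivity(n, edges, atom, sc_reactant):
    """Return (SC of intermediate, structure count ratio, reactivity number)."""
    c = nbmo_integers(n, edges, atom)
    sc_intermediate = sum(abs(x) for x in c.values())
    neighbors = [j if i == atom else i for i, j in edges if atom in (i, j)]
    norm = sqrt(sum(x * x for x in c.values()))
    dewar = 2 * sum(abs(c[v]) for v in neighbors) / norm
    return sc_intermediate, sc_intermediate / sc_reactant, dewar


alpha = reactivity(N_ATOMS, NAPHTHALENE, 0, SC_REACTANT)
beta = reactivity(N_ATOMS, NAPHTHALENE, 1, SC_REACTANT)
print(alpha, beta)
```

The structure counts and ratios obtained this way can be compared directly with the naphthalene entries of Fig. 6, and the ordering of the two reactivity numbers with the inequality stated in the text.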
Next we demonstrate the application of the topological similarity index, introduced by Polansky and Fratev [60,61] and later generalized by Carbo [62] and Ponec [64-66], to predicting the pathway of pericyclic reactions [64]. The electrocyclic transformation of 1,3-butadiene to cyclobutene is considered. Both molecules can be described by their bonding MOs, $\Phi_A$ and $\Phi_B$. The first is constructed from Hückel π-MOs, whereas the second from localized bonds:
(50)
[Matrices not recoverable; only the entries 1, 0.894, 0, −0.447 and 1, 0, 0, 1 survive.]
The next step consists of transforming $P_B$ from the basis of atomic orbitals χ′ into the basis χ (by eq. 33), which serves simultaneously for the description of $P_A$. Two such matrices
The similarity indices calculated from these matrices for the two pathways are $r_{con} = 0.723$ and $r_{dis} = 0.500$; in other words, the similarity of the reaction partners is larger for the conrotatory than for the disrotatory cyclization. On the other hand, according to the least-motion principle, the easy course of a reaction is connected with the requirement of minimal variation of the electronic configurations of the reacting molecules. Hence, one can conclude from the above r-values that the more likely reaction is the conrotatory cyclization, which corresponds to the Woodward-Hoffmann rules as well as to experimental results.
Despite some limitations, the topological similarity approach is promising for the formulation of selection rules in chemical reactivity.
Space does not permit a discussion of the applications of HMO reactivity indices, which undoubtedly have had a large impact on calculating and predicting chemical properties [59]. We refer here only to some more recent applications of these topology-based reactivity parameters. During the past decade, for example, they were applied to the identification of the molecular fragment of polycyclic aromatic hydrocarbons (PAH) most susceptible to carcinogenic metabolism. For this purpose, Jerina and Lehr [78] calculated the ease of carbonium ion formation from various dihydrodiol epoxides by estimating the respective increase in HMO delocalization energy. Seybold and Smith [79] used the net π-electron charge at the benzyl carbon of an ionized bay region [80] of the studied dihydrodiol epoxides. Soriano et al. [81] assessed the sums of two atomic (HMO) superdelocalizabilities for three particular regions of PAH, termed A, K, and L.
5. Conclusions
Here, we have introduced two different classifications of TIs. The first one is based on
which aspects of the chemical graph they quantify. The global, fragment, and atomic
graph invariants represent different geometrical information regarding the portion of the
molecular graph they characterize. Thus they are capable of modeling distinct types of
reactivities. The second classification is based on the specificity of algorithms translating
the mathematical representation of molecular graphs into topological quantities. The first
two groups of algorithms, using simple mathematical functions and combinatorial
procedures, produce the conventional TIs known from chemical graph theory. The TIs
obtained by applying the third group of algorithms (diagonalizing particular graph
matrices) are in fact the well-known reactivity indices from Hückel theory. Although
produced by different algorithms, the TIs from the three groups have a common
foundation, namely the chemical graph and its mathematical representation.
We suggest that in correlation analysis TIs should be used in conjunction with models of
the reactive process and plausible hypotheses about the mechanism of the reaction under
investigation. Such an approach will explain why a particular TI correlates with reactivity
in certain cases but fails to do so in others.
TOPOLOGICAL INDICES AND CHEMICAL REACTIVITY 237
TIs will have a predominant role in determining molecular reactivity where metric and
electronic aspects of molecular architecture play only a very minor role in the reaction
process. Thus an understanding of the mechanism of the particular reaction process under
investigation is critical to the selection of a proper set of TIs for modelling reactivity.
Initially, it is advisable to use TIs in modelling the primary effects. This will clarify why
a particular TI is able to predict some property reasonably well. Subsequently, these TIs
should be used in predicting secondary effects.
Finally, the initial set of TIs used in predictive models should not be so large
that the probability of chance correlation becomes high [82]. Rational choice of a minimal
set of TIs on the basis of understanding the reaction process is one solution to this
problem. On the other hand, the presence of too many TIs in the final models (even if
some statistical criteria permit it) can also cause problems, because the relation between
the TIs and the reaction mechanism is lost.
Acknowledgment
One of the authors (SCB) was supported in part by cooperative agreement No. EPAICR
819621-01-0 from the United States Environmental Protection Agency. Contribution
Number 104 from the Center for Water and the Environment of the Natural Resources
Research Institute.
6. References
20. M.I. Stankevich, I.V. Stankevich and N.S. Zefirov, Usp. Khim., 57, 337 (1988).
21. M. Zander, Naturwissenschaften, 69, 436 (1982).
22. D. Bonchev, O. Mekenyan, H. Fritsche, J. Cryst. Growth, 49, 90 (1980).
23. J. Barton, in: Sintering and Catalysis, G.C. Kuczynski, Ed., Plenum, New York,
1977, pp. 17-27.
24. I. Gutman, B. Ruscic, N. Trinajstic, and C.F. Wilcox, Jr., J. Chem. Phys., 62, 3339
(1975).
25. I. Gutman and N. Trinajstic, Chem. Phys. Lett., 17, 535 (1972).
26. M. Randic, J. Am. Chem. Soc., 97, 6609 (1975).
27. L. Kier, L. Hall, W. Murray, and M. Randic, J. Pharm. Sci., 64, 1971 (1975).
28. L. Kier and L. Hall, J. Pharm. Sci., 65, 1226 (1976).
29. L. Kier and L. Hall, Molecular Connectivity in Chemistry and Drug Research,
Academic, New York, 1976.
30. H. Wiener, J. Am. Chem. Soc., 69, 17 (1947); 69, 2336 (1947); J. Chem. Phys.,
15, 766 (1947); J. Phys. Chem., 52, 425 (1948); 52, 1082 (1948).
31. A.T. Balaban, Pure Appl. Chem., 55, 199 (1983).
32. A.T. Balaban, Chem. Phys. Lett., 89, 399 (1982).
33. O. Polansky and D. Bonchev, MATCH, 21, 135 (1988).
34. D. Bonchev, O. Mekenyan and O.E. Polansky, in: Graph Theory and Topology in
Chemistry, R.B. King and D.H. Rouvray, Eds., Elsevier, Amsterdam, 1987, p. 126.
35. D. Bonchev, L.B. Kier, and O. Mekenyan, Int. J. Quant. Chem. (in press).
36. D. Bonchev and L.B. Kier, Journal of Mathematical Chemistry, 9, 75 (1992).
37. H.L. Morgan, J. Chem. Doc., 5, 107 (1965).
38. O. Mekenyan, A.T. Balaban and D. Bonchev, J. Magn. Res., 63, 1 (1985).
39. L.B. Kier and L.H. Hall, Pharm. Res., in press.
40. C. Shannon, W. Weaver, Mathematical Theory of Communication, Univ.
Illinois Press, Urbana, 1949.
41. L. Brillouin, Science and Information Theory, Academic, New York, 1956.
42. D. Bonchev and N. Trinajstic, J. Chem. Phys., 67, 4517 (1977).
43. C. Raychaudhury, S.K. Ray, J.J. Ghosh, A.B. Roy and S.C. Basak, J. Comput.
Chem., 5, 581 (1984).
44. S.C. Basak, A.B. Roy and J.J. Ghosh, Proceedings of the IInd International
Conference on Mathematical Modelling (Eds. X.J.R. Avula, R. Bellman, Y.L. Luke
and A.K. Rigler), pp. 851-856, University of Missouri-Rolla, Rolla, Missouri, USA
(1979).
45. S.C. Basak and V.R. Magnuson, Arzneimittel-Forschung/Drug Research, 33, 479
(1983).
46. A.B. Roy, S.C. Basak, D.K. Harriss and V.R. Magnuson, in: Mathematical
Modelling in Science and Technology, X.J.R. Avula, R.E. Kalman, A.I. Liapis and
E.Y. Rodin (Eds.), p. 745, Pergamon Press, 1984.
47. W.Y. Yee, K. Sakamoto, Y.J. I'Haya, Rept. Univ. Electro-comm., 27, 53 (1976).
48. K. Sakamoto, W.Y. Yee, Y.J. I'Haya, Rept. Univ. Electro-comm., 27, 227 (1977).
49. H. Hosoya, K. Kawasaki, and W.J. Murray, J. Pharm. Sci., 64, 1974 (1975).
50. W.C. Herndon, J. Org. Chem., 40, 3583 (1975).
51. W.C. Herndon, Israel J. Chem., 20, 270 (1980).
52. M.J.S. Dewar and R.C. Dougherty, The PMO Theory of Organic Chemistry,
Plenum Press, New York, 1975.
53. W.C. Herndon, Tetrahedron, 29, 3 (1973).
54. H.C. Longuet-Higgins, J. Chem. Phys., 18, 265, 275, 283 (1950).
55. W.C. Herndon, Tetrahedron, 39, 1389 (1982).
56. H. Kuhn, Helv. Chim. Acta, 31, 1441 (1948); Helv. Chim. Acta, 32, 2247 (1949).
57. L. Lovasz and J. Pelikan, Period. Math. Hung., 3, 175 (1973).
58. A. Streitwieser, Jr., Molecular Orbital Theory for Organic Chemists, Wiley, New
York, 1961.
59. F. Fratev, O.E. Polansky, A. Mehlhorn, V. Monev, J. Mol. Struct., 56, 245 (1979).
60. O.E. Polansky, G. Derflinger, Int. J. Quant. Chem., 1, 379 (1967).
61. R. Carbo, L. Leyda, M. Arnau, Int. J. Quant. Chem., 17, 1185 (1980).
62. P.E. Bowen-Jenkins, L.D. Cooper, W.G. Richards, J. Phys. Chem., 89, 2195
(1985).
63. R. Ponec, Coll. Czech. Chem. Commun., 52, 555 (1987).
64. R. Ponec, Z. phys. Chemie, 270, 365 (1989).
65. R. Ponec, J. Phys. Org. Chem., 4, 701 (1991).
66. R. Ponec, Coll. Czech. Chem. Commun., 29, 455 (1984).
67. O. Mekenyan, D. Bonchev, and A.T. Balaban, J. Math. Chem., 2, 347 (1988).
68. S.C. Basak, G.J. Niemi and G.D. Veith, J. Math. Chem., 4, 185 (1990).
69. W.J. Murray, L.H. Hall and L.B. Kier, J. Pharm. Sci., 64, 1978 (1975).
70. A. Sabljic, in: Practical Applications of Quantitative Structure-Activity
Relationships (QSAR) in Environmental Chemistry and Toxicology, W. Karcher
and J. Devillers, Eds., Kluwer, Dordrecht, 1990, pp. 61-82.
71. O. Mekenyan, G. Veith, S. Bradbury and C. Russom, Quant. Str.-Act. Relat., 12,
132 (1993).
72. L.H. Hall and L.B. Kier, Environ. Toxicol. Chem., 8, 19 (1989).
73. J.K. Burdett, S. Lee, and W.C. Sha, Croat. Chem. Acta, 57, 1193 (1984).
74. J.K. Burdett and S. Lee, J. Am. Chem. Soc., 107, 3050, 3063 (1985).
75. J.K. Burdett, Struct. Bonding (Berlin), 65, 10 (1987).
76. J.K. Burdett, Chem. Reviews, 88, 1 (1988).
77. Y. Jiang and H. Zhang, Theor. Chim. Acta, 75, 279 (1989).
78. R.E. Lehr, D.M. Jerina, Tetrahedron Letters, 24, 27 (1983).
79. I.A. Smith, G.D. Berger, P.G. Seybold, M.P. Serve, Cancer Research, 1978, pp.
2968-2977.
80. R.E. Lehr, A.W. Wood, in: Polynuclear Aromatic Hydrocarbons: Physical and
Biological Chemistry, M. Cook, A.J. Dennis, G.L. Fisher, Eds., Battelle Press,
Columbus, Ohio, 1982, pp. 21-37.
81. D.S. Soriano, J.A. Daeger, D. Robbins, W. Confer, and V. Soriano, J. Environ.
Sci. Health, A25(3), 277 (1990).
82. J.G. Topliss and R.P. Edwards, J. Med. Chem., 22, 1238 (1979).
GRAPH-THEORETICAL MODELS OF COMPLEX REACTION
MECHANISMS AND THEIR ELEMENTARY STEPS
1. Introduction
Chemical reactivity, which can be viewed as the capability of chemical species of any
kind to undergo chemical transformations, has always been a key problem in theoretical
chemistry. Any progress in understanding reactivity not only enriches chemical
knowledge but also has important practical implications. Numerous methods have been
developed to assess reactivity quantitatively, and it is not the aim of this chapter to
review all of them. Yet some basic approaches should be mentioned. These include, first
of all, various quantum chemical concepts and results, such as the classical work of
Dewar and Simonetta,1,2 the frontier orbital theory of Fukui et al.,3-5 orbital symmetry
principles,6 the isolobal concept of Hoffmann,7 and the molecular hardness (softness) concept
of Parr and Pearson.8-10 Correlation analysis has also contributed greatly to this area.11-14
The experimental reactivity measure most commonly used is the rate constant of the
reaction in which the compound of interest is involved. Advances in contemporary
experimental techniques have made possible precise direct measurements of the rate
constants of chemical reactions, including elementary reactions of electron15,16 and
proton17,18 transfer, and steps involving radicals,19 ions,20 and metal complexes.21-24
This chapter centers on progress in the mechanistic and kinetic studies of complex
reactions as a source of information on the reactivity of intermediates and their
elementary reaction steps. Estimation of kinetic constants (known as the inverse problem
in chemical kinetics) is an area of intensive study.25 However, the correct and exhaustive
formulation of this problem presumes progress in two related areas of research:
formulation of mechanistic hypotheses and their discrimination based on experimental
data. The necessity of these two preliminary stages of investigation matches the modern
view of scientific method,26,27 known as the hypothetico-deductive method. This method
is advocated in our recent work,28 in which we developed a rational methodology for
kinetic and mechanistic studies and showed that formulation of mechanistic hypotheses
is a key procedure in chemical mechanistic studies. Computer assistance in advancing
D. Bonchev and O. Mekenyan (eds.), Graph Theoretical Approaches to Chemical Reactivity, 241-275.
© 1994 Kluwer Academic Publishers.
242 O. N. TEMKIN ET AL.
The choice of the experimental method for evaluating mechanistic hypotheses and the
structure of the overall kinetic law of a multistage reaction is largely influenced by the
topological structure of the mechanism. 29 This structure can be depicted in the form of
a graph. In many cases, the specific features of this graph (and the related topological
structure) produce specific kinetics of the overall reaction,30 and such studies in the
graph-theoretical modeling of reaction mechanisms may find important applications.
Therefore, this chapter will elucidate the progress in these areas of research.
At present, numerous chemical reactions are widely believed to occur via several
consecutive elementary reactions. Such a set of elementary reactions is usually termed
the reaction mechanism. Mechanistic studies, which provide predictions of the behavior
of complex reactions in wide intervals of varied parameters, help shed light on the nature
of complex reactions and, more generally, on the self-organization phenomena. The
concept of a multistep mechanism for catalytic reactions is now commonly accepted.
Such developments make elementary reactions and reaction mechanisms a topic of ever-
growing interest. However, there is no general definition of elementary reaction as a
basic unit of a complex reaction. The definitions proposed so far have limited
applicability, since they account for different aspects of this phenomenon. Therefore, in
this section, we will review briefly some of the previous attempts at defining an
"elementary reaction" and present our original concept.
There are two major areas of mechanistic studies, both aimed at developing the notion
of elementary step. The classical formal-kinetic approach regards the elementary
reaction as a repeated reproduction of a unit act. This is supposed to be a transformation
of some (usually no more than two) reacting species, which results from their collision,
or a molecular rearrangement of a single species. The elementary reaction is thus
characterized by a single transition state and no intermediates. It is also supposed to run
via concerted bond change and in a certain direction, i.e., the inverse reaction is assumed
to be a separate process.
The formal-logical approach, however, introduces severe restrictions on the types of bond
redistribution (bond making or breaking). There are only two common principles for
describing elementary reactions. These are the well-known classical principle of
minimum structure change31,32 and the self-explanatory principle of the minimum number of
reaction participants in elementary reactions, which is grounded in collision theory.33
The first of these principles was formulated mathematically as the principle of
minimum chemical distance (PMCD).34-37 Chemical distance is a quantitative measure
of chemical dissimilarity. For many reactions, chemical distance can be expressed within
the mathematical model of the logical structure of constitutional chemistry proposed by
Ugi and co-workers.34,38 A central point in this model is the so-called reaction matrix,
which describes the pattern of electron and bond redistribution in the course of the
reaction. This theory provides the mathematical ground for studies on the generation of
reaction mechanisms;39-42 however, it does not suggest a mathematically rigorous
definition of elementary reactions. Thus, the problem of describing the possible types of
electron redistribution and bond change in elementary reactions remained open.
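The role of the reaction matrix can be illustrated with a small numerical sketch. It is not taken from the cited work: it assumes bond-electron (BE) matrices in the spirit of the Ugi model, with formal bond orders off the diagonal and free valence electrons on the diagonal, and it assumes that the chemical distance is the sum of the absolute values of the reaction matrix entries for a fixed atom mapping. The function names and the example reaction (H2 + Cl2 giving 2 HCl, chosen only for its small size, not as an elementary step) are hypothetical.

```python
# Illustrative sketch only; matrix conventions and names are assumptions.
# Atom order: H1, H2, Cl1, Cl2.
# Off-diagonal entry (i, j): formal bond order between atoms i and j.
# Diagonal entry (i, i): number of free valence electrons on atom i.

def reaction_matrix(begin, end):
    """Return R = E - B, the pattern of electron and bond redistribution."""
    return [[e - b for b, e in zip(b_row, e_row)]
            for b_row, e_row in zip(begin, end)]

def chemical_distance(begin, end):
    """Sum of |r_ij| over the reaction matrix (assumed definition)."""
    return sum(abs(r) for row in reaction_matrix(begin, end) for r in row)

# Educts: H1-H2 and Cl1-Cl2
B = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 6, 1],
     [0, 0, 1, 6]]

# Products: H1-Cl1 and H2-Cl2
E = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 6, 0],
     [0, 1, 0, 6]]

R = reaction_matrix(B, E)
for row in R:
    print(row)
print("chemical distance:", chemical_distance(B, E))
```

Negative entries of R mark broken bonds and positive entries mark bonds formed. The value obtained depends on the atom mapping chosen, which is why a minimum over mappings is sought in the principle of minimum chemical distance.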
Several attempts to fill this gap are worth mentioning. The contemporary formulation of
the valence bond method43,44 provides a convenient basis for this purpose. This method
proceeds from the classical postulates for two-center bonds and assumes that reactions
occur via bond breaking or bond making. A critical review of these principles45
emphasizes the two possibilities for bond breaking or bond making: the heterolytic and
homolytic ones. However, the question of how many bonds are involved in the bond
change process remained open. The approach proposed by Zefirov and Trach45-50 was
based on the Woodward-Hoffmann rules6,51 for multicentered processes with cyclic electron
redistribution. However, the elementary nature of pericyclic reactions is debatable. In
another approach, Dewar53 discussed the problem of possible bond change in concerted
and synchronous processes. He termed any reaction involving one bond breaking and/or
one bond making a "one-bond reaction", assuming multibond reactions to proceed via
several bond-breaking and bond-making acts. Dewar also presented evidence for the
assumption that multibond processes cannot normally be synchronous. However, there
are exceptions to Dewar's rules, i.e., multibond processes allowed by the Woodward-
Hoffmann rules or Evans' principle,54-56 as well as E2 and SN2 reactions.53 It should be
noted that elementary reactions are not necessarily synchronous. However, they must be
concerted, i.e., they must proceed via a single kinetic step without intermediates.
A synchronous process additionally requires all bond changes to proceed at the same time. Finally, an
approach developed by Koca et al.37 assumes that an elementary reaction is nothing but
one bond making (or breaking) that occurs heterolytically. Clearly, such a simplification
is chemically unrealistic.
One may conclude then that several central questions remain open:
(i) Which types of electron redistribution occur in elementary reactions?
(ii) How many bonds of each atom can be formed and/or broken in a concerted process?
(iii) How many bonds can be totally formed and/or broken in a concerted manner in a
single kinetic step?
2.2.1. Definitions
After deleting all vertices and edges that correspond to atoms and bonds taking no part
in the bond redistribution process, one obtains the so-called simplest graph-scheme of
a reaction. The reaction systems at the beginning and at the end of the reaction in the
simplest graph-scheme we call the initial graph G_in and the final graph G_f, respectively. The
graph G_top, which results from embedding G_f on G_in, is called here the topology identifier
(Fig. 2.1).
Fig. 2.1. The stages in building a topology identifier of a reaction system: a) chemical
equation; b) the graph-scheme containing the three reagents and the reaction product; c)
the simplest graph-scheme, describing only the bond changes by means of the initial and
final graphs, G_in and G_f; d) the topology identifier G_top built by superimposing G_in and G_f.
It was shown recently46 that there are only two simple topologies of bond change, linear
and cyclic, whereas more complex topologies can be decomposed into these two
simple cases. Fig. 2.2 illustrates the simple topologies for 3-6-centered chemical
transformations.
In analyzing complex reaction topologies, Zeigarnik and Temkin57 showed that in these
cases the topology identifiers must not contain vertices whose degree exceeds two.
Hence, as seen in Table 1, no reaction center changes more than two bonds. Therefore,
the topology identifier for a graph-scheme of an elementary reaction must be either a circuit
or a path (in Harary's terms).58 This makes possible the enumeration of the elementary
reaction types, i.e., the enumeration of the simplest graph-schemes corresponding to
elementary reactions.
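The path-or-circuit criterion lends itself to a mechanical check. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: the initial and final graphs are given as dictionaries mapping atom pairs to bond orders, and the topology identifier is taken to consist of every atom pair whose bond order changes. The atom labels and the names `bond`, `topology_identifier`, and `classify` are hypothetical.

```python
from collections import defaultdict

def bond(a, b):
    """Unordered atom pair used as a dictionary key."""
    return frozenset((a, b))

def topology_identifier(g_in, g_f):
    """Superimpose G_in and G_f: keep each atom pair whose bond order changes."""
    pairs = set(g_in) | set(g_f)
    return {p for p in pairs if g_in.get(p, 0) != g_f.get(p, 0)}

def classify(g_top):
    """Return 'linear' (path), 'cyclic' (circuit), or 'complex'."""
    adjacency = defaultdict(set)
    for pair in g_top:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    if not adjacency:
        return "complex"  # no bond change at all; treated as not simple here
    if any(len(nbrs) > 2 for nbrs in adjacency.values()):
        return "complex"  # some center changes more than two bonds
    # A connected graph with all degrees <= 2 is either a path or a circuit.
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(adjacency):
        return "complex"  # disconnected bond-change pattern
    end_points = sum(1 for nbrs in adjacency.values() if len(nbrs) == 1)
    return "linear" if end_points == 2 else "cyclic"

# Six-center example: diene C1=C2-C3=C4 plus dienophile C5=C6 (Diels-Alder pattern)
g_in = {bond("C1", "C2"): 2, bond("C2", "C3"): 1,
        bond("C3", "C4"): 2, bond("C5", "C6"): 2}
g_f = {bond("C1", "C2"): 1, bond("C2", "C3"): 2, bond("C3", "C4"): 1,
       bond("C4", "C5"): 1, bond("C5", "C6"): 1, bond("C6", "C1"): 1}
print(classify(topology_identifier(g_in, g_f)))

# Three-center example: Nu + C-X -> Nu-C + X (SN2 pattern)
sn2_in = {bond("C", "X"): 1}
sn2_f = {bond("Nu", "C"): 1}
print(classify(topology_identifier(sn2_in, sn2_f)))
```

The first example yields a six-membered circuit and the second a three-vertex path, matching the cyclic and linear topologies described in the text.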
Fig. 2.2a. Bond change topologies for 3-6-centered chemical transformations (N = 3-6): cyclic topology.
Fig. 2.2b. Bond change topologies for 3-6-centered chemical transformations (N = 3-6): acyclic topology.
Fig. 2.3. All possible types of elementary reactions involving three reaction centers
(Δq = 0, 1, 2, 3). Δq stands for the total bond change (the difference between the numbers
of bonds formed and broken in the reaction).
Fig. 2.4. All possible types of elementary reactions involving four reaction centers (Δq = 0-4).
(i) The principle of transition state (activation barrier) uniqueness. Although rigorous,
this rule cannot be well defined mathematically. Nothing can be said about the activation
barrier uniqueness proceeding from the bond change analysis.
(ii) The principle of minimum reaction participants: reaction molecularity should not
exceed three.
(iii) The principle of minimum structure change, and its modern version, the principle
of minimum chemical distance. Strong objections could be made to the applicability of
this principle to elementary reactions. Different mechanistic schemes may involve, with
the same probability, elementary reactions with arbitrary, but not very large, chemical
distances. The calculation of chemical distances says nothing about the probability or
improbability of a mechanistic scheme.
Our studies have led to the formulation of two more principles. They have not been
rigorously proved but are based on extensive observations in the area of catalysis by
metal complexes. We suppose that their validity might be extended to all elementary
reactions, anticipating that future quantum chemical evidence will support them. The
first one was already discussed above in connection with Table 2.1 and Figs. 2.2-2.5. It can be
summarized as follows:
(iv) The principle of simple bond change topology: Elementary reactions obey either
linear or cyclic bond change topology.
The cases of non-simple bond change topology are readily detected by the presence of
vertices in G_top whose degree is greater than two. Such vertices correspond to centers in
the activated complex that break and/or form more than two bonds. For example, the
step of alkene epoxidation by metal alkyl peroxides62 should not be regarded as
concerted, owing to the presence of an oxygen atom that breaks two bonds and forms
two new ones. The corresponding vertex in G_top is of degree four.
The second new principle was deduced from the examination of the graph-scheme lists
(Figs. 2.2-2.5) and their comparison with reference data. Because no concerted reactions
with |Δq| > 1 were found, we came to the conclusion that the factor determining the
lack of such reactions is the imbalance between bond making and bond breaking. Hence,
the following formulation of this principle resulted:
(v) The principle of bond change compensation: The number of breaking (making)
bonds that are not compensated by making (breaking) bonds should not exceed one.
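Principles (iv) and (v) can be screened together from a list of bond changes. The following Python sketch is only a hedged illustration: it assumes each entry describes one bond formed (+1) or broken (-1) between two centers, it tests principle (iv) by the degree criterion alone, and the atom labels and the function name `check_heuristics` are hypothetical. The epoxidation bookkeeping is a schematic rendering of the statement that one oxygen atom breaks two bonds and forms two new ones, not a proposed mechanism.

```python
from collections import Counter

def check_heuristics(changes):
    """changes: list of (center_a, center_b, delta), delta = +1 formed, -1 broken."""
    degree = Counter()
    for a, b, _ in changes:
        degree[a] += 1
        degree[b] += 1
    delta_q = sum(delta for _, _, delta in changes)
    max_degree = max(degree.values())
    return {
        "max_degree": max_degree,               # highest vertex degree in G_top
        "delta_q": delta_q,                     # bonds formed minus bonds broken
        "simple_topology": max_degree <= 2,     # principle (iv), degree test only
        "compensated": abs(delta_q) <= 1,       # principle (v)
    }

# Three-center substitution pattern: one bond formed, one broken
sn2 = [("Nu", "C", +1), ("C", "X", -1)]

# Schematic oxygen transfer from a metal alkyl peroxide to an alkene:
# the transferred oxygen O breaks two bonds and forms two new ones.
epoxidation = [("O", "M", -1), ("O", "O'", -1),
               ("O", "C1", +1), ("O", "C2", +1),
               ("C1", "C2", -1), ("M", "O'", +1)]

# Hypothetical step forming two bonds with no compensating bond breaking
uncompensated = [("A", "B", +1), ("B", "C", +1)]

for name, step in [("SN2", sn2), ("epoxidation", epoxidation),
                   ("uncompensated", uncompensated)]:
    print(name, check_heuristics(step))
```

In this bookkeeping the substitution pattern passes both tests, the epoxidation step fails the topology test because the oxygen vertex has degree four, and the last example fails the compensation test with Δq = 2.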
As can be seen, our new rules are heuristic. Their mathematical or quantum chemical
justification is an open question and a challenge for theoretical chemists. We would also
like to mention that the lack of a rigorous mathematical definition does not prevent the
use of the concept for elementary reactions. The latter is a basis for specifying the
notions of reaction route and reaction mechanism which are the main topic of interest in
the next sections of this chapter.
3.1. BACKGROUND
The mechanism of any complex chemical reaction is a set of reagents, products, and
intermediates that incorporates ordered subsets of species related to each other by
chemical reactions (mechanism steps). It is then not surprising that chemists like to depict
reaction mechanisms in the form of diagrams and graphs. Christiansen63 was perhaps the
first to use graphs to demonstrate the difference in the mechanisms of reactions involving
open, cyclic, and mixed sequences of steps. In Christiansen's graphs edges represent the
mechanism steps, whereas vertices stand for the reactants, intermediates, and products.
A + C → B + P (3.1)
A + Z1 ⇌ Z2 + B (3.2)
Z2 + C ⇌ Z3 (3.3)
Z3 → Z1 + P (3.4)
where Z1 is a catalyst. The mechanism is depicted by the KG in Fig. 3.1, in which the
undirected edges 1 and 2 represent reversible reaction steps, while the directed edge (arc)
3 represents an irreversible step.
Figure 3.1. The kinetic graph used to depict the catalytic reaction (3.1) whose
mechanism is depicted by eqs. (3.2-3.4).
Fig. 3.2. a) The kinetic graph (KG2) used to depict a noncatalytic reaction (eq. 3.5) with
a hypothetical intermediate. The reaction mechanism is described by eqs. (3.6-3.8). The
empty graph vertex represents the placement of the zero reactant. b) A more detailed
version of kinetic graph 3.2a, showing that steps 1 and 2 are reversible by representing
each undirected edge as two arcs.
Fig. 3.3. Kinetic graph (KG3) that includes the two pendant vertices added to the
kinetic graph in Fig. 3.1 to depict the production of two nonactive intermediates.
As an example, consider in Figure 3.3 a KG constructed by adding two additional steps
to the catalytic reaction represented by Fig. 3.1
(3.9)
~ + p;:t ~p (3.10)
Thus, nonactive intermediates are produced. They contribute to the catalyst mass balance
only:
stoichiometric number. Each complete set of such numbers is a reaction route. The
number of such resulting stoichiometric equations is infinite, owing to the infinite number
of sets of stoichiometric numbers. However, it suffices for the reaction kinetics
description to obtain a set of linearly independent routes (vectors) and respective overall
equations (see Section 4 for a more detailed description of the route method). Here, we
shall mention that multiroute linear mechanisms are depicted by polycyclic KGs,
whose number of linearly independent cycles equals the number of linearly independent
routes. Examples of such graphs are given below in section 3.3.
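The number of linearly independent cycles of a kinetic graph is its cyclomatic number, E - V + C (edges minus vertices plus connected components), a standard graph-theoretical result. The short Python sketch below computes it with a union-find structure. The function name and the labels of the two pendant vertices (`Z1X`, `Z2P`) are hypothetical; the three-vertex cycle follows steps (3.2)-(3.4).

```python
def count_independent_cycles(vertices, edges):
    """Cyclomatic number E - V + C of an undirected multigraph."""
    parent = {v: v for v in vertices}

    def find(v):
        # Path-halving find for the union-find structure
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(edges) - len(parent) + components

# Single-route catalytic cycle: Z1 -> Z2 -> Z3 -> Z1
cycle_vertices = ["Z1", "Z2", "Z3"]
cycle_edges = [("Z1", "Z2"), ("Z2", "Z3"), ("Z3", "Z1")]
print(count_independent_cycles(cycle_vertices, cycle_edges))

# The same cycle with two pendant vertices attached to Z1 and Z2
pendant_vertices = cycle_vertices + ["Z1X", "Z2P"]
pendant_edges = cycle_edges + [("Z1", "Z1X"), ("Z2", "Z2P")]
print(count_independent_cycles(pendant_vertices, pendant_edges))
```

Both graphs give one independent cycle, which illustrates that pendant vertices do not add reaction routes.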
The manner in which the routes (cycles) are topologically connected can be used as a
basis for mechanism classification.
Proceeding from the one-to-one correspondence between linear mechanisms and KGs,
we proposed a hierarchical classification of these reaction mechanisms in an earlier
paper. 7 However, the studies on the enumeration of the linear mechanisms and their
computer storage and retrieval indicated the need for some changes in both the
classification and coding systems. The resulting hierarchical set of classification criteria
is as follows: 71
(i) Number of linearly independent reaction routes (KG cycles), M=1,2,3, ...
Fig. 3.4. The four basic classes of linear mechanisms (A, B, C, and Z). Class Z refers to the
nonadjacent pair of cycles 1 and 3. Substituting any loop for a cycle of arbitrary size
preserves the class.
Fig. 3.5. Illustration of different subclasses of linear mechanisms. Class Z refers to the
nonadjacent pair of cycles 1 and 3.
The linear code that results from the above classification criteria is:
(3.12)
It describes mechanisms incorporating reversible steps only; they are depicted by simple
(nondirected) kinetic graphs. The class notation in the linear code is abbreviated; it stands
for the generalized classes and contains superscripts that show the number of times this
particular type of cycle linkage occurs. Instead, one can use specific class notation,
which is not shortened, and list all pairwise cycle linkages (A, B, C, or Z) following their
canonical numbering (for examples, see Table 3.1, vide infra). Kinetic face graphs
(KFGs) are used to facilitate the canonical numbering of KG cycles, vertices, and
edges.70 Each vertex in the KFG represents a cycle (a face) in the initial KG, while a
KFG edge represents a KG cycle linkage of type A, B, or C. The lack of an edge
between two KFG vertices means no A, B, or C type of linkage for the respective pair
of cycles in KG; such mechanisms are classified as class Z.
The modifications of our previously adopted classification and coding systems include
the type of reaction mechanism, which was previously denoted in the code by the serial
number introduced for each KFG. The computer elucidation of the linear mechanisms,
however, would require that standard tables of mechanisms be stored with the serial
numbers of all KFGs, whose number increases rapidly for more complex reactions. The
retrieval of the mechanisms coded is facilitated by the use of the new class Z introduced
in the foregoing, and the general prefix n, which is equal to the number of vertices in
the smallest homeomorphic image of all KGs of the class under consideration. The new
code does not contain any symbol for the mechanism type. Yet, preserving the
mechanism type makes sense from the viewpoint of classification. Types of KGs with
increased complexity may be denoted by L = 1, 2, 3, 4, ..., an integer indicating the
total number of pairwise cycle linkages of type A, B, or C in the KG (see Table 3.1,
vide infra). The upper limit of the L value is the number of edges in the complete KFG
(L = M(M-1)/2).
An example of the use of KGs and their coding is given below for the catalytic reaction
of methanol synthesis. One of the mechanisms proposed75 incorporates two reaction
routes with a total of five reaction steps and four intermediates. Hence, it is represented
by a KG containing two cycles, four vertices, and five edges (Fig. 3.6). The mechanism
code includes the class prefix n=2 (two vertices of degree higher than 2).
Fig. 3.6. The kinetic graph and linear code (2-4-2-C-2,4) of the reaction described by the
stoichiometric equations 3.18 and 3.19 and by mechanistic steps 3.13-3.17.
(1) Z·H2O + CO2 ⇌ Z·H2O·CO2 (3.13)
(2) Z·H2O·CO2 ⇌ Z·CO2 + H2O (3.14)
(3) Z·CO2 + H2 ⇌ Z·CO2·H2 (3.15)
(4) Z·CO2·H2 + 2H2 ⇌ Z·H2O + CH3OH (3.16)
(5) Z·CO2·H2 ⇌ Z·H2O + CO (3.17)
--------------------------------------------------
CO2 + 3H2 ⇌ CH3OH + H2O (3.18)
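How a reaction route (a set of stoichiometric numbers, one per step) turns the steps into an overall equation can be checked by simple bookkeeping. The Python sketch below is an illustration only: it encodes steps (1)-(5) as written above, with species names spelled with dots as an arbitrary convention, and the two routes shown are assumptions read off the two cycles of the kinetic graph. The names `STEPS` and `overall_equation` are hypothetical.

```python
from collections import Counter

# Each step: (reactants, products) with stoichiometric coefficients.
STEPS = {
    1: ({"Z.H2O": 1, "CO2": 1}, {"Z.H2O.CO2": 1}),
    2: ({"Z.H2O.CO2": 1}, {"Z.CO2": 1, "H2O": 1}),
    3: ({"Z.CO2": 1, "H2": 1}, {"Z.CO2.H2": 1}),
    4: ({"Z.CO2.H2": 1, "H2": 2}, {"Z.H2O": 1, "CH3OH": 1}),
    5: ({"Z.CO2.H2": 1}, {"Z.H2O": 1, "CO": 1}),
}

def overall_equation(route):
    """route maps step number -> stoichiometric number.

    Returns the net change of each species (negative = consumed);
    species that cancel, such as the intermediates, are dropped.
    """
    net = Counter()
    for step, nu in route.items():
        reactants, products = STEPS[step]
        for species, n in reactants.items():
            net[species] -= nu * n
        for species, n in products.items():
            net[species] += nu * n
    return {species: n for species, n in net.items() if n != 0}

route_1 = {1: 1, 2: 1, 3: 1, 4: 1}   # cycle through step 4
route_2 = {1: 1, 2: 1, 3: 1, 5: 1}   # cycle through step 5

print(overall_equation(route_1))
print(overall_equation(route_2))
```

With these assumptions, the first route reproduces eq. (3.18), CO2 + 3H2 giving CH3OH + H2O, and all four intermediates cancel. The second route gives CO2 + H2 giving CO + H2O, which is presumably the second overall equation referred to in the caption of Fig. 3.6.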
For digraphs, which refer to mechanisms containing irreversible steps, the code is
supplemented by all edge types, E_j, listed according to their canonical numbering. There
are three possibilities for a step direction; it is either forward, reverse, or both. These
three types of mechanism steps are denoted by i, i and e, respectively. To make the
code unique, the priority order i < i < e and the minimum code criterion are used.
In the case of KGs with pendant vertices (Section 3.2.3), the code also incorporates the
total number of such vertices, N_p, and, in increasing order, the numbers n_i of the base
vertices to which the pendant vertices P are connected. Hence, the code of a linear
mechanism containing irreversible steps and pendant vertices is
(3.20)
As an illustration of the code thus extended, we may present the codes corresponding to
the mechanism with one irreversible step depicted by Fig. 3.1, 1-3-0-3 - e, e, i, and to its
extension with two pendant vertices depicted by Fig. 3.3, 1-3-0-3 - e, e, i - 2: 1, 2. Another
illustration is Table 3.1, which contains the codes of all 111 classes of linear mechanisms
with four routes. Detailed tables with all linear mechanisms involving 1-4 routes and 2-6
intermediates are given elsewhere.67,70,71
The mechanism code described above can be converted into a convenient complexity
index by making use of the spanning trees of the KGs and some of their subgraphs (a
spanning tree is a connected acyclic subgraph that contains all the vertices of the initial graph). 6
The mechanism complexity thus evaluated parallels the hierarchical ordering of mechanisms
into types, classes, and subclasses. Topological patterns that increase or preserve
complexity were analyzed and presented in complexity flowcharts of potential use in the
computerized elucidation of reaction mechanisms.71
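The exact form of the complexity index is given in the cited work and is not reproduced here. As a hedged illustration of its main ingredient, the Python sketch below only counts the spanning trees of a kinetic graph, using Kirchhoff's matrix-tree theorem (the count equals any cofactor of the graph Laplacian). The function name is hypothetical, and the second example is the two-route graph implied by steps (3.13)-(3.17), in which steps 4 and 5 form parallel edges.

```python
from fractions import Fraction

def spanning_tree_count(vertices, edges):
    """Number of spanning trees of an undirected multigraph (matrix-tree theorem)."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    laplacian = [[Fraction(0)] * n for _ in range(n)]
    for a, b in edges:
        if a == b:
            continue  # a loop never belongs to a spanning tree
        i, j = index[a], index[b]
        laplacian[i][i] += 1
        laplacian[j][j] += 1
        laplacian[i][j] -= 1
        laplacian[j][i] -= 1
    # Delete the last row and column, then take the determinant exactly.
    m = [row[:-1] for row in laplacian[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)

# One-route cycle of three intermediates
triangle = spanning_tree_count(["Z1", "Z2", "Z3"],
                               [("Z1", "Z2"), ("Z2", "Z3"), ("Z3", "Z1")])

# Two-route graph: a four-membered cycle with one doubled edge (steps 4 and 5)
two_route = spanning_tree_count(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "a")])

print(triangle, two_route)
```

The count grows from 3 for the single cycle to 7 for the two-route graph, in line with the idea that the more complex mechanism has more spanning trees.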
Table 3.1. All Classes of Linear Mechanisms with Four Reaction Routes
L = 3
  1  4-A³Z³        2  5-A³Z³        3  6-A³Z³        4  4-A²Z²AZ
  5  5-A²Z²AZ      6  6-A²Z²AZ      7  4-A²BZ³       8  5-A²BZ³
  9  4-A²Z²BZ     10  5-A²Z²BZ     11  5-ABZ³A      12  4-AB²Z³
 13  4-ABZ²B      14  4-ABZ³B      15  5-ABZ³C      16  5-ABZ²CZ
 17  5-ABCZ³      18  5-ACZ³B      19  5-A²Z²CZ     20  6-A²Z²CZ
 21  5-A²CZ³      22  6-A²CZ³      23  6-ACZ³A      24  6-ACZ³C
 25  6-ACZ²CZ     26  6-AC²Z³      27  3-B²Z²BZ     28  3-B³Z³
 29  4-B²Z²CZ     30  4-B²CZ³      31  4-BCZ³B      32  5-BCZ³C
 33  5-BCZ²CZ     34  5-BC²Z³      35  6-C²Z²CZ     36  6-C³Z³
L = 4
 37  5-A⁴Z²       38  6-A⁴Z²       39  3-A³BZ²      40  4-A³BZ²
 41  5-A²BAZ²     42  4-A²BZAZ     43  3-A²B²Z²     44  3-A²ZB²Z
 45  4-A³CZ²      46  5-A³CZ²      47  6-A²CAZ²     48  5-A²CZAZ
 49  3-AB²Z²B     50  3-A²BCZ²     51  4-A²ZBCZ     52  4-A²ZCBZ
 53  4-A²CBZ²     54  4-AB²Z²C     55  4-ABCZ²B     56  5-A²ZC²Z
 57  4-A²C²Z²     58  5-ABCZ²C     59  5-AC²Z²B     60  4-AC²Z²C
 61  6-AC²Z²C     62  2-B⁴Z²       63  3-B³CZ²      64  3-B²CBZ²
 65  3-B²CZBZ     66  4-B²Z²C²     67  4-B²CZCZ     68  4-B²C²Z²
 69  4-BC²Z²B     70  5-BCZCZC     71  3-BC²Z²C     72  5-BC²Z²C
 73  5-BC³Z²      74  4-C⁴Z²       75  6-C⁴Z²
L = 5
 76  3-A²BZA²     77  4-A²CZA²     78  4-A2CZ;zA2   79  3-A²ZCB²
 80  4-A²ZCBC     81  5-A²ZC³      82  2-B²CZB²     83  3-B²CZBC
 84  4-B²ZC³      85  4-B²CZC²     86  4-BC³ZB      87  5-BC³ZC
 88  6-C⁵Z
L = 6
 89  5-A⁶         90  6-A⁶         91  4-A⁵B        92  5-A⁵C
 93  2-A²B²C²     94  2-A³B³       95  3-A²BCA²     96  4-A²C²A²
 97  3-A³B²C      98  4-A³BC²      99  3-A³C³      100  1-B⁶
101  2-B⁵C       102  3-B⁴C²      103  3-B²C²B²    104  3-B³C³
105  3-B²CBC²    106  4-B²C²BC    107  3-B²C⁴      108  4-B²C⁴
109  5-BC⁴B      110  2-C⁶        111  6-C⁶
All mechanisms having up to six reaction routes and up to 12 vertices were enumerated,
except for M = 6 with N = 11 and N = 12, for which the computational time was
unreasonably high (Table 3.2). The number of classes was also enumerated (Table 3.3).
We found that, at a constant number of reaction routes and an increasing number of
intermediates, the number of classes passes through a maximum and its distribution is
close to normal. Both tables give evidence for the potential existence of a
260 O. N. TEMKIN ET AL.
M\N       2       3       4       5       6       7
2         1       2       4       7      10      14
3         1       3      12      27      65     129
4         1       5      23      85     276     764
5         1       6      43     210     924    3403
6         1       8      72     469    2652   12644

M\N       8       9      10      11      12
2        19      24      30      37      44
3       245     422     710    1113    1710
4      1935    4466    9583   19291   36859
5     11242   33156   89789  224621  526346
6     52727  194909  651008      CE      CE
CE - combinatorial explosion

M\N       8       9      10      11      12
2         0       0       0       0       0
3         0       0       0       0       0
4        11       4       1       0       0
5       250     153      77      26       7
6      2800    3082    2576      CE      CE
*N for a class includes vertices with . ~ 2. as well as all loops.
tremendously large variety of topologically distinct linear mechanisms. This result is in
sharp contrast to some estimates based on mechanistic chemical, but not topological,
information. 0,81 Evidently, in order to be complete, any mechanism enumeration should
take into account all possible interrelations of reactants, elementary steps, and reaction
routes. Besides the incompleteness of the purely chemical approach, such comparisons
may also indicate that some mechanisms that are topologically allowed might be
chemically forbidden. The elucidation of this important question needs further study.
It should be mentioned that the enumeration reported in Table 3.2 is also incomplete: it
refers to mechanisms containing only reversible steps. Indeed, a specified number of
mechanisms with irreversible elementary reactions can be deduced for each of the
mechanisms counted in Table 3.2. Graph-theoretically, this is the problem of counting
the digraphs (graphs containing both arcs and edges, also called "mixed graphs") that
correspond to a certain nondirected graph, i.e., the edge coloring problem. However,
this problem is complicated by the fact that some digraphs do not correspond to any
mechanism. Another extension of the enumeration procedure may handle mechanisms
with reaction intermediates that are involved in equilibrium steps only. In terms of graph
theory, this problem can be reformulated as counting the number of graphs with pendant
vertices that correspond to each of the digraphs of interest. Finally, after the exhaustive
topological enumeration described above, one could search for procedures that would
produce an even larger number of theoretically possible mechanisms by accounting for
their chemical specificity. Different classes of chemical reactions or reactants may be
incorporated into our enumeration scheme by regarding graphs with weighted edges
and/or vertices. The results obtained by all these calculations will be the subject of a future
publication.82
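The counting problem mentioned above can be illustrated by brute force on a very small graph. The sketch below assumes a simple nondirected graph (no multiple edges), lets every edge either stay an edge (reversible step) or become one of its two arcs (irreversible step), and counts the resulting mixed graphs up to the automorphisms of the underlying graph. It gives only an upper bound for the number of mechanisms, since it does not filter out the digraphs that correspond to no mechanism. All names are hypothetical.

```python
from itertools import permutations, product

def automorphisms(n_vertices, edges):
    """All vertex permutations that map the nondirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return [
        perm for perm in permutations(range(n_vertices))
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set
    ]

def count_mixed_graphs(n_vertices, edges):
    """Count mixed graphs on a simple graph, up to its automorphisms.

    A reversible step is stored as a pair of opposite arcs,
    an irreversible step as a single arc.
    """
    autos = automorphisms(n_vertices, edges)
    seen = set()
    for states in product(("both", "forward", "backward"), repeat=len(edges)):
        arcs = set()
        for (u, v), state in zip(edges, states):
            if state != "backward":
                arcs.add((u, v))
            if state != "forward":
                arcs.add((v, u))
        # Canonical representative: the smallest image under any automorphism.
        canonical = min(
            tuple(sorted((perm[a], perm[b]) for a, b in arcs)) for perm in autos
        )
        seen.add(canonical)
    return len(seen)

triangle = [(0, 1), (1, 2), (2, 0)]   # hypothetical three-intermediate cycle
print(count_mixed_graphs(3, triangle))
```

The enumeration grows as 3 to the power of the number of edges and is therefore practical only for small graphs; for larger ones an orbit-counting argument would be preferable.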
In this section we develop another version of the latter approach in which we deal with
the set of intermediates instead of the complete set of reactants. Thus, Vol'pert graphs
are used in the space of intermediates only. Removal of reactants and products enables
the unified handling of both linear and nonlinear mechanisms.67 An additional advantage
of our approach is that the space of intermediates alone can indicate the mechanistic
topology. In dealing with the latter, we take into account the mechanism's basic structure
and omit the superfluous chemical details, which make the procedure concrete.
Fig. 4.1. The complete set of elementary reaction types and their bipartite graphs. Open
circles indicate intermediates; solid circles indicate elementary reactions.
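Fig. 4.2. The bipartite graph depicting the simplified Benson mechanism of ethane
pyrolysis,89 presented by eqs. 4.1-4.6

$\mathrm{C_2H_6} \rightarrow 2\,\mathrm{CH_3^{\bullet}}$ (4.1)
$\mathrm{CH_3^{\bullet}} + \mathrm{C_2H_6} \rightarrow \mathrm{CH_4} + \mathrm{C_2H_5^{\bullet}}$ (4.2)
$\mathrm{H^{\bullet}} + \mathrm{C_2H_6} \rightleftharpoons \mathrm{C_2H_5^{\bullet}} + \mathrm{H_2}$ (4.4)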
It should be noted that Fig. 4.2 considers elementary reactions (with definite directions)
rather than elementary steps. As seen in Fig. 4.2, double edges appear in the
BG for any reaction that includes the same two products or educts. However, loops are
prohibited. One may therefore summarize that the BG of a reaction mechanism is a
directed bipartite graph with multiple edges and no loops.
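A minimal sketch of how such a BG can be assembled in the space of intermediates is given below. It uses only reactions (4.1), (4.2), and (4.4) of the ethane pyrolysis scheme; the species labels, the labels "4.4f" and "4.4r" for the two directions of reaction (4.4), and the function name are hypothetical. Arcs are stored with their multiplicities, so the double edge produced by reaction (4.1) appears as a multiplicity of 2.

```python
from collections import Counter

# Intermediates of the scheme (radicals written without the dot).
INTERMEDIATES = {"CH3", "C2H5", "H"}

# Each elementary reaction: (educts, products); repeated species are listed twice.
REACTIONS = {
    "4.1": (["C2H6"], ["CH3", "CH3"]),
    "4.2": (["CH3", "C2H6"], ["CH4", "C2H5"]),
    "4.4f": (["H", "C2H6"], ["C2H5", "H2"]),   # forward direction of (4.4)
    "4.4r": (["C2H5", "H2"], ["H", "C2H6"]),   # reverse direction of (4.4)
}

def bipartite_arcs(reactions, intermediates):
    """Return arc multiplicities of the BG restricted to intermediates."""
    arcs = Counter()
    for name, (educts, products) in reactions.items():
        for species in educts:
            if species in intermediates:
                arcs[(species, name)] += 1   # intermediate -> reaction vertex
        for species in products:
            if species in intermediates:
                arcs[(name, species)] += 1   # reaction vertex -> intermediate
    return arcs

arcs = bipartite_arcs(REACTIONS, INTERMEDIATES)
for (tail, head), multiplicity in sorted(arcs.items()):
    print(tail, "->", head, "x", multiplicity)
```

Because every arc joins a species vertex and a reaction vertex, the result is bipartite and loop-free by construction.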
$M = s - \operatorname{rank} B_X$ (4.7)
It was shown earlier for linear reaction mechanisms72 that some of the routes represent
cycles in Temkin's kinetic graphs. Hence, the number of independent routes p can be
obtained from the cyclomatic number of the graph by the equation
$\beta_1(G) = |E(G)| - |V(G)| + \beta_0(G)$ (4.8)
Here, E(G) and V(G) are the graph edge set and vertex set, respectively; |E(G)| and
|V(G)| are the total numbers of edges and vertices, respectively; $\beta_0(G)$ is the number of
components of G, whereas $\beta_1(G)$ is the cyclomatic number of G. 9,91 Proceeding from
eqs. (4.7) and (4.8), and taking $p = \beta_1(G)$ and $s = |E(G)|$, one obtains for linear
mechanisms (described by connected KGs, for which $\beta_0(G) = 1$)
$\operatorname{rank} B_X = |V(G)| - 1$ (4.9)
Keeping in mind that |V(G)| is the number of intermediates, one arrives at the chemical
interpretation of this fact: there is one material balance equation for intermediates, which
makes the number of linearly independent intermediates one less than their total number
|V(G)|. Eq. (4.9) can also be obtained from the theorem for the rank of the incidence
matrix B,59 after recognizing the $B_X$ matrix as an incidence matrix.
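The relation between Horiuti's rule and the cyclomatic number can be checked numerically on a small example. The sketch below assumes a hypothetical connected kinetic graph with four intermediates and five steps, builds its incidence matrix (one row per step, one column per intermediate), and compares s - rank B_X with |E(G)| - |V(G)| + 1. The rank is computed by exact Gaussian elimination; all names are illustrative.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a small integer matrix by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def incidence_matrix(n_vertices, edges):
    """One row per edge (step): -1 at the tail, +1 at the head."""
    matrix = []
    for tail, head in edges:
        row = [0] * n_vertices
        row[tail] -= 1
        row[head] += 1
        matrix.append(row)
    return matrix

# Hypothetical connected kinetic graph: 4 intermediates, 5 steps, 2 cycles.
n_vertices = 4
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]

B_X = incidence_matrix(n_vertices, edges)
s = len(edges)
M = s - rank(B_X)                  # Horiuti's rule, eq. (4.7)
beta_1 = s - n_vertices + 1        # eq. (4.8) with one component
print(rank(B_X), M, beta_1)
```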
Some of the reaction routes are of special interest, because they cannot be subdivided
into simpler routes and form a finite set. This was pointed out by Milner,92 who called
them direct paths. These were later used by Happel and Sellers,81,93 and in our recent
work. 94 In this chapter, we call them simple routes and discuss them in more detail.
                          P1   P2   P3   P4   P5
(1) X1 + A → X2            1    0    1    2    3
(2) X2 + B ⇌ X1 + C        1   -1    0    1    2
(3) X2 + D → X1 + E        0    1    1    1    1

$B_X = \begin{pmatrix} -1 & 1 \\ 1 & -1 \\ 1 & -1 \end{pmatrix}$ (rows: steps 1, 2, 3; columns: X1, X2)
It is seen that routes P1, P2, and P3 are simple and that they have the overall route equations:
(P1) A + B → C
(P2) C + D → E + B
(P3) A + D → E
Each pair of the elementary reactions (1,2), (-2,3), and (1,3), if regarded as a
mechanism, builds one route and cannot be divided further into two or more other routes.
However, only two of these three routes are independent ($M = s - \operatorname{rank} B_X = 2$).
On the other hand, routes P4 and P5 are not simple because they can be divided further
into simple ones. Thus, a part of the overall mechanism (called here "a submechanism")
could correspond either to a simple route, to a complex route, or to no route at all.94
Consider now the ith submechanism with its own stoichiometric matrix of intermediates. In order to
define this submechanism class we need to find $M_i$ according to Horiuti's rule. If $M_i > 1$,
then the submechanism could be decomposed into simpler ones. If $M_i = 1$, it is
regarded as simple (further along in this text simple mechanisms will be called simple
routes). The complete set of simple routes can be enumerated and generated for any
reaction mechanism.94 Such sets are called here trivial sets of reaction routes.
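For the three-step example above, the simple routes can be recovered by a brute-force search over submechanisms. The sketch below is not the enumeration method of ref. 94; it merely applies Horiuti's rule to every subset of steps and, as an assumed working criterion, keeps a subset when it has exactly one route and loses that route whenever any single step is removed. The matrix rows follow steps (1)-(3) with columns X1 and X2; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def rank(matrix):
    """Rank of a small integer matrix by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Stoichiometric rows of the intermediates (X1, X2) for steps 1, 2, 3.
B_X = {1: (-1, 1), 2: (1, -1), 3: (1, -1)}

def n_routes(steps):
    """Horiuti's rule M = s - rank B_X for the submechanism built from `steps`."""
    return len(steps) - rank([B_X[k] for k in steps])

def simple_submechanisms():
    """Subsets of steps with one route that vanishes when any step is dropped."""
    found = []
    for size in range(1, len(B_X) + 1):
        for steps in combinations(sorted(B_X), size):
            if n_routes(steps) == 1 and all(
                n_routes([k for k in steps if k != dropped]) == 0
                for dropped in steps
            ):
                found.append(steps)
    return found

print(simple_submechanisms())
print(n_routes((1, 2, 3)))
```

The reversible step (2) is treated here as a single step, so the pair written as (-2,3) in the text appears as the subset (2, 3).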
4.2.3. Reduced Adjacency Matrix. The Shrinking Procedure
For undirected BGs we defined a reduced adjacency matrix (RAM), $A' = \|a'_{ij}\|$, whose
rows and columns are indexed by the two vertex subsets of the BG. Its entries are
$a'_{ij} = 1$ if vertex $v_i$ of one subset is adjacent to vertex $v_j$ of the other subset,
and $a'_{ij} = 0$ otherwise. For directed BGs the RAM, $A'' = \|a''_{ij}\|$, is defined so as to
account for the adjacency direction:
$a''_{ij} = -1$, if $v_i$ is adjacent to $v_j$, and
Keeping in mind that the two vertex subsets represent the intermediates and the steps, respectively,
whereas the absolute value of a RAM entry equals the multiplicity of the respective
directed edge (arc), one arrives at a one-to-one correspondence between the directed
bipartite multigraph and the stoichiometric matrix $B_X$. Hence, an operation on graphs can
be defined that transforms the graph of a route to a V-vertex graph. Clearly, this
operation includes contracting of V-type vertices. If now a sum of two equations is taken,
this operation will correspond to the removal of two v-vertices and to the addition of one
new v-type vertex. The procedure, which we call shrinking, is illustrated by eqs. 4.10-4.13
and Fig. 4.3:
Fig. 4.3. Two-step contracting procedure (shrinking) for bipartite graphs: a) graph of
reactions (1) and (2) before the contracting; b) graph of the overall reaction (1-2) before
the contracting; c) graph of the overall reaction with identical intermediates cancelled.
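The two stages of shrinking can be mimicked on the level of the reaction equations themselves. In the sketch below, stage 1 sums two reactions into one overall reaction (two reaction vertices are replaced by one), and stage 2 cancels the intermediates that appear on both sides. The two reactions used are hypothetical placeholders, not those of eqs. 4.10-4.13, and the function name is illustrative.

```python
from collections import Counter

def shrink(*reactions):
    """Sum reactions given as (educts, products) and cancel common species.

    Stage 1: merge the reactions into one overall reaction.
    Stage 2: cancel species occurring on both sides (multiset intersection).
    """
    left, right = Counter(), Counter()
    for educts, products in reactions:
        left.update(educts)
        right.update(products)
    common = left & right
    return left - common, right - common

# Hypothetical reactions: (1) X1 -> 2 X2 and (2) X2 -> X3.
reaction_1 = (["X1"], ["X2", "X2"])
reaction_2 = (["X2"], ["X3"])

left, right = shrink(reaction_1, reaction_2)
print(dict(left), "->", dict(right))
```

Using multisets keeps track of stoichiometric coefficients, so that only one of the two X2 species produced in reaction (1) is cancelled against the X2 consumed in reaction (2).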
(1) (4.10)
(2) (4.11)
Features                     KG                                     BG
Graphical characteristics    directed or nondirected multigraphs    directed bipartite multigraphs
Application                  linear mechanisms                      both linear and nonlinear mechanisms
Simple routes are a convenient basis for a common classification of both linear and
nonlinear mechanisms. The face graph (FG) derived from the BG of the mechanism
plays a central role in this procedure. FG vertices represent simple routes, whereas FG
edges stand for the simple route linkages. These FGs differ from those proposed earlier
(under the name supergraphs) for KGs,67,69,70 because generally simple routes are not
identical with KG routes. Yet, the general classification of reaction mechanisms follows
most of the principles of the classification of linear mechanisms. Thus, the first
classification criterion is the number of simple routes, instead of the number of linearly
independent routes, both being expressed by the number of vertices in the respective
face graph. A mechanism type is introduced according to the total number of edges in
the FG, E = 1, 2, 3, ... Then, the different nature of the FG edges determines the
mechanism class.
Consider in more detail the possible linkages between simple routes. Nine elementary
reaction types were shown in Fig. 4.1. They may be labeled by the consecutive letters
k, l, ..., t, omitting o. Hence, each FG edge can be coded by the term <n|k, l,
...>. If two simple routes have no common steps, the right part of the edge label is
empty: <n|0>. For example, the term <6|k^2pl> means that two simple routes are
linked by a common subgraph with six vertices via the elementary reactions of types k,
k, p, and l.
Indeed, the classification follows the increasing number n, and, for each n value, the
lexicographic priority order is used (k < l < m ...).
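The ordering rule just stated can be sketched as a sort key. The snippet assumes that an edge code is stored as a pair (n, string of type letters) and that, within one n value, codes are compared letter by letter after their letters have been put into the priority order k < l < m < ... < t; how ties between codes of different length are resolved is an assumption of this sketch, as are all names.

```python
TYPE_LETTERS = "klmnpqrst"   # nine elementary reaction types, 'o' omitted

def edge_key(code):
    """Sort key for an FG edge code (n, letters): first n, then the letters."""
    n, letters = code
    ranks = sorted(TYPE_LETTERS.index(letter) for letter in letters)
    return (n, ranks)

# Hypothetical edge codes; (6, "kkpl") corresponds to the term <6|k^2pl>.
codes = [(6, "kkpl"), (2, ""), (4, "l"), (4, "k"), (6, "kkl")]
ordered = sorted(codes, key=edge_key)
print(ordered)
```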
In order to complete the classification and coding procedure one also needs to classify
the simple routes (the FG vertices). This is related to the difficult task of enumerating
simple routes, which is not yet solved. On the other hand, the enumeration of the FGs
is actually the enumeration of simple connected graphs, which is discussed by Harary and
Palmer.95
Fig. 4.4. The bipartite graph depicting the mechanism of butane hydrogenation.
                                   N(1)  N(2)  N(3)  N(4)
(2)  ZC4H10 → ZC4H8 + H2             1     1     0     0
(3)  ZC4H8 → ZC4H6 + H2              0     1     0     0
(4)  ZC4H8 → Z + C4H8                1     0     0     2
(5)  ZC4H6 → Z + C4H6                0     1     1     0
(-5) Z + C4H6 → ZC4H6                0     0     1     1
(6)  ZC4H6 + ZC4H10 → 2ZC4H8         0     0     0     1
The number of simple routes in this mechanism, determined by the method of Zeigarnik
and Temkin,94 was found to be equal to the number of linearly independent routes.
These four simple routes are the sets of stoichiometric numbers given in the series N(1),
N(2), N(3), and N(4). Then, the FG of this mechanism can be built with these four simple
routes as its vertices.
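That the four series of stoichiometric numbers are indeed routes can be checked by verifying that each of them cancels all intermediates. The sketch below does this for the steps listed above. Since step (1) is not legible in the table, an adsorption step Z + C4H10 → ZC4H10 with stoichiometric numbers 1, 1, 0, 1 is assumed here purely for the purpose of the check. The list of steps shared by each pair of routes is printed as well; it is only a rough indication of possible FG linkages, not the FG of the original figure. All names are hypothetical.

```python
# Change of each intermediate (Z and the adsorbed species) per step.
STEPS = {
    "1": {"Z": -1, "ZC4H10": +1},              # ASSUMED step, not in the table
    "2": {"ZC4H10": -1, "ZC4H8": +1},
    "3": {"ZC4H8": -1, "ZC4H6": +1},
    "4": {"ZC4H8": -1, "Z": +1},
    "5": {"ZC4H6": -1, "Z": +1},
    "-5": {"Z": -1, "ZC4H6": +1},
    "6": {"ZC4H6": -1, "ZC4H10": -1, "ZC4H8": +2},
}

# Stoichiometric numbers of the four simple routes (entries for step "1" assumed).
ROUTES = {
    "N(1)": {"1": 1, "2": 1, "4": 1},
    "N(2)": {"1": 1, "2": 1, "3": 1, "5": 1},
    "N(3)": {"5": 1, "-5": 1},
    "N(4)": {"1": 1, "4": 2, "-5": 1, "6": 1},
}

def net_intermediates(route):
    """Net production of each intermediate along a route (empty if balanced)."""
    total = {}
    for step, nu in route.items():
        for species, coeff in STEPS[step].items():
            total[species] = total.get(species, 0) + nu * coeff
    return {species: c for species, c in total.items() if c != 0}

def shared_steps(routes):
    """Steps common to each pair of routes, for pairs that share any."""
    names = sorted(routes)
    pairs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = sorted(set(routes[a]) & set(routes[b]))
            if common:
                pairs[(a, b)] = common
    return pairs

for name, route in ROUTES.items():
    print(name, net_intermediates(route))
print(shared_steps(ROUTES))
```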
Chemical reaction mechanism, in its most general context, implies the interpretation of
all available experimental facts and theoretical estimates related to a certain complex
reaction. When the sequence of reaction steps (or, otherwise, the reaction scheme) is
depicted as a graph (KG, BG), one obtains a structure that mirrors the interrelations of
intermediates and the connectedness of reaction routes in the space of intermediates. In
the case of linear mechanisms, the graph representation provides the sequence of reactant
transformations92 without making use of any chemical information, and without any
discrimination of "chemical" hypotheses. The accumulated knowledge on reaction
mechanisms evidences in a convincing way98 that there is no hierarchy of the different
stages of mechanism studies and description of the type "reaction scheme" - "reaction
mechanism". The two types of mechanism information are interrelated, and obtaining just
one of them for any class of reactions is simply impossible (the only exception is the
class of simple one-route reactions).
Topological information mirrors the mechanistic topological structure (MTS), i.e., the
reactant interrelations, the number and kinds of reaction routes, and their mutual
connectedness. The MTS identification can be done by making use of techniques such
as chemical kinetics, isotope studies, chemical modeling of steps and intermediates,
physicochemical analysis, etc. Taking into account (i) the use of FGs (both for KGs and
BGs) as a topological basis of the classification of mechanism types, (ii) the topological
invariants introduced for the mechanism classes, and (iii) the topological nature of the
mechanistic classification proposed by Christiansen63 (open, closed, and mixed sequence
of steps), we proposed to call such a mechanistic structure topological (MTS).
Physicochemical information reflects intermediate composition and structure at different
levels of elaboration, their reactivity (rate constants, reversibility, the presence of fast
and slow steps), and the structure of transition states.
Consequently, this section deals with those aspects of the mechanistic studies of complex
reactions that are most closely related to the concept of mechanistic topological structure
and, first of all, with the graph-theoretical analysis of the mechanistic types.
Depending on the presence or absence of a substance that speeds up the reaction but is
not included in the stoichiometric equation, one can distinguish between catalytic and
noncatalytic reactions. The latter can be divided into conjugated and nonconjugated
reactions depending on the presence or absence of route interdependence. In turn,
conjugated reactions can be chain or nonchain ones, the criterion being the type of the
sequences of intermediate transformations, or SIT (open, closed, and mixed ones,
(ii) any features of possible topological classes in the different SITs (by analogy with the
classes of topological identifiers for elementary reactions).
The detailed analysis of these questions will be published elsewhere. 100 Here, however, we
will briefly comment on the preliminary results that are presented in Table 5.1 for
mechanisms with one or two linearly independent routes:
(1) Conjugated and chain processes, differing from the nonconjugated noncatalytic
processes and the catalytic ones, have not less than two routes.
(2) Within the topology of Temkin's cyclic KGs, there is no difference between
noncatalytic, catalytic, and chain processes. A special class C (see section 3) is singled
out for conjugated reactions only.
(3) The major difference in topology of these four types of reactions is in the nature of
their SITs (i.e., in the nature of KG cycles).
(4) The SITs of noncatalytic and noncatalytic conjugated reactions belong to linear,
bilinear, or branched linear topological classes.
(5) The SITs of catalytic reactions are classified into cyclic or polycyclic topological
classes.
(6) The SITs of chain processes correspond to mixed branched-cyclic topological classes.
(7) Chain reactions are a specific type of conjugated reaction, one of the routes of which
is transformed into a cyclic SIT. Alternatively, chain reactions may be regarded as a
specific type of catalytic process (nonideal catalysis), one of the routes of which turns
into an open SIT (a route with linear topology).
It is interesting to note that, while they differ by many classification criteria, catalytic,
noncatalytic, and chain reactions belong to the same equivalence class when the criterion
used is the SIT nature (open or closed) or the mechanism topological class (linear, cyclic
or combined).
Table 5.1. Topological characteristics of the mechanisms of complex reactions. Examples with one-route and two-route
mechanisms
N   Reaction type   Number of routes M   KG vertex type   KG class   SIT type63    SIT topology   SIT depiction
                                                          C          closed        cyclic
4   Chain           ≥2                   zero-vertex      8          open-closed   mixed
6. References
7. Acknowledgments. This work was partially supported by the Russian Foundation for
Fundamental Research Grant no. 93-03-18050. D. Bonchev gratefully acknowledges the
hospitality of Dr. W. A. Seitz (Galveston) and Dr. C. F. Mountain (Houston) during his
sabbatical, as well as the support of the Welch Foundation, Houston, Texas.
INDEX