Computational Methods in Medicinal Chemistry

Digital Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Pharmacy 201
Computational Methods in
Medicinal Chemistry
Mechanistic Investigations and Virtual Screening
Development
FREDRIK SVENSSON
ACTA
UNIVERSITATIS
UPSALIENSIS ISSN 1651-6192
ISBN 978-91-554-9293-9
UPPSALA urn:nbn:se:uu:diva-259443
2015
Dissertation presented at Uppsala University to be publicly examined in A1:107a, BMC,
Husargatan 3, Uppsala, Friday, 25 September 2015 at 09:15 for the degree of Doctor of
Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty
examiner: Professor Anna Linusson (Umeå University, Department of Chemistry).
Abstract
Svensson, F. 2015. Computational Methods in Medicinal Chemistry. Mechanistic
Investigations and Virtual Screening Development. Digital Comprehensive Summaries of
Uppsala Dissertations from the Faculty of Pharmacy 201. 65 pp. Uppsala: Acta Universitatis
Upsaliensis. ISBN 978-91-554-9293-9.
Computational methods have become an integral part of drug development and can help bring
new and better drugs to the market faster. The process of predicting the biological activity of
large compound collections is known as virtual screening, and has been instrumental in the
development of several drugs on the market today. Computational methods can also be used to
elucidate the energies associated with chemical reactivity and predict how to improve a synthetic
protocol. These two applications of computational medicinal chemistry are the focus of this thesis.
In the first part of this work, quantum mechanics has been used to probe the energy surface of
palladium(II)-catalyzed decarboxylative reactions in order to gain a better understanding of these
systems (papers I-III). These studies have mapped the reaction pathways and made accurate
predictions that were verified experimentally.
The other focus of this work has been to develop virtual screening methodology. Our first
study in the area (paper IV) investigated whether the results from several virtual screening methods
could be combined using data fusion techniques in order to get a more consistent result and
better performance. The study showed that the results obtained from data fusion were more
consistent than the results from any single method. For several targets, the data fusion methods
also performed better than any of the included single methods.
Next, we developed a dataset suitable for evaluating the performance of virtual screening
methods when applied to large compound collections as a replacement for or complement to
high-throughput screening (paper V). This is the first benchmark dataset of its kind.
Finally, a method for using computationally derived reaction coordinates as a basis for virtual
screening was developed. The aim was to find inhibitors that resemble key steps in the
mechanism (paper VI). This initial proof-of-concept study managed to locate several known
reaction mimetics and one previously unreported reaction mimetic against insulin-regulated
aminopeptidase.
Keywords: DFT, IRAP, Virtual Screening, Catalysis, Palladium
Fredrik Svensson, Department of Medicinal Chemistry, Organic Pharmaceutical Chemistry,

Box 574, Uppsala University, SE-75123 Uppsala, Sweden.
© Fredrik Svensson 2015
urn:nbn:se:uu:diva-259443 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-259443)
“A really good theoretical chemist
can obtain right answers with wrong
models.”
-J. H. Van Vleck
List of Papers
This thesis is based on the following papers, which are referred to in the text
by their Roman numerals.
I F. Svensson, R. Mane, J. Sävmarker, M. Larhed, C. Sköld, Theoretical and Experimental
Investigation of Palladium(II)-Catalyzed Decarboxylative Addition of Arenecarboxylic Acid
to Nitrile, Organometallics 2013, 32, 490-497
II J. Rydfjord, F. Svensson, A. Trejos, P. Sjöberg, C. Sköld, J. Sävmarker, L. Odell,
M. Larhed, Decarboxylative Palladium(II)-Catalyzed Synthesis of Aryl Amidines from Aryl
Carboxylic Acids: Development and Mechanistic Investigation, Chem. Eur. J. 2013, 19,
13803-13810
III F. Svensson, A. Fardost, B. Skillinghaug, P. Wakchaure, M. Wejdemar, M. Larhed,
C. Sköld, Mechanistic Investigation of Palladium(II)-Catalyzed Decarboxylative Synthesis
of Electron-Rich Styrenes and 1,1-Diarylethenes, Manuscript
IV F. Svensson, A. Karlén, C. Sköld, Virtual Screening Data Fusion Using Both Structure-
and Ligand-Based Methods, J. Chem. Inf. Model. 2012, 52, 225-232
V M. Lindh, F. Svensson, W. Schaal, J. Zhang, C. Sköld, P. Brandt, A. Karlén, Toward a
Benchmarking Data Set Able to Evaluate Ligand- and Structure-based Virtual Screening
Using Public HTS Data, J. Chem. Inf. Model. 2015, 55, 343-353
VI F. Svensson, K. Engen, T. Lundbäck, M. Larhed, C. Sköld, Virtual Screening for
Transition State Analogue Inhibitors of IRAP Based on Quantum Mechanically Derived
Reaction Coordinates, J. Chem. Inf. Model., In press, DOI: 10.1021/acs.jcim.5b00359
Reprints were made with permission from the respective publishers.

List of additional publications
2015 A. Belfrage, J. Gising, F. Svensson, E. Åkerblom, C. Sköld, A. Sandström, Efficient
and Selective Palladium-Catalysed C-3 Urea Couplings to 3,5-Dichloro-2(1H)-pyrazinones,
Eur. J. Org. Chem., 5, 978-986
2014 S. Borhade, U. Rosenström, J. Sävmarker, T. Lundbäck, A. Jenmalm-Jensen,
K. Sigmundsson, H. Axelsson, F. Svensson, V. Konda, C. Sköld, M. Larhed, M. Hallberg,
Inhibition of Insulin-Regulated Aminopeptidase (IRAP) by Arylsulfonamides,
ChemistryOpen, 3, 256-263
2014 B. Skillinghaug, C. Sköld, J. Rydfjord, F. Svensson, M. Behrends, J. Sävmarker,
P. J. R. Sjöberg, M. Larhed, Palladium(II)-Catalyzed Desulfitative Synthesis of Aryl
Ketones from Sodium Arylsulfinates and Nitriles; Scope, Limitations, and Mechanistic
Studies, J. Org. Chem., 79, 12018-12032
2013 J. Rydfjord, F. Svensson, M. Fagrell, J. Sävmarker, M. Thulin, M. Larhed, Temperature
measurements with two different IR sensors in a continuous-flow microwave heated system,
Beilstein J. Org. Chem., 9, 2079-2087
Contents
1 Introduction
1.1 Computational Tools in Drug Discovery
2 Reactivity and Energy
2.1 Energies of Ligand Binding
2.2 Potential Energy Surface
2.3 Transition State Theory
2.4 Molecular Mechanics
2.5 Quantum Mechanics
2.5.1 Density Functional Theory
2.5.2 Level of Theory
3 Virtual Screening
3.1 Ligand-Based Methods
3.2 Molecular Docking
3.3 Database and Target Preparation
3.3.1 Water Molecules in Virtual Screening
3.4 The Concept of Chemical Space
3.4.1 Principal Component Analysis
3.5 Benchmark Datasets
3.6 Evaluation Metrics
4 Catalysis
4.1 Palladium Catalysis
4.1.1 Decarboxylation
4.1.2 Palladium Ligands
4.1.3 Density Functional Theory Modeling of Palladium-Catalyzed Reactions
4.2 Enzyme Catalysis
4.2.1 Modeling of Enzyme Reactions
4.2.2 Cluster Approach
5 Aims of the Thesis
6 Decarboxylative Palladium(II)-Catalyzed Reactions (papers I-III)
6.1 Synthesis of Aryl Ketones
6.2 Synthesis of Aryl Amidines
6.3 Styrene Synthesis
7 Method Consensus in Virtual Screening (paper IV)
8 Evaluation of Virtual Screening Using Publicly Available High Throughput Screening Data (paper V)
9 Virtual Screening Using Computationally Derived Reaction Coordinates (paper VI)
10 Conclusions
11 Acknowledgements
12 References
Abbreviations
APN aminopeptidase N
B3LYP Becke, 3-parameter, Lee-Yang-Parr
BEDROC Boltzmann-enhanced discrimination of ROC
DEKOIS demanding evaluation kits for objective in silico screening
DFT density functional theory
DUD directory of useful decoys
EF enrichment factor
ERAP endoplasmic reticulum aminopeptidase
ESI-MS electrospray ionization mass spectrometry
FF force field
fp false positive
HTS high throughput screening
IRAP insulin-regulated aminopeptidase
IRC intrinsic reaction coordinate
LB ligand-based
MM molecular mechanics
MUV maximum unbiased validation (dataset)
PCA principal component analysis
PES potential energy surface
PDB protein data bank
QM quantum mechanics
QRC quick reaction coordinate
ROC receiver operator characteristic
SB structure-based
tp true positive
TS transition state
TST transition-state theory
VA vinyl acetate
VS virtual screening
1 Introduction
For thousands of years humans have treated ailments with remedies derived
from natural sources such as plants, fungi, insects, and animals. During the
nineteenth century it was realized that the therapeutic effects were due to
specific molecules present in these sources. Subsequent development of the
chemical industry provided new synthetic sources for these molecules, setting
the stage for the emergence of the pharmaceutical industry. Although
this process revolutionized the treatment of many diseases, the discovery of
new drugs still happened mainly by chance. Gradually, the newly invented
methods led to what we call rational drug design, the leading paradigm of
drug development today.1
1.1 Computational Tools in Drug Discovery

The rational development of new drugs is a complex process involving a
myriad of different disciplines. In nearly all of these, computational tools
serve to guide the design decisions, ultimately speeding up and reducing the
cost of development.2 In the area of medicinal chemistry, computational
methods are applied both to identify new starting points and to guide the
further development of promising substances.
Computational tools can also be applied to further our understanding of
chemical reactivity and the mechanisms behind chemical reactions. Quantum
mechanics (QM) allows for the detailed study of the electronic properties of
a molecule and the energies associated with its reactivity.
Thus, computational methods can be used to guide the decision on what
compounds to make, and to help decide how best to make these compounds.
These two factors are the focus of this thesis.
2 Reactivity and Energy
The fundamentals of thermodynamics dictate that a chemical reaction will be a spontaneous
process if it is associated with a decrease in the free energy of the system. Free energy in
the context of thermodynamics is defined as the energy available to do work, and for a process
carried out at constant temperature and pressure it is measured as the Gibbs free energy. The
change in Gibbs free energy (ΔG) of a process depends on the change in enthalpy (ΔH) and
entropy (ΔS) as well as the temperature of the system (T), according to:
\[ \Delta G = \Delta H - T \Delta S \]
The thermodynamic properties of a reaction tell us if it is energetically favorable, but they
do not determine the rate of the reaction. How fast a reaction will proceed is determined by
the activation energy (E_a) of the process. This is expressed by the Arrhenius equation as:
\[ k = A e^{-E_a / (RT)} \]
where k is the reaction rate coefficient, A is the frequency factor specific to the reaction,
R is the gas constant, and T is the temperature.
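The two relations above, ΔG = ΔH − TΔS and the Arrhenius equation k = A·exp(−E_a/(RT)), can be illustrated with a short numerical sketch. The function names and all input values below are assumptions chosen for illustration only; none of the numbers are taken from the thesis.

```python
import math

R = 8.314462618  # gas constant in J mol^-1 K^-1


def gibbs_free_energy_change(delta_h, delta_s, temperature):
    """Return dG = dH - T*dS in J/mol (dH in J/mol, dS in J/(mol K), T in K)."""
    return delta_h - temperature * delta_s


def arrhenius_rate(prefactor, activation_energy, temperature):
    """Return k = A * exp(-Ea / (R*T)); Ea in J/mol, T in K, k in the units of A."""
    return prefactor * math.exp(-activation_energy / (R * temperature))


# Hypothetical reaction: exothermic, but with an unfavorable entropy change.
dg = gibbs_free_energy_change(delta_h=-40_000.0, delta_s=-100.0, temperature=298.15)
print(f"dG = {dg / 1000:.1f} kJ/mol, spontaneous: {dg < 0}")

# Hypothetical frequency factor and activation energy; compare two temperatures.
k_cold = arrhenius_rate(prefactor=1.0e13, activation_energy=80_000.0, temperature=298.15)
k_warm = arrhenius_rate(prefactor=1.0e13, activation_energy=80_000.0, temperature=308.15)
print(f"k(308.15 K) / k(298.15 K) = {k_warm / k_cold:.2f}")
```

The sketch keeps the two questions separate: the sign of ΔG says whether the process is favorable, while the Arrhenius expression says how strongly the rate responds to temperature for a given activation energy.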
2.1 Energies of Ligand Binding

Like any process, the binding of a drug molecule to its macromolecular target only proceeds
spontaneously if the process is associated with a negative change in free energy. The enthalpy
change associated with this binding process depends on the interactions occurring between the
ligand and the protein compared to the interactions of the unbound ligand and protein with the
solvent. Possible interactions in the ligand–protein complex can be detected by, for example,
determining the structure of the complex by X-ray crystallography. Ideally, each of these
interactions would be assigned an energy term and the terms would then be summed.
Unfortunately, it has been shown that the interaction energies behave in a non-additive
fashion.3,4
Estimation of the entropy term is less straightforward, as it is dependent
on the effects of the binding event on the degrees of freedom of the ligand,
the protein, and the solvent molecules.5
The ultimate goal of computational medicinal chemistry could be thought
of as the accurate prediction of the free energy of a binding event.
2.2 Potential Energy Surface

A function of the atomic positions that returns the potential energy of the system can be
thought of as a multidimensional potential energy surface (PES). Points on the PES that have
a zero gradient, especially minima and first-order saddle points, are of particular interest.
The minima correspond to stable species, and a saddle point along the minimum energy path
separating two minima represents a transition state (TS). If the system is restricted to two
degrees of freedom, such as two bond lengths that change during a reaction, the PES can be
visualized in a graph. The PES for the palladium(II)-catalyzed decarboxylation of benzoic
acid is shown in Figure 1.
Figure 1. Calculated PES for the decarboxylation of benzoic acid. Energy is plotted
as a function of bond lengths r1 and r2.
The energy of the TS saddle point relative to the connecting minima on either side represents
the minimum amount of energy required for the transition to take place in the forward and
backward directions, respectively, which affects the rates of these reactions.
The PES is also of interest when considering the conformations of molecules. In an ensemble
of molecules, a low-energy minimum on the PES will be highly populated, with the population
decreasing as the energy increases. The probability distribution of molecules over their
possible conformational states is described by a Boltzmann distribution through:
\[ p_i = \frac{e^{-\varepsilon_i / (kT)}}{\sum_{j=1}^{N} e^{-\varepsilon_j / (kT)}} \]
where p_i is the probability of state i, ε_i is the energy of state i, k is the Boltzmann
constant, T is the temperature, and N is the number of possible states.
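As a minimal sketch of the Boltzmann distribution described above, the following snippet converts a list of relative conformer energies into populations. The three energies are invented for illustration, and energies are assumed to be given per mole (kcal/mol), so the molar gas constant takes the place of the Boltzmann constant.

```python
import math

# Boltzmann constant expressed per mole (i.e. the gas constant) in kcal mol^-1 K^-1,
# because the example energies are assumed to be molar energies.
K_MOLAR = 0.0019872041


def boltzmann_populations(energies, temperature=298.15):
    """Return the probability of each state given its energy in kcal/mol."""
    lowest = min(energies)  # shifting by the minimum avoids overflow, ratios are unchanged
    weights = [math.exp(-(e - lowest) / (K_MOLAR * temperature)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical conformers at 0.0, 0.5 and 2.0 kcal/mol above the lowest one.
populations = boltzmann_populations([0.0, 0.5, 2.0])
for energy, p in zip([0.0, 0.5, 2.0], populations):
    print(f"{energy:.1f} kcal/mol -> population {p:.3f}")
```

Subtracting the lowest energy before exponentiating does not change the result, since only energy differences enter the ratio.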
Since we are usually interested in the low energy conformations, we need
methods for locating these for a given PES. Given an arbitrary starting point
on the PES, minimization methods can find the closest local minimum point
by following the local energy gradient. Thus, the optimization of a starting
geometry can easily be achieved, yielding the conformation associated with
the downhill energy minimum from the starting conformation. However, no
method of readily locating the global energy minimum exists. In fact, locating
the global minimum with certainty usually requires that all possible conformations
are sampled. Several different approaches have been developed
for sampling the conformational space of molecules.6,7
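The difference between local and global minima can be sketched on a toy one-dimensional surface. The asymmetric double-well function, the steepest-descent step size, and the starting points below are all assumptions made for illustration; real geometry optimizers use more sophisticated schemes than plain steepest descent.

```python
def energy(x):
    """Toy asymmetric double-well potential (arbitrary units)."""
    return x**4 - 2.0 * x**2 + 0.5 * x


def gradient(x):
    """Analytical derivative of the toy potential."""
    return 4.0 * x**3 - 4.0 * x + 0.5


def minimize(x, step=0.01, tolerance=1e-8, max_steps=100_000):
    """Follow the negative gradient downhill until it (nearly) vanishes."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x


# Two starting points on either side of the barrier end up in different minima.
x_right = minimize(1.5)
x_left = minimize(-1.5)
print(f"start  1.5 -> x = {x_right:.3f}, E = {energy(x_right):.3f}")
print(f"start -1.5 -> x = {x_left:.3f}, E = {energy(x_left):.3f}")
```

Each run only finds the minimum that lies downhill from its starting point, which is why locating the global minimum requires sampling many starting conformations.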
Once located, the zero gradient point can be characterized by calculating
the Hessian matrix of the potential energy function. Although potentially
time consuming, this will give information on the local curvature of the PES:
at a minimum all eigenvalues of the Hessian are positive, whereas a first-order
saddle point has exactly one negative eigenvalue.
The Hessian matrix can also be used to calculate the force constants of the
harmonic vibrations in a molecule.
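A minimal sketch of characterizing stationary points through the Hessian is given below for a toy two-dimensional surface. The energy function, the finite-difference step, and the helper names are assumptions for illustration; the classification rule used is the standard one (no negative Hessian eigenvalues for a minimum, exactly one for a first-order saddle point).

```python
import math


def energy(x, y):
    """Toy 2D surface with minima at (+-1, 0) and a saddle point at (0, 0)."""
    return x**4 - 2.0 * x**2 + y**2


def hessian(f, x, y, h=1e-4):
    """Central finite-difference second derivatives (fxx, fxy, fyy)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy


def eigenvalues_2x2(fxx, fxy, fyy):
    """Closed-form eigenvalues of a symmetric 2x2 matrix."""
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - radius, mean + radius


def classify(f, x, y):
    """Classify a stationary point by the number of negative Hessian eigenvalues."""
    negatives = sum(1 for ev in eigenvalues_2x2(*hessian(f, x, y)) if ev < 0.0)
    labels = {0: "minimum", 1: "first-order saddle point (TS)"}
    return labels.get(negatives, "maximum")


for point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]:
    print(point, "->", classify(energy, *point))
```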
2.3 Transition State Theory

Transition state theory (TST) is closely linked to the Arrhenius equation and
the PES. This theory, which was introduced in 1935 by Eyring, includes
mechanistic considerations that are not included in the Arrhenius rate law.8
TST links the movement over the PES to the reaction rate constant (k)
through:
\[ k = \kappa \frac{k_B T}{h} e^{-\Delta G^{\ddagger} / (RT)} \]
where κ is the transmission coefficient, which is related to the probability that the
product will form from the activated complex, k_B is Boltzmann's constant, h is
Planck's constant, ΔG‡ is the free energy of activation, T is the temperature,
and R is the gas constant.
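To give a feeling for the numbers, the TST expression k = κ(k_B·T/h)·exp(−ΔG‡/(RT)) can be evaluated directly. This is a sketch only: the barrier of 80 kJ/mol, the temperature, and the assumption κ = 1 are illustrative choices, and the result is a first-order rate constant in s^-1.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J/(mol K)


def eyring_rate(delta_g_act, temperature=298.15, kappa=1.0):
    """Rate constant (s^-1) from the free energy of activation in J/mol."""
    prefactor = kappa * K_B * temperature / H
    return prefactor * math.exp(-delta_g_act / (R * temperature))


def half_life(rate_constant):
    """Half-life in seconds of a first-order process."""
    return math.log(2.0) / rate_constant


k_80 = eyring_rate(80_000.0)            # hypothetical barrier of 80 kJ/mol
k_lower = eyring_rate(80_000.0 - 5_708.0)  # barrier lowered by about RT*ln(10)
print(f"k = {k_80:.3g} s^-1, half-life = {half_life(k_80):.3g} s")
print(f"rate increase from lowering the barrier: {k_lower / k_80:.1f}x")
```

The exponential dependence means that lowering the barrier by RT·ln(10), roughly 5.7 kJ/mol at room temperature, increases the rate constant about tenfold.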
2.4 Molecular Mechanics
The application of principles from classical mechanics to describe molecular
systems is called molecular mechanics (MM). In MM, an atom is considered
as a single particle and bonds between atoms are treated like springs.
Through this approach, the potential energy of a molecular system can be
calculated using a set of parameterized functions called a force field (FF).
The concept of MM was introduced in 1946 by Hill,9 and Westheimer and
Mayer.10
The terms included depend on the FF used. Generally, terms describing
bond length, angle, torsion, and non-bonded interactions are considered. The
parameters are chosen so that the model can reproduce known experimental
structures or structures derived from QM calculations. One of the first successful
FFs was MM2, published in 1977 by Allinger.11 In this FF the potential
energy is described as:
\[ E = E_s + E_\theta + E_{s\theta} + E_{vdW} + E_\omega \]
where:
\[ E_s = 71.94\, k_s (l - l_0)^2 \left(1 - 2(l - l_0)\right), \]
\[ E_\theta = 0.021914\, k_\theta (\theta - \theta_0)^2 \left(1 + 7 \cdot 10^{-8} (\theta - \theta_0)^4\right), \]
\[ E_{s\theta} = 2.51124\, k_{s\theta} (\theta - \theta_0) \left[(l - l_0)_a + (l - l_0)_b\right], \]
\[ E_{vdW} = \varepsilon \left(2.9 \cdot 10^{5}\, e^{-12.5/P} - 2.25\, P^{6}\right), \]
\[ E_\omega = \frac{V_1}{2}(1 + \cos\omega) + \frac{V_2}{2}(1 - \cos 2\omega) + \frac{V_3}{2}(1 + \cos 3\omega), \]
k_s is a bond-specific constant, l is the bond length, l_0 is the ideal bond length,
k_θ is a constant specific for the atom types making up the angle, θ is a three-atom
angle, θ_0 is the ideal angle, k_sθ is a stretch-bend constant determined by
the included atoms (the subscripts a and b denote the two bonds forming the angle),
ε is an atom-specific constant, P is the sum of the van der Waals radii divided by
the distance between the interaction centers, V_1, V_2, and V_3 are torsion-type
constants, and ω is the torsion angle.
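Two of the force field terms can be sketched in code to show how cheap an MM energy evaluation is. The functional forms below follow the MM2 bond-stretch expression, 71.94·k_s·(l − l_0)²·(1 − 2(l − l_0)), and the three-term torsion expression as commonly quoted for MM2; the parameter values passed in are placeholders chosen for illustration and should not be read as actual MM2 parameters.

```python
import math


def stretch_energy(k_s, length, ideal_length):
    """Bond-stretch term: 71.94 * k_s * d^2 * (1 - 2*d), with d = l - l0 in angstrom."""
    d = length - ideal_length
    return 71.94 * k_s * d**2 * (1.0 - 2.0 * d)


def torsion_energy(v1, v2, v3, omega_degrees):
    """Three-term torsion potential as a function of the torsion angle in degrees."""
    w = math.radians(omega_degrees)
    return (0.5 * v1 * (1.0 + math.cos(w))
            + 0.5 * v2 * (1.0 - math.cos(2.0 * w))
            + 0.5 * v3 * (1.0 + math.cos(3.0 * w)))


# Placeholder parameters (illustrative only).
K_S, L0 = 4.4, 1.523
print(f"stretch at l0:        {stretch_energy(K_S, L0, L0):.4f}")
print(f"stretch at l0 + 0.05: {stretch_energy(K_S, L0 + 0.05, L0):.4f}")

# With only a threefold term, staggered (60 deg) is lower than eclipsed (0 deg).
for omega in (0.0, 60.0, 120.0, 180.0):
    print(f"torsion at {omega:5.1f} deg: {torsion_energy(0.0, 0.0, 0.5, omega):.4f}")
```

Each term is a closed-form expression of a few geometric variables, so both the energy and its analytical derivatives are inexpensive to evaluate.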
Because MM FF energy calculations are relatively simple, they run very
fast on a modern computer and the analytical derivative is easily accessible,
allowing for efficient optimization of the conformational geometry. The
speed of MM calculations allows large databases of small molecules to be
considered, and large systems to be readily optimized.
However, the standard FF parameterization does not allow chemical reactions
to be modeled. Even with a set of parameters adapted for modeling a
specific reaction, MM would struggle to accurately describe the energy of
the process. This is because, in order to capture the full nature of a reaction,
the electronic structure must be considered.
2.5 Quantum Mechanics

The basis of QM is the wave function (Ψ), which constitutes the most complete
description of a physical system. All observable properties of a system can be
derived from its wave function by applying the appropriate operators. The operator
that returns the energy (E) of the system is called the
Hamiltonian operator (Ĥ), and the energy can thus be determined from:
\[ \hat{H} \Psi = E \Psi \]
This constitutes the time-independent Schrödinger equation.12

QM methods, in contrast to MM methods, are derived from basic physical
laws and are not parameterized. For this reason, QM calculations are sometimes
called ab initio, meaning “from first principles”. Classic QM methods
explicitly consider the electron wave function, thus allowing chemical reactions
to be accurately modeled. However, this makes the methods dependent
on 4N coordinates (where N is the number of electrons): 3N spatial coordinates and
N spin coordinates. Such systems can only be solved approximately, through
very time-consuming methods.
Key to any attempt to solve the Schrödinger equation is the variational
principle. This principle states that, for any guessed normalized wave
function, the computed energy will be an upper bound to the true
ground-state energy.
The wave function itself is not observable. However, an N-electron wave
function can be related to the electron probability density ρ(r) of the system
through:
\[ \rho(\mathbf{r}) = N \int \cdots \int \left| \Psi(\mathbf{r}, s_1, \mathbf{r}_2, s_2, \ldots, \mathbf{r}_N, s_N) \right|^2 \, ds_1 \, d\mathbf{r}_2 \, ds_2 \cdots d\mathbf{r}_N \, ds_N \]
where r denotes the spatial coordinates and s the spin coordinates. Although,
strictly speaking, ρ(r) is a probability density, it is usually referred to as the
electron density.
2.5.1 Density Functional Theory
Density functional theory (DFT) relates the system properties to the electron
density function, and depends on only three spatial coordinates. This has
made DFT a cost-effective alternative for modeling larger systems.
Although it was recognized early on that using the electron density function
could provide a simpler approach than the wave function methods, no
proof existed that the electron density function uniquely determined the
properties of the system. Hohenberg and Kohn provided the fundamental
principles for DFT when they proved that the electron density of a system
completely determines the electronic energy.13 In their second theorem, they
also showed that any trial electron density satisfying the boundary conditions
results in an energy higher than or equal to the true ground-state energy; in other
words, they showed that the variational principle is also valid in this approach.
Although this marked a theoretical breakthrough, the Hohenberg-Kohn
theorems do not give any practical guidance on how to construct the density
functional and determine the energy of the studied system. Also, the second
theorem is of limited practical use, as it is only valid for the exact density
functional, which is not available.
Once the soundness of the method had been proved, however, it did not
take long for a practical approach to the density functional to be presented by
Kohn and Sham.14
One challenge with the Kohn-Sham method is that it includes an exchange-correlation
functional term for which the expression is unknown.
Any attempt to apply the Kohn-Sham method thus has to be accompanied by
an estimate of this term, resulting in a multitude of different functional
forms.
2.5.2 Level of Theory

All DFT calculations in this thesis were carried out using the Jaguar software15
and the Becke, three-parameter, Lee-Yang-Parr (B3LYP) hybrid
functional.16,17 The term hybrid functional means that the exchange-correlation term
partly consists of an exact exchange term from Hartree-Fock theory.
B3LYP-derived energies have been estimated on the G3/05 QM validation
dataset to have a mean absolute deviation of 4.1 kcal mol⁻¹.18 In most
applications, these energies are used relative to the energies of similar systems
and thus the accuracy is expected to be higher as a result of error cancellation.
However, the comparison of some systems, such as charged and neutral
systems, might be more challenging. In these cases, the results depend
on a suitable solvation model.19
In this thesis, the B3LYP functional has been paired with the LACVP
(Los Alamos national laboratory, outermost Core orbitals, Valence only,
Pople) basis set, which utilizes the 6-31G basis set20–25 for elements H-Ar and
an effective core potential together with a valence basis set for heavier elements.26
The 6-31G basis set is a split-valence double-zeta basis set. This
means that it uses one basis function for each core atomic orbital and two
basis functions for the outer atomic orbitals. The basis sets can be further
expanded by the addition of polarization functions, denoted by * when added
to all atoms except for transition metals, hydrogen, and helium, and by **
when also added to hydrogen and helium atoms. The purpose is to expand
the available space for the orbitals to accommodate polarization in a certain
direction. The orbital radius can be further expanded by the addition of diffuse
functions (denoted + or ++) to increase the accuracy when considering
anions and highly electronegative atoms.
2.5.2.1 Dispersion
Electron correlation gives rise to long-range attractive forces called dispersion.
These long-range electron-electron interactions are not well modeled in
most DFT functionals.27 Nevertheless, an accurate description of the dispersion
energy is essential for modeling some systems and has to be accounted
for.28
In the papers employing DFT calculations presented in this thesis, the
DFT-D3 protocol was used to correct for dispersion.29 Antony et al. have
shown that, when modeling protein–ligand interaction energies, the accuracy of
dispersion-corrected DFT methods equals that of high-level wave function
methods.30
2.5.2.2 Solvation
Chemical reactions generally do not take place in an ideal gas phase, as assumed
in standard QM calculations, but rather in some form of solvent. The
solvent molecules constantly interact with the reaction system and alter its
properties. The most accurate way to model this would be to include explicit
solvent molecules in the QM calculations. This, however, rapidly becomes
impractical, as the size of the system would drastically increase and the
conformational degrees of freedom would soon become unmanageable. Instead, solvent
effects are usually modeled by continuum solvent models that simulate the
average effect of the solvent. The solvation models generally use the solvent's
dielectric constant (ε) and radius to simulate the electrostatic effect of
the solvent on the system.
In this thesis, the Poisson-Boltzmann finite continuum model31,32 has been
applied in all DFT studies.
2.5.2.3 Free Energy

The final energies reported for the DFT studies were derived by adding the
electronic energy from the gas phase calculations to the solvation energy, the
calculated dispersion correction, the zero point energy, and the harmonic
entropy contribution at the investigated temperature.
2.5.2.4 Interpreting the Model

With the help of DFT we can calculate the free-energy profile of a chemical
reaction.33 The free-energy profile can be thought of as a schematic cross-section
of the free-energy surface of the reaction. A free-energy profile of a
hypothetical reaction is shown in Figure 2.
Figure 2. Hypothetical free-energy profile of a reaction.
The first thing to note is that we are only interested in relative energies and,
thus, we choose to denote the starting point as zero energy. From the energy
profile we can then determine the thermodynamic properties of the reaction
(endergonic or exergonic) by comparing the relative energies of the reactant
and the product. We can also calculate the amount of free energy that is required
for the reaction to take place by comparing the free energy of the
highest-energy TS to the lowest preceding point (in Figure 2, from the reactant to
the second TS, 80 kJ mol⁻¹). This free-energy requirement in turn determines the rate
of the reaction: the higher the barrier, the slower the reaction.
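The reading of a free-energy profile described above can be written down as a small procedure. The profile below is hypothetical: only the 80 kJ/mol span from the reactant to the second TS comes from the text, while the remaining energies and all names are assumptions made so that the example is complete.

```python
def overall_barrier(profile):
    """Return (barrier, start label, TS label) for the highest-energy TS measured
    from the lowest point preceding it. Assumes the profile starts with a minimum."""
    ts_indices = [i for i, (_, kind, _) in enumerate(profile) if kind == "ts"]
    highest_ts = max(ts_indices, key=lambda i: profile[i][2])
    lowest_before = min(profile[:highest_ts], key=lambda point: point[2])
    barrier = profile[highest_ts][2] - lowest_before[2]
    return barrier, lowest_before[0], profile[highest_ts][0]


# (label, kind, relative free energy in kJ/mol); the reactant defines zero.
profile = [
    ("reactant", "minimum", 0.0),
    ("TS1", "ts", 60.0),
    ("intermediate", "minimum", 20.0),
    ("TS2", "ts", 80.0),
    ("product", "minimum", -30.0),
]

barrier, start, ts = overall_barrier(profile)
reaction_free_energy = profile[-1][2] - profile[0][2]

print(f"barrier: {barrier:.0f} kJ/mol ({start} -> {ts})")
print("exergonic" if reaction_free_energy < 0 else "endergonic",
      f"by {abs(reaction_free_energy):.0f} kJ/mol")
```

In this made-up profile the intermediate lies above the reactant, so the reactant is the lowest point preceding the highest TS and the barrier is measured from it.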
3 Virtual Screening
Virtual screening (VS) is the in silico process of querying a large collection
of molecules for those that are predicted to have biological activity on a
macromolecular target of interest. VS can be divided into structure-based
(SB) and ligand-based (LB) methods, where the SB methods utilize information
on the macromolecular target and the LB methods use information on
known active compounds. Both SB and LB methods can then be further
subdivided depending on their complexity.
VS is well recognized today and several examples of successful applications
of VS are present in the literature.34,35 VS has also played an instrumental
role in the development of several drugs now in the clinic.36
3.1 Ligand-Based Methods

LB methods are based on the assumption that similar molecules have similar
biological activity.37 This statement appears simple enough at first, but the
concept of similarity is not easily defined, and similarity can be measured with
respect to any property of a molecule. The simplest approaches only consider similarity in
easily calculated molecular properties, such as molecular weight or the number
of hydrogen-bond acceptors. These kinds of methods are generally referred
to as 1D methods. 2D methods, on the other hand, calculate similarities
between the fingerprints derived from the 2D structures of the molecules.
The most advanced methods consider the three-dimensional shape of
the molecule and are accordingly called 3D methods.38
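How a 2D fingerprint comparison can be turned into a ranked hit list is sketched below. The text does not name a specific similarity measure; the Tanimoto coefficient used here is a common choice and is an assumption of this example, as are the made-up fingerprints, which are represented simply as sets of "on" bit positions.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by the total number of distinct on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def rank_by_similarity(query_bits, library):
    """Return (name, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints of a known active (query) and three library compounds.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a ligand-based screen, the top of such a ranking would be the compounds proposed for testing, on the assumption that similarity to a known active implies similar activity.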
Instead of considering the similarities between entire molecules, a set of
key features thought to be responsible for the biological activity can be defined.
This set of key features is referred to as a pharmacophore.
3.2 Molecular Docking

The most commonly used SB method is molecular docking. Docking aims to
predict the binding mode of small molecules to a macromolecular target,
usually a protein. The predicted poses are then scored using a scoring function
in order to rank the compounds by predicted activity. To achieve this,
the docking process must tackle several challenges. The conformations of the input
molecules need to be sampled, the molecules need to be fitted into the binding site
of the macromolecular target, and the energies of the formed interactions must be
estimated to generate a score.39
Performing SB VS requires a structure of the protein target, usually determined
by X-ray crystallography of a protein crystal. The Protein Data Bank
(PDB) is a valuable source of protein structures, containing more than
100 000 structures that have been deposited by researchers around the
world.40 The electron densities of many structures are also available through
the Uppsala Electron-Density Server.41
Although the availability of large numbers of crystal structures has been
of immense importance to understanding protein–ligand interactions and the
development of new drugs, careful evaluation should accompany the use of
protein crystal data.42 When evaluating the quality of a crystal structure for
use in SB VS, it is essential to review the actual electron density map in order
to make sure that the ligand and the binding site are well defined. Also,
the relevance of the crystallized conformation has to be evaluated and the
possible effects of flexibility assessed.
3.3 Database and Target Preparation

The input library of chemical substances and the protein target must be prepared
for VS.43 Depending on the requirements of the VS method, this can
include different considerations. Steps for preparing the compound library
include the generation of reasonable 3D structures of the molecules, the
appropriate tautomers, the protonation states, and the stereoisomers.
For SB techniques, the protein target also requires the correct tautomeric
and protonation states to be assigned to the binding-site amino acid residues.
Since the protein is often considered to be rigid during docking calculations,
it is also necessary to consider the conformations of the binding-site residues.
Reasonable conformations of the heavy atoms are usually readily
available from the crystal structure input, although they might need revision
to be suitable for the specific docking problem at hand. The hydrogens,
however, often constitute a challenge, since they are not visible in
most X-ray-derived structures and must therefore be added by the program.
This means that the position of each individual hydrogen has to be estimated.
3.3.1 Water Molecules in Virtual Screening

All of the above protein properties can be influenced by the water molecules
in the protein surroundings.44 Water molecules also influence the binding of
ligands to the protein through solvation/desolvation effects, bridging interac-
tions, and entropy contributions.
When preparing a protein for VS, the water present in the active site re-
quires special consideration. For simplicity, water is often removed but, for
some inhibitors, the presence of water molecules might be essential in order
to predict the correct binding pose.45 Overall, the displacement of water from
the right areas of the active site cavity is essential to achieve favorable bind-
ing free energy.46,47
3.4 The Concept of Chemical Space

All of the molecules that can possibly be made constitute the chemical
space; it has been estimated that for small organic molecules this consists of
about 10^60 molecules.48 Thus, for all practical applications, chemical space
can be considered infinite.
Despite this, many molecules of interest have been identified, for example
in the form of drugs. Is this because the entire chemical space is filled with
molecules that are suitable as drugs? Probably not. The history of success
probably stems from the fact that in our pursuit of new drugs we generally
consider only a small part of chemical space. Here, concepts such as “drug-
likeness” become important. By imitating natural products and drugs that are
already known, we can focus our efforts in a subspace that is more likely to
contain molecules of interest.49 A famous example is Lipinski’s rule of five;
this is a set of rules derived from known drugs and is intended to character-
ize the part of chemical space associated with solubility and permeability for
oral drugs.50
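As a minimal sketch of how such a rule set can be applied as a filter, the function below counts rule-of-five violations from four precomputed descriptors. The thresholds are the commonly quoted ones (molecular weight above 500, log P above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The `Descriptors` container, the function names, the tolerance of one violation, and the example values are illustrative assumptions and are not taken from this thesis.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one molecule (hypothetical container)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # calculated octanol-water partition coefficient
    h_bond_donors: int       # number of hydrogen-bond donors
    h_bond_acceptors: int    # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: a molecule is kept if it has at most one violation."""
    return rule_of_five_violations(d) <= max_violations


# Illustrative descriptor values for a small, aspirin-like molecule.
small_molecule = Descriptors(mol_weight=180.2, logp=1.2,
                             h_bond_donors=1, h_bond_acceptors=4)
print(rule_of_five_violations(small_molecule), passes_rule_of_five(small_molecule))
```

In practice the descriptors would be calculated with a cheminformatics toolkit; they are passed in directly here to keep the sketch self-contained.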
3.4.1 Principal Component Analysis

Chemical space can be thought of as multidimensional space, where each
dimension is a molecular property. By measuring or calculating these prop-
erties we can describe molecules with respect to each other. In order to visu-
alize this many-dimensional space, methods that reduce the dimensionality
of the data, such as principal component analysis (PCA), are useful.
PCA can map the original features of the data to a new set of uncorrelated
features called principal components.51 This is done in such a way that the
first principal component accounts for the most variability, and each subsequent
component accounts for as much of the remaining variability as possible.
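The following is a minimal sketch of PCA on a descriptor matrix, using a singular value decomposition of the mean-centered data. The function name `pca` and the random example data are assumptions made for illustration. Descriptors on very different scales would normally also be scaled to unit variance first, which is omitted here.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X (molecules x descriptors) onto principal components.

    Returns the scores, the component directions, and the fraction of the
    total variance explained by each returned component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    return scores, components, explained[:n_components]


# Illustrative data: 50 "molecules" described by 4 partly correlated properties.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]
scores, components, explained = pca(X, n_components=2)
print(scores.shape, explained)
```

Plotting the two score columns against each other gives the kind of two-dimensional map of chemical space described above.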
3.5 Benchmark Datasets

Evaluating the success of VS and comparing the results between methods is
not a trivial undertaking.52 It can be argued that the best validation is the
prospective application of VS methods to new targets. However, this is both
time consuming and costly. A more efficient way to evaluate VS methods is
through a test set with known active compounds pooled with other mole-
cules, referred to as decoys. Such a dataset can be used to estimate the ability
of different algorithms to retrieve active compounds. A number of such
benchmark sets are available.53 The most well-known are: the directory of
useful decoys (DUD) and the enhanced version (DUD-E),54,55 the maximum
unbiased validation (MUV) dataset,56 and the demanding evaluation kits for
objective in silico screening (DEKOIS).57,58
Many benchmark datasets are compiled using compounds from publicly
available databases for small molecules such as the ZINC,59,60 BindingDB,61
and PubChem62 databases. Often the data are supplemented using com-
pounds from the literature for the specific target in question.
Since benchmark datasets are based on sets of known compounds, there is
an inherent risk of bias. Large chemical collections tend to sample only
certain regions of chemical space, such as molecules that are easy to make or
those that fall into the regions surrounding known inhibitors.63 Therefore,
strictly speaking, the benchmark datasets only test performance in the chem-
ical and biological space that they cover.
The composition of benchmark datasets has led to the introduction of the
concepts of artificial enrichment and analog bias. A difference in physico-
chemical properties between active and inactive molecules can lead to artifi-
cial enrichment, i.e. enrichment due to the dissimilarity between active and
inactive compounds rather than due to the efficacy of the VS method evalu-
ated.64 Similarly, if all active compounds are in the same structure class this
could lead to an “all or nothing” outcome. Too much similarity among the
active compounds is generally referred to as analog bias.65
3.6 Evaluation Metrics

Several metrics have been proposed to measure the success of VS. The en-
richment factor (EF) is a commonly used metric and is defined as:
$$\mathrm{EF} = \frac{tp/(tp+fp)}{A/N}$$
where tp is the number of true positives, fp is the number of false positives,
A is the total number of active compounds, and N is the total number of both
active and inactive compounds. The EF can be interpreted as comparing the
fraction of active compounds in the top part of the data set to the fraction of
active compounds in the whole dataset.
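A minimal sketch of the EF calculation is given below. It assumes the screening result is available as a list of activity labels ordered from the best-ranked to the worst-ranked compound; the function name, the `fraction` argument, and the example data are illustrative assumptions.

```python
from typing import Sequence


def enrichment_factor(is_active: Sequence[bool], fraction: float = 0.01) -> float:
    """EF for the top `fraction` of a ranked list (best-ranked compound first)."""
    n_total = len(is_active)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("the dataset contains no active compounds")
    # Size of the selected top part of the list (at least one compound).
    n_selected = max(1, round(fraction * n_total))
    tp = sum(is_active[:n_selected])   # actives inside the selection
    fp = n_selected - tp               # inactives inside the selection
    return (tp / (tp + fp)) / (n_actives / n_total)


# 100 compounds, 10 actives, 5 of them ranked among the first 10.
ranked = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 85
print(enrichment_factor(ranked, fraction=0.10))
```

With these example data, half of the selected compounds are active compared with one tenth of the whole dataset, so the selection is five times richer in actives than a random pick.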
The results generated by VS are often visualized using a receiver operating
characteristic (ROC) curve where the tp rate is plotted against the fp rate.
The area under the ROC curve can be used as an evaluation metric.
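The area under the ROC curve can be computed directly from a ranked list of activity labels, as in the sketch below. Walking down the list, each active moves the curve one step up and each inactive moves it one step to the right, so the area is the sum of the curve heights at every rightward step. The function name and the input format are assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(is_active: Sequence[bool]) -> float:
    """Area under the ROC curve for a ranked list (best-ranked compound first)."""
    n_actives = sum(is_active)
    n_inactives = len(is_active) - n_actives
    if n_actives == 0 or n_inactives == 0:
        raise ValueError("both active and inactive compounds are required")
    actives_seen = 0
    area = 0
    for active in is_active:
        if active:
            actives_seen += 1      # tp rate increases by 1 / n_actives
        else:
            area += actives_seen   # fp rate increases by 1 / n_inactives
    return area / (n_actives * n_inactives)


print(roc_auc([True, False, True, False]))
```

A value of 1 corresponds to all actives being ranked before all inactives, and a value near 0.5 corresponds to a random ordering.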
These commonly used methods do not account for the position of the ac-
tive compound in the hit-list; they only indicate whether it is within the de-
fined cutoff. This is called the “early recognition problem” and several other
metrics have been proposed that take this into consideration.66 One such
metric is Boltzmann-enhanced discrimination of ROC (BEDROC), which is
defined as:
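$$\mathrm{BEDROC} = \frac{\sum_{i=1}^{n} e^{-\alpha r_i/N}}{\frac{n}{N}\left(\frac{1-e^{-\alpha}}{e^{\alpha/N}-1}\right)} \cdot \frac{\frac{n}{N}\sinh(\alpha/2)}{\cosh(\alpha/2)-\cosh\left(\alpha/2-\alpha\frac{n}{N}\right)} + \frac{1}{1-e^{\alpha\left(1-\frac{n}{N}\right)}}$$
where n is the number of active compounds, N is the total number of compounds,
r_i is the rank of the i-th active compound, and α is a tuning parameter.
However, these more complicated metrics often lack the simple interpretability
of the standard measurements.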
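The sketch below evaluates BEDROC from the 1-based ranks of the active compounds, using the standard closed-form expression with the fraction of actives n/N written as `ra`. The function name and the default α = 20 are assumptions made for illustration; the value of α used in this work is not stated in this section.

```python
import math
from typing import Sequence


def bedroc(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """BEDROC from the 1-based ranks of the actives in a list of n_total compounds."""
    n = len(active_ranks)
    if n == 0 or n >= n_total:
        raise ValueError("need at least one active and one inactive compound")
    ra = n / n_total  # fraction of active compounds
    # Exponentially weighted sum over the ranks of the actives.
    weighted_sum = sum(math.exp(-alpha * r / n_total) for r in active_ranks)
    # Value expected for the same sum if the actives were ranked at random.
    random_sum = ra * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    # Scale and offset that map the result onto the interval from 0 to 1.
    scale = ra * math.sinh(alpha / 2.0) / (
        math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra)
    )
    offset = 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
    return weighted_sum / random_sum * scale + offset


# 10 actives among 100 compounds, ranked first, spread out, and last.
print(bedroc(range(1, 11), 100))
print(bedroc(range(5, 100, 10), 100))
print(bedroc(range(91, 101), 100))
```

A larger α puts more weight on the very top of the list, which is what makes the metric sensitive to early recognition.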
4 Catalysis
The term catalysis was introduced by the Swedish scientist Jacob
Berzelius.67 A catalyst facilitates a reaction by providing an alternate route
from reactant to product that has lower activation energy than the non-
catalyzed pathway. As such, the catalyst will change the rate of the reaction.
This thesis deals with both homogeneous catalysis, where the catalyst is
dissolved in the same phase as the reactant, and biological catalysis by en-
zymes.
4.1 Palladium Catalysis

Palladium is a late transition metal with properties well suited for chemical
catalysis. It favors the 0 and +2 oxidation states and has a relatively small
energy barrier between these. This makes palladium suitable for
oxidation/reduction types of reactions. Palladium(0) has ten d-electrons and
prefers coordination with four ligands, resulting in an 18-electron,
coordinatively saturated, tetrahedral complex. Palladium(II), with eight
d-electrons, instead favors a four-ligand, 16-electron, low-spin,
square-planar geometry.
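The electron counts quoted above follow from simple bookkeeping, sketched here under the assumption that every ligand acts as a two-electron donor. The function name is illustrative.

```python
def valence_electron_count(d_electrons: int, n_ligands: int,
                           electrons_per_ligand: int = 2) -> int:
    """Metal d-electrons plus the electrons donated by the ligands."""
    return d_electrons + n_ligands * electrons_per_ligand


# Palladium(0): ten d-electrons and four ligands.
print(valence_electron_count(10, 4))
# Palladium(II): eight d-electrons and four ligands.
print(valence_electron_count(8, 4))
```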
Perhaps one of the most well-known examples of palladium catalysis is
the Mizoroki-Heck reaction where an aryl halide or vinyl halide reacts with
an alkene to form a new carbon-carbon bond. This reaction was reported
independently by Mizoroki68 in 1971 and Heck69 in 1972. The Mizoroki-Heck
reaction starts with oxidative addition followed by a migratory insertion
where the alkene inserts into the metal-ligand bond. This is followed
by a β-hydride elimination and subsequent regeneration of palladium(0) by
reductive elimination.
The Mizoroki-Heck reaction is classified as a palladium(0)-catalyzed re-
action since each catalytic cycle is started in that oxidation state. The first
palladium(II)-catalyzed reaction was reported as early as 1968 by Heck70
but, until recently, palladium(0)-catalyzed reactions have been the main fo-
cus of palladium chemistry. This is largely because many palladium(II)-
catalyzed reactions result in the reduction of palladium. In order to restore
the catalyst for the next catalytic cycle, an oxidant has to be included in stoi-
chiometric quantities.
4.1.1 Decarboxylation
Benzoic acids, which are cheap, non-toxic, stable precursors to
aryl-palladium(II), are common reactants in palladium-catalyzed carbon-carbon
bond formations. The first metal-mediated decarboxylations were reported as
early as 1930,71 but the use of single metal palladium catalysis was only
reported recently, by Myers.72–74
The greatest limitation of palladium-catalyzed decarboxylative reactions
is the need for ortho substituents on the benzoic acids in order for the reac-
tions to be productive.75 Theoretical investigations have suggested that the
carboxylic acid functionality has to be orthogonal to the plane of the aro-
matic system to allow for decarboxylation (Figure 3). This conformation
breaks the conjugation between the acid and the aromatic ring and is there-
fore energetically disfavored. Ortho substituents break the planar system by
steric interactions, thus reducing the free-energy requirement for the orthog-
onal conformation.76
Figure 3. Top and side views of the optimized geometry (B3LYP/LACVP*) of a
2,6-dimethoxybenzoic acid palladium complex preceding decarboxylation. The acid
functionality is orthogonal to the aromatic plane. Distances are given in
ångström.
4.1.2 Palladium Ligands

A palladium ligand is any compound capable of coordinating with palladi-
um. Many synthetic protocols employ compounds specifically designed for
this purpose and, although they do not directly take part in the reaction, they
can be crucial to the efficacy of the catalyst. Ligating palladium can have
several beneficial effects; it can retard the aggregation of metallic palladi-
um(0), alter the electron density of palladium and, using steric effects, influ-
ence the neighboring coordination sites.
Palladium ligands can be grouped according to the type of atom coordi-
nating palladium and the number of coordinations they make. The number of
non-contiguous interactions made by the ligand to the metal center is called
its denticity. A ligand with one atom coordinating the metal is called mono-
dentate, and a ligand where two atoms coordinate is called bidentate. The
denticity is denoted by the symbol κ.
The palladium-catalyzed reactions investigated in this thesis all employed
bidentate nitrogen-based ligands. Some common nitrogen-based ligands are
shown in Table 1.
Table 1. Common nitrogen-based bidentate palladium ligands.
Chemical name
2,2'-bipyridine
6-methyl-2,2'-bipyridine
6,6'-dimethyl-2,2'-bipyridine
1,10-phenanthroline
2,9-dimethyl-1,10-phenanthroline
2,9-dimethyl-4,7-diphenyl-1,10-phenanthroline
1,1'-biisoquinoline
4.1.3 Density Functional Theory Modeling of Palladium-Catalyzed Reactions

The modeling of palladium-catalyzed reactions requires specific mention.
One of the advantages of modeling metal-catalyzed reactions is that the con-
formational freedom of the reactants is limited by the fact that they are coor-
dinated to a metal center. However, the modeling done for this thesis includ-
ed some approximations and assumptions to further reduce the complexity
and computational time:
• Only complexes involving one palladium atom were considered.
• The bidentate nitrogen ligands were always bound to palladium in
a κ2 fashion.
• The process of ligand exchange was not modeled but was, in most
cases, assumed to be diffusion-controlled and to have a free-energy
requirement of about 20 kJ mol⁻¹.77
4.1.3.1 Quick Reaction Coordinates

When a TS has been identified, it has to be verified that it connects to the
supposed reactant and product. This is done in order to make sure that it is
the TS of interest. This can be achieved by intrinsic reaction coordinate
(IRC) calculations, which follow the reaction coordinate downhill on the
PES along the minimum energy pathway in both forward and backward
directions.78,79 IRC calculations are generally very accurate but are carried out at a
high computational cost. Studies of organic reactivity generally do not re-
quire that level of accuracy and quick reaction coordinate (QRC) calcula-
tions can be applied.80 QRC calculations take one short step in each direction
along the TS normal mode and the derived structures are then minimized. If
the minimization proceeds smoothly, one can be reasonably sure it has ar-
rived at the connecting energy minimum.
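The first part of a QRC calculation, generating the two displaced start geometries, can be sketched as follows. The function name, the array layout (atoms x 3 Cartesian coordinates), and the step length are assumptions made for illustration. The subsequent energy minimizations would be carried out in the quantum chemistry program and are not shown.

```python
import numpy as np


def qrc_start_geometries(ts_coords, normal_mode, step=0.1):
    """Displace a TS geometry a short step both ways along its imaginary mode.

    ts_coords and normal_mode are (n_atoms, 3) arrays. The step length is in
    the same unit as the coordinates (0.1 is an arbitrary illustrative value).
    """
    ts = np.asarray(ts_coords, dtype=float)
    mode = np.asarray(normal_mode, dtype=float)
    unit_mode = mode / np.linalg.norm(mode)
    forward = ts + step * unit_mode
    backward = ts - step * unit_mode
    return forward, backward


# Toy three-atom example where the mode moves the outer atoms in opposite directions.
ts = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [2.4, 0.0, 0.0]])
mode = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
forward, backward = qrc_start_geometries(ts, mode)
print(forward)
print(backward)
```

Minimizing the two returned geometries and comparing the results with the expected reactant and product is what confirms the connectivity of the TS.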
4.2 Enzyme Catalysis

Enzymes are protein catalysts that are essential to many biochemical trans-
formations. The traditional viewpoint is that enzymes catalyze reactions by
stabilizing the TS of the reaction and, thus, lowering the activation energy.81
More recent views also include a dynamic feature of the protein–ligand
complex to facilitate barrier crossing.82
Because of their essential role in most processes in the body, enzymes are
important drug targets; about one third of the currently marketed drugs target
enzymes.83 A better understanding of enzyme reactivity is therefore im-
portant in the context of drug discovery.
4.2.1 Modeling of Enzyme Reactions

Theoretical modeling of enzyme reactions using QM methods has become a
powerful tool for investigating enzyme reactivity and deriving the most
probable reaction mechanisms.84,85 The two main methods are QM/MM and
the cluster approach. In the QM/MM approach, the entire enzyme is included
in the model but only the core is investigated using QM techniques while the
surrounding protein is investigated using MM methods.86 The enzyme mod-
eling conducted for this thesis was carried out using the cluster approach, as
described below.
4.2.2 Cluster Approach
The cluster approach to enzyme modeling allows the study of biochemical
problems using solely QM techniques. In this approach the amino acid resi-
dues in the vicinity of the active site are extracted to construct a model that is
investigated using only QM methods. In order to account for the steric and
electrostatic effects of the surrounding protein structure, the coordinates of
selected atoms are fixed during optimization and the entire complex is treat-
ed using a continuum solvation model.87,88
A range of values has been used for the dielectric constant (ε) when mod-
eling proteins, but a value of four is the most common.89 For the cluster ap-
proach, the effect of the solvation model decreases when the model system
increases in size and, for systems larger than 200 atoms, the relative energy
is nearly the same regardless of the chosen dielectric constant.90–92
The first step in this approach is to select the model substrate, amino acid
residues, cofactors, metal ions, and water molecules that will be included in
the model. This is often done by selecting as many of the amino acids sur-
rounding the catalytic site as possible with regard to the model size. With
today’s computational power, it is reasonable to consider models of up to
300 atoms. Naturally, care must be taken to include all residues that take part
in the reaction or stabilize the generated intermediates. Thus, the more in-
formation that is available on the reaction the better are the chances of con-
structing an accurate model.
The cluster approach is associated with some limitations, according to the
way the model system is constructed. Since only a small part of the enzyme
is included in the model, long range interactions are not modeled.93 Also, the
geometry optimization has to be carefully monitored to ensure that no artifi-
cial geometry changes occur.94
5 Aims of the Thesis
This thesis aspires to advance the field of computational medicinal chemistry
by improving the methods used to decide which molecules to make, and by
theoretically investigating how best to make them. The first goal was pur-
sued by method development in the area of VS and the second by DFT-
based mechanistic investigations of reactions. More specifically, the aims of
this thesis were:
• to elucidate reaction pathways for palladium(II)-catalyzed decarboxylative
  reactions by means of DFT;
• to develop an improved VS protocol based on SB and LB method consensus;
• to compile a dataset suitable for the evaluation of both LB and SB VS
  methods when applied in a high throughput screening (HTS) scenario; and
• to develop a new VS methodology using computationally predicted TS and
  intermediate structures of enzymatic transformations to identify TS and
  intermediate mimetics.
6 Decarboxylative Palladium(II)-Catalyzed Reactions (papers I-III)
For this part of the thesis, three reactions involving palladium(II)-catalyzed
decarboxylation (the synthesis of aryl ketones, aryl amidines, and styrenes)
were investigated using DFT.
6.1 Synthesis of Aryl Ketones

Lindh et al. developed a protocol for the synthesis of aryl ketones through
palladium(II)-catalyzed decarboxylative addition of benzoic acids to acetoni-
trile (Scheme 1).95 They also investigated the mechanism using electrospray
ionization mass spectrometry (ESI-MS).
Our goal was to expand their mechanistic study results by means of DFT
calculations, and to explain the experimentally observed outcomes for the
palladium ligands employed during the ligand screening process.
Scheme 1. General reaction scheme for the decarboxylative palladium(II)-catalyzed
synthesis of aryl ketones.
The palladium-catalyzed decarboxylation of aryl carboxylic acids has previ-
ously been the focus of theoretical studies.96,97 However, the systems used in
those studies were quite different from the one we used as they did not em-
ploy a bidentate nitrogen ligand.
We set out to model the reaction path using 6-methyl-2,2'-bipyridine as
the palladium ligand, 2,6-dimethoxybenzoic acid as the precursor to aryl-
palladium, and acetonitrile as the carbopalladation partner and solvent for
the reaction. The calculated free-energy profile for decarboxylation is shown
in Figure 4.
Figure 4. Calculated free-energy profile for decarboxylation. Reprinted with per-
mission from Svensson et al.98
The lowest energy complex identified prior to decarboxylation was the neu-
tral complex II where palladium coordinates 2,6-dimethoxybenzoate and
acetate. This complex also marks the starting point of the catalytic cycle.
The complex immediately preceding decarboxylation (IV) can be formed
from II by releasing the acetate and rotating the benzoate. From there, the
decarboxylation can occur through TS-I to form the product complex V.
Release of CO2 and coordination with another benzoic acid then generates
the lowest energy intermediate (VII) between decarboxylation and the
subsequent carbopalladation. Overall, the decarboxylation process had a
calculated free-energy requirement of 110 kJ mol⁻¹.
Next, the carbopalladation step was investigated (Figure 5). The calcula-
tions indicated carbopalladation as the rate-determining step of the reaction
with a calculated free-energy requirement of 132 kJ mol⁻¹ (VII to TS-II).
Following the carbopalladation, coordination with benzoate and then ligand
exchange between the formed ketimine and acetate reformed the starting
point of the catalytic cycle (II). Hydrolysis of the product yielded the desired
aryl ketone and further lowered the free energy of the system.
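To give a feeling for what the difference between the two calculated free-energy requirements means for the reaction rate, the sketch below converts a barrier into a rate constant using the Eyring equation from transition state theory. The equation itself, the temperature of 298.15 K, and a transmission coefficient of one are assumptions introduced here for illustration and are not part of the reported calculations.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J K^-1
H = 6.62607015e-34      # Planck constant, J s
R = 8.314462618         # gas constant, J mol^-1 K^-1


def eyring_rate_constant(dg_kj_per_mol: float, temperature: float = 298.15) -> float:
    """First-order rate constant (s^-1) for a free-energy barrier in kJ/mol."""
    dg_j_per_mol = dg_kj_per_mol * 1000.0
    return (K_B * temperature / H) * math.exp(-dg_j_per_mol / (R * temperature))


k_decarboxylation = eyring_rate_constant(110.0)
k_carbopalladation = eyring_rate_constant(132.0)
print(k_decarboxylation, k_carbopalladation)
print(k_decarboxylation / k_carbopalladation)
```

Under these assumptions, the 22 kJ mol⁻¹ higher barrier corresponds to a rate constant that is smaller by a factor of roughly 7 × 10^3, which illustrates why the step with the highest barrier dominates the overall rate.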
Figure 5. Calculated free-energy pathway for carbopalladation and product for-
mation. Adapted with permission from Svensson et al.98
The free-energy profiles of all the nitrogen-based ligands that had been used
during the ligand screening were investigated to better understand the re-
ported experimental outcomes for the different palladium ligands. The com-
putationally investigated ligands are depicted in Table 1. A clear trend was
observed where the more sterically hindered ligands, such as
6,6'-dimethyl-2,2'-bipyridine, had a lower free-energy requirement for
decarboxylation than the less hindered ligands. This could be because the
lowest-energy
complex prior to decarboxylation (II) was sterically crowded and was desta-
bilized by the methyl groups on these ligands.
One of the main findings of this theoretical study was that the lowest en-
ergy complex prior to decarboxylation was a neutral complex (II) and, in
order to proceed to decarboxylation, the system had to undergo a charge
separation to yield the charged pre-decarboxylative complex (IV). The same
pattern is true for the carbopalladation, with the neutral complex VII as the
lowest point prior to the charged carbopalladation TS (TS-II). This indicates
that the solvent played an important role in the reaction and that the free-
energy requirement can be lowered if the solvent better stabilizes the
charged species. In fact, DFT calculations applying a water solvation model
showed that the free-energy requirements were reduced with these settings.
This prompted us to try a series of experimental setups with increasing water
content while monitoring the rate and yield of the reaction.
The experiments showed an increased reaction rate for systems with high-
er concentrations of water in the solvent system. However, the formation of
the proto-decarboxylation product 1,3-dimethoxybenzene also increased
with the concentration of water and, thus, the systems with a high water con-
centration did not generate yields as high as those in the previously investi-
gated systems. Although the addition of water did not improve the protocol,
the increased reaction rate supports the computational findings.
6.2 Synthesis of Aryl Amidines

The palladium-catalyzed synthesis of aryl amidines from aryl trifluorobo-
rates has previously been reported by Sävmarker et al.99 Since the decarbox-
ylative synthesis of aryl ketones proved to be a rewarding reaction, we chose
to also investigate the related reaction furnishing aryl amidines using cyan-
amides (Scheme 2). The mechanism was investigated using DFT calcula-
tions and ESI-MS in conjunction with the synthetic results.
Scheme 2. The general scheme for palladium-catalyzed aryl amidine synthesis.

The free-energy profile for three benzoic acids (Scheme 3) was investigated;
we used benzoic acids that were more (2,4,6-trimethoxybenzoic acid, 1a) or
less (2,3,6-trimethoxybenzoic acid, 1c) electron-rich than
2,6-dimethoxybenzoic acid (1b), which had been our focus in the theoretical studies of
decarboxylation.
Scheme 3. Structures of the theoretically investigated benzoic acids.

For 1b, the overall free-energy profile for decarboxylation was naturally
nearly identical to that previously calculated for the aryl ketone pathway,
with small differences stemming from the application of different solvent
parameters. Parameters suitable for N-methyl-2-pyrrolidone were used for
this study. The following carbopalladation was mechanistically highly simi-
lar to the nitrile counterpart but less energetically demanding. The calculated
free energy for the two reaction steps is shown in Table 2.
Table 2. Calculated free energy for the decarboxylation and carbopalladation steps.
Substrate   ΔG‡ decarboxylation (kJ mol⁻¹)   ΔG‡ carbopalladation (kJ mol⁻¹)
1a          110.3                             110.9
1b          111.1                             117.9
1c          138.5                             149.1
Scheme 4. Proposed catalytic cycle based on DFT calculations and ESI-MS for the
palladium(II)-catalyzed decarboxylative synthesis of aryl amidines. The calculated
free-energy requirement for the two main reaction steps for substrate 1a is indicated
in the figure.
The investigation showed that the more electron-poor substrate 1c had sub-
stantially higher free-energy requirements for both reaction steps. The lower
free-energy requirement of the carbopalladation process placed it close in
energy to the decarboxylation process, in particular for the more electron-
rich substrates. The carbopalladation was still predicted to be the rate-
determining step but the difference was within the expected margins of error
for the DFT method.
The ESI-MS investigation detected several of the complexes that the cal-
culations had predicted would form during the reaction. A putative catalytic
cycle based on the theoretical investigation and the ESI-MS results is shown
in Scheme 4.
6.3 Styrene Synthesis

For the final palladium(II)-catalyzed reaction covered in this thesis, we set
out to create a decarboxylative protocol for the synthesis of styrenes
(Scheme 5). Also in this case, protocols employing alternate routes to aryl-
palladium were known.100–102
Scheme 5. Schematic for the palladium(II)-catalyzed decarboxylative styrene
synthesis with the main by-products shown.
It proved difficult to develop a general protocol for the synthesis of styrene
because of the formation of by-products. At first, when the experiments were
conducted with an excess of vinyl acetate (VA), some reaction systems
yielded large quantities of transvinylated by-product and, when the amounts
of VA were reduced, the diarylated product was formed instead of the de-
sired styrene. In order to better understand the mechanisms behind this, a
DFT investigation was initiated.
Figure 6. Calculated free-energy pathway for the formation of styrene.
The reaction was modeled using 2,6-dimethoxybenzoic acid as the aryl-palladium
precursor and palladium(II) acetate as the palladium source. In
order to facilitate the calculations we opted to use 2,2'-bipyridine as the pal-
ladium ligand. Although this ligand was not the most favorable of the ap-
plied ligands, it was shown to be productive.
We started by mapping the reaction pathway for the formation of the de-
sired styrene (Figure 6). The calculations indicated that the rate-determining
step was the decarboxylation (117 kJ mol⁻¹, XI to TS-III) and the overall
reaction was found to be exergonic. The results also indicated that, after
decarboxylation, α-arylation was preferred over β-arylation with free-energy
requirements of 98.5 kJ mol⁻¹ and 115 kJ mol⁻¹, respectively. This indicates
that styryl acetate formation, resulting from β-arylation, was unlikely to be
one of the main pathways in this reaction. This was also supported by the
experimental outcome, where only a small amount of styryl acetate was de-
tected. With this pathway established, we proceeded to the by-products, and
the calculated free-energy pathway for the transvinylation is shown in Fig-
ure 7.
Figure 7. Calculated free-energy pathway for the transvinylation process. The calcu-
lated free-energy requirement for decarboxylation is shown (in red) for comparison.
The transvinylation process turned out to have a calculated free-energy
requirement very similar to that of decarboxylation (124 kJ mol⁻¹, XI to
TS-VII). Thus, it is not surprising that a large amount of the by-product was
formed when the reaction was carried out using high concentrations of VA.
Our first investigation of decarboxylation revealed that the more sterically
hindered ligands favored decarboxylation. The ligand screen conducted in
this project supported this conclusion, as the sterically hindered ligands
clearly resulted in less of the transvinylation by-product than the unhindered
ligands, indicating that the free-energy requirement for decarboxylation in
those cases was lower than that for transvinylation.
The free-energy pathway leading to the diarylated by-products, stilbene
and 1,1-diarylethene (Figure 8), was also investigated. These calculations
showed, somewhat surprisingly, that the reaction path leading to the sterical-
ly congested 1,1-diarylethene product was the energetically favored path-
way. The free-energy requirement for this transformation was found to be
104 kJ mol⁻¹ (XV to TS-IXa). This is lower than the free energy required for
decarboxylation and similar to the free energy required for the first arylation.
This indicated that it would be difficult to avoid a second arylation in these
protocols.
Figure 8. Calculated free-energy profile for the formation of the styrene product in
black and the 1,1-diarylethene in red.
The key reaction steps were also investigated using
2,9-dimethyl-1,10-phenanthroline as the palladium ligand. These results
indicated that the formation of the transvinyl product was energetically
disfavored when employing this ligand. This is consistent with the
experimental outcome where only small amounts of the transvinylation product
were observed for systems employing 2,9-dimethyl-1,10-phenanthroline. Further,
the calculations indicated that the sterically congested 1,1-diarylethene
product was favored over the linear counterpart for this ligand also.
Based on the theoretical results, a synthetic effort was initiated with the
aim of developing a protocol for the selective synthesis of 1,1-diarylethenes.
After optimization, the protocol using 2,6-dimethoxybenzoic acid yielded
74% of the desired product with excellent selectivity over the corresponding
stilbene (Scheme 6). The protocol was, with some success, extended to
2,4,6-trimethoxybenzoic acid and 2,4-dimethoxybenzoic acid, reaching 29%
and 37% yields, respectively. Unfortunately, the protocol gave very low
yields for other substrates.
Scheme 6. Reaction conditions for the selective synthesis of 1,1-diarylethene.
Overall, the calculations confirmed the experimental findings that it might be
difficult to produce a general protocol for the selective synthesis of styrene.
This is because the two possible side reactions, transvinylation and diaryla-
tion, have free-energy requirements that are very similar to or lower than
those of the main reaction and, thus, will be difficult to circumvent. Further,
the situation was made more difficult by the fact that a high concentration of
VA, which would have been one way of retarding the diaryl formation, leads
to high yields of the transvinylation by-product while, in contrast, a low con-
centration of VA leads to problems with diarylation.
7 Method Consensus in Virtual Screening (paper IV)
Combining the signals from several sources has the potential to increase
their accuracy.103 This concept is generally referred to as data fusion or
signal consensus. In the field of VS, the concept of consensus scoring for
docking has been known for a long time.104 The work of Willett has shown that
these concepts are applicable for fusing the results from different methods in
LB VS.105
Consider a collection of VS methods. If the methods agree better when
ranking active compounds than when ranking inactive compounds, then the
combination of these methods should yield better results for active com-
pounds. If, instead, the applied methods find different active compounds,
then the best use of the data would be to select the top compounds from each
method.
Besides improving the result over that of any individual signal, another
possible beneficial result of data fusion is that it might reduce variation in
performance. Since no VS method is known to consistently give the best
result, and the performance often varies depending on the target and com-
pound dataset, it is very difficult to know which method to apply when faced
with a new target. If data fusion can function as an averaging effect between
methods it could be used to remove this uncertainty, although at a higher
computational cost.
LB and SB methods rely on quite different principles but are both known
to give good results in VS. We therefore set out to investigate whether the
results of VS could be improved by applying data fusion to the results of
both LB and SB methods, and which data fusion scheme would be the most
successful. We chose VS methods to represent the current state of the art:
docking using Glide,106,107 shape screening using ROCS,108 pharmacophore
screening using Phase,109,110 and electrostatic shape screening with EON.111
Five data fusion schemes, all previously reported in the literature, were
applied: sum rank, rank vote, sum score, Pareto ranking, and parallel selec-
tion.105,112–115 The algorithms are further described in Table 3. These schemes
are convenient as they do not require any prior knowledge of the target or
dataset and no training set needs to be applied. The only variable that has to
be decided upon prior to screening is the number of structures that should
receive votes from the rank vote method. This would typically be set based
on the number of hits desired rather than the properties of the dataset.
Table 3. The data fusion algorithms applied in this study. Adapted with permission
from Svensson et al.116
Name               Algorithm
Sum rank           The ranks from the single method rank lists are summed.
Rank vote          Each single method gives a vote to the 250 highest-ranked
                   compounds. The hit lists are then combined and the compounds
                   are sorted first on the number of votes and then on the sum
                   score.
Sum score          All scores are divided by the best score from that method.
                   This relative score is then summed from all the single
                   methods.
Pareto ranking     A compound is ranked based on how many other compounds are
                   better in all single methods. Ties are broken using sum rank.
Parallel selection Compounds are selected in turn from the top of the single
                   method hit lists until the desired number of compounds is
                   reached. If a compound has already been selected, the next
                   compound from the current hit list is chosen instead.
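The five schemes in Table 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: it assumes that each method is given as a dictionary from compound id to score, that a higher score is better, that all scores are positive, and that every method has scored every compound. All function names are hypothetical.

```python
from typing import Dict, List

# Assumed input: one dict per VS method mapping compound id -> score, where a
# HIGHER score is better and every method has scored every compound.
Scores = Dict[str, float]


def rank_list(scores: Scores) -> List[str]:
    """Compound ids from best to worst score (ties broken by id)."""
    return sorted(scores, key=lambda c: (-scores[c], c))


def _sum_of_ranks(methods: List[Scores]) -> Dict[str, int]:
    totals = {c: 0 for c in methods[0]}
    for m in methods:
        for position, c in enumerate(rank_list(m), start=1):
            totals[c] += position
    return totals


def _sum_of_relative_scores(methods: List[Scores]) -> Dict[str, float]:
    # Each score is divided by the best score from that method (assumed > 0).
    totals = {c: 0.0 for c in methods[0]}
    for m in methods:
        best = max(m.values())
        for c, s in m.items():
            totals[c] += s / best
    return totals


def sum_rank(methods: List[Scores]) -> List[str]:
    totals = _sum_of_ranks(methods)
    return sorted(totals, key=lambda c: (totals[c], c))


def sum_score(methods: List[Scores]) -> List[str]:
    totals = _sum_of_relative_scores(methods)
    return sorted(totals, key=lambda c: (-totals[c], c))


def rank_vote(methods: List[Scores], n_votes: int = 250) -> List[str]:
    # Only compounds that received at least one vote enter the fused hit list.
    votes: Dict[str, int] = {}
    for m in methods:
        for c in rank_list(m)[:n_votes]:
            votes[c] = votes.get(c, 0) + 1
    rel = _sum_of_relative_scores(methods)
    return sorted(votes, key=lambda c: (-votes[c], -rel[c], c))


def pareto_ranking(methods: List[Scores]) -> List[str]:
    # Count, for each compound, how many others are better in ALL methods.
    # This pairwise comparison is quadratic in the number of compounds.
    compounds = list(methods[0])
    ranks = _sum_of_ranks(methods)
    dominated = {
        c: sum(all(m[o] > m[c] for m in methods) for o in compounds)
        for c in compounds
    }
    return sorted(compounds, key=lambda c: (dominated[c], ranks[c], c))


def parallel_selection(methods: List[Scores], n_select: int) -> List[str]:
    hit_lists = [rank_list(m) for m in methods]
    target = min(n_select, len(methods[0]))
    selected: List[str] = []
    seen = set()
    while len(selected) < target:
        for hits in hit_lists:
            # Take the best compound from this list that is not yet selected.
            pick = next((c for c in hits if c not in seen), None)
            if pick is not None:
                selected.append(pick)
                seen.add(pick)
            if len(selected) == target:
                break
    return selected
```

For docking scores, where a more negative value is usually better, the scores would have to be sign-flipped or otherwise rescaled before use, and the sum score normalization would need to be adapted accordingly.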
We chose 16 datasets to evaluate the data fusion schemes, six developed by
Jacobson and Karlén117 and later modified by Mutas et al.,118 and ten from
the DUD. This subset of DUD datasets has previously been identified to be
the most suitable for evaluation of LB methods since it contains more differ-
ent scaffolds among the active compounds than the average DUD dataset.119
Among the single methods applied in this study, docking and shape
screening performed best and electrostatic shape performed worst. It is,
however, important to note that no effort was made to optimize the usage of
the applied methods; they were applied with the recommended general set-
tings. Nonetheless, each method performed best for at least one dataset, il-
lustrating how difficult the choice of method is for a new target.
The results of the study are shown in Table 4. In order to compare the EF
between methods, we calculated the relative EF (the EF divided by the max-
imum possible EF).
Table 4. The average relative EF across all the datasets for the investigated single
methods and data fusion algorithms, shown for three top fractions of the datasets.
An asterisk marks the best performance for that database fraction.
Method              100 compounds   1%      10%
Glide               0.34            0.44    0.66
Phase               0.33            0.34    0.46
ROCS                0.43            0.46    0.55
EON                 0.33            0.37    0.53
Sum rank            0.38            0.43    0.64
Rank vote           0.45*           0.52    0.71
Sum score           0.43            0.50    0.69
Pareto ranking      0.40            0.51    0.73*
Parallel selection  0.44            0.53*   0.73*
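The relative EF reported in Table 4 can be illustrated with a short sketch. It assumes the common definition of the EF as the hit rate among the selected compounds divided by the hit rate in the whole dataset; the function names and the boolean input format are illustrative choices and are not taken from the study.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], n_selected: int) -> float:
    """EF for the top n_selected compounds of a ranked list (best first)."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    hits = sum(ranked_is_active[:n_selected])
    return (hits / n_selected) / (n_actives / n_total)


def max_enrichment_factor(n_total: int, n_actives: int, n_selected: int) -> float:
    """EF of a perfect ranking: the selection is filled with actives first."""
    best_hits = min(n_selected, n_actives)
    return (best_hits / n_selected) / (n_actives / n_total)


def relative_ef(ranked_is_active: Sequence[bool], n_selected: int) -> float:
    """EF divided by the maximum possible EF, giving a value between 0 and 1."""
    ef = enrichment_factor(ranked_is_active, n_selected)
    ef_max = max_enrichment_factor(
        len(ranked_is_active), sum(ranked_is_active), n_selected
    )
    return ef / ef_max
```

Because the maximum possible EF depends on both the dataset size and the number of actives, dividing by it puts datasets of different composition on a common scale, which is what makes averaging across datasets meaningful.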
When the results were reviewed it became clear that data fusion was indeed
able to improve the outcome of VS. Not only did the data fusion have an
averaging effect, meaning that the fused results were more consistent than
any of the single methods, but data fusion recovered more active compounds
than any of the single methods for some targets. Also, on average across all
the datasets, data fusion recovered more active compounds than any single
method alone. Among the applied data fusion algorithms, parallel selection
performed best, indicating that for these datasets and methods the hit lists are
at least to some extent complementary.
The results for one of the targets are shown in Figure 9. The differences
between the different data fusion algorithms were clearly smaller than the
differences between the different VS methods. This was a general trend
across the tested targets. This is beneficial for data fusion since it can serve
to reduce the impact of the method selection. If the spread between data
fusion algorithms had been as wide as for the individual methods we would
just have replaced one problem of method selection with another.
Figure 9. ROC curves for one of the investigated targets (ERα). The left plot shows
the individual VS methods and the right plot shows the result from the five data
fusion algorithms. Reprinted with permission from Svensson et al.116
Following our study, another investigation of data fusion using disparate
methods has also shown improvements in the VS results.120 That study also
introduced a new fusion method that had the best performance. In this fusion
method, called Z2, the combined score was derived from the mean of the
normalized scores for the two best individual methods for each compound.
This method relies on the same basic principle as the parallel selection
method; if a compound is scored sufficiently highly by any or by a few
methods then that is sufficient to include the compound in the fused hit list.
These results further strengthen the view that the hit lists, when applying
principally different VS methods, are likely to be complementary.
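The idea behind the Z2 scheme described above can be sketched as follows. The text only states that normalized scores are used; the z-score normalization below is an assumption made for illustration, as are the function names and the convention that a higher score is better.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize one method's scores to zero mean and unit variance (assumed)."""
    values = list(scores.values())
    mu = mean(values)
    sd = pstdev(values)
    return {c: (s - mu) / sd if sd > 0 else 0.0 for c, s in scores.items()}


def top_two_mean_fusion(methods: List[Dict[str, float]]) -> List[str]:
    """Rank compounds by the mean of their two best normalized scores."""
    normalized = [z_scores(m) for m in methods]
    fused = {}
    for c in methods[0]:
        best_two = sorted((z[c] for z in normalized), reverse=True)[:2]
        fused[c] = mean(best_two)
    return sorted(fused, key=lambda c: (-fused[c], c))
```

A compound that two methods score highly is ranked near the top even if the remaining methods score it poorly, which is the property this scheme shares with parallel selection.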
8 Evaluation of Virtual Screening Using
Publicly Available High Throughput
Screening Data (paper V)
One possible application of VS is as a replacement or complement to HTS at
the start of a drug discovery project.121 The aim is then to identify new
molecular starting points for medicinal chemistry development by finding
ligand binders in large collections of compounds. This scenario is in many
ways different from what is tested when VS protocols are evaluated using
benchmark datasets such as the DUD. One of the biggest differences is that
the active compounds in most benchmark datasets are collected from the
literature and, thus, represent already optimized binders. This makes the
affinity of the active compounds too high compared to what would be ex-
pected to be found in an HTS library and, if whole series of molecules are
included, they might also be highly similar. Furthermore, the ratio of active
to inactive compounds is usually different from what would be expected in
HTS, with too few inactive compounds per active compound. Since the ex-
isting benchmark datasets are not sufficient to evaluate the success of VS in
an HTS scenario, we set out to create a benchmark dataset that could be used
to evaluate both SB and LB methods.
The most obvious way to evaluate how well VS would fare in an HTS
scenario was to test the VS protocols on data already evaluated in high-
throughput screens. The PubChem BioAssay database62 is an excellent
source of publicly available HTS data that has been used before to create the
MUV dataset.56 The MUV dataset, however, was constructed mainly to
evaluate ligand-based methods and the aim is not to resemble a high-
throughput screen.
The problem with using HTS data for retrospective evaluation is the gen-
erally poor quality of the data generated.122 Figure 10 shows the percent
inhibition measured for a library of almost 300 000 compounds against
VHR1 (PubChem AID 1654). As can be seen clearly from the graph, the
spread in the measured inhibition is quite high, indicating low confidence in
the acquired inhibition values.
Figure 10. Reported % inhibition at 13.3 μM for PubChem AID 1654.
Another complication when using data from PubChem is the lack of stand-
ardization for how the data are annotated and the assay procedures de-
scribed. This makes the curation of the data a manual and time-consuming
job.
Issues regarding data quality as well as several other aspects need to be
considered before using HTS data for VS evaluation. We stipulated that, in
order to be useful as an evaluation set, at least 1000 compounds should have
been screened, with a minimum of twenty actives found. The bioassays also
needed to be linked to at least one protein crystal structure in the PDB, bind-
ing a drug-like molecule. In order to improve confidence in the data, we only
used HTS data where the active compounds had been confirmed in a dose-
response-type secondary assay. This addressed the most acute problem of
false positives in the HTS data but did not take into account the issue of false
negatives. However, due to the large number of inactive compounds, false
negatives should only have had a minor impact on the evaluation metrics
used for VS, unless present in excessive amounts. At first, we also excluded
all proteins with a co-factor binding site but since this excluded target clas-
ses such as kinases, which are well characterized regarding ligand binding,
we selectively included targets where binding in alternative sites was
deemed unlikely.
With the inclusion criteria set, we used Python scripts to mine PubChem
for assays that fulfilled these conditions.
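The automated part of the selection can be pictured as a simple filter over assay summaries. The sketch below is not the script used in the study and does not call any PubChem interface; `AssayRecord` is a hypothetical, pre-parsed summary whose fields mirror the inclusion criteria stated above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AssayRecord:
    """Hypothetical summary of one bioassay (not a PubChem API object)."""
    aid: int
    n_screened: int
    n_confirmed_actives: int
    dose_response_confirmed: bool
    pdb_structure_with_druglike_ligand: bool


MIN_SCREENED = 1000  # at least 1000 compounds screened
MIN_ACTIVES = 20     # a minimum of twenty actives found


def passes_inclusion_criteria(assay: AssayRecord) -> bool:
    return (
        assay.n_screened >= MIN_SCREENED
        and assay.n_confirmed_actives >= MIN_ACTIVES
        and assay.dose_response_confirmed
        and assay.pdb_structure_with_druglike_ligand
    )


def select_assays(assays: Iterable[AssayRecord]) -> List[AssayRecord]:
    """Keep only the assays that fulfil every automated criterion."""
    return [a for a in assays if passes_inclusion_criteria(a)]
```

Criteria that require judgment, such as whether binding in an alternative site is likely, are deliberately left out of this sketch because they were assessed manually.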
Following the automated filters, the collected data were manually re-
viewed and the details of each bioassay were assessed. All data from whole-
cell assays were removed as the mode of action of potential inhibitors was
considered too uncertain. To our surprise, from the thousands of screens
deposited in PubChem, only seven datasets fulfilled our selection criteria,
indicating that the quality of the publicly available HTS data was far from
optimal. The seven datasets and the number of included compounds are
listed in Table 5. The ratio of inactive to active molecules, on
average almost 2 500 inactive for every active, clearly distinguished these
datasets from previous benchmark datasets.
Table 5. Number of active and inactive compounds in the compiled datasets.
Target    # active   # inactive
MMP13     19         64 814
DUSP3     52         289 236
PTPN22    28         292 058
EPHX2     44         90 262
CTDSP1    369        337 634
MAPK10    57         59 405
CDK5      23         334 339
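The ratio quoted in the text can be checked directly from the counts in Table 5. The short calculation below pools all seven datasets; the variable and function names are illustrative only.

```python
# Counts copied from Table 5: target -> (number of actives, number of inactives).
TABLE_5 = {
    "MMP13": (19, 64814),
    "DUSP3": (52, 289236),
    "PTPN22": (28, 292058),
    "EPHX2": (44, 90262),
    "CTDSP1": (369, 337634),
    "MAPK10": (57, 59405),
    "CDK5": (23, 334339),
}


def pooled_inactive_to_active_ratio(table: dict) -> float:
    """Total number of inactives divided by total number of actives."""
    total_actives = sum(active for active, _ in table.values())
    total_inactives = sum(inactive for _, inactive in table.values())
    return total_inactives / total_actives


def per_target_ratios(table: dict) -> dict:
    """Inactive-to-active ratio for each target separately."""
    return {t: inactive / active for t, (active, inactive) in table.items()}
```

Pooling the datasets gives 1 467 748 inactives for 592 actives, which is just under 2 500 to 1. The per-target ratios vary widely around that figure, from under 1 000 for CTDSP1 to over 14 000 for CDK5.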
The seven datasets were characterized using physicochemical descriptors
and the enrichment from 1D similarity to the crystal ligands was calculated.
In previous benchmark datasets, much effort has gone towards eliminating
artificial enrichment (enrichment due to differences in 1D descriptors be-
tween the active and inactive molecules). While it is necessary to be aware
of such differences in the dataset in order to carry out sound evaluations,
Bender and Glen have suggested that, instead of removing inactive mole-
cules that are too different, the results from more advanced methods can be
compared with the 1D similarity results.123 Since this strategy allows the
dataset to be more true to the original high-throughput screen, all inactive
compounds were included. Any evaluation using these datasets must then of
course be taken in context with the performance of 1D methods.
We also investigated the analog bias (are the active compounds too simi-
lar?) by calculating the number of Bemis-Murcko scaffolds124 in each da-
taset. The results showed that the active compounds in general were quite
diverse, with fewer than two structures sharing the same scaffold on average.
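The analog bias measure can be sketched as the average number of active compounds per distinct scaffold. The example assumes that a scaffold identifier (for instance a canonical scaffold SMILES produced by a cheminformatics toolkit) has already been computed for each active compound; the scaffold generation itself is not shown, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def compounds_per_scaffold(scaffolds: Sequence[str]) -> float:
    """Average number of compounds sharing each distinct scaffold.

    `scaffolds` holds one precomputed scaffold identifier per active compound.
    A value close to 1 means almost every active has its own scaffold.
    """
    if not scaffolds:
        raise ValueError("at least one scaffold is required")
    counts = Counter(scaffolds)
    return len(scaffolds) / len(counts)
```

A value below two corresponds to the situation described above, in which fewer than two structures share the same scaffold on average.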
In order to evaluate the suitability of the datasets for method validation,
we also conducted VS using standard methods. To represent the limited
information at the start of a drug discovery project, we chose only to use the
information from the crystal structure and the co-crystallized ligand in the
VS. The VS runs were only moderately successful. This was probably because
of the diverse structures of the active compounds and the often quite
different structure of the crystallized ligand that was used as query in the
LB techniques.
Keeping the dissimilar inactive compounds in the dataset also gives some
advantage to the evaluation. The inactive molecules will cover much more of
the physicochemical space and, thus, methods that are biased towards big or
charged molecules, for example, will be penalized in the comparison. It is
not possible to detect that kind of bias using a property-balanced dataset such
as the DUD. Indeed, when we evaluated the results of the VS, it became
clear that docking, in many cases, showed a preference for very large mole-
cules, as shown in Figure 11.
Figure 11. PCA plots showing the distribution of inactive compounds as a contour
plot, the active compounds as black dots, the crystal ligand as a red triangle, and the
one hundred best-scored compounds from docking as yellow squares. Size is a major
factor when determining the first principal component. Reprinted with permission
from Lindh et al.125
In conclusion, the presented datasets can be a valuable complement to
already widely used benchmark datasets by allowing for a more accurate
evaluation of how VS methods fare in an HTS-like scenario. The large number
of inactive compounds as well as their wide spread in physicochemical space
will also allow for the detection of some method biases.
Hopefully, the amount of publicly available high-quality HTS data will
continue to grow, allowing the dataset to be expanded in the future.
9 Virtual Screening Using Computationally
Derived Reaction Coordinates (paper VI)
Enzymes catalyze a variety of reactions and are essential for life as we know
it. The capacity of an enzyme to catalyze a reaction stems from its ability to
stabilize the TS of the reaction and, thus, to lower the required free energy of
the transition. TS mimetic inhibitors resemble the TS and can therefore take
advantage of the same stabilizing effect in their binding. This results in the
potential for TS mimetics to be extraordinarily potent inhibitors.126
It is in the nature of TSs that their structure cannot be experimentally de-
termined by methods such as X-ray crystallography. They can, however, be
modeled computationally using various approaches, the most common being
QM/MM methods86 and the so-called cluster approach.87,88
Research from the laboratory of Schramm has shown that it is possible to
use computationally derived TSs as a basis for inhibitor design and that the
designed inhibitors can be extremely potent.127,128 We envisioned that instead
of designing inhibitors, computed TS or high-energy intermediate structures
might be used for VS in order to query compound collections for TS mimetic
inhibitors.
We chose insulin-regulated aminopeptidase (IRAP) as the target for our
efforts. IRAP belongs to the M1 family of aminopeptidases and has been
indicated as a promising target for cognitive enhancement.129 A schematic
mechanism based on studies of the related enzyme aminopeptidase N (APN)
and endoplasmic reticulum aminopeptidase (ERAP) is shown in Scheme 7.
Scheme 7. Schematic mechanism for the M1 family of aminopeptidases. Reprinted
with permission from Svensson et al.130
Several TS or high-energy-intermediate mimetic inhibitors of enzymes in the
M1 family are known; among these, bestatin and amastatin have been con-
firmed to also be active against IRAP.131
When the project started, the crystal structure of IRAP was not available
and we chose to use the crystal structure of the closely related enzyme APN
as the basis for modeling. The active site of APN has a very high sequence
similarity to its IRAP counterpart. Starting from the APN crystal 4FYR,132
the residues closest to the active site were extracted. The extracted active site
contained eleven amino acid side chains; of these, two differed between
APN and IRAP. The APN residues Gln213 and Val385 are the equivalents
of the IRAP residues Glu295 and Ile461, respectively, and these were modi-
fied to reflect the IRAP structure. A model substrate and the water molecule
necessary for the reaction were added to the active site. The resulting model,
consisting of 227 atoms, was used to model the reaction by means of DFT
calculations and the cluster approach.
During the course of our work, two crystal structures for IRAP were re-
leased (PDB ID 4PJ6 and 4P8Q).133 We evaluated these and found that they
did not provide a significant advantage over the APN-based model as the
parts included in our active site model were highly similar between the two
enzymes and the quality of the released IRAP structures was in some ways
inadequate. Firstly, the resolution of the released structures was quite poor
(2.96 Å and 3.02 Å) compared to that of the APN structure (1.91 Å). The
IRAP structures also held no substrate or inhibitor. Instead, a single amino
acid residue was modeled in the active site. The structures were also in an
intermediate position between the open and closed conformations known from
ERAP.134 A schematic picture of the open and closed conformations of
ERAP1 is shown in Figure 12. It has been suggested that the C-terminus of the
peptide substrate in ERAP binds to a positively charged binding site in the
open conformation of the enzyme. The ligand binding prompts the enzyme
to adopt the closed conformation. If the substrate is too large, the enzyme cannot
close, and if it is too small it will not reach the N-terminal binding site where
the catalytic cleavage takes place. Thus, ERAP only effectively cleaves pep-
tides of a certain length.135
Figure 12. Schematic picture of the open (left, PDB ID 3MDJ136) and closed (right,
PDB ID 2YDO134) conformations of ERAP1.
The biological significance of this half-open conformation in the IRAP
crystals is not known. However, in this conformation, Tyr549, which is located
in the active site and known from other M1 family peptidases to stabilize the
reaction intermediate,137 does not have any interaction with the amino acid
crystallized in the substrate position in the active site. The
Gly-Ala-Met-Glu-Asn amino acid loops that line the active site are also displaced compared to
the APN structure. This could be because of the enzyme conformation or it
could reflect an actual difference between the two enzymes. To resolve this
uncertainty, more and better crystal structures of IRAP are required.
The modeled reaction path (Figure 13), based on the APN active site
model, starts with the nucleophilic attack of the water molecule on the
substrate carbonyl to form the gem-diol intermediate. The hydrogen bonds to
and from the formed gem-diol then have to shift to prepare for the
protonation of the former amide nitrogen. Protonation then takes place and
the peptide is subsequently cleaved. The first hydrogen bond shift was
found to have the highest free-energy requirement in our model
(58.5 kJ mol⁻¹). Overall, the calculated reaction pathway follows the
literature mechanism for the M1 family aminopeptidases and has reasonable
associated energies.
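To give a feeling for the size of the highest calculated barrier, it can be converted into an approximate rate constant with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). This is a back-of-the-envelope sketch and not a calculation from the study: the temperature of 298.15 K and a transmission coefficient of one are assumptions, and the function name is illustrative.

```python
import math

BOLTZMANN = 1.380649e-23   # J K^-1 (exact SI value)
PLANCK = 6.62607015e-34    # J s (exact SI value)
GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def eyring_rate_constant(barrier_kj_per_mol: float,
                         temperature_k: float = 298.15) -> float:
    """First-order rate constant (s^-1) from a free-energy barrier.

    Assumes simple transition state theory with a transmission
    coefficient of 1; the default temperature is an assumption.
    """
    prefactor = BOLTZMANN * temperature_k / PLANCK
    exponent = -barrier_kj_per_mol * 1000.0 / (GAS_CONSTANT * temperature_k)
    return prefactor * math.exp(exponent)


# Highest barrier reported for the modeled pathway: 58.5 kJ mol^-1.
rate = eyring_rate_constant(58.5)
```

Under these assumptions, a barrier of 58.5 kJ mol⁻¹ corresponds to a rate constant of a few hundred per second, which is a plausible order of magnitude for an enzyme-catalyzed step.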
Figure 13. Schematic view of the calculated intermediates and TSs of the modeled
reaction (panels, in order: IRAP-I, IRAP-TS-I, IRAP-II, IRAP-TS-II, IRAP-III,
IRAP-TS-III, IRAP-IV, IRAP-TS-IV, IRAP-V, IRAP-TS-V, IRAP-VI). Reprinted with
permission from Svensson et al.130
Once the computed TS and intermediate structures were determined, we set
out to use them for VS. We chose to use the structures IRAP-II and IRAP-V
since they represent the two main steps of the reaction. These intermediates
were very similar to the preceding TS structure, and we expected that
the difference was not of any major significance when used for docking.
The compound library was first docked and then submitted to a pharma-
cophore filter. This was done in order to only keep the structures that were
predicted to bind in a similar way to the reaction intermediate. The
pharmacophore model was built from the substrate in structure IRAP-II. The
tolerance of the pharmacophore filter was quite generous and the chosen features were
present throughout the reaction; therefore, the choice of structure on which
the pharmacophore was based should have been of lesser importance. The
compounds used for screening were from the eMolecules database, which at
the time consisted of around 5.6 million compounds. Before VS, we filtered
off any structure with a molecular weight over 600 g mol⁻¹ and prepared the
structures for VS with LigPrep.
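The order of the screening steps can be summarized in a small sketch: a molecular weight prefilter, followed by docking and a pharmacophore filter on the docked poses. The `Candidate` record below is hypothetical and simply carries precomputed values; in a real workflow the docking score and the pharmacophore match would come from the docking and pharmacophore software after the prefilter. The convention that a lower docking score is better is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_MOLECULAR_WEIGHT = 600.0  # g/mol, the cutoff stated in the text


@dataclass
class Candidate:
    """Hypothetical record for one library compound."""
    name: str
    mol_weight: float
    docking_score: float          # assumed: lower (more negative) is better
    matches_pharmacophore: bool   # docked pose fulfils the pharmacophore model


def prefilter(library: Iterable[Candidate]) -> List[Candidate]:
    """Remove structures with a molecular weight over the cutoff."""
    return [c for c in library if c.mol_weight <= MAX_MOLECULAR_WEIGHT]


def rank_hits(docked: Iterable[Candidate]) -> List[Candidate]:
    """Keep poses that pass the pharmacophore filter, best docking score first."""
    kept = [c for c in docked if c.matches_pharmacophore]
    return sorted(kept, key=lambda c: c.docking_score)
```

Applying the pharmacophore filter after docking, rather than before, means that it acts on the predicted binding pose and therefore works as a binding mode filter.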
The VS yielded several interesting compounds, including known ami-
nopeptidase TS/high-energy intermediate mimetics. The ranking of these
compounds was dramatically improved by the application of the pharmaco-
phore model, indicating that some form of binding mode filter is beneficial.
The VS protocol was also excellent at reproducing the experimentally
determined binding mode of bestatin (Figure 14).
Figure 14. The conformation of bestatin resulting from the VS protocol (pink car-
bons) overlaid on the crystal structure of bestatin in APN (yellow carbons, PDB ID
4FKK138). Reprinted with permission from Svensson et al.130
Seven compounds were purchased and submitted for biological testing with
respect to their ability to inhibit IRAP. Among these, ami-
no(phenyl)methylphosphonic acid (2, Scheme 8) was confirmed as an IRAP
inhibitor with an IC50 of 230 (± 7) μM. We also tested amastatin as a positive
control and it registered an IC50 of 26 (± 0.6) μM in our assay.
Scheme 8. Structure of the identified IRAP inhibitor.
α-Aminophosphonic acids are known to inhibit APN but have not previously
been shown to inhibit IRAP.139 Although the IRAP inhibition is not surprising,
it further confirms the usefulness of this VS strategy.
Although 2 was a close match in pharmacophore features to the TS, the
measured affinity was not very high. This might have been the result of the
small size of the inhibitor, especially compared to the natural substrate of
IRAP. It is also possible that the electron density and charge distribution
were not sufficiently similar to those of the TS to achieve a tighter binding.
Overall, this study confirms that it is possible to use calculated reaction
coordinates as the basis for VS in order to enrich compounds that can act as
TS/high-energy intermediate mimetics.
10 Conclusions
In summary, the work presented in this thesis has resulted in the following
new insights:
• The theoretical studies increased the understanding of
palladium(II)-catalyzed decarboxylation reactions. The studies not only
explained the experimental outcomes, but also inspired the pursuit of
improved protocols.
• Data fusion of results obtained with LB and SB methodologies improved
the VS outcomes.
• A new VS benchmark dataset that can be used to evaluate the
performance of VS techniques was compiled, evaluated, and made generally
available.
• A new method for identifying TS and high-energy intermediate mimetics
through VS using DFT-derived reaction coordinates was developed and applied.
The presented protocol was validated through the identification of
known TS mimetics.
11 Acknowledgements
The work presented in this thesis would not have been possible without the
help, support, and inspiration provided by others. I wish to express my sin-
cere gratitude to the following people:
My supervisor Associate Professor Christian Sköld and my co-supervisor
Professor Anders Karlén, for the best possible support. I have truly enjoyed
working with you.
Professor Mats Larhed, the palladium group, the TB group, the IRAP group,
and the CADD group for fruitful scientific discussions.
All my colleagues at the department, past and present, for making our work-
place such a great environment for both research and socializing.
Professor Fahmi Himo at Stockholm University and all members of his
group for generously sharing their knowledge of enzymatic reaction
modeling and the cluster approach.
Apotekarsocieteten for financial support allowing parts of this work to be
presented at international conferences.
My always loving and supportive family.
My partner in life, Susanne “sus” Bornelöv, for simply being the best.
12 References
(1) Jones, A. W. Early Drug Discovery and the Rise of Pharmaceutical
Chemistry. Drug Test. Anal. 2011, 3, 337–344.
(2) Jorgensen, W. L. The Many Roles of Computation in Drug Discovery.
Science 2004, 303, 1813–1818.
(3) Mark, A. E.; van Gunsteren, W. F. Decomposition of the Free Energy of a
System in Terms of Specific Interactions: Implications for Theoretical and
Experimental Studies. J. Mol. Biol. 1994, 240, 167–176.
(4) Dill, K. A. Additivity Principles in Biochemistry. J. Biol. Chem. 1997, 272,
701–704.
(5) Ajay; Murcko, M. A. Computational Methods to Predict Binding Free
Energy in Ligand-Receptor Complexes. J. Med. Chem. 1995, 38, 4953–
4967.
(6) Christen, M.; van Gunsteren, W. F. On Searching In, Sampling Of, and
Dynamically Moving through Conformational Space of Biomolecular
Systems: A Review. J. Comput. Chem. 2008, 29, 157–166.
(7) Saunders, M.; Houk, K. N.; Wu, Y. D.; Still, W. C.; Lipton, M.; Chang, G.;
Guida, W. C. Conformations of Cycloheptadecane. A Comparison of
Methods for Conformational Searching. J. Am. Chem. Soc. 1990, 112, 1419–
1427.
(8) Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys.
1935, 3.
(9) Hill, T. L. On Steric Effects. J. Chem. Phys. 1946, 14.
(10) Westheimer, F. H.; Mayer, J. E. The Theory of the Racemization of
Optically Active Derivatives of Diphenyl. J. Chem. Phys. 1946, 14.
(11) Allinger, N. L. Conformational Analysis. 130. MM2. A Hydrocarbon Force
Field Utilizing V1 and V2 Torsional Terms. J. Am. Chem. Soc. 1977, 99,
8127–8134.
(12) Schrödinger, E. An Undulatory Theory of the Mechanics of Atoms and
Molecules. Phys. Rev. 1926, 28, 1049–1070.
(13) Hohenberg, P.; Kohn, W. Inhomogeneous Electron Gas. Phys. Rev. 1964,
136, B864–B871.
(14) Kohn, W.; Sham, L. J. Self-Consistent Equations Including Exchange and
Correlation Effects. Phys. Rev. 1965, 140, A1133–A1138.
(15) Bochevarov, A. D.; Harder, E.; Hughes, T. F.; Greenwood, J. R.; Braden, D.
A.; Philipp, D. M.; Rinaldo, D.; Halls, M. D.; Zhang, J.; Friesner, R. A.
Jaguar: A High-Performance Quantum Chemistry Software Program with
Strengths in Life and Materials Sciences. Int. J. Quantum Chem. 2013, 113,
2110–2142.
(16) Becke, A. D. A New Mixing of Hartree–Fock and Local Density-Functional
Theories. J. Chem. Phys. 1993, 98, 1372.
(17) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti
Correlation-Energy Formula into a Functional of the Electron Density. Phys.
Rev. B 1988, 37, 785–789.
(18) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Assessment of Gaussian-3
and Density-Functional Theories on the G3/05 Test Set of Experimental
Energies. J. Chem. Phys. 2005, 123, 124107.
(19) Ahlquist, M.; Kozuch, S.; Shaik, S.; Tanner, D.; Norrby, P.-O. On the
Performance of Continuum Solvation Models for the Solvation Energy of
Small Anions. Organometallics 2006, 25, 45–47.
(20) Ditchfield, R.; Hehre, W. J.; Pople, J. A. Self-Consistent Molecular-Orbital
Methods. IX. An Extended Gaussian-Type Basis for Molecular-Orbital
Studies of Organic Molecules. J. Chem. Phys. 1971, 54, 724.
(21) Hehre, W. J.; Pople, J. A. Self-Consistent Molecular Orbital Methods. XIII.
An Extended Gaussian-Type Basis for Boron. J. Chem. Phys. 1972, 56,
4233.
(22) Binkley, J. S.; Pople, J. A. Self-Consistent Molecular Orbital Methods. XIX.
Split-Valence Gaussian-Type Basis Sets for Beryllium. J. Chem. Phys.
1977, 66, 879.
(23) Hariharan, P. C.; Pople, J. A. The Influence of Polarization Functions on
Molecular Orbital Hydrogenation Energies. Theor. Chim. Acta 1973, 28,
213–222.
(24) Francl, M. M.; Pietro, W. J.; Hehre, W. J.; Binkley, J. S.; Gordon, M. S.;
DeFrees, D. J.; Pople, J. A. Self-Consistent Molecular Orbital Methods.
XXIII. A Polarization-Type Basis Set for Second-Row Elements. J. Chem.
Phys. 1982, 77, 3654.
(25) Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self-Consistent Molecular Orbital
Methods. XII. Further Extensions of Gaussian-Type Basis Sets for Use in
Molecular Orbital Studies of Organic Molecules. J. Chem. Phys. 1972, 56,
2257.
(26) Hay, P. J.; Wadt, W. R. Ab Initio Effective Core Potentials for Molecular
Calculations. Potentials for K to Au Including the Outermost Core Orbitals.
J. Chem. Phys. 1985, 82, 299.
(27) Grimme, S. Density Functional Theory with London Dispersion Corrections.
Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011, 1, 211–228.
(28) Ahlquist, M. S. G.; Norrby, P.-O. Dispersion and Back-Donation Gives
Tetracoordinate [Pd(PPh3)4]. Angew. Chemie Int. Ed. 2011, 50, 11794–
11797.
(29) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate
Ab Initio Parametrization of Density Functional Dispersion Correction
(DFT-D) for the 94 Elements H-Pu. J. Chem. Phys. 2010, 132, 154104–
154119.
(30) Antony, J.; Grimme, S.; Liakos, D. G.; Neese, F. Protein–Ligand Interaction
Energies with Dispersion Corrected Density Functional Theory and High-
Level Wave Function Based Methods. J. Phys. Chem. A 2011, 115, 11210–
11220.
(31) Tannor, D. J.; Marten, B.; Murphy, R.; Friesner, R. A.; Sitkoff, D.; Nicholls,
A.; Honig, B.; Ringnalda, M.; Goddard, W. A. Accurate First Principles
Calculation of Molecular Charge Distributions and Solvation Energies from
Ab Initio Quantum Mechanics and Continuum Dielectric Theory. J. Am.
Chem. Soc. 1994, 116, 11875–11882.
(32) Marten, B.; Kim, K.; Cortis, C.; Friesner, R. A.; Murphy, R. B.; Ringnalda,
M. N.; Sitkoff, D.; Honig, B. New Model for Calculation of Solvation Free
Energies: Correction of Self-Consistent Reaction Field Continuum
Dielectric Theory for Short-Range Hydrogen-Bonding Effects. J. Phys.
Chem. 1996, 100, 11775–11788.
(33) Hratchian, H. P.; Schlegel, H. B. Finding Minima, Transition States, and
Following Reaction Pathways on Ab Initio Potential Energy Surfaces. In
Theory and Applications of Computational Chemistry; Elsevier: Amsterdam,
2005; pp. 195–249.
(34) Ripphausen, P.; Nisius, B.; Peltason, L.; Bajorath, J. Quo Vadis, Virtual
Screening? A Comprehensive Survey of Prospective Applications. J. Med.
Chem. 2010, 53, 8461–8467.
(35) Matter, H.; Sotriffer, C. Applications and Success Stories in Virtual
Screening. In Virtual Screening; Wiley-VCH Verlag GmbH & Co. KGaA,
2011; pp. 319–358.
(36) Clark, D. E. What Has Virtual Screening Ever Done for Drug Discovery?
Expert Opin. Drug Discov. 2008, 3, 841–851.
(37) Concepts and Applications of Molecular Similarity; Johnson, M. A.;
Maggiora, G. M., Eds.; John Wiley & Sons: New York, 1990.
(38) Tresadern, G.; Bemporad, D.; Howe, T. A Comparison of Ligand Based
Virtual Screening Methods and Application to Corticotropin Releasing
Factor 1 Receptor. J. Mol. Graph. Model. 2009, 27, 860–870.
(39) Meng, X.-Y.; Zhang, H.-X.; Mezei, M.; Cui, M. Molecular Docking: A
Powerful Approach for Structure-Based Drug Discovery. Curr. Comput.
Aided. Drug Des. 2011, 7, 146–157.
(40) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.;
Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank.
Nucleic Acids Res. 2000, 28, 235–242.
(41) Kleywegt, G. J.; Harris, M. R.; Zou, J. Y.; Taylor, T. C.; Wählby, A.; Jones,
T. A. The Uppsala Electron-Density Server. Acta Crystallogr. D. Biol.
Crystallogr. 2004, 60, 2240–2249.
(42) Davis, A. M.; St-Gallay, S. A.; Kleywegt, G. J. Limitations and Lessons in
the Use of X-Ray Structural Information in Drug Design. Drug Discov.
Today 2008, 13, 831–841.
(43) Madhavi Sastry, G.; Adzhigirey, M.; Day, T.; Annabhimoju, R.; Sherman,
W. Protein and Ligand Preparation: Parameters, Protocols, and Influence on
Virtual Screening Enrichments. J. Comput. Aided. Mol. Des. 2013, 27, 221–
234.
(44) Kirchmair, J.; Spitzer, G. M.; Liedl, K. R. Consideration of Water and
Solvation Effects in Virtual Screening. In Virtual Screening; Wiley-VCH
Verlag GmbH & Co. KGaA, 2011; pp. 263–289.
(45) Lenselink, E. B.; Beuming, T.; Sherman, W.; van Vlijmen, H. W. T.;
IJzerman, A. P. Selecting an Optimal Number of Binding Site Waters To
Improve Virtual Screening Enrichments Against the Adenosine A2A
Receptor. J. Chem. Inf. Model. 2014, 54, 1737–1746.
(46) Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A. The Role of the
Active Site Solvent in the Thermodynamics of Factor Xa-Ligand Binding. J.
Am. Chem. Soc. 2008, 130, 2817–2831.
(47) Young, T.; Abel, R.; Kim, B.; Berne, B. J.; Friesner, R. A. Motifs for
Molecular Recognition Exploiting Hydrophobic Enclosure in Protein–ligand
Binding. Proc. Natl. Acad. Sci. 2007, 104, 808–813.
(48) Kirkpatrick, P.; Ellis, C. Chemical Space. Nature 2004, 432, 823.
(49) Lipinski, C.; Hopkins, A. Navigating Chemical Space for Biology and
Medicine. Nature 2004, 432, 855–861.
(50) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental
and Computational Approaches to Estimate Solubility and Permeability in
Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 1997, 23,
3–25.
(51) Pearson, K. LIII. On Lines and Planes of Closest Fit to Systems of Points in
Space. Philos. Mag. Ser. 6 1901, 2, 559–572.
(52) Jain, A. N.; Nicholls, A. Recommendations for Evaluation of Computational
Methods. J. Comput. Aided. Mol. Des. 2008, 22, 133–139.
(53) Xia, J.; Tilahun, E. L.; Reid, T.-E.; Zhang, L.; Wang, X. S. Benchmarking
Methods and Data Sets for Ligand Enrichment Assessment in Virtual
Screening. Methods 2015, 71, 146–157.
(54) Huang, N.; Shoichet, B. K.; Irwin, J. J. Benchmarking Sets for Molecular
Docking. J. Med. Chem. 2006, 49, 6789–6801.
(55) Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K. Directory of
Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better
Benchmarking. J. Med. Chem. 2012, 55, 6582–6594.
(56) Rohrer, S. G.; Baumann, K. Maximum Unbiased Validation (MUV) Data
Sets for Virtual Screening Based on PubChem Bioactivity Data. J. Chem.
Inf. Model. 2009, 49, 169–184.
(57) Vogel, S. M.; Bauer, M. R.; Boeckler, F. M. DEKOIS: Demanding
Evaluation Kits for Objective in Silico Screening--a Versatile Tool for
Benchmarking Docking Programs and Scoring Functions. J. Chem. Inf.
Model. 2011, 51, 2650–2665.
(58) Bauer, M. R.; Ibrahim, T. M.; Vogel, S. M.; Boeckler, F. M. Evaluation and
Optimization of Virtual Screening Workflows with DEKOIS 2.0 – A Public
Library of Challenging Docking Benchmark Sets. J. Chem. Inf. Model.
2013, 53, 1447–1462.
(59) Irwin, J. J.; Shoichet, B. K. ZINC − A Free Database of Commercially
Available Compounds for Virtual Screening. J. Chem. Inf. Model. 2005, 45,
177–182.
(60) Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G.
ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model.
2012, 52, 1757–1768.
(61) Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K. BindingDB: A
Web-Accessible Database of Experimentally Determined Protein–ligand
Binding Affinities. Nucleic Acids Res. 2007, 35, D198–D201.
(62) Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Zhou, Z.; Han, L.;
Karapetyan, K.; Dracheva, S.; Shoemaker, B. A.; Bolton, E.; Gindulyte, A.;
Bryant, S. H. PubChem’s BioAssay Database. Nucleic Acids Res. 2012, 40,
D400–D412.
(63) Irwin, J. J. Community Benchmarks for Virtual Screening. J. Comput.
Aided. Mol. Des. 2008, 22, 193–199.
(64) Verdonk, M. L.; Berdini, V.; Hartshorn, M. J.; Mooij, W. T. M.; Murray, C.
W.; Taylor, R. D.; Watson, P. Virtual Screening Using Protein-Ligand
Docking: Avoiding Artificial Enrichment. J. Chem. Inf. Comput. Sci. 2004,
44, 793–806.
(65) Good, A. C.; Hermsmeier, M. A.; Hindle, S. A. Measuring CAMD Technique
Performance: A Virtual Screening Case Study in the Design of Validation
Experiments. J. Comput. Aided. Mol. Des. 2004, 18, 529–536.
(66) Truchon, J.-F.; Bayly, C. I. Evaluating Virtual Screening Methods: Good
and Bad Metrics for the “Early Recognition” Problem. J. Chem. Inf. Model.
2007, 47, 488–508.
(67) Berzelius, J. Årsberättelsen Om Framsteg I Fysik Och Kemi; Royal Swedish
Academy of Sciences, 1835.
(68) Mizoroki, T.; Mori, K.; Ozaki, A. Arylation of Olefin with Aryl Iodide
Catalyzed by Palladium. Bull. Chem. Soc. Jpn. 1971, 44, 581.
(69) Heck, R. F.; Nolley, J. P. Palladium-Catalyzed Vinylic Hydrogen
Substitution Reactions with Aryl, Benzyl, and Styryl Halides. J. Org. Chem.
1972, 37, 2320–2322.
(70) Heck, R. F. Acylation, Methylation, and Carboxyalkylation of Olefins by
Group VIII Metal Derivatives. J. Am. Chem. Soc. 1968, 90, 5518–5526.
(71) Shepard, A. F.; Winslow, N. R.; Johnson, J. R. The Simple Halogen
Derivatives of Furan. J. Am. Chem. Soc. 1930, 52, 2083–2090.
(72) Tanaka, D.; Myers, A. G. Heck-Type Arylation of 2-Cycloalken-1-Ones
with Arylpalladium Intermediates Formed by Decarboxylative Palladation
and by Aryl Iodide Insertion. Org. Lett. 2004, 6, 433–436.
(73) Tanaka, D.; Romeril, S. P.; Myers, A. G. On the Mechanism of the
Palladium(II)-Catalyzed Decarboxylative Olefination of Arene Carboxylic
Acids. Crystallographic Characterization of Non-Phosphine Palladium(II)
Intermediates and Observation of Their Stepwise Transformation in Heck-
like Processes. J. Am. Chem. Soc. 2005, 127, 10323–10333.
(74) Myers, A. G.; Tanaka, D.; Mannion, M. R. Development of a
Decarboxylative Palladation Reaction and Its Use in a Heck-Type
Olefination of Arene Carboxylates. J. Am. Chem. Soc. 2002, 124, 11250–
11251.
(75) Dickstein, J. S.; Mulrooney, C. A.; O’Brien, E. M.; Morgan, B. J.;
Kozlowski, M. C. Development of a Catalytic Aromatic Decarboxylation
Reaction. Org. Lett. 2007, 9, 2441–2444.
(76) Skillinghaug, B.; Sköld, C.; Rydfjord, J.; Svensson, F.; Behrends, M.;
Sävmarker, J.; Sjöberg, P. J. R.; Larhed, M. Palladium(II)-Catalyzed
Desulfitative Synthesis of Aryl Ketones from Sodium Arylsulfinates and
Nitriles: Scope, Limitations, and Mechanistic Studies. J. Org. Chem. 2014,
79, 12018–12032.
(77) McMullin, C. L.; Jover, J.; Harvey, J. N.; Fey, N. Accurate Modelling of
Pd(0) + PhX Oxidative Addition Kinetics. Dalton Trans. 2010, 39, 10833–
10836.
(78) Fukui, K. The Path of Chemical Reactions - the IRC Approach. Acc. Chem.
Res. 1981, 14, 363–368.
(79) Müller, K. Reaction Paths on Multidimensional Energy Hypersurfaces.
Angew. Chemie Int. Ed. 1980, 19, 1–13.
(80) Goodman, J. M.; Silva, M. A. QRC: A Rapid Method for Connecting
Transition Structures to Reactants in the Computational Analysis of Organic
Reactivity. Tetrahedron Lett. 2003, 44, 8233–8236.
(81) Lienhard, G. E. Enzymatic Catalysis and Transition-State Theory. Science
1973, 180, 149–154.
(82) Schwartz, S. D.; Schramm, V. L. Enzymatic Transition States and Dynamic
Motion in Barrier Crossing. Nat. Chem. Biol. 2009, 5, 551–558.
(83) Imming, P.; Sinning, C.; Meyer, A. Drugs, Their Targets and the Nature and
Number of Drug Targets. Nat. Rev. Drug Discov. 2006, 5, 821–834.
(84) Blomberg, M. R. A.; Borowski, T.; Himo, F.; Liao, R.-Z.; Siegbahn, P. E.
M. Quantum Chemical Studies of Mechanisms for Metalloenzymes. Chem.
Rev. 2014, 114, 3601–3658.
(85) Carvalho, A. T. P.; Barrozo, A.; Doron, D.; Kilshtain, A. V.; Major, D. T.;
Kamerlin, S. C. L. Challenges in Computational Studies of Enzyme
Structure, Function and Dynamics. J. Mol. Graph. Model. 2014, 54, 62–79.
(86) Senn, H. M.; Thiel, W. QM/MM Studies of Enzymes. Curr. Opin. Chem.
Biol. 2007, 11, 182–187.
(87) Siegbahn, P. E. M.; Himo, F. The Quantum Chemical Cluster Approach for
Modeling Enzyme Reactions. Wiley Interdiscip. Rev. Comput. Mol. Sci.
2011, 1, 323–336.
(88) Siegbahn, P. E. M.; Himo, F. Recent Developments of the Quantum Chemical
Cluster Approach for Modeling Enzyme Reactions. J. Biol. Inorg. Chem.
2009, 14, 643–651.
(89) Li, L.; Li, C.; Zhang, Z.; Alexov, E. On the Dielectric “Constant” of
Proteins: Smooth Dielectric Function for Macromolecular Modeling and Its
Implementation in DelPhi. J. Chem. Theory Comput. 2013, 9, 2126–2136.
(90) Sevastik, R.; Himo, F. Quantum Chemical Modeling of Enzymatic
Reactions: The Case of 4-Oxalocrotonate Tautomerase. Bioorg. Chem.
2007, 35, 444–457.
(91) Hopmann, K. H.; Himo, F. Quantum Chemical Modeling of the
Dehalogenation Reaction of Haloalcohol Dehalogenase. J. Chem. Theory
Comput. 2008, 4, 1129–1137.
(92) Georgieva, P.; Himo, F. Quantum Chemical Modeling of Enzymatic
Reactions: The Case of Histone Lysine Methyltransferase. J. Comput. Chem.
2010, 31, 1707–1714.
(93) Hu, L.; Eliasson, J.; Heimdal, J.; Ryde, U. Do Quantum Mechanical
Energies Calculated for Small Models of Protein-Active Sites Converge?
J. Phys. Chem. A 2009, 113, 11793–11800.
(94) Sumner, S.; Söderhjelm, P.; Ryde, U. Effect of Geometry Optimizations on
QM-Cluster and QM/MM Studies of Reaction Energies in Proteins. J.
Chem. Theory Comput. 2013, 9, 4205–4214.
(95) Lindh, J.; Sjöberg, P. J. R.; Larhed, M. Synthesis of Aryl Ketones by
Palladium(II)-Catalyzed Decarboxylative Addition of Benzoic Acids to
Nitriles. Angew. Chemie Int. Ed. 2010, 49, 7733–7737.
(96) Xue, L.; Su, W.; Lin, Z. A DFT Study on the Pd-Mediated Decarboxylation
Process of Aryl Carboxylic Acids. Dalton Trans. 2010, 39, 9815–9822.
(97) Zhang, S.-L.; Fu, Y.; Shang, R.; Guo, Q.-X.; Liu, L. Theoretical Analysis of
Factors Controlling Pd-Catalyzed Decarboxylative Coupling of Carboxylic
Acids with Olefins. J. Am. Chem. Soc. 2009, 132, 638–646.
(98) Svensson, F.; Mane, R. S.; Sävmarker, J.; Larhed, M.; Sköld, C. Theoretical
and Experimental Investigation of Palladium(II)-Catalyzed Decarboxylative
Addition of Arenecarboxylic Acid to Nitrile. Organometallics 2013, 32,
490–497.
(99) Sävmarker, J.; Rydfjord, J.; Gising, J.; Odell, L. R.; Larhed, M. Direct
Palladium(II)-Catalyzed Synthesis of Arylamidines from
Aryltrifluoroborates. Org. Lett. 2012, 14, 2394–2397.
(100) Kormos, C. M.; Leadbeater, N. E. Preparation of Nonsymmetrically
Substituted Stilbenes in a One-Pot Two-Step Heck Strategy Using Ethene
As a Reagent. J. Org. Chem. 2008, 73, 3854–3858.
(101) Cabri, W.; Candiani, I.; Bedeschi, A.; Santi, R. Palladium-Catalyzed
Arylation of Unsymmetrical Olefins. Bidentate Phosphine Ligand
Controlled Regioselectivity. J. Org. Chem. 1992, 57, 3558–3563.
(102) Arai, I.; Daves, G. D. Palladium-Catalyzed Phenylation of Enol Ethers and
Acetates. J. Org. Chem. 1979, 44, 21–23.
(103) Mitchell, H. B. Multi-Sensor Data Fusion; Springer-Verlag: Berlin, 2007.
(104) Charifson, P. S.; Corkery, J. J.; Murcko, M. A.; Walters, W. P. Consensus
Scoring: A Method for Obtaining Improved Hit Rates from Docking
Databases of Three-Dimensional Structures into Proteins. J. Med. Chem.
1999, 42, 5100–5109.
(105) Willett, P. Enhancing the Effectiveness of Ligand-Based Virtual Screening
Using Data Fusion. QSAR Comb. Sci. 2006, 25, 1143–1152.
(106) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.;
Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelly, M.; Perry, J. K.; Shaw,
D. E.; Francis, P.; Shenkin, P. S. Glide: A New Approach for Rapid,
Accurate Docking and Scoring. 1. Method and Assessment of Docking
Accuracy. J. Med. Chem. 2004, 47, 1739–1749.
(107) Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.;
Pollard, W. T.; Banks, J. L. Glide: A New Approach for Rapid, Accurate
Docking and Scoring. 2. Enrichment Factors in Database Screening. J. Med.
Chem. 2004, 47, 1750–1759.
(108) Hawkins, P. C. D.; Skillman, A. G.; Nicholls, A. Comparison of Shape-
Matching and Docking as Virtual Screening Tools. J. Med. Chem. 2007, 50,
74–82.
(109) Dixon, S. L.; Smondyrev, A. M.; Knoll, E. H.; Rao, S. N.; Shaw, D. E.;
Friesner, R. A. PHASE: A New Engine for Pharmacophore Perception, 3D
QSAR Model Development, and 3D Database Screening: 1. Methodology
and Preliminary Results. J. Comput. Aided. Mol. Des. 2006, 20, 647–671.
(110) Dixon, S. L.; Smondyrev, A. M.; Rao, S. N. PHASE: A Novel Approach to
Pharmacophore Modeling and 3D Database Searching. Chem. Biol. Drug
Des. 2006, 67, 370–372.
(111) Muchmore, S. W.; Souers, A. J.; Akritopoulou-Zanze, I. The Use of Three-
Dimensional Shape and Electrostatic Similarity Searching in the
Identification of a Melanin-Concentrating Hormone Receptor 1 Antagonist.
Chem. Biol. Drug Des. 2006, 67, 174–176.
(112) Tan, L.; Geppert, H.; Sisay, M. T.; Gütschow, M.; Bajorath, J. Integrating
Structure- and Ligand-Based Virtual Screening: Comparison of Individual,
Parallel, and Fused Molecular Docking and Similarity Search Calculations
on Multiple Targets. ChemMedChem 2008, 1566–1571.
(113) Baber, J. C.; Shirley, W. A.; Gao, Y.; Feher, M. The Use of Consensus
Scoring in Ligand-Based Virtual Screening. J. Chem. Inf. Model. 2006, 46,
244–288.
(114) Cross, S.; Baroni, M.; Carosati, E.; Benedetti, P.; Clementi, S. FLAP: GRID
Molecular Interaction Fields in Virtual Screening. Validation Using the
DUD Data Set. J. Chem. Inf. Model. 2010, 50, 1442–1450.
(115) Wang, R.; Wang, S. How Does Consensus Scoring Work for Virtual Library
Screening? An Idealized Computer Experiment. J. Chem. Inf. Comput. Sci.
2001, 41, 1422–1426.
(116) Svensson, F.; Karlén, A.; Sköld, C. Virtual Screening Data Fusion Using
Both Structure- and Ligand-Based Methods. J. Chem. Inf. Model. 2012, 52,
225–232.
(117) Jacobsson, M.; Karlén, A. Ligand Bias of Scoring Functions in Structure-
Based Virtual Screening. J. Chem. Inf. Model. 2006, 46, 1334–1343.
(118) Muthas, D.; Sabnis, Y. A.; Lundborg, M.; Karlén, A. Is It Possible to Increase
Hit Rates in Structure-Based Virtual Screening by Pharmacophore Filtering?
An Investigation of the Advantages and Pitfalls of Post-Filtering. J. Mol.
Graph. Model. 2008, 26, 1237–1251.
(119) Cheeseright, T. J.; Mackey, M. D.; Melville, J. L.; Vinter, J. G. FieldScreen:
Virtual Screening Using Molecular Fields. Application to the DUD Data
Set. J. Chem. Inf. Model. 2008, 48, 2108–2117.
(120) Sastry, G. M.; Inakollu, V. S. S.; Sherman, W. Boosting Virtual Screening
Enrichments with Data Fusion: Coalescing Hits from Two-Dimensional
Fingerprints, Shape, and Docking. J. Chem. Inf. Model. 2013, 53, 1531–
1542.
(121) Subramaniam, S.; Mehrotra, M.; Gupta, D. Virtual High Throughput
Screening (vHTS) - A Perspective. Bioinformation 2008, 3, 14–17.
(122) Blucher, A. S.; McWeeney, S. K. Challenges in Secondary Analysis of High
Throughput Screening Data. Pac. Symp. Biocomput. 2014, 114–124.
(123) Bender, A.; Glen, R. C. A Discussion of Measures of Enrichment in Virtual
Screening: Comparing the Information Content of Descriptors with
Increasing Levels of Sophistication. J. Chem. Inf. Model. 2005, 45, 1369–
1375.
(124) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular
Frameworks. J. Med. Chem. 1996, 39, 2887–2893.
(125) Lindh, M.; Svensson, F.; Schaal, W.; Zhang, J.; Sköld, C.; Brandt, P.;
Karlén, A. Toward a Benchmarking Data Set Able to Evaluate Ligand- and
Structure-Based Virtual Screening Using Public HTS Data. J. Chem. Inf.
Model. 2015, 55, 343–353.
(126) Gluza, K.; Kafarski, P. Transition State Analogues of Enzymatic
Reactions as Potential Drugs. In Drug Discovery; El-Shemy, H., Ed.;
InTech: Rijeka, Croatia, 2013.
(127) Schramm, V. L. Enzymatic Transition States, Transition-State Analogs,
Dynamics, Thermodynamics, and Lifetimes. Annu. Rev. Biochem. 2011, 80,
703–732.
(128) Schramm, V. L. Transition States, Analogues, and Drug Development. ACS
Chem. Biol. 2012, 8, 71–81.
(129) Andersson, H.; Hallberg, M. Discovery of Inhibitors of Insulin-Regulated
Aminopeptidase as Cognitive Enhancers. Int. J. Hypertens. 2012, 2012,
Article ID 789671.
(130) Svensson, F.; Engen, K.; Lundbäck, T.; Larhed, M.; Sköld, C. Virtual
Screening for Transition State Analogue Inhibitors of IRAP Based on
Quantum Mechanically Derived Reaction Coordinates. J. Chem. Inf. Model.
2015. DOI: 10.1021/acs.jcim.5b00359
(131) Matsumoto, H.; Rogi, T.; Yamashiro, K.; Kodama, S.; Tsuruoka, N.;
Hattori, A.; Takio, K.; Mizutani, S.; Tsujimoto, M. Characterization of a
Recombinant Soluble Form of Human Placental Leucine
Aminopeptidase/oxytocinase Expressed in Chinese Hamster Ovary Cells.
Eur. J. Biochem. 2000, 267, 46–52.
(132) Wong, A. H. M.; Zhou, D.; Rini, J. M. The X-Ray Crystal Structure of
Human Aminopeptidase N Reveals a Novel Dimer and the Basis for Peptide
Processing. J. Biol. Chem. 2012, 287, 36804–36813.
(133) Hermans, S. J.; Ascher, D. B.; Hancock, N. C.; Holien, J. K.; Michell, B. J.;
Yeen Chai, S.; Morton, C. J.; Parker, M. W. Crystal Structure of Human
Insulin-Regulated Aminopeptidase with Specificity for Cyclic Peptides.
Protein Sci. 2015, 24, 190–199.
(134) Kochan, G.; Krojer, T.; Harvey, D.; Fischer, R.; Chen, L.; Vollmar, M.; von
Delft, F.; Kavanagh, K. L.; Brown, M. A.; Bowness, P.; Wordsworth, P.;
Kessler, B. M.; Oppermann, U. Crystal Structures of the Endoplasmic
Reticulum Aminopeptidase-1 (ERAP1) Reveal the Molecular Basis for N-
Terminal Peptide Trimming. Proc. Natl. Acad. Sci. 2011, 108, 7745–7750.
(135) Gandhi, A.; Lakshminarasimhan, D.; Sun, Y.; Guo, H.-C. Structural Insights
into the Molecular Ruler Mechanism of the Endoplasmic Reticulum
Aminopeptidase ERAP1. Sci. Rep. 2011, 1.
(136) Nguyen, T. T.; Chang, S.-C.; Evnouchidou, I.; York, I. A.; Zikos, C.; Rock,
K. L.; Goldberg, A. L.; Stratikos, E.; Stern, L. J. Structural Basis for
Antigenic Peptide Precursor Processing by the Endoplasmic Reticulum
Aminopeptidase ERAP1. Nat. Struct. Mol. Biol. 2011, 18, 604–613.
(137) McGowan, S.; Porter, C. J.; Lowther, J.; Stack, C. M.; Golding, S. J.;
Skinner-Adams, T. S.; Trenholme, K. R.; Teuscher, F.; Donnelly, S. M.;
Grembecka, J.; Mucha, A.; Kafarski, P.; DeGori, R.; Buckle, A. M.;
Gardiner, D. L.; Whisstock, J. C.; Dalton, J. P. Structural Basis for the
Inhibition of the Essential Plasmodium Falciparum M1 Neutral
Aminopeptidase. Proc. Natl. Acad. Sci. 2009, 106, 2537–2542.
(138) Chen, L.; Lin, Y.-L.; Peng, G.; Li, F. Structural Basis for Multifunctional
Roles of Mammalian Aminopeptidase N. Proc. Natl. Acad. Sci. 2012, 109,
17966–17971.
(139) Grzywa, R.; Sokol, A. M.; Sieńczyk, M.; Radziszewicz, M.; Kościołek, B.;
Carty, M. P.; Oleksyszyn, J. New Aromatic Monoesters of
α-Aminoaralkylphosphonic Acids as Inhibitors of Aminopeptidase N/CD13.
Bioorg. Med. Chem. 2010, 18, 2930–2936.
Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Pharmacy 201
Editor: The Dean of the Faculty of Pharmacy
A doctoral dissertation from the Faculty of Pharmacy, Uppsala
University, is usually a summary of a number of papers. A few
copies of the complete dissertation are kept at major Swedish
research libraries, while the summary alone is distributed
internationally through the series Digital Comprehensive
Summaries of Uppsala Dissertations from the Faculty of
Pharmacy. (Prior to January, 2005, the series was published
under the title “Comprehensive Summaries of Uppsala
Dissertations from the Faculty of Pharmacy”.)
Distribution: publications.uu.se
urn:nbn:se:uu:diva-259443
Acta Universitatis Upsaliensis, Uppsala 2015

