Beruflich Dokumente
Kultur Dokumente
Computational Methods in
Medicinal Chemistry
Mechanistic Investigations and Virtual Screening
Development
FREDRIK SVENSSON
ACTA
UNIVERSITATIS
UPSALIENSIS ISSN 1651-6192
ISBN 978-91-554-9293-9
UPPSALA urn:nbn:se:uu:diva-259443
2015
Dissertation presented at Uppsala University to be publicly examined in A1:107a, BMC,
Husargatan 3, Uppsala, Friday, 25 September 2015 at 09:15 for the degree of Doctor of
Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty
examiner: Professor Anna Linusson (Umeå University, Department of Chemistry).
Abstract
Svensson, F. 2015. Computational Methods in Medicinal Chemistry. Mechanistic
Investigations and Virtual Screening Development. Digital Comprehensive Summaries of
Uppsala Dissertations from the Faculty of Pharmacy 201. 65 pp. Uppsala: Acta Universitatis
Upsaliensis. ISBN 978-91-554-9293-9.
Computational methods have become an integral part of drug development and can help bring
new and better drugs to the market faster. The process of predicting the biological activity of
large compound collections is known as virtual screening, and has been instrumental in the
development of several drugs today in the market. Computational methods can also be used to
elucidate the energies associated with chemical reactivity and predict how to improve a synthetic
protocol. These two applications of computational medicinal chemistry is the focus of this thesis.
In the first part of this work, quantum mechanics has been used to probe the energy surface of
palladium(II)-catalyzed decarboxylative reactions in order to gain a better understating of these
systems (paper I-III). These studies have mapped the reaction pathways and been able to make
accurate predictions that were verified experimentally.
The other focus of this work has been to develop virtual screening methodology. Our first
study in the area (paper IV) investigated if the results from several virtual screening methods
could be combined using data fusion techniques in order to get a more consistent result and
better performance. The study showed that the results obtained from data fusion were more
consistent than the results from any single method. The data fusion methods also for several
target had a better performance than any of the included single methods.
Next, we developed a dataset suitable for evaluating the performance of virtual screening
methods when applied to large compound collection as a replacement or complement for high
throughput screening (paper V). This is the first benchmark dataset of its kind.
Finally, a method for using computationally derived reaction coordinates as basis for virtual
screening was developed. The aim was to find inhibitors that resemble key steps in the
mechanism (paper VI). This initial proof of concept study managed to locate several known and
one previously not reported reaction mimetics against insulin regulated amino peptidase.
ISSN 1651-6192
ISBN 978-91-554-9293-9
urn:nbn:se:uu:diva-259443 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-259443)
”A really good theoretical chemist
can obtain right answers with wrong
models.”
-J. H. Van Vleck
List of Papers
This thesis is based on the following papers, which are referred to in the text
by their Roman numerals.
1 Introduction ......................................................................................... 11
1.1 Computational Tools in Drug Discovery ........................................ 11
2 Reactivity and Energy.......................................................................... 12
2.1 Energies of Ligand Binding ............................................................ 12
2.2 Potential Energy Surface ................................................................. 13
2.3 Transition State Theory ................................................................... 14
2.4 Molecular Mechanics ...................................................................... 15
2.5 Quantum Mechanics ....................................................................... 16
2.5.1 Density Functional Theory .................................................... 17
2.5.2 Level of Theory ..................................................................... 17
3 Virtual Screening ................................................................................. 20
3.1 Ligand-Based Methods ................................................................... 20
3.2 Molecular Docking ......................................................................... 20
3.3 Database and Target Preparation .................................................... 21
3.3.1 Water Molecules in Virtual Screening ................................... 21
3.4 The Concept of Chemical Space ..................................................... 22
3.4.1 Principal Component Analysis .............................................. 22
3.5 Benchmark Datasets ........................................................................ 22
3.6 Evaluation Metrics .......................................................................... 23
4 Catalysis............................................................................................... 25
4.1 Palladium Catalysis ......................................................................... 25
4.1.1 Decarboxylation ..................................................................... 26
4.1.2 Palladium Ligands ................................................................. 26
4.1.3 Density Functional Theory Modeling of Palladium-
Catalyzed Reactions............................................................................. 27
4.2 Enzyme Catalysis ............................................................................ 28
4.2.1 Modeling of Enzyme Reactions ............................................. 28
4.2.2 Cluster Approach ................................................................... 29
5 Aims of the Thesis ............................................................................... 30
6 Decarboxylative Palladium(II)-Catalyzed Reactions (papers I-III) ..... 31
6.1 Synthesis of Aryl Ketones............................................................... 31
6.2 Synthesis of Aryl Amidines ............................................................ 34
6.3 Styrene Synthesis ............................................................................ 36
7 Method Consensus in Virtual Screening (paper IV) ............................ 41
8 Evaluation of Virtual Screening Using Publicly Available High
Throughput Screening Data (paper V)................................................. 44
9 Virtual Screening Using Computationally Derived Reaction
Coordinates (paper VI) ........................................................................ 48
10 Conclusions ......................................................................................... 54
11 Acknowledgements.............................................................................. 55
12 References ........................................................................................... 56
Abbreviations
APN aminopeptidase N
B3LYP Becke, 3-parameter, Lee-Yang-Parr
BEDROC Boltzman-enhanced discrimination of ROC
DEKOIS demanding evaluation kits for objective in silico screening
DFT density functional theory
DUD directory of useful decoys
EF enrichment factor
ERAP endoplasmic reticulum aminopeptidase
ESI-MS electrospray ionization mass spectrometry
FF force field
fp false positive
HTS high throughput screening
IRAP insulin-regulated aminopeptidase
IRC intrinsic reaction coordinate
LB ligand-based
MM molecular mechanics
MUV maximum unbiased validation (dataset)
PCA principal component analysis
PES potential energy surface
PDB protein data bank
QM quantum mechanics
QRC quick reaction coordinate
ROC receiver operator characteristic
SB structure-based
tp true positive
TS transition state
TST transition-state theory
VA vinyl acetate
VS virtual screening
1 Introduction
For thousands of years humans have treated ailments with remedies derived
from natural sources such as plants, fungi, insects, and animals. During the
nineteenth century it was realized that the therapeutic effects were due to
specific molecules present in these sources. Subsequent development of the
chemical industry provided new synthetic sources for these molecules, set-
ting the stage for the emergence of the pharmaceutical industry. Although
this process revolutionized the treatment of many diseases, the discovery of
new drugs still happened mainly by chance. Gradually, the newly invented
methods led to what we call rational drug design, the leading paradigm of
drug development today.1
11
2 Reactivity and Energy
∆ =∆ − ∆
/
=
12
Estimation of the entropy term is less straightforward, as it is dependent
on the effects of the binding event on the degrees of freedom of the ligand,
the protein, and the solvent molecules.5
The ultimate goal of computational medicinal chemistry could be thought
of as the accurate prediction of the free energy of a binding event.
Figure 1. Calculated PES for the decarboxylation of benzoic acid. Energy is plotted
as a function of bond lengths r1 and r2.
13
creases. The probability distribution of molecules over their possible con-
formational states is described by a Boltzmann distribution through:
/
=∑ /
∆ ‡
=κ
ℎ
14
2.4 Molecular Mechanics
The application of principles from classical mechanics to describe molecular
systems is called molecular mechanics (MM). In MM, an atom is considered
as a single particle and bonds between atoms are treated like springs.
Through this approach, the potential energy of a molecular system can be
calculated using a set of parameterized functions called a force field (FF).
The concept of MM was introduced in 1946 by Hill,9 and Westheimer and
Mayer.10
The terms included depend on the FF used. Generally, terms describing
bond length, torsion, angle, and non-bonded interactions are considered. The
parameters are chosen so that the model can reproduce known experimental
structures or structures derived from QM calculations. One of the first suc-
cessful FFs was MM2, published in 1977 by Allinger.11 In this FF the poten-
tial energy is described as:
= + +
+ +
where:
= 71.94 ( − ) (1 − 2( − )),
= 0.021914 ( − ) (1 + 7 ∙ 10 ∙( − ) ),
= 2.51124 (θ − )( − ) + (l − ) ,
= (2.9 ∙ 10 ∙ − 2.25 ),
15
speed of MM calculations allows large databases of small molecules to be
considered, and large systems to be readily optimized.
However, the standard FF parameterization does not allow chemical reac-
tions to be modeled. Even with a set of parameters adapted for modeling a
specific reaction, MM would struggle to accurately describe the energy of
the process. This is because, in order to capture the full nature of a reaction,
the electronic structure must be considered.
Ψ= Ψ
( )= … |Ψ( , , , …, , )| …
16
2.5.1 Density Functional Theory
Density functional theory (DFT) relates the system properties to the electron
density function, and depends on only three spatial coordinates. This has
made DFT a cost-effective alternative for modeling larger systems.
Although it was recognized early on that using the electron density func-
tion could provide a simpler approach than the wave function methods, no
proof existed that the electron density function uniquely determined the
properties of the system. Hohenberg and Kohn provided the fundamental
principles for DFT when they proved that the electron density of a system
completely determines the electronic energy.13 In their second theorem, they
also showed that any trial electron density satisfying the boundary conditions
results in energy higher than or equal to the true ground-state energy; in oth-
er words, they showed that the variational principle is also valid in this ap-
proach.
Although this marked a theoretical breakthrough, the Hohenberg-Kohn
theorems do not give any practical guidance on how to construct the density
functional and determine the energy of the studied system. Also, the second
theorem is of limited practical use, as it is only valid for the exact density
functional, which is not available.
Once the soundness of the method had been proved, however, it did not
take long for a practical approach to the density functional to be presented by
Kohn and Sham.14
One challenge with the Kohn-Sham method is that it includes an ex-
change-correlation functional term for which the expression is unknown.
Any attempt to apply the Kohn-Sham method thus has to be accompanied by
an estimate of this term, resulting in a multitude of different functional
forms.
17
Pople) basis set which utilizes the 6-31G20–25 basis set for elements H-Ar and
an effective core potential together with a valence basis set for heavier ele-
ments.26 The 6-31G basis set is a split-valence double-zeta basis set. This
means that it uses one basis function for each core atomic orbital and two
basis functions for the outer atomic orbitals. The basis sets can be further
expanded by the addition of polarization functions, denoted by * when added
to all atoms except for transition metals, hydrogen, and helium, and by **
when also added to hydrogen and helium atoms. The purpose is to expand
the available space for the orbitals to accommodate polarization in a certain
direction. The orbital radius can be further expanded by the addition of dif-
fuse functions (denoted + or ++) to increase the accuracy when considering
anions and highly electronegative atoms.
2.5.2.1 Dispersion
Electron correlation gives rise to long range attractive forces called disper-
sion. These long range electron-electron interactions are not well modeled in
most DFT functionals.27 Nevertheless, an accurate description of the disper-
sion energy is essential for modeling some systems and has to be accounted
for.28
In the papers employing DFT calculations presented in this thesis, the
DFT-D3 protocol was used to correct for dispersion.29 Antony et al. have
shown that, when modeling protein ligand energies, the accuracy of disper-
sion-corrected DFT methods equals that of high-level wave function meth-
ods.30
2.5.2.2 Solvation
Chemical reactions generally do not take place in an ideal gas phase, as as-
sumed in standard QM calculations, but rather in some form of solvent. The
solvent molecules constantly interact with the reaction system and alter its
properties. The most accurate way to model this would be to include explicit
solvent molecules in the QM calculations. This, however, rapidly becomes
impractical as the size of the system would drastically increase and the con-
formational freedoms would soon become unmanageable. Instead, solvent
effects are usually modeled by continuum solvent models that simulate the
average effect of the solvent. The solvation models generally apply the sol-
vent's dielectric constant (ε) and radius to simulate the electrostatic effect of
the solvent on the system.
In this thesis, the Poisson-Boltzman finite continuum model31,32 has been
applied in all DFT studies.
18
calculated dispersion correction, the zero point energy, and the harmonic
entropy contribution at the investigated temperature.
The first thing to note is that we are only interested in relative energies and,
thus, we choose to denote the starting point as zero energy. From the energy
profile we can then determine the thermodynamic properties of the reaction
(endergonic or exergonic) by comparing the relative energies of the reactant
and the product. We can also calculate the amount of free energy that is re-
quired for the reaction to take place by comparing the free energy of the
highest energy TS to the lowest preceding point (reactant to the second TS,
80 kJ mol-1). This free-energy requirement in turn is proportional to the rate
of the reaction.
19
3 Virtual Screening
20
need to be conformationally sampled, fitted in to the binding site of the mac-
romolecular target, and the energies of the formed interactions must be esti-
mated to generate a score.39
Performance of SB VS requires the protein structure to be derived, usual-
ly by X-ray crystallography of a protein crystal. The Protein Data Bank
(PDB) is a valuable source of protein structures which contains more than
100 000 structures that have been deposited by researchers around the
world.40 The electron densities of many structures are also available through
the Uppsala Electron-Density Server.41
Although the availability of large numbers of crystal structures has been
of immense importance to understanding protein–ligand interactions and the
development of new drugs, careful evaluation should accompany the use of
protein crystal data.42 When evaluating the quality of a crystal structure for
use in SB VS, it is essential to review the actual electron density map in or-
der to make sure that the ligand and the binding site are well defined. Also,
the relevance of the crystalized conformation has to be evaluated and the
possible effects of flexibility assessed.
21
When preparing a protein for VS, the water present in the active site re-
quires special consideration. For simplicity, water is often removed but, for
some inhibitors, the presence of water molecules might be essential in order
to predict the correct binding pose.45 Overall, the displacement of water from
the right areas of the active site cavity is essential to achieve favorable bind-
ing free energy.46,47
22
through a test set with known active compounds pooled with other mole-
cules, referred to as decoys. Such a dataset can be used to estimate the ability
of different algorithms to retrieve active compounds. A number of such
benchmark sets are available.53 The most well-known are: the directory of
useful decoys (DUD) and the enhanced version (DUD-E),54,55 the maximum
unbiased validation (MUV) dataset,56 and the demanding evaluation kits for
objective in silico screening (DEKOIS).57,58
Many benchmark datasets are compiled using compounds from publicly
available databases for small molecules such as the ZINC,59,60 BindingDB,61
and PubChem62 databases. Often the data are supplemented using com-
pounds from the literature for the specific target in question.
Since benchmark datasets are based on sets of known compounds, there is
an inherited risk of bias. Large chemical collections tend to sample only
certain regions of chemical space, such as molecules that are easy to make or
those that fall into the regions surrounding known inhibitors.63 Therefore,
strictly speaking, the benchmark datasets only test performance in the chem-
ical and biological space that they cover.
The composition of benchmark datasets has led to the introduction of the
concepts of artificial enrichment and analog bias. A difference in physico-
chemical properties between active and inactive molecules can lead to artifi-
cial enrichment, i.e. enrichment due to the dissimilarity between active and
inactive compounds rather than due to the efficacy of the VS method evalu-
ated.64 Similarly, if all active compounds are in the same structure class this
could lead to an “all or nothing” outcome. Too much similarity among the
active compounds is generally referred to as analog bias.65
/( )
=
/
23
These commonly used methods do not account for the position of the ac-
tive compound in the hit-list; they only indicate whether it is within the de-
fined cutoff. This is called the “early recognition problem” and several other
metrics have been proposed that take this into consideration.66 One such
metric is Boltzman-enhanced discrimination of ROC (BEDROC), which is
defined as:
∑ / ∙ sinh 1
= ∙ +
( )
/ cosh − cosh − 1−
24
4 Catalysis
25
4.1.1 Decarboxylation
Benzoic acids, which are cheap, non-toxic, stable precursors to aryl-
palladium(II), are a common reactant in palladium-catalyzed, carbon-carbon
bond formations. The first metal-mediated decarboxylations were reported as
early as 1930,71 but the use of single metal palladium catalysis was only
reported recently, by Myers.72–74
The greatest limitation of palladium-catalyzed decarboxylative reactions
is the need for ortho substituents on the benzoic acids in order for the reac-
tions to be productive.75 Theoretical investigations have suggested that the
carboxylic acid functionality has to be orthogonal to the plane of the aro-
matic system to allow for decarboxylation (Figure 3). This conformation
breaks the conjugation between the acid and the aromatic ring and is there-
fore energetically disfavored. Ortho substituents break the planar system by
steric interactions, thus reducing the free-energy requirement for the orthog-
onal conformation.76
26
Palladium ligands can be grouped according to the type of atom coordi-
nating palladium and the number of coordinations they make. The number of
non-contiguous interactions made by the ligand to the metal center is called
its denticity. A ligand with one atom coordinating the metal is called mono-
dentate, and a ligand where two atoms coordinate is called bidentate. The
denticity is denoted by the symbol κ.
The palladium-catalyzed reactions investigated in this thesis all employed
bidentate nitrogen-based ligands. Some common nitrogen-based ligands are
shown in Table 1.
Table 1. Common nitrogen-based bidentate palladium ligands.
Structure Chemical name
2,2'-bipyridine
6-methyl-2,2'-bipyridine
6,6´-dimethyl-2,2'-
bipyridine
1,10-phenantroline
2,9-dimethyl-1,10-
phenantroline
2,9-dimethyl-4,7-
diphenyl-1,10-
phenanthroline
1,1'-biisoquinoline
27
• The bidentate nitrogen ligands were always bound to palladium in
a κ2 fashion.
• The process of ligand exchange was not modeled but was, in most
cases, assumed to be diffusion-controlled and to have a free-
energy requirement of about 20 kJ mol-1.77
28
4.2.2 Cluster Approach
The cluster approach to enzyme modeling allows the study of biochemical
problems using solely QM techniques. In this approach the amino acid resi-
dues in the vicinity of the active site are extracted to construct a model that is
investigated using only QM methods. In order to account for the steric and
electrostatic effects of the surrounding protein structure, the coordinates of
selected atoms are fixed during optimization and the entire complex is treat-
ed using a continuum solvation model.87,88
A range of values has been used for the dielectric constant (ε) when mod-
eling proteins, but a value of four is the most common.89 For the cluster ap-
proach, the effect of the solvation model decreases when the model system
increases in size and, for systems larger than 200 atoms, the relative energy
is nearly the same regardless of the chosen dielectric constant.90–92
The first step in this approach is to select the model substrate, amino acid
residues, cofactors, metal ions, and water molecules that will be included in
the model. This is often done by selecting as many of the amino acids sur-
rounding the catalytic site as possible with regard to the model size. With
today’s computational power, it is reasonable to consider models of up to
300 atoms. Naturally, care must be taken to include all residues that take part
in the reaction or stabilize the generated intermediates. Thus, the more in-
formation that is available on the reaction the better are the chances of con-
structing an accurate model.
The cluster approach is associated with some limitations, according to the
way the model system is constructed. Since only a small part of the enzyme
is included in the model, long range interactions are not modeled.93 Also, the
geometry optimization has to be carefully monitored to ensure that no artifi-
cial geometry changes occur.94
29
5 Aims of the Thesis
30
6 Decarboxylative Palladium(II)-Catalyzed
Reactions (papers I-III)
31
Figure 4. Calculated free-energy profile for decarboxylation. Reprinted with per-
mission from Svensson et al.98
The lowest energy complex identified prior to decarboxylation was the neu-
tral complex II where palladium coordinates 2,6-dimethoxybenzoate and
acetate. This complex also marks the starting point of the catalytic cycle.
The complex immediately preceding decarboxylation (IV) can be formed
from II by releasing the acetate and rotating the benzoate. From there, the
decarboxylation can occur through TS-I to form the product complex V.
Release of CO2 and coordination with another benzoic acid then generates
the lowest energy intermediate (VII) between decarboxylation and the sub-
sequent carbopalladation. Overall, the decarboxylation process had a calcu-
lated free-energy requirement of 110 kJ mol-1.
Next, the carbopalladation step was investigated (Figure 5). The calcula-
tions indicated carbopalladation as the rate-determining step of the reaction
with a calculated free-energy requirement of 132 kJ mol-1 (VII to TS-II).
Following the carbopalladation, coordination with benzoate and then ligand
exchange between the formed ketimine and acetate reformed the starting
point of the catalytic cycle (II). Hydrolysis of the product yielded the desired
aryl ketone and further lowered the free energy of the system.
32
Figure 5. Calculated free-energy pathway for carbopalladation and product for-
mation. Adapted with permission from Svensson et al.98
The free-energy profiles of all the nitrogen-based ligands that had been used
during the ligand screening were investigated to better understand the re-
ported experimental outcomes for the different palladium ligands. The com-
putationally investigated ligands are depicted in Table 1. A clear trend was
observed where the more sterically hindered ligands, such as 6,6´-dimethyl-
2,2'-bipyridine, had a lower free-energy requirement for decarboxylation
than the less hindered ligands. This could be due to that the lowest energy
complex prior to decarboxylation (II) was sterically crowded and was desta-
bilized by the methyl groups on these ligands.
One of the main findings of this theoretical study was that the lowest en-
ergy complex prior to decarboxylation was a neutral complex (II) and, in
order to proceed to decarboxylation, the system had to undergo a charge
separation to yield the charged pre-decarboxylative complex (IV). The same
pattern is true for the carbopalladation, with the neutral complex VII as the
lowest point prior to the charged carbopalladation TS (TS-II). This indicates
that the solvent played an important role in the reaction and that the free-
energy requirement can be lowered if the solvent better stabilizes the
charged species. In fact, DFT calculations applying a water solvation model
showed that the free-energy requirements were reduced with these settings.
This prompted us to try a series of experimental setups with increasing water
content while monitoring the rate and yield of the reaction.
33
The experiments showed an increased reaction rate for systems with high-
er concentrations of water in the solvent system. However, the formation of
the proto-decarboxylation product 1,3-dimethoxybenzene also increased
with the concentration of water and, thus, the systems with a high water con-
centration did not generate yields as high as those in the previously investi-
gated systems. Although the addition of water did not improve the protocol,
the increased reaction rate supports the computational findings.
34
Table 2. Calculated free energy for the decarboxylation and carbopalladation steps.
Substrate ΔG‡ decarboxylation (kJ mol-1) ΔG‡ carbopalladation (kJ mol-1)
1a 110.3 110.9
1b 111.1 117,9
1c 138.5 149.1
Scheme 4. Proposed catalytic cycle based on DFT calculations and ESI-MS for the
palladium(II)-catalyzed decarboxylative synthesis of aryl amidines. The calculated
free-energy requirement for the two main reaction steps for substrate 1a is indicated
in the figure.
The investigation showed that the more electron-poor substrate 1c had sub-
stantially higher free-energy requirements for both reaction steps. The lower
free-energy requirement of the carbopalladation process placed it close in
energy to the decarboxylation process, in particular for the more electron-
rich substrates. The carbopalladation was still predicted to be the rate-
35
determining step but the difference was within the expected margins of error
for the DFT method.
The ESI-MS investigation detected several of the complexes that the cal-
culations had predicted would form during the reaction. A putative catalytic
cycle based on the theoretical investigation and the ESI-MS results is shown
in Scheme 4.
36
Figure 6. Calculated free-energy pathway for the formation of styrene.
37
Figure 7. Calculated free-energy pathway for the transvinylation process. The calcu-
lated free-energy requirement for decarboxylation is shown (in red) for comparison.
38
Figure 8. Calculated free-energy profile for the formation of the styrene product in
black and the 1,1-diarylethene in red.
39
Scheme 6. Reaction conditions for the selective synthesis of 1,1-diarylethene.
Overall, the calculations confirmed the experimental findings that it might be
difficult to produce a general protocol for the selective synthesis of styrene.
This is because the two possible side reactions, transvinylation and diaryla-
tion, have free-energy requirements that are very similar to or lower than
those of the main reaction and, thus, will be difficult to circumvent. Further,
the situation was made more difficult by the fact that a high concentration of
VA, which would have been one way of retarding the diaryl formation, leads
to high yields of the transvinylation by-product while, in contrast, a low con-
centration of VA leads to problems with diarylation.
40
7 Method Consensus in Virtual Screening
(paper IV)
Combining the signals from several sources has the potential to increase
their accuracy.103 This concept is generally referred to as data fusion or sig-
nal consensus. In the field of VS, the concept of consensus scoring for dock-
ing has been known for a long time.104 The work of Willet has shown that
these concepts are applicable for fusing the results from different methods in
LB VS.105
Consider a collection of VS methods. If the methods agree better when
ranking active compounds than when ranking inactive compounds, then the
combination of these methods should yield better results for active com-
pounds. If, instead, the applied methods find different active compounds,
then the best use of the data would be to select the top compounds from each
method.
Besides improving the result over that of any individual signal, another
possible beneficial result of data fusion is that it might reduce variation in
performance. Since no VS method is known to consistently give the best
result, and the performance often varies depending on the target and com-
pound dataset, it is very difficult to know which method to apply when faced
with a new target. If data fusion can function as an averaging effect between
methods it could be used to remove this uncertainty, although at a higher
computational cost.
LB and SB methods rely on quite different principles but are both known
to give good results in VS. We therefore set out to investigate whether the
results of VS could be improved by applying data fusion to the results of
both LB and SB methods, and which data fusion scheme would be the most
successful. We chose VS methods to represent the current state of the art:
docking using Glide,106,107 shape screening using ROCS,108 pharmacophore
screening using Phase,109,110 and electrostatic shape screening with EON.111
Five data fusion schemes, all previously reported in the literature, were
applied: sum rank, rank vote, sum score, Pareto ranking, and parallel selec-
tion.105,112–115 The algorithms are further described in Table 3. These schemes
are convenient as they do not require any prior knowledge of the target or
dataset and no training set needs to be applied. The only variable that has to
be decided upon prior to screening is the number of structures that should
41
receive votes from the rank vote method. This would typically be set based
on the number of hits desired rather than the properties of the dataset.
Table 3. The data fusion algorithms applied in this study. Adapted with permission
from Svensson et al.116
Name Algorithm
Sum rank The ranks from the single method rank lists are summed.
Rank vote Each single method gives a vote to the 250 highest-ranked compounds.
The hit lists are then combined and the compounds are sorted first on the
number of votes and then on the sum score.
Sum score All scores are divided by the best score from that method. This relative
score is then summed from all the single methods.
Pareto ranking A compound is ranked based on how many other compounds are better
in all single methods. Ties are broken using sum rank.
Parallel selection Compounds are selected in turn from the top of the single method hit
lists until the desired number of compounds is reached. If a compound
has already been selected, the next compound from the current hit list is
chosen instead.
42
When the results were reviewed it became clear that data fusion was indeed
able to improve the outcome of VS. Not only did the data fusion have an
averaging effect, meaning that the fused results were more consistent than
any of the single methods, but data fusion recovered more active compounds
than any of the single methods for some targets. Also, on average across all
the datasets, data fusion recovered more active compounds than any single
method alone. Among the applied data fusion algorithms, parallel selection
performed best, indicating that for these datasets and methods the hit lists are
at least to some extent complementary.
The results for one of the targets are shown in Figure 9. The differences
between the different data fusion algorithms were clearly smaller than the
differences between the different VS methods. This was a general trend
across the tested targets. This is beneficial for data fusion since it can serve
to reduce the impact of the method selection. If the spread between data fu-
sion algorithms had been as wide as for the individual methods we would
just have replaced one problem of method selection for another.
Figure 9. ROC curves for one of the investigated targets (ERα). The left plot shows
the individual VS methods and the right plot shows the result from the five data
fusion algorithms. Reprinted with permission from Svensson et al.116
43
8 Evaluation of Virtual Screening Using
Publicly Available High Throughput
Screening Data (paper V)
44
Figure 10. Reported % inhibition at 13.3 μM for PubChem HTS id 1654.
Another complication when using data from PubChem is the lack of stand-
ardization for how the data are annotated and the assay procedures de-
scribed. This makes the curation of the data a manual and time-consuming
job.
Issues regarding data quality as well as several other aspects need to be
considered before using HTS data for VS evaluation. We stipulated that, in
order to be useful as an evaluation set, at least 1000 compounds should have
been screened, with a minimum of twenty actives found. The bioassays also
needed to be linked to at least one protein crystal structure in the PDB, bind-
ing a drug-like molecule. In order to improve confidence in the data, we only
used HTS data where the active compounds had been confirmed in a dose-
response-type secondary assay. This addressed the most acute problem of
false positives in the HTS data but did not take into account the issue of false
negatives. However, due to the large number of inactive compounds, false
negatives should only have had a minor impact on the evaluation metrics
used for VS, unless present in excessive amounts. At first, we also excluded
all proteins with a co-factor binding site but since this excluded target clas-
ses such as kinases, which are well characterized regarding ligand binding,
we selectively included targets where binding in alternative sites was
deemed unlikely.
With the inclusion criteria set, we used Python scripts to mine PubChem
for assays that fulfilled these conditions.
Following the automated filters, the collected data were manually re-
viewed and the details of each bioassay were assessed. All data from whole-
cell assays were removed as the mode of action of potential inhibitors was
45
considered too uncertain. To our surprise, from the thousands of screens
deposited in PubChem, only seven datasets fulfilled our selection criteria,
indicating that the quality of the publicly available HTS data was far from
optimal. The seven datasets and the number of included compounds are
listed in Table 5. The number of inactive molecules to active molecules, on
average almost 2 500 inactive for every active, clearly distinguished these
datasets from previous benchmark datasets.
Table 5. Number of active and inactive compounds in the compiled datasets.
Target # active # inactive
MMP13 19 64 814
DUSP3 52 289 236
PTPN22 28 292 058
EPHX2 44 90 262
CTDSP1 369 337 634
MAPK10 57 59 405
CDK5 23 334 339
46
Keeping the dissimilar inactive compounds in the dataset also gives some
advantage to the evaluation. The inactive molecules will cover much more of
the physicochemical space and, thus, methods that are biased towards big or
charged molecules, for example, will be penalized in the comparison. It is
not possible to detect that kind of bias using a property balanced dataset such
as the DUD. Indeed, when we evaluated the results of the VS, it became
clear that docking, in many cases, showed a preference for very large mole-
cules, as shown in Figure 11.
Figure 11. PCA plots showing the distribution of inactive compounds as a contour
plot, the active compounds as black dots, the crystal ligand as a red triangle, and the
one hundred best-scored compounds from docking as yellow squares. Size is a major
factor when determining the first principal component. Reprinted with permission
from Lindh et al.125
47
9 Virtual Screening Using Computationally
Derived Reaction Coordinates (paper VI)
Enzymes catalyze a variety of reactions and are essential for life as we know
it. The capacity of an enzyme to catalyze a reaction stems from its ability to
stabilize the TS of the reaction and, thus, to lower the required free energy of
the transition. TS mimetic inhibitors resemble the TS and can therefore take
advantage of the same stabilizing effect in their binding. This results in the
potential for TS mimetics to be extraordinarily potent inhibitors.126
It is in the nature of TSs that their structure cannot be experimentally de-
termined by methods such as X-ray crystallography. They can, however, be
modeled computationally using various approaches, the most common being
QM/MM methods86 and the so-called cluster approach.87,88
Research from the laboratory of Schramm has shown that it is possible to
use computationally derived TSs as a basis for inhibitor design and that the
designed inhibitors can be extremely potent.127,128 We envisioned that instead
of designing inhibitors, computed TS or high-energy intermediate structures
might be used for VS in order to query compound collections for TS mimetic
inhibitors.
We chose insulin-regulated aminopeptidase (IRAP) as the target for our
efforts. IRAP belongs to the M1 family of aminopeptidases and has been
indicated as a promising target for cognitive enhancement.129 A schematic
mechanism based on studies of the related enzyme aminopeptidase N (APN)
and endoplasmic reticulum aminopeptidase (ERAP) is shown in Scheme 7.
48
Several TS or high-energy-intermediate mimetic inhibitors of enzymes in the
M1 family are known; among these, bestatin and amastatin have been con-
firmed to also be active against IRAP.131
When the project started, the crystal structure of IRAP was not available
and we chose to use the crystal structure of the closely related enzyme APN
as the basis for modeling. The active site of APN has a very high sequence
similarity to its IRAP counterpart. Starting from the APN crystal 4FYR,132
the residues closest to the active site were extracted. The extracted active site
contained eleven amino acid side chains; of these, two differed between
APN and IRAP. The APN residues Gln213 and Val385 are the equivalents
of the IRAP residues Glu295 and Ile461, respectively, and these were modi-
fied to reflect the IRAP structure. A model substrate and the water molecule
necessary for the reaction were added to the active site. The resulting model,
consisting of 227 atoms, was used to model the reaction by means of DFT
calculations and the cluster approach.
During the course of our work, two crystal structures for IRAP were re-
leased (PDB ID 4PJ6 and 4P8Q).133 We evaluated these and found that they
did not provide a significant advantage over the APN-based model as the
parts included in our active site model were highly similar between the two
enzymes and the quality of the released IRAP structures was in some ways
inadequate. Firstly, the resolution of the released structures was quite poor
(2.96 Å and 3.02 Å) compared to that of the APN structure (1.91 Å). The
IRAP structures also held no substrate or inhibitor. Instead, a single amino
acid residue was modeled in the active site. The structures were also in an
intermediate position between the open and closed conformations know from
ERAP.134 A schematic picture of the open and closed conformations of
ERAP1 is shown in Figure 12. It has been suggested that the peptide sub-
strate C-terminal in ERAP binds to a positively charged binding site in the
open conformation of the enzyme. The ligand binding prompts the enzyme
to adopt the closed formation. If the substrate is too large, the enzyme cannot
close, and if it is too small it will not reach the N-terminal binding site where
the catalytic cleavage takes place. Thus, ERAP only effectively cleaves pep-
tides of a certain length.135
49
Figure 12. Schematic picture of the open (left, PDB ID 3MDJ136) and closed (right,
PDB ID 2YDO134) conformations of ERAP1.
The biological significance of this half open conformation in the IRAP crys-
tals is not known. However, in this conformation, Tyr549, which is located
in the active site and known from other M1 family peptidases to stabilize the
reaction intermediate,137 does not have any interaction with the amino acid
crystalized in the substrate position on the active site. The Gly-Ala-Met-Glu-
Asn amino acid loops that line the active site are also displaced compared to
the APN structure. This could be because of the enzyme conformation or it
could reflect an actual difference between the two enzymes. To resolve this
uncertainty, more and better crystal structures of IRAP are required.
The modeled reaction path (Figure 13), based on the APN active site
model, starts with the nucleophilic attack of the water molecule on the sub-
strate carbonyl to form the gem-diol intermediate. From there, the hydrogen
bonds to and from the formed gem-diol have to shift to prepare for the pro-
tonation of the former amide nitrogen. From here, protonation takes place
and subsequently the peptide is cleaved. The first hydrogen bond shift was
found to have the highest free-energy requirement in our model (58.5 kJ mol-
1
). Overall, the calculated reaction pathway follows the literature mechanism
for the M1 family aminopeptidases and has reasonable associated energies.
50
IRAP-I IRAP-TS-I IRAP-II
IRAP-TS-V IRAP-VI
Figure 13. Schematic view of the calculated intermediates and TSs of the modeled
reaction. Reprinted with permission from Svensson et al.130
51
V since they represent the two main steps of the reaction. These intermedi-
ates were very similar to the preceding TS structure, and we expected that
the difference was not of any major significance when used for docking.
The compound library was first docked and then submitted to a pharma-
cophore filter. This was done in order to only keep the structures that were
predicted to bind in a similar way to the reaction intermediate. The pharma-
cophore model was built from the substrate in structure IRAP-II. The allow-
ance for the pharmacophore filter was quite big and the chosen features were
present throughout the reaction; therefore, the choice of structure on which
the pharmacophore was based should have been of lesser importance. The
compounds used for screening were from the eMolecules database, which at
the time consisted of around 5.6 million compounds. Before VS, we filtered
off any structure with a molecular weight over 600 g mol-1 and prepared the
structures for VS with LigPrep.
The VS yielded several interesting compounds, including known ami-
nopeptidase TS/high-energy intermediate mimetics. The ranking of these
compounds was dramatically improved by the application of the pharmaco-
phore model, indicating that some form of binding mode filter is beneficial.
The VS protocol was also excellent at reproducing the experimentally de-
termined binding mode of bestatin, Figure 14.
Figure 14. The conformation of bestatin resulting from the VS protocol (pink car-
bons) overlaid on the crystal structure of bestatin in APN (yellow carbons, PDB ID
4FKK138). Reprinted with permission from Svensson et al.130
Seven compounds were purchased and submitted for biological testing with
respect to their ability to inhibit IRAP. Among these, ami-
no(phenyl)methylphosphonic acid (2, Scheme 8) was confirmed as an IRAP
inhibitor with an IC50 of 230 (± 7) μM. We also tested amastatin as a positive
control and it registered an IC50 of 26 (± 0.6) μM in our assay.
52
Scheme 8. Structure of the identified IRAP inhibitor.
α-Aminophosphonic acids are known to inhibit APN but have not previously
been shown to inhibit IRAP.139 Although, the IRAP inhibition is not surpris-
ing it further confirms the usefulness of this VS strategy.
Although 2 was a close match in pharmacophore features to the TS, the
measured affinity was not very high. This might have been the result of the
small size of the inhibitor, especially compared to the natural substrate of
IRAP. It is also possible that the electron density and charge distribution
were not sufficiently similar to those of the TS to achieve a tighter binding.
Overall, this study confirms that it is possible to use calculated reaction
coordinates as the basis for VS in order to enrich compounds that can act as
TS/high-energy intermediate mimetics.
53
10 Conclusions
In summary, the work presented in this thesis has resulted in the following
new insights:
54
11 Acknowledgements
The work presented in this thesis would not have been possible without the
help, support, and inspiration provided by others. I wish to express my sin-
cere gratitude to the following people:
Professor Mats Larhed, the palladium group, the TB group, the IRAP group,
and the CADD group for fruitful scientific discussions.
All my colleagues at the department, past and present, for making our work-
place such a great environment for both research and socializing.
My partner in life, Susanne “sus” Bornelöv, for simply being the best.
55
12 References
56
(17) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti
Correlation-Energy Formula into a Functional of the Electron Density. Phys.
Rev. B 1988, 37, 785–789.
(18) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Assessment of Gaussian-3
and Density-Functional Theories on the G3/05 Test Set of Experimental
Energies. J. Chem. Phys. 2005, 123, 124107.
(19) Ahlquist, M.; Kozuch, S.; Shaik, S.; Tanner, D.; Norrby, P.-O. On the
Performance of Continuum Solvation Models for the Solvation Energy of
Small Anions. Organometallics 2006, 25, 45–47.
(20) Ditchfield, R.; Hehre, W. J.; Pople, J. A. Self-Consistent Molecular-Orbital
Methods. IX. An Extended Gaussian-Type Basis for Molecular-Orbital
Studies of Organic Molecules. J. Chem. Phys. 1971, 54, 724.
(21) Hehre, W. J.; Pople, J. A. Self-Consistent Molecular Orbital Methods. XIII.
An Extended Gaussian-Type Basis for Boron. J. Chem. Phys. 1972, 56,
4233.
(22) Binkley, J. S.; Pople, J. A. Self-Consistent Molecular Orbital Methods. XIX.
Split-Valence Gaussian-Type Basis Sets for Beryllium. J. Chem. Phys.
1977, 66, 879.
(23) Hariharan, P. C.; Pople, J. A. The Influence of Polarization Functions on
Molecular Orbital Hydrogenation Energies. Theor. Chim. Acta 1973, 28,
213–222.
(24) Francl, M. M.; Pietro, W. J.; Hehre, W. J.; Binkley, J. S.; Gordon, M. S.;
DeFrees, D. J.; Pople, J. A. Self-Consistent Molecular Orbital Methods.
XXIII. A Polarization-Type Basis Set for Second-Row Elements. J. Chem.
Phys. 1982, 77, 3654.
(25) Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self—Consistent Molecular Orbital
Methods. XII. Further Extensions of Gaussian—Type Basis Sets for Use in
Molecular Orbital Studies of Organic Molecules. J. Chem. Phys. 1972, 56,
2257.
(26) Hay, P. J.; Wadt, W. R. Ab Initio Effective Core Potentials for Molecular
Calculations. Potentials for K to Au Including the Outermost Core Orbitals.
J. Chem. Phys. 1985, 82, 299.
(27) Grimme, S. Density Functional Theory with London Dispersion Corrections.
Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011, 1, 211–228.
(28) Ahlquist, M. S. G.; Norrby, P.-O. Dispersion and Back-Donation Gives
Tetracoordinate [Pd(PPh3)4]. Angew. Chemie Int. Ed. 2011, 50, 11794–
11797.
(29) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate
Ab Initio Parametrization of Density Functional Dispersion Correction
(DFT-D) for the 94 Elements H-Pu. J. Chem. Phys. 2010, 132, 154104–
154119.
(30) Antony, J.; Grimme, S.; Liakos, D. G.; Neese, F. Protein–Ligand Interaction
Energies with Dispersion Corrected Density Functional Theory and High-
Level Wave Function Based Methods. J. Phys. Chem. A 2011, 115, 11210–
11220.
57
(31) Tannor, D. J.; Marten, B.; Murphy, R.; Friesner, R. A.; Sitkoff, D.; Nicholls,
A.; Honig, B.; Ringnalda, M.; Goddard, W. A. Accurate First Principles
Calculation of Molecular Charge Distributions and Solvation Energies from
Ab Initio Quantum Mechanics and Continuum Dielectric Theory. J. Am.
Chem. Soc. 1994, 116, 11875–11882.
(32) Marten, B.; Kim, K.; Cortis, C.; Friesner, R. A.; Murphy, R. B.; Ringnalda,
M. N.; Sitkoff, D.; Honig, B. New Model for Calculation of Solvation Free
Energies: Correction of Self-Consistent Reaction Field Continuum
Dielectric Theory for Short-Range Hydrogen-Bonding Effects. J. Phys.
Chem. 1996, 100, 11775–11788.
(33) Hratchian, H. P.; Schlegel, H. B. Finding Minima, Transition States, and
Following Reaction Pathways on Ab Initio Potential Energy Surfaces. In
Theory and Applications of Computational Chemistry; Elsevier: Amsterdam,
2005; pp. 195–249.
(34) Ripphausen, P.; Nisius, B.; Peltason, L.; Bajorath, J. Quo Vadis, Virtual
Screening? A Comprehensive Survey of Prospective Applications. J. Med.
Chem. 2010, 53, 8461–8467.
(35) Matter, H.; Sotriffer, C. Applications and Success Stories in Virtual
Screening. In Virtual Screening; Wiley-VCH Verlag GmbH & Co. KGaA,
2011; pp. 319–358.
(36) Clark, D. E. What Has Virtual Screening Ever Done for Drug Discovery?
Expert Opin. Drug Discov. 2008, 3, 841–851.
(37) Concepts and Applications of Molecular Similarity; Johanson, M. A.;
Maggiora, G. M., Eds.; John Wiley & Sons: New York, 1990.
(38) Tresadern, G.; Bemporad, D.; Howe, T. A Comparison of Ligand Based
Virtual Screening Methods and Application to Corticotropin Releasing
Factor 1 Receptor. J. Mol. Graph. Model. 2009, 27, 860–870.
(39) Meng, X.-Y.; Zhang, H.-X.; Mezei, M.; Cui, M. Molecular Docking: A
Powerful Approach for Structure-Based Drug Discovery. Curr. Comput.
Aided. Drug Des. 2011, 7, 146–157.
(40) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.;
Weissing, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank.
Nucleic Acids Res. 2000, 28, 235–242.
(41) Kleywegt, G. J.; Harris, M. R.; Zou, J. Y.; Taylor, T. C.; Wählby, A.; Jones,
T. A. The Uppsala Electron-Density Server. Acta Crystallogr. D. Biol.
Crystallogr. 2004, 60, 2240–2249.
(42) Davis, A. M.; St-Gallay, S. A.; Kleywegt, G. J. Limitations and Lessons in
the Use of X-Ray Structural Information in Drug Design. Drug Discov.
Today 2008, 13, 831–841.
(43) Madhavi Sastry, G.; Adzhigirey, M.; Day, T.; Annabhimoju, R.; Sherman,
W. Protein and Ligand Preparation: Parameters, Protocols, and Influence on
Virtual Screening Enrichments. J. Comput. Aided. Mol. Des. 2013, 27, 221–
234.
(44) Kirchmair, J.; Spitzer, G. M.; Liedl, K. R. Consideration of Water and
Solvation Effects in Virtual Screening. In Virtual Screening; Wiley-VCH
Verlag GmbH & Co. KGaA, 2011; pp. 263–289.
58
(45) Lenselink, E. B.; Beuming, T.; Sherman, W.; van Vlijmen, H. W. T.;
IJzerman, A. P. Selecting an Optimal Number of Binding Site Waters To
Improve Virtual Screening Enrichments Against the Adenosine A2A
Receptor. J. Chem. Inf. Model. 2014, 54, 1737–1746.
(46) Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A. The Role of the
Active Site Solvent in the Thermodynamics of Factor Xa-Ligand Binding. J.
Am. Chem. Soc. 2008, 130, 2817–2831.
(47) Young, T.; Abel, R.; Kim, B.; Berne, B. J.; Friesner, R. A. Motifs for
Molecular Recognition Exploiting Hydrophobic Enclosure in Protein–ligand
Binding. Proc. Natl. Acad. Sci. 2007, 104 , 808–813.
(48) Kirkpatrick, P.; Ellis, C. Chemical Space. Nature 2004, 432, 823.
(49) Lipinski, C.; Hopkins, A. Navigating Chemical Space for Biology and
Medicine. Nature 2004, 432, 855–861.
(50) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental
and Computational Approaches to Estimate Solubility and Permeability in
Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 1997, 23,
3–25.
(51) Pearson, K. LIII. On Lines and Planes of Closest Fit to Systems of Points in
Space. Philos. Mag. Ser. 6 1901, 2, 559–572.
(52) Jain, A. N.; Nicholls, A. Recommendations for Evaluation of Computational
Methods. J. Comput. Aided. Mol. Des. 2008, 22, 133–139.
(53) Xia, J.; Tilahun, E. L.; Reid, T.-E.; Zhang, L.; Wang, X. S. Benchmarking
Methods and Data Sets for Ligand Enrichment Assessment in Virtual
Screening. Methods 2015, 71, 146–157.
(54) Huang, N.; Scoichet, B. K.; Irwin, J. J. Benchmarking Sets for Molecular
Docking. J. Med. Chem. 2006, 49, 6789–6801.
(55) Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K. Directory of
Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better
Benchmarking. J. Med. Chem. 2012, 55, 6582–6594.
(56) Rohrer, S. G.; Baumann, K. Maximum Unbiased Validation (MUV) Data
Sets for Virtual Screening Based on PubChem Bioactivity Data. J. Chem.
Inf. Model. 2009, 49, 169–184.
(57) Vogel, S. M.; Bauer, M. R.; Boeckler, F. M. DEKOIS: Demanding
Evaluation Kits for Objective in Silico Screening--a Versatile Tool for
Benchmarking Docking Programs and Scoring Functions. J. Chem. Inf.
Model. 2011, 51, 2650–2665.
(58) Bauer, M. R.; Ibrahim, T. M.; Vogel, S. M.; Boeckler, F. M. Evaluation and
Optimization of Virtual Screening Workflows with DEKOIS 2.0 – A Public
Library of Challenging Docking Benchmark Sets. J. Chem. Inf. Model.
2013, 53, 1447–1462.
(59) Irwin, J. J.; Shoichet, B. K. ZINC − A Free Database of Commercially
Available Compounds for Virtual Screening. J. Chem. Inf. Model. 2005, 45,
177–182.
(60) Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G.
ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model.
2012, 52, 1757–1768.
(61) Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K. BindingDB: A
Web-Accessible Database of Experimentally Determined Protein–ligand
Binding Affinities. Nucleic Acids Res. 2007, 35, D198–D201.
59
(62) Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Zhou, Z.; Han, L.;
Karapetyan, K.; Dracheva, S.; Shoemaker, B. A.; Bolton, E.; Gindulyte, A.;
Bryant, S. H. PubChem’s BioAssay Database. Nucleic Acids Res. 2012, 40,
D400–D412.
(63) Irwin, J. J. Community Benchmarks for Virtual Screening. J. Comput.
Aided. Mol. Des. 2008, 22, 193–199.
(64) Verdonk, M. L.; Berdini, V.; Hartshorn, M. J.; Mooij, W. T. M.; Murray, C.
W.; Taylor, R. D.; Watson, P. Virtual Screening Using Protein-Ligand
Docking: Avoiding Artificial Enrichment. J. Chem. Inf. Comput. Sci. 2004,
44, 793–806.
(65) Good, A. C.; Hermsmeier, M. a.; Hindle, S. a. Measuring CAMD Technique
Performance: A Virtual Screening Case Study in the Design of Validation
Experiments. J. Comput. Aided. Mol. Des. 2004, 18, 529–536.
(66) Truchon, J.-F.; Bayly, C. I. Evaluating Virtual Screening Methods: Good
and Bad Metrics for the “Early Recognition” Problem. J. Chem. Inf. Model.
2007, 47, 488–508.
(67) Berzelius, J. Årsberättelsen Om Framsteg I Fysik Och Kemi; Royal Swedish
Academy of Sciences, 1835.
(68) Mizoroki, T.; Mori, K.; Ozaki, A. Arylation of Olefin with Aryl Iodide
Catalyzed by Palladium. Bull. Chem. Soc. Jpn. 1971, 44, 581.
(69) Heck, R. F.; Nolley, J. P. Palladium-Catalyzed Vinylic Hydrogen
Substitution Reactions with Aryl, Benzyl, and Styryl Halides. J. Org. Chem.
1972, 37, 2320–2322.
(70) Heck, R. F. Acylation, Methylation, and Carboxyalkylation of Olefins by
Group VIII Metal Derivatives. J. Am. Chem. Soc. 1968, 90, 5518–5526.
(71) Shepard, A. F.; Winslow, N. R.; Johnson, J. R. THE SIMPLE HALOGEN
DERIVATIVES OF FURAN. J. Am. Chem. Soc. 1930, 52, 2083–2090.
(72) Tanaka, D.; Myers, A. G. Heck-Type Arylation of 2-Cycloalken-1-Ones
with Arylpalladium Intermediates Formed by Decarboxylative Palladation
and by Aryl Iodide Insertion. Org. Lett. 2004, 6, 433–436.
(73) Tanaka, D.; Romeril, S. P.; Myers, A. G. On the Mechanism of the
Palladium(II)-Catalyzed Decarboxylative Olefination of Arene Carboxylic
Acids. Crystallographic Characterization of Non-Phosphine Palladium(II)
Intermediates and Observation of Their Stepwise Transformation in Heck-
like Processes. J. Am. Chem. Soc. 2005, 127, 10323–10333.
(74) Myers, A. G.; Tanaka, D.; Mannion, M. R. Development of a
Decarboxylative Palladation Reaction and Its Use in a Heck-Type
Olefination of Arene Carboxylates. J. Am. Chem. Soc. 2002, 124, 11250–
11251.
(75) Dickstein, J. S.; Mulrooney, C. A.; O’Brien, E. M.; Morgan, B. J.;
Kozlowski, M. C. Development of a Catalytic Aromatic Decarboxylation
Reaction. Org. Lett. 2007, 9, 2441–2444.
(76) Skillinghaug, B.; Sköld, C.; Rydfjord, J.; Svensson, F.; Behrends, M.;
Sävmarker, J.; Sjöberg, P. J. R.; Larhed, M. Palladium(II)-Catalyzed
Desulfitative Synthesis of Aryl Ketones from Sodium Arylsulfinates and
Nitriles: Scope, Limitations, and Mechanistic Studies. J. Org. Chem. 2014,
79, 12018–12032.
60
(77) McMullin, C. L.; Jover, J.; Harvey, J. N.; Fey, N. Accurate Modelling of
Pd(0) + PhX Oxidative Addition Kinetics. Dalt. Trans. 2010, 39, 10833–
10836.
(78) Fukui, K. The Path of Chemical Reactions - the IRC Approach. Acc. Chem.
Res. 1981, 14, 363–368.
(79) Müller, K. Reaction Paths on Multidimensional Energy Hypersurfaces.
Angew. Chemie Int. Ed. 1980, 19, 1–13.
(80) Goodman, J. M.; Silva, M. A. QRC: A Rapid Method for Connecting
Transition Structures to Reactants in the Computational Analysis of Organic
Reactivity. Tetrahedron Lett. 2003, 44, 8233–8236.
(81) Lienhard, G. E. Enzymatic Catalysis and Transition-State Theory. Science
1973, 180, 149–154.
(82) Schwartz, S. D.; Schramm, V. L. Enzymatic Transition States and Dynamic
Motion in Barrier Crossing. Nat. Chem. Biol. 2009, 5, 551–558.
(83) Imming, P.; Sinning, C.; Meyer, A. Drugs, Their Targets and the Nature and
Number of Drug Targets. Nat. Rev. Drug Discov. 2006, 5, 821–834.
(84) Blomberg, M. R. A.; Borowski, T.; Himo, F.; Liao, R.-Z.; Siegbahn, P. E.
M. Quantum Chemical Studies of Mechanisms for Metalloenzymes. Chem.
Rev. 2014, 114, 3601–3658.
(85) Carvalho, A. T. P.; Barrozo, A.; Doron, D.; Kilshtain, A. V.; Major, D. T.;
Kamerlin, S. C. L. Challenges in Computational Studies of Enzyme
Structure, Function and Dynamics. J. Mol. Graph. Model. 2014, 54, 62–79.
(86) Senn, H. M.; Thiel, W. QM/MM Studies of Enzymes. Curr. Opin. Chem.
Biol. 2007, 11, 182–187.
(87) Siegbahn, P. E. M.; Himo, F. The Quantum Chemical Cluster Approach for
Modeling Enzyme Reactions. Wiley Interdiscip. Rev. Comput. Mol. Sci.
2011, 1, 323–336.
(88) Siegbahn, P. M.; Himo, F. Recent Developments of the Quantum Chemical
Cluster Approach for Modeling Enzyme Reactions. J. Biol. Inorg. Chem.
2009, 14, 643–651.
(89) Li, L.; Li, C.; Zhang, Z.; Alexov, E. On the Dielectric “Constant” of
Proteins: Smooth Dielectric Function for Macromolecular Modeling and Its
Implementation in DelPhi. J. Chem. Theory Comput. 2013, 9, 2126–2136.
(90) Sevastik, R.; Himo, F. Quantum Chemical Modeling of Enzymatic
Reactions: The Case of 4-Oxalocrotonate Tautomerase. Bioorg. Chem.
2007, 35, 444–457.
(91) Hopmann, K. H.; Himo, F. Quantum Chemical Modeling of the
Dehalogenation Reaction of Haloalcohol Dehalogenase. J. Chem. Theory
Comput. 2008, 4, 1129–1137.
(92) Georgieva, P.; Himo, F. Quantum Chemical Modeling of Enzymatic
Reactions: The Case of Histone Lysine Methyltransferase. J. Comput. Chem.
2010, 31, 1707–1714.
(93) Hu, L.; Eliasson, J.; Heimdal, J.; Ryde, U. Do Quantum Mechanical
Energies Calculated for Small Models of Protein-Active Sites Converge?†.
J. Phys. Chem. A 2009, 113, 11793–11800.
(94) Sumner, S.; Söderhjelm, P.; Ryde, U. Effect of Geometry Optimizations on
QM-Cluster and QM/MM Studies of Reaction Energies in Proteins. J.
Chem. Theory Comput. 2013, 9, 4205–4214.
61
(95) Lindh, J.; Sjöberg, P. J. R.; Larhed, M. Synthesis of Aryl Ketones by
palladium(II)-Catalyzed Decarboxylative Addition of Benzoic Acids to
Nitriles. Angew. Chemie Int. Ed. 2010, 49, 7733–7737.
(96) Xue, L.; Su, W.; Lin, Z. A DFT Study on the Pd-Mediated Decarboxylation
Process of Aryl Carboxylic Acids. Dalt. Trans. 2010, 39, 9815–9822.
(97) Zhang, S.-L.; Fu, Y.; Shang, R.; Guo, Q.-X.; Liu, L. Theoretical Analysis of
Factors Controlling Pd-Catalyzed Decarboxylative Coupling of Carboxylic
Acids with Olefins. J. Am. Chem. Soc. 2009, 132, 638–646.
(98) Svensson, F.; Mane, R. S.; Sävmarker, J.; Larhed, M.; Sköld, C. Theoretical
and Experimental Investigation of Palladium(II)-Catalyzed Decarboxylative
Addition of Arenecarboxylic Acid to Nitrile. Organometallics 2013, 32,
490–497.
(99) Sävmarker, J.; Rydfjord, J.; Gising, J.; Odell, L. R.; Larhed, M. Direct
Palladium(II)-Catalyzed Synthesis of Arylamidines from
Aryltrifluoroborates. Org. Lett. 2012, 14, 2394–2397.
(100) Kormos, C. M.; Leadbeater, N. E. Preparation of Nonsymmetrically
Substituted Stilbenes in a One-Pot Two-Step Heck Strategy Using Ethene
As a Reagent. J. Org. Chem. 2008, 73, 3854–3858.
(101) Cabri, W.; Candiani, I.; Bedeschi, A.; Santi, R. Palladium-Catalyzed
Arylation of Unsymmetrical Olefins. Bidentate Phosphine Ligand
Controlled Regioselectivity. J. Org. Chem. 1992, 57, 3558–3563.
(102) Arai, I.; Daves, G. D. Palladium-Catalyzed Phenylation of Enol Ethers and
Acetates. J. Org. Chem. 1979, 44, 21–23.
(103) Mitchell, H. B. Multi-Sensor Data Fusion; Springer-Verlag: Berlin, 2007.
(104) Charifson, P. S.; Corkery, J. J.; Murcko, M. A.; Walters, W. P. Consensus
Scoring: A Method for Obtaining Improved Hit Rates from Docking
Databases of Three-Dimensional Structures into Proteins. J. Med. Chem.
1999, 42, 5100–5109.
(105) Willet, P. Enhancing the Effectivness of Ligand-Based Virtual Screening
Using Data Fusion. QSAR Comb. Sci. 2006, 25, 1143–1152.
(106) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.;
Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelly, M.; Perry, J. K.; Shaw,
D. E.; Francis, P.; Shenkin, P. S. Glide: A New Approach for Rapid,
Accurate Docking and Scoring. 1. Method and Assessment of Docking
Accuracy. J. Med. Chem. 2004, 47, 1739–1749.
(107) Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.;
Pollard, W. T.; Banks, J. L. Glide: A New Approach for Rapid, Accurate
Docking and Scoring. 2. Enrichment Factors in Database Screening. J. Med.
Chem. 2004, 47, 1750–1759.
(108) Hawkins, P. C. D.; Skillman, A. G.; Nicholls, A. Comparison of Shape-
Matching and Docking as Virtual Screening Tools. J. Med. Chem. 2007, 50,
74–82.
(109) Dixon, S. L.; Smondyrev, A. M.; Knoll, E. H.; Rao, S. N.; Shaw, D. E.;
Freisner, R. A. PHASE: A New Engine for Pharmacophore Perception, 3D
QSAR Model Development, and 3D Databases Screening: 1. Methodology
and Preliminary Results. J. Comput. Aided. Mol. Des. 2006, 20, 647–671.
(110) Dixon, S. L.; Smondyrev, A. M.; Rao, S. N. PHASE: A Novel Approach to
Pharmacophore Modeling and 3D Database Searching. Chem. Biol. Drug
Des. 2006, 67, 370–372.
62
(111) Muchmore, S. W.; Souers, A. J.; Akritopoulou-Zanze, I. The Use of Three-
Dimensional Shape and Electrostatic Similarity Searching in the
Identification of a Melanin-Concentrating Hormone Receptor 1 Antagonist.
Chem. Biol. Drug Des. 2006, 67, 174–176.
(112) Tan, L.; Geppert, H.; Sisay, M. T.; Gütschow, M.; Bajorath, J. Integrating
Structure- and Ligand-Based Virtual Screening: Comparison of Individual,
Parallel, and Fused Molecular Docking and Similarity Search Calculations
on Multiple Targets. ChemMedChem 2008, 1566–1571.
(113) Baber, J. C.; Shirley, W. A.; Gao, Y.; Feher, M. The Use of Consensus
Scoring in Ligand-Based Virtual Screening. J. Chem. Inf. Model. 2006, 46,
244–288.
(114) Cross, S.; Baroni, M.; Carosati, E.; Benedetti, P.; Clementi, S. FLAP: GRID
Molecular Interaction Fileds in Virtual Screening. Validation Using the
DUD Data Set. J. Chem. Inf. Model. 2010, 50, 1442–1450.
(115) Wang, R.; Wang, S. How Does Consensus Scoring Work for Virtual Library
Screening? An Idealized Comuter Experiment. J. Chem. Inf. Comput. Sci.
2001, 41, 1422–1426.
(116) Svensson, F.; Karlén, A.; Sköld, C. Virtual Screening Data Fusion Using
Both Structure- and Ligand-Based Methods. J. Chem. Inf. Model. 2012, 52,
225–232.
(117) Jacobsson, M.; Karlén, A. Ligand Bias of Scoring Functions in Structure-
Based Virtual Screening. J. Chem. Inf. Model. 2006, 46, 1334–1343.
(118) Muthas, D.; Sabnis, Y. A.; Lundborg, M.; Karlén, A. Is It Possible to Increse
Hit Rates in Structure-Based Virtual Screening by Pharmacophore Filtering?
An Investigation of the Advantages and Pitfalls of Post-Filtering. J. Mol.
Graph. Model. 2008, 26, 1237–1251.
(119) Cheeseright, T. J.; Mackey, M. D.; Melville, J. L.; Vinter, J. G. FielsScreen:
Virtual Screening Using Molecular Fields. Application to the DUD Data
Set. J. Chem. Inf. Model. 2008, 48, 2108–2117.
(120) Sastry, G. M.; Inakollu, V. S. S.; Sherman, W. Boosting Virtual Screening
Enrichments with Data Fusion: Coalescing Hits from Two-Dimensional
Fingerprints, Shape, and Docking. J. Chem. Inf. Model. 2013, 53, 1531–
1542.
(121) Subramaniam, S.; Mehrotra, M.; Gupta, D. Virtual High Throughput
Screening (vHTS) - A Perspective. Bioinformation 2008, 3, 14–17.
(122) Blucher, A. S.; McWeeney, S. K. Challenges in Secondary Analysis of High
Throughput Screening Data. Pac. Symp. Biocomput. 2014, 114–124.
(123) Bender, A.; Glen, R. C. A Discussion of Measures of Enrichment in Virtual
Screening: Comparing the Information Content of Descriptors with
Increasing Levels of Sophistication. J. Chem. Inf. Model. 2005, 45, 1369–
1375.
(124) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular
Frameworks. J. Med. Chem. 1996, 39, 2887–2893.
(125) Lindh, M.; Svensson, F.; Schaal, W.; Zhang, J.; Sköld, C.; Brandt, P.;
Karlén, A. Toward a Benchmarking Data Set Able to Evaluate Ligand- and
Structure-Based Virtual Screening Using Public HTS Data. J. Chem. Inf.
Model. 2015, 55, 343–353.
63
(126) Karolina Gluza; Kafarski, P. Transition State Analogues of Enzymatic
Reactions as Potential Drugs. In Drug Discovery; El-Shemy, H., Ed.;
InTech: Rijeka, Croatia, 2013.
(127) Schramm, V. L. Enzymatic Transition States, Transition-State Analogs,
Dynamics, Thermodynamics, and Lifetimes. Annu. Rev. Biochem. 2011, 80,
703–732.
(128) Schramm, V. L. Transition States, Analogues, and Drug Development. ACS
Chem. Biol. 2012, 8, 71–81.
(129) Andersson, H.; Hallberg, M. Discovery of Inhibitors of Insulin-Regulated
Aminopeptidase as Cognitive Enhancers. Int. J. Hypertens. 2012, 2012,
Article ID 789671.
(130) Svensson, F.; Engen, K.; Lundbäck, T.; Larhed, M.; Sköld, C. Virtual
Screening for Transition State Analogue Inhibitors of IRAP Based on
Quantum Mechanically Derived Reaction Coordinates. J. Chem. Inf. Model.
2015. DOI: 10.1021/acs.jcim.5b00359
(131) Matsumoto, H.; Rogi, T.; Yamashiro, K.; Kodama, S.; Tsuruoka, N.;
Hattori, A.; Takio, K.; Mizutani, S.; Tsujimoto, M. Characterization of a
Recombinant Soluble Form of Human Placental Leucine
Aminopeptidase/oxytocinase Expressed in Chinese Hamster Ovary Cells.
Eur. J. Biochem. 2000, 267, 46–52.
(132) Wong, A. H. M.; Zhou, D.; Rini, J. M. The X-Ray Crystal Structure of
Human Aminopeptidase N Reveals a Novel Dimer and the Basis for Peptide
Processing. J. Biol. Chem. 2012, 287, 36804–36813.
(133) Hermans, S. J.; Ascher, D. B.; Hancock, N. C.; Holien, J. K.; Michell, B. J.;
Yeen Chai, S.; Morton, C. J.; Parker, M. W. Crystal Structure of Human
Insulin-Regulated Aminopeptidase with Specificity for Cyclic Peptides.
Protein Sci. 2015, 24, 190–199.
(134) Kochan, G.; Krojer, T.; Harvey, D.; Fischer, R.; Chen, L.; Vollmar, M.; von
Delft, F.; Kavanagh, K. L.; Brown, M. A.; Bowness, P.; Wordsworth, P.;
Kessler, B. M.; Oppermann, U. Crystal Structures of the Endoplasmic
Reticulum Aminopeptidase-1 (ERAP1) Reveal the Molecular Basis for N-
Terminal Peptide Trimming. Proc. Natl. Acad. Sci. 2011, 108, 7745–7750.
(135) Gandhi, A.; Lakshminarasimhan, D.; Sun, Y.; Guo, H.-C. Structural Insights
into the Molecular Ruler Mechanism of the Endoplasmic Reticulum
Aminopeptidase ERAP1. Sci. Rep. 2011, 1.
(136) Nguyen, T. T.; Chang, S.-C.; Evnouchidou, I.; York, I. A.; Zikos, C.; Rock,
K. L.; Goldberg, A. L.; Stratikos, E.; Stern, L. J. Structural Basis for
Antigenic Peptide Precursor Processing by the Endoplasmic Reticulum
Aminopeptidase ERAP1. Nat. Struct. Mol. Biol. 2011, 18, 604–613.
(137) McGowan, S.; Porter, C. J.; Lowther, J.; Stack, C. M.; Golding, S. J.;
Skinner-Adams, T. S.; Trenholme, K. R.; Teuscher, F.; Donnelly, S. M.;
Grembecka, J.; Mucha, A.; Kafarski, P.; DeGori, R.; Buckle, A. M.;
Gardiner, D. L.; Whisstock, J. C.; Dalton, J. P. Structural Basis for the
Inhibition of the Essential Plasmodium Falciparum M1 Neutral
Aminopeptidase. Proc. Natl. Acad. Sci. 2009, 106, 2537–2542.
(138) Chen, L.; Lin, Y.-L.; Peng, G.; Li, F. Structural Basis for Multifunctional
Roles of Mammalian Aminopeptidase N. Proc. Natl. Acad. Sci. 2012, 109,
17966–17971.
64
(139) Grzywa, R.; Sokol, A. M.; Sieńczyk, M.; Radziszewicz, M.; Kościołek, B.;
Carty, M. P.; Oleksyszyn, J. New Aromatic Monoesters of Α-
Aminoaralkylphosphonic Acids as Inhibitors of Aminopeptidase N/CD13.
Bioorg. Med. Chem. 2010, 18, 2930–2936.
65
Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Pharmacy 201
Editor: The Dean of the Faculty of Pharmacy
ACTA
UNIVERSITATIS
UPSALIENSIS
Distribution: publications.uu.se UPPSALA
urn:nbn:se:uu:diva-259443 2015