Article
pubs.acs.org/molecularpharmaceutics
The Contribution of Atom Accessibility to Site of Metabolism Models for Cytochromes P450
Patrik Rydberg,* Michal Rostkowski, David E. Gloriam, and Lars Olsen
Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100 Copenhagen, Denmark
ABSTRACT: Three different types of atom accessibility descriptors are investigated in relation to site of metabolism predictions. To enable the integration of local accessibility we have constructed 2DSASA, a method for the calculation of the atomic solvent accessible surface area that is independent of 3D coordinates. The method was implemented in the SMARTCyp site of metabolism prediction models and improved the results by up to 4 percentage points for nine cytochrome P450 isoforms. The final models are made available at http://www.farma.ku.dk/smartcyp.
KEYWORDS: cytochromes P450, solvent accessible surface area, site of metabolism, circular fingerprints
Special Issue: Predictive DMPK: In Silico ADME Predictions in Drug Discovery
Received: September 12, 2012
Revised: December 13, 2012
Accepted: January 7, 2013
Published: January 22, 2013
© 2013 American Chemical Society | dx.doi.org/10.1021/mp3005116 | Mol. Pharmaceutics 2013, 10, 1216−1223

■ INTRODUCTION
Understanding the pharmacokinetic characteristics of drug candidates is crucial in both early drug discovery and the subsequent development processes. The metabolism and elimination of drugs make a major contribution to their kinetic profile and are heavily influenced by interactions with the cytochrome P450 (CYP) enzyme family. This family of ubiquitous enzymes is the major determinant of phase I metabolism. The nine most prevalent isoforms in humans are 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4, among which 1A2, 2C9, 2C19, 2D6, and 3A4 are considered to be the most important for drug metabolism [1].

The prediction of the site of metabolism for CYP mediated drug metabolism has received a lot of interest in recent years, and models have been based on many different techniques [2]. While attempts to construct models that include the protein structure in the modeling have been made [3−5], they so far seem to offer very little (if any) improvement in prediction accuracy compared to the best ligand-based models [6−8]. This is probably because the CYP enzymes are very flexible [9], making the sampling of the full conformational space too computationally expensive.

We have developed the 2D structure based SMARTCyp methodology [10−12], which in contrast to other models is completely independent of experimental data. To achieve this, the reactivity toward oxidation by the heme group of the CYPs is assigned to all atoms using fragment matching toward a fragment library for which the reactivities have been precomputed using density functional theory (DFT) transition state calculations.

These reactivities in combination with a simple atom accessibility descriptor (the relative span) lead to a model that is quite accurate for most CYP isoforms [8]. This is because reactivity is the most important factor in determining the site of metabolism, and accessibility is less important for most isoforms. This is also why the model works less well when the binding orientation of a molecule within the active site is a more important determinant than reactivity, as is the case for the CYP isoforms 2C9 and 2D6. To take this into account, we recently constructed simple pharmacophore corrections that take the distance between each atom and the pharmacophoric element into account (carboxylic acid and its bioisosteres in CYP 2C9 and protonated amines in CYP 2D6) [6,13].

In principle there are only two contributions that a ligand-based model for CYP-mediated site of metabolism should include: the reactivity and descriptors that mimic the binding mode to the active site. Descriptors that mimic the binding mode can be separated into two classes. First are those that describe the overall accessibility of an atom, which is determined by the orientation of the molecule inside the active site. Examples of such descriptors are distances to pharmacophoric elements, bond counts to the center of the molecule, etc. Second are the descriptors that describe the local accessibility of an atom. Such descriptors estimate how likely an atom is to be accessible for a reaction to take place, if the binding mode suggests that this atom will be close to the heme group.

In SMARTCyp, all descriptors that mimic the binding mode belong to the first class. In the standard model the relative span is used to describe the orientation. The relative span is defined
as the relative distance to the center of the molecule (see Figure 1). Its maximum and minimum values within a molecule are independent of molecule size. In the CYP 2C9 and CYP 2D6 models the Span2End descriptor is used together with a pharmacophore descriptor. The Span2End descriptor is the distance to the end of the molecule, defined as the maximum number of bonds between any two atoms in the molecule minus the maximum number of bonds between the atom of interest and any other atom in the molecule (see Figure 1). This descriptor has a minimum value of zero, but its maximum value is dependent on the size of the molecule. Both span descriptors give preference to sites of metabolism that require binding modes with a vertical orientation compared to sites of metabolism that require binding modes with a horizontal orientation (described in Figure 2). Additionally, in the CYP 2C9 and CYP 2D6 models there is a pharmacophore-based distance descriptor, which counts the number of bonds from the atom of interest to the pharmacophoric element (protonated amines in CYP 2D6 and negatively charged groups in CYP 2C9). This descriptor penalizes binding modes with the pharmacophoric element close to the heme group.

Figure 1. Description of the relative span and Span2End descriptors used in the different SMARTCyp models. The example of calculated accessibility concerns the atom in the red circle. The green bonds represent the largest number of bonds between this atom and any other atom. The black bold bonds represent the largest number of bonds between any two atoms in the molecule. Reprinted with permission from ref 6. Copyright 2012 American Chemical Society.

Figure 2. Vertical (left) and horizontal (right) binding modes exemplified with testosterone. The bold horizontal lines represent the heme group at the bottom of the active site. The span descriptors give preference to sites of metabolism that are close to the heme iron in a vertical binding mode, as shown by the relative span values in the figure.

However, a descriptor for the local accessibility is missing in all SMARTCyp models. The atomic solvent accessible surface area (SASA, illustrated in Figure 3) is the most commonly used local accessibility descriptor and has previously been shown to give a significant contribution to several methods for prediction of CYP mediated drug metabolism [5,7,14,15].

The atomic SASA was first defined by Lee and Richards [16] as “The area on the surface of a sphere of radius R, on each point of which the center of a solvent molecule can be placed in contact with this atom without penetrating any other atoms of the molecule.” The sphere radius, R, is the sum of the atomic van der Waals and solvent molecule radii.

Figure 3. A 2-dimensional example of how atomic SASA is defined. The gray, blue, green, and red filled circles are atoms, and their corresponding atomic SASA are shown as dashed arcs in the same colors. R is the sum of the atom and solvent probe radii. The distance between each atom and its atomic SASA is the radius of the solvent probe, which is shown as a black circle. The SASA of atom B is very small as it is only accessible from one direction in this 2D representation.

Computing the atomic SASA has hitherto required 3D structures, typically in one of two different ways: either numerically and accurately using the Shrake−Rupley dot density method [17] or analytically and to a rather high accuracy using approximate methods [18−22].

Here, we describe the first method for calculation of atomic SASA from 2D structures and include it in site of metabolism prediction models for cytochromes P450. The atomic SASA for non-hydrogen atoms is computed based on circular fingerprints, which previously have been used successfully to build models for prediction of toxicity [23,24], atomic properties such as pKa [25], and site of metabolism [26,27].

■ METHODS AND DATA SETS
Cytochrome P450 Data Sets. Nine data sets with site of metabolism data for CYP mediated metabolism were taken from the recent work by Zaretzki et al. [8] and comprise the following numbers of compounds for each isoform: 1A2 (271), 2A6 (105), 2B6 (151), 2C8 (142), 2C9 (226), 2C19 (218), 2D6 (270), 2E1 (145), and 3A4 (475). The CYP 3A4 data set was divided into a training set (50 compounds) and a test set (425 compounds) by selecting a diverse subset using MACCS fingerprints and Tanimoto similarities in the MOE software [28].

Calculation of Site of Metabolism Prediction Accuracies. The models were validated using the top 1, top 2, and top 3 standard prediction rates, computed with the algorithm developed by Zaretzki et al. [7]. This algorithm is used to ascertain that results from different software are comparable even when some software gives multiple atoms the same rank.

Solvent Accessible Surface Area (SASA) Data Sets. The subset of druglike compounds available for purchase (subset 23, “drugs-now”, version update 2010-11-03) was downloaded in SMILES format from the ZINC database [29]. MACCS fingerprints, with 64 bit precision, were calculated for all molecules using the Schrödinger canvasFPGen utility [30] and applied by the canvasDBCS utility to generate a diverse compound subset. The Soergel metric [31] and sphere exclusion method were applied with an exclusion distance of 0.3 (similarity of 0.7). For each of these compounds, the lowest energy ionization and tautomeric state at pH 7 was calculated with Epik [32,33]. Subsequently, the most likely bioactive conformation was determined using Confgen [34,35]. The compound set was split into training and test sets containing 7,000 and 20,849 compounds, respectively. Finally, hydrogen atoms were deleted, and the two molecule sets were both split into four atom subsets (Table 1) having 1−4 neighboring atoms.
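The diverse-subset step described above can be illustrated with a minimal sketch of greedy sphere exclusion on binary fingerprints. This is not the canvasDBCS implementation: the function names, the toy fingerprints (sets of on-bit indices), and the first-come selection order are assumptions made for illustration. For binary fingerprints the Soergel distance equals 1 minus the Tanimoto similarity, so an exclusion distance of 0.3 corresponds to a similarity of 0.7.

```python
from typing import List, Set


def soergel_distance(a: Set[int], b: Set[int]) -> float:
    """Soergel distance between two binary fingerprints (sets of on-bits).

    For binary data this is 1 - Tanimoto similarity.
    """
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def sphere_exclusion(fingerprints: List[Set[int]],
                     exclusion_distance: float = 0.3) -> List[int]:
    """Greedy sphere exclusion (illustrative variant).

    A compound is kept only if it lies at least `exclusion_distance`
    away from every compound that has already been kept.
    """
    selected: List[int] = []
    for idx, fp in enumerate(fingerprints):
        if all(soergel_distance(fp, fingerprints[s]) >= exclusion_distance
               for s in selected):
            selected.append(idx)
    return selected


# Hypothetical toy fingerprints, not real MACCS keys.
toy_fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 4, 5},             # distance 0.2 to the first one: excluded
    {10, 11, 12},
    {1, 2, 3},                   # distance 0.25 to the first one: excluded
    {10, 11, 12, 13, 14, 15},    # distance 0.5 to the third one: kept
]
print(sphere_exclusion(toy_fingerprints))
```

In this variant the result depends on the order in which compounds are visited; production tools may randomize or otherwise choose the order.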
Table 1. Description of the Data Sets in Terms of Number of Atoms and Atomic SASA

                    1ᵃ        2ᵃ        3ᵃ        4ᵃ        all
Training Sets
no. of atoms        29,692    70,451    42,936    3,784     146,863
SASA range (Å²)     7−97      0−62      0−35      0−12      0−97
av SASA (Å²)        48.8      25.7      5.4       0.4       23.8
Test Sets
no. of atoms        84,188    191,005   113,888   11,309    400,390
SASA range (Å²)     0−96      0−59      0−37      0−11      0−103
av SASA (Å²)        49.6      25.9      5.3       0.3       24.3

ᵃ Number of neighboring atoms.

Figure 4. The atom type counts for the atomic circular fingerprint of the chlorine atom in 1-chloro-3-methoxybenzene.
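To make the per-level atom type counts of Figure 4 concrete, the following sketch counts atom types at each bond distance (level) from the chlorine atom of 1-chloro-3-methoxybenzene, using a breadth-first search over the heavy-atom graph. It is an illustration only, not the fingerprint code used in this work: the atom numbering, the hand-assigned SYBYL-style type labels, and the function names are assumptions, and the ring count variables are left out.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple

# Heavy-atom graph of 1-chloro-3-methoxybenzene (hydrogens removed).
# Atom numbering and SYBYL-style type labels are assigned by hand (assumption).
ATOM_TYPES: Dict[int, str] = {
    0: "Cl",
    1: "C.ar", 2: "C.ar", 3: "C.ar", 4: "C.ar", 5: "C.ar", 6: "C.ar",
    7: "O.3",
    8: "C.3",
}
BONDS: List[Tuple[int, int]] = [
    (0, 1),                                          # Cl on ring atom 1
    (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),  # benzene ring
    (3, 7), (7, 8),                                  # methoxy group on ring atom 3
]


def build_adjacency(bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Turn a bond list into an undirected adjacency map."""
    adjacency: Dict[int, List[int]] = {}
    for a, b in bonds:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def level_type_counts(start: int,
                      bonds: List[Tuple[int, int]],
                      atom_types: Dict[int, str],
                      max_level: int = 5) -> List[Counter]:
    """Count atom types at each bond distance 0..max_level from `start`."""
    adjacency = build_adjacency(bonds)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_level:
            continue  # do not expand beyond the last level
        for neighbor in adjacency.get(atom, []):
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    counts = [Counter() for _ in range(max_level + 1)]
    for atom, level in distance.items():
        counts[level][atom_types[atom]] += 1
    return counts


chlorine_counts = level_type_counts(0, BONDS, ATOM_TYPES)
for level, type_counts in enumerate(chlorine_counts):
    # The total atom count of a level is the sum over its atom type counts.
    print(level, dict(type_counts), "atoms:", sum(type_counts.values()))
```

Each level yields one count per atom type plus a total atom count, which is the layout a per-level descriptor vector would be built from.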
3D SASA Calculation. The atomic SASA was computed using the Shrake−Rupley dot density method [17] with a probe radius of 1.4 Å (representing water [36]) and 16,000 sphere points (this quantity of sphere points generates a grid fine enough to make numerical errors negligible). Atom radii were taken from Mantina et al. [37] and complemented with van der Waals radii from Bondi [38,39]. Atoms that are topologically symmetric can have different SASA when using 3D coordinates. However, for prediction of accessibility we are only interested in the largest SASA if there are topologically symmetric atoms. To define the atoms that are topologically symmetric, we have used the algorithm by Hu and Xu [40]. When describing local accessibility for the purpose of CYP-mediated site of metabolism prediction we are, in principle, interested in describing the accessibility of the oxygen atom that is bound to the heme iron atom, which is the atom that participates directly in the oxidation reaction, and not the full heme molecule itself. We assume that the atomic SASA calculated with a standard water probe radius (1.4 Å) is similar to the accessibility of this oxygen atom.

Atomic Circular Fingerprint Generation. The basic circular fingerprints were generated using the algorithm of Xing et al. [25] without grouping of atom types. In our data sets of neighboring atoms, there were in total 25 different atom types as described in Table SI1 in the Supporting Information. To these fingerprints we added atom counts and counts of rings of sizes 3−8 or larger for each level (a level describes atoms located a specific number of bonds from the atom of interest), resulting in 33 variables at each level. Since we are counting atom type, rings, and total atom count, each atom always contributes to at least two variables at each level (atom type and atom count), and can potentially contribute to three different variables (if it is part of a ring). Rings were defined using the minimum cycle bases algorithm [41] as implemented in the CDK [42,43]. For each atom we generated circular fingerprints in six levels, where level 0 is the atom of interest and levels 1−5 encompass the atoms separated from this atom by 1−5 bonds, respectively, as shown in Figure 4 for atom type counts.

Construction of Partial Least-Squares (PLS) Models. SIMCA-P (version 11) [44] was used to create the PLS models [45] for the atomic SASA. 7-fold cross validation was used to compute q² values, and the auto fit function was used to determine if the PLS components were significant. The two atom types for neutral (N.3) and charged (N.4) sp3 nitrogen atoms were merged into one variable since knowledge about the charge of pyramidal nitrogen atoms is usually not included in topological methods. When selecting which variables to include in the models, we always looked at one level of the circular fingerprints at a time, and kept or removed all variables of the same type in that level. The final PLS models were defined as those having the highest explained variation and goodness of prediction (r² and q², respectively) and the lowest complexity (smallest number of variables).

Variable Importance Determination. Since the SMARTCyp model is applied to site of metabolism data, which, in principle, describes atom ranks within molecules, a traditional determination of the variable importance in regression models is not applicable. Instead we performed a variable exclusion evaluation to estimate the contribution of the parameters to the models. An exclusion evaluation is performed in the following way: For each variable, the reduction of prediction accuracy when this variable is excluded from the scoring function (without refitting the scoring function) is measured. The sum of the reductions for all variables is defined as being the total model accuracy, and the contribution of each variable is then its reduction divided by the total model accuracy. Thus, this gives an estimate of the relative contributions of each variable to the model for the data set in question. Below, the evaluation was performed using the top 2 prediction accuracy.

■ RESULTS AND DISCUSSION
Introduction of the Solvent Accessible Surface Area for Site of Metabolism Prediction. We wanted to investigate if adding the atomic solvent accessible surface area (SASA) to the SMARTCyp score would improve the prediction accuracy. As a test case, a small subset of the CYP 3A4 data set (50 compounds) was used to determine the contribution of atomic SASA, which was then validated by the remainder of the CYP 3A4 data set (425 compounds). The atomic SASA was computed from the 3D structures of these 475 compounds. The contribution of atomic SASA to the SMARTCyp scoring function was optimized to achieve the best
top 1, top 2, and top 3 prediction accuracies for the training set. The optimal parameter for the training set was found to be 0.04, resulting in the SMARTCyp score equation shown (eq 1).

\[ \text{score} = \text{reactivity} - 8 \times \text{relative span} - 0.04 \times \text{atomic SASA} \tag{1} \]

The score determines the likelihood for metabolism, with a low score suggesting an atom is more likely to be metabolized than an atom with a high score. The reactivity is a measure of the transition state energy for the oxidation reaction (determined by fragment matching against precomputed transition state energies). The relative span is defined as shown in Figure 1, and the constant 8 was determined to reflect the standard deviation of the original reactivity rules in SMARTCyp, and has not been changed in this work [10].

The improvement in prediction accuracy was found to be 2−4.5 percentage points for the training set and 1.2−3.0 percentage points for the test set (see Table 2), suggesting that atomic SASA can make a valuable contribution to SMARTCyp models. An example of how rankings can change when including atomic SASA is shown in Figure 5. A variable exclusion evaluation on the test set shows that the reactivity is ten times more important than the two accessibility descriptors, contributing 84% of the model (relative span and atomic SASA contribute 9% and 7%, respectively).

Table 2. SMARTCyp Prediction Accuracy on CYP 3A4 Data with and without 3D Atomic SASA Contribution

                training set          test set
                no SASA    SASA       no SASA    SASA
top 1 (%)ᵃ      66.0       68.0       64.2       65.4
top 2 (%)ᵃ      77.5       80.0       75.7       78.1
top 3 (%)ᵃ      83.5       88.0       81.2       84.2

ᵃ Top n accuracy is the percentage of compounds for which a site of metabolism is found among the top n atoms.

Figure 5. An example of improved site of metabolism prediction when including atomic SASA. The arrow represents the site of metabolism, and the numbers represent the top two ranked atoms (black with atomic SASA and gray without).

Hence, we have shown that the addition of atomic SASA to the SMARTCyp model adds a valuable contribution to the model (∼7% of the prediction accuracy).

How To Calculate the Atomic SASA without Using 3D Structures. Since SMARTCyp is a purely 2D-based method, we decided to investigate if it was possible to predict the atomic SASA from 2D structure-based properties, avoiding the need to generate 3D structures.

As starting point for the building of a predictive model for atomic SASA, we compiled a diverse selection of 27,838 druglike molecules from the ZINC database [29]. The most likely tautomer and conformation were generated for each molecule in this subset, and their atomic SASA were computed as described above. This data set was split into training and test sets, and atoms were divided into subsets according to the number of neighboring atoms as described in Table 1.

Using circular fingerprints (described above) predictive models were built from each training set using the partial least-squares regression method (PLS) [45]. The final PLS models are described in Table 3. Applied to the test sets, the models have mean absolute errors (MAE) of 0.2−3.8 Å², with coefficients of determination (r²) ranging from 0.57 to 0.85. In the final model (2DSASA), in which the appropriate PLS model is applied to each atom, the MAE is 2.8 Å² and r² is 0.96. The MAEs of the four PLS models decrease as the number of neighbors increases, in line with the narrowing range of the atomic SASA distribution in the data sets (see Tables 1 and 3). This is because data sets with small data ranges typically get smaller absolute errors. As can be seen in Figure 7 and Figure S2 in the Supporting Information, the atomic SASA of the test set calculated by 2DSASA more often have large positive errors than large negative ones. However, this is not reflected in the average error, which is −0.06 for the test set. The errors for specific atom types correlate with the occurrence of the atom types in the different data sets (1−4 neighbors), and hence with the PLS model applied. More details on the errors for specific atom types are available in the Supporting Information.

Table 3. Statistics of the PLS Models in 2DSASA

               variables   components   r²     q²     MAEᵃ'ᵇ     r²predᵃ   MAEpredᵃ'ᵇ
2DSASA         35−99ᶜ      3−5ᶜ         0.95   N/A    2.83 Å²    0.96      2.76 Å²
PLS models used in 2DSASA
1 neighbor     99          4            0.85   0.85   3.87 Å²    0.85      3.79 Å²
2 neighbors    79          5            0.81   0.81   3.30 Å²    0.82      3.22 Å²
3 neighbors    70          5            0.67   0.67   1.53 Å²    0.67      1.50 Å²
4 neighbors    35          3            0.51   0.49   0.24 Å²    0.57      0.20 Å²

ᵃ Mean absolute error of the training (MAE) and test (MAEpred) sets. ᵇ The predicted atomic SASA are set to zero if the PLS model gives a negative value. ᶜ 2DSASA uses the appropriate PLS model for each atom.

Figure 6 describes the variables that are included in the final PLS models. The atom type variables (T) contribute to levels 0−1 in all models, level 2 in the 1−3-neighbor models, and level 3 in the 1-neighbor model. The ring count variables (O) contribute to level 0 in all models except the 1-neighbor one (an atom with one neighbor cannot be part of a ring). They also contribute to level 1 in the 1- and 2-neighbor models, and level 2 in the 1-neighbor model. The variable for total number of atoms (#) contributes to levels 2−3 in all models, and level 4 in all models except that for 4 neighbors. It is clear that the fewer atoms that are bound to an atom, the more atoms further away
will contribute to the PLS model. This is most likely due to the fact that atoms with many neighboring atoms have less flexibility, and hence their accessible surface area depends more on the closest neighbors.

Figure 6. Atom of interest (filled black circle) with 1−4 neighboring atoms, respectively. The PLS model for each atom of interest incorporates variables for atom types (T), ring counts (O), and total number of atoms (#).

Figure 7. Atomic SASA in Å² computed by the reference Shrake−Rupley dot density method and 2DSASA.

To validate that the atomic SASA computed from our 2D model gives an accuracy that is useful for the prediction of site of metabolism, we applied it to the same test set used to validate the calibration of the atomic SASA contribution to SMARTCyp above. The results are compared to the results using atomic SASA calculated from 3D structures in Table 4, and show that there is no significant difference between using atomic SASA computed from 2D and 3D structures. Therefore, while it might be possible to build more accurate 2D predictive models of atomic SASA using nonlinear models such as random forests [46] and support vector machines [47], there seems to be no gain in going beyond a simple PLS model for the purpose of application to site of metabolism modeling, as the difference between using atomic SASA from our 2D model and from 3D structures is minimal.

Table 4. Prediction Accuracy for SMARTCyp with 2D and 3D Atomic SASA Contribution on the CYP 3A4 Test Set

             3D SASA   2DSASA
top 1 (%)    65.4      65.3
top 2 (%)    78.1      77.9
top 3 (%)    84.2      84.5

Application of 2D Based Atomic SASA to the SMARTCyp Prediction Models and Validation to Nine CYP Isoforms. Since the application of 2DSASA to the CYP 3A4 data set showed good results, we decided to apply this new standard SMARTCyp model to the other eight isoform data sets released by Zaretzki et al. [8]. The results are shown in Table 5 and show improvements in prediction accuracies for all isoforms ranging from 0.0 to 3.6 percentage points for top 1, 1.5 to 3.6 percentage points for top 2, and 1.7 to 4.3 percentage points for top 3. The average improvements are 1.4, 2.3, and 2.7 percentage points for top 1, top 2, and top 3 prediction accuracies, respectively, indicating that the contribution of atomic SASA is more important for compounds that are harder to predict (have a lower top-ranked site of metabolism without atomic SASA).

As seen in Table 5, the prediction accuracies for SMARTCyp are the lowest for the 2C9 and 2D6 isoforms. This has been noted earlier and was the major reason for our investigation into pharmacophore corrections to the prediction of 2C9 and 2D6 substrates, which were introduced in version 2 of SMARTCyp [6,13]. Applying the same atomic SASA corrections to these models gives similar increases in prediction accuracies. Since the 2C9 and 2D6 models have a different accessibility descriptor than the standard model, this shows that the simple correction derived from 50 CYP 3A4 substrates is consistent across all isoforms and model types (see Table 6).

After the inclusion of 2DSASA in the standard model and the pharmacophore models, it is obvious that the isoforms for which SMARTCyp cannot perform as well are CYP 2C8 and CYP 2C19. Since these are closely related to CYP 2C9, one might expect that a model similar to the 2C9 model would give better predictions for these isoforms than the standard model. Indeed, Liu et al. have shown that a model similar to the SMARTCyp 2C9 model [13] performs better than the standard model for a small set of CYP 2C19 substrates [48]. To investigate how their results extrapolate to the much larger data sets used in this study, we applied the 2C9 model to the 2C8 and 2C19 data sets, with and without the pharmacophore correction included (all with 2DSASA correction included). The results are significantly better than for the standard model, and the pharmacophore correction also gives a small positive contribution to the prediction accuracy for the 2C8 and 2C19 data sets (see Table 7). While these isoforms are not usually considered to metabolize as many carboxylic acid containing substrates as 2C9, the results in Table 7 and an investigation of the data sets show that, when they do, they tend to oxidize them at the same positions as 2C9.

Table 5. Prediction Accuracy for the Standard SMARTCyp Model with and without 2D Atomic SASA Contribution on Nine CYP Isoforms and the Improvements with Atomic SASA

             1A2    2A6    2B6    2C8    2C9    2C19   2D6    2E1    3A4
Standardᵃ
top 1 (%)    64.3   71.3   66.0   61.9   55.7   59.6   48.9   64.1   64.4
top 2 (%)    78.5   84.1   74.6   73.8   69.6   74.0   59.8   81.0   75.9
top 3 (%)    85.4   88.7   85.0   80.2   78.8   80.3   68.6   87.1   81.4
+SASAᵇ
top 1 (%)    64.9   72.4   66.2   65.5   58.8   60.1   49.3   64.1   65.4
top 2 (%)    80.0   85.7   76.8   77.5   71.7   76.1   61.1   82.1   78.1
top 3 (%)    88.4   90.5   86.8   84.5   81.4   83.0   70.7   89.0   85.0

ᵃ Equation 1 without SASA. ᵇ Equation 1.

Thus, the new SMARTCyp version includes three models, all depending solely on the 2D structure of the compounds to be predicted:
1. a standard model including reactivity, relative span, and 2DSASA, that is applicable to CYP isoforms 1A2, 2A6, 2B6, 2E1, and 3A4
2. a CYP 2C model including reactivity, Span2End, distance to carboxylic acid bioisostere, and 2DSASA
3. a CYP 2D6 model including reactivity, Span2End, distance to positively charged amine, and 2DSASA

Table 6. Prediction Accuracy for the Pharmacophore SMARTCyp Models with and without 2D Atomic SASA Contribution on CYP 2C9 and CYP 2D6

             SMARTCyp 2.3   +2DSASA   diff
2C9
top 1 (%)    70.1           71.0      1.0
top 2 (%)    81.5           83.8      2.4
top 3 (%)    88.0           90.5      2.5
2D6
top 1 (%)    73.5           73.7      0.2
top 2 (%)    82.0           83.0      0.9
top 3 (%)    87.5           89.6      2.1

Table 7. Prediction Accuracy for the 2C9 Pharmacophore SMARTCyp Model with 2D Atomic SASA on CYP 2C8 and CYP 2C19

             std    2C9 model   2C9 model − pharmacophore
2C8
top 1 (%)    65.5   68.0        63.7
top 2 (%)    77.5   82.7        81.3
top 3 (%)    84.5   90.5        89.8
2C19
top 1 (%)    60.1   69.7        67.9
top 2 (%)    76.1   85.8        85.3
top 3 (%)    83.0   91.3        91.3

To investigate the relative importance of the descriptors in the final models, we performed the same variable exclusion evaluation as done above for all the final models toward the nine data sets. A summary of the results is shown in Figure 8. The standard model was applied to data sets for isoforms 1A2, 2A6, 2B6, 2E1, and 3A4. In this model, reactivity is highly important, contributing 86−97% of the model, and the two accessibility descriptors contribute the remaining 3−14%. The contribution of the relative span and 2DSASA varies from 0−10% and 3−7%, respectively, suggesting that, while the local accessibility developed in this work (2DSASA) gives a relatively systematic contribution, the contribution of the relative span descriptor depends more on the data set. The relative span is least important for the CYP 2A6 and CYP 2E1 data sets (1.4 and 0%, respectively), which is quite interesting since these isoforms have different substrate preferences than the other isoforms. Both CYP 2A6 and CYP 2E1 have a preference for smaller substrates (the median molecular weights in these data sets are 206 and 200, respectively, compared to 271−326 for the other seven data sets), which is because they have the smallest binding sites among these isoforms [49,50]. However, we know that CYP 2E1 also can metabolize various fatty acids [50], for which the relative span might be more important, but such molecules are a minority in the current data set.

Figure 8. Variable importance for the final SMARTCyp models applied to data sets for nine CYP isoforms. Standard model applied to 1A2, 2A6, 2B6, 2E1, and 3A4; 2C model applied to 2C8, 2C9, and 2C19; 2D6 model applied to 2D6.

The application of the 2C model to the three 2C enzymes also shows a relatively systematic contribution from the local accessibility (5−7%), whereas the contributions from the other descriptors vary more. The reactivity is less important than in the standard model, contributing 62−69%, and being most important in CYP 2C8. The Span2End descriptor contributes 21−29%, being least important in CYP 2C8, and the pharmacophore descriptor contributes 1−7%, being most important in CYP 2C9 as expected. The importance of the Span2End descriptor correlates inversely to the likelihood of finding sites of metabolism that are located further away from the end of the molecules. The average number of bonds from the end of the molecule to a site of metabolism (i.e., the average Span2End descriptor value for sites of metabolism) is 1.74 in CYP 2C8, 1.46 in CYP 2C9, and 1.33 in CYP 2C19. The larger difference between CYP 2C8 and the other two enzymes most likely is due to the larger size of its active site [51], since a larger active site would make it easier to orient the substrates in a way that makes more centrally located atoms in a substrate position themselves close to the heme group. However, the small difference between CYP 2C19 and CYP 2C9 is harder to explain. It could possibly be explained by the difference in positions of Phe 114 and Phe 476 as shown in the recent work
by Reynald et al. [51], but it might as well be a result of the data sets used in the current work.

The same evaluation performed for the CYP 2D6 model on the CYP 2D6 data set shows that reactivity is even less important than for the 2C family (55%), whereas the pharmacophore is much more important (35%). This reflects the substrate preference of CYP 2D6, which has a high preference for protonated amines, and binds these specifically with the amine group far away from the heme. The N-dealkylation of protonated amines, which is initiated by a hydrogen abstraction from the α-carbon atom, is one of the CYP-mediated reactions with the lowest energy barrier. Thus, to make a reactivity-based model such as SMARTCyp predict CYP 2D6 mediated metabolism correctly, a large contribution from the pharmacophore descriptor is required.

Comparison of the Final Models to More Complex Models. To compare our methods to others, we show the prediction accuracies in relation to the recent work on the RS-predictor methodology by Zaretzki et al. [8], in which models were built using the MIRank algorithm [52] on SMARTCyp reactivities together with either 148 topological descriptors, or the same descriptors plus 392 quantum chemical descriptors computed with AM1. They also compared the results to two methods implemented in StarDrop and Schrödinger. A comparison of the top 2 prediction accuracies is shown in Figure 9. It shows that our simple 3−4 parameter methods compare well to the much more complex RS-predictor based models, as well as the StarDrop and Schrödinger software packages.

Figure 9. Comparison of top 2 prediction accuracies for SMARTCyp with 2DSASA included to other prediction methods.

[...] compared to much more complex methods and showed better or similar prediction accuracies for all isoforms.

■ ASSOCIATED CONTENT
Supporting Information
ZINC IDs of all compounds in the data sets, PLS equations for the 2DSASA models, SYBYL atom type descriptions of the atom types included in the 2DSASA models, atom type distributions in the 2DSASA data sets, and errors by atom type in 2DSASA. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION
Corresponding Author
*Phone: +45 35 33 66 50. Fax: +45 35 33 60 41. E-mail: pry@sund.ku.dk.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
The work was supported by grants from the Alfred Benzon Foundation, the Danish Council for Independent Research (Medical Sciences), and Lhasa Limited. The authors wish to thank Nina Jeliazkova for constructing the initial version of the circular fingerprint Java code.

■ ABBREVIATIONS USED
SASA, solvent accessible surface area; CDK, chemistry development kit; CYP, cytochrome P450

■ REFERENCES
(1) Guengerich, F. P. Cytochrome P450s and other enzymes in drug metabolism and toxicity. AAPS J. 2006, 8, E101−E111.
(2) Kirchmair, J.; Williamson, M. J.; Tyzack, J. D.; Tan, L.; Bond, P. J.; Bender, A.; Glen, R. C. Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms. J. Chem. Inf. Model. 2012, 52, 617−648.
(3) Danielson, M. L.; Desai, P. V.; Mohutsky, M. A.; Wrighton, S. A.; Lill, M. A. Potentially increasing the metabolic stability of drug candidates via computational site of metabolism prediction by CYP2C9: The utility of incorporating protein flexibility via an ensemble of structures. Eur. J. Med. Chem. 2011, 46, 3953−3963.
(4) Moors, S. L. C.; Vos, A. M.; Cummings, M. D.; Van Vlijmen, H.; Ceulemans, A. Structure-Based Site of Metabolism Prediction for Cytochrome P450 2D6. J. Med. Chem. 2011, 54, 6098−6105.
(5) Rydberg, P.; Vasanthanathan, P.; Oostenbrink, C.; Olsen, L. Fast Prediction of Cytochrome P450 Mediated Drug Metabolism.
ChemMedChem 2009, 4, 2070−2079.
■
(6) Rydberg, P.; Olsen, L. Ligand-Based Site of Metabolism
CONCLUSIONS Prediction for Cytochrome P450 2D6. ACS Med. Chem. Lett. 2012,
3, 69−73.
In this work we created 2DSASA, a model for prediction of (7) Zaretzki, J.; Bergeron, C.; Rydberg, P.; Huang, T.; wei; Bennett,
atomic SASA from purely 2D structure information. 2DSASA K. P.; Breneman, C. M. RS-Predictor: A New Tool for Predicting Sites
enables methods built from 2D structure data, e.g., toxicity of Cytochrome P450-Mediated Metabolism Applied to CYP 3A4. J.
alerts and site of metabolism predictions, to take atomic Chem. Inf. Model. 2011, 51, 1667−1689.
accessibility into account. We showed that, for a test set (8) Zaretzki, J.; Rydberg, P.; Bergeron, C.; Bennett, K. P.; Olsen, L.;
consisting of 20,847 compounds, 2DSASA predict the atomic Breneman, C. M. RS-Predictor Models Augmented with SMARTCyp
SASA with an average absolute error of only 2.8 Å2. Reactivities: Robust Metabolic Regioselectivity Predictions for Nine
CYP Isozymes. J. Chem. Inf. Model. 2012, 52, 1637−1659.
We integrated 2DSASA in SMARTCyp and showed that for (9) Pochapsky, T. C.; Kazanis, S.; Dang, M. Conformational
a set of 425 CYP3A4 substrates it gave a model as accurate as Plasticity and Structure/Function Relationships in Cytochromes
one built from atomic SASA computed from 3D structures. It P450. Antioxid. Redox Signaling 2010, 13, 1273−1296.
was also applied to data sets for eight other CYP isoforms (1A2, (10) Rydberg, P.; Gloriam, D. E.; Zaretzki, J.; Breneman, C.; Olsen,
2A6, 2B6, 2C8, 2C9, 2C19, 2D6, and 2E1) and was shown to L. SMARTCyp: A 2D Method for Prediction of Cytochrome P450-
consistently improve the predictions. The final models were Mediated Drug Metabolism. ACS Med. Chem. Lett. 2010, 1, 96−100.

(11) Rydberg, P.; Gloriam, D. E.; Olsen, L. The SMARTCyp cytochrome P450 metabolism prediction server. Bioinformatics 2010, 26, 2988−2989.
(12) Rydberg, P.; Jørgensen, M. S.; Jacobsen, T. A.; Jacobsen, A.-M.; Madsen, K. G.; Olsen, L. Nitrogen Inversion Barriers Affect the N-Oxidation of Tertiary Alkylamines by Cytochromes P450. Angew. Chem., Int. Ed. 2013, 52 (3), 993−997.
(13) Rydberg, P.; Olsen, L. Predicting Drug Metabolism by Cytochrome P450 2C9: Comparison with the 2D6 and 3A4 Isoforms. ChemMedChem 2012, 7, 1202−1209.
(14) Sheridan, R. P.; Korzekwa, K. R.; Torres, R. A.; Walker, M. J. Empirical regioselectivity models for human cytochromes p450 3A4, 2D6, and 2C9. J. Med. Chem. 2007, 50, 3173−3184.
(15) Hennemann, M.; Friedl, A.; Lobell, M.; Keldenich, J.; Hillisch, A.; Clark, T.; Göller, A. H. CypScore: Quantitative prediction of reactivity toward cytochromes P450 based on semiempirical molecular orbital theory. ChemMedChem 2009, 4, 657−669.
(16) Lee, B.; Richards, F. M. Interpretation of Protein Structures - Estimation of Static Accessibility. J. Mol. Biol. 1971, 55, 379−400.
(17) Shrake, A.; Rupley, J. A. Environment and Exposure to Solvent of Protein Atoms - Lysozyme and Insulin. J. Mol. Biol. 1973, 79, 351−371.
(18) Weiser, J.; Shenkin, P. S.; Still, W. C. Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). J. Comput. Chem. 1999, 20, 217−230.
(19) Haberthur, U.; Caflisch, A. FACTS: Fast analytical continuum treatment of solvation. J. Comput. Chem. 2008, 29, 701−715.
(20) Lee, M. S.; Feig, M.; Salsbury, F. R.; Brooks, C. L. New analytic approximation to the standard molecular volume definition and its application to generalized born calculations. J. Comput. Chem. 2003, 24, 1348−1356.
(21) Cavallo, L.; Kleinjung, J.; Fraternali, F. POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Res. 2003, 31, 3364−3366.
(22) Hasel, W.; Hendrickson, T. F.; Still, W. C. A rapid approximation to the solvent accessible surface areas of atoms. Tetrahedron Comput. Methodol. 1988, 1, 103−116.
(23) Jaworska, J.; Nikolova-Jeliazkova, N. How can structural similarity analysis help in category formation? SAR QSAR Environ. Res. 2007, 18, 195−207.
(24) Jeliazkova, N.; Jaworska, J.; Worth, A. Open Source Tools for Read-Across and Category Formation. In In Silico Toxicology: Principles and Applications; Cronin, M., Madden, J., Eds.; RSC Publishing: Cambridge, U.K., 2010; pp 408−445.
(25) Xing, L.; Glen, R. C. Novel methods for the prediction of logP, pK(a), and logD. J. Chem. Inf. Comput. Sci. 2002, 42, 796−805.
(26) Boyer, S.; Arnby, C. H.; Carlsson, L.; Smith, J.; Stein, V.; Glen, R. C. Reaction site mapping of xenobiotic biotransformations. J. Chem. Inf. Model. 2007, 47, 583−590.
(27) Carlsson, L.; Spjuth, O.; Adams, S.; Glen, R. C.; Boyer, S. Use of historic metabolic biotransformation data as a means of anticipating metabolic sites using MetaPrint2D and Bioclipse. BMC Bioinf. 2010, 11, 362.
(28) Molecular Operating Environment (MOE), 2011.10; Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2011.
(29) Irwin, J. J.; Shoichet, B. K. ZINC - A free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005, 45, 177−182.
(30) Canvas, version 1.2; Schrödinger L.L.C.: New York, NY, 2009.
(31) Soergel, D. Mathematical analysis of documentation systems. Inf. Storage Retr. 1967, 3, 129−173.
(32) Shelley, J. C.; Cholleti, A.; Frye, L. L.; Greenwood, J. R.; Timlin, M. R.; Uchimaya, M. Epik: a software program for pK(a) prediction and protonation state generation for drug-like molecules. J. Comput.-Aided Mol. Des. 2007, 21, 681−691.
(33) Epik, version 2.0; Schrödinger L.L.C.: New York, NY, 2009.
(34) Watts, K. S.; Dalal, P.; Murphy, R. B.; Sherman, W.; Friesner, R. A.; Shelley, J. C. ConfGen: A Conformational Search Method for Efficient Generation of Bioactive Conformers. J. Chem. Inf. Model. 2010, 50, 534−546.
(35) Confgen, version 2.1; Schrödinger L.L.C.: New York, NY, 2009.
(36) Bernal, J. D.; Fowler, R. H. A Theory of Water and Ionic Solution, with Particular Reference to Hydrogen and Hydroxyl Ions. J. Chem. Phys. 1933, 1, 515.
(37) Mantina, M.; Chamberlin, A. C.; Valero, R.; Cramer, C. J.; Truhlar, D. G. Consistent van der Waals Radii for the Whole Main Group. J. Phys. Chem. A 2009, 113, 5806−5812.
(38) Bondi, A. Van der Waals Volumes + Radii. J. Phys. Chem. 1964, 68, 441−451.
(39) Bondi, A. Van der Waals Volumes and Radii of Metals in Covalent Compounds. J. Phys. Chem. 1966, 70, 3006−3007.
(40) Hu, C. Y.; Xu, L. On highly discriminating molecular topological index. J. Chem. Inf. Comput. Sci. 1996, 36, 82−90.
(41) Berger, F.; Gritzmann, P.; De Vries, S. Minimum cycle bases for network graphs. Algorithmica 2004, 40, 51−62.
(42) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493−500.
(43) Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. Recent developments of the Chemistry Development Kit (CDK) - An open-source Java library for chemo- and bioinformatics. Curr. Pharm. Des. 2006, 12, 2111−2120.
(44) SIMCA-P, version 11; Umetrics AB: Umeå, Sweden, 2005.
(45) Wold, H. Estimation of principal components and related models by iterative least squares. In Multivariate Analysis; Krishnaiah, P. R., Ed.; Academic Press: New York, 1966; pp 391−420.
(46) Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5−32.
(47) Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273−297.
(48) Liu, R.; Liu, J.; Tawa, G.; Wallqvist, A. 2D SMARTCyp Reactivity-Based Site of Metabolism Prediction for Major Drug-Metabolizing Cytochrome P450 Enzymes. J. Chem. Inf. Model. 2012, 52, 1698−1712.
(49) Yano, J. K.; Hsu, M.-H.; Griffin, K. J.; Stout, C. D.; Johnson, E. F. Structures of human microsomal cytochrome P450 2A6 complexed with coumarin and methoxsalen. Nat. Struct. Mol. Biol. 2005, 12, 822−823.
(50) Porubsky, P. R.; Meneely, K. M.; Scott, E. E. Structures of human cytochrome P-450 2E1. Insights into the binding of inhibitors and both small molecular weight and fatty acid substrates. J. Biol. Chem. 2008, 283, 33698−33707.
(51) Reynald, R. L.; Sansen, S.; Stout, C. D.; Johnson, E. F. Structural characterization of human cytochrome P450 2C19: active site differences between P450′s 2C8, 2C9 and 2C19. J. Biol. Chem. 2012, 287, 44581−44591.
(52) Bergeron, C.; Moore, G.; Zaretzki, J.; Breneman, C. M.; Bennett, K. P. Fast bundle algorithm for multiple-instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1068−1079.


dx.doi.org/10.1021/mp3005116 | Mol. Pharmaceutics 2013, 10, 1216−1223
