Wolgang 1

3D Quantitative Structure-Activity Relationships (QSAR) Methods in Drug Design
Wolfgang Sippl, PhD
Martin-Luther-Universität Halle-Wittenberg
Institute of Pharmaceutical Chemistry
3D-QSAR publications
[Figure: bar chart of the number of 3D-QSAR publications per year, 1988 to 2007; yearly counts range from 1 to 229. Source: Chemical Abstracts Service]
3D-QSAR Methods in Drug Design
• Introduction
• Theoretical Background - 3D-QSAR
– Training Set
– Ligand Alignment
– Molecular Field Calculation
– Internal Validation – Crossvalidation
– External Prediction – Interpretation
• Case Studies
• Conclusions and Recommendations
Introduction
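[Figure: drug discovery timeline. Stages: target selection, biological test development, HTS, HTS hits confirmed, chemistry start, target structure determined, drug candidate. Time span indicated: 2-4 years. Computational methods placed along the timeline: database filtering, similarity analysis, pharmacophores, QSAR, structure-based design.]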
When is QSAR or 3D-QSAR useful?
- you have a data set of ligands with known activities (preferably in vitro data on isolated proteins) covering several orders of magnitude of biological activity
- the binding mode is known and is the same (competitive) for all ligands
- you want to synthesize modified derivatives
What you should not expect:
- prediction of compounds not related to the original series
- scaffold hopping
Intention – QSAR
QSAR (quantitative structure-activity relationships) models are derived from a series of (similar) molecules with known activity (training set). If a statistically relevant QSAR model has been found, it can be applied to new molecules of the same series (test set) in order to predict their activity before biological testing (or even before synthesis!).
[Figure: statistical tools link biological data to molecular properties.]
Biological data: Ki, IC50, MIC, permeation, …
Molecular properties: MEP, MLP, volume, log P, …
QSAR – Molecular Descriptors
General form of a QSAR equation:
biol. activity = f(P), with P = molecular properties
biol. activity = const. + (c1 P1) + (c2 P2) + (c3 P3) + ...
Molecular properties – molecular descriptors
1D: Whole-molecule properties (e.g. molecular weight, melting point, logP, ...)
2D: Substituent constants (e.g. π, σ, molar refractivity), fragment fingerprints, topological indices, ...
3D: Surface or field properties (e.g. electrostatic potential, steric fields, solvent accessible surface area, ...)
QSAR
Differences in observed activity are related to differences in molecular descriptors:
Y = f(P)
Linear Regression (Hammett, 1939)
pKi = a0 + a1 (Mol Vol_i)
Multiple Linear Regression (MLR) (Hansch, 1964)
pKi = a0 + a1 (Mol Vol_i) + a2 (logP) + a3 (µ_i) + ...
Partial Least-Squares (PLS) Regression (Wold, 1984)
pKi = a0 + a1 (PC1) + a2 (PC2) + a3 (PC3) + ...
Others: neural networks, Bayesian models, decision trees, ...
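The MLR equation above can be fitted by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the descriptor values and activities are invented for illustration and do not come from the talk.

```python
import numpy as np

# Hypothetical descriptor table: one row per compound, columns are
# molecular volume, logP and dipole moment (all values invented).
P = np.array([
    [210.0, 1.2, 2.1],
    [225.0, 1.8, 1.9],
    [240.0, 2.1, 2.5],
    [255.0, 2.9, 1.4],
    [270.0, 3.3, 3.0],
    [285.0, 3.8, 2.2],
])
pKi = np.array([5.1, 5.6, 6.0, 6.7, 7.0, 7.6])  # invented activities

# Design matrix with a leading column of ones for the constant term a0.
X = np.column_stack([np.ones(len(P)), P])

# Least-squares solution of pKi = a0 + a1*vol + a2*logP + a3*mu.
coef, *_ = np.linalg.lstsq(X, pKi, rcond=None)
pKi_calc = X @ coef

# Conventional r2 of the fitted (not crossvalidated) model.
r2 = 1.0 - np.sum((pKi - pKi_calc) ** 2) / np.sum((pKi - pKi.mean()) ** 2)
print("coefficients a0..a3:", coef)
print("fitted r2:", r2)
```

With only six compounds and three descriptors the fitted r2 says little about predictivity, which is why the crossvalidated q2 is used later in the talk.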
PLS Analysis
• PLS analysis: belongs to the family of PCA (principal component analysis) techniques and is used as the standard method within 3D-QSAR
• Large dimension sets require decomposition techniques such as PLS
• Use of principal component analysis in regression: first, the X and/or Y matrices are reduced to principal components, also called latent variables (LVs); second, a regression is performed between these latent variables.
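The two-step idea (reduce X to a few latent variables, then regress on them) can be sketched with the NIPALS form of PLS for a single response. This is an illustrative NumPy implementation, not the code used by CoMFA or GOLPE; the function names `pls1_fit` and `pls1_predict` and the random data are assumptions made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model (single response) with the NIPALS scheme.

    Returns the column means of X, the mean of y and the regression
    vector b, so that predictions are y_mean + (X_new - x_mean) @ b.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean            # centred descriptor block, deflated per component
    f = y - y_mean            # centred activities, deflated per component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:      # nothing left in X that covaries with y
            break
        w = w / norm          # weight vector: direction of maximal covariance
        t = E @ w             # scores of the new latent variable
        tt = t @ t
        p = E.T @ t / tt      # X loadings
        q_a = f @ t / tt      # regression of y on the latent variable
        E = E - np.outer(t, p)
        f = f - q_a * t
        W.append(w)
        P.append(p)
        q.append(q_a)
    if not W:                 # constant y: the model is just the mean
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, b

def pls1_predict(model, X_new):
    x_mean, y_mean, b = model
    return y_mean + (X_new - x_mean) @ b

# Synthetic example: 12 compounds, 300 field columns (far more columns than rows).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 300))
y = X[:, :3] @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=12)

for a in (1, 2, 3):
    resid = y - pls1_predict(pls1_fit(X, y, a), X)
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    print(a, "latent variables, fitted r2 =", round(r2, 3))
```

The fitted r2 can only increase with each added latent variable, so it cannot be used on its own to decide how many components to keep.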
Crossvalidated PLS analysis
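[Figure: crossvalidation scheme, leave-one-out or leave-several-out (groups of crossvalidation). Compounds are excluded from the original table, a model is derived from the remaining compounds, the activity of the excluded compounds is predicted, and the differences between predicted and measured activity give the SDEP.]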
Crossvalidated PLS Analysis
• Crossvalidated r2cv (q2):
q^2 = 1 - \frac{\sum (y_{predicted} - y_{experimental})^2}{\sum (y_{experimental} - \bar{y})^2}
1.00 = optimal model
> 0.50 = statistically significant model
< 0.50 = use results only with care
0.00 = no model!
Negative values = predictions worse than those based on the mean over all compounds!
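The q2 scale can be checked numerically. The sketch below assumes that the crossvalidated predictions are already available and uses invented pIC50 values; the reference mean is taken over all observed values, as in the formula on the slide.

```python
import numpy as np

def q2(y_obs, y_pred):
    """Crossvalidated q2 = 1 - PRESS / SS, with SS taken around the mean of y_obs."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_pred - y_obs) ** 2)
    ss = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - press / ss

y_obs = np.array([5.0, 6.0, 7.0, 8.0])           # invented pIC50 values
print(q2(y_obs, y_obs))                          # perfect predictions -> 1.0
print(q2(y_obs, np.full(4, y_obs.mean())))       # always predicting the mean -> 0.0
print(q2(y_obs, y_obs[::-1]))                    # worse than the mean -> -3.0
```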
3D-QSAR Methods
• Different techniques
– Molecular-shape analysis
– Hypothetical Active Site Lattice (HASL)
– Comparative Molecular Field Analysis (CoMFA, CoMSIA, GRID/GOLPE …)
– ALMOND, GRIND
– QUASAR
– AFMoC
CoMFA Approach
1. Superimpose 3D models of molecules ("Ligand Alignment")
2. Generate a regular grid around the molecules
3. Calculate and tabulate the steric and electrostatic interaction energy of each grid point and each molecule

Resulting table (S = steric interaction, E = electrostatic interaction, numbered by grid point):

Compound  Biol. activity  S001  E001  S002  E002  S003  E003  S004  E004  ...
1          1.07
2          0.09
3          0.66
4          1.42
5         -0.62
6          0.64
7         -0.46
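Steps 2 and 3 can be sketched as follows. This is a simplified illustration, not the SYBYL/CoMFA implementation: the 12-6 parameters `A` and `B`, the unit dielectric, the +1 probe charge, the 30 kcal/mol truncation and the two three-atom "molecules" are all assumptions chosen for the example.

```python
import numpy as np

def comfa_like_fields(coords, charges, grid, probe_charge=1.0,
                      A=1.0e5, B=1.0e2, cutoff=30.0):
    """Steric (12-6) and electrostatic (Coulomb) probe energies at each grid point.

    coords:  (n_atoms, 3) atom positions in angstrom
    charges: (n_atoms,) partial charges in units of e
    grid:    (n_points, 3) probe positions in angstrom
    A, B, the dielectric (taken as 1) and the cutoff are illustrative assumptions.
    """
    # Distance from every grid point to every atom, shape (n_points, n_atoms).
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    r = np.maximum(r, 0.1)                       # avoid division by zero on an atom
    steric = np.sum(A / r**12 - B / r**6, axis=1)
    electro = 332.0 * np.sum(probe_charge * charges / r, axis=1)  # kcal/mol
    # Truncate extreme energies close to the nuclei.
    return np.clip(steric, -cutoff, cutoff), np.clip(electro, -cutoff, cutoff)

# Step 2: regular lattice with 2 angstrom spacing around the aligned molecules.
axis = np.linspace(-4.0, 4.0, 5)
grid = np.array(np.meshgrid(axis, axis, axis, indexing="ij")).reshape(3, -1).T

# Two invented three-atom "molecules" that already share one coordinate frame.
molecules = [
    (np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.2, 0.0]]),
     np.array([-0.3, 0.1, 0.2])),
    (np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, -1.2, 0.0]]),
     np.array([-0.4, 0.3, 0.1])),
]

# Step 3: one table row per molecule, columns ordered S001, E001, S002, E002, ...
rows = []
for coords, charges in molecules:
    s, e = comfa_like_fields(coords, charges, grid)
    rows.append(np.column_stack([s, e]).ravel())
X = np.vstack(rows)
print(X.shape)   # (2, 250): 125 grid points x 2 fields per molecule
```

Even this coarse lattice yields 250 columns for two compounds, which illustrates why a decomposition technique such as PLS is needed.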
3D-QSAR
CoMFA (Comparative Molecular Field Analysis)
[Figure: CoMFA workflow. Selection of the training set, ligand alignment, calculation of molecular fields, statistical analyses, graphical representation, interpretation.]
QSAR - Setup
• All included compounds
– interact with the target in the same way
– possess the same binding mode (competitive)
• Interaction energies ~ biological activities
• Biological activities ~ binding affinities
• Quality of the biological activities! (Test system, experimental error, value distribution, ...)
• Quality of the compounds (structure, stereochemistry, purity, ...)
• Caution with in vivo data (influence of transport processes)
Selection Training Set
• The training set should contain a wide range of structurally diverse compounds whose activities span more than 3-4 orders of magnitude
• Both the range and the distribution of the biological data are of great importance
• The distribution is improved by using a logarithmic scale
The relation ∆G = -RT ln K tells us that there is a logarithmic relationship between equilibrium constants (approximated by e.g. IC50 values) and the free energy of binding. Thus, IC50 values are normally transformed to a logarithmic scale (e.g. pIC50 = -log IC50).
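A short sketch of the logarithmic transformation, using only the Python standard library. The IC50 values are invented, the temperature of 298.15 K is an assumption, and treating an IC50 as if it were a dissociation constant is an approximation made only to illustrate the energy scale.

```python
import math

R = 1.987e-3      # gas constant in kcal/(mol*K)
T = 298.15        # assumed temperature in K

def pic50(ic50_molar):
    """Negative decadic logarithm of an IC50 given in mol/L."""
    return -math.log10(ic50_molar)

def delta_g(k_dissociation_molar, temperature=T):
    """Binding free energy in kcal/mol from a dissociation-type constant (RT ln Kd)."""
    return R * temperature * math.log(k_dissociation_molar)

for ic50_nM in (10_000.0, 1_000.0, 100.0, 10.0):      # invented values
    c = ic50_nM * 1e-9
    print(f"{ic50_nM:>8.0f} nM  pIC50 = {pic50(c):.2f}  dG ~ {delta_g(c):.2f} kcal/mol")
```

Each tenfold change in potency shifts pIC50 by one unit and the free energy by RT ln 10, about 1.36 kcal/mol at this temperature, so a training set spanning 3-4 orders of magnitude covers roughly 4 to 5.5 kcal/mol.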
Selection Training Set - Example
[Figure: general structures of the MAO inhibitor sets. Set 1: 16 molecules; Set 2: 7 molecules; Set 3: 18 molecules; Set 4: 3 molecules; Set 5: 12 molecules.]
Different sets of MAO inhibitors
[Figure: distribution of pIC50 values (axis from 3 to 10) within sets 1 to 7.]
Sets 4 and 7: not enough active (7) or inactive (4) compounds
Sets 1, 2, 3 and 5: poor distribution of biological activities
Set 6: broad range and relatively well distributed biological activities
Statistical Results (q2LOO) for Training Set 6 (n=22)
Analysis  Field(s)  q2     N  r2     s      F     ste    ele
A         S         0.743  3  0.894  0.522  50.4  100    -
B         E         0.433  1  0.547  1.02   24.2  -      100
C         S+E       0.594  2  0.790  0.713  35.8  45.1   54.9
(N = number of components; ste/ele = steric/electrostatic field contribution in %)
The model using only the steric field shows the best statistical results (q2, LOO crossvalidation).
Statistical Results (q2LOO) for Training Sets 1, 3, 5 and 6
Field(s)       Set 1       Set 3       Set 5       Set 6
Steric         -0.219 (1)  0.005 (3)   -0.097 (1)  0.743 (3)
Electrostatic  0.296 (2)   -0.075 (1)  -0.180 (1)  0.433 (1)
S+E            0.006 (1)   0.031 (2)   -0.141 (1)  0.594 (2)
(number of components in parentheses)
No model could be obtained when set 1, 3 or 5 was used, presumably due to the poor distribution/small range of activities in these sets.
Statistical Results (q2LOO) for several combinations
Field(s)  Sets 1+4   Sets 2+4   Sets 3+4   Sets 1+2+3
S         0.645 (1)  0.872 (1)  0.778 (2)  -0.035 (1)
E         0.786 (2)  0.831 (1)  0.840 (2)  0.198 (3)
S+E       0.728 (1)  0.854 (1)  0.816 (2)  0.212 (4)
• By combining sets 1, 2 and 3, no reliable CoMFA model was found, presumably due to the poor distribution of activities.
• However, by combining set 4 with either set 1, 2 or 3, CoMFA produced surprisingly good statistical models!
Statistics are markedly improved when set 4 (only 3 compounds of high activity!) is added. However, the activities separate into two clusters (poorly active and highly active compounds). It is thus trivial to find a good linear model (a straight line through two points!).
[Figure: predicted vs. experimental activity, q2 = 0.85; the set 4 compounds form a separate cluster. "Beware of q2!"]
• The leave-one-out procedure was not able to detect this pitfall.
• When the crossvalidation was performed using groups of compounds, the q2 values varied from very good (> 0.8) to very bad (< -0.5, when all active compounds were removed!).
• The "leave-several-out" crossvalidation detects the robustness of a CoMFA model much better.
• The choice of the training set is of prime importance, as it will affect the outcome of a CoMFA model!
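The two-cluster pitfall can be reproduced with a toy data set. The sketch below assumes a single invented descriptor, 16 weakly active compounds whose activity does not depend on it, and 3 highly active compounds; it is not the MAO data from the talk, and the grouping is chosen by hand so that one group contains all active compounds.

```python
import numpy as np

def fit_predict(x_train, y_train, x_test):
    """Straight-line model fitted on the training compounds only."""
    slope, intercept = np.polyfit(x_train, y_train, 1)
    return slope * x_test + intercept

def cv_q2(x, y, groups):
    """q2 from predictions made with each group of compounds left out in turn."""
    y_pred = np.empty_like(y)
    for g in np.unique(groups):
        test = groups == g
        y_pred[test] = fit_predict(x[~test], y[~test], x[test])
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Invented data: a cluster of weakly active and a cluster of highly active compounds.
x_low = np.linspace(0.0, 1.0, 16)
y_low = 5.0 + 0.2 * np.tile([-1.0, 1.0, 1.0, -1.0], 4)   # no trend with x
x_high = np.array([5.0, 5.2, 5.4])
y_high = np.array([8.0, 8.2, 7.9])
x = np.concatenate([x_low, x_high])
y = np.concatenate([y_low, y_high])

loo_groups = np.arange(len(y))                            # leave-one-out
cluster_groups = np.concatenate([np.arange(16) % 3,       # three groups of inactives
                                 np.full(3, 3)])          # all actives in one group

q2_loo = cv_q2(x, y, loo_groups)
q2_lso = cv_q2(x, y, cluster_groups)
print("q2 leave-one-out:     ", round(q2_loo, 2))
print("q2 leave-several-out: ", round(q2_lso, 2))
```

In this toy case leave-one-out gives a q2 well above 0.8, because two active compounds always remain to anchor the line, while the grouping that removes all actives at once gives a negative q2.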
Ligand Alignment
• The alignment step is the most critical in a CoMFA study, as it affects the outcome of the statistical analysis. It is rather difficult, particularly when the studied compounds are structurally diverse.
[Figure: structures of serotonin 5-HT1F receptor agonists. How to align?]
Ligand Alignment
One problem – several ways of solving it:
– Alignment-independent methods (GRIND, ALMOND)
– Ligand-based alignment – use of traditional pharmacophore concepts (Active Analog Approach): Catalyst, DISCO, …
– FlexS, SEAL
– Field Fit alignment
– Receptor-based alignment (from ligand docking)
Calculation Molecular Fields
• Traditional CoMFA Fields
– Lennard-Jones potential (steric field): E_{LJ} = A/r^{12} - B/r^{6}
– Coulomb potential (electrostatic field): E_{C} = q_1 q_2 / (ε r)
[Figure: Lennard-Jones potential, energy (kcal/mol) vs. nonbonded internuclear distance; electrostatic energy (kcal/mol) vs. internuclear distance.]
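As a short worked derivation of the shape of the steric curve, the position and depth of the Lennard-Jones minimum follow directly from the 12-6 form. The symbols r_min and E_min are introduced here for illustration; the factor 332 in the Coulomb term is the usual conversion to kcal/mol for charges in units of e and distances in angstrom.

```latex
E_{LJ}(r) = \frac{A}{r^{12}} - \frac{B}{r^{6}}
\qquad
\frac{dE_{LJ}}{dr} = -\frac{12A}{r^{13}} + \frac{6B}{r^{7}} = 0
\;\Rightarrow\;
r_{min} = \left(\frac{2A}{B}\right)^{1/6}

E_{min} = E_{LJ}(r_{min}) = \frac{A B^{2}}{4A^{2}} - \frac{B^{2}}{2A} = -\frac{B^{2}}{4A}

E_{C}(r) = 332\,\frac{q_{1} q_{2}}{\varepsilon\, r}\ \text{kcal/mol}
```

Below r_min the repulsive term grows as r^{-12}, which is why grid points close to the atoms produce very large steric energies that are usually truncated before the statistical analysis.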
Statistical Parameters
• Crossvalidation:
– Crossvalidated correlation coefficient, q2
– Optimal number of components
– SDEP (standard deviation of error of prediction)
q^2 = 1 - \frac{\sum (y_{obs} - y_{pred})^2}{\sum (y_{obs} - \bar{y})^2}
SDEP = \sqrt{\frac{\sum (y_{pred} - y_{obs})^2}{N}}
• Final PLS model:
– Correlation coefficient, r2
– Standard deviation, s
– F values
r^2 = 1 - \frac{\sum (y_{calc} - y_{obs})^2}{\sum (y_{obs} - \bar{y})^2}
Symbols: y_pred = predicted value (crossvalidation), y_calc = calculated value (final model), y_obs = observed value, \bar{y} = mean of the observed values, N = number of ligands
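The two crossvalidation statistics are linked, which helps when judging a reported SDEP. The following short derivation uses the definitions of q2 and SDEP; σ_y is introduced here as the standard deviation of the observed activities (with N in the denominator), and the numbers in the last line are invented for illustration.

```latex
\sum (y_{obs} - y_{pred})^2 = (1 - q^2) \sum (y_{obs} - \bar{y})^2

SDEP = \sqrt{\frac{\sum (y_{pred} - y_{obs})^2}{N}}
     = \sqrt{(1 - q^2)\,\frac{\sum (y_{obs} - \bar{y})^2}{N}}
     = \sigma_y \sqrt{1 - q^2}

\text{Example: } \sigma_y = 1.0,\; q^2 = 0.75 \;\Rightarrow\; SDEP = 1.0 \times \sqrt{0.25} = 0.5
```

A high q2 obtained on a data set with a very wide activity range can therefore still correspond to a sizeable prediction error in absolute terms.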
Crossvalidation - PLS Analysis
• Choice of the optimal number of components: principal source of overfitting in PLS analyses.
• Graphs of q2 vs. number of components help the selection!
[Figure: q2 (LOO) vs. number of components and final r2 vs. number of components, each for 0 to 12 components.]
• Rule of thumb: have more than 5 observations per component!
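One way to turn the q2 curve and the rule of thumb into a selection procedure is sketched below. The function name `choose_n_components`, the tolerance of 0.02 and the example q2 curve are assumptions made for illustration; only the limit of more than 5 observations per component comes from the slide.

```python
def choose_n_components(q2_values, n_compounds, tolerance=0.02,
                        min_obs_per_component=5):
    """Smallest number of components whose q2 is within `tolerance` of the best q2.

    q2_values[i] is the crossvalidated q2 of the model with i + 1 components.
    Models with n_compounds / components <= min_obs_per_component are not considered.
    """
    # "More than 5 observations per component" means components < n / 5.
    max_components = max(1, (n_compounds - 1) // min_obs_per_component)
    candidates = q2_values[:max_components]
    best = max(candidates)
    for n, value in enumerate(candidates, start=1):
        if value >= best - tolerance:
            return n

q2_curve = [0.41, 0.58, 0.66, 0.67, 0.65, 0.60]   # invented q2 vs. components curve
print(choose_n_components(q2_curve, n_compounds=22))   # -> 3
```

Preferring the smaller model when two q2 values are nearly equal follows the parsimony argument made in the recommendations at the end of the talk.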
Graphical Representation
• The graphical representation of CoMFA

models provides important information
regarding the optimization of drug molecules.
• Representation of regions, where differences
in the field variables are correlated with
variance of biological activities.
Case Study 1
Ligand-based 3D-QSAR
5-HT1F Receptor Agonists
Biological Data 5-HT1F Agonists
[Figure: structures of the 5-HT1F receptor agonists.]
Structural and biological data: pKi 5.5 – 8.5 (human 5-HT1F)
Schaus, J. et al., J. Med. Chem. 46 (2003) 3060
Alignment Procedure – 5-HT1F Agonists
[Figure: alignment procedure with LY306528 (R) and template molecules 1, 2 and 3, leading to the ligand alignment.]
Ligand Alignment 5-HT1F Agonists
All ligands were superimposed using FlexS, followed by post-processing with SYBYL Multifit.
Statistical Results
[Figure: predicted vs. experimental pKi, LOO crossvalidation: q2LOO = 0.94, SDEP = 0.25, n = 21.]
3D-QSAR - 5-HT1F Agonists
Statistical results of the crossvalidation
CoMFA approach – standard settings
n = 21, 3 principal components
Crossvalidation     q2    SDEP
cvLOO (1 cpd)       0.94  0.25
cvL5RG (4 cpds)     0.93  0.26
cvL3RG (7 cpds)     0.91  0.30
cvL2RG (10 cpds)    0.85  0.40
Repeated 30 times
Case Study 2
Receptor-based 3D-QSAR
Acetylcholinesterase (AChE)
Inhibitors
AChE Inhibitors
Biological data:
- IC50 values from AChE of Torpedo californica
- 42 inhibitors
- pIC50 3.1 - 7.6
- competitive inhibitors
- same binding mode
[Figure: general structures of the AChE inhibitors.]
Structural and biological data:
Sippl, W. et al., J. Comp.-Aided Mol. Des. 15 (2001) 395
Docking Validation
Good agreement between docking results and X-ray structures
AutoDock X-ray
Sippl, W. et al., JCAMD 15 (2001) 395
3D-QSAR - Setup
1. Analyses of known X-ray structures
2. Docking (AutoDock)
3. Validation
4. GRID interaction fields
5. Receptor-based alignment
6. 3D-QSAR analysis
Receptor-based Alignment
• Docking of all inhibitors into the binding site
• Similar position of the cationic head
• Hydrophobic parts of the inhibitors interact with aromatic residues within the binding pocket
[Figure: docked inhibitors in the binding site. Blue: hydrophilic; brown: hydrophobic.]
Support by Novel X-ray Structures
Good agreement between the predicted conformation of the aminopyridazine and the X-ray structure of donepezil
[Figure: superposition of the aminopyridazine and donepezil.]
3D-QSAR Model
GRID/GOLPE - www.moldiscovery.com
[Figure: calculated vs. experimental pIC50.] n = 42, r2 = 0.98, SDEC = 0.13
Cross-validation (leave-50%-out):
[Figure: predicted vs. experimental pIC50.] n = 42, q2L50%O = 0.91, SDEP = 0.40
Graphical Representation
[Figure: GRID/GOLPE PLS fields, water probe.]
Favoured interaction with polar probe (green)
Favoured interaction with methyl probe (cyan)
Design of Novel AChE Inhibitors
[Figure: structures of the newly designed AChE inhibitors, labelled with predicted and observed pIC50 values.]
Values as printed below the structures: 7.40, 8.00, 7.62, 7.41; 7.50, 7.61, 6.88, 7.25; 7.05, 7.24, 7.25, 7.27
SDEPext = 0.36
Sippl, W. et al., J. Comp.-Aided Mol. Des. 15 (2001) 395
AFMoC
Klebe, G. University of Marburg
Conclusions
The robustness and predictivity of a 3D-QSAR model will be crucially determined by:
• Quality of the biological data
• Causality between structure and activity!
• Quality of the chemical structures
• Ligand similarity!
• Ligand alignment
• Number of PLS vectors
• Choice of the right crossvalidation method
Recommendations for CoMFA Analyses
• Quality of biological data (affinities, inhibition constants)
• Variance and error range of biological data
• Pharmacophore for ligand superposition
• Ligand alignment (ligand-based, field-based, protein-based)
• Structurally related molecules
• Number of PLS vectors ("Occam's razor")
• Variable selection / reduction
• Crossvalidation - LOO or random groups
• Prediction of a test set
Book to read
Novel textbook
PDF of the talk upon request

sippl@pharmazie.uni-halle.de
Student Edition
