3D Quantitative Structure-Activity Relationships (QSAR) Methods in Drug Design
Wolfgang Sippl, PhD
Martin-Luther-Universität Halle-Wittenberg
Institute of Pharmaceutical Chemistry

3D-QSAR publications

[Figure: bar chart of the number of 3D-QSAR publications per year, 1988 to 2007; y-axis from 0 to 250. Source: Chemical Abstracts Service]
3D-QSAR Methods in Drug Design

• Introduction
• Theoretical Background - 3D-QSAR
– Training Set
– Ligand Alignment
– Molecular Field Calculation
– Internal Validation – Crossvalidation
– External Prediction – Interpretation
• Case Studies
• Conclusions and Recommendations

Introduction

[Figure: drug discovery timeline, 2-4 years: target selection → biological test development → HTS → HTS hits confirmed → chemistry start → target structure determined → drug candidate. Supporting computational methods: database filtering, similarity analysis, pharmacophores, QSAR, structure-based design.]

When is QSAR or 3D-QSAR useful?

- You have a data set of ligands with known activities (preferably in vitro data on isolated proteins) covering several orders of magnitude of biological activity
- The binding mode is known (competitive) for all ligands
- You want to synthesize modified derivatives

What you should not expect

- Prediction of compounds not related to the original series
- Scaffold hopping
Intention – QSAR

QSAR = quantitative structure-activity relationships are derived from a series of (similar) molecules with known activity (training set). If a statistically relevant QSAR model has been found, it can be applied to new molecules in this series (test set) in order to predict their activity before biological testing (or even before synthesis!)

Statistical tools relate biological data to molecular properties:

Biological data    Molecular properties
Ki                 MEP
IC50               MLP
MIC                Volume
Permeation         log P
…                  …

QSAR – Molecular Descriptors

General form of a QSAR equation:

biol. activity = f(P), where P = molecular properties

biol. activity = const. + (c1 P1) + (c2 P2) + (c3 P3) + ...

Molecular properties – molecular descriptors

1D: Whole-molecule properties (e.g. molecular weight, melting point, logP, ...)
2D: Substituent constants (e.g. π, σ, molar refractivity), fragment fingerprints, topological indices, ...
3D: Surface or field properties (e.g. electrostatic potential, steric fields, solvent accessible surface area, ...)
QSAR

∆(observed activity) ~ ∆(molecular descriptors)

Y = f(P)

Linear Regression (Hammett, 1939)

pKi = a0 + a1 (Mol Vol_i)

Multiple Linear Regression, MLR (Hansch, 1964)

pKi = a0 + a1 (Mol Vol_i) + a2 (logP) + a3 (µ_i) + ...

Partial Least-Squares (PLS) Regression (Wold, 1984)

pKi = a0 + a1 (PC1) + a2 (PC2) + a3 (PC3) + ...

Others: neural networks, Bayesian models, decision trees, ...
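
As an illustration of the MLR form above, the following minimal sketch fits pKi = a0 + a1 (Mol Vol) + a2 (logP) by ordinary least squares. The descriptor values and the coefficients used to generate the activities are hypothetical and were chosen only so that the fit can be checked; the function names are not taken from any QSAR package.

```python
import numpy as np

def fit_mlr(descriptors, activity):
    """Least-squares fit of activity = a0 + a1*P1 + a2*P2 + ..."""
    P = np.asarray(descriptors, dtype=float)
    y = np.asarray(activity, dtype=float)
    design = np.column_stack([np.ones(len(y)), P])   # column of ones gives a0
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_mlr(coeffs, descriptors):
    """Apply the fitted equation to new descriptor rows."""
    P = np.asarray(descriptors, dtype=float)
    return coeffs[0] + P @ coeffs[1:]

# Hypothetical training set: molecular volume and logP for six compounds
P_train = np.array([[210.0, 1.2], [235.0, 1.9], [250.0, 2.4],
                    [270.0, 2.1], [300.0, 3.3], [320.0, 3.0]])
# Hypothetical activities generated from a known linear rule (no noise)
pKi_train = 2.0 + 0.01 * P_train[:, 0] + 0.8 * P_train[:, 1]

coeffs = fit_mlr(P_train, pKi_train)                 # a0, a1, a2
pKi_new = predict_mlr(coeffs, [[260.0, 2.0]])        # prediction before testing
```

With real data the activities contain experimental error, so the recovered coefficients are estimates and the model has to be validated before it is used for prediction.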

PLS Analysis

• PLS analysis: belongs to the family of PCA (principal component analysis) techniques and is used as the standard method within 3D-QSAR

• Large dimension sets require decomposition techniques such as PLS

• Use of principal component analysis in regression: first, reduction of the X and/or Y matrices to principal components, also called latent variables (LVs). Secondly, regression between these latent variables.
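
The two steps above (extraction of latent variables, then regression on them) can be sketched with the textbook NIPALS algorithm for a single response (PLS1). This is a minimal sketch, not the implementation used by any particular 3D-QSAR program; the function names and the synthetic data (12 compounds, 200 descriptor columns, two underlying latent variables) are assumptions made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 (single response) via NIPALS; returns regression vector and means."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centred working copies
    W, P, c = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weights: covariance of each column with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # latent variable (scores)
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c_k = (yk @ t) / tt                  # regression of y on the latent variable
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c_k * t                    # deflate y
        W.append(w)
        P.append(p)
        c.append(c_k)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    coef = W @ np.linalg.solve(P.T @ W, c)   # coefficients in descriptor space
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Synthetic example: many more columns than compounds, two true latent variables
rng = np.random.default_rng(0)
scores = rng.normal(size=(12, 2))
X = scores @ rng.normal(size=(2, 200))
y = scores @ np.array([1.5, -0.7]) + 6.0

coef, x_mean, y_mean = pls1_fit(X, y, n_components=2)
y_fit = pls1_predict(X, coef, x_mean, y_mean)
```

Ordinary multiple regression cannot be solved uniquely for 200 columns and 12 compounds, whereas two latent variables are enough to reproduce the activities in this noise-free example.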

Crossvalidated PLS analysis

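[Figure: crossvalidation scheme (leave-one-out or leave-several-out). Compounds, or groups of crossvalidation, are excluded from the original table; a model is derived from the remaining compounds; the activity of the excluded compounds is predicted; the differences between predicted and measured activity give the SDEP.]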

Crossvalidated PLS Analysis

• Crossvalidated r2cv (q2):

$$q^2 = 1 - \frac{\sum (y_{\mathrm{predicted}} - y_{\mathrm{experimental}})^2}{\sum (y_{\mathrm{experimental}} - \bar{y})^2}$$

q2 = 1.00: optimal model
q2 > 0.50: statistically significant model
q2 < 0.50: use results only with care
q2 = 0.00: no model!
Negative values: predictions are worse than those based on the mean over all compounds!
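
The scale above follows directly from the definition of q2. A minimal sketch, using four hypothetical experimental activities and three hypothetical sets of predictions (the function name is an assumption):

```python
import numpy as np

def q2(y_experimental, y_predicted):
    """q2 = 1 - PRESS / sum of squared deviations from the mean."""
    y_exp = np.asarray(y_experimental, dtype=float)
    y_pred = np.asarray(y_predicted, dtype=float)
    press = np.sum((y_pred - y_exp) ** 2)
    ss_mean = np.sum((y_exp - y_exp.mean()) ** 2)
    return 1.0 - press / ss_mean

y_exp = np.array([5.0, 6.0, 7.0, 8.0])            # hypothetical activities

q2_perfect = q2(y_exp, y_exp)                     # every prediction exact
q2_mean = q2(y_exp, np.full(4, y_exp.mean()))     # always predict the mean
q2_reversed = q2(y_exp, y_exp[::-1])              # worse than predicting the mean
```

Here q2_perfect is 1, q2_mean is 0 and q2_reversed is negative, matching the three anchor points of the scale.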

3D-QSAR Methods

• Different techniques
– Molecular-shape analysis
– Hypothetical Active Site Lattice (HASL)
– Comparative Molecular Field Analysis (CoMFA, CoMSIA, GRID/GOLPE …)
– ALMOND, GRIND
– QUASAR
– AFMoC

CoMFA Approach

1. Superimpose 3D models of the molecules ("ligand alignment")
2. Generate a regular grid around the molecules
3. Calculate and tabulate the steric and electrostatic interaction energy at each grid point for each molecule

Compound  Biol.     S001  E001  S002  E002  S003  E003  S004  E004  ...
number    activity
1          1.07
2          0.09
3          0.66
4          1.42
5         -0.62
6          0.64
7         -0.46

S = steric interaction energy, E = electrostatic interaction energy at grid points 001, 002, ...
3D-QSAR

CoMFA (Comparative Molecular Field Analysis)

[Workflow: selection of the training set → ligand alignment → calculation of molecular fields → statistical analyses → interpretation / graphical representation]

QSAR - Setup

• All included compounds
– interact with the target in the same way
– possess the same binding mode (competitive)
• Interaction energies ~ biological activities
• Biological activities ~ binding affinities
• Quality of the biological activities! (Test system, experimental error, value distribution, ...)
• Quality of compounds (structure, stereochemistry, purity, ...)
• Caution with in vivo data (influence of transport processes)
Selection Training Set

• The training set should contain a wide range of structurally diverse compounds (activities spanning > 3-4 orders of magnitude)
• Both the range and the distribution of biological data are of great importance
• The distribution is improved by using a logarithmic scale

The relation ∆G = -RT lnK tells us that there is a logarithmic relationship between equilibrium constants (approximated in practice by values such as IC50) and the free energy of binding. Thus, the IC50 values are normally transformed to a logarithmic scale.
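
A short worked example of the logarithmic transformation, as a sketch under stated assumptions: the IC50 is given in nM, the temperature is 298.15 K, and the free energy is computed from a dissociation-type constant (Ki or Kd), for which the relation reads ∆G = RT ln Kd, equivalent to -RT lnK for the association constant. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def pic50_from_nanomolar(ic50_nm):
    """pIC50 = -log10(IC50 in mol/l)."""
    return -math.log10(ic50_nm * 1e-9)

def binding_free_energy(k_dissociation_molar, temperature_k=298.15):
    """Free energy of binding in kcal/mol from a dissociation constant."""
    return R_KCAL * temperature_k * math.log(k_dissociation_molar)

pic50 = pic50_from_nanomolar(10.0)          # 10 nM corresponds to pIC50 = 8
dg_10nm = binding_free_energy(1e-8)         # about -10.9 kcal/mol
dg_1nm = binding_free_energy(1e-9)
per_log_unit = dg_10nm - dg_1nm             # about 1.36 kcal/mol per tenfold change
```

One log unit of activity therefore corresponds to a constant free energy increment of roughly 1.4 kcal/mol at room temperature, which is why a linear model is fitted to pIC50 or pKi rather than to the raw concentrations.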

Selection Training Set - Example

[Figure: seven scaffold classes of MAO inhibitors with variable substituents (R, R', X, Y).]

Set 1: 16 molecules
Set 2: 7 molecules
Set 3: 18 molecules
Set 4: 3 molecules
Set 5: 12 molecules
Set 6: 22 molecules
Set 7: 21 molecules

Different sets of MAO inhibitors
Selection Training Set

[Figure: pIC50 values (y-axis, 3 to 10) of the compounds in each of the sets 1 to 7 (x-axis).]

Sets 4 and 7: not enough active (set 7) or inactive (set 4) compounds
Sets 1, 2, 3 and 5: poor distribution of biological activities
Set 6: broad range and relatively well distributed biological activities

Selection Training Set

Statistical results (q2 LOO) for training set 6 (n = 22)

Analysis  Field(s)  q2     N  r2     s      F     ste (%)  ele (%)
A         S         0.743  3  0.894  0.522  50.4  100      -
B         E         0.433  1  0.547  1.02   24.2  -        100
C         S+E       0.594  2  0.790  0.713  35.8  45.1     54.9

S = steric field, E = electrostatic field; N = number of components; ste/ele = relative contributions of the two fields.

The model using only the steric field shows the best statistical results (q2, LOO crossvalidation).
Selection Training Set

Statistical results (q2 LOO) for training sets 1, 3, 5 and 6 (number of components in parentheses)

Field(s)       Set 1       Set 3       Set 5       Set 6
Steric         -0.219 (1)  0.005 (3)   -0.097 (1)  0.743 (3)
Electrostatic  0.296 (2)   -0.075 (1)  -0.180 (1)  0.433 (1)
S+E            0.006 (1)   0.031 (2)   -0.141 (1)  0.594 (2)

No model could be obtained when set 1, 3 or 5 was used, presumably due to the poor distribution/small range of activities in these sets.

Selection Training Set

Statistical results (q2 LOO) for several combinations

Field(s)  Sets 1+4   Sets 2+4   Sets 3+4   Sets 1+2+3
S         0.645 (1)  0.872 (1)  0.778 (2)  -0.035 (1)
E         0.786 (2)  0.831 (1)  0.840 (2)  0.198 (3)
S+E       0.728 (1)  0.854 (1)  0.816 (2)  0.212 (4)

• By combining sets 1, 2 and 3, no reliable CoMFA model was found, presumably due to the poor distribution of activities.
• However, by combining set 4 with either set 1, 2 or 3, CoMFA produced surprisingly good statistical models!
Selection Training Set

Statistics are markedly improved when set 4 (only 3 compounds of high activity!) is added. However, the activities separate into two clusters (poorly active and highly active compounds). It is thus trivial to find a good linear model (a straight line through two points!).

[Figure: predicted vs. experimental activity; the set 4 compounds form a separate high-activity cluster, q2 = 0.85.]

"Beware of q2!"

Selection Training Set

• The leave-one-out procedure was not able to detect this pitfall.

• When the crossvalidation was performed using groups of crossvalidation, the q2 values varied from very good (> 0.8) to very bad (< -0.5, when all active compounds were removed!).

• The "leave-several-out" crossvalidation detects the robustness of a CoMFA model much better.

• The choice of the training set is of prime importance as it will affect the outcome of a CoMFA model!
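
The pitfall can be reproduced with a toy data set. The sketch below is an assumption-laden illustration, not the MAO data: eight hypothetical poorly active compounds without any internal trend plus three hypothetical highly active compounds, one descriptor, and a straight-line model in place of PLS. Leave-one-out gives a high q2, while leaving out whole clusters gives a strongly negative one.

```python
import numpy as np

def q2_leave_groups_out(x, y, groups):
    """q2 of a one-descriptor linear model; every index group is left out once."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    press = 0.0
    for left_out in groups:
        keep = np.setdiff1d(np.arange(len(y)), left_out)
        slope, intercept = np.polyfit(x[keep], y[keep], 1)   # refit without the group
        predicted = slope * x[left_out] + intercept
        press += np.sum((predicted - y[left_out]) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical descriptor values and activities: two well separated clusters
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 3.0, 3.1, 3.2])
y = np.array([5.2, 4.8, 5.1, 4.9, 5.3, 4.7, 5.0, 5.0, 8.1, 7.9, 8.0])

loo_groups = [[i] for i in range(len(y))]
cluster_groups = [list(range(8)), [8, 9, 10]]   # inactive cluster, active cluster

q2_loo = q2_leave_groups_out(x, y, loo_groups)
q2_clusters = q2_leave_groups_out(x, y, cluster_groups)
```

In leave-one-out, two active compounds always remain in the training data and keep the straight line in place, so the missing structure-activity trend inside each cluster goes unnoticed.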


Ligand Alignment

• The alignment step is the most critical in a CoMFA study as it affects the outcome of the statistical analysis. It is particularly difficult when the studied compounds are structurally diverse.

[Figure: structures of structurally diverse serotonin 5-HT1F receptor agonists. How to align?]
Ligand Alignment

One problem – several ways of solving it

– Alignment-independent methods (GRIND, ALMOND)
– Ligand-based alignment – use of traditional pharmacophore concepts (Active Analog Approach): Catalyst, Disco, …
– FlexS, SEAL
– Field Fit alignment
– Receptor-based alignment (from ligand docking)

Calculation Molecular Fields

• Traditional CoMFA fields
– Lennard-Jones potential (steric field)
– Coulomb potential (electrostatic field)

Lennard-Jones potential:

$$\frac{A}{r^{12}} - \frac{B}{r^{6}}$$

Electrostatic energy:

$$\frac{q_1 q_2}{\varepsilon r}$$

[Figure: two plots of energy (kcal/mol) against distance (0 to 4). Left: Lennard-Jones potential vs. nonbonded internuclear distance. Right: electrostatic energy vs. internuclear distance.]
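
The two potentials can be evaluated on a regular grid to obtain one row of the CoMFA table for a molecule. The following is a minimal sketch under several assumptions that are not stated on the slides: a single pair of Lennard-Jones parameters A and B for all atoms (real force fields use atom-type specific values), a probe with charge +1, a 2 Å grid spacing, an energy cutoff of 30 kcal/mol, and a hypothetical three-atom molecule. The Coulomb conversion factor of 332.06 gives kcal/mol for charges in elementary units and distances in Å.

```python
import numpy as np

COULOMB_KCAL = 332.06  # kcal*Angstrom/(mol*e^2)

def probe_fields(grid, coords, charges, A, B,
                 probe_charge=1.0, eps=1.0, cutoff=30.0):
    """Steric (Lennard-Jones) and electrostatic (Coulomb) energy at each grid point."""
    grid = np.asarray(grid, dtype=float)
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    # Distance matrix: rows are grid points, columns are atoms
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    r = np.maximum(r, 1e-6)                       # avoid division by zero
    steric = np.sum(A / r**12 - B / r**6, axis=1)
    electrostatic = COULOMB_KCAL * probe_charge * np.sum(charges / (eps * r), axis=1)
    # Truncate the very large energies found close to the atoms
    return np.minimum(steric, cutoff), np.clip(electrostatic, -cutoff, cutoff)

# Regular grid with 2 Angstrom spacing around the molecule (5 x 5 x 5 points)
axis = np.arange(-4.0, 4.1, 2.0)
grid = np.array(np.meshgrid(axis, axis, axis, indexing="ij")).reshape(3, -1).T

# Hypothetical molecule: three atoms with partial charges
coords = [[0.5, 0.0, 0.0], [-0.7, 0.3, 0.0], [1.6, 0.9, 0.2]]
charges = [0.25, -0.40, 0.15]

steric, electrostatic = probe_fields(grid, coords, charges, A=6.0e5, B=6.0e2)
descriptor_row = np.concatenate([steric, electrostatic])   # one row of the table
```

Stacking the descriptor rows of all aligned molecules gives the X matrix for the PLS analysis; with 125 grid points and two fields, each compound already contributes 250 columns.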

Statistical Parameters

• Crossvalidation:
– Crossvalidated correlation coefficient, q2
– Optimal number of components
– SDEP (standard deviation of error of prediction)

$$q^2 = 1 - \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{pred}})^2}{\sum (y_{\mathrm{obs}} - \bar{y})^2}$$

$$\mathrm{SDEP} = \sqrt{\frac{\sum (y_{\mathrm{pred}} - y_{\mathrm{obs}})^2}{N}}$$

y_pred: predicted value; y_obs: observed value; ȳ: mean of the observed values; N: number of ligands

• Final PLS model:
– Correlation coefficient, r2
– Standard deviation, s
– F value

$$r^2 = 1 - \frac{\sum (y_{\mathrm{calc}} - y_{\mathrm{obs}})^2}{\sum (y_{\mathrm{obs}} - \bar{y})^2}$$

Crossvalidation - PLS Analysis

• Choice of the optimal number of components: the principal source of overfitting in PLS analyses.
• Graphs of q2 vs. the number of components help with the selection!

[Figure: two plots against the number of components (0 to 12): q2 (LOO) and final r2.]

• Rule of thumb: have more than 5 observations per component!

Graphical Representation

• The graphical representation of CoMFA models provides important information regarding the optimization of drug molecules.
• Representation of regions where differences in the field variables are correlated with the variance of biological activities.
Case Study 1

Ligand-based 3D-QSAR
5-HT1F Receptor Agonists

Biological Data 5-HT1F Agonists

[Figure: structures of the 5-HT1F receptor agonists.]

Structural and biological data: pKi 5.5 – 8.5 (human 5-HT1F)
Schaus, J. et al., J. Med. Chem. 46 (2003) 3060
Alignment Procedure – 5-HT1F Agonists

[Figure: LY306528 (R) and template molecules 1, 2 and 3 used for the ligand alignment.]

Ligand Alignment 5-HT1F Agonists

All ligands were superimposed using FlexS, followed by post-processing with SYBYL Multifit.
Statistical Results

[Figure: predicted vs. experimental pKi, LOO crossvalidation.]

q2 LOO = 0.94
SDEP = 0.25
n = 21

3D-QSAR - 5-HT1F Agonists

Statistical results of the crossvalidation
CoMFA approach – standard settings
n = 21, 3 principal components

Crossvalidation    q2    SDEP
cvLOO (1 cpd)      0.94  0.25
cvL5RG (4 cpds)    0.93  0.26
cvL3RG (7 cpds)    0.91  0.30
cvL2RG (10 cpds)   0.85  0.40

Repeated 30 times
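
A repeated random-groups crossvalidation of this kind can be sketched as follows. The sketch makes several assumptions: an ordinary least-squares model stands in for the PLS model, the 21 compounds and their two descriptors are synthetic, and the mean q2 and SDEP over the repeats are reported. The function and variable names are hypothetical.

```python
import numpy as np

def random_groups_cv(descriptors, y, n_groups, n_repeats=30, seed=0):
    """Mean q2 and SDEP over repeated crossvalidation with random groups."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(descriptors, dtype=float)])
    rng = np.random.default_rng(seed)
    ss_mean = np.sum((y - y.mean()) ** 2)
    q2_values, sdep_values = [], []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))              # new random assignment
        press = 0.0
        for left_out in np.array_split(order, n_groups):
            keep = np.setdiff1d(order, left_out)
            coeffs, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            press += np.sum((X[left_out] @ coeffs - y[left_out]) ** 2)
        q2_values.append(1.0 - press / ss_mean)
        sdep_values.append(np.sqrt(press / len(y)))
    return float(np.mean(q2_values)), float(np.mean(sdep_values))

# Synthetic data set: 21 compounds, two descriptors, small experimental noise
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(21, 2))
pKi = 7.0 + descriptors @ np.array([0.8, -0.5]) + rng.normal(scale=0.1, size=21)

q2_loo, sdep_loo = random_groups_cv(descriptors, pKi, n_groups=21)   # 1 cpd left out
q2_l5rg, sdep_l5rg = random_groups_cv(descriptors, pKi, n_groups=5)  # about 4 cpds
q2_l2rg, sdep_l2rg = random_groups_cv(descriptors, pKi, n_groups=2)  # about 10 cpds
```

A robust model shows only a modest drop in q2 as the excluded groups become larger; a sharp drop indicates that the model depends on a few compounds.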

Case Study 2

Receptor-based 3D-QSAR
Acetylcholinesterase (AChE)
Inhibitors

AChE Inhibitors

Biological data:
- IC50 values from AChE of Torpedo californica
- 42 inhibitors
- pIC50 3.1 - 7.6
- competitive inhibitors
- same binding mode

[Figure: general structures of the inhibitors with variable groups (R, X, Y, (CH2)n).]

Structural and biological data:
Sippl, W. et al., J. Comp.-Aided Mol. Des. 15 (2001) 395
Docking Validation

Good agreement between docking results and X-ray structures

[Figure: AutoDock docking pose compared with the X-ray structure.]

Sippl, W. et al., JCAMD 15 (2001) 395

3D-QSAR - Setup

1. Analyses of known X-ray structures
2. Docking (AutoDock)
3. Validation (GRID interaction fields)
4. Receptor-based alignment
5. 3D-QSAR analysis
Receptor-based Alignment

Docking of all inhibitors into the binding site

Similar position of the cationic head

Hydrophobic parts of the inhibitors interact with aromatic residues within the binding pocket

Blue – hydrophilic
Brown – hydrophobic

Support by Novel X-ray Structures


[Figure: superposition of the docked aminopyridazine and donepezil.]

Good agreement between the predicted conformation of the aminopyridazine and the X-ray structure of donepezil
3D-QSAR Model
GRID/GOLPE - www.moldiscovery.com

[Figure: calculated vs. experimental pIC50.]
n = 42, r2 = 0.98, SDEC = 0.13

Cross-validation: leave-50%-out

[Figure: predicted vs. experimental pIC50.]
n = 42, q2 (L50%O) = 0.91, SDEP = 0.40
Graphical Representation

[Figure: GRID/GOLPE PLS fields, water probe.]

Favoured interaction with polar probe (green)
Favoured interaction with methyl probe (cyan)

Design of Novel AChE Inhibitors

[Figure: structures of the newly designed inhibitors, labelled with predicted and observed pIC50 values.]

7.40  8.00  7.62  7.41
7.50  7.61  6.88  7.25
7.05  7.24  7.25  7.27

SDEPext = 0.36

Sippl, W. et al., J. Comp.-Aided Mol. Des. 15 (2001) 395
AFMoC

Klebe, G. University of Marburg

Conclusions

The robustness and predictivity of a 3D-QSAR model will be crucially determined by:

• Quality of the biological data
• Causality between structure and activity!
• Quality of the chemical structures
• Ligand similarity!
• Ligand alignment
• Number of PLS vectors
• Choice of the right crossvalidation method
Recommendations for CoMFA Analyses

• Quality of biological data (affinities, inhibition constants)
• Variance and error range of biological data
• Pharmacophore for ligand superposition
• Ligand alignment (ligand-based, field-based, protein-based)
• Structurally related molecules
• Number of PLS vectors ("Occam's razor")
• Variable selection / reduction
• Crossvalidation - LOO or random groups
• Prediction of a test set

Book to read

Novel textbook

PDF of the talk upon request


sippl@pharmazie.uni-halle.de
