
Molecular Design

Course slides


Summer semester 2005

Gisbert Schneider
gisbert.schneider@modlab.de

What it is about: the “Aspirin story” as an example

Salicin (from willow bark, anti-inflammatory): O-β-D-glucopyranoside
Salicylic acid
Acetylsalicylic acid (Aspirin®, prodrug)
   First synthesis on 10.10.1897 by Felix Hoffmann (Bayer)
   • inhibits COX-1 and COX-2 (discovered in 1991)
   • acetylation of Ser530 (Ala mutant)
   • problem of selectivity
   + inhibition of platelet aggregation

Arachidonic acid → PGG2 (via COX)
   → Prostacyclin: dilation of blood vessels
   → Thromboxane A2: aggregation of platelets (only in platelets)

Structure-Based Drug Design

Co-crystal structure salicylic acid + COX-1 (PDB: 1pth)
Co-crystal structure SC558 + COX-2 (PDB: 1cx2)

Hydrophilic pocket; Ser530 (marked in both structures)

“Mickey Mouse” scaffold: a privileged structural motif

Adaptive Optimization

Bio/Chemical • Bioactive Molecule(s)


Knowledge • (Q)SAR

Inference Machine

Hypothesis Data/Facts

Molecular Structures Test System

Synthesis

Similarity-Based Molecular Design

[Figure: a “seed” compound in a descriptor space with axes x1 and x2]

Assumption: growing distance = growing dissimilarity

Mutation - Sampling

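Example sequences: ELVISISKING, DIVISISKLNG, DAYLSLSKLDS, DAYANDNIGHT

[Figure: mutation probability P(Y→j) with width σ plotted against the amino-acid distance d(i,j), scaled from 0 to 1; residues ordered by distance from Y: Y FLWIHM QV PCKTNE RSAD G]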

Entropy of Peptide Libraries

$$ H = -\sum_{p} \sum_{k} p_{p,k} \log_2 p_{p,k} $$

Cumulative Shannon entropy

[Figure: (a) number of active peptides vs. entropy of the peptide library [bits]; (b) <activity> vs. <distance to seed peptide>]

Schneider & So 2001, Adaptive Systems in Drug Design, Landes, Austin.
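The entropy formula above can be sketched in a few lines. This is a minimal illustration, not the original analysis code: it assumes that p indexes sequence positions, k indexes residue types, and p_{p,k} is the relative frequency of residue k at position p in a library of equal-length peptides. The function name `library_entropy` is hypothetical.

```python
import math
from collections import Counter

def library_entropy(peptides):
    """Cumulative Shannon entropy in bits: per-position entropies summed over all positions."""
    if not peptides:
        raise ValueError("empty library")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    total = 0.0
    for pos in range(length):
        # Residue counts at this sequence position (assumed meaning of p_{p,k})
        counts = Counter(p[pos] for p in peptides)
        for count in counts.values():
            freq = count / n
            total -= freq * math.log2(freq)
    return total

# The four sequences shown on the mutation slide, used here only as toy input
library = ["ELVISISKING", "DIVISISKLNG", "DAYLSLSKLDS", "DAYANDNIGHT"]
print(round(library_entropy(library), 2))
```

A library of identical peptides has entropy 0; the value grows as positions become more variable.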

Virtual Fitness Landscape

[Figure: tripeptide Ala-Trp-Gly; fitness landscapes (fitness vs. hydrophobicity and volume) in panels A, B, C for positions S1, S2, S3]

Signal peptidase I substrates

Neural network “fitness” score
Peptide de novo Design

• Novel eubacterial signal peptidase-I substrates

FFFFGWYGWA*RE

• Artificial antigen (DCM, β1-adrenoceptor auto-antibodies)

ARRCYNDPKC GWFGGADWHA

Wrede et al. 1998, Biochemistry 37:3588


Schneider et al. 1998, PNAS 95:12179

Drugs, Drug-Likeness & Virtual Screening

Best-selling Drugs 2003-2005
• Lipitor® (Atorvastatin) [Pfizer]: HMG-CoA reductase inhibitor. Hypercholesterolemia, hyperlipidemia, atherosclerosis. 11 billion $
• Zocor® (Simvastatin) [Merck]: HMG-CoA reductase inhibitor. Hypercholesterolemia, hyperlipidemia, atherosclerosis. 6 billion $
• Advair® (Fluticasone, Salmeterol) [GSK]: corticosteroid agonist + beta-2 adrenoceptor agonist. Asthma
• Zyprexa® (Olanzapine) [Eli Lilly]: 5-HT2 antagonist; D1, D2, D4 antagonist. Alzheimer's disease, psychosis, schizophrenia, bipolar disorder
• Plavix® (Clopidogrel) [BMS]: P2Y12 purinoceptor antagonist. Myocardial infarction, thromboembolism, atherosclerosis, cerebrovascular ischemia
• Paxil® (Paroxetine) [GSK]: 5-HT uptake inhibitor. Anxiety disorder, sleep disorder, obsessive-compulsive disorder, premenstrual syndrome, major depressive disorder
• Norvasc® (Amlodipine) [Pfizer]: calcium channel blocker. Hypertension, angina, cardiac failure

Therapeutic Target Classes

Bleicher et al. (2003)

Stage-by-stage quality assessment
to reduce costly late-stage attrition Bleicher et al. (2003)

Time & Costs in Drug Development

Assay Methods

• Functional Assays
• Cell-based Assays
• Binding Assays
• Biochemical-endpoint Assays

Note:
– no strict discrimination possible
– overlapping between subdefinitions

Functional Assays

Amplified Luminescent Proximity Homogeneous Assay

• Low background in the absence of a specific receptor-ligand bead interaction
• Amplified signal when receptor and ligand beads are brought into proximity by specific biological interactions

Functional Assays

GPCR: Coupling

R Effector
Proteins

AC PLC
ATP cAMP + PPi PIP2

Functional Assays

Detection of changes in intracellular IP3

donor GST: glutathione S-transferase

Functional Assays

Fusion HT Microplate Reader

Binding Assays

Binding of a ligand to a protein target

Ligand:
– neurotransmitters
– hormones
– growth factors
– cytokines
– toxins
– etc.

Protein target:
– receptors
– ion channels
– enzymes
– carrier molecules

Binding Assays

Evaluation of the binding (affinity) between endogenous ligands (e.g. neurotransmitters) or drugs and their molecular targets (e.g. receptors)

Binding assays provide a direct approach to the study of receptors (or more accurately recognition sites) and their modification by drugs

They do not provide information on the activity of the ligand for the molecular target

Definition and calculation of IC50-value

The IC50 value is the concentration of drug that is required for 50% inhibition of enzyme / receptor activity.

$$ \mathrm{IC}_{50} = \frac{x}{\frac{100\,\%}{y} - 1} $$

x = concentration [µM] of drug in the assay
y = result of assay for the drug [% of control]
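As a worked illustration of the single-point formula, the sketch below evaluates it for one measurement. The formula is equivalent to assuming a simple one-site inhibition curve y = 100 / (1 + x / IC50); that interpretation, the function name `ic50_single_point` and the example numbers are assumptions for illustration.

```python
def ic50_single_point(conc_um, percent_of_control):
    """Estimate IC50 [µM] from one concentration x [µM] and the assay result y [% of control]."""
    if not 0.0 < percent_of_control < 100.0:
        raise ValueError("y must lie strictly between 0 and 100 % of control")
    return conc_um / (100.0 / percent_of_control - 1.0)

# At 50 % of control the tested concentration is the IC50 itself
print(ic50_single_point(10.0, 50.0))   # 10.0
# Stronger inhibition (20 % of control remaining) at 10 µM implies a lower IC50
print(ic50_single_point(10.0, 20.0))   # 2.5
```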

Definition and calculation of Ki-value

The Ki value is defined as the concentration of the competing ligand (here: drug) that will bind to half of the binding sites at equilibrium, in the absence of competitors.

$$ K_i = \frac{\mathrm{IC}_{50}}{1 + \frac{L}{K_d}} $$

As an equilibrium constant (unit: mol/l):

$$ K_i = \frac{[\mathrm{Ligand}]\,[\mathrm{Protein}]}{[\mathrm{Ligand} \cdot \mathrm{Protein}]} $$

L = concentration [nM] of radiotracer
Kd = affinity [nM] of the radiotracer for the receptor

Cheng Y., Prusoff W.H., Biochem. Pharmacol. 22: 3099-3118, 1973
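The Cheng-Prusoff correction can be written as a one-line function. The sketch below uses hypothetical numbers; the function name `ki_cheng_prusoff` is not from the slides.

```python
def ki_cheng_prusoff(ic50_nm, radiotracer_nm, kd_nm):
    """Ki [nM] from IC50 [nM], radiotracer concentration L [nM] and radiotracer affinity Kd [nM]."""
    if kd_nm <= 0.0:
        raise ValueError("Kd must be positive")
    return ic50_nm / (1.0 + radiotracer_nm / kd_nm)

# With the radiotracer used at its Kd (L = Kd), Ki is half the measured IC50
print(ki_cheng_prusoff(ic50_nm=100.0, radiotracer_nm=2.0, kd_nm=2.0))  # 50.0
```

The correction matters most when the radiotracer concentration is well above its Kd; for L much smaller than Kd, Ki approaches IC50.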

HTS Workstation

Screening Methods

Screening plates for HTS
Screening plates for LTS („Intelligent LTS“)

[Figure: 24-well and 96-well plates]

Hit-identification strategies

Structure-based
Ligand-based

Bleicher et al. (2003)

Structure-Based Molecular Design

Selection of promising drug-like agents in the presence of receptor structure information

• Binding site identification
• Docking (single; combinatorial)
• Scoring
• …

Ligand-Based Molecular Design

Selection of promising drug-like agents in the absence of receptor structure information

Chemical space (chemically feasible, virtual molecules): 10^3 – 10^20 (10^100)
Focused library (subset of compounds): 1 – 10^4

Navigation is defined as ...

“The process of determining and maintaining a course or trajectory to a goal location”.
(Franz & Mallot, Robot. Autonom. Syst. 2000, 30, 133)

What We Need for Exploration

• Coordinate system (“chemical space”)
• Guide through chemical space (“compass”, “map”)
• Target (“goal location”)
• Molecule generator / sampling method (“vessel”)

Each map has a certain resolution & meaning

Similarity Searching / Neighborhood Behavior

Bleicher et al. (2003)

“Cherry-Picking“

Bleicher et al. (2003)

Library Design by Similarity
[Figure: PCA projection of a descriptor space (x1, x2, x3) onto PC1/PC2; k-nearest neighbors within distance d of a virtual optimum]

[Figure: PCA projection (PC1/PC2) showing “spikes”]

Library Shaping by Similarity

ANN, SVM, PLS etc.

[Figure: library composition before and after shaping; left panel: % compounds vs. drug-likeness score (0.1 to 1); right panel: % compounds vs. number of “Rule-of-Five” violations (0, 1, 2)]

Lipinski et al. (1997)
Poor absorption or permeation is more likely when:

1. There are more than 5 H-bond donors (expressed as the sum of OHs and NHs);
2. The MW is over 500;
3. The LogP is over 5;
4. There are more than 10 H-bond acceptors (expressed as the sum of Ns and Os).
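The four criteria translate directly into a violation counter. The sketch below takes precomputed property values as input, so it does not depend on any cheminformatics toolkit; the `Properties` container and the example values (roughly those of acetylsalicylic acid) are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Properties:
    h_bond_donors: int      # sum of OHs and NHs
    mol_weight: float       # MW
    log_p: float            # LogP
    h_bond_acceptors: int   # sum of Ns and Os

def rule_of_five_violations(p):
    """Count how many of the four Lipinski criteria are violated (0 to 4)."""
    return sum([
        p.h_bond_donors > 5,
        p.mol_weight > 500.0,
        p.log_p > 5.0,
        p.h_bond_acceptors > 10,
    ])

# Approximate, illustrative values for a small drug such as acetylsalicylic acid
small_drug = Properties(h_bond_donors=1, mol_weight=180.2, log_p=1.2, h_bond_acceptors=4)
print(rule_of_five_violations(small_drug))  # 0
```

In library shaping, compounds are typically binned by this count, which is what the "Rule-of-Five violations" histogram on the preceding slide shows.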

Properties of Known Drugs

[Figure: histograms (% of compounds) of MW (100 to > 800) and clogP (-6 to 12) for known drugs]

Drug-likeness may be defined as a complex balance of various molecular properties and structural features which determine whether a particular molecule is a drug or a non-drug.

Molecular properties such as
• hydrophobicity
• electronic distribution
• hydrogen bonding characteristics
• molecule size and flexibility
• pharmacophoric features

affect

• the behavior of the molecule in a living organism
• transport, affinity, reactivity, toxicity, metabolic stability etc. (ADME/Tox)
• Pharmacokinetics (“what the organism does to the drug”)
• Pharmacodynamics (“what the drug does to the organism”)

More Rules of Thumb for „Drug-Likeness“

• Rotatable bonds < 6

• Solubility logS
  Fragment-based prediction
  logS values below -4 indicate possible solubility problems

• Polar Surface Area PSA
  Fragment-based prediction
  PSA values above 120 Å² indicate possible absorption problems

• Partition coefficient water/octanol AlogP
  Fragment-based prediction
  AlogP values above 5 indicate possible bioavailability problems

http://www.molinspiration.com/cgi-bin/properties

Properties of Ligand Families

Receptor class                MW                            clogP
(number of cmpds)             Avg.   Min.   Max.   σ        Avg.   Min.    Max.   σ
GPCR (N = 1,467)              406    121    993    134      3.6    -11.1   13.7   2.6
Protease (N = 1,015)          495    136    945    114      2.9    -8.2    10.3   2.4
Kinase (N = 387)              395    74     717    104      3.1    -5.6    8.7    2.2
Enzyme (N = 839)              364    68     849    119      2.6    -5.2    10.9   2.5
Hormone (N = 227)             336    142    949    115      4.0    -5.2    10.1   2.9
Ion channel (N = 412)         375    208    969    106      3.1    -11.1   10.8   2.7

Properties of Ligand Families

[Figure: mean clogP (<clogP>, 1.5 to 5.5) vs. mean molecular weight (<MW>, 320 to 500) for the ligand families: hormone receptor, GPCR, kinase, protease, other enzymes, ion channel, other]

Fragment-based Design

• combinatorial optimization principle
• manageable size of search space
• might result in chemically feasible molecular designs
• “side-chains” and “scaffolds” are interchangeable

[Figure: example structures with substituents R1 and R2]

Traditional “combinatorial thinking”: scaffolds & side-chains
Fragment-based design: only “building-blocks” (fragments)

The Concept

Drug DB → pseudo-retrosynthetic fragmentation (e.g., RECAP) → Fragment DB

Fragment DB + Reactions → Assemble (e.g., TOPAS) → Designed Molecules
Guided by:
• Reference Molecule(s)
• Fitness Functions

RECAP: Lewell et al. (1998) J. Chem. Inf. Comput. Sci. 38:511
TOPAS: Schneider et al. (2000) J. Comp. Aided Mol. Des. 14:487

RECAP – Eleven Bond Cleavage Types


1. Amide
2. Ester
3. Amine
4. Urea
5. Ether
6. Olefin
7. Quaternary N
8. Arom. N – aliph. C
9. Lactam N – aliph. C
10. Arom. C – arom. C
11. Sulphonamide

[Figure: structure of the cleaved bond for each type]
Lewell et al., JCICS 1998.

RECAP Applied to the COBRA Database

Reaction               Total Fragments   Unique Fragments
Amide                  4324              1904
Ester                  978               472
Amine                  489               178
Urea                   8                 6
Ether                  128               76
Olefin                 0                 0
Quart. Nitrogen        48                16
Arom. N – Aliph. C     1018              601
Lactam N – Aliph. C    274               170
Arom. C – Arom. C      1194              613
Sulphonamide           646               368

COBRA: Schneider & Schneider (2003) QSAR Comb. Sci. 22:713

Fragment analysis

Molecule = Sidechain + Framework
Framework = Ring System + Linker

[Figure: Molecule → Molecular Graph → Scaffold, with ring, linker and sidechain marked]

adapted from Bemis & Murcko

K. Grabowski, Analysis done with SVL (MOE)

Framework extraction

[Figure: most frequent frameworks in IBS, SPECS-NP and COBRA; each framework is labeled with its count and the percentage of compounds]

IBS:      1520: 6.09%, 789: 3.16%, 668: 2.67%, 553: 2.21%, 492: 1.97%, 463: 1.85%, 450: 1.8%, 367: 1.47%, 304: 1.22%, 266: 1.06%
SPECS-NP: 73: 10.44%, 34: 4.86%, 26: 3.72%, 20: 2.86%, 19: 2.72%, 17: 2.43%, 16: 2.23%, 13: 1.86%, 12: 1.72%, 10: 1.43%
COBRA:    276: 5.48%, 93: 1.85%, 82: 1.63%, 81: 1.61%, 77: 1.53%, 68: 1.35%, 58: 1.15%, 55: 1.09%, 50: 0.99%, 50: 0.99%, 50: 0.99%

Unique ring extraction from natural products


Natural products (IBS: 24977, Micro: 685, SpecsNP: 699 cpds.)
COBRA (5033 cpds.)

[Figure: most frequent unique ring systems with attachment points (*1, *2) and their occurrence counts. One set of counts: 22, 20, 17, 16, 15, 15, 12, 11, 11, ...; the other: 363, 289, 281, 273, 265, 256, 254, 242, 219, 205]

Synthetic drugs vs. Natural products: RECAP results

Number of reactions and percent, per database (molecules: COBRA 5074, IBS 24617, MICROSOURCE 692, SPECS 700)

Reaction                               COBRA            IBS              MICROSOURCE      SPECS
                                       n       %        n       %        n       %        n       %
Amide                                  2409    47.48    7423    30.15    14      2.02     36      5.14
Aromatic carbon-aromatic carbon        922     18.17    2687    10.92    81      11.71    100     14.29
Aromatic nitrogen-aliphatic carbon     599     11.81    906     3.68     6       0.87     13      1.86
Ester                                  435     8.57     7150    29.04    291     42.05    374     53.43
Amine                                  370     7.29     535     2.17     5       0.72     7       1.00
Sulphonamide                           313     6.17     84      0.34     0       0.00     1       0.14
Lactam nitrogen-aliphatic carbon       176     3.47     763     3.10     1       0.14     3       0.43
Ether                                  80      1.58     133     0.54     1       0.14     2       0.29
Urea                                   65      1.28     436     1.77     1       0.14     0       0.00
Quaternary nitrogen                    12      0.24     38      0.15     0       0.00     2       0.29
Olefin                                 0       0.00     3       0.01     0       0.00     0       0.00

“N-Chemistry” vs. “O-Chemistry”

K. Grabowski, U. Fechner, Analysis done with Daylight-Toolkit

„Drug-Likeness“ Score (ANN)

Xenical™ (Orlistat): Score = 0.54
Viagra™ (Sildenafil): Score = 0.94

[Figure: network output y = f(x); histograms of the score (0 to 1, % of compounds)]
Nondrugs: Σ = 76% in the left part of the histogram, Σ = 24% in the right part
Drugs:    Σ = 24% in the left part of the histogram, Σ = 76% in the right part

„X-Likeness“ Scores (ANN)

• Analysis of natural compound properties & scaffolds
  (Lee & Schneider 2001, J. Comb. Chem. 3:284)

• “Drug-Likeness” prediction
  (Schneider 2000, Neural Networks 13:15)

• Comparison of combinatorial libraries
  (Schneider & So 2001, Adaptive Systems in Drug Design, Landes, Austin)

• Virtual screening for CNS-active compounds
  (Schneider et al. 2001, Curr. Med. Chem. CNSA 1:99)

• hERG-liability prediction
  (Roche et al. 2002, ChemBioChem 3:455)

• CYP P4503A4-liability prediction
  (Zuegge et al. 2002, Quant. Struct. Act. Relat. 21, in press)

• “Frequent Hitter” analysis & prediction
  (Roche et al. 2002, J. Med. Chem. 45:137)

Evaluation of Virtual Combinatorial Libraries


[Figure: three-layer network (Input, weights w(1), Hidden, weights w(2), Output) scoring six virtual combinatorial libraries built on scaffolds with substituents R1 and R2]

Scores by supervised neural networks (columns: the six libraries, left to right)

Drug-likeness            ++   ++   ++   ++   ++   ++
“Cytotoxicity”           +    +    +    ++   ++   –
GPCR-ligand likeness     ++   ++   ++   –    +    +
Kinase-ligand likeness   –    –    –    –    –    +

R-Group Descriptor
[Figure: R-group R1 with atoms A to F grouped by topological distance from the attachment point]

Distance               1    2      3      4
Sum of atomic values   a    b+c    d+f    e

Distance: topological number of bonds

R-group descriptor R1 based on 5 atomic properties

Distance             1         2         3         4
Atomic weight        12.010    28.010    28.054    15.015
H-bond donors        0         1         0         0
H-bond acceptors     0         1         0         1
logP                 0.1551    0.0129    -0.4070   -0.7096
Molar refractivity   0.3513    0.4328    0.5506    0.2173
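The descriptor sums an atomic property over all atoms at the same topological distance from the attachment point. A breadth-first search over the bond graph is one way to sketch this. The connectivity below is hypothetical, chosen only so that atoms A to F fall at the distances shown on the slide (A at 1; B, C at 2; D, F at 3; E at 4), and the property values are placeholders.

```python
from collections import deque

def r_group_descriptor(bonds, values, attachment, max_distance=4):
    """Sum an atomic property per topological distance (in bonds) from the attachment atom.

    bonds: dict mapping each atom to the list of its bonded neighbours
    values: dict mapping each atom to its property value
    The attachment atom itself is assigned distance 1, as on the slide.
    """
    distance = {attachment: 1}
    queue = deque([attachment])
    while queue:
        atom = queue.popleft()
        for neighbour in bonds[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    descriptor = [0.0] * max_distance
    for atom, d in distance.items():
        if d <= max_distance:
            descriptor[d - 1] += values[atom]
    return descriptor

# Hypothetical R-group: A-B, A-C, B-D, C-F, D-E
bonds = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "F"],
         "D": ["B", "E"], "F": ["C"], "E": ["D"]}
values = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}
print(r_group_descriptor(bonds, values, "A"))  # [a, b+c, d+f, e] = [1.0, 5.0, 10.0, 5.0]
```

Repeating the calculation for each of the five atomic properties and concatenating the results gives a fixed-length vector per R-group, which is the form a supervised classifier needs.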

Popular Supervised Classifier Systems

[Figure: a two-class problem in the (x1, x2) plane (“The task”), with possible ANN solution(s) and the SVM solution]

Three-layered Feed-forward Network

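INPUT: x1, x2, …, xn
Hidden layer (weights w)
OUTPUT (weights v)

$$ f(\mathbf{x}) = \mathrm{act}\left( \sum_{h=1}^{HID} v_h \, \mathrm{act}\left( \sum_{i=1}^{IN} w_{hi} x_i + \vartheta_h \right) + \theta \right) $$

$$ \mathrm{class} = \begin{cases} 1 & \text{if } f(\mathbf{x}) > \text{threshold} \\ 0 & \text{else} \end{cases} $$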
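The forward pass of such a network can be sketched with the standard library alone. The slide leaves the activation function `act` unspecified; the logistic sigmoid used below is an assumption, as are the function names and the default threshold of 0.5.

```python
import math

def sigmoid(z):
    """Logistic activation (assumed; the slide only writes 'act')."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, hidden_bias, v, output_bias, act=sigmoid):
    """f(x) = act( sum_h v_h * act( sum_i w_hi * x_i + bias_h ) + output_bias )."""
    hidden = [
        act(sum(w_hi * x_i for w_hi, x_i in zip(w_h, x)) + b_h)
        for w_h, b_h in zip(w, hidden_bias)
    ]
    return act(sum(v_h * h for v_h, h in zip(v, hidden)) + output_bias)

def classify(x, w, hidden_bias, v, output_bias, threshold=0.5):
    """Class 1 if the network output exceeds the threshold, else class 0."""
    return 1 if forward(x, w, hidden_bias, v, output_bias) > threshold else 0

# Two inputs, two hidden neurons, arbitrary illustrative weights
x = [1.0, 2.0]
w = [[0.5, -0.25], [0.1, 0.3]]
print(classify(x, w, hidden_bias=[0.0, 0.0], v=[1.0, 1.0], output_bias=0.0))
```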

Support Vector Machine

$$ f(\mathbf{x}) = \sum_i \alpha_i K(\mathbf{x}_i^{sv}, \mathbf{x}) + b $$

$$ \mathrm{class} = \begin{cases} 1 & \text{if } f(\mathbf{x}) > 0 \\ 0 & \text{else} \end{cases} $$

α_i: Lagrange multipliers (≥ 0)
K: kernel function, K(x, x') = ((x · x') s + 1)^5
x: input vector
x^sv: support vectors
b: constant
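A sketch of the decision function with the polynomial kernel from the slide follows. One assumption is made explicit: in the usual SVM formulation each term also carries the class label y_i = ±1 of its support vector, which the compact notation above omits; the `labels` argument below adds it. Support vectors, multipliers and the scaling factor s are made-up illustrative values.

```python
def poly_kernel(x, y, s=1.0, degree=5):
    """K(x, x') = ((x . x') * s + 1) ** degree, with degree 5 as on the slide."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot * s + 1.0) ** degree

def svm_decision(x, support_vectors, alphas, labels, b, s=1.0):
    """f(x) = sum_i alpha_i * y_i * K(x_i_sv, x) + b (labels y_i = +1 or -1 are an assumption)."""
    return sum(
        alpha * label * poly_kernel(sv, x, s)
        for alpha, label, sv in zip(alphas, labels, support_vectors)
    ) + b

def svm_class(x, support_vectors, alphas, labels, b, s=1.0):
    """Class 1 if f(x) > 0, else class 0."""
    return 1 if svm_decision(x, support_vectors, alphas, labels, b, s) > 0 else 0

# Toy model with one support vector per class
svs = [[1.0, 0.0], [-1.0, 0.0]]
print(svm_class([1.0, 0.0], svs, alphas=[1.0, 1.0], labels=[1, -1], b=0.0))   # 1
print(svm_class([-1.0, 0.0], svs, alphas=[1.0, 1.0], labels=[1, -1], b=0.0))  # 0
```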

Comparison of ANN & SVM: Drug-Likeness

• Sadowski/Kubinyi data set
• GC + MOE + CATS descriptor

[Figure: classification results of SVM and ANN]

Test data        ANN correct   ANN incorrect
SVM correct      72 %          6 %
SVM incorrect    10 %          11 %

• Solutions complement each other

Comparison of ANN & SVM: Drug-Likeness
[Figure: ten example molecules (1 to 10), marked according to whether the ANN, the SVM, or both classified them correctly]

The Jury Decision Approach

A. Givehchi, G. Schneider (2003)

Prediction of “Frequent Hitters”

• Molecules showing up as hits in different assays (unspecific binding, interference with assay)
• Data collection from Roche-HTS
• “Frequent Hitter” selection:
   • active in at least eight assays
   • requested by at least six projects
   • 80% agreement by medicinal chemists

479 “Frequent Hitters”
423 “Nonfrequent Hitters” from trade drug collection

Roche et al. (2002) J. Med. Chem. 45:137

“Frequent Hitters”: Visualization of Data

120 Ghose&Crippen descriptors


SOM
cc = 0.8

PLS-analysis (cc = 0.8)


PCA

“Frequent Hitters”: ANN Training

cc = 0.83

Database      cpds.     FH/%   DND/%   r2
ACD           183221    35     26      0.08
WDI           55750     22     81      0.03
Trade Drugs   3344      13     76      0.05

Drugs classified as “Frequent Hitters”

Pareto-Ranking

• Multiple objective functions
• Selection of sets of solutions

[Figure: objective space (Dimension 1 vs. Dimension N) with Pareto fronts ranked 1, 2, 3; the solutions on a front are “non-dominated”]

Implementation
Multiobjective Genetic Algorithm (MOGA)
Fonseca & Fleming (1993) In: Genetic Algorithms: Proceedings of the Fifth International Conference, Forrest, S. (Ed.), Morgan Kaufmann: San Mateo, CA, pp. 416-423.

Median-molecules: Brown et al. (2004) JCICS 44:1079
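Pareto ranking can be sketched as follows. The rank used here is one plus the number of solutions that dominate a given solution, which is the dominance-count ranking associated with MOGA; whether the slide's figure uses exactly this rank or layered fronts is not stated, so treat the choice as an assumption. All objectives are assumed to be minimized, and the points are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(points):
    """Rank of each point = 1 + number of points that dominate it; rank 1 is non-dominated."""
    return [1 + sum(dominates(q, p) for q in points) for p in points]

# Five hypothetical solutions scored on two objectives
points = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
print(pareto_ranks(points))  # [1, 1, 1, 2, 5]
```

The first three points trade one objective against the other and form the Pareto front; a selection step would keep this whole set rather than a single "best" solution.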

Escaping the “Twilight Zone”


© 1999 Cordon Art B.V. - Baarn - Holland. All rights reserved.
M.C. Escher’s “Regular Division of the Plane I”

Library
Diversity

SAR Information

Adaptive Feature Extraction

Weeding out
• Drug-Likeness
• ADMET in silico
• Reactive groups
• Frequent Hitters

Narrowing down
• Trend vectors
• Substructure analysis
• Similarity searching
• Profile / landscape analysis

Focusing in
• Structure-based models
• PPP models
• Informed docking & scoring
• Informed design

[Figure: each stage is illustrated by a plot of activity vs. x]

Examples of drugs derived from structure-based approaches: from Congreve et al. (2005)

Virtual Screening

Bleicher et al. (2003)

Virtual Screening: DNA-Gyrase
Böhm et al. (2000) JMC 43:2664
ACD

Catalyst / LUDI

600 candidates

Retrieval of ~3,000 close analogs

150 hits in 14 chemical classes

7 validated novel actives


Binding to the DNA-gyrase B-subunit ATP site

Structure-based optimization

Fragment-based discovery of a potent p38α MAP-kinase inhibitor

Gill et al. (2005) JMC 48:414
adapted from Congreve et al. (2005)

X-ray screening
[Figure: structure overlay of fragment and lead; labels: Met109 backbone amide, Thr106]

Fragment: IC50 = 1.3 mM
Lead: IC50 = 65 nM

Ligand binding modes: EGFR kinase ATP site.

inactive-like conformation active-like conformation

PDB: 1xkk PDB: 1m17

adapted from Congreve et al. (2005)

Self-Organizing Map (SOM)


„Medicinal Chemistry Roadmap“

Mapping Chemical Space
Kohonen map containing (25x25) compound classes

[Figure: example structures for the corner neurons (1/25), (25/25), (1/1) and (25/1)]

• source: WDI
• CATS 2D descriptor
• Euclidean distance
• planar topology

A „Target Road Map“ (SOM)


CATS 2D descriptors

GPCR Protease Kinase

Enzyme Hormone Ion channel

COBRA 4.6 SOMs (6,064 compounds, CATS descriptor, toroidal map). Each slide shows the overall compound distribution, the quantization error (mqe), and the distributions of GPCR, Protease, Enzyme, Ion channel, Kinase and Nuclear R. ligands for one setting:

• 10x10, Euclidean distance
• 10x10, Manhattan distance
• 20x20, Euclidean distance
• 20x20, Manhattan distance

Ligand Space

[Figure: dendrogram of 64 COBRA SOMs, clustered with Ward’s method on the distance 1 – Pearson r. Leaf order:]
Enzyme_5-LO, Ionchannel_gaba, GPCR_CRF, GPCR_opioid, Enzyme_PDE, Enzyme_Topoisomerase, GPCR_adenosine, GPCR_CCR, GPCR_NK, GPCR_NPY, Hormone_PPAR, Enzyme_aromatase, Ionchannel_calcium, Enzyme_PLA, GPCR_melatonin, Enzyme_COX, Hormone_estrogen, Enzyme_cholinesterase, GPCR_mAChR, GPCR_5HT7, Enzyme_Estrone, GPCR_Histamin, GPCR_adrenergic, protease_fVIIa, GPCR_dopamine, GPCR_5HT1, GPCR_5HT3, GPCR_5HT2, Ionchannel_nAChR, GPCR_5HT4, Ionchannel_potassium, Ionchannel_sodium, Enzyme_NOS, GPCR_5HT6, protease_DPP, GPCR_cannabinoid, Hormone_RAR, Hormone_retinoid, Hormone_RXR, Enzyme_Farnesyltransferase, protease_thrombin, protease_cathepsin_K, protease_proteasome, protease_b-secretase, protease_g-secretase, GPCR_CCK, GPCR_somatostatin, protease_cathepsin_D, GPCR_endothelin, protease_HMGCoa, GPCR_mGlu, GPCR_prostanoid, Hormone_GHRP, protease_ECE, protease_papain, protease_caspase, protease_ace, Enzyme_IMPDH, GPCR_P2Y, Ionchannel_P2X, Enzyme_Polymerase, Enzyme_RT, Ionchannel_AMPA, protease_neuramidase

[Enlarged sub-cluster:] Enzyme_cholinesterase, GPCR_mAChR, GPCR_5HT7, Enzyme_Estrone, GPCR_Histamin, GPCR_adrenergic, protease_fVIIa, GPCR_dopamine, GPCR_5HT1, GPCR_5HT3, GPCR_5HT2, Ionchannel_nAChR, GPCR_5HT4, Ionchannel_potassium, Ionchannel_sodium, Enzyme_NOS, GPCR_5HT6, protease_DPP

“[..] peripheral serotonergic system disturbances may predispose to thromboembolic complications [..]”
Małyszko et al. (2000) Nephron 84:305

“Cluster Representatives”

[Figures (two slides): SOM panels “COBRA 4.6: mqe” and “COBRA 4.6: all”, each annotated with example structures of cluster representatives]


Identification of „Promiscuous Binders“


• COBRA 3.9
• CATS
• Euclidean distance

[Figure: SOM distributions of 5HT ligands and dopamine receptor ligands]

Ki [nM]    Sertindole   Leponex (Clozapine)
5-HT2a     0.9          3.3
5-HT2c     1.3          13
D1         210          540
D2         7.4          150
D3         8.2          360
D4.2       21           40
H1         570          2.1
α1         1.8          23
mAChR      -            34

SOM-Training
• Software: molmap®
• Descriptor: speedCATS®
• 20,000 cycles
• Toroidal (10 x 10) map
• Gaussian neighborhood, τinit = 1
• Euclidean distance

[Figure panels: quantization error; distribution of all compounds; distribution of cpds. with activity = high, medium, low, and 0]

Target area: neuron (2/3)
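The training settings listed above (toroidal grid, Gaussian neighborhood, Euclidean distance, fixed number of cycles) can be illustrated with a minimal SOM sketch. It is not the molmap implementation: the linear decay of learning rate and neighborhood width, the initial learning rate of 0.5, the random initialization in [0, 1) and all function names are assumptions, and the input vectors are assumed to be scaled to [0, 1].

```python
import math
import random

def torus_delta(a, b, size):
    """Shortest index distance between a and b on a ring of the given size."""
    d = abs(a - b)
    return min(d, size - d)

def winner(weights, x):
    """Grid position (row, col) of the neuron closest to x (squared Euclidean distance)."""
    best_pos, best_d2 = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d2 < best_d2:
                best_pos, best_d2 = (r, c), d2
    return best_pos

def train_som(data, rows=10, cols=10, cycles=20000, sigma_init=1.0, lr_init=0.5, seed=0):
    """Train a toroidal SOM with a Gaussian neighborhood; returns the weight grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    for t in range(cycles):
        decay = 1.0 - t / cycles                   # assumed linear decay schedule
        lr = lr_init * decay
        sigma = max(sigma_init * decay, 1e-3)
        x = rng.choice(data)
        wr, wc = winner(weights, x)
        for r in range(rows):
            for c in range(cols):
                d2 = torus_delta(r, wr, rows) ** 2 + torus_delta(c, wc, cols) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))   # Gaussian neighborhood
                w = weights[r][c]
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

def mean_quantization_error(weights, data):
    """Mean Euclidean distance between each data point and its winner neuron."""
    total = 0.0
    for x in data:
        r, c = winner(weights, x)
        total += math.dist(weights[r][c], x)
    return total / len(data)

# Tiny toy run; real descriptor vectors (e.g. CATS) would replace these points
toy = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
som = train_som(toy, rows=3, cols=3, cycles=300, seed=1)
print(winner(som, [1.0, 1.0]), round(mean_quantization_error(som, toy), 3))
```

Projecting a virtual library onto a trained map then amounts to calling `winner` for each virtual compound and counting how many land in the target neuron.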

Projection of
Virtual Combinatorial Libraries

• R = 60 generic building blocks
• Scaffolds (substituents R1, R2): piperazine, spiroindoline, benzodiazepine

[Structure: reference compound, IC50 ~ 3 nM (NPY-1)]

Privileged Scaffolds

[Structure: privileged scaffold with substituents R1, R2, R3]
Ser-protease inhibitors

A SOM for Identification of Novel Selective A2a Inhibitors

Purinergic receptor family (GPCR): A1, A2a, A2b, A3 (endogenous ligand: adenosine)
• Affective disorders, Parkinson’s disease

Given: 153 combinatorial products with known Ki (A2a , A1)


1. CATS topological pharmacophores
2. SOM training
3. Identification of a “seed” structure
4. Variation of the seed and virtual library design
5. Projection of virtual compounds onto the SOM
6. Selection of candidates for synthesis and testing

SOM: Novel A2a Inhibitors
Scaffold structure: [structure with substituents R1, R2, R3]
Seed structure 1 (neuron 4/2)
Seed structure 2 (neuron 3/2)

[Figure: SOM panels for Ki (A2a), Ki (A1) and selectivity; color scale from low to high Ki and selectivity, empty neurons marked]

Combined with 96 secondary amines (from ACD): 192 virtual combinatorial products

SOM: Novel A2a Inhibitors

                          A2a <Ki> [nM]   A1 <Ki> [nM]   <Selectivity>
“historical” structures   102 (101)       860 (1154)     14 (19)
“designed” structures     50 (93)         974 (1264)     33 (23)
Standard deviations in brackets

[Figure: distribution of virtual combi-products on the SOM]

Lead structure 1 (neuron 4/2), lead structure 2 (neuron 3/2)
Selected compound: 121-fold selectivity, Ki (A2A) = 2.4 nM

“Antidepressant-Likeness” - SOM

5000 WDI drugs + 597 antidepressants
SOM of a 150-dimensional “chemical space” (CATS descriptors)
Example structures: Imipramine, Fluoxetine, NKP-608

[Figure: SOM colored by fraction of antidepressants (0 %, 25 %, 50 %, 75 %); empty neurons marked]

Comparison of Binding Pockets

[Figure: SOM showing metalloproteinase active sites, other Zn2+-containing pockets, and empty neurons]
1. Identify surface pockets
2. Assign surface properties
3. Calculate spatial auto-correlation
4. SOM training

Stahl, Taroni & Schneider (2000) Protein Engineering 13:83

Binding-Pocket SOM

Prediction of Binding Pockets

[Figure: training data and test data]

Identification of Binding Pockets (1)

1. Compute the protein surface
2. Determine grid points
3. Compute “buriedness”
4. Define the “pocket”

Definition of Binding Pockets

1.) Protein / solvent assignment
2.) Accessibility of “solvent” points
3.) Detection and excision of “cavity” points
4.) Connolly surface of cavity-forming atoms

Stahl, Taroni, Schneider (2000) Protein Engineering 13:83

Identification of Binding Pockets (2)

“Site Finder in MOE”

• based on “alpha spheres”
• uses hydrophobic and hydrophilic interaction
• concept of “dense packing” → binding pocket

http://www.chemcomp.com/Journal_of_CCG/Features/sitefind.htm

Receptor-Ligand Interactions

Reaction-Energy Diagram

protein + ligand ⇌ protein–ligand complex

[Figure: energy vs. reaction coordinate, from P+L via the activated complex (transition state) to the PL complex]
Ea: activation energy
Ed: dissociation energy
ΔE: overall change in energy

Raffa (2003)

Protein-Ligand Interaction

Binding constant (unit: mol/l)

$$ K_i = \frac{[\mathrm{Ligand}]\,[\mathrm{Protein}]}{[\mathrm{Ligand} \cdot \mathrm{Protein}]} $$

Free energy of binding (“Gibbs free energy of binding”), unit: J/mol

$$ \Delta G = -RT \ln \frac{1}{K_i} = RT \ln K_i $$

$$ \Delta G = \Delta H - T \Delta S $$ (ΔH: enthalpy, ΔS: entropy)

Example (at body temperature): Ki = 10^-9 M = 1 nM ≡ -51 kJ/mol
(1 kcal ≈ 4.2 kJ; K = °C + 273.15)
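The conversion between Ki and free energy can be checked numerically. With Ki defined as a dissociation constant in mol/l, tighter binding (smaller Ki) has to give a more negative ΔG, so the sketch uses ΔG = RT ln Ki. The function name and the choice of temperatures are assumptions for illustration.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)

def binding_free_energy_kj(ki_molar, temperature_k=298.15):
    """Free energy of binding in kJ/mol from a dissociation constant Ki in mol/l."""
    if ki_molar <= 0.0:
        raise ValueError("Ki must be positive")
    return R * temperature_k * math.log(ki_molar) / 1000.0

# Ki = 1 nM at 25 °C and at body temperature (37 °C)
print(round(binding_free_energy_kj(1e-9, 298.15), 1))  # -51.4
print(round(binding_free_energy_kj(1e-9, 310.15), 1))  # -53.4
```

The slide's value of -51 kJ/mol is reproduced at about 25 °C; at 37 °C the same formula gives roughly -53 kJ/mol, so the quoted number is best read as an order-of-magnitude guide.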

Thermodynamic Contributions to ΔG

Böhm et al. (1996)

The Receptor-Ligand Binding Process

[Figure: solvated receptor binding site + solvated ensemble of ligand conformations → solvated receptor-ligand complex, showing a hydrogen bond, hydrophobic contacts and a charge-assisted hydrogen bond]

Enthalpic contribution
• Breaking and forming of hydrogen bonds
• Formation of lipophilic contacts

Entropic contribution
• Release of water from hydrophobic surfaces into the bulk medium (mean number of hydrogen bonds constant!)
• Loss of mobility / degrees of freedom of receptor and ligand

Böhm et al. (1996)

Protein mobility and ligand binding

A protein is considered to exist in two conformations (P and P*) with an energy difference ∆Gconvert. The ligand (L) can bind
the protein (P) to give a complex (PL), or bind to P* to give a complex (P*L). Although P* has a higher free energy, it might
offer greater scope for interaction with L. For instance, P* might represent a conformer in which the binding site has
opened and exposed hydrophobic patches. This is energetically unfavourable, but offers the potential for favourable
interactions with the hydrophobic moiety of a suitable incoming L, thereby giving rise to a large, favourable interaction
∆Gintrinsic. The resulting complex (P*L) has a lower energy than that of the complex PL. The observed affinity of L for the
protein conformational ensemble is governed by ∆Gobs. Slow binding kinetics might well be observed, as P* is a higher-
energy conformer than P and an energy barrier (∆Gbarrier) must be surmounted before optimal binding to L can take place.

Domain motion of HIV-RT; partial refolding of 3D-structure


http://www.nature.com/nrd/journal/v2/n7/extref/nrd1129-s3.mpg

Allosteric
site

Ligand-induced hinge motion Ligand-induced shear motion

Domain motion in Maltose: Hinge mechanism


http://www.nature.com/nrd/journal/v2/n7/extref/nrd1129-s4.mpg

Domain motion in P450BM-3: Shear mechanism


http://www.nature.com/nrd/journal/v2/n7/extref/nrd1129-s5.mpg

Optimal Interaction

• Steric complementarity (“lock-and-key”, adaptivity)
• Complementarity of surface properties
• No repulsive interactions
• The ligand binds in an energetically preferred conformation

Biotin-avidin complex (PDB: 1avd)

Example: Biotin-Streptavidin

• Ki = 2.5 · 10^-13 M (ΔG = -76 kJ/mol)
• seven geometrically ideal hydrogen bonds
• all polar groups of the ligand take part in binding
• the lipophilic part “nestles” ideally against the protein
→ Biotin fits optimally into the binding pocket
→ perfect complementarity of the functional groups
Böhm et al. (1996)

Multiple Binding Modes

Soaking experiments with trypsin + guanidinium benzoate
[Figure labels: specificity pocket; catalytic Ser195]
Böhm et al. (1996)

Multiple binding modes

• Overlay of four HIV-RT inhibitor complexes using the protein Cα atoms


• Efavirenz (blue), Nevirapine (yellow), UC-781 (green), Cl-TIBO (red)
• Protein Data Bank codes 1FK9, 1VRT, 1JLG and 1TVR.
• These structurally diverse inhibitors occupy the same volume in the binding site.
Teague (2003)

Binding Modes of Different Inhibitors

Superposition of 5 dipeptidic elastase inhibitors

Böhm et al. (1996)

Force Fields

e.g. MM2, CHARMM, OWFEG, AMBER    Böhm et al. (1996)

Empirical Scoring Functions

Idea: interpret ΔG as a sum of local non-covalent interactions

2D case
e.g. Andrews et al. (1984), mean contributions of functional groups
(in the strongest protein-ligand complexes ~ 1.5 kJ/mol per atom)

3D case

$$ \Delta G_{\mathrm{binding}} \approx \sum_i \Delta G_i \, f_i(r_L, r_P) $$

r_L, r_P: coordinates of ligand and protein
ΔG_i: weighting factors for the different interaction types (training data required!)

Knowledge-Based Scoring Functions

Analysis of observed contacts in protein-ligand complexes
e.g. PMF (Muegge & Martin), DrugScore (Gohlke)

Pseudo pair potentials ΔW_ij(r):

$$ \Delta W_{ij}(r) \propto -\ln \frac{g_{ij}(r)}{g_{\mathrm{ref}}} $$

g_ij(r): observed pair frequency at distance r

[Figure: P(r) and W(r) plotted against r]

Calculation of knowledge-based potentials (1)
(for protein folding simulation)
Define a contact, e.g. “distance of Cα atoms from two residues < threshold”, and count the contacts. This gives the observed frequencies:

      A    C    L
  A   1    2    1
  C   2    3    1
  L   1    1    1

Define a suitable reference state, e.g. the statistically expected pairwise spatial contacts, calculated from the amino-acid (AA) occurrences in the AA sequence:

AA sequence: ACLACLAALL
AA counts: A: 4, C: 2, L: 4

      A    C    L
  A   6    8    16
  C   8    1    8
  L   16   8    6

Diagonals: ½ N (N-1)
Off-diagonals: N*M

Calculation of knowledge-based potentials (2)

Counts
  observed:                 expected (“background”):
      A    C    L               A    C    L
  A   1    2    1           A   6    8    16
  C   2    3    1           C   8    1    8
  L   1    1    1           L   16   8    6

Relative frequency
  observed:                       expected (“background”):
      A       C       L               A       C       L
  A   0.077   0.154   0.077       A   0.078   0.104   0.208
  C   0.154   0.231   0.077       C   0.104   0.013   0.104
  L   0.077   0.077   0.077       L   0.208   0.104   0.078

Inverse Boltzmann law:

$$ \Delta E_{\mathrm{pseudo}} = -kT \ln \frac{f_{ab}}{f_{\mathrm{ref}}} $$

Pseudo-energies
      A        C        L
  A   0.027    -0.894   2.255
  C   -0.894   -6.537   0.680
  L   2.255    0.680    0.027
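The two-step recipe above (expected contacts from composition, then the inverse Boltzmann law) can be reproduced with a short sketch. Energies are returned in units of kT; the function names are hypothetical, and the observed contact matrix is the toy example from the slide.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

def expected_contacts(sequence):
    """Expected pair counts from composition: N(N-1)/2 on the diagonal, N*M off the diagonal."""
    counts = Counter(sequence)
    expected = {}
    for a, b in combinations_with_replacement(sorted(counts), 2):
        n = counts[a] * (counts[a] - 1) // 2 if a == b else counts[a] * counts[b]
        expected[(a, b)] = expected[(b, a)] = n
    return expected

def relative_frequencies(matrix):
    """Normalize a full symmetric count matrix so that all entries sum to 1."""
    total = sum(matrix.values())
    return {pair: count / total for pair, count in matrix.items()}

def pseudo_energies(observed, expected, kT=1.0):
    """Inverse Boltzmann law: dE = -kT * ln(f_observed / f_reference)."""
    f_obs, f_ref = relative_frequencies(observed), relative_frequencies(expected)
    return {pair: -kT * math.log(f_obs[pair] / f_ref[pair]) for pair in observed}

# Observed contact counts from the slide, written as a full symmetric matrix
observed = {("A", "A"): 1, ("A", "C"): 2, ("A", "L"): 1,
            ("C", "A"): 2, ("C", "C"): 3, ("C", "L"): 1,
            ("L", "A"): 1, ("L", "C"): 1, ("L", "L"): 1}
expected = expected_contacts("ACLACLAALL")
energies = pseudo_energies(observed, expected)
print(expected[("A", "L")], round(energies[("C", "C")], 3))  # 16 -2.877
```

C-C contacts occur far more often than composition alone predicts, so their pseudo-energy is strongly negative. Multiplying the kT-unit values by about 2.27 approximately reproduces the tabulated pseudo-energies; that scaling factor is inferred from the numbers and is not stated on the slide.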

Calculation of Pseudo-Energy

Count the frequencies of contacts for a protein; the pseudo-energy for the protein is then calculated by summing up the pairwise empirical potentials for all contacts found.

2F19 (blue) and decoy (green):   2F19: ΔE_pseudo = −123.9 kJ;   decoy: ΔE_pseudo = −217.7 kJ
2HFL (blue) and decoy (green):   2HFL: ΔE_pseudo = −169.4 kJ;   decoy: ΔE_pseudo = −163.2 kJ

Andrews Estimated Binding Energy


Andrews, P. R., Craik, D. J., Martin, J. L.
Functional group contributions to drug-receptor Interactions.
J. Med. Chem. 1984, 27, 1648-1657.

• Average Binding Energy (ABE): crude estimation of the binding affinity of an “average” drug based on its components
• The ABEs of 10 functional groups were derived from regression analysis with 200 potent ligands

$$ \Delta G_{\mathrm{estimated}}\ (\mathrm{kcal/mol}) = -0.7\,\mathrm{DOF} + 0.7\,\mathrm{C_{sp2}} + 0.8\,\mathrm{C_{sp3}} + 11.5\,\mathrm{N^{+}} + 1.2\,\mathrm{N} + 8.2\,\mathrm{CO_2^{-}} + 10\,\mathrm{PO_4^{2-}} + 2.5\,\mathrm{OH} + 3.4\,\mathrm{C{=}O} + 1.1\,(\mathrm{O,S}) + 1.3\,\mathrm{Hal} - 14 $$
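The Andrews estimate is a weighted count of functional groups, so it can be sketched as a dictionary lookup. The coefficients are those of the equation above; the dictionary keys, the function name and the example group counts are illustrative assumptions, and counting the groups in a real molecule is left out.

```python
# Coefficients per group count, in kcal/mol, as given in the equation above
ANDREWS_COEFFICIENTS = {
    "DOF": -0.7, "Csp2": 0.7, "Csp3": 0.8, "N+": 11.5, "N": 1.2,
    "CO2-": 8.2, "PO4(2-)": 10.0, "OH": 2.5, "C=O": 3.4, "O,S": 1.1, "Hal": 1.3,
}
ANDREWS_CONSTANT = -14.0

def andrews_estimate(group_counts):
    """Estimated binding energy (kcal/mol) from counts of degrees of freedom and functional groups."""
    unknown = set(group_counts) - set(ANDREWS_COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown group types: {sorted(unknown)}")
    return sum(ANDREWS_COEFFICIENTS[g] * n for g, n in group_counts.items()) + ANDREWS_CONSTANT

# Hypothetical molecule: 4 rotatable degrees of freedom, 6 sp2 and 3 sp3 carbons, 1 N, 1 OH
example = {"DOF": 4, "Csp2": 6, "Csp3": 3, "N": 1, "OH": 1}
print(round(andrews_estimate(example), 2))  # -6.5
```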

Andrews analysis of virtual combinatorial libraries

[Figure: distribution of ΔG_estimated (kcal/mol, axis 0 to 50) for nine virtual combinatorial libraries (scaffolds 1 to 9 with substituents R1 to R3) and for COBRA]

Extension of Andrews Analysis

Free energy of ligand binding

$$ \Delta G = -RT \ln \frac{1}{K_D} = RT \ln K_D $$

Binding energy per atom (ligand efficiency)

$$ \Delta g = \frac{\Delta G}{\text{number of non-H atoms}} $$

• A ΔG change of -1.4 kcal/mol corresponds to a ~10-fold change in potency
• Maximum affinity per drug atom: Δg = -1.5 kcal/mol (“magic” methyl)

Optimize/select compounds with the highest Δg (in magnitude), not the lowest KD
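Ligand efficiency follows directly from KD and the heavy-atom count. The sketch below uses ΔG = RT ln KD (KD as a dissociation constant in mol/l) at 25 °C; the temperature, function names and example compound are assumptions for illustration.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def free_energy_kcal(kd_molar, temperature_k=298.15):
    """Free energy of binding in kcal/mol from a dissociation constant KD in mol/l."""
    return R_KCAL * temperature_k * math.log(kd_molar)

def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Binding energy per non-hydrogen atom (kcal/mol per atom)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return free_energy_kcal(kd_molar, temperature_k) / heavy_atoms

# Hypothetical compound: KD = 10 nM with 25 non-H atoms
print(round(ligand_efficiency(1e-8, 25), 2))                        # -0.44
# A 10-fold gain in potency changes ΔG by about -1.4 kcal/mol
print(round(free_energy_kcal(1e-9) - free_energy_kcal(1e-8), 2))    # -1.36
```

Comparing compounds by Δg rather than by KD avoids rewarding potency that is bought purely by adding atoms.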

Types of Non-Covalent Interactions

[Figure: interaction types between protein (left) and ligand (right)]
• hydrogen bonds: Dopt = 2.8 – 3.2 Å
• ionic interactions: Dopt = 2.7 – 3.0 Å
• hydrophobic interactions (e.g. between methyl groups)
• cation-π interaction
• metal complexation (e.g. Zn2+ and S-)
Böhm et al. (1996)

Geometry of Hydrogen Bonds

• directional interaction
• defined geometry

100° < angle C=O···H < 180°
angle N–H···O > 150°
N–O distance: 2.8 – 3.2 Å

Böhm et al. (1996)

Example: methotrexate in DHFR
Hydrogen-bond geometries in protein-ligand complexes are often very similar
to those found in crystal structures from the Cambridge database

Böhm et al. (1996)

Contribution of hydrogen bonds (1)

The binding constant is NOT a direct function of the
number of hydrogen bonds formed!

Böhm et al. (1996)

The Hydrogen-Bridge

Desiraju (2002) Acc. Chem. Res. 35:565

Contribution of hydrogen bonds (2)

Binding constants for thermolysin inhibitors (metalloprotease)

X = O leads to repulsion!

Böhm et al. (1996)

Contribution of hydrogen bonds (3)

Binding constants for thrombin inhibitors (serine protease)

(Eli Lilly)

Böhm et al. (1996)

Extremely strong hydrogen bonds (1)

Example: inhibition of the enzyme cytidine deaminase

= transition-state analogue
Böhm et al. (1996)

Extremely strong hydrogen bonds (2)
Example: inhibition of the enzyme cytidine deaminase

Strong inhibitor:
• inhibitor displaces the water molecule
• optimal fit

Weak inhibitor:
• inhibitor does not displace the water molecule
• slightly shifted molecular scaffold

Böhm et al. (1996)

Pharmacophores &
Pharmacophore Descriptors

Pharmacophore Definition
GLOSSARY OF TERMS USED IN MEDICINAL CHEMISTRY
(IUPAC Recommendations 1998) http://www.chem.qmul.ac.uk/iupac/medchem/

Pharmacophore (pharmacophoric pattern)


A pharmacophore is the ensemble of steric and electronic features that is
necessary to ensure the optimal supramolecular interactions with a specific
biological target structure and to trigger (or to block) its biological response.
(A pharmacophore does not represent a real molecule or a real association
of functional groups, but a purely abstract concept that accounts for the
common molecular interaction capacities of a group of compounds towards
their target structure.)
Pharmacophoric descriptors
Pharmacophoric descriptors are used to define a pharmacophore, including
H-bonding, hydrophobic and electrostatic interaction sites, defined by
atoms, ring centers and virtual points.

Pharmacophoric types of functional groups


Figure: example functional groups for each pharmacophoric type.

• Donor
• Acceptor
• Donor + Acceptor
• Acid (negative ionizable)
• Base (positive ionizable)
• Atoms excluded („non pharmacophoric“)

MOE atom types (PATTY)

Pharmacophore Models:
A Matter of Interpretation?

Figure: two rotamers of dopamine and the pharmacophore models assigned to them (point labels P/D, A/D, D, L); one model was generated from inspection of other dopamine receptor ligands.

• two rotamers
• different PPP models

Creation of PPP Triplets

Figure: a molecule with donor and acceptor groups undergoes conformational analysis and PPP assignment, which yields a bit string (e.g. 0 0 1 0 0 0 1 0 0 0).

• many bits (often > 10^4)
• sparse (few bits set)
• Hashing & Folding

Reagent-based PPP Fingerprints (GaP)

Figure: reagent in an x, y, z coordinate frame with the attachment point at the origin and PPPs 1 and 2 marked.

1. Align along the reactive bond & place the attachment point at the origin
2. Rotate and record PPP constellations on a grid
• relative orientation of PPs to the origin is defined
• applicable to active site constraints
Leach et al. (2000) JCICS 40:1262

Receptor-derived PPP Fingerprints

Protein Pocket → (e.g. GRID) → Site Map → (e.g. 3PPs) → PPP Constellations

Target A 0 1 1 0 1 0 1 0 1 0
Target B 0 1 1 0 0 0 1 0 1 1
Target C 0 0 1 1 0 0 1 0 0 1
Target D 1 0 1 1 1 0 1 0 0 0
...

• Compound screening
• Informative Library Design

Virtual Screening for BACE Inhibitors

Validated hits
Common Pharmacophore (Asp Proteases)

Virtual Screening

Combined Query

Target-Specific Pharmacophore
(Surf2Lead® , Pep2Lead® , PHACIR®)

In situ Design

Factor VIIa

Figure: ligand fragment in the S1 pocket; Asp189 is labeled.

Geometric parameters
of non-covalent interactions

R: distance between the interaction partners [Å]
α: angle to the reference axis
ω: rotation angle about the reference axis

LUDI geometry rules: C=O

• Type: Acceptor
• Compl. Type: Donor
• R: 1.9 + 1 Å
• α: 110-180°
• ω: 0-360°

LUDI geometry rules: N-H, O-H

• Type: Donor
• Compl. Type: Acceptor
• R: 1.9 Å
• α: 150-180°
• ω: 0-360°

LUDI geometry rules: COO-

• Type: Acceptor
• Compl. Type: Donor
• R: 1.8 + 1 Å
• α: 100-140°
• ω: -50-50°, 130-230°

LUDI geometry rules: =N- (as in His)

• Type: Acceptor
• Compl. Type: Donor
• R: 1.9 + 1 Å
• α: 150-180°
• ω: 0-360°

LUDI geometry rules: R-O-R

• Type: Acceptor
• Compl. Type: Donor
• R: 1.9 + 1 Å
• α (sp2): 100-140°
• ω (sp2): -60-60°
• α (sp3): 90-130°
• ω (sp3): -70-70°

LUDI geometry rules: Carbon

• Type: Lipophilic
• Compl. Type: Lipophilic
• R: 4 Å
• α, ω: full sphere

LUDI geometry rules: Sulfur

• Type: Lipophilic
• Compl. Type: Lipophilic
• R: 4.8 Å
• α, ω: full sphere

LUDI geometry rules: Aromatic Ring

• Type: Aromaticity Donor
• Compl. Type: Aromaticity Donor and Acceptor
• R: 6 Å
• Circular plane

LUDI geometry rules: Aromatic Ring Hydrogens

• Type: Aromaticity Acceptor
• Compl. Type: Aromaticity Donor
• R: 6 Å
• α: 160-180°
• ω: 0-360°

LUDI geometry rules: Amide Bond

• Type: Aromaticity Donor
• Compl. Type: Aromaticity Donor and Acceptor
• R: 6 Å
• Circular plane
“Virtual Ligands”
Docking-Free Structure-Based Similarity Searching

• Idea: autocorrelation vectors of virtual ligand pharmacophore points
• Formal definition: $CV = \int_a^b f(x) \cdot f(x+l)\,dx$

Binding pocket Potential interaction sites “Inverse” interaction sites

Virtual Ligand

Example: Thrombin Active Site PPPs

Example: Factor Xa “Fingerprints”

Figure: correlation-vector “fingerprints” of the virtual ligand and of a known ligand, with bins grouped by pair type DD, DA, DH, AA, AH, HH. Known ligand shown: COBRA_3743 (Xa/VIIa), YM_60828_3743.

D: Donor, A: Acceptor, H: Hydrophobic

Goodness-of-Hit

$GH = \left( \frac{H_a (3A + H_t)}{4 H_t A} \right) \times \left( 1 - \frac{H_t - H_a}{D - A} \right)$   Guner & Henry (2000)

D number of compounds in the database


A number of actives in the database
Ht total number of compounds in the hit list
Ha number of actives in the hit list
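The GH score can be evaluated directly from the four counts defined above. The following is a minimal sketch; the function name and the example numbers are made up for illustration.

```python
def goodness_of_hit(D, A, Ht, Ha):
    """GH score from database size D, actives A, hit-list size Ht, active hits Ha."""
    if Ht <= 0 or A <= 0 or D <= A:
        raise ValueError("need Ht > 0, A > 0 and D > A")
    yield_and_coverage = Ha * (3 * A + Ht) / (4 * Ht * A)
    false_positive_term = 1 - (Ht - Ha) / (D - A)
    return yield_and_coverage * false_positive_term

# Hypothetical screen: 1000 compounds, 50 actives, 100 hits of which 40 are active.
gh_example = goodness_of_hit(D=1000, A=50, Ht=100, Ha=40)
# Ideal hit list: exactly the 50 actives and nothing else.
gh_perfect = goodness_of_hit(D=1000, A=50, Ht=50, Ha=50)
```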

• often, enrichment and coverage are in competition


retrospective analysis by GH value

SVM-based Feature Relevance (1)

a) b)
maximum
margin

optimal
hyperplane

important features unimportant features

Byvatov & Schneider (2004) JCICS 44:993

SVM-based Feature Relevance (2)

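SVM function:

$f(\mathbf{x}) = \sum_i a_i \, K(\mathbf{x}_i^{sv}, \mathbf{x}) + b$

Separating hyperplane:

$f(\mathbf{x}) = (\mathbf{w} \cdot \mathbf{x}) + b$, where $\mathbf{w} = \sum_i a_i \mathbf{x}_i^{sv}$ is a normal vector of the separating hyperplane.

Feature change along the normal of the SVM plane:

$R_f(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial x_f} = \sum_i a_i \, \frac{\partial K(\mathbf{x}_i^{sv}, \mathbf{x})}{\partial x_f}$

$R_f = \sum_j R_f(\mathbf{x}_j^{sv}) = \sum_{i,j} a_i \, \frac{\partial K(\mathbf{x}_i^{sv}, \mathbf{x}_j^{sv})}{\partial x_f}$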
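A small sketch may help to read the relevance formulas. It assumes that the relevance of feature f is the partial derivative of the SVM function with respect to that feature, summed over all support vectors, so the constant offset b drops out. The helper names, the RBF kernel choice and the numbers are illustrative and are not taken from the cited paper.

```python
import math

def linear_kernel_gradient(sv, x):
    """Gradient of K(sv, x) = sv . x with respect to x."""
    return list(sv)

def rbf_kernel_gradient(sv, x, gamma=0.5):
    """Gradient of K(sv, x) = exp(-gamma * |sv - x|^2) with respect to x."""
    sq_dist = sum((s - xi) ** 2 for s, xi in zip(sv, x))
    k = math.exp(-gamma * sq_dist)
    return [2.0 * gamma * (s - xi) * k for s, xi in zip(sv, x)]

def feature_relevance(support_vectors, coeffs, kernel_gradient):
    """Sum over support vectors x_j of the gradient of f evaluated at x_j."""
    relevance = [0.0] * len(support_vectors[0])
    for x_j in support_vectors:
        for a_i, x_i in zip(coeffs, support_vectors):
            grad = kernel_gradient(x_i, x_j)
            for f, g in enumerate(grad):
                relevance[f] += a_i * g
    return relevance

# Toy model with two support vectors and two features.
svs = [[1.0, 0.0], [0.0, 2.0]]
coeffs = [0.5, -1.0]
linear_relevance = feature_relevance(svs, coeffs, linear_kernel_gradient)
rbf_relevance = feature_relevance(svs, coeffs, rbf_kernel_gradient)
```

For the linear kernel the result is simply the normal vector w multiplied by the number of support vectors, which is a convenient sanity check.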

SVM-based Feature Relevance (3)
Panels: c) Factor Xa vs. COBRA; e) Factor Xa vs. Thrombin

• SVM-based feature selection is more robust than KS statistics
• target-specific features can be identified (here: CATS2D)

Figure: example ligand structures.

Calculation of 3PP Importance (1)


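$R_i = f(\mathbf{x}(F_i = 1)) - f(\mathbf{x}(F_i = 0))$

First term: bit set; second term: bit not set.

x: molecular fingerprint with features F

Figure, panels a) to d): feature importance values mapped onto a molecule (e.g. R_i = 3, R_j = 2; where both features overlap: 2 + 3 = 5).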
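The bit-flip definition of R_i can be written down almost literally. The sketch below assumes a generic scoring function f that accepts a list of 0/1 bits; the linear scorer and its weights are hypothetical stand-ins for a trained SVM.

```python
def bit_importance(score, fingerprint, i):
    """R_i = f(x with bit i set) - f(x with bit i not set)."""
    with_bit = list(fingerprint)
    with_bit[i] = 1
    without_bit = list(fingerprint)
    without_bit[i] = 0
    return score(with_bit) - score(without_bit)

# Hypothetical linear scoring function standing in for the trained model.
weights = [0.5, -1.0, 2.0]

def linear_score(bits):
    return sum(w * b for w, b in zip(weights, bits))

fingerprint = [1, 0, 1]
importances = [bit_importance(linear_score, fingerprint, i) for i in range(3)]
```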

Calculation of 3PP Importance (2)

Figure: predicted thrombin inhibitor (structures 19 and 20) compared with Argatroban (Ki = 5 nM; PDB: 1dwc) and NAPAP (Ki = 7 nM; PDB: 1dwd). Labels: S1 and S2/3 pockets, Asp189, Gly216, Gly219.

Calculation of 3PP Importance (3)

Figure: a) COX-2 inhibitor SC558 (PDB: 1cx2) with the labels sulfonyl group, planar hydrophobic, H-bond acceptor, Ring A and Ring B; b) Molecule 5 and SC-558 near His90.

Virtual Screening for D3 Receptor Ligands (1)

Figure: BP897 (D3 partial agonist) with regions A, B, C (labels: Arom., Aliph., Lipo. Binding).

Workflow:
• BP897 + analogues with known D2, D3 binding
• MOE 3PP fingerprints
• SVM training + predictions (SVM-based similarity search; regression of log(Ki D3 / Ki D2))
• 11 compounds tested
• best: D3 Ki < 2 µM, D2 Ki < 2 µM

Ki values shown for the tested compounds (column header: D3, D2):
3 / 12: 1408 nM, 1414 nM
4 / 13: 40 nM, 554 nM
5 / 14: 139 nM, 417 nM
6 / 15: 96 nM, 201 nM
7 / 16: 914 nM, 4395 nM

Byvatov et al. (2005) Chembiochem 6:997

Virtual Screening for D3 Receptor Ligands (2)

• hD3 Homology model (rhodopsin template; 28% sequence identity)


• ligand docking (MOE)

Figure, panels A and B: ligand docked into the hD3 homology model; residues Asp110, Ser192, Phe345 and Phe346 are labeled.

Naumann & Matter (2002) J. Med. Chem 45:2366
© Dr. Hans Matter

GRID Potentials & Block-Scaling

GRID

BUW

BUW coefficients are obtained by equalizing the sum of squares of each block of variables
Equal weight of each block (no prior autoscaling!)

Consensus PCA (CPCA)

• Data in blocks: different information
• Which information blocks contribute to the model?

Block analysis (CPCA)
Eigenvalues (“explained variance”) are calculated per block

Superblock analysis (corresponds to “classical” PCA)

ATP binding pocket in kinases

Specific attributes of ATP binding pockets in kinase families

Figure labels: phosphate-binding area, ATP-purine pocket

Correlation-Vector-Representation
of molecules

Correlation Vector Representation (CVR) of Molecules

$CV_d = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \cdot (q_i \cdot q_j)_d$

i, j: coordinate points (e.g., atoms)
A: number of points
q: property value
δ: Kronecker delta
d: distance between two points

P. Broto, G. Moreau, C. Vandyke (1984)
Eur. J. Med. Chem. 19:66.

• autocorrelation function ($CV = \int_a^b f(x) \cdot f(x+l)\,dx$)
• spatial or topological distance
• rotation- and translation invariant („alignment-free“)
• descriptor vector of defined length (similarity calculation)
• easily implemented
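To illustrate the last bullet, here is a minimal sketch of a correlation vector. It assumes an integer distance matrix (for example topological distances in bonds) and one property value per point, and it follows the double sum literally, so the d = 0 entry collects the self-terms and every pair is counted twice. The three-atom chain and its property values are hypothetical.

```python
def correlation_vector(dist, q, max_d):
    """CV_d = sum over all i, j at distance d of q_i * q_j, for d = 0..max_d."""
    cv = [0.0] * (max_d + 1)
    n = len(q)
    for i in range(n):
        for j in range(n):
            d = dist[i][j]
            if d <= max_d:          # delta_ij selects the pairs at distance d
                cv[d] += q[i] * q[j]
    return cv

# Hypothetical chain of three atoms with topological distances and property values.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
q = [1.0, -0.5, 2.0]
cv = correlation_vector(dist, q, max_d=2)
```

The vector length depends only on max_d, not on the size or orientation of the molecule, which is what makes the descriptor alignment-free.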

modlab® Correlation Vector Representations

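Figure: example molecule.

• CORINA, PETRA (Gasteiger et al.): $CV_d = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \cdot (q_i \cdot q_j)_d$
• LUDI atom types: $CV_d^T = \frac{1}{A} \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij,d}^{T}$
• PATTY atom types: $CV_d^T = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij,d}^{T} \cdot w_i \cdot w_j$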

The CATS2D Descriptor

Figure: example molecule (with an NH2 group).

Lipophilic: {C(C)(C)(C)(C), Cl}
Positive: {[+], NH2}
Negative: {[-], COOH, SOOH, POOH}
H-bond Donor: {OH, NH, NH2}
H-bond Acc.: {O, N[!H]}

“Retrospective screening”

Library
Known active molecules

“Query”

• Pairwise score calculation between the “query” and “molecule X”
• Ranks are assigned to all structures

ef = (number of actives found) / (number of actives expected)
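The enrichment factor can be computed from a ranked list of activity flags. This sketch assumes that "expected" means the number of actives a random selection of the same size would contain; the function name and the example ranking are hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction):
    """ef for the top `fraction` of a ranked list of booleans (True = active)."""
    n_total = len(ranked_is_active)
    n_subset = max(1, round(fraction * n_total))
    found = sum(ranked_is_active[:n_subset])
    # Expected number of actives in a random subset of the same size.
    expected = sum(ranked_is_active) * n_subset / n_total
    if expected == 0:
        raise ValueError("the list contains no actives")
    return found / expected

# Hypothetical ranked list of 20 compounds with actives at ranks 1, 2, 6 and 13.
ranking = [i in (0, 1, 5, 12) for i in range(20)]
ef_top_quarter = enrichment_factor(ranking, 0.25)
ef_whole_list = enrichment_factor(ranking, 1.0)
```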

There Is No Best Method

Figure panels: COX, MMP, HIVP

• COBRA 3.2
• Manhattan distance

Similarity Searching → Complementary Results

Figure: Venn diagrams of the hits retrieved by CATS2D, CATS3D and Charge3D for each target. Size of the common intersection:
• COX-2: ∩ = 6
• MMP: ∩ = 1
• HIV-Protease: ∩ = 0

U. Fechner, S. Renner, L. Franke, P. Schneider, G. Schneider (2003)

Do not forget the receptor!

• COX-2 (1CX2): buried, narrow
• MMP3 (1D5J): shallow, solvent-exposed
• HIV-Prot (1HSG): buried “tunnel”

Red: crystal structure of L-735,524
Green: CORINA model

“Retrospective screening”: examples

Figure: % found vs. % of library screened.

adapted from Stahl, Rarey, Klebe (2001)

“Retrospective screening”: ranking

adapted from Stahl, Rarey, Klebe (2001)

“Retrospective screening”: method comparison

adapted from Stahl, Rarey, Klebe (2001)

“Retrospective screening”: method comparison

Figure: top ranks in the Daylight Fingerprints and FeatureTrees lists for an H1 receptor antagonist (query).

adapted from Stahl, Rarey, Klebe (2001)

Fusion of CATS2D Ranked Lists

Nuclear receptor ligands subset of the COBRA database, v2.1 (N = 211)

A) Manhattan distance
B) Euclidean distance
C) Tanimoto coefficient
D) Soergel distance
E) Dice coefficient
F) Cosine coefficient
G) Spherical distance

Figure: cumulative percentage of actives found (axis 0 to 30) for the fused lists A, A∪B, A∪B∪C, A∪B∪C∪D, A∪B∪C∪D∪E, A∪B∪C∪D∪E∪F and A∪B∪C∪D∪E∪F∪G. Black bars show the percentage of actives that were retrieved by the respective similarity metric and no more than one additional similarity metric.

$ef = \left( \frac{S_{act}}{S_{all}} \right) \Big/ \left( \frac{P_{act}}{P_{all}} \right)$, with S: subset, P: pool (COBRA)

„Fuzzification“ of the CATS2D Descriptor

Figure: counts (axis 0 to 4) vs. distance / bonds (0 to 5) for the original and the „fuzzy“ descriptor.

Counts = f ⋅ Counts_{bin+1}
Counts = f ⋅ Counts_{bin−1}

• no significant overall improvement of enrichment of actives in a focused library
• can be helpful for individual searches (scaffold hopping)

CATS2D: A Ranked List

Query: Haloperidol (D2 antagonist)
Ki (D1) = 270 nM, Ki (D3) = 21 nM, Ki (D4.2) = 11 nM, Ki (5-HT2A) = 25 nM, Ki (α1) = 19 nM, Ki (H1) = 730 nM

Rank  Annotation
1     D2 ligand
2     D2 ligand
3     GABA transporter type I (ion channel)
4     PPAR-γ agonist
5     D2 ligand
6     5-HT2C antagonist
7     H3 antagonist
8     TNF-α inhibitor
9     D2 ligand
10    Eliprodil (ion channel)

The previous figure shows the ten highest ranking compounds which were retrieved from the COBRA
database by a topological pharmacophore similarity search (CATS method). The query structure was
Haloperidol, a dopamine (D2) receptor antagonist. Not surprisingly, classic variations of the query structure
are found on ranks 1 and 2. These are not very interesting from the library design or scaffold-hopping point of
view. On ranks 5 and 9 two additional D2-receptor ligands are found, one of which surprisingly is an agonist
(rank 5), and the well-known Melperone structure on rank 9 which represents a substructure of Haloperidol.
Retrieval of the rank 5 molecule could already be regarded as a “scaffold-hop”, as the compound has a
different structure than the query. This molecule is a “D2-ligand” and may be regarded as isofunctional on this
description level of “bioactivity”, but it does not necessarily exhibit the same kind of functional activity (the
compound on rank 5 is an agonist, not an antagonist like the query molecule). Looking at the first molecules
ranked between known D2-ligands, we find an annotated ion channel blocker (GABA transporter type I,
GAT1) on rank 3, and an anti-inflammatory PPAR-γ agonist (Pioglitazone) on rank 4. Based on the similarity
ranking, it would now be worthwhile testing these molecules in a dopamine receptor binding assay. Indeed, a
co-inhibition of dopamine transporter and GAT1 has been reported for Orphanin FQ, an endogenous
antagonist of the dopamine transporter; and Pioglitazone has been found to prevent dopaminergic cell loss.
These are first indications that the similarity search might have produced useful results. Still, only the
biochemical test can validate the results. An argument against the compound on rank 4 might be the lack of a
basic amine function. Looking at the structures on lower ranks, we find a serotonin receptor 5-HT2C
antagonist (rank 6), a histamine receptor H3 antagonist (rank 7), a TNF-alpha inhibitor (rank 8), and another
ion channel blocker (Eliprodil, rank 10). For an assessment of this finding, it is important to learn about other
activities of the query structure: Ki (D1) = 270 nM, Ki (D3) = 21 nM, Ki (D4.2) = 11 nM, Ki (5-HT2A) = 25 nM,
Ki (α1) = 19 nM, Ki (H1) = 730 nM. This means that Haloperidol exhibits binding activity against a whole
family of targets, and is not specific for the D2 receptor. Therefore, retrieving an H3 ligand on rank 7 can be
considered a success if we keep in mind that the query has significant binding potential at the H1 receptor.
This brief example of a pharmacophore-based similarity search demonstrates that one has to be very
careful when analyzing a ranked list, and a seeming contradiction to what was expected as an outcome
of the experiment might be resolved by considering multiple activity of the query structure.

How Far Should We Look?

Figure: number of hits vs. distance to query. Where should the threshold be placed?

U. Fechner, G. Schneider (2004)

CATS: Quest for Novel Ca2+ Channel Blockers

Figure: Mibefradil (Posicor®), a T-type Ca2+ channel blocker, and structures connected to it by CATS searches. IC50 values (RTTC / FLIPR) of the numbered structures:

1: IC50 = 1.2 µM
2: IC50 = 3 µM
3: IC50 = 2.4 µM
4: IC50 = 3.5 µM
5: IC50 = 0.8 µM
6: IC50 = 3.3 µM

G. Schneider, T. Giller, W. Neidhart, G. Schmid (1999)

“Fuzzy” Pharmacophores
Workflow: Align → Assign atom types → Determine Local Feature Densities (LFD) → Cluster PPPs → Assign Distances → Calculate Fingerprint

Figure: LFD values of individual features (LFD = 1, 1.46, 1.76, 1.85, 1.85, 1.88, 1.89, 2.09, 2.14, 2.16, 2.17, 2.18).

dist = 2.94 Å

σ = 0.5: $W_c = \min\left(\tfrac{1}{2}, 1\right) = 0.5$

σ = 1.3: $W_c = \min\left(\tfrac{1}{2}, \tfrac{5}{11}\right) + \min\left(\tfrac{1}{2}, \tfrac{6}{11}\right) = 0.96$

Fuzzy Pharmacophores:
Quest for Novel COX-2 Inhibitors

1%

5%

10%

Fuzzy Pharmacophores Outperform
Single-Query Searching

Fuzzy Pharmacophores:
Quest for Novel Thrombin Inhibitors

Figure: fuzzy pharmacophore models of thrombin inhibitors (feature labels H1, H2, H3, D1, A1, B) in panels a) to f), ordered by “fuzziness”.

Fuzzy Pharmacophores Outperform
Single-Query Searching

Fuzzy Pharmacophore Model of TAR Binders


Binding of Tat protein to TAR RNA is essential for HIV replication

Known Tat-TAR interaction inhibitors: 1, 2, 3 (Lind et al., Chem Biol, 2002; Hamy et al., Biochemistry 1998)

Flexible alignment of 2 and 3 to the active conformation of 1

NMR structure 1LVJ with bound inhibitor 1

Fuzzy Pharmacophore Model of TAR Binders

Fuzzy pharmacophore model of TAR-RNA ligands
→ Virtual screening of the SPECS catalogue
→ 10 compounds cherry-picked
→ 4 hits, 1 novel TAR-RNA ligand

In cooperation with SFB 579


S. Renner, O. Boden, V. Ludwig, M. Göbel, G. Schneider (2004)

New Allosteric Modulators of Metabotropic Glutamate Receptor 5 (mGluR5) (1)

Queries

Flexible alignment of queries 3 - 9 (MOE): conformations for CATS3D

Renner et al. (2004) Chembiochem 6:620

New Allosteric Modulators of Metabotropic Glutamate Receptor 5 (mGluR5) (2)

Ranked list

Molecule  Most similar        Ki mGluR5  Ki mGluR1  Selectivity
no.       reference molecule  (µM)       (µM)       (Ki mGluR1 / Ki mGluR5)
10        6                   12         17         1.4
11        8                   14         45         3.2
12        3                   24         > 100      > 4.2
13        6                   33         61         1.9
14        7                   35         > 100      > 2.9
15        9                   38         > 100      > 2.6
16        8                   39         > 100      3.2
17        5                   41         64         1.6
18        9                   63         > 100      > 1.6
19        5                   > 100      14         < 0.14

3D Conformations
Heuristic Conformer Generation

Heuristic 3D Conformer Generation

• CORINA: one conformer
• ROTATE: multiple conformers
• Generation of isomeric structures

A meso molecule is one that is superimposable on its mirror image (achiral) but has stereogenic centers.
The most common kind of meso compound is a molecule with two stereogenic centers and a plane of symmetry.
Example: (2R,3S)-2,3-dibromobutane

• Identification of “geometrically strained” configurations

• Elimination of clashes (vdW contacts), e.g. a clash in n-heptane

• Elimination of duplicate conformations (e.g. meso compounds)

STEREOCHEMISTRY
http://orgchem.colorado.edu/courses/3361manualF04/MMstereofullLM61F04.pdf
http://www.chem.umd.edu/courses/jarvis/chem233spr04/Chapter04Notes.pdf

Principles of ROTATE

Assignment of torsion angles
Observed torsion angles (CSD)

Schwab (2003)

n-decane: 144 conformers

Receptor-bound vs. best ROTATE conformer

$RMS_{XYZ} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ (X_i - X_i')^2 + (Y_i - Y_i')^2 + (Z_i - Z_i')^2 \right]}$

Calculation of the RMS_XYZ value
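A direct implementation of the RMS_XYZ value is short. The sketch assumes that both conformers list the same atoms in the same order and that they have already been superimposed; no fitting is performed here, and the coordinates are made up.

```python
import math

def rms_xyz(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

# Hypothetical two-atom example: one atom is displaced by 1 Å along z.
bound = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
rms_example = rms_xyz(bound, model)
```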
Torsion angle histogram (CSD) and
derived potential function
h(τ)

[SLN (SYBYL Line Notation)]

E(τ) = A • ln h(τ)

(approximation method
according to Murray-Rust)
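The conversion of a histogram into a potential can be sketched as follows. The value and sign of A are not given here, so the sketch assumes a negative constant (frequently observed angles then receive low energies) and assigns an infinite energy to empty bins; the bin counts are hypothetical.

```python
import math

def torsion_potential(histogram, a=-1.0):
    """E(tau) = A * ln h(tau) per bin; empty bins are treated as forbidden."""
    return [a * math.log(h) if h > 0 else math.inf for h in histogram]

# Hypothetical counts for three torsion angle bins.
counts = [10, 1, 0]
energies = torsion_potential(counts)
```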

Discretization of torsion angles
Recognition of rotatable bonds

• The bond must be a single bond
• The bond must not be part of a ring system
• The bond must not be terminal
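The three rules translate into a small graph routine. This is a sketch under stated assumptions: the molecule is given as a heavy-atom bond list with integer bond orders, a bond counts as a ring bond if its two atoms stay connected when the bond itself is ignored, and special cases such as amides are not handled. The example molecules are illustrative.

```python
from collections import defaultdict

def rotatable_bonds(bonds):
    """Return the (i, j) bonds that are single, non-ring and non-terminal.

    bonds: list of (atom_i, atom_j, bond_order) over heavy atoms only.
    """
    neighbors = defaultdict(set)
    for i, j, _ in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def in_ring(i, j):
        # Ring bond: j is still reachable from i without using the bond i-j.
        stack, seen = [i], {i}
        while stack:
            a = stack.pop()
            for b in neighbors[a]:
                if a == i and b == j:
                    continue  # skip the bond under test
                if b == j:
                    return True
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False

    result = []
    for i, j, order in bonds:
        is_terminal = len(neighbors[i]) == 1 or len(neighbors[j]) == 1
        if order == 1 and not is_terminal and not in_ring(i, j):
            result.append((i, j))
    return result

butane = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
ethylcyclopropane = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (0, 3, 1), (3, 4, 1)]
butene = [(0, 1, 1), (1, 2, 2), (2, 3, 1)]
```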

Special case 1: carboxamide

in CSD

Special case 2: keto-enol tautomers

in CSD

Recognition of ring systems

• graph-theoretical approach (e.g. by flood fill)

All nodes (atoms) on the shortest paths
between the nodes that are connected by a
ring-closure bond are marked as
ring atoms.

Distribution of intramolecular atom-pair distances
in CORINA-generated conformations of druglike compounds
(COBRA v2.1)

Figure: histogram of N vs. maximal pair-wise distance (D_AB) / Å; average D_AB = 15 Å

Molecular Diversity
Subset Sampling

Molecular Diversity

• Diversity metrics: distance-based, cell-based, variance-based
• Diversity spaces: 2D, 3D, physicochemical
• Diversity sampling: reagent-based, product-based

Adapted from: Agrafiotis et al. (2000) in “Virtual Screening for Bioactive Molecules”
Böhm, Schneider eds., Wiley-VCH.

Distance-based diversity metrics: D1

1. Define distance function


2. Compute library diversity

Often used:
Minimum pair-wise distance of compounds i, j in a collection C:

$D_1(C) = \min_{i<j} d_{ij}$

Problem: D1 depends on a single inter-molecular distance.

Figure: two sets that are equally diverse according to D1.

Distance-based diversity metrics : D4

Average nearest-neighbor distance of compounds i, j in a collection C:

$D_4(C) = \frac{1}{N} \sum_{i} \min_{j \neq i} d_{ij}$

Less sensitive to outliers!
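Both distance-based metrics, D1 (minimum pair-wise distance) and D4 (average nearest-neighbor distance), can be computed by brute force. The sketch assumes compounds are points in a descriptor space with Euclidean distance; the four example points are hypothetical.

```python
import math

def d1(points):
    """Minimum pair-wise distance in the collection."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )

def d4(points):
    """Average distance of each compound to its nearest neighbor."""
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(points)

collection = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
d1_value = d1(collection)
d4_value = d4(collection)
```

Both functions need all pair-wise distances, so the cost grows quadratically with the number of compounds.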

Collection of 100 “most diverse” compounds from a library of 10,000 points
(subset maximizes D4)

Distance-based diversity metrics : D4

Problem: D4 does not consider inter-cluster distances

Figure: two sets that are equally diverse according to D4.

• such a situation is rarely found in real-life problems

Space partitioning by a k-dimensional tree

• Quadratic dependence of D1 and D4 on the number of compounds in C


Virtually useless for large libraries and high-dimensional spaces!
Workaround: nearest-neighbor searching in a
k-dimensional tree (for dimensions < 10)

Def.: A multidimensional search tree for n points in k-dimensional space.


“Find the set of points that fall into a given rectangle in a plane“ in O(sqrt(n)+k) time

Figure: points in the x-y plane split alternately into left/right and up/down regions, with the corresponding tree. Point coordinates are the discriminators. This is one possible tree!

Demo: http://www.rolemaker.dk/nonRoleMaker/uni/algogem/kdtree.htm
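A compact k-d tree with nearest-neighbor search illustrates the workaround. This is a minimal sketch, assuming points are tuples of equal length, the tree is built by median splits with cycling axes, and nodes are stored as plain dictionaries; it is not tuned for performance, and the example points are arbitrary.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return (distance, point) of the tree point closest to target."""
    if node is None:
        return best
    d = math.dist(node["point"], target)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
hit_distance, hit_point = nearest(tree, (9, 2))
```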

Distance-based diversity metrics : D7

Entropy measure (“information content”) of library diversity

$D_7(C) = S_{max} - S$

$S = -\sum_{i=1}^{N} \sum_{j=1}^{N} p_{ij} \ln p_{ij}$

p_ij: probability of finding the i-th individual in the j-th species
(from substructure similarity table)

• subset maximizes D7
• Unbalanced subset selection

Agrafiotis et al. (2000)

Critically depends on the definition of “information”.

Cell-based diversity metrics: D8
• Gridding of chemical space
• Absolute positions of compounds (in contrast to diversity-based metrics)

$D_8(C) = \sum_{i=1}^{M} \delta_i$, where δ_i = 1 if cell i is occupied, else δ_i = 0.

Measure of absolute diversity!

Figure: two sets that are equally diverse according to D8.

• does not consider clustering of data


• poor discrimination of collections with similar span but different distributions

Cell-based diversity metrics: D9-12


Cell-based fraction: $D_9(C) = \frac{N_C}{N_R}$

Cell-based χ²: $D_{10}(C) = \sum_i (N_i - N^*)^2$

Cell-based entropy: $D_{11}(C) = -\sum_i N_i \log N_i$

Cell-based density: $D_{12}(C) = -\sum_i N_i \log \frac{N_i}{M_i}$

N_C: number of cells occupied by C
N_R: number of cells occupied by the reference set
N_i: number of compounds in the i-th cell of the subset
N*: average number of compounds per cell expected for the subset
M_i: number of compounds in the i-th cell of the reference set
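The cell-based metrics reduce to bookkeeping over grid-cell occupancy counts. The sketch below assumes descriptor values scaled to [0, 1), an equal number of bins per dimension, a χ² sum that runs over all cells including empty ones, and N* taken as the subset size divided by the number of cells. The function names and the data are illustrative.

```python
from collections import Counter
import math

def cell_counts(points, bins):
    """Map each point in [0, 1)^k to a grid cell and count the occupancy."""
    return Counter(tuple(min(int(x * bins), bins - 1) for x in p) for p in points)

def d8(counts):
    """Number of occupied cells."""
    return len(counts)

def d9(subset_counts, reference_counts):
    """Fraction of the reference cells that the subset also occupies."""
    return len(subset_counts) / len(reference_counts)

def d10(counts, n_cells):
    """Sum of squared deviations from the average occupancy N*."""
    n_star = sum(counts.values()) / n_cells
    occupied = sum((n - n_star) ** 2 for n in counts.values())
    empty = (n_cells - len(counts)) * n_star ** 2
    return occupied + empty

def d11(counts):
    """Cell-based entropy term over the occupied cells."""
    return -sum(n * math.log(n) for n in counts.values())

reference = [(0.1, 0.1), (0.2, 0.1), (0.6, 0.6), (0.9, 0.1)]
subset = [(0.1, 0.1), (0.6, 0.6)]
ref_counts = cell_counts(reference, bins=2)
sub_counts = cell_counts(subset, bins=2)
```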

Subset Selection & Sampling

Number of different subsets $= \binom{n}{k} = \frac{n!}{(n-k)!\,k!}$

n: number of compounds in C
k: number of compounds in the subset

• Product-based selection is more effective in terms of diversity
(generates more diverse subsets than reagent-based selection)

• Reagent-based selection can cope with very large libraries

• Differences also result from descriptor type

Maxmin Sampling

Figure: POOL and SELECTION.

1. Take the first compound from POOL and put it into SELECTION,
2. find the compound in POOL which is most dissimilar from the compounds in SELECTION and put it into SELECTION: max (min (d (C_POOL, C_SELECTION))),
3. repeat step 2 until the desired number of compounds is in SELECTION.
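The three steps above can be sketched as follows. The sketch assumes compounds are coordinate tuples compared by Euclidean distance and keeps, for every pool compound, the distance to its nearest selected compound so that each round needs only one pass over the pool. The pool data is hypothetical.

```python
import math

def maxmin_selection(pool, n_select, distance=math.dist):
    """Pick n_select compounds by repeatedly taking the max-min candidate."""
    pool = list(pool)
    selection = [pool.pop(0)]                      # step 1
    # Distance of every pool compound to its nearest selected compound.
    nearest = [distance(p, selection[0]) for p in pool]
    while pool and len(selection) < n_select:      # step 3
        idx = max(range(len(pool)), key=nearest.__getitem__)   # step 2
        chosen = pool.pop(idx)
        nearest.pop(idx)
        selection.append(chosen)
        nearest = [min(d, distance(p, chosen)) for d, p in zip(nearest, pool)]
    return selection

pool = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (0.5, 0.0)]
picked = maxmin_selection(pool, 3)
```

The result depends on the first compound in the pool, so shuffling the pool beforehand gives different but comparably spread subsets.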

http://gecco.org.chemie.uni-frankfurt.de/maxminselection/index.html

Figure: scatter-plot panels (labels b, d) on unit-square axes (0 to 1).

Maxmin Sampling: Java vs. C implementations

Figure: calculation time [s] (axis 0 to 3 × 10^2) vs. number of compounds in pool (axis 0 to 5 × 10^4) for the C and Java implementations; inset: combinatorial scaffold with substituents R1 to R3.

Kolmogorov-Smirnov Statistics

Dissimilarity between two property distributions
(“model-free” approach to comparing distributions)

$K^* = \max_{-\infty < x < \infty} \left| P(x) - P^*(x) \right|$

Maximum value of the absolute difference between two distribution functions.
P(x): actual distribution; P*(x): target (reference) distribution

K* is in [0,1]
⇒ Similarity index: K = 1 − K*
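For two samples of a property, K* can be computed from the empirical cumulative distribution functions. This is a minimal sketch with made-up property values; the function names are illustrative.

```python
import bisect

def ks_statistic(sample, reference):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    k_star = 0.0
    for x in a + b:  # the maximum is attained at one of the observed values
        p = bisect.bisect_right(a, x) / len(a)
        p_ref = bisect.bisect_right(b, x) / len(b)
        k_star = max(k_star, abs(p - p_ref))
    return k_star

def ks_similarity(sample, reference):
    """Similarity index K = 1 - K*."""
    return 1.0 - ks_statistic(sample, reference)

k_overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
k_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```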

Kolmogorov-Smirnov for Sampling

Source library (combinatorial, de novo, corporate) → subset (flip members), compared with the reference set

Goal: maximize K

e.g. by Metropolis sampling (∆: change in K caused by the flip):

if ∆ ≥ 0 then accept
else if exp(−|∆|/T) > random[0,1] then accept;
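A sketch of subset sampling with a Metropolis acceptance rule is given below. It assumes that ∆ is the change in the score to be maximized, so improvements are always accepted and a negative ∆ is accepted with probability exp(∆/T). The score function, the temperature and the library are hypothetical; in the setting above the score would be the similarity index K against the reference set.

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random):
    """Accept improvements always, deteriorations with probability exp(delta / T)."""
    if delta >= 0:
        return True
    return math.exp(delta / temperature) > rng.random()

def sample_subset(library, k, score, temperature=0.05, steps=2000, seed=0):
    """Search for a k-member subset with a high score by flipping members."""
    rng = random.Random(seed)
    indices = list(range(len(library)))
    rng.shuffle(indices)
    subset, rest = indices[:k], indices[k:]
    current = score([library[i] for i in subset])
    best, best_score = list(subset), current
    for _ in range(steps):
        a, b = rng.randrange(k), rng.randrange(len(rest))
        subset[a], rest[b] = rest[b], subset[a]        # flip one member
        new = score([library[i] for i in subset])
        if metropolis_accept(new - current, temperature, rng):
            current = new
            if new > best_score:
                best, best_score = list(subset), new
        else:
            subset[a], rest[b] = rest[b], subset[a]    # undo the flip
    return [library[i] for i in best], best_score

# Toy objective: the subset mean should be close to a target value.
library = [float(v) for v in range(11)]
target_mean = 5.0
chosen, chosen_score = sample_subset(
    library, 3, lambda s: -abs(sum(s) / len(s) - target_mean)
)
```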

Kolmogorov-Smirnov for Sampling

Agrafiotis et al. (2000)

An Approach to
Product-Based Diversity Sampling

Inputs: reference compounds; source library (combinatorial, de novo, corporate)

Choose molecular descriptors
→ Estimate probability distribution
→ Sample library members
→ Synthesise / Order / Test
→ Actives / Inactives

Byvatov & Schneider (2003)

De novo design
of druglike molecules

How druglike chemical space might be structured

M.C. Escher's “Development II“


© 2004 The M.C. Escher Company - Baarn - Holland.
All rights reserved

Scaffold analysis with MEQI

Xu, Johnson (2001) JCICS 41:181


Xu, Johnson (2002) JCICS 42:912
Application to scaffold hopping from “natural ligands”: Jenkins et al. (2004) JMC 47:61444

www.pannanugget.com

Molecular graph → Cyclic skeleton → Reduced cyclic skeleton (reduced second-degree vertices)

• “light” version for up to 5,000 molecules: www.pannanugget.com

Hierarchical relationships of structural feature classes

Ligand Classes & Structural Feature Classes

Jenkins et al. (2004) JMC 47:61444

De novo design concepts

Requirements and their implementations:

• Structure sampling method: Grow, Link, Lattice, Stochastic
• Structure assessment method: primary constraints (receptor, ligand), secondary constraints
• Search method & stop criterion: depth-first search (DFS), breadth-first search (BFS), random search, evolutionary algorithm, Monte Carlo / Metropolis, exhaustive enumeration

Column groups: Building Blocks (Buildin, Fragments) | Primary target constraints (Receptor, Ligand) | Combinatorial Search Strategy (DFS, BFS, Random, MC, EA) | Structure Sampling (Grow, Link, Lattice, MD, Stochastic)

Name [refs]  Publication  Marked columns
HSITE/2D Skeletons [10,29,85]  1989  X X X  (Fitting and clipping of planar skeletons)
3D Skeletons [30]  1990  X X X X
Diamond Lattice [31]  1990  X X X X
BUILDER v1 [26]  1992  X X X X X
LEGEND [18]  1991  X X X X
LUDI [11,12,86-88]  1992  X X X X X
NEWLEAD [28]  1993  X X X X X
SPLICE [58]  1993  X X X X
GenStar [32]  1993  X X X X
GroupBuild [16]  1993  X X X X
CONCEPTS [37]  1993  X X X X
SPROUT [15,55-57]  1993  X X X X X X
MCSS & HOOK [23,25]  1994  X X X X
GrowMol [19]  1994  X X X X X
MCDNLG [59]  1995  X X X X
Chemical Genesis [20]  1995  X X X X X
DLD [24,89]  1995  X X X X
PRO_LIGAND [13,42,90-93]  1995  X X X X X X
SMoG [39,40,94]  1996  X X X
BUILDER v2 [27]  1995  X X X X
CONCERTS [33]  1996  X X X X
RASSE [21]  1996  X X X X
PRO_SELECT [14,38]  1997  X X X X
SkelGen [61,62]  1997  X X X X X
Nachbar [43,95]  1998  X X X X
Globus [47]  1999  X X X X
DycoBlock [34,35]  1999  X X X X
LEA [45]  2000  X X X X
LigBuilder [22]  2000  X X X X X
TOPAS [46]  2000  X X X X
F-DycoBlock [36]  2001  X X X X
ADAPT [65]  2001  X X X X
Pellegrini & Field [44]  2003  X X X X X
SYNOPSIS [53]  2003  X X X X
CoG [48]  2004  X X X X X
BREED [60]  2004  X X X  (Exhaustive recombination)

Link/Grow Strategy
Figure (FKBP-12; residues Asp37, Phe46 and Ile56 labeled; figure label Ki = 16 µM):
Define binding pocket → Determine interaction sites, then either
• Link: place fragments → link
• Grow: place first fragment → grow

Babine et al. (1995) Bioorg. Med. Chem. Lett. 5:1719

Lattice Strategy

Fill pocket with lattice points → Find and connect interaction points → Assign molecular framework → Build molecule

Molecule Assembly Strategies

A
1. Generate a molecular skeleton based on molecular graph
2. Assign real 3D substructure elements (e.g. SPROUT)

B
1. Link 2D molecular building blocks (SMILES, mol)
2. Calculate 3D conformation (e.g. TOPAS)

C
• Directly link 3D molecular fragments (e.g. LUDI)

Tree model of search space exploration by an automated structure generation method

• Grow strategy
• Depth-first search
• Structure-based

Figure: search tree from the initial state (binding pocket) via Level 1, Level 2, ... to the end state (designed molecule); some branches are marked with x.

Search space exploration by an evolutionary algorithm

• Mutation / Selection
• Depth-first search
• Ligand-based (Reference: Gleevec®)

Figure: series of designed structures from the initial state to the end state, with Tanimoto index (similarity to the template structure) 0.47, 0.57, 0.66, 0.70, 0.81, 0.92, 1.0.

Generation of favorable ligand-binding positions

• CAVEAT (Lauri & Bartlett, 1994)

• GRID (Goodford, 1985)

• LUDI (Böhm, 1992)

• MCSS (Miranker & Karplus, 1991) / CHARMM (Brooks, 1983)

CAVEAT (Lauri & Bartlett, JCAMD 1994, 8:51)

• Designing mimics of known ligands


• Designing linking units to constrain acyclic molecules
• de novo design of active site ligands

Design of a glucopyranose receptor


Yang et al. Angew. Chem. Int. Ed. 2001, 40:1714

LUDI (Böhm)

• Find interaction centers
HD: blue
HA: red
Lipo: green

• Place fragments

• Link fragments

Böhm et al. (1996)

De novo design with LUDI: trypsin inhibitors

Böhm et al. (1996)

CHARMM

DHFR site points

• acceptors
• donors
• ring centroids
• neutrals

HIV-Protease site points

• acceptors
• donors
• ring centroids
• neutrals

Minima for fragments


• clustered site points
final site points

HIV-Protease

• benzene minima

HIV-Protease

• benzene minima
and
• other ring minima

Recent de novo design examples

• New antifungal agent (structure 1)
  • Candida / Mycobacterium lanosterol 14α-demethylase (CYP51)
  • MCSS fragment identification
  • LUDI fragment linking
  • no heme coordination (no CYP-P450 interaction)
  Ji et al. (2003) JMC 46:474

• New HIV-1 protease inhibitor (Ki = 42 nM) (structure 2)
  • BREED „preferred fragment“ approach
  • First step: 4 reference molecules recombined
  • Second step: hybrid fusing with 100 reference structures
  Pierce et al. (2004) JMC 47:2768

• New HIV-1 reverse transcriptase inhibitor (IC50 = 4.4 µM) (structure 3)
  • SYNOPSIS structure-based approach
  • First step: 28 designs with predicted low IC50
  • Second step: expert inspection & selection, 18 synthesized
  • 10/18 with IC50 < 100 µM
  Vinkers et al. (2003) JMC 46:2765

Recent combinatorial de novo design examples

• New Cdk-4 inhibitor (IC50 < 1 µM) (structure 4)
  • LEGEND with homology model
  • First step: candidate designs
  • Second step: combinatorial optimization of preferred scaffolds (MW < 350 Da)
  • Third step: LUDI & LeapFrog for selectivity optimization (side-chain optimization)
  Honma et al. (2001) JMC 44:4628

• New CB-1 ligand (IC50 = 0.3 µM) (structure 5)
  • TOPAS ligand-based approach
  • First step: designs assembled from GPCR-fragments
  • Second step: expert inspection & scaffold selection
  • 6-10% hit rate with IC50 < 10 µM
  Rogers-Evans et al. (2004) QCS 23:426

TOPAS II: The Implementation

Inputs: 11 reactions, 3,788 fragments
Constraints: #Atoms (!H): 12-60; #O+#N ≤ 12

Start → Generate λ diverse molecules
→ Determine quality (Daylight Fingerprints & Tanimoto Coefficient)
→ Select best solution
→ Stop? Yes: End. No: generate λ molecules (mutation) and determine quality again.
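The control flow of such a loop can be sketched with a toy representation. This sketch is not the TOPAS implementation: molecules are replaced by sets of "on" fingerprint bits, mutation flips one random bit instead of exchanging a fragment through a reaction, and the parent is kept in the selection step. Only the generate / score / select / stop cycle and the Tanimoto score follow the scheme above; all names and parameters are hypothetical.

```python
import random

def tanimoto(a, b):
    """Tanimoto coefficient of two sets of 'on' bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def mutate(parent, n_bits, rng):
    """Toy mutation: flip one randomly chosen bit."""
    return frozenset(set(parent) ^ {rng.randrange(n_bits)})

def evolve(template, n_bits, lam=20, max_generations=200, seed=0):
    rng = random.Random(seed)
    # Start: lambda diverse (here: random) individuals.
    population = [frozenset(rng.sample(range(n_bits), 3)) for _ in range(lam)]
    best = max(population, key=lambda m: tanimoto(m, template))
    for _ in range(max_generations):
        if tanimoto(best, template) == 1.0:   # stop criterion
            break
        offspring = [mutate(best, n_bits, rng) for _ in range(lam)]
        # Select the best solution; keeping the parent prevents any loss of fitness.
        best = max(offspring + [best], key=lambda m: tanimoto(m, template))
    return best, tanimoto(best, template)

template = frozenset({0, 3, 5, 8, 13})
designed, similarity = evolve(template, n_bits=16)
```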

Daylight Toolkit Functions

dt_fp_allocfp allocate a new fingerprint


dt_fp_euclid compute the Euclidean distance between two fingerprints
dt_fp_tanimoto compute the Tanimoto coefficient of two fingerprints

dt_smirkin interpret a string as a generic reaction


dt_utransform apply a reaction transform to an object
dt_umatch match a pattern against an object

dt_smartin interpret a SMARTS string


dt_cansmiles retrieve the canonical SMILES string of an object
dt_copy make a copy of an object
dt_dealloc remove an object from the system
dt_smilin interpret a SMILES string
dt_weight return the atomic weight of an atom
dt_getrole get the role an object plays in a reaction
dt_stream allocate a stream object
dt_next retrieve the next object in a compound object
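To make the two fingerprint comparisons in this list concrete, here is a minimal Python sketch that computes them on binary fingerprints stored as integers. It does not call the Daylight Toolkit; the functions `tanimoto` and `euclid` and the example bit patterns are assumptions for illustration, and the toolkit's own functions may normalize the Euclidean distance differently.

```python
import math

def tanimoto(fp_a: int, fp_b: int) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 1.0

def euclid(fp_a: int, fp_b: int) -> float:
    """Unnormalized Euclidean distance: square root of the number of differing bits."""
    return math.sqrt(bin(fp_a ^ fp_b).count("1"))

a = 0b10110100  # assumed 8-bit example fingerprints
b = 0b10010110
print(tanimoto(a, b))  # 3 common bits / 5 bits set in either = 0.6
print(euclid(a, b))    # sqrt(2), since two bits differ
```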

SMIRKS / Reaction SMILES for Virtual Synthesis

Aromatic-C + Aromatic-C

([c;R1:1][10*]).([10*][c;R1:2])>>[c;R1:1]-[c;R1:2]

• [10*]: reaction type & site index
• c: aromatic carbon
• R1: member of exactly one ring
• :1, :2: atom mapping index

Two fragments carrying a [10*] site are joined: [10*] + [10*]
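The annotated parts of the reaction string can be pulled out with two regular expressions. This is a hedged sketch for reading the notation only: it is not a SMIRKS parser, it does not apply the transform (that needs a cheminformatics toolkit), and the helper names `site_labels` and `map_indices` are hypothetical.

```python
import re

SMIRKS = "([c;R1:1][10*]).([10*][c;R1:2])>>[c;R1:1]-[c;R1:2]"

def site_labels(pattern: str) -> list:
    """Reaction type / site labels on wildcard atoms, e.g. [10*] -> 10."""
    return [int(n) for n in re.findall(r"\[(\d+)\*\]", pattern)]

def map_indices(pattern: str) -> list:
    """Atom mapping indices, e.g. [c;R1:1] -> 1."""
    return [int(n) for n in re.findall(r":(\d+)\]", pattern)]

reactants, products = SMIRKS.split(">>")

print(site_labels(reactants))  # [10, 10]: both fragments carry a type-10 site
print(site_labels(products))   # []: the dummy atoms are gone in the product
print(map_indices(reactants), map_indices(products))  # [1, 2] [1, 2]
```

The mapped atoms 1 and 2 appear on both sides of `>>`, which is what lets the transform keep the two aromatic carbons and replace the two dummy atoms by a single bond.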

The TOPAS II “Flux-Generator”
Parent Structure → Fragmented Parent Structure → Fragmented Child Structure → Child Structure

Step 1: Randomly select a reaction and retro-synthesize (example: Amide)
Step 2: Randomly pick a fragment and substitute by a fragment of the same type
Step 3: Synthesize with reaction chosen in Step 1 (example: Amide)
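The three steps can be mimicked with a toy data structure. In the sketch below a molecule is just a list of cleavable bonds, each with a reaction type and one named fragment per role; the library contents, the role names and the function `flux_mutate` are assumptions for illustration and say nothing about how TOPAS II stores molecules. The toy also ignores that neighbouring bonds share fragments in a real structure.

```python
import random

# Hypothetical fragment library: reaction type -> fragment role -> fragment names
LIBRARY = {
    "amide": {
        "acid": ["benzoic acid", "nicotinic acid", "acetic acid"],
        "amine": ["aniline", "piperidine", "benzylamine"],
    },
    "ester": {
        "acid": ["benzoic acid", "acetic acid"],
        "alcohol": ["ethanol", "phenol"],
    },
}

def flux_mutate(parent, library, rng):
    """Return a child that differs from the parent in exactly one fragment."""
    child = [(reaction, dict(fragments)) for reaction, fragments in parent]
    # Step 1: randomly select a reaction (one cleavable bond) and split there
    index = rng.randrange(len(child))
    reaction, fragments = child[index]
    # Step 2: randomly pick a fragment and substitute one of the same type
    role = rng.choice(sorted(fragments))
    alternatives = [f for f in library[reaction][role] if f != fragments[role]]
    fragments[role] = rng.choice(alternatives)
    # Step 3: the bond keeps its reaction type, i.e. it is re-formed
    # with the reaction chosen in Step 1
    return child

parent = [
    ("amide", {"acid": "benzoic acid", "amine": "aniline"}),
    ("ester", {"acid": "acetic acid", "alcohol": "ethanol"}),
]
child = flux_mutate(parent, LIBRARY, random.Random(42))
print(child)
```

Restricting the substitute to the same reaction type and role is what keeps every child formally synthesizable with the reaction selected in Step 1.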

Design Examples - Napsagatran


Reference: Napsagatran
Design: T = 0.9 (T: Tanimoto coefficient to the reference)

• ‘recapped’ COBRA
• Daylight Fingerprints
• Tanimoto Coefficient
• non-adaptive EA

Design Examples - Gleevec

Reference: Gleevec
Designs: T = 0.88, T = 0.87

• ‘recapped’ COBRA
• Daylight Fingerprints
• Tanimoto Coefficient
• non-adaptive ES

Design Examples – Dopamine D3 Ligand

Reference: BP 897 (D3: 0.92 nM, D2: 61 nM)
Pilla et al. (1999) Nature 400:371-373
Designs: T = 0.88, T = 0.80, T = 0.78, T = 0.75

• ‘recapped’ COBRA
• Daylight Fingerprints
• Tanimoto Coefficient
• non-adaptive ES

Design Examples – Dopamine D3 Ligand
Reference: BP 897
Design: D3: 396 nM, D2: 117 nM
Structure labels: 4-bond spacer, 3-bond spacer

D3 Homology Model (Byvatov, Sasse, Stark, Schneider (2004))
Labelled residues: Asp 110, Ser 192, Phe 345, Phe 346

Hackling et al. (2003) J. Med. Chem. 46:3883-3899

Design of novel CB-1 ligands


CB-1 Seed (Ki = 110 nM) (Khanolkar et al. 2000)

Building blocks: GPCR-BB, TOPAS DB

De novo designs (1):
R = H; clogP = 6.90
R = Br; clogP = 7.79
R = MeSO2; clogP = 5.47

Focused libraries:
2: 6% Hit-Rate
3: 10% Hit-Rate

Combinatorial Design of Novel Kv1.5 Blockers

Reference (ICAGEN): IC50 < 1 µM

Methods: Evolutionary de novo Design, Pharmacophore Matching, Virtual CombiChem

Resulting compounds: IC50 ~ 7 µM, IC50 ~ 1 µM, IC50 < 1 µM

Design of a druglike hKv1.5 channel blocker

WDI (~ 46,000) → RECAP → Building Blocks (24,563) → TOPAS (with reference molecule) → 3b → combinatorial optimization

Synthesis from 1-fluoro-2-nitrobenzene and o-anisidine:
• aromatic nucleophilic substitution
• reduction (Pd-catalyzed hydrogenation)
• condensation

Thrombin inhibitors:
automatic vs. manual design
Automatic design (compound 6, Ki = 10 nM)
Böhm et al. (1999) JCAMD 13:51
PDB: 1DWB, 3.16 Å
1. Placement of fragments
2. Combinatorial optimization
   • Preferred reaction (reductive amination)
   • "needle" approach
Labelled binding-site residues: H57, Y60A, W60D, N98, L99, D189, A190, S195, W215, G216, G219

Manual design (compound 7, Ki = 6 nM)
Obst et al. (1997) Chem. Biol. 4:287
Olsen et al. (2003) Angew. Chem. Int. Ed. 42:2507
PDB: 1OYT, 1.67 Å
1. Placement of central scaffold
2. Modelling
3. Fluorine scan
Labelled binding-site residues: Y60A, W60D, N98, D189, A190, W215, G216, G219

Conclusions: Current status of de novo design

• Pipelines are fuelled by HTS due to lack of early SAR

• de novo design is complementary to HTS:
  • generates new chemical entities
  • exploits existing knowledge
  • yields higher hit-rates

BUT...
• neglects receptor flexibility
  Zhu et al. (2001) JCAMD 15:979 (F-DycoBlock)
  Anderson & Wright (2005) Curr. Comp. Aid. Drug Des. 1:103
• needs to include secondary constraints better
• lack (?) of validated & available software tools
