Lecture: Molecular Design 2005

Molecular Design
Course slides

Summer semester 2005
Gisbert Schneider
gisbert.schneider@modlab.de
What it is about: the “Aspirin story” as an example
Salicin (from willow bark, anti-inflammatory; an O-β-D-glucopyranoside)
Salicylic acid
Acetylsalicylic acid (Aspirin®, prodrug)
10.10.1897: first synthesis by Felix Hoffmann (Bayer)
• inhibits COX-1 and COX-2 (discovered in 1991)
• acetylation of Ser530 (Ala mutant)
• problem of selectivity
+ inhibition of platelet aggregation
Arachidonic acid → PGG2 (via COX) → prostacyclin, thromboxane A2
Prostacyclin: dilation of blood vessels
Thromboxane A2: aggregation of platelets (only in platelets)
[Figure: chemical structures of salicin, salicylic acid, acetylsalicylic acid, arachidonic acid and PGG2]
Structure-Based Drug Design
Cocrystal structure salicylic acid + COX-1 (PDB: 1pth)
Cocrystal structure SC558 + COX-2 (PDB: 1cx2)
Hydrophilic pocket; Ser530 (labelled in both structures)
“Mickey Mouse” scaffold: a privileged structural motif
Adaptive Optimization
[Figure: design cycle linking bio/chemical knowledge, inference machine, hypothesis, molecular structures, synthesis, test system and data/facts]
• Bioactive Molecule(s)
• (Q)SAR
Similarity-Based Molecular Design
Assumption: growing distance = growing dissimilarity
[Figure: “Seed” compound in a descriptor space with axes x1 and x2]
Mutation - Sampling
Example peptides at increasing distance from the seed: ELVISISKING, DIVISISKLNG, DAYLSLSKLDS, …, DAYANDNIGHT
[Figure: mutation probability P(Y→j) with width σ over the distance d_i,j (scale 0 to 1); residues ordered Y FLWIHM QV PCKTNE RSAD G]
Entropy of Peptide Libraries
Cumulative Shannon-Entropy:
$H = -\sum_{p} \sum_{k} p_{p,k} \log_2 p_{p,k}$
[Figure, panels (a) and (b): number of active peptides and mean activity <activity> (x 10^-5 units) plotted against the entropy of the peptide library [bits] and against <distance to seed peptide>]
Schneider & So 2001, Adaptive Systems in Drug Design, Landes, Austin.
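The entropy formula above can be tried on a toy library. The following is a minimal sketch, assuming that p indexes sequence positions, k indexes residue types, and p_{p,k} is the relative frequency of residue k at position p in an aligned library of equal-length peptides; the function name and the example library are illustrative only.

```python
from collections import Counter
from math import log2


def cumulative_entropy(peptides):
    """Sum of the per-position Shannon entropies (in bits) of a peptide library."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    total = 0.0
    for pos in range(length):
        # Relative frequency of each residue type at this position.
        counts = Counter(p[pos] for p in peptides)
        total -= sum((c / n) * log2(c / n) for c in counts.values())
    return total


# Illustrative library built from the peptides shown on the mutation slide.
library = ["ELVISISKING", "DIVISISKLNG", "DAYLSLSKLDS", "DAYANDNIGHT"]
print(round(cumulative_entropy(library), 3))
```

A library of identical peptides has zero entropy; with 20 residue types, each position can contribute at most log2(20) ≈ 4.32 bits.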
Virtual Fitness Landscape
[Figure: fitness surface over the amino-acid properties hydrophobicity and volume; example tripeptide Ala-Trp-Gly; positions A, B, C and S1, S2, S3]
Signal peptidase I substrates
Neural network „fitness“ score
Peptide de novo Design
• Novel eubacterial signal peptidase-I substrates
FFFFGWYGWA*RE
• Artificial antigen (DCM, β1-adrenoceptor auto-antibodies)
ARRCYNDPKC GWFGGADWHA
Wrede et al. 1998, Biochemistry 37:3588

Schneider et al. 1998, PNAS 95:12179
Drugs, Drug-Likeness &

Virtual Screening
Best-selling Drugs 2003-2005
• Lipitor® (Atorvastatin) [Pfizer]: HMG CoA reductase inhibitor. Hypercholesterolemia, hyperlipidemia, atherosclerosis. 11 billion $
• Zocor® (Simvastatin) [Merck]: HMG CoA reductase inhibitor. Hypercholesterolemia, hyperlipidemia, atherosclerosis. 6 billion $
• Advair® (Fluticasone, Salmeterol) [GSK]: corticosteroid agonist + beta 2 adrenoceptor agonist. Asthma
• Zyprexa® (Olanzapine) [Eli Lilly]: 5-HT2 antagonist; D1, D2, D4 antagonist. Alzheimer's disease, psychosis, schizophrenia, bipolar disorder
• Plavix® (Clopidogrel) [BMS]: P2Y12 purinoceptor antagonist. Myocardial infarction, thromboembolism, atherosclerosis, cerebrovascular ischemia
• Paxil® (Paroxetine) [GSK]: 5-HT uptake inhibitor. Anxiety disorder, sleep disorder, obsessive-compulsive disorder, premenstrual syndrome, major depressive disorder
• Norvasc® (Amlodipine) [Pfizer]: calcium channel blocker. Hypertension, angina, cardiac failure
[Figure: chemical structures of the listed drugs]
Therapeutic Target Classes
Bleicher et al. (2003)
Stage-by-stage quality assessment
to reduce costly late-stage attrition Bleicher et al. (2003)
Time & Costs in Drug Development
Assay Methods
• Functional Assays
• Cell-based Assays
• Binding Assays
• Biochemical-endpoint Assays
Note:
– no strict discrimination possible
– overlapping between subdefinitions
Functional Assays
Amplified Luminescent Proximity Homogeneous Assay
• Low background in the absence of a specific receptor-ligand bead interaction
• Amplified signal when receptor and ligand beads are in proximity by specific biological interactions
Functional Assays
GPCR: Coupling
R → Effector Proteins (AC, PLC)
AC: ATP → cAMP + PPi
PLC: PIP2
Functional Assays
Detection of changes in intracellular IP3
donor GST: glutathione S-transferase
Functional Assays
Fusion HT Microplate Reader
Binding Assays
Binding of a ligand to a protein target
Ligand:
– neurotransmitters
– hormones
– growth factors
– cytokines
– toxins
– etc.
Protein target:
– Receptors
– Ion channels
– Enzymes
– Carrier molecules
Binding Assays
Evaluation of the binding (affinity) between endogenous

ligands (e.g. neurotransmitters) or drugs and their
molecular targets (e.g. receptors)
Binding Assays provide a direct approach to the study of

receptors (or more accurately recognition sites) and their
modification by drugs
They do not provide information on the activity of the

ligand for the molecular target
Definition and calculation of IC50-value
The IC50-value represents the concentration of drug that is

required for 50% inhibition of enzyme / receptor activity
$\mathrm{IC}_{50} = \dfrac{x}{\dfrac{100\,\%}{y} - 1}$
x = concentration [µM] of drug in the assay
y = result of assay for the drug [% of Control]
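The single-point IC50 estimate above can be written as a small function. This is a minimal sketch of the slide's formula only; the function name is hypothetical, and the estimate implicitly assumes a simple one-site inhibition curve.

```python
def ic50_single_point(conc_um, percent_of_control):
    """Estimate IC50 [µM] from one concentration x and its assay result y [% of control]."""
    if not 0.0 < percent_of_control < 100.0:
        raise ValueError("result must lie strictly between 0 and 100 % of control")
    return conc_um / (100.0 / percent_of_control - 1.0)


# At 50 % of control the tested concentration is itself the IC50.
print(ic50_single_point(10.0, 50.0))
# A drug leaving only 20 % of control activity at 10 µM is more potent.
print(ic50_single_point(10.0, 20.0))
```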
Definition and calculation of Ki-value
The Ki-value is defined as the concentration of the competing ligand (here: drug) that will bind to half of the binding sites at equilibrium, in the absence of competitors
$K_i = \dfrac{\mathrm{IC}_{50}}{1 + \dfrac{L}{K_d}}$
L = concentration [nM] of radiotracer
Kd = affinity [nM] of the radiotracer for the receptor
As an equilibrium constant: $K_i = \dfrac{[\mathrm{Ligand}]\,[\mathrm{Protein}]}{[\mathrm{Ligand} \cdot \mathrm{Protein}]}$, unit: mol/l
Cheng Y., Prusoff W.H., Biochem. Pharmacol. 22: 3099-3118, 1973
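The Cheng-Prusoff conversion from IC50 to Ki can be sketched in a few lines. The function and parameter names are illustrative; all three inputs must use the same concentration unit.

```python
def cheng_prusoff_ki(ic50, radiotracer_conc, radiotracer_kd):
    """Ki = IC50 / (1 + L / Kd); all values in the same concentration unit."""
    if radiotracer_kd <= 0:
        raise ValueError("Kd must be positive")
    return ic50 / (1.0 + radiotracer_conc / radiotracer_kd)


# With the radiotracer used at its Kd (L = Kd), Ki is half the measured IC50.
print(cheng_prusoff_ki(ic50=100.0, radiotracer_conc=2.0, radiotracer_kd=2.0))
```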
HTS Workstation
Screening Methods
Screening plates for HTS
Screening plates for LTS („Intelligent LTS“): 24 well, 96 well
Hit-identification strategies
Structure-based
Ligand-based
Structure-Based Molecular Design
Selection of promising drug-like agents

in the presence of
receptor structure information
• Binding site identification

• Docking (single; combinatorial)
• Scoring
• ….
Ligand-Based Molecular Design
Selection of promising drug-like agents

in the absence of
receptor structure information
Chemical Space (chemically feasible, virtual molecules): 10^3 - 10^20 (10^100)
Focused Library (subset of compounds): 1 - 10^4
Navigation is defined as ...
“The process of determining and maintaining a course or trajectory to a goal location”.
(Franz & Mallot, Robot. Autonom. Syst. 2000, 30, 133)
What We Need for Exploration
• Coordinate system (“chemical space”)

• Guide through chemical space (“compass”, “map”)
• Target (“goal location”)
• Molecule generator / sampling method (“vessel”)
Each map has a certain resolution & meaning
Similarity Searching / Neighborhood Behavior
“Cherry-Picking“
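Library Design by Similarity
[Figure: compounds in descriptor space (x1, x2, x3) projected by PCA onto PC1/PC2; k-nearest neighbors within distance d of a virtual optimum]
[Figure: “Spikes” in descriptor space (x1, x2, x3) and their PCA projection onto PC1/PC2]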
Library Shaping by Similarity
ANN, SVM, PLS etc.

„Rule-of-Five“
[Figure: distribution of the drug-likeness score (% compounds, score 0.1 to 1) before and after library shaping, and distribution of Rule-of-Five violations (% compounds; 0, 1, 2)]
Lipinski et al. (1997)
Poor absorption or permeation is more likely when:
1. There are more than 5 H-bond donors (expressed as the sum of OHs and NHs);
2. The MW is over 500;
3. The LogP is over 5;
4. There are more than 10 H-bond acceptors (expressed as the sum of Ns and Os).
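The four rules can be turned into a simple violation counter. This is a minimal sketch that takes already computed properties as input; the function name is hypothetical, and no particular descriptor software is implied.

```python
def rule_of_five_violations(h_donors, mol_weight, logp, h_acceptors):
    """Count how many of the four Lipinski criteria a compound violates."""
    return sum([
        h_donors > 5,       # sum of OHs and NHs
        mol_weight > 500,   # molecular weight
        logp > 5,           # octanol/water partition coefficient
        h_acceptors > 10,   # sum of Ns and Os
    ])


# Hypothetical property values for two compounds.
print(rule_of_five_violations(h_donors=2, mol_weight=300.0, logp=2.1, h_acceptors=5))
print(rule_of_five_violations(h_donors=6, mol_weight=620.0, logp=6.3, h_acceptors=12))
```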
Properties of Known Drugs
[Figure: histograms (% of compounds) of molecular weight (MW, 100 to > 800) and clogP (-6 to 12) for known drugs]
Drug-likeness may be defined as a complex balance of various molecular properties and structural features which determines whether a particular molecule is a drug or a non-drug.
• hydrophobicity
• electronic distribution
• hydrogen bonding characteristics
• molecule size and flexibility
• pharmacophoric features
These properties affect the behavior of a molecule in a living organism:
• transport, affinity, reactivity, toxicity, metabolic stability etc. (ADME/Tox)
• Pharmacokinetics („what the organism does to the drug“)
• Pharmacodynamics („what the drug does to the organism“)
More Rules of Thumb for „Drug-Likeness“
• Rotatable Bonds < 6
• Solubility logS
Fragment-based prediction
logS values below -4 indicate possible solubility problems
• Polar Surface Area PSA
PSA values above 120 Å² indicate possible absorption problems
• Partition Coefficient Water/Octanol AlogP
AlogP values above 5 indicate possible bioavailability problems
http://www.molinspiration.com/cgi-bin/properties
Properties of Ligand Families
Receptor class           MW                        clogP
(number of cmpds)        Avg.  Min.  Max.  σ       Avg.  Min.   Max.  σ
GPCR (N = 1,467)         406   121   993   134     3.6   -11.1  13.7  2.6
Protease (N = 1,015)     495   136   945   114     2.9   -8.2   10.3  2.4
Kinase (N = 387)         395   74    717   104     3.1   -5.6   8.7   2.2
Enzyme (N = 839)         364   68    849   119     2.6   -5.2   10.9  2.5
Hormone (N = 227)        336   142   949   115     4.0   -5.2   10.1  2.9
Ion channel (N = 412)    375   208   969   106     3.1   -11.1  10.8  2.7
Properties of Ligand Families
[Figure: mean clogP (<clogP>, 1.5 to 5.5) versus mean molecular weight (<MW>, 320 to 500) for the ligand families hormone receptor, GPCR, kinase, protease, other enzymes, ion channel and other]
Fragment-based Design
• combinatorial optimization principle

• manageable size of search space
• might result in chemically feasible molecular designs
• “side-chains” and “scaffolds” are interchangeable
[Figure: example structures with substituents R1 and R2]
Traditional “combinatorial thinking”: scaffolds & side-chains
Fragment-based design: only “building-blocks” (fragments)
The Concept
Drug DB → pseudo-retrosynthetic fragmentation (e.g., RECAP) → Fragment DB
Fragment DB + Reactions → Assemble (e.g., TOPAS) → Designed Molecules
Assembly is guided by:
• Reference Molecule(s)
• Fitness Functions
RECAP: Lewell et al. (1998) J. Chem. Inf. Comput. Sci. 38:511

TOPAS: Schneider et al. (2000) J. Comp. Aided Mol. Des. 14:487
RECAP – Eleven Bond Cleavage Types
Amide, Ester, Amine, Urea, Ether, Olefin, Quaternary N, Arom. N – aliph. C, Lactam N – aliph. C, Arom. C – arom. C, Sulphonamide
[Figure: structural drawings of the eleven bond types]
Lewell et al., JCICS 1998.
RECAP Applied to the COBRA Database
Reaction               Total Fragments   Unique Fragments
Amide                  4324              1904
Ester                  978               472
Amine                  489               178
Urea                   8                 6
Ether                  128               76
Olefin                 0                 0
Quart. Nitrogen        48                16
Arom. N – Aliph. C     1018              601
Lactam N – Aliph. C    274               170
Arom. C – Arom. C      1194              613
Sulphonamide           646               368
COBRA: Schneider & Schneider (2003) QSAR Comb. Sci. 22:713
Fragment analysis
Molecule → Sidechain + Framework
Framework → Ring System + Linker
[Figure: example molecule, its molecular graph and its scaffold, with ring, linker and sidechain marked; adapted from Bemis & Murcko]
K. Grabowski, Analysis done with SVL (MOE)
Framework extraction
[Figure: most frequent molecular frameworks in IBS, SPECS-NP and COBRA, each labelled with its count and percentage of the collection. Most frequent framework per collection: IBS 1520 (6.09%), SPECS-NP 73 (10.44%), COBRA 276 (5.48%)]
Unique ring extraction from natural products
Natural products (IBS: 24977, Micro: 685, SpecsNP: 699 cpds.) versus COBRA (5033 cpds.)
[Figure: most frequent ring systems with attachment points (*1, *2) and their counts]
Synthetic drugs vs. Natural products: RECAP results
Number of molecules: COBRA 5074, IBS 24617, MICROSOURCE 692, SPECS 700
Each column pair gives the number of reactions and the percentage.

Reaction                              COBRA           IBS             MICROSOURCE    SPECS
Amide                                 2409   47.48    7423   30.15    14    2.02     36    5.14
Aromatic carbon-aromatic carbon       922    18.17    2687   10.92    81    11.71    100   14.29
Aromatic nitrogen-aliphatic carbon    599    11.81    906    3.68     6     0.87     13    1.86
Ester                                 435    8.57     7150   29.04    291   42.05    374   53.43
Amine                                 370    7.29     535    2.17     5     0.72     7     1.00
Sulphonamide                          313    6.17     84     0.34     0     0.00     1     0.14
Lactam nitrogen-aliphatic carbon      176    3.47     763    3.10     1     0.14     3     0.43
Ether                                 80     1.58     133    0.54     1     0.14     2     0.29
Urea                                  65     1.28     436    1.77     1     0.14     0     0.00
Quaternary nitrogen                   12     0.24     38     0.15     0     0.00     2     0.29
Olefin                                0      0.00     3      0.01     0     0.00     0     0.00
“N-Chemistry” “O-Chemistry”
K. Grabowski, U. Fechner, Analysis done with Daylight-Toolkit
„Drug-Likeness“ Score (ANN)
Score y = f(x), ranging from 0 to 1
Xenical™ (Orlistat): Score = 0.54
Viagra™ (Sildenafil): Score = 0.94
[Figure: score distributions (% of compounds versus score, 0 to 1). Nondrugs: Σ = 76% towards score 0, Σ = 24% towards score 1. Drugs: Σ = 24% towards score 0, Σ = 76% towards score 1.]
„X-Likeness“ Scores (ANN)
• Analysis of natural compound properties & scaffolds

(Lee & Schneider 2001, J. Comb. Chem. 3:284)
• “Drug-Likeness” prediction
(Schneider 2000, Neural Networks 13:15)
• Comparison of combinatorial libraries

(Schneider & So 2001, Adaptive Systems in Drug Design, Landes, Austin)
• Virtual screening for CNS-active compounds

(Schneider et al. 2001, Curr. Med. Chem. CNSA 1:99)
• hERG-liability prediction
(Roche et al. 2002, ChemBioChem 3:455)
• CYP P4503A4-liability prediction

(Zuegge et al. 2002, Quant. Struct. Act. Relat. 21, in press)
• “Frequent Hitter” analysis & prediction

(Roche et al. 2002, J. Med. Chem. 45:137)
Evaluation of Virtual Combinatorial Libraries

[Figure: six combinatorial scaffolds with substituents R1 and R2, scored by supervised neural networks (input layer, weights w(1), hidden layer, weights w(2), output)]
Predicted property (columns: the six scaffolds, in slide order)
Drug-likeness             ++   ++   ++   ++   ++   ++
“Cytotoxicity”            +    +    +    ++   ++   –
GPCR-ligand likeness      ++   ++   ++   –    +    +
Kinase-ligand likeness    –    –    –    –    –    +
R-Group Descriptor
R-group descriptor R1 based on 5 atomic properties
Distance: topological number of bonds
For atoms A to F, the descriptor holds the sum of atomic values per distance: a (distance 1), b+c (distance 2), d+f (distance 3), e (distance 4)
Example:
Distance              1        2        3        4
Atomic Weight         12.010   28.010   28.054   15.015
H-Bond Don            0        1        0        0
H-Bond Acc            0        1        0        1
logP                  0.1551   0.0129   -0.4070  -0.7096
Molar Refractivity    0.3513   0.4328   0.5506   0.2173
[Figure: example R-group with atoms numbered by their distance]
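The per-distance summation can be sketched with a breadth-first search over the R-group's bond graph. This is an illustration under stated assumptions: the first R-group atom (bonded to the scaffold) is taken to be at distance 1, the graph and property values below are a hypothetical toy example, and the function name is not from the source.

```python
from collections import deque


def rgroup_descriptor(bonds, values, first_atom, max_distance):
    """Sum one atomic property per topological distance from the attachment point.

    bonds:  dict mapping each atom label to its bonded neighbours
    values: dict mapping each atom label to its property value
    """
    distance = {first_atom: 1}  # first R-group atom counts as distance 1
    queue = deque([first_atom])
    while queue:
        atom = queue.popleft()
        for neighbour in bonds[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    sums = [0.0] * max_distance
    for atom, d in distance.items():
        if d <= max_distance:
            sums[d - 1] += values[atom]
    return sums


# Toy graph matching the slide's pattern a, b+c, d+f, e.
bonds = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "F"],
         "D": ["B", "E"], "F": ["C", "E"], "E": ["D", "F"]}
values = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}
print(rgroup_descriptor(bonds, values, "A", 4))
```

Repeating the call for each of the five atomic properties and concatenating the results gives one descriptor vector per R-group.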
Popular Supervised Classifier Systems
[Figure: a two-class task in the (x1, x2) plane, with possible ANN solution(s) and the SVM solution]
Three-layered Feed-forward Network
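INPUT (x1, x2, …, xn) → weights w → Hidden Layer → weights v → OUTPUT
$f(\mathbf{x}) = \mathrm{act}\left( \sum_{h=1}^{HID} v_h \, \mathrm{act}\left( \sum_{i=1}^{IN} w_{hi} x_i + \vartheta_h \right) + \theta \right)$
$\mathrm{class} = \begin{cases} 1 & \text{if } f(\mathbf{x}) > \text{threshold} \\ 0 & \text{else} \end{cases}$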
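The forward pass of such a three-layered network can be sketched in plain Python. The sigmoid activation, the threshold of 0.5 and all weight values below are assumptions for illustration; the slide only specifies a generic activation function "act".

```python
from math import exp


def sigmoid(z):
    """Logistic activation, used here as an assumed choice for act()."""
    return 1.0 / (1.0 + exp(-z))


def network_output(x, w, hidden_bias, v, output_bias, act=sigmoid):
    """f(x) = act(sum_h v_h * act(sum_i w_hi * x_i + bias_h) + output_bias)."""
    hidden = [act(sum(w_hi * x_i for w_hi, x_i in zip(w_h, x)) + b_h)
              for w_h, b_h in zip(w, hidden_bias)]
    return act(sum(v_h * h for v_h, h in zip(v, hidden)) + output_bias)


def classify(x, w, hidden_bias, v, output_bias, threshold=0.5):
    """Class 1 if the network output exceeds the threshold, else class 0."""
    return 1 if network_output(x, w, hidden_bias, v, output_bias) > threshold else 0


# Hypothetical network: 1 input, 1 hidden neuron.
print(classify([1.0], w=[[10.0]], hidden_bias=[0.0], v=[10.0], output_bias=-5.0))
```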
Support Vector Machine
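$f(\mathbf{x}) = \sum_i \alpha_i K(\mathbf{x}_i^{sv}, \mathbf{x}) + b$
$\mathrm{class} = \begin{cases} 1 & \text{if } f(\mathbf{x}) > 0 \\ 0 & \text{else} \end{cases}$
α_i: Lagrange multipliers (≥ 0)
K: Kernel function, $K(\mathbf{x}, \mathbf{x}') = ((\mathbf{x} \cdot \mathbf{x}')\, s + 1)^5$
x: Input vector
x^sv: Support vectors
b: Constant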
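The decision function with the fifth-order polynomial kernel can be sketched as follows. One assumption is made explicit: in the common SVM formulation each α_i is multiplied by the class label (+1 or -1) of its support vector, so the sketch accepts signed coefficients; the support vectors and coefficients below are hypothetical.

```python
def poly_kernel(x, y, s=1.0, degree=5):
    """K(x, x') = ((x . x') * s + 1) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) * s + 1.0) ** degree


def svm_decision(x, support_vectors, coefficients, b, s=1.0):
    """f(x) = sum_i c_i * K(x_i_sv, x) + b, with c_i = alpha_i times class label (assumed)."""
    return sum(c * poly_kernel(sv, x, s)
               for c, sv in zip(coefficients, support_vectors)) + b


def svm_class(x, support_vectors, coefficients, b, s=1.0):
    """Class 1 if f(x) > 0, else class 0."""
    return 1 if svm_decision(x, support_vectors, coefficients, b, s) > 0 else 0


svs = [[1.0, 0.0], [0.0, 1.0]]
coeffs = [1.0, -1.0]
print(svm_class([1.0, 0.0], svs, coeffs, b=0.0))
print(svm_class([0.0, 1.0], svs, coeffs, b=0.0))
```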
Comparison of ANN & SVM: Drug-Likeness
• Sadowski/Kubinyi data set
• GC + MOE + CATS descriptor
Test data         ANN correct   ANN incorrect
SVM correct       72 %          6 %
SVM incorrect     10 %          11 %
• Solutions complement each other
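A short calculation shows what the agreement table implies for the individual classifiers. It assumes the table is read with SVM outcomes as rows and ANN outcomes as columns; the four percentages sum to 99 %, presumably because of rounding on the slide.

```python
# Shares of test compounds (in %), read from the 2x2 agreement table.
both_correct = 72
svm_only = 6      # SVM correct, ANN incorrect
ann_only = 10     # ANN correct, SVM incorrect
both_wrong = 11

svm_accuracy = both_correct + svm_only
ann_accuracy = both_correct + ann_only
at_least_one = both_correct + svm_only + ann_only

print("SVM correct:", svm_accuracy, "%")
print("ANN correct:", ann_accuracy, "%")
print("at least one classifier correct:", at_least_one, "%")
```

The gap between each single classifier and the "at least one correct" share is the room that a combined (jury) decision could exploit.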
[Figure: ten example molecules (1 to 10), annotated with the ANN and SVM classification results]
The Jury Decision Approach
A. Givehchi, G. Schneider (2003)
Prediction of “Frequent Hitters”
• Molecules showing up as hits in different assays

(unspecific binding, interference with assay)
• Data collection from Roche-HTS
• “Frequent Hitter” selection:

• active in at least eight assays
• requested by at least six projects
• 80% agreement by medicinal chemists
479 “Frequent-Hitters”
423 “Nonfrequent Hitters” from trade drug collection
Roche et al. (2002) J. Med. Chem. 45:137
“Frequent Hitters”: Visualization of Data
120 Ghose&Crippen descriptors

SOM
cc = 0.8
PLS-analysis (cc = 0.8)

PCA
“Frequent Hitters”: ANN Training
cc = 0.83
Database       cpds.     FH/%   DND/%   r2
ACD            183221    35     26      0.08
WDI            55750     22     81      0.03
Trade Drugs    3344      13     76      0.05
Drugs classified as “Frequent Hitters”
Pareto-Ranking
• Multiple objective functions
• Selection of sets of solutions
[Figure: solutions in objective space (Dimension 1 to Dimension N) labelled 1, 2, 3; the „non-dominated“ solutions form the Pareto-Front]
Implementation: Multiobjective Genetic Algorithm (MOGA)
Fonseca & Fleming (1993) In: Genetic Algorithms: Proceedings of the Fifth International Conference, Forrest, S. (Ed.), Morgan Kaufmann: San Mateo, CA, pp. 416-423.
Median-molecules: Brown et al. (2004) JCICS 44:1079
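Selecting the non-dominated solutions can be sketched in a few lines. This is a plain filter, not the MOGA cited above, and it assumes that every objective is to be maximized; the example scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical two-objective scores, e.g. (activity, drug-likeness).
candidates = [(1, 5), (2, 4), (3, 1), (1, 1), (2, 2)]
print(pareto_front(candidates))
```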
Escaping the “Twilight Zone”

© 1999 Cordon Art B.V. - Baarn - Holland. All rights reserved.
M.C. Escher’s “Regular Division of the Plane I”
Library
Diversity
SAR Information
Adaptive Feature Extraction
Weeding out
• Drug-Likeness
• ADMET in silico
• Reactive groups
• Frequent Hitters
Narrowing down
• Trend vectors
• Substructure analysis
• Similarity searching
• Profile / landscape analysis
Focusing in
• Structure-based models
• PPP models
• Informed docking & scoring
• Informed design
[Figure: for each of the three stages, a schematic plot of activity versus x]
Examples of drugs derived from

structure-based approaches from Congreve et al. (2005)
Virtual Screening
Virtual Screening: DNA-Gyrase
Böhm et al. (2000) JMC 43:2664
ACD
Catalyst / LUDI
600 candidates
Retrieval of ~3,000 close analogs
150 hits in 14 chemical classes
7 validated novel actives

Binding to the DNA-gyrase B-subunit ATP site
Structure-based optimization
Fragment-based discovery of a potent p38α MAP-kinase inhibitor

Gill et al. (2005) JMC 48:414
adapted from Congreve et al. (2005)
X-Ray Screening
Met109 Thr106
backbone
amide
Fragment
IC50 = 1.3 mM
Lead
structure overlay IC50 = 65 nM
Ligand binding modes: EGFR kinase ATP site.
inactive-like conformation active-like conformation
PDB: 1xkk PDB: 1m17
adapted from Congreve et al. (2005)
Self-Organizing Map (SOM)

„Medicinal Chemistry Roadmap“
Mapping Chemical Space
Kohonen map containing (25x25) compound classes
• source: WDI
• CATS 2D descriptor
• Euclidean distance
• planar topology
[Figure: example structures from the corner neurons (1/1), (25/1), (1/25) and (25/25)]
A „Target Road Map“ (SOM)
CATS 2D descriptors
[Figure: SOM distributions for GPCR, Protease, Kinase, Enzyme, Hormone and Ion channel ligands]
COBRA 4.6: distribution and mqe
• 6,064 cmpds, CATS
• 10x10, toroidal
[Figure: SOM distributions for GPCR, Protease, Enzyme, Ion channel, Kinase and Nuclear R. ligands]
COBRA 4.6: distribution and mqe
• 10x10, toroidal
• Manhattan distance
COBRA 4.6: distribution and mqe
• 20x20, toroidal
Ligand Space
64 COBRA SOMs, clustered with Ward’s method on the distance 1 – Pearson r
[Figure: dendrogram of the 64 ligand classes. Leaf order: Enzyme_5-LO, Ionchannel_gaba, GPCR_CRF, GPCR_opioid, Enzyme_PDE, Enzyme_Topoisomerase, GPCR_adenosine, GPCR_CCR, GPCR_NK, GPCR_NPY, Hormone_PPAR, Enzyme_aromatase, Ionchannel_calcium, Enzyme_PLA, GPCR_melatonin, Enzyme_COX, Hormone_estrogen, Enzyme_cholinesterase, GPCR_mAChR, GPCR_5HT7, Enzyme_Estrone, GPCR_Histamin, GPCR_adrenergic, protease_fVIIa, GPCR_dopamine, GPCR_5HT1, GPCR_5HT3, GPCR_5HT2, Ionchannel_nAChR, GPCR_5HT4, Ionchannel_potassium, Ionchannel_sodium, Enzyme_NOS, GPCR_5HT6, protease_DPP, GPCR_cannabinoid, Hormone_RAR, Hormone_retinoid, Hormone_RXR, Enzyme_Farnesyltransferase, protease_thrombin, protease_cathepsin_K, protease_proteasome, protease_b-secretase, protease_g-secretase, GPCR_CCK, GPCR_somatostatin, protease_cathepsin_D, GPCR_endothelin, protease_HMGCoa, GPCR_mGlu, GPCR_prostanoid, Hormone_GHRP, protease_ECE, protease_papain, protease_caspase, protease_ace, Enzyme_IMPDH, GPCR_P2Y, Ionchannel_P2X, Enzyme_Polymerase, Enzyme_RT, Ionchannel_AMPA, protease_neuramidase]
Magnified branch: Enzyme_cholinesterase, GPCR_mAChR, GPCR_5HT7, Enzyme_Estrone, GPCR_Histamin, GPCR_adrenergic, protease_fVIIa, GPCR_dopamine, GPCR_5HT1, GPCR_5HT3, GPCR_5HT2, Ionchannel_nAChR, GPCR_5HT4, Ionchannel_potassium, Ionchannel_sodium, Enzyme_NOS, GPCR_5HT6, protease_DPP
“[..] peripheral serotonergic system disturbances may predispose to thromboembolic complications [..]”
Małyszko et al. (2000) Nephron 84:305
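The distance 1 – Pearson r between two maps can be computed as sketched below. It assumes that each SOM is flattened into a vector of per-neuron compound counts before comparison, which is an illustrative choice; the count vectors are hypothetical.

```python
from math import sqrt


def pearson_distance(u, v):
    """Return 1 - r, where r is the Pearson correlation of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / (norm_u * norm_v)


# Hypothetical per-neuron counts of three ligand classes on a 2x2 map.
map_a = [10, 0, 3, 1]
map_b = [20, 1, 5, 2]
map_c = [0, 12, 1, 9]
print(round(pearson_distance(map_a, map_b), 3))
print(round(pearson_distance(map_a, map_c), 3))
```

The distance is 0 for perfectly correlated maps and 2 for perfectly anti-correlated ones; the resulting distance matrix would then be passed to a hierarchical clustering routine.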
[Figure: “Cluster Representatives“ for COBRA 4.6: structures selected from the mqe map and from the map of all compounds]
Identification of „Promiscuous Binders“
• COBRA 3.9
• CATS
• Euclidean distance
[Figure: SOM distributions of 5HT ligands and Dop.Rec. ligands]
Ki            Sertindole   Lepotex (Clozapine)
5-HT2a        0.9 nM       3.3 nM
5-HT2c        1.3 nM       13 nM
D1            210 nM       540 nM
D2            7.4 nM       150 nM
D3            8.2 nM       360 nM
D4.2          21 nM        40 nM
H1            570 nM       2.1 nM
α1            1.8 nM       23 nM
mAChR         -            34 nM
SOM-Training
• Software: molmap®
• Descriptor: speedCATS®
• 20,000 cycles
• Toroidal (10 x 10) map
• Gaussian neighborhood, τinit = 1
[Figure: quantization error; distribution of all compounds; distributions of cpds. with activity = high, medium, low and 0]
Target area: neuron (2/3)
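The training settings listed above (toroidal map, Gaussian neighborhood, fixed number of cycles) can be illustrated with a small generic SOM. This is not the molmap implementation: the random initialization, the linear decay of learning rate and neighborhood width, the initial learning rate of 0.5 and the Euclidean matching are all assumptions made for the sketch.

```python
import math
import random


def toroidal_distance(a, b, size):
    """Grid distance between neurons a and b on a size x size torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return math.hypot(min(dx, size - dx), min(dy, size - dy))


def best_matching_unit(weights, x):
    """Neuron whose weight vector is closest (squared Euclidean) to x."""
    return min(weights,
               key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)))


def train_som(data, size=10, cycles=20000, tau_init=1.0, lr_init=0.5, seed=0):
    """Train a toroidal SOM with a Gaussian neighborhood (assumed linear decay)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(size) for j in range(size)}
    for t in range(cycles):
        frac = 1.0 - t / cycles
        lr = lr_init * frac
        tau = max(tau_init * frac, 0.01)   # neighborhood width, kept above zero
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for neuron, w in weights.items():
            d = toroidal_distance(neuron, bmu, size)
            h = math.exp(-(d * d) / (2.0 * tau * tau))
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])
    return weights


# Tiny demonstration with hypothetical 2D descriptors scaled to [0, 1].
data = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9], [0.9, 0.1]]
som = train_som(data, size=3, cycles=500)
print(best_matching_unit(som, [0.1, 0.1]))
```

Projecting a compound onto the trained map means looking up its best matching unit; counting compounds per neuron gives distribution maps like those in the figure.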
Projection of Virtual Combinatorial Libraries
• R = 60 generic building blocks
Scaffolds (each with substituents R1 and R2): piperazine, spiroindoline, benzodiazepine
Reference compound: IC50 ~ 3 nM (NPY-1)
[Figure: scaffold and reference structures]
Privileged Scaffolds
[Figure: scaffold with substituents R1, R2, R3; Ser-protease inhibitors]
A SOM for Identification of

Novel Selective A2a Inhibitors
Purinergic receptor family (GPCR) A1, A2a, A2b, A3

(endogenous ligand: adenosine)
• Affective disorders, Parkinson’s disease
Given: 153 combinatorial products with known Ki (A2a , A1)

1. CATS topological pharmacophores
2. SOM training
3. Identification of a “seed” structure
4. Variation of the seed and virtual library design
5. Projection of virtual compounds onto the SOM
6. Selection of candidates for synthesis and testing
SOM: Novel A2a Inhibitors
Scaffold structure with substituents R1, R2, R3
Seed structure 1 (neuron 4/2)
Seed structure 2 (neuron 3/2)
[Figure: SOM maps of Ki (A2a), Ki (A1) and selectivity, colored from low to high; empty neurons marked]
Combined with 96 secondary amines (from ACD): 192 virtual combinatorial products
SOM: Novel A2a Inhibitors
                           A2a <Ki> [nM]   A1 <Ki> [nM]   <Selectivity>
“historical” structures    102 (101)       860 (1154)     14 (19)
“designed” structures      50 (93)         974 (1264)     33 (23)
Standard deviations in brackets
[Figure: distribution of virtual combi-products on the SOM; lead structure 1 (neuron 4/2), lead structure 2 (neuron 3/2)]
Designed compound with 121-fold selectivity: Ki (A2A) = 2.4 nM
“Antidepressant-Likeness” - SOM
5000 WDI drugs + 597 Antidepressants
150-dimensional “chemical space” (CATS descriptors)
[Figure: SOM colored by the fraction of antidepressants (0 %, 25 %, 50 %, 75 %; empty neurons marked), with Imipramine, Fluoxetine and NKP-608 located on the map]
Comparison of Binding Pockets
[Figure: SOM separating metalloproteinase active sites from other Zn2+-containing pockets; empty neurons marked]
1. Identify surface pockets
2. Assign surface properties
3. Calculate spatial auto-correlation
4. SOM-Training
Stahl, Taroni & Schneider (2000) Protein Engineering 13:83
Binding-Pocket SOM
Prediction of binding pockets
Training data, test data
Identification of Binding Pockets (1)
1. Compute the protein surface
2. Determine grid points
3. Compute “buriedness”
4. Define the “pocket”
Definition of Binding Pockets
1.) Protein / solvent assignment
2.) Accessibility of “solvent” points
3.) Detection and excision of “cavity” points
4.) Connolly surface of cavity-forming atoms
Stahl, Taroni, Schneider (2000) Protein Engineering 13:83
Identification of Binding Pockets (2)
“Site Finder in MOE”
• based on “alpha spheres”
• uses hydrophobic and hydrophilic interactions
• concept of “dense packing”: binding pocket
http://www.chemcomp.com/Journal_of_CCG/Features/sitefind.htm
Receptor-Ligand
Interactions
Reaction-Energy Diagram
protein + ligand → protein–ligand complex
[Figure: energy coordinate versus reaction coordinate, from P+L over the activated complex (transition state) to the PL complex]
Ea: Activation Energy
Ed: Dissociation Energy
ΔE: Overall change in energy
Raffa (2003)
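Protein-Ligand Interaction
Binding constant
$K_i = \dfrac{[\mathrm{Ligand}]\,[\mathrm{Protein}]}{[\mathrm{Ligand} \cdot \mathrm{Protein}]}$, unit: mol/l
Free enthalpy of binding (“Gibbs free energy of binding”)
$\Delta G = RT \ln K_i$ (with $K_i$ as the dissociation constant defined above, so that tight binding gives a negative ΔG)
$\Delta G = \Delta H - T \Delta S$, unit: J/mol
ΔH: enthalpy, ΔS: entropy
Example: Ki = 10^-9 M = 1 nM ≡ ΔG ≈ -51 kJ/mol at 25 °C (≈ -53 kJ/mol at body temperature, 37 °C)
(1 kcal = 4.184 kJ; K = °C + 273.15)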
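The conversion between a binding constant and the free energy of binding can be checked numerically. The sketch uses ΔG = RT ln Ki with Ki as a dissociation constant in mol/l, the sign convention under which a nanomolar ligand has a negative ΔG, as in the slide's example; the function name is illustrative.

```python
from math import log

GAS_CONSTANT = 8.314462618  # J / (mol K)


def binding_free_energy_kj(ki_molar, temperature_k=298.15):
    """Free energy of binding in kJ/mol from a dissociation constant in mol/l."""
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return GAS_CONSTANT * temperature_k * log(ki_molar) / 1000.0


# Ki = 1 nM at 25 °C and at body temperature (37 °C).
print(round(binding_free_energy_kj(1e-9, 298.15), 1))
print(round(binding_free_energy_kj(1e-9, 310.15), 1))
```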
Thermodynamic Contributions to ΔG
Böhm et al. (1996)
The Receptor-Ligand Binding Process
solvated receptor binding site + solvated ensemble of ligand conformations → solvated receptor-ligand complex
Enthalpic contribution
• breaking and forming of hydrogen bonds
• forming of lipophilic contacts
Entropic contribution
• release of water from hydrophobic surfaces into the medium (mean number of hydrogen bonds constant!)
• loss of mobility/degrees of freedom of receptor and ligand
[Figure: binding site showing a hydrogen bond, hydrophobic contacts and a charge-assisted hydrogen bond]
Böhm et al. (1996)
Protein mobility and ligand binding
A protein is considered to exist in two conformations (P and P*) with an energy difference ∆Gconvert. The ligand (L) can bind
the protein (P) to give a complex (PL), or bind to P* to give a complex (P*L). Although P* has a higher free energy, it might
offer greater scope for interaction with L. For instance, P* might represent a conformer in which the binding site has
opened and exposed hydrophobic patches. This is energetically unfavourable, but offers the potential for favourable
interactions with the hydrophobic moiety of a suitable incoming L, thereby giving rise to a large, favourable interaction
∆Gintrinsic. The resulting complex (P*L) has a lower energy than that of the complex PL. The observed affinity of L for the
protein conformational ensemble is governed by ∆Gobs. Slow binding kinetics might well be observed, as P* is a higher-
energy conformer than P and an energy barrier (∆Gbarrier) must be surmounted before optimal binding to L can take place.
Domain motion of HIV-RT; partial refolding of 3D-structure

http://www.nature.com/nrd/journal/v2/n7/extref/nrd1129-s3.mpg
Allosteric
site
Ligand-induced hinge motion; ligand-induced shear motion
Domain motion in Maltose: Hinge mechanism

Domain motion in P450BM-3: Shear mechanism

Optimal Interaction
• Steric complementarity (“lock-and-key”, adaptivity)
• Complementarity of surface properties
• No repulsive interactions
• The ligand binds in an energetically favored conformation
Biotin-avidin complex (PDB: 1avd)
Example: Biotin-Streptavidin
• Ki = 2.5 · 10^-13 M (ΔG = -76 kJ/mol)
• seven geometrically ideal hydrogen bonds
• all polar groups of the ligand take part in binding
• the lipophilic part “nestles” ideally against the protein
Biotin fits optimally into the binding pocket
Perfect complementarity of the functional groups
Böhm et al. (1996)
Multiple Binding Modes
Soaking experiments with trypsin + guanidinium benzoate
[Figure labels: specificity pocket; catalytic Ser195]
Böhm et al. (1996)
Multiple binding modes
• Overlay of four HIV-RT inhibitor complexes using the protein Cα atoms

• Efavirenz (blue), Nevirapine (yellow), UC-781 (green), Cl-TIBO (red)
• Protein Data Bank codes 1FK9, 1VRT, 1JLG and 1TVR.
• These structurally diverse inhibitors occupy the same volume in the binding site.
Teague (2003)
Binding Modes of Different Inhibitors
Overlay of 5 dipeptidic elastase inhibitors
Böhm et al. (1996)
Force Fields
e.g. MM2, CHARMM, OWFEG, AMBER
Böhm et al. (1996)
Empirical Scoring Functions
Idea: interpret ΔG as a sum of local non-covalent interactions
2D case
e.g. Andrews et al. (1984), mean contributions of functional groups
(in the strongest protein-ligand complexes ~ 1.5 kJ/mol per atom)
3D case
$\Delta G_{binding} \approx \sum_i \Delta G_i \, f_i(r_L, r_P)$
r_L, r_P: coordinates of ligand and protein
ΔG_i: weighting factors for the different interaction types (training data required!)
Knowledge-Based Scoring Functions
Analysis of observed contacts in protein-ligand complexes
e.g. PMF (Muegge & Martin), DrugScore (Gohlke)
Pseudo pair potentials ΔW_ij(r):
$\Delta W_{ij}(r) \propto -\ln \dfrac{g_{ij}(r)}{g_{ref}}$
g_ij(r): observed pair frequency at distance r
[Figure: P(r) and W(r) as functions of r]
Calculation of knowledge-based potentials (1)
(for protein folding simulation)
Define a contact, e.g. “distance of Cα atoms from two residues < threshold”, and count the contacts. This gives the observed frequencies:
     A    C    L
A    1    2    1
C    2    3    1
L    1    1    1
Define a suitable reference state, e.g. the statistically expected pair-wise spatial contacts, calculated from the amino acid (AA) occurrences in the AA sequence:
AA sequence: ACLACLAALL
AA counts: A: 4, C: 2, L: 4
Diagonals: ½ N (N-1); off-diagonals: N*M
     A    C    L
A    6    8    16
C    8    1    8
L    16   8    6
Calculation of knowledge-based potentials (2)
Relative frequency, observed:
     A       C       L
A    0.077   0.154   0.077
C    0.154   0.231   0.077
L    0.077   0.077   0.077
Relative frequency, expected (“background”):
     A       C       L
A    0.078   0.104   0.208
C    0.104   0.013   0.104
L    0.208   0.104   0.078
Inverse Boltzmann Law: $\Delta E_{Pseudo} = -kT \ln \dfrac{f_{ab}}{f_{ref}}$
Resulting pseudo-energies:
     A        C        L
A    0.027    -0.894   2.255
C    -0.894   -6.537   0.680
L    2.255    0.680    0.027
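The worked example can be reproduced step by step. The sketch below recomputes the expected contact counts from the sequence composition, converts both matrices to relative frequencies and applies the inverse Boltzmann law with kT = 1, so results are in units of kT. The slide's energies are larger by a roughly constant factor, presumably its value of kT, so only signs and ratios should be compared.

```python
from collections import Counter
from math import log


def expected_contacts(sequence):
    """Expected pair counts: N(N-1)/2 on the diagonal, N*M off the diagonal."""
    counts = Counter(sequence)
    types = sorted(counts)
    return {(a, b): (counts[a] * (counts[a] - 1) // 2 if a == b
                     else counts[a] * counts[b])
            for a in types for b in types}


def relative(matrix):
    """Divide every entry by the sum of all entries."""
    total = sum(matrix.values())
    return {pair: value / total for pair, value in matrix.items()}


def pseudo_energy(observed, expected, kt=1.0):
    """Inverse Boltzmann law: -kT ln(f_observed / f_reference) for each pair."""
    f_obs, f_ref = relative(observed), relative(expected)
    return {pair: -kt * log(f_obs[pair] / f_ref[pair]) for pair in observed}


observed = {("A", "A"): 1, ("A", "C"): 2, ("A", "L"): 1,
            ("C", "A"): 2, ("C", "C"): 3, ("C", "L"): 1,
            ("L", "A"): 1, ("L", "C"): 1, ("L", "L"): 1}
expected = expected_contacts("ACLACLAALL")
energies = pseudo_energy(observed, expected)
for pair in sorted(energies):
    print(pair, round(energies[pair], 3))
```

Pairs observed more often than expected (here C-C and A-C) receive negative, favorable pseudo-energies; under-represented pairs such as A-L are penalized.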
Calculation of Pseudo-Energy
Count the frequencies of contacts for a protein. The pseudo-energy for the protein is then calculated by summing up the pair-wise empirical potentials for all contacts found.
2F19 (blue) and decoy (green): 2F19: ΔE_pseudo = −123.9 kJ; decoy: ΔE_pseudo = −217.7 kJ
2HFL (blue) and decoy (green): 2HFL: ΔE_pseudo = −169.4 kJ; decoy: ΔE_pseudo = −163.2 kJ
Andrews Estimated Binding Energy

Andrews, P. R., Craik, D. J., Martin, J. L.
Functional group contributions to drug-receptor Interactions.
J. Med. Chem. 1984, 27, 1648-1657.
• Average Binding Energy (ABE)
Crude estimation of the binding affinity of an “average” drug based on its components
• The ABEs of 10 functional groups were derived from regression analysis with 200 potent ligands
$\Delta G_{estimated}\ (\mathrm{kcal/mol}) = -0.7\,\mathrm{DOF} + 0.7\,\mathrm{C_{sp2}} + 0.8\,\mathrm{C_{sp3}} + 11.5\,\mathrm{N^{+}} + 1.2\,\mathrm{N} + 8.2\,\mathrm{CO_2^{-}} + 10\,\mathrm{PO_4^{2-}} + 2.5\,\mathrm{OH} + 3.4\,\mathrm{C{=}O} + 1.1\,\mathrm{(O,S)} + 1.3\,\mathrm{Hal} - 14$
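The Andrews estimate is a weighted count of groups, which the following sketch evaluates directly from the coefficients on the slide. The dictionary keys and the example group counts are illustrative; counting the groups in a real molecule is left to the user.

```python
# Coefficients in kcal/mol, copied from the slide's equation.
ANDREWS_COEFFICIENTS = {
    "DOF": -0.7,      # internal degrees of freedom
    "Csp2": 0.7,
    "Csp3": 0.8,
    "N+": 11.5,
    "N": 1.2,
    "CO2-": 8.2,
    "PO4(2-)": 10.0,
    "OH": 2.5,
    "C=O": 3.4,
    "O,S": 1.1,       # ether and thioether atoms
    "Hal": 1.3,
}
ANDREWS_CONSTANT = -14.0


def andrews_energy(group_counts):
    """Estimated binding energy (kcal/mol) from counts of the Andrews groups."""
    unknown = set(group_counts) - set(ANDREWS_COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown groups: {sorted(unknown)}")
    return ANDREWS_CONSTANT + sum(ANDREWS_COEFFICIENTS[g] * n
                                  for g, n in group_counts.items())


# Hypothetical compound: 6 sp2 carbons, 2 sp3 carbons, 1 N+, 1 OH, 3 rotors.
print(round(andrews_energy({"Csp2": 6, "Csp3": 2, "N+": 1, "OH": 1, "DOF": 3}), 1))
```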
Andrews analysis of virtual combinatorial libraries
[Figure: distribution of ΔG_estimated (kcal/mol, 0 to 50) for nine virtual combinatorial libraries (scaffolds 1 to 9 with substituents R1 to R3) and for COBRA]
Extension of Andrews Analysis
Free energy of ligand binding (K_D as dissociation constant):
$\Delta G = RT \ln K_D$
Binding energy per atom (ligand efficiency):
$\Delta g = \dfrac{\Delta G}{\text{number of non-H atoms}}$
• ΔG change of -1.4 kcal/mol ~ 10-fold change in potency
• maximum affinity per drug atom Δg = -1.5 kcal/mol (“magic” methyl)
Optimize/select compounds with highest Δg, not lowest KD
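Ligand efficiency can be compared across compounds with a short calculation. The sketch assumes ΔG = RT ln KD with KD as a dissociation constant in mol/l, so that better binders have more negative values; the two example compounds and their atom counts are hypothetical.

```python
from math import log

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def delta_g_kcal(kd_molar, temperature_k=298.15):
    """Free energy of binding in kcal/mol from a dissociation constant in mol/l."""
    return GAS_CONSTANT_KCAL * temperature_k * log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Binding energy per non-hydrogen atom (kcal/mol per atom)."""
    if heavy_atoms <= 0:
        raise ValueError("number of non-H atoms must be positive")
    return delta_g_kcal(kd_molar, temperature_k) / heavy_atoms


# Hypothetical comparison: a weak but small fragment versus a potent, larger lead.
fragment = ligand_efficiency(1e-3, heavy_atoms=10)
lead = ligand_efficiency(1e-8, heavy_atoms=35)
print(round(fragment, 2), round(lead, 2))
```

In this made-up comparison the millimolar fragment binds more efficiently per atom than the nanomolar lead, which is the point of ranking by Δg instead of KD.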
Non-Covalent Interaction Types
(between protein and ligand)
• hydrogen bonds, e.g. O–H···N, O–H···O: Dopt = 2.8 – 3.2 Å
• ionic interactions: Dopt = 2.7 – 3.0 Å
• hydrophobic interactions, e.g. CH3 ··· H3C
• cation-π interaction
• metal complexation, e.g. Zn2+ ··· S-
Böhm et al. (1996)
Geometry of Hydrogen Bonds
• directional interaction
• defined geometry
• angle C=O···H: 100° to 180°
• angle N–H···O: > 150°
• distance N–O: 2.8 – 3.2 Å
Böhm et al. (1996)
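The geometric criteria for an N–H···O=C hydrogen bond can be checked from atomic coordinates. This is a minimal sketch using the distance and angle ranges stated above; the coordinates in the example are made up, and real structures would need hydrogen positions to be present or modelled.

```python
from math import acos, degrees, dist


def angle(a, b, c):
    """Angle a-b-c in degrees, measured at vertex b."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (dist(a, b) * dist(c, b))
    return degrees(acos(max(-1.0, min(1.0, cos_angle))))


def is_ideal_hbond(n, h, o, c):
    """Check an N-H...O=C contact against the distance and angle ranges."""
    return (2.8 <= dist(n, o) <= 3.2             # N-O distance in Å
            and angle(n, h, o) > 150.0           # N-H...O angle
            and 100.0 < angle(c, o, h) <= 180.0)  # C=O...H angle


# Made-up coordinates (Å): donor N-H pointing straight at the acceptor O.
n_atom, h_atom = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
o_atom, c_atom = (3.0, 0.0, 0.0), (3.6, 1.1, 0.0)
print(is_ideal_hbond(n_atom, h_atom, o_atom, c_atom))
```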
Example: Methotrexate in DHFR
Hydrogen-bond geometries in protein-ligand complexes are often very similar to those in crystal structures from the Cambridge database
Böhm et al. (1996)
Contribution of Hydrogen Bonds (1)
The binding constant is NOT a direct function of the number of hydrogen bonds formed!
Böhm et al. (1996)
The Hydrogen-Bridge
Desiraju (2002) Acc. Chem. Res. 35:565
Binding constants for thermolysin inhibitors (metalloprotease)
X = O leads to repulsion!
Böhm et al. (1996)
Binding constants for thrombin inhibitors (serine protease) (Eli Lilly)
Böhm et al. (1996)
Extremely Strong Hydrogen Bonds (1)
Example: inhibition of the enzyme cytidine deaminase
= analog of the transition state
Böhm et al. (1996)
Extremely Strong Hydrogen Bonds (2)
Example: inhibition of the enzyme cytidine deaminase
Strong inhibitor:
• inhibitor displaces a water molecule
• optimal fit
Weak inhibitor:
• inhibitor does not displace the water molecule
• slightly shifted molecular scaffold
Böhm et al. (1996)
Pharmacophores &
Pharmacophore Descriptors
Pharmacophore Definition
GLOSSARY OF TERMS USED IN MEDICINAL CHEMISTRY
(IUPAC Recommendations 1998) http://www.chem.qmul.ac.uk/iupac/medchem/
Pharmacophore (pharmacophoric pattern)

A pharmacophore is the ensemble of steric and electronic features that is
necessary to ensure the optimal supramolecular interactions with a specific
biological target structure and to trigger (or to block) its biological response.
(A pharmacophore does not represent a real molecule or a real association
of functional groups, but a purely abstract concept that accounts for the
common molecular interaction capacities of a group of compounds towards
their target structure.)
Pharmacophoric descriptors
Pharmacophoric descriptors are used to define a pharmacophore, including
H-bonding, hydrophobic and electrostatic interaction sites, defined by
atoms, ring centers and virtual points.
Pharmacophoric types of functional groups
• Donor
• Acceptor
• Donor + Acceptor
• Acid (negative ionizable)
• Base (positive ionizable)
• Atoms excluded („non pharmacophoric“)
[Figure: example functional groups for each type]
MOE atom types (PATTY)
Pharmacophore Models:
A Matter of Interpretation?
[Figure: two rotamers of dopamine with assigned pharmacophoric points (A/D, P/D, L); models generated from inspection of other dopamine receptor ligands]
• two rotamers
• different PPP models
Creation of PPP Triplets
[Figure: conformational analysis of a molecule with donor and acceptor groups, followed by PPP assignment to a bit string, e.g. 0 0 1 0 0 0 1 0 0 0]
• many bits (often > 10^4)
• sparse (few bits set)
Hashing & Folding
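Hashing and folding of a long, sparse PPP fingerprint could look as follows. This is a minimal sketch under stated assumptions: a triplet is described by three feature types and three binned distances, MD5 is used only as a convenient deterministic hash, and folding is done by modulo; none of these choices is taken from the lecture.

```python
import hashlib

def hash_triplet(types, distance_bins, n_bits):
    """Map a PPP triplet (feature types + binned distances) to a bit position."""
    key = "|".join(types) + ":" + ",".join(str(d) for d in distance_bins)
    digest = hashlib.md5(key.encode("ascii")).hexdigest()
    return int(digest, 16) % n_bits

def fold_fingerprint(on_bits, folded_length):
    """Fold a sparse fingerprint (set of on-bit indices) to a shorter bit list."""
    folded = [0] * folded_length
    for bit in on_bits:
        folded[bit % folded_length] = 1  # colliding bits are OR-ed together
    return folded

bit = hash_triplet(("D", "A", "D"), (3, 5, 4), n_bits=16384)
short_fp = fold_fingerprint({3, 1027, 2050}, 1024)
```

Folding trades memory for collisions: bits 3 and 1027 end up at the same position in the folded fingerprint.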
Reagent-based PPP Fingerprints (GaP)
[Figure: reagent with PPPs placed in an x, y, z coordinate system]
1. align along reactive bond & place attachment point at origin
2. rotate and record PPP constellations on a grid
• relative orientation of PPs to origin is defined
• applicable to active site constraints
Leach et al. (2000) JCICS 40:1262
Receptor-derived PPP Fingerprints
Protein Pocket → Site Map (e.g. GRID) → PPP Constellations (e.g. 3PPs)
Target A   0 1 1 0 1 0 1 0 1 0
Target B   0 1 1 0 0 0 1 0 1 1
Target C   0 0 1 1 0 0 1 0 0 1
Target D   1 0 1 1 1 0 1 0 0 0
...
Applications:
• Compound screening
• Informative Library Design
Virtual Screening for BACE Inhibitors
Validated hits
Common Pharmacophore (Asp Proteases)
Virtual Screening
Combined Query
Target-Specific Pharmacophore
(Surf2Lead® , Pep2Lead® , PHACIR®)
In situ Design
[Figure: ligand in the S1 pocket of Factor VIIa, near Asp189]
Geometric parameters
of non-covalent interactions
R: distance between the interaction partners [Å]
α: angle relative to the reference axis
ω: rotation angle around the reference axis
LUDI geometry rules: C=O
• Type: Acceptor
• Compl. Type: Donor
• R: 1.9 + 1 Å
• α: 110-180°
• ω: 0-360°
LUDI geometry rules: N-H, O-H
• Type: Donor
• Compl. Type: Acceptor
• R: 1.9 Å
• α: 150-180°
• ω: 0-360°
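Rules of this kind map naturally onto a small data structure. The sketch below is an illustration only: the class and function names are hypothetical, the distance tolerance `r_tol` is an assumption, and the carboxylate distance is simplified to the base value 1.8 Å. It is not the LUDI implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeometryRule:
    site_type: str
    r: float              # preferred distance in Å
    alpha_range: tuple    # (min, max) in degrees
    omega_ranges: tuple   # one or more (min, max) ranges in degrees

NH_DONOR = GeometryRule("Donor", 1.9, (150.0, 180.0), ((0.0, 360.0),))
CARBOXYLATE = GeometryRule("Acceptor", 1.8, (100.0, 140.0),
                           ((-50.0, 50.0), (130.0, 230.0)))

def _in_ranges(angle, ranges):
    """True if the angle (or its 360° periodic image) lies in one of the ranges."""
    return any(lo <= a <= hi
               for lo, hi in ranges
               for a in (angle, angle - 360.0, angle + 360.0))

def satisfies(rule, r, alpha, omega, r_tol=0.2):
    """Check a candidate interaction site against a geometry rule."""
    return (abs(r - rule.r) <= r_tol
            and rule.alpha_range[0] <= alpha <= rule.alpha_range[1]
            and _in_ranges(omega, rule.omega_ranges))

print(satisfies(NH_DONOR, 1.95, 165.0, 90.0))
```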
LUDI geometry rules: COO-
• Type: Acceptor
• R: 1.8 + 1 Å
• α: 100-140°
• ω: -50-50°, 130-230°
LUDI geometry rules: =N- (as in His)
• Type: Acceptor
• R: 1.9 + 1 Å
• α: 150-180°
• ω: 0-360°
LUDI geometry rules: R-O-R
• Type: Acceptor
• R: 1.9 + 1 Å
• α (sp2): 100-140°
• ω (sp2): -60-60°
• α (sp3): 90-130°
• ω (sp3): -70-70°
LUDI geometry rules: Carbon
• Type: Lipophilic
• Compl. Type: Lipophilic
• R: 4 Å
• α, ω: full sphere
LUDI geometry rules: Sulfur
• Type: Lipophilic
• Compl. Type: Lipophilic
• R: 4.8 Å
• α, ω: full sphere
LUDI geometry rules: Aromatic Ring
• Type: Aromaticity Donor
• Compl. Type: Aromaticity Donor and Acceptor
• R: 6 Å
• Circular plane
LUDI geometry rules: Aromatic Ring Hydrogens
• Type: Aromaticity Acceptor
• Compl. Type: Aromaticity Donor
• R: 6 Å
• α: 160-180°
• ω: 0-360°
LUDI geometry rules: Amide Bond
• Type: Aromaticity Donor
• Compl. Type: Aromaticity Donor and Acceptor
• R: 6 Å
• Circular plane
“Virtual Ligands”
Docking-Free Structure-Based Similarity Searching
• Idea: autocorrelation vectors of virtual ligand pharmacophore points
• Formal definition: $CV = \int_a^b f(x) \cdot f(x+l)\, dx$
[Figure: binding pocket → potential interaction sites → “inverse” interaction sites → virtual ligand]
Example: Thrombin Active Site PPPs
Example: Factor Xa “Fingerprints”
[Figure: correlation vector profiles over the pair types DD, DA, DH, AA, AH, HH for the virtual ligand and for a known ligand, COBRA_3743 (Xa/VIIa), YM_60828_3743]
D: Donor, A: Acceptor, H: Hydrophobic
Goodness-of-Hit
$GH = \left( \dfrac{H_a (3A + H_t)}{4 H_t A} \right) \times \left( 1 - \dfrac{H_t - H_a}{D - A} \right)$
Guner & Henry (2000)
D number of compounds in the database
A number of actives in the database
Ht total number of compounds in the hit list
Ha number of actives in the hit list
• often, enrichment and coverage are in competition
• retrospective analysis by GH value
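The GH score is easy to evaluate for a given hit list. A minimal sketch follows; the function name and the example numbers are illustrative.

```python
def goodness_of_hit(d, a, ht, ha):
    """GH score from database size d, actives a, hit-list size ht, actives in hit list ha."""
    yield_and_coverage = ha * (3 * a + ht) / (4 * ht * a)
    false_positive_penalty = 1 - (ht - ha) / (d - a)
    return yield_and_coverage * false_positive_penalty

# Hypothetical screen: 1000 compounds, 50 actives, 100 hits of which 25 are active
gh = goodness_of_hit(d=1000, a=50, ht=100, ha=25)
```

A perfect hit list (all actives and nothing else) gives GH = 1.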
SVM-based Feature Relevance (1)
[Figure: a) maximum margin and optimal hyperplane; b) important vs. unimportant features]
Byvatov & Schneider (2004) JCICS 44:993
SVM function:
$f(\mathbf{x}) = \sum_i a_i \, K(\mathbf{x}_i^{sv}, \mathbf{x}) + b$
Separating hyperplane:
$f(\mathbf{x}) = (\mathbf{w} \cdot \mathbf{x}) + b$, where $\mathbf{w} = \sum_i a_i \mathbf{x}_i^{sv}$ is a normal vector of the separating hyperplane.
Feature change along the normal of the SVM plane:
$R_f(\mathbf{x}) = \dfrac{\partial f(\mathbf{x})}{\partial x_f} = \sum_i a_i \dfrac{\partial K(\mathbf{x}_i^{sv}, \mathbf{x})}{\partial x_f} + b$
$R_f = \sum_j R_f(\mathbf{x}_j^{sv}) = \sum_{i,j} a_i \dfrac{\partial K(\mathbf{x}_i^{sv}, \mathbf{x}_j^{sv})}{\partial x_f} + b$
[Figure c): Factor Xa vs. COBRA]
• SVM-based feature selection is more robust than KS-Statistics
• target-specific features can be identified (here: CATS 2D)
[Figure e): Factor Xa vs. Thrombin, with example structures]
Calculation of 3PP Importance (1)
$R_i = f(\mathbf{x}(F_i = 1)) - f(\mathbf{x}(F_i = 0))$
(first term: bit set; second term: bit not set)
x : molecular fingerprint with features F
[Figure, panels a)–d): example with Ri = 3 and Rj = 2; overlapping contributions add up (2+3=5)]
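The importance of a single fingerprint bit can be computed by evaluating the model twice, once with the bit set and once with it cleared. The following sketch uses a toy linear scoring function as a stand-in for a trained SVM; the weights are invented for illustration.

```python
def bit_importance(f, fingerprint, i):
    """R_i = f(x with F_i = 1) - f(x with F_i = 0) for any scoring function f."""
    x_on = list(fingerprint)
    x_off = list(fingerprint)
    x_on[i] = 1
    x_off[i] = 0
    return f(x_on) - f(x_off)

# Toy stand-in for a trained model (hypothetical weights and bias)
WEIGHTS = [3.0, -1.0, 2.0]
BIAS = 0.5

def toy_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS

importances = [bit_importance(toy_model, [1, 0, 1], i) for i in range(3)]
```

For a linear model the importance of a bit equals its weight; for a non-linear kernel it depends on the rest of the fingerprint.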
predicted thrombin inhibitor
[Figure: predicted inhibitor compared with Argatroban (Ki = 5 nM; PDB: 1dwc) and NAPAP (Ki = 7 nM; PDB: 1dwd) in the thrombin binding site; pockets S1 and S2/3; residues Asp189, Gly216, Gly219; structures labeled 19 and 20]
[Figure a), b): COX-2 inhibitor SC558 (SC-558; PDB: 1cx2) with sulfonyl group, planar hydrophobic rings A and B and H-bond acceptor, compared with molecule 5; residue His90]
Virtual Screening for D3 Receptor Ligands (1)
1. BP897 (D3 partial agonist) + analogues with known D2, D3 binding
2. MOE 3PP fingerprints
3. SVM training + predictions: SVM-based similarity search and regression of log(Ki D3 / Ki D2)
4. 11 compounds tested
best: D3 Ki < 2 µM, D2 Ki < 2 µM
[Figure: BP897 with regions labeled Arom., Aliph., Lipo. (A–C) and D3, D2 binding; structures of tested compounds 3–7 and 12–16 with measured Ki values between 40 nM and 4395 nM]
Byvatov et al. (2005) Chembiochem 6:997
Virtual Screening for D3 Receptor Ligands (2)
• hD3 homology model (rhodopsin template; 28% sequence identity)
• ligand docking (MOE)
[Figure, panels A and B: docked ligands with residues Asp110, Ser192, Phe345, Phe346]
Naumann & Matter (2002) J. Med. Chem 45:2366
© Dr. Hans Matter
GRID Potentials & Block-Scaling
GRID
BUW
BUW coefficients are obtained by equalizing the sum of squares of each block of variables
Equal weight of each block (no prior autoscaling!)
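Block scaling can be illustrated with a few lines of code. This is a minimal sketch, assuming each block is a list of rows and that every block is rescaled to a sum of squares of 1.0 (the target value is an arbitrary choice); the function name is hypothetical.

```python
import math

def block_weights(blocks, target_ss=1.0):
    """Return one scaling weight per block so that all blocks get the same sum of squares."""
    weights = {}
    for name, rows in blocks.items():
        ss = sum(v * v for row in rows for v in row)
        weights[name] = math.sqrt(target_ss / ss) if ss > 0 else 1.0
    return weights

def scale_blocks(blocks, weights):
    """Multiply every value of a block by the weight of that block."""
    return {name: [[v * weights[name] for v in row] for row in rows]
            for name, rows in blocks.items()}

# Two hypothetical GRID probe blocks with very different magnitudes
blocks = {"probe_A": [[3.0, 4.0]], "probe_B": [[1.0, 1.0], [1.0, 1.0]]}
w = block_weights(blocks)
scaled = scale_blocks(blocks, w)
```

Variables inside a block keep their relative scale; only the blocks are balanced against each other.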
Consensus PCA (CPCA)
• data in blocks: different kinds of information
• which information blocks contribute to the model?
Block analysis (CPCA)
Eigenvalues (“explained variance”) are computed per block
Superblock analysis (corresponds to “classical” PCA)
ATP binding pocket in kinases
© Dr. Hans Matter
Specific attributes of ATP binding pockets in kinase families
• Phosphate-binding area
• ATP-purine pocket
Correlation-Vector-Representation
of molecules
Correlation Vector Representation (CVR) of Molecules
$CV_d = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \cdot (q_i \cdot q_j)_d$
i,j: coordinate points (e.g., atoms)
A: number of points
q: property value
δ: Kronecker delta
d: distance between two points
P. Broto, G. Moreau, C. Vandyke (1984) Eur. J. Med. Chem. 19:66.
• autocorrelation function ($CV = \int_a^b f(x) \cdot f(x+l)\, dx$)
• spatial or topological distance
• rotation- and translation-invariant („alignment-free“)
• descriptor vector of defined length (similarity calculation)
• easily implemented
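A topological autocorrelation vector can be computed from a molecular graph with a breadth-first search. The sketch below assumes an adjacency dictionary and one property value per atom, and it sums over all ordered atom pairs (including i = j for d = 0), following the double sum in the formula; names and data are illustrative.

```python
from collections import deque

def topological_distances(adjacency, start):
    """Shortest path lengths (in bonds) from start to all reachable atoms."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def autocorrelation_vector(adjacency, q, max_distance):
    """CV_d = sum of q_i * q_j over all ordered atom pairs at topological distance d."""
    cv = [0.0] * (max_distance + 1)
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if d <= max_distance:
                cv[d] += q[i] * q[j]
    return cv

# Three-atom chain 0-1-2 with hypothetical property values
chain = {0: [1], 1: [0, 2], 2: [1]}
props = {0: 1.0, 1: 2.0, 2: 3.0}
cv = autocorrelation_vector(chain, props, max_distance=2)
```

The resulting vector has a fixed length regardless of molecule size, which is what makes the descriptor alignment-free.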
modlab® Correlation Vector Representations
[Figure: example structure; CORINA, PETRA (Gasteiger et al.)]
$CV_d = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \cdot (q_i \cdot q_j)_d$
$CV_d^T = \dfrac{1}{A} \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij,d}^{T}$   (LUDI atom types)
$CV_d^T = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij,d}^{T} \cdot w_i \cdot w_j$   (PATTY atom types)
The CATS2D Descriptor
Lipophilic: {C(C)(C)(C)(C), Cl}
Positive: {[+], NH2}
Negative: {[-], COOH, SOOH, POOH}
H-bond Donor: {OH, NH, NH2}
H-bond Acc.: {O, N[!H]}
“Retrospective Screening”
[Figure: known active molecules serve as “query” against a library]
• pairwise score calculation between “query” and “molecule X”
• all structures are assigned ranks
$ef = \dfrac{\text{number of actives found}}{\text{number of actives expected}}$
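The enrichment factor of a ranked list can be computed directly from activity labels. A minimal sketch, assuming the list is already sorted by decreasing similarity to the query and that "expected" means the number of actives a random selection of the same size would contain; the function name is illustrative.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top fraction of a ranked list of 0/1 activity labels."""
    n_total = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if total_actives == 0:
        raise ValueError("the library contains no actives")
    n_subset = max(1, round(fraction * n_total))
    found = sum(ranked_is_active[:n_subset])
    expected = total_actives * n_subset / n_total  # random selection
    return found / expected

# Hypothetical ranked list: both actives are at the top
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(labels, 0.2)
```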
There Is No Best Method
[Figure: retrospective screening results for COX, MMP and HIVP ligands]
• COBRA 3.2
Similarity Searching → Complementary Results
COX-2 MMP HIV-Protease
CATS2D CATS2D CATS2D

CATS3D CATS3D CATS3D
Charge3D Charge3D Charge3D
∩=6 ∩=1 ∩=0
U. Fechner, S. Renner, L. Franke, P. Schneider, G. Schneider (2003)
Do not forget the receptor!
COX-2: buried, narrow MMP3: shallow, solvent-exposed HIV-Prot: buried “tunnel”
1CX2 1D5J 1HSG
Red: crystal structure of L-735,524

Green: CORINA model
“Retrospective Screening” - Examples
[Figure: enrichment curves, % found vs. % of library screened]
adapted from Stahl, Rarey, Klebe (2001)
“Retrospective Screening” - Ranking
“Retrospective Screening” - Method Comparison
[Figure: Daylight Fingerprints vs. FeatureTrees; top ranks in both lists for an H1 receptor antagonist (query)]
Fusion of CATS2D Ranked Lists
Nuclear Receptor ligands subset of the COBRA database, v2.1 (N = 211)
A) Manhattan distance
B) Euclidean distance
C) Tanimoto coefficient
D) Soergel distance
E) Dice coefficient
F) Cosine coefficient
G) Spherical distance
[Figure: cumulative percentage of actives found (0–30%) for the fused lists A, A∪B, A∪B∪C, A∪B∪C∪D, A∪B∪C∪D∪E, A∪B∪C∪D∪E∪F, A∪B∪C∪D∪E∪F∪G]
$ef = \left( \dfrac{S_{act}}{S_{all}} \right) \Big/ \left( \dfrac{P_{act}}{P_{all}} \right)$   (S: subset, P: pool (COBRA))
Black bars show the percentage of actives that were retrieved by the respective similarity metric and no more than one additional similarity metric.
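Six of the listed metrics can be written down in a few lines for count vectors such as CATS2D descriptors. This is a sketch using the common continuous-valued definitions, assuming non-negative vectors of equal length with at least one non-zero entry each; the spherical distance is omitted because its exact definition is not given here.

```python
import math

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def tanimoto(x, y):
    xy = _dot(x, y)
    return xy / (_dot(x, x) + _dot(y, y) - xy)

def soergel(x, y):
    return manhattan(x, y) / sum(max(a, b) for a, b in zip(x, y))

def dice(x, y):
    return 2 * _dot(x, y) / (_dot(x, x) + _dot(y, y))

def cosine(x, y):
    return _dot(x, y) / math.sqrt(_dot(x, x) * _dot(y, y))

# Two hypothetical descriptor vectors
u, v = [1, 0, 2], [1, 1, 0]
scores = {f.__name__: f(u, v) for f in (manhattan, euclidean, tanimoto, soergel, dice, cosine)}
```

Distances (Manhattan, Euclidean, Soergel) rank by ascending value, coefficients (Tanimoto, Dice, Cosine) by descending value, so ranked lists must be sorted accordingly before fusion.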
„Fuzzification“ of the CATS2D Descriptor
[Figure: histograms of counts (0–4) vs. distance / bonds (0–5) for the original and the „fuzzy“ descriptor; neighboring bins contribute with Counts = f ⋅ Counts(bin+1) and Counts = f ⋅ Counts(bin−1)]
• no significant overall improvement of enrichment of actives in a focused library
• can be helpful for individual searches (scaffold hopping)
CATS2D: A Ranked List
Query: Haloperidol (D2-antagonist)
Ki (D1) = 270 nM, Ki (D3) = 21 nM, Ki (D4.2) = 11 nM, Ki (5-HT2A) = 25 nM, Ki (α1) = 19 nM, Ki (H1) = 730 nM
Rank  Annotated activity
1     D2 ligand
2     D2 ligand
3     GABA transporter type I (ion channel)
4     PPAR-γ agonist
5     D2 ligand
6     5-HT2C antagonist
7     H3 antagonist
8     TNF-α inhibitor
9     D2 ligand
10    Eliprodil (ion channel)
[Figure: chemical structures of the query and of the compounds on ranks 1–10]
The previous figure shows the ten highest ranking compounds which were retrieved from the COBRA
database by a topological pharmacophore similarity search (CATS method). The query structure was
Haloperidol, a dopamine (D2) receptor antagonist. Not surprisingly, classic variations of the query structure
are found on ranks 1 and 2. These are not very interesting from the library design or scaffold-hopping point of
view. On ranks 5 and 9 two additional D2-receptor ligands are found, one of which surprisingly is an agonist
(rank 5), and the well-known Melperone structure on rank 9 which represents a substructure of Haloperidol.
Retrieval of the rank 5 molecule could already be regarded as a “scaffold-hop”, as the compound has a
different structure than the query. This molecule is a “D2-ligand” and may be regarded as isofunctional on this
description level of “bioactivity”, but it does not necessarily exhibit the same kind of functional activity (the
compound on rank 5 is an agonist, not an antagonist like the query molecule). Looking at the first molecules
ranked between known D2-ligands, we find an annotated ion channel blocker (GABA transporter type I,
GAT1) on rank 3, and an antiinflammatory PPAR-γ agonist (Pioglitazone) on rank 4. Based on the similarity
ranking, it would now be worthwhile testing these molecules in a dopamine receptor binding assay. Indeed, a
co-inhibition of dopamine transporter and GAT1 has been reported for Orphanin FQ, an endogenous
antagonist of the dopamine transporter; and Pioglitazone has been found to prevent dopaminergic cell loss.
These are first indications that the similarity search might have produced useful results. Still, only the
biochemical test can validate the results. An argument against the compound on rank 4 might be the lack of a
basic amine function. Looking at the structures on lower ranks, we find a serotonin receptor 5-HT2C
antagonist (rank 6), a histamine receptor H3 antagonist (rank 7), a TNF-alpha inhibitor (rank 8), and another
ion channel blocker (Eliprodil, rank 10). For an assessment of this finding, it is important to learn about other
activities of the query structure: Ki (D1) = 270 nM, Ki (D3) = 21 nM, Ki (D4.2) = 11 nM, Ki (5-HT2A) = 25 nM,
Ki (α1) = 19 nM, Ki (H1) = 730 nM. This means that Haloperidol exhibits binding activity against a whole
family of targets, and is not specific for the D2 receptor. Therefore, retrieving an H3 ligand on rank 7 can be
considered a success if we keep in mind that the query has significant binding potential at the H1 receptor.
This brief example of a pharmacophore-based similarity search demonstrates that one has to be very
careful when analyzing a ranked list, and a seeming contradiction to what was expected as an outcome
of the experiment might be resolved by considering the multiple activities of the query structure.
How Far Should We Look?
[Figure: number of hits vs. distance to query; where should the threshold be placed?]
U. Fechner, G. Schneider (2004)
CATS: Quest for Novel Ca2+ Channel Blockers
Mibefradil (Posicor®): T-type Ca2+ channel blocker
Compounds shown (IC50, RTTC / FLIPR):
1   1.2 µM
2   3 µM
3   2.4 µM
4   3.5 µM
5   0.8 µM
6   3.3 µM
[Figure: chemical structures connected by CATS searches]
G. Schneider, T. Giller, W. Neidhart, G. Schmid (1999)
“Fuzzy” Pharmacophores
Align → Assign atom types → Determine Local Feature Densities (LFD) → Cluster PPPs → Assign Distances → Calculate Fingerprint
[Figure: aligned molecules with LFD values between 1 and 2.18]
dist = 2.94 Å
$W_c = \min\left(\tfrac{1}{2}, 1\right) = 0.5$   (σ = 0.5)
$W_c = \min\left(\tfrac{1}{2}, \tfrac{5}{11}\right) + \min\left(\tfrac{1}{2}, \tfrac{6}{11}\right) = 0.96$   (σ = 1.3)
Fuzzy Pharmacophores:
Quest for Novel COX-2 Inhibitors
[Figure: results shown for 1%, 5% and 10%]
Fuzzy Pharmacophores Outperform
Single-Query Searching
Fuzzy Pharmacophores:
Quest for Novel Thrombin Inhibitors
[Figure, panels a)–f): pharmacophore models with points H1, H2, H3, D1, A1, B at different degrees of “fuzziness”]
Fuzzy Pharmacophores Outperform
Single-Query Searching
Fuzzy Pharmacophore Model of TAR Binders
Binding of Tat protein to TAR RNA is essential for HIV replication
Known Tat-TAR interaction inhibitors: 1, 2, 3
Lind et al., Chem Biol, 2002; Hamy et al., Biochemistry 1998
Flexible alignment of 2 and 3 to active conformation of 1
NMR-structure 1LVJ with bound inhibitor 1
Fuzzy Pharmacophore Model of TAR Binders
Fuzzy pharmacophore model

of TAR-RNA ligands
Virtual screening
of the SPECS catalogue
10 compounds cherry-picked
4 hits, 1 novel TAR-RNA ligand
In cooperation with SFB 579

S. Renner, O. Boden, V. Ludwig, M. Göbel, G. Schneider (2004)
New Allosteric Modulators of Metabotropic Glutamate
Receptor 5 (mGluR5) (1)
Queries
• Flexible alignment of queries 3-9 (MOE)
• conformations for CATS3D
Renner et al. (2004) Chembiochem 6:620
New Allosteric Modulators of Metabotropic Glutamate
Receptor 5 (mGluR5) (2)
Ranked list
Molecule  Most similar        Ki mGluR5  Ki mGluR1  Selectivity
no.       reference molecule  (µM)       (µM)       (Ki mGluR1 / Ki mGluR5)
10        6                   12         17         1.4
11        8                   14         45         3.2
12        3                   24         > 100      > 4.2
13        6                   33         61         1.9
14        7                   35         > 100      > 2.9
15        9                   38         > 100      > 2.6
16        8                   39         > 100      3.2
17        5                   41         64         1.6
18        9                   63         > 100      > 1.6
19        5                   > 100      14         < 0.14
3D Conformations
Heuristic 3D Conformer Generation
• CORINA: one conformer
• ROTATE: multiple conformers
• Generation of isomeric structures
  A meso molecule is one that is superimposable on its mirror image (achiral) but has stereogenic centers.
  The most common kind of meso compound is a molecule with two stereogenic centers and a plane of symmetry.
  Example: (2R,3S)-2,3-dibromobutane
• Identification of “geometrically strained” configurations
• Elimination of clashes (vdW contacts), e.g. clash in n-heptane
• Elimination of duplicate conformations (e.g. meso compounds)
STEREOCHEMISTRY
http://orgchem.colorado.edu/courses/3361manualF04/MMstereofullLM61F04.pdf
http://www.chem.umd.edu/courses/jarvis/chem233spr04/Chapter04Notes.pdf
Principles of ROTATE
Assignment of torsion angles
Observed torsion angles (CSD)
Schwab (2003)
[Figure: n-decane, 144 conformers]
Receptor-bound vs. best ROTATE conformer
$RMS_{XYZ} = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} \left[ (X_i - X_i')^2 + (Y_i - Y_i')^2 + (Z_i - Z_i')^2 \right]}$
Calculation of the RMSXYZ value
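The RMS deviation between two conformers can be computed as follows. This is a minimal sketch that assumes both coordinate lists have the same atom order and are already superimposed; no alignment step is included.

```python
import math

def rms_xyz(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom example: second conformer shifted by 1 Å along z
conformer_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformer_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rmsd = rms_xyz(conformer_1, conformer_2)
```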
Torsion angle histogram (CSD) and
derived potential function
h(τ)
[SLN (SYBYL Line Notation)]
E(τ) = A • ln h(τ)
(approximation method
according to Murray-Rust)
Discretization of torsion angles
Recognition of rotatable bonds
• bond must be a single bond
• bond must not be part of a ring system
• bond must not be terminal
Special case 1: carboxamide
in CSD
Special case 2: keto-enol tautomers
in CSD
Recognition of ring systems
• graph-theoretical approach
(e.g. by flood fill)
All nodes (atoms) on the shortest paths between the nodes that are connected by a
ring-closure bond are marked as ring atoms.
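The three rules for rotatable bonds can be sketched on a heavy-atom graph. The code below is an illustration under stated assumptions: bonds are given as (atom, atom, order) tuples, and a bond counts as a ring bond if its two atoms stay connected when the bond itself is ignored (tested by a breadth-first flood fill). This ring test is a simplification and not the shortest-path marking described above; the special cases (amides, tautomers) are not handled.

```python
from collections import deque

def _connected_without_bond(adjacency, a, b):
    """True if b is reachable from a without using the direct bond a-b."""
    seen = {a}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if current == a and neighbor == b:
                continue  # skip the bond under test
            if neighbor == b:
                return True
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def rotatable_bonds(bonds):
    """Return single, acyclic, non-terminal bonds from (i, j, order) tuples."""
    adjacency = {}
    for i, j, _ in bonds:
        adjacency.setdefault(i, []).append(j)
        adjacency.setdefault(j, []).append(i)
    result = []
    for i, j, order in bonds:
        is_single = order == 1
        is_terminal = len(adjacency[i]) == 1 or len(adjacency[j]) == 1
        if is_single and not is_terminal and not _connected_without_bond(adjacency, i, j):
            result.append((i, j))
    return result

# n-butane heavy atoms 0-1-2-3: only the central bond is rotatable
butane = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
print(rotatable_bonds(butane))
```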
Distribution of intramolecular atom-pair distances
in CORINA-generated conformations of druglike compounds
(COBRA v2.1)
[Figure: histogram of N vs. maximal pair-wise distance (DAB) / Å; average DAB = 15 Å]
Molecular Diversity
Subset Sampling
Molecular Diversity
• Diversity metrics: distance-based, cell-based, variance-based
• Diversity spaces: 2D, 3D, physicochemical
• Diversity sampling: reagent-based, product-based
Adapted from: Agrafiotis et al. (2000) in “Virtual Screening for Bioactive Molecules”,
Böhm, Schneider eds., Wiley-VCH.
Distance-based diversity metrics: D1
1. Define distance function
2. Compute library diversity
Often used:
minimum pair-wise distance of compounds i,j in a collection C
$D_1(C) = \min_{i<j} d_{ij}$
Problem: D1 depends on a single inter-molecular distance.
[Figure: equally diverse sets according to D1]
Distance-based diversity metrics: D4
Average nearest-neighbor distance of compounds i,j in a collection C
$D_4(C) = \dfrac{1}{N} \sum_i \min_{j \neq i} d_{ij}$
Less sensitive to outliers!
[Figure: collection of 100 “most diverse” compounds from a library of 10,000 points (subset maximizes D4)]
Problem: D4 does not consider inter-cluster distances
[Figure: equally diverse sets according to D4]
• such a situation is rarely found in real-life problems
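Both distance-based metrics can be computed by brute force for small collections. A minimal sketch, assuming compounds are points in a descriptor space and the Euclidean distance is used; the function names follow the metric names on the slides.

```python
import math
from itertools import combinations

def d1(points):
    """Minimum pair-wise distance in the collection."""
    return min(math.dist(a, b) for a, b in combinations(points, 2))

def d4(points):
    """Average nearest-neighbor distance in the collection."""
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    return sum(nearest) / len(points)

# Hypothetical collection: two close points and one outlier
collection = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(d1(collection), d4(collection))
```

Both functions need all N(N-1)/2 pair distances, which is the quadratic cost that motivates tree-based neighbor searching for large libraries.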
Space partitioning by a k-dimensional tree
• Quadratic dependence of D1 and D4 on the number of compounds in C:
  virtually useless for large libraries and high-dimensional spaces!
Workaround: nearest-neighbor searching in a
k-dimensional tree (for dimensions < 10)
Def.: A multidimensional search tree for n points in k-dimensional space.
“Find the set of points that fall into a given rectangle in a plane“ in O(sqrt(n)+k) time
[Figure: partitioning of the x-y plane into left/right and up/down regions; point coordinates are discriminators. One possible tree!]
Demo: http://www.rolemaker.dk/nonRoleMaker/uni/algogem/kdtree.htm
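A k-d tree with nearest-neighbor search fits in a few lines. This is a minimal sketch, assuming points are tuples of equal dimension, the median point is used as discriminator at each level, and nodes are plain dictionaries; it is meant as an illustration, not an optimized implementation.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {"point": ordered[mid], "axis": axis,
            "left": build_kdtree(ordered[:mid], depth + 1),
            "right": build_kdtree(ordered[mid + 1:], depth + 1)}

def nearest(node, target, best=None):
    """Return (distance, point) of the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node["point"], target)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    if abs(diff) < best[0]:  # the other half-space may still hold a closer point
        best = nearest(far, target, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
dist, point = nearest(tree, (9, 2))
```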
Entropy measure (“information content”) of library diversity
$D_7(C) = S_{max} - S$
$S = -\sum_{i=1}^{N} \sum_{j=1}^{N} p_{ij} \ln p_{ij}$
pij: probability of finding the i-th individual in the j-th species
(from substructure similarity table)
• subset maximizes D7
[Figure: unbalanced subset selection]
Agrafiotis et al. (2000)
Critically depends on the definition of “information”.
Cell-based diversity metrics: D8
• Gridding of chemical space
• Absolute positions of compounds (in contrast to distance-based metrics)
$D_8(C) = \sum_{i=1}^{M} \delta_i$, with δi = 1 if cell i is occupied, else δi = 0.
Measure of absolute diversity!
[Figure: equally diverse sets according to D8]
• does not consider clustering of data
• poor discrimination of collections with similar span but different distributions
Cell-based diversity metrics: D9-12
Cell-based fraction: $D_9(C) = \dfrac{N_C}{N_R}$
Cell-based χ2: $D_{10}(C) = \sum_i (N_i - N^*)^2$
Cell-based entropy: $D_{11}(C) = -\sum_i N_i \log N_i$
Cell-based density: $D_{12}(C) = -\sum_i N_i \log \dfrac{N_i}{M_i}$
NC number of cells occupied by C
NR number of cells occupied by the reference set
Ni number of compounds in the i-th cell of the subset
N* average number of compounds per cell expected for the subset
Mi number of compounds in the i-th cell of the reference set
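Cell-based metrics only require binning the compounds on a grid. The sketch below assumes descriptors scaled to the unit hypercube and an equal number of bins per dimension, and it sums D10 over all grid cells including empty ones; these choices and the function names are assumptions made for illustration.

```python
from collections import Counter

def cell_counts(points, bins):
    """Count compounds per grid cell; coordinates are assumed to lie in [0, 1]."""
    return Counter(tuple(min(int(x * bins), bins - 1) for x in p) for p in points)

def d8(points, bins):
    """Number of occupied cells."""
    return len(cell_counts(points, bins))

def d9(subset, reference, bins):
    """Fraction of the reference-occupied cells that the subset occupies."""
    return d8(subset, bins) / d8(reference, bins)

def d10(subset, bins):
    """Chi-square-like deviation from a uniform cell occupancy."""
    n_cells = bins ** len(subset[0])
    expected = len(subset) / n_cells
    counts = cell_counts(subset, bins)
    occupied = sum((n - expected) ** 2 for n in counts.values())
    empty = (n_cells - len(counts)) * expected ** 2
    return occupied + empty

subset = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9)]
reference = subset + [(0.1, 0.9), (0.9, 0.1)]
print(d8(subset, 2), d9(subset, reference, 2), d10(subset, 2))
```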
Subset Selection & Sampling
Number of different subsets $= \binom{n}{k} = \dfrac{n!}{(n-k)!\,k!}$
n : number of compounds in C
k : number of compounds in the subset
• Product-based selection is more effective in terms of diversity
  (generates more diverse subsets than reagent-based selection)
• Reagent-based selection can cope with very large libraries
• Differences also result from descriptor type
Maxmin Sampling
[Figure: compounds are moved from POOL to SELECTION]
1. take first compound from POOL and put it into SELECTION,
2. find the compound in POOL which is most dissimilar from
the compounds in SELECTION and put it into SELECTION:
max (min (d (C_POOL, C_SELECTION))),
3. repeat step 2 until the desired number of compounds is in
SELECTION.
http://gecco.org.chemie.uni-frankfurt.de/maxminselection/index.html
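The three steps of Maxmin sampling translate directly into code. This is a minimal sketch, assuming compounds are descriptor tuples and the Euclidean distance is used by default; it keeps a running minimum distance to the selection so that each round costs one pass over the pool.

```python
import math

def maxmin_select(pool, n_select, dist=math.dist):
    """Pick n_select items: each new pick maximizes its minimum distance to the selection."""
    pool = list(pool)
    selection = [pool.pop(0)]                       # step 1: first compound
    min_dist = [dist(p, selection[0]) for p in pool]
    while pool and len(selection) < n_select:       # step 3: repeat
        idx = max(range(len(pool)), key=min_dist.__getitem__)  # step 2: max of min
        chosen = pool.pop(idx)
        min_dist.pop(idx)
        selection.append(chosen)
        min_dist = [min(m, dist(p, chosen)) for m, p in zip(min_dist, pool)]
    return selection

# Hypothetical one-dimensional descriptor values
picked = maxmin_select([(0,), (1,), (10,), (5,)], 3)
```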
[Figure, several panels (b, d, ...): point distributions in the unit square (axes 0–1)]
Maxmin Sampling: Java vs. C implementations
[Figure: calculation time [s] (0–3 × 10^2) vs. number of compounds in pool (0–5 × 10^4) for the C and the Java implementation; example library scaffold with R1–R3]
Kolmogorov-Smirnov Statistics
Dissimilarity between two property distributions
(“model-free” approach to comparing distributions)
$K^* = \max_{-\infty < x < \infty} \left| P(x) - P^*(x) \right|$
Maximum value of the absolute difference between two distribution functions
P(x): actual distribution; P*(x): target (reference) distribution
Example: [Figure]
K* is in [0,1]
Similarity index: $K = 1 - K^*$
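For two samples of a property, K* can be computed from the empirical cumulative distribution functions. A minimal sketch with illustrative function names; the property values in the example are invented.

```python
from bisect import bisect_right

def ks_statistic(sample, reference):
    """Maximum absolute difference between the two empirical distribution functions."""
    a, b = sorted(sample), sorted(reference)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)

def ks_similarity(sample, reference):
    """Similarity index K = 1 - K*."""
    return 1.0 - ks_statistic(sample, reference)

# Hypothetical molecular weight samples
k_star = ks_statistic([250, 300, 350, 400], [300, 350, 400, 450])
```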
Kolmogorov-Smirnov for Sampling
Source Library (combinatorial, de novo, corporate) → Sub-set, compared with the Reference set
Flip members of the sub-set
Goal: K max.
e.g. by Metropolis sampling (∆ = K_new − K_old):
if ∆ ≥ 0 then accept
else if exp(∆/T) > random[0,1] then accept;
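The Metropolis acceptance step can be sketched as follows, assuming that the similarity index K is to be maximized and that ∆ is the change in K caused by a proposed flip. The function name and the temperature values are illustrative.

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random):
    """Accept improvements always, deteriorations with probability exp(delta / T)."""
    if delta >= 0:
        return True
    return math.exp(delta / temperature) > rng.random()

# A small deterioration is accepted fairly often at T = 0.1, rarely at T = 0.001
rng = random.Random(42)
accepted = sum(metropolis_accept(-0.05, 0.1, rng) for _ in range(1000))
```

Lowering T over the course of the run turns the random walk into a greedy search.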
Kolmogorov-Smirnov for Sampling
Agrafiotis et al. (2000)
An Approach to
Product-Based Diversity Sampling
1. Reference compounds
2. Choose molecular descriptors
3. Estimate probability distribution
4. Sample library members from the source library (combinatorial, de novo, corporate)
5. Synthesise / Order / Test → Actives / Inactives
Byvatov & Schneider (2003)
De novo design
of druglike molecules
How druglike chemical space might be structured
M.C. Escher's “Development II“

© 2004 The M.C. Escher Company - Baarn - Holland.
All rights reserved
Scaffold analysis with MEQI
Xu, Johnson (2001) JCICS 41:181
Xu, Johnson (2002) JCICS 42:912
Application to scaffold hopping from “natural ligands”: Jenkins et al. (2004) JMC 47:6144
Molecular graph → Cyclic skeleton → Reduced cyclic skeleton (reduced second-degree vertices)
• “light” version for up to 5,000 molecules: www.pannanugget.com
Hierarchical relationships of structural feature classes
Ligand Classes & Structural Feature Classes
Jenkins et al. (2004) JMC 47:6144
De novo design concepts
Requirements and their implementations:
• Structure sampling method: Grow, Link, Lattice, Stochastic
• Structure assessment method: primary constraints (receptor, ligand), secondary constraints
• Search method & stop criterion: depth-first search (DFS), breadth-first search (BFS), random search, Evolutionary Algorithm, Monte Carlo / Metropolis, exhaustive enumeration
Building Blocks Primary target constraints Combinatorial Search Strategy Structure Sampling
Name [ref.] Publication Buildin Fragments Receptor Ligand DFS(A) BFS(B) Random MC(C) EA(D) Grow Link Lattice MD(E) Stochastic
HSITE / 2D Skeletons [10,29,85] 1989 X X X (Fitting and clipping of planar skeletons)
3D Skeletons [30] 1990 X X X X
Diamond Lattice [31] 1990 X X X X
BUILDER v1 [26] 1992 X X X X X
LEGEND [18] 1991 X X X X
LUDI [11,12,86-88] 1992 X X X X X
NEWLEAD [28] 1993 X X X X X
SPLICE [58] 1993 X X X X
GenStar [32] 1993 X X X X
GroupBuild [16] 1993 X X X X
CONCEPTS [37] 1993 X X X X
SPROUT [15,55-57] 1993 X X X X X X
MCSS & HOOK [23,25] 1994 X X X X
GrowMol [19] 1994 X X X X X
MCDNLG [59] 1995 X X X X
Chemical Genesis [20] 1995 X X X X X
DLD [24,89] 1995 X X X X
PRO_LIGAND [13,42,90-93] 1995 X X X X X X
SMoG [39,40,94] 1996 X X X
BUILDER v2 [27] 1995 X X X X
CONCERTS [33] 1996 X X X X
RASSE [21] 1996 X X X X
PRO_SELECT [14,38] 1997 X X X X
SkelGen [61,62] 1997 X X X X X
Nachbar [43,95] 1998 X X X X
Globus [47] 1999 X X X X
DycoBlock [34,35] 1999 X X X X
LEA [45] 2000 X X X X
LigBuilder [22] 2000 X X X X X
TOPAS [46] 2000 X X X X
F-DycoBlock [36] 2001 X X X X
ADAPT [65] 2001 X X X X
Pellegrini & Field [44] 2003 X X X X X
SYNOPSIS [53] 2003 X X X X
CoG [48] 2004 X X X X X
BREED [60] 2004 X X X (Exhaustive recombination)
Link/Grow Strategy
Define binding pocket (FKBP-12) → Determine interaction sites (Ile56, Asp37, Phe46)
• Link: place fragments, then link
• Grow: place first fragment, then grow
Ki = 16 µM
Babine et al. (1995) Bioorg. Med. Chem. Lett. 5:1719
Lattice Strategy
Fill pocket with lattice points → Find and connect interaction points → Assign molecular framework → Build molecule
Molecule Assembly Strategies
A  1. Generate a molecular skeleton based on molecular graph
   2. Assign real 3D substructure elements (e.g. SPROUT)
B  1. Link 2D molecular building blocks (SMILES, mol)
   2. Calculate 3D conformation (e.g. TOPAS)
C  • Directly link 3D molecular fragments (e.g. LUDI)
Tree model of search space exploration by an automated structure generation method
• Grow strategy
• Depth-first search
• Structure-based
[Figure: search tree from the initial state (binding pocket) via level 1 and level 2 to the end state (designed molecule); x marks discarded branches]
Search space exploration by an evolutionary algorithm
• Mutation / Selection
• Depth-first search
• Ligand-based (Reference: Gleevec®)
[Figure: series of structures from the initial state to the end state; Tanimoto index (similarity to the template structure): 0.47, 0.57, 0.66, 0.70, 0.81, 0.92, 1.0]
Generation of favorable ligand-binding positions
• CAVEAT (Lauri & Bartlett, 1994)
• GRID (Goodford, 1985)
• LUDI (Böhm, 1992)
• MCSS (Miranker & Karplus, 1991) / CHARMM (Brooks, 1983)
CAVEAT (Lauri & Bartlett, JCAMD 1994, 8:51)
• Designing mimics of known ligands
• Designing linking units to constrain acyclic molecules
• de novo design of active site ligands
Design of a glucopyranose receptor
Yang et al. Angew. Chem. Int. Ed. 2001, 40:1714
LUDI (Böhm)
• Find interaction centers (HD: blue, HA: red, Lipo: green)
• Place fragments
• Link fragments
Böhm et al. (1996)
De novo design with LUDI: trypsin inhibitors
Böhm et al. (1996)
CHARMM
123
DHFR site points
• acceptors
• donors
• ring centroids
• neutrals
HIV-Protease site points
• acceptors
• donors
• ring centroids
• neutrals
Minima for fragments

• clustered site points
final site points
124
HIV-Protease
• benzene minima
HIV-Protease
• benzene minima
and
• other ring minima
Recent de novo design examples
• New antifungal agent (compound 1)
  • Candida / Mycobacterium lanosterol 14α-demethylase (CYP51)
  • MCSS fragment identification
  • LUDI fragment linking
  • no heme coordination (no CYP-P450 interaction)
  Ji et al. (2003) JMC 46:474
• New HIV-1 protease inhibitor (Ki = 42 nM) (compound 2)
  • BREED „preferred fragment“ approach
  • First step: 4 reference molecules recombined
  • Second step: hybrid fusing with 100 reference structures
  Pierce et al. (2004) JMC 47:2768
• New HIV-1 reverse transcriptase inhibitor (IC50 = 4.4 µM) (compound 3)
  • SYNOPSIS structure-based approach
  • First step: 28 designs with predicted low IC50
  • Second step: expert inspection & selection, 18 synthesized
  • 10/18 with IC50 < 100 µM
  Vinkers et al. (2003) JMC 46:2765
Recent combinatorial de novo design examples
• New Cdk-4 inhibitor (IC50 < 1 µM) (compound 4)
  • LEGEND with homology model
  • First step: candidate designs
  • Second step: combinatorial optimization of preferred scaffolds (MW < 350 Da)
  • Third step: LUDI & LeapFrog for selectivity optimization (side-chain optimization)
  Honma et al. (2001) JMC 44:4628
• New CB-1 ligand (IC50 = 0.3 µM) (compound 5)
  • TOPAS ligand-based approach
  • First step: designs assembled from GPCR-fragments
  • Second step: expert inspection & scaffold selection
  • 6-10% hit rate with IC50 < 10 µM
  Rogers-Evans et al. (2004) QCS 23:426
TOPAS II: The Implementation
Building blocks: 11 reactions, 3,788 fragments
Constraints: #Atoms (!H): 12-60; #O+#N ≤ 12
1. Start: generate λ diverse molecules
2. Determine quality (Daylight Fingerprints & Tanimoto Coefficient)
3. Select best solution
4. Stop? Yes: End. No: generate λ molecules (mutation) and continue with step 2.
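The generate-evaluate-select loop can be illustrated with a toy evolutionary search. The sketch below is not TOPAS: molecules are replaced by sets of on-bits, mutation flips one random bit instead of exchanging a fragment, and the parent is kept if no child is better (an elitist variant chosen here so that the score never decreases). All names and parameter values are assumptions.

```python
import random

def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def mutate(bits, n_bits, rng):
    """Flip one randomly chosen bit (toy replacement for a fragment exchange)."""
    return frozenset(set(bits) ^ {rng.randrange(n_bits)})

def evolve(reference, n_bits=32, lam=20, max_generations=200, seed=0):
    """Toy (1+lambda) search for a bit set similar to the reference."""
    rng = random.Random(seed)
    parent = frozenset(rng.sample(range(n_bits), 5))
    history = [tanimoto(parent, reference)]
    for _ in range(max_generations):
        if history[-1] == 1.0:                      # stop criterion
            break
        children = [mutate(parent, n_bits, rng) for _ in range(lam)]
        best = max(children, key=lambda c: tanimoto(c, reference))
        if tanimoto(best, reference) >= history[-1]:
            parent = best                           # select best solution
        history.append(tanimoto(parent, reference))
    return parent, history

reference_fp = frozenset({1, 4, 9, 16, 25})
solution, scores = evolve(reference_fp)
```

In a real design run the quality function would compare fingerprints of assembled molecules with those of the template structure.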
Daylight Toolkit Functions
dt_fp_allocfp allocate a new fingerprint

dt_fp_euclid compute the euclidean distance between two fingerprints
dt_fp_tanimoto compute the tanimoto coefficient of two fingerprints
dt_smirkin interpret a string as a generic reaction

dt_utransform apply a reaction transform to an object
dt_umatch match a pattern against an object
dt_smartin interpret a SMARTS string

dt_cansmiles retrieve the canonical SMILES string of an object
dt_copy make a copy of an object
dt_dealloc remove an object from the system
dt_smilin interpret a SMILES string
dt_weight return the atomic weight of an atom
dt_getrole get the role an object plays in a reaction
dt_stream allocate a stream object
dt_next retrieve the next object in a compound object
SMIRKS / Reaction SMILES for Virtual Synthesis
Aromatic-C + Aromatic-C
([c;R1:1][10*]).([10*][c;R1:2])>>[c;R1:1]-[c;R1:2]
• [10*]: reaction type & site index
• c: aromatic carbon
• R1: member of exactly one ring
• :1, :2: atom mapping index
The TOPAS II “Flux-Generator”
Parent Structure → Fragmented Parent Structure → Fragmented Child Structure → Child Structure
Step 1: Randomly select a reaction and retro-synthesize (here: amide)
Step 2: Randomly pick a fragment and substitute by a fragment of the same type
Step 3: Synthesize with reaction chosen in Step 1
Design Examples - Napsagatran
• ‘recapped’ COBRA
• Daylight Fingerprints
• Tanimoto Coefficient
• non-adaptive EA
[Figure: Napsagatran and a designed structure with T = 0.9]
Design Examples - Gleevec
• ‘recapped’ COBRA
• Daylight Fingerprints
• Tanimoto Coefficient
• non-adaptive ES
[Figure: Gleevec and designed structures with T = 0.88 and T = 0.87]
Design Examples – Dopamine D3 Ligand
• ‘recapped’ COBRA
• Daylight Fingerprints
• Tanimoto Coefficient
• non-adaptive ES
Reference: BP 897 (D3: 0.92 nM, D2: 61 nM)
Pilla et al. (1999) Nature 400:371-373
[Figure: designed structures with T = 0.88, T = 0.80, T = 0.78 and T = 0.75]
Design Examples – Dopamine D3 Ligand
Reference: BP 897
D3 Homology Model (Byvatov, Sasse, Stark, Schneider (2004))
[Figure: ligands with a 4-bond spacer and a 3-bond spacer in the binding site; residues Asp 110, Ser 192, Phe 345, Phe 346]
Design: D3: 396 nM, D2: 117 nM
Hackling et al. (2003) J. Med. Chem. 46:3883-3899
Design of novel CB-1 ligands
CB-1 Seed (Ki = 110 nM) (Khanolkar et al. 2000)
R = H; clogP = 6.90
R = Br; clogP = 7.79
R = MeSO2; clogP = 5.47
[Figure: seed, de novo designs and focused libraries; structures 1, 2 and 3; labels: GPCR-BB, TOPAS DB; hit-rates: 6% and 10%]
Combinatorial Design of Novel Kv1.5 Blockers
[Figure: starting structure (ICAGEN) with IC50 < 1 µM; method labels: Evolutionary de novo Design, Virtual CombiChem, Pharmacophore Matching; further structures with IC50 < 1 µM, IC50 ~ 7 µM and IC50 ~ 1 µM]
Design of a druglike hKv1.5 channel blocker
WDI (~ 46,000) → RECAP → Building Blocks (24,563) → TOPAS combinatorial optimization → 3b
Reference molecule
Synthesis from 1-fluoro-2-nitrobenzene and o-anisidine:
• aromatic nucleophilic substitution
• reduction (Pd-catalytic hydrogenation)
• condensation
Thrombin inhibitors: automatic vs. manual design

Automatic design: compound 6, Ki = 10 nM (PDB: 1DWB, 3.16 Å)
1. Placement of fragments
2. Combinatorial optimization
• Preferred reaction (reductive amination)
• „needle“ approach
Böhm et al. (1999) JCAMD 13:51

Manual design: compound 7, Ki = 6 nM (PDB: 1OYT, 1.67 Å)
1. Placement of central scaffold
2. Modelling
3. Fluorine scan
Obst et al. (1997) Chem. Biol. 4:287
Olsen et al. (2003) Angew. Chem. Int. Ed. 42:2507

[Figure: both inhibitors in the thrombin binding site; labelled residues: H57, Y60A, W60D, N98, L99, D189, A190, S195, W215, G216, G219]
Conclusions: Current status of de novo design
• Pipelines are fuelled by HTS due to lack of early SAR

• de novo design is complementary to HTS:
generates new chemical entities
exploits existing knowledge
yields higher hit-rates
BUT...
• neglects receptor flexibility
Zhu et al. (2001) JCAMD 15:979 (F-DycoBlock)
Anderson & Wright (2005) Curr. Comp. Aid. Drug Des. 1:103
• needs to include secondary constraints better

• lack (?) of validated & available software tools
