Drug Like Properties

Recognizing molecules with drug-like properties
W Patrick Walters, Ajay and Mark A Murcko

A variety of successful approaches to the problem of recognizing drug-like molecules have been employed. These range from simple counting schemes such as the Lipinski rule of five, to the analysis of the multidimensional chemistry space occupied by drugs, to neural network learning systems. With this variety of tools, it now appears possible to design libraries that are enriched in compounds which have desirable or drug-like properties. Verifying the robustness of these methods, and extending them, will form the basis of research in this field during the next few years.

Address
Vertex Pharmaceuticals, 130 Waverly Street, Cambridge, MA 02139, USA

Current Opinion in Chemical Biology 1999, 3:384-387
http://biomednet.com/elecref/1367593100300384
© Elsevier Science Ltd ISSN 1367-5931

Abbreviations
ACD Available Chemical Directory
CMC Comprehensive Medicinal Chemistry
MDDR MACCS-II Drug Report
WDI World Drug Index

Introduction
With the advent of high-throughput chemistry and enzymology, some researchers in the early 1990s took the position that simply throwing more compounds at a drug discovery problem would increase the odds of success. Drug companies now routinely assay several hundred thousand compounds against each new drug target, and the size of the typical screening library is soon expected to approach a million compounds. Likewise, the number of compounds that can be synthesized in one year by a dedicated combinatorial chemist can now routinely be in the range of 10,000-100,000 or more [1-3]. Anecdotal evidence from a variety of research labs suggests, however, that raw speed and sheer numbers are not sufficient to crack the problem of drug discovery. The utility of first-generation combinatorial libraries has generally been considered to be quite low because these libraries tend to be populated with large, lipophilic, highly flexible molecules (MA Gallop, The Second Lake Tahoe Symposium on Molecular Diversity, Tahoe City, CA, January 1998; CB Cooper, National Managed Health Care Conference, Boston, MA, May 1997). Support for this thesis comes from Lipinski et al. [4], who analyzed the compounds synthesized at Pfizer between 1984 and 1994 and showed that the number of compounds with a relative molecular weight greater than 500 doubled over the 10 year period. We should also remember that the number of high-quality lead molecules to be derived from high-throughput screening (HTS) is typically quite low, perhaps in the range of 1 per 100,000 compounds screened for easier targets such as enzymes, and much worse for harder targets such as protein-protein interactions [5]. As a consequence, many researchers have begun to pay closer attention to the nature of the compounds synthesized and screened. This process is sometimes referred to as recognizing drug-like molecules. In this brief review, we will point out some recent publications in this field, and suggest some future directions that this field may take.

Simple counting methods to predict drug-likeness
Many researchers over the years have attempted to show that drug-like molecules tend to have certain properties. For example, logP (where P is the partition coefficient), molecular weight, and the number of hydrogen bonding groups have been correlated with oral bioavailability [6,7]. In principle, then, one should be able to improve the odds of success very simply by biasing a combinatorial library towards compounds that have certain properties. Recently, researchers at Pfizer [4] have extended this idea with the establishment of the rule of five to provide a heuristic guide for determining if a compound will be orally bioavailable. The rules were derived from analysis of 2,245 compounds from the World Drug Index (WDI; Derwent Information, London, UK) which have a USAN (United States adopted name) or INN (international nonproprietary name) and an entry in the indications and usage field of the database. The assumption is that compounds meeting these criteria have entered human clinical trials, and therefore must possess many of the desirable characteristics of drugs. It was found that in a high percentage of compounds, the following rules were true: hydrogen bond donors ≤ 5; hydrogen bond acceptors ≤ 10; relative molecular weight ≤ 500; and logP ≤ 5. The majority of the violations came from antibiotics, antifungals, vitamins and cardiac glycosides. The authors suggest that these classes of compounds are orally bioavailable, despite their violations of the rule of five, due to the presence of functional groups that act as substrates for transporters. The application of simple counting schemes to combinatorial library design is obvious. For example, Fecik et al. [8] performed an analysis of a large number of combinatorial libraries in terms of the weight of the scaffold and the average weight of substituents which are necessary to arrive at products with relative molecular weights of 500.
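The rule of five of Lipinski et al. [4], discussed under simple counting methods, lends itself to a very short filter. The sketch below is only an illustration: it assumes the four descriptors have already been computed by some other tool, the `Descriptors` container and the function names are hypothetical, and the example values are invented rather than taken from any database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one compound (hypothetical container)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float
    log_p: float


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that the compound breaks."""
    checks = {
        "hydrogen bond donors > 5": d.h_bond_donors > 5,
        "hydrogen bond acceptors > 10": d.h_bond_acceptors > 10,
        "molecular weight > 500": d.molecular_weight > 500,
        "logP > 5": d.log_p > 5,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """True if the compound breaks at most `max_violations` criteria.

    The tolerance is an assumption of this sketch; set it to suit the library.
    """
    return len(rule_of_five_violations(d)) <= max_violations


if __name__ == "__main__":
    # Invented descriptor values, for illustration only.
    small_polar = Descriptors(2, 5, 350.0, 2.5)
    large_lipophilic = Descriptors(6, 12, 650.0, 6.2)
    for label, d in [("small_polar", small_polar),
                     ("large_lipophilic", large_lipophilic)]:
        print(label, passes_rule_of_five(d), rule_of_five_violations(d))
```

Keeping the violated criteria as a list, rather than returning only a boolean, makes it easy to see which property would need to change when a library member is rejected.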
Functional group filters
A different approach is to identify functional groups that tend to be undesirable because of chemical reactivity, metabolic lability, and so forth. Rishton [9] discusses chemistry guidelines for the elimination of compounds such as alkylating or acylating agents, which tend to appear as false positives in biochemical screens. Specifically, a set of approximately 25 functional groups is described that are prone to solvolysis or hydrolysis or that tend to react with biological nucleophiles. Walters et al. [10] briefly described an approach, REOS (rapid elimination of swill), to eliminate undesirable reagents and products from screening and combinatorial libraries. REOS is a hybrid method that combines some simple counting schemes similar to those in the rule of five with a set of functional group filters to remove reactive and otherwise undesirable moieties. The authors claim that for large (10^6-10^9) libraries, it is typically possible to remove ≥ 99.9% of the compounds at a rate of approximately 10^5 compounds per hour per processor.
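A hybrid filter of the kind described for REOS, property limits combined with functional group exclusions, might be sketched as follows. This is not the REOS implementation: the property limits are simply borrowed from the rule of five, the group labels are hypothetical tags assumed to have been assigned beforehand by a substructure search, and the compounds are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Placeholder upper limits taken from the rule of five; the real method may
# use different properties and ranges.
MAX_VALUES: Dict[str, float] = {
    "molecular_weight": 500.0,
    "log_p": 5.0,
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
}

# Hypothetical labels for moieties to exclude (assigned upstream).
UNWANTED_GROUPS: FrozenSet[str] = frozenset(
    {"alkylating_agent", "acylating_agent"}
)


@dataclass(frozen=True)
class Compound:
    name: str
    molecular_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int
    functional_groups: FrozenSet[str] = frozenset()


def rejection_reasons(c: Compound) -> List[str]:
    """Collect every property limit and group filter that the compound fails."""
    reasons = [
        f"{prop} above {limit}"
        for prop, limit in MAX_VALUES.items()
        if getattr(c, prop) > limit
    ]
    reasons += [
        f"contains {group}"
        for group in sorted(c.functional_groups & UNWANTED_GROUPS)
    ]
    return reasons


def filter_library(
    library: List[Compound],
) -> Tuple[List[Compound], Dict[str, List[str]]]:
    """Split a library into kept compounds and rejected names with reasons."""
    kept: List[Compound] = []
    rejected: Dict[str, List[str]] = {}
    for c in library:
        reasons = rejection_reasons(c)
        if reasons:
            rejected[c.name] = reasons
        else:
            kept.append(c)
    return kept, rejected


if __name__ == "__main__":
    library = [  # invented examples
        Compound("cpd-1", 320.0, 2.1, 1, 4),
        Compound("cpd-2", 310.0, 1.8, 1, 3, frozenset({"acylating_agent"})),
        Compound("cpd-3", 720.0, 6.5, 2, 8),
    ]
    kept, rejected = filter_library(library)
    print([c.name for c in kept])
    print(rejected)
```

Because each compound is examined independently, a filter of this shape parallelizes trivially across processors, which is consistent with the per-processor throughput quoted for large libraries.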
Prediction of oral bioavailability
Oral bioavailability of a drug can be defined as the fraction of the oral dose that reaches systemic circulation. Reaching systemic circulation is influenced by both absorption and first-pass metabolism in the liver or gut wall. It is also possible for drugs to be highly bound to plasma proteins, thus resulting in low circulating levels. Lipophilicity and solubility are two important determinants of the extent and rate of absorption of molecules [11,12]. Lipophilicity influences both metabolic activity [13] and plasma protein binding [14]. Interestingly, the effects of lipophilicity on membrane penetration and on first-pass metabolism appear to act in opposite directions on oral bioavailability. It is important to note that correlation with lipophilicity does not imply predictivity. Regression-type models have been used to model or predict oral bioavailability and in vivo (in situ perfusion) and in vitro (Caco-2 cells) permeability. These approaches use either theoretically calculated or experimentally obtained descriptors relating to logP, pKa, electrostatic interactions, polar surface area, ΔlogP (i.e. the difference in the partition coefficient between a polar solvent such as diethyl ether and a nonpolar solvent such as isooctane), and so on. Recent methods introduced by Sugawara et al. [15] and Winiwarter et al. [16] provide excellent examples of the types of models that can be built. Other approaches along similar lines have also appeared [17,18]. A major unsolved problem with regression approaches is that it is not evident whether or not a prediction is applicable to a new series of compounds. An entirely different approach to bioavailability prediction has been taken by Amidon and co-workers [19]. This is a dynamic and phenomenological method where time is accounted for explicitly in the mathematical formulation. The authors found that a seven-compartmental small intestine model worked well in characterizing the compounds they studied. Explicit knowledge of the effective permeability (a measure of in situ absorption) of the drug is required, however. This is not a high-throughput method.

Chemistry space methods
Several research groups have attempted to define the chemistry space [20,21] that is occupied by drug-like molecules. The basic idea is that drugs will tend to possess distinct values for certain properties, and as a result, when analyzed in high-dimensional space, drugs will be shown to be distinct from nondrugs. A chemistry space is typically defined by calculating a number of descriptors for each molecule and using the descriptor values as points in multidimensional space. As an example, let us assume that we have calculated molecular weight, logP and the number of hydrogen bond donors for a set of molecules. These three descriptor values can then be used to define a point in a three-dimensional space that represents each molecule. In practice, large numbers (20-100) of descriptors are calculated and statistical techniques such as principal components or factor analysis [22] are used to reduce the dimensionality of the descriptor space. Cummins et al. [23] compared five databases: Comprehensive Medicinal Chemistry (CMC; Molecular Design Ltd, San Leandro, CA), MACCS-II Drug Report (MDDR; Molecular Design Ltd), Available Chemical Directory (ACD; Molecular Design Ltd), the SPECS/BioSPECS database (Specs and BioSPECS, Rijswijk, The Netherlands), and their in-house Wellcome registry. They calculated 28 topological indices, as well as an estimate of the free energy of solvation, for 300,000 compounds. Factor analysis was used to reduce the descriptor space to four dimensions. The descriptor space was then partitioned and the occupancy of the resulting sub-hypercubes was examined. The percentages of the total volume occupied by the databases were 27% (CMC), 72% (Wellcome registry), 69% (MDDR), 46% (SPECS) and 72% (ACD). The authors also found a 92% overlap between CMC and ACD. Thus, although the method may be used to identify interesting regions of space, it may not by itself be an effective discriminator between drugs and nondrugs.

Gillet et al. [24] used profiles of calculated properties (numbers of hydrogen bond donors and acceptors, molecular weight, rotatable bonds, aromatic rings, and a shape descriptor) to differentiate between a set of drugs represented by 14,861 compounds from the WDI and a set of nondrugs represented by 16,807 compounds from the SPRESI database (Daylight Chemical Information Systems, Mission Viejo, CA). A genetic algorithm was used to derive a set of optimal weights for the properties. The best weighting schemes were able to provide a five- to sixfold enhancement over random selection. The authors were also able to achieve similar results using property profiles to identify drugs belonging to a specific therapeutic class from a larger drug database.

A Chiron group [25] established a chemistry space using logP, principal components analysis of 81 topological indices [26], chemical functionality descriptors derived from multidimensional scaling [27] of Tanimoto similarities [28] and atom layer tables [29]. Substituents were selected using D-optimal design [30]. A list of criteria used to eliminate unacceptable candidate amines was also included.
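The partitioning step used in chemistry space comparisons, where a reduced descriptor space is cut into sub-hypercubes and their occupancy is examined, can be illustrated with a small sketch. The definitions of occupancy and overlap below are assumptions made for illustration (the fraction of cells containing at least one compound, and the fraction of one set's occupied cells that the other set also occupies); the published analysis may define them differently, and the coordinates are invented.

```python
from __future__ import annotations

from typing import Iterable, Sequence, Set, Tuple

Cell = Tuple[int, ...]


def cell_index(point: Sequence[float], lower: Sequence[float],
               upper: Sequence[float], bins: int) -> Cell:
    """Map a point to the index of the sub-hypercube that contains it."""
    index = []
    for x, lo, hi in zip(point, lower, upper):
        i = int((x - lo) / (hi - lo) * bins)
        # Clamp so that points on or beyond the bounds fall in an edge cell.
        index.append(min(max(i, 0), bins - 1))
    return tuple(index)


def occupied_cells(points: Iterable[Sequence[float]], lower: Sequence[float],
                   upper: Sequence[float], bins: int) -> Set[Cell]:
    """Set of cells that contain at least one compound."""
    return {cell_index(p, lower, upper, bins) for p in points}


def occupancy(cells: Set[Cell], n_dims: int, bins: int) -> float:
    """Fraction of all cells in the partitioned space that are occupied."""
    return len(cells) / bins ** n_dims


def overlap(a: Set[Cell], b: Set[Cell]) -> float:
    """Fraction of the cells occupied by `a` that are also occupied by `b`."""
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    # Invented two-dimensional coordinates standing in for factor scores.
    lower, upper, bins = (0.0, 0.0), (1.0, 1.0), 2
    drugs = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9)]
    reagents = [(0.1, 0.2), (0.1, 0.9), (0.8, 0.2), (0.7, 0.7)]
    drug_cells = occupied_cells(drugs, lower, upper, bins)
    reagent_cells = occupied_cells(reagents, lower, upper, bins)
    print(occupancy(drug_cells, 2, bins), occupancy(reagent_cells, 2, bins))
    print(overlap(drug_cells, reagent_cells))
```

In this toy case every cell occupied by the drug set is also occupied by the reagent set, which mirrors the difficulty noted above: a high overlap means that cell occupancy alone says little about whether a compound is a drug.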
Examination of building blocks in known drugs
A very different approach is to analyze the building blocks commonly found in drugs to see whether nonrandom patterns can be unearthed. This work does not directly confront the problem of distinguishing drugs from nondrugs, but it helps to define what drugs are and thereby helps chemists to think about preferred moieties for library design. Bemis and Murcko [31] examined 5,120 compounds from the CMC database and found 1,179 frameworks, or scaffolds. This suggests that drugs are rather diverse. When considering just topology, however, only 32 frameworks described the shapes of half the drugs in the set. Even when atom types and hybridization are considered, 25% of all drugs are found to utilize only 42 frameworks. These surprising results suggest that a small number of common shape themes can be re-used in widely divergent drug design situations. Ghose et al. [32] characterized the CMC database based on computed physicochemical property profiles (logP, molar refractivity, molecular weight, and number of atoms). They established qualifying ranges, which cover more than 80% of the compounds. They also examined commonly occurring functional groups. Not surprisingly, benzene was the most common, with a frequency approximately equal to that of all aromatic heterocycles combined. Nonaromatic heterocycles were more common than aromatic heterocycles by approximately twofold. Tertiary amines, alcohols and carboxamides were the most frequently occurring functional groups.

Neural network methods
Neural networks [33] have long been used in classification schemes, but less frequently in pharmaceutical applications; however, two papers appeared in 1998 that described the successful employment of different neural network approaches to distinguish drugs from nondrugs. Ajay et al. [34] used a Bayesian neural network. The network was trained using a random partition of 3,500 compounds each from the CMC and ACD databases. Two kinds of descriptors were used: a set of seven one-dimensional and 166 two-dimensional descriptors. The program was able to correctly classify 90% of the CMC compounds and misclassified only 10% of the ACD molecules. The generalizability of the method was demonstrated by the program's ability to correctly classify 80% of the compounds from the MDDR. Appearing back-to-back with Ajay et al. [34] was a contribution from Sadowski and Kubinyi [35]. Those researchers developed a feed-forward neural network method for discriminating drugs from nondrugs. They used 38,416 molecules from the WDI database as the drug set and 169,331 molecules from the ACD as the nondrug set. The program was able to correctly classify 83% of the ACD compounds and 77% of the WDI compounds.

Conclusions and future directions
As we have shown, a wide variety of methods have already been applied to the problem of identifying molecules with desirable or drug-like properties. These methods appear to be meeting with some success. A key issue is whether general (i.e. global) rules can be formulated, or whether rules will always need to be local and situation-specific. The publications by Ajay et al. [34] and Sadowski and Kubinyi [35] suggest that general rules with reasonable predictive power can be formulated. Another trend we may witness in coming years might be attempts to predict the various properties that contribute to a drug's success, rather than the more complex problem of drug-likeness itself. These might include oral absorption, blood-brain barrier penetration, toxicity, metabolism, aqueous solubility, logP, pKa, half-life, and plasma protein binding. Some of these properties are themselves rather complex and are likely to be extremely difficult to model, but in our view it should be possible for the majority of properties to be predicted with better-than-random accuracy. Future work is likely to include additional approaches and more robust attempts at validation of these methods. Also, one hopes that the judicious use of these predictions may lead to increased efficiency in the selection of combinatorial and HTS libraries. We are probably still several years away from a definitive experiment proving this point, however. Further off, in all likelihood, will be the ability to predict downstream issues pertaining to formulation, manufacturing, shelf-life, chemical stability, and so forth. These too are critical for the success of a drug [36].
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Gordon EM: Libraries of non-polymeric organic molecules. Curr Opin Biotechnol 1995, 6:624-631.
2. Dolle RE: Discovery of enzyme inhibitors through combinatorial chemistry. Mol Divers 1997, 2:223-226.
3. Brown D: Future pathways for combinatorial chemistry. Mol Divers 1997, 2:217-222.
4. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery. Adv Drug Deliv Rev 1997, 23:3-25.
5. • Spencer RW: High-throughput screening of historic collections: observations on file size, biological targets, and file diversity. Biotechnol Bioeng 1996, 61:61-67.
This work provides an analysis of more than 150 high-throughput screens that were carried out at Pfizer Central Research. The authors compared hit rates for enzyme, cytokine and receptor targets. They evaluated the impact of clustering and diversity analysis on a screen for substance P antagonists.
6. Navia MA, Chaturvedi PR: Design principles for orally bioavailable drugs. Drug Discov Today 1996, 1:179-189.
7. Chan OH, Stewart BH: Physicochemical and drug-delivery considerations for oral drug bioavailability. Drug Discov Today 1996, 1:461-473.
8. Fecik RA, Frank KE, Gentry EJ, Menon SR, Mitscher LA, Telikepalli H: The search for orally active medications through combinatorial chemistry. Med Res Rev 1998, 18:149-185.
9. Rishton GM: Reactive compounds and in vitro false positives in HTS. Drug Discov Today 1997, 2:382-385.
10. Walters WP, Stahl MT, Murcko MA: Virtual screening - an overview. Drug Discov Today 1998, 3:160-178.
11. Schanker LS: On the mechanism of absorption from the gastrointestinal tract. J Med Pharm Chem 1960, 2:343-346.
12. Leahy DE, Lynch J, Taylor DC: Mechanisms of absorption of small molecules. Edited by Prescott LF, Nimmo WS. New York: John Wiley & Sons; 1989.
13. Seydel JK, Schaper KJ: Quantitative Structure-Pharmacokinetic Relationships in Drug Design. Edited by Rowland M, Tucker G. New York: Pergamon Press; 1986.
14. Sawada GA, Barshun CL, Lutzke BS, Houghton ME, Padbury GW, Ho NFH, Raub TJ: Increased lipophilicity and subsequent cell partitioning decrease passive transcellular diffusion of novel highly lipophilic antioxidants. J Pharm Exptl Ther 1999, 288:1317-1326.
15. • Sugawara M, Takekuma Y, Yamada H, Kobayashi M, Iseki K, Miyazaki K: A general approach for the prediction of the intestinal absorption of drugs: regression analysis using the physicochemical properties and drug-membrane electrostatic interactions. J Pharm Sci 1998, 87:960-966.
Experimentally determined log D values in octanol, diethyl ether, chloroform and isooctane were used in different combinations to model the rat jejunal permeability of 32 drugs. Reasonable models could be developed for anionic, cationic and nonionized compounds. Predictions for an external set of 10 compounds (including some zwitterionic compounds) were also reasonable.
16. • Winiwarter S, Bonham NM, Ax F, Hallberg A, Lennernas H, Karlen A: Correlation of human jejunal permeability (in vivo) of drugs with experimentally and theoretically derived parameters. A multivariate data analysis approach. J Med Chem 1998, 41:4939-4949.
In vivo human jejunal permeability of 22 structurally diverse compounds was correlated with experimentally determined log D (log P) values and calculated structural parameters. The best model used log D, number of hydrogen bond donors (HBD) and polar surface area (PSA); however, models using calculated log P, HBD and PSA, and just HBD and PSA, were close to the best. Reasonable predictivity was seen on an external validation set of 24 compounds where data on oral bioavailability was available. It is important to note that some of the actively transported molecules were under-predicted by the models.
17. Stenberg P, Luthman K, Artursson P: Prediction of membrane permeability to peptides from calculated dynamic molecular surface properties. Pharm Res 1999, 16:205-212.
18. Wessel MD, Jurs PC, Tolan JW, Muskal SM: Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inform Comp Sci 1998, 38:726-735.
19. Yu LX, Lipka E, Crison JR, Amidon GL: Transport approaches to the biopharmaceutical design of oral drug delivery systems: prediction of intestinal absorption. Adv Drug Deliv Rev 1996, 19:359-376.
20. Pearlman RS, Smith KM: Novel software tools for chemical diversity. Persp Drug Discov Design 1998, 9:339-353.
21. Pearlman RS, Smith KM: Metric validation and the receptor-relevant subspace concept. J Chem Inform Comp Sci 1999, 39:28-35.
22. Cooley W, Lohnes P: Multivariate Data Analysis. New York: Wiley; 1971.
23. Cummins DJ, Andrews CW, Bentley JA, Cory M: Molecular diversity in chemical databases: comparison of medicinal chemistry knowledge bases and databases of commercially available compounds. J Chem Inform Comp Sci 1996, 36:750-763.
24. • Gillet VJ, Willett P, Bradshaw J: Identification of biological activity profiles using substructural analysis and genetic algorithms. J Chem Inform Comp Sci 1998, 38:165-179.
The authors used profiles of calculated properties (numbers of hydrogen bond donors and acceptors, molecular weight, rotatable bonds, aromatic rings, and a shape descriptor) to differentiate between a set of drugs represented by 14,861 compounds from the World Drug Index and a set of nondrugs represented by 16,807 compounds from the SPRESI database. A genetic algorithm was used to derive a set of optimal weights for the properties. The best weighting schemes were able to provide a five- to sixfold enhancement over random selection. The authors were also able to achieve similar results using property profiles to identify drugs belonging to a specific therapeutic class from a larger drug database.
25. • Martin EJ, Critchlow RE: Beyond mere diversity: tailoring combinatorial libraries for drug discovery. J Comb Chem 1999, 1:32-45.
The authors present an overview of methods used at Chiron for combinatorial library design and analysis. The paper focuses on a number of techniques used to ensure that the molecules produced are diverse and possess desirable properties.
26. Kier LB, Hall LH: Molecular Connectivity in Structure-Activity Analysis. New York: Wiley; 1986.
27. Torgerson WS: Multi-dimensional scaling. 1. Theory and methods. Psychometrika 1952, 17:401-419.
28. Willett P, Barnard JM, Downs GM: Chemical similarity searching. J Chem Inform Comp Sci 1998, 38:983-996.
29. Martin EJ, Blaney JM, Siani MA: Measuring diversity: experimental design of combinatorial libraries for drug discovery. J Med Chem 1995, 38:1431-1436.
30. Miller A, Nguyen N-K: A Fedorov exchange algorithm for D-optimal design. Appl Stat 1994, 43:669-678.
31. Bemis GW, Murcko MA: The properties of known drugs. 1. Molecular frameworks. J Med Chem 1996, 39:2887-2893.
32. • Ghose AK, Viswanadhan VN, Wendoloski JJ: A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative characterization of known drug databases. J Comb Chem 1999, 1:55-67.
The authors characterized the CMC database based on computed physicochemical property profiles (log P, molar refractivity, molecular weight, and number of atoms). They established qualifying ranges, which cover more than 80% of the compounds. They also examined commonly occurring functional groups. They found that benzene was most common, with a frequency approximately equal to that of all aromatic heterocycles combined. Nonaromatic heterocycles were more common than aromatic (approximately twofold). Tertiary amines, alcohols and carboxamides were the most frequently occurring functional groups.
33. Hertz J, Krogh A, Palmer RG: Introduction to the Theory of Neural Computation. Redwood City, CA: Addison Wesley; 1991.
34. • Ajay, Walters WP, Murcko MA: Can we learn to distinguish between drug-like and nondrug-like molecules? J Med Chem 1998, 41:3314-3324.
The authors used a Bayesian neural network to distinguish between drugs and nondrugs. The network was trained using a random partition of 3,500 compounds each from CMC and ACD, with a set of seven 1D and 166 2D descriptors. The program was able to correctly classify 90% of the CMC compounds, and misclassified only 10% of the ACD molecules. The generalizability of the method was demonstrated by the program's ability to correctly classify 80% of the compounds from the MDDR.
35. • Sadowski J, Kubinyi H: A scoring scheme for discriminating between drugs and nondrugs. J Med Chem 1998, 41:3325-3329.
The authors developed a neural network method for discriminating drugs and nondrugs; they used 38,416 molecules from the WDI as the drug set and 169,331 molecules from the ACD as the nondrug set. A set of atom types originally developed for log P prediction was used as descriptors. A feed-forward neural network was trained to classify the compounds. The program was able to correctly classify 83% of the ACD compounds and 77% of the WDI compounds.
36. Streng WH: Physical chemical characterization of drug substances. Drug Discov Today 1997, 2:415-426.
