
Recognizing molecules with drug-like properties
W Patrick Walters, Ajay and Mark A Murcko
A variety of successful approaches to the problem of recognizing drug-like molecules have been employed. These range from simple counting schemes such as the Lipinski rule of five to the analysis of the multidimensional chemistry space occupied by drugs, to neural network learning systems. With this variety of tools, it now appears possible to design libraries that are enriched in compounds which have desirable or drug-like properties. Verifying the robustness of these methods, and extending them, will form the basis of research in this field during the next few years.

in the range of 1 per 100,000 compounds screened for easier targets such as enzymes, and much worse for harder targets such as protein-protein interactions [5]. As a consequence, many researchers have begun to pay closer attention to the nature of the compounds synthesized and screened. This process is sometimes referred to as recognizing drug-like molecules. In this brief review, we will point out some recent publications in this field, and suggest some future directions that this field may take.

Address
Vertex Pharmaceuticals, 130 Waverly Street, Cambridge, MA 02139, USA

Current Opinion in Chemical Biology 1999, 3:384-387

http://biomednet.com/elecref/1367593100300384
© Elsevier Science Ltd ISSN 1367-5931

Abbreviations
ACD   Available Chemical Directory
CMC   Comprehensive Medicinal Chemistry
MDDR  MACCS-II Drug Report
WDI   World Drug Index

Simple counting methods to predict drug-likeness

Many researchers over the years have attempted to show that drug-like molecules tend to have certain properties. For example, logP (where P is the partition coefficient), molecular weight, and the number of hydrogen bonding groups have been correlated with oral bioavailability [6,7]. In principle, then, one should be able to improve the odds of success very simply by biasing a combinatorial library towards compounds that have certain properties. Recently, researchers at Pfizer [4] have extended this idea with the establishment of the rule of five, a heuristic guide for determining whether a compound will be orally bioavailable. The rules were derived from analysis of 2,245 compounds from the World Drug Index (WDI; Derwent Information, London, UK) which have a USAN (United States adopted name) or INN (international nonproprietary name) and an entry in the indications and usage field of the database. The assumption is that compounds meeting these criteria have entered human clinical trials, and therefore must possess many of the desirable characteristics of drugs. It was found that in a high percentage of compounds, the following rules were true: hydrogen bond donors ≤ 5; hydrogen bond acceptors ≤ 10; relative molecular weight ≤ 500; and logP ≤ 5. The majority of the violations came from antibiotics, antifungals, vitamins and cardiac glycosides. The authors suggest that these classes of compounds are orally bioavailable, despite their violations of the rule of five, due to the presence of functional groups that act as substrates for transporters.

The application of simple counting schemes to combinatorial library design is obvious. For example, Fecik et al. [8] performed an analysis of a large number of combinatorial libraries in terms of the weight of the scaffold and the average weight of substituents which are necessary to arrive at products with relative molecular weights of 500.
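The four thresholds quoted above lend themselves to a very small counting routine. The following is a minimal sketch, assuming the descriptor values have already been computed by some other tool; the `Descriptors` class, the function names and the `max_violations` parameter are hypothetical and are not taken from the Pfizer work.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (hypothetical container)."""
    h_bond_donors: int
    h_bond_acceptors: int
    mol_weight: float
    logp: float


def rule_of_five_violations(d):
    """Return the names of the rule-of-five criteria that the compound breaks."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    if d.mol_weight > 500:
        violations.append("relative molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    return violations


def passes_rule_of_five(d, max_violations=0):
    """Accept a compound with at most max_violations broken criteria.

    The default of 0 is an assumption made for this sketch; the text does
    not say how many violations should be tolerated.
    """
    return len(rule_of_five_violations(d)) <= max_violations
```

A library could then be biased simply by keeping the compounds for which `passes_rule_of_five` returns True. As the text notes, transporter substrates such as some antibiotics would be rejected by such a filter despite being orally bioavailable.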

Introduction
With the advent of high-throughput chemistry and enzymology, some researchers in the early 1990s took the position that simply throwing more compounds at a drug discovery problem would increase the odds of success. Drug companies now routinely assay several hundred thousand compounds against each new drug target, and the size of the typical screening library is soon expected to approach a million compounds. Likewise, the number of compounds that can be synthesized in one year by a dedicated combinatorial chemist can now routinely be in the range of 10,000-100,000 or more [1-3]. Anecdotal evidence from a variety of research labs suggests, however, that raw speed and sheer numbers are not sufficient to crack the problem of drug discovery. The utility of first-generation combinatorial libraries has generally been considered to be quite low because these libraries tend to be populated with large, lipophilic, highly flexible molecules (MA Gallop, The Second Lake Tahoe Symposium on Molecular Diversity, Tahoe City, CA, January 1998; CB Cooper, National Managed Health Care Conference, Boston, MA, May 1997). Support for this thesis comes from Lipinski et al. [4], who analyzed the compounds synthesized at Pfizer between 1984 and 1994 and showed that the number of compounds with a relative molecular weight greater than 500 doubled over the 10 year period. We should also remember that the number of high-quality lead molecules to be derived from high-throughput screening (HTS) is typically quite low, perhaps

Functional group filters

A different approach is to identify functional groups that tend to be undesirable because of chemical reactivity, metabolic lability, and so forth. Rishton [9] discusses chemistry guidelines for the elimination of compounds such as alkylating or acylating agents, which tend to appear as false positives in biochemical screens. Specifically, a set of approximately 25 functional groups is described that are prone to solvolysis or hydrolysis or that tend to react with biological nucleophiles. Walters et al. [10] briefly described an approach, REOS (rapid elimination of swill), to eliminate undesirable reagents and products from screening and combinatorial libraries. REOS is a hybrid method that combines some simple counting schemes similar to those in the rule of five with a set of functional group filters to remove reactive and otherwise undesirable moieties. The authors claim that for large (10^6-10^9) libraries, it is typically possible to remove ≥ 99.9% of the compounds at a rate of approximately 10^5 compounds per hour per processor.
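The hybrid idea behind such filters (a property window plus a list of disallowed functional groups) can be illustrated with a short sketch. It is not the REOS program: the compound records, the property limits and the two group labels below are all assumptions made for illustration, and functional groups are assumed to have been identified beforehand by a separate substructure search.

```python
# Illustrative labels only; not the list of groups discussed by Rishton.
REACTIVE_GROUPS = frozenset({"acyl_halide", "alkyl_halide"})


def hybrid_filter(compounds, max_mol_weight=500.0, max_logp=5.0,
                  reactive_groups=REACTIVE_GROUPS):
    """Keep compounds that sit inside the property window and carry no flagged group.

    Each compound is a dict with the hypothetical keys
    'name', 'mol_weight', 'logp' and 'groups' (an iterable of group labels).
    """
    kept = []
    for compound in compounds:
        # Counting step: reject on simple property limits.
        if compound["mol_weight"] > max_mol_weight or compound["logp"] > max_logp:
            continue
        # Functional group step: reject if any flagged moiety is present.
        if reactive_groups & set(compound["groups"]):
            continue
        kept.append(compound)
    return kept
```

Applying the cheap property test before the group test is a design choice of this sketch; either order gives the same result.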

Chemistry space methods

Several research groups have attempted to define the chemistry space [20,21] that is occupied by drug-like molecules. The basic idea is that drugs will tend to possess distinct values for certain properties, and as a result, when analyzed in high-dimensional space, drugs will be shown to be distinct from nondrugs. A chemistry space is typically defined by calculating a number of descriptors for each molecule and using the descriptor values as points in multidimensional space. As an example, let us assume that we have calculated molecular weight, logP and the number of hydrogen bond donors for a set of molecules. These three descriptor values can then be used to define a point in a three-dimensional space that represents each molecule. In practice, large numbers (20-100) of descriptors are calculated and statistical techniques such as principal components or factor analysis [22] are used to reduce the dimensionality of the descriptor space.

Cummins et al. [23] compared five databases: Comprehensive Medicinal Chemistry (CMC; Molecular Design Ltd, San Leandro, CA), MACCS-II Drug Report (MDDR; Molecular Design Ltd), Available Chemical Directory (ACD; Molecular Design Ltd), the SPECS/BioSPECS database (Specs and BioSPECS, Rijswijk, The Netherlands), and their in-house Wellcome registry. They calculated 28 topological indices, as well as an estimate of the free energy of solvation, for 300,000 compounds. Factor analysis was used to reduce the descriptor space to four dimensions. The descriptor space was then partitioned and the occupancy of the resulting sub-hypercubes was examined. The percentages of the total volume occupied by the databases were 27% (CMC), 72% (Wellcome registry), 69% (MDDR), 46% (SPECS) and 72% (ACD). The authors also found a 92% overlap between CMC and ACD. Thus, although the method may be used to identify interesting regions of space, it may not by itself be an effective discriminator between drugs and nondrugs.

Gillet et al. [24] used profiles of calculated properties (numbers of hydrogen bond donors and acceptors, molecular weight, rotatable bonds, aromatic rings, and a shape descriptor) to differentiate between a set of drugs represented by 14,861 compounds from the WDI and a set of nondrugs represented by 16,807 compounds from the SPRESI database (Daylight Chemical Information Systems, Mission Viejo, CA). A genetic algorithm was used to derive a set of optimal weights for the properties. The best weighting schemes were able to provide a five- to sixfold enhancement over random selection. The authors were also able to achieve similar results using property profiles to identify drugs belonging to a specific therapeutic class from a larger drug database.

A Chiron group [25] established a chemistry space using logP, principal components analysis of 81 topological indices [26], chemical functionality descriptors derived from multidimensional scaling [27] of Tanimoto similarities

Prediction of oral bioavailability

Oral bioavailability of a drug can be defined as the fraction of the oral dose that reaches systemic circulation. Reaching systemic circulation is influenced by both absorption and first-pass metabolism in the liver or gut wall. It is also possible for drugs to be highly bound to plasma proteins, thus resulting in low circulating levels. Lipophilicity and solubility are two important determinants of the extent and rate of absorption of molecules [11,12]. Lipophilicity influences both metabolic activity [13] and plasma protein binding [14]. Interestingly, the effects of lipophilicity on membrane penetration and on first-pass metabolism appear to act in opposite directions on oral bioavailability. It is important to note that correlation with lipophilicity does not imply predictivity.

Regression-type models have been attempted to model or predict oral bioavailability and in vivo (in situ perfusion) and in vitro (Caco-2 cells) permeability. These approaches use either theoretically calculated or experimentally obtained descriptors relating to logP, pKa, electrostatic interactions, polar surface area, ΔlogP (i.e. the difference in the partition coefficient between a polar solvent such as diethyl ether and a nonpolar solvent such as isooctane), and so on. Recent methods introduced by Sugawara et al. [15] and Winiwarter et al. [16] provide excellent examples of the types of models that can be built. Other approaches along similar lines have also appeared [17,18]. A major unsolved problem with regression approaches is that it is not evident whether or not a prediction is applicable to a new series of compounds.

An entirely different approach to bioavailability prediction has been taken by Amidon and co-workers [19]. This is a dynamic and phenomenological method where time is accounted for explicitly in the mathematical formulation. The authors found that a seven-compartmental small intestine model worked well in characterizing the compounds they studied. Explicit knowledge of the effective permeability (a measure of in situ absorption) of the drug is required, however. This is not a high-throughput method.
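To make the idea of a time-explicit compartmental description concrete, the sketch below integrates a chain of seven well-mixed compartments in which the drug is either absorbed or passed on to the next compartment. It is a deliberately simplified illustration and not the published model: first-order kinetics, a single absorption rate constant `ka` (which in practice would have to be derived from the measured effective permeability), a single transit rate constant `kt`, and explicit Euler time stepping are all assumptions of this sketch.

```python
def fraction_absorbed(ka, kt, n_compartments=7, dt=0.001, t_end=50.0):
    """Integrate a linear transit/absorption chain and return the absorbed fraction.

    ka: first-order absorption rate constant (1/h), assumed equal in all compartments
    kt: first-order transit rate constant (1/h) between neighbouring compartments
    The whole dose (normalised to 1.0) starts in the first compartment.
    """
    amounts = [0.0] * n_compartments
    amounts[0] = 1.0
    absorbed = 0.0
    for _ in range(int(t_end / dt)):
        inflow = 0.0
        updated = []
        for amount in amounts:
            outflow = kt * amount * dt   # passed on to the next compartment
            uptake = ka * amount * dt    # absorbed across the gut wall
            updated.append(amount + inflow - outflow - uptake)
            absorbed += uptake
            inflow = outflow             # feeds the next compartment in the chain
        amounts = updated
    return absorbed


def fraction_absorbed_closed_form(ka, kt, n_compartments=7):
    """Limit of the same chain at long times: 1 - (1 + ka/kt) ** -n."""
    return 1.0 - (1.0 + ka / kt) ** (-n_compartments)
```

For this simple chain the long-time limit has the closed form shown in the second function, which gives a convenient check on the numerical integration. The need for a measured permeability for every compound is what keeps this kind of model from being high-throughput.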


[28] and atom layer tables [29]. Substituents were selected using D-optimal design [30]. A list of criteria used to eliminate unacceptable candidate amines was also included.
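The partitioning analysis used in chemistry space studies (dividing a reduced descriptor space into sub-hypercubes and examining which of them are occupied) can be sketched in a few lines. The sketch assumes that each molecule has already been reduced to a short tuple of coordinates, for example factor scores; the function names, the equal-width binning and the definition of overlap are assumptions for illustration and may differ from the published procedure.

```python
def cell_of(point, lows, highs, bins):
    """Map a point to the index tuple of the sub-hypercube that contains it.

    Assumes highs[i] > lows[i] on every axis; points outside the range are
    clamped into the outermost cells.
    """
    index = []
    for x, lo, hi in zip(point, lows, highs):
        i = int((x - lo) / (hi - lo) * bins)
        index.append(max(0, min(i, bins - 1)))
    return tuple(index)


def occupied_cells(points, lows, highs, bins):
    """Set of sub-hypercubes that contain at least one molecule."""
    return {cell_of(p, lows, highs, bins) for p in points}


def occupancy_fraction(cells, bins, n_dims):
    """Fraction of the total number of cells that are occupied."""
    return len(cells) / bins ** n_dims


def overlap_fraction(cells_a, cells_b):
    """Fraction of the cells occupied by set A that are also occupied by set B."""
    if not cells_a:
        return 0.0
    return len(cells_a & cells_b) / len(cells_a)
```

A high overlap between a drug database and a reagent catalogue, as reported for CMC and ACD, would show up here as an `overlap_fraction` close to 1, which is why occupancy alone is a weak discriminator.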

method for discriminating drugs from nondrugs. They used 38,416 molecules from the WDI database as the drug set and 169,331 molecules from the ACD as the nondrug set. The program was able to correctly classify 83% of the ACD compounds and 77% of the WDI compounds.

Examination of building blocks in known drugs

A very different approach is to analyze the building blocks commonly found in drugs to see whether nonrandom patterns can be unearthed. This work does not directly confront the problem of distinguishing drugs from nondrugs, but it helps to define what drugs are and thereby helps chemists to think about preferred moieties for library design. Bemis and Murcko [31] examined 5,120 compounds from the CMC database and found 1,179 frameworks, or scaffolds. This suggests that drugs are rather diverse. When considering just topology, however, only 32 frameworks described the shapes of half the drugs in the set. Even when atom types and hybridization are considered, 25% of all drugs are found to utilize only 42 frameworks. These surprising results suggest that a small number of common shape themes can be re-used in widely divergent drug design situations.

Ghose et al. [32] characterized the CMC database based on computed physicochemical property profiles (log P, molar refractivity, molecular weight, and number of atoms). They established qualifying ranges, which cover more than 80% of the compounds. They also examined commonly occurring functional groups. Not surprisingly, benzene was the most common, with a frequency approximately equal to that of all aromatic heterocycles combined. Nonaromatic heterocycles were more common than aromatic by approximately twofold. Tertiary amines, alcohols and carboxamides were the most frequently occurring functional groups.
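The kind of statement made above (a small number of frameworks accounting for half of the drugs) comes down to a cumulative frequency count. The sketch below assumes that a framework label has already been assigned to every compound by some other procedure; the function name and the way frameworks are represented as plain strings are assumptions for illustration.

```python
from collections import Counter


def frameworks_needed(frameworks, coverage=0.5):
    """Smallest number of most-frequent frameworks that cover the given fraction.

    frameworks: one framework label per compound (labels may repeat)
    coverage:   fraction of compounds to be accounted for, between 0 and 1
    """
    counts = Counter(frameworks)
    target = coverage * len(frameworks)
    covered = 0
    # Walk through the frameworks from most to least common.
    for n_used, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return n_used
    return len(counts)
```

With one label per compound, calling `frameworks_needed(labels, 0.5)` answers the question of how many frameworks describe half of the set.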

Conclusions and future directions

As we have shown, a wide variety of methods have already been applied to the problem of identifying molecules with desirable or drug-like properties. These methods appear to be meeting with some success. A key issue is whether general (i.e. global) rules can be formulated, or whether rules will always need to be local and situation-specific. The publications by Ajay et al. [34] and Sadowski and Kubinyi [35] suggest that general rules with reasonable predictive power can be formulated. Another trend we may witness in coming years might be attempts to predict the various properties that contribute to a drug's success, rather than the more complex problem of drug-likeness itself. These might include oral absorption, blood-brain barrier penetration, toxicity, metabolism, aqueous solubility, logP, pKa, half-life, and plasma protein binding. Some of these properties are themselves rather complex and are likely to be extremely difficult to model, but in our view it should be possible for the majority of properties to be predicted with better-than-random accuracy. Future work is likely to include additional approaches and more robust attempts at validation of these methods. Also, one hopes that the judicious use of these predictions may lead to increased efficiency in the selection of combinatorial and HTS libraries. We are probably still several years away from a definitive experiment proving this point, however. Further off, in all likelihood, will be the ability to predict downstream issues pertaining to formulation, manufacturing, shelf-life, chemical stability, and so forth. These too are critical for the success of a drug [36].

Neural network methods

Neural networks [33] have long been used in classification schemes, but less frequently in pharmaceutical applications; however, two papers appeared in 1998 that described the successful employment of different neural network approaches to distinguish drugs from nondrugs. Ajay et al. [34] used a Bayesian neural network. The network was trained using a random partition of 3,500 compounds each from the CMC and ACD databases. Two kinds of descriptors were used: a set of seven one-dimensional and 166 two-dimensional descriptors. The program was able to correctly classify 90% of the CMC compounds and misclassified only 10% of the ACD molecules. The generalizability of the method was demonstrated by the program's ability to correctly classify 80% of the compounds from the MDDR. Appearing back-to-back with Ajay et al. [34] was a contribution from Sadowski and Kubinyi [35]. Those researchers developed a feed-forward neural network

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Gordon EM: Libraries of non-polymeric organic molecules. Curr Opin Biotechnol 1995, 6:624-631.

2. Dolle RE: Discovery of enzyme inhibitors through combinatorial chemistry. Mol Divers 1997, 2:223-226.

3. Brown D: Future pathways for combinatorial chemistry. Mol Divers 1997, 2:217-222.

4. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery. Adv Drug Deliv Rev 1997, 23:3-25.

5. • Spencer RW: High-throughput screening of historic collections: observations on file size, biological targets, and file diversity. Biotechnol Bioeng 1998, 61:61-67.
This work provides an analysis of more than 150 high-throughput screens that were carried out at Pfizer Central Research. The authors compared hit rates for enzyme, cytokine and receptor targets. They evaluated the impact of clustering and diversity analysis on a screen for substance P antagonists.

6. Navia MA, Chaturvedi PR: Design principles for orally bioavailable drugs. Drug Discov Today 1996, 1:179-189.

7. Chan OH, Stewart BH: Physicochemical and drug-delivery considerations for oral drug bioavailability. Drug Discov Today 1996, 1:461-473.

8. Fecik RA, Frank KE, Gentry EJ, Menon SR, Mitscher LA, Telikepalli H: The search for orally active medications through combinatorial chemistry. Med Res Rev 1998, 18:149-185.

9. Rishton GM: Reactive compounds and in vitro false positives in HTS. Drug Discov Today 1997, 2:382-385.

10. Walters WP, Stahl MT, Murcko MA: Virtual screening - an overview. Drug Discov Today 1998, 3:160-178.

11. Schanker LS: On the mechanism of absorption from the gastrointestinal tract. J Med Pharm Chem 1960, 2:343-346.

12. Leahy DE, Lynch J, Taylor DC: Mechanisms of absorption of small molecules. Edited by Prescott LF, Nimmo WS. New York: John Wiley & Sons; 1989.

13. Seydel JK, Schaper KJ: Quantitative Structure-Pharmacokinetic Relationships in Drug Design. Edited by Rowland M, Tucker G. New York: Pergamon Press; 1986.

14. Sawada GA, Barshun CL, Lutzke BS, Houghton ME, Padbury GW, Ho NFH, Raub TJ: Increased lipophilicity and subsequent cell partitioning decrease passive transcellular diffusion of novel highly lipophilic antioxidants. J Pharm Exptl Ther 1999, 288:1317-1326.

15. • Sugawara M, Takekuma Y, Yamada H, Kobayashi M, Iseki K, Miyazaki K: A general approach for the prediction of the intestinal absorption of drugs: regression analysis using the physicochemical properties and drug-membrane electrostatic interactions. J Pharm Sci 1998, 87:960-966.
Experimentally determined log D values in octanol, diethyl ether, chloroform and isooctane were used in different combinations to model the rat jejunal permeability of 32 drugs. Reasonable models could be developed for anionic, cationic and nonionized compounds. Predictions for an external set of 10 compounds (including some zwitterionic compounds) were also reasonable.

16. • Winiwarter S, Bonham NM, Ax F, Hallberg A, Lennernas H, Karlen A: Correlation of human jejunal permeability (in vivo) of drugs with experimentally and theoretically derived parameters. A multivariate data analysis approach. J Med Chem 1998, 41:4939-4949.
In vivo human jejunal permeability of 22 structurally diverse compounds was correlated with experimentally determined log D (log P) values and calculated structural parameters. The best model used log D, number of hydrogen bond donors (HBD) and polar surface area (PSA); however, models using calculated log P, HBD, and PSA and just HBD and PSA were close to the best. Reasonable predictivity was seen on an external validation set of 24 compounds where data on oral bioavailability was available. It is important to note that some of the actively transported molecules were under-predicted by the models.

17. Stenberg P, Luthman K, Artursson P: Prediction of membrane permeability to peptides from calculated dynamic molecular surface properties. Pharm Res 1999, 16:205-212.

18. Wessel MD, Jurs PC, Tolan JW, Muskal SM: Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inform Comp Sci 1998, 38:726-735.

19. Yu LX, Lipka E, Crison JR, Amidon GL: Transport approaches to the biopharmaceutical design of oral drug delivery systems: prediction of intestinal absorption. Adv Drug Deliv Rev 1996, 19:359-376.

20. Pearlman RS, Smith KM: Metric validation and the receptor-relevant subspace concept. J Chem Inform Comp Sci 1999, 39:28-35.

21. Pearlman RS, Smith KM: Novel software tools for chemical diversity. Persp Drug Design Discov 1998, 9:339-353.

22. Cooley W, Lohnes P: Multivariate Data Analysis. New York: Wiley; 1971.

23. Cummins DJ, Andrews CW, Bentley JA, Cory M: Molecular diversity in chemical databases: comparison of medicinal chemistry knowledge bases and databases of commercially available compounds. J Chem Inform Comp Sci 1996, 36:750-763.

24. • Gillet VJ, Willett P, Bradshaw J: Identification of biological activity profiles using substructural analysis and genetic algorithms. J Chem Inform Comp Sci 1998, 38:165-179.
The authors used profiles of calculated properties (numbers of hydrogen bond donors and acceptors, molecular weight, rotatable bonds, aromatic rings, and a shape descriptor) to differentiate between a set of drugs represented by 14,861 compounds from the World Drug Index and a set of nondrugs represented by 16,807 compounds from the SPRESI database. A genetic algorithm was used to derive a set of optimal weights for the properties. The best weighting schemes were able to provide a five- to sixfold enhancement over random selection. The authors were also able to achieve similar results using property profiles to identify drugs belonging to a specific therapeutic class from a larger drug database.

25. • Martin EJ, Critchlow RE: Beyond mere diversity: tailoring combinatorial libraries for drug discovery. J Comb Chem 1999, 1:32-45.
The authors present an overview of methods used at Chiron for combinatorial library design and analysis. The paper focuses on a number of techniques used to ensure that the molecules produced are diverse and possess desirable properties.

26. Kier LB, Hall LH: Molecular Connectivity in Structure-Activity Analysis. New York: Wiley; 1986.

27. Torgerson WS: Multi-dimensional scaling. 1. Theory and methods. Psychometrika 1952, 17:401-419.

28. Willett P, Barnard JM, Downs GM: Chemical similarity searching. J Chem Inform Comp Sci 1998, 38:983-996.

29. Martin EJ, Blaney JM, Siani MA: Measuring diversity: experimental design of combinatorial libraries for drug discovery. J Med Chem 1995, 38:1431-1436.

30. Miller A, Nguyen N-K: A Fedorov exchange algorithm for D-optimal design. Appl Stat 1994, 43:669-678.

31. Bemis GW, Murcko MA: The properties of known drugs. 1. Molecular frameworks. J Med Chem 1996, 39:2887-2893.

32. • Ghose AK, Viswanadhan VN, Wendoloski JJ: A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative characterization of known drug databases. J Comb Chem 1999, 1:55-67.
The authors characterized the CMC database based on computed physicochemical property profiles (log P, molar refractivity, molecular weight, and number of atoms). They established qualifying ranges, which cover more than 80% of the compounds. They also examined commonly occurring functional groups. They found that benzene was most common; its frequency was approximately equal to that of all aromatic heterocycles combined. Nonaromatic heterocycles were more common than aromatic (approximately twofold). Tertiary amines, alcohols and carboxamides were the most frequently occurring functional groups.

33. Hertz J, Krogh A, Palmer RG: Introduction to the Theory of Neural Computation. Redwood City, CA: Addison Wesley; 1991.

34. • Ajay, Walters WP, Murcko MA: Can we learn to distinguish between drug-like and nondrug-like molecules? J Med Chem 1998, 41:3314-3324.
The authors used a Bayesian neural network to distinguish between drugs and nondrugs. The network was trained using a random partition of 3,500 compounds each from CMC and ACD, with a set of seven 1D and 166 2D descriptors. The program was able to correctly classify 90% of the CMC compounds, and misclassified only 10% of the ACD molecules. The generalizability of the method was demonstrated by the program's ability to correctly classify 80% of the compounds from the MDDR.

35. • Sadowski J, Kubinyi H: A scoring scheme for discriminating between drugs and nondrugs. J Med Chem 1998, 41:3325-3329.
The authors developed a neural network method for discriminating drugs and nondrugs; they used 38,416 molecules from the WDI as the drug set and 169,331 molecules from the ACD as the nondrug set. A set of atom types originally developed for log P prediction was used as descriptors. A feed-forward neural network was trained to classify the compounds. The program was able to correctly classify 83% of the ACD compounds and 77% of the WDI compounds.

36. Streng WH: Physical chemical characterization of drug substances. Drug Discov Today 1997, 2:415-426.
