Virtual Screening Drug Design

Hot Topics in Chemoinformatics in the Pharmaceutical Industry
David J. Wild, Ph.D.

Scientific Computing Consultant, and Adjunct Professor of Pharmaceutical Engineering at the University of Michigan
david@wild-ideas.org www.WildIdeasConsulting.com
About me
B.Sc. Computer Science
Ph.D. Chemoinformatics (Willett Lab)
Worked for 5 years in Scientific Computing leadership at Pfizer, responsible for the development of computational tools for scientists
Now run a consulting firm based in Ann Arbor, Mich., and am also an Adjunct Professor at the University of Michigan, doing some research
Wild Ideas Consulting www.WildIdeasConsulting.com
University of Michigan www-personal.engin.umich.edu/~wildd
What we'll cover today

Overview of early-stage drug discovery and the big industry concerns
Using information and technology together to improve the chances of finding a new drug
Example: High Throughput Screening
Some other examples of hot areas:
    Genomics & Proteomics Information Handling
    Virtual Screening
    Combinatorial Chemistry
    Design of scientific software
Characteristics of the pharmaceutical industry

Very segmented market: the largest company (Pfizer) has only an 11% market share
High risk, long term: it takes 10-20 years to develop a drug, and most drugs fail to get to market
Highly regulated (by the FDA)
High profit margins for drugs which do make it
Investors traditionally expect a high return on investment
Four main phases: discovery, development, clinical trials and marketing
R&D spending up, new drugs down
Taken from http://www.newscientistjobs.com/biotech/ernstyoung/blues.jsp
Drug Discovery & Development

Identify disease
Isolate protein involved in disease (2-5 years)
Find a drug effective against disease protein (2-5 years)
Preclinical testing (1-3 years)
Formulation & Scale-up
Human clinical trials (2-10 years)
FDA approval (2-3 years)
Impact of new technology on drug discovery

The last few years have seen a number of revolutionary new technologies:
Gene chips, genomics and the Human Genome Project (HGP)
Bioinformatics & Molecular biology
More protein structures
High-throughput screening & assays
Virtual screening and library design
Docking
Combinatorial chemistry
In-vitro ADME testing
Other computational methods
How do we make it all work for us?
GENOMICS, PROTEOMICS & BIOPHARM.
Potentially producing many more targets and personalized targets
HIGH THROUGHPUT SCREENING
Screening up to 100,000 compounds a day for activity against a target protein
VIRTUAL SCREENING
Using a computer to predict activity
COMBINATORIAL CHEMISTRY
Rapidly producing vast numbers of compounds
MOLECULAR MODELING
Computer graphics & models help improve activity
IN VITRO & IN SILICO ADME MODELS
Tissue and computer models begin to replace animal testing
[Diagram: the technologies above shown alongside the pipeline stages Identify disease, Isolate protein, Find drug, Preclinical testing]
There is little hard data on using the new technologies

In a sense, the drug design process is becoming a big experiment
Do we continue as before, and carefully introduce new technologies as we deem appropriate, or do we radically change the way things are done?
Lots of pressure for the new technologies to yield results quickly
How do we measure the results?
Some questions being asked

Is our increasing spending on R&D and new technologies really going to pay off, or was it a red herring?
Is the paucity of drugs in the pipeline because we're not doing things right, or are we just hitting limits on the number of major diseases with potential treatments still to be found (all the low-hanging fruit has gone)?
Should we be looking in new areas (e.g. life enhancement rather than life saving or quality of life)?
What's being done

Trying to get the right attrition (= drugs dropping out of the pipeline): aim to increase early-stage attrition and reduce late-stage attrition
Risk analysis: look ideally for low-risk, high-payoff drugs
Using metrics to monitor successes and failures
Analyzing risk
High risk Low payoff
High risk High payoff
Low risk Low payoff
Low risk High payoff
Using metrics to monitor improvement

Split the discovery process into discrete units, with key decisions at the end of each unit
Come up with measurable properties that can be used to gauge success
Look for good and bad decisions and why they were made

Stage                 Decision Point
Target exploration    Go with this target?
HTS                   Was the screen successful?
HTS Analysis          Follow up these 5-10 series
Series Followup       Produce 2-3 lead compounds
ADME study            Are compounds safe?
Summary
The pharmaceutical industry is a high-risk industry with very long development times and short product lifespans
There has been a lot of investment in new technologies for early-stage drug discovery, but so far these are not resulting in more drug candidates (or profits)
Companies are looking at ways to address this problem, including managing attrition, risk analysis and metrics
How Chemoinformatics can help out

Producing and managing information for metrics
In-silico analysis to reduce risk, e.g.
    Virtual screening
    Library design, docking
    Cost/benefit analyses
Making information available at the right time and the right place
Needs to be integrated into processes
An example: High-Throughput Screening
Screening perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein
High-Throughput Screening
Drug companies now have millions of samples of chemical compounds
High-throughput screening can test 100,000 compounds a day for activity against a protein target
Maybe tens of thousands of these compounds will show some activity for the protein
The chemist needs to intelligently select the 2-3 classes of compounds that show the most promise for being drugs to follow up
Informatics Implications
Need to be able to store chemical structure and biological data for millions of datapoints
Computational representation of 2D structure
Need to be able to organize thousands of active compounds into meaningful groups

Use cluster analysis or machine learning methods to group similar structures together and relate to activity
Need to learn as much information as possible from the data (data mining)
Apply statistical methods to the structures and related information
HTS Tools: Tripos SAR Navigator
SAR Navigator is a Tripos, Inc. product, www.tripos.com
BioReason ClassPharmer
Clusters actives into groups representing series
Attempts to find a scaffold using an MCS (maximum common substructure) algorithm
Recovers inactives back into series
Presents series as rows in a spreadsheet view
Gives other statistics on series, such as activity distribution
http://www.bioreason.com
Strategy for HTS Triage

Run HTS
Decide which compounds are active and which are inactive
Cluster the actives to put them into series
Visualize clusters of actives (showing 2D structures) and pick series of interest
Identify scaffold for each series
Use similarity or substructure search on inactives to find inactives related to these series
Use SAR techniques to discover differences between actives and inactives in a series
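The clustering and inactive-recovery steps of the triage strategy can be sketched roughly as follows. This is only an illustration under stated assumptions: compounds are represented as sets of integer fingerprint features, similarity is the Tanimoto coefficient, and the grouping is a simple greedy "leader" clustering. The compound IDs, fingerprints, threshold and function names are all hypothetical; commercial tools such as those named above use their own methods.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fingerprints: dict, threshold: float = 0.5) -> list:
    """Greedy leader clustering: each compound joins the first cluster whose
    leader it resembles at or above the threshold, else it starts a new one."""
    clusters = []  # list of (leader_id, [member_ids])
    for cid, fp in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(fingerprints[leader], fp) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append((cid, [cid]))
    return clusters


def related_inactives(leader_fp: frozenset, inactives: dict,
                      threshold: float = 0.5) -> list:
    """Similarity search: inactives that resemble a series leader."""
    return sorted(cid for cid, fp in inactives.items()
                  if tanimoto(leader_fp, fp) >= threshold)


# Hypothetical screening results (made-up fingerprints, not real data).
actives = {
    "A1": frozenset({1, 2, 3, 4}),
    "A2": frozenset({1, 2, 3, 5}),
    "A3": frozenset({10, 11, 12}),
    "A4": frozenset({10, 11, 12, 13}),
}
inactives = {
    "I1": frozenset({1, 2, 3, 4, 6}),
    "I2": frozenset({20, 21}),
}

series = leader_cluster(actives)
recovered = related_inactives(actives["A1"], inactives)
print(series)     # two series: A1/A2 and A3/A4
print(recovered)  # inactives related to the first series
```

Leader clustering depends on input order and on the threshold, so in practice the result would be inspected visually, as the strategy suggests, before picking series of interest.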
Information generated at different points in the Drug Design process

Gene chip experiments
Protein structures
Project selection decisions
Assay protocols
HTS results
Series selection decisions
SAR studies
Combinatorial experiments
Pharmacophores
ADME studies
Lead compound decisions
Toxicology studies
Scale-up reactions
Clinical trials data
Doctor/patient studies
Marketing, surveys, etc.
Information generated at different sites
Distributed goals model
Shared goals model
Information storage breakdowns

Large amounts of information are generated:
    Some is not kept at all
    Some is kept but loses its meaning
Often data is kept, but not semantics or decisions

E.g. we keep "the HVX2 assay result for this compound was 3.2", but not what the assay protocol was, whether the compound was considered active, nor whether it was followed up on.
Bigger picture or derivative information is usually not stored

E.g. "all the compounds with a tri-methyl group seemed to have much lower activity for this project"
Information access breakdowns

Some information is only available in one physical location
Some information is only available within one part of the discovery process
Often information is not contextualized for use outside a particular domain
Someone is clear about a piece of information they need, and that piece of information exists, but they don't know how to access it
E.g. what system to use, what Oracle table it's in, or even whether that piece of information exists at all!
Missed opportunities
Not a specific breakdown, but if the right piece of information had been available at the right time, better decisions could have been made. E.g.
    A group of compounds is being followed up as potential drugs, but a rival company just applied for a patent on the compounds
    A large amount of money is being spent developing an HTS assay for a target, but marketing research shows any drug is unlikely to be a success
    A group of compounds is selected from an HTS as good candidates for follow-up, but 20 years ago they were followed up for a similar project and had severe solubility problems
Information use breakdowns

The meaning of data is incorrectly interpreted
A single piece of information is used, whilst using a wider range of information would lead to different conclusions
Lessons learned from one project are incorrectly applied to another
Fuzzy information is taken as concrete information
What do we do?
No large company has really solved the problem
But ongoing attempts include:
    Defining information produced and needed at each stage of the discovery process
    Improving processes to be more consistent, especially across different sites
    Improving information flow between departments and sites
    Harmonizing terminology across disciplines and sites
    Defining needed management information as well as raw data
    Looking for quick-win opportunities
This will presumably impact what is stored in databases and what software is used
Oracle Chemistry Cartridges help
Some Other Examples

Genomics & Proteomics Information Handling
Virtual Screening
Combinatorial Chemistry
Design of scientific software
Genomics & Proteomics Information Handling
Understanding the link between diseases, genetic makeup and expression of proteins
Genomics
Genomics is fast-forwarding our understanding of how DNA, genes, proteins and protein function are related, in both normal and disease conditions
The Human Genome Project has mapped the genes in human DNA
The hope is that this understanding will provide many more potential protein targets
Allows potential personalization of therapies
[Figure: paired DNA strands ATACGGAT / TATGCCTA, linked to functions]
Gene Chips
Gene chips allow us to look for changes in protein expression for different people with a variety of conditions, and to see if the presence of drugs changes that expression
Makes possible the design of drugs to target different phenotypes
[Figure: expression profile (screen for 35,000 genes) against people / conditions (e.g. obese, cancer, caucasian) and compounds administered]
Chemogenomics from Vertex

Video: http://www.vrtx.com/Chemogenonone.html
Virtual Screening
Build a computational model of activity for a particular target
Use the model to score compounds from virtual or real libraries
Use the scores to decide which to make, or pass through a real screen
Computational Models of Activity

Machine Learning Methods
    E.g. neural nets, Bayesian nets, SVMs, Kohonen nets
    Train with compounds of known activity
    Predict activity of unknown compounds
Scoring methods
    Profile compounds based on properties related to target
Fast Docking
    Rapidly dock 3D representations of molecules into 3D representations of proteins, and score according to how well they bind
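As a rough illustration of the "train with compounds of known activity, predict activity of unknown compounds" idea, here is a minimal sketch using a k-nearest-neighbour vote over Tanimoto similarity. Note that k-nearest-neighbour is a stand-in chosen for brevity and is not one of the methods listed on the slide; the fingerprints (sets of integer features), the training data and the function names are all hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_activity_score(query: frozenset, training: list, k: int = 3) -> float:
    """Fraction of the k most similar training compounds that are active.

    `training` is a list of (fingerprint, is_active) pairs.
    """
    if not training:
        raise ValueError("training set must not be empty")
    ranked = sorted(training,
                    key=lambda item: tanimoto(query, item[0]),
                    reverse=True)
    nearest = ranked[:k]
    return sum(1 for _, active in nearest if active) / len(nearest)


# Hypothetical compounds of known activity (made-up fingerprints).
training = [
    (frozenset({1, 2, 3}), True),
    (frozenset({1, 2, 4}), True),
    (frozenset({1, 3, 4}), True),
    (frozenset({7, 8, 9}), False),
    (frozenset({7, 8, 10}), False),
    (frozenset({8, 9, 10}), False),
]

likely_active = knn_activity_score(frozenset({1, 2, 3, 4}), training)
likely_inactive = knn_activity_score(frozenset({7, 8, 9, 10}), training)
print(likely_active, likely_inactive)
```

The score is a number between 0 and 1 per molecule, which is the kind of output a virtual screen then uses to prioritize compounds.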
Present molecules to model

We may want to virtually screen:
    All of a company's in-house compounds, to see which to screen first
    A compound collection that could be purchased
    A potential combinatorial chemistry library, to see if it is worth making, and if so which compounds to make
The model will produce either a prediction of how well each molecule will bind, or a score for each molecule
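Once a model has produced a score for each molecule, deciding "which to screen first" can be as simple as ranking by score. A minimal sketch, assuming higher scores mean higher predicted activity; the compound IDs, scores and the `select_top` helper are hypothetical.

```python
import heapq


def select_top(scores: dict, n: int) -> list:
    """Return the n compound IDs with the highest predicted scores,
    best first."""
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical model output: compound ID -> predicted score.
scores = {"C1": 0.91, "C2": 0.15, "C3": 0.67, "C4": 0.88}

first_to_screen = select_top(scores, 2)
print(first_to_screen)
```

In practice the cutoff might be a score threshold or a fraction of the library rather than a fixed count.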
Combinatorial Chemistry
By combining molecular building blocks, we can create very large numbers of different molecules very quickly. Usually involves a scaffold molecule, and sets of compounds which can be reacted with the scaffold to place different structures on attachment points.
Example Combinatorial Library

Scaffold with three attachment points (R1, R2, R3) and R-groups:
R1 = OH, OCH3, NH2, Cl, COOH
R2 = phenyl, OH, NH2, Br, F, CN
R3 = CF3, NO2, OCH3, OH, phenoxy
[Figure: scaffold structure and example library compounds]
For this small library, the number of possible compounds is 5 x 6 x 5 = 150
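The library size follows from taking one R-group per attachment point. A minimal sketch of enumerating the virtual library, assuming the five R1, six R2 and five R3 groups listed on the slide (the stray fragments in the extracted figure text are treated as part of the drawing, not as extra R-groups):

```python
from itertools import product

# R-group lists as given on the slide.
r1 = ["OH", "OCH3", "NH2", "Cl", "COOH"]
r2 = ["phenyl", "OH", "NH2", "Br", "F", "CN"]
r3 = ["CF3", "NO2", "OCH3", "OH", "phenoxy"]

# Every combination of one group per attachment point is one compound.
library = list(product(r1, r2, r3))

print(len(library))   # 5 x 6 x 5 combinations
print(library[0])     # first enumerated (R1, R2, R3) combination
```

Because the count is a product, adding even a few R-groups per position grows the library quickly, which is why virtual libraries are usually assessed on computer before anything is made.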
Combinatorial Chemistry Issues

Which R-groups to choose
Which libraries to make

Fill out existing compound collection? Targeted to a particular protein? As many compounds as possible?
Computational profiling of libraries can help

Virtual libraries can be assessed on computer
Design of Scientific Software

Problems with scientific software tend to occur because of deficiencies in one of three areas:
    Software Relevance
    Software Usability
    Software Management
Software Relevance
Making software relevant requires the software designer to understand:
    the science, i.e. the domain
    the scientific computing techniques that are used in the domain
    the possibilities and limitations of software development
Even with this, it's hard to match the things we can do with the things that people want or need to do
Techniques like personas and contextual inquiry help us understand the people who use the software, their goals, and the tasks they want to do
Software relevance: Bridge between computation & science

[Diagram: software relevance as the bridge between chemoinformatics and science]
Chemoinformatics tools: clustering, similarity searching, activity models, scaffold detection, docking, logP calculation
Chemoinformatics tasks: doing a cluster analysis, identifying activity-related fragments
Science goals: e.g. produce compounds that have high biological activity
Science tasks: work out a chemical synthesis, choose good reagents, try and document some reactions
Software Usability
We tend to focus on the method and the science, but not on how easy it is for people to get their job done using the software
Programmers tend to make software intuitive for themselves, but not necessarily for the people it is designed for
A usability lab and other techniques can make a HUGE difference to the satisfaction of users and programmers alike!
Software Management
Disparate set of tools & platforms
Disparate programming styles and languages
A variety of people tend to be writing software:
    Trained software developers
    Enthusiastic scientists
    Scientific computing specialists
Focus on the science tends to mean software management is neglected
Everyone hates traditional software management rules
But there are ways of making everything work better and having more fun doing it!
Have a recommended basic setup that should help a lot
Foundation reading
The Inmates Are Running the Asylum, by Alan Cooper
Contextual Design, by Hugh Beyer and Karen Holtzblatt
Usability Engineering, by Jakob Nielsen
The Visual Display of Quantitative Information, by Edward Tufte
Don't Make Me Think!, by Steve Krug
See also, www.WildIdeasConsulting.com
Summary
R&D in the pharmaceutical industry is undergoing a lot of technological change, and there is pressure to make the investment pay off
There is a big need to sensibly use the large amounts of chemical and biological information produced in the process
Thoughtful use of chemoinformatics methods and software is becoming crucial to the success of drug discovery
