Translational Bioinformatics 2011:
The Year in Review
Russ B. Altman, MD, PhD
Stanford University

Thursday, March 10, 2011 1


Goals
• Provide an overview of the major scientific
events, trends and publications in
translational bioinformatics
• Create a “snapshot” of what seems to be
important in March, 2011 for the amusement
of future generations.
• Marvel at the progress made and the
opportunities ahead.

Process
1. Think about what has had early impact
2. Think about sources to trust
3. Solicit advice from colleagues
4. Surf online resources
5. Select papers to highlight in ~2 slides and
some to highlight in < 1 slide.

Caveats

• Considered last ~14 months
• Focused on human biology and clinical
implications: molecules, clinical data, informatics.
• Considered both data sources and informatics
methods (and combination)

Final list
• ~63 semi-finalist papers
• 25 presented here (briefly!)
• Apologies to many I missed, including some I
mention at end. Mistakes are mine.
• This talk and semi-finalist bibliography will be
made available on the conference website and
my blog.
• TOPICS: Personal genomics, Drugs & Genes,
Infrastructure for TB, Sequencing & Science,
Warnings/Hope.
Thanks!
• Atul Butte • Dan Masys
• Phil Bourne • Alex Morgan
• Joel Dudley • Gill Omenn
• Larry Fagan • Chirag Patel
• Guy Fernald • Nigam Shah
• Yael Garten • Justin Starren
• George Hripcsak • Nick Tatonetti
• Larry Hunter • Peter Tarczy-Hornoch
• Peter Kang • Olga Troyanskaya
• Konrad Karczewski • Liping Wei
• Alain Laederach • Jeff Williamson
• Jennifer Lahti • Anonymous reviewer
• Yves Lussier
Personal Genomics

“Clinical assessment incorporating a personal
genome” (Ashley et al, Lancet)
• Goal: Assess the ability of current genetic
knowledge to provide clinically useful information
for a 40 y.o. patient in possession of his genome.
• Method: Use genome to assess rare genetic disease
carrier status, common disease risk, environmental
risks, and drug response.
• Result: High risk for CAD, statins likely to work,
recommend starting statin.
• Conclusion: Whole genome data provides useful
information about 100s of drugs, 1000s of diseases,
but many variations cannot yet be interpreted.
“Do-it-yourself genetic testing” (Salzberg & Pertea,
Genome Biology)
• Goal: Empower individuals with their genome to
test for mutations in breast cancer genes (BRCA1
and BRCA2) in violation of Myriad Inc.’s patent
• Method: Concatenated 68 known mutated
sequences into a single virtual strand, and then
mapped genome reads to this strand.
• Result: They showed in Asian, African and Caucasian
samples the ability to recover BRCA* mutations.
• Conclusion: We should all be allowed to do this.
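The Method bullet above (concatenate known mutated sequences into one virtual strand, then map reads to it) can be illustrated with a toy sketch. This is not the authors' pipeline: exact substring search stands in for a real read aligner, and the sequence names, the spacer of N characters, and the helper functions are all hypothetical.

```python
from bisect import bisect_right

def build_virtual_strand(mutant_seqs, spacer="N" * 10):
    """Join named mutant sequences into one string, remembering where each starts.

    The spacer (an assumption of this sketch) keeps a read from matching
    across the boundary between two unrelated sequences.
    """
    names, starts, pieces = [], [], []
    pos = 0
    for name, seq in mutant_seqs.items():
        names.append(name)
        starts.append(pos)
        pieces.append(seq)
        pos += len(seq) + len(spacer)
    return spacer.join(pieces), names, starts

def map_read(read, strand, names, starts):
    """Return the name of the mutant sequence containing the read, or None.

    Only the first exact hit is reported; a real aligner would tolerate
    mismatches and report every hit.
    """
    hit = strand.find(read)
    if hit < 0:
        return None
    return names[bisect_right(starts, hit) - 1]

# Hypothetical toy sequences, not real BRCA1/BRCA2 mutations.
toy = {"mut1": "ACGTACGTAA", "mut2": "TTGACCATGG"}
strand, names, starts = build_virtual_strand(toy)
print(map_read("GACCAT", strand, names, starts))
```

Keeping a table of start offsets is what lets a hit on the single long strand be traced back to the specific mutation it came from.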
“Fundamentally, this seems no different from
measuring one’s temperature or blood pressure, but
because of gene patents, the act of reading one’s own
genome may require the permission of a private
company. It is hard to envision how the patent holders
can enforce their claims in this scenario. Our
contention is that these patents never should have
been awarded, and that no private entity should have
rights to the naturally occurring gene sequences in
every human individual.”
“Robust Replication of Genotype-Phenotype
Associations across Multiple Diseases in an
Electronic Medical Record” (Ritchie et al, Am. J.
Human. Gen.)
• Goal: Assess whether EMRs can replicate GWAS
signals found in controlled research setting.
• Method: Test 21 known SNP associations with
disease within an EMR.
• Result: Most GWAS results replicated, but with
smaller odds ratios.
• Conclusion: EMR is not a tangled mess, but a
potential goldmine.
“Web-based, participant-driven studies yield novel
genetic associations for common traits” (Eriksson et
al, PLoS Genetics)
• Goal: Evaluate web-based phenotype elicitation for
GWAS-based discovery.
• Method: Using commercial DTC data (+consent),
conduct multiple GWAS with questionnaire-derived
phenotypes.
• Result: Replicated several known associations, and
discovered new ones for hair morphology, photic
sneeze reflex, freckling, and the ability to smell an
asparagus metabolite.
• Conclusion: Phenotypes from DTC customers can
be used to support discovery.
[Figure panels: hair curl, asparagus anosmia, photic sneeze, freckling]

“PheWAS: demonstrating the feasibility of a
phenome-wide scan to discover gene-disease
associations” (Denny et al, Bioinformatics)
• Goal: Seek additional phenotypes associated with
SNPs associated with an index phenotype.
• Method: Use EMR to define cases/controls for 776
diseases. For 5 disease-associated SNPs, evaluate
their association with other diseases.
• Result: Replicated known association with index
disease. Found 19 new associations for these SNPs.
• Conclusion: EMR and SNP data can combine for
Phenome Wide Association Study (PheWAS) to link
SNPs to diseases.
“An environment-wide association study (EWAS) on
type 2 diabetes mellitus” (Patel et al, PLoS One)
• Goal: Develop method to associate environmental
factors with disease
• Method: Used NHANES dataset to associate 266
environmental variables to glucose values > 126 mg/
dL, corrected for multiple hypothesis testing.
• Result: Found significant associations with pesticide
heptachlor epoxide, polychlorinated biphenyls
(PCBs). Protective effect of β-carotenes.

• Conclusion: An important first step in balancing the
equation of phenotype = f(genetics, environment)
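Testing 266 variables against one outcome means some will look significant by chance, which is why the Method bullet mentions correction for multiple hypothesis testing. The slide does not say which correction was used; the sketch below shows the Benjamini-Hochberg false discovery rate procedure as one common choice, with made-up p-values and a hypothetical function name.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Made-up p-values for ten hypothetical exposure variables.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

Note that an uncorrected 0.05 cutoff would flag five of these ten variables, while the procedure keeps only the two strongest.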
“Rare variants create synthetic genome-wide
associations” (Dickson et al, PLoS Biology)
• Goal: Assess possibility that GWAS hits come from
combined effect of rare variants, and not single
SNPs.
• Method: Use simulations to explore possibilities.
• Result: Rare causal mutations for hearing loss and
sickle cell create genome-wide significant
associations over very large genomic regions.
• Conclusion: GWAS may be giving us important
information, but not what we thought.

Drugs and Genes and
their relationships

“A side effect resource to capture phenotypic
effects of drugs” (Kuhn et al, Mol. Sys. Biol.)
• Goal: Create an available database connecting drugs
to side effects.
• Method: Mine FDA labels.
• Result: 888 drugs connected to 1450 side effects
in the SIDER database.
• Conclusion: A key information resource for
understanding the systems effects of drugs alone
and potentially in combination. Requires
augmentation with off-label side effects.

“Mining multi-item drug adverse effect associations
in spontaneous reporting systems” (Harpaz et al,
BMC Bioinformatics)
• Goal: Create methods to associate multiple drugs
to adverse events.
• Method: Association rule mining on 162K reports
from FDA AERS in 2008.
• Result: 1167 multi-drug AEs. 67% previously
recognized, several novel ADEs identified.
• Conclusion: Population-based analysis of drug
effects can find synergistic effects, could provide
basis for molecular understanding.
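Association rule mining, as named in the Method bullet above, scores rules of the form "drug pair → adverse event" by support (how many reports contain all items) and confidence (the fraction of reports with the drug pair that also list the event). The sketch below is a minimal illustration on invented reports, not the authors' method or the AERS data format; the drug and event names and both thresholds are hypothetical.

```python
from itertools import combinations

def multi_drug_rules(reports, min_support=2, min_confidence=0.6):
    """Find (drug pair -> adverse event) rules in spontaneous reports.

    reports: iterable of (set_of_drugs, set_of_events).
    Returns tuples (pair, event, support, confidence), strongest first.
    """
    pair_counts = {}
    rule_counts = {}
    for drugs, events in reports:
        for pair in combinations(sorted(set(drugs)), 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
            for event in set(events):
                key = (pair, event)
                rule_counts[key] = rule_counts.get(key, 0) + 1
    rules = []
    for (pair, event), n in rule_counts.items():
        confidence = n / pair_counts[pair]
        if n >= min_support and confidence >= min_confidence:
            rules.append((pair, event, n, confidence))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))

# Invented reports for illustration only.
reports = [
    ({"drug_a", "drug_b"}, {"rash"}),
    ({"drug_a", "drug_b"}, {"rash", "nausea"}),
    ({"drug_a", "drug_b"}, {"nausea"}),
    ({"drug_a", "drug_c"}, {"rash"}),
    ({"drug_b"}, {"rash"}),
]
for pair, event, support, confidence in multi_drug_rules(reports):
    print(pair, "->", event, support, round(confidence, 2))
```

Support and confidence alone do not separate a true interaction from an event caused by one drug of the pair, so a real analysis would add a comparison against single-drug reporting rates.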
“Drug off-target effects predicted using structural
analysis in the context of a metabolic network
model” (Chang et al, PLoS Comp. Bio)
• Goal: Combine structural informatics and systems
biology to dissect hypertension from torcetrapib.
• Method: Systems model of kidney perturbed based
on predicted gene-product interactions with drug.
• Result: Successfully predicted off-targets that
explain hypertension and predict drug response.
• Conclusion: Drug action can be understood using
systems modeling and 3D structural interaction
modeling.
“Reconstruction and flux-balance analysis of the
Plasmodium falciparum metabolic network” (Plata et
al, Mol. Sys. Biol.)
• Goal: Find new drug targets for malaria.
• Method: Build metabolic model of malaria using
FBA, look for targets at key points.
• Result: 90% accuracy in reproducing knockout and
drug inhibition assays. 40 new targets, including one
experimentally proven.
• Conclusion: Modeling of metabolic networks can be
accurate and can suggest (with mechanism) new
drug targets.
“Optimal drug synergy in antimicrobial
treatments” (Torella et al, PLoS Comp. Bio.)
• Goal: Determine optimal treatment strategies to
prevent multi-drug resistance.
• Method: Math modeling.
• Result: Two effects of synergy: (1) clears infection
faster, but (2) increases selective advantage of
mutants. “Winner” determined by level of
competition for resources.
• Conclusion: Optimal strategy is not always to
maximize synergy, but may be to introduce mild
antagonism.
Infrastructure for
translational
bioinformatics

“Building a biomedical ontology recommender web
service” (Jonquet et al, J. Biomed. Semantics)
• Goal: Analyze a domain and recommend an
ontology that would cover it.
• Method: Use concepts of coverage, connectivity
and size to perform matching based on keywords
from corpus. Determine appropriate weights.
• Result: On 6 test sets, evaluators liked the
proposed ontologies.
• Conclusion: As ontologies become more available,
tools for matching one or more to a domain will be
critical. This is a good first step.
“Using global unique identifiers to link autism
collections” (Johnson et al, JAMIA)
• Goal: Assess ability to generate global IDs that
work for large population-based studies.
• Method: Implement one-way encrypted hashing
based on commonly available demographics.
• Result: On 1M simulated individuals and 8000
participants, identifiers generated for 96% of
children and 77% of parents.
• Conclusion: Generic unique identifiers are useful
for linking data sets, and thus detecting overlap.
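One-way hashing of demographics, as in the Method bullet above, can be sketched as follows. The field names, the normalization rules, the salt, and the choice of SHA-256 are all assumptions made for illustration; the slide does not specify the scheme the authors used. Returning None when a field is missing mirrors the point that identifiers could not be generated for every participant.

```python
import hashlib
import unicodedata

# Hypothetical set of demographic fields; the real scheme may differ.
REQUIRED_FIELDS = ("first_name", "last_name", "birth_date", "birth_city")

def normalize(value):
    """Strip accents, case, spaces and punctuation so trivial variants agree."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return "".join(ch for ch in text.lower() if ch.isalnum())

def make_guid(record, salt="study-wide-secret"):
    """Return a hex identifier, or None if any required field is missing."""
    parts = []
    for field in REQUIRED_FIELDS:
        value = normalize(record.get(field) or "")
        if not value:
            return None
        parts.append(value)
    payload = salt + "|" + "|".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Two hypothetical records for the same child, entered at different sites.
site_1 = {"first_name": "José", "last_name": "O'Neil",
          "birth_date": "2001-02-03", "birth_city": "San Diego"}
site_2 = {"first_name": " jose ", "last_name": "ONEIL",
          "birth_date": "2001/02/03", "birth_city": "san diego"}
print(make_guid(site_1) == make_guid(site_2))
```

Because the hash cannot be reversed, two collections can compare identifiers to detect overlapping participants without exchanging names or birth dates.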

“Serving the enterprise and beyond with informatics
for integrating biology and the bedside
(i2b2)” (Murphy et al, JAMIA)
• Goal: Provide tools to integrate medical record and
genomic-age research data.
• Method: NIH-supported NCBC created i2b2
software for cohort finding and query.
• Result: Implemented at Harvard Partners, and
exported to many centers nationally.
• Conclusion: i2b2 provides a relatively facile entry
into translational bioinformatics research, leveraging
efforts of NIH NCBC program.
“Collaboratively charting the gene-to-phenotype
network of human congenital heart
defects” (Barriot et al, Genome Medicine)
• Goal: Build a knowledgebase of genetics of
congenital heart disease.
• Method: Wiki-based collaborative KB.
• Result: 1000s of concepts linked together and
edited by credentialed members of CHDWiki.
• Conclusion: Community-based knowledge
resources can be a cost-effective way to create a
curated repository of linked knowledge.

“In silico research in the era of cloud
computing” (Dudley & Butte, Nature Biotech.)
• Goal: Review opportunities for cloud in
computational research
• Result: Virtual images of computing environments
offer ultimate opportunities for reproducibility and
transparency
• Conclusion: A “whole system snapshot exchange”
(WSSE) vision of the future may help minimize
redundancy and maximize progress.

“Principles of Human Subjects Protections Applied in
an Opt-Out, De-identified Biobank” (Pulley et al,
Clin. Translational Science)
• Goal: Explore feasibility of an “opt-out” approach to
human consent for genomics.
• Method: Implement opt-out at Vanderbilt with
substantial effort at education, outreach, and
attention to ethical theory.
• Result: High rates of accrual, low rate of
unhappiness among participants.
• Conclusion: Opt-out is possible but treacherous.
Genomics: applications

“The mutation spectrum revealed by paired genome
sequences from a lung cancer patient” (Lee et al,
Nature)
• Goal: Understand the differences between normal
and cancer tissue in lung cancer.
• Method: Whole genome sequence of primary
tumor and adjacent normal tissue.
• Result: > 50,000 SNP differences, many large scale
rearrangements. Selection against expressed genes.
High rate of mutation in kinases.
• Conclusion: First evidence of distinctive
evolutionary pressures on particular tumors.
[Figure legend: SNPs = red dots; chromosomal structural variations = blue and red lines; LOH = green; CNV = red/blue bars]

“A human gut microbial gene catalogue established
by metagenomic sequencing” (Qin et al, Nature)
• Goal: Understand the impact of gut microbes on
human health.
• Method: Sequence DNA in feces of 124 individuals
with a range of maladies.
• Result: 576 Gbases of DNA sequenced showing
3.3M genes. Minimum of 160 different bacteria/
person.
• Conclusion: Basic data is now established to
associate bacterial flora with disease/health.

[Figure: relative abundance of 57 frequent microbial genomes among individuals in the cohort]

“Toward a complete in silico, multi-layered
embryonic stem cell regulatory network” (Xu et al,
WIREs. Sys. Biol. & Med.)
• Goal: Review databases, algorithms and software to
organize data on embryonic stem (ES) cells.
• Method: Aggregation of resources.
• Result: 100s of relevant resources, data integration is
the key challenge.
• Conclusion: Stem cell biology is likely to be a key
driver of informatics innovation in next decade.
Existing tools promising, but holes remain in our
ability to broadly integrate modalities.
“Systematic discovery of nonobvious human disease
models through orthologous phenotypes” (McGary
et al, PNAS)
• Goal: Search for similar phenotypes across different
organisms = phenotype homologs = phenologs.
• Method: Find phenotypes in 2 organisms that share
greater-than-expected gene orthologs.
• Result: Yeast model for angiogenesis, worm model
for breast cancer, mouse model of autism, plant
model of neural crest development. And others.
• Conclusion: The criteria for “model systems” for
translational medicine have been tightened up.
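The Method bullet above asks whether two phenotypes in different organisms share more orthologous genes than expected. One standard way to quantify "greater than expected" is a hypergeometric tail probability; the sketch below uses that as an assumption rather than a statement of the authors' exact statistic. The gene identifiers and the function name are hypothetical, and both gene sets are assumed to be already mapped to a shared pool of ortholog IDs.

```python
from math import comb

def overlap_p_value(genes_a, genes_b, n_orthologs):
    """P(overlap >= observed) if genes_b were drawn at random from the pool.

    genes_a, genes_b: ortholog IDs linked to a phenotype in each organism.
    n_orthologs: size of the shared ortholog pool between the two organisms.
    """
    set_a, set_b = set(genes_a), set(genes_b)
    a, b = len(set_a), len(set_b)
    k = len(set_a & set_b)
    total = comb(n_orthologs, b)
    tail = sum(
        comb(a, i) * comb(n_orthologs - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return tail / total

# Hypothetical ortholog IDs for one phenotype in each of two organisms.
organism_1_phenotype = {"og1", "og2", "og3"}
organism_2_phenotype = {"og2", "og3", "og9"}
print(overlap_p_value(organism_1_phenotype, organism_2_phenotype, 10))
```

In a genome-scale scan, every phenotype pair is tested, so these p-values would also need a multiple-testing correction before any pair is called a phenolog.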
Warnings and Causes for
Hope

“Over-optimism in bioinformatics
research” (Boulesteix, Bioinformatics)
• Goal: Investigate implication of bias within
informatics research.
• Method: Analyze impact of (1) fishing for significance
and (2) publication bias.
• Result: Recommendations for publishing negative
results, publishing documented code and data more
transparently.
• Conclusion: Some of what I reported here today is
probably wrong, and I have been overly optimistic.

“Predicting protein structures with a multiplayer
online game” (Cooper et al, Nature)
• Goal: Test whether crowdsourcing can outperform the best
algorithms for protein structure prediction.
• Method: Build online game for folding proteins,
compare results with Rosetta, best automatic
algorithm.
• Result: Top humans excel, work collaboratively, and
explore search strategies better than Rosetta.
• Conclusion: Humans need to be integrated into
algorithms for hard-to-solve problems.

“How to build a motivated research group” (Alon,
Molecular Cell)
• Goal: Explore issues of how to manage a research
group.
• Method: Introspection.
• Result: Key features to foster: competence,
autonomy and social-connectedness.
• Conclusion: Good projects are at the intersection
of talents, objectives, & passions.

2010 Crystal ball...
Clinical records will be linked to genomics to make
discoveries.
More emphasis on drugs and ancestry in DTC
companies
Whole genome sequencing for a cohort with a
common disease (cancer already here?)
Consumer sequencing (vs. genotyping)
Semantics in literature mining for knowledge discovery
Cloud computing will contribute to one biomedical
discovery.
2011 Crystal ball...
Consumer sequencing (vs. genotyping) will emerge
Cloud computing will contribute to major biomedical
discovery.
Informatics applications to stem cell science will
increase
Important discoveries from text mining
Population-based data mining will yield important
biomedical insights
Systems modeling will suggest useful polypharmacy
Immune genomics will emerge as powerful data
Apologies...
“Network Analysis of Global Influenza Spread” (Chan, PLoS Comp. Bio.)
“Discovery of drug mode of action and drug repositioning from transcriptional
responses” (Iorio et al, PNAS)
“The structural and content aspects of abstracts versus bodies of full text journal
articles are different” (Cohen et al, BMC Bioinformatics)
“Ethical implications of the use of whole genome methods in medical research” (Kaye
et al, Eur. J. of Human. Gen.)
“Genome, epigenome and RNA sequences of monozygotic twins discordant for
multiple sclerosis” (Baranzini et al, Nature)
“Extreme Evolutionary Disparities Seen in Positive Selection across Seven Complex
Diseases” (Corona et al, PLoS One)
“Using text to build semantic networks for pharmacogenomics” (Coulet et al, J.
Biomed. Inf.)

“Systems Pharmacology of Arrhythmias” (Berger et al, Science Signaling)
“An Integrated Approach to Uncover Drivers of Cancer” (Akavia et al,
Cell)
“Exome sequencing identifies MLL2 mutations as a cause of Kabuki
syndrome” (Ng et al, Nature Genetics)
“Disease-Associated Mutations That Alter the RNA Structural
Ensemble” (Halvorsen et al, PLoS Genetics)
“Hints of hidden heritability in GWAS” (Gibson, Nature Genetics)
“Gene-environment interactions in 7610 women with breast cancer:
prospective evidence from the Million Women Study” (Travis et al, Lancet)
“Leveraging informatics for genetic studies: use of the electronic medical
record to enable a genome-wide association study of peripheral arterial
disease” (Kullo et al, JAMIA)

Thanks.
See you in 2012!
russ.altman@stanford.edu
