Modeling Protein Function

MED260
Philip E. Bourne
Department of Pharmacology, UCSD
pbourne@ucsd.edu
http://www.sdsc.edu/pb
Slides on-line at:
http://www.sdsc.edu/pb/edu/med260/med260.ppt

MED260 Modeling Protein Function - October 11, 2006
Agenda
• Why model protein function?
• Where does it fit as a technique in modern medical
research?
• The data deluge as a motivator
• The extent of what can be modeled
• Ontologies – establishing order from chaos
• Examples of what can be learnt
• Accuracy – a word of caution

Why Model Protein Function
• The rate of discovery of new proteins far
outpaces our ability to functionally characterize
them
• Functional discovery of new proteins has
implications for:
– Drug discovery
– Biomarker identification
– Understanding of biological processes
– Identification of disease states and treatment regimes

Where does it fit as a technique in modern medical research?

REPRESENTATIVE DISCIPLINE | EXAMPLE UNITS | SCIENTIFIC RESEARCH & DISCOVERY | REPRESENTATIVE TECHNOLOGY
Anatomy | Organisms | MRI | Migratory Sensors
Physiology | Organs | Heart | Ventricular Modeling
Cell Biology | Cells | Neuron | Electron Microscopy
Proteomics / Genomics | Macromolecules / Biopolymers | Structure / Sequence | X-ray Crystallography
Medicinal Chemistry | Atoms & Molecules | Protease Inhibitor | Protein Docking

[Figure: the same table of disciplines, with a "Translational Medicine" label added]

The Ability to Model Protein Function
Influences and can be Influenced by Any
Level of Biological Complexity - Examples
• Genome - rapid increase in sequenced genomes provides
new raw material
• Proteome – large increase in the number of 3D structures
highlights new functions
• Interactome – identification of a binding partner points to
a new function
• Metabolome – isolation of a protein within a metabolic
pathway
• Cell - localization points to function
• Organ – gene expression in heart tissue points to function
• Organism – different physiology observed in species can
be related to protein functions

[Figure: the same table of disciplines, with "We will focus here" marking the Proteomics and Genomics rows]

At All Levels We Are Being Driven By Data

Biological Experiment → Data → Information → Knowledge → Discovery
Collect → Characterize → Compare → Model → Infer

[Figure: The Data Deluge. Biological complexity (sequence, structure, assembly, sub-cellular, cellular, organ, higher life) plotted against year (1990 to 2005), alongside growth in data, computing power and sequencing technology. Examples shown include the Human Genome Project, ESTs, gene chips, the yeast, E. coli, C. elegans and human genomes, virus and ribosome structures, genetic circuits, neuronal modeling, a model metabolic pathway of E. coli, virtual communities, brain mapping and cardiac modeling.]

Metagenomics: A First Look
• New type of genomics
• New data (and lots of it) and new types of data
– 17M new (predicted) proteins! 4-5x growth
in just a few months and much more coming
– New challenges and exacerbation of old
challenges

Metagenomics: First Results

• More than 99.5% of DNA in every environment studied represents unknown organisms
– Culturable organisms are exceptions, not the rule
• Most genes represent distant homologs of known genes, but there are thousands of new families
• Everything we touch turns out to be a gold mine
• Environments studied:
– Water (ocean, lakes)
– Soil
– Human body (gut, oral cavity, human microbiome)
Metagenomics: New Discoveries
[Figure: environmental (red) vs. currently known (blue) PTPases, with four numbered groups and the higher eukaryotes labeled]

The Good News and the Bad News

• Good news
– Data pointing towards function are growing at
near exponential rates
– IT can handle it on a per dollar basis
• Bad news
– Data are growing at near exponential rates
– Quality is highly variable
– Accurate functional annotation is sparse

Genomes - 2004
• We all know about the human genome – what is
not so well known is:
– 191 completed microbial genomes
– 44 archaea
– 727 bacteria
– 785 eukaryotes (complete or in progress)
– Viroids ….

Proteome
• We are reasonably good, though not perfect, at
finding proteins in genomes with intergenic
regions – e.g., alternative initiation codons
• Regulatory elements provide a different set
of challenges
• We are not so good at assigning functions
to those proteins
• Moreover, the devil is in the details

Estimated Functional Roles (by % of
Proteins) of the Proteome in a Complex
Organism

Functional Nomenclature Needs to be Consistent
for Orderly Progress – Enter EC and GO

• EC classifies all enzymes -
http://www.chem.qmul.ac.uk/iubmb/enzyme/
• Gene Ontology Consortium characterizes
by molecular function, biological
process and cellular component (location) -
http://www.geneontology.org/
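
A minimal sketch of why a consistent nomenclature helps, assuming the standard four-level EC number layout (class, subclass, sub-subclass, serial number). The helper names are hypothetical, and the example numbers are chosen for illustration: EC 2.7.11.11 is the entry for cAMP-dependent protein kinase.

```python
# Top-level EC classes as of 2006 (a seventh class, Translocases, was added in 2018).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Split an EC number such as 'EC 2.7.11.11' into four levels.

    A '-' marks an unassigned level and is returned as None.
    """
    fields = ec.upper().replace("EC", "").strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four levels in EC number, got {ec!r}")
    return tuple(None if f == "-" else int(f) for f in fields)


def ec_class_name(ec):
    """Name of the top-level enzyme class for an EC number."""
    return EC_CLASSES[parse_ec(ec)[0]]


def shared_levels(ec_a, ec_b):
    """Count how many leading levels two EC numbers have in common (0 to 4)."""
    shared = 0
    for a, b in zip(parse_ec(ec_a), parse_ec(ec_b)):
        if a is None or a != b:
            break
        shared += 1
    return shared


if __name__ == "__main__":
    print(ec_class_name("EC 2.7.11.11"))
    print(shared_levels("2.7.11.11", "2.7.11.1"))
```

Because the levels are hierarchical, the number of shared leading levels gives a rough measure of how closely two enzyme annotations agree.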

Functional Coverage of the Human Genome

40% covered

http://function.rcsb.org:8080/pdb/function_distribution/index.html

Step 1. Learn What You Can from
the Protein Sequence
• Find it
• Pay attention to the quality of the functional
annotation – errors are transitive
• Understand its 1-D structure – domain
organization, {signatures, fingerprints}
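
A signature scan can be sketched as follows, assuming PROSITE-style pattern syntax and using the ATP/GTP-binding P-loop pattern `[AG]-x(4)-G-K-[ST]` as the example. The function names and the sequence fragment are illustrative assumptions, and the converter handles only a subset of the syntax (no `<` or `>` anchors).

```python
import re

# One PROSITE pattern element: x, a residue, [allowed] or {forbidden},
# optionally followed by a repeat count such as (4) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_signature(sequence, pattern):
    """Return (zero-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    p_loop = "[AG]-x(4)-G-K-[ST]"
    fragment = "MTEYKLVVVGAGGVGKSALTIQLIQ"  # illustrative sequence fragment
    print(prosite_to_regex(p_loop))
    print(find_signature(fragment, p_loop))
```

A hit from a short pattern like this is only a hint of function; such patterns also match by chance, which is one reason the quality of the annotation matters.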

Step 2. Is there a 3D Structure? If so
What Can You Learn from That?
• Find it
• Understand it
• Characterize it
• Understand its function(s) – these follow a
power law at the fold level: some folds are
promiscuous (many functions), others are
solitary or of unknown function

(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA
(e) antibodies (f) viruses (g) actin (h) the nucleosome
(i) myosin (j) ribosome
Courtesy of David Goodsell, TSRI
First, Why Bother with Structure?
An Example: Protein Kinase A
This “molecular scene” for cAMP-dependent protein kinase depicts
years of collective knowledge.

Beyond basics, only the atomic coordinates are captured by the PDB.

Functional annotation requires the literature.
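
To make "only the atomic coordinates" concrete, here is a minimal sketch of reading one ATOM record, assuming the fixed-column layout of the legacy PDB file format. The helper names are hypothetical and the atom values are made up for illustration.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build an ATOM record using the fixed columns of the legacy PDB format."""
    return (
        "ATOM  "                        # columns 1-6: record name
        + f"{serial:>5}"                # columns 7-11: atom serial number
        + " "                           # column 12: unused
        + f"{name:<4}"                  # columns 13-16: atom name
        + " "                           # column 17: alternate location
        + f"{res_name:>3}"              # columns 18-20: residue name
        + " "                           # column 21: unused
        + chain                         # column 22: chain identifier
        + f"{res_seq:>4}"               # columns 23-26: residue number
        + " " * 4                       # column 27 insertion code, 28-30 unused
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"   # columns 31-54: x, y, z in angstroms
    )


def parse_atom_line(line):
    """Extract identity and coordinates from a single ATOM record."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


if __name__ == "__main__":
    # Made-up alpha carbon; the leading space in " CA" follows PDB name alignment.
    record = format_atom_line(1, " CA", "MET", "A", 1, 27.34, 24.43, 2.614)
    print(record)
    print(parse_atom_line(record))
```

Nothing in such a record says what the protein does, which is the point of the slide: function has to come from the literature or from modeling.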
What Did that Picture Tell Us?
• Two domains with associated functions
• ATP binding & substrate binding
• Through conserved residues and their spatial
location, details of the ATP and substrate binding
and mechanism of the phospho-transfer reaction
• So is structure the answer to functional modeling?

Question: So is structure the answer to
functional modeling?

Answer: Partly - The number of unique
protein sequences still outnumbers the
number of unique structures by 100:1

Enter Structural Genomics

Enter Structure Prediction


The Structural Genomics Pipeline
(X-ray Crystallography)
Basic Steps
Target Selection → Crystallomics (Isolation, Expression, Purification, Crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish

Structural Genomics Will Give Us ...

• Good news
– More structures (definitely)
– New folds (some but not as anticipated)
– New understanding of specific diseases and pathways
(maybe)
– Representatives from each major protein family
(maybe)
• Bad news
– Many new structures that are functionally unclassified
(definitely)

What About Structure Prediction?

• Current rule

We will be able to predict a structure when
we know all the structures

Why is Structure Prediction so Hard?
[Figure: % sequence identity vs. alignment length for 1000 randomly chosen, structurally similar PDB polypeptide chains (z > 4.5), with the Twilight Zone and Midnight Zone marked]
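
The quantity on the vertical axis of that plot can be computed as sketched below. The zone cutoffs (35% and 20% identity) are commonly quoted rules of thumb used here as assumptions, not values taken from the slide, and in practice the boundary also depends on alignment length, as the plot suggests. Function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def identity_zone(identity, twilight_upper=35.0, midnight_upper=20.0):
    """Label a percent identity using assumed, length-independent cutoffs."""
    if identity >= twilight_upper:
        return "above the twilight zone"
    if identity >= midnight_upper:
        return "twilight zone"
    return "midnight zone"


if __name__ == "__main__":
    # Toy alignment, not real protein data.
    identity = percent_identity("ACDE-GHIKL", "ACDQFGHLKV")
    print(round(identity, 1), identity_zone(identity))
```

Structurally similar chains that fall in the twilight or midnight zone are exactly the cases where sequence comparison alone fails to detect the relationship.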

Approaches to Structure Prediction

• Homology modeling
• Threading (aka fold recognition)
• Ab initio
• How well do we do? – see CASP
• Consensus servers
– Eva - http://cubic.bioc.columbia.edu/eva/
– LiveBench - http://bioinfo.pl/meta/
Step 3. What Can Be Got from Structure
When You Have it?

From Structural Bioinformatics, ed. Bourne and Weissig, p. 394, Wiley 2002
Specific Example
• Mj0577 – putative ATP molecular switch

Mj0577 is an open reading frame (ORF) of previously unknown function
from Methanococcus jannaschii. Its structure was determined at 1.7 Å
(Figure 7a) (Zarembinski et al., 1998). The structure contains a bound
ATP molecule, picked up from the E. coli host. The presence of
bound ATP led to the proposition that Mj0577 is either an ATPase or
an ATP-binding molecular switch. Further experimental work showed
that Mj0577 cannot hydrolyse ATP by itself, and can only do so in the
presence of M. jannaschii crude cell extract. Therefore it is more
likely to act as a molecular switch, in a process analogous to ras-GTP
hydrolysis in the presence of GTPase activating protein.

From Structural Bioinformatics, ed. Bourne and Weissig, p. 402, Wiley 2002
Step 4. Proteins Do Not Function in Isolation
But are Part of Complex Interaction Networks

http://www.genome.jp/kegg/
Accuracy - A Word of Caution
• Errors are transitive
– Proteins A and B are observed to have similar
functions through sequence homology
– Proteins B and C are observed to have similar
functions through sequence homology
– Is protein A related to protein C?
– Up to 30% of current annotation may be wrong
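
One way the A, B, C chain can break is through multi-domain proteins, sketched below as a toy model. The protein names, domain labels and the "shares a domain" test are all illustrative assumptions standing in for a real homology search.

```python
# Toy domain architectures: B shares one domain with A and a different one with C.
DOMAINS = {
    "A": {"kinase"},
    "B": {"kinase", "SH2"},
    "C": {"SH2"},
}


def similar(p, q, domains=DOMAINS):
    """Stand-in for a homology hit: the two proteins share at least one domain."""
    return bool(domains[p] & domains[q])


def propagate_annotation(chain, annotation, domains=DOMAINS):
    """Naively copy an annotation along a chain of pairwise-similar proteins."""
    annotated = {chain[0]: annotation}
    for prev, nxt in zip(chain, chain[1:]):
        if prev in annotated and similar(prev, nxt, domains):
            annotated[nxt] = annotated[prev]
    return annotated


if __name__ == "__main__":
    print(similar("A", "B"), similar("B", "C"), similar("A", "C"))
    # C inherits "protein kinase" although it has no kinase domain in this model.
    print(propagate_annotation(["A", "B", "C"], "protein kinase"))
```

In this model A resembles B and B resembles C, yet A and C share nothing, so an annotation passed along the chain ends up on a protein it does not describe.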

Questions?

Demo of Steps 1-4
• Step 1. Learn What You Can from the Protein
Sequence
• Step 2. Is there a 3D Structure? If So, What Can
You Learn from That?
• Step 3. What Can Be Got from Structure When
You Have it?
• Step 4. Proteins Do Not Function in Isolation But
are Part of Complex Interaction Networks
