
Protein Structure Prediction

Dr. Bhaskar Ganguly

Doctoral Fellow, Animal Biotechnology Center, Veterinary Physiology & Biochemistry, College of Veterinary & Animal Sciences, Pantnagar, INDIA - 263145.

OUTLINE
Introduction
Approaches & Principles
Tools & Methods
Case Study

Introduction

Protein structure is determined experimentally by crystallography.

Primary structure is determined empirically by peptide sequencing.


Primary structure and environment determine secondary, tertiary and quaternary structure of a protein.

By April 2010, there were more than 6,800,000 protein sequences in the non-redundant protein sequence database at NCBI and fewer than 50,000 protein structures in the Protein Data Bank.
(Present estimates put the number of proteins without a structure at approx. 9 million.)

The only way to bridge the ever growing gap between protein sequence and structure is computational structure modeling.
- Kryshtafovych & Fidelis, 2010, Drug Discov Today, 14: 386-393

An amino acid sequence carries all the information needed to guide protein folding into a specific spatial shape of a protein.
- Sela et al., 1957, Science, 125: 691-692

Higher orders of structure can be inferred from primary structure.

Approaches & Principles

2 approaches:
Template-Based Approach
o Homology Modeling
o Threading
Free Modeling/ De novo Approach
o Ab initio Modeling

Homology Modeling (Comparative Modeling)
Protein structure is much more evolutionarily conserved than sequence, and therefore similar sequences normally yield similar 3D structures.
-Chothia & Lesk, 1986, EMBO J, 5: 823-826

New families are being discovered at a rate that is linear with the addition of new sequences.
-Yooseph et al., 2007, PLOS Biol, 5: e16

Threading (Fold Recognition)
Even if a template cannot be identified using sequence similarity, suitable templates may still exist, as Nature tolerates only a limited number of folds. Threading scans the query sequence against a database of solved structures; a scoring function assesses the compatibility of the sequence with each structure.

Ab initio Modeling
Based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire.

Before we proceed any further

A word about CASP


Critical Assessment of techniques for protein Structure Prediction
Results published in a special issue of Proteins: Structure, Function and Bioinformatics

Good News!
Currently available template-based methods reliably generate accurate models, comparable in quality to the structures solved by X-ray crystallography. The level of detail is sufficient for DRUG DESIGN, detecting PROTEIN INTERACTIONS, understanding REACTION MECHANISMS, interpretation of MUTATIONS and molecular replacement in solving CRYSTAL STRUCTURES.
-Baker & Sali, 2001, Science, 294: 93-96
-Raimondo et al., 2007, Proteins, 66: 689-696
-Zhang, 2008, Curr Opin Struct Biol, 18: 342-348

The Workhorses of Computational Biology?


Computational Biologists
Computational Servers
Computational Programs

The performance gap between the best servers and the best human-expert groups is narrowing over time.
-Kryshtafovych et al., 2007, Proteins, 69: 194-207

How do I model My Protein?

Getting Started:
Primary Structure?
Amino-acid sequence (your own or database)
In silico translation of nucleotide sequence (your own or database)

Be cautious of CODON BIAS!!!
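In silico translation can be sketched in a few lines. The example below is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and a coding sequence whose reading frame is already known; the function name `translate` and the test sequence are illustrative only. Some organisms and organelles use other translation tables, so the table should be chosen to match the source of the sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# Assumption: the sequence of interest actually uses this table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a coding DNA (or RNA) sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Illustrative sequence only: ATG GCC AAG TAA
print(translate("ATGGCCAAGTAA"))
```

Translating the same sequence in a different `frame` gives a different peptide, which is why the reading frame has to be checked before the result is used as a query.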

Resources for Sequence Retrieval:

Primary databases: NCBI, EMBL, DDBJ
Protein databases: UniProt, Swiss-Prot/ TrEMBL, PIR, IPI, NCBI (Protein)

Template Search:
BLASTp (NCBI)
PSI-BLAST against PDB
Select the best template(s) and note the PDB ID [low E-value, high similarity, high coverage]
Retrieve the template structure (.pdb file format)
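The selection criteria above (low E-value, high similarity, high coverage) can be expressed as a simple filter-and-sort. This is only a sketch: the `Hit` record, the template identifiers, the hit values, and the cut-offs `max_evalue` and `min_coverage` are all hypothetical and would need to be adapted to the actual search output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pdb_id: str        # hypothetical identifier, not a real PDB entry
    evalue: float      # expectation value reported by the search
    similarity: float  # percent similarity over the aligned region
    coverage: float    # percent of the query covered by the alignment


def rank_templates(hits, max_evalue=1e-5, min_coverage=70.0):
    """Drop weak hits, then sort by low E-value, high similarity, high coverage."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.evalue, -h.similarity, -h.coverage))


# Made-up hits for illustration.
hits = [
    Hit("tmplA", 3e-60, 52.0, 96.0),
    Hit("tmplB", 1e-72, 58.0, 98.0),
    Hit("tmplC", 1e-80, 35.0, 40.0),  # rejected: covers too little of the query
    Hit("tmplD", 2e-2, 28.0, 90.0),   # rejected: E-value too high
]

for hit in rank_templates(hits):
    print(hit.pdb_id, hit.evalue, hit.similarity, hit.coverage)
```

Sorting on E-value first is one reasonable choice; a hit with a slightly worse E-value but much better coverage may still be the better template, so the ranking is a starting point for inspection and not a final decision.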

Has your protein already been Modeled?

PSI-BLAST against PDB
ModWeb

Modeling:
Provide the query sequence and the template PDB ID/ template file
Many servers do not require template files
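A template file in .pdb format is plain text with fixed-width columns, so the coordinates can be read without any special software. The sketch below pulls the alpha-carbon (CA) atoms out of ATOM records using the column positions of the PDB format; the function name `read_ca_atoms` is illustrative, and the three example records contain made-up coordinates.

```python
def read_ca_atoms(pdb_text, chain=None):
    """Return (residue name, residue number, x, y, z) for each CA atom.

    Column slices follow the fixed-width ATOM record layout of the PDB format.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue                      # skip HETATM, REMARK, TER, etc.
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        if chain is not None and line[21] != chain:  # chain ID, column 22
            continue
        atoms.append((
            line[17:20].strip(),          # residue name, columns 18-20
            int(line[22:26]),             # residue number, columns 23-26
            float(line[30:38]),           # x, columns 31-38
            float(line[38:46]),           # y, columns 39-46
            float(line[46:54]),           # z, columns 47-54
        ))
    return atoms


# Made-up records for illustration (not taken from a real entry).
EXAMPLE = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
ATOM      9  CA  GLN A   2      26.850  29.021   3.898
"""

for atom in read_ca_atoms(EXAMPLE, chain="A"):
    print(atom)
```

For real work, a maintained parser is preferable, since actual files contain alternate locations, insertion codes and multiple models that this sketch ignores.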

PROTEIN MODEL

Resources for Modeling:

Programs: MODELLER, SWISS-PDB View

Servers: TASSER, SPARKSx, PEP-FOLD, Swiss Model, Rosetta Design, ModWeb, QUARK, HHpred, Phyre, RaptorX, EsyPred 3D, Jigsaw, RosettaAntibody, Bhageerath & many others

Meta-servers: 3D-Jury, GeneSilico, Pcons.net, LOMETS

Hmmm but How do I see My Protein?

Resources for Visualization:


Chimera, PyMol, YASARA, SWISS-PDB View, JMol, RasMol, Cn3D

Yippee!!! I can see my Model. Am I through?

Indices of Quality:
Steric Clashes & Bumps
Packing Density
Ramachandran plot
Deviation from template
Conservation of secondary structure
G-score
Q-value
Z-value
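A Ramachandran plot is a scatter of the backbone dihedral angles phi and psi of every residue. Phi is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below computes a signed dihedral angle from four points; the function name and the test coordinates are illustrative, and quality-assessment tools do this (plus the classification into allowed and disallowed regions) for you.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the last point is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For phi of residue i, the four points passed in would be the coordinates of C(i-1), N(i), CA(i) and C(i), in that order.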

Resources for Quality Assessment:

Servers:
PROCHECK (Ramachandran plot, G-score)
Q-MEAN (Q-value, Z-value)
SAVES (ERRAT, PROVE)
Verify3D
MATRAS 3D (deviation from template)
DaliLite (deviation from secondary folds)

Meta-server: MetaMQAP

Programs:
Swiss-Pdb View (Ramachandran plot)
YASARA (MUSTANG-RMSD)
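Deviation from the template is usually reported as a root-mean-square deviation (RMSD) over matched atoms. The sketch below shows only the final arithmetic, assuming the model and template have already been superposed and that the two coordinate lists are paired atom by atom (for example, CA atoms of aligned residues); structural alignment programs perform the superposition step themselves. The function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of already-superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy example: every atom of the model is displaced by 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, template))
```

Without a prior superposition the value mostly reflects how the two structures happen to be placed in space, not how similar they are.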

Tools for Quality Improvement:

Server: What If Web Interface
Program: Swiss-Pdb View

QA/ QC Completed

What else can I do with My Protein?

Active residue prediction (ConSurf, NCBI-CDD, InterProScan)
Active site prediction (POCKET FINDER, 3D2GO, fPOCKET, LIGSITE, MetaPocket, 3D LigandSite, LigPlot, NCBI-CDD, InterProScan)
Antigenic profiling (IEDB, IMTECH)
Protein Dynamics (CAVITY, CASTp, SLITHER, NCBI-CDD, InterProScan)
Drug design/ Ligand docking (Hex, PATCH DOCK, Z DOCK, HADDOCK, Docking Server, GRAMM-X, Flex Pep Dock)
Function prediction (ConFunc, ProKnow, KPFP, InterProScan, Pfam, NCBI-CDD, KAAS)
Physical & Chemical Profiling (ProtParam, molbiol-tools.ca, ProSAL, Scratch Protein Predictor)
Protein-protein interactions (Hex, KB Dock)
Toxicity profiling (OSIRIS Actelion Property Explorer, Molsoft Drug Likeness Explorer)
(Again, not all of these require a 3D structure of your protein as the query, though many do.)

The Curious Case of buPAG


-Ganguly & Prasad, 2012, J Ani Sci Biotech, 3: 13

PREGNANCY ASSOCIATED GLYCOPROTEINS


Form a diverse family of glycoproteins
Variably expressed at different stages of gestation
Probably involved in immunosuppression of the dam
Presence has also been correlated with placentogenesis and placental re-modeling
Exact structure and function were unknown due to limitations on obtaining purified preparations

Sequence Retrieval of buPAG-2 (NCBI)
Template Selection and Retrieval of Template Structure (PSI-BLAST against PDB)
Generation & Refinement of buPAG-2 Model (MODELLER9v10, SWISS-PDB View, What If Web Interface)
Quality Assessment (ERRAT, PROCHECK, MUSTANG-RMSD)
Structural & Functional Annotation (NetOGlyc, NetNGlyc, YinOYang, ProtParam, NCBI-CDD, InterProScan, Pfam, KPFP, ProKnow)

buPAG-2 MODEL

The final 3D structure of buPAG-2 was submitted to PMDB (PM0077895).

QUALITY

Only 1 of the total 367 residues was present in the disallowed region of the Ramachandran plot; ERRAT Quality Factor of 83.143%; MUSTANG RMSD of 0.447 over 353 residues; PROCHECK G-factor of -0.16.
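To put these counts in proportion, the figures can be converted to percentages. The short calculation below uses only the numbers reported on this slide; reading the 353 residues as the portion of the model included in the RMSD calculation is an interpretation, not something stated in the source.

```python
total_residues = 367       # residues in the buPAG-2 model
disallowed_residues = 1    # residues in the disallowed Ramachandran region
rmsd_residues = 353        # residues over which the RMSD was computed

pct_disallowed = 100 * disallowed_residues / total_residues
pct_in_rmsd = 100 * rmsd_residues / total_residues

print(f"Disallowed region: {pct_disallowed:.2f}% of residues")
print(f"Included in RMSD:  {pct_in_rmsd:.2f}% of residues")
```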

STRUCTURAL & FUNCTIONAL ANNOTATION

Molecular structure was elucidated.
MHC-I binding and down-regulation of the complement pathway
Regulation of transcription through DNA-dependent, GTP binding mechanisms
Control of apoptotic processes underlying fetal morphogenesis and/ or re-modeling of placenta
Activity is controlled during pregnant & non-pregnant states by Yin-Yang sites

Some other Models

OmpH
Pasteurella multocida

Ligand Binding Domain, Umami Receptor
Canis lupus familiaris

Cyclooxygenase 2
Canis lupus familiaris

Lipoxygenase 2
Canis lupus familiaris

A large number of Proteins are waiting to be Modeled!

SELECTED REFERENCES
- Kryshtafovych A, Fidelis K. 2010. Protein structure prediction and model quality assessment. Drug Discov Today, 14: 386-393.
- Sela M, et al. 1957. Reductive cleavage of disulfide bridges in ribonuclease. Science, 125: 691-692.
- Chothia C, Lesk AM. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J, 5: 823-826.
- Yooseph S, et al. 2007. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLOS Biol, 5: e16.
- Baker D, Sali A. 2001. Protein structure prediction and structural genomics. Science, 294: 93-96.
- Raimondo D, et al. 2007. Automatic procedure for using models of proteins in molecular replacement. Proteins, 66: 689-696.
- Zhang Y. 2008. Progress and challenges in protein structure prediction. Curr Opin Struct Biol, 18: 342-348.
- Kryshtafovych A, et al. 2007. Progress from CASP6 to CASP7. Proteins, 69: 194-207.
- Ganguly B, Prasad S. 2012. Homology modeling and functional annotation of bubaline pregnancy associated glycoprotein 2. J Ani Sci Biotech, 3: 13.

Mail me: bhaskarvet@yahoo.co.in

THANK YOU
