Objectives of Lecture
Structural Bioinformatics
What is 3D Structure Prediction
Significance of 3D Structure Prediction
Central Dogma
Fundamentals of Protein Structure
Protein Data Bank (PDB)
To be aware of a number of structure prediction methods:
    Homology Modeling
    Fold Recognition/Threading
    Ab initio Protein Folding Approaches
Applications of Structural Bioinformatics:
    Analog-Based design
    Structure-Based design
Structural Bioinformatics
Structural Bioinformatics is a subset of Bioinformatics concerned with the use of biological structures (proteins, DNA, RNA, ligands, and complexes thereof) to further our understanding of biological systems.
Structure is better preserved than sequence.
3D protein structure offers much more information than the amino acid sequence alone.
By comparison with known structures we can infer probable biological functions of new proteins.
By mapping residue conservation onto the structure we can infer active sites and possibly the molecular function.
We can also identify regions involved in protein-protein interactions.
We can reconstruct (at least partially) the structure of protein complexes identified by other experimental methods.
We can build homology models.
Terminology
Primary Structure: the sequence of amino acid residues in the protein.
--MESSTHEDRKVLDL
Cα atoms
Cβ: first side-chain carbon (except for glycine).
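As a rough illustration of how Cα atoms are picked out of a structure file, the sketch below reads the fixed-column ATOM records of the PDB format and measures the distance between two consecutive Cα atoms. The function name `parse_ca_atoms` and the three records are hypothetical; the coordinates were made up so that the two Cα atoms sit 3.8 Å apart, which is the typical spacing of neighbouring residues in a chain.

```python
import math

# Hypothetical ATOM records in PDB fixed-column layout (coordinates are invented).
PDB_LINES = [
    "ATOM      1  N   MET A   1      -1.200   0.800   0.000",
    "ATOM      2  CA  MET A   1       0.000   0.000   0.000",
    "ATOM      3  CA  GLU A   2       3.800   0.000   0.000",
]


def parse_ca_atoms(pdb_lines):
    """Return (residue name, chain, residue number, (x, y, z)) for each C-alpha atom."""
    ca_atoms = []
    for line in pdb_lines:
        # Atom name occupies columns 13-16 (0-based slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20]
            chain = line[21]
            res_num = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            ca_atoms.append((res_name, chain, res_num, xyz))
    return ca_atoms


ca_atoms = parse_ca_atoms(PDB_LINES)
# Distance between the C-alpha atoms of residues 1 and 2, in angstroms.
ca_distance = math.dist(ca_atoms[0][3], ca_atoms[1][3])
print(ca_atoms)
print(round(ca_distance, 3))
```

Real PDB entries contain alternate locations, multiple models, and HETATM records, so a production parser would need more care than this sketch takes.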
Secondary Structure
A first-level description of 3D structure.
The peptide backbone of a protein has areas of partial positive charge and partial negative charge.
These areas can interact with one another to form hydrogen bonds.
These hydrogen bonds give rise to two types of structures:
    alpha helices
    beta pleated sheets
(About 3.4)
Antiparallel β-Sheets
Parallel β-Sheets
Mixed β-Sheets
structural core
Tertiary structure results from the folding of alpha helices and beta pleated sheets.
Factors influencing tertiary structure include:
    Hydrophobic/hydrophilic interactions
    Hydrogen bonding
    Disulfide linkages
    Folding by chaperone proteins
(Richardson-style) Ribbon Diagrams are traces of the protein backbone emphasizing the 3-D arrangement of α-helices and β-strands. This arrangement is called the protein fold or the protein folding topology.
This is much more like what other molecules see when they encounter a protein! This is a representation of the molecular surface (Van der Waals surface) of a hemagglutinin domain with bound sialic acid.
Quaternary Structure
Association of multiple polypeptide chains.
Quaternary structure results from the interaction of independent polypeptide chains.
Factors influencing quaternary structure include:
    Hydrophobic/hydrophilic interactions
    Hydrogen bonding
    The shape and charge distribution on associating polypeptides
Other sites:
    MSD (EBI): msd.ebi.ac.uk
    MMDB (NCBI): www.ncbi.nlm.nih.gov/Structure/
Old fold
New fold
The number of unique folds in nature is fairly small (possibly a few thousand).
90% of new structures submitted to PDB in the past three years have folds similar to ones already in PDB.
YES NO
Ab initio method
Homology Modeling
Homology Modeling
Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES).
If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed. In general, at least about 30% sequence identity is required for generating useful models.
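The 30% rule of thumb can be checked on an existing pairwise alignment. The sketch below is one possible way to compute percent identity; definitions differ in what they use as the denominator, and this version counts only columns where neither sequence has a gap. The function name `percent_identity` and the template string are hypothetical, made up for illustration.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for t, s in zip(aligned_target, aligned_template):
        if t == "-" or s == "-":
            continue  # skip columns containing a gap
        compared += 1
        if t == s:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


target = "MESSTHEDRKVLDL"      # target fragment
template = "MESATHE-RKVIDL"    # hypothetical template fragment, already aligned
identity = percent_identity(target, template)
usable_template = identity >= 30.0  # rule-of-thumb threshold
print(round(identity, 1), usable_template)
```

On real data the alignment would come from a search or alignment tool, and identity over a short fragment says much less than identity over the full domain.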
[Figure: the query sequence is searched against the sequences in PDB, returning matching entries (Hit #1, Hit #2).]
Dynamic Programming
Alignment
Key step in Homology Modeling.
Global (Needleman-Wunsch) alignment is absolutely required.
A small error in the alignment can lead to a large error in the structural model.
Multiple alignments are usually better than pairwise alignments.
Alignment is prepared by superimposing all template structures.
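To make the dynamic-programming idea concrete, here is a minimal sketch of Needleman-Wunsch global alignment. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) chosen only for illustration; protein alignments in practice normally use a substitution matrix such as BLOSUM62 and affine gap penalties. The function name `needleman_wunsch` is ours.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


nw_score, nw_a, nw_b = needleman_wunsch("ACGT", "AGT")
print(nw_score)
print(nw_a)
print(nw_b)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths. When several alignments tie for the best score, the traceback returns just one of them.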
Correspond to sequence regions with the highest level of gapping and the lowest level of sequence conservation.
Usually correspond to loops and turns.
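One simple way to locate such regions is to score every column of a multiple alignment for gap content and residue conservation. The sketch below does this for a toy alignment; the sequences, the function names, and both thresholds (more than 30% gaps, or less than 60% agreement on the most common residue) are assumptions made for illustration, not established cut-offs.

```python
from collections import Counter


def column_stats(msa):
    """For each alignment column return (gap_fraction, conservation).

    conservation = share of non-gap residues equal to the most common residue.
    """
    stats = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        gap_fraction = 1.0 - len(residues) / len(column)
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            conservation = top_count / len(residues)
        else:
            conservation = 0.0
        stats.append((gap_fraction, conservation))
    return stats


def variable_columns(msa, max_gap=0.3, min_conservation=0.6):
    """Indices of columns that are gap-rich or poorly conserved."""
    return [
        i
        for i, (gap_fraction, conservation) in enumerate(column_stats(msa))
        if gap_fraction > max_gap or conservation < min_conservation
    ]


# Hypothetical three-sequence alignment.
toy_msa = ["MKV--LDL", "MKVGSLDL", "MRV-ALEL"]
variable = variable_columns(toy_msa)
print(variable)
```

Runs of flagged columns are the candidates for loops and turns; well-conserved, gap-free columns are more likely to belong to the conserved core.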
PROSA II http://lore.came.sbg.ac.at/People/mo/Prosa/prosa.html
VADAR http://www.pence.ualberta.ca/ftp/vadar/
DSSP http://www.embl-heidelberg.de/dssp/
http://www.expasy.ch/swissmod/SWISS-MODEL.html
http://www.cmbi.kun.nl:1100/WIWWWI/
http://cl.sdsc.edu/hm.html
Raw Sequence
Predicted structure
MQQPMNYPCP QIFWVDSSAT SSWAPPGSVF PCPSCGPRGP DQRRPPPPPP PVSPLPPPSQ PLPLPPLTPL KKKDHNTNLW LPVVFFMVLV ALVGMGLGMY QLFHLQKELA ELREFTNQSL KVSSFEKQIA NPSTPSEKKE PRSVAHLTGN PHSRSIPLEW EDTYGTALIS GVKYKKGGLV INETGLYFVY SKVYFRGQSC NNQPLNHKVY MRNSKYPEDL VLMEEKRLNY CTTGQIWAHS SYLGAVFNLT SADHLYVNIS QLSLINFEES KTFFGLYKL
DOGB
1TNRA
Protein Threading
Predicts structure by identifying a good sequence-structure fit.
Protein threading can predict only the backbone structure of a protein (side chains have to be predicted using other methods).
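The idea of a sequence-structure fit can be illustrated with a deliberately crude toy score. It is not a real threading potential. In the sketch below each template position is labelled buried (B) or exposed (E), and the score rewards hydrophobic residues at buried positions and hydrophilic residues at exposed ones, using the Kyte-Doolittle hydropathy scale. The template names, their B/E strings, and the query are hypothetical, and the sequence is placed on each template without gaps.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fit_score(sequence, environment):
    """Toy fit: add hydropathy at buried ('B') sites, subtract it at exposed ('E') sites."""
    if len(sequence) != len(environment):
        raise ValueError("gapless toy threading needs equal lengths")
    total = 0.0
    for residue, env in zip(sequence, environment):
        sign = 1.0 if env == "B" else -1.0
        total += sign * KYTE_DOOLITTLE[residue]
    return total


def best_template(sequence, templates):
    """Return (name, score) of the template with the highest toy fit score."""
    scored = {name: fit_score(sequence, env) for name, env in templates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical fold library: burial pattern per template position.
templates = {"fold_A": "BBEEBBEE", "fold_B": "EEBBEEBB"}
best_name, best_score = best_template("LVKDIAEK", templates)
print(best_name, round(best_score, 1))
```

Real threading programs use statistical potentials derived from known structures, allow gaps, and search the alignment space with dynamic programming or related methods. The sketch only shows what "scoring a sequence against a structural environment" means.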
Predicted
Actual
Applications
Structural Bioinformatics can facilitate the discovery, design, and optimization of new chemical entities.
Computer-aided drug design (CADD) or computer-aided molecular design (CAMD) follows two strategies:
    Analog-based design (ligand based)
    Structure-based design (target based)
Structure-Based Design
The structure-based approach starts with the structure of the receptor site, such as the active site of a protein.
QSAR Table
Structure    Bioproperty    Structural properties
Comp.1       Bio1           P1    P2    P3    P4
Comp.2       Bio2           "     "     "     "
Comp.3       Bio3           "     "     "     "
Comp.4       Bio4           "     "     "     "
Training set
Test set
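A QSAR model is fitted on the training set and then checked on the test set. The sketch below shows that workflow with the simplest possible model, a one-descriptor least-squares line (bioproperty = intercept + slope x P1). All numbers are invented for illustration, and the helper names are ours. A table with several structural properties (P1 to P4) would call for multiple regression or another multivariate method instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    errors = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5


# Hypothetical data: descriptor P1 and measured bioproperty per compound.
train_p1, train_bio = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
test_p1, test_bio = [5.0, 6.0], [10.0, 11.7]

slope, intercept = fit_line(train_p1, train_bio)
train_error = rmse(train_p1, train_bio, slope, intercept)
test_error = rmse(test_p1, test_bio, slope, intercept)
print(round(slope, 2), round(intercept, 2))
print(round(train_error, 3), round(test_error, 3))
```

The test set is held back so the error reflects compounds the model has not seen. A test error much larger than the training error suggests the model does not generalize.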
Thank You