Structural Bioinfo

STRUCTURAL BIOINFORMATICS
(Toward a High-Resolution Understanding of Biology)
Objectives of Lecture
Structural Bioinformatics
What is 3D Structure Prediction
Significance of 3D Structure Prediction
Central Dogma
Fundamentals of Protein Structure
Protein Data Bank (PDB)
To be aware of a number of structure prediction methods:
  Homology Modeling
  Fold Recognition/Threading
  Ab initio Protein Folding Approaches
Applications of Structural Bioinformatics:
  Analog-Based Design
  Structure-Based Design
Structural Bioinformatics
Structural Bioinformatics is a subset of Bioinformatics concerned with the use of biological structures (proteins, DNA, RNA, ligands, and complexes thereof) to further our understanding of biological systems.
What is protein structure prediction?

A prediction of the (relative) spatial position of each atom in the tertiary structure, generated from knowledge only of the primary structure (sequence).
Significance of Protein Structure Prediction

In evolutionarily related proteins, structure is much better preserved than sequence.
3D protein structure offers much more information than the amino acid sequence alone.
By comparison with known structures, we can infer probable biological functions of new proteins.
By mapping residue conservation onto the structure, we can infer active sites and possibly the molecular function.
We can also identify regions involved in protein-protein interactions.
We can reconstruct (at least partially) the structure of protein complexes identified by other experimental methods.
We can build homology models.
The central dogma

DNA {A, C, G, T} (adenine, cytosine, guanine, thymine)
  -> RNA {A, C, G, U} (T is replaced by U)
  -> Protein {A, D, ..., Y}
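As a minimal sketch of the two alphabet changes above, the following Python snippet transcribes a DNA coding strand into mRNA (T becomes U) and translates it into protein. The input sequence, the function names, and the deliberately partial codon table are illustrative assumptions, not part of the lecture; the codon assignments themselves follow the standard genetic code.

```python
# Partial codon table from the standard genetic code.
# Only the codons needed for the example below are listed (an assumption
# made to keep the sketch short); a real table has 64 entries.
CODON_TABLE = {
    "AUG": "M",                      # methionine, also the start codon
    "GAA": "E", "GAG": "E",          # glutamate
    "AGC": "S", "AGU": "S",          # serine
    "ACU": "T", "ACC": "T",          # threonine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(dna):
    """DNA coding strand -> mRNA: same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "X")  # X: codon not in the partial table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


dna = "ATGGAAAGCAGCACTTAA"  # hypothetical coding strand
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(protein)
```

The translated peptide is written in the same one-letter alphabet used for primary structure in the rest of the lecture.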
Fundamentals of Protein Structure
Terminology
Primary Structure: the sequence of amino acid residues in the protein.
Example: MESSTHEDRKVLDL
Amino acids and the peptide bond
Cα atoms
Cβ: first side chain carbon (except for glycine).
Secondary Structure
A first-level description of 3D structure.
The peptide backbone of a protein has areas of partial positive charge and partial negative charge.
These areas can interact with one another to form hydrogen bonds.
These hydrogen bonds give rise to two types of structure:
  alpha helices
  beta pleated sheets
Secondary Structure I: The α-Helix
Secondary Structure II: The β-Strand
(About 3.4 Å)
Several β-strands assemble into a β-sheet (a tertiary structural element)
Antiparallel β-sheets
Parallel β-sheets
Mixed β-sheets
Tertiary Structure: The Global Three Dimensional Structure

Secondary structure elements pack together to form a structural core.
Tertiary structure results from the folding of alpha helices and beta pleated sheets.
Factors influencing tertiary structure include:
  Hydrophobic/hydrophilic interactions
  Hydrogen bonding
  Disulfide linkages
  Folding by chaperone proteins
Tertiary Structure: Different Representations
(Richardson-style) ribbon diagrams are traces of the protein backbone emphasizing the 3-D arrangement of α-helices and β-strands. This arrangement is called the protein fold or the protein folding topology.
Tertiary Structure: Different Representations
This is much more like what other molecules see when they encounter a protein! This is a representation of the molecular surface (van der Waals surface) of a hemagglutinin domain with bound sialic acid.
Super secondary Structures: Between Secondary and Tertiary Structure
For example: - alpha- -above - -hairpin - left
Quaternary Structure
Association of multiple polypeptide chains.
Quaternary structure results from the interaction of independent polypeptide chains.
Factors influencing quaternary structure include:
  Hydrophobic/hydrophilic interactions
  Hydrogen bonding
  The shape and charge distribution of the associating polypeptides
Side Chain Properties

Hydrophobic amino acids tend to stay in the interior of a protein.
Hydrophilic ones tend to stay on the exterior of a protein.
Oppositely charged amino acids can form salt bridges.
Polar amino acids can participate in hydrogen bonding.
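The side chain classes above can be sketched as a simple lookup. The grouping below is one common textbook convention and is an assumption of this example: the assignments of G, P, C, H and Y in particular vary between sources. The sequence is the primary structure example used earlier in the lecture.

```python
from collections import Counter

# One possible grouping of the 20 amino acids by side chain property
# (assumed for illustration; borderline residues differ between textbooks).
SIDE_CHAIN_CLASS = {}
for residues, label in (
    ("AVLIMFWP", "h"),  # hydrophobic
    ("STNQYCG", "p"),   # polar, uncharged
    ("KRH", "+"),       # positively charged (basic)
    ("DE", "-"),        # negatively charged (acidic)
):
    for aa in residues:
        SIDE_CHAIN_CLASS[aa] = label


def classify(sequence):
    """Return one class label per residue of a one-letter sequence."""
    return "".join(SIDE_CHAIN_CLASS[aa] for aa in sequence)


sequence = "MESSTHEDRKVLDL"
classes = classify(sequence)
counts = Counter(classes)
print(sequence)
print(classes)
print(dict(counts))
```

Residues labeled "+" and "-" are the candidates for salt bridges, and those labeled "p" are the typical side chain hydrogen bond partners.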
Domain, Motif, Fold

Domain: a discrete portion of a protein assumed to fold independently of the rest of the protein and to possess its own function.
Most proteins have multiple domains.
Fold: the overall shape of a domain. There are only a few thousand possible folds.
Super-secondary structure (motif): a frequently occurring structural pattern found in multiple proteins, which do not necessarily have similar folds.
Determination of protein structures

X-ray crystallography
NMR (nuclear magnetic resonance)
EM (electron microscopy)
Protein Data bank (PDB)

A repository for 3-D biological macromolecular structures.
Established in 1971 at Brookhaven National Lab (7 structures).
It includes proteins, nucleic acids and viruses.
Structures are obtained by X-ray crystallography (80%) or NMR spectroscopy (16%).
Submitted by biologists and biochemists from around the world.
Other sites:
MSD (EBI): msd.ebi.ac.uk
MMDB (NCBI): www.ncbi.nlm.nih.gov/Structure/
Growth of Protein Data Bank (PDB): The Motivation
(Figure: growth of the PDB, split into old folds and new folds.)
The number of unique folds in nature is fairly small (possibly a few thousand).
90% of the new structures submitted to the PDB in the past three years have folds similar to ones already in the PDB.
Protein Structure Prediction Methods

Comparative Modeling Method:
  Homology Modeling Method
  Threading Method

Ab initio folding Method
Protein structure prediction flowchart

Sequence
  -> Database searching
  -> Homolog with an experimental structure?
       YES -> Homology modeling
       NO  -> Protein threading
              -> Ab initio method (if threading also fails)
Homology Modeling
Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES).
If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.
In general, at least 30% sequence identity is required for generating useful models.
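The 30% rule of thumb can be checked with a short percent identity calculation. This is a sketch under stated assumptions: the function name is hypothetical, the two rows are the Query and Hit #2 rows of the Step 3 alignment in this lecture, and identity is computed over columns where neither sequence has a gap (other definitions divide by the full alignment length or by the shorter sequence length).

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared


target = "ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG"    # Query row
template = "MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA"  # Hit #2 row

identity = percent_identity(target, template)
usable_template = identity >= 30.0  # threshold quoted in the text above
print(f"{identity:.1f}% identity, usable as template: {usable_template}")
```

Because the result depends on which columns are counted, the convention should be stated whenever an identity value is compared against the threshold.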
7 Steps In Homology Modeling
Step 1: ID Homologues in PDB
Query sequence:
PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK

The query sequence is searched against the sequences of the structures in the PDB. The matching entries (Hit #1, Hit #2) are the candidate templates.
Step 2: Align Sequences

Initial score matrix:

     G   E   N   E   S   I   S
G   10   0   0   0   0   0   0
E    0  10   0   0   0   0   0
N    0   0  10   0   0   0   0
E    0  10   0  10   0   0   0
T    0   0   0   0   0   0   0
I    0   0   0   0   0  10   0
C    0   0   0   0   0   0   0
S    0   0   0   0  10   0  10

Cumulative score matrix:

     G   E   N   E   S   I   S
G   60  40  30  20  20  10   0
E   40  50  30  20  20  10   0
N   30  30  40  20  20  10   0
E   20  30  20  30  20  10   0
T   20  20  20  20  20  10   0
I    0   0   0  10   0  20   0
C   10  10  10  10  10  10   0
S    0   0   0   0  10   0  10
Dynamic Programming
Alignment
Key step in homology modeling.
Global (Needleman-Wunsch) alignment is absolutely required.
A small error in the alignment can lead to a big error in the structural model.
Multiple alignments are usually better than pairwise alignments.
The alignment is prepared by superimposing all template structures.
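A global alignment of the two words in the Step 2 matrices (GENETICS down the side, GENESIS across the top) can be sketched with a textbook Needleman-Wunsch implementation. The scoring is an assumption chosen to match the slide: 10 for identical residues and 0 otherwise, with a gap penalty of 0 because the slide does not state one. This version fills the matrix forward from the top-left corner, which is the usual modern formulation and not necessarily the order used on the slide.

```python
def needleman_wunsch(a, b, match=10, mismatch=0, gap=0):
    """Global alignment by dynamic programming; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell is the best of a diagonal step or a gap in either sequence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, aligned_1, aligned_2 = needleman_wunsch("GENETICS", "GENESIS")
print(best)
print(aligned_1)
print(aligned_2)
```

With these parameters the optimal score equals 10 times the number of matched letters. Several alignments can share that score, so the traceback returns only one of them.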
Two zones of sequence alignment
Step 3: Find SCRs
Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA

SCR#1 SCR#2
Structurally Conserved regions (SCRs)

Corresponds to the most stable structures or regions (usually interior) of the protein.
Corresponds to sequence regions with the lowest level of gapping and the highest level of sequence conservation.
Usually corresponds to secondary structures.
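One simple way to approximate the "lowest level of gapping" criterion is to look for runs of alignment columns in which no sequence has a gap. The sketch below applies this to the Step 3 alignment. The function name and the minimum block length of 5 columns are assumptions for illustration; real SCR detection also uses the superimposed template structures and sequence conservation.

```python
def ungapped_blocks(alignment, min_len=5):
    """Return (start, end) column ranges, end exclusive, with no gap in any row."""
    ncols = len(alignment[0])
    blocks, start = [], None
    # Iterate one column past the end so a block touching the last column is closed.
    for col in range(ncols + 1):
        gap_free = col < ncols and all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


alignment = [
    "ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG",  # Query
    "ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA",  # Hit #1
    "MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA",  # Hit #2
]

scr_candidates = ungapped_blocks(alignment)
for start, end in scr_candidates:
    print(start, end, alignment[0][start:end])
```

Lowering `min_len` also reports the short ungapped stretches between the gaps, which is why a length cutoff of some kind is usually needed.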
Step 4: Find SVRs
Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA
        HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB
SVR (loop)
Structurally Variable Regions (SVRs)

Corresponds to the least stable or most flexible regions (usually exterior) of the protein.
Corresponds to sequence regions with the highest level of gapping and the lowest level of sequence conservation.
Usually corresponds to loops and turns.
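Since SVRs usually correspond to loops and turns, a first guess at their location can be read off a secondary structure string such as the one shown under the Step 4 alignment. The sketch below assumes that H marks helix, B marks β-strand and C marks coil; the slide does not define the letters, so this reading and the function name are assumptions.

```python
from itertools import groupby


def segments(ss):
    """Split a secondary structure string into (state, start, end) runs, end exclusive."""
    runs, position = [], 0
    for state, group in groupby(ss):
        length = len(list(group))
        runs.append((state, position, position + length))
        position += length
    return runs


ss = "HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB"  # string from the Step 4 slide

all_runs = segments(ss)
loop_regions = [(start, end) for state, start, end in all_runs if state == "C"]
print(all_runs)
print(loop_regions)
```

The coil run found this way covers the gapped middle part of the alignment, in line with the statement that SVRs show the highest level of gapping.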
Step 5: Side Chain Modeling

Rotamer placement and positioning is done via a superposition algorithm using rotamers.
Step 6: Model Optimization

Efficient way of polishing and shining your protein model.
Removes atomic overlaps and unnatural strains in the structure.
Stabilizes or reinforces strong hydrogen bonds, breaks weak ones.
Brings the protein to a low-energy conformation (a local energy minimum) in about 1-2 minutes of CPU time.
Several freeware options to choose from:
  XPLOR (Axel Brunger, Yale)
  GROMACS (Groningen, The Netherlands)
  AMBER (Peter Kollman, UCSF)
  CHARMM (Martin Karplus, Harvard)
  TINKER (Jay Ponder, Wash U)
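The idea of removing an atomic overlap by energy minimization can be illustrated on a toy system: two atoms interacting through a Lennard-Jones potential, relaxed by steepest descent. Everything here is an assumption for illustration (reduced units, the step size, the cap on each move, the function names); the packages listed above use full force fields and more sophisticated minimizers.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at distance r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.001, max_move=0.05, tol=1e-10, max_iter=100000):
    """Steepest descent on the distance, with each move capped for stability."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        move = -step * gradient
        move = max(-max_move, min(max_move, move))  # avoid overshooting on steep slopes
        r += move
    return r


r_start = 0.9  # atoms closer than sigma: an overlap with positive energy
r_relaxed = minimize(r_start)
print(r_start, lj_energy(r_start))
print(r_relaxed, lj_energy(r_relaxed))
```

The minimizer moves the atoms apart until the repulsive and attractive terms balance, which for this potential happens at a distance of 2^(1/6) sigma. It only finds the nearest local minimum, which is also true of minimization on a full protein model.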
Step 7: Model Validation

PROCHECK http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
PROSA II http://lore.came.sbg.ac.at/People/mo/Prosa/prosa.html
VADAR http://www.pence.ualberta.ca/ftp/vadar/
DSSP http://www.embl-heidelberg.de/dssp/
Homology Modeling On Web
http://www.expasy.ch/swissmod/SWISS-MODEL.html
http://www.cmbi.kun.nl:1100/WIWWWI/
http://cl.sdsc.edu/hm.html
Raw Sequence
Use templates to build the structure of the homologous sequence
Predicted structure
Use of SwissPDB Viewer to build the structure of the following sequence
MQQPMNYPCP QIFWVDSSAT SSWAPPGSVF PCPSCGPRGP DQRRPPPPPP PVSPLPPPSQ PLPLPPLTPL KKKDHNTNLW LPVVFFMVLV ALVGMGLGMY QLFHLQKELA ELREFTNQSL KVSSFEKQIA NPSTPSEKKE PRSVAHLTGN PHSRSIPLEW EDTYGTALIS GVKYKKGGLV INETGLYFVY SKVYFRGQSC NNQPLNHKVY MRNSKYPEDL VLMEEKRLNY CTTGQIWAHS SYLGAVFNLT SADHLYVNIS QLSLINFEES KTFFGLYKL
DOGB
1TNRA
After magic fit
Activate the raw sequence
The Preliminary Result
Protein Threading
Makes structure prediction through identification of good sequence-structure fit. Protein threading can predict only the backbone structure of a protein (side-chains have to be predicted using other methods)
Predicted
Actual
Ab Initio 3D structure prediction

Aims to predict tertiary structure from basic physico-chemical properties.
It is used when homology modeling and threading have failed (no homologs are evident).
Does not rely on detecting similarity to any sequence of known structure.
As yet very unreliable for practical predictions.
Applications
Structural bioinformatics can facilitate the discovery, design, and optimization of new chemical entities.
Computer-aided drug design (CADD), or computer-aided molecular design (CAMD), follows two strategies:
  Analog-based design (ligand based)
  Structure-based design (target based)
Analog Based Design

The analog-based approach mainly uses pharmacophoric maps and quantitative structure-activity relationships (QSAR) to identify or modify a lead in the absence of a known 3D structure of the receptor.
Structure-Based Design
The structure-based approach starts with the structure of the receptor site, such as the active site of a protein.
Docking comes under this category of design.
Quantitative Structure Activity relationship (QSAR)

QSAR is a family of mathematical models built to predict the biological and physicochemical behavior of molecules from their chemical structures.
It alleviates the need to measure the activity of hundreds of similar compounds individually, which would take large amounts of resources.
The underlying premise of QSAR is that the biological activity (BA) of a compound is correlated with its physicochemical parameters.
BA = f (biological + chemical + physical)
Biological activity can be any measured quantity, such as IC50 or ED50.
QSAR Table
Structure   Bioproperty   Structural properties
Comp.1      Bio1          P1   P2   P3   P4
Comp.2      Bio2          "    "    "    "
Comp.3      Bio3          "    "    "    "
Comp.4      Bio4          "    "    "    "
BA = k1P1 + k2P2 + k3P3 + ...
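The coefficients k1, k2, ... in the equation above are typically estimated by least squares from a table of measured activities and descriptors. The sketch below solves the normal equations with plain Python. The descriptor values and activities are hypothetical numbers constructed for illustration, the function name is an assumption, and no intercept is fitted because the equation as written has none (a constant descriptor column could be added to include one).

```python
def fit_linear(descriptors, activities):
    """Least-squares coefficients k for BA = k1*P1 + k2*P2 + ... (no intercept)."""
    n = len(descriptors[0])
    # Normal equations: (X^T X) k = X^T y
    a = [[sum(row[i] * row[j] for row in descriptors) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(descriptors, activities)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    k = [0.0] * n
    for i in range(n - 1, -1, -1):
        k[i] = (b[i] - sum(a[i][j] * k[j] for j in range(i + 1, n))) / a[i][i]
    return k


# Hypothetical QSAR table: one row of descriptors (P1, P2) per compound.
descriptors = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
activities = [0.0, 3.0, 2.0, 5.0, 5.0]  # constructed as BA = 2*P1 - 1*P2

coefficients = fit_linear(descriptors, activities)
predicted = [sum(k * p for k, p in zip(coefficients, row)) for row in descriptors]
print(coefficients)
print(predicted)
```

Because the example activities were generated exactly from a linear rule, the fit recovers the generating coefficients; measured data would leave residuals.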
EXTERNAL VALIDATION OF QSAR MODELS

Entire dataset
Training set
Test set
Model development (q²)
Prediction of the test set (R²)
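The external validation scheme above can be sketched for a one-descriptor model: the dataset is split into a training set and a test set, q² is estimated on the training set by leave-one-out cross-validation, and R² is computed for predictions on the held-out test set. The data, the fixed split, and the function names are hypothetical, and both statistics are computed here as 1 - SS_residual / SS_total about the mean of the observed values; other definitions exist in the QSAR literature.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """1 - SS_residual / SS_total, with SS_total about the observed mean."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def loo_q2(xs, ys):
    """Leave-one-out q2: each compound is predicted by a model fitted without it."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return r_squared(ys, predictions)


# Hypothetical dataset: one descriptor and one measured activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
test_idx = [2, 5]  # fixed split, chosen only to keep the example deterministic
train_idx = [i for i in range(len(descriptor)) if i not in test_idx]

train_x = [descriptor[i] for i in train_idx]
train_y = [activity[i] for i in train_idx]
test_x = [descriptor[i] for i in test_idx]
test_y = [activity[i] for i in test_idx]

q2 = loo_q2(train_x, train_y)                      # model development
slope, intercept = fit_line(train_x, train_y)
r2_test = r_squared(test_y, [slope * x + intercept for x in test_x])  # external test
print(q2, r2_test)
```

A model that looks good on q² but predicts the test set poorly is a sign of overfitting, which is the reason the test set is kept out of model development.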
Thank You
