
Protein Chemistry

Postgraduate students’ course


CONTENTS

Chapter 1 Introduction to Protein Chemistry

1.1 Overview—Advances in Protein Chemistry


1.2 General Concept of Protein
1.3 Protein Functions
1.4 Protein Structures
1.4.1 The Building Blocks of Proteins
1.4.2 Polypeptide Chain
1.4.3 Conformation of Polypeptide Chain
1.4.4 Protein Structure Biology
1.5 Interactions of Protein with Other Molecules
1.6 Protein Engineering
1.7 Proteome and Proteomics
1.8 International Journals and Websites Related to
Protein Chemistry

Chapter 2 Physicochemical Properties of Proteins

2.1 Size and Shape of Protein Molecules


2.2 The Charge Properties of Protein Molecules
2.2.1 The Charge Properties of Amino Acids
2.2.2 The Charge Properties of Protein Molecules
2.2.3 Isoelectric Point of Proteins
2.2.4 Electrophoresis
2.2.5 Ion Exchange
2.3 Colloid Property of Proteins
2.3.1 Diffusion
2.3.2 Viscosity
2.3.3 Impermeability
2.4 Precipitation
2.4.1 Salting out and Salting in
2.4.2 Basic and Acidic precipitation
2.4.3 Organic Solvent-caused precipitation
2.4.4 Heavy metal-caused precipitation
2.4.5 Interaction of Antibody and Antigen
2.4.6 Others
2.5 Color-formation of Proteins
2.6 Spectro-properties of Proteins
2.6.1 Ultraviolet absorption
2.6.2 Fluorescence spectrum
2.6.3 Circular Dichroism Spectrum

Chapter 3 Methods for Characterization and


Purification of Proteins

3.1 Methods of Protein Characterization

3.1.1 Solubility Reflects a Balance of Protein-Solvent


Interaction
3.1.2 Several Methods are Available for Determination
of Gross Size and Shape
3.1.3 Electrophoretic Methods are the Best Way to
Analyze Mixtures

3.2 Methods for Protein Purification

3.2.1 Differential Centrifugation Subdivides Crude


Extracts into Two or More Fractions
3.2.2 Differential Precipitation is Based on Solubility
Differences
3.2.3 Column Procedures Are the Most Versatile
Purification Methods
3.2.4 Electrophoretic Methods are Used for
Preparation and Analysis
3.2.5 Purification of Specific Proteins Involves
Combination of Different Procedures

Chapter 4 The Building Blocks of Proteins----


Amino Acids, Peptides, and Polypeptides

4.1 Amino acids


4.1.1 Amino acids have both acid and base
properties
4.1.2 Aromatic amino acids absorb light in the
near-ultraviolet
4.1.3 All amino acids except glycine show
asymmetry
4.2 Peptides and polypeptides
4.3 Determination of amino acid composition of
proteins
4.4 Determination of amino acid sequence of proteins
4.5 Chemical synthesis of peptides and polypeptides

Chapter 5 The Three-Dimensional Structures of


Proteins

5.1 The information for folding is contained in the


primary structure
5.2 The Ramachandran Plot predicts sterically
permissible structures
5.3 Protein folding reveals a hierarchy of structural
organization
5.4 Two secondary structures are found in most
proteins
5.4.1 The α helix
5.4.2 The β Sheet
5.5 Pauling and Corey provided the foundation for
understanding of fibrous protein structure
5.6 Collagen forms a unique triple-stranded structure
5.7 In globular proteins, secondary structure elements
are connected in simple motifs
5.8 The domain is the basic unit of tertiary structure
5.8.1 The helix-loop-helix motif is the basic
component found in α domain structures
5.8.2 α/β domains exploit the β-α-β motif
5.8.3 Antiparallel β domains show a great variety of
topologies
5.8.4 Some proteins or domains require additional
features to account for their stability
5.8.5 Many proteins contain more than one domain
5.9 Quaternary structure depends on the interaction of
two or more proteins or protein subunits
5.10 Predicting protein structure from protein primary
structure
5.11 Methods for determining protein conformation
5.11.1 X-ray diffraction analysis of fibrous proteins
5.11.2 X-ray diffraction analysis of protein crystals
5.11.3 Nuclear magnetic resonance (NMR)
complements X-ray crystallography
5.11.4 Optical Rotatory Dispersion (ORD) and
Circular Dichroism (CD)
Chapter 6 Protein Structure Prediction

Reference:
“Protein Structure Prediction A Practical
Approach” edited by M J E Sternberg
IRL Press
Chapter 7 Protein Folding and Unfolding

7.1 Kinetic Analysis of Complex Reactions


7.2 Kinetics of Unfolding
7.3 Kinetics of Refolding
7.3.1 Peptide bond isomerization
7.3.2 Refolding in the absence of slow peptide
bond isomerization
7.3.3 The prefolded state
7.3.4 The transition state for folding
7.4 Folding Pathways
7.4.1 Trapping intermediates with disulfides
7.4.2 Disulfide folding pathway of BPTI
7.5 Folding of large proteins
7.6 Biosynthetic folding
7.6.1 Basic methods
7.6.2 Multiple phases and cis-peptidyl-prolyl
bonds

Chapter 8 Stabilizing Protein Function

8.1 Understanding Protein Stability


8.1.1 Protein Stability and Its Measurement
Introduction
Protein Structure
Definition and measurement of protein stability
Folding stability
Kinetic stability

8.1.2 Studies on Denaturation, Inactivation and


Stabilizing Interactions

Introduction
Denaturation studies
Deleterious chemical reactions in proteins
Deamidation of asparagine residues
Isomerization of prolines
Destructive oxidation events
Proteolytic processes
Probing the stabilizing interactions in proteins
Replacement of conserved residues: E. coli
Thioredoxin
Carbohydrate side chains and protein stability
Is there a trade-off between stability and
activity?
8.1.3 Enzymes in Organic Media
Introduction
Enzyme behavior in anhydrous organic solvents
Some case studies
Water-

Chapter 9 Functional Diversity of Proteins

9.1 Targeting and functional diversity


9.1.1 Proteins are directed to the regions where
they are utilized
9.1.2 Classification of proteins according to
location emphasizes functionality
9.1.3 Protein structure is suited to protein function
9.2 Hemoglobin—an allosteric oxygen-binding
protein
9.3 Muscle---an aggregate of proteins involved in
contraction
9.4 Protein diversification as a result of evolutionary
pressures

Chapter 10 Proteins in Solution and in Membranes

10.1 Introduction
10.2 Physical and chemical properties of soluble
proteins
10.2.1 Aqueous solubility
10.2.2 Hydrodynamic properties in aqueous
solution
10.2.3 Spectral properties
10.2.4 Ionization
10.2.5 Chemical properties
10.3 Proteins in membranes
10.3.1 Association with membrane
10.3.2 Structures of integral membrane proteins
10.3.3 Identifying amino acid sequences likely to
traverse membranes
10.3.4 Dynamic behavior in membranes
10.4 Flexibility of protein structure

Chapter 11 Protein Engineering

11.1 Introduction to Protein Engineering


11.2 Production and Analytical Characterization of
Proteins
11.2.1 DNA level processes
Finding the protein of interest
Developing recombinant DNA libraries
Mutagenesis principle
11.2.2 Protein Characterization
Methods to determine and assess the protein
structure and composition
Assessment of mutant proteins
11.2.3 Summary of Issues to Consider before
Engineering a Protein
What is needed?
Industrial issues

11.3 Protein Engineering for Stability

11.4 Engineering therapeutic antibody

11.5 Site-directed drug design

Chapter 12 Proteome and Proteomics

12.1 Introduction to Proteomics


12.1.1 Proteome: a new word, a new field of biology
12.1.2 The Proteome and Technology
Thinking in two dimensions
Further dimensions in protein analysis
Information and the proteome
12.2 Two-Dimensional Electrophoresis: The State of
the Art and Future Direction
12.3 Protein Identification in Proteome Projects
12.4 The Importance of Protein Co- and Post-
Translational Modifications in Proteome Projects
12.5 Proteome Databases
12.6 Interfacing and Integrating Databases
12.7 Large-scale Comparative Protein Modelling
12.8 Applications of Proteomics

Chapter 13 Protein Synthesis, Targeting, and


Turnover
13.1 The cellular machinery of protein synthesis
13.1.1 Messenger RNA is the template for
protein synthesis
13.1.2 Transfer RNAs order activated amino
acids on the mRNA template
13.1.3 Ribosomes are the site of protein
synthesis
13.2 The Genetic code
13.2.1 The code was deciphered with the help
of synthetic messengers
13.2.2 The code is highly degenerate
13.2.3 Wobble introduces ambiguity into
codon-anticodon interactions
13.2.4 The code is not universal
13.2.5 The rules regarding codon-anticodon
pairing are species-specific
13.3 The Steps in translation
13.3.1
Chapter 1 Introduction to Protein Chemistry
1.1 Overview—Advances in Protein Chemistry
1. Advances in technological and experimental approaches.
Experimentalists are able to alter the activity and stability of proteins by
protein engineering, and the first tentative steps in protein design are
under way.
2. Theoreticians are able to simulate many aspects of folding and catalysis
with increasing detail and reliability.
3. The ultimate goal of protein science is to be able to predict the structure
and activity of a protein de novo and how it will bind ligands.
4. Proteome and proteomics

Protein biology is the study of protein structure revealing details of the


function and life cycle of individual polypeptides. Completion of the sequences of
animal and plant genomes will not be the consummation of modern biology, but a
new beginning. The experimental and philosophical challenge is not just to
identify polypeptide gene products for all coding genes, but to provide a rationale
for how they work. As part of a cellular response to external stimuli, proteins may
be altered by chemical modification or change of subcellular localization,
processing, degradation, or concentration. How do proteins work? How do they
work together? How do they work over time and space?
The development of technologies and experimental approaches that were
required to answer the questions of protein biology accelerated during the early
period of genomic analysis. These technologies are now essential tools of
experimental biology. All of the strategies aim toward the highest sensitivity
analysis possible. Chromatographic and electrophoretic separation and
determination, Edman sequence analysis, amino acid analysis, and mass
spectrometry are now all performed routinely with a few picomoles of protein or
peptide. Subpicomoles and low femtomole levels of analysis are reported in the
literature more frequently, and experiments detecting attomoles of protein or
peptide have been described.
Covalent modifications of proteins are an essential part of the language of
intracellular and intercellular communication. They may be reversible or
irreversible. They may be required for biological activity, or simply modulate it,
functioning as “molecular switches”. They are important for signalling and
molecular and cellular recognition. These modifications may impart structural
stability or unique structural features, facilitate protein folding, or promote
intermolecular interactions. They may anchor a protein in a membrane or
determine intracellular or extracellular position. Modification can alter the
biological lifetime of a protein, and define the process by which it is degraded.
Whereas some modifications are present on an entire population of a polypeptide,
others are present on only a small subset of the protein at any one moment.
The extent and range of modification present in this population are not
routinely quantified, but could have important biological implications. Thus,
knowledge of the possible modifications of a particular protein, as well as of
modifications induced by external stimuli, is of fundamental importance to
research in structural biology and cell biology alike.
Protein biology also includes the design of new proteins to test biological and
physiological hypotheses. One can now design and prepare novel proteins with new
functions that cannot be isolated from nature. How can one manipulate proteins to
understand their original function or to make them perform new ones? On what
principles can these designs be based?
Both molecular biological and chemical synthetic procedures have made it
possible to alter polypeptide structure to probe details of function. Strategies can
be developed to alter a single amino acid, to substitute a non-natural residue, or to
mix and match functional units to create new biological entities.
Last but not least, bioinformatics has accelerated the development of
protein research.

1.2 General Concept of Protein (see also page 1 in Text)

1.3 Protein Functions

1.4 Protein Structures


1.4.1 The Building Blocks of Proteins
1.4.2 Polypeptide Chain
1.4.3 Conformation of Polypeptide Chain
1.4.4 Protein Structure Biology

1.5 Interactions of Protein with Other Molecules

1.6 Protein Engineering

1.7 Proteome and Proteomics

Proteome indicates the PROTEins expressed by a genOME or tissue. Despite
being first used only in late 1994 at the Siena 2-D electrophoresis meeting, the term
proteome is already widely accepted. It is mentioned in more than 30 papers (up to
the end of 1997), including papers in Science and Nature, and “proteomics” has
been the subject of a rash of conferences in 1997.
The antecedent for proteome is genome, itself a relatively new word. Recently,
genome has become a generic term for “big science” molecular biology. People
think of grand dreams like the Human Genome Project in the context of genome,
and the genome projects have captured the imagination of scientific funding
bodies and the biotechnology industry. By sequencing the entire genome of an
organism, the complexity of that organism can, for the first time in biology, be
understood at the level of information content.
The proteome, unlike the genome, is not a fixed feature of an organism. Instead,
it changes with the state of development, the tissue or even the environmental
conditions under which an organism finds itself. There are therefore many more
proteins in a proteome than genes in a genome, especially for eukaryotes. This is
because there can be various ways a gene is spliced in constructing mRNA, and
there are many ways that the same protein can be post-translationally altered. So
one of the famous dogmas of biology, the one-gene-one-enzyme hypothesis of
Beadle and Tatum, is no longer tenable.

1.8 International Journals and Websites Related to Protein Chemistry

Table 1.1 International Journals Related to Protein Chemistry (295 Journals in
Biochemistry and Molecular Biology in 1998)

Rank  Journal Abbreviation    Impact Factor  1998 Articles
1     Annu Rev Biochem        39.000         26
2     Cell                    38.686         407
8     Nat Struc Biol          13.563         158
15    Curr Opin Struc Biol    8.690          93
23    J Biol Chem             7.199          4879
31    Adv Protein Chem        5.870          7
45    Biochemistry            4.628          2014
46    Protein Sci             4.440          277
52    Enzyme Protein          4.080
66    Proteins                3.346          177
73    Protein Eng             2.947          160
77    Method Enzymol          2.823          218
95    Arch Biochem Biophys    2.497          481
97    Biochim Biophys Acta    2.478          1813
123   Anal Biochem            1.991          460
132   Adv Enzyme Regul        1.884          22
167   Protein Expres Purif    1.382          166
175   J Protein Chem          1.278          140
Table 1.2 Websites Related to Protein Chemistry

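GenBank (US gene and protein database)
GeneCard (http://bioinformatics.weizmann.ac.il/cards/index.html)
EMBL (European Molecular Biology Laboratory database)
HUGD (human gene mutation database)
DDBJ (gene database of the National Institute of Genetics, Japan)
Swiss-Prot (Swiss protein database)
Protein three-dimensional structure database (Brookhaven National Laboratory, USA)
PIR (protein database of the National Biomedical Research Foundation, USA)
Human SNP Database (Whitehead Institute/MIT Center for Genome
Research)
OPD (Oligonucleotide probe database)
PROSITE (EMBL; characteristic sequence patterns and sites in protein sequences)
HSSP (EMBL; homologues of proteins with known three-dimensional structure)
DSSP (EMBL; protein secondary structure and solvent information)
FSSP (EMBL; structural families grouped by similarity of protein fold)
SBASE (ABC, ICGEB; data on protein structural and functional domains)
TFD (NCBI; transcription factors and their properties)
TRANSTERM (University of Otago, New Zealand; translation termination signal database)
REBASE (New England Biolabs; restriction enzyme and methylase database)
Genome Data Base (GDB)
The PredictProtein server (EMBL in Heidelberg)
BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
dbEST (http://www.ncbi.nlm.nih.gov/dbEST/)
Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
MMDB (http://www.ncbi.nlm.nih.gov/Structure/)
Gene chip (microarray) database ---?
MPDB (synthetic oligonucleotide probe database) ---?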
Chapter 2 Physicochemical Properties of Proteins
Chapter 3 Methods for Characterization and
Purification of Proteins

To analyze the structure of a protein, we must isolate it from
the complex mixture of substances in which it exists in the whole
cell. The primary objective of this chapter is to describe techniques
and strategies used for protein purification. Because these
procedures are often used for protein characterization as well,
they will add to the methods already discussed for protein
characterization.

3.1 Methods of Protein Characterization


First we will discuss methods used for protein
characterization. Some of them will be discussed in later
related chapters.

3.1.1 Solubility Reflects a Balance of Protein-Solvent


Interaction
The solubility of a protein reflects a delicate balance
between different energetic interactions, both internally
within the protein and between the protein and the
surrounding solvent. Consequently, the protein’s solvent
or thermal environment could affect both its solubility
and structure. As we have seen, extreme changes can lead
to denaturation. In this part, we will, for the most part, be
concerned with conditions under which the native
structure is maintained. Changes in protein solubility that
do not destroy the molecule’s structural integrity can
occur in several ways.
A. Minimum in solubility occurs at the isoelectric point
Proteins typically have on their surfaces charged amino
acid side chains that undergo energetically favorable polar
interactions with the surrounding water. The total
charge on the protein is the sum of the side-chain
charges. However, the actual charge on the weakly
acidic and basic side-chain groups also depends on the
solution pH. The decrease in solubility at the isoelectric
pH reflects the fact that the individual protein
molecules, which would all have a similar charge at pH
values away from their isoelectric points, cease to repel
each other.
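
The pH dependence of the net charge described above can be made concrete with a short calculation. The following is a minimal sketch, assuming that each ionizable group titrates independently according to the Henderson-Hasselbalch relation. The pK values are rounded, illustrative numbers chosen for this example (they are not taken from Table 4.2), and the function names are hypothetical.

```python
# Illustrative pK values (assumed, rounded; real values shift with local environment).
PK_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}            # +1 when protonated
PK_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}   # -1 when deprotonated


def net_charge(sequence, ph):
    """Sum the average charge of every ionizable group at the given pH."""
    counts_pos = {"N_term": 1}   # one free alpha-amino group per chain
    counts_neg = {"C_term": 1}   # one free alpha-carboxyl group per chain
    for residue in sequence.upper():
        if residue in PK_POSITIVE:
            counts_pos[residue] = counts_pos.get(residue, 0) + 1
        elif residue in PK_NEGATIVE:
            counts_neg[residue] = counts_neg.get(residue, 0) + 1
    charge = 0.0
    for group, n in counts_pos.items():
        # fraction of a basic group that is protonated (charge +1)
        charge += n / (1.0 + 10 ** (ph - PK_POSITIVE[group]))
    for group, n in counts_neg.items():
        # fraction of an acidic group that is deprotonated (charge -1)
        charge -= n / (1.0 + 10 ** (PK_NEGATIVE[group] - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions a lysine-rich peptide carries a positive net charge at pH 7 and an aspartate-rich one a negative charge, so their isoelectric points lie on opposite sides of neutrality. At the isoelectric point the computed net charge is zero, which is the condition under which the molecules cease to repel each other.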

B. Salting in and salting out


Proteins also show a variation in solubility that
depends on the concentration of salts in the solution.
These frequently complex effects may involve specific
interactions between charged side chains and solution
ions, or, particularly at high salt concentrations, may
reflect more comprehensive changes in the solvent
properties.
Figure 7.1 and 7.2
Salting-in effect: The effect of salts such as sodium
chloride on increasing the solubility of globulins is
often referred to as salting in. The salting in effect is
related to the nonspecific effect the salt has on
increasing the ionic strength of the solution. The higher
the ionic strength, the smaller are the interactions
between charged groups on the same or different
proteins.
Salting-out effect: The effect of salts such as ammonium
sulfate on decreasing the solubility of proteins is referred
to as salting out, which occurs with salts that effectively
compete with the protein for available water molecules.
In this case the protein molecules tend to associate with
each other because at high salt concentrations, protein-
protein interactions become energetically more
favorable than protein-solvent interactions. Each
protein has a characteristic salting-out point, and we
can exploit this fact to make protein separations in
crude extracts.

3.1.2 Several Methods are Available for Determination
of Gross Size and Shape
Several methods are available for determining the size
and shape of protein molecules in solution.
A. Sedimentation rate is a function of size and shape
Information concerning the molecular weight of a protein
can be obtained by observing its behavior in an intense
centrifugal field. To get a qualitative understanding of
how this method works, we must first recognize that
protein molecules are generally slightly denser than
water. However, the molecules in a protein solution
seldom settle out in the earth’s gravitational field(1×g)
because they are constantly being stirred up by collisions
with surrounding solvent molecules. Nevertheless,
protein molecules in solution can be made to settle if they
are subjected to a very high centrifugal force field (~100,000
× g), such as can be attained in an ultracentrifuge.
The protein molecules slowly migrate toward the
bottom of the centrifuge tube at a rate that increases with
their molecular weight and also depends on their shape.
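
As the summary of this chapter notes, sedimentation and diffusion rates can be combined to calculate a molecular weight. One standard relation for doing so is the Svedberg equation, M = RTs / (D(1 - v̄ρ)), where s is the sedimentation coefficient, D the diffusion coefficient, v̄ the partial specific volume of the protein and ρ the solvent density. The sketch below is an illustration under that assumption; the function name and the unit choices are made for this example only.

```python
R = 8.314  # gas constant, J mol^-1 K^-1


def svedberg_molar_mass(s, d, v_bar, rho, temperature=293.15):
    """Return the molar mass in g/mol from the Svedberg equation.

    s:      sedimentation coefficient in seconds (1 Svedberg unit = 1e-13 s)
    d:      diffusion coefficient in m^2/s
    v_bar:  partial specific volume of the protein in cm^3/g
    rho:    solvent density in g/cm^3
    """
    buoyancy = 1.0 - v_bar * rho          # dimensionless buoyancy factor
    if buoyancy <= 0:
        raise ValueError("particle is not denser than the solvent and will not sediment")
    molar_mass_kg = R * temperature * s / (d * buoyancy)   # kg/mol in SI units
    return molar_mass_kg * 1000.0                          # convert to g/mol


# Illustrative inputs (assumed values, not measurements from this course):
# s = 4.5 S, D = 6.9e-11 m^2/s, v_bar = 0.749 cm^3/g, water at 20 C.
example_mass = svedberg_molar_mass(4.5e-13, 6.9e-11, 0.749, 0.998)
```

With these illustrative inputs the result is a little over 60,000 g/mol. The buoyancy factor makes explicit why the method works only because protein molecules are slightly denser than water.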

B. Gel-exclusion chromatography gives a measure of size
Gel-exclusion chromatography is used for size estimation
as well as protein purification. This popular technique
exploits the availability of both natural polysaccharides
and synthetic polymers that can be formed into beads
with varying pore sizes, depending on the extent of cross-
linking between polymer chains.
Figure 7.5 and 7.6
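
Size estimation on such a column is usually done by calibration: proteins of known molecular weight are run first, and because larger molecules are excluded from more of the bead pores they elute earlier. Over the useful range of a column, the logarithm of the molecular weight falls roughly linearly with elution volume. The following is a minimal sketch of that calibration, assuming a simple least-squares line; the standards listed are made-up numbers for illustration, and the function names are hypothetical.

```python
import math


def fit_calibration(standards):
    """Fit log10(MW) = slope * elution_volume + intercept by least squares.

    standards: list of (elution_volume_ml, molecular_weight_da) pairs.
    """
    xs = [ve for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(elution_volume_ml, slope, intercept):
    """Read a molecular weight off the calibration line."""
    return 10 ** (slope * elution_volume_ml + intercept)


# Hypothetical standards: (elution volume in mL, molecular weight in Da).
STANDARDS = [(10.0, 1_000_000), (12.0, 100_000), (14.0, 10_000)]
SLOPE, INTERCEPT = fit_calibration(STANDARDS)
UNKNOWN_MW = estimate_mw(13.0, SLOPE, INTERCEPT)  # unknown eluting between the last two standards
```

The estimate is only as good as the assumption that the unknown has a shape similar to the standards, since the method measures effective size in solution rather than mass directly.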

3.1.3 Electrophoretic Methods are the Best Way to


Analyze Mixtures
Electrophoresis is one of the most commonly used
techniques in biochemistry. Electrophoresis is very much
like sedimentation, since in both cases a force gradient
leads to protein transport in the direction of the force. In
the case of sedimentation the force is gravitational or centrifugal,
so the rate of migration depends on the effective mass of the particle.
In electrophoresis the force arises from the electric field, E,
so the rate of migration depends on the net charge on the
molecule, rather than its mass. (Figures 7.7 and 7.8).
(1) SDS-PAGE
(2) IEF-SDS-PAGE

3.2 Methods for Protein Purification


Before we can fully characterize a protein, we must
purify it from a natural source. Once the decision has
been made to purify a particular protein, several factors
must be weighed, for example: How much material is needed?
What level of purity is required? The starting material
should be readily available and should contain the desired
protein in relative abundance. If the protein is part of a
large structure, such as the nucleus, the mitochondrion, or the
ribosome, then it is advisable to isolate the large structure
first from a crude cell extract.
Purification must usually be performed in a series of
steps, using different techniques at each step. Some
purification techniques are more useful when handling
large amounts of material, whereas others work best on
small amounts. A purification procedure is arranged so
that the techniques that are best for work with large
amounts are used during the early steps of the overall purification.
The suitability of each purification step is evaluated in
terms of the amount of purification achieved by that step
and the percent recovery of the desired protein.
Combining techniques introduces new considerations
and new problems. If each of two purification techniques
gives a ten-fold enrichment for the desired protein when
executed independently on a crude extract, this does not
mean that they will give 100-fold enrichment when
combined. In general, they will give somewhat less. As a
rule, purification techniques that combine most
effectively usually are based on different properties of the
protein.
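
The two criteria named above, the amount of purification achieved and the percent recovery, are conventionally tracked in a purification table. The sketch below shows one way to compute it, assuming that activity is measured in arbitrary units and protein in milligrams; the step names and all numbers are invented for illustration.

```python
def purification_table(steps):
    """Compute specific activity, fold purification and recovery for each step.

    steps: list of (name, total_protein_mg, total_activity_units),
           with the crude extract as the first entry.
    """
    _, crude_protein, crude_activity = steps[0]
    crude_specific = crude_activity / crude_protein
    rows = []
    for name, protein_mg, activity_units in steps:
        specific = activity_units / protein_mg            # units per mg protein
        rows.append({
            "step": name,
            "specific_activity": specific,
            "fold_purification": specific / crude_specific,
            "percent_recovery": 100.0 * activity_units / crude_activity,
        })
    return rows


# Made-up example data: (step, total protein in mg, total activity in units).
STEPS = [
    ("crude extract", 1000.0, 10000.0),
    ("ammonium sulfate cut", 200.0, 8000.0),
    ("ion-exchange column", 20.0, 6000.0),
]
ROWS = purification_table(STEPS)
```

In this invented example the first step gives a 4-fold and the two steps together a 30-fold enrichment, while recovery drops to 60 percent. Measuring the cumulative figures after each step, rather than multiplying the enrichments each technique achieves on its own, is what reveals how well two techniques actually combine.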
Throughout the purification we must have a
convenient means of assaying for the desired protein so
that we can know the extent to which it is being enriched
relative to the other proteins in the starting material. In
addition, a major concern in protein purification is
stability. Once the protein is removed from its normal
habitat, it becomes susceptible to a variety of
denaturation and degradation reactions. Specific
inhibitors are sometimes added to minimize attack by
proteases on the desired protein. During purification it is
usual to carry out all operations at 5 °C or below.
In their natural habitat, proteins are usually surrounded
by other proteins and organic factors. When these are
removed or diluted, as during purification, the protein
becomes surrounded by water on all sides. Proteins react
differently to a pure aqueous environment; many are
destabilized and rapidly denatured. A common remedial
measure is to add 5% to 20% glycerol to the purification
buffer. The organic surface of the glycerol is believed to
simulate the environment of the protein in the intact cell.
Two other ingredients that are most frequently added to
purification buffers are mercaptoethanol and EDTA.

3.2.1 Differential Centrifugation Subdivides Crude


Extracts into Two or More Fractions
3.2.2 Differential Precipitation is Based on Solubility
Differences
3.2.3 Column Procedures Are the Most Versatile
Purification Methods
3.2.4 Electrophoretic Methods are Used for
Preparation and Analysis
3.2.5 Purification of Specific Proteins Involves
Combination of Different Procedures

Summary
1. Protein solubility is not a fixed quantity for a given
protein. Rather, it is a function of many variables. Two
of these are pH and salt concentration. Proteins show a
minimum solubility at their isoelectric point.
Frequently, proteins require the addition of a small
amount of salt to become soluble, but excessive
amounts of salt lead to protein precipitation.
2. There are several methods for determination of
molecular weight. These include sedimentation
analysis and gel-exclusion chromatography.
Sedimentation analysis may be used in two different
ways: (1) by independently determining the
sedimentation and diffusion rates and combining this
information to calculate a molecular weight and (2) by
equilibrium ultracentrifugation. Gel-exclusion
chromatography uses cross-linked polydextrans and
relates molecular weight to the rate of migration
through a column.
3. Electrophoretic methods are used in various ways to
characterize protein mixtures and purified proteins.
The high resolution attainable by electrophoresis
makes it ideal for determining the number of proteins
in a mixture as well as their approximate size.
4. Methods of protein purification include differential
centrifugation, differential precipitation with
(NH4)2SO4, gel-exclusion chromatography, differential
electrophoretic mobility, and differential affinities for
column matrices containing different functional groups.
Column procedures are particularly versatile because
of the large number of functional groups that can be
used to bind proteins in different ways and because of
the variety of conditions for differential column
elution.

Chapter 4 Primary Structure of Proteins: The


Building Blocks of Proteins----Amino Acids,
Peptides, and Polypeptides

Key-points:
From our presentation you will learn the
following:
1. Certain acidic and basic properties are common to all amino
acids found in proteins except for the amino acid proline.
2. Side chains give amino acids their individuality. These side
chains serve a variety of structural and functional roles.
3. The alpha-carboxyl group of one amino acid can react with
the alpha-amino group of another amino acid to form a
dipeptide.
4. Many amino acids, reacting in a similar way, can become
linked to form a linear polypeptide chain.
5. The amino acid sequence in a polypeptide can be determined
by a process of partial breakdown into manageable
fragments, followed by stepwise analysis proceeding from
one end of the chain to the other.
6. Polypeptide chains with a prespecified sequence can be
synthesized by well-established chemical methods.


4.1 Amino acids

Figure 1 Amino acid anatomy


Table 1 Structure of the 20 amino acids found in protein

4.1.1 Amino acids have both acid and base


properties

Fig 4.3 Equilibrium between charged and uncharged forms of


amino acid side chains
Table 4.2 Values of pK for the ionizable groups of the 20 amino
acids commonly found in proteins

4.1.2 Aromatic amino acids absorb light in the
near-ultraviolet

Fig 4.4 Ultraviolet absorption spectra of Trp, Tyr, and Phe at pH


6.

4.1.3 All amino acids except glycine show
asymmetry
Fig 4.5 The covalent structure of alanine, showing the three-
dimensional structure of the L- and D- stereoisomeric forms

4.2 Peptides and polypeptides

Fig 4.6 Formation of a dipeptide from two amino acids


Fig 4.7 A polypeptide chain, with the backbone shown in color and
the amino acid side chains in outline

4.3 Determination of amino acid composition of


proteins
To determine a protein’s amino acid composition, it is necessary
to (1) break down the polypeptide chain into its constituent
amino acids, (2) separate the resulting free amino acids
according to type, and (3) measure the quantities of each amino
acid.
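
Step (3) yields an amount for each amino acid, which is then usually expressed as a percentage of the total. A minimal sketch of that last conversion follows, assuming the amounts are reported in nanomoles; the function name and the example values are hypothetical.

```python
def mole_percent(amounts_nmol):
    """Convert measured amounts of each amino acid into mole percent."""
    total = sum(amounts_nmol.values())
    if total <= 0:
        raise ValueError("no amino acids were measured")
    return {aa: 100.0 * amount / total for aa, amount in amounts_nmol.items()}


# Invented measurements from a hydrolysate (nanomoles of each amino acid).
MEASURED = {"Gly": 30.0, "Ala": 50.0, "Ser": 20.0}
COMPOSITION = mole_percent(MEASURED)
```

Because the result is a ratio, it does not depend on how much protein was hydrolyzed, only on the relative amounts recovered.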
Table 4.3 Amino acid content of protein (in percent)

4.4 Determination of amino acid sequence of proteins

The amino acid sequence of the first protein (insulin) was


determined in 1953 by Sanger’s laboratory.

The importance of knowledge of the amino acid sequence


of the proteins shows in a number of ways:
It permits comparisons to be made between normal and mutant
proteins;
It permits comparisons to be made between comparable proteins
in different species and thereby has been instrumental in
positioning different organisms on the evolutionary tree;
Finally, and most importantly, it is a vital piece of information for
determining the three-dimensional structure of the protein.

The steps to determine the amino acid sequence of the


protein(Fig 4.13):

Purification of the protein


Cleavage of all disulfide bonds
Determination of the terminal amino acid residues
Specific cleavage of the polypeptide chain into small fragments
in at least two different ways
Independent separation and sequence determination of peptides
produced by the different cleavage methods
Reassembly of the individual peptides with appropriate overlaps
to determine the overall sequence
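
The final reassembly step can be sketched in code. The example below is a simplified illustration, assuming error-free peptide sequences and a greedy strategy that repeatedly joins the two fragments sharing the longest end overlap. The 16-residue peptide is invented, and the two fragment sets are what cleavage after Lys/Arg (as trypsin does) and after Met (as cyanogen bromide does) would give for it; the function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_overlap=2):
    """Greedily merge peptide fragments by their longest end overlaps."""
    unique = sorted(set(fragments))
    # A fragment fully contained in a longer one adds no ordering information.
    pieces = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(pieces) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k < min_overlap:
            break  # no convincing overlap left; return the unjoined pieces
        merged = pieces[best_i] + pieces[best_j][best_k:]
        pieces = [p for idx, p in enumerate(pieces) if idx not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Invented peptide ALKGMFTRSEMVWKDP, cleaved in two different ways.
CLEAVED_AFTER_LYS_ARG = ["ALK", "GMFTR", "SEMVWK", "DP"]
CLEAVED_AFTER_MET = ["ALKGM", "FTRSEM", "VWKDP"]
ASSEMBLED = assemble(CLEAVED_AFTER_LYS_ARG + CLEAVED_AFTER_MET)
```

Neither fragment set alone fixes the order of its peptides; the order emerges only because each fragment from one cleavage spans a junction in the other. With real data, overlaps as short as two residues can be ambiguous, which is one reason the terminal residues are determined independently.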
Figure 4.14 Disulfide cleavage reaction
Figure 4.15 Polypeptide chain end-group analysis
Figure 4.16 The Dansyl chloride method for N-terminal amino
acid determination
Figure 4.17 The cleavage of polypeptide chains at methionine
residues by cyanogen bromide
Figure 4.18 The Edman degradation for polypeptide sequence
determination
4.5 Chemical synthesis of peptides and polypeptides
Chapter 5 The Three-Dimensional Structures of
Proteins
Proteins adopt the most stable folded structure, which
is a function of the way in which the individual amino
acid residues interact with one another.

Two famous scientists, Pauling and Corey, made
great progress in deducing protein structure, using
information from various sources:
They knew a little about the structure of peptides from
small-molecule crystallography, which indicated that the
peptide bond was planar and gave accurate bond lengths
and angles
They were already aware of the importance of hydrogen
bonds in determining the orientation of amino acids,
peptides and even water in simple crystals
They made shrewd guesses about the interpretation of a
few spacings in the diffraction patterns of certain fibrous
proteins
Putting all of this information together, they
experimented with molecular models until they could
produce structures in reasonable agreement with all the
available facts.

Kendrew and Perutz:


myoglobin and hemoglobin

Enormous advances have been made in protein


chemistry and in computer technology as well:
The final folded arrangements of proteins may some
day be predictable from the amino acid sequences of the
polypeptide chains.
5.1 The information for folding is contained in the
primary structure

The conformation of a native or highly organized protein
reflects a delicate balance among a variety of interaction forces,
both within the folded protein’s interior and with the surrounding
solvent.

Example:
Figure 5.1 Schematic representation of an experiment to
demonstrate that the information for folding into a biologically
active conformation is contained in the protein’s amino acid
sequence.

Chaperones: a special class of proteins that appear to catalyze
the folding of polypeptide chains into a native
conformation

5.2 The Ramachandran Plot predicts sterically
permissible structures

Figure 5.2 Basic dimensions of (a) the peptide and (b) the
dipeptide
Figure 5.4 Ramachandran plot
Figure 5.5 Ramachandran plot(2)

In summary, it can be seen that owing to the basic geometric


properties of the polypeptide chain, its sterically allowed
conformations are severely restricted by the occurrence of
unfavorable steric interactions between various atomic groups.
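
The two axes of a Ramachandran plot are the backbone torsion angles φ (defined by the atoms C, N, Cα, C of consecutive residues) and ψ (N, Cα, C, N). A minimal sketch of how such a torsion angle can be computed from four atomic positions follows. It assumes Cartesian coordinates given as 3-tuples and the usual sign convention, in which a clockwise rotation viewed along the central bond is positive; the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)      # central bond
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)    # normal to the plane containing p1, p2, p3
    n2 = _cross(b2, b3)    # normal to the plane containing p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# phi for residue i would be dihedral(C_prev, N, CA, C),
# psi for residue i would be dihedral(N, CA, C, N_next).
```

Computing φ and ψ for every residue of a known structure and plotting one against the other shows how the observed conformations cluster inside the sterically allowed regions.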

5.3 Protein folding reveals a hierarchy of structural


organization
Anfinsen’s experiment: protein folding is a spontaneous
process.

In actuality, newly synthesized polypeptide chains typically
fold in seconds. This means that protein folding must be a
highly directed and cooperative process. Although much
remains to be learned about the details of the process, its speed
and its facility suggest the existence of a sequential set of
folding intermediates, each being more highly organized than
the one before it.

Figure 5.6 Possible successive steps in the protein folding
process.

The forces that stabilize proteins act in concert with related
energetic and geometric factors to yield successively larger and
more complex protein structural arrangements.

Figure 5.7 Hierarchies of protein structure

To begin with, steric interactions restrict the accessible
conformations and reflect features of the protein’s amino acid
sequence, or Primary Structure. The requirement for
hydrogen-bond preservation in the folded structure results in
the cooperative formation of regular structural regions in proteins.
This situation arises principally because of the regular repeating
geometry of the hydrogen-bonding groups of the polypeptide
backbone and leads to the formation of regular hydrogen-bonded
Secondary Structures. Association between elements of
secondary structure in turn results in the formation of
Structural Domains, whose properties are determined both by
the chiral properties of the polypeptide chain and by packing
requirements that effectively minimize the molecule’s
hydrophobic surface area. Further association of domains results
in the formation of the protein’s Tertiary Structure, or overall
spatial arrangement of the polypeptide chain in three
dimensions. Likewise, fully folded protein subunits can pack
together to form Quaternary Structure, which can serve a
structural role or provide a structural basis for modification of
the protein’s functional properties.

5.4 Two secondary structures are found in most proteins


A major driving force in folding is the necessity to minimize
the extent of exposure of the hydrophobic groups to solvent. This
consideration involves a sacrifice of the favorable
hydrogen-bonded interactions between the unfolded polypeptide
backbone and water. To preserve a favorable energy balance of folding,
the backbone polypeptide groups must take part in alternative
hydrogen-bonded interactions among themselves in the
protein’s folded state.

5.4.1 The α helix


Characterizing Parameters of the α helix:
(1) each residue’s carbonyl group forms a hydrogen bond with
the amide NH group of the residue four amino acids farther
along the polypeptide chain;
(2) all residues in an α helix have nearly identical
conformations, averaging ψ = -45º to -50º and Φ = -60º, so
they lead to a regular structure in which each 360º of
helical turn incorporates approximately 3.6 amino acid
residues and rises 5.4 Å along the helix axis direction.
(3) The advance per amino acid residue along the helix axis is
1.5Å.
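
The helix parameters listed above are linked by simple arithmetic: the rise per turn is the number of residues per turn multiplied by the advance per residue. The minimal Python sketch below makes this explicit. The function names are illustrative only; the values 3.6 residues per turn and 1.5 Å per residue are the ones quoted in the list.

```python
# Helix parameters quoted in the text.
RESIDUES_PER_TURN = 3.6    # residues per 360-degree turn
RISE_PER_RESIDUE = 1.5     # advance along the helix axis per residue, in angstroms


def helix_pitch(residues_per_turn=RESIDUES_PER_TURN, rise=RISE_PER_RESIDUE):
    """Rise along the helix axis for one full turn (angstroms)."""
    return residues_per_turn * rise


def rotation_per_residue(residues_per_turn=RESIDUES_PER_TURN):
    """Angle turned about the helix axis by each residue (degrees)."""
    return 360.0 / residues_per_turn


def helix_length(n_residues, rise=RISE_PER_RESIDUE):
    """Approximate axial length of a helix of n_residues (angstroms)."""
    return n_residues * rise


if __name__ == "__main__":
    print(f"pitch: {helix_pitch():.1f} A per turn")
    print(f"rotation: {rotation_per_residue():.0f} degrees per residue")
    print(f"18-residue helix: {helix_length(18):.1f} A long")
```

With these inputs the pitch is 3.6 × 1.5 Å, and each residue rotates the chain by 100º about the axis.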

Figure 5.8 Three ways of projecting the α helix.

An important property stemming from the conformational
regularity of the α helix, which applies to other secondary
structures as well, is cooperativity in folding.

5.4.2 The β Sheet

Figure 5.9 The antiparallel β sheet

β sheets occur in two different arrangements. In the first of
these, the chains are arranged with the same N-to-C polypeptide
sense to produce a parallel β sheet. Alternatively, the chains
can be aligned with opposite N-to-C sense to produce an
antiparallel β sheet.

Figure 5.10 Two forms of the β sheet structure

5.5 Pauling and Corey provided the foundation for
understanding of fibrous protein structure

Linus Pauling and Robert Corey examined the structure of
crystals formed by amino acids and short peptides and formulated
two rules that describe the ways in which amino acids and
peptides interact with one another to form noncovalently
bonded crystalline structures. These rules laid the foundations
for our understanding of how the amino acids in a protein’s
polypeptide chain interact with one another:

1st rule: the peptidyl C-N linkage and the four atoms to which
the C and the N atoms are directly linked always form a planar
structure (which indicates that the only flexibility in the
polypeptide backbone arises from rotation about the bonds to the
α-carbon that joins adjacent planar peptide groups).
2nd rule: peptidyl carbonyl and amino groups always form the
maximum number of hydrogen bonds.
Taken together, these two rules drastically reduce the number of
possible conformations available to the polypeptide chain.

5.6 Collagen forms a unique triple-stranded structure

5.7 In globular proteins, secondary structure elements
are connected in simple motifs

Figure 5.17 The structure of lysozyme, a 129-residue protein and
the first enzyme whose three-dimensional structure was determined.
This protein has local regions of ordered α-helical and
antiparallel β-sheet secondary structure. In addition, it has
several regions of single-stranded loops with a less
regular conformation.

At the most elementary level of structural analysis, it was
found that simple combinations of a few secondary structure
elements with specific geometric arrangements are used again
and again in different protein structures. The three structural
motifs used most frequently are:
(1) The helix-loop-helix (Figure 5.18)
(2) The hairpin β motif (Figure 5.19, 5.20, 5.21)
(3) The β-α-β motif (Figure 5.22)

5.8 The domain is the basic unit of tertiary structure


A domain constitutes a stable unit of tertiary structure; it
usually contains a combination of two or more covalently linked
structural motifs. Some proteins contain a single domain.
Others contain two or more domains held together by covalent
linkages or noncovalent linkages. While it is clear that there is
an enormous variety of domains, it is remarkable how many
times we find strikingly similar domains in different proteins
and how often it is possible to gain an immediate qualitative
understanding of the features that account for a protein’s
stability and function once the structure has been determined.

No protein is found as a single-layer structure. This is
because it requires at least two layers to bury the hydrophobic
core resulting from the hydrophobic amino acid side chains that
are inevitably found in all proteins.

The three kinds of domains:


5.8.1 The helix-loop-helix motif is the basic component found in α
domain structures

Two commonly found domains of this type:


(1) four-helix bundle (Figure 5.23 & 24)
(2) globin fold (Figure 5.25)

5.8.2 α/β domains exploit the β-α-β motif (Figure 5.26)

The most frequent and most regular of the domain
structures are the α/β domains, which consist of a central
parallel or mixed β sheet surrounded by α helices. Most domains
of this type make extensive use of the β-α-β motif (e.g. glycolytic
enzymes, translocating proteins, etc.).

5.8.3 Antiparallel β domains show a great variety of topologies


(Figure 5.27~29)

5.8.4 Some proteins or domains require additional features to
account for their stability (Figure 5.30)
In addition to the packing of elements of protein secondary
structure, which is a dominant feature in most proteins, there are
cases, especially among the smallest structures, where the
geometry and presence of disulfide bonds or nonpeptidyl
groups are a dominant factor.
Special structural features that account for the stability of
membrane-binding proteins and DNA-binding proteins will be
discussed in detail later.

5.8.5 Many proteins contain more than one domain


Within a single subunit, contiguous portions of the
polypeptide chain often fold into more than one domain.
Sometimes the domains within a protein are very different from
one another (Figure 5.31), but often they resemble each other very
closely (Figure 5.32).
Figure 5.33~34

5.9 Quaternary structure depends on the interaction of
two or more proteins or protein subunits

Quaternary structure: the higher-order organization of
globular subunits to form a functional aggregate.
Types: (1) the subunits that assemble to form the quaternary
structure are very different in structure;
(2) a commonly observed pattern of quaternary
structure is typified by molecular aggregates composed of
multiple copies of one or more different kinds of subunits.

5.10 Predicting protein structure from protein primary
structure (omitted)

5.11 Methods for determining protein conformation


5.11.1 X-ray diffraction analysis of fibrous proteins
5.11.2 X-ray diffraction analysis of protein crystals
5.11.3 Nuclear magnetic resonance (NMR)
complements X-ray crystallography
5.11.4 Optical Rotatory Dispersion (ORD) and
Circular Dichroism (CD)
Chapter 6 Protein Structure Prediction

Outlines:
Protein structure prediction is becoming urgent because of the
increasing discrepancy between the number of known protein sequences
and the number of experimentally determined structures. In this chapter,
we will discuss (1) the principles of protein structure as they relate to
the prediction problem; (2) the approaches to protein structure
prediction; and (3) some examples.

6.1 Introduction

The central dogma motivating prediction is that the three-dimensional
structure of a protein is determined by its sequence and its
environment without the obligatory role of extrinsic factors.
This hypothesis comes mainly from the classic study of the renaturation
of ribonuclease and is confirmed by many experimental results.
However, it has since been challenged, for example by the identification
of chaperones and disulfide interchange enzymes that assist the
folding process. Other experiments nevertheless support the hypothesis,
i.e. these molecules only help to reach, but do not determine, the final
native state of proteins. Predictions have been sufficiently successful
to justify the use of the central dogma as a working hypothesis.
Prediction is becoming a pressing problem for many biologists as the
discrepancy continues to increase between the number of known protein
sequences and the number of experimentally determined structures (Fig.
6.1).

6.2 Principles of protein structure (Factors determining protein
structure)

6.2.1 Dominant effects in protein folding


In theory, molecular dynamics simulation in solvent with accurate
potentials and run over sufficient time would model the folding of a
protein. Since this is not feasible at present, it is instructive to describe
the individual effects that govern the protein/solvent system.

1. Net protein stability


The diverse chemical properties of the protein main chain and side
chains (Figure) give rise to an interplay of non-covalent and entropic
effects that determine the structure of the molecule. Most globular
water-soluble proteins have only marginal stability under physiological
conditions: the change in Gibbs free energy from the unfolded to the folded
state typically is between -5 and -20 kcal/mol. Understanding and
quantifying the thermodynamic effects remains a challenge.
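
To see what "marginal stability" means in terms of populations, the free energy change can be converted into an equilibrium constant. The sketch below is a minimal illustration that assumes a simple two-state (unfolded/folded) equilibrium and a temperature of 25 °C; neither assumption comes from the text, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1


def folding_equilibrium_constant(delta_g_kcal, temperature_k=298.15):
    """K = [folded]/[unfolded] for a two-state model, K = exp(-dG/RT).

    delta_g_kcal is the free energy change for unfolded -> folded (kcal/mol);
    the default temperature (25 C) is an assumption for illustration.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


def fraction_unfolded(delta_g_kcal, temperature_k=298.15):
    """Fraction of molecules unfolded at equilibrium in the two-state model."""
    k = folding_equilibrium_constant(delta_g_kcal, temperature_k)
    return 1.0 / (1.0 + k)


if __name__ == "__main__":
    for dg in (-5.0, -10.0, -20.0):
        print(f"dG = {dg:6.1f} kcal/mol  K = {folding_equilibrium_constant(dg):.3g}"
              f"  fraction unfolded = {fraction_unfolded(dg):.3g}")
```

Under these assumptions a stability of -5 kcal/mol corresponds to a few thousand folded molecules for each unfolded one, which is small compared with the sum of the individual interactions that contribute to it.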

2. The hydrophobic effect


It is widely regarded that protein folding is driven by the hydrophobic
effect. This describes the energetic preference for non-polar atoms, such
as hydrocarbons, to associate and reduce their contact with water. At
room temperature, the effect is mainly entropic.
Experimental measures of the magnitude of hydrophobicity for different
side chains come from partition experiments in which the concentrations
of compounds modelling side chains are measured in a medium
representing the protein core and in water.
The relationship between the hydrophobic effect and the accessible
surface area (ASA) of the solute has dominated many aspects of protein
modelling. ASA is defined as the locus of the centre of a water probe as
it rolls around the surface of a molecule (Fig). Molecular surface is the
sum of the area of the solute atoms in contact with this water probe
(contact surface) and the re-entrant surfaces of the water probe. There is
an approximate linear relationship between hydrophobicity and the ASA
of all non-polar atoms in side chains in their extended conformation.

3. Atomic packing
The net effect of attractive and repulsive van der Waals interactions
between atoms is to favor close atomic packing. Thus to a first
approximation the protein core resembles the solid state. Surface
residues and most atoms of the chain in the unfolded state are less
ordered and resemble in part the liquid state. Thus for residues that are
in the protein core, folding leads to a liquid→solid transition. This
transition is primarily enthalpic.

4. Conformational entropy
The formation of the folded structure restricts the dihedral
conformational space sampled by the main chain and the buried side
chains. This freezing of rotamers is entropically unfavorable.

5. Electrostatic effects: ion pairs and hydrogen bonds
The net effects of hydrophobicity, close packing, and conformational
entropy would probably lead to a compact protein that lacks a specific
architecture. The specificity for the tertiary structure could be
considered as residing in the location of the hydrogen bonding and the
ion pairing groups. Electrostatic effects in the protein/solvent system are
complex.

An individual fully charged (or just a polar) atom extending from the
protein surface into water will be surrounded by a solvation shell of
water molecules. Transferring this charged atom to the protein core is
energetically very unfavorable because the solvation shell must be removed.
Thus, isolated charges are very rarely observed buried within proteins.

The formation of protein-protein electrostatic interactions must compete
with the charges interacting with water and thus a charge interaction will
be far less favorable energetically compared to the in vacuo effect.
Despite the disadvantageous effect of partial desolvation, ion pairs on
the surface tend to stabilize a protein and on average one-third of
charged residues in a protein are involved in salt bridges. However there
is an adverse effect of burying an ion pair in a low dielectric
environment with only about 20% of such pairs being fully buried.

There is a competition between protein-protein and protein-solvent
hydrogen bonds. Although hydrogen bonds abound in proteins, both
forming secondary structures and involving side chain/main chain and
side chain/side chain interactions, it remains unclear whether hydrogen
bonds, particularly if buried, actually stabilize a protein. The formation
of α-helices and β-sheets is probably a consequence of the fact that periodic
hydrogen bonding provides the best method of arranging
complementary main chain amide and carbonyl groups within a
hydrophobic core.

6. Disulfide bridges
The common view is that disulfide cross-links stabilize the folded state
by entropically restricting the degrees of freedom of the unfolded state
compared to the same chain without cross-links. For a single link, the
stability increases with the length of the link but for multiple bridges
there are complex effects. Typically a link will yield a few kcal/mol of
stability. There are small proteins whose stability is considered to be
enhanced by the entropic effect of multiple disulfide bridges.

6.2.2 Analyses of Protein Structure

Many workers primarily follow a knowledge-based (i.e. empirical)
approach to structure prediction. However, as more protein structures
were determined to high resolution, analyses were increasingly able to
identify principles of protein architecture. These principles then form
the basis for predictive algorithms. Some features that are particularly
relevant to prediction are highlighted below:
1. Residue conformation
(1) The main chain backbone torsion angles adopt allowed states
conventionally represented as a Ramachandran (φ, Ψ) plot. Gly
adopts a larger and Pro a smaller region of allowed (φ, Ψ) space.
(2) Proline is the only residue that adopts a cis peptide conformation
with a relatively high probability.
(3) Side chains adopt distinct conformations that are dependent on
backbone conformation. These conformations are conveniently
represented as rotamer libraries.

2. Periodic secondary structure


(1) α-helices can be curved or bent due to interactions with solvent,
the presence of proline, or an α-aneurism
(2) There are different preferences for residues to occur in the middle
of an α-helix, and at the three N-terminal residues (N-cap), just
before the N-cap, at the C-cap, and just after the C-cap.
(3) Nearly all β-sheets have a right-handed twist along the strand
direction and consequently a left-handed twist between strands.
(4) A common distortion to the β-sheet is the β-bulge.
(5) Certain residues preferentially occur within β-strands.
(6) Right-handed 3₁₀ helices are relatively common (about 4% of
residues). These helices can occur independently (typically less
than six residues) or can form the terminal few residues of an
α-helix.
(7) Left-handed polyproline II helices are relatively common (about
4% of residues) but their recognition was delayed due to the
absence of periodic hydrogen bonds. These helices tend to be less
than six residues long.
3. Non-periodic secondary structure
(a) A β-turn refers to four residues that achieve a 180° chain reversal.
There are preferred sequence patterns for the different
conformational families of β-turns
(b) βββββαααααα
Chapter 7 Protein Folding and Unfolding

C. B. Anfinsen discovered that the small proteins ribonuclease A and
staphylococcal nuclease could be reversibly denatured. On removal of a
chemical denaturant, such as urea, they spontaneously refold to their
native structures. Similarly, they spontaneously refold
on cooling after thermal denaturation. Not only do the amino acid
sequences of these proteins encode their final folded structures, they
also encode the information on how to reach those structures. But there
are proteins that will not renature after being denatured, sometimes
because they have been processed after biosynthesis. In some cases,
larger or multimeric proteins do require assistance to fold, which is
provided by molecular chaperones. The kinetics of folding and
unfolding appear very complex, but the process follows
very simple rate laws, governed by a few basic principles.

7.1 Kinetic analysis of complex reactions

To determine the mechanism and pathway of unfolding and
refolding, the intermediates that define and direct the pathway must be
identified, but these are usually unstable thermodynamically. They
might be detectable as kinetic intermediates, but only if they occur on
the pathway before the rate-limiting step and if their free energies are
comparable to or lower than that of the initial state. No other kinetic
intermediates are populated to substantial levels, even transiently.
With a simple one-step reaction followed as a function of time, t,
there is expected to be a single kinetic phase, characterized by a single
rate constant k:
Fraction folded conformation = 1-exp(-kt) (7.1)
More complex kinetic behavior would be observed either if there were
more than one rate-limiting step in the reaction or if the starting material were
heterogeneous, with different populations having different rates of
reaction. Discriminating between these possibilities with a multiphase
reaction is not always straightforward. It is too often assumed that one
kinetic phase of a folding reaction represents formation of an obligatory
intermediate I and that a second phase represents its conversion to
another intermediate or to N:
\[ U \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} I \overset{k_2}{\longrightarrow} N \]    (7.2)
If this were the case, there would be a lag period in the appearance of N
of approximate magnitude $(k_1 + k_2)^{-1}$, during which the steady-state
concentration of I would be generated. This effect is cumulative with
additional steps. Therefore, the magnitude of the lag period in formation
of the final folded conformation should be correspondingly longer with
an increasing number of obligatory, sequential intermediates along a
pathway. Most kinetic complexities of protein folding arise from
heterogeneity of the unfolded state.
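
The lag expected for an obligatory intermediate can be seen numerically. The sketch below is a minimal illustration, not a fitting tool: it compares the one-step expression of Eq. 7.1 with a simple Euler integration of the U ⇌ I → N scheme of Eq. 7.2. The rate constants and the integration step are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
import math


def one_step_fraction_folded(k, t):
    """Eq. 7.1: fraction folded for a single-step reaction with rate constant k."""
    return 1.0 - math.exp(-k * t)


def sequential_fraction_folded(k1, k_minus1, k2, t, dt=1e-4):
    """Fraction of N at time t for U <=> I -> N, starting from pure U.

    Uses explicit Euler integration with step dt (adequate for illustration
    when dt is much smaller than 1/(k1 + k_minus1 + k2)).
    """
    u, i, n = 1.0, 0.0, 0.0
    for _ in range(int(round(t / dt))):
        du = (-k1 * u + k_minus1 * i) * dt   # U lost to I, regained from I
        dn = (k2 * i) * dt                   # N formed only from I
        di = -du - dn                        # mass conservation
        u, i, n = u + du, i + di, n + dn
    return n


if __name__ == "__main__":
    k1, k_minus1, k2 = 10.0, 0.0, 10.0       # arbitrary illustrative values, s^-1
    for t in (0.01, 0.05, 0.2, 1.0):
        print(f"t = {t:5.2f} s  one-step: {one_step_fraction_folded(k2, t):.4f}"
              f"  sequential: {sequential_fraction_folded(k1, k_minus1, k2, t):.4f}")
```

At early times the sequential scheme produces far less N than the one-step reaction, because I must first accumulate; this is the lag period. The absence of such a lag in experimental data argues against an obligatory intermediate preceding a slow step.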

Because of the great conformational heterogeneity of the unfolded
state, protein folding is a special kinetic phenomenon in which every
molecule of a typical population is likely to have a unique conformation
at every instant. For example, 1 mg of a protein with a molecular weight
of 10^4 comprises 0.1 µmol, or 6×10^16 molecules, whereas many more
conformations are likely when it is unfolded. How is this
conformational heterogeneity apparent in the kinetics of refolding? Does
each molecule refold at its own rate, determined by its conformation at
time zero, or do molecules somehow fold by a common mechanism at a
common rate? If each molecule does not fold uniquely, how do different
molecules manage to follow the same rate-limiting step?
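
The numbers in the example above can be checked with a few lines of arithmetic. In the sketch below, the sample mass and molecular weight come from the text; the count of conformations is only a rough illustration that assumes about 90 residues (roughly 110 Da per residue) and three backbone states per residue, neither of which is stated in the text.

```python
AVOGADRO = 6.022e23   # molecules per mole


def molecules_in_sample(mass_g, molecular_weight):
    """Number of molecules in mass_g grams of a protein of given molecular weight."""
    return mass_g / molecular_weight * AVOGADRO


def conformations(n_residues, states_per_residue):
    """Crude count of chain conformations: states_per_residue ** n_residues."""
    return states_per_residue ** n_residues


if __name__ == "__main__":
    n_molecules = molecules_in_sample(1e-3, 1e4)   # 1 mg, molecular weight 10^4
    n_conformations = conformations(90, 3)         # assumed: 90 residues, 3 states each
    print(f"molecules in 1 mg:       {n_molecules:.1e}")
    print(f"possible conformations:  {n_conformations:.1e}")
```

Even with these conservative assumptions, the number of possible conformations exceeds the number of molecules in the sample by many orders of magnitude, so no two unfolded molecules need share a conformation at any instant.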

It is unrealistic to expect to elucidate all the details of a complex
reaction like protein folding. Although it occurs much more rapidly (on
a time scale of seconds to minutes) than expected for a random search, this
time is long enough for each molecule to undergo perhaps some
10^11 to 10^13 conformational changes. Because each molecule of a
substantial population starts out with a different conformation, it might
be feasible to determine at what stage different molecules start to follow
the same pathway. At best it may be possible only to characterize the
slowest transitions and the conformations and energetics of the most
stable intermediates, to identify the overall rate-limiting step, and to
characterize the transition state.

7.2 Kinetics of unfolding

Unfolding of proteins is almost universally observed to be an
all-or-none process, with little or no partial unfolding preceding complete
unfolding. When a native, covalently homogeneous protein is placed in
unfolding conditions at time zero, unfolding almost always occurs with
a single kinetic phase and a single rate constant. There is no lag period,
and all probes of unfolding give the same rate constant. Therefore there
is a single rate-limiting step in unfolding, and all folded molecules have
the same probability of unfolding. Exceptions generally result from
exceptions to the usual homogeneous nature of the folded state.
The rate of unfolding usually changes uniformly with variation of the
unfolding conditions. In particular, logarithmic plots of unfolding rates
versus denaturant concentration or temperature are generally linear,
suggesting that the mechanism of unfolding is not changing. There
appears to be a single transition state for unfolding under these
conditions.
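
A linear logarithmic plot means that ln k (unfolding) can be described by a straight line in denaturant concentration. The sketch below shows how such a line could be fitted by least squares. It is only an illustration: the data are synthetic and noise-free, generated from hypothetical parameters (`K_U_WATER`, `M_U`) that are not taken from any measurement or from the text.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic illustration data (NOT measurements): ln k = ln K_U_WATER + M_U * [D]
K_U_WATER = 1e-4     # hypothetical unfolding rate constant in water, s^-1
M_U = 1.2            # hypothetical slope, per molar denaturant
denaturant = [4.0, 5.0, 6.0, 7.0, 8.0]                      # mol/L
rates = [K_U_WATER * math.exp(M_U * d) for d in denaturant]

slope, intercept = fit_line(denaturant, [math.log(k) for k in rates])

if __name__ == "__main__":
    print(f"fitted slope: {slope:.3f} per M")
    print(f"extrapolated rate in water: {math.exp(intercept):.2e} s^-1")
```

With real data, systematic curvature in such a plot would suggest that the mechanism or the transition state changes with conditions.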

7.3 Kinetics of refolding

Kinetic complexities are encountered almost universally in protein
refolding. These complexities usually result from conformational
heterogeneity of the unfolded state, with slow- and fast-refolding
molecules:

\[ U_S \overset{\text{slow}}{\rightleftharpoons} U_F \overset{\text{fast}}{\longrightarrow} N \]    (7.3)

In virtually all characterized cases, the heterogeneity arises from
cis-trans isomerization of peptide bonds preceding Pro residues.

1. Peptide bond isomerization


Cis peptide bonds are often found in folded proteins, but almost
only when the next residue is Pro. A given peptide bond is usually either
cis or trans in essentially all the folded molecules because the folded
conformation generally favors one over the other. When a protein is unfolded,
however, the constraints favoring one form over the other are released,
and an equilibrium between cis and trans isomers is attained at each
peptide bond. When the protein is refolded, a fraction of the molecules,
UF, have all the necessary peptide bonds as the correct isomer, whereas
the others, US, have one or more as an incorrect isomer.
Cis-trans isomerization of Pro peptide bonds is intrinsically
slow (Sec. 5.2.4.b). When the rate of refolding of the UF molecules is
faster than cis-trans isomerization, UF and US molecules have different
rates of refolding (Eq. 7.21). If all the peptide bonds must be of the correct
isomer for refolding to occur, the greater the number of Pro residues, the
greater the fraction of US molecules and the slower their refolding. The
actual situation is more complex, however, because some proteins can
refold to a native-like conformation with an incorrect isomer of one or
more peptide bonds. Also, the rates of isomerization can be either
increased or decreased by the conformation of the protein.
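
The dependence of the fast-folding fraction on the number of Pro residues can be illustrated with a deliberately simple model. The sketch below assumes that every Pro peptide bond must be the native isomer, that the bonds isomerize independently in the unfolded state, and that each is cis with the same probability. The default value of 0.2 for that probability is an assumption for illustration, not a figure from the text, and as noted above real proteins deviate from this picture.

```python
def fraction_fast_folding(native_isomers, cis_fraction_unfolded=0.2):
    """Fraction of unfolded molecules with every Pro peptide bond correct.

    native_isomers: string with one character per Pro peptide bond,
        'c' if the bond is cis in the native state, 't' if trans.
    cis_fraction_unfolded: assumed equilibrium cis fraction of each
        Pro peptide bond in the unfolded state (hypothetical default).
    """
    fraction = 1.0
    for isomer in native_isomers:
        if isomer == "c":
            fraction *= cis_fraction_unfolded
        elif isomer == "t":
            fraction *= 1.0 - cis_fraction_unfolded
        else:
            raise ValueError("isomer must be 'c' or 't'")
    return fraction


if __name__ == "__main__":
    for pattern in ("t", "tt", "tttt", "cctt"):
        print(f"{pattern:5s} fast-folding fraction = {fraction_fast_folding(pattern):.4f}")
```

In this model the fast-folding fraction falls with every additional Pro residue, and falls most steeply when the native isomer is cis, since that isomer is the minor one in the unfolded chain.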
For example, two of the four Pro residues in bovine ribonuclease A
have cis peptide bonds in the folded conformation. Unfolded
ribonuclease A (with the four disulfide bonds intact) refolds in three
different kinetic phases, corresponding to at least three different
unfolded species. One accounts for 15% of the molecules and refolds
in less than a second under optimal conditions; it is thought to have
all correct peptide bond isomers. The remaining 85% of the molecules
are thought to have one or more nonnative peptide bond isomers that
slow refolding. A second kinetic species represents 65% of the
molecules and refolds on a time scale of seconds. Under conditions
strongly favoring folding, this species folds more rapidly into a
native-like conformation, retaining the incorrect peptide bond isomer. The
remaining 20% of the molecules make up the third kinetic species and
refold even more slowly. The second and third kinetic species are
believed to result from trans isomers predominating in the unfolded state.
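
A heterogeneous unfolded population of this kind gives a refolding curve that is a sum of exponential phases, one per species. The sketch below uses the amplitudes quoted above for ribonuclease A (15%, 65%, 20%); the three rate constants are hypothetical values chosen only to separate the phases and do not come from the text.

```python
import math

# (amplitude, rate constant in s^-1); amplitudes are from the text,
# rate constants are hypothetical illustration values.
PHASES = [(0.15, 10.0), (0.65, 0.1), (0.20, 0.01)]


def fraction_refolded(t, phases=PHASES):
    """Fraction refolded at time t for independent populations,
    each refolding with its own single rate constant."""
    return sum(a * (1.0 - math.exp(-k * t)) for a, k in phases)


if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0, 100.0, 1000.0):
        print(f"t = {t:7.1f} s  fraction refolded = {fraction_refolded(t):.3f}")
```

Unlike the sequential scheme with an obligatory intermediate, this parallel model shows no lag: refolding starts at its maximum rate and simply slows as the faster populations are exhausted.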

2. Refolding in the absence of slow peptide bond isomerization


In a population of unfolded molecules with the same cis-trans
isomers as the native state, the refolded protein generally appears with a
single rate constant and without a significant lag period. The absence of
an observable lag period indicates that there is a single rate limiting step
in refolding and that all preceding and subsequent steps are more rapid.
Consequently, refolding can be simplified to three stages (Fig. 7.29): (1)
the nature of the unfolded protein under refolding conditions, the
“prefolded” conformation; (2) the nature of the rate limiting step and the
overall transition state for folding; and (3) the nature of the folded
conformation under refolding conditions, especially its flexibility.
Considering the conformational heterogeneity of the unfolded state
(but excluding intrinsically slow isomerizations), it is noteworthy that
all the molecules with the same covalent structure are usually observed
to fold with the same rate constant. A single rate constant is consistent
with all the molecules folding by the same rate-determining step. The
folding of many conformationally heterogeneous molecules by the same
rate-limiting step requires that there be a rapid conformational
equilibration prior to the rate-limiting step (Fig.7.29C). That this occurs
is also indicated by the general observation that the rate of refolding
depends only on the final folding conditions, not the initial unfolding
conditions. Proteins unfolded in different ways generally have different
average physical properties. Nevertheless, they refold at
indistinguishable rates under the same final folding conditions. The rate
of folding is determined not by the nature of the initial unfolded protein
but by the properties it rapidly adopts when placed under the final
folding conditions.
How do all unfolded molecules equilibrate rapidly prior to refolding
if sampling of all conformations by a random coil requires such a long
period of time? The answer undoubtedly is that an unfolded protein
under refolding conditions does not behave as a random coil but adopts
a limited set of energetically favored nonrandom conformations. In this
way, all the molecules converge to follow a common subsequent
pathway and have the same rate-limiting step. This convergence is in
contrast to the proposal that each protein molecule folds by a unique
pathway.
The rate of direct refolding generally varies with temperature in a
complex manner, giving a nonlinear Arrhenius plot. At low
temperatures, the rate of refolding increases with temperature, as do
most chemical reactions. The increase in rate diminishes, however, and
the rate reaches a maximum and then decreases dramatically at high
temperatures. This temperature dependence is unusual for chemical
reactions, but it might be expected for a complex reaction like protein
folding that is dependent on the presence of metastable, partially folded
intermediates. Such metastable intermediates would be destabilized at
high temperature, and the rate of refolding would decrease accordingly.

3. The prefolded state

The prefolded state is the unfolded protein under refolding
conditions, prior to the rate-limiting step and complete refolding. The
prefolded state is intrinsically unstable and is populated only transiently.
Nevertheless, a variety of evidence indicates that it has considerable
nonrandom conformation in many proteins. The nature of the prefolded
state is being investigated very actively, using its spectral properties, its
susceptibility to proteases, protection from exchange of its labile
hydrogens, and the effects of mutagenesis.

4. The transition state for folding

7.4 Folding pathways

Elucidating the mechanism of protein folding requires
characterization of the initial, intermediate, and final conformational
states, plus determination of the steps by which they are interconverted.
The kinetic roles of the various states can be determined most readily if
there is some means of control over the rates and equilibria of the
various steps. This control would also make it possible to ensure that
unstable intermediates accumulate to substantial levels, at least
transiently. Ideally, the unstable intermediates would be trapped in a
stable form so that they could be characterized. To control the rate of
formation and breakage of hydrogen bonds would be almost ideal
because every protein structure includes hydrogen bonds. During
folding, protein molecules with 1, 2, 3, ... intramolecular hydrogen bonds
might accumulate kinetically; if they could be trapped and identified, a
pathway could be defined in terms of hydrogen bonds. Unfortunately,
hydrogen bonds cannot be trapped in this way. Disulfide bonds can, owing
to the reduction-oxidation nature of the covalent disulfide interaction
between thiol groups.

1. Trapping intermediates with disulfides


Some proteins that contain disulfide bonds in their folded
conformations require these disulfides for stability of their folded
conformation. In this case, the reduced protein is unfolded, even in the
absence of denaturants, and folding and disulfide bond formation are
coupled. Protein species with different numbers of disulfide bonds that
accumulate during unfolding and refolding can be trapped and separated
and their disulfide bonds identified.
The kinetic roles of the intermediates can often be determined
unambiguously due to the ability to control the kinetics and
thermodynamics of the disulfide interaction. Under appropriate conditions,
the disulfide interaction can be very dynamic, with disulfides being
formed, broken, and rearranged on time scales as short as 10^-5 s. The rates
of the intramolecular steps in disulfide formation reflect the protein
conformational transitions involved. The approach is useful only with
proteins that unfold when their disulfides are broken; unfolding and
refolding of the protein consequently can be controlled by varying just
the intrinsic disulfide stability. There is no need to use denaturants, and
the strengths of all other types of interactions that stabilize protein are
not affected.
Although only the disulfide bonds are trapped, the conformations that
direct disulfide bond formation are effectively trapped also. The
stabilities of protein disulfides and of the conformations that specify
them are linked functions. It is thus a thermodynamic requirement that
whatever conformation stabilizes a particular disulfide bond must be
stabilized to the same extent by the presence of that disulfide. Therefore,
the conformational basis of folding should be evident from the
conformation of the trapped intermediates as long as the conformations
are not affected by the trapping procedure.

2. Disulfide folding pathway of BPTI


Examples: p 318~
Figure 7.53 The disulfide folding pathway of BPTI (bovine pancreatic
trypsin inhibitor)

7.5 Folding of large proteins

Large proteins are composed of multiple structural domains,
multiple subunits, or both. Individual domains can often be excised
proteolytically from a protein, or the corresponding fragment can be
produced by protein engineering. In many cases, the isolated domains
are as stable as when they are in the intact protein, and they are
independent structural units in the intact protein. The independent
domains unfold and refold like single-domain proteins, which can lead
to complex unfolding curves for a protein when its domains unfold
under different conditions. There can also be varying degrees of
interaction between the domains when they are part of the same
molecule. If these interactions are mutually stabilizing, the isolated
domains are correspondingly less stable. In extreme cases, domains can be
so interdependent as to become a single cooperative unit.
When the isolated domains are stable, folding of an intact
multidomain protein appears to occur by folding of individual domains,
followed by their association. Somewhat surprisingly, the domains
linked by a polypeptide chain often fold more slowly than when they are
isolated. The various segments of a polypeptide chain seem to interfere
with the folding of each other. Association of the folded chains is often
the slowest step in the overall folding process, either because the
domains are not folded entirely correctly or because the small
adjustments required for their interaction are energetically unfavorable.
When association of folded domains is slow, an intermediate state
accumulates during folding in which the individual domains are folded
but unpaired. These domains apparently can interact with the
complementary domains of other molecules, which often leads to
indefinite aggregation of the protein and to its precipitation. For this
reason, productive folding of large proteins usually must be carried out
at very low protein concentrations.
The folding of oligomeric proteins is subject to similar
considerations because their subunits often consist of multiple domains.
With oligomers, however, specific interactions between molecules are
necessary. The monomers generally fold to nearly their final
conformations before any association steps occur; specific association
presumably requires a folded conformation to provide the interaction
site. Nevertheless, further folding generally occurs after association. The
rate-limiting step in regenerating a native oligomeric protein can be
either intramolecular folding or association of two particles. Which is
rate-limiting often depends on the protein concentrations.
No scheme for folding and assembly is general to all oligomeric
proteins. This should not be surprising in view of the many different
quaternary structures that are encountered in proteins (Fig.6.24). But
even homologous proteins with essentially the same quaternary structure
can use apparently different assembly mechanisms.
Association of folded monomers would seem to be a straightforward
process, but it is often observed to be relatively slow, with association
rate constants of only 10^3-10^5 M^-1 s^-1. The final adjustments of the structure
upon association seem to involve a significant energetic barrier.
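
The dependence of the rate-limiting step on protein concentration can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: association is treated as a simple second-order reaction A + A -> A2 with rate law d[A]/dt = -k_assoc[A]^2 (some texts include a factor of 2), folding as a single first-order step, and the value k_fold = 0.1 s^-1 is hypothetical. Only the range of association rate constants comes from the text above.

```python
import math

def association_half_time(k_assoc, conc0):
    """Half-time (s) for A + A -> A2 under d[A]/dt = -k_assoc * [A]**2.

    k_assoc is in M^-1 s^-1, conc0 (initial monomer concentration) in M.
    For this rate law the half-time is 1 / (k_assoc * conc0).
    """
    return 1.0 / (k_assoc * conc0)

def folding_half_time(k_fold):
    """Half-time (s) of a first-order (intramolecular) folding step."""
    return math.log(2) / k_fold

def rate_limiting_step(k_fold, k_assoc, conc0):
    """Name the step with the longer half-time at this concentration."""
    if association_half_time(k_assoc, conc0) > folding_half_time(k_fold):
        return "association"
    return "folding"

# Hypothetical folding rate; k_assoc taken from the middle of the quoted range.
for conc0 in (1e-7, 1e-6, 1e-5, 1e-4, 1e-3):
    print(conc0, rate_limiting_step(0.1, 1e4, conc0))
```

With these illustrative numbers, association is the slower step at low concentrations and folding becomes limiting at high concentrations, which is the qualitative behavior described above.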

7.6 Biosynthetic folding

Proteins are synthesized on ribosomes in vivo as linear polypeptide
chains, but they rapidly fold to their final conformation either during or
after biosynthesis. Very little is known directly about how or when this
happens in vivo.

7.1.1 Basic methods
The folding of proteins is usually studied in vitro by first denaturing them in
solution of urea, guanidinium chloride, or acid and then diluting the denaturant.
Stopped-flow methods are the most convenient, because they cover a useful time
range and are ideal for mixing two reagents. Fluorimetry, following the change in
tryptophan fluorescence in the near ultraviolet, is used because of its sensitivity.
This technique is generally used to monitor tertiary interactions because the
fluorescence yield and emission wavelength of tryptophan are sensitive to its
environment. Stopped-flow circular dichroism is useful because it can detect
changes in secondary structure in the far ultraviolet.
Helices form in a few hundred nanoseconds and β turns in a few microseconds
in model peptides. Short loops in proteins form with an upper limit of about 10^6 s^-1.
Thus, a lower limit for the initial collapse of a denatured protein is about 1 µs.
Conventional rapid mixing methods are limited to a time scale of milliseconds or
greater, but specialized continuous-flow apparatus has been used for tens of
microseconds. Relaxation methods or flash photolysis are necessary for
investigating faster reactions. ……

7.1.2 Multiple phases and cis-peptidyl-prolyl bonds

1. Effects of peptidyl-proline isomerization on kinetics


Refolding is generally found to proceed by a series of exponential phases.
Many of these exponentials are a consequence of cis-trans isomerization about
peptidyl-prolyl bonds. The equilibrium constant for the normal peptide bond in
proteins favors the trans conformation by a factor of 10^3-10^4 or so. The
peptidyl-prolyl bond is an exception that has some 2-20% of cis isomer in model
peptides. Further, it is often found as the cis isomer in native structures. The
interconversion of cis to trans in solution is quite slow, having half-lives of
10-100 s at room temperature and neutral pH. This has two important consequences.
First, a protein that has several prolines all in the trans conformation in the native
structure will equilibrate when denatured to give a mixture of cis and trans forms.
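
The size of this effect can be estimated with a short calculation. The sketch below assumes, purely for illustration, that every peptidyl-prolyl bond in the denatured chain equilibrates independently and with the same cis fraction; real sequences differ from bond to bond. The 2-20% cis range and the 10-100 s half-lives are the figures quoted above, and the function names are hypothetical.

```python
import math

def fraction_all_trans(n_prolines, cis_fraction):
    """Fraction of unfolded chains with every peptidyl-prolyl bond trans.

    Assumes each bond equilibrates independently with the same cis fraction.
    """
    return (1.0 - cis_fraction) ** n_prolines

def rate_constant_from_half_life(t_half):
    """First-order rate constant (s^-1) for a given half-life (s)."""
    return math.log(2) / t_half

# Fraction of denatured molecules that can refold without isomerization.
for n in (1, 3, 5):
    for cis in (0.02, 0.10, 0.20):
        print(n, cis, round(fraction_all_trans(n, cis), 3))
```

Under these assumptions, molecules outside the all-trans fraction must isomerize at least one bond before reaching the native state, which is one way slow exponential phases can arise.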
Chapter 8 Stabilizing Protein Function

One of the great unresolved problems of science is the prediction of
the three-dimensional structure of a protein from its amino acid
sequence: the “folding problem”. An even more elusive goal is the
prediction of the catalytic activity of an enzyme from its amino acid
sequence. Why is this so important?
1. The acquisition of sequence data by DNA sequencing is relatively
quick, and vast quantities of data have become available through
international efforts such as the Human Genome Project and other
genome sequencing projects. The acquisition of three-dimensional
data is still slow and is limited to proteins that either crystallize in a
suitable form or are sufficiently small and soluble to be solved by
NMR in solution. Algorithms are thus required to translate the linear
information into spatial information;
2. We are now able to synthesize proteins by way of their genes, and so
the production of new enzymes with specified catalytic activities is a
challenging prospect.

Producing such a new enzyme requires five underpinning and
interrelated abilities:
(1) the ability to predict the most stable fold of a particular sequence;
(2) the ability to design a novel fold;
(3) the ability to predict whether the desired fold is kinetically
accessible;
(4) the ability to design the precise features for specific binding in
the fold; and
(5) the ability to design the precise orientation of groups in the
protein for efficient catalytic function.

8.1 Understanding Protein Stability

8.1.1 Protein Stability and Its Measurement

1. Introduction
Protein stability is a very important area of study within
biotechnology because:
(1) For enzymes, although enzymes are protein molecules that act as
extremely efficient catalysts, the usefulness of enzymes and
proteins as analytical tools and as industrial catalysts is often
limited by their requirements for “mild” storage and reaction
conditions. This is because many enzymes lose their catalytic
abilities over time, that is, they have poor operational or long-
term stability;
(2) Stability is an issue also in the development and use of protein-
based analytical or sensor devices, and it can be of literally vital
importance in protein pharmaceuticals or therapeutics, where
deterioration of protein preparations over extended storage
periods can be a serious drawback;
(3) Interest in protein stability will likely grow due to the
increasing use of recombinant therapeutic proteins, the advent of
protein engineering and recent strides in understanding protein
folding.

2. Protein Structure (see the earlier discussion)

…….. It is an axiom in biology that structure relates to function.
Integrity of the three-dimensional tertiary (or quaternary) structure is
essential for the correct functioning of the protein, e.g. catalysis of a
reaction by an enzyme, or antigen binding by an antibody. Loss of
quaternary or tertiary structure leads to loss of function. Adverse
conditions of temperature, pH or solvent, or high concentration of urea
or guanidinium hydrochloride, can bring about this loss of function.
Heavy metals, certain organic chemicals or chelating agents can act
similarly in some cases.

3. Definition and measurement of protein stability

The term “stability” refers to a protein’s resistance to adverse
influences such as heat or denaturants, that is, to the persistence of its
molecular integrity or biological function in the face of high
temperatures or other deleterious influences.
A perfectly folded, fully functional monomeric protein can lose its
biological activity in vitro by unfolding of its tertiary structure to a
disordered polypeptide, in which key residues are no longer aligned
closely enough for continued participation in functional or
structure-stabilizing interactions. Such unfolding is termed denaturation. It is
usually cooperative and may be reversible if the denaturing influence is
removed, since the polypeptide chain has not undergone any chemical
changes.
A protein can also undergo chemical changes which lead to an
irreversible loss of activity or inactivation, particularly following
unfolding. An unfolded, extended polypeptide will be much more prone
to proteolysis than a tightly-packed, globular protein. Unfolding may
result in the loss of a functionally-essential cofactor from a holoprotein,
such that biological activity will not be regained even if the unfolding
can be reversed to yield the corresponding apoprotein. Unfolded
polypeptide chains may aggregate to form an inactive, insoluble mass
while an individual chain attempting to refold may enter an incorrect,
kinetically-trapped conformation from which it cannot emerge. (A living
cell may be able to prevent these events: cells contain a number of
so-called chaperone proteins which assist the folding of newly-synthesized
proteins in vivo. Chaperones can also act in vitro to prevent aggregation
and assist folding. At least one such protein has been shown to
reactivate an aggregated enzyme.)

These different molecular phenomena give rise to two distinct
definitions of in vitro protein stability:
Thermodynamic (or conformational) stability: it concerns the
resistance of the folded protein conformation to denaturation (i.e. its
Gibbs, or free, energy of unfolding); (N ←→ U)
Long-term stability: it measures the resistance to irreversible
inactivation (i.e. persistence of biological activity); (U → I)
Both types can be represented in a single scheme:
$$\mathrm{N} \overset{K}{\rightleftharpoons} \mathrm{U} \overset{k}{\longrightarrow} \mathrm{I}$$
K: the equilibrium constant for the reversible reaction
k: the rate constant for the irreversible reaction.
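
A small numerical sketch can show how the two constants combine. It rests on assumptions that the text does not state: K is taken as [U]/[N], the reversible step is assumed to equilibrate much faster than the irreversible one, and the function names are hypothetical. Under those assumptions the irreversible loss is first order with an apparent rate constant k·K/(1+K).

```python
import math

def fraction_unfolded(K):
    """Equilibrium fraction of reversibly unfolded protein, with K = [U]/[N]."""
    return K / (1.0 + K)

def apparent_inactivation_rate(K, k):
    """Observed first-order rate (s^-1) of forming I when N <-> U is fast."""
    return k * fraction_unfolded(K)

def fraction_not_inactivated(K, k, t):
    """Fraction of protein still in the N + U pool after time t (s)."""
    return math.exp(-apparent_inactivation_rate(K, k) * t)

# Illustrative values only: a protein that is 1% unfolded at equilibrium
# loses material far more slowly than the intrinsic rate k would suggest.
print(apparent_inactivation_rate(K=0.01, k=1.0))
print(fraction_not_inactivated(K=0.01, k=1.0, t=60.0))
```

The sketch makes the coupling explicit: raising thermodynamic stability (smaller K) also slows irreversible inactivation, even when k itself is unchanged.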

4. Folding stability

Folding (conformational) stability is usually measured by optical
techniques (such as UV spectrophotometry, fluorescence or circular
dichroism) or by urea gradient gel electrophoresis. Recent reports
describe the analysis of thermal denaturation by free solution capillary
electrophoresis and temperature gradient gel electrophoresis. These
methods are sensitive to changes in protein conformation and thus
monitor unfolding of the target protein.
Free solution capillary electrophoresis achieves molecular separations
without mass transfer between mobile and stationary phases and without
retardation by a solid gel matrix. This separation, therefore, depends on
the intrinsic properties of the sample, but accurate measurement of
temperature is essential. The technique allows estimation of the apparent
thermodynamic parameters (Gibbs energy, enthalpy and entropy) as
well as Tm (the transition midpoint of thermal unfolding) and, uniquely,
can show the population distribution of mobility states. It is reportedly
as accurate as differential scanning calorimetry in the determination of
apparent thermodynamic parameters.
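
How Tm and the apparent thermodynamic parameters relate can be sketched for the simplest case. The code below assumes a two-state N ←→ U transition and a temperature-independent unfolding enthalpy (the van 't Hoff approximation, with no heat capacity change); the Tm of 60 °C and enthalpy of 400 kJ/mol are hypothetical values chosen only for illustration.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def unfolding_constant(T, Tm, dH_m):
    """K = [U]/[N] at temperature T (K) from the van 't Hoff relation.

    Tm is the transition midpoint (K) and dH_m the unfolding enthalpy
    (J/mol), assumed here to be independent of temperature.
    """
    return math.exp(-(dH_m / R) * (1.0 / T - 1.0 / Tm))

def fraction_unfolded(T, Tm, dH_m):
    """Fraction of molecules unfolded at temperature T for a two-state model."""
    K = unfolding_constant(T, Tm, dH_m)
    return K / (1.0 + K)

def unfolding_free_energy(T, Tm, dH_m):
    """Gibbs energy of unfolding (J/mol): -R*T*ln K = dH_m * (1 - T/Tm)."""
    return dH_m * (1.0 - T / Tm)

Tm, dH_m = 333.15, 400e3  # hypothetical protein
for T in (313.15, 323.15, 333.15, 343.15):
    print(T, round(fraction_unfolded(T, Tm, dH_m), 4),
          round(unfolding_free_energy(T, Tm, dH_m) / 1000.0, 1))
```

By construction the two populations are equal at Tm. Because the heat capacity change is neglected, this simplified model cannot reproduce low-temperature unfolding.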
Temperature gradient gel electrophoresis involves the use of two
electrode tanks and a horizontal plate with two water baths connected to
provide a temperature gradient perpendicular to the direction of the
electric field. The gel is connected to the electrode tanks via wicks. One
of the water baths should have a refrigerator to maintain a suitable low
temperature.

Certain proteins are more stable at room temperature than in the
refrigerator and are said to be cold labile. This cold denaturation has
been well characterized for myoglobin and a few other proteins. It is a
property of the protein itself and is distinct from freezing inactivation.

5. Kinetic stability
Kinetic stability is distinct from (and need not correspond with)
thermodynamic stability. It involves measuring the persistence of
catalytic (or other biological) activity with time under adverse
conditions of temperature, pH, solvents, salt concentration and so on (or,
to put it another way, the progressive loss of function). It can be
represented by the scheme
$$\mathrm{N} \overset{k_{in}}{\longrightarrow} \mathrm{I}$$
where N is the native, functional protein, I is an irreversibly inactivated
form and k_in is the rate constant for the inactivation process.
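
If the inactivation is treated as a simple first-order process (an assumption; real deactivations can be more complex), the rate constant translates directly into a half-life of activity. The sketch below uses hypothetical function names and illustrative numbers.

```python
import math

def residual_activity(k_in, t):
    """Fraction of initial activity remaining after time t (first order)."""
    return math.exp(-k_in * t)

def half_life(k_in):
    """Time for activity to fall to 50% of its initial value."""
    return math.log(2) / k_in

def rate_constant_from_assay(fraction_remaining, t):
    """Estimate k_in from the fraction of activity left after time t."""
    return -math.log(fraction_remaining) / t

# Example: 25% of the activity remains after 60 min at an elevated temperature.
k_in = rate_constant_from_assay(0.25, 60.0)   # min^-1
print(k_in, half_life(k_in))                  # half-life in minutes
```

The time unit of the half-life is whatever unit was used for t in the assay.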
To conclude, there are many indices of protein stability. The most
prominent of these are summarized in Table 1.

Table 1 Principal indices of protein stability

8.1.2 Studies on Denaturation, Inactivation and Stabilizing Interactions

1. Introduction
The crucial importance of a protein’s tertiary structure, i.e. its
molecular shape, has been remarked upon in previous chapters. Tertiary
structure arises from interactions between the side chains (or R-groups)
of the covalently-linked amino acids making up the polypeptide. It is
the tertiary structure that orientates the critical residues and side chains
into the correct geometrical relationship to permit function. This is not
to state that a protein molecule is completely rigid, however. There is
abundant evidence that enzymes and proteins undergo slight but
significant changes in shape on binding substrates or modulators. This
part examines denaturation or unfolding of a protein and the nature of
subsequent molecular changes leading to irreversible inactivation. It also
surveys explorations of stabilizing interactions in proteins and considers
the contribution of the carbohydrate portions of glycoproteins to
stability.

2. Denaturation studies
Armed with the parameters described in the previous section for
measurement of protein stability, researchers have been able to study the
loss of protein function or structure in a rational fashion. Understanding
the causes of activity loss can help in the formulation of stabilizing
strategies, since one will know what changes must be prevented. Many
enzyme deactivations have been characterized in detail. A first-order
exponential process describes many deactivations, since two-state
transitions are often observed in reality. More complex phenomena do
occur, however, but even some of these have been successfully modeled.
Oligomeric proteins are more likely to undergo complex deactivation.
Many monomeric proteins show two-state unfolding but there are
exceptions. The single-subunit protease zymogen, pepsinogen, undergoes
a transition that is not two-state.
One can observe denaturation of most proteins at high temperatures.
The conformational stability of proteins is due to the (quite small) net
difference between a very large number of weak stabilizing interactions
and the nearly equally large conformational entropy. This net free
(Gibbs) energy of stabilization (typically 40 kJ mol^-1) is equivalent to
that of a small number of interactions. Only a very few further
interactions are sufficient to explain the greater stabilities of very stable
or extremophilic proteins: a single interaction may contribute up to 25 kJ
mol^-1. Some proteins, however, will denature at low temperatures and
this has been well described for metmyoglobin.
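
To put these energies in perspective, the free energy of stabilization can be converted into an equilibrium ratio of folded to unfolded molecules using ΔG = RT ln([N]/[U]). The sketch below assumes a two-state equilibrium at 25 °C; the function name is hypothetical and the energies are the typical figures quoted above.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def folded_to_unfolded_ratio(dG_stab, T=298.15):
    """[N]/[U] at equilibrium for a stabilization free energy dG_stab (J/mol)."""
    return math.exp(dG_stab / (R * T))

typical = folded_to_unfolded_ratio(40e3)        # typical protein
with_extra = folded_to_unfolded_ratio(65e3)     # one extra 25 kJ/mol interaction
print(f"{typical:.2e} {with_extra:.2e} {with_extra / typical:.2e}")
```

Under these assumptions a typical protein has roughly ten million folded molecules for every unfolded one, and a single additional strong interaction shifts that ratio by a further factor of about ten thousand.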
There are examples where the unfolding temperature of a protein of
interest decreases greatly in the presence of chaotropic agents. Findings
of this sort show that different denaturing influences (whether
temperature, pH or chaotropic agents) act in an equivalent fashion by
encouraging unfolding of the three-dimensional protein structure. Where
inactivation takes place, some covalent change or alteration in the
degree of association occurs instead of, together with, or in addition to,
the unfolding phenomenon. Activity will be lost when the unfolding
disrupts the integrity of the molecule’s active or functional site(s). In
summary, loss of a protein’s biological activity can occur by either
conformational or covalent processes.
Air-liquid interfaces can also have important destructive effects on
proteins, especially under conditions of rapid mixing or agitation, as has
been demonstrated for a variety of proteins. Surfactants which
preferentially adsorb to the air-water interface can prevent inactivation,
providing further evidence for this interfacial effect.

3. Deleterious chemical reactions in proteins


Deamidation of glutamine and asparagine can occur at neutral to
alkaline pH values while peptide bonds involving aspartic acid undergo
cleavage under acidic conditions. Cysteine is prone to oxidation, as are
tryptophan and methionine. Alkaline conditions lead to reduction of
disulphide bonds and this is often followed by beta-eliminations or
thiol-disulphide exchange reactions. Where reducing sugars are present with
free protein amino groups (N-termini or lysine residues), there may be
destructive glycation of the amino functions by the reactive aldehyde or
keto groups of the sugar (the Maillard reaction). Elevated temperatures
favor all of these reactions but it is important to note that aggregation
and deleterious chemical reactions can occur at moderate temperatures
also.
(1) Deamidation of asparagine residues
(2) Isomerization of prolines
(3) Destructive oxidation events
(4) Proteolytic processes

4. Probing the stabilizing interactions in proteins


One can see from the paragraphs above that protein scientists have
identified the events responsible for loss of protein function, but what of
the molecular interactions maintaining a protein’s folded structural
integrity? Site-directed mutagenesis permits the replacement of a specific
amino acid within a polypeptide chain. Use of these specific mutagenic
techniques allows one to ascertain a particular amino acid’s contribution
to the overall structural integrity and stability of the protein within
which that amino acid is located. The introduction (or elimination) of
specific interactions in a protein is now possible. Experiments of this
sort have greatly increased understanding of the types of forces
contributing to stability. Examples of the sorts of substitutions possible
include replacement of tyrosine by phenylalanine to assess the
contribution of the tyrosine –OH group, or exchange of aspartic with
glutamic acid to extend or shorten the reach of the carboxylic acid
function by a distance of one –CH2- group. (Examples are omitted.)

5. Replacement of conserved residues: E. coli Thioredoxin


Conservation of certain amino acid residues among similar proteins
across different species is generally regarded as implying an important
structural or functional role for the residues in question. In some cases,
however, it has proven possible to replace conserved residues and still
recover a functional protein.

Example: E. coli thioredoxin


Thioredoxins are redox proteins, which act as reducing agents and
as protein disulfide reductases. They contain a single active-site
disulfide, have about 100 amino acid residues and occur in many different
types of organisms. The active sequence Trp-Cys32-Gly-Pro-Cys35 is highly
conserved, along with a number of close-lying residues. Three of these
amino acids were replaced by site-specific mutagenesis: Asp26, Pro34,
and Pro76. These occur in most thioredoxins and lie near the active site
disulfide. A mutant containing alanine at position 26 (D26A) was more
stable than the wild type but did have altered functional properties
despite an insignificant change in redox potential. Wild-type
thioredoxin’s guanidine HCl midpoint for unfolding is 3.4 M at pH 7.0;
that of the D26A mutant shifted upwards to 4.3 M, corresponding to a 1.0
kcal mol^-1 increase in stability……..

6. Carbohydrate side chains and protein stability


Glycosylation is the covalent addition of carbohydrate residues to
R-groups of amino acids within a polypeptide backbone. This
post-translational modification occurs widely in eukaryotic systems and
generally involves the side chains of Asn (N-linked) or Ser (O-linked)
residues. Besides having a number of purposes in vivo, this
carbohydrate labelling of a protein can confer a good deal of additional
stability. The hormone erythropoietin (EPO), which stimulates the
production of red blood cell precursors and is an important therapeutic,
shows just how significant this extra stabilization can be. EPO is a
34-38 kDa protein containing three N-linked and one O-linked carbohydrate
side chains. Carbohydrate accounts for 40-50% of its molecular mass.
The N-linked oligosaccharides are essential for in vivo activity but the
O-linked one is not. Removal of sialic acid from the N-linked
oligosaccharides abolishes in vivo activity even when the rest of the
sugar moiety remains.

7. Is there a trade-off between stability and activity?


Studies of denaturation and of stabilizing interactions in proteins
strongly suggest that proteins are optimized for function (e.g. an enzyme
as catalyst for its specific reaction) rather than for stability (with the
obvious exception of the proteins occurring in extremophilic
organisms). There may be a trade-off between activity (which requires
some degree of molecular flexibility; cf. the induced-fit hypothesis) and
folding stability (which demands a rigid polypeptide backbone).

8. Conclusion:
It can be seen from the discussion above that some conclusions have
emerged from studies of native and mutant proteins under extreme
conditions. Inactivating covalent processes, which follow unfolding of
the polypeptide, fall into a small number of defined reactions.
Electrostatic interactions can be stabilizing in appropriate, rigid regions
of the polypeptide. However, tight packing of the hydrophobic core so
as to minimize cavities is perhaps the most important contributor to
folded protein stability.

8.1.3 Enzymes in Organic Media

1. Introduction
The successful use of non-membrane bound enzymes in biphasic
aqueous-organic systems (and even in anhydrous organic solvents) was
a novel and surprising development. Here the focus will be confined to
the stability aspects of such studies.

2. Enzyme behavior in anhydrous organic solvents


Some enzymes (and other proteins) have been observed to function
perfectly well in non-aqueous media. Indeed, they show some
remarkable properties in organic systems. A protein can retain its correct
conformation on transfer from a hydrophilic aqueous to a hydrophobic
organic solvent system. This is because a layer of bound or “essential”
water remains associated with the folded polypeptide, even in a
“completely dry” or lyophilized preparation. This minimal layer of
water is sufficient to solvate the folded polypeptide. An unfolded
polypeptide requires a greater number of water molecules for effective
solvation; thus the folding equilibrium is shifted towards the folded
form in low-water systems. This shift in equilibrium means that
denaturation (unfolding) becomes a much less likely event in low-water
hydrophobic systems. This is not the only characteristic of these media
benefitting stability. A common feature of the inactivating covalent
reactions is the participation of water. Water is present at the extremely
high concentration of 55 M in aqueous systems. Despite the retention of
“essential” water surrounding a polypeptide in a hydrophobic medium,
water’s effective concentration is profoundly reduced and water-
mediated deleterious covalent reactions are much less likely to occur.
Thus, both the unfolding and inactivation events of the conventional
N ←→ U → I model can be markedly decreased in hydrophobic media.
Klibanov has proposed certain rules to ensure that enzymes will remain
active in organic solvents:
(1) The solvent should be hydrophobic and show very little affinity
for water.
(2) Enzymes for use in organic solvents should be lyophilized from
solutions with pH values corresponding to the enzyme’s
optimal pH.
(3) The enzyme should be agitated vigorously or sonicated to
ensure homogeneous dispersion in the organic solvent.

Why are these rules important? (see Text for details)

3. Some case studies (Omitted)

4. Combined Strategies
One can stabilize enzymes for use in the presence of organic
solvents by strategies such as immobilization or protein engineering.
Membrane-bound (as distinct from soluble) multi-subunit enzymes may
also be stabilized as protein-lipid complexes by organic solvents. It was
found that cytochrome oxidase and H+-ATPase from the inner
mitochondrial membrane were stabilized by factors of 100 and 9,
respectively, when the water content of the toluene bulk phase was
reduced from 13 microliters per milliliter to 3 microliters per milliliter.
The judicious use of hydrophobic, non-polar solvents as alternatives to
water has tremendous potential for the achievement of stabilized
enzymes.
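
The water contents mentioned in this section can be compared on a molar scale. The sketch below is a rough estimate only: it assumes a water density of 0.997 g/mL (about 25 °C), treats microliters of water per milliliter of mixture as a simple volume ratio, and says nothing about water activity, which is what actually governs the chemistry. The function name is hypothetical.

```python
WATER_MOLAR_MASS = 18.015   # g/mol
WATER_DENSITY = 0.997       # g/mL near 25 degrees C (assumed)

def water_molarity(microliters_per_milliliter):
    """Approximate molar concentration of water for a given volume content."""
    grams_per_liter = microliters_per_milliliter * WATER_DENSITY  # uL/mL equals mL/L
    return grams_per_liter / WATER_MOLAR_MASS

print(round(water_molarity(1000), 1))  # pure water, close to the 55 M quoted above
print(round(water_molarity(13), 2))    # toluene phase before drying
print(round(water_molarity(3), 2))     # toluene phase after drying
```

On this estimate, the toluene phases contain water at well under 1 M, roughly two orders of magnitude below its concentration in an aqueous system.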

8.2 Manipulating Protein Stability

8.2.1 Use of Stabilizing Additives

1. Introduction

It has long been known that inclusion of low molecular weight substances
such as glycerol or sucrose in protein solutions can greatly stabilize the
protein’s biological activity. A variety of compounds (often used
at concentrations of 1M or greater) can increase the stability of proteins
in solution or those undergoing processes such as freeze drying. The
range of such compounds is very wide. It includes substances as diverse
as substrates (and specific ligands), salts, glycerol, sucrose, polyethylene
glycol, chelating agents, reducing agents and proteins such as serum
albumin.
However, it took some time for the exact mechanism of this
stabilization to be ascertained. Timasheff and colleagues have shown
that these types of substances are preferentially excluded from the
vicinity of the protein molecules, since their binding to the protein is
thermodynamically unfavorable. The protein molecule is preferentially
hydrated by the solvent water. Loss of the protein’s compact, properly
folded structure (denaturation) will increase the protein-solvent
interface. This in turn will tend to increase the degree of
thermodynamically unfavorable interaction between the additive and the
protein molecule. The result is that the protein molecule is stabilized by
the additive. This “preferential exclusion” means that there is less of the
solute (additive) immediately surrounding the protein than there is in the
bulk solution; it does not necessarily mean that no solute molecules can
penetrate the protein molecule’s hydration shell.

2. Types of stabilizing molecules

(1) Salts
A particular salt exerts stabilizing or destabilizing effects on proteins
depending on its position in the Hofmeister lyotropic series which
relates to the effects of salts on protein solubility:
Cations (more stabilizing first): (CH3)4N+ > NH4+ > K+, Na+ > Mg2+ > Ca2+ > Ba2+
Anions (more stabilizing first): SO4^2- > Cl- > Br- > NO3- > ClO4- > SCN-
The stabilizing ions “salt out” hydrophobic residues in the protein,
causing the adoption of more compact structure. This effect may be
attributed to the increased ionic strength of the solution and to the
increase in the number of water clusters around the protein. This helps
prevent the unfolding which is the initial event in any protein
deterioration process.
Most stabilizing ions seem to act via a surface tension effect. Ions
can also stabilize proteins by shielding surface charges and can act as
osmolytes by affecting the bulk properties of water. Note that ammonium
sulfate, which is widely used as a stabilizer, combines two of the
stabilizing ions from the Hofmeister list above: the NH4+ cation and the
SO4^2- anion. One can stabilize protein in solution while avoiding
precipitation by adding ammonium sulfate to a final concentration in the
range 20-400 mM. Besides ammonium sulfate, salts containing citrate,
sulfate, acetate, phosphate
and quaternary ammonium ions are generally useful. Note, however,
that the nature of the counterion will influence the overall effect of such
compounds on protein stability.
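
Since ionic strength is invoked above, it may help to see what the recommended ammonium sulfate range corresponds to. The sketch below assumes complete dissociation into two NH4+ and one sulfate ion per formula unit, uses the standard definition I = 1/2 Σ c_i z_i^2, and takes a molar mass of 132.14 g/mol; the function names are hypothetical.

```python
AMMONIUM_SULFATE_MOLAR_MASS = 132.14  # g/mol for (NH4)2SO4

def ionic_strength(ions):
    """I = 0.5 * sum(c_i * z_i**2) for (concentration in M, charge) pairs."""
    return 0.5 * sum(c * z ** 2 for c, z in ions)

def ammonium_sulfate_ionic_strength(conc):
    """Ionic strength (M) of fully dissociated ammonium sulfate at conc (M)."""
    return ionic_strength([(2.0 * conc, +1), (conc, -2)])

def ammonium_sulfate_grams_per_liter(conc):
    """Mass of salt (g) to dissolve per liter for a molar concentration conc."""
    return conc * AMMONIUM_SULFATE_MOLAR_MASS

for conc in (0.020, 0.400):  # the 20-400 mM range quoted above
    print(conc, ammonium_sulfate_ionic_strength(conc),
          round(ammonium_sulfate_grams_per_liter(conc), 2))
```

For a 2:1 salt of this kind the ionic strength is three times the molar concentration, so the quoted range spans roughly 0.06 to 1.2 M in ionic strength.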

(2) Glycerol, sugars and polyethylene glycol


Glycerol, sugars and polyethylene glycol are polyhydroxy
compounds. They can form many hydrogen bonds and aid formation of
a “solvent shell” around the protein molecule that is distinct from the
bulk aqueous phase. They can also increase the surface tension and
viscosity of a solution.

(3) Chelating agents


Chelating agents act to complex metal ions. By doing this,
they prevent oxidation by active oxygen species. They can also prevent
metal ion-induced aggregation. They can, however, remove a
catalytically-essential metal ion from an active site, leading to loss of
activity.

(4) Reducing agents


Reducing agents prevent the destructive oxidation of essential
structural or functional features, but they do have potential drawbacks.
The commonly used 2-mercaptoethanol can reduce disulfide bonds in
proteins and also catalyzes a thiol-disulfide exchange which may lead
to aggregation.

3. Basis of stabilization
Volkin and Klibanov have classified stabilizing additives into three
classes:
(1) Specific: substrates and ligands, where the Native→Unfolded
equilibrium shifts towards the native form;
(2) Non-specific: neutral salts and polyhydroxy compounds,
which function as explained above;
(3) Competitors: which outcompete the enzyme for the
inactivating reagents or remove the catalysts of deteriorative
chemical reactions: examples include added protein,
chelating and reducing agents.

Schein stressed the importance of the hydration or solvent shell
surrounding protein molecules in solution and has divided solutes into
osmolytic and ionic stabilizers. Osmolytes affect solvent viscosity and
surface tension: thus they influence solvent ordering. They are not
highly charged and do not affect enzyme activity up to 1M
concentration. Osmolytes include polyols, sugars, polysaccharides,
neutral polymers and amino acids. The principal method of protein
stabilization by ionic compounds is by shielding of surface charges.

4. Use of Additives
It is important to note that the additives discussed below are
generally applicable as stabilizing agents for proteins but a given
substance may not be effective for a particular protein. Both sucrose and
PEG, for instance, are good stabilizers of invertase but have denaturing
effects on lysozyme; the same additive has contrary effects on the two
enzymes. One should note the stabilizing or destabilizing effects of the
component ions of a salt when choosing additives; ref. 6 includes a
useful discussion of this topic.
Osmolytes are a diverse group of substances comprising such
compounds as polyols, mono- and polysaccharides, neutral polymers
(such as PEG) and amino acids and their derivatives. One should use
polyols and sugars at high final concentrations: typical figures range
from 10-40% (w/v). Sugars are reckoned to be the best stabilizers but
reducing sugars can react with protein amino groups, leading to
inactivation. This problem can be avoided by using non-reducing sugars or
the corresponding sugar alcohols. Glycerol is a very widely used low
molecular weight polyol. Its advantages include its ease of removal by
dialysis and its noninterference with ion exchange chromatography.
However, glycerol suffers from two significant disadvantages: it is a
good bacterial substrate and it greatly lowers the glass transition
temperature of material to be preserved by lyophilization or drying. The
5-carbon sugar alcohol, xylitol, can often replace glycerol; it can be
recycled from buffers and is not a convenient food source for bacteria.

Polymers such as PEG are generally added to a final concentration of
1-15% (w/v). They increase the viscosity of the single phase solvent
system and thereby help prevent aggregation. Note, however, that higher
polymer concentrations will promote the development of a two phase
system. The protein of interest will concentrate in one of these phases
and this may actually lead to aggregation.

Amino acids with no net charge, notably glycine and alanine, can act
as stabilizers if used in the range 20-500 mM. Amino acids and
derivatives occur as osmolytes in nature.
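The concentration ranges quoted above translate into weighed amounts by simple arithmetic. The following is a minimal sketch, assuming that % (w/v) means grams of solute per 100 ml of final solution; the function names are illustrative and the glycine molar mass (75.07 g/mol) is supplied by us, not by the text.

```python
def grams_for_percent_wv(percent_wv: float, final_volume_ml: float) -> float:
    """Mass of solute (g) needed for a given % (w/v): 1% (w/v) = 1 g per 100 ml."""
    if percent_wv < 0 or final_volume_ml <= 0:
        raise ValueError("percentage must be >= 0 and volume must be > 0")
    return percent_wv * final_volume_ml / 100.0


def grams_for_millimolar(conc_mm: float, final_volume_ml: float,
                         molar_mass_g_per_mol: float) -> float:
    """Mass of solute (g) needed for a given millimolar concentration."""
    if conc_mm < 0 or final_volume_ml <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("inputs must be positive")
    moles = (conc_mm / 1000.0) * (final_volume_ml / 1000.0)
    return moles * molar_mass_g_per_mol


# Hypothetical example: 250 ml of buffer with 20% (w/v) sugar alcohol
# and 100 mM glycine (molar mass 75.07 g/mol, an assumption of this sketch).
polyol_g = grams_for_percent_wv(20.0, 250.0)
glycine_g = grams_for_millimolar(100.0, 250.0, 75.07)
print(f"polyol: {polyol_g:.1f} g, glycine: {glycine_g:.2f} g")
```

The solute should be dissolved and the solution then made up to the final volume, since % (w/v) refers to the final volume and not to the volume of solvent added.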
5. Substrate and specific ligands

Addition of specific substrates, cofactors or competitive (reversible)
inhibitors to purified proteins can often exert great stabilizing effects.
Occupation of the target protein’s binding or active site(s) by these
substances leads to minor but significant conformational changes in the
polypeptide backbone. The protein adopts a more tightly folded
conformation, reducing any tendency to unfold and (sometimes)
rendering it less prone to proteolytic degradation. Occlusion of the
protein’s active site(s) by a bound substrate molecule or reversible
inhibitor will protect those amino acid side chains which are critical for
function. A starch-degrading amyloglucosidase enzyme stored in the
presence of 14%(w/v) partial starch hydrolysate was 80% more stable
over a 24-week period at ambient temperature than the corresponding
enzyme preparation stored in the hydrolysate’s absence.

6. Use of reducing agents and prevention of oxidation reactions

The thiol group of cysteine is prone to destructive oxidative
reactions. One can prevent or minimize these by using reducing agents
such as 2-mercaptoethanol or DTT. One should add 2-mercaptoethanol
to reach a final concentration of 5-20 mM and then keep the solution
under anaerobic conditions. To achieve these anaerobic conditions, one
can gently bubble an inert gas such as nitrogen through the solution and
fill it to the brim of a screwcap container to minimize headspace and
the chances of gaseous exchange. DTT is effective at lower
concentrations: usually 0.5-1.0 mM will suffice. DTT can act as a
denaturant at higher temperatures and is not very soluble in high salt.
Note that these reducing agents are themselves prone to oxidation. DTT
oxidizes to form an internal disulfide which is no longer effective but
which will not interfere with protein molecules. 2-mercaptoethanol, on
the other hand, participates in intermolecular reactions and can form
disulfides with protein thiol groups. Such thiol-disulfide exchanges are
highly undesirable and may actually lead to inactivation or aggregation.
Much of the oxidation of thiol groups is mediated by divalent metal
ions which can activate molecular oxygen. Complexation of free metal
ions can prevent destructive oxidation of thiol groups. EDTA may be
used to complex metal ions.
Additives can greatly enhance stability without chemical or genetic
manipulation of the target protein. Using empirical knowledge and the
principles described, one may devise a formulation applicable to many
different proteins rather than just one. This is especially useful in drug
applications, where the need to satisfy regulatory requirements may
work against a chemically- or genetically-modified derivative. Of
course, one should note that operations such as dilution, dialysis, buffer
exchange or rehydration of a dry preparation will remove the
stabilizing influences. One has achieved stability only in a particular
context or ambience. In contrast, engineered proteins such as subtilisins
show increased stability under many different conditions.

8.2.2 Chemical Modification of Protein in Solution

1. Introduction: Scope of chemical modification


Chemical modification of proteins in solution is a very complicated
subject and would require a monograph devoted entirely to this topic.
Here we attempt merely to survey the nature and types of chemical
modification available for the study and derivatization of enzymes and
proteins. We will also describe the functional groups of proteins that are
available for reaction and present the types of reactions they undergo.
Lastly, we will summarize a selection of chemical modifications which
have stabilized a variety of enzymes and proteins.

Chemical modification procedures can provide important structural
and functional information concerning a protein. Soluble enzymes can
be chemically modified in a variety of ways so as to alter their
properties. Each of the 20 amino acids occurring in proteins has a free
R-group or side chain. Many of these have reactive functional groups such
as the thiol group of cysteine or the amino group of lysine residues. At
least 9 amino acid side chains (Cys, Lys, Asp, Glu, Arg, His, Trp,
Tyr and Met) can react with specific reagents under mild conditions to
yield chemically-modified protein derivatives, often with altered
properties. The aim of chemical modification is to alter
the properties of the protein of interest (for example, to enhance its
stability) while minimizing activity loss. Thus, careful choice of modifying
reagents and conditions is required. Table 8.1 lists reactive amino acid
R-groups and some of the types of compounds useful for reactions with
them.
Chemical modification is in many ways complementary to site-
directed mutagenesis and protein engineering as a methodology for the
study of protein variants: (1) relatively little structural information is
required concerning the target protein; (2) the experiments are often
simple to carry out and protocols may be readily implemented; (3)
however, chemical modification of protein is prone to a number of
pitfalls (see “Immobilization”).

Applications of chemical modification of proteins:


(1) Quantitation: two of the most reactive R-groups are the thiol and
amino groups, which occur on the amino acids cysteine and
lysine, respectively. These groups each react with a range of
agents to yield specific colored compounds. These
colored compounds can be estimated quantitatively using
a spectrophotometer.
(2) Active-site targeting: R-group specific reagents may be used to
identify residues in or near the active site. The chemically-
modified residues may be conveniently identified by comparing
the peptide map (or protein sequence) characteristics of the
chemically-modified protein with those of the native. This
approach can elucidate which active site residues are involved in
binding and catalysis.

Cross-linking of protein subunits or adjacent proteins can be carried
out with a wide variety of reagents (see below).

2. Chemical modification and protein stabilization


Mechanisms of chemical stabilization of proteins have been
categorized under four headings:
(1) Cross-linking (either intra- or inter-molecular) by bifunctional
reagents;
(2) Strengthening of hydrophobic interactions by nonpolar
reagents;
(3) Introduction of new polar or charged groups, leading to
additional ionic or hydrogen bonds;
(4) Hydrophilization of the protein surface to reduce unfavorable
surface hydrophobic contacts with water.

(1) Cross-linking studies


Many examples of protein stabilization by crosslinking have been
reported using a wide range of “bridging” reagents. Glutaraldehyde has
been used to crosslink tetrameric lactate dehydrogenase, followed by
borohydride reduction. Sixty per cent of the initial activity was
recovered and 82% of the product was in tetrameric form.

(2) New Polar or Surface Charged Groups


Crosslinking is not a prerequisite for protein stabilization by chemical
modification. Dramatic increases in stability can result merely from the
alteration of surface groups. Prominent among such modifications
is the introduction of new polar or charged groups onto a protein
surface. The monofunctional reagent methyl acetimidate was used to
alter 17 of the 24 available lysines of pig heart lactate dehydrogenase.
This acetamidination takes place with retention of charge but the pK
values rise from 10.2 to 12.5 as the simple charged amino group of the
native lysine residue is replaced (Fig.6.2). The modified enzyme was more
resistant to heat, alkaline pH and even to trypsin digestion. This last
finding was ascribed to the structure of the covalently-altered lysines,
which is not equivalent to either lysine or arginine, the two amino acids
at which trypsin cleaves (Fig.6.2). The increases in thermal half life
ranged from 8- to 50-fold, depending on the elevated temperature used.
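The effect of the pK shift on charge retention at alkaline pH can be illustrated with the Henderson-Hasselbalch relation. This is a minimal sketch, assuming that each group behaves as an independent, ideal titratable site; the test pH of 11 and the function name are our own illustrative choices, while the pK values 10.2 and 12.5 are those quoted above.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a basic group in its protonated (positively charged) form.

    From Henderson-Hasselbalch: [BH+]/([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative comparison at an alkaline pH chosen for this sketch.
ph = 11.0
native_lysine = fraction_protonated(ph, 10.2)     # pK quoted for native lysine
acetamidinated = fraction_protonated(ph, 12.5)    # pK quoted after modification
print(f"pH {ph}: native {native_lysine:.2f}, modified {acetamidinated:.2f}")
```

Under these assumptions the modified groups remain almost fully charged at a pH where most native lysine amino groups have already lost their charge, which is consistent with the greater resistance to alkaline pH described above.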
Charge retention is not, however, always required for stabilization
by monofunctional reagents. Alpha-chymotrypsin has been alkylated (using
acrolein followed by borohydride reduction, resulting in retention of
positive charges) and acylated (with acetic or succinic anhydride,
leading to the neutralization or reversal of positive charges,
respectively). All of these reagents react with free amino groups of the
protein. Thermal stabilities increased with the degree of substitution of
the available amino groups up to a maximum of 90%. Further
substitution led to a dramatic drop in thermostability. The stabilizations
achieved were shown to depend on the degree of modification rather
than on the type of modifier used. In other words, either alkylation or
acylation reactions stabilized the enzyme provided that each substituted
the free amino groups on the enzyme to an equivalent degree. This
suggests that the positive charges on the surface of alpha-chymotrypsin
(or, at least, some of them) are not important for retention
of native conformation.
Similar findings were reported for alanine aminotransferase (ALT)
following reaction of its free amino groups with succinic anhydride,
which reverses the positive charges of protein amino groups. This
modification led to a 2-fold stabilization at 37°C (measured by
comparison of first-order inactivation rate constants) and to a six-times
longer shelf life at 4°C (measured by an accelerated degradation
methodology). These benefits depended strongly on the use of a large
excess of modifier over protein.
In contrast, Cupo and colleagues observed that guanidination of
chymotrypsinogen lysines by O-methyl isourea led to increased
stability. The final effect of this reaction resembles the acetamidination
discussed above (ref.48) but here a full guanidino group is added to the
lysines to yield homoarginine. Net positive charge is retained.
Subsequently, the remaining available lysines were neutralized by
acetylation and the resulting derivative was less stable than the native
form. A superguanidinated chymotrypsinogen was prepared by further
guanidination of altered protein carboxyl groups and proved to be even
less stable. Taken together with the results in ref.49, these findings show
just how critical the correct choice of reaction chemistries and
stoichiometries can be in chemical stabilization experiments. They may
also reflect the different methods used to measure stability: loss of
catalytic activity and hydrogen isotope in- and out-exchanges.

(3) Hydrophilization of Protein Surface


This has been well illustrated by the work of Mozhaev and
colleagues. Their approach was to “hydrophilize” enzymes by chemical
modification of non-polar surface clusters. The resulting derivative will
have a greater hydrophilic surface area and be better solvated. Since its
interactions with the solvent will be improved, the protein will be less
likely to unfold in the face of denaturing influences.
Using alpha-chymotrypsin once again as target enzyme, dramatic
stability enhancements (1,000-fold at 60°C) have been achieved for
surface-hydrophilized derivatives arising from reductive alkylation of
up to 10 amino groups. Alkylation with glyoxylic acid, followed by
cyanoborohydride reduction, introduced a number of -NCH2COO-
groups. These are much more hydrophilic than the naturally-occurring
free -NH2 groups on the protein surface and they probably lead to
decreased contact between water and nonpolar clusters. Six of the
fourteen lysines lie close to hydrophobic residues on the protein surface.
The 1000-fold stabilization reported is the ratio of the first-order
inactivation rate constants for the native and modified forms. This was a
remarkable result from such a simple chemical alteration with low
molecular weight compounds.
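A stabilization factor defined as a ratio of first-order inactivation rate constants can be computed from residual-activity measurements as follows. This is a minimal sketch, assuming simple first-order decay A(t) = A0·exp(-kt); the function names and the numbers in the example are hypothetical and are not data from the studies cited.

```python
import math


def rate_constant(a0: float, at: float, t: float) -> float:
    """First-order inactivation constant k from activity a0 at time 0 and at at time t."""
    if a0 <= 0 or at <= 0 or t <= 0:
        raise ValueError("activities and time must be positive")
    return math.log(a0 / at) / t


def half_life(k: float) -> float:
    """Half-life of a first-order process: t1/2 = ln 2 / k."""
    return math.log(2.0) / k


def stabilization_factor(k_native: float, k_modified: float) -> float:
    """Fold stabilization = k(native) / k(modified)."""
    return k_native / k_modified


# Hypothetical data: the native enzyme loses half its activity in 1 min,
# the modified enzyme loses half its activity in 1000 min.
k_nat = rate_constant(100.0, 50.0, 1.0)
k_mod = rate_constant(100.0, 50.0, 1000.0)
print(f"fold stabilization: {stabilization_factor(k_nat, k_mod):.0f}")
```

Because the half-life is inversely proportional to k, the same factor is obtained from the ratio of half-lives, provided both forms really do inactivate with first-order kinetics.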
(4) PEG and carbohydrate coupling
There is another form of chemical modification which can benefit
stability, namely the covalent attachment of large molecular weight
polyhydroxy entities such as PEG (polyethylene glycol) to proteins.
Coupling of PEG to proteins has been used to alter their
immunogenicity and to prolong their clearance times post-injection.
PEG coupling can also improve enzymes’ solubilities and activities in
organic solvents. PEG has been used to dissolve enzymes in non-
aqueous solvents to open up a whole new area of enzyme chemistry. It
can also lead to increased stability.

Soluble carbohydrates and polysaccharides may be coupled to
proteins by the cyanogen bromide method. This activates sugar
hydroxyl groups for coupling to protein amino groups. This has been
suggested as a general method for enzyme stabilization: carbohydrate
coupling helps protect against proteolytic inactivation in addition to its
enhancement of protein thermostability. Trypsin and alpha- and
beta-amylases have been coupled to soluble dextran. All derivatives were
more stable than their untreated counterparts. For example, the 60°C
half-life of alpha-amylase increased from 2.5 to 63 minutes following
modification. The modified trypsin was less prone to autolysis. An
increased degree of hydration may be an important factor in the
stabilization.

(5) Other strategies

3. Conclusion
A wide range of protein and enzyme properties have successfully been
altered by chemical modification procedures. These include solubility,
catalytic activity, substrate selectivity and stability. Chemical
modification can dramatically increase protein stability, with good
recoveries of activity. Active sites may be protected by specific ligands,
if required, but some degree of inactivation is always a possibility.
Partial loss of biological activity may be due either to decreased activity
of all the enzyme/protein molecules in the system or to total inactivation
of some of the molecules present. It can be difficult to distinguish
between these alternatives. Lessened activity among the entire
population of molecules often leads to altered kinetic parameters or pH
profiles, while a decreased turnover number accompanied by an
unchanged Km suggests the presence of a molecular fraction with
unaltered activities. Complete loss of biological activity following a
given modification reaction is usually interpreted in terms of the
targeted residue(s) being essential for activity.
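The two alternatives can be told apart, in principle, by comparing rates over a range of substrate concentrations. The sketch below assumes simple Michaelis-Menten kinetics; all parameter values and function names are hypothetical. If a fraction of the molecules is totally inactivated and the rest are unaltered, the modified/native rate ratio is the same at every substrate concentration (unchanged Km, lower apparent turnover number); if every molecule is altered, the ratio generally varies with substrate concentration.

```python
def mm_rate(s: float, kcat: float, km: float, e_total: float,
            active_fraction: float = 1.0) -> float:
    """Michaelis-Menten initial rate: v = kcat * [E]active * [S] / (Km + [S])."""
    return kcat * e_total * active_fraction * s / (km + s)


def activity_ratios(substrates, native, modified):
    """Ratio of modified to native rate at each substrate concentration."""
    return [modified(s) / native(s) for s in substrates]


substrates = [0.5, 2.0, 8.0, 50.0]  # arbitrary concentration units


def native(s):
    return mm_rate(s, kcat=100.0, km=2.0, e_total=1.0)


def partly_dead(s):
    # Case A (hypothetical): 40% of molecules totally inactive, rest unaltered.
    return mm_rate(s, kcat=100.0, km=2.0, e_total=1.0, active_fraction=0.6)


def all_altered(s):
    # Case B (hypothetical): every molecule altered (lower kcat, higher Km).
    return mm_rate(s, kcat=60.0, km=8.0, e_total=1.0)


ratios_dead = activity_ratios(substrates, native, partly_dead)
ratios_altered = activity_ratios(substrates, native, all_altered)
print("case A:", [round(r, 3) for r in ratios_dead])
print("case B:", [round(r, 3) for r in ratios_altered])
```

In practice, experimental error may blur this distinction, which is why the text notes that the alternatives can be difficult to separate.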
Despite these reservations, chemical modification may be carried out
and characterized quite quickly. The information gained is useful for
further rounds of chemical modification or for suggesting target sites for
site-directed mutation of the protein of interest. The ability to create
non-protein amino acid derivatives by chemical modification usually
complements the scope of mutational/protein engineering strategies for
the study or manipulation of proteins. Meanwhile, mutagenesis and
expression techniques continue to improve while chemical techniques
are likewise becoming more sophisticated. Very many reagents satisfy
the main criteria of a useful protein modifier, namely a high specificity
for one type of amino acid residue within a protein molecule and an
ability to react with that residue under mild conditions of pH and solvent
composition. It is likely that these powerful and complementary
strategies will be increasingly combined in the future.

8.2.3 Immobilization

The immobilization of enzymes and proteins onto insoluble materials
forms the basis of many biotechnological processes and analytical
devices. It is in its own right an extremely active field of applied study,
so only the functional stability aspects of enzyme and protein
immobilization will be considered here. There are four basic
immobilization processes, namely: physical adsorption, covalent
binding, copolymerization and microencapsulation. Each of these
methods can bring about increased stability of a protein or enzyme. It is
important to remember that immobilization is often performed for other
reasons, such as ease of enzyme-product separation, or enzyme
recyclability, and not primarily to stabilize the protein or enzyme of
interest.

8.2.4 Protein Engineering


See: related chapter

8.2.5 Long-term Storage of Proteins

1. Introduction
Laboratory scientists and biotechnologists will often need to
store an isolated or purified protein for varying lengths of time. If the
protein is an object of study, it will take some time to ascertain its
properties. If it is a commercial end product, or finds use as a tool in
some procedure, it will likely be used in small quantities over an
extended period. The protein must retain as much as possible of its
original, post-purification, biological (or functional) activity
throughout this time. The storage period or “shelf life” can range
from a few days to more than one year. The protein’s long-term or
kinetic stability becomes critically important under these conditions.
Shelf life can depend on the nature of the protein and on the storage
conditions. This section focuses on how to prevent deterioration due to
microbial contamination and proteolysis, and on the correct use of low
temperature for extended storage. It also considers
drying and freeze drying as processes for long-term protein
preservation. Of course, one cannot guarantee that any or all of these
necessarily-broad recommendations will “work” for a given protein,
or that a particular stabilization factor will result from a procedure
that seems to “work”.

2. Prevention of microbial contamination


Microbial contamination can lead to significant losses of a pure
protein by proteolysis. One should always aim to avoid it in the first
place.
(1) Addition of antimicrobial compounds such as sodium azide
or thiomersal (sodium merthiolate, a mercury-containing
compound) can prevent microbial growth. Add sodium
azide to a final concentration of 0.1%(w/v) or thiomersal to
a final concentration of 0.01%(w/v). Azide will inactivate
oxygen-binding proteins such as hemoglobin or peroxidase.
(2) Where one desires sterility but must avoid use of the toxic
compounds above, filtration offers a useful alternative. A
filter of pore size 0.22 micrometers will exclude all
bacteria; indeed, this method is used in industry to sterilize
labile materials which cannot be autoclaved or irradiated.

3. Avoidance of proteolysis
It can be difficult to remove proteases completely during
purification of a target protein. Unless the object protein is
completely pure, even tiny amounts of contaminating proteolytic
enzymes can cause serious losses of activity during extended storage
periods. The molecular diversity of proteases complicates the
situation: there are exopeptidases and endopeptidases. In addition, there
are four types of proteases classified by their molecular reaction
mechanisms: the serine, cysteine (or thiol), acid and
metalloproteases. Use of EDTA in the concentration range 2-5 mM should
complex the divalent metal ions essential for metalloprotease action.
Pepstatin A is a potent but reversible inhibitor of acid proteases. It is
used at concentrations around 0.1 micromolar, as are similar protease
inhibitors. The compound phenylmethylsulphonyl fluoride (PMSF)
reacts irreversibly with the essential serine in the active site of serine
proteases, inactivating them. It can also act on some thiol proteases.
It is typically used at a final concentration of 0.5-1.0 mM, following
dissolution in a solvent such as acetone. Before addition, of course,
one must ensure that none of these compounds will adversely affect
the protein of interest.
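The final concentrations quoted above are usually reached by diluting concentrated stock solutions. The following is a minimal sketch of that arithmetic (C1·V1 = C2·V2, with V2 the final volume); the stock concentrations in the table are hypothetical examples and should be replaced by those actually in use.

```python
def stock_volume_ul(stock_mm: float, final_mm: float, final_volume_ml: float) -> float:
    """Volume of stock (microlitres) giving final_mm in final_volume_ml.

    Uses C1 * V1 = C2 * V2, where V2 is the final volume after addition.
    """
    if stock_mm <= 0 or final_mm < 0 or final_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_mm / stock_mm * final_volume_ml * 1000.0


# name: (hypothetical stock in mM, target final concentration in mM)
inhibitors = {
    "EDTA": (500.0, 5.0),          # text: 2-5 mM final
    "PMSF": (100.0, 1.0),          # text: 0.5-1.0 mM final
    "pepstatin A": (1.0, 0.0001),  # text: about 0.1 micromolar final
}

final_volume_ml = 10.0
for name, (stock, final) in inhibitors.items():
    vol = stock_volume_ul(stock, final, final_volume_ml)
    print(f"{name}: add {vol:.1f} ul of {stock} mM stock")
```

When an inhibitor is dissolved in an organic solvent, it is worth checking that the volume of solvent carried over is itself tolerated by the protein of interest.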
If the protein of interest is itself a proteolytic enzyme, use of
protease inhibitors is not feasible. One may need to store such a
protein in dried form or as a freeze-dried preparation. Alternatively,
one can place it in a solution with a pH value far removed from the
protease’s optimum pH. Trypsin, for example, is most active at
mildly alkaline pH values. Daily stock solutions of trypsin are often
prepared in 1 mM HCl, where the very acid pH value renders the
enzyme effectively incapable of catalysis. This helps prevent
autolysis during the course of the experiment. The enzyme molecule
does not inactivate under these conditions and is fully active on
dilution into a suitable assay solution.

4. Extremely dilute solutions


Very dilute protein solutions are highly prone to inactivation. This
is especially true of oligomeric proteins where dissociation of subunits
can occur at low concentration. The individual polypeptide chains
comprising the oligomer may lack activity alone and/or may denature
with consequent loss of activity. Protein solutions of concentration less
than 1-2 mg/ml should be concentrated as rapidly as possible by
ultrafiltration, or by reverse osmosis using solid sucrose or polyethylene
glycol. Where rapid concentration is not possible, one can prevent
inactivation by addition of an exogenous protein such as bovine serum
albumin (BSA), typically to a final concentration of 1 mg/ml.

5. Low temperature storage


Refrigeration at 4-6°C will often suffice to preserve a protein’s
biological activity, provided microbial contamination and proteolysis are
prevented. Many proteins are supplied commercially in 50% glycerol or
as slurries in approximately 3 M ammonium sulphate. Freezing of such
preparations is unnecessary and should be avoided.
Some proteins can deteriorate at “refrigerator” temperature and
require storage at temperatures lower than 0°C. Usually, temperatures
between –18° and –20°C will allow for stable storage. Sometimes,
however, it may be necessary to use temperatures below –20°C.
Most protein solutions will freeze solid at temperatures below 0°C.
The events occurring during freezing of a protein-containing mixture or
biological system are much more complex than the simple macroscopic
phase change suggests. Differential freezing of particular components of
the mixture can lead to enormous concentration effects and to dramatic
changes of pH at low temperatures. These chemical processes can lead,
in turn, to a notable degree of protein inactivation.
Prevention of freezing will, of course, avoid freezing damage. It is
possible to undercool a liquid without freezing by preventing the
nucleation of ice crystals. This means that proteins can be stored well
below 0°C in the liquid phase. This method is very useful for small
volumes of valuable proteins. It avoids the need to use additives and is
more economical than freeze drying.

6. Drying for stable storage


The advantages of water removal as a protein storage/stabilization
strategy are many. Water participates directly in many of the deleterious
chemical reactions and in proteolytic processes. In any case, it provides
a medium for molecular movement and interactions. For these reasons,
removal of water effectively prevents deterioration of the protein. A
dried preparation will be much less bulky than the original solution and
can conveniently be stored in a laboratory freezer or refrigerator (or
perhaps even at room temperature). When one wishes to use the protein
preparation, one can rehydrate it simply by addition of an appropriate
volume of pure water or a suitable buffer solution.

7. Freeze drying
Lyophilization, or freeze drying, is a method for the preservation of
labile materials in a dehydrated form. It is particularly suitable for
high value biomolecules such as proteins. The process involves the
removal of bulk water from a frozen protein solution by sublimation
under vacuum with gentle heating (primary drying). This is followed by
controlled heating to more elevated temperatures for removal of the
remaining “bound water” from the protein preparation (secondary
drying). Residual moisture levels are often lower than 1%. If the freeze
drying operation is carried out correctly, the protein will retain all or
most of its initial biological activity.

8.3 Conclusion

This chapter has tried to show that many different approaches have
contributed to our understanding of, and ability to manipulate, protein
stability. It is likely that such cross-fertilization will continue in the
future. Advances in determining three-dimensional protein structure will
aid molecular modeling and rational design approaches. Mutagenesis
and expression techniques continue to improve while chemical
techniques are likewise becoming more sophisticated. It is likely that these
powerful and complementary strategies will be increasingly combined
in the future.
Available stabilization strategies include immobilization, use of
additives, chemical modification and protein engineering (see Table).

Towards an exciting future?

Chapter 9 Functional Diversity of Proteins

Keypoints:
1. How does the structure of proteins relate to the
function for which they were designed?
2. How do cells design proteins (from the evolutionary
viewpoint)?

9.1 Targeting and functional diversity

The cell is a highly organized factory in which the constituent parts
are assembled in different locations and specialized machinery exists for
specific purposes. Thus single-cell organisms are compartmentalized so
that specific reactions occur in unique locations. In multicellular
organisms the localization of reactions is even greater. The workers in
the biochemical factory of the organism are the proteins.

1. Proteins are directed to the regions where they are utilized


Our first consideration is how proteins get to their final destination, that
is, the locations where they function. All proteins are made in the
cytoplasm, but their final location depends on a variety of signals.
All proteins are made on ribosomes. Except for a small number of
ribosomes located inside the organelles themselves, the vast majority of
proteins are made on ribosomes in the cytosol. Some of the ribosomes are
freely floating in the cytosol, and some are attached to the endoplasmic
reticulum. The ribosomes that remain free account for the proteins that
are targeted to locations in the cytosol, the nucleus, the peroxisomes, the
mitochondria, and the chloroplasts. Ribosomes that are bound to the
endoplasmic reticulum make proteins that are deposited in the lumen of the
endoplasmic reticulum. From there the newly synthesized proteins may
be transferred to the Golgi apparatus while undergoing modifications of
various sorts. At some point, parts of the Golgi pinch off, and the
modified proteins that do not remain in the Golgi are transferred to
specific locations such as the lysosomes, the plasma membrane, and the
secretory granules. Those proteins targeted to the secretory granules are
eventually exported.

2. Classification of proteins according to location emphasizes functionality

Because of the great structural and functional diversity of proteins, it is
difficult to capture the important features or the whole range of them
within any one classification scheme. Here we will classify proteins
according to the location they occupy when they are fully functional.
This is a useful classification scheme because it emphasizes functional
interrelatedness: proteins that go together work together. (Table)

3. Protein structure is suited to protein function


We have seen that highly elongated fibrous proteins are well suited for
compartmentalization, for giving stable form to organellar and cellular
structures, and for processes involving movement of the organism.
Because of their generally low mobility, fibrous proteins are rarely
associated with enzyme activity or used for transport purposes. For
those functions, globular proteins are more suitable. In this section, we
will consider two classical examples of protein assemblages that are
ideally designed for the roles they play in the cell: hemoglobin and the
skeletal muscle system.

9.2 Hemoglobin---an allosteric oxygen-binding protein

Hemoglobin is the best known transport protein. Its chief function is
to pick up oxygen in the lungs, where it is plentiful, and deliver it to
tissues throughout the body. A central feature of hemoglobin is a
water-free pocket for the heme, with its central iron atom located where
oxygen is bound. The hydrophobic character of the heme binding cavity
is dictated by the apolar side chains that line it. This is a particularly
suitable environment for binding the hydrophobic porphyrin ring, and one
in which iron can bind oxygen reversibly without itself being oxidized to
Fe3+.
Hemoglobin consists of two α subunits, each with 141 amino acids,
and two β subunits, each with 146 amino acids. Each subunit is capable
of binding a single molecule of oxygen. In muscle cells a reserve
oxygen store is provided by the myoglobin molecule, which is similar in
structure to hemoglobin but exists as a monomer. While the components
of myoglobin and hemoglobin are remarkably similar, their
physiological responses are very different. On a weight basis, each
protein binds about the same amount of oxygen at high oxygen
tension (pressure). At low oxygen tensions, however, hemoglobin gives
up its oxygen much more readily. These differences are reflected in the
oxygen-binding curves of the purified proteins in aqueous solution (Fig).
The oxygen-binding curve for myoglobin(Mb) is hyperbolic in
shape, as would be expected for a simple one-to-one association of
myoglobin and oxygen:
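$$\mathrm{Mb} + \mathrm{O_2} \rightleftharpoons \mathrm{MbO_2}$$
$$K_f = \frac{[\mathrm{MbO_2}]}{[\mathrm{Mb}][\mathrm{O_2}]} = \text{equilibrium formation constant}$$
If y is the fraction of myoglobin molecules saturated, and if we express
the oxygen concentration in terms of the partial pressure of oxygen
[O2], then

$$K_f = \frac{y}{(1-y)[\mathrm{O_2}]} \quad\text{and}\quad y = \frac{K_f[\mathrm{O_2}]}{1 + K_f[\mathrm{O_2}]}$$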


This is the equation of a hyperbola.
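The step from the formation constant to the saturation expression can be filled in as follows. This is a short derivation that assumes only the one-to-one equilibrium defined above and takes y to be bound myoglobin divided by total myoglobin:

```latex
\begin{align*}
y &= \frac{[\mathrm{MbO_2}]}{[\mathrm{Mb}] + [\mathrm{MbO_2}]}
   = \frac{K_f[\mathrm{Mb}][\mathrm{O_2}]}{[\mathrm{Mb}] + K_f[\mathrm{Mb}][\mathrm{O_2}]}
   = \frac{K_f[\mathrm{O_2}]}{1 + K_f[\mathrm{O_2}]} \\[4pt]
y &= \tfrac{1}{2} \quad\text{when}\quad [\mathrm{O_2}] = \frac{1}{K_f}
\end{align*}
```

The second line shows that the oxygen pressure at half-saturation is simply the reciprocal of the formation constant, so a larger Kf means tighter binding.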

Hemoglobin (Hb) behaves differently. Its sigmoidal binding curve can be
fitted by an association-constant expression with a greater-than-first-power
dependence on the oxygen concentration:

$$K_f = \frac{[\mathrm{Hb(O_2)}_n]}{[\mathrm{Hb}][\mathrm{O_2}]^n} \quad\text{and}\quad y = \frac{K_f[\mathrm{O_2}]^n}{1 + K_f[\mathrm{O_2}]^n}$$

Under physiological conditions the value of n is around 2.8, indicating
that the binding of oxygen molecules to the four hemes in hemoglobin is
not independent and binding to any one heme is affected by the state of
the other three hemes.

The first oxygen attaches itself with the lowest affinity, and successive
oxygens are bound with a higher affinity. The exact value of n for
hemoglobin is a function of the extent of oxygen binding as well as the
presence of other factors. In general, a value of n>1 indicates
cooperative binding (or positive cooperativity) between small-molecule
ligands, a value of n<1 indicates anticooperative binding (or negative
cooperativity), and a value of n=1 indicates no cooperativity.
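The hyperbolic and sigmoidal saturation expressions can be compared numerically, and n can be recovered from the slope of a plot of log[y/(1-y)] against log[O2]. The sketch below assumes the single-exponent form y = Kf[O2]^n/(1 + Kf[O2]^n); the half-saturation pressures used (2.8 and 26, in arbitrary pressure units) are illustrative assumptions and not values given in the text, and the function names are our own.

```python
import math


def saturation(po2: float, kf: float, n: float = 1.0) -> float:
    """Fractional saturation y = Kf*pO2**n / (1 + Kf*pO2**n); n = 1 is a hyperbola."""
    x = kf * po2 ** n
    return x / (1.0 + x)


def hill_coefficient(p1: float, y1: float, p2: float, y2: float) -> float:
    """Slope of log10(y/(1-y)) versus log10(pO2) between two data points."""
    def logit10(y: float) -> float:
        return math.log10(y / (1.0 - y))
    return (logit10(y2) - logit10(y1)) / (math.log10(p2) - math.log10(p1))


# Illustrative parameters (assumptions of this sketch, not data from the text).
N_HB = 2.8
KF_MB = 1.0 / 2.8          # myoglobin: half-saturated at pO2 = 2.8
KF_HB = 26.0 ** -N_HB      # hemoglobin: half-saturated at pO2 = 26

for po2 in (5.0, 20.0, 40.0, 100.0):
    y_mb = saturation(po2, KF_MB)
    y_hb = saturation(po2, KF_HB, N_HB)
    print(f"pO2 {po2:6.1f}: Mb {y_mb:.2f}  Hb {y_hb:.2f}")
```

With these illustrative parameters the two proteins are both nearly saturated at high oxygen pressure, while at low pressure the sigmoidal curve falls away much more steeply, which is the behaviour described above for oxygen delivery to the tissues.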

9.3 Muscle---an aggregate of proteins involved in contraction

9.4 Protein diversification as a result of evolutionary pressures

Chapter 10 Proteins in Solution and in
Membranes

10.1 Introduction

10.2 Physical and chemical properties of soluble proteins

There are great differences in physical, chemical and biological properties
between native folded proteins and unfolded proteins. Due to the
compactness of the folded conformation, the diffusion rate of native
proteins is very rapid. The individual domains of proteins are relatively
resistant to proteases, which is frequently used as a criterion for whether
a protein is folded. Multidomain proteins often can be cleaved between
domains. Some domains are cleaved by proteases at peptide bonds in
mobile surface loops, but the folded structures generally remain intact.
If dissociated, the fragments often recombine spontaneously under the
appropriate conditions to regenerate the folded structure. The folded
conformation places the atoms of a protein in unique environments that
often markedly affect their physical and chemical properties. Two or
more functional groups are often held in proximity by the folded
conformation, making their effective concentrations relative to each
other so high that reactions occur between them that would be
negligible if the functional groups were on separate molecules.

Many of these properties are not evident when proteins are crystallized,
but appear in solution or in membranes, where the proteins are more
flexible. Nevertheless, knowing the crystal structure of a protein is
necessary to understand its properties under other conditions.

10.2.1 Aqueous solubility

The solubilities of proteins in aqueous solutions vary enormously. Some
proteins are so soluble in water that they can compose up to 35% of the
volume of a saturated solution. Others, especially structural proteins, are
essentially insoluble under physiological conditions and exist normally
as solids, aggregated into complexes of varying sizes and specificity.
Many proteins that are relatively insoluble in water are sequestered into
membranes. The solubility of a protein in water is determined by its free
energy when surrounded by aqueous solvent relative to its free energy
when interacting in an amorphous or ordered solid state with any other
molecules that might be present, or when immersed in a membrane. This
is a very complex situation, for which no quantitative explanations of
protein solubility are available.

The interactions of a protein molecule with solvent or with other
molecules are determined primarily by its surface. The most favorable
interactions with aqueous solvent are provided by charged and polar
groups of the hydrophilic side chains. The surfaces of most water-
soluble globular proteins are covered uniformly by charged and polar
groups, and their solubilities are governed primarily by the interactions
of the polar groups with water. Structural proteins also have polar
surfaces, but they interact with other protein molecules more avidly than
they do with water. Membrane proteins are more complex; their
interactions with membranes are described later.

The solubility of a globular protein in water generally increases at pH
values farther away from its isoelectric point. The greater the net charge
on the protein molecule, the greater the electrostatic repulsions between
molecules, which tends to keep them in solution. Most proteins unfold
at some pH value, however, often with drastic consequences for their
solubility, because unfolding exposes many nonpolar surface areas to
the solvent. Most proteins can be solubilized in aqueous solutions by
adding detergents or denaturants such as urea or guanidinium salts, but
the proteins are then usually unfolded.

The solubilities of globular proteins are affected by the addition of
cosolvents, especially salts (see discussion above).

Organic solvents also tend to decrease the solubility of proteins,
primarily by lowering the dielectric constant of the solvent. Polar
interactions between the solvent and the protein surface are
consequently less favorable. The stability of the folded state is also
lowered, however, so organic solvents tend to denature proteins.

Other polymers also tend to decrease the solubility of proteins.

10.2.2 Hydrodynamic properties in aqueous solution

1. Diffusion (see: other chapter)


Molecules undergo random rotation and translation because of
Brownian motion, which subjects them to repeated collisions with the
atoms of their environment.

2. Sedimentation analysis (see: other chapter)

The hydrodynamic properties of protein molecules are often measured
by their sedimentation coefficient, the rate at which they sediment per
unit of centrifugal field. The rate dr/dt of sedimentation in a centrifugal
field, where r is the radius at which the protein is situated and t is time,
is given by

$\dfrac{dr}{dt} = \dfrac{M_w(1 - \nu\rho)}{N_A f}\,\omega^2 r$

Mw: molecular weight; ν: its partial specific volume; ρ: the density of
the solution; NA: Avogadro's number; f: its translational frictional
coefficient; ω: the angular velocity of the rotor in radians per second.
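
As a worked illustration of the sedimentation equation, the following Python sketch evaluates dr/dt for a hypothetical protein. All numerical inputs (molecular weight, partial specific volume, hydrodynamic radius, rotor speed) are assumed values chosen for illustration only, and the frictional coefficient is estimated from Stokes' law for a sphere, f = 6πηR, which is an added assumption not stated in the text. CGS units are used throughout.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def stokes_friction(radius_cm, viscosity_poise=0.01):
    """Frictional coefficient of a sphere, f = 6*pi*eta*R (g/s). Assumed model."""
    return 6.0 * math.pi * viscosity_poise * radius_cm


def sedimentation_coefficient(mw, v_bar, rho, f):
    """s = Mw*(1 - v_bar*rho) / (N_A * f), in seconds."""
    return mw * (1.0 - v_bar * rho) / (AVOGADRO * f)


def sedimentation_rate(mw, v_bar, rho, f, omega, r):
    """dr/dt = s * omega^2 * r, in cm/s."""
    return sedimentation_coefficient(mw, v_bar, rho, f) * omega ** 2 * r


if __name__ == "__main__":
    # Hypothetical protein and run conditions (illustrative only).
    mw = 50_000.0          # g/mol
    v_bar = 0.73           # cm^3/g
    rho = 1.00             # g/cm^3
    f = stokes_friction(2.6e-7)              # sphere of radius 2.6 nm
    omega = 50_000 * 2.0 * math.pi / 60.0    # 50,000 rpm in rad/s
    r = 6.5                # cm from the rotor axis

    s = sedimentation_coefficient(mw, v_bar, rho, f)
    print(f"s     = {s / 1e-13:.2f} S (1 S = 1e-13 s)")
    print(f"dr/dt = {sedimentation_rate(mw, v_bar, rho, f, omega, r):.2e} cm/s")
```

Note that the buoyancy term (1 - νρ) sets the sign: a particle less dense than the solvent floats rather than sediments.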

3. Gel filtration

4. Rotation

10.2.3 Spectral properties

The various environments of the chromophores of a folded protein and
the unique stereochemistry of the polypeptide chain affect their spectral
properties in various ways. These can be used to characterize and to
follow changes in the folded conformation in solution.

1. Absorbance
Absorbance of UV light by proteins is not very sensitive to their
conformation or environments, except for that by the aromatic rings of
Phe, Tyr, and Trp residues. The spectral properties of the aromatic
residues reflect their environment. Their absorbance spectra are shifted
somewhat to longer wavelengths in a nonpolar environment such as the
interior of a protein. The absorbance spectra of the aromatic groups
consequently can be used to determine their average exposure to water.

2. Fluorescence
Fluorescence by the aromatic side chains is much more sensitive to their
environment than is absorbance, but it varies in an unpredictable
manner. The quantum yield may be either increased or decreased by
folding, so a folded protein can have either greater or less fluorescence
than the unfolded form. The magnitude of the fluorescence is not very
informative in itself, but it can serve as a sensitive probe of any
perturbations of the folded state.
Fluorescence by a protein is especially complex when there is more than
one aromatic side chain. The close proximity of aromatic groups in a
folded protein usually results in very efficient energy transfer between
them. ……
3. Circular dichroism
The CD and optical rotatory dispersion (ORD) spectra of a protein are very
sensitive to its conformation. In the far-UV region (below 250 nm), these
spectral characteristics are determined primarily by the polypeptide
backbone conformation, especially its secondary structure. The
spectrum of a protein of known structure is usually close to that
expected from the average of the spectra of α-helices, β-sheets, and
irregular conformations of model polypeptides, weighted by the fraction
of the polypeptide chain in each conformation. Consequently, CD
spectra can be used to estimate the relative proportions of the various
types of secondary structure in a protein. Early methods interpreted the
CD spectrum in terms of the model spectra of α-helices, β-sheets, and
irregular conformations; more recent procedures use spectra of a
number of proteins of known structure to fit the spectrum being
analyzed. As long as the unknown spectrum does not have any unique
features, fitting it with actual protein spectra usually gives the most
meaningful interpretation. However, other chromophores, especially
aromatic rings, can contribute significantly to the far-UV spectrum of a
protein. More recently, infrared and Raman spectroscopy have been
developed to measure protein secondary structure in solution.
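
The idea of treating a CD spectrum as a weighted average of reference spectra can be sketched as a small least-squares problem. The Python example below is only an illustration: the three basis spectra are rough, made-up numbers (not reference data), the wavelengths are arbitrary sample points, and the function names are hypothetical. It solves the normal equations directly so that no external library is needed.

```python
# Made-up basis spectra at five far-UV wavelengths (arbitrary ellipticity units).
# These numbers are illustrative only and must not be used as reference data.
WAVELENGTHS_NM = [195, 208, 215, 222, 230]
BASIS = {
    "helix": [60.0, -33.0, -30.0, -36.0, -15.0],
    "sheet": [30.0, -5.0, -18.0, -12.0, -3.0],
    "coil": [-40.0, -8.0, -2.0, 2.0, 0.0],
}


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_fractions(observed, basis):
    """Least-squares weights of each basis spectrum (normal equations)."""
    names = list(basis)
    ata = [[sum(p * q for p, q in zip(basis[a], basis[b])) for b in names]
           for a in names]
    atb = [sum(p * q for p, q in zip(basis[a], observed)) for a in names]
    return dict(zip(names, solve_linear(ata, atb)))


if __name__ == "__main__":
    # Synthetic "observed" spectrum built from known fractions, then recovered.
    true = {"helix": 0.5, "sheet": 0.3, "coil": 0.2}
    observed = [sum(true[k] * BASIS[k][i] for k in BASIS)
                for i in range(len(WAVELENGTHS_NM))]
    for name, value in fit_fractions(observed, BASIS).items():
        print(f"{name}: {value:.3f}")
```

A real analysis would use measured reference spectra over many more wavelengths and would usually constrain the fractions to be non-negative and to sum to one; this sketch omits those constraints.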

10.2.4 Ionization
The folded conformations of proteins have a variety of effects on the
ionization of their polar groups. Many charged groups are brought into
close proximity on the surface of a folded protein, so ionization of
groups that would increase the net charge may be hindered. This general
electrostatic effect influences the ionization of all the groups. Specific
interactions, such as hydrogen bonding or salt bridging, also occur and
primarily affect the ionization of particular groups. Even in small
molecules, the pKa values of groups can be influenced by many
environmental and electrostatic effects. The variety of environments in
folded proteins can produce very unusual ionization properties. The pKa
values of residues of one type can vary widely within a single protein,
often over a range of 3-4 pH units, because of their different environments.
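
To see how much a shift of a few pKa units matters, the Henderson-Hasselbalch relation can be evaluated for one kind of group in different environments. The sketch below is illustrative only: the pKa values are assumed examples spanning a 3-unit range and do not describe any particular protein, and the function name is hypothetical.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: [A-]/([HA] + [A-]) = 1 / (1 + 10^(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    ph = 7.0
    # Assumed pKa values for the same type of group in three environments.
    for label, pka in (("lowered", 5.0), ("model", 6.5), ("raised", 8.0)):
        frac = fraction_deprotonated(ph, pka)
        print(f"{label:8s} pKa {pka:.1f}: {100.0 * frac:5.1f}% deprotonated at pH {ph}")
```

At a fixed pH, groups of the same chemical type can therefore be almost fully ionized in one environment and almost fully un-ionized in another.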

Understanding and simulating electrostatic effects in the heterogeneous
environments of a folded protein immersed in water or a membrane are
much more complex than in a homogeneous liquid, where a simple
dielectric constant can describe the effect of the environment. Detailed
modeling of electrostatic effects in proteins requires consideration of all
the atoms and charges of both the protein and the solvent, plus their
atomic polarizabilities. The complexity of folded protein structures
prevents such analysis, and simpler approximate models are usually
used. Electrostatic effects in proteins are also complicated by the
presence of counterions in the aqueous solvent and by binding of ions by
the proteins. Consequently, it is impossible at present to predict
accurately the ionization behavior of any one group or the titration of
the total protein, but progress is being made.

10.2.5 Chemical properties


The unique environments of reactive groups in folded proteins can
substantially affect their chemical properties. (omitted)

10.3 Proteins in membranes

Membranes provide a physical and insulating barrier between the cell
interior and its environment; they also divide eukaryotic cells into
compartments. (The basic structure of a membrane). The basis of
membrane structure is the amphiphilic structure of the lipid molecules.
Natural membranes vary in their lipid compositions, as do the two
layers of their bilayer in some cases.
Proteins typically compose 50% of the mass of most natural
membranes, but this can vary between as little as 25% and as much as
75%. The proteins mediate various functions of the membrane such as
transport of appropriate molecules into or out of the cell, catalysis of
chemical reactions, receiving and transducing chemical signals from the
cell's environment, and maintaining the membrane structure. Membrane
proteins are no less important biologically than those that are water
soluble, but they have not been as thoroughly studied for simple
technical reasons. Membrane proteins have amphipathic structures that
reflect the membrane in which they reside. They have both polar
surfaces that interact with the aqueous solution and with the lipid head
groups, and nonpolar surfaces that interact with the nonpolar interior of
the lipid bilayer. Consequently, they are soluble neither in aqueous
solution nor in nonpolar solvents. They can be manipulated and studied
only when immersed in a lipid bilayer or a detergent micelle.

1. Association with membranes


Different proteins associate with membranes to varying extents,
depending on what fraction of the polypeptide chain is immersed in the
membrane bilayer.
(1) Integral membrane proteins
(2) Nonintegral membrane proteins: water soluble
a. Some are anchored to the membrane only by fatty acid chains
attached covalently by their polar ends to the protein.
b. Others are associated with the membrane by noncovalent
interactions with the exposed surfaces of integral membrane
proteins.

2. Structures of integral membrane proteins


Membrane proteins do not readily form three-dimensional crystals,
primarily because of the membrane lipids or detergents that are
necessarily bound to their nonpolar surfaces. These technical problems
have been solved so recently, primarily by using detergents with short
chains, that the three-dimensional structures of only three membrane
proteins are known in detail. Two are photosynthetic reaction centers
from related bacteria, and their structures are closely similar (fig). The
other is the bacterial outer membrane protein, porin, which differs
markedly from the reaction center proteins.

a. Photosynthetic Reaction Centers

b. Porin
3. Identifying amino acid sequences likely to traverse membranes
The integral membrane proteins of known structure are not markedly
different in structure or amino acid composition from water-soluble
proteins, except that they are slightly more hydrophobic. They differ
mainly in the nature of the amino acid side chains that are on part of
their surfaces. Those side chains of membrane proteins that are on the
surface in contact with the membrane bilayer are less polar than the
protein interior, whereas the surfaces of water-soluble proteins are much
more polar than the interior. Those observations make it likely that
segments of polypeptide chains that traverse membranes could be
identified from their amino acid sequences alone.
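
One common way to act on this observation is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a threshold of 1.6; this specific method is an illustration chosen here, not one named in the notes, and the test sequence is synthetic. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of the given length along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_segments(seq, window=19, threshold=1.6):
    """(start, end) index pairs of windows whose mean hydropathy meets the threshold."""
    return [(i, i + window)
            for i, value in enumerate(hydropathy_profile(seq, window))
            if value >= threshold]


if __name__ == "__main__":
    # Synthetic sequence: polar flanks around a 20-residue hydrophobic stretch.
    seq = "DKERSDKERS" + "L" * 20 + "DKERSDKERS"
    profile = hydropathy_profile(seq)
    peak = max(range(len(profile)), key=profile.__getitem__)
    print(f"peak window starts at residue {peak + 1}, mean hydropathy {profile[peak]:.2f}")
    print(f"{len(candidate_segments(seq))} windows at or above the threshold")
```

A window of about 19-20 residues corresponds roughly to the length of an α-helix needed to span a bilayer; such predictions are only suggestive and would not, for example, identify the β-strands of porin.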

4. Dynamic behavior in membranes


Membrane proteins generally diffuse rapidly in the two-dimensional
plane of the membrane, with diffusion coefficients of about 10^-10 cm^2/s,
unless they are interacting with other molecules inside or outside the
membrane. They usually retain their vertical orientation in the
membrane, however, and do not flip between the two surfaces. The
membrane lipids move even more rapidly in the membrane plane, with
diffusion coefficients of 10^-8 cm^2/s, and only very infrequently do they
move from one side of the bilayer to the other. Proteins in a membrane
generally induce disorder in the lipid bilayer and restrict the diffusion of
neighboring lipid molecules. The restricted lipids exchange positions
rapidly with others, however, indicating that the interactions between
the lipids and the proteins are weak and nonspecific. Similarly, neither
ordered detergent nor lipid molecules are strongly evident in the crystal
structures of membrane proteins. The physical state of the membrane
also affects the functional properties of its proteins, but in widely
varying ways. Interactions between proteins and membranes are
complicated by the usual heterogeneity of the lipids in natural
membranes.
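
To give a feel for these diffusion coefficients, the Einstein relation for random diffusion in two dimensions, ⟨r²⟩ = 4Dt, can be used to estimate how long a molecule needs to wander a given distance in the membrane plane. The relation and the 1 µm distance are additions made here for illustration; only the two diffusion coefficients come from the text.

```python
def diffusion_time(distance_cm, d_cm2_per_s, dimensions=2):
    """Time for the mean-square displacement <r^2> = 2*d*D*t to reach distance^2."""
    return distance_cm ** 2 / (2.0 * dimensions * d_cm2_per_s)


if __name__ == "__main__":
    one_micrometer = 1e-4  # cm, roughly the size of a small bacterial cell
    for label, d in (("membrane protein", 1e-10), ("membrane lipid", 1e-8)):
        t = diffusion_time(one_micrometer, d)
        print(f"{label}: about {t:.2f} s to diffuse 1 micrometer")
```

The hundredfold difference in D translates directly into a hundredfold difference in time, since the time scales as distance squared divided by D.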
Proteins in membranes tend to interact with each other much more
than do proteins in solution. The large sizes of proteins and their high
concentrations in most membranes, typically at least 25% of the
membrane volume, produce a large excluded volume effect. There is not
much empty space in a membrane for a protein molecule to move into.
Also, because the orientations of the proteins are fixed relative to the
membrane and to each other, fewer degrees of freedom need to be lost for
them to interact specifically. Perhaps partly for these reasons, many
proteins in membranes are oligomeric.

10.4 Flexibility of Protein Structure

The structures of proteins in crystals demonstrate varying degrees of
conformational flexibility in that the electron density of any particular
atom in the calculated electron density map may be spread out to
varying extents. In part, this spreading reflects the existence of
populations of alternative conformations. Even greater flexibility would
be expected in solution, without the constraint of the crystal lattice.
Indeed, it is a thermodynamic requirement that molecules the size of
proteins undergo substantial transient fluctuations.

The most prevalent and best understood movements of atoms in
molecules are the small-scale vibrations of bond lengths and angles that
are detectable by infrared and Raman spectroscopy techniques. These
vibrations in proteins are similar to those observed in small molecules,
and they occur at frequencies between 6×10^12/s and 10^14/s. On a larger
scale, bigger movements occur, such as those of domains of large
proteins that are linked together by relatively flexible “hinge” segments.
On the longest time scale, folded conformations are only marginally
stable and therefore spontaneously undergo transient but complete
unfolding with a frequency of 10^-4 to 10^-12/s, even under conditions
that are optimal for stability. Protein flexibility therefore involves
movements of widely varying magnitudes on a time scale that spans
perhaps 26 orders of magnitude.

Describing protein flexibility is not straightforward, except for that
of the side chains on the protein surface, which usually can move to
extents similar to those observed in small molecules and unfolded
proteins.
The rate at which conformational changes occur is only one aspect of
protein flexibility; another is the energetics of the various
conformations.
There are severe constraints on the extent to which a folded protein
conformation normally varies.
Integral membrane proteins have varying degrees of internal
flexibility, comparable to those of soluble proteins. Amino acid side
chains that extend into the membranes, however, are much more
restricted in their flexibility than are those of water-soluble proteins that
extend into the aqueous solution.

Chapter 11 Protein Engineering


Protein engineering has unlimited potential to provide significant
advances in science, medicine, and industry. The successful engineering
of proteins requires an understanding of the basic concepts of proteins
from expression to composition. The protein engineer needs to have a
working knowledge of protein composition, structure, and expression.
Often, the protein engineer is required to begin by finding the protein of
interest hidden within a mixture of proteins. After locating the desired
protein, the protein engineer must be able to clone and express it for
further analysis. Before engineering the protein of interest, the
characteristics of the wild-type protein must be determined from a
variety of analytical methods. Mutant proteins are then produced to
assess elements of the protein that are necessary for function. Within the
functional sites identified, the protein engineer begins to evaluate
alterations in these sites that result in the desired new properties. These
new properties may include alteration in stability, catalytic activity,
receptor binding, specificity, pharmacokinetic properties, or
immunogenicity. The protein engineer must have a clear understanding
of the required improvements and their impact on the intended use of
the protein.

11.1 Introduction to Protein Engineering

1. Why engineer proteins?

The term “protein engineering” refers to the use of genetic manipulation
techniques to alter (modify) or create proteins of interest in extremely
specific ways.
Substitution of a single amino acid residue at a known location can be
accomplished routinely by site-specific (or site-directed) mutagenesis protocols.
This approach has already had a major impact on our knowledge of protein
structure and function. Site-directed mutagenesis depends on the use of synthetic
oligonucleotides to direct the required mutation. It is highly specific: individual
bases may be reliably altered even within a triplet codon. Non-natural amino acid
substitutions have already been engineered specifically to investigate protein
stability.
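
The statement that individual bases can be altered within a triplet codon can be illustrated at the sequence level. The following Python sketch, which is not part of the original notes, builds the standard genetic code, then replaces the codon for one residue with the codon for the desired amino acid that differs at the fewest bases. The short gene and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence, ignoring any incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mutate_residue(dna, position, new_aa):
    """Replace the codon at a 1-based residue position with the codon for
    new_aa that needs the fewest base changes (ties broken alphabetically)."""
    start = 3 * (position - 1)
    old = dna[start:start + 3]
    candidates = [c for c, aa in CODON_TABLE.items() if aa == new_aa]
    best = min(candidates,
               key=lambda c: (sum(a != b for a, b in zip(c, old)), c))
    return dna[:start] + best + dna[start + 3:]


if __name__ == "__main__":
    gene = "ATGGCTGAAAAATGG"  # hypothetical coding sequence
    mutant = mutate_residue(gene, 3, "A")  # change residue 3 to alanine
    print(translate(gene), "->", translate(mutant))
    print(gene, "->", mutant)
```

In practice the mutagenic oligonucleotide would carry the altered codon flanked by enough matching sequence to anneal to the template; choosing the codon with the fewest mismatches keeps that annealing as stable as possible.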

a. The very underpinnings of the biotechnology industry with regard to
proteins are founded upon protein engineering research. The
applications range from creating structural motifs such as alpha-
helical bundles or leucine zippers that test theories of protein folding
and our current understanding of protein stability, to the production
of the first- and second-generation protein products for human
therapeutics or industrial products. These products include injectable
proteins such as insulin, growth hormone, erythropoietin,
hematopoietic cytokine granulocyte-colony stimulating factor, and
viral subunits of hepatitis B, as well as enzymes for improved food
production, biosensors, pollution control, and even biocomputing.
The tools are now in place to obtain a remarkable understanding of
protein folding and assembly and apply that knowledge to create
protein-based human and animal therapeutics that are better than
those currently available. The de novo design of new proteins with
prescribed properties constructed from first principles will eventually
result in even more useful products.
b. Several fundamental questions currently being addressed by protein
engineers involve catalysis, molecular recognition, protein folding,
stability and protein-protein interactions. The now-classic approach
of protein engineers in studying these questions is to target those amino
acids that potentially play functional or structural roles and then
delete or replace them with alternate residues to test the role of steric
constraints, hydrophobic forces, electrostatics and charge, and the
placement of hydrogen bonds, salt bridges, disulfide bonds, water, or
metals. A database of information for a particular protein or class of
proteins can be established in this fashion. In some cases, a database
may give a researcher clear clues regarding how to produce a protein
with a predictable change in structure or function.

2. The goals of protein engineering


One of the long-term goals of these efforts is to define the fundamental
rules for de novo protein design, since the ability to tailor-make a
protein with a predetermined activity and structure is the ultimate dream
of a protein chemist. Another more immediate goal is to sufficiently
understand structure-activity relationships between a protein and a
ligand to design small-molecule inhibitors of exquisite specificity. This
latter goal constitutes the theme of structure-based drug design. Finally,
generation of the redesigned enzymes provides insight into basic
principles governing protein structure-function relationships and
produces scientific reagents of practical importance. It should be clear
from this description that protein engineering is both an area requiring
basic research input and an empirical technology capable of being
applied to broad areas of research as well as the generation of products.
Not just enzymes but protein-based reagents, in general, will undergo a
similar increase in demand. Some novel and cost-effective protein drugs
such as erythropoietin, interferons, and human growth hormones have
been brought to market, while others are in the developmental pipeline.
Development of second-generation derivatives of these drugs and the
discovery of new first-generation products will depend heavily upon the
field of protein engineering.
Other areas where protein engineering principles will have significant
impact are as follows:
a. Macromolecular recognition: specifically, how proteins recognize
one another and recognize their substrates.
b. Targeted drug delivery: specifically, the generation of high
concentrations of a particular drug at a specified site. By designing an
enzyme to convert a precursor form of a drug to its active form, the
drug will be delivered site-specifically.
c. Bioremediation.
d. Engineered biological catalysis.

These are a few of the relatively short-term projects that the protein
engineers are currently focusing their efforts on. The long-term goal is
to provide a theoretical framework for addressing the relationship
between the three-dimensional structure and the function of a protein.

11.2 Production and Analytical Characterization of Proteins

11.2.1 DNA level processes

1. Methods and rationale


The current methods of recombinant DNA technology permit a protein
engineer to address structure-activity relationships of proteins and
enzymes in ways never before imagined. Genes or cDNAs encoding
proteins of interest can be cloned from natural sources or synthesized de
novo from the known protein sequence. The DNA sequences can then
be genetically engineered to encode redesigned proteins with insertions,
deletions, or precisely placed amino acid substitutions. The altered
genes can then be introduced into appropriate heterologous expression
systems to overproduce the introduced protein to reagent-level
quantities and qualities. The proteins can be purified and analyzed both
biochemically and biophysically to determine the effect of the changes.
To understand the function of the protein, the synthetic technology of
recombinant DNA must be wed with the analytical techniques of
enzymology, structural analysis, and molecular dynamics. The basic
methods involved in protein engineering will now be described.

a. Finding the protein of interest


If the wild-type protein is not available, but its sequence has been
determined, cDNA can be synthesized, cloned into the appropriate
plasmid, and expressed in the desired host.
In the case of discovery research, the protein of interest may not be
well-defined. If a homologous sequence (e.g., same family or different
species) is known, cDNA probes from these molecules are often used to
locate the oligonucleotide sequence encoding the desired protein within
a cell that is producing the protein. For discovery research, homologous
proteins may not be known, but a ligand or cofactor may be available.
The ligand is then used to locate the protein of interest through
radiolabeling and immunoprecipitation methods. Once sufficient
quantities (~100 µg to 1 mg) of the wild-type protein are available, it can
be used to generate monoclonal antibodies, utilized for subsequent
assays and purification. The antibodies produced can then be used to
identify the protein of interest and its mutants. However, for research
purposes, polyclonal antibodies derived from animal antisera may be
sufficient to purify the desired quantities of the protein. The purified
protein can be sequenced by proteolytic digestion and subsequent amino
acid sequence analysis of the peptide fragments. After obtaining a
partial amino acid sequence, cDNA can be generated for probing cells
for expression of the protein as well as construction of plasmids for
expression of the protein in foreign hosts.

b. Developing recombinant DNA libraries


Armed with primary sequence information and/or antibodies to a
protein of interest, isolation of the natural gene and/or cDNA encoding
the protein is a relatively straightforward task. (see also other related
chapter)

c. Mutagenesis principle
Most protein engineering involves recombinant DNA
methodologies, but other methods such as random mutation via DNA
damaging agents or environmental pressures have been used. A gene
encoding the target protein is cloned from the original source or
synthesized and subcloned based on the protein sequence of interest.
Once the gene for the target protein is cloned, one or many amino acids
can be substituted, deleted, or inserted into the gene. Both natural and
unnatural amino acids can be introduced into the gene at this point.
The ready availability of synthetic DNA provides a protein engineer
with new vistas in the manipulation and selective alteration of cloned
DNA. In particular, it is now feasible to change any cloned nucleotide
sequence to any other desired sequences and to determine the effect of
the change. One of the most powerful techniques for accomplishing
such nucleotide substitutions, insertions, or deletions is through the use
of synthetic DNA oligonucleotides.

2. Oligosynthesis
a. The synthesis of DNA is based on the Merrifield principle of solid-
phase chemistry that was previously developed for the synthesis of
peptides. Activated nucleotides are sequentially added to the 5’
hydroxyl of an oligo chain bound to an insoluble solid support by a 3’
hydroxyl linkage.

b. Automated DNA synthesizer.

3. Use of oligos as hybridization probes

The availability of synthetic DNA of predetermined sequence has
greatly facilitated studies of small fragments, permitting critical
evaluation of the various parameters affecting the stability of short
DNA/RNA duplexes (Fig 1.3).
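
One widely used rule of thumb for the stability of short probe duplexes is the Wallace rule, which estimates the melting temperature as 2 °C for each A or T and 4 °C for each G or C. The rule is introduced here only as an illustration and is not taken from the notes; it is a rough approximation for oligonucleotides of roughly 14 to 20 bases at high salt. The probe sequence and function name below are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    probe = "ATGGCTGCAAAATGGCT"  # hypothetical 17-base probe
    print(f"{probe}: Tm is roughly {wallace_tm(probe)} deg C")
```

Base composition and length are only two of the parameters involved; mismatches, salt concentration, and probe concentration also shift the real melting temperature.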

4. Use of oligos for site-specific mutagenesis


Once a gene or cDNA has been cloned and its sequence determined, a
critical analysis of its genetic structure and function frequently requires
its expression in an appropriate system. Development of optimal
expression systems is currently an area of intense investigation, and
approaches include both cytoplasmic and secreted expression in
bacteria, yeast and cultured insect and mammalian cells as well as
expression in live animals. Once an assay for monitoring the expressed
foreign gene product in the heterologous environment has been
developed, it is then possible to probe structure-function relationship by
site-specific mutagenesis. The effect a nucleotide or amino acid change
has on an activity of the gene product can be determined by site-
specifically modifying the cloned DNA, expressing, and then analyzing
the activity of the mutated product.

Efficient application of this so-called “reverse genetics” in the analysis
of a biological macromolecule requires prior knowledge of the relative
importance of a given short sequence with respect to the rest of the
genetic unit. In cases where accurate crystallographic determinations
have been made, the three-dimensional structure of a protein or nucleic
acid may suggest functional roles for individual amino acid or
nucleotide residues. Computer-graphics-assisted model-building
studies can then suggest residue replacements that could modify the
activity of the gene product. This strategy is a classic one for a protein
engineer.
(Figure 1.4)
Applications of oligo-directed mutagenesis

11.2.2 Protein Characterization

1. Methods to determine and assess the protein structure and composition

a. Designing and modeling protein structure


While the qualitative nature of the forces (that operate within
proteins and between the protein and its substrate) is reasonably well
understood, their quantitative contributions to protein function are not
yet definable in most cases. However, there has been significant success
in the use of computer software in the analysis of known protein
structures and in designing proteins with novel functional properties.
Such semiquantitative analyses often take into account functional and
structural data of first- and higher-generation mutants, in addition to
established structural biological and chemical theories. Protein
engineering to test or alter enzyme function remains an experimental
science.

Computer-based design of novel mutants makes extensive use of
existing general-purpose modeling programs such as INSIGHT and MIDAS.
These programs allow rapid inspection and manipulation of structures
available through the Protein Data Bank.
b. Expression
A basic requirement for developing a genetic approach to studying
proteins is an efficient expression system for the protein of interest.
High-level expression of modified proteins has proven to be difficult
and unpredictable. The choice of an expression system will largely
depend on the desired use of the expressed protein. ….. it is essential
that the method of biosynthetically producing the target protein provides
authentic material in reagent levels and quality. The most common step
that permits the development of a successful protein engineering project
is one involving the high-level production of functionally active protein.

2. Assessment of mutant proteins


It is generally the goal in protein engineering projects to evaluate the
effects of the specific modification(s) on the structure or function of the
protein of interest. There will therefore be specific characteristics or
functions which are the prime targets of the protein engineer, who will
examine them in general detail. However, it is important to establish that
the desired modifications, and no others, have indeed been introduced
into the engineered protein. Only in this way is it possible to link the
observed changes in structure or function to the newly introduced
changes in the molecule.

For many protein engineering projects, the wild-type protein is freely
available and its structural characteristics are well understood. The DNA
sequences of the wild-type and mutant proteins are known, and their
primary structures (amino acid sequences) are known. Clearly,
therefore, the most important property of the new protein species that
needs to be confirmed is its amino acid sequence.

One of the most powerful indirect methods for assessing the correct
incorporation of the structural alteration expected from the engineering
is mass spectrometry, especially using electrospray methodology. ……
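The reasoning behind the mass spectrometric check can be sketched numerically: a single residue substitution changes the peptide mass by the difference between the two residue masses. The following is a minimal illustration, not a description of any instrument software. The function names and the example peptides are hypothetical, and the table holds standard monoisotopic residue masses rounded to five decimals.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def substitution_shift(wild_type, mutant):
    """Expected mass difference (mutant minus wild type) in Da."""
    return peptide_mass(mutant) - peptide_mass(wild_type)


if __name__ == "__main__":
    # Hypothetical peptides: Ile -> Ser gives a clear shift,
    # Ile -> Leu gives none because the two residues are isobaric.
    print(round(substitution_shift("TAYIAK", "TAYSAK"), 5))
    print(round(substitution_shift("TAYIAK", "TAYLAK"), 5))
```

The Ile to Leu case shows why the method is called indirect: a substitution between isobaric residues leaves the mass unchanged, so the sequence of the altered peptide still has to be demonstrated directly.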

Peptide mapping is a common, powerful tool used in the analysis of
a protein. Cleavage of the protein by a specific enzyme leads to a series
of peptides which are separated (typically by reversed-phase HPLC) into
a unique pattern. Each peak in such a separation is characterized and
identified as a particular part of the protein sequence. Thus, if a well-
characterized peptide map is available for the parent sequence, it can be
predicted which features of the map should be altered in the new
protein, and only a few peptides need to be characterized to confirm that
the expected changes in sequence are indeed present. Even though
indirect evidence (that the desired alteration is present) may be
abundant, the amino acid sequence of the altered peptide should be
directly demonstrated to be consistent with that predicted.
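The prediction step described above can be sketched in a few lines: digest the parent and the mutant sequence in silico and list the peptides that differ. The sketch assumes trypsin as the specific enzyme (the text does not name one) and uses the common simplified rule that trypsin cleaves after K or R unless the next residue is P. The sequences and function names are hypothetical.

```python
def tryptic_digest(seq):
    """Split a sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


def changed_peptides(parent, mutant):
    """Peptides expected to disappear from, or appear in, the map."""
    p, m = tryptic_digest(parent), tryptic_digest(mutant)
    lost = [pep for pep in p if pep not in m]
    gained = [pep for pep in m if pep not in p]
    return lost, gained


if __name__ == "__main__":
    parent = "MKTAYIAKQRPG"   # hypothetical parent sequence
    mutant = "MKTAYSAKQRPG"   # hypothetical I6S mutant
    print(tryptic_digest(parent))
    print(changed_peptides(parent, mutant))
```

Only the peptides returned by `changed_peptides` would need to be characterized in detail; all other peaks are expected to coincide with the parent map.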

As noted above, the strongest link between the introduced change and
the observed alterations in structure or function comes from the
demonstration that no other changes have been introduced, at least in the
amino acid sequence. The identity of the peptide map in all regions
other than those affected by the desired change can usually be taken as
good evidence that there have been no other changes. Table 1.4 shows
some of the methods that might be used to investigate such changes or
to confirm that no observable differences exist between the mutant and
wild-type proteins.

11.2.3 Summary of Issues to Consider before Engineering a Protein

1. What is needed?
Prior to altering the composition of the wild-type protein, the protein
engineer must have a well-defined rationale for mutagenesis studies. In
particular, these studies fall into two primary, although not mutually
exclusive, areas: structure-function analysis and reconfiguration of the
protein to provide useful properties (e.g. increased stability, longer
half-life, etc.). Structure-function analysis is often performed to understand
the critical elements of the protein. Many of these studies involve large
screens of mutants, and these mutants are often dramatically different
from the wild-type protein. Another common alteration in the wild-type
protein is cassette substitution or removal of several adjacent amino
acids, with each mutant containing a different set of substitutions. This
type of mutagenesis scan often pinpoints the critical functional domain
of the protein. Of course, each of these mutants may significantly alter
the protein’s conformation, and, therefore, it is critical to determine the
effects of the mutations by analytical techniques. Assuming that the
protein conformation (secondary and tertiary structure) remains
significantly unaltered, the observed differences in function between the
mutant and wild-type molecules provide essential insight into the
importance of the mutated regions in the protein’s overall function.
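A cassette scan of the kind described above can be enumerated programmatically before any cloning is planned. The sketch below replaces successive, non-overlapping blocks of adjacent residues with a single replacement residue. The window size, the use of alanine as the replacement, and the function name are illustrative assumptions, not choices stated in the text.

```python
def cassette_scan(sequence, window=3, replacement="A"):
    """List (first, last, mutant) for each non-overlapping cassette.

    Positions are 1-based and inclusive. Cassettes that would leave
    the sequence unchanged (already all replacement) are skipped.
    """
    mutants = []
    for start in range(0, len(sequence), window):
        end = min(start + window, len(sequence))
        mutant = sequence[:start] + replacement * (end - start) + sequence[end:]
        if mutant != sequence:
            mutants.append((start + 1, end, mutant))
    return mutants


if __name__ == "__main__":
    for first, last, mutant in cassette_scan("MKTAYIAK", window=3):
        print(f"{first}-{last}: {mutant}")
```

Each mutant in the list would then be expressed and checked for conformation before its functional differences from the wild type are interpreted.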

While these structure-function studies are often necessary before
redesigning the protein, many alterations may be made a priori,
especially if the wild-type protein is a member of a large well-defined
family of proteins, some of which may have already been studied for
structure and function. The next issue then becomes the choice of
desired attributes to confer onto the protein. The most common
alterations include mutations that provide greater specificity, higher
binding affinity, faster rates of reaction (enzyme catalysis or binding
rates), and altered clearance rates. Before proceeding with protein
engineering, the behavior of the wild-type protein both in vivo and in
vitro must be well understood. …… overall, the desired protein
properties and potential mutagenesis sites must be determined prior to
the design of a superior mutant protein.

2. Industrial issues
Protein engineering holds great promise for elucidating the underlying
mechanisms of protein structure, function and folding. Beyond
enhancing the general understanding of proteins, protein
engineering has the potential to greatly improve their use in industrial
applications. For enzyme design, protein engineering provides the
opportunity to design more efficient and stable enzymes. These enzymes
can be used to catalyse complex reactions that, by standard chemical
methods, are inefficient or lead to racemic mixtures, resulting in impure
products or intermediates. In addition, enzymes for many industrial
chemical reactions should be stabilized against environmental stresses
such as heat or organic solvents.

In the case of therapeutic proteins, mutations in a naturally occurring
protein may provide significant benefits. For example, (omitted).

Finally, protein engineering provides a basis for rational drug design. By
developing an understanding of protein structure and function
relationships, the molecular epitopes that determine a protein’s function
can be used to develop small-molecule drugs. By understanding
biological responses such as signal transduction through receptor
dimerization and phosphorylation, it may ultimately be possible to
rationally design small-molecule agonists to replace therapeutic
proteins. Achievement of this last goal will surely take protein
engineering into the next century and beyond.

11.3 Protein Engineering for Stability

11.4 Engineering therapeutic antibody

11.5 Site-directed drug design


Chapter 12 Proteomics

12.1 Introduction to Proteomics
12.1.1 Proteome: a new word, a new field of biology
12.1.2 The Proteome and Technology
Thinking in two dimensions
Further dimensions in protein analysis
Information and the proteome
12.2 Two-Dimensional Electrophoresis: The State of
the Art and Future Direction
12.3 Protein Identification in Proteome Projects
12.4 The Importance of Protein Co- and Post-
Translational Modifications in Proteome Projects
12.5 Proteome Databases
12.6 Interfacing and Integrating Databases
12.7 Large-scale Comparative Protein Modelling
12.8 Applications of Proteomics

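The Rise of Proteome Research
In the post-genomic era, the focus of research has shifted from revealing
genetic information to functional genomics. However, proteins are the main
executors of biological function, and they follow rules of their own.
Protein modification and processing, transport and localization,
conformational change, and the interactions of proteins with one another
and with other biological macromolecules cannot be determined at the
genome level. These limitations of genomics have prompted researchers to
study the protein composition of the cell, and the rules governing its
activity, at the level of the whole system.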

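The Concepts of Proteome and Proteomics

In 1994, Wilkins and Williams of Macquarie University, Australia, first
proposed the concept of the proteome. It was initially defined as the
complete set of proteins expressed by the genome of a microorganism or,
in a multicellular organism, the set of proteins expressed by a tissue or
cell. It was later defined as the proteins expressed by a genome. From
the standpoint of gene expression, the number of proteins in a proteome
is always smaller than the number of genes in the genome. From the
standpoint of protein modification, however, the number of proteins in
a proteome exceeds the number of corresponding ORFs: splicing and editing
of mRNA can yield several proteins from one ORF; post-translational
modifications such as glycosylation and phosphorylation further increase
the number of protein species; and a single amino acid sequence can,
under certain conditions, form proteins with different spatial structures
and entirely different functions, as in prions. Hence the statement that
"the number of proteins in a proteome exceeds the number of genes in the
genome". The proteome is now defined as all the proteins present in a
cell. However, the composition of the proteome differs with time and
conditions, and there is as yet no yardstick for deciding whether the
full set of proteins has been captured. Obtaining "all the proteins
present in a cell" is therefore not possible.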

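Proteomics is a new field of research whose object of study is the
proteome. It can be divided into: ① expression proteomics, which builds
quantitative expression maps of the proteins in cells and tissues, or
scans EST maps. This approach relies on 2-D gel maps and image analysis,
and makes it possible to study, at the level of the whole proteome,
cellular pathways and the functional disturbances caused by disease, drug
interactions and certain biological stimuli. ② cell-map proteomics, which
determines the location of proteins within subcellular structures and
identifies protein-protein interactions, for example by purifying
organelles or by using mass spectrometry to determine the composition of
protein complexes. After reviewing the results of genome research,
Humphery-Smith et al. proposed the new concept of the "functional
proteome", that is, the group of proteins in a cell that are associated
with a particular function or present under a particular condition. On
this basis the Chinese scholar Li Boliang proposed the concept of
"functional proteomics", which takes the functional proteome as its main
object of study. This concept laid a theoretical foundation for the
feasibility of proteome research.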

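Theoretical Basis of Proteome Research

Proteome analysis rests on three main arguments:

① Protein expression levels cannot be predicted from mRNA expression
levels. One study examined the relationship between mRNA and protein
expression in Saccharomyces cerevisiae (brewer's yeast) in logarithmic
growth. mRNA expression was indicated by SAGE (serial analysis of gene
expression) frequency tables, and the yeast proteins were isotopically
labeled. For the 80 genes selected, no clear correlation was found
between translational and transcriptional abundance.

② The dynamic modification and processing of proteins does not
necessarily follow from the gene sequence. Many cellular regulatory
processes are difficult to observe at the mRNA level, because much
regulation takes place within protein domains. Many proteins are
functional only after binding other molecules. Such modification is
dynamic and reversible, and its type and site usually cannot be
determined from the gene sequence.

③ The proteome dynamically reflects the state of a biological system. The
proteome differs between specific phases of the cell cycle, stages of
differentiation, growth and nutritional states, temperatures, and stress
and pathological states. Proteomic studies can be expected to give an
accurate, detailed molecular description of the state of a cell or
tissue, because regulatory processes such as protein synthesis,
degradation, processing and modification can be revealed only by direct
analysis of the proteins.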
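The first argument above, that mRNA abundance does not predict protein abundance, is usually tested with a correlation coefficient. The sketch below computes a Spearman rank correlation in plain Python. It is an illustration only: the source does not say which statistic the yeast study used, and the abundance values in the example are invented.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Invented values: SAGE tag counts and protein spot intensities
    # for the same five hypothetical genes.
    mrna = [12, 3, 45, 7, 20]
    protein = [150, 900, 300, 40, 310]
    print(round(spearman(mrna, protein), 3))
```

A coefficient near zero for a set of genes would correspond to the finding reported above; a value near 1 would mean that mRNA level ranks the proteins correctly.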

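Priorities for Proteomics in Medical Research

Proteome research will help identify recognizable proteins of medical
use, which can be supplied to institutions engaged in pharmaceutical and
diagnostic research as diagnostic markers or as diagnostic target
molecules. Research focuses on five main areas:
1. Cancer. The tumor types targeted include cancers of the esophagus,
lung, colon, prostate, pancreas and breast, and neuroblastoma.
2. Neurological diseases. The main research directions include brain
injury; infectious protein diseases such as Creutzfeldt-Jakob disease
(CJD) and bovine spongiform encephalopathy (BSE); and Parkinson's disease.
3. Organ transplant rejection. Proteome research seeks an in vitro test
for allergic and chronic rejection reactions following transplantation of
human organs (heart, liver, lung or kidney).
4. Cardiovascular diseases. The cardiovascular diseases included are
heart failure, hypertension and hypertrophic myocarditis.
5. Diabetes and obesity. Proteomic methods are used to identify
polypeptides associated with obesity and diabetes as potential
recognition molecules and therapeutic targets.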

Sites for Proteome:

Hosted by PKU China Mirror sites: Australia Canada Korea Switzerland

The Taiwanese ExPASy site, tw.expasy.org, is unavailable for maintenance.
Please bookmark the addresses of the other mirror sites and use one of them
during the downtime. We apologize for any inconvenience caused.

SWISS-2DPAGE
Two-dimensional polyacrylamide gel electrophoresis database

SWISS-2DPAGE contains data on proteins identified on various 2-D PAGE
reference maps. You can locate these proteins on the 2-D PAGE maps or display
the region of a 2-D PAGE map where one might expect to find a protein from
SWISS-PROT.

Release 13.0, December 2000 and updates up to 27-Feb-2001
(contains 772 entries in 31 reference maps from human, mouse,
Arabidopsis thaliana, Dictyostelium discoideum, Escherichia coli
and Saccharomyces cerevisiae).

Access to SWISS-2DPAGE
• by description (DE lines) or by ID
• by accession number (AC lines)
• by clicking on a spot: select one of our 2-D PAGE reference maps, click
  on a spot and then get the corresponding information from the
  SWISS-2DPAGE database.
• by author (RA lines)
• by spot serial number (2D lines)
• by full text search
• SRS, searching in SWISS-2DPAGE using the Sequence Retrieval System
• retrieve in a table all the protein entries identified on a given
  reference map
• compute estimated location on reference maps for a user-entered sequence

SWISS-2DPAGE documents
• User manual
• Release notes (December 19, 2000)
• Protocols:
  o Technical information about 2-D PAGE (IPG's, silver staining,
    protocols, etc)
  o High performance 2-D gel comparison
• 2-D PAGE maps published:
  o Human CSF, ELC, HEPG2, HEPG2SP, LIVER, LYMPHOMA, PLASMA, PLATELET,
    RBC, U937, CEC, KIDNEY.
  o Dictyostelium discoideum, Escherichia coli, Saccharomyces cerevisiae.

Services
• Downloading SWISS-2DPAGE by FTP
• SWISS-2DSERVICE - Get your 2-D Gels performed according to Swiss
  standards
• 2-D PAGE training - attend a one week course in Geneva
• 2-D PAGE museum - gels run by trainees during the 2-D PAGE courses

Software
• Melanie 3 - Software package for 2-D PAGE analysis
• Make2ddb package - A package preparing the data and the programs
  necessary to build a federated 2-DE database on one's own web site.

Gateways to other 2-D PAGE related servers and services
• 2D Hunt - 2-D electrophoresis web site finder
• WORLD-2DPAGE - Index to other Federated 2-D PAGE databases

Access to other databases and tools on ExPASy
• SWISS-PROT
• PROSITE
• ENZYME
• SWISS-3DIMAGE
• SWISS-MODEL Repository
• SeqAnalRef
• CD40Lbase
• Proteomics tools

Last modified 9/Apr/2001 by CHH
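One of the services listed above estimates where a user-entered sequence should appear on a reference map. On a 2-D gel that position is set by the isoelectric point (first dimension) and the molecular mass (second dimension). The sketch below estimates only the isoelectric point, by finding the pH at which the Henderson-Hasselbalch net charge is zero. It is not the ExPASy algorithm: the pKa values are approximate textbook figures chosen for illustration, and real tools use their own calibrated sets.

```python
# Approximate side-chain and terminal pKa values (assumed, illustrative).
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0}            # protonated = +1
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # deprotonated = -1
PKA_NTERM, PKA_CTERM = 8.0, 3.1


def net_charge(seq, ph):
    """Net charge of an unmodified peptide at the given pH."""
    pos = 1 / (1 + 10 ** (ph - PKA_NTERM))
    neg = 1 / (1 + 10 ** (PKA_CTERM - ph))
    for aa in seq:
        if aa in PKA_POS:
            pos += 1 / (1 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            neg += 1 / (1 + 10 ** (PKA_NEG[aa] - ph))
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH 0-14; net charge falls monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for peptide in ("DDEEDD", "MKTAYIAK", "KKRKK"):  # hypothetical
        print(peptide, round(isoelectric_point(peptide), 2))
```

The molecular mass for the second dimension could be estimated in the same spirit by summing average residue masses; that part is omitted here.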


This is the ExPASy (Expert Protein Analysis System) proteomics server of the
Swiss Institute of Bioinformatics (SIB). This server is dedicated to the
analysis of protein sequences and structures as well as 2-D PAGE.

Databases
• SWISS-PROT and TrEMBL - Protein sequences
• PROSITE - Protein families and domains
• SWISS-2DPAGE - Two-dimensional polyacrylamide gel electrophoresis
• SWISS-3DIMAGE - 3D images of proteins and other biological
  macromolecules
• SWISS-MODEL Repository - Automatically generated protein models
• CD40Lbase - CD40 ligand defects
• ENZYME - Enzyme nomenclature
• SeqAnalRef - Sequence analysis bibliographic references
Links to many other molecular biology databases

Tools and Software Packages
• Proteomics tools
  o Identification and characterization
  o DNA -> Protein
  o Similarity searches
  o Pattern and profile searches
  o Post-translational modification prediction
  o Primary structure analysis
  o Secondary structure prediction
  o Tertiary structure
  o Transmembrane regions detection
  o Alignment
• Melanie 3 - Software for 2-D PAGE analysis
• SWISS-MODEL - Automated knowledge-based protein modelling server
• Swiss-PdbViewer - Macintosh/PC tool for structure display and analysis
• Boehringer Mannheim's Biochemical Pathways

Education and services
• The ExPASy FTP server
• Swiss-Shop - automatically obtain (by email) new sequence entries
  relevant to your field(s) of interest
• Masters Degree in Bioinformatics
• 2-D PAGE training - attend a one-week course in Geneva
• SWISS-2DSERVICE - get your 2-D Gels performed according to Swiss
  standards

Documentation
• What's New on ExPASy
• SWISS-FLASH electronic bulletins
• SWISS-PROT documents
• How to create HTML links to ExPASy
• Complete table of available documents

Links to lists of molecular biology resources
• Amos' WWW links - The ExPASy list of Biomolecular servers
• BioHunt - Search the internet for molecular biology information
• WORLD-2DPAGE - Links to 2-D PAGE database servers and 2-D PAGE related
  servers and services
• 2D Hunt - 2-D electrophoresis finder
• CMS-SDSC - The CMS-SDSC Molecular Biology Resource
• Biology links - from Harvard University
• BioWurld - from the EBI
• Yahoo - Science:Biology

Links to some major molecular biology servers
• European Bioinformatics Institute (EBI)
• National Center for Biotechnology Information (NCBI)
• Japanese GenomeNet
• Australian National Genomic Information Service (ANGIS)
• ISREC bioinformatics group
• BIOSCI/bionet Electronic Newsgroup Network for Biology
• EMBnet

Miscellaneous
• Protein Spotlight
• Links to conferences and events
• Swiss-Quiz
• Swiss-Jokes

Local links
• Geneva and Swiss local pages
• Swiss Institute of Bioinformatics (SIB)
• The Health On the Net foundation (HON)
• Geneva Bioinformatics (GeneBio)

Announcements and new features
• What's new on ExPASy (January 18, 2001)
• Proteome Research: New Frontiers in Functional Genomics

ExPASy and its Mirror Sites
Switzerland
http://www.expasy.org/ at Swiss Institute of Bioinformatics, Geneva
Australia
http://au.expasy.org/ at Australian Proteome Analysis Facility, Sydney
Canada
http://ca.expasy.org/ at Canadian Bioinformatics Resource, Halifax
China
http://cn.expasy.org/ at Peking University
Korea
http://kr.expasy.org/ at Yonsei Proteome Research Center, Seoul
Taiwan
http://tw.expasy.org/ at National Health Research Institutes, Taipei

Last modified 14/May/2001 by ELG



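Genome research has achieved remarkable results since it began. In the
past few years the complete genomic DNA sequences of more than ten
relatively simple organisms, including Escherichia coli, Saccharomyces
cerevisiae and Arabidopsis thaliana, have been determined. Sequencing of
the genome of the nematode C. elegans is essentially complete. The much
larger Human Genome Project is expected to finish sequencing the entire
genome early in this century (2003-2005). These advances are very
exciting, but they raise new problems. The flood of new gene data forces
us to ask what the proteins encoded by these genes do. Moreover,
proteins, as the main carriers of biological function, follow rules of
their own. After synthesis in the cell they often undergo
post-translational processing and modification, transport and
localization, and conformational change, and they interact with one
another and with other biological macromolecules. In other words, one
gene corresponds not to one protein but possibly to several or even
dozens. How does a cell containing thousands or even tens of thousands of
proteins operate? How do these proteins work, interact and coordinate
within the cell? Genome research is far from able to answer these
questions. These limitations of genomics have prompted researchers to
study the protein composition of the cell, and the rules governing its
activity, at the level of the whole system.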
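To understand the mysteries of life more fully, a new discipline,
proteomics, emerged in the mid-1990s on the foundation of the Human
Genome Project. It seeks to understand the mechanisms of life processes
and the molecular mechanisms of disease at the level of the proteome.
Scientists predict that, as sequencing of the human genome is completed,
the focus of life science research in the 21st century will shift from
genomics to proteomics, and that a new era in the life sciences, the
proteome era, is about to begin.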
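In 1994, Wilkins and Williams of Macquarie University, Australia, first
proposed the concept of the proteome, which first appeared in print in
the July 1995 issue of the journal Electrophoresis. It refers to the
entire protein complement of an organism and its modes of activity. It
was initially defined as the complete set of proteins expressed by the
genome of a microorganism or, in a multicellular organism, the set of
proteins expressed by a tissue or cell. It was later defined as the
proteins expressed by a genome. From the standpoint of gene expression,
the number of proteins in a proteome is always smaller than the number
of genes in the genome. From the standpoint of protein modification,
however, the number of proteins in a proteome exceeds the number of
corresponding ORFs: splicing and editing of mRNA can yield several
proteins from one ORF; post-translational modifications such as
glycosylation and phosphorylation further increase the number of protein
species; and a single amino acid sequence can, under certain conditions,
form proteins with different spatial structures and entirely different
functions, as in prions. Hence the statement that "the number of proteins
in a proteome exceeds the number of genes in the genome".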
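Although proteome research is still at an early stage, it has already
made important progress. At present the main task of proteomics is to
carry out proteome analysis while establishing and developing the
technical methods of proteome research. Proteome analysis has roughly two
aspects. The first is to obtain, by two-dimensional gel electrophoresis,
maps of all the proteins of an organism, tissue or cell under normal
physiological conditions. The resulting data serve as two-dimensional
reference maps and databases for the organism, tissue or cell under
study. A series of such reference maps and databases has been established
and can be searched online. Their significance is that they provide a
basis for further analysis. The second aspect is the comparative analysis
of changes in the proteome under altered physiological conditions, such
as changes in protein expression levels, changes in post-translational
modification or, where possible, changes in the subcellular localization
of proteins.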
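Proteomics emphasizes a holistic approach to proteins. From this
perspective, proteome research falls into roughly two types. One
addresses all the proteins of a cell or tissue, so that the focus is the
whole proteome. The other focuses on all the proteins related to a
particular biological question or mechanism; here the "whole" is local.
Complete analysis of cellular proteomes is already well under way:
proteome databases are being built for lower model organisms such as
E. coli and yeast, proteome studies of higher organisms such as rice and
mouse have begun, and proteome databases for some normal and diseased
human cells are also being built. At the same time, a larger share of
proteome research focuses on changes or differences in the proteome: by
comparative analysis, protein components that differ under different
physiological or environmental conditions are first detected and then
identified.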
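The comparative step described above, finding protein components that differ between two conditions, can be sketched as a fold-change filter over matched gel spots. Everything specific in the example is an assumption made for illustration: the spot identifiers, the intensity values, and the two-fold threshold are not taken from the text.

```python
import math


def differential_spots(control, treated, min_fold=2.0):
    """Return {spot: log2 fold change} for spots changed >= min_fold.

    Only spots quantified in both conditions with positive intensity
    are compared; positive values mean higher in the treated sample.
    """
    threshold = math.log2(min_fold)
    hits = {}
    for spot in control.keys() & treated.keys():
        c, t = control[spot], treated[spot]
        if c <= 0 or t <= 0:
            continue
        log2fc = math.log2(t / c)
        if abs(log2fc) >= threshold:
            hits[spot] = log2fc
    return hits


if __name__ == "__main__":
    # Hypothetical normalized spot intensities from two 2-D gels.
    control = {"spot_01": 100.0, "spot_02": 50.0, "spot_03": 10.0}
    treated = {"spot_01": 400.0, "spot_02": 55.0, "spot_03": 2.5}
    for spot, change in sorted(differential_spots(control, treated).items()):
        print(spot, change)
```

Spots that pass the filter are the candidates that would then be excised and identified; in practice replicate gels and a statistical test would be needed before trusting any single ratio.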
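In November 1999 Nature published a paper that used a proteomic approach
to study protein folding and revealed key structural features of the
interaction between proteins and the molecular chaperone GroEL. This work
is a good illustration of the concepts and techniques of proteomics in
use. Rout et al. (J Cell Biol, 2000, 148:635-651) used proteomic methods
to identify all detectable polypeptides of the intact yeast nuclear pore
complex, and systematically localized and quantified each candidate
protein component within the complex. They thereby revealed the complete
molecular architecture of the yeast nuclear pore complex and, on that
basis, its mode of operation. This work can be regarded as a model of
proteomics solving a problem in structural biology, and it points to a
new route for revealing the "construction" and working principles of
other large molecular machines. Recent international trends in proteomics
show that revealing the interactions among proteins, and building maps of
the interaction networks, has become the first step toward understanding
the complex system of the proteome and the functional modes of proteins,
and is already a research hotspot in proteomics. In early 2000 Science
published an article that applied large-scale two-hybrid technology to
the development of the nematode reproductive organs. It established a
preliminary protein interaction map related to nematode reproductive
development, providing abundant leads for further study of the
mechanisms of nematode development. For scientists previously focused on
the role of individual proteins in signal transduction, this work offers
a new way of thinking: considering all the proteins of a pathway
together.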
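If proteomics did not attract the attention of mainstream international
biology when it was born, the situation has changed greatly in the past
two years. The National Cancer Institute (NCI) of the US National
Institutes of Health (NIH) has invested substantial funds in proteome
research. NCI and the US Food and Drug Administration (FDA) are jointly
developing proteomic technologies for clinical use. The US Department of
Energy recently launched a proteome project to study the proteomes of
microorganisms and lower organisms relevant to the environment and
energy. The European Community is currently funding yeast proteome
research. The UK Biotechnology and Biological Sciences Research Council
recently funded three research centres to carry out proteome research on
organisms whose genomes have been or are about to be fully sequenced. In
France, five laboratories studying different model organisms have
received three years of funding, about 5 million US dollars a year
divided equally among genome, transcriptome and proteome research.
Germany has not neglected proteome research either: last year the federal
government invested 7.3 million US dollars in proteome and related
technology research and established a proteome centre. In 1998 the
Australian government began to establish the first proteome research
network, APAF (Australian Proteome Analysis Facility). APAF will provide
the country's laboratories with first-class instruments and bring them
together for large-scale proteome research. In China, a Major Project of
the National Natural Science Foundation on proteome research was launched
in 1999.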
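Another feature of proteome research is that many laboratories, companies
and pharmaceutical firms began application-oriented proteome research
very early, for example proteome studies of bladder cancer and
Alzheimer's disease, and the use of proteomic techniques to screen
vaccines. According to reports, Myriad will join Oracle of the United
States, Hitachi of Japan and the Swiss Friedli fund organization to
launch a "proteome" research programme. Myriad Proteomics, Inc.,
controlled by Myriad and founded with a joint investment of 185 million
US dollars from the four companies, aims to identify the more than
300,000 proteins present in the human body and to work out the
mechanisms by which they interact. Peter Meldrum, Myriad's chief
executive officer, said: "We will strive to reveal the secrets of life
processes at the molecular level". Myriad's plan has two parts. The first
is to express every human protein in yeast while studying the
interactions of these proteins. The second targets the composition of
human protein complexes and the function and regulation of each protein
component within them. Together, the two lines of research will help
scientists understand how proteins carry out normal cellular functions
and how they resist disease. Myriad also faces fierce competition:
Celera, which played an important part in the Human Genome Project, has
long had its eye on the highly attractive field of proteomics.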
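Although proteomics is still at an early stage of development, it is
expected that, as the field matures, proteome (proteomics) research will
produce breakthroughs in revealing the rules of life processes such as
growth, development and metabolic regulation, and will provide an
important theoretical basis for studying the mechanisms of major
diseases, disease diagnosis, disease prevention and treatment, new drug
development, and the regulation of plant growth and development.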
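The research in this academic direction has two parts. The first is the
use of proteomics in medical research. This work will help identify
recognizable proteins of medical use, which can be supplied to
institutions engaged in pharmaceutical and diagnostic research as
diagnostic markers or as diagnostic target molecules. Research focuses on
five main areas:
(1) Cancer. The tumor types targeted include cancers of the esophagus,
lung, colon, prostate, pancreas and breast, and neuroblastoma.
(2) Neurological diseases. The main research directions include brain
injury; infectious protein diseases such as Creutzfeldt-Jakob disease
(CJD) and bovine spongiform encephalopathy (BSE); and Parkinson's disease.
(3) Organ transplant rejection. Proteome research seeks an in vitro test
for allergic and chronic rejection reactions following transplantation of
human organs (heart, liver, lung or kidney).
(4) Cardiovascular diseases. The cardiovascular diseases included are
heart failure, hypertension and hypertrophic myocarditis.
(5) Diabetes and obesity. Proteomic methods are used to identify
polypeptides associated with obesity and diabetes as potential
recognition molecules and therapeutic targets.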
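The second part is the study of plant growth and development and their
regulatory mechanisms. The main topics are:
(1) Proteomic studies related to plant growth and development, such as
proteome studies related to plant photoperiod.
(2) Changes in the intracellular proteome profile after plants sense
environmental stress.
(3) Expression and functional analysis of specific proteins during seed
development.
(4) Analysis of protein components in signal recognition and transduction
pathways.
(5) Organelle proteomes.
Proteomics research is still at an initial stage, and work in this area
in China has only just begun. Many universities and research institutes
in China are currently organizing staff to start work in this field,
hoping to gain a foothold, but as far as is known many of them have not
yet established a suitable experimental system.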
Chapter 13 Protein Synthesis, Targeting, and
Turnover
13.1 The cellular machinery of protein synthesis
13.1.1 Messenger RNA is the template for
protein synthesis
13.1.2 Transfer RNAs order activated amino
acids on the mRNA template
13.1.4 Ribosomes are the site of protein
synthesis
13.2 The Genetic code
13.2.1 The code was deciphered with the help
of synthetic messengers
13.2.2 The code is highly degenerate
13.2.3 Wobble introduces ambiguity into
codon-anticodon interactions
13.2.4 The code is not universal
13.2.5 The rules regarding codon-anticodon
pairing are species-specific
13.3 The Steps in translation
13.3.1
