
Current Pharmaceutical Design, 2001, 7, 293-312

Proteomics as a Tool in the Pharmaceutical Drug Design Process


M. Yoshida (a), J. A. Loo (b) and R. A. Lepley (a,*)

Departments of Molecular Biology (a) and Discovery Technologies (b), Pfizer Global Research and Development, Ann Arbor, MI 48105, USA

*Address correspondence to this author at the TIS Group Inc., 200 South Sixth Street, Suite 450, Minneapolis, MN 55402, USA; Ph.: +1-612-334-3050; Fax: +1-612-334-3058; e-mail: Robert.Lepley@TISGROUP.net
Abstract: Proteomics is a technology platform that is gaining widespread use in
drug discovery and drug development programs. Defined as the protein complement
of the genome, the proteome is a varied and dynamic repertoire of molecules that in
many ways dictates the functional form that is taken by the genome. The
importance of proteomics is a direct consequence of the central role that proteins
play in establishing the biological phenotype of organisms in healthy and diseased states. Moreover,
proteins constitute the vast majority of drug targets against which pharmaceutical drug design processes
are initiated. By studying interrelationships between proteins that occur in health and disease and
following drug treatment, proteomics contributes important insight that can be used to determine the
pathophysiological basis for disease and to study the mechanistic basis for drug action and toxicity.
Proteomics is also an effective means to identify biomarkers that have the potential to improve decision
making surrounding drug efficacy and safety issues based on data derived from the study of key tissues
and the discovery and appropriate utilization of biomarkers.
INTRODUCTION
In a biomedical research environment that is
increasingly subject to omics-driven experimentation, the utility of various genomics-based technology platforms in dedicated
pharmaceutical drug design processes remains
largely untested. While genomic technologies are
routinely used by biologists to evaluate the
pharmacodynamic and pharmacokinetic properties
of drugs, they have not been systematically
deployed by medicinal chemists in a manner that
provides tactical or strategic data that can be used
to guide synthetic chemistries or define novel
synthetic routes [1]. The intent of this paper is to
depict proteomics, a genomic technology, in a
fashion that provides insight into the technology
platform itself and to provide a context that
suggests how the technology may provide utility
to medicinal chemists.
The Concept of a Proteome

The flow of information out of the eukaryotic
cell nucleus involves a hierarchical process that
begins with the transcription of any of an
estimated 100,000 genes represented by the human
genome [2]. These genes are contained within some
3.3 billion nucleotides of chromosomal DNA that
constitute the 22 somatic chromosomes and 2 sex
chromosomes of the human genome. Not all genes
are used by a cell. Within any given cell, only 20%
to 30% of the 100,000 genes may be required to
provide a fully functional, terminally differentiated
cell. Each gene represents an open reading frame
that encodes at least one protein product. The first
step toward generating a protein product from a
gene involves transcription. The transcriptional
process generates a messenger RNA which
provides a template that is used to direct the
synthesis of the gene's cognate protein. The
cellular process of synthesizing a protein from the
mRNA transcript is termed translation.
Translation occurs outside of the eukaryotic cell
nucleus. This necessitates that a mRNA transcript
be transported out of the nucleus into the
cytosolic compartment where it is ultimately
loaded onto a ribosome. The ribosome in
conjunction with transfer RNA species charged
with amino acids and a number of accessory
proteins provides the necessary molecular
machinery to carry out and complete the
translational process. This hierarchical process
that converts the informational content of genes to
protein products forms the basis for defining the
proteome as the protein complement of the
genome [3]. While this definition is strictly correct
it does not explicitly accommodate the fact that
the functional form of nearly all mammalian
proteins requires some type of covalent
modification. These post-translational covalent
modifications impart important functional
attributes to proteins such as
compartmentalization signals, structural changes,
stability and gain or loss of function [4]. Arguably,
the functional proteome is much larger than the
genome from which it is derived. From the drug
discovery standpoint, the proteome provides the
cellular phenotype from which drug targets
originate and against which compounds are
developed and screened. While the functional
proteome constitutes a target-rich environment, it
is an environment that only grudgingly yields good
targets.
Proteomics in Drug Discovery

The dedicated effort to characterize the proteome has been an ongoing research effort for many years. Understanding how drugs affect protein expression is a key rationale for applying proteomic methodology to a drug discovery program. Mapping proteomes from tissues and organisms has been used for development of high throughput screens, validation and advancement of new protein targets, SAR development, exploration of compound mechanism of action, compound toxicology profiling and identification of protein biomarkers in disease. For example an integrated proteomics approach has been used successfully to identify protein differences in phenylephrine-induced hypertrophic cardiac myocytes [5]. Once a proteome is established and expressed proteins linked to their respective genes, then the proteome becomes a powerful means to examine global changes in protein levels and expression under changing environmental conditions. The proteome becomes a reference for future comparison across cell types and species. It is expected that proteomics will lead to important new insights into disease mechanisms and improved drug discovery strategies to produce novel therapeutics.

The application of proteomics to study biochemical pathways and to identify potentially important gene products as targets for drug discovery is well established in the literature [6,7,8]. Proteome expression profiles of fresh bladder tumors are being used to search for protein markers that may form the basis for diagnosis, prognosis, and treatment of the disease [9]. The goal of these studies is to identify signaling pathways and components that are affected at various stages of bladder cancer progression and have the potential to provide novel leads in drug discovery. Studying individual proteins in the context of other cellular proteins is complementary to the information gathered from a genomics-based approach.

In order to increase the capacity and throughput of the information flow derived from a proteomics-based research program, several newly developed technologies need to be highly integrated to yield a complete and efficient proteomics-based methodology [10, 11]. The traditional approach involves the separation of a complex protein mixture with two-dimensional electrophoresis (2-DE, or 2-D PAGE for two-dimensional polyacrylamide gel electrophoresis). This is a widely used method because of its high sample capacity and separation efficiency. Associated with 2-DE are the necessary means to analyze and store the gel images for protein correlation and data tracking. A major component of the proteomics assembly is bioinformatics, including methods that correlate changes observed in the gel patterns (attributed to changes in the environment) and software that identifies proteins from genomic and protein sequences.
Differential protein expression has been
monitored for many years using 2-DE systems.
However, at the time the term proteomics was
introduced [12], new mass spectrometry methods were being developed that made the identification of proteins separated by PAGE more amenable. This advance in mass
spectrometry greatly expanded the range of
applications to which proteomics can contribute.
Mass spectrometry provides a robust method for
protein identification (provided the genome or
protein sequences are known and available). The
method can be automated, thereby greatly
increasing the throughput of the analysis if many
proteins need to be identified. The virtues of mass
spectrometry have played an important role in
honing the burgeoning field of proteomics, and
mass spectrometry will continue to support these
escalating efforts. This review serves to outline the
major technologies associated with a proteomics-
based approach that can be applied within drug
discovery and drug development strategies.
PROTEOMIC TECHNOLOGY PLATFORMS

Two-dimensional Gel Electrophoresis

Since its inception, two-dimensional electrophoresis has provided researchers with a powerful tool that is capable of resolving a complex protein mixture into its individual components [13]. As implied by its descriptive nomenclature, 2-DE systems separate proteins on the basis of distinct properties in each of two dimensions. In each dimension, the chosen property of the individual proteins within the mixture defines their mobility within an electrical field. The term 2-DE describes a technique that separates proteins in the first electrophoretic dimension on the basis of charge and in the second electrophoretic dimension on the basis of molecular mass. While 2-DE separations of protein mixtures can theoretically be carried out in free, aqueous solvent systems, the resultant separations are of little practical value. This is principally due to the effects of thermal convection and diffusion. Thermal effects caused by joule heating during the separation and the effects of diffusion following the separation both limit protein separation and ultimately, protein resolution. To minimize these effects 2-DE is carried out in a polyacrylamide support medium. The utility of polymerizing acrylamide monomers into an electrophoretic support matrix capable of reproducibly separating complex protein mixtures was recognized long before the advent of 2-DE systems [14, 15]. While either electrophoretic dimension alone can independently resolve 100 to 200 individual proteins, the combined separation properties resident in 2-DE systems offer investigators the ability to resolve up to 10,000 individual proteins [16]. The optimization of 2-DE systems for proteomics research applications has been a scientifically arduous evolution, not revolution.

Protein separation in the first dimension is based on the isoelectric point (pI) of individual proteins. Isoelectric focusing (IEF) depends upon the formation of a continuous pH gradient through which a protein migrates under the influence of an electrical field. The net charge on the protein varies with the pH but is zero at its pI. At a pH equivalent to its pI, movement of the protein in the electrical field ceases. At this point the protein is isoelectrically focused. In practical terms, IEF is achieved using either carrier ampholyte gels or immobilized pH gradient (IPG) gels. Carrier ampholytes are oligoamino and oligocarboxylic acid derivatives with molecular weights ranging from 600 to 900 daltons. By blending many carrier ampholyte species together, highly reproducible, continuous pH gradients between pH 3 and pH 10 can be formed in a polyacrylamide support matrix. Typically, the first dimension pH gradient and polyacrylamide support matrix are cast in a tube gel format that conforms to the physical measurements of the second dimension gel system. Carrier ampholyte IEF tube gels possess several advantages over IPG strip gels. First, they are easily prepared and do not require specialized gradient-forming instrumentation. Second, carrier ampholyte IEF gels can be experimentally tailored to generate either linear or nonlinear pH gradients in broad ranges or very narrow ranges [17]. Disadvantages of carrier ampholyte use include i) lot to lot variation in the ampholytes themselves attributable to complicated organic synthetic
methods and ii) the physicochemical effect termed
cathodic drift that prevents a truly stable pH
equilibrium state from being obtained in the first
dimension gel system. The tangible effect of
cathodic drift is a progressive deterioration of the
basic side of the pH gradient that can be
compensated for by addressing factors such as the
time and power requirements [18]. Immobilized
pH gradient (IPG) gel chemistry is based on
Pharmacia Immobiline acrylamido buffers [19]. These chemical entities are bifunctional. The general chemical structure, CH2=CH-CO-NH-R, defines one end of the molecule, designated R, as the titrant moiety where a weak carboxyl or amino group provides an ionizable buffer capacity that defines the range of the continuous pH gradient. At the other end of the molecule, the CH2=CH- moiety provides an acrylic double bond that co-polymerizes into the polyacrylamide support matrix. It is the polymerization of the Immobiline buffers into the polyacrylamide support matrix that provides an immobilized pH gradient. Because the Immobiline molecules themselves are rather simple chemical entities, the discrete molecules can
be generated reproducibly with minimal variation
between lots. This eliminates a principal concern
associated with carrier ampholytes. IPG gels also
offer the prospect of increased sample loading over
the carrier ampholytes. Carrier ampholyte tube
gels can routinely accommodate up to 100 µg of protein and in certain instances several hundred µg of protein. In comparison IPG gels offer higher protein loading potentials that can accommodate up to and in certain instances more than 1000 µg
of protein [20]. Another benefit associated with
IPG gels is the virtual elimination of cathodic drift
that permits basic proteins to be resolved [21].
While both carrier ampholyte and IPG gels offer
similar pI resolution, IPG gels have been viewed as
providing less reliable positional focusing for
discrete protein species than carrier ampholyte
gels [22]. Although casting procedures for IPG gels
require specialized gradient forming
instrumentation, the increased commercial
availability of IPG gels from several manufacturers
minimizes this potential disadvantage.
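The principle that a protein ceases to migrate at the pH where its net charge is zero can be made concrete with a short calculation. The sketch below estimates a pI by locating the zero-charge pH from approximate side-chain and terminal pKa values; the pKa table, the example sequence and the bisection tolerance are illustrative assumptions rather than values drawn from this review.

# Minimal sketch: estimate a protein's isoelectric point (pI) from its sequence.
# Approximate, textbook pKa values are assumed for illustration; dedicated pI
# prediction tools use refined, context-dependent values.
from collections import Counter

PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}            # protonated -> +1
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.3}   # deprotonated -> -1

def net_charge(sequence, pH):
    counts = Counter(sequence.upper())
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(n / (1 + 10 ** (pH - PKA_POS[aa]))
                   for aa, n in counts.items() if aa in PKA_POS)
    negative = sum(n / (1 + 10 ** (PKA_NEG[aa] - pH))
                   for aa, n in counts.items() if aa in PKA_NEG)
    return positive - negative

def estimate_pI(sequence, low=0.0, high=14.0, tolerance=0.001):
    # Net charge falls monotonically with pH, so bisect for the zero crossing.
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return round((low + high) / 2, 2)

if __name__ == "__main__":
    seq = "MKWVTFISLLFLFSSAYS"   # hypothetical example sequence
    print("Estimated pI:", estimate_pI(seq))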
Sample preparation presents a problem that must be considered within the context of the first dimension IEF chemistry [23]. The first dimension IEF chemistries that produce the desired continuous pH gradient depend upon protein charge to effect zonal concentration of discrete protein species. Consequently, the introduction of charged species, especially detergents, with the sample must be minimized. This constraint makes sample preparation for 2-DE analysis a daunting task. To maximize protein representation with the sample and simultaneously retain a fixed linear relationship with respect to the original material being sampled necessitates that i) the starting material be handled appropriately to minimize protein modification due to degradation and spurious post-translational covalent modification, ii) the disruption and solubilization of proteins that participate in inter-molecular interactions are ensured, iii) highly charged species such as nucleic acids are removed, iv) intra- and inter-molecular disulfide bonds are disrupted and v) transmembrane proteins are removed and solubilized. To accomplish this task 2-DE sample preparation buffers employ high concentrations of chaotropic agents such as urea, nonionic or zwitterionic detergents, nucleases and reductants such as dithiothreitol in the presence of alkylating agents to disrupt disulfide bonds. Consideration must also be given to post-first dimension buffer exchanges that facilitate the egress of proteins from the first dimension gel and permit efficient penetration into the second dimension gel matrix. An excellent compendium of recent procedures that address protein solubilization concerns associated with specific organisms and specific protein classes has been provided in a text edited by Link [24].

Protein separation in the second dimension is based on the molecular mass of individual proteins. Many variations exist as to the composition of the second dimension. Specific applications have been designed to optimize protein separation under denaturing or nondenaturing conditions, reducing or nonreducing conditions, linear or nonlinear polyacrylamide support matrixes and numerous buffer compositions. The most frequently used second dimension buffer systems are based on the pioneering work of Laemmli [25]. In the Laemmli system, the polyacrylamide support matrix is used
to sieve proteins that have been subject to
denaturation in the presence of sodium dodecyl
sulfate (SDS). This crucial step is based on the
premise that SDS binds uniformly to denatured
macromolecules with a fixed stoichiometry and
that the macromolecules coated with SDS assume a
prolate ellipsoid shape [26]. Theoretically, all
protein species assume the same shape and migrate
through the polyacrylamide support matrix with
rates that are dependent only upon their
hydrodynamic radius. Consequently, the position
to which a protein species migrates during a fixed
time is correlated to its molecular size. This fact
offers investigators the opportunity to move to
dimensionally larger gel formats to increase protein
loading and subsequent resolution. A larger 2-DE
gel system can accommodate more protein. This
increases the likelihood that low abundance
proteins will be detected and that these can be
separated from other proteins in the gel. An alternative approach to this problem is the use of so-called zoom gels. Zoom gel systems run the same
sample on narrow first dimension pH gradients
and separate on as large a second dimension gel as
possible. The broadest pH range and most
extensive molecular weight range can then be
reassembled into a single composite image by
visual means or attempted using specialized
software applications. In practice, this is very
difficult to accomplish.
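Because migration in the second dimension tracks molecular size, the apparent mass of an unknown spot is routinely read off a calibration derived from marker proteins run on the same gel. The sketch below fits a log-linear calibration of mass against relative mobility; the marker masses and mobilities are invented for illustration and NumPy is assumed to be available.

# Minimal sketch: estimate molecular mass from second-dimension (SDS-PAGE)
# migration using a log-linear calibration against marker proteins.
# Marker masses (kDa) and relative mobilities (Rf) are illustrative only.
import numpy as np

marker_mass_kda = np.array([97.0, 66.0, 45.0, 31.0, 21.5, 14.4])
marker_rf       = np.array([0.10, 0.22, 0.35, 0.52, 0.68, 0.83])

# Fit log10(mass) as a linear function of relative mobility.
slope, intercept = np.polyfit(marker_rf, np.log10(marker_mass_kda), 1)

def estimate_mass_kda(rf):
    """Interpolate the apparent molecular mass of a spot from its Rf."""
    return 10 ** (slope * rf + intercept)

if __name__ == "__main__":
    unknown_rf = 0.45
    print(f"Apparent mass at Rf {unknown_rf}: {estimate_mass_kda(unknown_rf):.1f} kDa")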
Protein Detection

Detection of proteins in 2-DE gels is a process that must be integrated into the overall experimental design. A 2-DE detection method must be chosen to provide the investigator with access to information that matches requirements set forth in the experimental design for sensitivity, linear dynamic range, reproducibility, imaging methodology and potential post-imaging 2-DE gel processing steps. Currently the most common methods for protein detection in 2-DE gel applications are Coomassie dye staining and silver staining [27, 28]. In 2-DE gel staining applications Coomassie Brilliant Blue R-250 has become the Coomassie dye of choice [29]. Coomassie Brilliant Blue R-250 has an absorption peak at 560-575 nm [29] and forms complexes with basic amino acid side chains [30]. The R-250 based staining procedures offer protein detection in the µg range and they are rapid as well as inexpensive. The Coomassie R-250 stain is also compatible with many post-run analytical steps including sequence analysis and mass spectroscopic methods. Basic silver staining methodologies depend on the reduction of silver nitrate to metallic silver at protein sites in the gel. The addition of sensitizing agents increases the sensitivity of silver staining methods. Sensitizers bind to the proteins in the gel and subsequently either bind silver ion directly or facilitate its reduction to metallic silver [31]. While silver staining is capable of detecting proteins with approximately 100-fold greater sensitivity than Coomassie stains, silver staining is a non-equilibrium method that does not stain proteins uniformly. Another disadvantage of silver staining methods is that they can impair post-run procedures such as mass spectroscopy.

More recently fluorescent methods have become available for 2-DE protein detection applications. Non-covalent fluorescent staining has focused primarily on the SYPRO dye series consisting of SYPRO Orange and SYPRO Red stains and the relatively recent addition of SYPRO Ruby [32]. Because the Orange and Red SYPRO variants interact with the SDS detergent that coats proteins rather than with the proteins themselves, variation in protein staining is reduced with this method. SYPRO Ruby differs in that the stain interacts with the proteins rather than the detergent coat. The cyanine dyes present an alternative fluorescent detection method. These agents are based on sulfoindocyanine dye templates that have succinimidyl ester reactive groups incorporated into the structure [33]. The succinimidyl moiety readily conjugates with primary amine groups to form stable, covalent protein-dye complexes. Due to the nature of the protein-dye interaction, proteins must be labeled with cyanine dyes during sample preparation. A potential problem with this approach is the possibility that the dye fails to interact with some proteins due to the complete lack of functional groups or their lack of accessibility to the dye. Other concerns pertain to issues surrounding the
uniformity of the protein-dye stoichiometry
across all proteins in the sample and the fact that
the dye does alter the molecular weight of the
proteins that it modifies. Cyanine dyes have
several properties that make them attractive
protein detection reagents, especially in double
labeling experiments [34]. First they bind
covalently to proteins. Second, there are a series of
related sulfoindocyanine dye templates that
possess discrete excitation and emission spectra.
When used in combination(s) different dyes can be
used to independently process protein samples
that are mixed prior to gel loading. The resultant
gel image readily discriminates between the two
protein-dye populations as well as producing a
third color that represents co-migrating proteins.
Although the SYPRO and cyanine dyes offer
improvements over silver staining with respect to
sensitivity, their major advantage is in the broad,
linear dynamic range for protein detection. This
can exceed a comparable silver stained gel by
several orders of magnitude. The SYPRO stains
and cyanine dyes are also compatible with mass
spectroscopic procedures which is a great benefit
when proteins need to be identified from the same
gels in which they are visualized. A significant
disadvantage is the substantial cost of the
fluorescent detection methods. Not only are the
fluorescent stains and dyes costly themselves, but
the need for separate UV and visible imaging
platforms adds to the overall cost of the
proteomics laboratory and can limit their utility in
large 2-DE gel projects.
Imaging and Image Analysis

Rigorous attention to detail from sample preparation to image analysis is necessary if the 2-DE project is designed to generate valid comparative data and to obtain valid numerical data. While the accuracy and precision of first dimension protein loading determines the validity of subsequent quantitative image analysis, the type of 2-DE image acquired is in large part determined by the nature of the sample itself. Autoradiographic images derived from isotopically labeled samples can be collected on film or phosphorimaging systems. Difficulties associated with film response to weak β-emitting isotopes and a modest linear dynamic range of 300 to 1 have led most investigators to employ phosphorimaging devices. Phosphorimaging systems have linear dynamic ranges that span up to five orders of magnitude and as much as 250 times more intrinsic sensitivity to radiolabeled proteins than does film [35]. Other advantages of phosphorimaging systems include fast imaging times, acquisition of digitized data ready for computational analysis, and the elimination of autoradiographic film development processes. A significant drawback to phosphorimaging systems is their cost and the requirement that different screens be used to measure specific radioisotopes. A particularly attractive use of phosphorimaging technology is the application of double-label analysis. Experimental samples can be prepared with two different radiolabels. Phosphorimages can be acquired that track two different biochemical processes within the same sample and from the same gel by imposing selective shielding between the source and the target following the initial image. This approach circumvents problems associated with inter-gel reproducibility. Non-radiolabeled samples can be imaged using chromogenic or chemiluminescent immunologic methods [36, 37] or conventional stains and dyes.

Current methods for 2-DE image acquisition depend primarily upon charge-coupled device (CCD) camera systems and document scanning devices [38, 39]. CCD camera systems employ a CCD in lieu of conventional film to acquire the image. The characteristic emission spectrum and stability of the light source as well as a uniform diffusion of the light across the 2-DE gel surface is important to ensure that optimized, consistent images are obtained over time. The CCD chip image map can be readily converted to digital form and downloaded to a computer for subsequent analysis. Document scanners acquire 2-DE image information through an array of photo-detectors that are moved across a 2-DE gel illuminated by a white or filtered light source. For 2-DE gel applications it is important that the scanner obtain and record a uniform image map across the length and width of the gel. Document scanning features that change gain settings to optimize text contrast during image acquisition should be disabled to
ensure that a uniform image is obtained for
subsequent analysis. During image acquisition any
changes in the relationship between the image and the CCD device or scanner must be noted. A gray
scale step tablet should be used initially to
calibrate and to recalibrate the system following
any changes.
All steps of the 2-DE process culminate in the generation of an image and establish its analytical value. The central tenet of 2-DE image analysis, to provide a comparative means to detect and measure changes, has changed little since the QUEST System was designed and developed by Garrels nearly 20 years ago [40]. The specific intent for the QUEST System was to develop a system for quantitative 2-DE that was exemplary in gel electrophoresis, image processing and data management. Although advances in computational speed and power have made more complex analyses possible by more sophisticated algorithms in shorter periods of time, these principal goals remain. Currently four 2-DE image analysis software systems are available commercially: Melanie II and PDQUEST (BioRad Laboratories, Hercules, CA), BioImage (Genomic Solutions, Inc, Ann Arbor, MI), and Phoretix 2D Advanced (Phoretix International, Newcastle upon Tyne, UK). While each software system possesses unique features, the basic approach to 2-DE gel analysis is quite similar. The process of image analysis begins with spot detection and image editing. Spot detection algorithms are effective but they frequently miss spots, fail to resolve multiple spots or identify artifacts as spots. These occurrences must be visually identified and corrected by the investigator. Once edited, reference spots are identified that appear consistently and are well resolved on each gel in a project. These are used to register the individual gels across a project. Matching individual spots across all gels in a project follows the gel registration process. This process is arduous and time consuming. For each hour spent producing a 2-DE gel, 4 to 6 hours may be spent in image editing and analysis. Once the imaged spots have been matched, statistical processes can be applied to detect and define significant differences between spots across respective groups within a project. At this point data management and data visualization tools are required.

Data management and data visualization are critical concerns in the evolution of a 2-DE project. Although all investigators confront these issues when a 2-DE project is completed, an integrated approach that considers data management and visualization during the planning stages of a 2-DE project offers the best opportunity to optimize the process and achieve intended project goals. A clear understanding of the specific aim of the project at its conception can define strategic concerns that address issues pertaining to i) How will the 2-DE data be analyzed and what level of validation is needed? Is visual examination of the 2-DE gel data sufficient or are statistical methods required? ii) Is the experimental design appropriate and are sample numbers adequate to ensure that the experimental outcomes can be determined without ambiguity? iii) What form of raw data must be harvested from the 2-DE gel images and by what analytical method? iv) Does this project stand alone or will data be integrated into a larger ongoing project? v) Is data going to be added to a database? vi) If so, what annotation will be necessary to ensure continuity with existing data? vii) Will important spots be excised from the gel for subsequent identification? In simple terms, the outcome of a 2-DE project is determined by careful planning. If visual inspection of the 2-DE gel data is sufficient to achieve project specific aims then 2-DE images must be evaluated in a consistent manner and results cross-checked and agreed upon by several observers. If statistical methods are to be used, the type of raw data that will be used in the analysis determines the output of the image analysis system. The statistical analysis can be performed at a single level with an analysis that employs T-tests or it can be multi-level with each successive level depending upon the previous analysis. The type of biological experimentation is an important determinant of whether the final 2-DE data represent a completed project or if they become a covariant in a larger model that includes data from other sources. Preliminary statistical analysis of the spot data can be accomplished using a straightforward T-test procedure based on individual spot intensity
values or integrated spot intensity values.
Mapping statistically changed spot matches onto
the 2-DE gel images can provide a visual signature
that is characteristic of a treatment, organism or
pathology. Signature patterns can be visualized on
gels or by more complex statistical methods that
cluster data and create strong visual links between
related and disparate statistical data. There are
many alternative approaches to the visualization
of complex computational data sets [41]. In the
absence of established standards, investigators
should be guided by a desire to present data
graphically with simplicity, clarity, precision and
efficiency [42].
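The straightforward T-test procedure mentioned above can be illustrated with a few lines of code. The sketch below flags matched spots whose intensities differ between two gel groups; the spot identifiers, intensity values and significance threshold are invented, and SciPy is assumed to be available.

# Minimal sketch: flag 2-DE spots whose intensity differs between control and
# treated gel groups using an unpaired t-test (intensity values are invented).
from scipy import stats

# One intensity value per gel; each spot was matched across all six gels.
spot_intensities = {
    "spot_0412": {"control": [1020, 980, 1100], "treated": [1850, 1760, 1930]},
    "spot_0731": {"control": [540, 610, 575],   "treated": [560, 590, 600]},
}

ALPHA = 0.05
for spot, groups in spot_intensities.items():
    t_stat, p_value = stats.ttest_ind(groups["control"], groups["treated"])
    change = "changed" if p_value < ALPHA else "unchanged"
    print(f"{spot}: t = {t_stat:.2f}, p = {p_value:.3f} -> {change}")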
Protein Spot Excision

Fundamentally, protein spots are excised from 2-DE gels to provide a source of material that can be used to identify the protein(s) resident in the spot. From a tactical perspective, a 2-DE project can represent a completed experimental objective. In this context, the identification of important spots derived from analytical means completes the project. In contrast, a 2-DE project can be a component of a much larger, ongoing project. In this circumstance exhaustive protein spot excision that accounts for all spots in a 2-DE gel may be used to establish a repository of information that can be stored and drawn upon at a later date. No matter what the intent, protein spot excision requires that spots be identified, precisely removed, and that protein(s) present in the spot be abundant with respect to the detection limit of the identification technology and be stained in a manner that does not interfere with identification technology. Several approaches to protein spot removal should be considered. The individual 2-DE gels that contribute to a proteomics project can be quite variable with respect to spot density. Consequently, picking methods that remove defined sections of the 2-DE gel in a grid pattern are inefficient in barren regions of the gel and prone to cross contamination from multiple spots in other more densely populated regions. Conversely, a picking method that punches specific spots performs equally across the gel but requires considerable investigator skill and time. An alternative is the application of a spot excision robot to the process. This is feasible only if large numbers of protein spots are to be excised from many gels over time, due to the fact that robotic systems are costly and have only recently become commercially available, suggesting that performance and reliability are yet to be determined. A great many wide bore needles can be beveled flat and used as spot excision devices for the price of an automated robotic system. In regions of high spot density, great accuracy must be used to remove only the spot intended without contaminating the target spot or affecting the ability to excise proximal spots. To accomplish this manually or through automation requires that the excision device be precisely centered on the spot and have a diameter that does not exceed the diameter of the spot. The flip side of this is that in instances where a large spot needs to be excised in its entirety, a larger punching device or multiple punches must be taken. Given the present state of robotic systems this is much easier to accomplish manually. Other factors should be considered in sampling low abundance protein spots. A linear correlation does not necessarily exist between protein abundance and apparent spot intensity. If multiple spots are to be pooled to increase the apparent abundance of protein, a point of diminishing returns may be reached where excision from multiple 2-DE gels fails to increase the protein extracted from the gel above the detection limit of the imaging device. In this instance, preparative sample loading on a fresh 2-DE gel is often the best solution. Keratin contamination is a major concern during spot excision. Keratin from many sources must be assumed to be present in the laboratory environment unless specific and heroic measures are taken to ensure and guarantee its absence. Precaution against keratin contamination must be taken at all steps of the 2-DE gel process to avoid contamination of the sample, gel chemistry components and the surface of the gel. During spot excision the investigator should wear protective clothing, gloves, hair net and a mask. If a robotic system is used, careful maintenance, cleaning and an enclosure system will greatly reduce potential keratin contamination.
Protein Identification by Mass Spectrometry

Subsequent to 2-DE gel analysis the inevitable question arises, what are the proteins in the spots? Historically, methods such as amino acid analysis and Edman sequencing have been employed to obtain protein identities from 2-DE gel systems. When pI and molecular weight from the 2-DE system is combined with amino acid composition and/or N-terminal sequence data, protein identifications can be made with a high degree of confidence [43, 44]. More recently, protein identifications have been determined by using mass spectrometry (MS). The sensitivity of various MS systems permits the identification of proteins whose abundance is below one picomole and in many cases in the femtomole range. In addition to providing greater sensitivity, protein identifications can be made with greater confidence given the mass accuracy of the MS measurement. For peptide fragments less than 3000 Da, a mass accuracy of better than 100 ppm (parts-per-million) can be obtained. Moreover, mass spectrometry is highly amenable to automation, as steps from sample preparation, sample injection, data acquisition, to data interpretation can be performed unattended. Automation is an important concept as capacities increase. It can reduce sources of sample contamination as well as sample handling errors. The rapid increase in proteomic sample throughput arising from constant improvements in the automation of sample preparation and data acquisition has resulted in the generation of large data sets that contain data with high intrinsic integrity. This has necessitated the parallel development of increasingly efficient computer-based MS data searching strategies that can rapidly provide protein identifications from experimentally derived polypeptide molecular masses and sequences. With these clear advantages of using MS-based approaches, mass spectrometry has become an integral part of proteomics.

Protein Identification by MALDI-MS

Preceding MS the protein spots from a 2-DE gel are excised from the gel and exposed to a proteolytic enzyme, typically trypsin. The resulting tryptic fragments are extracted from the 2-DE gel matrix and then subjected to Matrix-Assisted Laser Desorption/Ionization MS (MALDI-MS). A mass spectrum of the resulting digest products produces a peptide map or a peptide fingerprint. The masses of various peptides in the fingerprint can be compared to theoretical peptide maps derived from protein database sequences for identification.

MALDI-MS is a tool that has rapidly grown in popularity in the area of bioanalytical mass spectrometry [45]. This is due to recent improvements in time-of-flight (TOF) technology that have resulted in enhancements in resolution and subsequently mass accuracy, which have increased the usefulness of MALDI-MS data. Currently, the TOF analyzer is the most common system for MALDI. Improvements such as the reflectron analyzer and time-lag focusing (delayed extraction) have greatly improved the quality of MALDI-MS data [46]. Automation has also played an important role in the increased use of MALDI-MS as a tool for proteome analysis by allowing automated acquisition of data, followed by fully automated database searching of protein sequence databases such as SWISS-PROT and NCBI [47, 48].

The simplicity and speed of the MALDI-MS approach is the reason for its popularity in proteomics. The peptide analyte of interest is co-crystallized on the MALDI target plate with an appropriate matrix (i.e., 4-hydroxy-α-cyanocinnamic acid or 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid)), which are small, highly conjugated organic molecules that strongly absorb energy at the laser frequency being used. In most examples of MALDI, the wavelength of choice is the 337 nm output from a nitrogen laser. Other lasers operating at different frequencies can be used but cost and ease of operation have made the nitrogen laser the most popular choice. Once the peptide mixture is co-crystallized with matrix, the target plate is inserted into the high vacuum region of the source and the sample is irradiated with a laser pulse. The matrix absorbs the input laser energy and transfers it to the analyte molecule. The molecules are desorbed
and ionized during this stage of the process. The
ions are then accelerated under constant kinetic
energy down the flight tube of the TOF instrument
with a pulse of up to 30,000 V. The laser pulse
imparts all ions with nearly the same kinetic
energy. Each ions characteristic flight times are
based on the individual ions mass-to-charge (m/z)
ratio. As a result, ions of different masses are
separated as they travel down the flight tube of the
mass spectrometer (lighter ions travel faster than
larger ions), then strike the detector and are
registered by the data system, which converts the
flight-times to masses.
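The relationship between flight time and m/z can be written down directly: with every ion given the same kinetic energy zeV, the flight time over a path of length L is proportional to the square root of m/z, so m/z = 2eVt^2/L^2. The sketch below applies this idealized relation; the flight-path length, accelerating voltage and flight times are illustrative assumptions, and real instruments are calibrated against known standards rather than from first principles.

# Minimal sketch: idealized conversion of TOF flight time to m/z.
# All ions receive the same kinetic energy z*e*V, so t = L * sqrt(m / (2*z*e*V))
# and therefore m/z = 2*e*V*t^2 / L^2.  Instrument values here are illustrative.
E_CHARGE = 1.602176634e-19      # C
DALTON   = 1.66053906660e-27    # kg

def mz_from_flight_time(t_us, flight_path_m=1.0, accel_volts=20000.0):
    t = t_us * 1e-6
    mass_per_charge_kg = 2 * E_CHARGE * accel_volts * t ** 2 / flight_path_m ** 2
    return mass_per_charge_kg / DALTON   # express in Da per elementary charge

if __name__ == "__main__":
    for t_us in (20.0, 30.0, 40.0):
        print(f"flight time {t_us:5.1f} us -> m/z ~ {mz_from_flight_time(t_us):8.1f}")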
To obtain the best mass accuracy possible, MALDI-MS instruments equipped with a reflectron and time-lag focusing are commonly employed in proteomic research [46]. A reflectron is used to compensate for the initial energy spread that the ions may have following desorption off the sample target plate. Ions of the same mass which have slightly different energy arrive at the detector at slightly different times, resulting in a slight mass difference and a loss of resolution. To compensate for this effect, an ion mirror or reflectron is used to focus the ions by creating an energy gradient; ions with higher kinetic energy penetrate deeper and are forced back out at a slightly higher energy. In addition, time-lag focusing (commonly referred to as delayed extraction) allows the initial energy spread to be partially focused prior to initiating the first pulse of the ions. Many MALDI-TOF instruments today incorporate reflectron and time-lag focusing capabilities. In many cases, this combination can increase MALDI-TOF mass accuracy to better than 20 ppm for peptides of molecular mass 500-3000 Da.

Proteomic projects are capable of generating a large number of protein spots that require protein identification. This necessitates that mass spectrometric systems acquire the capability for very high throughput. This capability can be achieved by automation of the entire process associated with sample preparation through spectral analysis. Most commercial MALDI-MS instruments have capabilities for automated data acquisition. The quality of the data and the speed at which it is obtained can be greatly affected by laser energy and the homogeneity of the analyte spot. To minimize the effects caused by heterogeneous application of analyte crystals, automated MALDI sample preparation robotic systems provide consistent sample preparation results. Peptide maps that are derived from the parent protein can result in sequence coverage (relative to the entire protein sequence) that accounts for greater than 80% of the sequence. The measured molecular weights of the peptide fragments along with the specificity of the enzyme employed can be searched and compared against protein sequence databases using a number of computer searching routines that are available from public databases on the Internet.

HPLC-MS and Tandem Mass Spectrometry

An approach for peptide mapping similar to MALDI-MS involves the use of electrospray ionization mass spectrometry (ESI-MS). ESI is a solution-based ionization method in which analyte solutions flowing in the presence of a high electric field produce submicron-sized droplets. As the droplets travel towards the mass spectrometer orifice at atmospheric pressure, they evaporate and eject charged analyte ions. These ions are sampled by the mass spectrometer for subsequent mass measurement. A peptide map can be obtained by the direct analysis of the peptide mixture by ESI-MS. A significant advantage of ESI over MALDI is the ease with which it can be coupled to separation methodologies such as HPLC. Inclusion of a pre-MS separation reduces the complexity of the starting mixture applied to the MS instrument. Once again, the measured peptide masses can be similarly compared with sequence databases to identify the parent protein.
To provide further confirmation of a protein
identification, a tandem mass spectrometer
(MS/MS) can be employed to dissociate peptide
ions in the mass spectrometer to provide direct
sequence information. Peptide fragmentation
typically occurs along the polypeptide backbone
to produce products termed y-type and b-
type fragment ions (in which the y-ions contain
the C-terminal portion of the peptide and the b-
ions contain the N-terminus). These product ions
from an MS/MS spectrum can be compared to
available sequences using powerful software tools
as well. In many examples, laboratories may use
nanoelectrospray with tandem mass spectrometry
to examine the unseparated digest mixtures with a
high degree of success [6].
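The b- and y-type fragment series described above can be computed directly from monoisotopic residue masses. The sketch below lists the theoretical singly charged b- and y-ions for a peptide; the peptide itself is an arbitrary example chosen for illustration.

# Minimal sketch: theoretical singly charged b- and y-ion m/z values for a
# tryptic peptide, from monoisotopic residue masses (peptide chosen arbitrarily).
RESIDUE = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.00728, 18.01056

def fragment_ions(peptide):
    """Return lists of (label, m/z) for singly charged b- and y-type ions."""
    masses = [RESIDUE[aa] for aa in peptide]
    b_ions = [(f"b{i}", sum(masses[:i]) + PROTON) for i in range(1, len(peptide))]
    y_ions = [(f"y{i}", sum(masses[-i:]) + WATER + PROTON) for i in range(1, len(peptide))]
    return b_ions, y_ions

if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("SAMPLER")   # illustrative peptide
    for label, mz in b_ions + y_ions:
        print(f"{label:>3}: {mz:9.4f}")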
One of the most common forms of tandem mass spectrometers used for proteomics applications is the quadrupole ion trap (QIT) mass spectrometer [5, 48-50]. An ion trap is an ion storage device that utilizes radiofrequency (rf) voltage across a ring electrode to contain ions [51]. As the rf amplitude is increased, ions of increasing mass become unstable in the trap and are ejected towards the instrument detector. The ion trap mass spectrometer has become a workhorse instrument for proteomics. This is due to its ease of operation and because of its high efficiency for generating protein sequence information.

For a single sample, the application of Liquid Chromatography-MS/MS (LC-MS/MS) analysis includes two discrete steps: (a) LC-MS peptide mapping to identify peptide ions from the digestion mixture and to deduce their molecular weights, and (b) LC-MS/MS of the previously detected peptides to obtain sequence information for protein identification. An improvement in efficiency and throughput of the overall method can be obtained by performing LC-MS/MS in the data dependent mode. As full scan mass spectra are acquired continuously in LC-MS mode, any ion detected with a signal intensity above a pre-defined threshold will trigger the mass spectrometer to switch over to MS/MS mode. Thus, the ion trap mass spectrometer switches back and forth between MS mode (molecular mass information) and MS/MS mode (sequence information) in a single LC run. The data dependent scanning capability, combined with an autosampler device, can dramatically increase the capacity and throughput of protein identification.

Computer-based Sequence Searching Strategies

The concept of database searching with mass spectral data is not novel. It has been a routine practice for the interpretation of electron ionization (EI) mass spectra for many years. However, there is a fundamental difference between database searching of EI spectra and proteomic spectra. In the former, an experimentally acquired spectrum is compared with a large collection of spectra acquired from known, authentic standards. Fundamental features of the spectra that may include the ions present and their relative intensities are compared to establish a match and to assess a more subjective variable, quality of fit. This type of computer-based searching is typically referred to as library searching. In contrast, proteomic data searches are executed by means of a list of experimentally determined masses that is compared to lists of computer-generated theoretical masses prepared from a database of protein primary sequences. With the current exponential growth in the generation of genomic data, the comprehensiveness of these databases expands daily and will soon encompass the complete inventory of open reading frames for many prokaryotic and eukaryotic species, including human.

There are basically three types of search strategies that are employed: i) searching with peptide fingerprint data, ii) searching with sequence data, and iii) searching with raw MS/MS data. While the strategy employed often depends upon the type of data available, there is a logical hierarchy to the search strategy based on the time and effort that must be invested in a given strategy. While many of the computer-based search engines accommodate the purchase of an on-site license for use with proprietary data, most are within the public domain and are available via the Internet. One limiting factor that must be considered for these approaches is that they can only identify proteins that have already been identified and annotated within an existing database. In some instances unknown sequence that is homologous to known sequence(s) can be identified with nominal confidence based on homology criteria.

Searching with Peptide Fingerprints

MALDI peptide fingerprinting is considered to be the most rapid manner in which to identify an
unknown protein. Computer software is used to
produce a "virtual digest" of the individual protein
components within a protein database using a
defined protease. This virtual digest generates a list
of theoretical peptide masses that represent each
entry in the database. The experimentally
determined peptide masses are then compared
with the theoretical masses to determine the
identity of the unknown.
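The virtual digest comparison can be sketched in a few lines. The example below performs an in silico tryptic digest of each entry in a toy database and counts the experimental masses that fall within a tolerance of a theoretical peptide; the sequences, mass list and tolerance are invented for illustration, and no missed cleavages or modifications are considered.

# Minimal sketch of peptide-mass fingerprinting: perform a virtual tryptic
# digest of each database entry and count experimental masses that match a
# theoretical peptide MH+ within a tolerance.  Sequences and masses are invented.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
           "L": 113.08406, "K": 128.09496, "E": 129.04259, "F": 147.06841,
           "R": 156.10111}
WATER, PROTON = 18.01056, 1.00728

def tryptic_peptides(sequence):
    # Cleave after K or R unless the next residue is P (no missed cleavages).
    peptides, current = [], ""
    for i, aa in enumerate(sequence):
        current += aa
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def peptide_mh(peptide):
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON

def score(observed_masses, sequence, tol=0.2):
    theoretical = [peptide_mh(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(obs - theo) <= tol for theo in theoretical)
               for obs in observed_masses)

if __name__ == "__main__":
    database = {"protein_A": "GASVLKEFRSVLAKGFESR", "protein_B": "FFEELKGGASRVLVKSAEGR"}
    observed = [574.36, 451.23, 595.28]   # invented MALDI peptide masses
    for name, seq in sorted(database.items(),
                            key=lambda item: score(observed, item[1]), reverse=True):
        print(name, "matched peptides:", score(observed, seq))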
The majority of the available search engines applied by mass spectrometrists allow the investigator to define certain experimental conditions that can be used to optimize a specific search. The search elements that are most frequently optimized are: i) minimum number of peptides to be matched, ii) allowable mass error, iii) monoisotopic versus average mass data, iv) mass range of starting protein, and v) type of protease used for digestion. Additionally, many search algorithms permit the inclusion of information about potential protein modification, such as N- and C-terminal modification, carboxymethylation, and oxidized methionines. However, most protein databases contain primary sequence information only, and any shift in mass incorporated into the primary sequence as a result of post-translational modification will result in an experimental mass that is in disagreement with theoretical mass data. This is one of the greatest shortcomings of the virtual digestion method. Modifications such as glycosylation and phosphorylation can also result in errors. Even a single amino acid substitution can shift the mass of a peptide to such a degree that proteins with significant homology cannot be identified within the database.

There are many factors that influence the utility of peptide fingerprinting. As the experimental mass accuracy improves, search tolerances can be refined to focus on very narrow limits. This increases the match confidence, and decreases the number of "false positive" responses [52]. A common practice used to increase mass accuracy in peptide fingerprinting is to employ an autolysis fragment from the proteolytic enzyme used in the digest as an internal standard to calibrate a MALDI mass spectrum. In terms of sample consumption, throughput, and the need for intensive data interpretation, peptide fingerprinting is invariably the first approach that is utilized for protein identification.

The peptide fingerprint approach is also amenable to the identification of proteins that are present in complex mixtures. Peptides generated from the digest of a protein mixture will often return two or more results that produce a "good" fit to a database entry. As long as the peptides assigned for multiple proteins do not overlap, identification becomes reasonably straightforward. Conversely, residual peptides that are "left over" in a peptide fingerprint once the identification of a target component has been made can be resubmitted for the possible identification of another component contained in the mixture.

Searching with Sequence Information

Computer algorithm driven database searching that employs sequence information is the oldest, and arguably the most straightforward of the three search strategies that are presented. Typically, an experimentally determined partial amino acid sequence is compared with the sequences of proteins listed in a database. From this input list a list of proteins containing the same partial sequence is generated. Virtually all web-based search applications are able to perform searches with sequence information. One advantage of this approach is that typical search engines, such as BLAST, allow increasing levels of ambiguity to be integrated into the submitted experimental sequence. This often facilitates the identification of unknown sequences that possess a degree of homology to known sequences.
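A minimal version of this kind of sequence query is shown below: a short, experimentally derived stretch of sequence is located within database entries, with one position left ambiguous because leucine and isoleucine are isobaric and often cannot be distinguished. The sequences are invented, and a real search would be run through a dedicated engine such as BLAST as noted above.

# Minimal sketch: locate database proteins containing a short, experimentally
# derived partial sequence, allowing an ambiguous Leu/Ile position (the two
# residues are isobaric and often indistinguishable).  Sequences are invented.
import re

database = {
    "protein_A": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "protein_B": "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS",
}

partial_sequence = "GL[LI]EVQ"   # '[LI]' marks the ambiguous residue
pattern = re.compile(partial_sequence)

for name, seq in database.items():
    match = pattern.search(seq)
    if match:
        print(f"{name}: partial sequence found at position {match.start() + 1}")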
The ability to rapidly generate amino acid
sequence information via MS/MS experiments,
whether by triple quadrupole instrument, ion trap,
or post-source decay with a MALDI-TOF
instrument, has revolutionized practices associated
with the generation of proteomic sequence
information. Many software packages are able to
use a combination of sequence information and
mass spectral information that includes the
molecular weight of the individual peptide under
investigation. With an accurate value of the parent
ion mass of a peptide in the presence of limited,
partial sequence information, it is often possible to
obtain a database match for an unknown. A
disadvantage of this approach is that it requires
many manual interpretations by the investigator.
This makes the process more difficult to automate,
and can require multiple separate experiments to
obtain the requisite information.
Some software packages employ a combination of predictive MS/MS data and sequence information. An example is the so-called "error tolerant" software that is currently part of the PeptideSearch package available at the EMBL website. This strategy is employed when incomplete sequence information is available from an MS/MS spectrum. In this process the starting mass of the partial sequence, followed by the sequence itself and then the ending mass, are submitted for database searching. The partial sequence must be manually derived from experimental MS/MS data. This limits the utility of the approach as it requires operator input and is not amenable to automation. By incorporating the ion mass at which the partial sequence starts and stops, as well as the mass of the peptide itself, PeptideSearch is often able to generate a strong match from limited data. Successful application of this method requires that potential candidate proteins in the database contain the partial sequence, generate a theoretical proteolytic peptide of the correct mass, and contain the partial sequence positioned appropriately within that theoretical peptide, based on the starting and ending masses of the partial sequence. If a candidate protein meets all of these requirements, then a strong match can be argued for even with a very short experimentally derived sequence.

Searching with Raw MS/MS Data

Current mass spectral technology is capable of generating MS/MS data at an unprecedented rate. Prior to the generation of powerful computer-based database searching strategies, the largest bottleneck in protein identification was the manual interpretation of MS/MS data from which protein sequence information was extracted. Many current computer-based search strategies that employ MS/MS data do not require that the operator provide interpretative information to guide the identification process. Analogous to the approach described for peptide fingerprinting, these programs take the individual protein entries in a database and electronically "digest" them to generate a list of theoretical peptides for each protein. However, in the use of MS/MS data, theoretical peptides are further manipulated to generate a second level of lists that contain theoretical fragment ion masses that would be generated in the MS/MS experiment for each theoretical peptide. Consequently, these programs simply compare the list of experimentally determined fragment ion masses from the MS/MS experiment of the peptide of interest with the theoretical fragment ion masses generated by the computer program. Again, as with the peptide fingerprint strategy, the investigator inputs a list of masses, and typically has a choice of a number of experimental conditions that can be used to tailor the search. This is a very processor-intensive function and, due to the size of current databases, it is only possible on a routine basis because of the explosive increase in desk-top computing power.

The appearance of data-dependent scanning functions on an increasing number of mass spectrometers has permitted the unattended acquisition of MS/MS data. Another example of a raw MS/MS data searching program that takes particular advantage of this ability is the SEQUEST program. The SEQUEST software program will input the data from a data-dependent LC/MS chromatogram and automatically strip out all of the MS/MS information for each individual peak, and submit it for database searching using the strategy discussed above. The attractiveness of this approach is that each peak is treated as a separate data file. This makes it especially useful for the online separation and identification of the individual components present in a protein mixture. No user interpretation of MS/MS spectra is involved.
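The core comparison performed by raw MS/MS search programs can be reduced to counting experimental fragment masses that fall within a tolerance of the theoretical b- and y-ions of each candidate peptide. The sketch below ranks a few candidate peptides in this way; the candidates, peak list, tolerance and simple matched-peak score are illustrative assumptions, whereas programs such as SEQUEST apply more elaborate correlation scoring.

# Minimal sketch of raw MS/MS searching: score candidate peptides by counting
# experimental fragment masses that match theoretical singly charged b/y ions
# within a tolerance.  Candidate peptides and the peak list are invented.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
           "L": 113.08406, "K": 128.09496, "E": 129.04259, "P": 97.05276,
           "R": 156.10111}
PROTON, WATER = 1.00728, 18.01056

def theoretical_by_ions(peptide):
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(peptide))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(peptide))]
    return b + y

def match_count(peak_list, peptide, tol=0.5):
    ions = theoretical_by_ions(peptide)
    return sum(any(abs(peak - ion) <= tol for ion in ions) for peak in peak_list)

if __name__ == "__main__":
    candidates = ["SAGELVK", "GVLESAK", "PEVLSAK"]        # invented candidates
    ms2_peaks = [147.11, 246.18, 359.27, 458.22, 545.33]   # invented MS/MS peaks
    for pep in sorted(candidates, key=lambda p: match_count(ms2_peaks, p), reverse=True):
        print(pep, "matched fragments:", match_count(ms2_peaks, pep))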
Several different strategies are applicable in the
computer-based searching of sequence databases
for the identification of unknown proteins. The
choice of strategy employed is often dictated by
the format of the data available. However, caution
must be used to ensure that results are valid. This
is aided by careful examination of the individual
MS/MS spectra and establishing that most of the
abundant ions have been used for identification.
The ability to perform these searches using
web-based programs provides direct access to the
rapidly expanding collection of public domain
protein databases that are also on the Internet.
Most of the programs take advantage of HTML's
ability to weave various informational sources
together through hyperlinks. In doing so, when a
strong candidate identification is found, a wealth of
further information pertaining to the specific
protein is only a mouse click away.
Other Methods Used for Proteomics Research
To avoid difficulties that are encountered when
extracting proteins from gel matrices, other
approaches to proteome analysis that dispense
with polyacrylamide gel separations and rely on
multidimensional chromatography have been
developed. One such approach has paired size
exclusion chromatography with HPLC for intact
protein analysis, offering convenient preparative
capabilities by directing column effluent to a
fraction collector [53]. For the purposes of
identification, ESI-MS mass analysis offers an
intact molecular weight that is accurate to
approximately 0.1%. Fractions derived from
ESI-MS analysis can also be analyzed by Edman
degradation to obtain NH2-terminal sequence
information, or by enzymatic digestion followed
by successive stages of MS and MS/MS for
higher-confidence identification and/or
characterization. While these approaches are
promising, the comprehensive chromatographic
method is currently limited by dynamic range
defined as the ability to see low abundance
proteins present at only a few copies per cell in
the presence of abundant proteins present at tens
of thousands or more copies per cell. Moreover,
identification on the basis of molecular weight
alone is risky, particularly in complex systems
presenting many post-translational modifications;
availability of chromatographic retention times does not significantly enhance the confidence of proposed identifications. Finally, membrane proteins and other proteins that are refractory to solubilization may not be recovered well by this methodology, and intact proteins are more difficult to handle than tryptic peptides.

Capillary electrophoresis (CE) is a very high-resolution technique similar to gel electrophoresis, except that the separations take place in a capillary filled with an electrolyte. The term capillary electrophoresis encompasses all of the electrophoretic separation modes that can take place in a capillary. In simplest terms, separations are achieved in CE based upon differences in the electrophoretic mobilities of charged species placed within an electric field. One other phenomenon present in CE that also contributes to the movement of analytes within an applied potential field is the electroosmotic flow. Electroosmotic flow describes the movement of fluid in a capillary under the influence of an applied electric field. This movement is brought about by the ionization of silanol groups on the inner surface of the capillary when it is in contact with the electrolyte. An electrical double layer is formed when hydrated cations from the electrolyte associate with the negatively charged ionized silanol groups. When an electrical potential is applied, the hydrated cations migrate towards the cathode, creating a net flow in the same direction. The velocity of the electroosmotic flow is directly affected by changes in the field strength, the electrolyte pH, and the viscosity of the electrolyte solvent system used. The contribution of electroosmotic flow to the movement of charged species in a capillary is most easily determined experimentally by observing the migration time of a neutral marker species to the detector. In comparison to the laminar flow achieved in liquid chromatography, electroosmotic flow is pluglike in character. As a result, far more theoretical plates can be generated in CE than over a comparable liquid chromatographic column length. Many excellent texts provide in-depth coverage of the theoretical basis of capillary electrophoresis [54, 55].
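The relationships described above can be made concrete with a short numerical sketch; the capillary dimensions, voltage and mobility values used here are illustrative assumptions, not measured data.

```python
# Apparent mobility is the sum of electrophoretic and electroosmotic
# mobilities, so a neutral marker (electrophoretic mobility = 0) reports the
# EOF directly. All numbers below are invented for illustration.

def migration_time(mu_ep, mu_eof, voltage, total_len, effective_len):
    """Migration time (s) to the detector for an analyte in CE.
    mu_ep, mu_eof in m^2/(V*s); lengths in m; voltage in V."""
    field = voltage / total_len               # applied field strength E (V/m)
    velocity = (mu_ep + mu_eof) * field       # apparent velocity (m/s)
    return effective_len / velocity

V, L_TOT, L_EFF = 25_000.0, 0.60, 0.50        # 25 kV, 60 cm capillary, 50 cm to detector
MU_EOF = 5.0e-8                               # assumed electroosmotic mobility

# Neutral marker: carried by EOF alone, so its migration time measures mu_eof.
t_neutral = migration_time(0.0, MU_EOF, V, L_TOT, L_EFF)
# A cation moving with the EOF arrives sooner; an anion arrives later.
t_cation = migration_time(+2.0e-8, MU_EOF, V, L_TOT, L_EFF)
t_anion = migration_time(-2.0e-8, MU_EOF, V, L_TOT, L_EFF)
print(f"{t_neutral:.0f} s (neutral), {t_cation:.0f} s (cation), {t_anion:.0f} s (anion)")
```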
The most widely applied ionization method for
interfacing CE to mass spectrometry for the
analysis of biomolecules is electrospray ionization.
An uninterrupted electrical contact is essential for
both the continued operation of the CE and the
generation of the electrospray when interfacing CE
with ESI-MS. Several interfaces have been
developed to achieve this electrical contact. The
three most widely applicable interfaces are liquid
junction, sheath liquid, and sheathless interfaces.
Although detection limits approaching the attomole (10^-18 mol) range have been reported, CE is generally recognized as having a poor concentration limit of detection (CLOD). To achieve the best resolution and peak shape, it is necessary to inject very small volumes (low nanoliters) of sample, which requires the use of highly concentrated samples. Several groups have developed various pre-concentration techniques in an attempt to overcome this CLOD [56]. All of these techniques involve trapping or pre-concentrating the samples on some type of C18 stationary phase or, in the case of tryptic digest mixtures, a hydrophobic membrane.

Capillary isoelectric focusing (CIEF) has been combined with online electrospray ionization Fourier transform ion cyclotron resonance (ESI-FTICR) mass spectrometry to examine desalted, intact Escherichia coli proteins [57]. The methodology's promise of simultaneous ppm mass measurement accuracy, high sensitivity, and ultra-high MS resolution is attractive for the analysis of intact protein samples. A protein's isoelectric point, in combination with its molecular weight, can provide a useful means for proposing protein identifications. However, the CIEF conditions employed for online MS have so far used native or near-native separation conditions, yielding pIs that are not predictable from sequence alone and that do not necessarily correlate with denatured pIs. MS/MS dissociation techniques compatible with FTICR mass spectrometry, such as infrared multiphoton dissociation (IRMPD) or sustained off-resonance irradiation (SORI), may provide structural information that yields higher-confidence protein identifications.
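A minimal sketch of such a pI/mass filter is shown below, assuming a hypothetical candidate list with precomputed masses and predicted pIs; the entries, tolerances and measured values are illustrative only.

```python
# Filter candidate proteins by agreement with a CIEF-derived pI and an
# FTICR-measured intact mass. Candidate names, masses, pIs and tolerances
# are invented for illustration.

def ppm_error(measured, theoretical):
    return 1e6 * (measured - theoretical) / theoretical

def propose_candidates(candidates, measured_mass, measured_pi,
                       mass_tol_ppm=10.0, pi_tol=0.5):
    """Keep proteins whose theoretical mass and pI both fall inside the
    measurement windows; the wide pI window reflects the caveat that native-
    condition pIs may not match values predicted from sequence."""
    hits = []
    for name, theo_mass, theo_pi in candidates:
        if abs(ppm_error(measured_mass, theo_mass)) <= mass_tol_ppm \
           and abs(measured_pi - theo_pi) <= pi_tol:
            hits.append(name)
    return hits

# Hypothetical candidates: (name, intact mass in Da, predicted pI)
candidates = [("protein_A", 8640.2, 4.1),
              ("protein_B", 11673.4, 4.7),
              ("protein_C", 7271.5, 5.6)]
print(propose_candidates(candidates, measured_mass=11673.3, measured_pi=4.9))
```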
The most developed, automated non-gel methodology for proteome analysis was demonstrated by the analysis of E. coli periplasmic proteins that were partially fractionated using strong anion exchange chromatography [58]. Following separation, each fraction was digested with trypsin and then analyzed using micro-column HPLC ESI-MS/MS. The tandem mass spectra were used to search the E. coli sequence database, from which a total of 80 proteins were identified. The procedure limits the amount of sample handling, and by manipulating the proteins as mixtures, the higher abundance proteins act as carriers for lower abundance proteins, further reducing losses. However, the presence of a single highly abundant protein can potentially suppress the acquisition of tandem mass spectra for lower abundance peptides present in the mixture (a dynamic range limitation). Also, because the procedure samples the available peptides only superficially, it is possible that no MS/MS data are acquired for some proteins present in the sample. For relatively simple mixtures, however, this approach greatly increases the speed and efficiency of analysis.

Relative quantitation can be difficult to achieve with all of the above methods. One trend designed to deliver quantitative information has been to isotopically label samples reflecting different conditions [59]. Based on the isotope ratios of tryptic peptides (or appropriately sized intact proteins when FTMS is employed [60]), one can attempt to determine whether a protein is up-regulated or down-regulated. Clearly, this methodology can only be employed when it is possible to isotopically label the protein starting material.
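The isotope-ratio readout can be sketched as follows; the peptide intensity pairs, the median-based protein ratio and the two-fold threshold are illustrative assumptions rather than a prescribed analysis protocol.

```python
# Each tryptic peptide is observed as a "light" (e.g., control) and "heavy"
# (e.g., treated, isotopically labelled) pair, and the intensity ratio is
# read as relative abundance. All numbers are toy values.

def protein_ratio(peptide_pairs):
    """Median heavy/light ratio over all peptides assigned to one protein."""
    ratios = sorted(heavy / light for light, heavy in peptide_pairs if light > 0)
    mid = len(ratios) // 2
    return ratios[mid] if len(ratios) % 2 else (ratios[mid - 1] + ratios[mid]) / 2

def call_regulation(ratio, threshold=2.0):
    if ratio >= threshold:
        return "up-regulated"
    if ratio <= 1.0 / threshold:
        return "down-regulated"
    return "unchanged"

# (light, heavy) ion intensities for peptides of one hypothetical protein
pairs = [(1.2e5, 3.1e5), (8.0e4, 1.9e5), (2.2e5, 5.0e5)]
r = protein_ratio(pairs)
print(f"heavy/light = {r:.2f} -> {call_regulation(r)}")
```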
A newer approach to delivering quantitative information in all types of samples relies on alkylating cysteines with unique tags (e.g., isotopically encoded) that incorporate biotin functionalities [61]. Cysteine-containing tryptic
peptides can be withdrawn selectively from
mixtures for subsequent LC/MS/MS analysis
yielding both identification and quantitation. The
approach should also reduce limitations on
dynamic range, because only cysteine containing
proteins would be loaded onto the LC column,
albeit at the cost of losing all information about
non-cysteine containing proteins.
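The selective capture step can be mimicked in silico as a simple filter on theoretical tryptic peptides, as in the toy sketch below; the sequences are invented and the digestion rule is the usual trypsin cleavage after K/R except before P.

```python
# Digest each protein in silico and keep only cysteine-containing tryptic
# peptides, i.e. the subset that a biotin/avidin step would retain for
# LC/MS/MS analysis. Sequences are invented for illustration.

def tryptic_peptides(sequence):
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

proteins = {"with_cys": "MACDEFKLCGHIKRSTVWYK", "without_cys": "MADEFKLGHIKSTVWYK"}
for name, seq in proteins.items():
    captured = [p for p in tryptic_peptides(seq) if "C" in p]
    print(name, captured)   # only cysteine-containing peptides survive capture
```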
APPLICATIONS FOR PROTEOMICS

The conceptual framework for applying 2-DE gel based systems as a comparative tool for studying complex protein mixtures antedates the application of proteomic nomenclature to the process by more than 20 years [62]. Unlike mRNA transcript profiling, where DNA oligonucleotide arrays of predetermined composition are deployed to study the transcriptome in a comparative manner, proteomics studies proteins in the context of other proteins in a relatively unbiased manner. Hence, the potential exists to reveal unforeseen interrelationships between proteins as well as to discover novel proteins. Since the majority of disease processes manifest themselves at the protein level, where addressable drug targets are identified and medicinal chemistry efforts are directed, this feature of proteomics should be highly desirable from the perspective of the synthetic organic chemist. The disparate correlation between gene transcription and protein translation is also relevant to proteomic applications. In the relatively few instances where mRNA-to-protein data have been quantitatively analyzed, the correlation between the two has been very poor in eukaryotic cell systems [63, 64]. The lack of a close quantitative relationship between mRNA and protein necessitates that investigators clearly understand the experimental hypothesis being tested and the interpretation of the resultant data. Inferring protein data from mRNA transcript data has the potential to introduce significant error that may lead to inappropriate experimental conclusions.

Proteomics as a Tool to Study Drug Mechanism of Action

A clear application for proteomic technologies exists in the context of obtaining drug target information, either from mechanism of action experiments that reveal drug-affected proteome differences or from compound lead series evaluation. These applications occur at relatively different points in drug discovery and development timelines. Comparative proteome analysis is an effective means of assessing the inventory of proteins in a cell or tissue whose levels are altered by exposure to the drug. The drug-specific protein pattern or signature [65, 66], combined with mass spectrometric protein identification, can be used to determine cellular functions such as cell growth rate, fractional protein synthetic rate and the intermediary metabolic or signal transduction pathways that are affected by the drug. Through comparison of experimentally derived proteomic signatures, similar drug templates can be determined to have distinctly different apparent mechanisms of action. In the early preclinical discovery environment this approach can be effectively applied to both target identification and target evaluation.

Once a proteomic signature of efficacy is established in a cell- or tissue-based screen, proteomic signature analysis can be used to guide compound lead series evaluation. By integrating proteomic signature analysis with pharmacodynamic data pertaining to a specific target, compound lead series can be compared on the basis of their relative ability to hit the target within the framework of the overall proteome. By studying the target protein within the context of many other cellular proteins, subtle differences emerge that give important clues as to a compound's mechanism of action. It is plausible that distinct proteomic signatures would be highly correlated to rule of 5 data [67] or predictive of drug-like and nondrug-like pharmacology [68].
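One simple way to picture such a comparison is to treat each signature as a vector of normalized spot intensities and correlate candidate compounds against a reference signature of efficacy, as in the hedged sketch below; the data, compound names and cut-off are invented.

```python
# Each compound's proteomic signature is a vector of normalized spot
# (protein) abundances; compounds are compared by Pearson correlation against
# a reference efficacy signature. Toy numbers throughout.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

efficacy_signature = [1.0, 0.2, 3.5, 0.8, 2.1]      # reference spot intensities
lead_series = {"compound_1": [1.1, 0.3, 3.2, 0.9, 2.0],
               "compound_2": [0.4, 2.5, 0.7, 1.8, 0.5]}

for name, signature in sorted(lead_series.items()):
    r = pearson(efficacy_signature, signature)
    flag = "signature-like" if r > 0.8 else "divergent"
    print(f"{name}: r = {r:.2f} ({flag})")
```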
Proteomics as a Tool to Characterize Combinatorial Libraries

As the use of combinatorial chemistry library screening [69] becomes more widespread, genomic technology platforms will ideally evolve to achieve a level of throughput that is consistent with the minimum number of independent molecular entities in the libraries. At present, in the case of 2-DE gel based proteomic platforms, this appears to be an unrealistic goal. While proteome data would be desirable to broadly classify the
constituents of a library, the technology platform
has serious drawbacks to throughput. As
discussed in preceding sections of this review,
drawbacks associated with 2-DE gel based
proteomics exist in terms of the inability to
automate key steps in the process, dependence on
highly skilled labor and a propensity to miss large
subsets of proteins such as transmembrane
proteins due to low solubility in conventional 2-
DE gel buffer systems [70]. Virtual screening
methods [71] may be of assistance by reducing the
overwhelming complexity of the task but applying
conventional proteomic 2-DE gel / MS methods to
several thousands or even one thousand
compounds represents a daunting task. At the
present time it may be more realistic to consider a
limited application of proteomic technology to
support SAR optimization efforts. However, mass spectrometry instrumentation does lend itself to automation and may provide a suitable platform
for combinatorial library scale proteomics [72].
Rather than approaching the problem from the
perspective of acquiring as many proteomic
signatures as there are entities or compound
classes in a library it may be more fruitful to allow
the library to drive the proteomics. If the
components of the library were tethered to a solid
phase support, the library could be used to probe
complex protein mixtures and recover bound
proteins in a compound specific manner. This
would necessitate that the small molecular weight molecules be linked to the solid phase support in a fashion that does not interfere with binding to their cognate macromolecules.
Approaches to the problem of identifying specific protein-ligand interactions have been applied to RNA-peptide fusions [73] and mirror-image phage display [74]. However, to date proteomics
has not been systematically applied to any of
these methods.
Proteomics as a Tool to Discover Biomarkers

In a pharmaceutical business climate that has become increasingly intolerant of drug development failures, biomarkers have received considerable attention as tools that can contribute to appropriate decision making [75, 76]. By definition, a biomarker is a characteristic that is measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacological responses to a therapeutic intervention. In the strictest sense, to function as a biomarker, a molecular entity need only be correlated with the disease and, optimally, demonstrate a reproducible pattern of change that follows efficacious drug treatment. Ideally, a well-validated biomarker that is correlated not only with disease-specific pathophysiology but also with efficacious drug treatment can be used as a surrogate marker in place of specific drug-target pharmacodynamic data. Biomarkers are of considerable utility as tools that contribute to rational decision making in determining the first dose regimen at the preclinical-clinical interface. Biomarkers are also useful as risk factor indicators capable of providing predictive evidence that individuals are susceptible to disease [77, 78, 79]. In this context a validated biomarker can be used to direct a prophylactic therapeutic regimen. Additionally, in instances where it is impossible to directly assess that a drug has hit its target in vivo, a biomarker that tracks efficacy is a valuable tool that links biology and chemistry. In proteomic biomarker projects it is often necessary to process hundreds of samples to obtain statistically valid biomarker data, because it is difficult to separate true biomarker relationships from the intrinsic protein variability that is present in the animal or human population being studied. Hence, the low throughput and labor intensity associated with 2-DE gel based proteomics is a significant drawback to large-scale biomarker projects. Despite these drawbacks, the ability of proteomics to provide an unbiased and comprehensive means for studying proteins in the context of other proteins has great potential in biomarker applications that compare the protein constituents of biofluids such as blood plasma, urine, cerebrospinal fluid and synovial fluid [80].
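A minimal sketch of the kind of filtering implied here is shown below, using a plain two-sample t statistic plus a fold-change cut-off over toy spot intensities; real biomarker studies would use far larger sample sets and more rigorous statistics.

```python
# A spot is a biomarker candidate only if its group difference stands out
# from the population's intrinsic variability. Data and thresholds are toys.
from math import sqrt

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def t_statistic(group_a, group_b):
    ma, va = mean_var(group_a)
    mb, vb = mean_var(group_b)
    return (ma - mb) / sqrt(va / len(group_a) + vb / len(group_b))

def candidate_biomarkers(spots, t_cut=3.0, fold_cut=1.5):
    hits = []
    for spot, (healthy, diseased) in spots.items():
        t = t_statistic(diseased, healthy)
        fold = (sum(diseased) / len(diseased)) / (sum(healthy) / len(healthy))
        if abs(t) >= t_cut and (fold >= fold_cut or fold <= 1 / fold_cut):
            hits.append((spot, round(fold, 2), round(t, 1)))
    return hits

# Spot intensities for (healthy, diseased) sample groups -- toy numbers only.
spots = {"spot_101": ([1.0, 1.1, 0.9, 1.2, 1.0], [2.1, 1.9, 2.3, 2.0, 2.2]),
         "spot_102": ([1.0, 1.4, 0.7, 1.2, 0.9], [1.1, 0.8, 1.3, 1.0, 1.2])}
print(candidate_biomarkers(spots))
```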
Proteomics as a Tool to Study Toxicology and Pathology

The utility of a compound as a medicinal agent capable of fulfilling unmet medical
need is established through the analysis of the
interrelationship between its pharmacodynamic,
pharmacokinetic and toxicological properties.
Proteomics has been used to study how the rat
liver proteome changes in response to drug
treatment [81]. By establishing a database that
defines the response of a tissue proteome to
specific drugs, comparative protein pattern
analysis can be used to determine the propensity
for a novel compound to elicit a toxic response and
to match the response to that previously observed
with known agents [82, 83]. If RNA transcript
profile data are also generated [84] the combined
information becomes a powerful toxicogenomics
tool that can provide early knowledge of a
prospective drugs potential to elicit a toxic
response in a small segment of the patient
population. Employing combined genomic technologies in early toxicological profiling yields crucial data that should be integrated into pharmaceutical drug design and lead series optimization strategies.
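One way to picture such pattern matching is as a nearest-neighbour comparison of a novel compound's response signature against stored reference signatures, as in the illustrative sketch below; the signatures, class labels and distance metric are assumptions for illustration.

```python
# A reference database holds proteome response signatures for known
# toxicants; a novel compound's signature is assigned to the closest
# reference class. Toy values throughout.
from math import sqrt

def distance(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_reference(novel, references):
    """Return (class_name, distance) of the nearest stored toxicity signature."""
    return min(((name, distance(novel, sig)) for name, sig in references.items()),
               key=lambda item: item[1])

# Relative abundance changes of marker proteins after treatment (invented).
references = {"reference_toxicity_A": [2.5, 0.4, 1.0, 3.0],
              "reference_toxicity_B": [0.3, 2.8, 2.2, 0.5],
              "no_observed_effect":   [1.0, 1.0, 1.0, 1.0]}
novel_compound = [2.2, 0.5, 1.1, 2.7]
print(closest_reference(novel_compound, references))
```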
CONCLUSION

The advantages that are derived from directly analyzing the proteome and applying the results to drug discovery and development are considerable. Already in widespread use within the pharmaceutical and biotech industries, proteomics has demonstrated its ability to contribute to preclinical drug discovery strategies and clinical drug development programs [85, 86]. Efforts to overcome the acknowledged drawbacks that are associated with existing proteomics technology platforms have led to significant recent advances in sample processing, protein detection, image acquisition and analysis, automation, biometric analysis, bioinformatics and database applications. Applications that encompass pharmacoproteomics [85], whereby recursive proteomic analyses are made to characterize protein markers that are drug dose dependent and correlate with the emergence and severity of toxicity, are being used to predict the response of individuals to treatment. In future applications, there is considerable evidence that proteomic analysis will become increasingly dependent upon utilizing mass spectrometry in concert with solid phase array and lab-on-a-chip technology [87, 88]. Miniaturization to produce nano-fluidic chips capable of performing proteomic tasks is being developed [89]. In summary, as the current generation of 2-DE gel / MS proteomic technology platforms matures, applications for the technology will continue to grow. With the advent of future proteomic technologies that not only alleviate existing shortcomings but also add new dimensions to the technology, expectations are that proteomics will become an increasingly important factor in understanding the functional attributes of the genome.

REFERENCES

[1] Farber, G.K. Pharmacol. Ther. 1999, 84, 327.
[2] Baldi, P.; Brunak, S. Bioinformatics: The Machine Learning Approach, The MIT Press, Cambridge, Massachusetts, 1998.
[3] Wilkins, M.R.; Sanchez, J.-C.; Gooley, A.A.; Appel, R.D.; Humphery-Smith, I.; Hochstrasser, D.F.; Williams, K.L. Biotechnol. Genet. Eng. Rev. 1996, 13, 19.
[4] Wold, F. Annu. Rev. Biochem. 1981, 50, 783.
[5] Arnott, D.; O'Connell, K.L.; King, K.L.; Stults, J.T. Anal. Biochem. 1998, 258, 1.
[6] Shevchenko, A.; Jensen, O.N.; Podtelejnikov, A.V.; Sagliocco, F.; Wilm, M.; Vorm, O.; Mortensen, P.; Shevchenko, A.; Boucherie, H.; Mann, M. Proc. Natl. Acad. Sci. U. S. A. 1996, 93, 4440.
[7] Roepstorff, P. Curr. Opin. Biotechnol. 1997, 8, 6.
[8] Wilkins, M.R.; Williams, K.L.; Appel, R.D.; Hochstrasser, D.F. Proteome Research: New Frontiers in Functional Genomics. Springer Verlag: Berlin, 1997.
[9] Celis, J.E.; Oestergaard, M.; Rasmussen, H.H.; Gromov, P.; Gromova, I.; Varmark, H.; Palsdottir, H.; Magnusson, N.; Andersen, I.; Basse, B.; Lauridsen, J.B.; Ratz, G.; Wolf, H.; Oerntoft, T.F.; Celis, P.; Celis, A. Electrophoresis, 1999, 20, 300.
[10] Dunn, M.J. Biochem. Soc. Trans. 1997, 25, 248.
[11] Humphery-Smith, I.; Cordwell, S.J.; Blackstock, W.P. Electrophoresis, 1997, 18, 1217.
[12] Wasinger, V.C.; Cordwell, S.J.; Cerpa-Poljak, A.; Yan, J.X.; Gooley, A.A.; Wilkins, M.R.; Duncan, M.W.; Harris, R.; Williams, K.L.; Humphery-Smith, I. Electrophoresis, 1995, 16, 1090.
[13] O'Farrell, P.H. J. Biol. Chem. 1975, 250, 4007.
[14] Ingram, L.; Tombs, M.P.; Hurst, A. Anal. Biochem. 1967, 20, 24.
[15] Weber, K.; Osborn, M. J. Biol. Chem. 1969, 244, 4406.
[16] Klose, J.; Kobalz, U. Electrophoresis, 1995, 16, 1034.
[17] Lopez, M.F.; Patton, W.F. Electrophoresis, 1996, 18, 338.
[18] Yamada, Y. J. Biochem. Biophys. Methods, 1993, 8, 175.
[19] Bjellqvist, B.; Ek, K.; Righetti, P.G.; Gianazza, E.; Görg, A.; Postel, W.; Westermeier, R. J. Biochem. Biophys. Methods, 1982, 6, 317.
[20] Bjellqvist, B.; Sanchez, J.C.; Pasquali, C.; Ravier, F.; Paquet, N.; Frutiger, S.; Hughes, G.J.; Hochstrasser, D. Electrophoresis, 1993, 14, 1375.
[21] Görg, A. In Methods in Molecular Biology, 2-D Proteome Analysis Protocols; A. Link, Ed.; Humana Press, Inc.: Totowa, NJ, 1999; Vol. 112, pp. 197-209.
[22] Lopez, M.F.; Patton, W.F. Electrophoresis, 1996, 18, 338.
[23] Rabilloud, T. Electrophoresis, 1996, 17, 813.
[24] Link, A. Methods in Molecular Biology, 2-D Proteome Analysis Protocols, Vol. 112, Humana Press, Inc.: Totowa, NJ, 1999.
[25] Laemmli, U.K. Nature, 1970, 227, 680.
[26] Tanford, C. The Physical Chemistry of Macromolecules in Solution, Wiley, New York, 1961.
[27] Bennett, J.; Scott, K.J. Anal. Biochem. 1971, 43, 173.
[28] Switzer, R.C.; Merril, C.R.; Shifrin, S. Anal. Biochem. 1979, 98, 231.
[29] Wilson, C.M. Methods Enzymol. 1983, 91, 236.
[30] de Marco, M.R.; Smith, J.F.; Smith, R.V. J. Pharm. Sci., 1986, 75, 907.
[31] Rabilloud, T.; Vuillard, L.; Gilly, C.; Lawrence, J.J. Cell. Mol. Biol. 1994, 40, 57.
[32] Steinberg, T.H.; Jones, L.J.; Haugland, R.P.; Singer, V.L. Anal. Biochem. 1996, 239, 223.
[33] Mujumdar, R.B.; Ernst, L.A.; Mujumdar, S.R.; Lewis, C.J.; Waggoner, A.S. Bioconjug. Chem. 1993, 4, 105.
[34] Sargent, P.B. Neuroimage, 1994, 1, 288.
[35] Johnston, R.F.; Pickett, S.C.; Barker, D.L. Electrophoresis, 1990, 11, 355.
[36] Towbin, H.; Staehlin, T.; Gordon, G. Proc. Natl. Acad. Sci. USA, 1979, 76, 4350.
[37] Durrant, I.; Fowler, S. In Protein Blotting: A Practical Approach; Dunbar, B.S., Ed.; IRL Press, New York, New York, 1994, pp. 141-152.
[38] Watkins, C.; Sadun, A.; Marenka, S. Modern Image Processing: Warping, Morphing and Classical Techniques, Academic Press, New York, New York, 1993.
[39] Sutherland, J. In Advances in Electrophoresis; Chrambach, A.; Dunn, M.; Radola, B., Eds.; VCH, New York, New York, 1993, pp. 1-42.
[40] Garrels, J.I.; Farrar, J.T.; Burwell, C.B., IV. In Two-Dimensional Gel Electrophoresis of Proteins: Methods and Applications; Celis, J.E.; Bravo, R., Eds.; Academic Press, Inc., New York, New York, 1984, pp. 37-91.
[41] Kraemer, E.T.; Ferrin, T.E. Bioinformatics, 1998, 14, 764.
[42] Tufte, E.R. The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, 1983.
[43] Gooley, A.A.; Ou, K.; Russell, J.; Wilkins, M.R.; Sanchez, J.C.; Hochstrasser, D.F.; Williams, K.L. Electrophoresis, 1997, 18, 1068.
[44] Wilkins, M.R.; Ou, K.; Appel, R.D.; Sanchez, J.-C.; Yan, J.X.; Golaz, O.; Farnsworth, V.; Cartier, P.; Hochstrasser, D.F.; Williams, K.L.; Gooley, A.A. Biochem. Biophys. Res. Commun., 1996, 221, 609.
[45] Karas, M.; Bahr, U.; Ingendoh, A.; Nordhoff, E.; Stahl, B.; Strupat, K.; Hillenkamp, F. Anal. Chim. Acta, 1990, 241, 175.
[46] Cotter, R.J. Anal. Chem. 1999, 71, 445A.
[47] James, P. Biochem. Biophys. Res. Commun., 1997, 231, 1.
[48] Yates, J.R., III. J. Mass Spectrom. 1998, 33, 1.
[49] Jonscher, K.R.; Yates, J.R., III. Anal. Biochem., 1997, 244, 1.
[50] Dongre, A.R.; Eng, J.K.; Yates, J.R., III. Trends Biotechnol., 1997, 15, 418.
[51] McLuckey, S.A.; Van Berkel, G.J.; Goeringer, D.E.; Glish, G.L. Anal. Chem., 1994, 66, 737A.
[52] Clauser, K.R.; Baker, P.; Burlingame, A.L. Anal. Chem., 1999, 71, 2871.
[53] Opiteck, G.J.; Ramirez, S.M.; Jorgenson, J.W.; Moseley, M.A., III. Anal. Biochem., 1998, 258, 349.
[54] Camilleri, P. Capillary Electrophoresis: Theory and Practice, Second Edition, CRC Press: Boca Raton, FL, 1998.
[55] Ding, J.; Vouros, P. Anal. Chem., 1999, 71, 378A.
[56] Yang, Q.; Tomlinson, A.J.; Naylor, S. Anal. Chem., 1999, 71, 183A.
[57] Yang, L.; Lee, C.S.; Hofstadler, S.A.; Pasa-Tolic, L.; Smith, R.D. Anal. Chem., 1998, 70, 3235.
[58] Link, A.J.; Carmack, E.; Yates, J.R., III. Int. J. Mass Spectrom. Ion Proc. 1997, 160, 303.
[59] Oda, Y.; Huang, K.; Cross, F.R.; Cowburn, D.; Chait, B.T. Proc. Natl. Acad. Sci. USA, 1999, 96, 6591.
[60] Pasa-Tolic, L.; Jensen, P.K.; Anderson, G.A.; Lipton, M.S.; Peden, K.K.; Martinovic, S.; Tolic, N.; Bruce, J.E.; Smith, R.D. J. Am. Chem. Soc. 1999, 121, 7949.
[61] Gygi, S.P.; Rist, B.; Gerber, S.A.; Turecek, F.; Gelb, M.H.; Aebersold, R. Nature Biotechnol., 1999, 17, 994.
[62] Anderson, N.L.; Anderson, N.G. Electrophoresis, 1998, 19, 1853.
[63] Anderson, L.; Seilhamer, J. Electrophoresis, 1997, 18, 533.
[64] Gygi, S.P.; Rochon, Y.; Franza, B.R.; Aebersold, R. Mol. Cell. Biol., 1999, 19, 1720.
[65] Vohradsky, J.; Li, X.-M.; Thompson, C.J. Electrophoresis, 1997, 18, 1418.
[66] VanBogelen, R.A.; Schiller, E.E.; Thomas, J.D.; Neidhardt, F.C. Electrophoresis, 1999, 20, 2149.
[67] Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Adv. Drug Del. Rev., 1997, 23, 3.
[68] Ajay; Walters, P.; Murcko, M.A. J. Med. Chem., 1998, 41, 3314.
[69] Ajay; Bemis, G.W.; Murcko, M.A. J. Med. Chem., 1999, 42, 4942.
[70] Molloy, M.P. Anal. Biochem. 2000, 280, 1.
[71] Walters, W.P.; Stahl, M.T.; Murcko, M.A. Drug Discov. Today, 1998, 3, 160.
[72] Loo, J.A. Euro. Mass Spectrom., 1997, 3, 93.
[73] Roberts, R.W.; Szostak, J.W. Proc. Natl. Acad. Sci. USA, 1997, 94, 12297.
[74] Schumacher, T.N.M.; Mayr, L.M.; Minor, D.L., Jr.; Milhollen, M.A.; Burgess, M.W.; Kim, P.S. Science, 1996, 271, 1854.
[75] Rolan, P. Br. J. Clin. Pharmacol., 1997, 44, 219.
[76] Fleming, T.R.; DeMets, D.L. Ann. Intern. Med., 1996, 125, 605.
[77] Tockman, M.S.; Gupta, P.K.; Pressman, N.J.; Mulshine, J.L. Cancer Res., 1992, 52, 2711s.
[78] Myers, R.B.; Grizzle, W.E. Biotech. Histochem. 1997, 72, 86.
[79] Black, S.E. Neurology, 1999, 52, 1533.
[80] Doherty, N.S.; Littman, B.H.; Reilly, K.; Swindell, A.C.; Buss, J.; Anderson, N.L. Electrophoresis, 1998, 19, 355.
[81] Anderson, N.L.; Esquer-Blasco, R.; Hoffman, J.P.; Anderson, N.G. Electrophoresis, 1991, 12, 907.
[82] Aicher, L.; Wahl, D.; Arce, A.; Grenet, O.; Steiner, S. Electrophoresis, 1998, 19, 1998.
[83] Steiner, S.; Anderson, N.L. Toxicol. Lett., 2000, 112-113, 467.
[84] Nuwaysir, E.F.; Bittner, M.; Trent, J.; Barrett, J.C.; Afshari, C.A. Mol. Carcinogen., 1999, 24, 153.
[85] Page, M.J.; Amess, B.; Rohlff, C.; Stubberfield, C.; Parekh, R. Drug Discov. Today, 1999, 4, 55.
[86] Wang, J.H.; Hewick, R.M. Drug Discov. Today, 1999, 4, 129.
[87] Albala, J.; Humphery-Smith, I. Current Opin. Molec. Therapeut. 1999, 1, 680.
[88] Figeys, D. Current Opin. Molec. Therapeut. 1999, 1, 685.
[89] Ramsey, R.S.; Ramsey, J.M. Anal. Chem., 1997, 69, 3153.