
Cellular and Molecular Biology™ 47 (8), 1295-1299

Printed in France.

0145-5680/01
2001 Cell. mol. Biol.™

FUNCTIONAL PROTEOMICS
TO EXPLOIT GENOME SEQUENCES
A. Donny STROSBERG
Hybrigenics SA, 3-5 Impasse Reille, 75014 Paris, France
Fax: +33 (0)1 58 10 38 40; E-mail: adstrosberg@hybrigenics.fr
Received June 12, 2001; Accepted June 29, 2001

A. Donny STROSBERG obtained his Doctorate in Chemistry from the Free University of
Brussels, Belgium, before becoming Instructor in Medicine at Massachusetts General Hospital and
Harvard Medical School in Boston, USA. Upon his return to Europe he became a Professor, first at
the Free University of Brussels (Belgium) and then at the University of Paris VII, France, where
he was until recently also Director of the B2M Graduate School. At the Cochin Institute for
Molecular Genetics (Paris) he served as Director of the CNRS Unit of Molecular
ImmunoPharmacology. His research interests focus on G-protein coupled receptors, including the
β-adrenoreceptors, the muscarinic acetylcholine receptors and the angiotensin II AT2 receptor. His
group discovered the β3-adrenoreceptor. He has published over 400 articles in international
peer-reviewed scientific journals, has edited half a dozen books and co-authored many
others. He has filed close to thirty patent families, many of which have been issued in the USA and
in Europe and licensed to pharmaceutical and biotechnology companies. Prof. A. Donny
Strosberg was one of the pioneers in creating biotech companies based on research, scientific skills
and expertise. He was the founder and/or co-founder of a number of companies: Chemunex SA
(microbial analysis), Vetigen SARL (allergy testing in pets and horses), Neurotech SA (gene therapy
for the eye) and Hybrigenics SA (functional proteomics). These privately owned companies are
financed by institutional grants and international venture capital companies.

Abstract - The sequencing of various genomes has inaugurated a new stage in the understanding of normal and pathological cell
function through the analysis of the role of proteins. It is proteins, after all, that intervene in the different molecular mechanisms of life,
during growth, reproduction, and in the interaction between cells; studying them thus makes it possible to describe the biology of integrated
systems. In this article, we briefly describe the various stages in the progression of our knowledge, from the genome to the
"functional" proteome. Emphasis is placed on a global approach to the protein-protein interactions used to describe the cellular
"interactome".
Key words: Proteome, cell pathology, genome

HUMAN GENOME SEQUENCING:
JUST THE BEGINNING
The sequencing of the human genome is now entering
its final phase. The main results were published
simultaneously in two issues of Nature and Science (6,7)
during the week of February 16, 2001.
After so many efforts, researchers can now "read" more
precisely the sequence of all the elements which
constitute our genes, i.e. our molecular memory. In
addition, scientists will also be able to read almost
everything that surrounds these genes, i.e. what we today
refer to as the "non-coding" parts since in most cases we
still do not know what purpose they serve. These parts
represent more than 97% of the entire human genome.
For many people, this decisive step forward in the
study of our genes represents a remarkable achievement.
This is certainly true in technical terms. Unfortunately,
reading a sequence does not necessarily guarantee that we
will understand its meaning. For example, we can all
decipher the letters which compose the words of any
language written in the Latin alphabet, and we can guess
the meaning of certain words with origins similar to those
of our own tongue. However, a text written in Turkish, for
example, is generally incomprehensible to a non-speaker, despite the
familiarity of the individual letters. Today the same is true
for the human genome sequence.
Aside from the fact that we have only just begun to
distinguish the gene sequences from the non-coding parts,
we do not have, for many of these genes, any idea of their
role in the cell.
This explains the excitement now running through the
scientific and industrial community. The massive efforts
devoted to human genome sequencing must, for their
ultimate justification, be followed by a decoding of its
functions. What was once an end has now become a
beginning!
DECODING THE HUMAN GENOME:
IDENTIFYING THE GENES
Like the discovery of Egyptian hieroglyphics, the
identification of gene sequences has stimulated the
imagination of researchers. Numerous techniques have
emerged to determine the position and organization of
genes in the sequence. Techniques known as "automatic
annotations" have appeared to facilitate gene
identification. Though these techniques are able to detect
the nucleotide sequences which mark the beginning and
the end of certain genes, they are incapable of identifying
their functions. To do so, it is necessary to study the
corresponding proteins. This is the discipline defined as
proteomics.
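
As a rough illustration of what detecting "the beginning and the end" of a gene can mean computationally, the sketch below scans a DNA string for open reading frames: a start codon followed, in the same reading frame, by a stop codon. It is a deliberately simplified assumption, not the annotation method used for the human genome: it looks at the forward strand only and ignores introns, promoters and splice sites. The function name find_orfs and the min_codons parameter are hypothetical.

```python
# Simplified open-reading-frame scan (forward strand only).
# Assumption: a "gene" is approximated as ATG ... stop codon in one frame.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of open reading frames.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons from the start codon up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CCATGAAATTTGGGTAACC"
orfs = find_orfs(example)
```

Such a scan reports only where a candidate coding region starts and stops; as the text notes, it says nothing about what the encoded protein does.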
PROTEOMICS
One gene: several proteins
A certain amount of success has been achieved in
understanding the function of certain genes, often on the
basis of structural homology with well characterized
genes whose protein products are already well known.
Nevertheless, numerous problems have emerged. If
the gene occupies a central position in the transmission of
the functions from generation to generation, it is mainly
because it "codes" for the structure of molecules much
more directly involved in cell function, i.e. proteins. It is
indeed these proteins which are responsible for most of the
functions that characterize life: movement, respiration,
growth, etc.
But the gene/protein correspondence is not one-to-one:
indeed, the sequence of a single gene may contain
information for the structure of several different forms of
a protein, varying in length and in function. This is notably
due to the fact that a gene may be composed of several
coding segments called "exons", separated by non-coding
segments called "introns".
The order of assembly and even the utilization of these
diverse exons may change according to the state of cell
differentiation. It has been known for a long time that
membrane immunoglobulins are distinguished from
secreted forms by a hydrophobic segment that allows
attachment to the plasma membrane; this is replaced by a
hydrophilic segment in secreted immunoglobulins.
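
The immunoglobulin example can be pictured as a choice between alternative terminal exons. The sketch below is a toy model under stated assumptions: the exon names and the short sequences are invented placeholders, not real immunoglobulin sequences, and the helper splice is hypothetical. It shows only that one set of exons can yield two distinct transcripts.

```python
# Toy model of alternative exon usage; all sequences are placeholders.
EXONS = {
    "V": "ATGGCC",              # stands in for the variable region
    "C": "GGTACC",              # stands in for the constant region
    "secreted_tail": "GAAGAT",  # stands in for the hydrophilic segment
    "membrane_tail": "CTGGTG",  # stands in for the hydrophobic anchor
}


def splice(exon_names, exons=EXONS):
    """Join the named exons, in the given order, into one transcript."""
    return "".join(exons[name] for name in exon_names)


secreted_form = splice(["V", "C", "secreted_tail"])
membrane_form = splice(["V", "C", "membrane_tail"])
```

Both forms share the same leading exons and differ only in the final segment, which is the situation described above for secreted and membrane immunoglobulins.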
Moreover, once translated from a gene, proteins may
undergo a series of modifications liable to change their
behavior through the addition of phosphate groups, lipids,
or sugars, for example, or to affect their folding patterns,
by the formation of disulfide bridges, for example.
At a time when the scientific world is finally
recognizing that the number of genes in the human
genome probably does not exceed 40,000, everyone
readily admits that the total number of proteins will be at
least five times if not ten times larger. Moreover, it is
also accepted that knowing the linear
sequence of protein products will not be sufficient to predict their spatial
structure. Finally, even knowledge of their three-dimensional
structure will not necessarily allow us to
understand their function and role. Therefore, decoding
the function of the human "proteome" will be a much
more complex task than that of the sequencing of the
genome.
Surprisingly, the world is just beginning to prepare
itself for this task. This time, the fastest to react was the
Japanese Ministry of International Trade and Industry (MITI),
which announced last year that from now on, it will
provide three times more funds for proteomics, i.e. the
knowledge of proteins, than for genomics, i.e. genome
analysis. The United States also responded quickly,
creating a national program called "Structural Proteomics"
for determining the numerous crystallographic structures
of proteins. Progressively, various nations are developing
programs on the proteome. Yet we are far from the level
of resources mobilized for human genome sequencing.
The sheer scale and diversity of the "proteomics"
challenge doubtless constitute an obstacle to progress,
over and above the disappointment of discovering that
identification of the human genome sequence, with all the
media attention that it received, will not in fact bear fruit
over the short term.
Expression proteomics
The exact meaning of the word "proteomics" is yet to
be defined. Today the term refers mainly to the study of
the composition or structure of the protein content of cells.
This content is studied with regard to composition by
isolating proteins after cellular lysis and separating them on
a polyacrylamide gel or in a chromatographic column. They
are then analyzed by sequencing the amino acids on a
sequencer or by determining the size of the protein
fragments using a mass spectrometer. The result of these
complex manipulations (increasingly automated) is the
description of a protein profile which, if sufficiently
reproducible, can be associated with a pathological state or
a stage of cell differentiation.
Structural proteomics
Structural proteomics refers to the study of protein
folding, generally analyzed by studying the X-ray
diffraction of protein crystals. Spectacular recent progress
has made it possible to automate the least predictable step,
i.e. crystallization, as well as the analysis procedure. Other
analysis methods, nuclear magnetic resonance (NMR)
for example, complete the array of methods now available
for studying the three-dimensional structure of proteins.
Functional proteomics: what for?
Once the composition and structure of cell proteins are
known, the most difficult task, but also the most
interesting, is to analyze their function. Indeed,
knowledge of protein function should take us towards our
ultimate goal, namely to understand the normal
functioning of the cell and the changes induced by a
pathogenic agent, whether introduced by a natural or a provoked
accident.
Functional proteomics: how?
Unlike sequence analysis, which can be automated
(both at the chemical and at the computing levels), the
study of protein function is very difficult to systematize.
Indeed, the very definition of protein function is
problematic: is a protein defined by its individual
biological activity, such as enzyme activity for example?
Or by its capacity to interact with other substances, such
as antibodies? Or by its role in the cell structure, like the
proteins of the cytoskeleton? Or by its association with
pathological manifestations? There is no single answer to
these questions; all may be valid at the same time or
successively. This means that a whole variety of methods
for studying protein function can be envisaged.
Several "generic" methods have recently appeared for
the simultaneous study of the "functions" of hundreds, if
not thousands, of proteins. These alone are capable of sifting through
the proteomes to find as yet unidentified functions. We
will look briefly at two of these methods, operating at the
cellular level. One concerns systematic inactivation of cell
genes, the other the cartography of protein-protein
interactions.
CELL GENE INACTIVATION
Individual gene inactivation has been practiced for a
long time through site-specific or random mutagenesis.
The selection of cells and especially animals with mutated
genes is nevertheless still a long and arduous task.
Recently, Ross-Macdonald et al. (9) described a study
focusing on one third of the six thousand genes of the
yeast Saccharomyces cerevisiae, which offers prospects
for more systematic and faster progress. A similar study
(2) on a smaller scale, applied to the pathogenic yeast
Candida albicans, has reinforced the idea that if the
methods proposed by these authors were one day applied
to mammalian cells, they would advantageously replace
the current techniques which involve either intracellular
injection of antisense RNA or genetic inactivation.
CARTOGRAPHY OF
PROTEIN-PROTEIN INTERACTIONS
The other approach concerns the large-scale
identification of protein-protein interactions (Fig. 1). A
manual technique based on the manipulation of
transfected yeasts has existed for several years. Invented
by Stanley Fields in New York, the so-called "double
hybrid" (two-hybrid) technique is used to detect protein "preys" which
bind to protein "baits" (3). Unfortunately, the original
technique is complex to apply and generates numerous
false-positive and false-negative responses. A newer
version of this technique, considerably improved,
automated, and robotized, has led to the discovery (5,8) of
the protein partners of nearly 800 proteins of the
bacterium Helicobacter pylori, the most widespread
pathogenic bacterium in humans, responsible for most
stomach ulcers and cancers (Fig. 2A, 2B). Through the
computer-controlled method developed by these
researchers, the domains responsible for interactions have
been identified for nearly all of these protein partners.
These domains may be used as dominant-negative mutant
forms in vivo and as interaction modulators in vitro. For
example, a first domain was used to block flagella
synthesis, making the transfected bacterium practically
incapable of normal motion (1). These domains may also
serve as positive controls to develop high-throughput screens for
the identification of antagonists or agonists. The same
approach has already been used to study the interaction of
proteins of the hepatitis C virus (4). The computerized
representation of these interactions (10) makes it possible to
build a searchable and modifiable database
containing a large amount of information on each of the
studied proteins. The same computer format has been used
to represent the entire set of available data on interactions
between the AIDS virus and human lymphocyte proteins
(databases accessible at www.pimrider.hybrigenics.com).

Fig. 1
Automated method for the cartography of the protein-protein interactions of a cell
A) Double hybrid method in yeast by mating (5)
B) Iterative method for prey selection, elimination of
false positives and reduction of false negatives

Fig. 2
Identification (A) and utilization (B) of the domains intervening in the protein-protein interactions
A) All fragments derived from the same prey and recognized
by the same bait share a minimal domain, the SID (Selected
Interacting Domain), which is in effect their smallest common
denominator. It has a unique sequence and constitutes
the domain that binds the bait [see (1)].
B) The SID mimics the surface reacting
with the prey and can also act
as a dominant-negative mutant
in vivo or as a positive control
in a high-throughput screening system
in vitro. The SID can also represent a first
step in a drug modelling strategy.
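
The idea behind the Selected Interacting Domain (SID) described in the legend of Fig. 2, the region shared by every prey fragment that a given bait recognizes, can be sketched as an interval intersection. This is a minimal illustration under stated assumptions and not the published procedure of references 8 and 10, which is considerably more elaborate. The function name, the coordinate convention and the protein labels are hypothetical.

```python
# Sketch: the SID as the overlap common to all prey fragments.
# Assumption: each fragment is a (start, end) position pair on the prey
# sequence, with `end` exclusive.


def selected_interacting_domain(fragments):
    """Return the (start, end) region common to all fragments, or None."""
    if not fragments:
        return None
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start < end else None


# Hypothetical screening result: bait -> prey -> fragments recovered.
screen = {
    "bait_A": {
        "prey_X": [(10, 120), (40, 150), (35, 110)],
        "prey_Y": [(0, 10), (20, 30)],  # no common region
    },
}

# Interaction map keeping, for each bait/prey pair, the shared domain.
interaction_map = {
    (bait, prey): selected_interacting_domain(frags)
    for bait, preys in screen.items()
    for prey, frags in preys.items()
}
```

A dictionary keyed by (bait, prey) pairs is one simple way to hold such a map in memory; a searchable database of the kind mentioned above would attach further annotation to each entry.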
INTEGRATIVE BIOLOGY OF SYSTEMS
We are thus developing an array of molecular and
genetic methods for the study of integrated systems, i.e.,
the sets of cells that make up organs and individuals.
These techniques have yet to be complemented by others
that are directly applicable to intact cells in their normal
ambient medium.
It is only when we have understood the role of proteins
in the diverse mechanisms which make up the life of the
cell, of organs, and of the individual, that we will be able
to take action to stop the uncontrolled proliferation of
cancerous cells, viral or bacterial invasion, graft rejections,
and so on.
Some will maintain that we discovered antibiotics
without understanding host-pathogen interactions and that
some Amazonian tribes use plant extracts for cures
without understanding the protein structures. Though it is
certain that popular traditions and experimental research
have led to considerable success, it is also true that the
average life expectancy and quality of life of Western
populations with access to medicine based on the most
modern techniques are much greater than those of people
living in developing countries. Moreover, with the
progress of humanity, pathogens resistant to these
methods of treatment are becoming more widespread, and
even in countries with a highly developed health care
system, certain diseases are still "incurable". The time has
come to combat this resistance and to find effective
treatments for these diseases. Functional proteomics will
certainly contribute significantly to this progress.

REFERENCES
1. Colland, F., Rain, J.-C., Gounon, P., Labigne, A., Legrain, P. and De
Reuse, H., Identification of the Helicobacter pylori anti-σ28 factor.
Mol. Microbiol. 2001, 41: 477-487.
2. De Backer, M.D., Nelissen, B., Logghe, M., Viaene, J., Loonen, I.,
Vandoninck, S., de Hoogt, R. and Dewaele, S., An antisense-based
functional genomics approach for identification of genes critical for
growth of Candida albicans. Nature Biotechnol. 2001, 19: 235-241.
3. Fields, S. and Song, O., A novel genetic system to detect
protein-protein interactions. Nature 1989, 340: 245-246.
4. Flajolet, M., Rotondo, G., Daviet, L., Bergametti, F., Inchauspe, G.,
Tiollais, P., Transy, C. and Legrain, P., A genomic approach of the
hepatitis C virus generates a protein interaction map. Gene 2000,
242: 369-379.
5. Legrain, P., Wojcik, J. and Gauthier, J.-M., Protein-protein
interaction maps: a lead towards cellular functions. Trends Genet.
2001, 17: 346-352.
6. The human genome. Nature Genomics Special. vol. 409 (issue
6822): February 15, 2001.
7. The human genome. Science vol. 291 (issue 5507): February 16,
2001.
8. Rain, J.C., Selig, J.L., De Reuse, H., Battaglia, V., Reverdy, C.,
Simon, S., Lenzen, G., Petel, F., Wojcik, J., Schächter, V.,
Chemama, Y., Labigne, A. and Legrain, P., The protein-protein
interaction map of Helicobacter pylori. Nature 2001, 409: 211-215.
9. Ross-Macdonald, P., Coelho, P.S., Roemer, T., Agarwal, S.,
Kumar, A., Jansen, R., Cheung, K.H., Sheehan, A., Symoniatis, D.,
Umansky, L., Heidtman, M., Nelson, F.K., Iwasaki, H., Hager, K.,
Gerstein, M., Miller, P., Roeder, G.S. and Snyder, M., Large-scale analysis of
the yeast genome by transposon tagging and gene disruption.
Nature 1999, 402: 413-418.
10. Wojcik, J. and Schächter, V., Protein-protein interaction map
inference using interacting domain profile pairs. Bioinformatics
2001, 17(Suppl. 1): S296-S305.
