Gene Expression Studies Using Microarrays Principles, Problems, and Prospects

GENE EXPRESSION STUDIES USING
MICROARRAYS: PRINCIPLES, PROBLEMS, AND

PROSPECTS
David Murphy
Advan in Physiol Edu 26:256-270, 2002.

You might find this additional info useful...
This article cites 51 articles, 21 of which can be accessed free at:
http://advan.physiology.org/content/26/4/256.full.html#ref-list-1
This article has been cited by 9 other HighWire hosted articles, the first 5 are:
Global Analysis of Transcriptional Regulation by Poly(ADP-ribose) Polymerase-1 and
Poly(ADP-ribose) Glycohydrolase in MCF-7 Human Breast Cancer Cells
Kristine M. Frizzell, Matthew J. Gamble, Jhoanna G. Berrocal, Tong Zhang, Raga
Krishnakumar, Yana Cen, Anthony A. Sauve and W. Lee Kraus
J. Biol. Chem., December 4, 2009; 284 (49): 33926-33938.
[Abstract] [Full Text] [PDF]
Microarray analysis using disiloxyl 70mer oligonucleotides

Marcus Gry Bjrklund, Christian Natanaelsson, Amelie Eriksson Karlstrm, Yong Hao and
Joakim Lundeberg
Nucl. Acids Res., March , 2008; 36 (4): 1334-1342.
Microarray analysis using disiloxyl 70mer oligonucleotides
Marcus Gry Bjrklund, Christian Natanaelsson, Amelie Eriksson Karlstrm, Yong Hao and
Joakim Lundeberg
Nucl. Acids Res., January 10, 2008; .
[PDF]
Profiling Studies in Ovarian Cancer: A Review
Rudolf S. N. Fehrmann, Xiang-yi Li, Ate G. J. van der Zee, Steven de Jong, Gerard J. te
Meerman, Elisabeth G. E. de Vries and Anne P. G. Crijns
The Oncologist, August , 2007; 12 (8): 960-966.
Updated information and services including high resolution figures, can be found at:
http://advan.physiology.org/content/26/4/256.full.html
Additional material and information about Advances in Physiology Education can be found at:
http://www.the-aps.org/publications/advan
This infomation is current as of October 25, 2011.
Advances in Physiology Education is dedicated to the improvement of teaching and learning physiology, both in specialized
courses and in the broader context of general biology education. It is published four times a year in March, June, September and
December by the American Physiological Society, 9650 Rockville Pike, Bethesda MD 20814-3991. Copyright 2002 by the
American Physiological Society. ISSN: 1043-4046, ESSN: 1522-1229. Visit our website at http://www.the-aps.org/.
Downloaded from advan.physiology.org on October 25, 2011
Transcriptional profiling of VEGF-A and VEGF-C target genes in lymphatic endothelium

reveals endothelial-specific molecule-1 as a novel mediator of lymphangiogenesis
Jay W. Shin, Reto Huggenberger and Michael Detmar
Blood, September 15, 2008; 112 (6): 2318-2326.
A P S
R E F R E S H E R
C O U R S E
R E P O R T
GENE EXPRESSION STUDIES USING MICROARRAYS:

PRINCIPLES, PROBLEMS, AND PROSPECTS
David Murphy
University of Bristol Research Centre for Neuroendocrinology,
Bristol Royal Infirmary, Bristol BS2 8HW, England
ADV PHYSIOL EDUC 26: 256 270, 2002;

10.1152/advan.00043.2002.
Key words: functional genomics; microarray; cDNA; oligonucleotide; bioinformatics
Japan, China, the United Kingdom, and the United

States. Furthermore, public and charitable funding has
ensured that genome data are freely available to all,
enabling information to be put to immediate use to the
maximal possible benefit of all humankind.
The sequencing projects that have elucidated the human genome (http://www.nature.com/cgi-taf/
dynapage.taf?file/nature/journal/v409/n6822/
index.html, http://www.ncbi.nlm.nih.gov/genome/
guide/human/, http://www.sciencemag.org/content/
vol291/issue5507/, and Refs. 8, 52) and will soon
reveal the genetic complement of key model vertebrate organisms such as the mouse (http://www.
ncbi.nlm.nih.gov/genome/guide/mouse/index.html),
rat (http://www.ncbi.nlm.nih.gov/genome/guide/
R_norvegicus.html), puffer fish (http://fugu.hgmp.
mrc.ac.uk/), and zebra fish (http://www.ncbi.nlm.
nih.gov/genome/guide/D_rerio.html) must rank among
the greatest achievements of human civilization. This is
all the more so as these efforts have been truly internationalthe Human Genome Sequencing Consortium includes scientists at 16 institutions in France, Germany,
Despite the magnificent efforts, the exact number of

genes needed to make a human being is still not
known. The computing technologies that recognize
transcription units in raw genomic information are
still being developed. Current estimates range from
28,000 to 120,000 (9, 17, 28, 33, 42, 52). Irrespective
of the exact number, the challenge remains to determine exactly what all of these genes do in terms of the
development and physiological functioning of the organism. This is the task of a newly emerging disciplinefunctional genomics. A crucial aspect of func-
1043 - 4046 / 02 $5.00 COPYRIGHT 2002 THE AMERICAN PHYSIOLOGICAL SOCIETY

VOLUME 26 : NUMBER 4 ADVANCES IN PHYSIOLOGY EDUCATION DECEMBER 2002
256
number of mammalian genomes having been sequenced, an important next

step is to catalog the expression patterns of all transcription units in health
and disease by use of microarrays. Such discovery programs are crucial to our
understanding of the gene networks that control developmental, physiological, and
pathological processes. However, despite the excitement, the full promise of microarray technology has yet to be realized, as the superficial simplicity of the concept
belies considerable problems. Microarray technology is very new; methodologies are
still evolving, common standards have yet to be established, and many problems with
experimental design and variability have still to be fully understood and overcome.
This review will describe the time course of a microarray experimentRNA isolation
from sample, target preparation, hybridization to the microarray probe, data capture,
and bioinformatic analysis. For each stage, the advantages and disadvantages of
competing techniques are compared, and inherent sources of error are identified and
discussed.
A P S
R E F R E S H E R
C O U R S E
R E P O R T
tional genomics is the description of global

expression patterns. In this regard, we need to address two questions:
Where in the organism are all of these genes expressed?
How do the expression patterns of these genes

change as a consequence of physiological cues and
disease?
For many years, molecular biologists tackled the analysis of one or a few genes at a time. Techniques relied
on the use of nucleic acid (in situ hybridization,
Northern blotting, RNase protection, etc.) or antibody (immunocytochemistry, Western blotting, etc.)
probes. Descriptions of gene expression were a haphazard process, largely dependent on the opportunistic availability of these probes and the intuition of
researchers. The availability of genome information
and the parallel development of microarray technology have provided the means to perform global analyses of the expression of thousands of genes in a
single assay (15, 27). The results provide an assessment of the expression levels of the genes included
on the microarray in a particular cell, tissue or organ.
The basic concept of microarray analysis is simple
(Fig. 1). RNA is harvested from a cell type or tissue of
interest and labeled to generate the targetthe free
nucleic acid sample whose identity or abundance is
being detected. This is hybridized to the tethered
probe DNA sequences corresponding to specific
genes that have been affixed, in a known configuration, onto a solid matrix. Hybridization, based on
Watson and Crick base pairing, between probe and
target provides a quantitative measure of the abundance of a particular sequence in the target population. This information is captured digitally and subjected to various analyses to extract biological
information. Comparison of hybridization patterns enables the identification of mRNAs that differ in abundance in two or more target samples (Fig. 1). Thus
microarrays provide a powerful tool with which to
screen biological specimens for alterations in the expression of mRNAs that accompany, and may regulate
physiological and pathological change.
FIG. 1.
Course of a typical microarray experiment, starting
from the collection of biological samples (in this case,
brain) from which the targets are derived, through the
preparation and hybridization of the target, to the
microarray probe and the elaboration and analysis of
gene expression profiling data (bioinformatics), to
the biological conclusions that one might wish to derive. Each stage is a source of variability, namely tissue
source, tissue harvesting, microarray platforms, microarray variability, sample labeling, data analysis, and
human error! The identification and control of these
sources of error are a crucial aspect of present microarray development. In this particular case, the samples
give very similar hybridization patterns but reveal a
number of differentially expressed genes (arrows).
Before we consider in detail the many reasons for not

carrying out a microarray study, we should iterate the
257
A P S
R E F R E S H E R
C O U R S E
reasons why this technique has elicited so much excitement in recent years. Microarrays are a novel way
of getting fast answers to obvious questions. Although
microarray discovery research is not in itself hypothesis driven, rigorous but unbiased data gathering
can narrow the search for genes that represent the
best targets for future hypothesis-driven studies. Thus
microarray techniques are finding utility in
gene discoverythe global description of genes
potentially involved in developmental, physiological, and pathological processes (7, 22)
gene regulationthe description of regulatory networks, based on the assumption that genes regulated in parallel share common control mechanisms
(2, 3, 53, 56)
diagnosisidentification of patterns of gene expression that define disease states and that may represent prognostic indicators (49)
drug discovery and toxicology (8, 25).
pool multiple samples
perform multiple replicates.
What Is Normal?
Before we can interpret microarray data related to
physiological and pathological transitions, we need to
answer the question, What is normal? For any system under study, we need to understand the magnitude and diversity of gene expression in the unperturbed state. Different strains of the same species
have been shown to exhibit marked variability in the
overall pattern of gene expression in the brain. Sandberg et al. (44) compared the expression profiles of
more than 10,000 genes in six brain regions of two
inbred mouse lines, C57Bl/6 and 129SvEv. Around 1%
of genes were identified as being differentially expressed in at least one brain region. This is not unexpected, and indeed, these gene expression differences may represent a functional substrate for
physiological and behavioral differences between
strains. Similarly, it has been shown that gene expression patterns differ greatly between genetically identical individuals of the same age, sex, and physiological condition. Comparison of the expression profiles
of 5,406 genes in a number of tissues of the C57Bl/6
inbred mouse strain revealed considerable variability,
ranging from 0.8% in the liver to 3.3% in the kidney
(40). It is accepted that, for any particular parameter,
physiological normalcy is not a strict value but is
rather a range of values presented by healthy individuals. It is perhaps no surprise that normal gene
expression displays similar variability. Such variability
between individuals emphasizes the need to pool
samples, and to perform replicates.
However, the superficial simplicity of the microarray

concept belies considerable problems, and the full
promise of the technology has yet to be realized.
Microarray technology is very new; methodologies
are still evolving, common standards have yet to be
established, and many problems with experimental
design and variability have still to be fully understood
and overcome. Each stage in the course of a microarray experiment is thus a source of error (Fig. 1).
BIOLOGICAL SAMPLES
Microarray analysis has revealed a large degree of
variability in gene expression patterns in particular
tissues, even within supposedly genetically identical
individuals. In addition, differences can be expected
as a consequence of time of day, age, and physiological state (including sex and stage of the estrous cycle)
and as a response to disease. Furthermore, variability
can be introduced into a microarray experiment by
way of sample heterogeneity and sampling error. The
only way to overcome these difficulties is to
Sample Collection
Ideally, a tissue sample from which target RNA is
isolated must be pure. Although this might be possible for cultured cells, it is impossible for samples
obtained from animals or from patient material due to
the cellular heterogeneity of tissues. This is a problem
compounded by human error and the variability inherent in tissue collection.
Whereas microarrays can be seen as a new and precise way to define cellular phenotype based on pat-
standardize experimental conditions as far as possible
258
R E P O R T
A P S
R E F R E S H E R
C O U R S E
R E P O R T
ARRAY PLATFROMS
terns of gene expression, it follows that the ideal

sampling methodology should enable the analysis of
the cellular contents of single cells or of groups of
selected cells (12, 14). The mRNA content of an
individual cell can be isolated using a patch pipette to
penetrate the cell membrane. The cellular contents
are then aspirated into the pipette and transferred to
a microcentrifuge tube for RNA isolation and target
preparation. Samples can be obtained from randomly
selected cells, from electrophysiologically characterized cells or from defined cell types in transgenic
mice that specifically express reporters, such as enhanced green fluorescent protein (55). This level of
analysis has now been taken to the subcellular level.
Previous studies have revealed that a complex subset
of mRNAs is present within the dendritic subdomain
of neurons, where their local translation may contribute to synaptic plasticity (35). Microarray analysis has
been used to screen for genes that are enriched in
neuronal dendrites in response to various stimuli. A
patch pipette was used to harvest individual dendrites
and cell soma from primary rat hippocampal neuronal
cultures treated with (RS)-3,5-dihydroxyphenylglycine, a metabotrobic glutamate receptor agonist that
modulates protein translation in dendrites. Targets
derived from these samples were used to analyze
microarrays, which revealed that a few mRNAs
changed in abundance as a result of stimulation (12).
There are two microarray platforms in common use

cDNA microarrays, which utilize cloned probe molecules corresponding to characterized expressed sequences, and oligonucleotide microarrays, made of
synthetic probe sequences based on database information (20).
cDNA Microarrays
they are usually hybridized to more sensitive radioactive tagets
no specialized equipment is required
they can be reused up to four or five times.
Glass, too, has its advantages:
Another method for rapidly procuring pure, targeted,

single or multiple cells from specific microscopic regions of tissue sections is laser capture microdissection (LCM) (6). A tissue sample is covered with a
transparent plastic film and observed under the light
microscope. The cells of interest having been identified, a focused infrared laser beam is activated. The
heat of the beam melts the film, causing it to adhere
to the targeted cells, which then can be lifted away,
leaving the rest of the tissue section intact. Bonaventure et al. (4) have elegantly validated this approach
by cataloguing gene expression profiles of seven rat
brain nuclei or subnuclei, thus identifying putative
specific markers of potential functional importance.
Arcturus sells the industry standard LCM machine
(http://www.arctur.com/about/technology/technology_lcm.htm).
DNA samples can be covalently attached to a glass

matrix
glass is a durable material that sustains high temperatures and washes of high ionic strength
because glass is nonporous, the hybridization volume can be kept to a minimum, enhancing the
kinetics of base pairing between probe and target
glass has a low inherent fluorescence, so does not

contribute greatly to background noise.
Two different targets derived from two samples to be

compared can be labeled with different fluors and
simultaneously hybridized with a glass microarray in a
single competitive reaction. In contrast, nylon microarrays are generally probed in serial or parallel
hybridization reactions.
259
A cDNA microarray comprises a collection of gene

sequences [usually pure PCR products ranging in size
from 100 2,000 bp derived from cDNA and expressed sequence tag (EST) clones] that are applied
individually to precise locations on a solid matrix,
usually nylon or glass. Nylon membranes have a number of advantages:
A P S
R E F R E S H E R
C O U R S E
Fabrication of cDNA microarrays. For both glass

and membrane matrixes, each microarray element is
generated by the deposition of a few nanoliters of
purified PCR product at a concentration of 100 500
g/ml. Spots are typically 100 m in diameter and can
be deposited at a density of up to 20,000 features/
cm2. Spotting is achieved by contact (mechanical microspotting) or noncontact (ink jetting) methods.
R E P O R T
microarray elements. Clearly, the use of a radioactive

target requires that comparison of different targets
must be carried out using serial hybridizations to the
same microarray or by parallel analyses using separate
microarrays.
MECHANICAL MICROSPOTTING. A DNA sample is loaded into

a spotting pin by capillary action, and a small volume
is transferred to a solid surface by physical contact
between the pin and the solid substrate. After the first
spotting cycle, the pin is washed, and a second sample is loaded and deposited to an adjacent address.
Robotic control systems and multiplexed print heads
allow automated microarray fabrication (29, 46).
INK JETTING.
A DNA sample is loaded into a miniature

nozzle equipped with a piezoelectric fitting (or other
form of propulsion), which is used to expel a precise
amount of liquid from the jet onto the substrate. After
the first jetting step, the nozzle is washed and a
second sample is loaded and deposited to an adjacent
address. A repeat series of cycles with multiple jets
enables rapid microarray production (47).
RNA purity is a critical factor in hybridization performance, particularly when fluorescence is used, as
cellular protein, lipid, and carbohydrate can mediate
significant nonspecific binding of labeled cDNAs to
matrix surfaces.
A number of contact and noncontact robotic arraying

systems are commercially available (http://ihome.cuhk.
edu.hk/b400559/array.html#Arrayer/%20Spotter and
http://www.lab-on-a-chip.com/files/maauto.pdf). Prefabricated cDNA chips are available from a number of
suppliers (18, 20, and http://ihome.cuhk.edu.hk/
b400559/array.html#Microarray%20slide and http:
//www.lab-on-a-chip.com/suppliers/inform.html).
A limitation of cDNA microarray technology is the

large amount of RNA required to produce an adequate
signal over noise (11). This is a particular issue with
low-abundance transcripts. Fluorescence detection
requires 10 g of total RNA (equivalent to a million
cells), whereas radioactive detection enables detection with as little as 0.1 g of starting total RNA
(10,000 cells). However, as described above, the ultimate aim is to carry our expression profiling with as
few cells as possible, preferably single cells. For targets to be derived from such samples, some form of
amplification process needs to be incorporated into
the procedure. PCR (43) is a highly efficient method
for exponentially amplifying a population of singlestranded cDNA. However, the nonlinear amplification
results in a target in which sequence representation is
skewed compared with the original mRNA pool. In
Target labeling and hybridization of cDNA microarrays. Sample RNA is converted to target by use
of the enzyme reverse transcriptase, an oncoretroviral
enzyme that uses RNA as a template for the synthesis
of a single-stranded cDNA. Reverse transcriptase requires a short primer to initiate cDNA synthesis, and
this is usually provided by oligo(dT), which anneals to
the poly(A) tail found at the 3 end of the vast majority
of mammalian mRNAs. The label incorporated into
the cDNA can be either radioactive or fluorescent.
Radioactive target is generated by incorporation of
[33P]dCTP, a relatively weak emitter that reduces interference between the closely physically juxtaposed
260
An advantage of fluorescence detection is that competitive hybridization to the same microarray (usually
glass, see above) can be used to compare targets
derived from different samples. The relative hybridization of the targets labeled with different fluors to
the same probe can be readily quantified. The fluorescent labels Cy3-dUTP and Cy5-dUTP are frequently
paired, as they have high incorporation efficiencies
with reverse transcriptase and good photostability
and yield and are widely separated in their excitation
and emission spectra, allowing highly discriminating
optical filtration. However, it should be noted that the
different fluors produce targets with different characteristics. Thus microarray experiments must either be
repeated with the fluors swapped around or be performed with the same fluor in parallel on different
probes.
A P S
R E F R E S H E R
C O U R S E
R E P O R T
genes. However, closely related genes can often be

distinguished by using probes corresponding to the
3-untranslated region of an mRNA, as these regions
often display gene-specific sequence diversity.
contrast, the amplified antisense RNA (aRNA) procedure (13, 50) is a linear procedure that produces a
target more representative of the initial mRNA population. An mRNA sample is converted into cDNA using an oligo(dT) primer that contains a bacteriophage
T7 RNA polymerase promoter site. After the cDNA is
rendered double stranded, T7 RNA polymerase is
used to transcribe antisense RNA copies. The procedure can produce up to 106-fold amplification. Both
Ambion (http://www.ambion.com/catalog/CatNum.
php?1750) and Arcturus (http://www.arctur.com/
products/riboamp_main.htm) sell linear amplification
kits.
The state of the double-stranded DNA on the microarray is ill defined and may well have constraining
contacts with the matrix and inter- and intrastrand
cross-links that will affect hybridization.
Each sample must be synthesized, purified, and stored
before microarray fabrication.
Advantages of cDNA microarrays. cDNA microarrays are a relatively accessible and cost-effective technology. Hybridization does not need specialized
equipment, and data capture can be carried out using
equipment that is very often already available in the
laboratory. Prefabricated microarrays are relatively
cheap, and custom chip manufacture is within the
reach of many researchers, affording flexibility of design as necessitated by the scientific goals of the
experiment.
Oligonucleotide Microarrays
Oligonucleotide microarrays are made by synthesizing single-stranded probes on the basis of sequence
information in databases. A number of technologies
are available (http://www.lab-on-a-chip.com/suppliers/inform.html). For example, oligonucleotide synthesis has been combined with ink jet spotting. Motorola Life Sciences (http://www.motorola.com/
lifesciences) synthesize 30-mer oligonucleotides
offline and spots them onto slides coated with a
three-dimensional, branched polymeric substrate gel
surface (Motorola Life Sciences recently sold their
microarray business to Amersham Biosciences; http://
www.amersham.com). Aligent Technologies (http://
www.chem.agilent.com/Scripts/IDS.asp?lPage1624),
in partnership with Rosetta Inpharmatics (http://
www.rii.com), has described oligonucleotide synthesis
The long target sequences (2 kbp) increases detection sensitivity.

Disadvantages of cDNA microarrays. Sequence
homologies between clones representing different
closely related members of the same gene family may
result in a failure to specifically detect individual
261
Microarray fabrication is dependent on the curation

of extensive clone sets. Even the best maintained sets
are prone to mix-ups, with clones not containing the
sequence that they are supposed to. Halgren et al.
(24) sequenced 1,189 IMAGE consortium cDNAs
(http://image.llnl.gov/) obtained commercially from
Research Genetics (http://www.resgen.com). Only
62.2% were uncontaminated and contained cDNA inserts that had significant sequence identity to published data for the ordered clones; 7.1% contained
both a correct and an incorrect plasmid; and 5.9%
contained multiple, distinct, incorrect plasmids, indicating the likelihood of multiple contaminating
events. Through this kind of analysis will emerge
systems that will enable the better curation of clone
stocks.
Data capture. Once targets have been hybridized to

probes and the microarray has been washed to remove as much unbound and nonspecifically bound
target as possible, the array must be scanned to determine how much target is bound to each probe
spot. Data are captured from microarrays hybridized
with 33P-labeled target by means of a phosphorimager
system (e.g., the Molecular Dynamics Storm and Typhoon machines; http://www.mdyn.com). Microarrays hybridized with fluorescent targets are stimulated
with a laser. The emitted light is then captured by
either a charge-coupled device or a confocal scanner.
A number of companies produce machines for scanning fluorescently labeled microarrays (http://
ihome.cuhk.edu.hk/b400559/array.html#Scanner and
http://www.lab-on-a-chip.com/files/mascanner.pdf).
A P S
R E F R E S H E R
C O U R S E
in situ using an ink-jet printing method employing standard phosphoramidite chemistry (26).
R E P O R T
specific hybridization and background signal and allows the direct subtraction of cross-hybridization
signals and discrimination between real and nonspecific signals.
Affymetrix GeneChips. The industry leader in the

field of oligonucleotide microarrays is undoubtedly
Affymetrix Corporation (http://www.affymetrix.com),
which uses photolithography-directed combinatorial
chemical synthesis to manufacture so-called GeneChips, microarrays bearing hundreds of thousands of
different oligonucleotides on a derivatized glass surface (Fig. 2 and Refs. 19, 34, 40). By use of the
Affymetrix Fluidics system, GeneChips are hybridized
with fragments (35200 residues long) of biotinylated
target RNA derived from 5 g of total cell RNA or 0.2
g of poly(A)-selected mRNA. Hybridized probe is
recognized by a streptavadin-phycoerythrin conjugate, and then the fluorescent image is captured using
Affymetrix Microarray Reader.
Short-chain oligonucleotides with single points of

constraint are probably more accessible for hybridization to target than cDNA probes.
Advantages of Affymetrix GeneChips. Because

probes are based entirely on sequence information,
there is no need to prepare and verify physical intermediates such as bacterial clones, PCR products, or
cDNAs. As such, there is less possibility of probe
mix-ups. However, Affymetrix has experienced some
problems in this regard. Up to one-quarter of the
sequences on one set of mouse microarrays were
wrong. The company had used sequences from the
public sequence databases that were known to be
ambiguous and that actually corresponded to the
wrong strand from the DNA double helix. As a result,
the oligonucleotides could not detect their target mRNAs (http://www.affymetrix.com/support/technical/
product_updates/mgu74_product_bulletin.affx).
Comparability of Different Microarray

Platforms
Technical problems inherent in probe manufacture
and use still confound the extraction of meaningful
data from comparative microarray experiments. Such
sources of variability include
The use of multiple, short-sequence detectors enables

splice variants and closely related members of a gene
family to be distinguished (Fig. 2B). By use of probes
representing regions of genes that significantly diverge or are significantly unique between family
members, microarrays can distinguish transcripts that
are up to 90% identical.
slide heterogeneity
spotting variation
printing irregularities
background noise.
Furthermore, little work has been carried out on the

comparability of different microarray platforms. In
the absence of common standards, different platforms
are likely to give different results. Difficulties with
intraplatform comparability will be compounded by
different gene and probe choices.
For each probe designed to be perfectly complementary to a target sequence, a partner probe is generated
that is identical except for a single base mismatch in
its center (Fig. 2B). This probe mismatch strategy,
along with the use of multiple probes for each transcript, helps identify and minimize the effects of non-
ANALYSIS OF MICROARRAY DATA

Microarray experiments produce a huge amount of
data. A single microarray run can produce between
262
Disadvantages of Affymetrix GeneChips. There

are several disadvantages to the Affymetrix GeneChips. First is a need for access to expensive specialized equipment. Second, oligonucleotide chips are
only availabile from commercial manufacturers. Custom oligonucleotide microarrrays can be commisioned, but at great expense. Third, prefabricated GeneChips are themselves very expensive, although the
price, particularly for academic users, is falling.
Fourth, although short-sequence probes confer high
specificity, they may have decreased sensitivity/binding compared with cDNA microarrays. Low sensitivity
is compensated for by employing multiple probes.
A P S
R E F R E S H E R
C O U R S E
R E P O R T
100,000 and a million data points, and a typical experiment may require tens or hundreds of runs (21).
Quite simply, for the first time in the history of the
biomedical sciences, our ability to generate data in

vast quantities is running ahead of our ability to make
sense of them. Moving from data to knowledge is a
263
FIG. 2.
Principles underlying the manufacture and use of Affymetrix oligonucleotide arrays. A: GeneChip manufacture. 1)
A silicon wafer solid support is derivatized with a covalent linker molecule terminated with a photolabile
protecting group (X). Light is then directed through a mask to deprotect and activate selected sites. 2) The wafer is
then flooded with a photoprotected (X) DNA base, resulting in spatially defined coupling on the chip surface. Each
base is represented by a different symbol (square, star, circle, cartwheel). 3) A second photomask is used to
deprotect different, but defined, regions of the wafer. 4) The wafer is then flooded with a second photoprotected
DNA base. 5) Repeated deprotection and coupling cycles enable the preparation of high-density oligonucleotide
microarrays. 6) The resultant chip bears 400,000 groups of oligonucleotides in an area of 1.6 cm2, with each
feature containing 10 million oligonudeotides of a given sequence. B: Probe selection. A model gene consists of
a promoter (TATA) driving the transcription of a pre-mRNA consisting of two exons (I and II) separated by an
intron. Transcription termination occurs downstream of the polyadenylation signal (AAUAAA). Introns are removed by splicing to produce the mature translated mRNA consisting of a 5-untranslated region (5 UTR), the
coding sequences, and a 3 UTR. Based on cDNA, expressed sequence tags (EST), and genomic sequence information
corresponding to mature mRNA sequences, <20 different independent 25-mer and minimally overlapping oligonucleotides are selected to serve as sensitive, unique sequence detectors for each transcript (in this case i to vi).
These are the perfect match (PM) sequences. Mismatch (MM) control probes are identical to their perfect match
partners except for a single base difference in a central position (arrow). Comparison of the PM and MM signal
following hybridization allows automatic background subtraction.
A P S
R E F R E S H E R
C O U R S E
R E P O R T
considerable challenge. Although procedures for the

assessment, curation, and presentation of microarray
data are rapidly evolving, statistical approaches are
neither routine nor standardized (10, 76). The Microarray Gene Expression Database (MGED; 53) consortium has the goal of facilitating the adoption of
common standards for microarray experiment annotation and data representation, as well as the introduction of standard experimental controls, and data
normalization methods. The projects being pursued
by MGED are:
value. This could be the overall intensity of the microarray, the overall intensity of all of the genes on
the microarray, the intensity of so-called housekeeping genes (the expression of which are supposedly
constant), or spiked targets, containing a known and
constant amount of a labeled control. Negative normalization controls might be represented by target
sequences from a different organism. Data are often
then subjected to log transformation to improve the
characteristics of the distribution of the expression
values.
High-Level Analysis
MAGE (Microarray and Gene Experiment) establishment of a data exchange format, or mark-up
language (MAGE-ML), and object model (MAGE-OM)
for microarray experiments
ontologies development of ontologies for microarray experiment description and biological material (biomaterial) annotation.
High-level microarray analysis is often called data

mining, the uncovering of relevant patterns of interest in data from a particular problem domain. Typically this will involve data processing using various
statistical techniques to identify the patterns. In addition, data needs to be packaged, presented, archived,
and compared with other types of information.
Statistical analysis. The statistical analysis of microarray data is probably the most difficult problem
associated with the use of these techniques. The aim
is to apply standard statistical approaches to determine gene expression and gene expression alteration
significance, thus enabling the extraction of significant biological information from a morass of noise
and variability. However, present methodologies do
not deal well with the number of possible combinations. Statisticians are experienced with handling data
involving a limited number of variables, but a large
number of samples (e.g., the average weight of persons in England is a problem of a single variable and
49 million samples). Microarrays turn this problem on
its head, producing thousands of variables from a
small number of samples. A number of different methods have been explored.
The hope is that researchers will be able to share

MAGE-compatible data seamlessly. Some central internet sites have been established for depositing microarray data (e.g., http://www.ebi.ac.uk/arrayexpress).
A variety of microarray analysis software packages are
available from commercial and academic sources (http:
//ihome.cuhk.edu.hk/b400559/array.html#Software and
http://www.lab-on-a-chip.com/suppliers/inform.html).
Low-Level Analysis
Primary image data having been collected from a
microarray experiment, the aims of the first level of
analysis, so-called low-level analysis, are background
elimination, filtration, and normalization, all of which
should contribute to the removal of systematic variation between chips, enabling group comparisons.
Background noise is removed from cDNA microarrays
by subtracting nonspecific signal from spot signal. In
contrast, preprocessing of Affymetrix data is intrinsic
to the perfect match and mismatch strategy (Fig. 2B).
Normalization in both cases involves comparing different microarrays relative to some standard intensity
FOLD CHANGE. Simple and intuitive, this method, involves the calculation of a ratio relating the expression level of a gene under control and experimental
conditions. An arbitrary ratio (usually 2-fold) is then
selected as being significant. Because this ratio has
no biological merit, this approach amounts to nothing
more than a blind guess. The selection of an arbitrary
threshold results in both low specificity (false positives, particularly with low-abundance transcripts or
264
MIAMEformulation of the minimum information

about a microarray experiment required to interpret and verify the results (5)
A P S
R E F R E S H E R
C O U R S E
when a data set is derived from a divergent comparison) and low sensitivity (false negatives, particularly
with high-abundance transcripts or when a data set is
derived from a closely linked comparison). It is now
accepted that the use of the fold change method
should be discontinued.
R E P O R T
samples into groups, or clusters, with maximum similarity, thus enabling the identification of gene signatures or informative gene subsets. Methods for classification are either unsupervised or supervised.
Supervised methods use existing biological information about specific genes that are functionally related
to guide or test the cluster algorithm. With unsupervised methods, no prior test set is required.
The most commonly employed unsupervised classification methods are the clustering techniques (16).
They fall within the categories of hierarchical and
nonhierarchical (partitional) clustering. Most cluster
analysis techniques are hierarchical; the resultant classification has an increasing number of nested classes,
and the result resembles a phylogenetic classification.
Hierarchical clustering has the advantage that it is
simple and the result can be easily visualized. Nonhierarchical clustering techniques, such as k-means
clustering (51), partition objects into different clusters without trying to specify the relationship between/among individual elements. A self-organizing
map [SOM (48)] is a neural-network-based divisive
clustering approach. A SOM assigns genes to a series
of partitions on the basis of the similarity of their
expression vectors to reference vectors that are defined for each partition. It is the process of defining
these reference vectors that distinguishes SOMs from
k-means clustering.
UNIVARIATE STATISTICS.
If log ratios follow a normal distribution, a probability (P value) that the gene is erroneously reported as being differentially regulated
above a given threshold can be assigned using a univariate statistical test (e.g., the t-test). However, such
tests require correction. From a statistical point of
view, interrogating R genes on a microarray is the
same as running R parallel tests. The Bonferroni correction takes this into account, and adjusts P to P/R.
However, because a microarray experiment involves
an R of thousands, no differentially expressed genes
would ever be reported as reaching significance. Less
conservative correction methods have been reported
(10).
Principal component analysis [PCA (1, 41)] is a mathematical decomposition technique that picks out the
most abundant themes to reoccur in an experiment. A
set of expression patterns, called principal components, is identified, and linear combinations of these
are assembled to represent the behavior of genes in a
data set. PCA can be applied to both genes and experiments as a means of classification. In most implementations of PCA, it is difficult to define accurately
the precise boundaries of distinct clusters in the data
or to define genes (or experiments) belonging to each
cluster. However, PCA is a powerful technique for the
analysis of gene expression data when used with
another classification technique, such as k-means
clustering or SOMs, that requires the user to specify
the number of clusters.
ANALYSIS OF VARIANCE. Ultimately, the analysis of microarray data, and the selection of differentially expressed
genes, will be achieved by analysis of variance
(ANOVA) based on explicit experimental models.
Identifying patterns in microarray data. The output from the analysis of a microarray experiment is
usually a large data spreadsheet filled with numbers
related to the signal intensity for each gene on the
chip. Further analysis is required to identify groups of
genes that are similarly regulated across the biological
samples under study. A variety of mathematical procedures have been developed that partition genes or
One approach to supervised modeling is linear discriminant analysis (LDA), which uses a training set
265
UNUSUAL RATIO. This method selects genes for which the

ratio of control and experimental values is an arbitrarily selected distance from the mean control-toexperimental ratio. This is usually taken to be 2
standard deviations. This can be calculated by applying z-transformation, subtraction of the mean, and
division by the standard deviation to the log ratio
values. As a fixed proportion threshold is used, the
unusual-ratio method will always identify the most
affected genes. However, genes will be reported,
even if there are no differentially expressed genes.
Although flawed, the unusual-ratio method is commonly used for the analysis of cDNA microarrays.
A P S
R E F R E S H E R
C O U R S E
R E P O R T
CONFIRMATIONAL STUDIES
consisting of all classes of interest and then tries to set

up a model that classifies an unknown sample unambiguously into one of the already established classes
[(23) http://www.stat.berkeley.edu/users/terry/
zarray/Html/discr.html).
Because of the statistical issues raised by microarray

technology, it is necessary that findings be confirmed
using independent methodological criteria, preferably
with separate samples rather than with the tissue or
RNA used to derive the original targets.
A rapid, high through-put, but expensive, method for
confirmation of microarray data is quantitative (real
time) RT-PCR using the TaqMan (Applied Biosystems;
http://www.appliedbiosystems.com/products/productdetail.cfm?prod_id42), iCycler (Bio-Rad Laboratories; http://www.bio-rad.com/iCycler/), LightCycler (Roche Diagnostics; http://www.lightcycleronline.com/) machines. TaqMan PCR (http://
www.appliedbiosystems.com) exploits the 5nuclease activity of Taq DNA polymerase in
conjunction with DNA probes labeled with quencher
and reporter dyes. A positive PCR reaction results in
the removal of the reporter dye from the influence of
the quencher dye, leading to an increase in measurable fluorescence. The real-time reaction information
allows quantification of target nucleic acid. Advantages of TaqMan are that it is a closed-tube assay,
reducing the risk of contamination, and no post-PCR
processing (such as gel electrophoresis) is required.
Multiple reactions, detecting more than one sequence
per reaction, are possible using different quencher
dyes.
dChip. One of the best Affymetrix analysis packages

is dChip (31, 32, 45), available online at no cost
(http://www.dchip.org). dChip is based on a statistical model for Affymetrix expression data at the probe
level. This approach facilitates automatic probe selection in the analysis stage to reduce errors caused by
outliers, cross-hybridizing probes, and image contamination. Data are then normalized using a rank-selection method. The program selects a set of genes with
the property that the rank of a gene in this set according to its expression measurement in one microarray
is similar to its rank using values for the second
microarray. Genes thus selected tend to be nondifferentially expressed, and this forms a valid basis for the
computation of a normalization relation. By pooling
information across multiple microarrays, it is possible
to assess standard errors for the model-based expression indexes (MBEI) calculated for each gene. After
obtaining MBEIs, dChip can perform some high-level
analysis, such as hierarchical and functional clustering, involving ANOVA-based gene filtering, comparative analysis, PCA, and LDA. Some of these functions
require R (http://www.r-project.org), a statistical
package and language, as the engine for computational and graphic tasks.
Alternatively, Northern blots or ribonuclease protection assays provide the benefit of direct quantification. Finally, in situ hybridization can be used as a
sensitive measure of gene expression changes in specific cell types within a mixed tissue. This is important, as significant gene expression changes detected
on a microarray may be related to a small fraction of
the cells in a tissue.
As steady-state levels of RNA are not necessarily reflective of the final steady-state level of the functional
protein translation product of an mRNA, further studies might involve the use of specific antibody probes
in Western blot or immunocytochemical studies.
Because a microarray experiment may reveal putative
changes in the expression of tens or hundreds of
genes, it is practically impossible to confirm all of the
266
Relational and functional databases. Microarray

data need to be interpreted within the context of
gene function and the functional relationships between genes. This demands relating microarray data
with existing biological knowledge. However, this
project has had to face up to the linguistic ambiguities
of the existing scientific literature; supposedly rigid,
solid scientific concepts are often couched in imprecise terms. What is needed is a common gene language. Thus the aim of the Gene Ontology Consortium (http://www.geneontology.org) is to produce a
dynamic controlled vocabulary that can be applied to
all organisms even as knowledge of gene and protein
roles in cells is accumulating and changing.
A P S
R E F R E S H E R
C O U R S E
data. However, it is incumbent upon investigators to

evaluate a reasonable number of genes. That said,
confirmational studies may raise other issues. Although a microarray experiment might indicate an
increase or decrease in the expression of a gene, an
independent method might reveal a greater or a lesser
change. Does such a result represent sufficient confirmation of the microarray findings, or does a quantitative difference raise new questions about the validity of the microarray data?
R E P O R T
combine expression data with other sources of information to improve the range and quality of biological conclusions
develop techniques that enable the modeling of

gene networks in the context of the functioning of
integrated biological systems.
It is to be anticipated that, as these aims are realized,

human judgement and expertise will be replaced by
artificial intelligence.
PRESENT AND FUTURE CHALLENGES

Hardware
Experimental Design
Careful contemplation of the ultimate research objective for a study will ensure that appropriate type and
number of treatment groups are incorporated into an
experimental design. Every microarray study should
include a sufficient number of independent experiments to allow statistical evaluation of claims of an
increase or decrease in gene expression (30, 38). The
number of microarrays and replicates needed to
achieve statistical significance is dependent on the
coefficient of variation. Reproducibility must be demonstrated, including rigorous evaluation of the run-torun variability for each gene. This will permit appropriate adjustments to be made that will reduce the
false discovery rate.
Another challenge that can be overcome by good
experimental design is the need to distinguish among
primary and secondary effects and subsequent events.
An initial perturbation of a biological system will
induce gene expression changes that will be followed
by more alterations related to secondary, cellular
changes, and subsequent modulations at the organismal level. All of these levels of cause-and-effect plasticity are of interest and could be dissected by incorporating a broad range of time points. Similarly,
effects not directly related to an experimental perturbation can be eliminated by inclusion of appropriate
control groups. For example, if studying gene expression changes as a consequence of drug interactions
with a specific receptor, controls might include comparisons with groups using a receptor pathway inhibitor or a nonactive analog.
Another factor driving the development of bigger

chips is cost; as size and volume increase, prices will
surely drop.
Software
We continue to search for a body of mathematics
that will serve as a natural language for gene expression information (54). The role of this mathematics
will be to
understand sources of noise and variation to increase the biological signal
267
Scientists in academia and industry are diligently addressing the technical problems of microarrays. The
quality, reproducibility, comparability, sensitivity, and
dynamic range of microarrays will improve. In 1965,
Gordon Moore, the founder of Intel, observed that the
number of transistors per semiconductor chip doubles every 18 24 months (36). Microarrays are on a
similar trajectory. In 1998, an Affymetrix microarray
contained fewer than 1,000 genes; by 2000, it boasted
of 12,000. The ultimate aim is to represent all of the
expressed sequences of the genome on a single chip.
Toward this end, Affymetrix has recently released the
Human Genome U133 GeneChip Set, comprised of
two microarrays containing almost 45,000 probe sets
corresponding to more than 39,000 transcript variants representing greater than 33,000 of the best
characterized human genes (http://www.affymetrix.com/products/arrays/specific/hgu133.affx). However,
until the day when all transcription units have been
identified, microarrays will remain incomplete. Although this is acceptable, microarrays should be unbiased in their selection of genes.
A P S
R E F R E S H E R
C O U R S E
TOWARD AN UNDERSTANDING OF GENE

FUNCTION
6.
7.
8.
9.
10.
11.
12.
13.
The Wellcome Trust is thanked for support. Dr. Mohamed Ghorbel

and Greig Sharman (University of Bristol) are thanked for critical
and constructive comments on the manuscript.
14.
Address for reprint requests and other correspondence: D. Murphy,

Univ. of Bristol Research Centre for Neuroendocrinology, Bristol
Royal Infirmary, Marlborough St., Bristol BS2 8HW, UK (E-mail:
d.murphy@bristol.ac.uk)
15.
Received 22 August 2002; accepted in final form 23 August 2002
17.
REFERENCES
18.
16.
1. Alter O, Brown PO, and Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97: 1010110106, 2000.
2. Altman RB. Whole-genome expression analysis: challenges
beyond clustering. Curr Opp Structural Biol 11: 340 347,
2001.
3. Banerjee N and Zhang MQ. Functional genomics as applied
to mapping transcription regulatory networks. Curr Opin Microbiol 5: 313317, 2002.
4. Bonaventure P, Guo H, Tian B, Liu X, Bittner A, Roland B,
Salunga R, Ma XJ, Kamme F, Meurers B, Bakker M, Jurzak
M, Leysen JE, and Erlander MG. Nuclei and subnuclei gene
expression profiling in mammalian brain. Brain Res 943: 38
47, 2002.
5. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton
HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF,
Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, and
Vingron M. Minimum information about a microarray exper-
19.
20.
21.
22.
23.
24.
iment (MIAME)toward standards for microarray data. Nat

Genet 29: 365371, 2001.
Burgess JK and Hazelton RH. New developments in the
analysis of gene expression. Redox Rep 5: 6373, 2000.
Cao Y and Dulac C. Profiling brain transcription: neurons
learn a lesson from yeast. Curr Opin Neurobiol 11: 615 620,
2001.
Clarke PA, te Poele R, Wooster R, and Workman P. Gene
expression microarray analysis in cancer biology, pharmacology, and drug development: progress and potential. Biochem
Pharmacol 62: 13111336, 2001.
Das M, Burge CB, Park E, Colinas J, and Pelletier J. Assessment of the total number of human transcription units. Genomics 77: 7178, 2001.
Draghici S. Statistical intelligence: effective analysis of highdensity microarray data. Drug Discov Today 7: S55S63, 2002.
Duggan DJ, Bittner M, Chen Y, Meltzer P, and Trent JM.
Expression profiling using cDNA microarrays. Nat Genet 21,
Suppl: 10 14, 1999.
Eberwine J, Kacharmina JE, Andrews C, Miyashiro K,
McIntosh T, Becker K, Barrett T, Hinkle D, Dent G, and
Marciano P. mRNA expression analysis of tissue sections and
single cells. J Neurosci 2: 8310 8314, 2001.
Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R,
Zettel M, and Coleman P. Analysis of gene expression in
single live neurons. Proc Natl Acad Sci USA 89: 3010 3014,
1992.
Eberwine J. Single-cell molecular biology. Nat Neurosci 4,
Suppl: 11551156, 2001.
Eisen MB and Brown PO. DNA arrays for analysis of gene
expression. Methods Enzymol 303: 179 205, 1999.
Eisen MB, Spellman PT, Brown PO, and Botstein D. Cluster analysis and display of genome-wide expression patterns.
Proc Natl Acad Sci USA 95: 1486314868, 1998.
Ewing B and Green P. Analysis of expressed sequence tags
indicates 35,000 human genes. Nat Genet 25: 232234, 2000.
Fitzgerald DA and Guimbellot JS. Tailored Arrays. The Scientist 5: 26, 2001. [online] http://www.thescientist.com/
yr2001/sep/profile1_010917.html
Fodor SPA, Read JL, Pirrung MC, Stryer L, Lu AT, and
Solas D. Light-directed, spatially addressable parallel chemical
synthesis. Science 251: 767773, 1991.
Gershon D. Microarray technology: an array of opportunities.
Nature 416: 885 891, 2002.
Goodman N. Biological data becomes computer literate: new
advances in bioinformatics. Curr Opin Biotechnol 13: 68 71,
2002.
Greenberg SA. DNA microarray gene expression analysis technology and its application to neurological disorders. Neurology
57: 755761, 2001.
Hakak Y, Walker JR, Li C, Wong WH, Davis KL, Buxbaum
JD, Haroutunian V, and Fienberg AA. Genome-wide expression analysis reveals dysregulation of myelination-related genes
in chronic schizophrenia. Proc Natl Acad Sci USA 98: 4746
4751, 2001.
Halgren RG, Fielden MR, Fong CJ, and Zacharewski TR.
Assessment of clone identity and sequence fidelity for 1189
IMAGE cDNA clones. Nucleic Acids Res 29: 582588, 2001.
268
The microarray-based approach to the problem of

gene function clusters genes according to their expression behavior under defined conditions and to
assign function. The hypothesis of this guilt-by-association approach is that clustered genes may be coregulated and therefore may be involved in similar
functions. However, sequence and expression analysis alone is insufficient to fully inform us about gene
function. To make sense of these data, the hypotheses
that emerge from analysis of systemic expression information must be tested empirically. This will involve the integration of genomic knowledge with
biochemistry, cell biology, genetics, structural biology, and proteomics. Ultimately, hypotheses must be
tested within the physiological integrity of the whole
organism. This will demand the development of a
new, high-throughput systems biology coupled with
rapid and efficient gene transfer techniques.
R E P O R T
A P S
R E F R E S H E R
C O U R S E
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP,

Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP,
Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A,
Wetterstrand KA, Patrinos A, Morgan MJ, Szustakowki J,
de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S,
and Chen YJ. Initial sequencing and analysis of the human
genome. Nature 409: 860 921, 2001.
Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile
C, Hwang SY, Brown PO, and Davis RW. Yeast microarrays
for genome wide parallel genetic and gene expression analysis.
Lee ML, Kuo FC, Whitmore GA, and Sklar J. Importance of
replication in microarray gene expression studies: statistical
methods and evidence from repetitive cDNA hybridizations.
Proc Natl Acad Sci USA 97: 9834 9839, 2000.
Li C and Wong WH. Model-based analysis of oligonucleotide
arrays: model validation, design issues and standard error application. Genome Biology 2: research0032.1 0032.11, 2001.
Li C and Wong WH. Model-based analysis of oligonucleotide
arrays: expression index computation and outlier detection.
Liang F, Holt I, Pertea G, Karamycheva S, Salzberg SL, and
Quackenbush J. Gene index analysis of the human genome
estimates approximately 120,000 genes. Nat Genet 24: 239
240, 2000.
Lipshutz RL, Fodor SPA, Gingeras TR, and Lockhart DL.
High density synthetic oligonucleotide arrays. Nat Genet 21,
Suppl: 20 24, 1999.
Miyashiro K, Dichter M, and Eberwine J. On the nature and
distribution of mRNAs in hippocampal neurites: implications
for neuronal functioning. Proc Natl Acad Sci USA 91: 10800
10804, 1994.
Moore GE. Cramming more components onto integrated circuits. Electronics 38: 114 117, 1965.
Nadon R and Shoemaker J. Statistical issues with microarrays: processing and analysis. Trends Genet 18: 265271, 2002.
Pan W, Lin J, and Le CT. How many replicates of arrays are
required to detect gene expression changes in microarray
experiments? A mixture model approach. Genome Biol 3:
research0022, 2002.
Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP,
and Foder SPA. Light-generated oligonucleotide arrays for
rapid DNA sequence analysis. Proc Natl Acad Sci USA 91:
50225026, 1994.
Pritchard CC, Hsu L, Delrow J, and Nelson PS. Project
normal: defining normal variance in mouse gene expression.
Proc Natl Acad Sci USA 98: 13266 13271, 2001.
Raychaudhuri S, Stuart JM, and Altman RB. Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing 5: 452 463, 2000.
Roest Crollius H, Jaillon O, Bernot A, Dasilva C, Bouneau
L, Fischer C, Fizames C, Wincker P, Brottier P, Quetier F,
Saurin W, and Weissenbach J. Estimate of human gene
number provided by genomewide analysis using Tetraodon
nigroviridis DNA sequence. Nat Genet 25: 235238, 2000.
269
25. Heck DE, Roy A, and Laskin JD. Nucleic acid microarray
technology for toxicology: promise and practicalities. Adv Exp
Med Biol 500: 709 714, 2001.
26. Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ,
Shannon KW, Lefkowitz SM, Ziman M, Schelter JM,
Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, and
Linsley PS. Expression profiling using microarrays fabricated
by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 19:
342347, 2001.
27. King HC and Sinha AA. Gene expression profile analysis by
DNA microarrays. JAMA 286: 2280 2288, 2001.
28. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC,
Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W,
Funke R, Gage D, Harris K, Heaford A, Howland J, Kann
L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C,
Stange-Thomann N, Stojanovic N, Subramanian A,
Wyman D, Rogers J, Sulston J, Ainscough R, Beck S,
Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R,
French L, Grafham D, Gregory S, Hubbard T, Humphray
S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L,
Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross
M, Shownkeen R, Sims S, Waterston RH, Wilson RK,
Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton
LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl
MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB,
Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW,
Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S,
Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM,
Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives
CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS,
Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori
M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H,
Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W,
Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C,
Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M,
Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M,
Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J,
Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis
RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM,
Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV,
Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima
S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F,
Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR,
de la Bastide M, Dedhia N, Blocker H, Hornischer K,
Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A,
Batzoglou S, Birney E, Bork P, Brown DG, Burge CB,
Cerutti L, Chen HC, Church D, Clamp M, Copley RR,
Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S,
Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV,
Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A,
R E P O R T
A P S
R E F R E S H E R
C O U R S E
53.
54.
55.
56.
P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai

Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV,
Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B,
Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun
J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao
C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao
Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D,
Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T,
Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M,
Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng
ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S,
Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A,
Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin
D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F,
Kline L, Koduru S, Love A, Mann F, May D, McCawley S,
McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon
M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R,
Sitter C, Smallwood M, Stewart E, Strong R, Suh E,
Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J,
Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe
K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ,
Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B,
Hatton T, Narechania A, Diemer K, Muruganujan A, Guo
N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz
B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L,
Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne
M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D,
Esparham S, Fosler C, Gire H, Glanowski S, Glasser K,
Glodek A, Gorokhov M, Graham K, Gropman B, Harris M,
Heil J, Henderson S, Hoover J, Jennings D, Jordan C,
Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M,
Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S,
Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck
J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M,
Smith T, Sprague A, Stockwell T, Turner R, Venter E,
Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, and Zhu
X. The sequence of the human genome. Science 291: 1304
1351, 2001.
Wyrick JJ and Young RA. Deciphering gene expression regulatory networks. Curr Opin Genet Dev 12: 130 136, 2002.
Young RA. Biomedical discovery with DNA arrays. Cell 102:
9 15, 2000.
Young WS III, Iacangelo A, Luo XZ, King C, Duncan K,
and Ginns EI. Transgenic expression of green fluorescent
protein in mouse oxytocin neurones. J Neuroendocrinol 11:
935939, 1999.
Zhu Z, Pilpel Y, and Church GM. Computational identification of transcription factor binding sites via a transcriptionfactor-centric clustering (TFCC) algorithm. J Mol Biol 318:
71 81, 2002.
270
43. Saiki RK, Bugawan TL, Horn GT, Mullis KB, and Erlich
HA. Analysis of enzymatically amplified beta-globin and
HLA-DQ alpha DNA with allele-specific oligonucleotide probes.
Nature 324: 163166, 1986.
44. Sandberg R, Yasuda R, Pankratz DG, Carter TA, Del Rio
JA, Wodicka L, Mayford M, Lockhart DJ, and Barlow C.
Regional and strain-specific gene expression mapping in the
adult mouse brain. Proc Natl Acad Sci USA 97: 11038 11043,
2000.
45. Schadt EE, Li C, Su C, and Wong WH. Analyzing high-density
oligonucleotide gene expression array data. J Cell Biochem 80:
192202, 2001.
46. Schena M, Shalon D, Davis RW, and Brown PO. Quantitative monitoring of gene expression pattern with a complementary DNA microarray. Science 270: 467 470, 1995.
47. Shalon D, Smith SJ, and Brown PO. A DNA microarray
system for analyzing complex DNA samples using two-color
fluorescent probe hybridization. Genome Res 6: 639 645,
1996.
48. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S,
Dmitrovsky E, Lander E, and Golub T. Interpreting patterns
of gene expression with self-organizing maps: methods and
application to hematopoietic differentiation. Proc Natl Acad
Sci USA 96: 29072912, 1999.
49. Ant Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AMA,
Mao M, Hans L, Peterse HL, van der Kooy K, Marton MJ,
Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C,
Linsley PS, Bernards R, and Friend SH. Gene expression
profiling predicts clinical outcome of breast cancer. Nature
415: 530 536, 2002.
50. Van Gelder RN, von Zastrow ME, Yool A, Dement WC,
Barchas JD, and Eberwine JH. Amplified RNA synthesized
from limited quantities of heterogeneous cDNA. Proc Natl
Acad Sci USA 87: 16631667, 1990.
51. Varela JC, Goldstein MH, Baker HV, and Schultz GS. Microarray analysis of gene expression patterns during healing of
rat corneas after excimer laser photorefractive keratectomy.
Invest Ophthalmol Vis Sci 43: 17721782, 2002.
52. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton
GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne
JD, Amanatides P, Ballew RM, Huson DH, Wortman JR,
Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M,
Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL,
Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA,
Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C,
Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D,
Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz
S, Levy S, Mobarry C, Reinert K, Remington K, AbuThreideh J, Beasley E, Biddick K, Bonazzi V, Brandon R,
Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi
K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan
R E P O R T

Gene Expression Studies Using Microarrays Principles, Problems, and Prospects

Hochgeladen von

Dokumentinformationen

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Gene Expression Studies Using Microarrays Principles, Problems, and Prospects

Hochgeladen von

Copyright:

Verfügbare Formate

GENE EXPRESSION STUDIES USING

MICROARRAYS: PRINCIPLES, PROBLEMS, AND

Advan in Physiol Edu 26:256-270, 2002.

Microarray analysis using disiloxyl 70mer oligonucleotides

This infomation is current as of October 25, 2011.

Downloaded from advan.physiology.org on October 25, 2011

Transcriptional profiling of VEGF-A and VEGF-C target genes in lymphatic endothelium