Beruflich Dokumente
Kultur Dokumente
ABSTRACT
Microarray experiments have been used recently in genetical genomics studies, as an additional tool
to understand the genetic mechanisms governing variation in complex traits, such as for estimating
heritabilities of mRNA transcript abundances, for mapping expression quantitative trait loci, and for
inferring regulatory networks controlling gene expression. Several articles on the design of microarray
experiments discuss situations in which treatment effects are assumed fixed and without any structure. In
the case of two-color microarray platforms, several authors have studied reference and circular designs.
Here, we discuss the optimal design of microarray experiments whose goals refer to specific genetic
questions. Some examples are used to illustrate the choice of a design for comparing fixed, structured
treatments, such as genotypic groups. Experiments targeting single genes or chromosomic regions (such
as with transgene research) or multiple epistatic loci (such as within a selective phenotyping context) are
discussed. In addition, microarray experiments in which treatments refer to families or to subjects (within
family structures or complex pedigrees) are presented. In these cases treatments are more appropriately
considered to be random effects, with specific covariance structures, in which the genetic goals relate to
the estimation of genetic variances and the heritability of transcriptional abundances.
be assayed or by the amount of mRNA from each in- which should be explored when designing the experi-
dividual) or number of slides available. The choice of ments. In the fourth section, the design of microarray
the experimental unit, which can be an individual (such experiments with random treatments is discussed. Treat-
as a plant or animal, hereinafter denoted as a subject) or ments in this case may refer to related individuals or
a group of subjects (such as a family or mRNA pool), families, and the main goal of the experiments relates to
may impose some spatial or genetic structure among inference on genetic variances and heritabilities. Fi-
units, which should be taken into account when assign- nally, some concluding remarks are presented in the last
ing treatments to units. section.
Microarray experiments have been recently used on
genetical genomics studies, as an additional tool to
understand the genetic mechanisms governing varia-
DESIGN PRINCIPLES AND CRITERIA
tion in complex traits, such as disease susceptibility in
human medicine or production traits in agriculture. The general statistical principles of the design of ex-
Microarray experiments using related subjects can be periments apply to experiments of any kind, including
utilized, for example, to infer heritabilities of mRNA microarray studies. The general procedure for choosing
transcript abundances (Monks et al. 2004). In addition, a good design involves the following steps (see also
gene expression profiling can be combined with marker Bailey 1981, 1997; Mead 1990):
genotype information for mapping expression quanti-
1. The objectives of the experiment lead to the choice
tative trait loci (eQTL; Jansen and Nap 2001; Brem
of a set of treatments (groups to compare).
et al. 2002; Darvasi 2003; Schadt et al. 2003) and to
2. The experimental units (here the combination of
infer regulatory networks controlling gene expression
slides and dyes) that can be used are identified and
(Chesler et al. 2004; Bystrykh et al. 2005).
any expected patterns of variability among them lead
In this article we discuss the optimal design of micro-
to a choice of blocking structure (here a row–column
array experiments whose goals refer to specific genetic
structure).
questions, such as the comparison of expression levels
3. Consideration of any restrictions on which treat-
of different genotypic groups, eQTL studies, or estima-
ments can be applied to which units might lead to
tion of heritabilities of mRNA transcript abundances.
complex structures such as split-plot designs.
This goes beyond the scope of earlier work on fixed,
4. Given 1–3, a combinatorial, algorithmic or ad hoc
unstructured treatments.
method is found to construct a design.
The experimenters’ interest might be in specific
treatment comparisons, e.g., referring to additive and The block structure is already clear and we do not dis-
dominance effects. Treatments may refer to families or cuss complex structures here, so the two main tasks in
to subjects (within family structures or complex pedi- developing a design are treatment choice and treatment-
grees) in a given experiment. In these cases treatments to-unit allocation.
are more appropriately considered to be random ef- Treatment choice: When the experimental resources
fects, with specific covariance structures. These covari- are limited, a choice of the treatments that will be
ance structures should then be taken into account when represented in the experiment, from a larger set of
planning microarray experiments, so that the efficiency candidate treatments, might be needed. In some sit-
of broad inferences to the population of interest is uations particular treatments cannot be produced, or
maximized. the number of candidate treatments is too big for all of
The high cost of microarrays greatly limits the sample them to be implemented. For example, suppose we want
sizes of heritability or gene-mapping studies of mRNA to study isolines for transgenic versions of three genes
abundance traits ( Jin et al. 2004). Therefore, a careful (A/a, B/b, and C/c) from some parental line. Even
design of such experiments is extremely important to restricting each locus to express just the homozygote
make the most of the limited number of slides that are genotypes, it is unlikely that all seven desired isolines
generally available in a single experiment. will be obtained by transformation events or any other
This article is organized as follows. In the next section currently known gene manipulation techniques. Start-
we review some experimental design concepts, such as ing from the nontransgenic genotype (AABBCC), for
treatment choice, treatment-to-unit allocation, design instance, a single mutation may give rise to the geno-
optimality criteria, and some useful methods and results types AABBcc, AAbbCC, or aaBBCC; however, two or
on optimal designs. The third section presents genetic three mutations are needed to obtain the genotypes
research goals that can motivate microarray studies AAbbcc, aaBBcc, and aabbCC or aabbcc, respectively.
with two or more treatments, such as genotypic groups. Another and more obvious example is given by partial
Treatments are considered fixed, but the genetic com- diallels, in which some crossings cannot be done. We
ponent that defines the treatments provides a natural discuss later in this article some multilocus analysis
structure (which is translated into genetic parameters problems. In these cases, we need to define a model for
of interest, such as additive and dominance effects), the treatment effects and an optimality criterion related
Design of Genetical Genomics Studies 947
to the variances of the estimated parameters of the with variance s2e ; likewise, as arrays come from a pop-
model. Then we must find a method of searching for ulation of possible assembled arrays, their effects can be
a design that optimizes the criterion, e.g., using the modeled as random normal variables centered around
optimal continuous (approximate) design theory de- zero, with variance s2a . The term ‘‘treatment’’ here is
veloped from Kiefer and Wolfowitz (1960) or a com- used in a very broad context, representing either experi-
puter algorithm that searches for the optimal exact mental groups (such as genotypic classes) or families
design, such as those developed from Fedorov (1972). (and subjects within families) in quantitative genetics
As stated earlier, the treatment choice in microarray studies to estimate heritabilities. Therefore, treatment
experiments is independent of the array platform, so effects can be modeled as fixed or random, depending
the same procedure can be used for single- or two-color on the objective of the experiment.
technologies. The usual matrix notation for linear mixed models
Treatment-to-unit allocation: A given set of treatments such as model (1) is
(and their respective replication numbers) should be
combined in an optimal way within a given experi- y ¼ Xb 1 Zu 1 e; ð2Þ
mental setup. In the context of two-color microarray where y and e are vectors with elements yijk and eij ,
experiments, for example, this setup is given by the respectively, b is the vector of fixed effects, X is a known
number of slides and the two labels (Cy3 and Cy5 dyes). (design) matrix that relates y to b, u is the vector of
Again, construction of the design might require optimal random effects, and Z is the matrix that relates y to u. An
design theory, this time most likely using a computer extra assumption is that u $ N ð0; UÞ, where U is the
search. covariance matrix of the random effects.
The treatment-to-unit allocation applications dis- Note that the treatment effects can be taken as being
cussed in this article are specific to two-color array plat- either fixed or random and so they can be contained in
forms (cDNA or long oligos), in which samples should either b or u, depending on the experimental context.
be allocated in pairs to the slides (blocks). With single- In the examples discussed in this article, whenever
color microarrays (such as Affymetrix), on the other treatment effects are considered random, they are as-
hand, each sample is hybridized to a different slide, sumed normally distributed; i.e., t $ N ð0; Gs2t Þ, where
so this is not an issue and allocation should be done G is a matrix of known constants representing, for ex-
completely at random. ample, genetic relationships among the elements in t
Linear mixed models for microarray experiments: (such as a numerator relationship matrix A; Henderson
The outcomes of a microarray experiment are luminous 1976), and s2t is the variance among treatments, such as
intensities of thousands of genes, which are propor- an additive genetic variance.
tional to their mRNA transcriptional abundances. After Design criteria: The covariance matrix of the param-
suitable normalization, the response intensities from eter estimates for model (2) is the inverse of the in-
gene g in a two-color microarray experiment can be formation matrix M, where
represented by the model
! "
ygijk ¼ mg 1 dgi 1 agj 1 tgk 1 egij ; ð1Þ X9X Z9X
M} :
X9Z Z9Z 1 s2e U%1
(Wolfinger et al. 2001), where ygijk is the normalized
intensity of the gene expression measured by the dye Almost any sensible criterion defining design efficiency
(label) i of the array j that received treatment k, mg is is a function of the information matrix M.
a common experimental constant, dgi is the effect of dye Costs are important when comparing alternative ex-
i (green or red), agj is the effect of array (or slide) j, tgk perimental designs. In this article, however, designs
is the effect of the treatment k, and egij is a residual are generally compared for a given number of slides. As
term. We consider a single gene at a time for design pointed out by Yang and Speed (2002), the number of
purposes, as the same model is generally used to analyze slides is generally the most important limiting factor
all genes in a single experiment, and so the subscript g in microarray experiments. In addition, for two-color
is dropped. Different experimental layouts may need microarray platforms, as the variation between slides is
extra terms in the model for the appropriate modeling generally larger than the variation within slides, block-
of the data. With Affymetrix data, for example, there is ing by slides is crucial. Since the ratio of between-slide to
no dye effect in the model, but a hierarchical structure within-slide variation is unknown, it is usual to design
could be used to model intensities at the probe level, experiments for the worst case, i.e., where this ratio is
by including the effects of probe sets (clones) and of very large, in which case the design problem becomes
probe pairs within clones. Model choice should then be equivalent to one with fixed blocks. Note, however, that
closely connected to treatment and block structures of after the experiment is run, in the mixed-models data
the experiments (Rosa et al. 2005). analysis, we will estimate the variance components; if
As additional assumptions of model (1), residual their ratio turns out to be large, we will estimate the
terms can be modeled as being normally distributed treatment effects with the precision expected when we
948 J. S. S. Bueno Filho, S. G. Gilmour and G. J. M. Rosa
chose the design; if their ratio turns out to be small, then represented in the microarray slide, i.e., how polymor-
the interblock information will give us estimates of phisms at a specific site of the genome affect transcrip-
treatment effects better than expected when we chose tion activity of other genes in the genome. Such effects
the design. are generally referred to as trans-acting factors ( Jansen
In this context, Kerr and Churchill (2001) used the and Nap 2001).
so-called A-optimality criterion to compare block de- The model (1) for such a simple treatment structure
signs for microarray studies with simple treatment can be written with tk ¼ qk a. For the BC situation, qk is a
structures, and Yang and Speed (2002) discussed the design index on genotypes assuming values q1 ¼ 0 (for
efficiency of alternative designs for specific contrasts of genotype bb) and q2 ¼ 1 (for genotype Bb). In this sit-
interest in factorial and time course experiments. Kerr uation, one wants to estimate a ¼ t2 % t1 and the vari-
(2003) later argued in favor of double reference designs ance of this contrast is given by Varð^t1 % ^t2 Þ ¼ Varð^t1 Þ 1
as being suboptimal but more robust when compared to Varð^t2 Þ % 2 Covð^t1 ; ^t2 Þ. For the DH situation, q1 ¼ 0 (for
loop designs. Glonek and Solomon (2004) studied genotype bb) and q2 ¼ 2 (for genotype BB). In this
some alternative designs for factorial treatment struc- situation, one wants to estimate a ¼ 12ðt2 % t1 Þ, and the
tures. More general work on two-level factorial designs variance of this comparison is given by one-quarter of
in blocks of size two, which has some relevance, has been the right-hand side of the above variance. As these two
done by Draper and Guttman (1997), Yang and variances are proportional, the same optimal design
Draper (2003), Wang (2004), and Kerr (2006). minimizes both of them.
Similar sorts of experiments are now being used for In a microarray experiment, such a simple two-
understanding the genetic basis of mRNA transcript treatment group situation can arise within a selective
abundances, with a variety of experimental settings. For phenotyping approach based on a single locus (or small
example, fixed treatment effect models arise in genetic chromosomic region), as proposed by Jin et al. (2004),
analysis when comparing multilocus genotypic groups, or within a candidate gene context in which one is
including interaction (epistasis) terms. In addition, interested in the effect of a specific locus (e.g., trans-
random treatment effect models may be used in quan- gene) on the expression of other genes in the genome
titative genetics to infer genetic variances and heritabil- (see, for example, Noueiry et al. 2000).
ities. No work, however, has been published on optimal Another example is to define two experimental groups
design for microarray experiments in situations where by taking the extremes of the phenotypic distribution of
the treatments have any of these structures. a quantitative trait of interest (Lander and Botstein
In each of the following sections we present different 1989). By comparing the allelic frequencies of molecu-
experiments and discuss possible parameter contrasts lar markers between the two groups, candidate regions
of interest, as well as the variances of their estimates, containing genes contributing to variation of the phe-
sometimes weighted to correct for scale problems. In notypic trait can be pinpointed. In the context of mi-
each situation, alternative optimality criteria are dis- croarrays, candidate genes are generated by searching
cussed. We assume that all power comparisons have a for differential transcript abundances between the two
monotonic (inverse) relation with the variances of the groups. In any situation, the proportions of the selected
desired comparisons and describe treatment structures individuals should be made small enough so that the
in microarray experiments for some common goals in two phenotypic groups are quite distinct (Darvasi and
genetic studies. The resulting designs are depicted by Soller 1992).
a set of arrows representing the slides, with heads and tails Microarray experiments with two experimental groups
denoting alternative dye labeling (Cy3 and Cy5). Arrows define row–column designs obtained from multiple
(slides) refer always to different biological replications copies of a 2 3 2 Latin square (in the case of two-color
from each experimental group, unless otherwise stated. systems), for which extensive literature is available.
Three treatment groups in microarray experiments:
Genetic motivation: In some situations there are three
GENETIC STUDIES WITH FIXED TREATMENT possible genotypes in a segregating population, for ex-
EFFECTS
ample, with F2 populations and codominant markers.
Two treatment groups in microarray experiments: The genotypes for each marker can then be described
Genetic motivation: One of the simplest scenarios that by an additive–dominance model. Here we discuss the
results in two-treatment experiments arises with a single case of a single locus (such as a candidate gene sit-
candidate gene (with alleles B and b) and a backcross uation) and its trans-acting effect on the expression of
(BC) or double-haploid (DH) experiment. In these cir- other genes. The case of multiple loci is discussed later
cumstances, individuals belong to one of two possible in this article, with factorial and fractional factorial
genotypic groups, namely Bb or bb if in a BC experi- treatment structures.
ment and BB or bb if in a DH scenario. In both Treatment choice: An additive–dominance treatment
situations one is interested in the additive effect ðaÞ of effect of a candidate gene in an F2 population can be
the candidate gene on the expression of each gene expressed as tk ¼ qk a 1 ð1 % jqk jÞd, where qk ¼ %1; 0; 1,
Design of Genetical Genomics Studies 949
for genotypes bb, Bb, and BB, respectively. Denote by dominance only, half of the observations should be
rk the proportion of the n observations that represent taken from the heterozygotes and one-quarter from
each genotypic group (k ¼ 1, 2, 3). The model can be each of the homozygotes. To estimate additive and dom-
rewritten as y ¼ m1n31 1 QðadÞ 1 e, where 1n31 is a col- inance effects simultaneously, it is necessary to define an
umn vector of ones and Q gives the incidences of optimality criterion. Throughout this article, we use the
genotypes bb, Bb, and BB as A-efficiency on this scale, i.e., the mean of the variances
2 3 of the estimates, but in any real experiment careful
%1 %1 & & & %1 0 0 & & & 0 1 1 & & & 1 thought should be given to whether a weighted mean of
6 7
Q9 ¼ 6
4 0 0 &&& 0 1 1 &&& 1 0 0 &&& 0 7 5: the variances would be a more appropriate criterion.
|fflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} |fflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflffl} |fflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflffl}
n3r1 n3r2 n3r3
The A-optimal proportion ofpeach ffiffiffi homozygote is the
irrational number ropt ¼ 1 % 2=2. As no possible num-
Both parameters can be optimally estimated using the ber of arrays satisfies this optimum exactly, an approx-
same proportion for each homozygous genotype (bb imation is needed. It is reasonable to use $0.293 of
and BB), in which case r1 ¼ r3 ¼ r and r2 ¼ 1 % 2r. The the observations for each homozygote and the remain-
information matrix is then given by ing 0.414 of the observations for the heterozygote
2 3 group.
1 0 1 % 2r Suppose, for example, that we are interested in the
n
M¼4 0 2r 0 5 2 best design that can be made with only 10 slides using a
1 % 2r 0 1 % 2r se two-color microarray platform. We proceed as follows.
To estimate just additive effects, each homozygous
and the covariance matrix is
genotype (BB or bb) will have 10 replications. The het-
2 1 3 erozygote group, in this case, is not included in the
2r 0 %2r1
6 1 7 se
2 experiment. To estimate only dominance effects there
M%1 ¼ 4 0 2r 0 5 : will be 5 replications of each homozygous genotype
n
%2r1 0 %2r ð2r1 %1Þ and 10 replications of the heterozygote group. Finally, for
estimation of both additive and dominance effects, the A-
Figure 1 plots changes in the variances of the pa- optimal design has 6 replications for each homozygote
rameter estimates with the proportion (r) of each and the remaining 8 replications for the heterozygote.
homozygous group. If one is interested solely in the Treatment-to-unit allocation: After the treatments have
(trans-acting) additive effects, just homozygotes should been chosen (i.e., the number of replications for each ge-
be considered in the design. To estimate (trans-acting) notypic group), the next step in planning the microarray
950 J. S. S. Bueno Filho, S. G. Gilmour and G. J. M. Rosa
TABLE 2
Optimal number of replications of each genotype in a two-locus experiment with no epistasis,
using 20 experimental units
Parameters of interest
Genotype Optimal proportions a’s d’s (a and d)’s
AABB r.1r.2 0.25n ¼ 5 0.0625n ' 1 '0.0858n ' 2
AABb r.1(1 % 2r.2) 0n ¼ 0 0.125n ' 3 '0.1213n ' 2
AAbb r.1r.2 0.25n ¼ 5 0.0625n ' 1 '0.0858n ' 2
AaBB (1 % 2r.1)r.2 0n ¼ 0 0.125n ' 3 '0.1213n ' 2
AaBb (1 % 2r.1)(1 % 2r.2) 0n ¼ 0 0.25n ' 4 '0.1716n ' 4
Aabb (1 % 2r.1)r.2 0n ¼ 0 0.125n ' 3 '0.1213n ' 2
aaBB r.1r.2 0.25n ¼ 5 0.0625n ' 1 '0.0858n ' 2
aaBb r.1(1 % 2r.2) 0n ¼ 0 0.125n ' 3 '0.1213n ' 2
aabb r.1r.2 0.25n ¼ 5 0.0625n ' 1 '0.0858n ' 2
952 J. S. S. Bueno Filho, S. G. Gilmour and G. J. M. Rosa
all! Similar conclusions can be generalized to situations In selecting an optimal design, careful thought must
with more than two loci and larger numbers of slides. be given to which classes of designs are considered as
It is interesting to note how different the designs candidates. In this situation, it is reasonable to insist on
of Figure 3, which explore a natural structure on treat- choosing a treatment set that is symmetrical with respect
ment groups, are from the optimal designs for two-color to the two gene loci, i.e., that is genewise balanced. The
microarray experiments involving unstructured treat- best genewise balanced design is given in Figure 5. Note
ments, which tend toward symmetry whenever possible. that in all slides one locus has its genotype repeated
For example, Figure 4 depicts two experiments, one while the other varies. This allows us to infer each
involving four treatments and 10 slides (Figure 4A) and genomic expression and interaction independently but
the other with nine treatments and 9 slides (Figure 4B). at the expense of decreasing precision for the main
The experiment with nine treatments is an example of effects parameters previously of interest (additive and
the so-called loop (or circular) design commonly found dominance effects).
Design of Genetical Genomics Studies 953
TABLE 3
Examples of hierarchical and factorial mating systems with
nine progeny
Hierarchical Factorial
Progeny Dam Sire Dam Sire
1 1 1 1 1
2 2 1 2 1
3 3 1 3 1
4 4 2 1 2
5 5 2 2 2
6 6 2 3 2
Figure 5.—Optimal design for estimating trans-acting ep- 7 7 3 1 3
istatic effects between two genes, with 10 slides in a two-color 8 8 3 2 3
microarray experiment. 9 9 3 3 3
2 3
1 0:25 0:25 0 0 0 0 0 0
GENETIC STUDIES WITH RANDOM TREATMENT 6 0:25 1 0:25 0 0 0 0 0 0 7
EFFECTS 6 7
6 0:25 0:25 1 0 0 0 0 0 0 7
6 7
6 0 0 0 1 0:25 0:25 0 0 0 7
Multiple-treatment microarray experiments: Apart 6 7
6
Ah ¼ 6 0 0 0 0:25 1 0:25 0 0 0 7
from those situations in which 2L or 3L factorials arise 6 0
7
6 0 0 0:25 0:25 1 0 0 0 77
from targeting L genes, there are other genetic objec- 6 0
6 0 0 0 0 0 1 0:25 0:25 7
7
tives that need many treatments. One common goal in 4 0 0 0 0 0 0 0:25 1 0:25 5
genetic studies is to obtain good estimates of genetic var- 0 0 0 0 0 0 0:25 0:25 1
iances and heritabilities. Within a genetical genomics
context this means studying the heritability of the and
mRNA abundance of thousands of genes, having only 2 3
1 0:25 0:25 0:25 0 0 0:25 0 0
a sample of n subjects (with a specific relationship struc- 6 0:25 1 0:25 0 0:25 0 0 0:25 0 7
6 7
ture among them) to look at. 6 0:25
6 0:25 1 0 0 0:25 0 0 0:25 7
7
In genetic models with random treatments, the treat- 6 0:25 0 0 1 0:25 0:25 0:25 0 0 7
6 7
ment effects are often the subject effects. This is the Af ¼ 6
6 0 0:25 0 0:25 1 0:25 0 0:25 0 7 7
6 0 0 0:25 0:25 0:25 1 0 0 0:25 7
case, for example, with the so-called animal models 6
6 0:25
7
6 0 0 0:25 0 0 1 0:25 0:25 7
7
(Lynch and Walsh 1998), which have been extensively 4 0 0:25 0 0 0:25 0 0:25 1 0:25 5
used in animal breeding and more recently in plant 0 0 0:25 0 0 0:25 0:25 0:25 1
breeding (mainly in forestry). For a comprehensive pre-
sentation of modern analytical techniques for such
models refer to Sorensen and Gianola (2002). (Lynch and Walsh 1998), where the indices h and f
For these situations we have the problem of finding refer to the hierarchical and factorial mating systems,
good designs to obtain estimates using the best linear respectively.
unbiased predictors of breeding values, given a known Restricting our discussion to the estimation of addi-
pedigree or family structure. The best designs are very tive variance components, it can be seen that in such
specific for each pedigree and estimation objective, as cases we need to find both the best replication for each
discussed below. treatment and the best treatment-to-unit allocation. It
Design criterion: Bueno Filho and Gilmour (2003) is worth noting that in this case ‘‘treatments and rep-
argued that a suitable criterion for selecting among lications’’ can refer either to families and sibs within
related treatments can be obtained as a weighted families or to biological individuals and technical
average of the variances of all pairwise contrasts. This samples from the same individual.
criterion has a broad relevance and estimating genetic As an illustration, consider two replications of each
variance components and heritabilities is one of the treatment (individual). For this kind of group-divisible
possible goals. In the following examples we apply this treatment set, it is suggested (Bueno Filho and
criterion to two small experimental situations to illus- Gilmour 2003) that the best designs are in the class of
trate some features of optimal designs. In Table 3, two regular graph designs (e.g., loop designs), but the allo-
situations are presented, involving nine animals com- cation of treatments to treatment labels is important.
ing from either a hierarchical or a factorial mating The best design is one that avoids relatives (in this case,
system. paternal or maternal half-sibs) in the same array, as
These two different pedigrees result in the additive much as possible. Therefore, the resulting optimal de-
(numerator) relationship matrices signs are more restrictive for the factorial than for the
954 J. S. S. Bueno Filho, S. G. Gilmour and G. J. M. Rosa
TABLE 4
Litter, treatment (drug), and sex of 20 piglets available
for microarray experiments for inferring the effects
of sex, drug, and the interaction between sex
and drug, as well as the heritability
It has been shown how the design choice should be LITERATURE CITED
intimately related to the goal of the experiment, such as Bailey, R. A., 1981 A unified approach to design of experiments.
the contrast between fixed effects (e.g., the comparison J. R. Stat. Soc. A 144: 214–223.
of alternative drugs or the expected phenotypic value of Bailey, R. A., 1997 Statistics and mathematics: the appropriate use
of mathematics within statistics (with discussion). Statistician 47:
different genotypes), or inference regarding variance 261–271, 273–290.
components (such as genetic variances), or heritability Brem, R. B., G. Yvert, R. Clinton and L. Kruglyak, 2002 Genetic
of transcriptional levels. dissection of transcriptional regulation in budding yeast. Science
296: 752–755.
Sometimes it is possible to have an analytical solution Bueno Filho, J. S. S., and S. G. Gilmour, 2003 Planning incomplete
for the optimality problem, but in many situations a block experiments when treatments are genetically related.
numerical solution is needed. Whenever a numerical Biometrics 59: 375–381.
Bystrykh, L., E. Weersing, B. Dontje, S. Sutton, M. T. Pletcher
solution is implemented, a complete search (comparing et al., 2005 Uncovering regulatory pathways affecting hemato-
all possible designs) guarantees the optimal finding. poietic stem cell function using ‘‘genetical genomics.’’ Nat.
A complete search, however, is often unattainable due Genet. 37: 225–232.
Chesler, E. J., L. Lu, J. Wang, R. W. Williams and K. F. Manly,
to the size of the search problem involved. In these si- 2004 WebQTL: rapid exploratory analysis of gene expression
tuations algorithms are needed that can find optimal or and genetic networks for brain and behavior. Nat. Neurosci. 7:
near-optimal designs. R functions were developed for 485–486.
Darvasi, A., 2003 Gene expression meets genetics. Nature 422: 269–
searching and comparing the different designs discussed 271.
in this article; they can be obtained directly from the Darvasi, A., and M. Soller, 1992 Selective genotyping for determi-
authors. nation of linkage between a marker locus and a quantitative trait
locus. Theor. Appl. Genet. 85: 353–359.
As a final remark, it is important to note that this Draper, N. R., and I. Guttman, 1997 Two-level factorial and frac-
article considers a single gene at a time for design pur- tional factorial designs in blocks of size two. J. Qual. Technol.
poses, as the same model is generally used to analyze all 29: 71–75.
Fedorov, V. V., 1972 Theory of Optimal Experiments. Academic Press,
genes in a single experiment. This is not an issue for New York.
fixed-effects models (such as for the examples compar- Glonek, G. F. V., and P. J. Solomon, 2004 Factorial and time course
ing genotypic groups) as the optimal design criteria for designs for cDNA microarray experiments. Biostatistics 5: 89–
111.
fixed-effects models depend only on the design matrix. Henderson, C. R., 1976 Simple method for computing inverse of a
This is not true in mixed- or random-effects models, as numerator relationship matrix used in prediction of breeding
in these cases optimal designs depend on both the values. Biometrics 32: 69–83.
Jansen, R. C., and J. Nap, 2001 Genetical genomics: the added value
design matrix and the variance components parame- from segregation. Trends Genet. 17: 388–391.
ters. Since the variance parameters vary among genes, Jin, C., H. Lan, A. D. Attie, G. A. Churchill, D. Bulutoglu et al.,
the optimal design for one gene may be different from 2004 Selective phenotyping for increased efficiency in genetic
mapping studies. Genetics 168: 2285–2293.
that for another gene. As a consequence, caution is Kerr, M. K., 2003 Design considerations for efficient and effective
needed before we simplify the optimal design for the microarray studies. Biometrics 59: 822–828.
microarray to an optimal design for a single gene if Kerr, M. K., 2006 2^k factorials in blocks of size 2, with application to
two-color microarray experiments. UW Biostatistics Working Paper
random- or mixed-effects models are considered. Series No. 227 (http://www.bepress.com/uwbiostat/paper227).
This is a very interesting topic, which deserves further Kerr, M. K., and G. A. Churchill, 2001 Statistical design and the
research. A possible approach to this issue would be to analysis of gene expression microarray data. Genet. Res. 77: 123–
128.
use an empirical distribution for the genetic variances Kiefer, J. C., and J. Wolfowitz, 1960 The equivalence of two extre-
or heritabilities to weight the decision function that mum problems. Can. J. Math. 12: 363–366.
yields the design criterion. Alternatively, ‘‘robust’’ de- Lander, E. S, and D. Botstein, 1989 Mapping Mendelian factors
underlying quantitative traits using RFLP linkage maps. Genetics
signs could be sought by focusing on the high herita- 121: 185–199.
bility genes. This could be achieved by putting more Lockhart, D. J., H. L. Dong, M. C. Byrne, M. T. Follettie, M. V.
weight on the higher heritabilities than suggested by an Gallo et al., 1996 Expression monitoring by hybridization to
high-density oligonucleotide arrays. Nat. Biotech. 14: 1675–
empirical distribution or by even ignoring the region of 1680.
low heritability when searching for good designs. An Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative
extreme version of this approach would be by taking Traits. Sinauer, Sunderland, MA.
Mead, R., 1990 The nonorthogonal design of experiments (with dis-
genetic effects as fixed, but in general this would be an cussion). J. R. Stat. Soc. A 153: 151–201.
inefficient strategy. Moreover, while modeling genetic Monks, S. A., A. Leonardson, H. Zhu, P. Cundiff, P. Pietrusiak
effects as fixed may be simple with experiments in- et al., 2004 Genetic inheritance of gene expression in human
cell lines. Am. J. Hum. Genet. 75: 1094–1105.
volving family structures such as half- or full-sibs, it is Noueiry, A. O., J. Chen and P. Ahlquist, 2000 A mutant allele of
not clear how this approach could be easily extended essential, general translation initiation factor DED1 selectively in-
for complex pedigrees. hibits translation of a viral mRNA. Proc. Natl. Acad. Sci. USA 97:
12985–12990.
Rosa, G. J. M., J. P. Steibel and R. J. Tempelman, 2005 Reassessing
This work was supported by the Michigan State University Intra- design and analysis of two-color microarray experiments using
mural Research Grant Program, by grant 2004-33120-15204 from the mixed effects models. Comp. Funct. Genomics 6: 123–131.
United States Department of Agriculture to G.J.M.R., and by grant Schadt, E. E., S. A. Monks, T. A. Drake, A. J. Lusis, N. Che et al.,
474787/2004-4 from the Conselho Nacional de Desenvolvimento 2003 Genetics of gene expression surveyed in maize, mouse
Cientı́fico e Tecnológico-Brazil to J.S.S.B.F. and man. Nature 422: 297–302.
Design of Genetical Genomics Studies 957
Schena, M., D. Shalon, R. W. Davis and P. O. Brown, 1995 Quan- Wang, P. C., 2004 Designing two-level fractional factorial experi-
titative monitoring of gene-expression patterns with a comple- ments in blocks of size two. Sankhya 66: 327–342.
mentary-DNA microarray. Science 270: 467–470. Wit, E., A. Nobile and R. Khanin, 2005 Near-optimal designs for
Sorensen, D., and D. Gianola, 2002 Likelihood, Bayesian, and MCMC dual channel microarray studies. Appl. Stat. 54: 817–830.
Methods in Quantitative Genetics. Springer, New York. Wolfinger, R. D., G. Gibson, E. D. Wolfinger, L. Bennett,
Steibel, J. P., and G. J. M. Rosa, 2005 On reference designs for mi- H. Hamadeh et al., 2001 Assessing gene significance from
croarray experiments. Stat. Appl. Gen. Mol. Biol. 4(1): Article 36 cDNA microarray expression data via mixed models. J. Comp.
(http://www.bepress.com/sagmb/vol4/issl/art36/). Biol. 8: 625–637.
Tempelman, R. J., 2005 Assessing statistical precision, power, and ro- Yang, Y. H., and T. Speed, 2002 Design issues for cDNA microarray
bustness of alternative experimental designs for two color micro- experiments. Nat. Rev. Genet. 3: 570–588.
array platforms based on mixed effects models. Vet. Immunol. Yang, Y. J., and N. R. Draper, 2003 Two-level factorial and fractional
Immunopathol. 105: 175–186. factorial designs in blocks of size two. J. Qual. Technol. 35: 294–
Vinciotti, V., R. Khanin, D. D’Alimonte, X. Liu, N. Cattini et al., 305.
2005 An experimental evaluation of a loop versus a reference
design for two-channel microarrays. Bioinformatics 21: 492–501. Communicating editor: J. B. Walsh