M. A. Elzo
The online version of this article, along with updated information and services, is located on
the World Wide Web at:
http://jas.fass.org
www.asas.org
M. A. Elzo
Animal Science Department, University of Florida, Gainesville 32611
ABSTRACT: Restricted maximum-likelihood procedures were developed to estimate additive and nonadditive genetic and environmental covariances for multiple traits in multibreed populations. The computational procedure follows the expectation-maximization (EM) algorithm, where the set of equations in the maximization step is solved by successive approximations. This computational procedure does not guarantee convergence to a symmetric positive-definite covariance matrix. Thus, computer programs will need to incorporate restrictions in the maximization step to ensure positive definiteness of each covariance matrix. Additive genetic and environmental covariances were modeled in subclass form (zeros and ones in the design matrices). Nonadditive genetic covariances were modeled in regression form (any value between and including zero and one in the design matrices). Computational requirements will be larger than for intrabreed analyses. Appropriate simplifying assumptions and numerical techniques (e.g., sparse and iterative numerical techniques) will be required for the implementation of these multibreed covariance estimation procedures. Number of iterations (5 to 12) and computing times (57 to 113 min) to achieve convergence when estimating 21 genetic and environmental covariances in five small simulated multibreed data sets (two breeds, 25,200 to 50,400 calves, 120 to 135 unrelated bulls) suggest that these procedures are computationally feasible.
Key Words: Maximum Likelihood, Variance Components, Genetic Parameters, Population Structure
¹Florida Agricultural Experiment Station Journal Series No. R-03642. The author thanks R. L. Quaas for his help and comments during the course of this research and D. D. Hargrove and T. A. Olson for reviewing the manuscript.
Received February 7, 1994.
Accepted June 20, 1994.

Although crossbreeding is widely practiced in the United States, the active genetic basis of the beef industry is formed by a large number of breeds (e.g., Angus, Brahman, Hereford, Limousin, Simmental) that act independently of one another. Consequently, genetic evaluation and selection of parents are still formally carried out within each breed, as evidenced by the Guidelines for Uniform Beef Improvement Programs (BIF, 1990). Unfortunately, intrabreed EPD cannot be used to compare bulls of the same or different breeds for crossbreeding purposes because they consider only additive genetic effects (each breed has a different additive genetic base) and they ignore nonadditive genetic effects (defined here as the combining ability of a bull when mated to dams of various breed compositions). If bulls are to be compared across breeds and crossbred groups, both additive and nonadditive genetic effects will need to be accounted for (Elzo and Famula, 1985). In a multibreed genetic evaluation, each breed and crossbred group may have different values for additive genetic variances and covariances (Elzo, 1990a; Lo et al., 1994). Similarly, each breed group combination may have different values for nonadditive genetic variances and covariances (Elzo, 1990b). Environmental variances and covariances may also differ across breeds and crossbred groups. The large number of sets of additive, nonadditive, and environmental covariances that need to be estimated simultaneously can be drastically reduced if multibreed covariances are assumed to be linear functions of a small set of covariances.

The current method of choice to estimate covariances using animal breeding data is REML. However, existing REML procedures can only estimate a single set of covariances. Thus, the objective of this research was to develop REML procedures for multibreed populations that 1) account for heterogeneity of covariances across genetic groups of animals, 2) express additive and nonadditive genetic and environmental covariances of genetic groups as linear combinations of a small number of covariances, and 3) simultaneously estimate all the sets of covariances
used to compute all the additive and nonadditive genetic and environmental covariances of any genetic group given a set of base breeds.

Development of the Restricted Maximum Likelihood Procedure to Estimate Covariances in Multibreed Populations

The computational procedure is based on the expectation-maximization (EM) algorithm (Dempster et al., 1977), where the maximization step is accomplished by iteration.
The description of this procedure requires the use of unfamiliar terminology as well as definitions of multibreed additive, nonadditive, environmental, and residual covariances. Thus, these preliminary aspects will be explained first.

Definition of Terms

Multibreed population: a population composed of straightbred and crossbred animals.
Breed group: a group of animals whose genetic composition falls within a range of fractions of breeds; for example, if five breed groups are constructed to group animals in a two-breed population (A = breed 1, B = breed 2), the group ranges could be as follows: group 1 = (1.0 to .81)A (.0 to .19)B, group 2 = (.80 to .61)A (.20 to .39)B, group 3 = (.60 to .41)A (.40 to .59)B, group 4 = (.40 to .21)A (.60 to .79)B, and group 5 = (.0 to .19)A (1.0 to .81)B.
Regression model: a model that defines multibreed bull nonadditive effects in terms of intra- and interbreed interactions between alleles at l loci, l = 1, ..., L.
Bull model: an abbreviation of sire-maternal grandsire model.
Additive intrabreed genetic covariance: a covariance due to additive genetic effects within a breed.
Additive interbreed genetic covariance: a covariance arising from differences between intrabreed means of additive genetic effects; it is equal to twice the segregation covariance (Lo et al., 1994).
Additive multibreed genetic covariance: an additive covariance for animals in a multibreed population; equal to either an additive intrabreed genetic covariance (straightbred animals) or a weighted sum of additive intrabreed and interbreed genetic covariances (progeny of at least one crossbred parent).
Nonadditive configuration: a representation of l loci using the breed of origin of the alleles. For two breeds, A and B, there are four configurations at one locus: A/A, A/B, B/A, and B/B; three configurations result if A/B and B/A are defined as one configuration. A possible set of configurations for one and two loci is shown in Elzo (1990b).
Nonadditive intraconfiguration genetic covariance: a covariance due to nonadditive genetic effects caused by the interaction between alleles of one or more breeds within a nonadditive configuration. Nonadditive configurations are to nonadditive genetic covariances as breeds are to additive genetic covariances.
Environmental intrabreed genetic covariance: a covariance due to environmental effects within a breed.
Environmental interbreed genetic covariance: a covariance arising from differences between intrabreed means of environmental effects.
Environmental multibreed genetic covariance: an environmental covariance equal to either an intrabreed environmental covariance (straightbred animals) or a weighted sum of intrabreed and interbreed environmental covariances (progeny of one or two crossbred parents).
Residual intrabreed, interbreed, and multibreed genetic covariances: weighted sums of additive and environmental intrabreed, interbreed, and multibreed covariances.

Assumptions

The following assumptions are made: 1) traits are determined by a large number of unlinked loci, 2) random segregation and assortment of alleles occur during meiosis, 3) no inbreeding, and 4) covariances remain constant over time.

Additive Multibreed Genetic Covariances

Additive genetic covariances for each breed group combination are assumed to be different. These covariances are equal to the sum of two terms. The first term is equal to the weighted sum of the intrabreed covariances for traits Y and Z, where the weights are the expected frequencies of each breed in the gth breed group combination (Elzo, 1983, 1990a; Lo et al., 1994). The second term is equal to the weighted sum of the interbreed covariances for traits Y and Z, where the weights are the sum of the product of the expected breed frequencies in the parental breed groups (Lo et al., 1994). This second term was assumed to be zero by Elzo (1990a). Inclusion of the second term in the computation of multibreed additive covariances does not affect the rules to compute the matrix of covariances among bull additive genetic effects (G_a) or its inverse (G_a^{-1}).
Thus, the additive genetic covariance between traits Y and Z for an animal in a noninbred multibreed population is

\[
(\sigma_{aYZ})^{i} = \sum_{b=1}^{nb} p_{b}^{i}\,(\sigma_{aYZ})_{b} + \sum_{b=1}^{nb-1}\sum_{b'>b}^{nb} \left(p_{b}^{s}\,p_{b'}^{s} + p_{b}^{d}\,p_{b'}^{d}\right)(\sigma_{aYZ})_{bb'} \qquad [1]
\]

where nb = number of breeds, p_b^i = expected fraction of breed b in animal i, p_b^s and p_b^d = expected fractions of breed b in the sire and dam of animal i, (σ_aYZ)_b = additive intrabreed covariance for breed b, and (σ_aYZ)_bb' = additive interbreed covariance for the pair of breeds b and b'.
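The two-term weighted sum described above for the additive multibreed covariance can be illustrated numerically. The following is a minimal sketch, not the author's program: the function name, the argument layout, the use of the parental average as the animal's expected breed fraction, and the reading of the second-term weights as the sum over parents of the products of their breed fractions are all assumptions made for illustration, and the covariance values are illustrative only.

```python
from typing import Dict, Sequence, Tuple


def multibreed_additive_cov(
    p_sire: Sequence[float],
    p_dam: Sequence[float],
    intrabreed: Sequence[float],
    interbreed: Dict[Tuple[int, int], float],
) -> float:
    """Additive multibreed covariance for one pair of traits (hypothetical helper).

    p_sire, p_dam : expected breed fractions of the sire and the dam.
    intrabreed    : intrabreed covariance for each breed.
    interbreed    : interbreed covariance for each breed pair (b, c), b < c.
    """
    nb = len(p_sire)
    # Assumption: the animal's expected breed fractions are the parental average.
    p_animal = [(s + d) / 2.0 for s, d in zip(p_sire, p_dam)]
    # First term: intrabreed covariances weighted by the animal's breed fractions.
    first = sum(p_animal[b] * intrabreed[b] for b in range(nb))
    # Second term: interbreed covariances weighted by the sum, over the two
    # parents, of the products of their breed fractions (assumed weights).
    second = sum(
        (p_sire[b] * p_sire[c] + p_dam[b] * p_dam[c]) * interbreed[(b, c)]
        for b in range(nb - 1)
        for c in range(b + 1, nb)
    )
    return first + second


# Illustrative values for two breeds (index 0 = A, index 1 = B).
intra = [4.0, 6.0]
inter = {(0, 1): 2.0}

straightbred_a = multibreed_additive_cov([1.0, 0.0], [1.0, 0.0], intra, inter)
f1 = multibreed_additive_cov([1.0, 0.0], [0.0, 1.0], intra, inter)
f2 = multibreed_additive_cov([0.5, 0.5], [0.5, 0.5], intra, inter)
print(straightbred_a, f1, f2)
```

Under these assumptions the interbreed term vanishes for straightbred animals and for progeny of two straightbred parents, and contributes only when at least one parent is crossbred, which agrees with the definition of the additive multibreed genetic covariance given in the Definition of Terms.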
\[
(\sigma_{eYZ})^{i} = \sum_{b=1}^{nb} p_{b}^{i}\,(\sigma_{eYZ})_{b} + \sum_{b=1}^{nb-1}\sum_{b'>b}^{nb} \cdots\,(\sigma_{eYZ})_{bb'} \qquad [2]
\]

where the superscript i represents an individual animal, the subscripts b and b' represent two breeds, (σ_eYZ)_b = environmental intrabreed covariance for breed b, and (σ_eYZ)_bb' = environmental interbreed covariance for the pair of breeds b and b'.
The structure of the residual covariances will depend on 1) the additive model used (animal, reduced animal, sire-dam, bull, sire model), 2) the ancestors identified on an animal with records, and 3) the assumptions made with respect to multibreed environmental covariances.
An expression for the multibreed residual covariance between traits Y and Z for a bull model is the following:

[Equation 3]

Z = incidence matrix relating records to elements of u; G = matrix of covariances among elements of u; and R = matrix of covariances among elements of v.
Bulls are assumed to be unrelated. Bull genetic effects due to different nonadditive configurations are assumed to be uncorrelated among themselves and to additive genetic effects. Thus, the matrix G is block diagonal (one block per bull), with blocks

\[
\begin{bmatrix}
G_{0ag} & 0 & 0 & \cdots & 0 \\
0 & G_{0n1} & 0 & \cdots & 0 \\
0 & 0 & G_{0n2} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & G_{0nK}
\end{bmatrix}
\]

where
G0ag = nt × nt matrix of additive direct genetic covariances for bulls of parental breed combination g, g = 1, ..., nbgcom, where nbgcom = number of different breed group combinations (nbg(nbg + 1)/2, nbg = number of breed groups); covariances in G0ag are computed using Equation [1];
G0nk = nt × nt matrix of nonadditive direct genetic covariances; k = 1, ..., K, where K = Σ n_l (summed over l = 1, ..., L), and n_l = number of interaction effects at l loci.

Calves are assumed to be related only through their sires and(or) maternal grandsires. Thus, residual effects are correlated only within a calf. Consequently, the matrix R is block diagonal, with blocks equal to R0m*, for m = 1, ..., M, and M = number of residual subclasses (defined according to the various intra- and interbreed genetic and environmental factors contained in them). The R0m* are nt × nt matrices of residual covariances with zeros in the rows and columns corresponding to missing traits in a calf. Elements of R0m* are computed using Equation [3].
To simplify the notation, let φ = [φ_a' φ_n' φ_e']', where φ_a is a vector of additive genetic covariances (the σ_aYZ in Equation [1]), φ_n is a vector of nonadditive genetic covariances (the (σ_nYZ)_l), and φ_e is a vector of environmental covariances (the σ_eYZ in Equation [2]). Also, let the number of 1) additive genetic covariances be N_a, 2) nonadditive genetic covariances be N_n, 3) environmental covariances be N_e, and 4) N_t = N_a + N_n + N_e. Thus, each covariance in G0ag, G0nk, and R0m* is a linear function of elements of φ.

Computational Procedure

The derivation of the computational procedure used to obtain REML estimates of covariances (φ) in multibreed populations is described in the Appendix. The computational procedure makes use of the Expectation Maximization (EM) algorithm (Dempster et al., 1977). The EM algorithm is an iterative procedure that has an expectation step (E-step) and a maximization step (M-step) in each iteration. The E-step requires the computation of sums of products of predicted values of random effects plus their corresponding error variances of predictions (EVP). In the M-step, φ(p+1) is computed by iteration, where φ(p+1) is the value of φ that maximizes Q(φ | φ(p)) (Equation [3], Appendix). Thus, at convergence, the M-step produces the covariance estimates for the (p + 1)th EM iteration. The M-step is accomplished by iteration because the differentiation of Q(φ | φ(p)) with respect to φ results in a nonlinear set of equations. The computing algorithm is as follows:

Step 0. 1) Define a set of initial covariance values.
        2) Compute the matrices of derivatives of additive, nonadditive, and residual covariance matrices with respect to φ.
E-Step. 1) Compute the additive and nonadditive genetic and environmental sets of multibreed covariance matrices needed in the construction of the MME of the bull model.
        2) Compute the predicted values of u_a and u_n (by solving the MME of the bull model) and v (by Equation [7], Appendix). Also, compute the EVP of u_a and u_n (using elements of the inverse of the MME) and v (by Equation [8], Appendix).
        3) Compute S0ag.(p) (Equation [4], Appendix), S0nk.(p) (Equation [5], Appendix), and S0m.(p) (Equation [6], Appendix).
M-Step. 1) Compute φ(p+1) by successive approximations (i.e., Scoring iterations) using Equations [12] (Appendix). If the absolute differences between the estimates φ(p+1) and φ(p) are less than or equal to a vector of small values ε (Equations [13] and [14], Appendix), then stop; otherwise, go back to the E-step and continue with the EM iterations.

Analyses of Simulated Data Sets

Five data sets were simulated and covariance components estimated using the methodology presented here. The purpose of these analyses was to obtain some information on the computer times and the number of EM and Scoring iterations required to achieve convergence in small data sets. Computations were carried out on an IBM RS6000 workstation, model 580, using a computer program (written in FORTRAN and compiled using the AIX XL FORTRAN Compiler/6000 without any optimization) based on the procedures described here. This program used the FSPAK sparse-matrix routines (Perez-Enciso and Misztal, personal communication) to invert the left-hand side of the MME.

Simulation of Data

Two breeds (A and B), two traits per calf, and only direct genetic effects were considered. Three additive genetic effects, one nonadditive genetic effect, three environmental effects, and two sex effects were used in the simulation of calf records. All effects, except sex of calf, were simulated as random effects. The additive genetic effects were additive intrabreed A, additive intrabreed B, and additive interbreed AB. The nonadditive genetic effect was intraconfiguration 11 (one A allele and one B allele at one locus). Environmental effects were intrabreed A, intrabreed B, and interbreed AB. Sex effects were male and female.
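The EM loop outlined under Computational Procedure (E-step statistics built from predicted values plus their EVP, an M-step, and a convergence check on the absolute change in the estimates) can be illustrated on a far simpler model than the multibreed bull model. The sketch below is an assumption-laden toy, not the procedure of this paper: it fits a one-way random model y_ij = u_i + e_ij with zero mean and no fixed effects, so the M-step has a closed form and no Scoring iterations or mixed-model equations are needed. The function name, starting values, tolerance, and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def em_one_way(
    groups: Sequence[Sequence[float]],
    var_u: float,
    var_e: float,
    eps: float = 1e-8,
    max_iter: int = 10000,
) -> Tuple[float, float, int]:
    """EM estimates of (var_u, var_e) for y_ij = u_i + e_ij (toy model)."""
    q = len(groups)                          # number of random-effect levels
    n_total = sum(len(g) for g in groups)    # number of records
    for iteration in range(1, max_iter + 1):
        ss_u = 0.0
        ss_e = 0.0
        for g in groups:
            n = len(g)
            # E-step: predicted effect and its error variance of prediction.
            shrink = n * var_u / (n * var_u + var_e)
            u_hat = shrink * sum(g) / n
            evp = var_u * var_e / (n * var_u + var_e)
            # Sums of squares of predictions plus EVP.
            ss_u += u_hat ** 2 + evp
            ss_e += sum((y - u_hat) ** 2 for y in g) + n * evp
        # M-step: closed form here (iterative in the multibreed case).
        new_u = ss_u / q
        new_e = ss_e / n_total
        # Convergence: absolute change of every estimate is at most eps.
        converged = abs(new_u - var_u) <= eps and abs(new_e - var_e) <= eps
        var_u, var_e = new_u, new_e
        if converged:
            return var_u, var_e, iteration
    raise RuntimeError("EM did not converge within max_iter iterations")


# Hypothetical data: four groups with three records each.
data: List[List[float]] = [
    [1.0, 2.0, 1.5],
    [-1.0, -0.5, -1.5],
    [0.2, -0.3, 0.1],
    [2.5, 2.0, 3.0],
]
est_u, est_e, n_iter = em_one_way(data, var_u=1.0, var_e=1.0)
print(est_u, est_e, n_iter)
```

In this toy case the estimates stay positive automatically because each update is a sum of squares plus a positive EVP. That property is lost when the M-step is solved by Scoring iterations on covariance matrices, which is why restrictions are needed in the multibreed procedure.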
                                      Pairs of traits
Covariance
Additive
  Intrabreed A            4.0^a           3.0             40.0
                          3.9^b           2.7             40.8
                          (1.7, 5.7)^c    (-.6, 5.6)      (17.0, 55.0)
  Intrabreed B            6.0             4.0             60.0
                          6.2             5.2             52.4
                          (3.9, 7.6)      (3.9, 6.8)      (41.7, 60.3)
  Interbreed AB           2.0             4.0             20.0
                          1.9             1.8             27.0
                          (1.2, 3.2)      (-3.3, 5.8)     (11.1, 48.7)
Nonadditive (1 locus)
  Intraconfiguration 11   3.0             4.0             30.0
                          4.8             5.8             38.9
                          (4.1, 5.3)      (5.0, 7.6)      (32.2, 53.1)
Environmental
  Intrabreed A            6.0             7.0             90.0
                          6.2             7.0             87.1
                          (4.9, 7.6)      (5.7, 9.0)      (72.4, 109.8)
  Intrabreed B            14.0            10.0            240.0
                          14.1            8.8             238.9
                          (12.9, 15.6)    (7.5, 9.4)      (233.1, 249.3)
  Interbreed AB           4.0             8.0             60.0
                          4.3             10.1            67.8
                          (3.5, 4.7)      (7.5, 12.7)     (50.4, 85.0)
^a Covariance prior.
^b Mean of five REML estimates.
^c (Smallest, largest) value among five REML estimates.
which is assumed to exist for all pairs (φ', φ), where x = complete data and z = incomplete data. Also, f(x | φ) is assumed to be an increasing function almost everywhere.
The incomplete data used here are defined to be a linear combination of the vector of observations, K'y, where K' is a matrix of contrasts such that K'X = 0. The complete data are considered to be the vectors of unknown random effects in the model (u and v). The log-likelihood of the complete data, L_c, is:
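\[
L_c = \ln[f(u_a \mid \phi)] + \ln[f(u_n \mid \phi)] + \ln[f(v \mid \phi)],
\]

where, apart from constant terms,

\[
\ln[f(u_a \mid \phi)] = -\frac{1}{2} \sum_{ag=1}^{nbgcom} \sum_{i=1}^{n_{ag}} \left[\ln|G_{0ag}| + u_{agi}'\,G_{0ag}^{-1}\,u_{agi}\right],
\]

\[
\ln[f(u_n \mid \phi)] = -\frac{1}{2} \sum_{k=1}^{K} \sum_{i=1}^{nbu} \left[\ln|G_{0nk}| + u_{nki}'\,G_{0nk}^{-1}\,u_{nki}\right],
\]

and

\[
\ln[f(v \mid \phi)] = -\frac{1}{2} \sum_{m=1}^{M} \sum_{i=1}^{n_m} \left[\ln|R_{0m}^{*}| + v_{mi}'\,(R_{0m}^{*})^{-1}\,v_{mi}\right],
\]

and u_a = vector of bull additive genetic effects; u_n = vector of bull nonadditive genetic effects; n_ag = number of bulls in breed group combination ag; nbu = total number of bulls; and n_m = number of calves in calf group m.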
The EM algorithm is an iterative procedure that has two steps in each iteration: 1) an expectation step (E-step), and 2) a maximization step (M-step). The E-step consists of computing Q(φ | φ(p)), and in the M-step, φ(p+1) is computed, where φ(p+1) is the value of φ that maximizes Q(φ | φ(p)). The M-step is accomplished by iteration because the differentiation of Q(φ | φ(p)) with respect to φ results in a nonlinear set of equations. The derivation of the E-step and the M-step for the (p + 1)th iteration is described below.
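E-step. The function Q(φ | φ(p)), ignoring the constant term, is

\[
Q(\phi \mid \phi^{(p)}) = Q_a(\phi \mid \phi^{(p)}) + Q_n(\phi \mid \phi^{(p)}) + Q_e(\phi \mid \phi^{(p)}), \qquad [3]
\]

where

\[
\begin{aligned}
Q_a(\phi \mid \phi^{(p)}) &= E\left[-\frac{1}{2} \sum_{ag=1}^{nbgcom} \sum_{i=1}^{n_{ag}} \left(\ln|G_{0ag}| + u_{agi}'\,G_{0ag}^{-1}\,u_{agi}\right) \,\middle|\, K'y, \phi^{(p)}\right] \\
&= -\frac{1}{2} \sum_{ag=1}^{nbgcom} \left[n_{ag} \ln|G_{0ag}| + \mathrm{tr}\,G_{0ag}^{-1} \sum_{i=1}^{n_{ag}} E\left[u_{agi}u_{agi}' \mid K'y, \phi^{(p)}\right]\right],
\end{aligned}
\]

where

\[
\begin{aligned}
S_{0ag\cdot}^{(p)} = \sum_{i=1}^{n_{ag}} E\left[u_{agi}u_{agi}' \mid K'y, \phi^{(p)}\right]
&= \sum_{i=1}^{n_{ag}} \left(E[u_{agi} \mid K'y, \phi^{(p)}]\,E[u_{agi} \mid K'y, \phi^{(p)}]' + \mathrm{var}(u_{agi} \mid K'y, \phi^{(p)})\right) \\
&= \sum_{i=1}^{n_{ag}} \left[\hat{u}_{agi}\hat{u}_{agi}' + \mathrm{var}(\hat{u}_{agi} - u_{agi})\right], \qquad [4]
\end{aligned}
\]

and where

\[
Q_n(\phi \mid \phi^{(p)}) = -\frac{1}{2} \sum_{k=1}^{K} \left[nbu \,\ln|G_{0nk}| + \mathrm{tr}\,G_{0nk}^{-1}\,S_{0nk\cdot}^{(p)}\right],
\]

where

\[
S_{0nk\cdot}^{(p)} = \sum_{i=1}^{nbu} E\left[u_{nki}u_{nki}' \mid K'y, \phi^{(p)}\right] = \sum_{i=1}^{nbu} \left[\hat{u}_{nki}\hat{u}_{nki}' + \mathrm{var}(\hat{u}_{nki} - u_{nki})\right], \qquad [5]
\]

and where

\[
Q_e(\phi \mid \phi^{(p)}) = -\frac{1}{2} \sum_{m=1}^{M} \left[n_m \ln|R_{0m}^{*}| + \mathrm{tr}\,(R_{0m}^{*})^{-1}\,S_{0m\cdot}^{(p)}\right],
\]

where

\[
S_{0m\cdot}^{(p)} = \sum_{i=1}^{n_m} E\left[v_{mi}v_{mi}' \mid K'y, \phi^{(p)}\right] = \sum_{i=1}^{n_m} \left[\hat{v}_{mi}\hat{v}_{mi}' + \mathrm{var}(\hat{v}_{mi} - v_{mi})\right]. \qquad [6]
\]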
The EM algorithm requires the function Q(φ | φ(p)) to increase at each EM iteration. This is accomplished by choosing φ(p+1) as the value that maximizes Q(φ | φ(p)). However, to compute φ(p+1) by maximizing Q(φ | φ(p)) in the M-step, only S0ag.(p), S0nk.(p), and S0m.(p) are needed (i.e., the complete Q(φ | φ(p)) function does not need to be computed). Thus, the quantities that need to be computed in the E-step are S0ag.(p), S0nk.(p), and S0m.(p), which are functions of the predicted values of u_a, u_n, and v and their respective EVP. The predicted values of u_a and u_n are obtained by solving the mixed-model equations (MME) for the bull model (Equation [4] in the main text) and their EVP from elements of the inverse of the left-hand side. The predicted values of v are computed as

[Equations 7 and 8]

where the {C_ij} are submatrices of the inverse of the left-hand side of the MME for the bull model.
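M-step. The vector φ(p+1) is computed by maximizing the function Q(φ | φ(p)). This requires differentiating Q_a(φ | φ(p)), Q_n(φ | φ(p)), and Q_e(φ | φ(p)) with respect to φ and equating the resulting set of equations to zero.
The derivative of the additive genetic function is

\[
\frac{\partial Q_a}{\partial \phi_i} = -\frac{1}{2} \sum_{ag=1}^{nbgcom} \left[n_{ag} \sum_{j=1}^{N_a} \mathrm{tr}\left(G_{0ag}^{-1} D_{0agi}\, G_{0ag}^{-1} D_{0agj}\right)\phi_j - \mathrm{tr}\left(G_{0ag}^{-1} D_{0agi}\, G_{0ag}^{-1} S_{0ag\cdot}^{(p)}\right)\right],
\]

where $G_{0ag}^{-1} G_{0ag} G_{0ag}^{-1}$, with $G_{0ag}$ written as $\sum_{j=1}^{N_a} \frac{\partial G_{0ag}}{\partial \phi_j}\phi_j$, was inserted in the first term.
A similar strategy is used to obtain the derivative of the nonadditive and residual functions. Thus, the derivative of the nonadditive genetic function is

\[
\frac{\partial Q_n}{\partial \phi_i} = -\frac{1}{2} \sum_{k=1}^{K} \left[nbu \sum_{j=N_a+1}^{N_a+N_n} \mathrm{tr}\left(G_{0nk}^{-1} D_{0nki}\, G_{0nk}^{-1} D_{0nkj}\right)\phi_j - \mathrm{tr}\left(G_{0nk}^{-1} D_{0nki}\, G_{0nk}^{-1} S_{0nk\cdot}^{(p)}\right)\right],
\]

and the derivative of the residual function has the same form, with $R_{0m}^{*}$, $D_{0m^{*}i}$, $n_m$, and $S_{0m\cdot}^{(p)}$ in place of the genetic terms. Equating the derivatives to zero yields the set of Equations [12], whose elements are

\[
D_{0agi} = \frac{\partial G_{0ag}}{\partial \phi_i}, \qquad D_{0m^{*}i} = \frac{\partial R_{0m}^{*}}{\partial \phi_i}, \qquad \text{and}
\]

\[
B_a = \left\{\sum_{ag=1}^{nbgcom} n_{ag}\,\mathrm{tr}\left(G_{0ag}^{-1} D_{0agi}\, G_{0ag}^{-1} D_{0agj}\right) + \sum_{m=1}^{M} n_m\,\mathrm{tr}\left((R_{0m}^{*})^{-1} D_{0m^{*}i}\,(R_{0m}^{*})^{-1} D_{0m^{*}j}\right)\right\}, \quad \text{for } i, j = 1, \ldots, N_a;
\]

and where

\[
D_{0nki} = \frac{\partial G_{0nk}}{\partial \phi_i}, \qquad \text{and}
\]

\[
B_n = \left\{\sum_{k=1}^{K} nbu\,\mathrm{tr}\left(G_{0nk}^{-1} D_{0nki}\, G_{0nk}^{-1} D_{0nkj}\right)\right\}, \quad \text{for } i, j = N_a + 1, \ldots, N_a + N_n;
\]

\[
B_e = \left\{\sum_{m=1}^{M} n_m\,\mathrm{tr}\left((R_{0m}^{*})^{-1} D_{0m^{*}i}\,(R_{0m}^{*})^{-1} D_{0m^{*}j}\right)\right\}, \quad \text{for } i, j = N_a + N_n + 1, \ldots, N_t;
\]

\[
d_a = \left\{\sum_{ag=1}^{nbgcom} \mathrm{tr}\left(G_{0ag}^{-1} D_{0agi}\, G_{0ag}^{-1} S_{0ag\cdot}^{(p)}\right) + \sum_{m=1}^{M} \mathrm{tr}\left((R_{0m}^{*})^{-1} D_{0m^{*}i}\,(R_{0m}^{*})^{-1} S_{0m\cdot}^{(p)}\right)\right\}, \quad \text{for } i = 1, \ldots, N_a;
\]

\[
d_n = \left\{\sum_{k=1}^{K} \mathrm{tr}\left(G_{0nk}^{-1} D_{0nki}\, G_{0nk}^{-1} S_{0nk\cdot}^{(p)}\right)\right\}, \quad \text{for } i = N_a + 1, \ldots, N_a + N_n;
\]

and

\[
d_e = \left\{\sum_{m=1}^{M} \mathrm{tr}\left((R_{0m}^{*})^{-1} D_{0m^{*}i}\,(R_{0m}^{*})^{-1} S_{0m\cdot}^{(p)}\right)\right\}, \quad \text{for } i = N_a + N_n + 1, \ldots, N_t.
\]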
Equations [12]: 1) are nonlinear in φ, thus, φ(p+1) must be computed iteratively (i.e., by successive approximations [Harville, 1977; Harville and Callahan, 1990]), 2) are equal to those obtained by a Scoring Algorithm applied to maximizing Q(φ | φ(p)) (R. L. Quaas, Cornell University, personal communication), and 3) have no built-in restrictions on the values of the covariances, thus there is no guarantee that all parameters being estimated will be within the parameter space. Thus, computer programs need to incorporate restrictions to ensure that estimates of covariance matrices are symmetric positive definite (or at least symmetric positive semidefinite) at each Scoring iteration and at each EM iteration. Restriction strategies that could be considered include 1) barrier and penalty functions (Fletcher, 1974; Ryan, 1974; Harville, 1977), 2) gradient projection methods (Sargent, 1974; Harville, 1974), and 3) direct search methods (Swann, 1974).
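To make the need for restrictions concrete, the sketch below checks positive definiteness with a Cholesky factorization and, when a proposed update falls outside the parameter space, shortens the step from the current estimate. Step halving is a simpler device swapped in for illustration; it is not one of the barrier, gradient projection, or direct search strategies listed above, and the function names and matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def is_positive_definite(a: Matrix) -> bool:
    """Return True if the symmetric matrix a has a Cholesky factor."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # a non-positive pivot means not positive definite
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def step_halving(current: Matrix, proposed: Matrix, max_halvings: int = 30) -> Matrix:
    """Move from current toward proposed, halving the step until the result
    is positive definite. Assumes current is positive definite."""
    step = 1.0
    for _ in range(max_halvings):
        candidate = [
            [c + step * (p - c) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, proposed)
        ]
        if is_positive_definite(candidate):
            return candidate
        step /= 2.0
    return current  # fall back to the last admissible estimate


# Hypothetical 2 x 2 covariance matrices for two traits.
current_estimate = [[4.0, 3.0], [3.0, 40.0]]
proposed_update = [[4.0, 15.0], [15.0, 40.0]]  # implied correlation above 1
restricted = step_halving(current_estimate, proposed_update)
print(restricted)
```

Because the set of positive-definite matrices is convex, any sufficiently short step from a positive-definite estimate toward the proposed update stays inside the parameter space, so the loop terminates with an admissible matrix.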
The convergence criterion used to stop the Scoring and the EM iterations was that the absolute change in the estimates of covariances of two successive iterations was small (Bard, 1974; Searle et al., 1992). Thus, convergence was achieved when

\[
\left|\phi_i^{(p+1)} - \phi_i^{(p)}\right| \le \epsilon_i, \qquad i = 1, \ldots, N_t, \qquad [13]
\]

where ε is a vector of small numbers. The values of ε can be either set in advance or computed by the program (Bard, 1974). In the second case, Bard (1974) recommended using Marquardt's (1963) expression