
STATISTICS IN MEDICINE
Statist. Med. 2005; 24:3361-3381
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2193

Tutorial in biostatistics: spline smoothing with linear mixed models

Lyle C. Gurrin 1,*, Katrina J. Scurrah 2 and Martin L. Hazelton 3

1 Epidemiology and Biostatistics Unit, University of Melbourne, Australia
2 Department of Physiology, University of Melbourne, Australia
3 School of Mathematics and Statistics, University of Western Australia, Australia

* Correspondence to: Lyle C. Gurrin, Epidemiology and Biostatistics Unit, School of Population Health, University of Melbourne, 2/723 Swanston Street, Carlton, VIC 3053, Australia. E-mail: lgurrin@unimelb.edu.au

Contract/grant sponsors: National Breast Cancer Foundation; National Health and Medical Research Council; Merck, Sharp & Dohme Research Foundation; Canadian Breast Cancer Research Institute

Received 27 January 2004; accepted 3 January 2005

SUMMARY

The semi-parametric regression achieved via penalized spline smoothing can be expressed in a linear mixed models framework. This allows such models to be fitted using standard mixed models software routines with which many biostatisticians are familiar. Moreover, the analysis of complex correlated data structures that are a hallmark of biostatistics, and which are typically analysed using mixed models, can now directly incorporate smoothing of the relationship between an outcome and covariates. In this paper we provide an introduction to both linear mixed models and penalized spline smoothing, and describe the connection between the two. This is illustrated with three examples, the first using birth data from the U.K., the second relating mammographic density to age in a study of female twin pairs and the third modelling the relationship between age and bronchial hyperresponsiveness in families. The models are fitted in R (a clone of S-plus) and using Markov chain Monte Carlo (MCMC) implemented in the package WinBUGS. Copyright © 2005 John Wiley & Sons, Ltd.

KEY WORDS: best linear unbiased prediction; genetic variance components; Markov chain Monte Carlo; mixed models; non-parametric regression; semi-parametric regression; penalized splines; twins; WinBUGS

1. INTRODUCTION

Linear mixed models [1] play a fundamental role in the practice of biostatistics since they extend the linear regression model for continuously valued outcomes to allow for correlated responses, and in particular for the analysis of longitudinal or clustered data within a hierarchical structure. Most commercial statistical software packages offer at least a basic facility

in which to fit mixed models, and many have extensive suites of routines that allow models of arbitrary complexity. By incorporating random effects on the scale of the linear predictor, one can also model correlation between observations where the data are discrete or involve censoring by moving to generalized linear mixed models [1]. As such, linear mixed models are a natural starting point for developing general methods for the analysis of many of the data structures that arise in medical research.
When including continuously valued covariates in a linear mixed model, most biostatisticians take a fully parametric approach derived from linear regression. The mean of the measured outcome is taken to depend on the covariate value in a linear manner, or in a non-linear fashion that one assumes can be captured by a low-order polynomial. There will, however, be circumstances where this approach is at best a crude approximation. A scatter plot of the data may reveal that an apparently strong association between the outcome and covariate nevertheless changes in a complicated way across the full range of the covariate. Alternatively, it may be difficult to discern the form of the relationship, despite a large sample size, due to considerable residual variability in the data. In this case an appropriate parametric model will not be obvious, with low-order polynomial terms providing a poor fit to the data. Even in the case where polynomial terms provide a satisfactory fit, it is unlikely that the true relationship has a simple polynomial form.
Fortunately there is a considerable literature on semi-parametric regression and non-parametric smoothing, where the non-linear association between the outcomes and a covariate is not constrained by the parametric forms described above. Such models seek to describe the local structure of the relationship between outcome and covariate, providing a good fit to the data as we move across the range of the covariate. Such techniques include kernel smoothing, spline smoothing and generalized additive models. Many of the large statistical software packages provide access to a selection of these methods of smoothing. Such implementations are, however, unlikely to be integrated with the conventional regression-based model fitting routines in the same package. Moreover, the use of techniques such as generalized additive models raises issues of statistical inference that are outside the domain of linear regression modelling. For many biostatistical analyses it is unlikely that any attempt will be made to explore complex non-linear relationships involving secondary covariates that potentially confound the relationship between the outcome measure and a primary covariate unless such modelling can be carried out in standard software.
In this paper we seek to elucidate a recent important connection that has been made between semi-parametric regression models that achieve smoothing using penalized splines and linear mixed models. We will demonstrate how the relationship between an outcome and covariate can be modelled non-parametrically within the framework of standard parametric mixed models analysis. Not only does this allow smoothing to be carried out using any of the available software for mixed models, it also allows data analysts to incorporate semi-parametric regression while retaining the benefits of a mixed models approach to the analysis of correlated data structures that are often a feature of biostatistical data analysis.
The paper is arranged as follows. In the next section we provide a review of linear mixed
models and introduce necessary notation. Spline smoothing for non-parametric regression is
covered in Section 3. We begin with spline smoothing with a single covariate, illustrating the
ideas using data on the annual number of live births in the U.K. over the 50 year period from
1865 to 1914. We go on to discuss extensions, including modelling with multiple explanatory
variables (generalized additive models), semi-parametric regression models, and models with
correlated data structures. In Section 4, we exemplify these methods through the analysis of data on mammographic density, a known risk factor for breast cancer, measured for twin pairs. This study focuses on the relationship between mammographic density and age, taking into account the within-pair correlation structure that appears to be due to genetic factors [2]. We find that, cross-sectionally, mammographic density tends to decrease with age after 45 years, but that the relationship is non-linear. A more substantial example is presented in Section 5, where the relationship between age and the log-hazard of bronchial hyperresponsiveness on the dose scale of an inflammatory agent is captured using a spline term in a generalized linear mixed model. In Section 6, we summarize the main points of the paper and mention some extensions to the methods described earlier. Computing code for implementing the second example in R [3] and WinBUGS [4, 5] appears in Appendix A.

2. REVIEW OF LINEAR MIXED MODELS

The standard linear mixed model has the form

\[ y = X\beta + Zu + \varepsilon \]

where y is a vector of observed responses, and X and Z are design matrices associated with a vector of fixed effects β and a vector of random effects u, respectively. The random effects vector u has zero mean and covariance matrix G, and ε is a vector of residual error terms with zero mean and covariance matrix R. The dimensions of the design matrices X and Z must conform to the length of the observation vector y and the number of fixed and random effects, respectively. It is generally assumed that the elements of u are uncorrelated with the elements of ε, in which case the covariance matrix of the random effects and residual error terms is block diagonal:

\[ \operatorname{Var}\begin{pmatrix} u \\ \varepsilon \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix} \]

The matrices Z and G will themselves be block diagonal if the data arise from a hierarchical (or multilevel) structure, where a fixed number of random effects common to observations within a single higher-level unit are assumed to vary across the units for a given level of the hierarchy. Typically we take the residual errors ε to be independent and identically distributed, so that R = σ²I where σ² is the residual variance, although other structures are possible [6-8]. The covariance matrix G of the random effects vector u is often assumed to have a structure that depends on a series of unknown variance component parameters that need to be estimated in addition to the residual variance σ² and the vector of fixed effects β.
The universal estimators of the fixed and random effects are the best linear unbiased estimator (BLUE) β̂ of β and the best linear unbiased predictor (BLUP) û of u, which can be recovered as the solution to the mixed model equations [9]:

\[ \begin{pmatrix} \hat{\beta} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix}^{-1} \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix} \]

Robinson [10] presents several derivations of these equations, a key feature of which is that an explicit assumption regarding the distribution of the random effects and residual error term is not necessary in order to make progress in the estimation of β and u. The criterion of best linear unbiased prediction is sufficient. It is, however, far more common to work explicitly with the Gaussian mixed model, where u and ε are assumed to be multivariate normal:

\[ \begin{pmatrix} u \\ \varepsilon \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix} \right) \]

This normality assumption allows the construction of a likelihood function for (β, G, R), the logarithm of which is

\[ l(\beta; G, R) = -\tfrac{1}{2}\{ n\log(2\pi) + \log|V| + (y - X\beta)'V^{-1}(y - X\beta) \} \]

where

\[ V = \operatorname{Var}(y) = ZGZ' + R \]

By assuming that the parameters defining the covariance matrices G and R are known, it is straightforward to show that the maximum likelihood (ML) estimate β̂ of β is

\[ \hat{\beta} = (X'V^{-1}X)^{-1} X'V^{-1}y \]

which, although it is not obvious algebraically, must also satisfy the mixed model equations presented earlier, since one of the ways in which these equations can be derived is directly from the multivariate normality assumption. Typically G and R will not be known, and can be estimated by substituting the expression for β̂ back into l(β; G, R) (generating the profile log-likelihood for the covariance matrices) and maximizing the result over the parameters defining G and R. An alternative is to use restricted maximum likelihood estimation (REML) for the variance components, which involves maximizing the likelihood of a set of linear combinations of the data vector y which, by construction, do not depend on β. The latter condition is achieved by working with error contrasts, that is, linear combinations of the form t'y where E(t'y) = 0, which is equivalent to requiring that t'X = 0. Crainiceanu and Ruppert [11] have recently demonstrated that REML estimates are not always unbiased, but the bias is less severe than for ML estimates.

Once estimates for β, G and R have been determined, we can return to the mixed model equations and determine the best linear unbiased predictor û of the random effects vector u as the vector that minimizes the expected mean squared error of prediction

\[ E\{ (\hat{u} - u)'(\hat{u} - u) \} \]

It is well known that the BLUP of u can be expressed as the posterior expectation of the random effects given the data,

\[ \hat{u} = E\{ u \mid y \} \]

which can be solved explicitly under the normality assumption to yield

\[ \hat{u} = GZ'V^{-1}(y - X\hat{\beta}) \]

Henderson [9] showed that the covariance matrix of (β̂ - β, û - u) is

\[ C = \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix}^{-1} \]

Although it is not possible to perform this matrix inversion explicitly for general covariance matrices G and R, it is straightforward to show that the covariance matrix of the fixed effects is Var(β̂) = (X'V⁻¹X)⁻¹. One other useful relationship is that Var(û) = Cov(û, u) = G - Var(û - u). Note that the covariance matrices Var(û) and Var(û - u) are not in general the same, due to the assumption that the components of u are random effects and thus have intrinsic variability in addition to that incurred by predicting u from the data. In practice the expressions for the estimators β̂ and û are evaluated using values of the covariance matrices G and R that are themselves estimated from the data, which results in a downward bias in the estimates of the sampling variability of the fixed and random effects; see Reference [12].
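As a concrete illustration of these formulae (a sketch of ours, not code from the paper), the following R fragment simulates a small random-intercept data set and evaluates β̂ = (X'V⁻¹X)⁻¹X'V⁻¹y and û = GZ'V⁻¹(y - Xβ̂) with the variance components treated as known; in practice G and R would be replaced by their ML or REML estimates.

# Hypothetical example: BLUE and BLUP evaluated directly, assuming G and R are known
set.seed(1)
m <- 10; n.per <- 5                          # 10 clusters of 5 observations each
cluster <- rep(1:m, each = n.per)
x <- runif(m * n.per)
y <- 1 + 3*x + rnorm(m, sd = 2)[cluster] + rnorm(m * n.per, sd = 1)

X <- cbind(1, x)                             # fixed effects design matrix
Z <- model.matrix(~ factor(cluster) - 1)     # random effects design matrix
G <- diag(4, m)                              # Var(u) = 2^2 * I
R <- diag(1, m * n.per)                      # Var(eps) = 1^2 * I

V <- Z %*% G %*% t(Z) + R                    # Var(y) = Z G Z' + R
Vinv <- solve(V)
beta.hat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)    # BLUE of beta
u.hat <- G %*% t(Z) %*% Vinv %*% (y - X %*% beta.hat)          # BLUP of u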

3. SEMI-PARAMETRIC REGRESSION

3.1. Spline smoothing

Suppose that we want to model the relationship between a continuous response Y and a single covariate x by

\[ E[Y_i] = m(x_i) + \varepsilon_i \quad (i = 1, \ldots, n) \tag{1} \]

where m is an arbitrary smooth function giving the conditional mean of Y, and ε_1, …, ε_n are independent error random variables with common mean zero and variance σ². This is a non-parametric regression model because m is not constrained to be of any pre-specified parametric form (e.g. linear). A popular approach to estimating m is to use splines. The linear spline estimator is of the form

\[ m(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k (x - \kappa_k)_+ \tag{2} \]

where

\[ (x - \kappa_k)_+ = \begin{cases} 0, & x \leq \kappa_k \\ x - \kappa_k, & x > \kappa_k \end{cases} \]

and κ_1, …, κ_K are knots. Equation (2) describes a sequence of line segments tied together at the knots to form a continuous function. More generally, we can extend (2) to be a piecewise polynomial of degree p, defining

\[ m_p(x; \beta, u) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} u_k (x - \kappa_k)_+^p \tag{3} \]

where β = (β_0, …, β_p)' and u = (u_1, …, u_K)' denote vectors of coefficients. The functions 1, x, …, x^p, (x - κ_1)_+^p, …, (x - κ_K)_+^p in (3) are basis functions, since any spline of order p with the given knots is a linear combination of these functions. Quadratic (p = 2) and cubic (p = 3) splines are common choices in practice because they ensure a certain degree of smoothness in the fitted curve.

Figure 1. Linear spline regression functions for the U.K. births data. In the left panel all regression coefficients are fixed effects estimated by OLS. In the right panel the coefficients at the knots are random effects.
To implement spline smoothing in practice the coefficients β and u must be estimated. In principle we could apply the usual method of least squares, but this tends to result in a rather rough function estimate. To appreciate why, consider the linear spline model. The coefficients u_1, …, u_K represent the changes in gradient between consecutive line segments. In ordinary least squares (OLS) these quantities can be of large magnitude, resulting in relatively rapid fluctuations in the estimate of m. A greater degree of smoothness can be achieved by shrinking the estimated coefficients towards zero. This is precisely what occurs when OLS estimates are replaced by BLUPs, and suggests that a smooth estimate of m might be obtained by regarding u_1, …, u_K as random coefficients, distributed independently as u_k ~ N(0, σ_u²). The efficacy of this approach is illustrated by the non-parametric regression functions for the U.K. birth data displayed in Figure 1. The left-hand panel shows the results of estimating u_1, …, u_K in (2) by OLS; the fitted curve is rather rough. Constraining the coefficients u_1, …, u_K to come from a common distribution has the effect of damping changes in the gradient of fitted line segments from one knot point to the next, resulting in the smooth regression function displayed in the right-hand panel.

Let the n response and covariate values be accumulated in vectors y = (y_1, …, y_n)' and x = (x_1, …, x_n)', respectively, and write 1 for the vector of all ones. Consider the model implied by equation (2), with the Gaussian distributional assumption governing both the error vector ε = (ε_1, …, ε_n)' and the coefficient vector u. If we define the n × 2 fixed effects design matrix by

\[ X = [1 \;\; x] \]

and the n × K random effects design matrix by

\[ Z = [(x - \kappa_1 1)_+ \;\; \cdots \;\; (x - \kappa_K 1)_+] \]

then this model can be expressed as

\[ y = X\beta + Zu + \varepsilon \]

where

\[ \begin{pmatrix} u \\ \varepsilon \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_u^2 I & 0 \\ 0 & \sigma^2 I \end{pmatrix} \right) \]
In other words, a non-parametric regression implemented using spline smoothing can be expressed as a linear mixed model.
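As a minimal sketch of this mixed model representation for a single covariate (our illustration, in the style of the code in Appendix A; the simulated data and all object names are hypothetical), the linear spline basis can be built by hand and the model fitted with lme() from the nlme package:

library(nlme)                               # provides lme() and pdIdent()
set.seed(2)
x <- sort(runif(200)); y <- sin(2*pi*x) + rnorm(200, sd = 0.3)

K <- 20
knots <- quantile(unique(x), probs = (1:K + 1)/(K + 2))   # knots at the (k+1)/(K+2) sample quantiles
X <- cbind(1, x)                            # fixed effects: intercept and slope
Z <- outer(x, knots, "-"); Z <- Z*(Z > 0)   # truncated lines (x - kappa_k)_+

cons <- rep(1, length(y))                   # dummy grouping variable: one group for all the data
spline.fit <- lme(y ~ -1 + X, random = list(cons = pdIdent(~ -1 + Z)), method = "REML")
curve.hat <- spline.fit$fitted[, 2]         # fitted smooth, including the spline random effects

The pdIdent() structure constrains the spline coefficients to share a common variance σ_u², so that REML estimates the amount of smoothing automatically.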
The connection between mixed models and spline smoothing methods can also be established by considering that the estimators β̂ and û minimize the penalized least squares (PLS) function of Reference [13]

\[ \mathrm{PLS}(\beta, u) = \| y - X\beta - Zu \|^2 + \lambda \| u \|^2 \tag{4} \]

where λ is the ratio of variance components σ²/σ_u² and ‖u‖ is the usual Euclidean norm of the vector u. The minimization of PLS(β, u) is penalized in the sense that the magnitude of the random effect coefficients in u is constrained not to grow too large. The particular penalty λ‖u‖² results from the use of the normal distribution; other penalties are possible [14]. This produces smooth fitted curves that can be shown to be spline smoothers [15], thus formally connecting linear mixed models to spline smoothing. One benefit of thinking about smooth regression modelling in this fashion is that (4) makes it clear that λ = σ²/σ_u² is a smoothing parameter. As its value is increased, the penalty term receives greater weight and the regression becomes smoother at the expense of a less close fit to the data. In other non-parametric smoothing problems such as kernel regression (e.g. Reference [16]) the choice of smoothing parameter is crucial, but often difficult in practice. One of the attractions of fitting spline smoothers as linear mixed models is that λ can be selected in a very natural fashion using REML, although the user may choose to experiment with alternative values of this parameter.
The solution for β and u that minimizes the PLS function can also be written as

\[ \begin{pmatrix} \hat{\beta} \\ \hat{u} \end{pmatrix} = (C'C + \lambda D)^{-1} C'y \]

where C = [X Z] and D = diag(0_m, 1, …, 1), the vector 0_m representing the m-dimensional zero vector where m is the dimension of the vector β of fixed regression coefficients. This equation can be recovered by substituting R = σ²I and G = σ_u²I into the mixed model equations above, and represents a particular example of ridge regression.
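The ridge form lends itself to direct computation for a user-chosen value of λ. The fragment below (an illustration of ours, re-using the y, X and Z objects from the sketch above) makes explicit how increasing λ = σ²/σ_u² produces a smoother fit:

# Penalized least squares: (beta.hat, u.hat) = (C'C + lambda*D)^{-1} C'y
pls.fit <- function(y, X, Z, lambda) {
  C <- cbind(X, Z)
  D <- diag(c(rep(0, ncol(X)), rep(1, ncol(Z))))   # fixed effects are not penalized
  coef <- solve(t(C) %*% C + lambda * D, t(C) %*% y)
  list(beta = coef[1:ncol(X)], u = coef[-(1:ncol(X))], fitted = as.vector(C %*% coef))
}
fit.rough  <- pls.fit(y, X, Z, lambda = 0.1)    # small lambda: wiggly fit
fit.smooth <- pls.fit(y, X, Z, lambda = 100)    # large lambda: close to a straight line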
3.2. Spline smoothing in the literature

There is a very large literature on spline smoothing. For those wishing to go beyond the coverage in the present paper, the article by Wand [14] and the books by Hansen et al. [17] and Ruppert et al. [18] are recommended. The origin of smoothing splines can be traced back to the work of Whittaker [19] on graduating data. However, spline smoothing received little attention from statisticians until the utility of this technique was demonstrated by the

research efforts of Wahba and co-workers [20-23]. The monograph on spline models by Wahba [24] contains many theoretical results that prompted the recent developments that use the connection between spline smoothing and linear mixed effects models, although the first explicit mention of this link seems to be due to Speed [25] in his discussion of Robinson's 1991 paper on the estimation of random effects [10].

Different types of spline can be obtained by altering the choice of knots, and by changing the manner in which roughness in the estimated regression function is penalized. One approach is to allow a knot at each (discrete) value of the x variable which, given an appropriate choice of roughness penalty, leads to a natural cubic smoothing spline [15]. We prefer to focus on penalized splines, also called P-splines, a terminology introduced by Eilers and Marx [26] and Marx and Eilers [27]. P-splines are characterized by the use of a relatively modest, fixed, number of knots (usually K ≪ n, though we say more on knot selection later). They are very similar in spirit to the low-rank pseudo-splines proposed by Hastie [28]. See also References [29-32].

Since the end of the 1990s a body of research has developed on P-splines. Methodological developments include Brumback et al.'s [33] contribution on mixed-model representations of P-splines; Ruppert's [34] work on the choice of knots; Cai et al.'s [35] use of P-splines in hazard function estimation; and Lang and Brezger's work on Bayesian P-splines [36]. See also References [37-42]. Some examples of applications of P-splines include Greenland's paper on HIV incidence [43], Marx and Eilers' contribution on calibration and chemometrics [44], Kauermann and Ortlieb's paper on modelling patterns of sick leave [45], and Eisen et al.'s paper on occupational cohort studies [46].
3.3. Knot specification

The location of the knots must be specified in advance of fitting the model, and they are supplied implicitly to the chosen software routine via the realized values of the spline basis functions in the random effects design matrix Z. Wand [14] comments that knot specification is very much a minor detail for penalized splines, and Ruppert [34] notes that because smoothing is controlled by the penalty parameter, λ, the number of knots, K, is not a crucial parameter. We concur with Wand's suggestion that a reasonable default rule for the location of the knots is

\[ \kappa_k = \left( \frac{k+1}{K+2} \right)\text{th sample quantile of the unique } x_i\text{'s}, \quad 1 \leq k \leq K \]

where K = min(n/4, 35). Some additional algorithms, empirical results and further commentary on the topic of knot selection are supplied in Reference [34].
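This default rule is simple to encode; the small helper below (our sketch, with an illustrative name) returns the knot locations for a covariate vector x:

# Default knots: the (k+1)/(K+2) sample quantiles of the unique x values,
# with K = min(number of unique values/4, 35)
default.knots <- function(x) {
  ux <- sort(unique(x))
  K <- max(1, min(floor(length(ux)/4), 35))
  quantile(ux, probs = (1:K + 1)/(K + 2))
}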
3.4. Extensions

Several extensions to the simple non-parametric regression model in (1) are available within the linear mixed models framework. We may wish to model the relationship between a continuously valued response Y_i and multiple continuously valued covariates. Illustrating this for the case of two covariates x and w, with ε_i defined as before, we have

\[ E[Y_i] = m(x_i) + l(w_i) + \varepsilon_i \quad (i = 1, \ldots, n) \tag{5} \]

where m and l are smooth functions of the corresponding covariates contributing to the conditional mean of Y_i. This is a generalized additive model as described by Hastie and Tibshirani [47], implemented in the gam() routine in R or S-plus. These types of models are easily implemented in general mixed model software by defining knot locations and spline basis functions separately for both the covariates x and w, declaring each of the corresponding coefficient vectors to have independent Gaussian distributions and estimating them using best linear unbiased prediction.

It may be that (5) is unnecessarily complex because E[Y] is (approximately) linear in w. We might then prefer a semi-parametric regression model which includes both non-parametric (spline) and linear terms:

\[ E[Y_i] = m(x_i) + \beta_1 w_i + \varepsilon_i \quad (i = 1, \ldots, n) \tag{6} \]

Another form of semi-parametric regression occurs if extra random effects are included for modelling clustered and/or longitudinal data. We investigate an example with clustered data in some detail in the next section. For longitudinal data, if y_ij denotes the observation on individual i at the jth time point (j = 1, …, m) then a simple extension of (1) is

\[ E[Y_{ij}] = m(x_{ij}) + v_i + \varepsilon_{ij} \quad (i = 1, \ldots, n; \; j = 1, \ldots, m) \tag{7} \]

where v_1, …, v_n are individual-specific random effects. These ideas may be combined and generalized to produce intricate models capable of capturing non-linear relationships between response and covariates while accounting for complicated correlation structures in the data, all within the framework of linear mixed models.
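As an indication of how a model such as (7) can be specified with the same tools (an illustrative sketch of ours only, assuming vectors y and x and a cluster identifier id are available in the workspace), one simply adds a second entry to the list of random effects passed to lme():

library(nlme)
K <- 20
knots <- quantile(unique(x), probs = (1:K + 1)/(K + 2))
Zspl <- outer(x, knots, "-"); Zspl <- Zspl*(Zspl > 0)   # spline basis for m(x)
cons <- rep(1, length(y))                               # single group carrying the spline coefficients

fit <- lme(y ~ x,
           random = list(cons = pdIdent(~ -1 + Zspl),   # penalized spline part
                         id = ~ 1),                     # random intercept v_i per individual
           method = "REML")

This is the same device used for the twin-pair analysis in Appendix A, where the second random effects term carries the within-pair correlation structure.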

4. APPLICATION: MODELLING MAMMOGRAPHIC DENSITY VERSUS AGE FROM TWIN DATA

In this section we use a linear mixed model to implement a semi-parametric regression model of the relationship between mammographic density and age using data from female twin pairs.
4.1. Mammographic density, age and risk of disease

Women with extensive dense breast tissue determined by mammography are known to be at higher risk of breast cancer than women of the same age with lower mammographic density. Previous work on pairs of both identical or monozygous (MZ) and non-identical or dizygous (DZ) female twins from samples in Australia and North America provided strong evidence of a genetic effect on mammographic density [2]. Current interest focuses on studying the relationship between mammographic density and individual covariates. Age is known to influence both the risk of breast cancer and the percent mammographic density, and it is age-adjusted mammographic density that predicts breast cancer risk. Thus any covariate analysis must adjust a priori for age in order to minimize any confounding influence on the risk relationship between mammographic density and other covariates.

The relationship between age and mammographic density is not immediately apparent from inspection of a scatter plot (see Figure 2), apart from a sparsity of older women with high mammographic density, suggesting that mammographic density decreases with increasing age.

Figure 2. Fitted regression functions for the per cent mammographic density data. The dotted line and dashed curve represent, respectively, the simple linear regression and a cubic polynomial fit, both based on OLS. The solid line displays the result of a linear spline regression with knots at each distinct value of age at mammogram. (Axes: age at mammogram in years, 40-70; per cent breast density, 0-100.)

We hope to elucidate the relationship between age and mammographic density using spline smoothing while specifying a random effects structure to account for the within-pair correlations.
4.2. Data and models

The data consist of measurements on per cent mammographic density, which is the ratio of dense breast tissue area to total breast tissue area determined by mammographic imaging, and age at mammogram in years recorded as an integer. Per cent mammographic density ranges from 0 to 90 per cent with mean 37 per cent; age ranges from 38 to 71 with mean 51 years. Data are available on 951 twin pairs, 599 from Australia (353 MZ and 246 DZ) and 352 from North America (218 MZ and 134 DZ). We represent the per cent mammographic density data perc.density as the response vector y = {y_ij}, where i = 1, …, 951 indexes the twin pair and j = 1, 2 the twins within pairs. The covariate age is represented by the vector x with a structure identical to that of y. We seek to fit a model

\[ \text{perc.density}_{ij} = m(\text{age}_{ij}) + h_{ij} + \varepsilon_{ij} \tag{8} \]

where m is some smooth function of age, with the random effect h_ij capturing the within-pair correlation structure and ε_ij representing an uncorrelated random error term with variance σ².

We begin by specifying a model to fit a penalized spline for the relationship between mammographic density and age. We use a linear spline basis as specified in equation (2), so the fixed effects design matrix X has just two columns. There are sufficient data to warrant

locating a knot at each distinct value of age, so the random effects design matrix Z has 34 columns since the age range is 38-71 years.

We incorporate the within-pair correlation into the regression structure by including additional random effects in the model that are shared within each twin-pair cluster, while also allowing for the possibility that the within-pair correlations for MZ and DZ pairs are different. For the ith twin pair we generate two random effects a_i1 and a_i2, each with variance σ_a² and correlation ρ between them. If the ith twin pair is MZ then for both twins a_i1 is added to the fixed effects linear predictor, that is, h_ij = a_i1 for j = 1, 2 in equation (8). For DZ twins we add a_i1 to one of the twins and a_i2 to the other, so we have simply h_ij = a_ij for j = 1, 2. Note that the ordering of the twins within a given pair is arbitrary regardless of their zygosity. In this case the total variance in the trait for an individual is just σ_a² + σ², with a within-pair covariance of σ_a² for MZ pairs and ρσ_a² for DZ pairs, implying that the within-pair correlation in DZ pairs will be less than in MZ pairs. The special case of ρ = 0.5 implies a standard additive genetic model known as the classical twin model [48, 49].
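Written out explicitly (and ignoring, for the moment, the spline random effects), the implied within-pair covariance matrices of the observations under this random effects structure are

\[ \text{MZ pairs: } \begin{pmatrix} \sigma_a^2 + \sigma^2 & \sigma_a^2 \\ \sigma_a^2 & \sigma_a^2 + \sigma^2 \end{pmatrix}, \qquad \text{DZ pairs: } \begin{pmatrix} \sigma_a^2 + \sigma^2 & \rho\sigma_a^2 \\ \rho\sigma_a^2 & \sigma_a^2 + \sigma^2 \end{pmatrix} \]

so that the DZ within-pair correlation is ρ times that of the MZ pairs, and ρ = 0.5 gives the classical twin model.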
More formally, for our N = 951 twin pairs we update the vector of random effects from u to u* = (u', a_11, a_12, …, a_N1, a_N2)' and augment the original random effects design matrix Z to Z*, where

\[ Z^{*} = \begin{pmatrix} Z & 0 \\ 0 & Z_{\text{twin}} \end{pmatrix} \]

The additional design sub-matrix Z_twin is block diagonal with 2 × 2 blocks, the ith block being the 2 × 2 identity matrix if the ith pair are DZ twins, and a 2 × 2 matrix with a column of ones followed by a column of zeros if the ith pair are MZ twins. The covariance matrix for the random effects vector u* must also be updated from G to G*, where

\[ G^{*} = \begin{pmatrix} G & 0 \\ 0 & G_{\text{twin}} \end{pmatrix} \]

The sub-matrix G_twin is block diagonal with N identical 2 × 2 blocks S, where

\[ S = \sigma_a^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \]

The residual error covariance matrix R remains as σ² times the identity matrix with appropriately expanded dimension. There are of course other ways of capturing this correlation structure using realized random effects, but this is by far the easiest when we are restricted to using only two genetic random effects per twin-pair cluster, as required by some of the software routines we explored when fitting these models.
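For readers who prefer to see the matrices assembled explicitly (the lme() parameterization in Appendix A does not require this), the following fragment of ours sketches the construction of Z_twin and the blocks of G_twin, assuming a vector zyg with one zygosity code per pair (1 = MZ, 2 = DZ) and the Matrix package for bdiag():

library(Matrix)                                      # for bdiag()
twin.block <- function(zygosity) {
  if (zygosity == 1) matrix(c(1, 1, 0, 0), 2, 2)     # MZ: column of ones, column of zeros
  else diag(2)                                       # DZ: 2 x 2 identity
}
Ztwin <- as.matrix(bdiag(lapply(zyg, twin.block)))   # block diagonal over the N pairs

S <- function(sig2.a, rho) sig2.a * matrix(c(1, rho, rho, 1), 2, 2)
N <- length(zyg)
Gtwin <- as.matrix(bdiag(replicate(N, S(1, 0.5), simplify = FALSE)))   # illustrative values of sig2.a and rho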
The models were fitted using the lme() routine in R and in a Markov chain Monte Carlo (MCMC) setting using the package WinBUGS; the results were similar, and the relevant computing code for both implementations is presented in Appendix A. For additional examples of the use of R, WinBUGS and SAS [50] in spline smoothing see References [14, 51, 52].

The REML estimates (from the lme() routine in R) of the three variance parameters were σ̂_a² = 282.31, σ̂_u² = 0.205 and σ̂² = 125.57, and the estimated within-pair correlation for DZ twins was ρ̂ = 0.507, very close to the null value of ρ = 0.5 for the standard additive

genetic model. Note that the total variance is now σ_a² + σ_u² + σ² rather than the σ_a² + σ² implied by the original model for within-pair correlation, although the magnitude of the estimated penalized spline variance component σ_u² is very modest compared to the estimated values for σ_a² and σ². In this example an adjustment for the effect of age on mammographic density would always be necessary before imposing a genetic variance component structure, so it makes sense to interpret the estimated values of σ_a² and σ² conditional on the estimated value of σ_u², in the same way that we condition implicitly on the value of estimated fixed effects when interpreting (components of) the residual variance.

The fitted spline regression is displayed in Figure 2, along with the OLS fits of linear and cubic polynomial regressions. The spline fit reveals that the mean mammographic density increases slightly with age in the range 40-45 years (the maximum fitted value is 44.3 per cent at 44 years of age) but decreases over the remaining age range. The cubic polynomial fit largely reproduces the behaviour of the semi-parametric model, albeit with a spurious increase in density after 65 years that is a consequence of the parametric form of the regression function. The cubic fit is nonetheless a significant improvement on the linear fit (p = 0.0025), indicating that the features of the data exposed by the semi-parametric regression are not artefacts.
5. SPLINE SMOOTHING AND GENERALIZED LINEAR MIXED MODELS

In the examples we have considered so far, the smoothing of the relationship between a response and a covariate has taken place directly on the scale of the continuously valued outcome variable. By appealing to the role of the smoothing term in defining a model for the expected value of the outcome, the mixed model implementation of spline smoothing can be extended to encompass regression models for outcomes such as binary and count data that are not continuously valued and hence cannot be assumed to be even approximately normally distributed. Such data can be modelled using a generalized linear mixed model (GLMM) [1], which we introduce using the notation of Section 2 as a two-parameter exponential family density f for y, where

\[ f(y \mid u) = \exp\{ (y'(X\beta + Zu) - 1'b(X\beta + Zu))/\phi + 1'c(y, \phi) \} \tag{9} \]

with b and c both scalar functions that are applied component-wise to the vector each takes as its argument. Note that c is a function of the data vector y that involves the scale parameter φ but not the fixed or random effect parameters. If the terms involving the vector u of random effects, assumed to be normally distributed with mean zero and covariance matrix G, are absent then the model reduces to a generalized linear model (GLM) [53, 54].

The term Xβ + Zu is the linear predictor containing both fixed and random effects and is related to the conditional expectation of y via the link function g such that g(E(y|u)) = Xβ + Zu. Smoothing of the relationship between the response and covariate naturally takes place on the scale of the linear predictor, and additional terms can be included in the specification of the random effects to accommodate the truncated linear basis that facilitates the mixed model implementation of spline smoothing presented in Section 3.
In principle the fixed effect and variance parameters of a generalized linear mixed model can be estimated using maximum likelihood, with the random effects estimates following from BLUP. This process, however, is much more computationally challenging than estimating the parameters of the standard linear mixed model because the required integration

over the (typically high dimensional) distribution of the random effects cannot be performed explicitly and must be accomplished numerically. Common approaches use the Laplace approximation, which is equivalent to penalized likelihood [13, 55]. Ngo and Wand [51] note that for a user-specified value of the variance component corresponding to the smoothing parameter, fitting a GLMM reduces to iteratively reweighted least squares ridge regression. Routines to implement GLMMs are available in R, SAS and Stata [56], although these approximate methods (often based on quasi-likelihood) do not necessarily reproduce the true ML estimates to any degree of accuracy.
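To indicate how this works in practice (our sketch, not code from the paper), the pdIdent() device of Appendix A carries over directly to penalized quasi-likelihood fitting with glmmPQL() from the MASS package; here y is assumed to be a 0/1 outcome and Z the spline design matrix of Section 3, and the caveats above about the approximate nature of such fits apply:

library(MASS)                                # glmmPQL()
library(nlme)                                # pdIdent()
cons <- rep(1, length(y))                    # single dummy group, as before
glmm.fit <- glmmPQL(y ~ x,
                    random = list(cons = pdIdent(~ -1 + Z)),
                    family = binomial, verbose = FALSE)
# ranef(glmm.fit) contains the estimated spline coefficients u_k; the smooth acts on the logit scale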
5.1. Application: using pedigree data to model the age- and sex-specific risk of bronchial hyperresponsiveness

In this section we illustrate the application of mixed model smoothing in GLMMs using an example from asthma genetics research. The aim of the original analysis of these data was to determine whether there was an association between bronchial hyperresponsiveness (BHR) and alleles of a genetic marker in the interleukin-9 (IL9) gene [57]. By accounting for the within-family correlation due to shared genetic and environmental influences it is possible to estimate the extent to which the residual variation after adjustment for IL9 depends on unmeasured genetic factors. It is also necessary to adjust for the effect of age and sex on BHR. Although we can stratify analyses by sex, the role of age in determining the risk of BHR is not known a priori and may change as the individual moves from childhood through adolescence and into adulthood. It is unclear whether fitting a linear term or some other low-order polynomial is a good approximation to the correct but unknown functional form relating age to the log-hazard of BHR with increasing doses of a bronchial agonist. A more flexible approach is to fit age as a spline term in the model for BHR, which we demonstrate here.

The data are from long-standing cohort studies in Busselton (Western Australia) and Southampton (U.K.). In the Busselton study, Caucasian families with both parents alive and aged under 55 years and with at least two children over the age of 5 years were recruited. In the Southampton study, Caucasian families with three or more children were recruited through contact at local general practices with the parents of 1800 children aged 11-14 years [58, 59].
Individual lung function was assessed repeatedly on a single occasion as part of a bronchial challenge [60], which is designed to simulate an asthma attack. During this procedure, forced expiratory volume in 1 s (FEV1) was measured after an initial dose of saline and after each increasing dose of a bronchoconstrictor drug (methacholine in Busselton, histamine in Southampton). The event of interest was a 20 per cent fall from post-saline FEV1, characterized by the dose at which this fall occurred and estimated by linear interpolation of the observed responses bracketing the critical fall. If a 20 per cent fall had not been achieved by the time the highest permissible dose (12 μmol for methacholine and 2.45 μmol for histamine) had been administered, the response was censored at the maximal dose. Such measurements can be considered as time to event data, although the distance to the event is measured on the dose scale rather than the time scale.

Complete data were available for 823 individuals (213 from Busselton, 610 from Southampton) in 199 nuclear families. A total of 101 subjects (12.3 per cent) responded (40 (18.8 per cent) in Busselton and 61 (10.0 per cent) in Southampton). Of the remaining 722 subjects, the majority were end-censored at the maximal dose.

5.2. Piecewise exponential model

Data were analysed using a piecewise exponential model. In this model, the cumulative dose is divided into N arbitrary intervals (D_1, …, D_N) of finite length (U_1, …, U_N), such that individual j in the ith family is at risk of response during interval D_b up to their recorded dose d_ijb, where 0 < d_ijb ≤ U_b. A censoring indicator y_ijb denotes the response of the ijth individual during interval D_b: y_ijb = 1 if the individual failed during D_b and y_ijb = 0 otherwise. Declaring that y_ijb follows a Poisson distribution within each interval produces a likelihood function that, in the limit when there are a large number of distinct dose intervals, is equivalent to Cox's proportional hazards model [54]; inferences from the two models are remarkably similar even when the number of distinct intervals is small [61]. In this example we used five dose intervals: (0, 0.4], (0.4, 1.8], (1.8, 5], (5, 10], (10, 12].

More formally, the model may be expressed via the hazard, λ_ijb, for individual ij in the bth dose interval:

\[ \log(\lambda_{ijb}) = \log(d_{ijb}) + \beta_{0b} + \beta' x_{ijb} + F_i + G_i + H_{ij} + \sum_{k=1}^{K} u_k(\text{age}_{ij} - \kappa_k)_+ \quad \text{(fathers)} \]

\[ \log(\lambda_{ijb}) = \log(d_{ijb}) + \beta_{0b} + \beta' x_{ijb} + F_i - G_i + H_{ij} + \sum_{k=1}^{K} u_k(\text{age}_{ij} - \kappa_k)_+ \quad \text{(mothers)} \]

\[ \log(\lambda_{ijb}) = \log(d_{ijb}) + \beta_{0b} + \beta' x_{ijb} + F_i + M_i + P_{ij} + \sum_{k=1}^{K} u_k(\text{age}_{ij} - \kappa_k)_+ \quad \text{(children)} \]

where

\[ F_i \sim N(0, \tfrac{1}{2}\sigma_A^2), \quad G_i \sim N(0, \tfrac{1}{2}\sigma_A^2), \quad P_{ij} \sim N(0, \tfrac{1}{2}\sigma_A^2), \quad M_i \sim N(0, \sigma_{Cs}^2), \quad H_{ij} \sim N(0, \sigma_{Cs}^2), \quad u_k \sim N(0, \sigma_U^2) \]

and y_ijb ~ Poisson(λ_ijb).


This is a generalized linear mixed model with a log link, a Poisson error term, and a
multivariate Normal joint distribution for the higher order random eects [53, 55]. Here 0b
is the log baseline hazard, R is a vector of regression coecients and xijb is a vector of
covariates for the IL9 marker, indicating whether each individual has 0, 1 or 2 copies of each
of the four alleles of interest. Separate eects for each of these four alleles, and distinct log
baseline hazards 0b for each dose interval, were allowed in each population.
The linear predictor includes a spline smoothing term to capture the relationship between
the log hazard and the age of the individual ageij . The ui parameters, which are assumed to
be uncorrelated with each other and the remaining random eects described above, represent
the coecients for the truncated linear basis with knot points 1 ; : : : ; K . Due to the large
number of data points at each age, 27 knot points were used, at every 2 years from age
6 to 60.
2
are the components of variance attributable to additive genetic
The parameters A2 and Cs
eects and shared sibling environment, respectively. The uncorrelated random eects Fi , Gi ,
Copyright ? 2005 John Wiley & Sons, Ltd.

Statist. Med. 2005; 24:33613381

TUTORIAL IN BIOSTATISTICS

3375

Hij , Mi and Pij are shared in such a way that the total variance (on the linear predictor
2
for each individual. The covariance between two parents is zero, while the
scale) is A2 + Cs
covariance between a parent and a child is 12 A2 and the covariance between two siblings is
1 2
2
2 A + Cs .
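To indicate the data layout that this piecewise exponential model requires (a sketch of ours with hypothetical variable names; the authors' own WinBUGS code is available on request, as noted below), each subject's dose record can be expanded into one row per dose interval at risk, carrying the exposure d_ijb and the event indicator y_ijb:

# dose  = dose at which the 20 per cent fall occurred (or the maximal dose if censored)
# event = 1 if the subject responded, 0 if censored
cuts <- c(0, 0.4, 1.8, 5, 10, 12)                   # the five dose intervals used here
expand.subject <- function(dose, event) {
  b <- findInterval(dose, cuts, left.open = TRUE, rightmost.closed = TRUE)
  b <- max(1, min(b, length(cuts) - 1))             # interval containing the recorded dose
  data.frame(interval = 1:b,
             d = c(diff(cuts)[seq_len(b - 1)], dose - cuts[b]),  # exposure within each interval
             y = c(rep(0, b - 1), event))           # an event can only occur in the last interval
}
expand.subject(3.2, 1)    # a responder at dose 3.2 contributes rows for intervals 1-3

Each expanded row then contributes a Poisson term for y_ijb, with log(d_ijb) entering the linear predictor as an offset, as in the hazard equations above.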
We took an MCMC approach to fitting the models, using WinBUGS. The unconventional parameterization of the familial random effects enhances convergence of the sampled parameter values to the target posterior distribution [62]. Vague prior distributions were used for all parameters. Specifically, N(0, 10 000) distributions were used as priors for fixed effects, while Pareto(0.5, 0.01) distributions, on the scale of the precision (the inverse of the variance), were used for all three variance components. This is equivalent to specifying a uniform prior on the scale of the standard deviation which, for a suitably large choice of the distribution's upper bound (here we use 100), ensures that the results are essentially invariant to the scale of the data. Note that the use of the traditional vague gamma prior with constant hyperparameters corresponding to, say, a mean of 1 and a variance of 1000, produces results that are not scale invariant. The use of such gamma priors is no longer recommended. Models were run for a burn-in of 10 000 iterations, followed by 100 000 iterations after convergence. Two chains were run in parallel for all models, and the estimates presented below are posterior means (and standard deviations) of the combined iterations from both chains. The relevant computing code for WinBUGS is available from the corresponding author on request.
5.3. Results

The alleles of IL9 exhibited only weak positive associations with the risk of BHR. The variance component σ_A², reflecting additive polygenic effects, remained large at 2.48 (SD, 1.52; 95 per cent credible interval, 0.16-5.99) even after the inclusion of the IL9 marker, which implies the existence of other genes controlling BHR. The smaller estimate of 0.61 (SD, 0.76) for σ_Cs², reflecting effects due to a shared sibling environment, suggests that this has little impact on bronchial hyperresponsiveness. The smoothing parameter σ_U² was estimated to be 0.0039 (SD, 0.0063; 95 per cent credible interval, 0.000025-0.0221) which, similar to the application in Section 4, was small in comparison to the other variance components. Figure 3 displays the fitted log-hazard log(λ_ijb) against age, separately for males and females, using both the semi-parametric spline smoothing estimates and a fully parametric cubic polynomial curve. Males appear to have a higher risk of response than females in childhood, although both cubic and spline fits indicate that there is little relationship between age and BHR for adult males. In contrast, the risk of response rises in females from the mid-teens to about age 30, then decreases consistently. This increase is surprising but may be due to sparse data for this age group. The sharp decrease is due to the fact that no female aged over 50 responded. Clearly, the spline fit is less affected by this in comparison with the cubic curve. In males the spline fit captures local variation in the risk of response between ages 20 and 60, which is not reflected in the cubic polynomial fit.

Although the result of the spline smoothing term for age mirrors many features of the cubic fit, it was necessary to spend considerable time during the original analysis actually discovering the cubic dependence of the log-hazard of BHR on age. In data such as these it may be difficult and time consuming to obtain a good parametric fit, and it would be easy for a biostatistician to miss the need for a cubic or higher order polynomial to provide an approximation to the complex yet unknown functional form. The use of spline smoothing in a mixed model framework allows us to discern immediately the shape of the relationship between age and BHR, even if subsequent analyses suggest that a parametric model fit is appropriate.

Figure 3. Age-sex risk profiles: the log-hazard of BHR with age, fitted using both a cubic and a spline term, shown separately for males (M) and females (F) (x-axis: age in years; y-axis: log hazard ratio).

6. CONCLUSIONS

The realization that penalized splines, and thus a broad class of semi-parametric regression models, can be cast within a linear mixed models framework affords biostatisticians the opportunity to incorporate smoothing techniques into the analysis of correlated data structures. The latter are typically represented as a hierarchical model with correlation captured by random effects common to observations at a given level of the hierarchy, and are thus naturally analysed by fitting regression models using mixed model software. We saw that smoothing of the relationship between age and two separate outcomes, firstly mammographic density (in application 2) and secondly the risk of bronchial hyperresponsiveness (in application 3), could be achieved while simultaneously estimating a non-trivial genetic variance component structure implied by the paired or familial nature of the data.

Not only can linear mixed models be used to attack a very wide variety of problems, but they can be extended to meet the challenges of missing data, measurement error and non-normally distributed data via generalized linear mixed models, as in application 3. Their hierarchical nature means that such models lend themselves to fitting via Markov chain Monte Carlo methods. The computing code in Appendix A demonstrates that fitting penalized splines using the WinBUGS platform is no more difficult for the biostatistician than using the more conventional maximum likelihood routines in R or S-plus. Indeed, an MCMC approach is perhaps the most attractive approach currently available for fitting spline terms and random effects in the linear predictor of generalized linear mixed models.

Within the mixed models implementation of penalized splines the amount of smoothing is controlled by the relative magnitude of the relevant variance component and the residual error variance. Typically this parameter is estimated from the data, and in the three examples in the text the amount of smoothing was dictated by the default REML estimates of the variance parameters (in application 2) and posterior means using vague prior distributions (in applications 1 and 3). This gave reasonable fitted curves in the examples we consider, although it is possible that there are scenarios where the success of this method for choosing the amount of smoothing automatically will depend on the number and location of the knots. Using too few knots with the large data set in our second example drives the magnitude of the variance component associated with the spline smoothing down to an unrealistically low value, resulting in very little smoothing. This can obviously be overcome by increasing the number of knots, or fitting with user-specified values for the variance components, which is straightforward in WinBUGS. It is, however, not possible at this stage to work with user-specified parameters with the lme() routine in R (or S-plus). Wand [14] describes an algorithm based on Demmler-Reinsch orthogonalization that allows smoothing to be performed in R with user-specified parameters in the simplest scatter plot scenario, such as our first example. Although this can be extended to fit more complex models such as the second or third example, the dimensions of the matrices involved in this case are large, despite their sparsity, due to the large number of twin pairs.

We noted in Section 4 that the variance component allocated to the coefficients of the spline basis functions will contribute to the total modelled variance of the observations and may encroach on the interpretability of the relationship between the remaining components if their structure results from a substantive model such as, in our second and third examples, the genetic relationships that exist within twin pairs or between family members. In both these examples, however, an impressive degree of smoothing was achieved with a relatively small variance component that had little impact on the remaining variance parameters.

We might expect future versions of mixed modelling software to implement penalized spline smoothing automatically at the request of the user, relieving biostatisticians of the burden associated with creating spline bases and design matrices manually. This would not, of course, obviate the need for careful consideration of the problem at hand and a decision as to whether even the best semi-parametric regression model is a good fit to the data.
APPENDIX A: COMPUTING CODE

The following code implements the mammographic density example in R, using the library lmeSplines. We work with a data frame called twins that contains the following variables:

perc.density    per cent mammographic density
age             age at mammogram in years as integer
zyg             zygosity: 1 = MZ, 2 = DZ
pairnum         twin-pair ID number 1-951
twinnum         within-pair twin number 1 or 2

The relevant R computing code is:

library(lmeSplines)    # as referenced above
library(nlme)          # provides lme() and pdIdent()
attach(twins)          # attach the data frame "twins" in position 1
knots <- sort(unique(age))

y <- perc.density                     # set outcome "y" to per cent density
cons <- rep(1, length(y))             # set "cons" to a vector of ones
dz2 <- rep(0, length(y))
dz2[(zyg == 2) & (twinnum == 2)] <- 1
mzdz1 <- cons - dz2
X <- cbind(rep(1, length(y)), age)    # create fixed effects design matrix X
Z <- outer(age, knots, "-")           # create random effects design matrix Z
Z <- Z * (Z > 0)
twins.fit <- lme(y ~ -1 + X,
                 random = list(cons = pdIdent(~ -1 + Z),
                               pairnum = pdCompSymm(~ -1 + mzdz1 + dz2)),
                 method = "REML")
twins.fitted.values <- twins.fit$fitted[, 2]

The vector mzdz1 contains a 1 for all MZ twins and for the (arbitrarily labelled) first twin of each DZ twin pair; the vector dz2 contains a 1 for the second twin in each DZ pair, so that the sum of the vectors mzdz1 and dz2 is identically 1. By declaring the (random) coefficients associated with the vectors mzdz1 and dz2 within each twin pair to have a compound symmetry correlation structure we can reproduce the additive genetic model described in Section 4. The coefficients are declared to vary across twin pairs by using the pairnum twin-pair factor to name the pdCompSymm structure in the above call to lme(). Since lme() requires all random coefficients to vary over higher-level units, the above syntax uses the dummy variable cons of all ones to form a single unit containing all of the data. The pdIdent correlation structure then constrains the random coefficients associated with the penalized spline to have the same variance, namely σ_u². The fitted values for the penalized spline are contained in the second column of the fitted object, since we must include the random effects in the linear predictor; the first column of this object contains the fitted values based on the single fixed linear effect only and in this case is meaningless without adding the random effects associated with the penalized spline fit.
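A short plotting fragment of the kind one might use to display the fitted smooth against the raw data (illustrative only, using the objects created above):

ord <- order(age)
plot(age, perc.density, col = "grey",
     xlab = "Age at mammogram (years)", ylab = "Percent breast density (%)")
lines(age[ord], twins.fitted.values[ord], lwd = 2)   # penalized spline fit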
The relevant WinBUGS computing code is:

model
{
  for (i in 1:951){
    A1[i] ~ dnorm(0.0, tau.A)    # random effects for MZ/DZ
    A2[i] ~ dnorm(0.0, tau.A)
    A3[i] ~ dnorm(0.0, tau.A)
  }
  for (i in 1:34){
    u[i] ~ dnorm(0.0, tau.u)     # random effects for spline
  }
  for (j in 1:34){               # loop to set up the basis functions
    smeanY1[j,1] <- a + b1*(j+37)
    for (k in 1:Nknots){
      smeanY1[j,(k+1)] <- smeanY1[j,k] +
        u[k] * step(j+37-knot[k])*(j+37-knot[k])
    }
    smeanY[j] <- smeanY1[j,(Nknots+1)]
  }
  for (j in 1:1902){             # loop to assign mean and distribution to observations
    meanY[j] <- smeanY[age[j]-37] +
      mz[j]*(sqrt(rho)*A1[pairnum[j]] + sqrt(1-rho)*A2[pairnum[j]]) +
      (1-mz[j])*( sqrt(rho)*A1[pairnum[j]] +
        sqrt(1-rho)*((1-twin1[j])*A2[pairnum[j]] + twin1[j]*A3[pairnum[j]]) )
    perc.density[j] ~ dnorm(meanY[j], tau.Y)
  }
  a ~ dnorm(0.0, 1.0E-8)
  b1 ~ dnorm(0.0, 1.0E-8)
  tau.A <- 1/pow(sig.A,2)
  tau.u <- 1/pow(sig.u,2)
  tau.Y <- 1/pow(sig.Y,2)
  rho ~ dbeta(1,1)
  sig.A ~ dunif(0,100)
  sig.u ~ dunif(0,100)
  sig.Y ~ dunif(0,100)
}

The variables are defined as for the R code, except that mz is an indicator for an MZ twin, and twin1 is an indicator for twinnum = 1. The prior distributions employed here and in application 3 are discussed in more detail in Section 5.2; see also Reference [63, Section 5.7] for a recent comprehensive discussion of prior distributions for variance parameters.

ACKNOWLEDGEMENTS

The authors thank the anonymous referees for their comments, which have greatly improved the presentation of the paper. We thank Professor Norman F. Boyd, Professor John L. Hopper and Ms Gillian S. Dite for permission to use the mammographic density data, the generation of which was supported by grants from the National Breast Cancer Foundation (Australia), the National Health and Medical Research Council (Australia), the Merck, Sharp & Dohme Research Foundation (Australia), and the Canadian Breast Cancer Research Institute. We thank also Professor Lyle J. Palmer, Professor Newton Morton and Professor Bill Cookson for permission to use the bronchial hyperresponsiveness data. The authors acknowledge several helpful discussions with Professor John Hopper and Ms Gillian S. Dite.

REFERENCES
1. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. Wiley: New York, 2000.
2. Boyd NF, Dite GS, Stone J et al. Heritability of mammographic density, a risk factor for breast cancer. New England Journal of Medicine 2002; 347:886-894.
3. Ihaka R, Gentleman R. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 1996; 5:299-314.
4. Gilks WR, Thomas A, Spiegelhalter DJ. A language and program for complex Bayesian modelling. The Statistician 1994; 43:169-178.
5. Spiegelhalter DJ, Thomas A, Best NG, Lunn D. WinBUGS Version 1.4 User Manual. MRC Biostatistics Unit, Cambridge, U.K., 2003. www.mrc-bsu.cam.ac.uk/bugs/winbugs

6. Goldstein H, Healy MJR, Rasbash J. Multilevel time series models with applications to repeated measures data. Statistics in Medicine 1994; 13:1643–1655.
7. Goldstein H, Browne WJ, Rasbash J. Multilevel modelling of medical data (Tutorial in Biostatistics). Statistics in Medicine 2002; 21:3291–3315.
8. Browne WJ, Draper D, Goldstein H, Rasbash J. Bayesian and likelihood methods for fitting multilevel models with complex level-1 variation. Computational Statistics and Data Analysis 2002; 39:203–225.
9. Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics 1975; 31:423–447.
10. Robinson GR. That BLUP is a good thing: the estimation of random effects. Statistical Science 1991; 6:15–51.
11. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B 2004; 66:165–185.
12. Kackar RN, Harville DA. Approximations for standard errors of fixed and random effects in mixed linear models. Journal of the American Statistical Association 1985; 79:853–862.
13. Green PJ. Penalized likelihood for general semi-parametric regression models. International Statistical Review 1987; 55:245–259.
14. Wand MP. Smoothing and mixed models. Computational Statistics 2003; 18:223–249.
15. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall: London, 1994.
16. Wand MP, Jones MC. Kernel Smoothing. Chapman & Hall: London, 1995.
17. Hansen MH, Huang JZ, Kooperberg C, Stone CJ, Truong YK. Statistical Modelling with Spline Functions: Methodology and Theory. Springer: New York, 2003.
18. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press: Cambridge, U.K., 2003.
19. Whittaker ET. On a new method of graduation. Proceedings of the Edinburgh Mathematical Society 1923; 41:63–75.
20. Kimeldorf G, Wahba G. A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Annals of Mathematical Statistics 1970; 41:495–502.
21. Wahba G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society, Series B 1978; 40:364–372.
22. Craven P, Wahba G. Smoothing noisy data with spline functions. Numerische Mathematik 1979; 31:377–403.
23. Wahba G. Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society, Series B 1983; 45:133–150.
24. Wahba G. Spline Models for Observational Data. Wiley: New York, 1990.
25. Speed TP. Comment in discussion of Robinson GR. That BLUP is a good thing: the estimation of random effects. Statistical Science 1991; 6:15–51.
26. Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science 1996; 11:89–121.
27. Marx BD, Eilers PHC. Direct generalized additive modelling with penalized likelihood. Computational Statistics and Data Analysis 1998; 28:193–208.
28. Hastie T. Pseudosplines. Journal of the Royal Statistical Society, Series B 1996; 58:379–396.
29. O'Sullivan F. A statistical perspective on ill-posed inverse problems (with discussion). Statistical Science 1986; 1:505–527.
30. O'Sullivan F. Fast computation of fully automated log-density and log-hazard estimators. SIAM Journal on Scientific and Statistical Computing 1988; 9:363–379.
31. Kelly C, Rice J. Monotone smoothing with applications to dose-response curves and the assessment of synergism. Biometrics 1990; 46:1071–1085.
32. Gray RJ. Spline-based tests in survival analysis. Biometrics 1992; 50:640–652.
33. Brumback B, Ruppert D, Wand MP. Comment on 'Variable selection and function estimation in additive nonparametric regression using data-based prior' by Shively TS, Kohn R, Wood S. Journal of the American Statistical Association 1999; 94:794–797.
34. Ruppert D. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics 2002; 11:735–757.
35. Cai T, Hyndman RJ, Wand MP. Mixed model-based hazard estimation. Journal of Computational and Graphical Statistics 2002; 11:784–798.
36. Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics 2004; 13:183–212.
37. Aerts M, Claeskens G, Wand MP. Some theory for penalized spline generalized additive models. Journal of Statistical Planning and Inference 2002; 103:455–470.
38. Berry SM, Carroll RJ, Ruppert D. Bayesian smoothing and regression splines for measurement error problems. Journal of the American Statistical Association 2002; 97:160–169.
39. Yu Y, Ruppert D. Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association 2002; 97:1042–1054.
40. Durban M, Currie ID. A note on P-spline additive models with correlated errors. Computational Statistics 2003; 18:251–262.
41. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B 2004; 66:165–185.
42. Fahrmeir L, Kneib T, Lang S. Penalized structured additive regression for space-time data. Statistica Sinica 2004; 14:731–761.
43. Greenland S. Historical HIV incidence modelling in regional subgroups: use of flexible discrete models with penalized splines based on prior curves. Statistics in Medicine 1996; 15:513–525.
44. Marx BD, Eilers PHC. Multivariate calibration stability: a comparison of methods. Journal of Chemometrics 2002; 16:129–140.
45. Kauermann G, Ortlieb R. Temporal pattern in number of staff on sick leave: the effect of downsizing. Journal of the Royal Statistical Society, Series C-Applied Statistics 2004; 53:353–367.
46. Eisen EA, Agalliu I, Thurston SW, Coull BA, Checkoway H. Smoothing in occupational cohort studies: an illustration based on penalised splines. Occupational and Environmental Medicine 2004; 61:854–860.
47. Hastie TJ, Tibshirani RJ. Generalized Additive Models. Chapman & Hall: London, 1990.
48. Hopper JL, Mathews JD. A multivariate normal model for pedigree and longitudinal data and the software Fisher. Australian Journal of Statistics 1994; 36:153–167.
49. Williams CJ. On the covariance between parameter estimates in models of twin data. Biometrics 1993; 49:557–568.
50. SAS Institute Inc. SAS/STAT Software Version 8. SAS Institute Inc., Cary, NC, USA, 2000. www.sas.com
51. Ngo L, Wand MP. Smoothing with mixed model software. Journal of Statistical Software 2004; 9:1–54.
52. Crainiceanu CM, Ruppert D, Wand MP. Bayesian analysis for penalized spline regression using WinBUGS. Journal of Statistical Software 2004, under revision, available at www.people.cornell.edu/pages/cmc59/
53. McCullagh P, Nelder J. Generalised Linear Models. Chapman & Hall: London, 1989.
54. Aitkin M, Anderson D, Francis B, Hinde J. Statistical Modelling in GLIM. Oxford University Press: Oxford, 1989.
55. Breslow N, Clayton DG. Approximate inference in generalised linear mixed models. Journal of the American Statistical Association 1993; 88:9–25.
56. StataCorp. Stata: Release 8.2. Stata Corporation, College Station, TX, USA, 2003. www.stata.com
57. Hall I. β2-adrenoreceptor polymorphisms and asthma. Clinical and Experimental Allergy 1999; 29:1151–1154.
58. Daniels S, Bhattacharrya S, James A, Leaves N, Young A, Hill M, Faux J, Ryan G, le Souef P, Lathrop M et al. A genome wide search for quantitative trait loci underlying asthma. Nature 1996; 383:247–250.
59. Doull I, Lawrence S, Watson M, Begishvili T, Beasley R, Lampe F, Holgate S et al. Allelic association of gene markers on chromosomes 5q and 11q with atopy and bronchial hyperresponsiveness. American Journal of Respiratory and Critical Care Medicine 1996; 153:1280–1284.
60. Yan K, Salome C, Woolcock A. Rapid method for measurement of bronchial hyperresponsiveness. Thorax 1983; 38:760–765.
61. Scurrah KJ, Palmer LJ, Burton PR. Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology 2000; 19:127–148.
62. Burton P, Tiller K, Gurrin L, Cookson W, Musk A, Palmer L. Genetic variance components analysis for binary phenotypes using generalized linear mixed models (GLMMs) and Gibbs sampling. Genetic Epidemiology 1999; 17:118–140.
63. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Randomised Trials and Healthcare Evaluation. Wiley: Chichester, 2004.
