
Regression Models for Survival Data

Hannelore Liero
Institute of Mathematics
University of Potsdam

Contents

1 Introduction
  1.1 Explanatory Variables
  1.2 Parametric and Semiparametric Models
  1.3 Data Structure

2 Parametric Regression Models

3 The Proportional Hazard Model (PH model)
  3.1 Definition of the Model
  3.2 Estimation of the parameter β
      3.2.1 The Partial Likelihood
      3.2.2 Computation of the Estimates in R -- Examples
      3.2.3 Mathematical Justification of the Partial Likelihood
  3.3 Statistical Inference Based on the Partial Likelihood Estimator
      3.3.1 Asymptotic Normality of β̂
      3.3.2 Asymptotic Confidence Regions and Tests for β
  3.4 Comparison of Two or More Lifetime Distributions
  3.5 Estimation of the Baseline Distribution
  3.6 Stratification

4 References and Index

1 Introduction

Survival analysis concerns the times of events. Such data are particularly
common in medicine, engineering and social sciences, but also arise in many
other domains. The responses may be incompletely observed owing to censoring
or truncation. Here we give an introduction to regression analysis of such data,
that is we investigate the relationship between the survival time and values of
an explanatory variable.
We consider the situation with just one event per individual. Throughout we
use the terms death, failure or event to describe the event of interest, and
refer to the time to death as a lifetime or survival time.
Let us start with an example:
Data example 1.1 (Leukaemia and white blood count.) Table 1.1 contains data on the survival of acute leukaemia victims considered by Feigl and Zelen.
The covariate is the log10 white blood cell count at the time of diagnosis, and the
patients are grouped according to the presence or absence of a morphologic characteristic of their white blood cells (AG).

            AG positive                        AG negative
No.   log10(WBC)   Time              No.   log10(WBC)   Time
 1       3.36        65               1       3.64        56
 2       2.88       156               2       3.48        65
 3       3.63       100               3       3.60        17
 4       3.41       134               4       3.18         7
 5       3.78        16               5       3.95        16
 6       4.02       108               6       3.72        22
 7       4.00       121               7       4.00         3
 8       4.23         4               8       4.28         4
 9       3.73        39               9       4.43         2
10       3.85       143              10       4.45         3
11       3.97        56              11       4.49         8
12       4.51        26              12       4.41         4
13       4.54        22              13       4.32         3
14       5.00         1              14       4.90        30
15       5.00         1              15       5.00         4
16       4.72         5              16       5.00        43
17       5.00        65

Table 1.1: Survival time and white blood cells (Feigl and Zelen)

In Data example 1.1 the white blood cell count (WBC) gives additional
information about the survival time. In addition we have to take into
account the morphologic characteristic. Thus we have two covariates: z =
(log10(WBC), group number).

Other examples: For the survival time following a heart
transplantation, a vector z may denote (age, sex, treatment type). In a
survival study for lung cancer patients, factors such as the age and general condition
of the patient and the type of tumor are recorded. In experiments on the
time to failure of an electrical insulation an important factor is the voltage the
insulation is subjected to. In clinical trials in medicine, the treatment assigned
to a patient may be considered a covariate.
There are two reasons for regression modeling:
- The covariates express a heterogeneity of the underlying population
  (patients with different treatments, different stress levels in life testing
  experiments, different types of tumor). To model the survival time in
  such cases one has to take this heterogeneity into account.
- Often the main interest is not the distribution of the lifetime itself but the
  relationship between the lifetime and the covariates. For example,
  a question could be: Does the survival time depend on the age of the
  patient at diagnosis?

1.1 Explanatory Variables

In general we consider a survival time X > 0 depending on a p-dimensional
explanatory variable z. Other terms for z are covariate, regressor or independent
variable. The vector z may include quantitative variables, such as blood
pressure, temperature, age, and weight, and qualitative variables, such as
gender, race, treatment and disease status. The covariates may vary over
time or be time-independent.
In a simple comparison of two treatments (new, old) we consider a binary
explanatory variable, that is z = 1 for those individuals getting the new
treatment, and z = 0 for the other ones. If the treatment is specified by a
dose level, the corresponding explanatory variable is dose or log dose. If the
treatment is factorial, several variables will be required with synthesized product
variables to represent interactions where appropriate.
Data example 1.2 (Male laryngeal cancer patients.) Kardaun (1983)
reports data on 90 males diagnosed with cancer of the larynx during 1970-1978 at a Dutch hospital. The times recorded are the intervals between the first
treatment and either death or the end of the study. Also recorded are the patient's
age at the time of diagnosis, the year of diagnosis, and the stage of the disease.
The four stages of disease were based on an international classification rule
based on primary tumor, nodal involvement and distant metastasis. The stages,
denoted by I, II, III and IV, are ordered from least serious to most serious. 1
We define

    z1 = 1 if the patient is in stage II, 0 otherwise,
    z2 = 1 if the patient is in stage III, 0 otherwise,
    z3 = 1 if the patient is in stage IV, 0 otherwise,

1 The data are taken from Klein and Moeschberger and are available in the Appendix:
R-Programs.

further, z4 is the age at the time of diagnosis. To model interactions between the
stage of disease and age we set z5 = z1 z4, z6 = z2 z4 and z7 = z3 z4. Thus, for a
50-year-old patient with stage II disease the covariate vector is (1, 0, 0, 50, 50, 0, 0).

In many cases covariates will be constant over time. However, time-dependent
variables can arise in two ways: internal or external. Sometimes a time-varying
covariate may be linked physically with the lifetime process. For example,
blood pressure may be linked to the time or age at which an individual has a
stroke. Such covariates are called internal, and their treatment requires care. A
covariate process Z = {z(x), x ≥ 0} which develops independently of the lifetime
process is termed external; factors such as air pollution or climate conditions,
or applied stresses such as voltage or temperature in life test experiments, are
examples.
Data example 1.3 (Bone Marrow Transplantation for Leukaemia)
Bone marrow transplants are a standard treatment for acute leukaemia. Recovery following bone marrow transplantation is a complex process. Prognosis
for recovery may depend on risk factors known at the time of transplantation,
such as patient and donor age and sex, the stage of the initial disease, the time
from diagnosis to transplantation etc. The final prognosis may change as
the patient's post-transplantation history develops, with the occurrence of
events at random times during the recovery process, such as the development of
acute or chronic graft-versus-host disease (GVHD), the return of the platelet
count to normal levels, the return of granulocytes to normal levels, or the development
of infections. Transplantation can be considered a failure when a patient's
leukaemia returns (relapse) or when he or she dies while in remission (treatment
related death). Here is a part of the data:
The variables in Table 1.2 are:
  group  Disease group
  t1     Time to death or on-study time
  t2     Disease-free survival time (time to relapse, death or end of study)
  d1     Death indicator: 1 - dead, 0 - alive
  d2     Relapse indicator: 1 - relapsed, 0 - disease free
  d3     Disease-free survival indicator: 1 - dead or relapsed, 0 - alive (disease free)
  ta     Time to acute graft-versus-host disease
  da     Acute GVHD indicator: 1 - developed acute GVHD, 0 - never developed acute GVHD
  tc     Time to chronic graft-versus-host disease
  dc     Chronic GVHD indicator: 1 - developed chronic GVHD, 0 - never developed chronic GVHD
  tp     Time to platelet recovery
  dp     Platelet recovery indicator: 1 - platelets returned to normal, 0 - platelets never returned to normal
  z1     Patient age in years
  z2     Donor age in years
  z3     Patient sex: 1 - male, 0 - female
  z4     Donor sex: 1 - male, 0 - female
  z5     Patient CMV status: 1 - CMV positive, 0 - CMV negative
  z6     Donor CMV status: 1 - CMV positive, 0 - CMV negative
  z7     Waiting time to transplant in days
  z8     FAB: 1 - FAB grade 4 or 5 and AML, 0 - otherwise
  z9     Hospital: 1 - The Ohio State University, 2 - Alferd, 3 - St. Vincent, 4 - Hahnemann
  z10    MTX used as a graft-versus-host prophylactic: 1 - yes, 0 - no

In this data set we see that there are three intermediate events that occur during the
transplant recovery process which may be related to the disease-free survival time.
These are the development of acute graft-versus-host disease, the development
of chronic graft-versus-host disease, and the return of the patient's platelet
count to a self-sustaining level (platelet recovery).

Pat.Nr  group    t1     t2   d1  d2  d3    ta  da    tc  dc   tp  dp
   1      1    2081   2081    0   0   0    67   1   121   1   13   1
   2      1    1602   1602    0   0   0  1602   0   139   1   18   1
   3      1    1496   1496    0   0   0  1496   0   307   1   12   1
  ...
  49      2     860    860    0   0   0   860   0   860   0   15   1
  50      2    1258   1258    0   0   0  1258   0   120   1   66   1
  51      2    2246   2246    0   0   0    52   1   380   1   15   1

Pat.Nr   z1   z2   z3   z4   z5   z6    z7   z8   z9  z10
   1     26   33    1    0    1    1    98    0    1    0
   2     21   37    1    1    0    0  1720    0    1    0
   3     26   35    1    1    1    0   127    0    1    0
  ...
  49     25   31    0    1    0    1   180    0    1    0
  50     30   16    0    1    1    0   180    0    2    1
  51     45   39    0    0    0    0   105    0    4    0

Table 1.2: Part of Bone Marrow Transplantation Data

In the analysis of the data it turned out that the risk group (AML low-risk, AML high-risk), the so-called
FAB factor, the patient age and the donor age are the most important
fixed covariates. Further investigations showed that the main time-dependent
effect is platelet recovery and that interactions between these factors are
significant. As an example consider the following part of the interpretation of
the results: "The relative risk of treatment failure (death or relapse) before
platelet recovery for an AML low-risk patient compared to an ALL patient is
3.69, and a 95% confidence interval for the relative risk is [0.74, 18.39]. ... The
risk of treatment failure after platelet recovery for an AML low-risk patient
compared to an ALL patient is 0.18 and the 95% confidence interval for this
risk is [0.08, 0.41]. ... This suggests that the difference in outcome between AML
low-risk patients and the ALL patients is due to different survival rates after
the platelets recover and that, prior to platelet recovery, the two risk groups
are quite similar." 2

2 taken from Klein/Moeschberger

In Section 2 and Section 3 we will consider only models with covariates which
do not vary in time.
As in classical regression theory we can consider the covariate as a value
of a random variable or as a fixed quantity. Roughly speaking, if we know
the values of the covariate before the experiment is carried out, or if we observe
the covariates with a negligible error, then z can be considered as a nonrandom
variable. Otherwise, we model z as a realization of a random variable Z. In
both cases we use the following notation: The survival function of X given the
covariate z is denoted by S(·|z). For a random covariate Z this is a conditional
survival function

    S(x|z) = P(X > x | Z = z).

For the nonrandom case it can be useful to write Xz to represent the survival
time X for covariate values z. Then

    S(x|z) = P(Xz > x).

Further, f(·|z), h(·|z) and H(·|z) are the density, the hazard rate (function)
and the cumulative hazard function, respectively.

1.2 Parametric and Semiparametric Models

Let ψ denote the function describing the influence of the covariate z on the
lifetime X. Suppose that ψ has a parametric form, that is ψ(·) = ψ(·; β),
where ψ(·; β) is known up to the finite-dimensional parameter β.
If the type of the distribution of X is known, we have a parametric model.
One possibility to define such a model is to choose a typical lifetime distribution
whose parameters depend on the covariate. Consider the following examples:

Example 1.1 (Exponential model) Suppose that X given Z = z is exponentially distributed with expectation exp(β^t z) = exp(Σ_{j=1}^p zj βj). Then the
survival function is defined by

    S(x|z) = exp(-λ(z; β) x)   with   λ(z; β) = exp(-β^t z).

Example 1.2 (Log normal model) Suppose that ln X given Z = z has a
normal distribution with expectation β^t z and variance σ². Then

    S(x|z) = 1 - Φ( (ln x - β^t z) / σ ),

where Φ is the distribution function of the standard normal distribution.

Another possibility for describing the influence of the covariates is to model
their effect on the hazard rate directly. A well-known model is the proportional
hazard model (PH model), which belongs to the class of multiplicative hazard
models. Here the hazard rate of X given z is of the form

    h(x|z) = h0(x) ψ(z).

The function h0 is the so-called baseline hazard rate, which does not depend
on the covariate, and ψ > 0. The key feature of this type of model is that
the hazard rates of two individuals with distinct values of the covariate are
proportional: Let z ≠ z′; then the ratio of the hazards is, for arbitrary x,

    h(x|z) / h(x|z′) = h0(x) ψ(z) / ( h0(x) ψ(z′) ) = ψ(z) / ψ(z′),

which is a constant. The baseline hazard h0 can be considered as the hazard function
of an individual whose covariate vector z is such that ψ(z) = 1. In other words,
the baseline hazard is the same for all objects/individuals. The function ψ does
not contain a constant term.
Fully parametric PH models specify the baseline hazard h0(·; θ) and ψ(·; β)
parametrically.
Very often a parametric form is assumed only for the function ψ; the
baseline hazard h0 is treated nonparametrically. Such models are called
semiparametric models.
Because h must be positive, a common parametric specification for ψ is
ψ(z; β) = exp(β^t z), in which case h0 is the hazard function for z = 0.
An estimation procedure for this class of semiparametric models is given in
Section 3.
Data example 1.4 (Male laryngeal cancer patients; continuation) Let
us fit a PH model to the data of Data example 1.2. Set

    ψ(z; β) = exp(β1 z1 + β2 z2 + β3 z3 + β4 z4).

With the method described in the next section we will obtain the following
estimates:

    β̂1 = 0.1386,   β̂2 = 0.6383,   β̂3 = 1.6931,   β̂4 = 0.0189.

Thus the relative risk for a 50-year-old patient compared to a 40-year-old
patient (both in Stage IV disease) is exp(10 β̂4) = 1.2.

Exercise 1.1 Show that the survival function of a PH model has the form

    S(x|z) = S0(x)^ψ(z),

where S0 is the baseline survival function.

Exercise 1.2 Does the Weibull distribution with shape parameter γ and scale
parameter φ(z), i.e.

    S(x|z) = exp( -(x/φ(z))^γ ),                                     (1.1)

belong to the PH family?

Another important class of survival models with covariates is the class of accelerated
life time models (ALT)3. Here the covariates are assumed to act directly on
the lifetime, that is to speed it up or to retard its progress. In terms of the lifetime
X, the speeding up or slowing down is accomplished by the positive covariate
function ψ, and we may write for X given Z = z, or X = Xz respectively,

    X = X0 / ψ(z).                                                   (1.2)

It follows from (1.2) that the survival function of X has the form

    S(x|z) = S0( x ψ(z) ),

where S0 is the survival function of the baseline lifetime X0.

Exercise 1.3 Show that the Weibull distribution (2.10) belongs to the ALT
family.

Exercise 1.4 Show that the survival functions of PH models and ALT models
are ordered in the sense that either S(x|z1) ≥ S(x|z2) for all x, or S(x|z1) ≤ S(x|z2) for all x.
An interesting property of the ALT models is the following: Taking
logarithms on each side of (1.2) gives

    Y = ln X = -ln ψ(z) + W,

where W is a random error with a distribution independent of the covariates:

    P(W > w|z) = P(Y + ln ψ(z) > w | z) = P(ln(X ψ(z)) > w | z)
               = P(X ψ(z) > exp(w) | z) = S0(exp(w)).

For ψ(z) = exp(β^t z) we have

    Y = ln X = -β^t z + W,

which looks very similar to the classical parametric regression model. Note
that the expectation of the error W is not necessarily zero. Common choices
for the error distribution include the standard normal distribution, the extreme
value distribution or the logistic distribution. In these cases the baseline
distribution is the lognormal distribution, the Weibull distribution or the log-logistic
distribution, respectively.
In such parametric models we apply the maximum likelihood method to estimate
the parameter β.
If we do not specify the distribution of the baseline lifetime we can apply the
following approach: Set

    Y = ln X = μ0 - β^t z + ε,

3 also AFT models, for Accelerated Failure Time


where μ0 = E ln X0 and ε = ln X0 - E ln X0 is a random variable with mean zero
and a variance which does not depend on z. Assuming identifiability we can
use the least squares approach to estimate the unknown parameter (μ0, β^t)^t.
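For illustration, such a least squares fit of the log-linear model can be carried out with standard regression tools; the following sketch assumes a hypothetical data frame dat with an uncensored lifetime column time and covariate columns z1, z2:

    # least squares fit of Y = ln X = mu0 - beta^t z + eps (no censoring assumed)
    fit.ls <- lm(log(time) ~ z1 + z2, data = dat)
    coef(fit.ls)        # intercept estimates mu0, slopes estimate -beta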
Another class of models is characterized by an additive structure of the hazard
functions. Let us mention Aalen's nonparametric additive hazard model

    h(x|z) = β0(x) + β(x)^t z(x).

Here the time-dependent regression coefficients β(·) are left completely unspecified except for the assumption

    ∫_0^t |βj(u)| du < ∞   for all t,  j = 1, . . . , p.

Lin and Ying proposed an alternative additive hazards regression model. For their
model the possibly time-varying coefficients in the Aalen model are replaced by
constants:

    h(x|z) = β0(x) + β^t z(x),

where βj, j = 1, . . . , p, are unknown parameters and β0 is an arbitrary baseline
function.
For the investigation of such models the reader is referred to the books by
Klein/Moeschberger (Chapter 10) and Martinussen/Scheike (Chapter 5).

1.3 Data Structure

Data of the following structure are the basis of our investigations: An individual
may not be observed over the whole of its lifetime, so that, for example, we
may only know that it survived to the end of the study. In other words, we
have censored data. We will consider random right-censoring of type I: Each
individual is assumed to have a lifetime X and a censoring time C with survival
functions S and 1 - G, respectively. Our observations are realizations of the
independent and identically distributed triples (Ti, δi, Zi) or of the independent
(not identically distributed) pairs (Ti, δi), where

    Ti = min(Xi, Ci),    δi = 1 if Xi ≤ Ci,  δi = 0 if Xi > Ci.      (1.3)

We assume that the censoring is noninformative in that, given the covariate Zi,
the event time Xi and the censoring time Ci are independent.
The lifetime X is assumed to be a continuous random variable. The special case
of nonrandom right-censoring at a fixed point c is included by taking

    G(t) = 1 if t ≥ c,  G(t) = 0 if t < c.
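In R such observation pairs (Ti, δi) are easily constructed from (here hypothetical) vectors of lifetimes and censoring times and are passed to the routines of the survival package via Surv(); a minimal sketch:

    library(survival)
    X <- c(12, 30, 7, 45, 22)        # hypothetical lifetimes
    C <- c(20, 25, 40, 50, 10)       # hypothetical censoring times
    T.obs <- pmin(X, C)              # observed time T = min(X, C)
    delta <- as.numeric(X <= C)      # censoring indicator: 1 = event, 0 = censored
    Surv(T.obs, delta)               # survival object used by survreg(), coxph(), ...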

2 Parametric Regression Models

In this section we discuss some examples of parametric survival regression
models. Suppose that the (conditional) survival function has a parametric
form; let us write S(·|z; θ, β), where the parameter θ ∈ R^m characterizes
the underlying distribution and the parameter β ∈ R^p the influence of the
covariates. The likelihood function is given by

    L(ν) = Π_{i=1}^n h(ti|z_i; ν)^δi S(ti|z_i; ν),    ν = (θ^t, β^t)^t,          (2.4)

and the log-likelihood function by

    l(ν) = Σ_{i=1}^n li(ν; ti, z_i, δi) = Σ_{i=1}^n [ δi ln h(ti|z_i; ν) - H(ti|z_i; ν) ].    (2.5)

Suppose that L is twice differentiable; then the maximum likelihood estimate ν̂,
which is defined by

    ν̂ = argmax L(ν),

is the solution of the likelihood equations

    Us(ν) = ∂l(ν)/∂νs = 0,    s = 1, . . . , m + p = k.

As in statistical models for independent and identically distributed random
variables one can show that under regularity conditions the estimator ν̂ = ν̂n
is asymptotically normal, that is

    √n (ν̂ - ν) →D N_k(0, Σ(ν)^{-1})                                  (2.6)

for a positive definite matrix Σ(ν) defined later on. The proof of (2.6) is based
on the following steps:
Step 1: By Taylor expansion (assuming that the likelihood function is three
times differentiable) we obtain for the score vector 4

    U(ν̂) - U(ν) = -U(ν) ≈ -M(ν) (ν̂ - ν),

where M(ν) is the k x k matrix with the elements

    Msr(ν) = -∂²l(ν)/(∂νs ∂νr),    s, r = 1, . . . , k.

Thus

    √n (ν̂ - ν) ≈ √n M(ν)^{-1} U(ν) = ( M(ν)/n )^{-1} U(ν)/√n.        (2.7)

Step 2: Applying the central limit theorem for independent but not necessarily
identically distributed random vectors to

    U(ν) = Σ_{i=1}^n ∂li(ν; Ti, z_i, δi)/∂ν

4 The symbol ≈ means "is approximately equal"; ∼ means "is approximately distributed as".

we obtain, under the assumption that

    (1/n) Σ_{i=1}^n Var( ∂li(ν; Ti, z_i, δi)/∂ν | z_i ) = (1/n) Σ_{i=1}^n Vi(ν, z_i) → Σ(ν),

where Σ(ν) is a positive definite matrix, the limit statement

    U(ν)/√n →D N_k(0, Σ(ν)).                                         (2.8)

Step 3: By the law of large numbers for independent, but not necessarily
identically distributed random vectors we obtain

    M(ν)/n - E M(ν)/n →P 0.

And since (under the usual regularity condition of exchange of integration and
differentiation)

    E M(ν) = Σ_{i=1}^n Vi(ν, z_i),

we have

    M(ν)/n →P Σ(ν).                                                   (2.9)

The statements (2.7), (2.8) and (2.9) together imply the desired asymptotic
normality (2.6).

Roughly speaking we have

    ν̂ ∼ N_k(ν, (n Σ(ν))^{-1}),

and the limiting covariance matrix can be consistently estimated by J(ν̂)^{-1},
where J(ν̂) = M(ν̂) (≈ Σ_{i=1}^n Vi(ν̂, z_i)) is called the observed Fisher information.


Let us consider some examples:

Example 2.1 (Exponential model) We start with the exponential model
introduced in Example 1.1. In this model we have no distribution parameter θ
and the log-likelihood function is given by

    l(β) = - Σ_{i=1}^n β^t z_i δi - Σ_{i=1}^n ti exp(-β^t z_i).

The estimate β̂ is the solution of the equations

    - Σ_{i=1}^n zij δi + Σ_{i=1}^n ti exp(-β^t z_i) zij = 0,    j = 1, . . . , p.

The Fisher information matrix has (for fixed covariates) the elements

    I(β)rs = Σ_{i=1}^n E(Ti) exp(-β^t z_i) zir zis.

If there is no censoring we have ETi = exp(β^t z_i) and therefore

    I(β)rs = Σ_{i=1}^n zir zis.

The observed Fisher matrix J(β̂) has the elements

    J(β̂)rs = Σ_{i=1}^n ti exp(-β̂^t z_i) zir zis.

To demonstrate this procedure let us continue Data example 1.1:

Data example 2.1 (Leukaemia and WBC; continuation) Assume that
the survival times of the leukaemia patients considered in Data example 1.1 are
exponentially distributed with expectation

    μ(z; β) = exp( β0 + β1 log10(wbc) + β2 1(group = 1) ).

The log-likelihood function has the form

    l(β) = - Σ_{i=1}^n (β0 + β1 zi1 + β2 zi2) - Σ_{i=1}^n exp(-β0 - β1 zi1 - β2 zi2) xi,

where zi1 is the log10(wbc) of the ith patient and zi2 = 1 if the ith patient is in
group AG positive and zi2 = 0 otherwise.
The maximum likelihood estimates and their standard errors are computed with
the help of the R procedure

    # negative log-likelihood of the exponential regression model (no censoring)
    ll <- function(beta)
      { -sum( -beta[1] - beta[2]*leuk$wbc - beta[3]*leuk$group
              - leuk$time*exp(-beta[1] - beta[2]*leuk$wbc - beta[3]*leuk$group) ) }
    outw <- nlm(ll, c(6, 0, 1), hessian = TRUE)   # minimize; return Hessian as well
    b <- outw$estimate

    Parameter   Estimate     se
      β0           5.81     1.29
      β1          -0.70     0.30
      β2           1.02     0.35

Table 2.1: Results of the Exp-fit for Data example 1.1

The results are given in Table 2.1. The fitted means are shown in Figure 2.1.
With outw$hessian we obtain the estimated observed Fisher information,

    J(β̂) = [  32.98   136.41   16.99
              136.41   577.51   67.28
               16.99    67.28   16.00 ],

and from the diagonal elements of its inverse the standard errors given in Table 2.1.
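The standard errors can be reproduced in R from the inverse of the Hessian returned by nlm, continuing the code above:

    Jhat <- outw$hessian               # estimated observed Fisher information
    se <- sqrt(diag(solve(Jhat)))      # standard errors of the estimates
    round(cbind(estimate = b, se), 2)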

[Figure: Leukaemia Data (Feigl/Zelen), exponential fit of the mean; survival time (weeks) against log(wbc) for the AG positive and AG negative groups.]

Figure 2.1: Fitted means for Exp-model (Example 1.1)

Exercise 2.1 Fit a Weibull distribution with shape parameter γ and scale
parameter φ(z; β), i.e.

    S(x|z) = exp( -(x/φ(z; β))^γ ),                                  (2.10)

to the data of Data example 1.1.
Voltage Level (kV)   ni   Breakdown Times
        26            3   5.79, 1579.52, 2323.7
        28            5   68.85, 426.07, 110.29, 108.29, 1067.6
        30           11   17.05, 22.66, 21.02, 175.88, 139.07, 144.12,
                          20.46, 43.40, 194.90, 47.30, 7.74
        32           15   0.40, 82.85, 9.88, 89.29, 215.10,
                          2.75, 0.79, 15.93, 3.91, 0.27,
                          0.69, 100.58, 27.80, 13.95, 53.24
        34           19   0.96, 4.15, 0.19, 0.78, 8.01,
                          31.75, 7.35, 6.50, 8.27, 33.91,
                          32.52, 3.16, 4.85, 2.78, 4.67,
                          1.31, 12.06, 36.71, 72.89
        36           15   1.97, 0.59, 2.58, 1.69, 2.71,
                          25.50, 0.35, 0.99, 3.99, 3.67,
                          2.07, 0.96, 5.35, 2.90, 13.77
        38            8   0.47, 0.73, 1.40, 0.74, 0.39,
                          1.13, 0.09, 2.38

Table 2.2: Times to breakdown (in minutes) at each of seven voltage levels

In the following we consider a parametric ALT model:

Data example 2.2 (Insulating fluid failure times.) Nelson (1972) described the results of a life test experiment in which specimens of a type of
electrical insulating fluid were subjected to a constant voltage stress. The length
of time until each specimen failed, or broke down, was observed. Table 2.2
gives results for seven groups of specimens, tested at voltages ranging from 26
to 38 kilovolts (kV).
Engineering background for this problem suggests that the Weibull distribution
with a scale function φ(z) = c z^(-p) and a shape parameter γ is appropriate. This
is referred to as a power law model. So we have

    S(x|z) = exp( -(x/φ(z))^γ ).

Taking now the logarithm we obtain

    Y = β0 + β1 log(z) + W,

where β0 = ln c, β1 = -p, and exp(W) is Weibull distributed with shape parameter
γ and scale parameter 1.
The R procedure

    survreg(formula = Surv(volt.stress$time) ~ z.log, dist = "weibull")

yields (in 6 iteration steps) the results given in Table 2.3.

    Parameter        Estimate     se
      β0               64.85    5.62
      β1              -17.73    1.61
      scale (1/γ)       1.288   0.113

Table 2.3: Estimates of the parameters in Data example 2.2

Exercise 2.2 (Steel specimens) In Table 2.4 you see data from four independent rolling contact fatigue tests on hardened steel specimens at four different
stress levels (in psi²/10⁶).

    Failure times at the four stress levels
    S1 (0.87)   S2 (0.99)   S3 (1.09)   S4 (1.18)
       1.67        0.80        0.012       0.073
       2.20        1.00        0.180       0.098
       2.51        1.37        0.200       0.117
       3.00        2.25        0.240       0.135
       2.90        2.95        0.260       0.175
       4.70        3.70        0.320       0.262
       7.53        6.07        0.320       0.270
      14.70        6.65        0.420       0.350
      27.80        7.05        0.440       0.386
      37.40        7.37        0.880       0.456

Table 2.4: Failure times of certain steel specimens at four stress levels

Fit the following model to the data:

    Y = β0 + β1 log(stress level) + W,

where exp(W) is Weibull distributed with shape parameter γ and scale parameter 1.

5 The example is taken from Smith, page 152

3 The Proportional Hazard Model (PH model)

3.1 Definition of the Model

We consider the semiparametric proportional hazard model introduced in the
previous section, i.e. the hazard function of the lifetime X given the covariate
z has the form

    h(x|z) = h0(x) exp(β^t z),                                       (3.11)

where h0 is the nonparametric baseline hazard function and β is a p-dimensional
parameter.
The characteristic property is the proportionality of the hazard functions; thus
for different z and z′

    h(x|z) / h(x|z′) = exp( β^t (z - z′) ).                          (3.12)

The ratio defined in (3.12) is called the hazard ratio or relative risk of an
individual/object with covariate z with respect to one with covariate z′. Let
z = (z1, . . . , zj, . . . , zp)^t and z′ = (z1, . . . , zj′, . . . , zp)^t; then the hazard ratio
describes the effect of zj adjusted for the other variables.
Note that

    log[ h(x|z) / h(x|z′) ] = β^t (z - z′).

A slight extension of (3.11) is a model of the form h(x|z) = h0(x) c(β^t z), where
c is a parametric function taking values in R+. Model (3.11) can be extended
to a completely nonparametric model

    h(x|z) = h0(x) exp( Σ_{j=1}^p ψj(zj) ),

where the ψj are nonparametric functions. Such models correspond to generalized
additive regression models and are discussed by Hastie and Tibshirani (1990).
In this section we consider model (3.11) and will discuss the following problems:
- estimation of β using the partial likelihood method
- testing hypotheses about the parameter β
- model building using the PH model
- estimation of the baseline distribution

3.2 Estimation of the parameter β

3.2.1 The Partial Likelihood

For observations (ti, δi, z_i), i = 1, . . . , n, the likelihood function for the PH
model is given by

    L*(β) = Π_{i=1}^n [ h0(ti) exp(β^t z_i) ]^δi S0(ti)^{exp(β^t z_i)}.          (3.13)

Since the baseline distribution is unknown, this likelihood function cannot be
used to derive estimates for β. Cox (1972) proposed to maximize the so-called
partial likelihood function. To introduce this concept we need the following
notation:
Let us denote the observed ordered event times by t(i), i = 1, . . . , d. First, we
assume that all death times are distinct; in other words, there are no ties and
we have

    t(1) < t(2) < · · · < t(d).

In this case d = Σ_{i=1}^n δi. Tied lifetimes have probability zero for continuous
distributions, but nevertheless they arise in data due to rounding. We will
consider tied observations later. Let R(t) denote the set of individuals who are
alive and uncensored just prior to time t; this is referred to as the risk set at
t, since it consists of those individuals who could be observed to die at t, given
what has occurred up to that time. The covariate associated with the individual
observed to die at t(i) is denoted by z(i).
We start with a heuristic consideration. The idea behind the partial likelihood
approach is:
By considering t(i) along with its associated risk set Ri = R(t(i)) in terms of
the probability of imminent death at t(i) conditional on Ri, we may write the
following approximation: Define the events

    Ai = "The individual from Ri with covariate z(i) dies at t(i)."
    Bi = "A member of Ri dies at t(i)."

Since

    P(t ≤ X < t + dt | X ≥ t, z) ≈ h(t|z) dt,

we have

    P(Ai | Bi) = P(Ai)/P(Bi) ≈ h(t(i)|z(i)) / Σ_{j∈Ri} h(t(i)|z_j)
               = exp(β^t z(i)) / Σ_{j∈Ri} exp(β^t z_j).              (3.14)

6 Note that for the case of random covariates (3.13) is the likelihood function based on the
conditional distribution.
The partial likelihood is formed by multiplying these conditional probabilities
over all deaths, so we have the partial likelihood function

    L(β) = Π_{i=1}^d exp(β^t z(i)) / Σ_{j∈Ri} exp(β^t z_j).          (3.15)

However, the function defined in (3.15) is not a likelihood in the usual sense,
since it does not arise from the probability of some observable outcome. The
mathematical justification of this approach will be given in Section 3.2.3.
Note that the numerator of the partial likelihood depends only on information
from the individual who experiences the event, whereas the denominator utilizes
information about all individuals who have not yet experienced the event,
including the individuals who will be censored later.
Here is an example for the construction of the risk sets:
No.   Observation t   Censor   Rank of event time
 1          7            1             1
 2        124            1             4
 3         88            0             -
 4         13            1             2
 5          2            0             -
 6         79            1             3

Table 3.1: Illustration of computing the likelihood

From this array of data we find, with n = 6 and d = 4,

    R1 = {1, 2, 3, 4, 6},  R2 = {2, 3, 4, 6},  R3 = {2, 3, 6}  and  R4 = {2}.
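These risk sets can be reproduced with a few lines of R, built directly from the (t, δ) pairs of Table 3.1:

    t     <- c(7, 124, 88, 13, 2, 79)    # observed times from Table 3.1
    delta <- c(1, 1, 0, 1, 0, 1)         # 1 = event, 0 = censored
    event.times <- sort(t[delta == 1])   # ordered event times t(1) < ... < t(d)
    # risk set R_i: individuals still under observation just before t(i)
    risk.sets <- lapply(event.times, function(s) which(t >= s))
    names(risk.sets) <- paste0("R", seq_along(event.times))
    risk.sets                            # R1={1,2,3,4,6}, R2={2,3,4,6}, R3={2,3,6}, R4={2}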

The log-likelihood is

    l(β) = Σ_{i=1}^d [ Σ_{k=1}^p βk z(i)k - ln Σ_{j∈Ri} exp( Σ_{k=1}^p βk zjk ) ].        (3.16)

The (partial) maximum likelihood estimates are found by maximizing (3.16).
The score equations are derived by taking partial derivatives with respect to β:

    U(β) = ∂l(β)/∂β = 0.                                             (3.17)

In other words, we have to solve the system of equations

    Us(β) = ∂l(β)/∂βs
          = Σ_{i=1}^d z(i)s - Σ_{i=1}^d [ Σ_{j∈Ri} zjs exp(Σ_{k=1}^p βk zjk) / Σ_{j∈Ri} exp(Σ_{k=1}^p βk zjk) ] = 0
    for s = 1, . . . , p.                                            (3.18)

The solution can be determined by the Newton-Raphson method.
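Schematically, one Newton-Raphson update has the form β_new = β_old + J(β_old)^{-1} U(β_old). The following R sketch is only an illustration of this iteration (not the implementation of any particular package); it assumes hypothetical functions U(beta) and J(beta) that evaluate the score vector (3.18) and the observed information matrix:

    newton.raphson <- function(beta0, U, J, tol = 1e-8, maxit = 25) {
      beta <- beta0
      for (it in 1:maxit) {
        step <- solve(J(beta), U(beta))   # solves J(beta) %*% step = U(beta)
        beta <- beta + step
        if (max(abs(step)) < tol) break   # stop when the update is negligible
      }
      beta
    }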

Modifications when ties are present. Until now we have supposed that all
event times are different. Now consider the case that ties are present. There
are modifications of the partial likelihood to adjust for simultaneous failure of
elements of the risk set Ri: Let t(1) < t(2) < · · · < t(s) denote the s distinct, ordered
death times. Further, di is the number of events at t(i) and Di is the set of all
individuals who die at time t(i). Let v_i be the sum of the vectors z_j over all
individuals who die at t(i), that is

    v_i = Σ_{j∈Di} z_j.

The first proposal is due to Breslow (1974): The partial likelihood is expressed
as

    L(β) = Π_{i=1}^s exp(β^t v_i) / [ Σ_{j∈Ri} exp(β^t z_j) ]^{di}.              (3.19)

This likelihood considers each of the di events at a given time as distinct,
constructs their contribution to the likelihood function, and obtains the contribution to the likelihood by multiplying over all events at time t(i). When
there are few ties, this approximation works quite well.
For illustration let us consider the data example given in Table 3.2.
No.   Lifetime t   Censor   Covariate z
 1        51          1         50
 2        51          1         47
 3       322          1         48
 4       828          0         42
 5       339          0         54
 6       551          1         50

Table 3.2: Illustration of computing the likelihood for tied data

We begin by ranking the data and constructing the risk sets:

No.   Lifetime t()   Censor δ()   Ties d   Covariate z()   Risk set R(t())
1,2       51            1,1          2        50,47        {1, 2, 3, 4, 5, 6}
 3       322             1           1          48         {3, 4, 5, 6}
 5       339             0           1          54
 6       551             1           1          50         {4, 6}
 4       828             0           1          42

Table 3.3: Illustration of computing the likelihood for tied data

The first term in the partial likelihood is

    e^{β(50+47)} / ( e^{50β} + e^{47β} + e^{48β} + e^{54β} + e^{50β} + e^{42β} )².

The second term is

    e^{48β} / ( e^{48β} + e^{54β} + e^{50β} + e^{42β} ),

and the last one

    e^{50β} / ( e^{50β} + e^{42β} ).

If there are no ties the Breslow proposal corresponds to the usual partial
likelihood. The same is true for the Efron approach:
Efron (1977) suggests a partial likelihood of the form

    L(β) = Π_{i=1}^s exp(β^t v_i) / Π_{j=1}^{di} [ Σ_{k∈Ri} exp(β^t z_k) - ((j-1)/di) Σ_{k∈Di} exp(β^t z_k) ].    (3.20)

For further proposals the reader is referred to the book of Klein and
Moeschberger.
3.2.2 Computation of the Estimates in R -- Examples

To compute the estimates for β we have to solve (3.18) or the corresponding
equations following from a modified partial likelihood function. This can be done
numerically via a Newton-Raphson procedure. The R command

    coxph( Surv(time, status) ~ x, data, method=... )

from the survival package can be used to find the maximum of the likelihood
function.
Here x is (as in the regression case) the vector of covariates; time is either the
event or the censoring time and status is a dummy variable coded 1 if the event
is observed or 0 if the observation is censored.
Let us consider some examples:
Data example 3.1 (Breast-Cancer Trial) A study was designed to determine
whether female breast-cancer patients, originally classified as lymph node negative
by standard light microscopy (SLM), could be more accurately classified
by immunohistochemical (IH) examination of their lymph nodes with an
anticytokeratin monoclonal antibody cocktail. The data for 45 female breast-cancer patients with negative axillary lymph nodes and a minimum 10-year
follow-up were selected from The Ohio State University Hospitals Cancer
Registry. Of the 45 patients, 9 were immunoperoxidase positive and the
remaining 36 were negative.
Here are the data: (+ censored observation)
Immunoperoxidase Negative:
 19   25   30   34   37   46   47   51   56   57   61   66
 67   74   78   86  122+ 123+ 130+ 130+ 133+ 134+ 136+ 141+
143+ 148+ 151+ 152+ 153+ 154+ 156+ 162+ 164+ 165+ 182+ 189+

Immunoperoxidase Positive:
 22   23   38   42   73   77   89  115  144+

We fit a proportional hazard model with the immunoperoxidase status as the
single covariate. With zi = 1 if this status of the ith patient is positive and
zi = 0 otherwise we suppose the model

    h(x|z) = h0(x) exp(βz).

The partial log-likelihood function has the form

    l(β) = d1 β - Σ_{i=1}^d ln( y0i + y1i exp(β) ),

where d1 is the number of deaths in the immunoperoxidase positive sample, and
y0i (y1i) is the number of individuals at risk in the immunoperoxidase negative
(positive) sample at time ti. We have to solve

    U(β) = d1 - Σ_{i=1}^d y1i exp(β) / ( y0i + y1i exp(β) ) = 0.
The procedure coxph yields the following result:

    coxph(formula = Surv(time, death) ~ im, data = btrial)

         coef   exp(coef)   se(coef)     z      p
    im   0.98      2.66       0.435    2.25   0.024

The estimate of β is 0.98; thus the relative risk of dying for an
immunoperoxidase-positive patient relative to an immunoperoxidase-negative patient is exp(0.98) = 2.66.
The estimate is computed in four iteration steps.

Data example 3.2 (Leukaemia and WBC; continuation) In Section 1
we assumed that the baseline distribution is an exponential distribution. Now
let us consider a semiparametric model:

    h(x|z) = h0(x) exp(β1 z1 + β2 z2),

where z1 = log10(wbc) - mean(log10(wbc)) and z2 = 1 for AG positive and
z2 = 0 for AG negative.
The centering of the components is partly for computational reasons and partly
to aid the interpretation. The results using different methods for handling
the ties are the following:

    coxph(Surv(time) ~ wbc + group, data = leuk, method = "efron")

            coef   exp(coef)   se(coef)     z       p
    wbc    0.844     2.326       0.313     2.70   0.007
    group -1.069     0.343       0.429    -2.49   0.013

    coxph(Surv(time) ~ wbc + group, data = leuk, method = "breslow")

            coef   exp(coef)   se(coef)     z       p
    wbc    0.827     2.287       0.312     2.65   0.008
    group -1.018     0.361       0.423    -2.40   0.016

    coxph(Surv(time) ~ wbc + group, data = leuk, method = "exact")

            coef   exp(coef)   se(coef)     z       p
    wbc    0.898     2.454       0.335     2.68   0.0073
    group -1.085     0.338       0.446    -2.43   0.0150

Let us interpret these results: The relative risk of dying for an AG positive patient
compared to an AG negative patient with the same wbc is exp(β̂2) = 0.343.
The relative risk for a patient with log10(wbc) = z compared to a patient with
log10(wbc) = z′ is exp(β̂1 (z - z′)). Note that this is the same for both groups.
To include interactions let us fit the following model:

    h(x|z) = h0(x) exp( β1 z1 + β2 z2 + β3 z1 (z2 - mean(group)) ).

Here are the results:

    coxph(Surv(time) ~ z_1 * z_2, data = leuk, method = "breslow")

                coef   exp(coef)   se(coef)     z       p
    wbc        0.922      2.51       0.320     2.88   0.0040
    group     -1.139      0.32       0.428    -2.66   0.0078
    wbc:group  1.137      3.12       0.636     1.79   0.0740

Let us discuss the relative risks based on the model with interactions. Here the
relative risk between the two groups depends on the white blood count. Note that
mean(group) = 0.5151:

    h(x | z1, 1, z1(1 - 0.5151)) / h(x | z1, 0, z1(0 - 0.5151)) = exp(β2 + β3 z1).

Consider now the hazard ratios within the same group. In the model without
interaction term we have

    h(x | z1, 1) / h(x | z1′, 1) = exp(β1 (z1 - z1′)) = h(x | z1, 0) / h(x | z1′, 0).

However, with the interaction term the relative risks differ:

    h(x | z1, 1, z1(1 - 0.5151)) / h(x | z1′, 1, z1′(1 - 0.5151)) = exp( β1(z1 - z1′) + β3 (1 - 0.5151)(z1 - z1′) )

and

    h(x | z1, 0, z1(0 - 0.5151)) / h(x | z1′, 0, z1′(0 - 0.5151)) = exp( β1(z1 - z1′) - β3 · 0.5151 (z1 - z1′) ).

The plot in Figure 3.1 supports this fact.

[Figure: Leukaemia Data (Feigl/Zelen), fitted linear predictors against log(wbc) for the AG positive and AG negative groups.]

Figure 3.1: Fitted linear predictors against log(wbc) (Example 1.1)

3.2.3 Mathematical Justification of the Partial Likelihood

The partial likelihood, proposed by Cox (1972), is a nice idea for handling
complicated data structures where the full likelihood function is hard to obtain.
It is particularly useful for eliminating nuisance parameters, leading to a simplified
likelihood equation for the parameters of interest.
The basic idea behind the partial likelihood concept in general is as follows: Suppose
that the data vector y is a realization of a random vector Y with density
f(y; θ, η), where θ is the parameter of interest and η is a nuisance parameter.
Suppose that Y can be transformed into parts V_1, W_1, V_2, W_2, . . . , V_m, W_m.
The joint density of V_1, W_1, V_2, W_2, . . . , V_m, W_m can be written as

    Π_{i=1}^m f_{V_i|S_i}(v_i | s_i; θ, η)  ·  Π_{i=1}^m f_{W_i|Q_i}(w_i | q_i; θ, η),    (3.21)

where S_i = (V_1, W_1, . . . , V_{i-1}, W_{i-1}) and Q_i = (S_i, V_i).
If the second term depends only on θ, i.e.

    Π_{i=1}^m f_{W_i|Q_i}(w_i | q_i; θ, η) = Π_{i=1}^m f_{W_i|Q_i}(w_i | q_i; θ),

this term is called a partial likelihood for θ.

In the proportional hazard model the parameter of interest is θ = β and the
nuisance parameter is η = h0. Let Wj be the label of the individual failing at t(j)
and

    V_j = {history from the (j-1)th death to the jth death, and T(j)}.

In other words, V_j consists of all events occurring between the time right after
T(j-1) and just before T(j), and the random variable T(j) itself. Then Q_j is
the history from time 0 to T(j)-, together with T(j), where T(j)- denotes the time
instantaneously before T(j). The following figure illustrates the notation:

[Figure 3.2 (illustration): time axis with the ordered death times T(1), T(2), T(3), the intervals V1, V2, V3, the histories Q1, Q2, Q3, and 7 individual lifelines.]

Figure 3.2: Illustration of the construction of a partial likelihood

In Figure 3.2 we have 7 observations, 3 of which are events. The labels of the events
are: w1 = 4, w2 = 1 and w3 = 6. The risk sets are R1 = {1, 3, 4, 5, 6, 7},
R2 = {1, 3, 6} and R3 = {6}.
Now let us derive the second term of (3.21) for the proportional hazard model:

    P(Wj = l | Qj) = lim_{dt→0} [ P(Tl ∈ [t(j), t(j)+dt) | Qj) Π_{i∈Rj\{l}} P(Ti ∉ [t(j), t(j)+dt) | Qj) ]
                              / [ Σ_{k∈Rj} P(Tk ∈ [t(j), t(j)+dt) | Qj) Π_{i∈Rj\{k}} P(Ti ∉ [t(j), t(j)+dt) | Qj) ].

Now, as dt → 0,

    P(Tk ∈ [t(j), t(j)+dt) | Qj) = P(Tk ∈ [t(j), t(j)+dt) | Tk ≥ t(j), z_k) ≈ h(t(j)|z_k) dt.

Similarly,

    lim_{dt→0} P(Ti ∉ [t(j), t(j)+dt) | Qj) = 1.

It follows that

    P(Wj = l | Qj) = h(t(j)|z_l) / Σ_{k∈Rj} h(t(j)|z_k) = exp(β^t z_l) / Σ_{k∈Rj} exp(β^t z_k).

Consequently, the partial likelihood is given by

    Π_{j=1}^d P(Wj = wj | Qj) = Π_{j=1}^d exp(β^t z(j)) / Σ_{k∈Rj} exp(β^t z_k),

which coincides with (3.15).


Another characterization is to interpret L as a marginal likelihood: When
there is no censoring, (3.15) can be derived as a marginal likelihood function
based on the rank statistic of the data. The density function of Xi given z_i
under the PH model is

    f(x|z_i) = h0(x) exp(β^t z_i) exp[ -H0(x) exp(β^t z_i) ],        (3.22)

where H0 is the baseline cumulative hazard function. Let v = (v1, . . . , vn) be
the rank statistic for the data; that is, vj is the label of the individual with the
jth smallest lifetime. The vector v is the realization of a discrete random vector
with distribution

    P(V = v) = P(X(1) < X(2) < . . . < X(n))
             = ∫_0^∞ ∫_{x(1)}^∞ · · · ∫_{x(n-1)}^∞ Π_{i=1}^n f(x(i)|z(i)) dx(n) · · · dx(1).    (3.23)

Straightforward integration of this expression using (3.22) yields

    P(V = v) = Π_{i=1}^n exp(β^t z(i)) / Σ_{j∈Ri} exp(β^t z_j),      (3.24)

which is the likelihood function (3.15). In obtaining this result, we have used
the fact that the risk set is Ri = R(x(i)) = {vi, vi+1, . . . , vn}, since there is no
censoring.
In the simple noncensored case, therefore, (3.15) is a legitimate likelihood
function arising from the probability function of the rank statistic. Under
suitable assumptions concerning the z_i's, L(β) behaves in the usual way, with
β̂ being asymptotically normally distributed with mean β and covariance matrix
I^{-1}, where I has the elements Ilm = E( -∂² log L / ∂βl ∂βm ). If the data are
subject to Type II censoring, an extension of the preceding arguments yields the
same result. For more general types of censoring the argument breaks down,
however. In general the rank statistic is in fact unknown, because censoring
makes it impossible to know the exact ordering of the actual lifetimes.

Exercise 3.1 Derive (3.24) from (3.23) by integration.

Finally, consider the relation to the profile likelihood: If the hazard rate h0
is entirely arbitrary, then inference can only be based on events where failures
actually occurred, because the hazard might in principle be zero at every other
time. Thus it suffices to estimate the baseline cumulative hazard function by a
step function, say H0(t) = Σ_{i: t(i) ≤ t} hi, where hi = h0(t(i)) > 0 only at observed
failure times. Suppose there are no ties. We can write the likelihood function
(3.13) in the form

    L*(β, h1, . . . , hd) = Π_{i=1}^d hi exp(β^t z(i)) · Π_{i=1}^n exp( -H0(ti) exp(β^t z_i) ).

For the log-likelihood we obtain

    l*(β, h1, . . . , hd) = Σ_{i=1}^d [ ln hi + β^t z(i) ] - Σ_{i=1}^n H0(ti) exp(β^t z_i)
                         = Σ_{i=1}^d [ ln hi + β^t z(i) ] - Σ_{i=1}^n exp(β^t z_i) Σ_{j: t(j) ≤ ti} hj
                         = Σ_{i=1}^d [ ln hi + β^t z(i) ] - Σ_{i=1}^d hi Σ_{j∈Ri} exp(β^t z_j).

With β fixed, the hj have maximum likelihood estimators

    ĥi = 1 / Σ_{j∈Ri} exp(β^t z_j).                                  (3.25)

So the profile log-likelihood for β is

    lp(β) = max_{h1,...,hd} l*(β, h1, . . . , hd) = Σ_{i=1}^d ln[ exp(β^t z(i)) / Σ_{j∈Ri} exp(β^t z_j) ] - d.    (3.26)

The corresponding profile likelihood is, up to the constant factor e^{-d},

    Π_{i=1}^d exp(β^t z(i)) / Σ_{j∈Ri} exp(β^t z_j),

that is, (3.15).

3.3 Statistical Inference Based on the Partial Likelihood Estimator

It turns out that L can be treated as an ordinary likelihood: The estimator β̂
obtained by maximizing L is
- consistent,
- asymptotically p-variate normal;
- the likelihood ratio statistics based on L(β) can be treated as approximately χ²-distributed,
- the score statistic U(β) as approximately normal.

Information contained in the lifetimes ti is lost, because they are treated as
fixed in constructing the partial likelihood. The loss of information compared
to using the correct parametric model turns out to be small in most cases. So
standard errors from the partial likelihood are close to those obtained under the
(unknown) parametric model.
3.3.1 Asymptotic Normality of β̂

Let β̂ be the solution of the likelihood equations (3.18). For the investigation
of the properties of β̂ it is convenient to write L in the following form:

    L(β) = Π_{i=1}^n [ exp(β^t z_i) / Σ_{l=1}^n Yl(ti) exp(β^t z_l) ]^δi        (3.27)

with

    Yi(t) = 1(ti ≥ t),    Yi(t) = 1  ⟺  i ∈ R(t),

i = 1, . . . , n. The log-likelihood function is

    l(β) = Σ_{i=1}^n δi [ β^t z_i - ln Σ_{l=1}^n Yl(ti) exp(β^t z_l) ].          (3.28)

The score vector U(β) and the information matrix take simple forms. Define
for any t > 0 the p-vector

    z̄(t; β) = Σ_{l=1}^n Yl(t) z_l exp(β^t z_l) / Σ_{l=1}^n Yl(t) exp(β^t z_l),

which is a weighted average of the covariate vectors of the individuals at risk at
time t. Then it is easily seen that the vector U(β) = ∂l(β)/∂β is

    U(β) = Σ_{i=1}^n δi [ z_i - z̄(ti; β) ].                          (3.29)

In regular parametric models the Fisher information is the negative expectation
of the second derivative of the log-likelihood function. Here it does not make
sense to try to evaluate the expectation of the second derivative of the logarithm
of the partial likelihood, but we use the second derivative directly in order to
characterize the asymptotic variance of β̂. This matrix is called the observed
information matrix. With

    J(β) = -∂²l(β)/(∂β ∂β^t)

we obtain

    J(β) = Σ_{i=1}^n δi [ Σ_{l=1}^n Yl(ti) exp(β^t z_l) [z_l - z̄(ti; β)][z_l - z̄(ti; β)]^t / Σ_{l=1}^n Yl(ti) exp(β^t z_l) ].    (3.30)

Under regularity conditions one can show that the score function (properly
standardized) converges in distribution to a normal distribution. Roughly
speaking, these conditions ensure that n^{-1} J(β) converges in probability to a
positive definite matrix Σ(β). For a detailed formulation of these conditions see
Martinussen and Scheike (2006).
Theorem 3.1 Suppose that (given the covariates z)

    n^{-1} J(β) → Σ(β)

in probability, where Σ(β) is a positive definite p x p matrix. Under regularity
conditions we have

    n^{-1/2} U(β) →D N_p(0, Σ(β)),                                   (3.31)
    n^{1/2} (β̂ - β) →D N_p(0, Σ(β)^{-1}),                            (3.32)

and Σ(β) can be estimated consistently by n^{-1} J(β̂).

Remark: Roughly speaking, we have β̂ ∼ N(β, J(β̂)^{-1}).

Proof: The proof of statement (3.31) is given in Martinussen and Scheike (2006,
page 185). We obtain (3.32) from (3.31) as follows: Since β̂ maximizes the log-likelihood function l we have U(β̂) = 0, and therefore by Taylor expansion

    U(β̂) - U(β) = -U(β) ≈ -J(β) (β̂ - β).

Together with (3.31) this implies

    √n (β̂ - β) ≈ √n J(β)^{-1} U(β) = (n^{-1} J(β))^{-1} n^{-1/2} U(β) →D N_p(0, Σ(β)^{-1}).

Standard errors of β̂ and confidence intervals for β are based on the normal
approximation.
Estimates for the variances of the β̂s, s = 1, . . . , p, are found on the diagonal of
J(β̂)^{-1}; the standard errors, denoted by se(β̂s) or se(coef) in computer packages,
are the corresponding square roots. Let J(β̂)^{ss} be the element (s, s) of the
matrix J(β̂)^{-1}. Then

    se(β̂s) = √( J(β̂)^{ss} ).

3.3.2 Asymptotic Confidence Regions and Tests for β

Asymptotic confidence intervals for the (single) parameters βs are given by

    [ β̂s - z_{1-α/2} se(β̂s),  β̂s + z_{1-α/2} se(β̂s) ],             (3.33)

where z_γ is the γ-quantile of the standard normal distribution.

Furthermore, under regularity conditions it follows that the distribution of the
Wald statistic

    QW_n(β) = (β̂ - β)^t J(β̂) (β̂ - β)

converges to the χ²-distribution with p degrees of freedom. Thus an asymptotic
confidence region for the parameter β is given by

    { β | (β̂ - β)^t J(β̂) (β̂ - β) ≤ χ²_{p;1-α} },                    (3.34)

where χ²_{p;γ} is the γ-quantile of the chi-squared distribution with p degrees of
freedom.
A second asymptotic confidence region for the parameter β is based on the
likelihood ratio statistic

    QLR_n(β) = 2( l(β̂) - l(β) ).                                     (3.35)

For large n the distribution of QLR_n can be approximated by the chi-squared
distribution with p degrees of freedom, since

    2( l(β̂) - l(β) ) ≈ 2( l(β̂) - [ l(β̂) + U(β̂)^t (β - β̂) - (β - β̂)^t J(β̂)(β - β̂)/2 ] )
                     = (β̂ - β)^t J(β̂) (β̂ - β)  ∼  χ²_p.

Thus, we obtain an asymptotic confidence region by

    { β | 2( l(β̂) - l(β) ) ≤ χ²_{p;1-α} }.                           (3.36)

The third confidence region avoids the calculation of the estimate β̂. From
statement (3.31) it follows that the quadratic form

    QS_n(β) = U(β)^t J(β)^{-1} U(β)                                   (3.37)

converges in distribution to the chi-squared distribution with p degrees of
freedom. Consequently, a further asymptotic confidence region is given by

    { β | U(β)^t J(β)^{-1} U(β) ≤ χ²_{p;1-α} }.                       (3.38)

Based on the relationship between confidence regions and tests we can immediately formulate asymptotic α-test procedures for testing the hypothesis

    H : β = β_0    versus    K : β ≠ β_0

for some known β_0: We reject H if

- Wald test:
    QW_n(β_0) > χ²_{p;1-α},                                          (3.39)

- likelihood ratio test:
    QLR_n(β_0) > χ²_{p;1-α},

- score test:
    QS_n(β_0) > χ²_{p;1-α}.

For testing the single parameter βs, s = 1, . . . , p, we can use the analogue of the
confidence interval (3.33). The hypothesis H : βs = β_{0s} is rejected if β_{0s} is not
covered by the confidence interval, i.e. if

    |β̂s - β_{0s}| / se(β̂s) > z_{1-α/2}.

Note that the procedures in software packages carry out this test for the
hypotheses β_{0s} = 0, s = 1, . . . , p.
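In R these single-parameter Wald tests and the corresponding confidence intervals can be obtained directly from a fitted coxph object; a brief sketch, where fit denotes any of the models fitted with coxph above:

    summary(fit)          # Wald tests for the single coefficients and global tests
    confint(fit)          # asymptotic 95% confidence intervals for the beta_s
    exp(confint(fit))     # the same intervals on the relative-risk scale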

Data example 3.3 (Leukaemia and WBC; continuation) In the results
for the model with interaction between white blood count and group we find

    β̂1/se(β̂1) = 2.88,    β̂2/se(β̂2) = -2.66    and    β̂3/se(β̂3) = 1.79.

It follows that the hypotheses β1 = 0 and β2 = 0 are rejected (p-values 0.004
and 0.0078, respectively), but the data do not contradict the hypothesis β3 = 0
(p-value 0.074).
The R procedure also gives the value of the likelihood ratio test statistic for
testing H : β = 0 against the alternative that at least one of the βs's is
nonzero:

    Likelihood ratio test=17.9 on 3 df, p=0.000463

Thus, the influence of the covariates wbc and group is significant.

Often one is interested in testing a hypothesis about a subset of the coefficients
βs. We express this testing problem with a suitable m x p matrix C and an
m-dimensional vector c_0:

    H : Cβ = c_0    versus    K : Cβ ≠ c_0.                          (3.40)

If we choose, for m = 1, C_{1j} = 0 for j ≠ s, C_{1s} = 1 and c_0 = β_{0s}, we
generate the single hypothesis βs = β_{0s} considered above; with C = I_{p×p} and
c_0 = β_0 we obtain testing problem (3.39). Suppose that β = (β_1, β_2)^t, where
β_1 is an m-dimensional vector containing the coefficients of interest. Taking
C = (I_{m×m}, O_{m×(p-m)}) and c_0 = β_{10} we obtain

    H : β_1 = β_{10}    versus    K : β_1 ≠ β_{10}.
The estimator for the hypothetical parameter c_0 based on the partial likelihood
method is Cβ̂. The Wald statistic for testing (3.40) is given by

    QW_n(c_0) = (Cβ̂ - c_0)^t [ C J(β̂)^{-1} C^t ]^{-1} (Cβ̂ - c_0).   (3.41)

If the hypothesis H in (3.40) is true, the distribution of QW_n(c_0) tends to a χ²-distribution with m degrees of freedom as n → ∞. Thus, we reject H if

    QW_n(c_0) > χ²_{m;1-α}.

Let β̂(c_0) be the partial maximum likelihood estimator of β in the hypothetical
model. The likelihood ratio statistic is defined by

    QLR_n(c_0) = 2( l(β̂) - l(β̂(c_0)) ).                             (3.42)

Under H this statistic also has asymptotically a χ²-distribution with m degrees of freedom,
and we reject H if

    QLR_n(c_0) > χ²_{m;1-α}.

Data example 3.4 (Male laryngeal cancer patients; continuation)
First we fit a model without interactions, that is

    ψ(z, β) = exp(β1 z1 + β2 z2 + β3 z3 + β4 z4).

The procedure

    coxph(formula = Surv(time, delta) ~ z1 + z2 + z3 + z4, data = larynx)

yields the estimates, standard errors and (single) test results given in Table 3.4.

                      coef    exp(coef)   se(coef)     z        p
    z1 (stage II)    0.1400      1.15      0.4625    0.303   7.6e-01
    z2 (stage III)   0.6424      1.90      0.3561    1.804   7.1e-02
    z3 (stage IV)    1.7060      5.51      0.4219    4.043   5.3e-05
    z4 (age)         0.0190      1.02      0.0143    1.335   1.8e-01

Table 3.4: Results for the Data example 1.2 (without interaction)

Further, R carries out the likelihood ratio test for testing

    H : β = 0  against  K : not all βj's are zero.

    Likelihood ratio test=18.3 on 4 df, p=0.00107  n= 90

Thus, the influence of the covariates is significant.

            II      III      IV
    I      1.15    1.90     5.51
    II             1.65     4.79
    III                     2.90

Table 3.5: Relative risks (without interaction)

Let us derive estimates for the relative risk of dying for a stage II patient of age
z4 compared to a stage I patient of the same age:

    h(x | stage II and age = z4) / h(x | stage I and age = z4) = exp(β̂1) = 1.15.

Results for the other comparisons are given in Table 3.5.
Now consider the model with all interactions:

    coxph(Surv(time, delta) ~ z1*z4 + z2*z4 + z3*z4, data = larynx)

                coef      exp(coef)   se(coef)      z        p
    z1       -8.08376     0.000309     3.6936    -2.1886   0.029
    z2       -0.16404     0.848705     2.4742    -0.0663   0.950
    z3        0.82526     2.282480     2.4229     0.3406   0.730
    age      -0.00293     0.997073     0.0261    -0.1124   0.910
    z1:age    0.12236     1.130165     0.0525     2.3295   0.020
    z2:age    0.01203     1.012106     0.0375     0.3206   0.750
    z3:age    0.01422     1.014325     0.0359     0.3959   0.690

    Likelihood ratio test=24.664 on 7 df, p=0.00087  n= 90


The output suggests that the interaction between stage II and age is significant
(p value 0.02).
Let us fit the model
(z, ) = exp(1 z1 + 2 z2 + 3 z3 + 4 z4 + 5 z1 z4 ).
We obtain
coef
z1
-7.48989
z2
0.62513
z3
1.76814
age
0.00603
z1:age 0.11333

exp(coef)
0.000559
1.868488
5.859972
1.006048
1.119998

se(coef)
z
3.4169
-2.192
0.3558
1.757
0.4238
4.172
0.0149
0.405
0.0479
2.367

Likelihood ratio test=24.488

p
0.02800
0.07900
0.00003
0.69000
0.01800

on 5 df, p=0.000175

The test statistic of the likelihood ratio test for testing

n= 90

[Figure: Influence curves for the different stages (Stage I-IV) as functions of age at diagnosis; left panel without interactions, right panel with interaction between age and stage II.]

Figure 3.3: Influence curves for Data example 1.2

The test statistic of the likelihood ratio test for testing

    H : β6 = β7 = 0    versus    K : β6 ≠ 0 or β7 ≠ 0

in the model with all interaction terms has the value

    2( l(β̂) - l(β̂(c_0)) ) = 24.664 - 24.488 = 0.176

and gives the p-value 0.92. Thus, these interactions may be dropped from the
model.
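The p-value of this nested comparison can be checked in R either by hand from the chi-squared distribution or, if the two fitted model objects are available (here with the hypothetical names fit.red and fit.all), with anova():

    1 - pchisq(24.664 - 24.488, df = 2)   # p-value of the likelihood ratio test, approx. 0.92
    # equivalently, for two nested coxph fits:  anova(fit.red, fit.all)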
The estimated curves ψ(z; β̂) for the model without interaction and with
interaction between age and stage II are shown in Figure 3.3.
From the significant interaction between stage II and age it follows that also the
relative risk of dying for a stage II patient of age z4 compared to a patient
of a different stage but the same age depends on that age:

    rel. risk stage II compared to stage I:   exp(-7.49 + 0.113 z4)
    rel. risk stage II compared to stage III: exp(-7.49 - 0.625 + 0.113 z4)
    rel. risk stage II compared to stage IV:  exp(-7.49 - 1.768 + 0.113 z4).

For example, for a 76-year-old patient in stage II this risk (compared to stage
I) is 2.998, whereas for a 60-year-old patient it is 0.492.

[Figure: Relative risk of a stage II patient with respect to a stage I patient against age at diagnosis; the values for a 76-year-old and a 60-year-old patient are marked.]

Figure 3.4: Relative risk of a patient at stage II with respect to a patient at stage I,
depending on age at diagnosis; Data example 1.2

3.4 Comparison of Two or More Lifetime Distributions

We can use the proportional hazard approach to compare the survival distributions of two groups -- perhaps a treatment group and a control group. The
covariate z is used to identify which of the two groups an observation belongs
to: We set

    z = 1 if the observation belongs to group 1,  z = 0 if the observation belongs to group 2.

Let S1 and S2 be the survival functions corresponding to the two groups. One
then tests the hypothesis

    H : S1 = S2                                                      (3.43)

by testing for the absence of a covariate effect.
If the Cox PH regression model with β^t z = βz is adopted, then S2 = S0 and
S1 = S0^{exp(β)}, so that

    S1 = S2^{exp(β)}.                                                (3.44)

A test of H : β = 0 is a test of the hypothesis (3.43); the alternative
hypotheses (3.44) with β ≠ 0 are sometimes referred to as the Lehmann family
of alternatives. The hypothesis H : β = 0 can be tested by fitting the PH model
with (conditional) survival function

    S(x|z) = S0(x)^{exp(βz)}                                         (3.45)

36

and applying standard large sample-procedures based on the maximum partial


likelihood estimator and its standard error, or on the likelihood ratio statistic.
The score function leads to a very simple procedure that does not require the
computation of the estimate of .
Assume that we have independent samples of lifetimes from S1 and S2 of sizes
n1 and n2 . Define the following quantities for i = 1, . . . , n = n1 + n2 :
i =

(ti is a lifetime)

1i =

(ti is a lifetime from S1 )

n1i =

Yl (ti )zj

number at risk from S1 at time ti

j=1
n

Yl (ti )(1 zj ) number at risk from S2 at time ti .

n2i =
j=1

For the partial log likelihood (3.28) we obtain

l()

Yj (ti ) ezj

i zi ln

=
i=1

j=1

n1

i ln( e n1i + n2i )

=
i=1
n

i=n1 +1
n

i ln( e n1i + n2i ).

1i

i ln( e n1i + n2i )

i=1

i=1

It follows that the score function is given by


n

1i

U () =
i=1

i n1i e
n1i e + n2i

(3.46)

It is easy to show that the observed Fisher information J() is


n

J() =
i=1

i n1i n2i e
.
(n1i + n2i )2

(3.47)

As described in Section 3.3.1 the quantity U(β)/√J(β) converges in distribution to the standard normal distribution. Thus U(0)/√J(0) provides a very simple test of the hypothesis H : β = 0: Reject H if

    | Σ_{i=1}^n ( δ1i − δi n1i/(n1i + n2i) ) | / √( Σ_{i=1}^n δi n1i n2i/(n1i + n2i)^2 )  >  u_{1−α/2}.      (3.48)

This test is sometimes referred to as the Mantel–Haenszel test or the log rank test.
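The statistic (3.48) is easy to compute directly from the data. The following R lines are a minimal sketch, assuming vectors time, status (1 = death, 0 = censored) and z (1 = group 1, 0 = group 2) of equal length; for Data example 3.5 below the result should agree, up to sign, with sqrt(fit$score).

logrank.stat <- function(time, status, z) {
  U <- 0; J <- 0
  for (t in sort(unique(time[status == 1]))) {     # distinct death times
    atrisk <- time >= t
    n1i <- sum(atrisk & z == 1)                    # at risk from S1 at time t
    n2i <- sum(atrisk & z == 0)                    # at risk from S2 at time t
    d   <- sum(time == t & status == 1)            # deaths at t (ties allowed)
    d1  <- sum(time == t & status == 1 & z == 1)   # deaths at t in group 1
    U <- U + d1 - d * n1i / (n1i + n2i)            # observed minus expected, cf. (3.49)
    J <- J + d * n1i * n2i / (n1i + n2i)^2         # observed information at beta = 0
  }
  U / sqrt(J)                                      # compare with u_{1 - alpha/2}
}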
Exercise 3.2 Derive the observed Fisher information in model (3.45), i.e.
verify (3.47).

    Treatment A:  1  3  3  6  7  7  10  12  14  15  18  19  22  26  28+  29  34  40  48+  49+
    Treatment B:  1  1  2  2  3  4   5   8   8   9  11  12  14  16  18   21  27+  31  38+  44

Table 3.6: Remission times (in weeks)


Figure 3.5: Kaplan–Meier estimates for Data example 3.5 (comparing treatments for leukaemia remission; Treatment A and Treatment B)

Data example 3.5 (Comparing treatments for leukaemia remission)


Lawless (1982) gives data for comparing the remission times (in weeks) for
patients subjected to different treatments. Twenty patients were assigned to
each of two treatments; + denotes right censoring. (See Table 3.6.)
The Kaplan–Meier estimates for the survival functions of both treatment groups
(Figure 3.5) suggest that a proportional hazard model is appropriate. We fit the
model with zi = 1 if the ith patient belongs to treatment group A and zi = 0
otherwise. Using the R procedure
fit<-coxph(Surv(time,delta) ~ group, data=lrem, method="breslow")
we obtain

              coef   exp(coef)   se(coef)      z      p
    group   -0.388       0.678      0.341  -1.14   0.25

Table 3.7: Comparison of treatments

The likelihood ratio test gives:

    Likelihood ratio test = 1.29  on 1 df,  p = 0.255,  n = 40

The commands
fit$wald.test; 1-pchisq(fit$wald.test,1)

fit$score; 1-pchisq(fit$score,1)
yield the values and p-values of the Wald and score test statistics:

    QW = 1.30,  p = 0.255        QS = 1.31,  p = 0.252

With sqrt(fit$score) we get the test statistic of the log rank test; the two-sided significance is 2*(1-pnorm(sqrt(fit$score))) = 0.2512.
All test procedures give essentially identical results. There is no evidence of a difference between the two distributions.
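The same log rank test can also be obtained directly with survdiff from the survival package; the call below uses the data frame lrem and the variables time, delta and group assumed above.

library(survival)
# log rank (Mantel-Haenszel) test for Data example 3.5
survdiff(Surv(time, delta) ~ group, data = lrem)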

The expression for the score U(0) has a built-in structure which becomes evident when we re-express (3.46) (for β = 0) in the following form:

    U(0) = Σ_{i=1}^n ( δ1i − δi n1i/(n1i + n2i) )
         = Σ_{i=1}^n ( observed group 1 deaths at ti − expected group 1 deaths at ti ).      (3.49)

To see this, note that only times ti at which a death occurs (δi = 1) contribute to U(0) and J(0), and that if S1 = S2, then the conditional expectation of δ1i, given δi = 1 and the numbers n1i, n2i at risk, is δi n1i/(n1i + n2i). This shows directly that E U(0) = 0 under H : β = 0.
Tests of equality of three or more lifetime distributions are also readily obtained. To compare m distributions S1, . . . , Sm we define a vector of m − 1 indicators z = (z1, . . . , z_{m−1})ᵗ, where

    zr = 1   if the observation is from Sr,
    zr = 0   otherwise.

If the PH model is assumed, then Sm = S0 and

    Sr(x) = S0(x)^exp(βr),    r = 1, . . . , m − 1.

The hypothesis

    H : S1 = S2 = · · · = Sm

is equivalent to

    H : β = 0.
Exercise 3.3 Write down the score test for comparing three survival distributions.
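In practice the m − 1 indicators need not be constructed by hand: in R a factor covariate is expanded into exactly this dummy coding. A sketch with illustrative names (data frame d with columns time, status and a factor grp with m levels):

library(survival)
# make the level playing the role of S_m the reference (baseline) group
d$grp <- relevel(factor(d$grp), ref = "groupM")
fit <- coxph(Surv(time, status) ~ grp, data = d)   # m - 1 indicator variables
# score test of H: beta = 0, i.e. S_1 = ... = S_m
fit$score; 1 - pchisq(fit$score, df = length(coef(fit)))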


3.5  Estimation of the Baseline Distribution

To estimate the cumulative hazard function H0 we use the estimates developed in the profile likelihood approach. Using (3.25) we obtain the Breslow estimator

    Ĥ0(t) = Σ_{t(i) ≤ t}  1 / ( Σ_{j ∈ Ri} exp(β̂ᵗ zj) ),                (3.50)

which can be written in the form

    Ĥ0(t) = Σ_{ti ≤ t}  δi / ( Σ_{l=1}^n Yl(ti) exp(β̂ᵗ zl) ).           (3.51)

Note that in the case β̂ = 0 this estimator is just the Nelson–Aalen estimate.
The estimate Ĥ0 is a step function with jumps at the observed event times.
A simple way to estimate S0 is to exploit the relationship S0(t) = exp(−H0(t)) and define

    Ŝ0(t) = exp(−Ĥ0(t)).                                               (3.52)

When there are no covariates, or β̂ = 0, this does not give the Kaplan–Meier estimator, but another estimate, sometimes referred to as the Fleming–Harrington estimate.
To estimate the survival function for an individual with covariate z*, one uses the estimate

    Ŝ(t|z*) = Ŝ0(t)^exp(β̂ᵗ z*).                                        (3.53)
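In R the Breslow estimate of the cumulative baseline hazard can be extracted from a fitted coxph object with basehaz; the following is a sketch, assuming fit is one of the coxph fits used above.

library(survival)
H0 <- basehaz(fit, centered = FALSE)   # data frame with columns 'time' and 'hazard'
S0 <- exp(-H0$hazard)                  # baseline survival, cf. (3.52)
plot(H0$time, S0, type = "s",          # step function with jumps at the event times
     xlab = "time", ylab = "estimated baseline survival")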

Data example 3.6 (Male laryngeal cancer patients; continuation)


Consider the model without interaction. The estimates for the parameters are given in Table 3.4. We provide an estimate for the survival function of a patient who was 60 years old at diagnosis. The estimated survival function for a stage I patient is Ŝ0(t)^exp(0.019·60), for a stage II patient Ŝ0(t)^exp(0.019·60 + 0.14), for a stage III patient Ŝ0(t)^exp(0.019·60 + 0.6424) and for a stage IV patient Ŝ0(t)^exp(0.019·60 + 0.706). These curves are plotted in Figure 3.6.
At five years, the estimated survival probabilities for a patient with diagnosis age 60 years are 0.742, 0.611, 0.44 and 0.262 (see the vertical line in Figure 3.6).
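Curves like those in Figure 3.6 can be produced with survfit applied to the fitted model; the sketch below assumes the no-interaction fit from above and the illustrative variable names age and stage.

newpat <- data.frame(age = 60, stage = factor(c("I", "II", "III", "IV")))  # levels as in the data
sf <- survfit(fit, newdata = newpat)         # estimated S(t | z*) for the four patients
plot(sf, lty = 1:4, xlab = "time", ylab = "estimated survival")
legend("bottomleft", legend = paste("Stage", c("I", "II", "III", "IV")), lty = 1:4)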

3.6  Stratification

There are instances when the proportional hazards assumption is violated for
some covariate. In such cases it may be possible to stratify on that variable
and employ the proportional hazards model within each stratum for the other
covariates. Here the subjects in the jth stratum have an arbitrary baseline

Figure 3.6: Estimated survival functions for a larynx cancer patient of age 60 at diagnosis (Stage I, Stage II, Stage III and Stage IV, plotted against time)

function hj0, and the effect of the explanatory variables on the hazard function can be represented by a proportional hazards model

    hj(x|z) = hj0(x) exp(βᵗ z),    j = 1, . . . , s.                    (3.54)

Here it is assumed that the parameter β is the same in each stratum.


The partial log-likelihood function for this model is given by

    l(β) = Σ_{j=1}^s lj(β),

where lj is the partial log-likelihood (see (3.16)) using only the observations for the individuals in the jth stratum. The parameter β is estimated by maximizing this partial likelihood function as before.
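In R a stratified fit of this kind is specified by wrapping the stratification variable in strata(); the line below is a generic sketch with placeholder names (data frame d, covariates x1, x2, stratification variable s).

library(survival)
# common beta for the covariates, a separate baseline hazard for each level of s
fit <- coxph(Surv(time, status) ~ x1 + x2 + strata(s), data = d)
summary(fit)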
Data example 3.7 (Remission) Gehan (1965) and others have discussed the
results of a clinical trial reported by Freireich et al. (1963), in which the drug 6-mercaptopurine (6-MP) was compared to a placebo with respect to the ability to maintain remission in acute leukaemia patients. The trial was conducted by matching pairs of patients by remission status (complete or partial); within each pair the patients were randomized to either 6-MP or placebo maintenance therapy.
Patients were followed until their leukaemia returned (relapse) or until the end
of the study.
(The example is taken from Cox and Oakes.)


Table 3.8 gives remission times for two groups of 21 patients each, one group
given the placebo and the other the drug 6-MP.
    Pair   Remission status   Placebo   Drug 6-MP   Relapse (6-MP: 1 = relapse, 0 = censored)
      1            1              1         10          1
      2            2             22          7          1
      3            2              3         32          0
      4            2             12         23          1
      5            2              8         22          1
      6            1             17          6          1
      7            2              2         16          1
      8            2             11         34          0
      9            2              8         32          0
     10            2             12         25          0
     11            2              2         11          0
     12            1              5         20          0
     13            2              4         19          0
     14            2             15          6          1
     15            2              8         17          0
     16            1             23         35          0
     17            1              5          6          1
     18            2             11         13          1
     19            2              4          9          0
     20            2              1          6          0
     21            2              8         10          0

Table 3.8: Remission times: 6-MP and placebo treatment

First we estimate the survival curves of both groups nonparametrically by the Kaplan–Meier estimate. We then fit the model

    h(x|z) = h0(x) exp(βz),

where z = 1 if the patient is in the 6-MP group and z = 0 otherwise.
coxph(formula = Surv(time, delta) ~ z, data = mp)
The following table provides the results when the remission status is not taken into account.

         coef   exp(coef)   se(coef)      z         p
    z   -1.57       0.208      0.412   -3.81   0.00014

That is, the risk of relapse for patients given the placebo is exp(1.57) = 4.8 times higher than for those given 6-MP.
Taking into account the remission status we fit the model

    hj(x|z) = hj0(x) exp(βz),

where j = 1 for the pairs with complete remission and j = 2 for those with partial remission. The procedure is


coxph(formula = Surv(time, delta) ~ z + strata(stat), data = mp)

The results are


         coef   exp(coef)   se(coef)      z         p
    z   -1.79       0.167      0.463   -3.87   0.00011

Here the relative risk of the placebo group compared to the 6-MP group is exp(1.79) = 5.99.


References and Index

Textbooks

Cox, D. R. and Oakes, D., Analysis of Survival Data, Chapman & Hall, 1984
Hastie, T. J. and Tibshirani, R. J., Generalized Additive Models, Chapman and Hall, 1990
Klein, J. P. and Moeschberger, M. L., Survival Analysis: Techniques for Censored and Truncated Data, Springer, 2005
Lawless, J. F., Statistical Models and Methods for Lifetime Data, Wiley, 2003
Martinussen, T. and Scheike, T. H., Dynamic Regression Models for Survival Data, Springer, 2006
Nelson, W., Applied Life Data Analysis, Wiley, 1982
Smith, P. J., Analysis of Failure and Survival Data, Chapman & Hall/CRC, 2002

Articles

Breslow, N. E., Covariance analysis of censored data, Biometrics 30, 579–594, 1974
Cox, D. R., Regression models and life-tables, J. R. Statist. Soc. B 34, 187–220, 1972
Efron, B., The efficiency of Cox's likelihood function for censored data, JASA 72, 557–565, 1977
Feigl, P. and Zelen, M., Estimation of exponential survival probabilities with concomitant information, Biometrics 21, 826–838, 1965
Lin, D. Y. and Ying, Z., Semiparametric analysis of general additive-multiplicative hazard models for counting processes, Annals of Statistics 23, 1712–1734, 1995

Index

accelerated life time model, 9
ALT, 9
additive hazard models, 10
asymptotic confidence regions, 29
asymptotic normality, 11, 28
asymptotic tests, 29, 31
    likelihood ratio statistic, 32, 38
    likelihood ratio test, 31
    score statistic, 38
    score test, 31
    Wald statistic, 32, 38
    Wald test, 30
baseline distribution, 39
baseline hazard, 8
comparison of life time distributions, 35
covariate, 4, 6
    time-dependent, 5
Data example
    Breast cancer trial, 21
    Insulating fluids, 15
    Bone marrow transplants, 5
    Larynx, 4, 8, 32, 39
    Leukaemia remission, 37
    Remission 6-MP group, 40
    White blood counts, 3, 22, 31
Fisher information, 28
    observed, 12, 28, 36
Fleming–Harrington estimate, 39
hazard ratio, 17
Kaplan–Meier estimator, 41
Lehmann family of alternatives, 35
likelihood, 11
    marginal, 26
    partial, 24
    profile, 27
likelihood ratio statistic, 30–32
log rank test, 36
Mantel–Haenszel test, 36
Nelson–Aalen estimator, 39
parametric model, 7
partial likelihood, 18, 24
partial log-likelihood, 40
profile likelihood, 27
proportional hazard model, 7
relative risk, 17, 23, 33, 34, 42
risk set, 18–20, 25
score
    statistic, 31
    vector, 11, 19, 28, 30, 36
semiparametric model, 8
standard error, 29
stratification, 39
ties, 18, 20
Wald statistic, 30, 32
Weibull distribution, 8, 9, 15
