Module Objective:
To develop techniques for the analysis of survival data
Module Content:
1) Parametric models of survival, use of life tables, types of
censoring, hazard functions
2) Non-parametric estimation of hazard and survival functions,
Kaplan-Meier and Nelson-Aalen estimators
3) Proportional hazards model with covariates
Use of software
ST3054 4
Introduction
Survival models
Lifetime distribution functions
Cox regression
Learning Outcomes
Co-requisite: ST3053
Predicting survival
X (age at death) =? T (time of death)
x = attained age
Want to find $S(t)$
$S_x(10)$ is different if x = 25 than if x = 55
Let $T_x$ be the future lifetime after age x of a life who survives to age x, with $0 \le x \le \omega$ (where $\omega$ denotes the limiting age) and $T_0 = T$
Definition ($0 \le x \le \omega$):
$F_x(t) = P(T_x \le t)$ is the distribution function of $T_x$
$S_x(t) = P(T_x > t) = 1 - F_x(t)$ is the survival function of $T_x$
Simple survival model
Life table functions
Expected future lifetime
Some important formulae
Simple parametric survival models
Future lifetime
Examples:
$F_{30}(50)$ denotes the probability that a 30-year old dies before his/her 80th birthday
$S_{25}(32)$ represents the probability that a 25-year old survives at least another 32 years
For consistency with T, the distribution function of the random variable $T_x$ must satisfy the following:
$$F_x(t) = P(T_x \le t) = P(T \le x + t \mid T > x) = \frac{F(x + t) - F(x)}{S(x)}$$
Probabilities of death and survival
Actuarial notation for death and survival probabilities:
${}_tq_x$: probability that someone aged x dies within t years
$q_x$: probability that someone aged x dies within 1 year
${}_tp_x$: probability that someone aged x is still alive after t years
$p_x$: probability that someone aged x is still alive after 1 year
In particular we have
$${}_tq_x = F_x(t)$$
$${}_tp_x = 1 - {}_tq_x = S_x(t)$$
Survival probabilities: example 1
Age x   $l_x$   $d_x$   $p_x$   $q_x$
90 9,253 2,035 0.78006 0.21994
91 7,218 1,711 0.76297 0.23703
92 5,507 1,403 0.74515 0.25485
93 4,104 1,122 0.72659 0.27341
94 2,982 873 0.70730 0.29270
95 2,109 660 0.68728 0.31272
96 1,449 483 0.66652 0.33348
97 966 343 0.64503 0.35497
98 623 235 0.62281 0.37719
99 388 155 0.59985 0.40015
Table: Irish Life Table No. 14 2001-2003 (Males)
Probability that a 90 year old man survives to 95, i.e. ${}_5p_{90}$?
Survival probabilities: example 1
For a 90 year old man to survive to 95 he must
Event                     Probability
survive from 90 to 91     ${}_1p_{90} = 0.78006$
survive from 91 to 92     ${}_1p_{91} = 0.76297$
survive from 92 to 93     $p_{92} = 0.74515$
survive from 93 to 94     $p_{93} = 0.72659$
survive from 94 to 95     $p_{94} = 0.70730$
Thus
$${}_5p_{90} = {}_1p_{90}\,{}_1p_{91}\,{}_1p_{92}\,{}_1p_{93}\,{}_1p_{94} \approx 0.2279$$
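The product of one-year survival probabilities can be checked numerically. The following is a minimal sketch that uses the $p_x$ and $l_x$ values from the Irish Life Table rows quoted above; the function name `t_p_x` is purely illustrative.

```python
from math import prod

# One-year survival probabilities p_x for ages 90 to 94, copied from the table.
p = {90: 0.78006, 91: 0.76297, 92: 0.74515, 93: 0.72659, 94: 0.70730}
# Expected numbers of lives l_x at ages 90 and 95, from the same table.
l = {90: 9253, 95: 2109}


def t_p_x(x, t):
    """Return the t-year survival probability from age x as a product of p_x."""
    return prod(p[age] for age in range(x, x + t))


five_p_90 = t_p_x(90, 5)
ratio = l[95] / l[90]

print(f"5p90 from one-year probabilities: {five_p_90:.4f}")
print(f"5p90 from l_95 / l_90:            {ratio:.4f}")
```

Both routes agree to about four decimal places; the small remaining difference comes from the rounding of the tabulated $p_x$.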
Factoring survival probabilities
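${}_5p_{90}$ = P(90 year old man will survive to 95)
$${}_5p_{90} = {}_1p_{90}\,{}_1p_{91}\,{}_1p_{92}\,{}_1p_{93}\,{}_1p_{94}$$
$${}_5p_{90} = ({}_1p_{90}\,{}_1p_{91}\,{}_1p_{92})\,({}_1p_{93}\,{}_1p_{94}) = {}_3p_{90}\,{}_2p_{93}$$
$${}_5p_{90} = ({}_1p_{90}\,{}_1p_{91})\,({}_1p_{92}\,{}_1p_{93}\,{}_1p_{94}) = {}_2p_{90}\,{}_3p_{92}$$
$${}_{s+t}p_x = {}_sp_x\,{}_tp_{x+s}$$
$${}_{s+t}p_x = {}_tp_x\,{}_sp_{x+t}$$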
Survival probabilities: example 2
Question: which is bigger, ${}_5p_{34}$ or ${}_7p_{33}$?
Relating conditional and aggregate survival probabilities
$S_x(t)$ is the (select) survival function for the r.v. $T_x$
$T_x$ is the future (select) lifetime after age x
$T = T_0$ is called the aggregate lifetime
Relating conditional and aggregate survival probabilities
$S_x(t)$ = probability someone aged x survives for t years or more
= probability someone survives to age x + t given that they have already survived to age x
$$S_x(t) = P(T_x > t) = P(T > x + t \mid T > x) = \frac{P(T > x + t \text{ and } T > x)}{P(T > x)} = \frac{S(x + t)}{S(x)}$$
Equivalently,
$${}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0}$$
Relating conditional and aggregate survival probabilities
From this relationship
$${}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0}$$
we may thus also derive
$${}_{s+t}p_x = \frac{{}_{x+s+t}p_0}{{}_xp_0} = \frac{{}_{x+s}p_0}{{}_xp_0} \cdot \frac{{}_{x+s+t}p_0}{{}_{x+s}p_0} = {}_sp_x\,{}_tp_{x+s}$$
Similarly,
$${}_{s+t}p_x = {}_tp_x\,{}_sp_{x+t}$$
(i.e. the result seen previously on factoring survival probabilities)
The distribution of mortality (age at death)
SDF: $S(x) = 1 - F(x) = P(T > x)$
PDF: $f(x) = dF(x)/dx = -dS(x)/dx$ (unconditional)
$$f(x) = \lim_{h \to 0^+} \frac{1}{h}\,[F(x + h) - F(x)] = \lim_{h \to 0^+} \frac{1}{h}\,P(x < T \le x + h)$$
Figure: Distribution of the random variable T: number of deaths vs age
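The force of mortality $\mu_x$
Definition: the force of mortality $\mu_x$ at age x ($0 \le x \le \omega$) is
$$\mu_x = \lim_{h \to 0^+} \frac{1}{h}\,P(T \le x + h \mid T > x)$$
$\mu_x$ is an instantaneous measure of mortality at age x, or the conditional instantaneous failure rate given survival to time x
$$\mu_x = \lim_{h \to 0^+} \frac{P(x < T \le x + h)}{h} \cdot \frac{1}{P(T > x)}$$
thus the 1-1 relationship between $\mu_x$ and $f(x)$
$$\mu_x = \frac{f(x)}{S(x)} = \frac{-dS(x)/dx}{S(x)}$$
Inversion: since $\mu_x = -\frac{d}{dx}\log S(x)$, we have
$$S(x) = \exp\left(-\int_0^x \mu_s\,ds\right) = \exp[-\Lambda(x)], \quad \text{with } \Lambda(x) = \int_0^x \mu_s\,ds$$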
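The inversion between the force of mortality and the survival function can be illustrated numerically. The sketch below assumes a hazard of the form $\mu_s = B c^s$ with made-up parameters `B` and `C` (they are not taken from the slides), integrates it with the trapezoidal rule, and then recovers the hazard from $-\frac{d}{dx}\log S(x)$ by a finite difference.

```python
import math

B, C = 0.0001, 1.1  # illustrative hazard parameters (assumed, not from the slides)


def mu(s):
    """Assumed force of mortality mu_s = B * C**s."""
    return B * C ** s


def survival(x, steps=10_000):
    """S(x) = exp(-integral_0^x mu_s ds), using the trapezoidal rule."""
    h = x / steps
    integral = 0.5 * (mu(0.0) + mu(x))
    integral += sum(mu(i * h) for i in range(1, steps))
    return math.exp(-h * integral)


def hazard_from_survival(x, h=1e-4):
    """Recover mu_x as -d/dx log S(x) with a central finite difference."""
    return -(math.log(survival(x + h)) - math.log(survival(x - h))) / (2 * h)


print(survival(60))              # numerical S(60)
print(hazard_from_survival(60))  # should be close to mu(60)
print(mu(60))
```

For this particular hazard the integral has the closed form $B(c^x - 1)/\log(c)$, which gives an independent check of the numerical result.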
The force of mortality $\mu_x$
Figure: UK mortality 2003-2005 (Males, Office of National Statistics): hazard estimates $\mu_x$ vs age (years)
The force of mortality $\mu_x$
Figure: UK mortality 2003-2005 (Males, ONS): hazard estimates $\mu_x = -\frac{d \log S(x)}{dx}$ (log scale, increasing) and survival estimates $S(x) = \exp\left(-\int_0^x \mu_s\,ds\right)$ (decreasing) vs age (years)
The force of mortality $\mu_x$
Figure: UK mortality 2003-2005 (Males, ONS): hazard estimates $\mu_x$ (log scale, increasing) and density $f(x)$ (unimodal) vs age (years)
The force of mortality $\mu_{x+t}$
Definition: for $x \ge 0$ and $t > 0$, we could define the force of mortality $\mu_{x+t}$ in two ways:
(1) $\mu_{x+t} = \lim_{h \to 0^+} \frac{1}{h}\,P(T \le x + t + h \mid T > x + t)$
(2) $\mu_{x+t} = \lim_{h \to 0^+} \frac{1}{h}\,P(T_x \le t + h \mid T_x > t)$
We often use $\mu_{x+t}$ for a fixed age x and $0 \le t < \omega - x$.
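Density of select model (pdf of $T_x$)
By definition, the distribution function of $T_x$ is $F_x(t)$. Thus its pdf is
$$\begin{aligned}
f_x(t) &= \frac{d}{dt}F_x(t) = \frac{d}{dt}P(T_x \le t) = -\frac{1}{S(x)}\,\frac{d}{dt}S(x + t) \\
&= \lim_{h \to 0^+} \frac{1}{h}\,\big(P(T_x \le t + h) - P(T_x \le t)\big) \\
&= \lim_{h \to 0^+} \frac{P(T \le x + t + h \mid T > x) - P(T \le x + t \mid T > x)}{h} \\
&= \lim_{h \to 0^+} \frac{[P(T \le x + t + h) - P(T \le x)] - [P(T \le x + t) - P(T \le x)]}{S(x)\,h} \\
&= \lim_{h \to 0^+} \frac{P(T \le x + t + h) - P(T \le x + t)}{S(x)\,h}
\end{aligned}$$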
Density of select model (pdf of $T_x$)
Now by multiplying by $S(x + t)/S(x + t)$ we obtain
$$\begin{aligned}
f_x(t) &= \frac{S(x + t)}{S(x + t)} \lim_{h \to 0^+} \frac{P(T \le x + t + h) - P(T \le x + t)}{S(x)\,h} \\
&= S_x(t) \lim_{h \to 0^+} \frac{1}{h}\,P(T \le x + t + h \mid T > x + t) \\
&= S_x(t)\,\mu_{x+t}
\end{aligned}$$
In actuarial notation, for a fixed age $0 \le x \le \omega$, this is equivalent to the following very important relationship
$$f_x(t) = {}_tp_x\,\mu_{x+t} \qquad (0 \le t < \omega - x)$$
Density of select model (pdf of $T_x$)
Alternatively, we may observe that
$$f_x(t) = -\frac{d}{dt}S_x(t) = -\frac{d}{dt}\,\frac{S(x + t)}{S(x)} = -\frac{1}{S(x)}\,\frac{d}{dt}S(x + t)$$
and since
$$\frac{d}{dt}S(x + t) = \frac{d}{dt}\int_{x+t}^{\infty} f(u)\,du = -f(x + t)$$
we have
$$f_x(t) = \frac{f(x + t)}{S(x)} = \frac{S(x + t)}{S(x)} \cdot \frac{f(x + t)}{S(x + t)} = {}_tp_x\,\mu_{x+t}$$
Summary
$T_x$: random future lifetime after age x (continuous r.v.)
${}_tq_x$: probability that someone aged x dies within t years
$q_x$: probability that someone aged x dies within 1 year
${}_tp_x$: probability that someone aged x is still alive after t years
$p_x$: probability that someone aged x is still alive after 1 year
Probabilistic notation                              Actuarial notation
$\mu_x h \approx P(T \le x + h \mid T > x)$         $\mu_x h \approx {}_hq_x$
$\mu_x = f(x)/S(x)$ or $f(x) = \mu_x S(x)$          $f_x(t) = \mu_{x+t}\,{}_tp_x$
${}_{s+t}p_x = {}_sp_x\,{}_tp_{x+s}$ (for all $s, t > 0$)
I.2 Life table functions
Life table functions
A life table provides the expected number of survivors to each age
in a hypothetical group of lives. We use:
$l_x$ = expected number of lives at age x
$d_x$ = expected number of deaths between ages x and x + 1
$$d_x = l_x - l_{x+1}$$
$$p_x = \frac{l_{x+1}}{l_x}$$
$$q_x = 1 - p_x = 1 - \frac{l_{x+1}}{l_x} = \frac{l_x - l_{x+1}}{l_x} = \frac{d_x}{l_x}$$
$${}_tp_x = \frac{l_{x+t}}{l_x}$$
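These life table relationships translate directly into code. The following is a small sketch using the $l_x$ column for ages 90 to 95 of the Irish Life Table No. 14 (Males) rows quoted earlier; the helper names are illustrative only.

```python
# l_x for ages 90 to 95, copied from the Irish Life Table No. 14 (Males) rows.
l = {90: 9253, 91: 7218, 92: 5507, 93: 4104, 94: 2982, 95: 2109}


def d(x):
    """Expected number of deaths between ages x and x + 1."""
    return l[x] - l[x + 1]


def p(x):
    """One-year survival probability p_x = l_{x+1} / l_x."""
    return l[x + 1] / l[x]


def q(x):
    """One-year death probability q_x = d_x / l_x."""
    return d(x) / l[x]


def t_p(x, t):
    """t-year survival probability tp_x = l_{x+t} / l_x (integer t)."""
    return l[x + t] / l[x]


print(d(90), round(p(90), 5), round(q(90), 5), round(t_p(90, 5), 5))
```

The computed $p_{90}$ and $q_{90}$ match the tabulated values to about four decimal places; exact agreement is not expected because the published $l_x$ are rounded to whole lives.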
Life table functions
In life tables, values are tabulated only for integer age. When
working with continuous age variables, assumptions on the
variation of mortality between integer ages are required. Ex:
= 100,000
Survival probabilities: ${}_tp_x = \frac{l_{x+t}}{l_x}$
Life table: example 2
Estimate $l_{58.25}$ assuming a uniform distribution of deaths between exact ages 58 and 59, from the English Life Table 15 (Males):
Age x   $l_x$
58      88,792
59      87,805
There are 88,792 - 87,805 = 987 deaths expected between the ages of 58 and 59. Assuming deaths within this interval are uniformly distributed, the number of deaths expected between the ages of 58 and 58.25 is
987/4 = 246.75
So the expected number of lives at age 58.25 is
88,792 - 246.75 = 88,545.25
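The same calculation can be written as a linear interpolation of $l_x$ between integer ages. This is a minimal sketch under the uniform distribution of deaths assumption; the function names `l_udd` and `t_q_x_udd` are hypothetical helpers, not part of any library.

```python
def l_udd(l_x, l_x_plus_1, t):
    """Interpolate l_{x+t} linearly between l_x and l_{x+1}, for 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return (1.0 - t) * l_x + t * l_x_plus_1


def t_q_x_udd(l_x, l_x_plus_1, t):
    """Probability of death within a fraction t of a year: 1 - l_{x+t} / l_x."""
    return 1.0 - l_udd(l_x, l_x_plus_1, t) / l_x


l_58, l_59 = 88792, 87805          # English Life Table 15 (Males), from the text
l_58_25 = l_udd(l_58, l_59, 0.25)  # expected lives at age 58.25
q_58 = (l_58 - l_59) / l_58        # one-year death probability

print(l_58_25)
print(t_q_x_udd(l_58, l_59, 0.25), 0.25 * q_58)
```

The last line prints two values that agree up to floating-point error, which reflects that under this assumption the quarter-year death probability is a quarter of $q_{58}$.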
Life tables: uniform distribution of deaths assumption
Note: Under the assumption of uniform distribution of deaths, the
surviving population at the start of each quarter is decreasing.
This assumption therefore implies that the force of mortality is
increasing over the year of age (58,59).
Result: If deaths are assumed uniformly distributed between the ages of x and x + 1, it follows that ${}_tq_x = t\,q_x$ for $0 \le t \le 1$.
Life tables: uniform distribution of deaths assumption
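Proof: Under the assumption we have by linear interpolation
$$l_{x+t} = (1 - t)\,l_x + t\,l_{x+1}$$
for $0 \le t \le 1$. So
$$\begin{aligned}
{}_tq_x &= 1 - \frac{l_{x+t}}{l_x} = 1 - \frac{(1 - t)\,l_x + t\,l_{x+1}}{l_x} \\
&= \frac{t\,l_x - t\,l_{x+1}}{l_x} = t\,(1 - p_x) = t\,q_x
\end{aligned}$$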
Initial and central rates of mortality
Since $q_x$ is the probability that a life alive at age x (the initial time) dies before age x + 1, it is called an initial rate of mortality.
Definition: The central rate of mortality $m_x$ is defined as
$$m_x = \frac{d_x}{\int_0^1 l_{x+t}\,dt} = \frac{q_x}{\int_0^1 {}_tp_x\,dt}$$
This alternative represents the probability that a life alive between ages x and x + 1 dies. The denominator $\int_0^1 {}_tp_x\,dt$ represents the expected amount of time spent alive between ages x and x + 1 by a life alive at age x.
Initial and central rates of mortality
Note: $m_x$ measures the rate of mortality over the whole year from exact age x to exact age x + 1. By contrast, $\mu_x$ measures the instantaneous rate of mortality at exact age x.
$m_x$ is useful for projecting numbers of deaths, given the number of lives alive in age groups. It constitutes one of the basic components of a population projection.
In practice, the age groups used in population projection are often broader than exactly one year, in which case the definition of $m_x$ must be adjusted accordingly.
Initial and central rates of mortality
If $\mu_{x+t}$ is a constant, $\mu$, between ages x and x + 1, then
$$m_x = \frac{q_x}{\int_0^1 {}_tp_x\,dt} = \frac{\int_0^1 {}_tp_x\,\mu\,dt}{\int_0^1 {}_tp_x\,dt} = \mu$$
This quantity is close to an occurrence-exposure rate statistic
$$\frac{\text{Number of deaths}}{\text{Total time spent alive and at risk}}$$
which can be used to estimate the force of mortality $\mu_x$.
Note from this that $m_x$ can never be less than $q_x$, since the denominator $\int_0^1 {}_tp_x\,dt$ is at most 1.
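The constant-force result can be checked numerically. The sketch below assumes ${}_tp_x = e^{-\mu t}$ with an arbitrary illustrative value of $\mu$ (not from the slides) and evaluates the denominator $\int_0^1 {}_tp_x\,dt$ with a midpoint rule; `central_rate` is a hypothetical helper name.

```python
import math


def central_rate(t_p_x, steps=10_000):
    """Return (m_x, q_x) with m_x = q_x / integral_0^1 tp_x dt (midpoint rule)."""
    h = 1.0 / steps
    exposure = h * sum(t_p_x((i + 0.5) * h) for i in range(steps))
    q_x = 1.0 - t_p_x(1.0)
    return q_x / exposure, q_x


MU = 0.25  # illustrative constant force of mortality (assumed)
m_x, q_x = central_rate(lambda t: math.exp(-MU * t))

print(m_x)  # close to MU
print(q_x)  # 1 - exp(-MU), smaller than m_x
```

Passing a different survival curve to `central_rate` (for example a linear one under uniform deaths) gives the corresponding central rate in the same way.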
I.3 Expected future lifetime
Complete expectation of life
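Definition: The expected future lifetime after age x (or expectation of life at age x) is defined as $E[T_x]$ and is denoted $\mathring{e}_x$:
$$\begin{aligned}
\mathring{e}_x &= \int_0^{\omega - x} t\,{}_tp_x\,\mu_{x+t}\,dt \\
&= -\int_0^{\omega - x} t\,\frac{\partial}{\partial t}\,{}_tp_x\,dt \\
&= -\big[t\,{}_tp_x\big]_0^{\omega - x} + \int_0^{\omega - x} {}_tp_x\,dt \quad \text{(integration by parts)} \\
&= \int_0^{\omega - x} {}_tp_x\,dt
\end{aligned}$$
(using ${}_tp_x\,\mu_{x+t} = f_x(t) = \partial\,{}_tq_x/\partial t = -\partial\,{}_tp_x/\partial t$ and $\big[t\,{}_tp_x\big]_0^{\omega - x} = 0$)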
Curtate future lifetime
Definition: The curtate future lifetime of a life aged x is $K_x = [T_x]$, where the square brackets [.] denote the integer part.
$K_x$ is a discrete r.v. taking values on $\{0, 1, 2, \ldots, [\omega - x]\}$, with probability function
$$\begin{aligned}
P(K_x = k) &= P(k \le T_x < k + 1) \\
&= P(k < T_x \le k + 1) \quad \text{(assuming } F_x(t) \text{ is continuous in } t\text{)} \\
&= {}_kp_x\,q_{x+k}
\end{aligned}$$
$P(K_x = k)$ is often denoted ${}_{k|}q_x$ (k deferred $q_x$), as we consider here deferring the event of death until the year that begins in k years from now.
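As an illustration, the probability function of $K_{90}$ can be tabulated from the Irish Life Table No. 14 (Males) rows quoted earlier. This is a sketch only: the table extract stops at age 99, so the probabilities below cover $k = 0, \ldots, 9$ and do not sum to 1.

```python
# l_x and d_x for ages 90 to 99, copied from the Irish Life Table No. 14 rows.
l = {90: 9253, 91: 7218, 92: 5507, 93: 4104, 94: 2982,
     95: 2109, 96: 1449, 97: 966, 98: 623, 99: 388}
d = {90: 2035, 91: 1711, 92: 1403, 93: 1122, 94: 873,
     95: 660, 96: 483, 97: 343, 98: 235, 99: 155}


def curtate_pmf(x, k):
    """P(K_x = k) = kp_x * q_{x+k}: survive k years, then die in the next one."""
    k_p_x = l[x + k] / l[x]
    q_x_plus_k = d[x + k] / l[x + k]
    return k_p_x * q_x_plus_k


pmf = [curtate_pmf(90, k) for k in range(10)]
for k, prob in enumerate(pmf):
    print(k, round(prob, 5))
print("P(K_90 <= 9) =", round(sum(pmf), 5))
```

Each term simplifies to $d_{90+k}/l_{90}$, so the partial sum equals the proportion of the lives aged 90 who are expected to die before age 100.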
Curtate expectation of life
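Definition: The curtate expectation of life, denoted $e_x$, is
$$e_x = E[K_x]$$
We have
$$\begin{aligned}
e_x &= \sum_{k=0}^{[\omega - x]} k\,{}_kp_x\,q_{x+k} \\
&= {}_1p_x\,q_{x+1} \\
&\quad + {}_2p_x\,q_{x+2} + {}_2p_x\,q_{x+2} \\
&\quad + {}_3p_x\,q_{x+3} + {}_3p_x\,q_{x+3} + {}_3p_x\,q_{x+3} + \ldots \\
&= \sum_{k=1}^{[\omega - x]} \sum_{j=k}^{[\omega - x]} {}_jp_x\,q_{x+j} \\
&= \sum_{k=1}^{[\omega - x]} {}_kp_x
\end{aligned}$$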
Curtate expectation of life
Additional years from age x    Probability
1                              ${}_1p_x\,q_{x+1}$ (survives one year and then dies in the next year)
2                              ${}_2p_x\,q_{x+2}$
...                            ...
k                              ${}_kp_x\,q_{x+k}$
...                            ...
Sum                            $\sum_k {}_kp_x\,q_{x+k}$ (k = 1 to $[\omega - x]$)
Relationship between $\mathring{e}_x$ and $e_x$
Considering the two formulae
$$\mathring{e}_x = \int_0^{\omega - x} {}_tp_x\,dt \quad \text{and} \quad e_x = \sum_{k=1}^{[\omega - x]} {}_kp_x$$
the complete and curtate expectations of life are related by the approximate equation
$$\mathring{e}_x \approx e_x + \tfrac{1}{2}$$
Define $J_x = T_x - K_x$ to be the random lifetime after the highest integer age to which a life aged x survives. Approximately, $E[J_x] = 1/2$ (assuming deaths occur uniformly within each year of age), and since $E[T_x] = E[K_x] + E[J_x]$, we have $\mathring{e}_x \approx e_x + 1/2$.
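A quick numerical illustration of how close the approximation can be: the sketch below deliberately uses a constant force of mortality (an assumption made here for convenience, not the uniform-deaths assumption of the argument above), for which both expectations have simple forms. The value of `LAMBDA` is arbitrary.

```python
import math

LAMBDA = 0.05  # illustrative constant force of mortality (assumed)


def complete_expectation(lam):
    """Integral of tp_x = exp(-lam * t) over t >= 0, which equals 1 / lam."""
    return 1.0 / lam


def curtate_expectation(lam, max_years=2000):
    """Sum of kp_x = exp(-lam * k) for k = 1, 2, ...; truncated at max_years."""
    return sum(math.exp(-lam * k) for k in range(1, max_years + 1))


complete = complete_expectation(LAMBDA)
curtate = curtate_expectation(LAMBDA)

print(complete, curtate, complete - curtate)
```

The difference is slightly below one half, which is consistent with the relation being approximate rather than exact when deaths are not uniform within each year of age.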
Life table: example 3
Figure: Irish Life Table No. 13, 1995-97 (Males)
Life table: example 3
Why $e_{81} \ne e_{80} - 1$?
How would you write $e_{81}$ w.r.t. $e_{80}$ and $p_x$?
Figure: Irish Life Table No. 13, 1995-97 (Males), continued
Future lifetimes: variance
The variances of the complete and curtate future lifetimes are
$$\mathrm{Var}[T_x] = \int_0^{\omega - x} t^2\,{}_tp_x\,\mu_{x+t}\,dt - \mathring{e}_x^{\,2}$$
$$\mathrm{Var}[K_x] = \sum_{k=0}^{[\omega - x]} k^2\,{}_kp_x\,q_{x+k} - e_x^2$$
These expressions may be useful to:
$${}_tq_x = \int_0^t {}_sp_x\,\mu_{x+s}\,ds$$
This follows from the relationship $f_x(t) = {}_tp_x\,\mu_{x+t}$. For each time $s \in [0, t]$, the integrand is the product of
(i) ${}_sp_x$, the probability of surviving to age x + s
(ii) $\mu_{x+s}\,ds$, which is approximately equal to ${}_{ds}q_{x+s}$, the probability of dying just after age x + s
These events are mutually exclusive, so their probabilities are just added up (or in the limit integrated). This result allows deriving an important relationship between ${}_tp_x$ and $\mu_x$.
A formula for ${}_tp_x$
$${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds\right\}$$
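A formula for ${}_tp_x$
Proof: This follows from
$$-\frac{\partial}{\partial s}\,{}_sp_x = \frac{\partial}{\partial s}\,{}_sq_x = f_x(s) = {}_sp_x\,\mu_{x+s}$$
Note that
$$\frac{\partial}{\partial s}\log {}_sp_x = \frac{\frac{\partial}{\partial s}\,{}_sp_x}{{}_sp_x} = -\mu_{x+s}$$
hence, for some constant of integration c (which is 0),
$$\int_0^t \frac{\partial}{\partial s}\log {}_sp_x\,ds = -\int_0^t \mu_{x+s}\,ds + c$$
Since ${}_0p_x = 1$ we have
$$\big[\log {}_sp_x\big]_0^t = \log {}_tp_x$$
and
$${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds + c\right\} = \exp\left\{-\int_0^t \mu_{x+s}\,ds\right\}$$
(since $e^0 = 1$, we use c = 0)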
Summary: integral expressions for ${}_tq_x$ and ${}_tp_x$
$${}_tq_x = \int_0^t {}_sp_x\,\mu_{x+s}\,ds$$
$${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds\right\}$$
$\mu_x$ (increasing) and $S(x)$ (decreasing) vs age. WHAT MODEL?
Parametric models?
Figure: UK mortality 2003-2005 (Males, Office of National Statistics): hazard estimates $\mu_x$ (log scale, increasing) and density $f(x)$ (unimodal) vs age (years). WHAT MODEL?
The exponential model
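With a constant force of mortality $\lambda$:
$${}_tp_x = \exp\left\{-\int_0^t \lambda\,ds\right\} = e^{-[\lambda s]_0^t} = e^{-\lambda t}$$
and
$${}_tq_x = 1 - {}_tp_x = 1 - e^{-\lambda t}$$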
The exponential model
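We also have
$f(t) = \lambda e^{-\lambda t}$
$E[T] = \frac{1}{\lambda}$
$\mathrm{Var}[T] = \frac{1}{\lambda^2}$
(Weibull model) $\mu_t = \lambda\alpha\,t^{\alpha - 1}$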
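The Weibull model
Since $\mu_t = -\frac{d}{dt}\log[S(t)]$ we have
$$\mu_t = -\frac{d}{dt}\left[-\lambda t^{\alpha}\right] = \lambda\alpha\,t^{\alpha - 1}$$
Gompertz: $\mu_t = Bc^t$, i.e. $\mu_t = Be^{t\log(c)}$
Makeham: $\mu_t = A + Bc^t$
Under the Gompertz form, since ${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds\right\}$, for $g = \exp\left\{-\frac{B}{\log(c)}\right\}$, we have
$${}_tp_x = g^{c^x(c^t - 1)}$$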
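The closed form ${}_tp_x = g^{c^x(c^t - 1)}$ for a hazard $\mu_{x+s} = Bc^{x+s}$ can be compared with a direct numerical evaluation of $\exp\{-\int_0^t \mu_{x+s}\,ds\}$. The parameter values below are invented for illustration and are not fitted to any table.

```python
import math

B, C = 0.00003, 1.1  # illustrative Gompertz parameters (assumed)


def t_p_x_closed(x, t):
    """Closed form tp_x = g ** (C**x * (C**t - 1)) with g = exp(-B / log C)."""
    g = math.exp(-B / math.log(C))
    return g ** (C ** x * (C ** t - 1.0))


def t_p_x_numeric(x, t, steps=10_000):
    """tp_x = exp(-integral_0^t mu_{x+s} ds), using the trapezoidal rule."""
    def mu(age):
        return B * C ** age

    h = t / steps
    integral = 0.5 * (mu(x) + mu(x + t))
    integral += sum(mu(x + i * h) for i in range(1, steps))
    return math.exp(-h * integral)


print(t_p_x_closed(60, 10))
print(t_p_x_numeric(60, 10))
```

The two printed values should agree to many decimal places, which is a useful sanity check when coding a parametric survival model by hand.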
Makeham's law: for $g = \exp\left\{-\frac{B}{\log(c)}\right\}$ [...]
[...] $\alpha_{r+1} + \alpha_{r+2}\,t + \cdots + \alpha_{r+s}\,t^{s-1}$
where the $\alpha_1, \ldots, \alpha_{r+s}$ are constants independent of t.
This is the most widely used form of the GM family. Another popular form is
$$\mu_x = GM(r, s) = \mathrm{poly}_1(t) + e^{\mathrm{poly}_2(t)}$$
where t is a linear function of x and $\mathrm{poly}_1(t)$ and $\mathrm{poly}_2(t)$ are polynomials of degrees r and s respectively.
Suitable distribution families (summary)
Key constraint for a parametric model:
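the domain of $f(t)$ must be $\mathbb{R}^+$.
Here is a summary of some suitable families (with $\lambda > 0$ a rate or scale parameter and $\alpha > 0$ a shape parameter):
$$\begin{array}{l|lllll}
 & \text{Exp} & \text{Weibull} & \text{Gompertz} & \text{Makeham} & \text{Log-logistic} \\ \hline
f(t) & \lambda e^{-\lambda t} & \alpha\lambda t^{\alpha-1} e^{-\lambda t^{\alpha}} & Bc^t\,e^{\frac{B(1-c^t)}{\log(c)}} & (A+Bc^t)\,e^{\frac{B(1-c^t)}{\log(c)} - At} & \frac{\alpha\lambda t^{\alpha-1}}{(1+\lambda t^{\alpha})^2} \\
F(t) & 1 - e^{-\lambda t} & 1 - e^{-\lambda t^{\alpha}} & 1 - e^{\frac{B(1-c^t)}{\log(c)}} & 1 - e^{\frac{B(1-c^t)}{\log(c)} - At} & 1 - \frac{1}{1+\lambda t^{\alpha}} \\
S(t) & e^{-\lambda t} & e^{-\lambda t^{\alpha}} & e^{\frac{B(1-c^t)}{\log(c)}} & e^{\frac{B(1-c^t)}{\log(c)} - At} & \frac{1}{1+\lambda t^{\alpha}} \\
\mu_t & \lambda & \alpha\lambda t^{\alpha-1} & Bc^t & A+Bc^t & \frac{\alpha\lambda t^{\alpha-1}}{1+\lambda t^{\alpha}}
\end{array}$$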
Exam-style question (1/2)
1. An investigation is undertaken into the mortality of men between exact ages 50 and 55 years. A sample of n men is followed from their 50th birthday until they either die or reach their 55th birthday.
The force of mortality (or hazard rate) is assumed to have the following form
$$\mu_x = \alpha + \beta x$$
where $\alpha$ and $\beta$ are parameters to be estimated and x is measured in years since the 50th birthday.
(a) Derive an expression for the survival function between ages 50
and 55 years
(b) Sketch this on a graph
Exam-style question (2/2)
[...] The force of mortality (or hazard rate) is assumed to have the following form
$$\mu_x = \alpha + \beta x$$
where $\alpha$ and $\beta$ are parameters to be estimated and x is measured in years since the 50th birthday.
(c) Comment on the appropriateness of the assumed form of the
hazard function for modelling mortality over this age range
(d) If there were 100,000 men aged 50 then how many deaths
would you expect between ages 50 and 55 years
(e) Describe the distribution of the number of deaths between
ages 50 and 55 years among the n men
Statistical inference
Censoring
The Kaplan-Meier (product-limit) model
Parametric estimation of the survival function
Section II
Estimating lifetime distribution
functions
II.1 Statistical inference
Introduction
Parametric MLE
Non-parametric MLE
No entrants after t = 0
Right censoring
Left censoring
Interval censoring
Random censoring
Type I censoring
Type II censoring
Right censoring
Ex: end of mortality study before all lives observed have died
Data are left censored if we cannot know when entry into the
state we wish to observe took place
Define $t_0 = 0$ and $t_{k+1} = \infty$ and let $t_1 < \cdots < t_k$, $k \le m$, be the ordered times at which deaths were observed
Assume $d_j$ deaths are observed at time $t_j$ ($1 \le j \le k$) so that $d_1 + \cdots + d_k = m$
Assume $c_j$ lives are censored (i.e. removed from investigation) between times $t_j$ and $t_{j+1}$ ($0 \le j \le k$)
Then $c_0 + c_1 + \cdots + c_k = n - m$
Let $d_j$ be the number of individuals experiencing the event at duration $t_j$
Let $n_j$ be the number of individuals at risk of experiencing the event just prior to duration $t_j$
The Kaplan-Meier model: assumptions
The Kaplan-Meier (KM) estimator of the survivor function adopts
the following conventions:
(a) The hazard of experiencing the event is zero at all durations
except those where an event actually happens in our sample
(b) The hazard of experiencing the event at any particular duration $t_j$ when an event takes place is equal to $d_j / n_j$
(c) For any $0 \le j \le k$, if $c_j > 0$, then
If $d_j = 0$, the persons censored are removed from observation at duration $t_j$ (at which censoring takes place)
If $d_j > 0$, persons who are censored at $t_j$ are assumed to be censored immediately after the events have taken place (so that they are still at risk at that duration)
The Kaplan-Meier model: assumptions
Example [IFA notes]: a group of 15 lab rats are injected with a
new drug. They are observed over the next 30 days. The following
events occur:
Day Event
3 Rat 4 dies from effects of drug
4 Rat 13 dies from effects of drug
6 Rat 7 gnaws through bars of cage and escapes
11 Rats 6 and 9 die from effects of drug
17 Rat 1 killed by other rats
21 Rat 10 dies from effects of drug
24 Rat 8 freed during raid by animal liberation activists
25 Rat 12 accidentally freed by journalist reporting earlier raid
26 Rat 5 dies from effects of drug
30 Investigation closes. Remaining rats hold street party.
Figure: timeline of the rat study by day, marking deaths at $t_1 = 3$, $t_2 = 4$, $t_3 = 11$ (2 rats), $t_4 = 21$ and $t_5 = 26$, and censoring at days 6, 17, 24, 25 and 30 (5 rats)
The Kaplan-Meier model: assumptions
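[...] $\sum_{j=0}^{k} c_j = n - m$)
Recall that if $\mu_{x+t} = \lambda$, then $S_x(t) = {}_tp_x = e^{-\lambda t}$
$$\hat\lambda_j = \frac{d_j}{n_j}$$
$$\prod_{j=1}^{k} \lambda_j^{d_j}\,(1 - \lambda_j)^{n_j - d_j}$$
(product of independent binomial likelihoods)
In eventless intervals, $d_j = 0$ and the hazard becomes 0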
Extending the force of mortality to discrete distributions
Definition: Suppose $F(t)$ has probability masses at the points $t_1, \ldots, t_k$. Then the discrete hazard function is defined as
$$\lambda_j = P[T = t_j \mid T \ge t_j] \qquad (1 \le j \le k)$$
$\lambda_j$ may be seen as the probability that a given individual dies on day $t_j$, given that they were still alive at the start of that day
Extending the force of mortality to discrete distributions
Ex: butterflies of a certain species have short lives. After hatching, each butterfly experiences a lifetime defined by the following probability distribution:
Lifetime (days) Probability
1 0.10
2 0.30
3 0.25
4 0.20
5 0.15
Calculate $\lambda_j$ for j = 1, 2, ..., 5 (to 3 decimal places) and sketch a graph of the discrete hazard function.
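One way to carry out this calculation is sketched below: each hazard is the probability mass at that lifetime divided by the probability of surviving at least that long. The helper name `discrete_hazard` is illustrative.

```python
# Lifetime distribution of the butterflies, copied from the table above.
pmf = {1: 0.10, 2: 0.30, 3: 0.25, 4: 0.20, 5: 0.15}


def discrete_hazard(pmf):
    """Return {t_j: P[T = t_j | T >= t_j]} for a discrete lifetime distribution."""
    hazards = {}
    for t in sorted(pmf):
        # P(T >= t): total mass at t and at all later lifetimes.
        at_risk = sum(prob for u, prob in pmf.items() if u >= t)
        hazards[t] = pmf[t] / at_risk
    return hazards


hazards = discrete_hazard(pmf)
for t, h in hazards.items():
    print(t, round(h, 3))
```

The hazard rises with age and reaches 1 on day 5, since no butterfly survives beyond the last possible lifetime.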
Calculating the KM estimate of the survival function
If we assume that T has a discrete distribution then
$$1 - F(t) = \prod_{t_j \le t} (1 - \lambda_j)$$
Since $1 - F(t) = S(t)$, we can estimate the survival function using the formula
$$\hat S(t) = \prod_{t_j \le t} (1 - \hat\lambda_j)$$
This is the Kaplan-Meier estimator.
Calculating the KM estimate of the survival function
To compute $\hat{S}(t)$, we multiply the survival probabilities within each of the intervals up to and including duration $t$. The survival probability at time $t_j$ is estimated by
$1 - \hat{\lambda}_j = \frac{n_j - d_j}{n_j} = \frac{\text{number of survivors}}{\text{number at risk}}$
So the probability of survival at time $t$ is estimated by
$\hat{S}(t) = \prod_{t_j \leq t} \frac{n_j - d_j}{n_j}$
The KM estimate is also called the product limit estimate as a result of this expression.
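To summarize the approach:
$\lambda_j = P[T = t_j \mid T \geq t_j]$, as the mesh of the partition tends to 0
Columns: $j$, $t_j$, $d_j$, $n_j$, $\hat{\lambda}_j = d_j / n_j$, $1 - \hat{\lambda}_j$, $1 - \prod_{k=1}^{j} (1 - \hat{\lambda}_k)$
j   t_j   d_j   n_j   d_j/n_j   1 - d_j/n_j   F(t_j)
1     3     1    15    0.0667        0.9333   0.0667
2     4     1    14    0.0714        0.9286   0.1333
3    11     2    12    0.1667        0.8333   0.2778
4    20     1     9    0.1111        0.8889   0.3580
5    26     1     6    0.1667        0.8333   0.4650
$\hat{F}(t) =$
0 for $0 \leq t < 3$
0.0667 for $3 \leq t < 4$
0.1333 for $4 \leq t < 11$
0.2778 for $11 \leq t < 21$
0.3580 for $21 \leq t < 26$
0.4650 for $t \geq 26$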
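The columns of the table above can be reproduced with a few lines of R. This is only a sketch under the assumption that the table's third and fourth columns are the deaths and the numbers at risk at each death time; the names `d`, `n`, `lambda_hat`, `S_hat` and `F_hat` are chosen here for illustration.

```r
# Deaths d_j and numbers at risk n_j at the ordered death times (from the table)
d <- c(1, 1, 2, 1, 1)
n <- c(15, 14, 12, 9, 6)

lambda_hat <- d / n                 # estimated discrete hazards
S_hat <- cumprod(1 - lambda_hat)    # Kaplan-Meier estimate just after each death time
F_hat <- 1 - S_hat                  # estimated distribution function

km <- data.frame(j = seq_along(d), d = d, n = n,
                 lambda = round(lambda_hat, 4),
                 surv_step = round(1 - lambda_hat, 4),
                 F = round(F_hat, 4))
print(km)
```

The estimate depends on the data only through the pairs of deaths and numbers at risk; the death times themselves just fix where the steps occur.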
A graphical approach
The graph of $\hat{S}(t)$ is a step function starting at 1 and stepping down at each new death
$\hat{S}(t)$                   $t$
1                              $0 \leq t < 3$
14/15                          $3 \leq t < 4$
14/15 $\times$ 13/14           $4 \leq t < 11$
13/15 $\times$ 10/12           $11 \leq t < 21$
...                            ...
Comparing lifetime distributions
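The variance of $\hat{F}(t)$ is estimated by Greenwood's formula:
$\mathrm{Var}[\hat{F}(t)] \approx (1 - \hat{F}(t))^2 \sum_{t_j \leq t} \frac{d_j}{n_j (n_j - d_j)}$
A $100(1-\alpha)\%$ confidence interval for $S(x)$ is
$\left[ \hat{S}(x)^{1/\theta},\ \hat{S}(x)^{\theta} \right]$, where $\theta = \exp\left\{ \frac{Z_{1-\alpha/2} \sqrt{\mathrm{Var}[\hat{S}(x)]}}{\hat{S}(x) \log(\hat{S}(x))} \right\}$
This CI is not symmetric about $\hat{S}(t)$
Bands can be constructed by adjusting the confidence level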
[Figure: plot of S(t) against t]
> leuk.surv = survfit(Surv(time) ~ 1, data=leukemia)
> plot(leuk.surv, xlab="t", ylab="S(t)", bty="n")
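As a rough numerical illustration of a Greenwood-type variance and the log-log transformed confidence interval, the sketch below reuses the deaths and numbers at risk from the worked Kaplan-Meier table earlier in the notes. The 95% level (`alpha <- 0.05`) and all variable names are assumptions made for this example only.

```r
# Deaths and numbers at risk from the worked Kaplan-Meier table
d <- c(1, 1, 2, 1, 1)
n <- c(15, 14, 12, 9, 6)

S_hat <- cumprod(1 - d / n)
# Greenwood-type variance: S(t)^2 * sum of d_j / (n_j (n_j - d_j))
var_S <- S_hat^2 * cumsum(d / (n * (n - d)))

alpha <- 0.05                       # assumed: 95% confidence level
z <- qnorm(1 - alpha / 2)
theta <- exp(z * sqrt(var_S) / (S_hat * log(S_hat)))

lower <- S_hat^(1 / theta)          # lower limit of the log-log interval
upper <- S_hat^theta                # upper limit of the log-log interval
print(round(cbind(S_hat, var_S, lower, upper), 4))
```

Because log(S_hat) is negative, theta lies between 0 and 1, so the limits always stay inside (0, 1) and are not equidistant from the point estimate.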
The Nelson-Aalen model
$\Lambda_t = \int_0^t \mu_s \, ds + \sum_{t_j \leq t} \lambda_j$
$\hat{\Lambda}_t = \sum_{t_j \leq t} \frac{d_j}{n_j}$
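$\hat{S}(t) = e^{-\hat{\Lambda}_t}$
$\hat{F}(t) = 1 - e^{-\hat{\Lambda}_t}$
Its variance is given by
$\mathrm{Var}[\hat{\Lambda}_t] \approx \sum_{t_j \leq t} \frac{d_j (n_j - d_j)}{n_j^3}$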
Relationship between the KM and NA estimates
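$\hat{F}_{KM}(t) = 1 - \prod_{t_j \leq t} \left( 1 - \frac{d_j}{n_j} \right)$
Using $e^{x} \approx 1 + x$ for small $x$, we have
$\hat{F}_{KM}(t) \approx 1 - e^{-\hat{\Lambda}_t} = \hat{F}_{NA}(t)$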
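The closeness of the two estimates can be seen numerically. The sketch below reuses the deaths and numbers at risk from the worked Kaplan-Meier table earlier in the notes; the variable names are illustrative only.

```r
# Deaths and numbers at risk from the worked Kaplan-Meier table
d <- c(1, 1, 2, 1, 1)
n <- c(15, 14, 12, 9, 6)

S_km <- cumprod(1 - d / n)      # Kaplan-Meier survival estimate
Lambda_na <- cumsum(d / n)      # Nelson-Aalen integrated hazard estimate
S_na <- exp(-Lambda_na)         # survival estimate implied by Nelson-Aalen

print(round(cbind(S_km, S_na, difference = S_na - S_km), 4))
```

Since exp(-x) is larger than 1 - x for every x > 0, the Nelson-Aalen based survival estimate sits slightly above the Kaplan-Meier one, with the gap growing as the ratios d_j / n_j become larger.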
Survival estimation in R
> library(survival)
> leukemia
> leuk.km = survfit(Surv(time, status) ~ x, data=leukemia)
> leuk.km.ncs <- survfit(Surv(time) ~ x, data=leukemia)   # censoring ignored
> plot(leuk.km[1], conf.int=FALSE, xlab="t", ylab="S(t)", bty="n")
> lines(leuk.km.ncs[1], lty=4)
> legend("topright", c("Censoring", "No censoring"), lty=c(1,4))
[Figure: S(t) against t; legend: Censoring, No censoring]
KM vs NA
> leuk.km = survfit(Surv(time, status) ~ 1, data=leukemia)
> leuk.na = survfit(Surv(time, status) ~ 1, data=leukemia, type="fleming-harrington")
> # The Fleming-Harrington estimate is actually
> # exp(- NelsonAalen)
> plot(leuk.km, conf.int=FALSE, xlab="t", ylab="S(t)", bty="n")
> lines(leuk.na, lty=4)
> legend("topright", c("Kaplan-Meier", "Nelson-Aalen"), lty=c(1,4))
[Figure: S(t) against t; legend: Kaplan-Meier, Nelson-Aalen]
Testing differences in survival curves
Here the 2 subpopulations (low risk vs high risk of H&N cancer) appear to be different. Is the difference statistically significant, or could it be an artefact of the small sample size?
Let us denote:
- $K$ the number of categories
- $d_{k,(i)}$ the number of deaths in group $k$ at ordered time $t_{(i)}$
- $d_{(i)} = \sum_{k=1}^{K} d_{k,(i)}$ the total number of deaths at time $t_{(i)}$
- $n_{k,(i)}$ the number of members of group $k$ at risk at $t_{(i)}$
- $n_{(i)}$ the total number of members at risk right before $t_{(i)}$
Tests with two categories
Hypotheses:
$H_0 : S_1(t) = S_2(t)$
$H_1 : S_1(t) \neq S_2(t)$
Under $H_0$, the expected number of deaths in group $k$ at time $t_{(i)}$ is
$e_{k,(i)} = \frac{n_{k,(i)} \, d_{(i)}}{n_{(i)}}$
Assumptions:
$\sum_{i=1}^{m} d_{(i)}$ is large
$\sum_{i=1}^{m} e_{k,(i)}$ is large
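The variance of $d_{k,(i)}$ is given for $k = 1$ and $k = 2$ by
$v_{(i)} = \frac{n_{1,(i)} \, n_{2,(i)} \, d_{(i)} \, (n_{(i)} - d_{(i)})}{n_{(i)}^2 \, (n_{(i)} - 1)}$
Test statistic, with weights $w_i$:
$q = \frac{\left[ \sum_{i=1}^{m} w_i \, (d_{1,(i)} - e_{1,(i)}) \right]^2}{\sum_{i=1}^{m} w_i^2 \, v_{(i)}}$
If $H_0$ is true then $q \sim \chi^2_1$ asymptotically, and the p-value is computed from this distribution:
$p = P(Q > q \mid H_0 \text{ true})$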
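The two-group test can be carried out by hand in a few lines. The sketch below uses a small hypothetical data set (the times, death indicators and group labels are invented for illustration and do not come from the notes) and unit weights, which is the usual log-rank choice; other weights could be substituted in `w`.

```r
# Hypothetical two-group data (illustration only)
time   <- c(3, 5, 7, 9, 12,  2, 4, 4, 6, 10)
status <- c(1, 1, 0, 1, 1,   1, 1, 1, 1, 0)   # 1 = death, 0 = censored
group  <- c(1, 1, 1, 1, 1,   2, 2, 2, 2, 2)

death_times <- sort(unique(time[status == 1]))

# Deaths and numbers at risk at each ordered death time
d  <- sapply(death_times, function(t) sum(time == t & status == 1))
d1 <- sapply(death_times, function(t) sum(time == t & status == 1 & group == 1))
n  <- sapply(death_times, function(t) sum(time >= t))
n1 <- sapply(death_times, function(t) sum(time >= t & group == 1))
n2 <- n - n1

e1 <- n1 * d / n     # expected deaths in group 1 under H0
# Variance term; set to 0 when only one life is left at risk
v  <- ifelse(n > 1, n1 * n2 * d * (n - d) / (n^2 * (n - 1)), 0)

w <- rep(1, length(death_times))              # unit weights: log-rank test
q <- sum(w * (d1 - e1))^2 / sum(w^2 * v)
p_value <- pchisq(q, df = 1, lower.tail = FALSE)
print(c(q = q, p_value = p_value))
```

With real data, the same test is available through survdiff() in the survival package, which should agree with this hand computation when unit weights are used.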
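$L = \prod_{\text{deaths}} f(t_i) \prod_{\text{censored lives}} S(t_i)$
$L(\lambda) = \lambda^k \, e^{-\lambda \sum_{i=1}^{k} t_i} \, e^{-\lambda \sum_{i=k+1}^{n} t_i} = \lambda^k \, e^{-\lambda \sum_{i=1}^{n} t_i}$
and the MLE of $\lambda$ is defined as
$\hat{\lambda} = \arg\max_{\lambda} L(\lambda)$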
Maximum likelihood estimation (general case)
Let us now use indicator $\delta_i$ with value 1 if life $i$ died and 0 if life $i$ was censored. Then we have the following general expression for the likelihood:
$L = \prod_{i=1}^{n} f(t_i)^{\delta_i} \, S(t_i)^{1 - \delta_i}$
and since $f(t) = S(t) h(t)$,
$L = \prod_{i=1}^{n} h(t_i)^{\delta_i} \, S(t_i)^{\delta_i} \, S(t_i)^{1 - \delta_i} = \prod_{i=1}^{n} h(t_i)^{\delta_i} \, S(t_i)$
Note also that
$\arg\max L = \arg\max \log(L)$
Maximum likelihood estimation (exponential hazard model)
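We obtain an expression for the MLE by finding the optimum of $\log(L)$. Now substituting for the exponential case:
$\log(L(\lambda)) = \log(\lambda) \sum_{i=1}^{n} \delta_i - \lambda \sum_{i=1}^{n} t_i$ and $\hat{\lambda} = \frac{\sum_{i=1}^{n} \delta_i}{\sum_{i=1}^{n} t_i}$
This indeed coincides with a maximum for $L(\lambda)$ since
$\frac{\partial^2 \log(L(\lambda))}{\partial \lambda^2} = -\frac{\sum_{i=1}^{n} \delta_i}{\lambda^2} < 0$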
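The closed-form estimate (number of deaths divided by total observed time) can be checked against a direct numerical maximisation of the log-likelihood. The observation times and death indicators below are hypothetical values invented for this sketch, and the search interval passed to optimize() is an arbitrary assumption.

```r
# Hypothetical observation times and death indicators (illustration only)
t_obs <- c(2.1, 0.7, 3.5, 1.2, 4.0, 2.8)
delta <- c(1,   1,   0,   1,   0,   1)     # 1 = died, 0 = censored

# Log-likelihood of the exponential model with constant hazard lambda
loglik <- function(lambda) sum(delta) * log(lambda) - lambda * sum(t_obs)

lambda_closed  <- sum(delta) / sum(t_obs)   # closed-form MLE
lambda_numeric <- optimize(loglik, interval = c(1e-6, 10), maximum = TRUE)$maximum

print(c(closed_form = lambda_closed, numeric = lambda_numeric))
```

Censored lives contribute to the denominator (time at risk) but not to the numerator, which is why ignoring censoring would bias the estimate.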
Using estimates over age ranges
m1
j =0
x+j
Model $T_i$ for individual $i$? One constraint: $T_i \geq 0$
Ex: model $T_i = T_{0,i} \, e^{\beta x_i}$
Then $T_1 = T_{0,1}$ and $T_2 = T_{0,2} \, e^{\beta}$
If $T_{0,1} = T_{0,2}$ (same population a priori) and $e^{\beta} = 2$, then $T_2 = 2 T_1$
We also have $S_2(t) = S_1(t/2)$ (proportional survival)
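The relation between the two survival functions follows in one line from the doubling of the lifetime. The short derivation below is a sketch that assumes, as in the example, that the second lifetime is exactly twice the first.

```latex
\begin{align*}
S_2(t) &= P(T_2 > t) = P(2\,T_1 > t) = P\!\left(T_1 > \tfrac{t}{2}\right) = S_1\!\left(\tfrac{t}{2}\right)
\end{align*}
```

In words, group 2 reaches any given survival probability at twice the age of group 1, so time is stretched rather than the hazard being rescaled.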
Introduction
Survival models
Lifetime distribution functions
Cox regression
Modelling approach
The Cox model
Cox regression
Model selection
Log-linear hazard models
Assuming $\varepsilon_i \sim f_{\varepsilon}(u) = e^{(u - e^{u})}$ yields the exponential distribution with constant hazard rate $\lambda_i$, where
$\log(\lambda_i) = -\beta x_i$
$\lambda_i(t) = \lambda(t) \, e^{\beta x_i}$
Ex: $\lambda(t) = \lambda_0$ yields the exponential regression model with
$\lambda(t, x_i) = \lambda_0 \, e^{\beta x_i}$
Ex: $\lambda(t) = \gamma \, t^{\gamma - 1}$ yields the Weibull model
- if $\gamma = 1$ then exponential model
- if $\gamma > 0$ then Weibull model
- if $\gamma < 1$ then risk decreases over time
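The effect of the Weibull shape parameter on the hazard can be illustrated numerically. The sketch below assumes the baseline hazard has the form shape * t^(shape - 1); the function name `weibull_hazard`, the time grid and the three shape values are choices made for this example only.

```r
# Assumed Weibull baseline hazard: shape * t^(shape - 1)
weibull_hazard <- function(t, shape) shape * t^(shape - 1)

t_grid <- seq(0.5, 5, by = 0.5)
hazards <- sapply(c(decreasing = 0.5, constant = 1, increasing = 2),
                  function(a) weibull_hazard(t_grid, a))

print(round(cbind(t = t_grid, hazards), 3))
```

A shape below 1 gives a hazard that falls over time, a shape of exactly 1 gives the constant hazard of the exponential model, and a shape above 1 gives a hazard that rises over time.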
2-sample log-linear hazard model example
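Let $\lambda(t) = e^{\alpha + \beta x}$ and $\lambda_1(t) = \lambda_1$, $\lambda_2(t) = \lambda_2$ (with $x = 0$ in sample 1 and $x = 1$ in sample 2). Then
$\frac{\lambda_2(t)}{\lambda_1(t)} = e^{\beta}$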
(t) is the baseline hazard.
p
j =1
j
x
1,j
exp
p
j =1
j
x
2,j
= exp
_
_
p
j =1
j
(x
1,j
x
2,j
)
_
_
[Figure: hazard against time]
Effect of covariates on risk
In $\exp(\beta x^T)$, $\beta$ determines the contribution of the vector of covariates $x$ to the hazard rate
If the $j$th covariate $x_{i,j}$ takes positive values only, then $\beta_j > 0$ implies a positive correlation between $x_{i,j}$ and the hazard rate (hazard increases with $x_{i,j}$)
The magnitude of $\beta_j$ determines the strength of this correlation: a large $\beta_j$ implies a significant increase in hazard with a unit increase in $x_j$ (by one standard deviation)
Likewise, $\beta_j < 0$ implies that an increase in the covariate results in a decrease in risk
Ex: the covariates for a given life are (62, 168, 85) (age at start of study, height in cm, weight in kg)
Given $\hat{\beta} = (0.0156, -0.0032, 0.0201)$, we can calculate $\lambda(t, x)$ for this life:
$\lambda(t, x) = \lambda_0(t) \, e^{62 \times 0.0156 - 168 \times 0.0032 + 85 \times 0.0201} = \lambda_0(t) \, e^{2.1381} = 8.4833 \, \lambda_0(t)$
(strong increase from baseline risk due mainly to the weight)
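The arithmetic of this example can be reproduced as follows. The sketch takes the height coefficient as -0.0032, the sign that reproduces the exponent 2.1381 quoted in the example; the vector names are illustrative.

```r
# Covariates of the life and estimated coefficients from the example
x    <- c(age = 62, height_cm = 168, weight_kg = 85)
beta <- c(age = 0.0156, height_cm = -0.0032, weight_kg = 0.0201)  # height sign assumed negative

contributions <- x * beta                    # contribution of each covariate to the exponent
linear_predictor <- sum(contributions)
hazard_multiplier <- exp(linear_predictor)   # factor applied to the baseline hazard

print(round(contributions, 4))
print(round(c(linear_predictor = linear_predictor,
              hazard_multiplier = hazard_multiplier), 4))
```

Printing the individual contributions shows which covariate drives the result: the weight term is the largest of the three, consistent with the remark in the example.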
The Cox model in practice
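Estimating $\lambda(t, x_i)$ for an individual requires estimating $\lambda_0(t)$
$L = \prod_{i\ \text{dead}} \lambda_i(t, x) \prod_{j\ \text{all}} S_j(t, x) = \prod_{i\ \text{dead}} \lambda_0(t_i) \, e^{\beta x_i} \prod_{j\ \text{all}} \exp\left( -e^{\beta x_j} \int_0^{t_j} \lambda_0(s) \, ds \right)$
Define $\lambda_{0i} = \lambda_0(t_i)$, then substitute back into likelihood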
ML wrt the baseline hazard
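$L = \prod_{i\ \text{dead}} \lambda_{0i} \, e^{\beta x_i} \exp\left( -\lambda_{0i} \sum_{j \in R_i} e^{\beta x_j} \right)$
$\hat{\lambda}_{0i} = \frac{1}{\sum_{j \in R_i} e^{\beta x_j}}$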
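Partial likelihood:
$L(\beta) = \prod_{j=1}^{k} \frac{e^{\beta x_j^T}}{\sum_{i \in R_j} e^{\beta x_i^T}}$
where $R_j$ denotes the set of lives at risk at time $t_j$
Equivalently, since the baseline hazard cancels: $\prod_{j=1}^{k} \frac{\lambda(t_j, x_j)}{\sum_{i \in R_j} \lambda(t_j, x_i)}$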
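PHM: $\lambda(t, x) = \lambda_0 \, e^{\beta x}$, $x \in \{0; 1\}$
Partial likelihood:
$L_p = \frac{e^{0}}{3e^{0} + 2e^{\beta}} \times \frac{e^{\beta}}{e^{0} + 2e^{\beta}} \times \frac{e^{0}}{e^{0} + e^{\beta}} = \frac{1}{3 + 2e^{\beta}} \times \frac{e^{\beta}}{1 + 2e^{\beta}} \times \frac{1}{1 + e^{\beta}}$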
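A partial likelihood of this kind can be evaluated directly from a small data set. The sketch below uses hypothetical event times, chosen only so that the risk sets contain 3 + 2, then 1 + 2, then 1 + 1 lives with covariate values 0 and 1 respectively; the function names and the trial value of beta are assumptions made for illustration.

```r
# Hypothetical data chosen to give risk sets of sizes (3,2), (1,2), (1,1)
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)   # 1 = death, 0 = censored
x      <- c(0, 0, 1, 0, 1)   # binary covariate

# Partial likelihood: one factor per death, no tied death times assumed
partial_lik <- function(beta) {
  death_idx <- which(status == 1)
  terms <- sapply(death_idx, function(j) {
    risk_set <- time >= time[j]                       # lives still at risk at t_j
    exp(beta * x[j]) / sum(exp(beta * x[risk_set]))
  })
  prod(terms)
}

# Closed form for this particular configuration of risk sets
closed_form <- function(beta) {
  1 / (3 + 2 * exp(beta)) * exp(beta) / (1 + 2 * exp(beta)) * 1 / (1 + exp(beta))
}

beta <- 0.5
print(c(direct = partial_lik(beta), closed_form = closed_form(beta)))
```

The baseline hazard never appears in the computation, which is the practical appeal of the partial likelihood: beta can be estimated without specifying the baseline.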
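The model must also allow for multiple events at one time point $t_j$ (i.e. $d_j > 1$)
$L(\beta) = \prod_{(i)} \left[ \frac{\exp\left( \sum_{j \in D_{(i)}} \beta x_j(y_{(i)}) \right)}{\sum_{\text{combinations } D_{(i)} \subset R_{(i)}} \exp\left( \sum_{j \in D_{(i)}} \beta x_j(y_{(i)}) \right)} \right]$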
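$\frac{\lambda_0(18) e^{\beta x_3} \, \lambda_0(18) e^{\beta x_4}}{\lambda_0(18) e^{\beta x_3} \, \lambda_0(18) e^{\beta x_4} + \lambda_0(18) e^{\beta x_4} \, \lambda_0(18) e^{\beta x_5} + \lambda_0(18) e^{\beta x_3} \, \lambda_0(18) e^{\beta x_5}} = \frac{e^{\beta \cdot 0 + \beta \cdot 1}}{e^{\beta \cdot 0 + \beta \cdot 1} + e^{\beta \cdot 1 + \beta \cdot 1} + e^{\beta \cdot 0 + \beta \cdot 1}}$
Thus,
$L_p = \frac{e^{\beta \cdot 0}}{3e^{\beta \cdot 0} + 2e^{\beta \cdot 1}} \times \frac{e^{\beta \cdot 0 + \beta \cdot 1}}{e^{\beta \cdot 0 + \beta \cdot 1} + e^{\beta \cdot 1 + \beta \cdot 1} + e^{\beta \cdot 0 + \beta \cdot 1}} = \frac{1}{3 + 2e^{\beta}} \times \frac{e^{\beta}}{2e^{\beta} + e^{2\beta}}$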
Partial likelihood for ties
$L(\beta) = \prod_{j=1}^{k} \frac{e^{\beta s_j^T}}{\left( \sum_{i \in R_j} e^{\beta x_i^T} \right)^{d_j}}$
where $s_j$ is the sum of the covariate vectors $x$ of the $d_j$ lives that experienced an event at time $t_j$.
2
log L()
Cox regression
$X_1 = 1$ for treatment B, 0 otherwise
$X_2 = 1$ for treatment C, 0 otherwise
$\beta_1 X_1 + \beta_2 X_2$ then covers all possible cases
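$X_1 = 1$ for males, 0 for females
$X_2 = 1$ for treatment B, 0 otherwise
$X_3 = 1$ for treatment C, 0 otherwise
With $\lambda_{\text{female},A}(t) = \lambda_0(t)$ (all indicators equal to 0) and $\lambda_{\text{female},B}(t) = \lambda_0(t) \, e^{-0.025}$, the ratio of these two hazards is
$\frac{\lambda_{\text{female},A}(t)}{\lambda_{\text{female},B}(t)} = e^{0.025} = 1.0253$
so the risk of an event is estimated to be 2.53% higher under treatment A than under treatment B for female patients (equivalently, about 2.47% lower under B than under A).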
Cox regression: example
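$\lambda_{\text{male},C}(t) = \lambda_0(t) \, e^{0.031 + 0.011} = \lambda_0(t) \, e^{0.042}$. The ratio of these two hazards is
$\frac{\lambda_{\text{female},A}(t)}{\lambda_{\text{male},C}(t)} = e^{-0.042} = 0.9589$
so the risk of an event is estimated to be 4.11% lower for a female patient receiving treatment A than for a male patient receiving treatment C (equivalently, about 4.29% higher for the male patient on treatment C).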
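The hazard ratios in these two examples, and the percentage statements attached to them, can be checked with a short computation. The sketch assumes that a female patient on treatment A is the baseline (all indicators 0) and that the treatment B coefficient is -0.025, the sign implied by the quoted ratio of 1.0253; variable names are illustrative.

```r
# Coefficients implied by the two examples (baseline: female patient, treatment A)
beta_B      <- -0.025          # treatment B; sign assumed from the ratio 1.0253
beta_male_C <- 0.031 + 0.011   # combined male and treatment C terms

hr_A_vs_B     <- exp(0 - beta_B)        # female A relative to female B
hr_A_vs_maleC <- exp(0 - beta_male_C)   # female A relative to male C

print(round(c(A_vs_B = hr_A_vs_B,
              B_vs_A = 1 / hr_A_vs_B,
              femaleA_vs_maleC = hr_A_vs_maleC,
              maleC_vs_femaleA = 1 / hr_A_vs_maleC), 4))
```

A ratio and its reciprocal give slightly different percentages, so it matters which group is taken as the reference when a hazard ratio is turned into a statement such as "x% higher" or "x% lower".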
Numeric example of a PHM in R
library(survival)
library(MASS) # contains dataset leuk
leuk: data from 33 leukemia patients with white blood count (wbc) and presence/absence of leukemia marker (ag)
    wbc      ag time
1  2300 present   65
2   750 present  156
3  4300 present  100
4  2600 present  134
5  6000 present   16
leuk.cox <- coxph(Surv(time) ~ ag + log(wbc), data=leuk)
summary(leuk.cox)
             coef exp(coef) se(coef)     z      p
agpresent  -1.069     0.343    0.429 -2.49 0.0130 *
log(wbc)    0.368     1.444    0.136  2.70 0.0069 **

           exp(coef) exp(-coef) lower .95 upper .95
agpresent      0.343      2.913     0.148     0.796
log(wbc)       1.444      0.692     1.106     1.886

Rsquare= 0.377 (max possible=0.994)
Likelihood ratio test= 15.6 on 2 df, p=0.000401
Wald test            = 15.1 on 2 df, p=0.000537
Score (logrank) test = 16.5 on 2 df, p=0.000263
In this example, both variables are significant at the 5% level
III.4 Model selection
Effect of covariates
Define $H_0$: the extra $q$ covariates do not have an effect in the presence of the other $p$ covariates, i.e.
$H_0 : \beta_{p+1} = \beta_{p+2} = \cdots = \beta_{p+q} = 0$
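Martingale residuals:
$r_{M_i} = \delta_i - e^{\hat{\beta}^T x_i} \, \hat{H}_0(t_i)$
Deviance residuals:
$r_{D_i} = \operatorname{sign}(r_{M_i}) \sqrt{-2 \left( r_{M_i} + \delta_i \log(\delta_i - r_{M_i}) \right)}$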
scatter.smooth(leuk$wbc,resid(leuk.cox))
plot(1:length(leuk$time), resid(leuk.cox,type="deviance"))
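The transformation from martingale to deviance residuals can be written as a small function. This is a sketch only: the function name `deviance_residual` and the example residual values are hypothetical, and the term delta * log(delta - r_M) is taken to be 0 for censored lives.

```r
# Deviance residual from a martingale residual r_M and a death indicator delta
deviance_residual <- function(r_M, delta) {
  # delta * log(delta - r_M) is taken as 0 for censored lives (delta = 0)
  log_term <- ifelse(delta == 1, log(delta - r_M), 0)
  sign(r_M) * sqrt(-2 * (r_M + log_term))
}

# Hypothetical martingale residuals (illustration only)
r_M   <- c(0.5, -0.8, 0.9, -0.2)
delta <- c(1,    0,   1,    0)
print(round(deviance_residual(r_M, delta), 4))
```

For a fitted model such as leuk.cox, resid(leuk.cox) returns the martingale residuals and resid(leuk.cox, type="deviance") the deviance residuals, so the function above is only needed to see how the two are related.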
Residual diagnostics
[Figure: martingale residuals against log(wbc); deviance residuals against index]