
Cluster randomised trials (CRTs)

(AS10)
EPM304 Advanced Statistical Methods in Epidemiology

Course: PG Diploma/ MSc Epidemiology

This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to refer
back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale or
further copying.
London School of Hygiene & Tropical Medicine September 2013 v1.0

Section 1: Cluster randomised trials (CRTs)


Aim
To understand when cluster randomised trials (CRTs) are appropriate and how to plan and
analyse them.
Objectives
By the end of this session you will be able to:
Appreciate the uses of cluster randomised intervention trials.
Understand why correlations between individuals in each cluster imply that special
methods are needed for the design and analysis of such trials.
Calculate sample size requirements for cluster randomised trials, and appreciate the role of
matching and stratification to improve balance.
Apply appropriate methods for the analysis of data from cluster randomised trials, including
methods of adjusting for covariates.
This session should take you between 2 and 3 hours to complete.

Section 2: Planning your study


In this session you will learn when cluster randomised trials are appropriate and/or necessary and
how methods for planning and analysis differ from individually randomised trials. So far in the
course you have mostly applied methods that make the assumption the data are statistically
independent, e.g. logistic regression, Poisson regression, Cox regression. With correlated data
the assumption of independence is invalid.

To begin you will look at why the usual methods are not appropriate. You will then look at
different approaches to obtain estimates accounting for the fact that observations are correlated.
During the session you will fit random effects models that were covered in AS09. Hence, you
may wish to review this material. In addition, sample size calculations will be covered and so you
could also review this session.
Correlated data: AS09
Sample size calculations: PE06

2.1: Planning your study


To illustrate the ideas in this session, you will use a study carried out in northern Ghana on the
impact of insecticide-impregnated bednets on child mortality. Details are given below.

Details

A rural study area in Ghana was divided arbitrarily into 96 clusters of compounds. Households in
48 randomly selected clusters then received insecticide-treated bednets. A demographic
surveillance system recorded all births, deaths and migrations. Around 16,000 children aged 6-59
months were followed up for 2 years with all-cause child mortality rates as the outcome.

Section 3: Background to CRTs


In a conventional clinical trial there are commonly two arms, a treatment arm and a control arm,
with randomisation to one of these two arms.
At what level does this randomisation take place?
Interaction: Button: clouds picture (text and interaction appears below):
Randomisation takes place at the individual level i.e. individuals are randomised to either the
treatment or the control arm.
However, it may sometimes be more appropriate to randomly assign treatments to clusters of
individuals. The cluster may be a geographical entity, such as a town, a village, or an arbitrary
area obtained by dividing up a larger region into smaller sub-areas. Or it may be some other kind
of grouping, such as a household, school, workplace or medical practice.
Can you think of any reasons why it may be better to randomise clusters, not individuals?
Interaction: Button: clouds picture (prior text and interaction disappears and following text
appears below):
Reasons for randomising clusters may include:
-The intervention is most naturally provided at community level; for example, health education
campaigns, or water and sanitation programmes.
-Logistical convenience in providing the intervention. In some cases, provision of the same
intervention to everyone in the community may also be more acceptable to the study population.
-Avoidance of the contamination that occurs when individuals in different arms live close together.
In trials of interventions against infectious diseases, there may be additional reasons for adopting
cluster-randomisation:
-To capture the effects of an intervention on infectiousness, as well as susceptibility; for example,
transmission-blocking malaria vaccines which are designed to prevent onward transmission of
malaria.
-To capture the mass effect of an intervention when applied throughout a community; for
example, herd immunity resulting from high coverage of an intervention such as a vaccine or
circumcision to prevent HIV, or mass killing of malaria vectors due to insecticide treated bednets.

3.1: Background to CRTs


Cluster randomised trials require different methods of analysis to individually randomised trials.
Why do you think this may be?
Interaction: Button: clouds picture (text and interaction appears below):
Individuals from one cluster may be more likely to be similar than individuals from different
clusters. Can you think of reasons this may be?
Interaction: Button: clouds picture (pop up box appears ):

Correlation between measurements on individuals in the same cluster may arise in a number of
ways:
-Persons may behave or respond more like other persons in the same community than persons in
a different community.
-Persons may have a level of exposure more like other persons in the same community than
persons in a different community.
-An infected individual in a community may transmit an infection directly to another individual in
that same community.

3.2: Background to CRTs


From the session on correlated data, what are the implications for analysis if the correlations in
the data are not taken into account?
Interaction: Button: clouds picture (text appears below):
The standard errors will be too small, leading to confidence intervals that are too narrow and p-values that are too small. Hence, the evidence for an effect will be exaggerated.
Some other issues of CRTs compared to individual randomised trials are:
- Often the numbers of clusters in each arm are small, leading to large baseline imbalances
between the treatment and control arms for potential confounding variables.
- It is often impossible or impractical to consent an entire cluster.
- Contamination between clusters, such that one cluster affects another cluster, must be avoided
if possible, to preserve independence of observations between clusters.
- As correlated data contain less information than uncorrelated data, a larger sample size will be
required than for individually randomised studies.

3.3: Background to CRTs


The first carefully designed CRTs were used to measure the effect of isoniazid on tuberculosis
and started to be published in the 1960s.
For example, Ferebee et al. published a paper in 1963 that randomised 433 hospital units to
isoniazid or placebo.
Despite this early work, it was not until the last two decades that substantial numbers of studies
used the CRT design. Even during this time many of these studies were methodologically poor.
Hence, in comparison with individually randomised trials, methods of design and analysis for
CRTs are still a very active area of research. The number and importance of CRTs are clearly
growing, which makes them an important topic to understand.
The following three pages give examples of CRTs.

3.4: Background to CRTs


Impact of insecticide-impregnated bednets on child mortality in northern Ghana
Studies in The Gambia, where many people in rural areas use bednets, suggested that dipping
the nets in insecticide once a year made them much more effective and reduced morbidity from
malaria and all-cause child mortality.

A rural study area in Ghana was divided arbitrarily into 96 clusters of compounds, with
households in 48 randomly selected clusters receiving insecticide-treated bednets. A
demographic surveillance system recorded all births, deaths and migrations. Around 16,000
children aged 6-59 months were followed up for 2 years with all-cause child mortality rates as the
outcome.
Why do you think a CRT design was used, rather than randomising some households to receive
treated bednets?
Interaction: Button: clouds picture (text appears in pop up box):
Logistically it may be easier to distribute bednets to every household in a cluster. Education and
information campaigns around the use of bednets may be better delivered at a cluster level.
In addition, because treated bednets affect malaria prevalence and mosquito density, there may be
substantial contamination between neighbouring households, so randomising and measuring the
effect at the cluster level may help reduce this problem.

3.5: Background to CRTs


Reducing HIV incidence by controlling STDs
The sexual transmission of HIV is enhanced in the presence of other STDs, suggesting that
improved STD control may be an effective HIV prevention strategy. In a trial in rural Tanzania, six
pairs of communities were chosen, and one community in each pair was randomly selected to
receive the intervention, which consisted of improved STD treatment services delivered through
existing health units.
One thousand adults were randomly selected from the general population in each community,
and surveyed at baseline and two years later. HIV incidence rates were compared between
treatment and control communities.
Why do you think a CRT design was used, rather than randomising individuals to improved STD
treatment or standard of care?
Interaction: Button: clouds picture (text appears in box):
It was much easier logistically to upgrade existing facilities at a community level than to visit all
individuals and randomly assign some to treatment.
Treating STDs may have an impact on both infectiousness and susceptibility to HIV. By analysing
individual-level HIV sero-conversion, only the impact on susceptibility would be captured. Hence,
a CRT was necessary to capture both these effects.

3.6: Background to CRTs


Reducing adolescent tobacco use
In a study in the USA, twelve schools were randomly assigned to receive a health promotion
intervention (Smoke-free generation) aimed at reducing smoking among students, while twelve
other schools acted as controls. The primary outcome was the proportion of children in each
school who were smoking after two years of follow-up.

Why do you think a CRT design was used, rather than randomising children to the intervention or
control?
Interaction: Button: clouds picture (text appears in box):
Children who received the health promotion intervention may talk to their classmates about the
intervention. This may reduce the impact of the intervention among the children who received it
and may also lead to the intervention affecting the children who did not receive it. Hence, the
measured intervention effect may be diluted.
In addition, it may be easier to deliver such a campaign to the whole school using large meetings
as well as smaller group counselling.

3.7: Background to CRTs


There are some ethical issues that are particular to CRTs and a few are discussed here:
In general, individually randomised trials try to ensure the benefit/risk ratio is favourable to all
participants. But if a vaccine that may reduce infectiousness is to be studied, the effects must be
studied in those who are uninfected, strongly suggesting a CRT. Yet in such a CRT there is no
direct benefit for those who receive the vaccine and no direct risk for those who do not.
Scientific validity of a trial's design and conduct is a key ethical requirement. There is a much
smaller body of research into CRTs than individually randomised trials and so scientific errors are
more common. The inclusion of researchers with appropriate CRT experience in all stages of a
CRT should be carefully considered.

3.8: Background to CRTs


Informed consent is a prerequisite for participation in most individually randomised trials. This is
complicated in a CRT as it would usually be virtually impossible to obtain consent from all
members of a community. In the example of a condom promotion campaign some community
members may strongly oppose this on religious grounds or they may believe that such a
campaign will increase extra-marital/pre-marital sexual behaviour.
What should be done if a minority in a cluster is opposed to the campaign? Who can speak on
behalf of the community? Some trials have required consent from political authorities, village
heads, community representatives, patient representatives or heads of medical practices for a
cluster to be included in the study and then individual consent from all those receiving direct
interventions.
There is increasing use of Community Advisory Boards (CABs). These serve as the voice of the
community for determining the acceptability and feasibility of the study both at the design stage
and when the study is ongoing.

3.9: Background to CRTs


As mentioned earlier, a CRT design is often used to try and reduce the problem of contamination.
There are generally three types of contamination:
Interaction: Tabs: 1:
Between intervention and control clusters. This may be the result of travel or migration between
geographical areas, children from different schools socialising or patients attending different
medical facilities. Individuals from one community may commonly visit medical facilities in other
communities for treatment of stigmatising illnesses. Such contamination will generally dilute the
effect of any intervention.
Interaction: Tabs: 2:
Between intervention clusters and the wider community. This is likely to dilute the effect of the
intervention, but to a lesser extent as it may weaken the intervention, but is unlikely to lead to
any effect in the control clusters under study.
Interaction: Tabs: 3:
Between control clusters and the wider community. This is unlikely to have any substantial effect,
unless the control communities are receiving an improved standard of care due to the study.

3.10: Background to CRTs


Here are some ways of trying to reduce these types of contamination.
Interaction: Tabs: All types:
Increasing the sizes of clusters is likely to reduce contamination, both by including larger patterns
of movement and by reducing the proportion of people who live on the edge of the cluster.
Interaction: Tabs: Type 1:
Clusters that are far apart are likely to have less contamination than clusters that are close to
each other. Also, contamination may be lower between clusters that are separated by a natural
barrier, such as a mountain or a river, or those for which connecting transport links are poor.
Interaction: Tabs: Types 2 and 3:
Type 2 contamination is particularly common in high density urban areas. In such cases, the "fried
egg" design can be used whereby the intervention is given to the whole cluster, but the trial
endpoints are only measured in a central portion of the cluster (the "yolk").

Section 4: Measuring intra-cluster correlation


When using the CRT design, observations between individuals within one cluster are likely to be
correlated - termed intra-cluster correlation.
As an illustration consider the following: if participants in a CRT of communities are asked "is
there a health facility in your community?", all participants in a cluster should give the same
answer.
You could have asked only one participant in each community and hence, the effective sample
size is the number of communities (clusters).
Intra-cluster correlation is commonly measured by the intra-cluster correlation coefficient, denoted
by the Greek letter $\rho$ (rho). $\rho = 0$ implies that responses from individuals of the same cluster are no
more alike than from individuals of different clusters. $\rho = 1$ implies that all responses within a
cluster were identical.

4.1: Measuring intra-cluster correlation


Another measure of the impact of intra-cluster correlation is the design effect (or deff). This is
the ratio of the variance of the estimator under cluster sampling to the variance of the estimator
under simple random sampling.
This means that the power from a CRT with sample size n is the same as from an individually
randomised trial with sample size n/deff (called the effective sample size). Hence, a design
effect of 1.5 implies that the CRT will need 50% more participants than the individually
randomised trial.

4.2: Measuring intra-cluster correlation


For a cluster sample survey in which every cluster is of the same size, m, the two measures are
related by the formula
$$\text{deff} = 1 + (m - 1)\rho$$
Hence, the larger the cluster size, the larger the design effect. In larger clusters, the observations
from a larger number of participants are correlated and so, in general, statistical power will be
higher for many clusters with few participants each than for a few clusters with many participants.
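The trade-off between cluster size and number of clusters can be illustrated with a minimal sketch. The total sample size (2,000) and the value of the intra-cluster correlation coefficient (0.01) are hypothetical values chosen only for illustration, and the function names are not taken from the course material.

```python
def design_effect(m: int, rho: float) -> float:
    """Design effect for equal cluster size m: deff = 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho


def effective_sample_size(n: int, m: int, rho: float) -> float:
    """Size of the individually randomised trial with the same power: n / deff."""
    return n / design_effect(m, rho)


# Hypothetical inputs: 2,000 participants in total and rho = 0.01.
n_total, rho = 2000, 0.01
few_large = effective_sample_size(n_total, m=200, rho=rho)   # 10 clusters of 200
many_small = effective_sample_size(n_total, m=20, rho=rho)   # 100 clusters of 20

print(f"10 clusters of 200: deff = {design_effect(200, rho):.2f}, effective n = {few_large:.0f}")
print(f"100 clusters of 20: deff = {design_effect(20, rho):.2f}, effective n = {many_small:.0f}")
```

Under these assumed values the same 2,000 participants carry more information when spread over many small clusters, which is the point made in the text.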

4.3: Measuring intra-cluster correlation


If there is intra-cluster correlation, then the observations from two individuals in the same cluster
are more likely to be similar than the observations from two individuals in different clusters. This
happens if and only if there are differences between clusters. Hence, an alternative way of
thinking about intra-cluster correlation is in terms of between-cluster variation.
So intra-cluster correlation and between-cluster variation are two ways of measuring the same
phenomenon.
A commonly used measure of between-cluster variation is the coefficient of variation, denoted k.
Depending on the type of outcome we are analysing, k is the coefficient of variation of true rates,
risks or means across clusters. Recall that the coefficient of variation is computed as SD/mean,
so if we are analysing proportions, k would be computed as the SD of the (true) cluster-level
proportions divided by the mean of these proportions.

4.4: Measuring intra-cluster correlation


For binary outcomes (proportions), the intra-cluster correlation coefficient is defined as:

$$\rho = \frac{\sigma_B^2}{\pi(1 - \pi)}$$

where $\sigma_B^2$ is the variance of the true cluster-level proportions and $\pi$ is the overall true proportion
across all the clusters in the study.

Since $k = \sigma_B/\pi$, this gives us the relationship between $\rho$ and $k$:

$$\rho = \frac{k^2 \pi}{1 - \pi}$$

For quantitative data, $\rho$ is defined as:

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2}$$

where $\sigma_B^2$ is the between-cluster variance and $\sigma_W^2$ is the within-cluster variance of the outcome of
interest.

The intra-cluster correlation coefficient is not defined for person-years (rates) data.
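As a rough numerical check of these definitions, the sketch below converts a coefficient of variation k into an intra-cluster correlation coefficient for a binary outcome and confirms that the two routes (via the between-cluster variance, or directly via k) agree. The values k = 0.25 and an overall proportion of 0.05 are hypothetical, as are the function names.

```python
def icc_binary_from_variance(var_between: float, pi: float) -> float:
    """ICC for a binary outcome: between-cluster variance / (pi * (1 - pi))."""
    return var_between / (pi * (1 - pi))


def icc_binary_from_k(k: float, pi: float) -> float:
    """ICC for a binary outcome expressed through k = SD / mean."""
    return k ** 2 * pi / (1 - pi)


def icc_quantitative(var_between: float, var_within: float) -> float:
    """ICC for a quantitative outcome: share of total variance between clusters."""
    return var_between / (var_between + var_within)


# Hypothetical values, for illustration only.
k, pi = 0.25, 0.05
sigma_b = k * pi                      # SD of true cluster-level proportions
rho_from_k = icc_binary_from_k(k, pi)
rho_from_var = icc_binary_from_variance(sigma_b ** 2, pi)

print(f"rho via k: {rho_from_k:.5f}, rho via variance: {rho_from_var:.5f}")
```

Note how a seemingly large k can correspond to a very small intra-cluster correlation coefficient when the outcome is rare.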

Section 5: Design issues


One of the most important issues when designing any trial is the required sample size. In
individually randomised trials, one needs to calculate the required number of individuals in each
arm of the study. In a CRT, one needs to take into account both the number of clusters AND the
number of individuals per cluster in each study arm. This introduces an additional degree of
complexity.
Do you think a trial with one cluster per arm could be valid?
Interaction: Button: clouds picture (text appears in box):
No, this is because we must have some replication in order to measure and take account of the
degree of between-cluster variation (or intra-cluster correlation). A trial which compares one
intervention community with one control community is very similar to a clinical trial with one
patient in each treatment arm!
On the following page we provide formulae for the number of clusters required for an unmatched
design with only two arms.

5.1: Design issues


For rates,

$$c = 1 + f\,\frac{(\lambda_0 + \lambda_1)/y + k^2(\lambda_0^2 + \lambda_1^2)}{(\lambda_0 - \lambda_1)^2}$$

c is the number of clusters per treatment arm
$\lambda_1$ is the expected rate in the intervention arm
$\lambda_0$ is the expected rate in the control arm
y is the number of person-years of follow-up in each cluster (assumed equal in all clusters)

For proportions,

$$c = 1 + f\,\frac{\pi_0(1 - \pi_0)/m + \pi_1(1 - \pi_1)/m + k^2(\pi_0^2 + \pi_1^2)}{(\pi_0 - \pi_1)^2}$$

c is the number of clusters per treatment arm
$\pi_1$ is the expected proportion in the intervention arm
$\pi_0$ is the expected proportion in the control arm
m is the number of individuals in each cluster (assumed equal in all clusters)

In both cases, f is a factor depending on type I error and the required study power: f = 7.84 for 5%
type I error and 80% power, and f = 10.50 for 5% type I error and 90% power; and k is the coefficient
of variation in the (true) rates or proportions between clusters in each treatment arm (see Section 4.3).
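The two formulae can be written as small functions, as in the sketch below. The function and argument names are illustrative, and the inputs used for the proportions example (6% versus 3%, 100 individuals per cluster, k = 0.25) are hypothetical values, not taken from any of the trials in this session.

```python
import math


def clusters_for_rates(rate0, rate1, y, k, f=7.84):
    """Clusters per arm for an unmatched two-arm CRT comparing rates."""
    within = (rate0 + rate1) / y                    # Poisson variation within clusters
    between = k ** 2 * (rate0 ** 2 + rate1 ** 2)    # variation between clusters
    return 1 + f * (within + between) / (rate0 - rate1) ** 2


def clusters_for_proportions(p0, p1, m, k, f=7.84):
    """Clusters per arm for an unmatched two-arm CRT comparing proportions."""
    within = p0 * (1 - p0) / m + p1 * (1 - p1) / m  # binomial variation within clusters
    between = k ** 2 * (p0 ** 2 + p1 ** 2)          # variation between clusters
    return 1 + f * (within + between) / (p0 - p1) ** 2


# Hypothetical inputs: 80% power (f = 7.84), 5% type I error.
c_prop = clusters_for_proportions(p0=0.06, p1=0.03, m=100, k=0.25)
print(f"clusters per arm: {c_prop:.1f}, rounded up to {math.ceil(c_prop)}")
```

In practice the result is rounded up to a whole number of clusters per arm.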

5.2: Design issues


As an example, in the Mwanza HIV prevention trial, HIV incidence in the control arm was expected
to be around 1% per year. It was desired to have 80% power of detecting a halving of incidence in
the intervention arm. With a sample of 1000 adults followed for 2 years in each community, what are
the values of $\lambda_0$, $\lambda_1$, y and f?
Interaction: Button: clouds picture (pop up box appears):
$\lambda_0$ = 0.01, $\lambda_1$ = 0.01/2 = 0.005, y = 2 × 1000 = 2000, f = 7.84 (for 5% type I error and 80% power, from
the previous page).
The value of k was assumed to be 0.25. Since SD = k × mean, this would correspond to an SD of
0.0025 in the control arm. In other words, incidence rates in control communities were assumed to
vary from 0.5% to 1.5% (mean ± 2SD), while rates in intervention communities were assumed to
vary from 0.25% to 0.75%.
Assuming an unpaired design, we need six communities per arm.
Interaction: Button: Explanation (pop up box appears):
Applying the equation for rates on the previous page we require:

$$c = 1 + 7.84\,\frac{0.015/2000 + 0.25^2(0.01^2 + 0.005^2)}{0.005^2} = 5.8$$

or about six communities in each arm.
Note that the CRT design implies 6 × 2,000 = 12,000 person-years of follow-up in each arm. What
would this be in an individually randomised trial with these assumptions?
Interaction: Button: clouds picture (pop up box appears):
Using the standard equation $y = f(\lambda_0 + \lambda_1)/(\lambda_0 - \lambda_1)^2$, we get $y = 7.84 \times 0.015 / 0.005^2$ = 4,704
person-years in each arm. This gives a design effect of 12,000 / 4,704 = 2.6.
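The comparison with an individually randomised trial can be reproduced with a few lines of arithmetic. This is a sketch using the Mwanza planning values quoted above; the function name is illustrative.

```python
def person_years_individual(rate0, rate1, f=7.84):
    """Person-years per arm for an individually randomised trial comparing rates."""
    return f * (rate0 + rate1) / (rate0 - rate1) ** 2


rate0, rate1 = 0.01, 0.005          # control and intervention HIV incidence per year
y_individual = person_years_individual(rate0, rate1)

# CRT: six communities per arm, each contributing 1000 adults x 2 years.
y_crt = 6 * 1000 * 2
deff = y_crt / y_individual

print(f"individually randomised: {y_individual:.0f} person-years per arm")
print(f"cluster randomised:      {y_crt} person-years per arm")
print(f"design effect:           {deff:.1f}")
```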

5.3: Design issues


Sometimes we can derive estimates of k, the coefficient of variation, from other studies, but often
we have to make some assumptions about this and then examine the sensitivity of the calculated
sample size to this.
Using the previous example, how many clusters would we require if k=0.15 and if k=0.35?
Interaction: Button: clouds picture (pop up box appears):
k       c
0.15    4.2
0.25    5.8
0.35    8.2

In this example, we can see that the number of clusters required is quite sensitive to k. We can
also see that as k increases, the between cluster variation or intra-cluster correlation is
increasing and so the number of clusters required increases.
We can see that it would be helpful to get estimates of k for HIV incidence in these communities
while planning the study, but of course this may not be possible.
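A sensitivity analysis of this kind is easy to script. The sketch below re-applies the rates formula from Section 5.1 to the Mwanza planning values for the three values of k considered above; the function name is illustrative.

```python
def clusters_for_rates(rate0, rate1, y, k, f=7.84):
    """Clusters per arm for an unmatched two-arm CRT comparing rates."""
    within = (rate0 + rate1) / y
    between = k ** 2 * (rate0 ** 2 + rate1 ** 2)
    return 1 + f * (within + between) / (rate0 - rate1) ** 2


# Mwanza planning values: 1% vs 0.5% per year, 2000 person-years per community.
results = {}
for k in (0.15, 0.25, 0.35):
    c = clusters_for_rates(rate0=0.01, rate1=0.005, y=2000, k=k)
    results[k] = round(c, 1)
    print(f"k = {k:.2f}: c = {c:.1f}")
```

Because k enters the formula as k squared, the required number of clusters rises faster than linearly as k increases.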

5.4: Design issues


Other CRT designs
In an individually-randomised trial, in which hundreds or thousands of individuals are randomised to
the various treatment arms, the randomisation process can usually be relied upon to produce
treatment arms that are very similar with respect to (known and unknown) risk factors for the
outcome.
By contrast, cluster-randomised trials often involve randomisation of only a few clusters, and so
substantial imbalances between the treatment and control arms, in terms of potentially confounding
variables, can occur more easily.
The number of imbalances that can be simultaneously adjusted for is often limited, due to the loss of
degrees of freedom. Hence, the results may be more credible if there is baseline similarity of the two
arms. One method of achieving this is the matched design.
The simplest approach is the pair-matched design, in which all available clusters are matched in
pairs according to variables thought to be correlated with the primary outcome. Then one cluster in
each pair is randomly allocated to the intervention arm, and the other to the control arm.

5.5: Design issues


The Mwanza HIV prevention trial is a good example of a pair-matched CRT.
There were no data available on HIV incidence in the 12 communities before starting the study.
However, baseline data were available on a few factors expected to be correlated with HIV
incidence, the primary outcome measure. These included type of community (roadside, rural,
islands), geographical location and prior STD rates in community health units.
Hence, communities were pair-matched on these factors.
If initial HIV prevalence data were available, matching on these might be an alternative, although
prevalence and incidence of an infection are not necessarily highly correlated, especially in an
epidemic situation.

5.6: Design issues

Note that with a pair-matched design, the sample size equations given should be amended to add 2
rather than 1 to the right-hand side. However, k now represents the between-cluster variation within
matched pairs, and if the matching is effective this may be much smaller than the unmatched k so
that the required sample size is reduced.

5.7: Design issues


Pair-matching is used most commonly when we have a small number of clusters, but is only
worthwhile if we match on factors that are strongly correlated with the outcome. Recent
methodological work has suggested that stratification may often be preferable to pair-matching. The
clusters are matched into strata (with more than two clusters in each stratum), and then randomly
allocated to the treatment arms within strata.
However, if we want balance on several variables and have a small number of clusters it will be
impossible to have many strata. In such situations, restricted (or constrained) randomisation is used.
Before randomly choosing an allocation of the clusters to the study arms, the possible allocations are
restricted to those that meet pre-defined criteria. Examples of such criteria may include: no more
than 10% difference by arm in baseline covariates or similar numbers of urban versus rural clusters
in each arm. In practice, stratification and restricted randomisation are often combined to improve
balance.
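Restricted randomisation can be sketched as a two-step procedure: list every possible allocation, keep only those meeting the balance criterion, then choose one of the remaining allocations at random. The example below is a minimal illustration under stated assumptions: the eight cluster labels and their baseline values are invented, a single covariate is balanced, and the criterion is a relative difference in arm means of no more than 10%.

```python
import itertools
import math
import random
from statistics import mean

# Hypothetical baseline covariate (e.g. prevalence) for eight clusters.
baseline = {"A": 0.10, "B": 0.14, "C": 0.08, "D": 0.20,
            "E": 0.12, "F": 0.18, "G": 0.09, "H": 0.15}


def relative_difference(arm1, arm0):
    """Absolute difference in arm means, relative to the average of the two means."""
    m1 = mean(baseline[c] for c in arm1)
    m0 = mean(baseline[c] for c in arm0)
    return abs(m1 - m0) / ((m1 + m0) / 2)


def acceptable_allocations(max_rel_diff=0.10):
    """All equal-sized allocations whose arms differ by at most max_rel_diff."""
    clusters = sorted(baseline)
    keep = []
    for arm1 in itertools.combinations(clusters, len(clusters) // 2):
        arm0 = tuple(c for c in clusters if c not in arm1)
        if relative_difference(arm1, arm0) <= max_rel_diff:
            keep.append((arm1, arm0))
    return keep


allocations = acceptable_allocations()
rng = random.Random(2013)            # fixed seed only so the example is reproducible
intervention, control = rng.choice(allocations)

print(f"{len(allocations)} of {math.comb(8, 4)} allocations meet the criterion")
print("intervention:", intervention, "control:", control)
```

If the criteria are made too strict, very few allocations remain and the randomisation becomes close to deterministic, so the number of acceptable allocations is worth checking before the draw.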

5.8: Design issues


CRTs tend to be expensive and logistically complex and so investigators may want to have more
than two arms. This is usually to try out more than one epidemiological approach to disease control
or to examine a more intensive and a less intensive form of one intervention.
For example, the Masaka HIV prevention trial in Uganda consisted of three arms: a control arm, an
arm with an educational intervention and an arm with both the educational intervention and improved
treatment of STDs.
An alternative to this would have been a factorial design, with four arms: control (A), intervention 1
only (B), intervention 2 only (C) and both interventions 1 and 2 (D). To evaluate intervention 1, we
would compare arms B and D with arms A and C; to evaluate intervention 2, we would compare arms
C and D with arms A and B. Assuming independence of the two interventions for their effect on the
outcome (i.e. no statistical interaction in terms of rate or risk or odds ratios), this would essentially
give two CRTs for the price of one.

Section 6: Analysis at the cluster level


There are two main methods of analysis of data from a CRT that are valid.
The first is to avoid the problem of clustering in the individual data by using the proportion or
event rate in each cluster as the observation i.e. to analyse the cluster level summaries.
It may seem that by using the cluster level summaries there is no advantage to having larger
clusters. Why is this not the case?
Interaction: Button: clouds picture (text appears in pop-up):

By having larger clusters, the variability among clusters in the proportions, rates or means is
reduced. Hence, there is more power to detect a difference between arms.

The second approach is to analyse the individual-level data using regression methods that make
allowance for correlated data. These methods were introduced in AS09.
We will now cover cluster-level analysis followed by individual-level analysis.

6.1: Analysis at the cluster level


The estimate of the outcome in each treatment arm can be obtained either as the overall
response of all individuals in the treatment arm, or as the mean of the cluster responses in that
arm.
For example, using the data from the smoking study below, what are the risk ratios using the
overall proportions and using the means of the cluster proportions? How do they differ?
Interaction: Button: clouds picture (pop-up box appears):
The risk ratio using overall proportions is 0.0433/0.0615 = 0.704
The mean of the school-level proportions from the intervention arm is (0 + 0.012 + ... + 0.052) /
12 = 0.0387 and from the control arm is (0.049 + 0.017 + ... + 0.058) / 12 = 0.0595. Hence the
risk ratio using the means of cluster proportions is 0.0387/0.0595 = 0.650.
The risk ratios are similar, but the risk ratio from the cluster means is more extreme (further from
1).

The proportion of children in each school (and overall) who reported tobacco use after 2 years in
the Smoke-free generation intervention trial:

Intervention   Proportion     Control     Proportion
0/42           0              5/103       0.049
1/84           0.012          3/174       0.017
9/149          0.060          6/83        0.072
11/136         0.081          6/75        0.080
4/58           0.069          2/152       0.013
1/55           0.018          7/102       0.069
10/219         0.046          7/104       0.067
4/160          0.025          3/74        0.041
2/63           0.032          1/55        0.018
5/85           0.059          23/225      0.102
1/96           0.010          16/125      0.128
10/194         0.052          12/207      0.058

Overall:
58/1341        0.0433         91/1479     0.0615
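The two estimates of the risk ratio can be checked directly from the counts in the table above. The sketch below uses unrounded proportions, so the third decimal place differs slightly from the figures quoted in the text, which were computed from rounded values.

```python
from statistics import mean

# (smokers, children surveyed) per school, taken from the table above.
intervention = [(0, 42), (1, 84), (9, 149), (11, 136), (4, 58), (1, 55),
                (10, 219), (4, 160), (2, 63), (5, 85), (1, 96), (10, 194)]
control = [(5, 103), (3, 174), (6, 83), (6, 75), (2, 152), (7, 102),
           (7, 104), (3, 74), (1, 55), (23, 225), (16, 125), (12, 207)]


def overall_proportion(arm):
    """Pool all children in the arm: each individual gets equal weight."""
    return sum(d for d, _ in arm) / sum(n for _, n in arm)


def mean_cluster_proportion(arm):
    """Average the school-level proportions: each cluster gets equal weight."""
    return mean(d / n for d, n in arm)


rr_overall = overall_proportion(intervention) / overall_proportion(control)
rr_cluster = mean_cluster_proportion(intervention) / mean_cluster_proportion(control)

print(f"RR from overall proportions:       {rr_overall:.3f}")
print(f"RR from means of cluster summaries: {rr_cluster:.3f}")
```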

6.2: Analysis at the cluster level

The main arguments for using the overall measure (of rates, risks or means) are:
(i) ease of estimation;
(ii) if each treatment arm is a random cluster sample from the population of interest, then the
overall measure gives consistent estimates of the population measure; and
(iii) clusters are weighted according to their size, giving equal weight to each individual.
The main arguments for using the means of the cluster measure (of rates, risks or means) are
(i) this gives a better tie-in with simple inferential procedures based on the t-test (see later);
(ii) the clusters do not always form a random cluster sample from a well-defined population - for
example, the clusters (communities) may have been arbitrarily selected and then randomly
allocated to the two treatment arms; and
(iii) equal weight is given to each cluster.

6.3: Analysis at the cluster level


Cluster-level analysis starts with calculating these cluster level summaries as rates, proportions
or means and then analysing these using standard methods.
For unmatched CRTs, cluster summaries can be analysed with the unpaired t-test.
Although it is based on assumptions of normality and equal variances of the cluster-level summaries
that are not strictly satisfied, the t-test is remarkably robust to departures from these assumptions.
Alternatively, we may use non-parametric tests to avoid these assumptions.
Enter the 24 (12x2) proportions into Stata and then use the unpaired t-test on the cluster level
data from the smoking study. What conclusion do you make?
Interaction: Button: clouds picture (pop-up box appears):
We obtain a t statistic of 1.66 on 22 degrees of freedom, giving a p-value of 0.112.
Hence, there is only weak evidence for a difference in risks between the two arms.
The Wilcoxon rank sum test is a non-parametric equivalent of the unpaired t-test. This gave a p-value of 0.16, which is similar to the result from the unpaired t-test, but with fewer assumptions.
Note that the Wilcoxon test requires at least four clusters per arm for p<0.05 to be attainable.
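For readers without Stata to hand, the t statistic can be reproduced with the Python standard library. This is only a sketch: it computes the pooled-variance t statistic on the 24 school-level proportions and compares it with 2.074, the two-sided 5% critical value of the t distribution on 22 degrees of freedom, rather than computing an exact p-value. The second part shows why four clusters per arm is the minimum for the Wilcoxon rank sum test, using the smallest two-sided p-value attainable from an exact permutation argument with no ties.

```python
import math
from statistics import mean, variance

intervention = [0/42, 1/84, 9/149, 11/136, 4/58, 1/55,
                10/219, 4/160, 2/63, 5/85, 1/96, 10/194]
control = [5/103, 3/174, 6/83, 6/75, 2/152, 7/102,
           7/104, 3/74, 1/55, 23/225, 16/125, 12/207]


def unpaired_t(x, y):
    """Pooled-variance t statistic for mean(y) - mean(x), with its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(y) - mean(x)) / se, df


t_stat, df = unpaired_t(intervention, control)
T_CRIT_22 = 2.074   # two-sided 5% critical value of t on 22 degrees of freedom
print(f"t = {t_stat:.2f} on {df} df; exceeds critical value: {abs(t_stat) > T_CRIT_22}")


def min_wilcoxon_p(c):
    """Smallest two-sided exact p-value with c clusters per arm and no ties."""
    return 2 / math.comb(2 * c, c)


for c in (3, 4):
    print(f"{c} clusters per arm: minimum p = {min_wilcoxon_p(c):.3f}")
```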

6.4: Analysis at the cluster level


Confidence interval for the rate or risk ratio
We would also like a confidence interval for the RR that takes the clustering into account. There
are several approaches to obtaining an approximate CI, but we only give one simple method here.
Firstly, we work with log(RR) = log(r1/r0) = log(r1) - log(r0). An approximate formula for the
variance of log(RR) is given by:
Var(log RR) = Var(log r1) + Var(log r0) ≈ Var(r1)/r1^2 + Var(r0)/r0^2

Note this formula is the same whether r0 and r1 represent risks or rates.

Then Var(r1) can be estimated approximately as s1^2/c1, where s1 is the SD of the observed cluster
risks in the intervention arm, and c1 is the number of clusters in that arm, and similarly for the
control arm.
Calculate the 95% CI for the risk ratio using overall risks (i.e. RR = 0.0433/0.0615 = 0.704).
Using the given formula we obtain:
Var(log RR) ≈ (0.026^2/12)/0.043^2 + (0.035^2/12)/0.062^2 = 0.0571
Then taking log(RR) ± 1.96 × √Var(log RR), we obtain an approximate 95% CI for log(RR):
-0.351 ± 1.96 × 0.239, or -0.819 to 0.117.
Exponentiating these values, we obtain the corresponding CI for the RR of 0.70 as 0.44 to 1.12.
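The calculation above can be sketched in Python. The function name and argument names are illustrative; the inputs are the overall risks and the SDs of the cluster risks quoted in the text, so small rounding differences from the hand calculation are to be expected.

```python
from math import exp, log, sqrt

def ratio_ci(r1, r0, s1, s0, c1, c0, z=1.96):
    """Approximate CI for a risk or rate ratio from cluster-level summaries."""
    # Var(log RR) is approximated by Var(r1)/r1^2 + Var(r0)/r0^2,
    # with Var(r) estimated as s^2/c in each arm.
    var_log_rr = (s1**2 / c1) / r1**2 + (s0**2 / c0) / r0**2
    half_width = z * sqrt(var_log_rr)
    log_rr = log(r1 / r0)
    # Build the interval on the log scale, then exponentiate.
    return exp(log_rr), exp(log_rr - half_width), exp(log_rr + half_width)

rr, lower, upper = ratio_ci(r1=0.0433, r0=0.0615, s1=0.026, s0=0.035, c1=12, c0=12)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

This prints RR = 0.70, 95% CI 0.44 to 1.12, in agreement with the result above.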

6.5: Analysis at the cluster level


Now we can apply these methods to a second example using data from the Ghana bednet study.
This measured all-cause child mortality over a 2 year period in 96 clusters (48 intervention and
48 control) to determine the impact of distributing insecticide-treated bednets.
The overall mortality rate (per 1,000 person years) was 27.95 in the control arm and 23.51 in the
intervention arm. What was the rate difference (intervention minus control) and the rate ratio
using overall rates?
Rate difference = 23.51 - 27.95 = -4.4 per 1,000 person years
Rate ratio = 23.51/27.95 = 0.84

6.6: Analysis at the cluster level


If we were to use the t test on the cluster level rates we would get the following results.
What are the rate difference and the rate ratio using cluster level rates?
Rate difference = -3.95 per 1,000 person years (the diff line gives control minus intervention, 3.95, so the sign is reversed).
Rate ratio = 23.97/27.92 = 0.86
As in the school-level smoking-free intervention trial, the rate differences and rate ratios from
overall rates and cluster rates are similar.

t-test of the cluster level mortality rates from the Ghana bednet study (0=control; 1=intervention).
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      48    27.92242    1.840102     12.7486    24.22061    31.62423
       1 |      48    23.97155    1.404359    9.729682    21.14635    26.79676
---------+--------------------------------------------------------------------
combined |      96    25.94699    1.168985    11.45367    23.62626    28.26772
---------+--------------------------------------------------------------------
    diff |            3.950866    2.314778               -.6451805    8.546912
------------------------------------------------------------------------------
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9544         Pr(|T| > |t|) = 0.0912          Pr(T > t) = 0.0456
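The diff line of this output can be reproduced from the group summaries alone. The following Python sketch computes the equal-variance (pooled) unpaired t statistic; the function name is illustrative. The p-value of 0.0912 also needs the t distribution on 94 degrees of freedom, which is not computed here.

```python
from math import sqrt

def pooled_t(mean0, sd0, n0, mean1, sd1, n1):
    """Unpaired equal-variance t statistic from group summaries (group 0 minus group 1)."""
    df = n0 + n1 - 2
    # Pooled variance: weighted average of the two within-arm variances.
    pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / df
    se = sqrt(pooled_var * (1 / n0 + 1 / n1))
    diff = mean0 - mean1
    return diff, se, diff / se, df

# Cluster-level mortality rates: control (0) and intervention (1) arms.
diff, se, t, df = pooled_t(27.92242, 12.7486, 48, 23.97155, 9.729682, 48)
print(f"diff = {diff:.3f}, SE = {se:.3f}, t = {t:.2f} on {df} df")
```

Note that each cluster contributes one observation, so the degrees of freedom depend on the number of clusters (96), not on the number of children.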

6.7: Analysis at the cluster level


From the t test results, the p-value from the 2-sided test was 0.091, showing weak evidence for
an effect of the intervention.
To calculate the confidence interval, the approximate formula for the variance of the log rate ratio
is the same as for the variance of the log risk ratio i.e.
Var(log RR) = Var(log r1) + Var(log r0) ≈ Var(r1)/r1^2 + Var(r0)/r0^2
Then Var(r1) can be estimated approximately as s1^2/c1, where s1 is the SD of the observed cluster
rates in the intervention arm, and c1 is the number of clusters in that arm, and similarly for the
control arm.
Calculate the approximate confidence interval for the rate ratio using cluster rates.
From the output in the table on the previous page, we know that s1 and s0 are 9.730 and 12.749
respectively. Hence Var(log RR) ≈ (9.730^2/48)/23.972^2 + (12.749^2/48)/27.922^2 = 0.00778, and
we obtain an approximate 95% CI for log(RR):
log(23.972/27.922) ± 1.96 × √0.00778, or -0.325 to 0.020.
Exponentiating, we obtain the CI for the RR of 0.86 as 0.72 to 1.02.

Section 7: Adjusting for confounders at cluster level


In individually-randomised trials, adjustment for covariates is often unnecessary because the
random allocation of a large number of individuals should ensure that the treatment arms are
similar with regard to risk factors for the disease outcome.
In contrast, the number of clusters in CRTs is often small, and important imbalances may
therefore arise by chance. For this reason, adjustment for covariates is much more important in
such trials.
Covariates may be measured at the cluster level (e.g. distance of village from nearest health
centre) or the individual level (e.g. age, ethnic group). They may also include baseline
prevalences of the outcome of interest. For example, in the Mwanza HIV prevention trial, the
effect on HIV incidence was adjusted for the baseline HIV prevalence in each community.

7.1: Adjusting for confounders at cluster level


There are various methods to adjust for covariates in a cluster level analysis and here we
present a simple two-stage approach. In essence, a regression model is fitted to the individual-level data, ignoring the study arms and clusters, to obtain a residual for each cluster, adjusted for
the covariates. Then these are analysed in place of the observed measures. The details of this
are given on the next page.

7.2: Adjusting for confounders at cluster level


Firstly, we fit an ordinary regression model to the data, including the covariates, but ignoring
study arms and clusters.
Using this regression equation, we calculate the expected rate, proportion or mean in each
cluster, based on the distribution of individual covariates in that cluster.
Then we calculate a residual for each cluster. For example, if we are interested in the rate ratio,
the residual is the ratio of the observed to the expected rate, whereas if we want a rate
difference, the residual is the difference between the observed and the expected rate.
Finally, these cluster-level residuals are analysed in place of the cluster-level rates, proportions
or means. If the intervention had no effect then the mean and distribution of the cluster level
residuals will be similar in the two study arms, whereas a difference in the mean and distribution
of the residuals between the two arms would imply an intervention effect.

7.3: Adjusting for confounders at cluster level


Consider again the Ghana bednet study; the unadjusted analysis gave an estimated rate ratio of
RR = 0.86 (95% CI 0.72 to 1.02, p = 0.091), using the cluster level rates. Mortality was related to
age and there was a slight age imbalance between arms. A further analysis adjusting for age and
sex was carried out. We will follow this on the next pages.

7.4: Adjusting for confounders at cluster level


Firstly a Poisson regression model is fitted for mortality by age group and sex.
Then the expected number of events (deaths) for each individual is calculated as their person
time multiplied by their mortality rate, calculated from the regression equation, given their age
group and sex.
These expected numbers of events for each individual are then summed in each cluster, to give
the total expected number of deaths per cluster (assuming no effect of the intervention, i.e. under
the null hypothesis that the death rate is the same in the intervention and control arms).

Finally, by dividing the observed number of deaths by the expected number, in each cluster, we
obtain the cluster level residuals. Would it matter here if we used the ratio of observed to
expected rates instead of the ratio of observed to expected deaths?

No it would not have mattered. We could have calculated the observed and expected rate in each
cluster as the observed or expected deaths divided by total person time in that cluster. However,
as the person time in the cluster is the same for both the observed and the expected rate, the
ratio of observed to expected rates is the same as the ratio of observed to expected deaths.
However, it does matter if we want the rate difference, when we would have to use the difference
in rates, not the difference in deaths.
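The steps on this page can be sketched in Python. This is a simplified illustration rather than the analysis used in the Ghana study: the records are hypothetical, and stage 1 uses the pooled rate in each covariate stratum in place of a fitted Poisson regression (the two coincide only when the model has a separate parameter for every stratum).

```python
from collections import defaultdict

# Hypothetical records: (cluster, arm, stratum, person_years, deaths).
records = [
    ("A", 1, "young", 100.0, 4),
    ("A", 1, "old", 100.0, 1),
    ("B", 0, "young", 300.0, 16),
    ("B", 0, "old", 100.0, 3),
]

def cluster_residuals(rows):
    """Ratio of observed to expected deaths in each cluster."""
    # Stage 1: mortality rate in each stratum, ignoring arm and cluster.
    deaths, pyrs = defaultdict(float), defaultdict(float)
    for _, _, stratum, py, d in rows:
        deaths[stratum] += d
        pyrs[stratum] += py
    rate = {s: deaths[s] / pyrs[s] for s in pyrs}
    # Stage 2: sum observed and expected deaths within each cluster.
    observed, expected = defaultdict(float), defaultdict(float)
    for cluster, _, stratum, py, d in rows:
        observed[cluster] += d
        expected[cluster] += py * rate[stratum]
    return {c: observed[c] / expected[c] for c in observed}

print(cluster_residuals(records))
```

Here the stratum rates are 0.05 (young) and 0.02 (old), so cluster A has 5 observed against 7 expected deaths and cluster B has 19 observed against 17 expected. The arm is deliberately unused, because the expected deaths are calculated under the null hypothesis.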

7.5: Adjusting for confounders at cluster level


In the table below we show observed and expected deaths for each cluster, together with the
residual (observed/expected) for 10 of the 96 clusters in this study.
As the residual is the ratio of the observed to the expected deaths, a residual less than one
implies there were fewer deaths than predicted, given the age and sex distribution of the cluster,
while a residual greater than one implies more deaths than predicted.

Cluster   Arm   Observed deaths   Expected deaths   Residual
   1       1          12                6.44           1.86
   2       0          11                6.11           1.80
   3       0           6                6.08           0.99
   4       0          12                6.37           1.88
   5       1           9                8.73           1.03
   6       0           9               10.40           0.87
   7       0           7                7.86           0.89
   8       1          10                9.53           1.05
   9       1           8               10.65           0.75
  10       1           9                7.51           1.20
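As a quick check, the residual column can be recomputed from the observed and expected deaths, here in Python with the values copied from the table for these ten clusters.

```python
# Observed and expected deaths for clusters 1 to 10, as listed in the table.
observed = [12, 11, 6, 12, 9, 9, 7, 10, 8, 9]
expected = [6.44, 6.11, 6.08, 6.37, 8.73, 10.40, 7.86, 9.53, 10.65, 7.51]

# Residual = observed / expected, rounded to two decimal places as in the table.
residuals = [round(o / e, 2) for o, e in zip(observed, expected)]
print(residuals)
```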

7.6: Adjusting for confounders at cluster level


These results show a t test on the cluster level residuals for the Ghana bednet study.
How do you interpret the mean of the cluster level residuals for the intervention arm?
As the residuals are the ratio of the observed to expected deaths, a mean of the cluster level
residuals of 0.920 implies fewer deaths in the intervention arm than would be expected under the
null hypothesis of no difference between the treatment and control arms, given the
distributions of age group and sex in that arm.
The adjusted rate ratio is 0.920/1.091 = 0.84 with a p-value of 0.054, suggesting stronger
evidence for an effect after adjustment for imbalances in age group and sex between the arms.
t-test of the cluster level residuals from the Ghana bednet study (0=control; 1=intervention).
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      48    1.090595    .0710939    .4925527    .9475728    1.233618
       1 |      48    .9203973    .0502305    .3480069    .8193467    1.021448
---------+--------------------------------------------------------------------
combined |      96    1.005496    .0441661    .4327381    .9178154    1.093177
---------+--------------------------------------------------------------------
    diff |            .1701979    .0870485               -.0026389    .3430346
------------------------------------------------------------------------------
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9732         Pr(|T| > |t|) = 0.0535          Pr(T > t) = 0.0268

7.7: Adjusting for confounders at cluster level


To calculate a confidence interval for the adjusted rate ratio, we use the same equation as
before, but with the residuals, rather than observed rates.
Hence Var(log RR) ≈ (0.348^2/48)/0.920^2 + (0.493^2/48)/1.091^2 = 0.00723
So, we obtain an approximate 95% CI for log(RR):
log(0.920/1.091) ± 1.96 × √0.00723, or -0.337 to -0.004.
Exponentiating, we obtain the CI for the RR of 0.84 as 0.71 to 1.00.
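To compare the unadjusted and adjusted results side by side, the same variance formula can be applied to both sets of cluster-level summaries from the Ghana bednet study. In this Python sketch the function and dictionary names are illustrative; the means and SDs are taken from the two t-test outputs, with 48 clusters in each arm.

```python
from math import exp, log, sqrt

def ratio_ci(m1, m0, s1, s0, c1, c0, z=1.96):
    """Ratio of arm means with an approximate CI built on the log scale."""
    var_log = (s1**2 / c1) / m1**2 + (s0**2 / c0) / m0**2
    centre, half = log(m1 / m0), z * sqrt(var_log)
    return exp(centre), exp(centre - half), exp(centre + half)

# (intervention mean, control mean, intervention SD, control SD)
analyses = {
    "unadjusted (cluster rates)": (23.97155, 27.92242, 9.729682, 12.7486),
    "adjusted (O/E residuals)": (0.9203973, 1.090595, 0.3480069, 0.4925527),
}

results = {}
for label, (m1, m0, s1, s0) in analyses.items():
    results[label] = ratio_ci(m1, m0, s1, s0, c1=48, c0=48)
    rr, lower, upper = results[label]
    print(f"{label}: RR = {rr:.2f} ({lower:.2f} to {upper:.2f})")
```

The unadjusted line gives 0.86 (0.72 to 1.02) and the adjusted line 0.84 (0.71 to 1.00), matching the hand calculations.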

Section 8: Analysis at the individual level


Methods based on cluster-level summaries are highly robust but have a number of
disadvantages. First, they are not the most efficient use of the data, as equal weight is given to
each cluster even though cluster size may vary substantially. Second, adjustment for individual
level covariates requires a two-stage procedure using residuals and this is rather inconvenient.
Third, we may want to analyse the effects of these individual covariates alongside the intervention
effects. Note that if all covariates are at the cluster level the t-test can be replaced by ordinary
regression techniques.
For all these reasons, we may prefer to analyse individual-level data from the trial using a
regression method that allows for the correlations induced by the cluster randomised design.
This includes any of the methods presented in AS09 - robust standard errors, generalised
estimating equations (GEE) and random effects models.

8.1: Analysis at the individual level

Returning to the Ghana bednet trial, if we ignore the clustered design and fit a simple Poisson
regression model to the data, we obtain an unadjusted rate ratio for the intervention effect of 0.84
(95% CI: 0.74 to 0.96) with a P-value of 0.012.
Note however that this analysis is invalid as it does not take account of intra-cluster correlation.

8.2: Analysis at the individual level


Now a random effects Poisson model is fitted that takes account of the intra-cluster correlation.
The point estimate of the RR is almost unchanged at 0.85, but after taking account of the
clustered design the CI is now wider (0.72 to 1.00) and the P-value of 0.054 is larger.
The fitted model also indicates strong evidence of clustering (P < 0.001) as the LRT of alpha=0
tests the null hypothesis that the variance of the random effects is zero i.e. no between-cluster
variation.
Random-effects Poisson regression               Number of obs      =     26342
Group variable: cluster                         Number of groups   =        96

------------------------------------------------------------------------------
     outcome |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ibednet_1 |   .8472471   .0729997    -1.92   0.054     .7155989    1.003114
      follyr | (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -2.764239   .4069338                     -3.561814   -1.966663
-------------+----------------------------------------------------------------
       alpha |   .0630241   .0256466                      .0283873     .139923
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) =    11.82 Prob>=chibar2 = 0.000
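The z statistic, P-value and confidence interval in this output are calculated on the log scale. The Python sketch below recovers them from the reported IRR and its standard error, assuming that the standard error is given on the ratio scale so that SE(log IRR) = SE/IRR. The function name is illustrative.

```python
from math import exp, log
from statistics import NormalDist

def irr_inference(irr, se_irr, level=0.95):
    """Recover z, two-sided P-value and CI from a reported IRR and its SE."""
    # Assumption: the SE is on the ratio scale, so SE(log IRR) = SE / IRR.
    se_log = se_irr / irr
    z = log(irr) / se_log
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # The CI is symmetric on the log scale and is then exponentiated.
    return z, p, exp(log(irr) - crit * se_log), exp(log(irr) + crit * se_log)

z, p, lower, upper = irr_inference(0.8472471, 0.0729997)
print(f"z = {z:.2f}, P = {p:.3f}, 95% CI {lower:.4f} to {upper:.4f}")
```

The interval is therefore not symmetric about 0.85 on the ratio scale, which is why the upper limit just exceeds 1 even though the P-value is close to 0.05.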

8.3: Analysis at the individual level


Now a random effects Poisson model is fitted, adjusting for age group and sex.
Again, this shows that adjusting for age group and sex strengthened the evidence for an
association.
In contrast to the cluster level analysis, this allows us to examine the effects of the
intervention and the covariates simultaneously.
Random-effects Poisson regression               Number of obs      =     26342
Group variable: cluster                         Number of groups   =        96

------------------------------------------------------------------------------
     outcome |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ibednet_1 |   .8374889    .070685    -2.10   0.036     .7098012    .9881467
   _Iagegp_1 |   .3969492   .0383497    -9.56   0.000     .3284728    .4797008
   _Iagegp_2 |   .2990543   .0308447   -11.70   0.000     .2443186    .3660527
   _Iagegp_3 |   .1715907   .0256085   -11.81   0.000     .1280734    .2298946
   _Iagegp_4 |   .1827192   .0416134    -7.46   0.000     .1169304    .2855229
     _Isex_1 |   .9422119   .0644793    -0.87   0.384     .8239436    1.077456
      follyr | (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -2.881458   .4360814                     -3.736162   -2.026755
-------------+----------------------------------------------------------------
       alpha |    .056053   .0244437                      .0238454    .1317625
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) =     9.80 Prob>=chibar2 = 0.001
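The alpha rows of the two outputs can be related to between-cluster variation. The Python sketch below rests on an assumption that is not stated in this session: that the cluster random effects are gamma distributed with mean 1 and variance alpha. Under that assumption the coefficient of variation of the true cluster rates is the square root of alpha. The function names are illustrative.

```python
from math import exp, sqrt

def alpha_from_log(ln_alpha: float) -> float:
    """Back-transform the /lnalpha row to the alpha row."""
    return exp(ln_alpha)

def between_cluster_cv(alpha: float) -> float:
    """CV of cluster rates, assuming random effects with mean 1 and variance alpha."""
    return sqrt(alpha)

# (/lnalpha, alpha) as reported for the unadjusted and adjusted models.
models = {"unadjusted": (-2.764239, 0.0630241), "adjusted": (-2.881458, 0.056053)}
for name, (ln_alpha, alpha) in models.items():
    print(name, round(alpha_from_log(ln_alpha), 4), round(between_cluster_cv(alpha), 3))
```

On this reading, the between-cluster coefficient of variation is roughly 0.25 before adjustment and 0.24 after adjusting for age group and sex.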

8.4: Analysis at the individual level


Choice of method
Despite their greater convenience and other advantages, the individual-level regression methods
such as random effects models and GEE are not reliable when analysing studies with fewer than
around 30 clusters in total, whereas the simpler methods based on cluster-level summaries are robust
even with small numbers of clusters.
We therefore recommend that cluster-level methods (based on the t-test or corresponding non-parametric methods) are used to analyse trials with fewer than around 15 clusters per treatment
arm. For trials with a larger number of clusters, either cluster-level methods or individual-level
regression (with allowance for correlated data) may be used. In general the findings are very
similar.

Section 9: Summary
This is the end of AS10. When you are happy with the material covered here please move on to
session AS11.
Background to CRTs
Randomised controlled trials are the gold standard for evaluating health interventions. Often it is
more appropriate to randomise groups of people (clusters) to study arms, rather than
individuals.
Possible reasons include: the intervention is naturally delivered at a community level; for logistical
convenience; to reduce mixing between people in different arms; to capture mass effects of an
intervention; or to capture its impact on the infectiousness of infected individuals.
Measuring intra-cluster correlation
Individuals from one cluster may be more alike than individuals from different clusters; this is termed
intra-cluster correlation and is measured by ρ (rho). This happens if and only if there are
differences between clusters, termed between-cluster variation and measured by the
coefficient of variation of cluster-level responses (k). Hence, intra-cluster correlation and
between-cluster variation are two ways of measuring the same thing.
Design issues
Correlated data contain less information than uncorrelated data and so a CRT needs a larger
sample size than an individually randomised trial for the same power. The sample size
calculations are complicated by the need to determine both the number of clusters and the
number of individuals per cluster. In addition, assumptions must be made about the level of
between-cluster variation.
Analysis at the cluster level
Analysis of CRTs must take the clustering into account. There are two main approaches for this:

The first is to analyse the cluster level summaries (usually risks, rates or means) as the unit of
observation with standard methods, such as the t test. These methods are robust even for small
numbers of clusters, but they are inconvenient when adjusting for covariates.
Analysis at the individual level
The second method is to use the individual-level regression methods for correlated data
introduced in AS09, such as GEE and random effects models. These have a number of
advantages, including convenience and taking intra-cluster correlation into account in the
estimated effect of the intervention, but they are not robust with small numbers of clusters. As a
general rule, if there are fewer than 15 clusters per study arm, cluster-level methods are
recommended.
