
Nurse Researcher Quantitative research

Fundamentals of estimating sample size
Cite this article as: Malone HE, Nicholl H, Coyne I (2016) Fundamentals of estimating sample size.
Nurse Researcher. 23, 5, 21-25.

Date of submission: June 1 2015. Date of acceptance: August 27 2015.


Correspondence
helenmalone3@gmail.com

Helen Evelyn Malone PhD, MSc, RNT, SCM, SRN is a research fellow at the School of Nursing & Midwifery
Honor Nicholl PhD, BSc, RCN is assistant professor (nursing) at the School of Nursing & Midwifery
Imelda Coyne PhD, MA, BSc(Hons), Dip N, RSCN, RGN, RNT, FEANS is head of children's nursing and research
All at Trinity College Dublin, Republic of Ireland

Peer review
This article has been subject to double-blind review and has been checked using antiplagiarism software

Author guidelines
journals.rcni.com/r/nr-author-guidelines

Abstract

Background Estimating sample size is an integral requirement in the planning stages of quantitative studies. However, although abundant literature is available that describes techniques for calculating sample size, many are in-depth and have varying degrees of complexity.

Aim To provide an overview of four basic parameters that underpin the determination of sample size and to explain sample-size estimation for three study designs common in nursing research.

Discussion Researchers can estimate basic sample size if they have a comprehension of four parameters: significance level, power, effect size, and standard deviation (for continuous data) or event rate (for dichotomous data). In this paper, these parameters are applied to determine sample size for the following well-established study designs: a comparison of two independent means, the paired mean study design and a comparison of two proportions.

Conclusion An informed choice of parameter values to input into estimates of sample size enables the researcher to derive the minimum sample size required with sufficient power to detect a meaningful effect. An understanding of the parameters provides the foundation from which to generalise to more complex size estimates. It also enables more informed entry of required parameters into sample size software.

Implications for practice Underpinning the concept of evidence-based practice in nursing and midwifery is the application of findings that are statistically sound. Researchers with a good understanding of parameters, such as significance level, power, effect size, standard deviation and event rate, are enabled to calculate an informed sample size estimation and to report more clearly the rationale for applying any particular parameter value in sample size determination.

Keywords effect size, power, quantitative research, sample size, significance level, standard deviation

Introduction
'WHAT SAMPLE size do I require?' is a question frequently asked by novice researchers (Kelly et al 2010). It is unnecessary to recruit the entire population to a study: a sufficiently large representative subset or sample can provide information concerning that population to whatever degree of accuracy is required (Daly et al 1991). Nevertheless, a balance is needed between using too few and too many subjects in the sample (Kelly et al 2010). Too small a sample will have insufficient power to statistically detect a true difference, so important differences between study groups may be declared statistically insignificant (Daly et al 1991, Karlsson et al 2003). Too large a sample may be considered unethical and wasteful of resources, and can affect a study's feasibility. Additionally, funding agencies, ethics committees and journal editors increasingly expect sample size to be justified.

Defining sample size
A sample size is appropriate if it enables the researcher to make an unequivocal judgement that a statistical result is correct to a chosen degree of error ('type I error') and has sufficient power (1 – type II error) to detect a specified meaningful effect. The sample size determination in this paper is based on power calculations and is suitable for testing hypotheses.

© RCNi / NURSE RESEARCHER May 2016 | Volume 23 | Number 5 21


Copyright © 2016 RCNi Ltd. All rights reserved.

Four parameters that underpin power

The significance level For testing hypotheses, the significance level 'α' (alpha) cuts off the area under the sampling distribution curve that divides a significant from a non-significant result: if a statistical test's result has a p-value less than α, then the statistical test is deemed to be significant. For a significance level of 5%, for example, α=0.05 and if p<0.05, the result is deemed significant. This α is the same α that is entered into sample size software or formulae.

The 'alpha error' or 'type I error' is the risk of accepting that a treatment is effective when it is not (Karlsson et al 2003). More formally, it is rejecting the null hypothesis H0 when H0 is true (Sakpal 2010). A p<0.05 means that if the experiment were repeated 100 times, a type I error would be likely to occur five times (5% of the time). Similarly, p<0.01 means that if the experiment were repeated 100 times, a type I error would be likely to occur only once. While α is usually set at 0.05 (Karlsson et al 2003, O'Hara 2008), the researcher can choose a lower risk of a type I error by using an α of 0.01. Such changes to the chosen significance level affect the calculated sample size. For example, if it is deemed acceptable to relax the conventional significance level of 5% to 10%, the calculated sample size decreases.

Table 1 Choosing the alpha value and the corresponding z value for a chosen significance level for a one- or a two-tail test

Significance   Alpha (α)   zα for a one-tail test   zα for a two-tail test
5%             0.05        1.64                     1.96
1%             0.01        2.33                     2.58

Source: Colton (1974), Machin et al (2009)

Power of the test 'Power' is the probability of a true positive effect (Kelly et al 2010) and can be defined as the probability that a statistical test will produce a significant result, given that a true difference actually exists (Houle et al 2005). The probability of a statistical test failing to detect a true difference is called 'β' (beta) and the power is (1 – β). If a treatment effect is present but the test does not find it, that is a 'beta error' – also known as a 'type II error' or 'false negative'. More formally, it is accepting H0 when the alternative hypothesis H1 is true (Sakpal 2010).

The choice of beta, and therefore power, depends on the degree of certainty of detecting a real difference or effect ('a true positive') that the researcher requires. A beta of 0.1 (10%) to 0.2 (20%) is usually tolerated (Cohen 1987), giving a power of 0.9 (90%) or 0.8 (80%) respectively (Kelly et al 2010).

It is notable, using the sample size tables from Machin et al (2009), that at a significance level of 5%, sample sizes for means or proportions increase by approximately 33% when the effect size and standard deviation are held constant and the power parameter is changed from 80% to 90%.

The effect size

Defining effect size for the comparison of means When comparing two means, the absolute effect size is defined as |µ1 – µ2|, where µ1 and µ2 are the population means. The standardised effect size – that is, the effect size measured in units of standard deviation – is the absolute effect size divided by the relevant standard deviation, σ, or |µ1 – µ2|/σ (Machin et al 2009), where σ is the square root of the variance of the populations (Petrie and Sabin 2010). The Cohen (1987) operational definitions of the standardised effect size for means are: 0.20 for a small effect size, 0.50 for a moderate one and 0.80 for a large one.

Defining effect size for the comparison of proportions When comparing proportions (event rates), the absolute effect size is defined as |π1 – π2|, where π1 and π2 are the anticipated population proportions in the two groups to be compared (Machin et al 2009). Authors generally report the absolute effect size – for example, a difference in absolute effect sizes of 15% was deemed clinically important in Adam et al's (2005) study of three-year, leg amputation-free survival values, with 65% (π2) in one group and 50% (π1) in the other.

Clinically meaningful effect size To calculate sample size, a researcher needs to anticipate what effect size to use. This should be the clinically important difference (Kelly et al 2010) that the treatment should make to justify a change in practice (Gogtay 2010). For example, an intervention that reduces maternal mortality by 1% could be considered clinically important, but an intervention that reduces the incidence of neonatal physiological jaundice by 10% might be considered of lesser clinical importance in practice (Devane et al 2004).

Anticipated effect size and the primary outcome A study's principal endpoint of interest is its main outcome (Kelly et al 2010). Sample size is calculated using the anticipated effect size for the designated primary outcome; where several outcomes are equally important, the sample size is calculated for all endpoints of interest and the largest sample size is then used to conduct the study (Machin et al 2009).
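The zα and zβ values in Tables 1 and 2 need not be looked up: they are quantiles of the standard normal distribution. A minimal sketch in Python using only the standard library (the function names are illustrative, not from the paper):

```python
from statistics import NormalDist

def z_alpha(alpha, two_tail=True):
    """z-alpha: the normal deviate cutting off alpha in one tail,
    or alpha/2 in each of two tails."""
    tail = alpha / 2 if two_tail else alpha
    return NormalDist().inv_cdf(1 - tail)

def z_beta(power):
    """z-beta: the normal deviate for the chosen power
    (power always cuts off only one tail)."""
    return NormalDist().inv_cdf(power)

# Reproduce Tables 1 and 2
print(round(z_alpha(0.05), 2))         # -> 1.96  (5%, two-tail)
print(round(z_alpha(0.05, False), 2))  # -> 1.64  (5%, one-tail)
print(round(z_alpha(0.01), 2))         # -> 2.58  (1%, two-tail)
print(round(z_alpha(0.01, False), 2))  # -> 2.33  (1%, one-tail)
print(round(z_beta(0.80), 3))          # -> 0.842 (80% power)
print(round(z_beta(0.90), 3))          # -> 1.282 (90% power)
```

Computing the deviates this way also gives the unrounded values that sample size software uses internally.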



If a researcher is over-optimistic and decides on an anticipated effect size that is too large relative to the true but unknown effect size, the resultant sample size will be too small (Machin et al 2009). Consequently, the study will be underpowered and a true effect could be missed. In contrast, if the researcher is pessimistic and sets the effect size too low, the sample size can be prohibitively large and may affect the study's feasibility.

In practice, there can be a range of opinions among clinicians on the minimum effect size of clinical importance. These opinions can be considered with the aim of balancing the detection of a clinically meaningful effect size with the feasibility of the study. A small adjustment to the effect size can result in large changes to the calculated sample size. For example, Mires et al (2001) estimated a sample size of 2,552 to detect a 3% absolute effect size. However, using the same significance level and power, the estimated sample size reduced to 1,704 subjects when the effect size was increased to 4%.

The standard deviation or event rate For continuous data, variability is measured by the standard deviation, σ (Colton 1974). This is the variation or spread associated with the data, and can be conceptualised as an average of the deviations from the mean µ. Estimates for σ are difficult to obtain, but the researcher can derive one by undertaking a pilot study (Kelly et al 2010) or an internal pilot study (Daly et al 1991, Machin et al 2009), or by using the σ reported by similar studies (Gogtay 2010). For continuous data, the relevant standard deviation can be entered directly into sample size software.

For discrete data ('dichotomous data'), variability depends on the proportion of subjects with the outcome of interest (Kelly et al 2010). Thus, for the comparison of two proportions, sample size is calculated using the two proportions of interest as inputs: the control proportion (baseline event rate) can be obtained from published literature, while the treatment proportion (event rate) is derived once the effect size is chosen. Knowledge of the two proportions of interest therefore supplies both the variation in the population and the effect size to the sample size calculation.

Three sample size calculations
In the following three formulae, the significance level is entered into the formula as a zα value. Similarly, the power is entered as a zβ value. A z-value or 'standardised normal deviate' (Petrie and Sabin 2010) is a measure of the distance from the mean in units of standard deviation (Castle 1977). It is a standardised value used to cut off a particular area of the normal distribution (Daly et al 1991). Z-values do not need to be calculated, but can simply be looked up in the tables of the normal distribution contained in most statistics books (Colton 1974, Machin et al 2009). The zα values for one- and two-tail tests are given in Table 1. The zβ values for powers are given in Table 2. These values are sufficient for most studies.

Table 2 Choosing the power and the corresponding z value for a one- or a two-tail test

Power   Beta (β)   zβ
80%     20%        0.842
90%     10%        1.282

NB Power always cuts off only one tail
Source: Colton (1974), Machin et al (2009)

Design 1: A comparison of two independent means (continuous data) This design can be applied when randomising participants into an intervention and a control group for a clinical trial. An appropriate test would be the independent t-test (Colton 1974, Petrie and Sabin 2010). The sample size calculated gives the number in each arm of the study.

For example: a researcher is planning a randomised controlled trial (RCT) to test the effectiveness of a new pharmaceutical drug to alter systolic blood pressure by a predetermined effect size of 5mmHg for a group of subjects whose blood pressure values have σ=10mmHg. For a two-tail significance test, with 80% power and a 5% level of significance, how many subjects does the study require? Assume that the data sampled are normally distributed, there is equal variance for both compared groups and there are equal numbers in each group. The answer is given in Figure 1.

Figure 1 Sample size formula for the comparison of two independent means

n = 2[(zα + zβ)σ/ES]²

Therefore:

n = 2[(1.96 + 0.84)10/5]² = 63 in each group

The more precise estimate given by the G*Power sample size software is 64 in each group.
(Colton 1974, Daly et al 1991)
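As a cross-check on Figure 1, the calculation can be reproduced with Python's standard library. This is an illustrative sketch, not code from the paper; the helper name `n_two_means` is an assumption, and the result is rounded up to a whole subject:

```python
from math import ceil
from statistics import NormalDist

def n_two_means(effect_size, sd, alpha=0.05, power=0.80):
    """Per-group sample size for two independent means (Figure 1):
    n = 2 * ((z_alpha + z_beta) * sigma / ES)**2, two-tail test."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)  # two-tail z-alpha (Table 1)
    z_b = nd.inv_cdf(power)          # z-beta (Table 2)
    return ceil(2 * ((z_a + z_b) * sd / effect_size) ** 2)

# Worked example: ES = 5 mmHg, sigma = 10 mmHg, 5% significance, 80% power
print(n_two_means(5, 10))  # -> 63 in each group (the article's G*Power estimate is 64)
```

The small discrepancy with G*Power arises because the formula uses the normal approximation rather than the t-distribution.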




Design 2: Paired design – comparison of two dependent means (continuous data) A paired design uses naturally matched pairs ('matching siblings'), artificial pairing ('matching subjects for important characteristics') or self-pairing (Colton 1974). In self-pairing, participants act as their own controls; in matched pairs, two individuals form a pair. Therefore, the total sample size is doubled for matched pairs compared with self-pairing. The appropriate test is the paired t-test (Colton 1974, Petrie and Sabin 2010). The sample size calculated gives the number of pairs needed for the study.

The relevant standard deviation is the 'standard deviation of difference', σd – that is, the standard deviation of the differences estimated from before and after treatment. However, an estimate of σd is difficult to obtain from published sources, so a preliminary estimate should be obtained at the beginning of the study (Daly et al 1991) or through a pilot study.

For example: a researcher is planning a study to test the efficacy of a new pharmaceutical beta-blocker drug to alter heart rate by a predetermined effect size of five beats/minute. The standard deviation of the differences, σd, is ten beats/minute. Using a two-tail test, 80% power and a 5% significance level, how many subjects does the study need? Assume the data are normally distributed. The answer is given in Figure 2. (For the zα value for 5% significance, see Table 1; for the zβ value for 80% power, see Table 2.)

Figure 2 Sample size formula for the difference of two dependent means: matched paired design

np = [(zα + zβ)σd/ES]² = number of pairs

Therefore:

np = [(1.96 + 0.84)10/5]² = 32 pairs

The more precise G*Power estimate is 34 pairs.
(Daly et al 1991)

Design 3: A comparison of two independent proportions (categorical data) For this design, appropriate tests are the z-test for the comparison of two independent proportions (Colton 1974) or the χ² test (Petrie and Sabin 2010).

For example: a researcher wishes to detect an effect size of 15% (absolute difference) in the three-year, amputation-free survival rate between an angioplasty group and a group undergoing bypass surgery. The proportion in one group was assumed to be 50%, in the other 65%. Using a two-tail test, a 5% significance level and a power of 90%, how many subjects does the study need? Assume that the groups have equal variance and numbers. The answer is given in Figure 3.

The formula in Figure 3 assumes the normal approximation to the binomial distribution. As such, it can only be applied when the average of the two proportions is not too small: (π1 + π2)/2 > 0.10 (Goodall et al 2009).

Anticipated drop-outs or losses to follow-up If a drop-out rate of r% is expected, the adjusted sample size is obtained by multiplying the unadjusted sample size by 100/(100 – r) (Petrie and Sabin 2010).
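The paired-design formula (Figure 2), the two-proportion formula (Figure 3) and the drop-out adjustment can be sketched in the same standard-library style; again the function names are illustrative assumptions, and unrounded z-values are used, which can shift the answer by one relative to hand calculations with the tabulated values:

```python
from math import ceil
from statistics import NormalDist

def _z_values(alpha, power):
    """z-alpha for a two-tail test and z-beta for the chosen power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)

def n_paired(effect_size, sd_diff, alpha=0.05, power=0.80):
    """Number of pairs (Figure 2): np = ((z_a + z_b) * sigma_d / ES)**2."""
    z_a, z_b = _z_values(alpha, power)
    return ceil(((z_a + z_b) * sd_diff / effect_size) ** 2)

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group size (Figure 3): n = 2 * ((z_a + z_b)/(p2 - p1))**2 * pbar*(1 - pbar),
    valid when the average proportion pbar exceeds 0.10."""
    z_a, z_b = _z_values(alpha, power)
    p_bar = (p1 + p2) / 2
    return ceil(2 * ((z_a + z_b) / (p2 - p1)) ** 2 * p_bar * (1 - p_bar))

def adjust_for_dropout(n, r_percent):
    """Inflate a sample size for an anticipated drop-out rate of r%:
    n * 100 / (100 - r)."""
    return ceil(n * 100 / (100 - r_percent))

print(n_paired(5, 10))                            # -> 32 pairs, as in Figure 2
print(n_two_proportions(0.50, 0.65, power=0.90))  # -> 229 with unrounded z-values
                                                  #    (Figure 3's rounded values give 228; G*Power gives 227)
print(adjust_for_dropout(64, 20))                 # -> 80 recruited to retain 64 after 20% attrition
```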
Figure 3 Sample size formula for the difference of two independent proportions

n = 2[(zα + zβ)/(π2 – π1)]² × π(1 – π)

Therefore:

n = 2[(1.96 + 1.282)/0.15]² × 0.575 × 0.425 = 228 in each group

Where n = number in each group
Where π1 and π2 are the proportions in each group
Where effect size = |π2 – π1| = |0.65 – 0.50| = 0.15
Where π = (π1 + π2)/2 = 0.575 (average of the two proportions)
Where (1 – π) = (1 – 0.575) = 0.425
The more precise G*Power estimate is 227 in each group.

Allocation ratio In general, clinical trials allocate equal numbers to both groups (Machin et al 2009). However, other allocation ratios may be required. For example, in placebo-controlled trials with very ill patients, it may be unethical to assign equal numbers to each group (Sakpal 2010). Additionally, in case-control studies where there are a limited number of cases and controls are readily available, it is acceptable to include more controls to increase the power of the study (Machin et al 2009).

Sample size software assists with calculations involving different allocation ratios. The sample size derived in this paper for a 1:1 ratio can be adjusted to give a different allocation ratio by applying the following simple formula, where n is the number in each arm with equal allocation and k is the required allocation ratio (Sakpal 2010):




Arm (1): n1 = n(1 + k)/2
Arm (2): n2 = n(1 + k)/2k
N (total) = n1 + n2

A practical example of this formula can be illustrated using the sample size estimation given in this paper for the comparison of two independent means, which with a 1:1 ratio gives a G*Power estimate of 64 in each arm (128 in total). For an allocation ratio of 2:1, the required numbers in each arm to maintain 80% power are (Sakpal 2010):

Arm (1): n1 = 64(1 + 2)/2 = 96
Arm (2): n2 = 64(1 + 2)/4 = 48
N (total) = 96 + 48 = 144

Sample size and study feasibility considerations
The estimated sample size may be too large, rendering the study unfeasible. Sample size can be reduced by relaxing the significance level from 5% to 10%; changing the power from 90% to 80%; limiting the study to the primary outcome, if other endpoints require a larger sample size; applying a paired design ('reducing variability'); choosing a continuous instead of a categorical variable; or applying a one-sided significance test.

Conclusion
Sample-size software requires the input of the parameters outlined in this paper. Familiarisation with the formulae in this paper enables the researcher to conceptualise what is happening in the background when using such software. Regardless of whether sample size is calculated using formulae or software, carefully estimated sample sizes are important, because the research informing healthcare practice needs to be based on evidence. Underpinning the concept of evidence-based practice is the application of research findings that are statistically sound.

Four parameter values are required for a basic estimation of sample size: significance level, power, effect size, and standard deviation (for continuous data) or event rate (for dichotomous data). The conventional significance level is typically 5% and the conventional power is 80%. The effect size is the clinically meaningful difference that can justify a change in clinical practice. The standard deviation is usually unknown and can be estimated by undertaking a pilot study or by using the standard deviation reported in similar studies.

It is important for researchers to be aware that the benefits of a properly calculated sample size can be undermined by the presence of systematic bias in the study design (Malone et al 2014). When reporting study findings, researchers should give their rationales for the choice of parameters applied to calculate the sample size and state whether any adjustments were made to take account of attrition. It is strongly recommended that a confidence interval be reported for the observed effect size (Noordzij et al 2011). Guidance from a statistician is advised for study designs that require more complex formulae than those presented in this paper.

Online archive
For related information, visit our online archive and search using the keywords

Conflict of interest
None declared
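The allocation-ratio adjustment described in the allocation ratio section can be sketched in the same standard-library style (the function name is an illustrative assumption; sizes are rounded up to whole subjects):

```python
from math import ceil

def unequal_allocation(n_equal, k):
    """Convert a per-arm size n from a 1:1 allocation to a k:1 allocation,
    keeping approximately the same power (Sakpal 2010):
    n1 = n(1 + k)/2, n2 = n(1 + k)/2k."""
    n1 = ceil(n_equal * (1 + k) / 2)
    n2 = ceil(n_equal * (1 + k) / (2 * k))
    return n1, n2

# Worked example from the text: 64 per arm at 1:1, moved to a 2:1 ratio
print(unequal_allocation(64, 2))  # -> (96, 48): 144 in total, rather than 128
```

Note that the total sample size grows as the allocation moves away from 1:1, which is one reason equal allocation is the default in clinical trials.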

References

Adam DJ, Beard JD, Cleveland T et al (2005) Bypass versus angioplasty in severe ischaemia of the leg (BASIL): multicentre, randomised controlled trial. Lancet. 366, 9501, 1925-1934.

Castle WM (1977) Statistics in Small Doses. Second edition. Churchill Livingstone, New York NY.

Cohen J (1987) Statistical Power Analysis for Behavioral Sciences. Second edition. Lawrence Erlbaum Associates, Mahwah NJ.

Colton T (1974) Statistics in Medicine. First edition. Little, Brown and Co, Boston MA.

Daly LE, Bourke GJ, McGilvray J (1991) Interpretation and Uses of Medical Statistics. Fourth edition. Blackwell Scientific Publications, Oxford.

Devane D, Begley CM, Clark M (2004) How many do I need? Basic principles of sample size estimation. Journal of Advanced Nursing. 47, 3, 297-302.

Gogtay NJ (2010) Principles of sample size calculation. Indian Journal of Ophthalmology. 58, 6, 517-518.

Goodall EA, Moore J, Moore T (2009) The estimation of approximate sample size requirements necessary for clinical and epidemiological studies in vision sciences. Eye. 23, 7, 1589-1597.

Houle TT, Penzien DB, Houle CK (2005) Statistical power and sample size estimation for headache research: an overview and power calculation tool. Headache. 45, 5, 414-418.

Karlsson J, Engebretsen L, Dainty K et al (2003) Considerations on sample size and power calculations in randomized clinical trials. Arthroscopy. 19, 9, 997-999.

Kelly PJ, Webster AC, Craig JC (2010) How many patients do we need for a clinical trial? Demystifying sample size calculations. Nephrology. 15, 8, 725-731.

Machin D, Campbell MJ, Tan SB et al (2009) Sample Size Tables for Clinical Studies. Third edition. Wiley-Blackwell, Oxford.

Malone H, Nicholl H, Tracey C (2014) Awareness and minimisation of systematic bias in research. British Journal of Nursing. 23, 5, 279-282.

Mires G, Williams F, Howie P (2001) Randomised controlled trial of cardiotocography versus Doppler auscultation of fetal heart at admission in labour in low risk obstetric population. British Medical Journal. 322, 7300, 1457-1460.

Noordzij M, Dekker FW, Zoccali C et al (2011) Sample size calculations. Nephron Clinical Practice. 118, 4, c319-c323.

O'Hara J (2008) How I do it: sample size calculations. Clinical Otolaryngology. 33, 2, 145-149.

Petrie A, Sabin C (2010) Medical Statistics at a Glance. Third edition. Wiley-Blackwell, Oxford.

Sakpal TV (2010) Sample size estimation in clinical trial. Perspectives in Clinical Research. 2, 67-69.



