6 Sample Size Determination (Fikru)

Recall from Biostatistics
Estimator
• A point estimate is a single value (point) derived
from a sample and used to estimate a population
value.
• A confidence interval estimate is a range of values
constructed from sample data so that the
population parameter is likely to occur within
that range at a specified probability.
• The specified probability is called the level of
confidence.
Interval Estimates - Interpretation
For a 95% confidence interval, about 95% of similarly
constructed intervals will contain the parameter being
estimated. Also, 95% of the sample means for a specified
sample size will lie within 1.96 standard errors (standard
deviations of the sampling distribution) of the
hypothesized population mean.
How to Obtain z value for a Given Confidence Level
The 95 percent confidence refers to the middle 95 percent of the
observations. Therefore, the remaining 5 percent is equally
divided between the two tails (2.5 percent in each), which
corresponds to z = 1.96.
Following is a portion of Appendix B.1.
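As an alternative to looking the value up in a printed table, the critical z value can be computed directly. The sketch below uses Python's standard library; the function name `z_for_confidence` is an illustrative choice, not part of the original slides.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided critical z value for a confidence level such as 0.95."""
    alpha = 1.0 - level
    # The remaining alpha is split equally between the two tails,
    # so we look up the (1 - alpha/2) quantile of the standard normal.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

For 95% confidence this returns approximately 1.96, the value used throughout these notes.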
Point Estimates and Confidence Intervals for a Mean – σ Known
\[ \bar{x} \pm z \frac{\sigma}{\sqrt{n}} \]
x̄ = sample mean
z = z-value for a particular confidence level
σ = the population standard deviation
n = the number of observations in the sample
1. The width of the interval is determined by the level of
confidence and the size of the standard error of the mean.
2. The standard error is affected by two values:
- Standard deviation
- Number of observations in the sample
Example: Confidence Interval for a Mean – σ Known
The American Management Association wishes to have information
on the mean income of middle managers in the retail industry.
A random sample of 256 managers reveals a sample mean of
$45,420. The standard deviation of this population is $2,050.
The association would like answers to the following questions:
1. What is the population mean?
2. What is a reasonable range of values for the population mean?
3. What do these results mean?

What is the population mean?
In this case, we do not know. We do know the sample
mean is $45,420. Hence, our best estimate of the
unknown population value is the corresponding sample
statistic.
The sample mean of $45,420 is a point estimate of the
unknown population mean.
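What is a reasonable range of values for the population mean?
Suppose the association decides to use the 95 percent level of
confidence:
\[ \bar{x} \pm z \frac{\sigma}{\sqrt{n}} = \$45{,}420 \pm 1.96 \times \frac{\$2{,}050}{\sqrt{256}} = \$45{,}420 \pm \$251 \]
The confidence limits are $45,169 and $45,671.
The ±$251 is referred to as the margin of error.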
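The margin of error and confidence limits for the managers example can be checked with a few lines of Python. This is a minimal sketch assuming the σ-known interval x̄ ± z·σ/√n; the function name `mean_ci_known_sigma` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, level=0.95):
    """Return (lower, upper, margin) for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)  # z times the standard error of the mean
    return xbar - margin, xbar + margin, margin


# Values from the example: n = 256, sample mean $45,420, sigma $2,050.
lower, upper, margin = mean_ci_known_sigma(45420, 2050, 256)
print(f"margin of error: {margin:.0f}")
print(f"95% CI: {lower:.0f} to {upper:.0f}")
```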
What do these results mean, i.e. what is the
interpretation of the confidence limits $45,169
and $45,671?
If we select many samples of 256 managers, and for
each sample we compute the mean and then
construct a 95 percent confidence interval, we
could expect about 95 percent of these
confidence intervals to contain the population
mean. Conversely, about 5 percent of the
intervals would not contain the population mean
annual income, µ.
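The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is only an illustration: it assumes a normally distributed population with a hypothetical true mean of $45,500 (the true mean is unknown in the example), draws many samples of 256, and counts how often the 95 percent interval contains that mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, level=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


# mu = 45500 is a made-up "true" mean used only for the simulation.
share = coverage(mu=45500, sigma=2050, n=256)
print(f"share of intervals containing mu: {share:.3f}")
```

The printed share should be close to 0.95; any single interval either contains µ or does not.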
Estimation Process
[Figure: a random sample with mean X̄ = 50 is drawn from a
population whose mean, μ, is unknown, leading to the statement
"I am 95% confident that μ is between 40 and 60."]
© 2002 Prentice-Hall, Inc.
Point Estimates
Estimate Population Parameters … with Sample Statistics

Quantity      Population parameter   Sample statistic
Mean          μ                      X̄
Proportion    p                      p_s
Variance      σ²                     S²
Difference    μ1 − μ2                X̄1 − X̄2
Interval Estimates
• Provides range of values
– Takes into consideration variation in sample
statistics from sample to sample
– Is based on observation from one sample
– Gives information about closeness to unknown
population parameters
– Is stated in terms of level of confidence
• Never 100% certain

Confidence Interval Estimates
[Diagram: confidence intervals are constructed for a mean
(σ known or σ unknown) and for a proportion.]
Confidence Interval for μ (σ Known)
• Assumptions
– Population standard deviation is known
– Population is normally distributed
– If population is not normal, use large sample
• Confidence interval estimate
\[ \bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Elements of Confidence Interval Estimation
• Level of confidence
– Confidence that the interval will contain the
unknown population parameter
• Precision (range)
– Closeness to the unknown parameter
• Cost
– Cost required to obtain a sample of size n
Level of Confidence
• Denoted by 100(1 − α)%
• A relative frequency interpretation
– In the long run, 100(1 − α)% of all the confidence
intervals that can be constructed will contain the
unknown parameter
• A specific interval will either contain or not
contain the parameter
– No probability involved in a specific interval
Interval and Level of Confidence
[Figure: sampling distribution of the mean X̄, with area 1 − α
between μ − Z_{α/2}·σ_X̄ and μ + Z_{α/2}·σ_X̄ and area α/2 in each
tail. Intervals extend from X̄ − Z·σ_X̄ to X̄ + Z·σ_X̄;
100(1 − α)% of the intervals constructed contain μ, and
100α% do not.]
Factors Affecting Interval Width (Precision)
Intervals extend from X̄ − Z·σ_X̄ to X̄ + Z·σ_X̄
• Data variation
– Measured by σ
• Sample size
– σ_X̄ = σ/√n
• Level of confidence
– 100(1 − α)%
© 1984-1994 T/Maker Co.

Determining Sample Size (Cost)
• Too big: requires too many resources
• Too small: won’t do the job
Determining Sample Size for Mean
What sample size is needed to be 90% confident of being
correct within ± 5? A pilot study suggested that the standard
deviation is 45.
\[ n = \frac{Z^2 \sigma^2}{\text{Error}^2} = \frac{1.645^2 \times 45^2}{5^2} = 219.2 \approx 220 \quad \text{(round up)} \]
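The calculation above can be reproduced as follows. This is a minimal sketch assuming n = Z²σ²/Error² with the result rounded up; `sample_size_mean` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma: float, error: float, level: float) -> int:
    """Sample size needed to estimate a mean to within +/- error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Always round up: rounding down would miss the target precision.
    return ceil((z * sigma / error) ** 2)


# Pilot-study values from the example: sigma = 45, error = 5, 90% confidence.
n_mean = sample_size_mean(sigma=45, error=5, level=0.90)
print(n_mean)
```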
Sample Size Determination
• An essential part of planning any study is to decide how
many people need to be studied.
• Sample size: the number of study subjects selected
to represent a given study population.
• Important to make inferences based on the findings
from the sample.
• Should be sufficient to represent the characteristics
of interest of the study population.
Sample Size (continued)
• “What size sample do I need?”
• The answer to this question is influenced by a
number of factors, including the purpose of the study,
population size, the risk of selecting a “bad” sample, and the
allowable sampling error.
Sample Size (continued)
• Generally, sample size determination depends on the:
– objective of the study;
– design of the study;
– plan for statistical analysis;
– accuracy of the measurements to be made;
– degree of precision required for generalization;
– degree of confidence with which to conclude.
• In planning any investigation we must decide how many people
need to be studied in order to answer the study objectives.
• If the study is too small we may fail to detect important effects, or
may estimate effects too imprecisely.
• If the study is too large then we will waste resources.
• In general, it is much better to increase the accuracy of data
collection (by improving the training of data collectors and data
collection tools) than to increase the sample size after a certain
point.
• The eventual sample size is usually a compromise between what
is desirable and what is feasible.
• The feasible sample size is determined by the availability of
resources.
• It is also important to remember that resources are not only
needed to collect the information, but also to analyze it.
Sample size (continued)
When deciding on sample size, weigh precision against cost:
a larger sample size gives greater precision but also costs more.

Sample size (continued)
• The feasible sample size is also determined by
the availability of resources:
– time
– manpower
– transport
– available facility, and
– money
Sample size (continued)
• In addition to the purpose of the study and population size,
three criteria usually need to be specified to
determine the appropriate sample size:
1. The level of precision
2. The level of confidence
3. The degree of variability in the attributes being
measured
1. The Level of Precision
• The level of precision, sometimes called sampling error, is
the range in which the true value of the population is
estimated to be.
This range is often expressed in percentage points (e.g., ±5
percent).
• The absolute precision (d) is half the width of the confidence
interval: d = Z_{α/2} × SE
– where SE is the standard error of the estimator of the
parameter of interest
– d = precision of the estimate (how close do you
want to be? true value = sample value ± d)
• Precision describes how close the estimate is expected to be
to the true value; it is usually stated with a 95% confidence interval.
• A 95% confidence level corresponds to a significance level
of α = 0.05.
The Level of Precision
• Decide how precise (narrow) you want the
confidence interval to be.
• More people are needed in the sample when:
– the prevalence is closer to 0.5 (50%) or, for a
continuous outcome, the SD is larger
– a narrower confidence interval is wanted

Example
Want to estimate lung cancer prevalence in a given population
aged 65+ (sample sizes from n = (1.96)² p(1 − p)/d²)
• Planning a cross-sectional survey
• Think prevalence is about 8%
• Want to estimate it to within 2% of the truth (with 95% confidence)
→ about 707 people
• Think prevalence will be ~10% and want to estimate it to within
2% (with 95% confidence)
→ about 864 people
• Prevalence ~10% and estimate within 1%
→ about 3,457 people
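The sample sizes in the prevalence example can be checked with a short script. The sketch assumes the single-proportion formula n = z²p(1 − p)/d² and returns the unrounded value so that the rounding convention is left to the reader; `sample_size_proportion` is an illustrative name.

```python
from statistics import NormalDist


def sample_size_proportion(p: float, d: float, level: float = 0.95) -> float:
    """Unrounded sample size to estimate a proportion p to within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z ** 2 * p * (1 - p) / d ** 2


# (expected prevalence, absolute precision) for the three scenarios.
scenarios = [(0.08, 0.02), (0.10, 0.02), (0.10, 0.01)]
for p, d in scenarios:
    print(f"p = {p:.2f}, d = {d:.2f}: n = {sample_size_proportion(p, d):.1f}")
```

Halving d from 2% to 1% roughly quadruples the required sample size, because d enters the formula squared.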
The Level of Precision
• A family planning campaign survey was done and reported
by the media.
• Thus, if a researcher finds that 60% of mothers in the
sample have adopted a recommended practice with a
precision of ±5%, then he or she can conclude that
between 55% and 65% of mothers in the population
have adopted the practice.
The Confidence Level
• Irrespective of the shape of the underlying distribution of the
population, sample means and proportions will approximately
follow a normal distribution if the sample size is
sufficiently large.
• The confidence or risk level is based on ideas encompassed under
the Central Limit Theorem.
The Confidence Level
• The key idea encompassed in the Central Limit Theorem
is that when a population is repeatedly sampled, the
average value of the attribute obtained by those samples
is equal to the true population value.
• In a normal distribution, approximately 95% of the
sample values are within two standard deviations of the
true population value (e.g., mean).
Degree of Variability
• The third criterion, the degree of variability in the
attributes being measured, refers to the distribution of
attributes in the population.
• The more heterogeneous a population, the larger the
sample size required to obtain a given level of
precision.
Degree of Variability
• Note that a proportion of 50% indicates a greater
level of variability than either 20% or 80%. This is
because 20% and 80% indicate that a large majority
do not or do, respectively, have the attribute of
interest.
• Because a proportion of 0.5 indicates the maximum
variability in a population, it is often used in
determining a more conservative sample size, that
is, the sample size may be larger than if the true
variability of the population attribute were used.
Strategies For Determining Sample Size
1. Sample Size: Single Sample
• The aim is to have a large enough sample with
which to estimate a population mean or proportion
within a narrow interval with high reliability.
• Concerned with the precision of the estimate
(“narrowness of the CI”): estimate ± d units
Estimating the number needed to estimate a single
proportion
\[ n = \frac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2} \]
n = number required
p = population proportion (you need to guess this,
as if you knew it you would not need to do
the calculation!)
d = precision of the estimate (how close do you
want to be?)
Sample size for single sample
includes:
A. Sample size for estimating a single
population mean
B. Sample size to estimate a single population
proportion
Using Formulas to Calculate a Sample Size
\[ n = \frac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2} = 384 \]
(for example, with p = 0.5, d = 0.05 and 95% confidence)
The result is the minimum sample size required.
1. Suppose that you are interested to know the
proportion of infants who are breastfed beyond 18 months of
age in a rural area. Suppose that in a similar area, the
proportion (p) of breastfed infants was found to be
0.20. What sample size is required to estimate the true
proportion within ±3 percentage points with 95% confidence?
Let p = 0.20, d = 0.03, α = 5%:
\[ n = \frac{(1.96)^2 (0.20)(0.80)}{(0.03)^2} \approx 683 \]
• Suppose there is no prior information about
the proportion (p) who breastfeed
• Assume p = q = 0.5 (most conservative)
• Then the required sample size increases:
\[ n = \frac{(1.96)^2 (0.5)(0.5)}{(0.03)^2} \approx 1067 \]
An estimate of p is not always available.
• However, the formula may also be used for
sample size calculation based on various
assumptions for the values of p (here with d = 0.05):
P = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
P = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
P = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
P = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
P = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
P = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246
• For a fixed absolute precision (d), the required
sample size increases as P increases from 0 to
0.5, and then decreases in the same way as
P approaches 1.
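The table above can be regenerated with a short loop. The sketch fixes z = 1.96 and d = 0.05 as in the table and rounds to the nearest whole number; `n_for_proportion` is an illustrative name.

```python
def n_for_proportion(p: float, d: float = 0.05, z: float = 1.96) -> float:
    """Unrounded sample size for a single proportion: z^2 p(1-p) / d^2."""
    return z ** 2 * p * (1 - p) / d ** 2


proportions = (0.1, 0.2, 0.3, 0.5, 0.7, 0.8)
table = [round(n_for_proportion(p)) for p in proportions]

for p, n in zip(proportions, table):
    print(f"P = {p:.1f} -> n = {n}")
```

The symmetry around P = 0.5 comes from the p(1 − p) term, which takes the same value for P and 1 − P and is largest at 0.5.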
2. A survey is planned to determine what proportion
of the medical students have regularly chewed
khat. If no estimate of p is available and a pilot
sample cannot be drawn, what sample size would
be required if 95% confidence is desired and
d = 0.04 is to be used?
Ans: with p = 0.5, n = (1.96)²(0.5)(0.5)/(0.04)² ≈ 600 students
2. Sample Size: Two Samples
A. Estimation of the difference between two
population means
B. Estimation of the difference between two
population proportions
Comparison of two proportions
\[ n = \frac{(p_1 q_1 + p_2 q_2)\, f(\alpha,\beta)}{(p_1 - p_2)^2} \quad \text{(in each region)} \]
• q = 1 − p
• α = type I error (level of significance)
• β = type II error (1 − β = power of the study)
• power = the probability of getting a significant result
when a true difference exists
• f(α,β) = 10.5 when the power = 90% and the level of
significance = 5%
• E.g. The proportion of nurses leaving the health service
is compared between two regions. In one region 30% of
nurses are estimated to leave the service within 3 years
of graduation. In the other region it is probably 15%.
Solution
• The required sample to show, with a 90%
likelihood (power), that the percentage of nurses
leaving the service is different in these two regions
would be (assuming a confidence level of 95%):
\[ n = \frac{(1.28 + 1.96)^2 \left[(0.30 \times 0.70) + (0.15 \times 0.85)\right]}{(0.30 - 0.15)^2} = 157.5, \text{ rounded up to } 158 \]
• 158 nurses are required in each region
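The nurses example can be verified numerically. The sketch below assumes f(α,β) = (z for a two-sided α + z for power)², which gives about 10.5 for α = 5% and 90% power; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Sample size per group to detect a difference between p1 and p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for 90% power
    f = (z_alpha + z_beta) ** 2                    # about 10.5
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # p1*q1 + p2*q2
    return ceil(f * variance_sum / (p1 - p2) ** 2)


# 30% of nurses leaving in one region versus 15% in the other.
n_nurses = n_per_group_two_proportions(0.30, 0.15)
print(n_nurses)
```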
• Comparison of two means (sample size in each
group)
\[ n = \frac{(s_1^2 + s_2^2)\, f(\alpha,\beta)}{(m_1 - m_2)^2} \]
– m1 and s1² are the mean and variance of group 1,
respectively.
– m2 and s2² are the mean and variance of group 2,
respectively.
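The two-means formula can be sketched in the same way. The slides give no numeric example here, so the input values below (means of 120 and 110, variance 225 in each group) are made up purely for illustration, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(m1, m2, var1, var2, alpha=0.05, power=0.90):
    """Sample size per group to detect a difference between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    f = (z_alpha + z_beta) ** 2  # f(alpha, beta), about 10.5 here
    return ceil((var1 + var2) * f / (m1 - m2) ** 2)


# Hypothetical inputs, not taken from the slides.
n_means = n_per_group_two_means(m1=120, m2=110, var1=225, var2=225)
print(n_means)
```

A smaller difference between the means, or larger variances, increases the required sample size.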
A. Sample size for estimating a
difference in two means
• Aim: Estimate μ1 − μ2
• Want: within ± d units,
where d = Z_{α/2} × SE
(95% CI of width w = 2d)
• If equal sample size in both groups is required,
then:
\[ n = \frac{Z_{\alpha/2}^2 \,(\sigma_1^2 + \sigma_2^2)}{d^2} \quad \text{(in each group)} \]
• Use σ1², σ2², or estimate them using s1² and s2²
B. Sample size for estimating a difference
in two proportions
• Aim: Estimate p1 − p2
• Want: within ± d units,
where d = Z_{α/2} × SE
(95% CI of width w = 2d)
• If equal sample sizes in both groups, then:
\[ n = \frac{Z_{\alpha/2}^2 \left[p_1(1-p_1) + p_2(1-p_2)\right]}{d^2} \quad \text{(in each group)} \]
• Use estimates of p1, p2 (or p1 = p2 = 0.5 if
unknown)
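Both estimation cases follow from d = Z_{α/2} × SE with equal group sizes, where SE² is the sum of the two per-group variances divided by n. The sketch below assumes the resulting per-group sizes n = z²(σ1² + σ2²)/d² and n = z²[p1(1 − p1) + p2(1 − p2)]/d²; function names and input values are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def z_two_sided(level: float) -> float:
    """Critical z value for a two-sided confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def n_diff_means(sigma1, sigma2, d, level=0.95):
    """Per-group n to estimate mu1 - mu2 to within +/- d."""
    z = z_two_sided(level)
    return ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / d ** 2)


def n_diff_proportions(p1, p2, d, level=0.95):
    """Per-group n to estimate p1 - p2 to within +/- d."""
    z = z_two_sided(level)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2)


# Hypothetical inputs: SD of 10 in each group, precision of 2 units.
print(n_diff_means(sigma1=10, sigma2=10, d=2))
# Conservative case with unknown proportions: p1 = p2 = 0.5, d = 0.05.
print(n_diff_proportions(p1=0.5, p2=0.5, d=0.05))
```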
Points for Consideration
1. Sample size estimates might need to be adjusted to compensate for
non-response rate, patient dropout or loss to follow-up, lack of
compliance, etc.
2. If sampling is from a finite population of size N, then:
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} \]
where n0 is the sample size from an infinite population. When N is large in
comparison to n (i.e., n/N ≤ 0.05), the finite population correction
may be ignored.
3. Design effect for complex cluster sampling. Common values: multiply n by
2, 3, …, 5.
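The three adjustments above can be sketched as small helper functions. The population size, design effect and non-response rate used below are hypothetical, and dividing by (1 − non-response rate) is one common convention for the first adjustment rather than a rule stated in these notes.

```python
from math import ceil


def finite_population_correction(n0: float, population: int) -> float:
    """n = n0 / (1 + n0 / N) for sampling from a finite population."""
    return n0 / (1 + n0 / population)


def apply_design_effect(n: float, deff: float) -> float:
    """Inflate n for cluster sampling by the design effect."""
    return n * deff


def inflate_for_nonresponse(n: float, rate: float) -> float:
    """Inflate n so that the expected number of respondents is still n."""
    return n / (1 - rate)


n0 = 384  # single-proportion sample size with p = 0.5, d = 0.05

# Hypothetical population of 2,000: n0/N = 0.192 > 0.05, so correct.
print(ceil(finite_population_correction(n0, 2000)))
# Hypothetical design effect of 2 for cluster sampling.
print(ceil(apply_design_effect(n0, 2)))
# Hypothetical 10% non-response rate.
print(ceil(inflate_for_nonresponse(n0, 0.10)))
```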
