
Recall from Biostatistics

Estimator
• A point estimate is a single value (point) derived
from a sample and used to estimate a population
value.
• A confidence interval estimate is a range of values
constructed from sample data so that the
population parameter is likely to occur within
that range at a specified probability.
• The specified probability is called the level of
confidence.
Interval Estimates - Interpretation
For a 95% confidence interval, about 95% of similarly constructed intervals will contain the parameter being estimated. Also, 95% of the sample means for a specified sample size will lie within 1.96 standard errors (standard deviations of the sample mean) of the hypothesized population mean.
How to Obtain z value for a Given
Confidence Level
The 95 percent confidence refers to
the middle 95 percent of the
observations. Therefore, the
remaining 5 percent are
equally divided between the
two tails.
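As a rough sketch of the same lookup without a printed table, the z value can be computed from the standard normal distribution. The function name `z_for_confidence` is an illustrative assumption, not something from the slides; it puts half of the leftover probability in each tail, as described above.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Return the two-sided critical z value for a confidence level such as 0.95."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    alpha = 1.0 - level  # total probability left over for the two tails
    # Each tail holds alpha / 2, so we need the (1 - alpha / 2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

For the three levels shown this gives approximately 1.645, 1.960 and 2.576.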
Following is a portion of Appendix B.1.
Point Estimates and Confidence Intervals for a
Mean – σ Known

\[ \bar{x} \pm z \frac{\sigma}{\sqrt{n}} \]

x̄ = sample mean
z = z-value for a particular confidence level
σ = the population standard deviation
n = the number of observations in the sample

1. The width of the interval is determined by the level of confidence and the size of the standard error of the mean.
2. The standard error is affected by two values:
- Standard deviation
- Number of observations in the sample
Example: Confidence Interval for a Mean – σ
Known
The American Management Association wishes to have information
on the mean income of middle managers in the retail industry.
A random sample of 256 managers reveals a sample mean of
$45,420. The standard deviation of this population is $2,050.
The association would like answers to the following questions:

1. What is the population mean?

2. What is a reasonable range of values for the population mean?

3. What do these results mean?


Example: Confidence Interval for a Mean – σ
Known

What is the population mean?

In this case, we do not know. We do know the sample mean is $45,420. Hence, our best estimate of the unknown population value is the corresponding sample statistic.

The sample mean of $45,420 is a point estimate of the unknown population mean.
Example: Confidence Interval for a Mean – σ
Known

What is a reasonable range of values for the population mean?

Suppose the association decides to use the 95 percent level of confidence:

\[ \bar{x} \pm z \frac{\sigma}{\sqrt{n}} = 45{,}420 \pm 1.96 \frac{2{,}050}{\sqrt{256}} = 45{,}420 \pm 251 \]

The confidence limits are $45,169 and $45,671.

The ±$251 is referred to as the margin of error.
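The arithmetic behind these limits can be checked with a short sketch. It assumes the usual σ-known interval, sample mean ± z × σ/√n, and the helper name `mean_ci_sigma_known` is hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, level=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    margin = z * sigma / math.sqrt(n)  # z times the standard error of the mean
    return xbar - margin, xbar + margin, margin

# Figures from the middle-manager income example.
lower, upper, margin = mean_ci_sigma_known(xbar=45_420, sigma=2_050, n=256)
print(f"margin of error: {margin:.0f}")
print(f"95% CI: {lower:.0f} to {upper:.0f}")
```

Rounded to whole dollars, this reproduces a margin of 251 and limits of 45,169 and 45,671.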
Example: Confidence Interval for a Mean – σ
Known

What do these results mean, i.e. what is the interpretation of the confidence limits $45,169 and $45,671?

If we select many samples of 256 managers, and for each sample we compute the mean and then construct a 95 percent confidence interval, we could expect about 95 percent of these confidence intervals to contain the population mean. Conversely, about 5 percent of the intervals would not contain the population mean annual income, µ.
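The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: it assumes, purely for illustration, that incomes are normally distributed with a true mean of 45,420 and a standard deviation of 2,050 (in practice the true mean is unknown), and the function name `coverage` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and build its interval around the sample mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Assumed (hypothetical) population: normal, mean 45,420, SD 2,050.
print(coverage(mu=45_420, sigma=2_050, n=256))
```

The printed fraction should land close to 0.95, with some run-to-run variation that depends on the seed and the number of trials.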
Estimation Process

[Figure: a random sample with mean X̄ = 50 is drawn from a population whose mean, μ, is unknown. Conclusion: "I am 95% confident that μ is between 40 and 60."]

© 2002 Prentice-Hall, Inc.


Point Estimates

Estimate population parameters with sample statistics

Quantity      Population parameter   Sample statistic
Mean          μ                      X̄
Proportion    p                      p_s
Variance      σ²                     S²
Difference    μ1 − μ2                X̄1 − X̄2
Interval Estimates
• Provides range of values
– Takes into consideration variation in sample
statistics from sample to sample
– Is based on observation from one sample
– Gives information about closeness to unknown
population parameters
– Is stated in terms of level of confidence
• Never 100% certain

Confidence Interval Estimates

• Confidence intervals for a mean
– σ known
– σ unknown
• Confidence intervals for a proportion

Confidence Interval for μ (σ Known)
• Assumptions
– Population standard deviation is known
– Population is normally distributed
– If population is not normal, use large sample
• Confidence interval estimate

\[ \bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Elements of
Confidence Interval Estimation
• Level of confidence
– Confidence that the interval contains the unknown population parameter
• Precision (range)
– Closeness to the unknown parameter
• Cost
– Cost required to obtain a sample of size n

Level of Confidence
• Denoted by 100(1 − α)%
• A relative frequency interpretation
– In the long run, 100(1 − α)% of all the confidence intervals that can be constructed will contain the unknown parameter
• A specific interval will either contain or not
contain the parameter
– No probability involved in a specific interval

Interval and Level of Confidence
[Figure: sampling distribution of the mean X̄, centered at μ, with area 1 − α between μ − Z_{α/2}σ_X̄ and μ + Z_{α/2}σ_X̄ and area α/2 in each tail.]
Intervals extend from X̄ − Zσ_X̄ to X̄ + Zσ_X̄. 100(1 − α)% of the intervals constructed contain μ; 100α% do not.
Factors Affecting
Interval Width (Precision)
• Data variation
– Measured by σ
• Sample size
– σ_X̄ = σ/√n
• Level of confidence
– 100(1 − α)%
Intervals extend from X̄ − Zσ_X̄ to X̄ + Zσ_X̄
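To see how each of the three factors moves the interval width, here is a minimal sketch. The baseline values (σ = 10, n = 100, 95% confidence) are hypothetical numbers chosen for illustration, and `interval_width` is an assumed helper name.

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, level=0.95):
    """Full width of the z-based interval: 2 * Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

# Hypothetical baseline, then one factor changed at a time.
base = interval_width(sigma=10.0, n=100)
print(f"baseline (sigma=10, n=100, 95%): {base:.2f}")
print(f"more variation (sigma=20):       {interval_width(20.0, 100):.2f}")
print(f"larger sample (n=400):           {interval_width(10.0, 400):.2f}")
print(f"higher confidence (99%):         {interval_width(10.0, 100, 0.99):.2f}")
```

Doubling σ doubles the width, quadrupling n halves it, and raising the confidence level widens it.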
Determining Sample Size (Cost)

Too big:
• Requires too many resources
Too small:
• Won't do the job

Determining Sample
Size for Mean
What sample size is needed to be 90% confident of being
correct within ± 5? A pilot study suggested that the standard
deviation is 45.

\[ n = \frac{Z^2 \sigma^2}{\text{Error}^2} = \frac{1.645^2 \times 45^2}{5^2} = 219.2 \approx 220 \quad \text{(round up)} \]
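The same calculation as a short sketch, assuming the formula n = Z²σ²/Error² and rounding up to the next whole subject. The function name `n_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, error, level):
    """Sample size needed to estimate a mean to within +/- error (sigma known)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    # Always round up: a fractional subject cannot be sampled.
    return math.ceil((z * sigma / error) ** 2)

# Pilot-study figures from the slide: sigma = 45, error = 5, 90% confidence.
print(n_for_mean(sigma=45, error=5, level=0.90))
```

With the slide's inputs the result is 220, matching the rounded-up value of 219.2.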
Sample Size Determination

• An essential part of planning any study is to decide how many people need to be studied.
• Sample size: the number of study subjects selected to represent a given study population.
• Important for making inferences based on the findings from the sample.
• Should be sufficient to represent the characteristics of interest of the study population.
Sample Size (continued)
• "What size sample do I need?"
• The answer to this question is influenced by a number of factors, including the purpose of the study, the population size, the risk of selecting a "bad" sample, and the allowable sampling error.
Sample Size (continued)
• Generally, sample size determination depends on the:
– objective of the study;
– design of the study;
– plan for statistical analysis;
– accuracy of the measurements to be made;
– degree of precision required for generalization;
– degree of confidence with which to conclude.
• In planning any investigation we must decide how many people need to be studied in order to answer the study objectives.
• If the study is too small, we may fail to detect important effects, or may estimate effects too imprecisely.
• If the study is too large, we will waste resources.
• In general, beyond a certain point it is much better to increase the accuracy of data collection (by improving the training of data collectors and the data collection tools) than to increase the sample size.
• The eventual sample size is usually a compromise between what is desirable and what is feasible.
• The feasible sample size is determined by the availability of resources.
• It is also important to remember that resources are needed not only to collect the information, but also to analyze it.
Sample size (continued)

When deciding on sample size, precision must be balanced against cost:

Larger sample size = higher precision = higher cost


Sample size (continued)
• The feasible sample size is also determined by
the availability of resources:
– time
– manpower
– transport
– available facility, and
– money
Sample size (continued)
• In addition to the purpose of the study and the population size, three criteria usually need to be specified to determine the appropriate sample size:
– The level of precision
– The level of confidence
– The degree of variability in the attributes being measured
1. The Level of Precision
• The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be. This range is often expressed in percentage points (e.g., ±5 percent).
• The absolute precision (d) is half the width of the confidence interval: d = Z_{α/2} × SE
– where SE is the standard error of the estimator of the parameter of interest
– d = precision of the estimate (how close do you want to be? True value = sample value ± d)
• Precision: how close our estimate is expected to be to the true value, conventionally expressed with a 95% confidence interval.
• A 95% confidence interval corresponds to a significance level of α = 0.05.
The Level of Precision
• Have an idea of how precise (narrow) you want the confidence interval to be.
• Need more people in the sample when:
– the prevalence is closer to 0.5 (50%) or, if the outcome is continuous, the SD is larger
– you want a narrower confidence interval

Example
Want to estimate lung cancer prevalence in a given population aged 65+
• Planning a cross-sectional survey
• Think prevalence is about 8%
• Want to estimate it to within 2% of the truth (with 95% confidence)
→ 706 people
• Think prevalence will be ~10% and want to estimate it to within 2% (with 95% confidence)
→ 864 people
• Prevalence ~10% and estimate within 1%
→ 3,457 people
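A sketch of how figures like these can be obtained, assuming the single-proportion formula n = Z² p(1 − p)/d² with Z = 1.96. The helper name `n_single_proportion` is hypothetical, and the printed values may differ slightly from the slide figures depending on how the result is rounded.

```python
import math

def n_single_proportion(p, d, z=1.96):
    """Unrounded sample size for estimating a proportion p to within +/- d."""
    return z ** 2 * p * (1.0 - p) / d ** 2

# (assumed prevalence, desired precision) for the three scenarios above.
scenarios = [(0.08, 0.02), (0.10, 0.02), (0.10, 0.01)]
for p, d in scenarios:
    n = n_single_proportion(p, d)
    print(f"p = {p:.2f}, d = {d:.2f}: n = {n:.1f} (round up to {math.ceil(n)})")
```

Note that halving d from 2% to 1% multiplies the required sample size by four, because d enters the formula squared.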
The Level of Precision
• Suppose a family planning campaign survey was conducted and reported by the media.
• If a researcher finds that 60% of mothers in the sample have adopted a recommended practice with a precision of ±5%, then he or she can conclude that between 55% and 65% of mothers in the population have adopted the practice.
The Confidence Level
• Irrespective of the shape of the underlying distribution of the population, sample means and proportions will approximate normal distributions if the sample sizes are sufficiently large.
• The confidence or risk level is based on ideas encompassed under the Central Limit Theorem.
The Confidence Level
• The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value.
• In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value (e.g., mean).
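Both points can be illustrated with a small simulation. This sketch assumes, for illustration only, a deliberately skewed population (exponential with mean 1 and standard deviation 1) and repeated samples of size 50; the function name `share_within_two_se` is hypothetical.

```python
import math
import random
from statistics import fmean

def share_within_two_se(n=50, trials=5000, seed=7):
    """Simulate repeated sampling from a skewed population.

    Returns the average of the sample means and the share of sample means
    lying within two standard errors of the true population mean.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0          # mean and SD of an exponential(1) population
    se = sigma / math.sqrt(n)     # standard error of the sample mean
    means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]
    grand_mean = fmean(means)
    within = sum(abs(m - mu) <= 2 * se for m in means) / trials
    return grand_mean, within

grand_mean, within = share_within_two_se()
print(f"average of sample means: {grand_mean:.3f}")
print(f"share within 2 standard errors: {within:.3f}")
```

Even though the population is far from normal, the average of the sample means should sit very close to 1, and roughly 95% of the sample means should fall within two standard errors of it.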

Degree of Variability
• The third criterion, the degree of variability in the attributes being measured, refers to the distribution of attributes in the population.
• The more heterogeneous a population, the larger the sample size required to obtain a given level of precision.
Degree of Variability
• Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%. This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest.
• Because a proportion of 0.5 indicates the maximum variability in a population, it is often used in determining a more conservative sample size, that is, the sample size may be larger than if the true variability of the population attribute were used.
Strategies For Determining Sample Size
1. Sample Size: Single Sample
• The aim is to have a large enough sample with which to estimate a population mean or proportion within a narrow interval with high reliability.
• Concerned with the precision of the estimate ("narrowness of the CI"): estimate ± d units
Estimating the number needed to estimate a single proportion

\[ n = \frac{(z_{\alpha/2})^2 \, p(1-p)}{d^2} \]

n = number required
p = population proportion (you need to guess this; if you knew it, you would not need to do the calculation!)
d = precision of the estimate (how close do you want to be?)
Sample size for a single sample includes:
A. Sample size for estimating a single population mean
B. Sample size for estimating a single population proportion
Using Formulas to Calculate a Sample Size

\[ n = \frac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2} = 384 \]

where n is the minimum sample size required (here with Z = 1.96, p = 0.5 and d = 0.05).
1. Suppose that you are interested in knowing the proportion of infants who are breastfed beyond 18 months of age in a rural area. Suppose that in a similar area, the proportion (p) of breastfed infants was found to be 0.20. What sample size is required to estimate the true proportion within ±3 percentage points with 95% confidence?
Let p = 0.20, d = 0.03, α = 5%
• Suppose there is no prior information about
the proportion (p) who breastfeed
• Assume p=q=0.5 (most conservative)
• Then the required sample size increases
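Both scenarios can be worked through with the single-proportion formula, taking Z = 1.96 for 95% confidence and rounding up. The figures below are computed here as an illustration and are not taken from the original slides.

```latex
\begin{align*}
p = 0.20:\quad n &= \frac{(1.96)^2 (0.20)(0.80)}{(0.03)^2} = \frac{0.6147}{0.0009} \approx 683 \\
p = 0.50:\quad n &= \frac{(1.96)^2 (0.50)(0.50)}{(0.03)^2} = \frac{0.9604}{0.0009} \approx 1068
\end{align*}
```

Without prior information, the conservative choice p = 0.5 requires roughly 385 more subjects for the same precision.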
An estimate of p is not always available.
• However, the formula may also be used for sample size calculation based on various assumptions for the values of p (here with 95% confidence and d = 0.05):
p = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
p = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
p = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
p = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
p = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
p = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246
• For a fixed absolute precision (d), the required sample size increases as p increases from 0 to 0.5, and then decreases in the same way as the prevalence approaches 1.
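The pattern in this table can be reproduced with a few lines. This is a sketch assuming Z = 1.96 and d = 0.05 as in the rows above; `n_single_proportion` is a hypothetical helper name, and values are printed unrounded so the rise and fall around p = 0.5 is easy to see.

```python
def n_single_proportion(p, d, z=1.96):
    """Unrounded sample size for estimating a proportion p to within +/- d."""
    return z ** 2 * p * (1.0 - p) / d ** 2

# Assumed proportions from the table, with absolute precision d = 0.05.
for p in (0.1, 0.2, 0.3, 0.5, 0.7, 0.8):
    print(f"p = {p:.1f}: n = {n_single_proportion(p, d=0.05):.1f}")
```

Because the formula depends on p only through p(1 − p), the results are symmetric about 0.5 and largest there, which is why p = 0.5 is used as the conservative default.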
2. A survey is planned to determine what proportion of medical students regularly chew khat. If no estimate of p is available and a pilot sample cannot be drawn, what sample size would be required if 95% confidence is desired and d = 0.04 is to be used?
Ans: 600 students
2. Sample Size: Two Samples
A. Estimation of the difference between two
population means
B. Estimation of the difference between two
population proportions
Comparison of two proportions
• Sample size in each region:

\[ n = \frac{(p_1 q_1 + p_2 q_2)\, f(\alpha,\beta)}{(p_1 - p_2)^2}, \qquad q_i = 1 - p_i \]

• α = type I error (level of significance)
• β = type II error (1 − β = power of the study)
• power = the probability of getting a significant result when a true difference exists
• f(α,β) = 10.5 when the power = 90% and the level of significance = 5%
• E.g. The proportion of nurses leaving the health service is compared between two regions. In one region 30% of nurses are estimated to leave the service within 3 years of graduation. In the other region it is probably 15%.
Solution
• The required sample to show, with a 90% likelihood (power), that the percentage of nurses leaving the service is different in these two regions would be:
• (assume a confidence level of 95%)
• n = (1.28 + 1.96)² ((0.30 × 0.70) + (0.15 × 0.85)) / (0.30 − 0.15)² = 158
• 158 nurses are required in each region
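The nurse example can be verified with a short sketch. It assumes f(α,β) = (z_α + z_β)² with the z values used on the slide (1.96 for 5% significance, 1.28 for 90% power); the function name `n_two_proportions` is hypothetical.

```python
import math

def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.28):
    """Unrounded sample size per group for comparing two proportions."""
    f = (z_alpha + z_beta) ** 2                  # f(alpha, beta), about 10.5 here
    variability = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return f * variability / (p1 - p2) ** 2

# 30% of nurses leave in one region versus 15% in the other.
n = n_two_proportions(0.30, 0.15)
print(f"unrounded: {n:.2f}, per region: {math.ceil(n)}")
```

The unrounded value is about 157.5, so rounding up gives the 158 nurses per region stated above.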
• Comparison of two means (sample size in each group)

\[ n = \frac{(s_1^2 + s_2^2)\, f(\alpha,\beta)}{(m_1 - m_2)^2} \]

• m1 and s1² are the mean and variance of group 1, respectively.
• m2 and s2² are the mean and variance of group 2, respectively.
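A minimal sketch of this formula follows, again assuming f(α,β) = (z_α + z_β)². The input values (means of 120 and 115, variance of 100 in each group) are hypothetical and chosen only to illustrate the calculation; the function names are also assumptions.

```python
import math
from statistics import NormalDist

def f_alpha_beta(alpha=0.05, power=0.90):
    """f(alpha, beta) = (z for two-sided alpha + z for the desired power) squared."""
    nd = NormalDist()
    return (nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)) ** 2

def n_two_means(m1, m2, var1, var2, alpha=0.05, power=0.90):
    """Sample size per group for comparing two means, rounded up."""
    return math.ceil((var1 + var2) * f_alpha_beta(alpha, power) / (m1 - m2) ** 2)

print(f"f(0.05, 0.10) = {f_alpha_beta():.2f}")
# Hypothetical inputs: means 120 vs 115, variance 100 in each group.
print(n_two_means(m1=120.0, m2=115.0, var1=100.0, var2=100.0))
```

With these illustrative inputs the sketch gives 85 subjects per group; a smaller difference between means or larger variances would increase that number.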
A. Sample size for estimating a difference in two means
• Aim: Estimate μ1 − μ2
• Want: within ± d units, where d = Z_{α/2} × SE (95% CI of width w = 2d)
• If equal sample size in both groups is required, then:

\[ n = \frac{Z_{\alpha/2}^2 \,(\sigma_1^2 + \sigma_2^2)}{d^2} \quad \text{in each group} \]

• Use σ1², σ2² or estimate using s1² and s2²
B. Sample size for estimating a difference in two proportions
• Aim: Estimate p1 − p2
• Want: within ± d units, where d = Z_{α/2} × SE (95% CI of width w = 2d)
• If equal sample sizes in both groups, then:

\[ n = \frac{Z_{\alpha/2}^2 \left[ p_1(1-p_1) + p_2(1-p_2) \right]}{d^2} \quad \text{in each group} \]

• Use estimates of p1, p2 (or p1 = p2 = 0.5 if unknown)
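Cases A and B can be sketched together. The formulas used here follow from setting d = Z × SE with equal group sizes, so that SE² = (σ1² + σ2²)/n for means and [p1(1 − p1) + p2(1 − p2)]/n for proportions; this derivation and the function names are assumptions for illustration, and the numeric inputs are hypothetical.

```python
import math

def n_diff_means(sigma1, sigma2, d, z=1.96):
    """Per-group n to estimate mu1 - mu2 to within +/- d (equal group sizes)."""
    return math.ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / d ** 2)

def n_diff_proportions(p1, p2, d, z=1.96):
    """Per-group n to estimate p1 - p2 to within +/- d (equal group sizes)."""
    return math.ceil(z ** 2 * (p1 * (1.0 - p1) + p2 * (1.0 - p2)) / d ** 2)

# Hypothetical inputs: SD of 10 in each group, difference wanted within 2 units.
print(n_diff_means(sigma1=10.0, sigma2=10.0, d=2.0))
# No prior estimates, so p1 = p2 = 0.5, difference wanted within 5 points.
print(n_diff_proportions(p1=0.5, p2=0.5, d=0.05))
```

With these inputs the sketch gives 193 per group for the means and 769 per group for the proportions.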
Points for Consideration
1. Sample size estimates might need to be adjusted to compensate for
non-response rate, patient dropout or loss to follow-up, lack of
compliance, etc.
2. If sampling is from a finite population of size N, then:

\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} \]

where n0 is the sample size from an infinite population. When N is large in comparison to n (i.e., n/N ≤ 0.05), the finite population correction may be ignored.

3. Design effect for complex cluster sampling. Common values: multiply n by 2, 3, … 5.
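The three adjustments above can be combined in one small helper. This is a sketch under stated assumptions: the order of the steps (finite population correction, then design effect, then non-response) and the non-response inflation n / (1 − rate) are common conventions rather than something the slides specify, and the function name and example inputs are hypothetical.

```python
import math

def adjust_sample_size(n0, population=None, design_effect=1.0, nonresponse=0.0):
    """Adjust an initial sample size n0 for practical considerations.

    Assumed order: finite population correction, then design effect,
    then inflation for the expected non-response rate.
    """
    n = float(n0)
    if population is not None:
        n = n / (1.0 + n / population)   # n = n0 / (1 + n0 / N)
    n *= design_effect                   # e.g. 2 for cluster sampling
    n /= (1.0 - nonresponse)             # assumed convention: divide by response rate
    return math.ceil(n)

# Hypothetical inputs: n0 = 384 from an infinite-population formula.
print(adjust_sample_size(384, population=2000))
print(adjust_sample_size(384, design_effect=2.0, nonresponse=0.10))
```

For a population of 2,000 the correction reduces 384 to 323, while a design effect of 2 with 10% expected non-response raises it to 854.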
