Examples for Ch 8

Confidence Intervals for the Sample Mean with σ Known


Example 1: An auditor takes a random sample of size 36 from a population of 1000 accounts
receivable. The mean value of the accounts receivable for the population is known (from a
previous large survey) to be $2600 with the standard deviation of $450. Find the 95% confidence
interval for the sample mean.
We are not told whether the population is normal or not. But the sample size is large enough to
use the normal approximation. So we will use Z-values to construct the confidence interval. The
Z-value for 95% confidence interval is 1.96. The standard error of the sample mean is 450/√36 =
$75. Therefore the margin of error is 1.96*75 = $147. Therefore, the confidence interval is:
2600 ± 147 or between $2453 and $2747
You can similarly find the 90% and 99% confidence intervals using the corresponding Z-values.
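The calculation in Example 1 can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name z_interval is made up for this sketch and is not part of MegaStat or of these instructions.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Return (lower, upper, margin) for a Z-based interval around `mean`.

    Hypothetical helper for illustration; assumes sigma is known and the
    normal approximation applies.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # Z times the standard error
    return mean - margin, mean + margin, margin


# Example 1: mean $2600, sigma $450, n = 36, at three confidence levels.
for level in (0.90, 0.95, 0.99):
    lower, upper, margin = z_interval(2600, 450, 36, level)
    print(f"{level:.0%}: {lower:.2f} to {upper:.2f} (margin {margin:.2f})")
```

The margin differs slightly from the hand calculation because the code uses the unrounded Z-value rather than 1.96.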
Now suppose I asked you to find the probability that the sample mean will fall within $150 of
the population mean. In this case we are given the margin of error in the units of the X values
and asked to find the probability (confidence level) of the resulting interval. This is the
reverse of the process above, where we were given the confidence level (or probability) and
asked to find the interval.
You can simply convert this margin to a Z-value by dividing by the std. error of $75 for the
sample mean, which gives 150/75 = ±2.
The probability that Z will be between −2 and +2 from the Z tables = 0.9772 − 0.0228 = 0.9544 or
95.44% chance. Thus the confidence level increased slightly from 95% as the range of estimated
interval (or tolerable margin of error) became wider ($150 compared to $147).
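The reverse calculation (margin given, probability wanted) can be checked with a short Python sketch. The function name prob_within is an assumption made for this illustration only.

```python
from math import sqrt
from statistics import NormalDist


def prob_within(margin, sigma, n):
    """Probability that the sample mean falls within `margin` of the population mean.

    Hypothetical helper; assumes the sample mean is approximately normal.
    """
    std_error = sigma / sqrt(n)      # 450 / 6 = 75 in Example 1
    z = margin / std_error           # 150 / 75 = 2
    std_normal = NormalDist()        # standard normal: mean 0, std. dev. 1
    return std_normal.cdf(z) - std_normal.cdf(-z)


print(round(prob_within(150, 450, 36), 4))
```

The result may differ from the table-based 0.9544 in the last decimal place because the Z table values are rounded to four decimals.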
Similarly, I could ask you to find the probability that the sample mean will be less than some
value or greater than some value. But I should not ask the probability that the sample mean will
be exactly equal to some value. Why?

Required Sample size for a specified level of error


The (minimum) required sample size for a random sample is n = (Z_{α/2})² * σ² / E², where Z_{α/2} is
already defined above in the context of confidence intervals (it is the Z value corresponding to
the required confidence level), σ is the population standard deviation (or the standard deviation
of the population from which the sample is taken), and E is the margin of error around the mean
expressed in the units of the X variable. Note that we don't need to know the mean for
determining the sample size. We need only the standard deviation.

Example 2: A personnel department analyst wishes to estimate the mean number of training
hours needed annually for supervisors in a division of the company within the margin of ±3
hours (that is, plus or minus 3 hours) with a 95% confidence level. Based on a large data set from
other similar companies the analyst estimates the standard deviation of required training hours
to be equal to 20 hours. Find the minimum sample size which will give the required estimate
with specified margin of error and level of confidence.
Answer: Here σ = 20 hours, Z_{α/2} = Z_{0.025} = 1.96 (for 95% confidence level), and the margin of
error E = 3 hours. Therefore, n = (1.96*20/3)² = 170.7 or 171 observations (always rounded up).
The required sample size increases as the tolerable margin of error is reduced. Find the required
sample size for margin of error of only 1 hour (I bet it will be about 9 times the sample size we
just obtained, because n is proportional to 1/E²). Similarly, we can find the required sample size
for other confidence levels, such as
90% (with Z_{α/2} = Z_{0.05} = 1.645) and 99% (with Z_{α/2} = Z_{0.005} = 2.576). Clearly the sample size
increases as the desired confidence level increases and conversely. Similarly you see that the
required sample size increases as the standard deviation of the population increases. This makes
sense, because you need a larger sample size to have the same level of confidence if the parent
population involves larger variability, other things remaining the same.
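The sample size formula for the mean can be tried out with the following Python sketch. The function name sample_size_mean is hypothetical, chosen only for this illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, margin, confidence=0.95):
    """Minimum n so that the Z-based margin of error is at most `margin`.

    Hypothetical helper implementing n = (Z * sigma / E)^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)  # always round up


# Example 2: sigma = 20 hours, E = 3 hours, 95% confidence.
print(sample_size_mean(20, 3))
# Tightening the margin to 1 hour raises n roughly 9-fold.
print(sample_size_mean(20, 1))
```

Changing the confidence argument to 0.90 or 0.99 shows how the required sample size moves with the confidence level.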
Example 3: A small town has 1000 families who make contributions to the only local church. A
poll of 144 randomly selected contributing families reveals that the mean annual family
contribution is $500 with a standard deviation of $72. Construct a 95% confidence interval for
the mean annual family contribution for this population of families who contribute to this
particular church.
Do you see a problem with this question? We are not given the population mean or standard
deviation as we were in the previous examples. This may be because there was no previous survey
done for this population. In such cases the sample results are used as surrogates for the unknown
population parameters in the formulas given above if the sample size is adequate. In this case the
sample size 144 is quite large. So we will use $500 and $72 as the surrogates for the unknown
population mean and the standard deviation, respectively. But do we need to use the formula for
large population or small population? At first sight the population of 1000 families seems to be
large. But remember the rule given above. The population (1000) is much smaller than 20 times the
sample (20*144 = 2880). So we will use property #3 to find the standard error. We will use the
formula given for finite population on page 4 of Instructions for Chapter 7.
Therefore,

σ_x̄ = (s/√n) * √((N − n)/(N − 1)) = (72/√144) * √((1000 − 144)/(1000 − 1))

= 6*0.9257 = 5.554

Therefore, a 95% confidence interval would be between 500 − 1.96*5.554 and 500 + 1.96*5.554 or
between 489 and 511 dollars per year (rounded to whole numbers ignoring cents). We could also
use the t distribution (discussed below) to build the confidence interval in this case since the
population standard deviation is not given. But the result would be very close to what we
obtained using Z distribution because the sample size is very large. I will discuss this issue in the
following section.
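The finite population calculation of Example 3 can be sketched in Python as follows. The function name fpc_standard_error is an assumption for this illustration; the correction factor is the usual √((N − n)/(N − 1)).

```python
from math import sqrt
from statistics import NormalDist


def fpc_standard_error(s, n, population_size):
    """Standard error of the sample mean with the finite population correction.

    Hypothetical helper; `s` is the sample standard deviation used as a
    surrogate for the unknown population standard deviation.
    """
    correction = sqrt((population_size - n) / (population_size - 1))
    return (s / sqrt(n)) * correction


# Example 3: N = 1000 families, n = 144, sample mean $500, s = $72.
std_error = fpc_standard_error(72, 144, 1000)
z = NormalDist().inv_cdf(0.975)          # about 1.96 for 95% confidence
lower = 500 - z * std_error
upper = 500 + z * std_error
print(f"standard error = {std_error:.3f}")
print(f"95% interval: {lower:.0f} to {upper:.0f} dollars")
```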

Confidence Intervals for the Sample Mean with σ Not Known


The pdf (probability density function discussed in my previous instructions) for the t-distribution
looks like the curve given below. Note that the t-distribution approaches the Z-distribution more
and more closely as the df gets larger (or equivalently, the sample size gets larger). In most
practical applications the t-distribution is considered to be close enough to the Z-distribution for
df ≥ 30. Therefore, many authors suggest as the practical rule of thumb that for df at least 30
just use the Z-distribution (whose values for the three popular confidence levels are well
known and can be easily memorized) instead of the theoretically required t-distribution for
which we have to look at the table for the corresponding degree of freedom. Also note that
for df infinite the t-distribution exactly coincides with the Z-distribution.

Thus we will use t_{α/2} in place of Z_{α/2} in our calculation of the margin of error and the
confidence interval whenever df is less than 30 and σ is unknown (assuming, however, that
the parent population is normal). We will follow exactly the same steps (shown above) as
the case when σ is known, except that we replace σ by s and Z by t. For df greater than or
equal to 30 it is a matter of the researcher's choice. Theoretically t would be more accurate than
Z, but that would involve reading from the t-table instead of using the popularly known
Z-values. So it is up to you which one to use.
Example 4: The sample mean operating life for a random sample of 16 light bulbs of a particular
brand is calculated to be 4000 hours with the sample standard deviation of 200 hours. The
operating life of bulbs is generally assumed to be approximately normal. Estimate the mean
operating life for the population of bulbs from which the sample is taken using a 95% confidence
interval.
Here n = 16, df = 15, the population is normal (approximately) and the population standard
deviation is not given. Therefore, we will use the t-distribution instead of the Z-distribution to
construct the confidence interval. We are given x̄ = 4000 hours and s = 200 hours. Therefore, the
standard deviation (or standard error) of the sample mean, denoted by σ_x̄ (from my previous
Instructions), is given by σ_x̄ = s/√n, where we have replaced σ by s.

Or σ_x̄ = 200/√16 = 50 hours. Now the confidence level required is 95%. So α = .05 and α/2 = .025.

Therefore, we need to find t.025 from the table for df = 15. This value is 2.131. Next,
The margin of error = t_{α/2}*σ_x̄ = 2.131*50 = 106.55 hours. Therefore,
The 95% confidence interval for the mean is x̄ ± t_{α/2}*σ_x̄ = x̄ ± t_{α/2}*(s/√n) =
4000 ± 106.55 or between 3893.45 and 4106.55 hours or between 3893 and 4107 hours rounded
(because the numbers are very large we can ignore the decimals and round to the nearest whole
number). If we had neglected the fact that the population standard deviation is not known and the
sample size is quite small (consequently the df is small), then we would be estimating a narrower
interval which would be questionable because it would be claiming more precision than
warranted by the nature of the sample.
Now can you build 90% and 99% confidence interval estimates of the mean life of bulbs for this
sample? (Hint: look for t.050 and t.005, respectively).
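The t-based interval of Example 4 can be sketched in Python. The Python standard library has no t-distribution, so this sketch assumes the critical value is read from the t-table and passed in; the function name t_interval is hypothetical.

```python
from math import sqrt


def t_interval(sample_mean, s, n, t_critical):
    """Return (lower, upper, margin) for a t-based interval.

    Hypothetical helper; `t_critical` must be looked up in the t-table
    for df = n - 1 and the chosen confidence level.
    """
    std_error = s / sqrt(n)          # 200 / 4 = 50 hours in Example 4
    margin = t_critical * std_error
    return sample_mean - margin, sample_mean + margin, margin


# Example 4: x-bar = 4000, s = 200, n = 16, t.025 for df = 15 is 2.131.
lower, upper, margin = t_interval(4000, 200, 16, t_critical=2.131)
print(f"margin = {margin:.2f} hours, interval = {lower:.2f} to {upper:.2f}")
```

For the 90% and 99% intervals, pass the table values of t.050 and t.005 for df = 15 in place of 2.131.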
Use of Computer
For the example of the auditor's sample of accounts receivable (Example 1 above):
95% confidence level
2600 mean
450 std. dev.
36 n
1.960 z
146.997 half-width
2746.997 upper confidence limit
2453.003 lower confidence limit

You can also find the required sample size for a given level of confidence and specified tolerable
margin of error. Let us work on the second example of this instruction using MegaStat.

Example 2 from above: A personnel department analyst wishes to estimate the mean number of
training hours needed annually for supervisors in a division of the company within the margin of
±3 hours (that is, plus or minus 3 hours) with a 95% confidence level. Based on a large data set from
other similar companies the analyst estimates the standard deviation of required training hours
to be equal to 20 hours. Find the minimum sample size which will give the required estimate
with specified margin of error and level of confidence.
Go to MegaStat, select Confidence interval/Sample size, then select Sample size-mean in the
dialogue box, and fill 3 for E and 20 for std deviation. Then the sample size for 95%
confidence level is given by MegaStat as:
Sample size - mean
3 E, error tolerance
20 standard deviation
95% confidence level
1.960 z
170.732 sample size
171 rounded up

This is exactly the same answer we derived above using the formula. I want you to learn
everything using formula as well as computer. Learning only one way will be half knowledge.
Similarly, you can use MegaStat to find confidence intervals using the t-distribution. Let us solve
Example 4 of this instruction using MegaStat.
Example 4 from above: The sample mean operating life for a random sample of 16 light bulbs
of a particular brand is calculated to be 4000 hours with the sample standard deviation of 200
hours. The operating life of bulbs is generally assumed to be approximately normal. Estimate the
mean operating life for the population of bulbs from which the sample is taken using a 95%
confidence interval.
In this case we will select t instead of Z in the dialogue box and get:
Confidence interval - mean
95% confidence level
4000 mean
200 std. dev.
16 n
2.131 t (df = 15)
106.572 half-width
4106.572 upper confidence limit
3893.428 lower confidence limit

Section B: Confidence Intervals for the Proportions


Example 5: A large population of older homes is known to have defective wiring in 30 percent
of such homes (from previous survey by the company responsible for servicing). A fresh random
sample of 250 homes from this population is collected to study this problem. Find the 95%
confidence interval estimate for the proportions.
The population proportion π is given in this case. So we will use it to find the standard error and
the confidence interval:
σp = √(π(1 − π)/n) = √(0.3*0.7/250) = 0.029

Therefore, the 95% confidence interval is 0.3 ± 0.029*1.96 = 0.3 ± 0.057 (rounded to three
decimals) or between 0.243 and 0.357 (also found in previous instructions). We could have used
MegaStat to find this (select proportion in the dialogue box instead of sample mean) as follows:
Confidence interval -proportion
95% confidence level
0.3 proportion
250 n
1.960 z
0.057 half-width
0.357 upper confidence limit
0.243 lower confidence limit

Sometimes the population proportion is not given. Then we have to work with the estimated
sample proportion only as in the following example.
Example 6: A sample of 75 retail in-store purchases showed that 24 paid in cash. Construct a
95% confidence interval for the proportion of all retail in-store purchases that are paid in cash.
Here the population proportion is not given. The sample p = 24/75 = 0.32 and n = 75. Thus np = 24
and n(1−p) = 51. Therefore, normal approximation can be satisfactorily applied. Do we need
continuity correction? We have np(1−p) = 16.3 > 10. So we don't need continuity correction. We
get σp = √{(.32)(.68)/75} = 0.0539. Now replacing π by p we get the 95% confidence interval
as: p ± Z_{α/2}*σp = 0.32 ± 1.96*0.0539 = 0.32 ± 0.1056 or between 0.2144 and 0.4256.
Using MegaStat
Confidence interval - proportion
95% confidence level
0.32 proportion
75 n
1.960 z
0.106 half-width
0.426 upper confidence limit
0.214 lower confidence limit
You can easily find other confidence intervals.
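The proportion intervals of Examples 5 and 6 can be reproduced with a short Python sketch. The function name proportion_interval is made up for this illustration, and the sketch assumes the normal approximation without continuity correction, as in the examples.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p, n, confidence=0.95):
    """Return (lower, upper, margin) for a Z-based interval around proportion `p`.

    Hypothetical helper; `p` may be a known population proportion or a
    sample proportion used as its surrogate.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p * (1 - p) / n)
    margin = z * std_error
    return p - margin, p + margin, margin


# Example 5: proportion 0.30, n = 250.
print([round(v, 3) for v in proportion_interval(0.30, 250)])
# Example 6: 24 cash purchases out of 75.
print([round(v, 3) for v in proportion_interval(24 / 75, 75)])
```

Passing confidence=0.90 or confidence=0.99 gives the other commonly used intervals.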

Required Sample Size in the case of Proportions for given Accuracy


Example 8: An opinion poll last month suggested that 40% of people would vote for candidate
A. This month a new poll is being rerun. How many people must be interviewed for the poll to
be within 2 percentage points of actual voting intentions with a 95% level of confidence?
Since we are not given any other information we will assume that the previous poll estimates are
the surrogates for the population parameters (assuming that the previous sample was sufficiently
large).
The formula for required sample size is n = {(Z_{α/2})² * π(1 − π)}/E².
Note that the symbol E stands for the desired margin of error, or tolerable
deviation from the mean value (expressed as a decimal). The value of Z is 1.96 for 95% confidence,
1.645 for 90% and 2.576 for 99% confidence levels. The value π is the proportion in the
population. You see that the required sample size grows with the square of the factor by which the
margin of error is reduced (that is, as more accuracy is desired).
For the above example: n = ((1.96)²*0.4*0.6)/(0.02)² = ((1.96)²*0.24)/(0.02)² = 2304.96 or 2305
rounded. Thus for the desired accuracy the new poll has to interview 2305 voters. This should give
you an idea why the polls generally don't strive for such accuracy but settle for a margin of 4 or
5 percentage points.
If in the above example the desired margin of error is 4 percentage points, then the sample size
would reduce by a factor of 4 (less than 580 interviews required). If a 1 percentage point margin
is set the sample size would jump to more than 9000. The next significant factor which affects
the sample size is the confidence level (the higher the confidence level the larger the
required sample size, other things remaining the same). But this does not have as dramatic
impact as the margin of error. The third factor is π. The farther it is from 0.5, the lower is the
required sample size. Does this make sense? Of course! If a population is almost equally divided
on some issue or candidate you need a larger sample to find which one is going to win. If the
population is highly biased on some issue or candidate you can easily find out the result even
with a smaller sample.
You can easily get the required sample size for proportions using the MegaStat by selecting
Sample size-p in the dialogue box and specifying the other parameters in the dialogue box.
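The sample size formula for proportions can also be checked with a Python sketch. The function name sample_size_proportion is hypothetical and used only for this illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(proportion, margin, confidence=0.95):
    """Minimum n so that the margin of error for a proportion is at most `margin`.

    Hypothetical helper implementing n = Z^2 * pi * (1 - pi) / E^2, rounded up.
    `proportion` and `margin` are expressed as decimals (0.02 = 2 points).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


# Poll example: previous estimate 40%, margins of 2, 4 and 1 percentage points.
for margin in (0.02, 0.04, 0.01):
    print(margin, sample_size_proportion(0.40, margin))
```

Trying proportion=0.5 and proportion=0.1 with the same margin shows that the required sample size is largest when the population is evenly divided.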
