
Estimation - 1

CHAPTER 1 REVIEW OF BASIC CONCEPTS

3. INTERVAL ESTIMATION

A central task in statistics is to use sample statistics to estimate unknown population parameters, such as
the population mean or the population proportion. There are two approaches to estimation: point estimation
and interval estimation.

Point Estimation
In point estimation, we use a single numerical value to estimate the value of a population parameter. For
example, we may want to estimate the value of an unknown population mean. Possible candidates for this
purpose include the sample mean, the sample median, and the sample mode. The question then becomes which
of these candidates is the best estimator of the population mean. Similarly, among the candidates for a point
estimator of the population proportion, which one is the best?

Statisticians use the sample mean to estimate the population mean, the sample proportion to estimate the
population proportion, and the sample standard deviation to estimate the population standard deviation. This
implies that statisticians consider these sample statistics the best estimators of the corresponding population
parameters. But why? By what standards are these sample statistics considered better than other estimators?
Among the criteria that statisticians use to choose a point estimator, the following are the most important:

1. Unbiased Estimator A sample statistic is called an unbiased estimator of a population parameter if the
mean of all possible values of that sample statistic is equal to the value of that parameter.

Recall our previous example about the distribution of sample means. We found that the mean of the
sample means is always equal to the population mean. Hence, the sample mean is an unbiased estimator of
the population mean. Similarly, it can be shown that the sample proportion is an unbiased estimator of the
population proportion, and that the sample variance (computed with the divisor $n - 1$) is an unbiased
estimator of the population variance. Strictly speaking, the sample standard deviation is a slightly biased
estimator of the population standard deviation, although the bias shrinks as the sample size grows. You
should know that the sample mean may not be the only unbiased estimator of the population mean. For
unimodal and symmetric distributions, the sample median is also an unbiased estimator of the population
mean.
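The definition of unbiasedness can be checked exactly on a small case. The sketch below assumes a hypothetical four-value population and lists every possible sample of size 2 drawn with replacement (all equally likely); the population values and helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Hypothetical toy population; exact fractions avoid rounding error.
population = [Fraction(v) for v in (2, 4, 6, 8)]
n = 2  # sample size

def mean(values):
    return sum(values) / len(values)

def variance(values, divisor):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / divisor

# All 16 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

mu = mean(population)                            # population mean
sigma2 = variance(population, len(population))   # population variance

mean_of_means = mean([mean(s) for s in samples])
mean_of_s2 = mean([variance(s, n - 1) for s in samples])    # divisor n - 1
mean_of_biased = mean([variance(s, n) for s in samples])    # divisor n
mean_of_s = mean([sqrt(variance(s, n - 1)) for s in samples])

print(mean_of_means == mu)       # True: the sample mean is unbiased
print(mean_of_s2 == sigma2)      # True: the sample variance (n - 1) is unbiased
print(mean_of_biased < sigma2)   # True: dividing by n underestimates on average
print(mean_of_s < sqrt(sigma2))  # True: the sample standard deviation runs low
```

Averaging over all possible samples, rather than simulating, mirrors the wording of the definition: "the mean of all possible values of that sample statistic".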

2. Consistency An unbiased estimator is said to be consistent if the average squared difference between the
estimator and the parameter shrinks toward zero as the sample size becomes larger.

Again, recall our previous example on the distribution of sample means. We found that the standard
error of the sample mean is equal to $\sigma/\sqrt{n}$, where $n$ is the sample size and $\sigma$ is the
population standard deviation. Hence, as the sample size becomes larger, the average distance between the
sample means and the population mean (measured by the standard error of the sample mean) becomes smaller.
Therefore, the sample mean is a consistent estimator of the population mean. Similarly, the sample
proportion is a consistent estimator of the population proportion.
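As a rough illustration of consistency, the following sketch simulates the average squared difference between the sample mean and the population mean for increasing sample sizes. The normal population, its parameters, the seed, and the function name are all assumptions chosen for the example.

```python
import random

def mean_squared_error(n, mu=10.0, sigma=2.0, reps=3000, seed=7):
    """Average squared gap between the sample mean and mu over many samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += (xbar - mu) ** 2
    return total / reps

# The simulated value should track the theoretical sigma^2 / n.
for n in (4, 16, 64):
    print(n, round(mean_squared_error(n), 4), 2.0 ** 2 / n)
```

Each fourfold increase in the sample size should cut the average squared difference to roughly a quarter, which is the pattern the standard error formula predicts.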

3. Relative Efficiency If there are two unbiased estimators of a parameter, the one whose variance is smaller
is said to be relatively more efficient. For example, the sample mean and the sample median are both
unbiased estimators of the population mean when the population is normal. However, the variance of the
sample mean, $\sigma^2/n$, is smaller than the variance of the sample median, which is approximately
$1.57\,\sigma^2/n$ for large $n$. Hence, the sample mean is relatively more efficient than the sample
median as an estimator of the population mean.
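The factor of about 1.57 can be checked by simulation. This is only a sketch: it assumes a standard normal population, a sample size of 101, and an arbitrary seed, and the function name is made up for the example.

```python
import random
from statistics import median, pvariance

def simulate(n=101, reps=4000, mu=0.0, sigma=1.0, seed=42):
    """Return the simulated variances of the sample mean and the sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return pvariance(means), pvariance(medians)

var_mean, var_median = simulate()
# The ratio should land near pi/2 (about 1.57), up to simulation noise.
print(round(var_median / var_mean, 2))
```

A ratio well above 1 means the sample median needs a noticeably larger sample than the sample mean to reach the same precision when the population is normal.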

Statisticians argue that a good estimator should satisfy as many of the above criteria as possible. Compared
with other candidates, the sample mean, sample proportion, sample variance, and sample standard deviation
perform well on these criteria.

Interval Estimation
Point estimation is attractive because it gives a very sharp estimate of the unknown population parameter.
However, it has two major drawbacks:

1. With only a single point, it is unlikely that the point estimate will actually hit the target. For example,
when we use a sample with mean equal to 10 to estimate the population mean, it is unlikely that the
population mean will be exactly equal to 10. (You may get very close, but it’s hard to be exactly right.)
2. A point estimate doesn’t tell us how confident we can be in the estimate. For example, when we use a
sample mean of 10 to estimate a population mean, how confident are we in this estimate?

Mainly for these two reasons, statisticians often use an interval, rather than a single point, to make an
estimate. For example, instead of estimating the population mean to be 10, we may estimate it to be
somewhere between 8 and 12. This interval, from 8 to 12, is our interval estimate of the population mean.

How wide should a confidence interval be?


This depends on how much confidence you want to have in your estimate. If you want to be more confident,
you should use a wider interval. However, a wider interval also means a less precise estimate. In fact, if
the interval becomes too wide, the estimate may become meaningless. So, to make your estimate more precise,
you should make your interval narrower. By doing that, however, you will be less confident in your estimate.
There is always a tradeoff between the precision and the confidence level of your estimate. For a fixed
sample size, you cannot have both a high degree of precision and a high level of confidence. By increasing
the sample size, however, you can gain more confidence with a fixed interval length. Similarly, with a larger
sample size, you can have a narrower (more precise) interval at a fixed confidence level.
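The tradeoff can be made concrete with a small calculation. The sketch below assumes the normal-based interval for a mean with known population standard deviation, whose total width is 2 · z · σ/√n; the value σ = 10 and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(conf, sigma, n):
    """Total width of a normal-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / sqrt(n)

# Higher confidence widens the interval; a larger sample narrows it.
for conf in (0.80, 0.90, 0.95, 0.99):
    widths = [round(interval_width(conf, 10, n), 2) for n in (25, 100, 400)]
    print(conf, widths)
```

Reading across a row shows the effect of the sample size at a fixed confidence level; reading down a column shows the effect of the confidence level at a fixed sample size.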

Confidence Interval for $\mu$ When $\sigma$ Is Known


How do we construct an interval so that we can have a certain level of confidence that the true population
mean will fall in the interval?
- Recall that the distribution of the sample mean will be normal, with mean equal to the population mean
$\mu$ and standard error equal to $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, when the sample size is sufficiently
large or when the population is normal.
- With this information in mind, we know, for example, that 95% of sample means should be within
$\pm 1.96$ standard errors of the population mean.
- In other words, if we draw a random sample, there is a 95% chance that the sample mean will be within
$\pm 1.96$ standard errors of the true population mean.
- Now, if we construct an interval that is $2 \times 1.96$ standard errors wide ($\pm 1.96$ makes it equal to
$2 \times 1.96$) and is centered at the sample mean, how likely is it that the interval will contain the true
population mean?

If you can answer the last question, then you should know that a 95% confidence interval for the population
mean will be $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$. By the same reasoning, a 90% confidence interval for $\mu$
will be $\bar{x} \pm 1.645\,\sigma_{\bar{x}}$, and a 99% confidence interval for $\mu$ will be
$\bar{x} \pm 2.58\,\sigma_{\bar{x}}$. (Why?)
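One way to check the answer to the question above is to simulate it. The sketch below assumes a normal population with hypothetical parameters, builds an interval of half-width z · σ/√n around each simulated sample mean, and counts how often the interval contains the true mean; the function name and defaults are illustrative.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, conf=0.95, reps=5000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Should come out close to 0.95, up to simulation noise.
print(coverage(mu=50, sigma=10, n=25))
```

The interval covers the population mean exactly when the sample mean lies within the half-width of it, so the long-run coverage matches the chosen confidence level.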

In summary, to construct a $100(1-\alpha)\%$ confidence interval for $\mu$, we use the following formula:

$$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \tag{1}$$

where $(1-\alpha)$ is the “confidence level” and $Z_{\alpha/2}$ is the value of $Z$ such that
$P(Z \ge Z_{\alpha/2}) = \alpha/2$. Notice that, to use formula (1), it is required that either (a) the
population is normal or (b) the sample size is sufficiently large.

In Excel, the function “NORMSINV” can be used to find the critical value of Z. If $(1-\alpha)$ is the
confidence level, then entering “=NORMSINV(1 – $\alpha$/2)”, with the numerical value of $\alpha$ filled in,
will return the desired critical Z value. For example, the Z value needed to construct a 95% confidence
interval can be found by entering the formula “=NORMSINV(0.975)”, which returns the value 1.96. Similarly,
the Z value required for a 99% confidence interval can be found by using the formula “=NORMSINV(0.995)”,
which returns the value 2.58.
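For readers working outside Excel, formula (1) can be sketched in Python using the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of NORMSINV here. The function names `z_critical` and `z_interval` are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(conf):
    """Critical value Z_{alpha/2} for a given confidence level (1 - alpha)."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, conf):
    """Confidence interval for the mean when sigma is known, as in formula (1)."""
    margin = z_critical(conf) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

print(round(z_critical(0.95), 2))   # 1.96
print(round(z_critical(0.99), 2))   # 2.58

# Sample of size 100, sigma = 5, sample mean = 80, 90% confidence.
low, high = z_interval(80, 5, 100, 0.90)
print(round(low, 2), round(high, 2))   # 79.18 80.82
```

Like formula (1) itself, the sketch presumes a normal population or a sufficiently large sample.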

Examples:

1. Suppose a sample of size 9 was taken from a normal population with $\sigma$ = 3, and it was found that the
sample mean = 20.
a) Find a 90% confidence interval for $\mu$. (18.36-21.64)
b) Find a 95% confidence interval for $\mu$. (18.04-21.96)
c) Find a 99% confidence interval for $\mu$. (17.42-22.58)
d) Find an 80% confidence interval for $\mu$. (18.72-21.28)

2. Suppose a sample of size 100 was taken from a population with $\sigma$ = 5, and it was found that the
sample mean = 80.
a) Find a 90% confidence interval for $\mu$. (79.18-80.82)
b) Find a 95% confidence interval for $\mu$.
c) Find a 99% confidence interval for $\mu$.
d) Find an 80% confidence interval for $\mu$.

Confidence Interval for $\mu$ When $\sigma$ Is Unknown


Notice that we need to know the population standard deviation $\sigma$ in order to use formula (1). What
happens if we don’t know the population standard deviation? In this case, we can use the sample standard
deviation $s$ to estimate the population standard deviation $\sigma$. Unfortunately, once we do that, the
standardized sample mean no longer follows the standard normal distribution. In fact, as we mentioned
before, if the samples are taken from a normal population, the following sample statistic follows a
t-distribution with $n - 1$ degrees of freedom:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$
Hence, when $\sigma$ is unknown, the $100(1-\alpha)\%$ confidence interval for $\mu$ is modified as

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\hat{\sigma}_{\bar{x}} = \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}, \tag{2}$$

where $\hat{\sigma}_{\bar{x}} = s/\sqrt{n}$ is the estimated standard error of the sample mean and
$t_{\alpha/2,\,n-1}$ is the value of $t$ (with $n - 1$ degrees of freedom) such that
$P(t_{n-1} \ge t_{\alpha/2,\,n-1}) = \alpha/2$. The critical t value required to calculate the confidence
interval can be found in the t-Table (see my note or your textbook) or by computer (see the note below).
Note that, to use formula (2), the population must be normal, or at least approximately normal. If the
population is not normal, the sample size must be sufficiently large.

In Excel, the function “TINV” can be used to find the critical value of t. Notice that TINV is not used in
quite the same way as NORMSINV: it takes the two-tailed probability $\alpha$ directly, rather than a
cumulative probability. (I’ll explain this in class. Your textbook explains this in detail; see pp. 426-429.)
TINV requires two arguments: the probability and the degrees of freedom. If the confidence level is
$(1-\alpha)$ and the sample size is $n$, then entering “=TINV($\alpha$, $n$ – 1)”, with numerical values
filled in, will return the desired critical t value. For example, if the degrees of freedom is 20, then the t
value needed to construct a 95% confidence interval can be found by entering the formula “=TINV(0.05, 20)”,
which returns the value 2.0860. Recall that when $n$ is sufficiently large, the t distribution will be very
close to the Z distribution. Hence, we can use $Z_{\alpha/2}$ in place of $t_{\alpha/2,\,n-1}$ in the above
formula when $n$ is sufficiently large.
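Outside Excel, the critical t value can also be computed numerically. Python's standard library has no inverse t function, so the sketch below is one possible approach rather than a standard routine: it integrates the t density with Simpson's rule and solves for the critical value by bisection. All function names are invented for this example; a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value: P(T >= t) = alpha / 2, like TINV(alpha, df)."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):             # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, conf):
    """Confidence interval for the mean when sigma is unknown, as in formula (2)."""
    margin = t_critical(1 - conf, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

print(round(t_critical(0.05, 20), 4))   # 2.086, matching =TINV(0.05, 20)

# Sample of size 400, sample mean 1.5, s = 0.5, 90% confidence.
low, high = t_interval(1.5, 0.5, 400, 0.90)
print(round(low, 2), round(high, 2))    # 1.46 1.54
```

With 399 degrees of freedom the critical t value is already very close to the corresponding Z value, which is the large-sample approximation described in the note above.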

Examples:

3. A random sample of 25 homeowners in the New Territories showed that the sample mean mortgage payment is
HK$18,000 per month. The sample standard deviation was HK$1,500. Assuming that the mortgage payments
are approximately normally distributed, find an interval within which you can be 98% confident that the
true mean mortgage payment per month of all New Territories homeowners will lie. What if you want
to be 90%, 80%, or 70% sure?

4. A random sample of 400 college students showed that, on average, they spent 1.5 hours studying, with a
sample standard deviation of 0.5 hours. Assuming that this study time follows a normal distribution, find a
90% confidence interval for the true average study time of the college students. How about an 80%, a 60%,
a 99%, or a 95% confidence interval? (90%: 1.46-1.54)
