8.1 Introduction
Everyone makes estimates. When you are ready to cross a street, you estimate the speed of any car that is approaching, the distance between you and that car, and your own speed. Having made these quick estimates, you decide whether to wait, walk, or run.

8.2 Reasons why estimates have to be made


All managers must make quick estimates too. The outcome of these estimates can affect their organizations as seriously as the outcome of your decision whether to cross the street. Credit managers estimate whether a purchaser will eventually pay his bills. Prospective home buyers estimate how interest rates in the mortgage market will behave. All these people make estimates without worrying about whether they are scientific, but in the hope that the estimates bear a reasonable resemblance to the outcome. Managers use estimates because, in all but the most trivial decisions, they must decide rationally without complete information and with a great deal of uncertainty about what the future will bring. As educated citizens and professionals, you will be able to make more useful estimates by applying the techniques described in this and subsequent chapters.

8.3 Making statistical inference


Statistical inference is based on estimation and hypothesis testing. In both, we make inferences about characteristics of a population from information contained in a sample. Here we try to estimate with reasonable accuracy the population proportion (the proportion of the population that possesses a given characteristic) and the population mean. Calculating the exact proportion or the exact mean would be an impossible goal. Even so, we can make an estimate and apply some controls to avoid as much of the error as possible.

8.4 Types of estimates


There are two types of estimates about a population:
1. A point estimate
2. An interval estimate

A point estimate is a single number that is used to estimate an unknown population parameter. A point estimate is often insufficient because it is either right or wrong; if it is wrong, we do not know how wrong it is. Therefore, a point estimate is much more useful if it is accompanied by an estimate of the error that might be involved.

An interval estimate is a range of values used to estimate a population parameter. It indicates the error in two ways: by the extent of its range and by the probability of the true population parameter lying within that range.

8.5 Criteria of a good estimator


8.5.1 Unbiasedness: This is a desirable property for a good estimator to have. The sample mean is an unbiased estimator of the population mean because the mean of the sampling distribution of sample means taken from the same population is equal to the population mean itself. More generally, a statistic is an unbiased estimator if, on average, it tends to assume values above the population parameter being estimated as frequently, and to the same extent, as it tends to assume values below it.
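The unbiasedness property can be checked on a small case. The following is a minimal sketch, assuming a hypothetical five-value population (not taken from the text) and samples of size 2 drawn without replacement. It enumerates every possible sample and compares the mean of the sample means with the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8, 10]
population_mean = mean(population)

# Every possible sample of size 2 (without replacement) and its mean.
sample_means = [mean(sample) for sample in combinations(population, 2)]

# Mean of the sampling distribution of the sample mean.
mean_of_sample_means = mean(sample_means)

print(population_mean, mean_of_sample_means)
```

Individual sample means fall above and below the population mean, but their average coincides with it, which is what unbiasedness asserts.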

8.5.2 Efficiency: Another desirable property of a good estimator is that it be efficient. Efficiency refers to the size of the standard error of the statistic. If we compare two statistics from a sample of the same size and try to decide which one is the more efficient estimator, we pick the statistic that has the smaller standard error.

Example: Suppose we choose a sample of a given size and must decide whether to use the sample mean or the sample median to estimate the population mean. If we calculate the standard error of the sample mean and find it to be 1.05, and then calculate the standard error of the sample median and find it to be 1.6, we would say that the sample mean is a more efficient estimator of the population mean because its standard error is smaller. It makes sense that an estimator with a smaller standard error (with less variation) will have more chance of producing an estimate nearer to the population parameter under consideration.
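The comparison of mean and median can be illustrated by simulation. This is a rough sketch under assumed conditions (a normal population with mean 50 and standard deviation 10, samples of size 25, none of which come from the text). The standard error of each statistic is approximated by the standard deviation of its values over many repeated samples.

```python
import random
from statistics import mean, median, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed setup: normal population, mean 50, standard deviation 10.
SAMPLE_SIZE = 25
REPETITIONS = 2000

means, medians = [], []
for _ in range(REPETITIONS):
    sample = [random.gauss(50, 10) for _ in range(SAMPLE_SIZE)]
    means.append(mean(sample))
    medians.append(median(sample))

# The standard error of a statistic is the standard deviation of its
# sampling distribution, estimated here from the simulated samples.
se_mean = stdev(means)
se_median = stdev(medians)

print(f"standard error of mean:   {se_mean:.2f}")
print(f"standard error of median: {se_median:.2f}")
```

Under these assumptions the sample mean shows the smaller standard error, so it is the more efficient estimator of the population mean.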

8.5.3 Consistency: A statistic is a consistent estimator of a population parameter if as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter. If an estimator is consistent, it becomes more reliable with large samples.
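Consistency can also be seen in a simulation. The sketch below assumes a normal population with mean 50 and standard deviation 10, and treats "very close" as within 1 unit of the population mean; both choices are illustrative and not from the text. It reports, for growing sample sizes, the share of sample means that land that close.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50, 10   # assumed population parameters
TOLERANCE = 1.0             # "very close" means within 1 unit here
REPETITIONS = 300

def share_close(sample_size):
    """Fraction of simulated sample means within TOLERANCE of POP_MEAN."""
    hits = 0
    for _ in range(REPETITIONS):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(sample_size)]
        if abs(fmean(sample) - POP_MEAN) <= TOLERANCE:
            hits += 1
    return hits / REPETITIONS

shares = {n: share_close(n) for n in (10, 100, 1000)}
for n, share in shares.items():
    print(n, round(share, 2))
```

The share rises toward 1 as the sample size grows, which is the behaviour the definition of a consistent estimator describes.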

8.5.4 Sufficiency: An estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract from the sample additional information about the population parameter being estimated.

8.6 Point estimates:

Consider the table above: we have taken a sample of 35 boxes of bolts from a manufacturing line and have counted the bolts per box. We can estimate the population mean, i.e. the mean number of bolts per box, by taking the mean for the 35 boxes we have sampled, i.e. adding up all the bolts and dividing by the number of boxes.

Thus, using the sample mean $\bar{x}$ as the estimator, we have a point estimate of the population mean $\mu$. Similarly, we can use the sample variance $s^2$ to estimate the population variance $\sigma^2$, where the sample variance is given by the formula

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
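As a small illustration of these two point estimates, the sketch below uses Python's standard `statistics` module. The bolt counts are hypothetical values made up for the example; the 35 values from the text's table are not reproduced here.

```python
from statistics import mean, variance

# Hypothetical bolt counts for ten boxes (illustration only).
bolts_per_box = [101, 103, 112, 102, 98, 97, 93, 105, 100, 99]

x_bar = mean(bolts_per_box)          # point estimate of the population mean
s_squared = variance(bolts_per_box)  # sample variance, divides by n - 1

print(x_bar, round(s_squared, 2))
```

Note that `statistics.variance` divides by n - 1, matching the sample variance formula, whereas `statistics.pvariance` divides by n.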

8.7 Interval Estimates


The purpose of gathering samples is to learn more about a population. We can compute this information from the sample data as either point estimates, or as interval estimates. An interval estimate describes a range of values within which a population parameter is likely to lie.

The marketing research director needs an estimate of the average life in months of the car batteries his company manufactures. We select a random sample of 200 batteries with a mean life of 36 months. If we use the sample mean $\bar{x}$ as the best point estimate of the population mean $\mu$, we would report that the mean life of the company's batteries is 36 months. The director also asks for a statement about the uncertainty that is likely to accompany this estimate, that is, a statement about the range within which the unknown population mean is likely to lie. To provide such a statement, we need to find the standard error of the mean.

If we select and plot a large number of sample means from a population, the distribution of these means will approximate a normal curve. Furthermore, the mean of the sample means will be the same as the population mean. Our sample size of 200 is large enough that we can apply the central limit theorem. Suppose we have already estimated the standard deviation of the population of batteries and reported that it is 10 months. Using this standard deviation, we can calculate the standard error of the mean with the formula

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{200}} = 0.707 \text{ months}$$

Making the interval estimate: We can tell the director that our estimate of the life of the company's batteries is 36 months and that the standard error accompanying this estimate is 0.707. In other words, the actual mean life for all the batteries may lie somewhere in the interval from 35.293 to 36.707 months. This is helpful but insufficient information for the director. Next, we need to calculate the chance that the actual life will lie in this interval or in other intervals of different widths that we might choose, such as $\pm 2\sigma_{\bar{x}}$ (2 × 0.707), $\pm 3\sigma_{\bar{x}}$ (3 × 0.707), and so on.

The probability is 0.955 that the mean of a sample of size 200 will be within ±2 standard errors of the population mean. Stated differently, 95.5 percent of all the sample means are within ±2 standard errors of $\mu$, and so the population mean will be located within ±2 standard errors of the sample mean 95.5 percent of the time.

Hence, from the above example, we can now report to the director that the best estimate of the life of the company's batteries is 36 months, and that we are 68.3 percent confident that the life lies in the interval from 35.293 to 36.707 months ($36 \pm 1\sigma_{\bar{x}}$). Similarly, we are 95.5 percent confident that the life falls within the interval of 34.586 to 37.414 months ($36 \pm 2\sigma_{\bar{x}}$), and we are 99.7 percent confident that battery life falls within the interval of 33.879 to 38.121 months ($36 \pm 3\sigma_{\bar{x}}$).
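The arithmetic of the battery example can be reproduced in a few lines. This is only a sketch of the calculation described above; the variable names are chosen for illustration.

```python
import math

sample_mean = 36.0   # months, from the sample of 200 batteries
sigma = 10.0         # population standard deviation, months
n = 200

standard_error = sigma / math.sqrt(n)

# Interval for 1, 2 and 3 standard errors on either side of the mean.
intervals = {
    k: (round(sample_mean - k * standard_error, 3),
        round(sample_mean + k * standard_error, 3))
    for k in (1, 2, 3)
}

print(round(standard_error, 3))
for k, (low, high) in intervals.items():
    print(k, low, high)
```

The three intervals correspond to the 68.3, 95.5, and 99.7 percent confidence statements made to the director.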

8.8 Interval Estimates and confidence intervals


In using interval estimates, we are not confined to ±1, ±2, and ±3 standard errors. For example, ±1.64 standard errors includes about 90 percent of the area under the curve; it includes 0.4495 of the area on either side of the mean in a normal distribution. Similarly, ±2.58 standard errors includes about 99 percent of the area, or 49.51 percent on each side of the mean.

The probability that we associate with an interval estimate is called the confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter. A higher probability means more confidence. In estimation, the most commonly used confidence levels are 90 percent, 95 percent, and 99 percent, but we are free to apply any confidence level.

The confidence interval is the range of the estimate we are making. Example: If we report that we are 90 percent confident that the mean of the population of incomes of people in a certain community will lie between Rs. 8,000 and Rs. 24,000, then the range Rs. 8,000 to Rs. 24,000 is our confidence interval. Often, however, we will express the confidence interval in standard errors rather than in numerical values. Thus, we will often express confidence intervals like this: $\bar{x} \pm 1.64\sigma_{\bar{x}}$, where

$\bar{x} + 1.64\sigma_{\bar{x}}$ = upper limit of the confidence interval
$\bar{x} - 1.64\sigma_{\bar{x}}$ = lower limit of the confidence interval

Thus, confidence limits are the upper and lower limits of the confidence interval. In this case, $\bar{x} + 1.64\sigma_{\bar{x}}$ is called the upper confidence limit (UCL) and $\bar{x} - 1.64\sigma_{\bar{x}}$ is the lower confidence limit (LCL).
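The link between a confidence level and the number of standard errors can be sketched with `statistics.NormalDist` from Python's standard library. The helper names `z_for_confidence` and `confidence_limits` are hypothetical, introduced only for this example, and the battery figures (mean 36, standard error 0.707) are reused from the previous section.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_confidence(level):
    """Number of standard errors that encloses `level` of the area."""
    tail = (1 - level) / 2
    return standard_normal.inv_cdf(1 - tail)

def confidence_limits(x_bar, standard_error, level):
    """Return (LCL, UCL) for the given confidence level."""
    z = z_for_confidence(level)
    return x_bar - z * standard_error, x_bar + z * standard_error

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))

# Battery example at the 90 percent confidence level.
lcl, ucl = confidence_limits(36.0, 0.707, 0.90)
print(round(lcl, 2), round(ucl, 2))
```

Rounded to two decimals, the 90 and 99 percent levels give the 1.64 and 2.58 used in the text. The unrounded values differ slightly in the third decimal, which is why some tables print 1.645.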

Calculating interval Estimates of the Mean from Large Samples


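If the sample is large relative to a finite population, we use the finite population multiplier to calculate the standard error. From the previous unit, this is given as

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$

where $N$ is the population size and $n$ is the sample size.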
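The effect of the finite population multiplier can be seen with a short calculation. This is a sketch using hypothetical figures (a standard deviation of 10 and a sample of 100 drawn from a population of 1,000); the function name `standard_error_finite` is an illustrative choice.

```python
import math

def standard_error_finite(sigma, n, population_size):
    """Standard error of the mean with the finite population multiplier."""
    multiplier = math.sqrt((population_size - n) / (population_size - 1))
    return sigma / math.sqrt(n) * multiplier

# Hypothetical figures: sigma = 10, sample of 100 from a population of 1,000.
se_infinite = 10 / math.sqrt(100)
se_finite = standard_error_finite(10, 100, 1000)

print(round(se_infinite, 3), round(se_finite, 3))
```

The multiplier is always below 1 when the sample has more than one item, so it shrinks the standard error. It approaches 1 as the population becomes large compared with the sample.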

Calculating interval Estimates of the Proportion from Large Samples


Statisticians often use a sample to estimate the proportion of occurrences in a population. For example, the government estimates by a sampling procedure the unemployment rate, or the proportion of unemployed people, in the country's workforce.

For a binomial distribution, the mean and the standard deviation are

Mean: $\mu = np$
Standard deviation: $\sigma = \sqrt{npq}$

where $n$ = number of trials, $p$ = probability of success, and $q = 1 - p$ = probability of failure.

Since we take the sample proportion as our estimate of the population proportion, the mean of the sampling distribution of the proportion is $\mu_{\bar{p}} = p$. Similarly, we can modify the formula for the standard deviation of the binomial distribution, $\sqrt{npq}$, which measures the standard deviation in the number of successes. To change the number of successes to the proportion of successes, we divide by $n$ and get $\sqrt{npq}/n = \sqrt{pq/n}$.

Therefore the standard error of the proportion is

$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}$$

Example: In a very large organization, the director wanted to find out what proportion of the employees prefer to provide their own retirement benefits in lieu of a company-sponsored plan. A simple random sample of 75 employees was taken, and 40 percent (0.4) of them were found to be interested in providing their own retirement plans. The management requests that we use this sample to find an interval about which they can be 99 percent confident that it contains the true population proportion.

Here $n = 75$, $p = 0.4$, and $q = 1 - p = 1 - 0.4 = 0.6$. Therefore

$$\text{Standard error of the proportion} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.4 \times 0.6}{75}} = 0.057$$

The interval estimate for the 99 percent level of confidence is therefore $0.4 \pm 2.58(0.057)$, which gives limits of 0.253 and 0.547. We can report with 99 percent confidence that the proportion of the total population of employees who wish to establish their own retirement plans lies between 0.253 and 0.547.
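The retirement-plan example can be checked numerically. The sketch below follows the text in rounding the standard error to three decimals before forming the interval; variable names are illustrative.

```python
import math

n = 75
p = 0.4
q = 1 - p
z = 2.58  # standard errors for about 99 percent confidence

# The text rounds the standard error to three decimals before use.
standard_error = round(math.sqrt(p * q / n), 3)

lower = round(p - z * standard_error, 3)
upper = round(p + z * standard_error, 3)

print(standard_error, lower, upper)
```

If the standard error is left unrounded, the limits come out at about 0.254 and 0.546, a difference that only reflects rounding.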

Interval Estimates using the Student's t Distribution


So far, the sample sizes we have examined were all larger than 30. This is not always the case. How can we handle estimates where the normal distribution is not the appropriate sampling distribution, that is, when we are estimating the population standard deviation and the sample size is 30 or less? Suppose we have data from only, say, 10 weeks, or from any sample of 30 or fewer observations. Fortunately, another distribution exists that is appropriate in these cases. It is called the t distribution.

Early theoretical work on t distributions was done by W. S. Gosset in the early 1900s. Gosset was employed by the Guinness Brewery in Dublin, Ireland, which did not permit employees to publish research findings under their own names. So Gosset adopted the pen name "Student" and published under that name. Consequently, the t distribution is commonly called Student's t distribution, or simply Student's distribution.

Conditions for usage: Because it is used when the sample size is 30 or less, statisticians often associate the t distribution with small-sample statistics. This is misleading, because the size of the sample is only one of the conditions that lead us to use the t distribution. The second condition is that the population standard deviation must be unknown. Use of the t distribution for estimating is required whenever the sample size is 30 or less and the population standard deviation is not known. Furthermore, in using the t distribution, we assume that the population is normal or approximately normal.

Degrees of freedom
There is a different t distribution for each of the possible degrees of freedom. What are degrees of freedom? We can define them as the number of values we can choose freely. We will use degrees of freedom when we select a t distribution to estimate a population mean, and we will use n − 1 degrees of freedom, where n is the sample size. For example, if we use a sample of 20 to estimate a population mean, we will use 19 degrees of freedom to select the appropriate t distribution. With two sample values, we have one degree of freedom (2 − 1 = 1), and with seven sample values, we have six degrees of freedom (7 − 1 = 6). In each of these two examples, then, we had n − 1 degrees of freedom, where n is the sample size. Similarly, a sample of 23 would give us 22 degrees of freedom.

Using the t Distribution Table


Comparison between t and z tables: The table of t distribution values differs in construction from the z table, or normal distribution table, used previously. The t table is more compact and shows areas and t values for only a few percentages (10, 5, 2, and 1 percent). Because there is a different t distribution for each number of degrees of freedom, a more complete table would be quite lengthy. Although we can conceive of the need for a more complete table, the compact table covers the confidence levels most commonly used.

A second difference in the t table is that it does not focus on the chance that the population parameter being estimated will fall within our confidence interval. Instead, it measures the chance that the population parameter we are estimating will not be within our confidence interval (that is, that it will lie outside it). If we are making an estimate at the 90 percent confidence level, we would look in the t table under the 0.10 column (100 percent − 90 percent = 10 percent). This 0.10 chance of error is symbolized by the Greek letter alpha ($\alpha$). We would find the appropriate t values for confidence intervals of 95 percent, 98 percent, and 99 percent under the columns headed 0.05, 0.02, and 0.01, respectively.

A third difference in using the t table is that we must specify the degrees of freedom with which we are dealing. Suppose we make an estimate at the 90 percent confidence level with a sample size of 14, which is 13 degrees of freedom. Look under the 0.10 column until you encounter the row labelled 13. Like a z value, the t value there of 1.771 shows that if we mark off plus and minus $1.771\,\hat{\sigma}_{\bar{x}}$ (estimated standard errors of $\bar{x}$) on either side of the mean, the area under the curve between these two limits will be 90 percent, and the area outside these limits (the chance of error) will be 10 percent.
Remember that in any estimation problem in which the sample size is 30 or less and the standard deviation of the population is unknown and the underlying population can be assumed to be normal or approximately normal, we use the t distribution.
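The table value of 1.771 for 13 degrees of freedom can be checked without a table. The sketch below is one possible check, not a method from the text: it integrates the standard Student's t density numerically with Simpson's rule and, for comparison, computes the area the normal curve would enclose between the same limits. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def central_area(t_value, df, steps=2000):
    """Area under the t curve between -t_value and +t_value.

    Uses Simpson's rule; steps must be even.
    """
    a, b = -t_value, t_value
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3

t_area = central_area(1.771, 13)
z_area = 2 * NormalDist().cdf(1.771) - 1

print(round(t_area, 3), round(z_area, 3))
```

The t curve encloses about 90 percent between ±1.771, while the normal curve encloses more between the same limits. The t distribution has heavier tails, so it needs wider limits than z for the same confidence level.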

8.9 Determining the Sample size in Estimation


In all the examples above, the sample size was known. Now we turn to determining the sample size n itself. If it is too small, we may fail to achieve the objective; if it is too large, we will be wasting resources. Let us examine some of the methods that are useful in determining what sample size is necessary for any specified level of precision.
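The text does not give a formula at this point. One common approach, offered here as an assumption rather than as the chapter's own method, inverts the margin of error $z\sigma/\sqrt{n}$ used in the interval estimates above: for a desired margin $E$, the required sample size is $n = (z\sigma/E)^2$, rounded up. The function name and the 1-month margin below are hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin, z):
    """Smallest n for which z * sigma / sqrt(n) does not exceed margin."""
    return math.ceil((z * sigma / margin) ** 2)

# Battery figures reused for illustration: sigma = 10 months, z = 2
# (about 95.5 percent confidence), desired margin of 1 month (assumed).
n_required = sample_size_for_mean(sigma=10, margin=1, z=2)

# Margin actually achieved with that sample size.
achieved_margin = 2 * 10 / math.sqrt(n_required)

print(n_required, achieved_margin)
```

Halving the desired margin quadruples the required sample size, which shows the trade-off between precision and resources described above.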