
Statistical Estimation

1. Point and interval estimation
2. Confidence intervals for the mean, proportion, and variance

Introduction

Everyone makes estimates! When you are ready to cross a street, you estimate the speed of any approaching car, the distance between you and that car, and your own speed. Having made these quick estimates, you decide whether to wait, walk, or run. All managers must make quick estimates too, and the outcomes of these estimates can seriously affect their organizations.
Lecture 24 2

8/8/2011

Introduction


How do managers use sample statistics to estimate population parameters? Statistical estimation methods enable us to estimate, with reasonable accuracy, population parameters such as the population proportion. If all these estimates were obtained on a census basis, it would be a very costly and time-consuming proposition. Hence the need for sampling theory.

Statistical Estimation


Statistical estimation is the procedure of using a sample statistic to estimate a population parameter. A statistic used to estimate a parameter is called an estimator and the value taken by the estimator is called an estimate. Statistical estimation is divided into two main categories: Point estimation & Interval estimation

Point estimation


A point estimate is a single number that is used to estimate an unknown population parameter.
Example: If a firm takes a sample of 50 salesmen and finds that the average amount of time each salesman spends with his customers is 80 minutes, and this figure is used as an estimate of the population parameter, then 80 minutes is the point estimate.


Interval estimation


An interval estimate is a range of values, given by two numbers, between which a population parameter may be considered to lie.
Example: The average amount of time each salesman spends with his customers is between 60 and 80 minutes.


Criteria of a Good Estimator

A good estimate is one that is close to the population parameter being estimated. We can evaluate the quality of a statistic as an estimator by using four criteria:
- Unbiasedness
- Consistency
- Efficiency
- Sufficiency

Unbiasedness


An estimator is said to be unbiased if the expected value of the estimator is equal to the population parameter being estimated. Equivalently, if the mean of the estimator's values over all possible samples equals the population parameter, the estimator is unbiased.


Consistency


As the sample size increases, the difference between the sample statistic and the population parameter should become smaller and smaller. If the difference continues to shrink as the sample size becomes larger, the sample statistic is said to converge in probability to the parameter and is called a consistent estimator of that parameter.

Efficiency


The efficiency of an estimator depends on its variance. If the variance of the estimator is small, its estimates tend to be closer to the parameter value. For example, for a symmetric population the sample mean and the sample median are both unbiased and consistent estimators of the population mean. We choose between them on the basis of relative efficiency (select the one that has the smaller variance).
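The comparison of relative efficiency can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normal population (mean 50, standard deviation 10), the sample size of 25, and the function name `estimator_variances` are all chosen here for illustration and do not come from the lecture.

```python
import random
import statistics

def estimator_variances(n=25, reps=2000, seed=0):
    """Simulate the sampling variance of the sample mean and sample median.

    Assumption: samples are drawn from a normal population with
    mean 50 and standard deviation 10 (illustrative values only).
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    # Variance of each estimator across the repeated samples
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = estimator_variances()
print(f"variance of sample mean:   {var_mean:.3f}")
print(f"variance of sample median: {var_median:.3f}")
```

For a normal population the sample mean should show the smaller variance, which is why it would be preferred here on efficiency grounds.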

Sufficiency


A sufficient estimator is one that uses all the information about the population parameter contained in the sample. For example, the sample mean is a sufficient estimator of the population mean, since every observation in the sample is used in its computation. The sample range is not, because it uses only the largest and smallest observations.


Example

Consider a medical supplies company that produces disposable hypodermic syringes. Each syringe is wrapped in a sterile package and then jumble-packed in a large corrugated carton. Jumble packing causes the cartons to contain differing numbers of syringes. Because the syringes are sold on a per-unit basis, the company needs an estimate of the number of syringes per carton for billing purposes.


Cont..

A sample of 35 cartons is taken and the number of syringes in each carton is recorded. The sample mean is then computed: sample mean = 102 syringes. We can therefore say that the point estimate of the population mean is 102 syringes per carton.

Cont..


The manufactured price of a disposable hypodermic syringe is quite small, so both the buyer and the seller would accept the use of this point estimate, 102, as the basis for billing. The manufacturer can save the time and expense of counting each syringe that goes into a carton.


Interval estimates


An interval estimate describes a range of values within which a population parameter is likely to lie.


Example

Suppose the marketing research director needs an estimate of the average life, in months, of the car batteries his company manufactures. We select a sample of 200 car owners, interview them, and collect data about the life of their batteries. Suppose the sample mean life is 36 months. The point estimate of the mean battery life is then 36 months.


Cont..

If the director asks for a statement about the uncertainty that accompanies this estimate, or for a range, that can be done by calculating the standard error of the mean:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Say this is 0.707. We could now report that our estimate of the life of the company's batteries may lie somewhere in the range of 35.293 to 36.707 months.
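The range quoted for the battery example is the sample mean plus and minus one standard error. Written out as arithmetic (the percentage in the closing remark is a general property of the normal distribution, not a figure from the lecture):

$$\bar{x} \pm \sigma_{\bar{x}} = 36 \pm 0.707 \;\Rightarrow\; 36 - 0.707 = 35.293, \quad 36 + 0.707 = 36.707$$

If the sampling distribution of the mean is approximately normal, an interval of one standard error on either side of the mean corresponds to a confidence level of roughly 68%.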


Confidence Interval

The probability that we associate with an interval estimate is called the confidence level. It states how confident we are that the interval contains the population parameter. The most commonly used confidence levels are 90%, 95%, and 99%, but we are free to apply any confidence level.


For large samples, the confidence interval is

$$\bar{x} \pm 1.64\,\sigma_{\bar{x}}$$

when the confidence level is 90% (normal distribution).

When the population standard deviation is not known, approximate it by the sample standard deviation.
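As a sketch of how the large-sample interval changes with the confidence level, the snippet below reuses the battery figures from the earlier example (mean 36 months, standard error 0.707). The helper name `z_interval` is hypothetical, and the critical value is taken from Python's standard normal distribution, so it uses 1.645 rather than the rounded 1.64.

```python
from statistics import NormalDist

def z_interval(mean, std_error, confidence):
    """Large-sample confidence interval: mean +/- z* x standard error."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * std_error
    return mean - margin, mean + margin

for level in (0.90, 0.95, 0.99):
    low, high = z_interval(36.0, 0.707, level)
    print(f"{level:.0%} interval: {low:.3f} to {high:.3f} months")
```

A higher confidence level gives a wider interval, since a larger multiple of the standard error is needed to capture the parameter more often.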


Interval estimation: Student's t distribution


Whenever the sample size is 30 or less and the population standard deviation is not known, use the t distribution. In using the t distribution, we assume that the population is normally distributed. A t distribution is lower at the mean and higher at the tails than a normal distribution. There is a separate t distribution for each sample size, that is, for each number of degrees of freedom.

Degrees of freedom.


The number of values we can choose freely. For a single sample of size n used to estimate a mean, there are n - 1 degrees of freedom.


Example


As part of the budgeting process for next year, the manager of the Fan point electric generating plant must estimate the coal he will need for this year. Last year the plant almost ran out, so he is reluctant to budget for that same amount again. The plant manager took a random sample of 10 plant operating weeks chosen over the last 5 years. It yielded a mean usage of 11,400 tons a week and a sample standard deviation of 700 tons a week. Calculate a sensible estimate of the amount to order this year, with 95% confidence.

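n = 10, df = 9
Sample mean = 11,400 tons
Sample standard deviation s = 700 tons (used to approximate the population standard deviation)

Standard error:

$$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{700}{\sqrt{10}} \approx 221.36$$

From the t table, for 9 degrees of freedom and a two-tailed area of 1.00 - 0.95 = 0.05, the t value is 2.262.

The confidence interval is

$$11{,}400 \pm 2.262 \times 221.36$$

that is, 10,899 to 11,901 tons a week, with 95% confidence.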
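The coal calculation can be checked with a few lines of Python. This is a minimal sketch: the critical value 2.262 is hard-coded from the t table quoted in the example rather than computed, and the variable names are chosen here for illustration.

```python
from math import sqrt

n = 10                 # sampled operating weeks
sample_mean = 11_400.0 # tons per week
sample_sd = 700.0      # tons per week
t_crit = 2.262         # t table value for 9 degrees of freedom, 95% confidence

std_error = sample_sd / sqrt(n)       # estimated standard error of the mean
margin = t_crit * std_error           # half-width of the interval
low, high = sample_mean - margin, sample_mean + margin

print(f"standard error: {std_error:.2f}")
print(f"95% interval: {low:.0f} to {high:.0f} tons per week")
```

The interval describes the mean weekly usage; how the manager turns that into an order quantity is a separate budgeting decision.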

Tests of Hypothesis


Suppose the manager of a large shopping mall tells us that the average work efficiency of her employees is at least 90%. How can we test the validity of her claim? We could calculate the efficiency of a sample of her employees. If this sample statistic came out to be 95%, we would accept the manager's statement. But if it were 46%, we would reject her assumption as untrue. Suppose the sample statistic is 88%. Do we accept or reject? We cannot be absolutely certain that our decision is correct. Therefore we must learn to deal with uncertainty in our decision making.

Hypothesis

Here we wish to test the null hypothesis, efficiency = 90%, against the alternative hypothesis, efficiency ≠ 90%. In symbols:
H0: μ = 90
H1: μ ≠ 90


Level of significance


We must decide what criterion to use for deciding whether to accept or reject H0. Suppose we test the hypothesis at the 5% level of significance. We will then reject the null hypothesis if the difference between the sample statistic and the hypothesized population parameter is so large that it would occur, on average, only five or fewer times in every 100 samples when the hypothesized population parameter is correct.

Level of significance


If we assume the hypothesis is correct, then the significance level indicates the percentage of sample means that lie outside certain limits.


Cont..


The purpose of testing is not to question the computed value of the sample statistic but to make a judgment about the difference between that sample statistic and tested population parameter.


Introduction

A hypothesis is an assumption about the population parameter to be tested based on sample information. Hypothesis tests are widely used in business and industry for making decisions.
Examples: based on sample data, decide
- whether a new medicine is really effective in curing a disease;
- whether one training procedure is better than another.


The hypothesis is made about the value of some parameter, and the only facts available to estimate the true parameter are those provided by the sample. If the sample statistic differs from the hypothesized value of the population parameter and the difference is significant, reject the hypothesis. If the difference is not significant, the hypothesis is accepted (more precisely, it is not rejected). Hence the need for tests of hypothesis.


Procedures of Hypothesis Testing

1. Set up a hypothesis
2. Set up a suitable significance level
3. Determination of a suitable test statistic
4. Determination of the critical region
5. Doing computations
6. Making decisions


Set up a hypothesis

Establish the hypothesis to be tested. Set up the null hypothesis, denoted by H0, and the alternate hypothesis, denoted by H1. The null hypothesis states that there is no true difference between the sample statistic and the population parameter under consideration.

Set up a hypothesis


Any hypothesis that differs from the null hypothesis is an alternate hypothesis, H1. If the sample information leads us to reject H0, then we accept H1.


Set up a suitable significance level

The level of significance reflects the confidence with which an experimenter rejects or retains the null hypothesis. It is denoted by α. It is generally specified before any sample is drawn, so that the results obtained do not influence the choice. In practice, a 5% or 1% level of significance is used. A 5% level means there are 5 chances out of 100 that we would reject the null hypothesis when it is true (we are 95% confident of making the right decision).


When the null hypothesis is rejected at α = 0.05, the result is said to be significant.

When the null hypothesis is rejected at α = 0.01, the result is said to be highly significant.


Determination of a suitable test statistic





$$\text{Test statistic} = \frac{\text{Sample statistic} - \text{Hypothesized population parameter}}{\text{Standard error of the sample statistic}}$$


Determination of the critical region

Determine which values of the test statistic will lead to a rejection of H0 and which will lead to acceptance of H0. The former set of values is called the critical region. Establishing a critical region is similar to determining a 100(1 - α)% confidence interval.


Doing computations


Carry out the calculations for the test statistic chosen in step 3.


Making decisions

Draw statistical conclusions: either accept the null hypothesis or reject it, based on whether the computed value of the test statistic falls in the region of acceptance or the region of rejection.


Procedures for statistical inferences




- Point estimation. Appropriate when the goal is to estimate a population parameter.
- Confidence interval. Appropriate when the goal is to estimate a population parameter with confidence.
- Hypothesis testing. Appropriate when the goal is to assess whether the evidence provided by the data is in favor of some claim about the population.
  Hypothesis: a statement about the parameters.


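Confidence Interval

Point estimate ± margin of error

Confidence interval for a population mean:

$$\left(\bar{x} - z^{*}\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + z^{*}\frac{\sigma}{\sqrt{n}}\right)$$

where z* is the standard normal critical value for the chosen confidence level.

Assumption: the population variance is known.
Confidence level: C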


Hypothesis Testing


Sometimes you are not interested in
- estimating an unknown parameter, or
- providing a confidence interval for the parameter.

Rather, you have some claim (belief) about the parameter and you want to see whether the data support or contradict the claim.


Concepts of Hypothesis Testing




The critical concepts of hypothesis testing: two hypotheses.

H0, the null hypothesis
- The statement of no effect or no difference.

Ha, the alternative hypothesis
- The statement we hope or suspect is true.
- Usually one would decide on Ha first.

Biased one-Euro Coin?

A group of Statistics students spun the Belgian one-Euro coin 250 times, and it came up heads 140 times. Let p be the probability of getting a head on each spin.

One-sided: H0: p = 0.5 against Ha: p > 0.5.

Two-sided: H0: p = 0.5 against Ha: p ≠ 0.5.
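The slide only states the hypotheses for the coin. As a hedged illustration of how the one-sided and two-sided alternatives lead to different P-values, the sketch below applies the usual large-sample z test for a proportion. That test, and its normal approximation, are assumptions made here; the lecture itself only develops the z test for a mean with known standard deviation.

```python
from math import sqrt
from statistics import NormalDist

n = 250        # spins
heads = 140    # observed heads
p0 = 0.5       # value of p under H0

p_hat = heads / n
# Standard error of the sample proportion, computed under H0
std_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / std_error

p_one_sided = 1 - NormalDist().cdf(z)             # Ha: p > 0.5
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # Ha: p != 0.5

print(f"z = {z:.3f}")
print(f"one-sided P-value: {p_one_sided:.4f}")
print(f"two-sided P-value: {p_two_sided:.4f}")
```

The two-sided P-value is twice the one-sided one, so the same data can fall on different sides of a 5% cutoff depending on which alternative was chosen in advance.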

Company Billing System

A new billing system for a company will be cost-effective only if the mean monthly account is more than $170. A sample of 400 monthly accounts has a mean of $178. If the accounts are normally distributed with σ = $65, can we conclude that the new system will be cost-effective?
The population is the credit accounts at the store. We want to show that the mean account for all customers is greater than $170, so Ha: μ > 170.
The null hypothesis must specify a single value of the parameter μ, so H0: μ = 170.
How can we achieve that?

 


Test statistic


A test is based on a statistic (a point estimate) that estimates the parameter appearing in the hypotheses.

Values of the estimate far from the parameter value in H0 give evidence against H0. Ha determines which direction will be counted as far from the parameter value.


Company Billing System


Question: Is a sample mean of 178 sufficiently greater than 170 to infer that the population mean is greater than 170? Answer: Let's assume the population mean is 170, and see how likely it is for us to observe a sample mean of 178 or even more.


P-value


P-value: the probability of observing a test statistic as extreme as, or more extreme than, the actually observed value, given that H0 is true. "Extreme" means far from what we would expect under H0.

The P-value provides information about the amount of statistical evidence that supports the null hypothesis. The smaller the P-value, the less the evidence for H0.


Interpreting P-value


Because the probability that the sample mean is equal to or larger than 178, when μ = 170, is so small (0.0069), there is little reason to believe that μ = 170 (and reason to believe that μ > 170).

We can conclude that the smaller the P-value the more statistical evidence exists to support the alternative hypothesis.
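The P-value of 0.0069 for the billing example can be reproduced from the figures given (sample mean 178, hypothesized mean 170, standard deviation 65, n = 400). The sketch below uses Python's standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 170.0     # mean under H0
x_bar = 178.0   # observed sample mean
sigma = 65.0    # known population standard deviation
n = 400         # sample size

# Distance of the sample mean from mu0 in standard-error units
z = (x_bar - mu0) / (sigma / sqrt(n))
# Upper-tail probability, because Ha is mu > 170
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Here the standard error is 65 / 20 = 3.25, so the sample mean sits about 2.46 standard errors above 170.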


Describing P-value


- If the P-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
- If the P-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
- If the P-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
- If the P-value exceeds 10%, there is no evidence that supports the alternative hypothesis.

Significance Level α

We need to make a conclusion after carrying out the hypothesis test. What do we conclude? We can compare the P-value with a fixed value that we regard as decisive. This amounts to announcing in advance how much evidence against H0 we require in order to reject H0. The decisive value is called the significance level of the test. It is denoted by α, and the corresponding test is called a level α test.
Statistical significance: if the P-value ≤ α, we say that the data are statistically significant at level α.

α and P-value

P-value and significance level α:
- Reject H0 if P-value ≤ α.
- Do not reject H0 if P-value > α.

When is it easier to reject H0?
- Large α or small α?

When is the evidence against H0 stronger?
- Large P-value or small P-value?


Four steps of hypotheses testing

1. Define the hypotheses to test and the required significance level α.
2. Calculate the value of the test statistic.
3. Find the P-value based on the observed data.
4. State the conclusion: reject the null hypothesis if the P-value ≤ α; if the P-value > α, the data do not provide sufficient evidence to reject the null.


Testing for normal mean with known σ

Let X1, …, Xn be a random sample from N(μ, σ²).

Null hypothesis:
H0: μ = μ0

Alternative hypothesis:
Ha: μ ≠ μ0
Ha: μ > μ0
Ha: μ < μ0

The sample mean X̄ is normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Normal with known σ: Z test

When H0 is true, the mean of X̄ is μ0 and

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

has a standard normal distribution. Z is a natural measure of the distance between the sample mean X̄ and its expected value μ0.

For a given sample, we observe

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

If H0 is true, we expect z to be close to 0.

Normal with known σ

Case 1: Ha: μ ≠ μ0.
H0 should be rejected if z is too far away from 0. The P-value is 2P(Z ≥ |z|).

Case 2: Ha: μ > μ0.
H0 should be rejected if z is much larger than 0. The P-value is P(Z ≥ z).

Case 3: Ha: μ < μ0.
H0 should be rejected if z is much smaller than 0. The P-value in this case is P(Z ≤ z).


Normal with known σ: P-value method

Null hypothesis: H0: μ = μ0

Test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

Alternative hypothesis    P-value
Ha: μ ≠ μ0                2P(Z ≥ |z|)
Ha: μ > μ0                P(Z ≥ z)
Ha: μ < μ0                P(Z ≤ z)


Sprinkler

A sprinkler systems maker claims that the true average system-activation temperature is 130°. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°. If the distribution of activation temperature is normal with σ = 1.5°, do the data contradict the claim at significance level α = 0.01?
Let μ = true average activation temperature.
Hypotheses:
Test statistic:
P-value:
Conclusion:
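One way to work through the sprinkler question with the P-value method is sketched below. It assumes the two-sided pair H0: μ = 130 against Ha: μ ≠ 130, which is a reading of "contradict the claim" and is not stated on the slide.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 130.0      # claimed mean activation temperature
x_bar = 131.08   # observed sample mean
sigma = 1.5      # known population standard deviation
n = 9
alpha = 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))
# Two-sided P-value (assumed alternative: mu != 130)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = p_value <= alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```

Under that assumption the P-value is a little above 0.03, which exceeds 0.01, so the data would not contradict the claim at the 1% level.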

 

Rejection Region Method




The rejection region is a range of values such that if the test statistic falls into that range, the null hypothesis is rejected.
The rejection region method:
1. Define the hypotheses to test and the required significance level α.
2. Find the corresponding rejection region.
3. Calculate the test statistic.
4. Reject the null hypothesis only if the value of the test statistic falls in the rejection region.


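Normal with known σ: Rejection Region Method

Null hypothesis: H0: μ = μ0

Test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

Alternative hypothesis    Rejection region for level α test
Ha: μ ≠ μ0                |z| ≥ z(α/2)
Ha: μ > μ0                z ≥ z(α)
Ha: μ < μ0                z ≤ -z(α)

Here z(α) denotes the standard normal value with upper-tail probability α, that is, P(Z ≥ z(α)) = α.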

Sprinkler


A sprinkler systems maker claims that the true average system-activation temperature is 130°. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°. If the distribution of activation temperature is normal with σ = 1.5°, do the data contradict the claim at significance level α = 0.01?
Let μ = true average activation temperature.
Hypotheses:
Rejection region:
Test statistic:
Conclusion:
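The same sprinkler question can be approached with the rejection region method. The sketch below again assumes a two-sided alternative (Ha: μ ≠ 130), so the region is |z| at or beyond the upper α/2 critical value; the name `z_crit` is chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, sigma, n = 130.0, 131.08, 1.5, 9
alpha = 0.01

# Upper alpha/2 critical value of the standard normal (two-sided test assumed)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
z = (x_bar - mu0) / (sigma / sqrt(n))
in_rejection_region = abs(z) >= z_crit

print(f"rejection region: |z| >= {z_crit:.3f}")
print(f"observed z = {z:.2f}, in rejection region: {in_rejection_region}")
```

The critical value is about 2.576, and the observed statistic falls short of it, so H0 would not be rejected at the 1% level.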

Sprinkler Revisited


A sprinkler systems maker claims that the true average system-activation temperature is 130°. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°. If the distribution of activation temperature is normal with σ = 1.5°,
- do the data contradict the claim at significance level α = 0.01?
- what is the 99% confidence interval for the mean activation temperature?
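A minimal sketch of the 99% confidence interval for the sprinkler data, using the known-σ interval (sample mean plus and minus z* times σ/√n). The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 131.08, 1.5, 9
confidence = 0.99

# z* leaves 0.5% in each tail for a 99% interval
z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
margin = z_star * sigma / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"99% interval: {low:.2f} to {high:.2f}")
print(f"claimed value 130 inside interval: {low <= 130.0 <= high}")
```

The claimed value of 130 lies inside this interval, which agrees with not rejecting a two-sided test at the 1% level.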


CI & 2-Sided Tests




A level α 2-sided test rejects H0: μ = μ0 exactly when the value μ0 falls outside a level 1 - α confidence interval for μ. Confidence intervals can therefore be used to test hypotheses. Calculate the level 1 - α confidence interval; then
- if μ0 falls within the interval, do not reject the null hypothesis;
- otherwise, reject the null hypothesis.


SAT


In a discussion of SAT scores, someone comments: "Because only a minority of students take the test, the scores overestimate the ability of typical seniors. The mean SAT-M score is about 475, but I think if all seniors took the test, the mean would be 450." You gave the test to an SRS of 500 seniors from California. These students had an average score of 461. (The SAT-M score follows a normal distribution with a standard deviation of 100.) Is there sufficient evidence against the claim that the mean for all California seniors is 450 at a significance level of 0.05? Give a 95% CI for the mean score μ of all seniors.


SAT

The hypotheses are
The test statistic is
Because Ha is two-sided, the P-value is
Conclusion:
A 95% confidence interval for μ is
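The blanks for the SAT example can be filled in numerically as sketched below. The hypotheses H0: μ = 450 against Ha: μ ≠ 450 follow from the slide's statement that Ha is two-sided; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 450.0     # claimed mean for all seniors
x_bar = 461.0   # sample mean of the 500 seniors
sigma = 100.0   # known standard deviation
n = 500
alpha = 0.05

std_error = sigma / sqrt(n)
z = (x_bar - mu0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

z_star = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
low, high = x_bar - z_star * std_error, x_bar + z_star * std_error

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
print(f"95% interval: {low:.1f} to {high:.1f}")
```

The P-value is below 0.05 and the interval excludes 450, so both routes point to the same conclusion, as the CI and 2-sided test relationship predicts.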


Take Home Message




Tests of significance:
- When to use it
- Two hypotheses: null and alternative

Test for a population mean with known σ:
- Test statistic
- P-value
- Significance level α
- P-value method: 4 steps
- Rejection region method

CI and 2-sided test

Homework 12.1


Reading in Text: 435-452
Exercises in Text: 6.32, 6.36, 6.44, 6.48, 6.52, 6.56
Due Time: Thursday, April 28
