
Hypothesis Testing

(Significance Test)
Terms - Definition
A hypothesis is a statement, assertion, assumption, claim
or belief about the state of nature (about the true value of an
unknown population parameter):

The accused is innocent
$\mu = 100$
$p = 0.9$
Daily sales at a store are normally distributed
Employee empowerment and customer satisfaction are
independent
$\mu_1 - \mu_2 = 0$
Terms-Definition

A hypothesis test is a statistical procedure
that involves formulating a hypothesis and
using sample data to decide on the validity of
the hypothesis (to support it or not to support it)
Terms-Definition
A null hypothesis, denoted by H0, is a statement about one or
more population parameters. This is the statement we hold to be
true until we have sufficient statistical evidence to conclude
otherwise.
H0: $\mu = 100$
The alternative hypothesis, denoted by H1, is the assertion of
all situations not covered by the null hypothesis.
H1: $\mu \neq 100$
H0 and H1 are:
Mutually exclusive
Only one can be true.
The null hypothesis (H0) will always state that the parameter
equals the value specified in the alternative hypothesis (H1)
Example
A producer of electric bulbs claims that the proportion
of defectives in a lot can never be more than 1%.
H0: p = 0.01
H1: p > 0.01
I believe that on average a PGP student at
IIMK spends 15 hours per week using library
resources
H0: $\mu = 15$
H1: $\mu \neq 15$
Example
Suppose a bank knows that their customers are waiting in line an average of
10.2 minutes during the lunch hour. The branch manager has decided
to add an additional teller during the 12-2 p.m. period and wishes to test
the hypothesis that the average wait has decreased due to the additional
teller. Set up the null and alternative hypothesis for the bank manager.
H0: $\mu = 10.2$
H1: $\mu < 10.2$
One-Tailed and Two-Tailed Tests
The tails of a statistical test are determined by the need for an action. If action
is to be taken if a parameter is greater than some value a, then the alternative
hypothesis is that the parameter is greater than a, and the test is a right-tailed
(upper-tail) test.
H0: $\mu \le 50$    H1: $\mu > 50$

If action is to be taken if a parameter is less than some value a, then the
alternative hypothesis is that the parameter is less than a, and the test is a
left-tailed (lower-tail) test.
H0: $\mu \ge 50$    H1: $\mu < 50$

If action is to be taken if a parameter is either greater than or less than some
value a, then the alternative hypothesis is that the parameter is not equal to a,
and the test is a two-tailed test.
H0: $\mu = 50$    H1: $\mu \neq 50$
The Concepts of Hypothesis Testing

Test Statistic: A test statistic is a sample statistic
computed from sample data. The value of the test statistic
is used in determining whether or not we may reject the
null hypothesis.
A test statistic measures how close the point estimate of
the population parameter is to the null hypothesis value (of
the parameter). This "closeness" is measured in terms of
the standard error of the point estimate.
The test statistic can be based on the sample mean, sample variance,
sample proportion, etc.
Example contd..
A producer of electric bulbs claims that the proportion of
defectives in a lot can never be more than 1%.
H0: p = 0.01
H1: p > 0.01
A wholesaler takes a random sample of 100 bulbs from
the lot and finds 3 defectives.

Test Statistic: Sample proportion $\hat{p} = 3/100 = 0.03$


The Concepts of Hypothesis Testing

The decision rule of a statistical hypothesis test is a rule
that specifies the conditions under which the null
hypothesis may be rejected.
Critical Region: The set of values of the test statistic
that cause us to reject the null hypothesis (H0)
Rejection of the null hypothesis implies that the test is
significant; otherwise it is insignificant

Consider H0: $\mu = 15$. We may have a decision rule that says:
Reject H0 if the sample mean is less than 10 or more than 20 hrs.
Decision Making
There are two possible decisions that can be made:

Conclude that there is enough evidence to support the
alternative hypothesis
(also stated as: rejecting the null hypothesis in favor of the
alternative)

Conclude that there is not enough evidence to reject H0

NOTE: we do not say that we accept the null hypothesis
Decision Making
A decision may be correct in two ways:
Fail to reject a true H0
Reject a false H0
A decision may be incorrect in two ways:
Type I Error: Reject a true H0
The probability of a Type I error is denoted
by $\alpha$.
Type II Error: Fail to reject a false H0
The probability of a Type II error is denoted
by $\beta$.
Type I and Type II Errors
Errors in Hypothesis Testing

$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})$
$\beta = P(\text{Fail to reject } H_0 \mid H_0 \text{ is false})$

For a fixed sample size, reducing one of these probabilities increases the other.
We cannot minimize both errors simultaneously.
Hence, we fix P(Type I error) at a pre-assigned value
and then minimize P(Type II error)
Level of Significance
The upper bound of P(Type I error) is called the level of significance and
is denoted by $\alpha$.

The level of significance gives the maximum allowable probability of a
Type I error in testing a hypothesis.

The significance level is usually set at 0.05, 0.1 or 0.01.

The significance level tells us how strong the evidence should be for
us to reject H0.
The p-Value
The p-value is the probability of getting a test statistic value at least as
extreme as the observed value (the one we have obtained), given that the
null hypothesis is true.

Smaller p-values represent stronger evidence against the null
hypothesis.

Policy: When the p-value is less than $\alpha$, reject H0.


Interpreting the p-value
The smaller the p-value, the more statistical evidence exists
to support the alternative hypothesis.
If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
If the p-value is between 1% and 5%, there is strong
evidence that supports the alternative hypothesis.
If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.
If the p-value exceeds 10%, there is no evidence that
supports the alternative hypothesis.
We observe a p-value of .0069, hence there is
overwhelming evidence to support H1: $\mu > 170$.
The Power of a Test

The power of a statistical hypothesis test is the
probability of rejecting the null hypothesis when the
null hypothesis is false.

Power = $1 - \beta$

The smaller P(Type II error) is, the greater the power


STATISTICAL INFERENCE
TYPE OF TESTS
The type of test is determined by the Alternative Hypothesis (H1)

Left-Tailed Test - H1: parameter < value
Note that the inequality points to the left
Decision Rule: Reject H0 if test statistic < critical value
STATISTICAL INFERENCE

Right-Tailed Test - H1: parameter > value
Note that the inequality points to the right
Decision Rule: Reject H0 if test statistic > critical value
STATISTICAL INFERENCE
Two-Tailed Test - H1: parameter $\neq$ value
Note that the inequality points to both sides
Decision Rule: Reject H0 if test statistic < critical value
(left) or test statistic > critical value (right)
Six Steps of Hypothesis Testing
Any significance test procedure has six distinct steps

1. Making assumptions
2. Constructing hypotheses
3. Determining the test statistic
4. Constructing critical region
5. Determining p-values
6. Drawing conclusion
UNIVARIATE POPULATION
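Significance test for population
mean (when $\sigma$ is known)
Assumptions:
A random sample is drawn from a population with mean $\mu$ and sd $\sigma$
The sample size is large, or the population is normal (in which case the
sample may be small)
Population sd $\sigma$ is known

Hypotheses:
Two-tailed:   H0: $\mu = \mu_0$    H1: $\mu \neq \mu_0$
Right-tailed: H0: $\mu \le \mu_0$    H1: $\mu > \mu_0$
Left-tailed:  H0: $\mu \ge \mu_0$    H1: $\mu < \mu_0$

Test Statistic: sample mean $\bar{x}$

By the CLT (approximately; exactly if the population is normal),
$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$, under H0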
Hypothesis Test of the Population
Mean When $\sigma$ Is Known
The Critical Value Approach
Determine the critical value(s) according to the
specification of the competing hypotheses.

Two-tailed: reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Left-tailed: reject H0 if $z < -z_{\alpha}$
Right-tailed: reject H0 if $z > z_{\alpha}$

Critical Region:
Right tail: $\{z_{obs} \ge z_{\alpha}\}$
Left tail: $\{z_{obs} \le -z_{\alpha}\}$
Two tail: $\{z_{obs} \le -z_{\alpha/2} \text{ or } z_{obs} \ge z_{\alpha/2}\}$
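The critical regions above can be sketched in Python. This is only an illustration under stated assumptions: the function names `z_critical` and `in_critical_region` are hypothetical (not from the slides), and the standard normal quantiles are taken from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Return the critical value(s) of a z-test as a tuple."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'right', 'left' or 'two'")


def in_critical_region(z_obs, alpha, tail):
    """True when z_obs falls in the critical (rejection) region."""
    crit = z_critical(alpha, tail)
    if tail == "right":
        return z_obs >= crit[0]
    if tail == "left":
        return z_obs <= crit[0]
    return z_obs <= crit[0] or z_obs >= crit[1]
```

For example, `z_critical(0.05, "two")` gives roughly (-1.96, 1.96), the familiar two-tailed cut-offs at the 5% level.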
Example
An automatic bottling machine fills cola into two-liter (2000 cc)
bottles. A consumer advocate wants to challenge this average
amount. A random sample of 40 bottles coming out of the machine
was selected and the exact contents of the selected bottles were
recorded. The sample mean was 1999.6 cc. The population
standard deviation is known from past experience to be 1.30 cc.
Test the appropriate hypothesis.

H0: $\mu \ge 2000$
H1: $\mu < 2000$
Hypothesis Test of the Population
Mean When $\sigma$ Is Known
The p-value Approach
Determine the p-value according to the specification
of the competing hypotheses.

Reject H0 if p-value < $\alpha$

Test statistic; p-value
$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{1999.6 - 2000}{1.3 / \sqrt{40}} = -1.95$
$z_{0.05} = 1.645$
$z_{obs} < -1.645$
p-value $= P(Z \le -1.95) = 0.0256 < 0.05$

Reject the null hypothesis, i.e. the test is significant
There is sufficient evidence for rejection
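The bottling computation can be checked with a few lines of Python. This is a minimal sketch using only the numbers given in the example; the variable names are illustrative, and the test is assumed to be left-tailed (H1: $\mu < 2000$), as the p-value $P(Z \le -1.95)$ indicates.

```python
from math import sqrt
from statistics import NormalDist

# Values from the bottling example
mu0, x_bar, sigma, n, alpha = 2000.0, 1999.6, 1.30, 40, 0.05

# z statistic: distance of the sample mean from mu0 in standard-error units
z_obs = (x_bar - mu0) / (sigma / sqrt(n))

# Left-tailed p-value: area under the standard normal curve below z_obs
p_value = NormalDist().cdf(z_obs)

reject_h0 = p_value < alpha
print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Without intermediate rounding the p-value comes out near 0.026, marginally above the table value of 0.0256 obtained from the rounded z of -1.95; the decision is the same.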
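Significance test for population
mean (when $\sigma$ is unknown)
Large sample
Assumptions:
A random sample is drawn from a population with mean $\mu$
and sd $\sigma$
Sample size should be large
Population sd is unknown

Hypotheses:
Two-tailed:   H0: $\mu = \mu_0$    H1: $\mu \neq \mu_0$
Right-tailed: H0: $\mu \le \mu_0$    H1: $\mu > \mu_0$
Left-tailed:  H0: $\mu \ge \mu_0$    H1: $\mu < \mu_0$

Test Statistic: sample mean $\bar{x}$

$\sigma$ needs to be estimated. An (approximately unbiased) estimator of $\sigma$ for large n
is
$s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
By the CLT (approximately),
$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
$z_{obs} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim N(0, 1)$, under H0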
Critical Region:
Right tail: $\{z_{obs} \ge z_{\alpha}\}$
Left tail: $\{z_{obs} \le -z_{\alpha}\}$
Two tail: $\{z_{obs} \le -z_{\alpha/2} \text{ or } z_{obs} \ge z_{\alpha/2}\}$
Example
An automatic bottling machine fills cola into two-liter (2000 cc)
bottles. A consumer advocate wants to challenge this average
amount. A random sample of 40 bottles coming out of the machine
was selected and the exact contents of the selected bottles were
recorded. The sample mean and sd were found to be 1999.6 cc
and 1.3 cc respectively.
Test the appropriate hypothesis.

H0: $\mu \ge 2000$
H1: $\mu < 2000$
Given: $n = 40$, $\bar{x} = 1999.6$, $s = 1.3$
Test statistic; p-value
$z_{obs} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1999.6 - 2000}{1.3 / \sqrt{40}} = -1.95$
$z_{0.05} = 1.645$
$z_{obs} < -1.645$
p-value $= P(Z \le -1.95) = 0.0256 < 0.05$

Reject the null hypothesis, i.e. the test is significant
There is sufficient evidence to believe that the bottles
do not contain at least 2000 cc of cola on average.
Problem
I believe that on average a PGP student at IIMK
spends 15 hours per week using library resources. A
random sample of 8 students was selected and
the average number of hours they spend in the library
came out to be 16.3 hrs. Assuming reading time follows a
normal distribution with sd 3.6 hrs, test a suitable
hypothesis.

To test
H0: $\mu = 15$
H1: $\mu \neq 15$
Test statistic; p-value
$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{16.3 - 15}{3.6 / \sqrt{8}} = 1.02$
$z_{0.025} = 1.96$
$|z_{obs}| < 1.96$
p-value $= P(Z \ge 1.02) + P(Z \le -1.02) = 0.1539 \times 2 = 0.3078 > 0.05$

Do not reject the null hypothesis, i.e. the test is insignificant
There is not enough evidence to reject the belief
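As a cross-check on the library-hours problem, the critical value approach and the p-value approach can be run side by side. The sketch below uses only the figures from the problem; the variable names are illustrative, and the normal quantiles come from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

# Values from the library-hours problem (two-tailed test, sigma known)
mu0, x_bar, sigma, n, alpha = 15.0, 16.3, 3.6, 8, 0.05
std_normal = NormalDist()

z_obs = (x_bar - mu0) / (sigma / sqrt(n))

# Critical value approach: compare |z_obs| with z_{alpha/2}
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z_obs) >= z_crit

# p-value approach: twice the one-tailed area beyond |z_obs|
p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))
reject_by_p_value = p_value < alpha

print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.2f}, p-value = {p_value:.4f}")
```

Both approaches always lead to the same decision at a given significance level; here neither rejects H0.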
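Significance test for population
proportion
Assumptions:
A random sample is drawn from a population
Sample size should be large

Hypotheses:
Two-tailed:   H0: $p = p_0$    H1: $p \neq p_0$
Right-tailed: H0: $p \le p_0$    H1: $p > p_0$
Left-tailed:  H0: $p \ge p_0$    H1: $p < p_0$
Test Statistic: sample proportion $\hat{p}$
By the CLT (approximately),
$\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)$
$z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} \sim N(0, 1)$, under H0
Critical Region:
Right tail: $\{z_{obs} \ge z_{\alpha}\}$
Left tail: $\{z_{obs} \le -z_{\alpha}\}$
Two tail: $\{z_{obs} \le -z_{\alpha/2} \text{ or } z_{obs} \ge z_{\alpha/2}\}$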
Example: Contd.
A producer of electric bulbs claims that the proportion of
defectives in a lot can never be more than 1%. A
wholesaler takes a random sample of 100 bulbs
from the lot and finds 3 defectives.

H0: p = 0.01
H1: p > 0.01
Approach 1: Test statistic:

$z_{obs} = \frac{0.03 - 0.01}{\sqrt{0.01(0.99) / 100}} = 2.01$, under H0
$z_{0.05} = 1.645$
$z_{obs} > z_{0.05}$

Reject the null hypothesis, i.e. the test is significant
There is sufficient evidence to reject the claim of
the producer
p-value: The p-value depends on the direction of the
alternative as follows:
If H1: $p > p_0$, the p-value is the right-tailed area above
the observed value of the test statistic ($z_{obs}$) under the
standard normal curve.
If H1: $p < p_0$, the p-value is the left-tailed area below the
observed value of the test statistic under the standard normal
curve.
If H1: $p \neq p_0$, the p-value is the two-tailed area beyond the
observed value of the test statistic under the standard normal
curve. Since the normal curve is symmetric, it can also be
calculated as twice the one-tailed area above (or below) the
observed value of the test statistic.
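The three cases can be written as one small function. This is a sketch only: the name `z_p_value` and its `tail` argument are hypothetical, and the standard normal CDF is taken from `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_p_value(z_obs, tail):
    """p-value of a z-test given the direction of the alternative."""
    phi = NormalDist().cdf  # standard normal CDF
    if tail == "right":   # H1: parameter > null value
        return 1 - phi(z_obs)
    if tail == "left":    # H1: parameter < null value
        return phi(z_obs)
    if tail == "two":     # H1: parameter != null value
        return 2 * (1 - phi(abs(z_obs)))
    raise ValueError("tail must be 'right', 'left' or 'two'")
```

For instance, `z_p_value(2.01, "right")` is about 0.022 and `z_p_value(1.02, "two")` is about 0.308.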
Approach 2: p-value
$P(\hat{p} \ge 0.03) = P\left(Z \ge \frac{0.03 - 0.01}{\sqrt{0.01(0.99) / 100}}\right)$
$= P(Z \ge 2.01) = 1 - \Phi(2.01)$
$= 1 - 0.9778 = 0.0222 < 0.05$

Reject the null hypothesis, i.e. the test is significant
There is sufficient evidence to reject the claim of
the producer
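Both approaches for the bulb example can be reproduced in a few lines. This is a sketch that follows the normal approximation used in the slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the bulb example (right-tailed test of a proportion)
defectives, n, p0, alpha = 3, 100, 0.01, 0.05
std_normal = NormalDist()

p_hat = defectives / n
# Standard error computed under H0, i.e. using p0 rather than p_hat
se_h0 = sqrt(p0 * (1 - p0) / n)
z_obs = (p_hat - p0) / se_h0

z_crit = std_normal.inv_cdf(1 - alpha)   # Approach 1: critical value
p_value = 1 - std_normal.cdf(z_obs)      # Approach 2: right-tailed p-value

reject_h0 = z_obs > z_crit and p_value < alpha
print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.3f}, p-value = {p_value:.4f}")
```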
Problem
It is known that the percentage of managers who are female
in the Indian corporate sector has been pretty low, about
18%. The HRD ministry wants to know whether the
percentage has improved during recent times. Accordingly,
a random sample of 100 managers was chosen and 25 of
them were female.
Perform an appropriate test of hypotheses for the above
problem.
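Significance test for population
mean (when $\sigma$ is unknown)
Small sample
Assumptions:
A random sample is drawn from a normal distribution
with mean $\mu$ and sd $\sigma$
Population sd is unknown

Hypotheses:
Two-tailed:   H0: $\mu = \mu_0$    H1: $\mu \neq \mu_0$
Right-tailed: H0: $\mu \le \mu_0$    H1: $\mu > \mu_0$
Left-tailed:  H0: $\mu \ge \mu_0$    H1: $\mu < \mu_0$

Test Statistic: sample mean $\bar{x}$

$\sigma$ needs to be estimated. The estimator of $\sigma$ based on the unbiased sample variance is
$s' = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$;  $\frac{(n - 1) s'^2}{\sigma^2} \sim \chi^2_{n-1}$
$t_{obs} = \frac{\bar{x} - \mu_0}{s' / \sqrt{n}} \sim t_{n-1}$, under H0
Critical Region:
Right tail: $\{t_{obs} \ge t_{\alpha; n-1}\}$
Left tail: $\{t_{obs} \le -t_{\alpha; n-1}\}$
Two tail: $\{t_{obs} \le -t_{\alpha/2; n-1} \text{ or } t_{obs} \ge t_{\alpha/2; n-1}\}$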
Example
New software companies that create programs for World Wide Web applications believe
that the average staff age at these companies is 27. A random sample of 18 staff is chosen
from these companies and their ages are as follows: 41, 18, 25, 36, 26, 35, 24, 30, 28,
19, 22, 22, 26, 23, 24, 31, 22, 22. Test the appropriate hypothesis.

H0: $\mu = 27$
H1: $\mu \neq 27$
$n = 18$, $\bar{x} = 26.3$, $s = 6.15$
For $\alpha = 0.05$ and (18 - 1) = 17 df, the critical values of t are $\pm 2.11$.
The test statistic is:
$t_{obs} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{26.3 - 27}{6.15 / \sqrt{18}} = -0.48$
Since $|t_{obs}| < 2.11$, do not reject H0.
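The summary statistics in this example can be recomputed from the raw ages. The sketch below uses the standard library's `statistics` module, whose `stdev` uses the n - 1 denominator; the critical value 2.11 is copied from the slide rather than computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

ages = [41, 18, 25, 36, 26, 35, 24, 30, 28,
        19, 22, 22, 26, 23, 24, 31, 22, 22]
mu0 = 27
t_crit = 2.11  # two-tailed critical value for alpha = 0.05, 17 df (from the slide)

n = len(ages)
x_bar = mean(ages)
s = stdev(ages)  # sample sd with n - 1 in the denominator

t_obs = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = abs(t_obs) > t_crit

print(f"n = {n}, mean = {x_bar:.2f}, sd = {s:.2f}, t_obs = {t_obs:.2f}")
```

With the unrounded mean the statistic is about -0.46 rather than -0.48 (which comes from rounding the mean to 26.3); either way it lies well inside the acceptance region.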
Equivalence between Hypothesis
Tests and Confidence Intervals
The main idea is that a two-sided hypothesis test will give
us exactly the same conclusion (about the population
parameter) as a confidence interval, i.e. if we test
H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$ and fail to reject H0 at significance
level $\alpha$ (= 0.01/0.05/0.1), then the corresponding $100(1 - \alpha)\%$
(99%, 95%, 90%) confidence interval will contain the null
value (i.e. $\mu_0$).

95% CI of $\mu$:
$[26.3 - (1.96 \times 6.15 / \sqrt{18}),\ 26.3 + (1.96 \times 6.15 / \sqrt{18})]$
$= [23.46, 29.14]$
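The interval can be verified numerically. The sketch below reproduces the slide's interval with the normal multiplier 1.96 and, as an assumption added here for comparison, also shows the interval with the t multiplier 2.11 (17 df) used in the preceding test; the t-based interval is the one that matches that test exactly.

```python
from math import sqrt

x_bar, s, n, mu0 = 26.3, 6.15, 18, 27
se = s / sqrt(n)  # estimated standard error of the mean


def interval(multiplier):
    """Symmetric confidence interval x_bar +/- multiplier * se."""
    half_width = multiplier * se
    return (x_bar - half_width, x_bar + half_width)


ci_z = interval(1.96)  # multiplier used on the slide
ci_t = interval(2.11)  # t multiplier for 17 df, matching the t-test

for name, (lo, hi) in (("z", ci_z), ("t", ci_t)):
    print(f"{name}-based 95% CI: [{lo:.2f}, {hi:.2f}], contains 27: {lo <= mu0 <= hi}")
```

Both intervals contain the null value 27, in line with the decision not to reject H0.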
Problem
The manager of a small convenience store does not want her customers
standing in line for too long prior to a purchase. In particular, she is willing to
hire an employee for another cash register if the average wait time of the
customers is more than five minutes. She randomly observes the wait time (in
minutes) of customers during the day as:

3.5 5.8 7.2 1.9 6.8 8.1 5.4

a. Set up the null and the alternative hypotheses to determine if
the manager needs to hire another employee.
b. Calculate the value of the test statistic. What assumption
regarding the population is necessary to implement this step?
c. Use the critical value approach to decide whether the manager
needs to hire another employee at $\alpha = 0.10$.
d. Repeat the above analysis with the p-value approach.
Probability of a Type II Error
A Type II error occurs when a false null hypothesis is not
rejected.

In Example 11.1, this means that if $\bar{x}$ is less than 175.34 (our
critical value) we will not reject our null hypothesis, which
means that we will not install the new billing system.

Thus, we can see that:

$\beta = P(\bar{x} < 175.34 \mid \text{the null hypothesis is false})$


Example 11.1 (revisited)

$\beta = P(\bar{x} < 175.34 \mid \text{the null hypothesis is false})$

The condition only tells us that the mean $\mu \neq 170$. We need to
compute $\beta$ for some new value of $\mu$. For example, suppose
that if the mean account balance is $180 the new billing
system will be so profitable that we would hate to lose the
opportunity to install it.

$\beta = P(\bar{x} < 175.34 \mid \mu = 180)$
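This probability can be evaluated numerically. The sketch below assumes $\sigma = 65$ and $n = 400$, the values that appear in the later calculations for Example 11.1; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 11.1 settings (sigma and n as used in the later slides)
sigma, n = 65, 400
x_crit = 175.34   # critical value of the sample mean
mu_alt = 180      # the "new" value of the population mean

# Standardize the critical value under the alternative mean
z = (x_crit - mu_alt) / (sigma / sqrt(n))

beta = NormalDist().cdf(z)  # P(x_bar < x_crit | mu = mu_alt)
power = 1 - beta

print(f"z = {z:.2f}, beta = {beta:.4f}, power = {power:.4f}")
```

The result is a Type II error probability of roughly 0.076, i.e. a power of about 0.92 against $\mu = 180$.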


Example 11.1 (revisited)
[Figure: sampling distribution of $\bar{x}$ under our original hypothesis and under our new assumption]
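Effects on $\beta$ of Changing $\alpha$
Decreasing the significance level $\alpha$ increases the
value of $\beta$, and vice versa. Change $\alpha$ to .01 in
Example 11.1.

Stage 1: Rejection region
$z > z_{\alpha} = z_{.01} = 2.33$
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{\bar{x} - 170}{65 / \sqrt{400}} > 2.33$
$\bar{x} > 177.57$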
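Effects on $\beta$ of Changing $\alpha$
Stage 2: Probability of a Type II error
$\beta = P(\bar{x} < 177.57 \mid \mu = 180)$
$= P\left(\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < \frac{177.57 - 180}{65 / \sqrt{400}}\right)$
$= P(z < -.75)$
$= .2266$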
Effects on $\beta$ of Changing $\alpha$
Consider this diagram again. Shifting the critical
value line to the right (to decrease $\alpha$) will mean a
larger area under the lower curve for $\beta$ (and vice
versa)
Judging the Test
A statistical test of hypothesis is effectively defined
by the significance level ($\alpha$) and the sample size (n),
both of which are selected by the statistics
practitioner.
Therefore, if the probability of a Type II error ($\beta$) is
judged to be too large, we can reduce it by
increasing $\alpha$,
and/or
increasing the sample size, n.
Judging the Test
For example, suppose we increased n from a
sample size of 400 account balances to 1,000 in
Example 11.1.

Stage 1: Rejection region
$z > z_{\alpha} = z_{.05} = 1.645$
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{\bar{x} - 170}{65 / \sqrt{1,000}} > 1.645$
$\bar{x} > 173.38$
Judging the Test

Stage 2: Probability of a Type II error
$\beta = P(\bar{x} < 173.38 \mid \mu = 180)$
$= P\left(\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < \frac{173.38 - 180}{65 / \sqrt{1,000}}\right)$
$= P(z < -3.22)$
$\approx 0$
By increasing the sample size we reduce the
probability of a Type II error:
[Figure: compare $\beta$ at n=400 (critical value 175.35) and n=1,000 (critical value 173.38)]
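The two-stage calculation (rejection region, then Type II error) can be wrapped in one function to see how $\beta$ responds to $\alpha$ and n. This is a sketch for a right-tailed z-test; the function name `type_ii_error` is hypothetical and the inputs are the Example 11.1 values ($\mu_0 = 170$, $\sigma = 65$, alternative mean 180).

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_alt, sigma, n, alpha):
    """Return (critical sample mean, beta) for a right-tailed z-test."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    # Stage 1: rejection region is x_bar > x_crit
    x_crit = mu0 + std_normal.inv_cdf(1 - alpha) * se
    # Stage 2: beta = P(x_bar < x_crit | mu = mu_alt)
    beta = std_normal.cdf((x_crit - mu_alt) / se)
    return x_crit, beta


base = type_ii_error(170, 180, 65, 400, 0.05)
smaller_alpha = type_ii_error(170, 180, 65, 400, 0.01)
larger_n = type_ii_error(170, 180, 65, 1000, 0.05)

for label, (x_crit, beta) in (("alpha=.05, n=400", base),
                              ("alpha=.01, n=400", smaller_alpha),
                              ("alpha=.05, n=1000", larger_n)):
    print(f"{label}: x_crit = {x_crit:.2f}, beta = {beta:.4f}")
```

Lowering $\alpha$ from .05 to .01 raises $\beta$ from about .08 to about .23, while raising n from 400 to 1,000 drives $\beta$ close to zero.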
TWO INDEPENDENT
UNIVARIATE
POPULATIONS
Sampling
In order to compare two groups (populations), we have to select
samples from both the groups. If the observations in one sample
are independent of those in the other, then those are
called independent samples.
E.g., suppose we want to compare two drugs. We select a sample of
patients and randomly allocate them to the two drugs. These two
groups of patients (and also the observations coming from them)
will constitute independent samples since they were randomly
allocated to the two groups corresponding to the two drugs.
Significance test for difference
between population proportions
Notations:
$p_1$ ($p_2$): population proportion of success in the first
(second) group.
$n_1$ ($n_2$): sizes of random samples drawn from the first
(second) populations.

Assumptions:
Independent random samples from the two groups.
Large enough sample sizes so that in each sample there
are at least 5 successes and 5 failures.

Hypotheses:
Two-tailed:   H0: $p_1 = p_2$    H1: $p_1 \neq p_2$
Right-tailed: H0: $p_1 \le p_2$    H1: $p_1 > p_2$
Left-tailed:  H0: $p_1 \ge p_2$    H1: $p_1 < p_2$
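Test Statistic: difference between sample proportions
$\hat{p}_1 - \hat{p}_2$

Let $x_1$ and $x_2$ represent the number of observations that belong to
the class of interest in samples 1 and 2, respectively.
$\hat{p}_1 = \frac{x_1}{n_1}$;  $\hat{p}_2 = \frac{x_2}{n_2}$
Pooled estimate: $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$
By the CLT (approximately),
$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$
$z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$, under H0
Critical Region:
Right tail: $\{z_{obs} \ge z_{\alpha}\}$
Left tail: $\{z_{obs} \le -z_{\alpha}\}$
Two tail: $\{z_{obs} \le -z_{\alpha/2} \text{ or } z_{obs} \ge z_{\alpha/2}\}$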
Example: Two Population Proportions
Is there a significant difference between the
proportion of men and the proportion of women
who will vote Yes on a proposal from the local
administration?

In a random sample, 36 of 72 men and 31 of 50
women indicated they would vote Yes

Test at the .05 level of significance


Example: contd..

H0: $p_1 - p_2 = 0$ (the two proportions are equal)
H1: $p_1 - p_2 \neq 0$ (there is a significant difference between proportions)

Sample proportion of men: $\hat{p}_1 = 36/72 = 0.5$
Sample proportion of women: $\hat{p}_2 = 31/50 = 0.62$

The pooled estimate for the overall proportion is:
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$
Example: contd.

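The test statistic for testing H0:
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549)\left(\frac{1}{72} + \frac{1}{50}\right)}} = -1.31$

Critical values = $\pm 1.96$ for $\alpha = .05$
Decision: Do not reject H0
Conclusion: There is no significant evidence of a
difference between men and women in the proportion who
will vote Yes.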
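The voting example can be recomputed directly from the counts. This is a minimal sketch of the pooled two-proportion z-test described above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the voting example
x1, n1 = 36, 72   # men voting Yes, men sampled
x2, n2 = 31, 50   # women voting Yes, women sampled
alpha = 0.05
std_normal = NormalDist()

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # pooled estimate under H0: p1 = p2

se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_obs = (p1_hat - p2_hat) / se

z_crit = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))
reject_h0 = abs(z_obs) >= z_crit

print(f"pooled p = {p_pooled:.3f}, z_obs = {z_obs:.2f}, p-value = {p_value:.3f}")
```

The two-tailed p-value is about 0.19, which agrees with the decision not to reject H0 at the .05 level.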
Significance test for difference
between population means
(population sds are known)
(both small and large sample)
Notations:
$\mu_1$ ($\mu_2$): population mean in the first (second) group.
$\sigma_1$ ($\sigma_2$): population sd in the first (second) group.
$n_1$ ($n_2$): sizes of random samples drawn from the first
(second) populations.

Assumptions:
Independent random samples are drawn from the two populations.
$n_1$ and $n_2$ are large, or the populations are normal (in which case
the samples may be small)

Hypotheses:
Two-tailed:   H0: $\mu_1 = \mu_2$    H1: $\mu_1 \neq \mu_2$
Right-tailed: H0: $\mu_1 \le \mu_2$    H1: $\mu_1 > \mu_2$
Left-tailed:  H0: $\mu_1 \ge \mu_2$    H1: $\mu_1 < \mu_2$
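Test Statistic: difference between sample means
$\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$ (unbiased estimators)

By the CLT (approximately; exactly if the populations are normal),
$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
$z_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$, under H0
Critical Region:
Right tail: $\{z_{obs} \ge z_{\alpha}\}$
Left tail: $\{z_{obs} \le -z_{\alpha}\}$
Two tail: $\{z_{obs} \le -z_{\alpha/2} \text{ or } z_{obs} \ge z_{\alpha/2}\}$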
Example
A product developer is interested in reducing the drying time of
primer paint. Two formulations of the paint are tested.
Formulation-1 is the standard chemistry and Formulation-2 has a
new drying ingredient that should reduce the drying time. From
experience it is known that the standard deviation of drying time is
8 minutes and this inherent variability should be unaffected by the
addition of the new ingredient. Ten specimens are painted with
Formulation-1 and another 10 specimens are painted with
Formulation-2; the 20 specimens are painted in random order. The
two sample average drying times are 121 minutes and 112 minutes
respectively. What conclusions can the product developer draw
about the effectiveness of the new ingredient, using $\alpha = 0.05$?
Example: contd..
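$X_1$: drying time from formulation 1
$X_2$: drying time from formulation 2
Let
$X_1 \sim N(\mu_1, \sigma_1^2)$
$X_2 \sim N(\mu_2, \sigma_2^2)$
$X_1$ and $X_2$ are independently distributed

H0: $\mu_1 = \mu_2$
H1: $\mu_1 > \mu_2$

$Z_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$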
Example : contd..
Reject H0 if $Z_{obs} > 1.645$ at $\alpha = 0.05$
Computations:
Since $\bar{x}_1 = 121$ minutes, $\bar{x}_2 = 112$ minutes,
$\sigma_1^2 = \sigma_2^2 = 8^2 = 64$ minutes$^2$ and $n_1 = n_2 = 10$, the value of the test
statistic is
$Z_{obs} = \frac{121 - 112}{\sqrt{\frac{8^2}{10} + \frac{8^2}{10}}} = 2.52$

Conclusion: Since $Z_{obs} = 2.52 > 1.645$, we reject H0: $\mu_1 - \mu_2 = 0$ at the 0.05
level of significance and conclude that adding the new ingredient to the
paint significantly reduces the drying time.
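The drying-time computation can be reproduced as follows. This is a short sketch of the two-sample z-test with known standard deviations, using only the numbers from the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the paint example (right-tailed test, H1: mu1 > mu2)
x_bar1, x_bar2 = 121.0, 112.0   # sample mean drying times (minutes)
sigma1 = sigma2 = 8.0           # known population sds (minutes)
n1 = n2 = 10
alpha = 0.05
std_normal = NormalDist()

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z_obs = (x_bar1 - x_bar2) / se   # hypothesized difference under H0 is 0

z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z_obs)
reject_h0 = z_obs > z_crit

print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.3f}, p-value = {p_value:.4f}")
```

The right-tailed p-value is about 0.006, so the same conclusion would hold even at the 1% level.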
Significance test for difference
between population means
(population sds are unknown)
Large sample
Notations:
$\mu_1$ ($\mu_2$): population mean in the first (second) group.
$\sigma_1$ ($\sigma_2$): population sd in the first (second) group.
$n_1$ ($n_2$): sizes of random samples drawn from the first
(second) populations.

Assumptions:
Independent random samples from the two groups are
drawn.
$n_1$ and $n_2$ are large

Hypotheses:
Two-tailed:   H0: $\mu_1 = \mu_2$    H1: $\mu_1 \neq \mu_2$
Right-tailed: H0: $\mu_1 \le \mu_2$    H1: $\mu_1 > \mu_2$
Left-tailed:  H0: $\mu_1 \ge \mu_2$    H1: $\mu_1 < \mu_2$
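Test Statistic: difference between sample means
$\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$ (unbiased estimators)

$\hat{\sigma}_1 = s_1 = \sqrt{\frac{1}{n_1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2}$
$\hat{\sigma}_2 = s_2 = \sqrt{\frac{1}{n_2} \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}$

By the CLT (approximately),
$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
$z_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0, 1)$, under H0
Critical Region:
Right tail: $\{z_{obs} \ge z_{\alpha}\}$
Left tail: $\{z_{obs} \le -z_{\alpha}\}$
Two tail: $\{z_{obs} \le -z_{\alpha/2} \text{ or } z_{obs} \ge z_{\alpha/2}\}$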
Significance test for difference
between population means
(population sds are unknown)
Small sample
Notations:
$\mu_1$ ($\mu_2$): population mean in the first (second) group.
$\sigma_1$ ($\sigma_2$): population sd in the first (second) group.
$n_1$ ($n_2$): sizes of random samples drawn from the first
(second) populations.

Assumptions:
Independent random samples are drawn from normal
distributions
$\sigma_1 = \sigma_2 = \sigma$ (say)

Hypotheses:
Two-tailed:   H0: $\mu_1 = \mu_2$    H1: $\mu_1 \neq \mu_2$
Right-tailed: H0: $\mu_1 \le \mu_2$    H1: $\mu_1 > \mu_2$
Left-tailed:  H0: $\mu_1 \ge \mu_2$    H1: $\mu_1 < \mu_2$
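Test Statistic: difference between sample means
$\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$ (unbiased estimators)

$\hat{\sigma}_1 = s_1' = \sqrt{\frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2}$
$\hat{\sigma}_2 = s_2' = \sqrt{\frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}$

Pooled estimator of $\sigma^2$ is
$s'^2 = \frac{(n_1 - 1) s_1'^2 + (n_2 - 1) s_2'^2}{n_1 + n_2 - 2}$
$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$
$\frac{(n_1 + n_2 - 2) s'^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$
$t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s' \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$, under H0
Critical Region:
Right tail: $\{t_{obs} \ge t_{\alpha; n_1 + n_2 - 2}\}$
Left tail: $\{t_{obs} \le -t_{\alpha; n_1 + n_2 - 2}\}$
Two tail: $\{t_{obs} \le -t_{\alpha/2; n_1 + n_2 - 2} \text{ or } t_{obs} \ge t_{\alpha/2; n_1 + n_2 - 2}\}$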
Example
A sample of scores on an examination given in statistics:

Men:   72 69 98 66 85 76 79 80 77
Women: 81 67 90 78 81 80 76

Is the mean score of women the same as that of men?

Hypothesis:
H0: $\mu_f = \mu_m$    H1: $\mu_f \neq \mu_m$
Solution:
                               Women          Men
Mean                           79             78
Variance                       47.33333333    90
Observations                   7              9
Pooled Variance                71.71428571
Hypothesized Mean Difference   0
df                             14
t Stat                         0.234318967
P(T<=t) one-tail               0.409064729
t Critical one-tail            1.761310115
P(T<=t) two-tail               0.818129458
t Critical two-tail            2.144786681
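The figures in the table can be recomputed from the raw scores. The sketch below uses the standard library's `statistics.variance` (n - 1 denominator) and the pooled-variance formula; the two-tailed critical value is copied from the table rather than computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

men = [72, 69, 98, 66, 85, 76, 79, 80, 77]
women = [81, 67, 90, 78, 81, 80, 76]
t_crit_two_tail = 2.144786681  # from the table: alpha = 0.05, 14 df

n_w, n_m = len(women), len(men)
var_w, var_m = variance(women), variance(men)  # unbiased sample variances

# Pooled estimate of the common variance
df = n_w + n_m - 2
pooled_var = ((n_w - 1) * var_w + (n_m - 1) * var_m) / df

# t statistic for H0: mu_f = mu_m (hypothesized difference 0)
t_obs = (mean(women) - mean(men)) / sqrt(pooled_var * (1 / n_w + 1 / n_m))
reject_h0 = abs(t_obs) > t_crit_two_tail

print(f"pooled variance = {pooled_var:.4f}, df = {df}, t = {t_obs:.4f}")
```

Since |t| is far below the two-tailed critical value, H0 is not rejected: the data give no evidence that the mean scores of women and men differ.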
