
Chapter 5

Hypothesis Testing

Recall that Statistics, the science of analyzing data, has two main branches – descriptive and
inferential statistics. In descriptive statistics we describe data or tell a story hidden in the data. In
inferential statistics, we infer a story about a population, based on data obtained for a sample. In
Inferential statistics, one of the things we do is estimate the values of certain population parameters,
something we learnt in the previous chapter. The other main thing we do in inferential statistics is
hypothesis testing, the subject of this chapter.

Although hypothesis testing may sound very technical, it is something you and I do every day. In
fact, it is part of your survival mechanism on a moment by moment basis. You are constantly engaged in
hypothesis testing. In order to survive in your environment, you need to make sure that your
environment is conducive to your survival and safety. If it deviates from being conducive to your survival
and safety, your senses will alert you to it and you will do something about it. For example, let’s say you
are working on your computer and suddenly you hear a loud explosion, an extraordinarily loud explosion,
your senses will alert you to it and you will leave whatever you were doing and would want to know what
happened, what caused the explosion and whether you need to take any action to ensure your safety.
For example, your mind might entertain all kinds of ideas such as – could a fire break out, has some country
declared war, etc. And if a fire could break out, could it affect your building? Do you need to evacuate, do you
need to call emergency services or make sure someone has already called them, etc.? In other words,
whenever a rare event happens, you take notice and do something about it especially if the rare event
affects you in some way. In this example, on a moment to moment basis you assume that your
environment is “normal” until one of your senses “alerts” you to a “sufficient deviation from normality”
which causes you to take notice and perhaps even some action particularly if the deviation from
normality affects you in some way. This is an example of hypothesis testing that you are constantly
engaged in. Your default hypothesis (or the null hypothesis) is that your environment is “normal”. Your
alternative hypothesis is that it isn’t. Your senses are constantly sending you signals. The moment a
signal suggests sufficient evidence to believe that an abnormal condition exists, you reject the null
hypothesis that all is normal.

Let me give you another example, a softer example, one that does not involve explosions.
Suppose you are having a nice dinner conversation with your friends and one of your friends tells you
that she saw a man almost 8 ft. tall the other day. I am sure you would, at least for a moment, stop eating, gasp,
and give your friend an amazed look, as if she has accomplished a very rare feat. It is very rare for
someone to be 8 ft. tall to begin with, even more rare for such a person to be around
your town, and more rare still for your friend to run into this person, so hearing about such a rare
event draws an unusual reaction from you. In fact, you must have noticed that at parties, during
conversations, people try to impress others by telling them things that are out of the ordinary.
Sometimes there is almost a competition going on as to who can narrate the most extraordinary true
incident, one that will cause everyone to gasp and make the narrator feel important. After all, if your friend had told
you that she saw someone who was 5’ 8”, not only would you not gasp, you might wonder what was
wrong with your friend – why is she wasting your time with such a mundane piece of information. In this
example, your null hypothesis is that everyone has a normal height of around 5’ 8” and a standard
deviation of around 3” and that most of the adult population falls somewhere between 4’ 6” and 7 ft.
Some basketball players go beyond 7 ft. The alternate hypothesis is that someone does not belong in this
normal range of height. Any news or evidence of someone being outside of this normal expected range
rejects your null hypothesis and it invokes a reaction and possibly some action from you.

As another example, in the court of law, in most countries, everyone is considered “innocent”
unless proven “guilty”. To establish guilt, there must be sufficient “evidence” – for example, the
defendant’s fingerprints found on the murder weapon. It is
harder to prove or establish “innocence”. Attorneys often try to establish innocence by establishing
alibis, but we all know that alibis can also be designed, and many criminals do design alibis in order to
establish innocence in case they are accused of a crime. In this example, the null hypothesis (or the
default hypothesis) is that the defendant is innocent and the alternative hypothesis is that the defendant
is guilty. When sufficient evidence of guilt is found, the null hypothesis is rejected. Till then the null
hypothesis is not rejected.

As another example, suppose you go to a doctor with some symptoms that in your mind are
sufficiently severe to raise a flag in your mind that something is wrong with you. The doctor will assume
that you are a normal and healthy person. But since you are complaining, he or she will look for scientific
evidence that you really need medical attention. So the doctor will order some tests that will try to
establish sufficient evidence that you indeed carry an abnormal condition. Without any evidence to the
contrary you are assumed to be normal. In this example, the null hypothesis is that you are healthy and
the alternative is that you are not and you need to have sufficient evidence to reject the null hypothesis.

This scientific process of testing for hypotheses has been used for hundreds of years by scientists,
criminologists, doctors, sociologists, statisticians, and really by everyone including you, on a day to day
basis.

Students find the subject of hypothesis testing very difficult to understand. And perhaps it was
very difficult during pre-Excel days. But these days, there is no reason to find testing of hypothesis a
difficult topic. Basically the line of reasoning is this - if the probability of some event is so small that it is
unlikely that the assumption of “all is normal” can be true then you reject that assumption and go with
the alternate hypothesis that “all is not normal”. In the explosion example, the probability of a loud
explosion is so small when all is normal, that you had to admit and take notice that something was not
“normal”. In the dinner table conversation, the probability of someone’s height being 8 ft. was so small
that you had to take notice that something was not “normal”. In the criminal case example the
probability of finding the defendant’s fingerprints on the murder weapon is so small in case the
defendant was indeed innocent, that you want to reject the null hypothesis that he (or she) is innocent.
In the medical example, if the x-rays reveal a large ulcer in your stomach, then it might be sufficient
evidence that not all is normal. So basically, one has to establish the probability of the “rare” event
under the assumption that all is normal. The probability of the rare event is called the p-value. If the p-
value is very small (say less than 5% or less than 1%), then you may say that you have sufficient evidence
against the null. So remember this – a small p-value means a small probability (because p stands for
probability) and small probabilities imply rare events or “not normal” events and therefore a low p-value
is evidence against the null hypothesis because the null hypothesis assumes that all is normal. If you
cannot understand the line of reasoning explained in the last two pages, then you will have a lot of
difficulty understanding the material in the rest of this course. So if you do not understand the last two
pages, please go back and read them again till you understand them.

Please note that in all the discussion in this chapter so far, the word “normal” has nothing to do
with the “normal probability distribution”. I am using the term normal as in nothing abnormal or nothing
out of the ordinary.

In hypothesis testing, we deal with two kinds of hypotheses – the null hypothesis and the
alternate hypothesis. The null hypothesis is usually represented by the symbol H0. The alternate
hypothesis is denoted by H1 or Ha.
Hypotheses are usually about a population parameter and we use a sample statistic to test a
hypothesis. The reason for this is very simple. If all of the data about a population were available, there
would be no need to hypothesize about the values of its parameters, because you could compute them exactly.
Since data about an entire population is not usually available, we need to hypothesize about the possible values of
its parameters. For example, we may make a null hypothesis that the average height of all the people in
your population is 68”. In symbols, we can write this as follows: H0 : µ = 68”. µ is a Greek symbol usually
used for a population average. The alternative hypothesis would be that the average height of all the
people in your population is greater than 68”. In symbols, we can write the alternate hypothesis as Ha: µ
> 68”. You can also have an alternate hypothesis like Ha: µ < 68” or Ha: µ ≠ 68”.

So how do we test this set of null and alternate hypotheses? We collect some data and obtain
the mean of the sample (since the hypothesis is about the population mean). If the sample mean is close
to 68”, such as 68.5”, we may not have sufficient evidence against the null hypothesis and we will
therefore not reject the null hypothesis. After all if the mean of the population is 68” then there is a
reasonably high probability of obtaining a sample whose mean happens to be 68.5”. If, however, the sample mean
is very different from 68”, such as 74”, the probability of obtaining it would be very small if the null
hypothesis were true, so it must be that the null hypothesis is not true, and your decision would be to
reject the null hypothesis. Remember, this is the same line of reasoning that we used earlier. So the
question is: what is the probability of obtaining a sample mean of 74” or more if the mean height was indeed
68”, i.e. if the null hypothesis were true? The answer to this question is the p-value. If the p-value is
small enough, then we can reject the null hypothesis.

So how do we compute this p-value?

Recall that in an earlier chapter we learnt how to compute probabilities for different ranges of
values of a normal random variable if the probability distribution of that random variable is known. So
for example, if we knew the mean and the standard deviation of a normal random variable then we can
find the probability of any range of values for that variable. As a refresher, in Figure 1, I reproduce the
first few rows of the standard normal table.

z      Cum Prob.      z      Cum Prob.      z      Cum Prob.      z      Cum Prob.
-3.00 0.00135 -1.50 0.06681 0.00 0.50000 1.50 0.93319
-2.95 0.00159 -1.45 0.07353 0.05 0.51994 1.55 0.93943
-2.90 0.00187 -1.40 0.08076 0.10 0.53983 1.60 0.94520
-2.85 0.00219 -1.35 0.08851 0.15 0.55962 1.65 0.95053
-2.80 0.00256 -1.30 0.09680 0.20 0.57926 1.70 0.95543
-2.75 0.00298 -1.25 0.10565 0.25 0.59871 1.75 0.95994
-2.70 0.00347 -1.20 0.11507 0.30 0.61791 1.80 0.96407
-2.65 0.00402 -1.15 0.12507 0.35 0.63683 1.85 0.96784
-2.60 0.00466 -1.10 0.13567 0.40 0.65542 1.90 0.97128
-2.55 0.00539 -1.05 0.14686 0.45 0.67364 1.95 0.97441
-2.50 0.00621 -1.00 0.15866 0.50 0.69146 2.00 0.97725
-2.45 0.00714 -0.95 0.17106 0.55 0.70884 2.05 0.97982
Figure 1: Partial Standard Normal Table

Using this table, we can answer questions like: what is P(z < 2.0)? or what is P(z > 2.0)? I hope you
remember how to do it. P(z < 2.0) is 0.97725 and P(z > 2.0) is 1 minus 0.97725 = 0.02275.

Similarly, P(z > 1.65) = 1 minus 0.95053 = 0.04947.


Also recall that you can use the standard normal distribution to answer these types of questions for any
normal random variable as long as you know the mean and the standard deviation of the distribution.
You do this by standardizing the value of the normal random variable. As a refresher, let us say the
normal variable about which I need some probabilities is called X and it has a normal distribution with a
mean of 40 and a standard deviation of 2. What is P(X > 43)? To standardize X to z, we do (43 – 40)/2 =
1.5.

So P(X > 43) is the same as P(z > 1.5) which is 1 minus 0.93319 = 0.06681.
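These table lookups and the standardization step can be reproduced in Python's standard library. The sketch below uses `statistics.NormalDist` (no external packages assumed) to recompute the probabilities above:

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Table-style lookups matching Figure 1
p_below_2 = std.cdf(2.0)         # P(z < 2.0)  = 0.97725
p_above_2 = 1 - std.cdf(2.0)     # P(z > 2.0)  = 0.02275
p_above_165 = 1 - std.cdf(1.65)  # P(z > 1.65) = 0.04947

# Standardizing a general normal: X ~ Normal(mean 40, sd 2), find P(X > 43)
z = (43 - 40) / 2                # z = 1.5
p_above_43 = 1 - std.cdf(z)      # P(z > 1.5) = 0.06681

# Equivalently, without standardizing by hand:
x = NormalDist(mu=40, sigma=2)
p_above_43_direct = 1 - x.cdf(43)
```

Standardizing by hand and letting `NormalDist(mu=40, sigma=2)` do it internally give the same answer, which is the whole point of the z-table.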

So how to test a hypothesis such as: H0 : µ = 68” and Ha: µ >68”?

Basically, we want to get a sample of data and find the sample mean. Call the sample mean x-bar.

Then find P(mean > x-bar), i.e. the probability of obtaining a sample mean at least as large as x-bar if the
null hypothesis were true. This probability is called the p-value. If this p-value is very small, say
smaller than 0.05, then we say we have sufficient evidence to reject the null hypothesis at a significance
level of 0.05. If this probability is greater than 0.05, then we fail to reject the null hypothesis at a
significance level of 0.05.

Let’s say we collect some sample data of say size 36 and let’s say we find that the sample mean (or x-bar)
comes out to be 68.9” and the standard deviation comes out to be 3 inches. The question is what is
P(mean > 68.9”) if the null hypothesis were true? To answer this question, we need the probability
distribution of the sample mean under the null hypothesis. It turns out that, if the null hypothesis were
true, the sample mean would have a normal distribution whose mean is 68” and whose standard
deviation is 3/sqrt(36) = 3/6 = 0.5”. So the question is: what is P(mean > 68.9”) if the mean
has a normal distribution with a mean of 68” and a standard deviation of 0.5”? We normalize 68.9” as
(68.9 – 68)/0.5 = 0.9/0.5 = 1.8

So P(mean > 68.9) is the same as P(z > 1.80) which is 1 minus 0.96407 = 0.03593. Since this probability is
less than 0.05, we have sufficient evidence to reject the null hypothesis at a significance level of 0.05.
Note that we can reject the null hypothesis at a significance level of 0.04 or even at 0.036 or any value
greater than 0.03593 but we could not reject the null hypothesis at a significance level of 0.01 or 0.02 or
0.03 or any value less than 0.03593.
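The whole calculation above can be checked in a few lines of Python (a sketch using only the standard library; the numbers are those from the example):

```python
from statistics import NormalDist
from math import sqrt

mu0, xbar, s, n = 68.0, 68.9, 3.0, 36
se = s / sqrt(n)                   # standard error = 3/6 = 0.5
z = (xbar - mu0) / se              # test statistic = 1.8
p_value = 1 - NormalDist().cdf(z)  # P(z > 1.8) = 0.03593
reject_at_05 = p_value < 0.05      # True: reject the null at 0.05
```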

Let’s say instead of 68.9”, our sample mean came out to be 68.8”. Now we need P(mean > 68.8”). The
normalized value of 68.8” is 0.8/0.5 = 1.6, so P(mean > 68.8”) = P(z > 1.6) = 1 minus 0.94520 = 0.0548,
which is greater than 0.05. So we say that we did not find sufficient evidence to reject the null at a
significance level of 0.05. But we can reject the null at a significance level of 0.06 or 0.07 and any value
higher than 0.0548.

What is the significance of the significance level?

The lower the significance level at which you can reject, the more confident you can be in rejecting the null hypothesis. So when
the sample mean was 68.9” we were more confident in rejecting the null (smallest significance level of
0.03593) than when the sample mean was 68.8” (smallest significance level of 0.0548). Intuitively, too,
you should understand that the greater the deviation between the hypothesized mean (according to the null
hypothesis) and the sample mean, the greater the evidence against the null. In this example, since 68.9”
deviated more from 68” than 68.8” did, we were more confident in rejecting the null at 68.9” than at 68.8”.
In fact we couldn’t even reject the null (at a significance level of 0.05) when the sample mean was 68.8”.

I really hope you understood the above logic of hypothesis testing.


I want to point out one thing in the above examples. In the process of generating the p-values,
we first standardized the value of the sample means. For example we got the standardized value as 1.8
in the first example and 1.6 in the second. Using 1.8, we got P(z > 1.8) to be 0.03593 and using 1.6, we
got P(z > 1.6) to be 0.0548. We then said that since 0.03593 is less than 0.05 and if the desired
significance level is 0.05, we have sufficient evidence to reject the null. In the second example, we said
that since 0.0548 is greater than 0.05, which was our desired significance level, we failed to reject the
null. Going from the standardized value to the p-value required the extra step of reading the standard
normal table. Many textbooks and authors recommend bypassing this extra step of reading the table and
making the decision simply based on the standardized value itself. For the above example, for a
significance level of 0.05, if the standardized value happens to be greater than or equal to 1.645 then the
null hypothesis can be rejected. For a significance level of 0.01, if the standardized value is greater than
or equal to 2.33 then the null can be rejected. For a significance level of 0.10, if the standardized value is
greater than 1.28, then the null can be rejected. If we carried out the extra step of reading the standard
normal table to find the p-value, we would get the exact same decision.

Note also that in the above examples the alternate hypothesis was Ha: µ >68”. This alternate
hypothesis is considered a one-tailed hypothesis. Why is it considered a one-tailed hypothesis? It’s
because the sign in the alternate hypothesis is in one direction only (in this case “>”). An example of a
two-tailed alternate hypothesis would be Ha: µ ≠ 68”. In a two-tailed hypothesis, the null would be
rejected if either the sample mean is sufficiently higher than 68” or sufficiently lower than 68”. In the
one-tailed hypothesis (such as this - Ha: µ >68”), the null would only be rejected if the sample mean was
sufficiently higher than 68”. A one-tailed hypothesis can be in the other direction as well (such as Ha: µ
<68”). In this case, the null would only be rejected if the sample mean was sufficiently lower than 68”.
Also, in the above example, note that the sample size was 36, which can be considered a large sample
size. Any sample size larger than 30 may be considered a large sample.

The standardized value of the sample mean, in the above examples is also called the test statistic.
It is called the test statistic because it is a statistic (because it summarizes the sample data) and because
it is used as the basis for testing the hypothesis. The range of test statistic values above which (or below
which) the null hypothesis can be rejected is called the rejection region.

Let me now give you, in Figure 2, some rejection regions for various types of hypotheses for large sample
sizes:

Significance Level     Rejection Region of the Test Statistic
                       One Tailed (Lower Tail)     One Tailed (Upper Tail)     Two-Tailed
0.10 z < -1.280 z > 1.280 z < -1.645 or z > 1.645
0.05 z < -1.645 z > 1.645 z < -1.960 or z > 1.960
0.01 z < -2.330 z > 2.330 z < -2.575 or z > 2.575
Figure 2: Rejection Regions for Various Types of Null Hypotheses for Large Sample Sizes

You should understand that P(z < -1.280) is 0.10 and P(z > 1.280) is 0.10 and P(z < -1.645) + P(z > 1.645) =
0.10. This is why the rejection regions are what they are in Figure 2. Similarly, P(z < -1.645) = 0.05 and
P(z > 1.645) is 0.05 and P(z < -1.96) + P(z > 1.96) = 0.05. Similarly, P(z < -2.33) = 0.01 and P(z > 2.33) = 0.01
and P(z < -2.575) + P(z > 2.575) = 0.01.
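The cutoffs in Figure 2 come from inverting the standard normal distribution. The following sketch recovers them with `statistics.NormalDist.inv_cdf` (note the table rounds some values, e.g. 1.282 to 1.280 and 2.326 to 2.330):

```python
from statistics import NormalDist

std = NormalDist()
for alpha in (0.10, 0.05, 0.01):
    z_one = std.inv_cdf(1 - alpha)      # one-tailed cutoff: P(z > z_one) = alpha
    z_two = std.inv_cdf(1 - alpha / 2)  # two-tailed cutoff: P(z < -z_two) + P(z > z_two) = alpha
    print(f"alpha={alpha}: one-tailed cutoff {z_one:.3f}, two-tailed cutoff {z_two:.3f}")
```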

Hypothesis Testing for Small Samples

In the first part of this chapter we learnt how to perform a test of hypothesis for a large sample (>30). In
this part, we will look at how to test a hypothesis when the sample size is small (<30).

What is the main difference when dealing with a small sample as opposed to a large sample?

If you recall, for large sample, we specified the rejection region for various significance levels. For
example, for a one-tailed test (upper tail), for a significance level of 0.05, the rejection region was z >
1.645. We got 1.645 from the standard normal table. Basically P(z > 1.645) = 0.05.

The main difference is that for small samples, we get the cutoff values for the rejection region from the t-
tables instead of the z-tables. In the t-tables, the cutoffs depend on two things – the significance level and
the degrees of freedom. The degrees of freedom for the kinds of hypotheses we have seen so far are (n – 1),
where n is the sample size.

That is the only difference. In an earlier chapter we learnt how to use the t-tables. Next we will see lots
of examples of hypothesis testing.
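For example, the upper-tail cutoff for a small sample can be read off the t-distribution. The standard library has no t-distribution, so the sketch below assumes the SciPy library is available; the sample size of 25 is made up for illustration:

```python
# A sketch assuming the SciPy library is installed.
from scipy.stats import t

n = 25                          # a hypothetical small sample
df = n - 1                      # degrees of freedom = 24
alpha = 0.05
t_crit = t.ppf(1 - alpha, df)   # upper-tail cutoff, about 1.711
```

Note that 1.711 is larger than the z cutoff of 1.645; t cutoffs are always a bit wider than z cutoffs, and they approach the z cutoffs as the sample size grows.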

Before going further, let me go over some jargon.

Critical Value: The critical value is basically the cutoff value above which (or below which) you reject the
null hypothesis. For hypotheses about the mean of a single population, for a large sample size we get the
critical value from the z-table. For small sample size, we get them from the t-table. For other types of
hypotheses, we get the critical value from other types of tables that we have not studied yet, but will
study in later chapters.

Test Statistic: The summary value that we compute using the sample is called the test statistic. For
hypotheses about the mean of a single population, the test statistic is basically the number of standard
deviations from the hypothesized value, where the standard deviation is that of the sampling distribution
of the mean.

p-value: For hypotheses about the mean of a single population, the p-value is the probability of the
sample mean being greater than (for a right-tailed hypothesis) or less than (for a left-tailed hypothesis) its
observed value, assuming the null hypothesis is true.

Rejection Region: The range of values of the test statistic where we reject the null hypothesis. For example,
for a right-tailed test at a significance level of 0.05, the rejection region is z > 1.645.

Let me now talk about two new terms: Type-I Error and Type-II Error.

Type-I Error: Suppose you get very unlucky in that you get a sample that does not represent the
population very well. But you don’t know that you got unlucky. You believe that the sample is good and
you are thinking that if the null hypothesis is true, then the probability of getting this sample is very small
and you therefore reject the null hypothesis. But the problem was that the sample was bad. At any rate,
your decision to reject the null was an error because in fact the null hypothesis was true. This type of
error is called a Type-I error. In a Type-I error, you reject the null incorrectly. For example, say a patient
does not have cancer and the null hypothesis is also that the patient has no cancer. Suppose the tests
suggest that the patient has cancer and the doctor rejects the null hypothesis. This would be an example
of a Type-I error. A defendant who is actually innocent but is found guilty is a victim of a Type-I error. Of
course, in the last example, the null hypothesis was that the defendant was innocent.

Type-II Error: Suppose the null hypothesis is false, but your sample data suggests that it is not false. So
when you should have rejected the null, you don’t reject the null. This is a Type-II Error. For example if a
patient is declared cancer free when he does in fact have cancer or if a defendant is not found guilty
when he is in fact guilty, we have examples of Type-II error.

Note that in general either the Type-I or the Type-II error is more expensive than the other. For example
making a Type-II error in case of a patient having cancer can cost his life, whereas making a Type-I error
on a cancer free patient will at the most force the patient to go through cancer treatment, which
although very expensive, will not kill him. So making a Type-II error in this case is more expensive. In the
case of a criminal, making a Type-I error is very expensive because you might be executing an innocent
man whereas when you make a Type-II error a guilty person wasn’t punished and this person will still be
at large, which has a cost to the society but at least no innocent man was killed.

You have to evaluate what type of error is more expensive, depending on your situation.

Desired Significance Level: The highest probability of Type-I error that you are willing to make is called
the desired significance level. For example, if you say you are testing a hypothesis at a desired
significance level of 0.05, you are saying that you are willing to tolerate a probability of type-I error up to
5%. The desired significance level is represented by the Greek symbol alpha (α). As the cost of the Type-I
error gets higher, your desired significance level should get smaller because the lower your desired
significance level, the lower the probability of making a Type-I error.

Actual Significance Level: Another name for the p-value.

Now I am going to tell you an extremely important thing, so please pay all the attention you can. OK,
here we go – whenever the test statistic is in the rejection region, the actual significance level (or the
p-value) is less than the desired significance level (or α).
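This equivalence is easy to verify numerically. Using the two test statistics from the height examples earlier (1.8 and 1.6) and α = 0.05:

```python
from statistics import NormalDist

std = NormalDist()
alpha = 0.05
z_crit = std.inv_cdf(1 - alpha)          # upper-tail critical value, about 1.645

agreements = []
for z in (1.8, 1.6):                     # test statistics from the two examples
    p = 1 - std.cdf(z)                   # actual significance level (p-value)
    in_rejection_region = z > z_crit
    # The two conditions always make the same decision:
    agreements.append(in_rejection_region == (p < alpha))
```

For z = 1.8 both conditions say "reject" (1.8 > 1.645 and 0.0359 < 0.05); for z = 1.6 both say "do not reject" (1.6 < 1.645 and 0.0548 > 0.05).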

Steps in Hypothesis Testing

1. State the Null Hypothesis.
2. State the Alternate Hypothesis.
3. Establish the desired significance level (α) and the critical value. Also figure out the rejection
region.
4. Collect the data.
5. Calculate the test statistic and the p-value
6. Make your decision using either the critical region approach or the p-value approach
7. State your conclusions (in terms of the original problem).
8. Assess the consequences of making a Type-I error and a Type-II error
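For a large-sample test about a single mean, steps 3 through 6 can be sketched as one function. The inputs are the sample summaries (x-bar, s, n) used throughout this chapter; the function name is my own:

```python
from statistics import NormalDist
from math import sqrt

def one_sample_z_test(xbar, s, n, mu0, alpha=0.05, tail="right"):
    """Large-sample test of H0: mu = mu0. tail is "right", "left", or "two"."""
    z = (xbar - mu0) / (s / sqrt(n))          # step 5: the test statistic
    std = NormalDist()
    if tail == "right":
        p = 1 - std.cdf(z)                    # P(z > observed value)
    elif tail == "left":
        p = std.cdf(z)                        # P(z < observed value)
    else:
        p = 2 * (1 - std.cdf(abs(z)))         # two-tailed p-value
    decision = "reject H0" if p < alpha else "fail to reject H0"  # step 6
    return z, p, decision
```

With the height data from earlier (x-bar = 68.9, s = 3, n = 36, H0: µ = 68) this returns z = 1.8, p ≈ 0.0359, and the decision to reject the null. Steps 1, 2, 7, and 8 are deliberately left out: stating the hypotheses, phrasing the conclusion, and weighing the errors require judgment, not computation.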

A few important remarks about each of the above steps are in order.

Step -1: State the null hypothesis. The null hypothesis usually assumes that all is well, or that there is no
difference or that there is no relationship between two variables or that something is equal to something.
Usually you will see an equality sign. Here are some examples of valid null hypotheses:

H0: µ = 68 H0: µ ≥ 68 H0: µ ≤ 68 H0: µ1 = µ2 H0: µ1 ≤ µ2 H0: µ1 ≥ µ2

Following are some examples of invalid null hypotheses:

H0: µ > 68 H0: µ < 68 H0: µ ≠ 68 H0: µ1 ≠ µ2 H0: µ1 < µ2 H0: µ1 > µ2

Note that in the examples of invalid null hypotheses, there were no equality signs. In the examples of
valid null hypotheses, there were equality signs (≥ and ≤ signs also included the equality signs).

Step-2: State the Alternate Hypothesis. The alternate hypothesis assumes not all is well or that there is a
difference or that there is relationship between two variables or that something is not equal to
something. All the examples of invalid null hypotheses above are actually valid examples of alternate
hypotheses. The alternate hypothesis is also called the research hypothesis. Usually a researcher is
happy to find evidence in favor of the research hypothesis because then the researcher has something to
say because he or she found something out of the ordinary. After all what good is a dinner conversation
if everyone is telling everyone straightforward, ordinary, well-known facts? What good is a research
report that tells you that a treatment had no effect? When you look at the alternate hypothesis, you should
be able to tell whether it is a two-tailed or a one-tailed hypothesis, and if it is one-tailed, whether it is
left-tailed or right-tailed. For example, if the alternate hypothesis looks like this: Ha: µ ≠ 68, it is a
two-tailed hypothesis. If it looks like this: Ha: µ > 68, or this: Ha: µ < 68, it is one-tailed. The first one is
right-tailed and the second one is left-tailed.

Step-3: Establish the desired significance level (α) and the critical value. Also figure out the rejection
region. The most popular desired significance level is 0.05 or 5%. But it is just a convenient round
number. There is nothing sacred about it – it just happens to be the most used significance level. Other,
less popular, levels are 0.10 and 0.01. These levels are decided by the researcher, and the value
depends on the cost of the Type-I error. If the cost is high, you want the desired level of significance to be
lower – say 0.01.

Step-4: Collect the data. This basically means get a good representative sample and measure the values
of the random variable of interest. Care must be taken in sampling. The sample should be as random as
possible.

Step-5: Calculate the test statistic and the p-value. With Excel and computers, this step is easy. There are
straightforward formulas for performing this step. Note that the formulas change for different types of
hypotheses. We will learn the formulas for some types of hypotheses in this chapter and for other types
of hypotheses in later chapters.

Step-6: Decision. The decision is basically to either reject the null or fail to reject the null. We never say
that we accept the alternate hypothesis or that we accept the null hypothesis. We also never reject the
alternate hypothesis. We either reject the null or fail to reject the null.

Step 7: State your conclusion in the context of your original problem. Many students confuse conclusion
and decision. When I ask for a conclusion, do not say whether the null is rejected or not rejected.
Rejecting or not rejecting the null is a decision not a conclusion. The conclusion is always in the context
of the real world problem about which the hypotheses are being tested. For example if the hypothesis is
that the mean height is 68” (the null) vs. that the mean height is > 68” (the alternate) and if your decision,
based on the sample data was to reject the null, then a statement of your conclusion will be something
like this: there was sufficient evidence at a significance level of α (where α is whatever it is) that the
average height of the population is greater than 68”. If the decision was to fail to reject the null, then
a statement of your conclusion will be something like: there wasn’t sufficient evidence at a significance
level of α that the average height of the population was greater than 68”. Note that you will not say that there
was sufficient evidence that the average height of the population was 68”.
Step 8: Assess the consequences of making a type-I error and a type-II error. This is an important exercise
and it forces you to think about what if you made a mistake in your decision and hence your conclusion.
It is not enough just to declare your decision and your conclusions because there is always a finite
probability that your decision and hence your conclusions were incorrect. For example if you made a
type-I error in diagnosing a patient with a certain disease then a consequence of that will be that you will
give the patient treatment that was never needed which might have a cost and perhaps even some side
effects.

Examples of Hypothesis Testing

We will look at several examples of hypothesis testing now.

Example 1: A lot of students prefer to work after their undergraduate degree, but wish to return for
graduate education after acquiring some experience in the real world. A researcher is interested in
knowing the average age at which students return for graduate studies. The research hypothesis is that
the age is greater than 27. To test their hypothesis the researcher interviews 49 graduate students at a
campus randomly and finds that the average age of this sample was 28.5 years with a sample
standard deviation of 3.5 years. Using this information, carry out all the steps of hypothesis testing at a
significance level of 0.05.

Step 1: H0: µ = 27, where µ represents the average age of all graduate students who are returning to
graduate school after working for some years after college.
Step 2: Ha: µ > 27 (Note that this is a one-tailed hypothesis. It is a right-tailed hypothesis.)
Step 3: α = 0.05 (this was given to us). The rejection region is z > 1.645
Step 4: Data on 49 students was collected. Sample mean xbar = 28.5, sample standard deviation s =
3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28.5 – 27)/(3.5/sqrt(49)) = 1.5/(3.5/7)
= 3.0. The p-value corresponding to this test statistic is P(z > 3.0) = 1 – 0.99865 = 0.00135
Step 6: Decision. Using the critical value approach, since 3.0 > 1.645, we reject the null hypothesis.
Using the p-value approach, since 0.00135 is less than 0.05 we reject the null hypothesis.
Step 7: There was sufficient evidence at a significance level of 0.05 that the average age of students
returning to graduate school after working for some years is greater than 27 years.
Step 8: Since we are rejecting the null hypothesis, there is a chance of making a type-I error, i.e. the
true average age may be less than or equal to 27 years although our test indicates otherwise. A
consequence might be that the courses are designed for a more mature audience when in
fact the audience is not that mature. This may result in an expectation gap for the faculty
teaching the courses. Had the age indeed been above 27 and our test failed to show it, we
would have made a type-II error. A consequence would be that the courses are designed
for a less mature audience whereas the students are expecting more advanced courses.
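The arithmetic in Steps 5 and 6 can be checked with a few lines of Python using only the standard library. This is just a sketch; the function name z_test_right is mine, not from the text:

```python
from statistics import NormalDist

def z_test_right(xbar, mu0, s, n):
    """One-sample, right-tailed z test: returns (test statistic, p-value)."""
    z = (xbar - mu0) / (s / n ** 0.5)   # (xbar - hypothesized mu)/(s/sqrt(n))
    p = 1 - NormalDist().cdf(z)         # P(Z > z), the right-tail area
    return z, p

z, p = z_test_right(28.5, 27, 3.5, 49)
# z = 3.0 and p ≈ 0.00135, so we reject H0 at α = 0.05
```

Since 3.0 > 1.645 (critical value approach) and 0.00135 < 0.05 (p-value approach), both routes lead to the same decision, as they always must.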

Example 2: In the above example, what if the sample average turned out to be 27.5 years instead of
28.5? All other things are the same. Carry out all the steps.
Steps 1, 2 and 3 are the same as above.
Step 4: Data on 49 students was collected. Sample mean xbar = 27.5, sample standard deviation s =
3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (27.5 – 27)/(3.5/sqrt(49)) = 0.5/(3.5/7)
= 1.0. The p-value corresponding to this test statistic is P(z > 1.0) = 1 – 0.84134 = 0.15866

Step 6: Decision. Using the critical value approach, since 1.0 < 1.645, we fail to reject the null
hypothesis. Using the p-value approach, since 0.15866 is greater than 0.05 we fail to reject the
null hypothesis.
Step 7: There was not sufficient evidence at a significance level of 0.05 that the average age of students
returning to graduate school after working for some years is greater than 27 years.
Step 8: Similar to the example above.

Example 3: In Example 1, assume that the sample average turned out to be 28 years and the sample
standard deviation was 5 years instead of 3.5 years.
Steps 1, 2 and 3 are the same as above.
Step 4: Data on 49 students was collected. Sample mean xbar = 28, sample standard deviation s = 5.0
yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28 – 27)/(5/sqrt(49)) = 1/(5/7) = 7/5 =
1.4. The p-value corresponding to this test statistic is P(z > 1.4) = 1 – 0.91924 = 0.08076
Step 6: Decision. Using the critical value approach, since 1.4 < 1.645, we fail to reject the null
hypothesis. Using the p-value approach, since 0.08076 is greater than 0.05 we fail to reject the
null hypothesis.
Step 7: There was not sufficient evidence at a significance level of 0.05 that the average age of students
returning to graduate school after working for some years is greater than 27 years.
Step 8: Similar to the example above.

Example 4: In Example 1, what if the research hypothesis was Ha: µ < 27? Everything else was the same,
and the sample average came out to be 26.5 years with a standard deviation of 3.5 years.

Step 1: Same as Example 1


Step 2: Ha: µ < 27 (Note that this is a one-tailed hypothesis. It is a left-tailed hypothesis.)
Step 3: α = 0.05 (this was given to us). The rejection region is z < -1.645
Step 4: Data on 49 students was collected. Sample mean xbar = 26.5, sample standard deviation s =
3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (26.5 – 27)/(3.5/sqrt(49)) = -0.5/(3.5/7)
= -1.0. The p-value corresponding to this test statistic is P(z < -1.0) = 0.15866
Step 6: Decision. Using the critical value approach, since -1.0 > -1.645, we fail to reject the null
hypothesis. Using the p-value approach, since 0.15866 is greater than 0.05 we fail to reject the
null hypothesis.
Step 7: There was not sufficient evidence at a significance level of 0.05 that the average age of students
returning to graduate school after working for some years is less than 27 years.

Example 5: In Example 1, what if the research hypothesis was Ha: µ ≠ 27? Everything else was the same.

Step 1: H0: µ = 27, where µ represents the average age of all graduate students who are returning to
graduate school after working for some years after college.
Step 2: Ha: µ ≠ 27 (Note that this is a two-tailed hypothesis.)
Step 3: α = 0.05 (this was given to us). The rejection region is either z < -1.96 or z > 1.96
Step 4: Data on 49 students was collected. Sample mean xbar = 28.5, sample standard deviation s =
3.5 yrs.

Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28.5 – 27)/(3.5/sqrt(49)) = 1.5/(3.5/7)
= 3.0. The p-value corresponding to this test statistic is 2*P(z > 3.0) = 2*(1 – 0.99865) =
2*0.00135 = 0.0027.
Step 6: Decision. Using the critical value approach, since 3.0 > 1.96, we reject the null hypothesis.
Using the p-value approach, since 0.0027 is less than 0.05 we reject the null hypothesis.
Step 7: There was sufficient evidence at a significance level of 0.05 that the average age of students
returning to graduate school after working for some years is not 27 years.
Step 8: Since we are rejecting the null hypothesis, there is a chance of making a type-I error, i.e. the
true average age may be 27 years although our test indicates otherwise.
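For the two-tailed case, the only computational change is that the p-value doubles the one-tail area. A quick check with the standard library (again my own sketch, not part of the text):

```python
from statistics import NormalDist

# Two-tailed test of H0: mu = 27 vs Ha: mu != 27, using Example 5's numbers
z = (28.5 - 27) / (3.5 / 49 ** 0.5)            # same test statistic, 3.0
p_two = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value ≈ 0.0027
# 0.0027 < 0.05, so H0 is rejected here as well
```

Note that the two-tailed p-value (0.0027) is exactly twice the right-tailed p-value from Example 1 (0.00135), which is why a result that is significant in a two-tailed test is always significant in the matching one-tailed test, but not vice versa.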
Let me summarize the above five examples in a table:
                   Example 1    Example 2       Example 3       Example 4       Example 5
H0                 µ = 27       µ = 27          µ = 27          µ = 27          µ = 27
Ha                 µ > 27       µ > 27          µ > 27          µ < 27          µ ≠ 27
Tails              One (right)  One (right)     One (right)     One (left)      Two
α                  0.05         0.05            0.05            0.05            0.05
Rejection region   z > 1.645    z > 1.645       z > 1.645       z < -1.645      z < -1.96 or z > 1.96
Sample mean        28.5         27.5            28              26.5            28.5
Sample std. dev.   3.5          3.5             5               3.5             3.5
Sample size        49           49              49              49              49
Test statistic     3.0          1.0             1.4             -1.0            3.0
p-value            0.00135      0.15866         0.08076         0.15866         0.0027
Decision           Reject null  Fail to reject  Fail to reject  Fail to reject  Reject null

In each column the test statistic is (xbar – 27)/(s/sqrt(49)), and the p-values are
P(z > 3.0), P(z > 1.0), P(z > 1.4), P(z < -1.0) and 2*P(z > 3.0) = 2*0.00135 = 0.0027,
respectively.
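All five columns of the table can be reproduced with one small loop. This sketch is my own, uses only the standard library, and collects the test statistic and p-value for each example:

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF

def p_value(z, tail):
    """p-value of a z test for tail = 'right', 'left' or 'two'."""
    if tail == "right":
        return 1 - Phi(z)
    if tail == "left":
        return Phi(z)
    return 2 * (1 - Phi(abs(z)))          # two-tailed

# (sample mean, sample std. dev., n, tail) for Examples 1-5; mu0 = 27 throughout
examples = [(28.5, 3.5, 49, "right"), (27.5, 3.5, 49, "right"),
            (28.0, 5.0, 49, "right"), (26.5, 3.5, 49, "left"),
            (28.5, 3.5, 49, "two")]
results = []
for xbar, s, n, tail in examples:
    z = (xbar - 27) / (s / n ** 0.5)
    results.append((round(z, 1), round(p_value(z, tail), 5)))
print(results)
# [(3.0, 0.00135), (1.0, 0.15866), (1.4, 0.08076), (-1.0, 0.15866), (3.0, 0.0027)]
```

Each pair matches the corresponding "Test statistic" and "p-value" entries in the table above.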

Now let’s look at an example involving a small sample (n < 30).


Example 6: What if in Example 1 the sample size was 16 instead of 49? Everything else was the same.
                   Example 6
H0                 µ = 27
Ha                 µ > 27
Tails              One (right)
α                  0.05
Rejection region   t > 1.753
Sample mean        28.5
Sample std. dev.   3.5
Sample size        16
Test statistic     (28.5 – 27)/(3.5/sqrt(16)) = 1.5*4/3.5 ≈ 1.714
p-value            P(t > 1.714) = 0.0535
Decision           Fail to reject null

Instead of writing paragraphs about it, I have shown it in tabular form above, similar to the first five
examples. I will point out the differences here. In this example, note that the rejection region of t > 1.753
came from the t-table of Chapter 3, for 15 degrees of freedom under the column 0.05. You can also use the
Excel function T.INV like this: =T.INV(0.95,15), where 0.95 is 1 – 0.05 and 15 is the degrees of
freedom, which in this case is 16 minus 1.
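If you do not have Excel at hand, the same critical value and p-value can be approximated in plain Python. This is a stdlib-only sketch (the helper names t_sf and t_crit are mine) that integrates the t density numerically rather than calling a statistics library such as scipy:

```python
import math

def t_sf(x, df, steps=20000, upper=60.0):
    """P(T > x) for Student's t with df degrees of freedom (x >= 0),
    by trapezoidal integration of the density from x to a large upper bound."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda t: c * (1.0 + t * t / df) ** (-(df + 1) / 2)
    h = (upper - x) / steps
    area = (pdf(x) + pdf(upper)) / 2 + sum(pdf(x + i * h) for i in range(1, steps))
    return area * h

def t_crit(alpha, df):
    """Right-tail critical value: t with P(T > t) = alpha (cf. =T.INV(1-alpha, df)),
    found by bisection since t_sf is decreasing in t."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = t_sf(1.5 / (3.5 / 16 ** 0.5), 15)    # p-value for t ≈ 1.714, close to 0.0535
crit = t_crit(0.05, 15)                  # close to 1.753, matching the t-table
```

Since p is slightly above 0.05 and the test statistic 1.714 is slightly below the critical value 1.753, both approaches agree: we fail to reject the null hypothesis.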
