
Hypothesis Testing

Recall that Statistics, the science of analyzing data, has two main branches – descriptive and

inferential statistics. In descriptive statistics we describe data or tell a story hidden in the data. In

inferential statistics, we infer a story about a population, based on data obtained for a sample. In

inferential statistics, one of the things we do is estimate the values of certain population parameters,

something we learnt in the previous chapter. The other main thing we do in inferential statistics is

hypothesis testing, the subject of this chapter.

Although hypothesis testing may sound very technical, it is something you and I do every day. In

fact, it is part of your survival mechanism on a moment by moment basis. You are constantly engaged in

hypothesis testing. In order to survive in your environment, you need to make sure that your

environment is conducive to your survival and safety. If it deviates from being conducive to your survival

and safety, your senses will alert you to it and you will do something about it. For example, suppose you are working on your computer and suddenly you hear an extraordinarily loud explosion. Your senses will alert you to it, and you will leave whatever you were doing and want to know what happened, what caused the explosion, and whether you need to take any action to ensure your safety. Your mind might entertain all kinds of ideas – could a fire break out, has some country declared war, and so on. And if a fire could break out, could it affect your building? Do you need to evacuate, to call emergency services, or to make sure someone has already called them? In other words,

whenever a rare event happens, you take notice and do something about it especially if the rare event

affects you in some way. In this example, on a moment to moment basis you assume that your

environment is “normal” until one of your senses “alerts” you to a “sufficient deviation from normality”

which causes you to take notice and perhaps even some action particularly if the deviation from

normality affects you in some way. This is an example of hypothesis testing that you are constantly

engaged in. Your default hypothesis (or the null hypothesis) is that your environment is “normal”. Your

alternative hypothesis is that it isn’t. Your senses are constantly sending you signals. The moment a

signal suggests sufficient evidence to believe that an abnormal condition exists, you reject the null

hypothesis that all is normal.

Let me give you another example, a softer example, one that does not involve explosions.

Suppose you are having a nice dinner conversation with your friends and one of your friends tells you

that she saw a man almost 8 ft. tall the other day. I am sure you, at least for a moment, stop eating, gasp,

and give your friend an amazed look, as if she has accomplished a very rare feat. Since it is very rare for someone to be 8 ft. tall to begin with, even rarer that such a person happens to be around your town, and rarer still for your friend to run into this person, hearing about such a rare event draws an unusual reaction from you. In fact, you must have noticed that at parties, during

conversations, people are trying to impress others by telling them things that are out of the ordinary.

Sometimes there is almost a competition going on as to who can narrate the most extraordinary true incident, one that will cause everyone to gasp and make the narrator feel important. After all, if your friend had told you that she saw someone who was 5’ 8”, not only would you not gasp, you might wonder what was wrong with your friend – why is she wasting your time with such a mundane piece of information? In this

example, your null hypothesis is that everyone has a normal height of around 5’ 8” and a standard

deviation of around 3” and that most of the adult population falls somewhere between 4’ 6” and 7 ft.

Some basketball players go beyond 7 ft. The alternate hypothesis is that someone does not belong in this

normal range of height. Any news or evidence of someone being outside of this normal expected range causes you to reject your null hypothesis, and it invokes a reaction and possibly some action from you.


As another example, in the court of law, in most countries, everyone is considered “innocent”

unless proven “guilty”. In order to prove guilt, there should be sufficient “evidence” establishing the guilt – for example, the defendant’s fingerprints being found on the murder weapon. It is

harder to prove or establish “innocence”. Attorneys often try to establish innocence by establishing

alibis, but we all know that alibis can also be designed, and many criminals do design alibis in order to establish innocence in case they are accused of a crime. In this example, the null hypothesis (or the

default hypothesis) is that the defendant is innocent and the alternative hypothesis is that the defendant

is guilty. When sufficient evidence of guilt is found, the null hypothesis is rejected. Till then the null

hypothesis is not rejected.

As another example, suppose you go to a doctor with some symptoms that are sufficiently severe to raise a flag in your mind that something is wrong with you. The doctor will assume

that you are a normal and healthy person. But since you are complaining, he or she will look for scientific

evidence that you really need medical attention. So the doctor will order some tests that will try to

establish sufficient evidence that you indeed carry an abnormal condition. Without any evidence to the

contrary you are assumed to be normal. In this example, the null hypothesis is that you are healthy and

the alternative is that you are not and you need to have sufficient evidence to reject the null hypothesis.

This scientific process of testing for hypotheses has been used for hundreds of years by scientists,

criminologists, doctors, sociologists, statisticians, and really by everyone including you, on a day to day

basis.

Students find the subject of hypothesis testing very difficult to understand. And perhaps it was

very difficult during pre-Excel days. But these days, there is no reason to find testing of hypothesis a

difficult topic. Basically the line of reasoning is this - if the probability of some event is so small that it is

unlikely that the assumption of “all is normal” can be true then you reject that assumption and go with

the alternate hypothesis that “all is not normal”. In the explosion example, the probability of a loud

explosion is so small when all is normal, that you had to admit and take notice that something was not

“normal”. In the dinner table conversation, the probability of someone’s height being 8 ft. was so small

that you had to take notice that something was not “normal”. In the criminal case example, the probability of finding the defendant’s fingerprints on the murder weapon is so small if the defendant were indeed innocent, that you want to reject the null hypothesis that he (or she) is innocent.

In the medical example, if the x-rays reveal a large ulcer in your stomach, then it might be sufficient

evidence that not all is normal. So basically, one has to establish the probability of the “rare” event

under the assumption that all is normal. The probability of the rare event is called the p-value. If the p-

value is very small (say less than 5% or less than 1%), then you may say that you have sufficient evidence

against the null. So remember this – a small p-value means a small probability (because p stands for

probability) and small probabilities imply rare events or “not normal” events and therefore a low p-value

is evidence against the null hypothesis because the null hypothesis assumes that all is normal. If you

cannot understand the line of reasoning explained in the last two pages, then you will have a lot of

difficulty understanding the material in the rest of this course. So if you do not understand the last two

pages, please go back and read them again till you understand them.

Please note that in all the discussion in this chapter so far, the word “normal” has nothing to do

with the “normal probability distribution”. I am using the term normal as in nothing abnormal or nothing

out of the ordinary.

In hypothesis testing, we deal with two kinds of hypotheses – the null hypothesis and the alternate hypothesis. The null hypothesis is usually represented by the symbol H0. The alternate hypothesis is denoted by H1 or Ha.


Hypotheses are usually about a population parameter and we use a sample statistic to test a

hypothesis. The reason for this is very simple. If all of the data about a population were available, there would be no need to hypothesize about the values of its parameters, because you could compute them exactly.

Since data about a population is not usually available, we need to hypothesize about the possible value of

its parameters. For example, we may make a null hypothesis that the average height of all the people in

your population is 68”. In symbols, we can write this as follows: H0 : µ = 68”. µ is a Greek symbol usually

used for a population average. The alternative hypothesis would be that the average height of all the

people in your population is greater than 68”. In symbols, we can write the alternate hypothesis as Ha: µ

> 68”. You can also have an alternate hypothesis like Ha: µ < 68” or Ha: µ ≠ 68”.

So how do we test this set of null and alternate hypotheses? We collect some data and obtain

the mean of the sample (since the hypothesis is about the population mean). If the sample mean is close

to 68”, such as 68.5”, we may not have sufficient evidence against the null hypothesis and we will

therefore not reject the null hypothesis. After all if the mean of the population is 68” then there is a

reasonably high probability of obtaining a sample whose mean happens to be 68.5”. If the sample mean,

however, is very different from 68”, such as 74”, the probability of which is very small if the null hypothesis were true, then it is likely that the null hypothesis is not true, and your decision would be to reject the null hypothesis. Remember, this is the same line of reasoning that we used earlier. So the

question is - what is the probability of obtaining a sample mean of >74” if the mean height was indeed

68”, i.e. if the null hypothesis were true? The answer to this question is the p-value. If the p-value is

small enough, then we can reject the null hypothesis.

Recall that in an earlier chapter we learnt how to compute probabilities about different ranges of values of a normal random variable if the probability distribution of that random variable is known. So

for example, if we knew the mean and the standard deviation of a normal random variable then we can

find the probability of any range of values for that variable. As a refresher, in Figure 1, I reproduce the

first few rows of the standard normal table.

z Prob. z Prob. z Prob. z Prob.

-3.00 0.00135 -1.50 0.06681 0.00 0.50000 1.50 0.93319

-2.95 0.00159 -1.45 0.07353 0.05 0.51994 1.55 0.93943

-2.90 0.00187 -1.40 0.08076 0.10 0.53983 1.60 0.94520

-2.85 0.00219 -1.35 0.08851 0.15 0.55962 1.65 0.95053

-2.80 0.00256 -1.30 0.09680 0.20 0.57926 1.70 0.95543

-2.75 0.00298 -1.25 0.10565 0.25 0.59871 1.75 0.95994

-2.70 0.00347 -1.20 0.11507 0.30 0.61791 1.80 0.96407

-2.65 0.00402 -1.15 0.12507 0.35 0.63683 1.85 0.96784

-2.60 0.00466 -1.10 0.13567 0.40 0.65542 1.90 0.97128

-2.55 0.00539 -1.05 0.14686 0.45 0.67364 1.95 0.97441

-2.50 0.00621 -1.00 0.15866 0.50 0.69146 2.00 0.97725

-2.45 0.00714 -0.95 0.17106 0.55 0.70884 2.05 0.97982

Figure 1: Partial Standard Normal Table

Using this table, we can answer questions like what is P(z < 2.0)? or what is P(z > 2.0)? I hope you

remember how to do it. P(z < 2.0) is 0.97725 and P(z > 2.0) is 1 minus 0.97725 = 0.02275.
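If you prefer computing these probabilities directly rather than reading the table (the chapter itself works with Excel and printed tables; this Python sketch is just an alternative check), the standard library can do the lookup:

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
z = NormalDist(mu=0, sigma=1)

p_less = z.cdf(2.0)         # P(z < 2.0)
p_greater = 1 - z.cdf(2.0)  # P(z > 2.0) is the complement

print(round(p_less, 5))     # 0.97725
print(round(p_greater, 5))  # 0.02275
```

These match the 2.00 row of the partial table in Figure 1.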


Also recall that you can use the standard normal distribution to answer these types of questions for any

normal random variable as long as you know the mean and the standard deviation of the distribution.

You do this by standardizing the value of the normal random variable. As a refresher, let us say the

normal variable about which I need some probabilities is called X and it has a normal distribution with a

mean of 40 and a standard deviation of 2. What is P(X > 43)? To standardize X to z, we do (43 – 40)/2 =

1.5.

So P(X > 43) is the same as P(z > 1.5) which is 1 minus 0.93319 = 0.06681.
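The same standardization step can be sketched in Python, using the X example above (mean 40, standard deviation 2):

```python
from statistics import NormalDist

# X is normal with mean 40 and standard deviation 2; we want P(X > 43).
mu, sigma = 40, 2
x = 43

z = (x - mu) / sigma         # standardize: (43 - 40)/2 = 1.5
p = 1 - NormalDist().cdf(z)  # P(z > 1.5), from the standard normal

print(z)            # 1.5
print(round(p, 5))  # 0.06681
```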

Basically we want to get a sample of data and find the sample mean. Call the sample mean x-bar. Then find P(mean > x-bar). This probability is called the p-value. If this p-value is very small, say

smaller than 0.05 then we say we have sufficient evidence to reject the null hypothesis at a significance

level of 0.05. If this probability is greater than 0.05 then we fail to reject the null hypothesis at a

significance level of 0.05.

Let’s say we collect some sample data of say size 36 and let’s say we find that the sample mean (or x-bar)

comes out to be 68.9” and the standard deviation comes out to be 3 inches. The question is what is

P(mean > 68.9”) if the null hypothesis were true? To answer this question, we need the probability

distribution of the mean if the null hypothesis were true. Now I will tell you that the probability

distribution of the mean if the null hypothesis were true has a normal distribution whose mean is 68” and

the standard deviation is 3/sqrt(36) = 3/6 = 0.5. So the question is what is P(mean > 68.9”) if the mean

has a normal distribution with a mean of 68” and a standard deviation of 0.5”. We normalize 68.9” as

(68.9 – 68)/0.5 = 0.9/0.5 = 1.8

So P(mean > 68.9) is the same as P(z > 1.80) which is 1 minus 0.96407 = 0.03593. Since this probability is

less than 0.05, we have sufficient evidence to reject the null hypothesis at a significance level of 0.05.

Note that we can reject the null hypothesis at a significance level of 0.04 or even at 0.036 or any value

greater than 0.03593 but we could not reject the null hypothesis at a significance level of 0.01 or 0.02 or

0.03 or any value less than 0.03593.
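The whole calculation for this example can be sketched in Python (stdlib only; the numbers are those given above):

```python
from math import sqrt
from statistics import NormalDist

# Null hypothesis: the population mean height is 68".
mu0 = 68
xbar = 68.9  # sample mean
s = 3        # sample standard deviation
n = 36       # sample size

se = s / sqrt(n)                   # standard error = 3/6 = 0.5
z = (xbar - mu0) / se              # standardized value = 0.9/0.5 = 1.8
p_value = 1 - NormalDist().cdf(z)  # P(mean > 68.9") = P(z > 1.8)

print(round(z, 2))        # 1.8
print(round(p_value, 5))  # 0.03593  -> less than 0.05, so reject the null
```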

Let’s say instead of 68.9, our sample mean came out to be 68.8”. Now we need P(mean > 68.8”). The

normalized value of 68.8” is 0.8/0.5 = 1.6. So P(mean > 68.8”) = P(z > 1.6) = 1 minus 0.94520 = 0.0548,

which is greater than 0.05. So we say that we did not find sufficient evidence to reject the null at a

significance level of 0.05. But we can reject the null at a significance level of 0.06 or 0.07 and any value

higher than 0.0548.

The lower the significance level, the more confidence with which we can reject the null hypothesis. So when

the sample mean was 68.9” we were more confident in rejecting the null (smallest significance level of

0.03593) than when the sample mean was 68.8” (smallest significance level of 0.0548). Intuitively also,

you should understand that the greater the deviation between the hypothesized mean (according to the null hypothesis) and the sample mean, the greater the evidence against the null. In this example, since 68.9 had a

greater deviation from 68 than 68.8, we were more confident in rejecting the null at 68.9” than at 68.8”.

In fact we couldn’t even reject the null (at a significance level of 0.05) when the sample mean was 68.8”.


I want to point out one thing in the above examples. In the process of generating the p-values,

we first standardized the value of the sample means. For example we got the standardized value as 1.8

in the first example and 1.6 in the second. Using 1.8, we got P(z > 1.8) to be 0.03593 and using 1.6, we

got P(z > 1.6) to be 0.0548. We then said that since 0.03593 is less than 0.05 and if the desired

significance level is 0.05, we have sufficient evidence to reject the null. In the second example, we said

that since 0.0548 is greater than 0.05, which was our desired significance level, we failed to reject the

null. Going from the standardized value to the p-value required the extra step of reading the standard

normal table. Many textbooks and authors recommend bypassing this extra step of reading the table and

making the decision simply based on the standardized value itself. For the above example, for a

significance level of 0.05, if the standardized value happens to be greater than or equal to 1.645 then the

null hypothesis can be rejected. For a significance level of 0.01, if the standardized value is greater than

or equal to 2.33 then the null can be rejected. For a significance level of 0.10, if the standardized value is

greater than 1.28, then the null can be rejected. If we carried out the extra step of reading the standard

normal table to find the p-value, we would get the exact same decision.

Note also that in the above examples the alternate hypothesis was Ha: µ >68”. This alternate

hypothesis is considered a one-tailed hypothesis. Why is it considered a one-tailed hypothesis? It’s

because the sign in the alternate hypothesis is in one direction only (in this case “>”). An example of a

two-tailed alternate hypothesis would be Ha: µ ≠ 68”. In a two-tailed hypothesis, the null would be

rejected if either the sample mean is sufficiently higher than 68” or sufficiently lower than 68”. In the

one-tailed hypothesis (such as this - Ha: µ >68”), the null would only be rejected if the sample mean was

sufficiently higher than 68”. A one-tailed hypothesis can be in the other direction as well (such as Ha: µ < 68”). In this case, the null would only be rejected if the sample mean was sufficiently lower than 68”.

Also, in the above example, note that the sample size was 36, which can be considered a large sample

size. Any sample size larger than 30 may be considered a large sample.

The standardized value of the sample mean, in the above examples is also called the test statistic.

It is called the test statistic because it is a statistic (because it summarizes the sample data) and because

it is used as the basis for testing the hypothesis. The range of test statistic values above which (or below

which) the null hypothesis can be rejected is called the rejection region.

Let me now give you, in Figure 2, some rejection regions for various types of hypotheses for large sample

sizes:

α      One-Tailed (Lower Tail)   One-Tailed (Upper Tail)   Two-Tailed

0.10   z < -1.280                z > 1.280                 z < -1.645 or z > 1.645

0.05   z < -1.645                z > 1.645                 z < -1.960 or z > 1.960

0.01   z < -2.330                z > 2.330                 z < -2.575 or z > 2.575

Figure 2: Rejection Regions for Various Types of Null Hypotheses for Large Sample Sizes

You should understand that P(z < -1.280) is 0.10 and P(z > 1.280) is 0.10 and P(z < -1.645) + P(z > 1.645) =

0.10. This is why the rejection regions are what they are in Figure 2. Similarly, P(z < -1.645) = 0.05 and

P(z > 1.645) is 0.05 and P(z < -1.96) + P(z > 1.96) = 0.05. Similarly, P(z < -2.33) = 0.01 and P(z > 2.33) = 0.01

and P(z < -2.575) + P(z > 2.575) = 0.01.
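These cutoffs can be recovered from the inverse of the standard normal CDF. A small sketch (note the table's 1.280 and 2.330 are rounded; the more precise values are 1.282 and 2.326):

```python
from statistics import NormalDist

z = NormalDist()

# One-tailed cutoffs: the z value whose upper tail holds exactly alpha.
cutoff_10 = z.inv_cdf(1 - 0.10)  # ~1.282 (the table rounds this to 1.280)
cutoff_05 = z.inv_cdf(1 - 0.05)  # ~1.645
cutoff_01 = z.inv_cdf(1 - 0.01)  # ~2.326 (the table rounds this to 2.330)

# Two-tailed cutoff for alpha = 0.05: each tail holds alpha/2 = 0.025.
cutoff_two_05 = z.inv_cdf(1 - 0.025)  # ~1.960

print(round(cutoff_05, 3), round(cutoff_two_05, 3))
```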


Hypothesis Testing for Small Samples

In the first part of this chapter we learnt how to perform a test of hypothesis for a large sample (>30). In

this part, we will look at how to test a hypothesis when the sample size is small (<30).

What is the main difference when dealing with a small sample as opposed to a large sample?

If you recall, for large sample, we specified the rejection region for various significance levels. For

example, for a one-tailed test (upper tail), for a significance level of 0.05, the rejection region was z >

1.645. We got 1.645 from the standard normal table. Basically P(z > 1.645) = 0.05.

The main difference is that for small samples, we get the cutoff values for the rejection region from the t-

tables instead of the z-tables. In t-tables, the cutoffs depend on two things – the significance level and the degrees of freedom. The degrees of freedom for the kinds of hypotheses we have seen so far are (n – 1),

where n is the sample size.
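If you have SciPy installed, its `stats` module can read both "tables" for you. A sketch comparing the large-sample z cutoff with the small-sample t cutoff (the sample size n = 15 here is just an illustrative assumption):

```python
from scipy import stats

# Upper-tail critical value for a one-tailed test at alpha = 0.05.
alpha = 0.05

# Large sample: cutoff from the standard normal (z) distribution.
z_crit = stats.norm.ppf(1 - alpha)
print(round(z_crit, 3))  # 1.645

# Small sample, say n = 15: cutoff from the t distribution
# with n - 1 = 14 degrees of freedom.
n = 15
t_crit = stats.t.ppf(1 - alpha, df=n - 1)
print(round(t_crit, 3))  # 1.761
```

Note that the t cutoff is larger than the z cutoff: with a small sample, the standard deviation is estimated less precisely, so more evidence is demanded before rejecting the null.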

That is the only difference. In an earlier chapter we learnt how to use the t-tables. Next we will see lots

of examples of hypothesis testing.

Critical Value: The critical value is basically the cutoff value above which (or below which) you reject the

null hypothesis. For hypotheses about the mean of a single population, for large sample size we get the

critical value from the z-table. For small sample size, we get them from the t-table. For other types of

hypotheses, we get the critical value from other types of tables that we have not studied yet, but will

study in later chapters.

Test Statistic: The summary value that we compute using the sample is called the test statistic. For

hypotheses about the mean of a single population, the test statistic is basically the number of standard

deviations from the hypothesized value, where the standard deviation is that of the sampling distribution

of the mean.

p-value: For hypotheses about the mean of a single population, the p-value is the probability of the

sample mean being greater than (for a right tailed hypothesis) or less than (for a left tailed hypothesis) its

value assuming the null hypothesis is true.

Rejection Region: The range of values of the test statistic where we reject the null hypothesis. For example, for a right-tailed test at a significance level of 0.05, the rejection region is “z > 1.645”.

Let me now talk about two new terms: Type-I Error and Type-II Error.

Type-I Error: Suppose you get very unlucky in that you get a sample that does not represent the

population very well. But you don’t know that you got unlucky. You believe that the sample is good and

you are thinking that if the null hypothesis is true, then the probability of getting this sample is very small

and you therefore reject the null hypothesis. But the problem was that the sample was bad. At any rate,

your decision to reject the null was an error because in fact the null hypothesis was true. This type of

error is called a Type-I error. In a Type-I error, you reject the null incorrectly. For example, say a patient

does not have cancer and the null hypothesis is also that the patient has no cancer. Suppose the tests

suggest that the patient has cancer and the doctor rejects the null hypothesis. This would be an example

of a Type-I error. A defendant who is actually innocent but is found guilty is a victim of a Type-I error. Of

course, in the last example, the null hypothesis was that the defendant was innocent.


Type-II Error: Suppose the null hypothesis is false, but your sample data suggests that it is not false. So

when you should have rejected the null, you don’t reject the null. This is a Type-II Error. For example if a

patient is declared cancer free when he does in fact have cancer or if a defendant is not found guilty

when he is in fact guilty, we have examples of Type-II error.

Note that in general either the Type-I or the Type-II error is more expensive than the other. For example

making a Type-II error in case of a patient having cancer can cost his life, whereas making a Type-I error

on a cancer free patient will at the most force the patient to go through cancer treatment, which

although very expensive, will not kill him. So making a Type-II error in this case is more expensive. In the

case of a criminal, making a Type-I error is very expensive because you might be executing an innocent

man whereas when you make a Type-II error a guilty person wasn’t punished and this person will still be

at large, which has a cost to the society but at least no innocent man was killed.

You have to evaluate what type of error is more expensive, depending on your situation.

Desired Significance Level: The highest probability of Type-I error that you are willing to make is called

the desired significance level. For example, if you say you are testing a hypothesis at a desired

significance level of 0.05, you are saying that you are willing to tolerate a probability of type-I error up to

5%. The desired significance level is represented by the Greek symbol alpha (α). As the cost of the Type-I

error gets higher, your desired significance level should get smaller because the lower your desired

significance level, the lower the probability of making a Type-I error.

Now I am going to tell you an extremely extremely extremely important thing. So please pay all the

attention you can. Ok here we go -- Whenever the test statistic is in the rejection region, the actual

significance level (or the p-value) is less than the desired significance level (or α).
Here, then, are the steps of hypothesis testing:

1. State the Null Hypothesis.

2. State the Alternate Hypothesis.

3. Establish the desired significance level (α) and the critical value. Also figure out the rejection

region.

4. Collect the data.

5. Calculate the test statistic and the p-value

6. Make your decision using either the critical region approach or the p-value approach

7. State your conclusions (in terms of the original problem).

8. Assess the consequences of making a Type-I error and a Type-II error

Step-1: State the null hypothesis. The null hypothesis usually assumes that all is well, or that there is no

difference or that there is no relationship between two variables or that something is equal to something.

Usually you will see an equality sign. Here are some examples of valid null hypotheses:

H0: µ = 68 H0: µ ≥ 68 H0: µ ≤ 68 H0: µ1 = µ2 H0: µ1 ≥ µ2 H0: µ1 ≤ µ2

Following are some examples of invalid null hypotheses:

H0: µ > 68 H0: µ < 68 H0: µ ≠ 68 H0: µ1 ≠ µ2 H0: µ1 < µ2 H0: µ1 > µ2

Note that in the examples of invalid null hypotheses, there were no equality signs. In the examples of

valid null hypotheses, there were equality signs (≥ and ≤ signs also included the equality signs).

Step-2: State the Alternate Hypothesis. The alternate hypothesis assumes not all is well or that there is a

difference or that there is a relationship between two variables or that something is not equal to

something. All the examples of invalid null hypotheses above are actually valid examples of alternate

hypotheses. The alternate hypothesis is also called the research hypothesis. Usually a researcher is

happy to find evidence in favor of the research hypothesis because then the researcher has something to

say because he or she found something out of the ordinary. After all what good is a dinner conversation

if everyone is telling everyone straightforward, ordinary, well-known facts? What good is a research

report that tells that a treatment had no effect? When you look at the alternate hypothesis, you should be able to tell whether it is a two-tailed or a one-tailed hypothesis, and if it is one-tailed, whether it is left-tailed or right-tailed.

For example if the alternate hypothesis looks like this: Ha: µ ≠ 68, it is a two-tailed hypothesis. If it looks

like this Ha: µ > 68 , or this Ha: µ < 68, it is one tailed. The first one is right tailed and the second one is

left tailed.

Step-3: Establish the desired significance level (α) and the critical value. Also figure out the rejection

region. The most popular desired significance level is 0.05 or 5%. But it is just a convenient round number. There is nothing sacred about it – it just happens to be the most used significance level. Other less popular levels are 0.10 and 0.01. These levels are decided by the researcher and the value

depends on the cost of the type-I error. If the cost is high, you want the desired level of significance to be

lower – say 0.01.

Step-4: Collect the data. This basically means get a good representative sample and measure the values

of the random variable of interest. Care must be taken in sampling. The sample should be as random as

possible.

Step-5: Calculate the test statistic and the p-value. With Excel and computers, this step is easy. There are

straightforward formulas for performing this step. Note that the formulas change for different types of

hypotheses. We will learn the formulas for some types of hypotheses in this chapter and for other types

of hypotheses in later chapters.

Step-6: Decision. The decision is basically to either reject the null or fail to reject the null. We never say

that we accept the alternate hypothesis or that we accept the null hypothesis. We also never reject the

alternate hypothesis. We either reject the null or fail to reject the null.

Step 7: State your conclusion in the context of your original problem. Many students confuse conclusion

and decision. When I ask for a conclusion, do not say whether the null is rejected or not rejected.

Rejecting or not rejecting the null is a decision not a conclusion. The conclusion is always in the context

of the real world problem about which the hypotheses are being tested. For example if the hypothesis is

that the mean height is 68” (the null) vs. that the mean height is > 68” (the alternate) and if your decision,

based on the sample data was to reject the null, then a statement of your conclusion will be something

like this - there was sufficient evidence at a significance level of α (where α is whatever it is) that the average height of the population is greater than 68”. If the decision was to fail to reject the null, then a statement of your conclusion will be something like – there wasn’t sufficient evidence at a significance level of α that the average height of the population is greater than 68”. Note that you will not say that there

was sufficient evidence that the average height of the population was 68”.


Step 8: Assess the consequences of making a type-I error and a type-II error. This is an important exercise

and it forces you to think about what if you made a mistake in your decision and hence your conclusion.

It is not enough just to declare your decision and your conclusions because there is always a finite

probability that your decision and hence your conclusions were incorrect. For example if you made a

type-I error in diagnosing a patient with a certain disease then a consequence of that will be that you will

give the patient treatment that was never needed which might have a cost and perhaps even some side

effects.

Example 1: A lot of students prefer to work after their undergraduate degree, but wish to return for

graduate education after acquiring some experience in the real world. A researcher is interested in

knowing the average age at which students return for graduate studies. The research hypothesis is that

the age is greater than 27. To test their hypothesis the researcher interviews 49 graduate students at a

campus randomly and finds that the average age of this sample was 28.5 years with a sample

standard deviation of 3.5 years. Using this information carry out all the steps of hypothesis testing at a

significance level of 0.05.

Step 1: H0: µ = 27, where µ represents the average age of all graduate students who are returning to

graduate school after working for some years after college.

Step 2: Ha: µ > 27 (Note that this is a one-tailed hypothesis. It is right-tailed hypothesis)

Step 3: α = 0.05 (this was given to us). The rejection region is z > 1.645

Step 4: Data on 49 students was collected. Sample mean xbar = 28.5, sample standard deviation s =

3.5 yrs.

Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28.5 – 27)/(3.5/sqrt(49)) = 1.5/(3.5/7)

= 3.0. The p-value corresponding to this test statistic is P(z > 3.0) = 1 – 0.99865 = 0.00135

Step 6: Decision. Using the critical value approach, since 3.0 > 1.645, we reject the null hypothesis.

Using the p-value approach, since 0.00135 is less than 0.05 we reject the null hypothesis.

Step 7: There was sufficient evidence at significance level of 0.05 that the average age of students

returning for graduate school after working for some years is greater than 27 years.

Step 8: Since we are rejecting the null hypothesis, there is a chance of making a type-I error, i.e. the

true average age may be less than or equal to 27 years although our test indicates otherwise. A

consequence might be that the courses may be designed for a more mature audience when in

fact the audience is not that mature. This may result in an expectation gap for the faculty

teaching the courses. If the age was indeed above 27 and our test showed that it was not, then

there would be a type-II error. A consequence would be that the courses would be designed for a less mature audience, whereas the students are expecting more advanced courses.
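The arithmetic in Steps 5 and 6 can be sketched in Python. This is a minimal sketch using only the standard library; the standard normal CDF is built from math.erf rather than looked up in a z-table:

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Example 1: H0: mu = 27 vs Ha: mu > 27 (right-tailed), alpha = 0.05
mu0, xbar, s, n = 27, 28.5, 3.5, 49

z = (xbar - mu0) / (s / sqrt(n))   # test statistic
p_value = 1 - phi(z)               # right-tail probability P(Z > z)

print(round(z, 1))        # 3.0
print(round(p_value, 5))  # 0.00135
print(z > 1.645)          # True -> reject H0
```

Both decision rules agree: the statistic 3.0 exceeds the critical value 1.645, and the p-value 0.00135 is below α = 0.05.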

Example 2: In the above example, what if the sample average turned out to be 27.5 years instead of 28.5? All other things remain the same. Carry out all the steps.

Steps 1, 2 and 3 are the same as above.

Step 4: Data on 49 students was collected. Sample mean xbar = 27.5, sample standard deviation s =

3.5 yrs.

Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (27.5 – 27)/(3.5/sqrt(49)) = 0.5/(3.5/7)

= 1.0. The p-value corresponding to this test statistic is P(z > 1.0) = 1 – 0.84134 = 0.15866


Step 6: Decision. Using the critical value approach, since 1.0 < 1.645, we fail to reject the null

hypothesis. Using the p-value approach, since 0.15866 is greater than 0.05 we fail to reject the

null hypothesis.

Step 7: There was not sufficient evidence at significance level of 0.05 that the average age of students

returning for graduate school after working for some years is greater than 27 years.

Step 8: Similar to the example above.

Example 3: In Example 1, assume that the sample average turned out to be 28 years and the sample standard deviation was 5 years instead of 3.5 years.

Steps 1, 2 and 3 are the same as above.

Step 4: Data on 49 students was collected. Sample mean xbar = 28, sample standard deviation s = 5.0

yrs.

Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28 – 27)/(5/sqrt(49)) = 1/(5/7) = 7/5 =

1.4. The p-value corresponding to this test statistic is P(z > 1.4) = 1 – 0.91924 = 0.08076

Step 6: Decision. Using the critical value approach, since 1.4 < 1.645, we fail to reject the null

hypothesis. Using the p-value approach, since 0.08076 is greater than 0.05 we fail to reject the

null hypothesis.

Step 7: There was not sufficient evidence at significance level of 0.05 that the average age of students

returning for graduate school after working for some years is greater than 27 years.

Step 8: Similar to the example above.

Example 4: In Example 1, what if the research hypothesis was Ha: µ < 27? Everything else was the same, and the sample average came out to be 26.5 years with a standard deviation of 3.5 years.

Step 2: Ha: µ < 27 (Note that this is a one-tailed hypothesis. It is a left-tailed hypothesis.)

Step 3: α = 0.05 (this was given to us). The rejection region is z < -1.645

Step 4: Data on 49 students was collected. Sample mean xbar = 26.5, sample standard deviation s =

3.5 yrs.

Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (26.5 – 27)/(3.5/sqrt(49)) = -0.5/(3.5/7)

= -1.0. The p-value corresponding to this test statistic is P(z < -1.0) = 0.15866

Step 6: Decision. Using the critical value approach, since -1.0 > -1.645, we fail to reject the null

hypothesis. Using the p-value approach, since 0.15866 is greater than 0.05 we fail to reject the

null hypothesis.

Step 7: There was not sufficient evidence at significance level of 0.05 that the average age of students

returning for graduate school after working for some years is less than 27 years.

Example 5: In Example 1, what if the research hypothesis was Ha: µ ≠ 27? Everything else was the same.

Step 1: H0: µ = 27, where µ represents the average age of all graduate students who are returning to

graduate school after working for some years after college.

Step 2: Ha: µ ≠ 27 (Note that this is a two-tailed hypothesis.)

Step 3: α = 0.05 (this was given to us). The rejection region is either z < -1.96 or z > 1.96

Step 4: Data on 49 students was collected. Sample mean xbar = 28.5, sample standard deviation s =

3.5 yrs.


Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28.5 – 27)/(3.5/sqrt(49)) = 1.5/(3.5/7)

= 3.0. The p-value corresponding to this test statistic is 2*P(z > 3.0) = 2*(1 – 0.99865) =

2*0.00135 = 0.0027.

Step 6: Decision. Using the critical value approach, since 3.0 > 1.96, we reject the null hypothesis.

Using the p-value approach, since 0.0027 is less than 0.05 we reject the null hypothesis.

Step 7: There was sufficient evidence at significance level of 0.05 that the average age of students

returning for graduate school after working for some years is not 27 years.

Step 8: Since we are rejecting the null hypothesis, there is a chance of making a type-I error, i.e. the

true average age may be 27 years although our test indicates otherwise.

Let me summarize the above five examples in a table:

|                  | Example 1 | Example 2 | Example 3 | Example 4 | Example 5 |
|------------------|-----------|-----------|-----------|-----------|-----------|
| H0               | µ = 27 | µ = 27 | µ = 27 | µ = 27 | µ = 27 |
| Ha               | µ > 27 | µ > 27 | µ > 27 | µ < 27 | µ ≠ 27 |
| Tails            | One (right) | One (right) | One (right) | One (left) | Two |
| α                | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Rejection region | z > 1.645 | z > 1.645 | z > 1.645 | z < -1.645 | z < -1.96 or z > 1.96 |
| Sample mean      | 28.5 | 27.5 | 28 | 26.5 | 28.5 |
| Sample std. dev. | 3.5 | 3.5 | 5 | 3.5 | 3.5 |
| Sample size      | 49 | 49 | 49 | 49 | 49 |
| Test statistic   | (28.5-27)/(3.5/sqrt(49)) = 3.0 | (27.5-27)/(3.5/sqrt(49)) = 1.0 | (28-27)/(5/sqrt(49)) = 1.4 | (26.5-27)/(3.5/sqrt(49)) = -1.0 | (28.5-27)/(3.5/sqrt(49)) = 3.0 |
| p-value          | P(z > 3.0) = 0.00135 | P(z > 1.0) = 0.15866 | P(z > 1.4) = 0.08076 | P(z < -1.0) = 0.15866 | 2*P(z > 3.0) = 0.0027 |
| Decision         | Reject null | Fail to reject | Fail to reject | Fail to reject | Reject null |
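As a check, the five z tests summarized above can be reproduced with a short Python sketch. It uses only the standard library, building the normal CDF from math.erf:

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def ztest(xbar, mu0, s, n, tail):
    """One-sample z test; tail is 'right', 'left', or 'two'.
    Returns the test statistic and its p-value."""
    z = (xbar - mu0) / (s / sqrt(n))
    if tail == 'right':
        p = 1 - phi(z)
    elif tail == 'left':
        p = phi(z)
    else:
        p = 2 * (1 - phi(abs(z)))
    return z, p

# The five examples: mu0 = 27, n = 49, alpha = 0.05 throughout
cases = [(28.5, 3.5, 'right'), (27.5, 3.5, 'right'), (28, 5, 'right'),
         (26.5, 3.5, 'left'), (28.5, 3.5, 'two')]
for xbar, s, tail in cases:
    z, p = ztest(xbar, 27, s, 49, tail)
    print(round(z, 1), round(p, 5), 'reject' if p < 0.05 else 'fail to reject')
```

Comparing each p-value to α = 0.05 reproduces the decision row: reject in Examples 1 and 5, fail to reject in Examples 2, 3, and 4.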

Example 6: What if in Example 1, the sample size was 16 instead of 49. Everything else was the same.

|                  | Example 6 |
|------------------|-----------|
| H0               | µ = 27 |
| Ha               | µ > 27 |
| Tails            | One (right) |
| α                | 0.05 |
| Rejection region | t > 1.753 |
| Sample mean      | 28.5 |
| Sample std. dev. | 3.5 |
| Sample size      | 16 |
| Test statistic   | (28.5-27)/(3.5/sqrt(16)) = 1.5*4/3.5 = 1.714 |
| p-value          | P(t > 1.714) = 0.0535 |
| Decision         | Fail to reject null |

Instead of writing paragraphs about it, I have shown it in tabular form above, similar to the first five examples. I will point out the differences here. Because the sample size is small (n = 16), the test statistic follows a t distribution with n − 1 degrees of freedom rather than the standard normal. Note that the rejection region of t > 1.753 came from the t-table of Chapter 3, for degrees of freedom 15 under the column 0.05. You can also use the Excel function T.INV like this: =T.INV(0.95, 15), where 0.95 is 1 − 0.05 and 15 is the degrees of freedom, which in this case is 16 minus 1.
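A Python analog of the Excel formula, assuming the scipy library is available: scipy.stats.t.ppf plays the role of T.INV, and t.sf gives the right-tail p-value directly.

```python
# Sketch assuming scipy is installed; t.ppf is the Python analog of Excel's T.INV.
from scipy.stats import t

n, alpha = 16, 0.05
df = n - 1                      # degrees of freedom = 15

t_crit = t.ppf(1 - alpha, df)   # analog of =T.INV(0.95, 15)
print(round(t_crit, 3))         # 1.753

t_stat = (28.5 - 27) / (3.5 / n ** 0.5)
p_value = t.sf(t_stat, df)      # right-tail probability P(T > t_stat)
print(round(t_stat, 3))         # 1.714
print(p_value > alpha)          # True -> fail to reject H0
```

With the smaller sample, the same sample mean of 28.5 no longer gives a statistic beyond the critical value, so the decision flips to fail to reject.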
