
I would like to present a simple example to illustrate statistical hypothesis testing.

1
In a history class at a certain university it is known that students’ scores on
the final exam have a normal distribution with mean, mu = theta naught, and
variance sigma squared.

The History Department introduces a new teaching method and wants to
determine if this new method will increase final exam scores.

The null hypothesis is the hypothesis of no change: H naught is mu = theta
naught.

The alternative hypothesis is the hypothesis that mu equals theta A, which is
greater than theta naught.

Let X1 to Xn be the final exam scores of the n students in the class.

Under the null hypothesis, the joint probability density of the X’s is denoted by
L of the X’s with the mean equal to theta naught, called the likelihood function
under the null hypothesis.

Under the alternative hypothesis, the joint probability density of the X’s is
denoted by L of the X’s with the mean equal to theta A, called the likelihood
function under the alternative hypothesis.
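The two likelihood functions can be sketched in Python using the normal density. The scores and the parameter values below are hypothetical, chosen only to make the example concrete; the normal model with known variance is the one assumed in the lecture.

```python
import math

def normal_likelihood(xs, mu, sigma):
    """Joint density (likelihood) of i.i.d. normal observations, evaluated at mean mu."""
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for x in xs
    )

# Hypothetical scores and parameter values, for illustration only.
scores = [72.0, 75.0, 78.0, 74.0, 80.0]
theta0, thetaA, sigma = 70.0, 76.0, 8.0

L0 = normal_likelihood(scores, theta0, sigma)  # likelihood under the null hypothesis
LA = normal_likelihood(scores, thetaA, sigma)  # likelihood under the alternative hypothesis
print(LA / L0)  # the likelihood ratio; values above one favor the alternative
```

With these made-up scores the sample mean is closer to theta A than to theta naught, so the ratio comes out above one.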

2
The testing of H naught versus H A, a one-sided test, can be based on the ratio
of the likelihood functions, the likelihood under the alternative hypothesis
divided by the likelihood under the null hypothesis – simply called the
likelihood ratio. The likelihood function is similar to a ‘probability’, so if the
likelihood ratio is larger than one, we would expect the alternative
hypothesis to be more ‘likely’ than the null hypothesis. After the students take
the final exam the likelihood ratio can be computed. If the likelihood ratio is
too large, then the new teaching method is better than the original method and
we would reject the null hypothesis in favor of the alternative hypothesis.
It can be shown that the likelihood ratio varies directly with the mean, X bar, of
the students’ scores. So, the testing can equivalently be based on X bar or on T
of X bar, the standardized X bar which has a standard normal distribution under
the null hypothesis. T of X bar is called the test statistic.
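The standardized X bar is straightforward to compute. The scores and parameter values below are again hypothetical; the formula is the usual standardization of a sample mean when sigma is known.

```python
import math
import statistics

# Hypothetical data: n scores, with theta naught and sigma assumed known.
scores = [72.0, 75.0, 78.0, 74.0, 80.0]
theta0, sigma = 70.0, 8.0
n = len(scores)

x_bar = statistics.mean(scores)
# Standardized sample mean: under the null hypothesis this has a
# standard normal distribution.
t_stat = (x_bar - theta0) / (sigma / math.sqrt(n))
print(x_bar, t_stat)
```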

We would reject the null hypothesis for large values of the test statistic. This can
be formalized or quantified by setting the significance level or alpha level of the
test or by setting the critical value of the test, denoted t C:

3
Now the probability that the test statistic is at least t C under the null
hypothesis is alpha. Setting alpha determines t C, and setting t C determines
alpha. For this example, it may be best to set the critical value, since this
would determine how much larger than theta naught the mean of the students’
scores should be in order to reject the null hypothesis.
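The correspondence between alpha and t C is just the standard normal distribution function and its inverse; a minimal sketch, using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Setting alpha determines the critical value t_C...
alpha = 0.05
t_c = std_normal.inv_cdf(1 - alpha)

# ...and, conversely, setting t_C determines alpha.
alpha_from_tc = 1 - std_normal.cdf(t_c)
print(t_c, alpha_from_tc)
```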

Under the null hypothesis, the test statistic has a standard normal distribution;
the standard normal density is graphed here.

The set of values of the X’s where the test statistic is at least equal to the critical
value is called the critical region or rejection region, where the null hypothesis
would be rejected. In the graph the critical value, t C, is marked and the area
under the curve to the right of t C is alpha, the probability of the rejection
region. Let t O, t subscript O, be the observed value of the test statistic. If the
observed value of the test statistic is at least equal to the critical value, so that
the test statistic is in the rejection region, then we would reject the null
hypothesis; otherwise we accept the null hypothesis.
The p-value of the test is the probability, under the null hypothesis, that the
test statistic is at least as extreme as the observed value. So we would reject
the null hypothesis if the p-value is less than or equal to the significance level
or alpha level of the test.
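The one-sided p-value and the resulting decision can be sketched as follows; the observed value of the test statistic is a hypothetical number chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

t_obs = 1.62   # hypothetical observed value of the test statistic
alpha = 0.05

# One-sided p-value: probability, under the null hypothesis, that the
# test statistic is at least t_obs.
p_value = 1 - std_normal.cdf(t_obs)
reject = p_value <= alpha
print(p_value, reject)
```

Here the p-value comes out just above 0.05, so at the 5% level this particular observed value would not lead to rejection.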

4
There are two possible errors in hypothesis testing.
We could reject the null hypothesis when the null hypothesis is true – called a
Type I error. The probability of a Type I error is the probability of the rejection
region under the null hypothesis, which is equal to alpha, the significance level
of the test. To perform the testing, the probability of a Type I error is set.

Or we could accept the null hypothesis when the null hypothesis is not true –
called a Type II error. The probability of a Type II error is the probability that
the test statistic is not in the rejection region, or the complement of the
rejection region under the alternative hypothesis which is equal to one minus
the probability of the rejection region under the alternative hypothesis.

The probability of rejecting the null hypothesis when the alternative is true is
called the Power of the test. High Power is a desirable property of a test.
Increasing the Power would enlarge the rejection region, which decreases the
probability of a Type II error but increases the probability of a Type I error.
We cannot control the probabilities of the Type I error and the Type II error
simultaneously.
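For this normal example the Power and the Type II error probability have a closed form: under the alternative the standardized mean is shifted by delta = (theta A minus theta naught) times root n over sigma. A sketch with hypothetical parameter values:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical parameter values, for illustration only.
theta0, thetaA, sigma, n = 70.0, 76.0, 8.0, 25
alpha = 0.05
t_c = std_normal.inv_cdf(1 - alpha)

# Under the alternative, the test statistic is normal with mean delta and variance 1.
delta = (thetaA - theta0) * math.sqrt(n) / sigma
power = 1 - std_normal.cdf(t_c - delta)   # P(reject H0 | alternative true)
type2 = std_normal.cdf(t_c - delta)       # P(accept H0 | alternative true) = 1 - power
print(power, type2)
```

Note the trade-off described above: lowering t C (larger alpha) raises the power, and vice versa.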

5
The above example is based on the Neyman-Pearson Lemma. The statement
of the Lemma is here: for a one-sided test, a test which rejects the null
hypothesis when the likelihood ratio is at least K, for values of the X’s in a critical
region C where alpha is the probability of the critical region under the null
hypothesis, is a uniformly most powerful test. Simply put, “uniformly most powerful
test” means that it is the best test available.
The Neyman-Pearson Lemma is a very strong result and very subtle. But the
Lemma is only applicable to one-sided tests.

Because of the restriction to one-sided tests, the Neyman-Pearson Lemma has
limited applicability. But the Lemma does have some important applications: it
is used in economics concerning land value, it is used in electrical engineering
for signal processing, and it is used in particle physics for discovering new
elementary particles.
5 – bottom
In the above example concerning the mean of the final exam, to test if the new
teaching method has changed the mean of the final exam, one would use a
two-sided test: the null hypothesis is the same as for the one-sided test, but
the alternative is that the mean is not equal to theta naught; the mean could
be larger or smaller than theta naught.

6
For a two-sided test we use what is called the likelihood ratio test, which is
based on the likelihood maximized under the null hypothesis and under the
alternative hypothesis.

L hat of the X’s with the mean equal to theta naught, is the maximum of the
likelihood under the null hypothesis.
L hat of the X’s with the mean not equal to theta naught, is the maximum of the
likelihood under the alternative hypothesis.
The two-sided test is based on the ratio of these maximum likelihoods, the
maximum under the alternative hypothesis divided by the maximum under the
null hypothesis.

It turns out that the test statistic will be the same as for the one-sided test, the
standardized X bar, but values of the test statistic that are too small or too large
would lead to rejection of the null hypothesis.
Reject H naught when the absolute value of the test statistic is at least u C, the
critical value of the two-sided test.

The test can be formalized by selecting the critical value, u C, or the alpha-level
of the test so that the probability that the absolute value of the test statistic is at
least u C under H naught is equal to alpha.
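Because the rejection probability alpha is now split between the two tails of the standard normal, the two-sided critical value uses alpha over two in each tail; a minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

# Two-sided: P(|T| >= u_C) = alpha under H0, so each tail gets alpha / 2.
u_c = std_normal.inv_cdf(1 - alpha / 2)
print(u_c)
```

At the 5% level this gives the familiar two-sided critical value of about 1.96, compared with about 1.645 for the one-sided test.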

7
The set of values where the absolute value of the test statistic is greater than or
equal to the critical value is the critical region or rejection region of the two-sided
test. The distribution of the test statistic under the null hypothesis is standard
normal; this is the graph of the density. On the graph, the rejection region is in
both the left and right tails: to the left of negative u C and to the right of u C.

Again, t O denotes the observed value of the test statistic.
We would reject the null hypothesis if the absolute value of the observed value of
the test statistic is at least the critical value.
The p-value is the probability, under the null hypothesis, that the absolute value
of the test statistic is at least the absolute value of the observed value of the test
statistic; that is, the probability that the test statistic is at least as extreme as its
observed value.
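By the symmetry of the standard normal, the two-sided p-value is twice the one-sided tail probability; the observed value below is the same hypothetical number used earlier.

```python
from statistics import NormalDist

std_normal = NormalDist()

t_obs = 1.62   # hypothetical observed value of the test statistic
alpha = 0.05

# Two-sided p-value: P(|T| >= |t_obs|) under H0, using symmetry of the normal.
p_value = 2 * (1 - std_normal.cdf(abs(t_obs)))
reject = p_value <= alpha
print(p_value, reject)
```

The same observed value gives a p-value roughly twice the one-sided p-value, so a result that is borderline for the one-sided test is clearly non-significant here.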

8
For the two-sided test, here are the probabilities of the Type I and Type II errors
and the Power of the test.

I want to mention that when selecting the alpha-level for a one-sided test, it is
usually appropriate to use half the alpha value that would be used for a two-
sided test.

Type I Error = rejecting the null hypothesis when the null hypothesis is true.

Type II Error = accepting the null hypothesis when the null hypothesis is not true, or
accepting the null hypothesis when the alternative is true.

Power of a test = probability of rejecting the null hypothesis when the alternative is true.

The probability of a Type I Error, the significance level, the size of the critical region, and
the power of the test when the null hypothesis is true are all the same.

Probability of a Type II Error = 1 – the power of the test when the alternative is true.

Pivot – a function of the sample values and parameters whose distribution is
independent of the parameters. Can be used to construct confidence intervals.
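In this example the standardized X bar, with the true mean mu in place of theta naught, is a pivot: its distribution is standard normal whatever mu is, and inverting it gives a confidence interval for mu. A sketch with the same hypothetical scores as before:

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical data with sigma assumed known, for illustration only.
scores = [72.0, 75.0, 78.0, 74.0, 80.0]
sigma = 8.0
n = len(scores)

x_bar = statistics.mean(scores)
alpha = 0.05
u_c = NormalDist().inv_cdf(1 - alpha / 2)

# The pivot (x_bar - mu) / (sigma / sqrt(n)) is standard normal for every mu,
# which inverts to a (1 - alpha) confidence interval for mu.
half_width = u_c * sigma / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(ci)
```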
