
Hypothesis Testing

Hypothesis testing: the use of statistics to determine the probability that a given hypothesis is true or false.

Hypothesis: some theory or claim that has been put forward because it is believed to be true, but has not been proved.

In the present context, a hypothesis is a conjecture about the distribution of some random variable, often a statement about the mean or variance of the r.v.

Hypothesis testing tests a null hypothesis H0 against the alternate hypothesis H1.

Null Hypothesis

States that nothing extraordinary is present, or that the process is operating ordinarily: the most likely and normal situation.

It should be rejected only when the evidence against it is very strong.

Choice of null hypothesis: (1) the current belief, (2) the claim that requires very strong evidence against it, (3) the claim that is simpler to state.

Alternate hypothesis: the belief that you want to establish.

Decision Errors

The outcome of a hypothesis test is either "reject H0" or "do not reject H0".

When we perform a statistical test we hope that our decision will be correct, but sometimes it will be wrong. There are two possible errors that can be made in a hypothesis test.
Decisions vs. truth:

              Truth: H0        Truth: H1
Accept H0     Correct          Type II Error
Reject H0     Type I Error     Correct

Steps in Hypothesis Testing
Hypothesis testing is a proof by contradiction.

Step 1: Formulate the null and alternative hypotheses. Assume H0 is true.
Step 2: Collect data.
Step 3: Evaluate whether the data are consistent with the statistical hypothesis: identify a test statistic that will be computed from the collected data and used to assess the truth of the null hypothesis.

Approaches: (1) frequentist or classical, (2) Bayesian, (3) likelihood.

Some Definitions

Critical region or rejection region: the region of the data space, or the corresponding range of the test statistic, for which the null hypothesis will be rejected.

Size or significance level of the test: prob. of incorrectly rejecting H0 = prob. of Type I error = α.

When the value of the test statistic falls in the rejection region, the test is said to be statistically significant at level α.

Prob. of Type II error = β.

Power of a test: prob. of correctly rejecting H0 = 1 − β.

The p-value is the min. significance level for which we would still reject the null hypothesis. The p-value is a measure of how much evidence there is against the null hypothesis.

A Simple Example

Null hypothesis H0: a sample comes from a Gaussian process with mean zero, i.e. N(0, σ²).

We have to decide whether a particular sample came from the same process or not.

For decision making, we need to set some criterion, measure the sample value, and evaluate some chosen test statistic. Then compare the test statistic value against the criterion to accept or reject the null hypothesis.

Our criterion: reject H0 if the normalized difference between the sample value and 0 is more than 2.

Example (contd.)

Measure sample value = xs.

Evaluate z = |xs − 0| / σ.

If z > 2, then we decide that the sample has not come from
the zero mean Gaussian process under consideration.

Check that the size, or significance level, or probability of Type I error for this criterion is α ≈ 0.05.

The p-value for the measured sample is

p = 2 ∫_{|xs − 0|}^{∞} f0(x) dx

where f0(x) = N(0, σ²).
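As an illustration, a minimal Python sketch of this test, assuming the process standard deviation σ is known (taken as 1 below); the two-sided p-value uses the Gaussian tail via erfc, and the sample value 2.3 is made up.

```python
import math

def z_test(x_s, mu0=0.0, sigma=1.0, z_crit=2.0):
    """Two-sided z-test against H0: the sample comes from N(mu0, sigma^2)."""
    z = abs(x_s - mu0) / sigma            # normalized difference
    # Two-sided p-value: 2 * P(Z > z) = erfc(z / sqrt(2)) for standard normal Z.
    p_value = math.erfc(z / math.sqrt(2.0))
    reject = z > z_crit                   # criterion: reject H0 if z > 2
    return z, p_value, reject

z, p, reject = z_test(2.3)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```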

Example (contd.)

Now, we state both the null hypothesis and the alternate hypothesis as follows:

H0: f0(x) = p(x | H0)
H1: f1(x) = p(x | H1)

What we need to do is to see which of the two density functions the observed data fit better, and then decide for H0 or H1.

Accordingly, we need to set the decision criterion.

Bayesian Hypothesis Testing

Given prior probabilities P(H0) and P(H1)

Given likelihoods p(xs | H0) and p(xs | H1), where xs is the collected data.

Calculate posterior odds ratio = Bayes factor × prior odds ratio:

P(H0 | xs) / P(H1 | xs) = [p(xs | H0) / p(xs | H1)] × [P(H0) / P(H1)]
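A minimal sketch of this posterior odds computation for two simple (point) hypotheses; the Gaussian likelihoods N(0, 1) under H0 and N(1, 1) under H1, and the equal priors, are illustrative assumptions, not taken from the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_odds(x_s, prior_h0=0.5, prior_h1=0.5):
    """Posterior odds = Bayes factor * prior odds for two point hypotheses."""
    # Illustrative likelihoods: H0: x ~ N(0, 1), H1: x ~ N(1, 1).
    like_h0 = gaussian_pdf(x_s, mu=0.0, sigma=1.0)
    like_h1 = gaussian_pdf(x_s, mu=1.0, sigma=1.0)
    bayes_factor = like_h0 / like_h1
    return bayes_factor * (prior_h0 / prior_h1)

odds = posterior_odds(x_s=0.8)
print("decide H0" if odds > 1 else "decide H1", f"(posterior odds = {odds:.3f})")
```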

Bayesian Method (contd.)

Decide for H0 if the posterior odds ratio is greater than 1, else decide for H1.

Basically based on the MAP criterion.

The prior odds are modified after observing the data.

Integrates prior probabilities associated with competing hypotheses into the assessment of which hypothesis is the most likely for the data in hand.

Put another way, the likelihoods are weighted by the prior probabilities.

Bayesian Method (contd.)

Decision boundary: p(x | H0) P(H0) = p(x | H1) P(H1)

P(Type I Error) = ∫_{R1} p(x | H0) dx
P(Type II Error) = ∫_{R0} p(x | H1) dx
PE = P(Type I Error) P(H0) + P(Type II Error) P(H1)

Bayesian Method (contd.)

When the hypotheses are defined over some parameter θ:

P(H0 | xs) / P(H1 | xs) = [P(H0) ∫ p(xs | θ, H0) p(θ | H0) dθ] / [P(H1) ∫ p(xs | θ, H1) p(θ | H1) dθ]

Useful for compound hypothesis testing.

Example: We toss a coin 100 times and obtain 60 heads and 40 tails. What is the evidence against the hypothesis that the coin is fair?
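A minimal sketch of this coin example, assuming (our choice) a uniform Beta(1,1) prior on the head probability under H1 and equal prior probabilities for the two hypotheses; the marginal likelihood under H1 is the Beta-Binomial integral.

```python
import math

# Coin example: 60 heads in 100 tosses.
# H0: the coin is fair (p = 0.5).
# H1: p unknown; assumed uniform Beta(1,1) prior on p.
n, k = 100, 60

# Marginal likelihood of the data under H0.
m0 = math.comb(n, k) * 0.5 ** n

# Marginal likelihood under H1: integral over p of C(n,k) p^k (1-p)^(n-k) dp
# = C(n,k) * B(k+1, n-k+1), which simplifies to 1 / (n + 1).
m1 = math.comb(n, k) * math.exp(
    math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
)

bayes_factor = m0 / m1                 # evidence for H0 relative to H1
posterior_odds = bayes_factor * 1.0    # with equal priors P(H0) = P(H1)

print(f"Bayes factor B01 = {bayes_factor:.3f}")
print(f"Posterior odds   = {posterior_odds:.3f}  (decide H0 if > 1)")
```

Under these particular assumptions the Bayes factor comes out close to 1, so the data provide only weak evidence either way; a different prior on p under H1 would change the number.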

Minimum Prob. of Error

We attach some cost function: Cij = cost incurred by accepting Hi when Hj is true.

R0 is the region of accepting H0, R1 is the region of rejecting H0 (accepting H1); we have to determine proper decision regions so that the overall cost is minimized.

Overall cost value:

C = ∫_{R0} [C00 p(x | H0) P(H0) + C01 p(x | H1) P(H1)] dx
  + ∫_{R1} [C10 p(x | H0) P(H0) + C11 p(x | H1) P(H1)] dx

Minimum Prob. of Error (contd.)

Cost is minimized if we decide for H0 when

C00 p(xs | H0) P(H0) + C01 p(xs | H1) P(H1) ≤ C10 p(xs | H0) P(H0) + C11 p(xs | H1) P(H1)

Considering the zero-one cost function, decide for H0 when

p(xs | H1) P(H1) ≤ p(xs | H0) P(H0)

This is essentially the Bayes decision criterion.

In this way the overall error (both Type I and Type II) is minimized.
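A minimal sketch of this cost-based decision rule; the cost matrix (zero-one by default), the equal priors, and the class-conditional densities N(0, 1) and N(2, 1) are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def decide(x_s, P0=0.5, P1=0.5, C=((0, 1), (1, 0))):
    """Return 'H0' or 'H1' by the minimum-cost (Bayes) rule.
    C[i][j] = cost of accepting Hi when Hj is true; default is zero-one cost."""
    # Illustrative densities: H0: N(0, 1), H1: N(2, 1).
    p0 = gaussian_pdf(x_s, 0.0, 1.0) * P0
    p1 = gaussian_pdf(x_s, 2.0, 1.0) * P1
    cost_h0 = C[0][0] * p0 + C[0][1] * p1   # expected cost of accepting H0
    cost_h1 = C[1][0] * p0 + C[1][1] * p1   # expected cost of accepting H1
    return "H0" if cost_h0 <= cost_h1 else "H1"

print(decide(0.7))   # with zero-one cost this reduces to the MAP / Bayes rule
```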


Fisherian Hypothesis Testing

Construct a statistical null hypothesis

Choose an appropriate distribution or test statistic.

Collect the data with random samples.

Determine the p-value assuming the null hypothesis is true.

Reject the null hypothesis if p is small

Neyman-Pearson Method

Similar to Fisher's approach, but:

Set the significance level (α) in advance,

Focus on Type I and Type II errors, as well as the power of tests.

For any α there is an infinite number of possible decision rules (an infinite number of critical regions).

Each critical region has a power.

The Neyman-Pearson Lemma tells us how to find the critical region that has the highest power.

Neyman-Pearson Method (contd.)
False Alarm: wrongly rejecting H0.

Detection: rightly rejecting H0 when it is not true.

Miss: Wrongly accepting H0.

Check that the prob. of false alarm is PF = α and the prob. of miss is β. The prob. of correct acceptance is (1 − α) and the prob. of detection is PD = (1 − β).

So, we have two degrees of freedom.

Aim: increase PD while decreasing PF; generally not possible simultaneously.


Neyman-Pearson Method (contd.)
The relation between PF and PD is commonly given by the receiver operating characteristic (ROC) curve.
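The ROC figure itself is not reproduced here; as a rough sketch, the following computes (PF, PD) points for an assumed Gaussian mean-shift problem (H0: N(0, 1), H1: N(1, 1)), where each detection threshold gives one point on the ROC curve.

```python
import math

def q(z):
    """Gaussian tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Illustrative problem: reject H0 when x > threshold, with
# H0: x ~ N(0, 1) and H1: x ~ N(1, 1).
for thr in [t / 2.0 for t in range(-4, 9)]:
    P_F = q(thr)            # false alarm: tail of H0 beyond the threshold
    P_D = q(thr - 1.0)      # detection: tail of H1 beyond the threshold
    print(f"threshold {thr:+.1f}: P_F = {P_F:.3f}, P_D = {P_D:.3f}")
```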

Neyman-Pearson Lemma

Neyman-Pearson criterion: maximize PD such that PF ≤ α. Do a likelihood ratio test (LRT).
Define the likelihood ratio statistic as

L(xs; H0, H1) = p(xs | H0) / p(xs | H1)

Reject the null hypothesis if L(xs; H0, H1) ≤ k, where k is chosen such that the total probability of rejection under H0 is α.

Neyman-Pearson Lemma: any other test with significance level α* ≤ α has power less than or equal to that of the likelihood ratio test.
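A minimal sketch of a Neyman-Pearson test for an assumed Gaussian mean-shift problem (H0 mean 0, H1 mean 1, known σ = 1); because the likelihood ratio is monotone in x here, rejecting when L ≤ k is equivalent to rejecting when x exceeds a threshold set directly from α.

```python
from statistics import NormalDist

alpha = 0.05
mu0, mu1, sigma = 0.0, 1.0, 1.0          # assumed H0 and H1 means, known sigma

# For H0: N(mu0, s^2) vs H1: N(mu1, s^2) with mu1 > mu0, the likelihood
# ratio L = p(x|H0)/p(x|H1) decreases in x, so L <= k  <=>  x >= x_crit.
x_crit = mu0 + sigma * NormalDist().inv_cdf(1.0 - alpha)   # sets P_F = alpha

def lrt_reject(x_s):
    return x_s >= x_crit

P_D = 1.0 - NormalDist(mu1, sigma).cdf(x_crit)   # power of the test
print(f"x_crit = {x_crit:.3f}, power P_D = {P_D:.3f}")
```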

Hypothesis Testing using LRT

Recall the criterion used for minimizing the cost function in hypothesis testing (decision making): reject H0 when

C00 p(xs | H0) P(H0) + C01 p(xs | H1) P(H1) ≥ C10 p(xs | H0) P(H0) + C11 p(xs | H1) P(H1)

This can be rewritten as

p(xs | H0) / p(xs | H1) ≤ [(C01 − C11) P(H1)] / [(C10 − C00) P(H0)]

So, if the threshold k in the LRT equals the RHS, then the cost function is minimized. Also, check that if k = P(H1)/P(H0), as obtained for the zero-one cost function, then this is essentially Bayes-criterion-based hypothesis testing.

Points to Note

Neyman-Pearson is a classical (or frequentist) approach.

In the Neyman-Pearson approach, our questions revolve around the probability of the data, given a specific hypothesis; good for cases where no a priori information is available.

In the Bayesian approach, our questions revolve around the probability of various hypotheses, given the data.

The Bayes criterion is MAP based and is equivalent to the min. error criterion.

The Bayes method also minimizes the overall cost in the case of a zero-one cost function.

The Bayes method and the LRT (with threshold 1) are equivalent in the case of equal a priori probabilities.

Maximum Likelihood Estimation

It is assumed that the parametric form of p(x) is known, but it depends on some parameters θ1, θ2, θ3, ...

So, once we can find (estimate) these parameter values, the density function is uniquely determined.

Observe a set of i.i.d. training samples x1, x2, .., xn.

Likelihood:

p(X | θ) = p(x1, x2, ..., xn | θ) = ∏_{k=1}^{n} p(xk | θ)

MLE finds the values of the parameters for which the likelihood is maximized.

MLE (contd.)

To find the parameter values that maximize the likelihood, we need to differentiate w.r.t. θk and then equate to zero.

Equivalently, we may differentiate the log-likelihood function; this is easier to work with.

Log-likelihood: l(θ) = ln p(X | θ) = Σ_{k=1}^{n} ln p(xk | θ)

Solve ∇θ l(θ) = Σ_{k=1}^{n} ∇θ ln p(xk | θ) = 0

Explain with examples: the Gaussian case with unknown mean, and with unknown mean and variance (see the sketch below).
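A minimal sketch of the Gaussian MLE cases mentioned above (the sample data are made up): the MLE of the mean is the sample mean, and the MLE of the variance uses the 1/n normalization.

```python
def gaussian_mle(samples):
    """MLE for a Gaussian: unknown mean, and unknown mean + variance."""
    n = len(samples)
    mu_hat = sum(samples) / n                                  # MLE of the mean
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n      # MLE of the variance (1/n, not 1/(n-1))
    return mu_hat, var_hat

mu_hat, var_hat = gaussian_mle([2.1, 1.7, 2.4, 2.0, 1.8])
print(f"mu_hat = {mu_hat:.3f}, var_hat = {var_hat:.3f}")
```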

MAP Estimation

The prior probability of different parameter values is given: p(θ).

We can determine the posterior prob. p(θ | X) for the given training samples.

We look for the parameter values that maximize this posterior prob.

That is, we maximize p(X | θ) p(θ).

Equivalently, we maximize l(θ) + ln p(θ).
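A minimal sketch of one tractable case, assuming a Gaussian likelihood with known variance σ² and a Gaussian prior N(μ0, σ0²) on the mean, for which maximizing l(θ) + ln p(θ) has a closed form; all numbers are illustrative.

```python
def map_gaussian_mean(samples, sigma2, mu0, sigma0_sq):
    """MAP estimate of a Gaussian mean with known variance sigma2
    and prior mean ~ N(mu0, sigma0_sq)."""
    n = len(samples)
    x_bar = sum(samples) / n
    # Setting d/d_mu [ln p(X|mu) + ln p(mu)] = 0 gives a precision-weighted mean.
    return (n * sigma0_sq * x_bar + sigma2 * mu0) / (n * sigma0_sq + sigma2)

print(map_gaussian_mean([2.1, 1.7, 2.4, 2.0, 1.8], sigma2=1.0, mu0=0.0, sigma0_sq=4.0))
```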

Bayesian Estimation

The parametric form of p(x) is known but the parameter values are not known.

The basic goal is to compute the density function from a given set of samples, i.e. p(x | X), which is close to the unknown p(x):

p(x | X) = ∫ p(x | θ) p(θ | X) dθ

So, we need to compute p(θ | X), called the reproducing density.

Initial knowledge about the parameter values is contained in a known prior density p(θ), often chosen as a conjugate prior.

The rest of our knowledge about the parameters is contained in the sample set.

Bayesian Estimation (contd.)

Using Bayes formula:

p(θ | X) = p(X | θ) p(θ) / ∫ p(X | θ) p(θ) dθ

Since the samples are drawn independently according to the unknown prob. density p(x),

p(X | θ) = ∏_{k=1}^{n} p(xk | θ)

Bayesian Estimation Example

Given: p(x | μ) ~ N(μ, σ²) and p(μ) ~ N(μ0, σ0²).

Reproducing density:

p(μ | X) = (1 / (√(2π) σn)) exp[ −(1/2) ((μ − μn) / σn)² ], i.e. p(μ | X) ~ N(μn, σn²)

where

μn = [n σ0² / (n σ0² + σ²)] x̄n + [σ² / (n σ0² + σ²)] μ0
σn² = σ0² σ² / (n σ0² + σ²)
x̄n = (1/n) Σ_{k=1}^{n} xk

Finally, the density function:

p(x | X) ~ N(μn, σ² + σn²)
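A minimal sketch of these update formulas; the sample values and prior parameters below are illustrative.

```python
def bayes_gaussian_mean(samples, sigma2, mu0, sigma0_sq):
    """Posterior N(mu_n, sigma_n^2) over the mean, plus the predictive variance."""
    n = len(samples)
    x_bar = sum(samples) / n
    mu_n = (n * sigma0_sq * x_bar + sigma2 * mu0) / (n * sigma0_sq + sigma2)
    sigma_n_sq = (sigma0_sq * sigma2) / (n * sigma0_sq + sigma2)
    return mu_n, sigma_n_sq, sigma2 + sigma_n_sq   # p(x|X) ~ N(mu_n, sigma2 + sigma_n_sq)

mu_n, s_n2, pred_var = bayes_gaussian_mean([2.1, 1.7, 2.4, 2.0, 1.8],
                                           sigma2=1.0, mu0=0.0, sigma0_sq=4.0)
print(f"mu_n = {mu_n:.3f}, sigma_n^2 = {s_n2:.3f}, predictive var = {pred_var:.3f}")
```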
