
Unit 3: Hypothesis Testing

Sections 1.3-1.6 in the text


Unit 3 Outline

• Motivating Example
• Hypothesis Testing Framework
• (Fisher’s) Randomization Test
• Permutation Test
• Testing in R

• Malaria sickens about 200 million people each year, killing about 1 million.
• Most malaria cases and deaths occur in sub-Saharan Africa.
• Malaria vaccines are difficult to develop due to the diversity of the parasite and the complexity of its effects on the human body; this is an area of intensive research.

Example: Malaria Vaccine Trial
• “Researchers reported that…out of 9 volunteers who
received four doses of the new vaccine, 3 contracted
the disease. Of 12 volunteers who received no vaccine,
10 became infected.”
• What is the scientific question of interest?
• What are the statistical hypotheses to address this
scientific question?
• What statistical analysis tool should we use to test these
hypotheses?
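A minimal R sketch of how the trial data could be encoded, assuming one entry per volunteer with 1 = infected and group labels "control"/"vaccine" (these variable names are illustrative, not from the text). It reproduces the infection proportions implied by the quoted counts and their difference.

```r
# One entry per volunteer (illustrative names): 1 = infected, 0 = not infected
group <- c(rep("vaccine", 9), rep("control", 12))
infected <- c(rep(1, 3), rep(0, 6),   # vaccine: 3 of 9 infected
              rep(1, 10), rep(0, 2))  # control: 10 of 12 infected

# 2 x 2 summary: group by infection status
table(group, infected)

# Proportion infected in each group: about 0.833 (control) and 0.333 (vaccine)
prop_infected <- tapply(infected, group, mean)
prop_infected

# Difference in proportions, control minus vaccine
diff_obs <- unname(prop_infected["control"] - prop_infected["vaccine"])
diff_obs   # 0.5
```

A positive difference means fewer infections among the vaccinated volunteers.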

Steps in Testing Hypotheses
• There are 4 main steps to a statistical hypothesis testing procedure:
1) Formulate the hypotheses: H0 and HA.
2) Calculate a test statistic that summarizes the evidence about these hypotheses.
3) Calculate the p-value based on a reference distribution of the test statistic (assuming H0 is true).
4) Determine the conclusion of the test by comparing the p-value to the significance level of the test procedure. State the scope of inference: how the result generalizes internally and externally.
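The four steps can be mapped onto a short R function. This is a sketch only: `four_step_test()` is a hypothetical helper, it assumes two groups coded 0/1 and a one-sided alternative (group 1 has the larger mean), and it builds the reference distribution by reshuffling group labels, the randomization idea developed later in this unit.

```r
# Hypothetical helper (not from the text): the four testing steps for a
# two-group comparison with a one-sided alternative.
four_step_test <- function(y, group, alpha = 0.05, nsims = 10000) {
  # Step 1: hypotheses, stated in words here
  #   H0: outcomes do not depend on group; HA: group 1 has a larger mean.
  # Step 2: test statistic = mean of group 1 minus mean of group 0
  stat <- function(g) mean(y[g == 1]) - mean(y[g == 0])
  observed <- stat(group)
  # Step 3: reference distribution under H0 by reshuffling the labels;
  # p-value = proportion of reshuffled statistics at least as large as observed
  # (a small tolerance keeps floating-point rounding from dropping ties)
  reference <- replicate(nsims, stat(sample(group)))
  p_value <- mean(reference >= observed - 1e-9)
  # Step 4: compare the p-value with the significance level
  list(statistic = observed, p_value = p_value, reject_H0 = p_value <= alpha)
}

# Malaria data: group 1 = control, group 0 = vaccine; y = 1 means infected
set.seed(1)
y <- c(rep(1, 10), rep(0, 2), rep(1, 3), rep(0, 6))
group <- c(rep(1, 12), rep(0, 9))
result <- four_step_test(y, group)
result
```

Stating the scope of inference (internal and external validity) still has to be done in words from the study design.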

Hypotheses
• A scientific hypothesis makes a testable statement
about the observable universe.
• A statistical hypothesis is more restricted in that it
concerns the behavior of a measurable (or
observable) random variable. It is often a statement
or claim about a parameter of a population or
distribution.
• Two competing types of statistical hypotheses for
any scientific problem: H0 vs. HA.

Hypothesis Testing
• Suppose that the treatment is randomly assigned to selected units in a sample, while the others are used as “controls”.
[Figure: sample units randomly split into a Treatment group and a Control group]
How do we use statistical inference to answer the question of interest?
Formulating Statistical Hypotheses
• There are two types of hypotheses associated with every
statistical test:
• Null Hypothesis (H0): Assumption specifying a possible
truth, typically the absence of an effect. A statement of old
beliefs, such as no change, no difference, or no effect. If the
null hypothesis is true, then any discrepancy between the
observed data and the hypothesis is due only to chance
variation.
• Alternative Hypothesis (HA): Assumption describing an alternative truth, typically some effect or some difference. A statement of possible new beliefs. If the alternative hypothesis is true, then a discrepancy between the observed data and the null hypothesis is not due to chance variation alone.
• What should our hypotheses be for the malaria data?

2. Determine a Test Statistic
(& Reference Distribution)
• Statistic: A function of the data, $\mathbf{y} = (y_1, \ldots, y_n)$: $\hat{\theta} = \hat{\theta}(\mathbf{y})$.
• Test Statistic: a specific statistic used to weigh evidence supporting and contradicting the null hypothesis.
• Reference Distribution: Probability distribution of the test statistic, assuming that the null hypothesis is true, $f(\hat{\theta}(\mathbf{Y}) \mid H_0)$. This is often called the sampling distribution (under H0).
• What should be the test statistic for the malaria dataset? How can we determine its reference distribution?
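For the malaria trial, one possible answer can be written down exactly. Assuming the test statistic is the control-minus-vaccine difference in infection proportions, under H0 each volunteer's infection status is fixed (13 infected, 8 not), and only the random choice of which 9 volunteers get the vaccine varies. The number K of infected volunteers in the vaccine group is then hypergeometric, and the statistic equals (13 - K)/12 - K/9. A sketch using base R's `dhyper()`:

```r
# Possible numbers of infected volunteers in the vaccine group (size 9)
n_inf <- 0:9
# Test statistic for each value: control proportion minus vaccine proportion
stat <- (13 - n_inf) / 12 - n_inf / 9
# Null probabilities: 9 volunteers drawn from 13 infected and 8 not infected
prob <- dhyper(n_inf, m = 13, n = 8, k = 9)

ref_dist <- data.frame(infected_in_vaccine = n_inf,
                       statistic = round(stat, 3),
                       probability = prob)
ref_dist[prob > 0, ]   # 0 is impossible: only 8 volunteers were not infected
```

The observed data (3 infected vaccinees) sit at a statistic of 0.5 in this table.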

3. Calculate the p-value
• p-value: the probability of observing our test statistic or a more extreme one, assuming the null hypothesis to be true. It is used as a measure of the strength of evidence against the null hypothesis.
• It is calculated by comparing the observed test statistic to the reference distribution.
• The p-value is NOT the probability that the null hypothesis is true!
[Figure: histogram of the randomization distribution of the test statistic; x-axis from -0.5 to 0.5, y-axis Frequency]

Significance Level of a test
• p-value: the probability that the test statistic would be at least as extreme as observed, under the null hypothesis.
• A significance level (α) is the criterion compared against the p-value. The null hypothesis is rejected if the p-value is less than or equal to α.
• α is the probability of rejecting the null hypothesis given that it is true (the Type I error rate).

• Why are two-sided hypotheses considered more conservative?
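A small simulation can make the link between α and the Type I error rate concrete. This sketch assumes normally distributed data with no true difference between groups and uses R's built-in `t.test()` (a Welch two-sample t-test by default) as a stand-in test, not the randomization test of this unit. It repeats an experiment many times and records how often H0 is rejected at α = 0.05.

```r
set.seed(2024)
alpha <- 0.05
n_experiments <- 2000

# Each experiment: two groups of 15 drawn from the same distribution (H0 true)
p_values <- replicate(n_experiments, {
  group_a <- rnorm(15)
  group_b <- rnorm(15)
  t.test(group_a, group_b)$p.value
})

# Proportion of experiments that (wrongly) reject H0
type1_rate <- mean(p_values <= alpha)
type1_rate   # should be close to alpha
```

When H0 is true, a test that is valid at level α rejects in roughly a fraction α of repeated experiments.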

4. Determining the Conclusion
• We come to a conclusion about our hypotheses by comparing the p-value to the significance level α, the pre-specified Type I error rate.
• If the p-value is as small as or smaller than the pre-specified level of the test, α (usually 0.05), we reject the null hypothesis and say the result is statistically significant at level α.
• If the p-value is larger than α, we are unable to reject the null hypothesis.
• Warning: this is different from concluding that the null hypothesis is true! Why?
• Based on the study design, we can then generalize internally and/or externally (the scope of the inference procedure).

Uncertainty in Randomized
Experiments
• In randomized experiments, uncertainty comes from the randomness of the assignment mechanism.
• How can we capture this randomization through a
summary measurement?
• (Fisher’s) Randomization Test is a distribution-free test
for treatment effect in randomized experiments.

(Fisher’s) Randomization Test
• Additive Treatment Effect:
$Y_{c,i} = Y_{v,i} + \delta$
• H0: Zero treatment effect for all units, δ = 0. Each unit’s outcome is the same, regardless of the treatment assigned.
• Consequently, the distribution of outcomes is identical in the two groups.
• Ha: Non-zero treatment effect for ALL units, δ ≠ 0.
• (This version is used in R&S, Section 1.3.1: $Y^{*} = Y + \delta$.)

(Fisher’s) Randomization Test
Assumptions:
• Random assignment to groups.
• Under H0, independence of study units.
– To be precise, there is an exchangeability of study units,
i.e., the labels assigning subjects to groups are
interchangeable.
– We will take advantage of this exchangeability when
building the sampling distribution (through simulation) of
the test statistic.

(Fisher’s) Randomization Test

• Test statistic: Difference between average outcomes in the two groups*:
$\hat{\delta} = \bar{Y}_c - \bar{Y}_v$
• *Note: other summaries of the data can serve as a test statistic. We could instead have chosen the difference between medians.

• The randomization distribution is the reference distribution of the test statistic in a randomization test, where the variation is due to the random assignment of the treatment.

Example 1: Randomization Test

• Observed data: [Figure: bar chart of infection counts (Count) in the Control and Vaccine groups]
• Observed test statistic: $\bar{y}_c - \bar{y}_v = 0.5$
• Randomization distribution of $\bar{Y}_c - \bar{Y}_v$: [Figure: histogram of the randomization distribution; x-axis from -0.5 to 0.5, y-axis Frequency]
– What is $\binom{21}{9}$?
– Exact vs. approximate
– R code to come
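One way to see what "exact" means here: enumerate every one of the $\binom{21}{9}$ ways of choosing the 9 vaccinated volunteers, recompute the statistic for each, and take the proportion at or above the observed 0.5. A sketch using base R's `combn()`, assuming the outcomes are stored in the order vaccine first, then control; it makes roughly 294,000 function calls, so it may take a few seconds. An "approximate" answer instead samples only some of these assignments at random.

```r
# Outcomes: positions 1-9 vaccinated (3 infected), 10-21 controls (10 infected)
y <- c(rep(1, 3), rep(0, 6), rep(1, 10), rep(0, 2))
obs <- mean(y[10:21]) - mean(y[1:9])   # control minus vaccine = 0.5

choose(21, 9)                          # number of possible assignments: 293930

# For every possible set of 9 "vaccinated" positions: control minus vaccine
all_stats <- combn(21, 9, FUN = function(idx) mean(y[-idx]) - mean(y[idx]))

# Exact one-sided p-value (small tolerance guards against floating-point ties)
p_exact <- mean(all_stats >= obs - 1e-9)
p_exact                                # about 0.0294

# Same number from the hypergeometric distribution: P(at most 3 infected vaccinees)
phyper(3, m = 13, n = 8, k = 9)
```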


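P-values in Randomization Test
• One-sided alternative:
$H_0: \delta = 0$
$H_a: \delta > 0$ (or $\delta < 0$)
– The side has to be justified!
– p-value = proportion of values in the randomization distribution at or above (or at or below) the observed test statistic:
$\text{p-value} = P(\bar{Y}_c - \bar{Y}_v \ge \bar{y}_c - \bar{y}_v) = P(\bar{Y}_c - \bar{Y}_v \ge 0.5) \approx 0.029$
[Figure: histogram of the randomization distribution of $\bar{Y}_c - \bar{Y}_v$; x-axis from -0.5 to 0.5, y-axis Frequency]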
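A Monte Carlo sketch of one-sided and two-sided p-values for the malaria data, assuming the same control-minus-vaccine statistic. The two-sided version counts values at least as far from 0 as the observed one in either direction, which is the rule used by the course's R code for this example.

```r
set.seed(42)
x <- c(rep(1, 9), rep(0, 12))                       # 1 = vaccine, 0 = control
y <- c(rep(1, 3), rep(0, 6), rep(1, 10), rep(0, 2)) # 1 = infected
obs <- mean(y[x == 0]) - mean(y[x == 1])            # observed statistic, 0.5

# Randomization distribution: reshuffle the treatment labels many times
sim <- replicate(20000, {
  x_sim <- sample(x)
  mean(y[x_sim == 0]) - mean(y[x_sim == 1])
})

tol <- 1e-9  # guards against floating-point ties at the observed value
p_one_sided <- mean(sim >= obs - tol)               # Ha: delta > 0
p_two_sided <- mean(abs(sim) >= abs(obs) - tol)     # Ha: delta != 0
c(one_sided = p_one_sided, two_sided = p_two_sided)
```

The two-sided p-value can never be smaller than the one-sided one computed from the same simulations, since it counts extreme values in both tails.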
Example 1: Conclusion
• Conclusion:
– At the significance level α = 0.05, there is evidence that the vaccine is effective.
– More generally, if the effect is NOT homogeneous (not the same for every volunteer), then there is evidence that the vaccine was effective for at least one volunteer.
• Scope of inference
– Internal validity: are assumptions of the test satisfied?
• Independent units?
• Randomized treatment?
– External validity: possible to make inference to a broader
population?
Example 2: Natural Resistance to
Malaria
• Sickle-cell anemia is a hereditary blood disorder.

• Suppose researchers surveyed a group of people in Uganda infected by malaria, tested them for sickle cells, and measured parasite counts* in their blood.

• Question of interest: is the sickle-cell trait associated with parasite counts?

*Allison AC (1954). "Notes on sickle-cell polymorphism". Ann Hum Genet 19: 39–57.

Uncertainty in Observational
Studies
• We can apply the same “regrouping” idea. However, because the assignment mechanism is not random, the uncertainty is due to an “imaginary” chance mechanism.

• This is called a permutation test: a distribution-free, nonparametric test for association between group status and outcome in observational studies.
Permutation Test
• H0: Outcomes are not related to group status.
• Ha: Outcomes are related to (associated with) group status.
• Assumptions:
– Independence of study units.

• Test statistic: Difference in average observed outcomes between the two groups (or any other statistic), $\bar{Y}_{SC} - \bar{Y}_{Norm}$.
• The permutation distribution is the reference distribution of the test statistic in a permutation test.
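The mechanics can be sketched as a small R function. `perm_test()` is a hypothetical helper, and the data below are simulated Poisson counts for illustration only, NOT the Allison (1954) measurements. The `stat` argument lets any two-sample summary serve as the test statistic, for example a difference in medians.

```r
# Hypothetical helper (not from the text): two-sided permutation test for
# association between a two-level group variable and a numeric outcome.
perm_test <- function(y, group, stat = function(a, b) mean(a) - mean(b),
                      nperm = 10000) {
  levs <- unique(group)                 # the two group labels, in order of appearance
  stopifnot(length(levs) == 2)
  compute <- function(g) stat(y[g == levs[1]], y[g == levs[2]])
  observed <- compute(group)
  # Permutation distribution: reshuffle the group labels, keep outcomes fixed
  perm <- replicate(nperm, compute(sample(group)))
  # Two-sided p-value (small tolerance guards against floating-point ties)
  p_value <- mean(abs(perm) >= abs(observed) - 1e-9)
  list(observed = observed, p_value = p_value)
}

# Simulated example (NOT real data): counts in units of 100 per mm^3
set.seed(7)
counts <- c(rpois(20, lambda = 1), rpois(25, lambda = 2))
status <- c(rep("SC", 20), rep("Norm", 25))
res_mean <- perm_test(counts, status)                          # difference in means
res_median <- perm_test(counts, status,
                        stat = function(a, b) median(a) - median(b))
res_mean
res_median
```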

Permutation Test: Same mechanics
[Figure: histograms of parasite counts (x-axis: 100 per mm^3, from 0 to 5; y-axis: Count and Percent of Total) for the Norm and SC groups]
Permutation distribution:
[Figure: histogram of the permutation distribution (x-axis: Difference in S. cells counts, from -1.5 to 1.0; y-axis: Frequency), with the value 0.74 marked]
p-value = 0.02
Conclusion?
Randomization Test:
• H0: Each unit in the population would have the same outcome regardless of the treatment assigned; hence, the distributions of outcomes under the two treatments are the same.
• Ha: The treatment effect is non-zero, δ ≠ 0, in the population.
Permutation Test:
• H0: Outcomes in the population are not related to group status; hence, the outcome distributions in subpopulations with different group status are the same.
• Ha: There is an association between group status and outcomes in the population.

Randomization Test:
• H0: Each unit in the sample would have the same outcome regardless of the treatment assigned.
• Ha: The treatment effect is non-zero, δ ≠ 0, in the sample.
Permutation Test:
• H0: Outcomes in the sample are not related to group status.
• Ha: There is an association between group status and outcomes in the sample.

R’s for loop
• Often in statistical computing, you would like to repeat some process multiple times:
– Repeated sampling (as in Fisher’s randomization test).
– Operating on a vector or matrix [though one of apply, lapply, or tapply may be better].
– Markov chains.
– Simulation studies.
• Using a loop of some sort can help with this task. R has two major types:
– for loop: used to repeat a task a fixed number of times (called the number of iterations).
– while loop: used to repeat a task until a certain condition is satisfied (number of iterations undetermined).
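For contrast with the for loop, a minimal while loop. The halving example is illustrative and not from the text: the loop keeps going until a condition fails, so the number of iterations is not fixed in advance.

```r
# Keep halving a step size until it falls below a tolerance
step <- 1
tolerance <- 0.001
n_halvings <- 0
while (step >= tolerance) {
  step <- step / 2
  n_halvings <- n_halvings + 1
}
n_halvings   # 10, since 2^-10 (about 0.000977) is the first value below 0.001
```

If the condition can never become FALSE, a while loop runs forever, so it is worth checking that each pass moves toward the stopping condition.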
for loop syntax
1) Initialization step: Define your constant (here: n.iter) and usually define a “blank” variable (here: variable) which will store the results of the steps of the for loop.
2) for statement: The word for, and in parentheses, the range of values that the iterator (here: i) will take on (here: 1:n.iter). Here, the iterator increases by one each time through the loop (the most common way to do it).
3) Body of work (inside the “{}”): The work to be done each time through the loop. Usually there will be a step that indexes the storage vector (here: variable[i]).
4) Results: After the for loop is complete, what calculation and results will be produced on the storage variable?

n.iter = 100
variable = rep(NA, n.iter)
for(i in 1:n.iter){
  samp = runif(10)
  variable[i] = max(samp)
}
hist(variable)
mean(variable)
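The same repeated-sampling task can also be written without an explicit loop. A sketch using base R's `replicate()`, which evaluates an expression a fixed number of times and collects the results; the seed is an assumption added only to make the run reproducible.

```r
set.seed(1)
n.iter <- 100
# replicate() plays the role of the storage vector plus the for loop
variable <- replicate(n.iter, max(runif(10)))
mean(variable)   # the expected maximum of 10 Uniform(0, 1) draws is 10/11, about 0.909
```

The for loop makes each step explicit, which is helpful when learning; `replicate()` is more compact once the pattern is familiar.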
Fisher’s randomization test in R
####################################
# Performing Fisher's Randomization
# Test for the Malaria data
####################################
# Create the variables in the data set
x = c(rep(1,9), rep(0,12))
y = c(rep(1,3), rep(0,6), rep(1,10), rep(0,2))
n = length(x)

# Split into two groups and calculate the test statistic
y.v = y[x==1]
y.c = y[x==0]
ybar.obs = mean(y.c) - mean(y.v)

# Initialize the vector to store the simulated test statistics
nsims = 100000
ybar.sim = rep(NA, nsims)

# And the for loop to do all the work:
# reordering the x's, and splitting into groups
for(i in 1:nsims){
  x.sim = sample(x)
  ybar.sim[i] = mean(y[x.sim==0]) - mean(y[x.sim==1])
}
mean(ybar.sim)
var(ybar.sim)
hist(ybar.sim, col="grey")

# One-sided p-value
mean(ybar.sim >= ybar.obs)
# Two-sided p-value
mean(abs(ybar.sim) >= abs(ybar.obs))
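Because nsims reshuffles only approximate the randomization distribution, the simulated p-value carries Monte Carlo error. A sketch that rebuilds the malaria objects so it runs on its own (with a smaller nsims and an assumed seed), reports a binomial standard error for the simulated p-value, and compares it with the exact one-sided p-value from R's `fisher.test()`. For a 2 × 2 table, Fisher's exact test uses the same hypergeometric reference distribution as this randomization test with a binary outcome.

```r
set.seed(123)
x <- c(rep(1, 9), rep(0, 12))                       # 1 = vaccine, 0 = control
y <- c(rep(1, 3), rep(0, 6), rep(1, 10), rep(0, 2)) # 1 = infected
ybar.obs <- mean(y[x == 0]) - mean(y[x == 1])

nsims <- 20000
ybar.sim <- replicate(nsims, {
  x.sim <- sample(x)
  mean(y[x.sim == 0]) - mean(y[x.sim == 1])
})
p.hat <- mean(ybar.sim >= ybar.obs - 1e-9)   # tolerance for floating-point ties
se.hat <- sqrt(p.hat * (1 - p.hat) / nsims)  # binomial Monte Carlo standard error
c(p.hat = p.hat, se = se.hat)

# Exact one-sided p-value; rows = control, vaccine; columns = infected, healthy
tab <- matrix(c(10, 3, 2, 6), nrow = 2,
              dimnames = list(c("control", "vaccine"), c("infected", "healthy")))
fisher.test(tab, alternative = "greater")$p.value   # about 0.0294
```

With nsims = 100000 as in the code above, the standard error shrinks by a factor of about sqrt(5) relative to this run.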

The Last Word

Heck, my eighth grade science class managed to conclusively reject it just based on a classroom experiment. It’s pretty sad to hear about million-dollar research teams who can’t even manage that.

http://xkcd.com/892/