Objectives
- Discuss commonly used terms
  - P-value
  - Power
  - Type I and Type II errors
- Present a few commonly used statistical tests for comparing two groups
Outline
- Estimation and Hypotheses
- Continuous Outcome/Known Variance
  - Test statistic, tests, p-values, confidence intervals
- Unknown Variance
- Different Outcomes/Similar Test Statistics
- Additional Information
Statistical Inference
- Inferences about a population are made on the basis of results obtained from a sample drawn from that population.
- We want to talk about the larger population from which the subjects are drawn, not the particular subjects!
What Do We Test
Effect or Difference we are interested in
- Difference in Means or Proportions
- Odds Ratio (OR)
- Relative Risk (RR)
- Correlation Coefficient
Vocabulary (1)
Statistic
- Computed from the sample
Sampling Distribution
- All possible values the statistic can take
- Computed from samples of a given size randomly drawn from the same population
Parameter
- Computed from the population
Hypothesis Testing
- Null hypothesis
- Alternative hypothesis
- Is a value of the parameter consistent with the sample data?
Null Hypothesis
- Usually that there is no effect
  - Mean = 0
  - OR = 1
  - RR = 1
  - Correlation Coefficient = 0
- More generally, a fixed value, e.g. mean = 2
- If an equivalence trial, look at NEJM paper or other specific resources
Alternative Hypothesis
- Contradicts the null
- There is an effect
- What you want to prove
- If equivalence trial, special way to do this
Example Hypotheses
H0: μ1 = μ2
HA: μ1 ≠ μ2 (two-sided test)
HA: μ1 > μ2 (one-sided test)
One-sided test
- Specific interest in only one direction
- Not scientifically relevant/interesting if the reverse situation is true
Experiment
- Develop hypotheses
- Collect sample/Conduct experiment
- Calculate test statistic
- Compare test statistic with what is expected when H0 is true
  - Reference distribution
  - Assumptions about distribution of outcome variable
Example: Hypertension/Cholesterol
- Mean cholesterol in hypertensive men
- Mean cholesterol in the male general population (20-74 years old)
- In the 20-74 year old male population, the mean serum cholesterol is 211 mg/ml with a standard deviation of 46 mg/ml
Cholesterol Hypotheses
H0: μ1 = μ2
H0: μ = 211 mg/ml
- μ = population mean serum cholesterol for male hypertensives
- Mean cholesterol for hypertensive men = mean for general male population
HA: μ1 ≠ μ2
HA: μ ≠ 211 mg/ml
Test Statistic
Basic test statistic for a mean
$$\text{test statistic} = \frac{\text{point estimate of } \mu - \text{target value of } \mu}{\sigma_{\text{point estimate of } \mu}}$$

- σ = standard deviation
- For a 2-sided test: reject H0 when the test statistic is in the upper or lower 100·α/2% of the reference distribution
- What is α?
Vocabulary (2)
- Types of errors
  - Type I (α)
  - Type II (β)
- Related words
  - Significance Level: α level
  - Power: 1 − β

                      H0 Correct               HA Correct
Do not reject H0      1 − α (True Negative)    β (False Negative)
Reject H0             α (False Positive)       1 − β (True Positive)

α = significance level
1 − β = power
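As a rough illustration of how power (1 − β) relates to the significance level, the sketch below computes the power of a two-sided Z test with known standard deviation. The function name `z_test_power`, the sample size n = 25, and the alternative mean of 220 are assumptions chosen for illustration; 211 and 46 are the population mean and standard deviation used in the cholesterol example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu_true, mu0, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample Z test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized distance between the true mean and the null mean.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Under the alternative the Z statistic is N(shift, 1); add both rejection tails.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical alternative: true mean 220 with n = 25 (assumed values).
power_at_220 = z_test_power(mu_true=220, mu0=211, sigma=46, n=25)
# When the null is true, the rejection probability is just alpha.
power_under_null = z_test_power(mu_true=211, mu0=211, sigma=46, n=25)
print(f"power at 220: {power_at_220:.3f}, under H0: {power_under_null:.3f}")
```

With these assumed inputs the power is low, so a Type II error (false negative) would be the more likely outcome if the true mean really were 220.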
Type I Error
- α = P(reject H0 | H0 true)
- Probability of rejecting the null hypothesis given that the null is true
- False positive
- Probability of rejecting that hypertensives have μ = 211 mg/ml when in truth the mean cholesterol for hypertensives is 211
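The meaning of α as a false positive rate can be checked by simulation. The sketch below repeatedly draws samples from a population where H0 is true and counts how often a two-sided Z test rejects. The function name, the sample size of 25, the number of simulations, and the seed are all assumptions for illustration, and normally distributed data are assumed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=211.0, sigma=46.0, n=25, alpha=0.05,
                      n_sims=4000, seed=0):
    """Fraction of simulated samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Draw a sample from the null population (mean mu0, sd sigma).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


rate = type_i_error_rate()
print(f"simulated Type I error rate: {rate:.3f}")
```

The simulated rate should land close to the chosen α, up to simulation noise.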
Z Test Statistic
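- Want to test a continuous outcome
- Known variance
- Under H0:

$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Therefore, reject H0 if

$$\left| \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \right| > Z_{1-\alpha/2}$$

that is, if $\bar{X}$ falls outside the interval

$$\left( \mu_0 - \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}},\; \mu_0 + \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}} \right)$$

Note that $Z_{\alpha/2} = -Z_{1-\alpha/2}$.

Do Not Reject H0

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{220 - 211}{46/\sqrt{25}} = 0.98 < 1.96$$

- Is $\bar{X} = 220 > \mu_0 + 1.96\,\sigma/\sqrt{n} = 211 + 1.96 \cdot 46/\sqrt{25} = 229.03$? NO!
- Is $\bar{X} = 220 < \mu_0 - 1.96\,\sigma/\sqrt{n} = 211 - 1.96 \cdot 46/\sqrt{25} = 192.97$? NO!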
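The Z test for the cholesterol example can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `z_test` and its return layout are assumptions, while the inputs (sample mean 220, null mean 211, σ = 46, n = 25) come from the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample Z test with known standard deviation."""
    se = sigma / sqrt(n)                      # standard error of the mean
    z = (x_bar - mu0) / se                    # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Sample means inside this interval do not lead to rejection.
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    reject = abs(z) > z_crit
    return z, (lower, upper), reject


z, (lower, upper), reject = z_test(x_bar=220, mu0=211, sigma=46, n=25)
print(f"Z = {z:.2f}, do-not-reject region = ({lower:.2f}, {upper:.2f}), "
      f"reject H0: {reject}")
```

Comparing the statistic with the critical value and comparing the sample mean with the interval are two views of the same decision, so they always agree.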
P-value
- Smallest α at which the observed sample would reject H0
- Given H0 is true, probability of obtaining a result as extreme or more extreme than the actual sample
- MUST be based on a model
  - Normal, t, binomial, etc.
Cholesterol Example
P-value for two-sided test
- X̄ = 220 mg/ml, σ = 46 mg/ml
- n = 25
- H0: μ = 211 mg/ml
- HA: μ ≠ 211 mg/ml
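For the inputs listed above, the two-sided p-value under the normal model can be computed as follows. This is a sketch using the standard library's `NormalDist`; the function name `two_sided_p_value` is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(x_bar, mu0, sigma, n):
    """P(|Z| >= |z_observed|) when H0 is true and the variance is known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Double the upper-tail area to cover both directions.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_p_value(x_bar=220, mu0=211, sigma=46, n=25)
print(f"two-sided p-value = {p_value:.3f}")
```

The result is about 0.33, well above 0.05, which matches the decision not to reject H0.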
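Confidence interval:

$$\left( \bar{X} - \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}},\; \bar{X} + \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}} \right)$$

- Construct an interval around the point estimate
- Look to see if the population/null mean is inside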
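The interval approach can be sketched for the cholesterol example as well. The function name `z_confidence_interval` is an assumption; the inputs are the example's sample mean, σ, and n, and a 95% level is assumed.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for the mean with known sigma."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


ci_low, ci_high = z_confidence_interval(x_bar=220, sigma=46, n=25)
# The null mean of 211 lies inside the interval, so H0 is not rejected.
null_mean_inside = ci_low <= 211 <= ci_high
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}), contains 211: {null_mean_inside}")
```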
T-Test Statistic
- Want to test a continuous outcome
- Unknown variance
- Under H0:

$$\frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{(n-1)}$$

- Critical values: statistics books or computer
- t-distribution approximately normal for degrees of freedom (df) > 30
Cholesterol: t-statistic
Using data

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = 1.17$$

- For α = 0.05, two-sided test from the t(24) distribution, the critical value = 2.064
- |T| = 1.17 < 2.064
- The difference is not statistically significant at the α = 0.05 level
- Fail to reject H0
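The slides do not list the raw cholesterol data, so the sketch below shows the same t calculation on a small made-up sample. The function name `one_sample_t`, the data, and the null value of 4 are all hypothetical; the critical value 2.365 is the two-sided 5% value for t(7) from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    # stdev() is the sample standard deviation s (divides by n - 1).
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


hypothetical_sample = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data only
t_stat, df = one_sample_t(hypothetical_sample, mu0=4)
T_CRIT_DF7 = 2.365                               # t(7), two-sided alpha = 0.05
reject = abs(t_stat) > T_CRIT_DF7
print(f"T = {t_stat:.2f} on {df} df, reject H0: {reject}")
```

As in the cholesterol example, the decision comes from comparing |T| with the critical value for the matching degrees of freedom.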
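Difference in means between two groups:

$$Z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}} \quad \text{or} \quad T = \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2/n + s_2^2/m}}$$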
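Both two-sample statistics share one form: the difference in sample means divided by its standard error. A minimal sketch follows; the function name and all numbers are hypothetical. Passing known variances gives Z, and passing sample variances gives T.

```python
from math import sqrt


def two_sample_statistic(x_bar, y_bar, var_x, var_y, n, m):
    """(x_bar - y_bar) / sqrt(var_x / n + var_y / m).

    With known population variances this is the Z statistic;
    with sample variances it is the unequal-variance T statistic.
    """
    return (x_bar - y_bar) / sqrt(var_x / n + var_y / m)


# Hypothetical summary statistics for two independent groups.
stat = two_sample_statistic(x_bar=10.0, y_bar=8.0,
                            var_x=4.0, var_y=9.0, n=16, m=36)
print(f"two-sample statistic = {stat:.3f}")
```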
Binary Outcomes
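- Exact same idea
- For large samples
  - Use Z test statistic
  - Now set up in terms of proportions, not means

One sample:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}}$$

Two samples:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{m}}}$$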
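The proportion statistics translate directly into code. In this sketch the function names and the example proportions are hypothetical, and sample sizes are assumed large enough for the normal approximation.

```python
from math import sqrt


def one_sample_prop_z(p_hat, p0, n):
    """Z statistic for H0: p = p0, using the null value in the standard error."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_sample_prop_z(p1_hat, n, p2_hat, m):
    """Z statistic for H0: p1 = p2, using each group's observed proportion."""
    se = sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / m)
    return (p1_hat - p2_hat) / se


# Hypothetical data: 60 of 100 successes tested against p0 = 0.5,
# and 50 of 100 versus 40 of 100 in two independent groups.
z_one = one_sample_prop_z(p_hat=0.6, p0=0.5, n=100)
z_two = two_sample_prop_z(p1_hat=0.5, n=100, p2_hat=0.4, m=100)
print(f"one-sample Z = {z_one:.2f}, two-sample Z = {z_two:.2f}")
```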
Vocabulary (3)
- Null Hypothesis: H0
- Alternative Hypothesis: H1 or Ha or HA
- Significance Level: α level
- Acceptance/Rejection Region
- Statistically Significant
- Test Statistic
- Critical Value
- P-value
Linear regression
Model for simple linear regression
$$Y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$$

- β0 = intercept
- β1 = slope
Assumptions
- Observations are independent
- Normally distributed with constant variance
Hypothesis testing
H0: β1 = 0 vs. HA: β1 ≠ 0
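Testing the slope uses the same pattern as the earlier tests: an estimate divided by its standard error, compared with a t distribution on n − 2 degrees of freedom. The sketch below fits simple linear regression by least squares; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def slope_t_test(xs, ys):
    """Least-squares fit of y = b0 + b1 * x and the t statistic for H0: b1 = 0."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                        # estimated slope
    b0 = y_bar - b1 * x_bar               # estimated intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = sqrt(sse / (n - 2) / sxx)     # standard error of the slope
    return b0, b1, b1 / se_b1, n - 2


# Hypothetical data with a clear linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, t_stat, df = slope_t_test(xs, ys)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, T = {t_stat:.1f} on {df} df")
```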
α or β?
- Depends on the question
- Most will say protect against Type I error
- Need to think about individual and population health implications and costs
HIV Screening
- False positive
  - Needless worry
  - Stigma
- False negative
  - Thinks everything is ok
  - Continues to spread disease
- For cholesterol example?
Misconceptions
- Misconception: a smaller p-value means a larger effect
  - Effect size is determined by the difference in the sample mean or proportion between 2 groups, not by the p-value
Misconceptions
- A small p-value means the difference is statistically significant, not that the difference is clinically significant
  - A large sample size can help get a small p-value
- Failing to reject H0
  - There is not enough evidence to reject H0
  - Does NOT mean H0 is true
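The point about sample size can be made concrete. The sketch below holds the observed difference fixed and changes only n. The difference of 2 and both sample sizes are hypothetical values chosen for illustration; σ = 46 is taken from the cholesterol example, and a two-sided Z test is assumed.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sigma, n):
    """Two-sided Z-test p-value for an observed mean difference `diff`."""
    z = diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same small difference, very different sample sizes.
p_small_n = two_sided_p(diff=2, sigma=46, n=25)
p_large_n = two_sided_p(diff=2, sigma=46, n=10_000)
print(f"n = 25: p = {p_small_n:.3f}; n = 10000: p = {p_large_n:.2g}")
```

The effect is identical in both cases, yet only the large sample gives a small p-value, so the p-value by itself says nothing about how large or clinically important the difference is.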
Normal/Large Sample Data?
  No: Binomial?
    No: Nonparametric test
    Yes: Independent?
      No: McNemar's test
      Yes: Expected ≥ 5?
        Yes: 2 sample Z test for proportions or contingency table
        No: Fisher's Exact test
  Yes: Inference on means?
    No: Inference on variance?
      Yes: F test for variances
    Yes: Independent?
      No: Paired t
      Yes: Variance known?
        Yes: Z test
        No: Variances equal?
          Yes: T test w/ pooled variance
          No: T test w/ unequal variance
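One reading of the flowchart above can be written as a small decision function. This is only a sketch: the function name, parameter names, and defaults are hypothetical, and the branch that the chart leaves open (neither means nor variances) returns None.

```python
def choose_test(normal_or_large, binomial=False, independent=True,
                expected_at_least_5=True, inference_on_means=True,
                inference_on_variance=False, variance_known=False,
                variances_equal=True):
    """Walk the test-selection flowchart and return the suggested test."""
    if not normal_or_large:
        if not binomial:
            return "Nonparametric test"
        if not independent:
            return "McNemar's test"
        if expected_at_least_5:
            return "2 sample Z test for proportions or contingency table"
        return "Fisher's Exact test"
    if not inference_on_means:
        # The chart only covers the variance case on this branch.
        return "F test for variances" if inference_on_variance else None
    if not independent:
        return "Paired t"
    if variance_known:
        return "Z test"
    if variances_equal:
        return "T test w/ pooled variance"
    return "T test w/ unequal variance"


# Example: independent groups, normal data, unknown and unequal variances.
suggestion = choose_test(normal_or_large=True, variances_equal=False)
print(suggestion)
```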
Questions?