continued…
Recall: Single population mean (large n)
Hypothesis test:
$Z = \frac{\text{observed mean} - \text{null mean}}{s / \sqrt{n}}$
Confidence Interval
$\text{confidence interval} = \text{observed mean} \pm Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Single population mean (small n, normally distributed trait)
Hypothesis test:
$T_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s / \sqrt{n}}$
Confidence Interval
$\text{confidence interval} = \text{observed mean} \pm T_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}$
What is a T-distribution?
A t-distribution is like a Z distribution, except that it has slightly fatter tails to reflect the uncertainty added by estimating σ.
The bigger the sample size (i.e., the bigger the sample size used to estimate σ), the closer t becomes to Z.
If n>100, t approaches Z.
T-distribution with only 1 degree of freedom.
T-distribution with 4 degrees of freedom.
T-distribution with 9 degrees of freedom.
T-distribution with 29 degrees of freedom.
T-distribution with 99 degrees of freedom. Looks a lot like Z!!
Student’s t Distribution
Note: t → Z as n increases
[Figure: density curves for the standard normal (t with df = ∞), t (df = 13), and t (df = 5), plotted against t and centered at 0.]
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004
Student’s t Table
Let: n = 3, df = n - 1 = 2, α = .10, α/2 = .05

      Upper Tail Area
df    .25     .10     .05
1     1.000   3.078   6.314

Note: t → Z as n increases
from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004
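The T probability density function
What does t look like mathematically? (You may at least recognize some resemblance to the normal distribution function…)
$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}$
Where:
v is the degrees of freedom
Γ (gamma) is the Gamma function
π is the constant Pi (3.14...)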
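To make the roles of v, Γ, and π concrete, here is a minimal Python sketch that evaluates the density, assuming the standard textbook form of the Student's t density. The function names `t_pdf` and `z_pdf` are illustrative choices, not something from the slides. It prints the height at the center and at t = 3 for several degrees of freedom, so the fatter tails and the convergence toward Z are visible in numbers.

```python
import math

def t_pdf(t, v):
    """Student's t density with v degrees of freedom, evaluated at t."""
    coef = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coef * (1 + t * t / v) ** (-(v + 1) / 2)

def z_pdf(z):
    """Standard normal density, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Height at the center (t = 0) and in the tail (t = 3) for growing df.
for v in (1, 4, 9, 29, 99):
    print("df =", v, round(t_pdf(0.0, v), 4), round(t_pdf(3.0, v), 5))
print("Z     ", round(z_pdf(0.0), 4), round(z_pdf(3.0), 5))
```

With few degrees of freedom the tail value at t = 3 is well above the normal one; by df = 99 the two curves nearly coincide.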
The t-distribution in SAS
Yikes! The t-distribution looks like a mess! Don’t want to
integrate!
Luckily, there are charts and SAS! MUST SPECIFY
DEGREES OF FREEDOM!
The t-function in SAS is:
probt(t-statistic, df)
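For readers without SAS, the same kind of lower-tail probability can be approximated by integrating the density numerically. The sketch below is an illustration in Python, not the SAS implementation: it assumes the standard t density and uses Simpson's rule, and the names `t_pdf` and `t_cdf` are hypothetical helpers. As with the SAS function, the degrees of freedom must be supplied.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

# Upper-tail area beyond 6.314 with 1 degree of freedom.
print(round(1 - t_cdf(6.314, 1), 4))
```

The printed upper-tail area can be checked against a Student's t table: with 1 degree of freedom, 6.314 is the tabulated value for an upper-tail area of .05.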
The normality assumption…
T-tests (and all linear models, in fact) have a “normality assumption”:
If the outcome variable is not normally distributed and the sample size is small, a t-test is inappropriate: it takes longer for the CLT to kick in, and the sample means do not immediately follow a t-distribution…
This is the source of the “normality assumption” of the t-test…
Computer simulation of the distribution
of the sample mean (non-normal,
small n):
1. Pick any probability distribution and specify a mean and standard deviation.
2. Tell the computer to randomly generate 1000 samples of size n from that probability distribution.
E.g., the computer is more likely to spit out values with high probabilities
3. Calculate 1000 T-statistics: $T = \frac{\bar{X}_n - \mu}{s_x / \sqrt{n}}$
4. Plot the T-statistics in histograms.
5. Repeat for different sample sizes (n’s).
n=2, underlying distribution is exponential (mean=1, SD=1)
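The simulation steps can be sketched in a few lines of Python. This is an illustrative version under stated assumptions: it uses the exponential distribution with mean 1 named in the caption, text summaries instead of histograms, and a set of sample sizes chosen here for illustration (only n=2 is given in the slides). The function name `simulate_t_stats` is hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

def simulate_t_stats(n, reps=1000, mu=1.0):
    """Draw `reps` samples of size n from an exponential (mean 1) and return their T-statistics."""
    t_stats = []
    for _ in range(reps):
        sample = [random.expovariate(1.0) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample SD (divides by n - 1)
        t_stats.append((xbar - mu) / (s / math.sqrt(n)))
    return t_stats

# Summarize each simulated distribution: median, minimum, maximum.
for n in (2, 5, 10, 30, 100):
    ts = simulate_t_stats(n)
    print("n =", n, round(statistics.median(ts), 2), round(min(ts), 1), round(max(ts), 1))
```

For small n the minimum is typically far more extreme than the maximum, which is the skew a t-distribution would not produce; as n grows the two become more balanced.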
5. Probably not sufficient evidence to reject the null. We cannot sue the light bulb manufacturer for
false advertising! Notice that using t-distribution to calculate the p-value didn’t change much!
With n>30, might as well use Z table.
Practice problem
You want to estimate the average age of
kids that ride a particular kid’s ride at
Disneyland. You take a random sample of 8
kids exiting the ride, and find that their ages
are: 2,3,4,5,6,6,7,7. Assume that ages are
roughly normally distributed.
a. Calculate the sample mean.
b. Calculate the sample standard deviation.
c. Calculate the standard error of the mean.
d. Calculate the 99% confidence interval.
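Answer (a,b)
a. Calculate the sample mean.
$\bar{X}_8 = \frac{\sum_{i=1}^{8} X_i}{8} = \frac{2+3+4+5+6+6+7+7}{8} = \frac{40}{8} = 5.0$
b. Calculate the sample standard deviation.
$s_X^2 = \frac{\sum_{i=1}^{8} (X_i - 5)^2}{8-1} = \frac{3^2 + 2^2 + 1^2 + 0^2 + 2(1^2) + 2(2^2)}{7} = \frac{24}{7} = 3.4$
$s_X = \sqrt{3.4} = 1.9$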
Answer (c)
c. Calculate the standard error of the mean.
$s_{\bar{X}} = \frac{s_X}{\sqrt{n}} = \frac{1.9}{\sqrt{8}} = .67$
Answer (d)
d. Calculate the 99% confidence interval.
$t_{7,.005} = 3.5$
$\text{mean} \pm s_{\bar{X}} \cdot t_{df,\alpha/2}$
$5.0 \pm .67 \times 3.50 = (2.65, 7.35)$
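The four parts of this practice problem can be checked with a short Python sketch. It takes the critical value 3.50 from the answer rather than computing it, and it carries full precision through each step, so the interval comes out slightly narrower than the hand calculation, which rounds s to 1.9 and the standard error to .67.

```python
import math
import statistics

ages = [2, 3, 4, 5, 6, 6, 7, 7]
n = len(ages)

mean = statistics.mean(ages)   # part a
sd = statistics.stdev(ages)    # part b: sample SD, divides by n - 1
se = sd / math.sqrt(n)         # part c
t_crit = 3.50                  # t with 7 df, upper-tail area .005 (taken from the answer)
ci = (mean - t_crit * se, mean + t_crit * se)  # part d

print("mean =", mean)
print("sd =", round(sd, 2))
print("se =", round(se, 2))
print("99% CI =", (round(ci[0], 2), round(ci[1], 2)))
```

Without intermediate rounding the interval is roughly (2.71, 7.29) rather than (2.65, 7.35); the conclusion is the same.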
Example problem, class data:
A two-tailed hypothesis test:
A researcher claims that Stanford affiliates eat
fewer than the recommended intake of 5
fruits and vegetables per day.
We have data to address this claim: 24 people
in the class provided data on their daily fruit
and vegetable intake.
Do we have evidence to dispute her claim?
Histogram fruit and veggie
intake (n=24)…
Mean=3.7 servings
Median=3 servings
Mode=3 servings
Std Dev=1.7 servings
Answer
1. Define your hypotheses (null, alternative)
H0: P(average servings)=5.0
Ha: P(average servings)≠5.0 servings (two-sided)
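The remaining steps of this answer are not shown here, but the test statistic follows from the summary statistics already given (n=24, mean 3.7, standard deviation 1.7, null mean 5.0). The Python sketch below applies the one-sample t formula to those numbers; the comparison value 2.069 is the standard two-sided 5% table value for 23 degrees of freedom, brought in here as an assumption about the significance level.

```python
import math

n = 24
sample_mean = 3.7   # servings, from the class data summary
sample_sd = 1.7     # servings
null_mean = 5.0

se = sample_sd / math.sqrt(n)
t_stat = (sample_mean - null_mean) / se
df = n - 1

# Assumed significance level: two-sided 5%, for which the table value at 23 df is 2.069.
t_crit = 2.069

print("standard error =", round(se, 3))
print("t =", round(t_stat, 2), "with", df, "degrees of freedom")
print("exceeds critical value:", abs(t_stat) > t_crit)
```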
Example problem: paired ttest
Patient   Diastolic BP Before   Diastolic BP After   Change
1         100                   92                   -8
2         89                    84                   -5
3         83                    80                   -3
4         98                    93                   -5
5         108                   98                   -10
6         95                    90                   -5
Confidence interval for the mean change: (-8.571, -3.43)
Note: does not include 0.
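The paired analysis reduces to a one-sample t-test on the Change column. The Python sketch below recomputes it from the six patients in the table. The critical value 2.571 (t with 5 degrees of freedom, upper-tail area .025) is a standard table value, and treating the interval as a 95% interval is an assumption made here.

```python
import math
import statistics

before = [100, 89, 83, 98, 108, 95]
after = [92, 84, 80, 93, 98, 90]

# Within-patient change (after minus before), matching the Change column.
changes = [a - b for b, a in zip(before, after)]
n = len(changes)

mean_d = statistics.mean(changes)
sd_d = statistics.stdev(changes)
se_d = sd_d / math.sqrt(n)
t_stat = (mean_d - 0) / se_d

t_crit = 2.571  # assumed 95% level: t with 5 df, upper-tail area .025
ci = (mean_d - t_crit * se_d, mean_d + t_crit * se_d)

print("mean change =", mean_d, "se =", round(se_d, 3))
print("t =", round(t_stat, 2))
print("CI =", (round(ci[0], 2), round(ci[1], 2)))
```

Rounding the standard error to 1.0 gives -6 ± 2.571, which reproduces the interval reported in the slides; without rounding the interval is slightly wider. Either way it excludes 0.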
Summary: Single population mean (small n, normality)
Hypothesis test:
$t_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s_x / \sqrt{n}}$
Confidence Interval
$\text{confidence interval} = \text{observed mean} \pm t_{n-1,\alpha/2} \cdot \frac{s_x}{\sqrt{n}}$
Summary: paired ttest
Hypothesis test:
$t_{n-1} = \frac{\text{observed mean } d - 0}{s_d / \sqrt{n}}$
Where d = change over time or difference within a pair.
Confidence Interval
$\text{confidence interval} = \text{observed mean } d \pm t_{n-1,\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$
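Summary: Single population mean (large n)
Hypothesis test:
$Z \approx t_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s_x / \sqrt{n}}$
Confidence Interval
$\text{confidence interval} = \text{observed mean} \pm [\, t_{n-1,\alpha/2} \approx Z_{\alpha/2} \,] \cdot \frac{s_x}{\sqrt{n}}$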
Examples of Sample Statistics:
Single population mean (known σ)
Single population mean (unknown σ)
Single population proportion
Difference in means (ttest)
Difference in proportions (Z-test)
Odds ratio/risk ratio
Correlation coefficient
Regression coefficient
…
Recall: normal approximation
to the binomial…
Statistics for proportions are based on a
normal distribution, because the
binomial can be approximated as
normal if np>5
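Recall: stats for proportions
For binomial:
$\mu_x = np$
$\sigma_x^2 = np(1-p)$
$\sigma_x = \sqrt{np(1-p)}$
For proportion (P-hat stands for “sample proportion.”):
$\mu_{\hat{p}} = p$ (differs from the binomial mean by a factor of n)
$\sigma_{\hat{p}}^2 = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ (differs from the binomial standard deviation by a factor of n)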
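Sampling distribution of a sample proportion
$\mu_{\hat{p}} = p$, where p = true population proportion.
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
BUT… if you knew p you wouldn’t be doing the experiment!
$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$\hat{p} \sim \text{Normal}\left(p, \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
Always a normal distribution!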
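A quick simulation can confirm that sample proportions center on p with a spread close to the square root of p(1-p)/n. The Python sketch below is illustrative only: the values p = 0.15, n = 200, and 2000 repetitions are assumptions chosen for the demonstration, not figures from this slide.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

p = 0.15     # assumed true proportion (illustrative)
n = 200      # assumed sample size (illustrative)
reps = 2000  # number of simulated samples

# Each p-hat is the fraction of "successes" in one simulated sample of size n.
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

theoretical_se = math.sqrt(p * (1 - p) / n)

print("mean of p-hats =", round(statistics.mean(p_hats), 4))
print("SD of p-hats =", round(statistics.stdev(p_hats), 4))
print("sqrt(p(1-p)/n) =", round(theoretical_se, 4))
```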
Practice Problem
A fellow researcher claims that at least 15% of smokers
fail to eat any fruits and vegetables at least 3 days a week.
You find this hard to believe and decide to check the
validity of this statistic by taking a random (representative)
sample of smokers. Do you have sufficient evidence to
reject your colleague’s claim if you discover that 17 of the
200 smokers in your sample eat no fruits and vegetables at
least 3 days a week?
Answer
1. What is your null hypothesis?
Null hypothesis: p >= .15, where p = proportion of smokers who skip fruits and veggies frequently
Alternative hypothesis: p < .15
4. Z = (.085-.15)/.025 = -2.6
p-value = P(Z<-2.6) = .0047
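The Z statistic and one-sided p-value can be reproduced in Python with the standard library's `statistics.NormalDist`. This sketch keeps full precision in the standard error, so it gives Z of about -2.57 rather than the -2.6 obtained after rounding the standard error to .025; the p-value is correspondingly close to, but not exactly, .0047.

```python
import math
from statistics import NormalDist

x = 17     # smokers in the sample who skip fruits and vegetables
n = 200    # sample size
p0 = 0.15  # null hypothesis proportion

p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
z = (p_hat - p0) / se
p_value = NormalDist().cdf(z)      # lower tail, since the alternative is p < .15

print("p-hat =", p_hat)
print("se =", round(se, 4))
print("Z =", round(z, 2))
print("p-value =", round(p_value, 4))
```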
$s_{\hat{p}} = \sqrt{\frac{.5 \times .5}{625}} = \frac{.5}{25} = \frac{5}{250} = \frac{1}{50} = 2\%$
4% is 2 standard errors.
Since we're on a normal distribution, 2 standard errors on either
side of the mean should represent 95% confidence...
Paired data proportions test…
Analogous to paired ttest…
Also takes on a slightly different form
known as McNemar’s test (we’ll see lots
more on this next term…)
Paired data proportions test…
1000 subjects were treated with
antidepressants for 6 months and with
placebo for 6 months (order of tx was
randomly assigned)
Question: do suicide attempts (yes/no)
differ depending on whether a subject is
on antidepressants or on placebo?
Paired data proportions test…
Data:
15 subjects attempted suicide in both conditions (non-
informative)
10 subjects attempted suicide in the antidepressant
condition but not the placebo condition
5 subjects attempted suicide in the placebo condition
but not the antidepressant condition
970 did not attempt suicide in either condition (non-
informative)
Observed p = .666
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.666 - .5}{\sqrt{\frac{(.5)(.5)}{15}}} = 1.29;\ p > .05$
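The calculation uses only the 15 discordant subjects: under the null hypothesis, each is equally likely to have attempted suicide in either condition, so p0 = .5. A minimal Python sketch of that Z approximation follows; the two-sided p-value is an assumption about how the test is being read, since the slides report only that p exceeds .05.

```python
import math
from statistics import NormalDist

only_antidepressant = 10  # attempted on antidepressant, not on placebo
only_placebo = 5          # attempted on placebo, not on antidepressant

n = only_antidepressant + only_placebo  # discordant pairs only
p_hat = only_antidepressant / n
p0 = 0.5  # null: a discordant subject is equally likely to fall either way

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # assumed two-sided reading

print("n =", n, "p-hat =", round(p_hat, 3))
print("Z =", round(z, 2))
print("two-sided p =", round(p_two_sided, 3))
```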
Test for Ho: p = po: $Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
** If np (expected value) < 5, use exact binomial rather than Z approximation…
Corresponding confidence intervals…
For a mean: $\bar{x} \pm t_{n-1,\alpha/2} \cdot \frac{s_x}{\sqrt{n}}$
$T_{n-1}$ approaches Z for large n.
For a proportion: $\hat{p} \pm Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
** If np (expected value) < 5, use exact binomial rather than Z approximation…
Symbol overload!
n: Sample size
Z: Z-statistic (standard normal)
t_df: T-statistic (t-distribution with df degrees of freedom)
p̂ (“p-hat”): sample proportion
X̄ (“X-bar”): sample mean
s: Sample standard deviation
p0: Null hypothesis proportion
μ0: Null hypothesis mean