Distributions
There are many theoretical distributions, both continuous and discrete. We use four of these a lot: z (unit normal), t, chi-square, and F. The z and t distributions are closely related to the sampling distribution of means; chi-square and F are closely related to the sampling distribution of variances.
$$z = \frac{X - \bar{X}}{SD}; \quad z = \frac{X - \mu}{\sigma}; \quad z = \frac{y - \mu}{\sigma}$$
z score
$$z^2 = \frac{(y - \mu)^2}{\sigma^2}$$
$$z^2 = \chi^2_{(1)}$$
What would its sampling distribution look like? The minimum value is zero and the maximum is infinite. Most values fall between zero and 1, and most pile up near zero.
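This sampling distribution is easy to check by simulation. A minimal sketch using only the Python standard library (the variable names are my own):

```python
import random

random.seed(1)

# Draw z ~ N(0, 1) many times and square it: the simulated
# sampling distribution of z^2, i.e. chi-square with 1 df.
draws = [random.gauss(0, 1) ** 2 for _ in range(100_000)]

mean = sum(draws) / len(draws)
below_one = sum(d < 1 for d in draws) / len(draws)

print(min(draws) >= 0)       # True: a square is never negative
print(round(mean, 1))        # 1.0 -- the mean is close to 1
print(round(below_one, 2))   # 0.68 -- most values fall below 1
```

About 68 percent of unit-normal draws fall within one standard deviation of zero, which is why most of the squared values land below 1.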
Chi-square (2)
What if we took 2 values of z2 at random and added them?
$$z_1^2 = \frac{(y_1 - \mu)^2}{\sigma^2}; \quad z_2^2 = \frac{(y_2 - \mu)^2}{\sigma^2}$$
$$z_1^2 + z_2^2 = \frac{(y_1 - \mu)^2}{\sigma^2} + \frac{(y_2 - \mu)^2}{\sigma^2} = \chi^2_{(2)}$$
Same minimum and maximum as before, but now average should be a bit bigger. Chi-square is the distribution of a sum of squares. Each squared deviation is taken from the unit normal: N(0,1). The shape of the chi-square distribution depends on the number of squared deviates that are added together.
Chi-square (3)
The distribution of chi-square depends on 1 parameter, its degrees of freedom (df or v). As df gets large, curve is less skewed, more normal.
Chi-square (4)
The expected value of chi-square is df.
The mean of the chi-square distribution is its degrees of freedom.
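The claim that the mean of chi-square equals its df can be verified by building chi-square(v) directly as a sum of v squared unit normals. A sketch using only the standard library (the choices of v are arbitrary):

```python
import random

random.seed(0)

# Build chi-square(v) as a sum of v squared unit normals and check
# that the simulated mean tracks the degrees of freedom.
means = {}
for v in (1, 5, 30):
    draws = [sum(random.gauss(0, 1) ** 2 for _ in range(v))
             for _ in range(20_000)]
    means[v] = sum(draws) / len(draws)
    print(v, round(means[v], 1))  # simulated mean is close to v
```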
There are tables of chi-square, so you can find, for example, the upper 5 or 1 percent of the distribution. Chi-square is additive:
$$\chi^2_{(v_1 + v_2)} = \chi^2_{(v_1)} + \chi^2_{(v_2)}$$
$$s^2 = \frac{\sum (y - \bar{y})^2}{N - 1}$$
Sample estimate of the population variance (unbiased). Multiply the variance estimate by N-1 to get the sum of squares; divide by the population variance to normalize. The result is a random variable distributed as chi-square with (N-1) df:
$$\frac{(N-1)s^2}{\sigma^2} = \frac{\sum (y - \bar{y})^2}{\sigma^2} = \chi^2_{(N-1)}$$
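This fact can also be checked by simulation. The sketch below (standard library only; the particular N and σ² are made-up illustrative values) confirms that the scaled variance estimate has mean close to N-1:

```python
import random

random.seed(2)
N, sigma2 = 10, 4.0  # illustrative sample size and population variance

def scaled_var():
    # Sample N values from N(0, sigma2), compute the sum of squared
    # deviations from the sample mean, and normalize by sigma2.
    ys = [random.gauss(0, sigma2 ** 0.5) for _ in range(N)]
    ybar = sum(ys) / N
    ss = sum((y - ybar) ** 2 for y in ys)  # (N - 1) * s^2
    return ss / sigma2                     # ~ chi-square(N - 1)

draws = [scaled_var() for _ in range(50_000)]
print(round(sum(draws) / len(draws), 1))   # close to N - 1 = 9
```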
We can use info about the sampling distribution of the variance estimate to find confidence intervals and conduct statistical tests.
Test the null hypothesis that the population variance has some specific value. Pick alpha and the rejection region. Then plug the hypothesized population variance and the sample variance into the equation, along with the sample size used to estimate the variance, and compare the result to the chi-square distribution.
$$\chi^2_{(N-1)} = \frac{(N-1)s^2}{\sigma_0^2}$$
Note: a 1-tailed test on the small side. Set alpha = .01. Suppose $H_0: \sigma^2 = 6.25$, and we observe $N = 30$, $s^2 = 4.55$. The mean of $\chi^2_{(29)}$ is 29, so our statistic is on the small side:
$$\chi^2_{(29)} = \frac{(29)(4.55)}{6.25} = 21.11$$
But for Q = .99, the tabled value of chi-square is 14.257. Cannot reject the null.
Now make it 2-tailed: $H_0: \sigma^2 = 6.25$; $H_1: \sigma^2 \neq 6.25$; $N = 30$; $s^2 = 4.55$. Chi-square with v = 29 is 13.121 at Q = .995 and 52.336 at Q = .005. The observed 21.11 falls between the two critical values, so the result is not significant either way.
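The arithmetic for both versions of the test can be written out directly; the critical values below are hardcoded from the chi-square table values quoted above:

```python
# Test H0: sigma^2 = 6.25 with N = 30, s^2 = 4.55.
N, s2, sigma0_sq = 30, 4.55, 6.25

chi2_obs = (N - 1) * s2 / sigma0_sq
print(round(chi2_obs, 2))  # 21.11

# One-tailed on the small side, alpha = .01:
# critical value chi-square(29; Q = .99) = 14.257 (from the table).
print(chi2_obs < 14.257)   # False -> cannot reject

# Two-tailed, alpha = .01: critical values 13.121 and 52.336.
print(chi2_obs < 13.121 or chi2_obs > 52.336)  # False -> N.S.
```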
$$P\left[\frac{(N-1)s^2}{\chi^2_{(N-1;\,.025)}} \leq \sigma^2 \leq \frac{(N-1)s^2}{\chi^2_{(N-1;\,.975)}}\right] = .95$$
Suppose N = 15 and $s^2$ is 10. Then df = 14; for Q = .025 the tabled value is 26.12, and for Q = .975 the value is 5.63. The 95 percent confidence interval therefore runs from (14)(10)/26.12 = 5.36 to (14)(10)/5.63 = 24.87.
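Plugging in the tabled values, the interval itself is just two divisions (a sketch; the chi-square quantiles are hardcoded from the table):

```python
# 95% confidence interval for sigma^2 with N = 15, s^2 = 10.
N, s2 = 15, 10.0
df = N - 1
chi2_upper = 26.12  # chi-square(14; Q = .025), from the table
chi2_lower = 5.63   # chi-square(14; Q = .975), from the table

lower = df * s2 / chi2_upper
upper = df * s2 / chi2_lower
print(round(lower, 2), round(upper, 2))  # 5.36 24.87
```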
Normality Assumption
We assume normal distributions in order to derive sampling distributions, and thus p levels. Violations of normality have minor implications for tests of means, especially as N gets large, but they are more serious for tests of variances. Look at your data before conducting this test; you can also test for normality formally.
F Distribution (1)
The F distribution is the ratio of two chi-squares, each divided by its degrees of freedom:
$$F = \frac{\chi^2_{(v_1)} / v_1}{\chi^2_{(v_2)} / v_2}$$
In our applications, $v_2$ will be larger than $v_1$, and $v_2$ will be larger than 2. In such a case, the mean (expected value) of the F distribution is $v_2/(v_2 - 2)$.
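This expected value is easy to verify by simulating the ratio of two chi-squares, each over its df (standard library only; the particular $v_1$ and $v_2$ are arbitrary choices):

```python
import random

random.seed(3)

def chi2(v):
    # Chi-square(v): sum of v squared unit normals.
    return sum(random.gauss(0, 1) ** 2 for _ in range(v))

v1, v2 = 3, 10
draws = [(chi2(v1) / v1) / (chi2(v2) / v2) for _ in range(100_000)]
f_mean = sum(draws) / len(draws)
print(round(f_mean, 2))  # close to v2 / (v2 - 2) = 1.25
```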
F Distribution (2)
F depends on two parameters, $v_1$ and $v_2$ (df1 and df2), and its shape changes with them. Its range is 0 to infinity, and it is shaped a bit like chi-square. F tables show critical values for df in the numerator and df in the denominator. F tables are 1-tailed; you can figure out a 2-tailed test if you need to (but you usually don't).
A Look Ahead
The F distribution is used in many statistical tests: tests for equality of variances, tests for differences in means in ANOVA, and tests for regression models (slopes relating one continuous variable to another, like SAT and GPA).
$$t^2_{(v)} = F_{(1,\,v)}$$
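The relationship $t^2_{(v)} = F_{(1,v)}$ can likewise be checked by simulation, constructing t as a unit normal over the square root of an independent chi-square(v)/v (a sketch; v = 20 is an arbitrary choice):

```python
import random

random.seed(4)
v = 20
n = 50_000

def chi2(df):
    # Chi-square(df): sum of df squared unit normals.
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

# t with v df: z / sqrt(chi-square(v) / v); squaring it should
# match F(1, v) = (chi-square(1) / 1) / (chi-square(v) / v).
t2_draws = [(random.gauss(0, 1) / (chi2(v) / v) ** 0.5) ** 2 for _ in range(n)]
f_draws = [(chi2(1) / 1) / (chi2(v) / v) for _ in range(n)]

t2_mean = sum(t2_draws) / n
f_mean = sum(f_draws) / n
print(round(t2_mean, 1), round(f_mean, 1))  # both near v / (v - 2) = 1.1
```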