
Sampling Distribution of the Sample Mean
Z = (Xbar - μ)/σ_Xbar = (Xbar - μ)/(σ/sqrt(n))
Mean of the sampling dist. of the mean: μ_Xbar = μ
Variance of the sampling dist. of the mean: σ²_Xbar = σ²_x/n
SD: σ_Xbar = σ/sqrt(n), or SE_Xbar = σ/sqrt(n)
Central Limit Thm: Draw an SRS of size n from any population with mean μ and SD σ. When n is large (at least n > 30), the sampling dist. of the sample mean is approximately normal: N(μ, σ/sqrt(n)), i.e. approx. normal for large samples in any case.

Confidence Interval (Ch. 6)
C: 90% gives z = 1.645, C: 95% gives z = 1.960, C: 99% gives z = 2.576
P-values: (1) Ha: μ > μ0 is P(Z > z), (2) Ha: μ < μ0 is P(Z < z), (3) Ha: μ ≠ μ0 is 2P(Z > |z|)
Two-sided: Reject H0 and accept Ha if |Zobs| > |Zcrit|, Pobs < Pcrit (Note 2*Pcrit = α)
One-sided: Reject H0 and accept Ha if |Zobs| > |Zcrit|, Pobs < Pcrit (Note Pcrit = α)
σ is given & 1 sample: CI: μ = Xbar +/- z_α/2 * σ/sqrt(n), Z = (Xbar - μ0)/(σ/sqrt(n))
Margin of error: m = z_α/2 * σ/sqrt(n), n = (z*σ/m)²
EX) H0: μ = 475, Ha: μ > 475 (one-sided test)
Given: μ0 = 475, σ = 100, n = 100, Xbar = 478, SE_Xbar = σ/sqrt(n) = 100/sqrt(100) = 10
Z = (Xbar - μ0)/SE_Xbar = 0.3, P = P(Z > 0.3) = 1 - .6179 = .382 (one-sided)
Assuming 5% significance level (α = 0.05): |Zobs| (0.3) < |Zcrit| (1.65), where 1.65 is the Zcrit for a 1-tailed test, and Pobs (.382) > Pcrit (.05): cannot reject H0 at the 5% significance level.
σ1 & σ2 known & same: Mean = μ1 - μ2, Variance = σ1²/n1 + σ2²/n2, SD = sqrt(σ1²/n1 + σ2²/n2)
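The one-sample z example above (Xbar = 478, μ0 = 475, σ = 100, n = 100) can be checked with a few lines of Python. This is only a sketch: the helper names `normal_cdf` and `one_sample_z` are made up for illustration, and the normal CDF is computed from the standard library's error function instead of a printed table.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_z(xbar: float, mu0: float, sigma: float, n: int):
    """Return (SE, z, upper-tail p-value) for H0: mu = mu0 vs Ha: mu > mu0."""
    se = sigma / math.sqrt(n)          # standard error of the sample mean
    z = (xbar - mu0) / se              # observed z statistic
    p_upper = 1.0 - normal_cdf(z)      # one-sided p-value P(Z > z)
    return se, z, p_upper

se, z, p = one_sample_z(xbar=478, mu0=475, sigma=100, n=100)
print(f"SE = {se:.1f}, z = {z:.2f}, one-sided p = {p:.3f}")

# Decision at alpha = 0.05 (one-sided): reject only if p < alpha.
alpha = 0.05
print("reject H0" if p < alpha else "cannot reject H0")
```

With these inputs the p-value is far above 0.05, which matches the "cannot reject H0" conclusion in the notes.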

Inference for Two Proportions
Mean of difference in prop: μ_D = μ_(p1-p2) = π1 - π2
Variance of difference in prop: σ²_D = [π1(1 - π1)]/n1 + [π2(1 - π2)]/n2
SD of difference in prop: σ_D = sqrt{[π1(1 - π1)]/n1 + [π2(1 - π2)]/n2}
CI: π1 - π2 = (p1 - p2) +/- z_α/2 * SE_D, where p_i = X_i/n_i and SE_D = sqrt{[p1(1 - p1)]/n1 + [p2(1 - p2)]/n2}
Hypothesis testing: p1 = X1/n1 and p2 = X2/n2, p0 = (X1 + X2)/(n1 + n2)
Z = [(p1 - p2) - (π1 - π2)] / sqrt{p0(1 - p0)*[(1/n1) + (1/n2)]}
One-sided p-value = P(Z > |Zobs|) = 1 - P(Z < |Zobs|), 2-sided p-value = 2*P(Z > |Zobs|) = 2*[1 - P(Z < |Zobs|)]
Use normal approx. for values of n and π that satisfy nπ > 10 and n(1 - π) > 10
EX) CI: The 95% confidence interval for the true difference in proportion b/w XX and XX is between XXX and XXX.
EX) H0: π1 = π2, Ha: π1 > π2
1. Find p0 = (x1 + x2)/(n1 + n2) = (63 + 43)/(78 + 77) = 106/155 = 0.684
2. Zobs = [(p1 - p2) - (π1 - π2)] / sqrt{p0(1 - p0)*(1/n1 + 1/n2)} = 3.35
3. One-sided p-value = P(Z > |3.35|) = 1 - P(Z < |3.35|) = 1 - 0.9996 = 0.0004
4. α = 0.05, |Zobs| (3.35) > |Zcrit| (1.64), Pobs (0.0004) < Pcrit (0.05)
5. So, we can reject H0 at the 5% significance level and accept Ha that the true proportion of Group 1 is higher than that of Group 2, which shows the aspirin is likely to increase the chance of a favorable outcome.

Two-way Tables (Cross-tab)
EX) In 2000, a non-white person was more likely to be poor than a white person.
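The two-proportion test worked above (63 of 78 vs. 43 of 77) can be sketched in Python as follows. The function name `two_prop_z` is hypothetical, and the p-value comes from the error function, not from a z table, so the last digits can differ slightly from the hand calculation.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_prop_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled z test for H0: pi1 = pi2 vs Ha: pi1 > pi2.

    Returns (p0, z, one-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2)) # SE of p1 - p2 under H0
    z = (p1 - p2) / se
    return p0, z, 1.0 - normal_cdf(z)

p0, z, p = two_prop_z(63, 78, 43, 77)
print(f"p0 = {p0:.3f}, z = {z:.2f}, one-sided p = {p:.4f}")
```

Computed without intermediate rounding, z comes out a little below the 3.35 quoted in the notes; the conclusion (reject H0 at α = 0.05) is unchanged.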

Z = [(Xbar1 - Xbar2) - (μ1 - μ2)]/SD, CI: μ1 - μ2 = (Xbar1 - Xbar2) +/- z_α/2 * SD

σ NOT given & 1 sample: CI: μ = Xbar +/- t_α/2 * s/sqrt(n), t = (Xbar - μ0)/(s/sqrt(n)), df = n - 1

σ1 & σ2 unknown & differ (non-pooled): Mean = μ1 - μ2, Vari = s1²/n1 + s2²/n2, SE = sqrt(s1²/n1 + s2²/n2)
H0: μ1 - μ2 = 0, Ha: μ1 - μ2 ≠ 0, DOF (k): smaller of (n1 - 1) and (n2 - 1)
t = [(Xbar1 - Xbar2) - (μ1 - μ2)]/SE, CI: μ1 - μ2 = (Xbar1 - Xbar2) +/- t_α/2 * SE

σ1 & σ2 unknown & same (pooled): M = μ1 - μ2, V = sp² = [(n1 - 1)s1² + (n2 - 1)s2²]/(n1 + n2 - 2), SE = sp*sqrt[(1/n1) + (1/n2)]
DOF: n1 + n2 - 2, CI: μ1 - μ2 = (Xbar1 - Xbar2) +/- t_α/2 * sp*sqrt[(1/n1) + (1/n2)]
tobs = [(Xbar1 - Xbar2) - (μ1 - μ2)] / {sp*sqrt[(1/n1) + (1/n2)]}
EX) Random assignment should result in two similar groups in terms of pre-treatment variables. They come from populations with equal variance for pre-treatment variables. Randomization=use pooled t distribution
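To make the difference between the non-pooled and pooled standard errors concrete, here is a small Python sketch. It borrows the summary statistics of the notes' non-pooled example (means 52 and 49, SDs 13 and 11, n1 = 75, n2 = 53); the function names are illustrative only, and the pooled numbers are shown purely for comparison, since the notes work that example with the non-pooled formula.

```python
import math

def unpooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE of Xbar1 - Xbar2 when the two population SDs are not assumed equal."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE of Xbar1 - Xbar2 using the pooled variance sp^2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

xbar1, s1, n1 = 52, 13, 75
xbar2, s2, n2 = 49, 11, 53

se_np = unpooled_se(s1, n1, s2, n2)
t_np = (xbar1 - xbar2 - 0) / se_np      # H0: mu1 - mu2 = 0
df_np = min(n1 - 1, n2 - 1)             # conservative degrees of freedom

se_p = pooled_se(s1, n1, s2, n2)
t_p = (xbar1 - xbar2 - 0) / se_p
df_p = n1 + n2 - 2

print(f"non-pooled: SE = {se_np:.3f}, t = {t_np:.3f}, df = {df_np}")
print(f"pooled:     SE = {se_p:.3f}, t = {t_p:.3f}, df = {df_p}")
```

Either way |t| stays below the two-sided critical value near 2.0, so the choice of formula does not change the decision for these numbers.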

EX) H0: (μ1 - μ2) = 0, Ha: (μ1 - μ2) ≠ 0 (non-pooled t-distribution)
1. Find (Xbar1 - Xbar2) = (52 - 49) = 3
2. SE = sqrt(s1²/n1 + s2²/n2) = sqrt(13^2/75 + 11^2/53) = 2.130
3. Calculate the two-sample t statistic: t = [(Xbar1 - Xbar2) - (μ1 - μ2)]/SE = (3 - 0)/2.130 = 1.409
4. Compare tobs to tcrit: d.f. is the smaller of (n1 - 1) and (n2 - 1), that is, k = 53 - 1 = 52. With t_α/2 for a two-sided test, approx. tcrit = 2.009 (table row df = 50). Since |tobs| < tcrit, we cannot reject the null hypothesis.
5. For a two-sided test: the one-sided pobs lies between 0.05 and 0.10, so 0.10 < 2*pobs < 0.20, which is greater than pcrit = 0.05. So we cannot reject the null hypothesis at the 5% significance level.
EX) CI: We are 95% confident that the true mean number of XX (the true difference in mean XXX this year and last year) falls between XX and XX: -XX < (μ1 - μ2) < XX. Note that 0 is included in the CI, which is consistent with the results from the hypothesis test.

Two-way Table Hypothesis Testing
EX) H0: there is NO association between the row and the column variable
Expected count = row total * column total / n
1) χ²obs = Σ (observed count - expected count)² / expected count
2) DOF: (r - 1)(c - 1). We look at Table F and locate χ²obs = 11.57 in the table for df = 1 to find the corresponding Pobs. The proportion in the tail to the right of our χ²obs is located between .0005 and .001. Thus, .0005 < pobs < .001
3) χ²obs > χ²crit, Pobs < Pcrit (0.05): we can reject H0. χ² is only one-sided.
Note: large χ² = less likely that the null is true; χ² = Z² = (2-sample Z-test)²

Inference for Simple Regression
Regression model: Population: Yi = β0 + β1Xi + εi, Sample: Yi = b0 + b1Xi + ei
Parameters for slope β1 in population:
Mean: μ_b1 = β1
V: σ²_b1 = σ²/Σ(Xi - Xbar)², where σ² = Σ(Yi - Yhat_i)²/n
SD: σ_b1 = σ/sqrt(Σ(Xi - Xbar)²), where σ = sqrt(Σ(Yi - Yhat_i)²/n), or σ_b1 = (σ/sqrt(n))*(1/sx)
β1 coefficient with sample parameters: b1, with SE_b1 = s/sqrt(Σ(Xi - Xbar)²)

Inference for a Single Proportion
Sample proportion: Use these rules when the population is at least 20 times as large as the sample. We can then assume that the sampling distribution is approximately normal.
Mean of a sample prop success p: μ_p = π
Variance of a sample prop success p: σ²_p = [π(1 - π)]/n
SD of a sample prop success p: σ_p = SE = sqrt{[π(1 - π)]/n}
CI: π = p +/- z_α/2 * SE_p, where p = X/n, SE_p = sqrt{[p(1 - p)]/n}
Margin of Error (m): n = (z_α/2 / m)² * p(1 - p)
H0: π = 0.7, Ha: π ≠ 0.7, Z = (p - π0)/sqrt(π0(1 - π0)/n), one-sided p-value = P(Z > |Zobs|) = 1 - P(Z < |Zobs|), 2-sided p-value = 2*P(Z > |Zobs|) = 2*[1 - P(Z < |Zobs|)]
EX) H0: π = 0.7, Ha: π ≠ 0.7. π0 = 0.7, p = 103/150 = 0.687, SE = sqrt((0.7)(0.3)/150) = 0.0374, Zobs = (0.687 - 0.7)/0.0374 = -0.356
EX) CI: SE_p = sqrt((0.687)(0.313)/150) = 0.0379, π = 0.687 +/- z_α/2 * 0.0379
EX) CI: The 95% confidence interval for the true proportion of people who lied about having a degree is between XXX and XXX.
EX) H0: π = 20%, Ha: π > 20%, π0 = 0.2, n = 40 and X = 11, p = X/n = 11/40 = 0.275
Zobs = (p - π0)/sqrt{π0(1 - π0)/n} = 1.1867. One-sided p-value = P(Z > |1.1867|) = 1 - P(Z < |1.1867|) = 1 - 0.8810 = 0.119
|Zobs| (1.1867) < |Zcrit|, and Pobs (0.119) > Pcrit (0.05). So we cannot reject the null. The sample data did not give us good evidence in favor of Ha.
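The single-proportion test (X = 11, n = 40, π0 = 0.2) and the confidence-interval standard error (X = 103, n = 150) from the examples above can be reproduced with the sketch below. The helper names are invented for illustration. Note the design point the notes rely on: the test uses π0 in the standard error, while the interval uses the sample proportion p.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prop_z_test(x: int, n: int, pi0: float):
    """One-sided test of H0: pi = pi0 vs Ha: pi > pi0; SE is built from pi0."""
    p_hat = x / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    z = (p_hat - pi0) / se0
    return p_hat, z, 1.0 - normal_cdf(z)

def prop_ci(x: int, n: int, z_crit: float = 1.96):
    """Large-sample CI for pi; SE is built from the sample proportion."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return se, (p_hat - z_crit * se, p_hat + z_crit * se)

p_hat, z, p_value = prop_z_test(11, 40, 0.2)
print(f"p = {p_hat:.3f}, z = {z:.3f}, one-sided p-value = {p_value:.3f}")

se_ci, (lo, hi) = prop_ci(103, 150)
print(f"SE_p = {se_ci:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The exact p-value is slightly smaller than the table-based 0.119 because the table lookup rounds z down to 1.18.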

EX) Step 1: H0: β1 = 0, Ha: β1 ≠ 0 (or one-sided: H0: β1 = 0, Ha: β1 > 0)
Step 2: Calculate the t statistic (= Tobs): read it from the STATA output, or compute tobs = b1/SE_b1.
Step 3: Calculate the 2-sided p-value: read it from the STATA output, or find the 2-sided p-value in the table with df = N - k - 1.
2*(0.005) < 2*Pobs < 2*(0.01), i.e. 0.01 < 2-sided Pobs < 0.02
|Tobs| (2.42) > |Tcrit| (2.0), Pobs (.018) < Pcrit (.05) (two-sided)
Step 4: Conclusion: We can reject H0 and conclude that the slope is significantly different from (greater than) zero at the 5% significance level.
Confidence Interval for β1: β1 = b1 +/- t_.025,df * SE_b1, where SE_b1 = s/sqrt(Σ(Xi - Xbar)²)
Conclusion: With 95% confidence, we can say that the true β1 lies b/w XX & XX
EX) Yi = β0 + β1Xi + εi, where Yi is each of the possible annual budgets, Xi is each of the possible # of students in an elementary school, and β0 + β1Xi is the mean response when X = Xi, with the corresponding error term εi ~ N(0, σ²). β1 shows how total cost changes when there are more students in the school. The error term εi allows variation among schools of the same size X.

ANOVA

EX) Salary-hat = 500 + 200*(gender)
On average, women make $500 per year, while men make $700 per year.
On average, men make $200 per year more than women.

EX) Salary-hat = 13711.10 + 3534.22(Exp) + 26550.33(gender)
Constant: The average salary for a woman (gender = 0) with 0 years of experience is expected to be $13,711.1.
Experience coefficient: On average, every one-year increase in years of experience is associated with an increase in annual salary of $3,534, when holding gender constant. / On average, each one-year increase in years of experience is associated with an increase in predicted salary of $2590 for women.
Gender (dummy): On average, male workers make $26,550 more than female workers, when holding experience constant (or when experience is zero).

EX) mincome-hat = 1150.1 - 1284.6lhs + 43.2totalhours
For those with high school education: mincome-hat = 1150.1 + 43.2totalhours
For those with low education: mincome-hat = -134.6 + 43.2totalhours
These two lines have different intercepts but the same slope.
Constant: The average income for someone with high school equivalent or higher education (lhs = 0) and zero hours worked is $1150.
lhs (dummy): On average, having high school equivalent or higher education is associated with an increase in monthly income of $1,285 (difference in constant), when holding the number of hours worked constant.
totalhours: On average, every additional hour worked is associated with $43 more monthly income, holding education constant.

EX) XX% of the variation in Y is explained by X variable

Interaction Terms: Y = β0 + β1X1 + β2X2 + β3X1X2


EX) Salary-hat = 23460 + 2591(Exp) + 7053(Gender) + 1887(Exp*gender)
Constant: For women with zero experience, the av. expected salary is $23,460.
Coefficient for experience: For women, on average, each additional year of experience is associated with an increase in salary of $2,591.
Coefficient for gender: For those with zero experience, on average, the expected salary for men is $7,053 higher than for women.
Coefficient for experience*gender: On average, the impact of each additional year of experience on salary is $1887 higher for men than for women.

Interaction term: EX) On average, the impact of each additional percentage point of parents with some college education is 3.141 points more in medium poverty communities compared to low poverty communities.
Equation interpretation: EX) On average, every additional percentage point of parents with some college education decreases the school performance score by 0.947 in low poverty communities.

EX) mincome-hat = 1081.1 - 841.9lhs + 44.8totalhours - 11.1lhsXhrs
For individuals with high school education (lhs = 0): mincome-hat = 1081.1 + 44.8totalhours
For individuals with less than a high school education (lhs = 1): mincome-hat = 239.2 + 33.7totalhours
Constant: The average monthly income for a respondent with a high school education and working zero hours is $1081. The estimate for this coefficient is significant at the 5% significance level.
lhs: On average, for individuals working 0 hours, having less than a high school education is associated with $841.9 less in monthly income compared to high school graduates. The estimate for this coefficient is significant at the 5% significance level.
totalhours: For those with a high school education or greater, on average, an additional hour worked is associated with a $44.8 increase in monthly income. The estimate for this coefficient is significant even at the 1% significance level.
lhsXhrs: On average, the impact of another hour worked is $11.1 less for those with less than a high school education compared to those with a high school education. However, the estimate for the coefficient of the interaction term is not statistically significant.
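The way an interaction term changes both the intercept and the slope can be seen by evaluating the fitted mincome equation for each education group. The sketch below hard-codes the coefficients from the example, reading the signs of the lhs and lhsXhrs terms as negative (which is what the two group equations, 239.2 and 33.7, imply). The function names are illustrative only.

```python
def mincome_hat(lhs: int, totalhours: float) -> float:
    """Predicted monthly income from the interaction model in the notes.

    lhs = 1 for less than a high school education, 0 otherwise.
    """
    return 1081.1 - 841.9 * lhs + 44.8 * totalhours - 11.1 * lhs * totalhours

def group_line(lhs: int):
    """Return (intercept, slope in totalhours) of the fitted line for one group."""
    intercept = mincome_hat(lhs, 0)
    slope = mincome_hat(lhs, 1) - intercept   # change per additional hour
    return intercept, slope

for lhs in (0, 1):
    intercept, slope = group_line(lhs)
    print(f"lhs = {lhs}: mincome-hat = {intercept:.1f} + {slope:.1f} * totalhours")
```

The difference in slopes between the two groups equals the interaction coefficient, and the difference in intercepts equals the lhs coefficient.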

tobs = b1/SE_b1, Fobs = MSM/MSE, CI: β1 = b1 +/- tcrit*SE_b1, t² = F
Sample SD = sqrt(MST) = sqrt(SST/(n - 1)), Residual SD = sqrt(MSE) = sqrt(SSE/(n - k - 1))
sqrt(MSE) = s = the square root of the average squared residual, which estimates the variation of the Y observations around the population regression line.
Residual Plots: Residual plots allow us to assess whether the assumptions of linearity and constant variance (homoskedasticity) are plausible. They can also tell us about possible lurking variables.
EX) There is a clear pattern in the residuals (the residuals are not randomly scattered above and below the zero line), suggesting a violation of the linearity assumption and/or a lurking variable. The residuals do not appear to be homoskedastic because their variability is larger for the middle values of self-concept.

Multiple Regression
Population: Yi = β0 + β1Xi1 + β2Xi2 + ... + βkXik + εi
Sample (estimate of the true regression line): Yi = b0 + b1Xi1 + b2Xi2 + ... + bkXik + ei
Constant: In the hypothetical situation when X1 = X2 = ... = Xk = 0, we would expect Y to be b0 (the predicted value of Y when all the Xs = 0).
Coefficient: On average, a one [unit, % point] increase in X1 is associated with a b1 [unit, % point] increase/decrease in Y, while holding X2, ..., Xk constant.
If the slope is not statistically significant: The coefficient on X1 is not significantly different from 0 at the 5% significance level.
R-Squared: X% of the variation in GPA is explained by the least-squares multiple regression of GPA on IQ and Self-Concept. Including IQ in our model has increased the explanatory power of our model.

Others r - Correlation: standardized measure of the strength of linear relationship between two quantitative variables Scatterplot: form-linear/curvilinear/nonlinear, direction-positive/negative, strength-moderate/strong/weak correlation and outliers EX) Outlier would skew the distribution and pull the mean towards it.
              X     Y   X-Xbar   (X-Xbar)²   Y-Ybar   (Y-Ybar)²      Zx      Zy   Zx*Zy
            100    40     -300      90,000      -20         400   -1.39   -1.44    2.01
            200    50     -200      40,000      -10         100   -0.93   -0.72    0.67
            300    50     -100      10,000      -10         100   -0.46   -0.72    0.33
            400    70        0           0       10         100    0.00    0.72    0.00
            500    65      100      10,000        5          25    0.46    0.36    0.17
            600    65      200      40,000        5          25    0.93    0.36    0.33
            700    80      300      90,000       20         400    1.39    1.44    2.01
Mean        400    60
Total                        0     280,000        0       1,150    0       0       5.52
Total/(n-1)                         46,667                  192                    0.92

Zx = (X - Xbar)/Sx, Zy = (Y - Ybar)/Sy
Sx² = 46,667, Sx = 216.02; Sy² = 192, Sy = 13.84
rxy = Σ(Zx*Zy)/(n - 1) = 0.92
b = rxy*(sy/sx) = 0.059, a = Ybar - b*Xbar = 36.43

F test (1): H0: β1 = 0, Ha: β1 ≠ 0. DOF: DFM (k, numerator)/DFE (n - k - 1, denominator), Fobs = MSM/MSE, Fobs = (tobs)²
F test (2): H0: β1 = β2 = ... = βk = 0, Ha: at least one of the βk's is not equal to 0. DOF: DFM (k)/DFE (n - k - 1), Fobs = MSM/MSE. F is one-sided only.
Conclusion: We can reject the null hypothesis at the 5% significance level and conclude that at least one of the β-coefficients is significantly different from zero, and therefore at least one of our explanatory variables has a linear association with GPA (Y).
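The identity Fobs = (tobs)² for simple regression can be checked numerically. The sketch below uses the fertilizer/yield values from the worked table in these notes as sample data (an illustrative choice, since the notes do not run an F test on them) and builds the ANOVA quantities by hand with the standard library only. Here s² is taken as MSE = SSE/(n - k - 1), as in the ANOVA formulas above.

```python
import math

# X (fertilizer) and Y (yield) values from the worked table in the notes.
x = [100, 200, 300, 400, 500, 600, 700]
y = [40, 50, 50, 70, 65, 65, 80]
n, k = len(x), 1                      # k = number of explanatory variables

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                        # least-squares slope
b0 = ybar - b1 * xbar                 # least-squares intercept
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # residual sum of squares
ssm = sst - sse                                       # model sum of squares

msm = ssm / k
mse = sse / (n - k - 1)
f_obs = msm / mse

se_b1 = math.sqrt(mse / sxx)          # s / sqrt(sum of squared X deviations)
t_obs = b1 / se_b1

print(f"F = {f_obs:.2f}, t = {t_obs:.2f}, t^2 = {t_obs ** 2:.2f}")
```

The F statistic is compared with an F table at DFM = 1 and DFE = 5 here, and it leads to the same decision as the two-sided t test on the slope.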

Yield-hat = 36.4 + 0.059(fertilizer)
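The fitted line above follows from the z-score calculation in the table: r is the sum of Zx*Zy divided by n - 1, the slope is r*(sy/sx), and the intercept is Ybar - b*Xbar. A minimal Python sketch of that route, using only the standard library and variable names chosen for illustration:

```python
import math

# Fertilizer (X) and yield (Y) values from the table in the notes.
x = [100, 200, 300, 400, 500, 600, 700]
y = [40, 50, 50, 70, 65, 65, 80]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Sample standard deviations (divide by n - 1).
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

# Standardize each observation, then average the products over n - 1.
zx = [(xi - xbar) / sx for xi in x]
zy = [(yi - ybar) / sy for yi in y]
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

b = r * sy / sx            # slope
a = ybar - b * xbar        # intercept

print(f"sx = {sx:.2f}, sy = {sy:.2f}, r = {r:.2f}")
print(f"Yield-hat = {a:.2f} + {b:.3f} * fertilizer")
```

Carrying full precision gives the same rounded values as the hand calculation (r near 0.92, slope near 0.059, intercept near 36.4).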

Dummy Variable
