Adjusted r-squared:
r²_adj = 1 - (1 - r²) (n - 1) / (n - k - 1)
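The adjusted r-squared formula above can be sketched as a small helper; the example numbers are illustrative, not from the slides.

```python
# Adjusted r-squared from r-squared, sample size n, and number of
# predictors k: r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative values (assumed, not from the slides): r2 = 0.9, n = 25, k = 6
print(round(adjusted_r2(0.9, 25, 6), 4))  # 0.8667
```

Note that adding predictors always raises plain r² but can lower r²_adj, which is why the adjusted version is used to compare models with different k.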
Error and coefficients relationship
Simple-regression slope: b1 = Covar(y, x) / Varp(x)

         y           x1          x2          x3           x4           x5          x6
Varp     419.28571   1103.4439   115902.4    1630165.82   36245060.6   706538.59   195.9184
Covar    -           662.14286   6862.5      25621.4286   120976.786   16061.643   257.1429
b1       -           0.6000694   0.059209    0.01571707   0.00333775   0.0227329   1.3125
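The slope relation can be checked directly against the first predictor column of the table above (the row labeled "Stddevp" in the original spreadsheet actually holds population variances, as this check confirms):

```python
# Check b1 = Covar(y, x) / Varp(x) for the first predictor column above.
covar = 662.14286    # Covar(y, x1) from the table
varp_x = 1103.4439   # Varp(x1) from the table
b1 = covar / varp_x
print(round(b1, 7))  # matches the tabulated 0.6000694
```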
Is the Model Significant?
F Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the
X variables considered together and Y
Use F-test statistic
Hypotheses:
H0: β1 = β2 = ... = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
F Test for Overall Significance
Test statistic:
F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))
where F has k (numerator) and (n - k - 1) (denominator) degrees of freedom
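The F statistic above can be computed from the sums of squares; the numbers below are assumed for illustration only.

```python
# Overall-significance F statistic: F = MSR / MSE
# where MSR = SSR / k and MSE = SSE / (n - k - 1).
def f_statistic(ssr, sse, n, k):
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Illustrative values (not from the slides): SSR = 800, SSE = 200, n = 30, k = 2
# MSR = 400, MSE = 200/27, so F = 54.0
print(round(f_statistic(800, 200, 30, 2), 1))  # 54.0
```

A large F (relative to the F distribution with k and n - k - 1 d.f.) leads to rejecting H0.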
Case discussion
Multiple Regression Assumptions
Assumptions:
The errors are normally distributed
Errors have a constant variance
The model errors are independent
Errors (residuals) from the regression model:
e_i = (Y_i - Ŷ_i)
Error terms and coefficient estimates
Once we think of the error term as a random
variable, it becomes clear that the estimates
b1, b2, ... (as distinguished from their true
values) will also be random variables, because
the estimates generated by the SSE criterion
depend on the particular value of e
drawn by nature for each individual in the
data set.
Statistical Inference and Goodness of Fit
The parameter estimates are themselves random
variables, dependent upon the random variables e.
Thus, each estimate can be thought of as a draw
from some underlying probability distribution, the
nature of that distribution as yet unspecified.
If we assume that the error terms e are all drawn
from the same normal distribution, it is possible to
show that the parameter estimates have a normal
distribution as well.
T Statistic and P value
t = (b1 - β1) / S_b1   (estimate minus hypothesized value, divided by the standard deviation of b1)
Can you set up a hypothesis about β1 (e.g., that it equals some specified value)
and carry out the t test?
Are Individual Variables Significant?
Use t tests of individual variable slopes
Shows if there is a linear relationship between the
variable X
j
and Y
Hypotheses:
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)
Are Individual Variables Significant?
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)
Test statistic:
t = (bj - 0) / S_bj   (df = n - k - 1)
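The individual-slope t statistic can be checked against the BAR row of the regression output below (b = 0.041988, standard error = 0.005271); small differences from the tabulated t Stat come from rounding in the displayed coefficients.

```python
# t statistic for one slope: t = (b_j - hypothesized value) / S_bj,
# with n - k - 1 degrees of freedom. Default hypothesized slope is 0.
def t_stat(b_j, s_bj, beta_j0=0.0):
    return (b_j - beta_j0) / s_bj

# BAR row from the output table: b = 0.041988, S_b = 0.005271
t_bar = t_stat(0.041988, 0.005271)
print(round(t_bar, 2))  # close to the tabulated 7.966651
```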
            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   -59.0661       11.28404         -5.23448   3.45E-05   -82.5325    -35.5996
OFF          -0.00696       0.04619         -0.15068   0.881663    -0.10302     0.089097
BAR           0.041988      0.005271         7.966651  8.81E-08     0.031028    0.052949
YNG           0.002716      0.000999         2.717326  0.012904     0.000637    0.004794
VEH           0.00147       0.000265         5.540878  1.69E-05     0.000918    0.002021
INV          -0.00274       0.001336        -2.05135   0.052914    -0.00552     3.78E-05
SPD          -0.2682        0.068418        -3.92009   0.000786    -0.41049    -0.12592
with n - (k + 1) degrees of freedom
Confidence Interval Estimate
for the Slope
Confidence interval for the population slope βj:
bj ± t(n-k-1) · S_bj
where t has (n - k - 1) d.f.
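The interval can be reproduced for the BAR row of the output table above. The critical value t ≈ 2.0796 is an assumption (the two-sided 2.5% point for 21 d.f., consistent with the t statistics and p-values in that table); it is not given on the slides.

```python
# 95% confidence interval for a slope: b_j ± t_crit * S_bj.
# BAR row: b = 0.041988, S_b = 0.005271; assumed t_crit = 2.0796 (df = 21).
b, se, t_crit = 0.041988, 0.005271, 2.0796
lower = b - t_crit * se
upper = b + t_crit * se
print(round(lower, 5), round(upper, 5))  # close to tabulated (0.031028, 0.052949)
```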
Effect of Interaction
Given: Y = β0 + β1X1 + β2X2 + β3X1X2
(equivalently Y = β0 + β1X1 + β2X2 + β3X3, with X3 = X1X2)
Without the interaction term, the effect of X1 on Y is measured by β1
With the interaction term, the effect of X1 on Y is measured by β1 + β3X2
The effect changes as X2 changes
Interaction Example
Suppose X2 is a dummy variable and the estimated regression equation is
Ŷ = 1 + 2X1 + 3X2 + 4X1X2
X2 = 1:  Ŷ = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0:  Ŷ = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
Slopes are different if the effect of X1 on Y depends on the value of X2
[Figure: the two lines plotted against X1 from 0 to 1.5, with Y from 0 to 12]
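Evaluating the estimated equation from the example at the two dummy-variable values reproduces the two lines directly:

```python
# Estimated equation from the example: Y-hat = 1 + 2*X1 + 3*X2 + 4*X1*X2
def y_hat(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# X2 = 0 gives the line 1 + 2*X1; X2 = 1 gives 4 + 6*X1
print(y_hat(1, 0))  # 1 + 2*1 = 3
print(y_hat(1, 1))  # 4 + 6*1 = 10
```

The difference in slopes (6 vs. 2) is exactly the interaction coefficient (4) times the change in X2.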
Residual Analysis
The residual for observation i, e_i, is the difference between its observed and predicted value
Check the assumptions of regression by examining the
residuals
Examine for linearity assumption
Evaluate independence assumption
Evaluate normal distribution assumption
Examine for constant variance for all levels of X (homoscedasticity)
Graphical Analysis of Residuals
Can plot residuals vs. X
e_i = Y_i - Ŷ_i
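Computing the residuals e_i = Y_i - Ŷ_i is the first step of the graphical check; the data and fitted coefficients below are assumed toy values for illustration (plotting them against X, e.g. with matplotlib, gives the diagnostic plots described in the following slides).

```python
# Residuals from a fitted line Y-hat = b0 + b1*x, using assumed toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = 0.0, 2.0  # assumed fitted coefficients for this sketch
residuals = [round(y - (b0 + b1 * x), 2) for x, y in zip(xs, ys)]
print(residuals)  # [0.1, -0.1, 0.2, -0.2]
```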
Residual Analysis for Independence
[Figure: residuals plotted against X in two panels — a patterned plot (Not Independent) vs. a patternless scatter (Independent)]
Residual Analysis for Equal Variance
[Figure: Y vs. x and residuals vs. x in two panels — a fan-shaped residual spread (non-constant variance) vs. an even band (constant variance)]
Linear vs. Nonlinear Fit
A linear fit does not give random residuals when the relationship is nonlinear;
a nonlinear fit gives random residuals
[Figure: Y vs. X with residuals vs. X below, for a linear fit (curved residual pattern) and a nonlinear fit (patternless residuals)]
Quadratic Regression Model
Quadratic models may be considered when the scatter diagram takes on one of
the following shapes:
Y_i = β0 + β1·X1i + β2·X1i² + ε_i
β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
[Figure: four scatter shapes of Y vs. X1 — (β1 < 0, β2 > 0), (β1 > 0, β2 > 0), (β1 < 0, β2 < 0), (β1 > 0, β2 < 0)]
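As a minimal sketch of the quadratic model: a quadratic Y = b0 + b1·X + b2·X² is determined exactly by three points, so solving for the coefficients through three points illustrates the role of the linear and squared terms (a real fit would use least squares over all observations). The curve Y = 1 + 2X - 0.5X² is an assumed example.

```python
# Fit Y = b0 + b1*x + b2*x^2 exactly through three points using Newton's
# divided differences, then expand to standard coefficients.
def fit_quadratic(p1, p2, p3):
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d1 = (y2 - y1) / (x2 - x1)                      # first divided difference
    d2 = ((y3 - y2) / (x3 - x2) - d1) / (x3 - x1)   # second divided difference
    # Expand y1 + d1*(x - x1) + d2*(x - x1)*(x - x2) into b0 + b1*x + b2*x^2
    b2 = d2
    b1 = d1 - d2 * (x1 + x2)
    b0 = y1 - d1 * x1 + d2 * x1 * x2
    return b0, b1, b2

# Points on the assumed curve Y = 1 + 2X - 0.5X^2; recovers (1.0, 2.0, -0.5)
print(fit_quadratic((0, 1.0), (2, 3.0), (4, 1.0)))
```

Here b2 < 0 produces the downward-curving shapes in the figure above, and b2 > 0 the upward-curving ones.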