Multiple Regression Analysis

Copyright@Tieming Ji
Spring 2013
University of Missouri at Columbia
Model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i, \qquad e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2).$$
Key Point: $\beta_j$, $j = 1, \ldots, k$, is the change in the mean of $Y$
for a unit increase in $x_j$ with all other variables held constant.
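The key point can be checked directly from the linear form of the mean. The short derivation below is a sketch that assumes only the model's mean function $E[Y \mid x_1, \ldots, x_k] = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$. Every term not involving $x_j$ cancels in the difference.

```latex
\begin{align*}
E[Y \mid x_1, \ldots, x_j + 1, \ldots, x_k] - E[Y \mid x_1, \ldots, x_j, \ldots, x_k]
  &= \beta_j (x_j + 1) - \beta_j x_j \\
  &= \beta_j .
\end{align*}
```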
Model Assumptions:
Independence;
Normality;
Constant Variance;
Linearity.
Least Squares Estimation
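How to estimate $\beta_0, \beta_1, \ldots, \beta_k$?
$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$
that is,
$$\min \sum_{i=1}^{n} \bigl(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik})\bigr)^2,$$
which gives $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$.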
Theorem: The LSE $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are the best linear unbiased
estimators of $\beta_0, \beta_1, \ldots, \beta_k$.
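As an illustration of the minimization above, the following sketch computes least squares estimates by solving the normal equations with the Python standard library only. The function names `solve_linear_system` and `fit_ols` and the small data set are hypothetical, chosen for this example. In practice a numerical library would normally be used instead of hand-written elimination.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so the inputs are not modified.
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (collinear predictors)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_ols(X, y):
    """Return [b0, b1, ..., bk] minimizing sum_i (y_i - yhat_i)^2.

    X is a list of n rows, each holding the k predictor values of one case.
    """
    # Prepend a column of ones so b0 plays the role of the intercept.
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    # Normal equations: (Z'Z) b = Z'y.
    ZtZ = [[sum(z[a] * z[c] for z in Z) for c in range(p)] for a in range(p)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return solve_linear_system(ZtZ, Zty)


# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
beta_hat = fit_ols(X, y)
print([round(b, 6) for b in beta_hat])
```

Because the example data contain no noise, the estimates should recover the generating coefficients 1, 2 and 3 up to rounding error.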
Conclusions:
The fitted values (predicted values) are
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}$.
Residuals are $\hat{e}_i = y_i - \hat{y}_i$, $i = 1, \ldots, n$.
$\hat{\beta}_j$, $j = 1, \ldots, k$, is a linear function of the $y_i$'s, so
$\hat{\beta}_j \sim N(\beta_j, \sigma^2_{\hat{\beta}_j})$.
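Multiple correlation coefficient
$$r_{Y,\hat{Y}} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \, \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}.$$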
It measures how well the response variable is predicted by the fitted values.
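A minimal sketch of this coefficient is given below: it is the ordinary sample correlation between the observed responses and the fitted values. The function name `multiple_correlation` and the data are hypothetical. The fitted values come from the least squares line for these five made-up points, a one-predictor case.

```python
from math import sqrt


def multiple_correlation(y, y_hat):
    """Sample correlation between observed y_i and fitted values yhat_i."""
    n = len(y)
    y_bar = sum(y) / n
    y_hat_bar = sum(y_hat) / n
    cross = sum((a - y_bar) * (f - y_hat_bar) for a, f in zip(y, y_hat))
    ss_y = sum((a - y_bar) ** 2 for a in y)
    ss_hat = sum((f - y_hat_bar) ** 2 for f in y_hat)
    return cross / sqrt(ss_y * ss_hat)


# Made-up responses, with fitted values from the least squares line
# yhat = 2.2 + 0.6 * x evaluated at x = 1, ..., 5 (so k = 1 here).
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * x for x in range(1, 6)]
r = multiple_correlation(y, y_hat)
print(round(r, 4), round(r ** 2, 4))
```

For these numbers the squared coefficient comes out at 0.6, which is the share of the total variation in the responses reproduced by the fitted values.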
ANOVA Table
Source   d.f.            SS                                                  MSS
model    $k$             $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$      $MSR = SSR / k$
error    $n - (k + 1)$   $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$          $MSE = SSE / (n - (k + 1))$
total    $n - 1$         $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}.$$
$MSE = S^2$ is the unbiased estimator of $\sigma^2$.
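The quantities in the table can be computed from the observed responses, the fitted values, and the number of predictors $k$. The sketch below assumes the fitted values come from a least squares fit with an intercept. The function name `anova_table` and the data are hypothetical.

```python
def anova_table(y, y_hat, k):
    """Return the ANOVA quantities for a fit with k predictors plus an intercept."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in y_hat)         # model sum of squares
    sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))  # error sum of squares
    sst = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return {
        "SSR": ssr,
        "SSE": sse,
        "SST": sst,
        "MSR": ssr / k,
        "MSE": sse / (n - (k + 1)),
        "R2": ssr / sst,
    }


# Made-up responses, with fitted values from the least squares line
# yhat = 2.2 + 0.6 * x evaluated at x = 1, ..., 5 (so k = 1, n = 5).
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * x for x in range(1, 6)]
table = anova_table(y, y_hat, k=1)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

With a least squares fit that includes an intercept, SSR and SSE add up to SST, which is why the two expressions for $R^2$ agree.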
ANOVA Test
Hypotheses:
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$;
$H_a$: at least one of the $\beta_j$'s, $j = 1, \ldots, k$, is not equal to 0.
Test statistic: $F_{obs} = \frac{MSR}{MSE}$ from the ANOVA table.
Decision (at level $\alpha$):
(1) reject $H_0$ if $F_{obs} \ge F_{k,\, n-(k+1),\, 1-\alpha}$; or
(2) reject $H_0$ if the p-value is less than or equal to $\alpha$.
The ANOVA test is also a form of model selection, where $H_0$: $y_i = \mu + e_i$
vs. $H_a$: the model contains at least one $\beta_j$, $j = 1, \ldots, k$.
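Decision rule (1) can be sketched as follows. The function name `anova_f_test` is hypothetical, and the critical value is passed in by the caller rather than computed, on the assumption that it is read from an F table or obtained from statistical software. The inputs are made-up values for $k = 1$ and $n = 5$.

```python
def anova_f_test(msr, mse, f_critical):
    """Return (F_obs, reject_H0) for the overall ANOVA F test."""
    f_obs = msr / mse
    return f_obs, f_obs >= f_critical


# MSR = 3.6 and MSE = 0.8 are made-up values for k = 1, n = 5.
# 10.13 is the tabulated 0.95 quantile of the F distribution with
# (k, n - (k + 1)) = (1, 3) degrees of freedom, i.e. alpha = 0.05.
f_obs, reject = anova_f_test(msr=3.6, mse=0.8, f_critical=10.13)
print(round(f_obs, 4), reject)
```

Here the observed statistic is about 4.5, below the critical value, so $H_0$ is not rejected at the 5% level for these illustrative numbers.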
Model Selection
Suppose we have decided to use a linear model. We then need to
choose the predictors in the linear function. We may consider:
What explanatory variables to include;
What forms (e.g. transformations, interactions) of
explanatory variables to include.
Criteria:
Overall criteria: model simplicity + fitting quality.
Larger $R^2$ $\Rightarrow$ better fit to the observations.
$R^2$ never decreases as more explanatory variables are added to the
model. The incremental increase in $R^2$ from adding more
explanatory variables may be neither statistically significant
nor practically important.
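The behaviour of $R^2$ under added predictors can be illustrated with a small experiment: fit the same responses with one predictor, then with an additional, arbitrary second predictor, and compare the two values. This is a sketch using only the standard library. The helper names `fit_ols` and `r_squared` and all data are hypothetical.

```python
def fit_ols(X, y):
    """Least squares coefficients [b0, ..., bk] via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    # Augmented normal equations [Z'Z | Z'y].
    M = [[sum(z[a] * z[c] for z in Z) for c in range(p)]
         + [sum(z[a] * yi for z, yi in zip(Z, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[p] for row in M]


def r_squared(X, y):
    """R^2 = 1 - SSE / SST for the least squares fit of y on the columns of X."""
    b = fit_ols(X, y)
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    y_bar = sum(y) / len(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    sst = sum((a - y_bar) ** 2 for a in y)
    return 1 - sse / sst


# Made-up data; x2 is an arbitrary sequence with no designed link to y.
y = [2, 4, 5, 4, 5]
x1 = [1, 2, 3, 4, 5]
x2 = [3, 1, 4, 1, 5]

r2_small = r_squared([[a] for a in x1], y)
r2_large = r_squared([[a, b] for a, b in zip(x1, x2)], y)
print(round(r2_small, 4), round(r2_large, 4))
```

The larger model cannot have a smaller $R^2$, since the smaller model is a special case of it with the extra coefficient set to zero. A small gain of this kind says little about whether the added variable is worth keeping.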
Model Selection
[Figure: three panels labeled (a), (b), (c)]
(a) Poor fit; $R^2$ is low.
(b) Adequate fit; robust for future data.
(c) Extreme overfitting; $R^2 = 1$, but not robust for prediction.