
Multiple Regression Analysis

3.1 Motivation for multiple regression
In chapter 2 we learned how to use simple
regression analysis to explain a dependent
variable y as a function of a single explanatory
variable x.
Key assumption for causality:
SLR.4: The error u has an expected value of zero given
x: E(u|x)=0

Main drawback of that framework: all other factors affecting y have to be uncorrelated with x.

Multiple regression analysis is more suitable for causal (ceteris paribus) analysis.
Reason: We can explicitly control for other factors that affect the dependent variable y.
Example: Wage equation
wage = β0 + β1 educ + β2 exper + u
If we estimate the parameters of this model using OLS, what interpretation can we give to β1?
Why might this approach yield a more reliable estimate of the causal effect of education than if we were using a simple regression with educ as the sole explanatory variable?

General model with two independent variables:
y = β0 + β1x1 + β2x2 + u
where
β0 is the intercept
β1 measures the change in y with respect to x1, holding other factors fixed
β2 measures the change in y with respect to x2, holding other factors fixed
This framework can also be used to generalize the functional form. Example: modeling family consumption (cons) as a function of income (inc):
cons = β0 + β1 inc + β2 inc² + u
where inc² is entered as a separate variable. What is the effect of income on consumption in this model?
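One way to see the answer (a worked step that is not on the original slide): since inc enters both in levels and squared, the marginal effect of income is obtained by differentiating the fitted equation with respect to inc,
Δcons ≈ (β̂1 + 2β̂2·inc)·Δinc,
so the effect of an extra unit of income on consumption depends on the level of income.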

Key assumption for the model with two independent variables:
E(u|x1, x2) = 0
Note the similarity to assumption SLR.4 introduced last time.
Interpretation: for any values of x1 and x2 in the population, the average unobservable (u) is equal to zero.
Discuss this assumption in the context of the wage model introduced above (educ and exper impact wage).

The model with k independent variables
The multiple regression model:
y = β0 + β1x1 + β2x2 + … + βkxk + u    (3.6)
where
β0 is the intercept
β1 is the parameter associated with x1 (measures the change in y with respect to x1, holding other factors fixed)
β2 is the parameter associated with x2 (measures the change in y with respect to x2, holding other factors fixed)

The model with k independent variables (cont'd)
β1, β2, …, βk are often referred to as slope parameters.
u is the disturbance term (error term). It contains factors other than x1, x2, …, xk affecting y.

Interpretation of the parameters of the multiple regression model
Being able to interpret the parameters of the multiple regression model is one of our key goals in this course and we will get plenty of practice.
Checkpoint: Make sure you are able to interpret the parameters of the following model:
log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u
where ceoten = CEO tenure.

Key assumption for the model with k independent variables:
E(u|x1, x2, …, xk) = 0
Thus, all factors in the unobserved error term u are assumed uncorrelated with the explanatory variables.

3.2 Mechanics and interpretation of OLS
We focus first on the model with two independent variables. We write the estimated OLS regression in a form similar to the simple regression case:
ŷ = β̂0 + β̂1x1 + β̂2x2
where the hats on the parameters indicate that these are estimates of the true (unknown) population parameters β0, β1 and β2, and the hat on y means predicted (instead of actual) values of y.

How do we obtain the OLS estimates?
As discussed in Chapter 2, the method of ordinary least squares (OLS) chooses the estimates that minimize the sum of squared residuals.
That is, given n observations on the y and x1, …, xk variables, the OLS estimates minimize
Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²
where i refers to the observation number and the second index distinguishes different variables.

Model with k independent variables
The OLS estimates β̂0, β̂1, …, β̂k minimize
Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²
You know from earlier maths courses how to solve a minimization problem such as this one:
Write down the first-order condition for each parameter
Then solve for each parameter

Example: The f.o.c. for β̂1
The minimization problem: choose β̂0, β̂1, …, β̂k so as to minimize
S = Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²
The first order condition (f.o.c.) with respect to β̂1 is simply
∂S/∂β̂1 = −2 Σi xi1 (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
Note the use of the chain rule in taking this derivative.

General: k+1 unknown parameters & k+1 equations
Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
Σi xi1 (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
⋮
Σi xik (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
These are the OLS first order conditions.
Note that they can also be interpreted as sample counterparts of the population moment conditions E(u) = 0 and E(xj u) = 0 (omitting the division by n). Compare with the approach used in Chapter 2.
The latter point highlights the importance of assuming zero covariance between the error term and the explanatory variables.

It is tedious but straightforward in principle to solve for the parameter estimates here. Each parameter estimate can be written as a function of the x and y values in the sample (and is linear in the y values).
Fortunately, the computer does the work for us (a small numerical example follows below). You will not be required to derive the solutions.
Note: we must assume that the equations above can be solved uniquely for the parameters. For now, we will just assume this and move on.
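To make this concrete, here is a minimal Python/numpy sketch (not part of the original slides; the data and coefficient values are simulated) that solves the OLS first-order conditions in matrix form, X'X b = X'y, and then checks the sample moment conditions from the previous slide:

import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)   # simulated "population" model

X = np.column_stack([np.ones(n), x1, x2])            # first column = intercept
b = np.linalg.solve(X.T @ X, X.T @ y)                # OLS estimates (beta0-hat, beta1-hat, beta2-hat)
resid = y - X @ b

print(b)               # the three parameter estimates
print(X.T @ resid)     # essentially zero: sum of residuals = 0 and sum of x_ij * residual_i = 0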

Interpreting the OLS regression function
More important than the details of the computation of the OLS estimates is the interpretation of the estimated equation.
Consider the model with two regressors:
ŷ = β̂0 + β̂1x1 + β̂2x2
The estimates β̂1 and β̂2 have a partial effect, or ceteris paribus, interpretation:
Δŷ = β̂1Δx1 + β̂2Δx2, so holding x2 fixed (Δx2 = 0) gives Δŷ = β̂1Δx1.
Explain what is meant by this statement.
Explain how to interpret the intercept β̂0.

Example 3.1: Determinants of college GPA
Data: GPA1.DTA, collected by a student at Michigan State University.
Variables: college grade point average (colGPA), high school GPA (hsGPA) and achievement test score (ACT).
Summary statistics for these variables:
. summarize colGPA hsGPA ACT

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      colGPA |       141    3.056738    .3723103        2.2          4
       hsGPA |       141    3.402128    .3199259        2.4          4
         ACT |       141    24.15603    2.844252         16         33

Regression results

. regress colGPA hsGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.78
       Model |  3.42365506     2  1.71182753           Prob > F      =  0.0000
    Residual |  15.9824444   138  .115814814           R-squared     =  0.1764
-------------+------------------------------           Adj R-squared =  0.1645
       Total |  19.4060994   140  .138614996           Root MSE      =  .34032

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4534559   .0958129     4.73   0.000     .2640047    .6429071
         ACT |    .009426   .0107772     0.87   0.383    -.0118838    .0307358
       _cons |   1.286328   .3408221     3.77   0.000      .612419    1.960237
------------------------------------------------------------------------------

Interpret the coefficients. Carefully state what is held constant when you are evaluating the results. Are the estimated effects small or large?
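Written out as an equation (a restatement of the Stata output above, not a separate slide), the estimated regression is
colGPA-hat = 1.29 + .453 hsGPA + .0094 ACT    (n = 141, R² = .176).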

Now consider results from the following simple regression

. reg colGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  1,   139) =    6.21
       Model |  .829558811     1  .829558811           Prob > F      =  0.0139
    Residual |  18.5765406   139  .133644177           R-squared     =  0.0427
-------------+------------------------------           Adj R-squared =  0.0359
       Total |  19.4060994   140  .138614996           Root MSE      =  .36557

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |    .027064   .0108628     2.49   0.014     .0055862    .0485417
       _cons |   2.402979   .2642027     9.10   0.000     1.880604    2.925355
------------------------------------------------------------------------------

Compare the estimated coefficient on ACT in this model to that on the previous slide: how do they differ and why?
Related: How does the interpretation of the estimated parameters differ across the two specifications?

Interpretation of the equation with k independent variables
The case with more than two independent variables is similar.
For example, the coefficient on x1 measures the change in ŷ due to a one-unit increase in x1, holding all other independent variables fixed:
Δŷ = β̂1Δx1, holding x2, x3, …, xk constant.
Econometric jargon: We have "controlled for" the variables x2, x3, …, xk when estimating the effect of x1 on y.

Example 3.2: The wage equation, with and without controls for tenure & experience

a) Simple regression

. ge logwage=ln(wage)

. reg logwage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  119.58
       Model |  27.5606288     1  27.5606288           Prob > F      =  0.0000
    Residual |  120.769123   524  .230475425           R-squared     =  0.1858
-------------+------------------------------           Adj R-squared =  0.1843
       Total |  148.329751   525   .28253286           Root MSE      =  .48008

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0827444   .0075667    10.94   0.000     .0678796    .0976091
       _cons |   .5837727   .0973358     6.00   0.000     .3925563    .7749891
------------------------------------------------------------------------------

b) Multiple regression

. reg logwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   80.39
       Model |  46.8741776     3  15.6247259           Prob > F      =  0.0000
    Residual |  101.455574   522  .194359337           R-squared     =  0.3160
-------------+------------------------------           Adj R-squared =  0.3121
       Total |  148.329751   525   .28253286           Root MSE      =  .44086

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

Data: WAGE1.DTA.

Interpretation of these results
Does it make sense to assume zero covariance between the residual and the regressors?
What is the causal effect of education?
Is general experience more or less important for wages than specific experience (specific to the firm)?
What is the effect on the wage when an individual stays at the same firm for another year? (Clue: more than one explanatory variable changes here; see p. 77 in the book.)
How many years of tenure correspond to one year of education in terms of wages?
Why is the estimated education effect higher in the multiple regression (the table shown on the right on the previous slide)?
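As a worked illustration of the third question above (this calculation is not on the original slide): when a worker stays one more year at the same firm, both exper and tenure increase by one, so the predicted change in log wage is approximately .0041 + .0221 = .0262, i.e. a wage roughly 2.6% higher, holding education fixed.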

Correlation matrix

. corr educ exper tenure
(obs=526)

             |     educ    exper   tenure
-------------+---------------------------
        educ |   1.0000
       exper |  -0.2995   1.0000
      tenure |  -0.0562   0.4993   1.0000
Education is clearly negatively correlated with experience and tenure in this data set.
So, those with high levels of education will have less experience, on average (e.g. for the simple reason that they enter the labor market later).
Presumably, education and experience both raise wages.
By not controlling for experience, we may therefore underestimate the effect of education.

OLS fitted values and residuals
For observation i the fitted value is simply
ŷi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂kxik
The residual for observation i is defined just as in the simple regression case:
ûi = yi − ŷi
Properties:
The sample average of the residuals is zero, so ȳ equals the average of the fitted values ŷi.
The sample covariance between each independent variable and the residuals is zero.
The point (x̄1, x̄2, …, x̄k, ȳ) always lies on the OLS regression line.

Comparison of simple & multiple regression estimates
Simple regression: ỹ = β̃0 + β̃1x1
Multiple regression: ŷ = β̂0 + β̂1x1 + β̂2x2
We know that the simple regression coefficient on x1 is generally different from the multiple regression coefficient on x1.
Here is how the two parameters are related:
β̃1 = β̂1 + β̂2δ̃1
where δ̃1 is the slope coefficient from a simple regression of x2 on x1. How can this relationship be interpreted?
So, the two estimates are the same if the second term on the RHS is zero, i.e. if β̂2 = 0 and/or δ̃1 = 0.
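As a small numerical check of this relationship (not part of the original slides; the data are simulated), the identity β̃1 = β̂1 + β̂2δ̃1 holds exactly in any sample:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # x2 correlated with x1
y = 2.0 + 1.0 * x1 + 0.7 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_mult = ols(np.column_stack([ones, x1, x2]), y)    # beta0-hat, beta1-hat, beta2-hat
b_simp = ols(np.column_stack([ones, x1]), y)        # beta0-tilde, beta1-tilde
delta = ols(np.column_stack([ones, x1]), x2)        # delta1-tilde: slope from regressing x2 on x1

print(b_simp[1], b_mult[1] + b_mult[2] * delta[1])  # the two numbers agree up to floating-point error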

Goodness-of-fit:
Same as for the simple regression model:
SST = Total Sum of Squares = Σi (yi − ȳ)²
SSE = Explained Sum of Squares = Σi (ŷi − ȳ)²
SSR = Residual Sum of Squares = Σi ûi²
R² = SSE/SST = 1 − SSR/SST

Some points about the R-squared
The R-squared is equal to the squared correlation between actual and fitted y.
The R-squared never decreases, and usually increases, when another independent variable is added to a regression.
This is because the SSR can never increase when you add more regressors to the model (why?).
Why is the R-squared a poor tool for deciding whether a particular variable should be added to the model?
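A quick illustration of this point (not from the original slides; the data are simulated): adding a regressor that is pure noise cannot lower the R-squared, which is exactly why the R-squared alone is a poor tool for deciding whether to add a variable:

import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones(n)
noise = rng.normal(size=n)                                 # unrelated to y by construction
print(r_squared(np.column_stack([ones, x1]), y))
print(r_squared(np.column_stack([ones, x1, noise]), y))    # never smaller than the first value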

Compare and interpret the R-squareds

a) Simple regression

. ge logwage=ln(wage)

. reg logwage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  119.58
       Model |  27.5606288     1  27.5606288           Prob > F      =  0.0000
    Residual |  120.769123   524  .230475425           R-squared     =  0.1858
-------------+------------------------------           Adj R-squared =  0.1843
       Total |  148.329751   525   .28253286           Root MSE      =  .48008

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0827444   .0075667    10.94   0.000     .0678796    .0976091
       _cons |   .5837727   .0973358     6.00   0.000     .3925563    .7749891
------------------------------------------------------------------------------

b) Multiple regression

. reg logwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   80.39
       Model |  46.8741776     3  15.6247259           Prob > F      =  0.0000
    Residual |  101.455574   522  .194359337           R-squared     =  0.3160
-------------+------------------------------           Adj R-squared =  0.3121
       Total |  148.329751   525   .28253286           Root MSE      =  .44086

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

Data: WAGE1.DTA.

3.3 The expected value of the OLS estimators
The statistical properties of OLS.
We begin by studying the assumptions underlying the estimator.
These are mostly straightforward extensions of those we saw for the simple regression model (SLR.1-4).
We will also obtain the bias in OLS when an important variable has been omitted from the regression.
Keep in mind: statistical properties have nothing to do with a particular sample; they refer to the method of OLS applied in the context of random sampling.

Assumptions
Assumption MLR.1: Linear in parameters:
y = β0 + β1x1 + β2x2 + … + βkxk + u
Assumption MLR.2: Random sampling: we have a random sample of n observations,
{(xi1, xi2, …, xik, yi): i = 1, 2, …, n},
following the population model in Assumption MLR.1.
Assumption MLR.3: No perfect collinearity: In the sample, none of the independent variables is constant and there are no exact linear relationships among the independent variables.
Assumption MLR.4: Zero conditional mean: the error u has an expected value of zero given any values of the independent variables:
E(u|x1, x2, …, xk) = 0

Assumptions MLR.1-2 are straightforward.
Assumption MLR.3 is new: No perfect collinearity.
Key in practice: no exact linear dependence between independent variables.
If there is exact linear dependence between variables, we say there is perfect collinearity. In such a case we cannot estimate the parameters using OLS.
Examples (a, a1 and a2 are constants):
x2 = a*x1
x3 = a1*x1 + a2*x2
Any intuition as to why perfect collinearity implies OLS won't work? (See the numerical sketch below.)
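A minimal numerical sketch of this point (not from the original slides; the data are simulated): if x3 = 2*x1 + x2, the matrix X'X is rank deficient, so the OLS first-order conditions have no unique solution:

import numpy as np

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 + x2                      # exact linear dependence

X = np.column_stack([np.ones(n), x1, x2, x3])
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))     # 3, not 4: rank deficient
print(np.linalg.cond(XtX))            # enormous condition number: no unique OLS solution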

Note: Nonlinear dependence is okay!
This type of model can be estimated by OLS:
cons = β0 + β1 inc + β2 inc² + u
(inc² is a nonlinear, not a linear, function of inc.)
But a model that includes both income measured in dollars and income measured in thousands of dollars as separate regressors cannot be estimated by OLS, since income_dollars = 1,000*income_thousandsdollars, i.e. there is exact linear dependence.

Zero conditional mean
MLR.4, E(u|x1, x2, …, xk) = 0, is a direct extension of SLR.4.
It is the most important of the four assumptions MLR.1-4, and requires the error u to be uncorrelated with all explanatory variables in the population model.
When MLR.4 holds, we say that the explanatory variables are exogenous.

Zero conditional mean
MLR.4 may fail for the following reasons:
Omitting an important explanatory variable that is correlated with any of x1, x2, …, xk
A mis-specified functional relationship between the dependent and independent variables (e.g. an omitted squared term; using a level instead of a log, or a log instead of a level)
The first of these, omitted variables, is by far the biggest concern in empirical work.

Theorem 3.1:
Under MLR.1-4, OLS is unbiased:
E(β̂j) = βj,  j = 0, 1, …, k
A proof is provided in Appendix 3A.
You do not have to know how to prove that OLS is unbiased for the multiple regression model. But you need to know:
The definition above and what it means
The assumptions you need for unbiasedness (MLR.1-4)

Omitted variable bias: The simple case
Suppose we omit a variable that actually belongs in the true (population) model.
The reason may simply be lack of data (e.g. ability in wage regressions).
This generally causes the OLS estimators to be biased.
Now let's study this bias in a bit more detail.

True (population) model:
y = β0 + β1x1 + β2x2 + u
for which we assume that assumptions MLR.1-4 hold.
Suppose y is log wage, x1 is education and x2 is innate ability.
Suppose we are primarily interested in β1.
Given the true model, we should run a regression of log wage on education and ability. But due to data unavailability (say) we estimate the wage model excluding ability; hence our estimated equation becomes:
ỹ = β̃0 + β̃1x1

Example: the wage equation with ability omitted.

E(β̃1) = β1 + β2δ̃1    (3.45)
Because the bias in this case arises from omitting one of the explanatory variables, this is often referred to as omitted variable bias.
There are two cases where there is no bias: what are they?
Discuss: a) the sign of the bias; b) the size of the bias.
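A simulation sketch of the omitted variable bias formula (not from the original slides; all numbers are illustrative): the true model contains x1 and x2, x2 is correlated with x1, but we regress y on x1 alone:

import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 1000
beta1, beta2 = 1.0, 0.5
est = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)           # x2 (e.g. ability) correlated with x1 (e.g. education)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])        # x2 omitted from the estimated model
    est.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
print(np.mean(est))   # roughly beta1 + beta2*0.6 = 1.3, not 1.0: upward bias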

Signing the bias
                      Corr(x1, x2) > 0     Corr(x1, x2) < 0
  β2 > 0              positive bias        negative bias
  β2 < 0              negative bias        positive bias
Note that these results follow directly from equation (3.45) on the previous slide: the sign of the bias is the sign of β2δ̃1, and δ̃1 has the same sign as the correlation between x1 and x2.

Omitted variable bias: More general cases
Deriving the sign of omitted variable bias when there are multiple regressors in the estimated model is more difficult.
In general, correlation between a single explanatory variable and the error results in all estimates being biased.

3.4 Variance of the OLS estimators
We now obtain the variance of the OLS estimators, so that we have a measure of the spread in their sampling distributions.
Assumption MLR.5: Homoskedasticity. The error u has the same variance given any value of the explanatory variables:
Var(u|x1, x2, …, xk) = σ²
This means that the variance of the error term u, conditional on the explanatory variables, is the same for all values of the explanatory variables.
If this is not the case, there is heteroskedasticity and the variance formulas below no longer apply.

Theorem 3.2: Sampling variance of the OLS slope estimators
Under assumptions MLR.1-5 (known as the Gauss-Markov assumptions), conditional on the sample values of the regressors,
Var(β̂j) = σ² / [SSTj (1 − Rj²)]
for j = 1, 2, …, k, where
SSTj = Σi (xij − x̄j)²
is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other regressors (and including an intercept).

Interpreting the variance formula
The variance of the estimator is high (which is typically undesirable) if
The error variance σ² is high
The total sample variation in xj (SSTj) is low (e.g. due to low variance in xj or a small sample)
The Rj² is high. Note that, as Rj² gets close to 1 due to near linear dependence amongst the regressors (multicollinearity), the variance can become very large (a numerical sketch follows below).
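A rough numerical sketch of the 1/(1 − Rj²) term (not from the original slides; the data are simulated): as x1 becomes closer to an exact linear function of the other regressor, R1² approaches 1 and the variance inflation factor explodes:

import numpy as np

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return 1 - e @ e / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(5)
n = 300
x2 = rng.normal(size=n)
for noise in (1.0, 0.1, 0.01):                          # less noise => x1 closer to linear in x2
    x1 = x2 + noise * rng.normal(size=n)
    R1sq = r_squared(np.column_stack([np.ones(n), x2]), x1)
    print(noise, R1sq, 1 / (1 - R1sq))                  # the inflation factor 1/(1 - R1^2) explodes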

Estimating standard errors of the OLS estimates
The main practical use of the variance formula is for computing standard errors of the OLS estimates (and we use standard errors to test various hypotheses about the population parameters).
A technicality in this context is that the true parameter σ² is not observed. But it can be estimated as follows:
σ̂² = SSR / (n − k − 1)
where SSR = Σi ûi² is the sum of squared OLS residuals.

Standard errors (cont'd)
Degrees of freedom (df): df = n − (k + 1) = (number of observations) − (number of estimated parameters)
The standard errors:
se(β̂j) = σ̂ / [SSTj (1 − Rj²)]^(1/2)
We use standard errors for hypothesis testing; more on this in Chapter 4.
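A sketch tying the formula to computation (not from the original slides; the data are simulated): the formula-based standard error equals the usual matrix-based one, the square root of the corresponding diagonal element of σ̂²(X'X)⁻¹:

import numpy as np

rng = np.random.default_rng(6)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.2 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2_hat = resid @ resid / (n - k - 1)                # SSR / (n - k - 1)

def r_squared(Z, v):
    c = np.linalg.lstsq(Z, v, rcond=None)[0]
    e = v - Z @ c
    return 1 - e @ e / ((v - v.mean()) @ (v - v.mean()))

# formula-based standard error for the coefficient on x1
SST1 = ((x1 - x1.mean()) ** 2).sum()
R1sq = r_squared(np.column_stack([np.ones(n), x2]), x1)
se_formula = np.sqrt(sigma2_hat / (SST1 * (1 - R1sq)))

# matrix-based standard error for the same coefficient
se_matrix = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))[1]
print(se_formula, se_matrix)                            # identical up to floating-point error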

Standard errors in Stata

. regress colGPA hsGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.78
       Model |  3.42365506     2  1.71182753           Prob > F      =  0.0000
    Residual |  15.9824444   138  .115814814           R-squared     =  0.1764
-------------+------------------------------           Adj R-squared =  0.1645
       Total |  19.4060994   140  .138614996           Root MSE      =  .34032

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4534559   .0958129     4.73   0.000     .2640047    .6429071
         ACT |    .009426   .0107772     0.87   0.383    -.0118838    .0307358
       _cons |   1.286328   .3408221     3.77   0.000      .612419    1.960237
------------------------------------------------------------------------------

3.5 Efficiency of OLS: The Gauss-Markov Theorem
Theorem 3.4: Under assumptions MLR.1-5, OLS is the Best Linear Unbiased Estimator (BLUE) of the population parameters.
Best = smallest variance
It's reassuring to know that, under MLR.1-5, you cannot find a better estimator than OLS.
If one or several of these assumptions fail, OLS is no longer guaranteed to be BLUE.

Some nice problems in Chapter 3
Try the following problems in Chapter 3: 3.1, 3.4, 3.6, 3.7, 3.8
