
Multiple Regression Analysis

3.1 Motivation for multiple regression
In chapter 2 we learned how to use simple
regression analysis to explain a dependent
variable y as a function of a single explanatory
variable x.
Key assumption for causality:
SLR.4: The error u has an expected value of zero given
x: E(u|x)=0

Main drawback of that framework: all other factors affecting y have to be uncorrelated with x.

Multiple regression analysis is more suitable for causal (ceteris paribus) analysis.
Reason: We can explicitly control for other factors that affect the dependent variable y.
Example: Wage equation
wage = β0 + β1 educ + β2 exper + u
If we estimate the parameters of this model using OLS, what interpretation can we give to β1?
Why might this approach yield a more reliable estimate of the causal effect of education than if we were using a simple regression with educ as the sole explanatory variable?

General model with two independent variables:
y = β0 + β1x1 + β2x2 + u
where
β0 is the intercept
β1 measures the change in y with respect to x1, holding other factors fixed
β2 measures the change in y with respect to x2, holding other factors fixed
This framework can also be used to generalize the functional form. Example: modeling family consumption (cons) as a function of income (inc):
cons = β0 + β1 inc + β2 inc² + u
where inc² is entered as a separate variable. What is the effect of income on consumption in this model?
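One way to see the answer (a worked step that is not on the original slide): since inc enters both in levels and squared, the marginal effect of income is obtained by differentiating the fitted equation with respect to inc,
Δcons ≈ (β̂1 + 2β̂2·inc)·Δinc,
so the effect of an extra unit of income on consumption depends on the level of income.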

Key assumption for the model with two independent variables:
E(u|x1, x2) = 0
Note the similarity to assumption SLR.4 introduced last time.
Interpretation: for any values of x1 and x2 in the population, the average unobservable (u) is equal to zero.
Discuss this assumption in the context of the wage model introduced above (educ and exper impact wage).

The model with k independent variables
The multiple regression model:
y = β0 + β1x1 + β2x2 + … + βkxk + u    (3.6)
where
β0 is the intercept
β1 is the parameter associated with x1 (measures the change in y with respect to x1, holding other factors fixed)
β2 is the parameter associated with x2 (measures the change in y with respect to x2, holding other factors fixed)

The model with k independent variables (cont'd)
β1, β2, …, βk are often referred to as slope parameters.
u is the disturbance term (error term). It contains factors other than x1, x2, …, xk affecting y.

Interpretation of the parameters of the multiple regression model
Being able to interpret the parameters of the multiple regression model is one of our key goals in this course and we will get plenty of practice.
Checkpoint: Make sure you are able to interpret the parameters of the following model:
log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u
where ceoten = CEO tenure.

Key assumption for the model with k independent variables:
E(u|x1, x2, …, xk) = 0
Thus, all factors in the unobserved error term u are assumed uncorrelated with the explanatory variables.

3.2 Mechanics and interpretation of OLS
We focus first on the model with two independent variables. We write the estimated OLS regression in a form similar to the simple regression case:
ŷ = β̂0 + β̂1x1 + β̂2x2
where the hats on the parameters indicate that these are estimates of the true (unknown) population parameters β0, β1 and β2, and the hat on y means predicted (instead of actual) values of y.

How do we obtain the OLS estimates?
As discussed in Chapter 2, the method of ordinary least squares (OLS) chooses the estimates that minimize the sum of squared residuals.
That is, given n observations on the y and x1, …, xk variables, the OLS estimates minimize
Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²
where i refers to the observation number and the second index distinguishes different variables.

Model with k independent variables
The OLS estimates β̂0, β̂1, …, β̂k minimize
Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²
You know from earlier maths courses how to solve a minimization problem such as this one:
Write down the first-order condition for each parameter
Then solve for each parameter

Example: The f.o.c. for β̂1
The minimization problem: choose β̂0, β̂1, …, β̂k so as to minimize
S = Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²
The first order condition (f.o.c.) with respect to β̂1 is simply
∂S/∂β̂1 = −2 Σi xi1 (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
Note the use of the chain rule in taking this derivative.

General: k+1 unknown parameters & k+1 equations
Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
Σi xi1 (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
⋮
Σi xik (yi − β̂0 − β̂1xi1 − … − β̂kxik) = 0
These are the OLS first order conditions.
Note that they can also be interpreted as sample counterparts of the population moment conditions E(u) = 0 and E(xj u) = 0 (omitting the division by n). Compare with the approach used in Chapter 2.
The latter point highlights the importance of assuming zero covariance between the error term and the explanatory variables.

It is tedious but straightforward in principle to solve for the parameter estimates here. Each parameter estimate can be written as a function of the x and y values in the sample (and is linear in the y values).
Fortunately, the computer does the work for us (a small numerical example follows below). You will not be required to derive the solutions.
Note: we must assume that the equations above can be solved uniquely for the parameters. For now, we will just assume this and move on.
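To make this concrete, here is a minimal Python/numpy sketch (not part of the original slides; the data and coefficient values are simulated) that solves the OLS first-order conditions in matrix form, X'X b = X'y, and then checks the sample moment conditions from the previous slide:

import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)   # simulated "population" model

X = np.column_stack([np.ones(n), x1, x2])            # first column = intercept
b = np.linalg.solve(X.T @ X, X.T @ y)                # OLS estimates (beta0-hat, beta1-hat, beta2-hat)
resid = y - X @ b

print(b)               # the three parameter estimates
print(X.T @ resid)     # essentially zero: sum of residuals = 0 and sum of x_ij * residual_i = 0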

Interpreting the OLS regression function
More important than the details of the computation of the OLS estimates is the interpretation of the estimated equation.
Consider the model with two regressors:
ŷ = β̂0 + β̂1x1 + β̂2x2
The estimates β̂1 and β̂2 have a partial effect, or ceteris paribus, interpretation:
Δŷ = β̂1Δx1 + β̂2Δx2, so holding x2 fixed (Δx2 = 0) gives Δŷ = β̂1Δx1.
Explain what is meant by this statement.
Explain how to interpret the intercept β̂0.

Example 3.1: Determinants of college GPA
Data: GPA1.DTA, collected by a student at Michigan State University.
Variables: college grade point average (colGPA), high school GPA (hsGPA) and achievement test score (ACT).
Summary statistics for these variables:
. summarize colGPA hsGPA ACT

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      colGPA |       141    3.056738    .3723103        2.2          4
       hsGPA |       141    3.402128    .3199259        2.4          4
         ACT |       141    24.15603    2.844252         16         33

Regression results

. regress colGPA hsGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.78
       Model |  3.42365506     2  1.71182753           Prob > F      =  0.0000
    Residual |  15.9824444   138  .115814814           R-squared     =  0.1764
-------------+------------------------------           Adj R-squared =  0.1645
       Total |  19.4060994   140  .138614996           Root MSE      =  .34032

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4534559   .0958129     4.73   0.000     .2640047    .6429071
         ACT |    .009426   .0107772     0.87   0.383    -.0118838    .0307358
       _cons |   1.286328   .3408221     3.77   0.000      .612419    1.960237
------------------------------------------------------------------------------

Interpret the coefficients. Carefully state what is held constant when you are evaluating the results. Are the estimated effects small or large?
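Written out as an equation (a restatement of the Stata output above, not a separate slide), the estimated regression is
colGPA-hat = 1.29 + .453 hsGPA + .0094 ACT    (n = 141, R² = .176).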

Now consider results from the following simple regression

. reg colGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  1,   139) =    6.21
       Model |  .829558811     1  .829558811           Prob > F      =  0.0139
    Residual |  18.5765406   139  .133644177           R-squared     =  0.0427
-------------+------------------------------           Adj R-squared =  0.0359
       Total |  19.4060994   140  .138614996           Root MSE      =  .36557

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |    .027064   .0108628     2.49   0.014     .0055862    .0485417
       _cons |   2.402979   .2642027     9.10   0.000     1.880604    2.925355
------------------------------------------------------------------------------

Compare the estimated coefficient on ACT in this model to that on the previous slide: how do they differ and why?
Related: How does the interpretation of the estimated parameters differ across the two specifications?

Interpretation of the equation with k independent variables
The case with more than two independent variables is similar.
For example, the coefficient on x1 measures the change in ŷ due to a one-unit increase in x1, holding all other independent variables fixed:
Δŷ = β̂1Δx1, holding x2, x3, …, xk constant.
Econometric jargon: We have "controlled for" the variables x2, x3, …, xk when estimating the effect of x1 on y.

Example 3.2: The wage equation, with and without controls for tenure & experience

a) Simple regression

. ge logwage=ln(wage)

. reg logwage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  119.58
       Model |  27.5606288     1  27.5606288           Prob > F      =  0.0000
    Residual |  120.769123   524  .230475425           R-squared     =  0.1858
-------------+------------------------------           Adj R-squared =  0.1843
       Total |  148.329751   525   .28253286           Root MSE      =  .48008

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0827444   .0075667    10.94   0.000     .0678796    .0976091
       _cons |   .5837727   .0973358     6.00   0.000     .3925563    .7749891
------------------------------------------------------------------------------

b) Multiple regression

. reg logwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   80.39
       Model |  46.8741776     3  15.6247259           Prob > F      =  0.0000
    Residual |  101.455574   522  .194359337           R-squared     =  0.3160
-------------+------------------------------           Adj R-squared =  0.3121
       Total |  148.329751   525   .28253286           Root MSE      =  .44086

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

Data: WAGE1.DTA.

Interpretation of these results
Does it make sense to assume zero covariance between the residual and the regressors?
What is the causal effect of education?
Is general experience more or less important for wages than specific experience (specific to the firm)?
What is the effect on the wage when an individual stays at the same firm for another year? (Clue: more than one explanatory variable changes here; see p. 77 in the book.)
How many years of tenure correspond to one year of education in terms of wages?
Why is the estimated education effect higher in the multiple regression (the table shown on the right on the previous slide)?
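As a worked illustration of the third question above (this calculation is not on the original slide): when a worker stays one more year at the same firm, both exper and tenure increase by one, so the predicted change in log wage is approximately .0041 + .0221 = .0262, i.e. a wage roughly 2.6% higher, holding education fixed.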

Correlation matrix

. corr educ exper tenure
(obs=526)

             |     educ    exper   tenure
-------------+---------------------------
        educ |   1.0000
       exper |  -0.2995   1.0000
      tenure |  -0.0562   0.4993   1.0000
Education is clearly negatively correlated with experience and tenure in this data set.
So, those with high levels of education will have less experience, on average (e.g. for the simple reason that they enter the labor market later).
Presumably, education and experience both raise wages.
By not controlling for experience, we may therefore underestimate the effect of education.

OLS fitted values and residuals
For observation i the fitted value is simply
ŷi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂kxik
The residual for observation i is defined just as in the simple regression case:
ûi = yi − ŷi
Properties:
The sample average of the residuals is zero, so ȳ equals the average of the fitted values ŷi.
The sample covariance between each independent variable and the residuals is zero.
The point (x̄1, x̄2, …, x̄k, ȳ) always lies on the OLS regression line.

Comparison of simple & multiple regression estimates
Simple regression: ỹ = β̃0 + β̃1x1
Multiple regression: ŷ = β̂0 + β̂1x1 + β̂2x2
We know that the simple regression coefficient on x1 is generally different from the multiple regression coefficient on x1.
Here is how the two parameters are related:
β̃1 = β̂1 + β̂2δ̃1
where δ̃1 is the slope coefficient from a simple regression of x2 on x1. How can this relationship be interpreted?
So, the two estimates are the same if the second term on the RHS is zero, i.e. if β̂2 = 0 and/or δ̃1 = 0.
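As a small numerical check of this relationship (not part of the original slides; the data are simulated), the identity β̃1 = β̂1 + β̂2δ̃1 holds exactly in any sample:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # x2 correlated with x1
y = 2.0 + 1.0 * x1 + 0.7 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_mult = ols(np.column_stack([ones, x1, x2]), y)    # beta0-hat, beta1-hat, beta2-hat
b_simp = ols(np.column_stack([ones, x1]), y)        # beta0-tilde, beta1-tilde
delta = ols(np.column_stack([ones, x1]), x2)        # delta1-tilde: slope from regressing x2 on x1

print(b_simp[1], b_mult[1] + b_mult[2] * delta[1])  # the two numbers agree up to floating-point error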

Goodness-of-fit:
Same as for the simple regression model:
SST = Total Sum of Squares = Σi (yi − ȳ)²
SSE = Explained Sum of Squares = Σi (ŷi − ȳ)²
SSR = Residual Sum of Squares = Σi ûi²
R² = SSE/SST = 1 − SSR/SST

Some points about the R-squared
The R-squared is equal to the squared correlation between actual and fitted y.
The R-squared never decreases, and usually increases, when another independent variable is added to a regression.
This is because the SSR can never increase when you add more regressors to the model (why?).
Why is the R-squared a poor tool for deciding whether a particular variable should be added to the model?
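A quick illustration of this point (not from the original slides; the data are simulated): adding a regressor that is pure noise cannot lower the R-squared, which is exactly why the R-squared alone is a poor tool for deciding whether to add a variable:

import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones(n)
noise = rng.normal(size=n)                                 # unrelated to y by construction
print(r_squared(np.column_stack([ones, x1]), y))
print(r_squared(np.column_stack([ones, x1, noise]), y))    # never smaller than the first value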

Compare and interpret the R-squareds

a) Simple regression

. ge logwage=ln(wage)

. reg logwage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  119.58
       Model |  27.5606288     1  27.5606288           Prob > F      =  0.0000
    Residual |  120.769123   524  .230475425           R-squared     =  0.1858
-------------+------------------------------           Adj R-squared =  0.1843
       Total |  148.329751   525   .28253286           Root MSE      =  .48008

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0827444   .0075667    10.94   0.000     .0678796    .0976091
       _cons |   .5837727   .0973358     6.00   0.000     .3925563    .7749891
------------------------------------------------------------------------------

b) Multiple regression

. reg logwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   80.39
       Model |  46.8741776     3  15.6247259           Prob > F      =  0.0000
    Residual |  101.455574   522  .194359337           R-squared     =  0.3160
-------------+------------------------------           Adj R-squared =  0.3121
       Total |  148.329751   525   .28253286           Root MSE      =  .44086

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

Data: WAGE1.DTA.

3.3 The expected value of the OLS estimators
The statistical properties of OLS.
We begin by studying the assumptions underlying the estimator.
These are mostly straightforward extensions of those we saw for the simple regression model (SLR.1-4).
We will also obtain the bias in OLS when an important variable has been omitted from the regression.
Keep in mind: statistical properties have nothing to do with a particular sample; they refer to the method of OLS applied in the context of random sampling.

Assumptions
Assumption MLR.1: Linear in parameters:
y = β0 + β1x1 + β2x2 + … + βkxk + u
Assumption MLR.2: Random sampling: we have a random sample of n observations,
{(xi1, xi2, …, xik, yi): i = 1, 2, …, n},
following the population model in Assumption MLR.1.
Assumption MLR.3: No perfect collinearity: In the sample, none of the independent variables is constant and there are no exact linear relationships among the independent variables.
Assumption MLR.4: Zero conditional mean: the error u has an expected value of zero given any values of the independent variables:
E(u|x1, x2, …, xk) = 0

Assumptions MLR.1-2 are straightforward.
Assumption MLR.3 is new: No perfect collinearity.
Key in practice: no exact linear dependence between independent variables.
If there is exact linear dependence between variables, we say there is perfect collinearity. In such a case we cannot estimate the parameters using OLS.
Examples (a, a1 and a2 are constants):
x2 = a*x1
x3 = a1*x1 + a2*x2
Any intuition as to why perfect collinearity implies OLS won't work? (See the numerical sketch below.)
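A minimal numerical sketch of this point (not from the original slides; the data are simulated): if x3 = 2*x1 + x2, the matrix X'X is rank deficient, so the OLS first-order conditions have no unique solution:

import numpy as np

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 + x2                      # exact linear dependence

X = np.column_stack([np.ones(n), x1, x2, x3])
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))     # 3, not 4: rank deficient
print(np.linalg.cond(XtX))            # enormous condition number: no unique OLS solution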

Note: Nonlinear dependence is okay!
This type of model can be estimated by OLS:
cons = β0 + β1 inc + β2 inc² + u
(inc² is a nonlinear, not a linear, function of inc.)
But a model that includes both income measured in dollars and income measured in thousands of dollars as separate regressors cannot be estimated by OLS, since income_dollars = 1,000*income_thousandsdollars, i.e. there is exact linear dependence.

Zero conditional mean
MLR.4, E(u|x1, x2, …, xk) = 0, is a direct extension of SLR.4.
It is the most important of the four assumptions MLR.1-4, and requires the error u to be uncorrelated with all explanatory variables in the population model.
When MLR.4 holds, we say that the explanatory variables are exogenous.

Zero conditional mean
MLR.4 may fail for the following reasons:
Omitting an important explanatory variable that is correlated with any of x1, x2, …, xk
A mis-specified functional relationship between the dependent and independent variables (e.g. an omitted squared term; using a level instead of a log, or a log instead of a level)
The first of these, omitted variables, is by far the biggest concern in empirical work.

Theorem 3.1:
Under MLR.1-4, OLS is unbiased:
E(β̂j) = βj,  j = 0, 1, …, k
A proof is provided in Appendix 3A.
You do not have to know how to prove that OLS is unbiased for the multiple regression model. But you need to know:
The definition above and what it means
The assumptions you need for unbiasedness (MLR.1-4)

Omitted variable bias: The simple case
Suppose we omit a variable that actually belongs in the true (population) model.
The reason may simply be lack of data (e.g. ability in wage regressions).
This generally causes the OLS estimators to be biased.
Now let's study this bias in a bit more detail.

True (population) model:
y = β0 + β1x1 + β2x2 + u
for which we assume that assumptions MLR.1-4 hold.
Suppose y is log wage, x1 is education and x2 is innate ability.
Suppose we are primarily interested in β1.
Given the true model, we should run a regression of log wage on education and ability. But due to data unavailability (say) we estimate the wage model excluding ability; hence our estimated equation becomes:
ỹ = β̃0 + β̃1x1

Example: the wage equation with ability omitted.

E(β̃1) = β1 + β2δ̃1    (3.45)
Because the bias in this case arises from omitting one of the explanatory variables, this is often referred to as omitted variable bias.
There are two cases where there is no bias: what are they?
Discuss: a) the sign of the bias; b) the size of the bias.
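A simulation sketch of the omitted variable bias formula (not from the original slides; all numbers are illustrative): the true model contains x1 and x2, x2 is correlated with x1, but we regress y on x1 alone:

import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 1000
beta1, beta2 = 1.0, 0.5
est = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)           # x2 (e.g. ability) correlated with x1 (e.g. education)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])        # x2 omitted from the estimated model
    est.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
print(np.mean(est))   # roughly beta1 + beta2*0.6 = 1.3, not 1.0: upward bias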

Signing the bias
                      Corr(x1, x2) > 0     Corr(x1, x2) < 0
  β2 > 0              positive bias        negative bias
  β2 < 0              negative bias        positive bias
Note that these results follow directly from equation (3.45) on the previous slide: the sign of the bias is the sign of β2δ̃1, and δ̃1 has the same sign as the correlation between x1 and x2.

Omitted variable bias: More general cases
Deriving the sign of omitted variable bias when there are multiple regressors in the estimated model is more difficult.
In general, correlation between a single explanatory variable and the error results in all estimates being biased.

3.4 Variance of the OLS estimators
We now obtain the variance of the OLS estimators, so that we have a measure of the spread in their sampling distributions.
Assumption MLR.5: Homoskedasticity. The error u has the same variance given any value of the explanatory variables:
Var(u|x1, x2, …, xk) = σ²
This means that the variance of the error term u, conditional on the explanatory variables, is the same for all values of the explanatory variables.
If this is not the case, there is heteroskedasticity and the variance formulas below no longer apply.

Theorem 3.2: Sampling variance of the OLS slope estimators
Under assumptions MLR.1-5 (known as the Gauss-Markov assumptions), conditional on the sample values of the regressors,
Var(β̂j) = σ² / [SSTj (1 − Rj²)]
for j = 1, 2, …, k, where
SSTj = Σi (xij − x̄j)²
is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other regressors (and including an intercept).

Interpreting the variance formula
The variance of the estimator is high (which is typically undesirable) if
The error variance σ² is high
The total sample variation in xj (SSTj) is low (e.g. due to low variance in xj or a small sample)
The Rj² is high. Note that, as Rj² gets close to 1 due to near linear dependence amongst the regressors (multicollinearity), the variance can become very large (a numerical sketch follows below).
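A rough numerical sketch of the 1/(1 − Rj²) term (not from the original slides; the data are simulated): as x1 becomes closer to an exact linear function of the other regressor, R1² approaches 1 and the variance inflation factor explodes:

import numpy as np

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return 1 - e @ e / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(5)
n = 300
x2 = rng.normal(size=n)
for noise in (1.0, 0.1, 0.01):                          # less noise => x1 closer to linear in x2
    x1 = x2 + noise * rng.normal(size=n)
    R1sq = r_squared(np.column_stack([np.ones(n), x2]), x1)
    print(noise, R1sq, 1 / (1 - R1sq))                  # the inflation factor 1/(1 - R1^2) explodes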

Estimating standard errors of the OLS estimates
The main practical use of the variance formula is for computing standard errors of the OLS estimates (and we use standard errors to test various hypotheses about the population parameters).
A technicality in this context is that the true parameter σ² is not observed. But it can be estimated as follows:
σ̂² = SSR / (n − k − 1)
where SSR = Σi ûi² is the sum of squared OLS residuals.

Standard errors (cont'd)
Degrees of freedom (df): df = n − (k + 1) = (number of observations) − (number of estimated parameters)
The standard errors:
se(β̂j) = σ̂ / [SSTj (1 − Rj²)]^(1/2)
We use standard errors for hypothesis testing; more on this in Chapter 4.
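A sketch tying the formula to computation (not from the original slides; the data are simulated): the formula-based standard error equals the usual matrix-based one, the square root of the corresponding diagonal element of σ̂²(X'X)⁻¹:

import numpy as np

rng = np.random.default_rng(6)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.2 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2_hat = resid @ resid / (n - k - 1)                # SSR / (n - k - 1)

def r_squared(Z, v):
    c = np.linalg.lstsq(Z, v, rcond=None)[0]
    e = v - Z @ c
    return 1 - e @ e / ((v - v.mean()) @ (v - v.mean()))

# formula-based standard error for the coefficient on x1
SST1 = ((x1 - x1.mean()) ** 2).sum()
R1sq = r_squared(np.column_stack([np.ones(n), x2]), x1)
se_formula = np.sqrt(sigma2_hat / (SST1 * (1 - R1sq)))

# matrix-based standard error for the same coefficient
se_matrix = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))[1]
print(se_formula, se_matrix)                            # identical up to floating-point error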

Standard errors in Stata

. regress colGPA hsGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.78
       Model |  3.42365506     2  1.71182753           Prob > F      =  0.0000
    Residual |  15.9824444   138  .115814814           R-squared     =  0.1764
-------------+------------------------------           Adj R-squared =  0.1645
       Total |  19.4060994   140  .138614996           Root MSE      =  .34032

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4534559   .0958129     4.73   0.000     .2640047    .6429071
         ACT |    .009426   .0107772     0.87   0.383    -.0118838    .0307358
       _cons |   1.286328   .3408221     3.77   0.000      .612419    1.960237
------------------------------------------------------------------------------

3.5 Efficiency of OLS: The Gauss-Markov Theorem
Theorem 3.4: Under assumptions MLR.1-5, OLS is the Best Linear Unbiased Estimator (BLUE) of the population parameters.
Best = smallest variance
It's reassuring to know that, under MLR.1-5, you cannot find a better estimator than OLS.
If one or several of these assumptions fail, OLS is no longer guaranteed to be BLUE.

Some nice problems in Chapter 3
Try the following problems in Chapter 3: 3.1, 3.4, 3.6, 3.7, 3.8
