
Review: Ch 10 and 12

Please do review all the material we have covered.
We can only go over a fraction of it in one review class, so please make sure to review
all of our lecture material and textbook readings, not just the slides here.
Copyright 2015 Pearson, Inc. All rights reserved.


Panel data notation, ctd.


Panel data with k regressors:
$(X_{1it}, X_{2it}, \ldots, X_{kit}, Y_{it})$, $i = 1, \ldots, n$, $t = 1, \ldots, T$
n = number of entities (states)
T = number of time periods (years)
Some jargon
Another term for panel data is longitudinal data
balanced panel: no missing observations, that is, all
variables are observed for all entities (states) and all time
periods (years)


Why are panel data useful?


With panel data we can control for factors that:
Vary across entities but do not vary over time
Could cause omitted variable bias if they are
omitted
Are unobserved or unmeasured and therefore
cannot be included as regressors in a multiple
regression
Here's the key idea:
If an omitted variable does not change over time,
then any changes in Y over time cannot be caused
by the omitted variable.
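
The key idea can be made concrete by comparing one entity in two periods. The sketch below assumes a single regressor and writes the unobserved time-invariant factor as $Z_i$ with coefficient $\beta_2$; that notation is an assumption added here for illustration, since this slide does not write out a model.

```latex
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} \\
Y_{is} &= \beta_0 + \beta_1 X_{is} + \beta_2 Z_i + u_{is} \\
Y_{it} - Y_{is} &= \beta_1 (X_{it} - X_{is}) + (u_{it} - u_{is})
\end{aligned}
```

Because $Z_i$ is the same in periods $t$ and $s$, it drops out of the difference, so the change in Y cannot be attributed to it.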

Panel Data Outline


1. The FIXED effects Model:
a) What is it
b) Potential issues
c) TIME Fixed Effects
2. The RANDOM effects Model
a) What is it
b) Potential issues
3. Choosing between the FIXED effects and RANDOM effects Model


The Fixed Effects Model

Fixed Effects
Fixed-effects (FE) models explore the relationship between the
independent variables and the dependent variable within an
entity (country, state, institution, etc.).
Each entity (state) has its own individual characteristics that
may or may not influence the dependent variable.
Why use FE? Because we believe that something within the
entity (state) may bias the estimated effects of the independent
variables; we need to control for this to get unbiased estimates.
Therefore, FE removes the effect of those time-invariant
characteristics so that we can assess the net effect of the
independent variables on the dependent variable.

Fixed Effects
Fixed effects form
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
$\alpha_i$ is called a state fixed effect or state effect: it is
the constant (fixed) effect of being in state i
Again, FE removes the effect of those time-invariant
characteristics from the independent variables;
thus we CANNOT get a coefficient for a specific time-invariant
variable (race, gender, etc.) since they are all lumped in with
the intercept in the term $\alpha_i$
Thus $\alpha_i = \beta_0$ + the combined effect of all time-invariant variables for that particular entity i
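
To see how the state effect drops out in practice, here is a minimal Python sketch of the "within" (entity-demeaning) estimator for a single regressor. It is not the Stata routine used in these slides; the function names and the toy panel (two hypothetical states, three years, no noise) are assumptions made purely for illustration.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(entities, x, y):
    """Fixed-effects slope: demean x and y within each entity, then run OLS."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for e, a, b in zip(entities, x, y):
        totals[e][0] += a
        totals[e][1] += b
        totals[e][2] += 1
    x_dm = [a - totals[e][0] / totals[e][2] for e, a in zip(entities, x)]
    y_dm = [b - totals[e][1] / totals[e][2] for e, b in zip(entities, y)]
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    sxx = sum(a * a for a in x_dm)
    return sxy / sxx


# Hypothetical toy panel: true slope is -0.5, state effects are 2.0 and 5.0.
entities = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alpha = {"A": 2.0, "B": 5.0}
y = [alpha[e] - 0.5 * a for e, a in zip(entities, x)]

beta_pooled = ols_slope(x, y)           # ignores the state effects
beta_fe = within_slope(entities, x, y)  # removes the state effects
print(beta_pooled, beta_fe)
```

In this toy panel the state with the larger fixed effect also has larger X, so pooled OLS returns a positive slope, while the within estimator recovers the slope of -0.5 used to build the data.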


The regression lines for each state in a picture
[Figure omitted in this text version]


Summary of issues and solutions


If we have HETEROSKEDASTICITY ONLY
We detect it using the xttest3 test
We correct it via the robust option

If we have SERIAL CORRELATION ONLY
We detect it via the xtserial test
We correct it via the xtregar ..., fe command

If we have both HETEROSKEDASTICITY & SERIAL CORRELATION
We use xtreg ..., fe vce(cluster id)
____________________________________

If we have ERRORS correlated ACROSS entities
We detect it using the Pesaran test
We correct it via the xtscc command


Example: Traffic deaths and beer taxes in STATA
First let STATA know you are working with panel
data by defining the entity variable (state) and
time variable (year):

. xtset state year;

       panel variable:  state (strongly balanced)
        time variable:  year, 1982 to 1988
                delta:  1 unit


. xtreg vfrall beertax, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                F(1,47)            =      5.05
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
       _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
------------------------------------------------------------------------------

The panel data command xtreg with the option fe performs fixed effects
regression. The reported intercept is arbitrary, and the estimated
individual effects are not reported in the default output.
The fe option means use fixed effects regression.
The vce(cluster state) option tells STATA to use clustered standard
errors. What are these? Let's figure them out together.


Regression with Time Fixed Effects (SW Section 10.4)
An omitted variable might vary over time but not
across states:
Safer cars (air bags, etc.); changes in national
laws
These produce intercepts that change over time
Let $S_t$ denote the combined effect of variables
that change over time but not across states (e.g., safer
cars).
The resulting population regression model is:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$
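
As a worked step, the population regression model above can be rewritten with one intercept per state and one per year. The symbol $\lambda_t$ for the time effect follows the usual SW Chapter 10 notation and is an assumption here, since it does not appear on this slide.

```latex
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it} \\
       &= \beta_1 X_{it} + \underbrace{(\beta_0 + \beta_2 Z_i)}_{\alpha_i}
          + \underbrace{\beta_3 S_t}_{\lambda_t} + u_{it} \\
       &= \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}
\end{aligned}
```

In practice the $\lambda_t$ terms are estimated by including a dummy variable for each year except one, which is what the year indicators in the STATA example do.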

gen y83 = (year==1983);
gen y84 = (year==1984);
gen y85 = (year==1985);
gen y86 = (year==1986);
gen y87 = (year==1987);
gen y88 = (year==1988);
. global yeardum "y83 y84 y85 y86 y87 y88";
. xtreg vfrall beertax $yeardum, fe vce(cluster state);

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0803                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0876                                        max =         7

corr(u_i, Xb)  = -0.6781                        Prob > F           =    0.0009

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6399799   .3570783    -1.79   0.080    -1.358329    .0783691
         y83 |  -.0799029   .0350861    -2.28   0.027    -.1504869   -.0093188
         y84 |  -.0724206   .0438809    -1.65   0.106    -.1606975    .0158564
         y85 |  -.1239763   .0460559    -2.69   0.010    -.2166288   -.0313238
         y86 |  -.0378645   .0570604    -0.66   0.510    -.1526552    .0769262
         y87 |  -.0509021   .0636084    -0.80   0.428    -.1788656    .0770615
         y88 |  -.0518038   .0644023    -0.80   0.425    -.1813645    .0777568
       _cons |    2.42847   .2016885    12.04   0.000     2.022725    2.834215
-------------+----------------------------------------------------------------


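Are the time effects jointly statistically significant?

. test $yeardum;

 ( 1)  y83 = 0
 ( 2)  y84 = 0
 ( 3)  y85 = 0
 ( 4)  y86 = 0
 ( 5)  y87 = 0
 ( 6)  y88 = 0

       F(  6,    47) =    4.22
            Prob > F =    0.0018

Yes: with a p-value of 0.0018, we reject the null hypothesis that all six year effects are zero.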


The Random Effects Model


The Random Effects Model (cont.)
Advantages of the random effects model:
1. Can now also estimate the coefficients on time-invariant
explanatory variables (like race or gender).
Disadvantages of the random effects model:
1. Most importantly, the random effects estimator
requires us to assume that $a_i$ (the entity-specific effect term) is
uncorrelated with the independent variables, the Xs,
if we're going to avoid omitted variable bias.
This may be an overly strong assumption in
many cases.

Remember our FE model results

. xtreg vfrall beertax, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                F(1,47)            =      5.05
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
       _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
------------------------------------------------------------------------------


Choosing Between Fixed and Random Effects
One key is the nature of the relationship between $a_i$ and the Xs:
If they're likely to be correlated, then it makes sense to use
the fixed effects model
If not, then it makes sense to use the random effects model
Use the Hausman test to examine whether there is correlation
between $a_i$ and X
Essentially, this procedure tests whether the regression
coefficients under the fixed effects and random effects models are
statistically different from each other
If they are different, then the fixed effects model is
preferred
If they are not different, then the random effects
model is preferred (or estimates of both the fixed effects and
random effects models are provided)
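
For a single coefficient, the comparison behind the Hausman test reduces to one line of arithmetic. The Python sketch below assumes the classical form of the statistic, which uses conventional (not clustered) variance estimates; the function name and all numbers are hypothetical and are not taken from the traffic deaths data.

```python
def hausman_single(b_fe, var_fe, b_re, var_re):
    """Classical Hausman statistic for one coefficient.

    H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)).
    Under the null hypothesis (RE assumption holds), H is approximately
    chi-squared with 1 degree of freedom.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    return (b_fe - b_re) ** 2 / diff_var


# Hypothetical estimates and standard errors, for illustration only.
b_fe, se_fe = -0.66, 0.20
b_re, se_re = -0.05, 0.12

h_stat = hausman_single(b_fe, se_fe ** 2, b_re, se_re ** 2)
CHI2_1_CRIT_5PCT = 3.84  # 5% critical value of chi-squared(1)
prefer_fe = h_stat > CHI2_1_CRIT_5PCT
print(h_stat, prefer_fe)
```

A statistic above the critical value means the FE and RE coefficients differ by more than sampling noise would explain, which points toward the fixed effects model.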

Choosing Between Fixed and Random Effects, ctd.
Run both the FE and the RE regression
Method 1:
xtoverid
H0: the independent variables are uncorrelated with the group-specific
error (the extra RE orthogonality conditions)
Method 2:
hausman: an alternative command that tests the same hypothesis


Instrumental Variables Outline


1. IV Regression: Why and What; Two Stage Least Squares
2. The General IV Regression Model
3. Checking Instrument Validity
a) Weak and strong instruments
b) Instrument exogeneity


IV Regression: Why?
Three important threats to internal validity are:
Omitted variable bias from a variable that is correlated with
X but is unobserved (so cannot be included in the
regression) and for which there are inadequate control
variables;
Simultaneous causality bias (X causes Y, Y causes X);
Errors-in-variables bias (X is measured with error)
All three problems result in $E(u|X) \neq 0$.
Instrumental variables regression can eliminate bias when
$E(u|X) \neq 0$ using an instrumental variable (IV), Z.


The IV Estimator with a Single Regressor and a Single Instrument (SW Section 12.1)
$Y_i = \beta_0 + \beta_1 X_i + u_i$
IV regression breaks X into two parts:
a part that might be correlated with u, and
a part that is not.
By isolating the part that is not correlated with u, it
is possible to estimate $\beta_1$.
This is done using an instrumental variable, $Z_i$,
which is correlated with $X_i$ but uncorrelated with $u_i$.

Two Stage Least Squares: Summary
Suppose $Z_i$ satisfies the two conditions for a valid instrument:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
Two-stage least squares:
Stage 1: Regress $X_i$ on $Z_i$ (including an intercept), obtain the
predicted values $\hat{X}_i$
Stage 2: Regress $Y_i$ on $\hat{X}_i$ (including an intercept); the
coefficient on $\hat{X}_i$ is the TSLS estimator, $\hat{\beta}_1^{TSLS}$.
$\hat{\beta}_1^{TSLS}$ is a consistent estimator of $\beta_1$.
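
The two stages can be sketched in a few lines of Python for the one-regressor, one-instrument case. This is an illustration under stated assumptions, not the STATA implementation: the helper names and the toy data are hypothetical, and the data are built so that the instrument is exactly uncorrelated with the error in the sample while X is not.

```python
def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def tsls(z, x, y):
    """Two-stage least squares with one endogenous X and one instrument Z."""
    pi0, pi1 = ols(z, x)                   # Stage 1: regress X on Z
    x_hat = [pi0 + pi1 * zi for zi in z]   # predicted values of X
    return ols(x_hat, y)                   # Stage 2: regress Y on X-hat


# Hypothetical toy data. v is the part of X that is correlated with u;
# it is constructed to be exactly uncorrelated with z in this sample.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]
x = [1.0 + 2.0 * zi + vi for zi, vi in zip(z, v)]
u = [3.0 * vi for vi in v]                          # error correlated with X
y = [10.0 - 1.0 * xi + ui for xi, ui in zip(x, u)]  # true beta_1 = -1

b0_ols, b1_ols = ols(x, y)
b0_tsls, b1_tsls = tsls(z, x, y)
print(b1_ols, b1_tsls)
```

OLS is pulled away from the true slope of -1 because X and u move together, while TSLS recovers it. The standard errors from a hand-run second stage are not valid, which is one reason to use a dedicated command such as ivregress.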



Example #4: Demand for Cigarettes
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + u_i$
Why is the OLS estimator of $\beta_1$ likely to be biased?
Data set: Panel data on annual cigarette consumption and
average prices paid (including tax), by state, for the 48
continental US states, 1985-1995.
Proposed instrumental variable:
$Z_i$ = general sales tax per pack in the state = $SalesTax_i$
Do you think this instrument is plausibly valid?
1. Relevant? $\mathrm{corr}(SalesTax_i, \ln(P_i^{cigarettes})) \neq 0$?
2. Exogenous? $\mathrm{corr}(SalesTax_i, u_i) = 0$?

Combined into a single command:

               Y        X          Z
ivregress 2sls lpackpc (lravgprs = rtaxso) if year==1995, vce(robust)

Instrumental variables (2SLS) regression               Number of obs =      48
                                                       Wald chi2(1)  =   12.05
                                                       Prob > chi2   =  0.0005
                                                       R-squared     =  0.4011
                                                       Root MSE      =  .18635

------------------------------------------------------------------------------
             |               Robust
     lpackpc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lravgprs |  -1.083587   .3122035    -3.47   0.001    -1.695494    -.471679
       _cons |   9.719876   1.496143     6.50   0.000      6.78749    12.65226
------------------------------------------------------------------------------
Instrumented:  lravgprs     (this is the endogenous regressor)
Instruments:   rtaxso       (this is the instrumental variable)
------------------------------------------------------------------------------

Estimated cigarette demand equation:
$\widehat{\ln(Q_i^{cigarettes})} = 9.72 - 1.08 \ln(P_i^{cigarettes})$, n = 48
Standard errors: (1.53) for the intercept, (0.31) for the slope


The General IV Regression Model: Summary of Jargon
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + u_i$
$Y_i$ is the dependent variable
$X_{1i}, \ldots, X_{ki}$ are the endogenous regressors (potentially
correlated with $u_i$)
$W_{1i}, \ldots, W_{ri}$ are the included exogenous regressors
(uncorrelated with $u_i$) or control variables (included so that
$Z_i$ is uncorrelated with $u_i$, once the Ws are included)
$\beta_0, \beta_1, \ldots, \beta_{k+r}$ are the unknown regression coefficients
$Z_{1i}, \ldots, Z_{mi}$ are the m instrumental variables (the excluded
exogenous variables)
The coefficients are overidentified if m > k; exactly
identified if m = k; and underidentified if m < k.

Ws as control variables, ctd.

In many cases, the purpose of including the Ws is
to control for omitted factors, so that once the Ws
are included, Z is uncorrelated with u.
Technically, the condition for Ws being effective
control variables is that the conditional mean of $u_i$
does not depend on $Z_i$, given $W_i$:
$E(u_i | W_i, Z_i) = E(u_i | W_i)$
Here is the key idea: in many applications you need
to include control variables (Ws) so that Z is
plausibly exogenous (uncorrelated with u).

Example #4: Demand for cigarettes, ctd.

Suppose income is exogenous (this is plausible: why?), and we
also want to estimate the income elasticity:
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + \beta_2 \ln(Income_i) + u_i$
We actually have two instruments:
$Z_{1i}$ = general sales tax$_i$
$Z_{2i}$ = cigarette-specific tax$_i$
Endogenous variable: $\ln(P_i^{cigarettes})$ (one X)
Included exogenous variable: $\ln(Income_i)$ (one W)
Instruments (excluded exogenous variables): general sales
tax, cigarette-specific tax (two Zs)
Is $\beta_1$ over-, under-, or exactly identified?

Example: Cigarette demand, two instruments

      Y       W        X          Z1     Z2
ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;

IV (2SLS) regression with robust standard errors       Number of obs =      48
                                                       F(  2,    45) =   16.17
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4294
                                                       Root MSE      =  .18786

------------------------------------------------------------------------------
             |               Robust
     lpackpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lravgprs |  -1.277424   .2496099    -5.12   0.000    -1.780164   -.7746837
     lperinc |   .2804045   .2538894     1.10   0.275     -.230955    .7917641
       _cons |   9.894955   .9592169    10.32   0.000     7.962993    11.82692
------------------------------------------------------------------------------
Instrumented:  lravgprs
Instruments:   lperinc rtaxso rtax
------------------------------------------------------------------------------
STATA lists ALL the exogenous regressors as instruments: slightly different
terminology than we have been using.


TSLS estimates, Z = sales tax (m = 1)
$\widehat{\ln(Q_i^{cigarettes})} = 9.43 - 1.14 \ln(P_i^{cigarettes}) + 0.21 \ln(Income_i)$
Standard errors: (1.26) intercept, (0.37) price, (0.31) income

TSLS estimates, Z = sales tax & cig-only tax (m = 2)
$\widehat{\ln(Q_i^{cigarettes})} = 9.89 - 1.28 \ln(P_i^{cigarettes}) + 0.28 \ln(Income_i)$
Standard errors: (0.96) intercept, (0.25) price, (0.25) income

Smaller SEs for m = 2. Using 2 instruments gives more
information: more "as-if random variation".


Checking Instrument Validity (SW Section 12.3)
Recall the two requirements for valid instruments:
1. Relevance (special case of one X)
At least one instrument must enter the population
counterpart of the first stage regression.
2. Exogeneity
All the instruments must be uncorrelated with the error
term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \ldots, \mathrm{corr}(Z_{mi}, u_i) = 0$
Mathematically, we check:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$


Checking Assumption #1: Instrument Relevance, $\mathrm{corr}(Z_i, X_i) \neq 0$
We will focus on a single included endogenous regressor:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$
First stage regression (from the TSLS), with first-stage error $v_i$:
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \cdots + \pi_{m+r} W_{ri} + v_i$
The instruments are relevant if at least one of $\pi_1, \ldots, \pi_m$ is
nonzero.
The instruments are said to be weak if all of $\pi_1, \ldots, \pi_m$ are
either zero or nearly zero.
Weak instruments explain very little of the variation in X,
beyond that explained by the Ws

Measuring the Strength of Instruments in Practice: The First-Stage F-statistic
The first stage regression (one X):
Regress X on $Z_1, \ldots, Z_m, W_1, \ldots, W_r$.
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \cdots + \pi_{m+r} W_{ri} + v_i$
Totally irrelevant instruments: all the coefficients
on $Z_1, \ldots, Z_m$ are zero.
The first-stage F-statistic tests the hypothesis that
$Z_1, \ldots, Z_m$ do not enter the first stage regression.
Weak instruments imply a small first stage F-statistic.


Checking for Weak Instruments with a Single X
Compute the first-stage F-statistic.
Rule-of-thumb: If the first stage F-statistic is
less than 10, then the set of instruments is
weak.
If so, the TSLS estimator will be biased, and
statistical inferences (standard errors, hypothesis
tests, confidence intervals) can be misleading.
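
The rule of thumb can be illustrated with a small Python sketch for the simplest case: one instrument and no W variables. It computes the homoskedasticity-only F-statistic, which is an assumption made to keep the example short (applied work would often use a robust version); the function name and the toy data are hypothetical.

```python
def first_stage_f(z, x):
    """Homoskedasticity-only F-statistic for H0: pi_1 = 0 in X = pi_0 + pi_1 Z + v.

    With a single instrument this equals the squared t-statistic on Z.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    pi1 = szx / szz
    pi0 = mx - pi1 * mz
    ssr = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, x))
    ess = pi1 ** 2 * szz              # explained sum of squares
    return ess / (ssr / (n - 2))      # one restriction, n - 2 residual df


# Hypothetical toy data: the same noise v, with a strong and a weak link to z.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]
x_strong = [1.0 + 2.0 * zi + vi for zi, vi in zip(z, v)]
x_weak = [1.0 + 0.1 * zi + vi for zi, vi in zip(z, v)]

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f_strong, f_weak)
```

Here the strong first stage gives an F-statistic above 10 and the weak one gives an F-statistic far below it, so only the second would be flagged by the rule of thumb.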


Checking Assumption #2: Instrument Exogeneity, $\mathrm{corr}(Z_{mi}, u_i) = 0$
Instrument exogeneity: All the instruments are
uncorrelated with the error term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \ldots, \mathrm{corr}(Z_{mi}, u_i) = 0$
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Suppose there are two valid instruments: $Z_{1i}$, $Z_{2i}$
The J-test of overidentifying restrictions
(Anderson-Rubin test) can only be done if #Zs >
#Xs (overidentified).
The same test works if we have more variables:
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + u_i$

The J-test, ctd.
Distribution of the J-statistic
Under the null hypothesis that all the instruments
are exogenous, J has a chi-squared distribution
with $m - k$ degrees of freedom
If m = k, J = 0 (does this make sense?)
If some instruments are exogenous and others are
endogenous, the J statistic will be large, and the
null hypothesis that all instruments are exogenous
will be rejected.
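
Given a J-statistic, the degrees of freedom and p-value follow mechanically from m and k. The Python sketch below uses only the standard library; the closed-form chi-squared tail formula for integer degrees of freedom is a standard result, while the function names and the J value of 5.0 are hypothetical and are not taken from the cigarette data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction sum.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))          # 2 * P(N(0,1) > sqrt(x))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * pdf * total


def j_test(j_stat, m, k):
    """Return (degrees of freedom, p-value) for the overidentification test."""
    df = m - k
    if df <= 0:
        raise ValueError("the J-test needs m > k (overidentification)")
    return df, chi2_sf(j_stat, df)


# Hypothetical J-statistic with two instruments and one endogenous regressor.
df, p_value = j_test(5.0, m=2, k=1)
print(df, round(p_value, 4))
```

With m = 2 and k = 1 there is one degree of freedom, and a J-statistic of 5.0 would reject the null of joint exogeneity at the 5% level. When m = k the function refuses to run, which mirrors the point that J = 0 in the exactly identified case.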
