Econometrics PPT Final Review Slides

Review: Ch 10 and 12
Please review all the material we have covered.
We can only go over a fraction of it in one
review class, so please make sure to review
all of our lecture material and textbook
readings, not just the slides here.
Copyright 2015 Pearson, Inc. All rights reserved.
10-1
Panel data notation, ctd.

Panel data with k regressors:
$(X_{1it}, X_{2it}, \ldots, X_{kit}, Y_{it})$, $i = 1, \ldots, n$, $t = 1, \ldots, T$
n = number of entities (states)
T = number of time periods (years)
Some jargon
Another term for panel data is longitudinal data
balanced panel: no missing observations, that is, all
variables are observed for all entities (states) and all time
periods (years)
10-2
Why are panel data useful?

With panel data we can control for factors that:
Vary across entities but do not vary over time
Could cause omitted variable bias if they are
omitted
Are unobserved or unmeasured and therefore
cannot be included in the regression using multiple
regression
Here's the key idea:
If an omitted variable does not change over time,
then any changes in Y over time cannot be caused
by the omitted variable.
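One way to see the key idea is to write the model for the same entity in two periods and subtract. This is only a sketch: the symbol $Z_i$ for the time-invariant omitted variable and the period labels $t$ and $s$ are illustrative assumptions.

```latex
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} \\
Y_{is} &= \beta_0 + \beta_1 X_{is} + \beta_2 Z_i + u_{is} \\
Y_{it} - Y_{is} &= \beta_1 (X_{it} - X_{is}) + (u_{it} - u_{is})
\end{aligned}
```

Because $Z_i$ is the same in both periods, it drops out of the difference, so it cannot explain the change in Y.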
10-3
Panel Data Outline

1. The FIXED effects Model:
   1. What is it
   2. Potential issues
   3. TIME Fixed Effects
2. The RANDOM effects Model
   1. What is it
   2. Potential issues
3. Choosing between the FIXED effects and RANDOM effects Models
10-4
The Fixed Effects Model
10-5
Fixed Effects
Fixed-effects (FE) models explore the relationship between the
independent variables and the dependent variable within an
entity (country, state, institution, etc.).
Each entity (state) has its own individual characteristics that
may or may not influence the dependent variable.
Why use FE? Because we believe that some time-invariant
characteristic of the entity (state) is correlated with the
regressors and would bias the estimated coefficients; we need to
control for this to avoid omitted variable bias.
Therefore, FE removes the effect of those time-invariant
characteristics so we can assess the net effect of the
independent variables.
10-6
Fixed Effects
Fixed effects form
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
$\alpha_i$ is called a "state fixed effect" or "state effect": it is
the constant (fixed) effect of being in state $i$
Again, FE removes the effect of those time-invariant
characteristics from the independent variables,
thus we CANNOT get a coefficient for a specific time-invariant
variable (race, gender, etc.) since they are all lumped in with
the intercept in the term $\alpha_i$
Thus $\alpha_i = \beta_0$ + the effects of all time-invariant
variables for that particular entity $i$
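A minimal sketch of how the fixed effect drops out in estimation: subtracting each entity's own mean from X and Y (the "within" transformation) removes $\alpha_i$, and the slope is then estimated from the demeaned data. The function name `within_estimator` and the toy panel are hypothetical, chosen only for illustration.

```python
from statistics import mean

def within_estimator(entity, x, y):
    """Slope from regressing entity-demeaned y on entity-demeaned x."""
    groups = {}
    for e, xi, yi in zip(entity, x, y):
        groups.setdefault(e, []).append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        xbar = mean(p[0] for p in obs)
        ybar = mean(p[1] for p in obs)
        for xi, yi in obs:
            sxy += (xi - xbar) * (yi - ybar)
            sxx += (xi - xbar) ** 2
    return sxy / sxx

# Hypothetical toy panel: two entities with different intercepts (10 and 20)
# and a common slope of -0.5
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10 - 0.5 * v for v in x[:3]] + [20 - 0.5 * v for v in x[3:]]

beta_fe = within_estimator(entity, x, y)
# Treating all observations as one entity gives the pooled OLS slope
beta_pooled = within_estimator(["all"] * len(x), x, y)
print(beta_fe, beta_pooled)
```

In this toy data the entity with the larger intercept also has larger X values, so the pooled slope has the wrong sign while the within slope recovers the common slope.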
10-7
The regression lines for each state in a picture
10-8
Summary of issues and solutions

If we have HETEROSKEDASTICITY ONLY:
We detect it using the xttest3 test
We correct it via the robust option
If we have SERIAL CORRELATION ONLY:
We detect it via the xtserial test
We correct it via the xtregar ..., fe command
If we have both HETEROSKEDASTICITY & SERIAL CORRELATION:
Use xtreg ..., fe vce(cluster id)
____________________________________
If we have ERRORS correlated ACROSS entities:
We detect it using the Pesaran test
We correct it via the xtscc command
10-10
Example: Traffic deaths and beer taxes in STATA
First let STATA know you are working with panel
data by defining the entity variable (state) and
time variable (year):
. xtset state year;
       panel variable:  state (strongly balanced)
        time variable:  year, 1982 to 1988
                delta:  1 unit
10-11
. xtreg vfrall beertax, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                F(1,47)            =      5.05
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
       _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
------------------------------------------------------------------------------

The panel data command xtreg with the option fe performs fixed effects
regression. The reported intercept is arbitrary, and the estimated
individual effects are not reported in the default output.
The fe option means use fixed effects regression.
The vce(cluster state) option tells STATA to use clustered standard
errors. What are these? Let's figure them out together.
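As a quick arithmetic check on the beertax row, the t-statistic is the coefficient divided by its clustered standard error, and the critical value Stata used can be backed out from the width of the reported confidence interval. This is a sketch using only the numbers printed in the output; the variable names are illustrative.

```python
# Values copied from the beertax row of the xtreg output
coef, se = -0.6558736, 0.2918556
ci_low, ci_high = -1.243011, -0.0687358

t_stat = coef / se                             # reported as -2.25
implied_crit = (ci_high - ci_low) / (2 * se)   # critical value behind the 95% CI
print(round(t_stat, 2), round(implied_crit, 3))
```

The implied critical value is a little above the normal value of 1.96, which is consistent with the F(1,47) line in the output: with clustered standard errors the reference distribution here has 47 degrees of freedom (48 clusters minus 1).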
10-12
Regression with Time Fixed Effects (SW Section 10.4)
An omitted variable might vary over time but not
across states:
Safer cars (air bags, etc.); changes in national
laws
These produce intercepts that change over time
Let $S_t$ denote the combined effect of variables
that change over time but not across states (e.g., safer
cars).
The resulting population regression model is:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$
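A sketch of why both $Z_i$ and $S_t$ can be removed at once: in a balanced panel, subtracting the entity mean and the time mean (and adding back the grand mean) wipes out anything that varies only across entities or only over time. The function name `two_way_within` and the toy numbers are hypothetical, and the shortcut is only valid for a balanced panel.

```python
from statistics import mean

def two_way_within(entity, time, x, y):
    """Slope after removing entity and time means (balanced panel only)."""
    def demean(v):
        by_e, by_t = {}, {}
        for e, t, vi in zip(entity, time, v):
            by_e.setdefault(e, []).append(vi)
            by_t.setdefault(t, []).append(vi)
        e_mean = {e: mean(vals) for e, vals in by_e.items()}
        t_mean = {t: mean(vals) for t, vals in by_t.items()}
        grand = mean(v)
        return [vi - e_mean[e] - t_mean[t] + grand
                for e, t, vi in zip(entity, time, v)]
    xd, yd = demean(x), demean(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Hypothetical balanced panel: entity effects 10 and 20,
# time effects 0, -1, -3, and a true slope of -0.5
entity = ["A", "A", "A", "B", "B", "B"]
time = [1, 2, 3, 1, 2, 3]
x = [1.0, 2.0, 4.0, 2.0, 5.0, 3.0]
entity_effect = {"A": 10.0, "B": 20.0}
time_effect = {1: 0.0, 2: -1.0, 3: -3.0}
y = [entity_effect[e] + time_effect[t] - 0.5 * xi
     for e, t, xi in zip(entity, time, x)]

beta_two_way = two_way_within(entity, time, x, y)
print(beta_two_way)
```

Including a full set of year dummies in a fixed effects regression, as the slides do in STATA, is the usual way to get the same estimate and also handles unbalanced panels.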
10-14
. gen y83 = (year==1983);
. global yeardum "y83 y84 y85 y86 y87 y88";
. xtreg vfrall beertax $yeardum, fe vce(cluster state);

       overall = 0.0876                                        max =         7
corr(u_i, Xb)  = -0.6781                        Prob > F           =    0.0009
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6399799   .3570783    -1.79   0.080    -1.358329    .0783691
         y83 |  -.0799029   .0350861    -2.28   0.027    -.1504869   -.0093188
         y84 |  -.0724206   .0438809    -1.65   0.106    -.1606975    .0158564
         y85 |  -.1239763   .0460559    -2.69   0.010    -.2166288   -.0313238
         y86 |  -.0378645   .0570604    -0.66   0.510    -.1526552    .0769262
         y87 |  -.0509021   .0636084    -0.80   0.428    -.1788656    .0770615
         y88 |  -.0518038   .0644023    -0.80   0.425    -.1813645    .0777568
       _cons |    2.42847   .2016885    12.04   0.000     2.022725    2.834215
-------------+----------------------------------------------------------------
10-15
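Are the time effects jointly statistically significant?
. test $yeardum;
 ( 1)  y83 = 0
 ( 2)  y84 = 0
 ( 3)  y85 = 0
 ( 4)  y86 = 0
 ( 5)  y87 = 0
 ( 6)  y88 = 0
       F(  6,    47) =    4.22
            Prob > F =    0.0018
Yes: with p = 0.0018 we reject the null hypothesis that all six year effects are zero.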
10-16
The Random Effects Model
10-18
The Random Effects Model (cont.)

Advantages of the random effects model:
1. Can now also estimate the coefficients on time-invariant
explanatory variables (like race or gender).
Disadvantages of the random effects model:
1. Most importantly, the random effects estimator
requires us to assume that $\alpha_i$ (the entity-specific effect,
i.e. the fixed effect term) is uncorrelated with the independent
variables, the Xs, if we're going to avoid omitted variable bias.
This may be an overly strong assumption in
many cases.
10-20
Remember our FE model results

. xtreg vfrall beertax, fe vce(cluster state)
       overall = 0.0934                                        max =         7
                                                F(1,47)            =      5.05
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
       _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
------------------------------------------------------------------------------
10-21
Choosing Between Fixed and Random Effects
One key is the nature of the relationship between $\alpha_i$ and the Xs:
If they're likely to be correlated, then it makes sense to use
the fixed effects model.
If not, then it makes sense to use the random effects model.
Use the Hausman test to examine whether there is correlation
between $\alpha_i$ and X.
Essentially, this procedure tests whether the regression
coefficients under the fixed effects and random effects models are
statistically different from each other.
If they are different, then the fixed effects model is
preferred.
If they are not different, then the random effects
model is preferred (or estimates of both the fixed effects and
random effects models are provided).
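A minimal sketch of the comparison for a single coefficient, assuming the classical form of the Hausman statistic (squared difference of the two estimates divided by the difference of their variances, compared with a chi-squared distribution with 1 degree of freedom). The function name and the input numbers are hypothetical; they are not estimates from the beer tax example.

```python
import math

def hausman_single(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one coefficient (chi-squared, 1 df)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for this form of the test")
    h = (b_fe - b_re) ** 2 / diff_var
    p_value = math.erfc(math.sqrt(h / 2))  # upper tail of chi-squared(1)
    return h, p_value

# Hypothetical FE and RE estimates with their variances (illustration only)
h, p = hausman_single(b_fe=-0.66, var_fe=0.04, b_re=-0.05, var_re=0.01)
print(round(h, 2), p < 0.05)
```

A small p-value means the FE and RE coefficients differ by more than sampling error would explain, which points toward the fixed effects model.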
10-23
Choosing Between Fixed and Random Effects
Run both the FE and the RE regressions.
Method 1:
xtoverid
H0: the independent variables are uncorrelated with the group-specific
error (the extra RE orthogonality conditions)
Method 2:
hausman: an alternative command that tests the same hypothesis
10-24
Instrumental Variables Outline

1. IV Regression: Why and What; Two Stage
Least Squares
2. The General IV Regression Model
3. Checking Instrument Validity
a) Weak and strong instruments
b) Instrument exogeneity
10-25
IV Regression: Why?
Three important threats to internal validity are:
Omitted variable bias from a variable that is correlated with
X but is unobserved (so cannot be included in the
regression) and for which there are inadequate control
variables;
Simultaneous causality bias (X causes Y, Y causes X);
Errors-in-variables bias (X is measured with error)
All three problems result in $E(u|X) \neq 0$.
Instrumental variables regression can eliminate bias when
$E(u|X) \neq 0$ using an instrumental variable (IV), Z.
10-26
The IV Estimator with a Single Regressor and a Single Instrument (SW Section 12.1)
$Y_i = \beta_0 + \beta_1 X_i + u_i$
IV regression breaks X into two parts:
a part that might be correlated with u, and
a part that is not.
By isolating the part that is not correlated with u, it
is possible to estimate $\beta_1$.
This is done using an instrumental variable, $Z_i$,
which is correlated with $X_i$ but uncorrelated with $u_i$.
10-27
Two Stage Least Squares: Summary

Suppose $Z_i$ satisfies the two conditions for a valid instrument:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
Two-stage least squares:
Stage 1: Regress $X_i$ on $Z_i$ (including an intercept), obtain the
predicted values $\hat{X}_i$
Stage 2: Regress $Y_i$ on $\hat{X}_i$ (including an intercept); the
coefficient on $\hat{X}_i$ is the TSLS estimator, $\hat{\beta}_1^{TSLS}$.
$\hat{\beta}_1^{TSLS}$ is a consistent estimator of $\beta_1$.
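The two stages can be sketched directly for the case of one regressor and one instrument. The helper names `ols` and `tsls` and the toy data are hypothetical; the data are constructed so that the error is correlated with X but exactly uncorrelated with Z in the sample, with a true slope of -1.

```python
from statistics import mean

def ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

def tsls(z, x, y):
    """Two-stage least squares with one regressor and one instrument."""
    pi0, pi1 = ols(z, x)                      # stage 1: regress X on Z
    x_hat = [pi0 + pi1 * zi for zi in z]      # predicted values of X
    return ols(x_hat, y)                      # stage 2: regress Y on X-hat

# Hypothetical data: v drives both X and the error, but is orthogonal to Z
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
x = [1 + 2 * zi + vi for zi, vi in zip(z, v)]
y = [3 - xi + 2 * vi for xi, vi in zip(x, v)]   # true slope is -1, error is 2v

b0_ols, b1_ols = ols(x, y)
b0_tsls, b1_tsls = tsls(z, x, y)
print(round(b1_ols, 3), round(b1_tsls, 3))
```

OLS is pulled away from -1 because X and the error share the component v, while TSLS uses only the variation in X that comes from Z. Running the second stage by hand gives the right coefficient but not the right standard errors, which is one reason to use a single IV command in practice.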

10-28
Example #4: Demand for Cigarettes

$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + u_i$
Why is the OLS estimator of $\beta_1$ likely to be biased?

Data set: Panel data on annual cigarette consumption and
average prices paid (including tax), by state, for the 48
continental US states, 1985-1995.
Proposed instrumental variable:
$Z_i$ = general sales tax per pack in the state = $SalesTax_i$
Do you think this instrument is plausibly valid?
1. Relevant? $\mathrm{corr}(SalesTax_i, \ln(P_i^{cigarettes})) \neq 0$?
2. Exogenous? $\mathrm{corr}(SalesTax_i, u_i) = 0$?
10-29
Combined into a single command:

Y = lpackpc, X = lravgprs, Z = rtaxso
. ivregress 2sls lpackpc (lravgprs = rtaxso) if year==1995, vce(robust)

Instrumental variables (2SLS) regression               Number of obs =      48
                                                       Wald chi2(1)  =   12.05
                                                       Prob > chi2   =  0.0005
                                                       R-squared     =  0.4011
                                                       Root MSE      =  .18635
------------------------------------------------------------------------------
             |               Robust
     lpackpc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lravgprs |  -1.083587   .3122035    -3.47   0.001    -1.695494    -.471679
       _cons |   9.719876   1.496143     6.50   0.000      6.78749    12.65226
------------------------------------------------------------------------------
Instrumented:  lravgprs    (this is the endogenous regressor)
Instruments:   rtaxso      (this is the instrumental variable)
------------------------------------------------------------------------------
Estimated cigarette demand equation:

$\widehat{\ln(Q_i^{cigarettes})} = 9.72 - 1.08 \ln(P_i^{cigarettes})$, n = 48
with standard errors (1.53) on the intercept and (0.31) on the price coefficient
10-31
The General IV Regression Model: Summary of Jargon
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + u_i$
$Y_i$ is the dependent variable
$X_{1i}, \ldots, X_{ki}$ are the endogenous regressors (potentially
correlated with $u_i$)
$W_{1i}, \ldots, W_{ri}$ are the included exogenous regressors
(uncorrelated with $u_i$) or control variables (included so that
$Z_i$ is uncorrelated with $u_i$, once the Ws are included)
$\beta_0, \beta_1, \ldots, \beta_{k+r}$ are the unknown regression coefficients
$Z_{1i}, \ldots, Z_{mi}$ are the m instrumental variables (the excluded
exogenous variables)
The coefficients are overidentified if m > k; exactly
identified if m = k; and underidentified if m < k.
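The counting rule is simple enough to state as a small helper. This is only an illustration; the function name `identification_status` is hypothetical.

```python
def identification_status(m, k):
    """Classify the coefficients given m instruments and k endogenous regressors."""
    if m > k:
        return "overidentified"
    if m == k:
        return "exactly identified"
    return "underidentified"

# One endogenous regressor (k = 1) with one or two instruments
print(identification_status(1, 1))
print(identification_status(2, 1))
```

Only the instruments (Zs) and the endogenous regressors (Xs) enter the count; the included exogenous regressors (Ws) do not.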
10-33
Ws as control variables, ctd.
In many cases, the purpose of including the Ws is
to control for omitted factors, so that once the Ws
are included, Z is uncorrelated with u.
Technically, the condition for the Ws being effective
control variables is that the conditional mean of $u_i$
does not depend on $Z_i$, given $W_i$:
$E(u_i | W_i, Z_i) = E(u_i | W_i)$
Here is the key idea: in many applications you need
to include control variables (Ws) so that Z is
plausibly exogenous (uncorrelated with u).
10-34
Example #4: Demand for cigarettes, ctd.
Suppose income is exogenous (this is plausible: why?), and we
also want to estimate the income elasticity:
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + \beta_2 \ln(Income_i) + u_i$
We actually have two instruments:
$Z_{1i}$ = general sales tax$_i$
$Z_{2i}$ = cigarette-specific tax$_i$
Endogenous variable: $\ln(P_i^{cigarettes})$ (one X)
Included exogenous variable: $\ln(Income_i)$ (one W)
Instruments (excluded exogenous variables): general sales
tax, cigarette-specific tax (two Zs)
Is $\beta_1$ over-, under-, or exactly identified?
10-35
Example: Cigarette demand, two instruments

Y = lpackpc, W = lperinc, X = lravgprs, Z1 = rtaxso, Z2 = rtax
. ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;

IV (2SLS) regression with robust standard errors       Number of obs =      48
                                                       F(  2,    45) =   16.17
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4294
                                                       Root MSE      =  .18786
------------------------------------------------------------------------------
             |               Robust
     lpackpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lravgprs |  -1.277424   .2496099    -5.12   0.000    -1.780164   -.7746837
     lperinc |   .2804045   .2538894     1.10   0.275     -.230955    .7917641
       _cons |   9.894955   .9592169    10.32   0.000     7.962993    11.82692
------------------------------------------------------------------------------
Instrumented:  lravgprs
Instruments:   lperinc rtaxso rtax
------------------------------------------------------------------------------
STATA lists ALL the exogenous regressors as instruments: slightly different
terminology than we have been using.
10-36
TSLS estimates, Z = sales tax (m = 1)

$\widehat{\ln(Q_i^{cigarettes})} = 9.43 - 1.14 \ln(P_i^{cigarettes}) + 0.21 \ln(Income_i)$
standard errors: (1.26) (0.37) (0.31)
TSLS estimates, Z = sales tax & cig-only tax (m = 2)

$\widehat{\ln(Q_i^{cigarettes})} = 9.89 - 1.28 \ln(P_i^{cigarettes}) + 0.28 \ln(Income_i)$
standard errors: (0.96) (0.25) (0.25)
Smaller SEs for m = 2. Using 2 instruments gives more
information: more "as-if random variation".
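To see what the smaller standard errors buy, here is a short sketch that turns the two price-elasticity estimates into approximate 95% confidence intervals, assuming the large-sample normal critical value of 1.96. The dictionary and variable names are illustrative.

```python
# Price-elasticity estimates and standard errors as reported on the slide
estimates = {"m = 1": (-1.14, 0.37), "m = 2": (-1.28, 0.25)}

intervals = {}
for label, (coef, se) in estimates.items():
    half_width = 1.96 * se          # large-sample normal critical value
    intervals[label] = (coef - half_width, coef + half_width)
    print(label, [round(v, 2) for v in intervals[label]])
```

Both intervals exclude zero, and the interval with two instruments is noticeably narrower.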
10-37
Checking Instrument Validity (SW Section 12.3)
Recall the two requirements for valid instruments:
1. Relevance (special case of one X)
At least one instrument must enter the population
counterpart of the first stage regression.
2. Exogeneity
All the instruments must be uncorrelated with the error
term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \ldots, \mathrm{corr}(Z_{mi}, u_i) = 0$
Mathematically, we check:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
10-39
Checking Assumption #1: Instrument Relevance $\mathrm{corr}(Z_i, X_i) \neq 0$
We will focus on a single included endogenous regressor:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$
First stage regression (from the TSLS), with $v_i$ denoting the first-stage error:
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \cdots + \pi_{m+r} W_{ri} + v_i$
The instruments are relevant if at least one of $\pi_1, \ldots, \pi_m$ is
nonzero.
The instruments are said to be weak if all the $\pi_1, \ldots, \pi_m$ are
either zero or nearly zero.
Weak instruments explain very little of the variation in X,
beyond that explained by the Ws
10-40
Measuring the Strength of Instruments in Practice: The First-Stage F-statistic
The first stage regression (one X):
Regress X on $Z_1, \ldots, Z_m, W_1, \ldots, W_r$.
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \cdots + \pi_{m+r} W_{ri} + v_i$
Totally irrelevant instruments: all the coefficients
on $Z_1, \ldots, Z_m$ are zero.
The first-stage F-statistic tests the hypothesis that
$Z_1, \ldots, Z_m$ do not enter the first stage regression.
Weak instruments imply a small first stage F-statistic.
10-41
Checking for Weak Instruments with a Single X
Compute the first-stage F-statistic.

Rule-of-thumb: If the first stage F-statistic is
less than 10, then the set of instruments is
weak.
If so, the TSLS estimator will be biased, and
statistical inferences (standard errors, hypothesis
tests, confidence intervals) can be misleading.
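The rule of thumb can be illustrated for the simplest case of one instrument and no Ws, where the first-stage F-statistic is just the squared t-statistic on Z. This sketch uses the homoskedasticity-only formula as a simplifying assumption; the function name and the toy data are hypothetical.

```python
from statistics import mean

def first_stage_f(z, x):
    """F-statistic for H0: slope = 0 in a regression of x on a single z.

    With one instrument and no W's, F is the squared t-statistic
    (homoskedasticity-only formula).
    """
    n = len(z)
    zbar, xbar = mean(z), mean(x)
    szz = sum((zi - zbar) ** 2 for zi in z)
    pi1 = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x)) / szz
    pi0 = xbar - pi1 * zbar
    ssr = sum((xi - pi0 - pi1 * zi) ** 2 for zi, xi in zip(z, x))
    se_pi1 = (ssr / (n - 2) / szz) ** 0.5
    return (pi1 / se_pi1) ** 2

# Hypothetical data: same noise v, strong versus weak dependence of X on Z
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
x_strong = [1 + 2 * zi + vi for zi, vi in zip(z, v)]
x_weak = [1 + 0.1 * zi + vi for zi, vi in zip(z, v)]

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f_strong > 10, f_weak > 10)
```

With the strong first stage the F-statistic is far above 10; with the weak one it is far below, so by the rule of thumb the instrument would be flagged as weak.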
10-42
Checking Assumption #2: Instrument Exogeneity $\mathrm{corr}(Z_{mi}, u_i) = 0$
Instrument exogeneity: All the instruments are
uncorrelated with the error term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \ldots, \mathrm{corr}(Z_{mi}, u_i) = 0$
Consider $Y_i = \beta_0 + \beta_1 X_i + u_i$
and suppose there are two valid instruments: $Z_{1i}$, $Z_{2i}$
The J-test of overidentifying restrictions
(Anderson-Rubin test) can only be done if #Zs >
#Xs (overidentified).
The same test works if we have more variables:
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + u_i$
10-43
The J-test, ctd

Distribution of the J-statistic
Under the null hypothesis that all the instruments
are exogenous, J has a chi-squared distribution
with $m - k$ degrees of freedom
If m = k, J = 0 (does this make sense?)
If some instruments are exogenous and others are
endogenous, the J statistic will be large, and the
null hypothesis that all instruments are exogenous
will be rejected.
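For reference, the usual textbook construction of the statistic (recalled from SW Section 12.3, not shown in this extract, and valid in this form under homoskedasticity) regresses the TSLS residuals on all the instruments and the Ws, and scales the F-statistic for the joint hypothesis $\delta_1 = \cdots = \delta_m = 0$ by m. The symbols $\delta$ and $e_i$ are notation introduced here for illustration.

```latex
\begin{aligned}
\hat{u}_i^{TSLS} &= \delta_0 + \delta_1 Z_{1i} + \cdots + \delta_m Z_{mi}
  + \delta_{m+1} W_{1i} + \cdots + \delta_{m+r} W_{ri} + e_i \\
J &= mF \ \xrightarrow{d}\ \chi^2_{m-k}
\end{aligned}
```

In the cigarette example with two instruments and one endogenous regressor, m = 2 and k = 1, so J would be compared with a chi-squared distribution with 1 degree of freedom.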
10-44
