Econometrics I
Professor William Greene
Stern School of Business
Department of Economics

Part 16: Panel Data

www.oft.gov.uk/shared_oft/reports/Evaluating-OFTs-work/oft1416.pdf


Panel Data Sets

Longitudinal data
Cross section time series
British household panel survey (BHPS)
Panel Study of Income Dynamics (PSID)
many others
Penn world tables
Financial data by firm, by year
$r_{it} - r_{ft} = \beta_i (r_{mt} - r_{ft}) + \varepsilon_{it}$, i = 1,...,many; t = 1,...,many
Exchange rate data, essentially infinite T, large N

Benefits of Panel Data

Time and individual variation in behavior unobservable in cross sections or aggregate time series
Observable and unobservable individual heterogeneity
Rich hierarchical structures
More complicated models
Features that cannot be modeled with cross section or aggregate time series data alone
Dynamics in economic behavior


BHPS Has Evolved


Cornwell and Rupert Data


Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
(Extracted from NLSY.) Variables in the file are
EXP   = work experience
WKS   = weeks worked
OCC   = occupation, 1 if blue collar
IND   = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA  = 1 if resides in a city (SMSA)
MS    = 1 if married
FEM   = 1 if female
UNION = 1 if wage set by union contract
ED    = years of education
BLK   = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.

Balanced and Unbalanced Panels

Distinction: Balanced vs. Unbalanced Panels
A notation to help with mechanics
$z_{i,t}$, $i = 1,\ldots,N$; $t = 1,\ldots,T_i$
The role of the assumption
Mathematical and notational convenience:
Balanced: $n = NT$
Unbalanced: $n = \sum_{i=1}^{N} T_i$
Is the fixed $T_i$ assumption ever necessary? Almost never.
Is unbalancedness due to nonrandom attrition from an otherwise balanced panel? This would require special considerations.

Application: Health Care Usage


German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
This is an unbalanced panel with 7,293 individuals. There are altogether 27,326 observations. The number of observations ranges from 1 to 7.
(Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
(Downloaded from the JAE Archive)
Variables in the file are
DOCTOR   = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT     = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS   = number of doctor visits in last three months
HOSPVIS  = number of hospital visits in last calendar year
PUBLIC   = insured in public health insurance = 1; otherwise = 0
ADDON    = insured by add-on insurance = 1; otherwise = 0
HHNINC   = household nominal monthly net income in German marks / 10000
           (4 observations with income=0 were dropped)
HHKIDS   = children under age 16 in the household = 1; otherwise = 0
EDUC     = years of schooling
AGE      = age in years
MARRIED  = marital status

An Unbalanced Panel: RWM's GSOEP Data on Health Care

N = 7,293 Households

A Basic Model for Panel Data

Unobserved individual effects in regression: $E[y_{it} \mid x_{it}, c_i]$

Notation:
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
$X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT_i}' \end{bmatrix}$, $T_i$ rows, $K$ columns

Linear specification:
Fixed Effects: $E[c_i \mid X_i] = g(X_i)$; $\mathrm{Cov}[x_{it}, c_i] \neq 0$; effects are correlated with included variables.
Random Effects: $E[c_i \mid X_i] = 0$; $\mathrm{Cov}[x_{it}, c_i] = 0$

Convenient Notation

Fixed Effects: the dummy variable model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
Individual specific constant terms.

Random Effects: the error components model
$y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$
Compound (composed) disturbance

Estimating $\beta$

$\beta$ is the partial effect of interest
Can it be estimated (consistently) in the presence of (unmeasured) $c_i$?
Does pooled least squares work?
Strategies for controlling for $c_i$ using the sample data

Assumptions for Asymptotics

Convergence of moments involving cross section $X_i$.
N increasing, T or $T_i$ assumed fixed.
Fixed T asymptotics (see text, p. 348)
Time series characteristics are not relevant (may be nonstationary; relevant in Penn World Tables)
If T is also growing, need to treat as multivariate time series.
Ranks of matrices. X must have full column rank. ($X_i$ may not, if $T_i < K$.)
Strict exogeneity and dynamics. If $x_{it}$ contains $y_{i,t-1}$ then $x_{it}$ cannot be strictly exogenous. $x_{it}$ will be correlated with the unobservables in period t-1. (To be revisited later.)
Empirical characteristics of microeconomic data

The Pooled Regression

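Presence of omitted effects
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, observation for person i at time t
$y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i$, $T_i$ observations in group i ($\mathbf{i}$ is a column of ones)
$\;\;\;\, = X_i\beta + \mathbf{c}_i + \varepsilon_i$, note $\mathbf{c}_i = (c_i, c_i, \ldots, c_i)'$
$y = X\beta + \mathbf{c} + \varepsilon$, $\sum_{i=1}^{N} T_i$ observations in the sample

Potential bias/inconsistency of OLS depends on whether the effects are fixed or random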

OLS in the Presence of Individual Effects


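$b = (X'X)^{-1}X'y$
$\;\; = \beta + \left[\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} X_i'\mathbf{c}_i\right]$ (part due to the omitted $c_i$)
$\;\;\;\;\; + \left[\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} X_i'\varepsilon_i\right]$ (covariance of X and $\varepsilon$ will = 0)
The third term vanishes asymptotically by assumption.
$\mathrm{plim}\, b = \beta + \mathrm{plim}\left[\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right]^{-1}\left[\sum_{i=1}^{N}\frac{T_i}{N}\,\bar{x}_i c_i\right]$ (left out variable formula)
So, what becomes of $\sum_{i=1}^{N} w_i \bar{x}_i c_i$?
$\mathrm{plim}\, b = \beta$ if the covariance of $\bar{x}_i$ and $c_i$ converges to zero.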
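The left out variable result can be illustrated with a small simulation. This is a sketch under assumed values (one regressor, true slope 1, standard normal effects and disturbances, and a hypothetical helper `pooled_ols_slope`), none of which come from the slides. When the regressor is built to covary with c_i, the pooled slope drifts away from the true value; when it is not, pooled OLS recovers it.

```python
import numpy as np

def pooled_ols_slope(corr_strength, n_groups=2000, n_periods=5, beta=1.0, seed=0):
    """Simulate y_it = beta*x_it + c_i + eps_it and return the pooled OLS slope.

    x_it = corr_strength * c_i + noise, so Cov[x_it, c_i] = corr_strength
    (an assumed design chosen only for illustration).
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=(n_groups, 1))                      # individual effects c_i
    x = corr_strength * c + rng.normal(size=(n_groups, n_periods))
    y = beta * x + c + rng.normal(size=(n_groups, n_periods))
    X = np.column_stack([np.ones(x.size), x.ravel()])       # constant and x, c_i omitted
    b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
    return b[1]

slope_correlated = pooled_ols_slope(corr_strength=0.8)      # effects correlated with x
slope_uncorrelated = pooled_ols_slope(corr_strength=0.0)    # effects uncorrelated with x
print(slope_correlated, slope_uncorrelated)
```

In this design the probability limit of the pooled slope is beta + Cov[x, c]/Var[x] = 1 + 0.8/1.64, which is the scalar version of the second term in the expression for plim b.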

Estimating the Sampling Variance of b

$s^2(X'X)^{-1}$? Inappropriate because
Correlation across observations (certainly)
Heteroscedasticity (possibly)

A robust covariance matrix
Robust estimation (in general)
The White estimator
A robust estimator for OLS.

Cluster Estimator

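Robust variance estimator for Var[b]

$\mathrm{Est.Var}[b] = (X'X)^{-1}\left[\sum_{i=1}^{N}\left(\sum_{t=1}^{T_i} x_{it}\hat{v}_{it}\right)\left(\sum_{t=1}^{T_i} x_{it}\hat{v}_{it}\right)'\right](X'X)^{-1}$
$\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = (X'X)^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T_i}\sum_{s=1}^{T_i}\hat{v}_{it}\hat{v}_{is}\,x_{it}x_{is}'\right](X'X)^{-1}$

$\hat{v}_{it}$ = a least squares residual (an estimate of $c_i + \varepsilon_{it}$)

(If $T_i = 1$, this is the White estimator.)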
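The cluster estimator above can be sketched in a few lines of NumPy. The function name `cluster_robust_cov` and the simulated data are assumptions made for illustration; no degrees of freedom correction is applied, matching the formula as written on the slide.

```python
import numpy as np

def cluster_robust_cov(X, y, groups):
    """OLS coefficients and the cluster-robust covariance (X'X)^-1 [sum_i s_i s_i'] (X'X)^-1,
    where s_i = sum_t x_it * v_it and v_it is the least squares residual."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    v = y - X @ b                                   # least squares residuals
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        rows = groups == g
        s = X[rows].T @ v[rows]                     # sum over t of x_it * v_it for group i
        meat += np.outer(s, s)
    return b, XtX_inv @ meat @ XtX_inv

# Hypothetical data: 50 groups, 4 periods, with a common group effect in the disturbance.
rng = np.random.default_rng(0)
n_groups, n_periods = 50, 4
groups = np.repeat(np.arange(n_groups), n_periods)
X = np.column_stack([np.ones(groups.size), rng.normal(size=groups.size)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n_groups)[groups] + rng.normal(size=groups.size)

b, V_cluster = cluster_robust_cov(X, y, groups)
# Treating every observation as its own cluster (T_i = 1) gives the White estimator.
_, V_white = cluster_robust_cov(X, y, np.arange(groups.size))
print(np.sqrt(np.diag(V_cluster)), np.sqrt(np.diag(V_white)))
```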

Alternative OLS Variance Estimators


Cluster correction increases SEs

+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
Conventional
 Constant    5.40159723      .04838934     111.628    .0000
 EXP          .04084968      .00218534      18.693    .0000
 EXPSQ       -.00068788      .480428D-04   -14.318    .0000
 OCC         -.13830480      .01480107      -9.344    .0000
 SMSA         .14856267      .01206772      12.311    .0000
 MS           .06798358      .02074599       3.277    .0010
 FEM         -.40020215      .02526118     -15.843    .0000
 UNION        .09409925      .01253203       7.509    .0000
 ED           .05812166      .00260039      22.351    .0000
Robust
 Constant    5.40159723      .10156038      53.186    .0000
 EXP          .04084968      .00432272       9.450    .0000
 EXPSQ       -.00068788      .983981D-04    -6.991    .0000
 OCC         -.13830480      .02772631      -4.988    .0000
 SMSA         .14856267      .02423668       6.130    .0000
 MS           .06798358      .04382220       1.551    .1208
 FEM         -.40020215      .04961926      -8.065    .0000
 UNION        .09409925      .02422669       3.884    .0001
 ED           .05812166      .00555697      10.459    .0000

Results of Bootstrap Estimation

Bootstrap variance for a panel data estimator
Panel Bootstrap = Block Bootstrap
Data set is N groups of size Ti
Bootstrap sample is N groups of size Ti drawn with replacement.
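A minimal sketch of the block (panel) bootstrap for pooled OLS follows. The function name `panel_bootstrap_se`, the number of replications and the simulated data are illustrative assumptions. The point is only that whole groups are resampled, so each group's T_i observations always stay together.

```python
import numpy as np

def panel_bootstrap_se(y, X, groups, n_reps=200, seed=0):
    """Bootstrap standard errors for pooled OLS, resampling whole groups with replacement."""
    rng = np.random.default_rng(seed)
    ids = np.unique(groups)
    rows_by_id = [np.flatnonzero(groups == g) for g in ids]   # row indices of each group
    draws = np.empty((n_reps, X.shape[1]))
    for r in range(n_reps):
        picked = rng.integers(0, len(ids), size=len(ids))     # N groups drawn with replacement
        rows = np.concatenate([rows_by_id[j] for j in picked])
        draws[r] = np.linalg.lstsq(X[rows], y[rows], rcond=None)[0]
    return draws.std(axis=0, ddof=1)

# Hypothetical data: 100 groups of 4 observations with a group effect in the disturbance.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(100), 4)
X = np.column_stack([np.ones(groups.size), rng.normal(size=groups.size)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)[groups] + rng.normal(size=groups.size)

boot_se = panel_bootstrap_se(y, X, groups)
print(boot_se)
```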

Using First Differences

$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, observation for person i at time t
Eliminating the heterogeneity
$\Delta y_{it} = y_{it} - y_{i,t-1} = (\Delta x_{it})'\beta + \Delta c_i + \Delta\varepsilon_{it}$
$\;\;\;\;\;\;\;\; = (\Delta x_{it})'\beta + w_{it}$ (since $\Delta c_i = 0$)
Note: Time invariant variables become zero
Time trend becomes the constant term

OLS with First Differences

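With strict exogeneity of $(X_i, c_i)$, OLS regression of $\Delta y_{it}$ on $\Delta x_{it}$ is unbiased and consistent but inefficient.

$\mathrm{Var}\begin{bmatrix} \varepsilon_{i,2}-\varepsilon_{i,1} \\ \varepsilon_{i,3}-\varepsilon_{i,2} \\ \vdots \\ \varepsilon_{i,T_i}-\varepsilon_{i,T_i-1} \end{bmatrix} = \begin{bmatrix} 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & 0 & \cdots & 0 \\ -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 \end{bmatrix}$

GLS is unpleasantly complicated. Use OLS in first differences and use Newey-West with one lag.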
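The first difference estimator can be sketched as follows, assuming a balanced panel with a single regressor and no time trend (so the differenced regression has no constant). The helper `first_difference_ols` and the simulated design are hypothetical. The effect c_i is built to be correlated with x_it, yet it drops out of the differenced equation.

```python
import numpy as np

def first_difference_ols(y, x):
    """OLS of (y_it - y_i,t-1) on (x_it - x_i,t-1) for arrays of shape (N, T).

    No constant is included because the simulated model has no time trend.
    """
    dy = np.diff(y, axis=1).ravel()
    dx = np.diff(x, axis=1).ravel()
    return (dx @ dy) / (dx @ dx)

# Hypothetical balanced panel: N = 1000, T = 4, true slope 2.0.
rng = np.random.default_rng(2)
N, T, beta = 1000, 4, 2.0
c = rng.normal(size=(N, 1))                       # individual effects
x = 0.8 * c + rng.normal(size=(N, T))             # regressor correlated with c_i
y = beta * x + c + rng.normal(size=(N, T))

b_fd = first_difference_ols(y, x)
print(b_fd)
```

The point estimate is consistent, but the differenced disturbances are serially correlated as in the matrix above, so conventional standard errors for this regression would not be appropriate.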

Application of a Two Period Model


"Hemoglobin and Quality of Life in Cancer Patients with Anemia,"
Finkelstein (MIT), Berndt (MIT), Greene (NYU), Cremieux (Univ. of Quebec), 1998
With Ortho Biotech, seeking to change the labeling of an already approved drug, erythropoietin (r-HuEPO)
QOL Study

Quality of life study
i = 1,...,1200+ clinically anemic cancer patients undergoing chemotherapy, treated with transfusions and/or r-HuEPO
t = 0 at baseline, 1 at exit. (The interperiod survey completed by some patients was not used.)
$y_{it}$ = self administered quality of life survey, scale = 0,...,100
$x_{it}$ = hemoglobin level, other covariates

Treatment effects model (hemoglobin level)
Background: r-HuEPO treatment to affect Hg level

Important statistical issues
Unobservable individual effects
The placebo effect
Attrition: sample selection
FDA mistrust of "community based" (not clinical trial based) statistical evidence

Objective: when to administer treatment for maximum marginal benefit

Regression-Treatment Effects Model

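$QOL_{it} = \alpha_t + x_{it}'\beta$ ("other covariates")
$\;\;\;\;\;\; + \gamma_7 Hb_{it}^{7} + \gamma_8 Hb_{it}^{8} + \gamma_9 Hb_{it}^{9} + \ldots + \gamma_{15} Hb_{it}^{15}$
$\;\;\;\;\;\; + c_i + \varepsilon_{it}$
$Hb_{it}$ = hemoglobin level, grams/deciliter, range 3+ to 15
$Hb_{it}^{7} = 1(3 \le Hb_{it} < 7.5)$ (Base case; $\gamma_7 = 0$)
$Hb_{it}^{8} = 1(7.5 \le Hb_{it} < 8.5)$
$\vdots$
$Hb_{it}^{15} = 1(14.5 \le Hb_{it} \le 15)$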

Effects and Covariates

Individual effects that would impact a self reported QOL: Depression, comorbidity factors (smoking), recent financial setback, recent loss of spouse, etc.

Covariates
Change in tumor status
Measured progressivity of disease
Change in number of transfusions
Presence of pain and nausea
Change in number of chemotherapy cycles
Change in radiotherapy types
Elapsed days since chemotherapy treatment
Amount of time between baseline and exit

First Differences Model


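$\Delta QOL_i = QOL_{i1} - QOL_{i0}$
$\;\;\; = (\alpha_1 - \alpha_0) + \sum_{j=8}^{15}\gamma_j\,(Hb_{i1}^{j} - Hb_{i0}^{j}) + \sum_{k=1}^{K}\beta_k\,(x_{ik,1} - x_{ik,0}) + \varepsilon_{i1} - \varepsilon_{i0}$

Regression to the mean (the "tendency to mediocrity")
$\varepsilon_{i1} - \varepsilon_{i0} = u_i - \mu\,(QOL_{i0} - \overline{QOL}_0)$. Expect $0 < \mu < 1$.
This implies, with $\alpha = \alpha_1 - \alpha_0 + \mu\,\overline{QOL}_0$,

$\Delta QOL_i = QOL_{i1} - QOL_{i0}$
$\;\;\; = \alpha + \sum_{j=8}^{15}\gamma_j\,(Hb_{i1}^{j} - Hb_{i0}^{j}) + \sum_{k=1}^{K}\beta_k\,(x_{ik,1} - x_{ik,0}) - \mu\,QOL_{i0} + u_i$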

Dealing with Attrition

The attrition issue: Appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (don't need the treatment). Thus, missing data at exit were clearly related to values of the dependent variable.

Solutions to the attrition problem
Heckman selection model (used in the study)
Prob[Present at exit | covariates] = $\Phi(z'\theta)$ (Probit model)
Additional variable added to difference model: $\lambda_i = \phi(z_i'\theta)/\Phi(z_i'\theta)$
The FDA solution: fill with zeros. (!)

Difference in Differences
With two periods,
$\Delta y_i = y_{i2} - y_{i1} = \delta_0 + (x_{i2} - x_{i1})'\beta + u_i$
Consider a "treatment," $D_i$, that takes place between time 1 and time 2 for some of the individuals
$\Delta y_i = \delta_0 + (\Delta x_i)'\beta + \delta_1 D_i + u_i$
$D_i$ = the "treatment dummy"

This is a linear regression model. If there are no regressors,
$\hat{\delta}_1 = \overline{\Delta y}\,|\,\text{treatment} - \overline{\Delta y}\,|\,\text{control}$ = "difference in differences" estimator.
$\hat{\delta}_0$ = Average change in $y_i$ for the untreated (control) group

Difference-in-Differences Model
With two periods and strict exogeneity of D and T,
$y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_t + \beta_3 T_t D_{it} + \varepsilon_{it}$
$D_{it}$ = dummy variable for a treatment that takes place between time 1 and time 2 for some of the individuals,
$T_t$ = a time period dummy variable, 0 in period 1, 1 in period 2.
This is a linear regression model. If there are no regressors, using least squares,
$b_3 = (\bar{y}_2 - \bar{y}_1)_{D=1} - (\bar{y}_2 - \bar{y}_1)_{D=0}$
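The claim that the least squares coefficient on the interaction equals the difference of the two mean changes can be checked numerically. In this sketch the data are simulated, D is treated as a group indicator that is constant over both periods, and the coefficient values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                                            # individuals, each observed in 2 periods
D = np.repeat(rng.integers(0, 2, size=n), 2)       # treatment group indicator
T = np.tile([0, 1], n)                             # 0 in period 1, 1 in period 2
y = 1.0 + 0.5 * D + 0.3 * T + 2.0 * D * T + rng.normal(size=2 * n)

# Least squares on a constant, D, T and the interaction T*D
X = np.column_stack([np.ones(2 * n), D, T, D * T])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Difference in differences computed directly from the four cell means
def cell_mean(d, t):
    return y[(D == d) & (T == t)].mean()

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(b[3], did)
```

The two numbers agree to machine precision because the regression is saturated: its four coefficients reproduce the four cell means exactly.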

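Difference in Differences
$y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_t + \beta_3 D_{it}T_t + x_{it}'\beta + \varepsilon_{it}$, t = 1,2
$\Delta y_{it} = \beta_2 + \beta_3 D_i + (\Delta x_{it})'\beta + \Delta\varepsilon_{it}$
$\;\;\;\;\;\;\;\; = \beta_2 + \beta_3 D_i + (\Delta x_{it})'\beta + u_i$
$(\Delta y_{it} \mid D = 1) - (\Delta y_{it} \mid D = 0) = \beta_3 + [(\Delta x_{it} \mid D = 1) - (\Delta x_{it} \mid D = 0)]'\beta$
If the same individual is observed in both states, the second term is zero. If the effect is estimated by averaging individuals with D = 1 and different individuals with D = 0, then part of the 'effect' is explained by change in the covariates, not the treatment.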

A Tale of Two Cities

A sharp change in policy can constitute a natural experiment
The Mariel boatlift from Cuba to Miami (May-September, 1980) increased the Miami labor force by 7%. Did it reduce wages or employment of non-immigrants?
Compare Miami to Los Angeles, a city assumed to be comparable.
Card, David, "The Impact of the Mariel Boatlift on the Miami Labor Market," Industrial and Labor Relations Review, 43, 1990, pp. 245-257.

Difference in Differences
i = individual, T = 0 for no immigration, T = 1 for migration
$(Y_i \mid T) = Y_{i,T}$ = 1 if unemployed, 0 if employed.
c = city, t = period.
Unemployment rate in city c at time t is $E[Y_{i,0} \mid c,t]$ with no migration
Unemployment rate in city c at time t is $E[Y_{i,1} \mid c,t]$ with migration
Assume $E[Y_{i,0} \mid c,t] = \beta_t + \gamma_c$
$E[Y_{i,1} \mid c,t] = \beta_t + \gamma_c + \delta = E[Y_{i,0} \mid c,t] + \delta$
$\delta$ = the effect of the immigration on the unemployment rate.

Applying the Model

c = M for Miami, L for Los Angeles
Immigration occurs in Miami, not Los Angeles
t = 1979, 1981 (pre- and post-)
Sample moment equations: $E[Y_i \mid c,t,T]$
$E[Y_i \mid M,79] = \beta_{79} + \gamma_M$
$E[Y_i \mid M,81] = \beta_{81} + \gamma_M + \delta$
$E[Y_i \mid L,79] = \beta_{79} + \gamma_L$
$E[Y_i \mid L,81] = \beta_{81} + \gamma_L$
It is assumed that unemployment growth in the two cities would be the same if there were no immigration.

Implications for Differences

If neither city is exposed to migration
$E[Y_{i,0} \mid M,81] - E[Y_{i,0} \mid M,79] = \beta_{81} - \beta_{79}$ (Miami)
$E[Y_{i,0} \mid L,81] - E[Y_{i,0} \mid L,79] = \beta_{81} - \beta_{79}$ (LA)

If both cities are exposed to migration
$E[Y_{i,1} \mid M,81] - E[Y_{i,0} \mid M,79] = \beta_{81} - \beta_{79} + \delta$ (Miami)
$E[Y_{i,1} \mid L,81] - E[Y_{i,0} \mid L,79] = \beta_{81} - \beta_{79} + \delta$ (LA)

One city (Miami) exposed to migration: The difference in differences is
$\{E[Y_{i,1} \mid M,81] - E[Y_{i,0} \mid M,79]\} - \{E[Y_{i,0} \mid L,81] - E[Y_{i,0} \mid L,79]\} = \delta$ (Miami)

UK Office of Fair Trading, May 2012; Stephen Davies

http://dera.ioe.ac.uk/14610/1/oft1416.pdf


Outcome is the fees charged.

Activity is collusion on fees.

Treatment Schools: Treatment is an intervention by the Office of Fair Trading
Control Schools were not involved in the conspiracy
Treatment is not voluntary

Apparent Impact of the Intervention


Treatment (Intervention)
Effect = 1 +
2 if SS school


To test robustness, two versions of the fixed effects model were run. The first uses ordinary least squares standard errors; the second uses heteroscedasticity and autocorrelation consistent (HAC) standard errors, to guard against heteroscedasticity and autocorrelation.

The cumulative impact of the intervention is the area between the two paths from intervention to time T.

The Fixed Effects Model


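$y_i = X_i\beta + d_i\alpha_i + \varepsilon_i$, for each individual

$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta + \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_N \end{bmatrix}\alpha + \varepsilon$
$\;\; = [X, D]\begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon$
$\;\; = Z\delta + \varepsilon$

$E[c_i \mid X_i] = g(X_i)$; Effects are correlated with included variables.
$\mathrm{Cov}[x_{it}, c_i] \neq 0$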

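The Within Groups Transformation Removes the Effects

$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
$\bar{y}_i = \bar{x}_i'\beta + c_i + \bar{\varepsilon}_i$
$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$
Use least squares to estimate $\beta$.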

Useful Analysis of Variance Notation

Decomposition of Total variation:

$\sum_{i=1}^{N}\sum_{t=1}^{T_i}(z_{it} - \bar{z})^2 = \sum_{i=1}^{N}\sum_{t=1}^{T_i}(z_{it} - \bar{z}_{i.})^2 + \sum_{i=1}^{N} T_i\,(\bar{z}_{i.} - \bar{z})^2$

Total variation = Within groups variation + Between groups variation

WHO Data

Baltagi and Griffin's Gasoline Data

World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are
COUNTRY  = name of country
YEAR     = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG    = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for analysis of these data. The article on which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures," European Economic Review, 22, 1983, pp. 117-137. The data were downloaded from the website for Baltagi's text.

Analysis of Variance
+--------------------------------------------------------------------------+
| Analysis of Variance for        LGASPCAR                                 |
| Stratification Variable         _STRATUM                                 |
| Observations weighted by        ONE                                      |
| Total Sample Size                    342                                 |
| Number of Groups                      18                                 |
| Number of groups with no data          0                                 |
| Overall Sample Mean            4.2962420                                 |
| Sample Standard Deviation       .5489071                                 |
| Total Sample Variance           .3012990                                 |
|                                                                          |
| Source of Variation       Variation      Deg.Fr.     Mean Square         |
| Between Groups           85.68228007        17          5.04013          |
| Within Groups            17.06068428       324           .05266          |
| Total                   102.74296435       341           .30130          |
| Residual S.D.              .22946990                                     |
| R-squared                  .83394791     MSB/MSW       21.96425          |
| F ratio                  95.71734806     P value         .00000          |
+--------------------------------------------------------------------------+

Estimating the Fixed Effects Model


The FEM is a plain vanilla regression model but with many independent variables
Least squares is unbiased, consistent, efficient, but inconvenient if N is large.

$\begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} X'X & X'D \\ D'X & D'D \end{bmatrix}^{-1}\begin{bmatrix} X'y \\ D'y \end{bmatrix}$

Using the Frisch-Waugh theorem
$b = [X'M_D X]^{-1} X'M_D y$

Fixed Effects Estimator (cont.)


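$M_D = \begin{bmatrix} M_D^1 & 0 & \cdots & 0 \\ 0 & M_D^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_D^N \end{bmatrix}$ (The dummy variables are orthogonal)

$M_D^i = I_{T_i} - d_i(d_i'd_i)^{-1}d_i' = I_{T_i} - (1/T_i)\,d_i d_i'$

$X'M_D X = \sum_{i=1}^{N} X_i'M_D^i X_i$, $\;\;(X_i'M_D^i X_i)_{k,l} = \sum_{t=1}^{T_i}(x_{it,k} - \bar{x}_{i.,k})(x_{it,l} - \bar{x}_{i.,l})$
$X'M_D y = \sum_{i=1}^{N} X_i'M_D^i y_i$, $\;\;(X_i'M_D^i y_i)_{k} = \sum_{t=1}^{T_i}(x_{it,k} - \bar{x}_{i.,k})(y_{it} - \bar{y}_{i.})$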

Least Squares Dummy Variable Estimator

b is obtained by within groups least squares (group mean deviations)

a is estimated using the normal equations:
$D'Xb + D'Da = D'y$
$a = (D'D)^{-1}D'(y - Xb)$
$a_i = (1/T_i)\sum_{t=1}^{T_i}(y_{it} - x_{it}'b) = \bar{e}_i$
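The equivalence between the dummy variable regression and within groups least squares can be verified on simulated data. The sketch below assumes a small balanced panel with made-up coefficients. It estimates [b, a] by brute-force LSDV, then recovers the same b from group mean deviations and the same a from the group means of y - Xb.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 30, 5, 2
groups = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)                                   # individual constants
X = rng.normal(size=(N * T, K)) + alpha[groups][:, None]     # regressors correlated with effects
y = X @ np.array([1.0, -0.5]) + alpha[groups] + rng.normal(size=N * T)

# LSDV: regress y on X and the full set of N group dummies D
D = (groups[:, None] == np.arange(N)[None, :]).astype(float)
coef = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0]
b_lsdv, a_lsdv = coef[:K], coef[K:]

# Within estimator: least squares on deviations from group means
counts = D.sum(axis=0)                                       # T_i for each group
X_bar = (D.T @ X) / counts[:, None]
y_bar = (D.T @ y) / counts
b_within = np.linalg.lstsq(X - X_bar[groups], y - y_bar[groups], rcond=None)[0]
a_within = y_bar - X_bar @ b_within                          # a_i = mean_t(y_it - x_it'b)

print(np.max(np.abs(b_lsdv - b_within)), np.max(np.abs(a_lsdv - a_within)))
```

The within route never forms the N dummy columns, which is why it is the practical choice when N is large.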

Inference About OLS

Assume strict exogeneity: $\mathrm{Cov}[\varepsilon_{it}, (x_{js}, c_j)] = 0$. Every disturbance in every period for each person is uncorrelated with variables and effects for every person and across periods.
Now, it's just least squares in a classical linear regression model.

$\mathrm{Asy.Var}[b] = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^{N} T_i}\,\mathrm{plim}\left[\frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N} X_i'M_D^i X_i\right]^{-1}$, which is the usual estimator for OLS

$\hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - N - K}$ (Note the degrees of freedom correction)

Application: Cornwell and Rupert

LSDV Results
Note huge changes in the coefficients. SMSA and MS change signs. Significance changes completely!

Pooled OLS

The Effect of the Effects

The Within (LSDV) Estimator is an IV Estimator

$y = X\beta + (D\alpha + \varepsilon)$
$\;\; = X\beta + w$
Regression of y on X is inconsistent because X is correlated with w. The data in group mean deviations is
$Z = M_D X = X - D(D'D)^{-1}D'X$
The inconsistent OLS estimator is $b = (X'X)^{-1}X'y$ (omits D)
The IV estimator is $b_{LSDV} = (Z'X)^{-1}Z'y = (X'M_D X)^{-1}X'M_D y$
$\;\;\;\;\;\;\;\;\;\;\;\; = [(X'M_D)(M_D X)]^{-1}(X'M_D)(M_D y)$
This is OLS using data in mean deviations, i.e., LSDV.

LSDV As Usual

2SLS Using $Z = M_D X$ as Instruments

A Caution About Stata and R2


R squared = 1 - (Residual Sum of Squares)/(Total Sum of Squares)
Or is it? What is the total sum of squares?

For the FE model above,
Conventional: Total Sum of Squares = $\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar{y})^2$; R2 = 0.90542
"Within Sum of Squares" = $\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar{y}_i)^2$; R2 = 0.65142
Which should appear in the denominator of R2?

The coefficient estimates and standard errors are the same. The calculation of the R2 is different. In the areg procedure, you are estimating coefficients for each of your covariates plus each dummy variable for your groups. In the xtreg, fe procedure the R2 reported is obtained by only fitting a mean deviated model where the effects of the groups (all of the dummy variables) are assumed to be fixed quantities. So, all of the effects for the groups are simply subtracted out of the model and no attempt is made to quantify their overall effect on the fit of the model.
The SSE is the same in both procedures but the SST is not, so R2 = 1 - SSE/SST is very different. The difference is real in that we are making different assumptions with the two approaches. In the xtreg, fe approach, the effects of the groups are fixed and unestimated quantities are subtracted out of the model before the fit is performed. In the areg approach, the group effects are estimated and affect the total sum of squares of the model under consideration.

Examining the Effects with a KDE


[Figure: Fixed Effects from Cornwell and Rupert Wage Model. Histogram and kernel density estimate for AI (the estimated fixed effects); vertical axes: Frequency and Density; horizontal axis: AI. Mean = 4.819, Standard deviation = 1.054.]

Robust Covariance Matrix for LSDV


Cluster Estimator for Within Estimator
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|OCC     |   -.02021        .01374007      -1.471    .1412   .5111645 |
|SMSA    |   -.04251**      .01950085      -2.180    .0293   .6537815 |
|MS      |   -.02946        .01913652      -1.540    .1236   .8144058 |
|EXP     |    .09666***     .00119162      81.114    .0000  19.853782 |
+--------+------------------------------------------------------------+
+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering.    |
| Sample of 4165 observations contained 595 clusters defined by       |
| 7 observations (fixed number) in each cluster.                      |
+---------------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|DOCC    |   -.02021        .01982162      -1.020    .3078     .00000 |
|DSMSA   |   -.04251        .03091685      -1.375    .1692     .00000 |
|DMS     |   -.02946        .02635035      -1.118    .2635     .00000 |
|DEXP    |    .09666***     .00176599      54.732    .0000     .00000 |
+--------+------------------------------------------------------------+

Time Invariant Regressors

A time invariant $x_{it,k}$ is one that does not vary over t for any i. E.g., the sex dummy variable, FEM, and ED (education) in the Cornwell/Rupert data.

If $x_{it,k}$ is invariant for all t, then the group mean deviations are all 0.

FE With Time Invariant Variables


+----------------------------------------------------+
| There are 2 vars. with no within group variation.  |
| FEM      ED                                        |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 EXP     |    .09671227      .00119137      81.177    .0000  19.8537815
 WKS     |    .00118483      .00060357       1.963    .0496  46.8115246
 OCC     |   -.02145609      .01375327      -1.560    .1187   .51116447
 SMSA    |   -.04454343      .01946544      -2.288    .0221   .65378151
 FEM     |    .000000    ......(Fixed Parameter).......
 ED      |    .000000    ......(Fixed Parameter).......
+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|      Model               Log-Likelihood  Sum of Squares  R-squared |
|(1) Constant term only      -2688.80597      886.90494      .00000  |
|(2) Group effects only         27.58464      240.65119      .72866  |
|(3) X - variables only      -1688.12010      548.51596      .38154  |
|(4) X and group effects      2223.20087       83.85013      .90546  |
+--------------------------------------------------------------------+

Drop The Time Invariant Variables


Same Results
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 EXP     |    .09671227      .00119087      81.211    .0000  19.8537815
 WKS     |    .00118483      .00060332       1.964    .0495  46.8115246
 OCC     |   -.02145609      .01374749      -1.561    .1186   .51116447
 SMSA    |   -.04454343      .01945725      -2.289    .0221   .65378151
+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|      Model               Log-Likelihood  Sum of Squares  R-squared |
|(1) Constant term only      -2688.80597      886.90494      .00000  |
|(2) Group effects only         27.58464      240.65119      .72866  |
|(3) X - variables only      -1688.12010      548.51596      .38154  |
|(4) X and group effects      2223.20087       83.85013      .90546  |
+--------------------------------------------------------------------+

No change in the sum of squared residuals

Fixed Effects Vector Decomposition


"Efficient Estimation of Time Invariant and Rarely Changing Variables in Finite Sample Panel Analyses with Unit Fixed Effects"
Thomas Plümper and Vera Troeger
Political Analysis, 2007

Introduction
[T]he FE model does not allow the estimation of time invariant variables. A second drawback of the FE model results from its inefficiency in estimating the effect of variables that have very little within variance.
This article discusses a remedy to the related problems of estimating time invariant and rarely changing variables in FE models with unit effects

The Model
$y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \varepsilon_{it}$

where $\alpha_i$ denote the N unit effects.

Fixed Effects Vector Decomposition


Step 1: Compute the fixed effects regression to get the estimated unit effects. We run this FE model with the sole intention to obtain estimates of the unit effects, $\hat{\alpha}_i$.

$\hat{\alpha}_i = \bar{y}_i - \sum_{k=1}^{K} b_k^{FE}\,\bar{x}_{ki}$

Step 2
Regress $a_i$ on $z_i$ and compute residuals

$a_i = \sum_{m=1}^{M}\gamma_m z_{im} + h_i$

$h_i$ is orthogonal to $z_i$ (since it is a residual)
Vector h is expanded so each element $h_i$ is replicated $T_i$ times; h is then the length of the full sample.

Step 3
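Regress $y_{it}$ on a constant, X, Z and h using ordinary least squares to estimate $\alpha$, $\beta$, $\gamma$, $\delta$.

$y_{it} = \alpha + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \delta h_i + \varepsilon_{it}$

Notice that $\alpha_i$ in the original model has become $\alpha + \delta h_i$ in the revised model.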
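The three steps can be traced on simulated data. The sketch below assumes one time varying regressor x, one time invariant regressor z, a balanced panel and arbitrary coefficient values, and it includes a constant in Step 2. It is meant only to show the mechanics: Step 3 returns the Step 1 slope on x, the Step 2 constant and slope on z, and a coefficient of exactly 1 on h.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 4
g = np.repeat(np.arange(N), T)                     # group index for each observation
z = rng.normal(size=N)                             # time invariant regressor z_i
u = rng.normal(size=N)                             # unobserved unit effect
x = rng.normal(size=N * T) + 0.5 * u[g]            # time varying regressor, correlated with u
y = 1.0 + 0.8 * x + 0.6 * z[g] + u[g] + rng.normal(size=N * T)

# Step 1: fixed effects (within) regression, then the estimated unit effects a_i
x_bar = np.bincount(g, weights=x) / T
y_bar = np.bincount(g, weights=y) / T
xd, yd = x - x_bar[g], y - y_bar[g]
b_fe = (xd @ yd) / (xd @ xd)
a = y_bar - b_fe * x_bar

# Step 2: regress a_i on a constant and z_i (N observations), keep the residuals h_i
Z2 = np.column_stack([np.ones(N), z])
gamma = np.linalg.lstsq(Z2, a, rcond=None)[0]
h = a - Z2 @ gamma

# Step 3: pooled OLS of y on a constant, x, z and h (h_i replicated T times)
X3 = np.column_stack([np.ones(N * T), x, z[g], h[g]])
step3 = np.linalg.lstsq(X3, y, rcond=None)[0]
print(step3, b_fe, gamma)
```

Because a_i equals the Step 2 fitted value plus h_i, the Step 3 regressors span the estimated unit effects exactly, so the Step 3 coefficients are an algebraic restatement of Steps 1 and 2.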

Step 1 (Based on full sample)


These 2 variables have no within group variation.
FEM      ED
F.E. estimates are based on a generalized inverse.
--------+---------------------------------------------------------
        |                 Standard              Prob.       Mean
   LWAGE|  Coefficient      Error        z      z>|Z|       of X
--------+---------------------------------------------------------
     EXP|    .09663***     .00119      81.13    .0000     19.8538
     WKS|    .00114*       .00060       1.88    .0600     46.8115
     OCC|   -.02496*       .01390      -1.80    .0724      .51116
     IND|    .02042        .01558       1.31    .1899      .39544
   SOUTH|   -.00091        .03457       -.03    .9791      .29028
    SMSA|   -.04581**      .01955      -2.34    .0191      .65378
   UNION|    .03411**      .01505       2.27    .0234      .36399
     FEM|    .000       .....(Fixed Parameter).....        .11261
      ED|    .000       .....(Fixed Parameter).....       12.8454
--------+---------------------------------------------------------

Step 2 (Based on 595 observations)


--------+---------------------------------------------------------
        |                 Standard              Prob.       Mean
     UHI|  Coefficient      Error        z      z>|Z|       of X
--------+---------------------------------------------------------
Constant|   2.88090***     .07172      40.17    .0000
     FEM|   -.09963**      .04842      -2.06    .0396      .11261
      ED|    .14616***     .00541      27.02    .0000     12.8454
--------+---------------------------------------------------------

Step 3!
--------+---------------------------------------------------------
        |                 Standard              Prob.       Mean
   LWAGE|  Coefficient      Error        z      z>|Z|       of X
--------+---------------------------------------------------------
Constant|   2.88090***     .03282      87.78    .0000
     EXP|    .09663***     .00061     157.53    .0000     19.8538
     WKS|    .00114***     .00044       2.58    .0098     46.8115
     OCC|   -.02496***     .00601      -4.16    .0000      .51116
     IND|    .02042***     .00479       4.26    .0000      .39544
   SOUTH|   -.00091        .00510       -.18    .8590      .29028
    SMSA|   -.04581***     .00506      -9.06    .0000      .65378
   UNION|    .03411***     .00521       6.55    .0000      .36399
     FEM|   -.09963***     .00767     -13.00    .0000      .11261
      ED|    .14616***     .00122     120.19    .0000     12.8454
      HI|   1.00000***     .00670     149.26    .0000    -.103D-13
--------+---------------------------------------------------------

What happened here?


$y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \varepsilon_{it}$

where $\alpha_i$ denote the N unit effects.

An assumption is added along the way: $\mathrm{Cov}(\alpha_i, Z_i) = 0$. This is exactly the number of orthogonality assumptions needed to identify $\gamma$. It is not part of the original model.

http://davegiles.blogspot.com/2012/06/fixed-effects-vector-decomposition.html


The Random Effects Model

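The random effects model
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, observation for person i at time t
$y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i$, $T_i$ observations in group i ($\mathbf{i}$ is a column of ones)
$\;\;\;\, = X_i\beta + \mathbf{c}_i + \varepsilon_i$, note $\mathbf{c}_i = (c_i, c_i, \ldots, c_i)'$
$y = X\beta + \mathbf{c} + \varepsilon$, $\sum_{i=1}^{N} T_i$ observations in the sample
$\mathbf{c} = (\mathbf{c}_1', \mathbf{c}_2', \ldots, \mathbf{c}_N')'$, $\sum_{i=1}^{N} T_i$ by 1 vector

$c_i$ is uncorrelated with $x_{it}$ for all t;
$E[c_i \mid X_i] = 0$
$E[\varepsilon_{it} \mid X_i, c_i] = 0$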

Notation

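$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix} + \begin{bmatrix} u_1\mathbf{i}_1 \\ u_2\mathbf{i}_2 \\ \vdots \\ u_N\mathbf{i}_N \end{bmatrix}$ ($T_1$, $T_2$, ..., $T_N$ observations, respectively)
$\;\; = X\beta + \varepsilon + u$, $\sum_{i=1}^{N} T_i$ observations
$\;\; = X\beta + w$

In all that follows, except where explicitly noted, X, $X_i$ and $x_{it}$ contain a constant term as the first element. To avoid notational clutter, in those cases, $x_{it}$ etc. will simply denote the counterpart without the constant term. Use of the symbol K for the number of variables will thus be context specific but will usually include the constant term.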

Error Components Model


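A Generalized Regression Model
$y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$
$E[\varepsilon_{it} \mid X_i] = 0$
$E[\varepsilon_{it}^2 \mid X_i] = \sigma_\varepsilon^2$
$E[u_i \mid X_i] = 0$
$E[u_i^2 \mid X_i] = \sigma_u^2$
$y_i = X_i\beta + \varepsilon_i + u_i\mathbf{i}$ for $T_i$ observations

$\mathrm{Var}[\varepsilon_i + u_i\mathbf{i}] = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix}$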

Notation

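$\mathrm{Var}[\varepsilon_i + u_i\mathbf{i}] = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix}$
$\;\; = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2\,\mathbf{i}\mathbf{i}'$ ($T_i \times T_i$)
$\;\; = \Omega_i$

$\mathrm{Var}[w \mid X] = \begin{bmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega_N \end{bmatrix}$ (Note these differ only in the dimension $T_i$)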

Convergence of Moments

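$\frac{X'X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'X_i}{T_i}$, a weighted sum of individual moment matrices, with $f_i = T_i / \sum_{i=1}^{N} T_i$

$\frac{X'\Omega X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'\Omega_i X_i}{T_i}$, a weighted sum of individual moment matrices
$\;\;\;\;\;\;\;\; = \sigma_\varepsilon^2\sum_{i=1}^{N} f_i\,\frac{X_i'X_i}{T_i} + \sigma_u^2\sum_{i=1}^{N} f_i\,T_i\,\bar{x}_i\bar{x}_i'$

Note asymptotics are with respect to N. Each matrix $X_i'X_i/T_i$ is the moments for the $T_i$ observations. Should be 'well behaved' in micro level data. The average of N such matrices should be likewise. T or $T_i$ is assumed to be fixed (and small).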

Random vs. Fixed Effects

Random Effects
Small number of parameters
Efficient estimation
Objectionable orthogonality assumption ($c_i \perp X_i$)

Fixed Effects
Robust: generally consistent
Large number of parameters

Ordinary Least Squares

Standard results for OLS in a GR model

Consistent
Unbiased
Inefficient

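True variance of the least squares estimator

$$
\mathrm{Var}[b \mid X] = \frac{1}{\sum_{i=1}^N T_i}
\left[\frac{X'X}{\sum_{i=1}^N T_i}\right]^{-1}
\left[\frac{X'\Omega X}{\sum_{i=1}^N T_i}\right]
\left[\frac{X'X}{\sum_{i=1}^N T_i}\right]^{-1}
$$

$$ \rightarrow 0 \times Q^{-1}\, Q^{*}\, Q^{-1} \rightarrow 0 \text{ as } N \rightarrow \infty $$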

Part 16: Panel Data

Estimating the Variance for OLS

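$$
\mathrm{Var}[b \mid X] = \frac{1}{\sum_{i=1}^N T_i}
\left[\frac{X'X}{\sum_{i=1}^N T_i}\right]^{-1}
\left[\frac{X'\Omega X}{\sum_{i=1}^N T_i}\right]
\left[\frac{X'X}{\sum_{i=1}^N T_i}\right]^{-1}
$$

In the spirit of the White estimator, use

$$
\frac{\widehat{X'\Omega X}}{\sum_{i=1}^N T_i} = \sum_{i=1}^N f_i\,\frac{X_i'\hat{w}_i\hat{w}_i'X_i}{T_i},
\qquad \hat{w}_i = y_i - X_i b, \qquad f_i = \frac{T_i}{\sum_{i=1}^N T_i}
$$

Hypothesis tests are then based on Wald statistics.
THIS IS THE 'CLUSTER' ESTIMATOR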
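The cluster estimator can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `cluster_robust_cov` and the simulated panel are hypothetical, and no finite-sample degrees-of-freedom correction is applied (packages usually add one). Substituting $f_i = T_i/\sum T_i$ into the slide's expression, the scale factors cancel and the estimator reduces to $(X'X)^{-1}\left[\sum_i X_i'\hat{w}_i\hat{w}_i'X_i\right](X'X)^{-1}$, which is what the code computes.

```python
import numpy as np

def cluster_robust_cov(X, y, groups):
    """Pooled OLS coefficients and the cluster covariance estimate.

    Hypothetical helper: computes (X'X)^-1 [sum_i X_i' w_i w_i' X_i] (X'X)^-1
    with w_i the OLS residuals for group i and no small-sample correction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    w = y - X @ b                          # pooled OLS residuals
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        idx = groups == g
        s = X[idx].T @ w[idx]              # X_i' w_i, a K-vector
        meat += np.outer(s, s)             # X_i' w_i w_i' X_i
    return b, xtx_inv @ meat @ xtx_inv

# Simulated balanced panel (illustrative data only, not the Cornwell-Rupert file).
rng = np.random.default_rng(42)
N, T = 200, 7
groups = np.repeat(np.arange(N), T)
u = rng.normal(scale=0.3, size=N)[groups]              # common group effect
x = rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
y = 1.0 + 0.5 * x + u + rng.normal(scale=0.15, size=N * T)

b, V_cluster = cluster_robust_cov(X, y, groups)
se_cluster = np.sqrt(np.diag(V_cluster))
```

When every observation is its own group, the same function returns the ordinary White (HC0) estimator, which is a convenient sanity check.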
Part 16: Panel Data

OLS Results for Cornwell and Rupert


+----------------------------------------------------+
| Residuals   Sum of squares       =   522.2008      |
|             Standard error of e  =   .3544712      |
| Fit         R-squared            =   .4112099      |
|             Adjusted R-squared   =   .4100766      |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     5.40159723      .04838934     111.628    .0000
 EXP           .04084968      .00218534      18.693    .0000   19.8537815
 EXPSQ        -.00068788      .480428D-04   -14.318    .0000   514.405042
 OCC          -.13830480      .01480107      -9.344    .0000    .51116447
 SMSA          .14856267      .01206772      12.311    .0000    .65378151
 MS            .06798358      .02074599       3.277    .0010    .81440576
 FEM          -.40020215      .02526118     -15.843    .0000    .11260504
 UNION         .09409925      .01253203       7.509    .0000    .36398559
 ED            .05812166      .00260039      22.351    .0000   12.8453782

16-106/135

Part 16: Panel Data

Alternative Variance Estimators

16-107/135

+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant     5.40159723      .04838934     111.628    .0000
 EXP           .04084968      .00218534      18.693    .0000
 EXPSQ        -.00068788      .480428D-04   -14.318    .0000
 OCC          -.13830480      .01480107      -9.344    .0000
 SMSA          .14856267      .01206772      12.311    .0000
 MS            .06798358      .02074599       3.277    .0010
 FEM          -.40020215      .02526118     -15.843    .0000
 UNION         .09409925      .01253203       7.509    .0000
 ED            .05812166      .00260039      22.351    .0000
Robust Cluster___________________________________________
 Constant     5.40159723      .10156038      53.186    .0000
 EXP           .04084968      .00432272       9.450    .0000
 EXPSQ        -.00068788      .983981D-04    -6.991    .0000
 OCC          -.13830480      .02772631      -4.988    .0000
 SMSA          .14856267      .02423668       6.130    .0000
 MS            .06798358      .04382220       1.551    .1208
 FEM          -.40020215      .04961926      -8.065    .0000
 UNION         .09409925      .02422669       3.884    .0001
 ED            .05812166      .00555697      10.459    .0000

Part 16: Panel Data

Generalized Least Squares

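$$ \hat{\beta} = [X'\Omega^{-1}X]^{-1}[X'\Omega^{-1}y] $$

$$ \phantom{\hat{\beta}} = \left[\sum_{i=1}^N X_i'\Omega_i^{-1}X_i\right]^{-1}\left[\sum_{i=1}^N X_i'\Omega_i^{-1}y_i\right] $$

$$ \Omega_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[I_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\mathbf{i}\mathbf{i}'\right] $$

(note, depends on $i$ only through $T_i$)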

Part 16: Panel Data

Generalized Least Squares

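GLS is equivalent to OLS regression of

$$ y_{it}^* = y_{it} - \theta_i\bar{y}_{i.} \quad \text{on} \quad x_{it}^* = x_{it} - \theta_i\bar{x}_{i.}, $$

$$ \text{where } \theta_i = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T_i\sigma_u^2}} $$

$$ \text{Asy.Var}[\hat{\beta}] = [X'\Omega^{-1}X]^{-1} = \sigma_\varepsilon^2\,[X^{*\prime}X^*]^{-1} $$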
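The partial-demeaning transformation can be sketched as follows. This is a hedged illustration rather than any package's implementation: the helper names `theta` and `quasi_demean` are hypothetical, and the variance components plugged in at the end are the values reported later in these slides for the Cornwell and Rupert data ($\hat\sigma_\varepsilon^2 = .023119$, $\hat\sigma_u^2 = .102531$, $T = 7$).

```python
import numpy as np

def theta(sig_e2, sig_u2, T):
    """theta_i = 1 - sigma_e / sqrt(sigma_e^2 + T_i * sigma_u^2)."""
    return 1.0 - np.sqrt(sig_e2 / (sig_e2 + T * sig_u2))

def quasi_demean(v, groups, sig_e2, sig_u2):
    """Return v_it - theta_i * vbar_i for a vector or a matrix of columns.

    Hypothetical helper; T_i is taken from the group sizes, so unbalanced
    panels get a different theta_i per group.
    """
    v = np.asarray(v, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(v)
    for g in np.unique(groups):
        idx = groups == g
        th = theta(sig_e2, sig_u2, idx.sum())
        out[idx] = v[idx] - th * v[idx].mean(axis=0)
    return out

# Variance components as reported in the application slides (T = 7).
theta_cr = theta(0.023119, 0.102531, 7)

# Check the GLS property on a small example: with P = I - (theta/T) i i',
# the transformed disturbance covariance P * Omega * P equals sigma_e^2 * I.
T, s_e2, s_u2 = 4, 0.5, 2.0
J = np.ones((T, T))
Omega = s_e2 * np.eye(T) + s_u2 * J
P = np.eye(T) - theta(s_e2, s_u2, T) / T * J
transformed = P @ Omega @ P
```

A value of $\theta_i$ near 1 means the random effects estimator behaves much like the within (fixed effects) estimator; $\theta_i = 0$ reproduces pooled OLS.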

Part 16: Panel Data

Estimators for the Variances


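$y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$

Using the OLS estimator of $\beta$, $b_{OLS}$,

$$ \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a - x_{it}'b)^2}{\sum_{i=1}^N T_i - 1 - K} \quad \text{estimates } \sigma_\varepsilon^2 + \sigma_u^2 $$

With the LSDV estimates, $a_i$ and $b_{LSDV}$,

$$ \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^N T_i - N - K} \quad \text{estimates } \sigma_\varepsilon^2 $$

Using the difference of the two,

$$
\frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a - x_{it}'b)^2}{\sum_{i=1}^N T_i - 1 - K}
- \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^N T_i - N - K}
\quad \text{estimates } \sigma_u^2
$$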

Part 16: Panel Data

Practical Problems with FGLS


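The preceding regularly produce negative estimates of $\sigma_u^2$.
Estimation is made very complicated in unbalanced panels.
A bulletproof solution (originally used in TSP, now NLOGIT and others).

From the robust LSDV estimator:

$$ \hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^N T_i} $$

From the pooled OLS estimator:

$$ \text{Est}(\sigma_\varepsilon^2 + \sigma_u^2) = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_{OLS} - x_{it}'b_{OLS})^2}{\sum_{i=1}^N T_i} \geq \hat{\sigma}_\varepsilon^2 $$

$$
\hat{\sigma}_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_{OLS} - x_{it}'b_{OLS})^2 - \sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^N T_i} \geq 0
$$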

Part 16: Panel Data

Stata Variance Estimators


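$$ \hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^N T_i - K - N} > 0 \quad \text{based on FE estimates} $$

$$ \hat{\sigma}_u^2 = \text{Max}\left[0,\ \frac{\text{SSE(group means)}}{N - A} - \frac{(N - K)\,\hat{\sigma}_\varepsilon^2}{(N - A)\,T}\right] \geq 0 $$

where $A = K$ or, if $\hat{\sigma}_u^2$ is negative,
$A$ = trace of a matrix that somewhat resembles $I_K$.

Many other adjustments exist. None guaranteed to be
positive. No optimality properties or even guaranteed consistency.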

Part 16: Panel Data

Other Variance Estimators


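From the group means regression:

$$ \hat{\sigma}_{MEANS}^2 = \frac{\sum_{i=1}^N (\bar{y}_i - \tilde{a} - \bar{x}_i'\tilde{b})^2}{N - K - 1} \quad \text{estimates } \sigma_\varepsilon^2 / T + \sigma_u^2 $$

(Wooldridge) Based on $E[w_{it}w_{is} \mid X_i] = \sigma_u^2$ if $t \neq s$,

$$ \hat{\sigma}_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i - 1} \sum_{s=t+1}^{T_i} \hat{w}_{it}\hat{w}_{is}}{\sum_{i=1}^N T_i - K - N} $$

There are many others. Generally if the original, standard choices fail,
these will also.
$x$ does not contain a constant term in the preceding.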

Part 16: Panel Data

Fixed Effects Estimates


----------------------------------------------------------------------
Least Squares with Group Dummy Variables..........
LHS=LWAGE    Mean                 =      6.67635
Residuals    Sum of squares       =     82.34912
             Standard error of e  =       .15205
These 2 variables have no within group variation.
FEM      ED
F.E. estimates are based on a generalized inverse.
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  b/St.Er.  P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
     EXP|    .11346***      .00247        45.982     .0000     19.8538
   EXPSQ|   -.00042***      .544864D-04   -7.789     .0000     514.405
     OCC|   -.02106         .01373        -1.534     .1251      .51116
    SMSA|   -.04209**       .01934        -2.177     .0295      .65378
      MS|   -.02915         .01897        -1.536     .1245      .81441
     FEM|    .000       ......(Fixed Parameter).......
   UNION|    .03413**       .01491         2.290     .0220      .36399
      ED|    .000       ......(Fixed Parameter).......
--------+-------------------------------------------------------------

16-114/135

Part 16: Panel Data

Computing Variance Estimators


Using the full list of variables (FEM and ED are time invariant),
OLS sum of squares = 522.2008.
$\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_u^2$ = 522.2008 / (4165 - 9) = 0.12565.

Using the full list of variables and a generalized inverse (same
as dropping FEM and ED), LSDV sum of squares = 82.34912.
$\hat{\sigma}_\varepsilon^2$ = 82.34912 / (4165 - 8 - 595) = 0.023119.

$\hat{\sigma}_u^2$ = 0.12565 - 0.023119 = 0.10253

Both estimators are positive. We stop here. If $\hat{\sigma}_u^2$ were
negative, we would use estimators without DF corrections.
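The arithmetic on this slide can be reproduced directly. The snippet below is a minimal sketch that uses only the sums of squares and degrees-of-freedom counts quoted above; the variable names are illustrative. The last line computes the implied within-group correlation, $\hat{\sigma}_u^2 / (\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_u^2)$, which the random effects output reports as Corr[v(i,t),v(i,s)].

```python
# Sums of squares and counts quoted on the slide (Cornwell and Rupert data).
sse_ols = 522.2008      # pooled OLS residual sum of squares
sse_lsdv = 82.34912     # LSDV (fixed effects) residual sum of squares
n_obs = 4165            # 595 individuals x 7 years
n_groups = 595
k_ols = 9               # degrees of freedom used by OLS, as on the slide
k_lsdv = 8              # slope count used in the LSDV divisor, as on the slide

total_var = sse_ols / (n_obs - k_ols)                 # estimates sigma_e^2 + sigma_u^2
sig_e2 = sse_lsdv / (n_obs - k_lsdv - n_groups)       # estimates sigma_e^2
sig_u2 = total_var - sig_e2                           # difference estimates sigma_u^2
rho = sig_u2 / total_var                              # implied Corr[v(i,t), v(i,s)]

print(round(total_var, 5), round(sig_e2, 6), round(sig_u2, 5), round(rho, 4))
```

If `sig_u2` came out negative, the slide's fallback would be to repeat the calculation with the plain divisor `n_obs` in both terms.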

16-115/135

Part 16: Panel Data

Application
----------------------------------------------------------------------
Random Effects Model: v(i,t) = e(i,t) + u(i)
Estimates:  Var[e]              =   .023119
            Var[u]              =   .102531
            Corr[v(i,t),v(i,s)] =   .816006
Lagrange Multiplier Test vs. Model (3) = 3713.07
( 1 degrees of freedom, prob. value = .000000)
(High values of LM favor FEM/REM over CR model)
Fixed vs. Random Effects (Hausman)     =     .00 (Cannot be computed)
( 8 degrees of freedom, prob. value = 1.000000)
(High (low) values of H favor F.E.(R.E.) model)
            Sum of Squares          1411.241136
            R-squared                 -.591198
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 EXP           .08819204      .00224823      39.227    .0000   19.8537815
 EXPSQ        -.00076604      .496074D-04   -15.442    .0000   514.405042
 OCC          -.04243576      .01298466      -3.268    .0011    .51116447
 SMSA         -.03404260      .01620508      -2.101    .0357    .65378151
 MS           -.06708159      .01794516      -3.738    .0002    .81440576
 FEM          -.34346104      .04536453      -7.571    .0000    .11260504
 UNION         .05752770      .01350031       4.261    .0000    .36398559
 ED            .11028379      .00510008      21.624    .0000   12.8453782
 Constant     4.01913257      .07724830      52.029    .0000

16-116/135

Part 16: Panel Data

Testing for Effects: An LM Test

16-117/135

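Breusch and Pagan Lagrange Multiplier statistic

$$
y_{it} = x_{it}'\beta + u_i + \varepsilon_{it}, \qquad
\begin{pmatrix} u_i \\ \varepsilon_{it} \end{pmatrix} \sim \text{Normal}\left[
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_\varepsilon^2 \end{pmatrix}\right]
$$

$H_0: \sigma_u^2 = 0$

General

$$
LM = \frac{\left(\sum_{i=1}^N T_i\right)^2}{2\sum_{i=1}^N T_i(T_i - 1)}
\left[\frac{\sum_{i=1}^N (T_i\bar{e}_i)^2}{\sum_{i=1}^N \sum_{t=1}^{T_i} e_{it}^2} - 1\right]^2
$$

Balanced Panel

$$
LM = \frac{NT}{2(T - 1)}
\left[\frac{\sum_{i=1}^N \left[(T\bar{e}_i)^2 - e_i'e_i\right]}{\sum_{i=1}^N e_i'e_i}\right]^2
\rightarrow \chi^2[1]
$$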
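The general (unbalanced) form of the statistic can be sketched as a short function of the pooled OLS residuals. This is an illustration under stated assumptions: the name `breusch_pagan_lm` and the simulated residuals are hypothetical, and the residual vector is assumed to come from a pooled regression that has already been run.

```python
import numpy as np

def breusch_pagan_lm(e, groups):
    """Breusch-Pagan LM statistic for H0: sigma_u^2 = 0 (general formula).

    e      : pooled OLS residuals, one per observation
    groups : group identifier for each observation
    """
    e = np.asarray(e, dtype=float)
    _, inv = np.unique(np.asarray(groups), return_inverse=True)
    T = np.bincount(inv)                         # T_i for each group
    group_sums = np.bincount(inv, weights=e)     # T_i * ebar_i
    n = T.sum()
    ratio = (group_sums ** 2).sum() / (e ** 2).sum()
    return n ** 2 / (2.0 * (T * (T - 1)).sum()) * (ratio - 1.0) ** 2

# Illustrative residuals with a common group component (balanced, N=100, T=7).
rng = np.random.default_rng(1)
N, T = 100, 7
groups = np.repeat(np.arange(N), T)
e = rng.normal(scale=0.3, size=N)[groups] + rng.normal(scale=0.15, size=N * T)
e = e - e.mean()                                 # OLS residuals sum to zero
lm = breusch_pagan_lm(e, groups)

# Balanced-panel formula from the slide, for comparison.
ebar = e.reshape(N, T).mean(axis=1)
lm_balanced = N * T / (2 * (T - 1)) * (((T * ebar) ** 2).sum() / (e ** 2).sum() - 1) ** 2
```

Under the null hypothesis the statistic is compared with a chi-squared critical value with 1 degree of freedom (3.84 at the 5% level).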

Part 16: Panel Data

Application: Cornwell-Rupert

16-118/135

Part 16: Panel Data

Testing for Effects


Regress ; lhs=lwage ; rhs=fixedx,varyingx ; res=e $
Matrix  ; tebar=7*gxbr(e,person) $
Calc    ; list ; lm=595*7/(2*(7-1))*(tebar'tebar/sumsqdev - 1)^2 $
LM      = 3797.06757

16-119/135

Part 16: Panel Data

A Hausman Test for FE vs. RE


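Estimator               | Random Effects             | Fixed Effects
                        | E[ci|Xi] = 0               | E[ci|Xi] ≠ 0
------------------------+----------------------------+----------------------------
FGLS (Random Effects)   | Consistent and Efficient   | Inconsistent
LSDV (Fixed Effects)    | Consistent, Inefficient    | Consistent, Possibly Efficient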

16-120/135

Part 16: Panel Data

Computing the Hausman Statistic

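$$ \text{Est.Var}[\hat{\beta}_{FE}] = \hat{\sigma}_\varepsilon^2\left[\sum_{i=1}^N X_i'\left(I - \frac{1}{T_i}\mathbf{i}\mathbf{i}'\right)X_i\right]^{-1} $$

$$ \text{Est.Var}[\hat{\beta}_{RE}] = \hat{\sigma}_\varepsilon^2\left[\sum_{i=1}^N X_i'\left(I - \frac{\hat{\gamma}_i}{T_i}\mathbf{i}\mathbf{i}'\right)X_i\right]^{-1}, \qquad 0 \leq \hat{\gamma}_i = \frac{T_i\hat{\sigma}_u^2}{\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2} \leq 1 $$

As long as $\hat{\sigma}_\varepsilon^2$ and $\hat{\sigma}_u^2$ are consistent, as $N \rightarrow \infty$,
$\text{Est.Var}[\hat{\beta}_{FE}] - \text{Est.Var}[\hat{\beta}_{RE}]$
will be nonnegative definite. In a finite sample, to ensure this, both must
be computed using the same estimate of $\sigma_\varepsilon^2$. The one based on LSDV will
generally be the better choice.

Note that columns of zeros will appear in $\text{Est.Var}[\hat{\beta}_{FE}]$ if there are time
invariant variables in $X$.

$\beta$ does not contain the constant term in the preceding.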
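Given the two coefficient vectors and their estimated covariance matrices, the Hausman statistic is the quadratic form $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'[\text{Est.Var}(\hat{\beta}_{FE}) - \text{Est.Var}(\hat{\beta}_{RE})]^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE})$. A minimal sketch follows. The function name `hausman` is hypothetical, and it deliberately swaps the plain inverse for a Moore-Penrose pseudo-inverse with degrees of freedom set to the rank of the difference matrix; that is an assumption made here so that zero rows and columns (time invariant variables) do not break the computation, not something the slides prescribe.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic and degrees of freedom.

    Uses a pseudo-inverse of V_fe - V_re (an assumption of this sketch);
    degrees of freedom are the numerical rank of that difference.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    dV = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    H = float(d @ np.linalg.pinv(dV) @ d)
    df = int(np.linalg.matrix_rank(dV))
    return H, df

# Tiny hand-checkable example (made-up numbers, not from the slides):
# d = (1, 2), V_fe - V_re = diag(0.5, 2)  =>  H = 1/0.5 + 4/2 = 4, df = 2.
H, df = hausman(
    b_fe=np.array([2.0, 3.0]),
    b_re=np.array([1.0, 1.0]),
    V_fe=np.diag([1.0, 3.0]),
    V_re=np.diag([0.5, 1.0]),
)
```

The statistic is compared with a chi-squared critical value with `df` degrees of freedom; large values weigh against the random effects specification.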
Part 16: Panel Data

Hausman Test

16-122/135

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              =   .235368D-01  |
|             Var[u]              =   .110254D+00  |
|             Corr[v(i,t),v(i,s)] =   .824078      |
| Lagrange Multiplier Test vs. Model (3) = 3797.07 |
| ( 1 df, prob value = .000000)                    |
| (High values of LM favor FEM/REM over CR model.) |
| Fixed vs. Random Effects (Hausman)     = 2632.34 |
| ( 4 df, prob value = .000000)                    |
| (High (low) values of H favor FEM (REM).)        |
+--------------------------------------------------+

Part 16: Panel Data

Fixed Effects

16-123/135

+----------------------------------------------------+
| Panel:Groups   Empty        0,  Valid data    595  |
|                Smallest     7,  Largest         7  |
|                Average group size            7.00  |
| There are 2 vars. with no within group variation.  |
| ED       FEM                                       |
| Look for huge standard errors and fixed parameters.|
| F.E. results are based on a generalized inverse.   |
| They will be highly erratic. (Problematic model.)  |
| Unable to compute std.errors for dummy var. coeffs.|
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |    .00083    |    .00060003   |   1.381|  .1672 | 46.811525|
|OCC     |   -.02157    |    .01379216   |  -1.564|  .1178 |  .5111645|
|IND     |    .01888    |    .01545450   |   1.221|  .2219 |  .3954382|
|SOUTH   |    .00039    |    .03429053   |    .011|  .9909 |  .2902761|
|SMSA    |   -.04451**  |    .01939659   |  -2.295|  .0217 |  .6537815|
|UNION   |    .03274**  |    .01493217   |   2.192|  .0283 |  .3639856|
|EXP     |    .11327*** |    .00247221   |  45.819|  .0000 | 19.853782|
|EXPSQ   |   -.00042*** |    .546283D-04 |  -7.664|  .0000 | 514.40504|
|ED      |    .000      | ......(Fixed Parameter).......             |
|FEM     |    .000      | ......(Fixed Parameter).......             |
+--------+------------------------------------------------------------+

Part 16: Panel Data

Random Effects

16-124/135

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              =   .235368D-01  |
|             Var[u]              =   .110254D+00  |
|             Corr[v(i,t),v(i,s)] =   .824078      |
| Lagrange Multiplier Test vs. Model (3) = 3797.07 |
| ( 1 df, prob value = .000000)                    |
| (High values of LM favor FEM/REM over CR model.) |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |    .00094    |    .00059308   |   1.586|  .1128 | 46.811525|
|OCC     |   -.04367*** |    .01299206   |  -3.361|  .0008 |  .5111645|
|IND     |    .00271    |    .01373256   |    .197|  .8434 |  .3954382|
|SOUTH   |   -.00664    |    .02246416   |   -.295|  .7677 |  .2902761|
|SMSA    |   -.03117*   |    .01615455   |  -1.930|  .0536 |  .6537815|
|UNION   |    .05802*** |    .01349982   |   4.298|  .0000 |  .3639856|
|EXP     |    .08744*** |    .00224705   |  38.913|  .0000 | 19.853782|
|EXPSQ   |   -.00076*** |    .495876D-04 | -15.411|  .0000 | 514.40504|
|ED      |    .10724*** |    .00511463   |  20.967|  .0000 | 12.845378|
|FEM     |   -.24786*** |    .04283536   |  -5.786|  .0000 |  .1126050|
|Constant|   3.97756*** |    .08178139   |  48.637|  .0000 |          |
+--------+------------------------------------------------------------+

Part 16: Panel Data

The Hausman Test, by Hand

--> matrix ; br = b(1:8) ; vr = varb(1:8,1:8) $
--> matrix ; db = bf - br ; dv = vf - vr $
--> matrix ; list ; h = db'<dv>db $
Matrix H has 1 rows and 1 columns.
               1
        +--------------
       1|   2523.64910
--> calc ; list ; ctb(.95,8) $
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
Result = 15.507313

Part 16: Panel Data

Hello, professor greene.


I've taken the liberty of attaching some LIMDEP output in order to ask your
view on whether my Hausman test stat is large, requiring the FEM, or not,
allowing me to use the (much better for my research) REM.
Specifically, my test statistic, corrected for heteroscedasticity, is about 34
and significant with 6 df.
I considered this a large value until I found your assignment 2 on the
internet which shows a value of 2554 with 4 df. Now, I'd like to assert that
34/6 is a small value.

16-126/135

Part 16: Panel Data

16-127/135

Variable Addition
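A Fixed Effects Model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
LSDV estimator - Deviations from group means:
To estimate $\beta$, regress $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$
Algebraic equivalent: OLS regress $y_{it}$ on $(x_{it}, \bar{x}_i)$
Mundlak interpretation: $\alpha_i = \alpha + \bar{x}_i'\delta + u_i$
Model becomes $y_{it} = \alpha + \bar{x}_i'\delta + u_i + x_{it}'\beta + \varepsilon_{it}$
$\phantom{Model becomes y_{it}} = \alpha + \bar{x}_i'\delta + x_{it}'\beta + \varepsilon_{it} + u_i$
= a random effects model with the group means.
Estimate by FGLS.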
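The algebraic equivalence claimed above (the within estimator equals the coefficient on $x_{it}$ when $y_{it}$ is regressed by OLS on $x_{it}$ and the group means $\bar{x}_i$) is easy to check numerically. The sketch below uses simulated data with a single regressor; the data-generating values and variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 5
groups = np.repeat(np.arange(N), T)

# Individual effect c_i is correlated with x, so pooled OLS of y on x alone is biased.
c = rng.normal(size=N)
x = rng.normal(size=N * T) + c[groups]
y = 1.0 + 0.5 * x + c[groups] + rng.normal(scale=0.3, size=N * T)

def group_mean(v):
    """Expand group means so that observation (i,t) gets the mean of group i."""
    return (np.bincount(groups, weights=v) / np.bincount(groups))[groups]

xbar, ybar = group_mean(x), group_mean(y)

# Within (LSDV) estimator: regress deviations from group means.
xd, yd = x - xbar, y - ybar
b_within = (xd @ yd) / (xd @ xd)

# Variable addition: OLS of y on a constant, x and the expanded group mean of x.
Z = np.column_stack([np.ones(N * T), x, xbar])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
b_mundlak = coef[1]          # coefficient on x_it
delta_hat = coef[2]          # coefficient on the group mean

print(b_within, b_mundlak)
```

The two slope estimates agree to machine precision; the coefficient on the group mean (`delta_hat`) is what the variable addition test examines.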
Part 16: Panel Data

A Variable Addition Test


Asymptotic equivalent to Hausman
Also equivalent to Mundlak formulation
In the random effects model, using FGLS

  Only applies to time varying variables
  Add expanded group means to the regression (i.e.,
  observation i,t gets same group means for all t).
  Use Wald test to test for coefficients on means
  equal to 0. Large chi-squared weighs against
  random effects specification.

Part 16: Panel Data

Means Added to REM - Mundlak

16-129/135

+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |    .00083    |    .00060070   |   1.380|  .1677 | 46.811525|
|OCC     |   -.02157    |    .01380769   |  -1.562|  .1182 |  .5111645|
|IND     |    .01888    |    .01547189   |   1.220|  .2224 |  .3954382|
|SOUTH   |    .00039    |    .03432914   |    .011|  .9909 |  .2902761|
|SMSA    |   -.04451**  |    .01941842   |  -2.292|  .0219 |  .6537815|
|UNION   |    .03274**  |    .01494898   |   2.190|  .0285 |  .3639856|
|EXP     |    .11327*** |    .00247500   |  45.768|  .0000 | 19.853782|
|EXPSQ   |   -.00042*** |    .546898D-04 |  -7.655|  .0000 | 514.40504|
|ED      |    .05199*** |    .00552893   |   9.404|  .0000 | 12.845378|
|FEM     |   -.41306*** |    .03732204   | -11.067|  .0000 |  .1126050|
|WKSB    |    .00863**  |    .00363907   |   2.371|  .0177 | 46.811525|
|OCCB    |   -.14656*** |    .03640885   |  -4.025|  .0001 |  .5111645|
|INDB    |    .04142    |    .02976363   |   1.392|  .1640 |  .3954382|
|SOUTHB  |   -.05551    |    .04297816   |  -1.292|  .1965 |  .2902761|
|SMSAB   |    .21607*** |    .03213205   |   6.724|  .0000 |  .6537815|
|UNIONB  |    .08152**  |    .03266438   |   2.496|  .0126 |  .3639856|
|EXPB    |   -.08005*** |    .00533603   | -15.002|  .0000 | 19.853782|
|EXPSQB  |   -.00017    |    .00011763   |  -1.416|  .1567 | 514.40504|
|Constant|   5.19036*** |    .20147201   |  25.762|  .0000 |          |
+--------+------------------------------------------------------------+

Part 16: Panel Data

Wu (Variable Addition) Test

--> matrix ; bm = b(12:19) ; vm = varb(12:19,12:19) $
--> matrix ; list ; wu = bm'<vm>bm $
Matrix WU has 1 rows and 1 columns.
               1
        +--------------
       1|   3004.38076

Part 16: Panel Data

A Hierarchical Linear Model


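Interpretation of the FE Model
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, ($x$ does not contain a constant)
$E[\varepsilon_{it} \mid X_i, c_i] = 0$, $\mathrm{Var}[\varepsilon_{it} \mid X_i, c_i] = \sigma_\varepsilon^2$
$c_i = \alpha + z_i'\delta + u_i$,
$E[u_i \mid z_i] = 0$, $\mathrm{Var}[u_i \mid z_i] = \sigma_u^2$
$y_{it} = x_{it}'\beta + [\alpha + z_i'\delta + u_i] + \varepsilon_{it}$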

Part 16: Panel Data

Hierarchical Linear Model as REM


+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              =   .235368D-01  |
|             Var[u]              =   .110254D+00  |
|             Corr[v(i,t),v(i,s)] =   .824078      |
|             Sigma(u)            =   0.3303       |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OCC     |   -.03908144      .01298962     -3.009    .0026    .51116447
 SMSA    |   -.03881553      .01645862     -2.358    .0184    .65378151
 MS      |   -.06557030      .01815465     -3.612    .0003    .81440576
 EXP     |    .05737298      .00088467     64.852    .0000   19.8537815
 FEM     |   -.34715010      .04681514     -7.415    .0000    .11260504
 ED      |    .11120152      .00525209     21.173    .0000   12.8453782
 Constant|   4.24669585      .07763394     54.702    .0000

16-132/135

Part 16: Panel Data

16-133/135

Evolution: Correlated Random Effects


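Unknown parameters
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$, $[\alpha_1, \alpha_2, \ldots, \alpha_N, \beta, \sigma_\varepsilon^2]$
Standard estimation based on LS (dummy variables)
Ambiguous definition of the distribution of $y_{it}$

Effects model, nonorthogonality, heterogeneity
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$, $E[\alpha_i \mid X_i] = g(X_i) \neq 0$
Contrast to random effects $E[\alpha_i \mid X_i] = \alpha$
Standard estimation (still) based on LS (dummy variables)

Correlated random effects, more detailed model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$, $P[\alpha_i \mid X_i] = g(X_i) \neq 0$
Linear projection? $\alpha_i = \alpha + \bar{x}_i'\delta + u_i$, $\mathrm{Cor}(u_i, \bar{x}_i) = 0$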
Part 16: Panel Data

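Mundlak's Estimator

Mundlak, Y., "On the Pooling of Time Series and Cross
Section Data," Econometrica, 46, 1978, pp. 69-85.

Write $c_i = \bar{x}_i'\delta + u_i$, $E[c_i \mid x_{i1}, x_{i2}, \ldots, x_{iT}] = \bar{x}_i'\delta$
Assume $c_i$ contains all time invariant information
$y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i$, $T_i$ observations in group $i$
$\phantom{y_i} = X_i\beta + \mathbf{i}\bar{x}_i'\delta + \varepsilon_i + u_i\mathbf{i}$
Looks like random effects.
$\mathrm{Var}[\varepsilon_i + u_i\mathbf{i}] = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2\,\mathbf{i}\mathbf{i}'$
This is the model we used for the Wu test.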
Part 16: Panel Data

16-135/135

Correlated Random Effects


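Mundlak
$c_i = \bar{x}_i'\delta + u_i$, $E[c_i \mid x_{i1}, x_{i2}, \ldots, x_{iT}] = \bar{x}_i'\delta$
Assume $c_i$ contains all time invariant information
$y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i$, $T_i$ observations in group $i$
$\phantom{y_i} = X_i\beta + \mathbf{i}\bar{x}_i'\delta + \varepsilon_i + u_i\mathbf{i}$

Chamberlain/Wooldridge
$c_i = x_{i1}'\delta_1 + x_{i2}'\delta_2 + \cdots + x_{iT}'\delta_T + u_i$
$y_i = X_i\beta + \mathbf{i}x_{i1}'\delta_1 + \mathbf{i}x_{i2}'\delta_2 + \cdots + \mathbf{i}x_{iT}'\delta_T + \varepsilon_i + u_i\mathbf{i}$
(each of $X_i$, $\mathbf{i}x_{i1}'$, $\mathbf{i}x_{i2}'$, $\ldots$, $\mathbf{i}x_{iT}'$ is $T \times K$)

Problems: Requires balanced panels
Modern panels have large T; models have large K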
Part 16: Panel Data

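Mundlak's Approach for an FE Model with Time Invariant Variables

$y_{it} = x_{it}'\beta + z_i'\delta + c_i + \varepsilon_{it}$, ($x$ does not contain a constant)
$E[\varepsilon_{it} \mid X_i, c_i] = 0$, $\mathrm{Var}[\varepsilon_{it} \mid X_i, c_i] = \sigma_\varepsilon^2$
$c_i = \alpha + \bar{x}_i'\theta + w_i$,
$E[w_i \mid X_i, z_i] = 0$, $\mathrm{Var}[w_i \mid X_i, z_i] = \sigma_w^2$
$y_{it} = x_{it}'\beta + z_i'\delta + \alpha + \bar{x}_i'\theta + w_i + \varepsilon_{it}$
= random effects model including group means of
time varying variables.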

Part 16: Panel Data

Mundlak Form of FE Model


+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
x(i,t)=================================================================
 OCC     |   -.02021384      .01375165     -1.470    .1416    .51116447
 SMSA    |   -.04250645      .01951727     -2.178    .0294    .65378151
 MS      |   -.02946444      .01915264     -1.538    .1240    .81440576
 EXP     |    .09665711      .00119262     81.046    .0000   19.8537815
z(i)===================================================================
 FEM     |   -.34322129      .05725632     -5.994    .0000    .11260504
 ED      |    .05099781      .00575551      8.861    .0000   12.8453782
Means of x(i,t) and constant===========================================
 Constant|   5.72655261      .10300460     55.595    .0000
 OCCB    |   -.10850252      .03635921     -2.984    .0028    .51116447
 SMSAB   |    .22934020      .03282197      6.987    .0000    .65378151
 MSB     |    .20453332      .05329948      3.837    .0001    .81440576
 EXPB    |   -.08988632      .00165025    -54.468    .0000   19.8537815
Variance Estimates=====================================================
 Var[e]  |    .0235632
 Var[u]  |    .0773825

16-137/135

Part 16: Panel Data

16-138/135

Panel Data Extensions


Dynamic models: lagged effects of the dependent variable
Endogenous RHS variables
Cross country comparisons: large T
More general parameter heterogeneity, not only the constant term
Nonlinear models such as binary choice

Part 16: Panel Data

The Hausman and Taylor Model


$y_{it} = x1_{it}'\beta_1 + x2_{it}'\beta_2 + z1_i'\alpha_1 + z2_i'\alpha_2 + \varepsilon_{it} + u_i$

Model: $x2$ and $z2$ are correlated with $u$.

Deviations from group means removes all time invariant variables
$y_{it} - \bar{y}_i = (x1_{it} - \bar{x1}_i)'\beta_1 + (x2_{it} - \bar{x2}_i)'\beta_2 + (\varepsilon_{it} - \bar{\varepsilon}_i)$
Implication: $\beta_1$, $\beta_2$ are consistently estimated by LSDV.

$(x1_{it} - \bar{x1}_i)$ = K1 instrumental variables
$(x2_{it} - \bar{x2}_i)$ = K2 instrumental variables
$z1_i$ = L1 instrumental variables (uncorrelated with $u$)
? = L2 instrumental variables (where do we get them?)

H&T: $\bar{x1}_i$ = K1 additional instrumental variables. Needs K1 $\geq$ L2.

Part 16: Panel Data

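H&T's 4 Step FGLS Estimator

(1) LSDV estimates of $\beta_1$, $\beta_2$, $\sigma_\varepsilon^2$

(2) $(e^*)' = [(\bar{e}_1, \bar{e}_1, \ldots, \bar{e}_1), (\bar{e}_2, \bar{e}_2, \ldots, \bar{e}_2), \ldots, (\bar{e}_N, \bar{e}_N, \ldots, \bar{e}_N)]$
IV regression of $e^*$ on $Z^*$ with instruments $W_i$ consistently
estimates $\alpha_1$ and $\alpha_2$.

(3) With fixed $T$, residual variance in (2) estimates $\sigma_u^2 + \sigma_\varepsilon^2 / T$.
With unbalanced panel, it estimates $\sigma_u^2 + \sigma_\varepsilon^2\,\overline{(1/T)}$ or something
resembling this. (1) provided an estimate of $\sigma_\varepsilon^2$ so use the two
to obtain estimates of $\sigma_u^2$ and $\sigma_\varepsilon^2$. For each group, compute
$\hat{\theta}_i = 1 - \sqrt{\hat{\sigma}_\varepsilon^2 / (\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2)}$

(4) Transform $[x_{it1}, x_{it2}, z_{i1}, z_{i2}]$ to
$W_{it}^* = [x_{it1}, x_{it2}, z_{i1}, z_{i2}] - \hat{\theta}_i[\bar{x}_{i1}, \bar{x}_{i2}, z_{i1}, z_{i2}]$
and $y_{it}$ to $y_{it}^* = y_{it} - \hat{\theta}_i\bar{y}_{i.}$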


H&T's 4 STEP IV Estimator

Instrumental Variables $V_i$
$(x1_{it} - \bar{x1}_i)$ = K1 instrumental variables
$(x2_{it} - \bar{x2}_i)$ = K2 instrumental variables
$z1_i$ = L1 instrumental variables (uncorrelated with $u$)
$\bar{x1}_i$ = K1 additional instrumental variables.

Now do 2SLS of $y^*$ on $W^*$ with instruments $V$ to estimate
all parameters. I.e.,
$[\hat{\beta}_1, \hat{\beta}_2, \hat{\alpha}_1, \hat{\alpha}_2] = (\hat{W}^{*\prime}\hat{W}^*)^{-1}\hat{W}^{*\prime}y^*$,
where $\hat{W}^*$ is the projection of $W^*$ on the instruments $V$.

Part 16: Panel Data

16-142/135

Part 16: Panel Data

Arellano/Bond/Bover's Formulation Builds on Hausman and Taylor

$y_{it} = x1_{it}'\beta_1 + x2_{it}'\beta_2 + z1_i'\alpha_1 + z2_i'\alpha_2 + \varepsilon_{it} + u_i$

Instrumental variables for period t
$(x1_{it} - \bar{x1}_i)$ = K1 instrumental variables
$(x2_{it} - \bar{x2}_i)$ = K2 instrumental variables
$z1_i$ = L1 instrumental variables (uncorrelated with $u$)
$\bar{x1}_i$ = K1 additional instrumental variables. K1 $\geq$ L2.

Let $v_{it} = \varepsilon_{it} + u_i$
Let $z_{it}' = [(x1_{it} - \bar{x1}_i)', (x2_{it} - \bar{x2}_i)', z1_i', \bar{x1}_i']$
Then $E[z_{it}v_{it}] = 0$
We formulate this for the $T_i$ observations in group $i$.
Part 16: Panel Data

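Arellano/Bond/Bover's Formulation Adds a Lagged DV to H&T

$y_{it} = \delta y_{i,t-1} + x1_{it}'\beta_1 + x2_{it}'\beta_2 + z1_i'\alpha_1 + z2_i'\alpha_2 + \varepsilon_{it} + u_i$

Parameters: $\theta = [\delta, \beta_1', \beta_2', \alpha_1', \alpha_2']'$

The data

$$
y_i = \begin{bmatrix} y_{i,2} \\ y_{i,3} \\ \vdots \\ y_{i,T_i} \end{bmatrix},
\quad
X_i = \begin{bmatrix}
y_{i,1} & x1_{i2}' & x2_{i2}' & z1_i' & z2_i' \\
y_{i,2} & x1_{i3}' & x2_{i3}' & z1_i' & z2_i' \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
y_{i,T_i-1} & x1_{iT_i}' & x2_{iT_i}' & z1_i' & z2_i'
\end{bmatrix},
\quad T_i - 1 \text{ rows}
$$

Columns of $X_i$: 1, K1, K2, L1, L2.

This formulation is the same as H&T with $y_{i,t-1}$ contained in $x2_{it}$.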

Part 16: Panel Data

16-145/135

Dynamic (Linear) Panel Data (DPD) Models

Application
Bias in Conventional Estimation
Development of Consistent Estimators
Efficient GMM Estimators

Part 16: Panel Data

Dynamic Linear Model


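Balestra-Nerlove (1966), 36 States, 11 Years
Demand for Natural Gas
Structure
New Demand: $G^*_{i,t} = G_{i,t} - (1 - \delta)G_{i,t-1}$
Demand Function: $G^*_{i,t} = \beta_1 + \beta_2 P_{i,t} + \beta_3 \Delta N_{i,t} + \beta_4 N_{i,t} + \beta_5 \Delta Y_{i,t} + \beta_6 Y_{i,t} + \varepsilon_{i,t}$
G = gas demand
N = population
P = price
Y = per capita income
Reduced Form
$G_{i,t} = \beta_1 + \beta_2 P_{i,t} + \beta_3 \Delta N_{i,t} + \beta_4 N_{i,t} + \beta_5 \Delta Y_{i,t} + \beta_6 Y_{i,t} + \beta_7 G_{i,t-1} + \alpha_i + \varepsilon_{i,t}$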

Part 16: Panel Data

16-147/135

A General DPD model


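$y_{i,t} = x_{i,t}'\beta + \delta y_{i,t-1} + c_i + \varepsilon_{i,t}$
$E[\varepsilon_{i,t} \mid X_i, c_i] = 0$
$E[\varepsilon_{i,t}^2 \mid X_i, c_i] = \sigma_\varepsilon^2$, $E[\varepsilon_{i,t}\varepsilon_{i,s} \mid X_i, c_i] = 0$ if $t \neq s$.
$E[c_i \mid X_i] = g(X_i)$
No correlation across individuals
OLS and GLS are both inconsistent.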

Part 16: Panel Data

16-148/135

Arellano and Bond Estimator


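Base on first differences
$y_{i,t} - y_{i,t-1} = (x_{i,t} - x_{i,t-1})'\beta + \delta(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{i,t} - \varepsilon_{i,t-1})$
Instrumental variables
$y_{i,3} - y_{i,2} = (x_{i,3} - x_{i,2})'\beta + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2})$
Can use $y_{i,1}$
$y_{i,4} - y_{i,3} = (x_{i,4} - x_{i,3})'\beta + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3})$
Can use $y_{i,1}$ and $y_{i,2}$
$y_{i,5} - y_{i,4} = (x_{i,5} - x_{i,4})'\beta + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4})$
Can use $y_{i,1}$ and $y_{i,2}$ and $y_{i,3}$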

Part 16: Panel Data

16-149/135

Arellano and Bond Estimator


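More instrumental variables - Predetermined X
$y_{i,3} - y_{i,2} = (x_{i,3} - x_{i,2})'\beta + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2})$
Can use $y_{i,1}$ and $x_{i,1}, x_{i,2}$
$y_{i,4} - y_{i,3} = (x_{i,4} - x_{i,3})'\beta + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3})$
Can use $y_{i,1}, y_{i,2}, x_{i,1}, x_{i,2}, x_{i,3}$
$y_{i,5} - y_{i,4} = (x_{i,5} - x_{i,4})'\beta + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4})$
Can use $y_{i,1}, y_{i,2}, y_{i,3}, x_{i,1}, x_{i,2}, x_{i,3}, x_{i,4}$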

Part 16: Panel Data

16-150/135

Arellano and Bond Estimator


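Even more instrumental variables - Strictly exogenous X
$y_{i,3} - y_{i,2} = (x_{i,3} - x_{i,2})'\beta + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2})$
Can use $y_{i,1}$ and $x_{i,1}, x_{i,2}, \ldots, x_{i,T}$ (all periods)
$y_{i,4} - y_{i,3} = (x_{i,4} - x_{i,3})'\beta + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3})$
Can use $y_{i,1}, y_{i,2}, x_{i,1}, x_{i,2}, \ldots, x_{i,T}$
$y_{i,5} - y_{i,4} = (x_{i,5} - x_{i,4})'\beta + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4})$
Can use $y_{i,1}, y_{i,2}, y_{i,3}, x_{i,1}, x_{i,2}, \ldots, x_{i,T}$
The number of potential instruments is huge.
These define the rows of $Z_i$. These can be used for
simple instrumental variable estimation.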
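The pattern of valid instruments across the three Arellano and Bond slides can be summarized in a small helper. This is a bookkeeping sketch only: the function `ab_instruments` and its string labels are hypothetical, and it lists which levels may serve as instruments for the differenced equation at period $t$ rather than estimating anything.

```python
def ab_instruments(t, T, x_assumption="none"):
    """Instrument labels for the first-differenced equation at period t.

    t            : period of the differenced equation (3 <= t <= T)
    T            : number of periods in the panel
    x_assumption : "none" (lagged y only), "predetermined", or "strict"
    """
    if not 3 <= t <= T:
        raise ValueError("t must satisfy 3 <= t <= T")
    ys = [f"y{s}" for s in range(1, t - 1)]          # y_1, ..., y_{t-2}
    if x_assumption == "predetermined":
        xs = [f"x{s}" for s in range(1, t)]          # x_1, ..., x_{t-1}
    elif x_assumption == "strict":
        xs = [f"x{s}" for s in range(1, T + 1)]      # x in all periods
    elif x_assumption == "none":
        xs = []
    else:
        raise ValueError("unknown x_assumption")
    return ys + xs

# How quickly the instrument count grows for T = 5 under each assumption.
counts = {
    kind: sum(len(ab_instruments(t, 5, kind)) for t in range(3, 6))
    for kind in ("none", "predetermined", "strict")
}
print(counts)
```

Each `x` label stands for a full vector of K regressors, so the actual number of moment conditions is larger than the label count whenever K exceeds 1.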

Part 16: Panel Data

Application: Maquiladora

http://www.dallasfed.org/news/research/2005/05us-mexico_felix.pdf

16-151/135

Part 16: Panel Data

16-152/135

Maquiladora

Part 16: Panel Data

16-153/135

Estimates

Part 16: Panel Data
