Econometrics I 16

Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Part 16: Panel Data
www.oft.gov.uk/shared_oft/reports/Evaluating-OFTs-work/oft1416.pdf
Panel Data Sets
Longitudinal data
Cross section time series
British household panel survey (BHPS)

Panel Study of Income Dynamics (PSID)
many others
Penn world tables
Financial data by firm, by year
$r_{it} - r_{ft} = \beta_i (r_{mt} - r_{ft}) + \varepsilon_{it}$, i = 1,…,many; t = 1,…,many
Exchange rate data, essentially infinite T, large N
Benefits of Panel Data
Time and individual variation in behavior unobservable in cross sections or aggregate time series
Observable and unobservable individual heterogeneity
Rich hierarchical structures
More complicated models
Features that cannot be modeled with cross section or aggregate time series data alone
Dynamics in economic behavior
BHPS Has Evolved
Cornwell and Rupert Data

Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
(Extracted from the PSID.) Variables in the file are
EXP   = work experience
WKS   = weeks worked
OCC   = occupation, 1 if blue collar
IND   = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA  = 1 if resides in a city (SMSA)
MS    = 1 if married
FEM   = 1 if female
UNION = 1 if wage set by union contract
ED    = years of education
BLK   = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.
Balanced and Unbalanced Panels
Distinction: Balanced vs. Unbalanced Panels
A notation to help with mechanics
  $z_{i,t}$, i = 1,…,N; t = 1,…,$T_i$
The role of the assumption
  Mathematical and notational convenience:
    Balanced: $n = NT$
    Unbalanced: $n = \sum_{i=1}^{N} T_i$
Is the fixed $T_i$ assumption ever necessary? Almost never.
Is unbalancedness due to nonrandom attrition from an otherwise balanced panel? This would require special considerations.
Application: Health Care Usage

German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
This is an unbalanced panel with 7,293 individuals. There are altogether 27,326 observations. The number of observations ranges from 1 to 7.
(Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
(Downloaded from the JAE Archive)
Variables in the file are
DOCTOR   = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT     = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS   = number of doctor visits in last three months
HOSPVIS  = number of hospital visits in last calendar year
PUBLIC   = insured in public health insurance = 1; otherwise = 0
ADDON    = insured by add-on insurance = 1; otherwise = 0
HHNINC   = household nominal monthly net income in German marks / 10000.
           (4 observations with income=0 were dropped)
HHKIDS   = children under age 16 in the household = 1; otherwise = 0
EDUC     = years of schooling
AGE      = age in years
MARRIED  = marital status
An Unbalanced Panel: RWM's GSOEP Data on Health Care
N = 7,293 Households
A Basic Model for Panel Data
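Unobserved individual effects in regression: $E[y_{it} \mid \mathbf{x}_{it}, c_i]$
Notation: $y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}$
$$\mathbf{X}_i = \begin{bmatrix} \mathbf{x}_{i1}' \\ \mathbf{x}_{i2}' \\ \vdots \\ \mathbf{x}_{iT_i}' \end{bmatrix} \quad (T_i \text{ rows, } K \text{ columns})$$
Linear specification:
Fixed Effects: $E[c_i \mid \mathbf{X}_i] = g(\mathbf{X}_i)$. $\mathrm{Cov}[\mathbf{x}_{it}, c_i] \neq 0$; effects are correlated with included variables.
Random Effects: $E[c_i \mid \mathbf{X}_i] = 0$. $\mathrm{Cov}[\mathbf{x}_{it}, c_i] = 0$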
Convenient Notation
Fixed Effects: the dummy variable model
  $y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}$
  Individual specific constant terms.
Random Effects: the error components model
  $y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it} + u_i$
  Compound ("composed") disturbance
Estimating $\boldsymbol{\beta}$
$\boldsymbol{\beta}$ is the partial effect of interest
Can it be estimated (consistently) in the presence of (unmeasured) $c_i$?
  Does pooled least squares "work"?
  Strategies for "controlling for $c_i$" using the sample data
Assumptions for Asymptotics
Convergence of moments involving cross section $\mathbf{X}_i$.
N increasing, T or $T_i$ assumed fixed.
  "Fixed T asymptotics" (see text, p. 348)
  Time series characteristics are not relevant (may be nonstationary; relevant in Penn World Tables)
  If T is also growing, need to treat as multivariate time series.
Ranks of matrices. $\mathbf{X}$ must have full column rank. ($\mathbf{X}_i$ may not, if $T_i < K$.)
Strict exogeneity and dynamics. If $\mathbf{x}_{it}$ contains $y_{i,t-1}$ then $\mathbf{x}_{it}$ cannot be strictly exogenous. $\mathbf{x}_{it}$ will be correlated with the unobservables in period t-1. (To be revisited later.)
Empirical characteristics of microeconomic data
The Pooled Regression
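Presence of omitted effects
$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}, \quad \text{observation for person } i \text{ at time } t$$
$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + c_i\mathbf{i} + \boldsymbol{\varepsilon}_i, \quad T_i \text{ observations in group } i$$
$$\phantom{\mathbf{y}_i} = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{c}_i + \boldsymbol{\varepsilon}_i, \quad \text{note } \mathbf{c}_i = (c_i, c_i, \ldots, c_i)'$$
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{c} + \boldsymbol{\varepsilon}, \quad \textstyle\sum_{i=1}^N T_i \text{ observations in the sample}$$
Potential bias/inconsistency of OLS depends on "fixed" or "random"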
OLS in the Presence of Individual Effects

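$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
$$= \boldsymbol{\beta} + \left[\frac{1}{N}\sum_{i=1}^N \mathbf{X}_i'\mathbf{X}_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^N \mathbf{X}_i'\mathbf{c}_i\right] \quad \text{(part due to the omitted } c_i)$$
$$\quad + \left[\frac{1}{N}\sum_{i=1}^N \mathbf{X}_i'\mathbf{X}_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^N \mathbf{X}_i'\boldsymbol{\varepsilon}_i\right] \quad \text{(covariance of } \mathbf{X} \text{ and } \boldsymbol{\varepsilon} \text{ will = 0)}$$
The third term vanishes asymptotically by assumption
$$\mathrm{plim}\ \mathbf{b} = \boldsymbol{\beta} + \mathrm{plim}\left[\frac{1}{N}\sum_{i=1}^N \mathbf{X}_i'\mathbf{X}_i\right]^{-1}\left[\sum_{i=1}^N \frac{T_i}{N}\,\bar{\mathbf{x}}_i c_i\right] \quad \text{(left out variable formula)}$$
So, what becomes of $\sum_{i=1}^N w_i \bar{\mathbf{x}}_i c_i$, with $w_i = T_i/N$?
$\mathrm{plim}\ \mathbf{b} = \boldsymbol{\beta}$ if the covariance of $\bar{\mathbf{x}}_i$ and $c_i$ converges to zero.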
Estimating the Sampling Variance of b
$s^2(\mathbf{X}'\mathbf{X})^{-1}$? Inappropriate because
  Correlation across observations (certainly)
  Heteroscedasticity (possibly)
A "robust" covariance matrix
  Robust estimation (in general)
  The White estimator
  A robust estimator for OLS.
Cluster Estimator
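Robust variance estimator for Var[b]
$$\text{Est.Var}[\mathbf{b}] = (\mathbf{X}'\mathbf{X})^{-1}\left[\sum_{i=1}^N \left(\sum_{t=1}^{T_i}\mathbf{x}_{it}\hat{v}_{it}\right)\left(\sum_{t=1}^{T_i}\mathbf{x}_{it}\hat{v}_{it}\right)'\right](\mathbf{X}'\mathbf{X})^{-1}$$
$$= (\mathbf{X}'\mathbf{X})^{-1}\left[\sum_{i=1}^N \sum_{t=1}^{T_i}\sum_{s=1}^{T_i}\hat{v}_{it}\hat{v}_{is}\,\mathbf{x}_{it}\mathbf{x}_{is}'\right](\mathbf{X}'\mathbf{X})^{-1}$$
$\hat{v}_{it}$ = a least squares residual, the estimate of $c_i + \varepsilon_{it}$
(If $T_i$ = 1, this is the White estimator.)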
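The cluster estimator above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not the software used for the slides: the function name `ols_cluster_cov`, the simulated panel, and the absence of any finite-sample correction are all assumptions made for the example.

```python
import numpy as np

def ols_cluster_cov(X, y, groups):
    """Return OLS coefficients and the cluster-robust covariance
    (X'X)^{-1} [sum_i (sum_t x_it v_it)(sum_t x_it v_it)'] (X'X)^{-1}.
    No finite-sample correction is applied (an assumption of this sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    v = y - X @ b                          # least squares residuals
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        rows = groups == g
        s = X[rows].T @ v[rows]            # sum over t of x_it * v_it for group g
        meat += np.outer(s, s)
    return b, XtX_inv @ meat @ XtX_inv

# Simulated balanced panel (hypothetical data): 50 groups, 4 periods,
# with a common group effect left in the disturbance.
rng = np.random.default_rng(0)
N, T = 50, 4
groups = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T) + np.repeat(rng.normal(size=N), T)
X = np.column_stack([np.ones(N * T), x])
y = 1.0 + 0.5 * x + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)

b, V_cluster = ols_cluster_cov(X, y, groups)
# Putting every observation in its own cluster (T_i = 1) gives the White estimator.
_, V_white = ols_cluster_cov(X, y, np.arange(N * T))
print(np.sqrt(np.diag(V_cluster)), np.sqrt(np.diag(V_white)))
```

Calling the same function with one cluster per observation reproduces the White estimator, which is the special case noted on the slide.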
Alternative OLS Variance Estimators

Cluster correction increases SEs
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant    5.40159723      .04838934      111.628   .0000
 EXP          .04084968      .00218534       18.693   .0000
 EXPSQ       -.00068788      .480428D-04    -14.318   .0000
 OCC         -.13830480      .01480107       -9.344   .0000
 SMSA         .14856267      .01206772       12.311   .0000
 MS           .06798358      .02074599        3.277   .0010
 FEM         -.40020215      .02526118      -15.843   .0000
 UNION        .09409925      .01253203        7.509   .0000
 ED           .05812166      .00260039       22.351   .0000
 Robust
 Constant    5.40159723      .10156038       53.186   .0000
 EXP          .04084968      .00432272        9.450   .0000
 EXPSQ       -.00068788      .983981D-04     -6.991   .0000
 OCC         -.13830480      .02772631       -4.988   .0000
 SMSA         .14856267      .02423668        6.130   .0000
 MS           .06798358      .04382220        1.551   .1208
 FEM         -.40020215      .04961926       -8.065   .0000
 UNION        .09409925      .02422669        3.884   .0001
 ED           .05812166      .00555697       10.459   .0000
Results of Bootstrap Estimation
Bootstrap variance for a panel data estimator
Panel Bootstrap = Block Bootstrap
Data set is N groups of size $T_i$
Bootstrap sample is N groups of size $T_i$ drawn with replacement.
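A block (panel) bootstrap of this kind might be sketched as follows. The sketch resamples whole groups with replacement and recomputes pooled OLS each time; the function names, the number of replications, and the simulated data are assumptions for illustration only.

```python
import numpy as np

def ols(X, y):
    """Pooled least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def panel_bootstrap_cov(X, y, groups, n_reps=200, seed=0):
    """Bootstrap covariance of the OLS coefficients, resampling whole groups
    (all T_i rows of a drawn group travel together)."""
    rng = np.random.default_rng(seed)
    ids = np.unique(groups)
    rows = {g: np.flatnonzero(groups == g) for g in ids}
    draws = []
    for _ in range(n_reps):
        sample = rng.choice(ids, size=len(ids), replace=True)   # N groups, with replacement
        idx = np.concatenate([rows[g] for g in sample])
        draws.append(ols(X[idx], y[idx]))
    return np.cov(np.array(draws), rowvar=False)

# Hypothetical unbalanced panel: 60 groups with 2 to 5 observations each
rng = np.random.default_rng(1)
sizes = rng.integers(2, 6, size=60)
groups = np.repeat(np.arange(60), sizes)
n = groups.size
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + np.repeat(rng.normal(size=60), sizes) + rng.normal(size=n)

V_boot = panel_bootstrap_cov(X, y, groups)
print(np.sqrt(np.diag(V_boot)))   # bootstrap standard errors
```

Resampling individual rows instead of groups would break the within-group correlation that the block bootstrap is meant to preserve.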
Using First Differences
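Eliminating the heterogeneity
$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}$$
$$\Delta y_{it} = y_{it} - y_{i,t-1} = (\Delta\mathbf{x}_{it})'\boldsymbol{\beta} + \Delta\varepsilon_{it} = (\Delta\mathbf{x}_{it})'\boldsymbol{\beta} + w_{it}$$
Note: Time invariant variables become zero
      Time trend becomes the constant term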
OLS with First Differences
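With strict exogeneity of $(\mathbf{X}_i, c_i)$, OLS regression of $\Delta y_{it}$ on $\Delta\mathbf{x}_{it}$ is unbiased and consistent but inefficient.
$$\mathrm{Var}\begin{bmatrix} \varepsilon_{i,2} - \varepsilon_{i,1} \\ \varepsilon_{i,3} - \varepsilon_{i,2} \\ \vdots \\ \varepsilon_{i,T_i} - \varepsilon_{i,T_i-1} \end{bmatrix} = \begin{bmatrix} 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & 0 & \cdots & 0 \\ -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 \end{bmatrix}$$
GLS is unpleasantly complicated. Use OLS in first differences and use Newey-West with one lag.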
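A minimal sketch of OLS in first differences, on simulated data where the individual effect is correlated with the regressor. The function name and the data are hypothetical. The last lines build the first-difference matrix and confirm the tridiagonal (2, -1) pattern of the differenced disturbances up to the factor $\sigma_\varepsilon^2$.

```python
import numpy as np

def first_difference_ols(X, y, groups):
    """OLS of dy on dX. Rows must be sorted by group, then by time.
    Differences that straddle two groups are dropped."""
    same_group = groups[1:] == groups[:-1]
    dX = (X[1:] - X[:-1])[same_group]
    dy = (y[1:] - y[:-1])[same_group]
    return np.linalg.lstsq(dX, dy, rcond=None)[0]

# Hypothetical balanced panel with true slope 2.0
rng = np.random.default_rng(2)
N, T = 100, 5
groups = np.repeat(np.arange(N), T)
c = np.repeat(rng.normal(size=N), T)              # individual effect c_i
X = (rng.normal(size=N * T) + c).reshape(-1, 1)   # regressor correlated with c_i
y = 2.0 * X[:, 0] + c + rng.normal(size=N * T)

b_fd = first_difference_ols(X, y, groups)
# Adding any multiple of the time invariant effect leaves the estimate unchanged.
b_fd_shifted = first_difference_ols(X, y + 10.0 * c, groups)

# Covariance of the differenced disturbances, up to sigma^2: L L'
L = np.diff(np.eye(T), axis=0)    # (T-1) x T first-difference matrix
V = L @ L.T                       # 2 on the diagonal, -1 next to it
print(b_fd, b_fd_shifted)
```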
Application of a Two Period Model

"Hemoglobin and Quality of Life in Cancer Patients with Anemia,"
Finkelstein (MIT), Berndt (MIT), Greene (NYU), Cremieux (Univ. of Quebec), 1998
With Ortho Biotech, seeking to change the labeling of the already approved drug erythropoietin (r-HuEPO).
QOL Study
Quality of life study
  i = 1,…,1200+ clinically anemic cancer patients undergoing chemotherapy, treated with transfusions and/or r-HuEPO
  t = 0 at baseline, 1 at exit. (Interperiod survey by some patients was not used.)
  $y_{it}$ = self administered quality of life survey, scale = 0,…,100
  $\mathbf{x}_{it}$ = hemoglobin level, other covariates
Treatment effects model (hemoglobin level)
Background: r-HuEPO treatment to affect Hg level
Important statistical issues
  Unobservable individual effects
  The placebo effect
  Attrition: sample selection
  FDA mistrust of "community based" (not clinical trial based) statistical evidence
Objective: when to administer treatment for maximum marginal benefit
Regression-Treatment Effects Model
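$$QOL_{it} = \alpha_t + \text{"other covariates"} + \delta_7 Hb^7_{it} + \delta_8 Hb^8_{it} + \delta_9 Hb^9_{it} + \cdots + \delta_{15} Hb^{15}_{it} + c_i + \varepsilon_{it}$$
$Hb_{it}$ = hemoglobin level, grams/deciliter, range 3+ to 15
$Hb^7_{it} = 1(3 \le Hb_{it} < 7.5)$ (Base case; $\delta_7 = 0$)
$Hb^8_{it} = 1(7.5 \le Hb_{it} < 8.5)$
  ⋮
$Hb^{15}_{it} = 1(14.5 \le Hb_{it} \le 15)$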
Effects and Covariates
Individual effects that would impact a self reported QOL: Depression, comorbidity factors (smoking), recent financial setback, recent loss of spouse, etc.
Covariates
  Change in tumor status
  Measured progressivity of disease
  Change in number of transfusions
  Presence of pain and nausea
  Change in number of chemotherapy cycles
  Change in radiotherapy types
  Elapsed days since chemotherapy treatment
  Amount of time between baseline and exit
First Differences Model

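$$\Delta QOL_i = QOL_{i1} - QOL_{i0} = (\alpha_1 - \alpha_0) + \sum_{j=8}^{15}\delta_j\,(Hb^j_{i1} - Hb^j_{i0}) + \sum_{k=1}^{K}\beta_k\,(x_{ik,1} - x_{ik,0}) + \varepsilon_{i1} - \varepsilon_{i0}$$
Regression to the mean (the "tendency to mediocrity")
$$\varepsilon_{i1} - \varepsilon_{i0} = u_i - \mu\,(QOL_{i0} - \overline{QOL}_0), \quad \text{expect } 0 < \mu < 1$$
implies, with $\alpha = \alpha_1 - \alpha_0 + \mu\,\overline{QOL}_0$,
$$\Delta QOL_i = QOL_{i1} - QOL_{i0} = \alpha + \sum_{j=8}^{15}\delta_j\,(Hb^j_{i1} - Hb^j_{i0}) + \sum_{k=1}^{K}\beta_k\,(x_{ik,1} - x_{ik,0}) - \mu\,QOL_{i0} + u_i$$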
Dealing with Attrition
The attrition issue: Appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (don't need the treatment). Thus, missing data at exit were clearly related to values of the dependent variable.
Solutions to the attrition problem
  Heckman selection model (used in the study)
    Prob[Present at exit | covariates] = $\Phi(\mathbf{z}'\boldsymbol{\theta})$ (Probit model)
    Additional variable added to difference model: $\lambda_i = \phi(\mathbf{z}_i'\boldsymbol{\theta})/\Phi(\mathbf{z}_i'\boldsymbol{\theta})$
  The FDA solution: fill with zeros. (!)
Difference in Differences
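With two periods,
$$\Delta y_i = y_{i2} - y_{i1} = \delta_0 + (\mathbf{x}_{i2} - \mathbf{x}_{i1})'\boldsymbol{\beta} + u_i$$
Consider a "treatment, $D_i$," that takes place between time 1 and time 2 for some of the individuals
$$\Delta y_i = \delta_0 + (\Delta\mathbf{x}_i)'\boldsymbol{\beta} + \delta_1 D_i + u_i$$
$D_i$ = the "treatment dummy"
This is a linear regression model. If there are no regressors,
$\hat{\delta}_1 = \overline{\Delta y}\,|\,\text{treatment} - \overline{\Delta y}\,|\,\text{control}$ = "difference in differences" estimator.
$\hat{\delta}_0$ = Average change in $y_i$ for the controls (the untreated); $\hat{\delta}_0 + \hat{\delta}_1$ is the average change for the "treated"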
Difference-in-Differences Model
With two periods and strict exogeneity of D and T,
$$y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_t + \beta_3 T_t D_{it} + \varepsilon_{it}$$
$D_{it}$ = dummy variable for a treatment that takes place between time 1 and time 2 for some of the individuals,
$T_t$ = a time period dummy variable, 0 in period 1, 1 in period 2.
This is a linear regression model. If there are no regressors, using least squares,
$$b_3 = (\bar{y}_2 - \bar{y}_1)_{D=1} - (\bar{y}_2 - \bar{y}_1)_{D=0}$$
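The equality between the interaction coefficient and the difference of mean changes can be checked numerically. The sketch below uses simulated data and assumes that the dummy D marks the treated group in both periods, so that D, T and their product are not collinear; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                        # individuals, two periods each
D = np.repeat(rng.integers(0, 2, size=n), 2)   # treated-group dummy, same in both periods
T = np.tile([0, 1], n)                         # 0 in period 1, 1 in period 2
y = 1.0 + 0.5 * D + 0.3 * T + 0.8 * D * T + rng.normal(size=2 * n)

# Least squares on a constant, D, T and the interaction T*D
Z = np.column_stack([np.ones(2 * n), D, T, D * T])
b = np.linalg.lstsq(Z, y, rcond=None)[0]

def cell_mean(d, t):
    """Mean of y for group d in period t."""
    return y[(D == d) & (T == t)].mean()

# (ybar_2 - ybar_1 | D=1) - (ybar_2 - ybar_1 | D=0)
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(b[3], did)   # the two numbers agree up to rounding error
```

The agreement is exact because the regression is saturated: four coefficients fit the four cell means.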
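$$y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_t + \beta_3 D_{it}T_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \quad t = 1,2$$
$$\Delta y_{it} = \beta_2 + \beta_3 D_i + (\Delta\mathbf{x}_{it})'\boldsymbol{\beta} + \Delta\varepsilon_{it} = \beta_2 + \beta_3 D_i + (\Delta\mathbf{x}_{it})'\boldsymbol{\beta} + u_i$$
$$(\Delta y_{it} \mid D = 1) - (\Delta y_{it} \mid D = 0) = \beta_3 + \left[(\Delta\mathbf{x}_{it} \mid D = 1) - (\Delta\mathbf{x}_{it} \mid D = 0)\right]'\boldsymbol{\beta}$$
If the same individual is observed in both states, the second term is zero. If the effect is estimated by averaging individuals with D = 1 and different individuals with D = 0, then part of the 'effect' is explained by change in the covariates, not the treatment.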
A Tale of Two Cities
A sharp change in policy can constitute a natural experiment
The Mariel boatlift from Cuba to Miami (May-September, 1980) increased the Miami labor force by 7%. Did it reduce wages or employment of non-immigrants?
Compare Miami to Los Angeles, a comparable (assumed) city.
Card, David, "The Impact of the Mariel Boatlift on the Miami Labor Market," Industrial and Labor Relations Review, 43, 1990, pp. 245-257.
i = individual, T = 0 for no immigration, T = 1 for migration
$(Y_i \mid T) = Y_{i,T}$ = 1 if unemployed, 0 if employed.
c = city, t = period.
Unemployment rate in city c at time t is $E[Y_{i,0} \mid c,t]$ with no migration
Unemployment rate in city c at time t is $E[Y_{i,1} \mid c,t]$ with migration
Assume $E[Y_{i,0} \mid c,t] = \beta_t + \gamma_c$
       $E[Y_{i,1} \mid c,t] = \beta_t + \gamma_c + \delta = E[Y_{i,0} \mid c,t] + \delta$
$\delta$ = the effect of the immigration on the unemployment rate.
Applying the Model
c = M for Miami, L for Los Angeles
Immigration occurs in Miami, not Los Angeles
t = 1979, 1981 (pre- and post-)
Sample moment equations: $E[Y_i \mid c,t,T]$
  $E[Y_i \mid M,79] = \beta_{79} + \gamma_M$
  $E[Y_i \mid M,81] = \beta_{81} + \gamma_M + \delta$
  $E[Y_i \mid L,79] = \beta_{79} + \gamma_L$
  $E[Y_i \mid L,81] = \beta_{81} + \gamma_L$
It is assumed that unemployment growth in the two cities would be the same if there were no immigration.
Implications for Differences
If neither city exposed to migration
  $E[Y_{i,0} \mid M,81] - E[Y_{i,0} \mid M,79] = \beta_{81} - \beta_{79}$ (Miami)
  $E[Y_{i,0} \mid L,81] - E[Y_{i,0} \mid L,79] = \beta_{81} - \beta_{79}$ (LA)
If both cities exposed to migration
  $E[Y_{i,1} \mid M,81] - E[Y_{i,1} \mid M,79] = \beta_{81} - \beta_{79} + \delta$ (Miami)
  $E[Y_{i,1} \mid L,81] - E[Y_{i,1} \mid L,79] = \beta_{81} - \beta_{79} + \delta$ (LA)
One city (Miami) exposed to migration: The difference in differences is
  $\{E[Y_{i,1} \mid M,81] - E[Y_{i,1} \mid M,79]\} - \{E[Y_{i,0} \mid L,81] - E[Y_{i,0} \mid L,79]\} = \delta$ (Miami)
UK Office of Fair Trading, May 2012; Stephen Davies
http://dera.ioe.ac.uk/14610/1/oft1416.pdf
Outcome is the fees charged.
Activity is collusion on fees.
Treatment Schools: Treatment is an intervention by the Office of Fair Trading
Control Schools were not involved in the conspiracy
Treatment is not voluntary
Apparent Impact of the Intervention
Treatment (Intervention) Effect = $\beta_1 + \beta_2$ if SS school
In order to test robustness two versions of the fixed effects model were run. The first is
Ordinary Least Squares, and the second is heteroscedasticity and auto-correlation robust
(HAC) standard errors in order to check for heteroscedasticity and autocorrelation.
The cumulative impact of the intervention is the area between the two paths from intervention to time T.
The Fixed Effects Model

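$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{d}_i\alpha_i + \boldsymbol{\varepsilon}_i, \quad \text{for each individual}$$
$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_N \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{d}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{X}_2 & \mathbf{0} & \mathbf{d}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{X}_N & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{d}_N \end{bmatrix}\begin{bmatrix} \boldsymbol{\beta} \\ \boldsymbol{\alpha} \end{bmatrix} + \boldsymbol{\varepsilon} = [\mathbf{X}, \mathbf{D}]\begin{bmatrix} \boldsymbol{\beta} \\ \boldsymbol{\alpha} \end{bmatrix} + \boldsymbol{\varepsilon} = \mathbf{Z}\boldsymbol{\delta} + \boldsymbol{\varepsilon}$$
$E[c_i \mid \mathbf{X}_i] = g(\mathbf{X}_i)$; Effects are correlated with included variables.
$\mathrm{Cov}[\mathbf{x}_{it}, c_i] \neq 0$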
The Within Groups Transformation

Removes the Effects
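$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}$$
$$\bar{y}_i = \bar{\mathbf{x}}_i'\boldsymbol{\beta} + c_i + \bar{\varepsilon}_i$$
$$y_{it} - \bar{y}_i = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol{\beta} + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
Use least squares to estimate $\boldsymbol{\beta}$.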
Useful Analysis of Variance Notation
Decomposition of Total variation:
$$\sum_{i=1}^{N}\sum_{t=1}^{T_i}(z_{it} - \bar{z})^2 = \sum_{i=1}^{N}\sum_{t=1}^{T_i}(z_{it} - \bar{z}_{i.})^2 + \sum_{i=1}^{N} T_i\,(\bar{z}_{i.} - \bar{z})^2$$
Total variation = Within groups variation + Between groups variation
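The decomposition can be verified numerically. The following sketch uses a small simulated unbalanced panel; the function name and the data are hypothetical.

```python
import numpy as np

def variance_decomposition(z, groups):
    """Return (total, within, between) sums of squares for z by group."""
    zbar = z.mean()
    total = ((z - zbar) ** 2).sum()
    within = 0.0
    between = 0.0
    for g in np.unique(groups):
        zi = z[groups == g]
        within += ((zi - zi.mean()) ** 2).sum()        # deviations from the group mean
        between += len(zi) * (zi.mean() - zbar) ** 2   # T_i * (group mean - overall mean)^2
    return total, within, between

rng = np.random.default_rng(3)
sizes = rng.integers(1, 8, size=30)                    # unbalanced: T_i between 1 and 7
groups = np.repeat(np.arange(30), sizes)
z = np.repeat(rng.normal(size=30), sizes) + rng.normal(size=groups.size)

total, within, between = variance_decomposition(z, groups)
print(total, within + between)   # equal up to rounding error
```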
WHO Data
Baltagi and Griffin's Gasoline Data

World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are
COUNTRY  = name of country
YEAR     = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG    = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for analysis of these data. The article on which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures," European Economic Review, 22, 1983, pp. 117-137. The data were downloaded from the website for Baltagi's text.
Analysis of Variance
+--------------------------------------------------------------------------+
| Analysis of Variance for        LGASPCAR                                 |
| Stratification Variable         _STRATUM                                 |
| Observations weighted by        ONE                                      |
| Total Sample Size               342                                      |
| Number of Groups                18                                       |
| Number of groups with no data   0                                        |
| Overall Sample Mean             4.2962420                                |
| Sample Standard Deviation       .5489071                                 |
| Total Sample Variance           .3012990                                 |
|                                                                          |
| Source of Variation       Variation      Deg.Fr.      Mean Square        |
| Between Groups           85.68228007        17          5.04013          |
| Within Groups            17.06068428       324           .05266          |
| Total                   102.74296435       341           .30130          |
| Residual S.D.              .22946990                                     |
| R-squared                  .83394791     MSB/MSW       21.96425          |
| F ratio                  95.71734806     P value         .00000          |
+--------------------------------------------------------------------------+
Estimating the Fixed Effects Model

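The FEM is a plain vanilla regression model but with many independent variables
Least squares is unbiased, consistent, efficient, but inconvenient if N is large.
$$\begin{bmatrix} \mathbf{b} \\ \mathbf{a} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{D} \\ \mathbf{D}'\mathbf{X} & \mathbf{D}'\mathbf{D} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{D}'\mathbf{y} \end{bmatrix}$$
Using the Frisch-Waugh theorem
$$\mathbf{b} = [\mathbf{X}'\mathbf{M}_D\mathbf{X}]^{-1}\,\mathbf{X}'\mathbf{M}_D\mathbf{y}$$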
Fixed Effects Estimator (cont.)

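$$\mathbf{M}_D = \begin{bmatrix} \mathbf{M}_D^1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{M}_D^2 & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{M}_D^N \end{bmatrix} \quad \text{(The dummy variables are orthogonal)}$$
$$\mathbf{M}_D^i = \mathbf{I}_{T_i} - \mathbf{d}_i(\mathbf{d}_i'\mathbf{d}_i)^{-1}\mathbf{d}_i' = \mathbf{I}_{T_i} - (1/T_i)\,\mathbf{d}_i\mathbf{d}_i'$$
$$\mathbf{X}'\mathbf{M}_D\mathbf{X} = \sum_{i=1}^N \mathbf{X}_i'\mathbf{M}_D^i\mathbf{X}_i, \quad \left[\mathbf{X}_i'\mathbf{M}_D^i\mathbf{X}_i\right]_{k,l} = \sum_{t=1}^{T_i}(x_{it,k} - \bar{x}_{i.,k})(x_{it,l} - \bar{x}_{i.,l})$$
$$\mathbf{X}'\mathbf{M}_D\mathbf{y} = \sum_{i=1}^N \mathbf{X}_i'\mathbf{M}_D^i\mathbf{y}_i, \quad \left[\mathbf{X}_i'\mathbf{M}_D^i\mathbf{y}_i\right]_{k} = \sum_{t=1}^{T_i}(x_{it,k} - \bar{x}_{i.,k})(y_{it} - \bar{y}_{i.})$$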
Least Squares Dummy Variable Estimator
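b is obtained by "within" groups least squares (group mean deviations)
a is estimated using the normal equations:
$$\mathbf{D}'\mathbf{X}\mathbf{b} + \mathbf{D}'\mathbf{D}\mathbf{a} = \mathbf{D}'\mathbf{y}$$
$$\mathbf{a} = (\mathbf{D}'\mathbf{D})^{-1}\mathbf{D}'(\mathbf{y} - \mathbf{X}\mathbf{b})$$
$$a_i = (1/T_i)\sum_{t=1}^{T_i}(y_{it} - \mathbf{x}_{it}'\mathbf{b}) = \bar{e}_i$$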
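As a check on the algebra, the sketch below computes b by least squares on group mean deviations, recovers each $a_i$ as the group mean of $y_{it} - \mathbf{x}_{it}'\mathbf{b}$, and compares both with brute-force least squares on [X, D]. The data are simulated and the function names are hypothetical; X is assumed to contain no constant term.

```python
import numpy as np

def within_estimator(X, y, groups):
    """b from group-mean deviations; a_i = ybar_i - xbar_i'b."""
    _, inv = np.unique(groups, return_inverse=True)
    counts = np.bincount(inv)
    xbar = np.column_stack(
        [np.bincount(inv, weights=X[:, k]) / counts for k in range(X.shape[1])]
    )
    ybar = np.bincount(inv, weights=y) / counts
    b = np.linalg.lstsq(X - xbar[inv], y - ybar[inv], rcond=None)[0]
    a = ybar - xbar @ b
    return b, a

def lsdv(X, y, groups):
    """Least squares on [X, D], one dummy column per group."""
    ids, inv = np.unique(groups, return_inverse=True)
    D = np.eye(len(ids))[inv]
    coef = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0]
    return coef[: X.shape[1]], coef[X.shape[1]:]

# Hypothetical unbalanced panel, effects correlated with the regressors
rng = np.random.default_rng(5)
sizes = rng.integers(2, 6, size=40)
groups = np.repeat(np.arange(40), sizes)
alpha = np.repeat(rng.normal(size=40), sizes)
X = rng.normal(size=(groups.size, 2)) + 0.5 * alpha[:, None]
y = X @ np.array([1.0, -0.5]) + alpha + rng.normal(size=groups.size)

b_w, a_w = within_estimator(X, y, groups)
b_d, a_d = lsdv(X, y, groups)
print(np.max(np.abs(b_w - b_d)), np.max(np.abs(a_w - a_d)))
```

The within route never forms the N dummy columns, which is the practical point of the Frisch-Waugh result when N is large.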
Inference About OLS
Assume strict exogeneity: $\mathrm{Cov}[\varepsilon_{it}, (\mathbf{x}_{js}, c_j)] = 0$. Every disturbance in every period for each person is uncorrelated with variables and effects for every person and across periods.
Now, it's just least squares in a classical linear regression model.
$$\text{Asy.Var}[\mathbf{b}] = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^N T_i}\ \mathrm{plim}\left[\frac{1}{\sum_{i=1}^N T_i}\sum_{i=1}^N \mathbf{X}_i'\mathbf{M}_D^i\mathbf{X}_i\right]^{-1}$$
which is the usual estimator for OLS
$$\hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^N\sum_{t=1}^{T_i}(y_{it} - a_i - \mathbf{x}_{it}'\mathbf{b})^2}{\sum_{i=1}^N T_i - N - K}$$
(Note the degrees of freedom correction)
Application Cornwell and Rupert
LSDV Results
Note huge changes in the coefficients. SMSA and MS change signs. Significance changes completely!
Pooled OLS
The Effect of the Effects
The Within (LSDV) Estimator is an IV Estimator
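$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + (\mathbf{D}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}) = \mathbf{X}\boldsymbol{\beta} + \mathbf{w}$$
Regression of y on X is inconsistent because X is correlated with w. The data in group mean deviations are
$$\mathbf{Z} = \mathbf{M}_D\mathbf{X} = \mathbf{X} - \mathbf{D}(\mathbf{D}'\mathbf{D})^{-1}\mathbf{D}'\mathbf{X}$$
The inconsistent OLS estimator is $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ (omits D)
The IV estimator is
$$\mathbf{b}_{LSDV} = (\mathbf{Z}'\mathbf{X})^{-1}\mathbf{Z}'\mathbf{y} = (\mathbf{X}'\mathbf{M}_D\mathbf{X})^{-1}\mathbf{X}'\mathbf{M}_D\mathbf{y} = [(\mathbf{X}'\mathbf{M}_D)(\mathbf{M}_D\mathbf{X})]^{-1}(\mathbf{X}'\mathbf{M}_D)(\mathbf{M}_D\mathbf{y})$$
This is OLS using data in mean deviations, i.e., LSDV.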
LSDV As Usual
2SLS Using $\mathbf{Z} = \mathbf{M}_D\mathbf{X}$ as Instruments
A Caution About Stata and R2

$$R^2 = 1 - \frac{\text{Residual Sum of Squares}}{\text{Total Sum of Squares}}$$
Or is it? What is the total sum of squares?
For the FE model above,
Conventional: Total Sum of Squares = $\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar{y})^2$, $R^2$ = 0.90542
"Within Sum of Squares" = $\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar{y}_i)^2$, $R^2$ = 0.65142
Which should appear in the denominator of $R^2$?
The coefficient estimates and standard errors are the same. The calculation of the $R^2$ is different. In the areg procedure, you are estimating coefficients for each of your covariates plus each dummy variable for your groups. In the xtreg, fe procedure the $R^2$ reported is obtained by only fitting a mean deviated model where the effects of the groups (all of the dummy variables) are assumed to be fixed quantities. So, all of the effects for the groups are simply subtracted out of the model and no attempt is made to quantify their overall effect on the fit of the model.
Since the SSE is the same, the $R^2 = 1 - SSE/SST$ is very different. The difference is real in that we are making different assumptions with the two approaches. In the xtreg, fe approach, the effects of the groups are fixed and unestimated quantities are subtracted out of the model before the fit is performed. In the areg approach, the group effects are estimated and affect the total sum of squares of the model under consideration.
Examining the Effects with a KDE

[Figure: histogram and kernel density estimate of the fixed effects (AI) from the Cornwell and Rupert wage model; horizontal axis: AI, vertical axis: density.]
Mean = 4.819, Standard deviation = 1.054.
Robust Covariance Matrix for LSDV

Cluster Estimator for Within Estimator
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|OCC     |  -.02021     |    .01374007   | -1.471 |  .1412 |  .5111645|
|SMSA    |  -.04251**   |    .01950085   | -2.180 |  .0293 |  .6537815|
|MS      |  -.02946     |    .01913652   | -1.540 |  .1236 |  .8144058|
|EXP     |   .09666***  |    .00119162   | 81.114 |  .0000 | 19.853782|
+--------+------------------------------------------------------------+
+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering.    |
| Sample of 4165 observations contained 595 clusters defined by       |
| 7 observations (fixed number) in each cluster.                      |
+---------------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|DOCC    |  -.02021     |    .01982162   | -1.020 |  .3078 |    .00000|
|DSMSA   |  -.04251     |    .03091685   | -1.375 |  .1692 |    .00000|
|DMS     |  -.02946     |    .02635035   | -1.118 |  .2635 |    .00000|
|DEXP    |   .09666***  |    .00176599   | 54.732 |  .0000 |    .00000|
+--------+------------------------------------------------------------+
Time Invariant Regressors
A time invariant variable $x_{it,k}$ is one that does not vary over t for any individual i. E.g., the sex dummy variable, FEM, and ED (education) in the Cornwell/Rupert data.
If $x_{it,k}$ is invariant for all t, then the group mean deviations are all 0.
FE With Time Invariant Variables

+----------------------------------------------------+
| There are 2 vars. with no within group variation.  |
| FEM      ED                                        |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 EXP     |    .09671227      .00119137      81.177    .0000  19.8537815
 WKS     |    .00118483      .00060357       1.963    .0496  46.8115246
 OCC     |   -.02145609      .01375327      -1.560    .1187   .51116447
 SMSA    |   -.04454343      .01946544      -2.288    .0221   .65378151
 FEM     |    .000000    ......(Fixed Parameter).......
 ED      |    .000000
+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|        Model            Log-Likelihood  Sum of Squares  R-squared  |
|(1) Constant term only     -2688.80597      886.90494      .00000   |
|(2) Group effects only        27.58464      240.65119      .72866   |
|(3) X - variables only     -1688.12010      548.51596      .38154   |
|(4) X and group effects     2223.20087       83.85013      .90546   |
+--------------------------------------------------------------------+
Drop The Time Invariant Variables

Same Results
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 EXP     |    .09671227      .00119087      81.211    .0000  19.8537815
 WKS     |    .00118483      .00060332       1.964    .0495  46.8115246
 OCC     |   -.02145609      .01374749      -1.561    .1186   .51116447
 SMSA    |   -.04454343      .01945725      -2.289    .0221   .65378151
+--------------------------------------------------------------------+
|             Test Statistics for the Classical Model                |
+--------------------------------------------------------------------+
|        Model            Log-Likelihood  Sum of Squares  R-squared  |
|(1) Constant term only     -2688.80597      886.90494      .00000   |
|(2) Group effects only        27.58464      240.65119      .72866   |
|(3) X - variables only     -1688.12010      548.51596      .38154   |
|(4) X and group effects     2223.20087       83.85013      .90546   |
+--------------------------------------------------------------------+
No change in the sum of squared residuals
Fixed Effects Vector Decomposition

"Efficient Estimation of Time Invariant and Rarely Changing Variables in Finite Sample Panel Analyses with Unit Fixed Effects"
Thomas Plümper and Vera Troeger
Political Analysis, 2007
Introduction
[T]he FE model does not allow the estimation of time invariant variables. A second drawback of the FE model results from its inefficiency in estimating the effect of variables that have very little within variance.
This article discusses a remedy to the related problems of estimating time invariant and rarely changing variables in FE models with unit effects
The Model
$$y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \varepsilon_{it}$$
where $\alpha_i$ denote the N unit effects.
Fixed Effects Vector Decomposition

Step 1: Compute the fixed effects regression to get the estimated unit effects. "We run this FE model with the sole intention to obtain estimates of the unit effects, $\hat{\alpha}_i$."
$$\hat{\alpha}_i = \bar{y}_i - \sum_{k=1}^{K} b_k^{FE}\,\bar{x}_{ki}$$
Step 2
Regress $a_i$ on $\mathbf{z}_i$ and compute residuals
$$a_i = \sum_{m=1}^{M}\gamma_m z_{mi} + h_i$$
$h_i$ is orthogonal to $\mathbf{z}_i$ (since it is a residual)
Vector $\mathbf{h}$ is expanded so each element $h_i$ is replicated $T_i$ times; $\mathbf{h}$ is the length of the full sample.
Step 3
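Regress $y_{it}$ on a constant, X, Z and h using ordinary least squares to estimate $\alpha$, $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$, $\delta$.
$$y_{it} = \alpha + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \delta h_i + \varepsilon_{it}$$
Notice that $\alpha_i$ in the original model has become $\alpha + \delta h_i$ in the revised model.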
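The three steps can be traced on simulated data. The sketch below is an illustration under assumed names and parameter values, not the authors' code. It uses a balanced panel so that group means can be taken with a reshape, and a constant is included in the Step 2 regression. In this construction the Step 3 coefficients on X reproduce the Step 1 estimates, those on Z reproduce Step 2, and the coefficient on h comes out as 1.

```python
import numpy as np

# Hypothetical balanced panel: N units, T periods, rows sorted by unit then time
rng = np.random.default_rng(4)
N, T = 40, 5
unit = np.repeat(np.arange(N), T)
z_i = np.column_stack([rng.integers(0, 2, size=N), rng.normal(size=N)])  # time invariant
alpha_i = 1.0 + z_i @ np.array([-0.5, 0.3]) + rng.normal(size=N)         # unit effects
X = rng.normal(size=(N * T, 2)) + 0.5 * alpha_i[unit, None]              # time varying
y = X @ np.array([1.0, -1.0]) + alpha_i[unit] + rng.normal(size=N * T)

# Step 1: within (FE) regression, then the estimated unit effects a_i
xbar = X.reshape(N, T, -1).mean(axis=1)
ybar = y.reshape(N, T).mean(axis=1)
b_fe = np.linalg.lstsq(X - xbar[unit], y - ybar[unit], rcond=None)[0]
a = ybar - xbar @ b_fe

# Step 2: regress the N unit effects on a constant and z_i; keep the residuals h_i
Z2 = np.column_stack([np.ones(N), z_i])
g2 = np.linalg.lstsq(Z2, a, rcond=None)[0]
h = a - Z2 @ g2

# Step 3: pooled OLS of y on a constant, X, Z and h (z_i and h_i replicated T times)
W = np.column_stack([np.ones(N * T), X, z_i[unit], h[unit]])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
print(coef)   # [constant, two X slopes, two Z slopes, coefficient on h]
```

The exact reproduction follows because the FE residuals are orthogonal to X, Z, h and the constant, so Step 3 refits the identity y = Xb + a + e.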
Step 1 (Based on full sample)

These 2 variables have no within group variation.
FEM      ED
F.E. estimates are based on a generalized inverse.
--------+---------------------------------------------------------
        |                Standard             Prob.      Mean
   LWAGE|  Coefficient     Error        z     z>|Z|      of X
--------+---------------------------------------------------------
     EXP|    .09663***     .00119     81.13   .0000    19.8538
     WKS|    .00114*       .00060      1.88   .0600    46.8115
     OCC|   -.02496*       .01390     -1.80   .0724     .51116
     IND|    .02042        .01558      1.31   .1899     .39544
   SOUTH|   -.00091        .03457      -.03   .9791     .29028
    SMSA|   -.04581**      .01955     -2.34   .0191     .65378
   UNION|    .03411**      .01505      2.27   .0234     .36399
     FEM|    .000       .....(Fixed Parameter).....     .11261
      ED|    .000       .....(Fixed Parameter).....    12.8454
--------+---------------------------------------------------------
Step 2 (Based on 595 observations)

--------+---------------------------------------------------------
        |                Standard             Prob.      Mean
     UHI|  Coefficient     Error        z     z>|Z|      of X
--------+---------------------------------------------------------
Constant|   2.88090***     .07172     40.17   .0000
     FEM|   -.09963**      .04842     -2.06   .0396     .11261
      ED|    .14616***     .00541     27.02   .0000    12.8454
--------+---------------------------------------------------------
Step 3!
--------+---------------------------------------------------------
        |                Standard             Prob.      Mean
   LWAGE|  Coefficient     Error        z     z>|Z|      of X
--------+---------------------------------------------------------
Constant|   2.88090***     .03282     87.78   .0000
     EXP|    .09663***     .00061    157.53   .0000    19.8538
     WKS|    .00114***     .00044      2.58   .0098    46.8115
     OCC|   -.02496***     .00601     -4.16   .0000     .51116
     IND|    .02042***     .00479      4.26   .0000     .39544
   SOUTH|   -.00091        .00510      -.18   .8590     .29028
    SMSA|   -.04581***     .00506     -9.06   .0000     .65378
   UNION|    .03411***     .00521      6.55   .0000     .36399
     FEM|   -.09963***     .00767    -13.00   .0000     .11261
      ED|    .14616***     .00122    120.19   .0000    12.8454
      HI|   1.00000***     .00670    149.26   .0000   -.103D-13
--------+---------------------------------------------------------
What happened here?

$$y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \varepsilon_{it}$$
where $\alpha_i$ denote the N unit effects.
An assumption is added along the way: $\mathrm{Cov}(\alpha_i, \mathbf{Z}_i) = \mathbf{0}$. This is exactly the number of orthogonality assumptions needed to identify $\boldsymbol{\gamma}$. It is not part of the original model.
http://davegiles.blogspot.com/2012/06/fixed-effects-vector-decomposition.html
The Random Effects Model
The random effects model

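$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}$$
$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + c_i\mathbf{i} + \boldsymbol{\varepsilon}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{c}_i + \boldsymbol{\varepsilon}_i, \quad \text{note } \mathbf{c}_i = (c_i, c_i, \ldots, c_i)'$$
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{c} + \boldsymbol{\varepsilon}, \quad \textstyle\sum_{i=1}^N T_i \text{ observations in the sample}$$
$\mathbf{c} = (\mathbf{c}_1', \mathbf{c}_2', \ldots, \mathbf{c}_N')'$, $\sum_{i=1}^N T_i$ by 1 vector
$c_i$ is uncorrelated with $\mathbf{x}_{it}$ for all t;
$E[c_i \mid \mathbf{X}_i] = 0$
$E[\varepsilon_{it} \mid \mathbf{X}_i, c_i] = 0$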
Notation
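$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_N \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_N \end{bmatrix}\boldsymbol{\beta} + \begin{bmatrix} \boldsymbol{\varepsilon}_1 \\ \boldsymbol{\varepsilon}_2 \\ \vdots \\ \boldsymbol{\varepsilon}_N \end{bmatrix} + \begin{bmatrix} u_1\mathbf{i}_1 \\ u_2\mathbf{i}_2 \\ \vdots \\ u_N\mathbf{i}_N \end{bmatrix} \quad \begin{matrix} T_1 \text{ observations} \\ T_2 \text{ observations} \\ \vdots \\ T_N \text{ observations} \end{matrix}$$
$$= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} + \mathbf{u} \quad \left(\textstyle\sum_{i=1}^N T_i \text{ observations}\right)$$
$$= \mathbf{X}\boldsymbol{\beta} + \mathbf{w}$$
In all that follows, except where explicitly noted, $\mathbf{X}$, $\mathbf{X}_i$ and $\mathbf{x}_{it}$ contain a constant term as the first element. To avoid notational clutter, in those cases, $\mathbf{x}_{it}$ etc. will simply denote the counterpart without the constant term. Use of the symbol K for the number of variables will thus be context specific but will usually include the constant term.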
Error Components Model

A Generalized Regression Model
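$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it} + u_i$$
$E[\varepsilon_{it} \mid \mathbf{X}_i] = 0$
$E[\varepsilon_{it}^2 \mid \mathbf{X}_i] = \sigma_\varepsilon^2$
$E[u_i \mid \mathbf{X}_i] = 0$
$E[u_i^2 \mid \mathbf{X}_i] = \sigma_u^2$
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\varepsilon}_i + u_i\mathbf{i}$ for $T_i$ observations
$$\mathrm{Var}[\boldsymbol{\varepsilon}_i + u_i\mathbf{i}] = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix}$$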
Notation
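$$\mathrm{Var}[\boldsymbol{\varepsilon}_i + u_i\mathbf{i}] = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix} = \sigma_\varepsilon^2\mathbf{I}_{T_i} + \sigma_u^2\mathbf{i}\mathbf{i}' \quad (T_i \times T_i)$$
$$= \boldsymbol{\Omega}_i$$
$$\mathrm{Var}[\mathbf{w} \mid \mathbf{X}] = \begin{bmatrix} \boldsymbol{\Omega}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Omega}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\Omega}_N \end{bmatrix} \quad \text{(Note these differ only in the dimension } T_i)$$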
Convergence of Moments
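$$\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i} = \sum_{i=1}^N f_i\,\frac{\mathbf{X}_i'\mathbf{X}_i}{T_i}, \quad f_i = \frac{T_i}{\sum_{i=1}^N T_i}$$
a weighted sum of individual moment matrices
$$\frac{\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}}{\sum_{i=1}^N T_i} = \sum_{i=1}^N f_i\,\frac{\mathbf{X}_i'\boldsymbol{\Omega}_i\mathbf{X}_i}{T_i}$$
a weighted sum of individual moment matrices
$$= \sigma_\varepsilon^2\sum_{i=1}^N f_i\,\frac{\mathbf{X}_i'\mathbf{X}_i}{T_i} + \sigma_u^2\sum_{i=1}^N f_i T_i\,\bar{\mathbf{x}}_i\bar{\mathbf{x}}_i'$$
Note asymptotics are with respect to N. Each matrix $\mathbf{X}_i'\mathbf{X}_i/T_i$ is the moments for the $T_i$ observations. Should be 'well behaved' in micro level data. The average of N such matrices should be likewise. T or $T_i$ is assumed to be fixed (and small).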
Random vs. Fixed Effects
Random Effects
  Small number of parameters
  Efficient estimation
  Objectionable orthogonality assumption ($c_i \perp \mathbf{X}_i$)
Fixed Effects
  Robust: generally consistent
  Large number of parameters
Ordinary Least Squares
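Standard results for OLS in a GR model
  Consistent
  Unbiased
  Inefficient
True variance of the least squares estimator
$$\mathrm{Var}[\mathbf{b} \mid \mathbf{X}] = \frac{1}{\sum_{i=1}^N T_i}\left[\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i}\right]^{-1}\left[\frac{\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}}{\sum_{i=1}^N T_i}\right]\left[\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i}\right]^{-1}$$
$$\rightarrow 0 \times \mathbf{Q}^{-1}\,\mathbf{Q}^{*}\,\mathbf{Q}^{-1} \rightarrow \mathbf{0} \text{ as } N \rightarrow \infty$$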
Estimating the Variance for OLS
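$$\mathrm{Var}[\mathbf{b} \mid \mathbf{X}] = \frac{1}{\sum_{i=1}^N T_i}\left[\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i}\right]^{-1}\left[\frac{\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}}{\sum_{i=1}^N T_i}\right]\left[\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i}\right]^{-1}$$
In the spirit of the White estimator, use
$$\text{Est.Var}[\mathbf{b} \mid \mathbf{X}] = \frac{1}{\sum_{i=1}^N T_i}\left[\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i}\right]^{-1}\left[\sum_{i=1}^N f_i\,\frac{\mathbf{X}_i'\hat{\mathbf{w}}_i\hat{\mathbf{w}}_i'\mathbf{X}_i}{T_i}\right]\left[\frac{\mathbf{X}'\mathbf{X}}{\sum_{i=1}^N T_i}\right]^{-1}$$
$$\hat{\mathbf{w}}_i = \mathbf{y}_i - \mathbf{X}_i\mathbf{b}, \quad f_i = \frac{T_i}{\sum_{i=1}^N T_i}$$
Hypothesis tests are then based on Wald statistics.
THIS IS THE 'CLUSTER' ESTIMATOR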
OLS Results for Cornwell and Rupert

+----------------------------------------------------+
| Residuals   Sum of squares       =   522.2008      |
|             Standard error of e  =   .3544712      |
| Fit         R-squared            =   .4112099      |
|             Adjusted R-squared   =   .4100766      |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    5.40159723      .04838934      111.628   .0000
 EXP          .04084968      .00218534       18.693   .0000   19.8537815
 EXPSQ       -.00068788      .480428D-04    -14.318   .0000   514.405042
 OCC         -.13830480      .01480107       -9.344   .0000    .51116447
 SMSA         .14856267      .01206772       12.311   .0000    .65378151
 MS           .06798358      .02074599        3.277   .0010    .81440576
 FEM         -.40020215      .02526118      -15.843   .0000    .11260504
 UNION        .09409925      .01253203        7.509   .0000    .36398559
 ED           .05812166      .00260039       22.351   .0000   12.8453782
Alternative Variance Estimators
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant    5.40159723      .04838934      111.628   .0000
 EXP          .04084968      .00218534       18.693   .0000
 EXPSQ       -.00068788      .480428D-04    -14.318   .0000
 OCC         -.13830480      .01480107       -9.344   .0000
 SMSA         .14856267      .01206772       12.311   .0000
 MS           .06798358      .02074599        3.277   .0010
 FEM         -.40020215      .02526118      -15.843   .0000
 UNION        .09409925      .01253203        7.509   .0000
 ED           .05812166      .00260039       22.351   .0000
 Robust - Cluster____________________________________________
 Constant    5.40159723      .10156038       53.186   .0000
 EXP          .04084968      .00432272        9.450   .0000
 EXPSQ       -.00068788      .983981D-04     -6.991   .0000
 OCC         -.13830480      .02772631       -4.988   .0000
 SMSA         .14856267      .02423668        6.130   .0000
 MS           .06798358      .04382220        1.551   .1208
 FEM         -.40020215      .04961926       -8.065   .0000
 UNION        .09409925      .02422669        3.884   .0001
 ED           .05812166      .00555697       10.459   .0000

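Generalized Least Squares
\[ \hat{\beta} = [X'\Omega^{-1}X]^{-1}[X'\Omega^{-1}y]
= \left[\sum_{i=1}^N X_i'\Omega_i^{-1}X_i\right]^{-1}\left[\sum_{i=1}^N X_i'\Omega_i^{-1}y_i\right] \]
\[ \Omega_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[I_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\mathbf{i}\mathbf{i}'\right] \]
(note, depends on i only through $T_i$)
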
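Generalized Least Squares
GLS is equivalent to OLS regression of
\[ y_{it}^{*} = y_{it} - \theta_i \bar{y}_{i.} \quad \text{on} \quad x_{it}^{*} = x_{it} - \theta_i \bar{x}_{i.}, \]
\[ \text{where } \theta_i = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T_i\sigma_u^2}} \]
\[ \text{Asy.Var}[\hat{\beta}] = [X'\Omega^{-1}X]^{-1} = \sigma_\varepsilon^2 [X^{*\prime}X^{*}]^{-1} \]
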
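As a rough numerical illustration of the partial-deviations transformation, the sketch below evaluates $\theta_i$ and applies it to one group's observations. The function names are hypothetical; the variance values plugged in are the Var[e] and Var[u] figures reported in the random effects application later in this deck (.023119 and .102531, with T = 7), used here only as sample inputs.

```python
import math


def theta(sigma2_eps, sigma2_u, t_i):
    """theta_i = 1 - sigma_eps / sqrt(sigma_eps^2 + T_i * sigma_u^2)."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + t_i * sigma2_u))


def partial_deviations(values, th):
    """Return v_it - theta_i * vbar_i for one individual's observations."""
    mean = sum(values) / len(values)
    return [v - th * mean for v in values]


th = theta(0.023119, 0.102531, 7)
print(round(th, 4))
print(partial_deviations([1.0, 2.0, 3.0], th))
```

Two limiting cases help with interpretation: when $\sigma_u^2 = 0$ the transformation does nothing (theta = 0, pooled OLS), and as $T_i\sigma_u^2$ grows theta approaches 1, which is the full deviations-from-means (fixed effects) transformation.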
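Estimators for the Variances
\[ y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i \]
Using the OLS estimator of $\beta$, $b_{OLS}$,
\[ \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a - x_{it}'b)^2}{\sum_{i=1}^N T_i - 1 - K} \quad \text{estimates } \sigma_\varepsilon^2 + \sigma_u^2 \]
With the LSDV estimates, $a_i$ and $b_{LSDV}$,
\[ \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^N T_i - N - K} \quad \text{estimates } \sigma_\varepsilon^2 \]
Using the difference of the two,
\[ \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a - x_{it}'b)^2}{\sum_{i=1}^N T_i - 1 - K}
- \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^N T_i - N - K} \quad \text{estimates } \sigma_u^2 \]
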
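Practical Problems with FGLS
The preceding regularly produce negative estimates of $\sigma_u^2$.
Estimation is made very complicated in unbalanced panels.
A bulletproof solution (originally used in TSP, now NLOGIT and others).
From the robust LSDV estimator:
\[ \hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^N T_i} \]
From the pooled OLS estimator:
\[ \text{Est}(\sigma_\varepsilon^2 + \sigma_u^2) = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_{OLS} - x_{it}'b_{OLS})^2}{\sum_{i=1}^N T_i} \ge \hat{\sigma}_\varepsilon^2 \]
\[ \hat{\sigma}_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_{OLS} - x_{it}'b_{OLS})^2 - \sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^N T_i} \ge 0 \]
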
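Stata Variance Estimators
\[ \hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^N T_i - K - N} > 0 \quad \text{based on FE estimates} \]
\[ \hat{\sigma}_u^2 = \max\left[0,\ \frac{SSE(\text{group means})}{N - A} - \frac{(N - K)\,\hat{\sigma}_\varepsilon^2}{(N - A)\,T}\right] \ge 0 \]
where $A = K$ or, if $\hat{\sigma}_u^2$ is negative, $A$ = trace of a matrix that somewhat resembles $I_K$.
Many other adjustments exist. None guaranteed to be positive. No optimality properties or even guaranteed consistency.
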
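Other Variance Estimators
From the group means regression:
\[ \frac{\sum_{i=1}^N (\bar{y}_{i.} - \tilde{a} - \bar{x}_{i.}'\tilde{b}_{MEANS})^2}{N - K - 1} \quad \text{estimates } \sigma_\varepsilon^2 / T + \sigma_u^2 \]
(Wooldridge) Based on $E[w_{it}w_{is} \mid X_i] = \sigma_u^2$ if $t \ne s$,
\[ \hat{\sigma}_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^{T_i - 1} \sum_{s=t+1}^{T_i} \hat{w}_{it}\hat{w}_{is}}{\sum_{i=1}^N T_i - K - N} \]
There are many others. Generally if the original, standard choices fail, these will also.
$x$ does not contain a constant term in the preceding.
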
Fixed Effects Estimates
----------------------------------------------------------------------
Least Squares with Group Dummy Variables..........
LHS=LWAGE    Mean                 =   6.67635
Residuals    Sum of squares       =   82.34912
             Standard error of e  =   .15205
These 2 variables have no within group variation.
FEM      ED
F.E. estimates are based on a generalized inverse.
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
     EXP|    .11346***      .00247        45.982    .0000     19.8538
   EXPSQ|   -.00042***      .544864D-04   -7.789    .0000     514.405
     OCC|   -.02106         .01373        -1.534    .1251     .51116
    SMSA|   -.04209**       .01934        -2.177    .0295     .65378
      MS|   -.02915         .01897        -1.536    .1245     .81441
     FEM|    .000
   UNION|    .03413**       .01491         2.290    .0220     .36399
      ED|    .000
--------+-------------------------------------------------------------

Computing Variance Estimators
Using the full list of variables (FEM and ED are time invariant), OLS sum of squares = 522.2008.
\[ \hat{\sigma}_\varepsilon^2 + \hat{\sigma}_u^2 = 522.2008 / (4165 - 9) = 0.12565 \]
Using full list of variables and a generalized inverse (same as dropping FEM and ED), LSDV sum of squares = 82.34912.
\[ \hat{\sigma}_\varepsilon^2 = 82.34912 / (4165 - 8 - 595) = 0.023119 \]
\[ \hat{\sigma}_u^2 = 0.12565 - 0.023119 = 0.10253 \]
Both estimators are positive. We stop here. If $\hat{\sigma}_u^2$ were negative, we would use estimators without DF corrections.

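The arithmetic on this slide can be reproduced in a few lines. This is a sketch, not the NLOGIT implementation: the helper name and its `df_correct` switch are hypothetical, and the inputs are the sums of squares and counts quoted above (4165 observations, 595 groups, K = 8 as used in the slide's divisors).

```python
def variance_components(sse_ols, sse_lsdv, n_obs, n_groups, k, df_correct=True):
    """Return (sigma_eps^2, sigma_u^2) from pooled OLS and LSDV sums of squares.

    df_correct=True uses the divisors on the slide; df_correct=False divides
    both sums of squares by n_obs, the fallback described for the case where
    the corrected difference turns out negative.
    """
    if df_correct:
        total = sse_ols / (n_obs - 1 - k)            # estimates sigma_eps^2 + sigma_u^2
        s2_eps = sse_lsdv / (n_obs - n_groups - k)   # estimates sigma_eps^2
    else:
        total = sse_ols / n_obs
        s2_eps = sse_lsdv / n_obs
    return s2_eps, total - s2_eps


s2_eps, s2_u = variance_components(522.2008, 82.34912, 4165, 595, 8)
rho = s2_u / (s2_u + s2_eps)   # implied Corr[v(i,t), v(i,s)]
print(round(s2_eps, 6), round(s2_u, 5), round(rho, 3))
```

Without degrees of freedom corrections the estimate of $\sigma_u^2$ cannot be negative, because the LSDV sum of squares can never exceed the pooled OLS sum of squares.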
Application
----------------------------------------------------------------------
Random Effects Model: v(i,t) = e(i,t) + u(i)
Estimates: Var[e]              =   .023119
           Var[u]              =   .102531
           Corr[v(i,t),v(i,s)] =   .816006
Lagrange Multiplier Test vs. Model (3) = 3713.07
( 1 degrees of freedom, prob. value = .000000)
(High values of LM favor FEM/REM over CR model)
Fixed vs. Random Effects (Hausman)     =     .00 (Cannot be computed)
( 8 degrees of freedom, prob. value = 1.000000)
(High (low) values of H favor F.E.(R.E.) model)
           Sum of Squares      1411.241136
           R-squared           -.591198
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 EXP          .08819204      .00224823       39.227   .0000   19.8537815
 EXPSQ       -.00076604      .496074D-04    -15.442   .0000   514.405042
 OCC         -.04243576      .01298466       -3.268   .0011   .51116447
 SMSA        -.03404260      .01620508       -2.101   .0357   .65378151
 MS          -.06708159      .01794516       -3.738   .0002   .81440576
 FEM         -.34346104      .04536453       -7.571   .0000   .11260504
 UNION        .05752770      .01350031        4.261   .0000   .36398559
 ED           .11028379      .00510008       21.624   .0000   12.8453782
 Constant    4.01913257      .07724830       52.029   .0000

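Testing for Effects: An LM Test
Breusch and Pagan Lagrange Multiplier statistic
\[ y_{it} = x_{it}'\beta + u_i + \varepsilon_{it}, \qquad
\begin{pmatrix} u_i \\ \varepsilon_{it} \end{pmatrix} \sim \text{Normal}\left[
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_\varepsilon^2 \end{pmatrix}\right] \]
\[ H_0: \sigma_u^2 = 0 \]
General
\[ LM = \frac{\left(\sum_{i=1}^N T_i\right)^2}{2\sum_{i=1}^N T_i (T_i - 1)}
\left[\frac{\sum_{i=1}^N (T_i \bar{e}_i)^2}{\sum_{i=1}^N \sum_{t=1}^{T_i} e_{it}^2} - 1\right]^2 \rightarrow \chi^2[1] \]
Balanced Panel
\[ LM = \frac{NT}{2(T - 1)}
\left[\frac{\sum_{i=1}^N [(T\bar{e}_i)^2 - e_i'e_i]}{\sum_{i=1}^N e_i'e_i}\right]^2 \rightarrow \chi^2[1] \]
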
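A minimal sketch of the general (unbalanced) LM formula follows. It assumes the pooled OLS residuals are already available, grouped by individual; the function name and the toy residuals are hypothetical.

```python
def breusch_pagan_lm(resid_groups):
    """Breusch-Pagan LM statistic from pooled OLS residuals.

    resid_groups: list of lists, the residuals e_it for each individual i.
    Panels may be unbalanced (lists of different lengths).
    """
    n_total = sum(len(e) for e in resid_groups)                       # sum of T_i
    pairs = sum(len(e) * (len(e) - 1) for e in resid_groups)          # sum of T_i (T_i - 1)
    num = sum(sum(e) ** 2 for e in resid_groups)                      # sum of (T_i * ebar_i)^2
    den = sum(v * v for e in resid_groups for v in e)                 # sum of e_it^2
    return n_total ** 2 / (2.0 * pairs) * (num / den - 1.0) ** 2


# Hypothetical residuals: perfectly correlated within each of two individuals.
lm = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
print(lm)
```

Since `sum(e)` equals $T_i\bar{e}_i$, no group means need to be computed explicitly. For a balanced panel the leading factor reduces to $NT / [2(T-1)]$, which matches the second formula on the slide.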
Application: Cornwell-Rupert
Testing for Effects
Regress ; lhs=lwage ; rhs=fixedx,varyingx ; res=e $
Matrix  ; tebar=7*gxbr(e,person) $
Calc    ; list ; lm=595*7/(2*(7-1))*(tebar'tebar/sumsqdev - 1)^2 $
LM      =   3797.06757

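A Hausman Test for FE vs. RE
+------------------+--------------------+--------------------+
| Estimator        | Random Effects     | Fixed Effects      |
|                  | E[ci|Xi] = 0       | E[ci|Xi] ≠ 0       |
+------------------+--------------------+--------------------+
| FGLS             | Consistent and     | Inconsistent       |
| (Random Effects) | Efficient          |                    |
+------------------+--------------------+--------------------+
| LSDV             | Consistent         | Consistent         |
| (Fixed Effects)  | Inefficient        | Possibly Efficient |
+------------------+--------------------+--------------------+
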
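Computing the Hausman Statistic
\[ \text{Est.Var}[\hat{\beta}_{FE}] = \hat{\sigma}_\varepsilon^2 \left[\sum_{i=1}^N X_i'\left(I - \frac{1}{T_i}\mathbf{i}\mathbf{i}'\right)X_i\right]^{-1} \]
\[ \text{Est.Var}[\hat{\beta}_{RE}] = \hat{\sigma}_\varepsilon^2 \left[\sum_{i=1}^N X_i'\left(I - \frac{\hat{\gamma}_i}{T_i}\mathbf{i}\mathbf{i}'\right)X_i\right]^{-1},
\qquad 0 \le \hat{\gamma}_i = \frac{T_i\hat{\sigma}_u^2}{\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2} \le 1 \]
As long as $\hat{\sigma}_\varepsilon^2$ and $\hat{\sigma}_u^2$ are consistent, as $N \rightarrow \infty$, $\text{Est.Var}[\hat{\beta}_{FE}] - \text{Est.Var}[\hat{\beta}_{RE}]$ will be nonnegative definite. In a finite sample, to ensure this, both must be computed using the same estimate of $\hat{\sigma}_\varepsilon^2$. The one based on LSDV will generally be the better choice.
Note that columns of zeros will appear in $\text{Est.Var}[\hat{\beta}_{FE}]$ if there are time invariant variables in X.
$\beta$ does not contain the constant term in the preceding.
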
Hausman Test
+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates: Var[e]              =   .235368D-01   |
|            Var[u]              =   .110254D+00   |
|            Corr[v(i,t),v(i,s)] =   .824078       |
| Lagrange Multiplier Test vs. Model (3) = 3797.07 |
| ( 1 df, prob value = .000000)                    |
| (High values of LM favor FEM/REM over CR model.) |
| Fixed vs. Random Effects (Hausman)     = 2632.34 |
| ( 4 df, prob value = .000000)                    |
| (High (low) values of H favor FEM (REM).)        |
+--------------------------------------------------+

Fixed Effects
+----------------------------------------------------+
| Panel:Groups   Empty        0,  Valid data     595 |
|                Smallest     7,  Largest          7 |
|                Average group size              7.00 |
| There are 2 vars. with no within group variation.  |
| ED       FEM                                       |
| Look for huge standard errors and fixed parameters.|
| F.E. results are based on a generalized inverse.   |
| They will be highly erratic. (Problematic model.)  |
| Unable to compute std.errors for dummy var. coeffs.|
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |   .00083         .00060003       1.381   .1672   46.811525|
|OCC     |  -.02157         .01379216      -1.564   .1178    .5111645|
|IND     |   .01888         .01545450       1.221   .2219    .3954382|
|SOUTH   |   .00039         .03429053        .011   .9909    .2902761|
|SMSA    |  -.04451**       .01939659      -2.295   .0217    .6537815|
|UNION   |   .03274**       .01493217       2.192   .0283    .3639856|
|EXP     |   .11327***      .00247221      45.819   .0000   19.853782|
|EXPSQ   |  -.00042***      .546283D-04    -7.664   .0000   514.40504|
|ED      |   .000                                                    |
|FEM     |   .000                                                    |
+--------+------------------------------------------------------------+

Random Effects
+--------------------------------------------------+
| Estimates: Var[e]              =   .235368D-01   |
|            Var[u]              =   .110254D+00   |
|            Corr[v(i,t),v(i,s)] =   .824078       |
| Lagrange Multiplier Test vs. Model (3) = 3797.07 |
| ( 1 df, prob value = .000000)                    |
| (High values of LM favor FEM/REM over CR model.) |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |   .00094         .00059308       1.586   .1128   46.811525|
|OCC     |  -.04367***      .01299206      -3.361   .0008    .5111645|
|IND     |   .00271         .01373256        .197   .8434    .3954382|
|SOUTH   |  -.00664         .02246416       -.295   .7677    .2902761|
|SMSA    |  -.03117*        .01615455      -1.930   .0536    .6537815|
|UNION   |   .05802***      .01349982       4.298   .0000    .3639856|
|EXP     |   .08744***      .00224705      38.913   .0000   19.853782|
|EXPSQ   |  -.00076***      .495876D-04   -15.411   .0000   514.40504|
|ED      |   .10724***      .00511463      20.967   .0000   12.845378|
|FEM     |  -.24786***      .04283536      -5.786   .0000    .1126050|
|Constant|  3.97756***      .08178139      48.637   .0000            |
+--------+------------------------------------------------------------+

The Hausman Test, by Hand
--> matrix ; br=b(1:8) ; vr=varb(1:8,1:8)$
--> matrix ; db = bf - br ; dv = vf - vr $
--> matrix ; list ; h = db'<dv>db$
Matrix H has 1 rows and 1 columns.
               1
        +--------------
       1|  2523.64910
--> calc ; list ; ctb(.95,8)$
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 Result  =     15.507313

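To make the quadratic form `db'<dv>db` concrete, here is a deliberately reduced sketch for a single coefficient, where the matrix inverse becomes a division. It is not the 8-coefficient statistic computed on the slide. The function name is hypothetical; the inputs are the EXP coefficients and standard errors from the fixed effects and random effects tables in this deck, used only as sample numbers.

```python
def hausman_one_coefficient(b_fe, se_fe, b_re, se_re):
    """Scalar version of h = d' [Var(FE) - Var(RE)]^{-1} d for one coefficient."""
    d = b_fe - b_re
    dv = se_fe ** 2 - se_re ** 2
    if dv <= 0.0:
        # In finite samples the variance difference need not be positive;
        # in that case the statistic is not usable in this form.
        raise ValueError("Var(FE) - Var(RE) is not positive")
    return d * d / dv


h_exp = hausman_one_coefficient(0.11327, 0.00247221, 0.08744, 0.00224705)
print(h_exp)
```

Under the random effects null this scalar version would be compared with a chi-squared critical value with 1 degree of freedom rather than the 8 used in the `ctb(.95,8)` call above.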
Hello, professor greene.
I've taken the liberty of attaching some LIMDEP output in order to ask your view on whether my Hausman test stat is large, requiring the FEM, or not, allowing me to use the (much better for my research) REM.
Specifically, my test statistic, corrected for heteroscedasticity, is about 34 and significant with 6 df.
I considered this a large value until I found your assignment 2 on the internet which shows a value of 2554 with 4 df. Now, I'd like to assert that 34/6 is a small value.

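Variable Addition
A Fixed Effects Model
\[ y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \]
LSDV estimator - Deviations from group means:
To estimate $\beta$, regress $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$
Algebraic equivalent: OLS regress $y_{it}$ on $(x_{it}, \bar{x}_i)$
Mundlak interpretation: $\alpha_i = \alpha + \bar{x}_i'\delta + u_i$
Model becomes
\[ y_{it} = \alpha + \bar{x}_i'\delta + u_i + x_{it}'\beta + \varepsilon_{it}
= \alpha + \bar{x}_i'\delta + x_{it}'\beta + \varepsilon_{it} + u_i \]
= a random effects model with the group means.
Estimate by FGLS.
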
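The "algebraic equivalent" claim can be checked numerically. The sketch below uses a made-up unbalanced panel with one time-varying regressor and compares the within (deviations from group means) slope with the coefficient on $x_{it}$ from pooled OLS of $y_{it}$ on a constant, $x_{it}$, and the expanded group mean $\bar{x}_i$. The data and the small `ols` helper are hypothetical and exist only for this illustration.

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yv for r, yv in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]


# Hypothetical unbalanced panel: one (x, y) pair of lists per individual.
panel = [
    ([1.0, 2.0, 4.0], [2.0, 3.5, 6.0]),
    ([3.0, 5.0], [1.0, 2.5]),
    ([0.0, 1.0, 2.0, 5.0], [4.0, 4.5, 6.5, 9.0]),
]

num = den = 0.0
rows, y_all = [], []
for xs, ys in panel:
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    num += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den += sum((x - xbar) ** 2 for x in xs)
    rows += [[1.0, x, xbar] for x in xs]   # constant, x_it, expanded group mean
    y_all += ys

b_within = num / den             # LSDV / within estimator
b_added = ols(rows, y_all)[1]    # coefficient on x_it once the group mean is added
print(b_within, b_added)
```

The two numbers agree up to rounding because, after partialling out the constant and $\bar{x}_i$, what is left of $x_{it}$ is exactly $x_{it} - \bar{x}_i$.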
A Variable Addition Test
Asymptotic equivalent to Hausman
Also equivalent to Mundlak formulation
In the random effects model, using FGLS
  Only applies to time varying variables
  Add expanded group means to the regression (i.e., observation i,t gets the same group means for all t).
  Use Wald test to test for coefficients on means equal to 0. Large chi-squared weighs against random effects specification.

Means Added to REM - Mundlak
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |   .00083         .00060070       1.380   .1677   46.811525|
|OCC     |  -.02157         .01380769      -1.562   .1182    .5111645|
|IND     |   .01888         .01547189       1.220   .2224    .3954382|
|SOUTH   |   .00039         .03432914        .011   .9909    .2902761|
|SMSA    |  -.04451**       .01941842      -2.292   .0219    .6537815|
|UNION   |   .03274**       .01494898       2.190   .0285    .3639856|
|EXP     |   .11327***      .00247500      45.768   .0000   19.853782|
|EXPSQ   |  -.00042***      .546898D-04    -7.655   .0000   514.40504|
|ED      |   .05199***      .00552893       9.404   .0000   12.845378|
|FEM     |  -.41306***      .03732204     -11.067   .0000    .1126050|
|WKSB    |   .00863**       .00363907       2.371   .0177   46.811525|
|OCCB    |  -.14656***      .03640885      -4.025   .0001    .5111645|
|INDB    |   .04142         .02976363       1.392   .1640    .3954382|
|SOUTHB  |  -.05551         .04297816      -1.292   .1965    .2902761|
|SMSAB   |   .21607***      .03213205       6.724   .0000    .6537815|
|UNIONB  |   .08152**       .03266438       2.496   .0126    .3639856|
|EXPB    |  -.08005***      .00533603     -15.002   .0000   19.853782|
|EXPSQB  |  -.00017         .00011763      -1.416   .1567   514.40504|
|Constant|  5.19036***      .20147201      25.762   .0000            |
+--------+------------------------------------------------------------+

Wu (Variable Addition) Test
--> matrix ; bm=b(12:19) ; vm=varb(12:19,12:19)$
--> matrix ; list ; wu = bm'<vm>bm $
Matrix WU has 1 rows and 1 columns.
               1
        +--------------
       1|  3004.38076

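A Hierarchical Linear Model
Interpretation of the FE Model
\[ y_{it} = x_{it}'\beta + c_i + \varepsilon_{it} \quad (x \text{ does not contain a constant}) \]
\[ E[\varepsilon_{it} \mid X_i, c_i] = 0, \qquad \mathrm{Var}[\varepsilon_{it} \mid X_i, c_i] = \sigma_\varepsilon^2 \]
\[ c_i = \alpha + z_i'\delta + u_i, \]
\[ E[u_i \mid z_i] = 0, \qquad \mathrm{Var}[u_i \mid z_i] = \sigma_u^2 \]
\[ y_{it} = x_{it}'\beta + [\alpha + z_i'\delta + u_i] + \varepsilon_{it} \]
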
Hierarchical Linear Model as REM
+--------------------------------------------------+
| Estimates: Var[e]              =   .235368D-01   |
|            Var[u]              =   .110254D+00   |
|            Corr[v(i,t),v(i,s)] =   .824078       |
|            Sigma(u)            =   0.3303        |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OCC     |  -.03908144      .01298962      -3.009   .0026   .51116447
 SMSA    |  -.03881553      .01645862      -2.358   .0184   .65378151
 MS      |  -.06557030      .01815465      -3.612   .0003   .81440576
 EXP     |   .05737298      .00088467      64.852   .0000   19.8537815
 FEM     |  -.34715010      .04681514      -7.415   .0000   .11260504
 ED      |   .11120152      .00525209      21.173   .0000   12.8453782
 Constant|  4.24669585      .07763394      54.702   .0000

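Evolution: Correlated Random Effects
Unknown parameters
\[ y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad \theta = [\alpha_1, \alpha_2, \ldots, \alpha_N, \beta, \sigma_\varepsilon^2] \]
Standard estimation based on LS (dummy variables)
Ambiguous definition of the distribution of $y_{it}$
Effects model, nonorthogonality, heterogeneity
\[ y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad E[\alpha_i \mid X_i] = g(X_i) \ne 0 \]
Contrast to random effects $E[\alpha_i \mid X_i] = \alpha$
Standard estimation (still) based on LS (dummy variables)
Correlated random effects, more detailed model
\[ y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad P[\alpha_i \mid X_i] = g(X_i) \ne 0 \]
Linear projection? $\alpha_i = \alpha + \bar{x}_i'\delta + u_i$, $\mathrm{Cor}(u_i, \bar{x}_i) = 0$
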
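Mundlak's Estimator
Mundlak, Y., "On the Pooling of Time Series and Cross Section Data," Econometrica, 46, 1978, pp. 69-85.
Write $c_i = \bar{x}_i'\delta + u_i$, $E[c_i \mid x_{i1}, x_{i2}, \ldots, x_{iT}] = \bar{x}_i'\delta$
Assume $c_i$ contains all time invariant information
\[ y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i, \quad T_i \text{ observations in group } i \]
\[ = X_i\beta + \mathbf{i}\bar{x}_i'\delta + \varepsilon_i + u_i\mathbf{i} \]
Looks like random effects.
\[ \mathrm{Var}[\varepsilon_i + u_i\mathbf{i}] = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2\mathbf{i}\mathbf{i}' \]
This is the model we used for the Wu test.
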
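Correlated Random Effects
Mundlak
\[ c_i = \bar{x}_i'\delta + u_i, \qquad E[c_i \mid x_{i1}, x_{i2}, \ldots, x_{iT}] = \bar{x}_i'\delta \]
Assume $c_i$ contains all time invariant information
\[ y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i, \quad T_i \text{ observations in group } i \]
\[ = X_i\beta + \mathbf{i}\bar{x}_i'\delta + \varepsilon_i + u_i\mathbf{i} \]
Chamberlain / Wooldridge
\[ c_i = x_{i1}'\delta_1 + x_{i2}'\delta_2 + \cdots + x_{iT}'\delta_T + u_i \]
\[ y_i = X_i\beta + \underbrace{\mathbf{i}x_{i1}'}_{T \times K}\delta_1 + \underbrace{\mathbf{i}x_{i2}'}_{T \times K}\delta_2 + \cdots + \underbrace{\mathbf{i}x_{iT}'}_{T \times K}\delta_T + \varepsilon_i + u_i\mathbf{i} \]
Problems: Requires balanced panels
Modern panels have large T; models have large K
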
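Mundlak's Approach for an FE Model with Time Invariant Variables
\[ y_{it} = x_{it}'\beta + z_i'\delta + c_i + \varepsilon_{it} \quad (x \text{ does not contain a constant}) \]
\[ E[\varepsilon_{it} \mid X_i, c_i] = 0, \qquad \mathrm{Var}[\varepsilon_{it} \mid X_i, c_i] = \sigma_\varepsilon^2 \]
\[ c_i = \alpha + \bar{x}_i'\theta + w_i, \]
\[ E[w_i \mid X_i, z_i] = 0, \qquad \mathrm{Var}[w_i \mid X_i, z_i] = \sigma_w^2 \]
\[ y_{it} = x_{it}'\beta + z_i'\delta + \alpha + \bar{x}_i'\theta + w_i + \varepsilon_{it} \]
= random effects model including group means of time varying variables.
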
Mundlak Form of FE Model
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
x(i,t)=================================================================
 OCC     |  -.02021384      .01375165      -1.470   .1416   .51116447
 SMSA    |  -.04250645      .01951727      -2.178   .0294   .65378151
 MS      |  -.02946444      .01915264      -1.538   .1240   .81440576
 EXP     |   .09665711      .00119262      81.046   .0000   19.8537815
z(i)===================================================================
 FEM     |  -.34322129      .05725632      -5.994   .0000   .11260504
 ED      |   .05099781      .00575551       8.861   .0000   12.8453782
Means of x(i,t) and constant===========================================
 Constant|  5.72655261      .10300460      55.595   .0000
 OCCB    |  -.10850252      .03635921      -2.984   .0028   .51116447
 SMSAB   |   .22934020      .03282197       6.987   .0000   .65378151
 MSB     |   .20453332      .05329948       3.837   .0001   .81440576
 EXPB    |  -.08988632      .00165025     -54.468   .0000   19.8537815
Variance Estimates=====================================================
 Var[e]  |   .0235632
 Var[u]  |   .0773825

Panel Data Extensions
Dynamic models: lagged effects of the dependent variable
Endogenous RHS variables
Cross country comparisons: large T
More general parameter heterogeneity: not only the constant term
Nonlinear models such as binary choice

The Hausman and Taylor Model
\[ y_{it} = x1_{it}'\beta_1 + x2_{it}'\beta_2 + z1_i'\alpha_1 + z2_i'\alpha_2 + \varepsilon_{it} + u_i \]
Model: x2 and z2 are correlated with u.
Deviations from group means removes all time invariant variables
\[ y_{it} - \bar{y}_i = (x1_{it} - \bar{x1}_i)'\beta_1 + (x2_{it} - \bar{x2}_i)'\beta_2 + \varepsilon_{it} - \bar{\varepsilon}_i \]
Implication: $\beta_1$, $\beta_2$ are consistently estimated by LSDV.
$(x1_{it} - \bar{x1}_i)$ = K1 instrumental variables
$z1_i$ = L1 instrumental variables (uncorrelated with u)
? = L2 instrumental variables (where do we get them?)
H&T: $\bar{x1}_i$ = K1 additional instrumental variables. Needs $K1 \ge L2$.

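H&T's 4 Step FGLS Estimator
(1) LSDV estimates of $\beta_1$, $\beta_2$, $\sigma_\varepsilon^2$
(2) $(e^{*})' = (\bar{e}_1, \bar{e}_1, \ldots, \bar{e}_1), (\bar{e}_2, \bar{e}_2, \ldots, \bar{e}_2), \ldots, (\bar{e}_N, \bar{e}_N, \ldots, \bar{e}_N)$
    IV regression of $e^{*}$ on $Z^{*}$ with instruments $W_i$ consistently estimates $\alpha_1$ and $\alpha_2$.
(3) With fixed T, residual variance in (2) estimates $\sigma_u^2 + \sigma_\varepsilon^2 / T$.
    With unbalanced panel, it estimates $\sigma_u^2 + \sigma_\varepsilon^2\,\overline{(1/T)}$ or something resembling this. (1) provided an estimate of $\sigma_\varepsilon^2$ so use the two to obtain estimates of $\sigma_u^2$ and $\sigma_\varepsilon^2$. For each group, compute
    \[ \hat{\theta}_i = 1 - \sqrt{\hat{\sigma}_\varepsilon^2 / (\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2)} \]
(4) Transform $[x_{it1}, x_{it2}, z_{i1}, z_{i2}]$ to
    \[ W_i^{*} = [x_{it1}, x_{it2}, z_{i1}, z_{i2}] - \hat{\theta}_i[\bar{x}_{i1}, \bar{x}_{i2}, z_{i1}, z_{i2}] \]
    and $y_{it}$ to $y_{it}^{*} = y_{it} - \hat{\theta}_i\bar{y}_{i.}$
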
H&T's 4 STEP IV Estimator
Instrumental Variables $V_i$
$(x1_{it} - \bar{x1}_i)$ = K1 instrumental variables
$(x2_{it} - \bar{x2}_i)$ = K2 instrumental variables
$z1_i$ = L1 instrumental variables (uncorrelated with u)
$\bar{x1}_i$ = K1 additional instrumental variables.
Now do 2SLS of $y^{*}$ on $W^{*}$ with instruments V to estimate all parameters. I.e.,
\[ [\hat{\beta}_1, \hat{\beta}_2, \hat{\alpha}_1, \hat{\alpha}_2] = (\hat{W}^{*\prime}\hat{W}^{*})^{-1}\hat{W}^{*\prime}y^{*} \]

Arellano/Bond/Bover's Formulation Builds on Hausman and Taylor
\[ y_{it} = x1_{it}'\beta_1 + x2_{it}'\beta_2 + z1_i'\alpha_1 + z2_i'\alpha_2 + \varepsilon_{it} + u_i \]
Instrumental variables for period t
$z1_i$
$\bar{x1}_i$ = K1 additional instrumental variables. $K1 \ge L2$.
Let $v_{it} = \varepsilon_{it} + u_i$
Let $z_{it}' = [(x1_{it} - \bar{x1}_i)', (x2_{it} - \bar{x2}_i)', z1_i', \bar{x1}_i']$
Then $E[z_{it}v_{it}] = 0$
We formulate this for the $T_i$ observations in group i.

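Arellano/Bond/Bover's Formulation Adds a Lagged DV to H&T
\[ y_{it} = \delta y_{i,t-1} + x1_{it}'\beta_1 + x2_{it}'\beta_2 + z1_i'\alpha_1 + z2_i'\alpha_2 + \varepsilon_{it} + u_i \]
Parameters: $\theta = [\delta, \beta_1', \beta_2', \alpha_1', \alpha_2']'$
The data
\[ y_i = \begin{bmatrix} y_{i,2} \\ y_{i,3} \\ \vdots \\ y_{i,T_i} \end{bmatrix}, \qquad
X_i = \begin{bmatrix}
y_{i,1} & x1_{i2}' & x2_{i2}' & z1_i' & z2_i' \\
y_{i,2} & x1_{i3}' & x2_{i3}' & z1_i' & z2_i' \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
y_{i,T_i-1} & x1_{iT_i}' & x2_{iT_i}' & z1_i' & z2_i'
\end{bmatrix}, \qquad T_i - 1 \text{ rows} \]
$X_i$ has 1 + K1 + K2 + L1 + L2 columns.
This formulation is the same as H&T with $y_{i,t-1}$ contained in $x2_{it}$.
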
Dynamic (Linear) Panel Data (DPD) Models
Application
Bias in Conventional Estimation
Development of Consistent Estimators
Efficient GMM Estimators

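Dynamic Linear Model
Balestra-Nerlove (1966), 36 States, 11 Years
Demand for Natural Gas
Structure
New Demand: $G^{*}_{i,t} = G_{i,t} - (1 - \delta)G_{i,t-1}$
Demand Function: $G^{*}_{i,t} = \beta_1 + \beta_2 P_{i,t} + \beta_3 \Delta N_{i,t} + \beta_4 N_{i,t} + \beta_5 \Delta Y_{i,t} + \beta_6 Y_{i,t} + \varepsilon_{i,t}$
G = gas demand
N = population
P = price
Y = per capita income
Reduced Form
\[ G_{i,t} = \beta_1 + \beta_2 P_{i,t} + \beta_3 \Delta N_{i,t} + \beta_4 N_{i,t} + \beta_5 \Delta Y_{i,t} + \beta_6 Y_{i,t} + \beta_7 G_{i,t-1} + \alpha_i + \varepsilon_{i,t} \]
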
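A General DPD model
\[ y_{i,t} = x_{i,t}'\beta + \delta y_{i,t-1} + c_i + \varepsilon_{i,t} \]
\[ E[\varepsilon_{i,t} \mid X_i, c_i] = 0 \]
\[ E[\varepsilon_{i,t}^2 \mid X_i, c_i] = \sigma_\varepsilon^2, \qquad E[\varepsilon_{i,t}\varepsilon_{i,s} \mid X_i, c_i] = 0 \text{ if } t \ne s \]
\[ E[c_i \mid X_i] = g(X_i) \]
No correlation across individuals
OLS and GLS are both inconsistent.
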
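Arellano and Bond Estimator
Base on first differences
\[ y_{i,t} - y_{i,t-1} = (x_{i,t} - x_{i,t-1})'\beta + \delta(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{i,t} - \varepsilon_{i,t-1}) \]
Instrumental variables
\[ y_{i,3} - y_{i,2} = (x_{i,3} - x_{i,2})'\beta + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2}) \]
Can use $y_{i,1}$
\[ y_{i,4} - y_{i,3} = (x_{i,4} - x_{i,3})'\beta + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3}) \]
Can use $y_{i,1}$ and $y_{i,2}$
\[ y_{i,5} - y_{i,4} = (x_{i,5} - x_{i,4})'\beta + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4}) \]
Can use $y_{i,1}$ and $y_{i,2}$ and $y_{i,3}$
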
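More instrumental variables - Predetermined X
\[ y_{i,3} - y_{i,2} = (x_{i,3} - x_{i,2})'\beta + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2}) \]
Can use $y_{i,1}$ and $x_{i,1}, x_{i,2}$
\[ y_{i,4} - y_{i,3} = (x_{i,4} - x_{i,3})'\beta + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3}) \]
Can use $y_{i,1}, y_{i,2}, x_{i,1}, x_{i,2}, x_{i,3}$
\[ y_{i,5} - y_{i,4} = (x_{i,5} - x_{i,4})'\beta + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4}) \]
Can use $y_{i,1}, y_{i,2}, y_{i,3}, x_{i,1}, x_{i,2}, x_{i,3}, x_{i,4}$
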
Even more instrumental variables - Strictly exogenous X
\[ y_{i,3} - y_{i,2} = (x_{i,3} - x_{i,2})'\beta + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2}) \]
Can use $y_{i,1}$ and $x_{i,1}, x_{i,2}, \ldots, x_{i,T}$ (all periods)
\[ y_{i,4} - y_{i,3} = (x_{i,4} - x_{i,3})'\beta + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3}) \]
Can use $y_{i,1}, y_{i,2}, x_{i,1}, x_{i,2}, \ldots, x_{i,T}$
\[ y_{i,5} - y_{i,4} = (x_{i,5} - x_{i,4})'\beta + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4}) \]
Can use $y_{i,1}, y_{i,2}, y_{i,3}, x_{i,1}, x_{i,2}, \ldots, x_{i,T}$
The number of potential instruments is huge.
These define the rows of $Z_i$. These can be used for simple instrumental variable estimation.

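The pattern in the three instrument lists above can be written down as a small bookkeeping function. This is only a sketch of which periods are admissible for the differenced equation at period t under each assumption about x; it does not build $Z_i$ or estimate anything, and the function name and the labels for the three cases are hypothetical.

```python
def ab_instrument_periods(t, n_periods, x_assumption="none"):
    """Periods of y and x usable as instruments for the equation
    y_t - y_{t-1} = ... in first differences (t >= 3).

    x_assumption: "none" (lagged y only), "predetermined", or "strictly_exogenous".
    Returns (y_periods, x_periods) as lists of period indices starting at 1.
    """
    if t < 3 or t > n_periods:
        raise ValueError("t must satisfy 3 <= t <= n_periods")
    y_periods = list(range(1, t - 1))              # y_i1, ..., y_i,t-2
    if x_assumption == "predetermined":
        x_periods = list(range(1, t))              # x_i1, ..., x_i,t-1
    elif x_assumption == "strictly_exogenous":
        x_periods = list(range(1, n_periods + 1))  # all periods
    elif x_assumption == "none":
        x_periods = []
    else:
        raise ValueError("unknown x_assumption")
    return y_periods, x_periods


for t in (3, 4, 5):
    print(t, ab_instrument_periods(t, 5, "predetermined"))
```

Counting the entries for each t shows how quickly the instrument set grows with T, which is the point of the remark that the number of potential instruments is huge.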
Application: Maquiladora
http://www.dallasfed.org/news/research/2005/05us-mexico_felix.pdf
Maquiladora
Estimates
