
Sample Selection

Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2012

April 23, 2012


Preliminaries 1) Truncated normal distribution


$X \sim f(x)$; $X \mid X < a$ is $X$ truncated at $a$. Then

$$f(x \mid X < a) = \frac{f(x)}{\Pr(X < a)}$$

If $X \sim N(\mu, \sigma^2)$, and recalling that

$$\Pr(X < a) = \Pr\big((X-\mu)/\sigma < (a-\mu)/\sigma\big) = \Pr(z < \alpha) = \Phi(\alpha),$$

with $\alpha \equiv (a-\mu)/\sigma$ and $z \equiv (x-\mu)/\sigma$,

$$f(x \mid X < a) = \frac{\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]}{\Phi(\alpha)} = \frac{(1/\sigma)\,\phi(z)}{\Phi(\alpha)}$$

Result (no proof): if $X \sim N(\mu, \sigma^2)$, then

$$E(X \mid X < a) = \mu - \sigma\,\frac{\phi(\alpha)}{\Phi(\alpha)}$$

Truncated to the right: the expected value moves to the left (in general).
How much? Depends on $\alpha$ and $\sigma^2$.
$\lambda(z) \equiv \phi(z)/\Phi(z)$ is known as the inverse Mills ratio.
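A quick numerical check of this result (not part of the original slides): the snippet below compares the closed-form truncated mean with the one computed by scipy's truncnorm; the values of $\mu$, $\sigma$ and $a$ are arbitrary illustrative choices.

```python
# Sketch: numerically confirm E(X | X < a) = mu - sigma * phi(alpha) / Phi(alpha)
# for arbitrary illustrative values of mu, sigma and a.
import numpy as np
from scipy import stats

mu, sigma, a = 2.0, 1.5, 3.0
alpha = (a - mu) / sigma

closed_form = mu - sigma * stats.norm.pdf(alpha) / stats.norm.cdf(alpha)
truncated = stats.truncnorm(a=-np.inf, b=alpha, loc=mu, scale=sigma)  # X | X < a
print(closed_form, truncated.mean())  # both should print the same number
```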


Preliminaries 2) Latent variables and probits


Recall the probit model:
$$\Pr(y = 1 \mid x) = \Phi(x'\gamma)$$
$\gamma$ can be consistently estimated by MLE based on a random sample $(y_i, x_i)$, $i = 1, \ldots, n$.
Consider the regression model
$$y^* = x'\beta + u, \qquad u \sim N(0, \sigma^2)$$
$y^*$ is not directly observable (a latent variable); instead, we observe $y = 1[y^* > 0]$.
Which parameters of the regression model can be estimated consistently with this information?

Note that:
$$\begin{aligned}
\Pr(y = 1 \mid x) &= \Pr(y^* > 0 \mid x)\\
&= \Pr(u > -x'\beta \mid x)\\
&= \Pr(u < x'\beta \mid x)\\
&= \Pr(u/\sigma < x'\beta/\sigma \mid x)\\
&= \Phi(x'\beta/\sigma)
\end{aligned}$$
This is a probit model with $\gamma \equiv \beta/\sigma$.
Based on $(y_i, x_i)$, $\gamma$ can be estimated consistently by MLE, even when we cannot estimate $\beta$ and $\sigma^2$ separately.


$\sigma^2$ and $\beta$ are not identified with the sample $(y_i, x_i)$. For example, $\beta = 10$ and $\sigma = 2$ are observationally equivalent to $\beta = 5$ and $\sigma = 1$ (both imply $\beta/\sigma = 5$).
$\gamma \equiv \beta/\sigma$ is identified.
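A quick simulation (not from the slides; statsmodels' Probit is my choice of estimator, and the parameter values are arbitrary) showing that probit MLE on $(y_i, x_i)$ recovers $\gamma = \beta/\sigma$ rather than $\beta$ itself:

```python
# Generate y = 1[x*beta + u > 0] with u ~ N(0, sigma^2) and fit a probit:
# the slope estimate converges to beta/sigma, not beta.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, beta, sigma = 100_000, 1.0, 2.0

x = rng.normal(size=n)
y_star = beta * x + rng.normal(scale=sigma, size=n)  # latent variable, never observed
y = (y_star > 0).astype(int)                         # what we actually observe

probit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(probit.params[1], beta / sigma)  # both approximately 0.5
```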


Sample selectivity

Consider the regression model


$$y_i = x_i'\beta + u_i$$
$s_i$ is a selectivity variable: $s_i = 1$ if observed, $0$ if not.
We can think that there is a super-sample of size $N$ of $(y_i, x_i, s_i)$ and that we observe the sub-sample $(y_i, x_i)$ only when $s_i = 1$.
Example: female labor productivity.


OLS under selectivity

With a random sample $(y_i, x_i)$, consistency relies on
$$E(u_i \mid x_i) = 0,$$
which implies $E(y_i \mid x_i) = x_i'\beta$.
Now we do not have a random sample, but one conditioned on $s_i = 1$. Taking conditional expectations:
$$E(y_i \mid x_i, s_i = 1) = x_i'\beta + E(u_i \mid x_i, s_i = 1)$$
OLS based on the selected sample will be inconsistent unless $E(u_i \mid x_i, s_i = 1) = 0$.


Not every selectivity mechanism makes OLS inconsistent.


If $u$ is independent of $x$, OLS is still consistent (why?).
If $s = g(x)$, OLS is still consistent.
Examples: wages and education. Males with odd SSN. Males with formal education? (A simulation sketch of the two cases follows below.)
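A minimal simulation sketch (not from the slides; the model $y = 1 + 2x + u$ and the cutoffs are arbitrary) contrasting selection through $x$ alone with selection that involves $u$ (truncation on $y$):

```python
# Sketch: selection through x alone leaves the OLS slope consistent,
# while truncation on y (which makes selection depend on u given x) does not.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def ols_slope(xs, ys):
    X = np.column_stack([np.ones_like(xs), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0][1]

keep_x = x > 0    # s = g(x): sample selected on the regressor only
keep_y = y > 1.0  # selection depends on u given x (truncation on y)

print(ols_slope(x, y))                  # full sample: approx. 2
print(ols_slope(x[keep_x], y[keep_x]))  # still approx. 2
print(ols_slope(x[keep_y], y[keep_y]))  # attenuated: clearly below 2
```

Selecting on $x > 0$ leaves the slope near 2; truncating on $y > 1$ pulls it down, since $E(u \mid x, \text{selected})$ then varies with $x$.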


An estimable model under selectivity

Consider the following equations:



$$y_{1i} = x_{1i}'\beta_1 + u_{1i} \qquad \text{(regression)}$$
$$y_{2i}^* = x_{2i}'\beta_2 + u_{2i} \qquad \text{(selectivity)}$$
Let $y_{2i} = 1[y_{2i}^* > 0]$.

Example: $y_{1i}$ = wage; the regression equation determines wages based on a person's characteristics ($x_{1i}$). $y_{2i}^*$ = net utility of work; $x_{2i}$ are observed determinants of utility.


Assumptions:
1. $(y_{2i}, x_{2i})$ observed for everyone.
2. $(y_{1i}, x_{1i})$ observed only if $y_{2i} = 1$ (selected sample).
3. $(u_{1i}, u_{2i})$ are independent of $x_{2i}$ and have zero mean.
4. $u_{2i} \sim N(0, \sigma_2^2)$.
5. $E(u_{1i} \mid u_{2i}) = \delta\, u_{2i}$. Unobservables can be related.


Selectivity bias
Here $s_i \equiv y_{2i}$.
$$\begin{aligned}
E(y_{1i} \mid x_{1i}, y_{2i} = 1) &= x_{1i}'\beta_1 + E(u_{1i} \mid x_{1i}, y_{2i} = 1)\\
&= x_{1i}'\beta_1 + E\big[E(u_{1i} \mid u_{2i}) \mid x_{1i}, y_{2i} = 1\big]\\
&= x_{1i}'\beta_1 + \delta\, E\big(u_{2i} \mid x_{1i}, y_{2i} = 1\big)\\
&= x_{1i}'\beta_1 + \delta\, E\big(u_{2i} \mid x_{1i}, y_{2i}^* > 0\big)\\
&= x_{1i}'\beta_1 + \delta\, E\big(u_{2i} \mid x_{1i}, u_{2i} > -x_{2i}'\beta_2\big)\\
&= x_{1i}'\beta_1 + \delta\,\sigma_2\,\lambda(x_{2i}'\beta_2/\sigma_2)\\
&= x_{1i}'\beta_1 + \delta\,\sigma_2\,\lambda_i \;\neq\; x_{1i}'\beta_1
\end{aligned}$$
with $\lambda_i \equiv \lambda(x_{2i}'\beta_2/\sigma_2)$. OLS with the selected sample is inconsistent.
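A small simulation (not from the slides; all numbers are illustrative) checking the key step $E(u_{1i} \mid u_{2i} > -x_{2i}'\beta_2) = \delta\,\sigma_2\,\lambda(x_{2i}'\beta_2/\sigma_2)$ at a single fixed value of the selection index:

```python
# Sketch with made-up values: verify E(u1 | u2 > -index2) = delta * sigma2 * lambda(index2 / sigma2),
# where index2 plays the role of x2'beta2 and lambda is the inverse Mills ratio.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
delta, sigma2, index2 = 0.8, 1.5, 0.3

u2 = rng.normal(scale=sigma2, size=1_000_000)
u1 = delta * u2 + rng.normal(size=u2.size)   # so that E(u1 | u2) = delta * u2

selected = u2 > -index2                      # selection event: y2* = index2 + u2 > 0
lam = stats.norm.pdf(index2 / sigma2) / stats.norm.cdf(index2 / sigma2)
print(u1[selected].mean(), delta * sigma2 * lam)  # the two numbers should be close
```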


$$E(y_{1i} \mid x_{1i}, y_{2i} = 1) = x_{1i}'\beta_1 + \delta\sigma_2\lambda_i \neq x_{1i}'\beta_1$$

Inconsistency: omission of $\lambda_i$. Heckman (1979): selectivity bias as misspecification.
Inconsistency is due to the correlation between $u_{1i}$ and $u_{2i}$, that is, $\delta \neq 0$.


Heckman's two-step estimator

Define $\tilde u_{1i} \equiv y_{1i} - x_{1i}'\beta_1 - \theta\lambda_i$, with $\theta \equiv \delta\sigma_2$. Solving:
$$y_{1i} = x_{1i}'\beta_1 + \theta\lambda_i + \tilde u_{1i}$$
where, by construction, $E(\tilde u_{1i} \mid x_{1i}, y_{2i} = 1) = 0$.
$x_{1i}$, $\lambda_i$ observable when $y_{2i} = 1$: OLS of $y_{1i}$ on $x_{1i}$ and $\lambda_i$ using the selected sample gives consistent estimates of $\beta_1$ and $\theta$.
Problem: $\lambda_i \equiv \lambda(x_{2i}'\beta_2/\sigma_2)$ is NOT observable; it depends on $\beta_2$ and $\sigma_2$.


Note that $u_{2i} \sim N(0, \sigma_2^2)$, hence:
$$\Pr(y_{2i} = 1) = \Pr(y_{2i}^* > 0) = \Pr(u_{2i}/\sigma_2 < x_{2i}'\beta_2/\sigma_2) = \Phi(x_{2i}'\gamma)$$
$\Pr(y_{2i} = 1)$ is a probit model with unknown coefficients $\gamma \equiv \beta_2/\sigma_2$.
$x_{2i}$ and $y_{2i}$ are observed for everybody: $\gamma$ can be estimated using probit.
Important: we cannot identify $\beta_2$ and $\sigma_2$ separately, but only $\gamma \equiv \beta_2/\sigma_2$.


This suggests the following two-stage method:


Stage 1: Obtain estimates $\hat\gamma$ based on the probit model $\Pr(y_{2i} = 1) = \Phi(x_{2i}'\gamma)$ using the full sample. Predict $\lambda_i$ using $\hat\lambda_i = \lambda(x_{2i}'\hat\gamma)$.
Stage 2: Regress $y_{1i}$ on $x_{1i}$ and $\hat\lambda_i$ using the selected sample. This produces consistent estimates of $\beta_1$ and $\theta$.
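A sketch of the two steps on simulated data (not from the slides; the design, parameter values, and the use of statsmodels' Probit and OLS are my own illustrative choices):

```python
# Two-step sketch on simulated data: probit first stage on the full sample,
# OLS with the estimated inverse Mills ratio on the selected sample.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 20_000
x1 = rng.normal(size=n)
x2 = np.column_stack([x1, rng.normal(size=n)])  # x2 has an extra regressor (exclusion restriction)

beta1 = np.array([1.0, 0.5])                    # regression equation: intercept and slope
beta2 = np.array([0.5, -0.3, 1.0])              # selection equation coefficients
u2 = rng.normal(size=n)                         # sigma2 = 1
u1 = 0.7 * u2 + rng.normal(scale=0.5, size=n)   # E(u1 | u2) = delta * u2 with delta = 0.7

y2 = (sm.add_constant(x2) @ beta2 + u2 > 0).astype(int)  # selection indicator
y1 = sm.add_constant(x1) @ beta1 + u1                    # used only where y2 = 1

# Stage 1: probit of y2 on x2 with the full sample; predicted inverse Mills ratio
probit = sm.Probit(y2, sm.add_constant(x2)).fit(disp=0)
index = sm.add_constant(x2) @ probit.params
mills = stats.norm.pdf(index) / stats.norm.cdf(index)

# Stage 2: OLS of y1 on x1 and the estimated Mills ratio, selected sample only
sel = y2 == 1
X = sm.add_constant(np.column_stack([x1[sel], mills[sel]]))
two_step = sm.OLS(y1[sel], X).fit()
print(two_step.params)       # roughly [1.0, 0.5, 0.7]: beta1 and theta = delta * sigma2
print(two_step.tvalues[-1])  # t-statistic on the Mills ratio: a rough check of H0: theta = 0
```

The t-statistic in the last line is only a rough check: the usual OLS standard errors of the second stage are valid only under $H_0\colon \theta = 0$ (see the practical remarks below).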


Practical remarks

The method is consistent and asymptotically normal (it is a method-of-moments estimator); standard asymptotic inference works.
Careful with the asymptotic variance: the second stage is heteroskedastic and requires a correction. See Greene (Ch. 20). (A sketch of the variance follows after this list.)
A test of $H_0\colon \theta = 0$ may be used to check for sample selectivity. Under $H_0$, the regression model with the selected sample is homoskedastic, so the test can be based on a model without taking care of heteroskedasticity.
Classic issue: low power when $x_1$ is very similar to $x_2$.
MLE? Requires bivariate normality. Complicated likelihood, rarely used.
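A sketch of where the heteroskedasticity comes from, under the stronger assumption (beyond what the two-step method needs) that $(u_{1i}, u_{2i})$ are jointly normal, so that $\delta = \rho\sigma_1/\sigma_2$ with $\rho$ the correlation between the errors:

$$\mathrm{Var}(y_{1i} \mid x_{1i}, x_{2i}, y_{2i} = 1) = \sigma_1^2\Big[1 - \rho^2\,\lambda_i\big(\lambda_i + x_{2i}'\beta_2/\sigma_2\big)\Big], \qquad \lambda_i \equiv \lambda(x_{2i}'\beta_2/\sigma_2).$$

The variance changes with $x_{2i}$, hence the heteroskedasticity; under $H_0\colon \rho = 0$ (equivalently $\theta = 0$) it collapses to the constant $\sigma_1^2$, which is why the test can be run without the correction.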

