
Sample Selection

Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2012

April 23, 2012


Preliminaries 1) Truncated normal distribution


$X \sim f(x)$; $X \mid X < a$ is $X$ truncated at $a$. Then

$$f(x \mid X < a) = \frac{f(x)}{\Pr(X < a)}$$

If $X \sim N(\mu, \sigma^2)$, and recalling that

$$\Pr(X < a) = \Pr\big((X-\mu)/\sigma < (a-\mu)/\sigma\big) = \Pr(z < \alpha) = \Phi(\alpha),$$

with $\alpha \equiv (a-\mu)/\sigma$ and $z \equiv (x-\mu)/\sigma$,

$$f(x \mid X < a) = \frac{\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]}{\Phi(\alpha)} = \frac{(1/\sigma)\,\phi(z)}{\Phi(\alpha)}$$

Result (no proof): if $X \sim N(\mu, \sigma^2)$, then

$$E(X \mid X < a) = \mu - \sigma\,\frac{\phi(\alpha)}{\Phi(\alpha)}$$

Truncated to the right: the expected value moves to the left (in general).
How much? Depends on $\alpha$ and $\sigma^2$.
$\lambda(z) \equiv \phi(z)/\Phi(z)$ is known as the inverse Mills ratio.
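A quick numerical check of this result (not part of the original slides): the snippet below compares the closed-form truncated mean with the one computed by scipy's truncnorm; the values of $\mu$, $\sigma$ and $a$ are arbitrary illustrative choices.

```python
# Sketch: numerically confirm E(X | X < a) = mu - sigma * phi(alpha) / Phi(alpha)
# for arbitrary illustrative values of mu, sigma and a.
import numpy as np
from scipy import stats

mu, sigma, a = 2.0, 1.5, 3.0
alpha = (a - mu) / sigma

closed_form = mu - sigma * stats.norm.pdf(alpha) / stats.norm.cdf(alpha)
truncated = stats.truncnorm(a=-np.inf, b=alpha, loc=mu, scale=sigma)  # X | X < a
print(closed_form, truncated.mean())  # both should print the same number
```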


Preliminaries 2) Latent variables and probits


Recall the probit model:
$$\Pr(y = 1 \mid x) = \Phi(x'\gamma)$$
$\gamma$ can be consistently estimated by MLE based on a random sample $(y_i, x_i)$, $i = 1, \ldots, n$.
Consider the regression model
$$y^* = x'\beta + u, \qquad u \sim N(0, \sigma^2)$$
$y^*$ is not directly observable (a latent variable); instead, we observe $y = 1[y^* > 0]$.
Which parameters of the regression model can be estimated consistently with this information?

Note that:
$$\begin{aligned}
\Pr(y = 1 \mid x) &= \Pr(y^* > 0 \mid x)\\
&= \Pr(u > -x'\beta \mid x)\\
&= \Pr(u < x'\beta \mid x)\\
&= \Pr(u/\sigma < x'\beta/\sigma \mid x)\\
&= \Phi(x'\beta/\sigma)
\end{aligned}$$
This is a probit model with $\gamma \equiv \beta/\sigma$.
Based on $(y_i, x_i)$, $\gamma$ can be estimated consistently by MLE, even when we cannot estimate $\beta$ and $\sigma^2$ separately.


$\sigma^2$ and $\beta$ are not identified with the sample $(y_i, x_i)$. For example, $\beta = 10$ and $\sigma = 2$ are observationally equivalent to $\beta = 5$ and $\sigma = 1$ (both imply $\beta/\sigma = 5$).
$\gamma \equiv \beta/\sigma$ is identified.
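A quick simulation (not from the slides; statsmodels' Probit is my choice of estimator, and the parameter values are arbitrary) showing that probit MLE on $(y_i, x_i)$ recovers $\gamma = \beta/\sigma$ rather than $\beta$ itself:

```python
# Generate y = 1[x*beta + u > 0] with u ~ N(0, sigma^2) and fit a probit:
# the slope estimate converges to beta/sigma, not beta.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, beta, sigma = 100_000, 1.0, 2.0

x = rng.normal(size=n)
y_star = beta * x + rng.normal(scale=sigma, size=n)  # latent variable, never observed
y = (y_star > 0).astype(int)                         # what we actually observe

probit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(probit.params[1], beta / sigma)  # both approximately 0.5
```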


Sample selectivity

Consider the regression model


$$y_i = x_i'\beta + u_i$$
$s_i$ is a selectivity variable: $s_i = 1$ if observed, $0$ if not.
We can think that there is a super-sample of size $N$ of $(y_i, x_i, s_i)$ and that we observe the sub-sample $(y_i, x_i)$ only when $s_i = 1$.
Example: female labor productivity.


OLS under selectivity

With a random sample $(y_i, x_i)$, consistency relies on
$$E(u_i \mid x_i) = 0,$$
which implies $E(y_i \mid x_i) = x_i'\beta$.
Now we do not have a random sample, but one conditioned on $s_i = 1$. Taking conditional expectations:
$$E(y_i \mid x_i, s_i = 1) = x_i'\beta + E(u_i \mid x_i, s_i = 1)$$
OLS based on the selected sample will be inconsistent unless $E(u_i \mid x_i, s_i = 1) = 0$.


Not every selectivity mechanism makes OLS inconsistent.


If $u$ is independent of $x$, OLS is still consistent (why?).
If $s = g(x)$, OLS is still consistent.
Examples: wages and education. Males with odd SSN. Males with formal education? (A simulation sketch of the two cases follows below.)
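A minimal simulation sketch (not from the slides; the model $y = 1 + 2x + u$ and the cutoffs are arbitrary) contrasting selection through $x$ alone with selection that involves $u$ (truncation on $y$):

```python
# Sketch: selection through x alone leaves the OLS slope consistent,
# while truncation on y (which makes selection depend on u given x) does not.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def ols_slope(xs, ys):
    X = np.column_stack([np.ones_like(xs), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0][1]

keep_x = x > 0    # s = g(x): sample selected on the regressor only
keep_y = y > 1.0  # selection depends on u given x (truncation on y)

print(ols_slope(x, y))                  # full sample: approx. 2
print(ols_slope(x[keep_x], y[keep_x]))  # still approx. 2
print(ols_slope(x[keep_y], y[keep_y]))  # attenuated: clearly below 2
```

Selecting on $x > 0$ leaves the slope near 2; truncating on $y > 1$ pulls it down, since $E(u \mid x, \text{selected})$ then varies with $x$.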


An estimable model under selectivity

Consider the following equations:



$$y_{1i} = x_{1i}'\beta_1 + u_{1i} \qquad \text{(regression)}$$
$$y_{2i}^* = x_{2i}'\beta_2 + u_{2i} \qquad \text{(selectivity)}$$
Let $y_{2i} = 1[y_{2i}^* > 0]$.

Example: $y_{1i}$ = wage; the regression equation determines wages based on a person's characteristics ($x_{1i}$). $y_{2i}^*$ = net utility of work; $x_{2i}$ are observed determinants of utility.


Assumptions:
1. $(y_{2i}, x_{2i})$ observed for everyone.
2. $(y_{1i}, x_{1i})$ observed only if $y_{2i} = 1$ (selected sample).
3. $(u_{1i}, u_{2i})$ are independent of $x_{2i}$ and have zero mean.
4. $u_{2i} \sim N(0, \sigma_2^2)$.
5. $E(u_{1i} \mid u_{2i}) = \delta\, u_{2i}$. Unobservables can be related.


Selectivity bias
Here $s_i \equiv y_{2i}$.
$$\begin{aligned}
E(y_{1i} \mid x_{1i}, y_{2i} = 1) &= x_{1i}'\beta_1 + E(u_{1i} \mid x_{1i}, y_{2i} = 1)\\
&= x_{1i}'\beta_1 + E\big[E(u_{1i} \mid u_{2i}) \mid x_{1i}, y_{2i} = 1\big]\\
&= x_{1i}'\beta_1 + \delta\, E\big(u_{2i} \mid x_{1i}, y_{2i} = 1\big)\\
&= x_{1i}'\beta_1 + \delta\, E\big(u_{2i} \mid x_{1i}, y_{2i}^* > 0\big)\\
&= x_{1i}'\beta_1 + \delta\, E\big(u_{2i} \mid x_{1i}, u_{2i} > -x_{2i}'\beta_2\big)\\
&= x_{1i}'\beta_1 + \delta\,\sigma_2\,\lambda(x_{2i}'\beta_2/\sigma_2)\\
&= x_{1i}'\beta_1 + \delta\,\sigma_2\,\lambda_i \;\neq\; x_{1i}'\beta_1
\end{aligned}$$
with $\lambda_i \equiv \lambda(x_{2i}'\beta_2/\sigma_2)$. OLS with the selected sample is inconsistent.
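A small simulation (not from the slides; all numbers are illustrative) checking the key step $E(u_{1i} \mid u_{2i} > -x_{2i}'\beta_2) = \delta\,\sigma_2\,\lambda(x_{2i}'\beta_2/\sigma_2)$ at a single fixed value of the selection index:

```python
# Sketch with made-up values: verify E(u1 | u2 > -index2) = delta * sigma2 * lambda(index2 / sigma2),
# where index2 plays the role of x2'beta2 and lambda is the inverse Mills ratio.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
delta, sigma2, index2 = 0.8, 1.5, 0.3

u2 = rng.normal(scale=sigma2, size=1_000_000)
u1 = delta * u2 + rng.normal(size=u2.size)   # so that E(u1 | u2) = delta * u2

selected = u2 > -index2                      # selection event: y2* = index2 + u2 > 0
lam = stats.norm.pdf(index2 / sigma2) / stats.norm.cdf(index2 / sigma2)
print(u1[selected].mean(), delta * sigma2 * lam)  # the two numbers should be close
```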


$$E(y_{1i} \mid x_{1i}, y_{2i} = 1) = x_{1i}'\beta_1 + \delta\sigma_2\lambda_i \neq x_{1i}'\beta_1$$

Inconsistency: omission of $\lambda_i$. Heckman (1979): selectivity bias as misspecification.
Inconsistency is due to the correlation between $u_{1i}$ and $u_{2i}$, that is, $\delta \neq 0$.


Heckman's two-step estimator

Define $\tilde u_{1i} \equiv y_{1i} - x_{1i}'\beta_1 - \theta\lambda_i$, with $\theta \equiv \delta\sigma_2$. Solving:
$$y_{1i} = x_{1i}'\beta_1 + \theta\lambda_i + \tilde u_{1i}$$
where, by construction, $E(\tilde u_{1i} \mid x_{1i}, y_{2i} = 1) = 0$.
$x_{1i}$, $\lambda_i$ observable when $y_{2i} = 1$: OLS of $y_{1i}$ on $x_{1i}$ and $\lambda_i$ using the selected sample gives consistent estimates of $\beta_1$ and $\theta$.
Problem: $\lambda_i \equiv \lambda(x_{2i}'\beta_2/\sigma_2)$ is NOT observable; it depends on $\beta_2$ and $\sigma_2$.


Note that $u_{2i} \sim N(0, \sigma_2^2)$, hence:
$$\Pr(y_{2i} = 1) = \Pr(y_{2i}^* > 0) = \Pr(u_{2i}/\sigma_2 < x_{2i}'\beta_2/\sigma_2) = \Phi(x_{2i}'\gamma)$$
$\Pr(y_{2i} = 1)$ is a probit model with unknown coefficients $\gamma \equiv \beta_2/\sigma_2$.
$x_{2i}$ and $y_{2i}$ are observed for everybody: $\gamma$ can be estimated using probit.
Important: we cannot identify $\beta_2$ and $\sigma_2$ separately, but only $\gamma \equiv \beta_2/\sigma_2$.


This suggests the following two-stage method:


Stage 1: Obtain estimates $\hat\gamma$ based on the probit model $\Pr(y_{2i} = 1) = \Phi(x_{2i}'\gamma)$ using the full sample. Predict $\lambda_i$ using $\hat\lambda_i = \lambda(x_{2i}'\hat\gamma)$.
Stage 2: Regress $y_{1i}$ on $x_{1i}$ and $\hat\lambda_i$ using the selected sample. This produces consistent estimates of $\beta_1$ and $\theta$.
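A sketch of the two steps on simulated data (not from the slides; the design, parameter values, and the use of statsmodels' Probit and OLS are my own illustrative choices):

```python
# Two-step sketch on simulated data: probit first stage on the full sample,
# OLS with the estimated inverse Mills ratio on the selected sample.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 20_000
x1 = rng.normal(size=n)
x2 = np.column_stack([x1, rng.normal(size=n)])  # x2 has an extra regressor (exclusion restriction)

beta1 = np.array([1.0, 0.5])                    # regression equation: intercept and slope
beta2 = np.array([0.5, -0.3, 1.0])              # selection equation coefficients
u2 = rng.normal(size=n)                         # sigma2 = 1
u1 = 0.7 * u2 + rng.normal(scale=0.5, size=n)   # E(u1 | u2) = delta * u2 with delta = 0.7

y2 = (sm.add_constant(x2) @ beta2 + u2 > 0).astype(int)  # selection indicator
y1 = sm.add_constant(x1) @ beta1 + u1                    # used only where y2 = 1

# Stage 1: probit of y2 on x2 with the full sample; predicted inverse Mills ratio
probit = sm.Probit(y2, sm.add_constant(x2)).fit(disp=0)
index = sm.add_constant(x2) @ probit.params
mills = stats.norm.pdf(index) / stats.norm.cdf(index)

# Stage 2: OLS of y1 on x1 and the estimated Mills ratio, selected sample only
sel = y2 == 1
X = sm.add_constant(np.column_stack([x1[sel], mills[sel]]))
two_step = sm.OLS(y1[sel], X).fit()
print(two_step.params)       # roughly [1.0, 0.5, 0.7]: beta1 and theta = delta * sigma2
print(two_step.tvalues[-1])  # t-statistic on the Mills ratio: a rough check of H0: theta = 0
```

The t-statistic in the last line is only a rough check: the usual OLS standard errors of the second stage are valid only under $H_0\colon \theta = 0$ (see the practical remarks below).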


Practical remarks

The method is consistent and asymptotically normal (it is a method-of-moments estimator); standard asymptotic inference works.
Careful with the asymptotic variance: the second stage is heteroskedastic and requires a correction. See Greene (Ch. 20). (A sketch of the variance follows after this list.)
A test of $H_0\colon \theta = 0$ may be used to check for sample selectivity. Under $H_0$, the regression model with the selected sample is homoskedastic, so the test can be based on a model without taking care of heteroskedasticity.
Classic issue: low power when $x_1$ is very similar to $x_2$.
MLE? Requires bivariate normality. Complicated likelihood, rarely used.
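A sketch of where the heteroskedasticity comes from, under the stronger assumption (beyond what the two-step method needs) that $(u_{1i}, u_{2i})$ are jointly normal, so that $\delta = \rho\sigma_1/\sigma_2$ with $\rho$ the correlation between the errors:

$$\mathrm{Var}(y_{1i} \mid x_{1i}, x_{2i}, y_{2i} = 1) = \sigma_1^2\Big[1 - \rho^2\,\lambda_i\big(\lambda_i + x_{2i}'\beta_2/\sigma_2\big)\Big], \qquad \lambda_i \equiv \lambda(x_{2i}'\beta_2/\sigma_2).$$

The variance changes with $x_{2i}$, hence the heteroskedasticity; under $H_0\colon \rho = 0$ (equivalently $\theta = 0$) it collapses to the constant $\sigma_1^2$, which is why the test can be run without the correction.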

