Potential Outcomes Framework

Applied Econometrics for Economic Research CIES/INEI
The Potential Outcomes Framework or the Neyman-Rubin-Holland Model
Stanislao Maldonado [1]
University of California, Berkeley
January, 2010
1. Notation
This section draws on Holland (1986), Angrist and Pischke (2009) and Morgan and Winship (2007). The model was
originally proposed by Neyman (1923) and further developed by Rubin (1974). We introduce
the basic terminology here:
• $i$ is an index for individuals in a population.
• $D_i$ is the treatment, i.e. the potential cause whose effect we want to estimate.
o $D_i = 1$ if individual $i$ has been exposed to treatment.
o $D_i = 0$ if individual $i$ has not been exposed to treatment.
• $Y_i(D_i)$ is the outcome, i.e. the effect we want to attribute to the treatment.
o $Y_i(1)$ is the outcome in case of treatment.
o $Y_i(0)$ is the outcome in case of no treatment.
Note that the observed outcome for each individual can be written as follows:
(1) $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$
Or simply:
$Y_i = Y_i(1)$ if $D_i = 1$
$Y_i = Y_i(0)$ if $D_i = 0$
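Equation (1) can be illustrated with a minimal Python sketch. The numbers below are simulated and purely hypothetical (the constant effect of 3 and the names `y0`, `y1`, `d` are assumptions made only for this illustration); the point is that the switching equation and the case-by-case definition give the same observed outcome.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

n = 5
# Hypothetical potential outcomes (in real data we never see both for one unit).
y0 = [random.gauss(10, 2) for _ in range(n)]   # Y_i(0): outcome without treatment
y1 = [v + 3 for v in y0]                       # Y_i(1): assumed constant effect of 3
d = [random.randint(0, 1) for _ in range(n)]   # D_i: treatment indicator

# Equation (1): Y_i = D_i * Y_i(1) + (1 - D_i) * Y_i(0)
y = [di * y1i + (1 - di) * y0i for di, y1i, y0i in zip(d, y1, y0)]

# Case-by-case form: Y_i = Y_i(1) if D_i = 1, and Y_i = Y_i(0) if D_i = 0
y_cases = [y1i if di == 1 else y0i for di, y1i, y0i in zip(d, y1, y0)]

for di, yi in zip(d, y):
    print(f"D_i = {di}, observed Y_i = {yi:.2f}")
```

Both lists coincide, so either form can be used to construct the observed outcome from the potential outcomes.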
2. The fundamental problem of causal inference
Definition 1: Causal Effect

For every individual $i$, the causal effect of $D_i = 1$ is $\Delta_i = Y_i(1) - Y_i(0)$.
Problem: we don’t observe the same unit in both treatment states.
Proposition 1: Fundamental problem of causal inference (Holland 1986)

It is not possible to observe, for the same individual $i$, both $D_i = 1$ and $D_i = 0$, and hence both
$Y_i(1)$ and $Y_i(0)$. Therefore, it is not possible to estimate the effect of $D_i$ on $Y_i$ for each individual $i$.

[1] Ph.D. student, Department of Agricultural and Resource Economics. E-mail: smaldonadoz@berkeley.edu

Table 1
The Fundamental Problem of Causal Inference
Group              Y(1)               Y(0)
Treatment (D=1)    Observable as Y    Counterfactual
Control (D=0)      Counterfactual     Observable as Y
We are required to think in terms of “counterfactuals”, i.e. what would have happened to a
treated individual had he or she not received the treatment, and vice versa.
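The layout of Table 1 can be mimicked in a short Python sketch, treating the counterfactual cell as missing. The four units and their values are made up for illustration, and `observed_row` is a hypothetical helper, not part of any library.

```python
# Each tuple is (D_i, Y_i(1), Y_i(0)) with hypothetical values.
units = [(1, 12.0, 9.0), (0, 15.0, 11.0), (1, 8.0, 8.5), (0, 10.0, 7.0)]

def observed_row(d, y1, y0):
    """Return what the analyst actually sees; the counterfactual cell is None."""
    return {
        "D": d,
        "Y(1)": y1 if d == 1 else None,
        "Y(0)": y0 if d == 0 else None,
        "Y": y1 if d == 1 else y0,
    }

table = [observed_row(*u) for u in units]
for row in table:
    print(row)

# The individual effect Y_i(1) - Y_i(0) needs both cells of a row.
computable = [r for r in table if r["Y(1)"] is not None and r["Y(0)"] is not None]
print("units with a computable individual effect:", len(computable))
```

No row has both cells filled, which is exactly the content of Proposition 1.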
3. Solutions to the fundamental problem of causal inference
Holland (1986) suggests two types of solutions: a) the scientific solution and b) the statistical
solution.
The statistical solution is based on estimating the average effect of the treatment rather than
the effect for each individual.
The first such parameter is known as the average treatment effect (ATE):
(2) $\mathrm{ATE} = E[\Delta_i] = E[Y_i(1) - Y_i(0)]$
$= E[Y_i(1)] - E[Y_i(0)]$
This average effect is still not estimable without further assumptions on the relationship
between the potential outcomes $Y_i(1)$ and $Y_i(0)$ and the treatment $D_i$.
Notice that we can have a conditional version of this parameter:
(2') $\mathrm{ATE}(X) = E[\Delta_i \mid X] = E[Y_i(1) - Y_i(0) \mid X]$
$= E[Y_i(1) \mid X] - E[Y_i(0) \mid X]$
More interesting for economists is the average treatment effect on the treated (ATT):
(3) $\mathrm{ATT} = E[\Delta_i \mid D_i = 1] = E[Y_i(1) - Y_i(0) \mid D_i = 1]$
$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1]$
As in the previous case, we cannot estimate this parameter without further assumptions.
The conditional version:
(3') $\mathrm{ATT}(X) = E[\Delta_i \mid X, D_i = 1] = E[Y_i(1) - Y_i(0) \mid X, D_i = 1]$
$= E[Y_i(1) \mid X, D_i = 1] - E[Y_i(0) \mid X, D_i = 1]$
We can also define a parameter called the average treatment effect for the untreated (ATU):
(4) $\mathrm{ATU} = E[\Delta_i \mid D_i = 0] = E[Y_i(1) - Y_i(0) \mid D_i = 0]$
$= E[Y_i(1) \mid D_i = 0] - E[Y_i(0) \mid D_i = 0]$
The conditional version:
(4') $\mathrm{ATU}(X) = E[\Delta_i \mid X, D_i = 0] = E[Y_i(1) - Y_i(0) \mid X, D_i = 0]$
$= E[Y_i(1) \mid X, D_i = 0] - E[Y_i(0) \mid X, D_i = 0]$
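To see how ATE, ATT and ATU can differ, consider the following Python sketch. It uses a simulated population in which both potential outcomes are known, which is never the case with real data. The data-generating process (normal outcomes, heterogeneous gains, and a selection rule under which individuals with larger gains are more likely to be treated) is an assumption made only for illustration.

```python
import random

random.seed(1)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

n = 20_000
y0 = [random.gauss(0, 1) for _ in range(n)]      # Y_i(0)
gain = [random.gauss(1, 1) for _ in range(n)]    # heterogeneous individual gains
y1 = [a + g for a, g in zip(y0, gain)]           # Y_i(1)
# Assumed selection rule: a larger gain makes treatment more likely.
d = [1 if g + random.gauss(0, 1) > 1 else 0 for g in gain]

delta = [b - a for a, b in zip(y0, y1)]          # Delta_i = Y_i(1) - Y_i(0)
ate = avg(delta)
att = avg(dl for dl, di in zip(delta, d) if di == 1)
atu = avg(dl for dl, di in zip(delta, d) if di == 0)
share_treated = avg(d)

print(f"ATE = {ate:.3f}, ATT = {att:.3f}, ATU = {atu:.3f}")
print(f"share treated = {share_treated:.3f}")
```

Under this selection rule the treated gain more than the untreated, so ATT exceeds ATE, which in turn exceeds ATU. By the law of iterated expectations, ATE is the average of ATT and ATU weighted by the share of treated and untreated units.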
Other parameters of interest in the literature:
• Local average treatment effect (LATE)

• Marginal treatment effect (MTE)
As mentioned before, we can extend these parameters by conditioning on a set of covariates $X$.
4. The selection problem
A simple way to estimate ATT is to use the mean difference in outcomes (MDO), or naïve
estimator:
(5) $\mathrm{MDO} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$
$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$
In general, this provides a biased estimate of ATT. Adding and subtracting $E[Y_i(0) \mid D_i = 1]$:
(6) $\mathrm{MDO} = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$
$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$
$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$
$= \mathrm{ATT} + \underbrace{\{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]\}}_{\text{selection bias}}$
ATT can be consistently estimated using the naïve estimator when there is no selection bias.
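The decomposition in equation (6) can be checked numerically. The sketch below is a simulation under assumed values: a hypothetical unobserved `ability` variable raises both the untreated outcome and the probability of treatment, and the treatment effect is a constant 2. None of these numbers come from real data.

```python
import random

random.seed(2)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

n = 50_000
ability = [random.gauss(0, 1) for _ in range(n)]           # hypothetical confounder
y0 = [a + random.gauss(0, 1) for a in ability]             # Y_i(0) rises with ability
y1 = [v + 2.0 for v in y0]                                 # constant effect, so ATT = 2
d = [1 if a + random.gauss(0, 1) > 0 else 0 for a in ability]  # high ability selects in
y = [b if di == 1 else a for a, b, di in zip(y0, y1, d)]   # observed outcome

mdo = avg(v for v, di in zip(y, d) if di == 1) - avg(v for v, di in zip(y, d) if di == 0)
att = avg(b - a for a, b, di in zip(y0, y1, d) if di == 1)
selection_bias = (avg(a for a, di in zip(y0, d) if di == 1)
                  - avg(a for a, di in zip(y0, d) if di == 0))

print(f"MDO = {mdo:.3f}")
print(f"ATT = {att:.3f}, selection bias = {selection_bias:.3f}")
print(f"ATT + selection bias = {att + selection_bias:.3f}")
```

In this setup the naïve estimator overstates the effect because the treated would have had better outcomes even without treatment.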
Where does selection bias come from?
• Open bias:
(7) $E[Y_i(1) \mid D_i = 1] \neq E[Y_i(1)] \;\wedge\; E[Y_i(1) \mid X, D_i = 1] = E[Y_i(1) \mid X]$
The selection process is based on observables.
• Hidden bias:
(8) $E[Y_i(1) \mid X, D_i = 1] \neq E[Y_i(1) \mid X] \;\wedge\; E[Y_i(1) \mid X, \varepsilon_i, D_i = 1] = E[Y_i(1) \mid X, \varepsilon_i]$
The selection process is based on unobservables.
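Condition (7) can be illustrated with a simulation in which selection depends only on an observed binary covariate. The setup is hypothetical: `x` is the covariate, the treatment probabilities 0.8 and 0.2 are assumed, and $Y_i(1)$ depends on `x` plus noise. We compare the unconditional gap $E[Y_i(1) \mid D_i = 1] - E[Y_i(1)]$ with the same gap inside each cell of `x`.

```python
import random

random.seed(3)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

n = 100_000
x = [random.randint(0, 1) for _ in range(n)]                     # observed covariate
# Assumed selection rule: treatment probability depends on x only.
d = [1 if random.random() < (0.8 if xi == 1 else 0.2) else 0 for xi in x]
y1 = [5 + 3 * xi + random.gauss(0, 1) for xi in x]               # Y_i(1)

# Unconditional comparison: E[Y(1) | D = 1] versus E[Y(1)]
gap_unconditional = avg(v for v, di in zip(y1, d) if di == 1) - avg(y1)

# Conditional comparison: E[Y(1) | X, D = 1] versus E[Y(1) | X]
gap_given_x = {}
for value in (0, 1):
    treated = avg(v for v, xi, di in zip(y1, x, d) if xi == value and di == 1)
    everyone = avg(v for v, xi in zip(y1, x) if xi == value)
    gap_given_x[value] = treated - everyone

print(f"unconditional gap: {gap_unconditional:.3f}")
print({k: round(v, 3) for k, v in gap_given_x.items()})
```

The unconditional gap is clearly different from zero, while the gaps within cells of `x` are close to zero up to sampling noise. With hidden bias the within-cell gaps would not vanish, because the variable driving selection is not available to condition on.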
5. Some conceptual issues
Some issues to keep in mind when working with this model:
5.1. The stable unit treatment value assumption (SUTVA)
This assumption requires that the potential outcomes of an individual be unaffected by potential
changes in the treatment exposures of other individuals (Morgan and Winship 2007, section
2.4).
One way to understand SUTVA: no general equilibrium effects due to the treatment.
5.2. “No causation without manipulation”
Critical issue: understanding causality in this framework depends on our ability to define
the potential outcomes correctly.
Poorly defined treatments are those that cannot, even in principle, be manipulated.
Example:
• She scored highly on the exam because she is female.
• She scored highly on the exam because she studied.
• She scored highly on the exam because her teacher tutored her.
In which case are the potential outcomes correctly defined?
6. The Experimental Ideal
Key idea of this course: how to bring our research strategy as close as possible to a situation that
resembles an experiment in which the treatment is randomly assigned.
Angrist and Pischke (2009): random assignment is the most credible and influential research
design because it solves the “selection problem”.
Recall from Table 1,
$E[Y_i \mid D_i = 1] = E[Y_i(1) \mid D_i = 1]$
$E[Y_i \mid D_i = 0] = E[Y_i(0) \mid D_i = 0]$
The key question is whether:
(9) $E[Y_i(0) \mid D_i = 0] = E[Y_i(0) \mid D_i = 1]$
And also:
(10) $E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0]$
Comments:
• Generally, neither of these conditions holds with observational data because of
selection.
• There is an important case in which these conditions are met: the
randomized experiment.
• In an experimental design, the treatment $D_i$ is randomly assigned. Because of that, the
treatment $D_i$ is independent of (orthogonal to) the potential outcomes $Y_i(1)$ and $Y_i(0)$.
Therefore,
(11) $E[Y_i(0) \mid D_i = 0] = E[Y_i(0) \mid D_i = 1] = E[Y_i(0)]$
(12) $E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0] = E[Y_i(1)]$
Then we can obtain ATE by simply computing:
(13) $\mathrm{ATE} = E[\Delta_i] = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$
$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$
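A small simulation illustrates equation (13). The potential outcomes below are hypothetical (normal draws with heterogeneous gains), and the treatment is assigned by a fair coin flip that is independent of them by construction. Under these assumptions the difference in observed group means should be close to the ATE.

```python
import random

random.seed(4)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

n = 50_000
y0 = [random.gauss(0, 1) for _ in range(n)]          # Y_i(0)
y1 = [a + random.gauss(1, 1) for a in y0]            # Y_i(1), heterogeneous gains
d = [random.randint(0, 1) for _ in range(n)]         # random assignment (coin flip)
y = [b if di == 1 else a for a, b, di in zip(y0, y1, d)]  # observed outcome

ate = avg(b - a for a, b in zip(y0, y1))             # known only inside a simulation
naive = (avg(v for v, di in zip(y, d) if di == 1)
         - avg(v for v, di in zip(y, d) if di == 0))

print(f"ATE in the simulated population = {ate:.3f}")
print(f"difference in observed means    = {naive:.3f}")
```

The two numbers differ only by sampling noise, because randomization makes conditions (9) and (10) hold.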
7. Naïve estimation of treatment effects with observational data
Without experimental data, we need to rely on assumptions. In particular, we need to assume
or argue that our treatment is “as good as randomly assigned”. We write this condition as
follows:
(14) $\{Y_i(1), Y_i(0)\} \perp D_i$
As we will see later, one way to do that is by arguing that the treatment is ignorable after
conditioning on a set of covariates. This is known as selection on observables.
The critical assumption can be written as follows:
(15) $\{Y_i(1), Y_i(0)\} \perp D_i \mid X$
Correspondingly, there are also techniques based on selection on unobservables,
particularly the instrumental variables approach. The assumption is written as:
(16) $\{Y_i(1), Y_i(0)\} \perp D_i \mid X, \varepsilon_i$
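When condition (15) holds with a discrete covariate, one simple way to use it is to compare treated and untreated units within each cell of $X$ and then average the within-cell differences using the population shares of the cells. The Python sketch below is only an illustration under assumed values: a binary `x`, treatment probabilities of 0.8 and 0.2, and an effect of 1 when `x = 0` and 2 when `x = 1`. It is not a general-purpose estimator.

```python
import random

random.seed(6)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

n = 100_000
x = [random.randint(0, 1) for _ in range(n)]
# Selection on observables: treatment probability depends on x only.
d = [1 if random.random() < (0.8 if xi == 1 else 0.2) else 0 for xi in x]
y0 = [2 * xi + random.gauss(0, 1) for xi in x]            # Y_i(0)
y1 = [a + 1 + xi for a, xi in zip(y0, x)]                 # effect is 1 or 2
y = [b if di == 1 else a for a, b, di in zip(y0, y1, d)]  # observed outcome

true_ate = avg(b - a for a, b in zip(y0, y1))
naive = (avg(v for v, di in zip(y, d) if di == 1)
         - avg(v for v, di in zip(y, d) if di == 0))

# Within-cell differences in means, weighted by the share of each cell.
stratified = 0.0
for value in (0, 1):
    share = avg(1 if xi == value else 0 for xi in x)
    treated = avg(v for v, xi, di in zip(y, x, d) if xi == value and di == 1)
    control = avg(v for v, xi, di in zip(y, x, d) if xi == value and di == 0)
    stratified += share * (treated - control)

print(f"true ATE = {true_ate:.3f}, naive = {naive:.3f}, stratified = {stratified:.3f}")
```

The naïve comparison is far from the ATE, while the stratified comparison is close to it. This works here only because selection depends on `x` alone; with selection on unobservables, as in (16), conditioning on `x` would not be enough.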
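Let’s consider the estimation of treatment effects with a random sample of size $N$ from a population,
where $y_i$ and $d_i$ denote the sampled values and $E_N[\cdot]$ denotes a sample mean.
We can then rewrite the naïve estimator of the treatment effect in the following way:
(17) $\hat{\Delta}_{\mathrm{NAIVE}} = E_N[y_i \mid d_i = 1] - E_N[y_i \mid d_i = 0]$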
Assume that an autonomous fixed treatment selection regime prevails and that $\pi$ is the proportion
of the population of interest that takes the treatment.
In observational studies, there is no guarantee that the naïve estimator will converge to
any of the parameters defined earlier.
For instance, we can decompose ATE in the following way:
(18) $E[\Delta_i] = \{\pi E[Y_i(1) \mid D_i = 1] + (1 - \pi) E[Y_i(1) \mid D_i = 0]\}$
$- \{\pi E[Y_i(0) \mid D_i = 1] + (1 - \pi) E[Y_i(0) \mid D_i = 0]\}$
Comments:
• ATE is a function of five unknowns: the proportion of the population self-selected into the
treatment, and four conditional expectations of potential outcomes.
• Without additional assumptions, we can consistently estimate three of these five unknowns
from a random sample of the population.
• In particular, the following sample means converge in probability to the true
population parameters:
$E_N[d_i] \xrightarrow{p} \pi$
$E_N[y_i \mid d_i = 1] \xrightarrow{p} E[Y_i(1) \mid D_i = 1]$
$E_N[y_i \mid d_i = 0] \xrightarrow{p} E[Y_i(0) \mid D_i = 0]$
• Without imposing additional assumptions, there is no way to compute the remaining
two unknowns, $E[Y_i(1) \mid D_i = 0]$ and $E[Y_i(0) \mid D_i = 1]$.
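The convergence of the three estimable quantities can be illustrated by drawing samples of increasing size from an assumed population. In the sketch below, $Y_i(0)$ is standard normal, the effect is a constant 1, and individuals select into treatment when $Y_i(0)$ plus independent standard normal noise is positive. All of these choices are hypothetical. For this particular design the population values are $\pi = 0.5$, $E[Y_i(1) \mid D_i = 1] = 1 + 1/\sqrt{\pi_c}$ and $E[Y_i(0) \mid D_i = 0] = -1/\sqrt{\pi_c}$, where $\pi_c \approx 3.14159$ is the mathematical constant, not the treatment share.

```python
import math
import random

random.seed(5)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

def sample_means(n):
    """Draw n units and return (E_N[d], E_N[y | d = 1], E_N[y | d = 0])."""
    y_treated, y_control = [], []
    for _ in range(n):
        y0 = random.gauss(0, 1)                      # Y_i(0)
        y1 = y0 + 1.0                                # Y_i(1), assumed constant effect
        d = 1 if y0 + random.gauss(0, 1) > 0 else 0  # self-selection on Y_i(0)
        if d == 1:
            y_treated.append(y1)                     # only Y_i(1) is observed
        else:
            y_control.append(y0)                     # only Y_i(0) is observed
    return len(y_treated) / n, avg(y_treated), avg(y_control)

# Population values implied by the assumed design.
truth = (0.5, 1 + 1 / math.sqrt(math.pi), -1 / math.sqrt(math.pi))

results = {n: sample_means(n) for n in (100, 10_000, 200_000)}
for n, estimates in results.items():
    print(n, [round(e, 3) for e in estimates])
print("population values:", [round(t, 3) for t in truth])
```

The estimates approach the population values as the sample grows. Nothing in the observed data, however, pins down $E[Y_i(1) \mid D_i = 0]$ or $E[Y_i(0) \mid D_i = 1]$, regardless of sample size.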
Now, let’s discuss the bias of the naïve estimator as an estimator of ATE. After a bit of algebra,
it can be shown that:
(19) $E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = E[\Delta_i]$
$+ \{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]\}$
$+ (1 - \pi)\{E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0]\}$
This expression shows that the naïve estimator converges to the ATE plus two terms:
• $E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$, which is known as the “baseline bias”; and
• $(1 - \pi)\{E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0]\}$, which is known as the “differential treatment
effect bias”.
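For readers who want the algebra behind equation (19), one possible route is sketched below. It uses only the law of iterated expectations applied to $E[\Delta_i]$ and the step of adding and subtracting $E[Y_i(0) \mid D_i = 1]$; the original derivation may be arranged differently.

```latex
\begin{align*}
% Step 1: iterated expectations, then solve for the effect on the treated
E[\Delta_i] &= \pi\, E[\Delta_i \mid D_i = 1] + (1 - \pi)\, E[\Delta_i \mid D_i = 0] \\
\Rightarrow\quad E[\Delta_i \mid D_i = 1]
  &= E[\Delta_i] + (1 - \pi)\,\{E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0]\} \\[6pt]
% Step 2: add and subtract E[Y_i(0) | D_i = 1] in the naive contrast
E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]
  &= E[\Delta_i \mid D_i = 1] + \{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]\} \\
% Step 3: substitute step 1 into step 2
  &= E[\Delta_i] + \{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]\} \\
  &\quad + (1 - \pi)\,\{E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0]\}
\end{align*}
```

The second term is the baseline bias and the third is the differential treatment effect bias. The third term vanishes when ATT equals ATU, that is, when the average effect is the same for the treated and the untreated.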
It should be clear that in order to get an unbiased and consistent estimate of ATE from a
random sample of a population, we have to rely on assumptions about the counterfactuals.
Consider the following:
(20) A.1: $E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0]$
(21) A.2: $E[Y_i(0) \mid D_i = 1] = E[Y_i(0) \mid D_i = 0]$
By assuming A.1 and A.2, we can compute the remaining unknowns in equation (18). In such a
situation ATE = ATT = ATU. Consider the following cases:
• A.1 is true but A.2 is not:
Under A.1, the naïve estimator converges to $E[Y_i(1) \mid D_i = 0] - E[Y_i(0) \mid D_i = 0]$, which is ATU.
Therefore, the naïve estimator is biased and inconsistent for ATE but unbiased and
consistent for ATU.
• A.2 is true but A.1 is not:
Under A.2, the naïve estimator converges to $E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1]$, which is ATT.
Therefore, the naïve estimator is biased and inconsistent for ATE but unbiased and
consistent for ATT.
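The first case can be checked in a simulation built so that A.1 holds and A.2 fails. The design is hypothetical: $Y_i(1)$ is drawn independently of everything else (so A.1 holds), while a made-up `ability` variable drives both $Y_i(0)$ and selection into treatment (so A.2 fails).

```python
import random

random.seed(7)  # reproducible illustration

def avg(values):
    """Arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

n = 50_000
ability = [random.gauss(0, 1) for _ in range(n)]               # hypothetical confounder
d = [1 if a + random.gauss(0, 1) > 0 else 0 for a in ability]  # selection on ability
y0 = [a + random.gauss(0, 1) for a in ability]   # depends on ability: A.2 fails
y1 = [random.gauss(2, 1) for _ in range(n)]      # independent of d: A.1 holds
y = [b if di == 1 else a for a, b, di in zip(y0, y1, d)]       # observed outcome

naive = (avg(v for v, di in zip(y, d) if di == 1)
         - avg(v for v, di in zip(y, d) if di == 0))
ate = avg(b - a for a, b in zip(y0, y1))
att = avg(b - a for a, b, di in zip(y0, y1, d) if di == 1)
atu = avg(b - a for a, b, di in zip(y0, y1, d) if di == 0)

print(f"naive = {naive:.3f}")
print(f"ATE = {ate:.3f}, ATT = {att:.3f}, ATU = {atu:.3f}")
```

In this design the naïve estimate is close to ATU and clearly away from both ATE and ATT. Swapping the roles of `y0` and `y1` in the construction would give the mirror case, in which the naïve estimate is close to ATT.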
8. Final comments
In most of the cases faced by social scientists, both assumptions are hard to believe when only
non-experimental data are available.
We need to find some source of exogenous variation in the data in order to estimate a
causal relationship.
The “beauty” of modern econometrics lies in finding ways to do exactly this.