Causal Inference Missing Problem

Causal Inference IS a Missing Data Problem!
Let y(E) be the value of Y measured at t2 on the unit, given that the unit received the experimental Treatment E initiated at t1
Let y(C) be the value of Y measured at t2 on the unit, given that the unit received the control Treatment C initiated at t1
The difference y(E) - y(C) is the causal effect of the E versus C treatment on Y for that trial, that is, for that particular unit and the times t1, t2
Reference: Rubin, 1974, JEP, p. 689
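A minimal sketch of this missing-data view, using invented numbers: every unit has both potential outcomes, but the treatment actually received reveals only one of them, so the other is missing (NaN). The variable names `y_E`, `y_C`, and `treated` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Invented "science" table: both potential outcomes for four units.
# In any real study these two columns are never observed together.
y_E = np.array([12.0, 9.0, 15.0, 11.0])          # y(E): outcome under the experimental treatment
y_C = np.array([10.0, 9.5, 11.0, 12.0])          # y(C): outcome under the control treatment
treated = np.array([True, False, True, False])   # treatment each unit actually received

# Unit-level causal effects y(E) - y(C): well defined, but never observable
unit_effects = y_E - y_C

# The assignment reveals exactly one potential outcome per unit; the other is missing
y_E_observed = np.where(treated, y_E, np.nan)
y_C_observed = np.where(treated, np.nan, y_C)
y_obs = np.where(treated, y_E, y_C)

print(np.column_stack([y_E_observed, y_C_observed]))
```

Each row of the printed array has exactly one NaN, which is the sense in which every unit-level causal effect involves a missing value.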
The Fundamental Problem Facing Causal Inference
Missing data: either Y(E) or Y(C) is missing
Need potential outcomes notation to reveal this perspective
OLS and its companions lose sight of the scientific question
Without these insights, even R.A. Fisher can get confused
To answer the scientific question, one must address the missing potential outcomes, in general
Job Training Application

Multiply imputed missing potential outcomes, given MLEs of the parameters of a parsimonious model consistent with the observed data and scientific understanding
More appropriately, we wish to draw the parameters from their joint posterior distribution
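The difference between plugging in MLEs and drawing parameters from their posterior can be sketched for a normal linear model of the control potential outcome given one covariate. This is an illustration under stated assumptions (normal errors, the standard noninformative prior p(beta, sigma^2) proportional to 1/sigma^2, simulated data), not the model used in the job training application; `draw_params` and `impute_missing` are hypothetical helpers.

```python
import numpy as np

def draw_params(X, y, rng):
    """One draw of (beta, sigma^2) from the posterior of y ~ N(X beta, sigma^2 I)
    under the noninformative prior p(beta, sigma^2) proportional to 1/sigma^2."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                 # also the MLE of beta
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)
    sigma2 = (n - p) * s2 / rng.chisquare(n - p)                # sigma^2 | y
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)  # beta | sigma^2, y
    return beta, sigma2

def impute_missing(X_obs, y_obs, X_mis, M, rng):
    """M imputations of the missing outcomes. Parameters are redrawn for each
    imputation, so parameter uncertainty carries into their spread. (The MLE
    plug-in alternative would reuse beta_hat and a fixed sigma^2 every time.)"""
    imputations = []
    for _ in range(M):
        beta, sigma2 = draw_params(X_obs, y_obs, rng)
        noise = rng.normal(0.0, np.sqrt(sigma2), size=X_mis.shape[0])
        imputations.append(X_mis @ beta + noise)
    return np.array(imputations)                 # shape (M, number of missing values)

# Simulated illustration: impute Y(C) for treated units from a model fit to controls
rng = np.random.default_rng(0)
x_c = rng.normal(0.0, 1.0, size=50)
x_t = rng.normal(1.0, 1.0, size=30)
X_c = np.column_stack([np.ones_like(x_c), x_c])
X_t = np.column_stack([np.ones_like(x_t), x_t])
y_c = 1.0 + 2.0 * x_c + rng.normal(0.0, 1.0, size=50)
imputed_y_c_for_treated = impute_missing(X_c, y_c, X_t, M=20, rng=rng)
```

Each of the 20 rows is one completed set of missing control outcomes for the 30 treated units; the row-to-row variation reflects both residual noise and posterior uncertainty about the parameters.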
Illustrating the General Approach with Two Examples
Both examples involve complex modeling
The examples differ in the type of model used for multiply imputing the potential outcomes
First example:
Real data with an ignorable assignment mechanism, with an intervention at a point in time in two age groups
This required time-series modeling and multiply imputing the "but-for" world without the intervention
Second example:
Two-group hypothetical observational study with an assumed unconfounded assignment mechanism and many covariates
Simulated evaluations of spline-based multiple imputation, relying on propensity score estimation to define knot locations
CDC's Evaluation of the Impact of a New Pneumococcal Conjugate Vaccine
[Figure: Invasive Pneumococcal Disease for Children <5 Years of Age and Adults >65 Years of Age, U.S., 1997 to 2011; cases per 100,000 population by year, with PCV7 and PCV13 labeled]
Source: www.cdc.gov/abcs
Baseline, pre-PCV13 introduction: 1 July 2004 to 31 March 2010
Evaluation period: 1 July 2010 to 30 June 2012
The Scientific Question

What would have happened in the evaluation period without the introduction of PCV13?
Use the potential outcomes approach to estimate the counterfactual case counts that would have occurred in the absence of the new vaccine (PCV13) introduction.
Source: M. Moore, T. Pondo, T. Taylor, and E.R. Zell
IPD Modelling
Invasive Pneumococcal Disease (IPD) is a seasonal disease, with peaks in winter and nadirs in summer
Strains of pneumococcal bacteria not included in the PCV7 vaccine have become more common
Time-series models should be cyclical/seasonal in structure and contain a longitudinal component
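One simple way to encode "cyclical/seasonal plus longitudinal" is a regression of log monthly counts on a linear trend and an annual sine/cosine pair, fit to the baseline period and extrapolated as the but-for path. The slides do not specify the model actually used; this harmonic-regression form, the monthly resolution, the coefficient values, and the helper names are all assumptions. A full analysis would also multiply impute, propagating parameter and count uncertainty, rather than return a single projected path.

```python
import numpy as np

def design(months, period=12):
    """Design matrix: intercept, linear trend, and one annual sine/cosine pair."""
    t = np.asarray(months, dtype=float)
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])

def fit_and_project(baseline_counts, n_ahead):
    """Least-squares fit of log counts over the baseline months, then the fitted
    seasonal-plus-trend curve extended n_ahead months past the baseline."""
    n = len(baseline_counts)
    X = design(np.arange(n))
    coef, *_ = np.linalg.lstsq(X, np.log(baseline_counts), rcond=None)
    projected = np.exp(design(np.arange(n, n + n_ahead)) @ coef)
    return coef, projected

# Synthetic example: month 0 = July 2004, so January falls at months 6, 18, ...
# A negative cosine coefficient puts the seasonal peak there (values are invented).
true_coef = np.array([3.0, -0.005, 0.0, -0.4])
baseline_months = np.arange(69)                # July 2004 through March 2010
counts = np.exp(design(baseline_months) @ true_coef)

# Project 27 months: April-June 2010 (the gap) plus the 24-month evaluation period
coef, projected = fit_and_project(counts, 27)
but_for_evaluation = projected[3:]             # July 2010 through June 2012
```

On noiseless synthetic data the fit recovers the generating coefficients, so the projection is exactly the seasonal curve continued forward; with real counts, a count model (for example Poisson or negative binomial) would be a more natural choice.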
Time-Series Model Fit to Number of Cases of PCV5 IPD in Baseline Period, Children < 5 Yrs
Previous Display Extended to Evaluation Period
[Figure: All Invasive Pneumococcal Disease, U.S., 7/04-6/05 through 7/11-6/12; cases per 100,000 population for >65 yrs and <5 yrs, with PCV13 labeled]
Source: Active Bacterial Core surveillance, based on direct standardized
[Figure: PCV5 Invasive Pneumococcal Disease, U.S., 7/04-6/05 through 7/11-6/12; cases per 100,000 population for >65 yrs and <5 yrs, with PCV13 labeled]
Source: Active Bacterial Core surveillance, based on direct standardized
Prevention of PCV5 Serotype IPD, U.S.

July 2010 - June 2011
Group               Cases               Incidence*         Percent Decrease
Children <5 years   1850 (1410, 2240)   9.2 (7.0, 11.1)    65% (58%, 69%)
Adults >65 years    1370 (710, 2010)    3.4 (1.8, 5.0)     24% (14%, 31%)

July 2011 - June 2012
Group               Cases               Incidence*         Percent Decrease
Children <5 years   2690 (2200, 3160)   13.4 (1.8, 5.0)    87% (85%, 89%)
Adults >65 years    2870 (2080, 3650)   6.9 (5.0, 8.8)     47% (39%, 53%)

*Incidence is cases per 100,000 population
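Given M imputed counterfactual (but-for) case totals for one group and period, plus the observed total, table entries of this kind could be summarized as below. This sketch assumes, since the slides do not say, that "Cases" means cases prevented (counterfactual minus observed), that percent decrease is relative to the counterfactual, and that the intervals are the 2.5th and 97.5th percentiles across imputations; `prevention_summary` is a hypothetical helper and the inputs are invented.

```python
import numpy as np

def prevention_summary(counterfactual_draws, observed):
    """Mean and 2.5/97.5 percentiles of cases prevented and of percent decrease,
    computed across M imputed counterfactual totals for one period and group."""
    cf = np.asarray(counterfactual_draws, dtype=float)
    prevented = cf - observed                   # imputed but-for cases minus observed cases
    pct_decrease = 100.0 * prevented / cf       # relative to the but-for world

    def summarize(v):
        lo, hi = np.percentile(v, [2.5, 97.5])
        return float(np.mean(v)), float(lo), float(hi)

    return summarize(prevented), summarize(pct_decrease)

# Invented inputs: 200 imputed counterfactual totals and one observed total
rng = np.random.default_rng(2)
cf_draws = rng.normal(3000.0, 200.0, size=200)
cases, pct = prevention_summary(cf_draws, observed=1000.0)
```

Dividing the prevented counts by a population size and multiplying by 100,000 would give the incidence column in the same way.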
An Outcome-Free Procedure for Interval Estimation of Causal Effects
Roee Gutman
Donald B. Rubin
Controlling by OLS
$X_t \sim N(2.3, 1.1)$
$X_c \sim N(1.3, 0.7)$
$Y(1) = \exp(X - 1.5) + 2$
$Y(0) = \exp(X - 1.5) + 2$
$E\big(E(Y(1) \mid X) - E(Y(0) \mid X)\big) = E\big(E(e^{X-1.5} + 2 \mid X) - E(e^{X-1.5} + 2 \mid X)\big) = 0$
(0.3)
Note: Similar to Cochran and Rubin, 1973
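The null simulation on this slide can be sketched in a few lines: both potential outcomes are the same convex function of X, so the true effect is zero, yet OLS on a treatment indicator and a linear term in X generally returns a nonzero treatment coefficient because the two groups' X distributions differ. The second argument of N(., .) is read here as a standard deviation, and the sample size and seed are arbitrary; these are assumptions, and `ols_effect` is a hypothetical helper.

```python
import numpy as np

def ols_effect(y, treated, x):
    """Coefficient on the treatment indicator in the OLS fit y ~ 1 + treated + x."""
    X = np.column_stack([np.ones_like(x), treated.astype(float), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
n = 500_000                                  # units per group (arbitrary)
x_t = rng.normal(2.3, 1.1, size=n)           # assumption: 1.1 and 0.7 are standard deviations
x_c = rng.normal(1.3, 0.7, size=n)
x = np.concatenate([x_t, x_c])
treated = np.concatenate([np.ones(n, dtype=bool), np.zeros(n, dtype=bool)])

# Null case: Y(1) = Y(0) = exp(X - 1.5) + 2, so every unit-level effect is exactly 0
y = np.exp(x - 1.5) + 2
tau_hat = ols_effect(y, treated, x)

# Control check: with a response that really is linear in X, OLS recovers the null
y_lin = 2 + 3 * x
tau_lin = ols_effect(y_lin, treated, x)

print(f"nonlinear response: {tau_hat:.3f}   linear response: {tau_lin:.2e}")
```

The contrast between the two fits isolates the cause: the part of the curvature that a single linear slope cannot absorb is attributed to the treatment indicator.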
Performance of OLS Under the Null
Coverage Rate for 95% Interval
B \ σ²
0.5
0.25
0.004
0.88
0.23
0.5
0.014
0.73
0.42
0.091
0.42
0.36
$X_t \sim N(B, \sigma^2)$
$X_c \sim N(0, 1)$
Controlling Using OLS with Non-linear Terms
$X_t \sim N(2.3, 1.1)$
$X_c \sim N(1.3, 0.7)$
$Y(1) = \exp(X - 1.5) + 2$
$Y(0) = \exp(X - 1.5) + 2$
$E\big(E(Y(1) \mid X) - E(Y(0) \mid X)\big) = E\big(E(e^{X-1.5} + 2 \mid X) - E(e^{X-1.5} + 2 \mid X)\big) = 0$
(0.06)
Performance of OLS With Added Terms Under the Null
B \ σ²
0.5
0.25
0.81
0.81
0.47
0.5
0.84
0.68
0.40
0.75
0.49
0.36
$X_t \sim N(B, \sigma^2)$
$X_c \sim N(0, 1)$
Sub-classification
$X_t \sim N(2.3, 1.1)$
$X_c \sim N(1.3, 0.7)$
$Y(1) = \exp(X - 1.5) + 2$
$Y(0) = \exp(X - 1.5) + 2$
$E\big(E(Y(1) \mid X) - E(Y(0) \mid X)\big) = E\big(E(e^{X-1.5} + 2 \mid X) - E(e^{X-1.5} + 2 \mid X)\big) = 0$
Subclass      1        2        3       4       5       Total
Difference   -0.005   -0.027    0.04    0.24    5.33    1.12
SD            0.02     0.02     0.03    0.13    0.96    0.2
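A minimal subclassification estimator, sketched under assumptions: units are split at quantiles of a score (the single covariate X in this slide's setup, or an estimated propensity score with many covariates), treated-minus-control mean differences are taken within each subclass, and these are combined with weights proportional to subclass size. With five equal-sized subclasses this reduces to the simple average of the five differences, and the combined SD to the square root of the summed squared SDs divided by 5, which is consistent with the Total entries above. `subclass_estimate` is a hypothetical helper.

```python
import numpy as np

def subclass_estimate(y, treated, score, n_sub=5):
    """Split units at quantiles of `score`, compute the treated-minus-control
    mean difference and its standard error within each subclass, then combine
    with weights proportional to subclass size. Assumes every subclass has at
    least two treated and two control units."""
    edges = np.quantile(score, np.linspace(0, 1, n_sub + 1))
    labels = np.digitize(score, edges[1:-1])     # subclass labels 0 .. n_sub - 1
    diffs, ses, sizes = [], [], []
    for k in range(n_sub):
        in_k = labels == k
        yt, yc = y[in_k & treated], y[in_k & ~treated]
        diffs.append(yt.mean() - yc.mean())
        ses.append(np.sqrt(yt.var(ddof=1) / len(yt) + yc.var(ddof=1) / len(yc)))
        sizes.append(in_k.sum())
    diffs, ses = np.array(diffs), np.array(ses)
    w = np.array(sizes) / np.sum(sizes)
    total = float(w @ diffs)
    total_se = float(np.sqrt(np.sum((w * ses) ** 2)))   # subclasses treated as independent
    return diffs, ses, total, total_se

# Invented example: pairs of units share a baseline, and treatment adds exactly 2.0
x = np.arange(100, dtype=float)
treated = x % 2 == 0
y = 2.0 * treated + x // 2
diffs, ses, total, total_se = subclass_estimate(y, treated, score=x)
```

In this constructed example treated and control units are balanced within every subclass, so each subclass difference equals the true effect; the slide's simulation shows what happens when they are not, as in the extreme fifth subclass.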
Sub-classification with Regression Adjustment
$X_t \sim N(2.3, 1.1)$
$X_c \sim N(1.3, 0.7)$
$Y(1) = \exp(X - 1.5) + 2$
$Y(0) = \exp(X - 1.5) + 2$
$E\big(E(Y(1) \mid X) - E(Y(0) \mid X)\big) = E\big(E(e^{X-1.5} + 2 \mid X) - E(e^{X-1.5} + 2 \mid X)\big) = 0$
Subclass      1        2        3        4       5       Total
Difference    0.004    0.002   -0.003   -0.02   -1.9    -0.38
SD            0.005    0.0012   0.002    0.01    1.5     0.31
Performance of Sub-classification with Regression Adjustment
B \ σ²
0.5
0.25
0.62
0.92
0.85
0.5
0.49
0.90
0.91
0.42
0.69
0.91
$X_t \sim N(B, \sigma^2)$
$X_c \sim N(0, 1)$
Proposed Procedure with Multiple Covariates
Estimate the propensity score and create five (or more) subclasses.
Discard units that do not overlap.
Estimate Yc = fc(X) and Yt = ft(X), using cubic splines with knots at the subclass boundaries along the propensity score, and an additive linear model for the components orthogonal to the propensity score.
Multiply impute the missing potential outcomes M times (using the posterior distribution of the parameters).
Calculate the mean and the standard deviation of the average treatment effect for each of the M completed data sets.
Combine using Rubin's rules for MI.
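The final combining step can be sketched with Rubin's rules: average the M completed-data estimates, and add the between-imputation variance, inflated by (1 + 1/M), to the average within-imputation variance. The function name and the invented inputs are hypothetical; the degrees-of-freedom expression is the classic large-sample formula.

```python
import numpy as np

def rubins_rules(estimates, std_errors):
    """Combine M completed-data estimates and their standard errors (Rubin's rules).
    Returns the combined estimate, its standard error, and t degrees of freedom."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(std_errors, dtype=float) ** 2
    M = len(q)
    q_bar = q.mean()                    # combined point estimate
    W = u.mean()                        # average within-imputation variance
    B = q.var(ddof=1)                   # between-imputation variance
    T = W + (1 + 1 / M) * B             # total variance
    if B == 0:
        df = np.inf                     # no between-imputation spread: normal reference
    else:
        r = (1 + 1 / M) * B / W         # relative increase in variance due to missingness
        df = (M - 1) * (1 + 1 / r) ** 2
    return float(q_bar), float(np.sqrt(T)), float(df)

# Invented completed-data results from M = 3 imputations
est, se, df = rubins_rules([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

A 95% interval is then est ± t·se, where t is the 0.975 quantile of Student's t with df degrees of freedom (for example `scipy.stats.t.ppf(0.975, df)`).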
Overall 95% Average Coverage
Why Does It Fail when it does?
Conclusions
To obtain a valid estimate of the treatment effect, there must be balance on the covariates.
When there is not enough overlap between the distributions of the covariates in the treatment and control populations, none of the methods will work.
Among the methods compared, MITSS is the only generally valid method whose coverage intervals are not excessively wide.
MITSS also allows one to define and estimate any function of interest as the treatment effect.
MITSS yields finite-sample estimates.
The results hold for continuous as well as binary outcomes.
Results (null effect)

Monotone Response Surfaces
Non-monotone Response Surfaces
Results (treatment effect exists)

Monotone Response Surfaces
Non-monotone Response Surfaces
