Source: American Journal of Political Science, Vol. 55, No. 1 (January 2011), pp. 188-200

Published by: Midwest Political Science Association

Stable URL: http://www.jstor.org/stable/25766262

Accessed: 16/08/2013 11:49

Instrumental Variables Estimation in Political Science: A Readers' Guide

Allison J. Sovey, Yale University

Donald P. Green, Yale University

The use of instrumental variables regression in political science has evolved from an obscure technique to a staple of the political science toolkit. Yet the surge of interest in the instrumental variables method has led to implementation of uneven quality. After providing a brief overview of the method and the assumptions on which it rests, we chart the ways in which these assumptions are invoked in practice in political science. We review more than 100 articles published in the American Journal of Political Science, the American Political Science Review, and World Politics over a 24-year span. We discuss in detail two noteworthy applications of instrumental variables regression, calling attention to the statistical assumptions that each invokes. The concluding section proposes reporting standards and provides a checklist for readers to consider as they evaluate applications of this method.

Political

fects of independent variables that are measured

served determinants of the dependent variable. Recogniz

ing that ordinary least squares regression performs poorly

entists since the 1970s have turned to instrumental vari

problematic independent variable with a proxy variable

that is uncontaminated by error or unobserved factors

Instrumental variables regression is designed to relax some of the rigid assumptions of OLS regression, but IV introduces assumptions of its own. Whether IV is in fact an improvement over OLS depends on the tenability of those assumptions in specific applications (Bartels 1991).

In order to help readers judge the tenability of IV assumptions, researchers must provide pertinent evidence and argumentation. Readers must have access to certain

IV estimator to bias. Readers also need a description of the causal parameter to be estimated and an argument explaining why the proposed instrumental variable satisfies

Allison J. Sovey is a Ph.D. student in Political Science, Yale University,

of this essay is to call attention to the fact that although IV applications in political science have grown more numerous

regression commonly fail to present evidence and arguments that enable readers to understand and evaluate the statistical conclusions.

We begin by providing a brief overview of the assumptions underlying the use of instrumental variables.

assumptions that are made apparent in models of potential outcomes (Angrist, Imbens, and Rubin 1996). We then examine the ways in which these assumptions

metric models

based on a review of more than 100 articles published in the American Political Science Review, the American Journal of Political Science, and World Politics over a 24-year span. We then discuss in detail two noteworthy applications of instrumental variables regression, calling attention to the assumptions that each invokes. The con

presentation of IV estimation results and a checklist for

115 Prospect Street, Rosenkranz Hall, Room 437, New Haven, CT

Rosenkranz Hall, Room 437, New Haven, CT 06520 (donald.green@yale.edu).

We are grateful to ... Butler, Thad Dunning, Andrew Gelman, Holger Kern, and Jan Box-Steffensmeier for helpful comments. We also thank Peter Aronow and Mario Chacon, who assisted us in data collection and provided valuable suggestions.

This project was funded by support from Yale's Institution for Social and Policy Studies. We are responsible for any errors.

©2010, Midwest Political Science Association

DOI: 10.1111/j.1540-5907.2010.00477.x


method.

variable (Z_i) and the covariates.1

$$X_i = \gamma_0 + \gamma_1 Z_i + \delta_1 Q_{1i} + \delta_2 Q_{2i} + \cdots + \delta_K Q_{Ki} + e_i \qquad (2)$$

Variables Estimation

that it appears in equation (2) but not equation (1).2

The instrumental variables estimator in this case may

cated using structural econometric models (Bowden and Turkington 1984; Theil 1971), with more recent textbooks

and the covariates; use the coefficients from this first-stage regression to generate predicted values of X_i; and regress

This estimator presupposes that Z_i is not an exact linear

the logic underlying IV regression; formore detailed sta

undefined. Beyond this simple mechanical requirement,

instrumental variables regression generates consistent es

(Morgan and

has the advantage of calling attention to several assumptions that are often implicit in or ignored by traditional

(2006), and Angrist and Pischke (2008).

and Hill

linear and additive relationship between a dependent vari

The

and an unobserved distur

covariates (Q_1i, Q_2i, ..., Q_Ki),

to observations 1 through N. In this model

$$Y_i = \beta_0 + \beta_1 X_i + \lambda_1 Q_{1i} + \lambda_2 Q_{2i} + \cdots + \lambda_K Q_{Ki} + u_i \qquad (1)$$

The effects of the covariates in the model are of secondary importance. Notice that this model represents the causal effect (β_1) as a constant, implying that each observation

this condition.

A model of this form allows for consistent estimation of β_1 via ordinary least squares (OLS) if plim (1/N) Σ X_i u_i = 0. In other words, as the sample size approaches infinity,

as the covariance between X_i and u_i approaches zero.

The motivation for instrumental variables estimation is that this requirement is violated when X_i is systematically related to unobserved causes of Y_i. Violations of this sort commonly occur when factors related to X_i that predict outcomes are omitted from the regression model or when independent variables

(Wooldridge 2002, chap. 5). One need not believe that Y_i is causing X_i in order to have good reason to use IV. Two-way causation is not the only concern.

The instrumental variables estimator is premised on a two-equation model

regressor

the predicted values of X_i would be collinear with the co

that the covariance between Z_i and u_i goes to zero as N becomes infinite. When critics question the validity of

unrelated to unobserved factors that affect Y_i.
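The two-stage procedure and the validity condition described above can be illustrated with simulated data. The following is a minimal sketch, assuming NumPy; the data-generating process, the coefficient values, and the function names `ols_slope` and `two_stage_least_squares` are illustrative assumptions and do not come from the article. X_i shares an unobserved cause with Y_i, so OLS is inconsistent, while Z_i affects Y_i only through X_i.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from an OLS regression of y on an intercept and x."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def two_stage_least_squares(y, x, z):
    """Just-identified 2SLS: one endogenous regressor, one instrument."""
    n = len(y)
    first_stage = np.column_stack([np.ones(n), z])
    gamma, *_ = np.linalg.lstsq(first_stage, x, rcond=None)  # regress X on Z
    x_hat = first_stage @ gamma                               # predicted values of X
    return ols_slope(y, x_hat)                                # regress Y on predicted X

rng = np.random.default_rng(0)
n = 50_000
true_beta1 = 2.0                      # hypothetical causal effect of X on Y
z = rng.normal(size=n)                # instrument, independent of the disturbance
confounder = rng.normal(size=n)       # unobserved cause of both X and Y
x = 1.0 * z + confounder + rng.normal(size=n)
y = 1.0 + true_beta1 * x + 2.0 * confounder + rng.normal(size=n)

beta_ols = ols_slope(y, x)
beta_iv = two_stage_least_squares(y, x, z)
print(f"OLS slope: {beta_ols:.3f}   2SLS slope: {beta_iv:.3f}   truth: {true_beta1}")
```

Running the two stages by hand reproduces the IV point estimate, but the standard errors from a naive second-stage regression are not the correct IV standard errors, so applied work would normally rely on a dedicated IV routine.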

The validity of an instrument may be challenged on various grounds, depending on the research design. In the context of experimental studies using a so-called encouragement design, subjects may be randomly assigned (Z_i) to receive a treatment (X_i). Well-known examples of this

of patients to get a flu vaccination (Hirano et al. 2000) or randomly assigned attempts by canvassers to mobilize voters on the eve of an election (Gerber and Green 2000). The fact that encouragements are randomly as

causes of Y_i, which makes Z_i a potentially valid instru

influence on the outcome solely through the mediating

1 The inclusion of exogenous covariates is optional in experimental analysis, as random assignment of the instrumental variable ensures that it is statistically independent of the disturbance regardless of whether covariates are included in the model. In observational studies, the inclusion of covariates usually makes more plausible the assumption that the near-random instrumental variable is independent of the disturbance. One concern, however, is that covariates may not be exogenous, in which case including them may bias the IV estimates.

2 In principle, several variables could be used as excluded instrumental variables, in which case two-stage least squares provides more efficient estimates than instrumental variables regression, and the availability of excess instruments allows the researcher to

strumental variables are scarce, and so we focus on the case in which just one excluded instrumental variable enables us to estimate the effect of an endogenous regressor.

variable X_i. In the case of the flu vaccine study, one could imagine a violation of this condition were it the case that encouragement to get a vaccine, rather than the vaccine itself, affected health outcomes. In the case of voter mobilization experiments, this assumption would be violated,

campaign learned of the experimental groups and directed its canvassers to contact the experimenter's control group.3 In such cases, Z_i affects Y_i through some channel other than X_i.

In nonexperimental research, the validity of this assumption is often unclear or controversial. As Dunning (2008, 288) points out, instrumental variables may be classified along a spectrum ranging from "plausibly ran

sibly random are IVs that are determined by forces that

Y_i. Researchers in recent years have generated a remarkable array of these kinds of studies. Duflo and Pande

struction in explaining poverty. Acemoglu, Johnson, and Robinson (2001) use the mortality of colonial settlers to estimate the effect of current institutional arrangements on economic performance. Kern and Hainmueller (2009) use whether an individual lives near Dresden as an instrument

television

struments qualify as "plausibly random" is a matter of opinion, but at least the authors advance reasoned arguments about why such instruments are independent of unobserved factors that affect the dependent variable.

mographic attributes in studies of political attitudes or

These variables are dubbed instruments as a matter of stipulation, often without any accompanying argumentation. Whether such variables are truly unrelated to the unmeasured

doubtful.

Even well-reasoned

modeling uncertainty (Bartels 1991), and this modeling uncertainty should be reflected in the standard errors associated with IV estimates. However, it is difficult to quan

essentially ignore it (Gerber, Green, and Kaplan 2004). Reported standard errors, in other words, presuppose no

of instrumental variables regression to form an opinion


adjust the reported standard errors accordingly.

nations for why the exclusion restrictions are plausible.

sort are frequently absent from political science publications using IV regression. It should be noted that the plausibility of the exclusion restriction hinges on argumenta

one observes political scientists arguing that an instrument is valid because

in an OLS regression of Y_i on X_i, Z_i, and covariates. This misguided regression does not provide reliable information about whether Z_i is excludable.4

and X_i (after partialling out the covariance that each variable

nonzero quantity as N becomes infinite. Unlike the question of whether instrumental variables are valid, which is largely theoretical, the second assumption can be tested in finite samples based on the empirical relationship between Z_i and

If the partial correlation between Z_i and

ments problem can lead to substantial finite sample bias even when there is only a slight correlation between X_i and u_i. Wooldridge (2009, 514) provides a useful heuristic

case where equation (1) excludes covariates (i.e., all λ_k = 0). He notes that in this case the probability limit of the IV estimator may be expressed as

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\rho_{Z_i u_i}}{\rho_{Z_i X_i}} \cdot \frac{\sigma_{u_i}}{\sigma_{X_i}}$$

where ρ_AB denotes the correlation between the variables A and B, and σ_A denotes the standard deviation of A. This formula makes clear that although the correlation between the instrumental variable (Z_i) and the disturbance term (u_i) may be very slight in a given application, the amount of bias may be very large if the correlation between Z_i and X_i is also very small. Fortunately, the problem of weak instruments is relatively easy to diagnose. Stock and Watson

F-test that compares the sum of squared residuals from two nested models: equation (2) versus a restricted regression that excludes the instrumental variable(s). For a single instrumental variable, F statistics under 10 are thought to suggest a problem of weak instruments.5
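The nested-model F-test described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular study: it assumes NumPy, the helper name `first_stage_f` is hypothetical, and the simulated first-stage coefficients (0.5 for a strong instrument, 0.02 for a weak one) are arbitrary choices.

```python
import numpy as np

def first_stage_f(x, z, covariates=None):
    """F statistic for excluding the instrument(s) z from the first stage.

    Compares the sum of squared residuals (SSR) of a restricted regression
    (intercept and covariates only) with the unrestricted regression that
    also includes z, as in equation (2).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    blocks = [np.ones((n, 1))]
    if covariates is not None:
        blocks.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    restricted = np.hstack(blocks)
    unrestricted = np.hstack([restricted, np.asarray(z, dtype=float).reshape(n, -1)])

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)
        resid = x - design @ coef
        return float(resid @ resid)

    q = unrestricted.shape[1] - restricted.shape[1]  # number of excluded instruments
    df = n - unrestricted.shape[1]                   # residual degrees of freedom
    ssr_u = ssr(unrestricted)
    return ((ssr(restricted) - ssr_u) / q) / (ssr_u / df)

rng = np.random.default_rng(1)
n = 1_000
q1 = rng.normal(size=n)                                     # one hypothetical covariate
z_strong = rng.normal(size=n)
z_weak = rng.normal(size=n)
x_strong = 0.5 * z_strong + 0.3 * q1 + rng.normal(size=n)   # instrument moves X a lot
x_weak = 0.02 * z_weak + 0.3 * q1 + rng.normal(size=n)      # instrument barely moves X

f_strong = first_stage_f(x_strong, z_strong, covariates=q1)
f_weak = first_stage_f(x_weak, z_weak, covariates=q1)
print(f"strong instrument F = {f_strong:.1f}; weak instrument F = {f_weak:.1f}")
```

In simulations of this kind the strong instrument yields an F statistic far above the threshold of 10, whereas the weak instrument typically falls well below it.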

3 Violations of this assumption may

to sample attri

point below.

4 This regression will not yield unbiased estimates of Z_i's effects when X_i is endogenous.

5 and Watson suggest using limited information maximum likelihood, which, according to Monte Carlo simulations, is less prone to bias and has more reliable standard errors. Another suggestion

Y_i on Q_ki and Z_i (Chernozhukov and Hansen 2008). A permutation approach to inference in the presence of weak instruments is


equations in which the effect of X_i is assumed to be con

hold in a variety of applications. For example, suppose an interest group randomly assigns voters to receive calls designed to persuade them to vote for a particular candidate. It may be that targeted voters who are easy to reach by phone are more responsive to campaign appeals than voters who are hard to reach. Indeed, the campaign may target a particular group precisely because they are both easy to reach and especially responsive to the message.

local average treatment effect (LATE), that is, the average treatment effect among those who would be contacted if assigned to the treatment group but not contacted if as

effect may be different from the average effect in the entire population of voters.

play when we allow for heterogeneous treatment effects, we apply the potential outcomes framework discussed by Angrist, Imbens, and Rubin (1996) to the application described by Albertson and Lawrence (2009) in their study of the effects of viewing a Fox News Special on voters' support for a ballot proposition on affirmative action. In their experiment, which we discuss in more detail below,

group were encouraged to view the program, and the outcome measure

follow-up survey, subjects reported supporting the ballot measure. For ease of exposition, we assume that assign

We characterize the dependent variable as a pair of potential outcomes for subject i: y_i1 denotes the subject's

y_i0 denotes the subject's response if not exposed to this show. Thus, when classified according to their potential

subjects: those who oppose Proposition 209 regardless of whether they are treated (y_i1 = 0, y_i0 = 0), those who support Proposition 209 if treated and not otherwise (y_i1 = 1, y_i0 = 0), those who oppose Proposition 209 if treated and support it otherwise (y_i1 = 0, y_i0 = 1), and those who support Proposition 209 regardless of whether they are treated (y_i1 = 1, y_i0 = 1). Note that we will assume that a person's response is solely a function of whether he or she personally is treated; assignments or treatments applied to others have no effect. This requirement is known as the Stable Unit Treatment Value Assumption, or SUTVA (Rubin 1978). We further assume what Angrist and Pischke (2008, 153) call the independence assumption: the potential outcomes (y_i0, y_i1) are independent of as

sume that apart from increasing the probability of view

the outcome. This is simply a restatement of the exclusion restriction.

receiving the treatment, we further distinguish among

agement to view the show. Using Angrist, Imbens, and Rubin's (1996) terminology, we call "Compliers" those who view the Fox News Special if and only if they are

special program regardless of whether they are assigned

who do not watch regardless of the experimental group to which they are assigned are called "Never-Takers." Fi

control group are called "Defiers."

nations of y_i and X_i, which is to say 16 possible kinds of subjects. Table 1 describes each of the possible voter types. Each type comprises a share π_j of the total subject population, with Σ_{j=1}^{16} π_j = 1. When we speak of the complier average causal effect (CACE), we refer to the causal effect of viewing the Fox News Special among those who are Compliers. From Table 1, we see that the complier average causal effect is

$$E[\,y_{i1} - y_{i0} \mid i \in \text{Compliers}\,] = \frac{\pi_6 - \pi_7}{\pi_5 + \pi_6 + \pi_7 + \pi_8} \qquad (3)$$

tion of Compliers. Without ample numbers of Compliers, the experimenter faces the equivalent of a weak instru

News Special will be weakly correlated with actual viewing.

observe y_i1 and y_i0 for the same individuals. Instead, one outcome

tual. In order to estimate the complier average causal effect, a researcher may conduct a randomized experiment.

the treatment group (Z_i = 1) or the control group (Z_i = 0). Among those assigned to the treatment group, some watch the Fox News Special (Z_i = 1, X_i = 1) and others do not (Z_i = 1, X_i = 0). Among those assigned to the control group, some watch the Fox News Special (Z_i = 0, X_i = 1) and others do not (Z_i = 0, X_i = 0).

A randomized experiment provides estimates of several potentially useful quantities. We will observe the average outcome among those assigned to the treatment


Table 1

Group                   Watches Fox News    Watches Fox News    Supports Prop. 209    Supports Prop. 209    Share of the
No.     Type            Special If          Special If          If Watches            If Does Not Watch     Population
                        Assigned to         Assigned to         Debate? (y_i1)        Debate? (y_i0)
                        Treatment?          Control?
1       Never-takers    No                  No                  No                    No                    π_1
2       Never-takers    No                  No                  Yes                   No                    π_2
3       Never-takers    No                  No                  No                    Yes                   π_3*
4       Never-takers    No                  No                  Yes                   Yes                   π_4*
5       Compliers       Yes                 No                  No                    No                    π_5
6       Compliers       Yes                 No                  Yes                   No                    π_6
7       Compliers       Yes                 No                  No                    Yes                   π_7*
8       Compliers       Yes                 No                  Yes                   Yes                   π_8*
9       Always-takers   Yes                 Yes                 No                    No                    π_9
10      Always-takers   Yes                 Yes                 Yes                   No                    π_10*
11      Always-takers   Yes                 Yes                 No                    Yes                   π_11
12      Always-takers   Yes                 Yes                 Yes                   Yes                   π_12*
13      Defiers         No                  Yes                 No                    No                    π_13
14      Defiers         No                  Yes                 Yes                   No                    π_14*
15      Defiers         No                  Yes                 No                    Yes                   π_15
16      Defiers         No                  Yes                 Yes                   Yes                   π_16*

*This share of the population supports Proposition 209 if assigned to the control group.

group, the average outcome among those assigned to the control group, and the proportion of each experimental group that is actually treated. As Angrist, Imbens, and Ru

to identify the causal effect without further assumptions. In particular, we assume that the population contains no Defiers (i.e., π_13 = π_14 = π_15 = π_16 = 0). This stipula

Imbens, and Rubin 1996): no one watches the Fox News Special if and only if he or she is assigned to the control

tal design enables the researcher to identify the complier

ment effect.6

The mechanics

a treatment is administered. The researcher observes the rate of Proposition 209 support in the assigned treatment group (Z_i = 1) and in the assigned control group (Z_i = 0).

∞, the observed rate of support in the assigned control

6 Estimation of the CACE or LATE becomes more complicated when one controls for covariates, although

similar when IV is applied to experimental data. For a discussion of local average response functions, see Abadie (2003).

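$$\operatorname*{plim}_{N_c \to \infty} \hat{V}_c = \pi_3 + \pi_4 + \pi_7 + \pi_8 + \pi_{10} + \pi_{12} \qquad (4)$$

Similarly, as the number of observations increases, the support rate in the assigned treatment group converges in probability to

$$\operatorname*{plim}_{N_t \to \infty} \hat{V}_t = \pi_3 + \pi_4 + \pi_6 + \pi_8 + \pi_{10} + \pi_{12} \qquad (5)$$

debate if and only if encouraged to do so (α) is estimated

treatment group who watches minus the proportion of the assigned control group who watches:

$$\operatorname*{plim}_{N \to \infty} \hat{\alpha} = (\pi_5 + \pi_6 + \pi_7 + \pi_8 + \pi_9 + \pi_{10} + \pi_{11} + \pi_{12}) - (\pi_9 + \pi_{10} + \pi_{11} + \pi_{12}) = \pi_5 + \pi_6 + \pi_7 + \pi_8 \qquad (6)$$

Combining equations

$$\operatorname*{plim}_{N \to \infty} \frac{\hat{V}_t - \hat{V}_c}{\hat{\alpha}} = \frac{\pi_6 - \pi_7}{\pi_5 + \pi_6 + \pi_7 + \pi_8} \qquad (7)$$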


causal effect defined above. Another way to summarize this result is to say that assuming (1) the instrument is independent of potential outcomes, (2) the exclusion restriction, (3) a nonzero effect of encouragement on actual treatment, (4) monotonicity, and (5) SUTVA, IV regression provides a consistent estimate of the complier average causal effect. The CACE

effect, or the average effect among those induced to view the TV special by the experimental encouragement (An

It should be stressed that the researcher will not know

Compliers look just like Always-Takers, and in the control group, Compliers look just like Never-Takers. Moreover,

the identities of the Compliers.
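The identification argument in equations (3) through (7) can be checked numerically. The sketch below assumes NumPy and a purely hypothetical set of population shares for the twelve non-Defier types of Table 1; none of these numbers come from the article. It computes the probability limits of the two support rates and of the first stage directly from the shares, then confirms that their ratio equals the complier average causal effect, which here differs from the average effect in the whole population.

```python
import numpy as np

# Types 1-12 of Table 1 (Defiers are assumed absent, i.e., monotonicity holds).
compliance = np.array(["never"] * 4 + ["complier"] * 4 + ["always"] * 4)
y1 = np.array([0, 1, 0, 1] * 3)   # supports Prop. 209 if watches (y_i1)
y0 = np.array([0, 0, 1, 1] * 3)   # supports Prop. 209 if does not watch (y_i0)

# Hypothetical population shares pi_1 ... pi_12 (illustrative only).
share = np.array([0.20, 0.05, 0.05, 0.20,    # Never-takers
                  0.10, 0.15, 0.05, 0.10,    # Compliers
                  0.02, 0.03, 0.02, 0.03])   # Always-takers
assert np.isclose(share.sum(), 1.0)

watch_if_treat = compliance != "never"       # Compliers and Always-takers watch
watch_if_control = compliance == "always"    # only Always-takers watch

# Probability limits of the observed support rates, equations (5) and (4).
v_t = np.sum(share * np.where(watch_if_treat, y1, y0))
v_c = np.sum(share * np.where(watch_if_control, y1, y0))

# Probability limit of the first stage, equation (6): the share of Compliers.
alpha = np.sum(share * (watch_if_treat.astype(int) - watch_if_control.astype(int)))

wald = (v_t - v_c) / alpha                   # equation (7)

is_complier = compliance == "complier"
cace = np.sum(share[is_complier] * (y1 - y0)[is_complier]) / share[is_complier].sum()
ate = np.sum(share * (y1 - y0))              # average effect in the full population

print(f"Wald ratio = {wald:.3f}, CACE = {cace:.3f}, population ATE = {ate:.3f}")
```

With these assumed shares the ratio and the CACE coincide at 0.25, while the population average treatment effect is 0.11, which illustrates why the estimate need not generalize to non-Compliers.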

Complied

of the experimental encouragement. If the encourage

ment

"2sls," "3sls," and "stage least squares."7 A detailed listing

the supplementary appendix. Table 2 presents summary statistics of the 102 articles for which instrumental variables methods were mentioned

The articles were divided into four chronological groups: 1985-90, 1991-96, 1997-2002, and 2003-2008. Each article was further classified according to three criteria: the way in which exclusion restrictions are justified, whether the model is just-identified or overidentified, and whether first-stage results are presented.

Our content analysis classified authors' justifications

egories: "Experiment," "Natural Experiment," "Theory,"

iment" category comprises instrumental variables that

broader point is that even very large experiments may gen

erate different results depending on which sorts of peo

ple are induced to comply with the encouragement. This

whether a researcher or government agency conducted

the randomization.8 In principle, instruments thatwere

heterogeneous treatment

effects are admitted as a possibility, caution must be ex

trition that afflicts the treatment and control groups dif

of instrumental variables estimation, which assume con

stant treatment effects. Once

other settings or populations. Even within a given sam

ercised when

not generalize to non-Compliers.

In political science, the quantity and quality of instrumental variables applications have evolved considerably over time. In this section, we describe trends in the use of instrumental variables in leading political science journals. We analyze articles appearing in the American Political Science Review, the American Journal of Political Science, and World Politics during the period 1985-2008. The APSR and AJPS were chosen because articles in these journals employ instrumental variables methods more often than do articles in other political science journals

cluded World Politics as well to ensure that our sample was representative of literature in international relations

the years 1985-2007 in the AJPS, 1985-2005 in the APSR, and 1985-2003 in World Politics were obtained through

nals were searched directly. In the case of World Politics,

though any given application may suffer from problems

that undermine

ferently.

The next category, "Natural Experiment," includes instruments that were not formed using random assign

turns out that this category includes just one article, as only Lassen (2004) employed a near-random intervention as an instrumental variable. In order to estimate the effect of information on voter turnout, Lassen exploits a Copenhagen referendum on decentralization that was carried out in four of 15 city districts. The districts were created for the purpose of the experiment, and four districts, chosen to be representative of the city, introduced

ment seems "plausibly random" since it was created using near-random assignment.

which authors provide a theoretical explanation for the validity of their exclusion restrictions. In other words, the authors presented some type of reasoned argument for

The

7 Other searches were tried, such as "IV," "endogenous," and "instrument," but these yielded far too many unrelated results to be useful.

8 An example of a government-run experiment, although one that appeared after we completed our content analysis, is Bhavnani's

dates in India.

Table 2

Type of Justification     1985-1990    1991-1996    1997-2002    2003-2008
Experiment                0%           0%           4%           6%
Natural Experiment        0                         0            3
Theory                    7            9            14           31
Lag                       7            9            14           11
Empirics                  0                         3            5
Reference                 0            9            17           5
None                      86           68           48           39
Total Percent             100%         100%         100%         100%
Number of Articles        15           22           29           36
% Just-identified         13           8            24           22
% Report First Stage      7            23           31           33

Table 2 summarizes more than 100 articles published in the American Political Science Review, the American Journal of Political Science, and World Politics over a 24-year span, categorizing them according to the way

Percentages

within

Explanation of Categories:

Experiment: IVs that were generated through a random assignment process.

Natural Experiment: IVs that were generated through a quasi-random assignment process.

Theory: Articles in which the authors provided a theoretical explanation for the validity of their exclusion

Empirics:

selected based on

regressing X on Z to determine the strongest instruments).

Reference: Articles in which the author explains the validity

None: No justification provided.

that falls into this category is Tsai (2007), which uses rural Chinese temple activity before 1949 to instrument

public goods provision. Tsai argues that "because of the nearly complete eradication of community temples and collective temple activities and the radical social upheaval during the Maoist period...

precommunist temple activity has influenced the current

by making the current existence of temple groups more likely by providing a familiar template for newly orga

category contains justifications such as Tsai's; however, the strength of argumentation about the validity of the exclusion restrictions varies widely. For example, many authors used variables such as age, gender, or education as instruments, arguing that these should be unrelated to the error term in their regression equation. Our content analysis took a permissive view of what constitutes a theoretical justification.

The fourth category, "Lag," includes IVs that were

9 A lagged variable is a realization of a variable at a previous point in time. For example, lagged campaign spending is the amount spent by a candidate in a previous election.

test (such

as

regressing

restrictions

Y on X

restrictions.

and Z

by citing another

to show no

correlation

or

author's work.

variable as an instrument. For example, Gerber (1998) presents a model estimating the effect of campaign spend

vote percentage, the endogeneity of campaign spending must be dealt with. He instrumented for campaign spend

arguing that "due to the staggered nature of Senate elec

volve the same incumbent or challenger. The variable is therefore free from the criticism that might be applied to lagged spending by the same candidate, namely, that specific candidate attributes are correlated with both the regression error and past fundraising levels" (Gerber 1998, 405). Again, instrumental variables in this category must

strength of the author's argumentation.

The fifth category, "Empirics," includes IVs that were selected based on the results of an empirical test. For example, researchers regress Y on X and Z to show no cor

the most highly correlated instruments. Such empirical tests do not convincingly demonstrate the validity of the exclusion restrictions. The first regression is biased insofar as X is endogenous (suspicions about endogeneity are presumably what impelled the researcher to turn to IV regression); the second regression says nothing about whether Z is independent of the disturbance term.

the author explains the validity of his or her exclusion restrictions by citing another author's work. For

Applications

Our

which

struments as Gerber (1998) and merely cite Gerber's work rather than providing a full justification for their selection.

where no justification for the exclusion restrictions is provided. Two coders evaluated each article in order to

cations. The first uses random assignment as an instrumental variable and illustrates the special considerations that arise with noncompliance and attrition. The second uses a near-random intervention, change in rainfall, as an instrumental variable and illustrates the special considerations that arise when applying IV to nonexperimental data.

Table 2 displays some encouraging trends. First, it is clear that the percentage of articles that provide some justification for the choice of instruments increased sub

into the "Experiment," "Natural Experiment," "Theory," "Lag," and "Reference" categories. Collectively, the ar

between 1991 and 1996 to 56% in the most recent period.

periments emerged, growing from 0% in early periods to 6% most recently. Another encouraging sign is the rising percentage of just-identified models. Apparently, the realization that valid instruments are hard to find and defend

inating in their choice of instruments.10 These numbers

litical scientists in selection and implementation of instrumental variables methods.

became more transparent over time. The percentage of articles reporting the first-stage relationship between Z

to 33% between 2003 and 2008.

In absolute terms, however, there is still much room for improvement. Almost half of the articles published as late as 2003-2008 offered no argumentation or de

first-stage results, and only a fraction of these assessed statistically whether instruments are weak or whether overi

are signs that scholars are becoming more sophisticated in terms of argumentation and presentation. We now turn

IV. The fact that both sets of authors have made their

be evaluated in depth.

identification

(2009) on alternative

strategies. Some

instrument and ask what param

start with a near-random

Randomized Experiment with Noncompliance

Those who

the laboratory confront the problem of selective exposure: people decide whether to watch a TV program, and there

ers and nonviewers. In an innovative attempt to address the selection problem, Albertson and Lawrence (2009) analyzed an experiment in which survey respondents were randomly encouraged to view a Fox News debate on affirmative action on the eve of the 1996 presidential election. Shortly after the election, these respondents

respondents whether they viewed the Fox News debate

with affirmative action. The authors report that 45.2% of the 259 people who were reinterviewed in the treatment

4.4% of the 248 respondents who were reinterviewed in the control group. The F-statistic implied by this first-stage regression is 142.2, which allays any concerns about weak instruments.
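The reported first-stage F-statistic can be approximately reconstructed from the percentages quoted above. The sketch below assumes NumPy and rests on two assumptions that should be read as such: the viewer counts are obtained by rounding 45.2% of 259 and 4.4% of 248 to whole respondents, and the first stage includes no covariates. It is an arithmetic check, not a reanalysis of the authors' data.

```python
import numpy as np

n_treat, n_control = 259, 248
viewers_treat = round(0.452 * n_treat)        # assumed count: about 117 viewers
viewers_control = round(0.044 * n_control)    # assumed count: about 11 viewers

# Z_i: assignment to encouragement; X_i: reported viewing of the debate.
z = np.concatenate([np.ones(n_treat), np.zeros(n_control)])
x = np.concatenate([
    np.ones(viewers_treat), np.zeros(n_treat - viewers_treat),
    np.ones(viewers_control), np.zeros(n_control - viewers_control),
])

# First stage: regress X on an intercept and Z.
design = np.column_stack([np.ones_like(z), z])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
resid = x - design @ coef

ssr_unrestricted = float(resid @ resid)
ssr_restricted = float(np.sum((x - x.mean()) ** 2))   # intercept-only model
f_stat = (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (len(x) - 2))
first_stage_gap = coef[1]    # difference in viewing rates between the groups

print(f"first-stage difference = {first_stage_gap:.3f}, F = {f_stat:.1f}")
```

Under these assumptions the computed F-statistic lands close to the reported 142.2, and the first-stage coefficient is simply the difference between the two viewing rates.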

Albertson and Lawrence appropriately model the relationship between media exposure and support for Proposition 209 in a manner that does not presuppose that exposure is exogenous. Their two-equation system is

$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (8)$$

$$X_i = \gamma_0 + \gamma_1 Z_i + e_i \qquad (9)$$

the Fox News debate, and Z_i is the random assignment

variable, Albertson and Lawrence's IV regression results

10See Gelman

authors

with a parametertheyhope

etersitmight help identify;

othersstart

to identify and

instrument.

11 The authors also include an array of covariates, but we exclude

show a substantively strong but statistically insignificant relationship between program viewing and support for the proposition. Viewers were 8.1 percentage points more likely to support the ballot measure, with a standard error

Several features of this study are noteworthy from the

the local average treatment effect of viewing the program

tially watch the program only if assigned to receive an

up letter containing $2 and a reminder to watch in the form of a refrigerator magnet. It seems reasonable to

dents' propensity to view the debate, which implies that we can safely assume monotonicity (i.e., no Defiers). It follows that Compliers constitute 45.2% - 4.4% = 40.8% of this sample.

Second, the ignorability restriction stipulates that the

treatment and control groups are identical except for the

may argue that random assignment created groups that,

in expectation, have identical potential outcomes. In ad

letter and accompanying payment had no direct effect

on support for Proposition 209. On the other hand, the

trition from the treatment and control groups. We do not

know whether rates of attrition are similar in the two ex

of attrition are similar. If attrition operates differently in

Proposition 209, the IV estimates may be biased.

To investigate whether attrition presents a problem

for their research design, we use Albertson and Lawrence's

replication data to conduct a randomization check. Their

completed both the pretest and the posttest, and the question is whether attrition introduced noticeable imbalance among pretreatment covariates. A regression of treatment assignment on the demographic variables used in their study does not yield any significant predictors of treatment assignment. (The demographic variables in our regression include Party Identification, Interest in Politics, Watch National News, Read Newspapers, Education, Income, Gender, White, and dummy variables for miss

statistic, F(16, 490) = 1.24, p = .23, is consistent with the

observables.
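A randomization check of this kind can be sketched as follows. This is a minimal pure-Python illustration, not the authors' replication code: it regresses assignment on covariates by ordinary least squares and forms the F-statistic for the joint null that no covariate predicts assignment. The helper names and the toy data are hypothetical, and the p-value (which requires the F distribution) is left out.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the system is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares from OLS of y on rows (an intercept is added)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
               for x, yi in zip(X, y))


def randomization_f(covariates, assignment):
    """F-statistic with (q, n - q - 1) degrees of freedom for q covariates."""
    n, q = len(assignment), len(covariates[0])
    rss_full = rss(covariates, assignment)
    rss_null = rss([[] for _ in assignment], assignment)  # intercept only
    return ((rss_null - rss_full) / q) / (rss_full / (n - q - 1))


# Toy data (hypothetical): two pretreatment covariates and a 0/1 assignment.
covariates = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0], [8, 1]]
assignment = [1, 0, 1, 1, 0, 0, 1, 0]
f_stat = randomization_f(covariates, assignment)
print(f_stat)
```

In this notation, the F(16, 490) statistic reported above corresponds to q = 16 covariates and n - q - 1 = 490 residual degrees of freedom.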

of com

Fox News debate, and the difference between the treatment and control group viewing rates forms the denominator of the IV estimator. A potential concern is that

viewed the program in order to appear to comply with

interviewers' encouragement. This form of measurement error will cause researchers to overstate the proportion of

erage treatment effect. As the authors note, researchers using this encouragement design in the future may wish to insert some specific recall measures to gauge the relia

Application 2: IV Regression and a Natural Experiment

ral experiment that has attracted a great deal of atten

strategy. The authors use variation in rainfall (percentage

for economic growth in order to estimate the impact

of economic conditions on civil conflict. This approach

attempts to overcome the problems of correlation between economic growth and unobserved causes of conflict, which has plagued other observational studies.

The authors focus on the incidence of civil war

fractionalization, mountainous terrain, and population. Country fixed effects and country-specific time trends are also included in most

and Sergenti find a significant positive relationship between rainfall and GDP growth but acknowledge that change in rainfall falls short of passing the weak instruments test proposed by Stock and Watson (2007, 735) in all of the specifications they present.

The second-stage equation estimates the impact of

GDP growth and lagged income growth on the incidence

of violence. Their

12 In other analyses, the authors find that viewing the program did

turnout or attitude polarization, although there is some evidence that viewers felt more informed about the issue.

in

covers 41 countries in 19 years. Current and lagged rainfall growth are used to instrument for per capita economic growth and lagged per capita economic growth, controlling for other country characteristics such as religious

rent and lagged economic growth significantly reduce the likelihood of civil conflict. This basic pattern of results holds up when


instruments problem is fairly minor.

this claim rests. First, consider the estimand. Unless one

is prepared to assume that effects of a one-unit change

in economic growth are the same regardless of how eco

nomic growth comes about, the instrumental variables

estimator may be said to gauge the local average treatment

of Miguel, Satyanath, and Sergenti, Dunning (2008) argues that growth in different economic sectors may have

nate the growth-induced effects of the agricultural sector. Relaxing the assumption of homogeneous treatment effects forces more cautious extrapolations from the results. The results may tell us not about the effects of economic growth but of a particular type of economic growth.

A second assumption is that rainfall is a near-random source of variation in economic growth. In a natural experiment, "it is assumed that some variable or event satisfies the criterion of 'randomness,' the event or variable is orthogonal to the unobservable and unmalleable factors that could affect the outcomes under study" (Rosenzweig and Wolpin 2000, 827). In this case, the exogeneity of

valuable. These possible SUTVA violations can produce

biased estimates, which, importantly, could be biased in

either direction.14

In sum, these two applications illustrate the kinds

mental and nonexperimental analysis. When reading experiments that involve noncompliance, one must consider whether

the outcome

When evaluating experiments

be alert to problems such as attrition, which threaten to undermine the comparability of the treatment and control groups. When reading nonexperimental applications, special critical attention must be paid to the assumption that the instrument is unrelated to the disturbance term.

reflect on whether the instrumental variable may transmit its influence on the outcome through causal pathways not specified by the model. Instrumental variables estimation

prepared to critically evaluate these arguments.

truly random, it should be unpredictable. One can examine whether rainfall's associations with other observable

assignment. Using the replication dataset that Miguel, Satyanath, and Sergenti provide with their article, we find that factors such as population, mountainous terrain, and lagged GDP growth significantly predict rainfall or lagged rainfall growth, although these relationships are not particularly strong and the predictors as a group tend to fall short of joint significance.13 Suppose for the sake of argument that these covariates were found to be systematically related to rainfall growth. Rainfall could still

A Reader's Checklist

Having reviewed the assumptions underlying instrumental variables regression, both in general and with regard to specific applications, we conclude with a checklist (summarized in Table 3) for readers to consider as they evaluate argumentation and evidence.

1) What is the estimand? A basic conceptual question is whether treatment effects are homogenous. Instrumental variables regression identifies the local

fect among Compliers. If homogenous treatment effects are assumed, then the LATE is the same as the average treatment effect for the sample as

model. However, the reason using rainfall as an instrument is intuitively appealing is that we think of rainfall

as patternless. If rainfall growth is systematically related

to other variables, we have to assume that our regression

model includes just the right covariates in order to isolate

the random component of rainfall.

A further estimation concern is that rainfall in one

country may have consequences for the economic growth

in another country, creating potential SUTVA violations.

For example, drought in one country could make an

13 The significance of these statistical relationships varies depending on model specification.

materials for details of the analysis.

for reasons other than the treatment itself.

a whole. When drawing inferences from IV results, the reader should consider the question of whether results for the Compliers in this particular study are generalizable. For example, do rainfall-induced shocks to economic growth have the same effect on ethnic violence as technology-induced

14 Even if there were no spillover effects, countries

graphically proximal are likely to share weather assignments. The fact that rainfall is randomly "assigned" to geographic locations has potentially important consequences for the estimated standard errors. Miguel, Satyanath, and Sergenti

is not the same as clustering by geographic weather patterns and may underestimate the standard errors (Green and Vavreck 2008).


Table 3

What

Model

Issues to Address

Category

is the estimand?

Independence

homogenous or heterogeneous?

Explain why it is plausible to believe that

different instruments or populations

Conduct a randomization check (e.g., an F-test) to look for unexpected correlations

other

variable.

covariates.

predetermined

across

Exclusion Restriction

Instrument Strength

treatment

on

the outcome.

the endogenous independent variable after

are no Defiers, that is, people who take the treatment if and only if they are assigned to the control group.

Value Assumption

(SUTVA)

groups.

paths from the instrumental variable to

Check whether the F-test of the excluded instrumental variable is greater than 10. If not, check whether maximum likelihood estimation

control

instrumental variable has no direct effect

Monotonicity

and

generates

similar

estimates.

explain why the research design rules out

to those in the control group).

given observation is unaffected by

units.

geographical proximity or proximity

fects is best addressed through replication. Do

based on randomization or a near-random pro

ment,

trapolation becomes increasingly plausible when

than one can reasonably expect from a sin

more

guide readers' intuitions about heterogeneous effects by describing how other IV approaches have

2) Is the instrumental variable independent of the potential outcomes? When evaluating the independence assumption, the reader should take note of whether it is justified empirically, procedurally, or theoretically. Empirical justifications that take the form of a statistical test should be read with

direct test of this assumption. One should be es

strument based on preliminary regressions show

consistent with the hypothesis of random assign

is weakly predicted by other covariates. If attrition occurs, the researcher should assess whether the loss of treatment

undermines the

tal variables are proposed on theoretical grounds,

term. For example, McCleary and Barro (2006)

by

to identify the effect of per capita GDP on

which

unmeasured causes of religiosity?

3) Suppose an instrumental variable is deemed exogenous because it is random or near random. Are the exclusion restrictions valid? This assumption implies that the instrument can have no effect on


opinion change is induced when a person is invited to watch the TV special, regardless of whether he or she in fact watches?

are instrumental variables whose incremental contribution to R-squared (over and above the contribution of other covariates) in the first-stage equation is so low that the risk of bias is severe. Although the precise criteria by which to evaluate the weakness of an instrument are subject to debate, the usual rule of thumb is that a single instrumental variable should have an F-statistic of

treatment

to

assignment

(Wooldridge 2003). But even consulting our abbreviated list of evaluative criteria, the

limited dependent variables

lenge: most publications that use instrumental variables

readers need in order to evaluate the statistical claims. If authors could be encouraged to consider the abbreviated checklist presented above, the quality of exposition (and, one hopes, estimation) might improve substantially.

to grow dramatically in years to come, and with good

struments bias. In the case of a single instrumental

of selection bias and unobserved heterogeneity. By providing a checklist for readers to consider as they critically

research design. The reason to read instrumental variables

t-ratio must be greater than 3.16. When instruments fall short of this threshold, researchers are encouraged to check the robustness of their re

(2007).

effect on the treatment? The assumption of monotonicity states that there are no units that receive

inferior to other estimation approaches. On the contrary,

instrumental variables regression is extraordinarily useful

oriented research deserves special attention.

group, ruling out the existence of Defiers. This

iments where the treatment is only available to the treatment group. However, in other experimental and observational research designs, this assumption is more uncertain. For example, in the Miguel, Satyanath, and Sergenti (2004) study, increased

nomic growth; more rain could actually impede

tion of monotonicity would be violated, leading

to potentially biased estimates of the local average

treatment effect.

Violations of the Stable Unit Treatment Value Assumption, or SUTVA, occur when outcomes for one unit depend on whether other units receive

observation is affected by another observation's $Z_i$ or $X_i$. SUTVA violations may lead to biased esti

on the way in which treatment effects spill over across observations.
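The instrument-strength rule of thumb in the checklist above can be illustrated with a short sketch. It relies on the standard fact that, with a single excluded instrument, the first-stage F-statistic equals the square of the instrument's t-ratio, so an F cutoff of 10 implies a t cutoff of about 3.16. The function names are hypothetical, and the cutoff is a convention rather than a formal test.

```python
import math

F_THRESHOLD = 10.0  # rule-of-thumb cutoff for the first-stage F-statistic


def min_abs_t(f_threshold=F_THRESHOLD):
    """With one excluded instrument, F equals t squared, so the t cutoff is sqrt(F)."""
    return math.sqrt(f_threshold)


def is_weak(first_stage_f, f_threshold=F_THRESHOLD):
    """Flag an instrument as weak when its first-stage F does not exceed the cutoff."""
    return first_stage_f <= f_threshold


print(round(min_abs_t(), 2))  # 3.16
print(is_weak(142.2))         # False: the first-stage F reported for Application 1
```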

exhaust the list of concerns, and one could easily ex

These

arising from

References

Abadie, Alberto. 2003. "Semiparametric Instrumental Vari

Journal of Econometrics 113: 231-63.

Acemoglu, Daron, Simon Johnson, and James A. Robinson.

An Empirical Investigation." American Economic Review 91(5): 1369-1401.

its Roll: The Long-Term Effects of Educational Television on Public Knowledge and Attitudes." American Politics Research 37(2): 275-300.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin.

Variables." Journal of the American Statistical Association 91: 444-55.

Harmless Econometrics: An Empiricist's Companion. Prince

University Press.

Experiments to Study Voting Behavior." The Annals of the

American Academy of Political and Social Science 60(1):

169-79.

Barnard, J., C. E. Frangakis, J. L. Hill, and D. B. Rubin. 2003.

Experiments: A Case Study of School Choice Vouchers in New York City." Journal of the American Statistical Association 98(462): 299-324.

Bartels, Larry M. 1991. "Instrumental and 'Quasi-Instrumen

777-800.


They Are Withdrawn? Evidence from a Natural Experiment in India." American Political Science Review 103(1): 23-35.

Variables. Cambridge and New York: Cambridge University Press.

Negative Campaigning in U.S. Senate Elections." American Journal of Political Science 46(1): 47-66.

Chernozhukov, Victor, and Christian Hansen. 2008. "The Re

Instruments." Economics Letters 100(1): 68-71.

Turnout: Evidence from a Natural Experiment." American

ables in Econometrics. Cambridge: Cambridge University

Press.

nal of Economics 122(2): 601-46.

Effects." American Economic Review 80(2): 319-23.

and Limitations of Natural Experiments." Political Research Quarterly 61(2): 282-93.

and Economy." Journal of Economic Perspectives 20(2): 49

Harmless Econometrics: An Empiricist's Companion,' by Joshua D. Angrist and Jorn-Steffen Pischke." The Stata Journal 9(2): 315-20.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

"Economic Shocks and Civil Conflict: An Instrumental

Variables Approach." Journal of Political Economy 112(4):

ing on Senate Election Outcomes Using Instrumental Variables." American Political Science Review 92(2): 401-11.

Gerber, Alan, and Donald Green. 2000.

Field Experiment." American Political Science Review 94(3): 653-63.

Illusion of Learning from Observational Research. In Problems and Methods in the Study of Politics, ed. Ian Shapiro, Rogers Smith, and Tarek Massoud. New York: Cambridge University Press, 251-73.

Green, Donald

2008. "Analysis of Cluster

timation Approaches." Political Analysis 16(2): 138-52.

Hirano, Keisuke, Guido W. Imbens, Donald B. Rubin, and Xiao-Hua Zhou. 2000. "Assessing the Effect of an Influenza Vaccine in an Encouragement Design." Biostatistics 1(1): 69-88.

2005. "Robust, Accurate Confidence Intervals with a Weak Instrument: Quar

Society: Series A (Statistics in Society) 168(1): 109+.

Kern, Holger Lutz, and Jens Hainmueller. 2009. "Opium for the Masses: How Foreign Media Can Stabilize Authoritarian Regimes." Political Analysis 17(4): 377-99.

72.

725-53.

factuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge: Cambridge University Press.

Murray, Michael P. 2006. "Avoiding Invalid Instruments and Coping with Weak Instruments." Journal of Economic Perspectives 20(4): 111-32.

Rosenzweig, Mark R., and Kenneth I. Wolpin. 2000. "Natural 'Natural Experiments' in Economics." Journal of Economic Literature 38(4): 827-74.

The Role of Randomization." The Annals of Statistics 6(1):

34-58.

Stock, James H., and Mark W. Watson. 2007. Introduction to

River, NJ: Prentice Hall.

Theil, Henri. 1971. Principles of Econometrics. New York: Wiley.

Tsai, Lily L. 2007. "Solidary Groups, Informal Accountability, and Local Public Goods Provision in Rural China." American

Wooldridge, Jeffrey. 2009. Introductory Econometrics: A Modern Approach.

OH: South-Western College.

Wooldridge, Jeffrey

tion and Panel Data. Cambridge, MA: MIT Press.

Wooldridge, Jeffrey M. 2003. "Cluster-Sample Methods in Applied Econometrics." American Economic Review 93(2): 133-38.

