
Unpacking Human Rights Treaty Impact:

The Differing Causal Effects of Monitoring Procedures

October 30, 2017

Abstract

When evaluating the impact of international human rights treaties, the existing literature fo-

cuses almost exclusively on treaty ratification. We instead unpack treaty ratification into multiple

monitoring procedures and compare their individual causal effects on human rights outcomes. Us-

ing a graphical model to assist causal identification and employing the targeted learning method-

ology for estimation, we estimate the causal effects of treaty monitoring procedures under the

Convention against Torture and its Optional Protocol, including state reporting, inquiry, individual

communication, and country visit. The results indicate that only the country visit procedure has

a consistent, positive causal impact. Other monitoring procedures do not. The differing causal

effects, we argue, are a function of the intrusiveness of different monitoring procedures. The find-

ings advance our understanding of the effectiveness of international human rights treaties and

demonstrate a novel application of the structural causal inference framework and ensemble ma-

chine learning methods in political science research.

Key words: human rights treaties; causal effects; causal graphs; machine learning

JEL codes: K38, C32, C53


1 Introduction

For many decades the United Nations (UN) human rights treaty system has been a crucial endeavor

to protect human rights across the world. An expansive body of research has examined the impact

of particular human rights treaties on government repression (Hathaway 2002; Landman 2005;

Neumayer 2005; Hafner-Burton and Tsutsui 2007; Simmons 2009; Hill 2010; Lupu 2013a; Clark

2013); yet, few have evaluated the comparative effectiveness of monitoring procedures under these

treaties. Two anecdotal examples highlight the potential impact of a treaty monitoring procedure—

the individual communication mechanism. In 1995, the Human Rights Committee—the treaty body

under the International Covenant on Civil and Political Rights (ICCPR)—in the case of Arhuaco v.

Colombia (Communication No. 612/1995) found the Colombian government responsible for the

torture and arbitrary detention of Jose Vicente and Amando Villafane. Following the Human Rights

Committee’s decision, the Colombian government, under its own Law 288/96, issued a favorable

opinion for compliance and later let the case proceed to national courts (Ulfstein and Keller 2012:

365). Similarly, in 1998, the Committee against Torture—the treaty body under the Convention

against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment (CAT)—issued

its decision in Ristic v. Yugoslavia (Communication No. 113/1998), concluding that the government

had violated its obligations under the CAT and ordering remedial measures. Even though the Com-

mittee’s decision was not legally binding, the country’s Supreme Court endorsed the decision and

ordered reparations for the victims (Ulfstein and Keller 2012: 369–370).

By focusing on treaty monitoring procedures, we hope to yield more insight into the empirical

effectiveness of the UN human rights treaties at the micro level. The motivating idea is to unpack

treaty practice into monitoring procedures and estimate the causal effect of each procedure on

human rights outcomes, thus providing a more granular picture of the UN human rights treaty system.

Overall, there are five types of treaty monitoring procedures, including state reporting, inquiry, state

communication, individual communication, and country visit. We exclude state communication

from our investigation because this procedure has never been used in practice.

State reporting refers to the procedure according to which a state party submits periodic self-

reports on the measures it has taken and the progress made to fulfill its treaty obligations. State

reporting is also the only mandatory procedure under all UN human rights treaties. Participation

in other monitoring procedures remains optional: treaty members may opt out by making a

reservation (the inquiry procedure), must opt in through a unilateral declaration (the state

communication procedure and the individual communication procedure), or must complete a separate

formal ratification (the country visit procedure under the Optional Protocol to the CAT).

Under the state communication procedure, the treaty body—a committee of independent

experts—hears complaints that one participating state may bring against another participating state.

The individual communication procedure under Art. 22 of the CAT, for example, allows the treaty

body to receive and adjudicate complaints brought against a state party by individuals within that

state’s jurisdiction. In the absence of a declaration to opt out of the inquiry procedure, treaty mem-

bers have to answer questions and inquiries that the treaty body may have regarding allegations of

systematic violations. The treaty body may conduct an inquiring visit to investigate allegations of

treaty violations under this monitoring procedure as well.

The only operative country visit procedure in the UN human rights treaty system at this mo-

ment is under the Optional Protocol to the CAT (OPCAT). It permits a separate monitoring body—the

Subcommittee on Prevention of Torture (SPT)—to visit, investigate, report, and even publicize its

findings on the torture practices of a member state. Table 1 summarizes the monitoring procedures

under the CAT (Egan 2011; Keller and Ulfstein 2012; Rodley 2013; Bassiouni and Schabas 2011).

Appendix A provides similar information for all nine UN core human rights treaties as well as a

detailed list of these treaties and their current status of ratification. Figure 1 shows the number of

states that participate in each of the four monitoring procedures under the CAT and the OPCAT from

1984 when the CAT was open for ratification until 2015.

Table 1: Monitoring procedures under the Convention against Torture (CAT)

Procedure       State reporting   Inquiry           State communication   Individual communication   Country visit
Provision       Art. 19           Art. 20           Art. 21               Art. 22                    Optional Protocol
Participation   Mandatory         Opt-out allowed   Opt-in required       Opt-in required            Ratification required

We investigate the causal effects of monitoring procedures under the CAT for several reasons.

First, the CAT is one of the most important international treaties that protect a universal and non-

derogable human right—the right not to be subject to torture. Second, the CAT, together with

the OPCAT, is the only UN human rights treaty currently in effect that employs all five monitoring


Figure 1: Number of states that participate in each of the four monitoring procedures under the CAT from
1984–2015, including the state reporting procedure under Art. 19, the inquiry procedure under Art.
20, the individual communication procedure under Art. 22, and the country visit procedure under the
OPCAT. The CAT was open for ratification in December 1984 and went into effect in 1987. The OPCAT was
open for ratification in December 2002 and entered into force in 2006.

procedures. Especially important among them is the country visit system under the OPCAT, which

was designed to be operated by a separate monitoring body with its own designated authority. As a result,

among all UN human rights treaties, the CAT is the only case where we can comparatively evaluate

the causal effects of all types of monitoring procedures. Finally, it is worth noting that the CAT aims

to address a single type of violation (torture) and protect a highly specific, targeted human right

(the right not to be subject to torture). It is thus different from other human rights treaties that

usually address a wide range of rights such as civil and political rights, women’s rights, or children’s

rights. We can therefore more appropriately compare the causal effects of monitoring procedures

under this treaty since they operate in the same rights domain.

Our causal effect estimation indicates that only the country visit procedure significantly and

consistently reduces torture and improves government respect for physical integrity rights. Other

procedures demonstrate no such impact. In fact, the state reporting procedure tends to have a neg-

ative, though not always significant, impact on rights protection. These differing causal effects, we

argue, are most likely the result of the variation in intrusiveness among the monitoring procedures.

Future research should explore the causal effects of monitoring procedures under other human rights

treaties for a more definitive conclusion as to whether intrusive procedures with stronger external

oversight lead to greater improvements in relevant human rights outcomes. More broadly, a similar

dynamic could potentially be observed with respect to monitoring institutions in other domains

such as weapons inspection and nonproliferation. Methodologically, our research also presents an

innovative application of the structural causal inference framework and a machine learning-based

estimator to a substantive research question in political science. A combination of causal inference

and flexible machine learning has great potential to advance political science research much further.

2 Theoretical Proposition

We develop a proposition to explain why intrusive monitoring procedures may have a greater causal

impact on human rights outcomes. We first make the empirical observation that monitoring proce-

dures vary in their intrusiveness to state sovereignty. This intrusiveness has two implications. The

first one is that participating in an intrusive procedure sends a credible signal to international au-

diences about a state’s intent to protect human rights. Second, an intrusive monitoring procedure

is likely to generate more information about the actual behavior of participating states. By provid-

ing more credible signals ex ante and better information ex post, intrusive procedures improve state

practices by raising both the cost of treaty violations and the probability of getting caught

violating treaty obligations.

Where our proposition differs from, and contributes to, the literature is mainly with respect

to the choice of the key independent variable. Unlike the existing literature that mainly focuses

on treaty ratification, we instead (1) disaggregate treaty ratification into separate state decisions to

participate in different treaty monitoring procedures; (2) classify these procedures by their intru-

siveness; (3) explain the signaling value and the informative power of monitoring procedures as

a function of their intrusiveness; and (4) offer empirical evidence that demonstrates the differing

causal effects of treaty monitoring procedures. A stylized model of the causal process according to

our theoretical proposition is presented in Figure 2.

Intrusiveness of Procedures → Signals and Information → Human Rights Outcome

Figure 2: Causal process from monitoring procedures to human rights outcomes

2.1 Intrusiveness of monitoring procedures

A key mandate of human rights treaty bodies involves monitoring treaty compliance by member

states, using a variety of procedures that can vary in their intrusiveness. One way to classify the

intrusiveness of monitoring procedures is to use the three criteria under the legalization framework,

including obligation, precision, and delegation, all of which are defined “in terms of key characteris-

tics of rules and procedures, not in terms of effects” (Abbott et al. 2000: 402). Obligation concerns

whether a particular rule imposed upon states parties is legally binding. Precision refers to the de-

gree to which obligations are unambiguously and clearly specified. Delegation measures “the extent

to which states and other actors delegate authority to designated third parties [. . . ] to implement

agreements” (Abbott et al. 2000: 415). Hafner-Burton, Mansfield and Pevehouse (2015: 11) previously

applied this three-criterion framework to develop a ten-indicator measure of the sovereignty

costs of a large number of human rights institutions. In the case of monitoring procedures under the

CAT, we similarly use these criteria to develop a more targeted and descriptive ranking based on the

treaty language as well as the actual implementation of each monitoring procedure.

First, obligations range from being explicitly non-binding to merely hortatory to uncondition-

ally binding. In a state reporting system, states voluntarily submit self-reports for review by the

treaty bodies. Many treaty members, however, either substantially delay their initial and periodic

submissions or simply ignore this obligation altogether. According to a 2012 report by the UN High

Commissioner for Human Rights, around 20% of the states parties to the CAT have never submitted

any reports at all (Pillay 2012: 22). When participating in other monitoring procedures, however,

states do not get to choose whether and when they are subject to scrutiny by the treaty monitoring

bodies. Under the inquiry procedure, for example, the treaty bodies can actively initiate inquiries

into allegations of systematic violations by a state party without having to wait for submitted com-

plaints by individuals with proper standing.

Obligation is more demanding under the individual communication procedure. Monitoring

no longer depends on a state’s periodic submission of self-reports. Rather, it is in response to submit-

ted individual complaints. Unlike their reviews and comments on state reports under the reporting

system, adjudication by the treaty bodies under the individual communication procedures also car-

ries a “great weight”—a status that has gained wide consensus among international legal scholars

and treaty bodies and has also been acknowledged and affirmed by the International Court of Justice

(Ulfstein and Keller 2012: 92–94). An indication that participating states take a treaty body’s adju-

dicating decisions seriously is that many of them choose to respond and defend themselves against

allegations. States accused of treaty violations regularly argue in front of the treaty bodies not only

against the merits but also the admissibility of submitted complaints and the standing of alleged

victims.

The country visit procedure that the SPT operates under the OPCAT likely imposes the most

onerous and binding obligations. This procedure takes the form of regular and unannounced visits to

places of detention, broadly defined, within any territories under the jurisdiction of OPCAT members

(Art. 4). According to its First Annual Report (p. 25), for example, the SPT visited not only police

facilities and prisons, but also juvenile and shelter centers, children’s homes, and drug rehabilitation

centers. Its Fifth Annual Report (p. 13) also states that the SPT tried “to increase its activities

in relation to non-traditional places of detention during 2011, including immigration facilities and

medical rehabilitation centres.” As an indication of the unrestricted nature of the SPT’s visits, OPCAT

members are obligated to grant access to the SPT even in case of state emergency (Art. 14). The

SPT is able to request any relevant information and interview in private any persons it believes can

supply relevant information (Art. 14 and Art. 15). States parties usually get notified and consulted

about an upcoming visit, but the purpose is “to facilitate the visit, not to prevent it from occurring”

and, in fact, “no State has objected to a visit proposed by the SPT” (Steinerte, Evans and de Wolf

2011: 98). In addition to the SPT, Articles 17–23 of the OPCAT also require participating states to

establish or designate a national preventive mechanism, preferably in the form of an independent

national human rights institution, as both a liaison and an oversight body that has similar mandate

and authority as the SPT. Finally, states parties are not able to make reservations to any of the

provisions in the OPCAT (Art. 30).

The second criterion is precision. The basic idea is that precise and specific rules make it

easier for monitoring bodies to determine whether alleged violations are factual and accurate and

which reparations are merited. Monitoring procedures that operate according to more precise rules

are therefore more likely to make concrete determinations about state compliance. The individual

communication procedure fares best according to this evaluative criterion. Under this procedure,

the competence to adjudicate submitted complaints as well as the composition and the rules of op-

eration for treaty bodies are highly precise and even deemed comparable to those of international

courts (Ulfstein and Keller 2012: 98). Decisions by the treaty bodies with respect to individual

complaints also have the effect of clarifying the legal content and contributing to more precise in-

terpretation of international rules for national governments and domestic courts. This semi-judicial

role of the treaty bodies under the individual communication procedure, while not producing legally

binding and directly enforceable decisions, could nonetheless prove highly intrusive to the domestic

governance of states parties, especially compared to the state reporting procedure. As a former UN

Special Rapporteur on Torture has observed, the individual communication procedure “is the most

court-like function of the treaty bodies” whereas “[the state reporting system] is a mode of reviewing

compliance with the treaties’ obligations in minimally intrusive manner” (Rodley 2013: 634).

The third criterion concerns delegation. In the context of human rights treaty monitoring

procedures, it refers to the amount of authority that treaty members delegate to the treaty bodies

to monitor compliance and promote the implementation of treaty obligations. One way

to assess the amount of delegated authority across multiple monitoring procedures is to examine

the rules that treaty bodies have developed on their own to carry out their mandates. Delegated

authority is minimal under the state reporting procedure because under that system the treaty bod-

ies are in a passive position to receive and make comments on the self-reports that states parties

care enough to submit. The inquiry procedure under Art. 20 of the CAT grants more authority to

the treaty body, including the ability to initiate inquiries into allegations of systematic violations.

However, Art. 20(5) requires the treaty body to seek cooperation from the state under inquiry “at

all stages of the proceedings,” placing a significant limitation in terms of delegated authority.

Under the individual communication procedure, there is more delegated authority since treaty

bodies could receive and deliver judgments on complaints by alleged victims or those acting on their

behalf. States parties cannot halt or delay the process even if they refuse to participate in arguments

or respond to allegations. Similarly, treaty bodies can also reach a judgment on the admissibility and

merits of submitted complaints with or without the inputs and defense by the accused governments.

The SPT, the treaty body under the OPCAT, arguably has the greatest amount of delegated authority

to implement the country visit procedure. According to the OPCAT, the SPT could randomly select a

member state in which to conduct an investigative visit without having to secure any authorization.

In fact, according to its First Annual Report, the SPT selected its first batch of country visits at

random. Selected states are obligated to grant access to any places within their jurisdiction and the

SPT could later publicize its findings by a simple majority decision in the CAT Committee—the treaty

monitoring body of the CAT.

When aggregating over these three criteria, we can approximately classify monitoring proce-

dures under the CAT by their intrusiveness. The country visit system under the OPCAT ranks first

by the two criteria of obligation and delegation whereas the individual communication procedure

ranks first by the precision criterion. Similar to Hafner-Burton, Mansfield and Pevehouse (2015),

we are assuming that obligation, precision, and delegation contribute roughly equally to the overall

metrics of intrusiveness. As a result, we classify the country visit procedure as more intrusive than

individual communication and inquiry whereas state reporting is the minimally intrusive procedure

of all.
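This equal-weight aggregation can be sketched in a few lines of Python. The numeric scores below are hypothetical illustrations of the ordinal assessments made in the text (a 1–3 scale on each criterion, chosen by us); only the resulting ranking, not the numbers themselves, reflects the argument:

```python
# Hypothetical 1-3 scores on the three legalization criteria; only the
# resulting ordering, not the numbers themselves, is asserted in the text.
CRITERIA = ("obligation", "precision", "delegation")

scores = {
    "state reporting":          {"obligation": 1, "precision": 1, "delegation": 1},
    "inquiry":                  {"obligation": 2, "precision": 2, "delegation": 2},
    "individual communication": {"obligation": 2, "precision": 3, "delegation": 2},
    "country visit":            {"obligation": 3, "precision": 2, "delegation": 3},
}

# Equal weights across obligation, precision, and delegation
totals = {proc: sum(s[c] for c in CRITERIA) for proc, s in scores.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)  # ordered from most to least intrusive
```

Under any scores consistent with the text (country visit highest on obligation and delegation, individual communication highest on precision), the equal-weight total ranks country visit first and state reporting last.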

2.2 Signal about intent and information about compliance

Two logical implications follow from the variation in intrusiveness among treaty monitoring procedures.

First, ratifying an intrusive monitoring procedure will credibly reveal a state’s intent to comply with

treaty obligations (Farber 2002). The logic is straightforward. Intrusive monitoring procedures

impose a high sovereignty cost, defined as constraints on a state’s freedom of action, such that only a

state genuinely committed to compliance could ratify and maintain its ratification status without having to

withdraw under Art. 31 of the CAT or Art. 33 of the OPCAT. According to this logic, by subjecting itself

to the country visit procedure, for instance, a state sends a more credible signal about its intent to comply

than is the case if it merely participates in the state reporting system.1 The reason is that hosting

unannounced and unrestricted visits by international independent experts, and therefore running

the risk of having violations and abuses exposed, is significantly more costly than submitting self-

reports at the discretion of the government. Even if fulfilling the state reporting obligation might not

be entirely costless (Goodman and Jinks 2003; Cole 2009; Hafner-Burton, Mansfield and Pevehouse

2015), the cost of participating in an intrusive procedure could be significantly higher.

The history of the OPCAT is instructive of how concerns about sovereignty costs could delay or
1 Hollyer and Rosendorff (2011) offer a different signaling logic, according to which dictators sign the CAT to signal
their strength rather than their intent to comply. Domestic opposition groups perceive a dictator’s treaty commitment and
subsequent ostentatious treaty violations as credibly signaling her strength. As a result, they are less likely to mount a
challenge, in effect prolonging the survival of the authoritarian regime. Although this argument “has some plausibility
problems on its face” (Simmons 2011: 743), it has not been contested empirically. We would like to note three key differences,
though, between our analysis and that by Hollyer and Rosendorff (2011). First, our target population for inference is
all countries, not just the subset of dictatorships. Second, our causal predictor is participation in monitoring procedures
under the CAT, not just a commitment to the treaty and its mandatory state reporting system. Third, the hypothetical policy
intervention in our analysis is ratification as opposed to signature as operationalized in the analysis by Hollyer and Rosendorff
(2011). The third point is critical because, under international law, signing a human rights treaty does not activate any treaty
monitoring activities. Nor does it create any treaty obligations for the signatories, except for the legally vague obligation “not
to defeat the object and purpose of a treaty prior to its entry into force” under Article 18 of the Vienna Convention on the
Law of Treaties (Jonas and Saunders 2010).

even stymie the adoption of a comparatively more intrusive monitoring procedure. The formulation

of the OPCAT was modeled on a proposal by the Swiss Committee against Torture, which was later

renamed the Association for the Prevention of Torture, during the negotiations for the CAT

in the late 1970s and early 1980s. At that time, this proposal, known as the “Swiss model,” was

considered so intrusive to state sovereignty, especially compared to the “Swedish model” that the

CAT eventually adopted, that it was shelved (Clark 2009; Evans 2011). It took almost 15 years

after the adoption of the CAT for the “Swiss model” to be revived, modified, and adopted as the

OPCAT on December 18, 2002. In other words, the CAT and the OPCAT were originally considered

at the same time, but they were eventually adopted almost two decades apart because they differ

significantly in terms of their intrusiveness. As a result, their ratification sends signals of different

levels of credibility.

The second implication is that intrusive monitoring is likely to generate more information

about state compliance. The SPT’s visit to Paraguay in 2009, for example, produced a 313-paragraph

report on that country’s torture practice alone. This is because, compared to the CAT Committee, the

SPT is under fewer restrictions when monitoring and exposing government abuses. The SPT also has

more clout because of its ability to publish a state’s substandard record. Conversely, governments

with a good record, by Art. 16(2) of the OPCAT, could ask the SPT to publish its positive assessment.

According to the Fourth Annual Report of the SPT, for instance, five visit reports have been published

following requests by Honduras, Maldives, Mexico, Paraguay, and Sweden. This design enhances the

ability of OPCAT participation to effectively separate compliant states from violating countries, thus

reducing the risk of OPCAT ratification being used merely as a cover for human rights abuses. The

individual communication procedure is also likely to generate more compliance information

because individual complaints provide more detailed information about state abuses and the treaty

body’s decisions in response remove ambiguity about state compliance.

In summary, a state’s voluntary decision to subject itself to an intrusive monitoring procedure cred-

ibly signals its intent to comply. That procedure is also likely to generate more information

about a state’s actual compliance. This ex ante signal and ex post information decrease government

repression through a couple of mechanisms. First, ratifying an intrusive procedure that imposes a

significant sovereignty cost will establish a higher baseline expectation by international audiences.

Other countries, inter-governmental organizations, and non-governmental organizations (NGOs)

perceive a strong, clear, and credible signal from the ratifying state. A greater public expectation

results in a higher reputational cost for treaty violations (Brewster 2013). In other words, by raising

the reputational stake, participation in an intrusive monitoring procedure raises the cost of repres-

sion and thereby reduces the incentive to commit treaty violations post-ratification.

Second, intrusive monitoring is also more likely to detect non-compliant behaviors, increasing

the probability that a member state gets caught violating its treaty obligations. This is especially

important in the case of state torture. As Lupu (2013a: 477-481) points out, even independent

domestic courts could encounter enormous difficulties when enforcing international commitments to

protect physical integrity rights. A major reason is that repressive governments have a considerable

capacity to interfere with witnesses, hide the victims, and destroy evidence, thus raising the cost

of producing legally admissible evidence. The informative power of intrusive monitoring could be

especially useful under such difficult circumstances and can provide evidence that is hard to obtain

otherwise.

The unannounced and unrestricted nature of the SPT’s visits also produces the kind of evi-

dential information that could be instrumental for legislative opposition parties (Lupu 2015), social

movements, and mass mobilization (Simmons 2009) to constrain and hold abusive state officials

accountable. The SPT’s Second Annual Report, for example, mentions that the SPT has “carried out

unannounced visits to places of detention [and] had interviews in private with persons deprived of

their liberty” (p. 8). Its Third Annual Report reiterates that “confidential face-to-face interviews

with persons deprived of liberty are the chief means of verifying information and establishing the

risk of torture” (p. 9). The same report also raises concern that many detainees whom the SPT

spoke with may become a target of reprisal afterward (p. 11). The risk that detainees have taken in

talking to the SPT indicates that the kind of information the SPT gathers is unlikely to be available

in voluntary state reports and constructive dialogues that states parties occasionally have with treaty

body experts under the state reporting procedure.

To summarize, in addition to raising the reputational cost of rights abuses, intrusive moni-

toring procedures also increase the probability of detecting violations in participating states by gen-

erating compliance information that is usually not available under less intrusive procedures. This

information feeds into domestic institutions such as the legislature, domestic courts, and social

movements that exert a constraining effect on state behavior. We make no assumption as to which

causal pathway—international reputation or domestic institutions or social movements—is more

effective. They may operate more or less independently or, more likely, there could be a complex

interplay between them. Regardless, it should not prevent us from the task of estimating the causal

effects of treaty monitoring procedures, which is the focus of this paper. If a relationship truly exists

between the intrusive design of treaty monitoring procedures and their causal effects, we expect

more intrusive procedures such as individual communication and especially country visit will be

more causally significant than less intrusive ones, particularly the state reporting procedure. It is

noteworthy that the variation in intrusiveness among monitoring procedures not only exists; it

is by design. It is not a coincidence that more intrusive monitoring procedures are almost always

packaged in an optional protocol or an optional provision of a human rights treaty.

3 Empirical Analysis

This section estimates the causal effects of participating in the state reporting procedure under

Art. 19 of the CAT, the inquiry procedure under Art. 20, the individual communication procedure

under Art. 22, and the country visit procedure under the OPCAT. We use observational data on

192 countries from 1987 when the CAT went into effect until 2015. Adopting the causal inference

framework that Pearl (2000, 2009) develops, we begin by formulating a structural causal model

that formalizes our background knowledge about the data generating process. We then use a causal

graph to represent this structural model, encode the necessary causal assumptions, and establish the

estimability of the causal effects of interest. Establishing estimability, known as causal identification,

is essential because it specifies the conditions for translating an interventional model of causality into

an observational setting where we obtain our data. It permits, assuming identification conditions are

satisfied, the use of estimation methods using observational data to make causal inference. Finally,

we employ the machine learning-based targeted maximum likelihood estimator to compute effect

estimates that are more robust than those obtained via standard parametric statistical models. We

conclude with our interpretation of the effect estimates by mapping them back into our substantive

research question.

3.1 Causal model formulation

To make causal inference in a non-experimental setting, one has to start by assuming that the ob-

served data come from a data generating system that is compatible with a specified structural causal

model. A structural causal model is simply a set of equations that make explicit our notion about

“how nature assigns values to variables of interest” (Pearl, Glymour and Jewell 2016: 27). Our

structural model in Equation set 1 describes the functional relationships between human rights out-

come Y , a set of four treatments A = {A1, A2, A3, A4}, mediators M , and a set of time-invariant

confounders X and time-varying confounders W . The subscript t indicates the time period during

which the observed variables are measured.

X = fX (UX )
Wt = fW (X, Wt−1 , UW )
A1t = fA1 (X, A1t−1 , Mt−1 , Yt−1 , Wt , UA1 )
A2t = fA2 (X, A2t−1 , Mt−1 , Yt−1 , A1t , Wt , UA2 )
A3t = fA3 (X, A3t−1 , Mt−1 , Yt−1 , A1t , Wt , UA3 )          (1)
A4t = fA4 (X, A4t−1 , Mt−1 , Yt−1 , A1t , Wt , UA4 )
Mt = fM (X, Mt−1 , A1t , A2t , A3t , A4t , Wt , UM )
Yt = fY (X, Yt−1 , A1t , A2t , A3t , A4t , Mt , Wt , UY )
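To make the generative reading of Equation set 1 concrete, the following minimal Python sketch simulates a single-treatment analogue of the structural model. The variable names mirror the equations, but the functional forms and coefficients are purely illustrative assumptions, not estimates from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scm(n, periods=2):
    """Draw one sample from a simplified, single-treatment analogue of Equation set 1.
    All coefficients are illustrative; each line mirrors one structural equation."""
    x = rng.normal(size=n)                                  # X = f_X(U_X), time-invariant
    w = rng.normal(size=n)                                  # W_0
    m = np.zeros(n)                                         # M_0
    y = np.zeros(n)                                         # Y_0
    for _ in range(periods):
        w = 0.5 * x + 0.5 * w + rng.normal(size=n)          # W_t = f_W(X, W_{t-1}, U_W)
        # A_t depends on X, M_{t-1}, Y_{t-1}, and W_t, as in f_A1
        logit_a = x + 0.3 * m + 0.3 * y + 0.5 * w
        a = (rng.random(n) < 1 / (1 + np.exp(-logit_a))).astype(float)
        m = 0.4 * m + 0.5 * a + 0.3 * w + rng.normal(size=n)                 # M_t = f_M(...)
        y = 0.4 * y + 0.3 * a + 0.4 * m + 0.3 * w + 0.5 * x + rng.normal(size=n)  # Y_t = f_Y(...)
    return a, m, y

a, m, y = simulate_scm(10_000)
```

Each pass through the loop generates one time period; "nature" assigns the treatment through its own structural equation, which is exactly what an intervention would later disable.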

The treatments, whose causal effects are our target of estimation, include A1 (the state report-

ing procedure), A2 (the inquiry procedure), A3 (the individual communication procedure), and A4

(the country visit procedure). To measure the treatment variables, we use a state’s formal ratifica-

tion of the CAT, the presence or absence of a reservation to Art. 20, declaration of intent to be bound

by Art. 22, and formal ratification of the OPCAT. Since the state reporting procedure is mandatory

under the CAT, a measure of participation in this procedure completely overlaps with ratification

of the CAT. We do not measure how participating states actually engage with the periodic review

process under state reporting system (Creamer and Simmons 2015) or with other procedures. Mea-

surements of actual monitoring activities, while seemingly more intuitive, may miss the signaling

value of formal ratification. Equally important, the lack of data would present an insurmountable

challenge.

Informed by the existing literature about the causal mechanisms through which a human

rights treaty influences state behavior, we specify four potential causal mediators. These mediators

include institutional and legislative constraints (Lupu 2015), domestic judicial enforcement (Powell

and Staton 2009; Conrad 2013), mass mobilization (Simmons 2009), and international socializa-

tion (Keck and Sikkink 1998; Clark 2013; Goodman and Jinks 2013). They are believed to be trans-

mitting and mediating the causal effects of human rights treaties. The mediators are respectively

measured by (a) political constraints index, which measures the feasibility of policy change based on

the veto power and alignment among government branches and degrees of preference heterogene-

ity within the legislative branch (Henisz 2002); (b) latent judicial independence estimates, which

measure the independent power of the judiciary to constrain choices of the government (Linzer and

Staton 2015); (c) naming and shaming index, which is an aggregation of reporting on human rights

abuses by major media outlets, Amnesty International, and the UN Commission on Human Rights

(Cole 2015: 423); and (d) treaty commitment preference coordinates based on state ratifications of

280 universal treaties across a large number of policy areas (Lupu 2016), which is a good indicator

of the extent to which countries interact and socialize internationally.

To investigate the robustness of our causal effect estimation to different measurements, we

use three different measures of human rights outcome, including the Political Terror Scale (Gibney

et al. 2016), human rights protection scores (Fariss 2014), and the CIRI torture index (Cingranelli,

Richards and Clay 2013). While the CIRI torture index most directly targets torture practices, each of these datasets

has a different measurement timeframe. We therefore right-censor our analysis accordingly. Finally,

we include a number of time-invariant covariates (legal origin, treaty ratification rule, and electoral

system) and time-varying covariates (population size, gross domestic product (GDP) per capita,

participation in international trade, regime type, regime durability, and involvement in international

conflicts). As indicated in the literature, these covariates are the usual suspects that may confound

the relationship between treaties and human rights practices. Table 2 lists the observed variables,

indicates the timeframe of measurement for each variable, refers to studies in the literature that

similarly examine these variables, and identifies the data sources. Appendix B provides a more

detailed description of data sources, variable measurements as well as the recoding, preprocessing,

and transformation of variables for data analysis.

From the generative system as modeled in Equation set 1, we observe a sample of n country–

year observations On = (X, Wt , At , Mt , Yt ) ∼ PO where PO is the joint probability distribution of the

observed variables. In estimating the contemporaneous causal effect of each treatment, we compute

the average change in human rights outcome Yt as if we could physically intervene to alter

the ratification/participation status of a monitoring procedure for all country–year observations.

The effects of these interventions are expressed in terms of the mean of the outcome distribution:

EPO [Y |do(A = 1)] and EPO [Y |do(A = 0)]. The do-operator indicates an active intervention to fix

the treatment value at A = 1 (ratified) and A = 0 (non-ratified), and the expectations are taken

over the entire population. The difference of the two expected values is the average causal effect of

participating in a monitoring procedure: τ = EPO [Y |do(A = 1)] − EPO [Y |do(A = 0)].
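The do-operator contrast can be illustrated with a toy one-confounder model. The sketch below uses invented coefficients with the true effect fixed at 2.0 by construction; it computes τ by physically fixing the treatment and contrasts it with the naive observational difference, which is biased by the confounder:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def draw(do_a=None):
    """One-confounder toy SCM: Z -> A, Z -> Y, A -> Y (true effect 2.0).
    Passing do_a fixes the treatment, implementing the do-operator."""
    z = rng.normal(size=n)
    if do_a is None:
        a = (z + rng.normal(size=n) > 0).astype(float)   # nature assigns A
    else:
        a = np.full(n, float(do_a))                      # intervention do(A = do_a)
    y = 2.0 * a + 3.0 * z + rng.normal(size=n)
    return a, y, z

# Interventional contrast: tau = E[Y | do(A=1)] - E[Y | do(A=0)], close to 2.0
_, y1, _ = draw(do_a=1)
_, y0, _ = draw(do_a=0)
tau = y1.mean() - y0.mean()

# Naive observational contrast E[Y | A=1] - E[Y | A=0] is inflated by Z
a, y, z = draw()
naive = y[a == 1].mean() - y[a == 0].mean()
```

The gap between `naive` and `tau` is exactly the confounding that the identification strategy in the next subsection is designed to remove.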

Table 2: Model variables

Sets Timeframe Variables, References, and Data Sources


A 1986–2015 A1: Art. 19 ratification status (OHCHR).
1986–2015 A2: Art. 20 reservation status (OHCHR).
1986–2015 A3: Art. 22 declaration status (OHCHR).
1986–2015 A4: OPCAT ratification status (OHCHR).
M 1986–2016 M 1: Institutional and legislative constraints (Lupu 2015)
measured by political constraints index (Polcon iii) (Henisz 2002).
1986–2012 M 2: Judiciary effectiveness (Powell and Staton 2009; Conrad 2013)
measured by judicial independence index (Linzer and Staton 2015)
1986–2007 M 3: Political mobilization (Murdie and Davis 2012; Simmons 2009)
measured by naming and shaming index (Cole 2015).
1986–2008 M 4: International socialization (Clark 2013; Goodman and Jinks 2013)
measured by treaty commitment preferences (Lupu 2016).
Y 1986–2011 Y 1: CIRI torture index (Cingranelli, Richards and Clay 2013).
1986–2013 Y 2: Human rights protection scores (Fariss 2014).
1986–2015 Y 3: Political Terror Scale (Gibney et al. 2016).
W 1986–2015 Population size (Hafner-Burton and Tsutsui 2007)
measured by the World Bank Indicators.
1986–2015 Gross domestic product (GDP) per capita (Hafner-Burton and Tsutsui 2007)
measured by the World Bank Indicators.
1986–2015 Participation in international trade (Hafner-Burton 2013)
measured by the World Bank Indicators.
1986–2015 Regime types (Hathaway 2007; Chapman and Chaudoin 2013; Neumayer 2007)
measured by Polity scores (Marshall, Gurr and Jaggers 2016).
1986–2015 Regime durability (Goodliffe and Hawkins 2006)
measured by age in current regime (Cheibub, Gandhi and Vreeland 2010).
1986–2015 Involvement in militarized interstate disputes (Chapman and Chaudoin 2013)
measured by the MID dataset (Themnér 2014).
X Legal origin (Mitchell, Ring and Spellman 2013)
measured by legal origins data (La Porta, Lopez-de Silanes and Shleifer 2008).
Treaty ratification rule (Simmons 2009)
measured by ratification rules dataset (Simmons 2009).
Electoral system (Cingranelli and Filippov 2010)
measured by database of political institutions (Cruz and Scartascini 2016).

In summary, to compute the average causal effect of a monitoring procedure, we would not

simply observe the treatment values that nature generates. Rather, we would need to intervene

to disable the treatment assignment mechanism—in our structural causal model, for instance, the

treatment mechanism that generates the ratification status for the state reporting procedure is the

equation A1t = fA1 (X, A1t−1 , Mt−1 , Yt−1 , Wt , UA1 )—and fix the treatment values at A1 = a for

a = {0, 1} while, at the same time, retaining the rest of the generative system (the remaining

equations). The modified causal model is an interventional model Pa with the subscript a indicating

an intervention to fix the treatment value at A = a. We then predict the outcome value under

different interventions to compute the average causal effect.

3.2 Causal identification

The question of causal identification arises when we wish to establish the conditions under which

we can translate (and compute) an interventional query from the observational joint probability

distribution. This translation is made on a transparent basis using a graphical causal model in

the form of a directed acyclic graph (DAG). The DAG in Figure 3 is a graphical representation

of our structural causal model in Equation set 1. A directed acyclic graph (Pearl 2000; Koller and

Friedman 2009) comprises nodes denoting random variables and edges (directed arrows between

two nodes) denoting one variable’s direct causal influence on another variable. A path in a DAG is a

sequence of edges, regardless of the direction of the arrows, that connects one node to another. A path

between two nodes that only consists of arrows of the same direction is a causal path. Otherwise,

it is a non-causal path. An acyclic graph contains no cycle, that is, no feedback loop, which means

that no node in the graph can have a causal path leading to itself.
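As a concrete illustration of these graph concepts, the sketch below builds the period-t slice of a DAG in the spirit of Figure 3 (lagged nodes and all but one optional procedure omitted for brevity) with the networkx library, and enumerates the causal, i.e. directed, paths from a treatment to the outcome:

```python
import networkx as nx

# Period-t slice of a DAG like Figure 3, with one optional procedure (A2) shown
G = nx.DiGraph([
    ("W", "A1"), ("W", "A2"), ("W", "M"), ("W", "Y"),   # confounding arrows from W_t
    ("A1", "A2"),                                       # CAT ratification enables Art. 20
    ("A1", "M"), ("A1", "Y"),                           # direct and mediated treatment arrows
    ("A2", "M"), ("A2", "Y"),
    ("M", "Y"),                                         # mediator to outcome
])

assert nx.is_directed_acyclic_graph(G)                  # acyclic: no feedback loops

# Causal paths are the directed paths from treatment to outcome
causal_paths = list(nx.all_simple_paths(G, "A1", "Y"))
# A1 -> Y, A1 -> M -> Y, A1 -> A2 -> Y, and A1 -> A2 -> M -> Y
```

Any path in the undirected skeleton of `G` that is not among `causal_paths` is, by the definitions above, a non-causal path.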

Identification of causal effects is dependent upon the causal structure, which is represented

by the topology of a causal graph. Thus, any causal model should be based on the background

knowledge in the existing literature to maximize the chance that it accurately captures the true data

generating process. Our causal model in Figure 3 is constructed following that guideline as detailed

below.

First, we represent in black solid arrows both the direct causal effect of each monitoring pro-

cedure on human rights outcome (At → Yt ) and their indirect causal effects that go through all the

mediators (At → Mt → Yt ). We make one key assumption with respect to the relationships between

the four treaty monitoring procedures. In order to participate in any optional monitoring procedures

(inquiry, individual communication, and country visit), a state has to be a party to the CAT in the

first place. Since we have no reasons to believe in any particular causal order between the three op-

tional monitoring procedures, we assume they are causally unrelated to each other. In other words,

A2, A3, and A4 are independent conditional on treaty ratification A1 and other covariates. This

assumption of conditional independence among three optional monitoring procedures is necessary

for causal effect identifiability. Otherwise, if these procedures mutually cause each other, we would

not be able to identify their causal effects.

[Figure 3: a two-period DAG over the nodes Wt−1 , Wt , A1t−1 , A1t , A2t−1 , A2t , Mt−1 , Mt , A3t−1 , A3t , A4t−1 , A4t , Yt−1 , and Yt .]

Figure 3: A causal DAG representing the causal process from treaty monitoring procedures A1 (state re-
porting under Art. 19), A2 (inquiry under Art. 20), A3 (individual communication under Art. 22), and
A4 (country visit under the OPCAT) to Y (human rights outcome) with time-varying confounders W .
For clarity of presentation, time-invariant confounders X, which precede and potentially affect all time-
varying covariates, are not represented. All exogenous variables U ’s are assumed to be mutually indepen-
dent and are not represented either. Two shaded blocks indicate two time periods. The conditioning set
ZA1 = {X, A1t−1 , Mt−1 , Yt−1 , Wt } is sufficient to identify the causal effect of A1t on Yt . Similarly, the
conditioning sets ZA2 = {X, A2t−1 , Mt−1 , Yt−1 , A1t , Wt }, ZA3 = {X, A3t−1 , Mt−1 , Yt−1 , A1t , Wt },
and ZA4 = {X, A4t−1 , Mt−1 , Yt−1 , A1t , Wt } are sufficient to, respectively, identify the causal effect of
A2t , A3t , and A4t on Yt .

Second, to represent potential confounding by various time-varying covariates W , we include

directed arrows from these covariates to all treatments (Wt → At ), mediators (Wt → Mt ), and

outcome (Wt → Yt ). For a clear and concise presentation, we do not represent time-invariant

confounders X, but they are assumed to precede and affect all other variables in the model.

Third, we incorporate into the construction of our graphical model the selection effects argu-

ment very common in the literature (von Stein 2005; Simmons and Hopkins 2005; Gilligan, Sergenti

et al. 2008). Briefly, this argument claims that only states that intend to comply with a human rights

institution in the first place would join and, therefore, selection into a monitoring procedure would

be potentially biased. As a consequence, we would observe compliance only where we would have

observed compliance anyway in the absence of the institution. This is an identification problem and

in order to deal with it, we represent the potential causal effect of lagged human rights outcome on

the treatments with the directed arrows Yt−1 → A1t , Yt−1 → A2t , Yt−1 → A3t , and Yt−1 → A4t .

Fourth, we further include the directed arrow Yt−1 → Mt to denote the causal effect of the

lagged outcome variable on the time-varying mediators. This causal arrow reflects the possibility

that the use of torture by the executive could threaten the opposition parties and weaken legislative

constraints; intimidate judges and undermine judicial independence; suppress social movements

even while potentially provoking mass mobilization; and possibly prompt international censure and

condemnation.

Fifth, to further enable our graphical causal model to capture a potentially highly complex

reality, we allow for the directed arrows Mt−1 → A1t , Mt−1 → A2t , Mt−1 → A3t , and Mt−1 → A4t .

Substantively, this means the lagged mediators might have a causal influence on the ratification/par-

ticipation status of treaty monitoring procedures. This is meant to incorporate the finding that states

may enter into reservations to certain human rights treaty provisions based in part on the effective-

ness of their domestic judicial system (Hill 2016). It is also based on the research that suggests

countries may ratify human rights treaties due to mobilization by human rights NGOs (Simmons

2009) or an entrenchment of human rights norms (Finnemore and Sikkink 1998).

Finally, given the time-series cross-sectional structure of the data, we leave open the possibility

that the lagged value of a variable has an influence on its current value. Thus, for every

time-varying covariate we include a directed arrow such as A1t−1 → A1t and Yt−1 → Yt .

In summary, all the potential causal relationships between variables are derived on the basis of

the background knowledge in the literature and then represented by directed arrows in a graphical

model. It should be noted that a directed arrow in a graphical causal model does not necessarily

indicate an actual causal influence, but rather a possible one. Thus, including a directed arrow is

synonymous with not making a causal assumption, whereas a missing arrow is equivalent to assuming

that a direct causal relationship is absent.

As a graphical representation, a causal DAG compactly represents the causal structure of the

data generating process without making any assumptions about the form of any functions f or the

probability distribution of any exogenous variables U = (UX , UW , UA , UM , UY ) other than that these

exogenous variables are assumed to be mutually independent. Our causal graph also exhibits the

Markov property (Pearl 2009: 30), also known as invariance, according to which a node/variable

is independent of its non-descendants (nodes that are not on a causal path from that variable)

conditional on its parent nodes (nodes that have a directed arrow entering that variable). This

concept of invariance lies at the heart of identification using the backdoor criterion.

The backdoor criterion is used to determine non-parametrically the identifiability of a causal

effect via covariate adjustment (Pearl, Glymour and Jewell 2016: 61–66). To identify the causal

effect of A1t on Yt , for example, the backdoor criterion requires a sufficient adjustment set that:

(a) blocks any non-causal paths between A1t and Yt that have an arrow entering A1t ;

(b) leaves open all causal paths from A1t to Yt ; and

(c) creates no spurious paths when conditioning on a collider (a node that lies on a path between

A1t and Yt and has two arrows coming into it) or a descendant of a collider.

Sufficient adjustment sets for causal effect identification are derived based on the structure of

our causal DAG in Figure 3. According to criterion (a), the set ZA1 = {X, A1t−1 , Mt−1 , Yt−1 , Wt } is

sufficient to identify the causal effect of A1t on Yt . To satisfy criterion (b), a sufficient adjustment set

should not include any of the mediators Mt . It also cautions against interacting a mediator with the

treatment or other covariates when estimating a treatment effect. Nor should we condition on any

of the optional monitoring procedures {A2t , A3t , A4t } when estimating the causal effect of A1t . The

reason is that these optional procedures are the child nodes of A1t since participation in any of these

three procedures is legally premised on being a state party to the CAT (A1) in the first place. Criterion

(c) is automatically satisfied since we do not have any colliders on any paths emanating from the

treatment At to the outcome Yt . Applying the same backdoor criterion, we derive three other

adjustment sets ZA2 = {X, A2t−1 , Mt−1 , Yt−1 , A1t , Wt }, ZA3 = {X, A3t−1 , Mt−1 , Yt−1 , A1t , Wt }, and

ZA4 = {X, A4t−1 , Mt−1 , Yt−1 , A1t , Wt } to respectively identify the causal effects of the other three

monitoring procedures.

The idea of the backdoor criterion is to use an adjustment set of covariates to block all non-

causal paths (usually known as backdoor paths because they include arrows that enter the treat-

ment variable/node) between the treatment and the outcome. In other words, we condition on a

sufficient adjustment set to make the treatment and the outcome (for example, A1t and Yt ) condi-

tionally independent after we have exempted the causal path from A1t to Yt (Pearl 2009: 79–81).

Post-conditioning, we then have effectively removed all non-causal paths between A1t and Yt and

any remaining association is evidence of a causal relationship. In short, a successful backdoor

adjustment renders an interventional query P (Y |do(A = a)) computable from the observational joint

distribution. The causal effect τ = EPO [Yt |do(A1t = 1)] − EPO [Yt |do(A1t = 0)] then becomes the

parameter ψ(Pn ) = Σz [E(Yt |A1t = 1, ZA1 = z) − E(Yt |A1t = 0, ZA1 = z)] P (ZA1 = z), which is

estimable from observational data.
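When the adjustment set is discrete, this adjustment formula can be computed directly by stratification. The toy sketch below uses one binary confounder standing in for the adjustment set, with invented coefficients and the true effect fixed at 1.0; the backdoor adjustment recovers the causal effect while the unadjusted contrast is biased:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Observational data with one binary confounder (a stand-in for the set Z_A1)
z = (rng.random(n) < 0.4).astype(int)                          # P(Z = 1) = 0.4
a = (rng.random(n) < np.where(z == 1, 0.7, 0.2)).astype(int)   # Z -> A
y = 1.0 * a + 2.0 * z + rng.normal(size=n)                     # Z -> Y; true effect of A is 1.0

# Backdoor adjustment: psi = sum_z [E(Y|A=1,Z=z) - E(Y|A=0,Z=z)] P(Z=z)
psi = sum(
    (y[(a == 1) & (z == v)].mean() - y[(a == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)

# The unadjusted contrast leaves the backdoor path A <- Z -> Y open
naive = y[a == 1].mean() - y[a == 0].mean()
```

With continuous or high-dimensional adjustment sets, stratified means are infeasible, which is precisely why the next subsection turns to model-based estimation.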

3.3 Machine learning-based estimation

For causal effect estimation, we employ the machine learning-based targeted maximum likelihood

estimator (TMLE) (van der Laan and Rose 2011). The estimation proceeds in three steps. First, we fit

an initial outcome model Q0n (A, Z) = E(Y |A, Z) and the treatment model gn0 (Z) = P (A|Z) where

Z is the sufficient adjustment set. Second, we modify the initial model Q0n into an updated

outcome model Q1n using the updating equation logit(Q1n ) = logit(Q0n ) + εn Hn with the "clever

covariate" Hn (A, Z) = I(A = 1)/gn (A = 1|Z) − I(A = 0)/gn (A = 0|Z), where the coefficient εn is

obtained from a separate logistic regression model logit(Y ) = logit(Q0n ) + εn Hn . Finally, we plug

the two binary treatment values a = {1, 0} into the updated model Q1n to compute an estimate of

the causal effect parameter ψ̂(Pn ) = (1/n) Σi [Q1n (Ai = 1, Zi ) − Q1n (Ai = 0, Zi )]. Statistical

uncertainties around the estimate ψ̂(Pn ) are approximated by the variance of the efficient influence

function (van der Laan and Rose 2011).

The case for this targeted estimator is made in more detail by van der Laan and Rose (2011:

101–118). Suffice it here to note that this estimator has many desirable properties that few other

estimators can match. First, it is doubly robust, producing unbiased estimates if either the initial

outcome model Q0n or the treatment model gn0 is consistent. It is more robust than, for example,

propensity score-based estimators, the consistency of which depends on the correct specification of

the propensity score model. If both Q0n and gn0 are consistent, the targeted maximum likelihood

estimator is also asymptotically efficient, yielding maximally precise estimates.

Second, when using standard parametric regression models for effect estimation, researchers

often make modeling assumptions about the forms of functions that characterize the relationships

between the variables. These functional form assumptions include, for example, linearity of parame-

ters and additivity of covariate effects. Given that many social and political dynamics are non-linear,

these assumptions are likely unwarranted. If a model is not correctly specified in terms of its func-

tional form, the estimates will be biased. The TMLE significantly circumvents this limitation by

incorporating a machine learning ensemble technique called Super Learner that adapts to the data

and better approximates the true underlying functions (Polley and van der Laan 2010).

Super Learner uses a collection of parametric (such as generalized linear models and regu-

larized linear models), semi-parametric (such as generalized additive models and spline regression

models), and non-parametric algorithms (such as random forest and gradient boosting). They are

then assembled in a weighted combination with an individual weight for each algorithm that is pro-

portionate to each algorithm’s cross-validated predictive performance. The use of cross-validation

helps ensure that the ensemble generalizes well and avoids overfitting. This cross-

validated combination of algorithms creates a hybrid and much more powerful predictive function,

which is then used to build both the initial outcome model Q0n and the treatment model gn0 . Super

Learner-based models are much more likely to approximate the true underlying data generating

process and satisfy the assumption of correct model specification. Table 3 lists the algorithms we

use in our Super Learner-based effect estimation.

Table 3: Algorithms used in Super Learner-based targeted maximum likelihood estimation

Algorithm Description
glm Main-term generalized linear models Pp
glmnet Regularized linear models with lasso penalty j=1 |βj |
gam Generalized additive models (deg. of polynomials = 2).
polymars Adaptive polynomial spline regression
randomForest Random forest (ntree = 1,000)
xgboost Extreme gradient boosting (ntree = 1,000, max depth = 4, eta = 0.1)
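The weighting logic of Super Learner can be sketched with a deliberately small candidate library. Out-of-fold predictions from each algorithm are stacked and combined with non-negative least squares, one common way of computing cross-validation-based weights; the paper's actual library (Table 3) and weighting scheme follow Polley and van der Laan (2010):

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)  # non-linear truth

# A deliberately small candidate library (cf. the fuller library in Table 3)
library = [
    LinearRegression(),
    Lasso(alpha=0.01),
    RandomForestRegressor(n_estimators=200, random_state=0),
]

# Out-of-fold (cross-validated) predictions for every candidate algorithm
preds = np.zeros((n, len(library)))
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, algo in enumerate(library):
        preds[test, j] = algo.fit(X[train], y[train]).predict(X[test])

# Combine candidates with non-negative weights chosen to minimize CV risk
w, _ = nnls(preds, y)
sl_pred = preds @ w        # the ensemble prediction, dominated by the best learners
```

Because the weights are fitted to out-of-fold predictions, an algorithm that overfits its training folds receives little weight, which is the sense in which cross-validation protects the ensemble.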

For ease of interpretation and estimation, all three measures of the outcome variable are

rescaled into a bounded 0–1 range with zero indicating the worst torture practice and one indicating

the best performance. To handle missing data, we conduct multiple imputation (m = 5), using the

Amelia II program (Honaker et al. 2011), and combine the effect estimates from each imputed

data set. Appendix C provides the summary statistics of the data and Appendix D summarizes the

imputation process.
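The 0–1 rescaling can be sketched as a min-max transformation with an optional reversal for scales on which higher values mean worse practices (for example, the Political Terror Scale). The helper below is illustrative, not the paper's exact preprocessing code:

```python
import numpy as np

def rescale_01(values, higher_is_better=True):
    """Min-max rescale an outcome measure to [0, 1], with 1 = best performance.
    Set higher_is_better=False for scales where larger values mean worse abuse."""
    v = np.asarray(values, dtype=float)
    scaled = (v - np.nanmin(v)) / (np.nanmax(v) - np.nanmin(v))
    return scaled if higher_is_better else 1.0 - scaled

# e.g. a 1-5 terror-style scale where 5 is worst maps 1 -> 1.0 and 5 -> 0.0
example = rescale_01(np.array([1.0, 3.0, 5.0]), higher_is_better=False)
```

After this transformation, every effect estimate in Table 4 can be read as a fraction of the full range of the outcome scale.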

3.4 Results and interpretation

Machine learning-based TMLE estimates of the causal effect of participating in four monitoring pro-

cedures are reported in Table 4. In the top panel of Table 4, the results indicate that across three

different measures of human rights outcome, country visit is the only monitoring procedure that

has a consistent and significant causal impact in terms of reducing torture and improving the

government's respect for physical integrity rights. Its causal effect ranges from a 1.2% increase in

human rights protection scores, to a 4.6% reduction in political terror, to a dramatic 11.8% reduction

in state torture.
Table 4: Average causal effect point estimates and 95% CI of treaty monitoring procedures under the CAT
and the OPCAT.

Procedure                        PTS scores          HR scores           CIRI torture index

Super Learner-based Targeted Maximum Likelihood Estimation
Influence Function-based CI with Multiple Imputation

State reporting (Art. 19)        −0.023              0.003               −0.165
                                 [−0.044, −0.001]    [−0.006, 0.011]     [−0.313, −0.018]
Inquiry (Art. 20)                0.058               0.007               −0.064
                                 [0.031, 0.084]      [0.004, 0.009]      [−0.087, −0.041]
Individual complaint (Art. 22)   0.173               0.116               0.144
                                 [0.142, 0.204]      [−0.348, 0.580]     [0.033, 0.256]
Country visit (OPCAT)            0.046               0.012               0.118
                                 [0.026, 0.065]      [0.009, 0.015]      [0.029, 0.207]

Linear Models of Human Rights Outcome
Least Square Estimation with Multiple Imputation

State reporting (Art. 19)        −0.001              0.002               −0.030
                                 [−0.031, 0.028]     [−0.001, 0.005]     [−0.081, 0.020]
Inquiry (Art. 20)                0.058               0.006               0.046
                                 [0.019, 0.098]      [0.001, 0.010]      [−0.021, 0.113]
Individual complaint (Art. 22)   0.049               0.007               0.040
                                 [0.006, 0.092]      [0.002, 0.012]      [−0.032, 0.112]
Country visit (OPCAT)            0.014               0.003               0.036
                                 [−0.025, 0.053]     [−0.002, 0.007]     [−0.032, 0.104]

Number of years (CAT)            29 [1987–2015]      27 [1987–2013]      25 [1987–2011]
Number of observations (CAT)     5,414               5,032               4,648
Number of years (OPCAT)          10 [2006–2015]      8 [2006–2013]       6 [2006–2011]
Number of observations (OPCAT)   1,929               1,547               1,163

The CIRI torture index specifically measures the torture practice by state officials and private

individuals at the instigation of the government. It is probably the most suitable measure of the

outcome. However, its limited timeframe of measurement leads to a smaller number of observa-

tions. We therefore use two other indicators as well that measure a larger variety of government

abuses of physical integrity rights, including political imprisonment, extrajudicial execution, en-

forced disappearances, and other violations of political rights. As a result, the causal effect estimates

naturally vary, but they are nonetheless statistically significant. These estimates empirically confirm

our proposition and boost our confidence in the positive causal impact of the country visit procedure

under the OPCAT. Given the large number of determinants of human rights outcome, some of which

are highly resistant to meaningful changes, the finding that ratifying a single intrusive monitoring

procedure can lead to a substantial improvement in human rights conditions is significant and merits

the attention of rights defenders and other stakeholders in the UN human rights treaty system.

The next causally effective monitoring procedure is the individual communication mechanism

under Art. 22 of the CAT. In terms of magnitude, the causal effect of this procedure could be even

greater, averaging at around 15% improvement of human rights conditions. However, when we

measure human rights outcome using the human rights protection scores, its causal impact is no

longer significant due to the large variation of estimated causal effect. The causal effect of the

inquiry procedure is smaller and inconsistent in terms of effect direction across different outcome

measures. Counter-intuitively, but not without similar findings in the literature (Hill 2010; Lupu

2013b), the state reporting system, if anything, has a damaging impact on human rights protection.

This surprising negative impact is even statistically significant, increasing torture by 16.5% when we

measure state practices using the CIRI torture index. It is not implausible that abusive governments

may use participation in this relatively low-cost reporting procedure as a cover for their domestic

repression and deflect international criticism.

In short, the monitoring procedures under the CAT and the OPCAT have noticeably different

effects on human rights outcome, ranging from a high of 17.3% improvement to a low of 16.5%

decline in human rights conditions. Importantly, these findings provide empirical evidence in

support of our theoretical proposition that intrusive procedures have a positive causal effect whereas

the same claim cannot be made with respect to less intrusive ones. By implication, the findings

suggest that one major way to improve human rights practices is to design and promote treaty

monitoring procedures that are able to exercise intrusive external oversight over state compliance.

Among the five types of monitoring procedures available, the country visit procedure and, to a lesser

extent, the individual complaint procedure represent the most effective protection mechanisms.

Moreover, ongoing efforts to reform the reporting procedures of UN human rights treaty system

should be directed toward building a more intrusive system. Otherwise, the current reporting system

is unlikely to have any positive impact or even backfire.

To contrast our machine learning-based estimator with the conventional practice of using

parametric regression models for effect estimation, we report in the bottom panel of Table 4 the

effect estimates from linear regression models of human rights outcome. Covariate selection for

causal effect identifiability remains the same. The results indicate that only in the case of the inquiry

procedure with human rights outcome measured in the PTS scores and human rights protection

scores are the effect estimates somewhat similar, though not as precise. This is probably an indication

that a linear regression model might be close enough in these scenarios. In all other cases, a simple

linear regression model fails to recover any similar effects. It is particularly off the mark with respect

to the country visit procedure and when one uses the CIRI torture index to measure the outcome

variable. It is worth noting that we include an ordinary least squares regression model in our library

of algorithms. Therefore, if modeling assumptions such as linearity and additivity truly reflected

the data generating process, the model estimates would have been similar to those obtained via our

machine learning-based estimation.

4 Conclusion

In this paper, we disaggregate treaty ratification into participation in different monitoring procedures

and address the question of whether these procedures have differing causal effects on human rights

outcome because of their different designs. Answering this question has significant implications.

First, it contributes to the broader literature that examines the empirical implications of institutional

design for substantive policy outcomes (Downs, Rocke and Barsoom 1998; Koremenos 2005; Gilligan

and Johns 2012; Abbott and Snidal 2013). Evaluating the effectiveness of monitoring procedures

under the CAT and the OPCAT, our causal analysis suggests that human rights institutions that

impose more binding obligations, operate according to precise rules, and enjoy greater delegation

of authority lead to better outcomes.

Second, in terms of human rights research, the contribution of this paper involves examining

the causal impact of an important human rights treaty from a different level of analysis. In the

existing empirical literature, contradictory findings unfortunately abound when it comes to estimat-

ing the causal effect of human rights treaties. One major reason could be that treaties have mostly

been examined at a more aggregate level. We unpack treaty ratification and shift the investigative focus

onto monitoring processes. We develop a proposition as to why monitoring procedures may have

differing causal impacts, arguing that their effectiveness is a likely function of their intrusiveness.

Our findings indicate that not all monitoring procedures are created equal or have the same

causal impact. More intrusive procedures such as country visits may have the best chance to improve

human rights conditions. Our causal analysis also informs the current debate on how to reform

the operation and improve the performance of human rights treaty bodies. The findings

support more intensive monitoring and extensive information gathering by international bodies and

panels of independent experts. Our research is certainly only an initial step in evaluating and de-

termining the kind of institutional design that works, has no causal impact, or even backfires in

protecting human rights and improving government accountability. As mentioned, further research

is needed with respect to other human rights treaties and international institutions in other domains.

In terms of methodology, we employ the structural causal inference framework that Pearl

(2009) and others have developed to address the question of causality in an observational setting.

By putting the task of identification front and center, we attempt to give a causal interpretation to our

effect estimates on a transparent basis. Many, if not most, research questions in political science are

queries about cause and effect. Researchers should therefore openly embrace and employ this causal

inference framework. In this paper, we also apply the machine learning-based targeted learning

methodology for parameter estimation. Due to its desirable statistical properties, targeted learning

is likely to become more popular and widely used across different fields of scientific inquiry.
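As an illustration of this workflow, the sketch below estimates an average treatment effect with the tmle package that accompanies van der Laan and Rose (2011), using Super Learner for both the outcome regression and the treatment mechanism. The variable names are hypothetical and the learner library is abbreviated; this is a sketch of the general procedure, not the paper's exact specification.

```r
# Minimal TMLE sketch (assumed column names, abbreviated learner library)
library(tmle)

result <- tmle(Y = data$hrs,    # outcome: human rights scores
               A = data$visit,  # binary treatment: country visit procedure
               W = data[, c("polcon", "ji", "gdppc", "p4")],  # confounders
               Q.SL.library = c("SL.glm", "SL.gam", "SL.randomForest"),
               g.SL.library = c("SL.glm", "SL.gam", "SL.randomForest"))

result$estimates$ATE  # point estimate, variance, and 95% confidence interval
```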

References
Abbott, Kenneth W and Duncan Snidal. 2013. Law, Legalization, and Politics: An Agenda for the Next Gener-
ation of IL/IR Scholars. In Interdisciplinary Perspectives on International Law and International Relations: The
State of the Art, ed. Jeffrey L Dunoff and Mark A Pollack. Cambridge University Press pp. 33–57. 23

Abbott, Kenneth W, Robert O Keohane, Andrew Moravcsik, Anne-Marie Slaughter and Duncan Snidal. 2000.
“The Concept of Legalization.” International Organization 54(03):401–419. 5

Bassiouni, Cherif M. and William A. Schabas. 2011. New Challenges for the UN Human Rights Machinery.
Intersentia. 2

Brewster, Rachel. 2013. Reputation in International Relations and International Law Theory. In Interdisciplinary
Perspectives on International Law and International Relations: The State of the Art, ed. Jeffrey L Dunoff and
Mark A Pollack. Cambridge University Press pp. 524–543. 10

Chapman, Terrence L and Stephen Chaudoin. 2013. “Ratification Patterns and the International Criminal
Court.” International Studies Quarterly 57(2):400–409. 14

Cheibub, José Antonio, Jennifer Gandhi and James Raymond Vreeland. 2010. “Democracy and Dictatorship
Revisited.” Public Choice 143(1-2):67–101. 14

Cingranelli, David L., David L. Richards and K. Chad Clay. 2013. “The Cingranelli-Richards (CIRI) Human
Rights Dataset.” CIRI Human Rights Data Website: http://www.humanrightsdata.org. 13, 14

Cingranelli, David and Mikhail Filippov. 2010. “Electoral Rules and Incentives to Protect Human Rights.”
Journal of Politics 72(1):243–257. 14

Clark, Ann Marie. 2009. Human Rights NGOs at the United Nations: Developing an Optional Protocol to the
Convention against Torture. In Transnational Activism in the UN and EU: A Comparative Study, ed. Jutta
Joachim and Birgit Locher. Routledge. 9

Clark, Ann Marie. 2013. The Normative Context of Human Rights Criticism: Treaty Ratification and UN
Mechanisms. In From Commitment to Compliance: The Persistent Power of Human Rights, ed. Thomas Risse,
Stephen C. Ropp and Kathryn Sikkink. Cambridge: Cambridge University Press. 1, 12, 14

Cole, Wade M. 2009. “Hard and Soft Commitments to Human Rights Treaties, 1966–2001.” Sociological Forum
24(3):563–588. 8

Cole, Wade M. 2015. “Mind the Gap: State Capacity and the Implementation of Human Rights Treaties.”
International Organization 69(02):405–441. 13, 14, 30

Conrad, Courtenay R. 2013. “Divergent Incentives for Dictators: Domestic Institutions and (International
Promises Not to) Torture.” Journal of Conflict Resolution . 12, 14

Conrad, Courtenay R. and Will H. Moore. 2010. “What Stops the Torture?” American Journal of Political Science
54(2):459–476.

Creamer, Cosette D and Beth A Simmons. 2015. “Ratification, Reporting, and Rights: Quality of Participation
in the Convention against Torture.” Human Rights Quarterly 37(3):579–608. 12

Cruz, Cesi, Philip Keefer and Carlos Scartascini. 2016. “Database of Political Institutions Codebook, 2015
Update (DPI2015).”. 14

Downs, George W., David M. Rocke and Peter N. Barsoom. 1998. “Managing the Evolution of Multilateralism.”
International Organization 52(2):397–419. 23

Egan, Suzanne. 2011. The United Nations Human Rights Treaty System: Law and Procedure. Bloomsbury Profes-
sional. 2

Evans, Malcolm. 2011. The OPCAT at 50. In The Delivery of Human Rights: Essays in Honour of Professor Sir
Nigel Rodley, ed. Geoff Gilbert and Clara Sandoval Villalba. Taylor & Francis. 9

Farber, Daniel A. 2002. “Rights as Signals.” Journal of Legal Studies 31:83–94. 8

Fariss, Christopher J. 2014. “Respect for Human Rights Has Improved Over Time: Modeling the Changing
Standard of Accountability.” American Political Science Review 108(2):297–318. 13, 14

Finnemore, Martha and Kathryn Sikkink. 1998. “International Norm Dynamics and Political Change.” Interna-
tional Organization 52(4):887–917. 17

Gibney, Mark, Linda Cornett, Reed Wood and Peter Haschke. 2016. “Political Terror Scale 1976-2015.” Political
Terror Scale Website: http://www.politicalterrorscale.org. 13, 14

Gilligan, Michael J, Ernest J Sergenti et al. 2008. “Do UN Interventions Cause Peace? Using Matching to
Improve Causal Inference.” Quarterly Journal of Political Science 3(2):89–122. 16

Gilligan, Michael J and Leslie Johns. 2012. “Formal Models of International Institutions.” Annual Review of
Political Science 15:221–243. 23

Goodliffe, Jay and Darren G. Hawkins. 2006. “Explaining Commitment: States and the Convention against
Torture.” Journal of Politics 68(2):358–371. 14

Goodman, Ryan and Derek Jinks. 2003. “Measuring the Effects of Human Rights Treaties.” European Journal of
International Law 14(1):171–183. 8

Goodman, Ryan and Derek Jinks. 2013. Socializing States: Promoting Human Rights Through International Law.
Oxford University Press. 12, 14

Hafner-Burton, Emilie, Edward Mansfield and Jon Pevehouse. 2015. “Human Rights Institutions, Sovereignty
Costs and Democratization.” British Journal of Political Science 45:1–27. 5, 8

Hafner-Burton, Emilie and Kiyoteru Tsutsui. 2007. “Justice Lost! The Failure of International Human Rights
Law to Matter Where Needed Most.” Journal of Peace Research 44(4):407–425. 1, 14

Hafner-Burton, Emilie M. 2013. Forced to Be Good: Why Trade Agreements Boost Human Rights. Cornell Univer-
sity Press. 14

Hathaway, Oona. 2002. “Do Human Rights Treaties Make a Difference?” Yale Law Journal 111(8):1935–2042.
1

Hathaway, Oona A. 2007. “Why Do Countries Commit to Human Rights Treaties?” Journal of Conflict Resolution
51(4):588–621. 14

Henisz, Witold J. 2002. “The Political Constraint Index (POLCON) Dataset.” Website:
https://whartonmgmt.wufoo.com/forms/political-constraint-index-polcon-dataset. 13, 14

Hill, Daniel W. 2010. “Estimating the Effects of Human Rights Treaties on State Behavior.” Journal of Politics
72(4):1161–1174. 1, 22

Hill, Daniel W. 2016. “Avoiding Obligation: Reservations to Human Rights Treaties.” Journal of Conflict Resolu-
tion 60(6):1129–1158. 17

Hollyer, James and B. Peter Rosendorff. 2011. “Why Do Authoritarian Regimes Sign the Convention against
Torture? Signaling, Domestic Politics and Non-Compliance.” Quarterly Journal of Political Science 6(3-4):275–
327. 8

Honaker, James, Gary King, Matthew Blackwell et al. 2011. “Amelia II: A Program for Missing Data.” Journal
of Statistical Software 45(7):1–47. 20

Jonas, David S and Thomas N Saunders. 2010. “The Object and Purpose of a Treaty: Three Interpretive
Methods.” Vanderbilt Journal of Transnational Law 43(3):565–609. 8

Keck, Margaret E. and Kathryn Sikkink. 1998. Activists Beyond Borders: Advocacy Networks in International
Politics. Ithaca: Cornell University Press. 12

Keller, Helen and Geir Ulfstein. 2012. UN Human Rights Treaty Bodies: Law and Legitimacy. Vol. 1 Cambridge
University Press. 2

Koller, Daphne and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
15

Koremenos, Barbara. 2005. “Contracting around International Uncertainty.” American Political Science Review
99(4):549. 23

La Porta, Rafael, Florencio Lopez-de Silanes and Andrei Shleifer. 2008. “The Economic Consequences of Legal
Origins.” Journal of Economic Literature 46(2):285–332. 14, 30

Landman, Todd. 2005. Protecting Human Rights: A Comparative Study. Georgetown University Press. 1

Linzer, Drew A and Jeffrey K Staton. 2015. “A Global Measure of Judicial Independence, 1948–2012.” Journal
of Law and Courts 3(2):223–256. 13, 14

Lupu, Yonatan. 2013a. “Best Evidence: The Role of Information in Domestic Judicial Enforcement of Interna-
tional Human Rights Agreements.” International Organization 67(3):469–503. 1, 10

Lupu, Yonatan. 2013b. “The Informative Power of Treaty Commitment: Using the Spatial Model to Address
Selection Effects.” American Journal of Political Science 57(4). 22

Lupu, Yonatan. 2015. “Legislative Veto Players and the Effects of International Human Rights Agreements.”
American Journal of Political Science 59(3):578–594. 10, 12, 14

Lupu, Yonatan. 2016. “Why Do States Join Some Universal Treaties But Not Others? An Analysis of Treaty
Commitment Preferences.” Journal of Conflict Resolution 60(7):1219–1250. 13, 14, 30

Marshall, Monty G, Keith Jaggers and Ted Robert Gurr. 2016. “Polity IV Project: Political Regime Characteristics
and Transitions, 1800-2015.” Dataset Users’ Manual. Center for International Development and Conflict
Management, University of Maryland . 14

Mitchell, Sara McLaughlin, Jonathan J Ring and Mary K Spellman. 2013. “Domestic Legal Traditions and States’
Human Rights Practices.” Journal of Peace Research 50(2):189–202. 14

Murdie, Amanda M. and David R. Davis. 2012. “Shaming and Blaming: Using Events Data to Assess the Impact
of Human Rights INGOs.” International Studies Quarterly 56(1):1–16. 14

Neumayer, Eric. 2005. “Do International Human Rights Treaties Improve Respect for Human Rights?” Journal
of Conflict Resolution 49(6):925–953. 1

Neumayer, Eric. 2007. “Qualified Ratification: Explaining Reservations to International Human Rights Treaties.”
The Journal of Legal Studies 36(2):397–429. 14

Pearl, Judea. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press. 11, 15

Pearl, Judea. 2009. Causality. Cambridge University Press. 11, 18, 24

Pearl, Judea, Madelyn Glymour and Nicholas P Jewell. 2016. Causal Inference in Statistics: A Primer. John
Wiley & Sons. 12, 18

Phan, Hao D. 2012. A Selective Approach to Establishing a Human Rights Mechanism in Southeast Asia: The Case
for a Southeast Asian Court of Human Rights. Procedural Aspects of International Law Leiden: Brill.

Pillay, Navanethem. 2012. “Strengthening the United Nations Human Rights Treaty Body System.” The Office
of the High Commissioner for Human Rights . 5

Polley, Eric C and Mark J van der Laan. 2010. “Super Learner in Prediction.” Working Paper Series UC Berkeley
Division of Biostatistics . 20

Powell, Emilia J. and Jeffrey K. Staton. 2009. “Domestic Judicial Institutions and Human Rights Treaty Viola-
tion.” International Studies Quarterly 53(1):149–174. 12, 14

Rodley, Nigel S. 2013. The Role and Impact of Treaty Bodies. In The Oxford Handbook of International Human
Rights Law, ed. Dinah Shelton. Oxford University Press pp. 621–648. 2, 7

Simmons, Beth A. 2009. Mobilizing for Human Rights: International Law in Domestic Politics. Cambridge:
Cambridge University Press. 1, 10, 12, 14, 17, 30, 31

Simmons, Beth A. 2011. “Reflections on Mobilizing for Human Rights.” NYU Journal of International Law and
Politics 44:729–750. 8

Simmons, Beth A and Daniel J Hopkins. 2005. “The Constraining Power of International Treaties: Theory and
Methods.” American Political Science Review 99(04):623–631. 16

Steinerte, Elina, Malcolm David Evans and Antenor Hallo de Wolf. 2011. The Optional Protocol to the UN
Convention Against Torture. Oxford University Press. 6

Themnér, Lotta. 2014. “UCDP/PRIO Armed Conflict Dataset Codebook.” Uppsala Conflict Data Program (UCDP)
. 14

Ulfstein, Geir and Helen Keller. 2012. UN Human Rights Treaty Bodies. Cambridge: Cambridge University Press.
1, 6

van der Laan, Mark J and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experi-
mental Data. Springer Science & Business Media. 19

von Stein, Jana. 2005. “Do Treaties Constrain or Screen? Selection Bias and Treaty Compliance.” American
Political Science Review 99(4):611–622. 16

A United Nations Human Rights Treaties
A.1 Human rights conventions and status of ratification
CAT Convention Against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment,
1465 UNTS 85, adopted 10 December 1984, entered into force 26 June 1987, ratified by 158 states.

CED Convention for the Protection of All Persons from Enforced Disappearance, UNTS 2715 Doc.A/61/448,
adopted 20 December 2006, entered into force 23 December 2010, ratified by 50 states.

CEDAW Convention on the Elimination of All Forms of Discrimination against Women, 1249 UNTS
13, adopted 18 December 1979, entered into force 3 September 1981, ratified by 189 states.

CERD Convention on the Elimination of All Forms of Racial Discrimination, GA Res. 2106 (XX), An-
nex, 20 UN GAOR Supp. (No. 14) at 47, UN Doc. A/6014 (1966), 660 UNTS 195, adopted 7 March 1965,
entered into force 4 January 1969, ratified by 177 states.

CMW International Convention on the Protection of the Rights of All Migrant Workers and Members
of their Families, GA Res. 45/158, Annex, 45 UN GAOR Supp. (No. 49A) at 262, UN Doc. A/45/49, adopted
18 December 1990, entered into force 1 July 2003, ratified by 48 states.

CRC Convention on the Rights of the Child, 1577 UNTS 3, adopted 20 November 1989, entered into
force 2 September 1990, ratified by 194 states.

CRPD Convention on the Rights of Persons with Disabilities, UN Doc. A/61/611, adopted 13 Decem-
ber 2006, entered into force 3 May 2008, ratified by 157 states.

ICCPR International Covenant on Civil and Political Rights, 999 UNTS 171, adopted 16 December
1966, entered into force 23 March 1976, ratified by 168 states.

ICESCR International Covenant on Economic, Social and Cultural Rights, 993 UNTS 3, adopted 16
December 1966, entered into force 3 January 1976, ratified by 164 states.

A.2 Monitoring procedures under UN human rights treaties

Table 5: Monitoring procedures under UN core human rights treaties

         State      State           Individual      Inquiry        Country
         reporting  communication   communication                  visit
CERD     X          X               X (optional)    ✗              ✗
ICESCR   X          X (OP)          X (OP)          X (OP)         ✗
ICCPR    X          X (optional)    X (OP)          ✗              ✗
CEDAW    X          ✗               X (OP)          X (OP)         ✗
CAT      X          X (optional)    X (optional)    X (optional)   X (OP)
CRC      X          X (OP)          X (OP)          X (OP)         ✗
CMW      X          X (optional)    X (optional)    ✗              ✗
CRPD     X          ✗               X (OP)          X (OP)         ✗
CED      X          X (optional)    X (optional)    X              ✗
OP: under the Optional Protocol; ✗: procedure not available under the treaty.

B Variable Descriptions
• Ratification Status of Monitoring Procedures: A country–year binary variable coded 1 for ratification
and 0 otherwise. Monitoring procedures include (i) Art. 19 of the Convention against Torture (CAT), (ii)
Art. 20, (iii) Art. 22, and (iv) Optional Protocol to the Convention against Torture (OPCAT). Data are
from the Office of the High Commissioner for Human Rights database.
(http://www.ohchr.org/EN/HRBodies/Pages/HumanRightsBodies.aspx).
• Political Constraints Index: an expert-coded country–year interval variable on a scale from 0 (most
hazardous - no checks and balances) to 1 (most constrained–extensive checks and balances).
Variable name in original dataset is polconiii.
(https://whartonmgmt.wufoo.com/forms/political-constraint-index-polcon-dataset/).
• Judicial independence: a time-series cross-sectional interval variable ranging from 0 (no judicial inde-
pendence) to 1 (complete judicial independence).
(http://polisci.emory.edu/faculty/jkstato/page3/index.html).
• Name and shame index: a country–year index that Cole (2015: 423) computes, which “sums the standardized
scores of four variables: media reporting of human rights abuses in (1) The Economist and (2)
Newsweek; (3) Amnesty International press releases targeting a country’s human rights blemishes; and
(4) UN Commission on Human Rights resolutions condemning a country’s human rights performance.”
Variable name in original dataset is name_shame.
• Treaty Commitment Propensity: a country–year interval variable, ranging from −1 to 1, that Lupu
(2016) computes to measure a country’s commitment preference across a large number of treaties in
different domains. We use the first-dimension coordinates as a proxy for the degree to which states are
internationally socialized as measured by their participation in the pool of 280 universal treaties.
Variable name in original dataset is coord1d.
• Political Terror Scale: a country–year five-point ordinal variable measuring levels of political murders,
torture, political imprisonment, and disappearances. This variable was originally coded from 5 for worst
level of abuses to 1 for least abuses. For consistency of interpretation with the other two measures of
human rights outcomes, we reverse coded it to range from 0 (worst performance) to 4 (best performance).
(http://www.politicalterrorscale.org).
• Human Rights Scores: a country–year interval variable that measures respect for physical integrity hu-
man rights. Rescaled to a 0–1 range from the empirical range for ease of interpretation. Low scores
indicate less government respect for physical integrity rights, whereas high scores indicate greater
respect. The scores were generated using a dynamic ordinal item-response theory model that
accounts for systematic change in the way human rights abuses have been monitored over time. The
human rights scores model builds on data from the CIRI Human Rights Data Project, the Political Terror
Scale, the Ill Treatment and Torture Data Collection, the Uppsala Conflict Data Program, and several
other published sources.
(http://humanrightsscores.org).
• CIRI torture index: an ordinal index that measures the extent of torture practice by government officials
or by private individuals at the instigation of government officials. A score of zero indicates frequent
torture practice; a score of 1 indicates occasional torture practice; and a score of 2 indicates that torture
did not occur in a given year.
(http://www.humanrightsdata.com/p/data-documentation.html).
• Legal origins: a cross-sectional (country) multinomial variable coded for British, French, German, Scan-
dinavian, and Socialist legal origins. Data are from La Porta, Lopez-de Silanes and Shleifer (2008). We
recoded 1 for common law and 0 otherwise.
• Ratification rules: a cross-sectional (country) five-point ordinal variable (1, 1.5, 2, 3, 4) coded by
Simmons (2009). Its empirical maximum, however, is only 3. It measures “the institutional ‘hurdle’
that must be overcome in order to get a treaty ratified.” The coding is based on descriptions of national
constitutions or basic laws.
(http://scholar.harvard.edu/files/bsimmons/files/APP_3.2_Ratification_rules.pdf).

• Electoral System: a cross-sectional (country) categorical variable coded Parliamentary (2), Assembly-
elected President (1), Presidential (0). Some missing values are filled in using a relatively comparable
coding system by Simmons (2009), in which the variable is coded 2 for primarily parliamentary system,
1 for hybrid system, and 0 for primarily presidential system. We recoded 0 for presidential system and 1
otherwise.
(http://www.iadb.org/en/research-and-data/publication-details,3169.html?pub_id=IDB-DB-121).
• Population: a country–year interval variable measuring the total number of residents in a country re-
gardless of their legal status.
(http://data.worldbank.org/indicator/SP.POP.TOTL).
• GDP per capita: a country–year interval variable measuring gross domestic product divided by midyear
population measured in current US dollars.
(http://data.worldbank.org/indicator/NY.GDP.PCAP.CD).
• Trade: a country–year interval variable measuring the sum of exports and imports of goods and services
as a share of gross domestic product.
(http://data.worldbank.org/indicator/NE.TRD.GNFS.ZS).
• Regime Type: measured by the Polity Score. The Polity Score is a country–year interval variable mea-
suring regime authority spectrum on a 21-point scale ranging from −10 (hereditary monarchy) to +10
(consolidated democracy).
(http://www.systemicpeace.org/polityproject.html).
• Regime Durability: a country–year interval variable measuring the number of years since the most
recent regime change (defined by a three-point change in the POLITY score over a period of three years
or less) or the end of transition period defined by the lack of stable political institutions.
(http://www.systemicpeace.org/polityproject.html).
• Involvement in militarized interstate dispute: a country–year binary variable from the Militarized
Interstate Dispute Data (MIDB dataset, version 4.1). It is coded 1 to indicate a country’s involvement
on any side of a militarized dispute in a given year (between the start year and the end year of a dispute)
and 0 otherwise.
(http://cow.dss.ucdavis.edu/data-sets/MIDs).

C Summary Statistics

Table 6: Summary Statistics

Statistic N Mean SD Min Max


COW country code 5,609 — — 2 990
Year 5,609 — — 1986 2015
Reporting (Art. 19) 5,609 0.586 0.493 0 1
Inquiry (Art. 20) 5,609 0.555 0.497 0 1
Individual complaint (Art. 22) 5,609 0.250 0.433 0 1
Country visit (OPCAT) 5,609 0.107 0.309 0 1
Political Terror Scale scores 5,178 2.515 1.170 0 4
Human rights scores 5,227 0.559 1.442 −2.940 4.705
CIRI torture index 4,198 0.741 0.733 0 2
Political constraints index 5,285 0.267 0.216 0 0.726
Judicial independence 4,900 0.514 0.314 0.013 0.995
Name shame index 2,775 0.188 3.140 −1.499 26.310
Treaty commitment propensity 4,115 0.086 0.483 −1 0.993
Legal origins 5,503 1.949 0.975 1 5
Ratification rules 5,337 1.782 0.643 1 3
Electoral systems 5,081 0.715 0.876 0 2
Population size 5,386 34,533,273 126,414,302 9,419 1,371,220,000
GDP per capita 5,130 9,330 16,654 64.810 193,648
Trade/GDP proportion 4,657 82.090 49.720 0.021 531.700
Polity 4 scores 4,611 2.630 6.861 −10 10
Regime durability 4,676 24.480 30.290 0 206
Involvement in MID 4,983 0.282 0.450 0 1

D Multiple Imputation of Missing Data
Instead of listwise deletion, multiple imputation is used to fill in missing data, creating five imputed data
sets. Estimates computed from these five data sets are then pooled according to Rubin’s rules. All variables
in the summary statistics table (Table 6) are included in the imputation stage to make the missing at random
(MAR) assumption as plausible as possible. The 1986–2015 time frame is used to impute missing data.
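A minimal sketch of the pooling step: under Rubin’s rules, the pooled point estimate is the average of the per-imputation estimates, and the pooled variance combines the average within-imputation variance with the between-imputation variance inflated by (1 + 1/m). The numeric inputs in the example call are made up for illustration.

```r
# Rubin's rules for combining estimates from m imputed data sets
pool_rubin <- function(est, se) {
  m     <- length(est)
  qbar  <- mean(est)             # pooled point estimate
  ubar  <- mean(se^2)            # average within-imputation variance
  b     <- var(est)              # between-imputation variance
  total <- ubar + (1 + 1/m) * b  # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Hypothetical per-imputation estimates and standard errors (m = 5):
pool_rubin(est = c(0.21, 0.25, 0.19, 0.23, 0.22), se = rep(0.05, 5))
```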

Table 7: Fractions of missing data by variables

Variables Fraction
Name and shame index 0.505
Treaty commitment propensity 0.266
CIRI torture index 0.252
Polity 4 scores 0.178
Trade/GDP proportion 0.170
Regime durability 0.166
Judicial independence 0.126
Involvement in MID 0.112
Electoral system 0.094
GDP per capita 0.085
Political Terror Scale 0.077
Human rights protection scores 0.068
Political constraint index 0.058
Ratification rule 0.048
Population size 0.040
Legal origin 0.019
State reporting procedure 0.000
Inquiry procedure 0.000
Individual complaint procedure 0.000
Country visit procedure 0.000
Number of observations after listwise deletion 2,302
Number of observations after imputation 5,609

Missingness Map

[Figure: missingness map produced by Amelia’s missmap(). Rows are COW country codes (2–990); columns are
the variables, ordered from most missing (nameshame, coord1d, torture, p4, trade, durable, ji, dispute,
system, gdppc, pts, hrs, polcon, ratifrule, population, legor) to fully observed (visit, complaint,
inquiry, reporting, year, cow).]

Figure 4: Map of missing data for multiple imputation

E R code
E.1 Data preprocessing

options(digits = 3)
options(dplyr.width = Inf)
rm(list = ls())
cat("\014")

library(dplyr)     # Load dplyr to process data
library(tidyr)     # tidyr package to tidy data
library(foreign)   # Read Stata data
library(lubridate) # Handle date variables
library(stargazer) # Export summary statistics as a LaTeX table
library(reshape2)  # Convert data sets between long and wide formats
library(ggplot2)
library(ggthemes)

####################################
# Treatments (A): CAT, Art.20, Art.22, OPCAT
####################################
# Treaty ratification years by 192 countries
# CAT open 1984, entry 1987
ratifcow <- read.csv("ratification.csv") %>%
  mutate(catyear = year(as.Date(cat.date, format = "%m/%d/%Y")),
         individualyear = year(as.Date(individual.date, format = "%m/%d/%Y")),
         inquiryyear = year(as.Date(inquiry.date, format = "%m/%d/%Y")),
         opcatyear = year(as.Date(opcat.date, format = "%m/%d/%Y"))) %>%
  dplyr::select(cow, name, catyear, individualyear, inquiryyear, opcatyear) %>%
  mutate(catyear = ifelse(is.na(catyear), 0, catyear),
         individualyear = ifelse(is.na(individualyear), 0, individualyear),
         inquiryyear = ifelse(is.na(inquiryyear), 0, inquiryyear),
         opcatyear = ifelse(is.na(opcatyear), 0, opcatyear))

##########################
# Human rights outcome (Y)
##########################
# PTS on 201 states from 1976-2015, filled by order Amnesty > State > HRW
# Reverse code so that 4 = best, 0 = worst
ptscores <- read.csv("PTS2016.csv") %>%
  mutate(pts = ifelse(is.na(PTS_A),
                      ifelse(is.na(PTS_S),
                             ifelse(is.na(PTS_H), NA, PTS_H),
                             PTS_S), PTS_A)) %>%
  rename(cow = COW_Code_N, year = Year, region = Region) %>%
  dplyr::select(cow, year, pts) %>%
  mutate(pts = (5 - pts))

# HRP scores on 205 states from 1949-2013. High = good, low = bad.
hrscores <- read.csv("hrscores.csv") %>%
  rename(cow = COW, year = YEAR, hrs = latentmean) %>%
  dplyr::select(cow, year, hrs)

# CIRI data on 198 states from 1981-2011
# torture (0 = frequent torture, 2 = no torture)
ciri <- read.csv("CIRI.csv") %>%
  dplyr::select(cow = COW, year = YEAR, torture = TORT) %>%
  mutate(torture = ifelse(torture < 0, NA, torture))

#####################
# Time-invariant covariates (X)
#####################
# Legal origin and ratification rules in 187 states
# legal origins (1-English, 2-French, 4-German, 5-Scandinavian)
# ratification rules 1 (lowest hurdle) to 4 (highest)
legalratif <- read.csv("legalratif.csv") %>%
  mutate(cow, legor = as.factor(legor), ratifrule = as.factor(ratifrule)) %>%
  dplyr::select(-name)

# Electoral system in 196 states (24 missing)
# Parliamentary (2), Assembly-elected President (1), Presidential (0)
electoral <- read.csv("electoral.csv") %>%
  mutate(system = factor(system))

#####################################
# Combine A (treatments), X (time-invariants), Y (outcome)
#####################################
data <- ratifcow %>% left_join(ptscores, by = "cow") %>%
  left_join(hrscores, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(ciri, by = c("cow" = "cow", "year" = "year")) %>%
  filter(!is.na(pts) | !is.na(hrs) | !is.na(torture)) %>%
  mutate(reporting = ifelse(catyear == 0, 0, ifelse(year >= catyear, 1, 0)),
         inquiry = ifelse(inquiryyear == 0, 0, ifelse(year >= inquiryyear, 1, 0)),
         complaint = ifelse(individualyear == 0, 0, ifelse(year >= individualyear, 1, 0)),
         visit = ifelse(opcatyear == 0, 0, ifelse(year >= opcatyear, 1, 0))) %>%
  dplyr::select(-c(name, catyear, inquiryyear, individualyear, opcatyear)) %>%
  left_join(legalratif, by = "cow") %>%
  left_join(electoral, by = "cow")

###########
# Mediators (M)
###########
# Political constraints 208 states from 1800-2016
polcon <- read.csv("polcon2017.csv") %>%
  select(cow = ccode, year, polcon = polconiii)

# Latent Judicial Independence estimates 200 states from 1948-2012
lji <- read.csv("LJI.csv") %>%
  dplyr::select(cow = ccode, year, ji = LJI)

# NGOs data 199 states for 27 years (1981-2007)
mobilize <- read.dta("colecapacity.dta") %>%
  select(cow = id, year = year, nameshame = name_shame)

# Treaty commitment propensity 192 states from 1950-2008
socialize <- read.dta("lupucommitment.dta") %>%
  select(cow = ccode, year, coord1d)

#######################
# Time-varying confounders (W)
#######################
# Population in 185 states from 1966-2015
pop <- read.csv("popWDI.csv", header = TRUE, stringsAsFactors = FALSE) %>%
  gather(year, population, -c(name, cow), factor_key = TRUE) %>%
  mutate(year = as.integer(gsub("X", "", year)),
         population = as.numeric(population)) %>%
  dplyr::select(cow, year, population)

# GDP per capita in constant US dollars in 185 states from 1966-2015
GDPpc <- read.csv("GDPpcWDI.csv", header = TRUE, stringsAsFactors = FALSE) %>%
  gather(year, gdppc, -c(name, cow), factor_key = TRUE) %>%
  mutate(year = as.integer(gsub("X", "", year)), gdppc = as.numeric(gdppc)) %>%
  dplyr::select(cow, year, gdppc) %>%
  mutate(gdppc = ifelse(gdppc <= 0, NA, gdppc))

# Trade data 170 states from 1960-2015
trade <- read.csv("trade.csv") %>%
  dplyr::select(-c(Country.Name)) %>%
  gather(year, trade, -c(COW), factor_key = TRUE) %>%
  mutate(year = as.integer(gsub("X", "", year))) %>%
  dplyr::select(cow = COW, year, trade)

# Regime type and regime durability on 194 countries from 1800-2015
polity <- read.csv("polity2015.csv") %>%
  dplyr::select(year, cow = ccode, p4 = polity2, durable = durable)

# MID data on 172 states from 1983 to 2015
MID <- read.csv("MIDB.csv") %>%
  dplyr::select(cow = ccode, start = StYear, end = EndYear) %>%
  right_join(ptscores, by = c("cow")) %>%
  dplyr::select(cow, start, end, year) %>%
  filter(start > 1982) %>% filter(year > 1982) %>%
  mutate(dispute = ifelse(year >= start, ifelse(year <= end, 1, 0), 0)) %>%
  dplyr::select(cow, year, dispute) %>%
  distinct() %>% group_by(cow, year) %>% summarize(dispute = max(dispute))

# Combine data on 192 states from 1966-2013
data <- left_join(data, polcon, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(lji, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(mobilize, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(socialize, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(pop, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(GDPpc, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(trade, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(polity, by = c("cow" = "cow", "year" = "year")) %>%
  left_join(MID, by = c("cow" = "cow", "year" = "year")) %>%
  dplyr::select(cow, year,
                reporting, inquiry, complaint, visit,
                pts, hrs, torture,
                polcon, ji, nameshame, coord1d,
                legor, ratifrule, system,
                population, gdppc, trade, p4, durable, dispute) %>%
  filter(year > 1985)

# Number of complete cases before imputation
sum(complete.cases(data)) # 2302 complete cases out of 5609

# Save final data set as CSV
write.csv(data, "finaldata.csv", row.names = FALSE)

# Save workspace as an RData file
save.image("dataprocess.RData")
E.2 Multiple imputation

options(digits = 2)
options(dplyr.width = Inf)
options("scipen" = 100, "digits" = 4)
rm(list = ls())
cat("\014")

library(dplyr)     # Process data
library(tidyr)     # tidyr package to tidy data
library(foreign)   # Read Stata data
library(ggplot2)   # ggplot graphics
library(Amelia)    # Multiple imputation
library(lubridate) # Handle date data
library(psych)     # Descriptive statistics
library(stargazer) # LaTeX tables

# Summary statistics of raw data and impute GDP = min and conflict NA = 0
data <- read.csv("finaldata.csv")
stargazer(data)
describe(data)

# Multiple imputation using the Amelia package
set.seed(123)
mi.data <- amelia(data, m = 5, ts = "year", cs = "cow",
                  p2s = 2, polytime = 3,
                  logs = c("population", "gdppc", "trade"),
                  noms = c("legor", "ratifrule", "system", "dispute", "torture"),
                  ords = c("pts", "p4"),
                  emburn = c(50, 500), boot.type = "none",
                  bounds = rbind(c(7, 0, 4), c(8, -2.9, 4.7), c(9, 0, 2),
                                 c(10, 0, 1), c(11, 0, 1), c(12, -1.5, 26.3),
                                 c(13, -1, 1), c(20, -10, 10), c(21, 1, 206)))
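For reference, each row of the `bounds` matrix passed to `amelia()` gives a column index into `data` together with the lower and upper bound imposed on imputed draws for that column. A minimal shape check (first two rows reproduced here purely for illustration):

```r
# Each bounds row is c(column index, lower bound, upper bound);
# Amelia restricts imputed draws of that column to the given range.
b <- rbind(c(7, 0, 4),
           c(8, -2.9, 4.7))
stopifnot(ncol(b) == 3)          # index, lower, upper
stopifnot(all(b[, 2] < b[, 3]))  # every lower bound sits below its upper bound
```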

# Write imputed data sets into CSV files
save(mi.data, file = "midata.RData")
write.amelia(obj = mi.data, file.stem = "midata", row.names = FALSE)

# Create missingness map and diagnostics
missmap(mi.data)
summary(mi.data)

# Stack all five imputed data sets and export them into one CSV file
data1 <- read.csv("midata1.csv")
data2 <- read.csv("midata2.csv")
data3 <- read.csv("midata3.csv")
data4 <- read.csv("midata4.csv")
data5 <- read.csv("midata5.csv")
stackdata <- rbind(data1, data2, data3, data4, data5)
write.csv(stackdata, file = "stackeddata.csv", row.names = FALSE)

# Save the workspace in an RData file
save.image("imputation.RData")
E.3 Super Learner-based Targeted Maximum Likelihood Estimation

#################################
# Estimating Causal Effects of Monitoring Procedures
# Using Super Learner-based TMLE
#################################
options(digits = 5)
options(dplyr.width = Inf)
rm(list = ls())
cat("\014")
options("scipen" = 100, "digits" = 5)

# Load packages
library(dplyr)        # Manage data
library(ggplot2)      # Visualize data
library(ggthemes)     # Use various themes in ggplot
library(SuperLearner) # Super Learner predictive method
library(tmle)         # TMLE method
library(gam)          # Algorithm used within TMLE
library(glmnet)       # Algorithm used within TMLE
library(randomForest) # Algorithm used within TMLE
library(polspline)    # Algorithm used within TMLE
library(gbm)          # Algorithm used within TMLE
library(xgboost)      # Algorithm used within TMLE
library(xtable)       # Create LaTeX tables
library(RhpcBLASctl)  # Multicore
library(parallel)     # Parallel computing
library(Amelia)       # Combine estimates

# Create Super Learner library
SL.library <- c("SL.glm", "SL.glmnet",
                "SL.polymars", "SL.gam",
                "SL.randomForest", "SL.xgboost")

# Set multicore-compatible seed
set.seed(1, "L'Ecuyer-CMRG")

# Set up parallel computation: use all cores on the machine
num_cores = RhpcBLASctl::get_num_cores()

# Use all of those cores for parallel SuperLearner
options(mc.cores = num_cores)

# Check how many parallel workers we are using
getOption("mc.cores")

# Create function rescaling the outcome into [0, 1]
std <- function(x) {
  x = (x - min(x))/(max(x) - min(x))
}
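As a quick sanity check on this min-max rescaler (a sketch; note that `std()` returns NaN for a constant vector, a case that does not arise for the variables rescaled here):

```r
# Min-max rescaling as in std() above: the output spans exactly [0, 1]
# and preserves the ordering of the input.
std <- function(x) (x - min(x)) / (max(x) - min(x))
x <- c(2, 5, 11)
z <- std(x)
stopifnot(min(z) == 0, max(z) == 1)
stopifnot(identical(order(z), order(x)))
```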

#########################
# Read stacked data sets
data <- read.csv("stackeddata.csv")
d <- split(data, rep(1:5, each = nrow(data)/5))

# Create holders for TMLE estimates for each imputed dataset
tmle_pts <- data.frame(matrix(NA, nrow = 10, ncol = 4))
tmle_hrs <- data.frame(matrix(NA, nrow = 10, ncol = 4))
tmle_ciri <- data.frame(matrix(NA, nrow = 10, ncol = 4))
for (m in 1:5) {

  # Transform variables
  d[[m]] <- d[[m]] %>%
    mutate(pts = std(pts), hrs = std(hrs), torture = std(torture),
           polcon = std(polcon), ji = std(ji),
           nameshame = std(nameshame), coord1d = std(coord1d),
           legor = ifelse(legor == 1, 1, 0),
           ratifrule = std(ratifrule), system = ifelse(system == 0, 0, 1),
           population = std(population), gdppc = std(gdppc),
           trade = std(trade), p4 = std(p4), durable = std(durable)) %>%
    group_by(cow) %>%
    mutate(lagreporting = lag(reporting), laginquiry = lag(inquiry),
           lagcomplaint = lag(complaint), lagvisit = lag(visit),
           lagpts = lag(pts), laghrs = lag(hrs), lagtorture = lag(torture),
           lagpolcon = lag(polcon), lagji = lag(ji),
           lagnameshame = lag(nameshame), lagcoord1d = lag(coord1d))

  # Create PTS dataset
  data_pts <- d[[m]] %>%
    dplyr::select(c(reporting, inquiry, complaint, visit, pts,
                    legor, ratifrule, system,
                    population, gdppc, trade, p4, durable, dispute,
                    lagreporting, laginquiry, lagcomplaint, lagvisit, lagpts,
                    lagpolcon, lagji, lagnameshame, lagcoord1d,
                    year, cow)) %>% filter(year > 1986) %>% na.omit()

  # Estimate the causal effect of reporting on PTS
  Y <- data_pts$pts
  A <- data_pts$reporting
  W <- data.frame(dplyr::select(data_pts, c(legor, ratifrule, system,
                                            lagreporting,
                                            lagpolcon, lagji, lagnameshame, lagcoord1d,
                                            lagpts,
                                            population, gdppc, trade, p4, durable,
                                            dispute, cow))) %>% dplyr::select(-cow)
  tmle_reporting <- tmle(Y, A, W,
                         Qbounds = c(0, 1), Q.SL.library = SL.library,
                         gbound = 1e-2, g.SL.library = SL.library,
                         family = "gaussian", fluctuation = "logistic",
                         verbose = TRUE)
  tmle_pts[(2*m - 1):(2*m), 1] <- c(tmle_reporting$estimates$ATE$psi,
                                    sqrt(tmle_reporting$estimates$ATE$var.psi))
  print(c(m, "PTS", "reporting"))

  # Estimate the causal effect of inquiry on PTS
  Y <- data_pts$pts
  A <- data_pts$inquiry
  W <- data.frame(dplyr::select(data_pts, c(legor, ratifrule, system,
                                            laginquiry,
                                            lagpolcon, lagji, lagnameshame, lagcoord1d,
                                            lagpts,
                                            reporting,
                                            population, gdppc, trade, p4, durable,
                                            dispute, cow))) %>% dplyr::select(-cow)
  tmle_inquiry <- tmle(Y, A, W,
                       Qbounds = c(0, 1), Q.SL.library = SL.library,
                       gbound = 1e-2, g.SL.library = SL.library,
                       family = "gaussian", fluctuation = "logistic",
                       verbose = TRUE)
  tmle_pts[(2*m - 1):(2*m), 2] <- c(tmle_inquiry$estimates$ATE$psi,
                                    sqrt(tmle_inquiry$estimates$ATE$var.psi))
  print(c(m, "PTS", "inquiry"))

  # Estimate the causal effect of complaint on PTS
  Y <- data_pts$pts
  A <- data_pts$complaint
  W <- data.frame(dplyr::select(data_pts, c(legor, ratifrule, system,
                                            lagcomplaint,
                                            lagpolcon, lagji, lagnameshame, lagcoord1d,
                                            lagpts,
                                            reporting,
                                            population, gdppc, trade, p4, durable,
                                            dispute, cow))) %>% dplyr::select(-cow)
  tmle_complaint <- tmle(Y, A, W,
                         Qbounds = c(0, 1), Q.SL.library = SL.library,
                         gbound = 1e-2, g.SL.library = SL.library,
                         family = "gaussian", fluctuation = "logistic",
                         verbose = TRUE)
  tmle_pts[(2*m - 1):(2*m), 3] <- c(tmle_complaint$estimates$ATE$psi,
                                    sqrt(tmle_complaint$estimates$ATE$var.psi))
  print(c(m, "PTS", "complaint"))

  # Estimate the causal effect of visit on PTS
  data_pts2 <- data_pts %>% filter(year > 2005)
  Y <- data_pts2$pts
  A <- data_pts2$visit
  W <- data.frame(dplyr::select(data_pts2, c(legor, ratifrule, system,
                                             lagvisit,
                                             lagpolcon, lagji, lagnameshame, lagcoord1d,
                                             lagpts,
                                             reporting,
                                             population, gdppc, trade, p4, durable,
                                             dispute, cow))) %>% dplyr::select(-cow)
  tmle_visit <- tmle(Y, A, W,
                     Qbounds = c(0, 1), Q.SL.library = SL.library,
                     gbound = 1e-2, g.SL.library = SL.library,
                     family = "gaussian", fluctuation = "logistic",
                     verbose = TRUE)
  tmle_pts[(2*m - 1):(2*m), 4] <- c(tmle_visit$estimates$ATE$psi,
                                    sqrt(tmle_visit$estimates$ATE$var.psi))
  print(c(m, "PTS", "visit"))

  # Create HRS dataset
  data_hrs <- d[[m]] %>%
    dplyr::select(c(reporting, inquiry, complaint, visit,
                    hrs,
                    legor, ratifrule, system,
                    population, gdppc, trade, p4, durable, dispute,
                    lagreporting, laginquiry, lagcomplaint, lagvisit,
                    laghrs, lagpolcon, lagji, lagnameshame, lagcoord1d,
                    cow, year)) %>% filter(year < 2014 & year > 1986) %>% na.omit()

  # Estimate the causal effect of reporting on HRS
  Y <- data_hrs$hrs
  A <- data_hrs$reporting
  W <- data.frame(dplyr::select(data_hrs, c(legor, ratifrule, system,
                                            lagreporting,
                                            lagpolcon, lagji, lagnameshame, lagcoord1d,
                                            laghrs,
                                            population, gdppc, trade, p4, durable,
                                            dispute, cow))) %>% dplyr::select(-cow)
  tmle_reporting <- tmle(Y, A, W,
                         Qbounds = c(0, 1), Q.SL.library = SL.library,
                         gbound = 1e-2, g.SL.library = SL.library,
                         family = "gaussian", fluctuation = "logistic",
                         verbose = TRUE)
  tmle_hrs[(2*m - 1):(2*m), 1] <- c(tmle_reporting$estimates$ATE$psi,
                                    sqrt(tmle_reporting$estimates$ATE$var.psi))
  print(c(m, "HRS", "reporting"))

  # Estimate the causal effect of inquiry on HRS
  Y <- data_hrs$hrs
  A <- data_hrs$inquiry
  W <- data.frame(dplyr::select(data_hrs, c(legor, ratifrule, system,
                                            laginquiry,
                                            lagpolcon, lagji, lagnameshame, lagcoord1d,
                                            laghrs,
                                            reporting,
                                            population, gdppc, trade, p4, durable,
                                            dispute, cow))) %>% dplyr::select(-cow)
  tmle_inquiry <- tmle(Y, A, W,
                       Qbounds = c(0, 1), Q.SL.library = SL.library,
                       gbound = 1e-2, g.SL.library = SL.library,
                       family = "gaussian", fluctuation = "logistic",
                       verbose = TRUE)
  tmle_hrs[(2*m - 1):(2*m), 2] <- c(tmle_inquiry$estimates$ATE$psi,
                                    sqrt(tmle_inquiry$estimates$ATE$var.psi))
  print(c(m, "HRS", "inquiry"))

  # Estimate the causal effect of complaint on HRS
  Y <- data_hrs$hrs
  A <- data_hrs$complaint
  W <- data.frame(dplyr::select(data_hrs, c(legor, ratifrule, system,
                                            lagcomplaint,
                                            lagpolcon, lagji, lagnameshame, lagcoord1d,
                                            laghrs,
                                            reporting,
                                            population, gdppc, trade, p4, durable,
                                            dispute, cow))) %>% dplyr::select(-cow)
  tmle_complaint <- tmle(Y, A, W,
                         Qbounds = c(0, 1), Q.SL.library = SL.library,
                         gbound = 1e-2, g.SL.library = SL.library,
                         family = "gaussian", fluctuation = "logistic",
                         verbose = TRUE)
  tmle_hrs[(2*m - 1):(2*m), 3] <- c(tmle_complaint$estimates$ATE$psi,
                                    sqrt(tmle_complaint$estimates$ATE$var.psi))
  print(c(m, "HRS", "complaint"))

  # Estimate the causal effect of visit on HRS
  data_hrs2 <- data_hrs %>% filter(year > 2005)
  Y <- data_hrs2$hrs
  A <- data_hrs2$visit
  W <- data.frame(dplyr::select(data_hrs2, c(legor, ratifrule, system,
                                             lagvisit,
                                             lagpolcon, lagji, lagnameshame, lagcoord1d,
                                             laghrs,
                                             reporting,
                                             population, gdppc, trade, p4, durable,
                                             dispute, cow))) %>% dplyr::select(-cow)
  tmle_visit <- tmle(Y, A, W,
                     Qbounds = c(0, 1), Q.SL.library = SL.library,
                     gbound = 1e-2, g.SL.library = SL.library,
                     family = "gaussian", fluctuation = "logistic",
                     verbose = TRUE)
  tmle_hrs[(2*m - 1):(2*m), 4] <- c(tmle_visit$estimates$ATE$psi,
                                    sqrt(tmle_visit$estimates$ATE$var.psi))
  print(c(m, "HRS", "visit"))

  # Create CIRI dataset
  data_ciri <- d[[m]] %>%
    dplyr::select(c(reporting, inquiry, complaint, visit,
                    torture,
                    legor, ratifrule, system,
                    population, gdppc, trade, p4, durable, dispute,
                    lagreporting, laginquiry, lagcomplaint, lagvisit,
                    lagtorture, lagpolcon, lagji, lagnameshame, lagcoord1d,
                    cow, year)) %>%
    filter(year < 2012, year > 1986) %>% na.omit()

  # Estimate the causal effect of reporting on CIRI
  Y <- data_ciri$torture
  A <- data_ciri$reporting
  W <- data.frame(dplyr::select(data_ciri, c(legor, ratifrule, system,
                                             lagreporting,
                                             lagpolcon, lagji, lagnameshame, lagcoord1d,
                                             lagtorture,
                                             population, gdppc, trade, p4, durable,
                                             dispute, cow))) %>% dplyr::select(-cow)
  tmle_reporting <- tmle(Y, A, W,
                         Qbounds = c(0, 1), Q.SL.library = SL.library,
                         gbound = 1e-2, g.SL.library = SL.library,
                         family = "gaussian", fluctuation = "logistic",
                         verbose = TRUE)
  tmle_ciri[(2*m - 1):(2*m), 1] <- c(tmle_reporting$estimates$ATE$psi,
                                     sqrt(tmle_reporting$estimates$ATE$var.psi))
  print(c(m, "CIRI", "reporting"))

  # Estimate the causal effect of inquiry on CIRI
  Y <- data_ciri$torture
  A <- data_ciri$inquiry
  W <- data.frame(dplyr::select(data_ciri, c(legor, ratifrule, system,
                                             laginquiry,
                                             lagpolcon, lagji, lagnameshame, lagcoord1d,
                                             lagtorture,
                                             reporting,
                                             population, gdppc, trade, p4, durable,
                                             dispute, cow))) %>% dplyr::select(-cow)
  tmle_inquiry <- tmle(Y, A, W,
                       Qbounds = c(0, 1), Q.SL.library = SL.library,
                       gbound = 1e-2, g.SL.library = SL.library,
                       family = "gaussian", fluctuation = "logistic",
                       verbose = TRUE)
  tmle_ciri[(2*m - 1):(2*m), 2] <- c(tmle_inquiry$estimates$ATE$psi,
                                     sqrt(tmle_inquiry$estimates$ATE$var.psi))
  print(c(m, "CIRI", "inquiry"))

  # Estimate the causal effect of complaint on CIRI
  Y <- data_ciri$torture
  A <- data_ciri$complaint
  W <- data.frame(dplyr::select(data_ciri, c(legor, ratifrule, system,
                                             lagcomplaint,
                                             lagpolcon, lagji, lagnameshame, lagcoord1d,
                                             lagtorture,
                                             reporting,
                                             population, gdppc, trade, p4, durable,
                                             dispute, cow))) %>% dplyr::select(-cow)
  tmle_complaint <- tmle(Y, A, W,
                         Qbounds = c(0, 1), Q.SL.library = SL.library,
                         gbound = 1e-2, g.SL.library = SL.library,
                         family = "gaussian", fluctuation = "logistic",
                         verbose = TRUE)
  tmle_ciri[(2*m - 1):(2*m), 3] <- c(tmle_complaint$estimates$ATE$psi,
                                     sqrt(tmle_complaint$estimates$ATE$var.psi))
  print(c(m, "CIRI", "complaint"))

  # Estimate the causal effect of visit on CIRI
  data_ciri2 <- data_ciri %>% filter(year > 2005)
  Y <- data_ciri2$torture
  A <- data_ciri2$visit
  W <- data.frame(dplyr::select(data_ciri2, c(legor, ratifrule, system,
                                              lagvisit,
                                              lagpolcon, lagji, lagnameshame, lagcoord1d,
                                              lagtorture,
                                              reporting,
                                              population, gdppc, trade, p4, durable,
                                              dispute, cow))) %>% dplyr::select(-cow)
  tmle_visit <- tmle(Y, A, W,
                     Qbounds = c(0, 1), Q.SL.library = SL.library,
                     gbound = 1e-2, g.SL.library = SL.library,
                     family = "gaussian", fluctuation = "logistic",
                     verbose = TRUE)
  tmle_ciri[(2*m - 1):(2*m), 4] <- c(tmle_visit$estimates$ATE$psi,
                                     sqrt(tmle_visit$estimates$ATE$var.psi))
  print(c(m, "CIRI", "visit"))
}

# Combine TMLE estimates from the 5 imputed datasets
procedure <- c("Reporting", "Inquiry", "Complaint", "Visit")
procedure_pts <- data.frame(mi.meld(q = tmle_pts[c(1, 3, 5, 7, 9), ],
                                    se = tmle_pts[c(2, 4, 6, 8, 10), ],
                                    byrow = TRUE))
result_pts <- data.frame(cbind(t(procedure_pts[, 1:4]),
                               t(procedure_pts[, 5:8]))) %>%
  mutate(procedure = procedure, mean = X1, sd = X2,
         lower = X1 - 1.96*X2, upper = X1 + 1.96*X2) %>%
  dplyr::select(procedure, mean, lower, upper)

procedure_hrs <- data.frame(mi.meld(q = tmle_hrs[c(1, 3, 5, 7, 9), ],
                                    se = tmle_hrs[c(2, 4, 6, 8, 10), ],
                                    byrow = TRUE))
result_hrs <- data.frame(cbind(t(procedure_hrs[, 1:4]),
                               t(procedure_hrs[, 5:8]))) %>%
  mutate(procedure = procedure, mean = X1, sd = X2,
         lower = X1 - 1.96*X2, upper = X1 + 1.96*X2) %>%
  dplyr::select(procedure, mean, lower, upper)

procedure_ciri <- data.frame(mi.meld(q = tmle_ciri[c(1, 3, 5, 7, 9), ],
                                     se = tmle_ciri[c(2, 4, 6, 8, 10), ],
                                     byrow = TRUE))
result_ciri <- data.frame(cbind(t(procedure_ciri[, 1:4]),
                                t(procedure_ciri[, 5:8]))) %>%
  mutate(procedure = procedure, mean = X1, sd = X2,
         lower = X1 - 1.96*X2, upper = X1 + 1.96*X2) %>%
  dplyr::select(procedure, mean, lower, upper)

datasets <- c("PTS", "HRS", "CIRI")
effect <- data.frame(rbind(result_pts, result_hrs, result_ciri))
xtable(effect, digits = c(rep(3, 5)))
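`mi.meld()` combines the per-imputation estimates via Rubin's rules: the combined point estimate is the mean across imputations, and the combined variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). A hand-rolled sketch with illustrative numbers (not results from the analysis):

```r
# Rubin's rules for one parameter across m = 5 imputations
# (illustrative values only)
q  <- c(0.10, 0.12, 0.09, 0.11, 0.13)  # per-imputation point estimates
se <- c(0.04, 0.05, 0.04, 0.05, 0.04)  # per-imputation standard errors
m  <- length(q)
qbar    <- mean(q)                          # combined point estimate
total   <- mean(se^2) + (1 + 1/m) * var(q)  # within + inflated between variance
se_comb <- sqrt(total)
stopifnot(abs(qbar - 0.11) < 1e-12)
stopifnot(se_comb >= sqrt(mean(se^2)))  # combining never shrinks the SE
```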

save.image("SL-basedTMLE.RData")
E.4 Generalized Linear Models

##################################
# Estimating Causal Effects of Monitoring Procedures
# Using Linear Models of Human Rights Outcomes
##################################
options(digits = 5)
options(dplyr.width = Inf)
rm(list = ls())
cat("\014")
options("scipen" = 100, "digits" = 5)

# Load packages
library(dplyr)       # Manage data
library(ggplot2)     # Visualize data
library(ggthemes)    # Use various themes in ggplot
library(xtable)      # Create LaTeX tables
library(RhpcBLASctl) # Multicore
library(parallel)    # Parallel computing
library(Amelia)      # Combine estimates
library(arm)         # se.coef() for GLM standard errors

# Set multicore-compatible seed
set.seed(1, "L'Ecuyer-CMRG")

# Set up parallel computation: use all cores on the machine
num_cores = RhpcBLASctl::get_num_cores()
options(mc.cores = num_cores)

# Check how many parallel workers we are using
getOption("mc.cores")

# Create function rescaling the outcome into [0, 1]
std <- function(x) {
  x = (x - min(x))/(max(x) - min(x))
}
###############################
# Read stacked data sets
data <- read.csv("stackeddata.csv")
d <- split(data, rep(1:5, each = nrow(data)/5))

# Create holders for GLM estimates for each imputed dataset
glm_pts <- data.frame(matrix(NA, nrow = 10, ncol = 4))
glm_hrs <- data.frame(matrix(NA, nrow = 10, ncol = 4))
glm_ciri <- data.frame(matrix(NA, nrow = 10, ncol = 4))

for (m in 1:5) {

  # Transform variables
  d[[m]] <- d[[m]] %>%
    mutate(pts = std(pts), hrs = std(hrs), torture = std(torture),
           polcon = std(polcon), ji = std(ji),
           nameshame = std(nameshame), coord1d = std(coord1d),
           legor = ifelse(legor == 1, 1, 0),
           ratifrule = std(ratifrule), system = ifelse(system == 0, 0, 1),
           population = std(population), gdppc = std(gdppc),
           trade = std(trade), p4 = std(p4), durable = std(durable)) %>%
    group_by(cow) %>%
    mutate(lagreporting = lag(reporting), laginquiry = lag(inquiry),
           lagcomplaint = lag(complaint), lagvisit = lag(visit),
           lagpts = lag(pts), laghrs = lag(hrs), lagtorture = lag(torture),
           lagpolcon = lag(polcon), lagji = lag(ji),
           lagnameshame = lag(nameshame), lagcoord1d = lag(coord1d))

  # Create PTS dataset
  data_pts <- d[[m]] %>%
    dplyr::select(c(reporting, inquiry, complaint, visit, pts,
                    legor, ratifrule, system,
                    population, gdppc, trade, p4, durable, dispute,
                    lagreporting, laginquiry, lagcomplaint, lagvisit, lagpts,
                    lagpolcon, lagji, lagnameshame, lagcoord1d,
                    year, cow)) %>% filter(year > 1986) %>% na.omit()

  # Estimate the causal effect of reporting on PTS
  glm_reporting <- glm(pts ~ reporting + lagreporting +
                         legor + ratifrule + system +
                         population + gdppc + trade + p4 + durable + dispute +
                         lagpts +
                         lagpolcon + lagji + lagnameshame + lagcoord1d,
                       data = data_pts)
  glm_pts[(2*m - 1):(2*m), 1] <- c(coef(glm_reporting)["reporting"],
                                   se.coef(glm_reporting)["reporting"])
  print(c(m, "PTS", "reporting"))

  # Estimate the causal effect of inquiry on PTS
  glm_inquiry <- glm(pts ~ inquiry + laginquiry +
                       legor + ratifrule + system +
                       population + gdppc + trade + p4 + durable + dispute +
                       lagpts + reporting +
                       lagpolcon + lagji + lagnameshame + lagcoord1d,
                     data = data_pts)
  glm_pts[(2*m - 1):(2*m), 2] <- c(coef(glm_inquiry)["inquiry"],
                                   se.coef(glm_inquiry)["inquiry"])
  print(c(m, "PTS", "inquiry"))

  # Estimate the causal effect of complaint on PTS
  glm_complaint <- glm(pts ~ complaint + lagcomplaint +
                         legor + ratifrule + system +
                         population + gdppc + trade + p4 + durable + dispute +
                         lagpts + reporting +
                         lagpolcon + lagji + lagnameshame + lagcoord1d,
                       data = data_pts)
  glm_pts[(2*m - 1):(2*m), 3] <- c(coef(glm_complaint)["complaint"],
                                   se.coef(glm_complaint)["complaint"])
  print(c(m, "PTS", "complaint"))

  # Estimate the causal effect of visit on PTS
  data_pts2 <- data_pts %>% filter(year > 2005)
  glm_visit <- glm(pts ~ visit + lagvisit +
                     legor + ratifrule + system +
                     population + gdppc + trade + p4 + durable + dispute +
                     lagpts + reporting +
                     lagpolcon + lagji + lagnameshame + lagcoord1d,
                   data = data_pts2)
  glm_pts[(2*m - 1):(2*m), 4] <- c(coef(glm_visit)["visit"],
                                   se.coef(glm_visit)["visit"])
  print(c(m, "PTS", "visit"))
  # Create HRS dataset
  data_hrs <- d[[m]] %>%
    dplyr::select(c(reporting, inquiry, complaint, visit,
                    hrs,
                    legor, ratifrule, system,
                    population, gdppc, trade, p4, durable, dispute,
                    lagreporting, laginquiry, lagcomplaint, lagvisit,
                    laghrs, lagpolcon, lagji, lagnameshame, lagcoord1d,
                    cow, year)) %>% filter(year < 2014 & year > 1986) %>% na.omit()

  # Estimate the causal effect of reporting on HRS
  glm_reporting <- glm(hrs ~ reporting + lagreporting +
                         legor + ratifrule + system +
                         population + gdppc + trade + p4 + durable + dispute +
                         laghrs +
                         lagpolcon + lagji + lagnameshame + lagcoord1d,
                       data = data_hrs)
  glm_hrs[(2*m - 1):(2*m), 1] <- c(coef(glm_reporting)["reporting"],
                                   se.coef(glm_reporting)["reporting"])
  print(c(m, "HRS", "reporting"))

  # Estimate the causal effect of inquiry on HRS
  glm_inquiry <- glm(hrs ~ inquiry + laginquiry +
                       legor + ratifrule + system +
                       population + gdppc + trade + p4 + durable + dispute +
                       laghrs + reporting +
                       lagpolcon + lagji + lagnameshame + lagcoord1d,
                     data = data_hrs)
  glm_hrs[(2*m - 1):(2*m), 2] <- c(coef(glm_inquiry)["inquiry"],
                                   se.coef(glm_inquiry)["inquiry"])
  print(c(m, "HRS", "inquiry"))

  # Estimate the causal effect of complaint on HRS
  glm_complaint <- glm(hrs ~ complaint + lagcomplaint +
                         legor + ratifrule + system +
                         population + gdppc + trade + p4 + durable + dispute +
                         laghrs + reporting +
                         lagpolcon + lagji + lagnameshame + lagcoord1d,
                       data = data_hrs)
  glm_hrs[(2*m - 1):(2*m), 3] <- c(coef(glm_complaint)["complaint"],
                                   se.coef(glm_complaint)["complaint"])
  print(c(m, "HRS", "complaint"))

  # Estimate the causal effect of visit on HRS
  data_hrs2 <- data_hrs %>% filter(year > 2005)
  glm_visit <- glm(hrs ~ visit + lagvisit +
                     legor + ratifrule + system +
                     population + gdppc + trade + p4 + durable + dispute +
                     laghrs + reporting +
                     lagpolcon + lagji + lagnameshame + lagcoord1d,
                   data = data_hrs2)
  glm_hrs[(2*m - 1):(2*m), 4] <- c(coef(glm_visit)["visit"],
                                   se.coef(glm_visit)["visit"])
  print(c(m, "HRS", "visit"))
  # Create CIRI dataset
  data_ciri <- d[[m]] %>%
    dplyr::select(c(reporting, inquiry, complaint, visit,
                    torture,
                    legor, ratifrule, system,
                    population, gdppc, trade, p4, durable, dispute,
                    lagreporting, laginquiry, lagcomplaint, lagvisit,
                    lagtorture, lagpolcon, lagji, lagnameshame, lagcoord1d,
                    cow, year)) %>%
    filter(year < 2012, year > 1986) %>% na.omit()

  # Estimate the causal effect of reporting on CIRI
  glm_reporting <- glm(torture ~ reporting + lagreporting +
                         legor + ratifrule + system +
                         population + gdppc + trade + p4 + durable + dispute +
                         lagtorture +
                         lagpolcon + lagji + lagnameshame + lagcoord1d,
                       data = data_ciri)
  glm_ciri[(2*m - 1):(2*m), 1] <- c(coef(glm_reporting)["reporting"],
                                    se.coef(glm_reporting)["reporting"])
  print(c(m, "CIRI", "reporting"))

  # Estimate the causal effect of inquiry on CIRI
  glm_inquiry <- glm(torture ~ inquiry + laginquiry +
                       legor + ratifrule + system +
                       population + gdppc + trade + p4 + durable + dispute +
                       lagtorture + reporting +
                       lagpolcon + lagji + lagnameshame + lagcoord1d,
                     data = data_ciri)
  glm_ciri[(2*m - 1):(2*m), 2] <- c(coef(glm_inquiry)["inquiry"],
                                    se.coef(glm_inquiry)["inquiry"])
  print(c(m, "CIRI", "inquiry"))

  # Estimate the causal effect of complaint on CIRI
  glm_complaint <- glm(torture ~ complaint + lagcomplaint +
                         legor + ratifrule + system +
                         population + gdppc + trade + p4 + durable + dispute +
                         lagtorture + reporting +
                         lagpolcon + lagji + lagnameshame + lagcoord1d,
                       data = data_ciri)
  glm_ciri[(2*m - 1):(2*m), 3] <- c(coef(glm_complaint)["complaint"],
                                    se.coef(glm_complaint)["complaint"])
  print(c(m, "CIRI", "complaint"))

  # Estimate the causal effect of visit on CIRI
  data_ciri2 <- data_ciri %>% filter(year > 2005)
  glm_visit <- glm(torture ~ visit + lagvisit +
                     legor + ratifrule + system +
                     population + gdppc + trade + p4 + durable + dispute +
                     lagtorture + reporting +
                     lagpolcon + lagji + lagnameshame + lagcoord1d,
                   data = data_ciri2)
  glm_ciri[(2*m - 1):(2*m), 4] <- c(coef(glm_visit)["visit"],
                                    se.coef(glm_visit)["visit"])
  print(c(m, "CIRI", "visit"))
}

# Combine GLM estimates from the 5 imputed datasets
procedure <- c("Reporting", "Inquiry", "Complaint", "Visit")
procedure_pts <- data.frame(mi.meld(q = glm_pts[c(1, 3, 5, 7, 9), ],
                                    se = glm_pts[c(2, 4, 6, 8, 10), ],
                                    byrow = TRUE))
result_pts <- data.frame(cbind(t(procedure_pts[, 1:4]),
                               t(procedure_pts[, 5:8]))) %>%
  mutate(procedure = procedure, mean = X1, sd = X2,
         lower = X1 - 1.96*X2, upper = X1 + 1.96*X2) %>%
  dplyr::select(procedure, mean, lower, upper)

procedure_hrs <- data.frame(mi.meld(q = glm_hrs[c(1, 3, 5, 7, 9), ],
                                    se = glm_hrs[c(2, 4, 6, 8, 10), ],
                                    byrow = TRUE))
result_hrs <- data.frame(cbind(t(procedure_hrs[, 1:4]),
                               t(procedure_hrs[, 5:8]))) %>%
  mutate(procedure = procedure, mean = X1, sd = X2,
         lower = X1 - 1.96*X2, upper = X1 + 1.96*X2) %>%
  dplyr::select(procedure, mean, lower, upper)

procedure_ciri <- data.frame(mi.meld(q = glm_ciri[c(1, 3, 5, 7, 9), ],
                                     se = glm_ciri[c(2, 4, 6, 8, 10), ],
                                     byrow = TRUE))
result_ciri <- data.frame(cbind(t(procedure_ciri[, 1:4]),
                                t(procedure_ciri[, 5:8]))) %>%
  mutate(procedure = procedure, mean = X1, sd = X2,
         lower = X1 - 1.96*X2, upper = X1 + 1.96*X2) %>%
  dplyr::select(procedure, mean, lower, upper)

datasets <- c("PTS", "HRS", "CIRI")
effect <- data.frame(rbind(result_pts, result_hrs, result_ciri))
xtable(effect, digits = c(rep(3, 5)))

save.image("GLM.RData")
