Abstract
Making causal claims is central to evaluation practice because we want to know the effects of a
program, project, or policy. In the past decade, the conversation about establishing causal claims has
become prominent (and problematic). In response to this changing conversation about causality, we
argue that evaluators need to take up some new ways of thinking about and examining causal claims
in their practices, including (1) being responsive to the situation and intervention, (2) building relevant
and defensible causal arguments, (3) being literate in multiple ways of thinking about causality,
(4) being familiar with a range of causal designs and methods, (5) layering theories to explain causality
at multiple levels, and (6) justifying the causal approach taken to multiple audiences. Drawing on
recent literature, we discuss why and how evaluators can take up each of these ideas in practice. We
conclude with considerations for evaluator training and future research.
Keywords
causality, outcomes, impact evaluation, causal pluralism
Introduction
Evaluators routinely make causal claims about social and educational policies and programs. While
evaluators may not use the language of cause and effect, they use a wide range of terms that refer to the
relationship between an intervention and the changes brought about by it, such as
outcomes, impacts, consequences, results, and differences. Regardless of the term(s) used, most
evaluators would agree that making causal claims is central to practicing evaluation.
Across a wide variety of settings from grassroots nonprofit organizations to university grant-
funded programs to national policy initiatives, evaluators are increasingly facing pressure to make
causal claims and to defend these claims. Recent international trends toward evidence-based policy
making and results-oriented management have led to a surge in commissioning evaluations that
investigate the causal relationships between interventions and their consequences. There are often
1 University of Illinois, Urbana–Champaign, IL, USA
2 The University of Auckland, Auckland, New Zealand
Corresponding Author:
Emily Gates, University of Illinois at Urbana-Champaign, 1310 South Sixth Street, Champaign, IL 61820, USA.
Email: emilygat@gmail.com
30 American Journal of Evaluation 38(1)
significant stakes in the use of these claims. Evaluation commissioners may use the findings to
inform decisions about which interventions are funded, modified, or discontinued. Evaluation
stakeholders—both those in favor of and those in opposition to an intervention—may use these
findings to bolster their arguments for or against the intervention. Those involved in and affected by
the intervention may look to these findings to provide evidence of whether and how they are (or are
not) making a difference in the lives of intended beneficiaries. And the wider public may take
interest in and draw conclusions about the extent to which these interventions and the government
agencies, foundations, and development organizations that fund them are serving the interests of
intended beneficiaries and achieving the social changes they claim to advance.
Despite the importance of causal claims, the ways in which evaluators should establish evidence
in support of these causal claims are heavily contested in evaluation. While answering causal
questions is a core mandate of the evaluation field and has been since its origins (Mayne, 2011;
Picciotto, 2012), how evaluators warrant causal claims has recently come under considerable
scrutiny and debate (Cook, Scriven, Coryn, & Evergreen, 2010; Donaldson, Christie, & Mark, 2009;
Picciotto, 2013; Scriven, 2008; Stern, Andersen, & Hansen, 2013). The issue of causality in
evaluation is complicated by the lack of agreement in philosophy of science about the nature of causality
and broader disagreements in the social sciences about how causal claims ought to be warranted. For
example, the philosopher of science Nancy Cartwright (2007) points out, ‘‘nowadays causality is
back, and with a vengeance . . . methodologists and philosophers are suddenly in intense dispute
about what these kinds of claims can mean and how to test them’’ (p. 1). Warranting causal claims in
evaluation poses an ongoing challenge for the evaluation field and for evaluators who are being
routinely called on to substantiate such claims in the evaluations they conduct.
Our aim in this article is not to try to settle the definitional matter of the nature and meaning of
causality or to evaluate the merits of methodological and epistemological arguments for establishing
the strongest evidence on behalf of causal hypotheses. Rather, we take up the issue of warranting
causal claims as a practical problem that requires evaluators to utilize their professional judgment to
make decisions in particular circumstances. We consider warranting causal claims to be a practical
problem because the appropriate and feasible causal questions, designs, and methods may be unclear
and multiple options may be possible; yet, evaluators need to make decisions about how to warrant
causal claims in response to particular intervention(s), circumstances, and stakeholders (Schwandt,
2014). Schwandt (2014) argues that theoretical knowledge can serve as ‘‘an aid in thinking through
options in a situation that a practitioner faces’’ (p. 234). Since the body of theoretical knowledge
having to do with how evaluators, and social scientists more broadly, ought to warrant causal claims
is fast changing and under dispute, engaging in reasoned reflection to decide what to do can be
overwhelming, confusing, and quite challenging to evaluators unfamiliar with these debates.
Therefore, we provide an introductory overview of this changing conversation and identify six guidelines
for evaluators when warranting causal claims in particular circumstances. These guidelines offer
evaluators a heuristic for practice—an aid for thinking through possible options, making decisions,
and taking action in particular circumstances.
ethical dilemmas associated with random assignment procedures, and their methodological
appropriateness (American Evaluation Association, 2003; Cook, 2007; Scriven, 2008; U.S. Government
Accountability Office, 2009). The recent conversation about causality has moved beyond the matter
of hierarchy of methods to other concerns including relevant causal questions, variety of causal
relationships and ways of thinking about causality, and the complexity of intervention
characteristics.
Several trends characterize recent conversations about causality. First, there has been an expansion
from attribution-oriented questions that aim to attribute outcomes to an intervention to
contribution-oriented questions that investigate the contribution an intervention is making to outcomes and wider
impacts (Stern et al., 2012, p. 38). Second, there has been growing attention on theorizing and
validating how causal processes and mechanisms work. For example, Pawson and Tilley (1997,
2004) have developed realist evaluation that focuses on building and verifying a theory about how
processes and mechanisms work in particular contexts to generate effects and changes. Third, in
evaluation and social science more broadly, there is growing acknowledgment that there are multiple
ways to think about causal relationships. Cartwright (2007) makes this point in Hunting Causes and
Using Them: ‘‘Causation is not one thing, as commonly assumed, but many. There is a huge variety of
causal relations, each with different characterizing features, different methods for discovery, and
different uses to which it can be put’’ (p. 2). Fourth, concepts and approaches from the complexity
sciences and systems fields are beginning to be applied in evaluation contexts, thus introducing ways
of examining and modeling nonlinear, multidirectional, nested, and layered causal relationships.
Finally, there has been a surge of interest in methods and designs for assessing causal relationships
other than true experiments. For example, in international development evaluation considerable
attention is focused on rigorous nonexperimental designs and methods for impact evaluation (Leeuw &
Vaessen, 2009; Network of Networks on Impact Evaluation [NONIE], 2008; Picciotto, 2013; Rogers,
2009; Stern et al., 2012; Tsui, Hearn, & Young, 2014; White & Phillips, 2012). Likewise, structural
equation modeling and econometric methods such as difference-in-differences approaches have
captured significant attention from many evaluators. Political scientists have drawn attention to qualitative
approaches for assessing causality in single cases (e.g., process tracing) and multiple cases (e.g.,
qualitative comparative analysis). Stakeholder-based and narrative approaches, such as most significant
change and the success case method (SCM), are also attracting more attention in the evaluation field.
Additionally, approaches for modeling complex systems (e.g., causal loop diagrams, system
dynamics) have gained the interest of some evaluators. As Stern (2013) notes, ‘‘the repertoire that
evaluators can draw on to address cause and effect questions is fast expanding . . . Designs that were
until quite recently considered marginal and exploratory are fast becoming mainstream’’ (p. 3).
This changing conversation raises several important considerations, which we frame in the
following section as guidelines for evaluators when dealing with issues of causality.
Table 1. Issues and Questions to Consider Regarding the Evaluation Situation and Intervention.

(1) Intervention attributes
- What characteristics describe the intervention (e.g., its size, scale, multifaceted nature, and dynamics)?
- How is the intervention theorized to work (e.g., multiple mechanisms and in conjunction with other interventions)?
- What degree of complexity characterizes the relation between the intervention and its context?

(2) Evaluation purpose, audience, and questions
- What is the purpose of the evaluation (e.g., accountability, program improvement, scaling up, and empowerment)?
- Who are the intended audiences for the evaluation?
- Which causal questions are central to these audiences?
- What kinds of decisions will be made based on the results?

(3) Evidence needed
- What existing evidence about the outcomes and impact of the intervention is already available?
- What evidence is credible and trustworthy to the intended audiences?
- What level of certainty and confidence in this evidence is needed?

(4) Cultural and ethical considerations
- How do intended audiences view the nature of change?
- Are there any cultural differences in views on change?
- Are the views of the most disadvantaged addressed equally?

(5) Resources and constraints
- What are the evaluator(s)' methodological capacities?
- Which views on causality do the evaluator(s) assume?
- What financial and material resources are available?
- What is the time frame for the evaluation and what, if any, constraints does this pose?
context in which the evaluation is being conducted’’ (Davidson, 2000, p. 24). The evidence needs to
fit the purpose and the context. Donaldson, Christie, and Mark (2009) discuss the necessity to assess
what information ‘‘stakeholders [will] perceive as trustworthy and relevant for answering their
questions’’ (p. 244). Different stakeholder groups might have different evidence needs for an
evaluation, requiring balancing and prioritizing their needs.
It is also important to identify cultural and ethical considerations. In any evaluation, these are
important, but issues that are specific to outcome evaluation include the ways in which different
stakeholders may view the nature of change or the intended changes from a program. Julnes and Rog
(2007) point out that these considerations are particularly sensitive in evaluations that involve
communities of indigenous people. Social justice concerns can also arise when considering whether
the views of the most disadvantaged are addressed equally and whether the evaluation will focus on
measuring average effects or effects on the most disadvantaged. ‘‘Social justice requires that those
least advantaged not be further disenfranchised by focusing only on the information needs of those
with the greatest resources’’ (Julnes & Rog, 2007, p. 137).
Of course, these issues must be weighed against logistical constraints and available resources.
Often practical and political constraints, such as timelines, lack of available evidence, and cost
constraints, must be balanced against the desired ideal or preferred outcome evaluation methods.
The process of considering these five key issues may involve helping stakeholders understand
that there are a variety of causal questions, a variety of ways to think about causality, and a variety of
ways to gather evidence and warrant causal claims. By considering these issues, evaluators can
begin to understand, describe, and discuss with stakeholders characteristics of the intervention and
situation, which will help to inform other considerations for making causal claims. Table 1 provides
questions that can help evaluators consider each issue. These questions should be considered before
or during the process of developing an evaluation design.
Evaluators [need to] learn—and become capable of explaining to the public—that an evaluation is an
argument. My concern here is that in the press to master methods of generating data we ignore the idea of
developing a warranted argument—a clear chain of reasoning that connects the grounds, reasons, or
evidence to an evaluative conclusion. (p. 147)
Patton has similarly underscored the importance of shifting from methods to reasoning: ''evaluation
as a field has become methodologically manic-obsessive. Too many of us, and those who
commission us, think it's all about methods. It's not. It's all about reasoning'' (Patton, 2012, p. 105).
Four interrelated considerations are involved in building relevant and defensible causal
arguments: (1) the nature and character of the causal argument one wants to make; (2) the types, sources,
and probative force of evidence required; (3) the audiences for the argument; and (4) standards,
norms, or criteria for what constitutes a good causal argument. First, there are a variety of causal
questions and each sets the ground for a different kind of argument. For example, a causal question
about the outcomes and impacts of a particular program calls for describing the nature, extent, and
perceived value of intended and unintended outcomes. A causal question regarding the processes
and mechanisms that contribute to a program working in particular circumstances calls for an
argument that explains a plausible theory of change, presents evidence that this theory is
occurring, and rules out rival explanations. Just as ''care is needed to determine the relevant cause–effect
question in any specific context, and whether or not the question is reasonable’’ (Mayne, 2012b,
p. 1), evaluators ought to connect the questions with the nature and character of argument.
Second, evaluators ought to think about the types, sources, and probative force of evidence
needed to build an argument. One way to frame this is to use empirical evidence to distinguish between
more and less plausible claims (Campbell, 1999) through a preponderance-of-evidence approach
(Scriven, 1976) exercised with discretionary judgment (Patton, 2002). The preponderance-of-evidence
approach dates back to Scriven's (1976) modus operandi method, which draws on the ways
causality is established in the professions and in everyday life, such as diagnosing what is wrong
with a car, medical diagnosis, detective work, cause-of-death determination, and so forth.
Third, in order to synthesize evidence into a relevant and convincing argument, evaluators need
to consider their primary audiences. Being situationally responsive involves constructing an
argument that will have a certain rhetorical appeal to particular audience(s).
Fourth, evaluators ought to consider criteria, standards, and norms for constructing a ‘‘good’’
causal argument. Cartwright and Hardie (2012) define a ‘‘good argument’’ as ‘‘one in which the
premises themselves are all well warranted—trustworthy—and together imply the conclusion, or at
least make it highly likely’’ (p. 53) and then provide practical examples of good arguments of policy
effectiveness. In social science research, Gerring (2005) has identified 14 criteria as applicable to all
causal arguments (e.g., specification, completeness, intelligibility, and relevance). In reference to
building arguments in evaluation more generally, Davidson (2005) highlights the importance of
‘‘intelligibility,’’ arguing for the need for clear and coherent causal reasoning since evaluators need
to communicate for understanding with the lay public.
Gates and Dyson 35
In relation to evaluative evidence, Julnes and Rog (2007) contend that evidence must not only be
credible but also be ''actionable,'' that is,
adequate and appropriate for guiding actions in real-world contexts. The extent to which an evalua-
tion can provide answers to ‘‘why’’ and ‘‘why not’’ questions is one criterion for determining
actionable evidence. We believe actionability offers another criterion relevant to the quality of causal
arguments.
Successionist. For much of the history of evaluation (as well as the social sciences more broadly), a
successionist framework has been the dominant way of thinking about and assessing causality. A
successionist framework underlies two closely related logics of causality: regularity and
counterfactual. A regularity view of causal relations is based on the observation of two
separate events, X and Y, in which X occurs temporally prior to Y; there is a statistical relationship
(covariation) between X and Y; X is both necessary (X is always present when Y is) and sufficient
(Y is always present when X is); other plausible causes can be ruled out; and the relationship can be
found in a large number of cases. This way of thinking is the basis for statistical techniques including
survey research methodology and statistical analyses of data sets. Another and related way of
thinking about causality follows a counterfactual logic in which causal claims require making a
comparison between two highly similar situations to illuminate the ‘‘counterfactual,’’ an estimate of
what would have happened in the absence of the intervention (Mark & Henry, 2006). It is based on
the assumption that causality itself is not observable. Therefore, ‘‘we observe what did happen when
people received a treatment . . . [and use a control group to estimate] what would have happened to
those same people if they simultaneously had not received treatment’’ (Shadish, Cook, & Campbell,
2002, p. 5). Counterfactual logic is the basis for randomized controlled trials (RCTs) and
quasi-experimental designs, as well as some statistical techniques including difference in differences.
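The counterfactual logic just described can be made concrete with the difference-in-differences estimator mentioned above. The following is a minimal sketch with entirely hypothetical outcome scores; a real analysis would use a regression model with covariates and standard errors:

```python
# Illustrative difference-in-differences (DiD) estimate under counterfactual logic.
# All numbers below are hypothetical.

def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Change in the treated group minus change in the control group.

    The control group's change stands in for the counterfactual: an estimate
    of what would have happened to the treated group absent the intervention.
    """
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean outcome scores before and after an intervention.
effect = did_estimate(treat_pre=50.0, treat_post=62.0,
                      control_pre=48.0, control_post=53.0)
print(effect)  # 7.0 = (62 - 50) - (53 - 48)
```

The subtraction of the control group's change is what removes shared trends that would have occurred with or without the intervention, which is the core of the counterfactual comparison.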
Narrative. Another way of thinking about causality, and one that nearly all of us use routinely in
our day-to-day lives, relies on a narrative account of how we think change happens (Abell, 2004).
Narrative explanation foregrounds the importance of human agency in causality by attending to
human perception, motivation, and behavior. This way of thinking about causality does not view
participants as passive recipients but rather as active ‘agents’ (Stern et al., 2012). Under this
assumption, participants have agency and can help cause successful outcomes by their own actions
and decisions. People who advocate a narrative view of causality reject treating causal agents as
variables and treating context as a confounding variable that should be controlled for. Instead they
treat context as an important factor in determining whether a program will work in a certain setting.
This view doesn’t aggregate outcomes across different people but recognizes that people have
different values and that program outcomes will be different for different clients. The narrative
view focuses on documenting individualized outcomes of individual clients rather than measures of
standardized outcomes. Narrative accounts underlie many participatory, story-centered approaches
including most significant change and SCM.
Generative. A generative way of thinking about causal relationships builds and verifies a theory-based
explanation of how causal processes happen by showing how mechanisms work within particular
contexts to generate outcome patterns. This way of thinking is oriented toward understanding how,
why, for whom, and under what conditions interventions work to produce specific results. This way of
thinking assumes there are multiple possible causal pathways linking an intervention to an outcome.
These alternative causal paths will be true for certain people under certain conditions. Drawing on an
analogy with gunpowder that will only fire in favorable conditions, Pawson and Tilley (1997) have
suggested that program causal mechanisms only fire within favorable contexts. Mechanisms are not
regarded as general laws that are always true; instead, their particular context is a part of the causal
process (Pawson & Tilley, 1997). A generative way of thinking underlies some theory-based
approaches to causality including realist evaluation and process tracing.
Causal package. Another way of thinking about causality is the idea of ‘‘causal packages’’—the
copresence of multiple causes each of which may or may not be necessary and/or sufficient to produce
an effect. This way of thinking supports examining the contributory role that components of interventions,
and combinations of multiple interventions, play in producing outcomes and impacts. The idea here is
that many interventions do not act alone, and the desired outcomes are often the result of a combination
of causal factors, including other related interventions, events, and conditions external to the inter-
vention (Mayne, 2012a). This view highlights that it is not mono-causal conditions but combinations
of conditions that need to be examined (Sager & Andereggen, 2012). This way of thinking draws on
the logic of ‘‘necessary’’ and ‘‘sufficient’’ conditions to focus on causes that are neither necessary nor
sufficient on their own (Mayne, 2012a). The logic of this way of thinking involves identifying a
package of multiple causes that work together to produce an effect; describing each cause as necessary
but not sufficient within a causal package that is sufficient; and distinguishing ground preparing,
triggering, and sustaining contributory causes (Stern et al., 2012). After all, many programs are ‘‘less
often ‘magic bullets’ that trigger change in and of themselves, but mostly prepare the ground for long-
term change . . . . Knowing whether some conditions are required for a programme to work is impor-
tant in order to make it work’’ (Befani, 2013, p. 277). A causal package way of thinking is associated
with qualitative comparison analysis and contribution analysis.
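The causal-package logic sketched above — conditions that are individually insufficient but jointly sufficient — can be illustrated with a toy example. The condition names and cases below are hypothetical, and the checks are deliberately simplified versions of what contribution analysis or QCA would do more rigorously:

```python
# Sketch of causal-package logic: each condition may be insufficient alone,
# but necessary within a package that is jointly sufficient (cf. Mayne, 2012a).
# Cases and condition names are hypothetical.

cases = [
    # (conditions present, outcome occurred)
    ({"training", "funding", "local_support"}, True),
    ({"training", "funding", "local_support"}, True),
    ({"training", "funding"}, False),        # missing local support
    ({"funding", "local_support"}, False),   # missing training
    ({"training"}, False),                   # one condition alone is not enough
]

package = {"training", "funding", "local_support"}

# Sufficient: whenever the whole package is present, the outcome occurs.
sufficient = all(outcome for conds, outcome in cases if package <= conds)

# A member is necessary within the package if no observed case lacks it,
# has the rest of the package, and still shows the outcome.
def necessary_within(member):
    rest = package - {member}
    return not any(outcome for conds, outcome in cases
                   if rest <= conds and member not in conds)

print(sufficient)                                  # True
print(all(necessary_within(m) for m in package))   # True
```

In this toy data, no single condition produces the outcome, but the full package does — the pattern the text describes as causes that are "necessary but not sufficient within a causal package that is sufficient."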
Complex systems. Conceiving of the world as comprising complex systems offers a way of thinking
about nonlinear, multidirectional, hierarchical, and dynamical causal relationships in a system or
situation of interest. In this way of thinking, the focus is on examining the multiple, interdependent
causal variables (also called factors) and nonlinear, cyclical feedback processes that affect the
structure and dynamical behavior of a system over time. Feedback loops—the means by which
systems reorganize—are ‘‘closed chains of causal connections’’ that balance or reinforce system
behavior through the dynamic of stocks and flows (Meadows, 2008, p. 188). Challenging the notion
that causal factors are stable and can be studied in isolation, this way of thinking requires studying
the interrelationships between factors as they are hypothesized, empirically found, and/or computer
simulated to affect change in a particular situation or system of interest. In this way, causal relation-
ships are context dependent; however, general patterns of systemic behavior may occur in and apply
to different contexts. This systemic way of thinking about causality often requires investigating
different levels of causality (e.g., individual motivation, an organizational policy, and economic
shifts) and how these levels interact to affect change (Forss, Marra, & Schwartz, 2011). This way of
thinking puts great emphasis on modeling causal relationships because their nonlinearity (i.e., effects are
not proportional to the size, quantity, or strength of the inputs) and emergent properties (i.e.,
characteristics and behaviors of a system cannot be reduced to or predicted based on its component
parts) often lead to unpredictable, surprising, and counterintuitive behaviors. The aim is not for a
single, bottom-level explanation, which is not possible in this way of thinking, but rather an
ongoing investigation of how causal relationships and feedback loops interact to influence change
over time. This complex systems way of thinking is found in causal loop diagramming and system
dynamics.
Each of these five ways of thinking about causality—successionist, narrative, generative, causal
package, and complex systems—illustrates a different way of thinking about and investigating
causal relationships that evaluators may draw on in their practices. Table 2 summarizes each
approach and offers examples of its use in particular methodologies and its relevance to particular
evaluation circumstances.
While these ways of thinking about causality are presented here as distinct, elements and assump-
tions of each are often mixed in methodological approaches and particular circumstances. For
example, one could take a narrative approach to understanding how participants understand the
impact of a program while also employing a type of ‘‘subjective counterfactual’’ (Abell, 2004) by
asking a participant to consider what would have occurred had they not participated in a given
program or taken a particular action. Or, as in the work of Byrne (2009), a generative way of thinking
about the processes and mechanisms that produce effects in particular circumstances is combined
with a complex systems account to examine causality at multiple levels and with a causal package
way of thinking to identify the necessary and sufficient causes in a particular case. These examples
suggest that the point of reflecting on different ways of thinking about causality is not to classify
one’s view but rather to carefully consider the assumptions used to investigate causal relationships in
a particular evaluation. This can then influence as well as be informed by the kind of causal
argument one wants to make and the questions and subsequent design and methods used in an
evaluation.
Theory based. The ‘‘theory’’ in these approaches is a set of assumptions about how an intervention
achieves its goals and under what conditions. In these approaches a theory of change (or causal
chain) follows ‘‘the pathway of a program from its initiation through various causal links in a chain
of implementation, until intended outcomes are reached’’ (Stern et al., 2012, p. 25). Theories of
change and related descriptions of causal links can focus on a sequence of decisions or actions, or
may also consider ‘‘causal mechanisms.’’ The concept of causal mechanisms assumes that it is
necessary to identify the ‘‘mechanism’’ that makes things happen in order to make plausible causal
claims. Theory-based evaluation also tries to understand the contextual circumstances under which
particular mechanisms operate. Merely having similar mechanisms in place will not assure similar
outcomes if the context is different or if various ‘‘helping’’ or ‘‘support’’ factors are absent. Theory-
based methodologies can range from ‘‘telling the causal story’’ about how and to what extent the
intervention has produced results to using the theory as an explicit benchmark to ‘‘formally test
causal assumptions’’ (Leeuw & Vaessen, 2009). The primary basis for causal inference in these
approaches is in-depth theoretical analysis to identify and/or confirm causal processes or ‘chains’
and the supporting factors (and possibly mechanisms) at work in context. Some authors caution that
these approaches are not effective at estimating the quantity or extent of the causal contribution of an
intervention (Stern et al., 2012) and that a causal contribution may just be assumed if there is
evidence of the expected causal chain (NONIE, 2008).
Realist evaluation is an example of a theory-based approach. Developed by Pawson and Tilley
(1997), realist evaluation develops an explanation of how an intervention brings about effects (i.e.,
mechanisms), the features and conditions that influence the activation of these mechanisms (i.e.,
context), and the intended and unintended consequences resulting from activation of different
mechanisms in different contexts (i.e., outcome patterns). By identifying and building empirical
support for a context–mechanism–outcome configuration, realist evaluators provide information
about how, why, under what circumstances, and for whom interventions work. Realist evaluation
is often used in public health (e.g., Evans & Killoran, 2010; Marchal, van Belle, van Olmen, Hoeree, &
Kegels, 2012).
Case based. Case-based approaches to causal analysis are those that focus on examining causal
relationships within a particular case or across multiple cases. There are different philosophical
traditions and methodologies within case-based approaches, which are described by Stern et al.
(2012) as interpretive (e.g., naturalistic, grounded theory, ethnography) and structured (e.g.,
configurations, qualitative comparative analysis, within-case analysis, and simulations and network
analysis; p. 24). Case-based approaches can also be distinguished according to approaches that
focus on single cases (i.e., within case) and those that compare multiple cases (i.e., across case or
comparative case).
Process tracing, an example of a within-case approach, involves developing a theory of how
causal processes and mechanisms lead to effects, outcomes, or impacts in a particular intervention or
case; collecting evidence that these causal processes and mechanisms, in fact, took place; identifying
alternative explanations for what led to these effects; and collecting evidence that these alternative
explanations did not take place and/or are not responsible for producing the effects (Bennett, 2010).
Qualitative comparative analysis (QCA; Ragin, 2000) is an analytical tool comparing different
combinations of conditions and outcomes in multiple cases. QCA foregrounds context and empha-
sizes the importance of ‘‘constellations of causes’’ (Sager & Andereggen, 2012, p. 63) instead of
mono-causal explanations. It draws conclusions about which conditions are necessary parts of a
‘‘causal recipe’’ to bring about a given outcome (Ragin, 2000). QCA attempts to compare the
different combinations of conditions and outcomes of each case, with the goal of discovering what
configurations of conditions lead to what outcomes, and which of those conditions are key in
producing certain outcomes (White & Phillips, 2012). Analysis begins by identifying a number of
relevant cases, typically between 15 and 25, for which a specified outcome has or has not occurred.
The researcher assembles a table of the various combinations of conditions and outcomes in the
cases being examined and uses Boolean algebra to compare the combinations of conditions (Ragin
& Amoroso, 2011). The last step involves analyzing and interpreting the causal recipes with
consideration of existing theory. In evaluation, a QCA analysis can be useful in determining where to
invest future resources by determining what conditions are most likely to lead to certain outcomes
and has most often been used in evaluations of policy issues.
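The crisp-set mechanics can be made concrete with a short sketch: build a truth table from binary conditions, keep the configurations whose cases consistently show the outcome, and merge configurations that differ in only one condition (a single pass of the Boolean reduction). The conditions, cases, and consistency threshold below are all hypothetical:

```python
from collections import defaultdict

# Hypothetical cases: three binary conditions (funding, community buy-in,
# trained staff) and a binary outcome recorded for each case.
conditions = ["funding", "buyin", "staff"]
cases = [
    ({"funding": 1, "buyin": 1, "staff": 1}, 1),
    ({"funding": 1, "buyin": 1, "staff": 0}, 1),
    ({"funding": 1, "buyin": 0, "staff": 1}, 0),
    ({"funding": 0, "buyin": 1, "staff": 1}, 1),
    ({"funding": 0, "buyin": 0, "staff": 1}, 0),
    ({"funding": 0, "buyin": 0, "staff": 0}, 0),
]

# Truth table: group case outcomes by their configuration of conditions.
table = defaultdict(list)
for cfg_map, outcome in cases:
    table[tuple(cfg_map[c] for c in conditions)].append(outcome)

# A configuration is a candidate "causal recipe" when every case sharing
# it shows the outcome (perfect consistency, for simplicity).
recipes = [cfg for cfg, outs in table.items() if all(outs)]

# One pass of Boolean minimization: merge recipes that differ in exactly
# one condition; that condition is redundant and is marked "-".
reduced = set()
for i, a in enumerate(recipes):
    for b in recipes[i + 1:]:
        diff = [k for k in range(len(a)) if a[k] != b[k]]
        if len(diff) == 1:
            merged = list(a)
            merged[diff[0]] = "-"
            reduced.add(tuple(merged))

print(recipes)   # configurations consistently tied to the outcome
print(reduced)   # in this toy data: buy-in combined with funding OR staff
```

Dedicated QCA software performs this reduction exhaustively (and handles fuzzy-set membership and imperfect consistency), but the toy run shows the core logic of moving from cases to causal recipes.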
Systems based. Systems-based approaches ‘‘use a wide range of methods and methodologies devel-
oped over the past 50 years within the systems field’’ (NONIE, 2008, p. 28). Systems approaches
focus on understanding and often modeling interrelationships among aspects of, and influences on, a
situation; examining different perspectives, values, and worldviews in relation to a situation or
intervention; and reflecting on and critiquing different boundaries drawn around a situation, inter-
vention, or evaluation. According to NONIE (2008), systems-based approaches to examining causal
relationships are useful for supporting the development of causal models that address nonlinear and nonsimple causality and, in some approaches, for paying attention to who is included in the evaluation and how decisions are made (p. 28).
One example of a systems-based approach to examining causal relationships is system dynamics
modeling. Originally developed by Jay Forrester in the 1950s, ‘‘system dynamics is an approach for
thinking about and simulating situations and organizations of all kinds and sizes by visualizing how
the elements fit together, interact, and change over time’’ (Morecroft, 2010, p. 25). These models,
especially when computer simulated, offer real-time feedback and provide the capacity to make
decisions and take actions that are not feasible or ethical in the everyday world, such as modeling
interventions across multiple scenarios, manipulating time and conditions, stopping action to allow
for reflection, and pushing a system to extreme conditions to see what happens (Sterman, 2006).
Despite the potential of system dynamics modeling for understanding the effects of policies and
programs, there are few examples using this methodology in evaluation (see Fredericks, Deegan, &
Carman, 2008; Homer & Hirsch, 2006 for a discussion in public health).
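As a rough illustration of what such a model looks like, the sketch below simulates a single hypothetical stock (enrolled participants) fed by a word-of-mouth inflow, using simple Euler integration. Real system dynamics models typically contain many interacting stocks and feedback loops, and every parameter here is an assumption:

```python
# Hypothetical stock-and-flow model of program uptake: enrollment grows
# as current participants recruit people from the eligible pool.
population = 1000.0    # eligible people (assumed)
enrolled = 10.0        # stock: initial participants (assumed)
contact_rate = 0.0005  # per-week chance a contact yields an enrollment (assumed)
dt = 1.0               # time step, in weeks

history = []
for week in range(52):
    # Flow: word-of-mouth recruitment slows as the pool of
    # non-participants shrinks (logistic growth).
    inflow = contact_rate * enrolled * (population - enrolled)
    enrolled += inflow * dt  # integrate the stock (Euler step)
    history.append(enrolled)

# Simulated trajectories like this let evaluators "run" scenarios --
# e.g., halving contact_rate -- that could not be tried in the field.
print(round(history[-1]))  # → 1000 with these assumed parameters
```

Rerunning the loop under different parameter values or extreme conditions is precisely the kind of safe, repeatable experimentation that Sterman (2006) describes as infeasible or unethical in the everyday world.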
Layering Theories to Explain Causality at Multiple Levels
Evaluators commonly work from a programmatic theory that depicts an intervention as a linear chain of relationships between inputs, activities, outputs, outcomes, and longer term impacts. This theory is usually
developed through a combination of reviewing the literature, examining the design of the interven-
tion, and talking with key stakeholders about how they conceptualize the workings of the interven-
tion. However, amid the changing conversation about causality, some scholars contend that these
linear, intervention-centered theoretical explanations do not adequately capture the multiple levels
of change influencing and influenced by programs and policies (Barnes, Matka, & Sullivan, 2003;
Callaghan, 2008). In particular, they contend that programmatic theories do not adequately describe
or explain microlevel processes of power, negotiation, and contested interpretations of how change
happens, or more macrolevel organizational, institutional, and sociological processes that constrain
and shape the workings of interventions. They argue for evaluations that draw on theories from
across disciplines to explain change and/or causal relationships at multiple levels of analysis. This
requires evaluators to be familiar with different theories that explain causality at different levels of
analysis (e.g., psychological, social psychological, organizational, institutional, and sociological), to
know how to layer these theories in particular evaluations, and to understand how these layered theories can inform evaluation design, data collection, and data analysis.
Recent evaluation work that draws on theory from the complexity sciences offers examples of
which theories evaluators might draw on and how evaluators layer these theories in a particular
evaluation. Barnes, Matka, and Sullivan (2003), in their evaluation of a health action zone (HAZ),
draw on complexity theory and new institutionalist perspectives to adequately understand and
explain change in what they describe to be a multifaceted intervention:
The HAZ seeks to promote change of individuals (e.g. changing lifestyles to improve health status);
populations (e.g. reduced incidence of heart disease amongst people living in a particular area); com-
munities (e.g. increased social cohesion); services (e.g. health services that are more responsive to the
needs and circumstances of service users); and systems (the processes through which different agencies
work together, determining shared aims, delivering services and establishing appropriate systems of
governance). (p. 266)
Similarly, Westhorp (2012) contends that substantive theories that draw on complexity theories and
concepts (e.g., contingent causation, emergence of system properties) can help to understand change
processes in complex adaptive systems, particularly which aspects of context to attend to and which
interactions matter for generating outcomes, which can help shape questions and guide evaluation
design (p. 411). Sanderson (2000) advances Bhaskar’s social naturalism as a guiding framework for
conceptualizing change. These scholars’ work not only points to the need for evaluations that layer
theories to explain causality at multiple levels but also suggests that there are some substantive
theories (e.g., complex adaptive systems, social naturalism) that may be relevant for understanding
change processes across different evaluation settings and circumstances.
Justifying the Causal Approach Taken to Multiple Audiences
Evaluators also need to justify the causal approach they take to multiple audiences, including evaluation commissioners, stakeholders, and the wider public. For example,
while commissioning impact evaluations using nonexperimental designs is becoming more accepted
by some funding agencies, there remains a demand that evaluators justify the necessity and feasi-
bility of alternative designs. Justifying an approach requires communicating its necessity and value for the circumstances at hand, as well as its limitations, not only to evaluation commissioners and stakeholders but also to the wider public and media. In the context of evidence-based policy making and results-oriented management, nonexperimental designs that cannot neatly attribute outcomes to programs may yield findings that are less clear-cut and conclusive, making results more difficult to communicate to the media.
Conclusion
The issue of how evaluators warrant causal claims will continue to be a central issue in the field and
practice of evaluation. As philosophical definitions of causality shift and new methodological
approaches emerge, evaluators will and ought to continue to reassess what they mean by causality
in their practices and what methodological approaches they use to investigate causal relationships
and warrant causal claims. However, given the routine practice of making causal claims in evalua-
tion and the growing demand for evaluators to defend how they make these claims, this article
reviewed the recent conversation about causality to identify six guidelines for evaluators when
warranting causal claims in particular circumstances. In addition to guiding evaluation practice,
this heuristic has implications for teaching and training evaluators. Making causal claims is often covered within the confines of methods courses, with little consideration of the multiple ways of thinking about causality or of the other practical issues relevant to making causal claims in evaluation circumstances. This heuristic of six guidelines can be
used to guide reflection, discussion, and debate about how evaluators engage and ought to engage in
making causal claims in different practical circumstances. Future research is needed to empirically
explore how these guidelines can be used by evaluators in particular circumstances and what, if any,
guidance they offer. Additionally, further discussion within the evaluation community is needed
regarding how evaluators construct relevant and defensible causal arguments and how evaluators
can justify the causal approach taken to multiple audiences.
Acknowledgments
We thank Thomas Schwandt for his guidance and thoughtful editorial comments and three anonymous
reviewers for their feedback on an earlier version of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
Abell, P. (2004). Narrative explanation: An alternative to variable-centered explanation? Annual Review of Sociology, 30, 287–310.
American Evaluation Association. (2003). American Evaluation Association response to U.S. Department of Education notice of proposed priority. Federal Register, RIN 1890-ZA00, November 4, 2003. Retrieved from http://www.eval.org/p/cm/ld/fid=95
44 American Journal of Evaluation 38(1)
Barnes, M., Matka, E., & Sullivan, H. (2003). Evidence, understanding and complexity: Evaluation in non-linear systems. Evaluation, 9, 265–284.
Befani, B. (2012). Models of causality and causal inference (Report of a study commissioned by the Department for International Development, Working paper 38). Retrieved from http://www.dfid.gov.uk/Documents/publications1/design-method-impact-eval.pdf
Befani, B. (2013). Between complexity and generalization: Addressing evaluation challenges with QCA. Evaluation, 19, 269–283.
Bennett, A. (2010). Process tracing and causal inference. In H. E. Brady & D. Collier (Eds.), Rethinking social inquiry: Diverse tools, shared standards (2nd ed., pp. 207–220). Lanham, MD: Rowman & Littlefield.
Biesta, G. (2007). Why ‘‘what works’’ won’t work: Evidence-based practice and the democratic deficit in educational research. Educational Theory, 57, 1–22.
Brinkerhoff, R. O. (2003). The success case method: Find out quickly what’s working and what’s not. San Francisco, CA: Berrett-Koehler.
Brinkerhoff, R. O. (2005). Success case method. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 402–403). Thousand Oaks, CA: Sage.
Byrne, D. (2009). Complex realists and configurational approaches to cases: A radical synthesis. In D. Byrne & C. C. Ragin (Eds.), The SAGE handbook of case-based methods (pp. 101–111). Thousand Oaks, CA: Sage.
Callaghan, G. (2008). Evaluation and negotiated order: Developing the application of complexity theory. Evaluation, 14, 399–411.
Campbell, D. T. (1999). On the rhetorical use of experiments. In D. T. Campbell & M. J. Russo (Eds.), Social experimentation (pp. 149–158). Thousand Oaks, CA: Sage.
Cartwright, N. (2007). Hunting causes and using them. Cambridge, England: Cambridge University Press.
Cartwright, N., & Hardie, J. (2012). Evidence-based policy: A practical guide to doing it better. Oxford, England: Oxford University Press.
Cook, T. D. (2007). Describing what is special about the role of experiments in contemporary educational research: Putting the ‘‘gold standard’’ rhetoric into perspective. Journal of Multidisciplinary Evaluation, 3, 1–7.
Cook, T. D., Scriven, M., Coryn, C. L. S., & Evergreen, S. D. (2010). Contemporary thinking about causation in evaluation: A dialogue with Tom Cook and Michael Scriven. American Journal of Evaluation, 31, 105–117.
Coryn, C. L. S., Schröter, D. C., & Hanssen, C. E. (2009). Adding a time-series design element to the success case method to improve methodological rigor: An application for nonprofit program evaluation. American Journal of Evaluation, 30, 80–92.
Davidson, E. J. (2000). Ascertaining causality in theory-based evaluation. New Directions for Evaluation, 2000, 17–26.
Davidson, E. J. (2005). Evaluation methodology basics: The nuts and bolts of sound evaluation. Thousand Oaks, CA: Sage.
Donaldson, S. I., Christie, C. A., & Mark, M. M. (Eds.). (2009). What counts as credible evidence in applied research and evaluation practice? Thousand Oaks, CA: Sage.
Duignan, P. (2009). Seven possible impact/outcome evaluation design types. Outcomes Theory Knowledge Base, Article No. 209. Retrieved from http://knol.google.com/k/paul-duignan-phd/seven-possible-outcomeimpact-evaluation/2m7zd68aaz774/10
Evans, D., & Killoran, A. (2010). Tackling health inequalities through partnership working: Learning from a realistic evaluation. Critical Public Health, 10, 125–140.
Forss, K., Marra, M., & Schwartz, R. (Eds.). (2011). Evaluating the complex: Attribution, contribution, and beyond. New Brunswick, NJ: Transaction.
Fredericks, K. A., Deegan, M., & Carman, J. G. (2008). Using system dynamics as an evaluation tool: Experience from a demonstration program. American Journal of Evaluation, 29, 251–267.
Gerring, J. (2005). Causation: A unified framework for the social sciences. Journal of Theoretical Politics, 17, 163–198.
Homer, J. B., & Hirsch, G. B. (2006). System dynamics modeling for public health: Background and opportunities. American Journal of Public Health, 96, 452–458.
House, E. R. (1977). The logic of evaluative argument. In E. Baker (Ed.), CSE monograph series in evaluation (Vol. 7). Los Angeles, CA: UCLA Center for the Study of Evaluation.
Julnes, G., & Rog, D. J. (2007). Pragmatic support for policies on methodology. New Directions for Evaluation, 113, 129–147.
Karlan, D. (2009). Thoughts on randomized trials for evaluation of development: Presentation to the Cairo evaluation clinic. Journal of Development Effectiveness, 1, 237–242.
Leeuw, F., & Vaessen, J. (2009). Impact evaluations and development: NONIE guidance on impact evaluation. Washington, DC: World Bank.
Marchal, B., van Belle, S., van Olmen, J., Hoeree, T., & Kegels, G. (2012). Is realist evaluation keeping its promise? A review of published empirical studies in the field of health systems research. Evaluation, 18, 192–212.
Mark, M., & Henry, G. T. (2006). Methods for policy-making and knowledge development evaluation. In I. Shaw, J. Greene, & M. Mark (Eds.), The Sage handbook of evaluation (pp. 317–339). London, England: Sage.
Mayne, J. (2011). Contribution analysis: Addressing cause and effect. In K. Forss, M. Marra, & R. Schwartz (Eds.), Evaluating the complex: Attribution, contribution, and beyond (pp. 53–95). New Brunswick, NJ: Transaction.
Mayne, J. (2012a). Contribution analysis: Coming of age? Evaluation, 18, 270–280.
Mayne, J. (2012b). Making causal claims (ILAC Brief No. 26). Rome, Italy: Institutional Learning and Change (ILAC) Initiative. Retrieved from http://www.cgiar-ilac.org/files/publications/mayne_making_causal_claims_ilac_brief_26.pdf
Meadows, D. H. (2008). Thinking in systems: A primer. White River Junction, VT: Chelsea Green.
Morecroft, J. (2010). System dynamics. In M. Reynolds & S. Holwell (Eds.), Systems approaches to managing change: A practical guide (pp. 25–85). London, England: Springer.
Network of Networks on Impact Evaluation Subgroup 2. (2008). NONIE impact evaluation guidance. Retrieved from http://www.worldbank.org/ieg/nonie/docs/NONIE_SG2.pdf
Patton, M. Q. (2002). Qualitative research and evaluation methods. Thousand Oaks, CA: Sage.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Patton, M. Q. (2012). Contextual pragmatics of valuing. New Directions for Evaluation, 133, 97–108. doi:10.1002/ev.20011
Pawson, R., & Tilley, N. (1997). Realistic evaluation. London, England: Sage.
Pawson, R., & Tilley, N. (2004). Realist evaluation. British Cabinet Office, 1–36. Retrieved from http://www.communitymatters.com.au/RE_chapter.pdf
Picciotto, R. (2012). Experimentalism and development evaluation: Will the bubble burst? Evaluation, 18, 213–229.
Picciotto, R. (2013). The logic of development effectiveness: Is it time for the broader evaluation community to take notice? Evaluation, 19, 155–170.
Ragin, C. C. (2000). Fuzzy-set social science. Chicago, IL: University of Chicago Press.
Ragin, C. C., & Amoroso, L. M. (2011). Constructing social research (2nd ed.). Thousand Oaks, CA: Pine Forge Press.
Rogers, P. (2009). Matching impact evaluation design to the nature of the intervention and the purpose of the evaluation. Journal of Development Effectiveness, 1, 217–226.
Sager, F., & Andereggen, C. (2012). Dealing with complex causality in realist synthesis: The promise of qualitative comparative analysis. American Journal of Evaluation, 33, 60–78.
Sanderson, I. (2000). Evaluation in complex policy systems. Evaluation, 6, 433–454.
Schwandt, T. A. (2008). Educating for intelligent belief in evaluation. American Journal of Evaluation, 29, 139–150.
Schwandt, T. A. (2014). On the mutually informing relationship between practice and theory in evaluation. American Journal of Evaluation, 35, 231–236.
Scriven, M. (1976). Maximizing the power of causal investigation: The modus operandi method. In G. V. Glass (Ed.), Evaluation studies review annual (Vol. 1, pp. 120–139). Beverly Hills, CA: Sage.
Scriven, M. (2005). Causation. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 44–48). Thousand Oaks, CA: Sage.
Scriven, M. (2008). A summative evaluation of RCT methodology: & An alternative approach to causal research. Journal of Multidisciplinary Evaluation, 5, 11–24.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Stern, E. (2013). Editorial. Evaluation, 19, 3–4.
Stern, E., Andersen, O. W., & Hansen, H. (2013). Editorial: Special issue: What can case studies do? Evaluation, 19, 213–216.
Stern, E., Stame, N., Mayne, J., Forss, K., Davies, R., & Befani, B. (2012). Broadening the range of designs and methods for impact evaluations (Report of a study commissioned by the Department for International Development, Working paper 38). Retrieved from http://www.dfid.gov.uk/Documents/publications1/design-method-impact-eval.pdf
Sterman, J. D. (2006). Learning from evidence in a complex world. American Journal of Public Health, 96, 505–514.
Tsui, J., Hearn, S., & Young, J. (2014). Monitoring and evaluation of policy influence and advocacy (Working paper for the Overseas Development Institute [ODI]). Retrieved from http://www.odi.org.uk/sites/odi.org.uk/files/odi-assets/publications-opinionfiles/8928.pdf
U.S. Government Accountability Office. (2009). Program evaluation: A variety of rigorous methods can help identify effective interventions (GAO-10-30). Retrieved from http://www.gao.gov
Westhorp, G. (2012). Using complexity-consistent theory for evaluating complex systems. Evaluation, 18, 405–420.
White, H., & Phillips, D. (2012). Addressing attribution of cause and effect in small n impact evaluations: Towards an integrated framework. New Delhi, India. Retrieved from www.3ieimpact.org