
Reliability Engineering and System Safety 108 (2012) 77–89

Contents lists available at SciVerse ScienceDirect

Reliability Engineering and System Safety


journal homepage: www.elsevier.com/locate/ress

Using plural modeling for predicting decisions made by adaptive adversaries


Dennis M. Buede*, Suzanne Mahoney, Barry Ezell, John Lathrop
Innovative Decisions, Inc., 1945 Old Gallows Rd., Suite 207, Vienna, VA 22182, USA

ARTICLE INFO

ABSTRACT

Article history:
Received 11 June 2011
Received in revised form
28 May 2012
Accepted 1 June 2012
Available online 15 June 2012

Incorporating an appropriate representation of the likelihood of terrorist decision outcomes into risk
assessments associated with weapons of mass destruction attacks has been a significant problem for
countries around the world. Developing these likelihoods gets at the heart of the most difficult
predictive problems: human decision making, adaptive adversaries, and adversaries about which very
little is known. A plural modeling approach is proposed that incorporates estimates of all critical
uncertainties: who is the adversary and what skills and resources are available to him; what
information is known to the adversary and what perceptions of the important facts are held by this
group or individual; what does the adversary know about the countermeasure actions taken by the
government in question; what are the adversary's objectives and the priorities of those objectives; what
would trigger the adversary to start an attack and what kind of success does the adversary desire; how
realistic is the adversary in estimating the success of an attack; and how does the adversary make a decision
and what type of model best predicts this decision-making process. A computational framework is
defined to aggregate the predictions from a suite of models, based on this broad array of uncertainties.
A validation approach is described that deals with a significant scarcity of data.
© 2012 Elsevier Ltd. All rights reserved.

Keywords:
Adaptive adversary
Probabilistic risk analysis
Plural analysis
Descriptive modeling of decisions

1. Introduction
Probabilistic risk analysis (PRA) is being used extensively to
address not only the risks of engineered and natural structures,
but also attacks by human adversaries [1–4]. There has been some
criticism and extended discussion about the appropriateness of
PRA for situations in which the threat is an adaptive, human
adversary (hereafter called an adaptive adversary) [5–12]. The
focus of this paper is not PRA and whether it is an appropriate
approach for risk analysis involving adaptive adversaries. Rather,
this paper develops a plural modeling framework
that is similar to suggestions made by Guikema and Aven [13] and
addresses modeling adaptive adversaries for risk analysis so that
the results can be used by whatever higher-level risk analysis
method seems appropriate. The context for this paper will
be the threat risk assessments performed by the Department of
Homeland Security for weapons of mass destruction (WMD). In
particular, our illustrations will be for the bio-terrorism risk
assessment, but the approach and comments apply to any risk
analysis of this sort.
* Corresponding author. Tel.: +1 703 861 3678; fax: +1 703 860 8639.
E-mail address: dbuede@innovativedecisions.com (D.M. Buede).
http://dx.doi.org/10.1016/j.ress.2012.06.002

The motivations for modeling terrorists as adaptive adversaries
will be addressed extensively later in this paper. But to
summarize for now, terrorists are not homogeneous but differ
widely in terms of motivations; decision-making information,
skills, and processes; and organizational or personal psychology.
In addition, there will likely be some interaction between what
the terrorist (red) does and what the defending government
(blue) does. For this paper we are focusing on strategic risk
analyses of one to three years so move-countermove aspects of
this interaction will be fuzzy at best. For more tactical or
operational risk analysis involving interchanges in minutes
through months, it is much more critical to model these interactions between red and blue, making a time dependent model of
the adaptive adversary more critical.
The primary assertion associated with this research is that
adaptive adversaries, acting as terrorists, cannot be assumed to be
rational decision makers. Even more emphatically, modelers
cannot presume to know which modeling approach best characterizes the decision-making outcomes of these individuals and
groups, all of whom are different, some dramatically so. Nonetheless the perspectives and motivations of these adaptive adversaries are critical to predicting their decision-making outcomes
and need to be included in the modeling process.
Our approach in this paper is to use multiple modeling
methods, plural modeling. These modeling methods will consider
motivations or objectives of the adaptive adversaries, will address
multiple decision-making styles, and will be conditioned on red's
perceptions of red's capabilities as well as red's perceptions of the
defensive actions that blue has taken. This approach is founded on
the principle that has been learned many times in the military/intelligence
communities: that blue should not assume that red
will do what blue would do in a given situation, often called
mirroring.
In summary, our approach will categorize adaptive adversaries
into multiple groups based on similarities of motivations and
resources, decision-making characteristics, and psychology. The
decision-making outcomes of each group will then be modeled by
multiple simple, descriptive methods and aggregated into a
probabilistic representation based upon the uncertainties of the
situation, the red group's characteristics, and red's perceptions of its
capabilities and what defensive actions blue has taken. Our
justification for this approach is that plural modeling (using
relatively simple models) has proven (across many domains) to
be more statistically accurate than a single model, no matter how
complex that model is [14]. The data input requirements for these
simple models will be similar and manageable for the task at
hand. Finally the experts providing these model inputs should be
more comfortable providing the information required by these
models than providing the output of the adversary choice models.
This paper identifies several critical issues that must be
addressed by any solution to this problem. Next we justify and
define a plural modeling approach for computing the probabilities
of the adversary's decision outcomes. Since this is a computationally
intensive problem to address and the plural modeling
solution may need to be inserted into any of a number of risk
assessments, we define a computational framework for implementing
the proposed plural modeling solution. Finally we
address the important issue of validation.

2. Issues associated with modeling adaptive adversaries


The largest issue in modeling adaptive adversaries for a risk
analysis is coming to grips with how little is really known. Once
this reality is accepted, the fact that a probabilistic model must be
used is easy to accept. So the output of the adaptive adversary
model is not going to be "X is going to happen." But even more
importantly, just about everything about the adaptive adversary is
uncertain: who is red, and is this adversary a group or individual?
What does red know about blue's past actions and future intentions?
What does red care about in mounting an attack (e.g.,
hurting blue, changing blue's policies, building a bigger red
organization)? What kind of success does red need to justify an
attack? What resources (financial, technical, personnel, etc.) does
red possess for the attack? How realistic is red in estimating the
success of an attack red may mount? How does red make a
decision about what attack to mount, and does red's decision
process unfold as intended or is it diverted to something new and
unpredictable? What are the criteria that would cause red to
activate the attack (is red waiting for the attack approach to be
ready, or waiting for a trigger based on blue actions or something
else)? The bottom line here is that there are many uncertainties
and they should all be modeled to capture blue's uncertainty
about red's actions.

2.1. A structured approach for discussing this modeling problem

In this paper we will use influence diagrams to model
decisions by red. There is a long history of archival literature on
influence diagrams; see [15,16] for two early papers. An influence
diagram is an acyclic directed graph with three types of nodes:
decision nodes, random variable nodes, and value nodes. Directed
links pointing to a node indicate that the nodes from which the
arrows emanate contain the parameters required to evaluate the
destination node's function. Decision nodes, represented by boxes,
have multiple, mutually exclusive and collectively exhaustive
alternatives, shown as lines of text within the box. Arrows
entering a decision node indicate the information available at the
time of the decision. Random variable nodes, or chance nodes,
represented by ovals, have multiple possible, mutually exclusive
and collectively exhaustive states, shown as lines of text within
the oval. The function associated with the random variable node
represents the probabilistic dependence between the random
variable and the nodes having arrows pointing to the random
variable. A value node, shown as a hexagon, represents a
measurable objective (which could be a combination of objectives),
and has an associated value function to calculate the measure.
Arrows entering a value node indicate the variables and decisions
that serve as parameters for the value function.

Fig. 1 is a simplification of the red influence diagram for red
decisions. The box in the upper left represents a number of
decisions that a particular red group (or individual) will make
regarding an attack on blue. Examples of these decisions include
the target to be attacked, the weapon type used in the attack, and
the delivery mechanism. The specific decisions that are made
(as well as their order) may be influenced by which group is being
analyzed. The specific decision alternatives chosen will be
affected by the preferences of the group. Red is uncertain about
which countermeasures blue has implemented or plans to implement
as well as red's chances of success should an attack be
initiated. Finally, red is uncertain about the consequences that
may result from the attack if it is initiated. The bottom node in the
figure involves a number of concepts. For each alternative red
considers, it assesses the possible (uncertain) consequences in
terms of the degree to which they further each of red's objectives.
Then red assesses the overall value of the alternative by in some
way combining how all those objectives are furthered into a
single impression of overall value, accounting for the relative
importance red associates with each objective. That sentence is
deliberately engineered to avoid making any assumptions as to
how, and how systematically, red incorporates all those considerations
in deciding among alternatives.

Fig. 1. Simplification of the red decision problem.

Fig. 2 provides a more detailed representation of red's decision
problem, showing some of the additional nodes that would be
needed for a realistic analysis. There are three decision nodes in
the upper left, one for attack initiation and two for the agent and
target aspects of an attack. For both agent and target aspects,
only a small subset of the full spectrum of decision alternatives
is shown. In the upper right, issues associated with red's
perceptions of blue's countermeasures are shown. We will discuss
the issue of perceptions in more detail later in the paper.
Uncertainties about the consequences of a red attack are shown


Fig. 2. More detailed representation of the red decision problem. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Detailed simplification of realistic influence diagram for red's decision problem. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

in the bottom right; the conditioning of these uncertainties on
red's choices is not shown here to keep the diagram simple, but
is shown in Fig. 3. Finally, the various consequences will impact
the extent to which red's objectives are achieved. Examples of
red's objectives are total cost (red can only afford certain capabilities),
changing blue policy, furthering red's cause by increased recruiting
or standing, and adhering to religious teachings. Of course some of
these will not be relevant to some groups or individuals but
critical to other groups or individuals.

Fig. 3 shows red's decision problem in more detail in terms of
arrows between nodes and variables needed for the value model.
But this figure is still a simplification of the final model, which
could have, e.g., more decision nodes and many more choices
associated with each decision node. Also there will be some

intermediate chance nodes between the decision nodes and the
consequence nodes that would mitigate the effects associated
with the decisions being made. Note that a node for which type of
group is undertaking the attack is shown at the bottom of this
figure. There would be four to eight group types needed for a
realistic analysis.
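To make the node mechanics concrete, a toy influence diagram of our own construction (hypothetical option names and numbers, not the paper's model) can be evaluated by rolling the chance node back into the value node for each decision alternative:

```python
# Toy influence diagram: a decision node (attack option), a chance
# node (success, conditioned on the option), and a value node
# (benefit of success minus option cost).  All numbers are
# hypothetical placeholders.
P_SUCCESS = {"agent_X": 0.3, "agent_Y": 0.7}   # chance-node table
COST = {"agent_X": 1.0, "agent_Y": 4.0}        # parameters of the value node
BENEFIT = 10.0                                  # value of a successful attack

def expected_value(option):
    """Fold the chance node into the value node for one alternative."""
    return P_SUCCESS[option] * BENEFIT - COST[option]

best = max(P_SUCCESS, key=expected_value)
print(best)  # agent_Y
```

A realistic model would condition the success probability on red's perception of blue's countermeasures rather than a fixed table.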
2.2. A discussion of the uncertainties associated with this modeling
problem
These figures presuppose specific knowledge that we believe is
generally available concerning groups or individuals who are
likely to use WMD against large countries like the United States,
though a great deal of uncertainty remains. Example uncertainties
concern the skills and resources the red group or individual may
have, what their fundamental objectives are [17], what targets
they would consider and other aspects of their attack, and what
the consequences (e.g., deaths, financial damage, impact on the
economy) of such an attack might be. There is sufficient information
among researchers and intelligence analysts to elicit probability
distributions over the variables discussed here for each of
four to eight red groups. The analysis can then be conducted for
each group, and then aggregated across groups considering
probability distributions developed across the groups.
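The group-level aggregation described above is a probability mixture over group types. A minimal sketch (the group names and numbers here are hypothetical):

```python
def aggregate_across_groups(group_probs, action_probs_by_group):
    """Mixture over red group types:
    p(action) = sum over g of p(group = g) * p(action | group = g)."""
    actions = next(iter(action_probs_by_group.values())).keys()
    return {a: sum(group_probs[g] * action_probs_by_group[g][a]
                   for g in group_probs)
            for a in actions}

# Hypothetical distributions for two red group types.
p_group = {"G1": 0.6, "G2": 0.4}
p_action_given_group = {"G1": {"attack": 0.2, "wait": 0.8},
                        "G2": {"attack": 0.5, "wait": 0.5}}
result = aggregate_across_groups(p_group, p_action_given_group)  # attack ≈ 0.32
```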
A second category of knowledge presupposed by these figures
concerns the critical but very difficult issue of the attackers'
perceptions of what the consequences of an attack may be, as well
as how the consequences will relate to achieving sufficient levels
of satisfaction on their fundamental objectives. This issue of
perceptions is much more difficult to address than red and blue.
Horlick-Jones [18] discusses additional dimensions of perceptions
that add to perception complexities (technical, engineering, societal,
moral, and political), each impacting red's and blue's perceptions.
Renn [19] explains that the complexity resides in the fact that
perception is a mental representation. We will address how to
gather information on perceptions in Section 2.3. Before leaving
this topic we provide, in Fig. 4, another influence diagram showing
red's perceptions of blue's countermeasures decision. At the bottom
of this figure we show a possible representation of red's perceptions
of blue's objectives and associated measures for those objectives.
What were red's decision alternatives in Figs. 2 and 3 are now
uncertainties for blue when making countermeasure decisions.
Exactly how much discussion members of a red group might have
about blue's decision making is a question that should be posed
to experts. One of our plural modeling methods deals explicitly
with these issues.
Another category of knowledge necessary to perform calculations
on these diagrams is what types of models are reasonable
representations of how each of the red groups or individuals will
decide which alternative to choose. Multiple objective decision
analysis (MODA) [20] is a normative decision model, though it is
at times used as a descriptive model. Satisficing [21], lexicographic
reasoning [22], and prospect theory [23] have been
suggested as more realistic descriptive models. Various forms of
game theory are at times used for this situation, though the
descriptive power of these highly rational modeling approaches is
often called into question. Level-k game theory [24] fits nicely
within the plural modeling approach being taken here. Those
decision models are explained in Section 3.2. There are deeper
questions related to this issue of how to describe the decision
process, such as: is the order in which decisions (e.g., threat,
agent) are made important in predicting the final decision?
Finally, there is the uncertainty of how reliably red follows some
descriptive process. In a modeling sense, how noisy is red's
behavior when compared to our best attempt at a descriptive
model? Our approach to dealing with these issues is the plural
modeling discussed in Section 1 and incorporated into the title of
the paper. We will use as many of these modeling methods as
seems appropriate and develop probability distributions across
the actions of each red group based upon Monte Carlo simulations
over the previous uncertainties for each of these methods.
Finally, there is the issue of blue's reaction to red's attack. First,
is this reaction even important to red? Second, is there any
deterrence effect that will keep red from trying to be too
successful? These issues need to be addressed via elicitation and
modeling. Two approaches that can be used are game theory and
incorporating these issues into the fundamental objectives of red.

Fig. 4. Red's perceptions of blue's decision on countermeasures. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


2.3. A discussion of implementation issues for this modeling problem


Some of the critical issues in modeling adaptive adversaries
are (1) integrating this model into the risk assessment models
that need probability distributions over adversary decision alternatives,
(2) dealing with the computational size induced by the
number of relevant WMD variables, (3) gathering information on
red perceptions such that the results represent how red is
thinking about its own decisions and not how blue would act if
it were the terrorists, and finally (4) integrating the results of the
many levels of uncertainty into one representation of the uncertainty
faced by blue for a terrorist attack.
The first issue deals with knowing what decision outcomes the
broader risk assessment is addressing. The same WMD decision
outcomes being addressed in the risk assessment must be part of
the adaptive adversary model. But the context variables associated
with these decision outcomes from the perspective of red
groups must also be part of the adaptive adversary model. In
addition, some way of addressing multiple categories of red
groups or individuals must be included, along with the resources
and skills of each group. The influence diagrams we show in
Figs. 3 and 4 have samples of these details.
The computational size of some of these WMD risk assessments is large enough to create some unique problems. For
example, the Bioterrorism Risk Assessments conducted by the
U.S. Department of Homeland Security have about 50,000 variations of agents, target categories, acquisition/production options,
and dissemination options. Naturally some combinations of these
are not very consequential. The U.S. employs large, science-based
simulations to estimate the consequences across all 50,000
combinations of options and more [8]. A terrorist organization
may or may not have the computational resources or orientation
to take a similar approach. But more fundamentally, the
50,000-variation nature of the problem indicates the degree of
complexity that could be involved. A terrorist organization could
bring any of a number of analytic or less analytic approaches to bear
on that complex a problem, in ways that are hard to anticipate. This
brings us back to the important topic of perceptions.
There are many sources for estimating the perceptions of
terrorist organizations: intelligence reports; reports of interviews
with terrorism-related detainees; reports of experts on terrorism
psychology and decision making; media reports about previous
WMD attacks, both successful and unsuccessful; media reports on
the statements of experts or government spokespeople about
WMD attack variables; and reports by technology media and
papers in journals related to these WMD variables.
The final topic is combining the many forms of uncertainty into
a single probability distribution across the red decision outcomes,
i.e., across red's alternatives at a red decision node. We have
demonstrated that influence diagrams can be created to represent
the decision space, associated uncertainties, and objectives for each
red group. Similarly, we have just described information sources
that could provide rough probabilistic representations of the red
perceptions of the possible consequences for blue of red attack
options. We have even illustrated a modeling approach for thinking
about how red might view blue's decisions for fielding WMD
countermeasures. Finally, we have described how uncertain the
red decision-making process is, leading to formulating multiple
models of the red decision-making process. We will describe how
these models of decision-making processes can be implemented in
the next section. Now we must address how probabilities across
decision outcomes can be computed for any model of decision
making. The two common approaches are Monte Carlo simulation
and employing one or a combination of probabilistic choice models.
The Monte Carlo approach randomly samples from the probability
distributions over the arguments for a given decision
model, then calculates the decision outcome that would be
chosen by that decision model in that Monte Carlo case, repeating
that process for some large number of cases. The number of times
a decision outcome is chosen, across Monte Carlo cases, is
normalized to become the probability that decision outcome will
be chosen, given that one of the decision outcomes is chosen. This
approach has the advantage that if a specific decision outcome is
always inferior to some other decision outcome, its probability
will be zero.
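A minimal sketch of this sampling loop (our own construction; the input distributions and decision rule are placeholders, not elicited quantities):

```python
import random
from collections import Counter

def monte_carlo_choice_probs(sample_inputs, decision_model,
                             alternatives, n_cases=10000):
    """Estimate p(alternative) by repeatedly sampling the uncertain
    inputs and recording which alternative the decision model
    chooses in each sampled case."""
    counts = Counter()
    for _ in range(n_cases):
        case = sample_inputs()            # one draw from the joint input distribution
        counts[decision_model(case)] += 1
    return {alt: counts[alt] / n_cases for alt in alternatives}

# Toy illustration: red picks the option with the higher sampled value.
random.seed(0)
probs = monte_carlo_choice_probs(
    sample_inputs=lambda: {"A": random.gauss(1.0, 0.5),
                           "B": random.gauss(0.8, 0.5)},
    decision_model=lambda values: max(values, key=values.get),
    alternatives=["A", "B"],
)
```

An alternative that is always inferior is never counted, so its estimated probability is exactly zero, matching the property noted above.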
The second approach is to compute an expected value to red
for each of the decision outcomes across all of the uncertainties,
then use one or a combination of two widely recognized models
for estimating choice probabilities among decision outcomes
based on the set of expected values for all decision outcomes:
the Luce model [25] and the Random Utility Model [26]. We can
compactly present the equation for choice probabilities for those
two models in one equation:
\[
p(D_i) = \frac{e^{a\,v(D_i) + b\,\ln v(D_i)}}{\sum_{j=1}^{n} e^{a\,v(D_j) + b\,\ln v(D_j)}}
\]

where D_i represents the ith decision alternative available to red;
p(D_i) is the probability the ith decision alternative will be selected
by red, given that red picks one of the alternatives from the set D_j,
j = 1 through n; v(D_i) represents red's value for the ith decision
alternative; and a and b are constants that sum to 1.
When a = 0 and b = 1, the equation is the Luce model. When
a = 1 and b = 0, the equation is the random utility model. This
combined-model equation will generate choice probabilities for
methods that compute an expected value for each of the
considered decision outcomes, such as MODA and prospect
theory. (Note: Section 4 will address how the results of each of
the several models can be aggregated into a final answer.)
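Assuming the combined equation above, a short implementation (our own sketch) shows how the a and b settings recover the two models:

```python
import math

def choice_probabilities(values, a, b):
    """Combined Luce / random-utility choice model:
    p(D_i) is proportional to exp(a*v(D_i) + b*ln v(D_i)).
    a = 0, b = 1 recovers the Luce model; a = 1, b = 0 recovers the
    random utility (logit) model.  Values must be positive when
    b > 0 so the logarithm is defined."""
    scores = [math.exp(a * v + b * math.log(v)) for v in values]
    total = sum(scores)
    return [s / total for s in scores]

# Luce model: choice probabilities proportional to the values.
print(choice_probabilities([1.0, 2.0, 1.0], a=0.0, b=1.0))  # ≈ [0.25, 0.5, 0.25]
```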

3. A plural modeling approach for predicting decision outcomes

Forecasters have shown repeatedly that aggregating across
multiple models outperforms a good single model.
Considerable literature has accumulated over the years regarding the combination of forecasts. The primary conclusion of
this line of research is that forecast accuracy can be substantially improved through the combination of multiple individual forecasts. Furthermore, simple combination methods
often work reasonably well relative to more complex combinations [14].
See also [27–30]. While most of these forecasters are working
in simpler domains than predicting human decision making,
these results should generalize to any forecasting domain.
Guikema [31,32] shows that aggregate forecasts outperform
individual forecasting models on forecasting the impacts of
natural disasters on critical infrastructure in the context of
homeland security risk analysis.
In general, averaging across many models is likely to produce
results that may prove close to a uniform probability distribution
if the models are widely divergent. Answers that are close to a
uniform distribution may not be viewed as helpful, but in many
cases may be the correct answer. This is especially true if
numerous models were used and their results varied widely.
The general principle involved can be simply put: Any one
model makes a set of assumptions that deviates from the actual
world processes being predicted. If different models deviate
from the actual world in different ways, then it should not be


surprising that aggregating those models results in better predictions
than any one of those models operating alone. There is
nothing strictly inevitable about that reasoning, but the empirical
findings cited above show that reasoning to widely hold true. That
said, two concerns about plural modeling call for comment:
First, plural modeling muddles the relationships of which
assumptions drive which results, which can be deduced by careful
study of an individual model. Yet there is nothing in plural
modeling that precludes the modeler from deducing those relationships in each individual model, then analyzing how those many
relationships map into the aggregated behavior of the plural model.
A second concern involves framing. Any model has a particular
scope and set of assumptions. That scope and those assumptions
have an effect on the descriptive performance of the model, which
we will refer to here as a framing effect. In the case of the
component models considered here, those assumptions include
ones concerning the ranges of alternatives, and value attributes of
those alternatives, considered, and the choice model employed.
Plural modeling in general presents an opportunity to reduce
framing effects by combining models with different framings.
However, the plural modeling presented here has limitations on
reducing those framing effects. That is, all the component models
aggregated here share the same range of alternatives and range of
attributes. So, while the plural modeling presented here has the
opportunity to reduce the framing effect of which particular
choice model is used, by aggregating over several choice models,
it does not reduce the framing effects of the ranges of alternatives
and attributes considered.
Certainly time has proven that predicting human decision
making, especially the decisions of humans who are adversaries (e.g.,
terrorists), is quite difficult and fraught with peril. The research
literature on building descriptive models of human behavior
demonstrates that no single approach does very well [33].
For DARPA's on-going Integrated Crisis Early Warning System
(ICEWS) program, Innovative Decisions, Incorporated (IDI), the
consulting group with which the authors are affiliated, constructed a
probabilistic aggregation model to combine the estimates made
by four statistical and two agent-based models. The combined
estimates outperformed those of any one other model. To accomplish
this, IDI characterized the estimates of each model, effectively
recalibrating them, and developed a Bayesian network,
through a data mining process, to compute the combined estimates [34].
Section 3.1 describes ensemble approaches for aggregating the
results of multiple models. The next section after that describes
the descriptive models that we believe should be employed in any
plural modeling approach to predicting the decision outcomes of
red. Finally Section 3.3 describes an approach to aggregating
across this collection of descriptive models given that no ground
truth data is available.

3.1. Ensemble modeling approaches


As previously discussed, plural modeling [35] recommends
analyzing a modeling problem by applying several analyses in
parallel and aggregating their results. The primary advantage of
such an approach is that the results are more accurate than
relying on a single complex model [36].
An illustrative analog is found in the machine learning
literature; in particular ensemble learning aggregates results of
multiple learners [37]. A learner is simply a model learned from
data that makes a prediction, such as the classification of an
observed entity. Ensemble learning has two thrusts: (1) generating
a set of base learners, each of which makes its own prediction,
and (2) aggregating the output of the base learners, sometimes
with a model that is learned from data that includes the predictions
made by the base learners. This approach works given the
following assumptions: (a) the base learners are independent
and (b) each base learner is better than chance.
The challenge in generating ensembles is to create a large set
of base learners. To efficiently generate large sets of learners,
researchers may manipulate:
a) Classifier/model type: One approach is to use information
about a model's informational and computational requirements
to select a subset of available models.
b) Versions of a model type: To manipulate versions of a model
type, one may modify a basic model in a systematic way. For
example, one may randomly limit oneself to a subset of the
random variables when learning a model or modify the
structure of the relationships among the variables.
c) Training sets: A training set is a subset of the data used to
generate base learners. By randomly sampling from the available training data, usually sampling with replacement, a
different base learner may be generated for each sample.
d) Parameter sets: Model parameters, where relevant, may also
be manipulated through sampling to produce as many models
as samples.
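As a sketch of manipulation (c) (our own toy data and base learner, not from the paper), bootstrap resampling of the training set yields a set of base learners whose predictions can be aggregated by majority vote:

```python
import random

rng = random.Random(0)

def bootstrap_training_sets(data, n_learners):
    """Manipulation (c): one training set per base learner, drawn
    from the available data by sampling with replacement."""
    return [[rng.choice(data) for _ in range(len(data))]
            for _ in range(n_learners)]

def train_threshold_learner(training_set):
    """A deliberately simple base learner: classify x as 1 when x
    exceeds the midpoint of the two class means (falling back to
    0.5 if a class is absent from the bootstrap sample)."""
    pos = [x for x, y in training_set if y == 1]
    neg = [x for x, y in training_set if y == 0]
    if not pos or not neg:
        cut = 0.5
    else:
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0

def majority_vote(learners, x):
    """Thrust (2): aggregate the base learners' predictions."""
    votes = [learner(x) for learner in learners]
    return max(set(votes), key=votes.count)

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
learners = [train_threshold_learner(ts)
            for ts in bootstrap_training_sets(data, 25)]
print(majority_vote(learners, 0.85))  # 1
```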
We consider the adaptive adversary models to be equivalent to
base learners. To generate an ensemble, we propose that a varied set
of models be selected from a database. Then we propose to sample
over specified model parameters so that any one model produces
multiple outcomes that are then combined into a distribution. Next,
we combine the probability distributions from the different models.
Table 1 summarizes selected aggregation methods used to aggregate
probability forecasts by experts or models. Aggregation requirements
denote what known data is required, while the aggregation
considerations denote what issues about forecasts the method
explicitly considers. As shown in the table, most probability aggregation
tools use past performance to guide aggregation. Many also

Table 1
Aggregation methods.
Aggregation method

Majority vote
Average
Weighted Average
Variance Algorithm [39]
Coherent Approximation Principle [41]
Generative Bayesian [42]
IDI ICEWS aggregator [34]
Multi-response linear regression (MLR) [43]

Aggregation requirements

Explicit aggregation considerations

Priors

Calibration

X
X

History

X
X
X
X
X
X

Coherence

Precision

Dependence
among experts

X
X
X
X

D.M. Buede et al. / Reliability Engineering and System Safety 108 (2012) 7789

try to model characteristics of the forecasters. Some approaches,


such as average, are simple. Others, such as the ICEWS Aggregator
[34] actually learn another model to perform the aggregation. In
machine learning, this is called stacking [38].
In comparing aggregation methods, one needs to consider a
number of factors. These include requirements such as prior
values for probability distributions and a ground-truth history.
Calibration is the degree to which predicted probabilities match observed frequencies: for example, events predicted with a 0.75 probability occur 3 out
of 4 times. Coherence requires that a set of predictions from a
single source be logically consistent. Precision speaks to the
sharpness of the predictions. Dependence among experts considers
the degree to which the experts' predictions are correlated.
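The calibration notion can be sketched with made-up forecasts and outcomes; a forecaster is calibrated if events assigned probability p occur a fraction p of the time:

```python
# Hypothetical forecasts and binary outcomes for illustration only.
forecasts = [0.75] * 8 + [0.25] * 8
outcomes = [1, 1, 1, 0, 1, 1, 0, 1,   # 6 of 8 occur -> 0.75
            0, 0, 1, 0, 0, 1, 0, 0]   # 2 of 8 occur -> 0.25

def calibration_table(forecasts, outcomes):
    """Observed frequency of occurrence for each forecast probability."""
    buckets = {}
    for p, y in zip(forecasts, outcomes):
        buckets.setdefault(p, []).append(y)
    return {p: sum(ys) / len(ys) for p, ys in buckets.items()}

print(calibration_table(forecasts, outcomes))
# {0.75: 0.75, 0.25: 0.25} -> this toy forecaster is perfectly calibrated
```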
Dani et al. [39] compare the performance of a number of
algorithms on a database of binary predictions. In all cases,
performance is based upon past history. For example, Cesa-Bianchi
et al. [40] found (for their particular data set) that variations of an
aggregation algorithm that assigns weights to individual experts
perform no better than a simple average. An algorithm that weights
experts based on their variance does better. The variance algorithm assumes that the experts' predictions are (1) Gaussian, centered on the true probability, (2)
unchanging in quality over time, and (3) independent. Dani et al. [39] surmise that the average works well because
experts are not well calibrated, tending to be over-confident in
their predictions. They also propose that a weighted linear
combination across a set of other aggregation algorithms may
perform well. Kahn's generative Bayesian model [42] considers
calibration and dependence among experts, as well as accuracy, for
fusing forecasts of experts and models. The calibration assumes a
functional form based upon a single parameter. It performs well
on simulated datasets. Predd et al. [41] add coherence to the mix.
Coherence depends upon eliciting multiple related probabilities
from an expert. Statistical and agent-based models produce
forecasts for the ICEWS program. Following traditional Bayesian
formulations [28] and using historical performance data, the
ICEWS aggregation algorithm [34] learns the likelihood ratios
associated with a discretization of a model's forecasts. The naive
Bayes formulation assumes independence of the forecasters. Multi-response linear regression (MLR) develops a linear regression for each situation of interest [43].
Ranjan and Gneiting [44] prove that linear combinations of
calibrated forecasts are themselves not calibrated. They propose a
transform that adjusts the weights of the different forecasters so that
the aggregated distribution is calibrated. We believe that improving the calibration of aggregated forecasts is a major performance
enhancement.
3.2. Descriptive models for predicting human behavior
Our plural modeling approach builds around the concept of
the adversary having multiple conflicting objectives. As such, we
start with Multiple Objective Decision Analysis (MODA), even
though it is better known for its normative power than its descriptive
power. However, other approaches such as satisficing, lexicographic analysis, and prospect theory all use a similar structure
with different mathematics. Finally, we have adapted level-k
game theory to this structure as well.
MODA is an approach to balancing trade-offs among competing
objectives that is consistent with the rationality axioms of decision
analysis [20]. There are additional axioms associated with MODA
that justify the additive equation used in most MODA
applications [20]:

v(x) = Σ_{i=1}^{n} w_i v_i(x_i)

where v(x) is the overall value associated with an alternative, based
on an analysis that addresses n measures that capture the value of
the alternatives on n or fewer objectives. x_i is the value of the ith
measure for the alternative in question. v_i is the value function for
the ith measure; this value function has defined minimum and
maximum values for x_i, known as x_i^0 and x_i^*. This value function can
reflect "more is better" or "more is worse," etc. w_i is the relative
weight for the ith measure; this weight is properly called a swing
weight because it reflects the importance of the ith measure relative
to the other measures, based on the swing in value from
the minimum to the maximum values x_i^0 and x_i^*. The
weights are commonly normalized to sum to 1.0, and the value
functions are normalized to range from 0 to 1, 0 to 10, or
0 to 100.
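The additive equation can be sketched in code; the measures, ranges, and swing weights below are purely hypothetical:

```python
def linear_value(x, x_min, x_max, more_is_better=True):
    """A single-measure value function v_i, normalized onto [0, 1]."""
    v = (x - x_min) / (x_max - x_min)
    return v if more_is_better else 1.0 - v

def moda_value(xs, weights, value_fns):
    """v(x) = sum_i w_i * v_i(x_i), with swing weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * f(x) for w, f, x in zip(weights, value_fns, xs))

# Hypothetical measures for illustration only: casualties inflicted
# (more preferred by red); cost in dollars and probability of
# attribution (both less preferred).
value_fns = [lambda x: linear_value(x, 0.0, 1000.0),
             lambda x: linear_value(x, 0.0, 1e6, more_is_better=False),
             lambda x: linear_value(x, 0.0, 1.0, more_is_better=False)]
weights = [0.5, 0.2, 0.3]

print(moda_value([500.0, 2.5e5, 0.4], weights, value_fns))   # 0.25 + 0.15 + 0.18
```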
The objectives of the MODA model for each particular terrorist
group will have to be defined during elicitation sessions with
experts. A first cut at high-level objectives for the terrorist groups
was shown in Fig. 3: change blue policy, recruit more members to
further the goals of the group, adhere to religious policies, and cost.
These are consistent with the fundamental objectives that Keeney
and von Winterfeldt [17] suggested recently.
Satisficing was proposed by Simon [21] as a descriptive theory
of human decision making because it embodies a form of bounded
rationality. Satisficing starts by having the decision maker think
of an alternative and evaluate it on a set of objectives and
measures. So red is assumed to be rational with respect to the
value model, but to have bounded rationality with
respect to identifying a complete set of alternatives from which to
choose. In our approach a random alternative would be selected
from the universal set of alternatives. If the alternative passes
some threshold of overall value, then the decision maker selects
that alternative and acts. However, if the alternative falls short of
the threshold of overall value, the decision maker finds another
alternative and repeats the process. That is, red thinks of one
alternative at a time and determines (using a value model) whether it is
good enough. If not, red thinks of another alternative and
continues until one is determined to be good enough. We believe
there is real-world merit to this approach, since a typical terrorist
group would not have the ability to evaluate a complete set of
biological agents (or chemical compounds, radiological devices,
etc.) in one sitting, but would be presented with or think of
potential weapons somewhat randomly, and would have to decide
whether to take action with a given weapon or wait for a better
opportunity.
For the satisficing case, blue will have uncertainty about the
value model, the order in which the WMD alternatives will be
presented, and the threshold adopted by red. So in this case the
Monte Carlo simulation would randomly sample over all of those
uncertainties.
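The satisficing loop just described can be sketched as follows; the alternative values, threshold, and seed are hypothetical, and a full analysis would sample over them as part of the Monte Carlo simulation:

```python
import random

# Hypothetical overall values v(x) for four candidate alternatives.
alternatives = {"A": 0.42, "B": 0.55, "C": 0.71, "D": 0.30}

def satisfice(values, threshold, rng):
    order = list(values)
    rng.shuffle(order)                  # alternatives occur to red at random
    for alt in order:
        if values[alt] >= threshold:    # "good enough": act on this alternative
            return alt
    return None                         # nothing clears the bar; red waits

counts = {a: 0 for a in alternatives}
counts[None] = 0
rng = random.Random(7)                  # fixed seed for reproducibility
for _ in range(10000):
    counts[satisfice(alternatives, threshold=0.5, rng=rng)] += 1

print(counts)   # only B and C ever clear the 0.5 threshold
```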
Lexicographic reasoning is another descriptive theory of human
decision making that is based on a form of bounded rationality
called non-compensatory heuristics [22]. Here the decision maker
is assumed to be capable of developing a list of all possible
alternatives, but the value model is much simplied. The decision
maker is assumed to have a rank order of the most important
measures or objectives. No value functions or weights are needed.
The decision maker rank orders the alternatives on the basis of
how well they do on the most important measure (or objective). If
there is one single alternative at the top of the list, this alternative
is the winner. If two or more alternatives are tied for best, these
tied alternatives are ranked on the second most important
measure or objective. Again if only one alternative is at the top
of this second ranking, the winning alternative is selected. If there
is a tie, this process is repeated on the third most important
measure (or objective). This approach is called non-compensatory
because the second-ranked alternative may outperform the
top-ranked alternative on every other objective, but this does not
compensate for even a small difference on the most important
objective.
For the lexicographic reasoning case, blue will have uncertainty about the rank order of the measures (and objectives). The
Monte Carlo simulation will therefore sample over that rank order.
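The lexicographic procedure can be sketched as follows; the alternatives and scores are hypothetical (higher is better, and measures are listed from most to least important):

```python
# Hypothetical alternatives scored on three rank-ordered measures.
alternatives = {
    "attack_1": (3, 9, 9),
    "attack_2": (5, 2, 1),
    "attack_3": (5, 4, 0),
}

def lexicographic_choice(alts):
    """Compare on the most important measure; break ties with the next one."""
    candidates = list(alts)
    n_measures = len(next(iter(alts.values())))
    for i in range(n_measures):
        best = max(alts[a][i] for a in candidates)
        candidates = [a for a in candidates if alts[a][i] == best]
        if len(candidates) == 1:
            break
    return candidates[0]

# attack_1 dominates on the 2nd and 3rd measures, but its deficit on the
# most important measure is never compensated.
print(lexicographic_choice(alternatives))
```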
Prospect theory was developed by Kahneman and Tversky [23]
to retain part of decision theory but be more descriptive of actual
human decision making. There are now two forms of Prospect
theory: original [23] and cumulative [45].
Original Prospect theory takes the following form for the value
function:

v(x) = x^α, if x > 0
v(x) = 0, if x = 0
v(x) = −λ(−x)^α, if x < 0

where x is measured relative to the status quo. Common values for α and λ are 0.88 and
2.25, respectively. This curve demonstrates satiation of value for
both negative and positive values of x as x moves away from zero,
as well as a stronger influence of negative values of x compared to
positive values of x.
Our implementation of original Prospect theory substitutes
the individual MODA value functions for red (the v_i from
the MODA equation) into the equation above. That is, x in
the equation above would represent each individual value function for the red MODA model, calibrated such that v_i(status
quo) = 0. Since the decisions being addressed have to do with
taking new actions, the status quo values are the current levels if no
new action is taken.
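The value function above can be sketched directly, using the commonly cited parameter values; the inputs are gains or losses relative to the status quo (x = 0):

```python
ALPHA, LAMBDA = 0.88, 2.25   # common parameter values cited in the text

def prospect_value(x):
    """Original Prospect theory value of a gain/loss x relative to status quo."""
    if x > 0:
        return x ** ALPHA
    if x < 0:
        return -LAMBDA * (-x) ** ALPHA
    return 0.0

# Losses loom larger: an equal-sized loss weighs 2.25 times as much as a gain.
print(prospect_value(0.5), prospect_value(-0.5))
```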
Cumulative Prospect theory creates a function that modifies the
subjective probabilities of the decision maker:

π(p) = δp^γ / (δp^γ + (1 − p)^γ)

Kahneman and Tversky [45] assumed δ was equal to 1 and
found γ to equal 0.61 for gains and 0.69 for losses. Wu, Zhang and
Gonzalez [46] found δ and γ to be 0.79 and 0.60, respectively, for gains,
and 0.88 and 0.67 for losses. The latter s-shaped
curves are shown in Fig. 5; note that the curvature is much more
pronounced for losses. Both curves intersect the
45-degree line, where π(p) = p, at approximately p = 0.39. In the calibration
literature these curves are termed underextreme, assigning
probabilities on the y-axis that are not extreme enough.
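The weighting function can be sketched with the parameter values quoted above:

```python
def weight(p, delta, gamma):
    """pi(p) = delta * p**gamma / (delta * p**gamma + (1 - p)**gamma)."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)

# Wu, Zhang and Gonzalez parameters: gains (0.79, 0.60), losses (0.88, 0.67).
for p in (0.05, 0.39, 0.95):
    print(p, weight(p, 0.79, 0.60), weight(p, 0.88, 0.67))
```

Small probabilities are overweighted, large probabilities underweighted, and the fixed point π(p) = p sits near p = 0.39.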
Fig. 5. Comparison of gains and losses in prospect theory.

Level-k game theory was developed by Stahl and Wilson
[47,48] and Nagel [49] and has been found to successfully account
for behavior in a wide range of experimental settings [24,49-52].
It provides a tractable algorithmic alternative to traditional
game-theoretic solution concepts, while relaxing the hyperrationality assumptions required to justify those concepts.
As in traditional game-theoretic models, players form beliefs
about how their opponent(s) are likely to play. In traditional
equilibrium-based approaches to solving game-theoretic models,
these beliefs are found by imposing a mutual consistency
assumption: each player's belief about her opponents'
actions should coincide with the actual actions chosen by her
opponent(s). In level-k game theory, these beliefs are instead
formed on the basis of an inductive, hierarchical model of the
strategic sophistication of players.
Specifically, each player in a level-k game has a level of strategic
sophistication. A level-0 player is non-strategic and is assumed to
act randomly. A level-1 player reasons strategically but employs a
simplistic model of how his opponent(s) will play: he assumes they
are level-0 players and treats their actions as effectively random.
A level-2 player is one step more sophisticated: she assumes her
opponent(s) are level-1 players, and forms her beliefs about how
they will act accordingly. Similarly, a level-3 player forms beliefs
about her opponent(s) by assuming they are level-2 players (and by
computing what such a player would do). A level-4 player assumes
she faces a level-3 player, and so forth.
This algorithmic solution approach is easily adaptable to, and
tractable in, complex settings, including cases where each player
is uncertain about the goals and worldviews of her opponent(s)
[53,54].
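The level-k hierarchy can be sketched for a toy attacker/defender game; the targets, payoffs, and zero-sum structure below are invented purely for illustration:

```python
TARGETS = ["port", "stadium", "airport"]
RED_VALUE = {"port": 5, "stadium": 3, "airport": 4}   # hypothetical payoffs

def red_payoff(attack, defend):
    """Red gains nothing if blue happens to defend the attacked target."""
    return 0 if attack == defend else RED_VALUE[attack]

def best_response(player, opp_dist):
    """Pure best response to a probability distribution over opponent actions."""
    def expected(action):
        if player == "red":
            return sum(p * red_payoff(action, d) for d, p in opp_dist.items())
        return sum(p * -red_payoff(a, action) for a, p in opp_dist.items())
    return max(TARGETS, key=expected)

def level_k(player, k):
    """Action distribution for a player of sophistication level k."""
    if k == 0:                                    # level-0: non-strategic, random
        return {t: 1.0 / len(TARGETS) for t in TARGETS}
    opponent = "blue" if player == "red" else "red"
    return {best_response(player, level_k(opponent, k - 1)): 1.0}

print(level_k("red", 1))   # best response to a randomizing blue
print(level_k("red", 2))   # best response to a blue who expects a level-1 red
```

In this toy game a level-1 red attacks the highest-value target, while a level-2 red, expecting blue to defend it, switches to the next-best target.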
In summary, we are proposing six different descriptive modeling techniques. Some will be analyzed simply by
Monte Carlo simulation to generate probabilities; others will be
analyzed via both Monte Carlo simulation and the expanded
Luce method to generate two sets of probabilities.
3.3. An aggregation approach for plural modeling
The adversary models we envision produce probability distributions over adversary choices. As illustrated by Table 1, most
approaches for combining distributions across multiple models
require historical ground truth data. How does one proceed
without a history grounded in experience?
Guikema and Aven [13] suggest an integrative approach that
triages risk into three classes: tolerable, unacceptable and subject
to further study. Their approach applies risk assessments designed
to rank discrete risks. Tolerable risks require no further study.
Mitigation/prevention of unacceptable risks will be of the highest
priority. For the third class, they recommend assessing the risk
with four different approaches: game-theoretic, probabilistic risk
assessment (PRA), a semi-quantitative analysis and protecting high
value targets. Each assessment produces a ranked list of the risks
being studied. Risks whose rankings agree across all four assessments
would be ranked accordingly. Risks for which there
was disagreement would be subject to further analysis to determine why the assessments disagreed. The advantage of such an
approach is the insight provided by considering multiple analyses,
each with its own strengths and weaknesses.
Using ranked lists, in lieu of probability distributions over
events of interest, gets around the problem of calibration when
comparing probabilities generated by two models. Two models'
lists could be very similar in their order, but have distributions
with very different variances. When the ranked lists for all models are
in agreement, an average of their probabilities would be an
appropriate combination method given a lack of ground truth.
If the ranked lists disagree, an average may still be presented, but the
user should be warned about the disagreement among models.
This could encourage further analysis. A statistic such as Kendall's
tau could be presented to the user in conjunction with a
distribution to inform the user of the level of agreement
among the lists.
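Kendall's tau for two models' ranked lists can be sketched as follows (the lists themselves are hypothetical, most likely option first):

```python
from itertools import combinations

model_a = ["anthrax", "ricin", "sarin", "cyanide"]
model_b = ["anthrax", "sarin", "ricin", "cyanide"]

def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two rankings of the same items (no ties)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) / 2
    return (concordant - discordant) / n_pairs

print(kendall_tau(model_a, model_b))   # 5 of 6 pairs agree: tau = 2/3
```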
We do have some ground truth. IEDs, for example, are planted
by terrorist organizations every day. As a second example, suicide
attacks occur on a regular basis. Our suite of adversary models
provides a variety of decision-making behaviors that depend
upon group characteristics. The challenge is to determine which
behavior(s) are most aligned with a specified terrorist group type.
One could use attacks for which we have data to evaluate the
different decision-making behaviors represented by the models.
The assumption is that the decision-making behavior used in
planning for and carrying out lesser attacks would be consistent
with that for WMD attacks. By creating a set of adversary models
for attacks for which data is available, we would be able to
generate some history for each adversary model type. That history
could in turn be used to generate an aggregation model.

4. Framework for computing and aggregating probabilities across models
As can be seen from the discussion in Section 3, this plural
modeling approach relies upon Monte Carlo simulation of the
decision-making models, given the uncertainties in the value trade-offs
across objectives, the samples of alternatives being considered,
and other probabilistic inputs related to red's perceptions and
resources. So the computing effort for this approach needs a
computational engine. In addition, there may be multiple users
with varying needs and threat risk assessments that drive the
specific manner in which these plural models will be used,
dictating a flexible user interface. Additionally, there may be
widely varying input data sets that depend upon the question
being asked and the threat assessment being done. There will also
often be a need to save the results of an analysis so that it can
be rerun at a later date with some changes or be used to create
new variations of runs for a future analysis. Finally, some users
may want just the answer, a probability distribution over the
threat decisions, while other users may want sophisticated
sensitivity analysis charts that describe which variables were
most critical in driving the answers and how varied these answers
might be given different settings of the input data. Fig. 6 shows a
schematic of the computational framework for this plural modeling approach, with queries from the users (long dashes) entering
from the top, outputs exiting to the top right and right (long dash
and two dots, repeated), and input data and models entering from
the left (long dash and single dot, repeated). This is not an
influence diagram but a diagram showing the flow of inputs (data
and models) and outputs (data results and reports).
The next three figures show the framework details and
illustrate two use cases of how users might interact
with this framework. At the top of Fig. 7, we see that the
framework includes two user interface modules: the problem
definition query processor and the visualization query processor.
The arrows indicate data flows (dotted lines) and flows of control
(dashed lines). The double-headed arrows indicate that data or
controls flow in both directions (e.g., query and response). The
first processor enables a user to compose a query that defines what
input data is required, what models are to be used, and what
outputs are desired, while resolving conflicts that arise, such as
the selected models requiring more data than is available. This
problem definition query processor requires some controller
capabilities to help resolve those conflicts. The visualization query
processor enables interactions with the user once the computations are complete and various visualization formats are being
explored. Here some controller and output-processing capabilities
are needed. The supervisor module integrates the problem definition with the Monte Carlo simulations that must be performed. The
Monte Carlo simulation module calls the models that are required
and supervises the model computations needed to complete the
problem definition. Each of the models is enabled to call the data it
needs from the database. Finally, the module for aggregation and
results sorts through the model results to create the outputs
desired by the users in the formats of choice. Different shapes are
used for the functional blocks in the computational framework to
make the functional differences clear to the reader.
Fig. 8 presents the first of two use cases, which show just the flow
of data between the modules of the computational framework. Here
all of the arrows are solid since they represent just one activity: data
flow. This use case addresses the definition of a query and the
computation of the models needed to complete the query. It
illustrates a clear interaction of the user with the problem definition
query processor, which interacts with the supervisor, which sets up
the query and enables the Monte Carlo simulation module to carry
out the computations. The results of these computations are then
stored in the database. The second use case addresses pulling these
results out of the database for visualization to the user.
Fig. 9 presents a use case for the visualization query. Here the
user interacts with the visualization query processor to define the
format of the subset of results of interest. The visualization query
processor interacts with the aggregation and results module so that
the latter can format the visualizations requested and

Fig. 6. External interactions with plural modeling framework.

Fig. 7. Computational framework for plural modeling.

Fig. 8. Use case for a problem definition query.

cause additional computations to be performed via the previous
use case, if needed. In particular, most users are going to want to
know what models, and what assumptions associated with those
models, are driving the results. The aggregation and results module will contain a range of sensitivity and
what-if analysis tools to help the user figure out these associations
between models, assumptions, and results.
In summary, any modeling approach for predicting the actions of
adaptive adversaries must serve as an input to a broader risk
analysis engine that estimates probability distributions over the
consequences to blue of red's actions and reactions to blue's
mitigation actions. The point of this analysis is to aid blue in
examining alternative sets of countermeasures and adopting a cost-effective countermeasure approach. The plural modeling approach
presented in the previous section requires substantial Monte Carlo
simulation to generate the probabilities that specific red groups or
individuals would undertake any of the many possible
attacks. The possible attacks number in the millions. The
context settings (e.g., red group and possible resources, perception
states) around which a specific attack might be chosen number at
least in the thousands. The magnitude of the combinations and
permutations of the problem (e.g., red group, decision-making
priorities and process, perception states) calls for analyses that fully
account for that complexity and the associated uncertainty.

5. Evaluation and validation


Sound analysis requires that the issues of evaluating the
results of modeling, and of validating those results, be taken
seriously.

Fig. 9. Use case for a visualization query.

Having a computational platform that is capable of
performing what-if analyses and sensitivity analyses is a critical
part of evaluating the model results. It is here that we include the
development of statements about which model parameters have a
big (or negligible) impact on the answer and under what conditions those findings hold. Similarly, with what-if analysis one can
describe the circumstances (data inputs) that would yield certain
types of outputs. The computational framework described above
for the plural modeling approach contains these capabilities.
Validation of the models is more complicated. There is no
substantive record of ground truth for WMD attacks by
terrorist groups, since they have been rare events, which is a good
thing. Except for the simplest models, it is incorrect to refer to
complex models, such as those described in the influence diagrams
above, as valid or validated simply by inspection. More appropriately, our validation plan produces a qualified description of validity
that its users can interpret as accurate enough to be useful
within the parameters and context (bounds of validity) set forth
in its design.
Our validation plan comprises three steps. The first step is
cause-effect graphing, which compares cause and effect in the model
with cause and effect in the real world. The influence diagrams
shown in Figs. 2 through 4 are examples of such cause-effect graphing.
The second step is predictive validation, which compares model
outcomes to corresponding outcomes in the real world. Of course,
the real world here is the quintessential issue, since these models
are predicting events that have not necessarily happened before.
Since there is no data source for ground truth related to red WMD
attacks, our goal for predictive validation of models of red decisions
is a finding of model convergence. Model convergence is achieved
when multiple sources are analyzed and data triangulation results
in convergence for a given set of inputs. We propose the following
process.
1. Establish test cases for a dozen or two input sets. Specific input
sets would be created to make various types of attacks as likely as
possible; other input sets would be devised to make various types of
attacks as unlikely as possible.
2. Execute the plural model computations for those input sets.
3. Compare the output of the plural modeling approach to results
from the following venues as surrogates for ground truth:
a. Red team assessments based on tabletop exercises using the
same scenarios and inputs, with multiple experts from government, academic, and think tank organizations.
b. Literature-based evidence associated with the writings and statements of specific terrorist groups or individuals.
c. Other analyses, such as previous threat risk assessments.
As part of this triangulation process, the data generated from
the test cases would be analyzed for themes and patterns, allowing
the analysts to triangulate as described above and to build a library
of what-if and sensitivity results that could form the basis for
future modeling activities.
Step three is an informal process known as face validity. It is
often not possible to validate a model by saying it has sufficient
face validity, but it is possible to discredit a model by saying it
does not have face validity. Given our plural modeling approach,
perhaps the most relevant face validity question is whether there
are other models that have face validity equivalent to those being
used in our framework. If so, and if they can be
integrated easily into our framework and harnessed as part of
the plural modeling results, then they should be.

6. Discussion and summary


Addressing risk mitigation actions to counter terrorist WMD
strikes has become a time-consuming, funding-intensive activity
within governments around the world. There are many risk
analysis frameworks for evaluating such risk mitigation activities,
but they all require modeling adaptive adversaries, and particularly the elusive nature of human decision making. Unfortunately,
the scope of this problem is large, so the analytics have
to handle significant computational issues as well as significant
predictive modeling issues. Underlying both of these issue sets is
a vast degree of uncertainty regarding who the reds are, what reds
want and know (including perceptions), what resources and skills
reds possess, how reds decide, and what would make any one red
act if a WMD weapon were available. Given all of that uncertainty,
any defensible analytical method must generate a probabilistic
(or other uncertainty-characterizing) prediction of potential


red actions that fully accounts for and communicates all of those
many uncertainties.
The approach described in this paper handles all of these
issues. A plural modeling approach is used to compensate for our
uncertainty about red decision making. Influence diagrams representing the uncertainties about who red is and what red knows
(perceptions) are used to structure the decision problem for each
of the descriptive decision models employed: MODA, satisficing,
lexicographic reasoning, Prospect theory (two versions), and
level-k game theory. Monte Carlo simulation, as well as Luce
normalization and randomized utility methods, is used to
produce probability distributions over the red decision outcomes.
A computational framework is then proposed for building these
analytic methods into any terrorist risk assessment for red WMD
actions.
Finally, any modeling effort must address how its results can
be validated. This is especially difficult in problem areas of
predictive modeling for which ground truth data is scarce. This
paper describes a three-step process that involves cause-effect
graphing; the triangulation of a range of data, judgments, and
previous modeling results; and face validity assessments that the
approach employs reasonable methods and is not ignoring other
reasonable methods.
In summary, this paper defines the benchmarks against which
any method for modeling adaptive adversary decision behaviors
should be judged.

Acknowledgements
The authors are most grateful to their colleagues for many
suggestions and productive collaboration in this effort: Seth
Guikema, Laura McClay, Casey Rothschild, Jerrold Post. The
Department of Homeland Security Science and Technology Directorate funded this work under Contract No. HSHQDC-10-C-00105.
The authors wish to thank the reviewer for his many insightful
comments.

References
[1] U.S. Nuclear Regulatory Commission (USNRC). Reactor safety study: assessment of accident risk in U.S. Commercial nuclear plants. WASH-1400
(NUREG-75/014). Washington, D.C.: U.S. Nuclear Regulatory Commission;
1975.
[2] Vesely WE. Fault tree handbook. Washington DC: Ofce of Nuclear Regulatory Research; 1981.
[3] Garrick BJ. Perspectives on the use of risk assessment to address terrorism.
Risk Analysis 2002;22(3):4213.
[4] Ezell B, Bennett S, von Winterfeldt D, Sokolowski J, Collins A. Probabilistic risk
analysis and terrorism risk. Risk Analysis 2010;30(4):57589.
[5] Department of Homeland Securitys Bioterrorism Risk Assessment: A Call for
Change. Committee on methodological improvements to the department of
homeland securitys biological agent risk analysis, National Research Council
of the National Academies. Washington, DC: The National Academy Press;
2008.
[6] Cox Jr LA. Improving risk-based decision making for terrorism applications.
Risk Analysis 2009;29(3):33641.
[7] Wein L. Homeland security: from mathematical models to policy implementation. Operations Research 2009;57:80111.
[8] Parnell GS, Smith CM, Moxley FI. Intelligent adversary risk analysis: a
bioterrorism risk management model. Risk Analysis 2010;30(1):3248.
[9] Brown G, Cox A. How probabilistic risk assessment can mislead terrorism
analysts. Risk Analysis 2010;31(2):196204.
[10] Ezell B, Collins A. Letter to editor in response to brown and cox, how
probabilistic risk assessment can mislead terrorism analysts. Risk Analysis
2010;31(2):192.
[11] Brown GG, Carlyle WM, Harney RC, Skroch EM, Wood RK. Interdicting a
nuclear-weapons project. Operations Research 2010;57(4) 896877.
[12] Rios J, Rios Insua D. Adversarial risk analysis for counterrorism modeling.
Risk Analysis 2012;32(5):894915.
[13] Guikema SD, Aven T. Assessing risk from intelligent attacks: a perspective on
approaches. Reliability Engineering and System Safety 2010;95:47883.

[14] Clemen RT. Combining forecasts: a review and annotated bibliography.


International Journal of Forecasting 1989;5:55983.
[15] Howard RA. From influence to relevance to knowledge. In: Oliver RM, Smith JQ, editors. Influence diagrams, belief nets, and decision analysis. Chichester: Wiley; 1990.
[16] Shachter RD. Evaluating influence diagrams. Operations Research 1986;34:871-82.
[17] Keeney RL, von Winterfeldt D. Identifying and structuring the objectives of terrorists. Risk Analysis 2010;30(12):1803-16.
[18] Horlick-Jones T. Meaning and contextualization in risk assessment. Reliability Engineering and System Safety 1998;59:79-89.
[19] Renn O. The role of risk perception for risk management. Reliability Engineering and System Safety 1998;59:49-62.
[20] Kirkwood CW. Strategic decision making. Belmont, CA: Duxbury Press; 1997.
[21] Simon HA. Rational choice and the structure of the environment. Psychological Review 1956;63(2):129-38.
[22] Einhorn HJ. Use of nonlinear, noncompensatory models as a function of task and amount of information. Organizational Behavior and Human Performance 1971;6:1-27.
[23] Kahneman D, Tversky A. Prospect theory: an analysis of decision under risk. Econometrica 1979;47(2):263-92.
[24] Crawford V, Iriberri N. Level-k auctions: can a nonequilibrium model of strategic thinking explain the winner's curse and overbidding in private-value auctions? Econometrica 2007;75(6):1721-70.
[25] Luce RD. Individual choice behavior: a theoretical analysis. Mineola, NY:
Dover Publications; 2005.
[26] Baltas G, Doyle P. Random utility models in marketing research: a survey. Journal of Business Research 2001;51:115-25.
[27] Buede DM. Errors associated with simple versus realistic models. Computational and Mathematical Organization Theory 2010;15(4):1-18.
[28] Clemen RT, Winkler RL. Aggregating probability distributions. In: Edwards W,
Miles R, von Winterfeldt D, editors. Advances in Decision Analysis. New
York: Cambridge University Press; 2007.
[29] Clemen RT, Winkler RL. Combining probability distributions from experts in risk analysis. Risk Analysis 1999;19(2):187-203.
[30] Collopy F, Armstrong JS. Expert opinions about extrapolation and the mystery of the overlooked discontinuities. International Journal of Forecasting 1992;8:575-82.
[31] Guikema SD, Quiring SM. Hurricane outage prediction model: Phase II report. Baltimore, MD: Johns Hopkins University; 2009.
[32] Guikema SD, Han SR, Quiring SM. Pre-storm estimation of hurricane damage to electric power distribution systems. Risk Analysis 2010;30(12):1744-52.
[33] Dawes R. The robust beauty of improper linear models in decision making. American Psychologist 1979;34(7):571-82.
[34] Mahoney S, Comstock E, deBlois B, Darcy S. Aggregating forecasts using a learned Bayesian network. In: Proceedings of the Florida Artificial Intelligence Research Society Conference; Palm Beach, FL; 2011.
[35] Brown RV, Lindley DV. Plural analysis: multiple approaches to quantitative research. Theory and Decision 1986;20(2):331-54.
[36] Sollich P, Krogh A. Learning with ensembles: how overfitting can be useful. Advances in Neural Information Processing Systems 1996;8:190-6.
[37] Dietterich T. Ensemble learning. In: The handbook of brain theory and neural networks. 2nd ed. Cambridge, MA: MIT Press; 2002, p. 405-8.
[38] Wolpert D. Stacked generalization. Neural Networks 1992;5(2):241-59.
[39] Dani V, Madani O, Pennock D, Sanghai S, Galebach B. An empirical comparison of algorithms for aggregating expert predictions. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. AUAI Press; 2006.
[40] Cesa-Bianchi N, Freund Y, Helmbold D, Haussler D, Schapire R, Warmuth M. How to use expert advice. Journal of the Association for Computing Machinery 1997;44(3):427-85.
[41] Predd JB, Kulkarni SR, Poor HV, Osherson DN. Scalable algorithms for aggregating disparate forecasts of probability. In: Ninth International Conference on Information Fusion; 2006.
[42] Kahn JM. A generative Bayesian model for aggregating experts' probabilities. In: Uncertainty in Artificial Intelligence: Proceedings of the 20th Conference. AUAI Press; 2004.
[43] Ting KM, Witten IH. Issues in stacked generalization. Journal of Artificial Intelligence Research 1999;10:271-89.
[44] Ranjan R, Gneiting T. Combining probability forecasts. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 2010;72(1):71-91.
[45] Tversky A, Kahneman D. Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty 1992;5:297-323.
[46] Wu G, Zhang J, Gonzalez R. Decision under risk. In: Koehler DJ, Harvey N,
editors. Blackwell Handbook of Judgment and Decision Making. Oxford:
Blackwell Publishing; 2004.
[47] Stahl D, Wilson P. Experimental evidence on players' models of other players. Journal of Economic Behavior & Organization 1994;25:309-27.
[48] Stahl D, Wilson P. On players' models of other players: theory and experimental evidence. Games and Economic Behavior 1995;10:218-54.
[49] Nagel R. Unraveling in guessing games: an experimental study. American Economic Review 1995;85(5):1313-26.
[50] Crawford V, Gneezy U, Rottenstreich Y. The power of focal points is limited: even minute payoff asymmetry may yield large coordination failures. American Economic Review 2008;98(4):1443-58.
[51] Costa-Gomes MA, Crawford VP. Cognition and behavior in two-person guessing games: an experimental study. American Economic Review 2006;96:1737-68.
[52] Kawagoe T, Takizawa H. Equilibrium refinement vs. level-k analysis: an experimental study of cheap-talk games with private information. Games and Economic Behavior 2009;66:238-55.
[53] Rothschild C, McLay L, Guikema SD. Adversarial risk analysis with incomplete information: a level-k approach. Risk Analysis 2012;32(7):1219-31.
[54] McLay L, Rothschild C, Guikema SD. Robust adversarial risk analysis: a level-k approach. Decision Analysis 2012;9(1):41-54.