Phil.015
April 12, 2016
ELEMENTS OF DECISION THEORY
The elements of decision theory are similar to those of the theory of games in
that decision theory may be considered as the theory of a two-person game, in which
nature takes the role of one of the players, and the other player is of course the
statistician. Because the problems are simpler, it is useful to start with decisions in
the absence of data. Often certain decision problems involving data can cleverly be
converted into no-data problems.
I. NO-DATA DECISION PROBLEMS
The basic conceptual ingredients of a no-data decision model are the same as
those of a zero-sum two-person game-theoretic model, namely:
(i) A non-empty set Θ = {θ1, θ2, ...}, called the state space (sometimes
referred to as the parameter space), of possible states of nature.
(ii) A non-empty set A = {a1, a2, ...}, called the action space, of actions
available to the decision-maker. And
(iii) a loss function ℓ : Θ × A → R that assigns a real number ℓ(θ, a)
to each pair (θ, a), specified by the state of nature θ and action a. The
value ℓ(θ, a) represents the loss incurred when the decision-maker takes
action a and nature is in state θ. Although ℓ(θ, a) can even be a
negative number, representing gain or payoff, statisticians usually think of
ℓ(θ, a) conservatively as a loss and take 0 to be the smallest loss (i.e., no
loss at all). That is to say, ℓ(θ, a) is always a non-negative real number.
Simply, nature chooses a point θ in Θ, and the statistician (without being
informed of the choice nature has made) chooses an action a in A. As a
consequence of these two choices, the statistician (decision-maker) loses an
amount ℓ(θ, a).
In a mathematical sense,¹ the triple ⟨Θ, A, ℓ⟩ with the specifications itemized
above is called a no-data decision model (or a two-person game).
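As a minimal sketch, a finite no-data decision model ⟨Θ, A, ℓ⟩ can be coded directly as a loss table. The states, actions, and loss values below are hypothetical illustrations, not taken from the examples that follow.

```python
# A minimal sketch of a finite no-data decision model <Theta, A, loss>.
# All states, actions, and loss values here are hypothetical illustrations.
states = ["theta1", "theta2"]            # state space Theta
actions = ["a1", "a2"]                   # action space A
loss = {                                 # loss function l(theta, a) >= 0
    ("theta1", "a1"): 0, ("theta1", "a2"): 5,
    ("theta2", "a1"): 4, ("theta2", "a2"): 1,
}

# Nature chooses a state; the statistician, not knowing that choice,
# chooses an action and incurs the corresponding loss.
chosen_state, chosen_action = "theta2", "a1"
print(loss[(chosen_state, chosen_action)])   # 4
```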
Here are some examples:
¹ An essential part of understanding how a mathematical method works in decision theory is being
able to view it somewhat abstractly as a mathematical phenomenon. This approach helps to see
how an abstract idea might work in areas other than the one we may be interested in at the moment.
1. [The statement of this example did not survive extraction; only its loss
tables, over two states and two actions, remain:]

            a1     a2
      θ1   170      0
      θ2   170    200

            a1     a2
      θ1     0    170
      θ2     0    -30
2. Homemaker Alma Jones can cook spaghetti, hamburger, or steak for dinner. She has learned from her past experience that if her husband is in a
good mood, she can serve him spaghetti and save some money, but if he
is in a bad mood, only a juicy steak will calm him down and make him
bearable. Clearly, there are three states of nature (Mr. Jones's possible moods)
and there are three actions available to Alma, leading to another
finitary decision model ⟨Θ, A, ℓ⟩, where
(a) Θ = {θ1, θ2, θ3}, where θ1 = Mr. Jones is in a good mood, θ2 =
Mr. Jones is in a normal mood, and θ3 = Mr. Jones is in a bad mood;
(b) A = {a1, a2, a3}, where a1 = serve spaghetti, a2 = serve hamburger,
and a3 = serve steak; and
(c) the loss function ℓ is given by the table

              θ1 (good)   θ2 (normal)   θ3 (bad)
      a1          0            5           10
      a2          2            3            9
      a3          4            5            6
Suppose that in Example 1 the prior probability distribution on the states
of nature is given by the table

      θ        θ1    θ2
      p0(θ)   0.7   0.3

Find the Bayes action that minimizes the average losses. Do the same in
Example 2 (prepare dinner for Mr. Jones), given that the prior probability
distribution on states of nature is given by the table
      θ        θ1    θ2    θ3
      p0(θ)   0.5   0.3   0.2
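The Bayes action for Example 2 can be found by weighting each action's losses by the prior. The computation below is a sketch that assumes the Example 2 losses are (spaghetti: 0, 5, 10; hamburger: 2, 3, 9; steak: 4, 5, 6) across the good, normal, and bad moods, as reconstructed from the garbled table.

```python
# Bayes action for Example 2 (Alma Jones), assuming the reconstructed loss
# table: rows = actions, columns = states (good, normal, bad mood).
loss = {
    "spaghetti": [0, 5, 10],
    "hamburger": [2, 3, 9],
    "steak":     [4, 5, 6],
}
prior = [0.5, 0.3, 0.2]   # p0(theta1), p0(theta2), p0(theta3)

# Average (Bayes) loss of each action: B(a) = sum_i l(theta_i, a) * p0(theta_i)
bayes_loss = {a: sum(l * p for l, p in zip(ls, prior)) for a, ls in loss.items()}
bayes_action = min(bayes_loss, key=bayes_loss.get)
print(bayes_action)   # prints spaghetti (B = 3.5 vs 3.7 and 4.7)
```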
Considerably better decisions become available if, in addition to prior knowledge, there is access to various observation results of a designated parent random
variable X, whose values are assumed to depend on the states of nature in the
form of (conditional) probability distributions p(x|θ) = P(X = x | Θ = θ).
Simply, in an observation setting we think of the states of nature θ as causes
and the values x of the parent variable X as effects, but of course a precisely
formulated cause-effect relationship is only statistical. Unfortunately, the welcome feature of having additional information from observation data somewhat
complicates the no-data decision models treated above.
II. DATA-BASED DECISION PROBLEMS
To give a correct mathematical structure to the process of information gathering, we assume that there is a parent random variable X taking its values in
an observation space (sample space) X, whose known probability distribution
p(x|θ) depends on the true state of nature θ ∈ Θ. Thus, what we have so far
is a decision model ⟨Θ, A, ℓ⟩ coupled with a random variable X with range X
and known probability distribution pX(x|θ). For each θ ∈ Θ there is a probability
measure P(X ≤ x | θ) = ∫_{−∞}^{x} pX(t | θ) dt, specified by pX(x | θ).
[Only the following table of conditional distributions p(x|θ) survives here,
for a two-state model with observations x1, x2:]

             x1    x2
      θ1    0.7   0.3
      θ2    0.4   0.6
[The four decision rules d : X → A for this model:]

             x1    x2
      d1     a1    a1
      d2     a1    a2
      d3     a2    a1
      d4     a2    a2
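Decision rules like those in the table above can be generated mechanically: a decision rule is any function d : X → A, so with two observations and two actions there are 2² = 4 of them. A short sketch:

```python
from itertools import product

# Every decision rule is a function d : X -> A; enumerate all |A|^|X| of them.
observations = ["x1", "x2"]
actions = ["a1", "a2"]

rules = [dict(zip(observations, choice))
         for choice in product(actions, repeat=len(observations))]
for i, d in enumerate(rules, start=1):
    print(f"d{i}:", d)
# d1: {'x1': 'a1', 'x2': 'a1'}
# d2: {'x1': 'a1', 'x2': 'a2'}
# d3: {'x1': 'a2', 'x2': 'a1'}
# d4: {'x1': 'a2', 'x2': 'a2'}
```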
             x1    x2    x3    x4
      θ1    0.5   0.4   0.1    0
      θ2    0.2   0.5   0.2   0.1
      θ3     0    0.2   0.5   0.3
Here the first row specifies the probability distribution p(x|θ1), the second
row gives p(x|θ2), and the third row defines p(x|θ3) for X.
Evidently, here the risk function for θ1 and decision rule d is given by
R(θ1, d) = ℓ(θ1, d(x1))p(x1|θ1) + ℓ(θ1, d(x2))p(x2|θ1) + ··· + ℓ(θ1, d(x4))p(x4|θ1).
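This risk is just the loss of d averaged over the observations, state by state. The sketch below assumes the Example 2 losses (spaghetti: 0, 5, 10; hamburger: 2, 3, 9; steak: 4, 5, 6) and the p(x|θ) rows (0.5, 0.4, 0.1, 0), (0.2, 0.5, 0.2, 0.1), (0, 0.2, 0.5, 0.3) as reconstructed from the garbled tables.

```python
# Risk R(theta, d) = sum_x l(theta, d(x)) * p(x | theta), sketched with the
# Example 2 tables as reconstructed above (assumed values, not authoritative).
loss = {
    "spaghetti": [0, 5, 10],
    "hamburger": [2, 3, 9],
    "steak":     [4, 5, 6],
}
p_x_given_theta = [          # row t is the distribution p(. | theta_{t+1})
    [0.5, 0.4, 0.1, 0.0],
    [0.2, 0.5, 0.2, 0.1],
    [0.0, 0.2, 0.5, 0.3],
]

def risk(theta, d):
    """Average loss of decision rule d (a list of actions, one per x)."""
    return sum(loss[d[x]][theta] * p_x_given_theta[theta][x] for x in range(4))

# Example rule: spaghetti after x1 or x2, steak after x3 or x4.
d = ["spaghetti", "spaghetti", "steak", "steak"]
print(risk(0, d))
```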
Now, suppose Alma knows that Mr. Jones is in a good mood 30% of the
time, in a normal mood 50% of the time, and in a bad mood 20% of the
time. The resulting enriched data-based decision model ⟨Θ, A^X, ℓ⟩
allows us to calculate the Bayes risks
B(d) = R(θ1, d)p0(θ1) + R(θ2, d)p0(θ2) + R(θ3, d)p0(θ3)
for all admissible d by substituting the given percentages for p0(θ). Alma
can now choose the best strategy, based on her data about Mr. Jones's
behavior, reflecting whether Mr. Jones is in a good, normal, or bad mood.
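Since Example 2 has three actions and four observations, there are only 3⁴ = 81 decision rules, so the Bayes rule can be found by brute force. The search below is a sketch under the same assumed reconstructions of the Example 2 tables, with Alma's priors (0.3, 0.5, 0.2).

```python
from itertools import product

# Brute-force search for the Bayes rule, assuming the Example 2 loss and
# p(x|theta) tables reconstructed above and Alma's priors (0.3, 0.5, 0.2).
actions = ["spaghetti", "hamburger", "steak"]
loss = {"spaghetti": [0, 5, 10], "hamburger": [2, 3, 9], "steak": [4, 5, 6]}
p = [                         # p[t][x] = p(x_{x+1} | theta_{t+1})
    [0.5, 0.4, 0.1, 0.0],
    [0.2, 0.5, 0.2, 0.1],
    [0.0, 0.2, 0.5, 0.3],
]
prior = [0.3, 0.5, 0.2]

def bayes_risk(d):
    """B(d) = sum_t R(theta_t, d) * p0(theta_t)."""
    return sum(prior[t] * sum(loss[d[x]][t] * p[t][x] for x in range(4))
               for t in range(3))

# Try every function d : {x1,...,x4} -> A and keep the one with least B(d).
best = min(product(actions, repeat=4), key=bayes_risk)
print(best, round(bayes_risk(best), 4))
```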
Suppose we observe a parent random variable X with possible values x1, x2, x3
and note that the observation outcome was X = x1. Because we can calculate
the marginal probability P(X = x1) from the weighted average
P(X = x1) = p(x1|θ1)p0(θ1) + p(x1|θ2)p0(θ2) + ··· + p(x1|θm)p0(θm)
of conditional probabilities and priors, the posterior probability
p1(θi|x1) = P(Θ = θi | X = x1) = p0(θi) p(x1|θi) / P(X = x1)
(known from Bayes' theorem) can be used in determining the conditional
Bayes risk
B(d|x1) = ℓ(θ1, d(x1))p1(θ1|x1) + ℓ(θ2, d(x1))p1(θ2|x1) + ℓ(θ3, d(x1))p1(θ3|x1)
for all d. As before, the decision maker chooses the decision function d* provided
that its Bayes risk B(d*|x1) is the smallest among all Bayes risks B(dj|x1) of
decision functions.
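The posterior update and conditional minimization can be sketched numerically. The example below assumes the reconstructed Example 2 losses (spaghetti: 0, 5, 10; hamburger: 2, 3, 9; steak: 4, 5, 6), the first column (0.5, 0.2, 0) of the reconstructed p(x|θ) table, and Alma's priors (0.3, 0.5, 0.2); it is phrased per action rather than per decision rule, which is equivalent once x1 has been observed.

```python
# Posterior p1(theta_i | x1) via Bayes' theorem, then the conditional Bayes
# risk of each action, using the assumed Example 2 reconstructions above.
prior = [0.3, 0.5, 0.2]
p_x1_given_theta = [0.5, 0.2, 0.0]     # first column of the p(x|theta) table
loss = {"spaghetti": [0, 5, 10], "hamburger": [2, 3, 9], "steak": [4, 5, 6]}

# Marginal P(X = x1): the prior-weighted average of the conditionals.
p_x1 = sum(p * q for p, q in zip(p_x1_given_theta, prior))

# Posterior distribution over states given the observation X = x1.
posterior = [p * q / p_x1 for p, q in zip(p_x1_given_theta, prior)]

# Conditional Bayes risk of taking action a after observing x1:
# B(a | x1) = sum_i l(theta_i, a) * p1(theta_i | x1).
cond_risk = {a: sum(l * p for l, p in zip(ls, posterior))
             for a, ls in loss.items()}
best = min(cond_risk, key=cond_risk.get)
print(posterior, best)
```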