Chapter 5
Statistical Decision Problems
Decision problems are called statistical when data on the state of nature are available, data that one hopes contain information that can be used to make a better decision. The availability of data generally provides some illumination, so that in selecting an action one is not completely in the dark concerning the state of nature. In practice, however, one still risks taking a bad action if the information contained in the data is not used intelligently.
Decision rule
A function
d : S_X → A
is called a nonrandomized decision rule.
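In code, a nonrandomized decision rule is just a lookup table from sample points to actions. A minimal sketch in Python (the labels x1, x2, a1, a2 are illustrative placeholders, not fixed by the text):

```python
# A nonrandomized decision rule: a plain mapping from sample points to actions.
rule = {"x1": "a1", "x2": "a2"}

def decide(d, x):
    """Apply decision rule d to the observed value x."""
    return d[x]

print(decide(rule, "x2"))  # -> "a2"
```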
Example 5.2.1
Suppose (Θ, A, L) is a decision problem with A = {a1, a2}, and suppose X is an observable random variable such that S_X = {x1, x2}. Then there are four distinct nonrandomized decision rules, as in the following table:

        d1   d2   d3   d4
x1      a2   a2   a1   a1
x2      a2   a1   a2   a1
Example 5.2.2
Suppose (Θ, A, L) is a decision problem with A = {a1, a2}, and suppose X is an observable random variable such that S_X = {x1, x2, x3}. Then there are eight distinct nonrandomized decision rules, as in the following table:

        d1   d2   d3   d4   d5   d6   d7   d8
x1      a2   a2   a2   a1   a2   a1   a1   a1
x2      a2   a2   a1   a2   a1   a2   a1   a1
x3      a2   a1   a2   a2   a1   a1   a2   a1
Risk Function
The risk function of the decision rule d, when the state of nature is θ, is defined to be

R(θ, d) = E[L(θ, d(X))],

where the expectation is taken with respect to the distribution f(· ; θ) of X.
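As a sketch, the risk is just the expected loss incurred when the rule is followed. The function below assumes the loss and the density are supplied as dictionaries keyed by (state, action) and (state, x); these container conventions are assumptions of the sketch:

```python
def risk(theta, d, loss, density, sample_space):
    """R(theta, d) = sum over x of L(theta, d(x)) * f(x | theta)."""
    return sum(loss[(theta, d[x])] * density[(theta, x)] for x in sample_space)
```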
For the decision problem of Example 5.3.1, the risks of the four nonrandomized decision rules work out as follows:

            d1   d2   d3   d4
R(θ1, d)    4    2.4  1.6   0
R(θ2, d)    0    3.2  0.8   4

The decision rules can also be represented graphically by means of their risk points.
Remark
Decision rules d1 and d4, which ignore the data, give risk points exactly the same as the corresponding loss points of the no-data problem.
The straight line joining the risk points of d1 and d4 consists of the loss points (regarded as risks) of the randomized mixtures of the actions a2 and a1.
An intelligent use of the data, as in decision rule d3, can improve the expected losses.
Using the data foolishly, as in decision rule d2, can make the expected losses worse.
How many distinct nonrandomized decision rules are there in a statistical decision problem with n available actions, when the observed random variable has m possible values?
Answer: n^m, since each of the m possible values may be assigned any one of the n actions independently.
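The count n^m can be checked by brute force: enumerate one action choice per sample value. A sketch using itertools:

```python
from itertools import product

def all_rules(actions, sample_space):
    """Every nonrandomized decision rule: one action assigned to each sample point."""
    return [dict(zip(sample_space, choice))
            for choice in product(actions, repeat=len(sample_space))]

rules = all_rules(["a1", "a2"], ["x1", "x2", "x3"])
print(len(rules))  # 2**3 = 8, matching Example 5.2.2
```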
Example 5.3.2
Consider a decision problem in which the loss matrix is
given by
        a1   a2
θ1       0    1
θ2       6    5
Suppose that the observed random variable X takes three
possible values with density function given in the following
table
            x1    x2    x3
f(x; θ1)    0.6   0.3   0.1
f(x; θ2)    0.1   0.4   0.5
The nonrandomized decision rules are tabulated as follows:
        d1   d2   d3   d4   d5   d6   d7   d8
x1      a2   a2   a2   a1   a2   a1   a1   a1
x2      a2   a2   a1   a2   a1   a2   a1   a1
x3      a2   a1   a2   a2   a1   a1   a2   a1
Example 5.3.3
Consider the decision problem stated in Example 5.3.2 with
loss table given by
        a1   a2
θ1       0    1
θ2       6    5
The risk of a decision rule, say d2 = (a2, a2, a1) (that is, d2(x1) = a2, d2(x2) = a2, d2(x3) = a1), would be

R(θ1, d2) = 1(0.6) + 1(0.3) + 0(0.1) = 0.9
R(θ2, d2) = 5(0.1) + 5(0.4) + 6(0.5) = 5.5,

or, grouping the terms by action,

R(θ1, d2) = 0.9 L(θ1, a2) + 0.1 L(θ1, a1)
R(θ2, d2) = 0.5 L(θ2, a2) + 0.5 L(θ2, a1).

For each fixed state of nature θ, the risk is thus a convex combination of the losses of the pure actions a1 and a2, with weights P_θ(d2(X) = a2) and P_θ(d2(X) = a1); note that the weights depend on θ.
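The arithmetic above can be verified directly; a sketch with the loss and density tables of Example 5.3.2 hard-coded (the state labels t1, t2 stand in for θ1, θ2):

```python
loss = {("t1", "a1"): 0, ("t1", "a2"): 1,
        ("t2", "a1"): 6, ("t2", "a2"): 5}
density = {("t1", "x1"): 0.6, ("t1", "x2"): 0.3, ("t1", "x3"): 0.1,
           ("t2", "x1"): 0.1, ("t2", "x2"): 0.4, ("t2", "x3"): 0.5}
d2 = {"x1": "a2", "x2": "a2", "x3": "a1"}

for theta in ("t1", "t2"):
    r = sum(loss[(theta, d2[x])] * density[(theta, x)] for x in d2)
    w_a2 = sum(density[(theta, x)] for x in d2 if d2[x] == "a2")
    print(theta, r, w_a2)  # risks 0.9 and 5.5; weights on a2 are 0.9 and 0.5
```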
Example 5.4.1
Consider the decision problem stated in Example 5.3.1. The
risk set and the risk points of the nonrandomized decision
rules are reproduced here for easy reference.
            d1   d2   d3   d4
R(θ1, d)    4    2.4  1.6   0
R(θ2, d)    0    3.2  0.8   4
Example 5.4.2
The risk points of the decision rules considered in Example 5.4.1, together with each rule's worst-case risk, are as follows:

                 d1   d2   d3   d4
R(θ1, d)         4    2.4  1.6   0
R(θ2, d)         0    3.2  0.8   4
max_θ R(θ, d)    4    3.2  1.6   4
So d3 is the minimax decision rule.
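The minimax choice can be mechanized: compute each rule's worst-case risk, then take the rule with the smallest one. A sketch using the risk table above:

```python
# Risk points from Example 5.4.2: rule -> (R(theta1, d), R(theta2, d)).
risks = {"d1": (4, 0), "d2": (2.4, 3.2), "d3": (1.6, 0.8), "d4": (0, 4)}

worst = {d: max(r) for d, r in risks.items()}  # worst-case risk of each rule
print(worst)                      # {'d1': 4, 'd2': 3.2, 'd3': 1.6, 'd4': 4}
print(min(worst, key=worst.get))  # 'd3', the minimax rule
```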
Note that there are two ways to introduce the idea of regret. It is either
(a) applying it to the initial loss function, before the data are brought in, or
(b) applying it to the risk function.
Example 5.4.3
Reconsider the statistical decision problem in Example 5.3.2.
The risk functions of the nonrandomized decision rules were
tabulated as follows:
            d1   d2   d3   d4   d5   d6   d7   d8
R(θ1, d)   1.0  0.9  0.7  0.4  0.6  0.3  0.1  0.0
R(θ2, d)   5.0  5.5  5.4  5.1  5.9  5.6  5.5  6.0
Consider the randomized decision rule p̃ that selects d4 with probability p and d7 with probability 1 − p:

        d1   d2   d3   d4   d5   d6   d7    d8
p̃ =     0    0    0    p    0    0   1−p    0

or, simply denoted,

p̃ = (0, 0, 0, p, 0, 0, 1−p, 0),

where p is chosen so that the expected regrets under the two states are equal:

E[Lr(θ1, p̃)] = E[Lr(θ2, p̃)].
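Solving for p makes the step explicit. Taking regret row-wise from the risk table (subtracting the smallest risk in each row: 0.0 for θ1, attained by d8, and 5.0 for θ2, attained by d1), d4 has regrets (0.4, 0.1) and d7 has regrets (0.1, 0.5), so the equalizer condition reads

$$0.4\,p + 0.1\,(1-p) \;=\; 0.1\,p + 0.5\,(1-p) \quad\Longrightarrow\quad 0.7\,p = 0.4 \quad\Longrightarrow\quad p = \tfrac{4}{7}.$$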
Bayes Principle
Another scheme for ordering decision rules is to assign prior probabilities π(θ) to the various states of nature and to determine the average risk over these states.
Posterior Distribution
The Bayes approach to selecting an optimal decision rule involves the assumption that the state of nature is a random variable Θ with probability function π(θ). The probability function of the observed random variable X is then regarded as the conditional distribution given that Θ = θ, written P(X = x | Θ = θ) or simply f(x | θ). We shall denote the conditional distribution of Θ given that X = x by π(θ | x). Note that

π(θ | x) P(X = x) = P(X = x | Θ = θ) π(θ).    (*)
In fact, both sides of the above equation represent the joint probability of X and Θ. We call the conditional distribution π(θ | x) the posterior distribution of Θ given that X = x. Since π(θ | x) is considered as a function of θ while x is held fixed, π(θ | x) is proportional to the product P(X = x | Θ = θ) π(θ), and we write

π(θ | x) ∝ P(X = x | Θ = θ) π(θ).    (**)
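A minimal sketch of update (**) in code: multiply the prior by the likelihood and renormalize. The dictionary representation of the prior and the (x, theta) argument order of the likelihood are conventions of this sketch:

```python
def posterior(prior, likelihood, x):
    """pi(theta | x) is proportional to f(x | theta) * pi(theta)."""
    unnorm = {theta: likelihood(x, theta) * p for theta, p in prior.items()}
    marginal = sum(unnorm.values())  # P(X = x), the normalizing constant
    return {theta: u / marginal for theta, u in unnorm.items()}
```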
Example 5.4.4
Suppose that the conditional probabilities of X given Θ = θ are

P(X = x1 | Θ = θ1) = 1/4 = 1 − P(X = x2 | Θ = θ1)
P(X = x1 | Θ = θ2) = 2/3 = 1 − P(X = x2 | Θ = θ2),

and suppose the prior probability function of Θ is given by

P(Θ = θ1) = π(θ1) = w
P(Θ = θ2) = π(θ2) = 1 − w,    0 ≤ w ≤ 1.

For X = x1,

π(θ1 | x1) = (w/4) / (w/4 + 2(1 − w)/3) = 3w / (8 − 5w),
π(θ2 | x1) = 1 − π(θ1 | x1) = (8 − 8w) / (8 − 5w).
For X = x2,

π(θ1 | x2) = (3w/4) / (3w/4 + (1 − w)/3) = 9w / (4 + 5w),
π(θ2 | x2) = 4(1 − w) / (4 + 5w).
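As a quick numeric check of the x1 formula, one can plug this example's numbers into the posterior sketch above (w = 0.5 is an arbitrary test value; t1, t2 again stand in for θ1, θ2):

```python
lik = {("x1", "t1"): 1/4, ("x2", "t1"): 3/4,
       ("x1", "t2"): 2/3, ("x2", "t2"): 1/3}
w = 0.5
post = posterior({"t1": w, "t2": 1 - w}, lambda x, th: lik[(x, th)], "x1")
print(post["t1"], 3 * w / (8 - 5 * w))  # both 0.2727...
```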
Successive Observations
If an observation can alter prior odds to posterior odds, it would seem that a further observation, applied to the first posterior distribution as though it were a prior, should result in yet a new posterior distribution:

π2(θ | x2) ∝ f2(x2 | θ) π1(θ | x1)
          ∝ f2(x2 | θ) f1(x1 | θ) π(θ).

Since f1(x1 | θ) f2(x2 | θ) is the joint density function of X1 and X2 given θ, this shows that π2(θ | x2) is the posterior density of θ given the observed vector (x1, x2).
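A sketch showing that two successive updates match a single update with the product likelihood (conditional independence of the observations given θ is assumed, as in the display above):

```python
def update(prior, lik, x):
    """One Bayes step: prior -> posterior, given observation x."""
    unnorm = {th: lik(x, th) * p for th, p in prior.items()}
    z = sum(unnorm.values())
    return {th: u / z for th, u in unnorm.items()}

# Illustrative two-state likelihood; the numbers are assumptions of this sketch.
table = {("t1", 1): 1/4, ("t1", 0): 3/4, ("t2", 1): 2/3, ("t2", 0): 1/3}
lik = lambda x, th: table[(th, x)]

prior = {"t1": 0.5, "t2": 0.5}
two_steps = update(update(prior, lik, 1), lik, 0)  # observe x1 = 1, then x2 = 0
one_step = update(prior, lambda _, th: lik(1, th) * lik(0, th), None)
print(two_steps)
print(one_step)  # same posterior: sequential updating equals the joint update
```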
The Bayes risk of a decision rule d with respect to the prior π is

r(π, d) = E[R(Θ, d)] = Σ_θ R(θ, d) π(θ).
Now

r(π, d) = Σ_θ Σ_x L(θ, d(x)) f(x | θ) π(θ)
        = Σ_x [ Σ_θ L(θ, d(x)) π(θ | x) ] f(x),    (*)

where f(x) represents the marginal density function of X. It follows from (*) that a Bayes decision rule d0 can be constructed pointwise: for each observed value x,

Σ_θ L(θ, d0(x)) π(θ | x) ≤ Σ_θ L(θ, d(x)) π(θ | x)    for all d ∈ D.
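A sketch of this pointwise construction, assuming the posterior π(· | x) has already been computed for each x and is passed in as a dict of dicts (all container conventions here are assumptions of the sketch):

```python
def bayes_rule(actions, states, loss, post):
    """For each observed x, pick the action minimizing the posterior
    expected loss: sum over theta of L(theta, a) * pi(theta | x)."""
    return {x: min(actions,
                   key=lambda a: sum(loss[(th, a)] * pi[th] for th in states))
            for x, pi in post.items()}
```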
Example 5.4.5
Consider a decision problem with loss matrix

        a1   a2
θ1       0    8
θ2       4    0

Suppose that the statistician can observe a random variable X with the following conditional distributions:

P(X = 0 | Θ = θ1) = 3/4,    P(X = 0 | Θ = θ2) = 1/3
P(X = 1 | Θ = θ1) = 1/4,    P(X = 1 | Θ = θ2) = 2/3.

It is required to construct a Bayes decision rule against the following prior distribution of Θ:
π : P(Θ = θ1) = w,    P(Θ = θ2) = 1 − w,    0 ≤ w ≤ 1.
For x = 0,

π(θ1 | 0) ∝ P(X = 0 | Θ = θ1) P(Θ = θ1) = 3w/4
π(θ2 | 0) ∝ P(X = 0 | Θ = θ2) P(Θ = θ2) = (1 − w)/3.

This implies that

π(θ1 | 0) = (3w/4) / (3w/4 + (1 − w)/3) = 9w / (9w + 4(1 − w))
π(θ2 | 0) = 4(1 − w) / (9w + 4(1 − w)).

The posterior expected losses of the two actions are

Σ_θ L(θ, a1) π(θ | 0) = 0·π(θ1 | 0) + 4·π(θ2 | 0) = 16(1 − w) / (9w + 4(1 − w))
Σ_θ L(θ, a2) π(θ | 0) = 8·π(θ1 | 0) + 0·π(θ2 | 0) = 72w / (9w + 4(1 − w)).

Therefore d(0) = a1 ⟺ 16(1 − w) ≤ 72w ⟺ w ≥ 2/11.
Similarly, for x = 1,

π(θ1 | 1) ∝ P(X = 1 | Θ = θ1) P(Θ = θ1) = w/4
π(θ2 | 1) ∝ P(X = 1 | Θ = θ2) P(Θ = θ2) = 2(1 − w)/3,

so that

π(θ1 | 1) = (w/4) / (w/4 + 2(1 − w)/3) = 3w / (3w + 8(1 − w))
π(θ2 | 1) = 8(1 − w) / (3w + 8(1 − w)).

The posterior expected losses are 32(1 − w)/(3w + 8(1 − w)) for a1 and 24w/(3w + 8(1 − w)) for a2.

Therefore d(1) = a1 ⟺ 32(1 − w) ≤ 24w ⟺ w ≥ 4/7.
Conclusion: The Bayes decision rule is

          d1,   0 ≤ w ≤ 2/11
d0(x) =   d3,   2/11 ≤ w ≤ 4/7
          d4,   4/7 ≤ w ≤ 1,

where d1 always takes a2, d3 takes a1 at x = 0 and a2 at x = 1, and d4 always takes a1.
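The three regimes can be reproduced by scanning w. A sketch with this example's numbers hard-coded (t1, t2 stand in for θ1, θ2; the normalizing constant cancels when comparing the two actions, so it is skipped):

```python
loss = {("t1", "a1"): 0, ("t1", "a2"): 8, ("t2", "a1"): 4, ("t2", "a2"): 0}
lik = {(0, "t1"): 3/4, (1, "t1"): 1/4, (0, "t2"): 1/3, (1, "t2"): 2/3}

def bayes_action(x, w):
    post = {"t1": lik[(x, "t1")] * w, "t2": lik[(x, "t2")] * (1 - w)}
    return min(("a1", "a2"),
               key=lambda a: sum(loss[(th, a)] * post[th] for th in post))

for w in (0.1, 0.3, 0.7):  # below 2/11, between 2/11 and 4/7, above 4/7
    print(w, bayes_action(0, w), bayes_action(1, w))
# 0.1 -> (a2, a2) = d1;  0.3 -> (a1, a2) = d3;  0.7 -> (a1, a1) = d4
```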
            d1   d2   d3   d4
R(θ1, d)    8    6    2    0
R(θ2, d)    0   8/3  4/3   4
Example 5.4.6
In the above example (Example 5.4.5), the (randomized) decision rule with constant risk is a mixture of d3 and d4. The slope of the line joining the risk points of d3 and d4 is

m = (4 − 4/3) / (0 − 2) = −4/3.

Hence the prior vector π = (w, 1 − w) perpendicular to the vector joining the risk points of d3 and d4 satisfies

(1 − w)/w = 3/4,  or  w = 4/7.

This implies that the randomized decision rule d* with constant risk is Bayes against the prior distribution

π : P(Θ = θ1) = 4/7  and  P(Θ = θ2) = 3/7.
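As a quick check, at w = 4/7 the rules d3 and d4 have equal Bayes risk, which is exactly why every mixture of them, including the constant-risk rule d*, is Bayes against this prior:

$$r(\pi, d_3) = \tfrac{4}{7}(2) + \tfrac{3}{7}\left(\tfrac{4}{3}\right) = \tfrac{12}{7}, \qquad r(\pi, d_4) = \tfrac{4}{7}(0) + \tfrac{3}{7}(4) = \tfrac{12}{7}.$$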
5.5 Sufficiency
It is common practice for statisticians, when confronted with a mass of data, to compute some simple measure from the data and then base statistical procedures on this simpler quantity. Computing such a simpler measure is called reducing the data; the measures themselves are called statistics.
A question arises naturally: how much reduction of the data is possible without losing information regarding the state of nature?
Sufficient Statistic
A statistic T = t(X̃) is said to be sufficient for a family of density functions { f(· | θ) : θ ∈ Θ } if π(θ | x̃1) = π(θ | x̃2) for any prior distribution of Θ and any data x̃1 and x̃2 of the same size from the family with t(x̃1) = t(x̃2).

Factorization Theorem
Suppose that f(x̃ | θ) represents the joint density function of the observed random vector X̃. The statistic T = t(X̃) is sufficient for θ if and only if

f(x̃ | θ) = g(t(x̃); θ) h(x̃),

where g depends on x̃ only through t(x̃) and h does not depend on θ.

Equivalently, the statistic T = t(X̃) is sufficient for the family of density functions { f(· ; θ) : θ ∈ Θ } if the conditional distribution of X̃ given T = t does not depend on θ.
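For instance, for a Bernoulli sample x̃ = (x1, ..., xn) the factorization is immediate, with h ≡ 1 (this anticipates Example 5.5.1):

$$f(\tilde{x} \mid \theta) \;=\; \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} \;=\; \underbrace{\theta^{\,t}(1-\theta)^{\,n-t}}_{g(t(\tilde{x});\,\theta)} \cdot \underbrace{1}_{h(\tilde{x})}, \qquad t(\tilde{x}) = \sum_{i=1}^{n} x_i,$$

so T = X1 + ... + Xn is sufficient for θ.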
Example 5.5.1
Consider a decision problem (Θ, A, L) in which A = {a1, a2}. Let X1 and X2 be a random sample of size 2 from a Bernoulli distribution with parameter θ. Then T = X1 + X2 is sufficient for θ. Moreover, T is a binomial random variable with parameters (2, θ).

Consider the randomized decision rule δ based on T given by

         a1,   t = 0
δ(t) =
         a2,   t = 2

and, at t = 1, by tossing a fair coin:

          a1,   if a head occurs
δ(1) =
          a2,   if a tail occurs.

Its risk function is

R(θ, δ) = E[L(θ, δ(T))]
        = L(θ, a1) P(T = 0) + E[L(θ, δ(1))] P(T = 1) + L(θ, a2) P(T = 2)
        = L(θ, a1)(1 − θ)² + (1/2)[L(θ, a1) + L(θ, a2)]·2θ(1 − θ) + L(θ, a2)θ²
        = L(θ, a1){(1 − θ)² + θ(1 − θ)} + L(θ, a2){θ(1 − θ) + θ²}
        = L(θ, a1){P(X̃ = (0,0)) + P(X̃ = (0,1))} + L(θ, a2){P(X̃ = (1,1)) + P(X̃ = (1,0))}
        = R(θ, d),

where d is the nonrandomized decision rule based on the full sample that takes a1 on (0,0) and (0,1) and a2 on (1,0) and (1,1). Thus the randomized rule based on the sufficient statistic T achieves exactly the same risk function as d.
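The identity can be checked numerically. A sketch with an arbitrary θ and illustrative loss values (both chosen only to exercise the algebra, not taken from the text):

```python
theta = 0.3
L = {"a1": 1.0, "a2": 2.5}  # illustrative losses L(theta, a) at this fixed theta

# Risk of the randomized rule based on T = X1 + X2 (fair coin at T = 1).
p_t = {0: (1 - theta) ** 2, 1: 2 * theta * (1 - theta), 2: theta ** 2}
r_delta = (L["a1"] * p_t[0]
           + 0.5 * (L["a1"] + L["a2"]) * p_t[1]
           + L["a2"] * p_t[2])

# Risk of the nonrandomized rule d on the full sample:
# a1 on (0,0) and (0,1), a2 on (1,0) and (1,1).
def p(x1, x2):
    return (theta if x1 else 1 - theta) * (theta if x2 else 1 - theta)

r_d = L["a1"] * (p(0, 0) + p(0, 1)) + L["a2"] * (p(1, 0) + p(1, 1))
print(r_delta, r_d)  # equal: the rule based on T loses nothing
```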