The likelihood function is
\[
L(x_1, \dots, x_n \mid \theta) =
\begin{cases}
\prod_{i=1}^n f(x_i \mid \theta) & \text{if } X_i \text{ is a continuous RV} \\
\prod_{i=1}^n p(x_i \mid \theta) & \text{if } X_i \text{ is a discrete RV.}
\end{cases}
\]
The likelihood function is sometimes viewed as a function of x_1, \dots, x_n (fixing \theta) and sometimes as a function of \theta (fixing x_1, \dots, x_n). In the latter case, the likelihood is sometimes denoted L(\theta).

Theorem 1 (Factorization Theorem). T is a sufficient statistic for \theta if and only if the likelihood factorizes into the following form
\[
L(x_1, \dots, x_n \mid \theta) = g(\theta, T(x_1, \dots, x_n)) \, h(x_1, \dots, x_n)
\]
for some functions g, h.

Proof. We prove the theorem only for the discrete case (the continuous case requires different techniques). First assume the likelihood factorizes as above. Then
\[
p(x_1, \dots, x_n \mid T(x_1, \dots, x_n)) = \frac{p(x_1, \dots, x_n, T(x_1, \dots, x_n))}{p(T(x_1, \dots, x_n))} = \frac{p(x_1, \dots, x_n)}{\sum_{y : T(y) = T(x)} p(y_1, \dots, y_n)} = \frac{h(x_1, \dots, x_n)}{\sum_{y : T(y) = T(x)} h(y_1, \dots, y_n)}
\]
(the factor g(\theta, T(\cdot)) cancels from numerator and denominator since T(y) = T(x) for every y in the sum),
which is not a function of \theta. Conversely, assume that T is a sufficient statistic for \theta. Then
\[
L(x_1, \dots, x_n \mid \theta) = p(x_1, \dots, x_n \mid T(x_1, \dots, x_n), \theta) \, p(T(x_1, \dots, x_n) \mid \theta) = h(x_1, \dots, x_n) \, g(T(x_1, \dots, x_n), \theta),
\]
where h(x_1, \dots, x_n) = p(x_1, \dots, x_n \mid T(x_1, \dots, x_n)) does not depend on \theta by sufficiency.
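The discrete-case direction of the proof can be illustrated numerically. The sketch below (an illustration added here, not part of the original derivation) takes i.i.d. Bernoulli(\theta) data with T(x) = \sum_i x_i and checks by brute-force enumeration that p(x \mid T(x)) comes out the same for every value of \theta:

```python
from itertools import product

def bernoulli_lik(x, theta):
    """Likelihood of an i.i.d. Bernoulli(theta) sample x."""
    p = 1.0
    for xi in x:
        p *= theta**xi * (1 - theta)**(1 - xi)
    return p

def cond_prob_given_T(x, theta):
    """p(x | T(x)) where T is the sum: the likelihood of x divided by
    the total likelihood of every sample y with T(y) = T(x)."""
    t = sum(x)
    total = sum(bernoulli_lik(y, theta)
                for y in product([0, 1], repeat=len(x)) if sum(y) == t)
    return bernoulli_lik(x, theta) / total

x = (1, 0, 1, 1, 0)
vals = [cond_prob_given_T(x, th) for th in (0.2, 0.5, 0.9)]
print(vals)  # approximately 1/10 for every theta: free of theta
```

Here t = 3 with n = 5, so the conditional law is uniform over the C(5, 3) = 10 arrangements, independently of \theta.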
Example: A sufficient statistic for the Bernoulli distribution is \sum_i X_i since
\[
L(x_1, \dots, x_n \mid \theta) = \prod_i \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{\sum_i x_i} (1 - \theta)^{n - \sum_i x_i} = g\Big(\theta, \sum_i x_i\Big) \cdot 1.
\]
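This factorization, with h(x_1, \dots, x_n) = 1, can be checked numerically. A minimal sketch (illustrative sample values):

```python
import math

def likelihood(x, theta):
    """Product of theta^{x_i} * (1 - theta)^{1 - x_i} over the sample."""
    out = 1.0
    for xi in x:
        out *= theta**xi * (1 - theta)**(1 - xi)
    return out

def g(theta, t, n):
    """Factor that touches the data only through t = sum(x)."""
    return theta**t * (1 - theta)**(n - t)

x = [1, 1, 0, 1, 0, 0, 1]
for theta in (0.1, 0.42, 0.77):
    assert math.isclose(likelihood(x, theta), g(theta, sum(x), len(x)))
print("likelihood == g(theta, sum(x)) * 1 for all tested theta")
```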
Example: A sufficient statistic for the uniform distribution U([0, \theta]) is \max(X_1, \dots, X_n) since
\[
L(x_1, \dots, x_n \mid \theta) = \prod_i \theta^{-1} \, 1\{0 \le x_i \le \theta\} = \theta^{-n} \, 1\{\max(x_1, \dots, x_n) \le \theta\} = g(\theta, \max(x_1, \dots, x_n)) \cdot 1.
\]
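A small numeric illustration (a sketch, not from the original notes): two samples with the same maximum have identical U([0, \theta]) likelihoods at every \theta, since the likelihood depends on the data only through \max(x_1, \dots, x_n):

```python
def unif_lik(x, theta):
    """Likelihood of i.i.d. U([0, theta]) data:
    product of (1/theta) * 1{0 <= x_i <= theta}."""
    out = 1.0
    for xi in x:
        out *= (1 / theta) if 0 <= xi <= theta else 0.0
    return out

x = [0.3, 1.2, 0.7]
y = [1.2, 1.2, 1.2]          # different sample, same maximum
for theta in (1.5, 2.0, 5.0):
    assert unif_lik(x, theta) == unif_lik(y, theta)  # theta^{-n} in both cases
assert unif_lik(x, 1.0) == 0.0  # max(x) = 1.2 > theta: the indicator is zero
print("likelihood depends on the data only through the maximum")
```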
In the case that \theta is a vector rather than a scalar, the sufficient statistic may be a vector as well. In this case we say that the sufficient statistic vector is jointly sufficient for the parameter vector \theta. The definitions and the factorization theorem carry over with little change.

Example: T = (\sum_i X_i, \sum_i X_i^2) are jointly sufficient statistics for \theta = (\mu, \sigma^2) for normally distributed data X_1, \dots, X_n \sim N(\mu, \sigma^2):
\[
L(x_1, \dots, x_n \mid \theta) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_i - \mu)^2/(2\sigma^2)} = (2\pi\sigma^2)^{-n/2} e^{-\sum_i (x_i - \mu)^2/(2\sigma^2)} = (2\pi\sigma^2)^{-n/2} e^{-\sum_i x_i^2/(2\sigma^2) + 2\mu \sum_i x_i/(2\sigma^2) - n\mu^2/(2\sigma^2)},
\]
which depends on the data only through \sum_i x_i and \sum_i x_i^2.
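This can be verified numerically: two different samples sharing the same \sum_i x_i and \sum_i x_i^2 have identical Normal likelihoods at every (\mu, \sigma^2). The sketch below uses the hand-picked samples (0, 3, 3) and (1, 1, 4), both with sum 6 and sum of squares 18:

```python
import math

def normal_lik(x, mu, sigma2):
    """Likelihood of i.i.d. N(mu, sigma2) data."""
    n = len(x)
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(
        -sum((xi - mu) ** 2 for xi in x) / (2 * sigma2))

x = [0.0, 3.0, 3.0]   # sum = 6, sum of squares = 18
y = [1.0, 1.0, 4.0]   # same sum and sum of squares, different sample
for mu, sigma2 in [(0.0, 1.0), (2.0, 0.5), (-1.0, 4.0)]:
    assert math.isclose(normal_lik(x, mu, sigma2), normal_lik(y, mu, sigma2))
print("equal likelihoods at every tested (mu, sigma2)")
```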
Clearly, sufficient statistics are not unique. From the factorization theorem it is easy to see that (i) the identity function T(x_1, \dots, x_n) = (x_1, \dots, x_n) is a sufficient statistic vector, and (ii) if T is a sufficient statistic for \theta then so is any 1-1 function of T. A function of a sufficient statistic that is not 1-1 may or may not be a sufficient statistic. This leads to the notion of a minimal sufficient statistic.

Definition 3. A statistic that is a sufficient statistic and that is a function of all other sufficient statistics is called a minimal sufficient statistic.

In a sense, a minimal sufficient statistic is the smallest sufficient statistic and therefore it represents the ultimate data reduction with respect to estimating \theta. In general, it may or may not exist.

Example: Since T = (\sum_i X_i, \sum_i X_i^2) are jointly sufficient statistics for \theta = (\mu, \sigma^2) for normally distributed data X_1, \dots, X_n \sim N(\mu, \sigma^2), then so is (\bar X, S^2), which is a 1-1 function of (\sum_i X_i, \sum_i X_i^2).

The following theorem provides a way of verifying that a sufficient statistic is minimal.

Theorem 2. T is a minimal sufficient statistic if
\[
\frac{L(x_1, \dots, x_n \mid \theta)}{L(y_1, \dots, y_n \mid \theta)} \text{ is not a function of } \theta \iff T(x_1, \dots, x_n) = T(y_1, \dots, y_n).
\]
Proof. First we show that T is a sufficient statistic. For each element t in the range of T, fix a sample y_1^t, \dots, y_n^t with T(y_1^t, \dots, y_n^t) = t. For arbitrary x_1, \dots, x_n denote T(x_1, \dots, x_n) = t and write
\[
L(x_1, \dots, x_n \mid \theta) = \frac{L(x_1, \dots, x_n \mid \theta)}{L(y_1^t, \dots, y_n^t \mid \theta)} \, L(y_1^t, \dots, y_n^t \mid \theta) = h(x_1, \dots, x_n) \, g(T(x_1, \dots, x_n), \theta),
\]
where the ratio h(x_1, \dots, x_n) is not a function of \theta (since T(x_1, \dots, x_n) = T(y_1^t, \dots, y_n^t) = t) and g(t, \theta) = L(y_1^t, \dots, y_n^t \mid \theta).
Next we show that T is a function of any other sufficient statistic T'. Let x, y be such that T'(x_1, \dots, x_n) = T'(y_1, \dots, y_n). Since
\[
\frac{L(x_1, \dots, x_n \mid \theta)}{L(y_1, \dots, y_n \mid \theta)} = \frac{g'(T'(x_1, \dots, x_n), \theta) \, h'(x_1, \dots, x_n)}{g'(T'(y_1, \dots, y_n), \theta) \, h'(y_1, \dots, y_n)} = \frac{h'(x_1, \dots, x_n)}{h'(y_1, \dots, y_n)}
\]
is independent of \theta, it follows that T(x_1, \dots, x_n) = T(y_1, \dots, y_n), and hence T is a function of T'.
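The earlier claim that (\bar X, S^2) is a 1-1 function of (\sum_i X_i, \sum_i X_i^2) can be made concrete. The sketch below (illustrative; it uses the unbiased sample variance S^2 = \sum_i (x_i - \bar x)^2 / (n - 1)) converts in both directions and recovers the original pair exactly:

```python
def sums_to_moments(s1, s2, n):
    """(sum x_i, sum x_i^2) -> (sample mean, unbiased sample variance)."""
    xbar = s1 / n
    svar = (s2 - n * xbar**2) / (n - 1)
    return xbar, svar

def moments_to_sums(xbar, svar, n):
    """Inverse map: (xbar, S^2) -> (sum x_i, sum x_i^2)."""
    return n * xbar, (n - 1) * svar + n * xbar**2

x = [0.0, 3.0, 3.0]
n = len(x)
s1, s2 = sum(x), sum(v**2 for v in x)
xbar, svar = sums_to_moments(s1, s2, n)
print(xbar, svar)                                  # 2.0 3.0
assert (s1, s2) == moments_to_sums(xbar, svar, n)  # round trip recovers the sums
```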
Example: T = (\sum_i X_i, \sum_i X_i^2) is a minimal sufficient statistic for the Normal distribution since the likelihood ratio
\[
\frac{L(x_1, \dots, x_n \mid \theta)}{L(y_1, \dots, y_n \mid \theta)} = e^{-\frac{1}{2\sigma^2} \left( \sum_i (x_i - \mu)^2 - \sum_i (y_i - \mu)^2 \right)} = e^{-\frac{1}{2\sigma^2} \left( \sum_i x_i^2 - \sum_i y_i^2 \right) + \frac{\mu}{\sigma^2} \left( \sum_i x_i - \sum_i y_i \right)}
\]
is not a function of \theta if and only if T(x_1, \dots, x_n) = T(y_1, \dots, y_n).
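This criterion can be probed numerically: the likelihood ratio between two samples with equal (\sum_i x_i, \sum_i x_i^2) should be constant in \theta, while the ratio between samples with different statistic values should vary. A sketch with hand-picked illustrative data:

```python
import math

def normal_lik(x, mu, sigma2):
    """Likelihood of i.i.d. N(mu, sigma2) data."""
    n = len(x)
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(
        -sum((xi - mu) ** 2 for xi in x) / (2 * sigma2))

params = [(0.0, 1.0), (2.0, 0.5), (-1.0, 4.0)]
x = [0.0, 3.0, 3.0]   # T(x) = (6, 18)
y = [1.0, 1.0, 4.0]   # T(y) = (6, 18): same statistic value
z = [1.0, 2.0, 3.0]   # T(z) = (6, 14): different statistic value

ratios_xy = [normal_lik(x, m, s2) / normal_lik(y, m, s2) for m, s2 in params]
ratios_xz = [normal_lik(x, m, s2) / normal_lik(z, m, s2) for m, s2 in params]
print(ratios_xy)  # constant in theta (equal to 1 here)
print(ratios_xz)  # varies with theta: exp(-2 / sigma2)
```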