
Sufficient Statistics

Guy Lebanon
May 2, 2006


A sufficient statistic with respect to $\theta$ is a statistic $T(X_1, \ldots, X_n)$ that contains all the information that is useful for the estimation of $\theta$. It is a useful data-reduction tool, and studying its properties leads to other useful results.

Definition 1. A statistic $T$ is sufficient for $\theta$ if $p(x_1, \ldots, x_n \mid T(x_1, \ldots, x_n))$ is not a function of $\theta$.

A useful way to visualize it is as a Markov chain $\theta \to T(X_1, \ldots, X_n) \to \{X_1, \ldots, X_n\}$ (although in classical statistics $\theta$ is not a random variable but a specific value). Conditioned on the middle part of the chain, the front and back are independent.

As mentioned above, the intuition behind the sufficient statistic concept is that it contains all the information necessary for estimating $\theta$. Therefore if one is interested in estimating $\theta$, it is perfectly fine to get rid of the original data while keeping only the value of the sufficient statistic. The motivation connects to the formal definition through the idea of sampling a ghost sample: consider a statistician who erased the original data but kept the sufficient statistic. Since $p(x_1, \ldots, x_n \mid T(x_1, \ldots, x_n))$ is not a function of $\theta$ (which is unknown), we may treat it as a known distribution. The statistician can then sample $x_1', \ldots, x_n'$ from that conditional distribution, and that ghost sample can be used in lieu of the original data that was thrown away.

The definition of a sufficient statistic is very hard to verify directly. A much easier way to find sufficient statistics is through the factorization theorem.

Definition 2. Let $X_1, \ldots, X_n$ be iid RVs whose distribution is given by the pdf $f_{X_i}$ or the pmf $p_{X_i}$. The likelihood function is the product of the pdfs or pmfs

$$L(x_1, \ldots, x_n \mid \theta) = \begin{cases} \prod_{i=1}^n f_{X_i}(x_i) & X_i \text{ is a continuous RV} \\ \prod_{i=1}^n p_{X_i}(x_i) & X_i \text{ is a discrete RV.} \end{cases}$$

The likelihood function is sometimes viewed as a function of $x_1, \ldots, x_n$ (fixing $\theta$) and sometimes as a function of $\theta$ (fixing $x_1, \ldots, x_n$). In the latter case, the likelihood is sometimes denoted $L(\theta)$.

Theorem 1 (Factorization Theorem). $T$ is a sufficient statistic for $\theta$ if and only if the likelihood factorizes into the following form

$$L(x_1, \ldots, x_n \mid \theta) = g(\theta, T(x_1, \ldots, x_n)) \, h(x_1, \ldots, x_n)$$

for some functions $g, h$.

Proof. We prove the theorem only for the discrete case (the continuous case requires different techniques). First assume the likelihood factorizes as above. Then

$$p(x_1, \ldots, x_n \mid T(x_1, \ldots, x_n)) = \frac{p(x_1, \ldots, x_n, T(x_1, \ldots, x_n))}{p(T(x_1, \ldots, x_n))} = \frac{p(x_1, \ldots, x_n)}{\sum_{y : T(y) = T(x)} p(y_1, \ldots, y_n)} = \frac{h(x_1, \ldots, x_n)}{\sum_{y : T(y) = T(x)} h(y_1, \ldots, y_n)},$$

where in the last step the common factor $g(\theta, T(x))$ cancels from the numerator and the denominator. The result is not a function of $\theta$. Conversely, assume that $T$ is a sufficient statistic for $\theta$. Then

$$L(x_1, \ldots, x_n \mid \theta) = p(x_1, \ldots, x_n \mid T(x_1, \ldots, x_n), \theta) \, p(T(x_1, \ldots, x_n) \mid \theta) = h(x_1, \ldots, x_n) \, g(T(x_1, \ldots, x_n), \theta).$$
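
As a sanity check on Definition 1 and the first display of the proof, here is a minimal Python sketch (standard library only; the helper name `bernoulli_pmf` and the sample values are illustrative, and the Bernoulli model anticipates the example below). It enumerates $p(x_1, \ldots, x_n \mid T = t)$ for $T(x) = \sum_i x_i$ and confirms that the conditional distribution is the same for every $\theta$:

```python
from itertools import product

def bernoulli_pmf(x, theta):
    """Joint pmf of an iid Bernoulli(theta) sample x = (x_1, ..., x_n)."""
    p = 1.0
    for xi in x:
        p *= theta if xi == 1 else 1 - theta
    return p

n, t = 4, 2  # sample size and observed value of T(x) = sum(x)
support = [x for x in product((0, 1), repeat=n) if sum(x) == t]
for theta in (0.2, 0.5, 0.9):
    p_t = sum(bernoulli_pmf(x, theta) for x in support)  # p(T = t)
    conditional = [bernoulli_pmf(x, theta) / p_t for x in support]
    print(theta, conditional)  # 1/6 for each of the 6 sequences, for every theta
```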

Example: A sufficient statistic for $\text{Ber}(\theta)$ is $\sum_i X_i$ since

$$L(x_1, \ldots, x_n \mid \theta) = \prod_i \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i} = g\Big(\theta, \sum x_i\Big) \cdot 1.$$
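
The ghost-sample idea is easy to act out here: $p(x_1, \ldots, x_n \mid \sum x_i = t)$ is uniform over the binary sequences with exactly $t$ ones, a distribution that involves no $\theta$ at all. A minimal sketch (the helper `ghost_sample` is written for these notes, not taken from them):

```python
import random

def ghost_sample(t, n):
    """Draw from p(x | sum(x) = t): uniform over binary sequences with
    exactly t ones -- a distribution that involves no theta."""
    ones = set(random.sample(range(n), t))  # positions of the t ones
    return [1 if i in ones else 0 for i in range(n)]

# The original data was erased; only n = 10 and T(x) = 7 were kept.
print(ghost_sample(7, 10))
```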

Example: A sufficient statistic for the uniform distribution $U([0, \theta])$ is $\max(X_1, \ldots, X_n)$ since

$$L(x_1, \ldots, x_n \mid \theta) = \prod_i \frac{1}{\theta} 1_{\{0 \le x_i \le \theta\}} = \theta^{-n} 1_{\{\max(x_1, \ldots, x_n) \le \theta\}} \cdot 1_{\{\min(x_1, \ldots, x_n) \ge 0\}} = g(\theta, \max(x_1, \ldots, x_n)) \, h(x_1, \ldots, x_n).$$
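
One way to see this factorization at work numerically: two samples of the same size with the same maximum have identical likelihoods for every $\theta$ (here $h \equiv 1$, up to the indicator on the minimum). A small sketch with made-up data:

```python
def uniform_likelihood(xs, theta):
    """L(x | theta) for iid U([0, theta]) data: theta^(-n) if all points fit."""
    return theta ** -len(xs) if 0 <= min(xs) and max(xs) <= theta else 0.0

x = [0.3, 1.7, 0.9]
y = [1.7, 0.2, 1.1]  # different sample of the same size, same maximum 1.7
for theta in (1.5, 2.0, 5.0):
    print(theta, uniform_likelihood(x, theta), uniform_likelihood(y, theta))
    # both columns agree for every theta: L depends on the data only through max
```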

In the case that $\theta$ is a vector rather than a scalar, the sufficient statistic may be a vector as well. In this case we say that the sufficient statistic vector is jointly sufficient for the parameter vector $\theta$. The definitions and the factorization theorem carry over with little change.

Example: $T = (\sum X_i, \sum X_i^2)$ are jointly sufficient statistics for $\theta = (\mu, \sigma^2)$ for normally distributed data $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$:

$$L(x_1, \ldots, x_n \mid \theta) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_i - \mu)^2 / (2\sigma^2)} = (2\pi\sigma^2)^{-n/2} e^{-\sum_i (x_i - \mu)^2 / (2\sigma^2)} = (2\pi\sigma^2)^{-n/2} e^{-\sum_i x_i^2 / (2\sigma^2) + 2\mu \sum_i x_i / (2\sigma^2) - n\mu^2 / (2\sigma^2)} = g(\theta, T) \cdot 1 = g(\theta, T) \, h(x_1, \ldots, x_n).$$
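
This factorization can also be checked numerically. In the sketch below (standard-library Python with made-up data), the likelihood computed directly as a product of normal densities agrees with $g(\theta, T)$ evaluated from $T$ alone:

```python
import math

def normal_likelihood(xs, mu, sigma2):
    """Likelihood computed the direct way, as a product of N(mu, sigma2) densities."""
    return math.prod(math.exp(-(xi - mu) ** 2 / (2 * sigma2)) /
                     math.sqrt(2 * math.pi * sigma2) for xi in xs)

def g(T, n, mu, sigma2):
    """The same likelihood reconstructed from T = (sum x_i, sum x_i^2) alone."""
    s1, s2 = T
    return ((2 * math.pi * sigma2) ** (-n / 2) *
            math.exp(-s2 / (2 * sigma2) + mu * s1 / sigma2 - n * mu ** 2 / (2 * sigma2)))

xs = [0.5, -1.2, 2.1, 0.3]
T = (sum(xs), sum(xi ** 2 for xi in xs))
print(normal_likelihood(xs, mu=0.4, sigma2=1.5))  # the two numbers agree,
print(g(T, len(xs), mu=0.4, sigma2=1.5))          # so h(x) = 1 in the factorization
```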

Clearly, sufficient statistics are not unique. From the factorization theorem it is easy to see that (i) the identity function $T(x_1, \ldots, x_n) = (x_1, \ldots, x_n)$ is a sufficient statistic vector, and (ii) if $T$ is a sufficient statistic for $\theta$ then so is any 1-1 function of $T$. A function of a sufficient statistic that is not 1-1 may or may not be a sufficient statistic. This leads to the notion of a minimal sufficient statistic.

Definition 3. A statistic that is a sufficient statistic and that is a function of all other sufficient statistics is called a minimal sufficient statistic.

In a sense, a minimal sufficient statistic is the smallest sufficient statistic, and therefore it represents the ultimate data reduction with respect to estimating $\theta$. In general, it may or may not exist.

Example: Since $T = (\sum X_i, \sum X_i^2)$ are jointly sufficient statistics for $\theta = (\mu, \sigma^2)$ for normally distributed data $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, so are $(\bar{X}, S^2)$, which are a 1-1 function of $(\sum X_i, \sum X_i^2)$.

The following theorem provides a way of verifying that a sufficient statistic is minimal.

Theorem 2. $T$ is a minimal sufficient statistic if

$$\frac{L(x_1, \ldots, x_n \mid \theta)}{L(y_1, \ldots, y_n \mid \theta)} \text{ is not a function of } \theta \quad \Longleftrightarrow \quad T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n).$$

Proof. First we show that $T$ is a sufficient statistic. For each element $t$ in the range of $T$, fix a sample $y_1^t, \ldots, y_n^t$ with $T(y_1^t, \ldots, y_n^t) = t$. For arbitrary $x_1, \ldots, x_n$, denote $T(x_1, \ldots, x_n) = t$ and write

$$L(x_1, \ldots, x_n \mid \theta) = L(y_1^t, \ldots, y_n^t \mid \theta) \, \frac{L(x_1, \ldots, x_n \mid \theta)}{L(y_1^t, \ldots, y_n^t \mid \theta)} = g(T(x_1, \ldots, x_n), \theta) \, h(x_1, \ldots, x_n),$$

where the ratio may serve as $h$ since $T(x) = T(y^t)$ implies, by assumption, that the ratio is not a function of $\theta$.

Next we show that $T$ is a function of any other arbitrary sufficient statistic $T'$. Let $x, y$ be such that $T'(x_1, \ldots, x_n) = T'(y_1, \ldots, y_n)$. Since

$$\frac{L(x_1, \ldots, x_n \mid \theta)}{L(y_1, \ldots, y_n \mid \theta)} = \frac{g'(T'(x_1, \ldots, x_n), \theta) \, h'(x_1, \ldots, x_n)}{g'(T'(y_1, \ldots, y_n), \theta) \, h'(y_1, \ldots, y_n)} = \frac{h'(x_1, \ldots, x_n)}{h'(y_1, \ldots, y_n)}$$

is independent of $\theta$, we get $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$, and so $T$ is a function of $T'$.
Example: $T = (\sum X_i, \sum X_i^2)$ is a minimal sufficient statistic for the Normal distribution since the likelihood ratio

$$\frac{L(x_1, \ldots, x_n \mid \theta)}{L(y_1, \ldots, y_n \mid \theta)} = e^{-\frac{1}{2\sigma^2}\left(\sum (x_i - \mu)^2 - \sum (y_i - \mu)^2\right)} = e^{-\frac{1}{2\sigma^2}\left(\sum x_i^2 - \sum y_i^2\right) + \frac{\mu}{\sigma^2}\left(\sum x_i - \sum y_i\right)}$$

is not a function of $\theta$ if and only if $T(x) = T(y)$. Since $(\bar{X}, S^2)$ is a 1-1 function of $T$, it is a minimal sufficient statistic as well.
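
Theorem 2 can likewise be probed numerically: in the sketch below (illustrative data, standard library only), the log likelihood ratio is identically zero in $\theta$ when $T(x) = T(y)$, e.g. for a permutation of the sample, and varies with $\theta$ once the sums of squares differ:

```python
def log_ratio(xs, ys, mu, sigma2):
    """log of L(x | theta) / L(y | theta) for iid N(mu, sigma2) samples of equal size."""
    sq = lambda zs: sum((z - mu) ** 2 for z in zs)
    return -(sq(xs) - sq(ys)) / (2 * sigma2)

x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]  # a permutation of x: same T = (sum, sum of squares)
z = [1.5, 2.5, 2.0]  # same sum as x, but a different sum of squares

for mu, sigma2 in [(0.0, 1.0), (1.0, 2.0), (-0.5, 0.5)]:
    print(log_ratio(x, y, mu, sigma2), log_ratio(x, z, mu, sigma2))
    # first column: 0.0 at every theta; second column: changes with theta
```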
