
Inferences about a Mean Vector

In the following lectures, we test hypotheses about a $p \times 1$ population mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$.

We could test $p$ separate hypotheses (one for each $\mu_j$ in $\mu$), but that would not take advantage of the correlations between the measured traits $(X_1, X_2, \ldots, X_p)$.

We first review hypothesis testing in the univariate case, and then develop the multivariate Hotelling's $T^2$ statistic and the likelihood ratio statistic for multivariate hypothesis testing.

We consider applications to repeated measures (longitudinal) studies.


We also consider situations when data are incomplete (data

are missing at random).

Approaches to Multivariate Inference

Define a reasonable distance measure. An estimated mean vector that is too far away from the hypothesized mean vector $\mu_0$ gives evidence against the null hypothesis.

Construct a likelihood ratio test based on the multivariate

normal distribution.

Union-Intersection approach: Consider a univariate test of $H_0 : a'\mu = a'\mu_0$ versus $H_a : a'\mu \neq a'\mu_0$ for some linear combination $a'X$ of the traits. Optimize over possible values of $a$.

Review of Univariate Hypothesis Testing
Is a given value $\mu_0$ a plausible value for the population mean $\mu$?
We formulate the problem as a hypothesis testing problem.
The competing hypotheses are
$H_0 : \mu = \mu_0$ and $H_a : \mu \neq \mu_0$.

Given a sample $X_1, \ldots, X_n$ from a normal population, we compute the test statistic
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}.$$
If $|t|$ is small, then $\bar{X}$ and $\mu_0$ are close and we fail to reject $H_0$.
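
As a minimal sketch of the statistic above, the following R code computes $t$ by hand and compares it with the built-in `t.test()`. The data are simulated and the values of `n`, `mu0`, and the seed are illustrative assumptions, not taken from the lecture.

```r
# One-sample t statistic computed by hand (simulated data, illustrative only).
set.seed(1)                          # arbitrary seed for reproducibility
n   <- 15
x   <- rnorm(n, mean = 10, sd = 2)   # simulated sample, not lecture data
mu0 <- 9                             # hypothesized mean (assumed for the sketch)

# t = (xbar - mu0) / (s / sqrt(n))
t_manual  <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# The same statistic as reported by stats::t.test()
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

c(manual = t_manual, builtin = t_builtin)
```

The two numbers should agree up to floating-point error.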

Univariate Hypothesis Testing (cont'd)

When $H_0$ is true, the statistic $t$ has a Student $t$ distribution with $n - 1$ degrees of freedom. We reject the null hypothesis at level $\alpha$ when $|t| > t_{(n-1)}(\alpha/2)$.

Notice that rejecting $H_0$ when $|t|$ is large is equivalent to rejecting it when the squared standardized distance
$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0)$$
is large.

We reject $H_0$ when
$$n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0) > t^2_{(n-1)}(\alpha/2),$$
i.e., when the squared standardized distance exceeds the upper $\alpha$ percentile of a central $F$-distribution with 1 and $n - 1$ df.
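
The equivalence between the squared $t$ critical value and the $F$ critical value can be checked numerically. This is a small sketch using base R quantile functions; the choices of `alpha` and `n` are arbitrary.

```r
# Squared two-sided t critical value versus the upper-alpha F(1, n - 1) quantile.
alpha <- 0.05   # illustrative level
n     <- 20     # illustrative sample size

t_crit_sq <- qt(1 - alpha / 2, df = n - 1)^2
f_crit    <- qf(1 - alpha, df1 = 1, df2 = n - 1)

c(t_crit_sq = t_crit_sq, f_crit = f_crit)
```

Both entries should coincide, which is why the univariate test can be stated on either scale.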

Univariate Hypothesis Testing (cont'd)

If we fail to reject $H_0$, we conclude that $\mu_0$ is close (in units of standard deviations of $\bar{X}$) to $\bar{X}$, and thus is a plausible value for $\mu$.

The set of plausible values for $\mu$ is the set of all values that lie in the $100(1 - \alpha)\%$ confidence interval for $\mu$:
$$\bar{x} - t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}} \le \mu_0 \le \bar{x} + t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}}.$$

The confidence interval consists of all the $\mu_0$ values that would not be rejected by the level $\alpha$ test of $H_0 : \mu = \mu_0$.

Before collecting the data, the interval is random and has probability $1 - \alpha$ of containing $\mu$.
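
The duality between the confidence interval and the level $\alpha$ test can be illustrated with a short sketch. The data are simulated, and the helper `rejects()` is a hypothetical name introduced only for this example.

```r
# Confidence interval as the set of mu0 values that the t test does not reject.
set.seed(2)                           # arbitrary seed
n     <- 25
alpha <- 0.05
x     <- rnorm(n, mean = 5, sd = 1.5) # simulated sample, not lecture data

t_crit <- qt(1 - alpha / 2, df = n - 1)
half   <- t_crit * sd(x) / sqrt(n)
ci     <- c(mean(x) - half, mean(x) + half)

# Hypothetical helper: TRUE when H0: mu = mu0 is rejected at level alpha.
rejects <- function(mu0) abs((mean(x) - mu0) / (sd(x) / sqrt(n))) > t_crit

inside  <- mean(x)        # centre of the interval
outside <- ci[2] + 0.1    # just beyond the upper limit

c(reject_inside = rejects(inside), reject_outside = rejects(outside))
```

A value inside the interval is not rejected, while a value outside it is.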

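Hotelling's $T^2$ Statistic

Consider now the problem of testing whether the $p \times 1$ vector $\mu_0$ is a plausible value for the population mean vector $\mu$.

The squared distance
$$T^2 = (\bar{X} - \mu_0)'\left(\frac{1}{n}S\right)^{-1}(\bar{X} - \mu_0) = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)$$
is called the Hotelling $T^2$ statistic.

In the expression above,
$$\bar{X} = \frac{1}{n}\sum_{i} X_i, \qquad S = \frac{1}{n-1}\sum_{i}(X_i - \bar{X})(X_i - \bar{X})'.$$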
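
A minimal sketch of the computation in base R follows. The function name `hotelling_t2` is an assumption made for this illustration (it is not from any package), and the data are simulated.

```r
# Hotelling's T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0), computed directly.
hotelling_t2 <- function(X, mu0) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- colMeans(X) - mu0          # xbar - mu0
  S <- cov(X)                     # sample covariance, divisor n - 1
  drop(n * t(d) %*% solve(S, d))  # solve(S, d) avoids forming S^{-1} explicitly
}

set.seed(3)                                       # arbitrary seed
X <- matrix(rnorm(30 * 3), nrow = 30, ncol = 3)   # simulated 30 x 3 data matrix
hotelling_t2(X, mu0 = c(0, 0, 0))
```

With a single variable ($p = 1$) the function returns the squared univariate $t$ statistic, which is a convenient sanity check.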

Hotelling's $T^2$ Statistic (cont'd)

If the observed $T^2$ value is large we reject $H_0 : \mu = \mu_0$.

To decide how large is large, we need the sampling distribution of $T^2$ when the hypothesized mean vector is correct:
$$T^2 \sim \frac{(n-1)p}{(n-p)}F_{p,n-p}.$$

We reject the null hypothesis $H_0 : \mu = \mu_0$ for the $p$-dimensional vector $\mu$ at level $\alpha$ when
$$T^2 > \frac{(n-1)p}{(n-p)}F_{p,n-p}(\alpha),$$
where $F_{p,n-p}(\alpha)$ is the upper $\alpha$ percentile of the central $F$ distribution with $p$ and $n - p$ degrees of freedom.
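
The rejection rule translates into two small helper functions. This is a sketch under the normality assumption of the lecture; the names `t2_critical` and `t2_pvalue` are hypothetical and introduced only here.

```r
# Critical value and p-value for Hotelling's T^2 via the scaled F distribution.
t2_critical <- function(n, p, alpha) {
  (n - 1) * p / (n - p) * qf(1 - alpha, df1 = p, df2 = n - p)
}

t2_pvalue <- function(t2, n, p) {
  f_stat <- t2 * (n - p) / ((n - 1) * p)   # rescale T^2 to the F scale
  pf(f_stat, df1 = p, df2 = n - p, lower.tail = FALSE)
}

# Illustrative call with n = 20 observations on p = 3 variables.
t2_critical(n = 20, p = 3, alpha = 0.10)
t2_pvalue(t2 = 9.74, n = 20, p = 3)
```

By construction, the p-value evaluated at the critical value equals `alpha`.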

Hotelling's $T^2$ Statistic (cont'd)

As we noted earlier,
$$T^2 = (\bar{X} - \mu_0)'\left(\frac{1}{n}S\right)^{-1}(\bar{X} - \mu_0) = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)$$
has an approximate central chi-square distribution with $p$ df when $\mu_0$ is correct and $n$ is large. When $\Sigma$ is known and used in place of $S$, the chi-square distribution is exact under normality.

The exact $F$-distribution relies on the normality assumption.

Note that
$$\frac{(n-1)p}{(n-p)}F_{p,n-p}(\alpha) > \chi^2_p(\alpha),$$
but these quantities are nearly equal for large values of $n - p$.
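
To see how quickly the two critical values approach each other, the following sketch tabulates them for a few sample sizes. The values of `alpha`, `p`, and `n` are arbitrary choices for illustration.

```r
# Scaled F critical value versus the chi-square critical value as n grows.
alpha <- 0.05
p     <- 3
n     <- c(10, 20, 50, 200, 2000)   # illustrative sample sizes

scaled_f <- (n - 1) * p / (n - p) * qf(1 - alpha, df1 = p, df2 = n - p)
chi_sq   <- qchisq(1 - alpha, df = p)

data.frame(n = n, scaled_f = scaled_f, chi_sq = chi_sq)
```

The `scaled_f` column stays above `chi_sq` and shrinks toward it as `n` increases.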

Example 5.2: Female Sweat Data

Perspiration from a sample of 20 healthy females was analyzed. Three variables were measured for each subject:
$X_1$ = sweat rate
$X_2$ = sodium content
$X_3$ = potassium content

The question is whether $\mu_0 = [4, 50, 10]'$ is plausible for the population mean vector.

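Example 5.2: Sweat Data (cont'd)

At level $\alpha = 0.1$, we reject the null hypothesis if
$$T^2 = 20(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0) > \frac{(n-1)p}{(n-p)}F_{p,n-p}(0.1) = \frac{19 \times 3}{17}F_{3,17}(0.1) = 8.18.$$

From the data displayed in Table 5.1:
$$\bar{x} = \begin{bmatrix} 4.64 \\ 45.4 \\ 9.96 \end{bmatrix} \quad\text{and}\quad \bar{x} - \mu_0 = \begin{bmatrix} 4.64 - 4 \\ 45.4 - 50 \\ 9.96 - 10 \end{bmatrix} = \begin{bmatrix} 0.64 \\ -4.6 \\ -0.04 \end{bmatrix}.$$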

Example 5.2: Sweat Data (cont'd)

After computing the inverse $S^{-1}$ of the $3 \times 3$ sample covariance matrix, we can compute the value of the $T^2$ statistic as
$$T^2 = 20 \begin{bmatrix} 0.64 & -4.6 & -0.04 \end{bmatrix} \begin{bmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{bmatrix} \begin{bmatrix} 0.64 \\ -4.60 \\ -0.04 \end{bmatrix} = 9.74.$$

Since $9.74 > 8.18$ we reject $H_0$ and conclude that $\mu_0$ is not a plausible value for $\mu$ at the 10% level.
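
The arithmetic of this example can be retraced in R from the rounded summary values shown above. Two assumptions are made in this sketch: the minus signs in $\bar{x} - \mu_0$ and in the off-diagonal entries of $S^{-1}$ are restored (they appear to have been lost in the slide text), and only the rounded entries are used, so the result is roughly 9.7 rather than exactly the reported 9.74.

```r
# Sweat example retraced from rounded summary values (signs restored by assumption).
n    <- 20
xbar <- c(4.64, 45.4, 9.96)
mu0  <- c(4, 50, 10)

# Rounded inverse sample covariance matrix; off-diagonal signs are an assumption.
S_inv <- matrix(c( 0.586, -0.022,  0.258,
                  -0.022,  0.006, -0.002,
                   0.258, -0.002,  0.402),
                nrow = 3, byrow = TRUE)

d    <- xbar - mu0
t2   <- drop(n * t(d) %*% S_inv %*% d)                       # approximate T^2
crit <- (n - 1) * 3 / (n - 3) * qf(0.90, df1 = 3, df2 = n - 3)

c(T2 = t2, critical = crit, reject = t2 > crit)
```

Even with the rounding, the statistic exceeds the critical value, in line with the conclusion of the example.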

At this point, we do not know which of the three

hypothesized mean values is not supported by the data.

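The Female Sweat Data: R code sweat.R

library(ICSNP)   # provides HotellingsT2()

# The data file path is elided in the source; supply the location of the sweat data.
sweat <- read.table(file = "<path elided in source>",
                    header = FALSE,
                    col.names = c("subject", "x1", "x2", "x3"))
nullmean <- c(4, 50, 10)   # hypothesized mean vector from Example 5.2
HotellingsT2(X = sweat[, -1], mu = nullmean)

# Hotelling's one sample T2-test
# data: sweat[, -1]
# T.2 = 2.9045, df1 = 3, df2 = 17, p-value = 0.06493
# alternative hypothesis: true location is not equal to c(4,50,10)
# Note: T.2 is reported on the F scale, T^2 (n - p) / ((n - 1) p), about 9.74 * 17 / 57.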

Invariance property of Hotelling's $T^2$
The $T^2$ statistic is invariant to changes in units of measurements of the form
$$Y_{p \times 1} = C_{p \times p}X_{p \times 1} + d_{p \times 1},$$
with $C$ non-singular. An example of such a transformation is the conversion of temperature measurements from Fahrenheit to Celsius.

Note that given observations $x_1, \ldots, x_n$, we find that
$$\bar{y} = C\bar{x} + d, \quad\text{and}\quad S_y = CSC'.$$

Similarly, $E(Y) = C\mu + d$ and the hypothesized value is $\mu_{Y,0} = C\mu_0 + d$.

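Invariance property of Hotelling's $T^2$ (cont'd)

We now show that $T_y^2 = T_x^2$.
$$\begin{aligned}
T_y^2 &= n(\bar{y} - \mu_{Y,0})'S_y^{-1}(\bar{y} - \mu_{Y,0}) \\
&= n(C(\bar{x} - \mu_0))'(CSC')^{-1}(C(\bar{x} - \mu_0)) \\
&= n(\bar{x} - \mu_0)'C'(C')^{-1}S^{-1}C^{-1}C(\bar{x} - \mu_0) \\
&= n(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0).
\end{aligned}$$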

The Hotelling $T^2$ test is the most powerful test in the class of tests that are invariant to full-rank linear transformations.
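
The invariance property can be verified numerically. In this sketch the data are simulated, and the matrix `C`, the shift `d`, and the helper name `t2` are arbitrary choices made for illustration.

```r
# Numerical check that T^2 is unchanged under Y = C X + d with C non-singular.
t2 <- function(X, mu0) {
  n    <- nrow(X)
  diff <- colMeans(X) - mu0
  drop(n * t(diff) %*% solve(cov(X), diff))
}

set.seed(4)                          # arbitrary seed
n   <- 25
p   <- 3
X   <- matrix(rnorm(n * p), n, p)    # simulated data, rows are observations
mu0 <- c(0.2, -0.1, 0)

C <- matrix(c(2, 0, 1,
              1, 3, 0,
              0, 1, 1), nrow = 3)    # filled column-wise; determinant is 7
d <- c(5, -2, 1)

Y     <- X %*% t(C) + matrix(d, n, p, byrow = TRUE)   # row i is C x_i + d
mu0_y <- drop(C %*% mu0 + d)                          # transformed hypothesized mean

c(T2_x = t2(X, mu0), T2_y = t2(Y, mu0_y))
```

Both values should agree up to floating-point error.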

Likelihood Ratio Test and Hotelling's $T^2$
Compare the maximum value of the multivariate normal likelihood function under no restrictions against the restricted maximized value with the mean vector held at $\mu_0$. The hypothesized value $\mu_0$ will be plausible if it produces a likelihood value almost as large as the unrestricted maximum.

To test $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$ we construct the likelihood ratio
$$\Lambda = \frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)} = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2},$$
where the numerator in the ratio is the likelihood at the MLE of $\Sigma$ given that $\mu = \mu_0$ and the denominator is the likelihood at the unrestricted MLEs for both $\mu$ and $\Sigma$.
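Likelihood Ratio Test and Hotelling's $T^2$ (cont'd)
If
$$\hat{\Sigma}_0 = n^{-1}\sum_i (x_i - \mu_0)(x_i - \mu_0)' \quad \text{under } H_0,$$
$$\hat{\mu} = n^{-1}\sum_i x_i = \bar{x} \quad \text{under } H_0 \cup H_1,$$
$$\hat{\Sigma} = n^{-1}\sum_i (x_i - \bar{x})(x_i - \bar{x})' = n^{-1}A \quad \text{under } H_0 \cup H_1,$$
then under the assumption of multivariate normality
$$\Lambda = \frac{|\hat{\Sigma}_0|^{-n/2}\exp\left\{-\operatorname{tr}\left[\hat{\Sigma}_0^{-1}\sum_i (x_i - \mu_0)(x_i - \mu_0)'\right]/2\right\}}{|\hat{\Sigma}|^{-n/2}\exp\left\{-\operatorname{tr}\left[\hat{\Sigma}^{-1}A\right]/2\right\}}.$$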

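Derivation of Likelihood Ratio Test

For the numerator of $\Lambda$:
$$\begin{aligned}
&|\hat{\Sigma}_0|^{-n/2}\exp\left\{-\frac{1}{2}\sum_i (x_i - \mu_0)'\hat{\Sigma}_0^{-1}(x_i - \mu_0)\right\} \\
&\quad = |\hat{\Sigma}_0|^{-n/2}\exp\left\{-\frac{1}{2}\sum_i \operatorname{tr}\left[(x_i - \mu_0)'\hat{\Sigma}_0^{-1}(x_i - \mu_0)\right]\right\} \\
&\quad = |\hat{\Sigma}_0|^{-n/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\hat{\Sigma}_0^{-1}\sum_i (x_i - \mu_0)(x_i - \mu_0)'\right]\right\} \\
&\quad = |\hat{\Sigma}_0|^{-n/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\hat{\Sigma}_0^{-1}\hat{\Sigma}_0 n\right]\right\} \\
&\quad = |\hat{\Sigma}_0|^{-n/2}\exp\left\{-\frac{np}{2}\right\}.
\end{aligned}$$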

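Derivation of Likelihood Ratio Test (cont'd)

For the denominator of $\Lambda$:
$$\begin{aligned}
&|\hat{\Sigma}|^{-n/2}\exp\left\{-\frac{1}{2}\sum_i (x_i - \bar{x})'\hat{\Sigma}^{-1}(x_i - \bar{x})\right\} \\
&\quad = |\hat{\Sigma}|^{-n/2}\exp\left\{-\frac{1}{2}\sum_i \operatorname{tr}\left[(x_i - \bar{x})'\hat{\Sigma}^{-1}(x_i - \bar{x})\right]\right\} \\
&\quad = |\hat{\Sigma}|^{-n/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\hat{\Sigma}^{-1}\sum_i (x_i - \bar{x})(x_i - \bar{x})'\right]\right\} \\
&\quad = |\hat{\Sigma}|^{-n/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\hat{\Sigma}^{-1}\hat{\Sigma} n\right]\right\} \\
&\quad = |\hat{\Sigma}|^{-n/2}\exp\left\{-\frac{np}{2}\right\}.
\end{aligned}$$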

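Derivation of Likelihood Ratio Test (cont'd)

$$\Lambda = \frac{|\hat{\Sigma}_0|^{-n/2}\exp\left\{-\frac{np}{2}\right\}}{|\hat{\Sigma}|^{-n/2}\exp\left\{-\frac{np}{2}\right\}} = \frac{|\hat{\Sigma}_0|^{-n/2}}{|\hat{\Sigma}|^{-n/2}} = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2}.$$

$\mu_0$ is a plausible value for $\mu$ if $\Lambda$ is close to one.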
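
As a sanity check on the derivation, the ratio of maximized likelihoods can be compared with the determinant formula. This sketch uses simulated data; the helper `loglik` is a hypothetical name that evaluates the multivariate normal log-likelihood directly.

```r
# Likelihood ratio from maximized log-likelihoods versus the determinant formula.
set.seed(5)                          # arbitrary seed
n   <- 30
p   <- 2
X   <- matrix(rnorm(n * p), n, p)    # simulated data, rows are observations
mu0 <- c(0.3, -0.2)

# Hypothetical helper: multivariate normal log-likelihood at mean m, covariance Sig.
loglik <- function(X, m, Sig) {
  n <- nrow(X); p <- ncol(X)
  R    <- sweep(X, 2, m)                   # rows are x_i - m
  quad <- sum((R %*% solve(Sig)) * R)      # sum_i (x_i - m)' Sig^{-1} (x_i - m)
  -0.5 * (n * p * log(2 * pi) + n * log(det(Sig)) + quad)
}

xbar     <- colMeans(X)
Sig_hat  <- crossprod(sweep(X, 2, xbar)) / n   # unrestricted MLE of Sigma
Sig_hat0 <- crossprod(sweep(X, 2, mu0)) / n    # MLE of Sigma with mu fixed at mu0

lambda_lik <- exp(loglik(X, mu0, Sig_hat0) - loglik(X, xbar, Sig_hat))
lambda_det <- (det(Sig_hat) / det(Sig_hat0))^(n / 2)

c(lambda_lik = lambda_lik, lambda_det = lambda_det)
```

The two values should match, and both lie between 0 and 1.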

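Relationship between $\Lambda$ and $T^2$

It is just a matter of algebra to show that
$$\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \left(1 + \frac{T^2}{n-1}\right)^{-1}, \quad\text{or}\quad \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} = 1 + \frac{T^2}{n-1}.$$

For large $T^2$, the likelihood ratio is small and both lead to rejection of $H_0$.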
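
One way to fill in the algebra is sketched below. It relies on the matrix determinant lemma, $|M + uu'| = |M|(1 + u'M^{-1}u)$, a standard identity that the lecture does not state explicitly, and writes $u = \bar{x} - \mu_0$.

```latex
% Step 1: decompose the restricted MLE; the cross terms vanish
% because \sum_i (x_i - \bar{x}) = 0.
\hat{\Sigma}_0
  = \frac{1}{n}\sum_i (x_i - \mu_0)(x_i - \mu_0)'
  = \hat{\Sigma} + (\bar{x} - \mu_0)(\bar{x} - \mu_0)'

% Step 2: apply the matrix determinant lemma with M = \hat{\Sigma}.
|\hat{\Sigma}_0|
  = |\hat{\Sigma}|\left(1 + (\bar{x} - \mu_0)'\hat{\Sigma}^{-1}(\bar{x} - \mu_0)\right)

% Step 3: since \hat{\Sigma} = \frac{n-1}{n} S,
(\bar{x} - \mu_0)'\hat{\Sigma}^{-1}(\bar{x} - \mu_0)
  = \frac{n}{n-1}(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0)
  = \frac{T^2}{n-1}

% Hence
\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} = 1 + \frac{T^2}{n-1}
```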

Relationship between $\Lambda$ and $T^2$ (cont'd)

From the previous equation,
$$T^2 = \left[\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1\right](n - 1),$$
which provides another way to compute $T^2$ that does not require inverting a covariance matrix.
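
The determinant route can be compared with the usual quadratic form in a few lines of R. The data below are simulated and the variable names are illustrative.

```r
# T^2 computed two ways: via S^{-1} and via the ratio of determinants.
set.seed(6)                                # arbitrary seed
n   <- 20
p   <- 3
X   <- matrix(rnorm(n * p, mean = 1), n, p)   # simulated data
mu0 <- c(1, 1, 1)

xbar <- colMeans(X)
d    <- xbar - mu0

# Quadratic-form version: n (xbar - mu0)' S^{-1} (xbar - mu0)
t2_inverse <- drop(n * t(d) %*% solve(cov(X), d))

# Determinant version: (|Sigma0_hat| / |Sigma_hat| - 1) (n - 1)
Sig_hat  <- crossprod(sweep(X, 2, xbar)) / n
Sig_hat0 <- crossprod(sweep(X, 2, mu0)) / n
t2_det   <- (det(Sig_hat0) / det(Sig_hat) - 1) * (n - 1)

c(t2_inverse = t2_inverse, t2_det = t2_det)
```

Both computations should return the same value up to rounding error.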

When $H_0 : \mu = \mu_0$ is true, the exact distribution of the likelihood ratio test statistic is obtained from
$$T^2 = \left[\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1\right](n - 1) \sim \frac{p(n-1)}{n-p}F_{(p,n-p)}.$$

Union-Intersection Derivation of $T^2$

Consider a reduction from $p$-dimensional observation vectors to univariate observations
$$Y_j = a'X_j = a_1X_{j1} + a_2X_{j2} + \cdots + a_pX_{jp} \sim NID(a'\mu, a'\Sigma a),$$
where $a' = (a_1, a_2, \ldots, a_p)$.

The null hypothesis $H_0 : \mu = \mu_0$ is true if and only if all null hypotheses of the form $H_{(0,a)} : a'\mu = a'\mu_0$ are true.

Test $H_{(0,a)} : a'\mu = a'\mu_0$ versus $H_{(A,a)} : a'\mu \neq a'\mu_0$ with
$$t(a) = \frac{\bar{Y} - a'\mu_0}{s_Y/\sqrt{n}} = \frac{a'(\bar{X} - \mu_0)}{\sqrt{\frac{1}{n}a'Sa}}.$$

Union-Intersection Derivation of $T^2$ (cont'd)

If you cannot reject the null hypothesis for the $a$ that maximizes $t^2(a)$, you cannot reject any of the univariate null hypotheses and you cannot reject the multivariate null hypothesis $H_0 : \mu = \mu_0$.

From previous results, a vector that maximizes $t^2(a)$ is $a = S^{-1}(\bar{X} - \mu_0)$.

Consequently, the maximum squared t-test is
$$T^2 = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0).$$
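
The maximization can be illustrated numerically: $t^2(a)$ evaluated at $a = S^{-1}(\bar{x} - \mu_0)$ reproduces $T^2$, and randomly drawn directions never exceed it. This is a sketch with simulated data; `t_sq` is a hypothetical helper name.

```r
# Union-intersection check: the maximum of t^2(a) over a equals T^2.
set.seed(7)                                   # arbitrary seed
n   <- 25
p   <- 3
X   <- matrix(rnorm(n * p, mean = 0.3), n, p) # simulated data
mu0 <- rep(0, p)

S <- cov(X)
d <- colMeans(X) - mu0

# Squared univariate t statistic for the linear combination a'X.
t_sq <- function(a) n * sum(a * d)^2 / drop(t(a) %*% S %*% a)

a_star <- solve(S, d)                         # maximizing direction S^{-1}(xbar - mu0)
T2     <- drop(n * t(d) %*% solve(S, d))      # Hotelling's T^2

random_t_sq <- replicate(1000, t_sq(rnorm(p)))   # t^2(a) for random directions

c(T2 = T2, at_a_star = t_sq(a_star), max_random = max(random_t_sq))
```

Any nonzero multiple of `a_star` gives the same value, since $t^2(a)$ does not depend on the scale of $a$.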

