Chapter 2
2.1 Multivariate Normal Distribution
Definition
A random vector x is said to have a multivariate normal distribution (multinormal
distribution) if every linear combination of its components has a univariate normal distribution.
[Figure: a bivariate density $f(x_1, x_2)$ over the $(x_1, x_2)$ plane, and the univariate density $f(a_1x_1 + a_2x_2)$ of the linear combination $a_1x_1 + a_2x_2$, centered at $a_1\mu_1 + a_2\mu_2$.]
Properties
1. If $x$ is multinormal, then for any constant vector $a$,
$a'x \sim N(a'\mu,\; a'\Sigma a).$
Proof: Since $E(a'x) = a'\mu$ and $\mathrm{Var}(a'x) = a'\Sigma a$, the result follows by noting that $a'x$ is univariate normal.
2. The m.g.f. of a multinormal random vector $x$ with mean vector $\mu$ and covariance matrix $\Sigma$ is given by
$M_x(t) = \exp\!\left(t'\mu + \tfrac{1}{2}\,t'\Sigma t\right).$
Thus, a multinormal distribution is identified by its means $\mu$ and covariances $\Sigma$. We use the notation $x \sim N_p(\mu, \Sigma)$.
Hint: $M_x(t) = E(e^{t'x}) = E(e^{y})$, where $t = [t_1 \cdots t_p]'$ and $y = t'x \sim N(t'\mu,\, t'\Sigma t)$, since $y$ is a linear combination of the components of $x$; evaluating the univariate m.g.f. of $y$ (recalled below) at the point 1 gives $E(e^{y}) = \exp(t'\mu + \tfrac{1}{2}t'\Sigma t)$.
Recall that the moment generating function (m.g.f.) of a univariate normal $x \sim N(\mu, \sigma^2)$ is
$M_x(t) = \exp\!\left(\mu t + \tfrac{\sigma^2}{2}t^2\right),$
and the $k$th moment is generated by $E(x^k) = \left.\dfrac{d^k M_x(t)}{dt^k}\right|_{t=0}.$
($\Sigma$ is p.d.).
7. Let $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, where $x_1$ consists of the first $q$ components and $x_2$ consists of the last $(p-q)$ components.
(a) $x_1$ and $x_2$ are independent if and only if $\mathrm{Cov}(x_1, x_2) = \Sigma_{12} = 0$.
(b) $x_1 \sim N_q(\mu_1, \Sigma_{11})$ and $x_2 \sim N_{p-q}(\mu_2, \Sigma_{22})$.
(c) $(x_1 - \Sigma_{12}\Sigma_{22}^{-1}x_2)$ is independent of $x_2$ and is distributed as $N_q(\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2,\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$.
(d) Given $x_2$, $x_1 \mid x_2 \sim N_q(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$.
Property 7(d) implies that $E(x_1 \mid x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and $\mathrm{Var}(x_1 \mid x_2) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, which does not change with the value of $x_2$. Indeed, the result of Property 7(d) is related to the multivariate linear regression model. To summarize, the marginal pdf and conditional pdf of a multivariate normal distribution are still multivariate normal.
2.2 Maximum Likelihood Estimation
As shown in what follows, the method of Maximum Likelihood Estimation (MLE) will also give $\bar{x}$ as the MLE of $\mu$, but the MLE of $\Sigma$ is slightly different from $S$, namely $(n-1)S/n$, which is very close to $S$ when $n$ is large.
The likelihood function is
$L(\mu, \Sigma) = f(x_1, x_2, \ldots, x_n)$
$= f(x_1) f(x_2) \cdots f(x_n)$  (since $x_1, x_2, \ldots, x_n$ are independent)
$= \dfrac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\!\left\{-\dfrac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\}$  (by Property (5)),
and thus the log-likelihood function is
$\ell(\mu, \Sigma) = \log L(\mu, \Sigma) = -\dfrac{np}{2}\log(2\pi) - \dfrac{n}{2}\log|\Sigma| - \dfrac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)'\right].$
Maximization of the log-likelihood function yields the MLEs of $\mu$ and $\Sigma$, respectively,
$\hat{\mu} = \bar{x} \qquad\text{and}\qquad \hat{\Sigma} = \dfrac{W}{n} = \dfrac{(n-1)S}{n}. \qquad (2.2.1)$
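The following Python sketch (added for illustration, not part of the notes; the data are randomly generated) checks numerically that the MLE $\hat{\Sigma}$ in (2.2.1) equals $(n-1)S/n$, where $S$ is the usual sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.multivariate_normal(mean=np.zeros(p), cov=np.eye(p), size=n)  # simulated sample

xbar = X.mean(axis=0)                  # MLE of mu
centered = X - xbar
Sigma_mle = centered.T @ centered / n  # MLE of Sigma = W / n
S = centered.T @ centered / (n - 1)    # unbiased sample covariance matrix

print(np.allclose(Sigma_mle, (n - 1) / n * S))  # True
```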
Properties
1. $\bar{x}$ and $S$ are sufficient statistics for $N_p(\mu, \Sigma)$, i.e., the conditional distribution of the sample $(x_1, \ldots, x_n)$ given $\bar{x}$ and $S$ does not depend on $\mu$ and $\Sigma$.
2. $\bar{x} \sim N_p\!\left(\mu, \dfrac{1}{n}\Sigma\right)$ and $(n-1)S \sim W_p(n-1, \Sigma)$, a central Wishart distribution which is defined in the next section.
3. $\hat{\mu}$ is unbiased but $\hat{\Sigma}$ is biased. However, $S$ is unbiased for $\Sigma$.
4. The MLEs possess an invariance property: if the MLE of $\theta_j$ is denoted $\hat{\theta}_j$ for $j = 1, 2, \ldots, p$, then the MLE of $\eta_i = h_i(\theta_1, \theta_2, \ldots, \theta_p)$ is $\hat{\eta}_i = h_i(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)$ for $i = 1, 2, \ldots, r$ where $1 \le r \le p$, provided that the $h_i(\cdot)$'s are one-to-one functions.
2.3 Wishart Distribution
Definition
Suppose $x_i$ $(i = 1, \ldots, k)$ are independent $N_p(\mu_i, \Sigma)$. Define a symmetric $p \times p$ matrix $V$ as
$V = \sum_{i=1}^{k} x_i x_i' = X'X.$
Then $V$ is said to follow a Wishart distribution, denoted by $W_p(k, \Sigma, \Delta)$, where $\Sigma$ is called the scaling matrix, $k$ the degrees of freedom, and $\Delta = \sum_{i=1}^{k} \mu_i \mu_i'$ the ($p \times p$ symmetric) noncentrality matrix.
Indeed, the Wishart distribution can be considered a multivariate extension of the chi-squared distribution.
When $\Delta = 0$, the Wishart distribution is called the central Wishart distribution, denoted simply by $W_p(k, \Sigma)$.
Note that when $p = 1$ and $\Sigma = \sigma^2$, the Wishart distribution $W_p(k, \Sigma, \Delta)$ is reduced to a non-central chi-squared distribution $\sigma^2\chi^2\!\left(k, \sum_{i=1}^{k}\mu_i^2/\sigma^2\right)$.
In the univariate case, the pdf of a central Wishart distribution is reduced to that of a (scaled) central chi-squared distribution:
$f(x) = \dfrac{x^{(k-2)/2}}{2^{k/2}\,\sigma^{k}\,\Gamma(k/2)} \exp\!\left(-\dfrac{x}{2\sigma^{2}}\right), \qquad x > 0.$
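As a quick check on the $p = 1$ reduction, here is a small simulation sketch in Python (added for illustration; the values of $k$, $\sigma^2$ and the number of replications are arbitrary). It forms $V = \sum_i x_i^2$ from independent $N(0, \sigma^2)$ draws, so $V \sim W_1(k, \sigma^2)$, and compares its distribution with $\sigma^2\chi^2_k$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, sigma2, reps = 5, 2.0, 100_000

# V = sum of squares of k independent N(0, sigma^2) variables: a W_1(k, sigma^2) draw
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, k))
V = (x ** 2).sum(axis=1)

# Compare V / sigma^2 with a chi-squared(k) distribution
print(stats.kstest(V / sigma2, "chi2", args=(k,)))
print(V.mean(), k * sigma2)  # E(V) = k * sigma^2
```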
Properties
1. If $V \sim W_p(k, \Sigma)$, the pdf of $V$ is given by
$f(V) = c(p, k)\,|\Sigma|^{-k/2}\,|V|^{(k-p-1)/2}\exp\!\left(-\dfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}V\right),$
where $c(p, k) = \left[2^{kp/2}\,\pi^{p(p-1)/4}\,\Gamma\!\left(\dfrac{k}{2}\right)\Gamma\!\left(\dfrac{k-1}{2}\right)\cdots\Gamma\!\left(\dfrac{k-p+1}{2}\right)\right]^{-1}.$
2.4 Normality Checking
Before any statistical modeling, it is crucial to verify whether the data at hand satisfy the underlying distributional assumptions. For most multivariate analyses, it is important that the data indeed follow the multivariate normal distribution, at least approximately if not exactly. Here are some commonly used methods.
1. Check each variable for univariate normality (necessary for multinormality but not sufficient). [Use either the SAS procedure proc univariate or SAS/INSIGHT. To invoke the latter, select the menus in the following sequence: Solution → Analysis → Interactive Data Analysis.]
Q-Q plot (quantile against quantile plot) for the normal distribution
- the sample quantiles are plotted against the theoretical quantiles of a standard normal distribution;
- a straight line indicates univariate normality;
- a non-linear transformation of the variable may help to achieve normality.
[Figure: normal Q-Q plots (sample quantiles against normal quantiles) for several variables.]
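The course material uses SAS (proc univariate or SAS/INSIGHT) for these plots; purely as an illustration, a normal Q-Q plot of one variable can also be sketched in Python along the following lines (the variable name x1 and the data are hypothetical), and the Shapiro-Wilk W test discussed below is available as scipy.stats.shapiro.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x1 = rng.lognormal(size=200)              # hypothetical, clearly non-normal variable

stats.probplot(x1, dist="norm", plot=plt)  # sample vs. theoretical normal quantiles
plt.title("Normal Q-Q plot of x1")
plt.show()

print(stats.shapiro(x1))                   # Shapiro-Wilk W test (small samples)
```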
An alternative device is the P-P plot (probability against probability plot) that plots the
sample cdf against the theoretical cdf.
Shapiro-Wilk W test (small sample)
The test statistic of the W test is a modified version of the squared sample correlation between the sample quantiles and the expected quantiles.
Kolmogorov-Smirnov-Lilliefors (KSL) test (large sample, n > 2000)
The test statistic of this test is the maximum difference between the empirical cdf and the
normal cdf.
Test for zero skewness
For a symmetric distribution, the sample skewness
$\dfrac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \dfrac{(x_i - \bar{x})^3}{s^3}$
is close to 0. [See Doane D.P., Seward L.E. (2011), Measuring Skewness: A Forgotten
Statistic? Journal of Statistics Education, 19(2).]
Test for zero excess kurtosis
For a normal distribution, the sample excess kurtosis
$\dfrac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \dfrac{(x_i - \bar{x})^4}{s^4} \;-\; \dfrac{3(n-1)^2}{(n-2)(n-3)}$
is close to 0. [See Sheskin, D.J. (2000), Handbook of Parametric and Nonparametric
Statistical Procedures, Second Edition. Boca Raton, Florida: Chapman & Hall/CRC.]
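The two statistics above are straightforward to compute directly; the following Python sketch (illustrative only, using the formulas exactly as given, with simulated data) evaluates the sample skewness and sample excess kurtosis for a data vector.

```python
import numpy as np

def sample_skewness(x):
    """n / ((n-1)(n-2)) * sum(((x_i - xbar) / s)^3), with s the sample std (ddof=1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)
    return n / ((n - 1) * (n - 2)) * np.sum(((x - x.mean()) / s) ** 3)

def sample_excess_kurtosis(x):
    """n(n+1)/((n-1)(n-2)(n-3)) * sum(((x_i - xbar)/s)^4) - 3(n-1)^2/((n-2)(n-3))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * np.sum(((x - x.mean()) / s) ** 4)
    return term - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# For normal data both statistics should be close to 0.
rng = np.random.default_rng(3)
z = rng.normal(size=500)
print(sample_skewness(z), sample_excess_kurtosis(z))
```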
2. When $n - p$ is large enough, we make use of property (9) of Section 2.1. Check whether the squared generalized distance as defined below follows a chi-square distribution by a Q-Q plot (a necessary and sufficient condition for very large sample sizes).
Define the squared Mahalanobis distance as $d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$, $i = 1, \ldots, n$.
Order $d_1^2, d_2^2, \ldots, d_n^2$ as $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.
Plot $\chi_{(i)}^2$ vs $d_{(i)}^2$, where $\chi_{(i)}^2$ is the $100(i - 0.375)/(n + 0.25)$ percentile (as in proc univariate) of the $\chi^2(p)$ distribution.
[Figure: bivariate densities $f(x_1, x_2)$ with the corresponding chi-square Q-Q plots of the ordered squared distances (axis: Chi-square Quantile).]
In this course, a SAS macro to produce this Q-Q plot will be provided.
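The SAS macro is the course's tool for this plot; the sketch below (added for illustration, with simulated data) shows the same chi-square Q-Q plot idea in Python, using the percentile levels $100(i - 0.375)/(n + 0.25)$ mentioned above.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n, p = 100, 3
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)  # simulated data

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                                   # sample covariance matrix
diff = X - xbar
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)   # squared Mahalanobis distances
d2_sorted = np.sort(d2)

levels = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
chi2_quantiles = stats.chi2.ppf(levels, df=p)

plt.scatter(chi2_quantiles, d2_sorted)                        # roughly a straight line if multinormal
plt.xlabel("Chi-square Quantile")
plt.ylabel("Ordered squared distance")
plt.show()
```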
3. Check each Principal Component (PC) for univariate normality (a necessary condition; and, if the sample size n is large enough, a sufficient condition).
The PCs are readily available and their univariate normality is easily checked with SAS/INSIGHT; otherwise, the procedure proc princomp is required before we can use proc univariate to check normality.
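Outside SAS, the same check can be sketched in Python (illustrative only, with simulated data): compute the principal component scores from the eigenvectors of the sample covariance matrix and examine each score vector for univariate normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
X = rng.multivariate_normal(np.zeros(4), np.diag([4.0, 2.0, 1.0, 0.5]), size=300)

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvectors of the sample covariance matrix
scores = Xc @ eigvecs[:, ::-1]         # PC scores, largest variance first

for j in range(scores.shape[1]):       # univariate normality check per PC
    W, pval = stats.shapiro(scores[:, j])
    print(f"PC{j + 1}: Shapiro-Wilk W = {W:.3f}, p = {pval:.3f}")
```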
2.5 Transformation to Near Normality
To achieve multinormality of the data, a univariate transformation is applied to each variable individually. The multinormality of the transformed variables is then checked again. The following transformations are commonly used in practice:
count data: $\sqrt{x}$
Box-Cox transformation:
$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \log x & \text{for } \lambda = 0 \end{cases}$
Transformation ladder: $x^{3},\; x^{2},\; x,\; \sqrt{x},\; \log x,\; 1/\sqrt{x},\; 1/x,\; 1/x^{2},\; 1/x^{3}$
[Figure: normal Q-Q plots of the variable before and after transformation; (b) log-transformation ($\lambda = 0$).]
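A minimal Python sketch of the Box-Cox transformation (added for illustration; here scipy estimates $\lambda$ by maximum likelihood, and the data are simulated) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.lognormal(mean=0.0, sigma=0.8, size=200)   # positive, right-skewed data

x_transformed, lam = stats.boxcox(x)               # (x**lam - 1)/lam, or log x when lam = 0
print("estimated lambda:", lam)
print("skewness before/after:", stats.skew(x), stats.skew(x_transformed))
```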
2.6 Summary of Chapter 2
1. pdf of $x \sim N_p(\mu, \Sigma)$:
$f(x) = \dfrac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\!\left\{-\dfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right\}.$
2. pdf of $V \sim W_p(k, \Sigma)$:
$f(V) = c(p, k)\,|\Sigma|^{-k/2}\,|V|^{(k-p-1)/2}\exp\!\left(-\dfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}V\right),$
where $c(p, k) = \left[2^{kp/2}\,\pi^{p(p-1)/4}\,\Gamma\!\left(\dfrac{k}{2}\right)\Gamma\!\left(\dfrac{k-1}{2}\right)\cdots\Gamma\!\left(\dfrac{k-p+1}{2}\right)\right]^{-1}.$
3. (a) $\hat{\mu} = \bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$.
   (b) $\hat{\Sigma} = \dfrac{1}{n}\left(X'X - n\,\bar{x}\bar{x}'\right) = \dfrac{n-1}{n}S$.
4. Sampling distributions of $\bar{x}$ and $S$
   (a) $\bar{x} \sim N_p\!\left(\mu, \dfrac{1}{n}\Sigma\right)$.
   (b) $(n-1)S \sim W_p(n-1, \Sigma)$, where $(n-1)S = \sum_{i=1}^{n-1} z_i z_i'$ with $z_i \sim N_p(0, \Sigma)$.
5. Normality checking
(a) Univariate
Q-Q plot, Shapiro-Wilk test, Kolmogorov-Smirnov (KS) test, tests for skewness and kurtosis
(b) Multivariate
Chi-square plot, principal components method
6. Transformation to near normality
Box-Cox transformation