Multivariate Data Analysis

SET B
BOLGATANGA POLYTECHNIC
DEPARTMENT OF STATISTICS HND 3
MULTIVARIATE DATA ANALYSIS
END OF FIRST SEMESTER EXAMINATION 2011/2012 (TIME 3 HRS)
SECTION A. ANSWER ALL QUESTIONS (50 MARKS)
Q1) a. Briefly explain the following:
(i) Symmetric matrix (2 marks)
(ii) Trace of a matrix (1 mark)
(iii) Orthogonal matrix (1 mark)
(iv) Unit matrix (1 mark)
(v) Transpose of a matrix (2 marks)
b. Let the random variable y have covariance matrix
\[ \Sigma = \begin{bmatrix} 25 & 2 & 4 \\ 2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix} \]
(i) Calculate the correlation matrix of y. (8 marks)
(ii) Given that $a' = [2 \;\; 0 \;\; 1]$, calculate the variance of $a'y$. (4 marks)
(iii) Calculate the trace of the matrix above. (2 marks)
Q2) a. From the data given below, calculate:
(i) the sample means, $\bar{X}$ (2 marks)
(ii) the sample variances and covariances, $S$ (6 marks)
(iii) the sample correlations, $R$ (4 marks)
Variable 1 ($X_1$): 5 4 6 2 2 6 3
Variable 2 ($X_2$): 5 5 4 7 9 5 7
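Q2) b. Given that
\[ \bar{x}_1 = \begin{bmatrix} -0.0065 \\ -0.0390 \end{bmatrix}, \quad \bar{x}_2 = \begin{bmatrix} -0.2483 \\ -0.0262 \end{bmatrix}, \quad S_P^{-1} = \begin{bmatrix} 131.158 & -90.423 \\ -90.423 & 108.147 \end{bmatrix}, \]
and assuming equal costs and equal prior probabilities, to which population will the discriminant function assign
\[ x_0 = \begin{bmatrix} -0.210 \\ -0.044 \end{bmatrix} ? \]
(7 marks)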
Q3. As part of a study of AIDS prevention undertaken by a social counsellor, some questions were designed and the respondents were categorized into males (population 1) and females (population 2). A sample of 30 males and 30 females was considered. The sample mean vectors were
\[ \bar{x}_1 = \begin{bmatrix} 6.833 \\ 7.033 \\ 3.967 \\ 4.700 \end{bmatrix} \text{ (Males)}, \qquad \bar{x}_2 = \begin{bmatrix} 6.633 \\ 7.000 \\ 4.000 \\ 4.533 \end{bmatrix} \text{ (Females)} \]
and the pooled variance-covariance matrix is
\[ S_P = \begin{bmatrix} 0.606 & 0.262 & 0.066 & 0.161 \\ 0.262 & 0.637 & 0.173 & 0.143 \\ 0.066 & 0.173 & 0.810 & 0.029 \\ 0.161 & 0.143 & 0.029 & 0.306 \end{bmatrix} \]
(a) Sketch the sample profiles for the results. (4 marks)
(b) Is the profile of males and females the same? (Take $\alpha = 5\%$.) (6 marks)
SECTION B : ANSWER ANY TWO(2) (50MARKS)
Q1) a. Given that $m' = [0 \;\; 0 \;\; 0 \;\; \dots \;\; 1]$, show that the component $z_p$ of $z$ is normally distributed with $z_p \sim N(\mu_p, \sigma_{pp})$.
b. Using the matrix
\[ X = \begin{bmatrix} 1 & 3 & 0 \\ 3 & -1 & -2 \\ 0 & -2 & 2 \end{bmatrix}, \]
which of the following are independent? Explain.
(i) $(x_1, x_2)$
(ii) $[(x_2 + x_3),\; x_3]$
(iii) $\left[\tfrac{1}{2}(x_1 + x_2 + x_3),\; \tfrac{1}{2}(x_1 - x_3)\right]$
(iv) $\left[\tfrac{1}{2}(X_1 + X_2),\; X_3\right]$
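Q2. Consider the following independent samples from three levels. The observation vectors $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ are:
100 Level: $\begin{bmatrix} 14 \\ 8 \end{bmatrix}, \begin{bmatrix} 6 \\ 2 \end{bmatrix}, \begin{bmatrix} 8 \\ 2 \end{bmatrix}, \begin{bmatrix} 16 \\ -4 \end{bmatrix}$
200 Level: $\begin{bmatrix} 1 \\ 6 \end{bmatrix}, \begin{bmatrix} 5 \\ 12 \end{bmatrix}, \begin{bmatrix} 0 \\ 15 \end{bmatrix}, \begin{bmatrix} 2 \\ 7 \end{bmatrix}$
300 Level: $\begin{bmatrix} 3 \\ -2 \end{bmatrix}, \begin{bmatrix} -2 \\ 7 \end{bmatrix}, \begin{bmatrix} -11 \\ 1 \end{bmatrix}, \begin{bmatrix} -6 \\ 6 \end{bmatrix}$
a. Break up the observations into mean, treatment, and residual components and construct the corresponding arrays for each variable.
b. Construct the one-way MANOVA table using the information in (a).
c. Calculate Wilks' lambda. (Use $\alpha = 5\%$.)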
Q3) Perspiration from 20 healthy females was analysed. Three components, $x_1$ = sweat rate, $x_2$ = sodium content and $x_3$ = potassium content, were measured and the results are presented below.
The summary from a computer analysis is as follows:
\[ S = \begin{bmatrix} 2.879 & 10.002 & -1.810 \\ 10.002 & 199.789 & -5.627 \\ -1.810 & -5.627 & 3.628 \end{bmatrix} \quad \text{and} \quad S^{-1} = \begin{bmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{bmatrix} \]
Female X1 X2 X3
1. 3.7 48.5 9.3
2. 5.7 65.1 8.0
3. 3.8 47.2 10.9
4. 3.2 53.2 12.0
5. 3.1 55.5 9.7
6. 4.6 36.1 7.9
7. 2.4 24.8 14.0
8. 7.2 33.1 7.6
9. 6.7 47.4 8.5
10. 5.4 54.1 11.3
11. 3.9 36.9 12.7
12. 4.5 58.8 12.3
13. 3.5 27.8 9.8
14. 4.5 40.2 8.4
15. 1.5 13.5 10.1
16. 8.5 56.4 7.1
17. 4.5 71.6 8.2
18. 6.5 52.8 10.9
19. 4.1 44.1 11.2
20. 5.5 40.9 9.4
Test the hypothesis $H_0: \mu^T = [4, 50, 10]$ against $H_1: \mu^T \neq [4, 50, 10]$ at the 10% level of significance.
END OF FIRST SEMESTER EXAMINATION 2011/2012 (TIME 3 HRS)
MARKING SCHEME
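Q1) a.
i. A square matrix $(a_{ij})$ is symmetric if $a_{ij} = a_{ji}$, e.g.
\[ \begin{bmatrix} 1 & 2 & 5 \\ 2 & 8 & 9 \\ 5 & 9 & 4 \end{bmatrix} \]
ii. The trace of a matrix is the sum of its diagonal elements.
If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, then $\text{trace}(A) = \sum a_{ii} = a_{11} + a_{22}$.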
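iii. An orthogonal matrix Q is a square matrix such that $QQ^T = Q^TQ = I$, e.g.
\[ Q = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix} \]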
iv. A unit matrix is a diagonal matrix in which all the elements on the leading diagonal are equal to one, e.g.
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
v. The transpose of a matrix is the matrix obtained when its rows and columns are interchanged: the first row becomes the first column, the second row becomes the second column, the third row becomes the third column, and so on. The new matrix so formed is called the transpose of the original matrix. E.g. if
\[ A = \begin{bmatrix} 4 & 6 \\ 7 & 9 \\ 2 & 5 \end{bmatrix}, \quad \text{then} \quad A^T = \begin{bmatrix} 4 & 7 & 2 \\ 6 & 9 & 5 \end{bmatrix} \]
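b. Given that y has covariance matrix
\[ \Sigma = \begin{bmatrix} 25 & 2 & 4 \\ 2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix}, \]
let V be the diagonal matrix of variances and let R be the correlation matrix of y.
But $\Sigma = V^{1/2} R V^{1/2}$, so
\[ R = V^{-1/2} \, \Sigma \, V^{-1/2} \]
\[ V^{-1} = \begin{bmatrix} \tfrac{1}{25} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & \tfrac{1}{9} \end{bmatrix}, \qquad V^{-1/2} = \begin{bmatrix} \tfrac{1}{5} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix} \]
\[ R = \begin{bmatrix} \tfrac{1}{5} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 25 & 2 & 4 \\ 2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix} \begin{bmatrix} \tfrac{1}{5} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{bmatrix} \]
Therefore, the correlation matrix R is
\[ R = \begin{bmatrix} 1 & \tfrac{1}{5} & \tfrac{4}{15} \\ \tfrac{1}{5} & 1 & \tfrac{1}{6} \\ \tfrac{4}{15} & \tfrac{1}{6} & 1 \end{bmatrix} \]
ii. Since $\text{Var}[a'y] = a' \Sigma a$,
\[ \text{Var}[a'y] = \begin{bmatrix} 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 25 & 2 & 4 \\ 2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \]
\[ \text{Var}[a'y] = 125 \]
iii. Let
\[ A = \begin{bmatrix} 25 & 2 & 4 \\ 2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix} \]
Since $\text{trace}(A) = \sum_{i=1}^{n} a_{ii}$, $\text{trace}(A) = 25 + 4 + 9 = 38$.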
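The three results in part (b) can be cross-checked numerically. The following is a minimal sketch in plain Python, assuming the covariance matrix has rows (25, 2, 4), (2, 4, 1), (4, 1, 9) and that a' = [2, 0, 1], as read from the working above; the function names are illustrative only.

```python
from math import sqrt

# Covariance matrix of y and the vector a, as read from the working above (assumed values).
sigma = [[25, 2, 4],
         [2, 4, 1],
         [4, 1, 9]]
a = [2, 0, 1]


def correlation(cov):
    """R = V^(-1/2) * Sigma * V^(-1/2): divide each entry by the two standard deviations."""
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


def quadratic_form(vec, mat):
    """Return vec' * mat * vec, i.e. Var(a'y) when mat is the covariance matrix of y."""
    n = len(vec)
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(n) for j in range(n))


def trace(mat):
    """Sum of the diagonal elements."""
    return sum(mat[i][i] for i in range(len(mat)))


R = correlation(sigma)
var_ay = quadratic_form(a, sigma)
tr = trace(sigma)

for row in R:
    print([round(value, 4) for value in row])
print("Var(a'y) =", var_ay)
print("trace =", tr)
```

The off-diagonal correlations come out as 1/5, 4/15 and 1/6, which is a quick way to confirm each fraction in R.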
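Q2) a. i. Since $\bar{X} = \dfrac{1}{n} \sum X_i$,
\[ \bar{X}_1 = \frac{5 + 4 + 6 + 2 + 2 + 6 + 3}{7} = 4 \]
\[ \bar{X}_2 = \frac{5 + 5 + 4 + 7 + 9 + 5 + 7}{7} = 6 \]
ii. For the sample variances and covariances, $S_n$:
\[ S_n = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}, \qquad S_{11} = \frac{1}{n-1} \sum (X_{1} - \bar{X}_1)^2 \]
\[ S_{11} = \frac{1}{6}\left[(5-4)^2 + (4-4)^2 + (6-4)^2 + (2-4)^2 + (2-4)^2 + (6-4)^2 + (3-4)^2\right] = 3 \]
\[ S_{22} = \frac{1}{6}\left[(5-6)^2 + (5-6)^2 + (4-6)^2 + (7-6)^2 + (9-6)^2 + (5-6)^2 + (7-6)^2\right] = 3 \]
\[ S_{12} = S_{21} = \frac{1}{6}\left[(5-4)(5-6) + (4-4)(5-6) + (6-4)(4-6) + (2-4)(7-6) + \dots + (3-4)(7-6)\right] = -2.67 \]
\[ S_n = \begin{bmatrix} 3 & -2.67 \\ -2.67 & 3 \end{bmatrix} \]
iii. For the sample correlations, R:
\[ R = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix}, \qquad r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}} \]
\[ r_{11} = \frac{3}{\sqrt{3 \times 3}} = \frac{3}{\sqrt{9}} = \frac{3}{3} = 1, \qquad r_{22} = \frac{3}{\sqrt{3 \times 3}} = \frac{3}{\sqrt{9}} = \frac{3}{3} = 1 \]
\[ r_{12} = r_{21} = \frac{-2.67}{\sqrt{3 \times 3}} = \frac{-2.67}{\sqrt{9}} = \frac{-2.67}{3} = -0.89 \]
\[ R = \begin{bmatrix} 1 & -0.89 \\ -0.89 & 1 \end{bmatrix} \]
Hence the sample mean is $\bar{X} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}$.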
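The sample mean vector, covariance matrix and correlation above can be reproduced with a short script. This is a sketch only, assuming the seven pairs of observations used in the working above (with the last value of Variable 2 taken as 7) and the divisor n - 1 = 6; the helper names are illustrative.

```python
from math import sqrt

# Observations as used in the working above (assumed values).
x1 = [5, 4, 6, 2, 2, 6, 3]
x2 = [5, 5, 4, 7, 9, 5, 7]


def mean(values):
    return sum(values) / len(values)


def sample_cov(u, v):
    """Sample covariance with divisor n - 1; sample_cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


means = [mean(x1), mean(x2)]
S = [[sample_cov(x1, x1), sample_cov(x1, x2)],
     [sample_cov(x2, x1), sample_cov(x2, x2)]]
r12 = S[0][1] / sqrt(S[0][0] * S[1][1])

print("means:", means)
print("S:", [[round(value, 2) for value in row] for row in S])
print("r12:", round(r12, 2))
```

Because the covariance function keeps the sign of each cross product, it also shows directly that the two variables are negatively correlated.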
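b. Using Fisher's method for the discriminant function, given that
\[ \bar{x}_1 = \begin{bmatrix} -0.0065 \\ -0.0390 \end{bmatrix}, \quad \bar{x}_2 = \begin{bmatrix} -0.2483 \\ -0.0262 \end{bmatrix}, \quad S_P^{-1} = \begin{bmatrix} 131.158 & -90.423 \\ -90.423 & 108.147 \end{bmatrix}, \quad x_0 = \begin{bmatrix} -0.210 \\ -0.044 \end{bmatrix} \]
Since $y_0 = (\bar{x}_1 - \bar{x}_2)' S_P^{-1} x_0$, with
\[ \bar{x}_1 - \bar{x}_2 = \begin{bmatrix} -0.0065 + 0.2483 \\ -0.0390 + 0.0262 \end{bmatrix} = \begin{bmatrix} 0.2418 \\ -0.0128 \end{bmatrix}, \]
\[ y_0 = \begin{bmatrix} 0.2418 & -0.0128 \end{bmatrix} \begin{bmatrix} 131.158 & -90.423 \\ -90.423 & 108.147 \end{bmatrix} \begin{bmatrix} -0.210 \\ -0.044 \end{bmatrix} \]
therefore $y_0 = -5.8801$.
Also, since
\[ \bar{x}_1 + \bar{x}_2 = \begin{bmatrix} -0.0065 - 0.2483 \\ -0.0390 - 0.0262 \end{bmatrix} = \begin{bmatrix} -0.2548 \\ -0.0652 \end{bmatrix}, \]
\[ (\bar{x}_1 - \bar{x}_2)' S_P^{-1} (\bar{x}_1 + \bar{x}_2) = \begin{bmatrix} 0.2418 & -0.0128 \end{bmatrix} \begin{bmatrix} 131.158 & -90.423 \\ -90.423 & 108.147 \end{bmatrix} \begin{bmatrix} -0.2548 \\ -0.0652 \end{bmatrix} = -6.8598, \]
the midpoint between the two populations on the discriminant scale is
\[ m = \tfrac{1}{2} (\bar{x}_1 - \bar{x}_2)' S_P^{-1} (\bar{x}_1 + \bar{x}_2) = \frac{-6.8598}{2} = -3.4299 \]
Since $-5.8801 < -3.4299$, allocate (assign) $x_0$ to population 2.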
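The allocation rule above can be sketched in a few lines of plain Python. The signs of the mean vectors, of the off-diagonal entries of the inverse pooled covariance matrix and of x0 are assumptions here, chosen because they reproduce the scores -5.8801 and -6.8598 quoted in the working; `fisher_allocate` is a hypothetical helper name.

```python
# Inputs with assumed signs (see the note above).
xbar1 = [-0.0065, -0.0390]
xbar2 = [-0.2483, -0.0262]
sp_inv = [[131.158, -90.423],
          [-90.423, 108.147]]
x0 = [-0.210, -0.044]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fisher_allocate(mean1, mean2, pooled_inv, x_new):
    """Fisher's rule with equal costs and priors: population 1 if y0 >= m, else population 2."""
    diff = [a - b for a, b in zip(mean1, mean2)]
    # Coefficient vector (mean1 - mean2)' * pooled_inv
    coef = [sum(diff[i] * pooled_inv[i][j] for i in range(len(diff)))
            for j in range(len(diff))]
    y_new = dot(coef, x_new)
    midpoint = 0.5 * dot(coef, [a + b for a, b in zip(mean1, mean2)])
    return y_new, midpoint, (1 if y_new >= midpoint else 2)


y0, m, population = fisher_allocate(xbar1, xbar2, sp_inv, x0)
print(round(y0, 4), round(m, 4), population)
```

Keeping the coefficient vector separate makes it easy to score further observations against the same midpoint.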
Q3)
Given that
\[ \bar{x}_1 = \begin{bmatrix} 6.833 \\ 7.033 \\ 3.967 \\ 4.700 \end{bmatrix}, \quad \bar{x}_2 = \begin{bmatrix} 6.633 \\ 7.000 \\ 4.000 \\ 4.533 \end{bmatrix}, \quad S_P = \begin{bmatrix} 0.606 & 0.262 & 0.066 & 0.161 \\ 0.262 & 0.637 & 0.173 & 0.143 \\ 0.066 & 0.173 & 0.810 & 0.029 \\ 0.161 & 0.143 & 0.029 & 0.306 \end{bmatrix} \]
b. $H_{01}: C\mu_1 = C\mu_2$ (parallel profiles), where
\[ C = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \]
Reject $H_{01}$ if
\[ T^2 = (\bar{x}_1 - \bar{x}_2)' C' \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) C S_P C' \right]^{-1} C (\bar{x}_1 - \bar{x}_2) > c^2 \]
\[ \bar{x}_1 - \bar{x}_2 = \begin{bmatrix} 6.833 - 6.633 \\ 7.033 - 7.000 \\ 3.967 - 4.000 \\ 4.700 - 4.533 \end{bmatrix} = \begin{bmatrix} 0.200 \\ 0.033 \\ -0.033 \\ 0.167 \end{bmatrix} \]
\[ C(\bar{x}_1 - \bar{x}_2) = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0.200 \\ 0.033 \\ -0.033 \\ 0.167 \end{bmatrix} = \begin{bmatrix} -0.167 \\ -0.066 \\ 0.200 \end{bmatrix} \]
\[ C S_P C' = \begin{bmatrix} 0.719 & -0.268 & -0.125 \\ -0.268 & 1.101 & -0.751 \\ -0.125 & -0.751 & 1.058 \end{bmatrix} \]
But
\[ \frac{1}{n_1} + \frac{1}{n_2} = \frac{1}{30} + \frac{1}{30} = 0.067 \]
\[ \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) C S_P C' \right]^{-1} = \begin{bmatrix} 31.224 & 19.613 & 17.611 \\ 19.613 & 38.719 & 29.801 \\ 17.611 & 29.801 & 37.405 \end{bmatrix} \]
\[ T^2 = \begin{bmatrix} -0.167 & -0.066 & 0.200 \end{bmatrix} \begin{bmatrix} 31.224 & 19.613 & 17.611 \\ 19.613 & 38.719 & 29.801 \\ 17.611 & 29.801 & 37.405 \end{bmatrix} \begin{bmatrix} -0.167 \\ -0.066 \\ 0.200 \end{bmatrix} = 1.005 \]
\[ c^2 = \frac{(n_1 + n_2 - 2)(p - 1)}{n_1 + n_2 - p} F_{p-1,\, n_1 + n_2 - p}(\alpha) = \frac{(30 + 30 - 2)(4 - 1)}{30 + 30 - 4} F_{3,56}(0.05) \approx 3.11 \times 2.77 \approx 8.6 \]
\[ T^2 < c^2: \quad 1.005 < 8.6 \]
We fail to reject $H_0$ and conclude that the two profiles are parallel. Once parallelism is accepted, coincidence of the profiles can be examined.
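A numerical check of the parallel-profile test is sketched below in plain Python. It assumes the successive-difference contrast matrix C, the pooled covariance matrix as given in the question, and a tabulated value F(3, 56; 0.05) of about 2.77 (an assumption taken from standard F tables, not from this paper). The small Gaussian-elimination solver stands in for a matrix inverse.

```python
n1, n2, p = 30, 30, 4
xbar_m = [6.833, 7.033, 3.967, 4.700]
xbar_f = [6.633, 7.000, 4.000, 4.533]
sp = [[0.606, 0.262, 0.066, 0.161],
      [0.262, 0.637, 0.173, 0.143],
      [0.066, 0.173, 0.810, 0.029],
      [0.161, 0.143, 0.029, 0.306]]
# Successive-difference contrasts (assumed form of C).
contrast = [[-1, 1, 0, 0],
            [0, -1, 1, 0],
            [0, 0, -1, 1]]


def mat_vec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


diff = [a - b for a, b in zip(xbar_m, xbar_f)]
c_diff = mat_vec(contrast, diff)                       # C (xbar1 - xbar2)
c_sp_ct = mat_mul(mat_mul(contrast, sp), transpose(contrast))  # C Sp C'
quad = sum(u * v for u, v in zip(c_diff, solve(c_sp_ct, c_diff)))
t2 = quad / (1 / n1 + 1 / n2)

f_crit = 2.77  # assumed table value for F(3, 56) at the 5% level
c2 = (n1 + n2 - 2) * (p - 1) / (n1 + n2 - p) * f_crit

print(round(t2, 3), round(c2, 2), "reject" if t2 > c2 else "fail to reject")
```

Solving the linear system instead of inverting the matrix avoids carrying the rounded inverse through the calculation.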
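a. Since
\[ S_P = \begin{bmatrix} 0.2738 \\ 0.3038 \\ 0.2695 \\ 0.6390 \end{bmatrix}, \quad \bar{x}_1 = \begin{bmatrix} 6.833 \\ 7.033 \\ 3.967 \\ 4.700 \end{bmatrix}, \quad \bar{x}_2 = \begin{bmatrix} 6.633 \\ 7.000 \\ 4.000 \\ 4.533 \end{bmatrix} \]
[Figure: sample profiles of the male and female mean vectors plotted across the four variables; vertical axis from 0 to 8.]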
MULTI-VARIATE DATA ANALYSIS (STA 311)
END OF FIRST SEMESTER EXAMINATION 2011 / 2012 (TIME: 3 HRS)
MARKING SCHEME
SECTION B (50MARKS)
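Q1) a. Given that $m' = [0 \;\; 0 \;\; 0 \;\; \dots \;\; 1]$, to show that $z_p \sim N(\mu_p, \sigma_{pp})$.
Since $m'Z \sim N(m'\mu, \; m'\Sigma m)$, let
\[ \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \dots & \sigma_{pp} \end{bmatrix} \]
Here $m'Z = z_p$, and
\[ m'\mu = \begin{bmatrix} 0 & 0 & 0 & \dots & 1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} = 0 + 0 + 0 + \dots + \mu_p = \mu_p \]
\[ m'\Sigma m = \begin{bmatrix} 0 & 0 & \dots & 1 \end{bmatrix} \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \dots \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \dots \\ \vdots & \vdots & \vdots & \\ \sigma_{p1} & \sigma_{p2} & \dots & \sigma_{pp} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} \sigma_{p1} & \sigma_{p2} & \dots & \sigma_{pp} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \sigma_{pp} \]
Hence $z_p \sim N(\mu_p, \sigma_{pp})$.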
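b. Given that
\[ X = \begin{bmatrix} 1 & 3 & 0 \\ 3 & -1 & -2 \\ 0 & -2 & 2 \end{bmatrix} \]
i. $(X_1, X_2)$: the covariance of $X_1$ and $X_2$ is 3.
Since their covariance is not zero, they are not independent.
ii. $[(x_2 + x_3),\; x_3]$
$X_2 + X_3 = 3 + 0 = 3$
$X_2 + X_3 = -1 - 2 = -3$
$X_2 + X_3 = -2 + 2 = 0$
\[ [(x_2 + x_3),\; x_3] = \begin{bmatrix} 3 & 0 \\ -3 & -2 \\ 0 & 2 \end{bmatrix} \]
Since $\text{cov}[(X_2 + X_3), X_3] = 0$, they are independent.
iii. $\left[\tfrac{1}{2}(x_1 + x_2 + x_3),\; \tfrac{1}{2}(x_1 - x_3)\right]$
Row by row, the scheme gives $\tfrac{1}{2}(x_1 + x_2 + x_3) = \tfrac{1}{2}(1 + 3 + 0) = 2$ for the first row and 2 for each of the other two rows, and
$\tfrac{1}{2}(x_1 - x_3) = \tfrac{1}{2}(1 - 0) = \tfrac{1}{2}$
$\tfrac{1}{2}(x_1 - x_3) = \tfrac{1}{2}(3 + 2) = \tfrac{5}{2}$
$\tfrac{1}{2}(x_1 - x_3) = \tfrac{1}{2}(0 - 2) = -1$
\[ \begin{bmatrix} 2 & \tfrac{1}{2} \\ 2 & \tfrac{5}{2} \\ 2 & -1 \end{bmatrix} \]
\[ \text{Cov}\left[\tfrac{1}{2}(x_1 + x_2 + x_3),\; \tfrac{1}{2}(x_1 - x_3)\right] = \tfrac{1}{2} \]
Therefore they are not independent, since their covariance is not equal to zero.
iv. Given that
\[ X = \begin{bmatrix} 1 & 3 & 0 \\ 3 & -1 & -2 \\ 0 & -2 & 2 \end{bmatrix}, \]
to find $\left[\tfrac{1}{2}(X_1 + X_2),\; X_3\right]$:
$\tfrac{1}{2}(x_1 + x_2) = \tfrac{1}{2}(1 + 3) = 2$
$\tfrac{1}{2}(x_1 + x_2) = \tfrac{1}{2}(3 - 1) = 1$
$\tfrac{1}{2}(x_1 + x_2) = \tfrac{1}{2}(0 - 2) = -1$
Therefore
\[ \begin{bmatrix} 2 & 0 \\ 1 & -2 \\ -1 & 2 \end{bmatrix}, \qquad \text{cov}\left[\tfrac{1}{2}(x_1 + x_2),\; x_3\right] = 0 \]
They are independent since the covariance is zero.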
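Q2. a. Hypotheses
$H_0: \tau_1 = \tau_2 = \dots = \tau_g = 0$
$H_1$: at least one $\tau_l \neq 0$
\[ \bar{X}_1 = \begin{bmatrix} \tfrac{14 + 6 + 8 + 16}{4} \\ \tfrac{8 + 2 + 2 - 4}{4} \end{bmatrix} = \begin{bmatrix} 11 \\ 2 \end{bmatrix}, \quad \bar{X}_2 = \begin{bmatrix} \tfrac{1 + 5 + 0 + 2}{4} \\ \tfrac{6 + 12 + 15 + 7}{4} \end{bmatrix} = \begin{bmatrix} 2 \\ 10 \end{bmatrix}, \quad \bar{X}_3 = \begin{bmatrix} \tfrac{3 - 2 - 11 - 6}{4} \\ \tfrac{-2 + 7 + 1 + 6}{4} \end{bmatrix} = \begin{bmatrix} -4 \\ 3 \end{bmatrix} \]
For the overall mean, since
\[ \bar{X} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{\sum n_i} = \begin{bmatrix} \tfrac{4(11) + 4(2) + 4(-4)}{12} \\ \tfrac{4(2) + 4(10) + 4(3)}{12} \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix} \]
Since $X_{ij} = \bar{X} + (\bar{X}_i - \bar{X}) + (X_{ij} - \bar{X}_i)$, i.e. OBS = Mean + Trt + Res:
Variable 1 (upper row):
\[ \begin{bmatrix} 14 & 6 & 8 & 16 \\ 1 & 5 & 0 & 2 \\ 3 & -2 & -11 & -6 \end{bmatrix} = \begin{bmatrix} 3 & 3 & 3 & 3 \\ 3 & 3 & 3 & 3 \\ 3 & 3 & 3 & 3 \end{bmatrix} + \begin{bmatrix} 8 & 8 & 8 & 8 \\ -1 & -1 & -1 & -1 \\ -7 & -7 & -7 & -7 \end{bmatrix} + \begin{bmatrix} 3 & -5 & -3 & 5 \\ -1 & 3 & -2 & 0 \\ 7 & 2 & -7 & -2 \end{bmatrix} \]
Variable 2 (lower row):
\[ \begin{bmatrix} 8 & 2 & 2 & -4 \\ 6 & 12 & 15 & 7 \\ -2 & 7 & 1 & 6 \end{bmatrix} = \begin{bmatrix} 5 & 5 & 5 & 5 \\ 5 & 5 & 5 & 5 \\ 5 & 5 & 5 & 5 \end{bmatrix} + \begin{bmatrix} -3 & -3 & -3 & -3 \\ 5 & 5 & 5 & 5 \\ -2 & -2 & -2 & -2 \end{bmatrix} + \begin{bmatrix} 6 & 0 & 0 & -6 \\ -4 & 2 & 5 & -3 \\ -5 & 4 & -2 & 3 \end{bmatrix} \]
$SS_{OBS} = SS_{MEAN} + SS_{TRT} + SS_{ERROR}$
Variable 1 (upper row):
752 = 108 + 456 + 188
Total sum of squares corrected = $SS_{OBS} - SS_{MEAN}$
TSS = 752 - 108 = 644
Variable 2 (lower row):
632 = 300 + 152 + 180
Total sum of squares corrected = $SS_{OBS} - SS_{MEAN}$
TSS = 632 - 300 = 332
Cross products:
Mean = (3)(5) + (3)(5) + ... + (3)(5) = 12(3)(5) = 180
Treatment = 4(8)(-3) + 4(-1)(5) + 4(-7)(-2) = -60
Residual = (3)(6) + (-5)(0) + (-3)(0) + (5)(-6) + ... = -31
Total cross product = (14)(8) + (6)(2) + (8)(2) + (16)(-4) + ... = 89
Total corrected cross product = Total cross product - Mean cross product
= 89 - 180 = -91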
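The decomposition above can be checked by computing the treatment and residual sums of squares and cross products directly from the observations. The sketch below assumes the data with the signs that make the arrays add up (16 paired with -4 at the 100 level, and 3, -2, -11, -6 paired with -2, 7, 1, 6 at the 300 level); those signs are an inference from the printed sums of squares, not something legible in the question itself.

```python
# Observations (x1, x2) per level; negative signs are assumed (see note above).
groups = {
    "100 Level": [(14, 8), (6, 2), (8, 2), (16, -4)],
    "200 Level": [(1, 6), (5, 12), (0, 15), (2, 7)],
    "300 Level": [(3, -2), (-2, 7), (-11, 1), (-6, 6)],
}


def mean_vector(rows):
    count = len(rows)
    return [sum(row[k] for row in rows) / count for k in range(2)]


all_rows = [row for rows in groups.values() for row in rows]
grand_mean = mean_vector(all_rows)

B = [[0.0, 0.0], [0.0, 0.0]]  # treatment (between) SSCP matrix
W = [[0.0, 0.0], [0.0, 0.0]]  # residual (within) SSCP matrix
for rows in groups.values():
    group_mean = mean_vector(rows)
    effect = [group_mean[k] - grand_mean[k] for k in range(2)]
    for i in range(2):
        for j in range(2):
            B[i][j] += len(rows) * effect[i] * effect[j]
            W[i][j] += sum((row[i] - group_mean[i]) * (row[j] - group_mean[j])
                           for row in rows)

# Corrected total SSCP matrix
T = [[B[i][j] + W[i][j] for j in range(2)] for i in range(2)]

print("grand mean:", grand_mean)
print("B =", B)
print("W =", W)
print("B + W =", T)
```

Accumulating the full 2 x 2 matrices in one pass gives the sums of squares on the diagonal and the cross products off the diagonal, which is the layout a MANOVA table needs.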
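b.
MANOVA TABLE
Source of variation | Matrix of sums of squares and cross products | D.F.
Treatment | $\begin{bmatrix} 456 & -60 \\ -60 & 152 \end{bmatrix}$ | 2
Residual | $\begin{bmatrix} 188 & -31 \\ -31 & 180 \end{bmatrix}$ | 9
Total (corrected) | $\begin{bmatrix} 644 & -91 \\ -91 & 332 \end{bmatrix}$ | 11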
c. For Wilks' lambda, since
\[ L = -\left( n - 1 - \frac{p + g}{2} \right) \ln \left( \frac{|W|}{|B + W|} \right), \]
but
\[ \frac{|W|}{|B + W|} = \frac{\begin{vmatrix} 188 & -31 \\ -31 & 180 \end{vmatrix}}{\begin{vmatrix} 644 & -91 \\ -91 & 332 \end{vmatrix}} = \frac{33840 - 961}{213808 - 8281} = \frac{32879}{205527} = 0.16, \]
with n = 12 observations, p = 2 variables and g = 3 levels,
\[ L = -\left( 12 - 1 - \frac{2 + 3}{2} \right) \ln(0.16) = 15.58 \]
Hence L = 15.58, to be compared with $\chi^2_{p(g-1)}(\alpha)$.
For the degrees of freedom, $p(g - 1) = 2(3 - 1) = 4$, and
\[ \chi^2_4(0.05) = 9.488 \]
DECISION
Since 15.58 > 9.488, we reject $H_0$ at $\alpha = 0.05$.
CONCLUSION
We conclude that the three levels are not of the same effect.
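Wilks' lambda and Bartlett's large-sample statistic can be computed from the residual and corrected total matrices in the MANOVA table. This is a minimal sketch, assuming negative cross products (-31 and -91), p = 2 response variables, g = 3 levels, n = 12 observations, and a tabulated chi-square value of 9.488 for 4 degrees of freedom at the 5% level; `det2` is an illustrative helper.

```python
from math import log

n, p, g = 12, 2, 3              # assumed: 12 observations, 2 variables, 3 levels
W = [[188, -31], [-31, 180]]    # residual SSCP matrix
T = [[644, -91], [-91, 332]]    # corrected total SSCP matrix (B + W)


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


wilks_lambda = det2(W) / det2(T)
bartlett_stat = -(n - 1 - (p + g) / 2) * log(wilks_lambda)
df = p * (g - 1)
chi2_crit = 9.488               # assumed table value, chi-square(4) at 5%

print(round(wilks_lambda, 4), round(bartlett_stat, 2), df)
print("reject H0" if bartlett_stat > chi2_crit else "fail to reject H0")
```

The determinants depend only on the square of the cross product, so lambda is unaffected by its sign; the choice of p is what drives the multiplier and the degrees of freedom.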
Q3)
\[ \bar{X} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \end{bmatrix} = \begin{bmatrix} 4.640 \\ 45.400 \\ 9.965 \end{bmatrix}, \quad S = \begin{bmatrix} 2.879 & 10.002 & -1.810 \\ 10.002 & 199.789 & -5.627 \\ -1.810 & -5.627 & 3.628 \end{bmatrix}, \quad S^{-1} = \begin{bmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{bmatrix} \]
Testing the hypothesis $H_0: \mu^T = [4, 50, 10]$ against $H_1: \mu^T \neq [4, 50, 10]$, n = 20.
Using $T^2 = n(\bar{x} - \mu_0)^T S^{-1} (\bar{x} - \mu_0)$:
\[ T^2 = 20 \begin{bmatrix} 4.640 - 4, & 45.400 - 50, & 9.965 - 10 \end{bmatrix} \begin{bmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{bmatrix} \begin{bmatrix} 4.640 - 4 \\ 45.400 - 50 \\ 9.965 - 10 \end{bmatrix} \]
\[ = 20 \begin{bmatrix} 0.640, & -4.600, & -0.035 \end{bmatrix} \begin{bmatrix} 0.467 \\ -0.042 \\ 0.160 \end{bmatrix} \]
$T^2 = 9.74$ (10 marks)
By comparing with the critical value,
\[ \frac{(n - 1)p}{n - p} F_{p,\, n-p}(0.10) = \frac{19(3)}{17} F_{3,17}(0.10) = 3.353(2.44) = 8.18, \]
i.e.
\[ T^2 = 9.74 > \frac{(n - 1)p}{n - p} F_{p,\, n-p}(0.10) = 8.18 \]
(5 marks)
Therefore we reject $H_0$ at the 10% level of significance.
(2 marks)
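The one-sample Hotelling T-squared calculation can be reproduced as below. This is a sketch only: it assumes the rounded inverse covariance matrix with the signs shown in the working, and takes F(3, 17; 0.10) = 2.44 from the working as a table value. With the three-decimal inverse the statistic comes out near 9.7, slightly below the quoted 9.74, which reflects rounding in the inverse rather than a different method.

```python
n, p = 20, 3
xbar = [4.640, 45.400, 9.965]
mu0 = [4, 50, 10]
# Rounded inverse of the sample covariance matrix (signs assumed).
s_inv = [[0.586, -0.022, 0.258],
         [-0.022, 0.006, -0.002],
         [0.258, -0.002, 0.402]]

d = [a - b for a, b in zip(xbar, mu0)]  # xbar - mu0
# S^{-1} (xbar - mu0)
s_inv_d = [sum(s_inv[i][j] * d[j] for j in range(p)) for i in range(p)]
t2 = n * sum(d[i] * s_inv_d[i] for i in range(p))

f_crit = 2.44                             # table value quoted in the working
critical = (n - 1) * p / (n - p) * f_crit

print([round(value, 3) for value in s_inv_d])
print(round(t2, 2), round(critical, 2))
print("reject H0" if t2 > critical else "fail to reject H0")
```

The intermediate vector printed first corresponds to the column (0.467, -0.042, 0.160) in the working, so each stage can be compared line by line.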
