
Developing the Decision Model through Discriminant Analysis


As part of Weekly Assignments

To be Submitted to
PROF. SREEDHARA R.

Presented by:
Anindya Biswas
1527605, M1

QUESTION:
ABC School of Business selects students for its MBA program every year through a written test, a group discussion and an interview. It then tracks their performance by means of their Grade Point Average (GPA) during the two-year program. The school has past data for 30 students admitted in an earlier year, who were either successful (a GPA greater than 3.00 on a 4-point scale is defined as successful) or unsuccessful (a GPA less than 3.00).

Develop a model using discriminant analysis for ABC School of Business. The model should be able to predict whether a student will be successful or unsuccessful based on his Written Test score, Group Discussion score and Interview score.
1. Explain the decision rule to be used for classifying a student as
i. Potentially Successful
ii. Unsuccessful
2. What is the classification accuracy level of the model?
3. Which of the three scores is the best predictor of a student's future success?
Dependent Variable:
Successful 1
Unsuccessful 2
Independent Variable:
X1 Written Test Score
X2 Group Discussion Score
X3 Interview Score

Data Sheet containing Data on Past 30 Students admitted in an earlier year:


No.  Written Test  Group Discussion  Interview   Successful (1) /
     Score (X1)    Score (X2)        Score (X3)  Unsuccessful (2)
1    200           10                15          2
2    270           15                20          1
3    300           30                30          1
4    250           35                30          1
5    260           45                35          1
6    220           40                45          2
7    210           30                40          1
8    200           10                25          2
9    240           15                30          1
10   230           25                25          2
11   200           30                35          2
12   300           35                40          1
13   280           20                20          2
14   290           30                25          2
15   275           40                40          2
16   263           25                30          1
17   285           30                25          1
18   291           35                35          1
19   300           25                30          1
20   205           15                20          2
21   220           25                40          2
22   230           40                25          2
23   240           40                15          2
24   270           45                25          1
25   290           35                30          1
26   280           30                35          2
27   250           30                28          2
28   255           15                23          2
29   260           20                40          1
30   260           25                15          2

Results & Analysis:

Classification Results (a)

                                           Predicted Group Membership [A]
                 Successful/Unsuccessful   Successful   Unsuccessful   Total
Original [B]
  Count [C]      Successful                10           4              14
                 Unsuccessful              4            12             16
  % [D]          Successful                71.4         28.6           100.0
                 Unsuccessful              25.0         75.0           100.0

a. 73.3% of original grouped cases correctly classified.

The Classification Table:


This table describes the classification accuracy of the discriminant model produced by SPSS. Its components are as follows:
A. Predicted Group Membership: These are the group memberships predicted by the analysis. Reading across the row for each original group (Successful and Unsuccessful) shows how many cases were classified correctly and how many incorrectly. For example, of the 14 cases entered as Successful, 10 were correctly classified and 4 were misclassified as Unsuccessful.
B. Original: These are the group frequencies found in the data. Each row shows how the cases in that original group are distributed across the predicted groups.
C. Count: This part of the table gives the number of observations falling into each combination of original and predicted group membership. Row totals are shown, but column totals are not. For example, the number of observations that were originally Successful but were predicted to be Unsuccessful is 4.
D. %: This part of the table gives the percentage of observations in each original group (rows) that were predicted to fall into each group (columns). For example, 28.6% (4 of 14) of the originally Successful cases were predicted to be Unsuccessful.
Overall, the footnote to the table states that 73.3% of the original grouped cases (22 of 30) were correctly classified. On this measure the model classifies most students correctly, but it is not accurate enough to give reliable results every time. More data may be needed before it can determine with confidence whether a future candidate admitted to ABC School of Business will be potentially successful or unsuccessful.
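The percentages in the classification table and the 73.3% in its footnote follow directly from the four counts. The Python sketch below recomputes them from the counts copied out of the table. The majority-class baseline, which assigns every student to the larger Unsuccessful group, is an extra comparison point added here and is not part of the SPSS output. The function names are illustrative.

```python
# Counts from the classification table: rows = original group,
# inner keys = predicted group.
counts = {
    "Successful": {"Successful": 10, "Unsuccessful": 4},
    "Unsuccessful": {"Successful": 4, "Unsuccessful": 12},
}

def row_percentages(table):
    """Percentage of each original group assigned to each predicted group."""
    result = {}
    for original, row in table.items():
        total = sum(row.values())
        result[original] = {pred: 100 * n / total for pred, n in row.items()}
    return result

def overall_accuracy(table):
    """Percentage of all cases whose predicted group equals their original group."""
    correct = sum(table[g][g] for g in table)
    total = sum(sum(row.values()) for row in table.values())
    return 100 * correct / total

def majority_baseline(table):
    """Accuracy obtained by assigning every case to the largest original group."""
    sizes = [sum(row.values()) for row in table.values()]
    return 100 * max(sizes) / sum(sizes)

pct = row_percentages(counts)
for original, row in pct.items():
    print(original, {k: round(v, 1) for k, v in row.items()})
print(f"Overall accuracy: {overall_accuracy(counts):.1f}%")
print(f"Majority-class baseline: {majority_baseline(counts):.1f}%")
```

The baseline gives a rough sense of how much the three scores add beyond simply predicting the more common outcome.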

Wilks' Lambda

Test of Function(s)   Wilks' Lambda [A]   Chi-square   df   Sig. [B]
1                     .705                9.255        3    .026

The Wilks' Lambda Table:

This table tells us how significant the discriminant function built from the predictors is. Its components are as follows:
A. Wilks' Lambda: This is one of the multivariate test statistics reported by SPSS. For a single discriminant function, it can be calculated from the canonical correlation in the Eigenvalues table using the formula Wilks' lambda = 1 - (canonical correlation)². In our case the canonical correlation is .543, so Wilks' lambda is 1 - .543² = .705. The closer Wilks' lambda is to 0, the more the function contributes to discriminating between the groups. Since our value is .705, the function has relatively low discriminating power.
B. Sig.: This is the p-value associated with the chi-square statistic of a given test. It is used to evaluate the null hypothesis that the function's canonical correlation, and all smaller canonical correlations, are equal to zero. With an alpha level of .05, our p-value of .026 is below alpha. We therefore reject the null hypothesis and conclude that the function separates the two groups significantly better than chance.
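The sketch below reproduces the Wilks' lambda row from the canonical correlation, using Python's standard library only. It assumes that SPSS computes the chi-square with Bartlett's approximation, chi-square = -[n - 1 - (p + g)/2] ln(lambda) with df = p(g - 1), for n = 30 students, p = 3 predictors and g = 2 groups. This assumption is consistent with the reported 9.255 and .026. The helper names are illustrative. The p-value uses a hand-written power series for the incomplete gamma function instead of a statistics library.

```python
import math

def wilks_lambda_from_canonical(r):
    """Wilks' lambda for a single discriminant function: 1 - r**2."""
    return 1.0 - r ** 2

def bartlett_chi_square(lam, n, p, g):
    """Bartlett's approximation: -(n - 1 - (p + g) / 2) * ln(lambda).
    Returns the statistic and its degrees of freedom p * (g - 1)."""
    stat = -(n - 1 - (p + g) / 2.0) * math.log(lam)
    return stat, p * (g - 1)

def chi2_sf(x, df):
    """Upper-tail chi-square probability 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function evaluated by its power series."""
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower

lam = wilks_lambda_from_canonical(0.543)
chi_sq, df = bartlett_chi_square(lam, n=30, p=3, g=2)
p_value = chi2_sf(chi_sq, df)
print(f"Wilks' lambda = {lam:.3f}, chi-square = {chi_sq:.3f}, df = {df}, Sig. = {p_value:.3f}")
```

The chi-square differs from the SPSS figure only in the third decimal place. This is presumably because the sketch starts from the rounded canonical correlation of .543.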
Standardized Canonical Discriminant Function Coefficients

                          Function 1
Written Test Score        .934
Group Discussion Score    -.174
Interview Score           .567

The standardized canonical discriminant coefficients can be used to rank the importance of each variable. The larger the absolute value of a variable's standardized coefficient, the greater its unique contribution to separating the groups after controlling for the other predictors.
In our case, the Written Test score (.934) is the most important predictor of whether a candidate will be successful. It is followed by the Interview score (.567), as the interview reflects the candidate's communication skills and ability to handle pressure.
The least important predictor is the Group Discussion score (-.174), possibly because the student may not have to take part in discussion-type scenarios in the future.
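Ranking by the size of the standardized coefficients means ranking by absolute value, because the sign only shows direction. The negative Group Discussion coefficient means that, with the other scores held constant, a higher group discussion score moves a student slightly towards the Unsuccessful side of the function. The minimal Python sketch below applies this ranking to the coefficients in the table.

```python
# Standardized canonical discriminant function coefficients from the table.
standardized = {
    "Written Test Score": 0.934,
    "Group Discussion Score": -0.174,
    "Interview Score": 0.567,
}

# Rank by absolute value: magnitude gives relative importance,
# sign only gives the direction of the effect on the discriminant score.
ranking = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(rank, name, standardized[name])
```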

Canonical Discriminant Function Coefficients

                          Function 1
Written Test Score        .032
Group Discussion Score    -.017
Interview Score           .070
(Constant)                -9.729

Unstandardized coefficients


The unstandardized canonical coefficients are the estimated parameters of the discriminant function. Canonical discriminant analysis chooses the coefficients that maximize the difference in mean discriminant score between the groups, relative to the variation within the groups. The general discriminant function is:

$$Y = a + K_1 X_1 + K_2 X_2 + K_3 X_3 + \cdots + K_n X_n$$

In our case, with three predictors, the discriminant function is:

$$Y = a + K_1 X_1 + K_2 X_2 + K_3 X_3$$

Substituting the values from the table above, we get:

$$Y = -9.729 + 0.032 X_1 - 0.017 X_2 + 0.070 X_3$$
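As a worked check of the equation, substitute the first student in the data sheet (X1 = 200, X2 = 10, X3 = 15):

$$Y_1 = -9.729 + 0.032(200) - 0.017(10) + 0.070(15) = -9.729 + 6.400 - 0.170 + 1.050 = -2.449$$

SPSS reports a score of -2.41213 for this student. The small gap presumably arises because SPSS uses unrounded coefficients. Either way, the score is well below zero, on the Unsuccessful side of the function.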

Functions at Group Centroids

Successful/Unsuccessful   Function 1
Successful                .668
Unsuccessful              -.584

Unstandardized canonical discriminant functions evaluated at group means

Functions at Group centroids


These are the means of the discriminant function scores by group for each function calculated.
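[Figure: number line of discriminant scores, negative (-ve) to the left and positive (+ve) to the right, marking the Unsuccessful centroid at -0.584, the Successful centroid at +0.668 and the centroid mean at 0.0415.]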

On the right side of the line are the positive values and on the left side the negative values. The right side shows +0.668, the mean discriminant score of the successful students. The left side shows -0.584, the mean discriminant score of the unsuccessful students. The discriminant scores have an overall mean of zero. We can check this by summing the group means multiplied by the number of cases in each group:

$$(14 \times 0.668) + (16 \times (-0.584)) = 9.352 - 9.344 = 0.008 \approx 0$$
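The centroid check and the cut-off can be reproduced in a few lines of Python. The sketch assumes that the cut-off (centroid mean) is the midpoint between the two group centroids, which is the rule that corresponds to equal prior probabilities for the two groups. The variable names are illustrative.

```python
# Group centroids and group sizes from the SPSS output.
centroids = {"Successful": 0.668, "Unsuccessful": -0.584}
sizes = {"Successful": 14, "Unsuccessful": 16}

# Weighted sum of the centroids: close to zero because the discriminant
# scores have an overall mean of zero (up to rounding of the centroids).
weighted_sum = sum(sizes[g] * centroids[g] for g in centroids)

# Cut-off midway between the two centroids (equal prior probabilities assumed).
cutoff = (centroids["Successful"] + centroids["Unsuccessful"]) / 2

print(f"Weighted sum of centroids: {weighted_sum:.3f}")
print(f"Midpoint cut-off: {cutoff:.3f}")
```

With the rounded centroids the midpoint comes out at 0.042. This agrees with the 0.0415 centroid mean to within rounding.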
We shall now calculate the discriminant score of each student by substituting the values of X1, X2 and X3 into the equation above, and use the score to predict the student's group. The cut-off is the centroid mean of 0.0415, the midpoint between the two group centroids. A student whose score is below it is classified as unsuccessful, and one whose score is above it as successful. We can then compare the predictions with the original data.
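The scoring and classification step can be sketched in Python as follows. The sketch uses the unstandardized coefficients rounded to three decimals. As a result, its scores come out a few hundredths lower than the SPSS scores listed in the next table. Even so, with the cut-off of 0.0415 every student lands in the same predicted group. The function and variable names are illustrative and are not part of the SPSS output.

```python
# Data sheet rows: (written test X1, group discussion X2, interview X3).
students = [
    (200, 10, 15), (270, 15, 20), (300, 30, 30), (250, 35, 30), (260, 45, 35),
    (220, 40, 45), (210, 30, 40), (200, 10, 25), (240, 15, 30), (230, 25, 25),
    (200, 30, 35), (300, 35, 40), (280, 20, 20), (290, 30, 25), (275, 40, 40),
    (263, 25, 30), (285, 30, 25), (291, 35, 35), (300, 25, 30), (205, 15, 20),
    (220, 25, 40), (230, 40, 25), (240, 40, 15), (270, 45, 25), (290, 35, 30),
    (280, 30, 35), (250, 30, 28), (255, 15, 23), (260, 20, 40), (260, 25, 15),
]

# Unstandardized coefficients from the SPSS output, rounded to three decimals.
CONSTANT = -9.729
K1, K2, K3 = 0.032, -0.017, 0.070
CUTOFF = 0.0415  # centroid mean: midpoint of the two group centroids

def discriminant_score(x1, x2, x3):
    """Y = a + K1*X1 + K2*X2 + K3*X3."""
    return CONSTANT + K1 * x1 + K2 * x2 + K3 * x3

def predict_group(score, cutoff=CUTOFF):
    """1 = Successful (score above the cut-off), 2 = Unsuccessful."""
    return 1 if score > cutoff else 2

scores = [discriminant_score(*row) for row in students]
predicted = [predict_group(s) for s in scores]

for number, (row, score, group) in enumerate(zip(students, scores, predicted), start=1):
    print(f"{number:2d} {row} score={score:+.3f} predicted={group}")
```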

No.  Written Test  Group Discussion  Interview   Successful (1) /   Discriminant  Predicted
     Score (X1)    Score (X2)        Score (X3)  Unsuccessful (2)   Score         Group
1    200           10                15          2                  -2.41213      2
2    270           15                20          1                   0.10499      1
3    300           30                30          1                   1.51215      1
4    250           35                30          1                  -0.18397      2
5    260           45                35          1                   0.31521      1
6    220           40                45          2                  -0.18256      2
7    210           30                40          1                  -0.68174      2
8    200           10                25          2                  -1.70955      2
9    240           15                30          1                  -0.15791      2
10   230           25                25          2                  -1.00498      2
11   200           30                35          2                  -1.35486      2
12   300           35                40          1                   2.12776      1
13   280           20                20          2                   0.33985      1
14   290           30                25          2                   0.83903      1
15   275           40                40          2                   1.23622      1
16   263           25                30          1                   0.40835      1
17   285           30                25          1                   0.67812      1
18   291           35                35          1                   1.48683      1
19   300           25                30          1                   1.59912      1
20   205           15                20          2                  -1.98690      2
21   220           25                40          2                  -0.27294      2
22   230           40                25          2                  -1.26589      2
23   240           40                15          2                  -1.64664      2
24   270           45                25          1                  -0.06554      2
25   290           35                30          1                   1.10335      1
26   280           30                35          2                   1.21978      1
27   250           30                28          2                  -0.23751      2
28   255           15                23          2                  -0.16698      2
29   260           20                40          1                   1.10135      1
30   260           25                15          2                  -0.74207      2

In the table above, the misclassified cases are those where the recorded outcome and the predicted group disagree. Students 4, 7, 9 and 24 were successful but were predicted to be unsuccessful. Students 13, 14, 15 and 26 were unsuccessful but were predicted to be successful. The model therefore classifies 22 of the 30 students correctly, an accuracy of 22/30 ≈ 73.33%, which matches the classification table.
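The count of 22 correct classifications, and the breakdown in the classification table, can be checked mechanically by comparing the two group columns of the table. The short Python sketch below does this with the columns copied in row order (1 = Successful, 2 = Unsuccessful).

```python
from collections import Counter

# "Successful or Unsuccessful" and "Predicted Group" columns, in row order.
original = [2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2,
            1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2]
predicted = [2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1,
             1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2]

# Count each (original, predicted) pair; the off-diagonal pairs are errors.
confusion = Counter(zip(original, predicted))

# Student numbers (1-based) whose prediction disagrees with the recorded outcome.
misclassified = [i for i, (o, p) in enumerate(zip(original, predicted), start=1) if o != p]
accuracy = 100 * (len(original) - len(misclassified)) / len(original)

print("Confusion counts:", dict(confusion))
print("Misclassified students:", misclassified)
print(f"Accuracy: {accuracy:.2f}%")
```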

FINAL VERDICT
Thus, in conclusion, we can say that:
1. The model needs more predictors to assess a student's potential, because its classification accuracy of around 73.33% is currently not high enough.
2. The discriminant function built from the data provided so far is statistically significant (p-value of .026). However, it has low discriminating power (Wilks' lambda of .705), so it may not give satisfactory and accurate predictions of a student's success potential in the institute.
3. Furthermore, the standardized discriminant function coefficients show that the Written Test score is the most important predictor, followed by the Interview score and, finally, the Group Discussion score. This may not hold once the student graduates from ABC School of Business, since being able to participate successfully in a team meeting is often considered a characteristic of a good team player. Thus the model's judgement need not always be true.