I. INTRODUCTION
Educational data mining (EDM) is an emerging and promising application of data mining. It is concerned
with developing methods that explore the distinctive types of data that come from educational settings,
and with using those methods to better understand students and the ways in which they learn [1].
EDM is also concerned with investigating, developing and applying automated methods to large
collections of educational data in order to extract patterns that would otherwise be hard to detect and
infeasible to analyze because of the sheer volume of data in which they are embedded [2].
Data mining is sometimes called knowledge discovery in databases (KDD); it can be used to find the
hidden knowledge and relationships that exist within large amounts of educational data [3]. A number of
techniques and algorithms, such as classification, prediction, clustering, outlier detection and
association rule mining, are used for this purpose. Mining techniques are used to predict which students
are likely to drop out, so that stakeholders or decision makers can intervene at the earliest opportunity
to retain them, improve their performance and thereby reduce the dropout ratio or the likelihood of
failure. This paper uses discriminant analysis for that purpose, and its central focus is the relationship
among the variables that the analysis reveals. Discriminant analysis is a statistical technique that
parallels regression, except that the dependent variable is categorical rather than continuous [4]. The
method is based on a linear combination of the predictor variables:
$$D = c + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$
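The linear form above can be illustrated with a short Python sketch. The function name, the constant, the coefficients and the marks below are hypothetical placeholders chosen only for illustration; the paper does not report its unstandardized coefficients.

```python
def discriminant_score(constant, coefficients, predictors):
    """Return D = c + b1*X1 + ... + bk*Xk for a single observation."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return constant + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical values, NOT taken from the paper.
c = -5.0
b = [0.01, 0.06, 0.005, 0.02]   # assumed weights for VARGE, VAREC, VARED, VARPS
marks = [65, 70, 50, 72]        # made-up marks for one student

score = discriminant_score(c, b, marks)
print(round(score, 3))
```

In a two-group analysis, a score of this kind is compared with a cutoff to assign the case to one group or the other.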
www.ijafrc.org
Table 1: Group Statistics

Group     Variable   Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
Group 1   VARGE      65.69   20.675           100                    100.000
Group 1   VAREC      66.98   11.870           100                    100.000
Group 1   VARED      53.79   34.598           100                    100.000
Group 1   VARPS      72.00   15.708           100                    100.000
Group 2   VARGE      60.14   18.466           29                     29.000
Group 2   VAREC      43.90   14.652           29                     29.000
Group 2   VARED      47.55   35.900           29                     29.000
Group 2   VARPS      62.10   19.927           29                     29.000
Total     VARGE      64.44   20.263           129                    129.000
Total     VAREC      61.79   15.796           129                    129.000
Total     VARED      52.39   34.851           129                    129.000
Total     VARPS      69.78   17.173           129                    129.000
The group means (Table 1) show that the two groups, pass and fail, are widely separated on Economics
and, to a lesser extent, on Political Science. The differences on Education and General English are
smaller.
Further, the results in Education are the most widely scattered, as that subject has the highest standard
deviation. The results in General English also show considerable scatter.
The pooled within-groups correlation matrix indicates low correlations between the predictors.
Multicollinearity is therefore unlikely to be a problem (see Table 2).
Table 2: Pooled Within-Groups Matrices

Correlation   VARGE    VAREC    VARED    VARPS
VARGE         1.000    .045     -.004    .097
VAREC         .045     1.000    -.101    -.057
VARED         -.004    -.101    1.000    -.053
VARPS         .097     -.057    -.053    1.000
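The multicollinearity check on Table 2 can be sketched in Python by looking for the largest off-diagonal correlation. The 0.8 threshold is an assumed rule of thumb, not a value given in the paper.

```python
# Off-diagonal entries of Table 2 (pooled within-groups correlations).
pooled_corr = {
    ("VARGE", "VAREC"): 0.045,
    ("VARGE", "VARED"): -0.004,
    ("VARGE", "VARPS"): 0.097,
    ("VAREC", "VARED"): -0.101,
    ("VAREC", "VARPS"): -0.057,
    ("VARED", "VARPS"): -0.053,
}

# Assumed rule of thumb (not from the paper): |r| above 0.8 suggests trouble.
THRESHOLD = 0.8

# Pair of predictors with the largest absolute correlation.
pair, r = max(pooled_corr.items(), key=lambda item: abs(item[1]))

# Pairs that exceed the assumed threshold.
flagged = [p for p, value in pooled_corr.items() if abs(value) > THRESHOLD]

print(pair, r, flagged)
```

The strongest correlation in the table is between VAREC and VARED, and no pair comes close to the assumed threshold.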
The significance of the univariate F-ratios (Table 3) indicates that, when the predictors are considered
individually, only Economics and Political Science significantly differentiate between those who passed
and those who failed the examination: the significance value associated with each of these two
predictors is below the accepted level of significance (i.e. 0.05).
Table 3: Tests of Equality of Group Means

Variable   Wilks' Lambda   F        df1   df2   Sig.
VARGE      .987            1.697    1     127   .195
VAREC      .625            76.215   1     127   .000
VARED      .994            .719     1     127   .398
VARPS      .942            7.866    1     127   .006
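For a single predictor, the F-ratio and Wilks' lambda in Table 3 are linked by F = ((1 - lambda) / lambda) * (df2 / df1). The Python sketch below recomputes F from the tabulated lambdas, assuming df1 = 1 (two groups) and df2 = 127. Because the lambdas are rounded to three decimals, the recomputed values agree with the reported ones only approximately.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """Univariate F-ratio implied by Wilks' lambda for one predictor."""
    return (1.0 - wilks_lambda) / wilks_lambda * df2 / df1


# (Wilks' lambda, reported F) for each predictor, as given in Table 3.
table3 = {
    "VARGE": (0.987, 1.697),
    "VAREC": (0.625, 76.215),
    "VARED": (0.994, 0.719),
    "VARPS": (0.942, 7.866),
}

DF1, DF2 = 1, 127  # df1 = groups - 1 is assumed; df2 is from the table

recomputed = {name: f_from_wilks(lam, DF1, DF2) for name, (lam, _) in table3.items()}

for name, (lam, reported) in table3.items():
    print(f"{name}: lambda={lam:.3f} reported F={reported:.3f} "
          f"recomputed F={recomputed[name]:.3f}")
```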
Thus competence in Political Science and Economics largely determines the overall result. To improve
results, the curriculum should be delivered in a way that leads to better performance in Economics and
Political Science.
Table 4: Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .719a        100.0           100.0          .647
The eigenvalue associated with the function is 0.719 (Table 4), and it accounts for 100 percent of the
explained variance. The canonical correlation associated with the function is 0.647. The square of this
correlation, (0.647)^2 ≈ 0.42, indicates that about 42 percent of the variance in the dependent variable,
result (pass/fail), is explained by this model.
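With a single discriminant function, the canonical correlation and Wilks' lambda both follow from the eigenvalue: R = sqrt(eigenvalue / (1 + eigenvalue)) and lambda = 1 / (1 + eigenvalue). The Python sketch below applies these standard identities to the reported eigenvalue; the variable names are illustrative.

```python
import math

eigenvalue = 0.719  # reported eigenvalue of the single discriminant function

# Canonical correlation implied by the eigenvalue.
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Share of variance in the grouping variable explained by the function.
explained = canonical_corr ** 2

# Wilks' lambda for one function is the unexplained share.
wilks_lambda = 1.0 / (1.0 + eigenvalue)

print(f"canonical correlation = {canonical_corr:.3f}")
print(f"explained variance    = {explained:.3f}")
print(f"Wilks' lambda         = {wilks_lambda:.3f}")
```

The explained share and Wilks' lambda sum to one, which ties the eigenvalue to the significance test reported for the function.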
Determine the significance of the Discriminant Function: It would not be meaningful to interpret the
analysis if the estimated discriminant functions were not statistically significant. The null hypothesis
that, in the population, the means of all discriminant functions in all groups are equal can be tested
statistically. The test is based on Wilks' lambda, and the significance level is estimated from a
chi-square transformation of the statistic. For the overall results, the Wilks' lambda associated with the
function is 0.582, which transforms to a chi-square of 67.693 with 4 degrees of freedom. This is
significant beyond the 0.05 level (Table 5).
Table 5: Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .582            67.693       4    .000
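The chi-square transformation can be sketched in Python. The paper does not name the transformation, so the usual Bartlett approximation, chi2 = -(N - 1 - (p + g) / 2) * ln(lambda), is assumed here, with N = 129 cases, p = 4 predictors and g = 2 groups. The helper names are illustrative, and the p-value uses the closed form that holds for an even number of degrees of freedom.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda (assumed here)."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * math.log(wilks_lambda)


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i        # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


chi2 = bartlett_chi_square(0.582, n_cases=129, n_predictors=4, n_groups=2)
df = 4 * (2 - 1)                # predictors * (groups - 1)
p_value = chi_square_sf_even_df(chi2, df)

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.2e}")
```

The recomputed statistic differs slightly from the reported 67.693 because the lambda in Table 5 is rounded to three decimals.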
Interpret the Results: An examination of the standardized discriminant function coefficients for the
overall results (Bjb College) is instructive. Given the low intercorrelations between the predictors, one
might use the magnitudes of the standardized coefficients to suggest that Economics is the predominant
factor in determining the overall result. The coefficients in Table 6 support this reading: Economics has
by far the largest standardized coefficient (.952), followed by Political Science (.353), Education (.204)
and General English (.060).
Table 6: Standardized Canonical Discriminant Function Coefficients

Variable   Function 1
VARGE      .060
VAREC      .952
VARED      .204
VARPS      .353
The structure matrix also shows that the most important subject is Economics, followed by Political
Science, then General English and finally Education. Thus competence in Economics largely determines a
candidate's overall result (Table 7).
Table 7: Structure Matrix

Variable   Function 1
VAREC      .914
VARPS      .294
VARGE      .136
VARED      .089
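The two ways of ranking the predictors, by standardized coefficient (Table 6) and by structure loading (Table 7), can be compared with a short Python sketch. The helper name is illustrative.

```python
# Standardized canonical discriminant function coefficients (Table 6).
standardized = {"VARGE": 0.060, "VAREC": 0.952, "VARED": 0.204, "VARPS": 0.353}

# Structure matrix loadings (Table 7).
structure = {"VAREC": 0.914, "VARPS": 0.294, "VARGE": 0.136, "VARED": 0.089}


def rank_by_magnitude(weights):
    """Return variable names ordered from largest to smallest absolute weight."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)


coef_rank = rank_by_magnitude(standardized)
load_rank = rank_by_magnitude(structure)

print("by standardized coefficient:", coef_rank)
print("by structure loading:       ", load_rank)
```

Both rankings place Economics first and Political Science second; they differ only in the order of General English and Education.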
IV. CONCLUSION
In this study, the variables associated with a specific course (B.A.) were analyzed using discriminant
analysis with the aim of improving student performance. The analysis leads to the conclusion that the
foremost predictor variable, Economics, plays a pivotal role in the overall success of a student. The
pooled within-groups correlations between the discriminating variables and the standardized canonical
discriminant function also showed the importance of Political Science and General English in
determining the overall result, which has a great bearing on student performance. Stakeholders or
decision makers can therefore strengthen teaching in these subjects to reduce the overall dropout ratio.
V. REFERENCES
[1] Baker RSJd, Yacef K. The state of educational data mining in 2009: A review and future visions. J
Educ Data Min 2009.
[2] Romero C, Ventura S, Pechenizkiy M, Baker R. Handbook of educational data mining. Data Mining
and Knowledge Discovery Series. Boca Raton, FL: Chapman and Hall/CRC Press; 2010.
[3] J. Han and M. Kamber, Data mining: Concepts and Techniques. San Francisco, CA, USA: Morgan
Kaufmann Publishers Inc., 2000.
[4]
[5] Sheikh, L., Tanveer, B. and Hamdani, S. 2004. Interesting measures for mining association rules.
IEEE-NMIC Conference, held at Lahore (Pakistan), 24-26 Dec. 2004.
[6] Romero, C. and Ventura, S. (2007) Educational data mining: A survey from 1995 to 2005, Expert
Systems with Applications (33), pp. 135-146.
[7] El-Halees, A. 2009. Mining students data to analyze learning behavior: a case study.
https://uqu.edu.sa/files2/tiny_mce/plugins/filemanager/files/30/papers/f158.pdf.
[8] Kifaya. 2009. Mining student evaluation using associative classification and clustering.
Communications of the IBIMA. 11, ISSN 1943-7765.
[9] Ayesha, S., Mustafa, T., Sattar, A.R. and Khan, M.I. 2010. Data mining model for higher education
system. European Journal of Scientific Research. 43(1): pp. 24-29.
[10] Sunil Kumar, P., Panda, A.K. and Jena, D. 2013. Mining the factors affecting the high school
dropouts in rural areas, International Journal of Advance Computer Engineering and
Communication Technology (IJACECT), 2(1); pp. 1-6.