Abstract: Education, more often known as learning, is a way of exchanging knowledge for the betterment of individuals and the progress of the nation. The objective of this paper is to help students improve their performance through applications of data mining. It makes use of psychological features of students. The paper uses a multi-class Support Vector Machine (SVM) to classify learners into the categories high, average and low according to their academic scores. Classification is carried out using a linear kernel and a radial basis function (RBF) kernel. It is noted that the RBF kernel produces better results than the linear kernel. Predicting the performance of students in advance can help both the institution and the learner take measurable steps to enhance the learning process.

Keywords: Education, Data mining, SVM, Student Performance, Prediction, Psychology

SVM is a supervised learning model used for classification analysis of data. The algorithm works by finding a hyperplane that widens the boundary between two sets of classes. The goal of classification is to separate the data points into distinct classes. SVM views each data point as an n-dimensional vector and tries to separate the points with an (n-1)-dimensional hyperplane. This mechanism is commonly known as a linear classifier. Various hyperplanes may exist that classify the data, but SVM focuses on finding the one that provides maximum separation.

Section 2 provides an overview of related work and discusses various prediction techniques applied in this area and their accuracy rates. Section 3 explains the proposed methodology used for prediction in our study. The results of our study and their comparative analysis are given in Section 4. Finally, Section 5 provides the conclusion.
Richardson et al. [16] worked on psychological parameters for learners’ scholastic performance. The parameters are divided into five major categories: Personality, Motivation, Self-regulatory learning strategies, Learning approach and Psychosocial contextual influences. Our study considered these parameters and collected the data with a questionnaire prepared on the basis of these psychometric parameters. It was found that students’ scholarly achievement depends on non-intellectual correlates as well as academic correlates [17]. Meta-analysis also showed a correlation between students’ percentage and non-intellectual correlates.

III. PROPOSED METHODOLOGY
The steps of the proposed methodology are depicted in Figure 1.

A. Support Vector Machine
SVM is a supervised learning technique that aims to classify the data. It uses a hyperplane to divide the dataset into classes, with the gap between them, known as the margin, as wide as possible. It generates parallel lines for creating partitions. The margin is the distance between the nearest data points of the classes; to reduce the generalization error, the hyperplane with the largest margin is selected.

SVM divides the data points as:

$f(y) > 0 \iff y \in X$, and (2)
$f(y) \le 0 \iff y \in Z$ (3)

The distance between an observation y and the hyperplane is given by $|x \cdot y + z| / \|x\|$, and the margin is given as $2 / \|x\|$. Here x acts as the normal vector of the hyperplane and z as its offset.

ii. Non-separable case, where data points overlap. To classify these data points, SVM restructures the data using a transformation function Φ. It maps the data points, and hence their scalar dot products, into a space of sufficiently high dimension where linear separation becomes possible.

B. Analysis
Classification, an important and very common machine learning task, can be performed with various data mining techniques. This paper focuses on classifying students’ data, based on psychometric components, into three classes: High, Average and Low. Ours is therefore a multi-class classification problem. It makes use of the linear kernel and the Radial Basis Function (RBF) kernel. The dataset consists of student records collected using a questionnaire based on their psychological parameters, which cover Personality, Motivation, Psychosocial contextual influences, Learning strategies, Approach to learning and Socio-economic status [17]. It contains one thousand records based on 29 non-intellectual constructs of students. For classification, 70% of the data is used for training the model and the remaining 30% is used for testing.
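The 70/30 division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say whether the split was stratified by class, so the stratification here is an assumption, and the function name `stratified_split`, the seed, and the randomly generated records are hypothetical stand-ins for the questionnaire data.

```python
import random
from collections import defaultdict


def stratified_split(records, labels, train_fraction=0.7, seed=42):
    """Split (record, label) pairs into train and test parts.

    Assumption: the split is stratified, i.e. each class keeps roughly
    the same share in both parts. The paper only states the 70/30 ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    train = [(records[i], labels[i]) for i in train_idx]
    test = [(records[i], labels[i]) for i in test_idx]
    return train, test


# Hypothetical stand-in data: 1000 records with 29 numeric constructs each.
data_rng = random.Random(0)
records = [[data_rng.random() for _ in range(29)] for _ in range(1000)]
labels = [data_rng.choice(["High", "Average", "Low"]) for _ in range(1000)]

train, test = stratified_split(records, labels)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, which matters when the linear and RBF kernels are to be compared on the same test records.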
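Fig. 1. Proposed model

The linear kernel can be defined as the inner product of two input vectors a and b plus an arbitrary constant c, which can be mathematically represented as:

$K(a, b) = a^{T} b + c$ (4)

The RBF kernel can be represented as:

$K(x, x') = \exp\left(-\frac{\|x - x'\|^{2}}{2\sigma^{2}}\right)$ (5)

where x and x' are two input feature vectors, $\|x - x'\|^{2}$ is the squared Euclidean distance, $\sigma$ is a free parameter, and the equivalent parameter $\gamma$ is calculated as $\gamma = 1/(2\sigma^{2})$. The value of the RBF kernel can be used as a similarity measure that varies between 0 and 1; it decreases with distance.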
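The two kernels used in this study can be written out directly. The sketch below assumes the common textbook forms, an inner product plus a constant c for the linear kernel and exp(-||x - x'||^2 / (2 sigma^2)) for the RBF kernel; the function names and the default values of c and sigma are illustrative choices, not values taken from the paper.

```python
import math


def linear_kernel(a, b, c=0.0):
    """Inner product of a and b plus an arbitrary constant c."""
    return sum(ai * bi for ai, bi in zip(a, b)) + c


def rbf_kernel(x, x_prime, sigma=1.0):
    """exp(-||x - x'||^2 / (2 * sigma^2)).

    Assumption: sigma is the kernel width of the textbook form; some
    libraries parameterise the same kernel by gamma = 1 / (2 * sigma^2).
    """
    squared_distance = sum((xi - xpi) ** 2 for xi, xpi in zip(x, x_prime))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# 1*3 + 2*4 + 1 = 12.0
print(linear_kernel([1.0, 2.0], [3.0, 4.0], c=1.0))

# Identical vectors give 1.0; the value shrinks as the points move apart.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))
```

The last three calls illustrate the similarity interpretation: the kernel equals 1 for identical inputs and decays towards 0 as the distance grows.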
IV. RESULTS AND DISCUSSIONS
The proposed model for predicting the performance of students is assessed using sensitivity, specificity and accuracy. The results on the training data set using the different kernels are shown in Figure 2 and Figure 3, and the results on the testing data are discussed in Table 1 and Table 2.
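A. Sensitivity
Sensitivity is a statistical performance metric that measures the proportion of actual positives that are correctly identified (in our study, correctly identifying the students in the categories High, Average and Low according to the given parameters).

$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$ (6)

B. Specificity
Specificity measures the proportion of actual negatives that are correctly identified.

$\text{Specificity} = \frac{TN}{TN + FP} \times 100$ (7)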
C. Accuracy
Accuracy measures trueness, i.e. the closeness of the observed (predicted) values to the true values; here it is the percentage of all predictions that are correct.

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$ (8)

where TP represents the number of true positives, FP the number of false positives, TN the number of true negatives and FN the number of false negatives.

Fig. 3. Results of Training data set using Radial Basis Kernel

TABLE I: CONFUSION MATRIX

Prediction   Linear Kernel          Radial Basis Function Kernel
             H      L      A        H      L      A
H            100    0      52       140    0      14
L            0      6      0        0      6      0
A            53     0      88       13     0      126
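The three measures can be recomputed from the counts in Table I. The sketch below is a plain-Python illustration, not the authors' code. It assumes that the columns of the table hold the actual classes in the order H, L, A (the table only labels the rows as predictions), and it treats each class in turn as the positive class (one-vs-rest) to obtain TP, FP, FN and TN. The function names are hypothetical.

```python
# Confusion matrices from Table I; rows are predicted classes.
# Assumption: columns are the actual classes, in the order H, L, A.
CLASSES = ["H", "L", "A"]
LINEAR = [[100, 0, 52], [0, 6, 0], [53, 0, 88]]
RBF = [[140, 0, 14], [0, 6, 0], [13, 0, 126]]


def accuracy(matrix):
    """Percentage of all cases that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def one_vs_rest(matrix, k):
    """Return (sensitivity, specificity) in percent, with class k as positive."""
    n = len(matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k][j] for j in range(n) if j != k)  # predicted k, actually other
    fn = sum(matrix[i][k] for i in range(n) if i != k)  # actually k, predicted other
    tn = sum(sum(row) for row in matrix) - tp - fp - fn
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


for name, matrix in (("linear", LINEAR), ("RBF", RBF)):
    print(name, "accuracy:", round(accuracy(matrix), 1))
    for k, label in enumerate(CLASSES):
        sensitivity, specificity = one_vs_rest(matrix, k)
        print("  ", label, round(sensitivity, 1), round(specificity, 1))
```

With these counts the overall accuracy works out to about 64.9% for the linear kernel (194 of 299 test records) and about 91.0% for the RBF kernel (272 of 299).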
Fig. 4. SVM using Linear Kernel

The model is tuned using a grid with values ranging from 0 to 5. The linear kernel provides the best values at tuning parameter C = 2, as shown in Figure 4. When the model is run using the radial basis kernel, it evaluates the variations over the grid and provides the best results at sigma = 0.1 and C = 2, as shown in Figure 5.

Fig. 5. SVM using Radial Grid Kernel

V. CONCLUSION
This paper focuses on non-intellectual parameters of students which affect their study and academic growth. The use of data mining in the field of education can prove to be a boon for society. Psychometric analyses of students’ learning behavior help in enhancing their academic performance. Various mining techniques such as neural networks, decision trees, KNN, naïve Bayes and SVM have been applied to educational data covering psychological factors. As discussed in Section 2, the accuracy rate of previous studies is less than 89%. Our proposed model uses a Support Vector Machine to classify the data and predict learners’ CGPI. According to the statistics given in Table 2, the Radial Basis Function kernel gives more accurate results than the linear kernel, with an accuracy of approximately 90%.

ACKNOWLEDGEMENT
I would like to express my sincere gratitude to Vivekananda Institute of Professional Studies and Amity University Uttar Pradesh for their continuous support.

REFERENCES
[1] Ranjan and R. Ranjan, “Modelling Key parameters in Higher Education using Logistic Regression: an Indian case based Data Analysis”, 4th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions), pp. 365-369, IEEE, 2015.
[2] H. Goker and H. I. Bulbul, “Improving an early warning system to prediction of student examination achievement”, in Machine Learning and Applications (ICMLA), 13th International Conference on, pp. 568-573, IEEE, December 2014.
[3] P. Kaur, M. Singh and G. S. Josan, “Classification and prediction based data mining algorithms to predict slow learners in education sector”, Procedia Computer Science, 57, pp. 500-508, 2015.
[4] H. Hamsa, S. Indiradevi and J. J. Kizhakkethottam, “Student academic performance prediction model using decision tree and fuzzy genetic algorithm”, Procedia Technology, 25, pp. 326-332, 2016.
[5] A. M. Shahiri and W. Husain, “A review on predicting student's performance using data mining techniques”, Procedia Computer Science, 72, pp. 414-422, 2015.
[6] G. Gray, C. McGuinness and P. Owende, “An application of classification models to predict learner progression in tertiary education”, in Advance Computing Conference (IACC), IEEE International, pp. 549–554, 2014.
[7] T. Mishra, D. Kumar and S. Gupta, “Mining students’ data for prediction performance”, Proceedings of the 2014 Fourth International Conference on Advanced Computing & Communication Technologies, ACCT ’14, IEEE Computer Society, Washington, DC, USA, 2014, pp. 255–262. doi:10.1109/ACCT.2014.105.
[8] S. Sembiring, M. Zarlis, D. Hartama, S. Ramliana and E. Wani, “Prediction of student academic performance by an application of data mining techniques”, International Conference on Management and Artificial Intelligence IPEDR, Vol. 6, pp. 110–114, 2011.
[9] M. Richardson, C. Abraham and R. Bond, “Psychological correlates of university students' academic performance: A systematic review and meta-analysis”, Psychological Bulletin, 138(2), pp. 353, 2012.
[10] I. Burman, S. Som and S. A. Hossain, “Meta Analysis of Psychometric Measures and Prediction of Student’s Learning Behaviour using Regression Analysis and SVM”, Jour of Adv Research in Dynamical and Control Systems, Vol. 10, 02-Special issue, pp. 291-298, 2018.