
IJSRD - International Journal for Scientific Research & Development | Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613

Student Performance Evaluation in Education Sector using Prediction and Clustering Algorithms
Solankar Punam Anil(1), Jagatap Trupti Baban(2), Rupnawar Sachin Hanumant(3), Shitole Vibhavari Jayvant(4), Prof. Kumbhar S. L.(5)
(1,2,3,4) Student, (5) Assistant Professor
Department of Computer Engineering
SBPCOE Indapur
Abstract: Data mining is a crucial step in discovering
previously unknown information from large relational
databases. Various techniques and algorithms are used in
data mining, such as association rules, clustering,
classification, and prediction. Each of these techniques has
its own characteristics and behaviour. This paper focuses on
clustering and prediction techniques. Nowadays, the amount
of data stored in educational databases is increasing rapidly.
A database for a particular set of students was collected,
clustering and prediction were carried out on it in detail,
and results were produced. The K-means clustering algorithm
is used to assign each record to the nearest cluster of
similar records. Performance in higher education is a turning
point for all students in India. Academic performance is
influenced by various factors; therefore, to distinguish high
learners from slow learners, it is important to develop a
predictive data mining model of student performance.
Key words: Data Mining, Clustering, Classification,
Predictive Model
The ability to predict a student's performance is very
important in the educational sector. Student performance
depends on various factors, such as personal, social, and
other variables. A promising tool for achieving this objective
is data mining. Data mining is simply the process of
extracting useful information from large amounts of data.
To mine unknown information, different techniques are used,
such as supervised and unsupervised learning, pattern
mining, clustering, classification, prediction, and
association rules. The K-Means algorithm considered in this
paper is a clustering technique; it is a partitioning,
unsupervised learning type of data mining algorithm.
Clustering is the process of grouping a given set of patterns
into disjoint clusters, such that patterns in the same cluster
are alike and patterns belonging to two different clusters
are different.
This paper also uses another data mining technique,
classification. Classification is a predictive data mining
technique: it makes predictions about values of data using
known results found from other data. Classification maps
data into predefined sets of classes. It is often referred to
as supervised learning because the classes are determined
before examining the data. Effective prediction of student
performance needs prediction models that include all
personal, social, and other variables. Predicting student
performance with high accuracy helps to identify students
with low academic achievement at an early stage. The
identified students can then be given more attention by the
teacher so that their performance improves in future.
The remainder of the paper is organized as follows.
Section 2 presents work related to the proposed work.
Section 3 contains the proposed work, with the K-Means
clustering algorithm, classification, and the new procedure
for the implementation. Section 4 presents the conclusion.
Brijesh Kumar Baradwaj and Saurabh Pal studied predicting
students' future learning behaviour with the use of student
modeling. This goal can be achieved by building student
models that capture the learner's characteristics, including
information such as their knowledge, behaviour, and
motivation to learn. Learning is also measured through the
learner's user experience and overall satisfaction.
Suchita Borkar and K. Rajeswari studied the use of
educational data mining for discovering different patterns
to improve the performance of students, and identified the
attributes which affect students' academic performance. In
this paper, a clustering algorithm is used to handle large
amounts of data, with the aim of high prediction accuracy
and increased processing speed.
In the paper by S. Ganga and Dr. T. Meyyappan on the
performance of students, educational data mining is
described as a research area concerned with applying data
mining, machine learning, and statistics to information
generated from educational settings. At a high level, the
field seeks to develop methods for exploring this data.
Clustering algorithms are categorized into unsupervised and
semi-supervised strategies, based on whether certain prior
knowledge about the clusters and the clustering process is
available and can be used to improve the clustering of
student data. Our work focuses on the K-Means algorithm as
a centroid-based unsupervised model. The aim is to present
a systematic commentary on the various clustering
techniques applied in educational data mining to predict the
academic performance of students, and on their implications.
Pooja Thakar, Anil Mehta and Manisha conducted a
study on Performance Analysis and Prediction in
Educational Data Mining.

All rights reserved by


Student Performance Evaluation in Education Sector using Prediction and Clustering Algorithms
(IJSRD/Vol. 3/Issue 10/2015/054)


A. Data Set:
We are collecting data from an engineering college. The
data show the students' academic performance and personal
information, such as economic problems and family problems.
The data also contain the student's family background,
student details, subject marks, and semester-wise percentages.
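
As an illustration only, one record of such a data set could be represented as sketched below. The original does not give a schema, so every field name and the choice of derived features are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudentRecord:
    """One student row; all field names here are hypothetical."""
    student_id: str
    family_background: str
    has_economic_problem: bool
    has_family_problem: bool
    subject_marks: Dict[str, float] = field(default_factory=dict)
    semester_percentages: List[float] = field(default_factory=list)

    def feature_vector(self) -> List[float]:
        """Numeric features: mean semester percentage, mean subject mark,
        and the two personal-problem flags encoded as 0.0 or 1.0."""
        marks = list(self.subject_marks.values())
        mean_mark = sum(marks) / len(marks) if marks else 0.0
        pcts = self.semester_percentages
        mean_pct = sum(pcts) / len(pcts) if pcts else 0.0
        return [
            mean_pct,
            mean_mark,
            float(self.has_economic_problem),
            float(self.has_family_problem),
        ]
```

A numeric feature vector of this kind is what a clustering or classification step would consume; which attributes to include is a modelling decision the source leaves open.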

Fig. 1: Analysis of student performance

A. Clustering:
Cluster analysis, or clustering, is the task of grouping a set
of objects in such a way that objects in the same group
(cluster) are more similar to each other than to those in
other groups. It is a main task of exploratory data mining
and a common technique for statistical data analysis; it is
also used in machine learning, pattern recognition, image
analysis, bioinformatics, and information retrieval.
B. K-Means Clustering Algorithm:
K-Means is one of the unsupervised learning algorithms
used to solve the clustering problem. The algorithm is
simple and easy to understand. The main idea is to define K
centers, one for each cluster. These centers should be placed
carefully, because different locations lead to different
results. The better choice is therefore to place them as far
away from each other as possible. The next step is to take
each point belonging to the given data set and associate it
with the closest center. When no point is pending, the first
step is complete and an early grouping is done. At this
point, K new centroids are recalculated as the barycenters of
the clusters resulting from the previous step. Once we have
these K new centroids, a new binding has to be done between
the same data set points and the closest new center. A loop
has thus been generated. As a result of this loop, the K
centers change their location step by step until no more
changes occur. The algorithm aims at minimizing an
objective function known as the squared error function.
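
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the farthest-first rule is just one way to place the initial centers far apart, and the Euclidean squared error is assumed as the objective.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points: List[Point], k: int) -> List[List[float]]:
    """Pick k initial centers that lie far from each other (assumed heuristic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Take the point whose nearest chosen center is farthest away.
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def k_means(points: List[Point], k: int, max_iter: int = 100
            ) -> Tuple[List[List[float]], List[int], float]:
    """Return (centers, labels, squared error) for the given points."""
    centers = farthest_first_centers(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: bind every point to its closest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: each new centroid is the mean of its cluster.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # centers no longer move: stop
            break
        centers = new_centers
    # Objective: sum of squared distances from each point to its center.
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse
```

For example, calling `k_means(points, k=3)` on two-dimensional points such as (attendance, average mark) would split students into three groups. The result depends on the initial centers, which is why their placement matters.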
C. Naive Bayes:
In machine learning, naive Bayes classifiers are a family of
simple probabilistic classifiers based on applying Bayes'
theorem with strong (naive) independence assumptions
between the features. Naive Bayes classifiers are highly
scalable, requiring a number of parameters linear in the
number of variables (features/predictors) in a learning
problem. Maximum-likelihood training can be done by
evaluating a closed-form expression, which takes linear
time, rather than by the expensive iterative approximation
used for many other types of classifiers. The naive Bayesian
classifier is based on Bayes' theorem with independence
assumptions between predictors. A naive Bayesian model is
easy to build, with no complicated iterative parameter
estimation, which makes it particularly useful for very large
datasets. Despite its simplicity, the naive Bayesian classifier
often does surprisingly well and is widely used because it
can outperform more sophisticated classification methods.
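
To make the closed-form, linear-time training concrete, the sketch below trains a naive Bayes model on categorical attributes purely by counting. It is an illustration under stated assumptions: the function names and attribute names are hypothetical, and add-one (Laplace) smoothing is added here to avoid zero probabilities, which departs slightly from plain maximum-likelihood estimation.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_naive_bayes(rows: List[Dict[str, str]], labels: List[str]):
    """Single pass over the data: count classes and per-class attribute values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, class) -> Counter of values
    feature_values = defaultdict(set)    # feature -> set of values seen in training
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def predict_proba(model, row: Dict[str, str]) -> Dict[str, float]:
    """Posterior probability of each class for one row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for feature, value in row.items():
            count = value_counts[(feature, label)][value]
            # Add-one smoothing (an assumption of this sketch).
            score *= (count + 1) / (n_label + len(feature_values[feature]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}
```

Training touches each attribute value exactly once, so its cost grows linearly with the number of rows and attributes.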
D. Implementation of Mining Model:
In implementing the mining model, various algorithms and
techniques, such as classification, clustering, regression,
neural networks, artificial intelligence, association rules,
genetic algorithms, and decision trees, are used for
knowledge discovery from databases. Alongside k-means,
classification is one of the problems most frequently
studied by data mining and machine learning (ML)
researchers. It consists of predicting the value of a
categorical class attribute based on the values of other
attributes (the predicting attributes). There are different
classification methods; in our present study, we use
Bayesian classification.
Bayesian classification is based on the conditional
probability of Bayes' rule. Bayes' rule is a technique for
estimating the likelihood of a property given a set of data
as evidence or input. The approach is called naive because
it assumes independence between the various attribute
values. Naive Bayes classification can be viewed as both a
descriptive and a predictive type of algorithm. The
probabilities are descriptive, and they are then used to
predict the class membership of a target tuple.
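
For reference, the standard form of Bayes' theorem is given below, followed by the factorization implied by the naive independence assumption. The symbols are introduced here for illustration and are not taken from the original: $C$ denotes a class (for example, a performance category) and $X = (x_1, \dots, x_n)$ the attribute values of a target tuple.

```latex
P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)},
\qquad
P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the one with the largest $P(C \mid X)$; since $P(X)$ is the same for every class, it can be ignored when comparing classes.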
The naive Bayes approach has several advantages. It is
easy to use. Unlike other classification approaches, it
requires only one scan of the training data. It easily handles
missing values by simply omitting that probability. One
more advantage of the naive Bayes classifier is that it
requires only a small amount of training data to estimate
the parameters (such as means and variances) necessary for
classification. Because the variables are assumed
independent, there is no need to determine the entire
covariance matrix; only the variances of the variables for
each class need to be determined. In spite of their naive
design and apparently oversimplified assumptions, naive
Bayes classifiers have worked quite well in many complex
real-world situations.
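
The points about means, variances, and missing values can be illustrated with a small Gaussian naive Bayes sketch for numeric attributes. This is a hedged example, not the authors' code: the function names are hypothetical, a normal distribution per attribute and class is assumed, and missing values are represented as `None` at prediction time only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Stats = List[Tuple[float, float]]  # (mean, variance) per feature


def fit_gaussian_nb(X: List[Sequence[float]], y: List[str]
                    ) -> Dict[str, Tuple[float, Stats]]:
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (prior, stats)
    return model


def predict_gaussian_nb(model, row: Sequence[Optional[float]]) -> str:
    """Return the most probable class; features given as None are skipped."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        log_score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            if value is None:
                continue  # omit the probability of a missing value
            log_score += (-0.5 * math.log(2 * math.pi * var)
                          - (value - mean) ** 2 / (2 * var))
        if log_score > best_score:
            best_label, best_score = label, log_score
    return best_label
```

Only one mean and one variance per attribute and class are stored, so no covariance matrix is ever formed. Log-probabilities are summed instead of multiplying probabilities to avoid numerical underflow.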
By applying naive Bayesian classification and the K-Means
clustering technique to the student database, we predict
students' academic performance. The predictions obtained
from the student database are useful for improving student
performance. This study shows student performance and
makes it easy to identify those students who have low marks
and poor performance. According to the present study, the
performance of students does not always depend on their
own efforts; according to our investigation, other factors
also influence student performance. This proposal will
improve the insights over existing methods.
[1] Ali Buldu, Kerem Üçgün. Data mining application on
students' data. Procedia Social and Behavioral Sciences
2, 5251-5259, 2010.
[2] Singh, Randhir. An Empirical Study of Applications of
Data Mining Techniques for Predicting Student
Performance in Higher Education, 2013.
[3] Baha Sen, Emine Ucar. Evaluating the achievements of
computer engineering department of distance education
students with data mining methods. Procedia
Technology 1, 262-267, 2012.
[4] Baradwaj, Brijesh Kumar, and Saurabh Pal. Mining
Educational Data to Analyze Students' Performance.
arXiv preprint arXiv: 1201.3417, 2012.
[5] Castro, Félix, et al. Applying data mining techniques to
e-learning problems. Evolution of teaching and learning
paradigms in intelligent environment. Springer Berlin
Heidelberg, 183-221, 2007.
[6] Huebner, Richard A. "A survey of educational".
[7] Ramaswami, M., and R. Bhaskaran. A CHAID based
performance prediction model in educational data
mining. arXiv preprint arXiv: 1002.1144, 2010.
[8] Pool, Lorraine Dacre, Pamela Qualter, and Peter J.
Sewell. "Exploring the factor structure of the
CareerEDGE employability development profile."
Education+ Training 56.4 (2014): 303-313.
[9] Saranya, S., R. Ayyappan, and N. Kumar. "Student
Progress Analysis and Educational Institutional Growth
Prognosis Using Data Mining." International Journal Of
Engineering Sciences & Research Technology, 2014
[10] Hicheur Cairns, Awatef, et al. "Towards
Custom-Designed Professional Training Contents and
Curriculums through Educational Process Mining."
IMMM 2014, The Fourth International Conference on
Advances in Information Mining and Management.
[11] Archer, Elizabeth, Yuraisha Bianca Chetty, and Paul
Prinsloo. "Benchmarking the habits and behaviors of
successful students: A case study of academic-business
collaboration." The International Review of Research in
Open and Distance Learning 15.1 (2014).
[12] Arora, Rakesh Kumar, and Dharmendra Badal. "Mining
Association Rules to Improve Academic Performance."
[13] Peña-Ayala, Alejandro. "Educational data mining: A
survey and a data mining-based analysis of recent
works." Expert systems with applications 41.4 (2014):
[14] Potgieter, Ingrid, and Melinde Coetzee. "Employability
attributes and personality preferences of postgraduate
business management students." SA Journal of
Industrial Psychology 39.1 (2013): 01-10.
[15] Jantawan, Bangsuk, and Cheng-Fa Tsai. "The
Application of Data Mining to Build Classification
Model for Predicting Graduate Employment."
International Journal Of Computer Science And
Information Security (2013).

[16] Azhar Rauf, Sheeba, Enhanced K-Mean Clustering
Algorithm to Reduce Number of Iterations and Time
Complexity, Middle-East Journal of Scientific
Research, Vol. 12 (7), pp. 959-963, 2012.
[17] Jaideep Vaidya, Privacy Preserving K-Means
Clustering over Vertically Partitioned Data, In
proceeding of SIGKDD '03, Washington, DC, USA,
August 24-27, 2003.
[18] N. Sivaram, Applicability of Clustering and
Classification Algorithms for Recruitment Data
Applications, Vol. 4(5), July 2010.
[19] Md. Hedayetul Islam Shovon, Prediction of Student
Academic Performance by an Application of K-Means
Advanced Research in Computer Science and Software
Engineering, Vol. 2(7), July 2012.
