1 Research Scholar, School of Information Technology & Engineering, VIT University, Vellore
2 Professor, School of Computing Science & Engineering, VIT University, Vellore
E-Mail: k.rajmohan90@gmail.com, pilango@vit.ac.in
I. INTRODUCTION
Incomplete data is very common in large databases.
Technically, missing attribute values leave the
database in an inconsistent state. Data preprocessing
is an essential step for addressing missing attribute
values. Typically, the missing values can be replaced
using one of many possible approaches. Some knowledge
is needed to predict whether a value is missing or
not. [1] Many real-world applications must make
complicated decisions to handle missing data. For
example, in the health care industry, a doctor
examining a patient has to check the patient's
history to predict the result. Beyond health care,
many corporate concerns are also worried about their
missing data. Many approaches and techniques exist for
handling incomplete data. Missing attribute values
have several drawbacks, including loss of efficiency,
complications in managing and analyzing data, and
bias resulting from differences between missing and
complete data. [2] To avoid these negative effects on
the analysis performed by data mining algorithms,
different approaches are employed to prepare and
cleanse the data when missing values are present.
This is critical as many existing industrial and
research data
            MAR               NMAR
            Weaker            Violated
            Partial / Good    Distinct
DATA        Information       No information
TEST        Not fit           Not fit
RESULT      Plausible         Sensitive

Table 1.1 Comparisons of MAR and NMAR
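For reference, the two mechanisms named in Table 1.1 are commonly stated as follows. The notation is the usual one from the missing-data literature and is an assumption here, not notation taken from this paper: R is the missingness indicator, Y_obs the observed part of the data, and Y_mis the missing part.

```latex
\begin{align*}
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{NMAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \neq P(R \mid Y_{\text{obs}})
\end{align*}
```

Under MAR the chance that a value is missing depends only on what was observed; under NMAR it still depends on the unobserved values themselves.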
[Table: layout not recoverable. Techniques named: Multiple, KNNI, Fuzzy k-means, E-M, SVM, Outlier detection, ITF. Attributes named: Variable type, Regression, Prediction, Estimation, Initialization, Iteration, Distribution, Substitution, Possibility.]
REFERENCE
[1]
IV. CHALLENGES
As data grows larger, and even though many machine
learning approaches and techniques exist, some loss
of quality may still occur. Missing attributes
therefore raise many challenges, which are mainly
reflected in the quality of the data. Many real-world
applications work with huge amounts of data, and any
missing data becomes a major concern. Filling the
missing values with an equivalent probable value,
simply eliminating the group with missing values, or
ignoring the missing data may each lead to a loss of
efficiency. The data should therefore be identified
as missing before data preprocessing begins. Although
many new techniques have impressed companies, and
some have been adopted, they may still have
drawbacks. The main challenge in addressing missing
attribute values is the loss of quality, which
degrades the data. To keep the data well formed, one
option is to predict the missing values and replace
them. Replacement, however, is not always
satisfactory, and other techniques may be preferable.
We have identified the major challenges faced by many
real-time applications, along with some drawbacks of
present machine learning approaches.
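The trade-off between eliminating incomplete records and filling in a probable value can be illustrated with a minimal sketch. The records, the attribute names, and the helpers drop_incomplete and impute_mean are hypothetical and not taken from this paper; mean substitution stands in here only as the simplest possible "probable value".

```python
from statistics import mean

# Hypothetical toy records; None marks a missing attribute value.
records = [
    {"age": 25, "income": 30000},
    {"age": None, "income": 42000},
    {"age": 40, "income": None},
    {"age": 34, "income": 51000},
]


def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing attribute."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed
    values of the same attribute (assumes every attribute has at
    least one observed value)."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


complete = drop_incomplete(records)
filled = impute_mean(records)

print(len(complete), "of", len(records), "records survive deletion")
for row in filled:
    print(row)
```

In this toy case deletion discards half of the records, while mean substitution keeps every record but inserts values that were never observed, which is one way the quality concerns described above can arise.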
V. CONCLUSION
In this paper we have briefly discussed the various
techniques for handling missing attribute values. We
have discussed various applications that are broadly