
Application of Data Mining in Analysis and Detection of Parkinson's Disease


Omini Rathore
Computer Science and Engineering
SRM Institute of Science and Technology
Chennai, India
omini1997@gmail.com

Mrs. P. Akilandeswari

Namraata Yadav
Computer Science and Engineering
SRM Institute of Science and Technology
Chennai, India
namyadav28@gmail.com

Abstract- Parkinson's disease (PD) is a neurodegenerative disorder which often affects patients' movements. The most common symptoms include shaking, rigidity, slowness of movement, and difficulty in walking. The main motor symptoms are collectively called "parkinsonism". This paper provides a brief description of the existing techniques used in detecting Parkinson's Disease with the help of various data mining algorithms such as Multiple Instance Learning (MIL), K-means clustering, Decision Tree Classification and the Moving Average Algorithm, together with their accuracies and drawbacks, and also gives an overview of the proposed system. Since all of the existing models consider a single symptom for the detection, the proposed system is based on building an analytical model with two different symptoms, i.e. speech and finger tapping keystroke, in order to increase the accuracy and find the correlation between these symptoms.

I. INTRODUCTION

Currently, PD is diagnosed via various neurological examinations by specialists. A physician will initially assess for Parkinson's disease with a careful medical history and neurological examination. Since there are few standard tests for detecting Parkinson's disease, a statistical approach has been proposed. The datasets are based on particular symptoms, some of which are described below:

i. Tremors: The most common symptom is a coarse, slow tremor of the hand at rest which disappears during voluntary movement of the affected arm and in the deeper stages of sleep. It typically appears in only one hand, eventually affecting both hands as the disease progresses. Tremor results in micrographia, a disorder that features abnormally small, cramped handwriting or progressively smaller handwriting.

ii. Bradykinesia: It is slowness of movement, found in every case of PD, and is due to disturbances in motor planning and movement initiation. It is associated with difficulties along the whole course of the movement process, from planning to initiation to execution of a movement.

iii. Rigidity: It is stiffness and resistance to limb movement caused by increased muscle tone, an excessive and continuous contraction of muscles.

iv. Postural instability: It is typical in the later stages of the disease, leading to impaired balance and frequent falls, and secondarily to bone fractures, loss of confidence, and reduced mobility.

Fig 1 Handwriting Samples of patients diagnosed with Parkinson's.

The data mining algorithms used for classification include:

i. Multiple Instance Learning: It is a type of supervised learning. The learner receives a set of labelled bags, each of which contains many instances, instead of receiving a set of instances which were individually labelled. In the simple case of multiple-instance binary classification, a bag in which all instances are negative is
labelled negative. On the other hand, a bag is labelled positive if at least one instance is positive. From a collection of labelled bags, the learner tries to either induce a concept that will label individual instances correctly or learn how to label bags without inducing the concept.

ii. K-Means: Given a collection of objects each with n measurable attributes, k-means is an analytical technique that, for a chosen value of k, identifies k clusters of objects based on the objects' proximity to the center of the k groups. K-means is an iterative clustering algorithm that reduces the within-cluster distance at each iteration and converges to a local optimum. The center is determined as the arithmetic average (mean) of each cluster's n-dimensional vector of attributes.

iii. Decision tree classification: In decision trees, at the beginning, the whole training set is considered as the root. Feature values are preferred to be categorical. If the values are continuous then they are discretized prior to building the model. Records are distributed recursively on the basis of attribute values. The order in which attributes are placed as the root or as internal nodes of the tree is determined by a statistical approach. The primary challenge in a decision tree is to identify which attribute to consider at the root node and at each level. This is known as attribute selection.

There are different attribute selection measures to identify the attribute which can be considered as the root node at each level. The popular attribute selection measures are: a) Information gain b) Gini index

IG(S, A) for a set S is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables. Entropy is the measure of uncertainty of a random variable; it characterizes the impurity of an arbitrary collection of examples. The higher the entropy, the greater the information content.

$$H(S) = -\sum_i p_i \log_2 p_i$$

where p_i are the ratios of elements of each label in the set.

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$$

where H(S) is the entropy of the set S and S_v is the subset of S for which attribute A takes the value v.

iv. Moving Average Algorithm: In statistics, a moving average, which is also called a rolling average, running average or moving mean, is a calculation to analyse data points by creating a series of averages of different subsets of the full data set. It is a type of finite impulse response filter. Variations are the simple, cumulative and weighted forms.

v. Support Vector Machine: These are supervised learning models with associated learning algorithms that analyse data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.

Algorithm                                         Parameters                                   Data set
Multiple Instance Learning (MIL)                  Dyskinesia, tremors in hands                 ADNi database, Medpix
K-means clustering, Decision tree classification  Dyskinesia, movement characteristics         100forparkinsons, Clinical trials
Moving Average                                    Tremors                                      100forparkinsons, Medline
Artificial Intelligent Algorithms                 Movement characteristics, direction changes  Kaggle, Medpix
Support vector machine (SVM)                      Rigidity, tremors, handwriting markers       Handwriting samples from 37 medicated PD patients and 38 age- and sex-matched controls

Table 1. Survey Table

II. ISSUES

The individual analysis of each symptom has drawbacks. Handwriting is a complex activity in which other factors can influence motor movement. Speech recognition requires additional steps such as noise removal and speech segmentation. Breath samples have failed to yield clinically relevant results.
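The three moving-average variants named in the algorithm overview (simple, cumulative and weighted) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names, the window size, the weights and the sample values are hypothetical and are not taken from any of the surveyed systems.

```python
from itertools import accumulate


def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive samples."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def cumulative_moving_average(values):
    """Mean of all samples seen so far, one output per input sample."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


def weighted_moving_average(values, weights):
    """Weighted mean over each window; the last weight applies to the newest sample."""
    window = len(weights)
    norm = sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + window])) / norm
            for i in range(len(values) - window + 1)]


# Synthetic amplitude samples (illustrative values, not patient data).
signal = [2.0, 4.0, 6.0, 8.0, 10.0]
sma = simple_moving_average(signal, 3)                # [4.0, 6.0, 8.0]
cma = cumulative_moving_average(signal)               # [2.0, 3.0, 4.0, 5.0, 6.0]
wma = weighted_moving_average(signal, [1, 2, 3])      # newest sample weighted most
```

Each output value depends only on a finite window of inputs, which is why the simple and weighted forms behave as finite impulse response filters. A wider window smooths more but reacts more slowly to changes in the signal.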
III. EXISTING SYSTEMS

P. Bonato, D.M. Sherrill, D.G. Standaert, S.S. Salles and M. Akay proposed data mining techniques to detect motor fluctuations in Parkinson's disease. They used accelerometer (ACC) and surface electromyographic (EMG) signals as input to their algorithms. Although the main focus is on a specific clinical application, the approach can be generalized to applications in which data mining is used to analyse large data sets derived from wearable sensors.

F. Widjaja, C. Y. Shee, W. L. Au, P. Poignet and W. T. Ang proposed "Towards a sensing system for quantification of pathological tremor". Their approach used accelerometers and an sEMG system to obtain tremor motion from the upper limb of the subject. An optical tracking system was used as a ground truth for these sensors. The main concept was a sensing system to quantify pathological tremor in the human upper limb (arm).

Samarjit Das, Breogan Amoedo, Fernando De la Torre and Jessica Hodgins proposed "Detecting Parkinsons' symptoms in uncontrolled home environments: A multiple instance learning approach". The algorithm they used was Multiple Instance Learning (MIL). Its main focus was to develop a monitoring system capable of being used outside of controlled laboratory settings.

Yi Liu, Chonho Lee, Bu-Sung Lee, James K.R. Stevenson and Martin J. McKeown proposed "Analysis of visually guided tracking performance in Parkinson's disease". They applied K-means clustering and decision tree classification to the visually guided tracking performance of PD patients to reveal the differences between dyskinesia and non-dyskinesia patients.

U Kit Pun, Huanying Gu, Ziqian Dong and N. Sertac Artan proposed the use of a visualization tool for detecting PD and for classification of gait data. They followed a statistical and graphical approach using various data mining techniques. The classification process includes data selection, feature selection, visualization, and formula integration.

Peter Drotár, Jiří Mekyska, Irena Rektorová, Lucia Masarová, Zdeněk Smékal and Marcos Faundez-Zanuy proposed a decision support framework for PD based on handwriting markers using the support vector machine algorithm. Since various kinematic aspects are affected in PD, they used these aspects as parameters in each task. These parameters were then fed to the SVM for diagnosis. The results showed an accuracy of over 88%, indicating that handwriting can be used as a valuable marker for the diagnosis of PD.

Shyamal Patel, Richard Hughes, Nancy Huggins, David Standaert, John Growdon, Jennifer Dy and Paolo Bonato studied the use of wearable sensors to predict the severity of symptoms and motor complications in late stage Parkinson's Disease. They analysed the data obtained from wearable sensors in patients with Parkinson's Disease. They implemented Support Vector Machines (SVMs) to predict clinical scores of the severity and performed tests to determine optimal parameters for the SVMs.

J. Synnott, L. Chen, C.D. Nugent and G. Moore proposed "Assessment and visualization of Parkinson's disease tremor". They used a computer vision based approach. A method of tremor amplitude quantification is proposed, and 3D visualization techniques are exploited to provide an intuitive tool for monitoring and assessment of Parkinson's disease using the Moving Average Algorithm.

Cristian F. Pasluosta, Heiko Gassner, Juergen Winkler, Jochen Klucken and Bjoern M. Eskofier proposed "An Emerging Era in the Management of Parkinson's Disease: Wearable Technologies and the Internet of Things". They used wearable technologies and the Internet of Things applied to PD, with an emphasis on how this technological platform may lead to a shift in paradigm in terms of diagnostics and treatment using Artificial Intelligent Algorithms.

IV. DRAWBACKS

The analysis of each individual symptom has its own drawbacks. The limited number of patients tested does not allow additional analysis that would correlate the reliability of the results with the severity of the symptoms, which constrains the progress of our project.

V. CONCLUSIONS

The existing systems include the use of wearable technologies through the implementation of the Internet of Things; handwriting as a marker for the diagnosis of PD using a support vector machine, achieving an accuracy of 88.13%; 3D visualization techniques that provide an intuitive tool for the assessment of Parkinson's; visually guided tracking performance of PD patients using data mining techniques; and voice and speech data to detect Parkinson's.

The proposed system aims at achieving an accuracy of above 90% by using two different symptoms, i.e. voice and finger tapping keystroke. Because of the unavailability of datasets with multiple symptoms, the model is based on the assumption that both the symptoms are of the same patient.

The voice dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders. This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD. The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. Other columns give
values of various attributes such as jitter, shimmer and variations in fundamental frequency.

The second dataset gives information about multiple characteristics of finger movement while typing. The dataset contains keystroke logs collected from over 200 subjects, with and without Parkinson's Disease (PD), as they typed normally on their own computer (without any supervision) over a period of weeks or months (having initially installed a custom keystroke recording app, Tappy).

The datasets have been merged on the basis of the status field (0: healthy, 1: Parkinson's) present in both datasets. The final dataset consists of a total of 195 entries and 40 attributes. Merging is followed by data pre-processing, which includes converting the categorical data and dropping the missing data.

Fig 2 The graph compares the number of patients with PD and without PD. The x axis depicts the status of disease and the y axis depicts the number of entries in the dataset.

Fig 3 The histogram depicts the values of each attribute in graphical form.

This dataset will be analysed using the Support Vector Machine (SVM) algorithm and logistic regression. A Support Vector Machine is a supervised learning algorithm. An SVM models the data into k categories, performing classification by forming a hyperplane in an N-dimensional space. These models are very similar to neural networks. Consider a dataset of N dimensions. The SVM plots the training data in an N-dimensional space. The training data points are then divided into k different regions, depending on their labels, by hyperplanes. After the training phase is complete, the test points are plotted in the same N-dimensional space. Depending on which region the points are located in, they are classified accordingly.

Logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. When selecting the model for the logistic regression analysis, another important consideration is the model fit. Adding independent variables to a logistic regression model will always increase the amount of variance explained, so a better fit alone does not justify a larger model.

Fig 4 Architecture diagram of the model
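The logistic regression step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the two feature columns, the synthetic pre-scaled values, the learning rate and the epoch count are all assumptions chosen for the example, and only the 0/1 meaning of the status label follows the text.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Estimated probability that the row x has status 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Synthetic, already-scaled rows: [voice feature, keystroke feature].
# These values are invented for illustration and are not patient data.
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
status = [0, 0, 0, 1, 1, 1]  # 0 = healthy, 1 = PD

weights, bias = fit_logistic(rows, status)
predicted = [1 if predict_proba(x, weights, bias) >= 0.5 else 0 for x in rows]
```

The model is linear in the log-odds: the weighted sum of the features plus the bias is passed through the sigmoid to obtain a probability, and a threshold of 0.5 turns that probability into a 0 or 1 status. On real data the features would need scaling and the accuracy would have to be measured on held-out recordings rather than on the training rows.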
VI. REFERENCES

1. https://pn.bmj.com/content/15/1/14

2. Data mining techniques to detect motor fluctuations in Parkinson's disease, 2005, by P. Bonato; D.M. Sherrill; D.G. Standaert; S.S. Salles; M. Akay.

3. Towards a sensing system for quantification of pathological tremor by F. Widjaja; C. Y. Shee; W. L. Au; P. Poignet; W. T. Ang.

4. Detecting Parkinsons' symptoms in uncontrolled home environments: A multiple instance learning approach by Samarjit Das; Breogan Amoedo; Fernando De la Torre; Jessica Hodgins.

5. Analysis of visually guided tracking performance in Parkinson's disease by Yi Liu; Chonho Lee; Bu-Sung Lee; James K.R. Stevenson; Martin J. McKeown.

6. Classification and visualization tool for gait analysis of Parkinson's disease by U Kit Pun; Huanying Gu; Ziqian Dong; N. Sertac Artan.

7. Decision Support Framework for Parkinson's Disease Based on Novel Handwriting Markers by Peter Drotár; Jiří Mekyska; Irena Rektorová; Lucia Masarová; Zdeněk Smékal; Marcos Faundez-Za.

8. Using wearable sensors to predict the severity of symptoms and motor complications in late stage Parkinson's Disease by Shyamal Patel; Richard Hughes; Nancy Huggins; David Standaert; John Growdon; Jennifer Dy.

9. Assessment and visualization of Parkinson's disease tremor by J. Synnott; L. Chen; C.D. Nugent; G. Moore.

10. An Emerging Era in the Management of Parkinson's Disease: Wearable Technologies and the Internet of Things by Cristian F. Pasluosta; Heiko Gassner; Juergen Winkler; Jochen Klucken; Bjoern M. Eskofier.

11. 'Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection', Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM. BioMedical Engineering OnLine 2007, 6:23 (26 June 2007).

12. http://archive.ics.uci.edu/ml/datasets/Parkinsons

13. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages.

14. http://circ.ahajournals.org/content/101/23/e215
