
ADHD SIGNAL DIAGNOSIS USING RASPBERRY PI

A project report submitted by

DEVESH SINGH (UR13EC013)


KONKANCHI THARUN (UR13EC031)
ASHUTOSH SINGHAL (UR15EC094)

in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY
in
ELECTRONICS AND COMMUNICATION ENGINEERING
under the supervision of
Dr. S. THOMAS GEORGE

DEPARTMENT OF ELECTRICAL TECHNOLOGY

(Karunya Institute of Technology and Sciences)


(Declared as Deemed-to-be University -under Sec-3 of the UGC Act, 1956)
Karunya Nagar, Coimbatore - 641 114. INDIA

APRIL 2019
KARUNYA INSTITUTE OF TECHNOLOGY AND SCIENCES
(Declared as Deemed-to-be University under Sec-3 of the UGC Act, 1956)

KARUNYA NAGAR, COIMBATORE – 641 114

BONAFIDE CERTIFICATE

This is to certify that the project report entitled “ADHD signal diagnosis using Raspberry Pi”
is a bonafide record of the work of the following candidates, who carried out the project work under my
supervision during the academic year 2018-2019.

Devesh Singh (UR13EC013)


Konkanchi Tharun (UR13EC031)
Ashutosh Singhal (UR15EC094)

Dr. S. THOMAS GEORGE                              Dr. D. NIRMAL
SUPERVISOR                                        HEAD OF DEPARTMENT - UG ECE
ASSOCIATE PROFESSOR                               ASSOCIATE PROFESSOR
ECE PROGRAMME                                     ECE PROGRAMME
DEPARTMENT OF ELECTRICAL TECHNOLOGY               DEPARTMENT OF ELECTRICAL TECHNOLOGY

Submitted for the Full Semester/Half Semester Viva Voce examination held on ……………………….

…………………….. ……………………..
(Internal Examiner) (External Examiner)

ACKNOWLEDGEMENT

First and foremost, we would like to thank Almighty God for all the blessings He
has bestowed upon us to work thus far and finish this project.

We express our gratitude to the Vice Chancellor Dr. P MANNAR JAWAHAR,


and the Registrar Dr. R. ELIJAH BLESSING, Karunya University, for their enduring
leadership.

We extend our thanks to our Director, Engineering and Technology, Dr. PRINCE ARUL RAJ, Karunya University, for his excellent encouragement in the course of this work.

We are very thankful to Dr. D. NIRMAL, Associate Professor & Head, Department of Electronics and Communication Engineering, Karunya University, for his constant readiness in providing help and encouragement at all stages of our project.

Our sincere and special thanks go to our guide, Dr. S. THOMAS GEORGE, Assistant Professor, for his immense help and guidance. We are deeply thankful for his constant support throughout the entire project.

We express our deepest gratitude to our mentors Dr. S. THOMAS GEORGE


and Mrs. MARY NEEBHA for giving us this opportunity and providing us with an
environment to complete our project successfully.

Finally, we would like to extend our deepest appreciation to our family and friends for all that they were to us during the project period.

ABSTRACT

Attention-Deficit/Hyperactivity Disorder (ADHD) is a multi-factorial
and clinically heterogeneous disorder that is prevalent in children
worldwide. It is a neurodevelopmental condition encompassing symptoms of
inattention, hyperactivity and impulsivity that interfere with a child’s daily
functioning. In this work, for a cross-sectional study, children within the
age range of 6 to 10 years were chosen from six different schools in Coimbatore
district under Thondamuthur union. This study helps us to identify the
prevalence of Attention Deficit Hyperactivity Disorder (ADHD) in primary
school children. The data of ADHD patients (i.e. fMRI) were obtained from the
NIRTC website, where pre-processed fMRI images of ADHD patients collected
from different institutes all over the world are made available. In this work,
the available images, which form an enormous dataset, were studied in detail
and the characteristics of the images were identified. Entropy is the amount of
information present in a particular region. Entropy gives information about
brain states, and from these brain states the functionality of neurons can be
inferred. A normal healthy human brain has an entropy value of 3.5336 in males
and 3.5547 in females. The Shannon entropy is calculated and compared with the
normal brain entropy. The results additionally show a high prevalence of ADHD
among primary school children, occurring more among males than females. The
dimensions and characteristics of the images are studied and the Shannon
entropy of all images is computed.

Index Terms - Electroencephalogram, Independent Component


Analysis, Artifact Removal, Local Binary Pattern, Linear Discriminant
Analysis, K-Nearest Neighbor, Support Vector Machine.

TABLE OF CONTENTS

Chapter Page No

ACKNOWLEDGEMENT i

ABSTRACT ii
LIST OF TABLES v
LIST OF FIGURES vii
LIST OF NOMENCLATURE viii

1. INTRODUCTION 1

Statement of Problem 2
Image preprocessing 2
Types of classifiers 3

Naïve Bayes 3

K-Nearest Neighbour (KNN) 3

Support Vector Machine (SVM) 3

Summary 4

2. LITERATURE SURVEY 5

3. IMAGE PRE-PROCESSING 9
Overall Block Diagram 9
Raspberry Pi 10
Independent Component Analysis 10
Ambiguities of ICA 12
Illustration of ICA 12
Summary 15
4. FEATURE EXTRACTION 16
Local Binary Pattern 16
Normalization 18
Energy 20
Entropy 20
Standard Deviation 21
Covariance 22
Summary 22

5. CLASSIFIERS 23
Naïve Bayes 29
K-Nearest Neighbor 30
Algorithm 31
Properties 32
Features of KNN 33
Support Vector Machine 35
Applications 37
Summary 37
6. RESULT AND INFERENCE 38
Result 38
Inference 39
7. CONCLUSION 40

REFERENCES 41

LIST OF TABLES

TABLE NO        DESCRIPTION                                     PAGE NO

6.1 Comparison results of the classifiers 39



LIST OF FIGURES

FIG. NO         DESCRIPTION                                     PAGE NO


1.1 International 10-20 system 2
3.1 Overall block diagram 9
3.2 Complete set of 15 brain maps 14
3.3 The joint distribution of the independent components 14
3.4 The joint distribution of the observed mixtures 15

4.1 An example of LBP computation 17


4.2 Brain maps and its LBP applied result 19
4.3 Histogram of ADHD image 20

5.1 Data sets and test vectors in original 24


5.2 Eigen vector direction in class dependent type 28
5.3 Eigen vector direction in class independent type 29
5.4 Example of K-NN classification 31
5.5 Maximum-margin hyper plane the support vectors 36

LIST OF NOMENCLATURE

SYMBOLS DESCRIPTION

ADHD            Attention Deficit Hyperactivity Disorder

ICA             Independent Component Analysis

PCA             Principal Component Analysis

LBP             Local Binary Pattern

KNN             K-Nearest Neighbour

SVM             Support Vector Machine



CHAPTER 1
INTRODUCTION
Image processing is a very important tool for differentiating the images of
normal patients from those of affected patients. In our project we discuss
ADHD-affected patients. ADHD is a neurodevelopmental, psychiatric and chronic
mental disorder present in children, frequently seen in the preschool and early
school years. ADHD is a multifactorial disorder with a strong genetic basis and
a complex diagnosis. Features of ADHD are hyperactivity, inattention and
impulsivity, together with problems in academic, behavioural, emotional and
social functioning. ADHD impacts focus, self-control and other skills important
in daily life. It is caused by differences in brain anatomy and wiring. ADHD is a
neurodevelopmental disorder, which cannot be seen naturally with our naked eye.
As children grow into adults, the impact of ADHD gradually increases. ADHD
manifests as hyperactivity, inattention and impulsivity.
It is normal to have these symptoms occasionally, but in children with ADHD the
symptoms are severe, occur often and reduce task completion. ADHD can be caused
by parental factors such as the mother’s intake of drugs, alcohol and tobacco
during pregnancy; birth complications such as babies born before the due date or
with terribly low birth weight; exposure to lead or other toxic substances;
extreme negligence, abuse or social deprivation; food additives like artificial
food colouring; or brain injury. Children having this problem face many
difficulties, including academic, social and relationship problems.
ADHD affects 11% of school-age children. ADHD is also known as minimal brain
dysfunction or hyperkinetic reaction of childhood. ADHD is most commonly seen in
children born with a low birth weight, born premature, or whose mothers had
difficult pregnancies.

Figure 1.1. International 10-20 system
STATEMENT OF THE PROBLEM

Diagnosis of ADHD requires evaluation by psychiatrists, clinicians and doctors
expert in ADHD. The problems related to ADHD can be long lasting and
differentiate affected children from normal patients. Most children receive
medication during primary school. For an adolescent or adult to receive a
diagnosis of ADHD, the symptoms need to have been present prior to age 12.
Adults with undiagnosed ADHD can show poorer academic performance, problems at
work and other difficulties.
ADHD symptoms change with age over a period of time. According to the survey
conducted by us, we found that ADHD (Attention Deficit Hyperactivity Disorder)
is predominantly seen in children below 10 years of age. Children with ADHD in
school are mainly noticed by teachers and parents. Early identification of this
disorder helps the children to grow in a healthy environment; the academic
standards of the children and their ability to learn and work can be improved.
Medication can be given using stimulants and non-stimulants. Stimulants act on
brain chemicals like dopamine and norepinephrine, which play essential roles in
thinking and attention. Stimulants are unsafe to use without the proper
supervision of doctors. The second medication option is non-stimulants. These
take more time than stimulants but help in focusing on work and paying attention
to problems. This method allows proper interaction between the doctors and
patients since it is a slower process.
Psychotherapy helps parents and students to focus in a proper manner. The two
therapies which help improve the patient’s mental health are behaviour therapy
and family and marital therapy. Behaviour therapy helps the patient study his or
her behaviour, react to elders, control anger and think before acting. Family
and marital therapy can help family members and spouses find better ways to
handle disruptive behaviours, encourage behaviour changes, and improve
interactions with the patient.

IMAGE PRE-PROCESSING

Independent Component Analysis (the FastICA algorithm) is applied to the EEG signal, which decomposes it into Independent Components (ICs). These ICs are in turn processed by LBP to obtain certain features for classification.

TYPES OF CLASSIFIERS
Three types of classifiers are used:

(1) Naive Bayes classifier

(2) K-Nearest Neighbour (KNN)


(3) Support Vector Machine (SVM)

NAIVE BAYES CLASSIFIER

Naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.

KNN (K-Nearest Neighbour)

A nearest-neighbor classification object, where both distance metric


("nearest") and number of neighbors can be altered. The object classifies new
observations using the predict method.

SVM (Support Vector Machine)

A Support Vector Machine (SVM) is a discriminative classifier formally


defined by a separating hyperplane. In other words, given labeled training data
(supervised learning), the algorithm outputs an optimal hyperplane which
categorizes new examples.
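A minimal sketch (not the project's actual code) of how these three classifier types could be trained on feature vectors with the scikit-learn library; the data, labels and parameter values below are placeholders.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X = np.random.rand(40, 4)          # hypothetical feature vectors (e.g. energy, entropy, std, covariance)
    y = np.random.randint(0, 2, 40)    # hypothetical labels: 0 = non-artifact, 1 = artifact

    for clf in (GaussianNB(), KNeighborsClassifier(n_neighbors=3), SVC(kernel='linear')):
        clf.fit(X, y)                               # supervised training on the labelled features
        print(type(clf).__name__, clf.predict(X[:5]))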

SUMMARY

In this chapter we have discussed the image pre-processing techniques and described the types of classifiers. The EEG signal is processed using EEGLAB. ICA sub-divides the signal into linearly independent components known as ICs. These ICs are further processed using LBP and their features are obtained. These features are given to the classifiers for further classification.

CHAPTER 2
LITERATURE SURVEY

Independent Component Analysis of Electroencephalographic


data (Scott Makeig, Anthony J. Bell, Tzyy-Ping Jung, Terrence J.
Sejnowski)

The joint problems of EEG source segregation, identification, and localization are very
difficult, since the problem of determining brain electrical sources from potential
patterns recorded on the scalp surface is mathematically underdetermined. Recent
efforts to identify EEG sources have focused mostly on performing spatial segregation
and localization of source activity. By applying the ICA algorithm of Bell and
Sejnowski, we attempt to completely separate the twin problems of source
identification and source localization. The ICA algorithm derives independent sources
from highly correlated EEG signals statistically and without regard to the physical
location or configuration of the source generators. ICA appears to be a promising new
analysis tool for human EEG and ERP research. It can isolate a wide range of artifacts
to a few output channels while removing them from remaining channels. The algorithm
also appears to be useful for decomposing evoked response data into spatially distinct
sub components, while measures of non stationarity in the ICA source solution may be
useful for observing brain state changes.

EEG Artifact Elimination by Extraction of ICA Component


features using Image Processing Algorithms (T. Radüntz, J.
Scouten, O. Hochmuth, B. Meffert)

Artifact rejection is a central issue when dealing with electroencephalogram


recordings. Although independent component analysis (ICA) separates data in linearly
independent components (IC), the classification of these components as artifact or EEG
signal still requires visual inspection by experts. We achieve automated artifact
elimination using linear discriminant analysis (LDA) for classification of feature
vectors extracted from ICA components via image processing algorithms. We compare
the performance of this automated classifier to visual classification by experts and
identify range filtering as a feature extraction method with great potential for
automated IC artifact recognition (accuracy rate 88%). We obtain almost the same

level of recognition performance for geometric features and local binary pattern (LBP)
features.

Automatic removal of eye movement and blink artifacts from


EEG data using blind component separation (CARRIE A.
JOYCE, IRINA F. GORODNITSKY, MARTA KUTAS)

Signals from eye movements and blinks can be orders of magnitude larger than brain-
generated electrical potentials and are one of the main sources of artifacts in
electroencephalographic (EEG) data. Rejecting contaminated trials causes substantial
data loss, and restricting eye movements/blinks limits the experimental designs
possible and may impact the cognitive processes under investigation. This article
presents a method based on blind source separation (BSS) for automatic removal of
electroocular artifacts from EEG data. BSS is a signal-processing methodology that
includes independent component analysis (ICA). In contrast to previously explored
ICA-based methods for artifact removal, this method is automated.

EEG signal classification using PCA, ICA, LDA and support


Vector Machines (Abdulhamit Subasi, M. Ismail Gursoy)

In this work, we proposed a versatile signal processing and analysis framework for
Electroencephalogram (EEG). Within this framework the signals were decomposed
into the frequency sub-bands using DWT and a set of statistical features was extracted
from the sub-bands to represent the distribution of wavelet coefficients. Principal
components analysis (PCA), independent components analysis (ICA) and linear
discriminant analysis (LDA) is used to reduce the dimension of data. Then these
features were used as an input to a support vector machine (SVM) with two discrete
outputs: epileptic seizure or not. The performance of the classification process with the different methods is presented and compared to show the excellence of the classification process.

Diagnosis of Diabetes Mellitus using K Nearest Neighbour


Algorithm ( Krati Saxena, Dr. Zubair Khan, Shefali Singh)

KNN is a method which is used for classifying objects based on closest training
examples in the feature space. KNN is the most basic type of instance-based learning
or lazy learning. It assumes all instances are points in n-dimensional space. A distance
measure is needed to determine the “closeness” of instances. KNN classifies an
instance by finding its nearest neighbors and picking the most popular class among the
neighbors.

Application of k- Nearest Neighbour Classification in Medical


Data Mining (Hassan Shee Khamis, Kipruto W. Cheruiyot, Stephen
Kimani)

The k-nearest neighbours algorithm is one of the simplest machine learning


algorithms. It is simply based on the idea that “objects that are ‘near’ each other will
also have similar characteristics. Thus if you know the characteristic features of one of
the objects, you can also predict it for its nearest neighbour.” k-NN is an improvisation
over the nearest neighbour technique. It is based on the idea that any new instance can
be classified by the majority vote of its ‘k’ neighbours, - where k is a positive integer,
usually a small number. kNN is one of the most simple and straight forward data
mining techniques. It is called Memory-Based Classification as the training examples
need to be in the memory at run-time [14]. When dealing with continuous attributes the
difference between the attributes is calculated using the Euclidean distance.

Multiclass Support Vector Machines for EEG - Signals Classification


(Inan Guler and Elif Derya Ubeyli)

The SVM is a binary classifier, which can be extended by fusing several of its kind into
a multiclass classifier. In this paper, we fuse SVM decisions using the ECOC approach,
adopted from digital communication theory. In the ECOC approach, up to 2^(n−1) − 1
(where n is the number of classes) SVMs are trained, each of them aimed at separating
a different combination of classes. For three classes (A, B, and C) we need three
classifiers; one SVM classifies A from B and C, a second SVM classifies B from A and
C, and a third SVM classifies C from A and B. The multiclass-classifier output code
for a pattern is a combination of the targets of all the separate SVMs. In our example,
vectors from classes A, B, and C have codes (1,−1,−1), (−1,1,−1), and (−1,−1,1),
respectively. If each of the separate SVMs classifies a pattern correctly, the
multiclass-classifier target code is met and the ECOC approach reports no error for
that pattern. However, if at least one of the SVMs misclassifies the pattern, the class
selected for this pattern is the one whose target code is closest in the Hamming
distance sense to the actual output code, and this may be an erroneous decision.

Support Vector Machine – a Large Margin Classifier to


Diagnose Skin Illnesses (Krupal S. Parikh, Trupti P. Shah)

Support Vector Machine (SVM) has been very popular as a large margin classifier due
to its robust mathematical theory. It has many practical applications in a number of
fields, such as in bioinformatics, in medical science for the diagnosis of diseases, in
various engineering applications for model prediction, and in finance for forecasting.
It is widely used in medical science because of its powerful learning ability in
classification. It can classify highly nonlinear data using kernel functions. This paper
proposes and analyses a diagnostic model to classify the most common skin illnesses
and also provides a useful insight into the SVM algorithm. In rural areas where people
are generally treated by paramedical staff, skin patients are not subject to proper
diagnosis, resulting in mistreatment. We think SVM is a good tool for proper diagnosis.

CHAPTER 3
IMAGE PRE- PROCESSING

OVERALL BLOCK DIAGRAM

The overall block diagram describes how the EEG signal is taken and image pre-processing is done using EEGLAB. ICA converts the signal into linearly independent components known as topoplots. These topoplots are further used in LBP for image feature extraction. The features obtained are used to train the classifiers and help in the classification of artifacts and non-artifacts. The rejected artifacts are removed and the remaining topoplots are used for reconstruction of the denoised signal.

EEG Signal → ICA → Brain maps → Feature Extraction (LBP) → Classification Results → ADHD maps / Non-ADHD maps

Figure 3.1. Overall block diagram



RASPBERRY PI 3
The Raspberry Pi is a computer, very like the computers with which
you’re already familiar. It uses a different kind of processor, so you can’t
install Microsoft Windows on it. But you can install several versions of the
Linux operating system that look and feel very much like Windows. If you
want to, you can use the Raspberry Pi to surf the internet, send an email or
write a letter using a word processor. But you can also do so much more.
Easy to use but powerful, affordable and (as long as you’re careful)
difficult to break, the Raspberry Pi is the perfect tool for aspiring computer
scientists. What do we mean by computer science? We mean learning how
computers work so you can make them do what you want them to do, not
what someone else thinks you should do with them.
And who do we mean by computer scientists? We mean you. You may
finish this manual and decide you want to be the next Tim Berners-Lee, but even
if you don’t, we hope you have fun, learn something new and get a feel for
how computers work. Because no matter what you do in life, computers are
bound to be part of it.
Hardware
The Raspberry Pi hardware has evolved through several versions that
feature variations in memory capacity and peripheral-device support.

The hardware block diagram of the Model B and B+ also describes the Model A, A+, and
the Pi Zero, which are similar but lack the Ethernet and USB hub components. The
Ethernet adapter is internally connected to an additional USB port. In the Model A, A+,
and the Pi Zero, the USB port is connected directly to the system on a chip (SoC). On
the Pi 1 Model B+ and later models the USB/Ethernet chip contains a five-port USB hub,
of which four ports are available, while the Pi 1 Model B only provides two. On the Pi
Zero, the USB port is also connected directly to the SoC, but it uses a micro USB (OTG)
port.
PROCESSOR
The Broadcom BCM2835 SoC used in the first generation Raspberry
Pi includes a 700 MHz ARM1176JZF-S processor, VideoCore IV graphics
processing unit (GPU), and RAM. It has a level 1 (L1) cache of 16 KB and a
level 2 (L2) cache of 128 KB. The level 2 cache is used primarily by the GPU.
The SoC is stacked underneath the RAM chip, so only its edge is visible. The
1176JZ(F)-S is the same CPU used in the original iPhone, although at a
higher clock rate, and mated with a much faster GPU.
The earlier V1.1 model of the Raspberry Pi 2 used a Broadcom
BCM2836 SoC with a 900 MHz 32-bit, quad-core ARM Cortex-
A7 processor, with 256 KB shared L2 cache. The Raspberry Pi 2 V1.2 was
upgraded to a Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-
core ARM Cortex-A53 processor, the same SoC which is used on the
Raspberry Pi 3, but under clocked (by default) to the same 900 MHz CPU
clock speed as the V1.1. The BCM2836 SoC is no longer in production as of
late 2016.
The Raspberry Pi 3+ uses a Broadcom BCM2837B0 SoC with a
1.4 GHz 64-bit quad-core ARM Cortex-A53 processor, with 512 KB shared L2 cache.

PERFORMANCE
While operating at 700 MHz by default, the first generation
Raspberry Pi provided a real-world performance roughly equivalent to
0.041 GFLOPS. On the CPU level the performance is similar to a
300 MHz Pentium II of 1997–99. The GPU provides 1 Gpixel/s or
1.5 Gtexel/s of graphics processing or 24 GFLOPS of general purpose
computing performance. The graphical capabilities of the Raspberry Pi are
roughly equivalent to the performance of the Xbox of 2001.
Raspberry Pi 2 V1.1 included a quad-core Cortex-A7 CPU running at
900 MHz and 1 GB RAM. It was described as 4–6 times more powerful than
its predecessor. The GPU was identical to the original. In parallelised
benchmarks, the Raspberry Pi 2 V1.1 could be up to 14 times faster than a
Raspberry Pi 1 Model B+.
The Raspberry Pi 3, with a quad-core ARM Cortex-A53 processor, is
described as having ten times the performance of a Raspberry Pi 1. This was
suggested to be highly dependent upon task threading and instruction set use.
Benchmarks showed the Raspberry Pi 3 to be approximately 80% faster than
the Raspberry Pi 2 in parallelised tasks.
PYTHON
Python is a wonderful and powerful language, and with the Raspberry Pi it lets
you connect your project to the real world. Python syntax is very clean, with
an emphasis on readability, and it uses standard English keywords. The easiest
introduction to Python is through the Integrated Development and Learning
Environment (IDLE), a Python development environment. Python also offers an
ocean of libraries. A Python library is a collection of functions and methods
that allows many actions to be performed while also decreasing the size of the code.
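As a small illustration of that point about libraries (not taken from the project code), the same task written in plain Python and with the NumPy library; the pixel values are made up.

    import numpy as np

    pixels = [12, 40, 255, 3, 97]        # hypothetical pixel values
    total = 0
    for p in pixels:                     # plain Python: explicit loop
        total += p
    print(total / len(pixels))

    print(np.mean(pixels))               # one NumPy call does the same job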

INDEPENDENT COMPONENT ANALYSIS

To rigorously define ICA we can use a statistical “latent variables”


model. Assume that we observe n linear mixtures x_1, …, x_n of n independent components

x_j = a_j1 s_1 + a_j2 s_2 + … + a_jn s_n, for all j.   (3.1)

We have now dropped the time index t; in the ICA model, we assume
that each mixture xj as well as each independent component sk is a random
variable, instead of a proper time signal. The observed values xj(t), e.g., the
microphone signals in the cocktail party problem, are then a sample of this
random variable. Without loss of generality, we can assume that both the
mixture variables and the independent components have zero mean: If this is
not true, then the observable variables xi can always be centered by subtracting
the sample mean, which makes the model zero-mean.

It is convenient to use vector-matrix notation instead of the sums like in


the previous equation. Let us denote by x the random vector whose elements
are the mixtures x1, ...,xn, and likewise by s the random vector with elements s1,
..., sn. Let us denote by A the matrix with elements ai j . Generally, bold lower
case letters indicate vectors and bold upper-case letters denote matrices. All
vectors are understood as column vectors; thus xT , or the transpose of x, is a
row vector. Using this vector-matrix notation, the above mixing model is
written as
x = As. (3.2)

Sometimes we need the columns of matrix A; denoting them by aj the


model can also be written as
x = Σ_{i=1}^{n} a_i s_i   (3.3)

The statistical model in Equation (3.2) is called independent component


analysis, or ICA model. The ICA model is a generative model, which means
that it describes how the observed data are generated by a process of mixing the
components si. The independent components are latent variables, meaning that
they cannot be directly observed. Also the mixing matrix is assumed to be
unknown. All we observe is the random vector x, and we must estimate both A
and s using it. This must be done under as general assumptions as possible.

The starting point for ICA is the very simple assumption that the components s_i are
statistically independent. It will be seen below that we must also assume that the
independent components have non-Gaussian distributions. However, in the basic model we
do not assume these distributions to be known (if they are known, the problem is
considerably simplified). For simplicity, we also assume that the unknown mixing matrix
is square, but this assumption can sometimes be relaxed. Then, after estimating the
matrix A, we can compute its inverse, say W, and obtain the independent components
simply by:
s = Wx. (3.4)

ICA is very closely related to the method called blind source separation (BSS) or blind
signal separation. A “source” means here an original signal, i.e. an independent
component, like a speaker in the cocktail-party problem. “Blind” means that we know
very little, if anything, about the mixing matrix, and make few assumptions about the
source signals. ICA is one method, perhaps the most widely used, for performing blind
source separation.

In many applications, it would be more realistic to assume that there is


some noise in the measurements which would mean adding a noise term in the
model. For simplicity, we omit any noise terms, since the estimation of the
noise-free model is difficult enough in itself, and seems to be sufficient for
many applications.
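A minimal sketch, assuming the scikit-learn implementation of FastICA, of how the noise-free model x = As can be estimated; the two source signals and the mixing matrix below are synthetic placeholders, not the project's EEG data.

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    s = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two hypothetical independent sources
    A = np.array([[1.0, 0.5],
                  [0.5, 2.0]])                         # mixing matrix (unknown in practice)
    x = s @ A.T                                        # observed mixtures, x = As

    ica = FastICA(n_components=2, random_state=0)
    s_est = ica.fit_transform(x)     # estimated independent components (up to order, sign, scale)
    W = ica.components_              # estimated unmixing matrix, so s ≈ Wx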
Ambiguities of ICA
In the ICA model in Equation (3.2), it is easy to see that the following
ambiguities will hold:
1. We cannot determine the variances (energies) of the independent
components. The reason is that, both s and A being unknown, any scalar
multiplier in one of the sources si could always be cancelled by dividing the
corresponding column ai of A by the same scalar; see Equation (3.3). As a
consequence, we may quite as well fix the magnitudes of the independent
components; as they are random variables, the most natural way to do this is to
assume that each has unit variance: E{si2}= 1. Then the matrix A will be
adapted in the ICA solution methods to take into account this restriction. Note
that this still leaves the ambiguity of the sign: we could multiply the
independent component by −1 without affecting the model. This ambiguity is,
fortunately, insignificant in most applications.
2. We cannot determine the order of the independent components. The
reason is that, again both s and A being unknown, we can freely change the
order of the terms in the sum in Equation (3.3), and call any of the independent
components the first one. Formally, a permutation matrix P and its inverse can
be substituted in the model to give x = AP−1Ps. The elements of Ps are the
original independent variables s j , but in another order. The matrix AP−1 is just
a new unknown mixing matrix, to be solved by the ICA algorithms.

Illustration of ICA

To illustrate the ICA model in statistical terms, consider two independent


components that have the following uniform distributions:

p(s_i) = 1/(2√3)  if |s_i| ≤ √3,  and 0 otherwise.   (3.5)
The range of values for this uniform distribution was chosen so as to make the mean
zero and the variance equal to one, as was agreed in the previous section. The joint
density of s1 and s2 is then uniform on a square. This follows from the basic
definition that the joint density of two independent variables is just the product of
their marginal densities; we simply need to compute the product. The joint density is
illustrated in Figure (3.3) by showing data points randomly drawn from this
distribution.
Now let us mix these two independent components. Let us take the following mixing matrix:

A0 = | 2  3 |
     | 2  1 |        (3.6)

This gives us two mixed variables, x1 and x2. It is easily computed that the mixed data
has a uniform distribution on a parallelogram. Note that the random variables x1 and x2
are no longer independent; an easy way to see this is to consider whether it is
possible to predict the value of one of them, say x2, from the value of the other.
Clearly, if x1 attains one of its maximum or minimum values, then this completely
determines the value of x2. They are therefore not independent. (For the variables s1
and s2 the situation is different: from Figure (3.3) it can be seen that knowing the
value of s1 does not in any way help in guessing the value of s2.) The problem of
estimating the data model of ICA is now to estimate the mixing matrix A0 using only the
information contained in the mixtures x1 and x2. Actually, you can see from Figure
(3.4) an intuitive way of estimating A: the edges of the parallelogram are in the
directions of the columns of A. This means that we could, in principle, estimate the
ICA model by first estimating the joint density of x1 and x2, and then locating the
edges. So, the problem seems to have a solution.
In reality, however, this would be a very poor method because it only
works with variables that have exactly uniform distributions. Moreover, it
would be computationally quite complicated. What we need is a method that
works for any distributions of the independent components, and works fast and
reliably. Next we shall consider the exact definition of independence before
starting to develop methods for estimation of the ICA model.
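A short sketch of this illustration, assuming NumPy: two uniform, zero-mean, unit-variance sources are mixed by A0, and the resulting x1, x2 are no longer independent (their covariance matrix is no longer diagonal).

    import numpy as np

    n = 5000
    s = np.random.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))  # uniform sources, zero mean, unit variance
    A0 = np.array([[2.0, 3.0],
                   [2.0, 1.0]])
    x = s @ A0.T                       # mixtures; a scatter plot of x forms the parallelogram
    print(np.cov(x, rowvar=False))     # off-diagonal terms show the correlation introduced by mixing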

Figure 3.2. Complete set of 15 Brain maps

Figure 3.3 The joint distribution of the independent components s1 and s2 with uniform distributions.
Horizontal axis: s1, vertical axis: s2.

Figure 3.4. The joint distribution of the observed mixtures x1 and x2. Horizontal axis:
x1, vertical axis: x2.

SUMMARY

In this chapter we have discussed EEGLAB and described the ICA technique used. As seen above, EEGLAB is an interactive Matlab toolbox for processing continuous and event-related EEG data using independent component analysis (ICA), time/frequency analysis, and other methods including artifact rejection. Using the Infomax algorithm we obtained linearly independent components. These ICs are further used for classification and artifact rejection.

CHAPTER 4
FEATURE EXTRACTION

LOCAL BINARY PATTERN


Local Binary Pattern (LBP) is a simple yet very efficient texture operator
which labels the pixels of an image by thresholding the neighbourhood of each
pixel and considers the result as a binary number. Due to its discriminative
power and computational simplicity, LBP texture operator has become a
popular approach in various applications. It can be seen as a unifying approach
to the traditionally divergent statistical and structural models of texture
analysis. Perhaps the most important property of the LBP operator in real-
world applications is its robustness to monotonic gray-scale changes caused,
for example, by illumination variations. Another important property is its
computational simplicity, which makes it possible to analyze images in
challenging real-time settings.
The basic idea for developing the LBP operator was that two-dimensional surface
textures can be described by two complementary measures: local spatial patterns and
gray-scale contrast. The original LBP operator forms labels for the image pixels by
thresholding the 3 x 3 neighbourhood of each pixel with the center value and
considering the result as a binary number. The histogram of these 2^8 = 256 different
labels can then be used as a texture descriptor. This operator, used jointly with a
simple local contrast measure, provided very good performance in unsupervised texture
segmentation. After this, many related approaches have been developed for texture and
colour texture segmentation.
The LBP operator was extended to use neighbourhoods of different sizes. Using a
circular neighbourhood and bilinearly interpolating values at non-integer pixel
coordinates allows any radius and number of pixels in the neighbourhood. The gray-scale
variance of the local neighbourhood can be used as the complementary contrast measure.
In the following, the notation (P, R) will be used for pixel neighbourhoods, which
means P sampling points on a circle of radius R.

Figure 4.1: An example of LBP computation.

Another extension to the original operator is the definition of so-


called uniform patterns, which can be used to reduce the length of the feature
vector and implement a simple rotation-invariant descriptor. This extension
was inspired by the fact that some binary patterns occur more commonly in
texture images than others. A local binary pattern is called uniform if the
binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa
when the bit pattern is traversed circularly. For example, the patterns 00000000
(0 transitions), 01110000 (2 transitions) and 11001111 (2 transitions) are
uniform whereas the patterns 11001001 (4 transitions) and 01010010 (6
transitions) are not. In the computation of the LBP labels, uniform patterns are
used so that there is a separate label for each uniform pattern and all the non-
uniform patterns are labelled with a single label. For example, when using
(8,R) neighbourhood, there are a total of 256 patterns, 58 of which are uniform,
which yields 59 different labels.

LBP is a powerful texture feature classification method. The features are calculated by
comparing each pixel with its neighbours. If the center pixel value is greater than the
neighbour's value the result is 1, and if it is less, the result is 0. Thus, by
comparing all 8 neighbour pixels, we obtain an eight-digit binary number, whose value
is then converted into a decimal number. The following notation is used for the LBP
operator: LBP_{P,R}^{u2}. The
subscript represents using the operator in a (P, R) neighbourhood. Superscript u2
stands for using only uniform patterns and labelling all remaining patterns with a
single label. After the LBP-labelled image f_l(x, y) has been obtained, the LBP
histogram can be defined as

H_i = Σ_{x,y} I{ f_l(x, y) = i },   i = 0, …, n − 1,   (4.1)

in which n is the number of different labels produced by the LBP operator, and
I{A} is 1 if A is true and 0 if A is false.
When the image patches whose histograms are to be compared have
different sizes, the histograms must be normalized to get a coherent description

N_i = H_i / Σ_{j=0}^{n−1} H_j   (4.2)

An example of a topoplot and LBP applied to a topoplot is shown in Figure (4.2). From this LBP result the features are taken for classification.
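A hedged sketch, assuming the scikit-image implementation of LBP, of computing the uniform (u2) labels and their normalized histogram for one brain map; the file name 'topoplot.png' is only a placeholder.

    import numpy as np
    from skimage.io import imread
    from skimage.feature import local_binary_pattern

    img = imread('topoplot.png', as_gray=True)                   # hypothetical brain-map image
    P, R = 8, 1
    lbp = local_binary_pattern(img, P, R, method='nri_uniform')  # u2 patterns: 59 labels for P = 8

    hist, _ = np.histogram(lbp, bins=59, range=(0, 59))
    hist = hist / hist.sum()                                     # normalized histogram, as in Equation (4.2)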

Normalization
After obtaining the LBP of the image, normalization is done in order to
bring consistency in the dynamic range of images. In image processing,
normalization is a process which is used to change the range of pixel intensity
values. It converts the image or any type of signal into a range that is more
familiar. After the normalization, we will obtain the image features such as
energy, entropy, standard deviation and covariance of the images. Energy is
defined based on a normalized image. It shows how the gray levels are
distributed. The entropy describes how much randomness or uncertainty is present in a
signal or an image. In image processing, standard deviation is the estimate of the
underlying brightness probability distribution. The covariance matrix is used to
calculate the spectral variability of a signal. These attributes are used for
classification.

Figure 4.2. Brain map and its LBP applied result.

Normalization transforms an n-
dimensional grayscale image

I : {X ⊆ R^n} → {Min, …, Max}   (4.3)

with intensity values in the range (Min, Max), into a new image

I_N : {X ⊆ R^n} → {newMin, …, newMax}   (4.4)

with intensity values in the range (newMin, newMax). The linear normalization
of a grayscale digital image is performed according to the formula

I_N = (I − Min) · (newMax − newMin) / (Max − Min) + newMin   (4.5)
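A minimal sketch of the linear normalization in Equation (4.5) using NumPy; the image and the new range below are placeholders.

    import numpy as np

    def normalize(img, new_min=0.0, new_max=1.0):
        old_min, old_max = img.min(), img.max()
        return (img - old_min) * (new_max - new_min) / (old_max - old_min) + new_min

    img = np.random.randint(0, 256, size=(64, 64)).astype(float)  # hypothetical grayscale image
    img_n = normalize(img)                                        # intensities now lie in [0, 1]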

Figure 4.3. Histogram of ADHD image

Energy
Energy is defined based on a normalized image. Energy shows how the gray levels are distributed. When the number of gray levels is low, the energy is high. The energy of an image summarizes the information present in the image.

Entropy
Image entropy is a quantity which is used to describe the amount of information which
must be coded for by a compression algorithm. An image that is perfectly flat has an
entropy of zero, so it can be compressed to a relatively small size. High-entropy
images, which have a great deal of contrast from one pixel to the next, cannot be
compressed as much as low-entropy images.

H(X) = Σ_i P(x_i) I(x_i) = − Σ_i P(x_i) log_b P(x_i)   (4.6)

In the above expression, P(x_i) is the probability that the difference between two
adjacent pixels is equal to i. For a high-dimensional discrete random variable
X = (X_1, …, X_d) ∈ R^d that has a probability mass function p(x_1, …, x_d), the
entropy formula in Equation (4.6) can be extended to

H(X) = Σ_{x_1, …, x_d} p(x_1, …, x_d) log( 1 / p(x_1, …, x_d) )   (4.7)

Note that if the X_i’s are independent and identically distributed with a p.m.f. q for
all i, it is easy to see that H(X) = d · H(q). In information theory, an information
source that produces such a random variable is usually called stationary and
memoryless. Note that, in a general stationary source, i.e., if the X_i’s are
identically distributed with q, then H(X) ≤ d · H(q). That is, the joint random
variable cannot contain more information than the sum of the individual information
entropies of the components. The upper bound is only achieved when all components are
independent. A simple model would assume that each pixel is an i.i.d. realization. The
normalized histogram can be an estimate of the underlying probability of pixel
intensities, i.e., p(i) = h_U(i)/N, where h_U(i) denotes the histogram entry of
intensity value i in image U and N is the total number of pixels of U. Using this
model, we can compute the entropy of the image as
H(U) = Σ_i (h_U(i)/N) log( N / h_U(i) ), which for the image shown in Figure 4.2 is
approximately 0.4100.
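A minimal sketch of this histogram-based entropy estimate using NumPy; the image below is a random placeholder, so its entropy value will differ from the one quoted above.

    import numpy as np

    U = np.random.randint(0, 256, size=(64, 64))      # hypothetical grayscale image
    h, _ = np.histogram(U, bins=256, range=(0, 256))
    N = U.size
    p = h[h > 0] / N                                  # estimated pixel-intensity probabilities
    H = np.sum(p * np.log2(1.0 / p))                  # H(U) = sum_i (h(i)/N) log(N/h(i))
    print(H)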

Standard Deviation

The unbiased estimate of the standard deviation, s_a, of the brightnesses within a
region R with Λ pixels is called the sample standard deviation and is given by:

s_a = √( (1 / (Λ − 1)) Σ_{m,n∈R} ( a[m, n] − m_a )² )

    = √( ( Σ_{m,n∈R} a²[m, n] − Λ·m_a² ) / (Λ − 1) )   (4.8)

where m_a is the average brightness over the region R.


Covariance
Covariance is a measure of how changes in one variable are associated with changes in
a second variable. Covariance measures the degree to which two variables are linearly
associated. The covariance matrix is used to capture the spectral variability of a
signal. The function C = cov(A) returns the covariance: if A is a vector of
observations, C is the scalar-valued variance; if A is a matrix whose columns represent
random variables and whose rows represent observations, C is the covariance matrix with
the corresponding column variances along the diagonal.
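A hedged sketch, assuming NumPy, of how the four features described in this chapter could be computed from an LBP-labelled image; the exact definitions used in the project may differ.

    import numpy as np

    def image_features(lbp_image):
        hist, _ = np.histogram(lbp_image, bins=256, range=(0, 256))
        p = hist / hist.sum()                       # normalized histogram (probabilities)
        p_nz = p[p > 0]
        energy = np.sum(p ** 2)                     # high when few gray levels dominate
        entropy = -np.sum(p_nz * np.log2(p_nz))     # randomness of the intensities
        std = np.std(lbp_image, ddof=1)             # sample standard deviation of the pixels
        cov = np.cov(lbp_image, rowvar=False)       # covariance matrix (columns as variables)
        return energy, entropy, std, cov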

SUMMARY

In this chapter we have discussed feature extraction and described the various features extracted using LBP. Local binary pattern is a texture classification method used for obtaining image features from the independent components. After obtaining the LBP of the images, certain features are taken for the purpose of classification. The features obtained are energy, entropy, standard deviation and covariance.

CHAPTER 5

CLASSIFIERS

Classification is a process in which the attributes obtained in Chapter 4 are used for identifying the artifacts. This project uses various classifiers, namely Naive Bayes, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM).

NAIVE BAYES

Naive Bayes is a simple technique for constructing classifiers: models


that assign class labels to problem instances, represented as vectors
of feature values, where the class labels are drawn from some finite set. There is
not a single algorithm for training such classifiers, but a family of algorithms
based on a common principle: all naive Bayes classifiers assume that the value of
a particular feature is independent of the value of any other feature, given the
class variable. For example, a fruit may be considered to be an apple if it is red,
round, and about 10 cm in diameter. A naive Bayes classifier considers each of
these features to contribute independently to the probability that this fruit is an
apple, regardless of any possible correlations between the color, roundness, and
diameter features.
For some types of probability models, naive Bayes classifiers can be
trained very efficiently in a supervised learning setting. In many practical
applications, parameter estimation for naive Bayes models uses the method
of maximum likelihood; in other words, one can work with the naive Bayes model
without accepting Bayesian probability or using any Bayesian methods.
Despite their naive design and apparently oversimplified assumptions,
naive Bayes classifiers have worked quite well in many complex real-world
situations. In 2004, an analysis of the Bayesian classification problem showed that
there are sound theoretical reasons for the apparently implausible efficacy of
naive Bayes classifiers. Still, a comprehensive comparison with other
classification algorithms in 2006 showed that Bayes classification is outperformed
by other approaches, such as boosted trees or random forests. An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters necessary for classification.
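A minimal sketch of the naive Bayes idea itself, assuming Gaussian per-feature likelihoods: the class score is the prior times the product of the individual feature likelihoods. The data and labels below are random placeholders.

    import numpy as np
    from scipy.stats import norm

    X = np.random.rand(60, 4)               # hypothetical feature vectors
    y = np.random.randint(0, 2, 60)         # hypothetical class labels

    def nb_predict(x):
        scores = []
        for c in (0, 1):
            Xc = X[y == c]
            prior = len(Xc) / len(X)
            # independence assumption: multiply the per-feature Gaussian likelihoods
            likelihood = np.prod(norm.pdf(x, Xc.mean(axis=0), Xc.std(axis=0) + 1e-9))
            scores.append(prior * likelihood)
        return int(np.argmax(scores))

    print(nb_predict(X[0]))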

K-NEAREST NEIGHBOUR (KNN)

K- Nearest Neighbour is a method which is used for classifying objects


based on closest training examples in the feature space. It assumes all instances
are points in n-dimensional space. A distance measure is needed to determine
the “closeness” of instances. KNN classifies an instance by finding its nearest
neighbours and picking the most popular class among the neighbours. The
Classification is done by comparing feature vectors of the different points in a
space region. KNN is an improvisation over the nearest neighbour technique. It
is based on the idea that any new instance can be classified by the majority vote
of its ‘k’ neighbours, - where k is a positive integer, usually a small number.
KNN is one of the most simple and straight forward data mining
techniques. It is called Memory-Based Classification as the training examples
need to be in the memory at run-time. When dealing with continuous attributes
the difference between the attributes is calculated using the Euclidean distance.
A major problem when dealing with the Euclidean distance formula is that large values
frequently swamp the smaller ones. For example, in the records of patients with heart
disease the cholesterol measure ranges between 100 and 190 while the age measure ranges
between 40 and 80, so the influence of the cholesterol measure will be higher than that
of age. To overcome this problem the continuous attributes are normalized so that they
have the same influence on the distance measure between instances.

Algorithm

a) Take a sample dataset of n columns and m rows, named R, in which the first n−1 columns form the input vector and the nth column is the output vector.
b) Take a test dataset of n−1 attributes and y rows, named P.
c) Find the Euclidean distance between every row of R and every row of P with the help of the formula

Euclidean Distance = √( Σ_{i=1}^{n−1} (R_i − P_i)² )   (5.13)

d) Then decide a value of K, the number of nearest neighbours.
e) Then, with the help of the K minimum Euclidean distances, find out the nth (output) column of each of those neighbours.
f) Assign the output value that occurs most often among these neighbours (a short sketch of these steps is given after the next paragraph).

KNN is a highly effective inductive inference method for noisy training data and complex target functions. The target function for a whole space may be described as a combination of less complex local approximations. In KNN, learning is very simple and classification is time consuming.
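A hedged sketch of the steps listed above, written with plain NumPy and Euclidean distance; the training data here are placeholders, not the project's LBP features.

    import numpy as np

    def knn_predict(train_X, train_y, test_x, k=3):
        d = np.sqrt(((train_X - test_x) ** 2).sum(axis=1))   # Euclidean distances (step c)
        nearest = np.argsort(d)[:k]                          # indices of the k closest rows (step d)
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        return labels[np.argmax(counts)]                     # most popular class among the neighbours

    train_X = np.random.rand(30, 4)                          # hypothetical training features
    train_y = np.random.randint(0, 2, 30)                    # hypothetical class labels
    print(knn_predict(train_X, train_y, np.random.rand(4)))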

Figure 5.4. Example of K-NN classification

The test sample (green circle) should be classified either to the first
class of blue squares or to the second class of red triangles. If k = 3 (solid line
circle) it is assigned to the second class because there are 2 triangles and only 1
square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the
first class (3 squares vs. 2 triangles inside the outer circle).

PROPERTIES

KNN is a special case of a variable-bandwidth, kernel density "balloon"


estimator with a uniform kernel. The naive version of the algorithm is easy to
implement by computing the distances from the test example to all stored
examples, but it is computationally intensive for large training sets. Using an
appropriate nearest neighbor search algorithm makes KNN computationally
tractable even for large data sets. Many nearest neighbor search algorithms
have been proposed over the years; these generally seek to reduce the number
of distance evaluations actually performed.
KNN has some strong consistency results. As the amount of data approaches
infinity, the two-class KNN algorithm is guaranteed to yield an error rate no
worse than twice the Bayes error rate (the minimum achievable error rate
given the distribution of the data). Various improvements to the KNN speed
are possible by using proximity graphs.
For multi-class k-NN classification, Cover and Hart (1967) prove an upper bound error
rate of R* ≤ R_kNN ≤ R*(2 − M·R*/(M − 1)), where R* is the Bayes error rate (which is
the minimal error rate possible), R_kNN is the k-NN error rate, and M is the number of
classes in the problem. For M = 2, as the Bayesian error rate R* approaches zero, this
limit reduces to “not more than twice the Bayesian error rate”.
KNN has several disadvantages: it has a high computation cost, since it needs to
compute the distance from each test instance to all training samples; it requires
memory proportional to the size of the training set; and it has a low accuracy rate in
multidimensional data sets with irrelevant features. On the other hand, it can be used
for both prediction and classification and is highly adaptive to local information.
Because the KNN algorithm uses the closest data points for estimation, it is able to
take full advantage of local information and form highly nonlinear, highly adaptive
decision boundaries for each data point. Another disadvantage of KNN classifiers is the
large memory requirement needed to store the whole sample. When the sample is large,
response time on a sequential computer is also large. Despite the memory requirement
issue, it shows good performance in classification problems on various datasets.

Dividing the training data into smaller subsets and building a model for each
subset then applying voting to classify testing data can enhance the classifier’s
performance.

Features of KNN

a) All instances of the data correspond to the points in an n-dimensional


Euclidean space.
b) Classification is delayed till a new instance arrives.
c) In KNN, the Classification is done by comparing feature vectors of the
different points in a space region.
d) The target function may be discrete or real-valued.

An arbitrary instance is represented by (a_1(x), a_2(x), a_3(x), …, a_n(x)), where
a_i(x) denotes the features. The Euclidean distance between two instances is
d(x_i, x_j) = √( Σ_{r=1}^{n} ( a_r(x_i) − a_r(x_j) )² ). The k-nearest neighbour
algorithm is the simplest of all machine learning algorithms and it is analytically
tractable.
In KNN, the training samples are mainly described by n-dimensional
numeric attributes. The training samples are stored in an n-dimensional space.
When a test sample (unknown class label) is given, k-nearest neighbor
classifier starts searching the ‘k’ training samples which are closest to the
unknown sample or test sample. Closeness is mainly defined in terms of
Euclidean distance. The Euclidean distance between two points P and Q, i.e.
P = (p_1, p_2, …, p_n) and Q = (q_1, q_2, …, q_n), is defined by the following equation:

d(P, Q) = √( (p_1 − q_1)² + (p_2 − q_2)² + … + (p_n − q_n)² )   (5.14)

SUPPORT VECTOR MACHINE (SVM)

Support Vector Machine (SVM) is a kernel-based supervised learning algorithm that
combines machine learning theory, optimization algorithms from operations research and
kernel techniques from mathematical analysis. A classifier generalizes well when it
minimizes the training error while keeping high testing accuracy on unseen testing
data.

The training algorithm of SVM maximizes the margin between the training data and the
class boundary, removing some irrelevant data from the training dataset. The resulting
decision function therefore depends only on the training data called support vectors,
which are closest to the decision boundary. Thus SVM maximizes the margin by minimizing
the maximum loss, and gives good accuracy compared to classifiers based on minimizing
the mean squared error. It is also effective in high-dimensional spaces where the
number of dimensions is greater than the number of training samples. SVM can separate
classes that cannot be separated by a linear classifier. SVM is a kernel-based method:
it uses a kernel-induced feature space. Using a kernel function it transforms data from
the input space into a high-dimensional feature space in which it searches for a
separating hyperplane, so that nonlinear data can also be separated by a hyperplane in
the high-dimensional space. This would take a lot of computational power, but SVM
overcomes the problem using the kernel trick. In SVM the kernel functions are defined
in a reproducing kernel Hilbert space (RKHS). A Hilbert space is a complete inner
product space, so similarity between training data points is measured by an inner
product, which is computationally inexpensive. Also, the kernels are Mercer kernels,
i.e., positive semi-definite kernels, and because of this the SVM training problem has
a global optimum.
SVMs are built on developments in computational learning theory.
Because of their accuracy and ability to deal with a large number of predictors,
they have received much attention in biomedical applications. The majority of the
previous classifiers separate classes using hyper planes that split the classes,
using a flat plane, within the predictor space. SVMs broaden the concept of
hyper plane separation to data that cannot be separated linearly, by mapping the
predictors onto a new, higher-dimensional space in which they can be separated
linearly. The method’s name derives from the support vectors, which are lists
of the predictor values taken from cases that lie closest to the decision
boundary separating the classes. It is practical to assume that these cases have
the greatest impact on the location of the decision boundary. In fact, if they
were removed they could have large effects on its location. Computationally,
finding the best location for the decision plane is an optimization problem that
makes uses of a kernel function to build linear boundaries through nonlinear
transformations, or mappings, of the predictors. The intelligent component of
the algorithm is that it locates a hyper plane in the predictor space which is
stated in terms of the input vectors and dot products in the feature space. The
dot product can then be used to find the distances between the vectors in this
higher-dimensional space. An SVM locates the hyperplane that divides the
support vectors without ever representing the space explicitly. As an alternative
a kernel function is used that plays the role of the dot product in the feature
space. The two classes can only be separated absolutely by a complex curve in
the original space of the predictor. The best linear separator cannot totally
separate the two classes. On the other hand, if the original predictor values can
be projected into a more suitable feature space, it is possible to separate
completely the classes with a linear decision boundary. As a result, the problem
becomes one of finding the suitable transformation. The kernel function, which
is central to the SVM approach, is also one of the main problems, especially
with respect to the selection of its parameter values. It is also crucial to select
the magnitude of the penalty for violating the soft margin between the classes.
This means that successful construction of a SVM necessitates some decisions
that should be informed by the data to be classified. The basic support vector
classifier is very similar to the perceptron. Both are linear classifiers, assuming
separable data. In perceptron learning, the iterative procedure is stopped when
all samples in the training set are classified correctly. For linearly separable
data, this means that the found perceptron is one solution arbitrarily selected
from an (in principle) infinite set of solutions. In contrast, the support vector
classifier chooses one particular solution: the classifier which separates the
classes with maximal margin. The margin is defined as the width of the largest
‘tube’ not containing samples that can be drawn around the decision boundary.
It can be proven that this particular solution has the highest generalization
ability.
High learning ability and good generalization in both classification and regression have made the SVM one of the most popular learning algorithms in many real-life applications, such as bioinformatics, electrical load forecasting, pattern recognition, image processing and hydrology. SVMs have been used to predict the mechanical properties of hot-rolled plain carbon steel, to build credit-scoring models that assess clients' risk of default, for fault diagnosis, and for forecasting failures and reliability in engine systems. They have also been applied to evaluating the condition of underground coal-mine environments, to drug/non-drug classification, to the diagnosis of diabetes and erythematous disease, to drug design, and to qualitative and quantitative prediction from sensor data.
SVMs are among the best "off-the-shelf" supervised learning algorithms. The SVM is a kernel-based supervised learning algorithm for binary classification problems. It separates the two classes using a kernel function that is induced
from the training data set. The goal is to produce a classifier that will work well on unseen examples, i.e. give good generalization.

Let there be m training examples (x_i, y_i), i = 1, 2, ..., m, with class labels y_i = ±1. Then there exists a hyperplane w·x + b = 0 that separates the positive and negative training examples, using the decision function

f(x) = sign(w·x + b),    where sign(z) = +1 if z ≥ 0 and −1 otherwise,

and where w, the normal to the hyperplane, is known as the weight vector and b is called the bias. Correct separation of the training set means that y_i (w·x_i + b) > 0 for all i = 1, 2, ..., m. Training instances that lie on the margin are called the support vectors.
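As a small numerical illustration of this decision function, the Python snippet below evaluates f(x) = sign(w·x + b), checks the separation condition y_i (w·x_i + b) > 0, and identifies the training points closest to the hyperplane. The weight vector, bias and toy data are hand-chosen for illustration and are not taken from the report.

# Toy numerical illustration (hand-chosen w, b and points; not from the report)
# of the decision function f(x) = sign(w.x + b) and of the support vectors as
# the training points closest to the separating hyperplane.
import numpy as np

w = np.array([1.0, 1.0])    # weight vector: normal to the hyperplane w.x + b = 0
b = -4.5                    # bias

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],    # class -1
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

def f(x):
    # Decision function: the sign of w.x + b decides the predicted class.
    return np.sign(np.dot(w, x) + b)

print("predictions:", [int(f(x)) for x in X])

# Correct separation of the training set means y_i * (w.x_i + b) > 0 for every i.
margins = y * (X @ w + b)
print("all training points correctly separated:", bool(np.all(margins > 0)))

# The support vectors lie on the margin, i.e. closest to the hyperplane.
distances = np.abs(X @ w + b) / np.linalg.norm(w)
print("support vectors:\n", X[np.isclose(distances, distances.min())])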

Figure 5.5 Maximum-margin hyperplane and the support vectors


APPLICATIONS

SVMs can be used to solve various real-world problems:


a) SVMs are helpful in text and hypertext categorization, as their application can significantly reduce the need for labeled training instances in both the standard inductive and the transductive settings.
b) Classification of images can also be performed using SVMs. Experimental results show that SVMs achieve significantly higher search accuracy than traditional query-refinement schemes after just three to four rounds of relevance feedback. This is also true of image segmentation systems, including those using a modified version of SVM that applies the privileged approach suggested by Vapnik.
c) Hand-written characters can be recognized using SVMs.
d) The SVM algorithm has been widely applied in the biological and other sciences. SVMs have been used to classify proteins, with up to 90% of the compounds classified correctly. Permutation tests based on SVM weights have been suggested as a mechanism for interpreting SVM models, and support vector machine weights have also been used to interpret such models in the past. Post-hoc interpretation of SVM models, in order to identify the features a model uses to make its predictions, is a relatively new area of research of special significance in the biological sciences.

SUMMARY

In this chapter we described classifiers such as Linear Discriminant Analysis (LDA), K-Nearest Neighbor (KNN) and the Support Vector Machine (SVM). Using the features obtained in the previous chapter, classifiers such as Linear Discriminant Analysis (LDA) and K-Nearest Neighbor (KNN) are trained to differentiate between artifacts and non-artifacts. The Support Vector Machine (SVM) classification method is also described in this chapter.
CHAPTER 6

RESULT AND INFERENCE

RESULT

The features obtained from LBP are applied to the classifiers. The objective of the training phase is to develop a classifier that distinguishes between ADHD maps and non-ADHD maps. To analyze the output of the classifiers, accuracy, precision and sensitivity are calculated. The data was first applied to Naive Bayes, which is observed to give an accuracy of 35% and a precision of 33.33%. The data was then applied to KNN and SVM, both of which give better accuracy than Naive Bayes. It is observed that SVM has the best accuracy and precision of the three classifiers.
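As a hedged illustration of how these performance measures can be obtained, the short Python sketch below computes accuracy, precision and sensitivity from a confusion matrix. The label vectors are invented for demonstration and do not reproduce the reported figures.

# Hedged sketch (invented labels, not the reported data): computing accuracy,
# precision and sensitivity from a confusion matrix.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])   # 1 = ADHD map, 0 = non-ADHD map (hypothetical)
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])   # hypothetical classifier output

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))   # true negatives

accuracy    = (tp + tn) / len(y_true)
precision   = tp / (tp + fp)
sensitivity = tp / (tp + fn)                 # also called recall

print(f"accuracy={accuracy:.2%}  precision={precision:.2%}  sensitivity={sensitivity:.2%}")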
INFERENCE

Image features such as normalization, standard deviation, energy, entropy and covariance have been extracted from the Local Binary Pattern output of the 15 brain maps obtained from Independent Component Analysis. These features are used for training the Naive Bayes, KNN and SVM classifiers; a minimal sketch of this step is given below. From Table 6.1, it is inferred that SVM gives a better result than the KNN and Naive Bayes classifiers.
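The following Python sketch illustrates, under assumed data and labels, how a subset of these features (standard deviation, energy and entropy of the LBP output) could be extracted and used to train the three classifiers. It relies on scikit-image and scikit-learn, the 15 maps and their labels are synthetic stand-ins, and it is not the project's original code.

# Minimal sketch under assumed data: statistical features of an LBP image
# (standard deviation, energy, entropy) used to train Naive Bayes, KNN and SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def lbp_features(image, P=8, R=1.0):
    # Uniform LBP codes take values 0..P+1, so the histogram has P+2 bins.
    lbp = local_binary_pattern(image, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    std = np.std(lbp)
    energy = np.sum(hist ** 2)
    entropy = -np.sum(hist * np.log2(hist + 1e-12))
    return np.array([std, energy, entropy])

rng = np.random.default_rng(0)
maps = rng.random((15, 64, 64))          # stand-ins for the 15 ICA brain maps
labels = np.array([1] * 8 + [0] * 7)     # hypothetical class labels for illustration

features = np.array([lbp_features(m) for m in maps])
for name, clf in [("Naive Bayes", GaussianNB()),
                  ("KNN", KNeighborsClassifier(n_neighbors=3)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(features, labels)
    print(name, "training accuracy:", clf.score(features, labels))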

Table 6.1 Comparison results of the classifiers

S.NO   CLASSIFIER     ACCURACY   PRECISION
1      Naive Bayes    35%        33.33%
2      KNN            66%
3      SVM            71%
CHAPTER 7
CONCLUSION

CONCLUSION

Electroencephalography (EEG) signals provide valuable information about the brain and neurobiological problems, but EEG recordings are affected by various types of noise, which hinders proper analysis of the brain signals. Artifacts such as eye blinks, other ocular artifacts and eyeball movements introduce noise that is difficult to detect. All of these problems arise during EEG recordings of patients, and for proper diagnosis it is very important to make the signals noise-free. We presented a novel method for automated EEG artifact rejection. We use Independent Component Analysis (ICA) to create brain maps from the EEG signal, and we apply an image feature extraction technique, LBP, to these brain maps. The extracted features were used with the Naive Bayes, KNN and SVM classifiers, and the outputs were compared on two scalar performance measures: accuracy and precision. The classification results show that SVM performs better than KNN and Naive Bayes. In summary, we present a method that is automated, reliable and robust for artifact rejection from EEG signals.
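As a hedged, self-contained illustration of the ICA front end of this pipeline, the sketch below applies FastICA to a synthetic multi-channel signal standing in for EEG. The channel count, source signals and mixing matrix are assumptions for illustration; the real recordings, the topographic brain-map construction and the LBP stage are not reproduced here.

# Self-contained sketch of the ICA front end (synthetic signals standing in for
# EEG; channel count, sources and mixing are assumptions for illustration).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 2000)
sources = np.vstack([np.sin(2 * np.pi * 8 * t),             # rhythmic "brain" activity
                     np.sign(np.sin(2 * np.pi * 0.5 * t)),  # slow square-wave drift
                     rng.normal(size=t.size)])              # noise / artifact-like component
mixing = rng.normal(size=(19, 3))                           # 19 hypothetical EEG channels
eeg = mixing @ sources                                      # observed multi-channel data

ica = FastICA(n_components=3, random_state=0, max_iter=1000)
components = ica.fit_transform(eeg.T).T                     # estimated independent components
print("recovered components:", components.shape)            # expected: (3, 2000)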
REFERENCES

1. Jung, T. P., Humphries, C., Lee, T. W., Makeig, S., McKeown, M. J., Iragui, V., & Sejnowski, T. J. (1998). Removing electroencephalographic artifacts: comparison between ICA and PCA. In Neural Networks for Signal Processing VIII: Proceedings of the 1998 IEEE Signal Processing Society Workshop (pp. 63-72). IEEE.

2. Khan, H. A., Al Helal, A., Ahmed, K. I., & Mostafa, R. (2016). Abnormal mass classification in breast mammography using rotation invariant LBP. In 2016 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT) (pp. 1-5). IEEE.

3. Radüntz, T., Scouten, J., Hochmuth, O., & Meffert, B. (2015). EEG artifact elimination by extraction of ICA-component features using image processing algorithms. Journal of Neuroscience Methods, 243, 84-93.

4. Saxena, K., Khan, Z., & Singh, S. (2014). Diagnosis of diabetes mellitus using K Nearest Neighbor algorithm. International Journal of Computer Science Trends and Technology (IJCST).

5. Parikh, K. S., & Shah, T. P. (2016). Support Vector Machine: a large margin classifier to diagnose skin illnesses. Procedia Technology, 23, 369-375.

6. Yoko, S., Akutagawa, M., Kaji, Y., Shichijo, F., Nagashino, H., & Kinouchi, Y. (2007). Simulation study on artifact elimination in EEG signals by artificial neural network. In World Congress on Medical Physics and Biomedical Engineering 2006 (pp. 1164-1166). Springer Berlin Heidelberg.

7. Li, R., & Principe, J. C. (2006). Blinking artifact removal in cognitive EEG data using ICA. In 2006 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS'06) (pp. 5273-5276). IEEE.

8. Subasi, A., & Gursoy, M. I. (2010). EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications, 37(12), 8659-8666.

9. Mehmood, R. M., & Lee, H. J. (2015). Emotion classification of EEG brain signal using SVM and KNN. In 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (pp. 1-5). IEEE.

10. Makeig, S., Bell, A. J., Jung, T. P., & Sejnowski, T. J. (1996). Independent component analysis of electroencephalographic data. Advances in Neural Information Processing Systems, 145-151.

11. Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971-987.

12. Khamis, H. S., Cheruiyot, K. W., & Kimani, S. (2014). Application of k-nearest neighbour classification in medical data mining. International Journal of Information and Communication Technology Research.

13. Bouzalmat, A., Kharroubi, J., & Zarghili, A. (2013). Face recognition using SVM based on LDA. International Journal of Computer Science Issues, 10(4), 171-179.

14. Romero, S., Mañanas, M. A., & Barbanoj, M. J. (2008). A comparative study of automatic techniques for ocular artifact reduction in spontaneous EEG signals based on clinical target variables: a simulation case. Computers in Biology and Medicine, 38(3), 348-360.

15. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
