Detection of Major Depressive Disorder Using Linear and Non-Linear Features From EEG Signals

Microsystem Technologies
https://doi.org/10.1007/s00542-018-4075-z
TECHNICAL PAPER
Shalini Mahato1 • Sanchita Paul1
Received: 14 May 2018 / Accepted: 27 July 2018

© Springer-Verlag GmbH Germany, part of Springer Nature 2018
Abstract
EEG signals are non-stationary, complex and non-linear. In major depressive disorder (MDD), or depression, any deterioration in brain function is reflected in the EEG signals. In this paper, linear features (band power and interhemispheric asymmetry), non-linear features [relative wavelet energy (RWE) and wavelet entropy (WE)] and combinations of linear and non-linear features were used to classify depression patients and healthy individuals. The analysis used the publicly available data set contributed by Mumtaz et al. (Biomed Signal Process Control 31:108–115, 2017b), which consists of 34 MDD patients and 30 healthy individuals. The classifiers used were the multilayer perceptron neural network (MLPNN), radial basis function network (RBFN), linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). With linear features, the highest classification accuracy of 91.67% was obtained by alpha power with the MLPNN classifier. With non-linear features, RWE and WE both provided a highest classification accuracy of 90%, with the RBFN and LDA classifiers, respectively. The highest overall classification accuracy, 93.33%, was achieved by combining linear and non-linear features, i.e., alpha power and RWE, with both the MLPNN and RBFN classifiers. The combination of the two non-linear features, RWE and WE, also performed best, with the same highest classification accuracy of 93.33%. The study compared the accuracy, sensitivity and specificity of the different classifiers with linear features, non-linear features and their combinations. The results indicated that the combination of alpha power and RWE performed best in all the applied classifiers, with a highest classification accuracy of 93.33%.
1 Introduction
Over the years, MDD has become a major cause of disability around the world in terms of years lived with disability. The number of people affected by MDD has grown by up to 20% in the last decade, with a male-to-female ratio of 1:1.5. More than 300 million people across the globe carry the burden of this disease, and the World Health Organization (WHO) anticipates that by 2030 MDD will become the leading cause of disease burden.
An individual suffering from MDD exhibits a severe lack of interest and pleasure in usual activities. This is accompanied by four additional symptoms of depression, such as feeling restless and agitated, being tired and lacking energy, feeling worthless or guilty, and having suicidal tendencies, for a sustained period of at least 2 weeks. Patients generally face hindrance in social acceptance, suffer from a lack of self-esteem and show a decline in performance at work. In the worst case, depression patients may commit suicide. Owing to inappropriate diagnosis, a dearth of trained medical personnel, inadequate facilities and social stigma, only half of the patients end up receiving proper treatment. Sometimes a person suffering from depression goes undiagnosed while an unaffected person is wrongly identified and prescribed antidepressants.
The major reason behind inappropriate diagnosis of MDD is the absence of any accepted biomarker. Depression is diagnosed on the basis of accepted depression classification criteria (Cusin et al. 2009), such as the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-V), the Beck Depression Inventory (BDI) and the Hamilton Depression Rating Scale (HAM-D), which depend on questionnaires and on the behavior observed during the interview between the MDD patient and the health practitioner.

Corresponding author: Shalini Mahato (swarup.shalini@gmail.com)
1 Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi 835215, India
MDD degrades the brain's performance, which is expected to be reflected in the bioelectrical activity of the brain. Screening the brain under these conditions with EEG can help in understanding brain functioning and imbalances in brain activity. EEG is cost effective and readily available. Its time scale is in milliseconds, the same as that of neural activity, which gives it good temporal resolution.
Manual interpretation of EEG signals is very complicated, as the signals are extremely intricate, non-stationary and non-linear. Computer-aided signal processing therefore becomes necessary to automatically classify MDD patients and healthy individuals.
Extensive research has been done using different EEG features and classifiers to improve the accuracy of predicting MDD. During neuro-feedback treatment, alpha asymmetry was found to decrease remarkably in MDD patients, especially in women (Bruder et al. 2012). Frontal alpha asymmetry was found to be higher in MDD patients and positively correlated with inhibitory behavior before treatment (Gollan et al. 2014). Alpha band power achieved a higher accuracy in classifying depressed and healthy individuals than all the bands analyzed together (Hosseinifarda et al. 2013; Mohammadi et al. 2015). During depression, high alpha activity was found in the posterior region of the brain in the resting state (Stewart et al. 2014; Grin-Yatsenko et al. 2010). EEG power was found to be increased in the central, occipital, parietal and posterior temporal areas in patients in the early stage of depression (Grin-Yatsenko et al. 2010). Either hemisphere could be affected in depression, but abnormal sources were found mainly in the right hemisphere (Ricardo-Garcel et al. 2010). WE and approximate entropy were found to be higher in healthy individuals than in MDD patients (Puthankattil and Joseph 2014). RWE for 4–32 Hz was found to be higher for healthy individuals than for MDD patients in the left hemisphere, while RWE for 0–4 Hz was found to be higher for MDD patients in the right hemisphere (Puthankattil and Joseph 2012). The beta power of MDD patients was found to increase only in the left hemisphere (Grin-Yatsenko et al. 2010; Mumtaz et al. 2017a, b), and beta power was higher in the central, temporal and parietal regions of the brain in MDD patients (Mumtaz et al. 2017a, b). In a study in which subjects listened to music, frontal theta asymmetry decreased in MDD patients but increased in healthy individuals (Dharmadhikari et al. 2018).
A classification accuracy of 80% was obtained using EEG band power as the feature and a decision tree as the classifier (Mohammadi et al. 2015). Wavelet transform features with logistic regression (LR) provided a classification accuracy of 87.5% (Mumtaz et al. 2017a, b). LDA with detrended fluctuation analysis (DFA) and the spectral asymmetry index (SASI) provided an accuracy of 91.2% (Bachmann et al. 2017). LR, LDA and k-nearest neighbor (KNN) with four non-linear features, i.e., DFA, Higuchi's fractal dimension (HFD), correlation dimension (CD) and Lyapunov exponent (LE), provided accuracies of 90, 86.6 and 80%, respectively. Both LR and LDA provided an accuracy of 73.3%, while KNN provided an accuracy of 70%, when alpha power was used as the feature (Hosseinifarda et al. 2013). Both SASI and HFD provided a classification accuracy of 85% using a statistical t test (Bachmann et al. 2013).
The purpose of this paper is to compare the efficiency of linear features (band power and asymmetry), non-linear features (WE and RWE) and combinations of linear and non-linear features with different classifiers (MLPNN, RBFN, LDA and QDA) for the detection of MDD.

2 Materials and methods
First, features are extracted from the EEG signals of 30 MDD patients and 30 healthy individuals. The study is divided into three parts based on the type of feature set used. Three types of feature sets have been used: eight linear feature sets (four band powers and the four corresponding hemispheric asymmetries), two non-linear feature sets (RWE and WE) and six combinations of linear and non-linear feature sets.
Each band-power feature set consists of 19 × 1 = 19 features. Each non-linear feature set consists of 19 × 4 = 76 features; this high dimensionality is reduced with principal component analysis (PCA). For combined feature sets, dimension reduction with PCA is applied whenever the combination includes non-linear features. Four types of classifiers, i.e., MLPNN, RBFN, LDA and QDA, are then applied to each feature set. Fig. 1 shows the block diagram of the proposed model for the detection of MDD patients.

2.1 Subjects
The EEG data set used is the publicly available data contributed by Mumtaz et al. (2017b). It consists of the EEG signals of 34 MDD patients and 30 healthy individuals. The group of 34 MDD patients consists of 17 male and 17 female patients with an average age of 40.3 ± 12.9 years. Of the 34 MDD patients' EEG signals,
30 were used in this study. The healthy individuals were 21 males and 9 females, age matched to the patients, with an average age of 38.3 ± 15.6 years. The MDD patients were diagnosed on the basis of the DSM-IV criteria (American Psychiatric Association 1994). The Human Ethics Committee of Hospital Universiti Sains Malaysia (HUSM), Malaysia, approved the study.

2.2 EEG data acquisition
The EEG data were recorded for 5 min in the eyes-closed (EC) condition in a resting position, with 19 electrodes placed according to the international standard 10/20 system. The 19 electrodes covered the frontal (Fp1, Fp2, F3, F4, F7, F8, Fz), temporal (T3, T4, T5, T6), parietal (P3, P4, Pz), occipital (O1, O2) and central (C3, C4, Cz) regions, as shown in Fig. 2. The data were recorded using a linked-ear (LE) reference (Dien 1998). The sampling frequency was set to 256 Hz. All the signals were high-pass filtered with a 0.5 Hz cut-off frequency and low-pass filtered with a 70 Hz cut-off frequency, and a 50 Hz notch filter was used to remove line noise.

2.3 EEG pre-processing
During recording, the EEG signal is contaminated by various unwanted artifacts such as eye blinks, muscle activity, movements and electrical noise from the power line. EEG pre-processing is therefore essential to remove these unwanted signals (Jung et al. 2000; Delorme and Makeig 2004; Jung et al. 1998; Joyce et al. 2003).
To remove background noise, the signals were re-referenced to the common average reference (CAR) (Gandhi 2014). As the analysis concerns the signal between 0.5 and 32 Hz, the signal was high-pass filtered with a 0.5 Hz cut-off frequency and low-pass filtered with a 32 Hz cut-off frequency. Since most muscle artifacts and the power-line noise lie above 32 Hz, this filtering removes most of these artifacts. Other artifacts, such as eye blinks and eye movements, were identified visually and removed with the help of independent component analysis (ICA) (Delorme and Makeig 2004). ICA is a blind source separation technique that recovers independent source signals and can remove artifacts easily and efficiently.
All the signal analysis was done in Matlab with EEGLAB.

Fig. 1 Block diagram representation of the steps involved in the classification of MDD patients and healthy individuals
Fig. 2 International 10/20 system for electrode positioning
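As a rough illustration of the re-referencing and band-limiting steps, the sketch below applies CAR and a 0.5–32 Hz band-pass filter with NumPy and SciPy. It is not the authors' Matlab/EEGLAB code: the zero-phase fourth-order Butterworth design is an assumption (only the cut-off frequencies are given above), the function names are hypothetical, and the visual ICA-based removal of ocular artifacts is not reproduced.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # sampling frequency stated in Sect. 2.2 (Hz)


def common_average_reference(eeg):
    """Re-reference to the common average reference (CAR): subtract, at every
    sample, the mean over all channels. eeg has shape (n_channels, n_samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)


def bandpass(eeg, low=0.5, high=32.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass (filter family and order are assumptions)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)


def preprocess(eeg, fs=FS):
    """CAR re-referencing followed by 0.5-32 Hz band-pass filtering."""
    return bandpass(common_average_reference(eeg), fs=fs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.standard_normal((19, 10 * FS))  # 19 channels, 10 s of synthetic data
    clean = preprocess(raw)
    print(clean.shape)  # (19, 2560)
```

Because CAR and the filter are both linear and applied identically to every channel, the re-referenced data keep a zero channel average after filtering.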
2.4 Feature extraction
The features extracted from the EEG signal can be divided into two main categories: linear and non-linear. Frequency analysis (e.g., the Fourier transform) and parametric modeling (autoregressive models) fall under linear analysis. Linear methods include band power, interhemispheric asymmetry, EEG measurements (amplitude, frequency, power) and so on. Linear methods have been applied in a number of EEG studies and have provided good results, but they fail to capture the underlying complex and chaotic behavior of brain signals. Non-linear methods can capture the chaotic behavior and abrupt changes in the EEG signal caused by the underlying physiological phenomena in the brain (Rodreguez-Bermudez and Garcia-Laencina 2015). Non-linear methods include relative wavelet energy (RWE), wavelet entropy (WE), approximate entropy (ApEn), Higuchi's fractal dimension (HFD), correlation dimension (CD), detrended fluctuation analysis (DFA) and so on.
In this paper, both linear and non-linear methods have been used. The linear methods are band power and interhemispheric asymmetry; the non-linear methods are RWE and WE.

2.4.1 Band power
In this method, four frequency bands, delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz), are extracted from the EEG. The Welch periodogram was applied to compute the power spectrum of the EEG signal: the signal is divided into small segments with 50% overlap, and the modified periodograms of all the segments are averaged. The power of each band is then obtained from this averaged spectrum.

2.4.2 Interhemispheric asymmetry
Interhemispheric asymmetry represents the difference in EEG signal power between the left and right halves of the brain. It is obtained by subtracting the natural logarithm of the power of the left half of the brain from the natural logarithm of the power of the right half. A positive value indicates relatively higher activity of the right hemisphere, while a negative value indicates relatively higher activity of the left hemisphere:

$$\mathrm{Asym} = \operatorname{mean}\left(\ln \mathrm{PoW}_R - \ln \mathrm{PoW}_L\right) \qquad (1)$$

where $\mathrm{PoW}_R$ is the summed power of the right electrodes and $\mathrm{PoW}_L$ is the summed power of the left electrodes. The interhemispheric asymmetry computed from delta power is termed delta asymmetry, from alpha power alpha asymmetry, from theta power theta asymmetry and from beta power beta asymmetry. In this analysis, interhemispheric asymmetry was calculated for each pair of channels, i.e., Fp2–Fp1, F4–F3, F8–F7, C4–C3, T4–T3, P4–P3, T6–T5 and O2–O1.

2.4.3 Wavelet transform: non-linear features
For non-linear feature extraction, the wavelet transform (WT) of the signal is computed. At low frequencies the wavelet transform gives good frequency information, and at high frequencies it gives good time information; it thus provides time–frequency localization (Mallat 1989; Bopardikar and Rao 1998). Wavelets are simple oscillatory functions of finite duration which, by translation and dilation, can represent any signal. The continuous wavelet transform (CWT) of a finite-energy function $x(t)$ with respect to the scaled and translated mother wavelet $\psi_{s,d}(t)$ is

$$\Psi_{cwt}(s,d) = \int_{-\infty}^{\infty} x(t)\,\psi^{*}_{s,d}(t)\,dt \qquad (2)$$

where

$$\psi_{s,d}(t) = \frac{1}{\sqrt{|s|}}\,\psi\!\left(\frac{t-d}{s}\right) \qquad (3)$$

$t$ is time, $s$ and $d$ are the scaling and translation variables, respectively, and $^{*}$ denotes complex conjugation. Examples of wavelets are the Morlet wavelet, Coiflet and Haar.
The CWT provides highly redundant information and requires more computation time and resources. This problem is solved by the discrete wavelet transform (DWT), which uses discrete values of scale and translation:

$$s = s_0^{m}, \qquad d = n\,d_0\,s_0^{m}$$

where $m$ and $n$ are integers. Thus,

$$\psi_{m,n}(t) = s_0^{-m/2}\,\psi\!\left(s_0^{-m}t - n\,d_0\right) \qquad (4)$$

where $m$ indicates frequency localization and $n$ indicates time localization.
In the DWT, high frequencies are analyzed with a series of high-pass filters and low frequencies with a series of low-pass filters. Quadrature mirror filters (QMF), which are pairs of finite impulse response (FIR) filters, are used in the WT: the low and high frequencies are separated by the two FIR filters, and the output of the low-pass filter is fed to the input of another QMF pair. Repeating this procedure creates a hierarchical structure.
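The band-power and asymmetry features of Sections 2.4.1 and 2.4.2 can be sketched with SciPy's `scipy.signal.welch`, which averages the Hann-windowed (modified) periodograms of overlapping segments. This is a minimal sketch under stated assumptions: the 2 s segment length is our choice (the paper gives only the 50% overlap), band power is taken as the PSD summed over the band times the frequency resolution, and the function names and channel ordering are hypothetical. `pair_asymmetry` returns ln(power of right channel) minus ln(power of left channel) for each listed pair; averaging these values over epochs is one possible reading of the mean in Eq. (1).

```python
import numpy as np
from scipy.signal import welch

FS = 256  # Hz
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}
# (right, left) channel pairs listed in Sect. 2.4.2
PAIRS = [("Fp2", "Fp1"), ("F4", "F3"), ("F8", "F7"), ("C4", "C3"),
         ("T4", "T3"), ("P4", "P3"), ("T6", "T5"), ("O2", "O1")]


def band_powers(eeg, fs=FS, seg_sec=2.0):
    """Welch PSD (Hann segments, 50% overlap, periodograms averaged), then the
    PSD summed over each band. eeg: (n_channels, n_samples).
    Returns {band: array of shape (n_channels,)}."""
    nperseg = int(seg_sec * fs)
    freqs, psd = welch(eeg, fs=fs, nperseg=nperseg, noverlap=nperseg // 2, axis=-1)
    df = freqs[1] - freqs[0]
    return {name: psd[..., (freqs >= lo) & (freqs < hi)].sum(axis=-1) * df
            for name, (lo, hi) in BANDS.items()}


def pair_asymmetry(power, channels):
    """ln(power_right) - ln(power_left) for every pair in PAIRS.
    power: array (..., n_channels); channels: channel names in column order."""
    idx = {name: i for i, name in enumerate(channels)}
    return np.stack([np.log(power[..., idx[r]]) - np.log(power[..., idx[l]])
                     for r, l in PAIRS], axis=-1)


if __name__ == "__main__":
    channels = [r for r, _ in PAIRS] + [l for _, l in PAIRS]  # hypothetical order
    t = np.arange(10 * FS) / FS
    rng = np.random.default_rng(0)
    eeg = 0.1 * rng.standard_normal((16, t.size))
    eeg[:8] += np.sqrt(2) * np.sin(2 * np.pi * 10 * t)  # stronger alpha on the right
    eeg[8:] += np.sin(2 * np.pi * 10 * t)
    alpha = band_powers(eeg)["alpha"]
    print(np.round(pair_asymmetry(alpha, channels), 2))  # positive: right-dominant
```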
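The filter-bank structure described above, together with the RWE and WE features defined in Sections 2.4.4 and 2.4.5 (Eqs. (5)–(8)), can be sketched in NumPy. To stay self-contained, this sketch uses the two-tap Haar quadrature mirror pair instead of the Coiflet 5 wavelet used in the paper; with the PyWavelets package, a Coiflet decomposition would instead be obtained with `pywt.wavedec(x, 'coif5', level=3)`. The function names are hypothetical, and, following Eq. (6) as written, only the detail levels enter the total energy.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_step(x):
    """One analysis stage: Haar low-pass (approximation) and high-pass (detail)
    filtering followed by downsampling by 2. A trailing odd sample is dropped."""
    x = x[: len(x) // 2 * 2]
    approx = (x[0::2] + x[1::2]) / SQRT2
    detail = (x[0::2] - x[1::2]) / SQRT2
    return approx, detail


def dwt_details(x, levels=3):
    """Feed each approximation into the next filter pair (hierarchical structure).
    Returns the detail coefficients [D1, ..., D_levels] and the last approximation."""
    details, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return details, approx


def rwe_and_we(details):
    """Level energies (Eq. 5), relative energies (Eqs. 6-7) and wavelet entropy (Eq. 8)."""
    energies = np.array([np.sum(d ** 2) for d in details])  # E_Dk
    rwe = energies / energies.sum()                           # RWE_k = E_Dk / E_total
    nz = rwe[rwe > 0]                                         # treat 0 * log 0 as 0
    we = -np.sum(nz * np.log(nz))                             # Shannon entropy
    return rwe, we


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(2 * 256)          # 2 s of synthetic data at 256 Hz
    details, _ = dwt_details(x, levels=3)
    rwe, we = rwe_and_we(details)
    print([len(d) for d in details])          # [256, 128, 64]
    print(np.round(rwe, 2), round(float(we), 2))
```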
The wavelet function is related to the high-pass filter and the scaling function to the low-pass filter. The information in the wavelet coefficients is organized in a hierarchical multi-resolution scheme. The original signal is recovered by the inverse filtering operation. The output of the low-pass filter is called the approximation (A) and the output of the high-pass filter is called the detail (D). In this paper, a three-level decomposition using the Coiflet 5 wavelet was used, as shown in Fig. 3.

2.4.4 Relative wavelet energy (RWE)
RWE is defined as the relative energy associated with the wavelet coefficients at each resolution level (Rosso et al. 2006). The energy at the kth resolution level (k = 1, 2, …, N) is

$$E_{D_k} = \sum_{l} \left|D_{k,l}\right|^{2} \qquad (5)$$

where $D_{k,l}$ is the lth wavelet coefficient at level k. The total energy is

$$E_{\mathrm{total}} = \sum_{k=1}^{N} E_{D_k} \qquad (6)$$

and the RWE is

$$\mathrm{RWE}_k = \frac{E_{D_k}}{E_{\mathrm{total}}} \qquad (7)$$

2.4.5 Wavelet entropy (WE)
Wavelet entropy represents the information contained in each level of decomposition (Rosso et al. 2006) and is based on the Shannon entropy. It is calculated as

$$\mathrm{Ent} = -\sum_{j=1}^{N} \mathrm{RWE}_j \log\left(\mathrm{RWE}_j\right) \qquad (8)$$

where j = 1, 2, …, N is the wavelet decomposition level.

2.5 Feature reduction
When a large number of features is used for classification, the computational cost is high and the efficiency of the classifier is reduced. In such situations, feature reduction becomes necessary.
For linear feature analysis, feature sets of 19 features were used, i.e., one feature per channel location. Each row of the data matrix represents a subject and each column represents an attribute.
For non-linear feature analysis, however, there are 76 features (19 × 4), so the dimensionality must be reduced. PCA was used for this purpose (Jolliffe 2002). Feature reduction is a necessary step here because the feature matrix is high dimensional, and it improves the performance of the classifier in such cases.
PCA reduces dimensionality by representing d-dimensional data in a lower-dimensional space without much loss of information; to limit this loss, the directions with low variance are discarded. PCA is a form of unsupervised learning. The first principal component represents the maximum variance of the data, the second principal component the second-largest variance, and so on up to the kth principal component, where k is the dimension of the reduced space. The principal components are uncorrelated and orthonormal. To find them, the d × d covariance matrix is first calculated from the mean-adjusted d-dimensional data. The eigenvectors and eigenvalues of the covariance matrix are then calculated and arranged in decreasing order of eigenvalue ($\lambda_1, \lambda_2, \ldots, \lambda_d$). The eigenvector corresponding to the largest eigenvalue is the first principal component and has the maximum variance. The eigenvectors and eigenvalues are obtained by solving

$$\Sigma\,\mathrm{EigV} = \lambda\,\mathrm{EigV} \qquad (9)$$

where EigV represents the eigenvectors, $\lambda$ the eigenvalues and $\Sigma$ the covariance matrix.

Fig. 3 3-level multi resolution decomposition of EEG signal
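The eigen-decomposition of Eq. (9), together with the projection onto the leading eigenvectors and the 90–95% explained-variance rule the paper uses to choose k, can be sketched in NumPy. It is a minimal illustration, not the authors' implementation: `pca_reduce` is a hypothetical name, the threshold defaults to 0.90, and the scores are returned with subjects as rows (the transpose of the FinalDataSet layout).

```python
import numpy as np


def pca_reduce(X, var_threshold=0.90):
    """PCA via eigen-decomposition of the covariance matrix.
    X: (n_subjects, d) feature matrix. Returns (scores, components, k)."""
    Xc = X - X.mean(axis=0)                    # mean-adjusted data
    cov = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # Eq. (9); eigh since cov is symmetric
    order = np.argsort(eigvals)[::-1]          # decreasing eigenvalue order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # smallest k whose cumulative explained variance reaches the threshold
    k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(eigvals))
    components = eigvecs[:, :k]                # selected eigenvectors as columns
    scores = Xc @ components                   # (n_subjects, k)
    return scores, components, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 76))          # 60 subjects x 76 features (synthetic)
    scores, comps, k = pca_reduce(X, 0.90)
    print(k, scores.shape)
```

If PCA is combined with cross-validation, fitting it on the training folds only keeps test-fold information out of the projection; this is a general precaution rather than a detail stated in the paper.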
The feature vector is formed from the selected eigenvectors:

$$\mathrm{FeatureVector} = \left(\mathrm{eig}_1, \mathrm{eig}_2, \ldots, \mathrm{eig}_k\right)$$

The final reduced data set is obtained as

$$\mathrm{FinalDataSet} = \mathrm{RowFeatureVector} \times \mathrm{RowMeanAdjustData} \qquad (10)$$

where RowFeatureVector holds the eigenvectors as rows, in descending order of their eigenvalues, and RowMeanAdjustData is the mean-adjusted data with each row holding a separate dimension.
The reduced dimension k should be selected such that 90–95% of the total variance is represented:

$$V = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{d} \lambda_j} \geq 0.9 \text{ or } 0.95 \qquad (11)$$

2.6 Classifier model
2.6.1 Multi layer perceptron neural network (MLPNN)
MLPNN (Haykin 2009) is a type of neural network model that uses supervised learning. It is essentially a feed-forward network trained with back-propagation. It consists of one input layer, one output layer and one or more hidden layers. A linear transfer function is used in the input layer, and non-linear transfer functions such as the sigmoid are used in the hidden and output layers. Each node in the input layer represents one attribute. The neurons of the input, hidden and output layers are connected by weighted links. The main computations are done in the hidden layers.
The algorithm comprises two steps: a forward pass and a backward pass. During the forward pass, the input is fed to the network, propagates through it and produces an output; the synaptic weights are not changed. The actual output is then subtracted from the target output to generate an error signal. Based on this error signal, the synaptic weights are adjusted during the backward pass so that the error decreases. This process is repeated until a fixed number of iterations is reached or the error falls below a tolerance value.

2.6.2 Radial basis function network (RBFN)
RBFN (Haykin 2009) is a form of artificial neural network with exactly three layers: an input layer, a single hidden layer and an output layer. The RBFN increases the dimension of the feature vector: a complex classification problem that is not linearly separable in a lower-dimensional space may become linearly separable in a higher-dimensional space. The hidden layer performs a non-linear transformation using radial basis functions, and the output layer performs a linear transformation.
Each node in the hidden layer is represented by a radial basis function, defined as

$$\phi_j(x) = \phi\left(\lVert x - x_j \rVert\right) \qquad (12)$$

where j = 1, 2, …, N and $x_j$ is the center of the radial basis function. Generally, the Gaussian radial basis function is used in the hidden-layer nodes:

$$\phi_j(x) = \phi\left(\lVert x - x_j \rVert\right) = \exp\left(-\frac{1}{2\sigma_j^{2}}\lVert x - x_j \rVert^{2}\right) \qquad (13)$$

where j = 1, 2, …, N and $\sigma_j$ is the width of the jth basis function.
The training of the RBFN is divided into two stages: (i) training of the hidden-layer nodes and (ii) training of the weight vector between the hidden and output layers. In the first stage, either random subset selection or a clustering algorithm (such as k-means) is used to determine the location and width of the localized basis functions; once these parameters are fixed, the first stage is complete. In the second stage, the weight vector between the hidden and output layers is updated with the least mean square (LMS) algorithm.
The output layer has one node per class. Each output node calculates a score for its class as a weighted sum of the activations of the hidden-layer nodes, using its own set of weights. The classification decision is the class with the highest score.

2.6.3 Linear discriminant analysis (LDA)
LDA (Fisher's linear discriminant) is a form of supervised learning. LDA searches for the linear combination of variables that best separates two classes (Bishop 2006; James 2013; Tharwat 2016): a new dimension is chosen such that the separation between the means of the projected classes is maximal and the variance within each projected class is minimal. This is Fisher's criterion, the ratio of the between-class variance to the within-class variance.
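To make the forward and backward passes of Section 2.6.1 concrete, the following NumPy sketch trains a network with one sigmoid hidden layer by back-propagation on a binary MDD-versus-healthy task. The architecture (one hidden layer of 8 units), squared-error loss, learning rate, iteration limit and tolerance are illustrative assumptions, since the paper does not report its MLPNN configuration; the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=8, lr=0.5, epochs=2000, tol=1e-3, seed=0):
    """Back-propagation for one sigmoid hidden layer and one sigmoid output.
    X: (n, d) features; y: (n,) labels in {0, 1} (e.g. 1 = MDD)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # Forward pass: propagate the input; the weights are not changed here.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = t - out                      # error signal = target - actual output
        if np.mean(err ** 2) < tol:        # stop once the error is below tolerance
            break
        # Backward pass: gradient descent on half the mean squared error.
        delta_out = err * out * (1 - out)
        delta_h = (delta_out @ W2.T) * h * (1 - h)
        W2 += lr * h.T @ delta_out / len(X)
        b2 += lr * delta_out.mean(axis=0)
        W1 += lr * X.T @ delta_h / len(X)
        b1 += lr * delta_h.mean(axis=0)
    return W1, b1, W2, b2


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return (out.ravel() >= 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    params = train_mlp(X, y)
    print((predict_mlp(params, X) == y).mean())  # training accuracy
```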
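The two-stage RBFN training of Section 2.6.2 can likewise be sketched in NumPy. Stage one picks the centres by random subset selection (one of the two options named above) and, as an assumption, sets a common Gaussian width with the heuristic σ = d_max/√(2M), where d_max is the largest distance between the M centres; stage two fits the hidden-to-output weights with LMS (Widrow–Hoff) updates, with one output node per class. The bias column, hyper-parameters and function names are hypothetical.

```python
import numpy as np


def rbf_design(X, centers, sigma):
    """Gaussian hidden-layer activations of Eq. (13) for every sample,
    plus a bias column (an added assumption)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])


def train_rbfn(X, y, n_centers=10, lr=0.05, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: centres by random subset selection; common width d_max / sqrt(2M).
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    d_max = np.max(np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1))
    sigma = d_max / np.sqrt(2.0 * n_centers)
    # Stage 2: LMS updates of the hidden-to-output weights (one-hot targets).
    classes = np.unique(y)
    targets = (y[:, None] == classes[None, :]).astype(float)
    phi = rbf_design(X, centers, sigma)
    W = np.zeros((phi.shape[1], len(classes)))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            error = targets[i] - phi[i] @ W      # error of the linear output layer
            W += lr * np.outer(phi[i], error)    # LMS (Widrow-Hoff) update
    return centers, sigma, W, classes


def predict_rbfn(model, X):
    centers, sigma, W, classes = model
    scores = rbf_design(X, centers, sigma) @ W   # weighted sum: one score per class
    return classes[np.argmax(scores, axis=1)]    # class with the highest score


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    model = train_rbfn(X, y)
    print((predict_rbfn(model, X) == y).mean())  # training accuracy
```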
Fisher's criterion is defined as

$$J = \frac{(\mu_2 - \mu_1)^{2}}{\sigma_1^{2} + \sigma_2^{2}} \qquad (14)$$

where $\mu_1$ and $\mu_2$ are the means of the two projected classes and $\sigma_1^{2}$ and $\sigma_2^{2}$ are the corresponding variances.
LDA classification can be carried out on the basis of Bayes' theorem: a test sample x is assigned to the class k for which the discriminant function $\delta_k(x)$ is maximal,

$$\delta_k(x) = x^{T}\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^{T}\Sigma^{-1}\mu_k + \log \pi_k \qquad (15)$$

where $\Sigma$ is the covariance matrix (common to all classes), $\mu_k$ is the mean of the kth class, and $\pi_k$ is defined as

$$\pi_k = n_k / n$$

where n is the total number of observations and $n_k$ is the number of samples belonging to class k.

2.6.4 Quadratic discriminant analysis (QDA)
The decision boundary of QDA is non-linear (James 2013; Tharwat 2016). QDA is very similar to LDA, except that LDA assumes the same covariance matrix for every class, whereas in QDA each class has its own covariance matrix. QDA classification is also carried out on the basis of Bayes' theorem: a test sample x is assigned to the class k for which the discriminant function $\delta_k(x)$ is maximal,

$$\delta_k(x) = -\frac{1}{2}(x - \mu_k)^{T}\Sigma_k^{-1}(x - \mu_k) - \frac{1}{2}\log\left|\Sigma_k\right| + \log \pi_k \qquad (16)$$

where $\Sigma_k$ is the covariance matrix of class k. When the covariance matrices of the classes are equal, LDA performs better than QDA, but when they differ, QDA performs better than LDA.

2.7 Validation
The performance of the classifiers was evaluated using tenfold cross-validation repeated for 100 iterations. In this method, the data set is divided into ten segments of equal size; in each fold, nine segments are used for training and the remaining segment is used for testing. Validation evaluates the model on data other than the training data used for parameter estimation.
Accuracy, sensitivity and specificity are calculated from the confusion matrix:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (17)$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (18)$$

$$\mathrm{Accuracy} = \frac{\text{number of correct decisions}}{\text{total number of cases}} \qquad (19)$$

where TP is the number of MDD patients detected correctly, FN is the number of MDD patients detected as healthy individuals, TN is the number of healthy individuals detected correctly, and FP is the number of healthy individuals detected as MDD patients.

3 Experimental results
3.1 Results for classification based on linear methods
3.1.1 Band power
Figure 4 shows that, among all the band powers, alpha power gives the highest accuracy, 91.67% with MLPNN. The classification accuracy of all the band powers combined is always lower than that of alpha power for every applied classifier.

3.1.2 Interhemispheric asymmetry
Figure 4 shows that alpha asymmetry has the highest classification accuracy among all types of asymmetry for all the applied classifiers. The highest classification accuracy for alpha asymmetry, 73.33%, was obtained with QDA. The classification accuracy of theta asymmetry was on par with that of alpha asymmetry; its highest value, 71.67%, was obtained with MLPNN.
Tables 1, 2, 3 and 4 show the performance of the MLPNN, RBFN, LDA and QDA classifiers, respectively, using linear features (band power and interhemispheric asymmetry) only.

3.2 Results for classification based on non-linear features
Figure 5 shows that, on average, both RWE and WE have good classification accuracy. RWE and WE both showed the highest classification accuracy of 90%, with the RBFN and LDA classifiers, respectively. Tables 5, 6, 7 and 8 show the performance of the MLPNN, RBFN, LDA and QDA classifiers, respectively, using non-linear features (RWE and WE) only.
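Eqs. (15) and (16) can be evaluated directly. The NumPy sketch below estimates the class means, the priors π_k = n_k/n, the per-class covariances (for QDA) and a pooled covariance (for LDA, whose covariance is shared by all classes), then assigns each sample to the class with the largest discriminant. The pooled estimator and the function names are assumptions. Both rules need invertible covariance matrices, which in practice requires fewer features than samples per class.

```python
import numpy as np


def fit_discriminants(X, y):
    """Class means, per-class and pooled covariances, and priors pi_k = n_k / n.
    X: (n, d) with d >= 2; y: (n,) class labels."""
    classes = np.unique(y)
    groups = [X[y == c] for c in classes]
    means = np.array([g.mean(axis=0) for g in groups])
    covs = np.array([np.cov(g, rowvar=False) for g in groups])
    counts = np.array([len(g) for g in groups])
    priors = counts / counts.sum()
    # LDA assumes one covariance shared by all classes: use the pooled estimate.
    pooled = np.tensordot(counts - 1, covs, axes=1) / (counts.sum() - len(classes))
    return classes, means, covs, pooled, priors


def lda_predict(model, X):
    classes, means, _, pooled, priors = model
    inv = np.linalg.inv(pooled)
    # Eq. (15) for all classes at once: x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log pi_k
    scores = (X @ inv @ means.T
              - 0.5 * np.einsum("kd,de,ke->k", means, inv, means)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]


def qda_predict(model, X):
    classes, means, covs, _, priors = model
    scores = []
    for mu, S, p in zip(means, covs, priors):
        diff = X - mu
        maha = np.einsum("nd,de,ne->n", diff, np.linalg.inv(S), diff)
        _, logdet = np.linalg.slogdet(S)
        scores.append(-0.5 * maha - 0.5 * logdet + np.log(p))  # Eq. (16)
    return classes[np.argmax(np.column_stack(scores), axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 1.0, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    model = fit_discriminants(X, y)
    print((lda_predict(model, X) == y).mean(), (qda_predict(model, X) == y).mean())
```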
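The repeated tenfold cross-validation of Section 2.7 and the metrics of Eqs. (17)–(19) can be sketched as below, with MDD as the positive class (label 1). Two choices here are assumptions, as the paper does not specify them: predictions from all folds and repetitions are pooled before the metrics are computed (averaging per-fold metrics is an alternative), and `nearest_mean` is a hypothetical stand-in for MLPNN, RBFN, LDA or QDA.

```python
import numpy as np


def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy, Eqs. (17)-(19)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(y_true)}


def repeated_kfold_indices(n, n_folds=10, n_repeats=100, seed=0):
    """Each repetition shuffles the samples into n_folds (near-)equal segments;
    every segment is used once for testing while the others are used for training."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n), n_folds)
        for i in range(n_folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            yield train, folds[i]


def cross_validate(fit_predict, X, y, **kw):
    """fit_predict(X_train, y_train, X_test) -> predicted labels for X_test."""
    preds, truths = [], []
    for train, test in repeated_kfold_indices(len(y), **kw):
        preds.append(fit_predict(X[train], y[train], X[test]))
        truths.append(y[test])
    return confusion_metrics(np.concatenate(truths), np.concatenate(preds))


def nearest_mean(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: assign each test sample to the closer class mean."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    print(cross_validate(nearest_mean, X, y, n_folds=10, n_repeats=100))
```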
Fig. 4 Comparison of classification accuracy of linear features

Table 1 MLPNN classifier's performance for linear features
Features          Accuracy (%)   Sensitivity (%)   Specificity (%)
Delta             83.33          93.33             73.33
Theta             86.67          83.33             90.00
Alpha             91.67          96.67             86.67
Beta              81.67          80.00             83.33
All band power    80.00          83.33             76.67
Delta asymm       61.67          53.33             70.00
Theta asymm       71.67          70.00             73.33
Alpha asymm       71.67          80.00             63.33
Beta asymm        66.67          66.67             66.67

Table 2 RBFN classifier's performance for linear features
Features          Accuracy (%)   Sensitivity (%)   Specificity (%)
Delta             71.67          70.00             73.33
Theta             85.00          86.67             83.33
Alpha             85.00          80.00             90.00
Beta              78.33          80.00             76.67
All band power    80.00          76.67             83.33
Delta asymm       50.00          66.67             33.33
Theta asymm       70.33          100.00            66.67
Alpha asymm       66.67          100.00            33.33
Beta asymm        66.67          100.00            33.33

Table 3 LDA classifier's performance for linear features
Features          Accuracy (%)   Sensitivity (%)   Specificity (%)
Delta             73.33          66.67             80.00
Theta             80.00          70.00             90.00
Alpha             86.67          83.33             90.00
Beta              78.33          76.67             80.00
All band power    83.33          77.78             85.19
Delta asymm       48.33          46.67             50.00
Theta asymm       61.67          63.33             60.00
Alpha asymm       61.67          63.33             60.00
Beta asymm        70.00          60.00             80.00

Table 4 QDA classifier's performance for linear features
Features          Accuracy (%)   Sensitivity (%)   Specificity (%)
Delta             73.33          90.00             56.67
Theta             71.67          86.67             56.67
Alpha             81.67          93.33             70.00
Beta              70.00          53.33             53.33
All band power    73.33          93.33             53.33
Delta asymm       65.00          60.00             70.00
Theta asymm       73.33          63.33             83.33
Alpha asymm       73.33          66.67             80.00
Beta asymm        76.67          70.00             83.33
3.3 Results for classification based on combination of linear and non-linear features
Figure 6 shows that the highest classification accuracy of 93.33% was achieved by the combination of alpha power (linear method) and RWE (non-linear method) with MLPNN, RBFN and LDA. The combination of the two non-linear features (RWE and WE) also performed well with almost all the applied classifiers, reaching a highest classification accuracy of 93.33% with LDA. Whenever the alpha asymmetry feature was added to the combination of alpha power and RWE, accuracy degraded for all the classifiers. Tables 9, 10, 11 and 12 show the performance of the classifiers MLPNN, RBFN,
classification accuracy of non-
linear features
Table 5 MLPNN classifier’s performance for non-linear features Figure 7 compares the classification accuracy of
MLPNN, RBFN, LDA and QDA based on best performing
Features Accuracy (%) Sensitivity (%) Specificity (%)
linear feature (alpha power) non-linear feature (WE) and
RWE 86.67 86.67 86.67 combination of linear and non-linear feature (alpha
WE 88.33 90.00 86.67 power ? WE). It shows that combination of alpha power
and WE features works the best in all types of applied
classifier.
Table 6 RBFN classifier’s performance for non-linear features

4 Discussion
RWE 90.00 83.33 96.67 Highest classification accuracy of 93.33% was achieved by
WE 83.33 73.33 93.33 combination of alpha power and WE features in MLPNN
and RBFN classifiers. The improvement in accuracy by
combination of linear and non-linear feature is because of
the fact that the new feature set can analyse the signal’s
frequency and time domain behavior as well as the chaotic
Table 7 LDA classifier’s performance for non-linear features
and complex behavior of the signal.
Features Accuracy (%) Sensitivity (%) Specificity (%) Combination of two non-linear features, i.e., WE and
RWE 86.67 80.00 93.33 RWE also provided the highest classification accuracy of
WE 90.00 83.33 96.67 91.67% using LDA classifier which is higher than accuracy
obtained by any non-linear feature alone in any of the
applied classifier. The improvement in the accuracy is due
to the fact that when combination of two non-linear fea-
tures is used more amount of complex and chaotic behavior
Table 8 QDA classifier’s performance for non-linear features
of brain signal is captured.
Features Accuracy (%) Sensitivity (%) Specificity (%) Hosseinifarda et al. (2013) got similar result with
accuracy of 90% by combination of four non-linear fea-
RWE 80.00 86.67 73.33
tures (DFA, HFD, CD and Lyapunov exponent) using
WE 85.00 93.33 76.67
logistic regression classifier.
Alpha band power provided the highest accuracy among
the entire power band showing that MDD patients and
healthy individuals more significantly differ in alpha band
LDA, QDA respectively using combination of linear and than in any other power band. Similar results were found in
non-linear features. literature (Hosseinifarda et al. 2013; Mohammadi et al.
On the basis of above results it can be concluded that 2015).
combination of RWE and alpha power gives the best result Table 13, compares the present study with existing
in all kind of classifier. algorithm (Mumtaz et al. 2017b) on the published dataset.
classification accuracy of combination of linear and non-linear features
Table 9 MLPNN classifier’s performance for linear and non-linear features
Features                          Accuracy (%)   Sensitivity (%)   Specificity (%)
Alpha power + RWE                 93.33          94.44             87.78
Alpha power + RWE + alpha asymm   90.00          93.33             86.67
Alpha power + alpha asymm         83.33          66.67             100.00
RWE + alpha asymm                 88.33          83.33             93.33
RWE + WE                          90.00          90.00             90.00
WE + alpha power                  86.67          90.00             83.33
Table 10 RBFN classifier’s performance for linear and non-linear features
Features                          Accuracy (%)   Sensitivity (%)   Specificity (%)
Alpha power + RWE + alpha asymm   85.00          76.67             93.33
Alpha power + alpha asymm         83.33          75.00             91.66
RWE + alpha asymm                 85.00          84.00             91.00
RWE + WE                          88.33          86.67             90.00
WE + alpha power                  88.33          86.67             90.00
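Table 11 LDA classifier’s performance for linear and non-linear features
Features                          Accuracy (%)   Sensitivity (%)   Specificity (%)
Alpha power + RWE + alpha asymm   90.00          80.00             100.00
RWE + alpha asymm                 88.33          83.33             93.33
RWE + WE                          93.33          86.67             88.24
WE + alpha power                  91.67          83.33             100.00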
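Table 12 QDA classifier’s performance for linear and non-linear features
Features                          Accuracy (%)   Sensitivity (%)   Specificity (%)
Alpha power + RWE + alpha asymm   85.00          93.33             76.67
RWE + alpha asymm                 81.67          90.00             73.33
RWE + WE                          85.00          93.33             76.67
WE + alpha power                  81.67          86.67             76.67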
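Tables 9-12 evaluate combined feature sets (alpha power with RWE, RWE with WE, and so on), and the comparison in Table 13 states that PCA was used for reduction in the present study. This section does not describe how the sets were merged, so the sketch below assumes simple feature-level fusion: each set is z-scored, the columns are concatenated, and PCA axes are then learned by singular value decomposition. The function names, the array shapes (19 channels, 6 wavelet bands) and the random data are hypothetical placeholders, not the paper's pipeline.

```python
import numpy as np


def zscore(features):
    """Standardize each column to zero mean and unit variance.

    Columns with zero variance are centered but not scaled.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0
    return (features - mean) / std


def combine_features(*feature_sets):
    """Assumed feature-level fusion: z-score each set, then concatenate column-wise.

    Every set is an (n_subjects, n_features_i) array whose rows follow the
    same subject order.
    """
    return np.hstack([zscore(f) for f in feature_sets])


def pca_fit(train, n_components):
    """Learn the mean and top principal axes from training rows only (via SVD)."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(features, mean, axes):
    """Project rows onto the learned principal axes."""
    return (np.asarray(features, dtype=float) - mean) @ axes.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_subjects = 64  # 34 MDD patients + 30 healthy individuals, as in the dataset used
    alpha_power = rng.normal(size=(n_subjects, 19))  # hypothetical: one value per channel
    rwe = rng.normal(size=(n_subjects, 19 * 6))      # hypothetical: 6 bands per channel
    combined = combine_features(alpha_power, rwe)
    train, test = combined[:48], combined[48:]
    mean, axes = pca_fit(train, n_components=10)
    print(pca_transform(train, mean, axes).shape, pca_transform(test, mean, axes).shape)
```

Learning the PCA axes on the training rows only keeps test subjects out of the fitted transform; in a full cross-validation the z-scoring statistics should likewise be estimated on each training fold, which this sketch skips for brevity.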
classification accuracy of classifiers for the best performing linear feature (alpha power), non-linear feature (WE) and combination of linear and non-linear feature (alpha power + RWE)
Table 13 Comparative study with existing algorithm on the published data set
Paper name
  Mumtaz et al. (2017a, b): A wavelet-based technique to predict treatment outcome for major depressive disorder
  Present study: Detection of major depressive disorder using linear and non-linear features from EEG signal
Objective
  Mumtaz et al. (2017a, b): (i) Prediction of the response to antidepressant treatment in MDD patients using EEG signals; (ii) classification of MDD patients and healthy individuals based on EEG signals
  Present study: Classification of MDD patients and healthy individuals based on EEG signals using different linear and non-linear features
Features used
  Mumtaz et al. (2017a, b): EEG features computed with wavelet transform (WT) analysis, short-time Fourier transform (STFT) analysis and empirical mode decomposition (EMD)
  Present study: Linear features: band power and asymmetry. Non-linear features: relative wavelet energy (RWE) and wavelet entropy (WE). Combinations of linear and non-linear features
Classifier used
  Mumtaz et al. (2017a, b): Logistic regression (LR)
  Present study: MLPNN, RBFN, LDA, QDA
Feature reduction and selection techniques used
  Mumtaz et al. (2017a, b): (i) Rank-based feature selection on the basis of the receiver operating characteristic (ROC); (ii) minimum redundancy maximum relevance (mRMR) method
  Present study: Principal component analysis (PCA)
Results
  Mumtaz et al. (2017a, b): Highest classification accuracy achieved with the combination of wavelet, STFT and EMD features using the rank-based selection technique. Accuracy: 90.5 ± 8.3%; sensitivity: 91.6 ± 5.7%; specificity: 88.7 ± 7.5%
  Present study: Highest classification accuracy achieved with the combination of alpha power and RWE in both MLPNN and RBFN, using PCA for reduction. Accuracy: 93.33 ± 1.67%; sensitivity: 94.44 ± 3.68%; specificity: 87.78 ± 2.12%
https://figshare.com/articles/EEG-based_Diagnosis_and_Treatment_Outcome_Prediction_for_Major_Depressive_Disorder/3385168
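Table 13 reports accuracy, sensitivity and specificity as mean ± standard deviation. This section does not state how the spread was computed (for example over cross-validation folds or repeated runs), so the sketch below simply assumes one set of metrics per fold and a sample standard deviation. MDD is taken as the positive class (label 1); the function names and fold data are hypothetical.

```python
import statistics


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) in percent.

    `positive` marks the MDD class; every other label is treated as healthy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = 100.0 * (tp + tn) / len(pairs)
    sensitivity = 100.0 * tp / (tp + fn)  # share of MDD patients detected
    specificity = 100.0 * tn / (tn + fp)  # share of healthy subjects correctly rejected
    return accuracy, sensitivity, specificity


def summarize(fold_metrics):
    """Mean and sample standard deviation of each metric over folds (needs >= 2 folds)."""
    return [(statistics.mean(v), statistics.stdev(v)) for v in zip(*fold_metrics)]


if __name__ == "__main__":
    folds = [  # hypothetical (true labels, predicted labels) per fold
        ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
        ([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 1]),
    ]
    per_fold = [binary_metrics(t, p) for t, p in folds]
    for name, (m, s) in zip(("Accuracy", "Sensitivity", "Specificity"), summarize(per_fold)):
        print(f"{name}: {m:.2f} ± {s:.2f}%")
```

Reporting sensitivity and specificity separately matters here: in Table 13 the present study has the higher sensitivity (94.44% vs 91.6%) but a slightly lower specificity (87.78% vs 88.7%) than Mumtaz et al., which a single accuracy figure would hide.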
Alpha asymmetry as well as theta asymmetry showed good classification accuracy. Many studies (Hinrikus et al. 2009; Ricardo-Garcel et al. 2010; Stewart et al. 2014; Gollan et al. 2014) have investigated alpha asymmetry and shown its potential for high prediction accuracy. This study, however, also reveals the potential of theta asymmetry for high prediction accuracy, an area of research that still needs to be explored.
In this study four classifiers, MLPNN, RBFN, LDA and QDA, have been used. Both MLPNN and RBFN performed better than LDA and QDA in most cases, and MLPNN performs better than RBFN when performance across all the feature sets is compared. MLPNN gives good and reliable results because of its ability to represent the complex non-linear behavior of the problem. This characteristic of MLPNN is mainly due to the presence of one or more hidden layers with differentiable activation functions and the dense connections between layers. Being non-parametric in the sense that it assumes no particular distribution for the data, MLPNN can capture finer details of the data, which helps in predicting unseen data.
Future research work will concentrate on the specific regions of the brain that are affected in depression. EEG signals from a larger number of MDD patients need to be analyzed so that the results can be generalized.

5 Conclusion
The study demonstrated that EEG signals can be used effectively to discriminate between MDD patients and healthy individuals. Combining linear and non-linear features, or combining non-linear features, is an effective way of increasing the accuracy of the classifier. Along with alpha asymmetry, theta asymmetry can also be used for the diagnosis of depression. Among the classifiers used, i.e., MLPNN, RBFN, LDA and QDA, MLPNN outperformed all the others on the given localized data set.

References
American Psychiatric Association (1994) Diagnostic and statistical manual of mental disorders, 4th edn. American Psychiatric Association, Washington, DC, pp 339–345
Bachmann M, Lass J, Suhhova A, Hinrikus H (2013) Spectral asymmetry and Higuchi’s fractal dimension measures of depression electroencephalogram. Comput Math Methods Med 2013:1–9
Bachmann M, Lass J, Hinrikus H (2017) Single channel EEG analysis for detection of depression. Biomed Signal Process Control 31:391–397
Bishop C (2006) Linear models for classification. In: Jordan M, Kleinberg J, Scholkopf B (eds) Pattern recognition and machine learning. Springer, Singapore, pp 186–189
Bopardikar AS, Rao RM (1998) Wavelet transforms: introduction to theory and applications. Dorling Kindersley Publishing Inc, New Delhi, pp 2–82
Bruder GE, Stewart JW, Hellerstein D et al (2012) Abnormal functional brain asymmetry in depression: evidence of biologic commonality between major depression and dysthymia. Psychiatry Res 196:250–254
Cusin C, Yang H, Yeung A et al (2009) Rating scales for depression. In: Baer L, Blais MA (eds) Handbook of clinical rating scales and assessment in psychiatry and mental health. Current Clinical Psychiatry, Boston, pp 7–37
Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 134:9–21
Dharmadhikari AS, Tandle AL, Jaiswal SV et al (2018) Frontal theta asymmetry as a biomarker of depression. East Asian Arch Psychiatry 28:17–22
Dien J (1998) Issues in the application of the average reference: review, critiques and recommendations. Behav Res Methods Instrum Comput 30:34–43
Gandhi V (2014) Brain computer interfacing for assistive robotics. Electroencephalograms, recurrent quantum neural networks, and user-centric graphical interfaces, 1st edn. Academic Press, Cambridge, pp 21–29
Gollan JK, Hoxha D, Chihade D et al (2014) Frontal alpha EEG asymmetry before and after behavioral activation treatment for depression. Biol Psychol 99:198–208
Grin-Yatsenko VA, Baas I, Ponomarev VA et al (2010) Independent component approach to the analysis of EEG recordings at early stages of depressive disorders. Clin Neurophysiol 281:281–289
Haykin S (2009) Multilayer perceptrons. In: Dworkin A, Mars D, Opaluc W (eds) Neural networks and learning machines, 3rd edn. Pearson Education, Cranbury, pp 1–263
Hinrikus H, Sukhova A, Bachmann M et al (2009) Electroencephalographic spectral asymmetry index for detection of depression. Med Biol Eng Comput 47:1291–1299
Hosseinifarda B, Moradia MH, Rostami R (2013) Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Comput Methods Programs Biomed 109:339–345
James G (2013) Classification. In: Casella G, Fienberg S, Olkin I (eds) An introduction to statistical learning with applications in R. Springer, New York, pp 138–150
Jolliffe IT (2002) Principal component analysis, Springer series in statistics, 2nd edn. Springer, New York, pp 1–147
Joyce CA, Gorodnitsky IF, Kutas M (2003) Automatic removal of eye movement and blink artifacts from EEG data using blind component separation. In: Fabiani M, Jennings JR (eds) Psychophysiology. Blackwell Publishing Inc, Malden, pp 313–325
Jung TP, Humphries C, Lee TW et al (1998) Extended ICA removes artifacts from electroencephalographic recordings. Adv Neural Inf Process Syst 10:1–7
Jung TP, Makeig S, Humphries C et al (2000) Removing electroencephalographic artifacts by blind source separation. Psychophysiology. Cambridge University Press, Cambridge, pp 163–178
Mallat SG (1989) A theory for multi-resolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11:674–694
Mohammadi M, Al-Azab F, Raahem B et al (2015) Data mining EEG signals in depression for their diagnostic value. BMC Med Inform Decis Making 108:108–123
Mumtaz W, Xia L, Ali SSA et al (2017a) A wavelet-based technique to predict treatment outcome for major depressive disorder. PLoS One 2017:1–30
Mumtaz W, Xia L, Ali SSA et al (2017b) Electroencephalogram (EEG)-based computer-aided technique to diagnose major depressive disorder (MDD). Biomed Signal Process Control 31:108–115
Puthankattil SD, Joseph PK (2012) Classification of EEG signals in normal and depression conditions by ANN using RWE and signal entropy. J Mech Med Biol 12:1240019–1240032
Puthankattil SD, Joseph PK (2014) Analysis of EEG signals using wavelet entropy and approximate entropy: a case study on depression patients. Int J Bioeng Life Sci 8:420–424
Ricardo-Garcel J, Gonzalez-Olvera JJ, Miranda E et al (2010) EEG sources in a group of patients with major depressive disorders. Int J Psychophysiol 71:70–74
Rodreguez-Bermudez G, Garcia-Laencina P (2015) Analysis of EEG signals using nonlinear dynamics and chaos: a review. Appl Math Inf Sci 9:2309–2321
Rosso OA, Martin MT, Figliola A et al (2006) EEG analysis using wavelet-based information tools. J Neurosci Methods 153:163–182
Stewart JL, Coan JA, Towers DN et al (2014) Resting and task-elicited prefrontal EEG alpha asymmetry in depression: support for the capability model. Psychophysiology 51:1–18
Tharwat A (2016) Linear vs. quadratic discriminant analysis classifier: a tutorial. Int J Appl Pattern Recognit 3(2):145–180
