Classification of Electrocardiogram Signals With Support Vector Machines and Particle Swarm Optimization

IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 12, NO.
5, SEPTEMBER 2008
Farid Melgani, Senior Member, IEEE, and Yakoub Bazi, Member, IEEE
Abstract: The aim of this paper is twofold. First, we present a
thorough experimental study to show the superiority of the generalization capability of the support vector machine (SVM) approach
in the automatic classification of electrocardiogram (ECG) beats.
Second, we propose a novel classification system based on particle
swarm optimization (PSO) to improve the generalization performance of the SVM classifier. For this purpose, we have optimized
the SVM classifier design by searching for the best value of the
parameters that tune its discriminant function, and upstream by
looking for the best subset of features that feed the classifier. The
experiments were conducted on the basis of ECG data from the
Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH)
arrhythmia database to classify five kinds of abnormal waveforms and normal beats. In particular, they were organized so as to
test the sensitivity of the SVM classifier and that of two reference
classifiers used for comparison, i.e., the k-nearest neighbor (kNN)
classifier and the radial basis function (RBF) neural network classifier, with respect to the curse of dimensionality and the number
of available training beats. The obtained results clearly confirm
the superiority of the SVM approach as compared to traditional
classifiers, and suggest that further substantial improvements in
terms of classification accuracy can be achieved by the proposed
PSO-SVM classification system. On average, over three experiments making use of a different total number of training beats
(250, 500, and 750, respectively), the PSO-SVM yielded an overall
accuracy of 89.72% on 40438 test beats selected from 20 patient
records against 85.98%, 83.70%, and 82.34% for the SVM, the
kNN, and the RBF classifiers, respectively.
Index Terms: Electrocardiogram (ECG) signal classification,
feature detection, feature reduction, generalization capability,
model selection issue, particle swarm optimization (PSO), support
vector machine (SVM).
I. INTRODUCTION
FOR SEVERAL years, the automatic classification of electrocardiogram (ECG) signals has received great attention
from the biomedical engineering community. This is mainly due
to the fact that ECG provides cardiologists with useful information about the rhythm and functioning of the heart. Therefore, its
analysis represents an efficient way to detect and treat different
kinds of cardiac diseases.
Manuscript received June 21, 2007; revised December 31, 2007 and
March 4, 2008. First published April 11, 2008; current version published
September 4, 2008.
F. Melgani is with the Department of Information Engineering and Computer Science, University of Trento, I-38050 Trento, Italy (e-mail: melgani@disi.unitn.it).
Y. Bazi is with the College of Engineering, Al Jouf University, Al Jouf 2014,
Saudi Arabia (e-mail: yakoub.bazi@ju.edu.sa).
Digital Object Identifier 10.1109/TITB.2008.923147
In the literature, several methods have been proposed for
the automatic classification of ECG signals. Among the most
recently published works are those presented in [1]-[10]. In
greater detail, the method presented in [1] is based on a hybrid
fuzzy neural network that consists of a fuzzy self-organizing
subnetwork connected in a cascade with a multilayer perceptron. The authors proposed to use high-order statistics (i.e., cumulants of the second, third, and fourth orders) as input features
for feeding their classifier. In [2], a neuro-fuzzy approach for the
ECG-based classification of heart rhythms is described. Here,
the QRS complex signal is characterized by Hermite polynomials, whose coefficients feed the neuro-fuzzy classifier. In [3],
the authors implemented two classification systems based on
the support vector machine (SVM) approach. The first exploits
features based on high-order statistics, while the second uses
the coefficients of Hermite polynomials. For improved performance, the authors propose to combine the two classifiers by
means of a weighting mechanism, whose weights are determined according to a least square estimation method. Detection of premature ventricular contractions (PVCs) by means of
a fuzzy-neural network classifier with features derived from a
quadratic spline wavelet transform is proposed in [4]. In [5], different classification systems based on linear discriminant classifiers are explored, together with different morphological and
timing features obtained from single and multiple ECG leads.
In [6], a high-order spectral analysis method is proposed for
the analysis and classification of cardiac arrhythmias, based on
bispectral analysis techniques. In particular, the bispectrum is
estimated using an autoregressive model, and the frequency support of the bispectrum is extracted as a quantitative measure to
classify atrial and ventricular tachyarrhythmias. In [7], an automatic online beat segmentation and classification system based
on a Markovian approach is proposed. The system carries out
ECG signal analysis through two processing layers. In the first,
the ECG signal is segmented into beat waveforms by means of
a robust and precise waveform modeling with hidden Markov
models (HMMs). In the second, the system identifies premature ventricular contraction beats using a simple set of rules.
In [8], a rule-based rough-set decision system is presented for
the development of an inference engine for disease identification
using time-domain features. In [9], a patient-adapting heartbeat
classifier system based on linear discriminants is proposed. The
classification system processes an incoming recording with a
global-classifier to produce the first set of beat annotations.
Then, an expert validates, and, if necessary, corrects a fraction
of the beats of the recording. The system then adapts by first
training a local classifier using the newly annotated beats, and
then combining the local and global classifiers to form an adapted
classification system. Finally, in [10], the authors present an
approach for classifying beats of a large dataset by training a
neural network classifier using wavelet and timing features. The
authors found that the fourth scale of a dyadic wavelet transform with a quadratic spline wavelet together with the pre/post
RR-interval ratio is very effective in distinguishing normal and
PVC from other beats.
From these works, it appears clear that research in the field
of automatic ECG classification has reached a good level of
maturation. However, in the design of an ECG classification
system, there are still some open issues, which, if suitably addressed, may lead to the development of more robust and efficient classifiers. One of these issues is related to the choice
of the classification approach to be adopted. In particular, we
think that despite its great potential, the SVM approach has not
received the attention it deserves in the ECG classification literature as compared to other research fields. Indeed, the SVM
classifier exhibits a promising generalization capability, thanks
to the maximal margin principle (MMP) it is based on [11]. Another important property is that it is less sensitive to the curse of
dimensionality than traditional classification approaches. This
is explained by the fact that the MMP makes it unnecessary
to estimate explicitly the statistical distributions of classes in
the hyperdimensional feature space in order to carry out the
classification task. Thanks to these interesting properties, the
SVM classifier has proved successful in a number of different
application fields, such as 3-D object recognition [12], biomedical imaging [13], image compression [14], and remote sensing [15], [16]. Turning back to ECG classification, other issues
that need to be addressed are the following: 1) feature selection
is not performed in a completely automatic way and 2) the selection of the best free parameters of the adopted classifier is
generally done empirically (model selection issue).
In this paper, in order to address the aforementioned issues, in
a first step, we present a thorough experimental exploration of
the SVM capabilities for ECG classification. In a second step,
we propose to optimize further the performances of the SVM
approach in terms of classification accuracy: 1) by automatically detecting the best discriminating features from the whole
considered feature space and 2) by solving the model selection
issue. Unlike traditional feature selection methods, where the
user has to specify the number of desired features, the proposed
system makes it possible to carry out what we term "feature detection."
Feature selection and feature detection have the common characteristic of searching for the best discriminative features. The
latter, however, has the advantage of determining their number
automatically. In other words, feature detection does not require
the desired number of most discriminative features from the user
a priori. The detection process is implemented through a particle
swarm optimization (PSO) framework that exploits a criterion
intrinsically related to SVM classifier properties, namely, the
number of support vectors (SVs). This framework is formulated
in such a way that it also solves the model selection issue, i.e.,
to estimate the best values of the SVM classifier parameters,
which are the regularization and kernel parameters.
The rest of the paper is organized as follows. The basic mathematical formulation of SVMs for solving binary and multiclass
classification problems is recalled in Section II. The main concepts and principles of PSO are introduced in Section III. The
proposed PSO-SVM classification system is described in Section IV. The experimental results obtained on ECG data from
the Massachusetts Institute of Technology-Beth Israel Hospital
(MIT-BIH) arrhythmia database [17] are reported in Sections V
and VI. Finally, conclusions are drawn in Section VII.
II. SUPPORT VECTOR MACHINES
Let us first consider, for simplicity, a supervised binary classification problem. Let us assume that the training set consists of
$N$ vectors $x_i \in \mathbb{R}^{d}$ ($i = 1, 2, \ldots, N$) from the $d$-dimensional
feature space $X$. To each vector $x_i$, we associate a target
$y_i \in \{-1, +1\}$. The linear SVM classification approach consists of looking for a separation between the two classes in $X$ by
means of an optimal hyperplane that maximizes the separating
margin [11]. In the nonlinear case, which is the most commonly
used as data are often linearly nonseparable, the two classes are
first mapped with a kernel method in a higher dimensional feature space, i.e., $\Phi(X) \in \mathbb{R}^{d'}$ ($d' > d$). The membership decision
rule is based on the function $\mathrm{sign}[f(x)]$, where $f(x)$ represents
the discriminant function associated with the hyperplane in the
transformed space and is defined as

$$f(x) = w^{*} \cdot \Phi(x) + b^{*}. \tag{1}$$

The optimal hyperplane defined by the weight vector $w^{*} \in \mathbb{R}^{d'}$
and the bias $b^{*} \in \mathbb{R}$ is the one that minimizes a cost function
that expresses a combination of two criteria: margin maximization and error minimization. It is expressed as [11]

$$\Psi(w, \xi) = \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \xi_i. \tag{2}$$
This cost function minimization is subject to the following
constraints:

$$y_i \left( w \cdot \Phi(x_i) + b \right) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, N \tag{3}$$

and

$$\xi_i \ge 0, \quad i = 1, 2, \ldots, N \tag{4}$$

where the $\xi_i$'s are slack variables introduced to account for
nonseparable data. The constant $C$ represents a regularization
parameter that makes it possible to control the shape of the discriminant
function. The aforementioned optimization problem can be reformulated through a Lagrange functional, for which the Lagrange multipliers can be found by means of a dual optimization
leading to a quadratic programming (QP) solution [11], i.e.,

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{5}$$

under the constraints

$$\alpha_i \ge 0 \quad \text{for } i = 1, 2, \ldots, N \tag{6}$$

and

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \tag{7}$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]$ is the vector of Lagrange multipliers and $K(\cdot, \cdot)$ is a kernel function. The final result is a
discriminant function conveniently expressed as a function of
the data in the original (lower) dimensional feature space $X$

$$f(x) = \sum_{i \in S} \alpha_i^{*} y_i K(x_i, x) + b^{*}. \tag{8}$$

The set $S$ is a subset of the indexes $\{1, 2, \ldots, N\}$ corresponding to the nonzero Lagrange multipliers $\alpha_i$'s, which define
the so-called SVs. The kernel $K(\cdot, \cdot)$ must satisfy the condition
stated in Mercer's theorem so as to correspond to some type
of inner product in the transformed (higher) dimensional feature space $\Phi(X)$ [11]. A typical example of such kernels is
represented by the following Gaussian function:

$$K(x_i, x) = \exp\left(-\gamma \|x_i - x\|^{2}\right) \tag{9}$$

where $\gamma$ represents a parameter inversely proportional to the
width of the Gaussian kernel.
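As an illustration of (8) and (9), the following Python sketch evaluates the Gaussian kernel and the resulting decision rule sign[f(x)]. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names are hypothetical, and the support vectors, multipliers, and bias are assumed to come from an SVM that has already been trained elsewhere.

```python
import math


def gaussian_kernel(xi, x, gamma):
    """Gaussian kernel of (9): exp(-gamma * ||xi - x||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)


def discriminant(x, support_vectors, alphas, labels, bias, gamma):
    """Discriminant function of (8), summed over the support vectors only."""
    total = sum(
        a * y * gaussian_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )
    return total + bias


def classify(x, support_vectors, alphas, labels, bias, gamma):
    """Membership decision rule sign[f(x)], returned as +1 or -1."""
    f_x = discriminant(x, support_vectors, alphas, labels, bias, gamma)
    return 1 if f_x >= 0 else -1
```

Because the sum runs only over the support vectors, the cost of classifying one beat grows with the number of SVs, which is one practical reason why a small SV count is attractive.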
As described before, SVMs are intrinsically binary classifiers. But, the classification of ECG signals often involves the
simultaneous discrimination of numerous information classes.
In order to face this issue, a number of multiclass classification strategies can be adopted [15], [18]. The most popular ones
are the one-against-all (OAA) and the one-against-one (OAO)
strategies. The former involves a reduced number of binary decompositions (and thus, of SVMs), which are, however, more
complex. The latter requires a shorter training time, but may
incur conflicts between classes due to the nature of the score
function used for decision. Both strategies generally lead to
similar results in terms of classification accuracy. In this paper, we shall consider the OAA strategy. Briefly, this strategy is
based on the following procedure. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_T\}$
be the set of $T$ possible labels (information classes) associated
with the ECG beats that we desire to classify. First, an ensemble
of $T$ (parallel) SVM classifiers is trained. Each classifier aims
at solving a binary classification problem defined by the discrimination between one information class $\omega_i$ ($i = 1, 2, \ldots, T$)
and all others (i.e., $\Omega - \{\omega_i\}$). Then, in the classification
phase, the winner-takes-all rule is used to decide which label
to assign to each beat. This means that the winning class is the
one that corresponds to the SVM classifier of the ensemble that
shows the highest output (discriminant function value).
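The OAA procedure can be sketched in Python as follows. This is an illustrative sketch only: the helper names are hypothetical, and each entry of `discriminants` is assumed to be an already trained binary discriminant function f(x) for one class.

```python
def one_against_all_targets(labels, positive_class):
    """Relabel a multiclass problem as +1 (positive_class) versus -1 (all others)."""
    return [1 if lab == positive_class else -1 for lab in labels]


def winner_takes_all(x, discriminants):
    """Return the class whose binary classifier gives the highest output for x.

    `discriminants` maps each class label to a callable f(x), for example the
    discriminant function of the SVM trained for that class against the rest.
    """
    return max(discriminants, key=lambda cls: discriminants[cls](x))
```

With T classes, `one_against_all_targets` would be called T times to build the T binary training sets, and `winner_takes_all` would be applied to every test beat.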
III. PARTICLE SWARM OPTIMIZATION
PSO is a stochastic optimization technique introduced recently by Kennedy and Eberhart, inspired by the social behavior
of bird flocking and fish schooling [19]. Similar to other evolutionary computation algorithms, such as genetic algorithms
(GAs) [16], PSO is a population-based search method that exploits the concept of social sharing of information. This means
that each individual (called particle) of a given population
(called swarm) can benefit from the previous experiences of
all other individuals in the same population. During the iterative search process in the d-dimensional solution space, each
particle (i.e., candidate solution) will adjust its flying velocity
and position according to its own flying experience as well as
those of the other companion particles in the swarm. PSO has
proved promising in solving a number of engineering problems
such as automatic control [20], antenna design [21], and inverse
problems [22]. In the following, we will briefly describe the
main concepts of the basic PSO algorithm.
Let us consider a swarm of size $S$. Each particle $P_i$ ($i = 1, 2, \ldots, S$) in the swarm is characterized by: 1) its current position $p_i(t) \in \mathbb{R}^{d}$, which refers to a candidate solution of the
optimization problem at iteration $t$; 2) its velocity $v_i(t) \in \mathbb{R}^{d}$;
and 3) the best position $p_{bi}(t) \in \mathbb{R}^{d}$ identified during its past
trajectory. Let $p_g(t) \in \mathbb{R}^{d}$ be the best global position found over
all trajectories traveled by the particles of the swarm. Position
optimality is measured by means of one or more fitness functions defined in relation to the considered optimization problem.
During the search process, the particles move according to the
following equations:

$$v_i(t+1) = w\, v_i(t) + c_1 r_1(t) \left( p_{bi}(t) - p_i(t) \right) + c_2 r_2(t) \left( p_g(t) - p_i(t) \right) \tag{10}$$

$$p_i(t+1) = p_i(t) + v_i(t) \tag{11}$$

where $r_1(\cdot)$ and $r_2(\cdot)$ are random variables drawn from a uniform distribution in the range [0, 1] so as to provide a stochastic
weighting of the different components participating in the particle velocity definition. $c_1$ and $c_2$ are two acceleration constants
regulating the relative velocities with respect to the best local
and global positions, respectively. In greater detail, these parameters are considered as scaling factors that determine the relative
pull of the best position of the particle and the global best position. They are sometimes referred to as the cognitive and social
rates, respectively. They are factors determining how much the
particle is influenced by the memory of its best location and
by the rest of the swarm, respectively. The inertia weight $w$
is used as a tradeoff between the global and local exploration
capabilities of the swarm. Large values of this parameter permit better global exploration, while small values lead to a fine
search in the solution space. Equation (10) allows the computation of the velocity at iteration $t + 1$ for each particle in the
swarm by combining linearly its current velocity (at iteration
$t$) and the distances that separate the current particle position
from its best previous position and the best global position,
respectively. The particle position is updated with (11). Both
(10) and (11) are iterated until convergence of the search process is reached. Typical convergence criteria are based on the
iterative behavior of the best value of the adopted fitness function(s) and/or simply on a user-defined maximum number of
iterations.
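A single pass of the update rules (10) and (11) can be sketched in Python as shown below. The sketch rests on several assumptions that the text does not fix: the function name is hypothetical, the random weights are drawn independently for every coordinate, and the position step uses the velocity computed in the same pass, as is common in PSO implementations. The default values of `w`, `c1`, and `c2` are placeholders.

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             w=0.4, c1=1.0, c2=1.0, rng=random):
    """Apply one velocity update (10) and one position update (11) in place."""
    for i, (p, v) in enumerate(zip(positions, velocities)):
        for k in range(len(p)):
            # Stochastic weights, uniform in [0, 1); drawn per coordinate (assumption).
            r1, r2 = rng.random(), rng.random()
            v[k] = (w * v[k]
                    + c1 * r1 * (personal_best[i][k] - p[k])   # cognitive pull
                    + c2 * r2 * (global_best[k] - p[k]))       # social pull
            # Position step with the freshly updated velocity (assumption).
            p[k] += v[k]
```

A particle that already sits on both its own best position and the global best, with zero velocity, does not move under this rule; it only starts moving again once another particle improves the global best.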
IV. PROPOSED PSO-SVM CLASSIFICATION SYSTEM
In this section, we describe the proposed SVM system for
the classification of ECG signals. As mentioned in the Introduction, the aim of this system is to optimize the SVM classifier
accuracy by automatically: 1) detecting the subset of the best
discriminative features (without requiring a user-defined number of desired features) and 2) solving the SVM model selection
issue (i.e., estimating the best values of the regularization and
kernel parameters). In order to attain this, the system is derived
from an optimization framework based on PSO.
A. PSO Setup
Because of the good performances generally achieved with
the nonlinear SVM classifier based on the Gaussian kernel [15],
in the following, we shall describe the proposed classification
system with this particular kernel. We note that the same reasoning holds for any kind of kernel that satisfies Mercer's
condition mentioned in Section II.
The position $p_i \in \mathbb{R}^{d+2}$ of each particle $P_i$ from the swarm is
viewed as a vector encoding: 1) a candidate subset f of features
among the $d$ available input features and 2) the value of the
two SVM classifier parameters, which are the regularization
and the kernel parameters $C$ and $\gamma$, respectively. Since the first
part of the position vector implements a feature detection task,
each component (coordinate) of this part will assume either a
"0" (feature discarded) or a "1" (feature selected) value. The
conversion from real to binary values will be made by a simple
thresholding operation at the 0.5 value.
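The encoding described above can be illustrated with a short Python sketch. The names are hypothetical, and two details are assumptions made for the example: coordinates equal to the threshold are counted as selected, and the two SVM parameters are stored as the last two coordinates in the order ($C$, $\gamma$).

```python
def decode_position(position, num_features, threshold=0.5):
    """Split a (d + 2)-dimensional position into a feature mask, C, and gamma."""
    if len(position) != num_features + 2:
        raise ValueError("position must have num_features + 2 coordinates")
    # Feature part: real values thresholded to 0 (discarded) or 1 (selected).
    mask = [1 if coord >= threshold else 0 for coord in position[:num_features]]
    # Model part: regularization and kernel parameters (storage order assumed).
    c_value, gamma = position[num_features], position[num_features + 1]
    return mask, c_value, gamma


def select_features(sample, mask):
    """Keep only the feature values whose mask bit is 1."""
    return [value for value, keep in zip(sample, mask) if keep]
```

The number of detected features is simply `sum(mask)`, so it emerges from the search instead of being supplied by the user.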
Let f (i) be the fitness function value associated with the ith
particle Pi . The choice of the fitness function is important because it is on this basis that the PSO evaluates the goodness of
each candidate solution $p_i$ for designing our SVM classification system. A possible choice is to adopt the class of criteria
that estimate the leave-one-out error bound, which has the
interesting property of representing an unbiased estimation of
the generalization performance of classifiers. In particular, for
SVM classifiers, different measures of this error bound have
been derived, such as the radius-margin bound and the simple
SV count [11]. In this paper, we shall explore the simple SV
count as a fitness criterion in the PSO optimization framework
because of its simplicity and effectiveness, as shown in the classification of hyperspectral remote sensing images [16].
B. SVM Classification With PSO: Case of Binary
Classification Problems
The procedure describing the proposed SVM classification
system for the basic discrimination case between two classes is
as follows.
1) Initialization
Step 1) Generate randomly an initial swarm of size S.
Step 2) Set to zero the velocity vectors vi (i =
1, 2, . . . , S) associated with the S particles.
Step 3) For each position $p_i \in \mathbb{R}^{d+2}$ of the particle
$P_i$ ($i = 1, 2, \ldots, S$) from the swarm, train an
SVM classifier and compute the corresponding
fitness function $f(i)$ (i.e., the SV measure).
Step 4) Set the best position of each particle with its initial
position, i.e.,

$$p_{bi} = p_i, \quad i = 1, 2, \ldots, S. \tag{12}$$

2) Search process
Step 5) Detect the best global position pg in the swarm
exhibiting the minimal value of the considered
fitness function over all explored trajectories.
Step 6) Update the speed of each particle using (10).
Step 7) Update the position of each particle using (11). If a
particle goes beyond the predefined boundaries of
the search space, truncate the updating by setting
the position of the particle at the space boundary
and reverse its search direction (i.e., multiply its
speed vector by $-1$). This will stop the particles
from further attempting to go out of the allowed
search space.
Step 8) For each candidate particle pi (i = 1, 2, . . . , S),
train an SVM classifier and compute the corresponding fitness function f (i).
Step 9) Update the best position pbi of each particle if its
current position pi (i = 1, 2, . . . , S) has a smaller
fitness function.
3) Convergence
Step 10) If the maximum number of iterations is not yet
reached, return to step 5).
4) Classification
Step 11) Select the best global position pg in the swarm
and train an SVM classifier fed with the subset
of detected features mapped by pg and modeled
with the values of the two parameters $C$ and $\gamma$
encoded in the same position.
Step 12) Classify the ECG signals with the trained SVM
classifier.
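Steps 1) to 11) can be sketched as a generic minimization loop in Python. This is a hedged outline, not the authors' code: `pso_minimize` and its arguments are hypothetical names, and the `fitness` callable stands in for "decode the position, train an SVM on the selected features with the encoded parameters, and return the SV count", which is not implemented here. The random weights are drawn per coordinate, and the whole velocity vector is reversed when any coordinate leaves the search box; both are assumptions.

```python
import random


def pso_minimize(fitness, lower, upper, swarm_size=40, max_iter=40,
                 w=0.4, c1=1.0, c2=1.0, seed=0):
    """Minimize `fitness` over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Steps 1-2: random initial swarm, zero velocities.
    pos = [[rng.uniform(lower[k], upper[k]) for k in range(dim)]
           for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    # Steps 3-4: evaluate each particle; its best position is its initial one.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    for _ in range(max_iter):  # Step 10: stop at the iteration cap
        # Step 5: best global position over all explored trajectories.
        g = min(range(swarm_size), key=pbest_val.__getitem__)
        gbest = pbest[g][:]
        for i in range(swarm_size):
            out_of_bounds = False
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 6: velocity update (10).
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # Step 7: position update (11), truncated at the boundary.
                pos[i][k] += vel[i][k]
                if pos[i][k] < lower[k] or pos[i][k] > upper[k]:
                    pos[i][k] = min(max(pos[i][k], lower[k]), upper[k])
                    out_of_bounds = True
            if out_of_bounds:
                vel[i] = [-v for v in vel[i]]  # reverse the search direction
            # Steps 8-9: re-evaluate and keep the better personal best.
            value = fitness(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
    # Step 11: the best global position defines the final classifier.
    g = min(range(swarm_size), key=pbest_val.__getitem__)
    return pbest[g], pbest_val[g]
```

In the setting of this section, `lower` and `upper` would have $d + 2$ entries: bounds of 0 and 1 for the feature coordinates, followed by the admissible ranges of $C$ and $\gamma$. Since every fitness evaluation implies training one SVM, the cost is on the order of `swarm_size * (max_iter + 1)` SVM trainings per binary problem.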
C. Extension to Multiclass Classification Problems
Extension of the proposed system to address multiclass classification problems is made by adopting the OAA strategy described in Section II. For a problem with a set of $T$ classes
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_T\}$, this means that an ensemble of $T$ binary SVM classifiers should be optimized according to the PSO
procedure described in the previous section. Therefore, the proposed approach will automatically detect the features and determine the two model parameter values for each binary SVM classifier defined to discriminate between class $\omega_i$ ($i = 1, 2, \ldots, T$)
and all others (i.e., $\Omega - \{\omega_i\}$). During the classification phase,
the winner-takes-all rule is used to produce the final decision.
Note that, though we adopted the OAA strategy as a multiclass
strategy, other strategies could also be considered, thanks to the
general nature of the proposed PSO-SVM classification system.
V. EXPERIMENTAL DESIGN
A. Dataset Description
Our experiments were conducted on the basis of ECG data
from the MIT-BIH arrhythmia database [17]. In particular, the
considered beats refer to the following classes: normal sinus
rhythm (N ), atrial premature beat (A), ventricular premature
beat (V ), right bundle branch block (RB), left bundle branch
block (LB), and paced beat (/). The beats were selected from
the recordings of 20 patients, which correspond to the following
files: 100, 102, 104, 105, 106, 107, 118, 119, 200, 201, 202,
203, 205, 208, 209, 212, 213, 214, 215, and 217. In order to
feed the classification process, in this study, we adopted the two
following kinds of features: 1) ECG morphology features and
2) three ECG temporal features, i.e., the QRS complex duration,
the RR interval (the time span between two consecutive R points
representing the distance between the QRS peaks of the present
and previous beats), and the RR interval averaged over the ten
last beats [9]. In order to extract these features, first we performed the QRS detection and ECG wave boundary recognition
tasks by means of the well-known ecgpuwave software available on http://www.physionet.org/physiotools/ecgpuwave/src/.
Then, after extracting the three temporal features of interest, we
normalized to the same periodic length the duration of the segmented ECG cycles according to the procedure reported in [23].
To this purpose, the mean beat period was chosen as the normalized periodic length, which was represented by 300 uniformly
distributed samples. Consequently, the total number of morphology and temporal features equals 303 for each beat. Fig. 1
illustrates the distribution of the six considered classes drawn
by means of 25 samples randomly selected for each class and
the two best features according to the principal component analysis (PCA) algorithm [24]. From this figure, one can expect that
the discrimination task will not be straightforward due to the
apparently strong overlap between classes.
Fig. 1. Two-dimensional distribution of the six considered classes in the subspace formed by the best couple of features obtained with the PCA algorithm.
For better visualization, just 25 samples were randomly selected for each class.
In order to obtain reliable assessments of the classification
accuracy of the investigated classifiers, in all the following experiments, we carried out three different trials, each with a new
set of randomly selected training beats, while the test set was
kept unchanged. The results of these three trials obtained on the
test set were thus averaged. The detailed numbers of training
and test beats are reported for each class in Table I. Classification performance was evaluated in terms of four measures,
which are: 1) the overall accuracy (OA), which is the percentage of correctly classified beats among all the beats considered
(independently of the classes they belong to); 2) the accuracy
of each class, that is, the percentage of correctly classified beats
among the beats of the considered class; 3) the average accuracy
(AA), which is the average over the classification accuracies obtained for the different classes; and 4) McNemar's test, which gives
the statistical significance of differences between the accuracies
achieved by the different classification approaches. This test is
based on the standardized normal test statistic [25]

$$Z_{ij} = \frac{f_{ij} - f_{ji}}{\sqrt{f_{ij} + f_{ji}}} \tag{13}$$

where $Z_{ij}$ measures the pairwise statistical significance of the
difference between the accuracies of the ith and jth classifiers. $f_{ij}$ stands for the number of beats classified correctly
and wrongly by the ith and jth classifiers, respectively. Accordingly, $f_{ij}$ and $f_{ji}$ are the counts of classified beats on which
the considered ith and jth classifiers disagree. At the commonly
used 5% level of significance, the difference of accuracies between the ith and jth classifiers is said to be statistically significant if
$|Z_{ij}| > 1.96$.
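The four measures can be computed as in the following Python sketch. The function names are hypothetical, and returning 0 when the two classifiers never disagree is a convention chosen for this sketch, since (13) is undefined in that case.

```python
import math


def overall_accuracy(true_labels, predicted):
    """OA: percentage of correctly classified beats among all beats."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return 100.0 * correct / len(true_labels)


def class_accuracies(true_labels, predicted):
    """Per-class accuracy: percentage of correct beats within each true class."""
    totals, hits = {}, {}
    for t, p in zip(true_labels, predicted):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return {cls: 100.0 * hits[cls] / totals[cls] for cls in totals}


def average_accuracy(true_labels, predicted):
    """AA: mean of the per-class accuracies."""
    per_class = class_accuracies(true_labels, predicted)
    return sum(per_class.values()) / len(per_class)


def mcnemar_z(true_labels, predicted_i, predicted_j):
    """Z_ij of (13), built from the beats on which classifiers i and j disagree."""
    triples = list(zip(true_labels, predicted_i, predicted_j))
    f_ij = sum(a == t and b != t for t, a, b in triples)  # i right, j wrong
    f_ji = sum(a != t and b == t for t, a, b in triples)  # i wrong, j right
    if f_ij + f_ji == 0:
        return 0.0  # no disagreement: convention of this sketch
    return (f_ij - f_ji) / math.sqrt(f_ij + f_ji)
```

OA weights every beat equally, so it is dominated by the most populated classes, whereas AA weights every class equally; reporting both guards against a classifier that performs well only on the majority class.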
B. Experimental Scheme
The proposed experimental framework was articulated
around the following five main experiments.
1) The first experiment aimed at assessing the effectiveness
of the SVM approach in classifying ECG signals directly
in the whole original hyperdimensional feature space (i.e.,
by means of all the 303 available features). The total number of training beats was fixed to 500, as reported in Table I.
For comparison purpose, we implemented two other reference nonparametric classification approaches, namely,
the k-nearest neighbor (kNN) and the radial basis function (RBF) neural network classifiers [24].
2) In the second experiment, it was desired to explore the
behavior of the SVM classifier (compared to the two reference classifiers) when integrated within a standard classification scheme based on a PCA feature reduction. In
particular, the number of features was varied from 10 to
50 with a step of 10 so as to test this classifier in small as
well as high-dimensional feature subspaces.
3) The third experiment aimed to assess
the capability of the proposed PSO-SVM classification
system to boost further the accuracy of the SVM classifier, thanks to its automatic feature detection and model
selection-oriented optimization process.
4) The fourth experiment was devoted to analyzing the generalization capability of the SVM, the kNN, and the RBF
classifiers with and without feature reduction, and of the
PSO-SVM classification system by decreasing/increasing
the number of available training beats. This analysis was
done through two experimental scenarios, which consisted in passing from 500 to 250 and 750 training beats,
respectively.
5) Finally, in the fifth experiment, we analyzed the sensitivity
of the PSO-SVM classification system with respect to the
three parameters that govern the PSO optimizer, namely,
the inertia weight $w$ and the two acceleration constants $c_1$
and $c_2$.
TABLE I
NUMBERS OF TRAINING AND TEST BEATS USED IN THE EXPERIMENTS
TABLE II
OVERALL (OA), AVERAGE (AA), AND CLASS PERCENTAGE ACCURACIES ACHIEVED ON THE TEST BEATS WITH
THE DIFFERENT INVESTIGATED CLASSIFIERS WITH A TOTAL NUMBER OF 500 TRAINING BEATS
C. Experiment Settings
In the experiments, we considered the nonlinear SVM based
on the popular Gaussian kernel (referred to as SVM-RBF or
simply SVM). The related parameters $C$ and $\gamma$ for this kernel were varied in the arbitrarily fixed ranges $[10^{-3}, 200]$ and
$[10^{-3}, 2]$ so as to cover high and small regularization of the classification model, and fat as well as thin kernels, respectively. In
addition, for comparison purpose, we implemented, in the first
experiment, the SVM classifier with two other kernels, which
are the linear and the polynomial kernels, leading thus to two
other SVM classifiers termed as SVM-linear and SVM-poly,
respectively. The degree d of the polynomial kernel was varied
in the range [2,5] in order to span polynomials with low and
high flexibility. The K value and the number of hidden nodes
(h) of the kNN and the RBF classifiers were tuned in the arbitrarily fixed intervals [1,15] and [10,60], respectively. The other
RBF parameters, which include the center and the width of each
RBF (kernel), were computed by applying the K-means clustering algorithm separately to each class [26]. Concerning the
PSO algorithm, we considered the following standard parameters: swarm size $S = 40$, inertia weight $w = 0.4$, acceleration
constants $c_1$ and $c_2$ equal to unity, and maximum number of
iterations fixed at 40.
VI. EXPERIMENTAL RESULTS
A. Experiment 1: Classification in the Whole Original
Hyperdimensional Feature Space
As mentioned earlier, in this experiment, we applied the SVM
classifier directly on the entire original hyperdimensional feature space, which is made up of 303 features. During the training
phase, the SVM parameters were selected according to an $m$-fold
cross-validation (CV) procedure [27], first by randomly splitting the 500 training beats into $m$ mutually exclusive subsets
(folds) of equal size, and then, by training $m$ times an SVM
classifier modeled with predefined values: $C$ for the linear kernel, ($C$ and $\gamma$) for the Gaussian kernel, and ($C$ and $d$) for the
polynomial kernel. Each time we left one of the subsets out
of the training, and only used it to obtain an estimate of the
classification accuracy. From m times of training and accuracy
computation, the AA yielded a prediction of the classification
accuracy of the considered SVM classifier. We chose the best
SVM classifier parameter values to maximize this prediction.
In all experiments reported in this paper, we adopted a fivefold
CV. The same procedure was adopted to find the best parameters
for the kNN and RBF classifiers. We recall that this empirical
parameter estimation procedure and all the classification experiments were repeated three times, each with one of the three
different training sets generated randomly.
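The model selection procedure described above can be sketched as follows. Since the Python standard library offers no SVM, the sketch tunes the K value of a kNN classifier, to which the same procedure was applied; the toy two-class data, the function names, the restriction to odd K values, and the use of the mean held-out accuracy as the selection score are assumptions made for illustration only.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(samples, k, m=5, seed=0):
    """Mean held-out accuracy over m mutually exclusive folds."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::m] for i in range(m)]
    scores = []
    for i in range(m):
        held_out = folds[i]
        # Train on the m - 1 remaining folds, evaluate on the fold left out.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / m


# Hypothetical toy data: two Gaussian clusters labeled "N" and "V".
rng = random.Random(1)
data = ([((rng.gauss(0, 1), rng.gauss(0, 1)), "N") for _ in range(50)]
        + [((rng.gauss(4, 1), rng.gauss(4, 1)), "V") for _ in range(50)])

# Grid search over K in [1, 15] (odd values only), keeping the best CV score.
best_k = max(range(1, 16, 2), key=lambda k: cross_val_accuracy(data, k))
```

For the SVM classifiers, the grid would run over C (and γ or d) instead of K, with the same fivefold splitting of the training beats.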
As reported in Table II, the OA and AA accuracies achieved
with the SVM classifier based on the Gaussian kernel (SVM-RBF) on the test set were equal to 87.76% and 87.48%, respectively. These results were better than those achieved by the
SVM-linear, the SVM-poly, the RBF, and the kNN classifiers.
Indeed, the OA (and AA) accuracies were equal to 80.55%
(78.90%) for the SVM-linear classifier, 85.25% (85.75%) for
the SVM-poly classifier, 82.74% (82.07%) for the RBF classifier, and 81.36% (80.70%) for the kNN classifier. Note that
depending on the classifier, the most difficult classes to discriminate were the paced beat (/), the ventricular premature beat
(V), and the atrial premature beat (A) classes, which are also
the most overlapped ones according to Fig. 1.
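As a reminder of how the two figures of merit differ, the following minimal sketch computes OA as the percentage of correctly classified beats among all test beats and AA as the mean of the class accuracies. The function name and the toy labels are hypothetical; the example only shows why AA is more sensitive than OA to errors on small classes.

```python
from collections import defaultdict


def overall_and_average_accuracy(y_true, y_pred):
    """Return (OA, AA, per-class accuracies), all as percentages."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    class_acc = {c: 100.0 * correct[c] / total[c] for c in total}
    oa = 100.0 * sum(correct.values()) / len(y_true)
    aa = sum(class_acc.values()) / len(class_acc)
    return oa, aa, class_acc


# Toy example: 8 normal beats (N) and 2 ventricular premature beats (V),
# with one V beat misclassified as N.
y_true = ["N"] * 8 + ["V"] * 2
y_pred = ["N"] * 8 + ["N", "V"]
oa, aa, class_acc = overall_and_average_accuracy(y_true, y_pred)
# oa == 90.0, aa == 75.0, class_acc == {"N": 100.0, "V": 50.0}
```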
This experiment appears to confirm what was observed in
other application fields, i.e., the superiority of SVM based on
the Gaussian kernel as compared to traditional classifiers when
dealing with feature spaces of very high dimensionality. In addition, it provides reference classification accuracies in order to
quantify the capability of the proposed PSO-SVM classification
system to further improve these interesting results.
B. Experiment 2: Classification Based on Feature Reduction
In this experiment, we trained the SVM classifier based on the
Gaussian kernel, which proved in the previous experiment to
be the most appropriate kernel for ECG signal classification, in
feature subspaces of various dimensionalities. The desired number of features varied from 10 to 50 with a step of 10, namely,
from low- to high-dimensional feature subspaces. Feature reduction was achieved by the traditional PCA algorithm, commonly used in ECG signal classification. It is based on the idea
of selecting the first component (i.e., the direction of maximum
variance), then the second component (direction of second maximum variance), and so on, up to the desired number of components, which compose the considered feature subspace.
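The successive selection of directions of maximum variance can be sketched with power iteration and deflation on the sample covariance matrix. This is only one of several ways to compute principal components and is not claimed to be the routine used in the experiments; the function names and the two-feature toy data are hypothetical, and the sketch assumes that the requested number of components does not exceed the rank of the covariance matrix.

```python
import math
import random


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def normalize(vector):
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def pca_directions(rows, n_components, n_iter=200, seed=0):
    """Return feature means and the leading principal directions (unit vectors)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    directions = []
    for _ in range(n_components):
        # Power iteration converges to the direction of maximum remaining variance.
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(n_iter):
            v = normalize(mat_vec(cov, v))
        variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        directions.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[a][b] - variance * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return means, directions


def project(rows, means, directions):
    """Map each sample onto the selected principal directions."""
    return [[sum((r[j] - means[j]) * v[j] for j in range(len(means)))
             for v in directions] for r in rows]


# Toy data lying close to the line y = 2x (hypothetical two-feature example).
rows = [(float(x), 2.0 * x + (0.1 if x % 2 else -0.1)) for x in range(-10, 11)]
means, directions = pca_directions(rows, n_components=2)
scores = project(rows, means, directions)
```

Keeping only the first columns of `scores` corresponds to retaining the first 10 to 50 components in this experiment.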
Fig. 2(a) depicts the results obtained in terms of OA by the
three considered classifiers combined with the PCA algorithm,
namely, the PCA-SVM, the PCA-RBF, and the PCA-kNN classifiers. In particular, it can be seen that for all feature subspace
dimensionalities except the lowest (i.e., 10 features), the PCA-SVM
classifier maintains a clear superiority over the other two.
Its best accuracy was found using a feature subspace made up of
the first 30 components. The corresponding OA and AA accuracies were 87.57% and 88.92%, respectively. Comparing these
results with those achieved with the SVM classifier based on
the Gaussian kernel in the original feature space (i.e., without
feature reduction), a slight decrease of 0.19% in terms of OA
and an increase of 1.44% in terms of AA was obtained (see
Table II). As regards the PCA-kNN and the PCA-RBF classifiers, the best empirical numbers of features were found to be
20 and 30, respectively. The corresponding OA and AA accuracies were 84% and 82.83% for the PCA-kNN classifier, and
83.54% and 83.01% for the PCA-RBF, respectively. Note from
Table II that the PCA-kNN classifier behaves much better with
20 features than in the original hyperdimensional feature space.
From this experiment, we can make three observations: 1) the
SVM classifier shows a relatively low sensitivity to the curse
of dimensionality as compared to the kNN and the RBF classifiers [see Table III(a)]; 2) the SVM classifier still preserves its
superiority when integrated in a feature reduction-based classification scheme; and 3) though the SVM performs well in the
whole original feature space, its accuracy can still be improved
provided that a subspace of higher generalization capability can
be found.
C. Experiment 3: Classification With PSO-SVM
As described in Section IV, the proposed PSO-SVM classification system aims at enhancing the SVM classification process
from two different viewpoints: 1) by automatically detecting a
feature subspace of higher generalization capability in order to
deal in a more effective way with the curse of dimensionality,
instead of reducing the dimension of the original feature space
on the basis of PCA or simply of feature sampling as done in the
literature [1], [3], [5], [10] and 2) by passing from an empirical
tuning of the value of the two SVM parameters to their automatic
optimization. This experiment is aimed at assessing the effectiveness of this methodological enhancement. To this purpose,
we applied the PSO-SVM classifier to the available training
beats. Note that each particle of the swarm was defined by position and velocity vectors of a dimension of 305. At convergence
of the optimization process, we assessed the PSO-SVM classifier accuracy on the test samples. The achieved overall and
average accuracies were 90.52% and 92.34%, corresponding to
substantial accuracy gains as compared to what was yielded either by the SVM classifier (with the Gaussian kernel) applied to
all available features (+2.76% and +4.86%, respectively) or by
the PCA-SVM classifier (+2.95% and +3.42%, respectively)
[see Table II and Fig. 2(a)]. The superiority of the PSO-SVM
is statistically significant as shown by McNemar's statistical
test numbers reported in Table III(a), which, in absolute value,
are all above the 1.96 threshold. Its worst class accuracy was
obtained for normal beat (N) (89.12%), while that of the SVM
and the PCA-SVM classifiers was for ventricular premature
beats (V) as they were (81.48%) and (83.62%), respectively.
This shows the capability of the PSO-SVM classifier to reduce
the gap between the worst and the best class accuracies (6.19%
versus 9.69% and 14.5% for the PCA-SVM and the SVM classifiers, respectively) while keeping OA at a high level. Table IV
shows the number of features detected automatically to discriminate each class from the others. The average number of features
required by the PSO-SVM classifier is 46, while the minimum
and maximum numbers of features were obtained for the ventricular premature (V) and normal (N) classes with 35 and 63
features, respectively.
Fig. 2. Overall percentage accuracy (OA) versus number of selected features
(first principal components) achieved on the test beats with the PCA-SVM, the
PCA-kNN, and the PCA-RBF classifiers with a training set made up of (a) 500,
(b) 250, and (c) 750 beats. The horizontal line refers to the OA achieved with
the proposed PSO-SVM classification approach.
TABLE III
STATISTICAL SIGNIFICANCE OF DIFFERENCES IN CLASSIFICATION ACCURACY BETWEEN THE NINE INVESTIGATED CLASSIFIERS
EXPRESSED BY MEANS OF MCNEMAR'S TEST WITH A TOTAL OF (a) 500, (b) 250, AND (c) 750 TRAINING BEATS
TABLE IV
NUMBER OF FEATURES DETECTED FOR EACH CLASS WITH THE PSO-SVM
CLASSIFICATION SYSTEM TRAINED ON 500 BEATS
D. Experiment 4: Sensitivity to the Number of Training Beats
In this experiment, we repeated the previous three experiments while decreasing and increasing the training set size by
50%. In particular, we considered two experimental scenarios
characterized by a total number of 250 and 750 training beats,
respectively. Table V(a) and (b) shows the results achieved with
all nine investigated classifiers (SVM-linear, SVM-poly, SVM-RBF,
kNN, RBF, PCA-SVM, PCA-kNN, PCA-RBF, and PSO-SVM)
for these two scenarios, i.e., for 250 and 750 training
beats, respectively. Similarly, Fig. 2(b) and (c) shows the trend
of the OA provided by the PCA-SVM, the PCA-kNN, and the
PCA-RBF classifiers on varying the number of features from
10 to 50.
In general, as could be expected, reducing the number of
training beats involved a more or less significant decrease
in accuracy depending on the classifier. In terms of OA, the
decrease in accuracy was 3.55%, 3.93%, 4.54%, 4.63%, 5.13%,
5.15%, 5.76%, 5.82%, and 6.16% for the PSO-SVM, the PCA-kNN,
the RBF, the kNN, the PCA-RBF, the SVM-RBF, the
PCA-SVM, the SVM-poly, and the SVM-linear classifiers, respectively. The PSO-SVM classifier thus shows the greatest
robustness to a decrease in training beats. Though it exploited
only 250 training beats, it yielded the best OA and AA values,
which are comparable to those of the SVM classifier trained
with double the number of training beats (500 beats) [see Tables II and V(a)]. Its worst class accuracy was 86.47% versus
80.23%, 68.14%, 61.67%, 57.14%, 50.22%, and 49.41% for
the PCA-SVM, the SVM(-RBF), the PCA-kNN, the kNN, the
RBF, and the PCA-RBF classifiers, respectively.
TABLE V
OVERALL (OA), AVERAGE (AA), AND CLASS PERCENTAGE ACCURACIES ACHIEVED ON THE TEST BEATS WITH
THE DIFFERENT EXPLORED CLASSIFIERS WITH A TOTAL NUMBER OF (a) 250 AND (b) 750 TRAINING BEATS
When increasing the number of training beats from 500 to
750, the classification accuracies increase and the differences
between the classifiers appear less pronounced. In particular,
the classifier that benefited most from the additional training beats
was the PCA-kNN classifier with a gain of +3.04% and +4.10%
in terms of OA and AA, respectively. Still, in this classification
scenario, the PSO-SVM classifier maintained a clear superiority
in terms of both OA and AA.
For both scenarios, the PSO-SVM showed statistically significant differences of accuracy with respect to the other classifiers
according to McNemar's test [see Table III(b) and (c)].
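The significance check against the 1.96 threshold can be illustrated with a minimal sketch of McNemar's statistic in its standardized normal form, Z = (f_ij - f_ji) / sqrt(f_ij + f_ji), where f_ij counts the test beats classified correctly by classifier i and wrongly by classifier j. The function name and the toy predictions are hypothetical, and the sketch assumes this common form of the test rather than reproducing the exact computation behind Table III.

```python
import math


def mcnemar_z(y_true, pred_i, pred_j):
    """Standardized McNemar statistic comparing classifiers i and j."""
    triples = list(zip(y_true, pred_i, pred_j))
    # Beats that classifier i gets right and classifier j gets wrong, and vice versa.
    f_ij = sum(pi == t and pj != t for t, pi, pj in triples)
    f_ji = sum(pi != t and pj == t for t, pi, pj in triples)
    if f_ij + f_ji == 0:
        return 0.0  # the two classifiers never disagree in correctness
    return (f_ij - f_ji) / math.sqrt(f_ij + f_ji)


# Toy example with 20 beats of a single class (label 0).
y_true = [0] * 20
pred_i = [0] * 19 + [1]        # classifier i: one error
pred_j = [1] * 10 + [0] * 10   # classifier j: ten errors
z = mcnemar_z(y_true, pred_i, pred_j)
significant = abs(z) > 1.96    # 5% level of significance
```

A positive value indicates that classifier i is more accurate than classifier j, and swapping the two classifiers only changes the sign of the statistic.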
E. Experiment 5: Sensitivity to the Inertia Weight and Acceleration Parameters
As mentioned previously, in this experiment, we wanted to
analyze the sensitivity of the PSO-SVM classification system
with respect to the inertia weight w and the two acceleration
constants c1 and c2, which control the behavior, and thus, the
goodness of the PSO search process. In the first step, we fixed
c1 and c2 to 1 and we varied w in the range [0,1] (according
to [19]). From Table VI(a), the best and the worst classification
accuracies were obtained for w = 0.6 and w = 0.8, respectively.
The corresponding OA (and AA) accuracies achieved on the test
set were 90.88% (92.70%) and 88.87% (91.90%), respectively.
In the second step, we fixed w = 0.6 (corresponding to the best
obtained accuracy) and we varied c1 and c2 in the range [1,2]
(according to [19]). In this case, the OA and AA accuracies were
less affected by the variation of these parameters. Indeed, they
fluctuated from 90.18% (91.65%) for c1 = c2 = 1.2 to 90.88%
(92.70%) for c1 = c2 = 1.
TABLE VI
OVERALL (OA) AND AVERAGE (AA) ACCURACIES ACHIEVED ON THE TEST
BEATS BY THE PSO-SVM CLASSIFICATION SYSTEM FOR DIFFERENT VALUES
OF (a) INERTIA WEIGHT w VARIED IN THE RANGE [0,1] (c1 AND c2 WERE SET
TO 1) AND (b) ACCELERATION CONSTANTS c1 AND c2 TUNED IN THE RANGE
[1,2] [w WAS FIXED AT 0.6, WHICH IS THE BEST VALUE FOUND IN (a)]
As this empirical analysis shows, the PSO optimizer appears
more sensitive to the inertia weight parameter than to the two
other parameters. However, even when nonstandard parameter
values are adopted, the achieved accuracies remain above
those yielded by the reference classifiers.
VII. CONCLUSION
From the obtained experimental results, we can strongly recommend the use of the SVM approach for classifying ECG
signals on account of its superior generalization capability as
compared to traditional classification techniques. This capability generally provides it with higher classification accuracies
and a lower sensitivity to the curse of dimensionality.
The main novelty of this paper is in the proposed PSO-based
approach, which aims at optimizing the performance of SVM
classifiers in terms of classification accuracy by detecting the
best subset of available features and solving the tricky model
selection issue. The fact that it is entirely automatic makes it
particularly useful and attractive. The results confirm that the
PSO-SVM classification system substantially boosts the generalization capability achievable with the SVM classifier, and its
robustness against the problem of limited training beat availability, which may characterize pathologies of rare occurrence. Another advantage of the PSO-SVM approach can be found in its
high sparseness, which is explained by the fact that the adopted
optimization criterion is based on minimizing the number of
SVs. This criterion favors the definition of compact discriminant functions, which are thus easy to implement on a hardware
platform. For this purpose, the PSO-SVM classifier should first
be run on a PC for determining the best features for each class
and the discrimination model (SVs and related weights) of the
corresponding SVM.
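To make the link between sparseness and implementation cost concrete, the following minimal sketch evaluates a binary SVM discriminant function from a stored model, f(x) = sum_i w_i K(x_i, x) + b, with one kernel evaluation per SV. It assumes the common Gaussian kernel form K(x, z) = exp(-γ ||x - z||^2); the function names and the two-SV toy model (SVs, weights, bias, and γ) are hypothetical values chosen for illustration, not a trained model.

```python
import math


def gaussian_kernel(x, z, gamma):
    """Gaussian kernel, assuming the form exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, weights, bias, gamma):
    """Discriminant value; the cost grows linearly with the number of SVs.

    Each weight is assumed to be the product of the Lagrange multiplier
    and the label (+1 or -1) of the corresponding SV.
    """
    return sum(w * gaussian_kernel(sv, x, gamma)
               for sv, w in zip(support_vectors, weights)) + bias


# Hypothetical two-SV model, as it could be exported from a PC to a device.
svs = [(0.0, 0.0), (2.0, 2.0)]
weights = [-1.0, 1.0]
bias = 0.0
gamma = 0.5

value = svm_decision((1.9, 2.1), svs, weights, bias, gamma)
label = 1 if value >= 0 else -1
```

Only the selected features of each class need to be computed on the target platform before this function is evaluated, which is where the feature detection and the small number of SVs both reduce the workload.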
Finally, it is noteworthy that, thanks to its general nature, the
proposed PSO-SVM system is applicable not only to morphology and temporal features, but also to other types of features
such as those based on wavelets and high-order statistics. Furthermore, other optimization criteria could be considered as
well, individually or jointly depending on the application requirements.
ACKNOWLEDGMENT
The authors would like to thank Dr. C.-C. Chang and
Dr. C.-J. Lin for supplying the software LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm) used in this research.
REFERENCES
[1] S. Osowski and T. H. Linh, "ECG beat recognition using fuzzy hybrid neural network," IEEE Trans. Biomed. Eng., vol. 48, no. 11, pp. 1265-1271, Nov. 2001.
[2] T. H. Linh, S. Osowski, and M. L. Stodoloski, "On-line heart beat recognition using Hermite polynomials and neuron-fuzzy network," IEEE Trans. Instrum. Meas., vol. 52, no. 4, pp. 1224-1231, Aug. 2003.
[3] S. Osowski, T. H. Linh, and T. Markiewicz, "Support vector machine-based expert system for reliable heart beat recognition," IEEE Trans. Biomed. Eng., vol. 51, no. 4, pp. 582-589, Apr. 2004.
[4] L. Y. Shyu, Y. H. Wu, and W. Hu, "Using wavelet transform and fuzzy neural network for VPC detection from the Holter ECG," IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1269-1273, Jul. 2004.
[5] F. de Chazal, M. O'Dwyer, and R. B. Reilly, "Automatic classification of ECG heartbeats using ECG morphology and heartbeat interval features," IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1196-1206, Jul. 2004.
[6] L. Khadra, A. S. Al-Fahoum, and S. Binajjaj, "A quantitative analysis approach for cardiac arrhythmia classification using higher order spectral techniques," IEEE Trans. Biomed. Eng., vol. 52, no. 11, pp. 1840-1845, Nov. 2005.
[7] R. V. Andreao, B. Dorizzi, and J. Boudy, "ECG signal analysis through hidden Markov models," IEEE Trans. Biomed. Eng., vol. 53, no. 8, pp. 1541-1549, Aug. 2006.
[8] S. Mitra, M. Mitra, and B. B. Chaudhuri, "A rough set-based inference engine for ECG classification," IEEE Trans. Instrum. Meas., vol. 55, no. 6, pp. 2198-2206, Dec. 2006.
[9] F. de Chazal and R. B. Reilly, "A patient adapting heart beat classifier using ECG morphology and heartbeat interval features," IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2535-2543, Dec. 2006.
[10] T. Inan, L. Giovangrandi, and J. T. A. Kovacs, "Robust neural network based classification of premature ventricular contractions using wavelet transform and timing interval features," IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2507-2515, Dec. 2006.
[11] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[12] M. Pontil and A. Verri, "Support vector machines for 3D object recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 6, pp. 637-646, Jun. 1998.
[13] Y. Y. El-Naqa, M. N. Wernick, N. P. Galatsanos, and R. M. Nishikawa, "A support vector machine approach for detection of microcalcifications," IEEE Trans. Med. Imag., vol. 21, no. 12, pp. 1552-1563, Dec. 2002.
[14] J. Robinson and V. Kecman, "Combining support vector machine learning with the discrete cosine transform in image compression," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 950-958, Jul. 2003.
[15] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machine," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778-1790, Aug. 2004.
[16] Y. Bazi and F. Melgani, "Toward an optimal SVM classification system for hyperspectral remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3374-3385, Nov. 2006.
[17] R. Mark and G. Moody, MIT-BIH Arrhythmia Database, 1997. [Online]. Available: http://ecg.mit.edu/dbinfo.html
[18] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415-425, Mar. 2002.
[19] J. Kennedy and R. C. Eberhart, Swarm Intelligence. San Mateo, CA: Morgan Kaufmann, 2001.
[20] Z. L. Gaing, "A particle swarm optimization approach for optimum design of PID controller in AVR system," IEEE Trans. Energy Convers., vol. 19, no. 2, pp. 384-391, Jun. 2004.
[21] M. Donelli, R. Azzaro, F. G. B. De Natale, and A. Massa, "An innovative computational approach based on a particle swarm strategy for adaptive phased-arrays control," IEEE Trans. Antennas Propag., vol. 54, no. 3, pp. 888-898, Mar. 2006.
[22] W. H. Slade, H. W. Ressom, M. T. Musavi, and R. L. Miller, "Inversion of ocean color observations using particle swarm optimization," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 9, pp. 1915-1923, Sep. 2004.
[23] J. J. Wei, C. J. Chang, N. K. Shou, and G. J. Jan, "ECG data compression using truncated singular value decomposition," IEEE Trans. Biomed. Eng., vol. 5, no. 4, pp. 290-299, Dec. 2001.
[24] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[25] A. Agresti, Categorical Data Analysis, 2nd ed. New York: Wiley, 2002.
[26] L. Bruzzone and D. Prieto, "A technique for the selection of kernel function parameters in RBF neural networks for classification of remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 2, pp. 1179-1184, Mar. 1999.
[27] M. Stone, "Cross-validatory choice and assessment of statistical predictions," J. R. Statist. Soc. B, vol. 36, pp. 111-147, 1974.
Farid Melgani (M'04-SM'06) received the State Engineer degree in electronics from the University of
Batna, Batna, Algeria, in 1994, the M.Sc. degree in
electrical engineering from the University of Baghdad, Baghdad, Iraq, in 1999, and the Ph.D. degree in
electronic and computer engineering from the University of Genoa, Genoa, Italy, in 2003.
From 1999 to 2002, he was with the Signal Processing and Telecommunications Group, Department
of Biophysical and Electronic Engineering, University of Genoa. Since 2002, he has been with the University of Trento, Trento, Italy, where he is an Assistant Professor of Telecommunications, and currently, the Head of the Intelligent Information Processing (I2P)
Laboratory, Department of Information Engineering and Computer Science. His
current research interests include processing, pattern recognition, and machine
learning techniques applied to remote sensing and biomedical signals/images
(classification, regression, multitemporal analysis, and data fusion). He is the
author or coauthor of more than 80 scientific papers and is a referee for several
international journals.
Dr. Melgani was on the scientific committees of several international conferences and is an Associate Editor of the IEEE GEOSCIENCE AND REMOTE
SENSING LETTERS.
Yakoub Bazi (S'05-M'07) received the State Engineer and M.Sc. degrees in electronics from the University of Batna, Batna, Algeria, in 1994 and 2000,
respectively, and the Ph.D. degree in information and
communication technology from the University of
Trento, Trento, Italy, in 2005.
From 2000 to 2002, he was a Lecturer at the University of M'sila, M'sila, Algeria. From January 2006
to June 2006, he was a Postdoctoral Researcher at the
University of Trento. He is currently an Assistant
Professor in the College of Engineering, Al Jouf University, Al Jouf, Saudi Arabia. His current research interests include pattern
recognition and evolutionary computation methodologies applied to remote
sensing images and biomedical signal/images (change detection, classification,
and semisupervised learning).
Dr. Bazi is a Referee for several international journals.
