IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 2, FEBRUARY 2009
Abstract—An automatic electroencephalogram (EEG) artifact removal method is presented in this paper. Compared to past methods, it has two unique features: 1) a weighted version of support vector machine formulation that handles the inherent unbalanced nature of component classification and 2) the ability to accommodate structural information typically found in component classification. The advantages of the proposed method are demonstrated on real-life EEG recordings with comparisons made to several benchmark methods. Results show that the proposed method is preferable to the other methods in the context of artifact removal by achieving a better tradeoff between removing artifacts and preserving inherent brain activities. Qualitative evaluation of the reconstructed EEG epochs also demonstrates that after artifact removal inherent brain activities are largely preserved.

Index Terms—Artifact removal, electroencephalogram, error correction, weighted support vector machine (SVM).

Manuscript received January 11, 2008; revised June 24, 2008. First published October 3, 2008; current version published March 25, 2009. Asterisk indicates corresponding author.
S.-Y. Shao, K.-Q. Shen, and C. J. Ong are with the Department of Mechanical Engineering, National University of Singapore, Singapore 117576, Singapore (e-mail: shao@nus.edu.sg; mpeskq@nus.edu.sg; mpeongcj@nus.edu.sg).
E. P. V. Wilder-Smith is with the Division of Neurology, National University Hospital, Singapore 119074, Singapore (e-mail: mdcwse@nus.edu.sg).

I. INTRODUCTION

ELECTROENCEPHALOGRAM recordings are known to be contaminated by physiological artifacts from various sources, such as blinking and movements of the eyes, beating of the heart, and movements of other muscle groups [1]. These artifacts are mixed together with the brain signals, making interpretation of the EEG signals difficult [2].

In recent years, there has been increasing interest in applying independent component analysis (ICA) [3], [4] to artifact removal in EEG [2], [5]–[11]. This is mainly motivated by the fact that ICA is effective in decomposing raw EEG recordings into artifactual and nonartifactual independent components (ICs) (see, e.g., [5], [8], and [10]). Nonartifactual ICs represent signals from brain activations while artifactual ICs represent electrical signals originating from noncerebral artifacts. In conventional ICA-based artifact removal methods (see, e.g., [2], [9], and [10]), artifactual ICs are manually identified (usually by visual inspection) and removed. This process is very time consuming, and hence, not suitable for real-time applications. Recent effort toward automatic artifact removal includes [12] and [13] where a standard support vector machine (SVM) [14]–[17], trained on equal number of artifactual and nonartifactual samples, is used for automatic identification of artifactual ICs. Such a combination of ICA and SVM offers a promising approach for automatic artifact removal. Unfortunately, unique properties of the problem at hand have not been taken into consideration. First, the real data is extremely unbalanced: only a few of the ICs are artifactual ICs and the majority is nonartifactual ICs (see, e.g., [8], [18]–[21]). It is well known in the machine learning community that the performance of a standard SVM, trained on balanced dataset, may be poor when the real data is unbalanced. Second, the number of artifactual ICs responsible for each type of artifact, decomposed from a given EEG epoch, is often small. This structural information of the underlying data can be very useful in improving the accuracy of automatic artifact identification. To the best of our knowledge, such structural information has, however, not been exploited in past literature.

This paper shows a formulation that exploits the aforementioned unique properties by: 1) using weighted SVM [22] to handle the unbalanced data and 2) imposing constraints on the number of artifactual ICs through a novel error correction algorithm. It is worth noting that the proposed formulation is conceptually different from past ICA-based artifact removal methods. It considers all the ICs derived from a given EEG epoch collectively while past methods treat each IC independently. The advantages of the proposed method over a number of past methods in the literature are shown in a carefully controlled experiment using real-life EEG data.

II. OVERVIEW OF PROPOSED ARTIFACT REMOVAL SYSTEM

This section provides an overview of the proposed artifact removal system and establishes the necessary notations needed for subsequent sections. Like other ICA-based artifact removal systems in the literature, the proposed system (see Fig. 1) consists of four main modules: ICA, feature extractor, IC classifier, and EEG reconstruction. The contribution of the present paper is mainly on the new method used in the IC classifier, though the feature extractor also includes some new features.

The continuous raw EEG recording is first segmented into epochs with a fixed length. The resulting EEG epochs are then fed, epoch by epoch, into the artifact removal system. Given a raw EEG epoch as the input, the output of the system is the reconstructed artifact-free EEG epoch.
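As a minimal sketch of the segmentation step described above, the following Python function splits a multichannel recording into consecutive fixed-length epochs. The function name `segment_into_epochs`, the list-of-lists data layout, and the choice to drop trailing samples that do not fill a whole epoch are assumptions made for illustration; the paper does not specify them.

```python
from typing import List, Sequence


def segment_into_epochs(
    recording: Sequence[Sequence[float]], epoch_len: int
) -> List[List[List[float]]]:
    """Split an n-channel recording into consecutive epochs of epoch_len samples.

    recording[i] holds the time series of channel i. Samples left over at the
    end that do not fill a whole epoch are dropped (an assumption of this sketch).
    """
    if epoch_len <= 0:
        raise ValueError("epoch_len must be positive")
    if not recording:
        return []
    n_samples = len(recording[0])
    if any(len(channel) != n_samples for channel in recording):
        raise ValueError("all channels must have the same number of samples")

    n_epochs = n_samples // epoch_len
    # Each epoch is an n-by-epoch_len block, i.e., one raw EEG epoch X.
    return [
        [list(channel[k * epoch_len:(k + 1) * epoch_len]) for channel in recording]
        for k in range(n_epochs)
    ]


# Toy example: 2 channels, 10 samples each, epochs of 4 samples.
toy = [list(range(10)), list(range(100, 110))]
epochs = segment_into_epochs(toy, epoch_len=4)
print(len(epochs))   # number of complete epochs
print(epochs[1])     # second epoch, both channels
```

With the recording parameters reported in Section IV-A (five minutes at 167 Hz, i.e., 50 100 samples per channel) and an epoch length of 2000 samples, this kind of segmentation gives 25 complete epochs per subject, which agrees with the 25 epochs per subject stated there.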
∗ X.-P. Li is with the Department of Mechanical Engineering and Division of Bioengineering, National University of Singapore, Singapore 117576, Singapore (e-mail: mpelixp@nus.edu.sg).
Digital Object Identifier 10.1109/TBME.2008.2005969
0018-9294/$25.00 © 2009 IEEE

SHAO et al.: AUTOMATIC EEG ARTIFACT REMOVAL: WEIGHTED SUPPORT VECTOR MACHINE APPROACH

Fig. 1. Block diagram of the proposed automatic artifact removal system. The system consists of four main modules: ICA, feature extractor, IC classifier, and EEG reconstruction module. The novelty of the proposed IC classifier is explicitly shown. It has two submodules: a modified probabilistic multiclass SVM (denoted by weighted PWC_PSVM in the figure) to address the unbalanced nature of the data and an error correction block to handle the unique structural information of the data.

Consider a given $n$-channel raw EEG epoch, $X = [x_1\; x_2\; \ldots\; x_n]^T$ where $x_i \in \mathbb{R}^l$, $i = 1, \ldots, n$, is the time series for the $i$th EEG channel with a fixed length, $l$. The ICA module decomposes $X$ into $m$ ($\le n$) ICs, each representing an independent source. Let $S = [s_1\; s_2\; \ldots\; s_m]^T$ denote the resulting ICs where $s_i \in \mathbb{R}^l$, $i = 1, \ldots, m$, is the $i$th IC and $A = [a_1\; a_2\; \ldots\; a_m]$ denote the mixing matrix with $a_i \in \mathbb{R}^n$ containing the scalp distribution coefficients of $s_i$. While many implementations of ICA are available, the popular FastICA package [23] is used in the present study.

The feature extractor generates a set of feature vectors from the $s_i$'s needed for classification of the $s_i$'s. Suppose $d$ features are extracted from $s_i$. Then, $f(s_i) = [f_1(s_i)\; f_2(s_i)\; \ldots\; f_d(s_i)]^T$ denotes the feature vector extracted from $s_i$ and $F(S) = [f(s_1)\; f(s_2)\; \ldots\; f(s_m)]^T$ denotes the set of feature vectors obtained from $S$.

Suppose the $s_i$'s are attributed to $c$ different classes $\{\omega_1, \ldots, \omega_c\}$, with $\omega_1$ being the class of brain sources and the rest $c-1$ classes, i.e., $\omega_2, \ldots, \omega_c$, referring to different artifactual sources. Standard IC classifier in the literature [12], [13] classifies $s_i$ into one of $c$ classes, or the decision function $d(f(s_i))$ maps $f(s_i)$ into $\{\omega_1, \ldots, \omega_c\}$. Such a setup considers $s_i$, $i = 1, \ldots, m$, independently and is the framework used in most studies in the literature. However, it is difficult for such a setup to account for the unique structure of the underlying data. The proposed classifier, as shown in Fig. 1, considers the $s_i$'s collectively and yields all $m$ predicted class labels via the decision function $d(F(S))$. Such a setup aims to incorporate structural information of the dataset and address the unbalanced nature of the data. The proposed $d(F(S))$ is based on a modified version of probabilistic multiclass SVM. The choice of SVM stems from its superior performance on many learning problems and a modification proposed in the present paper to address the unbalanced nature of the data is also a novel contribution. Justification to this choice is verified by experimental results compared with other classification approaches, like K-nearest neighbors (KNN), Gaussian mixture models (GMM), and linear discriminant function (LDF). The detailed account of the proposed IC classifier will be given in Section III.

The EEG reconstruction module reconstructs artifact-free EEG epoch by zeroing the contribution of all artifactual sources from raw EEG epoch. Suppose $\tilde{S}$ is obtained from $S$ by zeroing all the identified artifactual ICs. The reconstructed artifact-free EEG epoch, denoted by $\tilde{X}$, can be obtained as follows: $\tilde{X} = A\tilde{S}$.

III. PROPOSED APPROACH

The proposed IC classifier is a combination of a modified probabilistic multiclass SVM and a novel error correction algorithm. It is our attempt to address the unique properties of the problem. Given $m$ ICs decomposed from an EEG epoch, let $m_{\omega_j}$ be the number of ICs corresponding to class $\omega_j$. The unique properties of the problem can be effected in terms of the following constraints:

$$m_{\omega_1} \gg m_{\omega_2},\; m_{\omega_1} \gg m_{\omega_3},\; \ldots,\; m_{\omega_1} \gg m_{\omega_c} \tag{1}$$

and

$$l_{\omega_2} \le m_{\omega_2} \le u_{\omega_2},\; l_{\omega_3} \le m_{\omega_3} \le u_{\omega_3},\; \ldots,\; l_{\omega_c} \le m_{\omega_c} \le u_{\omega_c}. \tag{2}$$

The constraints in (1) represent the inherent unbalanced nature of the data, while those as in (2) are the unique structural information that define the upper and lower bounds, denoted by $u_{\omega_j}$ and $l_{\omega_j}$, respectively, with regards to the number of ICs belonging to each artifactual source. Optimal values of $u_{\omega_j}$ and $l_{\omega_j}$ depend on the bioelectrical nature of that artifact (for example, electrocardiogram (ECG) and electrooculogram (EOG) artifacts generally have three spatial components each [24]), and the protocol under which the EEG data are collected (e.g., the number of EEG channels used). Typically, they can be tuned by a data-driven approach. The details will be given in the description of the numerical experiments in Section IV.

A. Modified Probabilistic Multiclass SVM

Multiclass SVM is typically built up from several standard binary SVMs [25], [26]. The proposed probabilistic multiclass SVM is modified from a recently developed probabilistic multiclass SVM by Hastie and Tibshirani [27], by replacing all the standard binary SVMs with weighted SVMs [22], to properly address the unbalanced learning problem at hand. It is hereafter referred to as the weighted PWC_PSVM. The detailed implementation of the weighted PWC_PSVM is as follows.

Consider a nominal $c$-class unbalanced classification problem with dataset $D$ in the form of $\{f_i, y_i\}_{i=1}^{N}$, where $f_i \in \mathbb{R}^d$ is the $i$th sample and $y_i \in \{\omega_1, \ldots, \omega_c\}$ is the corresponding class label and $N$ is the total number of training samples. Let $N_j$ denote the number of training samples belonging to class $\omega_j$, and $D_{ij} := \{f_k, y_k\}_{f_k \in \omega_i \cup \omega_j}$ be the subset of $D$ formed by the samples from class $\omega_i$ and $\omega_j$.

1) Construction of Weighted Binary SVMs: In total, $c(c-1)/2$ weighted binary SVMs are constructed, each classifying a pair of classes. The weighted binary SVM classifying class $\omega_i$ and class $\omega_j$ are trained using $D_{ij}$ by solving the following
338 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 2, FEBRUARY 2009
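$$\min_{A_{ij},\,B_{ij}} \; -\sum_{f_k \in D_{ij}} \left[ t_k \log\left(p^{ij}(f_k)\right) + (1 - t_k)\log\left(1 - p^{ij}(f_k)\right) \right] \tag{8}$$

$$\sum_{j=1}^{c} q_{ij} = 1, \qquad l_{\omega_2} \le \sum_{i=1}^{m} q_{i2} \le u_{\omega_2},\; \ldots,\; l_{\omega_c} \le \sum_{i=1}^{m} q_{ic} \le u_{\omega_c} \tag{10}$$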
where the optimization is over $Q = [q_1, q_2, \ldots, q_m]^T$. While various efficient solvers of (10) are available, the present study uses the solver developed by Bemporad and Mignone [33]. With the solution of (10), $Q$, the $s_i$'s ($i = 1, \ldots, m$) are simultaneously classified by

$$d(F(S)) = \left[ \arg\max_{k}\{q_{1k}\} \;\; \arg\max_{k}\{q_{2k}\} \;\; \cdots \;\; \arg\max_{k}\{q_{mk}\} \right]^T. \tag{11}$$

IV. NUMERICAL EXPERIMENTS

In numerical experiments, we limited ourselves to the problem of automatic removal of ECG artifact and EOG artifact in real-life EEG. The proposed IC classifier was quantitatively compared to several benchmark methods in a stringent subject-wise cross-validation procedure. In addition, the reconstructed EEG epochs were reviewed by an independent EEG expert to qualitatively evaluate the performance of the proposed artifact removal method.

A. Data Preparation

Ten right-handed volunteers from local tertiary institutions were selected for EEG measurements. These subjects fulfilled the inclusion criteria of no history of cardiovascular disease, and normal eyesight with regular eye blinks. Informed consent was obtained and nominal monetary incentive was given for their participation. Multichannel unipolar EEG data were recorded from 17 electrodes (excluding Fp1, Fp2) placed on the scalp according to the International 10–20 system [34] using the PL-EEG Wavepoint System (Medtronic, Inc., Denmark), with sampling frequency of 167 Hz and a passband of 0.02–35 Hz using a customized bandpass filter implemented in LabView (version 6.1, National Instruments, USA). Five minutes of EEG data were recorded from each subject with their eyes open and in resting state.

These EEG recordings were first segmented into 12-s epochs ($l = 2000$). Each EEG epoch was then decomposed into ICs by the ICA. The ICs were manually labeled as EEG IC (class $\omega_1$), EOG IC (class $\omega_2$), or ECG IC (class $\omega_3$) by an EEG expert in a random order. These labels were regarded as “true” labels, against which the performance of IC classifiers was benchmarked.

Six features ($d = 6$) were extracted from each IC and they were used as the chief information source, in place of the IC, for classification. Four features were directly adopted from the literature [13] for characterizing the EOG artifact and two new features were proposed in the present study for characterizing the ECG artifact. The detailed definitions of these features are given in the Appendix. Combining the resulting feature vectors with the “true” labels given by the EEG expert, a subject-wise data subset of 425 samples (25 epochs × 17 ICs), $D_k := \{f_i, y_i\}_{i=1}^{425}$ ($k = 1, \ldots, 10$), was obtained for each subject.

B. Parameter Selection

For the proposed IC classifier, two groups of parameters need to be tuned: 1) the hyperparameters for each weighted SVM, i.e., the regularization parameters, $C_i^{ij}$ and $C_j^{ij}$, and the kernel parameters, $\gamma^{ij}$ and 2) the lower and upper bounds for each type of artifactual ICs, i.e., $u_{\omega_j}$ and $l_{\omega_j}$.

1) Tuning of Hyperparameters: Since $C_i^{ij}$ and $C_j^{ij}$ are connected through (4), only one of them needs to be tuned. In the experiments, $(C_i^{ij}, \gamma^{ij})$ were jointly tuned by a fivefold cross validation [35] using the model selection tool in the LIBSVM package [36] on the following grids: $[2^{-5}, \ldots, 2^{10}] \times [2^{-10}, \ldots, 2^{3}]$ (step size = $2^{0.5}$).

2) Tuning of $u_{\omega_j}$ and $l_{\omega_j}$: As mentioned earlier, for ECG and EOG artifacts, they generally have three spatial components each [24]. ICA may output three artifactual ICs corresponding to the three spatial components if high-density EEG recordings (such as 64-channel EEG) are used. However, the EEG data used in the present study were recorded from 17 locations in the standard 10–20 system and ICA tended to output less than three artifactual ICs for both ECG and EOG artifacts. In the present experiment, a simple grid search, with both $u_{\omega_j}$ and $l_{\omega_j}$ ranging from 0 to 3 and a search step size of 1, was performed for ECG ICs and EOG ICs, respectively, to obtain optimal values for $u_{\omega_j}$ and $l_{\omega_j}$ that gave the highest balanced accuracy.

C. Quantitative Performance Evaluation

1) Subject-Wise Cross Validation: Among the subsets $\{D_k\}_{k=1}^{10}$ from ten subjects, samples from nine subjects were used to form a training set $D_{tra}$, and the samples from the left-out subject were used to form a testing set $D_{tes}$. Practically, this resampling procedure results in ten pairs of $D_{tra}$ and $D_{tes}$ in total. In the numerical experiments, for each pair of $D_{tra}$ and $D_{tes}$, $D_{tra}$ was used for tuning of parameters and training of SVM. The trained classifier was then tested on the left-out dataset $D_{tes}$. The major advantages of such subject-wise cross-validation procedure include that: 1) each testing set is independent of the training set, and thus, the test error simulates the classifier’s generalization performance on other unseen subjects and 2) classifier performance obtained on multiple testing sets can be used for evaluating the significance in the comparison of two classifiers [37].

2) Performance Measures: For a given testing set with $c$ classes, let $n_{ij}$ be the number of samples from $\omega_i$ (true label) being classified to $\omega_j$ by the classifier (predicted label). The following measures were used for evaluating the performance of the proposed IC classifier.

a) Balanced accuracy: It is the average of accuracy on each class, i.e., $\mathrm{BA} = \frac{1}{c}\sum_{i=1}^{c}\left(n_{ii} / \sum_{j=1}^{c} n_{ij}\right) \times 100\%$.

b) RCI: It measures the amount of uncertainty about the class of an input reduced by a classifier, i.e., $\mathrm{RCI} = (H_I - H_O)/H_I \times 100\%$, where $H_I$ and $H_O$ denote the prior and posterior uncertainty about the class of an unseen input, respectively. Here, $H_I = -\sum_{i=1}^{c}\left(p_i^{in}\log_2 p_i^{in}\right)$, where $p_i^{in} = \sum_{j=1}^{c} n_{ij} / \sum_{i=1}^{c}\sum_{j=1}^{c} n_{ij}$; $H_O = \sum_{j=1}^{c}\left(p_j^{out} H_{O_j}\right)$, where $p_j^{out} = \sum_{i=1}^{c} n_{ij} / \sum_{i=1}^{c}\sum_{j=1}^{c} n_{ij}$ and $H_{O_j} = -\sum_{i=1}^{c}\left(p_{ij}\log_2 p_{ij}\right)$ with $p_{ij} = n_{ij} / \sum_{i=1}^{c} n_{ij}$. RCI has been shown to be a useful performance measure that
TABLE I
PERFORMANCE COMPARISON BETWEEN PROPOSED METHOD (I.E., WEIGHTED PWC_PSVM WITH ERROR CORRECTION) AND FIVE BENCHMARK METHODS
(WEIGHTED PWC_PSVM, STANDARD SVM, GMM, KNN, AND LDF)
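As an illustrative sketch, the balanced accuracy and RCI measures defined in Section IV-C can be computed from a confusion matrix as shown below. The function names, the list-of-lists layout (`n[i][j]` is the number of samples with true class i predicted as class j), and the convention that 0 log 0 = 0 are assumptions of this example, not details taken from the paper. Every class is assumed to have at least one test sample.

```python
import math
from typing import List, Sequence


def _entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits, treating 0 * log2(0) as 0 (assumed convention)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def balanced_accuracy(n: List[List[int]]) -> float:
    """BA: mean of per-class accuracies n_ii / sum_j n_ij, in percent."""
    c = len(n)
    return 100.0 * sum(n[i][i] / sum(n[i]) for i in range(c)) / c


def rci(n: List[List[int]]) -> float:
    """RCI = (H_I - H_O) / H_I, in percent, from confusion matrix n."""
    c = len(n)
    total = sum(sum(row) for row in n)

    # Prior uncertainty H_I from the distribution of true classes.
    h_in = _entropy([sum(n[i]) / total for i in range(c)])
    if h_in == 0:
        raise ValueError("H_I is zero: only one class is present")

    # Posterior uncertainty H_O: entropy of the true class within each
    # predicted class j, weighted by how often class j is predicted.
    h_out = 0.0
    for j in range(c):
        col_total = sum(n[i][j] for i in range(c))
        if col_total == 0:
            continue  # class j never predicted, contributes nothing
        p_out = col_total / total
        h_out += p_out * _entropy([n[i][j] / col_total for i in range(c)])

    return 100.0 * (h_in - h_out) / h_in


# Hypothetical unbalanced 2-class confusion matrix (not data from the paper).
example = [[95, 5], [5, 5]]
print(balanced_accuracy(example))
print(rci(example))
```

On the hypothetical matrix above, plain accuracy would be 100/110 (about 90.9%), while the balanced accuracy is about 72.5% because the minority class is classified correctly only half of the time. This is the kind of gap that motivates a class-averaged measure on unbalanced data.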
ICs and ECG ICs determined through grid search, i.e., $u_{\omega_2} = 2$, $l_{\omega_2} = 1$, $u_{\omega_3} = l_{\omega_3} = 1$.

2) Quantitative Comparison: Detailed classification results and performance measures of the proposed method and the benchmark methods are summarized in Table I. The numbers shown are the averages over ten test datasets corresponding to ten pairs of $D_{tra}$ and $D_{tes}$. The P-values (given in parentheses) were obtained in the paired t-test [42], [43] between the proposed method and each of the benchmark methods. Based on the results in Table I, the proposed method appears to be superior to all the benchmark methods. Details are as follows.

a) Comparison between proposed method and modeling approaches (including GMM, KNN, and LDF): As shown in Table I, the proposed method achieved significantly higher balanced accuracy and RCI than the modeling approaches. It performed well on all the three classes. In comparison, all the modeling approaches showed very good performance on EEG ICs; however, their performance on EOG and ECG ICs was not satisfying. In the context of artifact removal, the proposed method, which achieved a better tradeoff between removing artifacts and preserving inherent brain activities, is preferable.

b) Comparison between the proposed method and the standard SVM: As can be seen from Table I, the proposed method showed significant wins over the standard SVM on all performance measures. The results given by the confusion matrices suggest that the better performance of the proposed method is mainly due to its higher accuracy on EEG ICs as compared to the standard SVM (3520/3708 vs. 3424/3708). One plausible reason is that the standard SVM, as used in [12] and [13], was trained on downsampled balanced training data (with large portion of samples of EEG ICs being discarded). Such downsampling causes loss of information, and thus, leads to suboptimal performance on the majority class [44].

c) Comparison between weighted PWC_PSVM with error correction and weighted PWC_PSVM without error correction: As shown in Table I, the weighted PWC_PSVM with error correction showed significant wins over the weighted PWC_PSVM method without error correction on almost all performance measures. The confusion matrices show that the incorporation of error correction resulted in a large increase in the accuracy on EEG ICs (3540/3708 vs. 3474/3708) at a tiny cost of the accuracy on ECG ICs (246/250 vs. 248/250). It indicates the goodness of the proposed error correction algorithm in that it helps to achieve a better tradeoff between removing artifacts and preserving inherent brain activities. Consider a batch of ICs resulting from a given EEG epoch. The error correction algorithm prevents the number of ICs predicted as ECG/EOG ICs from either exceeding the corresponding upper limits or being less than the corresponding lower limits, as given in (2), by picking out appropriate number of most probable artifactual ICs (with largest posterior probability of belonging to the classes of ECG/EOG ICs). Fig. 2(b) shows a typical example when the weighted PWC_PSVM classified two ICs (each marked with its own symbol in the figure) as ECG ICs, but one of the marked ICs was actually an EEG IC. The proposed error correction algorithm corrected this error by incorporating the constraint on ECG ICs: $1 \le m_{\omega_3} \le 1$.

3) Review of Reconstructed EEG: The qualitative evaluation of the proposed artifact removal system by the independent EEG expert is given in Table II. The amount of ECG artifacts was reduced in 98.4% of the epochs, with 98.0% indicated as “almost removed” and 0.4% indicated as “minor improvement.” EOG artifacts were removed in 96.8% of the epochs, with 92.0% indicated as “almost removed” and 4.8% indicated as “minor improvement.” In 88.4% of the epochs, inherent brain activities were well preserved and in 10.8% of the epochs, brain activities were slightly attenuated. Only 0.8% of the epochs suffered from
greatly attenuated while brain activities were largely preserved. The proposed method appears to be well suited for automatic EEG artifact removal.

APPENDIX
DEFINITIONS OF SIX FEATURES

Given an IC, $s_i$, the six features extracted from it are defined as follows.

Feature 1: This feature is the ratio between the peak amplitude and the variance of $s_i$ [13], as given by

$$f_1(s_i) = \max(|s_i|)/\sigma_{s_i}^2 \tag{A1}$$

where $\sigma_{s_i}$ is the standard deviation of $s_i$.

Feature 2: The second feature is the normalized skewness of $s_i$ [13], defined as

$$f_2(s_i) = E\{s_i^3\}/\sigma_{s_i}^3. \tag{A2}$$

Feature 3: This feature measures the cross correlation between $s_i$ and an independent reference dataset containing eye-blinking-dominated EEG channels (Fp1, Fp2, F3, F4, O1, O2), $x_j^0$ ($j = 1, \ldots, 6$) [13], which is distinct from the training and testing datasets. It is calculated as

$$f_3(s_i) = \sum_{j=1}^{6} \max_{\tau}\left(E\left\{x_j^0(m)\, s_i(m+\tau)\right\}\right)/6. \tag{A3}$$

Feature 4: This feature is the Kullback–Leibler (KL) distance between the probability density function (PDF) of $s_i$ and that of a reference EOG IC, which is separated from an EEG epoch distinct from the data used for training and testing, $s_{eog}^0$ [13]. It is calculated as

$$f_4(s_i) = \int P(s_i)\ln\left(P(s_i)/P(s_{eog}^0)\right) ds_i \tag{A4}$$

where $P(s_i)$ and $P(s_{eog}^0)$ are the PDFs of $s_i$ and $s_{eog}^0$, respectively.

Feature 5: The fifth feature is the variance of scalp distribution of $s_i$, given by

$$f_5(s_i) = \mathrm{var}(a_i/\|a_i\|) \tag{A5}$$

where $a_i$ refers to the scalp distribution coefficients in mixing matrix corresponding to $s_i$. This feature is specially proposed for ECG ICs because empirical evidences have shown that their unique scalp distribution gives smaller variance compared to other types of ICs [47].

Feature 6: This feature gives the KL distance between the PDF of $s_i$ and that of a reference ECG IC, $s_{ecg}^0$. It is described as

$$f_6(s_i) = \int P(s_i)\ln\left(P(s_i)/P(s_{ecg}^0)\right) ds_i \tag{A6}$$

where $P(s_i)$ and $P(s_{ecg}^0)$ are the PDFs of $s_i$ and $s_{ecg}^0$, respectively. This feature is proposed to account for distinct PDFs of ECG ICs due to their unique composition of P wave, QRS complex, and T wave.

It is worth noting that features 3, 4, and 6 require reference ICs obtained from distinct EEG epochs that are not part of training and testing datasets. They do not require additional reference EEG channels that are generally required in many non-ICA-based artifact removal methods.

REFERENCES

[1] T. P. Jung, S. Makeig, C. Humphries, T. W. Lee, M. J. McKeown, V. Iragui, and T. J. Sejnowski, “Removing electroencephalographic artifacts by blind source separation,” Psychophysiology, vol. 37, pp. 163–178, 2000.
[2] E. Urrestarazu, J. Iriarte, M. Alegre, M. Valencia, C. Viteri, and J. Artieda, “Independent component analysis removing artifacts in ictal recordings,” Epilepsia, vol. 45, pp. 1071–1078, Sep. 2004.
[3] P. Comon, “Independent component analysis, a new concept?,” Signal Process., vol. 36, pp. 287–314, Apr. 1994.
[4] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[5] T. P. Jung, C. Humphries, T. W. Lee, S. Makeig, M. J. McKeown, V. Iragui, and T. J. Sejnowski, “Removing electroencephalographic artifacts: Comparison between ICA and PCA,” in Proc. IEEE Neural Netw. Signal Process., 1998, pp. 63–72.
[6] T. P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T. J. Sejnowski, “Removal of eye activity artifacts from visual event-related potentials in normal and clinical subjects,” Clin. Neurophysiol., vol. 111, pp. 1745–1758, Oct. 2000.
[7] G. L. Wallstrom, R. E. Kass, A. Miller, J. F. Cohn, and A. F. Nathan, “Automatic correction of ocular artifacts in the EEG: A comparison of regression-based and component-based methods,” Int. J. Psychophysiol., vol. 53, pp. 105–119, Jul. 2004.
[8] N. P. Castellanos and V. A. Makarov, “Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis,” J. Neurosci. Methods, vol. 158, pp. 300–312, Dec. 2006.
[9] S. Makeig, A. J. Bell, T. P. Jung, and T. J. Sejnowski, “Independent component analysis of electroencephalographic data,” Adv. Neural Inf. Process. Syst., vol. 8, pp. 145–151, 1996.
[10] R. Vigário, J. Särelä, V. Jousmäki, M. Hämäläinen, and E. Oja, “Independent component approach to the analysis of EEG and MEG recordings,” IEEE Trans. Biomed. Eng., vol. 47, no. 5, pp. 589–593, May 2000.
[11] R. Vigário, “Extraction of ocular artifacts from EEG using independent component analysis,” Electroencephalogr. Clin. Neurophysiol., vol. 103, pp. 395–404, 1997.
[12] N. Nicolaou and S. J. Nasuto, “Temporal independent component analysis for automatic artifact removal from EEG,” in Proc. 2nd Int. Conf. MDSIP, 2004, pp. 5–8.
[13] L. Shoker, S. Sanei, and J. Chambers, “Artifact removal from electroencephalograms using a hybrid BSS-SVM algorithm,” IEEE Signal Process. Lett., vol. 12, no. 10, pp. 721–724, Oct. 2005.
[14] B. Boser, I. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” in Proc. 5th Annu. Workshop Comput. Learn. Theory, New York: ACM, 1992, pp. 144–152.
[15] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, pp. 273–297, 1995.
[16] V. N. Vapnik, “The vicinal risk minimization principle and the SVMs,” The Nature of Statistical Learning Theory, 2nd ed. New York: Springer-Verlag, 1995.
[17] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K: Cambridge Univ. Press, 2000.
[18] C. A. Joyce, I. F. Gorodnitsky, and M. Kutas, “Automatic removal of eye movement and blink artifacts from EEG data using blind component separation,” Psychophysiology, vol. 41, pp. 313–325, Mar. 2004.
[19] T. P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T. J. Sejnowski, “Analysis and visualization of single-trial event-related potentials,” Hum. Brain Mapp., vol. 14, pp. 166–185, 2001.
[20] S. Romero, M. A. Mañanas, S. Clos, S. Gimenez, and M. J. Barbanoj, “Reduction of EEG artifacts by ICA in different sleep stages,” in Proc. 25th IEEE Annu. Int. Conf. Eng. Med. Biol. Soc., Cancun, Mexico, 2003, pp. 2675–2678.
[21] J. Onton and S. Makeig, “Information-based modeling of event-related brain dynamics,” in Progress in Brain Research Book Series, vol. 159, C. Neuper and W. Klimesch, Eds. Amsterdam, The Netherlands: Elsevier, 2006, pp. 99–120.
[22] E. Osuna, R. Freund, and F. Girosi, “Support vector machines: Training and applications,” MIT A. I. Lab., Lexington, MA, A.I. Memo 1602, 1997.
[23] H. Gävert, J. Hurri, J. Särelä, and A. Hyvärinen. (2005, Oct. 19). The FastICA Package for MATLAB (version 2.5) [Online]. Available: http://www.cis.hut.fi/projects/ica/fastica/
[24] A. Schlögl, C. Keinrath, D. Zimmermann, R. Scherer, R. Leeb, and G. Pfurtscheller, “A fully automated correction method of EOG artifacts in EEG recordings,” Clin. Neurophysiol., vol. 118, pp. 98–104, Oct. 2007.
[25] C. W. Hsu and C. J. Lin, “A comparison of methods for multi-class support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[26] K. Duan and S. S. Keerthi, “Which is the best multiclass SVM method? An empirical study,” Lect. Notes Comput. Sci., vol. 3541, pp. 278–285, 2005.
[27] T. Hastie and R. Tibshirani, “Classification by pairwise coupling,” Ann. Stat., vol. 26, pp. 451–471, 1998.
[28] F. Markowetz, “Support vector machines in bioinformatics,” M.S. thesis, Dept. Math. Ruprecht-Karls Univ. Heidelberg, Heidelberg, Germany, 2001.
[29] T. Eitrich and B. Lang, “Efficient optimization of support vector machine learning parameters for unbalanced data sets,” J. Comput. Appl. Math., vol. 196, pp. 425–436, 2006.
[30] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, Eds. Cambridge, MA: MIT Press, 2000.
[31] H. T. Lin, C. J. Lin, and R. C. Weng, “A note on Platt’s probabilistic outputs for support vector machines,” Dept. Comput. Sci., Nat. Taiwan Univ., Taipei, Taiwan, Tech. Rep., 2003.
[32] T. F. Wu and C. J. Lin, “Probability estimates for multi-class classification by pairwise coupling,” J. Mach. Learn. Res., vol. 5, pp. 975–1005, 2004.
[33] A. Bemporad and D. Mignone. (2001, May 9). A Matlab Function for Solving Mixed Integer Quadratic Programs (version 1.06) [Online]. Available: http://control.ethz.ch/~hybrid/miqp/
[34] H. H. Jasper, “The ten-twenty electrode system of the international federation,” Electroencephalogr. Clin. Neurophysiol., vol. 10, pp. 371–375, 1958.
[35] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, “An introduction to kernel-based learning algorithms,” IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 181–201, Mar. 2001.
[36] C. Chang and C. Lin. (2007, Apr. 1). LIBSVM: A Library for Support Vector Machines (version 2.84) [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[37] K. Q. Shen, C. J. Ong, X. P. Li, H. Zheng, and E. P. V. Wilder-Smith, “A feature selection method for multilevel mental fatigue EEG classification,” IEEE Trans. Biomed. Eng., vol. 54, no. 7, pp. 1231–1237, Jul. 2007.
[38] V. Sindhwani, P. Bhattacharya, and S. Rakshit, “Information theoretic feature crediting in multiclass support vector machines,” presented at the 1st SIAM Int. Conf. Data Mining, Chicago, IL, 2001.
[39] G. Hripcsak and D. F. Heitjan, “Measuring agreement in medical informatics reliability studies,” J. Biomed. Inf., vol. 35, pp. 99–110, 2002.
[40] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, pp. 37–46, 1960.
[41] C. A. Bouman. (1997, Apr.). Cluster: An Unsupervised Algorithm for Modeling Gaussian Mixtures [Online]. Available: http://cobweb.ecn.purdue.edu/~bouman/software/cluster/
[42] G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York:

Shi-Yun Shao received the B.Eng. and M.Eng. degrees in biomedical engineering from Xi’an Jiaotong University, Shaanxi, China, in 2003 and 2006, respectively. She is currently working toward the Ph.D. degree in the Department of Mechanical Engineering, National University of Singapore, Singapore.
Her current research interests include electroencephalogram artifact removal, blind source separation, and machine learning.

Kai-Quan Shen received the B.S. degree from the University of Science and Technology of China, Hefei, China, in 2001.
In 2007, he joined the National University of Singapore, Singapore, as a Research Fellow with the Neurosensors Laboratory, Department of Mechanical Engineering. His current research interests include feature selection methods, support vector machines, brain signal processing, blind signal separation, and the investigation of neurophysiological mechanisms of human brain using functional MRI.

Chong Jin Ong (S’91–M’92) received the B.Eng. (Hons) and M.Eng. degrees in mechanical engineering from the National University of Singapore, Singapore, in 1986 and 1988, respectively, and the M.S.E. and Ph.D. degrees in mechanical and applied mechanics from the University of Michigan, Ann Arbor, in 1992 and 1993, respectively.
He joined the National University of Singapore in 1993 and is now an Associate Professor with the Department of Mechanical Engineering. His current research interests include algorithms for machine learning, feature selection, and robust control.

Einar P. V. Wilder-Smith received the M.B.B.S (equivalent) and M.D. degrees from Heidelberg University, Heidelberg, Germany, in 1986 and 1989, respectively.
He joined the National University of Singapore, Singapore, in 2001, and is now a Full Professor and Director of Research in the Department of Medicine, where he is also the head of the Neurology Diagnostic Laboratory. His current research interests include the field of clinical neurophysiology, particularly EEG changes in relation to emotions and neurological disorders.

Xiao-Ping Li received the Ph.D. degree in mechanical and manufacturing engineering from the University of New South Wales, Sydney, Australia, in 1991.
Wiley, 1978. He joined the National University of Singapore,
[43] E. Alpaydin, “Assessing and comparing classification algorithms,” Intro- Singapore, in 1992, and is currently a Full Professor
duction to Machine Learning, Cambridge, MA: MIT Press, 2004. with the Department of Mechanical Engineering and
[44] Y. Liu, A. An, and X. Huang, “Boosting prediction accuracy on imbalanced Division of Bioengineering. His current research in-
datasets with SVM ensembles,” in Proc. ECML/PKDD 2004 Workshop terests include neurosensors and nanomachining. He
SAWM, Pisa, Italy, pp. 2–13. is the holder of six patents, and two patents that are
[45] K. Q. Shen, C. J. Ong, X. P. Li, and E. P. V. Wilder-Smith, “Feature pending. He is the author or coauthor of 230 technical
selection via sensitivity analysis of SVM probabilistic outputs,” Mach. publications, of which 118 are international refereed
Learn., vol. 70, no. 1, pp. 1–20, 2008. journal papers. He supervised six postdoctoral research fellows, 13 Ph.D. degree
[46] K. Q. Shen, X. P. Li, C. J. Ong, S. Y. Shao, and E. P. V. Wilder-Smith, students, and 23 master’s degree students.
“EEG-based mental fatigue measurement using multi-class support vec- Prof. Li is a member of the American Society of Mechanical Engineers, a
tor machines with confidence estimate,” Clin. Neurophysiol., vol. 119, Senior Member of the Society of Manufacturing Engineers (SME), and a Senior
pp. 1524–1533, 2008. Member of the North American Manufacturing Institute of the SME. He is a
[47] A. Greco, N. Mammone, F. Morabito, and M. Versaci, “Semi-automatic Guest Editor of the International Journal of Computer Applications in Tech-
artifact rejection procedure based on kurtosis, Renyi’s entropy and inde- nology, an Editorial Board Member of the International Journal of Abrasive
pendent component scalp maps,” Int. J. Biomed. Sci., vol. 1, pp. 12–16, Technology and Engineering, an Editorial Advisor of the Chinese Journal of
2005. Mechanical Engineering, and a regular Reviewer of 14 international journals.