Fatima 2017

DECODING BRAIN COGNITIVE ACTIVITY ACROSS SUBJECTS USING MULTIMODAL
M/EEG NEUROIMAGING
Sarwat Fatima, Awais M. Kamboh, Member IEEE
Neuroinformatics Lab, School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST), Islamabad, Pakistan
978-1-5090-2809-2/17/$31.00 ©2017 IEEE 3224

ABSTRACT

Brain decoding is essential in understanding where and how information is encoded inside the brain. Existing literature has shown that good classification accuracy is achievable when decoding single subjects, but multi-subject classification has proven difficult due to inter-subject variability. In this paper, multi-modal neuroimaging is used to improve two-class multi-subject classification accuracy in a cognitive task of differentiating between a face and a scrambled face. In this transfer learning problem, a feature space based on special-form covariance matrices manipulated with Riemannian geometry is used. A supervised two-layer hierarchical model is trained iteratively for estimating classification accuracies. Results are reported on a publicly available multi-subject, multi-modal human neuroimaging dataset from the MRC Cognition and Brain Sciences Unit, University of Cambridge. The dataset contains simultaneous recordings of electroencephalography (EEG) and magnetoencephalography (MEG). Using leave-one-subject-out cross-validation, our model attained a classification accuracy of 70.82% for single-modal EEG, 81.55% for single-modal MEG and 84.98% for multi-modal M/EEG.

Index Terms: Multi-Modal Neuroimaging, Brain Decoding, Cognitive Activity, M/EEG

1. INTRODUCTION

Brain decoding refers to the ability to identify neural patterns generated in response to a given stimulus and, conversely, to predict the stimulus by observing neural patterns. There have been several successes in decoding brain activity for individual subjects; however, one of the biggest challenges in correctly predicting the stimulus is inter-subject variability.

Brain decoding across subjects is a difficult task because of inter-subject variability, which exists due to structural and functional differences between subjects' brains. These differences originate from the capacity of neurons to rearrange themselves functionally and structurally, owing to neural plasticity. Due to this variability, the signals are very specific to each subject, and the cortical dipoles for any brain activity can differ between subjects. This variability can change the underlying distribution of the features from subject to subject and thus make it difficult to decode brain activity across subjects.

Inter-subject brain decoding can be categorized as a transfer learning problem, i.e. knowledge learned from other subjects, or from other experiments on the same subject, must be transferred to new unseen subjects in order to decode their brain activity. This way the brain activity relevant to a particular stimulus can be confirmed. Brain decoding is useful in several applications, such as Brain Computer Interface (BCI) applications, where an individual uses their brain activity to interact with an external machine, or rehabilitation training, such as neurofeedback. In all these applications, the classifier is typically calibrated or trained for a particular subject. However, what is required is a generalized classifier, trained on brain patterns of other subjects, or on other experiments of the same subject, which can accurately classify brain activity.

Multi-modal M/EEG can capture information about brain patterns which may not be available when using EEG or MEG as a single modality. The underlying neuronal sources for MEG and EEG are similar, but the two modalities are sensitive to different components of the dipolar sources. The EEG field can pick up both tangential and radial components, while MEG is only sensitive to the tangential component of the dipolar sources. Moreover, both MEG and EEG have good temporal resolution, but MEG has slightly better spatial resolution than EEG. This complementary information in MEG and EEG signals is beneficial for better dipole localization and classification performance.

In this work, we propose a new combination of feature extraction and classification model using multi-modal M/EEG which caters for inter-subject variability and improves the classification performance across subjects. To the best of our knowledge, the combination of special-form covariance matrices as features and supervised hierarchical iterative classification has not been applied to multi-modal M/EEG. The proposed model estimates the Leave-One-Subject-Out (LOSO) two-class classification accuracy for single-modal
EEG, single-modal MEG and multi-modal M/EEG. A significant improvement was observed in the LOSO classification accuracies as compared to the state-of-the-art classification accuracy of 77.4% on the same dataset for single-modal MEG [1].

2. METHODOLOGY

This section explains the specific methods used in the feature extraction and classification processes of our proposed model, as shown in Fig. 1.

2.1. Transfer Learning

Transfer learning is a method to improve a classifier in one domain by transferring information from a related domain. This method is often useful when the training data is limited, expensive to collect or not easily accessible. Brain decoding across subjects can be categorized as a transfer learning (TL) problem. Transfer learning is formally defined as improving the target predictive function given a source domain $D_S$ with a corresponding source task $T_S$ and a target domain $D_T$ with a corresponding target task $T_T$. A domain is defined by two parts, a feature space $X$ and the marginal probability distribution $P(X)$ [2].

In our problem, the source and target domains are the same, i.e. the human brain, but the marginal distribution is different because of structural and functional differences across subjects. We have a different feature space for each subject, and we want to improve the classification accuracy on a new subject when the classifier is trained on other subjects. This was termed transductive transfer learning (TTL) in [3].

2.2. Feature extraction

xDAWN spatial filtering was used for dimensionality reduction and for maximizing the signal-to-noise ratio of the evoked potentials. xDAWN assumes that the signal is composed of two typical patterns: one evoked by an individual class and one evoked by the whole signal, i.e. both classes. Therefore, the spatial filters are estimated for class $k$ as

$$w^{*} = \arg\max_{w} \frac{w^{T} P^{(k)} P^{(k)T} w}{w^{T} X X^{T} w} \tag{1}$$

where $P^{(k)}$ is the average evoked potential, $w$ is the spatial filter vector and $X$ is the matrix of the whole signal. The spatial filters are selected corresponding to the highest eigenvalues for each class through eigenvalue decomposition of $P^{(k)} X X^{T} P^{(k)T}$. The resulting sets of spatial filters from each class are concatenated in a single matrix $W$, which performs the spatial filtering operation on both the trials and the average evoked potentials [1].

After spatially filtering the input signal, a special form of Symmetric Positive Definite (SPD) covariance matrix is estimated to capture the relevant information of the trials. First, a new trial $Z_i$ is built by concatenating the spatially filtered average evoked potential $P^{(k)}$ and the spatially filtered trial $z_i$ [1]. Then, the SPD covariance matrix of the new trial $Z_i$ is estimated as

$$C_i = \frac{1}{N} Z_i Z_i^{T} \tag{2}$$

The Riemannian geometry tools, i.e. the Riemannian mean and the Riemannian distance [1], are used to keep the special structure of the SPD covariance, i.e. its matrix form. However, for vector-based classification purposes, the covariance matrix has to be converted into a Euclidean vector. Tangent space mapping is used to approximate Riemannian distance computations by Euclidean distance computations, using the geometric mean of the covariance matrices as the reference point [1].

[Figure 1 (diagram). Feature extraction: spatial filtering, SPD covariance matrix, Riemannian mean, Riemannian distance, tangent space. Classification model: level 0, hierarchical model (SVMs feeding a random forest, giving yPred); level 1, iterative training on Xtest with Confidence = yProb - 0.5 and W = 10 * (confidence > 0.1), giving yPred.]

Fig. 1: Proposed system diagram showing a special-form covariance matrix used for features in a two-layer hierarchical iterative classifier.

2.3. Classification

The classification model is a two-layer hierarchical combination of non-linear Support Vector Machines (SVM) and a random forest (RF) at level 0, and iterative training at level 1 [4].

An SVM can model complex non-linear relationships by finding a hyperplane that separates samples belonging to different class labels. A non-linear Radial Basis Function (RBF) kernel was used to map the linearly non-separable data points to a higher-dimensional space and obtain a more accurate decision boundary. The parameters $C = 1$ and $\gamma = 1/n_{\text{features}}$ are used for the RBF kernel, where $n_{\text{features}}$ is the total number of features used. The parameter $C$ trades off the misclassification of training examples against the simplicity of the decision surface, i.e. a higher $C$ pushes the classifier towards classifying all training samples correctly. The parameter $\gamma$ defines the influence of each training sample. Although an SVM uses a significant amount of memory
and processing power, using the right kernel can help improve the speed of the algorithm.

The random forest classifier is an ensemble learning algorithm which combines the output of several decision tree classifiers and averages their probabilistic predictions to improve the generalizability or robustness over a single estimator.

2.3.1. Level 0: Hierarchical Model

The two-layer hierarchical combination of SVM classifiers and a random forest classifier takes a 3-D feature space as input. The first dimension is the number of trials, while the second and third dimensions depend on the chosen number of features. In our case, each subject has a feature subspace which is aggregated across subjects to form a new feature space of dimension [trials x single-subject features x across-subject features]. The 3-D form of the feature space was kept instead of combining the second and third dimensions. The first layer trains a collection of non-linear SVM classifiers using data from the second (single-subject) and third (across-subject) dimensions. The total number of SVM classifiers depends on the lengths of the second and third dimensions [4]. The resulting probability estimates from the SVMs are then fed into a 1000-tree random forest classifier, which makes the final prediction on the test subject as shown in Fig. 1.

2.3.2. Level 1: Iterative Training

The two-layer hierarchical model is wrapped inside the iterative learning classifier to transfer knowledge to the test subject. The predicted labels and corresponding probabilities at level 0 are used at level 1 to improve the classification performance. LOSO is used to divide the subjects into training and test subjects. The labels predicted for the test subject at level 0 are used as ground truth for the test subject at level 1. The predicted probabilities are used for screening the test subject samples, i.e. increasing the weight of the most reliable samples. Samples whose predicted probability is far from the chance probability of 0.5 (for the binary case) are expected to be more reliable and are therefore duplicated by a factor of 10, which was found to be an optimal replication factor in [4]. As shown in Fig. 1, the confidence for a given sample is calculated as the difference between the predicted probability of that sample and the chance probability of 0.5. If the confidence value is greater than 0.1, the sample is considered reliable and is duplicated by a factor of 10; if it is less than 0.1, the sample is not duplicated. The model is then retrained for the test subject with the predicted labels from level 0 and the duplicated reliable samples at level 1, using the training model of level 0. The final predictions are made on the test subject after re-training.

3. RESULTS AND DISCUSSION

This section describes the implementation of the proposed system and tests its performance on a public dataset.

3.1. Dataset description

The model proposed in Section 2 is tested on a multi-subject, multi-modal human neuroimaging dataset of evoked responses to face stimuli by D. G. Wakeman and R. N. Henson from the MRC Cognition & Brain Sciences Unit, University of Cambridge [5]. It contains simultaneous recordings of 70-channel electroencephalography (EEG) and 306-channel magnetoencephalography (MEG) (102 magnetometers and 204 planar gradiometers) from 19 subjects. However, we used 16 subjects to keep our analysis comparable with [1].

The subjects undertook multiple runs of a large number of face stimuli, i.e. unfamiliar and scrambled faces. The duration of a single trial was between 1.2 and 1.6 seconds, with approximately 295 trials per category. A complete description of the dataset can be found in [5]. After performing the usual pre-processing steps in the Statistical Parametric Mapping (SPM) MATLAB toolbox, we extracted epochs for unfamiliar faces (0) and scrambled faces (1); the number of epochs varied between 289 and 297. In order to balance the epochs, only the smallest number of epochs available for both stimuli, i.e. 289 epochs, was kept.

A 6th-order Butterworth band-pass filter was applied between 1 and 20 Hz. The first 0.5 seconds of the signal before the stimulus was discarded, and the 0.5 seconds after the stimulus was kept for further analysis. A set of 8 spatial filters was selected, i.e. 4 spatial filters for each category. For each subject, a feature subspace of dimension [136 features x 578 trials] is aggregated to form a new feature space of dimension [9248 trials x 136 features x 16 subjects]. The new feature space was used as input at level 0 of the classification model, forming a collection of 152 SVM classifiers, each using data from a single subject (136 features) or across subjects (16 features). Table 1 shows the mean LOSO classification accuracies across subjects under different conditions.

Table 1: Mean LOSO classification accuracies across subjects

                  Linear SVM   Non-linear SVM   Normalized (non-linear SVM)   Subject 02 removal
EEG               70.12%       70.03%           69.70%                        70.82%
MEG               53.74%       80.84%           78.23%                        81.55%
Separate M/EEG    61.18%       78.86%           78.91%                        80.97%
Joint M/EEG       52.97%       84.63%           79.92%                        84.98%
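The dimensions quoted in Section 3.1 can be checked with a short sketch. It assumes, as one reading that matches the reported numbers, that the special-form trial stacks the 8 filtered average evoked potentials on top of the 8 filtered trial channels (16 rows), so that the upper triangle of the 16 x 16 covariance gives 136 features, and that the 152 SVMs are the sum of 136 and 16. The number of time samples, the random data and the identity reference point are placeholders for illustration only; the paper uses the geometric mean of the covariance matrices as the reference.

```python
import numpy as np

# Values taken from Section 3.1.
n_filters_per_class = 4
n_classes = 2
n_subjects = 16
n_epochs_per_class = 289

n_filters = n_filters_per_class * n_classes            # 8 xDAWN filters
# Assumption: Z_i stacks 8 filtered evoked-potential rows on 8 filtered trial rows.
n_rows = 2 * n_filters                                 # 16
n_features = n_rows * (n_rows + 1) // 2                # upper triangle of 16 x 16
n_trials_per_subject = n_classes * n_epochs_per_class  # balanced epochs
n_trials_total = n_trials_per_subject * n_subjects
n_svms = n_features + n_subjects                       # assumed reading of "152"


def spd_covariance(Z):
    """Eq. (2): C = Z Z^T / N, taking N as the number of time samples (assumed)."""
    return Z @ Z.T / Z.shape[1]


def sym_fun(A, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * fun(vals)) @ vecs.T


def tangent_vector(C, C_ref):
    """Project SPD matrix C onto the tangent space at C_ref and vectorise it."""
    inv_sqrt = sym_fun(C_ref, lambda v: 1.0 / np.sqrt(v))
    S = sym_fun(inv_sqrt @ C @ inv_sqrt, np.log)
    rows, cols = np.triu_indices(S.shape[0])
    # Off-diagonal terms are scaled by sqrt(2) so the vector norm equals ||S||_F.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * S[rows, cols]


rng = np.random.default_rng(0)
n_samples = 125  # hypothetical number of time samples per epoch
P_filtered = rng.standard_normal((n_filters, n_samples))  # stand-in for W^T P
z_filtered = rng.standard_normal((n_filters, n_samples))  # stand-in for W^T x_i
Z = np.vstack([P_filtered, z_filtered])
C = spd_covariance(Z)
C_ref = np.eye(n_rows)  # placeholder reference point, not the geometric mean
features = tangent_vector(C, C_ref)

print(n_features, n_trials_per_subject, n_trials_total, n_svms, features.shape)
# prints: 136 578 9248 152 (136,)
```

Under these assumptions the sketch reproduces the 136 features per trial, 578 trials per subject, 9248 trials in total and 152 level-0 classifiers reported above.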
3.2. Linear vs non-linear SVM

The LOSO classification accuracies of linear and non-linear SVMs are compared for EEG, MEG and M/EEG. As shown in Table 1, the non-linear SVM significantly improved the classification accuracy for MEG and M/EEG as compared to the linear SVM. However, linear and non-linear SVMs performed equally well for EEG, with no significant improvement. This indicates that the MEG and M/EEG samples were not linearly separable, which is why a linear classifier performed poorly for these modalities. The box plots in Fig. 2 clearly show the improvement in MEG as compared to EEG for the non-linear SVM.

3.3. Multimodal M/EEG feature extraction

EEG and MEG were combined in two different ways for estimating the LOSO classification accuracy. First, EEG and MEG features were concatenated after being extracted separately, i.e. separate M/EEG. Second, EEG and MEG signals were combined before extracting features, i.e. joint M/EEG, as shown in Table 1 and Fig. 2. With a non-linear SVM, the mean LOSO classification accuracy improved significantly, from 78.86% for separate M/EEG to 84.63% for joint M/EEG.

[Figure 2: box plots of classification accuracy (y-axis, 0.4 to 1) for EEG, MEG, separate M/EEG and joint M/EEG.]

Fig. 2: The box plots are plotted for EEG, MEG, separate and joint M/EEG for the non-linear SVM. A box plot is typically divided into four quartiles representing the distribution of the dataset which, in our case, is the classification accuracy across 16 subjects. The four horizontal lines on a single box plot mark the four quartiles. Each quartile represents 25% of the data distribution. The upper horizontal line indicates the maximum classification accuracy, while the lowest horizontal line indicates the minimum classification accuracy. The dots outside of this distribution are outliers. The red line indicates the median, or the middle of the dataset.

The ranges of signal values for MEG and EEG are different, with MEG measured in femtotesla (fT) and EEG measured in microvolts (µV). Therefore, the Euclidean norm of the signal was taken in order to remove any bias in the result. As described earlier, the Riemannian mean and distance are not affected by normalization. However, normalization does affect the LOSO classification accuracies of the SVM, because the SVM is a distance-based classifier and the distances change when the signal values change. As shown in Table 1, normalization did not have any significant effect on the mean LOSO classification accuracy for EEG and separate M/EEG features. However, the mean classification accuracy of MEG was reduced from 80.84% to 78.23% and that of joint M/EEG from 84.63% to 79.92%. Nevertheless, the classification accuracy for joint M/EEG was still higher than for separate M/EEG features.

During the analysis of the averaged trials of each subject, no activity was observed in the averaged event-related potentials of subject 2 because of too many noisy trials. Since all other subjects had evoked potentials, subject 2 was removed, features were re-extracted and the classification accuracy was re-estimated. This improved the classification accuracies. Table 1 shows the improved classification accuracies for EEG, MEG and M/EEG (separate and joint).

4. CONCLUSION

Brain decoding across subjects is a challenging task since inter-subject variability can reduce the classification accuracy significantly. The proposed model handled the inter-subject variability well because of its good generalization capabilities across subjects. Compared to the state of the art (77.4% for single-modal MEG [1]), the model improved the LOSO classification accuracy for single-modal MEG to 81.55% and for multi-modal M/EEG to 84.98%, but no significant improvement was observed for EEG. To the best of our knowledge, the method has not previously been applied to either separate or joint M/EEG. Using this as a preliminary result, further exploration can be done for multi-class brain decoding across subjects. The work can also be extended with the addition of fMRI signals to M/EEG.

5. REFERENCES

[1] A. Barachant (2014), “MEG decoding across subjects [online],” Available: https://github.com/alexandrebarachant/DecMeg2014.

[2] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang, “A survey of transfer learning,” Journal of Big Data, vol. 3, no. 1, pp. 1–40, 2016.

[3] Emanuele Olivetti, Seyed Mostafa Kia, and Paolo Avesani, “MEG decoding across subjects,” in Pattern Recognition in Neuroimaging, 2014 International Workshop on. IEEE, 2014, pp. 1–4.

[4] H. Huttunen (2014), “Hierarchical combination of logistic regression and random forest [online],” Available: https://github.com/mahehu/decmeg.

[5] D. G. Wakeman and R. N. Henson, “A multi-subject, multi-modal human neuroimaging dataset,” Sci. Data, vol. 2, 150001, 2015, doi: 10.1038/sdata.2015.1.
