Abstract—Cognitive workload is an important indicator of mental activity that has implications for human–computer interaction, biomedical and task analysis applications. Previously, subjective rating (self-assessment) has often been a preferred measure, due to its ease of use and relative sensitivity to cognitive load variations. However, it can only be feasibly measured in a post-hoc manner with the user's cooperation, and is not available as an online, continuous measurement during the progress of the cognitive task. In this paper, we used a cognitive task inducing seven different levels of workload to investigate workload discrimination using electroencephalography (EEG) signals. The entropy, energy, and standard deviation of the wavelet coefficients extracted from the segmented EEGs were found to change very consistently in accordance with the induced load, yielding strong significance in statistical tests of ranking accuracy. High accuracy for subject-independent multichannel classification among seven load levels was achieved across the twelve subjects studied. We compare these results with alternative measures such as performance, subjective ratings, and reaction time (response time) of the subjects, and compare their reliability with that of the EEG-based method introduced. We also investigate the test/re-test reliability of the recorded EEG signals to evaluate their stability over time. These findings bring the use of passive brain–computer interfaces (BCI) for continuous memory load measurement closer to reality, and suggest EEG as the preferred measure of working memory load.

Index Terms—Cognitive workload, electroencephalography (EEG), multichannel classification, subject-independent, self-rating, wavelet coefficients.

Manuscript received November 12, 2014; revised February 26, 2015; accepted May 22, 2015. Date of publication June 04, 2015; date of current version December 09, 2015.
P. Zarjam is with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: p.zarjam@unsw.edu.au).
J. Epps is with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia, and also with the ATP Research Laboratory, National ICT Australia, Eveleigh, NSW 2015, Australia (e-mail: j.epps@unsw.edu.au).
N. H. Lovell is with the Graduate School of Biomedical Engineering, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: n.lovell@unsw.edu.au).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAMD.2015.2441960
1943-0604 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

In brain–computer interfaces (BCI), human–computer interaction (HCI), cognitive neuroscience, biomedical engineering, and psychology, researchers have an ongoing interest in cognitive monitoring [1]–[3], in order to gain information on the user's cognitive state, such as their mental or cognitive activities, memory load and task engagement. These measures indicate how hard a user is working to solve a problem or use an interface, and can be employed to support the user's goals or adapt an interface [3]. Cognitive monitoring, also called passive BCI, can detect the potential for human error or the loss of control over a system by providing information on the user's state [1]. Cognitive load is the load imposed on working memory during a cognitive process [4], and its measurement is of great importance in cognitive monitoring [3], [4]. Such measurement helps to avoid cognitive/mental overload of the user or interface and to maintain efficiency and productivity in work performance, especially in critical or high mental load workplaces such as air traffic control, military operations, and medical and emergency applications. A mechanism of interest in these fields is working memory, which is used for short-term retention, utilization, and manipulation of information in the mind [5]. More broadly, the changing characteristics of working memory with memory load or task engagement [4], and their relationship with cortical activation, have been areas of growing study in recent years [6], [7].

There are different techniques available to measure cognitive load, including subjective measures (e.g., performance-based, subjective ratings, reaction time), and behavioral and physiological methods. Among subjective measures, self-rating (self-report), which consists of posttask questionnaires, has been widely used as the preferred measurement method [8], [9]. This is because these questionnaires are easy and inexpensive to administer and assess, they do not intrude on primary task performance, and they can detect small variations in workload with relatively good sensitivity [4]. However, subjective measures cannot provide satisfactory results in all scenarios, as they rely on the assumption that the subjects are willing and able to respond accurately [10]. Another limitation is that they can only be measured in a post-hoc manner and are not available as an online, continuous measurement during the progress of the cognitive task, as many real-time BCI/HCI applications require. Behavioral methods (e.g., speech, mouse/pen-input movements [11]–[13]) can also reflect cognitive load, but they are the most distant level of measurement from the cognitive activity, and their sensitivity to workload is mostly not high. Physiological methods include heart rate variability [14], eye movement [15], hormone levels [10], skin conductance/galvanic skin response (GSR) [16], [17], and brain activity. Among these measures, monitoring brain activity has been recognized as the most sensitive and consistent reflector of cognitive load [2]. However, physiological methods
302 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 7, NO. 4, DECEMBER 2015
accuracy for each addition task. To minimize any muscle movement artifact due to electromyogram (EMG) activity during the recording, subjects were asked to avoid any unnecessary physical movements, and their hand was placed in a fixed position where they could still make slight finger movements to give their answers on the mouse. Since the channels in the frontal lobe are susceptible to ocular artifact (electrooculography, EOG), subjects were asked to refrain from excessive blinking as much as possible. The subjects were given a 30-second break between consecutive levels, during which they could move or blink. In the baseline (rest) condition, conducted after the experiment, the subjects were asked to sit relaxed and keep their eyes closed for 2 min. We chose eyes-closed rest as the baseline to better control the rest condition and to ensure the subjects were relaxed.

TABLE II
EEG FREQUENCY BANDS APPROXIMATELY CORRESPONDING TO EACH WAVELET SCALE, Hz [31]
B. Subjects
Twelve healthy male volunteers engaged in postgraduate study participated in the experiment. Their ages ranged from 24 to 30 years; all were right-handed and had normal or corrected-to-normal eyesight. They gave written informed consent, in accordance with human research ethics guidelines. The subjects were asked to refrain from alcohol and caffeine for 12 h prior to the experiment. The experiment (test) was conducted in one session, and the task lasted about 15 min. The experiment was repeated (retest session) after a six-month interval for two subjects.

C. EEG Recording
The subjects' EEG signals were recorded using an Active Two acquisition system at the ATP Laboratory of National ICT Australia in Sydney. The experiment was conducted under controlled conditions in an electrically isolated lab, with a minimum distance of 5 m from power sources to the experiment desk, and under natural illumination. Each recording contained 32 EEG channels mounted in an elastic cap according to the extended international 10–20 system. A linked earlobe reference was used, and impedance was kept under … kΩ. The EEG signals were passed through a band-pass filter with cut-off frequencies of … Hz and were recorded at a 256 Hz sampling rate.

III. ANALYSIS METHODS

A. Preprocessing
In order to obtain artifact-free data, the acquired EEG signals were first visually inspected, and segments contaminated with EMG and EOG artifacts were removed. For the purpose of feature extraction, the EEG signals were segmented using a rectangular window of length … s. After removing the DC level of the segments, we resampled them at … Hz to reduce the computational load of the feature extraction algorithms. Finally, the EEG segments were band-pass filtered in the frequency band of … Hz (as EEG signals generally do not have many useful frequency components above 30 Hz [36]). Hereafter, we denote the EEG segment under analysis by $x(n)$, $n = 1, \ldots, N$, where $N$ is the number of samples per segment; successive segments do not overlap.

B. Spectral Analysis
In order to quantitatively study the PSD of the EEG signals for this task in a summarized, comparative manner across all channels, we calculated the spectral edge frequency (SEF) of the signals recorded from all 32 channels. Here, the SEF was taken as the frequency below which 95% of the EEG signal power resides.

C. Multi-Resolution Analysis Using the DWT
We employed a new set of features, using the discrete wavelet transform (DWT). The DWT has proven to be one of the most suitable tools for processing EEG signals, since it does not assume the signals are stationary, offers an optimal time-frequency resolution for these signals, which are generally dominated by low frequencies [37], and provides a localized signal representation in the time and frequency domains [38]. The efficient DWT computational structure is obtained from wavelet and scaling sequences deduced from one octave to the next by a two-scale difference equation. At each octave level $j$, a signal is passed through a low-pass filter and a high-pass filter to produce the approximate (i.e., $a_j$) and detail (i.e., $d_j$) coefficients, respectively [39]. The choice of the mother wavelet depends on the application. The smoothing property of the Daubechies wavelet makes it suitable for detecting variations in EEG signals [39]. Therefore, we selected the Daubechies-4 mother wavelet, and the decomposition level was chosen to be 3, resulting in three wavelet scales corresponding approximately to the EEG frequency bands, as shown in Table II.

D. Feature Extraction
During preliminary analyses, we extracted many features from our previous work on classifying three levels of load, for example zero-crossing rate, maximum cross-correlation, spectral coherence, instantaneous frequency, phase locking value, intensity weighted mean frequency, intensity weighted bandwidth and … [40]–[42]. These features were unable to discriminate all seven load levels accurately.
We then tested a series of wavelet-based features, of which the entropy, energy, and standard deviation of the coefficients were best able to distinguish the seven load levels. We extracted
these features from each of 13 artifact-free EEG segments (from each channel), since the total artifact-free task duration was at least 65 s across all load levels and subjects. For each wavelet scale $j$, the wavelet approximate and detail coefficients are given by $a_j(k)$ and $d_j(k)$, $k = 1, \ldots, N_j$, respectively, where $N_j$ is the number of coefficients in scale $j$. The features employed were calculated from the wavelet coefficients as follows.

1. Entropy of the approximate coefficients: The first feature is the Shannon entropy of the approximate coefficients and is given by [31]

$$H_j = -\sum_{k=1}^{N_j} p_j(k)\,\log p_j(k), \qquad p_j(k) = \frac{a_j(k)^2}{\sum_{m=1}^{N_j} a_j(m)^2}. \qquad (1)$$

From (1), it is clear that $0 \le H_j \le \log N_j$. The feature is zero when only one of the wavelet coefficients is nonzero and attains its peak value when all coefficients are equal in magnitude. This feature therefore captures the variability and self-similarity among the coefficient values, and is also a measure of the flatness of the approximate coefficients. Analysis of the effects of increasing the load level on the approximate coefficients showed that as the load level increases, the number of significant coefficients decreases, which results in lower values of the entropy given in (1).

2. Energy of the approximate coefficients: This feature represents the energy of the EEG segment under analysis in a particular frequency band. It is calculated as the sum of the squares of the wavelet approximate coefficients for that particular scale:

$$E_j = \sum_{k=1}^{N_j} a_j(k)^2. \qquad (2)$$

The value of this feature shows how the signal energy is distributed over different subbands. From the spectral analysis in Section IV-A, it is known that the signal components with the largest power are in the low frequency bands; therefore we expect this feature to provide useful information in higher wavelet scales (e.g., scale …).

3. Standard deviation of the approximate coefficients: This feature shows how much the EEG segment under analysis, as represented by its approximate coefficients, varies from the mean.

E. Statistical Analysis of Extracted Features
We used a Kruskal–Wallis test to measure how effective the features were in separating the different workloads imposed, since there are more than three independent load level groups; it is a nonparametric method for one-way analysis of variance and is not affected by variations in small portions of the data. The test examines the hypothesis that all samples were taken from identical populations and is especially sensitive to differences in central tendency [43]. This test was suggested in previous related studies [27], [44], due to the ranked nature of the data. We calculated p-values by Kruskal–Wallis test across all the channels for each extracted feature, for each participant.

To assess the consistency of ranking among the task loads, we also conducted Friedman's test, which ranks the values within each row and then compares the ranks by column [43]. The Friedman test is a nonparametric statistical test that examines differences between groups when the dependent variable under measurement is ordinal. It is used here because column effects are of interest in the study, and it examines only column effects after adjusting for possible row effects [43]. This is an important test for measurement applications like cognitive load, where an ordinal relationship between load levels must be consistently observed in order for EEG features to be meaningful indicators. Here, we considered the designed task loads as columns and the subjects as rows. This ranked the measured load levels for all the subjects, each time for one EEG channel (for one extracted feature).

F. Feature Selection
In line with the earlier observation of the importance of an ordinal relationship between load levels, all feature/channel/frequency band combinations that exhibited a consistent monotonic trend in the same direction across all subjects were selected.

G. Classification
We classified the imposed memory loads from the EEG signals recorded while performing the task, using the multilayer perceptron (MLP) structure of artificial neural networks (ANN). ANNs are known as flexible classifiers and, with careful selection of architecture, can be a good choice for classifying noisy and nonstationary data such as EEGs [45]. We used an MLP structure for the ANN, which can handle complex and nonlinear input-output relationships through its hidden layers. A four-layer ANN with two hidden layers was adopted here based on empirical results: the first hidden layer included 20 neurons, the second hidden layer 14 neurons, and the output layer 7 neurons, one for each load level.

IV. EXPERIMENTAL RESULTS

A. Spectral Analysis
The results of the analysis are shown in Fig. 1, where, for each level of task difficulty, the median of the SEFs of the signals recorded from a particular channel is plotted. This shows that most of the power of the signals recorded from the frontal channels (frontal left: Fp1, AF3, F7, F3, FC1, FC5; frontal right: FC6, FC2, F4, F8, AF4, Fp2) lies in the lower EEG frequency band, corresponding to the delta band. For the occipital lobe channels (PO3, O1, Oz, O2, PO4), there is a significant power spread up to 12 Hz, which includes the delta, theta and alpha frequency bands. The left hemisphere (temporal, central and parietal channels, i.e., T7, C3, CP1, CP5, P7, P3, Pz) displays significant frequency components up to 8 Hz, which include the delta and theta frequency bands. The right hemisphere (temporal, central and parietal channels, i.e., P4, P8, CP6, CP2, C4, T8) includes significant frequency components in the delta, theta and alpha frequency bands.
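The octave-band decomposition described in Section III-C can be sketched in plain Python. This is a minimal sketch, not the authors' Matlab implementation: it assumes periodic boundary extension and uses the 4-tap Daubechies filter (often labelled D4, or "db2" in MATLAB/PyWavelets naming). If "Daubechies-4" in the text means MATLAB's 8-tap "db4", only the filter list would change. The names `dwt_step` and `multilevel_dwt` are hypothetical.

```python
import math

# 4-tap Daubechies low-pass (scaling) filter, often labelled D4 or "db2".
# Assumption: the paper's "Daubechies-4" could instead mean MATLAB's 8-tap "db4".
_R3 = math.sqrt(3.0)
_SCALE = 4.0 * math.sqrt(2.0)
LOW_PASS = [(1 + _R3) / _SCALE, (3 + _R3) / _SCALE,
            (3 - _R3) / _SCALE, (1 - _R3) / _SCALE]
# Quadrature-mirror high-pass (wavelet) filter: g[m] = (-1)^m * h[L - 1 - m].
HIGH_PASS = [(-1) ** m * LOW_PASS[len(LOW_PASS) - 1 - m] for m in range(len(LOW_PASS))]


def dwt_step(signal):
    """One analysis level: filter, downsample by 2, return (approx, detail).

    Periodic extension is assumed at the segment boundaries.
    """
    n = len(signal)
    if n % 2 != 0 or n < len(LOW_PASS):
        raise ValueError("segment length must be even and >= filter length")
    approx, detail = [], []
    for k in range(n // 2):
        a = d = 0.0
        for m, (h, g) in enumerate(zip(LOW_PASS, HIGH_PASS)):
            sample = signal[(2 * k + m) % n]
            a += h * sample
            d += g * sample
        approx.append(a)
        detail.append(d)
    return approx, detail


def multilevel_dwt(signal, levels=3):
    """Return (a_J, [d_1, ..., d_J]) for a J-level octave decomposition."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)
    return approx, details
```

With segments sampled at rate f_s, level-j detail coefficients roughly cover f_s/2^(j+1) to f_s/2^j Hz, and the final approximation covers 0 to f_s/2^(J+1) Hz. For instance, if the segments were resampled to 64 Hz (an assumed value, since the rate is not given above), the third-level approximation would span roughly 0–4 Hz, i.e., the delta band.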
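The three approximate-coefficient features of Section III-D can be computed from one segment's coefficients as below. This is a sketch under stated assumptions: the entropy uses energy-normalized probabilities as in (1) with the natural logarithm (the base only rescales the value), and the standard deviation is the population form; the paper's exact normalization may differ. All function names are hypothetical.

```python
import math
import statistics


def wavelet_entropy(coeffs):
    """Shannon entropy of the energy-normalized coefficients, as in (1).

    p_k = c_k^2 / sum(c^2); the result is 0 when one coefficient carries all
    the energy and log(N) when all coefficients have equal magnitude.
    """
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return 0.0  # convention chosen here for an all-zero segment
    entropy = 0.0
    for c in coeffs:
        p = c * c / total
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy


def wavelet_energy(coeffs):
    """Sum of squared coefficients, as in (2)."""
    return sum(c * c for c in coeffs)


def wavelet_std(coeffs):
    """Population standard deviation of the coefficients (divides by N)."""
    return statistics.pstdev(coeffs)


def approx_features(coeffs):
    """(entropy, energy, std) for one segment's approximate coefficients."""
    return wavelet_entropy(coeffs), wavelet_energy(coeffs), wavelet_std(coeffs)
```

Applied per channel to each of the 13 segments, this would yield 13 feature triples per channel and load level, which is the kind of input the statistical tests in Section III-E operate on.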
ZARJAM et al.: BEYOND SUBJECTIVE SELF-RATING: EEG SIGNAL CLASSIFICATION OF COGNITIVE WORKLOAD 305
Fig. 1. Median of the extracted spectral edge frequency from the segmented EEG for 32 EEG channels across twelve subjects for different load levels imposed. The lowest task load is denoted by L1 and the highest by L7. This shows the frequency range within which the signals' power is concentrated, across different regions of the brain (frontal, occipital, left/right hemisphere), for different load levels. Based on this, the various EEG channels were studied in their corresponding frequency ranges (see Table IV).

Fig. 2. Medians of … extracted from segmented EEG data in the delta band, from the frontal lobe of subject 1. In many frontal EEG channels, the median decreases as workload increases.

TABLE III
CHANNELS THAT SHOWED THE SMALLEST p-VALUES CALCULATED BY KRUSKAL–WALLIS TEST, ACROSS ALL SUBJECTS

TABLE IV
EEG CHANNELS, BANDS AND FEATURES FOR WHICH A CONSISTENT DECREASING TREND WAS OBSERVED ACROSS ALL TWELVE SUBJECTS. BASED ON THESE RESULTS, THE MOST SUITABLE CHANNELS ARE SELECTED

Fig. 3. Boxplot of … extracted from the segmented EEG data for channel F3 for subject 1 in the delta band. On each box, the red mark is the median; the edges of the box are the 25th and the 75th percentiles. The very low level is denoted by L1 and the extremely difficult level by L7.

Fig. 4. Averaged reaction time versus task difficulty across twelve subjects. On each box, the red mark is the mean; the edges of the box are the 25th and the 75th percentiles [31].
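Section III-B defines the SEF as the frequency below which 95% of the signal power lies, and Fig. 1 plots its median per channel and load level. The sketch below assumes a plain one-sided periodogram of a single segment, since the PSD estimator used in the paper is not stated here; the per-channel median across segments, as in Fig. 1, would then be `statistics.median` over the per-segment values. The function names are hypothetical.

```python
import cmath
import math


def one_sided_power(signal, fs):
    """Periodogram power per DFT bin (direct O(N^2) DFT, fine for a sketch)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        p = abs(s) ** 2
        if 0 < k < n - k:  # fold the matching negative-frequency bin onto this one
            p *= 2.0
        freqs.append(k * fs / n)
        power.append(p)
    return freqs, power


def spectral_edge_frequency(signal, fs, fraction=0.95):
    """Lowest bin frequency at which cumulative power reaches `fraction` of the total."""
    freqs, power = one_sided_power(signal, fs)
    target = fraction * sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= target:
            return f
    return freqs[-1]
```

For a 1 s segment at 128 Hz containing a 4 Hz tone plus a tone one tenth its amplitude at 20 Hz, about 99% of the power sits at 4 Hz, so the 95% SEF is 4 Hz, while raising `fraction` to 0.999 moves it to 20 Hz.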
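Table III reports the channels with the smallest Kruskal–Wallis p-values. As a sketch of what that test computes (not the authors' code), the H statistic with the usual tie correction can be written with the standard library alone; the chi-square tail probability uses a closed form that is valid only for even degrees of freedom, which covers the seven-level case (df = 6). In practice `scipy.stats.kruskal` would normally be used instead. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def kruskal_wallis(groups):
    """Kruskal-Wallis H (tie-corrected) and its chi-square p-value with df = k - 1."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1.0 - ties / (n ** 3 - n)
    return h, chi2_sf_even_df(h, len(groups) - 1)
```

Here `groups` would hold one feature's segment values for each of the seven load levels, for one channel and one participant.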
TABLE V
ACCURACY OF 7-CLASS CLASSIFICATION FOR THE EXTRACTED FEATURES BY ANN USING LEAVE-ONE-OUT TECHNIQUE, ACROSS TWELVE SUBJECTS, COMPARED TO ACCURACY OF 7-CLASS CLASSIFICATION FOR SUBJECTIVE SELF-RATING

TABLE VI
RESPONSE ACCURACY (TASK PERFORMANCE) PERCENTAGE VERSUS TASK DIFFICULTY AVERAGED OVER TWELVE SUBJECTS AND FRIEDMAN'S TEST RESULTS

TABLE VII
CONFUSION MATRIX FOR SELF-RATINGS RESULTS ACROSS TWELVE SUBJECTS. EACH LOAD LEVEL (TASK LOAD) REPRESENTS 72 ADDITION TASKS
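The Friedman test used for the EEG features and for the self-ratings (Table VI, Section III-E) ranks each subject's row across the seven load levels and compares the column rank sums. The sketch below omits the correction for tied values and again uses the even-df chi-square tail (df = k − 1 = 6 here); it is an illustration, not the authors' implementation, and `scipy.stats.friedmanchisquare` would be the usual library route. The function names are hypothetical.

```python
import math


def row_ranks(row):
    """1-based ranks within one subject's row; tied values share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def friedman_from_mean_ranks(mean_ranks, n_subjects):
    """Friedman Q from column mean ranks (no tie correction) and its p-value."""
    k = len(mean_ranks)
    centre = (k + 1) / 2.0
    q = 12.0 * n_subjects / (k * (k + 1)) * sum((r - centre) ** 2 for r in mean_ranks)
    return q, chi2_sf_even_df(q, k - 1)


def friedman(table):
    """table[i][j]: subject i (row), load level j (column). Returns (Q, p)."""
    n, k = len(table), len(table[0])
    sums = [0.0] * k
    for row in table:
        for j, r in enumerate(row_ranks(row)):
            sums[j] += r
    return friedman_from_mean_ranks([s / n for s in sums], n)
```

Given the mean ranks reported in Section IV-E for the self-ratings (4.62, 4.91, 5.04, 3.12, 3.70, 2.87, 3.70 over twelve subjects), `friedman_from_mean_ranks` gives Q ≈ 11.6 and p ≈ 0.07. Because it ignores ties among the ratings and starts from rounded ranks, this only approximates the authors' test, but it agrees that the ranking is not significant at the 0.01 level.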
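Table V reports 7-class accuracy obtained with a leave-one-out technique across twelve subjects. Assuming this means holding out all segments of one subject per fold (the usual reading of subject-independent evaluation, and the same grouping that scikit-learn's `LeaveOneGroupOut` implements), the bookkeeping can be sketched as below. The classifier is a hypothetical `fit_predict` hook; the majority-class stand-in only exercises the loop and is not the paper's ANN.

```python
from collections import Counter


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def subject_independent_accuracy(features, labels, subject_ids, fit_predict):
    """Accuracy pooled over all held-out samples.

    `fit_predict(train_x, train_y, test_x) -> list of predicted labels` is a
    hypothetical hook; an MLP with 20 and 14 hidden units and 7 outputs, as in
    Section III-G, would be trained and applied inside it.
    """
    correct = total = 0
    for _, train, test in leave_one_subject_out(subject_ids):
        predictions = fit_predict([features[i] for i in train],
                                  [labels[i] for i in train],
                                  [features[i] for i in test])
        correct += sum(pred == labels[i] for pred, i in zip(predictions, test))
        total += len(test)
    return correct / total


def majority_class(train_x, train_y, test_x):
    """Trivial stand-in classifier: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)
```

With twelve subjects and seven balanced load levels, the majority-class stand-in scores 1/7 (about 14%), which is the chance baseline against which 7-class accuracies such as those in Table V can be read.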
Fig. 7. Medians of the entropy of the approximate coefficients extracted from the segmented EEG data in the delta band in the retest session, from the frontal lobe of the same subject shown in Fig. 3. In most of the frontal EEG channels, the decreasing trend with increasing task load is preserved.

areas show the load levels correctly rated by the subjects. As seen, the highest numbers of correct ratings belong to L1 to L3, indicating that subjects were mostly able to rate the lower load levels correctly. On the other hand, adjacent intermediate levels were often confused; for instance, level 2 was most often confused with level 1, and level 4 with level 3. Apparently, subjects tended to under-rate the task difficulty experienced in most cases. The results demonstrate that the self-ratings do not present a consistent trend corresponding to the task load imposed. Our subjects thus seemed not very capable of rating their experienced mental burden, even though they were advised prior to the task that there were seven load levels to rate, presented in increasing order. We also applied Friedman's test to the collected subjective ratings. Here, we considered the designed task loads as columns and each subject's subjective ratings as a row, for twelve subjects. This ranked the reported load levels for all the subjects, giving mean ranks of 4.62, 4.91, 5.04, 3.12, 3.70, 2.87, and 3.70 for task difficulty levels 1 to 7, respectively, with p = …. Clearly, the ranking is not consistent and is not significant at the 0.01 level. The accuracy of 7-class classification for this self-rating is shown in Table V.

F. Re-Test Analysis
We analyzed the recorded data in two ways in order to assess the test/re-test reliability of the feature distributions. First, a one-way repeated-measures ANOVA was used to examine any differences across the two sessions of test and retest [32]. This test can be used for both parametric and nonparametric data, and when a single group has been measured a few times (test/re-test in our case) [43]. Second, we checked the trends shown by the features extracted from each channel with increasing task load, as discussed in Section IV-C.

Using ANOVA, there were no significant effects between sessions for the recorded EEGs across two corresponding load levels in test and retest (for the same channels and subjects). For instance, for the feature extracted from Ch4 (F3), we found … between L1 from the test and L1 from the retest session, and … between L7 from the test and L7 from the retest session.

We also repeated the procedure conducted in Section IV-C for the retest EEG signals. For illustration purposes, Fig. 7 shows

V. DISCUSSION
We evaluated the usefulness of the proposed multiresolution feature set and the sensitivity of the EEG signals in discriminating cognitive load across seven fine-grained levels: first, using two statistical approaches (Kruskal–Wallis and Friedman tests) in Section IV-B, and second, using classification accuracy across twelve subjects in Section IV-D. The results showed that high task load discrimination was mainly achieved in the frontal lobe of the brain. This is supported by related findings emphasizing that the frontal lobe is necessary for maintaining and carrying out calculation tasks, due to its tight association with attention and working memory [31], [48], [49].

The decrease of the extracted entropy and energy of the approximate coefficients with increasing task difficulty could be interpreted as a narrower distribution of the entropy or energy of the coefficients in the frequency band under study. This may indicate that the brain behaves in a more certain and focused manner when performing more difficult tasks [47]. Our study not only affirms the literature but also adds that, as task difficulty increases, the frontal lobe is affected more deeply. This might be interpreted as the subjects paying more attention or becoming more focused while tackling more difficult tasks.

The feature set used herein is based on wavelet coefficients, which have previously shown their capability for EEG signal classification in pathological diagnoses [50], [51], but had not been used in the area of cognitive load or mental task classification prior to this study. Only a few related studies have deployed wavelet coefficients: to remove artifacts from EEG data [26] and to discriminate three load levels on the basis of the appearance time and total power extracted from the EEG signals [27]. This feature set extends our current knowledge of successful wavelet features for an EEG-based load classification system [31] and advances the existing knowledge of EEG-based features in general.

In this research, we put an emphasis on the delta band due to its role in the brain region of interest, i.e., the frontal lobe, and its close relationship with working memory [48], [49], [52]. This frequency band has been investigated in only a couple of previous studies, which showed that with increasing task demand, both subjects' attention to the task and delta band activity increase [53], [54]. This frequency band has attracted less attention, perhaps due to concerns of artifact contamination [54]. We point out that our task was conducted under controlled conditions, and that we eliminated the DC level and artifact-contaminated segments and studied only artifact-free segments [31]. However, in informal experiments we have previously extracted features like those in this paper from artifact-affected data, and not observed any degradation in classification accuracy.

As shown by Table VI, the study of subjects' performance indicated that as the load level increased, the response accuracy consistently declined. This was confirmed statistically by
a low p-value in Section IV-E. Table VII also showed that the subjects were not capable of accurately rating their experienced mental burden (self-ratings). This is despite the fact that, prior to the experiment, subjects were instructed that the task would be presented in seven levels of increasing difficulty. Their self-ratings were not statistically significant by Friedman test, which confirms doubts expressed previously about the reliability of self-ratings as an indicator of the mental burden imposed [55], [56]. Self-rating is nonetheless well established as one of the subjective assessment methods for cognitive load [4], [57]. This reduced sensitivity of self-ratings to workload variations might be interpreted in terms of cases where the task demand lies in either the low load (underload) or the high load (overload) range, in which there is a trade-off between the amount of invested resources and the time-sharing requirements imposed on working memory by the task demand [55], [56]. This trade-off may cause a dissociation between the subjects' performance and self-ratings [55], [56].

Our results therefore demonstrate that cognitive load assessment by EEG can outperform self-ratings, which is an interesting result for empirical cognitive load research in general. EEG can therefore be tentatively proposed as a more reliable and sensitive method to measure cognitive load, even in cases of overload or underload.

To compare our classification results and computational load with previous EEG-based studies, we replicated the classification methodology used in [3], which reported 99% classification accuracy for two load levels and 88% for four load levels (for eight subjects). Using the same feature set (power), channels, window length and band, we achieved 83% classification accuracy for seven load levels in the delta band on our database. Using the proposed wavelet features, a classification accuracy of 98% for seven load levels across twelve subjects was achieved, similar to the results based on our wavelet-entropy features in [31]. However, the two new features here, … and …, gave a slight classification performance improvement over the … feature, which is the best performing entropy-based feature in [31]. In addition, the … feature has a shorter running time and can be calculated faster than …, as follows. In terms of computational load, we compared the running times of the two algorithms implemented in Matlab (R2010b) on a PC (Intel Core 2 Duo E8400 CPU at 3 GHz). The results show running times of 66.46 and 46.12 s for 65 s of data (for the selected channels, across twelve subjects) with the proposed … and … features, compared to 76.48 s for the … feature proposed in [3]. This indicates that our proposed system also outperforms the existing system of [3] in running time and can be used for near real-time processing of EEG signals for memory load classification.

As for other physiological measures, average classification accuracies of 57% from pupil diameter and 53% from GSR measures were obtained for two load levels (across twenty subjects) in [17]. We also evaluated the EEG signal stability over time for two subjects and showed that the signals are reproducible not only after six months but also for high load levels, which is again a promising preliminary result. Other studies to date have already indicated that cognitive EEG is reproducible over intervals from one day to 18 weeks during a resting state or low cognitive load tasks [58].

We acknowledge a limitation in the above experimental work: the order of task presentation for the different task difficulty levels was not randomized for the test session. However, we repeated the recordings with randomized order at a later stage for two subjects (recorded during the retest). The extracted features in most of the channels of interest showed the same decreasing trend and returned similar statistical results. For instance, for the entropy of the coefficients extracted from Ch4 (F3) between L1 (L7) from the test session and L1 (L7) from the randomized-order session, we calculated … using an ANOVA test, consistent with the two groups being drawn from identical populations across the two recording sessions. Therefore, we do not expect that the consistent trends are due to the nonrandomized order of stimulus presentation. It has been suggested previously that randomizing the order of task presentation may reduce the response order effect but does not remove it entirely [59].

We note the other limitations of this experiment, including the controlled conditions subjects were placed under while performing the task (e.g., sitting still and refraining from excessive body movement and blinking). The setup was also lengthy and may have induced some physical fatigue in the subjects prior to the start of the experiment; however, we tried to minimize the fatigue imposed on subjects during the task recording by keeping the tasks as short as possible and giving short breaks after each task level. We also acknowledge that this method should be validated on a larger database with more subjects before it can be generalized as a subject-independent technique. Even though the subjects were chosen to have a similar education level, there may be some individual differences, such as better calculation skills or motivation to complete the task, that may have affected the performance or self-ratings.

VI. CONCLUSION

This paper demonstrates the promising capability of multiresolution features extracted from EEG signals in discriminating close levels of cognitive load imposed in an arithmetic task. This discrimination was also demonstrated statistically by means of Kruskal–Wallis and Friedman tests, for the selected EEG channels, across twelve subjects, whereas the Friedman ranking was neither consistent nor significant for subjective self-rating. Overall, the results in Section IV suggest EEG as a highly promising method for measuring cognitive load, showing an improvement over subjective self-ratings in terms of statistical significance and classification accuracy.

The proposed cognitive load classification methodology, which uses the entropy, energy and standard deviation of the wavelet coefficients, outperforms self-ratings and existing spectral-based features and achieves a very high detection accuracy of 98% in discriminating seven load levels across the twelve subjects studied (compared to 31% classification accuracy for self-rating). In an informal experiment, we extracted these features from artifact-affected data and did not observe any degradation in classification accuracy. This strongly suggests that the cognitive workload approach described herein is applicable in near real-time, without any manual artifact removal. The reliability and reproducibility of
the reported results were also confirmed through the retest after a six-month interval.

Future work includes investigating other features for the given task load discrimination, collecting a bigger database, and investigating the applicability of the proposed features to other cognitive tasks. There is also scope for introducing further signal preprocessing, towards improving the classification rate, particularly for less controlled measurement environments.

REFERENCES

[1] O. T. Zander and C. Kothe, “Towards passive brain–computer interfaces: Applying brain–computer interface technology to human–machine systems in general,” J. Neural Eng., vol. 8, no. 2, pp. 1–5, 2011.
[2] P. Antonenko, F. Paas, R. Grabner, and T. van Gog, “Using electroencephalography to measure cognitive load,” Edu. Psychol. Rev., vol. 22, no. 4, pp. 425–438, 2010.
[3] D. Grimes, D. S. Tan, S. E. Hudson, P. Shenoy, and R. Rao, “Feasibility
[20] G. Pfurtscheller and F. H. Lopes da Silva, “Event-related EEG/MEG synchronization and desynchronization: Basic principles,” Clinical Neurophysiol., vol. 110, no. 11, pp. 1842–1857, 1999.
[21] W. Klimesch, “EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis,” Brain Res. Rev., vol. 29, no. 2–3, pp. 169–195, 1999.
[22] C. Neuper, R. H. Grabner, A. Fink, and A. C. Neubauer, “Long-term stability and consistency of EEG event-related (de-)synchronization across different cognitive tasks,” Clinical Neurophysiol., vol. 116, no. 7, pp. 1681–1694, 2005.
[23] W. Klimesch, “Memory processes, brain oscillations and EEG synchronization,” Int. J. Psychophysiol., vol. 24, no. 1–2, pp. 61–100, 1996.
[24] A. Gevins, M. E. Smith, H. Leong, L. McEvoy, S. Whitfield, R. Du, and G. Rush, “Monitoring working memory load during computer-based tasks with EEG pattern recognition methods,” Human Factors, vol. 40, no. 1, pp. 79–91, 1998.
[25] A. Stipacek, R. H. Grabner, C. Neuper, A. Fink, and A. C. Neubauer, “Sensitivity of human EEG alpha band desynchronization to different working memory components and increasing levels of memory load,” Neurosci. Lett., vol. 353, no. 3, pp. 193–196, 2003.
[26] C. Berka and D. J. Levendowski et al., “EEG correlates of task en-
and pragmatics of classifying working memory load with an electroen- gagement and mental workload in vigilance, learning, and memory
cephalograph,” in Proc. 26th SIGCHI Conf., 2008, pp. 835–844. tasks,” Aviation, Space, Environ. Med., vol. 78, no. Supplement 1, pp.
[4] F. Paas, J. E. Tuovinen, H. Tabbers, and P. W. M. Van Gerven, “Cog- B231–B244, 2007.
nitive load measurement as a means to advance cognitive load theory,” [27] A. Murata, “An attempt to evaluate mental workload using wavelet
Edu. Psychol., vol. 38, no. 1, pp. 63–71, 2003. transform of EEG,” Human Factors: J. Human Factors Ergon. Soc.,
[5] A. Baddeley, “Recent developments in working memory,” Current vol. 47, no. 3, pp. 498–508, 2005.
Opinion Neurobiol., vol. 8, no. 2, pp. 234–238, 1998. [28] O. Jensen and C. D. Tesche, “Frontal theta activity in humans increases
[6] P. Sauseng, W. Klimesch, M. Schabus, and M. Dopplmayr, “Fronto- with memory load in a working memory task,” Eur. J. Neurosci., vol.
parietal EEG coherence in theta and upper alpha reflect central execu- 15, no. 8, pp. 1395–1399, 2002.
tive functions of working memory,” Int. J. Psychophysiol., vol. 57, no. [29] A. Bashashati, M. Fatourechi, K. R. Ward, and G. E. Birch, “A survey
2, pp. 97–103, 2005. of signal processing algorithms in brain–computer interfaces based on
[7] W. Klimesch, M. Doppelmayr, and S. Dopplmayr, “Upper alpha ERD electrical brain signals,” J. Neural Eng., vol. 4, pp. R32–R57, 2007.
and absolute power: Their meaning for memory performance,” in [30] H. Nai-Jen and R. Palaniappan, “Classification of mental tasks using
Progress in Brain Research, N. Christa and K. Wolfgang, Eds. Am- fixed and adaptive autoregressive models of EEG signals,” in Proc. 2nd
sterdam, The Netherlands: Elsevier, 2006, pp. 151–165. Neural EMBS Conf., 2005, pp. 633–636.
[8] F. G. W. C. Paas, J. J. G. Van Merrienboer, and J. J. Adam, “Measur- [31] P. Zarjam, J. Epps, F. Chen, and N. H. Lovell, “Estimating cognitive
ment of cognitive load in instructional research,” Percept. Motor Skills, workload using wavelet entropy-based features during an arithmetic
vol. 79, no. 1, pp. 419–430, 1994. task,” Comput. Biol. Med., vol. 43, no. 12, pp. 2186–2195, 2013.
[9] N. Meshkati, P. A. Hancock, M. Rahimi, and S. M. Dawes, “Tech- [32] L. K. Mc-Evoy, M. E. Smith, and A. Gevins, “Test–retest reliability of
niques in mental workload assessment,” in Evaluation of Human Work: cognitive EEG,” Clinical Neurophysiol., vol. 111, no. 3, pp. 457–463,
A Practical Ergonomics Methodology, J. R. W. E. N. Corlett, Ed., 2nd 2000.
Ed. ed. London, U.K.: Taylor & Francis, 1995, pp. 749–782. [33] A. FÜrst and G. Hitch, “Separate roles for executive and phonological
[10] G. F. Wilson and F. T. Eggemeier, “Psychophysiological assessment components of working memory in mental arithmetic,” Memory Cogn.,
of workload in multi-task environments: Multiple-task performance,” vol. 28, no. 5, pp. 774–782, 2000.
Multiple-task performance book, books.google.com, pp. 329–360, [34] M. Mc-Closkey, “Cognitive mechanisms in numerical processing: Ev-
1991. idence from acquired dyscalculia,” Cognition, vol. 44, no. 1–2, pp.
[11] B. Yin and F. Chen, “Towards automatic cognitive load measurement 107–157, 1992.
from speech analysis,” Lecture Notes Comput. Sci., pp. 1011–1020, [35] S. Luck, “Setting up an ERP Lab,” in An Introduction to the Event-
2007. Related Potential Technique (Cognitive Neuroscience). Cambridge,
[12] N. Ruiz, R. Taib, Y. Shi, E. Choi, and F. Chen, “Using pen input fea- MA, USA: MIT Press, 2005, pp. 305–340.
tures as indices of cognitive load,” in Proc. 9th Int. ACM Conf., 2007, [36] A. Subasi, “Automatic recognition of alertness level from EEG by
pp. 315–318. using neural network and wavelet coefficients,” Expert Syst. Appl., vol.
[13] F. Chen and Ruiz et al., “Multimodal behaviour and interaction as in- 28, no. 4, pp. 701–711, 2005.
dicators of cognitive load,” Trans. Intell. Interact. Syst., vol. 2, no. 4, [37] O. A. Rosso, S. Blanco, J. Yordanova, V. Kolev, A. Figliola, M. Schür-
2012, Article 22. mann, and E. Başar, “Wavelet entropy: A new tool for analysis of short
[14] F. Paas and J. Van Merriënboer, “Instructional control of cognitive load duration brain electrical signals,” J. Neurosci. Methods, vol. 105, no.
in the training of complex cognitive tasks,” Edu. Psychol. Rev., vol. 6, 1, pp. 65–75, 2001.
no. 4, pp. 351–371, 1994. [38] I. Guler and E. D. Ubeyli, “Multiclass support vector machines for
[15] P. W. M. Van Gerven, F. Paas, J. J. G. Van Merriënboer, and H. G. EEG-signals classification,” IEEE Trans. Inf. Technol. Biomed., vol.
Schmidt, “Memory load and the cognitive pupillary response in aging,” 11, no. 2, pp. 117–126, Nov. 2007.
Psychophysiology, vol. 41, no. 2, pp. 167–174, 2004. [39] M. Akay, Time Frequency and Wavelets in Biomedical Signal Pro-
[16] Y. Shi, N. Ruiz, R. Taib, E. Choi, and F. Chen, “Galvanic skin response cessing. New York, NY, USA: Wiley-IEEE Press, 1998.
(GSR) as an index of cognitive load,” in Proc. CHI ’ 2007 Conf., 2007, [40] P. Zarjam, J. Epps, and F. Chen, “Characterizing working memory load
pp. 2651–2656. using EEG delta activity,” in Proc. 19th EUSIPCO Conf., 2011, pp.
[17] E. Haapalainen, S. Kim, J. F. Forlizzi, and A. K. Dey, “Psycho-phys- 1554–1558.
iological measures for assessing cognitive load,” in Proc. 12th ACM [41] P. Zarjam, J. Epps, and F. Chen, “Spectral EEG features for evaluating
Conf. , 2010, pp. 301–310. cognitive load,” in Proc. 33rd EMBS Conf., 2011, pp. 3841–3844.
[18] C. Berka and D. J. Levendowski et al., “Real-time analysis of EEG [42] P. Zarjam, J. Epps, and F. Chen, “Evaluation of working memory load
indexes of alertness, cognition, and memory acquired with a wireless using EEG signals,” in Proc. 2nd APSIPA Conf., 2011, pp. 715–719.
EEG headset,” Int. J. Human-Comput. Interact., vol. 17, no. 2, pp. [43] D. C. Howell, Statistical Methods for Psychology. Belmont, CA,
151–170, 2004. USA: Thomson Higher Education, 2007, pp. 659–661.
[19] A. Gevins, M. E. Smith, L. McEvoy, and D. Yu, “High-resolution EEG [44] A. Fong, C. Sibley, A. Cole, C. Baldwin, and J. Coyne, “A compar-
mapping of cortical activation related to working memory: Effects of ison of artificial neural networks, logistic regressions, and classification
task difficulty, type of processing, and practice,” Cereb. Cortex, vol. 7, trees for modeling mental workload in real-time,” in Proc. Human Fac-
no. 4, pp. 374–385, 1997. tors Ergon. Soc. Annu. Meeting, 2010, vol. 54, no. 19, pp. 1709–1712.
310 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 7, NO. 4, DECEMBER 2015
[45] F. Lotte, M. Congedo, A. Lécuyer, and B. Arnaldi, “A review of classification algorithms for EEG-based brain–computer interfaces,” J. Neural Eng., vol. 4, pp. R1–R13, 2007.
[46] L. Pezard, J. Martinerie, F. Breton, J. C. Bourzeix, and B. Renault, “Non-linear forecasting measurements of multichannel EEG dynamics,” Electroencephalography Clinical Neurophysiol., vol. 91, no. 5, pp. 383–391, 1994.
[47] P. Zarjam, J. Epps, N. H. Lovell, and F. Chen, “Characterization of memory load in an arithmetic task using non-linear analysis of EEG signals,” in Proc. 34th EMBS Conf., 2012, pp. 3519–3522.
[48] K. Sasaki, T. Tsujimoto, A. Nambu, R. Matsuzaki, and S. Kyuhou, “Dynamic activities of the frontal association cortex in calculating and thinking,” Neurosci. Res., vol. 19, no. 2, pp. 229–233, 1994.
[49] H. Kondo, M. Morishita, N. Osaka, M. Osaka, H. Fukuyama, and H. Shibasaki, “Functional roles of the cingulo-frontal network in performance on working memory,” NeuroImage, vol. 21, no. 1, pp. 2–14, 2004.
[50] N. Hazarika, J. Z. Chen, A. Tsoi, and A. Sergejew, “Classification of EEG signals using the wavelet transform,” Signal Process., vol. 59, no. 1, pp. 61–72, 1997.
[51] E. D. Übeyli, “Combined neural network model employing wavelet coefficients for EEG signals classification,” Dig. Signal Process., vol. 19, no. 2, pp. 297–308, 2009.
[52] J. Onton, A. Delorme, and S. Makeig, “Frontal midline EEG dynamics during working memory,” NeuroImage, vol. 27, no. 2, pp. 341–356, 2005.
[53] T. Harmony et al., “EEG delta activity: An indicator of attention to internal processing during performance of mental tasks,” Int. J. Psychophysiol., vol. 24, no. 1–2, pp. 161–171, 1996.
[54] T. Fernández et al., “EEG activation patterns during the performance of tasks involving different components of mental calculation,” Electroencephalography Clinical Neurophysiol., vol. 94, no. 3, pp. 175–182, 1995.
[55] M. A. Vidulich and C. D. Wickens, “Causes of dissociation between subjective workload measures and performance: Caveats for the use of subjective assessments,” Appl. Ergon., vol. 17, no. 4, pp. 291–296, 1986.
[56] Y.-Y. Yeh and C. D. Wickens, “Dissociation of performance and subjective measures of workload,” Human Factors: J. Human Factors Ergon. Soc., vol. 30, no. 1, pp. 111–120, 1988.
[57] E. N. Wiebe, E. Roberts, and T. S. Behrend, “An examination of two mental workload measurement approaches to understanding multimedia learning,” Comput. Human Behav., vol. 26, no. 3, pp. 474–481, 2010.
[58] L. M. Williams, E. Simms, C. R. Clark, R. H. Paul, D. Rowe, and E. Gordon, “The test-retest reliability of a standardized neurocognitive and neurophysiological test battery: ‘Neuromarker’,” Int. J. Neurosci., vol. 115, no. 12, pp. 1605–1630, 2005.
[59] J. A. Krosnick and D. F. Alwin, “An evaluation of a cognitive theory of response-order effects in survey measurement,” Public Opinion Quart., vol. 51, no. 2, pp. 201–219, 1987.

Pega Zarjam received the master’s degree from Queensland University of Technology, Brisbane, Australia, and the Ph.D. degree in biomedical signal processing from the University of New South Wales, Sydney, Australia.
Her research interests include biomedical/physiological signal processing, time-scale/time-frequency signal processing, and pattern recognition.

Julien Epps (M’97) received the B.E. and Ph.D. degrees from the University of New South Wales, Sydney, Australia, in 1997 and 2001, respectively. After an appointment as a Postdoctoral Fellow at the University of New South Wales, he worked on speech recognition and speech processing research, first as a Senior Research Engineer at Motorola Labs and then as a Senior Researcher at National ICT Australia. He joined the UNSW School of Electrical Engineering and Telecommunications as a Senior Lecturer in 2007 and is currently an Associate Professor. His research interests include characterization, modeling, and classification of mental state from behavioral and physiological signals. He has authored or coauthored around 180 publications.
Dr. Epps is currently serving as an Associate Editor for the IEEE TRANSACTIONS ON AFFECTIVE COMPUTING. He also regularly serves as a reviewer or Technical Program Committee member for several IEEE journals and numerous conferences.

Nigel Lovell (M’91–SM’99–F’11) received the B.E. degree (Hons.) in electrical engineering in 1983 and the Ph.D. degree in biomedical engineering in 1991, both from the University of New South Wales, Sydney, Australia.
He is currently with the Graduate School of Biomedical Engineering, University of New South Wales (UNSW), Sydney, Australia, where he holds the position of Scientia Professor. He has authored more than 200 refereed journal papers and more than 270 conference proceedings papers, and has been awarded over $79 million in R&D and infrastructure funding. His research has covered areas ranging from cardiac modeling, telehealth technologies, and biological signal processing to visual prosthesis design. Through a spin-out company from UNSW, he has commercialized a range of telehealth technologies for managing chronic disease and falls in the older population. He is also one of the key researchers leading an R&D program to develop an Australian bionic eye.
Dr. Lovell is a Fellow of five learned societies throughout the world: ATSE, Engineers Australia, IEEE, FIP, and AIMBE.