On the analysis and classification of sleep stages from cardiorespiratory activity
Citation for published version (APA):
Long, X. (2015). On the analysis and classification of sleep stages from cardiorespiratory activity.
Eindhoven: Technische Universiteit Eindhoven.
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
DISSERTATION
by
Xi Long
On the Analysis and Classification of Sleep Stages from Cardiorespiratory Activity / by Xi
Long – Eindhoven : Eindhoven University of Technology, 2015.
A catalogue record is available from the Eindhoven University of Technology Library.
Dissertation. – ISBN: 978-90-386-3850-8.
The research presented in this thesis was supported by Philips Group Innovation – Research,
Eindhoven, The Netherlands.
Summary
Sleep is a state of reversible disconnection from the environment and plays an essential role
in maintaining internal homeostasis, memory consolidation, energy conservation, and cognitive
and behavioral performance. Sleep problems are now widely prevalent around the world, and sleep
complaints are increasing. Historically, such problems were less common because the regulation
of sleep was synchronized with the external environment through a biological circadian rhythm.
However, in a modern industrialized society with artificial environments where lighting, heat,
and food are available at any moment, sleep disturbances and disorders have reached epidemic
levels. People experience symptoms of disturbed sleep such as fatigue, increased impulsiveness,
and agitation without being aware of the link between these issues and their sleeping patterns.
To stay healthy in body and mind, people should be able to monitor their sleep easily and without
disturbing it, to assess sleep quality or sleep-related problems, and to adjust their sleep habits
accordingly. However, the traditional sleep monitoring method, polysomnography (PSG), is usually
performed in a sleep laboratory with costly facilities and requires many sleep-disturbing devices
with electrodes and wires to be attached to the body. Furthermore, the measurements of such devices
can only be interpreted by highly trained sleep clinicians. Therefore, although PSG is currently
considered the gold standard and common practice for sleep monitoring, it is unsuitable for daily
use at home by people without specialized training, and it introduces undesired sleep disturbances.
This has motivated the investigation of alternative sensors and methods that allow sleep to be
monitored unobtrusively, preferably at low cost and without the need for training.
Objective sleep assessment is often based on monitoring sleep stages throughout the night.
In the past decades, cardiorespiratory signals have attracted increasing attention in the
context of sleep staging, or sleep stage classification. Cardiorespiratory activity has been shown
to be associated with sleep stages through the regulation of the autonomic nervous system. More
importantly, cardiorespiratory signals can be acquired unobtrusively using advanced technologies
such as microwave Doppler radar, ballistocardiography, photoplethysmography, pressure-sensitive
bed sheets, acoustic devices, and near-infrared cameras. Thus, investigating cardiac and respiratory
characteristics in different sleep stages is important for achieving reliable sleep stage
classification, which in turn enables a more adequate sleep assessment.
This thesis first exploits characteristics of cardiac and respiratory activity and their interaction
during sleep using several signal analysis methods: frequency band adaptation on heart rate
variability (Chapter 2), dynamic time/frequency warping and uniform scaling (measuring
self-dissimilarity) for respiration (Chapter 3 and Chapter 4, respectively), analysis of breathing
depth and volume (Chapter 5), and visibility graph analysis in complex networks for
cardiorespiratory interaction (Chapter 6). Based on these methods, novel cardiorespiratory features
(expressing certain physiological properties) are proposed to classify sleep stages. Results show
that these features help to substantially improve the performance of sleep stage classification.
In addition, Chapter 7 demonstrates that there is a time delay between the changes in brain
activity and autonomic variations during sleep transitions. Cardiac changes appear to consistently
precede the variations in brain activity during light-deep sleep and sleep-wake transitions. In
Chapter 8, this finding is used to detect deep sleep (i.e., slow wave sleep) with feature values
taken from a time interval a few minutes earlier, which significantly improves the detection
results. Furthermore, the major challenge of sleep stage classification based on cardiorespiratory
activity is discussed in Chapter 9. Classification performance is found to be limited mainly by
between- and within-subject variations in autonomic physiology as well as by subject demographics.
Therefore, methods of feature normalization and feature smoothing over the entire night are
proposed in Chapter 10 to reduce the between- and within-subject variations observed in the
cardiorespiratory features. As a result, marked improvements in sleep stage classification are
observed.
In summary, this thesis focuses on objectively analyzing and classifying sleep stages using
cardiorespiratory signals. It shows that sleep stage classifiers can be substantially improved,
ultimately yielding reliable results, by extracting novel features from the signals, post-processing
the features with normalization and smoothing, and applying new findings on the autonomic-brain
time delay.
Nederlandse samenvatting (Dutch summary)
Sleep is a reversible state of disconnection from the environment and plays an exceptionally
important role in maintaining internal homeostasis, memory consolidation, energy conservation,
cognitive performance, and behavior. Sleep problems are increasingly common throughout the world.
This was not a problem in the past because the regulation of sleep was always well synchronized
with the environment through a biological circadian rhythm, but now that we live in a modern
industrialized society with artificial environments in which light, heat, and food are available
at any moment, sleep disturbance and sleep problems are reaching epidemic levels. People experience
the symptoms of disturbed sleep, such as fatigue, increased impulsiveness, and agitation, without
relating them to their sleep pattern.
To be in good mental and physical condition, people should be able to assess their sleep quality
or sleep problems in a simple way, without disturbing their sleep, and to adjust their sleep habits
accordingly. Common sleep recording methods, known as polysomnography (PSG), are applied in a sleep
laboratory with expensive facilities and involve many sleep-disturbing measurement methods with
electrodes wired to the body, whose measurements can moreover only be interpreted by highly trained
sleep technicians. Although PSG is now the gold standard and the common practice for sleep
recording, it is unsuitable for daily use at home by people without special training, and it
introduces unwanted sleep disturbances. This has motivated research into alternative sensors and
methods that make measurement possible without these problems, preferably inexpensive and usable
without special training.
Objective assessment of sleep parameters is often based on recording sleep states throughout the
night. In recent decades, cardiorespiratory signals have received more and more attention in
determining sleep stages and classifying sleep. Cardiorespiratory activity has been found to be
related to sleep stages through the regulation of the autonomic nervous system and, more
importantly, the signals can be acquired unobtrusively using advanced techniques such as microwave
Doppler radar, ballistocardiography, photoplethysmography, pressure-sensitive bed sheets, and [...]
List of abbreviations
EMG Electromyography
EOG Electrooculography
FFT Fast Fourier transform
FN False negative
FP False positive
HF High frequency
HHT Hilbert-Huang transform
HMM Hidden Markov model
HR Heart rate
HRV Heart rate variability
ICC Intra-group correlation coefficient
IG Information gain
IGLS Iterated generalized least squares
IQR Interquartile range
KNN K-nearest neighbor
L Light sleep
LD Linear discriminant
LF Low frequency
LOSOCV Leave-one-subject-out cross-validation
LS Light sleep
LSA Least squares approximation
N1 Stage 1 NREM sleep
N2 Stage 2 NREM sleep
N3 Stage 3 NREM sleep (slow wave sleep, stage S3 and S4)
NB Naive Bayes
NREM Non-rapid-eye-movement
PAT Peripheral arterial tone
PDFA Progressive DFA
pNN50 Percentage of successive RR differences >50 ms
PPG Photoplethysmography
PR Precision-recall
PSD Power spectral density
PSG Polysomnography
PSQI Pittsburgh Sleep Quality Index
PSSA Pressure-sensitive sensor array
PVE Proportion of variance explained
Q-Q Quantile-Quantile
QD Quadratic discriminant
QRS Three successive extrema in the ECG
OSA Obstructive sleep apnea
R REM sleep
W Wake
WASO Wake after sleep onset
WDFA Windowed DFA
WRLD Wake, REM sleep, light sleep, and deep sleep
WRN Wake, REM sleep, and NREM sleep
Contents
Summary
Nederlandse samenvatting (Dutch summary)
List of abbreviations
1 General introduction
1.1 Human sleep
1.2 Sleep stages in electrophysiology
1.3 Polysomnography – standard for sleep assessment
1.4 Automatic sleep monitoring
1.4.1 PSG-based sleep stage classification
1.4.2 Cardiorespiratory-based sleep stage classification
1.5 Research question and objective
1.6 Outline of the thesis
2 Spectral boundary adaptation on heart rate variability for sleep and wake classification
2.1 Introduction
2.2 Materials and methods
2.2.1 Data acquisition
2.2.2 PSD estimation
2.2.3 Boundary adaptation
2.2.4 Feature extraction
2.2.5 Feature evaluation
2.2.6 Sleep and wake classification
2.2.7 Classifier evaluation
2.3 Results
2.4 Discussion
2.4.1 Discriminative power
2.4.2 Classification
2.4.3 Healthy subjects versus insomniacs
2.4.4 Determination of adaptive boundaries
2.5 Conclusion
3 Sleep and wake classification with actigraphy and respiratory effort using dynamic warping
3.1 Introduction
3.2 Subjects and data
3.3 Methods
3.3.1 Dynamic warping algorithm
3.3.2 Sleep and wake classification
3.3.3 Experiments and evaluation
3.4 Results
3.5 Discussion
3.6 Conclusion
7 Time delay between cardiac and brain activity during sleep transitions
7.1 Introduction
7.2 Subjects and data
7.2.1 Subjects and recordings
7.2.2 EEG and cardiac activity
7.3 Correlation-analysis during sleep transitions
7.4 Results and discussion
7.5 Conclusion
Acknowledgements
Chapter 1. General introduction
“Everything is one; during sleep the soul, undistracted, is absorbed into the unity;
when awake, distracted, it sees the different beings.”
Chuang Tzu, 300 B.C., Warring States period (translated by M. Palmer, The Book of Chuang Tzu,
1st ed., Penguin Classics, 2006)
Sleep occupies approximately one-third of our lifetime and is essential for maintaining health
and wellbeing, homeostasis, memory, and cognitive and behavioral performance [48, 165, 285].
Sleep exerts significant effects on systemic hemodynamics, cardiac function, endothelial function,
and coagulation [311]. Sleep deprivation can lead to loss of daytime performance, disturbance of
the circadian rhythm, mental or physical fatigue, impaired immune function, reduced cognitive
functioning, and other health risks [10, 15, 45, 80]. Epidemiological and pathophysiological
studies have found that sleep disorders or abnormalities are linked to depression, diabetes,
metabolic syndrome, sudden death, and cardiovascular diseases such as cardiac arrhythmias,
hypertension, atherosclerosis, stroke, and heart failure [113, 252, 282, 287, 311].
Human sleep is a complex biological process with its own internal architecture, expressed by
sleep states or stages [63, 281]. Sleep states usually include nighttime wakefulness,
rapid-eye-movement (REM) sleep, and non-REM (NREM) sleep, where NREM sleep can be further divided
into stages S1-S4 according to the rules recommended by Rechtschaffen and Kales (R&K) [247]. The
more recent guidelines of the American Academy of Sleep Medicine (AASM) [136] suggest merging S3
and S4 into a single stage, slow wave sleep (SWS) or “deep” sleep, since no essential difference
was found between them. In addition, S1 and S2 usually correspond to “light” sleep [276]. Note
that, for simplicity, sleep states and stages are both referred to as “sleep stages” in this thesis.
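The stage groupings described above can be written as a small lookup table. The following Python
sketch is illustrative only: the label strings and the names `RK_TO_MERGED` and `merge_stages` are
assumptions introduced here and do not come from the thesis.

```python
# Hypothetical label strings. The grouping follows the text: S3 and S4 are
# merged into slow wave sleep ("deep" sleep), and S1 and S2 form "light" sleep.
RK_TO_MERGED = {
    "Wake": "Wake",
    "REM": "REM",
    "S1": "Light",
    "S2": "Light",
    "S3": "Deep",
    "S4": "Deep",
}


def merge_stages(hypnogram):
    """Map a sequence of R&K stage labels to Wake/REM/Light/Deep labels."""
    return [RK_TO_MERGED[stage] for stage in hypnogram]


print(merge_stages(["Wake", "S1", "S2", "S3", "S4", "REM"]))
```

A dictionary keeps the mapping explicit, so an unknown label raises a `KeyError` instead of being
silently assigned to a stage.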
For normal or healthy subjects, sleep progresses through about four NREM-REM sleep cycles per
night, where each cycle lasts around 90-110 min on average, starting with light sleep followed
by deep sleep before REM sleep (Figure 1.1) [63, 216, 243]. The electrophysiological changes across
sleep stages during a sleep cycle can be described as follows. During sleep onset (usually from
wake to S1 sleep), the electroencephalography (EEG) signal changes from clear rhythmic alpha waves
(8-13 Hz) to a mixed-frequency pattern with fewer alpha waves but more theta waves (4-7 Hz),
accompanied by a gradual decrease of muscle activity that can be observed in electromyography (EMG)
as well as slow and asynchronous eye movements in electrooculography (EOG) [63]. Many studies have
argued that the acknowledgement of sleep onset should require the presence of S2. This is because
the transition from wake to S1 may not coincide with perceived sleep onset and it often occurs
several times, which is considered ‘unequivocal sleep’ associated with a low arousal threshold
where subjects often report that they are still awake [5, 82]. During S2 sleep, K-complexes or
sleep spindles appear, along with an increasing presence of high-amplitude, low-frequency activity
as S2 progresses [243]. Afterwards, sleep enters SWS (S3 and then S4), during which high-voltage
(≥75 µV), slow (delta) wave activity (0.5-2 Hz) accounts for at least 20% (S3) or 50% (S4) of the
EEG activity, with no eye movements [17, 136, 247]. SWS represents the most restorative period of
sleep for metabolic functioning associated with sleep quality [10, 91], where brain and body
energy can be efficiently conserved and recovered [34, 35] and new memories are consolidated
[61, 285, 302]. During SWS, the field potentials in EEG oscillations are related to synchronized
patterns of burst-pause firing in cortical neurons [92, 284]. REM sleep is characterized by bursts
of rapid eye movements, muscle atonia, low-voltage brain waves, and irregular heartbeats and
breathing, and it is the state in which dreaming often takes place [88].
Figure 1.1: An example of the sleep stage progression (hypnogram) throughout an entire night from
a healthy adult. The vertical axis shows the stages Wake, REM, S1, S2, and SWS; the horizontal
axis shows time (0-8 h).
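The delta-activity criterion for S3 and S4 quoted above (at least 20% or 50% of the EEG activity)
can be expressed as a simple threshold rule. The sketch below is a hedged illustration, not a
scoring implementation: it assumes that the fraction of an epoch occupied by high-voltage delta
activity has already been measured, and the function name `sws_substage` is hypothetical.

```python
def sws_substage(delta_fraction):
    """Return "S3", "S4", or None for one epoch.

    delta_fraction is the assumed, pre-computed fraction (0 to 1) of the
    epoch occupied by high-voltage delta activity; how it is measured is
    outside the scope of this sketch.
    """
    if not 0.0 <= delta_fraction <= 1.0:
        raise ValueError("delta_fraction must lie in [0, 1]")
    if delta_fraction >= 0.50:
        return "S4"
    if delta_fraction >= 0.20:
        return "S3"
    return None  # below the SWS criterion


for fraction in (0.10, 0.30, 0.60):
    print(fraction, sws_substage(fraction))
```

Under the AASM guidelines mentioned earlier, both return values would be reported as SWS.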
As stated, objective assessment of sleep is often based on analyzing the overnight sleep
architecture, which requires identifying sleep stages. Since 1968, manual scoring of PSG recordings
has been the gold standard in clinical environments for identifying sleep stages [247], with
scoring rules that had been in use for more than 40 years. In the 21st century, the AASM guidelines
[136] and their updated version [39] were released in 2007 and 2012, respectively; they can yield
an increased inter-rater agreement when scoring sleep stages. However, visual scoring is very
tedious and time-consuming. This has led to a large number of studies (since the 1970s)
investigating computer-assisted automatic sleep staging systems with PSG channels including EEG,
EMG, and/or EOG [40, 104, 114, 115, 173, 223, 300], in which reliable classification results have
been achieved. Furthermore, some validated computer-assisted sleep scoring systems, such as the
Somnolyzer developed by the SIESTA group [19, 20], have been applied in clinical routine.
Figure 1.2: A sleep laboratory where a subject was being monitored with PSG and a sleep technician
was visually inspecting the recorded PSG (adapted from source: www.newscenter.philips.com).
Although PSG is the gold standard and common practice for objective sleep assessment, and sleep
stage classification based on PSG can be automated, it has several disadvantages from a healthcare
perspective. For example, it requires costly equipment in a sleep laboratory, it disrupts ‘normal’
sleep, and it is not applicable to long-term sleep monitoring in a home environment. This has
motivated the investigation of signals and sensors that allow for reliable physiological
measurements during sleep in an unobtrusive and convenient manner. Figure 1.4 shows an obtrusive
(with PSG) and an envisioned unobtrusive scenario for sleep monitoring. In this context,
alternatives such as body movements and cardiorespiratory activity have attracted increasing
attention in recent years, mainly because they can be easily acquired using less obtrusive or even
non-contact sensors with minimal discomfort to subjects, aided by the fast development of wearable
and off-body unobtrusive sensing techniques.
Figure 1.3: An example of a continuous PSG recording (20 min) with multiple channels of bio-signals
from a healthy adult. The channels from top to bottom are: hypnogram, EEG (Fp1-A2), EEG (C3-A2),
EEG (O1-A2), EEG (Fp2-A1), EEG (C4-A1), EEG (M2-M1), EOG (P8-A1 left), EOG (P18-A1 right),
EMG (mental), EMG (leg), ECG (chest), airflow, respiratory effort (chest wall movements),
respiratory effort (abdominal wall movements), and SaO2.
Body movements can be measured with several methods. For example, actigraphy is a well-known,
less obtrusive way of measuring body movements using an accelerometer, typically worn on the wrist.
It has been extensively studied [18, 74, 126, 204, 234] and is regarded as a standard method for
sleep assessment when PSG is not available [204]. There are many commercial actigraphy-based
products for monitoring sleep. For example, Philips developed the Actiwatch [229], a clinically
validated device that measures activity counts during sleep. Recently, Fitbit [105] and Jawbone
[143] also released wearable products that can quantify body movements for sleep monitoring. Some
studies proposed using an ‘off-the-shelf’ smartphone placed in the bed or close to the pillow to
capture body movements during sleep, and obtained satisfactory results in computing some sleep
statistics compared with actigraphy [33, 208]. Unlike PSG, actigraphy can only be used to identify
sleep and wake periods rather than different sleep stages. This is because it only measures
physical movements of the body, which reflect limited internal physiological information [33, 258].
Researchers have argued that, even for distinguishing between sleep and wake states, actigraphy
still produces errors when compared with PSG [33, 295, 310]. For example, it cannot handle
‘quiet wake’ with low or no body activity, which is misidentified as sleep and leads to a low
accuracy in detecting wakefulness [18, 87, 234], in particular for subjects with insomnia, jet lag,
or shift work [175, 220]. To obtain a better performance in identifying sleep and wake, and to
classify multiple sleep stages, additional physiological information is required. Figure 1.5
compares the overnight sleep stages with the corresponding actigraphy measured by a Philips
Actiwatch from a healthy subject [see Figure 1.5(a) and Figure 1.5(b)]. It indicates that the
activity count in actigraphy is correlated only with sleep and wake states, not with different
sleep stages. Actigraphy alone is therefore inadequate for classifying multiple sleep stages.
Figure 1.4: Scenario of (a) obtrusive sleep monitoring (with PSG) and (b) conceptual unobtrusive
sleep monitoring (adapted from source: www.newscenter.philips.com).
Cardiorespiratory activity differs across sleep stages because of substantial differences in the
manifestation or regulation of the autonomic nervous system (ANS), which comprises sympathetic
activity and parasympathetic (or vagal) activity [13, 226, 267, 281, 292]. These two branches
mostly have ‘opposite’ actions, where one activates a physiological response while the other
suppresses it [231]. With regard to cardiac activity, for example, heart rate (HR) and the standard
deviation of normal-to-normal heartbeat (interbeat) intervals (SDNN) are associated with
sympathetic activity, the spectral power in the high-frequency band between 0.15 and 0.4 Hz is a
marker of parasympathetic nervous modulation activated by respiratory-stimulated stretch receptors,
and the spectral power in the low-frequency band between 0.04 and 0.15 Hz is assumed to indicate
sympathetic tone [12, 24, 265, 288]. All these non-invasively measured characteristics have been
experimentally shown to differ across sleep stages. In addition, respiratory dynamics, such as
respiratory frequency (breathing rate, BR) [95], respiratory variability [256], and respiratory
regularity [67, 129], have also been shown to vary over sleep stages. This means that cardiac and
respiratory activity can in turn be used to separate sleep stages, which is of significant clinical
relevance. As displayed in Figure 1.5, the hypnogram with full sleep stages seems more correlated
with the variations of BR and HR than with actigraphy. Furthermore, the coupling or interaction
between cardiac and respiratory signals has also been demonstrated to change over sleep stages in
previous work [29, 30, 41]. For example, SWS corresponds to an enhanced phase synchronization
between heartbeats and respiration [30].
Figure 1.5: Comparison between (a) a hypnogram, (b) actigraphy measured by a Philips Actiwatch,
(c) standard deviation of normal-to-normal heartbeat intervals (SDNN, a.u.), and (d) standard
deviation of breathing rates (SDBR, a.u.) from a healthy adult. The horizontal axis is time in
30-s epochs.
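To make the cardiac measures named above concrete, the following Python sketch computes SDNN and
the spectral power in the 0.04-0.15 Hz and 0.15-0.4 Hz bands from a series of heartbeat intervals.
It is a minimal illustration under stated assumptions, not the method used in this thesis: the
4 Hz resampling rate, the linear interpolation, the plain periodogram, and all function names are
choices made here for brevity, and the input is synthetic.

```python
import cmath
import math
import statistics

LF_BAND = (0.04, 0.15)  # Hz, low-frequency band quoted in the text
HF_BAND = (0.15, 0.40)  # Hz, high-frequency band quoted in the text


def sdnn(rr_s):
    """Sample standard deviation of normal-to-normal intervals (seconds)."""
    return statistics.stdev(rr_s)


def resample_rr(rr_s, fs_hz=4.0):
    """Linearly interpolate RR intervals onto an even time grid (assumed 4 Hz)."""
    beat_times = []
    t = 0.0
    for rr in rr_s:
        t += rr
        beat_times.append(t)
    n = int((beat_times[-1] - beat_times[0]) * fs_hz)
    resampled, j = [], 0
    for i in range(n):
        ti = beat_times[0] + i / fs_hz
        while beat_times[j + 1] < ti:
            j += 1
        w = (ti - beat_times[j]) / (beat_times[j + 1] - beat_times[j])
        resampled.append(rr_s[j] + w * (rr_s[j + 1] - rr_s[j]))
    return resampled


def band_power(x, fs_hz, band):
    """Sum the periodogram of the mean-removed series over band[0] <= f < band[1]."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = 0.0
    for k in range(1, n // 2):
        f = k * fs_hz / n
        if band[0] <= f < band[1]:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(centered))
            power += 2.0 * abs(coeff) ** 2 / n ** 2
    return power


# Synthetic input: 1-s mean interval modulated at 0.25 Hz (a breathing-like rhythm).
rr, t = [], 0.0
for _ in range(300):
    interval = 1.0 + 0.05 * math.sin(2 * math.pi * 0.25 * t)
    rr.append(interval)
    t += interval

evenly_sampled = resample_rr(rr)
lf = band_power(evenly_sampled, 4.0, LF_BAND)
hf = band_power(evenly_sampled, 4.0, HF_BAND)
print(f"SDNN = {sdnn(rr):.3f} s, LF = {lf:.2e}, HF = {hf:.2e}")
```

Because the synthetic modulation sits at 0.25 Hz, almost all of its power falls in the
high-frequency band. Real recordings would additionally require ectopic-beat removal and a more
careful spectral estimator.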
In the past decade, researchers have dedicated themselves to exploring new sensors or approaches
to acquire cardiac and/or respiratory signals that can eventually be applied to sleep analysis.
Instead of the traditional Holter system, wearable textile electrodes were developed for recording
ECG signals [93, 176, 199, 221, 316]. Bar et al. [28] proposed the WatchPAT ambulatory system to
obtain a peripheral arterial tone (PAT) signal from which HR or heartbeat intervals can be derived.
More recently, photoplethysmography (PPG), in which a sensor placed at the skin surface detects
blood volume changes in the microvascular bed of tissue [16], has become more widely used. HR and
respiration can be reliably estimated from PPG [174, 318]. Several PPG-based watches are available
on the market, including the Adidas miCoach [2], Mio Alpha [201], and TomTom Runner Cardio [291].
Ballistocardiography (BCG), collected for example with piezo-electric sensors, has also received
growing recognition because it can be acquired non-invasively and contains information on HR and
even respiration [7, 189]. It has been increasingly employed to monitor sleep in forms integrated
into a mattress [218], load cells [68], (underneath) a pillow [66], or a bed [161, 200].
Furthermore, a textile bedsheet with a pressure-sensitive sensor array was designed to estimate
respiration and body posture during sleep [141, 264]. In addition, video-based [128, 232] and
audio-based [52, 228] approaches have been applied to measure cardiac or respiratory activity,
which can also be obtained with an off-body microwave Doppler radar or radio-frequency sensor
[85, 319]. All these approaches can potentially be used for unobtrusive sleep monitoring. Table 1.1
summarizes some unobtrusive or less obtrusive approaches for cardiac and/or respiratory
measurement.
Table 1.1: Summary of some unobtrusive or less obtrusive approaches for measuring cardiac
and/or respiratory activity.
Automatic classification of sleep stages using body movements, cardiac activity [or more
specifically, heart rate variability (HRV)], and/or respiratory activity has been intensively
researched to date, motivated by the regulatory autonomic fluctuations that occur across sleep
stages. Studies that used actigraphy to quantify body movements mostly focused on detecting
sleep/wake states [74, 126, 259]. Combining body movements with cardiorespiratory activity can
result in a superior sleep/wake classification performance [89, 150]. Using cardiac and/or
respiratory signals, numerous papers have addressed different classification tasks, such as the
classification between sleep and wake [145, 151], between wake, REM sleep, and NREM sleep
[94, 161, 248, 249, 303], between wake, REM sleep, light sleep, and SWS [138, 309], between REM
sleep and NREM sleep [69, 197], between light sleep and SWS [51], between SWS and all the other
sleep stages [68, 273], and even between full sleep stages (wake, REM sleep, S1 sleep, S2 sleep,
and SWS) [167, 214, 315]. Note that some of those studies performed several different sleep stage
classification tasks and some also made use of information about body movements. The general
framework of sleep stage classification is illustrated in Figure 1.6.
Figure 1.6: A general framework of sleep stage classification, in which the present thesis is
devoted to feature extraction and feature post-processing.
[...] is on (1) extracting new features that contain cardiorespiratory characteristics in addition
to the existing features and/or are robust to the variability between or within subjects, and
(2) reducing the variability in cardiorespiratory signals through feature post-processing
(Figure 1.6). Note that this thesis focuses mainly on healthy subjects; patients with disordered
sleep are out of scope.
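As a rough illustration of what feature post-processing can look like, the Python sketch below
applies per-subject z-score normalization and a centered moving average to a per-epoch feature.
The thesis summary names normalization and smoothing as its post-processing steps, but the specific
techniques, function names, window size, and numbers here are assumptions chosen for illustration
and may differ from the methods developed in the thesis.

```python
import statistics


def zscore(values):
    """Z-score one subject's overnight feature values (zero mean, unit variance)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def moving_average(values, half_width=2):
    """Centered moving average; the window is truncated at the ends of the night."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Two hypothetical subjects: same overnight pattern, different baseline and scale.
subject_a = [60, 62, 58, 61, 59]
subject_b = [80, 84, 76, 82, 78]
za, zb = zscore(subject_a), zscore(subject_b)
print(za)
print(zb)
print(moving_average([0, 0, 10, 0, 0], half_width=1))
```

After normalization the two subjects yield the same values, which is the sense in which
between-subject variability is reduced; the moving average spreads an isolated spike over its
neighbors, damping epoch-to-epoch fluctuations within a subject.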
This chapter is adapted from: X. Long, P. Fonseca, R. Haakma, R. M. Aarts and J. Foussier. Spectral
boundary adaptation on heart rate variability for sleep and wake classification. International Journal on
Artificial Intelligence Tools, 23(3):1460002, 2014.
World
c Scientific Publishing
Abstract – A method of adapting the boundaries when extracting the spectral features from
heart rate variability (HRV) for sleep and wake classification is described. HRV series can be
derived from electrocardiogram (ECG) signals obtained from single-night polysomnography
(PSG) recordings. Conventionally, the HRV spectral features are extracted from the spectrum
of an HRV series with fixed boundaries specifying bands of very low frequency (VLF), low
frequency (LF), and high frequency (HF). However, because they are fixed, they may fail to
accurately reflect certain aspects of autonomic nervous activity which in turn may limit their
discriminative power, e.g. in sleep and wake classification. This is in part related to the fact that
the sympathetic tone (partially reflected in the LF band) and the respiratory activity (modulated
in the HF band) vary over time. In order to minimize the impact of these variations, we adapt
the HRV spectral boundaries using time-frequency analysis. Experiments were conducted on
a data set acquired from two groups with 15 healthy and 15 insomnia subjects each. Results
show that adapting the HRV spectral features significantly increased their discriminative power
when classifying sleep and wake. Additionally, this method also provided a significant im-
provement of the overall classification performance when used in combination with other HRV
non-spectral features. Furthermore, compared with the use of actigraphy, the classification
performed better when combining it with the HRV features.
14 Chapter 2. Spectral boundary adaptation on heart rate variability
2.1 Introduction
Sleep plays an important role in human health. Night-time polysomnography (PSG) record-
ings, along with manually scored hypnograms, are considered the “gold standard” for objec-
tively analyzing sleep architecture and occurrence of sleep-related problems [247, 248]. PSG
recordings are typically recorded and analyzed in sleep laboratories, and are usually split into
non-overlapping time intervals (or epochs) of 30 s according to the Rechtschaffen & Kales
(R&K) rules [247].
As shown in the literature, monitoring heart rate variability (HRV) during bedtime is helpful
in sleep staging [89, 248], particularly to distinguish between rapid-eye-movement (REM) and
non rapid-eye-movement (NREM) [59, 197]. It reflects the variation, over time, of the period
between consecutive heart beats. HRV is derived from the length variations of RR-intervals,
i.e. time intervals between consecutive R-peaks of the QRS complex in the electrocardiogram
(ECG). Spectral analysis of HRV has been widely employed in the assessment of autonomic
nervous activity during bedtime [59, 197, 299]. It traditionally involves the computation of
the power spectral density (PSD) of an HRV series. An HRV spectrum is typically divided
into three bands, namely a very low frequency (VLF) band from 0.003 to 0.04 Hz, a low
frequency (LF) band from 0.04 to 0.15 Hz, and a high frequency (HF) band from 0.15
to 0.4 Hz [190, 288]. These bands are then used to compute certain properties such as
the spectral power of the VLF, LF, and HF components and the power ratio of low-to-high
frequency (LF/HF) components [59, 202, 265]. In general, it has been found that the VLF
spectral power is associated with long-term regulatory mechanisms, the LF spectral power is a
marker of sympathetic modulation of the heart and it also reflects some parasympathetic activity
when the respiratory frequency components partially fall into the LF band, the HF spectral
power is related to parasympathetic activity mainly caused by respiratory sinus arrhythmia
(RSA), and the LF/HF ratio is an indication of sympathetic-parasympathetic balance [265, 275,
288]. In particular, the HRV spectrum usually contains a peak centered around the respiration
frequency, located in the HF band, and another peak in the LF band which reflects, to a certain
degree, sympathetic activation [13, 190, 219].
The parameters derived from HRV PSD are often used as “features” in automatic sleep
staging [248] or sleep and wake classification systems [89]. Previous work has used HRV
spectral features with fixed boundaries for sleep and wake classification [89]. This classifier
exploits the fact that sympathetic tone and the respiratory activity are modulated in different
frequency bands of the HRV spectrum and exhibit different properties during sleep and wake,
allowing them to be distinguished.
It is known that the HRV spectrum and the dominant (or peak) frequencies of the LF and
HF bands are not constant but rather vary over time according to the autonomic modulations
of the heart beats [288]. Hence, as long as fixed band boundaries are used to compute HRV
spectral features, we might produce inaccurate estimates of cardiac autonomic activities. Since
the discrimination of sleep states (or sleep and wake in our case) depends on these estimates,
the classification accuracy will be affected. To avoid this issue, we will use a feature adaptation
method while estimating the HRV features.
Part I. Signal analysis for sleep stage classification 15
[Figure 2.1 plot: normalized PSD (ms²/Hz, scale ×10⁻³) versus frequency (Hz, 0 to 0.5); curves show the mean and standard errors for wake and for sleep.]
Figure 2.1: An example of the mean HRV PSD with standard errors for sleep and wake states over an
entire-night’s recording of a subject.
Figure 2.2: An example of the normalized HRV PSD versus time (30-s epoch) over an entire-night’s
recording of a subject.
The problem of boundary adaptation has been analyzed before in other areas such as stress
detection [25, 117] and anesthesia analysis [270]. It has been suggested that the LF and HF
boundaries are related to the peak frequency in the traditional LF band, called “LF peak
frequency”, and the peak frequency in the traditional HF band, called “HF peak frequency”,
respectively [117, 270]. In practice, these two peak frequencies can be estimated by determining
the frequency of the local maximum in the band between 0.003 and 0.15 Hz (i.e. the traditional
VLF band and LF band) and in the band from 0.15 to 0.4 Hz (i.e. the traditional HF band),
respectively. The working assumption here is that the peaks always fall within those two bands.
By centering the new bands around these peaks instead of using fixed boundaries, we can
compensate for their time-varying behavior. This should help, to some extent, reduce within- and
between-subject variabilities in the way these features express sympathetic activation and
respiratory activity, ultimately helping improve sleep and wake classification. Figure 2.1 shows an
example of the mean HRV PSDs with standard errors [standard deviations (SD)] for sleep and
wake states of a subject. It can be observed that, although their standard errors overlap, their
mean values differ in certain frequency ranges. This should provide an opportunity
to discriminate between sleep and wake states. Figure 2.2 illustrates the time variation of the
HRV PSD for a subject.
[Figure 2.3 block diagram: data acquisition; (HRV) PSD estimation; spectrum information (LF and HF peaks); classification (training/testing); classifier evaluation.]
Figure 2.3: Block diagram of the feature adaptation method used for sleep and wake classification.
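As an illustration of the peak search described above, the following minimal Python sketch picks the largest local maximum of a PSD within each of the two search bands. The function names, the list-based PSD representation, and the fallback to the largest sample when a band contains no interior local maximum are assumptions made for this example; they are not taken from the original implementation.

```python
def peak_frequency(freqs, psd, f_lo, f_hi):
    """Frequency of the largest local maximum of psd within [f_lo, f_hi].

    Assumption: if the band holds no interior local maximum, the largest
    sample in the band is used instead (the text does not cover this case).
    """
    idx = [i for i, f in enumerate(freqs) if f_lo <= f <= f_hi]
    if not idx:
        raise ValueError("no frequency samples in the requested band")
    local_max = [i for i in idx
                 if 0 < i < len(psd) - 1 and psd[i - 1] < psd[i] >= psd[i + 1]]
    candidates = local_max or idx
    return freqs[max(candidates, key=lambda i: psd[i])]


def lf_hf_peaks(freqs, psd):
    """LF peak searched in 0.003-0.15 Hz, HF peak searched in 0.15-0.4 Hz."""
    lf_peak = peak_frequency(freqs, psd, 0.003, 0.15)
    hf_peak = peak_frequency(freqs, psd, 0.15, 0.4)
    return lf_peak, hf_peak
```

In this sketch the search is run once per 30-s epoch on that epoch's PSD, which would give the two time-varying peak frequencies used to re-center the bands.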
In total, the data acquired from 30 subjects were used in our experiment. Fifteen subjects belong
to the healthy group and fifteen subjects are insomniacs. The insomniacs were randomly selected
from a larger group so that both groups contained the same number of subjects, which allows an
even comparison of the classification performance between the healthy and insomnia groups.
A subject was considered healthy if his/her Pittsburgh Sleep Quality Index (PSQI) [60]
was less than 6, while a subject was considered an insomniac based on his/her self-report. For each
subject, a full PSG was recorded according to the guidelines of the American Academy of Sleep
Medicine (AASM) [136]. Among the 30 subjects, the PSG recordings of fifteen insomniacs
and nine healthy subjects were recorded in the Sleep Health Center, Boston, USA during 2009
(Alice 5 PSG, Philips Respironics) and of the remaining six healthy subjects in the Philips
Experience Lab, Eindhoven, The Netherlands during 2010 (Vitaport 3 PSG, TEMEC). The
ECG was recorded with a modified V2 Lead, sampled at 500 Hz (Boston data) and 256 Hz
(Eindhoven data).
Sleep stages were manually scored on 30-s epochs by sleep experts according to the AASM
guidelines as wake, REM sleep, and each of the NREM sleep stages (N1-N3). For sleep and
wake classification, we considered two classes: wake and sleep (the latter including REM and NREM
sleep). Each PSG recording was manually clipped to the time interval from the
instant when the subject turned the lights off with the intention of sleeping until the moment the
lights were turned on before the subject got out of bed in the morning. The study protocol was
approved by the ethics committee of both centers and all subjects signed an informed consent.
The subject demographics including sex, age, body mass index (BMI), and sleep efficiency are
summarized in Table 2.1.
To estimate the PSDs of HRV, RR-intervals were first computed from the ECG signals. In our
study, the following steps were performed to obtain an RR interval series: (1) a peak detec-
tor based on the Afonso-Tompkins filter-bank algorithm [4] was used to locate the R peaks,
yielding an RR-interval series; (2) the very short (less than 0.3 s) and long (more than 2 s) RR
intervals (usually caused by ectopic heart beats, misidentification of R peaks, or badly attached
electrodes during measurement) were removed; (3) the RR-interval series was normalized by
dividing it by the mean value; (4) the resulting series was “re-sampled” at 4 Hz using linear
interpolation; and (5) the PSD was finally estimated with an autoregressive model with adaptive
order automatically determined using Akaike’s information criterion (AIC) [43].
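Steps (2) to (5) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: R-peak detection (step 1) is taken as given, the autoregressive model is fitted with the Levinson-Durbin (Yule-Walker) recursion because the text does not name an estimator, and the maximum order of 16 and the frequency grid are hypothetical choices.

```python
import cmath
import math


def clean_rr(r_peaks_s, rr_min=0.3, rr_max=2.0):
    """Step (2): RR intervals from R-peak times (s), dropping those outside [0.3, 2] s."""
    pairs = [(t1, t1 - t0) for t0, t1 in zip(r_peaks_s, r_peaks_s[1:])]
    return [(t, rr) for t, rr in pairs if rr_min <= rr <= rr_max]


def normalize_and_resample(samples, fs=4.0):
    """Steps (3)-(4): divide by the mean RR, then interpolate linearly on a uniform grid."""
    mean_rr = sum(rr for _, rr in samples) / len(samples)
    times = [t for t, _ in samples]
    values = [rr / mean_rr for _, rr in samples]
    out, j = [], 0
    for k in range(int((times[-1] - times[0]) * fs) + 1):
        t = times[0] + k / fs
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def fit_ar_aic(x, max_order=16):
    """Step (5): Levinson-Durbin fits for orders 1..max_order; keep the minimum-AIC model."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    r = [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_order + 1)]
    if r[0] <= 0.0:
        raise ValueError("series has zero variance")
    a, err, best = [], r[0], None
    for p in range(1, max_order + 1):
        k = (r[p] - sum(a[i] * r[p - 1 - i] for i in range(p - 1))) / err
        a = [a[i] - k * a[p - 2 - i] for i in range(p - 1)] + [k]
        err *= 1.0 - k * k
        if err <= 0.0:  # numerical breakdown: stop increasing the order
            break
        aic = n * math.log(err) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, list(a), err)
    return best[1], best[2]


def ar_psd(a, err, fs=4.0, n_freq=256):
    """AR spectrum err / (fs * |1 - sum_k a_k exp(-j 2 pi f k / fs)|^2) on [0, fs/2]."""
    freqs = [i * (fs / 2) / (n_freq - 1) for i in range(n_freq)]
    psd = []
    for f in freqs:
        h = 1 - sum(ak * cmath.exp(-2j * math.pi * f * (k + 1) / fs)
                    for k, ak in enumerate(a))
        psd.append(err / (fs * abs(h) ** 2))
    return freqs, psd
```

A possible call sequence is `freqs, psd = ar_psd(*fit_ar_aic(normalize_and_resample(clean_rr(r_peaks))))`, where `r_peaks` is a list of R-peak times in seconds for the segment being analyzed.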
As explained in Section 2.1, the use of fixed boundaries in HRV spectrum may not be appro-
priate to accurately represent different states of the autonomic nervous system and further to
classify sleep and wake. The respiratory frequency, and therefore the corresponding peak in
the HF band, varies over time. Likewise, the peak corresponding to the sympathetic tone in the LF
band also varies, reflecting differences in the autonomic activation during sleep. By applying
a time-frequency analysis, the boundaries that define each band can be dynamically adapted
so that the frequency components can be more correctly assigned to the corresponding bands.
To do this, it is required to estimate the LF and HF peak frequencies, which change over time.
Figure 2.4 illustrates, with a filled contour plot, an example of the HRV spectrum over time
for a subject together with the traditional fixed frequency bands. As can be seen, the
dominant LF and HF peak frequencies vary over time. Moreover, it can be observed that, for
some epochs, the spectral power of a frequency band spills over its neighboring bands when
using the fixed boundaries. For instance, for the epochs from 140 to 150, the spectral power of
the LF band also partially falls into the HF band (see Figure 2.4).
By adapting the boundaries of the LF and HF bands for each epoch, we can overcome the
issues mentioned above. This can be achieved in the following way.
• The new HF band (HF∗) is centered on the HF peak frequency [25, 117] and has a
constant bandwidth of 0.1 Hz [153]. This bandwidth was chosen after analyzing the HRV
PSDs of all 15 healthy subjects and empirically determining that most of the spectral
power related to RSA lies within a bandwidth of 0.1 Hz. A larger bandwidth (0.25 Hz)
was empirically used in other work [25, 142], but we found that on some occasions it
overlapped its adjacent LF band.
• The new LF band (LF∗) is centered on the dominant frequency found in the traditional
LF band, and has a bandwidth of 0.11 Hz that is similar to the traditional definition.
• The new VLF band (VLF∗) is defined from its traditional lower limit of 0.003 Hz up to
the lower limit of the LF band.
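The three rules above can be written down compactly. The sketch below is an illustration only: it reads “the lower limit of the LF band” as the lower limit of the adapted LF∗ band, and it clamps that limit at 0.003 Hz so that VLF∗ never has a negative width. Both choices, as well as the function and key names, are assumptions of this example.

```python
def adapted_bands(lf_peak, hf_peak, lf_bw=0.11, hf_bw=0.1, vlf_low=0.003):
    """Band limits (Hz) centered on the per-epoch LF and HF peak frequencies."""
    # Assumption: the LF* lower limit is clamped at the VLF lower limit.
    lf = (max(vlf_low, lf_peak - lf_bw / 2), lf_peak + lf_bw / 2)
    hf = (hf_peak - hf_bw / 2, hf_peak + hf_bw / 2)
    vlf = (vlf_low, lf[0])
    return {"VLF*": vlf, "LF*": lf, "HF*": hf}


def bands_overlap(bands):
    """True when the upper LF* limit exceeds the lower HF* limit."""
    return bands["LF*"][1] > bands["HF*"][0]
```

For example, peaks at 0.10 Hz and 0.25 Hz give LF∗ = 0.045 to 0.155 Hz and HF∗ = 0.20 to 0.30 Hz without overlap, whereas peaks at 0.12 Hz and 0.18 Hz produce overlapping bands.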
Figure 2.5 illustrates the adapted boundaries for the same HRV PSD shown in Figure 2.4. We
note that the LF∗ and HF∗ bands overlap in some epochs. This occurs when the LF and HF
peaks are too close to each other or when there is no HF peak (often during REM sleep [36]).
[Figure 2.4 plot: frequency (Hz, 0 to 0.5) versus time (30-s epoch); the HF, LF, and VLF bands are labeled.]
Figure 2.4: HRV spectrum versus time (30-s epoch) of a subject. The fixed boundaries of the VLF, HF,
and LF bands are plotted in solid lines and the corresponding bands are indicated.
Figure 2.5: HRV spectrum versus time (30-s epoch) of a subject. The limits of the new HF∗ and LF∗
bands are plotted in dotted and solid curves, respectively. The lower boundary of the new VLF∗ band (at
0.003 Hz) is plotted as a dashed line.
between the spectral powers of the LF∗ and the HF∗ bands (expressed as hrv lf/hf ). Before
computing the logarithm, the power of each band was first normalized. This was achieved by
dividing the power in the VLF∗ , LF∗ , and HF∗ bands by the total spectrum power [202, 298].
Alternatively we could have normalized it by dividing the power in each band by the total
spectrum power minus the power in the VLF∗ band [59, 299]. Since we did not observe any
significant differences in the final result, the first method was used.
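A minimal sketch of the normalization described above is given here. It integrates the PSD over each band with the trapezoidal rule, divides by the total spectral power, and then takes the logarithm. The integration rule, the feature key names, and the decision to leave the LF∗/HF∗ ratio untransformed are assumptions of this example, since this excerpt does not specify them.

```python
import math


def band_power(freqs, psd, lo, hi):
    """Trapezoidal integral of psd over the samples lying in [lo, hi]."""
    pts = [(f, p) for f, p in zip(freqs, psd) if lo <= f <= hi]
    return sum(0.5 * (p0 + p1) * (f1 - f0)
               for (f0, p0), (f1, p1) in zip(pts, pts[1:]))


def spectral_features(freqs, psd, bands):
    """Log of band powers normalized by the total spectrum power, plus the LF*/HF* ratio.

    `bands` maps "VLF*", "LF*", "HF*" to (low, high) limits in Hz.
    A band with zero power would make the logarithm undefined.
    """
    total = band_power(freqs, psd, freqs[0], freqs[-1])
    norm = {name: band_power(freqs, psd, lo, hi) / total
            for name, (lo, hi) in bands.items()}
    return {
        "hrv_vlf": math.log(norm["VLF*"]),
        "hrv_lf": math.log(norm["LF*"]),
        "hrv_hf": math.log(norm["HF*"]),
        # Assumption: the ratio is kept as a plain (non-log) value.
        "hrv_lf/hf": norm["LF*"] / norm["HF*"],
    }
```

The alternative normalization mentioned in the text would replace `total` by `total` minus the VLF∗ band power.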
A Hellinger distance metric [130] was employed to evaluate the discriminative power (i.e. class
separability) of the HRV spectral features between sleep and wake. It is estimated by computing
the amount of overlap between two probability density estimates in a binary class problem,
expressed as
\[ D_H(p, q) = \sqrt{1 - \sum_{x} \sqrt{p(x)\, q(x)}} \tag{2.1} \]
where p(x) and q(x) are the probability density estimates of the feature values given class sleep
and wake, respectively. In its most basic form, these density estimates can be computed by
means of a normalized histogram with either a fixed number of bins or a specific bin size. In
our study the histograms were computed with a fixed number of 100 bins. A larger Hellinger
distance reflects a higher discriminative power in separating the two classes.
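The computation can be sketched as follows, using normalized histograms with 100 bins as described above. Placing both classes on one shared set of equal-width bins that spans the pooled value range is an assumption of this example, as are the function and argument names.

```python
import math


def hellinger_distance(sleep_values, wake_values, n_bins=100):
    """Hellinger distance between two histogram density estimates of one feature."""
    lo = min(min(sleep_values), min(wake_values))
    hi = max(max(sleep_values), max(wake_values))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero value range

    def histogram(values):
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return [c / len(values) for c in counts]

    p, q = histogram(sleep_values), histogram(wake_values)
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - overlap))
```

With this definition, identical distributions give a distance of 0 and distributions that share no bin give a distance of 1.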
It has been demonstrated that a linear discriminant- (LD-) based classifier is appropriate for
the task of sleep and wake classification [89, 177]. Assuming that all features are normally dis-
tributed and their covariance matrices for the two classes are identical, the “linear discriminant”
function is given by
\[ G_c(\mathbf{f}) = -\frac{1}{2} (\mathbf{f} - \boldsymbol{\mu}_c)^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{f} - \boldsymbol{\mu}_c) + \ln \Pr(c) \tag{2.2} \]
where µ_c is the mean of the feature vector f for class c, Σ is the pooled covariance matrix, and
Pr(c) expresses the prior probability of class c [97]. In this study, c = sleep is the negative class
and c = wake is the positive class. Based on a feature vector, the epoch is assigned to one class
when the computed discriminant score of this class minus that of the other class is higher than a
decision making threshold T (here we chose T = 0). For instance, an epoch is classified as sleep
if Gsleep (f) > Gwake (f) for this epoch. Because quadratic discriminants are known to require
a larger sample size than linear discriminants and they seem to be more sensitive to possible
violations of the assumptions of normality [110], the linear discriminant was used instead.
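To make the decision rule concrete, the sketch below fits the class means and the pooled covariance matrix and then evaluates the discriminant function of Equation (2.2) for both classes. It is a plain-Python illustration under stated assumptions: the helper names are hypothetical, the pooled covariance uses the usual N minus number-of-classes divisor, and the linear system is solved with a small Gaussian elimination routine rather than a numerical library.

```python
import math


def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov(groups, means):
    """Pooled within-class covariance with divisor N - (number of classes)."""
    d = len(means[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows, mu in zip(groups, means):
        for row in rows:
            diff = [x - m for x, m in zip(row, mu)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[v / dof for v in r] for r in cov]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ld(sleep_rows, wake_rows):
    mu_s, mu_w = mean_vec(sleep_rows), mean_vec(wake_rows)
    return {"sleep": mu_s, "wake": mu_w,
            "cov": pooled_cov([sleep_rows, wake_rows], [mu_s, mu_w])}


def discriminant(f, mu, cov, prior):
    """G_c(f) = -0.5 (f - mu_c)^T Sigma^-1 (f - mu_c) + ln Pr(c)."""
    diff = [a - b for a, b in zip(f, mu)]
    z = solve(cov, diff)
    return -0.5 * sum(d * zi for d, zi in zip(diff, z)) + math.log(prior)


def classify(f, params, priors, threshold=0.0):
    """Assign wake when G_wake(f) - G_sleep(f) exceeds the threshold T."""
    g_wake = discriminant(f, params["wake"], params["cov"], priors["wake"])
    g_sleep = discriminant(f, params["sleep"], params["cov"], priors["sleep"])
    return "wake" if g_wake - g_sleep > threshold else "sleep"
```

Sweeping `threshold` over a range of values instead of fixing it at 0 yields the different operating points of the classifier.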
In regard to the prior probability Pr(c), it can be observed that the probabilities of different
classes vary throughout the night [249]. This prior probability is typically estimated during the
training procedure. For example, the probability of being asleep in the middle
of the night is much higher than just after going to bed or at the end of the night.
In order to exploit these variations, instead of using a fixed prior probability we computed a
time-varying prior probability for each epoch by counting the number of times that specific
epoch (relative to the instant when lights were turned off) was annotated as each class [108].
It should be pointed out that a prior probability ‘emphasis’ factor (or weight) γ (γ ∈ [0, 1]) is
used to bias the classifier towards a pre-defined class, meaning that it sets a higher barrier for
an epoch to be assigned to one class and at the same time a lower one for the other class during
decision making. Because the classes are imbalanced, with many more sleep epochs than wake epochs
(this will be explained later), yielding a very low prior probability of wake in our study, we use
this factor to “emphasize” the wake class and meanwhile “penalize” the sleep class. Therefore,
the new time-varying prior probabilities of the two classes after applying the emphasis factor are
Pr′(sleep) = γ · Pr(sleep) and Pr′(wake) = 1 − Pr′(sleep), where a factor γ of 0.79 was
chosen experimentally as a suitable value for sleep and wake classification.
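The time-varying prior and the emphasis factor can be illustrated with a short sketch. It assumes that each training hypnogram is a list of "sleep"/"wake" labels aligned at lights-off, so that index k refers to the same epoch number in every recording; the function name and this data layout are assumptions of the example.

```python
def time_varying_priors(training_hypnograms, gamma=0.79):
    """Per-epoch priors: Pr'(sleep) = gamma * Pr(sleep), Pr'(wake) = 1 - Pr'(sleep)."""
    n_epochs = max(len(h) for h in training_hypnograms)
    priors = []
    for k in range(n_epochs):
        # Only recordings that are long enough contribute to epoch k.
        labels = [h[k] for h in training_hypnograms if k < len(h)]
        p_sleep = sum(1 for lab in labels if lab == "sleep") / len(labels)
        p_sleep_adj = gamma * p_sleep
        priors.append({"sleep": p_sleep_adj, "wake": 1.0 - p_sleep_adj})
    return priors
```

If no training recording is scored as sleep at some epoch, the adjusted sleep prior becomes 0 and its logarithm in the discriminant function is undefined, so some form of smoothing would be needed in practice; how the original work handles this case is not stated here.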
positive predictive value. When comparing different classifiers, a larger ‘area under the PR
curve’ (AUCPR ) or ‘area under the ROC curve’ (AUCROC ) indicates a better performance. In
this study, the three metrics (κ , AUCROC and AUCPR ) were used to evaluate the performance
of sleep and wake classification with and without HRV boundary adaptation.
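For reference, two of these metrics can be computed as in the sketch below, with wake as the positive class. Cohen's κ is derived from the confusion matrix at a single decision threshold, while the area under the ROC curve is obtained through its rank-based (pairwise comparison) form. The function names are assumptions, and the pairwise AUC is written for clarity rather than speed.

```python
def cohens_kappa(y_true, y_pred, positive="wake"):
    """kappa = (p_o - p_e) / (1 - p_e) for a binary problem."""
    n = len(y_true)
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    p_o = (tp + tn) / n  # observed agreement
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def roc_auc(scores, labels, positive="wake"):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because κ depends on one threshold while the areas under the curves summarize all thresholds, the two kinds of metric can rank feature sets differently.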
In addition, we combined the HRV spectral features with some other HRV (non-spectral)
features selected from the feature set used in previous work [89], including time domain fea-
tures [89, 248], nonlinear measures extracted using detrended fluctuation analysis [289] and
sample entropy [75]. Five HRV non-spectral features were selected using the feature selection
method described in [108]. This serves the purpose of examining whether the feature adap-
tation method described in this chapter can help improve the classification performance when
combined with other relevant features. Note that all features were extracted from the same
HRV series. Besides, we compared the results with those obtained using the actigraphy feature
(activity counts over a 30-s epoch, expressed as ac ), a well-known feature for sleep and wake
classification [74]. Finally, we also examined the classification performance by combining the
HRV features with this actigraphy feature.
2.3 Results
A leave-one-subject-out cross-validation (LOSOCV) procedure was conducted to assess the
discriminative power of the HRV spectral features and also to assess the performance of our
classifier. Table 2.2 compares the discriminative power (as measured by a Hellinger distance
DH ) of the HRV spectral features using the traditional fixed boundaries and using the adaptive
boundaries for healthy and insomnia subjects. They were obtained by averaging the results
computed based on training data over all iterations of the LOSOCV process.
Table 2.3 and Table 2.4 summarize the classification performance obtained with and without
boundary adaptation using different sets of features for the healthy and insomnia groups. The
HRV spectral features consist of hrv vlf, hrv lf, hrv hf, and hrv lf/hf and the HRV non-
spectral features were selected based on the training sets during the cross-validation procedure.
The results are also illustrated in Figure 2.6 and Figure 2.7 using ROC and PR curves, giving an
overview of the performance of our sleep and wake classifier in a two-dimensional solution
space. Note that the ROC and PR curves were obtained by thresholding the discriminant scores
pooled over all iterations of the LOSOCV for each group.
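The cross-validation scheme can be sketched as a generator of train/test index splits, one per subject. The function name and the representation of the data as one subject identifier per epoch are assumptions of this example.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject in turn."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

In each iteration the classifier would be trained on the epochs indexed by `train` and applied to those indexed by `test`; collecting the discriminant scores of all held-out epochs across iterations gives the pooled scores from which the ROC and PR curves are drawn.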
2.4 Discussion
2.4.1 Discriminative power
Table 2.2 shows that, after using the adaptation method, the discriminative power of the HRV
spectral features is significantly increased for the subjects in both healthy and insomnia groups
(with a paired Wilcoxon signed-rank test). For comparison, the table also indicates the Hellinger
distance of the actigraphy feature ac. Although the feature adaptation helps improve, to different
extents, the discriminative power of each HRV spectral feature, it is still relatively
lower than that of the actigraphy feature, which addresses body motion during bedtime. As
known in the literature, body motion activity often happens during wake states [74, 258].
Table 2.2: Discriminative power comparison of the HRV spectral features for
healthy and insomnia groups
Table 2.3: Classification performance (mean ± SD of accuracy, sensitivity, and specificity) for
healthy and insomnia groups
2.4.2 Classification
As shown in Table 2.4, in general, adapting the boundaries of the HRV spectral features can
improve the performance as evaluated by the three metrics. For the healthy group, it is
interesting to note that the value of κ is similar when using HRV spectral features with and without
boundary adaptation. This seems to contradict the significant increase in discriminative power
found with the Hellinger distance. Upon closer inspection we found that this occurs
only for that single point in the solution space. In fact, when evaluating the performance over
the entire solution space with AUCPR we see an increase from 0.30 to 0.36. The ROC and PR
curves (plotted in Figure 2.6 and Figure 2.7, respectively) with the use of HRV spectral
features clearly show that the adapted versions are superior to the original ones, particularly in the
region where recall is lower than about 0.30 or larger than about 0.60. For the insomnia group,
the figures also indicate a clear improvement after adapting the HRV spectral features.
When combining the HRV spectral features with the additional HRV features indicated
earlier, we see a significant increase (Wilcoxon test, p < 0.01) in κ from 0.44 ± 0.25 (without
adaptation) to 0.48 ± 0.24 (with adaptation) for the healthy group and from 0.31 ± 0.11 (without
adaptation) to 0.34 ± 0.12 (with adaptation) for the insomnia group. The Wilcoxon significance
test compares the results pair-wise for each subject, thus indicating that boundary adaptation
improved the classification performance for the majority of the subjects. Likewise, the pooled
AUCPR and AUCROC metrics increased when applying boundary adaptation. As shown in
Table 2.4, the variations of κ are relatively large compared to the mean values, indicating a large
between-subject variability in the classification performance.
[Figure 2.6 plots: sensitivity versus 1 − specificity for (a) healthy subjects and (b) insomniacs.]
Figure 2.6: Pooled ROC curves for sleep and wake classification using different feature sets with and
without adaptation for healthy subjects (a) and insomniacs (b).
[Figure 2.7 plots: precision versus recall for (a) healthy subjects and (b) insomniacs.]
Figure 2.7: Pooled PR curves for sleep and wake classification using different feature sets with and
without adaptation for healthy subjects (a) and insomniacs (b).
For comparison purposes, Table 2.3 and Table 2.4 also show the classification results using
the actigraphy feature ac . As expected, for the healthy group, it outperforms the HRV features.
For the insomnia group, although the κ value obtained with the HRV feature set is generally lower
than that obtained with ac, the HRV features (in particular the adapted versions) outperform this
actigraphy feature when recall is higher than ∼0.55 (see Figure 2.6 and Figure 2.7). This indicates that the
sensitivity to wake might be increased by adding these HRV features for the insomnia subjects.
It also highlights the disadvantage of a metric such as κ , which only represents a single point
reflecting a single solution in the space.
The classification results with the actigraphy and the HRV features are also given in the
tables. Although actigraphy is adequate for sleep and wake classification, combining it with the
HRV features (in particular when applying boundary adaptation on the HRV spectral features)
significantly increases the classification performance measured by κ value. The significance
was confirmed with a Wilcoxon signed-rank test (p < 0.01).
To compare between the healthy subjects and insomniacs, it makes less sense to use the pooled
AUCPR metric due to the difference in the ratio between the numbers of sleep and wake epochs
in both groups. For instance, a decision rule that classifies all epochs as
wake (i.e. recall = 1) leads to different precision for the healthy and insomnia groups, with
∼92% and ∼70%, respectively, which only depends on their prior probabilities. Differences in
class balance prevent a comparison between the areas under the curves of each group. Therefore,
here we used the pooled AUCROC metric instead. Figure 2.6 illustrates that the sleep and
wake classification performances with different feature sets for the healthy subjects are much
better than those for the insomniacs. This confirms earlier findings, which show that
discrimination between wake and sleep (especially REM sleep) is more difficult in insomniacs
than in healthy subjects, when using cardiac activity [283] or actigraphy [175].
The method described in this chapter provides a time-varying adaptation of the HRV spectral
features that offers higher discriminative power in classifying sleep and wake states. The features
are used as inputs to a sleep and wake classifier. We re-defined the spectral boundaries so that they
are adapted to the spectrum information (related to autonomic activity) that can be obtained
before feature extraction. The aim is to find frequency bands that can more
accurately capture certain aspects of physiology during sleep. For instance, the HF band should
only include respiratory activity rather than sympathetic activation, which should be in the
LF band. An excessively large HF bandwidth might incorrectly include the spectral power
spilled over from sympathetic activation (see Figure 2.4). For this purpose, we used an HF∗
bandwidth of 0.1 Hz instead of the 0.25 Hz used in the traditional HF band. Alternatively, rather
than using a constant HF bandwidth (0.1 Hz) as in this study, the bandwidth can be determined by
measuring respiratory effort signals and analyzing their PSDs [117], but this requires an
additional sensor.
Additionally, we observed that the LF and HF bands can overlap under different circum-
stances: when the peak in the LF and in the HF band are close to each other, when there is no
clear peak in the HF band, or when the respiratory-frequency peak is below 0.15 Hz and there-
fore lies in the traditional LF band. Such overlaps (or spillovers) can be observed in Figure 2.4.
In these situations, the overlapped part of the spectrum components will actually influence the
features computed for both the LF∗ and the HF∗ bands. This may have an impact on the
classification process, decreasing the accuracy of the classifier. Therefore, a more accurate method
is needed for defining a threshold which separates the two bands rather than just using fixed
bandwidths. This merits further investigation.
Finally, as we mentioned, the respiratory information was derived from the HRV data.
Although this may not be as good an estimation as a direct measure of respiratory effort, it has
been shown to provide a usable estimate of respiratory rate, especially during sleep [79]. More
importantly, it does not require the use of an additional sensor to measure respiratory effort.
Alternatively, the respiration rate can also be estimated from the ECG signal directly, for example
by computing the changes in the “envelope” of the ECG due to the modulation induced by the
respiration movements [203]. This method will be further studied in future work.
2.5 Conclusion
In this chapter, we used a method based on the time-frequency analysis of HRV spectral power
to adapt HRV spectral features. It aimed at providing more accurate interpretations of the sym-
pathetic and respiratory activities in order to better discriminate between sleep and wake states.
It was achieved by adapting the spectral boundaries according to the peaks found in HF and LF
bands of the HRV power spectral density. The adaptation improved the discriminative power
of the HRV spectral features, and therefore enhanced the sleep and wake classification perfor-
mance, especially after combining the adapted HRV spectral features with the other selected
HRV non-spectral features. Using a linear discriminant classifier tested with
leave-one-subject-out cross-validation, we achieved a significant increase in Cohen’s Kappa
coefficient κ (from 0.44 to 0.48 for healthy subjects and from 0.31 to 0.34 for insomniacs).
Furthermore, by combining these HRV features and actigraphy, we obtained a significantly increased κ compared
with that obtained when only using actigraphy (0.64 versus 0.53 for the healthy group and 0.50
versus 0.45 for the insomnia group).
CHAPTER 3
This chapter is adapted from: X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and
Wake Classification with Actigraphy and Respiratory Effort using Dynamic Warping. IEEE Journal of
Biomedical and Health Informatics, 18(4):1272–1284, 2014.
© IEEE
Abstract – This chapter proposes the use of dynamic warping (DW) methods for improving
automatic sleep and wake classification using actigraphy and respiratory effort. DW is an al-
gorithm that finds an optimal non-linear alignment between two series allowing scaling and
shifting. It is widely used to quantify (dis)similarity between two series. To compare the res-
piratory effort between sleep and wake states by means of (dis)similarity, we constructed two
novel features based on DW. For a given epoch of a respiratory effort recording, the features
search for the optimally aligned epoch within the same recording in time and frequency do-
main. This is expected to yield a high (or low) similarity score when this epoch is sleep (or
wake). Since the comparison occurs throughout the entire-night recording of a subject, it may
reduce the effects of within- and between-subject variations of respiratory effort, and thus help
discriminate between sleep and wake states. The DW-based features were evaluated using a
Linear Discriminant classifier on a dataset of 15 healthy subjects. Results show that the DW-
based features can provide a Cohen’s Kappa coefficient of agreement κ = 0.59 which is signifi-
cantly higher than the existing respiratory-based features and is comparable to actigraphy. After
combining the actigraphy and the DW-based features, the classifier achieved a κ of 0.66 and an
overall accuracy of 95.7%, outperforming an earlier actigraphy- and respiratory-based feature
set (κ = 0.62). The results are also comparable with those obtained using an actigraphy- and
cardiorespiratory-based feature set but have the important advantage that they do not require an
ECG signal to be recorded.
30 Chapter 3. Dynamic warping on respiratory effort
3.1 Introduction
Sleep plays an important role in human emotional well-being and physical health. Many people
live with sleep-related problems (e.g., insomnia and obstructive sleep apnea) that have
important implications for their health condition [27, 247, 248]. Objective assessment of sleep
is often based on the monitoring of sleep and wake stages throughout the entire night during
bedtime [89, 151]. According to the guidelines of the American Academy of Sleep Medicine
(AASM) [136], the sleep stages consist of rapid-eye-movement (REM) and non-REM (NREM,
including N1, N2, and N3) sleep.
Overnight polysomnography (PSG) recordings with manually annotated hypnograms are
considered the “gold standard” for objectively analyzing sleep architecture and occurrence
of specific sleep-related problems [247]. A PSG typically comprises physiological data such
as the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), electrooculogram
(EOG), oxygen saturation, and respiratory effort. When used for sleep staging,
recorded signals are typically split into non-overlapping epochs of 30 s each in accordance with
the Rechtschaffen and Kales (R&K) rules [247] and also the more recent AASM guidelines
[136].
Although PSG is the gold standard for sleep assessment, it has several drawbacks such as the
high costs of laboratory facilities, disruption of “normal” sleep, and impossibility to perform
long-term monitoring. This has motivated the investigation of sensors/methods that allow for a
reliable acquisition of physiological modalities in an unobtrusive or at least more comfortable
and convenient way. In particular, actigraphy and cardiorespiratory signals have been often
considered in the context of automatic sleep monitoring [89, 248].
Actigraphy is a less obtrusive way of measuring the body movement of a subject based
on an accelerometer, which is typically worn on the wrist. It has been extensively studied [18, 74,
126, 204, 234, 295] and is considered a standard method for sleep assessment when PSG is
not available [204]. However, researchers argue that actigraphy is prone to error when compared
with PSG [295], and it cannot avoid misclassifying ‘quiet-wake’ epochs with
low body activity, resulting in low accuracy in detecting the wake state [18, 234]. Since actigraphy
only measures body movement, it reflects limited physiological information. It has been
shown that cardiorespiratory signals contain relevant physiological information which can help
improve actigraphy-based sleep and wake classification [89, 150, 226]. More importantly, these
signal modalities can be acquired unobtrusively in different ways (e.g., ballistocardiogram
[189], Doppler radar [194], near-infrared camera [166], under-pillow sensor
[66], bed sensor [303]). For example, acquiring cardiorespiratory information using a
static-charge-sensitive bed (SCSB) [140, 158] has been investigated, and in recent years it has become
more popular for unobtrusive (or non-contact) monitoring of sleep [161, 303]. However, difficulty
has been found in discriminating between wake and REM sleep [249] when only using
cardiorespiratory signals. It is therefore important to improve sleep and wake classification
when actigraphy is absent. On the other hand, cardiac activity is relatively difficult
to capture reliably in an unobtrusive manner, particularly when compared with body move-
ment and respiratory activity [158]. For example, a novel radio-frequency sensing system [85],
Part I. Signal analysis for sleep stage classification 31
which can only capture respiratory effort, was developed for sleep/wake measurement. Thus,
enhancing the sleep and wake classification performance in the absence of cardiac activity is also
of importance. This work therefore addresses the problem of obtaining a reliable sleep and
wake classification based on the following physiological signal modalities: (1) only respiratory
effort, and (2) the combination of actigraphy and respiratory effort.
As presented in previous studies, a large number of features have been explored for sleep
and wake classification [74, 89, 248]. Whenever either ECG or actigraphy is excluded, the classification
performance degrades to a certain degree [89, 150, 151]. In this work we present
new features based on respiratory effort, which result in a classification performance that is not only
better than that of the previous respiratory feature set (and the actigraphy feature), but also comparable
to that of the cardiorespiratory feature set described in [89]. Compared to that work, this study does
not require ECG signals, which makes it particularly well suited to the problem of unobtrusive sleep
and wake classification.
It is known that the breathing rhythm is usually more stable and more regular during sleep
than when awake [111, 163]. After observing different respiratory effort signals in the time
and the frequency domains, we found that the morphology of the respiratory waveform and the
properties of its power spectral density (PSD) differ between sleep and wake epochs. As illustrated
in Figure 3.1, the respiratory effort is more regular during sleep than during wake. Note
that irregularity of the respiratory effort can also be caused by body motion. Additionally,
the PSD of the respiratory effort signal of a sleep epoch typically has a single clear peak
indicating the dominant respiration frequency, while that of a wake epoch often has
multiple peaks. Therefore, it is assumed that a sleep epoch is more similar to another sleep
epoch and less similar to a wake epoch from the perspective of “series shape”, regardless of whether
this is in the time or in the frequency domain. We therefore concentrate on two questions: (1) how
to quantify the “(dis)similarity” between two series in terms of their morphological properties,
and (2) which template best reflects the shape of a specific state (sleep/wake)?
Dynamic Warping (DW) algorithms have been used to assess (dis)similarity of two data
series with respect to their values. In particular, Dynamic Time Warping (DTW) [37] is a signal
matching algorithm that represents the time-alignment between two time series via dynamic
programming by means of a total cumulative distance function. It can therefore be used to
establish the degree to which two patterns match. Dynamic Frequency Warping (DFW) [209]
is an exact analog of DTW but applied in the frequency domain, where it aims at aligning two
PSD curves (often known as spectrogram frames). When used with respiratory effort signals,
DTW is expected to find a good match between the waveforms of the respiratory effort in
two separate sleep periods. In contrast, it should not find any good match of the respiratory
waveform between a sleep and a wake period, or even between two distinct wake periods. This
is simply because the breathing pattern during wake is usually not as regular as it is during
sleep, and sometimes it is more related to body motion artifacts. Analogously to DTW, DFW
can help distinguish the respiratory PSD curves of a sleep state from those of a wake state. Using DTW and
DFW we can express the (dis)similarity of signals in the time and in the frequency domains,
and accordingly capture properties of the respiratory effort signals which are characteristic of
[Figure 3.1 panels: respiratory effort versus time (s) and PSD (a.u.) versus frequency (Hz), for sleep and wake]
Figure 3.1: Typical examples of respiratory time series (a) during sleep and (b) during wake in a period
of one min, and respiratory PSD series (c) during sleep and (d) during wake.
PSG was recorded according to the guidelines of the AASM [136]. The PSG recordings of
nine subjects were acquired in the Sleep Health Center, Boston, USA, during 2009 (Alice 5
PSG, Philips Respironics) and of six subjects in the Philips Experience Lab of the High Tech
Campus in Eindhoven, The Netherlands, during 2010 (Vitaport 3 PSG, TEMEC). The subject
demographics are presented in Table 3.1 as mean ± standard deviation (SD) and range. The
Ethics Committee of the two sleep laboratories (or labs) approved the study protocol and all
subjects signed an informed consent form.
Actigraphy was obtained with the wrist-worn Actiwatch where acceleration data, caused by
body movements, were recorded and converted into activity counts per second (influenced by
the intensity and frequency of acceleration) [229, 254]. The thoracic respiratory effort signal
was recorded using respiratory inductance plethysmography with a sampling rate of 10 Hz.
Note that the recordings from the Actiwatch were synchronized with those from the PSG, using
markers in both the Actiwatch and the PSG clocks.
Sleep stages were scored on 30-s epochs by sleep experts based on the AASM guidelines
as wake, REM sleep, and three NREM stages N1-N3. For sleep and wake classification, we
considered two classes wake and sleep (including REM and NREM sleep). Each PSG recording
was manually clipped to the time interval between the instant when the subject turned
the lights OFF with the intention of sleeping and the moment the lights were turned ON before
the subject got out of bed in the morning.
3.3 Methods
3.3.1 Dynamic warping algorithm
[Figure 3.2 diagram: n-by-m warping matrix for series A and B, showing the warping path w1, ..., wk, ..., wK and the upper and lower warping bands of size r]
Figure 3.2: An example of DW process between two series A and B, where the warping path (circle
markers) and the Sakoe-Chiba warping bands with the size of r (dash lines) are indicated.
These two series can be arranged such that they form an n-by-m “warping matrix”, where each
element (i, j) of the matrix is given by a distance function D, expressing the squared distance
between a_i and b_j:
    D(i, j) = (a_i - b_j)^2.
A warping path maps the elements of A and B through the matrix so that the total cumulative
distance between them is minimized. The warping path W belongs to a set Ω including all
possible warping paths, and is denoted as
    W = \{w_1, w_2, \ldots, w_k, \ldots, w_K\},
where w_k = (i, j)_k is the kth element of the warping path W and max(n, m) ≤ K ≤ m + n − 1.
The DW distance between the two series is the minimum measure based on W such that:
    DW(A, B) = \min \sqrt{\frac{1}{K} \sum_{k=1}^{K} w_k}, \quad W \in \Omega,    (3.5)
where the distance is normalized by a factor K (path length). Figure 3.2 illustrates an example
of the dynamic warping procedure between two series A and B in a 2-D space.
where the cumulative distance ∆(i, j) is defined as the sum of the distance D(i, j) found in a
warping step with the minimum of the cumulative distances of the adjacent elements on the
warping matrix.
Additionally, the warping path can be restricted by a band of size r (i.e., |ik − jk | ≤ r) on
both sides of the diagonal points of the warping matrix to reduce computational complexity of
a DW procedure (i.e., to reduce search space of the warping matrix). It is called warping band
condition, and the corresponding band is commonly known as the Sakoe-Chiba band [261] (see
Figure 3.2). Regarding the warping band condition, using a band size r that is too large often
results in “over-warping” of periodic series with multiple cycles and thus introduces artificial
features [71]. These artificial features usually occur when the warping path takes an excessive
number of non-diagonal (i.e., vertical or horizontal) moves. Conversely, a very small band size may
lead to “under-warping” between two series (the extreme case is the Euclidean alignment,
which corresponds to the diagonal line of the warping matrix) [152]. Over-warping and under-warping
are both undesirable. To determine a suitable band size r, we search for the parameter
value that would result in the highest feature discriminative power. This will be presented later
in Section 3.3.3.
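The banded warping procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `dw_distance` is hypothetical, the three allowed moves (diagonal, vertical, horizontal) are the usual choice, and the normalization divides the minimal cumulative squared distance by the length K of the path that achieves it, which is a common simplification of minimizing the normalized quantity directly.

```python
import math

def dw_distance(a, b, r=None):
    """Banded dynamic-warping distance between two sequences (a sketch).

    r is the Sakoe-Chiba band size (|i - j| <= r); r=None means no band.
    """
    n, m = len(a), len(b)
    if r is None:
        r = max(n, m)              # no band: full warping matrix
    r = max(r, abs(n - m))         # band must be wide enough to reach (n, m)
    inf = math.inf
    # cost[i][j]: minimal cumulative squared distance to reach cell (i, j)
    # steps[i][j]: number of path elements K on the path achieving that cost
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2      # squared distance D(i, j)
            # adjacent cell with the smallest cumulative distance
            prev_cost, prev_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = d + prev_cost
            steps[i][j] = prev_steps + 1
    return math.sqrt(cost[n][m] / steps[n][m])

# Two copies of the same shape, shifted by one sample.
a = [0, 1, 2, 1, 0, 0]
b = [0, 0, 1, 2, 1, 0]
print(dw_distance(a, b, r=1))   # warping absorbs the shift
print(dw_distance(a, b, r=0))   # r = 0 reduces to the Euclidean (diagonal) alignment
```

With r = 0 only the diagonal is allowed, which corresponds to the under-warping extreme mentioned above; a band of one sample is already enough to align the shifted example.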
[Figure 3.3 panels: Euclidean (left) and DW (right) alignments of series A and B]
Figure 3.3: An example of the alignment between two series (A and B) when computing the Euclidean
(Left) and DW (Right) distances.
[Figure 3.4 panels: alignments plotted against time series sample (0 to 300) and frequency series sample (0 to 140) for sleep-sleep, wake-sleep, and wake-wake epoch pairs]
Figure 3.4: Examples of DTW alignments of the respiratory time series (a-c) and of DFW alignments of
respiratory PSD series (d-f), respectively, between two sleep epochs (S-S), between a wake and a sleep
epoch (W-S), and between two wake epochs (W-W). Each time series lasts 30 s sampled at 10 Hz and
each PSD series contains 144 samples falling within a frequency range of 0 to ∼0.7 Hz. The values of
corresponding DTW and DFW distances are indicated.
of 30 s, such that:
where E_p^T = {x_{p,1}, x_{p,2}, ..., x_{p,N}} is the time series of the pth epoch (p ∈ Z+ and 1 ≤ p ≤ L)
and N is the number of data points per epoch (N = 300 at a signal sample rate of 10 Hz). In
order to compute the feature value for a given epoch of a recording, the template needs to be
determined. We search for the template based on a window Λ^T with a size of 2λ^T (<2λ^T when
p < λ^T or p > L − λ^T) centered on the given epoch (±λ^T), where this epoch itself is
excluded to avoid “self-alignment”. Thus, for the pth epoch E_p^T, the time-series template Γ_p^T is
selected using
where λ^T is a positive integer with 1 ≤ λ^T ≤ L − 1. Then the feature value of the pth epoch is
computed by
This means that we choose, as the feature value, the minimum of all DTW distances between the
given epoch E_p^T and all the other epochs within a searching window Λ^T.
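The template search described in words above (take the minimum distance to any other epoch inside the window) can be sketched as follows. The names `dw_feature` and `rms_distance` are hypothetical, and the simple point-wise distance is only a stand-in so that the example runs on its own; in practice a DTW or DFW distance function would be passed in instead.

```python
import math

def dw_feature(epochs, lam, distance):
    """For each epoch p, return the minimum distance to any other epoch q
    with |q - p| <= lam. The epoch itself is excluded (no self-alignment).
    Assumes at least two epochs and lam >= 1."""
    n_epochs = len(epochs)
    values = []
    for p in range(n_epochs):
        lo = max(0, p - lam)                 # window is clipped at the start...
        hi = min(n_epochs - 1, p + lam)      # ...and at the end of the recording
        candidates = [distance(epochs[p], epochs[q])
                      for q in range(lo, hi + 1) if q != p]
        values.append(min(candidates))
    return values

def rms_distance(a, b):
    """Stand-in distance (assumption): root mean squared point-wise difference."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Toy "recording": three similar epochs and one deviating epoch (index 2).
epochs = [[0, 0], [0, 0], [5, 5], [0, 0]]
print(dw_feature(epochs, lam=1, distance=rms_distance))
print(dw_feature(epochs, lam=3, distance=rms_distance))
```

In the toy data the deviating epoch keeps a large feature value for any window size, while its neighbour at the end only finds a matching template once the window is wide enough to reach the similar epochs.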
The DFW feature (dfw) is computed based on the DFW algorithm. The procedure for computing
dfw is the same as that for computing dtw, but it is applied to a respiratory PSD series rather than
to its time series. This feature compares the shape of the PSD curve between a given epoch and a
“frequency-series template” with an indication of maximum similarity in the frequency domain.
Therefore, the feature value of dfw for the pth epoch is obtained as
where E_p^F = {ϕ_{p,1}, ϕ_{p,2}, ..., ϕ_{p,M}} is the PSD series of the pth epoch (p ∈ Z+ and 1 ≤ p ≤ L),
containing M frequency bins, and Γ_p^F is the selected frequency-series template. Here the
template searching window of the DFW feature is Λ^F with a size of 2λ^F epochs. As explained
before, the PSD series are obtained after STFT, for each of which the number of frequency bins
is M = 144 in a frequency range between 0 and ∼0.7 Hz (a subset of the original spectrum
with 1024 frequency bins in the range of 0 to 5 Hz). We limit the comparison of the PSD of
each epoch to this frequency range since it can be observed that the frequency components of
a healthy subject’s respiration during sleep are usually below 0.7 Hz. We experimentally found
that including higher frequency components would result in a lower discriminative power of
the feature since they carry very small but unexpected non-zero noise that would contaminate
the DFW alignment.
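As an illustration of how such a 144-bin PSD series might be obtained from one 30-s epoch, the sketch below evaluates a zero-padded discrete Fourier transform at the first 144 frequencies. Several choices here are assumptions rather than details taken from the text: the transform length of 2048 (chosen so that the bin spacing is 5/1024 Hz), the mean removal, the Hann taper, and the min-max scaling to the range 0 to 1. The function name `psd_series` is hypothetical.

```python
import cmath
import math

def psd_series(epoch, fs=10.0, nfft=2048, n_bins=144):
    """Return (freqs, psd) for the lowest n_bins frequency bins of one epoch."""
    n = len(epoch)
    mean = sum(epoch) / n
    # Remove the mean and apply a Hann taper (both are assumptions).
    x = [(v - mean) * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(epoch)]
    psd = []
    for k in range(n_bins):
        # DFT coefficient at frequency k * fs / nfft (zero-padded transform).
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / nfft)
                    for i, v in enumerate(x))
        psd.append(abs(coeff) ** 2)
    lo, hi = min(psd), max(psd)
    span = hi - lo
    psd = [(p - lo) / span if span > 0 else 0.0 for p in psd]  # scale to [0, 1]
    freqs = [k * fs / nfft for k in range(n_bins)]
    return freqs, psd

# Synthetic "regular breathing" at 0.25 Hz, sampled at 10 Hz for 30 s.
epoch = [math.sin(2 * math.pi * 0.25 * i / 10.0) for i in range(300)]
freqs, psd = psd_series(epoch)
print(len(psd), round(freqs[-1], 3), round(freqs[psd.index(max(psd))], 3))
```

With these settings the last retained bin lies just below 0.7 Hz, and a regular breathing signal produces a single dominant peak near its respiration frequency.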
The template searching window is used to reduce the computational complexity of extracting
the DW features by restricting the search for the minimum DW value to that window. An
assumption here is that, for a given epoch, a suitable template can always be found among
all the other epochs within the window. The procedure for
determining λ^T and λ^F will be presented in Section 3.3.3.
wake, it is not likely to obtain a small feature value, because W-S or W-W may happen. For this
reason, this feature will in turn have discriminative power for distinguishing sleep and wake
states. Regarding the DFW feature, the same reasoning applies; but instead of (dis)similarities
of respiratory waveform, this feature expresses (dis)similarities in the shape of PSD series.
Figure 3.4 depicts two examples of the alignment found by DTW and DFW between epochs,
in the three situations (S-S, W-S, and W-W).
It should be kept in mind that the respiratory waveform and PSD shape might carry some
information about body motion artifacts, which often appear during the wake state. This can
lead to irregularity in the recorded respiratory effort. As illustrated in Figure 3.5, some peaks
(e.g., around the 420th epoch) of the DW-based features seem correlated with the actigraphy feature
(ac), which expresses the activity counts. This suggests that the two DW-based features might help detect
body motion artifacts. On the other hand, some peaks (e.g., around the 750th epoch) of the
DW-based features seem related to wake epochs in which no activity counts are observed.
These peaks might correspond to an irregular breathing rhythm.
3.3.2.4 Classifier
A linear discriminant (LD) classifier is adopted in this study. It has previously been shown to
be appropriate for the task of sleep and wake classification using actigraphy, respiratory, and
cardiac data [89, 108, 178, 249]. The details of an LD classifier can be found in [249] and [97].
Note that the classifier used here is based on epoch-by-epoch classification.
Regarding the prior probability in the LD classifier, it can be observed that the probabilities
of different classes vary throughout the night. For example, the probability of being awake
right after sleep onset or at the end of the night is much higher than in the middle of the
night. To exploit these variations, we compute a time-varying prior probability for each epoch
by counting the relative frequency with which that specific epoch was annotated as each class [108, 249].
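A minimal sketch of such a time-varying prior is given below. It assumes that annotations of the training recordings are aligned at the start of the recording and coded as 1 for wake and 0 for sleep; the function name `time_varying_prior` and the handling of recordings of different length are illustrative choices, and the cited works may apply further processing that is not reproduced here.

```python
def time_varying_prior(annotations, n_epochs):
    """Relative frequency of wake at each epoch index over training recordings.

    annotations: list of per-recording label lists (1 = wake, 0 = sleep).
    Recordings shorter than a given index do not contribute at that index.
    """
    priors = []
    for t in range(n_epochs):
        labels = [rec[t] for rec in annotations if t < len(rec)]
        priors.append(sum(labels) / len(labels) if labels else 0.0)
    return priors

# Three toy training recordings; wake is more frequent at the first epoch.
train = [[1, 0, 0], [1, 1, 0], [0, 0]]
print(time_varying_prior(train, 3))
```

The prior of the sleep class at each epoch index is one minus the returned value.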
3.3.3.2 Evaluation
To evaluate the performance of our classifier, overall accuracy (i.e., ratio of correctly identified
samples to the total number of samples) used in a binary classification problem is not the most
[Figure 3.5 panels: (a) annotation (wake/sleep), (b) Resp., (c) dtw, (d) dfw, (e) ac]
Figure 3.5: An example of (a) manually scored sleep/wake annotation, (b) respiratory effort recording
at 10 Hz, and feature values of (c) dtw , (d) dfw , and (e) ac for each 30-s epoch of a healthy subject.
adequate. The reason is that during a recording of a whole night the number of epochs of the
wake class (accounting for 7.6% of all epochs) is much smaller than that of the sleep class
(accounting for 92.4% of all epochs), in what is usually called an “imbalanced class distribution”
[125]. Thus we also consider the metrics specificity (proportion of correctly identified
actual negatives), sensitivity or recall (proportion of correctly identified positives), and precision
(ratio of true positives to true positives plus false positives). Besides these metrics,
Cohen’s Kappa coefficient of agreement κ [72] provides a more insightful measure of the
general performance of the classifier (0-0.20: slight, 0.21-0.40: fair, 0.41-0.60: moderate, 0.61-0.80:
substantial, and 0.81-1: almost perfect agreement [172]); but it only represents a single
point in the entire solution space [237]. In order to have an overview of the performance across
the entire solution space, we use a Precision-Recall (PR) curve [103], which plots precision ver-
sus recall by varying the classifier’s decision-making threshold. Compared with the well-known
Receiver Operating Characteristic (ROC) curve that has been shown to be over-optimistic when
the data set is heavily imbalanced between classes [83], a PR curve gives a more conservative
view of the classifier’s performance. The corresponding ‘Area Under the PR Curve’ (AUCPR )
can then be estimated [83]. In the remainder of the chapter, we will consider wake and sleep as
the positive and negative classes, respectively.
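The metrics listed above can be computed from the four confusion-matrix counts, with wake as the positive class. The sketch below uses the standard definitions, including the usual formula for Cohen's Kappa (observed agreement corrected for chance agreement); the function name `binary_metrics` is hypothetical, and the code assumes that at least one epoch is predicted positive and one is actually positive.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, specificity and Cohen's Kappa
    for binary labels (1 = wake/positive, 0 = sleep/negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Chance agreement from the marginal totals of both "raters".
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

# Imbalanced toy example: 2 wake epochs, 8 sleep epochs.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

In this toy example the accuracy is 0.8 while Kappa is only 0.375, which illustrates why accuracy alone is not adequate for imbalanced classes.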
An absolute standardized mean difference (ASMD) metric is used to evaluate the discriminative
power (i.e., separability) of a single feature. It is computed as the absolute difference
between the mean feature values of sleep and wake epochs, divided by the standard deviation of
the feature values of all epochs. A Mann-Whitney unpaired (1-sided) test is applied to check whether the
feature values of the two classes significantly differ. Moreover, Spearman’s rank correlation
coefficient (denoted as ρ) measures the correlation between features. The significance of
properties will be exploited with the introduction of the new DW-based features.
To understand whether the new DW-based features add discriminative power to a sleep
and wake classifier that uses the selected features extracted from different signal modalities,
we consider three respiratory-based feature sets and three actigraphy- and respiratory-based
feature sets, in which the features included are presented in Table 3.2. For the comparison of
classification performance with and without cardiac information, two feature sets FARC1 and
FARC2 including all the previously selected features (or together with the DW-based features)
are also considered.
Since our data were collected from two distinct sleep labs (Boston or Eindhoven), the lab-
effect (possibly caused by the difference of PSG setup during measurement between labs) on
sleep and wake classification is then analyzed by using one data set for training and the other
for testing.
• The most commonly used DW approach is the one with the warping conditions but without
the warping band condition [205]. It requires a computational complexity of O(N^2),
where the two series have the same length N. When using exhaustive template searching,
the complexity of computing a DW-based feature value becomes O(LN^2), in which L is
the number of epochs of a recording. This approach is denoted as A1.
• The Sakoe-Chiba warping band condition brings down the computational complexity to
O(LrN) instead of O(LN^2), where r is the warping band size and typically r ≪ N [205].
This approach is denoted as A2.
• Setting a template searching window Λ with a size of 2λ can reduce the complexity to
O(λrN), where λ < L. This approach is denoted as A3.
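To make the three orders of complexity concrete, the sketch below counts how many cells of the warping matrix are evaluated for one feature value under each approach. N = 300 and r = 60 are values mentioned elsewhere in this chapter for the DTW feature; the recording length of 900 epochs and the window half-size of 10 epochs are purely hypothetical example values, and the function names are illustrative.

```python
def cells_in_band(n, r):
    """Number of cells (i, j) of an n-by-n warping matrix with |i - j| <= r."""
    return sum(min(n, i + r) - max(1, i - r) + 1 for i in range(1, n + 1))

def cells_per_feature_value(n, r, n_templates):
    """Cells evaluated when one epoch is compared with n_templates candidates."""
    return n_templates * cells_in_band(n, r)

N, r = 300, 60          # epoch length and DTW band size mentioned in the chapter
L, lam = 900, 10        # hypothetical recording length and window half-size

a1 = cells_per_feature_value(N, N, L - 1)    # A1: no band, all other epochs
a2 = cells_per_feature_value(N, r, L - 1)    # A2: band, all other epochs
a3 = cells_per_feature_value(N, r, 2 * lam)  # A3: band and searching window
print(a1, a2, a3)
```

The counts only reflect the size of the search space; measured computation times also depend on the implementation.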
3.4 Results
Table 3.3 indicates the determined parameter values obtained by the grid search method. Since
the determination was based on the training set of each iteration during the LOSOCV procedure,
the optimal values for each iteration might differ. Their means and variances (over grid search
iterations) are also indicated in the table.
Table 3.4 shows the pooled discriminative power (as measured by ASMD) of the selected
features for all subjects in separating the sleep and wake classes. As confirmed with the Mann-
Whitney unpaired (1-sided) test, the differences of the features between these two classes are
significant. The table also indicates that the DW-based features perform much better than actig-
raphy when discriminating between quiet-wake and sleep; and the feature dtw offers a higher
discriminative power compared with the other features for wake and REM separation. Fig-
ure 3.6 illustrates the box plots of the three features (ac , dtw , and dfw ) for sleep and wake
epochs for every subject and the pool of all these subjects. It clearly shows how the features can
help discriminate (albeit not perfectly) between the two classes. Classification errors will occur
for feature values where the box plots overlap. In addition, the between-feature correlations for
Feature   Sleep vs. Wake   Quiet-wake vs. Sleep   Wake vs. REM sleep
ac        1.77∗            0.16∗∗                 0.92∗
dtw       1.75∗            0.48∗                  1.03∗
dfw       1.39∗            0.74∗                  0.70∗
For each feature, the significance of difference between classes (sleep/wake,
quiet-wake/sleep, and wake/REM sleep) was examined with a Mann-Whitney
test (∗ p < 0.0001 and ∗∗ p < 0.005).
all subjects are presented in Table 3.5, indicating that the correlation between ac and dtw is
higher than the others.
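The ASMD values reported in Table 3.4 follow the verbal definition given in the evaluation section: the absolute difference between the class means divided by the standard deviation over all epochs. A minimal sketch is shown below; the function name `asmd` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import statistics

def asmd(sleep_values, wake_values):
    """Absolute standardized mean difference of one feature between classes."""
    all_values = list(sleep_values) + list(wake_values)
    mean_diff = abs(statistics.mean(sleep_values) - statistics.mean(wake_values))
    # Standard deviation over the feature values of all epochs (assumption:
    # population form).
    return mean_diff / statistics.pstdev(all_values)

# Toy feature values: low during sleep, high during wake.
print(asmd([0, 0, 0, 0], [2, 2, 2, 2]))
```

A larger value indicates that the feature separates the two classes better relative to its overall spread.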
The classification results obtained with each of the feature sets after LOSOCV are summa-
rized in Table 3.6, where both ‘averaged’ and ‘pooled’ results are presented. Note that, for each
feature set, the decision threshold (i.e., operating point) of the classifier was chosen to optimize
Kappa coefficient (based on training sets) rather than overall accuracy due to the between-class
imbalance of our data. As can be seen in the table, for instance, the two DW-based features
(i.e., FDW ) provide a pooled κ of 0.59, which seems to be comparable with the actigraphy fea-
ture (corresponding to a κ of 0.58). Combining them with the actigraphy feature in FAC-DW ,
we achieved a pooled κ of 0.66 and a pooled accuracy of 95.7%. The table also presents the
classification results obtained with FAR1 and FAR2 , indicating that the addition of DW-based
features significantly improves the classification performance. It also shows that the feature set
FAC-DW performs significantly better than FAR1 and comparably with FAR2 . For comparison,
the results based on the feature sets comprising actigraphy, respiratory, and cardiac features are
also provided in Table 3.6. No significant difference was found between FAC-DW , FARC1 , and
FARC2 . Figure 3.7 compares the pooled PR curves using different feature sets.
The classifier’s learning curves (based on FAC-DW ) using LOSOCV are displayed in Figure 3.8.
They are plotted as pooled κ versus the number of subjects (varying from 2 to 15). The
results on training and test sets start converging rapidly from 4 or 5 subjects and become stable
at 13 subjects, ultimately achieving a κ of ∼0.66. This confirms that splitting our data into
separate training and test sets would be unsuitable in our experiment.
[Figure 3.6 panels: ac (a.u.), dtw (a.u.), and dfw (a.u.) for sleep and wake epochs, per subject (1 to 15) and pooled]
Figure 3.6: Box plots (mean and SD) of the feature values of ac , dtw , and dfw for sleep and wake
epochs for each of the 15 subjects and for the pool of all these subjects.
[Figure 3.7 plot: precision versus recall for FAC, FR1, FDW, FAR1, and FAC-DW]
Figure 3.7: PR curves obtained with different features or feature sets. The corresponding operating points
of the classifier (representing κ) are marked.
Table 3.6: Summary of sleep and wake classification results using LOSOCV
Feature set  Precision (%)       Sensitivity (%)     Specificity (%)    Accuracy (%)       AUCPR∗              Kappa κ∗
FAC          64.5 (66.2 ± 20.9)  57.5 (61.8 ± 16.1)  97.4 (97.3 ± 2.0)  94.4 (94.2 ± 1.7)  0.66 (0.73 ± 0.09)  0.58 (0.57 ± 0.07)
FDTW         50.7 (51.0 ± 17.9)  62.9 (67.7 ± 17.0)  95.0 (94.9 ± 2.5)  92.5 (92.4 ± 2.1)  0.55 (0.60 ± 0.13)  0.52 (0.51 ± 0.11)
FDFW         43.3 (42.6 ± 11.8)  50.8 (53.7 ± 11.9)  94.5 (94.3 ± 2.3)  91.2 (91.0 ± 2.9)  0.43 (0.44 ± 0.10)  0.41 (0.41 ± 0.08)
FR1          45.2 (51.6 ± 20.2)  54.3 (52.9 ± 16.8)  94.6 (93.8 ± 6.9)  91.5 (90.9 ± 6.0)  0.52 (0.55 ± 0.14)  0.45 (0.44 ± 0.12)
FR2          64.2 (66.8 ± 20.5)  56.0 (55.0 ± 19.3)  97.3 (96.6 ± 3.0)  94.2 (94.1 ± 2.6)  0.64 (0.67 ± 0.16)  0.57 (0.55 ± 0.17)
FDW          63.5 (64.0 ± 18.9)  59.9 (63.4 ± 16.2)  97.3 (97.2 ± 1.9)  94.3 (94.2 ± 2.2)  0.64 (0.68 ± 0.12)  0.59 (0.58 ± 0.11)
FAR1         70.6 (75.3 ± 29.2)  60.5 (62.4 ± 18.7)  97.8 (97.6 ± 2.7)  95.0 (94.8 ± 2.3)  0.68 (0.75 ± 0.12)  0.62 (0.61 ± 0.12)
from a larger feature set based on the selection method described in [108].
[Figure 3.8 plot: Kappa coefficient versus number of subjects (2 to 15) for the training set and the test set]
Figure 3.8: Learning curves with LOSOCV by varying the number of subjects.
Table 3.7: Classification results with split training and test sets
Table 3.7 shows the classification results (pooled overall accuracy, AUCPR , and Kappa) of
using our actigraphy- and respiratory-based feature set FAC-DW by splitting training and test sets
with regard to lab (i.e., using the Boston set to train the classifier and testing it on the Eindhoven
set, and the other way around).
The results (absolute error and estimation bias) of the sleep statistics over subjects using
different actigraphy and respiratory feature sets (FAR1 and FAC-DW ) are summarized and compared
in Table 3.8. Using FAC-DW we achieved significantly lower absolute errors (t-test,
p < 0.05) in estimating the sleep statistics than with FAR1 , with the exception of
ST. To compare the degree of agreement, Bland-Altman scatter plots are shown in Figure 3.9.
It can be seen that the difference values of SE, TST, TWT, SOL, and WASO are less
dispersed when using FAC-DW than when using FAR1 , indicating less variance (or a higher degree of
agreement) when estimating the sleep statistics with FAC-DW .
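The quantities behind a Bland-Altman plot (per-subject averages and differences, the mean bias, and the 95% limits of agreement at ± 1.96 SD) can be computed as in the sketch below. The function name `bland_altman`, the sign convention (estimate minus reference), and the use of the sample standard deviation are assumptions made for illustration; the numbers are toy values, not results from this study.

```python
import statistics

def bland_altman(estimated, reference):
    """Return per-subject averages and differences, the mean bias, and the
    95% limits of agreement (bias ± 1.96 SD of the differences)."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    averages = [(e + r) / 2 for e, r in zip(estimated, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD (assumption)
    return averages, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Toy sleep-statistic values (e.g., minutes) for three subjects.
averages, diffs, bias, limits = bland_altman([10, 12, 14], [9, 12, 15])
print(averages, diffs, bias, limits)
```

Narrower limits of agreement correspond to the less dispersed difference values discussed above.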
Table 3.9 compares the computational complexity of different DW approaches (A1, A2, and
A3). It shows that, when extracting DW-based features, using a warping band for DW and
constraining the template searching range reduces the computation time significantly (after a
t-test, p < 0.001) to an average value of 0.53 s and 0.10 s for computing dtw and dfw of each
30-s epoch, respectively. On average, it takes approximately 7.5 min for the DTW feature and
1.5 min for the DFW feature to compute all their feature values of one night per subject.
3.5 Discussion
During the training step of each LOSOCV iteration, some parameters, evaluated by the pooled
AUCPR , were determined. The determined Sakoe-Chiba warping band for DTW (rT = 60) is
much larger than that for DFW (rF = 5). This is because, when computing the DTW distance
between two respiratory time series, they usually start and end at different phases of a breathing
cycle. A larger DTW warping band allows for a larger signal variation (caused by differences in breathing
phase, length, amplitude, etc.) between two epochs. It helps compensate for the
signal variation and thus makes it possible to find a better alignment between them. On the other hand,
when computing the DFW distance, the respiratory PSDs were normalized between 0 and 1 so
that the amplitude variation between epochs no longer exists (no improvement in
classification performance was observed without normalizing them). Also, they usually have
fewer peaks and no troughs compared with time series (see Figure 3.1). These properties yield a
higher similarity between two respiratory PSD series than between two respiratory time series.
Besides, using a smaller warping band for DFW avoids over-alignment between two
PSD series, while still making it possible to discriminate between sleep and wake with respect to their
minimum distance.
The searching window sizes for extracting DW-based features were also determined with
the use of the grid search method. Since we relied on the observation that the minimum DW
distance for a sleep epoch is small, this potential disadvantage of restricting the search space
[Figure 3.9 panels: difference versus average for SE (%), TST (min), TWT (min), SOL (min), WASO (min), and ST (min)]
Figure 3.9: Bland-Altman plots for sleep statistics estimated using FAR1 (Left) with data points marked
by “×” and FAC-DW (Right) with data points marked by “◦”. Data points in a plot represent different
subjects. Mean bias and 95% limits (± 1.96 SD) are shown as solid and dashed lines, respectively.
is alleviated by the fact that sleep epochs are usually not isolated in time, i.e., there are, very
likely, other sleep epochs close to any given (sleep) epoch during the night. Furthermore, a
larger searching window might not provide a better separation between sleep and wake classes.
For instance, when analyzing a wake epoch, the inclusion of more distant (in time) candidate
templates might increase the likelihood of selecting a more similar wake template. This would
result in a smaller DW distance and thus decrease the feature’s discriminative power. Here we
found that the discriminative power of these two features did not dramatically change when
λ > 25 epochs.
The DW-based features performed well for sleep and wake classification. These features can
effectively encode differences in the waveform and PSD shape of the respiratory effort between
sleep and wake states. As shown in Table 3.6, when only respiratory effort is used,
our DW-based feature set FAC-DW offers a relative increase in κ of about 31% compared with
the existing respiratory feature set FR1 (i.e., κ of 0.59 versus 0.45), and it is comparable with the
well-known actigraphy (κ = 0.58). After combining actigraphy with the respiratory effort signal,
our DW-based features improved the classification performance from κ = 0.62 to κ = 0.66,
yielding a higher relative increase (∼14%) when compared with actigraphy. The reason might
be that the DW-based features (particularly the DFW feature) better help distinguish between
quiet-wake and sleep (see Table 3.4).
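As a check on the percentages quoted above, the relative increases follow directly from the κ values (the second is taken relative to actigraphy alone, κ = 0.58):

\[ \frac{0.59 - 0.45}{0.45} \approx 0.31, \qquad \frac{0.66 - 0.58}{0.58} \approx 0.14. \]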
A previous study [126] presented a novel actigraphy-based algorithm for sleep and wake
classification, in which the authors reported an overall accuracy of ∼86%, a sleep accuracy
of ∼91%, and a wake accuracy of ∼69% for a group of 38 normal subjects. In [234], the
overall accuracy was ∼87% (measured in 14 healthy subjects). In this study, to perform a
fair comparison, we varied the operating point of our classifier and obtained comparable results
based on only actigraphy. After combining it with the DW-based respiratory features, as shown
in Table 3.6, we achieved much better results.
Wrist actigraphy ultimately measures body (or, more precisely, wrist) movements during
sleep, which have proved to be an indication of the wake state [18, 234]. To a certain
extent, these movements are also reflected in the respiratory effort signal as body motion
artifacts. This can be observed in Figure 3.5, which suggests a relatively high correlation
between peaks in the actigraphy feature and in the respiratory effort series. As mentioned, the
respiratory waveform and PSD shape not only reflect respiration but also contain
some information about body motion artifacts. This means that the DW-based features might
encode the artifact information in both the time and the frequency domains. Table 3.5 supports
this, showing a significant correlation between ac and dtw (ρ = 0.32) and between ac and
dfw (ρ = 0.26). These two features (particularly the DTW feature) might help separate wake
and REM sleep, resulting in an improved classification when actigraphy is not provided (see
Table 3.6).
The inclusion of the cardiac feature (i.e., using FARC2) did not significantly improve the
performance of sleep and wake classification (see Table 3.6). This means that a good performance
can still be obtained when using fewer physiological signal modalities. However, it
is still worthwhile to explore new cardiac features containing additional information, not
contained in actigraphy and respiratory activity, that can better discriminate between sleep
and wake states. Moreover, the κ of 0.59 with only DW-based respiratory
features is comparable with that of 0.60 reported in [249], where not only respiratory
but also cardiac information was used.
The results of using FAR2 (with six features) are comparable with those using FAC-DW (with
three features). Since we aimed at evaluating the proposed new DW-based features, they were
simply combined with the other pre-selected features. Using more features does not
necessarily guarantee a better performance, and in some cases the performance may even decrease.
This is because features may be mutually correlated to some extent, and thus some features are likely
redundant. As a consequence, they may hardly contribute to (or even harm) the classification
when the additional useful information they carry is limited compared with the increase
in noise level. Therefore, selecting features from a larger feature set with the aim of removing
feature set redundancy (e.g., correlation-based feature selection [121]) merits further
investigation.
As shown in Table 3.7, the sleep and wake classification results obtained on the Eindhoven
set remain worse than those on the Boston set, regardless of which set was used for training.
This might be associated with between-subject variability rather than a lab effect. Thus,
we cannot confidently conclude that a lab effect exists based on our data set
with a small number of subjects. Although results have been shown to be consistent between
labs [126], this should be studied further on a larger data set.
By choosing different classifier operating points, we can obtain results that favor a higher
specificity or sensitivity. In practice, this choice often depends on the accuracy required in
estimating the sleep statistics that are delivered to subjects. For example, the operating point
should be chosen to optimize the estimate of SOL for subjects who might have insomnia, while for
an overall assessment of sleep, one can choose to optimize the estimate of SE.
In addition, this study focused on healthy subjects with high sleep efficiencies (>86%)
rather than, e.g., insomniacs with low sleep efficiencies. However, it has been indicated that
distinguishing between sleep and wake states is more difficult in insomniacs than in healthy
subjects when using cardiorespiratory activity [85, 283] or actigraphy [175]. Although the
DW-based features perform well in separating sleep and wake states for healthy subjects, it
remains necessary to evaluate how robust they are against low sleep efficiency.
Finally, although the DW-based features seem computationally intensive compared with
many other existing features, it is still practically feasible to achieve an offline classification
of sleep and wake. In fact, recent research has developed a set of techniques that make
the DW computation much faster and comparable with the Euclidean alignment, so that DW is
applicable to large data sets in real time [242]. Speeding up our algorithms
using these techniques will be carried out in our future work.
3.6 Conclusion
In this chapter, we proposed two new features extracted from respiratory effort based on
dynamic warping (DW) algorithms to enhance the performance of sleep and wake classification.
The features quantify the shape (dis)similarity (in the time and frequency domains) between
a given 30-s epoch and the other epochs within a pre-determined window of
an entire-night respiratory effort recording. The minimal dissimilarity (measured by a DW
distance) was taken as the feature value for this epoch. To evaluate the sleep and wake
classification performance, a linear discriminant classifier was tested with
leave-one-subject-out cross-validation. By combining the two DW-based features with a well-known actigraphy
feature, we obtained a significantly increased Cohen’s Kappa coefficient (κ = 0.66) compared
with the use of the actigraphy feature and the traditional respiratory features (κ = 0.62) and
with the use of actigraphy only (κ = 0.58). This result is comparable with the κ of
0.67 obtained with a feature set comprising the DW-based features and the previously used actigraphy
and cardiorespiratory features. Furthermore, when using the respiratory signal only, the
DW-based features provided a large improvement compared with the existing respiratory features
(κ of 0.59 versus 0.45).
CHAPTER 4
This chapter is adapted from: X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing
respiratory effort amplitude for automated sleep stage classification. Biomedical Signal Processing and
Control, 14:197-205, 2014. © Elsevier
Abstract – Respiratory effort has been widely used for objective analysis of human sleep
during bedtime. Several features extracted from the respiratory effort signal have succeeded in
automated sleep stage classification throughout the night, such as variability of respiratory
frequency, spectral powers in different frequency bands, respiratory regularity and self-similarity.
With regard to the respiratory amplitude, it has been found that the respiratory depth is more
irregular and the tidal volume is smaller during rapid-eye-movement (REM) sleep than during
non-REM (NREM) sleep. However, these physiological properties have not been explicitly
elaborated for sleep stage classification. By analyzing the respiratory effort amplitude, we
propose a set of 12 novel features that should reflect respiratory depth and volume, respectively.
They are expected to help classify sleep stages. Experiments were conducted with a data set
of 48 sleepers using a linear discriminant (LD) classifier, and classification performance was
evaluated by overall accuracy and Cohen’s Kappa coefficient of agreement. Cross validations
(10-fold) show that adding the new features to the existing feature set achieved significantly
improved results in classifying wake, REM sleep, light sleep and deep sleep (Kappa of 0.38
and accuracy of 63.8%) and in classifying wake, REM sleep and NREM sleep (Kappa of 0.45
and accuracy of 76.2%). In particular, the incorporation of these new features helps improve
deep sleep detection to a larger extent (with the Kappa coefficient increasing from 0.33 to 0.43). We
also found that calibrating the respiratory effort signals by means of body movements and
performing subject-specific feature normalization can ultimately yield enhanced classification
performance.
4.1 Introduction
According to the rules presented by Rechtschaffen and Kales (the R&K rules) [247], human
sleep is comprised of wake, rapid-eye-movement (REM) sleep and four non-REM (NREM)
sleep stages S1-S4. S1 and S2 are usually grouped as “light sleep” and S3 and S4 correspond
to slow-wave sleep (SWS) or “deep sleep” [276]. The gold standard for nocturnal sleep assess-
ment is overnight polysomnography (PSG) which is typically collected in a sleep laboratory.
With PSG, sleep stage is manually scored on each 30-s epoch throughout the night by trained
sleep experts, forming a sleep hypnogram [247]. PSG recordings usually contain multiple bio-
signals such as electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG), electromyography (EMG), respiratory effort, and blood oxygen saturation.
Respiratory information has been widely used for objectively assessing human nocturnal
sleep [95, 226, 281]. Detecting sleep stages overnight is beneficial to the interpretation of
sleep architecture or the monitoring of sleep-related disorders [102, 248]. Cardiorespiratory-based
automated sleep stage classification has been increasingly studied in recent years [158, 180,
249, 309, 312]. Some of those studies only made use of respiratory activity because, in
comparison, cardiac activity is more difficult to capture reliably in an unobtrusive
manner [158, 180]. For respiratory activity, in comparison with the breathing ventilation
acquired with traditional devices such as nasal prongs or a face mask [106], respiratory effort can
be obtained in an easier and less invasive way, e.g., using a respiratory inductance
plethysmography (RIP) sensor [73], an infrared camera [166], or a pressure-sensitive bed sheet [264].
Several parameters have been derived from respiratory effort signals for sleep analysis,
including respiratory frequency, powers of different respiratory spectral bands [249], respiratory
self-similarity [180], and regularity [250]. These parameters are usually called “features” in
the tasks of epoch-by-epoch sleep stage classification. In addition, it has been reported that the
respiratory amplitude (e.g., depth and volume) differs between sleep stages [95]. For instance,
the “respiratory depth” is more irregular and the tidal volume, minute ventilation, and inspiratory
flow rate are significantly lower during REM sleep than during NREM sleep (particularly
during deep sleep) [67, 129]. To the authors’ knowledge, these characteristics, which express different
physiological properties across sleep stages, have not been explicitly elaborated and quantified
for applications of sleep stage classification. We therefore exploit these characteristics by
analyzing the respiratory effort signal envelope and area. We design features that quantify these
characteristics, which are in turn expected to help separate different sleep stages.
It is assumed that the information about respiratory depth or volume is obtainable from the
respiratory effort signal. For instance, the signal (upper and lower) envelopes and area should
correspond to respiratory depth and volume, respectively. In fact, respiratory effort has often
been used as a surrogate of tidal volume since it is obtained by measuring motions of the rib cage or
abdomen with, e.g., RIP [73]. However, Whyte et al. [307] argued that this assumption does
not always hold, particularly when a sleeper changes his/her posture along with body
movements during sleep. This is because the respiratory effort amplitude might be affected by body
movements, as the sensor position may shift and/or the sensor may be stretched. This makes
the signal amplitudes before and after body movements incomparable, yielding errors
when computing the feature values. In order to provide a more accurate estimate of respiratory
depth and volume from the respiratory effort signal, we must calibrate the signal by means of body
movements. They can be quantified by analyzing the artifacts of the respiratory effort signal
(often in line with body movements) using a dynamic time warping (DTW)-based method [180].
DTW is a signal-matching algorithm that quantifies an optimal nonlinear alignment between
two time series allowing scaling and offset [37]. Our previous work [180] proposed a DTW
measure to effectively capture body motion artifacts by measuring the self-similarity of respiratory
effort. This measure was successfully used as a feature for classifying sleep and wake
states in that work. Therefore, we adopted this measure to detect motion artifacts
modulated by body movements in respiratory effort signals. Using the DTW-based method removes
the need for an additional sensor modality (e.g., actigraphy) specifically used for detecting
body movements.
The focus of this work is exclusively on investigating a set of novel features that
characterize respiratory amplitude in different aspects, with the ultimate goal of improving sleep
stage classification performance. Previous studies have shown that the linear discriminant (LD) is
an appropriate algorithm for sleep stage classification [179, 248, 249]. We therefore
adopted an LD classifier. Preliminary results of this work in classifying REM and NREM sleep
have been previously published [181].
Data of 48 healthy subjects (21 males and 27 females) in the SIESTA project, supported by
the European Commission [160], were included in our data set. The subjects had a Pittsburgh
Sleep Quality Index (PSQI) [60] of no more than 5 and met several criteria (no shift work, no
depressive symptoms, usual bedtime before midnight, etc.). All the subjects signed an informed
consent form prior to the study, documented their sleep habits over 14 nights, and underwent
overnight PSG study for two consecutive nights (on day 7 and day 8) in sleep laboratories. The
PSG recordings collected on day 7 were used for analyses, from which the respiratory effort
signals (sampling rate of 10 Hz) were recorded with thoracic inductance plethysmography.
Sleep stages were manually scored on 30-s epochs as wake, REM sleep, or one of the NREM
sleep stages by sleep clinicians based on the R&K rules. For sleep stage classification, epochs
were labeled as four classes, W (wake), R (REM sleep), L (light sleep), and D (deep sleep), or
three classes, W, R, and N (NREM sleep).
The subject demographics and some sleep statistics [mean ± standard deviation (SD) and range]
of the data used in this study are summarized in Table 4.1.
The raw respiratory effort signals of all subjects were preprocessed before feature extraction.
They were filtered with a 10th order Butterworth low-pass filter with a cut-off frequency of
0.6 Hz to eliminate high-frequency noise. Afterwards, the baseline was
removed by subtracting the median peak-to-trough amplitude. To locate the peaks and troughs,
we identified the turning points based on sign changes of the signal slope and then corrected
the falsely detected ‘dubious’ peaks and troughs (1) with too short intervals between peak and
trough pairs, where the sum of two successive intervals is less than the median of all intervals
over the entire recording, and (2) with too small amplitudes, where the peak-to-trough
difference is smaller than 15% of the median of the entire respiratory effort signal. These methods
were validated by comparing the automatically detected results with manually annotated peaks and
troughs, and an accuracy of ∼98% was achieved.
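The first step of the peak and trough detection (turning points from sign changes of the slope) can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `turning_points` is hypothetical, the signal is a plain list of samples, and the two correction rules for ‘dubious’ peaks and troughs are deliberately left out.

```python
def turning_points(signal):
    """Return (peaks, troughs) as lists of sample indices where the slope changes sign."""
    peaks, troughs = [], []
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]    # slope entering sample i
        right = signal[i + 1] - signal[i]   # slope leaving sample i
        if left > 0 and right <= 0:
            peaks.append(i)                 # rising, then flat or falling
        elif left < 0 and right >= 0:
            troughs.append(i)               # falling, then flat or rising
    return peaks, troughs
```

On a flat top or bottom, this sketch marks the first sample of the plateau as the turning point; other conventions are equally possible.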
A pool of 14 existing features extracted from the respiratory effort signal has been used in
previous studies for sleep stage classification. In the time domain, the mean and SD of breath
lengths (Lm and Lsd ) and the mean and SD of breath-by-breath correlations (Cm and Csd ) were
calculated [248]. In the frequency domain, we extracted features based on the respiratory effort
spectrum for each epoch where the spectrum was estimated using a short time Fourier transform
(STFT) with a Hanning window. From the spectrum the dominant frequency (Fr ) in the range of
0.05-0.5 Hz (estimated as the respiratory frequency) and the logarithm of its power (Fp ) were
obtained [248]. We also took the logarithm of the spectral power in the very low frequency
band between 0.01 and 0.05 Hz (VLF), low frequency band between 0.05 and 0.15 Hz (LF),
and high frequency band from 0.15 to 0.5 Hz (HF) and the ratio between LF and HF spectral
powers (LF/HF) [248, 249]. Furthermore the standard deviation of respiratory frequency over
5 epochs (Fsd ) was computed [249]. Non-linear features consist of self-similarity measured
between each epoch of interest and the other epochs by means of dynamic time and frequency
warping (Sdtw and Sdfw ) [180] and signal regularity estimated by sample entropy (Rse ) [250].
The latter was implemented with the PhysioNet toolkit sampen [170].
[Figure 4.1 panels: respiratory effort (a.u.) versus time (s) for REM sleep, light sleep, and deep sleep.]
Figure 4.1: A typical example of a 2-min (or 4-epoch) respiratory effort signal in wake, REM sleep,
light sleep and deep sleep. The peaks and troughs are represented by filled circles and filled squares,
respectively.
[Figure 4.2 panels: respiratory effort (a.u.); legend: inhalation, exhalation.]
Figure 4.2: A typical example of a 30-s (or one-epoch) respiratory effort signal in wake, REM sleep,
light sleep and deep sleep. The areas between the curves and the baseline are filled in light gray (inhala-
tion) and dark gray (exhalation). Examples of one breathing cycle period are indicated.
troughs should include the information in regard to respiratory depth. Let us consider
$p = p_1, p_2, \ldots, p_n$ and $t = t_1, t_2, \ldots, t_n$ the peak and trough sequences from a window of 25 epochs
or 12.5 min centered at the epoch under consideration, containing $n$ peaks and troughs,
respectively. We thus computed the standardized median of the peaks (and troughs) by dividing the
median by their interquartile range (IQR, the difference between the third and the first quartile),
such that
\[ P_{sdm} = \frac{\mathrm{median}(p_1, p_2, \ldots, p_n)}{\mathrm{IQR}(p_1, p_2, \ldots, p_n)}, \tag{4.1} \]
\[ T_{sdm} = \frac{\mathrm{median}(t_1, t_2, \ldots, t_n)}{\mathrm{IQR}(t_1, t_2, \ldots, t_n)}. \tag{4.2} \]
These two features consider the mean respiratory depth and its variability at the same time in
terms of inhalation (for peaks) and exhalation (for troughs). Note that the period length of 25
epochs was chosen to maximize the average discriminative power of all respiratory amplitude
features in separating wake, REM sleep, light sleep, and deep sleep.
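The standardized median can be sketched in a few lines. This is a minimal illustration, not the code used in this work: the function name `standardized_median` is hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not specify how the quartiles are interpolated.

```python
import statistics

def standardized_median(values):
    """Median of the values divided by their interquartile range (Q3 - Q1)."""
    # Assumption: quartiles use the "inclusive" interpolation convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the standardized median is undefined")
    return statistics.median(values) / iqr
```

Applied to the peak amplitudes within the 25-epoch window this corresponds to Psdm, and applied to the trough amplitudes to Tsdm.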
To examine how regular the envelopes are, we used the non-linear sample entropy measure,
which has been broadly used in quantifying regularity of biomedical time series [250].
Considering a time series with $n$ data points $u = u_1, u_2, \ldots, u_n$, let $v(i) = u_i, u_{i+1}, \ldots, u_{i+m-1}$
($1 \le i \le n-m+1$) be a subsequence of $u$, where the window length $m$ is a positive integer and
$m < n$. Then for each $i$, we have $B_{i,m}(r) = (n-m+1)^{-1}\,\eta(r)$, in which $\eta(r)$ is the number of $j$
such that $d_m[v(i), v(j)] \le r$ ($1 \le j \le n-m$, $j \ne i$), where the distance metric $d_m$ between two
subsequences $v(i)$ and $v(j)$ is given by $d_m[v(i), v(j)] = \max |u_{i+l} - u_{j+l}|$ for all $l = 0, 1, \ldots, m-1$.
For a higher dimension $m+1$, we have $A_{i,m}(r)$. Then the sample entropy of the time series $u$ is
defined by
\[ SE = -\ln \frac{A^m(r)}{B^m(r)}, \tag{4.3} \]
where
\[ A^m(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} A_{i,m}(r), \tag{4.4} \]
\[ B^m(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} B_{i,m}(r). \tag{4.5} \]
Similarly, the sample entropy measures of the peak and trough sequences $P_{se}$ and $T_{se}$ are
computed as
\[ P_{se} = -\ln\left[\frac{A^m_{peak}(r)}{B^m_{peak}(r)}\right], \tag{4.6} \]
\[ T_{se} = -\ln\left[\frac{A^m_{trough}(r)}{B^m_{trough}(r)}\right], \tag{4.7} \]
in which r is the tolerance that usually takes the value of 0.1-0.25 SD of the peak or the trough
sequence and m takes a value of 1 or 2 for the sequence of length n larger than 100 data points
[171, 250]. In our study, r of 0.20 SD of the sequence and m of 2 were experimentally chosen
to maximize the discriminative power of the two features.
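A minimal sketch of the sample entropy computation is given below. It is not the PhysioNet implementation: it is a direct, quadratic-time count of template matches, and the function name `sample_entropy` and the parameter name `r_factor` are hypothetical. Because the normalizing constants of the two averages cancel in the ratio, only the raw match counts are needed.

```python
import math

def sample_entropy(u, m=2, r_factor=0.2):
    """Sample entropy of sequence u with template length m and tolerance r_factor * SD."""
    n = len(u)
    mean = sum(u) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in u) / (n - 1))
    r = r_factor * sd

    def count_matches(length):
        # Count ordered pairs (i, j), i != j, whose templates of the given length
        # differ by at most r in the maximum (Chebyshev) norm. The same n - m
        # starting points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(n - m):
                if i != j and max(abs(u[i + k] - u[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches at dimension m
    a = count_matches(m + 1)   # matches at dimension m + 1
    if a == 0 or b == 0:
        raise ValueError("no template matches; sample entropy is undefined")
    return -math.log(a / b)
```

With the settings chosen in this study (m = 2, r = 0.20 SD), the function would be applied to the peak sequence for Pse and to the trough sequence for Tse. A perfectly regular sequence gives a value of zero, and less regular sequences give larger values.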
Additionally, the median of the peak-to-trough differences expresses the range of inhalation and
exhalation depths. It was computed as
period between two consecutive troughs and thereby the inhalation and exhalation periods in
this breathing cycle are separated by the peak in between these two troughs. We first computed
the median respiratory volume (expressed by respiratory effort area) measured during breathing
cycles (Vbr ), inhalation periods (Vin ), and exhalation periods (Vex ) for each epoch, such that
In addition, we computed the median respiratory “flow rate” (expressed by the respiratory
effort area over time) during breathing cycles (FRbr ), inhalation periods (FRin ), and exhalation
periods (FRex ), such that
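\[ FR_{br} = \mathrm{median}\left( \frac{1}{\tau_1^{br}} \sum_{s_x \in \Omega_1^{br}} s_x,\; \frac{1}{\tau_2^{br}} \sum_{s_x \in \Omega_2^{br}} s_x,\; \ldots,\; \frac{1}{\tau_K^{br}} \sum_{s_x \in \Omega_K^{br}} s_x \right), \tag{4.12} \]
\[ FR_{in} = \mathrm{median}\left( \frac{1}{\tau_1^{in}} \sum_{s_x \in \Omega_1^{in}} s_x,\; \frac{1}{\tau_2^{in}} \sum_{s_x \in \Omega_2^{in}} s_x,\; \ldots,\; \frac{1}{\tau_K^{in}} \sum_{s_x \in \Omega_K^{in}} s_x \right), \tag{4.13} \]
\[ FR_{ex} = \mathrm{median}\left( \frac{1}{\tau_1^{ex}} \sum_{s_x \in \Omega_1^{ex}} s_x,\; \frac{1}{\tau_2^{ex}} \sum_{s_x \in \Omega_2^{ex}} s_x,\; \ldots,\; \frac{1}{\tau_K^{ex}} \sum_{s_x \in \Omega_K^{ex}} s_x \right), \tag{4.14} \]
in which $\tau_k^{in}$ and $\tau_k^{ex}$ are the $k$th inhalation and exhalation times (unit: 100 ms)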
The ratio of the inhalation and the exhalation flow rates FRin and FRex was finally computed as
\[ RT_{fr} = \frac{FR_{in}}{FR_{ex}}. \tag{4.18} \]
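The inhalation and exhalation “flow rates” and their ratio can be sketched as below. This is an illustration under stated assumptions, not the implementation used in this work: the function name `flow_rates` is hypothetical, peaks and troughs are assumed to alternate as trough, peak, trough, and so on, and each period is taken as the half-open sample range from its first turning point up to (but excluding) the next one. Durations are counted in samples, which at a sampling rate of 10 Hz corresponds to units of 100 ms.

```python
import statistics

def flow_rates(signal, troughs, peaks):
    """Median area-over-time for inhalation and exhalation, and their ratio."""
    fr_in, fr_ex = [], []
    for k in range(len(troughs) - 1):
        start, peak, end = troughs[k], peaks[k], troughs[k + 1]
        # Inhalation: trough up to the peak; exhalation: peak up to the next trough.
        fr_in.append(sum(signal[start:peak]) / (peak - start))
        fr_ex.append(sum(signal[peak:end]) / (end - peak))
    med_in = statistics.median(fr_in)
    med_ex = statistics.median(fr_ex)
    return med_in, med_ex, med_in / med_ex
```

The sketch does not guard against a zero median exhalation flow rate, which would make the ratio undefined.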
[Figure 4.3 panels: (a) respiratory effort signal and (b) DTW measure (a.u.) with threshold, both versus time (min).]
Figure 4.3: An example of (a) an overnight respiratory effort signal and (b) the corresponding epoch-
based DTW measure, where the threshold (0.01) for identifying epochs with body movements is indi-
cated.
4.2.6 Classifier
An LD classifier was used for sleep stage classification in this study. The prior
probabilities of the different classes (i.e., sleep stages) have been observed to change over the night. To
exploit this change, we calculated a time-varying prior probability for each epoch by counting
the relative frequency with which that specific epoch index was labeled as each class [179, 248, 249].
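The time-varying prior can be sketched as a simple count over training hypnograms. This is a minimal illustration, not the implementation used in this work: the function name `time_varying_priors` is hypothetical, and recordings of different lengths are handled by counting only those that contain the epoch index in question, which is an assumption.

```python
from collections import Counter

def time_varying_priors(hypnograms, classes):
    """For each epoch index, the fraction of training recordings labeled as each class."""
    n_epochs = max(len(h) for h in hypnograms)
    priors = []
    for t in range(n_epochs):
        # Only recordings that are long enough contribute to epoch index t.
        labels = [h[t] for h in hypnograms if t < len(h)]
        counts = Counter(labels)
        priors.append({c: counts[c] / len(labels) for c in classes})
    return priors
```

In a cross-validation setting, the hypnograms passed in would be those of the training subjects only, so that the priors for a test recording do not depend on its own labels.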
4.3 Results
As shown in Figure 4.4, the respiratory amplitude features were found to significantly differ
across sleep stages. This means that the information regarding respiratory depth and volume
estimated from respiratory effort, which are indicators of some properties of respiratory phys-
iology, is not independent of sleep stages and therefore it can be in turn used to classify sleep
stages.
Figure 4.5 compares their discriminative power in separating wake, REM sleep, light sleep
and deep sleep with and without respiratory signal calibration (by means of body motion arti-
facts) and subject-specific feature normalization. Mostly, by calibrating the respiratory effort
and normalizing the features per subject, the IG values of these new features were increased.
The discriminative powers of all the 26 respiratory features for different classification tasks are
presented in Figure 4.6. We note that the respiratory amplitude features rank higher than most
existing features for multiple-stage classifications and NREM sleep detection. Psdm and Tsdm
(reflecting the variability of depth) perform better in detecting deep sleep; Pse and Tse (reflect-
ing the regularity of respiratory depth) have a relatively larger power in distinguishing between
wake and sleep. It can be seen that the volume-based features (with an exception of RTfr ) have
higher discriminative powers in detecting REM sleep.
[Figure 4.4 panels: boxplots (a.u.) per class (W, R, L, D) for Psdm, Tsdm, Pse, Tse, PTdiff, Vex, Vbr, Vin, FRex, FRbr, FRin, and RTfr.]
Figure 4.4: Boxplots of values of the 12 respiratory amplitude features (with signal calibration and
subject-specific normalization) in different classes (W, R, L and D). Outliers are not shown in order to
visualize the boxes more clearly. A significant difference was found between each pair of classes for each
feature using an unpaired Mann-Whitney test at p < 0.01.
[Figure 4.5 panels: IG per feature (Psdm, Tsdm, Pse, Tse, PTdiff, Vbr, Vin, Vex, FRbr, FRin, FRex, RTfr); panel (b): with subject-specific normalization.]
Figure 4.5: Comparison of discriminative power (as measured by IG) of all the 12 respiratory amplitude
features without and with calibrating the respiratory effort signals for WRLD classification, where the
values (a) without and (b) with subject-specific feature normalization are both presented. IG was
computed by pooling epochs over all subjects.
[Figure 4.6 panels: IG per feature for (b) WRN classification, (c) W detection, (d) R detection, (e) D detection, and (f) N detection; features on the horizontal axis in ranked order: Fsd, Tsdm, Psdm, Pse, FRbr, Vbr, Vex, FRin, Tse, FRex, Rse, Sdfw, Vin, Fr, Sdtw, LF, HF, Cm, LF/HF, Lsd, Lm, VLF, Fp, Csd, PTdiff, RTfr.]
Figure 4.6: Discriminative power of all the 26 respiratory features (with signal calibration and
subject-specific feature normalization) for (a) WRLD classification, (b) WRN classification, (c) W detection, (d)
R detection, (e) D detection, and (f) N detection. The features were ranked by IG (computed by pooling
epochs over all subjects) for WRLD classification in a descending order.
Figure 4.7 illustrates the average Cohen’s Kappa coefficient versus the number of features
(ranked and selected by IG values) used for the different classification tasks. For most tasks, the
classification performance obtained using the feature set “all” is better than that
obtained using the feature set “exist” once the number of selected features exceeds a certain
value. The overall accuracy and Kappa coefficient at the number of selected features
yielding the maximum Kappa are summarized in Table 4.2. On the one hand, normalizing
the features per subject largely increased the sleep stage classification performance for all the
classification tasks. It also shows that, to a certain extent, this method is able to reduce
between-subject variability in respiratory physiology (as seen by comparing the SDs). On the other hand,
combining the existing and the new respiratory amplitude features resulted in significantly improved
results except for wake detection. In particular, the relatively large improvement in detecting
deep sleep epochs (Kappa of 0.43 ± 0.19 versus 0.33 ± 0.17) indicates that deep sleep
detection benefits most from the new features.
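Since all results are reported as Cohen’s Kappa, a short sketch of how it is computed from epoch-by-epoch labels may be useful. The function name `cohens_kappa` is hypothetical; the formula itself is the standard one, comparing the observed agreement with the agreement expected by chance from the label frequencies of the two scorings.

```python
from collections import Counter

def cohens_kappa(reference, predicted):
    """Cohen's Kappa between two equally long label sequences."""
    n = len(reference)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa equals 1 for perfect agreement and 0 when the agreement is no better than chance, which makes it less sensitive than overall accuracy to the imbalance between sleep stages.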
Table 4.3 compares the performance of our sleep stage classifiers (for multiple stages) with
those reported in literature. For instance, Hedner et al. [127] presented a Kappa of 0.48 and an
[Figure 4.7 panels: Kappa coefficient versus number of features for (a) feature set “exist” without normalization, (b) feature set “all” without normalization, (c) feature set “exist” with normalization, and (d) feature set “all” with normalization; curves: W, R, D, N, WRLD, WRN.]
Figure 4.7: Kappa coefficient of sleep stage classification versus the number of selected features ranked
by their IG values in a descending order. Results were obtained based on 10-fold CV using feature set (a)
“exist” and (b) “all” without subject-specific feature normalization and using feature set (c) “exist” and
(d) “all” with subject-specific feature normalization. WRLD: classification of wake, REM sleep, light
sleep and deep sleep; WRN: classification of wake, REM sleep and NREM sleep; W: wake detection; R:
REM sleep detection; D: deep sleep detection; N: NREM sleep detection.
overall accuracy of 65.4% in classifying wake, REM sleep, light sleep and deep sleep, which
outperform our results, but they used more signal modalities such as peripheral arterial tone,
pulse rate, oxyhemoglobin saturation and actigraphy. With respect to WRN classification,
although Redmond et al. [249] obtained better results compared with our study, they included
more signal modalities including cardiac activity. Besides, our results are slightly better than
those reported in some other studies, e.g., a Kappa of 0.42 by Mendez et al. [197] and a Kappa of
0.44 by Kortelainen et al. [161], who used the ballistocardiogram (BCG), which also contains
cardiac information. Furthermore, when only using respiratory activity, Sloboda et al.
[277] achieved an overall accuracy of ∼70% (with 9 respiratory features using a naive Bayes
classifier), which is lower than that presented in this chapter.
4.4 Discussion
The respiratory effort signals were calibrated using the DTW-based method. The DTW measure
has been shown to be associated with body movements [180], where a significant
Spearman’s rank correlation coefficient (r = 0.32, p < 0.0001) was reported. Further, we obtained
a higher correlation (r = 0.56, p < 0.0001) between the body movements quantified using the
DTW-based method (where DTW measures lower than 0.01 were set to zero) and the activity
counts computed using actigraphy, based on the data set used in that study. We also tested
Table 4.2: Summary of sleep stage classification performance (10-fold CV) using feature set “exist”
and “all” with and without performing subject-specific feature normalization
tained with and without subject-specific feature normalization at p < 0.01 except for wake detection.
the sensitivity to the threshold and found that the discriminative power of the respiratory
amplitude features did not dramatically change when the threshold ranged between ∼0.005
and ∼0.013. To analyze the adequacy of this method for sleep stage classification, we
compared the discriminative power as well as the classification performance of these new features
between using actigraphy [181] and using the DTW-based method to calibrate the respiratory
effort signals. The results are comparable. This suggests that the DTW measure is an adequate
surrogate for actigraphy in identifying body movements and is therefore effective in mitigating
the effect of body motion artifacts on the computation of the respiratory amplitude features.
As stated in Sections 4.4.2 and 4.4.3, the respiratory amplitude features were computed with
a window of 25 epochs (12.5 min). This window served to capture the changes in respiratory depth
and volume and to provide reliable regularity measures of peak/trough sequences using sample
entropy with sufficient data points. Additionally, we hypothesized that the respiratory effort
area can accurately represent breathing tidal volume or ventilation when extracting the
respiratory volume-based features. However, this hypothesis does not always hold, in particular
for subjects who change their posture during sleep [307]. In those cases these features might
be computed inaccurately, thus harming classification performance. This challenge should be
studied further.
Table 4.3: Summary of sleep stage classification performance (10-fold CV) using feature set
“exist” and “all” with and without performing subject-specific feature normalization
Although the addition of the respiratory amplitude features resulted in enhanced performance
in WRLD and WRN classifications (Table 4.2), the improvements are relatively modest
in general. One explanation is that these new features are correlated with the existing
features, as discussed before, so the additional information is limited. On closer inspection,
we found that the new features contributed more to deep sleep detection than to the other
detection tasks. As a result, the performance improvements for multiple-stage classifications
are relatively small, since deep sleep accounts for only 14.5% of the entire night on average.
As shown in Table 4.2, the new features did not help improve wake detection. The
existing features Sdtw and Sdfw have been shown to be reliable in detecting wake epochs with
body movements in our previous study [180]. In this work, to focus on the respiratory
depth and volume properties without the influence of body movements, we excluded the
‘dubious’ peaks and troughs (see Section 4.2), some of which are possibly body motion
artifacts that often indicate wake epochs. Therefore, the new features might not
be able to help detect ‘quiet wake’ (wakefulness without body movements). Nevertheless, the
effect of body movements on the respiratory depth and volume needs to be further studied.
In addition, we observe that the variation of sleep stage classification results between sub-
jects still remains high (see Table 4.2). For instance, the average Kappa values of WRLD and
WRN classifications over all subjects are 0.38 ± 0.14 and 0.45 ± 0.15, respectively. This is
mainly caused by large physiological differences between subjects in the way sleep stages are
expressed on respiratory features, which naturally leads to difficulties in enhancing the clas-
sification performance for some subjects. Therefore, it is still worth investigating methods to
reduce the between-subject variability of the features.
In this work we selected features solely based on their discriminative power measured by
IG. This approach did not take the correlation between features into account, so
some of them are likely redundant to some extent. On average, the maximum absolute
Spearman’s rank correlation coefficient |r|max between each new feature and the existing
features is 0.35 ± 0.11 (ranging from 0.07 to 0.46 for different new features, p < 0.01). For
instance, the highest correlation (r = 0.46, p < 0.0001) occurs between Fsd and Tsdm, indicating
that the variation of respiratory frequency is correlated with respiratory depth and
its change. Hence, employing feature selectors that aim at reducing feature redundancy merits
further investigation, especially when more features are incorporated.
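The redundancy check described above can be sketched as follows. This is a minimal illustration, assuming each feature is available as a plain list of per-epoch values; the function names (`ranks`, `spearman`, `max_abs_correlation`) are hypothetical and the code does not compute p-values.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            rank[idx] = mean_rank
        pos = end + 1
    return rank


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def max_abs_correlation(new_feature, existing_features):
    """|r|max between one new feature and a list of existing features."""
    return max(abs(spearman(new_feature, f)) for f in existing_features)
```

A redundancy-aware feature selector could, for example, use such a value to penalize a candidate feature that is strongly correlated with features already selected.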
As presented in Table 4.3, our methods achieved acceptable sleep stage classification results
when using respiratory information alone. Although the results are lower than those of some
other studies, those studies used more signal modalities such as cardiac activity. We therefore
anticipate that the classification performance can be further enhanced by combining respiratory
and cardiac activity, which will be further studied. Moreover, we used only the simple LD
classifier because we focused exclusively on analyzing new features for sleep stage
classification. More advanced classification algorithms merit investigation in future work.
4.5 Conclusion
In this chapter, respiratory effort amplitude (with respect to breathing depth and volume) was
analyzed and quantified during nighttime sleep, which was found to differ across sleep stages.
Based on this, 12 novel features that characterize different aspects of respiratory effort ampli-
tude were extracted for automated sleep stage classification. To eliminate the effect of body
movements during sleep, respiratory effort signals were calibrated by using a DTW measure
which has been shown to correlate with body motion artifacts. Calibrating the signals and
normalizing the features for each subject increases the discriminative power of the features.
When using only respiratory effort signals, combining the new features proposed in
this work with the existing respiratory features (known in literature) significantly improves
the performance in classifying and identifying different sleep stages, with the exception of
wake state detection.
CHAPTER 5
This chapter is adapted from: X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R.
M. Aarts. Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep
staging. Physiological Measurement, 35(12):2529–2542, 2014.
© IOP Publishing
Abstract – Polysomnography (PSG) has been extensively studied for sleep staging, where sleep
stages are usually classified as wake, rapid-eye-movement (REM) sleep, or non-REM (NREM)
sleep (including light and deep sleep). Respiratory information has been proven to correlate
with autonomic nervous activity that is related to sleep stages. For example, it is known that the
breathing rate and amplitude during NREM sleep, in particular during deep sleep, are steadier
and more regular compared to periods of wakefulness that can be influenced by body move-
ments, conscious control, or other external factors. However, the respiratory morphology has
not been well investigated across sleep stages. We thus explore the dissimilarity of respira-
tory effort with respect to its signal waveform or morphology. The dissimilarity measure is
computed between two respiratory effort signal segments with the same number of consecu-
tive breaths using a uniform scaling distance. To capture the property of signal morphological
dissimilarity, we propose a novel window-based feature in a framework of sleep staging. Ex-
periments were conducted with a data set of 48 healthy subjects using a linear discriminant
classifier and a 10-fold cross validation. It is revealed that this feature can help discriminate be-
tween sleep stages, but with an exception of separating wake and REM sleep. When combining
the new feature with 26 existing respiratory features, we achieved a Cohen’s Kappa coefficient
of 0.48 for 3-stage classification (wake, REM sleep, and NREM sleep) and of 0.41 for 4-stage
classification (wake, REM sleep, light sleep, and deep sleep), which outperform the results
obtained without using this new feature.
72 Chapter 5. Uniform scaling dissimilarity on respiratory effort
5.1 Introduction
Previous studies have shown that characteristics of human respiratory activity are associated
with sleep stages throughout the entire night [95, 281]. Respiratory effort has been increasingly
used for objective sleep analysis [253] and sleep staging [69, 249] in contrast to traditional
polysomnography (PSG) which is considered the “gold standard” in sleep studies. This is
because respiratory activity can be acquired in an easy and unobtrusive manner using,
for example, bed sensors [161, 304], Doppler radar [194], photoplethysmography [174], or a
watch-based device [131]. Sleep consists of wake, rapid-eye-movement (REM) sleep, and four
non-REM (NREM) sleep stages S1-S4 according to the R&K rules [247]. In regard to S3 and
S4, the American Academy of Sleep Medicine (AASM) guidelines [136] and their updated
rules [38] suggest merging them into a single “deep sleep” or slow wave sleep stage. S1 and
S2 often correspond to “light sleep” [51, 276]. With PSG, sleep stages are manually scored
by sleep technicians on 30-s epochs based on multiple electrophysiological signals including
electroencephalography (EEG), electrooculography (EOG), and electromyography (EMG). The
manually scored sleep stages can be visualized in a hypnogram.
It has been reported in earlier studies that some characteristics of respiration differ across
sleep stages such as respiratory frequency [95], respiratory variability [256], different frequency
components of respiratory spectrum [249], etc. However, the dissimilarity of respiratory effort
in terms of signal waveform or morphology for different sleep stages has not been well explored.
In fact, the respiratory pattern (e.g., amplitude and frequency) has been shown to be more stable
and regular during NREM sleep (in particular during deep sleep) than during wake and REM
sleep [67, 129]. The irregularity of breathing is usually caused by body movements, alteration
of ventilation control, or behavioral factors when awake [230] and it is related to paralysis of
voluntary musculature (muscle atonia) during REM sleep [233]. We may therefore
anticipate that if a sleep stage has a higher regularity in breathing, the respiratory effort
segments within this stage have a lower dissimilarity to one another. Moreover, the respiratory
dynamics have been found to be associated with physiologic states such as sleep stages, which
distinctly correspond to autonomic regulatory mechanisms [226, 267, 292]. We therefore hypothesise
that (1) the respiratory effort is characterized by signal morphology and (2) the dissimilarity
between two respiratory effort periods is influenced by their corresponding sleep stages.
Previous research has focused on investigating respiration changes during sleep [149, 256]. For
instance, some researchers analyzed non-random variability of respiration (e.g., breath-by-breath
intervals) on short- and long-term scales [256], but with much less focus on comparing
respiratory patterns of multiple breaths. Although some parameters including breathing rate,
inspiratory/expiratory volumes, and minute volume were investigated, the respiratory morphology
was less researched.
Many methods have been utilized to compare two time series, such as cross-correlation,
detrended fluctuation analysis, and cross-approximate entropy. However, they can be limited by
several factors including the non-stationary trend of data, an insufficient number of data points
for, e.g., polynomial fitting, low relative consistency, and/or unequal length between time series
[31, 133, 250]. The idea here is to use a Euclidean-based distance as a dissimilarity metric
between two respiratory effort signal segments from a subject. When computing the distance,
each signal segment is selected inside its corresponding 30-s epoch to contain a certain number of
consecutive breaths, which serves to provide an even comparison of their signal morphology. These
signal segments are usually shorter than 30 s. The lengths (i.e., numbers of data
points) of any two signal segments generally differ, so they must be scaled to
an equal length in order to perform a Euclidean (sequential) mapping. To resolve this problem,
we propose to use a uniform scaling method [314] to re-scale the two signal segments
by searching for the minimal Euclidean distance between them. In other words, they are
uniformly ‘stretched’ to reduce the effects of varying breathing frequency to a
certain degree, so that the comparison focuses more on signal morphology.
As for automatic sleep staging, it is particularly interesting to know if different sleep stages
can be distinguished by means of respiratory effort data when the PSG-based hypnogram is
absent. This would benefit the applications of home-based sleep staging or sleep stage clas-
sification which has been attracting increasing attention in recent years [89, 182, 248, 264].
Information regarding sleep stages is usually extracted as epoch-based “features” used to
perform epoch-by-epoch classification. For this purpose, we propose a new feature to describe the
dissimilarity of respiratory effort morphology between different epochs from the same recording.
The discriminative power of this feature in classifying sleep stages is evaluated, and the
feature is expected to help improve sleep staging performance.
Forty-eight healthy subjects [21 men and 27 women; mean age 41.3 y ranging from 20 to
83, standard deviation (SD) 16.1; mean body mass index 23.6 kg·m−2 ranging from 19.1 to
31.3, SD 2.9] in the SIESTA project [160] are considered. The project was supported by the
European Commission and the subjects were monitored in seven different sleep laboratories
located in five European countries over a period of three years from 1997 to 2000. The subjects
had a Pittsburgh Sleep Quality Index [60] of less than 6 and fulfilled several criteria (e.g., no
depressive symptoms, no reported medical, neurological, mental or cardiovascular disorders,
no history of drug abuse or habituation, no psychoactive medication, no shift work, and a usual
bedtime before midnight). According to the study protocol of the SIESTA project, all subjects
provided informed consent, documented their sleep habits over 14 nights, and spent two
consecutive nights (on days 7 and 8) in the sleep laboratory [19]. More details regarding the
subject information and the study protocol can be found online (http://www.ofai.at/siesta). In
this study, we only include single-night PSG recordings (on day 7) for analysis.
Full PSG data, including multiple EEG, EOG, and EMG channels, electrocardiography (ECG),
respiratory effort, oxygen saturation, snoring, etc., were recorded for each subject and the sleep
stages were visually scored by professional sleep technicians as wake, REM, and S1-S4 on
30-s epochs according to the R&K rules. Thoracic breathing movements were measured by
respiratory inductance plethysmography (RIP) in the form of respiratory effort signals at a
sampling rate of 10 Hz. For the problem of sleep staging, we consider deep sleep (merged S3
and S4) as a single stage as suggested by the AASM guidelines. Likewise, S1 and S2
are merged into a single light sleep stage.
Referring to the statistics of normal sleepers across the human lifespan reported previously
[216], the selection of overnight recordings from a larger data set met several criteria including
the sleep efficiency of ≥75%, REM sleep of ≥15%, and deep sleep of ≥5%. The sleep data is
summarized in Table 5.1, in which mean and SD over subjects and range are presented.
The raw respiratory effort signals are first low-pass filtered (10th order Butterworth filter with
a cut-off frequency of 0.6 Hz) in order to eliminate high-frequency noise. Then the baseline is
removed by subtracting the median peak-to-trough amplitude estimated over the entire record-
ing, which serves to compute the respiratory volume-based features. These features will be
described further in Section 5.2.7. The localization of respiratory peaks/troughs is achieved by
detecting the signal turning points based on sign changes of the signal slopes. Afterwards, we
remove the falsely detected peaks/troughs (1) with too short peak-to-trough or trough-to-peak
intervals (where the sum of two successive intervals is less than the median of all intervals over
the entire recording) and (2) with too small amplitudes (where the peak-to-trough difference
is smaller than 0.15 times the median of the entire respiratory signal). These methods were
validated by comparing the automatically detected results with manually annotated peaks and
troughs and an accuracy of ∼98% was achieved.
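The turning-point detection and the amplitude-based rejection described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the input is assumed to be an already low-pass filtered signal given as a list of samples, the interval-based rejection rule (1) is omitted, and the amplitude threshold is interpreted as 0.15 times the median excursion between successive turning points, which is an assumption. The function names are hypothetical.

```python
from statistics import median


def find_turning_points(signal):
    """Locate peaks and troughs from sign changes of the signal slope."""
    peaks, troughs = [], []
    prev_slope = 0.0
    for idx in range(1, len(signal)):
        slope = signal[idx] - signal[idx - 1]
        if slope == 0.0:
            continue  # flat sample: keep the last non-zero slope
        if prev_slope > 0 and slope < 0:
            peaks.append(idx - 1)
        elif prev_slope < 0 and slope > 0:
            troughs.append(idx - 1)
        prev_slope = slope
    return peaks, troughs


def remove_small_excursions(signal, points, ratio=0.15):
    """Drop pairs of successive turning points with a too small amplitude.

    `points` holds the sorted, alternating peak/trough indices. Removing
    both points of a small excursion keeps the peak/trough alternation.
    """
    amplitudes = [abs(signal[b] - signal[a]) for a, b in zip(points, points[1:])]
    if not amplitudes:
        return list(points)
    threshold = ratio * median(amplitudes)
    kept = list(points)
    i = 0
    while i < len(kept) - 1:
        if abs(signal[kept[i + 1]] - signal[kept[i]]) < threshold:
            del kept[i:i + 2]
            i = max(i - 1, 0)  # re-check the newly adjacent pair
        else:
            i += 1
    return kept
```

For example, a small ripple on top of a breath produces one extra peak and one extra trough; both are removed, and the remaining turning points still alternate.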
Given an overnight respiratory effort recording with L epochs from a subject, the ith epoch is
expressed as U_i = {u_{i,1}, u_{i,2}, ..., u_{i,n}} (i = 1, 2, ..., L) with n data points (here n = 300 at the
signal sampling rate of 10 Hz). As explained before, we only choose a signal segment with
a certain number of consecutive breaths λ inside this epoch when computing the dissimilarity
score, so that the chosen signal segment for this particular epoch U_i is expressed by V_i =
{v_{i,1}, v_{i,2}, ..., v_{i,m_i}} with m_i data points (m_i ≤ n). The locations of v_{i,1} and v_{i,m_i} are based on the
detected respiratory peaks or troughs within this epoch so that the segment V_i contains several
complete breaths, starting and ending at two different troughs. The signal segment length m_i
depends on i because respiratory frequency usually varies between signal segments, even if
they have the same number of breaths. It also depends on the prescribed number
of breaths λ.
Let us consider two epochs U_i and U_j (i, j = 1, 2, ..., L and i ≠ j) with p_i and q_i consecutive
breaths, respectively. To ensure an equal number of breaths for an even comparison of
their dissimilarity, we set λ = min{p_i, q_i}. For the epoch with more breaths, only the λ
breaths in the middle are selected, yielding a signal segment within this epoch. Then the two
signal segments V_i and V_j (i ≠ j) with λ breaths each are normalized to zero mean and unit
variance (Z-score normalization). However, the two signal segments may have unequal lengths,
which prevents computing the Euclidean distance between them directly. To tackle this, we
utilize uniform scaling, a Euclidean-based minimization method. For V_i and V_j, assuming that
m_i ≤ m_j, a uniformly scaled series of V_i is expressed as V_i^k = {v^k_{i,1}, v^k_{i,2}, ..., v^k_{i,k}} with length k
(m_i ≤ k ≤ m_j), where v^k_{i,x} = v_{i,⌈x·m_i/k⌉} for x = 1, 2, ..., k. Hence, the dissimilarity score dscore
between U_i and U_j is the uniform scaling distance dus between V_i and V_j, which can be obtained
by minimizing the Euclidean distance subject to m_i ≤ m_j, such that
$$ d_{\mathrm{score}}(U_i, U_j) \equiv d_{\mathrm{us}}(V_i, V_j) = \min_{m_i \le k \le m_j} \sqrt{\frac{1}{k} \sum_{x=1}^{k} \left( v^{k}_{i,x} - v_{j,x} \right)^{2}}. \qquad (5.1) $$
Since the k-space Euclidean distance metric is sensitive to the series length k, which takes
different values in Equation 5.1, the distance is normalized by k. Figure 5.1
depicts an example of computing the dissimilarity score dscore between two epochs. Note that
dscore is computed within each recording (or subject for the single-night data) to avoid the effect
of between-subject variability, often caused by physiological differences from
subject to subject.
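As an illustration of Equation 5.1, the following minimal Python sketch computes the uniform scaling distance between two signal segments given as lists of samples. The function names (`zscore`, `stretch`, `uniform_scaling_distance`) are hypothetical, and the exhaustive search over every k is an assumption made for clarity; it is not necessarily how the computation was implemented in this work.

```python
import math


def zscore(segment):
    """Normalize a segment to zero mean and unit variance."""
    n = len(segment)
    mean = sum(segment) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    if std == 0:
        return [0.0] * n  # constant segment: no morphology to compare
    return [(v - mean) / std for v in segment]


def stretch(segment, k):
    """Uniformly scale a segment of length m to length k.

    Element x (1-based) of the result is segment[ceil(x * m / k)],
    computed with integer arithmetic to avoid rounding issues.
    """
    m = len(segment)
    return [segment[(x * m + k - 1) // k - 1] for x in range(1, k + 1)]


def uniform_scaling_distance(seg_a, seg_b):
    """Minimal length-normalized Euclidean distance over all scalings k."""
    shorter, longer = sorted((seg_a, seg_b), key=len)
    best = math.inf
    for k in range(len(shorter), len(longer) + 1):
        scaled = stretch(shorter, k)
        # zip stops after k items, so only the first k points of the
        # longer segment enter the sum, as in Equation 5.1.
        squared = sum((s - l) ** 2 for s, l in zip(scaled, longer))
        best = min(best, math.sqrt(squared / k))
    return best
```

In use, both segments would first be passed through `zscore`, for example `uniform_scaling_distance(zscore(v_i), zscore(v_j))`, where `v_i` and `v_j` are the segments containing λ breaths each.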
It is of interest to extract a feature for each 30-s epoch to capture the dissimilarity property of
respiratory effort morphology. This feature can in turn be used to separate different sleep stages.
To do so, we compute the mean dissimilarity score between each epoch and the other epochs
from the same recording within a window. We name this the windowed (self-)dissimilarity feature
and denote it as Dwin henceforth. We expect that this feature is not independent of sleep stage
and thus that it is informative for sleep staging. For the ith epoch U_i of a given subject, it is computed as
$$ D_{\mathrm{win}}(U_i) = \frac{\sum_{j} d_{\mathrm{score}}(U_i, U_j)}{\min(w, i-1) + \min(w, L-i)}, \quad \text{for } |j - i| \le w \text{ and } j \ne i, \qquad (5.2) $$
[Figure 5.1: panels (a)-(c); x-axis: Sample (at 10 Hz); y-axis: Resp. effort (a.u.); traces V_i and V_j, with V_i^k and V_j in panel (c).]
Figure 5.1: An example of computing the dissimilarity score of respiratory effort between two epochs:
(a) original signals U_i and U_j at 10 Hz within 30-s epochs; (b) selected signal segments V_i and V_j with
5 consecutive breaths, where series lengths are unequal; (c) uniformly scaled series V_i^k and V_j, where k
equals the length of V_j. Note that the signal segments in (a) and (b) are normalized to have zero mean
and unit variance.
in which L is the total number of epochs for this specific subject and w = 1, 2, ..., L is the
(single-side) size of the window centered at U_i. This means that Dwin is a feature with a certain
time (or window) scale. The window size w is determined by maximizing the feature discriminative
power. Intuitively, the majority of the epochs contained within a small window should
be in the same sleep stage as the given epoch. This can be examined by comparing the percentage
of occurrence of different sleep stages versus the time difference ∆ between epochs. We
also analyze the changes of dscore for ‘self-comparisons’ versus ∆, where dscore is computed between
epochs with the same sleep stage (i.e., wake-wake, REM-REM, light-light, and deep-deep).
To reduce noise at the feature level caused by measurement errors or body motion artifacts, Dwin is
smoothed over the entire-night recording using a moving average method (with a 10-min span).
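Equation 5.2 and the subsequent smoothing can be sketched as follows. This is a minimal illustration under stated assumptions: `dscore` is a hypothetical callable returning the dissimilarity score of two epochs addressed by 0-based index, a recording has at least two epochs, and the moving average uses a centered window of `span` epochs on either side combined (20 epochs for roughly 10 min), since the exact alignment of the smoothing window is not specified here.

```python
def windowed_dissimilarity(n_epochs, dscore, w=25):
    """Compute Dwin for every epoch of one recording.

    The number of neighbours of epoch i equals
    min(w, i - 1) + min(w, L - i) in the 1-based notation of Equation 5.2.
    """
    feature = []
    for i in range(n_epochs):
        first = max(0, i - w)
        last = min(n_epochs - 1, i + w)
        neighbours = [j for j in range(first, last + 1) if j != i]
        total = sum(dscore(i, j) for j in neighbours)
        feature.append(total / len(neighbours))
    return feature


def moving_average(values, span=20):
    """Smooth with a centered window, truncated at the recording edges."""
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Near the start and end of the night the window is truncated, so the mean is taken over fewer epochs, which matches the denominator in Equation 5.2.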
For the windowed dissimilarity feature Dwin, we first compare its mean value and SD over
all subjects between sleep stages. In addition, we compute its discriminative power for
sleep staging using the one-way analysis of variance (ANOVA) F-statistic, where a larger
F-statistic indicates a higher discriminative power. The F-statistic of Dwin is then compared
with that of the existing features by ranking it among all the features. Using a
Quantile-Quantile (Q-Q) plot method, the values of Dwin in each sleep stage were found to
approximately follow a normal distribution.
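For reference, the one-way ANOVA F-statistic used as the measure of discriminative power can be sketched as follows. This is the textbook definition (between-group mean square divided by within-group mean square); the function name `anova_f` and the input layout, one list of feature values per sleep stage, are assumptions for illustration.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of feature values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A feature whose values differ strongly between stages relative to their spread within each stage yields a large F-statistic and is therefore ranked higher.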
As stated, the new feature Dwin can be incorporated to perform automatic sleep staging when
solely using respiratory effort data. A set of 26 existing respiratory features has been used to
classify sleep stages in previous studies. They comprise features in both the time and frequency
domains [248], respiratory depth- and volume-based features [182], and non-linear features based
on sample entropy [75] and dynamic warping [180]. Table 5.2 lists and describes all the
respiratory features. To examine whether Dwin can help achieve an enhanced classification
performance, we compare the classification results with and without adding it to the existing
feature set. Note that for the purpose of reducing between-subject variability in respiration,
all the features are normalized (Z-score) for each overnight recording.
We simply adopt a linear discriminant (LD) classifier, which has been widely used for the
task of sleep staging [89, 108, 182, 249]. The data, comprising 48 entire-night recordings, are
randomly divided into 10 subsets (folds), each consisting of four or five recordings, and
sleep staging is then executed iteratively using a 10-fold cross-validation (CV). During each
iteration, the classifier is trained on nine folds and validated on the remaining one in order to
minimize the classifier bias.
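The recording-wise partitioning can be sketched as follows. This is a minimal illustration, assuming recordings are identified by integers; the function names and the fixed random seed are hypothetical choices. Splitting by recording rather than by epoch keeps all epochs of one subject in the same fold.

```python
import random


def split_recordings(recording_ids, n_folds=10, seed=0):
    """Randomly assign whole recordings to folds of near-equal size."""
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)
    # Taking every n_folds-th recording gives fold sizes that differ by at most one.
    return [ids[f::n_folds] for f in range(n_folds)]


def cv_iterations(folds):
    """Yield (training recordings, validation recordings) for each fold."""
    for held_out in range(len(folds)):
        train = [r for f, fold in enumerate(folds) if f != held_out for r in fold]
        yield train, folds[held_out]
```

With 48 recordings and 10 folds, this yields eight folds of five recordings and two folds of four.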
To evaluate the classifier, we use Cohen’s Kappa coefficient κ [72] in addition to overall
accuracy because it is more appropriate for analyzing unbalanced data (in our case light sleep
accounts for 53.6%, which is much larger than the share of any other stage). The prior
probabilities of different sleep stages may change over the night. To exploit this in an LD
classifier, we compute a time-varying prior probability (TVPP) for each epoch by counting the
relative frequency of occurrence of each sleep stage at its corresponding time of the night based
on the associated training data. More details about TVPP can be found elsewhere [249]. Here we
present results for two sleep staging schemes, including 4-stage classification (wake, REM sleep,
light sleep, and deep sleep) and 3-stage classification (wake, REM sleep, and NREM sleep).
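To make the choice of metric concrete, Cohen's Kappa can be computed from a confusion matrix as sketched below. The function name `cohens_kappa` and the matrix layout (rows are reference stages, columns are predicted stages) are assumptions for illustration.

```python
def cohens_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (list of rows)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of epochs on the diagonal.
    observed = sum(confusion[c][c] for c in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[c]) * sum(row[c] for row in confusion)
        for c in range(n_classes)
    ) / total ** 2
    return (observed - expected) / (1 - expected)
```

For instance, a classifier that labels every epoch as the majority stage in the matrix `[[0, 10], [0, 90]]` reaches an accuracy of 90% but a Kappa of 0, which is why Kappa is preferred for unbalanced data.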
5.3 Results
The (single-side) window size w of 25 epochs was experimentally found to be an appropriate
value when computing the new feature Dwin, as it maximized the feature discriminative power in
classifying wake, REM sleep, light sleep, and deep sleep. Figure 5.2 compares the
percentage of occurrence of different sleep stages as a function of ∆. The figure indicates that
self-comparisons are more likely if |∆| is smaller than a certain value (e.g., ∼30 epochs
for wake, REM sleep, and deep sleep). It also illustrates that the comparison between each sleep
stage and light sleep dominates if |∆| is larger than that value. These graphs imply that, for our
choice of w = 25 epochs, the feature values of Dwin depend more on the self-comparisons. As
shown in Figure 5.3, in regard to the self-comparisons, we observe that different sleep stages
can be separated by the dissimilarity score within the 25-epoch window, except for
wake and REM sleep, where overlaps occur.
Figure 5.4 compares the feature values of Dwin in different sleep stages (mean ± SD and
histogram), in which the separation can be observed between sleep stages, particularly be-
tween deep sleep and the other stages and between REM and NREM sleep. An example of an
overnight hypnogram and the corresponding Dwin values from a 50-year-old female are illus-
trated in Figure 5.5, where the correlation between them can be seen. Table 5.3 presents the
discriminative powers (as measured by ANOVA F-statistic) of Dwin in separating different sleep
stages. For comparison, we also provide its ranking among all features as well as the top-10
ranked features (in a descending order in terms of F-statistic) in the table.
The respiratory effort-based sleep staging results using the feature set with and without Dwin
are compared in Table 5.4, where the overall accuracy and the Cohen’s Kappa coefficient are
reported. It is noted that combining Dwin with the existing features resulted in a significantly
[Figure 5.2: four panels (a)-(d), with panels (c) Light and (d) Deep labeled; x-axis: ∆ (30-s epoch), from −200 to 200 with ±25 marked; y-axis: Percentage (%).]
Figure 5.2: The probability of occurrence of different sleep stages versus time difference ∆ for (a) wake,
(b) REM, (c) light, and (d) deep sleep epochs. The boundary of the 25-epoch window for computing
Dwin is indicated (dashed line). For all stages, the light sleep percentage is larger than that of any
other stage when |∆| > ∼30 epochs.
[Figure 5.3: y-axis: dscore (a.u.); x-axis: |∆| (30-s epoch), from 0 to 100; curves: Wake-wake, REM-REM, Light-light, Deep-deep.]
Figure 5.3: Mean dissimilarity score dscore versus absolute time difference |∆| for self-comparisons
wake-wake, REM-REM, light-light, and deep-deep. The boundary of the 25-epoch window for comput-
ing Dwin is indicated (dashed line).
[Figure 5.4: (a) Dwin by sleep stage (Wake, REM, Light, Deep); (b) histogram over Dwin (a.u.).]
Figure 5.4: Comparison of the windowed dissimilarity feature Dwin in different sleep stages: (a)
mean ± SD and (b) normalized histogram (i.e., percentage, %).
[Figure 5.5: (a) hypnogram with levels Wake, REM, Light, Deep; (b) Dwin (a.u.).]
Figure 5.5: An example of (a) overnight annotation and (b) feature values of Dwin from a 50-year-old
female, where the unsmoothed (gray) and smoothed (black) feature values are both shown.
increased κ of 0.41 at an overall accuracy of 64.9% when classifying four sleep stages and
of 0.48 at an overall accuracy of 77.1% when classifying three sleep stages (both with TVPP).
The table also shows the results obtained without applying TVPP, indicating that using TVPP
can help achieve significantly better results. Here the significance was checked with a two-sided
Wilcoxon signed-rank test. To understand what aspects of sleep staging the new feature
improves, we present the confusion matrices obtained with and without Dwin in Table 5.5 (for
4-stage classification) and in Table 5.6 (for 3-stage classification), where TVPP was applied.
Table 5.3: Discriminative power of Dwin in separating different sleep stages as evalu-
ated and ranked by ANOVA F-statistic. Results are pooled over all subjects
Table 5.4: Ten-fold CV results of 4-stage (wake, REM sleep, light sleep and deep
sleep) and 3-stage (wake, REM sleep, and NREM sleep) classification schemes ob-
tained using the feature set with and without Dwin , where the results obtained with
and without using TVPP are also presented
Table 5.5: Confusion matrix of 4-stage classification (10-fold CV) obtained using fea-
ture set with and without Dwin , where the results without Dwin are given in parentheses
5.4 Discussion
We investigated the use of respiratory effort dissimilarity over several consecutive breaths (as
measured by a uniform scaling distance) to characterize the regulation of breathing within
different sleep stages. On average, we observe the lowest dissimilarity score between
two deep sleep epochs. This is because respiratory effort during NREM sleep (in particular during
deep sleep) is steadier and more regular compared with that during wake and REM sleep
as mentioned before. As illustrated in Figure 5.3, the discrimination between wake and REM
sleep in terms of respiratory effort dissimilarity is not consistent over the time difference and
seems maximized at |∆| beyond 40 epochs. With smaller time differences, overlap can be observed
between the dissimilarity scores for wake-wake and REM-REM comparisons. During wake,
breathing might be somewhat less affected by conscious control as well as body movements
or other external influences in a short range (e.g., with a |∆| of less than 10 epochs or 5
minutes). This would decrease the dissimilarity scores of wake-wake comparisons within that
range, making it difficult to distinguish between wake and REM sleep. As a result,
the windowed dissimilarity feature Dwin has a low discriminative power in separating wake and
REM sleep as shown in Table 5.3. In fact, classifying wake and REM sleep might sometimes
be difficult even with PSG-based visual scoring [276].
In this work, we chose the window size w of 25 epochs to compute Dwin by globally maxi-
mizing the feature discriminative power in classifying wake, REM sleep, light sleep, and deep
sleep. However, it might not be the optimal choice all the time, particularly in separating wake
and REM sleep (see Figure 5.3). The optimal window size might vary when classifying dif-
ferent sleep stages. Therefore, we think that using an adaptive window size to discriminate
between different sleep stages merits further investigation.
Regarding sleep staging, the new feature Dwin helped improve the classification performance
(Table 5.4) and it contributed more to the detection of REM and deep sleep from the other sleep
stages (Table 5.5). It is therefore suggested that this feature contains additional information
that is not carried by the existing features. We also reveal that using TVPP can lead to better
classification results, as shown in Table 5.4. With cardiorespiratory activity, a κ of 0.46 and an
overall accuracy of 76.1% were achieved when classifying wake, REM sleep, and NREM sleep
for 31 healthy subjects [249]. We obtained slightly better results with the use of the respiratory
information alone. For 4-stage classification (wake, REM sleep, light sleep, and deep sleep),
a κ of 0.48 and an overall accuracy of 65.4% (re-computed based on the reported confusion
matrix) were achieved by Hedner et al. [127], which outperform our results. However, they
employed more signal modalities including peripheral arterial tone, pulse rate, oxyhemoglobin
saturation, and actigraphy. In a more recent study, Willemen et al. [309] reported a κ of 0.56
(at an accuracy of 69%) for 4-stage classification using cardiorespiratory and body movement
features, whereas they considered an epoch of 60 s instead of the standard 30 s used in most
studies with respect to sleep staging. Nevertheless, we anticipate that combining respiratory
and cardiac activity will result in a performance enhancement on sleep stage classification and
this will be further studied.
The PSG-based sleep stages were manually scored based on the R&K rules in the SIESTA
database. However, it has been reported that the overall inter-scorer agreement using the new
AASM standard is slightly higher than that obtained using the R&K rules [81]. Therefore, the
AASM standard is suggested to be applied for PSG-based sleep stage scoring in future work,
which is expected to deliver more reliable annotations of overnight sleep stages used for the
task of respiratory-based sleep stage classification.
This study only considered healthy subjects without any reported medical, neurological,
mental, or cardiovascular diseases as mentioned before. However, for patients with sleep-
disordered breathing (e.g., sleep apnea/hypopnea) or other respiratory abnormalities, abnormal
respiratory events during the night can affect measuring the dissimilarity between respiratory
effort signals. Therefore, the approach described in this work needs to be tested further for
these patients. In addition, it has been shown that the respiratory effort is more sensitive to
changes of sleep posture and body movements during sleep in comparison with measurements
by nasal cannulas [307]. In that case, Dwin might be erroneously calculated, thus harming the
classification performance. However, for the dissimilarity measure described in this work, the
effect of sleep posture might be eliminated since it was computed by comparing each respi-
ratory signal segment with its adjacent segments where the same sleep posture was expected.
Moreover, the dissimilarity measure focused on comparing signal morphology with a certain
number of breaths, where the falsely detected peaks and troughs (often corresponding to body
movements) were removed. As a result, the influences of sleep posture and body movements
should be diminished to some extent. Despite that, those influences merit further investigation.
5.5 Conclusion
In this chapter, by analyzing overnight respiratory effort from healthy subjects, we found that
sleep stages can be differentiated using a dissimilarity measure. This measure expresses the
dissimilarity between respiratory effort signals in their morphology. The dissimilarity can be
evoked by autonomic activity, alteration of ventilation control, or other external factors. A
new feature was extracted based on the properties of respiratory effort dissimilarity. Although
it performed worse than an existing feature (standard deviation of respiratory frequency), it can
help improve the performance of sleep staging when combined with all 26 existing respiratory
features (except for detecting wake from REM sleep). This indicates that this new feature
contains additional information that is not carried by the existing features for sleep staging.
CHAPTER 6
This chapter is adapted from: X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling
cardiorespiratory interaction during sleep with complex networks. Applied Physics Letters, 105:203701,
2014.
Abstract – Human sleep comprises several stages including wake, rapid-eye-movement (REM)
sleep, light sleep, and deep sleep. Cardiorespiratory activity has been shown to correlate with
sleep stages due to the regulation of the autonomic nervous system. Here the cardiorespiratory
interaction (CRI) during sleep is analyzed using a visibility graph (VG) method that represents
the CRI time series in complex networks. We demonstrate that the dynamics of the interac-
tion between heartbeats and respiration can be revealed by VG-based networks, whereby sleep
stages can be characterized and differentiated.
86 Chapter 6. Cardiorespiratory interaction in complex networks
6.1 Introduction
Human sleep is considered a complex biological process with its own internal architecture
expressed by sleep stages [63, 281]. Sleep stages can be typically separated based on pat-
terns observed in standard polysomnography (PSG) recordings including electroencephalogra-
phy (EEG), electromyography (EMG), and electrooculography (EOG) [136, 247]. With PSG,
sleep stages are manually scored on continuous and non-overlapping epochs (lasting 30 s each)
as wake, rapid-eye-movement (REM) sleep, and several non-REM (NREM) sleep stages for
adults. This is usually done by trained sleep technicians according to either the recommenda-
tions provided by Rechtschaffen and Kales (R&K) [247] or using the more recent guidelines of
the American Academy of Sleep Medicine (AASM) [136]. NREM sleep can be further divided
into stages S1-S4 based on the R&K rules, or stages N1-N3 based on the AASM guidelines.
S1 and S2 (or N1 and N2) are associated with ‘light sleep’. S3 and S4 (or N3) correspond
to slow-wave sleep or ‘deep sleep’. For normal subjects, sleep usually starts with light sleep
and then deep sleep with REM sleep following [63]. This sequence is called a sleep cycle and
occurs about every 90 minutes, four to six times per night [63, 243].
Cardiorespiratory activity has been shown to exhibit different characteristics across sleep stages due to
the manifestation of autonomic (sympathetic and vagal) nervous activity [13, 281, 292]. Re-
cently, dynamics of heartbeats and respiration during sleep have been extensively described
[54, 179, 182, 222, 225, 226]. In particular, characteristics of cardiorespiratory interaction
(CRI) or coupling during sleep have attracted more and more attention since they can be used
to provide means to clinically diagnose sleep-related disorders or to identify sleep stages for ob-
jective sleep assessment [29, 30, 147, 266]. For example, Bartsch et al. [30] proposed methods
based on Hilbert-Huang transform (HHT) and detrended fluctuation analysis (DFA) to quantify
and analyze cardiorespiratory phase synchronization in different sleep stages.
In recent years, exploration of a time series has been extended to a two-dimensional complex
network with encoded information stored in the time series, aiming at better exploiting its
dynamics or properties [6, 96, 188, 313, 320]. Lacasa et al. [169] proposed a nonlinear visibility
graph (VG) method in order to describe a time series in a graph based on specific geometric
criteria. They found that random, fractal, and periodic time series correspond to networks with
exponential, scale-free, and regular characteristics, respectively, which means that VG is an
adaptive method for investigating different types of time series.
Some studies have analyzed human physiological activity by means of VG-based networks
[144, 272, 321]. For example, heartbeat dynamics in VG-based networks have been investigated
for healthy subjects and patients with congestive heart failure [272] and for subjects with meditation
training [144]. In the field of sleep, recent work has shown that sleep stages can
be identified using parameters extracted from EEG signals based on VG-based algorithms
[321, 322]. However, the characteristics of the interaction between cardiac and respiratory
activities in a two-dimensional network during sleep have not been studied. Modeling these characteristics
during sleep will potentially benefit the cardiorespiratory-based classification of sleep
stages. Therefore, we investigate the CRI dynamics across sleep stages in complex networks
using the VG-based method.
Figure 6.1: An example of using (a) a 15-s ECG signal and (b) the corresponding respiratory effort
signal to obtain (c) a CRI time series. [Panels: (a) ECG (a.u.) with an RR interval marked,
(b) Resp. (a.u.), (c) CRI (a.u.); horizontal axis: Time (s), 330 to 345.]
We consider 330 overnight PSG recordings from 165 healthy subjects (87 males) from the
SIESTA database [160]. Each subject spent two consecutive nights in a sleep laboratory. The
subjects had an average age of 51.8 ± 19.4 y [mean ± standard deviation (SD)] and an average
total recording time of 7.8 ± 0.5 h. According to the SIESTA study protocol, they met several
criteria such as no reported symptoms of neurological, mental, medical, or cardiovascular dis-
orders, no sleep-related disorders, no shift work, and usual bedtime between 22:00 and 24:00.
The PSG recordings were visually scored on 30-s epochs by two independent raters based on
the R&K rules and in case of disagreement, a consensus annotation was obtained.
Here for each 30-s epoch, the location of individual heartbeats is identified by applying
the Hamilton-Tompkins R-peak detector [124] followed by a slope-based QRS localization
method [107] on the ECG signal with a window of 7 epochs (3.5 min) centered on the epoch of
interest. This window serves the purpose of including sufficient data points to capture changes
in heartbeat (or RR) intervals [288]. Afterwards, the corresponding respiratory effort at the
time stamps of the heartbeats is sampled. The resulting CRI time series is then used for VG
analysis. Figure 6.1 illustrates an example of the computation of a CRI time series from its
corresponding ECG signal and respiratory (effort) signal.
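As a rough illustration of this sampling step, the sketch below reads the respiratory effort at each detected R-peak time by linear interpolation. The interpolation scheme, the function name `sample_at`, and the toy signals are assumptions made for illustration; the text only states that the respiratory effort is sampled at the time stamps of the heartbeats.

```python
from bisect import bisect_right


def sample_at(times, values, query_times):
    """Linearly interpolate the signal (times, values) at each query time.

    `times` must be strictly increasing. Queries outside the recorded range
    are clamped to the first or last sample (an illustrative choice).
    """
    out = []
    for t in query_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


# Toy respiratory effort sampled at 10 Hz (a ramp, so values are easy to check).
fs = 10.0
resp_t = [k / fs for k in range(31)]   # 0.0 ... 3.0 s
resp_x = [2.0 * t for t in resp_t]     # effort in arbitrary units
r_peaks = [0.25, 1.05, 1.9, 2.75]      # hypothetical R-peak times (s)

# One CRI value per heartbeat.
cri = sample_at(resp_t, resp_x, r_peaks)
```

The resulting list has one value per heartbeat, so its sampling is non-uniform in time and follows the RR intervals.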
In this work, we apply the VG method to build complex networks for modeling a CRI time series
and to analyze its dynamics across different sleep stages for healthy subjects. To formalize
the VG method, let us consider a time series with n data points $\{x_k, t_k\}_{k=1,2,\ldots,n}$. Two data points
$(x_i, t_i)$ and $(x_j, t_j)$ are connected as vertices or nodes through an undirected edge if and only if
the following rule [169] is satisfied

$$\forall \ell \in (i, j):\quad x_\ell < x_j - (x_j - x_i)\,\frac{t_j - t_\ell}{t_j - t_i}. \tag{6.1}$$

Intuitively, this means that the two data points are connected if they are able to ‘see’ each
other (i.e., the linear interpolation between their values is always larger than the value of
each intermediate data point). The time series can therefore be converted into a VG by applying
this rule on all the data points, resulting in an associated complex network whose edges
link the nodes. Figure 6.2 illustrates an example of converting a time
series x into a VG-based network. For each node, its degree δ is defined as the number of
edges attached to it, giving a heuristic indication of the network’s complexity. Thus, the degree
distribution of the nodes P(δ) can be used to characterize the time series.

Figure 6.2: An example of converting (a) a time series segment with 7 data points into (b) a network
using the VG method, where the respective degrees of nodes from $x_{i+1}$ to $x_{i+7}$ are 4, 3, 3, 5, 3, 2, and 4.
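The visibility rule in Eq. (6.1) can be sketched with a direct pairwise check. This is a minimal, unoptimized illustration (cubic time in the worst case), not the implementation used in this work; the function name `visibility_degrees` and the toy series are hypothetical, and the population SD used for the degree variation is an assumption.

```python
import statistics


def visibility_degrees(x, t=None):
    """Return the degree of every node in the visibility graph of series x.

    Two points i < j are linked when every intermediate point lies strictly
    below the straight line joining them, as in Eq. (6.1).
    """
    n = len(x)
    if t is None:
        t = list(range(n))  # unit spacing when no time stamps are given
    degree = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                x[l] < x[j] - (x[j] - x[i]) * (t[j] - t[l]) / (t[j] - t[i])
                for l in range(i + 1, j)
            )
            if visible:  # adjacent points are always linked (no intermediates)
                degree[i] += 1
                degree[j] += 1
    return degree


degrees = visibility_degrees([1, 3, 2, 4, 1])
mean_degree = statistics.mean(degrees)   # mean degree of the network
degree_sd = statistics.pstdev(degrees)   # degree variation (population SD assumed)
```

For real CRI series, passing the heartbeat time stamps as `t` respects the non-uniform sampling, since the rule depends on the time differences.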
A total of 310,503 epochs (including 19.2% wake, 15.2% REM sleep, 53.5% light sleep, and
12.1% deep sleep) are analyzed in this work. Figure 6.3 plots the node degree distribution of
CRI, denoted as P(δ ), pooled over all epochs for each sleep stage (wake, REM sleep, light
sleep, and deep sleep). As illustrated, the degree distribution P(δ ) for each sleep stage follows
a power-law topology such that $P(\delta) \sim \delta^{-\lambda}$, in particular when the degree is large (e.g., δ > 4).
The power λ is shown to differ across sleep stages (wake: λ = 3.7, REM sleep: λ = 3.8, light
sleep: λ = 4.1, and deep sleep: λ = 4.2). As reported in the literature, a power-law topology should
correspond to scale-free dynamics [14, 211, 286], suggesting that the CRI time series during
Figure 6.3: Log-log plot of degree distribution P(δ) of CRI during wake, REM sleep, light sleep, and
deep sleep. P(δ) follows a power-law topology when δ is larger than 4, such that $P(\delta) \sim \delta^{-\lambda}$ with λ of
3.7 for wake, 3.8 for REM sleep, 4.1 for light sleep, and 4.2 for deep sleep. [Axes: degree δ
(horizontal) versus degree distribution P(δ) (vertical).]
Figure 6.4: Log-log plot of mean degree distribution P(δm) of CRI during wake, REM sleep, light sleep,
and deep sleep. P(δm) follows a power law [$P(\delta_m) \sim \delta_m^{-\lambda}$] when δm ≥ 6, with λ of 6.8 for wake, 7.2 for
REM sleep, 8.0 for light sleep, and 8.6 for deep sleep. [Axes: mean degree δm (horizontal) versus
mean degree distribution P(δm) (vertical).]
Figure 6.5: Mean degree δm of the CRI time series networks (mean and SD) in different sleep stages
(wake, REM sleep, light sleep, and deep sleep). A Mann-Whitney test shows significant differences
between all pairs of sleep stages, with p < 0.0001.
a specific sleep stage are non-stationary and fractal [169]. In addition, we also observe that the
VG-based networks of CRI for wake epochs have a higher percentage of high-degree nodes (the
networks have a higher complexity) compared with other sleep stages, such as deep sleep, which
has the fewest high-degree nodes in the associated networks. A possible explanation for this is
that the CRI time series is noisier (caused by the weaker coupling between cardiac and
respiratory signals) during wake and it is more regular (due to the stronger cardiorespiratory
coupling) during deep sleep when compared with the other stages [30, 188]. Consequently, the
CRI time series are more irregular for wake epochs while they are more regular for deep sleep
epochs. The ‘blur’ in the figure at large values of δ might be due to the presence of outliers in
CRI time series caused by loose cables during measurement or body motion artifacts.
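The exponent λ of a power-law degree distribution can be estimated in several ways. The sketch below uses an ordinary least-squares line in log-log coordinates, restricted to degrees above a threshold. The fitting procedure used in this work is not stated, so this estimator, the function name `power_law_exponent`, and the synthetic degrees are assumptions for illustration only.

```python
import math
from collections import Counter


def power_law_exponent(degrees, min_degree=5):
    """Estimate lambda in P(delta) ~ delta**(-lambda).

    Fits a least-squares line to log10 P(delta) versus log10 delta using only
    degrees >= min_degree (min_degree=5 corresponds to delta > 4).
    """
    counts = Counter(degrees)
    total = len(degrees)
    pts = [
        (math.log10(d), math.log10(c / total))
        for d, c in counts.items()
        if d >= min_degree
    ]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees at or above min_degree")
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    slope = sum((px - mx) * (py - my) for px, py in pts) / sum(
        (px - mx) ** 2 for px, _ in pts
    )
    return -slope  # the slope of the log-log line is -lambda


# Synthetic degrees whose counts fall off exactly as delta**(-2).
toy_degrees = [5] * 16 + [10] * 4 + [20] * 1
lam = power_law_exponent(toy_degrees)
```

Least-squares fits on log-log histograms are known to be sensitive to sparse tails, which is one reason the noisy region at large δ deserves caution.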
Since the degree is different between different sleep stages, it can be used to distinguish them
on an epoch-by-epoch basis. For this purpose, the mean degree δm for each epoch (computed
by averaging the degrees over the nodes with a window of 7 epochs centered on that epoch) can
be used as a quantification of the network ‘complexity’ of the CRI time series in VG for each
epoch. Figure 6.4 shows the distribution of δm for different sleep stages where the separations
between sleep stages can be clearly observed, in particular when the mean degree is smaller
than 3 or larger than 6. These results are similar to those obtained based on the analysis of
EEG signals [321]. In Figure 6.5, the δm values in different sleep stages are compared. Using
a two-tailed Mann-Whitney test, δm is found to be significantly different between each pair of
sleep stages (all with p < 0.0001). This means that, on average, wake epochs have the highest
mean degree in the networks followed by REM sleep epochs, then by light sleep and finally by
deep sleep. Moreover, if we consider the degree variation δsd , computed as the SD of the node
degrees in each epoch, we also find statistically significant differences between sleep stages
(all with p < 0.0001) as illustrated in Figure 6.6. The Spearman’s rank correlation coefficient
r between these two parameters δm and δsd is found to be high [r = 0.733, p < 0.00001; 95%
Figure 6.6: Degree variation δsd of the CRI time series networks (mean and SD) in different sleep stages
(wake, REM sleep, light sleep, and deep sleep). A Mann-Whitney test shows significant differences
between all pairs of sleep stages, with p < 0.0001.
Another important property of a network is its assortative mixing [210], which has been widely
used to analyze many real-world networks such as biological [100], neural [86], and social
networks [212]. For a node in a network, it takes the preference of its connections to high- or
low-degree nodes into account. Considering a network including a total of M edges, the ith
edge connects two nodes with degree of αi and βi at their ends. The assortativity coefficient ζ
of this network [210] is given by
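$$\zeta = \frac{M^{-1}\sum_i \alpha_i \beta_i - \left[M^{-1}\sum_i \tfrac{1}{2}(\alpha_i + \beta_i)\right]^2}{M^{-1}\sum_i \tfrac{1}{2}(\alpha_i^2 + \beta_i^2) - \left[M^{-1}\sum_i \tfrac{1}{2}(\alpha_i + \beta_i)\right]^2}, \tag{6.2}$$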
with ζ ranging between −1 and 1. The network is assortative if ζ > 0, in which case the
high-degree (or low-degree) nodes are more likely to be connected to each other than to the
low-degree (or high-degree) nodes; if ζ = 0, the network is randomly mixed; and if ζ < 0 the
network exhibits disassortativity, in which case the high-degree nodes tend to connect to the
low-degree ones, and vice versa. For the CRI time series in this work, the assortativity coeffi-
cients of the associated VG-based networks in different sleep stages are shown in Figure 6.7.
The CRI networks in all sleep stages are assortative. In comparison with REM and NREM
sleep, the CRI network has a decreased assortativity coefficient during wake, indicating that
the network is more randomly mixed. Deep sleep, on the other hand, has a larger ζ compared
with light sleep, possibly because the CRI time series during deep sleep exhibit a more regular
pattern than light sleep. These findings suggest that sleep stages can also be separated based
on differences between the assortativity coefficients of VG-based CRI networks. It should also
Figure 6.7: Assortativity coefficient ζ of the CRI time series networks (mean and SD) for different
sleep stages (wake, REM sleep, light sleep, and deep sleep). A Mann-Whitney test shows significant
differences between all pairs of sleep stages, with p < 0.0001.
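The assortativity coefficient in Eq. (6.2) can be computed directly from an edge list and the node degrees. The following is a minimal sketch; the function name `assortativity` and the two toy graphs are hypothetical and only serve to check the formula on cases with known answers.

```python
def assortativity(edges, degree):
    """Assortativity coefficient of Eq. (6.2).

    `edges` is a list of (u, v) node pairs and `degree[u]` is the degree of
    node u. Each edge i contributes the degrees alpha_i and beta_i at its ends.
    """
    m = len(edges)
    mean_prod = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_half_sum = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    mean_half_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    denom = mean_half_sq - mean_half_sum ** 2
    if denom == 0:
        raise ValueError("undefined when all edge-end degrees are equal")
    return (mean_prod - mean_half_sum ** 2) / denom


# Star graph: one hub linked to three leaves, a fully disassortative case.
star_edges = [(0, 1), (0, 2), (0, 3)]
star_degree = [3, 1, 1, 1]
zeta_star = assortativity(star_edges, star_degree)

# Path graph with four nodes.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_degree = [1, 2, 2, 1]
zeta_path = assortativity(path_edges, path_degree)
```

In the star graph every edge joins the high-degree hub to a low-degree leaf, which is the situation described for ζ < 0.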
6.5 Conclusion
In this chapter, we quantified the dynamics of cardiorespiratory interaction
during sleep by converting it into complex networks using the VG method. It can be described
by some important characteristics of the networks including (mean) degree and its distribu-
tion, degree variation, and assortativity coefficient. These characteristics were shown to behave
differently across sleep stages. However, they were found to be correlated, possibly due to
the presence of mutual information between them. Nevertheless, in practice, they can offer
promising features used for classifying sleep stages based on cardiorespiratory activity.
Part II: Timing Between Autonomic
and Brain Activity
CHAPTER 7
This chapter is adapted from: X. Long, J. B. Arends, R. M. Aarts, R. Haakma, P. Fonseca, and J. Rolink.
Time delay between changes of cardiac and brain activity during sleep transitions. Applied Physics Let-
ters, 106:143702, 2015.
Abstract – Human sleep consists of wake, rapid-eye-movement (REM) sleep, and non-REM
(NREM) sleep that includes light and deep sleep stages. This work investigated the time de-
lay between changes of cardiac and brain activity for sleep transitions. Here the brain activity
was quantified by electroencephalographic (EEG) mean frequency and the cardiac parameters
included heart rate, standard deviation of heartbeat intervals and their low- and high-frequency
spectral powers. Using a cross-correlation analysis, we found that the cardiac variations during
wake-sleep and NREM sleep transitions preceded the EEG changes by 1-3 min but this was not
the case for REM sleep transitions. These important findings can be further used to predict the
onset and ending of some sleep stages in an early manner.
96 Chapter 7. Time delay between cardiac and brain activity
7.1 Introduction
In the past decades a phenomenon has been recognized in many domains that two coupled
sources or systems exhibit an unsynchronized interaction with a time difference or delay in be-
tween [29, 50, 90, 155, 157, 290]. For instance, neural oscillators have enhanced coupling in
delayed-time [90]. In particular, this may occur during transitions between two physical or bi-
ological states such as chaotic state changes [290], gene switches [50], neutron emission [157],
and cardiorespiratory phase synchronization transitions [29]. Understanding these phenomena
can help, e.g., explore the coherence of neurons and information transmission of the brain in
neurology [90] and improve ‘perception-action’ planning with stimulus events from external
world in cognitive science [236].
In this work we apply the time delay analysis in the area of human sleep. Neurophysiological
mechanisms of sleep are exceptionally important for humans to maintain, for instance, health,
internal homeostasis, memory, and cognitive and behavioral performance [61, 165]. Numerous
studies have reported significant association between heart rate (and heart rate variability, HRV)
and electroencephalographic (EEG) activity during sleep, where they both vary across sleep
states/stages [46, 54, 292]. Previous studies have demonstrated the presence of unsynchronized
changes of HRV and EEG activity in time course over the entire night [146, 217]. However,
the variations of brain activity and autonomic cardiac dynamics should not be independent of
sleep (state/stage) transitions, for which their coupling might change. We therefore investigated
the time delay in sleep transition profiles between cardiac and EEG activity using a cross-correlation
analysis, which has not been studied before.
It is known that human sleep consists of wake state, rapid-eye-movement (REM) sleep state,
and non-REM (NREM) sleep state including four stages 1, 2, 3, and 4 according to the rules
recommended by Rechtschaffen and Kales (R&K) [247]. With the more recent guidelines of the
American Academy of Sleep Medicine [136], stages 3 and 4 are merged into a single
slow-wave sleep or “deep” sleep stage since no essential difference was found between them. Besides,
stages 1 and 2 usually correspond to “light” sleep. According to one of these manuals, sleep
states/stages are scored by sleep clinicians on continuous 30-s epochs by visually inspecting
polysomnographic (PSG) recordings including multi-channel EEG, electrooculography (EOG),
and electromyography (EMG).
A total of 330 overnight PSG recordings in the SIESTA database [160] from 165 normal sub-
jects (88 females) were considered in our analysis, where each subject spent two consecutive
nights for sleep monitoring. The SIESTA data were collected in seven sleep centers located
in five EU countries within a period from 1997 to 2000. The study was approved by the local
ethical committees of the recording partners and all subjects provided their informed consent.
The subjects had an average age of 51.8 ± 19.4 y and the average total recording length was
Figure 7.1: Inter-rater agreement as evaluated by Cohen’s Kappa [mean and standard deviation (SD)
over recordings] between different sleep stages (REM/deep, wake/deep, wake/REM, REM/light,
wake/light, and light/deep). Statistical significance of the difference between each two
Kappa values was examined with a t-test, where the Kappa had no significant difference between REM
sleep/deep sleep and wake/deep sleep and between REM sleep/light sleep and wake/light sleep (p >
0.05) but it was significantly different between the others (p < 0.001).
7.8 ± 0.5 h per night. They fulfilled several criteria, such as no reported symptoms of neuro-
logical, mental, medical, or cardiovascular disorders, no history of drug or alcohol abuse, no
psychoactive medication, no shift work, and retirement to bed between 22:00 and 24:00 de-
pending on their habitual bedtime. Sleep states/stages were scored by two independent raters
based on the R&K rules. In case of disagreement, the consensus annotations were obtained.
The inter-rater reliability (measured by Cohen’s Kappa coefficient of agreement [72], rang-
ing from 0 to 1) in separating different sleep stages is compared in Figure 7.1. It shows that
the Kappa in distinguishing between light and deep sleep was statistically significantly lower
than that for separating other sleep stages. This is due to the gradual changes of physiological
behaviors within NREM sleep.
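For reference, Cohen’s Kappa compares the observed agreement between two raters with the agreement expected by chance. The sketch below computes it from two epoch-by-epoch label sequences; the function name `cohens_kappa` and the toy annotations are hypothetical, and how epochs were selected for each pair of stages in Figure 7.1 is not specified here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of epochs")
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal label frequencies of each rater.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("undefined when both raters use a single identical label")
    return (p_o - p_e) / (1 - p_e)


# Toy annotations of eight epochs as light (L) or deep (D) sleep.
rater_1 = list("LLDDLDLL")
rater_2 = list("LDDDLLLL")
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on 6 of 8 epochs (75%), but the chance-corrected agreement is considerably lower.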
The EEG activity was quantified by a parameter fEEG , called EEG mean frequency [217]. To
calculate it, the EEG signals were first band-pass filtered between 0.3 and 35 Hz and then
the power spectral density was computed for each non-overlapping 2-s interval with a discrete
Fourier transform (DFT). Afterwards, the associated peak frequencies between 0.5 and 30 Hz
were detected accordingly and then for each 30-s epoch, they were averaged over a window of
9 epochs (4.5 min) centered on that epoch, yielding the epoch-based estimates of fEEG . The
cardiac parameters, derived from electrocardiography (ECG) signals over a 9-epoch window
centered on each 30-s epoch, included mean heart rate (HR), standard deviation of heartbeat
intervals (SDNN), and the logarithmic spectral powers of heartbeat intervals in low-frequency
(LF, 0.01-0.15 Hz) and high-frequency (HF, 0.15-0.4 Hz) bands. They have been shown to relate
to certain properties of the autonomic nervous system [13, 281]. For instance, HR, SDNN, and
Figure 7.2: An example of epoch-based sleep states/stages (wake, REM sleep, light sleep, and deep
sleep) over night and the normalized (Z-score) EEG mean frequency fEEG and cardiac parameters HR,
SDNN, LF, and HF (in nu). [Horizontal axis: Time (h), 0 to 8.]
LF are associated with sympathetic activity and the HF power is a marker of parasympathetic or
vagal activity activated by respiratory-stimulated stretch receptors [24, 281, 288]. Many stud-
ies have shown that autonomic nervous activity is effective in identifying sleep states or stages
when PSG is absent [179, 183, 248]. Here all the parameters were normalized to zero mean
and unit variance (Z-score) for each recording, leading to a normalized unit “nu”. Note that the
use of a window aimed at including sufficient heartbeats to capture cardiac rhythms and to help
reduce signal noise so that the autonomic nervous activity can be reliably expressed, where a
window size of about 5 min was recommended [288].
For analyzing the time delay during sleep transitions, we chose 30 s as the minimum epoch length
because (1) it is the standard resolution for PSG-based manual scoring of sleep stages [247]
and (2) using a smaller length the parameters could be influenced by the subtle changes caused
by the physiological response during arousals [268], which would likely lead to spurious cross-
correlation analysis results. Figure 7.2 illustrates an example of overnight sleep profile and the
EEG and cardiac parameter values from a healthy subject. It can be seen that these parameters
seem correlated with sleep states/stages to some extent.
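The windowing and normalization steps can be sketched as follows. This is an illustration under stated assumptions: the window is truncated at the edges of the recording, the Z-score uses the population SD, and the function names and the toy heart-rate values are hypothetical.

```python
import statistics


def centered_window_mean(values, width=9):
    """Average each epoch value over a window of `width` epochs centered on it.

    Near the start and end of the recording the window is truncated
    (an illustrative choice; edge handling is not specified in the text).
    """
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(statistics.mean(values[lo:hi]))
    return out


def zscore(values):
    """Normalize to zero mean and unit variance (population SD assumed)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical per-epoch heart rate (bpm) for a short stretch of a recording.
hr_per_epoch = [62, 61, 63, 60, 58, 57, 59, 56, 55, 57, 54, 55]
hr_smoothed = centered_window_mean(hr_per_epoch, width=9)
hr_nu = zscore(hr_smoothed)  # values in normalized units ("nu")
```

With 30-s epochs, `width=9` corresponds to the 4.5-min window described above.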
To capture the delayed changes of cardiac and EEG activity, we constrained our analysis on
the periods with 15 epochs (7.5 min) before and after each transition moment where only one
transition occurred in the middle of each period. We note that only a small portion of transitions was
sampled according to our criteria, which might lead to underrepresentation of the fragmented
sleep transitions, i.e., the transitions with other transitions immediately ahead or following
within a short time. The number of these periods was 1077 out of a total of 28359 transitions from
Figure 7.3: (a) Mean percentages of sleep transitions over recordings. The average number of total
transitions per recording is 85.9. The transitions are indicated with arrows, where the REM–deep sleep
transitions are not shown because they account for less than 0.01% of total transitions. (b) Sleep stage
distribution (mean and SD over recordings). [Panel (b) axes: sleep stage (wake, REM sleep, light
sleep, deep sleep) versus distribution (%).]
all 330 recordings. The first and the last 5 epochs of these periods were excluded, yielding
10-min segments used for analyzing time delays. This served to avoid the time-delayed effects
of the previous and the next transitions when analyzing the parameter values for the time delay
of current sleep transition and meanwhile, to include enough data points for computing cross-
correlation coefficients. By these means, we only considered major types of sleep transitions in
three “hierarchical” levels, as shown in Figure 7.3. They are the transitions: (1) between wake
and sleep including W→LS (from wake to light sleep), LS→W (from light sleep to wake), and
RS→W (from REM sleep to wake); (2) between REM and NREM sleep including RS→LS
(from REM to light sleep) and LS→RS (from light to REM sleep); and (3) within NREM sleep
including LS→DS (from light to deep sleep) and DS→LS (from deep to light sleep). These
seven types of transitions predominate among all sleep transitions [154, 159], which
can also be observed in our data (see Figure 7.3). The transitions between REM and deep sleep
and from deep sleep to wake were not included. For each parameter, we calculated the mean
values over all the 10-min segments for each type of transition and then they were Z-score
normalized. Figure 7.4 illustrates the mean parameter values 5 min (or 10 epochs) before and
after sleep transitions.
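The selection of isolated transitions described above can be sketched on a hypnogram stored as a list of per-epoch labels. This is one possible reading of the criteria (15 epochs of a single stage on each side of the transition, then trimming 5 epochs at both ends); the function names and the toy hypnogram are hypothetical.

```python
def isolated_transitions(stages, margin=15):
    """Find transitions with `margin` epochs of one stage before and after.

    Returns tuples (index, stage_before, stage_after), where `index` is the
    first epoch of the new stage.
    """
    changes = [i for i in range(1, len(stages)) if stages[i] != stages[i - 1]]
    kept = []
    for i in changes:
        if i - margin < 0 or i + margin > len(stages):
            continue  # not enough epochs around the transition
        before = stages[i - margin:i]
        after = stages[i:i + margin]
        if len(set(before)) == 1 and len(set(after)) == 1:
            kept.append((i, before[0], after[0]))
    return kept


def analysis_segment(values, index, margin=15, trim=5):
    """Values around a transition after dropping the first and last `trim` epochs."""
    return values[index - margin + trim:index + margin - trim]


# Toy hypnogram: W = wake, LS = light sleep, DS = deep sleep.
hypnogram = ["W"] * 15 + ["LS"] * 15 + ["DS"] * 5 + ["LS"] * 15
transitions = isolated_transitions(hypnogram)
segment = analysis_segment(hypnogram, transitions[0][0])
```

In the toy hypnogram only the W→LS transition is kept; the LS→DS and DS→LS transitions are rejected because they occur within 15 epochs of each other.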
The cross-correlation between EEG mean frequency fEEG and each cardiac parameter αc
(HR, SDNN, LF, or HF) for a given time segment with m epochs is expressed by a cross-correlation
function G,

$$G_{f_{\mathrm{EEG}},\alpha_c}(n) \equiv (f_{\mathrm{EEG}} \star \alpha_c)(n) = \frac{1}{m}\sum_{i=1}^{m-n} f_{\mathrm{EEG},i} \cdot \alpha_{c,i+n}, \tag{7.1}$$

where n is the number of time shifts (a.k.a. time lag) of the cross-correlation between fEEG and αc.
Therefore, the delayed time ∆τ can be obtained by searching for the lag leading to maximum
Figure 7.4: Mean values of the normalized parameter fEEG, HR, SDNN, LF, and HF (in nu) with 10
epochs (5 min) before and after different sleep transitions (W→LS, LS→W, RS→W, RS→LS, LS→RS,
LS→DS, and DS→LS). [Columns grouped as wake-sleep, REM-NREM, and NREM transitions; horizontal
axis: Time in epoch (30 s).]
The time delay ∆τ can be positive or negative. A positive ∆τ value indicates that fEEG starts
changing earlier than the cardiac parameter αc, and conversely, a negative value reflects that the
variations of αc are earlier than those of fEEG by |∆τ| epochs (|∆τ|/2 min) on average.
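The lag search based on Eq. (7.1) can be sketched as follows. Under Eq. (7.1), a lag n > 0 pairs fEEG at epoch i with the cardiac value n epochs later, so a positive delay corresponds to fEEG changing first. Two choices below are assumptions for illustration: negative lags are handled by summing over the overlapping epochs, and the delay is taken at the maximum absolute value of G. The function names and the toy step series are hypothetical.

```python
def cross_correlation(f, a, n):
    """G(n) = (1/m) * sum_i f[i] * a[i + n] over epochs where both exist."""
    m = len(f)
    total = 0.0
    for i in range(m):
        j = i + n
        if 0 <= j < m:
            total += f[i] * a[j]
    return total / m


def time_delay(f, a, max_lag):
    """Lag (in epochs) in [-max_lag, max_lag] that maximizes |G(n)|."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda n: abs(cross_correlation(f, a, n)))


# Toy normalized series: `a` repeats the step in `f` two epochs later.
f_eeg = [-1.0] * 10 + [1.0] * 10
alpha = [-1.0] * 12 + [1.0] * 8

delay = time_delay(f_eeg, alpha, max_lag=5)    # f_eeg changes first
reverse = time_delay(alpha, f_eeg, max_lag=5)  # roles swapped
```

With 30-s epochs, a delay of 2 epochs corresponds to 1 min.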
Table 7.1: Results of time delay analysis between EEG mean frequency fEEG and four cardiac
parameters HR, SDNN, LF, and HF for different sleep transitions
tions. The constant earlier appearance of autonomic variations suggests that cortical changes
are secondary to changes elsewhere in the brain (e.g., brain stem) or central nervous system.
These time differences are sleep state/stage dependent and do not seem to occur for REM sleep
(i.e., REM-NREM transitions). This also suggests that the physiology of these changes dur-
ing REM sleep is different from that during wake and NREM sleep. In fact, REM sleep has
different physiological mechanisms compared with NREM sleep, where REM transitions are
‘switch-like’ transitions [187] while the physiological variations within NREM sleep are grad-
ual [55]. The lack of time delay during REM transitions might also be caused by the fact that
the R&K rules force human raters to merge REM epochs of 30 s into one REM sleep period if
they occur within 3 min [247]. For W→LS transitions, upon a closer look, we found that most
of them were in the beginning of the night, indicating the presence of time delay conveyed be-
tween cardiac and brain activity during sleep onset. The time delay from sleep (REM or light
sleep) to wake could be due to the gradual steps of awakening [8, 116]. Additionally, as shown
in the table, the changes of HR seem always later than the HRV changes. We therefore specu-
late that, to a certain degree, parasympathetic changes (reflected by HF changes) might present
slightly earlier than the variations of sympathetic activity (corresponding to HR changes) during
wake-sleep and NREM transitions.
As stated, when computing the parameters, we applied averaging or filtering over a 9-epoch
[Figure 7.5 panels: HR, SDNN, LF, and HF; y-axes: ∆τ (30 s) and |r| (-); x-axis: window size (30 s); curves: W→LS, LS→W, RS→W, RS→LS, LS→RS, LS→DS, and DS→LS.]
Figure 7.5: Time delay ∆τ between cardiac and EEG activity and the associated (maximum) absolute
correlation coefficient |r| versus averaging window size (1-9 epochs, step size 2 epochs) for computing
the epoch-based parameters.
[Figure 7.6 axes: absolute HR change (bpm) for the LS→W, RS→W, W→LS, LS→RS, RS→LS, DS→LS, and LS→DS transitions, grouped into wake-sleep, REM-NREM, and NREM transitions.]
Figure 7.6: Absolute changes of HR (mean and SD) during sleep transitions, computed based on the
10-min segments.
(4.5-min) window centered on each epoch in order to obtain reliable parameter values. Fig-
ure 7.5 illustrates the time delay and the associated absolute correlation coefficient versus the
averaging window size. The figure shows that our choice was appropriate, as the correlations
generally increased and the time delays ∆τ stabilized as the window size increased.
In fact, when performing cross-correlation analysis between two signals, using symmetric
linear-phase filtering with the same window size would not cause signal phase distortion [240].
Thus, the averaging here should not affect the lag sought when searching for the time delays.
Figure 7.6 shows the absolute changes of HR (in beats per minute, bpm) during different
sleep state/stage transitions. It is noted that large HR changes (4.6-9.1 bpm) occurred during
the wake-sleep transitions while the NREM transitions had the smallest changes in HR (1.1-
2.7 bpm). This supports the “hierarchical” nature of the various transitions and confirms the
validity of the results.
7.5 Conclusion
In this chapter, we investigated the time delay between cardiac and brain activity for different
sleep transitions using a cross-correlation analysis. The presented results indicate that the au-
tonomic nervous system changes generally precede the EEG changes by 1-3 min during sleep
transitions except for REM-NREM transitions. In practice, the important findings here can be
used in future research to predict sleep state/stage changes based on autonomic nervous activity.
CHAPTER 8
This chapter is adapted from: X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Rolink. Detection of
nocturnal slow wave sleep based on cardiorespiratory activity. Submitted.
Abstract – Human slow wave sleep (SWS) during bedtime is paramount for energy conser-
vation and memory consolidation. This work aims at automatically detecting SWS from noc-
turnal sleep using cardiorespiratory signals that can be acquired with unobtrusive sensors in a
home-based scenario. From the signals, time-dependent features are extracted for continuous
30-s epochs. To reduce the measuring noise, body motion artifacts, and/or within-subject vari-
ability in physiology conveyed by the features and thus enhance the detection performance, we
propose to smooth the features over each night using a spline fitting method. In addition, it is
found that the changes in cardiorespiratory activity precede the transitions between SWS and
the other sleep stages (non-SWS). To account for this, a novel scheme is proposed that performs
the SWS detection for each epoch using the feature values prior to that epoch. Experiments
were conducted with a large data set of 325 overnight polysomnography (PSG) recordings us-
ing a linear discriminant classifier and ten-fold cross validations. Features were selected with a
correlation-based method. Results show that the performance in classifying SWS and non-SWS
can be significantly improved when smoothing the features and using the feature values from
5 min earlier. When compared with manual PSG scoring, we achieved a Cohen’s Kappa
coefficient of 0.57 (at an accuracy of 88.8%) using only six selected features for 257 recordings
with a minimum of 30-min overnight SWS that were considered representative of their habitual
sleeping pattern at home. A marked drop in Kappa to 0.21 was observed for the other nights
with SWS time of less than 30 min which were found to more likely occur in older subjects.
This will be the future challenge in cardiorespiratory-based SWS detection.
106 Chapter 8. Slow wave sleep detection with cardiorespiratory activity
8.1 Introduction
Nocturnal sleep of humans is comprised of rapid-eye-movement (REM) sleep, stages S1-S4
of non-REM (NREM) sleep, and wake according to the R&K rules [247]. S1 and S2 are
grouped into “light sleep”, where S1 and S2 correspond to stages N1 and N2 respectively
according to the more recent guidelines of the American Academy of Sleep Medicine (AASM)
[136]. S3 and S4 are considered slow wave sleep (SWS), in correspondence to N3 stage in the
AASM guidelines. SWS relates to delta electroencephalographic (EEG) activity with no eye
movements [136]. It represents the most restorative period of sleep for metabolic functioning,
during which brain and body energy are conserved [35] and new memories are consolidated
[285]. SWS associates with maintenance of sleep and sleep quality [45]. Lack of SWS may
result in, e.g., loss of daytime performance [45] and increased risk of diabetes [287]. More
interestingly, attention has been paid in the past decade to improving nighttime sleep (i.e., to
enhance memory consolidation) through external stimulation of sleep slow waves in humans
[192, 193, 213]. Therefore, we were motivated to develop a system to accurately detect SWS
from nocturnal sleep, particularly in a home scenario.
Polysomnography (PSG) is the “gold standard” for objective sleep assessment, relying on
which a hypnogram can be derived through visual scoring by sleep technicians [136, 247]. A
PSG recording typically consists of various bio-signals such as EEG, electromyography (EMG),
electrooculography (EOG), electrocardiography (ECG), respiratory effort (RE), and blood
oxygen saturation (SaO2 ). These signals are usually split into continuous 30-s non-overlapping
intervals, called epochs. Although PSG is a standard method for sleep analysis, it has some
disadvantages: it is conducted in a sleep laboratory, leading to high facility costs;
it requires many electrodes to be attached to the body, disrupting a subject’s normal
sleep as a consequence; and it requires subjects to stay in the sleep laboratory overnight, which
is not compatible with prolonged sleep monitoring. To overcome these disadvantages, cardiac
and respiratory information has been used to assess sleep for years, as it can be
acquired with unobtrusive sensing systems in a home-based environment such as a wrist-worn
watch [127], a bed sensor [161], a textile bedsheet [264], a web-camera [232], an acoustic
device [228], a Doppler radar [319], and a photoplethysmographic sensor [16]. It has been
proven that cardiorespiratory signals contain relevant physiological information for sleep staging,
such as heart rate variability (HRV) [46] and respiration rhythm [95]. This is because they
are related to the autonomic nervous system, whose activity differs between sleep stages [292]. For example,
SWS coincides with a decreased sympathetic activity conveyed by the low-frequency power
in HRV.
Cardiorespiratory-based sleep stage classification has been increasingly studied in recent
years, where many features (representing certain physiological aspects) have been designed
and extracted from cardiac and/or respiratory signals [151, 182, 197, 249, 309]. However,
rather than SWS detection, those studies investigated either wake–REM–NREM or sleep–wake
classification. Many other studies have reported results in classifying wake, REM sleep, light
sleep, and SWS [127], detecting REM sleep [131], or differentiating light sleep and SWS [51],
but they used additional physiological signal modalities such as peripheral arterial tone
and oxyhemoglobin saturation. Shinar et al. [273] developed an HRV-based SWS detector and
obtained an accuracy of about 80%, but they used only a very small portion with a total duration
of 100 min (SWS of 50 min) rather than entire-night recordings for validation. Therefore, this
chapter addresses the problem of continuously classifying overnight SWS and non-SWS (all
the other stages) with cardiorespiratory signals that can be unobtrusively acquired.
The sleeping pattern of healthy adults usually progresses with several regular cycles through-
out the night [63]. This means that, for each recording, the sleep stage with associated physio-
logical activity across the night is time-variant so that each feature is considered an epoch-based
time series. After visually comparing some feature values and PSG-based annotations chang-
ing over night, we observed many errors occurring in the middle of a long SWS/non-SWS
period, possibly due to measuring noise, feature computing variances, or body motion artifacts.
Another cause might be the ‘within-subject variability’ in physiology, which means that the
physiological expression of features was not perfectly discriminative and thus could not deliver
an ideal separation between sleep stages. For these reasons, we decided to low-pass filter or
smooth each feature’s values over time using a spline fitting method [296]. The main reason
for using spline fitting was that, unlike many other low-pass filters, it is capable of interpolating
missing data [84, 296]. This is of particular importance because sleep is a continuous
process and we found that our data had an average of ∼10% missing values.
Several researchers have investigated the temporal relationship between cardiac dynamics
and brain activity [62, 146, 217]. For instance, Otzenberger et al. [217] reported that the
overnight HRV changes generally precede the variations in EEG activity by around 1-2 min.
Jurysta et al. [146] demonstrated that the high-frequency power of heartbeat or RR intervals
corresponds to a preceding time (or negative time delay) of approximately 7 min compared
with the delta-wave power of EEG spectrum. Additionally, the decrease of heart rate in stage
S2 was found to anticipate the onset of SWS by several minutes [62]. These studies indicate
that the autonomic changes are not exactly synchronized with the variations in EEG activity,
in particular during the transitions between SWS and non-SWS; rather, a time difference
appears between them. In our data, we also observed that many features started changing prior
to the transition moments between the annotated SWS and non-SWS epochs. This time delay
phenomenon would result in errors in classifying SWS and non-SWS epochs. To address this,
we propose a novel scheme by using the preceding feature values in earlier epochs to further
improve the identification of the sleep state of each epoch (SWS or non-SWS). This can also
potentially enable the prediction of SWS onset in an early manner allowing a real-time SWS
detection system, usually required for slow wave stimulation in practice.
Previous work has shown that a linear discriminant (LD) classifier is appropriate for the
problem of sleep stage classification [180, 182, 249]; it was therefore adopted in this work for SWS
and non-SWS classification. Preliminary results of this work have been previously reported
[186].
Full PSG data (at least 16 channels of bio-signals) from 165 healthy subjects in the SIESTA
project [160] was included, monitored in seven different sleep centers located in five European
countries. In accordance with the SIESTA protocol, the subjects met several criteria such as
no reported symptoms of neurological, mental, medical, or cardiovascular disorders, no history
of drug or alcohol abuse, no shift work, and retirement to bed before midnight depending on
their habitual bedtime [160]. Each subject spent two consecutive nights in a sleep laboratory,
resulting in a total of 330 overnight recordings. For each recording, the scoring of 30-s epoch-
based sleep stages was carried out by sleep technicians based on PSG according to the R&K
rules. For SWS and non-SWS classification, wake, REM sleep, S1, and S2 were merged into
a single non-SWS class; S3 and S4 were labeled as SWS class. The epochs with invalid PSG
scoring (∼3%) were removed.
Five recordings were excluded due to the absence of SWS, yielding an inclusion of 325
recordings in our data set. In addition, this work primarily focused on the ‘normal’ sleep
nights (from lights OFF in the evening till lights ON in the morning), during which the total
SWS time throughout the night was no less than 30 min [216], resulting in a group of 257 recordings
from 145 subjects (normal group). These nights were more representative of the normal
sleeping pattern in terms of SWS [216], as would be expected with home-based sleep monitoring.
The remaining 68 nights (from 51 subjects) with an overnight total SWS time of less
than 30 min (low-SWS group), more from the first nights than the second nights, were excluded
because they might be strongly influenced by the “laboratory effects” where the subjects could
not sleep as well as they habitually do at home [196]. The subject demographics and sleep data for
the normal group used in this study are summarized in Table 8.1. Nevertheless, we also tested
our approach on the recordings from the low-SWS group.
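The label merging and the 30-min grouping rule described above can be sketched as follows. The stage strings ("W", "REM", "S1" to "S4"), the function names, and the group labels are hypothetical choices made for this illustration; only the merging rule, the 30-s epoch length, and the 30-min threshold come from the text.

```python
EPOCH_SECONDS = 30
SWS_STAGES = {"S3", "S4"}
NON_SWS_STAGES = {"W", "REM", "S1", "S2"}


def to_binary_labels(stages):
    """Map R&K stages to 1 (SWS) or 0 (non-SWS); drop epochs with invalid scoring."""
    labels = []
    for stage in stages:
        if stage in SWS_STAGES:
            labels.append(1)
        elif stage in NON_SWS_STAGES:
            labels.append(0)
        # Any other label is treated as invalid scoring and removed.
    return labels


def sws_minutes(labels):
    """Total SWS time of one recording in minutes."""
    return sum(labels) * EPOCH_SECONDS / 60


def assign_group(labels, min_sws_minutes=30):
    """Assign a recording to the normal or low-SWS group, or exclude it if SWS is absent."""
    total = sws_minutes(labels)
    if total == 0:
        return "excluded"
    return "normal" if total >= min_sws_minutes else "low-SWS"


# Toy hypnogram: 60 SWS epochs (30 min) and 5 epochs with invalid scoring.
stages = ["W"] * 10 + ["S2"] * 20 + ["S3"] * 40 + ["S4"] * 20 + ["?"] * 5 + ["REM"] * 30
labels = to_binary_labels(stages)
print(len(labels), sws_minutes(labels), assign_group(labels))
```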
The thoracic RE signals (sampled at 10 Hz) were acquired with a respiratory inductance
plethysmographic (RIP) chest belt and the cardiac signals (sampled at ≥100 Hz) were recorded
with a modified V1 lead ECG.
8.3 Methods
8.3.1 Signal preprocessing
The RE signal was filtered with a tenth order Butterworth low-pass filter (with a cut-off fre-
quency of 0.6 Hz) to eliminate high frequency noise. Afterwards, the baseline was subtracted
by the median peak-to-trough amplitude over the entire recording [182, 248]. Because we also
extracted respiratory features in the frequency domain, a fast Fourier transform (FFT) with a
Hanning window (used to reduce spectral leakage) was applied to estimate the power spectral
density (PSD) on the resulting signal for each epoch [248].
The ECG signal was high-pass filtered using a Kaiser window (with a cut-off frequency
of 0.8 Hz and a side-lobe attenuation of 30 dB) to remove baseline wander, after which the
resulting signal was zero-meaned. To extract features from RR intervals for each epoch, a
Hamilton-Tompkins R-peak detector [124] combined with a precise QRS localization algorithm
[107] was applied to locate R peaks on the ECG signal with a window of nine epochs centered
at the epoch of interest. This window served to include sufficient data points to capture the
changes in RR intervals, where the window size is close to the value of 5 min recommended
in [288]. The resulting RR interval series was then re-sampled via linear interpolation at a
sampling rate of 4 Hz. The PSD of RR intervals was estimated using an autoregressive (AR)
model with adaptive order [42]. The AR model was used instead of a Fourier-based approach
because of the latter’s limitations such as poor spectral resolution and leakage [44], to which
the PSD estimate of the RR interval series, with its lower sampling rate compared
with the RE signal, is more sensitive.
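The re-sampling step of the RR interval series can be illustrated with a short sketch. It assumes that each RR interval is time-stamped at the R peak that ends it, which is a common convention but is not stated in the text; the function name and the toy R-peak times are likewise hypothetical.

```python
def resample_rr(r_peak_times, fs=4.0):
    """Resample the RR interval series at `fs` Hz by linear interpolation.

    r_peak_times: increasing R-peak times in seconds (at least three peaks).
    Returns (times, rr_values) on a uniform time grid.
    """
    # Assumption: each RR interval is time-stamped at the R peak that ends it.
    t_rr = r_peak_times[1:]
    rr = [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]

    n_samples = int((t_rr[-1] - t_rr[0]) * fs) + 1
    times = [t_rr[0] + i / fs for i in range(n_samples)]

    values = []
    j = 0  # index of the interval [t_rr[j], t_rr[j + 1]] containing the current time
    for t in times:
        while j < len(t_rr) - 2 and t_rr[j + 1] < t:
            j += 1
        weight = (t - t_rr[j]) / (t_rr[j + 1] - t_rr[j])
        values.append(rr[j] + weight * (rr[j + 1] - rr[j]))
    return times, values


# Four R peaks give three RR intervals (1.0, 1.0, and 1.5 s).
times, rr_4hz = resample_rr([0.0, 1.0, 2.0, 3.5])
print(len(rr_4hz))
```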
A total of 70 features were extracted for each 30-s epoch from ECG and thoracic RE signals,
which are briefly described below. Note that the features for a specific epoch were mostly
computed within a certain window centered at that epoch.
The ECG features were obtained from the RR intervals or heart rates over a window of nine
epochs (with around 300 beats during sleep on average). In the time domain, they included
the mean heart rate, mean RR interval (detrended and non-detrended), standard deviation and
range of RR intervals, the percentage of successive RR intervals that differ by more than 50
ms, and the root mean square and standard deviation of successive RR interval differences
[288]. Frequency domain features comprised the logarithm of normalized power in the very low
frequency (VLF, 0.003-0.04 Hz), low frequency (LF, 0.04-0.15 Hz), and high frequency (HF,
0.15-0.4 Hz) spectral bands, the ratio of LF and HF spectral powers [249, 288], and the module
and phase of HF pole [197]. The VLF, LF, and HF power and LF-to-HF ratio with adapted
spectral bands have succeeded in improving sleep/wake detection [178]. The maximum power
in the HF band and its associated frequency (in line with the mean respiratory frequency) were
also calculated [248]. Additionally, non-linear properties of RR intervals were quantified based
on detrended fluctuation analysis (DFA) with parameter α [148] and its short-term (parameter
α1 ) and long-term (parameter α2 ) exponents [224], and multi-scale sample entropy (length: 1
As stated, the features should cycle with time in terms of sleep stage, which motivated us
to consider a recording- or night-specific feature smoothing. Before that, each feature was
normalized for each recording to have zero mean and unit variance (Z-score normalization).
This served to reduce the variability between subjects caused by the difference between PSG
systems used in different sleep laboratories and/or the difference in physiological expression
during sleep. Our previous work [186] has revealed that the Z-score normalization can help
improve SWS detection.
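A minimal sketch of the per-recording Z-score normalization is given below. Representing missing feature values as `None` and using the population variance (so that the normalized values have exactly unit variance) are assumptions made for this example.

```python
import math


def zscore(values):
    """Z-score one feature within one recording, ignoring missing values (None)."""
    valid = [v for v in values if v is not None]
    mean = sum(valid) / len(valid)
    # Assumption: population variance; a constant feature (zero variance) is not handled.
    variance = sum((v - mean) ** 2 for v in valid) / len(valid)
    std = math.sqrt(variance)
    return [None if v is None else (v - mean) / std for v in values]


normalized = zscore([1.0, None, 3.0])
print(normalized)
```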
The spline fitting method has been widely used for time series smoothing [84]. Let x rep-
resent a sequence of observations x = {x1 , x2 , ..., xn} (x1 < x2 < ... < xn ) and y their responses
y = {y1 , y2 , ..., yn}, then a relation between them can be modeled by
$$y_i = g(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{8.1}$$
where g is a smoothing (spline) function and εi are independent and identically distributed residuals.
The smoothing function can be estimated by minimizing the objective function (i.e.,
penalized sum of squares) such that
$$\hat{g} = \arg\min_{g} \left[ \sum_{i=1}^{n} [y_i - g(x_i)]^2 + \lambda \int_{x_1}^{x_n} g''(x)^2 \, dx \right], \tag{8.2}$$
where λ is a smoothing parameter that controls the trade-off between residual and local vari-
ation. The smoothing function can be expressed by cubic B-splines as basis functions and
determined via least squares approximation (LSA) [84, 296].
Given a feature for a recording, the observations here are the epoch indices t = {t1,t2 , ...,tm}
and the responses are their corresponding feature values v = {v1 , v2 , ..., vm }, where m is the
total number of epochs. To build up a spline fitting model, the entire sequence is divided in
k continuous subsequences with k − 1 boundaries called knots or breaks; and each of them
contains l epochs. The feature values and epoch indices for this recording are then expressed
respectively as
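$$v = \{\underbrace{v_{11}, v_{12}, \ldots, v_{1l}}_{1}, \underbrace{v_{21}, v_{22}, \ldots, v_{2l}}_{2}, \ldots, \underbrace{v_{k1}, v_{k2}, \ldots, v_{kl}}_{k}\} \tag{8.3}$$
and
$$t = \{\underbrace{t_{11}, t_{12}, \ldots, t_{1l}}_{1}, \underbrace{t_{21}, t_{22}, \ldots, t_{2l}}_{2}, \ldots, \underbrace{t_{k1}, t_{k2}, \ldots, t_{kl}}_{k}\}. \tag{8.4}$$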
Thereafter, each subsequence is modeled by Equations 8.1 and 8.2, yielding a spline fitting over
the entire sequence with multiple knots. Since the total number of epochs differs between
recordings, we preferred to fix the window size of subsequences w = ⌈m/k⌉ instead of using a
fixed number of breaks k. A larger window size (or fewer knots) results in a smoother fitting
curve; while a smaller window size (or more knots) decreases its smoothness. For example,
as depicted in Figure 8.1, the feature values throughout the night after spline smoothing seem
better mapped to the PSG-based annotations. The figure also shows that the RR interval and
respiratory rate have lower variances during SWS compared with the other stages.
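The idea of a spline fit with one knot per fixed-size window can be sketched with SciPy. Note that this example swaps in `scipy.interpolate.LSQUnivariateSpline`, a least-squares cubic spline with fixed interior knots, and does not include the roughness penalty λ of Equation 8.2; it is therefore only an approximation of the procedure described here. Encoding missing values as NaN, the function name, and the toy data are assumptions.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline


def smooth_feature(values, window=25):
    """Least-squares cubic spline fit with one knot every `window` epochs.

    Missing values (NaN) are left out of the fit and filled in from the fitted curve.
    """
    values = np.asarray(values, dtype=float)
    epochs = np.arange(len(values))
    valid = ~np.isnan(values)
    # Interior knots only; the two ends of the night are given by `bbox`.
    knots = np.arange(window, len(values) - 1, window)
    spline = LSQUnivariateSpline(
        epochs[valid], values[valid], knots, bbox=[0, len(values) - 1], k=3
    )
    return spline(epochs)


# Toy feature that is an exact cubic over 100 epochs, with 5 missing epochs.
x = np.arange(100) / 100.0
feature = x ** 3 - 0.5 * x
feature_missing = feature.copy()
feature_missing[40:45] = np.nan
smoothed = smooth_feature(feature_missing, window=25)
print(np.abs(smoothed - feature).max())
```

Because the toy feature is itself a cubic polynomial, the fitted spline reproduces it, including the epochs that were missing; on real features a larger `window` gives a smoother curve, in line with the trade-off described above.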
8.3.5 Classifier
Here a simple LD classifier was adopted and the classification was performed on each epoch
over the whole recording. The linear discriminant function is given by
$$G_c(\mathbf{f}) = -\frac{1}{2} (\mathbf{f} - \boldsymbol{\mu}_c)^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{f} - \boldsymbol{\mu}_c) + \ln \Pr(c), \tag{8.5}$$
where µc denotes the class mean of the feature vector f, Σ the pooled covariance matrix, and Pr(c)
the prior probability for class c [SWS (positive class) or non-SWS (negative class)]. Given a
feature vector, the jth epoch E j ( j = 1, 2, ..., m) of a recording is classified based on the decision
making rule
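$$C(E_j \mid \mathbf{f}_j) = \begin{cases} \text{SWS} & \text{if } G_{\text{SWS}}(\mathbf{f}_j) > G_{\text{non-SWS}}(\mathbf{f}_j) \\ \text{non-SWS} & \text{otherwise} \end{cases}. \tag{8.6}$$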
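Equations 8.5 and 8.6 can be sketched in plain Python as follows. The inverse of the pooled covariance matrix is passed in directly, and the class means, the identity covariance, and the prior of 0.124 for SWS (taken from the share of non-SWS epochs, 87.6%, reported for this data set) are toy values chosen for illustration; the function names are hypothetical.

```python
import math


def discriminant(f, mu_c, cov_inv, prior_c):
    """Linear discriminant score G_c(f) of Equation 8.5."""
    d = [fi - mi for fi, mi in zip(f, mu_c)]
    size = len(d)
    quad = sum(d[i] * cov_inv[i][j] * d[j] for i in range(size) for j in range(size))
    return -0.5 * quad + math.log(prior_c)


def classify_epoch(f, mu_sws, mu_non, cov_inv, prior_sws):
    """Decision rule of Equation 8.6 for one epoch (prior_sws must lie in (0, 1))."""
    g_sws = discriminant(f, mu_sws, cov_inv, prior_sws)
    g_non = discriminant(f, mu_non, cov_inv, 1.0 - prior_sws)
    return "SWS" if g_sws > g_non else "non-SWS"


# Toy two-feature example with an identity pooled covariance matrix.
mu_sws, mu_non = [-1.0, -1.0], [1.0, 1.0]
cov_inv = [[1.0, 0.0], [0.0, 1.0]]
near_sws = classify_epoch([-1.2, -0.8], mu_sws, mu_non, cov_inv, prior_sws=0.124)
ambiguous = classify_epoch([0.0, 0.0], mu_sws, mu_non, cov_inv, prior_sws=0.124)
print(near_sws, ambiguous)
```

The second epoch lies exactly between the two class means, so the decision is made by the prior term alone and falls to the more frequent non-SWS class.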
We observed that the occurrence of each class varied throughout the night. For instance,
the probability of being in SWS at the end of the night should be lower than that in the middle
of the night. This indicates that the prior probabilities are time-varying. Instead of using a
fixed prior probability, we therefore computed the time-varying prior probability for each epoch
by simply counting the relative frequency with which that specific epoch index was annotated as each
class [248].
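The counting of time-varying priors can be sketched as below, assuming that each training recording is given as a list of binary labels (1 for SWS, 0 for non-SWS) aligned at the first epoch; the function name and this data layout are hypothetical.

```python
def time_varying_prior(training_labels):
    """Relative frequency of SWS per epoch index over the training recordings."""
    longest = max(len(labels) for labels in training_labels)
    priors = []
    for j in range(longest):
        # Only recordings that are long enough contribute to epoch index j.
        at_j = [labels[j] for labels in training_labels if j < len(labels)]
        priors.append(sum(at_j) / len(at_j))
    return priors


# Three toy recordings of different lengths.
priors = time_varying_prior([[0, 1, 1], [0, 1, 0, 1], [0, 0]])
print(priors)
```

In practice, priors of exactly 0 or 1 would make the logarithm in Equation 8.5 undefined for one of the classes, so some form of clipping would be needed; how this was handled is not stated here.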
As illustrated in Figure 8.1, there seem to be some errors in feature values a few minutes
before the transitions between SWS and non-SWS, implying the presence of time delay
between the changes of cardiorespiratory properties and the PSG-based annotations. Under the
consideration of the time delay, earlier cardiorespiratory activity can be utilized to identify SWS
or non-SWS class. Supposing that we want to classify the jth epoch E j ( j = 1, 2, ..., m), we can
use the feature values of the ( j+τ )th epoch (with a delay of τ epochs to the target epoch) instead
of using the feature values from the epoch itself, such that
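$$C(E_j \mid \mathbf{f}_{j+\tau}) = \begin{cases} \text{SWS} & \text{if } G_{\text{SWS}}(\mathbf{f}_{j+\tau}) > G_{\text{non-SWS}}(\mathbf{f}_{j+\tau}) \\ \text{non-SWS} & \text{otherwise} \end{cases} \tag{8.7}$$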
in which a negatively delayed time (i.e., a preceding time) was expected. This means that we
anticipated the class of the target epoch |τ | epochs in advance. To evaluate this approach, we
computed the discriminative power of the features and the classification results by varying the
time delay from -30 to 0 epochs with a step size of one epoch (a τ of zero corresponds to the
absence of time delay).
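The pairing of target epochs with delayed feature values in Equation 8.7 can be sketched as follows. Returning `None` for target epochs whose delayed index falls outside the recording is an assumption; the text does not say how the first epochs of the night were handled.

```python
def delayed_features(features, tau):
    """Pair each target epoch j with the feature values of epoch j + tau (Equation 8.7).

    tau <= 0 selects preceding epochs; None marks targets without a valid source epoch.
    """
    m = len(features)
    return [features[j + tau] if 0 <= j + tau < m else None for j in range(m)]


# Delays evaluated in the text: -30 to 0 epochs in steps of one epoch.
candidate_delays = range(-30, 1)

shifted = delayed_features([10, 20, 30, 40], tau=-2)
print(shifted)
```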
[Figure 8.1 panels: PSG annotation (SWS/non-SWS), SDNNRR (a.u.), and SDFRE (a.u.); x-axis: time (h).]
Figure 8.1: An example of overnight PSG-based annotations of SWS and non-SWS and the values of
two representative features SDNNRR (standard deviation of RR intervals) and SDFRE (standard deviation
of respiratory frequency) from a subject. The unsmoothed (dashed) and smoothed (solid) feature values
are plotted. The window width for spline fitting was 25 epochs. By comparing the annotations and the
two features, classification errors might occur around the transitions between SWS and non-SWS (e.g.,
the transition around the 5th h).
To prevent selecting features upon the whole data set and thus biasing the classifier, CFS
was applied during each iteration of the ten-fold CV, yielding ten ‘optimal’ feature subsets, one
for each training set. In order to assemble a single feature list, only the features appearing in all
feature subsets were selected. This list was thereby used in all iterations of ten-fold CV to test
the classifier.
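Assembling the single feature list from the per-fold subsets amounts to a set intersection, as in the sketch below. The feature names other than SDNN_RR and SDF_RE are placeholders, and the three folds stand in for the ten folds used in this work.

```python
def common_features(fold_subsets):
    """Keep only the features that were selected in every cross-validation fold."""
    common = set(fold_subsets[0])
    for subset in fold_subsets[1:]:
        common &= set(subset)
    return sorted(common)


# Hypothetical subsets selected on three training sets.
fold_subsets = [
    ["SDNN_RR", "SDF_RE", "feature_a"],
    ["SDNN_RR", "SDF_RE", "feature_b"],
    ["SDF_RE", "SDNN_RR", "feature_a"],
]
selected = common_features(fold_subsets)
print(selected)
```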
Although the feature selector can automatically choose features that optimally separate the
classes SWS and non-SWS, evaluating the discriminative power of each single feature explores
which physiological aspects help distinguish both classes. It not only allows for the comparison
among features but also indicates to what extent the smoothing and time delay help improve
the features. For these purposes, the absolute standardized mean difference (ASMD) was used
to measure the discriminative power of a single feature. Given a feature f, it is computed as the
absolute mean difference of the feature values between SWS and non-SWS epochs divided by
the standard deviation of the values over all epochs
$$\mathrm{ASMD}^{f} = \frac{|\mu^{f}_{\text{SWS}} - \mu^{f}_{\text{non-SWS}}|}{\sigma^{f}}, \tag{8.8}$$
where $\mu^{f}_{\text{SWS}}$ and $\mu^{f}_{\text{non-SWS}}$ express the sample means of SWS and non-SWS epochs, respectively,
and $\sigma^{f}$ is the sample standard deviation. A higher discriminative power in separating the two
classes translates to a larger ASMD value.
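Equation 8.8 can be computed with the Python standard library as in the following sketch. The function name and the binary label encoding (1 for SWS, 0 for non-SWS) are assumptions; `statistics.stdev` gives the sample standard deviation over all epochs.

```python
from statistics import mean, stdev


def asmd(values, labels):
    """Absolute standardized mean difference of one feature (Equation 8.8)."""
    sws = [v for v, c in zip(values, labels) if c == 1]
    non_sws = [v for v, c in zip(values, labels) if c == 0]
    return abs(mean(sws) - mean(non_sws)) / stdev(values)


# Toy feature: two non-SWS epochs followed by two SWS epochs.
score = asmd([1.0, 1.0, 3.0, 3.0], [0, 0, 1, 1])
print(score)
```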
Overall accuracy, precision, sensitivity, and specificity were first considered to evaluate the
classifier. However, they might not be appropriate criteria for the “imbalanced class distribu-
tion” in our data, where the non-SWS epochs account for an average of 87.6% of the night. The
Cohen’s Kappa coefficient of agreement κ [72] offers an indication of the general classification
performance in correctly identifying imbalanced classes by compensating for the probability
of chance agreement. Here the classifier threshold was chosen to optimize the pooled Kappa
based on training data. To have an overview of the classification performance across the entire
solution space, a Precision-Recall (PR) curve was used. It plots precision versus recall (or sen-
sitivity) by varying the classifier threshold used to separate the two classes. When comparing
classifiers, the metric ‘area under the PR curve’ (AUCPR ) was calculated. In general, a larger
AUCPR corresponds to a better classification performance.
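To illustrate why κ is preferred over accuracy for imbalanced classes, the sketch below computes Cohen's Kappa from the four cells of a two-class confusion matrix. The function name and the toy counts are assumptions; the 87.6% share of non-SWS epochs is the figure reported above.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa for a two-class confusion matrix (SWS as the positive class)."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of annotations and detections.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# A classifier that labels every epoch non-SWS on 1000 epochs with 87.6% non-SWS
# reaches an accuracy of 0.876 but shows no agreement beyond chance.
kappa_trivial = cohen_kappa(tp=0, fp=0, fn=124, tn=876)

# A toy classifier with some real agreement.
kappa_toy = cohen_kappa(tp=20, fp=10, fn=10, tn=60)
print(kappa_trivial, kappa_toy)
```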
In order to evaluate the effectiveness of the feature smoothing and the time delay approaches
in improving SWS and non-SWS classification, we compared four classification schemes by
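using features (A) without smoothing and without time delay, (B) with smoothing and without
time delay, (C) without smoothing and with time delay, and (D) with smoothing and with time delay.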
The spline window size and the delayed time were determined to optimize κ based on training
data. Moreover, the classification performance was also compared between using only ECG
and only RE signals and between the normal group and the low-SWS group.
8.5 Results
After the feature selection procedure described before, a total of six features were selected with
CFS when including all cardiorespiratory features. In the same way, we obtained a list of four
features when using ECG alone and four when using solely RE. The selected features using
different signal modalities are listed in Table 8.2.
The averaged discriminative powers of the selected features in different schemes are com-
pared in Figure 8.2. It indicates that the smoothing with spline fitting can improve the feature
discriminative power. Experimentally it was found that the κ value was maximized at a spline
window of 25 epochs. In addition, using the features with a negative time delay also
increased their discriminative power, as shown by comparing the ASMD values between schemes A and
C (or between schemes B and D). Here, optimal time delays τ of −2.5 and −5 min were
found experimentally for schemes C and D, respectively.
Figure 8.3 plots the classification performance (κ and AUCPR ) versus time delay (τ ) in
scheme C and D. The figure shows that the highest κ and AUCPR occurred with a negative time
delay of five epochs (2.5 min) for the unsmoothed features and of ten epochs (5 min) for the
smoothed features. This means that the optimal time delay should depend on the window size
of spline fitting. As we expected, it was longer in scheme D (with smoothing) than in scheme
C (without smoothing).
The results of SWS and non-SWS classification obtained with respect to the four schemes
are summarized in Table 8.3. The best result, obtained with smoothing and time delay,
[Figure 8.2 panels: ASMD of each selected feature for schemes A, B, C, and D.]
Figure 8.2: Average discriminative power (as measured by ASMD) of the selected features in different
schemes. The ASMD of scheme D was found to be significantly higher than the others for all the selected
features using a paired (two-sided) Wilcoxon signed-rank test (p < 0.001). The time delay τ was −2.5
min for scheme C and −5 min for scheme D.
corresponds to a pooled κ of 0.57, an overall accuracy of 88.8%, and an AUCPR of 0.68. With
an average κ of 0.56 ± 0.17, an average accuracy of 88.7 ± 4.2%, and an average AUCPR
of 0.69 ± 0.18, this scheme significantly outperforms all others, tested with a Wilcoxon test
(p < 0.0001). The table indicates that smoothing the features per recording resulted in a sig-
nificant increase in both κ and AUCPR regardless of where time delay was considered. The
classification performances of the four schemes are also compared by PR curves in Figure 8.4.
Taking a recording as an example, Figure 8.5 visually compares the PSG-based annotations and
the identified classes, suggesting an enhancement in classification performance when applying
feature smoothing and time delay. The figure also illustrates that feature smoothing can help
[Figure 8.3 panels: (a) and (b); y-axis: value (-); x-axis: time delay τ in epochs.]
Figure 8.3: Classification performance using features (a) with smoothing and (b) without smoothing
versus time delay (τ ), in epochs. The minus sign of τ indicates the use of preceding feature values.
Table 8.3: Summary of SWS and non-SWS classification results in different schemes using ten-fold CV
Result  | Prec. (%)   | Sens. (%)   | Spec. (%)  | Acc. (%)   | Kappa κ     | AUCPR
Scheme A: without smoothing and without time delay
Pool    | 53.8        | 53.9        | 91.8       | 86.1       | 0.45        | 0.54
Average | 53.5 ± 17.7 | 54.9 ± 18.7 | 91.8 ± 3.5 | 86.0 ± 4.3 | 0.43 ± 0.17 | 0.55 ± 0.18
Scheme B: with smoothing and without time delay
Pool    | 56.8        | 57.2        | 92.3       | 87.0       | 0.49        | 0.60
Average | 56.8 ± 18.3 | 58.1 ± 18.7 | 92.4 ± 3.8 | 87.0 ± 4.3 | 0.48 ± 0.17 | 0.61 ± 0.18
Scheme C: without smoothing and with time delay (τ = −2.5 min)
Pool    | 59.1        | 61.7        | 92.5       | 87.9       | 0.53        | 0.62
Average | 59.0 ± 17.7 | 63.2 ± 20.5 | 92.5 ± 3.5 | 87.8 ± 4.4 | 0.52 ± 0.18 | 0.63 ± 0.18
Scheme D: with smoothing and with time delay (τ = −5 min)
Pool    | 61.8        | 65.6        | 92.9       | 88.8       | 0.57        | 0.68
Average | 62.0 ± 17.8 | 67.2 ± 20.4 | 93.0 ± 3.7 | 88.7 ± 4.2 | 0.56 ± 0.17 | 0.69 ± 0.18
In total six features were selected via CFS (see Table 8.2). Classifier threshold was chosen to maximize
κ for training data. Significance of difference was confirmed between scheme D and the others for
accuracy, κ , and AUCPR using a paired (two-sided) Wilcoxon signed-rank test (p < 0.0001).
removing spurious (very few epochs) detections of a class in the middle of a longer period of
the other class. It confirms our expectation that the feature smoothing is an adequate way to
handle this type of error.
Table 8.4 presents the confusion matrix of our SWS and non-SWS classifier based on car-
diorespiratory features with smoothing and a 5-min negative delay. To analyze the source of
false positives or alarms (i.e., instances where non-SWS epochs were classified as SWS), the
breakdowns of classification results for non-SWS between wake, REM sleep, S1, and S2 are
also given.
[Figure 8.4 axes: precision versus recall; curves: scheme A (without smoothing and time delay), scheme B (with smoothing, without time delay), scheme C (without smoothing, with time delay of −2.5 min), and scheme D (with smoothing and time delay of −5 min).]
Figure 8.4: Pooled PR curves of SWS and non-SWS classification in different schemes, where
scheme D performed the best.
[Figure 8.5 plot: SWS/non-SWS traces versus time (h, 0 to 7) for the PSG annotations and for Schemes A, B, C, and D.]
Figure 8.5: An example of overnight PSG-based annotations and the corresponding SWS and non-SWS
classification results in different schemes.
When using one signal modality alone, the classification performance would degrade as
shown in Table 8.5 (average of κ = 0.54 for ECG or of κ = 0.51 for RE). Since the optimal time
delay (−5 min) was found to be the same for either ECG or RE features, it was then used for
comparison. Although the inclusion of ECG and RE signals yielded a better classification performance
and they can be easily and unobtrusively acquired as mentioned before, our approach
is still applicable to achieve reasonable results when one of them is absent.
We also applied our SWS detection approach for all the 325 recordings and for those in
the low-SWS group (68 recordings from 51 subjects) with the total SWS time of less than 30
min, where the classification results are presented in Table 8.6. The results for the low-SWS
group (κ = 0.21) were much worse than the normal group engaged in this study, due to which
the classification performance for all recordings dropped to κ = 0.51. Figure 8.6 (upper graph)
illustrates the relation between the amount of SWS and age, confirming what is known from
literature [216], i.e., that the amount of SWS decreases with age. Figure 8.6 (middle graph)
illustrates the classification performance versus SWS time, which were (positively) significantly
correlated. Figure 8.6 (lower graph) shows a significant (negative) correlation between κ and
age, indicating that the classification performance was age-dependent.
[Figure 8.6 plots: upper graph, SWS time (min) versus age (y); middle graph, kappa versus SWS time (min), R² = 0.28; lower graph, kappa versus age (y), R² = 0.13.]
Figure 8.6: Relation between overnight SWS time, subject age, and classification performance (κ ) of all
325 recordings (including the normal and the low-SWS recordings). Lines represent the linear equations
fitted for data from samples. Significant Spearman’s rank correlation was found between SWS time and
age (r = −0.35), between κ and SWS time (r = 0.52), and between κ and age (r = −0.32) at p < 0.001.
8.6 Discussion
It is noted that Hedner et al. [127] evaluated a sleep staging system and obtained a SWS and
non-SWS classification with a κ of 0.48 (re-computed in terms of their reported confusion
matrix). They deployed pulse rate, peripheral arterial tone, and actigraphy and their κ value is
smaller than that produced in the current study. To provide a fair comparison, we achieved a
κ of 0.51 for all 325 recordings, which still outperforms their result. A respiratory-based sleep
stager was developed in our previous work [182], reporting a κ of 0.43 in detecting SWS where
a subset of 48 normal sleep nights from the same database were included. It is lower than the
result presented here (κ = 0.51), generated using only four respiratory features.
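Re-computing κ from a reported confusion matrix, as done above for the comparison with Hedner et al., only needs the four cell counts. The following is a minimal sketch of that calculation; the function name and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics for a 2x2 confusion matrix with SWS as the positive class."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, computed from the row and column marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1.0 - p_chance)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }

# Hypothetical epoch counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=130)
```

Because κ discounts chance agreement, it is less inflated than accuracy when non-SWS epochs dominate the night, which is why it is the main figure of merit in the comparisons above.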
Although feature smoothing can, in general, benefit SWS detection, it may also introduce
errors when detecting very short SWS periods. This is because some of the high-frequency fea-
ture components are likely not due to measuring noise or to outliers caused by motion artifacts,
but rather reflect some essential characteristics of short SWS periods. In the case of a longer
SWS duration, smoothing the feature values would reduce noise and, in consequence, increase
specificity. However, in the case of a shorter SWS duration (i.e., fragmented SWS), sensitivity
would decrease. This procedure should then be adopted by finding an optimal trade-off be-
tween rejecting noise and keeping useful information. As shown in Table 8.3, the spline fitting
increased all metrics. This was also the case for the low-SWS recordings where we found that
the improvement was mainly contributed by the increase of specificity. The reason might be
that many false positives (misclassified non-SWS epochs) in a long period of non-SWS were
8.7 Conclusion
In this chapter, overnight epoch-by-epoch classification of nocturnal SWS and non-SWS was
achieved based on cardiorespiratory signals which can be acquired unobtrusively. To reduce
classification errors caused by, for example, sensor noise, body motion artifacts, and/or within-
subject variability, a recording-specific feature smoothing using spline fitting was employed.
Besides, we used the features anticipating each target epoch to identify SWS of that epoch as
long as the preceding cardiorespiratory activity (compared with the PSG-based annotations) ap-
peared during the transitions between SWS and non-SWS. With an LD classifier, we revealed
that the use of feature smoothing and time delay profoundly improved the classification per-
formance (κ of 0.57 versus 0.45). Our approach also produced reasonable classification results
when only one of the signal modalities was present. Furthermore, the classifier performed much
better for subjects who had more total SWS time than for subjects with less SWS time.
Part III: Cardiorespiratory-Based Sleep
Stage Classification
CHAPTER 9
This chapter is adapted from: X. Long, R. Haakma, T.R.M. Leufkens, P. Fonseca, and R.M. Aarts. Ef-
fects of between- and within-subject variability on autonomic cardiorespiratory activity during sleep and
their limitations on sleep staging: a multilevel analysis. Submitted.
126 Chapter 9. Effects of between- and within-subject variability
9.1 Introduction
Polysomnography (PSG) is the gold standard and common practice for the objective analysis of
sleep architecture (hypnogram) and sleep-related disorders such as insomnia/parasomnia,
sleep-disordered breathing, and rapid-eye-movement (REM) sleep behavior disorder [168]. With
PSG, sleep stages are manually scored on consecutive 30-s epochs based on electrophysiological
signals including the electroencephalogram (EEG), electromyogram (EMG), and electrooculogram
(EOG), according to the Rechtschaffen and Kales (R&K) rules [247] or the more recent
guidelines of the American Academy of Sleep Medicine (AASM) [136]. PSG recordings are
usually acquired in a sleep laboratory, and their visual scoring requires a lot of manual labor.
PSG is costly and uncomfortable for subjects and therefore not suited for long-term monitoring.
These disadvantages have motivated sleep researchers and clinicians to devote more attention to
alternatives such as cardiac and respiratory activity, which allow unobtrusive sleep staging with
minimal discomfort to subjects [127, 161, 249, 253, 303].
Cardiorespiratory activity has been shown to be associated with the autonomic sympathetic and
parasympathetic (or vagal) nervous system in humans, which in turn relates to sleep stages [95, 135,
183, 279, 292]. For example, sympathetic activation of the heart usually translates into an
increased spectral power of heart rate variability (HRV) in the low-frequency (LF) band between
0.04 and 0.15 Hz, and vagal activity (primarily caused by respiratory sinus arrhythmia) is
associated with the spectral power in the high-frequency (HF) band between 0.15 and 0.4 Hz
[288]. During REM sleep, the HF spectral power increases while the
LF spectral power decreases, when compared with non-REM (NREM) sleep and wakefulness
[265]. Furthermore, the respiratory volume and frequency are more regular during NREM sleep
than during REM sleep and wakefulness [95]. Irregular respiration patterns occurring during
wakefulness are usually caused by body movements or by alterations of ventilation control
induced by external factors; during REM sleep they can be related to muscle atonia or
subcortical structures with a possible involvement of the bizarre content of dreams [230, 233].
In addition to sleep stages, the cardiorespiratory activity can be influenced by between-
subject variability with respect to 1) subject demographics (including body size) such as age,
gender, and body mass index (BMI) [49, 227, 267], and 2) internal physiology such as response
of autonomic regulation, metabolic function, and subcortical arousals [132, 269, 305]. Other
factors, which differ from subject to subject and within subjects, such as conscious breathing
control and external sleep environment (e.g., noise and temperature), can also cause variations
in autonomic response during sleep [55, 56, 64, 206]. Furthermore, the autonomic activity ap-
pears as a function of time and the ratio of NREM and REM sleep in a sleep cycle changes
during the time course of the night [46, 292]. These would also be reflected in changes of
cardiorespiratory activity throughout the night within subjects. Additionally, the daytime activ-
ity and stressful events may change the sleep architecture and, consequently, affect autonomic
control of cardiorespiratory activity during the night [11, 118, 122]. It is, however, not clear to
what extent each of these effects can explain the variations in cardiorespiratory activity during
sleep.
In regard to automatic sleep staging with autonomic cardiorespiratory activity, parameters
are usually derived from cardiac and respiratory signals on a 30-s epoch basis [136, 247]. Due
to the existence of between- and within-subject (variability) effects, the correct identification of
sleep stages based on the cardiorespiratory parameters seems challenging, in particular when
a subject-independent model is used (i.e., when a model is derived from a set of subjects, and
used to identify sleep stages for other new subjects).
The aim of this fundamental study was to quantitatively investigate the effects of between-
and within-subject variability on cardiorespiratory activity during sleep and to evaluate the
limitations of these effects on achieving reliable cardiorespiratory-based sleep staging results.
A total of 165 healthy subjects participating in the SIESTA project [160] were included in
this study. The subjects were monitored over a period of three years from 1997 to 2000 in
seven different sleep laboratories located in five European countries. The subject demograph-
ics [mean ± standard deviation (SD)] including age, gender, and BMI are given in Table 9.1.
The protocol was approved by local ethics committees of all sleep laboratories involved and
all subjects provided a written informed consent. The subjects fulfilled the following criteria:
no significant medical disorders, no reported symptoms of neurological, mental, medical or
cardiovascular disorders, no history of drug abuse or habituation (including alcohol), no
psychoactive medication or other drugs (e.g., beta blockers), no shift work, and usual retirement
to bed between 22:00 and 24:00 depending on their habitual bedtime [160].
For each subject, single-night full PSG recordings were obtained. Each recording consists of
at least 16 channels including EEG (C3-M2, C4-M1, O1-M2, O2-M1, Fp1-M2 and Fp2-M1),
EMG (chin and leg), EOG (2 leads), electrocardiogram (ECG, single-channel, modified V1
lead), nasal airflow, respiratory effort (abdominal and chest wall with respiratory inductance
plethysmography), snoring (microphone), and blood oxygen saturation [160]. Only the ECG
signals, sampled at 100 Hz, 200 Hz, or 256 Hz depending on the equipment setup of each sleep
laboratory, and the respiratory (chest) effort signals, all sampled at 10 Hz, were used in this
study.
Each PSG recording was visually annotated in 30-s epochs as nighttime wake, REM sleep,
and one of the NREM sleep stages S1-S4 by two independent raters according to the R&K rules.
In case of disagreement, the consensus annotations between the two raters were obtained. For
the analysis in this study, we considered four stages: wake, REM sleep, light sleep (merging
S1 and S2), and deep sleep or slow wave sleep (merging S3 and S4). Table 9.1 presents some
sleep statistics of the recording nights.
The ECG and respiratory effort signals of all subjects were preprocessed before computing the
parameters used for analyses. The baseline wander of the ECG signal was removed with a
linear-phase high-pass filter using a 1.106-s Kaiser window with a 0.8-Hz cutoff frequency and a
30-dB side-lobe attenuation [297]. The resulting signal was normalized with regard to mean and
amplitude and a low-complexity precise QRS complex localization algorithm [107] was used
to locate the R peaks in the signal. The resulting heartbeat or RR intervals were re-sampled at
4 Hz using a linear interpolator. To compute the cardiac parameters in the frequency domain,
the power spectral density of the re-sampled RR intervals was estimated with an autoregressive
model [42]. Ectopic RR intervals longer than 2 s, shorter than 0.3 s, or shorter than 0.6 times
their previous value were discarded.
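The RR-interval handling described above (discarding ectopic intervals and re-sampling at 4 Hz by linear interpolation) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the processing chain used in the study: the function name is hypothetical, each interval is time-stamped at its closing beat, and "previous value" is taken to mean the immediately preceding raw interval.

```python
import numpy as np

def clean_and_resample_rr(r_peak_times, fs_out=4.0):
    """Discard ectopic RR intervals, then resample the series at fs_out Hz."""
    r_peak_times = np.asarray(r_peak_times, dtype=float)
    rr = np.diff(r_peak_times)   # RR intervals in seconds
    t_rr = r_peak_times[1:]      # assumption: interval stamped at its closing beat

    # Rule 1 and 2: drop intervals longer than 2 s or shorter than 0.3 s.
    keep = (rr <= 2.0) & (rr >= 0.3)
    # Rule 3: drop intervals shorter than 0.6 times the preceding raw interval.
    # The first interval has no predecessor, so NaN makes the comparison False.
    prev = np.concatenate(([np.nan], rr[:-1]))
    with np.errstate(invalid="ignore"):
        keep &= ~(rr < 0.6 * prev)
    rr, t_rr = rr[keep], t_rr[keep]

    # Linear interpolation onto a uniform grid.
    t_uniform = np.arange(t_rr[0], t_rr[-1], 1.0 / fs_out)
    return t_uniform, np.interp(t_uniform, t_rr, rr)

# Hypothetical R-peak times (s) containing one very short and one very long interval.
t_u, rr_u = clean_and_resample_rr([0.0, 1.0, 2.0, 2.2, 3.0, 4.0, 7.0, 8.0])
```

Comparing against the preceding raw interval means that the interval following a very long one can also be rejected; comparing against the last accepted interval instead would be an equally plausible reading of the rule.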
The respiratory effort signal was first low-pass filtered using a 10th-order Butterworth filter
with a cut-off frequency of 0.6 Hz to eliminate high-frequency noise. Afterwards, the signal
baseline was removed by subtracting the median peak-to-trough amplitude estimated over the
entire signal. The respiratory peaks and troughs were detected by locating the signal turning
points based on sign changes of signal slopes. Finally, we excluded incorrectly detected peaks
and troughs 1) in peak-to-trough or trough-to-peak intervals where the sum of two successive
intervals was less than the median of all intervals over the entire recording and 2) with am-
plitudes where the peak-to-trough difference was smaller than 0.15 times the median of the
entire-night respiratory signal [185].
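Locating turning points from sign changes of the signal slope, as described above, can be sketched in a few lines. This is a minimal illustration only: the function name is hypothetical, the interval and amplitude exclusion rules from the text are not implemented, and a flat segment at the very start of the signal is not handled specially.

```python
import numpy as np

def turning_points(x):
    """Return (peak_idx, trough_idx): samples where the slope changes sign."""
    x = np.asarray(x, dtype=float)
    slope_sign = np.sign(np.diff(x))
    # Carry the last non-zero sign across flat samples so that a plateau
    # produces a single turning point instead of two.
    for i in range(1, len(slope_sign)):
        if slope_sign[i] == 0:
            slope_sign[i] = slope_sign[i - 1]
    change = np.diff(slope_sign)
    peaks = np.where(change < 0)[0] + 1    # rising, then falling
    troughs = np.where(change > 0)[0] + 1  # falling, then rising
    return peaks, troughs

# Hypothetical short signal with two peaks and one trough.
peaks, troughs = turning_points([0, 1, 2, 1, 0, 1, 2, 3, 2])
```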
We analyzed six cardiorespiratory (two respiratory and four cardiac) parameters. The respi-
ratory parameters were BR, the mean breathing rate or respiratory frequency, and SDBR, the
standard deviation of breathing rates. For cardiac activity, the time-domain parameters included
HR, the mean heart rate, and SDNN, the standard deviation of heartbeat intervals. The spectral-
domain parameters included LF, the spectral power of heartbeat intervals in the LF band, and
HF, the spectral power in the HF band. Note that LF and HF were normalized by dividing
them by the total spectral power minus the power in the very-low-frequency (VLF, 0.003-0.05
Hz) band [58, 288]. They are therefore expressed in normalized units (nu) instead of
absolute units (ms²). These parameters have been widely used for
cardiorespiratory-based sleep staging [94, 182, 185, 248, 249]. A logarithmic transformation was applied to BR,
SDBR, HR, and SDNN to correct for non-symmetry in the frequency distributions. Measure-
ment units are therefore expressed in natural logarithmic Hz (ln-Hz) for BR and SDBR, natural
logarithmic beats per minute (ln-bpm) for HR, and natural logarithmic millisecond (ln-ms) for
SDNN.
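The normalization of LF and HF described above can be illustrated with a short sketch. It assumes that a power spectral density estimate is already available as arrays (the autoregressive estimation itself is not shown), that "total spectral power" means the integral over the whole supplied spectrum, and it uses the band edges quoted in the text as defaults; the function names are hypothetical.

```python
import numpy as np

def band_power(freqs, psd, lo, hi):
    """Trapezoidal integral of the PSD over the samples with lo <= f <= hi."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    mask = (freqs >= lo) & (freqs <= hi)
    f, p = freqs[mask], psd[mask]
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(f)))

def normalized_lf_hf(freqs, psd, vlf=(0.003, 0.05), lf=(0.04, 0.15), hf=(0.15, 0.4)):
    """LF and HF power divided by (total power minus VLF power), in nu."""
    freqs = np.asarray(freqs, dtype=float)
    total = band_power(freqs, psd, freqs[0], freqs[-1])
    denom = total - band_power(freqs, psd, *vlf)
    return band_power(freqs, psd, *lf) / denom, band_power(freqs, psd, *hf) / denom

# Hypothetical flat spectrum, used only to exercise the arithmetic.
f = np.linspace(0.0, 0.5, 501)
lf_nu, hf_nu = normalized_lf_hf(f, np.ones_like(f))
```

The log-transformed parameters would then simply be `np.log(value)` applied to BR, SDBR, HR, and SDNN in their original units.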
Values of the cardiorespiratory parameters (mean ± SD) measured from subjects with different
demographics (gender, age, and BMI) and time of night are presented. We considered different
cohort sets including three age groups: young (20-39 y), middle (40-69 y), and elderly (>69
y) and three BMI groups: under weight (<18.5 kg/m2), normal weight (18.5-25 kg/m2), and
over weight (>25 kg/m2 ). In addition, total sleep time was divided into four periods: 0-90
min, 90-180 min, 180-270 min, and >270 min. Significance of difference between groups was
tested with the analysis of variance (ANOVA) F-test.
Traditional statistical methods such as repeated measures ANOVA (rANOVA) and repeated
measures multivariate ANOVA (rMANOVA) are often used to analyze longitudinal data. How-
ever, they might not be appropriate since they expect uncorrelated and independent observations
[23]. In regard to the nature of multiple dependent variables, a more generalized multilevel re-
gression analysis [134] takes structural variables with fixed and random effects measured at
multiple hierarchical levels into account. Compared with the traditional methods, multilevel
analysis has several advantages [134, 301]. First, it serves to deal with incomplete data while
ANOVA-based methods handle that by simply deleting all subjects with missing measures.
Second, it concerns data with a hierarchical structure and thus allows for meta-analysis of ex-
planatory variables with effects on different levels simultaneously. Third, it is able to quantify
the variability within levels. For these reasons, we applied multilevel models to statistically
evaluate the effects of between- and within-subject variability on the cardiorespiratory parameters.
Under a variety of names used by different authors, multilevel models are also known as, e.g.,
mixed models, random-effects models, and hierarchical linear models [134].
the influences from the differences in sleep environment, daytime energy expenditure, and other
factors or behaviors such as stress, smoking, and personality. These influences, if existent, were
assumed to be conveyed by the physiological variability. Additionally, in our previous work
[184], there were no effects on the cardiac activity found between different laboratories based
on the same data. For this reason, we disregarded the laboratory factor during our modeling
procedure.
To evaluate the between- and within-subject effects, we constructed a multilevel model with
two levels (level two: subject; level one: time or epoch) for a given cardiorespiratory parameter
$y$. The model predicts/estimates the values of the parameter based on a set of variables including
sleep stages, age, gender, BMI, and time of night. For the parameter value $y_{ij}$ in the $i$th epoch
of the night ($i = 1, 2, \ldots, N$ with a total of $N$ epochs) from subject $j$ ($j = 1, 2, \ldots, M$ where $M$
is the total number of subjects), the two-level regression model with associated coefficients is
given by
in which $\beta_0$ is the fixed intercept, $\mu_{0j}$ is the random effect with variance $\Omega_0$ indicating the
between-subject variability in physiology (independent of sleep stages or corrected by sleep
stages), and $e_{0ij}$ is the (random) residual term with variance $\Omega_e$ quantifying the within-subject
physiological variability (independent of time). $s_{ij}$ is a dummy variable (0 or 1) specifying the
sleep stage ($s$ = wake, REM, light, deep) of epoch $i$ from subject $j$ with its fixed effect $\beta_s$ and
random effect $\mu_{sj}$, where $\Omega_s$ reflects the between-subject physiological variability in sleep stage
$s$. The demographic variables age (y), gender (dummy variable: 0 = man, 1 = woman), and BMI
(kg/m²) respectively correspond to the fixed effects $\beta_a$, $\beta_g$, and $\beta_b$ varying between subjects.
The variable $\mathrm{time}_{ij}$ (min) expresses the relative time of epoch $i$ ($\mathrm{time}_{ij} = i/2$) from subject $j$,
$\beta_t$ is the fixed time effect corresponding to linear changes over time within subjects, $\mu_{tj}$ is the
random time effect with variance $\Omega_t$ indicating the variability of time effect between subjects,
and $\beta_{ta}$, $\beta_{tg}$, and $\beta_{tb}$ are cross-interactions specifying the fixed age-, gender-, and BMI-related
time effects, respectively. Note that the random effects (including residuals)
were assumed to be drawn from normal distributions with zero mean. Here the normality was
visually checked using a heuristic Quantile-Quantile (Q-Q) plot method since the commonly
used numerical normality tests are not appropriate for large samples [271].
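Written out from the symbol definitions above, the two-level model takes roughly the following form. This is a sketch assembled from those definitions and from the correction schemes in Equations 9.5 and 9.6, not a copy of the typeset equation: the ordering of terms, the variable names $\mathrm{age}_j$, $\mathrm{gender}_j$, and $\mathrm{BMI}_j$, and the symbol $\beta_{tg}$ for the gender-related time effect are assumptions.

```latex
y_{ij} = \beta_0 + \mu_{0j}
       + \sum_{s} (\beta_s + \mu_{sj})\, s_{ij}
       + \beta_a\,\mathrm{age}_j + \beta_g\,\mathrm{gender}_j + \beta_b\,\mathrm{BMI}_j
       + (\beta_t + \mu_{tj})\,\mathrm{time}_{ij}
       + (\beta_{ta}\,\mathrm{age}_j + \beta_{tg}\,\mathrm{gender}_j + \beta_{tb}\,\mathrm{BMI}_j)\,\mathrm{time}_{ij}
       + e_{0ij}
```

Under this reading, the fixed part contains the intercept, the stage, demographic, time, and demographic-by-time terms, while the random part contains $\mu_{0j}$, $\mu_{sj}$, $\mu_{tj}$, and the residual $e_{0ij}$.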
where $\bar{y}_j$ is the variable that gives the within-subject mean value over the entire night for
subject $j$ and its associated fixed slope $\beta_c$ corresponds to the between-subject centering effect.
This effect is meant to reflect the physiological difference between subjects at the (individual)
overnight mean level. Here the estimation of the overnight mean value was assumed to be in-
dependent of sleep stage composition (percentages of sleep stages) over the entire night. To a
certain degree, the demographic effects were expected to be conveyed by the centering effect.
Therefore, the model without the centering term (Model #1) should be used for exploring the
actual demographic effects with a single model.
$$Z = \frac{\gamma^2}{\mathrm{SE}^2(\gamma)}. \tag{9.3}$$
The acceptance or rejection of the null hypothesis can be tested with a chi-squared (χ²) test
with one degree of freedom (df).
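As a worked illustration of Equation 9.3, the statistic and its p-value can be computed with the standard library alone, using the fact that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(Z/2)). The function name and the example coefficient and standard error are hypothetical.

```python
import math

def wald_test(gamma, se):
    """Return (Z, p) for H0: gamma = 0, with Z = gamma^2 / SE^2(gamma)."""
    z = (gamma / se) ** 2
    # Upper tail of chi-squared with 1 df: P(X > z) = erfc(sqrt(z / 2)).
    p = math.erfc(math.sqrt(z / 2.0))
    return z, p

# Hypothetical coefficient estimate and standard error.
z_stat, p_value = wald_test(gamma=0.042, se=0.020)
```

A coefficient would be retained in the optimized model when the resulting p-value falls below the chosen significance level.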
Table 9.2: Description of the seven explanatory effects (with exclusion of sleep stage effects) on
cardiorespiratory activity considered in this study.
Effect Description
◮ Overall between-subject effect
Demographic effect Fixed, variability in age, gender, BMI between subjects
Centering (physiological) effect Fixed, variability in overnight mean level between subjects
Between-subject time effect Random, variability in time of night between subjects
Between-subject physiological effect Random, variability in physiology between subjects
◮ Overall within-subject effect
Within-subject time effect Fixed, variability in time of night within subjects
Within-subject physiological effect Random, variability in physiology within subjects
◮ Cross-interaction effect
Demographic-related time effect Fixed, demographic-related variability in time of night
The models described in Equations 9.1 and 9.2 are ‘full’ models and need to be optimized
by excluding the effects whose coefficients are not statistically different from zero (tested with the
Wald statistic). Differences between models are assessed by comparing model deviances using
a χ² statistic (i.e., likelihood ratio test) with df = 2. This chapter only presents the results of the
optimized models, which retain only the significant effects.
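The deviance comparison described above can be sketched as follows. For two degrees of freedom the chi-squared upper tail has the closed form exp(-x/2), so no statistics library is needed; the function name and the deviance values are hypothetical.

```python
import math

def deviance_test_df2(deviance_reduced, deviance_full):
    """Likelihood ratio test with df = 2: returns (deviance difference, p-value)."""
    diff = deviance_reduced - deviance_full
    # Upper tail of chi-squared with 2 df: P(X > x) = exp(-x / 2).
    p = math.exp(-diff / 2.0) if diff > 0 else 1.0
    return diff, p

# Hypothetical deviances of two nested models.
diff, p_value = deviance_test_df2(deviance_reduced=10250.0, deviance_full=10230.0)
```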
It is of particular interest to determine how much of the model variance is explained by the different
variables or effects. As described in Table 9.2, a total of seven explanatory effects for each
cardiorespiratory parameter were considered in this study. Raudenbush and Bryk [246] proposed
an approach that uses the squared multiple correlation R² over a sequence of models. Suppose that
the full model under consideration for a given parameter is Model #2, given by Equation 9.2.
A sequence of seven models (Model A-G) can be established in a certain order that serves to
compute the proportion of variance explained (PVE) of each effect. The details of this procedure are
described in the Appendix.
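The general idea of attributing variance to effects through a sequence of nested models can be sketched as a proportional reduction in unexplained variance. This is only a generic illustration and should not be read as the procedure given in the Appendix: the function name, the ordering, and the numbers are hypothetical, and the unexplained variance of each model is assumed to be the sum of its random-intercept and residual variances.

```python
def proportion_of_variance_explained(unexplained, effect_names):
    """unexplained[k]: unexplained variance after fitting the k-th model in the
    sequence (index 0 is the baseline). Returns the PVE in percent per added effect."""
    baseline = unexplained[0]
    pve = {}
    for name, before, after in zip(effect_names, unexplained[:-1], unexplained[1:]):
        # Variance removed by adding this effect, relative to the baseline variance.
        pve[name] = 100.0 * (before - after) / baseline
    return pve

# Hypothetical variances for a baseline model and two successively extended models.
pve = proportion_of_variance_explained([2.0, 1.5, 0.5], ["effect A", "effect B"])
```

Because each effect is credited only with the variance it removes after the earlier ones, the resulting percentages depend on the order in which the effects enter the sequence.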
• The second scheme CS2 corrects the parameter values by subtracting all the (sleep-stage-independent)
fixed effects and all the between-subject random effects, such that
$$\text{CS2:}\quad \hat{y}_{ij} = \sum_{s} \hat{\beta}_s s_{ij} + \hat{e}_{0ij}. \tag{9.5}$$
• The third scheme CS3 excludes all the (sleep-stage-independent) fixed effects and the
within-subject effect to correct the parameter values, such that
$$\text{CS3:}\quad \hat{y}_{ij} = \hat{\mu}_{0j} + \sum_{s} (\hat{\beta}_s + \hat{\mu}_{sj})\, s_{ij} + \hat{\mu}_{tj}\,\mathrm{time}_{ij}. \tag{9.6}$$
Note that, again, the exclusive aim of analyzing these correction schemes in the present study
was to evaluate in what aspect and how far the cardiorespiratory parameters can be improved for
sleep staging instead of really performing sleep staging. In other words, we intended to answer
the question what sleep staging performance can be achieved if we can eliminate the effects
caused by the between- or within-subject variability. Investigating methods of estimating the
fixed coefficients and random variances without knowing sleep stages was not addressed here.
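Given estimated coefficients for one subject, the corrected values of Equations 9.5 and 9.6 reduce to simple array arithmetic, because the stage dummies select exactly one stage per epoch. The sketch below assumes that the estimates are already available from a fitted multilevel model; the function name, the argument layout, and the numbers are hypothetical.

```python
import numpy as np

def corrected_values(stage, time_min, beta_s, mu_0j, mu_sj, mu_tj, resid):
    """Return (CS2, CS3) corrected parameter values for one subject.

    stage    : stage index per epoch (0..3 for wake, REM, light, deep)
    time_min : relative time of each epoch in minutes
    beta_s   : fixed stage effects, one per stage
    mu_0j    : random intercept of this subject
    mu_sj    : random stage effects of this subject, one per stage
    mu_tj    : random time slope of this subject
    resid    : residual per epoch
    """
    stage = np.asarray(stage)
    beta_s = np.asarray(beta_s, dtype=float)
    mu_sj = np.asarray(mu_sj, dtype=float)
    # Indexing by stage is equivalent to the sum over one-hot dummies s_ij.
    cs2 = beta_s[stage] + np.asarray(resid, dtype=float)
    cs3 = mu_0j + (beta_s + mu_sj)[stage] + mu_tj * np.asarray(time_min, dtype=float)
    return cs2, cs3

# Hypothetical estimates for two epochs (wake, then light sleep).
cs2, cs3 = corrected_values(
    stage=[0, 2], time_min=[0.5, 1.0],
    beta_s=[0.1, 0.2, 0.3, 0.4], mu_0j=1.0,
    mu_sj=[0.01, 0.02, 0.03, 0.04], mu_tj=0.1,
    resid=[0.05, -0.05],
)
```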
9.3 Results
9.3.1 Descriptive results
Figure 9.1 compares the skewness of the parameters with and without being transformed using
natural logarithm. It indicates that the four parameters BR, SDBR, HR, and SDNN need to
be log-transformed since they were of skewed distribution and their skewness values largely
decreased after performing the log-transformation. Table 9.3 shows the values (mean ± SD) of
the six cardiorespiratory parameters BR, SDBR, HR, SDNN, LF, and HF analyzed in this study
for different cohort sets in different gender, age groups, BMI groups, time periods, and sleep
stages. The values significantly differed across different groups for all the cohort sets (ANOVA
F-test, p < 0.001).
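The effect of the logarithmic transformation on skewness can be reproduced with a small sketch. It uses the moment-based sample skewness and synthetic log-normal data; the function name, the random seed, and the distribution parameters are hypothetical and do not represent the study data.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

# Synthetic right-skewed values standing in for a parameter such as SDNN.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=4.2, sigma=0.3, size=5000)

skew_original = skewness(values)
skew_log = skewness(np.log(values))
```

For data of this kind the skewness of the log-transformed values is close to zero, which is the pattern Figure 9.1 reports for BR, SDBR, HR, and SDNN.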
[Figure 9.1 plot: skewness (vertical axis) of each cardiorespiratory parameter (BR, SDBR, HR, SDNN, LF, HF), shown for the original values and after natural logarithm transformation.]
Figure 9.1: Skewness comparison of cardiorespiratory parameters with and without natural logarithm
transformation, indicating that BR, SDBR, HR, and SDNN should be log-transformed.
In comparison with the F-test, the multilevel regression models enable a more adequate and
thorough statistical analysis. With the multilevel Model #1, the estimated coefficients and
variances for all the parameters are shown in Table 9.4. The model was optimized by removing the
insignificant variables (Wald Z-test, p > 0.05), except for the constant intercept and the
sleep stage variables. The table indicates that the demographics significantly
influenced several aspects of cardiorespiratory activity. On closer inspection,
the breathing rate BR of healthy subjects with a higher BMI was significantly
higher than that of subjects with a lower BMI (0.011 ln-Hz per kg/m², p < 0.01) at the baseline
of -1.458 ln-Hz, whereas its variation SDBR remained the same. For cardiac activity, the mean
heart rate HR of women was higher than that of men (0.042 ln-bpm, p < 0.05) at the baseline of 4.221
ln-bpm, while its variation SDNN was lower than that of men (-0.247 ln-ms, p < 0.0001) at the
baseline of 4.823 ln-ms. SDNN was also negatively associated with subject age (-0.009 ln-ms per y,
p < 0.0001) and BMI (-0.025 ln-ms per kg/m², p < 0.01). In the spectral analysis of HRV,
men had an LF power that was higher by 0.045 nu (p < 0.05) and an HF power that was lower by 0.052 nu
(p < 0.01) compared with women during sleep. The HF power decreased slightly
with increasing age for men (-0.002 nu per y, p < 0.05). These results are consistent
with previous work [49, 101, 257].
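Because BR, SDBR, HR, and SDNN are modeled on a natural-logarithm scale, an additive coefficient corresponds to a multiplicative change in the original unit. The sketch below converts two of the values quoted above back to the original scale; the function names are hypothetical, and the conversion is plain arithmetic rather than an additional result of the study.

```python
import math

def ln_to_original(ln_value):
    """Convert a value on the natural-log scale back to its original unit."""
    return math.exp(ln_value)

def ln_effect_to_percent(ln_difference):
    """An additive difference on the log scale is a relative change (in %)
    on the original scale."""
    return 100.0 * (math.exp(ln_difference) - 1.0)

baseline_hr_bpm = ln_to_original(4.221)            # HR baseline quoted above
women_vs_men_pct = ln_effect_to_percent(0.042)     # gender effect on HR quoted above
```

Read this way, the HR baseline corresponds to roughly 68 bpm and the gender coefficient to a difference of about 4%.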
Most of the analyzed parameters were found to be time-variant (i.e., they were modulated
by time of night) with an exception of breathing rate (Table 9.4). For instance, the heart rate
HR dropped down gradually along with the time progression over the night (-0.0001 ln-bpm
Table 9.3: Values (mean ± SD) of the six cardiorespiratory parameters in different cohort sets
(n=165)
ln, natural logarithm; nu, normalized unit; young, 20-39 y; middle, 40-69 y; elderly, >69 y; under weight,
<18.5 kg/m²; normal weight, 18.5-25 kg/m²; over weight, >25 kg/m²; light sleep, S1 and S2 stages;
deep sleep, S3 and S4 stages. For all the parameters, values differed significantly between
cohort groups (F-test, p < 0.001), but this may be imprecise since subject demographics, time of night,
and sleep stages were possibly not independent.
per min, p < 0.0001) at the baseline of 4.221 ln-bpm while the variation in heartbeat intervals
SDNN increased (0.001 ln-ms per min, p < 0.0001) at the baseline of 4.823 ln-ms, confirming
the findings reported previously [57]. This time modulation varied from subject to subject
because of the presence of significant variance Ωt (p < 0.0001), referring to the random time
effect. The time effect was also modulated by some demographic variables (such as age for SDNN
and BMI for SDBR, LF, and HF). We note in the table that there were significant
between-subject physiological effects for all parameters (p < 0.0001), measured by the random
variances of sleep stage variables. These variances seemed approximately homogeneous across
sleep stages for BR and HR but were clearly different for their variations SDBR and SDNN.
Figure 9.2 illustrates an example that compares the parameter values (estimated by multilevel
regression based on Model #1) changing along with time between two subjects with different
Table 9.4: Coefficients and their standard errors (SE) of the optimized multilevel model without the
between-subject centering effect (Model #1) for the six cardiorespiratory parameters.
ln, natural logarithm; nu, normalized unit. The statistically significant effects (Wald Z-test, p < 0.05), together with the fixed
constant intercept β0 and sleep stage intercepts βs, are presented.
demographics. It shows that the fixed time and demographic effects were generally larger than
the differences between sleep stages.
With the addition of the centering variable to Model #1, we have Model #2 and the estimated
regression coefficients after model optimization (Wald Z-test at p < 0.05, for each coefficient)
are shown in Table 9.5. As stated, this model included the between-subject physiological effect
at the overnight mean level (i.e., centering effect), resulting in an obvious reduction of the
random variance in each sleep stage compared with Model #1. This indicates that, regardless of
sleep stages, the between-subject variability in physiology can be reflected, to a certain degree,
by the difference of the mean value over night. Besides, centering the parameter values per
subject slightly influenced the time effect in both fixed and random parts. In comparison with
Model #1, a lower deviance was obtained with Model #2 for all the parameters (p < 0.0001),
as shown in Tables 9.4 and 9.5, indicating a better goodness-of-fit for the
model with the centering variable.
[Figure 9.2 plots: six panels showing BR (ln-Hz), SDBR (ln-Hz), HR (ln-bpm), SDNN (ln-ms), LF (nu), and HF (nu) versus time (min, 0 to 400), each with one curve for the man and one for the woman.]
Figure 9.2: An example of multilevel regressions of the six cardiorespiratory parameters for a man (age: 24 y, BMI: 21.3 kg/m2 ) and a woman (age: 70 y,
BMI: 28.6 kg/m2 ) using coefficients estimated through Model #1 excluding the random coefficients and residual term. The regression variables included
age, gender, BMI, time, time × age, time × gender, time × BMI, and sleep stages wake, REM, light, and deep.
Table 9.5: Coefficients and their standard errors (SE) of the optimized multilevel model with the addi-
tional between-subject centering effect (Model #2) for the six cardiorespiratory parameters.
ln, natural logarithm; nu, normalized unit. The statistically significant effects (Wald Z-test, p < 0.05), together with the fixed
constant intercept β0 and sleep stage intercepts βs, are presented.
Normality of the variances was checked, and supported, using the Q-Q plot method for all
models. For example, the Q-Q plots of the residual variances Ωe (in Model #1) for all the
parameters are shown in Figure 9.3, suggesting that the variances were approximately drawn
from a normal distribution.
To explore which effects explained the variance and how much each contributed, we
computed, for each cardiorespiratory parameter, the PVE of each effect by analyzing the
estimated variances of random intercept and residual in a sequence of models (Model A-G in
the Appendix). The variance changes in the models with the inclusion of different effects in
a specific order are shown in Table 9.6, based on which the PVE values were obtained in Ta-
ble 9.7. Note that the variances explained by sleep stages were not included in PVE. For BR
Part III. Cardiorespiratory-based sleep stage classification 139
Quantiles of Ωe
Quantiles of Ωe
Quantiles of Ωe
0.2
0 0 0
-0.5 -0.2
-2
-0.5 0 0.5 -2 0 2 -0.2 0 0.2
Standard normal quantiles Standard normal quantiles Standard normal quantiles
Quantiles of Ωe
Quantiles of Ωe
Quantiles of Ωe
0.5 0.5
0 0 0
-2 -0.5 -0.5
Figure 9.3: Q-Q plots of residual variance Ωe of the multilevel models (Model #1) for the six cardiores-
piratory parameters. These plots suggest approximate normal distributions of the residual variances.
and HR, the between-subject centering effects dominated the variances (55.26% for BR and
77.95% for HR), indicating that the subjects behaved differently with respect to their breathing
rate and heart rate at the general mean level throughout the whole night. We also see that the
variations in breathing rate and heart rate had a lower centering difference between subjects
(with PVE of 26.23% for SDBR and of 39.06% for SDNN) compared with the physiological
variability within subjects (with PVE of 61.69% for SDBR and of 40.87% for SDNN). This
was also the case for LF and HF powers in the spectral domain of HRV as shown in Table 9.7.
As a result, the overall between-subject variability influenced more on breathing rate (PVE
of 66.58%) and heart rate (PVE of 86.25%) while less on their variations (PVE of 37.94%,
58.66%, 33.62%, and 35.13% for SDBR, SDNN, LF, and HF, respectively) compared with the
overall within-subject variability. In general, the variances explained by the effects in physiol-
ogy between subjects (including the effect at the overnight mean level and random effect) and
within subjects accounted for 83.83-97.16% of the total variance for different cardiorespiratory
parameters.
Specifically, a relatively larger percentage (13.7%) of the demographic effect was found
for SDNN compared with the other parameters. The PVE of between-subject physiological
variability (in the random part) ranged from 2.27% to 7.62% depending on the parameter.
For the time effect, the PVE in the fixed part (0.01-1.32%), reflecting the linear changes of
parameters over time within subjects, was smaller than that in the random part (1.58-2.74%),
which indicates that changes over time differed between subjects. In general, the time effect
accounted for less of the total variance than most other effects. Finally, although
cross-interactions between time and demographics existed for BR, SDNN, LF, and HF, the
proportion of variance they explained was very small (<0.20%).
Table 9.6: Variances of a sequence of models (Model A-G in the Appendix) with different effects
for computing their PVE for the six cardiorespiratory parameters.
ln, natural logarithm; nu, normalized unit; Dev., model deviance; Ne, no effect. All the models include
fixed (β0 ) and random (µ0 ) intercepts, and sleep-stage-dependent variables (wake, REM, light, and deep)
with their coefficients. The models were optimized by excluding the effects with their coefficients statis-
tically equal to zero (Wald Z-test, p > 0.05) and the variances presented in the table were all statistically
significant (Wald Z-test, p < 0.01).
Table 9.7: Proportion of variance explained (PVE, %) accounted for by different effects for the six
cardiorespiratory parameters.
Table 9.8: Comparison of sleep staging results (wake/REM sleep/light sleep/deep sleep) using
different schemes in correcting the cardiorespiratory parameters.
The results of sleep staging are presented in Table 9.8, where different schemes (BS and CS1-
CS3) are compared. Correcting the parameters for the between- and/or within-subject
effects generally improved sleep staging performance
(compare the results of CS1-CS3 with BS). In particular, correcting the parameters by
the fixed effects (demographics, time, and their cross-interactions) independent of sleep stages
(CS1) resulted in a significantly increased Kappa of 0.29 ± 0.11 and a significantly increased
accuracy of 60.4 ± 8.8% (Wilcoxon test, p < 0.00001) compared with the baseline without any
correction (Kappa of 0.19 ± 0.10 and accuracy of 55.8 ± 9.8%). In addition, if we could manage
to further correct the variability of the parameters caused by the between-subject random
effects (CS2), the sleep staging results would significantly increase to a Kappa of 0.35 ± 0.09
and an accuracy of 62.9 ± 7.8% (Wilcoxon test, p < 0.00001), and the SD of the results over
subjects would be reduced at the same time. On the other hand, if the within-subject variability
could be corrected (CS3), the sleep staging performance would improve markedly (to a
Kappa of 0.72 ± 0.23 and an accuracy of 83.5 ± 14.4%; Wilcoxon test, p < 0.00001), but
the SD would increase because this correction scheme focuses on reducing effects
within subjects rather than those between subjects. Similarly, as shown in Table 9.8, correcting
the parameters could help obtain a more accurate estimate of sleep stage composition.
9.4 Discussion
The demographic and time-of-night effects found in this study are consistent with the
findings reported in previous work [49, 57, 101, 257]. Note that the model used to facilitate
the interpretation of the demographic effects (Model #1) should not include the (between-subject)
centering variable, because demographic differences usually correspond to
autonomic changes at the overnight mean level. With the centering effect included
in Model #2, it came as a surprise that some demographic variables still had significant effects
(see Table 9.4), which contradicts our expectation that their effects on cardiorespiratory activity
are fully manifested in the parameter mean values. The cause is that the percentages (or
composition) of sleep stages were not exactly the same for all subjects. Therefore, the demographic
differences were only partially explained by the centering variable, and the unexplained
part depends on the difference in sleep stage composition between subjects.
It is important to note that, since some effects were correlated with each other, the order in
which the sequence of models is constructed (see the Appendix) must be chosen deliberately
in order to quantify precisely the proportion of variance explained by each
effect. Models with fixed effects (e.g., demographic effects) that are explainable by other effects
should be addressed first, and models with random effects should be included later [134].
In Tables 9.4 and 9.5, it can be seen that the time variable was able to explain variance at the
subject level, as indicated by the significance of the random time effect. First, the slope of
cardiorespiratory activity over time might depend on sleep stages (or their transitions) and thus
might not follow a continuous linear trend. One way of handling the sleep-stage dependency
is to use a model that contains the cross-interactions between sleep stages and time; to handle
the influence of sleep stage transitions, the night can be regarded as a series of segments
without any sleep stage transitions. Second, the random time effect is likely due to
differences between subjects in autonomic control or in changes in sleep architecture caused by other
factors such as daytime activity, work stress, and response to the sleep environment during
sleep. This was not addressed in this study and merits further investigation. On the other
hand, the cross-interactions between time and demographics (in particular, BMI) explained
some of the total variance at both subject and epoch levels. Although the amount and proportion of
variance explained by the time-related effects seem much smaller than those of some other effects, as
shown in Table 9.7, they are still statistically unequal for different subjects and are relatively large
compared with the differences between sleep stages for some parameters such as LF and HF,
especially at the end of the night, as can be observed in Figure 9.2.
Regarding the quantified within-subject effects, several factors in addition to internal physiology
may also explain some of the total variance within subjects in cardiorespiratory activity, such
as body movements, body position, sleep environment, conscious breathing control, and even
daytime activity. However, this work did not determine which of these factors were at play,
and this should be studied in the future.
When evaluating the performance of sleep staging using the cardiorespiratory parameters,
Model #2 should be preferred. For each parameter, although the estimate of
its overnight mean value for each subject was not completely accurate (due to the difference
in sleep stage composition between subjects), correcting for it can still greatly reduce the
physiological variability between subjects. As a consequence, the sleep staging
results can be improved. Table 9.6 confirms that the centering effect constituted a
large proportion of the total variance. Moreover, Figure 9.2 illustrates that the variations of the
parameters caused by demographic and time effects were comparable with or even
larger than the differences between sleep stages, making sleep stages difficult to separate.
With respect to the capability of the parameters in classifying sleep stages, Table 9.5 shows
that, for example, SDBR had a larger difference between sleep stages compared with the other
parameters, while BR had no difference between REM sleep and wakefulness. This indicates
that the intrinsic separation of sleep stages varies between parameters that express
different aspects of autonomic activity.
Table 9.8 indicates that the variability between and within subjects conveyed by the cardiorespiratory
activity limited the sleep staging performance. To improve it, the correction
scheme CS1 seems applicable from a practical point of view because the fixed effects
are usually prior information that is independent of sleep stages, or they can be estimated
from the training data before performing sleep staging. However, realizing CS2 and CS3 requires
either information on sleep stages (which are unknown in practice and need to be
identified) or estimation of random variances (which are hardly predictable for new subjects).
Therefore, the challenge is how to diminish the random effects caused by variability
either between or within subjects when sleep stages are unknown. For instance, normalizing
the parameter values based on their variation or distribution throughout the night for each subject
might reduce the between-subject random effect in physiology to some extent.
Incorporating more explanatory variables that are independent of sleep stages and
able to explain some variance of the model would help to better correct the parameters. Exploring
new parameters that have smaller random variances than those analyzed in this study
(i.e., that are less influenced by the between- or within-subject physiological variability)
or that carry additional information for separating sleep stages may improve the sleep staging performance.
Nevertheless, we argue that the performance of cardiorespiratory-based sleep staging will always
be limited unless the between- and/or within-subject random variances are successfully
explained and corrected.
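The per-subject normalization suggested above can be illustrated with a minimal sketch. It assumes a plain Z-score over each subject's whole night, which is only one possible reading of "normalizing based on variation or distribution"; the function name `zscore_per_subject` and the heart-rate values are hypothetical and not taken from the study.

```python
from statistics import mean, pstdev


def zscore_per_subject(values_by_subject):
    """Scale each subject's overnight values to zero mean and unit SD.

    Assumption: a plain Z-score per subject; no sleep stage labels are needed.
    """
    normalized = {}
    for subject, values in values_by_subject.items():
        mu = mean(values)
        sd = pstdev(values)
        if sd == 0:
            # A constant series carries no variation to scale.
            normalized[subject] = [0.0 for _ in values]
        else:
            normalized[subject] = [(v - mu) / sd for v in values]
    return normalized


# Hypothetical heart rates (bpm): same overnight pattern, different mean level.
hr = {"subj_a": [60.0, 62.0, 64.0], "subj_b": [80.0, 82.0, 84.0]}
hr_norm = zscore_per_subject(hr)
```

After normalization the two hypothetical subjects have the same values, so the difference at the overnight mean level no longer separates them.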
9.5 Conclusion
In this chapter, we used a multilevel analysis to statistically model and quantify the effects on
autonomic cardiorespiratory activity during sleep caused by differences in subject demographics,
time of night, and physiology within and between subjects. All these effects were found to be
significant. The primary effects were the physiological variability within and between subjects,
which markedly limits the performance of sleep staging based on cardiorespiratory information.
Therefore, diminishing these effects will be the main challenge in further improving
cardiorespiratory-based sleep staging.
9.A Appendix
The sequence of models constructed to compute the PVE values for different effects is described
in the following.
• The first model is the model with solely the constant and random intercepts as well as the
fixed sleep-stage-dependent variables. This baseline model can be written as
where s = wake, REM, light, deep, and the total variance $\Omega_{\mathrm{total}}$ consists of variance in
two levels: the between-subject variance $\Omega_0^A$ at the subject level and the within-subject
(residual) variance $\Omega_e^A$ at the time/epoch level. The percentage of the total variance taken
by $\Omega_0^A$, called intra-group correlation coefficient (ICC) ρ (21, 39), is computed by
$$ \rho = \frac{\Omega_0^A}{\Omega_{\mathrm{total}}} = \frac{\Omega_0^A}{\Omega_e^A + \Omega_0^A}. \tag{9.8} $$
• Let us then consider the model with fixed time effect at the first level
For the variance analysis of the time variable, instead of using the original time stamps
mentioned before (i.e., $\mathrm{time}_{ij} = i/2$), we use the shifted (centered) values computed as the
original time minus the mean value of the median time over all subjects. This is because,
for a longitudinal multilevel analysis, time is an occasion variable within subjects for which
a linear trend usually suffices, and it would thus explain part of the
total variance at both levels [134]. In fact, models with and without shifting the occasion
measures are equivalent, with exactly the same model coefficients (including
residual) and deviance, except for the variance estimates of the random effects. The variance
estimates obtained by shifting the time values are considered to be more accurate and
realistic [134]. To quantify the PVE constituted by the fixed time effect, we exploit the
relative variance reduction of the baseline model at the two levels, $R_1^2$ and $R_2^2$, such that
Similarly, the PVE explained by the between-subject demographic variables can be com-
puted by
$$ \mathrm{PVE}_{\mathrm{demographic}} = \frac{(\Omega_e^B - \Omega_e^C) + (\Omega_0^B - \Omega_0^C)}{\Omega_{\mathrm{total}}}. \tag{9.12} $$
The demographic variables only explain the variability between subjects, so the variance
change at the epoch level should be approximately zero ($\Omega_e^B - \Omega_e^C \approx 0$).
• Further, Model D is the model with the inclusion of between-subject centering effect
(expressing the physiological difference between subjects at the overnight mean level),
given by
$$ \mathrm{PVE}_{\mathrm{center}} = \frac{(\Omega_e^C - \Omega_e^D) + (\Omega_0^C - \Omega_0^D)}{\Omega_{\mathrm{total}}}. \tag{9.14} $$
• For the inclusion with cross-interactions that express the demographic-related time ef-
fects, the model is
$$ \mathrm{PVE}_{\mathrm{cross}} = \frac{(\Omega_e^D - \Omega_e^E) + (\Omega_0^D - \Omega_0^E)}{\Omega_{\mathrm{total}}}. \tag{9.16} $$
In addition to the fixed part, we consider the random part of some effects.
The computation of the PVE accounted for by the random time effect can be accordingly
obtained by
• Afterwards, the model with random effects for different sleep stages (expressing the
between-subject physiological variability associated with each sleep stage in random
part) is then expressed as
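$$ \text{Model G:}\quad y_{ij} = \beta_0^G + \mu_{0j}^G + \sum_s (\beta_s^G + \mu_{sj}^G)\, s_{ij} + (\beta_t^G + \mu_{tj}^G)\, \mathrm{time}_{ij} + \beta_a^G\, \mathrm{age}_j + \beta_g^G\, \mathrm{gender}_j + \beta_b^G\, \mathrm{BMI}_j + \beta_c^G\, \hat{y}_j + e_{0ij}^G, $$
$$ \mathrm{PVE}_{\text{betw subj random}} = \frac{(\Omega_e^F - \Omega_e^G) + (\Omega_0^F - \Omega_0^G) + (\Omega_t^F - \Omega_t^G)}{\Omega_{\mathrm{total}}}. \tag{9.20} $$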
Then the PVE of the random time effect to the total variance should be corrected to
$$ \mathrm{PVE}_{\text{time random}} = \frac{(\Omega_e^E - \Omega_e^F) + (\Omega_0^E - \Omega_0^F) - (\Omega_t^F - \Omega_t^G)}{\Omega_{\mathrm{total}}}. \tag{9.21} $$
• Finally, the remaining residual variance is assumed to only associate with the physiolog-
ical variability within subjects and its proportion can be obtained such that
$$ \mathrm{PVE}_{\text{within subj random}} = \frac{\Omega_e^G}{\Omega_{\mathrm{total}}}. \tag{9.22} $$
Note that all these models are optimized by keeping only the variables whose coefficients are
statistically different from zero.
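As a rough illustration of how the ICC in (9.8) and a PVE of the form used in (9.12), (9.14), and (9.16) follow from the variance estimates of two consecutive models, consider the sketch below. The variance values are invented for illustration and are not those of Table 9.6; the helper names `icc` and `pve_step` are hypothetical.

```python
def icc(omega_0, omega_e):
    """ICC as in (9.8): share of the total variance located at the subject level."""
    return omega_0 / (omega_e + omega_0)


def pve_step(prev, curr, omega_total, levels=("e", "0")):
    """PVE of the effect added between two consecutive models.

    Sums the variance reductions at the epoch level ("e") and the subject
    level ("0") and divides by the total variance of the baseline model.
    """
    return sum(prev[k] - curr[k] for k in levels) / omega_total


# Hypothetical variance estimates (NOT the values reported in Table 9.6).
model_a = {"e": 0.40, "0": 0.60}  # baseline model
model_b = {"e": 0.39, "0": 0.60}  # baseline + fixed time effect
model_c = {"e": 0.39, "0": 0.50}  # model B + demographic variables

omega_total = model_a["e"] + model_a["0"]
rho = icc(model_a["0"], model_a["e"])
pve_demographic = pve_step(model_b, model_c, omega_total)
print(f"ICC = {rho:.2f}, PVE_demographic = {100 * pve_demographic:.1f}%")
```

In this toy case the demographic variables reduce only the subject-level variance, in line with the remark that the epoch-level change should be close to zero.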
CHAPTER 10
This chapter is adapted from: P. Fonseca∗ , X. Long∗ , M. Radha, R. Haakma, R. M. Aarts, and J. Rolink.
Sleep stage classification with ECG and respiratory effort. Submitted. (∗ Joint first authorship)
Abstract – Automatic sleep stage classification with cardiorespiratory signals has attracted
increasing attention. In contrast to the traditional manual scoring based on polysomnography
(PSG), these signals can be measured using advanced unobtrusive techniques that are currently
available, holding promise for personal and continuous home sleep monitoring. This
chapter describes a methodology for classifying wake, rapid-eye-movement (REM) sleep, and
non-REM (NREM) light and deep sleep on a 30-s epoch basis. A total of 142 features were extracted
from electrocardiogram (ECG) and thoracic respiratory effort measured with respiratory
inductance plethysmography (RIP). To improve the quality of these features, subject-specific
Z-score normalization and spline smoothing were used to reduce between-subject and within-subject
variability. A modified sequential forward search (SFS) feature selection procedure
was applied, yielding 80 features while preventing the introduction of bias in the estimation of
cross-validation performance. Data from 48 healthy adults were used to validate our methods.
Using a linear discriminant classifier and ten-fold cross-validation, we achieved a Cohen’s
Kappa coefficient of 0.49 and an accuracy of 69% in the classification of wake, REM, light,
and deep sleep. These values increased to Kappa = 0.56 and accuracy = 80% when the classification
problem was reduced to three classes: wake, REM sleep, and NREM sleep.
150 Chapter 10. Sleep stage classification
10.1 Introduction
Sleep is a state of reversible disconnection from the environment and plays an essential role in
the homeostatic regulation of body and mind. The limited consciousness during sleep makes it
one of the hardest lifestyle patterns to reflect upon. Historically this has not been a problem as
the regulation of sleep is rigorously synchronized through a biological circadian rhythm with
the external environment. Yet, in the modern industrialized society where we spend our lives
in artificial environments where lighting, heat and food are available at any moment, sleep
disturbances and disorders have reached epidemic levels [65]. People experience the symptoms
of disturbed sleep such as fatigue, increased impulsiveness and agitation, without the means to
link these issues to their sleeping patterns.
To ensure fitness of body and mind, individuals must be empowered with the ability to mon-
itor sleep easily in order to identify sleep-related problems and adjust their sleeping habits ac-
cordingly. Yet a problem with traditional sleep monitoring, known as polysomnography (PSG),
is that a wide array of potentially sleep-disturbing sensors must be applied to the body and their
measurements can only be interpreted by highly trained sleep technicians or scientists. The
traditional PSG is therefore rather unsuited for individual untrained use and will only introduce
more sleep disturbances when applied on a daily basis. This scenario makes apparent a need for
unobtrusive methods of sleep monitoring, preferably inexpensive and with no training required
to operate them. Cardiorespiratory monitoring can be unobtrusive and the data can be analyzed
by a computer, which makes this technology a promising candidate for personal, continuous
and unobtrusive sleep monitoring.
Cardiorespiratory sleep staging or sleep stage classification is often based on heart rate vari-
ability (HRV) calculated from electrocardiogram (ECG) and respiratory effort, often from res-
piratory inductance plethysmography (RIP). Usually cardiorespiratory information is combined
with body movements from an accelerometer to more accurately distinguish wake from sleep.
One of the earliest studies that presented a successful machine learning approach to cardiorespi-
ratory sleep stage classification with these modalities was done by Redmond et al. [248]. Using
a set of HRV features to model the autonomic nervous activity and a set of respiratory features
to model the parasympathetic tone, Redmond and colleagues showed the viability of a sleep
stage classifier that can generate a simplified hypnogram for an entire night indicating, for each
30-s segment, a sleep stage, classified as either wake, rapid-eye-movement (REM) sleep, or
non-REM (NREM) with no PSG (wake-REM-NREM or WRN classification for short). More
recent research has shown that it is possible to obtain the same cardiorespiratory information
from other sensors for sleep stage classification, such as from bed-mounted ballistocardiogram
[161, 303] or contactless radio frequency [85]. Although these studies focused on distinction
between wake, REM sleep, and NREM sleep (without separating NREM sleep in other sleep
stages) or between wake and sleep (merging REM and NREM sleep), these attempts promised
that cardiorespiratory methods could one day be completely unobtrusive.
In previous work [182] we proposed methods to simultaneously classify wake, REM sleep,
light sleep (NREM stage S1 and S2), and deep sleep or slow wave sleep (stage S3 and S4) us-
ing respiratory activity in order to estimate an overnight wake-REM-light-deep sleep (WRLD)
The data set was the same as used in earlier work [182] and it comprised full single-night
polysomnographic (PSG) recordings of 48 subjects (27 females) acquired in the SIESTA project
[160]. All subjects were healthy sleepers with a Pittsburgh Sleep Quality Index [60] of less than
6 and had no regular sleep complaints nor earlier diagnosis of sleep disorders. The subjects had
an average age of 41.3 (±16.1) y at the time of the recording. Full subject demographics can
be found in our earlier work [182]. Sleep stages were scored by trained sleep technicians in six
classes according to the R&K rules [247]. In the scope of this study, S1 and S2 were merged in
a single L (light sleep) class and S3 and S4 were merged in a single D (deep sleep) class.
Each PSG recording comprised, besides the standard signals required for sleep scoring,
modified lead II ECG, and (thoracic) respiratory effort recorded with respiratory inductance
plethysmography (RIP). QRS complexes were detected and localized from ECG signals using
a combination of a Hamilton-Tompkins detector [123, 124] and a post-processing localization
algorithm [107]. Prior to feature extraction, RIP signals were filtered with a 10th-order Butterworth
low-pass filter with a cut-off frequency of 0.6 Hz, after which the baseline was removed by
subtracting the median peak-to-trough amplitude [182].
We extracted a set of 142 features from cardiac and respiratory activity, and from cardiores-
piratory interaction (CRI) using a sliding window centered on each 30-s epoch, guaranteeing
sufficient data to capture the changes in autonomic activity [288].
length [249]. Our previous study [182] introduced respiratory amplitude features for sleep
stage classification, including the standardized mean, standardized median, and sample entropy
of respiratory peaks and troughs (indicating inhalation and exhalation breathing depth, respec-
tively), median peak-to-trough difference, median volume and flow rate for complete breath
cycle, inhalation, and exhalation, and inhalation-to-exhalation flow rate ratio. These features
were adopted in this work. Besides, we also computed the similarity between the peaks and
troughs by means of the envelope morphology using a dynamic time warping (DTW) metric
[37]. From the respiratory spectrum, the respiratory frequency and its power, the logarithm of
the spectral power in VLF (0.01-0.05 Hz), LF (0.05-0.15 Hz), and HF (0.15-0.5 Hz) bands,
and the LF-to-HF ratio were estimated [248]. Respiratory regularity was measured by means
of sample entropy over seven epochs [185, 250] and self-(dis)similarity based on DTW and
dynamic frequency warping (DFW) [180] and uniform scaling [185] were derived. The same
network analysis features as for HRV were also computed for breath-to-breath intervals.
where h is a smoothing (spline) function and εi are independent and identically distributed residuals.
The smoothing function can be estimated by minimizing the objective function (i.e., the
penalized sum of squares) such that
$$ \hat{h} = \arg\min_h \left[ \sum_{i=1}^{n} [v_i - h(t_i)]^2 + \lambda \int_{t_1}^{t_n} h''(t)^2 \, dt \right], \tag{10.2} $$
where λ is a smoothing parameter that controls the trade-off between residual and local vari-
ation. The smoothing function can be expressed by cubic B-splines as basis functions and
determined via least squares approximation [84, 296].
For a specific overnight recording with a total of m epochs, it is divided in s continuous
segments (s = ⌈m/n⌉), designated as smoothing splines. Each segment can then be modeled by
the spline function, yielding a general spline fitting for the epochs over the entire recording. n
represents the smoothing window size where a larger n translates to a smoother fitting curve. In
this work, a window size of nine epochs for modeling splines was experimentally found to be
appropriate for the task of sleep stage classification.
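The segmentation step alone (not the spline fit itself) can be sketched as follows. The sketch assumes that segments are consecutive, non-overlapping index ranges and that the last segment may be shorter than n; the function name `split_into_segments` is hypothetical.

```python
import math


def split_into_segments(num_epochs, window=9):
    """Return (start, end) index pairs, end exclusive, for ceil(m / n) segments."""
    num_segments = math.ceil(num_epochs / window)
    return [
        (k * window, min((k + 1) * window, num_epochs))
        for k in range(num_segments)
    ]


# A hypothetical short recording of 20 epochs with the window size of nine epochs.
segments = split_into_segments(20, window=9)
```

For an 8-hour recording (960 epochs of 30 s) the same rule gives 107 segments, the last of which holds only six epochs.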
10.2.3 Classifier
This work used a multi-class Bayesian linear discriminant with time-varying prior probabilities
[249], similar to that used in previous work [182]. For each epoch, the selected class (D, L, R,
or W) is the class ωi that maximizes the posterior probability given a feature vector x [97],
To select the final list of features we used a wrapper feature selection method based on sequen-
tial forward selection (SFS) [306] using as criterion the Cohen’s Kappa coefficient of agreement
κ [72] on the training set. This measure of agreement between the classification predictions and
the ground-truth annotations is more adequate than traditional measures of accuracy for this
problem since there is a strong imbalance between classes (L epochs, for example, account for
more than 50% of all epochs in the data set) and this coefficient factors out chance agreement,
compensating for class imbalance.
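To make the point about chance agreement concrete, the sketch below computes accuracy and Cohen's Kappa with the standard definition κ = (p_o − p_e) / (1 − p_e) for a trivial classifier that labels every epoch as light sleep. The label sequence is hypothetical, and returning 0.0 in the degenerate case p_e = 1 is a convention chosen here, not one taken from the source.

```python
from collections import Counter


def cohens_kappa(truth, predicted):
    """Cohen's Kappa between two label sequences of equal length."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # Kappa is undefined here; 0.0 is an assumed convention.
    return (observed - expected) / (1.0 - expected)


# Hypothetical night: 60% light sleep (L), 20% wake (W), 20% REM (R).
truth = ["L"] * 6 + ["W"] * 2 + ["R"] * 2
always_light = ["L"] * 10

accuracy = sum(t == p for t, p in zip(truth, always_light)) / len(truth)
kappa = cohens_kappa(truth, always_light)
```

The trivial classifier reaches 60% accuracy simply because L is the majority class, while its Kappa is zero.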
In many machine learning studies, supervised feature selection is applied on the entire
data set, even if the training and validation sets are kept separate (for example, using cross-validation).
This common pitfall is known to introduce a bias in the evaluation of a classifier’s
performance, which will often be overestimated [278]. Although keeping a hold-out set for
validation would solve this problem, the limited size of the data set would mean either that
the model learning would be based on potentially insufficient examples, or that the classifier
would be evaluated on a very small sample, potentially unrepresentative of the problem
at hand. Instead, feature selection was executed by strictly separating the training and validation
sets in an iterative procedure akin to cross-validation, as follows:
1. Randomly divide all subjects in the data set amongst ten folds of the same size
2. For each fold-iteration i = 1, .., 10,
(a) Hold out fold i as a validation set and combine the subjects in the remaining folds
to form a training set
(b) Perform an iterative SFS procedure for the total number of available features N =
142, i.e., for each SFS-iteration j = 1, .., N,
i. Select which feature $f_{ij}$ should be added to the set of features selected in the
previous iteration (an empty set when j = 1),
$$ f_{ij} = \arg\max_k (\kappa_{ik}) \quad \forall k : f_{ik} \notin F_{i,j-1} \tag{10.5} $$
where $\kappa_{ik}$ is the Kappa coefficient of agreement obtained after training and
classification on the training set using the set of features
$$ F_{i,j-1} \cup f_{ik} = \{ f_{i1}, \ldots, f_{i,j-1}, f_{ik} \} \tag{10.6} $$
and $F_{i,j-1}$ is the set of features selected in the previous iteration of SFS.
ii. Store the set of features selected up to this iteration,
$$ F_{ij} = \{ f_{i1}, \ldots, f_{i,j-1}, f_{ij} \} \tag{10.7} $$
and the Kappa coefficient obtained with that set of features, $\kappa_{ij}$.
After the sets of features for all fold- and SFS-iterations and the corresponding Kappa coefficients
are computed, the final consolidated list is obtained:
1. For a varying number of features j = 1, .., N, calculate the average Kappa $\bar{\kappa}_j$ across the
ten iterations,
$$ \bar{\kappa} = \{ \bar{\kappa}_j : \forall j = 1, \ldots, N \} \tag{10.8} $$
with
$$ \bar{\kappa}_j = \frac{\sum_{i=1}^{10} \kappa_{ij}}{10} \tag{10.9} $$
2. Calculate the smallest number of features S that yields a certain percentage P of the maximum
average Kappa such that the Kappa values per fold-iteration are not significantly
different from those which give the maximum average Kappa,
$$ S = j \mid \forall k \neq j : \kappa_j \geq P \cdot \max(\bar{\kappa}) \wedge \kappa_k \geq \kappa_j \tag{10.10} $$
3. For each feature l, count the number of iterations in which that feature is selected, $fc_l$, when
limiting the set of features on each iteration to S,
$$ fc_l = \sum_{i=1}^{10} fc_{il} \tag{10.11} $$
where $fc_{il}$ indicates whether feature l is present in the set of selected features for fold-iteration
i and SFS-iteration S,
$$ fc_{il} = \begin{cases} 1, & f_l \in F_{iS} \\ 0, & \text{otherwise} \end{cases} \tag{10.12} $$
4. Pick the S features with the largest feature count $fc_l$ to assemble the final set of consolidated
features, $F_S$.
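The selection and consolidation steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in the study: the score function stands in for the training-set Kappa of a trained classifier, the significance test in step 2 is omitted, ties in the feature count are broken alphabetically, and all names (`sfs_order`, `smallest_sufficient_size`, `consolidate`, `fold_weights`) are hypothetical.

```python
from collections import Counter


def sfs_order(features, score):
    """Greedy SFS over all features: selection order and score after each step."""
    selected, history = [], []
    remaining = list(features)
    while remaining:
        # Add the feature that maximizes the score of the enlarged set.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))
    return selected, history


def smallest_sufficient_size(mean_scores, fraction):
    """Smallest number of features whose mean score reaches fraction * max.

    The significance test described in the text is omitted from this sketch.
    """
    target = fraction * max(mean_scores)
    return next(j for j, s in enumerate(mean_scores, start=1) if s >= target)


def consolidate(orders, size):
    """Count how often each feature is among the first `size` picks; keep the top `size`."""
    counts = Counter(f for order in orders for f in order[:size])
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:size], counts


# Hypothetical per-fold feature usefulness; a real score would be the
# training-set Kappa obtained with the candidate feature subset.
fold_weights = [
    {"a": 3, "b": 2, "c": 1, "d": 0},
    {"a": 3, "b": 1, "c": 2, "d": 0},
    {"a": 2, "b": 3, "c": 1, "d": 0},
]
runs = [sfs_order(w, lambda subset, w=w: sum(w[f] for f in subset)) for w in fold_weights]
orders = [order for order, _ in runs]
mean_scores = [sum(col) / len(col) for col in zip(*(hist for _, hist in runs))]
S = smallest_sufficient_size(mean_scores, fraction=0.8)
final_features, counts = consolidate(orders, S)
```

Because each fold runs its own search on its own training data, a feature that is picked early in only one fold contributes a low count and tends to drop out of the consolidated list.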
The discriminative power of selected features was evaluated with the absolute standardized
mean distance (ASMD) between the feature values of two classes, computed as
$$ \mathrm{ASMD} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma} \tag{10.13} $$
where x̄1 and x̄2 are the sample means for class 1 and 2 and σ is the pooled sample SD.
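A minimal sketch of the ASMD computation is given below. The source only states that σ is the pooled sample SD; the usual degrees-of-freedom-weighted pooling formula is assumed here, and the feature values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def asmd(class1, class2):
    """Absolute standardized mean distance between two sets of feature values.

    Assumption: sigma is the pooled sample SD, weighted by degrees of freedom.
    """
    n1, n2 = len(class1), len(class2)
    pooled_var = (
        (n1 - 1) * variance(class1) + (n2 - 1) * variance(class2)
    ) / (n1 + n2 - 2)
    return abs(mean(class1) - mean(class2)) / sqrt(pooled_var)


# Hypothetical values of one feature for two sleep stages.
distance = asmd([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```

Both hypothetical classes have unit sample variance and their means differ by two, so the distance equals two pooled SDs.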
After feature selection was performed and the set of features $F_S$ was chosen, the classification results
per subject were evaluated using a ten-fold cross-validation procedure with the same folds as
in the feature selection procedure. For each fold-iteration i = 1, .., 10:
(a) Hold out fold i as a validation set and combine the subjects in the remaining folds
to form a training set
(b) Restrict the feature set in the training set to the set $F_S$
(c) Train an LD classifier with the training data
(d) For each subject in the validation set
i. Use the classifier trained in this iteration to classify each epoch of the current
subject
ii. Calculate the Kappa coefficient of agreement between the classification results
and the ground-truth annotations for this subject.
After computing the Kappa coefficient for all subjects in the data set, the average and pooled
performance were calculated.
As mentioned, the (Cohen’s) Kappa coefficient κ is an adequate and well-accepted metric
for evaluating the agreement between sleep technician and computer-based classification since
it compensates for the random agreement that can occur due to class imbalance. In addition
to the Kappa coefficient, we also computed the traditional metric overall accuracy, i.e., the
percentage of correctly identified epochs. For these two metrics, the results are computed both
after pooling the predictions over all epochs of all subjects and after averaging the performance
for each subject.
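The difference between pooled and per-subject averaged results can be seen in a small sketch using overall accuracy. The two label sequences are hypothetical, and the function names `accuracy` and `pooled_and_mean_accuracy` are illustrative only.

```python
def accuracy(truth, predicted):
    """Fraction of epochs whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def pooled_and_mean_accuracy(per_subject):
    """per_subject: list of (truth, predicted) label sequences, one pair per subject."""
    pooled_truth = [t for truth, _ in per_subject for t in truth]
    pooled_pred = [p for _, pred in per_subject for p in pred]
    pooled = accuracy(pooled_truth, pooled_pred)
    averaged = sum(accuracy(t, p) for t, p in per_subject) / len(per_subject)
    return pooled, averaged


# Hypothetical subjects with different numbers of epochs.
long_night = (["W", "L", "L", "L", "L", "L", "R", "D"],
              ["W", "L", "L", "L", "L", "L", "R", "D"])
short_night = (["W", "L"], ["L", "L"])
pooled, averaged = pooled_and_mean_accuracy([long_night, short_night])
```

Pooling weights every epoch equally, so long recordings dominate, whereas averaging weights every subject equally; here the two views give 0.9 and 0.75, respectively.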
Figure 10.1 indicates the Kappa coefficient obtained for each training set, for a varying number
of features. As illustrated, the maximum average training performance is obtained for 105
features, with an average Kappa of 0.58. Also clear in the figure, is a plateau in performance
between 70 and 100 features. This suggests that the number of features can be greatly decreased
without affecting the training performance. A small feature set is often desirable to prevent
over-fitting to the training data, as long as it is not so small that the model cannot learn the
characteristics of the problem.
Figure 10.2 illustrates the decrease in average training performance associated with a decrease
in the number of features when choosing different operating points in Figure 10.1 (expressed
in the scatter plot as percentages of the maximum training performance). As can be clearly
observed in the figure, by allowing a reduction of 0.5% in the training performance, the number
of features can be reduced by 16.2%, to a total of 88 features, without a statistically
significant decrease in performance. Allowing a further decrease of 0.5%, the number of features
is reduced by 23.8%, to a total of 80 features, also without a statistically significant decrease
in performance. From this point on, the performance reduction is significant, and further
reducing the number of features will likely lead to a decrease in classification performance
after cross-validation. Using as criterion the smallest number of features that does not
significantly decrease the training performance, a total of S = 80 features was chosen.
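The selection criterion can be written down as a short sketch. The operating points are the annotated points of Figure 10.2; the significance flags are read from the markers in that figure (the Wilcoxon signed-rank tests themselves are not recomputed here), and the function names are hypothetical.

```python
# Annotated points of Figure 10.2: (number of features, % of maximum training
# performance, significant decrease at p < 0.05 according to the figure markers).
OPERATING_POINTS = [
    (105, 100.0, False), (88, 99.5, False), (80, 99.0, False),
    (75, 98.5, True), (71, 98.0, True), (68, 97.5, True),
    (66, 97.0, True), (64, 96.5, True), (61, 96.0, True),
    (60, 95.5, True), (58, 95.0, True),
]

def feature_reduction_percent(n_features, n_max=105):
    """Reduction in the number of features relative to the n_max features
    that give the maximum training performance."""
    return round(100.0 * (n_max - n_features) / n_max, 1)

def choose_feature_count(points):
    """Smallest number of features without a significant performance decrease."""
    return min(n for n, _, significant in points if not significant)

S = choose_feature_count(OPERATING_POINTS)
print(S, feature_reduction_percent(88), feature_reduction_percent(S))
```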
Figure 10.3 illustrates the feature count given by (10.11) for each of the 142 features, using
S = 80. A total of 14 features were selected in all 10 iterations of the selection process, while
95 features were selected in at least 50% of the iterations. This means that after ranking the
features by their feature count and selecting the 80 features with the highest count, all features
in the final list of selected features were selected in at least 5 of the 10 iterations (with a mean
count of 7.67). This illustrates the robustness of the modified SFS method described earlier:
despite their simplicity and computational efficiency, sequential selection algorithms are known
to suffer from a so-called ‘nesting effect’, potentially leading to sub-optimal feature sets [238].
By iteratively performing several unbound SFS searches on different training sets and keeping
only the features that are selected most often, this effect should be reduced, as attested by the
large average number of iterations in which each feature in the final set was selected.
Figure 10.1: Training performance (κ) per fold and average training performance. The maximum
performance, (105, 0.58), is indicated with a marker.
Figure 10.2: Reduction in training performance (%) caused by a reduction in the number of features (%).
For each point, the number of features (in parentheses) and the corresponding percentage compared to the
total number of features are indicated. Significance of difference between performance with and without
feature reduction was tested with a Wilcoxon signed-rank test (∗ p < 0.05, ∗∗ p < 0.01).
Annotated points: (105) 100.0%, (88) 99.5%, (80) 99.0%, (75) 98.5%∗, (71) 98.0%∗, (68) 97.5%∗,
(66) 97.0%∗∗, (64) 96.5%∗∗, (61) 96.0%∗∗, (60) 95.5%∗∗, (58) 95.0%∗∗.
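The counting step can be sketched as follows. Since (10.11) is not reproduced in this section, the sketch assumes that the feature count is the number of iterations in which a feature appears among the first S features selected; the rankings, the feature names, and the alphabetical tie-break are hypothetical.

```python
from collections import Counter

def feature_counts(rankings, s):
    """Count, per feature, in how many iterations it is among the first s selected."""
    counts = Counter()
    for ranking in rankings:
        counts.update(ranking[:s])
    return counts

def select_by_count(rankings, s):
    """Keep the s features with the highest count across iterations."""
    counts = feature_counts(rankings, s)
    # Ties are broken alphabetically here, an arbitrary choice made for determinism.
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return ordered[:s]

# Hypothetical selection order obtained in three iterations (one per training set).
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]
selected = select_by_count(rankings, s=2)
```

In this toy case f1 is selected in all three iterations and f2 in two, so both are retained, while f3 (selected once) is dropped.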
For brevity, only the 14 features selected in all iterations will be discussed further. Table 10.1
indicates the discriminative power of each feature using the pooled ASMD, computed for each
pair of classes after aggregating the feature values of all subjects, together with the 90th
percentile (in parentheses) of the ASMD values obtained for each feature over all individual
subjects. Pooled ASMD values below 0.5 and 90th percentile ASMD values below 1 were omitted.
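The two quantities can be contrasted with a small sketch. The ASMD is defined earlier in the thesis and not in this section, so the form used below (absolute mean difference divided by the root of the average of the two variances) is an assumption, as is the interpolation method used for the percentile; the toy data are hypothetical. They mimic a feature that separates D from W in only 2 of 10 subjects.

```python
import statistics

def asmd(a, b):
    """Assumed ASMD: |mean(a) - mean(b)| over the root of the average variance."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd

def pooled_asmd(subjects, stage_a, stage_b):
    """ASMD after aggregating the feature values of all subjects."""
    a = [v for s in subjects for v in s[stage_a]]
    b = [v for s in subjects for v in s[stage_b]]
    return asmd(a, b)

def percentile90_asmd(subjects, stage_a, stage_b):
    """90th percentile of the ASMD values computed per individual subject."""
    per_subject = [asmd(s[stage_a], s[stage_b]) for s in subjects]
    return statistics.quantiles(per_subject, n=10, method="inclusive")[-1]

# Hypothetical feature values per stage; only two subjects show a D/W difference.
discriminative = {"D": [0.0, 1.0, 2.0], "W": [2.0, 3.0, 4.0]}
flat = {"D": [0.0, 1.0, 2.0], "W": [0.0, 1.0, 2.0]}
subjects = [discriminative] * 2 + [flat] * 8

pooled = pooled_asmd(subjects, "D", "W")     # stays below the 0.5 threshold
p90 = percentile90_asmd(subjects, "D", "W")  # exceeds the threshold of 1
```

Such a feature would be omitted on the basis of its pooled ASMD but retained on the basis of its 90th percentile value.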
The top features are clearly discriminative for different pairs of classes, which helps explain
the relatively large number of features selected. Additionally, it is interesting to observe that
there is one feature (median likelihood ratio) which does not have a pooled ASMD above 0.5 for
any class pair. However, its 90th percentile ASMD value is larger than 1 for the pairs D/W and
L/W. This is a good example of a feature which is discriminative for only a subset of the subjects
(at least 10%) but not for all subjects. The fact that it was selected in every single iteration
using the wrapper method described in Section 10.2.4 suggests that it is complementary to
other chosen features for certain subjects, helping raise the overall training performance.
Figure 10.3: Feature count indicating, per feature, in how many iterations it was selected when the
number of features was limited to S = 80 (horizontal axis: feature index).
10.3.2 Cross-validation
Table 10.2 indicates the classification performance obtained after 10-fold cross-validation using
the selected set of 80 features. In addition, it indicates the performance per class, obtained by
considering each class as the positive class and merging the remaining in a single negative class.
The highest performance is obtained for R detection, followed by W. The lowest performance
is obtained for L. This is further confirmed by the confusion matrix of Table 10.3, which shows
that the largest proportion of errors occurs when trying to distinguish L from the other classes.
For every pair of classes not involving L, the percentage of misclassified epochs (relative to
the total number of epochs) is below 1%.
In order to evaluate the performance of the classifier in a three-class task (WRN), classes D
and L were merged into a single N (non-REM) class. Table 10.2 indicates the resulting
performance. The classification performance rises substantially, to a Kappa of 0.56 and an
accuracy of 80%. This was expected since a large number of classification errors occurred
between D and L, and in a WRN task these two classes no longer need to be distinguished.
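The effect of merging D and L can be checked directly on the epoch counts of the confusion matrix of Table 10.3. The sketch below computes the pooled accuracy and Cohen's Kappa from those counts, before and after merging; the function names are hypothetical. The results should round to an accuracy of 69% and a Kappa of 0.49 for WRLD, and to an accuracy of 80% and a Kappa of 0.56 for WRN.

```python
LABELS = ["D", "L", "R", "W"]
# Rows: predicted stage, columns: reference stage (epoch counts of Table 10.3).
CONFUSION = [
    [3431, 1949, 5, 97],
    [2969, 19165, 2947, 2302],
    [86, 2071, 5383, 404],
    [31, 952, 243, 2996],
]

def accuracy_and_kappa(matrix):
    """Pooled accuracy and Cohen's Kappa from a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    p_observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return p_observed, (p_observed - p_chance) / (1.0 - p_chance)

def merge_classes(matrix, labels, group, merged_label):
    """Merge the classes in `group` into one class by summing rows and columns."""
    new_labels = [merged_label] + [l for l in labels if l not in group]
    index = {l: new_labels.index(merged_label if l in group else l) for l in labels}
    merged = [[0] * len(new_labels) for _ in new_labels]
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            merged[index[row_label]][index[col_label]] += matrix[i][j]
    return merged, new_labels

acc_wrld, kappa_wrld = accuracy_and_kappa(CONFUSION)
wrn_matrix, wrn_labels = merge_classes(CONFUSION, LABELS, {"D", "L"}, "N")
acc_wrn, kappa_wrn = accuracy_and_kappa(wrn_matrix)
```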
Table 10.1: Pooled and individual 90th percentile ASMD values for features selected in all
iterations
To evaluate whether the procedure used to determine the number of features during feature
selection was adequate, we plotted the average classification performance after cross-validation
when the whole feature selection procedure from (10.11) onwards is used to select different-sized
sets of features, and cross-validation is repeated with the corresponding feature sets
(Figure 10.4).
As can be seen, the maximum cross-validation performance (κ = 0.50) is obtained with 76
features, only 5.3% fewer features than the 80 features chosen by the feature selection procedure,
but 38.2% fewer than the 105 features that give the maximum training performance. Furthermore,
the performance obtained with 80 features (κ = 0.49) is actually slightly larger than the
performance obtained with 105 features (κ = 0.48), confirming that the feature reduction procedure
Pred.↓ Ref.→ D L R W
D 3431 (7.6%) 1949 (4.3%) 5 (0.0%) 97 (0.2%)
L 2969 (6.6%) 19165 (42.6%) 2947 (6.5%) 2302 (5.1%)
R 86 (0.2%) 2071 (4.6%) 5383 (12.0%) 404 (0.9%)
W 31 (0.1%) 952 (2.1%) 243 (0.5%) 2996 (6.7%)
Figure 10.4: Performance after cross-validation for a varying number of features with markers indicating
the maximum performance, and the performance with the number of features resulting from the feature
selection procedure and with the number of features that give the best training performance.
Vertical axis: cross-validation performance (κ), shown as the average and standard deviation over all
subjects. Marked points: maximum performance (76, 0.50); performance using feature selection (80, 0.49);
using the features that give maximum training performance (105, 0.48).
Figure 10.5: Example of sleep stage reference (PSG, top) and predictions (bottom) for the subject with the
worst performance (left), with the median performance (middle) and with the best performance (right).
Horizontal axis: time since lights off; vertical axis: sleep stage (W, R, L, D).
trained with the characteristics of the general sample population does not fully capture this sub-
ject’s cardiac and respiratory expression of different sleep stages. However, despite the low
Kappa coefficient, the predicted hypnogram still exhibits some correct features, namely, most
REM intervals were detected, albeit with the incorrect length, and the two deep sleep periods
were also detected. As the performance improves, we see that the predicted hypnograms better
match the characteristics of the reference hypnogram, and in the best case the most obvious
mistakes are in the missed detection of brief periods of wake during the night while the rest of
the sleep stages are correctly predicted. This is likely caused by the use of spline smoothing
during feature post-processing, which is adequate to capture the slow-changing characteristics
of most sleep stages, but penalizes short, abrupt changes such as brief periods of awakening.
In the literature, only a few studies have focused on WRLD classification based on cardiac and/or
respiratory signals, and our results are amongst the best performing. The first observation is that
the results (κ = 0.41 and accuracy = 0.65) of our previous work [185], which used only
respiratory features, are worse than those produced in the present work, indicating that combining
cardiac and respiratory activity can lead to an improved classification performance. Isa et al.
[138] presented a Kappa coefficient κ of 0.26 (with an accuracy of 0.60) using only cardiac
features. The study of Hedner et al. [127] achieved results similar to ours (κ = 0.48 and
accuracy = 0.66), but they used more signal modalities, including peripheral arterial tone,
actigraphy, and pulse oximetry. The recent study by Willemen et al. [309] also achieved a good
performance with a κ of 0.56 and an accuracy of 0.69, although it was validated with a younger
sample population (age 22.1 ± 3.2 y), excluded 12% of the epochs from validation, and used a
basis of 60-s epochs instead of the standard scoring basis of 30 s, which makes the results not
directly comparable.
For WRN classification with cardiac and/or respiratory activity, we see that, to the best of
our knowledge, our results also outperform those reported in almost all of the previous studies,
such as κ = 0.32 and accuracy = 0.67 [248], κ = 0.45 and accuracy = 0.76 [249], κ = 0.42
and accuracy = 0.72 [198], κ = 0.44 and accuracy = 0.79 [161], κ = 0.55 and accuracy = 0.77
[200], κ = 0.48 and accuracy = 0.78 [167], κ = 0.46 and accuracy = 0.73 [312], κ = 0.62 and
accuracy = 0.81 [309], and κ = 0.58 and accuracy = 0.78 [94]. In comparison with one of the
best performing studies [94], we obtain a higher accuracy (albeit a slightly smaller Kappa) and
require one modality fewer (actigraphy). Regarding the work of Willemen et al. [309], it is again
important to note that the results in that study were obtained on the basis of 60-s epochs.
10.4 Conclusion
This chapter presents a method to identify overnight sleep stages using cardiorespiratory
features extracted from ECG and RIP signals. These features were post-processed by means of
subject-specific Z-score normalization and spline smoothing, which help reduce the influence
of signal noise and of between-subject and within-subject variability in autonomic physiology.
Eighty features were selected from a set of 142 features using a modified SFS-based feature
selector designed to avoid biasing validation performance. Using a linear discriminant classifier
in a ten-fold cross-validation procedure, the classification results (for both the four-class WRLD
and three-class WRN classification tasks) achieved in this work outperform those of most
previous studies.
CHAPTER 11
General discussion and future perspectives
HR, heart rate; RR, heartbeat interval; SD, standard deviation; SDNN, SD of RR; RR range, maximal-
to-minimal RR difference; pNN50, percentage of successive RR differences >50 ms; RMSSD, root mean
square of successive RR differences; SDSD, SD of successive RR differences; VLF, very low frequency;
LF, low frequency; HF, high frequency; SampEn, sample entropy; DFA, detrended fluctuation analysis;
PDFA, progressive DFA; WDFA, windowed DFA; VG, visibility graph; DVG, difference VG; BB, breath-
to-breath interval.
Figure 11.1: Discriminative power as measured by ASMD of all the 142 features with post-processing
(Z-score and/or spline smoothing) in separating each pair of sleep stages. Panels: W/L, W/D, R/L,
R/D, L/D.
Figure 11.2: Discriminative power as measured by ANOVA F-statistic of all the 142 features with and
without post-processing (Z-score and/or spline smoothing) for W/R/L/D and W/R/N separation.
Panels: W/R/L/D and W/R/N; conditions: without post-processing, with Z-score, with Z-score and
smoothing.
movements that has been successfully used to identify wake epochs. Its feature values had a
strongly skewed distribution, where the body movement information was usually reflected by
the high-frequency components in the spectral domain. Smoothing this feature would filter out
the useful body movement information, leading to a deteriorated discriminative power. There-
fore, we think that the post-processing methods should be ‘feature-dependent’. In other words,
it is worthwhile to investigate a criterion that can be used to determine if a feature needs to be
post-processed or not. For example, this criterion can be linked to the distribution of a specific
feature.
Figure 11.3: Spearman’s rank correlation coefficients between all the 142 features (axes: feature
index, grouped as C1-71, C72-86, R1-15, R16-44, X1, X2-12; color scale from 0.1 to 0.9).
It is likely that features with a high discriminative power are strongly correlated with one
another and thus share mutual information for sleep stage classification. To have a general view
of feature-to-feature correlations, Figure 11.3 plots the Spearman’s rank correlation coefficients
between all the 142 features. The features respiratory frequency SD over 150, 210, and 270 s
(R8-R10) are typical examples. These three features had the highest discriminative power in
general but, apparently, they are strongly correlated. On the other hand, some lower-ranked
features could still contribute to the classification if they contained additional physiological
information that was not observed in the top-ranked features. Therefore, feature selection should
take both the feature discriminative power and the correlation between features into account.
For example, for the binary-class problem, the correlation-based feature selector (CFS) has been
successfully used for deep sleep detection (Chapter 8), where only six features were selected
without loss in final classification performance. However, CFS was considered to be inapplicable
to the multiple-class problem since the changes in different features (reflecting certain aspects
of physiology) across sleep stages were not linear and were not even always consistent. A
supervised sequential forward search (SFS) feature selection algorithm was therefore described in
Chapter 10, with which some features with a low discriminative power were still selected.
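How redundant feature pairs can be flagged with Spearman's rank correlation is sketched below. The feature values are hypothetical toy numbers (only the names R8-R10 and X1 are borrowed from the text), and the threshold of 0.8 is an arbitrary choice for illustration, not a value used in this thesis.

```python
import math
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def redundant_pairs(features, threshold):
    """Feature pairs whose absolute rank correlation reaches the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        rho = spearman(a, b)
        if abs(rho) >= threshold:
            pairs.append((name_a, name_b, rho))
    return pairs

# Hypothetical feature values over six epochs.
features = {
    "R8": [1, 2, 3, 4, 5, 6],
    "R9": [2, 3, 5, 6, 8, 9],
    "R10": [1, 3, 2, 4, 6, 5],
    "X1": [3, 1, 6, 2, 5, 4],
}
pairs = redundant_pairs(features, threshold=0.8)
```

In this toy case the three respiratory features are flagged as mutually redundant, while X1 is not strongly correlated with any of them and could therefore add complementary information even if its own discriminative power were lower.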
In Chapter 9, it was demonstrated that the physiological variations within subjects and from
subject to subject are the main barrier to achieving reliable sleep stage classification results,
and a multilevel modeling method was proposed to evaluate features by quantifying the amount of
those variations. As discussed in the associated chapters about the new features, they were
expected either to carry additional physiological information or to be robust to
between-/within-subject variability. Additionally, the feature post-processing methods
(normalization and smoothing) were assumed to diminish the variations conveyed by the features.
However, what drove the contribution of those new features and post-processing methods to the
improved sleep stage classification results was not thoroughly investigated. Further research is
required at this point using the method proposed in Chapter 9, which will help understand the new
features and the post-processing methods, thus inspiring us to, for example, develop adaptive
post-processing/feature selection algorithms to optimize each feature for identifying different
sleep stages.
Figure 11.4: Indication of agreement level of the Cohen’s Kappa coefficient [172].
Table 11.2: Studies on sleep stage classification with cardiorespiratory activity (and body movements)
Task                  First author, year         Modality^a    Record.^b  Epoch  Algo.^c  Acc.^d  Kappa^d
WRLD classification   Yilmaz, 2010 [315]         ECG           8 H        30 s   SVM      73%†    n.a.
                      Isa, 2011 [138]            ECG           16 O       30 s   RF       60%     0.26
                      Hedner, 2011 [127]         BM, PAT, PO   227 H/O    30 s   zzzPAT   66%     0.48
                      Willemen, 2014 [309]       BM, ECG, RE   85 H       60 s   SVM      69%     0.56
                      This thesis (Chapter 10)∗  ECG           48 H       30 s   LD       67%     0.46
                                                 RE            48 H       30 s   LD       66%     0.44
                                                 ECG, RE       48 H       30 s   LD       69%     0.49
WRN classification    Redmond, 2006 [248]        ECG, RE       37 O       30 s   LD       67%     0.32
                      Redmond, 2007 [249]        ECG, RE       31 H       30 s   LD       76%     0.45
                      Mendez, 2010 [198]         BCG           17 H       30 s   KNN      72%     0.42
                      Kortelainen, 2010 [161]    BCG           18 H       30 s   HMM      79%     0.44
                      Kurihara, 2012 [167]       BCG           20 H       30 s   IR       78%     0.48
                      Xiao, 2013 [312]           ECG           45 H       30 s   RF       73%     0.46
                      Willemen, 2014 [309]       BM, ECG, RE   85 H       60 s   SVM      81%     0.62
                      Domingues, 2014 [94]       BM, ECG, RE   24 H       30 s   HMM      78%‡    0.58‡
                      This thesis (Chapter 10)∗  ECG           48 H       30 s   LD       78%     0.52
                                                 RE            48 H       30 s   LD       77%     0.50
                                                 ECG, RE       48 H       30 s   LD       80%     0.56
SW classification     Redmond, 2007 [249]        ECG, RE       31 H       30 s   LD       89%     0.60
                      Karlen, 2009 [151]         ECG, RE       6 H        30 s   ANN      85%     n.a.
                      Devot, 2010 [89]           BM, ECG, RE   35 H/I     30 s   LD       87%     0.62
                      Jung, 2013 [145]           BCG           10 H       30 s   TH       97%§    0.83§
                      Willemen, 2014 [309]       BM, ECG, RE   85 H       60 s   SVM      92%     0.69
                      This thesis (Chapter 2)    ECG           15 H       30 s   LD       93%     0.48
                                                 BM, ECG       15 H       30 s   LD       96%     0.64
                      This thesis (Chapter 3)    RE            15 H       30 s   LD       94%     0.59
                                                 BM, RE        15 H       30 s   LD       96%     0.66
                                                 BM, ECG, RE   15 H       30 s   LD       96%     0.67
D (SWS) detection     Shinar, 2001 [273]         ECG           34 H       30 s   TH       80%     n.a.
                      Choi, 2009 [68]            BM, ECG       4 H        30 s   BACT     92%§    0.62§
                      Bsoul, 2010 [53]           ECG           16 H/O     30 s   SVM      83%¶    n.a.
                      Hedner, 2011 [127]         BM, PAT       227 H/O    30 s   zzzPAT   89%♯    0.49♯
                      Ebrahimi, 2013 [99]        ECG           30 H       30 s   LD       80%     n.a.
                      Long, 2014 [186]           ECG           15 H       30 s   LD       81%     0.42
                      This thesis (Chapter 8)    ECG           257 H      30 s   LD       88%     0.54
                                                 RE            257 H      30 s   LD       88%     0.51
                                                 ECG, RE       257 H      30 s   LD       89%     0.57
a BM, body movements; ECG, mostly heart rate variability (HRV) was used; RE, respiration; BCG,
ballistocardiogram (including BM, HRV, and/or RE); PAT, peripheral arterial tone; PO, pulse oximetry
(also pulse rate).
b H, healthy subjects; O, subjects with obstructive sleep apnea; I, insomniacs.
c SVM, support vector machine; RF, random forest; zzzPAT, the algorithm described in [131]; LD,
(Bayesian) linear discriminant; KNN, k-nearest neighbour; HMM, hidden Markov model; IR, incidence
ratio; ANN, artificial neural network; TH, thresholding; BACT, the algorithm described in [68].
d Subject-independent classification results are presented.
∗ Results were either presented in the corresponding chapter or produced using the methods in that chapter.
† Cross-validation was used within each subject. ‡ Ambiguous epochs were rejected. § Training and test
sets were mixed. ¶ Light sleep was disregarded. ♯ Results were re-computed from the reported confusion
matrix.
used 60-s epochs rather than the clinically standard 30 s which made the classification easier.
Jung et al. [145] (for SW classification) and Choi et al. [68] validated their algorithms without
splitting training and test sets, which obviously biases the reported classification performance.
The findings regarding the time delay between changes in autonomic and brain activity
during some sleep stage transitions have been described in Chapter 7 and utilized to help
detect deep sleep from all the other sleep stages (Chapter 8). Unfortunately, these findings
were not incorporated in the multiple-stage classification methodology presented in
Chapter 10; doing so is promising for further improving the classification performance. It
is important to note that the incorporation of time-delayed methods should depend on the sleep
stages, since some sleep stage transitions show no time delay between autonomic and
brain activity.
In addition to sleep stages, arousals also influence the changes in cardiorespiratory activity
during sleep [293], constraining the sleep stage classification. Hence, correcting for the
influence of arousals promises an improvement in cardiorespiratory-based sleep stage classification.
This merits further study.
To answer the research question of this thesis raised in Chapter 1, Figure 11.5 shows the
progressive increases of our sleep stage classification results achieved in different phases during
the past four years of PhD work. It can be seen that more reliable performance in sleep stage
classification (for healthy adults) with body movements, cardiac and/or respiratory activity has
been achieved. It is interesting to compare the performance of sleep stage classification (for wake,
REM sleep, light sleep, and deep sleep) using our cardiorespiratory-based approach with that
using an automatic PSG-based system. Here the agreement between automatic classification
and the standard manual scoring was used for comparison. With the validated Somnolyzer®
[20], a Cohen’s Kappa coefficient of 0.80 and an accuracy of 85% were reached for classifying
the four stages (re-computed based on the reported confusion matrix). These results are
comparable to the agreement between human raters [81], indicating that, unsurprisingly, the
PSG-based automatic sleep staging system far outperforms the cardiorespiratory-based approach
presented in this thesis. This implies that sleep stage classification with cardiorespiratory
activity is not yet suitable for clinical use. However, it is still promising for home sleep
monitoring aimed at offering an understandable sleep assessment to users in a healthy
condition. Nevertheless, further research is encouraged to improve the cardiorespiratory-based
sleep classification, particularly in distinguishing between wake and REM sleep and between
light and deep sleep, which seem more difficult than separating the other sleep stages (see
Chapter 10).
11.3 Classifier
Evaluating different classifiers is outside the scope of the present thesis, in which a simple linear
discriminant classifier was adopted throughout. In fact, many other classifiers have
been tested over the years, including thresholding (TH), quadratic discriminant (QD), hidden
Markov models (HMM), support vector machines (SVM), random forest (RF), neural networks
(NN), etc. For many classification tasks, the LD classifier was found to be one of the best
performing algorithms (see Table 11.2). The strength of LD lies in its simple underlying model,
which provides a robust model of the features over the different sleep stages. From a machine
learning point of view, we speculate that the current features express only limited physiological
information for separating sleep stages, so that the classification performance would
not be markedly improved unless new features or classifiers that can characterize additional
inherent physiological information are used. For example, because the LD classifier is
independent of time whereas sleep is a structured process (i.e., the state and characteristics of
each epoch are not independent), temporal classifiers exploiting this structure are expected to
improve the classification. Therefore, exploring these types of classifiers should be part of
future work.
Figure 11.5: Progression of increases in sleep stage classification performance (Cohen’s Kappa
coefficient) achieved in different phases during the PhD work (horizontal axis: year, 2011 to 2015).
All increases were found to be significant (p < 0.05), examined with a Wilcoxon (two-sided)
signed-rank test. The signal modalities included body movements (BM), electrocardiogram (ECG),
and/or respiratory effort (RE). The classification tasks included: sleep and wake (SW)
classification, deep sleep (D) or slow wave sleep (SWS) detection, wake, REM sleep, and NREM sleep
(WRN) classification, and wake, REM sleep, light sleep, and deep sleep (WRLD) classification. The
highest Kappa for each classification task is marked in bold.
Figure 11.6: Examples of overnight sleep stages (wake, REM, light, deep) over time (h) for a healthy
subject, an insomniac, and an OSA patient.
subjects and might not be appropriate for other subject groups with prevalent sleep problems.
For example, Figure 11.6 depicts typical examples of overnight sleep architecture (sleep
stages) from a healthy subject, an insomniac, and a patient with severe obstructive sleep
apnea (OSA). The sleep stages (wake, REM sleep, light sleep, and SWS) were obtained through
PSG-based manual scoring by sleep technicians. As is clear from the figure, the sleep
architecture differs between the three subjects throughout the night, since sleep stages
are altered by the pathophysiology of the disordered sleep. For example, in comparison
with healthy subjects, insomniacs experience much longer wake time [215], sometimes along
with SWS deficiency [109, 112]. The sleep apnea syndrome is associated with sleep
fragmentation (with many sleep stage transitions) due to the repeated occurrence of
end-apneic arousals [156, 317], with altered cardiac variability [207], and with dysfunction in
the changes of autonomic nervous activity across sleep stages [119, 120], changes which are the
rationale and hypothesis for autonomic-based sleep stage classification. In addition, the
autonomic function is also influenced by the presence of some sleep problems [22, 195, 280]. As a
consequence, these will lead to difficulties in cardiorespiratory-based sleep stage classification
for these patient groups.
Using a classifier trained on data from healthy subjects to classify sleep stages for
sleep-disordered patients would obviously not be appropriate. For a specific patient group, even if
a classifier derived from some patients is applied to other patients in the same group, the
classification performance would still be worse than that obtained for healthy
subjects. This can be seen in Table 11.2, for example, by comparing the classification results
in [249] for healthy subjects (Kappa = 0.45, accuracy = 76%) and in [248] for OSA patients
(Kappa = 0.32, accuracy = 67%). Moreover, the effectiveness of the features and the
post-processing methods proposed in this thesis is unknown for other subject groups, which should
be further studied. For example, the Z-score normalization assumes that the percentages of
sleep stages are similar across subjects, which is not always the case for sleep-disrupted
patients such as insomniacs. Smoothing the feature values per night for OSA patients with
a fragmented sleep architecture seems inappropriate since it would not be able to capture
the fast and subtle changes in cardiorespiratory activity caused by the frequent occurrence of
arousals. Thus, the post-processing methods must be re-examined for those patient groups.
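The dependence of the Z-score normalization on the proportions of sleep stages can be made concrete with a minimal sketch. The feature is a hypothetical one that equals 1 during wake and 0 during sleep in both subjects, the wake percentages are illustrative, and the use of the population standard deviation is an assumption.

```python
import statistics

def zscore(values):
    """Subject-specific Z-score: subtract the mean and divide by the SD of the night."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, an assumption of this sketch
    return [(v - mean) / sd for v in values]

# Hypothetical feature: 1.0 in wake epochs and 0.0 in sleep epochs for both subjects.
healthy = [1.0] * 1 + [0.0] * 9    # 10% of the epochs are wake
insomniac = [1.0] * 5 + [0.0] * 5  # 50% of the epochs are wake

z_wake_healthy = zscore(healthy)[0]      # about 3.0
z_wake_insomniac = zscore(insomniac)[0]  # 1.0
```

Although the raw feature value during wake is identical for the two subjects, its normalized value differs by a factor of three, so a classifier trained on subjects with little wake would see the wake epochs of the second subject at a very different location in feature space.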
In addition, it was found that the sleep stage classification performance was dependent on
age. Chapter 8 has revealed that deep sleep epochs were easier to identify correctly
with cardiorespiratory activity for younger subjects than for elderly people. In fact,
the overnight sleep architecture is age-related, as shown by Ohayon et al. [216], in which a
meta-analysis of sleep parameters across the human lifespan (for healthy subjects with ages 5 to 85
y) showed that the total sleep time (TST), REM sleep time, and deep sleep time decrease,
whereas wake after sleep onset (WASO) increases, with increasing age. Moreover, the multilevel
analysis results presented in Chapter 9 indicate that the autonomic cardiorespiratory activity
during sleep was significantly influenced by age. Given these findings, performing sleep stage
classification separately for different age groups is promising for further enhancing the
classification performance. This merits further exploration.
Although the focus of this thesis was on objective sleep monitoring, it is important to assess
sleep from subjective perspectives because ‘sleep quality’ is also linked to the perception or
feeling by humans. For example, sleep deprivation has been consistently associated with the
loss of daytime (cognitive or behavioral) performance, such as drowsiness, irritability, or in-
creased fatigue [47, 98, 263]. Chronic sleep disruption caused by, e.g., sleep fragmentation, has
been shown to relate to worsened mood [255].
Several clinical questionnaires have been published for examining sleep quality such as the
Pittsburgh Sleep Quality Index (PSQI) [60] and the Self-Assessment of Sleep and Awakening
Quality Scale (SSA) [262]. The relationship between objective sleep variables, derived from a
PSG-based sleep architecture, and subjective sleep quality, obtained from questionnaires, has
been researched thoroughly in the past. Although some inconsistent or even paradoxical
results were found, with study outcomes differing in the extent to which the objective variables
are correlated, some objective variables were consistently related to the subjective sleep
experience. The strongest association was between wake time and subjective sleep quality, with
a correlation of r = −0.59 [9, 21, 251]. A reliable objective sleep stage classification will enable
the analysis and construction of a frontier model by combining objective and subjective mea-
surements to assess an overall sleep quality, in particular when the classification is done with
cardiorespiratory activity (and body movements) that can be acquired unobtrusively at home.
Krystal and Edinger [164] proposed new methods to analyze PSG data more as a measure of the
nature/depth of sleep, such as indices for the frequency content of electroencephalogram (EEG)
signals obtained during NREM sleep, or to look at particular patterns in NREM sleep and
their sequence instead of only taking the variables derived from
sleep stages. In the context of sleep monitoring with cardiorespiratory activity, it would be
tempting to go a step further and investigate the associations between autonomic physiological
measures and subjective sleep quality, which have not been explicitly analyzed.
Intuitively, it was expected that a stable sleep, seen in, for example, a low breathing rate
variation overnight, is indicative of a good sleep quality rating. Taking this respiratory measure
as an example, we analyzed the (Spearman’s rank) correlation between the total SSA score and
the overnight mean standard deviation of breathing rates (SDBR) for males and females and for
three age groups [young: 20-39 y (n = 52), middle aged: 40-69 y (n = 69), and elderly: ≥70 y
(n = 44)]. The SSA consists of 27 questions, divided into four parts: sleep quality, awakening
quality, somatic complaints, and estimates of sleeping times during the last nights. A total SSA
score can be calculated from the first three parts, or a subscore can be calculated for each
part separately. The total score ranges between 20 and 80, where higher scores indicate
poorer sleep quality. The respiratory measure used here was a whole-night mean, obtained by
first taking the mean SDBR for each sleep stage separately and then averaging those
per-stage means. This was done so
that the final mean value was not influenced by differences in the percentages of the sleep
stages, so that the measure reflects the physiology alone rather than the sleep stage
composition of the night. The data used here were the same as those used in Chapter 9, comprising 165 subjects
monitored in sleep laboratories with PSG for two consecutive nights. Positive correlations were
found between the mean SDBR and the total SSA score [night 1: r = 0.179, p = 0.024; night 2:
r = 0.213, p = 0.007]. This means that a higher variation of breathing rate was associated
with worse sleep experience. However, the correlation coefficients were low, implying
only a weak association. A gender effect was observed in both nights: significant
correlations were found between the mean SDBR and the total SSA score for females [night 1:
r = 0.263, p = 0.014, Figure 11.7(a); night 2: r = 0.300, p = 0.005, Figure 11.7(b)], but not
for males. Therefore, a higher mean variation of the breathing rate was associated
with worse sleep quality in women. The significant correlations found for females contributed
to the (weak) correlations found across all subjects. The reason for this gender
difference is not clear and needs to be investigated further. Additionally, a moderate correlation
was found between the mean SDBR and the total SSA score in the first night in the
elderly group [r = 0.399, p = 0.008, Figure 11.7(c)]. No significant correlations were observed
for the other age groups, suggesting that, especially for elderly subjects, a higher breathing rate
variation was associated with worse sleep experience. However, this result was not present
in the second night, so it might be due to the “first-night effect” present
in this data set [196].
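The whole-night respiratory measure and the rank correlation used in this analysis can be illustrated with a minimal Python sketch. It is not the analysis code used in this thesis: the function names (stage_balanced_mean, average_ranks, spearman_r), the per-epoch input layout, and the toy values are assumptions made for illustration only, and significance testing (the p-values) is omitted.

```python
from statistics import mean


def stage_balanced_mean(sdbr_by_epoch, stage_by_epoch):
    """Mean of the per-stage means, so that every sleep stage counts equally.

    Both arguments are hypothetical per-epoch sequences of equal length:
    SDBR values (Hz) and the sleep stage label scored for each epoch.
    """
    per_stage = {}
    for value, stage in zip(sdbr_by_epoch, stage_by_epoch):
        per_stage.setdefault(stage, []).append(value)
    return mean(mean(values) for values in per_stage.values())


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy night dominated by one stage: three N2 epochs and one REM epoch.
sdbr = [0.02, 0.02, 0.02, 0.06]
stages = ["N2", "N2", "N2", "REM"]
plain = mean(sdbr)                             # about 0.03, pulled towards N2
balanced = stage_balanced_mean(sdbr, stages)   # about 0.04, N2 and REM weigh equally

# Toy subject-level values: one mean SDBR and one total SSA score per subject.
mean_sdbr_per_subject = [0.031, 0.045, 0.038, 0.052, 0.029]
ssa_total_per_subject = [28, 41, 33, 39, 30]
r = spearman_r(mean_sdbr_per_subject, ssa_total_per_subject)
```

In this sketch a subgroup analysis (by gender or age group) would amount to calling spearman_r on the subjects of that subgroup only. Averaging per-stage means rather than all epochs keeps the measure independent of how much time a subject spent in each stage.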
This was a preliminary analysis, intended as an example, and future research with more in-depth
analyses of the PSG data is needed to better understand the relationship between objective
sleep measurements and subjective sleep quality. Moreover, recordings over multiple nights are
necessary to assess the night-to-night variability within this relationship.
Figure 11.7: Scatter plots of the total score on the SSA against the mean SDBR (Hz) for (a) women
in night 1, (b) women in night 2, and (c) the elderly group in night 1.
[1] J. Aach and G. M. Church. Aligning gene expression time series with time warping
algorithms. Bioinformatics, 17(6):495-508, 2001.
[2] Adidas miCoach Heart Rate Monitor (retrieved in Jan. 2015). [Online] Available:
http://micoach.adidas.com/heartratemonitor.
[3] M. Adnane, Z. Jiang, and Z. Yan. Sleep-wake stages classification and sleep efficiency
estimation using single-lead electrocardiogram. Expert Syst. Appl., 39(1):1401–1413,
2012.
[4] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo. ECG beat detection using filter
banks. IEEE Trans. Biomed. Eng., 46(2):192–202, 1999.
[5] H. W. Agnew and W. B. Webb. Measurement of sleep onset by EEG criteria. Am. J. EEG
Technol., 12:127-134, 1972.
[6] M. Ahmadlou and H. Adeli. Visibility graph similarity: A new measure of generalized
synchronization in coupled dynamic systems. Physica D, 241(4):326-332, 2012.
[7] J. Alihanka, K. Vaahtoranta, and I. Saarikivi. A new method for long-term monitoring of
the ballistocardiogram, heart rate, and respiration. Am. J. Physiol. Regul. Integr. Comp.
Physiol., 240(5):R384-R392, 1981.
[9] T. Åkerstedt, K. Hume, D. Minors, and J. Waterhouse. The meaning of good sleep: a lon-
gitudinal study of polysomnography and subjective sleep quality. J. Sleep Res., 3(3):152–
158, 1994.
[10] T. Åkerstedt, K. Hume, D. Minors, and J. Waterhouse. Good sleep–its timing and physi-
ological sleep characteristics. J. Sleep Res., 6(4):221–229, 1997.
180 References
[12] M. Ako, T. Kawara, S. Uchida, S. Miyazaki, K. Nishihara, J. Mukai, K. Hirao, J. Ako, and
Y. Okubo. Correlation between electroencephalography and heart rate variability during
sleep. Psychiatry Clin. Neurosci., 57(1):59–65, 2003.
[14] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod.
Phys., 74:47, 2002.
[17] F. Amzica and M. Steriade. Electrophysiological correlates of sleep delta waves. Elec-
troencephalogr. Clin. Neurophysiol., 107(2):69–83, 1998.
[22] M. Aydin, R. Altin, A. Ozeren, L. Kart, M. Bilge, and M. Unalacak. Cardiac autonomic
activity in obstructive sleep apnea: time-dependent and spectral analysis of heart rate
variability using 24-hour Holter electrocardiograms. Tex. Heart Inst. J., 31(2):132–136,
2004.
[25] R. Bailon, P. Laguna, L. Mainardi, and L. Sornmo. Analysis of heart rate variability
using time-varying frequency bands based on respiratory frequency. In Proc. 29th Ann.
Int. Conf. IEEE Eng. Med. Biol. Soc., pp. 6675–6678, Lyon, France, 2007.
[27] S. Banks and D. F. Dinges. Behavioral and physiological consequences of sleep restric-
tion. J. Clin. Sleep Med., 3(5):519–528, 2007.
[28] A. Bar, G. Pillar, I. Dvir, J. Sheffy, R. P. Schnall, and P. Lavie. Evaluation of a portable
device based on peripheral arterial tone for unattended home sleep studies. Chest,
123(3):695–703, 2003.
[34] J. H. Benington and H. C. Heller. Restoration of brain energy metabolism as the function
of sleep. Prog. Neurobiol., 45(4):347–360, 1995.
[35] R. J. Berger and N. H. Phillips. Energy conservation and sleep. Behav. Brain Res., 69(1-
2):65–73, 1995.
[36] I. I. Berlad, A. Shlitner, S. Ben-Haim, and P. Lavie. Power spectrum analysis and heart
rate variability in stage 4 and REM Sleep: evidence for state-specific changes in auto-
nomic dominance. J. Sleep Res., 2(2):88–90, 1993.
[37] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series.
In Proc. Assoc. Advancement Artif. Intell. Workshop Knowl. Disc. Databases (AAAI-
KDD’94), pp. 359–370, 1994.
[45] M. H. Bonnet. Effect of sleep disruption on sleep, performance, and mood. Sleep,
8(1):11–19, 1985.
[46] M. H. Bonnet and D. L. Arand. Heart rate variability: sleep stage, time of night, and
arousal influences. Electroenceph. Clin. Neurophysiol., 102(5):390–396, 1997.
[47] M. H. Bonnet and D. L. Arand. Clinical effects of sleep fragmentation versus sleep de-
privation. Sleep Med. Rev., 7(4):297–310, 2003.
[48] A. A. Borbély and P. Achermann. Sleep homeostasis and models of sleep regulation. J.
Biol. Rhythms, 14(6):559-570, 1999.
[52] G. de Bruijne, P. Sommen, and R.M. Aarts. Detection of epileptic seizures through audio
classification. In 4th Eur. Congr. Int. Fed. Med. Biol. Eng. (IFMBE’08), pp. 1450–1454,
Antwerpen, Belgium, 2008.
[53] M. Bsoul, H. Minn, M. Nourani, G. Gupta, and L. Tamil. Real-time sleep quality as-
sessment using single-lead ECG and multi-stage SVM classifier. In Proc. 32nd Ann. Int.
Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), pp. 1178–1181, Buenos Aires, Argentina,
2010.
[54] A. Bunde, S. Havlin, J. W. Kantelhardt, T. Penzel, J.-H. Peter, and K. Voigt. Corre-
lated and uncorrelated regions in heart-rate fluctuations during sleep. Phys. Rev. Lett.,
85(17):3736, 2000.
[55] H. J. Burgess, A. L. Holmes, and D. Dawson. The relationship between slow-wave activ-
ity, body temperature, and cardiac activity during nighttime sleep. Sleep, 24(3):343–349,
2001.
[56] H. J. Burgess, T. Sletten, N. Savic, S. S. Gilbert, and D. Dawson. Effects of bright light
and melatonin on sleep propensity, temperature, and cardiac activity at night. J. Appl.
Physiol., 91(3):1214–1222, 2001.
[57] H. J. Burgess, J. Trinder, Y. Kim, and D. Luke. Sleep and circadian influences on cardiac
autonomic nervous system activity. Am. J. Physiol. Heart Circ. Physiol., 273(4):H1761–
H1768, 1997.
[58] R. L. Burr. Interpretation of normalized spectral heart rate variability indices in sleep
research: a critical review. Sleep, 30(7):913–919, 2007.
[62] C. Cajochen, J. Pischke, D. Aeschbach, and A. A. Borbély. Heart rate dynamics during
human sleep. Physiol. Behav., 55(4):767–774, 1994.
[64] N. Carter, R. Henderson, S. Lai, M. Hart, S. Booth, and S. Hunyor. Cardiovascular and
autonomic response to environmental noise during sleep in night shift workers. Sleep,
25(4):457-464, 2002.
[65] Centers for Disease Control and Prevention (CDC). Perceived insufficient rest or sleep
among adults – United States, 2008. MMWR Morb. Mortal. Wkly. Rep., 58(42):1175-
1179, 2009.
[68] B. H. Choi, G. S. Chung, J.-S. Lee, D.-U. Jeong, and K. S. Park. Slow-wave sleep
estimation on a load-cell-installed bed: a non-constrained method. Physiol. Meas.,
30(11):1163–1170, 2009.
[69] G. S. Chung, B. H. Choi, J.-S. Lee, J. S. Lee, D.-U. Jeong, and K. S. Park. REM sleep
estimation only using respiratory dynamics. Physiol. Meas., 30(12):1327–1340, 2009.
[70] G. S. Chung, J. S. Lee, S. H. Hwang, Y. K. Kim, D.-U. Jeong, and K. S. Park. Wake-
fulness estimation only using ballistocardiogram: Nonintrusive method for sleep moni-
toring. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), pp. 2459–
2462, Buenos Aires, Argentina, 2010.
[72] J. Cohen. A coefficient of agreement for nominal scales. Educ. Psychol. Meas., 20(1):37–
46, 1960.
[75] M. Costa, A. L. Goldberger, and C.-K. Peng. Multiscale entropy analysis of biological
signals. Phys. Rev. E, 71(2):021906, 2005.
[77] D. Cysarz, H. Bettermann, S. Lange, D. Geue, and P. Van Leeuwen. A quantitative com-
parison of different methods to detect cardiorespiratory coordination during night-time
sleep. Biomed. Eng. Online, 3:44, 2004.
[78] D. Cysarz, H. Bettermann, and P. Van Leeuwen. Entropies of short binary sequences in
heart period dynamics. Am. J. Physiol. Heart Circ. Physiol., 278(6):H2163–2172, 2000.
[80] Y. Dagan. Circadian rhythm sleep disorders (CRSD). Sleep Med. Rev., 6(1):45–54, 2002.
[82] H. Davis, P. A. Davis, A. L. Loomis, E. N. Harvey, and G. Hobart. Human brain poten-
tials during the onset of sleep. J. Neurophysiol., 1:24-38, 1938.
[83] J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. In
Proc. 23rd Int. Conf. Machine Learn. (ICML’06), pp. 233–240, Pittsburgh, PA, 2006.
[84] C. De Boor. A Practical Guide to Splines, Springer-Verlag, New York, NY, 2001.
[88] W. Dement and N. Kleitman. The relation of eye movements during sleep to dream
activity: an objective method for the study of dreaming. J. Exp. Psychol., 53(5):339–
346, 1957.
[90] M. Dhamala, V. K. Jirsa, and M. Ding. Enhancement of neural synchrony by time delay.
Phys. Rev. Lett., 92(7):074104, 2004.
[92] D. J. Dijk. Slow-wave sleep, diabetes, and the sympathetic nervous system. Proc. Natl.
Acad. Sci. U.S.A., 105(4):1107-1108, 2008.
[94] A. Domingues, T. Paiva, and J. M. Sanches. Hypnogram and sleep parameter computa-
tion from activity and cardiovascular data. IEEE Trans. Biomed. Eng., 61(6):1711–1719,
2014.
[97] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd edn., Wiley-
Interscience Press, 2000.
[99] F. Ebrahimi, S.-K. Setarehdan, J. Ayala-Moyeda, and H. Nazeran. Automatic sleep stag-
ing using empirical mode decomposition, discrete wavelet transform, time-domain, and
nonlinear dynamics features of heart rate variability signals. Comput. Methods Programs
Biomed., 112(1):47–57, 2013.
[101] S. Elsenbruch, M. J. Harnish, and W. C. Orr. Heart rate variability during waking and
sleep in healthy males and females. Sleep, 22(8):1067–1071, 1999.
[103] T. Fawcett. ROC graphs: notes and practical considerations for researchers, Tech. Rep.
HP Labs, Palo Alto, CA, 2004.
[104] J. Fell, J. Röschke, K. Mann, and C. Schäffner. Discrimination of sleep stages: a com-
parison between spectral and nonlinear EEG measures. Electroencephalogr. Clin. Neu-
rophysiol., 98(5):401–410, 1996.
[105] Fitbit ONE Wireless Activity and Sleep Tracker (retrieved in Jan. 2015). [Online] Avail-
able: https://www.fitbit.com./one.
[106] M. Folke, L. Cernerud, M. Ekstrom, and B. Hok. Critical review of non-invasive respi-
ratory monitoring in medical care. Med. Biol. Eng. Comput., 41(4):377–383, 2003.
[108] J. Foussier, P. Fonseca, X. Long, and S. Leonhardt. Automatic feature selection for
sleep/wake classification with small data sets. In 7th Int. Joint Conf. Biomed. Eng. Syst.
Technol. (BIOSTEC’13), pp. 178–184, Barcelona, Spain, 2013.
[109] B. L. Frankel, R. D. Coursey, R. Buchbinder, and F. Snyder. Recorded and reported sleep
in chronic primary insomnia. Arch. Gen. Psychiatry, 33(5):615–623, 1976.
[111] P. M. Fuller and C. J. Amlaner (eds.). SRS Basics of Sleep Guide, 2nd ed., Sleep Research
Society, Darien, IL, 2011.
[115] I. Gath and E. Bar-On. Computerized method for scoring of polygraphic sleep record-
ings. Comput. Prog. Biomed., 11(3):217–223, 1980.
[118] P. Grossman, F. H. Wilhelm, and M. Spoerle. Respiratory sinus arrhythmia, cardiac va-
gal control, and daily activity. Am. J. Physiol. Heart Circ. Physiol., 287(2):H728–H734,
2004.
[121] M. A. Hall. Correlation-based feature selection for machine learning. Ph.D. dissertation,
Dept. Computer Science, The Univ. of Waikato, Hamilton, New Zealand, 1999.
[125] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng.,
21(9):1263-1284, 2009.
[126] J. Hedner, G. Pillar, S. D. Pittman, D. Zou, L. Grote, and D. P. White. A novel adap-
tive wrist actigraphy algorithm for sleep-wake assessment in sleep apnea patients. Sleep,
27(8):1560-1566, 2004.
[128] A. Heinrich, F. Van Heesch, B. Puvvula, and M. Rocque. Video based actigraphy and
breathing monitoring from the bedside table of shared beds. J. Ambient. Intell. Human
Comput., 6(1):107–120, 2015.
[129] R. C. Heinzer and F. Series. Normal physiology of the upper and lower airways. In Prin-
ciples and practice of sleep medicine, edited by M. H. Kryger, T. Roth, W. C. Dement,
pp. 581–596, Saunders Elsevier, St. Louis, MO, 2011.
[130] E. Hellinger. Neue begründung der theorie quadratischer formen von unendlichvielen
veränderlichen. J. für die Reine und Angew Math., 136:210–271, 1909.
[131] S. Herscovici, A. Pe’er, S. Papyan, and P. Lavie. Detecting REM sleep from the finger: an
automatic REM sleep algorithm based on peripheral arterial tone (PAT) and actigraphy.
Physiol. Meas., 28(2):129–140, 2007.
[132] R. L. Horner. Autonomic consequences of arousal from sleep: mechanisms and implica-
tions. Sleep, 19(10 Suppl.):S193-195, 1996.
[134] J. J. Hox. Multilevel Analysis: Techniques and Applications, 2nd edn., Routledge, 2010.
[135] D. W. Hudgel, R. J. Martin, B. Johnson, and P. Hill. Mechanics of the respiratory system
and breathing pattern during sleep in normal humans. J. Appl. Physiol. Respir. Environ.
Exerc. Physiol., 56(1):133–137, 1984.
[136] C. Iber, S. Ancoli-Israel, A. L. Chesson, and S. F. Quan. The AASM Manual for the Scor-
ing of Sleep and Associated Events: Rules, Terminology and Technical Specifications.
American Academy of Sleep Medicine, Westchester, IL, 2007.
[137] Y. Ichimaru, K. P. Clark, J. Ringler, and W. J. Weiss. Effect of sleep stage on the relation-
ship between respiration and heart rate variability. In Computers in Cardiology (CinC),
pp. 657–660, Chicago, IL, 1990.
[140] B. H. Jansen and K. Shankar. Sleep staging with movement-related signals. Int. J.
Biomed. Comput., 32(3-4):289-297, 1993.
[141] J. J. Liu, W. Xu, M.-C. Huang, N. Alshurafa, M. Sarrafzadeh, N. Raut, and B. Yadegar.
Sleep posture analysis using a dense pressure sensitive bedsheet. Perv. Mobile Comput.,
10(2):34-50, 2014.
[144] S. Jiang, C. H. Bian, X. B. Ning, and Q. D. Y. Ma. Visibility graph analysis on heartbeat
dynamics of meditation training. Appl. Phys. Lett., 102(25):253702, 2013.
[145] D. W. Jung, S. H. Hwang, H. N. Yoon, Y.-J. G. Lee, D.-U. Jeong, and K. S. Park. Noc-
turnal awakening and sleep efficiency estimation using unobtrusively measured ballisto-
cardiogram. IEEE Trans. Biomed. Eng., 61(1):131–138, 2013.
[146] F. Jurysta, P. Van De Borne, P.-F. Migeotte, M. Dumont, J.-P. Lanquart, J.-P. Degaute,
and P. Linkowski. A study of the dynamic interactions between sleep EEG and heart rate
variability in healthy young men. Clin. Neurophysiol., 114(11):2146–2155, 2003.
[151] W. Karlen, C. Mattiussi, and D. Floreano. Sleep and wake classification with ECG and
respiratory effort signals. IEEE Trans. Biomed. Circuits Syst., 3(2):71–78, 2009.
[152] E. Keogh and M. Pazzani. Scaling up dynamic time warping for datamining applications.
In Proc. 6th Assoc. Comput. Mach. SIG Knowl. Discovery Data Mining (ACM SIGKDD),
pp. 285-289, 2000.
[153] L. Keselbrener and S. Akselrod. Selective discrete Fourier transform algorithm for time-
frequency analysis: method and application on simulated and cardiovascular signals.
IEEE Trans. Biomed. Eng., 43(8):789–802, 1996.
[154] J. W. Kim, J.-S. Lee, P. A. Robinson, and D.-U. Jeong. Markov analysis of sleep dynam-
ics. Phys. Rev. Lett., 102(17):178104, 2009.
[155] S. Kim, S. H. Park, and C. S. Ryu. Multistability in coupled oscillator systems with time
delay. Phys. Rev. Lett., 79(15):2911, 1997.
[157] M. T. Kinlaw and A. W. Hunt. Time dependence of delayed neutron emission for fission-
able isotope identification. Appl. Phys. Lett., 86(25):254104, 2005.
[158] T. Kirjavainen, D. Cooper, O. Polo, and C. E. Sullivan. Respiratory and body movements
as indicators of sleep stage and wakefulness in infants and young children. J. Sleep Res.,
5(3):186-194, 1996.
[163] J. Krieger. Breathing during sleep in normal subjects. Clin. Chest Med., 6(4):577-594,
1985.
[164] A. D. Krystal and J. D. Edinger. Measuring sleep quality. Sleep Med., 9(Suppl.1):S10-
S17, 2008.
[167] Y. Kurihara and K. Watanabe. Sleep-stage decision algorithm by using heartbeat and
body-movement signals. IEEE Trans. Syst. Man. Cybern. A Syst. Hum., 42(6):1450-
1459, 2012.
[169] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, and J. C. Nuño. From time series to com-
plex networks: the visibility graph. Proc. Natl. Acad. Sci. U.S.A., 105(13):4972-4975,
2008.
[170] D. K. Lake, J. R. Moorman, and H. Cao. Sample entropy estimation using sampen. In
PhysioNet (May 2014), [Online] Available: http://physionet.org/physiotools/sampen.
[172] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical
data. Biometrics, 33(1):159–174, 1977.
[173] L. E. Larsen and D. O. Walter. On automatic methods of sleep staging by EEG spectra.
Electroencephalogr. Clin. Neurophysiol., 28(5):459-467, 1970.
[174] J. Lázaro, E. Gil, R. Bailón, A. Minchole, and P. Laguna. Deriving respiration from
photoplethysmographic pulse width. Med. Biol. Eng. Comput., 51(1-2):233–242, 2013.
[176] S. S. Lobodzinski and M. M. Laks. New devices for very long-term ECG monitoring.
Cardiol. J., 19(2):210–214, 2012.
[177] X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Using dynamic time
warping for sleep and wake discrimination. In Proc. IEEE-EMBS Int. Conf. Biomed.
Health Inf. (BHI), pp. 886–889, Hong Kong and Shenzhen, China, 2012.
[179] X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Spectral boundary adap-
tation on heart rate variability for sleep and wake classification. Int. J. Artif. Intell. Tools,
23(3):1460002, 2014.
[180] X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and wake classi-
fication with actigraphy and respiratory effort using dynamic warping. IEEE J. Biomed.
Health Inform., 18(4):1272-1284, 2014.
[182] X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing respiratory ef-
fort amplitude for automated sleep stage classification. Biomed. Signal Process. Control,
14:197–205, 2014.
[187] J. Lu, D. Sherman, M. Devor, and C. B. Saper. A putative flipflop switch for control of
REM sleep. Nature, 441(7093):589–594, 2006.
[188] B. Luque, L. Lacasa, F. Ballesteros, and J. Luque. Horizontal visibility graphs: exact
results for random time series. Phys. Rev. E, 80:046103, 2009.
[192] L. Marshall, H. Helgadóttir, M. Mölle, and J. Born. Boosting slow oscillations during
sleep potentiates memory. Nature, 444(7119):610–613, 2006.
[194] G. Matthews, B. Sudduth, and M. Burrow. A non-contact vital signs monitor. Crit. Rev.
Biomed. Eng., 28(1-2):173-178, 2000.
[195] P. Meerlo, A. Sgoifo, and D. Suchecki. Restricted and disrupted sleep: Effects on auto-
nomic function, neuroendocrine stress systems and stress responsivity. Sleep Med. Rev.,
12(3):197-210, 2008.
[199] N. Meziane, J. G. Webster, M. Attari, and A. J. Nimunkar. Dry electrodes for electrocar-
diography. Physiol. Meas., 34(9):R47–R69, 2013.
[201] Mio Alpha Intensive Heart Rate Monitor (retrieved in Jan. 2015). [Online] Available:
http://www.mioglobal.com.
[205] M. Muller. Part 1: Analysis and retrieval techniques for music data – Dynamic time
warping. In Information Retrieval for Music and Motion, Chap. 4, pp. 69-84, Springer-
Verlag, Berlin, Germany, 2007.
[206] A. Muzet. Environmental noise, sleep and health. Sleep Med. Rev., 11(2):135-142, 2007.
[209] E. P. Neuburg. Frequency warping by dynamic programming. In Proc. IEEE Int. Conf.
Acoust. Speech Signal Process. (ICASSP), pp. 573-575, New York, NY, 1988.
[210] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, 2002.
[211] M. E. J. Newman. The structure and function of complex networks. SIAM Rev.,
45(2):167-256, 2003.
[212] M. E. J. Newman and J. Park. Why social networks are different from other types of
networks. Phys. Rev. E, 68(3):036122, 2003.
[213] H.-V. V. Ngo, T. Martinetz, J. Born, and M. Mölle. Auditory closed-loop stimulation of
the sleep slow oscillation enhances memory. Neuron, 78(3):545–553, 2013.
[215] M. M. Ohayon. Epidemiology of insomnia: what we know and what we still need to
learn. Sleep Med. Rev., 6(2):97–111, 2002.
[220] J. Paquet, A. Kawinska, and J. Carrier. Wake detection capacity of actigraphy during
sleep. Sleep, 30(10):1362–1369, 2007.
[221] R. Paradiso, G. Loriga, and N. Taccini. A wearable health care system based on knitted
integrated sensors. IEEE Trans. Inf. Technol. Biomed., 9(3):337–344, 2005.
[223] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Med. Rev.,
4(2):131–148, 2000.
[225] T. Penzel, J. W. Kantelhardt, C.-C. Lo, K. Voigt, and C. Vogelmeier. Dynamics of heart
rate and sleep stages in normals and patients with sleep apnea. Neuropsychopharmacol-
ogy, 28:S48-S53, 2003.
[228] D. Pevernagie, R. M. Aarts, and M. D. Meyer. The acoustics of snoring. Sleep Med. Rev.,
14(2):131–144, 2010.
[229] Philips Respironics Actiwatch, Philips Healthcare (retrieved in Nov. 2012). [Online]
Available: http://www.actiwatch.respironics.com.
[230] E. A. Phillipson. Control of breathing during sleep. Am. Rev. Respir. Dis., 118(5):909–
939, 1978.
[231] G. Pocock, C. D. Richards, and D. A. Richards. Human Physiology, 4th edn., Oxford
University Press, 2013.
[234] C. P. Pollak, W. W. Tryon, H. Nagaraja, and R. Dzwonczyk. How accurately does wrist
actigraphy identify the states of sleep and wakefulness? Sleep, 24(8):957-965, 2001.
[235] I. P. Priban and W. F. Fincham. Self-adaptive control and respiratory system. Nature,
208(5008):339–343, 1965.
[236] W. Prinz. Perception and action planning. Eur. J. Cognit. Psychol., 9(2):129–154, 1997.
[237] F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for compar-
ing induction algorithms. In Proc. 15th Int. Conf. Machine Learn. (ICML), pp. 445–453,
Madison, WI, 1998.
[238] P. Pudil, J. Novovičová, and J. Kittler. Floating search methods in feature selection.
Pattern Recogn. Lett., 15(11):1119–1125, 1994.
[239] J. R. Quinlan. C4.5: programs for machine learning, Morgan Kaufmann Publishers Inc.,
San Francisco, CA, 1993.
[240] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing, Pren-
tice Hall Press, 1975.
[243] A. N. Rama, S. C. Cho, and C. A. Kushida. Normal human sleep. In Sleep: A Compre-
hensive Handbook, edited by T. Lee-Chiong, Chap. 1, pp. 3-9, Wiley-Liss, New Jersey,
2006.
[244] J. Rasbash, F. Steele, W. J. Browne, and H. Goldstein. A User’s Guide to MLwiN, Centre
for Multilevel Modelling, Univ. of Bristol, Bristol, UK, 2009.
[246] S. W. Raudenbush and A. S. Bryk. Hierarchical Linear Models, Sage, Thousand Oaks,
CA, 2002.
[251] B. W. Riedel and K. L. Lichstein. Objective sleep measures and subjective sleep satis-
faction: How do older adults with insomnia define a good night’s sleep? Psychol. Aging,
13(1):159–163, 1998.
[252] D. Riemann, M. Berger, and U. Voderholzer. Sleep and depression results from psy-
chobiological studies: an overview. Biol. Psychol., 57(1-3):67–103, 2001.
[258] A. Sadeh. The role and validity of actigraphy in sleep medicine: an update. Sleep Med.
Rev., 15(4):259–267, 2011.
[260] C. C. R. Sady, U. S. Freitas, A. Portmann, J.-F. Muir, C. Letellier, and L. A. Aguirre. Au-
tomatic sleep staging from ventilator signals in non-invasive ventilation. Comput. Biol.
Med., 43(7):833–839, 2013.
[261] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word
recognition. IEEE Trans. Acoust., Speech, Signal Process., ASSP-26(1):43-49, 1978.
[263] J. S. Samkoff and C. H. Jacques. A review of studies concerning effects of sleep depri-
vation and fatigue on residents’ performance. Acad. Med., 66(11):687-693, 1991.
[264] L. Samy, M.-C. Huang, J. Liu, W. Xu, and M. Sarrafzadeh. Unobtrusive sleep stage iden-
tification using a pressure-sensitive bed sheet. IEEE Sens. J., 14(7):2092–2101, 2014.
[265] J. P. Saul, R. F. Rea, D. L. Eckberg, R. D. Berger, and R. J. Cohen. Heart rate and
muscle sympathetic nerve variability during reflex changes of autonomic activity. Am. J.
Physiol., 258(3):H713–H721, 1990.
[266] C. Schäfer, M. G. Rosenblum, J. Kurths, and H.-H. Abel. Heartbeat synchronized with
ventilation. Nature, 392:239–240, 1998.
[268] E. Sforza, C. Jouny, and V. Ibanez. Cardiac activation during arousal in humans: further
evidence for hierarchy in the arousal response. Clin. Neurophysiol., 111(9):1611–1619,
2000.
[269] A. Sgoifo, C. Coe, S. Parmigiani, and J. Koolhaas. Individual differences in behavior and
physiology: causes and consequences. Neurosci. Biobehav. Rev., 29:1–2, 2005.
[271] S. S. Shapiro, M. B. Wilk, and H. J. Chen. A comparative study of various tests for
normality. J. Am. Stat. Assoc., 63(324):1343–1372, 1968.
[272] Z.-G. Shao. Network analysis of human heartbeat dynamics. Appl. Phys. Lett.,
96(7):073703, 2010.
[274] Z. Shinar, S. Akselrod, Y. Dagan, and A. Baharav. Autonomic changes during wake-
sleep transition: a heart rate variability based approach. Auton. Neurosci., 130(1-2):17–
27, 2006.
[277] J. Sloboda and M. Das. A simple sleep stage identification technique for incorporation
in inexpensive electronic sleep screening devices. In Proc. IEEE Nat. Aero. Elect. Conf.
(NAECON), pp. 21–24, Dayton, OH, 2011.
[282] K. Spiegel, R. Leproult, and E. Van Cauter. Impact of sleep debt on metabolic and endocrine
function. The Lancet, 354(9188):1435–1439, 1999.
[284] M. Steriade. The corticothalamic system in sleep. Front. Biosci., 8:878–899, 2003.
[287] E. Tasali, R. Leproult, D. A. Ehrmann, and E. V. Cauter. Slow-wave sleep and the risk of
type 2 diabetes in humans. Proc. Natl. Acad. Sci. U.S.A., 105(3):1044–1049, 2008.
[288] Task Force of the European Society of Cardiology and the North American Society of
Pacing and Electrophysiology. Heart rate variability: standards of measurement, physio-
logical interpretation and clinical use. Circulation, 93:1043–1065, 1996.
[290] C. Texier and S. N. Majumdar. Wigner time-delay distribution in chaotic cavities and
freezing transition. Phys. Rev. Lett., 110(25):250602, 2013.
[291] TomTom Runner Cardio Monitor (retrieved in Jan. 2015). [Online] Available:
http://www.tomtom.com/products/your-sports/running.
[292] J. Trinder, J. Kleiman, M. Carrington, S. Smith, S. Breen, N. Tan, and Y. Kim. Auto-
nomic activity during human sleep as a function of time and sleep stage. J. Sleep Res.,
10(4):253–264, 2001.
[294] J. Trinder, F. Whitworth, A. Kay, and P. Wilkin. Respiratory instability during sleep
onset. J. Appl. Physiol., 73(6):2462–2469, 1992.
[296] M. Unser. Splines: a perfect fit for signal and image processing. IEEE Signal Proc. Mag.,
16(6):22–38, 1999.
[297] J. A. Van Alste and T. S. Schilder. Removal of base-line wander and power-line interference
from the ECG by an efficient FIR filter with a reduced number of taps. IEEE Trans.
Biomed. Eng., BME-32(12):1052–1060, 1985.
[299] E. Vanoli, P. B. Adamson, L. Ba, G. D. Pinna, R. Lazzara, and W. C. Orr. Heart rate
variability during specific sleep stages: a comparison of healthy subjects with patients
after myocardial infarction. Circulation, 91:1918–1922, 1995.
[300] J. Virkkala, J. Hasan, A. Värri, S.-L. Himanen, and K. Müller. Automatic sleep stage
classification using two-channel electro-oculography. J. Neurosci. Meth., 166(1):109–
115, 2007.
[303] T. Watanabe and K. Watanabe. Noncontact method for sleep stage estimation. IEEE
Trans. Biomed. Eng., 51(10):1735–1748, 2004.
[305] D. P. White, J. V. Weil, and C. W. Zwillich. Metabolic rate and breathing during sleep.
J. Appl. Physiol., 59(2):384–391, 1985.
signals as a validation tool for ergonomic steering in smart bedding systems. Work: J.
Prev. Ass. Rehabil., 41:1985-1989, 2012.
[312] M. Xiao, H. Yan, J. Song, Y. Yang, and X. Yang. Sleep stages classification based on
heart rate variability and random forest. Biomed. Signal Process. Control, 8(6):624–633,
2013.
[313] X. Xu, J. Zhang, and M. Small. Superfamily phenomena and motifs of networks in-
duced from time series. Proc. Natl. Acad. Sci. U.S.A., 105(50):19601-19605, 2008.
[314] D. Yankov, E. Keogh, J. Medina, B. Chiu, and V. Zordan. Detecting time series mo-
tifs under uniform scaling. In Proc. Assoc. Comput. Mach. SIG Knowl. Discovery Data
Mining (ACM SIGKDD), 2005, pp. 844–853.
[315] B. Yilmaz, M. H. Asyali, E. Arikan, S. Yetkin, and F. Özgen. Sleep stage and ob-
structive apneaic epoch classification using single-lead ECG. Biomed. Eng. Online, 9:39,
2010.
[316] J. Yoo, L. Yan, S. Lee, H. Kim, and H.-J. Yoo. A wearable ECG acquisition system with
compact planar-fashionable circuit board-based shirt. IEEE Trans. Inf. Technol. Biomed.,
13(6):897–902, 2009.
[318] C. Yu, Z. Liu, T. McKenna, A. T. Reisner, and J. Reifman. A method for automatic
identification of reliable heart rates calculated from ECG and PPG waveforms. J. Am.
Med. Inform. Assoc., 13(3):309–320, 2006.
[320] J. Zhang and M. Small. Complex network from pseudoperiodic time series: topology
versus dynamics. Phys. Rev. Lett., 96(23):238701, 2006.
[321] G. Zhu, Y. Li, and P. Wen. Analysis and classification of sleep stages based on difference
visibility graphs from a single-channel EEG signal. IEEE J. Biomed. Health Inform.,
18(6):1813–1821, 2014.
[322] G. Zhu, Y. Li, and P. Wen. An efficient visibility graph similarity algorithm and its ap-
plication on sleep stages classification. Brain Inform. LNCS, 7670:185–195, 2012.
List of the author’s publications
Journal articles
2. X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and wake classi-
fication with actigraphy and respiratory effort using dynamic warping. IEEE Journal of
Biomedical and Health Informatics, 18(4):1272–1284, 2014.
scaling for sleep staging (2014 Physiol. Meas. 35 2539). Physiological Measurement,
36(3):625, 2015.
11. X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Rolink. Detection of nocturnal slow
wave sleep based on cardiorespiratory activity. Submitted.
12. X. Long, R. Haakma, T. Leufkens, P. Fonseca, and R. M. Aarts. Effects of between- and
within-subject variability on autonomic cardiorespiratory activity during sleep and their
limitations on sleep staging: a multilevel analysis. Submitted.
13. P. Fonseca∗ , X. Long∗ , M. Radha, R. Haakma, R. M. Aarts, and J. Rolink. Sleep stage
classification with ECG and respiratory effort. Submitted. (∗ Joint first authorship)
14. P. Fonseca, R. M. Aarts, X. Long, and R. Haakma. Estimating actigraphy from motion
artifacts in ECG and respiratory effort signals. Submitted.
15. P. Fonseca, N. Den Teuling, X. Long, J. Rolink, and R. M. Aarts. Cardiorespiratory sleep
stage detection using conditional random fields. Submitted.
11. M. S. Goelema, X. Long, and R. Haakma. Gender effect found in the association be-
tween overnight breathing rate variation and reported sleep quality scores. Sleep, vol. 38
(Abstract Supplement), p. A60, Jun. 2015.
14. M.-M. Nano, X. Long, J. Werth, R. M. Aarts, and R. Heusdens. Sleep apnea detection
using time-delayed heart rate variability. Submitted, 2015.
1. X. Long, R. Haakma, P. Fonseca, and R. M. Aarts. System and method for determining
spectral boundaries for sleep stage classification. Pending.
2. X. Long, P. Fonseca, Niek den Teuling, R. Haakma, and R. M. Aarts. System and method
for slow wave sleep detection. Pending.
3. P. Fonseca, Niek den Teuling, X. Long, R. Haakma, and R. M. Aarts. System and method
for cardiorespiratory sleep stage classification. Pending.
2. X. Long, S. Pauws, M. Pijl, J. Lacroix, A. Goris, and R. M. Aarts. Analysis and predic-
tion of daily physical activity level data using autoregressive integrated moving average
models. 3rd Workshop on Behaviour Monitoring and Interpretation (BMI’09), pp. 1–15,
Paderborn, Germany, Oct. 2009.
3. X. Long, W. Yin, L. An, H. Ni, L. Huang, Q. Luo, and Y. Chen. Churn analysis of online
social network users using data mining techniques. International MultiConference of
Engineers and Computer Scientists (IMECS’12), pp. 551–556, Hong Kong, Mar. 2012.
4. X. Long, M. Pijl, S. Pauws, J. Lacroix, A. Goris, and R. M. Aarts. Towards tailored phys-
ical activity health intervention: Predicting dropout participants. Health and Technology,
4:273–287, 2014.
Acknowledgements
I still remember the moment, more than four years ago, when I decided to pursue this PhD project
in Western Europe (Eindhoven, the Netherlands), 11472 km away from my home in the Far East
(Huizhou, China). It was not an easy decision, since it meant that I had to change my career
path and stay on the other side of the earth for a second time, after finishing my master study
in Eindhoven in 2009, and this time for much longer. Even so, I felt excited and full of
strength at that moment because it seemed that I had found my dream, a dream
of dedicating myself to what I am extraordinarily interested in; and now, four years later, you
see this book. Herewith, I would like to express my heartfelt appreciation to all of you who
shared my experience over the years.
First and foremost, I would like to express my deepest thanks and gratitude to my supervisors,
Prof. Ronald M. Aarts and Dr. Reinder Haakma. Thank you, Ronald, for your sincerest
advice and encouragement for the long walks in both professional development and personal
life that I underwent during the past four years, as well as during my master period. Thank you,
Reinder, for masterfully and patiently coaching me through the doctoral work and for giving me
the freedom to explore my own ideas. I will never forget the discussions during our regular
meetings, which were always so inspiring and enjoyable. I learned so much from you about how to
energize creative thinking in scientific research. I always feel lucky to be under your supervision.
A special note of gratitude goes to Prof. Jan Bergmans, the chair of the Signal Processing (SPS)
group at TU/e: you offered me this wonderful opportunity to pursue my doctoral degree.
Enormous thanks must go to my colleague Pedro Fonseca, who worked closely with me.
Your support and knowledge have been of great help in overcoming the obstacles encountered.
I have benefited greatly from your critical reviews of my articles. Without your help,
this thesis could never have been finished. Particular thanks go
to my second promoter, Prof. Johan Arends, for your advice and discussions regarding
the neurophysiological aspects of the work. Many thanks also go to Jérôme Rolink and Mustafa
Radha, who provided inspiring comments on my manuscripts and made huge contributions
to the algorithm framework of the project, and to Dr. Sandrine Devot and Reimund Dratwa,
who initiated the framework. I would also like to thank Maaike Goelema, Dr. Tim Weysen, Dr.
Tim Leufkens, Dr. Roy Raymann, Tine Smits, and Renske de Bruijn for supporting the work
with your expertise in psychology or physiology. My gratitude is also due to the other former
team members Adrienne Heinrich and Dr. Igor Berezhnyy, as well as the former master students
Jie Yang, Niek den Teuling, Xi Yang, Antonio Rebelo, Yuan Lu, and Xi Zhang, who were involved
in the project. Your enthusiastic attitude and your hard work during different phases of the
project did accelerate the success of my work. I feel fortunate to have had you in the project.
Special thanks go to Timothy A. Nathan, a senior intellectual property counsel from the IP&S
department at Philips United States, for your active responses that expedited the approval of
my work for publication and for helping me with filing several patent applications.
I would also like to acknowledge the committee of this thesis, Prof. Panos Markopoulos,
Prof. Sabine Van Huffel (KU Leuven), Prof. Steffen Leonhardt (RWTH Aachen University),
and the chairman Prof. Peter de With, for their insightful comments on the thesis.
Most of the work presented in this thesis was conducted at Philips Research, the Netherlands.
For this reason, thanks must go to Marieke van der Hoeven, the head of the Brain, Body
& Behavior department, in whose group I spent the first two years, and to Dr. Jörg Habetha,
the head of the Personal Health department, where I had the honor to work
during the past two years. It was a pleasant time of my life, for which I am grateful, and
in which I met so many knowledgeable and energetic scientists, from whom I learned a lot during
coffee breaks, lunch time, and the offsite events that we had together. Dr. Michael Rooijakkers,
thank you for helping me with presenting my work at the EMBC’14 conference in Chicago. Since I
have also been involved in a couple of other projects apart from my PhD work, I would like to
thank Prof. Guofu Zhou, Jan Werth, Dr. Peter Andriessen, Dr. Louis Atallah, Elly Zwartkruis-Pelgrim,
Marina Nano, and Dr. Richard Heusdens for the great discussions with you. Thanks also go
to the Philips and SPS secretaries as well as the other administrative staff who helped me
with many non-technical matters, such as providing instructions and ICT support at
the beginning of the project, arranging business trips and reimbursements, scheduling
teleconferences, and preparing support letters for my parents’ and friends’ visits to the
Netherlands. I would also like to express my sincerest gratitude to Dr. Bin Yin and Dr. Steffen
Pauws for guiding my master project. You started me down the amazing road of scientific research.
During my life in the Netherlands, I have to admit that I owe a lot of thanks to my Chinese
friends and colleagues who never let me feel lonely: Anmin, Liya, Yuanjia, Xiaoyin, BoC, Qing,
Tao, Tao, Wei, Jianhua, Bin, Rui, Pu, Wei, Quan, Shaoxiong, Yanan, Yan, Lin, Xin, Xiong and
so many others. In particular, I would like to give special thanks to Le, Wenyao, Xiaomin, Xin,
Dan, Tingyun, Joanne, Chen, Fei, and Anqi for different reasons. Last but not least, I want to
express my sincerest thanks to my parents for giving me life to see, listen, feel, and experience
this wonderful world, and to my relatives and friends in China for your support during the past
11586 days.
Going back 18 years, as a child at middle school in 1997, I wrote an essay
for my writing homework in which I dreamed of being awarded the Nobel Prize in
Biomedicine in 2016, although I had no idea what ‘Biomedicine’ literally meant. Unfortunately,
now that 2016 is coming soon, I realize that there is not even a Nobel Prize in Biomedicine,
but who knows whether that will come in the future.
About the author