Sleepstages Cardio Respiratory

On the analysis and classification of sleep stages from
cardiorespiratory activity
Citation for published version (APA):
Long, X. (2015). On the analysis and classification of sleep stages from cardiorespiratory activity Eindhoven:
Technische Universiteit Eindhoven
Document status and date:

Published: 30/06/2015
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

On the Analysis and Classification of Sleep Stages
from Cardiorespiratory Activity
DOCTORAL THESIS
to obtain the degree of doctor at the Technische Universiteit
Eindhoven, on the authority of the rector magnificus prof.dr.ir. F.P.T. Baaijens,
before a committee appointed by the Doctorate Board (College voor Promoties), to be
defended in public on Tuesday 30 June 2015 at 16:00
by
Xi Long
Born in Ganzhou, China

This thesis has been approved by the promotors, and the composition of the doctoral
committee is as follows:
chairman : prof.dr.ir. P.H.N. de With
1st promotor : prof.dr. R.M. Aarts
2nd promotor : prof.dr. J.B.A.M. Arends
copromotor : dr.ir. R. Haakma (Philips Research)
members : prof.dr. P. Markopoulos
Prof.Dr.-Ing.Dr.med. S. Leonhardt (RWTH Aachen University)
prof.dr.ir. S. Van Huffel (KU Leuven)
On the Analysis and Classification of Sleep Stages from Cardiorespiratory Activity / by Xi
Long – Eindhoven : Eindhoven University of Technology, 2015.
A catalogue record is available from the Eindhoven University of Technology Library.
Doctoral thesis. – ISBN: 978-90-386-3850-8.
The research presented in this thesis was supported by Philips Group Innovation – Research,
Eindhoven, The Netherlands.
Cover Design : Ya Shu, Eindhoven, The Netherlands.

Reproduction : Eindhoven University of Technology.
Copyright © 2015, Xi Long

All rights reserved. Copyright of the individual chapters belongs to the publisher of the journal
listed at the beginning of each respective chapter. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior written permission from the copyright
owner.
To my beloved parents
to the memory of my wonderful youth
and to my country, China
Summary
Sleep is a state of reversible disconnection from the environment and plays an essential role
in maintaining internal homeostasis, memory consolidation, energy conservation, and cognitive
and behavioral performance. Sleep problems are now prevalent around the world, and sleep
complaints are increasing. Historically, such problems were less common because the regulation
of sleep was synchronized with the external environment through a biological circadian rhythm.
However, in a modern industrialized society with artificial environments where lighting, heat,
and food are available at any moment, sleep disturbances and disorders have reached epidemic
levels. People experience the symptoms of disturbed sleep, such as fatigue, increased
impulsiveness, and agitation, without being aware of the link between these issues and their
sleeping patterns.
To stay healthy in body and mind, people should be able to monitor their sleep easily and
without disturbing it, to assess sleep quality or sleep-related problems, and to adjust their
sleep habits accordingly. However, the traditional sleep monitoring method, polysomnography
(PSG), is usually performed in a sleep laboratory with costly facilities and requires many
sleep-disturbing devices with electrodes and wires to be attached to the body. Furthermore,
the measurements of such devices can only be interpreted by highly trained sleep clinicians.
Therefore, although PSG is currently considered the gold standard and common practice for
sleep monitoring, it is poorly suited to daily use at home by people without specialized
training, and it introduces undesired sleep disturbances. This has motivated the investigation
of alternative sensors and methods that allow sleep to be monitored unobtrusively,
preferably at low cost and without requiring training.
Objective sleep assessment is often based on monitoring sleep stages throughout the night.
In the past decades, cardiorespiratory signals have attracted increasing attention in the
context of sleep staging, or sleep stage classification. Cardiorespiratory activity has been
shown to be associated with sleep stages through the regulation of the autonomic nervous
system. More importantly, cardiorespiratory signals can be acquired unobtrusively using
advanced technologies such as microwave Doppler radar, ballistocardiography,
photoplethysmography, pressure-sensitive bed sheets, acoustic devices, and near-infrared
cameras. Thus, investigating cardiac and respiratory characteristics in different sleep stages
is important for achieving reliable sleep stage classification, with which a more adequate
sleep assessment can be delivered.
This thesis first exploits characteristics of cardiac and respiratory activity and their
interaction during sleep using several signal analysis methods: frequency band adaptation on
heart rate variability (Chapter 2), dynamic time/frequency warping for respiration (Chapter 3),
analysis of breathing depth and volume (Chapter 4), uniform scaling (measuring
self-dissimilarity) for respiration (Chapter 5), and visibility graph analysis in complex
networks for cardiorespiratory interaction (Chapter 6). Based on these methods, novel
cardiorespiratory features (expressing certain physiological properties) are proposed to
classify sleep stages. Results show that these features help to considerably improve the
performance of sleep stage classification.
In addition, Chapter 7 demonstrates that there is a time delay between the changes in brain
activity and autonomic variations during sleep transitions. It appears that cardiac changes
consistently precede the variations in brain activity during light-deep sleep and sleep-wake
transitions. In Chapter 8, this finding is used to detect deep sleep (i.e., slow wave sleep)
with feature values taken from a time interval a few minutes earlier, which significantly
improves the detection results. Furthermore, the major challenge of sleep stage classification
based on cardiorespiratory activity is discussed in Chapter 9. It is found that the
classification performance is mainly limited by the between- and within-subject variations in
autonomic physiology as well as subject demographics. Therefore, methods of feature
normalization and feature smoothing over the entire night are proposed in Chapter 10, which
serve to reduce the variations between and within subjects that are observed in the
cardiorespiratory features. As a result, marked improvements in sleep stage classification are
observed.
In summary, this thesis focuses on objectively analyzing and classifying sleep stages using
cardiorespiratory signals. It shows that by extracting novel features from the signals,
post-processing the features using normalization and smoothing, and applying new findings
regarding the autonomic-brain time delay, sleep stage classifiers can be substantially improved
and reliable results can ultimately be achieved.
Nederlandse samenvatting (Dutch summary)
Sleep is a reversible state of disconnection from the environment and plays an exceptionally
important role in maintaining internal homeostasis, memory consolidation, energy conservation,
cognitive performance, and behavior. Nowadays, sleep problems are increasingly common
throughout the world. This was not a problem in the past because the regulation of sleep was
always well synchronized with the environment through a biological circadian rhythm, but since
we live in a modern industrialized society with artificial environments where light, heat, and
food are available at any moment, sleep disturbance and sleep problems are reaching an epidemic
level. People experience the symptoms of sleep disturbance, such as tiredness, increased
impulsiveness, and agitation, without relating them to their sleep pattern.
To be in good mental and physical condition, people should be able, in a simple way and without
disturbing their sleep, to assess their sleep quality or sleep problems and to adjust their
sleep habits accordingly. Common sleep recording methods, known as polysomnography (PSG), are
applied in a sleep laboratory with expensive facilities and involve many sleep-disturbing
measurement methods with electrodes connected to the body by wires, the measurements of which
can moreover only be interpreted by highly trained sleep technicians. Although PSG is now the
gold standard and the common practice for sleep recording, it is unsuitable for daily home use
by people without special training, and it introduces undesired sleep disturbances. This has
motivated research into alternative sensors and methods that enable measurement without these
problems, preferably inexpensive and usable without special training.
Objective assessment of sleep parameters is often based on recording sleep states throughout
the entire night. In recent decades, cardiorespiratory signals have received more and more
attention in determining sleep stages and classifying sleep. Cardiorespiratory activity has
been found to be related to sleep stages through the regulation of the autonomic nervous system
and, more importantly, the signals can be obtained unobtrusively using advanced techniques such
as microwave Doppler radar, ballistocardiography, photoplethysmography, pressure-sensitive bed
sheets, and near-infrared cameras. Research into the cardiorespiratory characteristics of
different sleep stages is therefore important for obtaining a reliable sleep classification,
which enables a better sleep assessment.
This thesis exploits the properties of cardiorespiratory activity and their interaction during
sleep using signal analysis methods. These are: adaptive filters for heart rate (Chapter 2),
dynamic time-frequency warping for respiration (Chapter 3), analysis of breathing depth and
volume (Chapter 4), uniform scaling (measuring self-dissimilarity) for respiration (Chapter 5),
and visibility graph analysis of complex networks for cardiorespiratory interaction
(Chapter 6). Using these methods, new cardiac and respiratory features (representing
physiological properties) are proposed for sleep stage classification. The results show that
these features can thoroughly improve sleep stage classification.
Moreover, an interesting phenomenon is demonstrated in Chapter 7 concerning the time delay
between the activity of the brain and that of the autonomic nervous system during sleep stage
transitions. It appears that cardiac changes consistently precede the variation in brain
activity during transitions between light and deep sleep and during transitions between sleep
and wake. In Chapter 8 we apply this phenomenon to detect deep sleep, which yields
significantly improved detection results. The main problem of sleep stage classification with
cardiorespiratory activity is addressed in Chapter 9. It appears that the classification
results are mainly limited by the variation between subjects and within subjects, both in
autonomic physiology and in subject demographics. In Chapter 10, methods for normalizing and
smoothing features over the entire night are therefore proposed, which serve to reduce the
aforementioned variations in cardiorespiratory activity. As a result, marked improvements in
sleep stage classification are observed.
In summary, this thesis focuses on objectively analyzing and classifying sleep stages using
cardiorespiratory signals. It shows that by deriving new features from the signals, further
processing these features by means of normalization and smoothing, and applying new findings
concerning the time delay between the activity of the brain and that of the autonomic nervous
system, sleep stage classification can be substantially improved and reliable results can
ultimately be achieved.
List of abbreviations
AASM American Academy of Sleep Medicine

AIC Akaike’s information criterion
ANA Autonomic nervous activity
ANN Artificial neural network
ANOVA Analysis of variance
ANS Autonomic nervous system
AR Autoregressive
ASMD Absolute standardized mean difference
AUC Area under the curve
BB Breath-to-breath interval
BCG Ballistocardiography
BM Body movement
BMI Body mass index
BR Breathing rate
BS Baseline
CFS Correlation-based feature selection
CRI Cardiorespiratory interaction
CS Correction scheme
CV Cross validation
D Deep sleep (slow wave sleep)
DFA Detrended fluctuation analysis
DFT Discrete Fourier transform
DFW Dynamic frequency warping
DS Deep sleep (slow wave sleep)
DTW Dynamic time warping
DVG Difference visibility graph
DW Dynamic warping
ECG Electrocardiography
EEG Electroencephalography
EMG Electromyography
EOG Electrooculography
FFT Fast Fourier transform
FN False negative
FP False positive
HF High frequency
HHT Hilbert-Huang transform
HMM Hidden Markov model
HR Heart rate
HRV Heart rate variability
ICC Intra-group correlation coefficient
IG Information gain
IGLS Iterated generalized least squares
IQR Interquartile range
KNN K-nearest neighbor
L Light sleep
LD Linear discriminant
LF Low frequency
LOSOCV Leave-one-subject-out cross-validation
LS Light sleep
LSA Least squares approximation
N1 Stage 1 NREM sleep
N2 Stage 2 NREM sleep
N3 Stage 3 NREM sleep (slow wave sleep, stage S3 and S4)
NB Naive Bayes
NREM Non-rapid-eye-movement
PAT Peripheral arterial tone
PDFA Progressive DFA
pNN50 Percentage of successive RR differences >50 ms
PPG Photoplethysmography
PR Precision-recall
PSD Power spectral density
PSG Polysomnography
PSQI Pittsburgh Sleep Quality Index
PSSA Pressure-sensitive sensor array
PVE Proportion of variance explained
Q-Q Quantile-Quantile
QD Quadratic discriminant
QRS Three successive extrema in the ECG
OSA Obstructive sleep apnea
R REM sleep
R&K Rechtschaffen and Kales

rANOVA Repeated measures ANOVA
RE Respiration (sometimes respiratory effort)
REM Rapid-eye-movement
RF Random forest
RIP Respiratory inductance plethysmography
rMNOVA Repeated measures multivariate ANOVA
RMSSD Root mean square of successive RR differences
ROC Receiver operating characteristic
RR Two successive heartbeats
RS REM sleep
S1 Stage 1 NREM sleep
SampEn Sample entropy
SaO2 Oxygen saturation
SCSB Static-charge-sensitive bed
SD Standard deviation
SDBR Standard deviation of breathing rates
SDNN Standard deviation of heartbeat intervals
SDSD Standard deviation of successive RR differences
SE Sleep efficiency (Chapter 2)
SE Sample entropy (Chapter 4 and 8)
SE Standard error (Chapter 9)
SFS Sequential forward selection
SOL Sleep onset latency
SSA Self-Assessment of Sleep and Awakening Quality Scale
ST Snooze time
STFT Short-time Fourier transform
SVM Support vector machines
SWS Slow wave sleep
TH Thresholding
TN True negative
TP True positive
TRT Total recording time
TST Total sleep time
TVPP Time-varying prior probability
TWT Total wake time
VG Visibility graph
VLF Very low frequency
W Wake
WASO Wake after sleep onset
WDFA Windowed DFA
WRLD Wake, REM sleep, light sleep, and deep sleep
WRN Wake, REM sleep, and NREM sleep
Contents
Summary
Nederlandse samenvatting
List of abbreviations
1 General introduction
1.1 Human sleep
1.2 Sleep stages in electrophysiology
1.3 Polysomnography – standard for sleep assessment
1.4 Automatic sleep monitoring
1.4.1 PSG-based sleep stage classification
1.4.2 Cardiorespiratory-based sleep stage classification
1.5 Research question and objective
1.6 Outline of the thesis
Part I: Signal Analysis for Sleep Stage Classification
2 Spectral boundary adaptation on heart rate variability for sleep and wake classification
2.1 Introduction
2.2 Materials and methods
2.2.1 Data acquisition
2.2.2 PSD estimation
2.2.3 Boundary adaptation
2.2.4 Feature extraction
2.2.5 Feature evaluation
2.2.6 Sleep and wake classification
2.2.7 Classifier evaluation
2.3 Results
2.4 Discussion
2.4.1 Discriminative power
2.4.2 Classification
2.4.3 Healthy subjects versus insomniacs
2.4.4 Determination of adaptive boundaries
2.5 Conclusion
3 Sleep and wake classification with actigraphy and respiratory effort using dynamic warping
3.1 Introduction
3.2 Subjects and data
3.3 Methods
3.3.1 Dynamic warping algorithm
3.3.2 Sleep and wake classification
3.3.3 Experiments and evaluation
3.4 Results
3.5 Discussion
3.6 Conclusion
4 Analysis of respiratory effort amplitude for sleep stage classification
4.1 Introduction
4.2.1 Subjects and data
4.2.2 Signal preprocessing
4.2.3 Existing respiratory features
4.2.4 Respiratory amplitude features
4.2.5 Subject-specific feature normalization
4.2.6 Classifier
4.2.7 Experiments and evaluation
4.3 Results
4.4 Discussion
4.5 Conclusion
5 Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep staging
5.1 Introduction
5.2.1 Subjects and protocol
5.2.2 Polysomnographic measurements
5.2.3 Signal processing
5.2.4 Dissimilarity measure with uniform scaling
5.2.5 Windowed dissimilarity feature
5.2.6 Feature analysis
5.2.7 Sleep staging
5.3 Results
5.4 Discussion
5.5 Conclusion
6 Modeling cardiorespiratory interaction during sleep with complex networks
6.1 Introduction
6.2 Cardiorespiratory interaction
6.3 Visibility graph network
6.4 Network properties of cardiorespiratory interaction
6.4.1 Degree distribution
6.4.2 Assortativity mixing
6.5 Conclusion
Part II: Timing Between Autonomic and Brain Activity
7 Time delay between cardiac and brain activity during sleep transitions
7.1 Introduction
7.2.1 Subjects and recordings
7.2.2 EEG and cardiac activity
7.3 Correlation-analysis during sleep transitions
7.4 Results and discussion
7.5 Conclusion
8 Detection of nocturnal slow wave sleep based on cardiorespiratory activity
8.1 Introduction
8.3 Methods
8.3.1 Signal preprocessing
8.3.3 Spline fitting for feature smoothing
8.3.4 Feature subset selection
8.3.5 Classifier
8.3.6 Time delay
8.4 Experiments and evaluation
8.5 Results
8.6 Discussion
8.7 Conclusion
Part III: Cardiorespiratory-Based Sleep Stage Classification
9 Effects of between- and within-subject variability on autonomic cardiorespiratory activity during sleep and their limitations on sleep staging: a multilevel analysis
9.1 Introduction
9.2.1 Subjects and protocol
9.2.2 PSG recordings
9.2.3 Data preparation
9.2.4 Cardiorespiratory parameters
9.2.5 Descriptive statistics
9.2.6 Multilevel analysis
9.2.7 Explanations of variance
9.2.8 Between- and within-subject effects in sleep staging
9.3 Results
9.3.1 Descriptive results
9.3.2 Multilevel modeling
9.3.3 Proportion of variance explained
9.3.4 Sleep staging results
9.4 Discussion
9.5 Conclusion
9.A Appendix
10 Sleep stage classification with ECG and respiratory effort
10.1 Introduction
10.2 Materials and Methods
10.2.1 Data sets
10.2.3 Classifier
10.2.4 Feature selection
10.2.5 Validation and evaluation
10.3 Results and discussion
10.3.1 Feature selection
10.3.2 Cross-validation
10.3.3 Comparison with state-of-the-art
10.4 Conclusion
11 General discussion and future perspectives
11.1 Analysis of features
11.2 Sleep stage classification
11.3 Classifier
11.4 Subject/patient groups
11.5 Objective and subjective sleep assessments
11.6 Towards unobtrusive sleep monitoring
List of the author’s publications
Acknowledgements
About the author
CHAPTER 1
General introduction
“Everything is one; during sleep the soul, undistracted, is absorbed into the unity;
when awake, distracted, it sees the different beings.”
— Chuang Tzu, 300 B.C., Warring States period
(Translated by M. Palmer, The Book of Chuang Tzu, 1st ed., Penguin Classics, 2006)
1.1 Human sleep
Sleep occupies approximately one-third of our lifetime and is essential for maintaining
health and wellbeing, homeostasis, memory, and cognitive and behavioral performance [48,
165, 285]. Sleep exerts significant effects on systemic hemodynamics, cardiac function,
endothelial function, and coagulation [311]. Sleep deprivation can lead to loss of daytime
performance, disturbance of the circadian rhythm, impairments such as mental or physical
fatigue, weakened immune function, reduced cognitive functioning, and other health risks
[10, 15, 45, 80]. In epidemiology and pathophysiology, sleep disorders or abnormalities have
been linked to depression, diabetes, metabolic syndrome, sudden death, and cardiovascular
diseases such as cardiac arrhythmias, hypertension, atherosclerosis, stroke, and heart failure
[113, 252, 282, 287, 311].
Human sleep is a complex biological process with its own internal architecture, expressed by
sleep states or stages [63, 281]. Sleep states usually include nighttime wakefulness,
rapid-eye-movement (REM) sleep, and non-REM (NREM) sleep, where NREM sleep can be further
divided into stages S1-S4 according to the rules recommended by Rechtschaffen and Kales (R&K)
[247]. In the more recent guidelines of the American Academy of Sleep Medicine (AASM)
[136], S3 and S4 are merged into a single stage, called slow wave sleep (SWS) or “deep” sleep,
since no essential difference was found between them. S1 and S2 usually correspond
to “light” sleep [276]. Note that, for simplicity, sleep states and stages are both called
“sleep stages” in this thesis.
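As an illustration of the stage grouping described above, the following minimal sketch maps
R&K epoch labels to the simplified stages (wake, REM, light sleep, and SWS) and computes the
fraction of the night spent in each. The label strings, the dictionary, and the function names
are hypothetical and chosen only for this example; they are not taken from the thesis.

```python
from collections import Counter

# Hypothetical label names; the grouping follows the text above:
# S1 and S2 correspond to "light" sleep, S3 and S4 are merged into SWS ("deep" sleep).
RK_TO_SIMPLIFIED = {
    "W": "Wake",
    "REM": "REM",
    "S1": "Light",
    "S2": "Light",
    "S3": "SWS",
    "S4": "SWS",
}


def simplify_hypnogram(rk_stages):
    """Map a sequence of R&K epoch labels to the simplified stage labels."""
    return [RK_TO_SIMPLIFIED[stage] for stage in rk_stages]


def stage_fractions(stages):
    """Return the fraction of epochs spent in each stage (empty dict for no epochs)."""
    total = len(stages)
    if total == 0:
        return {}
    counts = Counter(stages)
    return {stage: n / total for stage, n in counts.items()}


if __name__ == "__main__":
    # A short, made-up sequence of epoch labels (not real data).
    example = ["W", "S1", "S2", "S2", "S3", "S4", "S2", "REM"]
    simplified = simplify_hypnogram(example)
    print(simplified)
    print(stage_fractions(simplified))
```

A dictionary lookup is used so that an unknown label raises a KeyError instead of being
silently assigned to a stage.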
1.2 Sleep stages in electrophysiology
For normal or healthy subjects, sleep progresses with about four NREM-REM sleep cycles per
night, where each cycle lasts around 90-110 min on average, starting with light sleep followed
by deep sleep before REM sleep (Figure 1.1) [63, 216, 243]. The electrophysiological
interpretation of sleep stage changes during a sleep cycle can be described as follows. During
sleep onset (usually from wake to S1 sleep), the electroencephalography (EEG) changes from
clear rhythmic alpha waves (8-13 Hz) to a mixed-frequency pattern with fewer alpha waves but
more theta waves (4-7 Hz), accompanied by a gradual decrease of muscle activity that can be
observed in electromyography (EMG) as well as slow and asynchronous eye movements in
electrooculography (EOG) [63]. Many studies have argued that the acknowledgement of sleep
onset should require the presence of S2. This is because the transition from wake to S1 may not
coincide with perceived sleep onset and it often occurs several times, which is considered as
‘unequivocal sleep’ associated with a low arousal threshold where subjects often report that
they are still awake [5, 82]. During S2 sleep, K-complexes or sleep spindles appear along with
an increasing presence of high-amplitude, low-frequency activity as S2 progresses [243].
Afterwards, sleep enters SWS (S3 and then S4), during which high-voltage (≥75 µV) and slow
(delta) wave activity (0.5-2 Hz) accounts for at least 20% (S3) or 50% (S4) of the EEG
activity with no eye movements [17, 136, 247]. SWS represents the most restorative period of
sleep for metabolic functioning associated with sleep quality [10, 91], where brain and body
energy can be efficiently conserved and recovered [34, 35] and new memories are consolidated
[61, 285, 302]. During SWS, the field potentials in EEG oscillations are related to
synchronized patterns of burst-pause firing in cortical neurons [92, 284]. REM sleep is
characterized by bursts of rapid eye movements, muscle atonia, low-voltage brain waves, and
irregular heartbeats and breathing, and it is the state in which dreaming often takes place [88].

Figure 1.1: An example of the sleep stage progression throughout an entire night (hypnogram)
from a healthy adult. Vertical axis: sleep stage (Wake, REM, S1, S2, SWS); horizontal axis:
time (h), 0 to 8.
1.3 Polysomnography – standard for sleep assessment

In clinical practice, overnight polysomnography (PSG) is currently regarded as the “gold
standard” for objective assessment of sleep architecture/pattern and occurrence of
sleep-related disorders such as insomnia, parasomnia, sleep-disordered breathing (apnea and
hypopnea), and REM sleep behavior disorder [136, 168, 247]. It is usually recorded in a sleep
laboratory (Figure 1.2). PSG comprises multi-channel biological signals including EEG, EMG,
EOG, electrocardiography (ECG), airflow, respiratory effort (chest and abdomen), blood oxygen
saturation (SaO2), etc. Figure 1.3 shows an example of a PSG recording (20 min) of a healthy
adult. According to the R&K rules [247] or the AASM guidelines [136] (or the revised version
[39]), overnight sleep stages are typically scored by trained sleep technicians on continuous
30-s epochs by visually inspecting the EEG, EMG, and EOG channels in PSG, forming a hypnogram
throughout the entire night.
1.4 Automatic sleep monitoring

1.4.1 PSG-based sleep stage classification
As stated, objective assessment of sleep is often based on analyzing overnight sleep
architecture, so identifying sleep stages is required. Since 1968, manual scoring of PSG
recordings has been the gold standard in clinical environments for identifying sleep stages
[247], and these rules were in use for more than 40 years. In the 21st century, the AASM
guidelines [136] and their updated version [39] were released in 2007 and 2012, respectively,
which can yield an increased inter-rater agreement when scoring sleep stages. However, visual
scoring is very tedious and time consuming. This has resulted in a large number of studies
(since the 1970s) investigating computer-assisted automatic sleep staging systems with PSG
channels including EEG, EMG, and/or EOG [40, 104, 114, 115, 173, 223, 300], where reliable
classification results have been achieved. Further, some validated computer-assisted sleep
scoring systems have been applied in clinical routine, such as the Somnolyzer developed by the
SIESTA group [19, 20].

Figure 1.2: A sleep laboratory where a subject was being monitored with PSG and a sleep technician
was visually inspecting the recorded PSG (adapted from source: www.newscenter.philips.com).
1.4.2 Cardiorespiratory-based sleep stage classification
Although PSG is the gold standard and common practice for objective sleep assessment and
sleep stage classification based on PSG can be automated, it has several disadvantages from
a healthcare perspective. For example, it requires costly equipment in a sleep
laboratory, it disrupts ‘normal’ sleep, and it is unsuitable for long-term sleep monitoring in
a home environment. This has motivated the investigation of signals and sensors that allow
for reliable physiological measurements during sleep in an unobtrusive and convenient manner.
Figure 1.4 shows an obtrusive (with PSG) and an expected unobtrusive scenario for sleep
monitoring. In this context, alternatives such as body movements and cardiorespiratory activity
have attracted more and more attention in the past years, mainly because they can be easily
acquired using less-obtrusive or even non-contact sensors with minimal discomfort to subjects,
helped by the fast development of wearable/off-body unobtrusive sensing techniques.

Figure 1.3: An example of a continuous PSG recording (20 min) with multiple channels of bio-signals
from a healthy adult. The channels from top to bottom are: hypnogram, EEG (Fp1-A2), EEG (C3-A2),
EEG (O1-A2), EEG (Fp2-A1), EEG (C4-A1), EEG (M2-M1), EOG (P8-A1 left), EOG (P18-A1 right),
EMG (mental), EMG (leg), ECG (chest), airflow, respiratory effort (chest wall movements), respiratory
effort (abdominal wall movements), and SaO2.
Body movements can be measured with several methods. For example, actigraphy is a
well-known less-obtrusive way of measuring body movements using an accelerometer,
typically worn on the wrist. It has been extensively studied [18, 74, 126, 204, 234] and
is regarded as a standard method for sleep assessment when PSG is not available [204]. There
are many commercial actigraphy-based products for monitoring sleep. For example, Philips
developed the Actiwatch [229], a clinically validated device that measures activity counts
during sleep. Recently, Fitbit [105] and Jawbone [143] also released wearable products
that can quantify body movements for sleep monitoring. Some studies proposed using an
‘off-the-shelf’ smartphone placed in the bed or close to the pillow to capture body movements
during sleep, and obtained satisfactory results in computing some sleep statistics
compared with actigraphy [33, 208]. In contrast to PSG, actigraphy can only be used to identify
sleep and wake periods rather than different sleep stages. This is because it only measures
physical movements of the body, which reflect limited internal physiological information
[33, 258]. Researchers have argued that, even for distinguishing between sleep and wake states,
actigraphy is still error-prone when compared with PSG [33, 295, 310]. For example, it cannot
avoid misidentifying ‘quiet wake’ periods with low or no body activity, leading to a low
accuracy in detecting wakefulness [18, 87, 234], in particular for subjects with insomnia, jet
lag, or shift work [175, 220]. To obtain a better performance in identifying sleep/wake and to
achieve classification of multiple sleep stages, additional physiological information is
required. Figure 1.5 compares the overnight sleep stages with the corresponding actigraphy
measured by a Philips Actiwatch from a healthy subject [see Figure 1.5(a) and Figure 1.5(b)].
It indicates that the activity count in actigraphy is only correlated to sleep and wake states
rather than to different sleep stages. Using actigraphy alone is therefore inadequate for
classifying multiple sleep stages.

Figure 1.4: Scenario of (a) an obtrusive (with PSG) and (b) a conceptual unobtrusive sleep monitoring
(adapted from source: www.newscenter.philips.com).
Cardiorespiratory activity differs across sleep stages due to the substantial
differences in manifestation or regulation of the autonomic nervous system (ANS), including
sympathetic activity and parasympathetic (or vagal) activity [13, 226, 267, 281, 292]. These
two branches mostly have ‘opposite’ actions, where one activates a physiological response while
the other suppresses it [231]. In regard to cardiac activity, for example, heart rate (HR) and
the standard deviation of normal-to-normal heartbeat/interbeat intervals (SDNN) are associated
with sympathetic activity, the spectral power in the high-frequency band between 0.15 and
0.4 Hz is a marker of parasympathetic nervous modulation activated by respiratory-stimulated
stretch receptors, and the spectral power in the low-frequency band between 0.04 and 0.15 Hz
is assumed to indicate sympathetic tone [12, 24, 265, 288]. All these non-invasively measured
characteristics have been experimentally shown to differ across sleep stages. In addition,
respiratory dynamics, such as respiratory frequency (breathing rate, BR) [95], respiratory
variability [256], and respiratory regularity [67, 129], have also been shown to vary over
sleep stages. This means that cardiac and respiratory activity can in turn be used to separate
sleep stages, which is of significant clinical relevance. As displayed in Figure 1.5, the
hypnogram with full sleep stages seems more correlated to the variations of BR and HR than to
actigraphy. Furthermore, the coupling or interaction between cardiac and respiratory signals
has also been demonstrated to change over sleep stages in previous work [29, 30, 41]. For
example, SWS corresponds to an enhanced phase synchronization between heartbeats and
respiration [30].

Figure 1.5: Comparison between (a) a hypnogram, (b) actigraphy (activity count) measured by a
Philips Actiwatch, (c) standard deviation of normal-to-normal heartbeat intervals (SDNN), and
(d) standard deviation of breathing rates (SDBR) from a healthy adult. Horizontal axis: time in
epochs (30 s), 0 to 800; panels (b)-(d) are in arbitrary units (a.u.).
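
As a minimal sketch of two of the cardiac measures mentioned above, the following Python code computes the mean HR and the SDNN for each 30-s epoch from a list of heartbeat times. The function name, the rule of assigning each interval to the epoch of its closing beat, and the use of the sample standard deviation are assumptions made for illustration only; they are not taken from this thesis.

```python
import numpy as np

def epoch_hr_and_sdnn(beat_times_s, epoch_len_s=30.0):
    """Return (mean HR in bpm, SDNN in s) per epoch; NaN if an epoch has < 2 intervals.

    Assumption: each interbeat interval is assigned to the epoch that
    contains its closing beat.
    """
    beat_times_s = np.asarray(beat_times_s, dtype=float)
    nn = np.diff(beat_times_s)      # interbeat intervals (s)
    nn_times = beat_times_s[1:]     # time stamp of each interval
    n_epochs = int(np.ceil(beat_times_s[-1] / epoch_len_s))
    hr = np.full(n_epochs, np.nan)
    sdnn = np.full(n_epochs, np.nan)
    for k in range(n_epochs):
        in_epoch = (nn_times >= k * epoch_len_s) & (nn_times < (k + 1) * epoch_len_s)
        if in_epoch.sum() >= 2:
            hr[k] = 60.0 / nn[in_epoch].mean()      # beats per minute
            sdnn[k] = nn[in_epoch].std(ddof=1)      # sample standard deviation
    return hr, sdnn
```

For a perfectly regular heartbeat of one beat per second, every epoch yields an HR of 60 bpm and an SDNN of zero; any variation between stages then shows up as changes in these per-epoch values.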
In the past decade, researchers have been dedicated to exploring new sensors or approaches to
acquire cardiac and/or respiratory signals, which can eventually be applied for sleep analysis.
Instead of the traditional Holter system, wearable textile electrodes were developed for
recording ECG signals [93, 176, 199, 221, 316]. Bar et al. [28] proposed a WatchPAT ambulatory
system to obtain a peripheral arterial tone (PAT) signal from which HR or heartbeat intervals
can be derived. More recently, photoplethysmography (PPG) has become a more widely used
approach, in which a sensor placed at the skin surface detects blood volume changes in the
microvascular bed of tissue [16]. From PPG, HR and respiration can be reliably estimated
[174, 318]. Several PPG-based watches are available on the market, including Adidas miCoach
[2], Mio Alpha [201], TomTom Runner Cardio [291], etc. Ballistocardiography (BCG), collected
with piezo-electric sensors for example, has also received growing recognition since it can
be acquired non-invasively and it contains physiological activity of HR and even respiration
[7, 189]. It has been increasingly employed to monitor sleep, integrated in a mattress [218],
load cells [68], (underneath) a pillow [66], or a bed [161, 200]. Furthermore, a textile
bedsheet with a pressure-sensitive sensor array was designed to estimate respiration and body
posture during bedtime sleep [141, 264]. In addition, video-based [128, 232] and audio-based
[52, 228] approaches were applied to measure cardiac or respiratory activity. These signals can
also be obtained with an off-body microwave Doppler radar or radio-frequency sensor [85, 319].
All these approaches can potentially be used for unobtrusive sleep monitoring. Table 1.1
summarizes some unobtrusive or less-obtrusive approaches for cardiac and/or respiratory
measurement.

Table 1.1: Summary of some unobtrusive or less-obtrusive approaches for measuring cardiac
and/or respiratory activity

Approach           Activity    Placement                            References (examples)
Textile electrode  ECG/RE      On-body patch or T-shirt             [93, 176, 199, 221, 316]
PAT                HR          Wrist                                [28]
PPG                HR and RE   Wrist                                [174, 318]
BCG                HR/RE       Mattress, pillow, load cells, or bed [66, 68, 161, 200, 218]
PSSA               RE          Bedsheet                             [141, 264]
Web-camera         HR          In front of face                     [232]
Infrared-camera    RE          Bedside table                        [128]
Microphone         RE          Off-body                             [52, 228]
Radar              RE          Off-body                             [85, 319]

PAT: peripheral arterial tone; PPG: photoplethysmography; BCG: ballistocardiography; PSSA:
pressure-sensitive sensor array; HR: heart rate (pulse rate for PAT and PPG); RE: respiration.
Automatic classification of sleep stages using body movements, cardiac activity [or more
specifically, heart rate variability (HRV)], and/or respiratory activity has been intensively
researched to date, motivated by the regulatory autonomic fluctuations that occur over
various sleep stages. With actigraphy used to quantify body movements, the studies were
mostly focused on detecting sleep/wake states [74, 126, 259]. Combining body movements
with cardiorespiratory activity can result in a superior sleep/wake performance [89, 150]. By
means of cardiac and/or respiratory signals, numerous papers were published aiming at different
classification tasks, such as the classification between sleep and wake [145, 151], between
wake, REM sleep, and NREM sleep [94, 161, 248, 249, 303], between wake, REM sleep, light
sleep, and SWS [138, 309], between REM sleep and NREM sleep [69, 197], between light sleep
and SWS [51], between SWS and all the other sleep stages [68, 273], and even between full
sleep stages (wake, REM sleep, S1 sleep, S2 sleep, and SWS) [167, 214, 315]. Note that some
of those studies executed several different sleep stage classification tasks and some also made
use of information about body movements. The general framework of sleep stage classification
is illustrated in Figure 1.6.

Figure 1.6: A general framework of sleep stage classification, in which the present thesis is devoted to
feature extraction and feature post-processing. Blocks: data acquisition, signal preprocessing,
feature extraction, feature post-processing, feature selection, sleep stage classification.
1.5 Research question and objective

The main research question addressed in the present thesis is: can overnight sleep stages be
classified reliably with body movements, cardiac activity, and/or respiratory activity?
To discriminate between sleep stages, many existing features describing
certain physiological characteristics of cardiorespiratory activity have been presented (see,
e.g., [32, 59, 197, 249, 289, 315]). However, in comparison with PSG scoring, the sleep staging
performance obtained using those features remains low, suggesting a strong need for further
improvement to obtain more reliable results. This motivated us to exploit additional information
in autonomic activity characterized by sleep stages that is complementary to the existing
features. On the other hand, variability between and within subjects conveyed by the signals
is a considerable barrier to achieving a reliable performance in sleep stage classification.
The between-subject variability (individual difference) that influences the cardiorespiratory
activity can relate to subject demographics (including body size) such as age, gender, and body
mass index, and internal physiology such as response of autonomic regulation and metabolic
function. Additionally, some other factors (e.g., measuring noise, body movements, conscious
breathing control, sleep environment, daytime activity, and stressful events) varying within
subjects or from subject to subject may also affect the nighttime cardiorespiratory activity.
Therefore, there is a need either to extract features that are themselves not or less affected
by the between-/within-subject variability, or to (post-)process the features in order to
mitigate those variations appearing in the signals.
The general objective of this thesis is to improve the performance and reliability of
classifying sleep stages based on the above-mentioned signals. As indicated in Figure 1.6, this
thesis focuses on (1) extracting new features that contain cardiorespiratory characteristics in
addition to the existing features and/or are robust to the variability between or within
subjects, and (2) reducing the variability in cardiorespiratory signals through feature
post-processing. Note that this thesis focuses mainly on healthy subjects, whereas
patients with disordered sleep are out of our scope.
1.6 Outline of the thesis

The thesis comprises three parts. Part I introduces novel informative features for sleep
stage classification by analyzing the cardiac and respiratory signals in different aspects of
physiology. These new features include HRV spectral powers with adaptive boundaries using
HRV-derived respiratory frequency (Chapter 2), respiratory self-similarity in signal morphology
quantified by means of dynamic warping (Chapter 3) and uniform scaling (Chapter 5), measures
expressing certain properties of breathing depth and volume (Chapter 4), and interaction
between heartbeats and respiration translated in a two-dimensional visibility graph (Chapter 6).
The detailed methods and results of the new features are provided in the corresponding chapters,
where they were originally designed to improve the performance for different classification
tasks. In other words, each of these features is expected to be particularly suited to
identifying certain sleep stages. Taking the respiratory self-similarity features as examples,
the dynamic warping distance aims at detecting wakefulness out of sleep, while the uniform
scaling measure can help separate SWS from the other sleep stages.
Part II presents findings on the timing between autonomic and brain activity
and examines its usefulness in sleep stage classification. Chapter 7 discusses the time delay
phenomenon between changes in cardiac and brain activity for different hierarchies of sleep
stage transitions. In Chapter 8, we apply these findings to help predict SWS periods using
early cardiorespiratory activity that precedes the transitions between SWS and the other sleep
stages by a few minutes.
Part III is devoted to understanding the challenges that between-/within-subject
variability poses for separating sleep stages. Chapter 9 systematically quantifies and
assesses the effects on cardiorespiratory activity caused by differences in, for example,
subject demographics, internal physiology, and time of the night. In order to overcome these
challenges to some extent, in Chapter 10 we propose to normalize and smooth features for
each night’s recording, aiming at diminishing the between-subject and within-subject
variability, respectively. The classification results using an extended feature set (including
the existing features presented in literature and the new features in this thesis) with feature
post-processing are reported in this chapter.
The last chapter (Chapter 11) generally discusses the work presented in this thesis and
answers the main research question raised before. Additionally, future work that would be
interesting and promising for sleep stage classification is suggested.
Part I: Signal Analysis for Sleep Stage Classification
CHAPTER 2
Spectral boundary adaptation on heart rate variability for sleep and wake classification
This chapter is adapted from: X. Long, P. Fonseca, R. Haakma, R. M. Aarts and J. Foussier. Spectral
boundary adaptation on heart rate variability for sleep and wake classification. International Journal on
Artificial Intelligence Tools, 23(3):1460002, 2014. © World Scientific Publishing
Abstract – A method of adapting the boundaries when extracting the spectral features from
heart rate variability (HRV) for sleep and wake classification is described. HRV series can be
derived from electrocardiogram (ECG) signals obtained from single-night polysomnography
(PSG) recordings. Conventionally, the HRV spectral features are extracted from the spectrum
of an HRV series with fixed boundaries specifying bands of very low frequency (VLF), low
frequency (LF), and high frequency (HF). However, because they are fixed, they may fail to
accurately reflect certain aspects of autonomic nervous activity which in turn may limit their
discriminative power, e.g. in sleep and wake classification. This is in part related to the fact that
the sympathetic tone (partially reflected in the LF band) and the respiratory activity (modulated
in the HF band) vary over time. In order to minimize the impact of these variations, we adapt
the HRV spectral boundaries using time-frequency analysis. Experiments were conducted on
a data set acquired from two groups of 15 healthy subjects and 15 subjects with insomnia.
Results show that adapting the HRV spectral features significantly increased their
discriminative power when classifying sleep and wake. Additionally, this method also provided
a significant improvement of the overall classification performance when used in combination
with other HRV non-spectral features. Furthermore, classification performed better when
actigraphy was combined with the HRV features than when actigraphy was used alone.
2.1 Introduction
Sleep plays an important role in human health. Night-time polysomnography (PSG) recordings,
along with manually scored hypnograms, are considered the “gold standard” for objectively
analyzing sleep architecture and occurrence of sleep-related problems [247, 248]. PSG
recordings are typically acquired and analyzed in sleep laboratories, and are usually split into
non-overlapping time intervals (or epochs) of 30 s according to the Rechtschaffen & Kales
(R&K) rules [247].
As shown in literature, monitoring heart rate variability (HRV) during bedtime is helpful
in sleep staging [89, 248], particularly to distinguish between rapid-eye-movement (REM) and
non rapid-eye-movement (NREM) [59, 197]. It reflects the variation, over time, of the period
between consecutive heart beats. HRV is derived from the length variations of RR-intervals,
i.e. time intervals between consecutive R-peaks of the QRS complex in the electrocardiogram
(ECG). Spectral analysis of HRV has been widely employed in the assessment of autonomic
nervous activity during bedtime [59, 197, 299]. It traditionally involves the computation of
the power spectral density (PSD) of an HRV series. An HRV spectrum is typically divided
into three bands, namely a very low frequency (VLF) band from 0.003 to 0.04 Hz, a low
frequency (LF) band from 0.04 to 0.15 Hz, and a high frequency (HF) band from 0.15
to 0.4 Hz [190, 288]. These bands are then used to compute certain properties such as
the spectral power of the VLF, LF, and HF components and the power ratio of low-to-high
frequency (LF/HF) components [59, 202, 265]. In general, it has been found that the VLF
spectral power is associated with long-term regulatory mechanisms, the LF spectral power is a
marker of sympathetic modulation of the heart and it also reflects some parasympathetic activity
when the respiratory frequency components partially fall into the LF band, the HF spectral
power is related to parasympathetic activity mainly caused by respiratory sinus arrhythmia
(RSA), and the LF/HF ratio is an indication of sympathetic-parasympathetic balance [265, 275,
288]. In particular, the HRV spectrum usually contains a peak centered around the respiration
frequency, located in the HF band, and another peak in the LF band which reflects, to a certain
degree, sympathetic activation [13, 190, 219].
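
The fixed-band properties described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PSD is assumed to be available as sampled values on a frequency grid, the band power is approximated by trapezoidal integration, and the names `band_power` and `hrv_spectral_features` are hypothetical rather than taken from this thesis.

```python
import numpy as np

# Traditional fixed band limits in Hz, as given in the text.
FIXED_BANDS_HZ = {"VLF": (0.003, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.4)}

def band_power(freqs_hz, psd, lo_hz, hi_hz):
    """Approximate the spectral power in [lo_hz, hi_hz) by trapezoidal integration."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    psd = np.asarray(psd, dtype=float)
    mask = (freqs_hz >= lo_hz) & (freqs_hz < hi_hz)
    f, p = freqs_hz[mask], psd[mask]
    if f.size < 2:
        return 0.0
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(f)))

def hrv_spectral_features(freqs_hz, psd, bands=FIXED_BANDS_HZ):
    """Return the power per band plus the LF/HF ratio (NaN if the HF power is zero)."""
    powers = {name: band_power(freqs_hz, psd, lo, hi) for name, (lo, hi) in bands.items()}
    powers["LF/HF"] = powers["LF"] / powers["HF"] if powers["HF"] > 0 else float("nan")
    return powers
```

Because the band limits are passed in as a dictionary, the same functions can be reused with any other set of limits, including limits that change from epoch to epoch.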
The parameters derived from HRV PSD are often used as “features” in automatic sleep
staging [248] or sleep and wake classification systems [89]. Previous work has used HRV
spectral features with fixed boundaries for sleep and wake classification [89]. This classifier
exploits the fact that sympathetic tone and the respiratory activity are modulated in different
frequency bands of the HRV spectrum and exhibit different properties during sleep and wake,
allowing them to be distinguished.
It is known that the HRV spectrum and the dominant (or peak) frequencies of the LF and
HF bands are not constant but rather vary over time according to the autonomic modulations
of the heart beats [288]. Hence, when fixed band boundaries are used to compute HRV
spectral features, the estimates of cardiac autonomic activity may be inaccurate. Since
the discrimination of sleep states (or sleep and wake in our case) depends on these estimates,
the classification accuracy will be affected. To avoid this issue, we use a feature adaptation
method when estimating the HRV features.
Figure 2.1: An example of the mean HRV PSD with standard errors for sleep and wake states over an
entire-night’s recording of a subject. Horizontal axis: frequency (Hz), 0 to 0.5; vertical axis:
normalized PSD (ms2/Hz, ×10−3); curves: wake and sleep (mean and standard errors).
Figure 2.2: An example of the normalized HRV PSD versus time (30-s epoch) over an entire-night’s
recording of a subject.
The problem of boundary adaptation has been analyzed before in other areas such as stress
detection [25, 117] and anesthesia analysis [270]. It has been suggested that the LF and HF
boundaries are related to the peak frequency in the traditional LF band, called “LF peak
frequency”, and the peak frequency in the traditional HF band, called “HF peak frequency”,
respectively [117, 270]. In practice, these two peak frequencies can be estimated by determining
the frequency of the local maximum in the band between 0.003 and 0.15 Hz (i.e. the traditional
VLF band and LF band) and in the band from 0.15 to 0.4 Hz (i.e. the traditional HF band),
respectively. The working assumption here is that the peaks always fall within those two bands.
By centering the new bands around these peaks instead of using fixed boundaries, we can
compensate for their time-varying behavior. This should help, to some extent, reduce within-
and between-subject variability in the way these features express sympathetic activation and
respiratory activity, ultimately helping improve sleep and wake classification. Figure 2.1 shows
an example of the mean HRV PSDs with standard errors [standard deviations (SD)] for sleep and
wake states of a subject. It can be observed that, although their standard errors overlap, their
mean values differ in some frequency ranges. This should provide an opportunity
to discriminate between sleep and wake states. Figure 2.2 illustrates the time variation of the
HRV PSD for a subject.

Figure 2.3: Block diagram of the feature adaptation method used for sleep and wake classification.
Blocks: data acquisition, (HRV) PSD estimation, spectrum information (LF and HF peaks), boundary
adaptation, feature extraction, feature evaluation, classification (training/testing), classifier
evaluation.
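
The peak-frequency estimation described above can be sketched as follows. This is a simplified illustration, not the implementation used in this chapter: it takes the frequency of the largest PSD value inside each search band (which may lie on the band edge if the band contains no true local maximum), and the function names are hypothetical.

```python
import numpy as np

def peak_frequency(freqs_hz, psd, lo_hz, hi_hz):
    """Return the frequency of the largest PSD value within [lo_hz, hi_hz)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    psd = np.asarray(psd, dtype=float)
    mask = (freqs_hz >= lo_hz) & (freqs_hz < hi_hz)
    if not mask.any():
        return float("nan")
    return float(freqs_hz[mask][np.argmax(psd[mask])])

def lf_hf_peaks(freqs_hz, psd):
    """Search bands follow the text: 0.003-0.15 Hz for LF and 0.15-0.4 Hz for HF."""
    lf_peak = peak_frequency(freqs_hz, psd, 0.003, 0.15)
    hf_peak = peak_frequency(freqs_hz, psd, 0.15, 0.4)
    return lf_peak, hf_peak
```

Applied to the PSD of each 30-s epoch in turn, this yields two time series of peak frequencies around which the new bands can be centered.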
2.2 Materials and methods

The proposed boundary adaptation method applied on HRV spectral features used for sleep and
wake classification is described by a block diagram in Figure 2.3. Each block will be explained
further in the following subsections.
2.2.1 Data acquisition
In total, the data acquired from 30 subjects were used in our experiment. Fifteen subjects
belonged to the healthy group and fifteen subjects were insomniacs. The insomniacs were randomly
selected from a larger group so that both groups had the same number of subjects, allowing
a balanced comparison of the classification performance between the healthy and insomnia
groups. A subject was considered healthy if his/her Pittsburgh Sleep Quality Index (PSQI) [60]
was less than 6, while a subject was considered an insomniac based on his/her self-report. For
each subject, a full PSG was recorded according to the guidelines of the American Academy of
Sleep Medicine (AASM) [136]. Among the 30 subjects, the PSG recordings of fifteen insomniacs
and nine healthy subjects were recorded in the Sleep Health Center, Boston, USA during 2009
(Alice 5 PSG, Philips Respironics) and those of the remaining six healthy subjects in the
Philips Experience Lab, Eindhoven, The Netherlands during 2010 (Vitaport 3 PSG, TEMEC). The
ECG was recorded with a modified V2 Lead, sampled at 500 Hz (Boston data) and 256 Hz
(Eindhoven data).

Table 2.1: Summary of subject demographics

Group               Parameter             Mean ± SD
Healthy (N = 15)    Sex                   5 males and 10 females
                    Age (y)               31.0 ± 10.4
                    BMI (kg/m2)           24.4 ± 3.3
                    Sleep efficiency (%)  92.3 ± 3.8
Insomnia (N = 15)   Sex                   8 males and 7 females
                    Age (y)               47.4 ± 14.5
                    BMI (kg/m2)           27.7 ± 4.5
                    Sleep efficiency (%)  69.7 ± 14.7
Sleep stages were manually scored on 30-s epochs by sleep experts according to the AASM
guidelines as wake, REM sleep, and each of the NREM sleep stages (N1-N3). For sleep and
wake classification, we considered two classes: wake and sleep (including REM and NREM
sleep). Each PSG recording was manually clipped to the time interval from the
instant when the subject turned the lights off with the intention of sleeping until the moment
the lights were turned on before the subject got out of bed in the morning. The study protocol
was approved by the ethics committees of both centers and all subjects signed an informed
consent form. The subject demographics including sex, age, body mass index (BMI), and sleep
efficiency are summarized in Table 2.1.
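
As a small illustration of the two-class labeling described above, the sketch below collapses per-epoch stage labels into sleep and wake and derives a sleep efficiency value. The label strings ("W", "N1", "N2", "N3", "REM") and the function names are assumptions, and sleep efficiency is computed with the common definition (percentage of epochs scored as sleep within the clipped recording), which this section does not spell out.

```python
SLEEP_STAGES = {"N1", "N2", "N3", "REM"}  # assumed label strings for the sleep stages

def to_sleep_wake(stages):
    """Map per-epoch stage labels to the two classes 'wake' and 'sleep'."""
    labels = []
    for stage in stages:
        if stage == "W":
            labels.append("wake")
        elif stage in SLEEP_STAGES:
            labels.append("sleep")
        else:
            raise ValueError(f"unknown stage label: {stage!r}")
    return labels

def sleep_efficiency(stages):
    """Percentage of epochs scored as sleep between lights off and lights on."""
    labels = to_sleep_wake(stages)
    return 100.0 * labels.count("sleep") / len(labels)
```

For example, a clipped recording with six sleep epochs out of eight gives a sleep efficiency of 75%.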
2.2.2 PSD estimation
To estimate the PSDs of HRV, RR-intervals were first computed from the ECG signals. In our
study, the following steps were performed to obtain an RR-interval series: (1) a peak detector
based on the Afonso-Tompkins filter-bank algorithm [4] was used to locate the R peaks,
yielding an RR-interval series; (2) very short (less than 0.3 s) and very long (more than 2 s)
RR intervals (usually caused by ectopic heart beats, misidentification of R peaks, or badly
attached electrodes during measurement) were removed; (3) the RR-interval series was normalized
by dividing it by its mean value; (4) the resulting series was “re-sampled” at 4 Hz using
linear interpolation; and (5) the PSD was finally estimated with an autoregressive model whose
order was determined automatically using Akaike’s information criterion (AIC) [43].
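
Steps (2) to (4) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: R-peak times are assumed to be already available in seconds, each RR interval is time-stamped at its closing R peak, and the function name `preprocess_rr` is hypothetical. R-peak detection and the autoregressive PSD estimation of steps (1) and (5) are not covered.

```python
import numpy as np

def preprocess_rr(r_peak_times_s, fs_resample_hz=4.0, min_rr_s=0.3, max_rr_s=2.0):
    """Clean, normalize, and evenly resample an RR-interval series.

    Returns (t_uniform, rr_uniform), where rr_uniform is dimensionless
    because the series is divided by its mean.
    """
    r = np.asarray(r_peak_times_s, dtype=float)
    rr = np.diff(r)                      # RR intervals (s)
    rr_times = r[1:]                     # assumption: stamp at the closing R peak
    keep = (rr >= min_rr_s) & (rr <= max_rr_s)   # step (2): drop < 0.3 s and > 2 s
    rr, rr_times = rr[keep], rr_times[keep]
    rr_norm = rr / rr.mean()             # step (3): divide by the mean value
    # Step (4): linear interpolation onto a uniform 4 Hz time grid.
    t_uniform = np.arange(rr_times[0], rr_times[-1], 1.0 / fs_resample_hz)
    rr_uniform = np.interp(t_uniform, rr_times, rr_norm)
    return t_uniform, rr_uniform
```

The evenly sampled output is what a parametric or FFT-based spectral estimator would then take as input.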
2.2.3 Boundary adaptation
As explained in Section 2.1, the use of fixed boundaries in the HRV spectrum may not be
appropriate to accurately represent different states of the autonomic nervous system and,
consequently, to classify sleep and wake. The respiratory frequency, and therefore the
corresponding peak in the HF band, varies in time. Likewise, the peak corresponding to the
sympathetic tone in the LF band also varies, reflecting differences in the autonomic activation
during sleep. By applying a time-frequency analysis, the boundaries that define each band can
be dynamically adapted so that the frequency components are more correctly assigned to the
corresponding bands. To do this, it is required to estimate the LF and HF peak frequencies,
which change over time. Figure 2.4 illustrates, with a filled contour plot, an example of the
HRV spectrum over time for a subject together with the traditional fixed frequency bands. As
can be seen, the dominant LF and HF peak frequencies vary over time. Moreover, it can be
observed that, for some epochs, the spectral power of a frequency band spills over into its
neighboring bands when using the fixed boundaries. For instance, for the epochs from 140 to
150, the spectral power of the LF band also partially falls into the HF band (see Figure 2.4).
By adapting the boundaries of the LF and HF bands for each epoch, we can overcome the
issues mentioned above. This can be achieved in the following way.
• The new HF band (HF∗) is centered on the HF peak frequency [25, 117] and has a constant
bandwidth of 0.1 Hz [153]. This bandwidth was chosen after analyzing the HRV
PSDs of all 15 healthy subjects and empirically determining that most of the spectral
power related to RSA lies within a bandwidth of 0.1 Hz. A larger bandwidth (0.25 Hz)
was empirically used in other work [25, 142], but we found that on some occasions it
overlapped its adjacent LF band.
• The new LF band (LF∗ ) is centered on the dominant frequency found in the traditional
LF band, and has a bandwidth of 0.11 Hz that is similar to the traditional definition.
• The new VLF band (VLF∗ ) is defined from its traditional lower limit of 0.003 Hz up to
the lower limit of the LF band.
Figure 2.5 illustrates the adapted boundaries for the same HRV PSD shown in Figure 2.4. We
note that the LF∗ and HF∗ bands overlap in some epochs. This occurs when the LF and HF
peaks are too close to each other or when there is no HF peak (often during REM sleep [36]).
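The band definitions listed above can be written down as a small function. This is only a sketch with hypothetical names; in particular, clamping the lower LF∗ limit to 0.003 Hz (so that the VLF∗ band never has a negative width) is an assumption that the text does not address.

```python
def adaptive_bands(lf_peak, hf_peak, lf_bw=0.11, hf_bw=0.10, vlf_low=0.003):
    """Return the (low, high) limits in Hz of the VLF*, LF* and HF* bands of one epoch."""
    # HF*: centered on the HF peak frequency, constant bandwidth of 0.1 Hz
    hf = (hf_peak - hf_bw / 2.0, hf_peak + hf_bw / 2.0)
    # LF*: centered on the dominant LF frequency, bandwidth of 0.11 Hz
    lf_low = max(lf_peak - lf_bw / 2.0, vlf_low)   # assumption: clamp at the VLF lower limit
    lf = (lf_low, lf_peak + lf_bw / 2.0)
    # VLF*: from the traditional lower limit up to the lower limit of LF*
    vlf = (vlf_low, lf_low)
    return {"vlf": vlf, "lf": lf, "hf": hf}
```

For example, `adaptive_bands(0.12, 0.18)` gives an LF∗ band that ends at 0.175 Hz and an HF∗ band that starts at 0.13 Hz, which reproduces the overlap mentioned above for peaks that lie close together.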
Figure 2.4: HRV spectrum versus time (30-s epoch) of a subject. The fixed boundaries of the VLF, HF,
and LF bands are plotted in solid lines and the corresponding bands are indicated. [Filled contour plot;
horizontal axis: time (30-s epoch); vertical axis: frequency (Hz), 0 to 0.5.]
Figure 2.5: HRV spectrum versus time (30-s epoch) of a subject. The limits of the new HF∗ and LF∗
bands are plotted in dotted and solid curves, respectively. The lower boundary of the new VLF∗ band (at
0.003 Hz) is plotted as a dashed line. [Filled contour plot; horizontal axis: time (30-s epoch); vertical
axis: frequency (Hz), 0 to 0.5.]
2.2.4 Feature extraction
2.2.4.1 HRV spectral features
After determining the bands we can extract HRV-related features for sleep and wake classification.
In our study we computed the logarithm of the spectral power in the VLF∗, LF∗, and HF∗
bands (from here on expressed as hrv vlf, hrv lf, and hrv hf) and, in addition, the ratio
between the spectral powers of the LF∗ and the HF∗ bands (expressed as hrv lf/hf). Before
computing the logarithm, the power of each band was first normalized. This was achieved by
dividing the power in the VLF∗, LF∗, and HF∗ bands by the total spectrum power [202, 298].
Alternatively we could have normalized it by dividing the power in each band by the total
spectrum power minus the power in the VLF∗ band [59, 299]. Since we did not observe any
significant differences in the final result, the first method was used.
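The feature computation described above can be sketched as follows. The function and the dictionary keys are hypothetical, the PSD is assumed to be sampled on a uniform frequency grid (so that the bin width cancels in the normalization), the natural logarithm is assumed, and the LF/HF ratio is assumed to be used without a logarithm.

```python
import numpy as np

def hrv_spectral_features(freqs, psd, bands):
    """Log of normalized band powers and the LF*/HF* ratio for one epoch.

    bands maps "vlf", "lf" and "hf" to (low, high) limits in Hz.
    """
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    total = psd.sum()                        # total spectrum power (up to a constant)

    def rel_power(limits):
        lo, hi = limits
        mask = (freqs >= lo) & (freqs < hi)
        return psd[mask].sum() / total       # power normalized by the total power

    p_vlf = rel_power(bands["vlf"])
    p_lf = rel_power(bands["lf"])
    p_hf = rel_power(bands["hf"])
    return {
        "hrv_vlf": np.log(p_vlf),
        "hrv_lf": np.log(p_lf),
        "hrv_hf": np.log(p_hf),
        "hrv_lf_hf": p_lf / p_hf,
    }
```

The alternative normalization mentioned above would only change `total` to the total power minus the power in the VLF∗ band.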
2.2.4.2 Spectrum information

As mentioned, the adaptation of the new boundaries requires knowledge of information derived
from the power spectrum, namely the LF and HF peak frequencies, which must be obtained
before extracting the features. The LF peak frequency can be estimated by detecting the location
of the peak in the HRV spectral range from 0.003 to 0.15 Hz. The HF peak frequency can be
estimated from a respiratory effort signal simultaneously recorded with the PSG data or it can
be derived from the HRV series directly by searching for the peak in the range between 0.15 and
0.4 Hz. In this study, to avoid using an additional sensor modality, we used the latter approach.
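A minimal sketch of this peak search is given below. The function name is hypothetical, and taking the maximum of the PSD within the search range is an assumption about how "the peak" is located; a maximum at the edge of the range is not necessarily a true spectral peak.

```python
import numpy as np

def peak_frequency(freqs, psd, f_lo, f_hi):
    """Frequency (Hz) of the largest PSD value within [f_lo, f_hi]."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(freqs[mask][np.argmax(psd[mask])])

# Search ranges as stated in the text
LF_RANGE = (0.003, 0.15)
HF_RANGE = (0.15, 0.4)
```

For one epoch, the LF and HF peak frequencies would then be obtained as `peak_frequency(freqs, psd, *LF_RANGE)` and `peak_frequency(freqs, psd, *HF_RANGE)`.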
2.2.5 Feature evaluation
A Hellinger distance metric [130] was employed to evaluate the discriminative power (i.e. class
separability) of the HRV spectral features between sleep and wake. It is estimated by computing
the amount of overlap between two probability density estimates in a binary class problem,
expressed as
$$D_H(p, q) = \sqrt{1 - \sum_{x} \sqrt{p(x)\,q(x)}} \tag{2.1}$$
where p(x) and q(x) are the probability density estimates of the feature values given class sleep
and wake, respectively. In its most basic form, these density estimates can be computed by
means of a normalized histogram with either a fixed number of bins or a specific bin size. In
our study the histograms were computed with a fixed number of 100 bins. A larger Hellinger
distance reflects a higher discriminative power in separating the two classes.
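As an illustration, the Hellinger distance between the feature values of two classes can be estimated from normalized histograms with 100 bins. The sketch below uses a hypothetical function name and assumes that both histograms share the same bin edges, spanning the pooled range of the two samples.

```python
import numpy as np

def hellinger_distance(values_a, values_b, n_bins=100):
    """Hellinger distance between two samples, estimated from normalized histograms."""
    a = np.asarray(values_a, dtype=float)
    b = np.asarray(values_b, dtype=float)
    # Assumption: common bin edges over the pooled range of both classes
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), n_bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p = p / p.sum()                          # normalize to probability estimates
    q = q / q.sum()
    overlap = np.sum(np.sqrt(p * q))         # 1 for identical, 0 for disjoint histograms
    return float(np.sqrt(max(0.0, 1.0 - overlap)))
```

With this definition, two identical distributions give a distance of 0 and two distributions without any common bin give a distance of 1.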
2.2.6 Sleep and wake classification
It has been demonstrated that a linear discriminant (LD) based classifier is appropriate for
the task of sleep and wake classification [89, 177]. Assuming that all features are normally
distributed and that their covariance matrices for the two classes are identical, the "linear
discriminant" function is given by
$$G_c(\mathbf{f}) = -\frac{1}{2}(\mathbf{f} - \boldsymbol{\mu}_c)^T \boldsymbol{\Sigma}^{-1} (\mathbf{f} - \boldsymbol{\mu}_c) + \ln \Pr(c) \tag{2.2}$$
where µ_c is the mean vector of the feature vectors f of class c, Σ is the pooled covariance
matrix, and Pr(c) expresses the prior probability of class c [97]. In this study, c is either sleep
(the negative class) or wake (the positive class). Based on its feature vector, an epoch is assigned
to one class when the computed discriminant score of this class minus that of the other class is
higher than a decision-making threshold T (here we chose T = 0). For instance, an epoch is
classified as sleep if G_sleep(f) > G_wake(f) for this epoch. Because quadratic discriminants are
known to require a larger sample size than linear discriminants and seem to be more sensitive
to possible violations of the normality assumption [110], the linear discriminant was used
instead.
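The decision rule based on (2.2) can be sketched in a few lines. The function names and the label coding (0 for sleep, 1 for wake) are assumptions made for this illustration, and the pooled covariance is assumed to be the unbiased within-class estimate.

```python
import numpy as np

def fit_ld(X, y):
    """Class means and inverse pooled covariance; y uses 0 = sleep, 1 = wake."""
    mu = [X[y == c].mean(axis=0) for c in (0, 1)]
    scatter = sum((X[y == c] - mu[c]).T @ (X[y == c] - mu[c]) for c in (0, 1))
    cov = scatter / (len(X) - 2)             # pooled within-class covariance
    return mu, np.linalg.inv(cov)

def discriminant(f, mean, cov_inv, prior):
    """Linear discriminant score G_c(f) of equation (2.2)."""
    d = f - mean
    return -0.5 * d @ cov_inv @ d + np.log(prior)

def classify(f, mu, cov_inv, prior_wake, T=0.0):
    """Return 1 (wake) if G_wake(f) - G_sleep(f) > T, otherwise 0 (sleep)."""
    g_sleep = discriminant(f, mu[0], cov_inv, 1.0 - prior_wake)
    g_wake = discriminant(f, mu[1], cov_inv, prior_wake)
    return int(g_wake - g_sleep > T)
```

Varying `T` instead of fixing it at 0 moves the operating point of the classifier, which is how ROC and PR curves are traced.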
The prior probability Pr(c) is typically estimated during the training procedure. It can be
observed, however, that the probabilities of the different classes vary throughout the night [249].
For example, the probability of being asleep in the middle of the night is much higher than right
after going to bed or at the end of the night. In order to exploit these variations, instead of using
a fixed prior probability we computed a time-varying prior probability for each epoch by
counting the number of times that specific epoch (relative to the instant when lights were turned
off) was annotated as each class [108]. In addition, a prior probability 'emphasis' factor (or
weight) γ (γ ∈ [0, 1]) is used to bias the classifier towards a pre-defined class, meaning that it
sets a higher barrier for an epoch to be assigned to one class and at the same time a lower barrier
for the other class during decision making. Because the classes are imbalanced, with many more
sleep epochs than wake epochs (this will be explained later), yielding a very low prior probability
of wake in our study, we use this factor to "emphasize" the wake class and "penalize" the sleep
class. The new time-varying prior probabilities of the two classes after applying the emphasis
factor are therefore Pr′(sleep) = γ · Pr(sleep) and Pr′(wake) = 1 − Pr′(sleep), where γ = 0.79
was chosen experimentally for sleep and wake classification.
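A sketch of the time-varying prior with the emphasis factor is given below. The function name and the label coding (0 for sleep, 1 for wake) are assumptions, and every training hypnogram is assumed to start at lights off with one label per 30-s epoch.

```python
import numpy as np

def time_varying_prior(hypnograms, gamma=0.79):
    """Per-epoch priors Pr'(sleep) and Pr'(wake) from training hypnograms.

    hypnograms: list of 1-D label sequences (0 = sleep, 1 = wake), aligned at lights off.
    """
    n_max = max(len(h) for h in hypnograms)
    sleep_count = np.zeros(n_max)
    total = np.zeros(n_max)
    for h in hypnograms:
        h = np.asarray(h)
        total[:len(h)] += 1                  # recordings that reach this epoch
        sleep_count[:len(h)] += (h == 0)     # recordings annotated as sleep here
    pr_sleep = sleep_count / total           # time-varying prior of sleep
    pr_sleep_new = gamma * pr_sleep          # emphasis factor penalizes sleep
    return pr_sleep_new, 1.0 - pr_sleep_new
```

Note that an epoch index that was never annotated as sleep in the training data yields Pr′(sleep) = 0, whose logarithm in (2.2) is undefined; how such cases are handled is not specified here.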
2.2.7 Classifier evaluation
To assess the performance of this classifier, conventional measures of sensitivity (proportion
of correctly identified actual wake epochs) and specificity (proportion of correctly identified
actual sleep epochs) are often used. They can be calculated as sensitivity = TP/(TP+FN) and
specificity = TN/(TN+FP), with TP, TN, FP, and FN indicating the number of true positive,
true negative, false positive, and false negative classifications, respectively. However, these
two measures are not the most adequate criteria for this binary classification problem. The
reason is that the number of epochs of the wake class during an entire recording lasting the
whole night is naturally smaller than the number of epochs of the sleep class, in what is usually
called “imbalanced class distribution”. On average, the sleep and wake classes account for
respectively 92.3% and 7.7% of all epochs for the healthy subjects, and respectively 69.7% and
30.3% of all epochs in the insomnia group.
The Cohen’s Kappa coefficient of agreement [72] (denoted as κ ) not only provides a better
understanding of the general performance of the classifier in correctly identifying both classes,
but also allows for a better interpretation of the imbalanced problem when it is used as a crite-
rion to optimize performance [26]. Although it indicates how well a classifier performs for both
classes, evaluating a method with this metric that represents a single point in the entire solution
space might not be sufficient [237]. An alternative is to use a receiver operating characteristic
(ROC) curve which plots the true positive rate (i.e. sensitivity or recall) versus false positive rate
(i.e. one minus specificity) thus illustrating the classifier’s performance over the entire solution
space by means of varying a decision making threshold [103]. However, the ROC curve has
been shown to be over-optimistic when there is a heavy imbalance between two classes [83], for
instance, sleep and wake in the healthy group. Hence, a so-called Precision-Recall (PR) curve
that plots precision versus recall is used instead, where precision = TP/(TP+FP) measures the
positive predictive value. When comparing different classifiers, a larger 'area under the PR
curve' (AUCPR) or 'area under the ROC curve' (AUCROC) indicates a better performance. In
this study, the three metrics (κ , AUCROC and AUCPR ) were used to evaluate the performance
of sleep and wake classification with and without HRV boundary adaptation.
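For reference, the single-point measures discussed above can be computed from the four entries of the confusion matrix, with wake as the positive class. The function name is hypothetical; Cohen's Kappa is computed in its standard form, as the observed agreement corrected for the agreement expected by chance.

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, precision, accuracy and Cohen's Kappa."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)             # recall of the wake class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)               # positive predictive value
    p_observed = (tp + tn) / n               # accuracy
    # Chance agreement from the marginal totals of both raters
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": p_observed,
        "kappa": kappa,
    }
```

As a worked example with made-up counts, TP = 20, FN = 10, FP = 5, and TN = 65 give an accuracy of 0.85, a chance agreement of 0.60, and therefore κ = (0.85 − 0.60)/(1 − 0.60) = 0.625.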
In addition, we combined the HRV spectral features with some other HRV (non-spectral)
features selected from the feature set used in previous work [89], including time domain fea-
tures [89, 248], nonlinear measures extracted using detrended fluctuation analysis [289] and
sample entropy [75]. Five HRV non-spectral features were selected using the feature selection
method described in [108]. This serves the purpose of examining whether the feature adap-
tation method described in this chapter can help improve the classification performance when
combined with other relevant features. Note that all features were extracted from the same
HRV series. Besides, we compared the results with those obtained using the actigraphy feature
(activity counts over a 30-s epoch, expressed as ac ), a well-known feature for sleep and wake
classification [74]. Finally, we also examined the classification performance by combining the
HRV features with this actigraphy feature.
2.3 Results
A leave-one-subject-out cross-validation (LOSOCV) procedure was conducted to assess the
discriminative power of the HRV spectral features and also to assess the performance of our
classifier. Table 2.2 compares the discriminative power (as measured by a Hellinger distance
DH ) of the HRV spectral features using the traditional fixed boundaries and using the adaptive
boundaries for healthy and insomnia subjects. The values were obtained by averaging the results
computed on the training data over all iterations of the LOSOCV process.
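The LOSOCV procedure can be sketched as a generator of training and test indices. The function name is hypothetical, and one subject identifier per epoch is assumed to be available.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held-out subject, training indices, test indices) for each subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        is_test = subject_ids == subject     # all epochs of the held-out subject
        yield subject, np.where(~is_test)[0], np.where(is_test)[0]
```

In each iteration, the feature statistics, priors, and classifier would be estimated on the training indices only and then applied to the epochs of the held-out subject.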
Table 2.3 and Table 2.4 summarize the classification performance obtained with and without
boundary adaptation using different sets of features for the healthy and insomnia groups. The
HRV spectral features consist of hrv vlf, hrv lf, hrv hf, and hrv lf/hf and the HRV non-
spectral features were selected based on the training sets during the cross-validation procedure.
The results are also illustrated in Figure 2.6 and Figure 2.7 using ROC and PR curves, giving an
overview of the performance of our sleep and wake classifier in a two-dimensional solution
space. Note that the ROC and PR curves were obtained by thresholding the discriminant scores
pooled over all iterations of the LOSOCV for each group.
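The thresholding of pooled scores can be sketched as follows. This is an illustration only: the function names are hypothetical, ties between scores are not treated specially, the areas are computed with the trapezoidal rule, and the PR curve is assumed to start at recall 0 with the precision of the highest-scoring epoch.

```python
import numpy as np

def _trapezoid(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def roc_pr_auc(scores, labels):
    """AUC_ROC and AUC_PR from pooled scores; labels use 1 = wake, 0 = sleep."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # sweep the threshold from high to low
    lab = labels[order]
    tp = np.cumsum(lab)                          # true positives at each threshold
    fp = np.cumsum(1 - lab)                      # false positives at each threshold
    recall = tp / lab.sum()
    fpr = fp / (lab.size - lab.sum())
    precision = tp / (tp + fp)
    auc_roc = _trapezoid(np.r_[0.0, fpr], np.r_[0.0, recall])
    auc_pr = _trapezoid(np.r_[0.0, recall], np.r_[precision[0], precision])
    return auc_roc, auc_pr
```

Here `scores` would be the difference between the wake and sleep discriminant scores of all test epochs, concatenated over the LOSOCV iterations of one group.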
2.4 Discussion
2.4.1 Discriminative power
Table 2.2 shows that, after using the adaptation method, the discriminative power of the HRV
spectral features is significantly increased for the subjects in both the healthy and insomnia
groups (paired Wilcoxon signed-rank test). For comparison, the table also indicates the Hellinger
distance of the actigraphy feature ac. Although the feature adaptation helps, to different extents,
to improve the discriminative power of each HRV spectral feature, it is still relatively
lower than that of the actigraphy feature, which captures body motion during bedtime. As
known in the literature, body motion often occurs during wake states [74, 258].

Table 2.2: Discriminative power comparison of the HRV spectral features for healthy and insomnia groups

Group      Feature      Hellinger distance DH                        p value†
                        Fixed boundaries    Adaptive boundaries
Healthy    hrv vlf      0.19 ± 0.02         0.22 ± 0.01              0.0004
           hrv lf       0.25 ± 0.01         0.26 ± 0.01              0.0026
           hrv hf       0.23 ± 0.01         0.29 ± 0.01              0.0001
           hrv lf/hf    0.22 ± 0.01         0.27 ± 0.01              0.0001
           ac           0.49 ± 0.02                                  –
Insomnia   hrv vlf      0.13 ± 0.01         0.14 ± 0.01              0.049
           hrv lf       0.18 ± 0.01         0.19 ± 0.01              0.0001
           hrv hf       0.17 ± 0.01         0.21 ± 0.01              0.0001
           hrv lf/hf    0.20 ± 0.01         0.21 ± 0.01              0.0001
           ac           0.41 ± 0.02                                  –

† Significance of difference between using fixed and using adaptive boundaries was examined with a
paired Wilcoxon signed-rank test.

Table 2.3: Classification performance (mean ± SD of accuracy, sensitivity, and specificity) for healthy
and insomnia groups

Group      Feature set                       Accuracy (%)    Sensitivity (%)   Specificity (%)
Healthy    Actigraphy feature                94.8 ± 2.7      46.8 ± 19.6       99.1 ± 1.0
           HRV spectral features (F)         90.3 ± 9.0      32.7 ± 14.1       95.4 ± 9.6
           HRV spectral features (A)         89.3 ± 10.7     33.9 ± 13.8       94.1 ± 11.6
           HRV features† (F)                 89.9 ± 8.5      50.6 ± 13.4       93.3 ± 8.9
           HRV features† (A)                 93.1 ± 4.2      49.7 ± 19.2       96.6 ± 3.3
           Actigraphy + HRV features† (F)    95.7 ± 2.1      56.9 ± 16.4       99.0 ± 0.9
           Actigraphy + HRV features† (A)    95.8 ± 2.2      58.1 ± 18.0       99.1 ± 0.9
Insomnia   Actigraphy feature                79.1 ± 12.1     47.2 ± 17.1       95.3 ± 3.5
           HRV spectral features (F)         65.2 ± 12.9     42.9 ± 20.0       78.6 ± 17.3
           HRV spectral features (A)         69.0 ± 11.2     49.1 ± 16.5       78.5 ± 12.5
           HRV features† (F)                 70.1 ± 9.8      54.6 ± 18.4       80.7 ± 14.2
           HRV features† (A)                 72.9 ± 8.8      53.2 ± 15.3       83.0 ± 12.2
           Actigraphy + HRV features† (F)    79.2 ± 10.8     57.4 ± 17.6       91.5 ± 9.0
           Actigraphy + HRV features† (A)    80.6 ± 8.5      57.8 ± 16.9       92.2 ± 8.2

F: using fixed boundaries (without adaptation) on the HRV spectral features.
A: using adaptive boundaries (with adaptation) on the HRV spectral features.
† The HRV features consist of the spectral features and the non-spectral features selected from a larger
feature set used in [89].

Table 2.4: Classification performance (mean ± SD of κ and pooled AUCPR and AUCROC) for healthy
and insomnia groups

Group      Feature set                       Kappa κ          AUCPR    AUCROC
Healthy    Actigraphy feature                0.53 ± 0.15      0.67     0.90
           HRV spectral features (F)         0.33 ± 0.18      0.30     0.71
           HRV spectral features (A)         0.33 ± 0.19      0.36     0.74
           HRV features† (F)                 0.44 ± 0.25      0.51     0.80
           HRV features† (A)                 0.48 ± 0.24∗     0.54     0.81
           Actigraphy + HRV features† (F)    0.63 ± 0.10      0.71     0.89
           Actigraphy + HRV features† (A)    0.64 ± 0.13      0.72     0.90
Insomnia   Actigraphy feature                0.45 ± 0.17      0.64     0.71
           HRV spectral features (F)         0.20 ± 0.14      0.48     0.60
           HRV spectral features (A)         0.25 ± 0.13∗     0.52     0.68
           HRV features† (F)                 0.31 ± 0.11      0.56     0.68
           HRV features† (A)                 0.34 ± 0.12∗     0.59     0.72
           Actigraphy + HRV features† (F)    0.47 ± 0.17      0.70     0.78
           Actigraphy + HRV features† (A)    0.50 ± 0.14∗     0.72     0.81

F: using fixed boundaries (without adaptation) on the HRV spectral features.
A: using adaptive boundaries (with adaptation) on the HRV spectral features.
† The HRV features consist of the spectral features and the non-spectral features selected from a larger
feature set used in [89].
∗ The difference between using fixed and using adaptive boundaries is significant, examined with a
paired Wilcoxon signed-rank test (with p < 0.05).
2.4.2 Classification
As shown in Table 2.4, in general, adapting the boundaries of the HRV spectral features can
improve the performance as evaluated by the three metrics. For the healthy group, it is
interesting to note that the value of κ is similar when using HRV spectral features with and
without boundary adaptation. This seems to contradict the significant increase in discriminative
power found with the Hellinger distance. Upon closer inspection we found that this occurs
only for that single point in the solution space. In fact, when evaluating the performance over
the entire solution space with AUCPR we see an increase from 0.30 to 0.36. The ROC and PR
curves (plotted in Figure 2.6 and Figure 2.7, respectively) obtained with the HRV spectral
features clearly show that the adapted versions are superior to the original ones, particularly in the
region where recall is lower than about 0.30 or larger than about 0.60. For the insomnia group,
the figures also indicate a clear improvement after adapting the HRV spectral features.
When combining the HRV spectral features with the additional HRV features indicated earlier,
we see a significant increase (Wilcoxon test, p < 0.01) in κ from 0.44 ± 0.25 (without
adaptation) to 0.48 ± 0.24 (with adaptation) for the healthy group and from 0.31 ± 0.11 (without
adaptation) to 0.34 ± 0.12 (with adaptation) for the insomnia group. The Wilcoxon significance
test compares the results pair-wise for each subject, thus indicating that boundary adaptation
improved the classification performance for the majority of the subjects. Likewise, the pooled
AUCPR and AUCROC metrics increased when applying boundary adaptation. As shown in
Table 2.4, the variations of κ are relatively large compared to the mean values, indicating a large
between-subject variability in the classification performance.
Figure 2.6: Pooled ROC curves for sleep and wake classification using different feature sets with and
without adaptation for healthy subjects (a) and insomniacs (b). [Axes: sensitivity versus 1 − specificity.
Curves: HRV spectral features (without and with adaptation), HRV features (without and with
adaptation), actigraphy, and actigraphy + HRV features (without and with adaptation).]
Figure 2.7: Pooled PR curves for sleep and wake classification using different feature sets with and
without adaptation for healthy subjects (a) and insomniacs (b). [Axes: precision versus recall.
Curves: HRV spectral features (without and with adaptation), HRV features (without and with
adaptation), actigraphy, and actigraphy + HRV features (without and with adaptation).]
For comparison purposes, Table 2.3 and Table 2.4 also show the classification results using
the actigraphy feature ac . As expected, for the healthy group, it outperforms the HRV features.
For the insomnia group, although the κ value obtained with the HRV feature set is generally
lower than that obtained with ac, the HRV features (in particular the adapted versions)
outperform this actigraphy feature when recall is higher than ∼0.55 (see Figure 2.6 and Figure 2.7).
This indicates that the sensitivity to wake might be increased by adding these HRV features for
the insomnia subjects. It also highlights the disadvantage of a metric such as κ, which only
represents a single point reflecting a single solution in the space.
The classification results with the actigraphy and the HRV features are also given in the
tables. Although actigraphy is adequate for sleep and wake classification, combining it with the
HRV features (in particular when applying boundary adaptation on the HRV spectral features)
significantly increases the classification performance measured by κ value. The significance
was confirmed with a Wilcoxon signed-rank test (p < 0.01).
2.4.3 Healthy subjects versus insomniacs
To compare the healthy subjects and the insomniacs, it makes less sense to use the pooled
AUCPR metric because the ratio between the numbers of sleep and wake epochs differs between
the two groups. For instance, a decision-making rule that classifies all epochs as wake (i.e.
recall = 1) leads to a different precision for the healthy and insomnia groups, ∼8% and ∼30%,
respectively (the proportions of wake epochs in each group), which depends only on their prior
probabilities. Differences in class balance thus prevent a comparison between the areas under
the curves of the two groups. Therefore, here we used the pooled AUCROC metric instead.
Figure 2.6 illustrates that the sleep and wake classification performance with the different
feature sets is much better for the healthy subjects than for the insomniacs. This confirms earlier
findings, which show that discrimination between wake and sleep (especially REM sleep) is
more difficult in insomniacs than in healthy subjects when using cardiac activity [283] or
actigraphy [175].
2.4.4 Determination of adaptive boundaries
The method described in this chapter provides a time-varying adaptation of the HRV spectral
features that offers higher discriminative power in classifying sleep and wake states. The features
are used as inputs to a sleep and wake classifier. We re-defined the spectral boundaries so that
they adapt to the spectrum information (related to autonomic activity) that can be obtained
before feature extraction. The aim is to find frequency bands that more accurately capture
certain aspects of physiology during sleep. For instance, the HF band should only include
respiratory activity rather than sympathetic activation, which should be in the LF band. An
excessively large HF bandwidth might incorrectly include the spectral power that "spills over"
from sympathetic activation (see Figure 2.4). For this purpose, we used an HF∗ bandwidth of
0.1 Hz instead of the 0.25 Hz used in the traditional HF band. Alternatively, rather than using a
constant HF bandwidth (0.1 Hz) as in this study, the bandwidth can be determined by measuring
respiratory effort signals and analyzing their PSDs [117], but this requires the use of an
additional sensor.
Additionally, we observed that the LF and HF bands can overlap under different circumstances:
when the peaks in the LF and HF bands are close to each other, when there is no
clear peak in the HF band, or when the respiratory-frequency peak is below 0.15 Hz and therefore
lies in the traditional LF band. Such overlaps (or spillovers) can be observed in Figure 2.4.
In these situations, the overlapping part of the spectrum will influence the features computed
for both the LF∗ and the HF∗ bands. This may have an impact on the classification process,
decreasing the accuracy of the classifier. Therefore, a more accurate method is needed for
defining a threshold which separates the two bands rather than just using fixed bandwidths.
This merits further investigation.
Finally, as we mentioned, the respiratory information was derived from the HRV data.
Although this may not be as good an estimate as a direct measure of respiratory effort, it has
been shown to provide a usable estimate of respiratory rate, especially during sleep [79]. More
importantly, it does not require the use of an additional sensor to measure respiratory effort.
Alternatively, the respiration rate can also be estimated from the ECG signal directly, for example
by computing the changes in the "envelope" of the ECG due to the modulation induced by the
respiration movements [203]. This method will be further studied in future work.
2.5 Conclusion
In this chapter, we used a method based on the time-frequency analysis of HRV spectral power
to adapt HRV spectral features. It aimed at providing more accurate interpretations of the sym-
pathetic and respiratory activities in order to better discriminate between sleep and wake states.
It was achieved by adapting the spectral boundaries according to the peaks found in HF and LF
bands of the HRV power spectral density. The adaptation improved the discriminative power
of the HRV spectral features, and therefore enhanced the sleep and wake classification perfor-
mance, especially after combining the adapted HRV spectral features with the other selected
HRV non-spectral features. Using a linear discriminant classifier tested with leave-one-subject-out
cross-validation, we achieved a significant increase in Cohen's Kappa coefficient κ (from
0.44 to 0.48 for healthy subjects and from 0.31 to 0.34 for insomniacs). Furthermore, by com-
bining these HRV features and actigraphy, we obtained a significantly increased κ compared
with that obtained when only using actigraphy (0.64 versus 0.53 for the healthy group and 0.50
versus 0.45 for the insomnia group).
CHAPTER 3
Sleep and wake classification with actigraphy and respiratory effort using dynamic warping
This chapter is adapted from: X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and
Wake Classification with Actigraphy and Respiratory Effort using Dynamic Warping. IEEE Journal of
Biomedical and Health Informatics, 18(4):1272–1284, 2014, © IEEE.
Abstract – This chapter proposes the use of dynamic warping (DW) methods for improving
automatic sleep and wake classification using actigraphy and respiratory effort. DW is an al-
gorithm that finds an optimal non-linear alignment between two series allowing scaling and
shifting. It is widely used to quantify (dis)similarity between two series. To compare the res-
piratory effort between sleep and wake states by means of (dis)similarity, we constructed two
novel features based on DW. For a given epoch of a respiratory effort recording, the features
search for the optimally aligned epoch within the same recording in time and frequency do-
main. This is expected to yield a high (or low) similarity score when this epoch is sleep (or
wake). Since the comparison occurs throughout the entire-night recording of a subject, it may
reduce the effects of within- and between-subject variations of respiratory effort, and thus help
discriminate between sleep and wake states. The DW-based features were evaluated using a
Linear Discriminant classifier on a dataset of 15 healthy subjects. Results show that the DW-
based features can provide a Cohen’s Kappa coefficient of agreement κ = 0.59 which is signifi-
cantly higher than the existing respiratory-based features and is comparable to actigraphy. After
combining the actigraphy and the DW-based features, the classifier achieved a κ of 0.66 and an
overall accuracy of 95.7%, outperforming an earlier actigraphy- and respiratory-based feature
set (κ = 0.62). The results are also comparable with those obtained using an actigraphy- and
cardiorespiratory-based feature set but have the important advantage that they do not require an
ECG signal to be recorded.
3.1 Introduction
Sleep plays an important role in human emotional well-being and physical health. Many people
live with sleep-related problems (e.g., insomnia and obstructive sleep apnea) that have major
implications for their health [27, 247, 248]. Objective assessment of sleep
is often based on the monitoring of sleep and wake stages throughout the entire night during
bedtime [89, 151]. According to the guidelines of the American Academy of Sleep Medicine
(AASM) [136], the sleep stages consist of rapid-eye-movement (REM) and non-REM (NREM,
including N1, N2, and N3) sleep.
Overnight polysomnography (PSG) recordings with manually annotated hypnograms are
considered the “gold standard” for objectively analyzing sleep architecture and occurrence
of specific sleep-related problems [247]. A PSG typically comprises physiological data such
as the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), elec-
troocculogram (EOG), oxygen saturation, and respiratory effort. When used for sleep staging,
recorded signals are typically split in non-overlapping epochs of 30 s each in accordance with
the Rechtschaffen and Kales (R&K) rules [247] and also the more recent AASM guidelines
[136].
Although PSG is the gold standard for sleep assessment, it has several drawbacks such as the
high costs of laboratory facilities, disruption of “normal” sleep, and impossibility to perform
long-term monitoring. This has motivated the investigation of sensors/methods that allow for a
reliable acquisition of physiological modalities in an unobtrusive or at least more comfortable
and convenient way. In particular, actigraphy and cardiorespiratory signals have been often
considered in the context of automatic sleep monitoring [89, 248].
Actigraphy is a less obtrusive way of measuring the body movement of a subject based
on an accelerometer, which is typically worn on the wrist. It has been extensively studied [18, 74,
126, 204, 234, 295] and is considered a standard method for sleep assessment when PSG is
not available [204]. However, researchers argue that actigraphy is prone to error when compared
with PSG [295], and that it cannot cope with the misclassification of 'quiet wake' with
low body activity, resulting in low accuracy in detecting the wake state [18, 234]. Since actigraphy
only measures body movement, it reflects limited physiological information. It has been
shown that cardiorespiratory signals contain relevant physiological information which can help
improve actigraphy-based sleep and wake classification [89, 150, 226]. More importantly, these
signal modalities can be acquired unobtrusively in different ways (e.g., ballistocardiogram
[189], Doppler radar [194], near-infrared camera [166], under-pillow sensor
[66], bed sensor [303]). For example, acquiring cardiorespiratory information using a
static-charge-sensitive bed (SCSB) [140, 158] has been investigated, and in recent years it has become
more popular for unobtrusive (or non-contact) monitoring of sleep [161, 303]. However,
difficulty has been found in discriminating between wake and REM sleep [249] when only using
cardiorespiratory signals. It is therefore important to improve sleep and wake classification
when actigraphy is absent. On the other hand, cardiac activity is relatively difficult
to capture reliably in an unobtrusive manner, particularly when compared with body movement
and respiratory activity [158]. For example, a novel radio-frequency sensing system [85],
which can only capture respiratory effort, was developed for sleep/wake measurement. Thus,
enhancing the sleep and wake classification performance in the absence of cardiac activity is
also of importance. This work therefore addresses the problem of obtaining a reliable sleep and
wake classification based on the following physiological signal modalities: (1) only respiratory
effort, and (2) the combination of actigraphy and respiratory effort.
As presented in previous studies, a large number of features have been explored for sleep
and wake classification [74, 89, 248]. When either ECG or actigraphy is excluded, the
classification performance degrades to a certain degree [89, 150, 151]. In this work we present
new features based on respiratory effort, which result in a classification performance that is not
only better than that of the previous respiratory feature set (and the actigraphy feature), but also
comparable to that of the cardiorespiratory feature set described in [89]. Compared to that work,
this study does not require ECG signals, which makes it particularly well-suited to the problem
of unobtrusive sleep and wake classification.
It is known that the breathing rhythm is usually more stable and more regular during sleep
than when awake [111, 163]. After observing different respiratory effort signals in the time
and the frequency domains, we found that the morphology of the respiratory waveform and the
properties of its power spectral density (PSD) differ between sleep and wake epochs. As illus-
trated in Figure 3.1, the respiratory effort is more regular during sleep than during wake. Note
that irregularity of the respiratory effort can also be caused by body motion. Additionally,
the PSD of the respiratory effort signal of a sleep epoch typically shows a single clear peak
indicating the dominant respiration frequency, while that of a wake epoch often shows
multiple peaks. We therefore assume that a sleep epoch is more similar to another sleep
epoch and less similar to a wake epoch from the perspective of “series shape”, regardless of
whether it is viewed in the time or in the frequency domain. We thereby concentrate on two questions: (1) how
to quantify the “(dis)similarity” between two series in terms of their morphological properties,
and (2) which template best reflects the shape of a specific state (sleep/wake)?
Dynamic Warping (DW) algorithms have been used to assess (dis)similarity of two data
series with respect to their values. In particular, Dynamic Time Warping (DTW) [37] is a signal
matching algorithm that represents the time-alignment between two time series via dynamic
programming by means of a total cumulative distance function. It can therefore be used to
establish the degree to which two patterns match. Dynamic Frequency Warping (DFW) [209]
is an exact analog of DTW but applied in the frequency domain, where it aims at aligning two
PSD curves (often known as spectrogram frames). When used with respiratory effort signals,
DTW is expected to find a good match between the waveforms of the respiratory effort in
two separate sleep periods. In contrast, it should not find any good match of the respiratory
waveform between a sleep and a wake period, or even between two distinct wake periods. This
is simply because the breathing pattern during wake is usually not as regular as it is during
sleep, and sometimes it is more related to body motion artifacts. Analogously to DTW, DFW
can help distinguish the respiratory PSD curves of sleep and wake states. Using DTW and
DFW we can express the (dis)similarity of signals in the time and in the frequency domains,
and accordingly capture properties of the respiratory effort signals which are characteristic of
sleep and wake.

[Figure 3.1 panels: respiratory effort (a.u.) versus time (0 to 60 s) for sleep and wake; PSD (a.u.) versus frequency (0 to 0.7 Hz) for sleep and wake.]
Figure 3.1: Typical examples of respiratory time series (a) during sleep and (b) during wake in a period
of one min, and respiratory PSD series (c) during sleep and (d) during wake.

In this chapter we propose two respiratory-based features based on DW algorithms to dis-
criminate the respiration pattern between a sleep and a wake state. More concretely, one feature
uses DTW to calculate dissimilarity scores in the time domain and is applied on the respiratory
(effort) time series; the other uses DFW to calculate dissimilarity scores in the frequency do-
main and is applied on the respiratory PSD series. Both algorithms find an optimal alignment
between two discrete data series allowing variations in two dimensions, e.g., scaling or shift-
ing, and amplitude or offset [37, 152, 209]. For a given epoch from a subject’s recording, the
features search for the most similar epoch (i.e., optimally aligned epoch) as a template over
some other epochs of the same recording based on DW, instead of using a globally pre-defined
template for all subjects. This may reduce the impact of the physiological differences
between subjects. Moreover, because these epochs are all taken from the same subject and the
properties of the respiratory activity are not expected to change dramatically throughout the
night, the impact of within-subject variation should be small. Together, these properties
could increase the classification performance across the entire data set.
DW has been widely applied to recognize patterns in fields such as speech processing
[241], fingerprint verification [162], and gene expression [1]. However, to our knowledge,
no previous study has explored the application of DW to sleep staging.
3.2 Subjects and data

The data set comprised single-night PSG recordings and actigraphy (Actiwatch, Philips
Respironics) of fifteen healthy adults. Inclusion in the data collection trial was defined by a
score lower than 6 on the Pittsburgh Sleep Quality Index (PSQI) [60]. For each subject, full
PSG was recorded according to the guidelines of the AASM [136]. The PSG recordings of
nine subjects were made in the Sleep Health Center, Boston, USA, during 2009 (Alice 5
PSG, Philips Respironics) and those of six subjects in the Philips Experience Lab of the High
Tech Campus in Eindhoven, The Netherlands, during 2010 (Vitaport 3 PSG, TEMEC). The subject
demographics are presented in Table 3.1 as mean ± standard deviation (SD) and range. The
Ethics Committee of the two sleep laboratories (or labs) approved the study protocol and all
subjects signed an informed consent form.

Table 3.1: Subject Demographics
Parameter                         Mean ± SD         Range
Sex                               5 males and 10 females
Age (y)                           31.0 ± 10.4       23 − 58
Body mass index, BMI (kg/m^2)     24.4 ± 3.3        20.2 − 31.2
Total recording time (h)          7.2 ± 1.1         4.2 − 9.1
Number of total epochs            866.2 ± 135.6     507 − 1092
Sleep efficiency∗ (%)             92.3 ± 3.8        86.0 − 97.9
For some subjects, only a portion of the recording was used because EEG
electrodes fell off during the night.
∗ Ratio between total sleep time and total time in bed (here equal to the
recording length) based on the manual scores.
Actigraphy was obtained with the wrist-worn Actiwatch where acceleration data, caused by
body movements, were recorded and converted into activity counts per second (influenced by
the intensity and frequency of acceleration) [229, 254]. The thoracic respiratory effort signal
was recorded using respiratory inductance plethysmography with a sampling rate of 10 Hz.
Note that the recordings from the Actiwatch were synchronized with those from the PSG, using
markers in both the Actiwatch and the PSG clocks.
Sleep stages were scored on 30-s epochs by sleep experts based on the AASM guidelines
as wake, REM sleep, and three NREM stages N1-N3. For sleep and wake classification, we
considered two classes: wake and sleep (including REM and NREM sleep). Each PSG recording
was manually clipped to the interval between the instant when the subject turned
the lights OFF with the intention of sleeping and the moment the lights were turned ON before
the subject got out of bed in the morning.
3.3 Methods
3.3.1 Dynamic warping algorithm
3.3.1.1 Dynamic warping distance

DW computes a distance between two series by non-linearly aligning them in a given dimen-
sion. Consider two series:
$$A = a_1, a_2, \ldots, a_i, \ldots, a_n \quad (\text{length } n), \tag{3.1}$$
$$B = b_1, b_2, \ldots, b_j, \ldots, b_m \quad (\text{length } m). \tag{3.2}$$
These two series can be arranged such that they form an n-by-m “warping matrix”, where each
element of the matrix (i, j) is given by a distance function D, expressing the squared distance
between $a_i$ and $b_j$:
$$D(i, j) = (a_i - b_j)^2. \tag{3.3}$$
A warping path maps the elements of A and B through the matrix so that the total cumulative
distance between them is minimized. The warping path W belongs to a set Ω including all
possible warping paths, and is denoted as
$$W = w_1, w_2, \ldots, w_k, \ldots, w_K \quad (\text{length } K), \tag{3.4}$$
where $w_k = (i, j)_k$ is the kth element of the warping path W and $\max(n, m) \le K \le m + n - 1$.
The DW distance between the two series is the minimum measure based on W such that:
$$DW(A, B) = \min \sqrt{\frac{1}{K} \sum_{k=1}^{K} w_k}, \quad W \in \Omega, \tag{3.5}$$
where the distance is normalized by a factor K (path length). Figure 3.2 illustrates an example
of the dynamic warping procedure between two series A and B in a 2-D space.

[Figure 3.2 labels: warping path w_1, ..., w_k, ..., w_K; upper and lower warping bands of size r; matrix indices i (up to n) and j (up to m).]
Figure 3.2: An example of DW process between two series A and B, where the warping path (circle
markers) and the Sakoe-Chiba warping bands with the size of r (dash lines) are indicated.
3.3.1.2 Warping conditions

Since the DW algorithm searches for an optimal warping path through all possible paths, the
number of possible combinations quickly explodes with the length of the series. The search
space can be reduced by means of “conditions”, which help to effectively mitigate the quadratic
complexity of the algorithm [37]. Several conditions are used to decrease the number of
possible paths including continuity, monotonicity, slope constraint, and boundary constraint
[37, 245]. They can be used to construct a warping path specified by a recurrence:
∆(i, j) = D(i, j) + min[∆(i − 1, j − 1), ∆(i − 1, j), ∆(i, j − 1)], (3.6)
where the cumulative distance ∆(i, j) is defined as the sum of the distance D(i, j) found in a
warping step with the minimum of the cumulative distances of the adjacent elements on the
warping matrix.
Additionally, the warping path can be restricted by a band of size r (i.e., $|i_k - j_k| \le r$) on
both sides of the diagonal of the warping matrix to reduce the computational complexity of
a DW procedure (i.e., to reduce the search space of the warping matrix). This is called the
warping band condition, and the corresponding band is commonly known as the Sakoe-Chiba band
[261] (see Figure 3.2). A band size r that is too large often
results in “over-warping” of periodic series with multiple cycles and thus introduces artificial
features [71]. These artificial features usually occur when the warping path takes an excessive
number of non-diagonal (i.e., vertical or horizontal) moves. Conversely, a very small band size
may cause “under-warping” between two series (the extreme case is the Euclidean alignment
that corresponds to the diagonal line of the warping matrix) [152]. Over-warping and
under-warping are both undesirable. To determine a suitable band size r, we search for the
parameter value that results in the highest feature discriminative power. This will be presented later
in Section 3.3.3.
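The recurrence of Equation 3.6 and the warping band condition can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used in this work: the function name `dw_distance` is hypothetical, and it assumes that the normalization of Equation 3.5 is applied to the path with the minimum cumulative distance, whose length K is tracked alongside the cumulative distance.

```python
import math

def dw_distance(a, b, r=None):
    """Band-constrained DW distance, normalized by the warping path length."""
    n, m = len(a), len(b)
    if r is None:
        r = max(n, m)                      # no band: every cell is allowed
    inf = math.inf
    # cost[i][j]: cumulative distance (Eq. 3.6); steps[i][j]: path length so far
    cost = [[inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]
    for i in range(n):
        # Sakoe-Chiba band: only cells with |i - j| <= r are visited
        for j in range(max(0, i - r), min(m, i + r + 1)):
            d = (a[i] - b[j]) ** 2         # squared distance (Eq. 3.3)
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = d, 1
                continue
            best, best_k = inf, 0
            # predecessors: diagonal, vertical, horizontal
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, best_k = cost[pi][pj], steps[pi][pj]
            if best < inf:
                cost[i][j], steps[i][j] = d + best, best_k + 1
    if cost[n - 1][m - 1] == inf:
        raise ValueError("band too narrow to connect the ends of the two series")
    return math.sqrt(cost[n - 1][m - 1] / steps[n - 1][m - 1])
```

With r = 0 and two series of equal length, only the diagonal is visited, so the sketch reduces to the (root-mean-square) Euclidean alignment; a wider band can only lower the value for the same pair of series.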
3.3.1.3 DW versus Euclidean

The Euclidean distance (computed as a sequential mapping of two series) is a special case
of the DW distance, where the warping path coincides with the diagonal line of the warping
matrix. It is known to be sensitive to distortion in the horizontal dimension of a series [245].
Figure 3.3 depicts an example of the Euclidean and the DW alignments between series A and
B. It illustrates that the DW allows them to scale or shift along the horizontal dimension. Thus,
in this example, the DW distance should be smaller than the Euclidean distance.
3.3.1.4 DTW and DFW distance

When the DW algorithm is used to compute the distance between two time series $A^T$ and $B^T$,
it is called the “DTW algorithm” with corresponding DTW distance. Similarly, it is called the
“DFW algorithm” with corresponding DFW distance when used to compute the distance between two
frequency (or PSD) series $A^F$ and $B^F$. The superscripts indicate the time series (T) and PSD
series (F). These two distance measures can be obtained based on Equations 3.5 and 3.6 described
before.

[Figure 3.3 panels: Euclidean alignment (left) and DW alignment (right) between series A and B.]
Figure 3.3: An example of the alignment between two series (A and B) when computing the Euclidean
(Left) and DW (Right) distances.
3.3.2 Sleep and wake classification
3.3.2.1 Signal preprocessing and PSD estimation

Before feature extraction, the respiratory effort signal of each recording is first low-pass filtered
(using a 10th order Butterworth filter with a cut-off frequency of 0.7 Hz) to eliminate high
frequency noise, after which the baseline is removed by subtracting the median peak-to-trough
amplitude estimated over the entire recording. Then, for each epoch, a Short-Time
Fourier Transform (STFT) is used to estimate a PSD from the pre-processed
respiratory effort signal according to the following procedure: the signal is
first divided into 60-s frames centered on the epoch of interest, with a frame-to-frame overlap
of 50%; after that, a Hanning window with a length of 60 s is applied to reduce spectral leakage;
the spectrum is then computed using the Fast Fourier Transform (FFT); finally, the absolute
spectral values along the positive frequency axis are squared, yielding the PSD estimate for this
epoch.
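The windowing, FFT, and squaring steps for a single frame might look as follows in Python with NumPy. This is a sketch under stated assumptions rather than the original code: the filtering and baseline removal are assumed to have been applied already, the names `epoch_psd` and `bin_frequencies` are hypothetical, and the FFT length of 2048 points (zero-padded) is an assumption chosen so that 1024 bins span 0 to 5 Hz and the first 144 bins reach about 0.7 Hz, in line with the bin counts reported in this chapter.

```python
import numpy as np

FS = 10.0      # sampling rate of the respiratory effort signal (Hz)
NFFT = 2048    # assumed zero-padded FFT length (about 1024 bins over 0 to 5 Hz)
N_BINS = 144   # number of bins kept, covering 0 to about 0.7 Hz

def epoch_psd(frame):
    """PSD estimate of one pre-processed 60-s frame (600 samples at 10 Hz)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))   # Hanning window against leakage
    spectrum = np.fft.rfft(windowed, n=NFFT)    # positive-frequency half only
    return np.abs(spectrum[:N_BINS]) ** 2       # squared absolute spectral values

def bin_frequencies():
    """Frequency in Hz of each retained PSD bin."""
    return np.fft.rfftfreq(NFFT, d=1.0 / FS)[:N_BINS]
```

For a regular breathing pattern at 0.25 Hz, for instance, the returned series has a single dominant peak in the bin closest to 0.25 Hz.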
3.3.2.2 Feature extraction

Respiratory effort and actigraphy are considered, from which features are extracted for sleep
and wake classification. First, an actigraphy feature can be extracted from the output (activity
counts per second) of the Actiwatch. Second, we introduce two new features: a DTW fea-
ture called “minimum DTW distance” and a DFW feature called “minimum DFW distance”,
extracted from the pre-processed respiratory effort signal.
The actigraphy feature (ac) is first calculated as the sum of activity counts over one 30-s
epoch; then it is smoothed via a weighted moving average with a window size of 9
epochs in order to reduce noise introduced during measurement [89]. This feature gives an
indication of gross body movements during sleep.
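As an illustration of these two steps, a minimal Python sketch is given below. The weights of the moving average are not specified here, so the triangular weights used in `smooth` are purely a hypothetical choice, as are the function names and the shrinking of the window at the start and end of the recording.

```python
def epoch_activity(counts_per_second, epoch_len=30):
    """Sum per-second activity counts over consecutive 30-s epochs."""
    n_epochs = len(counts_per_second) // epoch_len
    return [sum(counts_per_second[e * epoch_len:(e + 1) * epoch_len])
            for e in range(n_epochs)]

def smooth(values, window=9):
    """Centered weighted moving average (triangular weights are an assumption)."""
    half = window // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(k) for k in offsets]   # 1, 2, ..., 5, ..., 2, 1
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= i + k < len(values):             # shrink window at the edges
                num += w * values[i + k]
                den += w
        out.append(num / den)
    return out
```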
The DTW feature (dtw ) is computed based on the respiratory time series of a subject with
the DTW algorithm described earlier. For each epoch, it measures the maximum similarity
in the time domain between that epoch and a “time-series template” having the same length.
Assume that the respiratory effort data recorded for a given subject is split in L non-overlapping
epochs. Each of them consists of a collection of N data points in the time domain with a length
of 30 s, such that:
$$E^T(L) = \{E_1^T, E_2^T, \ldots, E_p^T, \ldots, E_L^T\}, \tag{3.7}$$
where $E_p^T = \{x_{p,1}, x_{p,2}, \ldots, x_{p,N}\}$ is the time series of the pth epoch ($p \in \mathbb{Z}^+$ and $1 \le p \le L$)
and N is the number of data points per epoch (N = 300 at a signal sample rate of 10 Hz). In
order to compute the feature value for a given epoch of a recording, the template needs to be
determined. We search for the template based on a window $\Lambda^T$ with a size of $2\lambda^T$ ($< 2\lambda^T$ when
$p < \lambda^T$ or $p > L - \lambda^T$) centered on the given epoch ($\pm\lambda^T$), where this epoch itself is
excluded to avoid “self-alignment”. Thus, for the pth epoch $E_p^T$, the time-series template $\Gamma_p^T$ is
selected using
$$\Gamma_p^T = \arg\min_{E_q^T} DW(E_p^T, E_q^T) \quad \text{for all } q \in \mathbb{Z}^+,\ |q - p| \le \lambda^T,\ \text{and } q \ne p, \tag{3.8}$$
where $\lambda^T$ is a positive integer with $1 \le \lambda^T \le L - 1$. Then the feature value of the pth epoch is
computed by
$$dtw(p) = DW(E_p^T, \Gamma_p^T). \tag{3.9}$$
That is, we choose, as the feature value, the minimum of all DTW distances between the
given epoch $E_p^T$ and all the other epochs within the searching window $\Lambda^T$.

[Figure 3.4 panels (horizontal axes: time series sample, 0 to 300; freq. series sample, 0 to 140): (a) Sleep-Sleep, DTW distance = 0.003; (b) Wake-Sleep, DTW distance = 0.016; (c) Wake-Wake, DTW distance = 0.033; (d) Sleep-Sleep, DFW distance = 0.7e−4; (e) Wake-Sleep, DFW distance = 3.8e−4; (f) Wake-Wake, DFW distance = 1.5e−4.]
Figure 3.4: Examples of DTW alignments of the respiratory time series (a-c) and of DFW alignments of
respiratory PSD series (d-f), respectively, between two sleep epochs (S-S), between a wake and a sleep
epoch (W-S), and between two wake epochs (W-W). Each time series lasts 30 s sampled at 10 Hz and
each PSD series contains 144 samples falling within a frequency range of 0 to ∼0.7 Hz. The values of
corresponding DTW and DFW distances are indicated.
The DFW feature (dfw) is computed based on the DFW algorithm. The procedure for computing
dfw is the same as that for computing dtw, but it is applied to the respiratory PSD series rather
than the time series. This feature compares the shape of the PSD curve between a given epoch and a
“frequency-series template”, indicating the maximum similarity in the frequency domain.
Therefore, the feature value of dfw for the pth epoch is obtained as
$$dfw(p) = DW(E_p^F, \Gamma_p^F), \tag{3.10}$$
where $E_p^F = \{\varphi_{p,1}, \varphi_{p,2}, \ldots, \varphi_{p,M}\}$ is the PSD series of the pth epoch ($p \in \mathbb{Z}^+$ and $1 \le p \le L$),
containing M frequency bins, and $\Gamma_p^F$ is the selected frequency-series template. Here the
template searching window of the DFW feature is $\Lambda^F$ with a size of $2\lambda^F$ epochs. As explained
before, the PSD series are obtained after STFT, and for each of them the number of frequency bins
is M = 144 in a frequency range between 0 and ∼0.7 Hz (a subset of the original spectrum
with 1024 frequency bins in the range of 0 to 5 Hz). We limit the comparison of the PSD of
each epoch to this frequency range since the frequency components of
a healthy subject’s respiration during sleep are usually below 0.7 Hz. We experimentally found
that including higher frequency components would result in a lower discriminative power of
the feature, since they carry small but non-zero noise that would contaminate
the DFW alignment.
The template searching window reduces the computational complexity of extracting
the DW features by restricting the search for the minimum DW value to that window. The
underlying assumption is that, for a given epoch, a suitable template can always be found
among the other epochs within the window. The procedure for
determining $\lambda^T$ and $\lambda^F$ will be presented in Section 3.3.3.
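The template search of Equations 3.8 and 3.9 (and its frequency-domain counterpart in Equation 3.10) can be sketched in Python as below. The function `min_template_distance` is a hypothetical name, and the distance is passed in as a parameter so that any DW implementation can be plugged in; the `rms_distance` helper is only a stand-in that corresponds to a purely diagonal alignment and is included to keep the example self-contained.

```python
def min_template_distance(epochs, lam, dist):
    """For each epoch p, the smallest distance to any other epoch q
    with |q - p| <= lam (the epoch itself is excluded)."""
    n_epochs = len(epochs)
    if n_epochs < 2 or lam < 1:
        raise ValueError("need at least two epochs and lam >= 1")
    feature = []
    for p in range(n_epochs):
        lo = max(0, p - lam)                 # window is truncated at the edges
        hi = min(n_epochs - 1, p + lam)
        candidates = [dist(epochs[p], epochs[q])
                      for q in range(lo, hi + 1) if q != p]
        feature.append(min(candidates))
    return feature

def rms_distance(a, b):
    """Stand-in distance (diagonal alignment only), used just for the demo."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

# Toy example: two similar "sleep-like" epochs followed by two dissimilar ones.
toy_epochs = [[0, 0], [0, 0], [5, 5], [0, 1]]
toy_feature = min_template_distance(toy_epochs, lam=1, dist=rms_distance)
```

In the toy example the first two epochs find a perfect template in each other and obtain a feature value of zero, whereas the last two epochs have no similar neighbor within the window and obtain larger values.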
3.3.2.3 Understanding of DW-based features

Intuitively, there should be higher similarities of respiratory waveform and PSD shape between
any two sleep epochs than between a wake and a sleep epoch or between two wake epochs.
This will be expressed by the minimum DTW and minimum DFW distances found for each
epoch. To further understand this, we consider two simple cases: the current epoch is sleep or
is wake. Then the feature dtw (or dfw ) of this epoch may have three possible situations, where
the minimal DTW (or DFW) distance may occur: between two sleep epochs (S-S), between
a wake and a sleep epoch (W-S), or between two wake epochs (W-W). Regarding the DTW
feature, we can state the following: (1) if the current epoch is sleep, it is likely to find a small
value of DTW distance after searching for similarities of signal waveform between this epoch
and the remaining epochs in a certain window, since S-S may happen; (2) if the current epoch is
wake, it is not likely to obtain a small feature value, because W-S or W-W may happen. For this
reason, this feature will in turn have discriminative power for distinguishing sleep and wake
states. Regarding the DFW feature, the same reasoning applies; but instead of (dis)similarities
of respiratory waveform, this feature expresses (dis)similarities in the shape of PSD series.
Figure 3.4 depicts two examples of the alignment found by DTW and DFW between epochs,
in the three situations (S-S, W-S, and W-W).
It should be kept in mind that the respiratory waveform and PSD shape might carry some
information about body motion artifacts, which often appear during the wake state. These can
lead to irregularity in the recorded respiratory effort. As illustrated in Figure 3.5, some peaks
(e.g., around the 420th epoch) of the DW-based features seem correlated with the actigraphy
feature (ac), which expresses the activity counts. This suggests that the two DW-based features
might help detect body motion artifacts. On the other hand, some peaks (e.g., around the 750th
epoch) of the DW-based features seem related to wake epochs in which no activity counts are
observed. These peaks might correspond to an irregular breathing rhythm.
3.3.2.4 Classifier
A linear discriminant (LD) classifier is adopted in this study. It has previously been shown to
be appropriate for the task of sleep and wake classification using actigraphy, respiratory, and
cardiac data [89, 108, 178, 249]. The details of an LD classifier can be found in [249] and [97].
Note that the classifier used here performs epoch-by-epoch classification.
Regarding the prior probability in the LD classifier, it can be observed that the probabilities
of the different classes vary throughout the night. For example, the probability of being awake
right after sleep onset or at the end of the night is much higher than in the middle of the
night. To exploit these variations, we compute a time-varying prior probability for each epoch
by counting the relative frequency with which that specific epoch was annotated as each class [108, 249].
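A minimal Python sketch of such a time-varying prior is given below. It assumes that the training recordings are aligned at their first epoch (lights off) and that, for epoch indices beyond the end of a shorter recording, only the recordings that are still running are counted; both choices, as well as the function name, are assumptions made for illustration.

```python
def time_varying_wake_prior(training_labels):
    """Per-epoch prior P(wake), estimated from training annotations.

    training_labels: list of recordings, each a list of 0 (sleep) / 1 (wake).
    The prior for sleep at the same epoch index is one minus the returned value.
    """
    longest = max(len(rec) for rec in training_labels)
    prior = []
    for t in range(longest):
        # labels of all training recordings that contain epoch index t
        present = [rec[t] for rec in training_labels if t < len(rec)]
        prior.append(sum(present) / len(present))
    return prior
```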
3.3.3 Experiments and evaluation
3.3.3.1 Experimental validation

Due to the relatively small size of our data set, it is not appropriate to split it into separate train-
ing and test sets. To alleviate this issue, a leave-one-subject-out cross validation (LOSOCV)
procedure [97] can be used to evaluate the performance of our sleep and wake classifier. Given
a set of feature vectors, we first divide it into l subsets (corresponding to l = 15 subjects in
this study). On each iteration, one subset is used as test set and the remaining subsets are used
to train the classifier. The classifier is evaluated on each test set, obtaining its performance
for each iteration of the cross-validation. Finally, results are averaged and pooled to obtain an
indication of the overall performance.
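The splitting scheme can be sketched as a small Python generator. The function name and the representation of the data as a flat list of per-epoch subject identifiers are assumptions made for illustration.

```python
def loso_splits(subject_ids):
    """Yield (test_subject, train_indices, test_indices), one split per subject.

    subject_ids: one subject identifier per feature vector (epoch).
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test
```

Each feature vector appears in exactly one test set, and the classifier evaluated on a subject is never trained on epochs from that same subject.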
3.3.3.2 Evaluation
[Figure 3.5 panels, plotted against time in 30-s epochs (0 to 800): (a) annotation (wake/sleep), (b) respiratory effort, (c) dtw, (d) dfw, (e) ac.]
Figure 3.5: An example of (a) manually scored sleep/wake annotation, (b) respiratory effort recording
at 10 Hz, and feature values of (c) dtw, (d) dfw, and (e) ac for each 30-s epoch of a healthy subject.

To evaluate the performance of our classifier, overall accuracy (i.e., the ratio of correctly identified
samples to the total number of samples), as commonly used in binary classification problems, is
not the most adequate metric. The reason is that during a whole-night recording the number of
epochs of the wake class (accounting for 7.6% of all epochs) is much smaller than that of the sleep
class (accounting for 92.4% of all epochs), in what is usually called an “imbalanced class
distribution” [125]. Thus we also consider the metrics specificity (proportion of correctly identified
actual negatives), sensitivity or recall (proportion of correctly identified actual positives), and
precision (ratio of true positives to true positives plus false positives). Besides these metrics,
Cohen’s Kappa coefficient of agreement κ [72] provides a more insightful measure of the
general performance of the classifier (0-0.20: slight, 0.21-0.40: fair, 0.41-0.60: moderate,
0.61-0.80: substantial, and 0.81-1: almost perfect agreement [172]); but it only represents a single
point in the entire solution space [237]. In order to have an overview of the performance across
the entire solution space, we use a Precision-Recall (PR) curve [103], which plots precision
versus recall by varying the classifier’s decision-making threshold. Compared with the well-known
Receiver Operating Characteristic (ROC) curve, which has been shown to be over-optimistic when
the data set is heavily imbalanced between classes [83], a PR curve gives a more conservative
view of the classifier’s performance. The corresponding ‘Area Under the PR Curve’ (AUC_PR)
can then be estimated [83]. In the remainder of the chapter, we will consider wake and sleep as
the positive and negative classes, respectively.
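The single-operating-point metrics above follow directly from the four cells of a confusion matrix, with wake as the positive class. The Python sketch below uses a hypothetical function name and made-up counts chosen only to resemble an imbalanced recording; they are not results of this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision and Cohen's kappa."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # agreement expected by chance, from the marginal totals
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),    # recall of the wake class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

# Hypothetical counts: 70 wake epochs and 930 sleep epochs.
example = binary_metrics(tp=50, fp=30, tn=900, fn=20)
```

In this made-up example the accuracy is 0.95 although only about 71% of the wake epochs are detected, which illustrates why accuracy alone is not adequate for imbalanced classes.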
An absolute standardized mean difference (ASMD) metric is utilized to evaluate the discriminative
power (i.e., separability) of a single feature. It is computed as the absolute difference between
the mean feature values of sleep and wake epochs, divided by the standard deviation of the
feature values over all epochs. A Mann-Whitney unpaired (1-sided) test is applied to check whether the
feature values of the two classes significantly differ. Moreover, the Spearman’s rank correla-
tion coefficient (denoted as ρ ) measures the correlation between features. The significance of
correlation can be examined with a Student’s t-test.
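The ASMD definition given above can be written as a short Python function. Whether the population or the sample standard deviation is meant is not stated, so the use of the population form here is an assumption, as is the function name.

```python
import statistics

def asmd(sleep_values, wake_values):
    """Absolute standardized mean difference of one feature."""
    pooled = list(sleep_values) + list(wake_values)
    mean_diff = abs(statistics.fmean(sleep_values) - statistics.fmean(wake_values))
    # standard deviation over all epochs (population form, an assumption)
    return mean_diff / statistics.pstdev(pooled)
```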

In addition to evaluating the feature discriminative power between sleep and wake epochs,
we also evaluate it, more specifically, between wake and REM epochs and between
sleep and quiet-wake epochs. This is because sleep and wake misclassification often occurs
between wake and REM epochs when using the traditional (cardio)respiratory features
[249, 283], and between quiet-wake and sleep epochs when using actigraphy only [18]. Here
quiet-wake is defined as wake with computed activity counts lower than 4.5, which is
approximately the mean value over all sleep epochs.
The ASMD metric can also be used to determine the parameters (i.e., the Sakoe-Chiba band
sizes $r^T$ and $r^F$ and the template searching window sizes $\lambda^T$ and $\lambda^F$) for computing the
DW-based features. For each feature, a grid search is applied to find the two parameter values
that optimize the feature’s ASMD value. To obtain an unbiased determination, the grid search is
run on each training set during the LOSOCV procedure. Then, for each parameter, the
determined value is the one that occurs most often among the optimal values found on the different training sets.
For the purpose of objectively assessing different aspects of sleep, it makes sense to evaluate
the performance of the classifier with respect to its ability to deliver good estimates of
so-called “sleep statistics”. The sleep statistics include: total sleep time (TST), total wake time
(TWT), sleep efficiency (SE) computed as the ratio of TST to total time in bed, sleep onset la-
tency (SOL) computed as the time it took before the subject fell asleep, wake after sleep onset
(WASO), and snooze time (ST). Since we are considering exclusively sleep and wake states in
this study, SOL is defined as the period between the beginning of a recording and the first epoch
that is annotated (or classified) as sleep according to the AASM guidelines. For the computa-
tion of ST, we follow a similar criterion, measuring the period between the last epoch that is
annotated (or classified) as sleep and the end of the recording. Keep in mind that the recordings
are restricted to the intervals from lights off until lights on. For each statistic, we compute the
error as the difference value (estimation bias) and as the absolute difference value (absolute
error) between the reference (computed based on the PSG-based manual annotation) and the
estimate (computed based on the classification result). Furthermore, we apply Bland-Altman
scatter plots to assess the degree of agreement between the PSG-based and estimated statistics.
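The sleep statistics can be derived from a sleep/wake sequence as sketched below in Python. The function name is hypothetical, and WASO is assumed here to be the wake time between the first and the last sleep epoch, so that it does not overlap with SOL or ST; this particular WASO definition is an assumption of the sketch.

```python
def sleep_statistics(hypnogram, epoch_s=30):
    """Sleep statistics in minutes (SE in %) from a lights-off to lights-on
    sequence of 0 (sleep) / 1 (wake) labels, one label per epoch."""
    if 0 not in hypnogram:
        raise ValueError("no sleep epoch in the recording")
    to_min = epoch_s / 60.0
    first = hypnogram.index(0)                               # first sleep epoch
    last = len(hypnogram) - 1 - hypnogram[::-1].index(0)     # last sleep epoch
    n_sleep = hypnogram.count(0)
    n_wake = hypnogram.count(1)
    return {
        "TST": n_sleep * to_min,
        "TWT": n_wake * to_min,
        "SE": 100.0 * n_sleep / len(hypnogram),
        "SOL": first * to_min,
        "WASO": sum(hypnogram[first:last + 1]) * to_min,     # assumed definition
        "ST": (len(hypnogram) - 1 - last) * to_min,
    }

stats = sleep_statistics([1, 1, 0, 0, 1, 0, 0, 0, 1, 1])
```

Applying the same function to the manual annotation and to the classifier output, and subtracting the results, gives the estimation bias of each statistic for one recording.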
3.3.3.3 Classification performance comparison

The actigraphy and the DW-based features used in this study are first compared. They are
denoted as FAC , FDTW , and FDFW for comparison with other feature sets (see Table 3.2).
Our earlier studies [89, 108] have considered a large number of features for sleep and wake
classification. In those studies, a subset of features was selected based on the feature
selection method described in [108]. It consists of five features: an actigraphy feature, activity
counts (ac); three respiratory features, namely the standard deviation of respiratory frequency
over 9 epochs (sdf), high frequency components (hfc) [248], and a non-linear measure by
means of sample entropy (se) [75]; and a cardiac feature, mean heart rate (mhr). However,
these selected respiratory features reflect little or nothing of the characteristic morphological
properties of the respiratory effort waveform or their variation over time by means of PSD shape.
Those properties will be exploited with the introduction of the new DW-based features.

Table 3.2: Summary of feature sets
Feature set   Features                           #   Signal modality∗
FAC           ac                                 1   A
FDTW          dtw                                1   R
FDFW          dfw                                1   R
FR1           sdf, hfc, se                       3   R
FR2           sdf, hfc, se, dtw, dfw             5   R
FDW           dtw, dfw                           2   R
FAR1          ac, sdf, hfc, se                   4   A, R
FAR2          ac, sdf, hfc, se, dtw, dfw         6   A, R
FAC-DW        ac, dtw, dfw                       3   A, R
FARC1         ac, sdf, hfc, se, mhr              5   A, R, C
FARC2         ac, sdf, hfc, se, dtw, dfw, mhr    7   A, R, C
∗ A: actigraphy data; R: respiratory effort data; C: cardiac data.
To understand whether the new DW-based features add discriminative power to a sleep
and wake classifier that uses the selected features extracted from different signal modalities,
we consider three respiratory-based feature sets and three actigraphy- and respiratory-based
feature sets, in which the features included are presented in Table 3.2. For the comparison of
classification performance with and without cardiac information, two feature sets FARC1 and
FARC2 including all the previously selected features (or together with the DW-based features)
are also considered.
Since our data were collected in two distinct sleep labs (Boston and Eindhoven), the lab effect
(possibly caused by differences in PSG setup between the labs) on
sleep and wake classification is also analyzed by using one data set for training and the other
for testing.
3.3.3.4 Computational complexity of DW-based features

The original dynamic programming (without any conditions) is extraordinarily computationally
intensive because it searches through all possible warping paths [37]. The use of the warping
conditions can, to a great extent, speed up the DW computation [37, 152]. To compare the
computational complexity when extracting DW-based features, three approaches are considered
as follows.
• The most commonly used DW approach is the one with the warping conditions but without
the warping band condition [205]. It requires a computational complexity of O(N^2),
where the two series have the same length N. When using exhaustive template searching,
the complexity of computing a DW-based feature value becomes O(LN^2), in which L is
the number of epochs of a recording. This approach is denoted as A1.
• The Sakoe-Chiba warping band condition brings down the computational complexity to
O(LrN) instead of O(LN^2), where r is the warping band size and typically r ≪ N [205].
This approach is denoted as A2.
• Setting a template searching window Λ with a size of 2λ can reduce the complexity to
O(λrN), where λ < L. This approach is denoted as A3.

Table 3.3: Parameter determination procedure
Parameter                                 Symbol   Grid search             Determined value
                                                   Min    Max     Step
Sakoe-Chiba warping band                  rT       0      300     5        60 samples†
                                          rF       0      144     1        5 frequency bins‡
Template searching window size (1-side)   λT       25     500∗    25       200 epochs§
                                          λF       25     500∗    25       250 epochs♮
∗ The maximal template searching window size could be limited to the total number
of epochs when computing the features.
† 6 s (6.4 ± 0.9 s).
‡ ∼0.024 Hz (0.026 ± 0.004 Hz).
§ 100 min (94.4 ± 10.3 min).
♮ 125 min (134.4 ± 23.3 min).
These three DW approaches will be compared in terms of the average computation time for
extracting a DW-based feature for one epoch, implemented as a MEX-compiled C routine used
in Matlab (Mathworks, Natick, MA). All computations were carried out on a laptop computer
with a single Intel(R) Core(TM) i5 processor (2.53 GHz) and 4 GB of RAM.
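To give a feeling for the orders of magnitude involved, the complexity expressions of A1, A2, and A3 can be evaluated for illustrative values. The Python sketch below ignores all constant factors, so the numbers are not predictions of run time; the chosen values (N = 300 samples, L = 866 epochs, r = 60 samples, λ = 200 epochs) are only an example loosely based on the DTW setting in this chapter.

```python
def dw_cost_orders(n_points, n_epochs, band, half_window):
    """Order-of-magnitude operation counts per feature value for A1, A2, A3."""
    a1 = n_epochs * n_points ** 2             # O(L N^2): no band, exhaustive search
    a2 = n_epochs * band * n_points           # O(L r N): Sakoe-Chiba band
    a3 = half_window * band * n_points        # O(lambda r N): band and search window
    return a1, a2, a3

a1, a2, a3 = dw_cost_orders(n_points=300, n_epochs=866, band=60, half_window=200)
```

With these values the band alone reduces the expression by a factor of N/r = 5, and adding the template searching window reduces it by a further factor of L/λ, about 4.3.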
3.4 Results
Table 3.3 indicates the determined parameter values obtained by the grid search method. Since
the determination was based on the training set of each iteration during the LOSOCV procedure,
the optimal values for each iteration might differ. Their means and variances (over grid search
iterations) are also indicated in the table.
Table 3.4 shows the pooled discriminative power (as measured by ASMD) of the selected
features for all subjects in separating the sleep and wake classes. As confirmed with the
Mann-Whitney unpaired (1-sided) test, the differences in the features between these two
classes are significant. The table also indicates that the DW-based features perform much
better than actigraphy when discriminating between quiet-wake and sleep, and that the
feature dtw offers a higher discriminative power than the other features for wake and REM
separation. Figure 3.6 illustrates the box plots of the three features (ac, dtw, and dfw)
for sleep and wake epochs for every subject and for the pool of all subjects. It clearly
shows how the features can help discriminate (albeit not perfectly) between the two
classes. Classification errors will occur for feature values where the box plots overlap.
In addition, the between-feature correlations for all subjects are presented in Table 3.5,
indicating that the correlation between ac and dtw is higher than the others.

Table 3.4: Feature discriminative power (ASMD)
| Feature | Sleep vs. Wake | Quiet-wake vs. Sleep | Wake vs. REM sleep |
|---|---|---|---|
| ac | 1.77∗ | 0.16∗∗ | 0.92∗ |
| dtw | 1.75∗ | 0.48∗ | 1.03∗ |
| dfw | 1.39∗ | 0.74∗ | 0.70∗ |
For each feature, the significance of difference between classes (sleep/wake,
quiet-wake/sleep, and wake/REM sleep) was examined with a Mann-Whitney
test (∗ p < 0.0001 and ∗∗ p < 0.005).

Table 3.5: Feature correlation matrix
| Correlation (ρ)∗ | ac | dtw | dfw |
|---|---|---|---|
| ac | 1 | 0.32† | 0.26† |
| dtw | – | 1 | 0.26† |
| dfw | – | – | 1 |
∗ Spearman’s rank correlation coefficient.
† Significance of correlation was tested with a t-test, p < 0.0001.
The classification results obtained with each of the feature sets after LOSOCV are
summarized in Table 3.6, where both ‘averaged’ and ‘pooled’ results are presented. Note
that, for each feature set, the decision threshold (i.e., operating point) of the
classifier was chosen to optimize the Kappa coefficient (based on training sets) rather
than the overall accuracy because of the between-class imbalance of our data. As can be
seen in the table, the two DW-based features (i.e., FDW ) provide a pooled κ of 0.59, which
is comparable with that of the actigraphy feature (κ of 0.58). Combining them with the
actigraphy feature in FAC-DW , we achieved a pooled κ of 0.66 and a pooled accuracy of
95.7%. The table also presents the classification results obtained with FAR1 and FAR2 ,
indicating that the addition of DW-based features significantly improves the
classification performance. It also shows that the feature set FAC-DW performs
significantly better than FAR1 and comparably with FAR2 . For comparison, the results based
on the feature sets comprising actigraphy, respiratory, and cardiac features are also
provided in Table 3.6. No significant difference was found between FAC-DW , FARC1 , and
FARC2 . Figure 3.7 compares the pooled PR curves using different feature sets.
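Choosing the operating point by Cohen's Kappa rather than by accuracy can be sketched as
follows. The helper names (confusion_counts, cohen_kappa, best_threshold), the convention
that wake is the positive class (label 1), and the toy scores are assumptions of this
sketch and not the implementation used in this work.

```python
def confusion_counts(labels, predictions):
    """Return (TP, FP, FN, TN) with wake (1) assumed to be the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return 0.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def best_threshold(scores, labels):
    """Return (threshold, kappa) maximizing Kappa over the observed scores."""
    best = None
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        kappa = cohen_kappa(*confusion_counts(labels, preds))
        if best is None or kappa > best[1]:
            best = (thr, kappa)
    return best
```

With strongly imbalanced classes, a classifier that always predicts the majority class
reaches a high accuracy but a Kappa of zero, which is the motivation for selecting the
threshold on Kappa.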
The classifier’s learning curves (based on FAC-DW ) using LOSOCV are displayed in
Figure 3.8. They are plotted as pooled κ versus the number of subjects (varying from 2 to
15). The results on the training and test sets start converging rapidly from 4 or 5
subjects and become stable at 13 subjects, ultimately achieving a κ of ∼0.66. Because
nearly all of the 15 subjects are needed before the performance stabilizes, this confirms
that splitting the data into separate training and test sets is unsuitable in our
experiment.
[Figure 3.6 plot: three panels showing ac (a.u.), dtw (a.u.), and dfw (a.u.) for sleep and
wake epochs, per subject (1-15) and for the pool.]
Figure 3.6: Box plots (mean and SD) of the feature values of ac , dtw , and dfw for sleep and wake
epochs for each of the 15 subjects and for the pool of all these subjects.
[Figure 3.7 plot: precision versus recall curves for FAC, FR1, FDW, FAR1, and FAC-DW.]
Figure 3.7: PR curves for different features or feature sets. The corresponding classifier
operating points (representing κ ) are marked.
Table 3.6: Summary of sleep and wake classification results using LOSOCV
| Feature set | Precision (%) | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUCPR∗ | Kappa κ∗ |
|---|---|---|---|---|---|---|
| FAC | 64.5 (66.2 ± 20.9) | 57.5 (61.8 ± 16.1) | 97.4 (97.3 ± 2.0) | 94.4 (94.2 ± 1.7) | 0.66 (0.73 ± 0.09) | 0.58 (0.57 ± 0.07) |
| FDTW | 50.7 (51.0 ± 17.9) | 62.9 (67.7 ± 17.0) | 95.0 (94.9 ± 2.5) | 92.5 (92.4 ± 2.1) | 0.55 (0.60 ± 0.13) | 0.52 (0.51 ± 0.11) |
| FDFW | 43.3 (42.6 ± 11.8) | 50.8 (53.7 ± 11.9) | 94.5 (94.3 ± 2.3) | 91.2 (91.0 ± 2.9) | 0.43 (0.44 ± 0.10) | 0.41 (0.41 ± 0.08) |
| FR1 | 45.2 (51.6 ± 20.2) | 54.3 (52.9 ± 16.8) | 94.6 (93.8 ± 6.9) | 91.5 (90.9 ± 6.0) | 0.52 (0.55 ± 0.14) | 0.45 (0.44 ± 0.12) |
| FR2 | 64.2 (66.8 ± 20.5) | 56.0 (55.0 ± 19.3) | 97.3 (96.6 ± 3.0) | 94.2 (94.1 ± 2.6) | 0.64 (0.67 ± 0.16) | 0.57 (0.55 ± 0.17) |
| FDW | 63.5 (64.0 ± 18.9) | 59.9 (63.4 ± 16.2) | 97.3 (97.2 ± 1.9) | 94.3 (94.2 ± 2.2) | 0.64 (0.68 ± 0.12) | 0.59 (0.58 ± 0.11) |
| FAR1 | 70.6 (75.3 ± 29.2) | 60.5 (62.4 ± 18.7) | 97.8 (97.6 ± 2.7) | 95.0 (94.8 ± 2.3) | 0.68 (0.75 ± 0.12) | 0.62 (0.61 ± 0.12) |
| FAR2 | 75.0 (80.2 ± 20.1) | 62.9 (64.2 ± 20.6) | 98.1 (97.9 ± 2.8) | 95.6 (95.3 ± 2.4) | 0.73 (0.78 ± 0.13) | 0.66 (0.64 ± 0.15) |
| FAC-DW | 77.3 (79.1 ± 16.5) | 61.2 (64.5 ± 20.6) | 98.5 (98.3 ± 1.9) | 95.7 (95.5 ± 2.0) | 0.74 (0.78 ± 0.12) | 0.66 (0.65 ± 0.13) |
| FARC1 † | 76.9 (81.4 ± 17.0) | 60.3 (59.8 ± 19.1) | 98.5 (98.4 ± 2.3) | 95.6 (95.5 ± 2.9) | 0.72 (0.77 ± 0.12) | 0.65 (0.64 ± 0.14) |
| FARC2 † | 75.4 (79.8 ± 16.6) | 63.2 (63.0 ± 18.8) | 98.3 (98.1 ± 2.3) | 95.7 (95.5 ± 2.2) | 0.74 (0.77 ± 0.13) | 0.67 (0.65 ± 0.12) |
For each metric, the pooled and the averaged (between brackets) results over subjects are
provided. Results were chosen to optimize κ .
∗ Significance of difference between feature sets was examined with a t-test (with 14
degrees of freedom and p < 0.05). Normality of the results was confirmed with a Q-Q plot
method.
† Compared to the previous work [89, 177], a larger data set with 15 subjects was used; the
features (except the DW-based features) were selected from a larger feature set based on
the selection method described in [108].
[Figure 3.8 plot: Kappa coefficient versus number of subjects (2-15) for the training set
and the test set.]
Figure 3.8: Learning curves with LOSOCV by varying the number of subjects.
Table 3.7: Classification results with split training and test sets
| Training set | Test set | Accuracy (%) | AUCPR | Kappa κ |
|---|---|---|---|---|
| Boston | Boston | 96.0 | 0.80 | 0.71 |
| Boston | Eindhoven | 95.4 | 0.66 | 0.61 |
| Eindhoven | Eindhoven | 95.2 | 0.65 | 0.59 |
| Eindhoven | Boston | 95.5 | 0.78 | 0.67 |
Table 3.7 shows the classification results (pooled overall accuracy, AUCPR , and Kappa) of
using our actigraphy- and respiratory-based feature set FAC-DW by splitting training and test sets
with regard to lab (i.e., using the Boston set to train the classifier and testing it on the Eindhoven
set, and the other way around).
The results (absolute error and estimation bias) of the sleep statistics over subjects
using different actigraphy and respiratory feature sets (FAR1 and FAC-DW ) are summarized
and compared in Table 3.8. Using FAC-DW , we achieved significantly lower absolute errors
(t-test, p < 0.05) in estimating the sleep statistics than with FAR1 , with the exception
of ST. To compare the degree of agreement, Bland-Altman plots are shown in Figure 3.9. It
can be seen that the difference values of SE, TST, TWT, SOL, and WASO are less spread when
using FAC-DW than when using FAR1 , indicating a smaller variance (or a higher degree of
agreement) when estimating the sleep statistics with FAC-DW .
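The quantities shown in a Bland-Altman plot (per-subject average, difference, mean bias,
and 95% limits of agreement at ± 1.96 SD) can be computed as in the following sketch. The
function name bland_altman and the toy values are hypothetical, and the use of the sample
standard deviation is an assumption. The difference is taken as reference value minus
estimated value, following the definition of estimation bias in Table 3.8.

```python
import statistics


def bland_altman(reference, estimate):
    """Return (averages, differences, bias, (lower, upper)) over subjects.

    Differences are reference minus estimate; the limits of agreement are
    bias +/- 1.96 * SD of the differences (sample SD assumed).
    """
    diffs = [r - e for r, e in zip(reference, estimate)]
    averages = [(r + e) / 2 for r, e in zip(reference, estimate)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return averages, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Toy sleep-efficiency values (%) for three hypothetical subjects.
averages, diffs, bias, limits = bland_altman([90, 85, 95], [88, 86, 93])
```

Narrower limits of agreement correspond to the smaller spread of difference values
described above.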
Table 3.9 compares the computational complexity of the different DW approaches (A1, A2,
and A3). It shows that, when extracting DW-based features, using a warping band for DW and
constraining the template searching range reduce the computation time significantly
(t-test, p < 0.001) to an average of 0.53 s and 0.10 s for computing dtw and dfw of each
30-s epoch, respectively. On average, it takes approximately 7.5 min for the DTW feature
and 1.5 min for the DFW feature to compute all feature values of one night per subject.
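As a rough consistency check of these timings, the speed-up of A3 over A1 and the number of
epochs implied by the per-night time can be worked out from the reported averages. The
epoch count below is inferred from the reported times and is not a figure stated in this
chapter.

```latex
\[
\frac{t_{\mathrm{A1}}}{t_{\mathrm{A3}}} = \frac{2.29}{0.53} \approx 4.3 \ \text{(DTW)},
\qquad
\frac{t_{\mathrm{A1}}}{t_{\mathrm{A3}}} = \frac{0.44}{0.10} = 4.4 \ \text{(DFW)}
\]
\[
\frac{7.5 \times 60\ \text{s}}{0.53\ \text{s/epoch}} \approx 850\ \text{epochs}
\approx 850 \times 30\ \text{s} \approx 7.1\ \text{h of recording}
\]
```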
Table 3.8: Comparison of sleep statistics (mean ± SD over subjects)
| Sleep statistics | Absolute error, FAR1 | Absolute error, FAC-DW | Estimation bias∗, FAR1 | Estimation bias∗, FAC-DW |
|---|---|---|---|---|
| SE (%) | 3.3 ± 2.4 | 2.8 ± 2.1 | −0.1 ± 4.1 | −1.3 ± 3.3 |
| TST (min) | 13.7 ± 10.4 | 11.2 ± 7.9 | −0.43 ± 17.6 | −6.5 ± 12.4 |
| TWT (min) | 13.7 ± 10.4 | 11.4 ± 8.0 | 0.43 ± 17.6 | −6.9 ± 12.4 |
| SOL (min) | 6.4 ± 7.1 | 5.0 ± 6.8 | 3.9 ± 8.8 | 3.4 ± 7.8 |
| WASO (min) | 12.2 ± 10.4 | 7.1 ± 5.3 | −3.6 ± 15.9 | 3.6 ± 8.2 |
| ST (min) | 1.0 ± 1.3 | 1.1 ± 1.7 | 0.17 ± 1.7 | −0.13 ± 2.0 |
∗ For each subject, estimation bias was computed as reference value minus estimated value.
Table 3.9: Comparison of computational complexity for different DW-based feature
extraction approaches
| DW approach | Computational complexity | Average computation time (s), dtw (one epoch) | Average computation time (s), dfw (one epoch) |
|---|---|---|---|
| A1 | O(LN^2) | 2.29 ± 0.08 | 0.44 ± 0.02 |
| A2 | O(LrN)∗ | 1.43 ± 0.05 | 0.21 ± 0.01 |
| A3 | O(λrN)∗ | 0.53 ± 0.04 | 0.10 ± 0.03 |
∗ Here the parameters are rT = 60, rF = 5, λT = 200, and λF = 250.
3.5 Discussion
During the training step of each LOSOCV iteration, some parameters were determined by
evaluating the pooled AUCPR . The determined Sakoe-Chiba warping band for DTW (rT = 60) is
much larger than that for DFW (rF = 5). This is because, when computing the DTW distance
between two respiratory time series, the series usually start and end at different phases
of a breathing cycle. A larger DTW warping band allows a larger signal variation (caused
by differences in breathing phase, length, amplitude, etc.) between two epochs. It helps
compensate for the signal variation and thus makes it possible to find a better alignment
between them. On the other hand, when computing the DFW distance, the respiratory PSDs
were normalized between 0 and 1 so that the amplitude variation between epochs no longer
exists (no improvement in classification performance was observed without normalizing
them). Also, the PSDs usually have fewer peaks and no troughs compared with the time
series (see Figure 3.1). These properties yield a higher similarity between two
respiratory PSD series than between two respiratory time series. Besides, using a smaller
warping band for DFW avoids over-alignment between two PSD series, so that sleep and wake
can still be discriminated with respect to their minimum distance.
The searching window sizes for extracting DW-based features were also determined with
the grid search method. Since we relied on the observation that the minimum DW distance
for a sleep epoch is small, the potential disadvantage of restricting the search space is
alleviated by the fact that sleep epochs are usually not isolated in time, i.e., there
are, very likely, other sleep epochs close to any given (sleep) epoch during the night.
Furthermore, a larger searching window might not provide a better separation between the
sleep and wake classes. For instance, when analyzing a wake epoch, the inclusion of more
distant (in time) candidate templates might increase the likelihood of selecting a more
similar wake template. This would result in a smaller DW distance and thus decrease the
feature’s discriminative power. Here we found that the discriminative power of these two
features did not change dramatically when λ > 25 epochs.

[Figure 3.9 plots: two columns of Bland-Altman plots, using FAR1 (left) and FAC-DW (right),
each showing the difference versus the average for SE (%), TST (min), TWT (min), SOL (min),
WASO (min), and ST (min).]
Figure 3.9: Bland-Altman plots for sleep statistics estimated using FAR1 (Left) with data
points marked by “×” and FAC-DW (Right) with data points marked by “◦”. Data points in a
plot represent different subjects. Mean bias and 95% limits (± 1.96 SD) are shown as solid
and dashed lines, respectively.
The DW-based features performed well for sleep and wake classification. These features can
effectively encode differences in the waveform and PSD shape of the respiratory effort
between sleep and wake states. As shown in Table 3.6, when using only respiratory effort,
our DW-based feature set FDW offers a relative increase in κ of around 31% compared with
the existing respiratory feature set FR1 (i.e., κ of 0.59 versus 0.45), and it is
comparable with the well-known actigraphy (κ = 0.58). After combining actigraphy with the
respiratory effort signal, our DW-based features improved the classification performance
from κ = 0.62 to κ = 0.66, yielding a higher relative increase (∼14%) when compared with
actigraphy alone (κ = 0.58). The reason might be that the DW-based features (particularly
the DFW feature) better help distinguish between quiet-wake and sleep (see Table 3.4).
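The relative increases quoted above follow directly from the pooled κ values in Table 3.6.
The worked arithmetic is given here for convenience:

```latex
\[
\frac{0.59 - 0.45}{0.45} \approx 0.31 \quad (\text{DW-based features vs. } F_{R1}),
\qquad
\frac{0.66 - 0.58}{0.58} \approx 0.14 \quad (F_{AC\text{-}DW} \text{ vs. } F_{AC}),
\]
\[
\frac{0.66 - 0.62}{0.62} \approx 0.065 \quad (F_{AC\text{-}DW} \text{ vs. } F_{AR1}).
\]
```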
A previous study [126] presented a novel actigraphy-based algorithm for sleep and wake
classification, in which the authors reported an overall accuracy of ∼86%, a sleep
accuracy of ∼91%, and a wake accuracy of ∼69% for a group of 38 normal subjects. In [234],
the overall accuracy was ∼87% (measured in 14 healthy subjects). In this study, to make a
fair comparison, we varied the operating point of our classifier and obtained comparable
results based on actigraphy only. After combining it with the DW-based respiratory
features, as shown in Table 3.6, we achieved much better results.
It is known that wrist actigraphy ultimately measures the body (or more precisely, wrist)
movements during sleep, which have proved to be an indication of the wake state [18, 234].
To a certain extent, these movements are often reflected in the respiratory effort signal
as body motion artifacts during measurement. This can be observed in Figure 3.5, which
suggests a relatively high correlation between peaks in the actigraphy feature and the
respiratory effort series. As mentioned, the respiratory waveform and PSD shape not only
reflect the respiration information but also contain some information about body motion
artifacts. This means that the DW-based features might encode the artifact information in
both the time and the frequency domains. Table 3.5 supports this through the significant
correlation between ac and dtw (ρ = 0.32) and between ac and dfw (ρ = 0.26). These two
features (particularly the DTW feature) might help separate wake and REM sleep, resulting
in an improved classification when actigraphy is not provided (see Table 3.6).
The inclusion of the cardiac feature (i.e., using FARC2 ) did not significantly improve the
performance of sleep and wake classification (see Table 3.6). This means that a good
performance can still be obtained when using fewer physiological signal modalities.
However, it is still encouraged to explore new cardiac features that contain additional
information for discriminating between sleep and wake states, i.e., information that is
not contained in actigraphy and respiratory activity. Moreover, the κ of 0.59 with only
DW-based respiratory features is comparable with that of 0.60 reported in [249], where not
only respiratory but also cardiac information was used.
The results of using FAR2 (with six features) are comparable with those of using FAC-DW
(with three features). Since we aimed at evaluating the proposed new DW-based features,
they were simply combined with the other pre-selected features. Using more features does
not necessarily guarantee a better performance, and in some cases the performance may
even decrease. This is because features may be mutually correlated to some extent, and
thus some features are likely redundant. As a consequence, they may hardly contribute to
(or may even harm) the classification when the additional useful information they carry
is limited compared with the increase in noise level. Therefore, selecting features from a
larger feature set with the aim of removing feature set redundancy (e.g., correlation-based
feature selection [121]) merits further investigation.
As shown in Table 3.7, the sleep and wake classification results obtained on the Eindhoven
set remain worse than those on the Boston set, regardless of which set was used for
training. This might be associated with between-subject variability rather than a lab
effect. Thus, we cannot confidently conclude that a lab effect exists based on our data
set with a small number of subjects. Although results have been shown to be consistent
between labs [126], further study on a larger data set is encouraged.
By choosing different classifier operating points, we can obtain results that favor a
higher specificity or a higher sensitivity. In practice, the choice often depends on the
required accuracy in estimating the sleep statistics that are delivered to subjects. For
example, the operating point should be chosen to optimize the estimate of SOL for subjects
who might have insomnia, whereas for an overall assessment of sleep one can choose to
optimize the estimate of SE.
In addition, this study focused on healthy subjects with high sleep efficiencies (>86%)
rather than, e.g., insomniacs with low sleep efficiencies. However, it has been indicated
that distinguishing between sleep and wake states is more difficult in insomniacs than in
healthy subjects when using cardiorespiratory activity [85, 283] or actigraphy [175].
Although the DW-based features perform well in separating sleep and wake states for
healthy subjects, it remains necessary to evaluate how robust they are against low sleep
efficiency.
Finally, although the DW-based features seem computationally intensive compared with
many other existing features, it is still practically feasible to achieve an offline
classification of sleep and wake. In fact, recent research has developed a set of
techniques that can make the DW computation much faster and comparable with the Euclidean
alignment, so that DW is applicable to large data sets in real time [242]. Speeding up our
algorithms using these techniques is left for future work.
3.6 Conclusion
In this chapter, we proposed two new features extracted from respiratory effort based on
dynamic warping (DW) algorithms to enhance the performance of sleep and wake
classification. The features compared the shape (dis)similarity between two series (in the
time and frequency domains) for a given 30-s epoch and the other epochs within a
pre-determined window from an entire-night respiratory effort recording. The minimal
dissimilarity (measured by a DW distance) was computed as the feature value for this
epoch. To evaluate the sleep and wake classification performance, a linear discriminant
classifier was tested with a leave-one-subject-out cross-validation. By combining the two
DW-based features with a well-known actigraphy feature, we obtained a significantly
increased Cohen’s Kappa coefficient (κ = 0.66) compared with the use of the actigraphy
feature and the traditional respiratory features (κ = 0.62), and a significant improvement
over the use of actigraphy alone (κ = 0.58). This result is comparable with the κ of 0.67
obtained with a feature set comprising the DW-based features and the previously used
actigraphy and cardiorespiratory features. Furthermore, when using the respiratory signal
only, the DW-based features provided a large improvement compared with the existing
respiratory features (κ of 0.59 versus 0.45).
CHAPTER 4
Analysis of respiratory effort amplitude for sleep stage classification
This chapter is adapted from: X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing
respiratory effort amplitude for automated sleep stage classification. Biomedical Signal Processing and
Control, 14:197-205, 2014. © Elsevier
Abstract – Respiratory effort has been widely used for objective analysis of human sleep
during bedtime. Several features extracted from respiratory effort signal have succeeded in
automated sleep stage classification throughout the night such as variability of respiratory fre-
quency, spectral powers in different frequency bands, respiratory regularity and self-similarity.
In regard to the respiratory amplitude, it has been found that the respiratory depth is more ir-
regular and the tidal volume is smaller during rapid-eye-movement (REM) sleep than during
non-REM (NREM) sleep. However, these physiological properties have not been explicitly
elaborated for sleep stage classification. By analyzing the respiratory effort amplitude, we pro-
pose a set of 12 novel features that should reflect respiratory depth and volume, respectively.
They are expected to help classify sleep stages. Experiments were conducted with a data set
of 48 sleepers using a linear discriminant (LD) classifier and classification performance was
evaluated by overall accuracy and Cohen’s Kappa coefficient of agreement. Cross validations
(10-fold) show that adding the new features into the existing feature set achieved significantly
improved results in classifying wake, REM sleep, light sleep and deep sleep (Kappa of 0.38
and accuracy of 63.8%) and in classifying wake, REM sleep and NREM sleep (Kappa of 0.45
and accuracy of 76.2%). In particular, the incorporation of these new features helps
improve deep sleep detection to a greater extent (with a Kappa coefficient increasing from
0.33 to 0.43). We also show that calibrating the respiratory effort signals by means of
body movements and performing subject-specific feature normalization can ultimately yield
enhanced classification performance.
4.1 Introduction
According to the rules presented by Rechtschaffen and Kales (the R&K rules) [247], human
sleep is comprised of wake, rapid-eye-movement (REM) sleep and four non-REM (NREM)
sleep stages S1-S4. S1 and S2 are usually grouped as “light sleep” and S3 and S4 correspond
to slow-wave sleep (SWS) or “deep sleep” [276]. The gold standard for nocturnal sleep assess-
ment is overnight polysomnography (PSG) which is typically collected in a sleep laboratory.
With PSG, sleep stage is manually scored on each 30-s epoch throughout the night by trained
sleep experts, forming a sleep hypnogram [247]. PSG recordings usually contain multiple bio-
signals such as electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG), electromyography (EMG), respiratory effort, and blood oxygen saturation.
Respiratory information has been widely used for objectively assessing human nocturnal
sleep [95, 226, 281]. Detecting sleep stages over night is beneficial to the interpretation of
sleep architecture or monitoring of sleep-related disorders [102, 248]. Cardiorespiratory-based
automated sleep stage classification has been increasingly studied in recent years [158, 180,
249, 309, 312]. Some of those studies only made use of respiratory activity because, in
comparison, cardiac activity is more difficult to capture reliably in an unobtrusive
manner [158, 180]. For respiratory activity, in comparison with the breathing ventilation
acquired with traditional devices such as nasal prongs or a face mask [106], respiratory
effort can be obtained in an easier and more noninvasive way, e.g., using a respiratory
inductance plethysmography (RIP) sensor [73], an infrared camera [166], or a
pressure-sensitive bed sheet [264].
Several parameters have been derived from respiratory effort signals for sleep analysis,
including respiratory frequency, powers of different respiratory spectral bands [249],
respiratory self-similarity [180], and regularity [250]. These parameters are usually
called “features” in the tasks of epoch-by-epoch sleep stage classification. In addition,
it has been reported that the respiratory amplitude (e.g., depth and volume) differs
between sleep stages [95]. For instance, the “respiratory depth” is more irregular and the
tidal volume, minute ventilation, and inspiratory flow rate are significantly lower during
REM sleep than during NREM sleep (particularly during deep sleep) [67, 129]. To the
authors’ knowledge, these characteristics, which express different physiological
properties across sleep stages, have not been explicitly elaborated and quantified for
applications of sleep stage classification. We therefore exploit these characteristics by
analyzing the respiratory effort signal envelope and area. This motivates the design of
features that quantify these characteristics and that are in turn expected to help
separate different sleep stages.
It is assumed that the information about respiratory depth or volume is obtainable from the
respiratory effort signal. For instance, the signal (upper and lower) envelopes and area should
correspond to respiratory depth and volume, respectively. In fact, respiratory effort has often
been used as a surrogate of tidal volume since it is obtained by measuring motions of the rib cage or
abdomen with, e.g., RIP [73]. However, Whyte et al. [307] argued that this assumption does
not always hold, particularly when a sleeper changes his/her posture along with body move-
ments during sleep. This is because the respiratory effort amplitude might be affected by body
movements as the sensor position may shift and/or the sensor may be stretched. This will cause
an uneven comparison of the signal amplitude before and after body movements, yielding errors
when computing the feature values. In order to provide a more accurate estimate of respiratory
depth and volume from respiratory effort signal, we must calibrate the signal by means of body
movements. They can be quantified by analyzing the artifacts of respiratory effort signal (of-
ten in line with body movements) using a dynamic time warping (DTW)-based method [180].
DTW is a signal-matching algorithm that quantifies an optimal nonlinear alignment between
two time series allowing scaling and offset [37]. Our previous work [180] has proposed a DTW
measure to effectively capture body motion artifacts by measuring self-similarity of respiratory
effort. This measure has been successfully used as a feature for classifying sleep and wake
states in that work. Therefore, we simply adopted this measure to detect motion artifacts mod-
ulated by body movements in respiratory effort signals. Using the DTW-based method enables
the exclusion of an additional sensor modality (e.g., actigraphy) specifically used for detecting
body movements.
The focus of this work is exclusively on investigating a set of novel features that can
characterize respiratory amplitude in different aspects, with the ultimate goal of
improving sleep stage classification performance. Previous studies have shown that linear
discriminant (LD) is an appropriate algorithm for sleep stage classification [179, 248,
249]. Likewise, we simply adopted an LD classifier. Preliminary results of this work in
classifying REM and NREM sleep have been previously published [181].

4.2.1 Subjects and data
Data of 48 healthy subjects (21 males and 27 females) in the SIESTA project, supported by
the European Commission [160], were included in our data set. The subjects had a Pittsburgh
Sleep Quality Index (PSQI) [60] of no more than 5 and met several criteria (no shift work, no
depressive symptoms, usual bedtime before midnight, etc.). All the subjects signed an informed
consent form prior to the study, documented their sleep habits over 14 nights, and underwent
overnight PSG study for two consecutive nights (on day 7 and day 8) in sleep laboratories. The
PSG recordings collected on day 7 were used for analyses, from which the respiratory effort
signals (sampling rate of 10 Hz) were recorded with thoracic inductance plethysmography.
Sleep stages were manually scored on 30-s epochs as wake, REM sleep, or one of the NREM
sleep stages by sleep clinicians based on the R&K rules. For sleep stage classification, epochs
were labeled as four classes W (wake), R (REM sleep), L (light sleep), and D (deep sleep), or
three classes W, R, and N (NREM sleep).
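The relabeling of R&K stages into the four-class and three-class schemes can be written
down as a simple lookup, as sketched below. The input label strings ("W", "REM", "S1" to
"S4") and the function names are hypothetical; only the grouping itself (S1 and S2 as light
sleep, S3 and S4 as deep sleep, light and deep sleep together as NREM sleep) follows the
text.

```python
# Hypothetical R&K label strings mapped to the four classes W, R, L, D.
RK_TO_FOUR = {"W": "W", "REM": "R", "S1": "L", "S2": "L", "S3": "D", "S4": "D"}
# Four classes mapped to the three classes W, R, N.
FOUR_TO_THREE = {"W": "W", "R": "R", "L": "N", "D": "N"}


def to_four_class(rk_stages):
    """Map a sequence of R&K epoch labels to W, R, L, D."""
    return [RK_TO_FOUR[s] for s in rk_stages]


def to_three_class(four_class_stages):
    """Map a sequence of W, R, L, D labels to W, R, N."""
    return [FOUR_TO_THREE[s] for s in four_class_stages]


hypnogram = ["W", "S1", "S2", "S3", "S4", "REM"]
four = to_four_class(hypnogram)
three = to_three_class(four)
```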
From the data used in this study, the subject demographics and some sleep statistics [mean
± standard deviation (SD) and range] are summarized in Table 4.1.

Table 4.1: Summary of subject demographics and sleep statistics (N = 48)
| Parameter | Mean ± SD | Range |
|---|---|---|
| Age (y) | 41.3 ± 16.1 | 20 − 83 |
| Body mass index (kg/m2 ) | 23.6 ± 2.9 | 19.1 − 31.3 |
| Wake, W (%) | 12.9 ± 6.1 | 1.2 − 24.5 |
| REM sleep, R (%) | 19.0 ± 3.3 | 15.3 − 26.5 |
| NREM sleep, N (%) | 68.1 ± 4.9 | 56.1 − 76.3 |
| Light sleep, L (%) | 53.6 ± 5.5 | 42.7 − 66.7 |
| Deep sleep, D (%) | 14.5 ± 4.8 | 5.3 − 28.5 |

4.2.2 Signal preprocessing
The raw respiratory effort signals of all subjects were preprocessed before feature
extraction. They were filtered with a 10th order Butterworth low-pass filter with a cut-off
frequency of 0.6 Hz for the purpose of eliminating high frequency noise. Afterwards the
baseline was removed by subtracting the median peak-to-trough amplitude. To locate the
peaks and troughs, we identified the turning points simply based on the sign change of the
signal slope and then corrected the falsely detected ‘dubious’ peaks and troughs (1) with
too short intervals between peak and trough pairs, where the sum of two successive
intervals is less than the median of all intervals over the entire recording, and (2) with
too small amplitudes, where the peak-to-trough difference is smaller than 15% of the
median of the entire respiratory effort signal. These methods were validated by comparing
automatically detected results with manually annotated peaks and troughs, and an accuracy
of ∼98% was achieved.
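The turning-point detection and the small-amplitude check described above can be sketched
as follows. This is a simplified illustration and not the validated procedure: the function
names are hypothetical, plateaus are ignored by requiring a strict sign change, the
interval-based criterion (1) is omitted, and the 15% threshold is taken relative to the
median peak-to-trough difference, which is one possible reading of criterion (2).

```python
import statistics


def turning_points(x):
    """Return (peaks, troughs) as index lists where the slope changes sign."""
    peaks, troughs = [], []
    for i in range(1, len(x) - 1):
        left, right = x[i] - x[i - 1], x[i + 1] - x[i]
        if left > 0 and right < 0:
            peaks.append(i)
        elif left < 0 and right > 0:
            troughs.append(i)
    return peaks, troughs


def dubious_pairs(x, peaks, troughs, frac=0.15):
    """Flag peak/trough pairs whose amplitude difference is too small.

    Assumption of this sketch: the threshold is frac times the median
    peak-to-trough difference of the paired turning points.
    """
    pairs = list(zip(peaks, troughs))
    diffs = [abs(x[p] - x[t]) for p, t in pairs]
    if not diffs:
        return []
    threshold = frac * statistics.median(diffs)
    return [pair for pair, d in zip(pairs, diffs) if d < threshold]


signal = [0, 2, 0, -2, 0, 0.1, 0.05, 2, -2]
peaks, troughs = turning_points(signal)
flagged = dubious_pairs(signal, peaks, troughs)
```

In the toy signal, the small ripple around samples 5 and 6 produces a peak and a trough
that are flagged as dubious, whereas the full breathing cycle is kept.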
4.2.3 Existing respiratory features
A pool of 14 existing features extracted from the respiratory effort signal has been used in
previous studies for sleep stage classification. In the time domain, the mean and SD of breath
lengths (Lm and Lsd ) and the mean and SD of breath-by-breath correlations (Cm and Csd ) were
calculated [248]. In the frequency domain, we extracted features based on the respiratory effort
spectrum for each epoch where the spectrum was estimated using a short time Fourier transform
(STFT) with a Hanning window. From the spectrum the dominant frequency (Fr ) in the range of
0.05-0.5 Hz (estimated as the respiratory frequency) and the logarithm of its power (Fp ) were
obtained [248]. We also took the logarithm of the spectral power in the very low frequency
band between 0.01 and 0.05 Hz (VLF), low frequency band between 0.05 and 0.15 Hz (LF),
and high frequency band from 0.15 to 0.5 Hz (HF) and the ratio between LF and HF spectral
powers (LF/HF) [248, 249]. Furthermore the standard deviation of respiratory frequency over
5 epochs (Fsd ) was computed [249]. Non-linear features consist of self-similarity measured
between each epoch of interest and the other epochs by means of dynamic time and frequency
warping (Sdtw and Sdfw ) [180] and signal regularity estimated by sample entropy (Rse ) [250].
The latter was implemented with the PhysioNet toolkit sampen [170].
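The frequency-domain features can be illustrated with a minimal sketch based on a
Hann-windowed periodogram of one epoch. It is not the STFT implementation used in this
work: the function names, the direct DFT, the lack of power normalization, and the handling
of the band edges are assumptions made for illustration, and Fsd as well as the non-linear
features are not included.

```python
import cmath
import math

EPS = 1e-300  # floor to avoid log(0) in this sketch


def hann_periodogram(x, fs):
    """Return (freqs, power) of a Hann-windowed epoch using a direct DFT."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [a * w for a, w in zip(x, window)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xw))
        freqs.append(k * fs / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power


def band_power(freqs, power, lo, hi):
    """Sum the spectral power over lo <= f < hi."""
    return sum(p for f, p in zip(freqs, power) if lo <= f < hi)


def spectral_features(x, fs=10.0):
    """Dominant frequency, its log power, log band powers, and LF/HF ratio."""
    freqs, power = hann_periodogram(x, fs)
    in_band = [(p, f) for f, p in zip(freqs, power) if 0.05 <= f <= 0.5]
    peak_power, peak_freq = max(in_band)
    vlf = band_power(freqs, power, 0.01, 0.05)
    lf = band_power(freqs, power, 0.05, 0.15)
    hf = band_power(freqs, power, 0.15, 0.5)
    return {
        "Fr": peak_freq,
        "Fp": math.log(max(peak_power, EPS)),
        "VLF": math.log(max(vlf, EPS)),
        "LF": math.log(max(lf, EPS)),
        "HF": math.log(max(hf, EPS)),
        "LF/HF": lf / max(hf, EPS),
    }


# Toy epoch: 30 s of a 0.3 Hz sinusoid sampled at 10 Hz.
epoch = [math.sin(2 * math.pi * 0.3 * i / 10.0) for i in range(300)]
features = spectral_features(epoch, fs=10.0)
```

With a 30-s epoch the frequency resolution is about 0.033 Hz, so the VLF band contains
very few spectral bins in this sketch.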
[Figure 4.1 plot: four panels (Wake, REM sleep, Light sleep, Deep sleep) showing
respiratory effort (a.u.) versus time (s), with the peak sequence and the trough sequence
marked.]
Figure 4.1: A typical example of a 2-min (or 4-epoch) respiratory effort signal in wake, REM sleep,
light sleep and deep sleep. The peaks and troughs are represented by filled circles and filled squares,
respectively.
4.2.4 Respiratory amplitude features
4.2.4.1 Analysis of respiratory effort amplitude

Figure 4.1 illustrates four short segments of a respiratory effort signal during different
sleep stages. It is observed that the envelopes formed by the peak and trough sequences of
the signal during wake and REM sleep, when compared with those during light and deep
sleep, (1) are more ‘irregular’; (2) have a generally lower absolute mean or median; and
(3) have a larger variance. In addition, as illustrated in Figure 4.2, we also considered
the respiratory effort ‘area’ enclosed between the respiratory effort amplitude and its
mean value (zero in the example). As explained, this area should correlate with
respiratory volume to a certain extent, which differs across sleep stages. Relying on
these observations, several new respiratory amplitude features were explored in two
aspects, namely respiratory depth-based and volume-based features.
4.2.4.2 Depth-based features

A total of five depth-based features were extracted from the peak and trough sequences (i.e.,
upper and lower envelopes) of the respiratory effort signal. The amplitudes of these peaks and
[Figure 4.2: four panels (wake, REM sleep, light sleep, deep sleep) plotting Resp. effort (a.u.)
against Time (s), with inhalation and exhalation areas shaded and one breathing cycle marked in
each panel.]
Figure 4.2: A typical example of a 30-s (or one-epoch) respiratory effort signal in wake, REM sleep,
light sleep and deep sleep. The areas between the curves and the baseline are filled in light gray (inhala-
tion) and dark gray (exhalation). Examples of one breathing cycle period are indicated.
troughs should include the information in regard to respiratory depth. Let us consider
$p = p_1, p_2, \ldots, p_n$ and $t = t_1, t_2, \ldots, t_n$ the peak and trough sequences from a
window of 25 epochs or 12.5 min centered at the epoch under consideration, containing n peaks and
troughs, respectively. We thus computed the standardized median of the peaks (and troughs) by
dividing the median by their interquartile range (IQR, the difference between the third and the
first quartile), such that
$$P_{\mathrm{sdm}} = \frac{\mathrm{median}(p_1, p_2, \ldots, p_n)}{\mathrm{IQR}(p_1, p_2, \ldots, p_n)}, \tag{4.1}$$
$$T_{\mathrm{sdm}} = \frac{\mathrm{median}(t_1, t_2, \ldots, t_n)}{\mathrm{IQR}(t_1, t_2, \ldots, t_n)}. \tag{4.2}$$
These two features consider the mean respiratory depth and its variability at the same time in
terms of inhalation (for peaks) and exhalation (for troughs). Note that the period length of 25
epochs was chosen to maximize the average discriminative power of all respiratory amplitude
features in separating wake, REM sleep, light sleep, and deep sleep.
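As a minimal illustration of Equations 4.1 and 4.2, the standardized median can be sketched in Python as follows. The function name `standardized_median` is hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption, since the text does not specify how the quartiles were estimated.

```python
import statistics


def standardized_median(values):
    """Median divided by the interquartile range (Q3 - Q1) of a peak or trough sequence."""
    # The quartile interpolation method is an assumption of this sketch.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the standardized median is undefined")
    return statistics.median(values) / iqr
```

Applied to the peak amplitudes of a 25-epoch window this would give Psdm, and applied to the trough amplitudes it would give Tsdm.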
To examine how regular the envelopes are, we used the non-linear sample entropy mea-
sure, which has been broadly used in quantifying regularity of biomedical time series [250].
Considering a time series with n data points $u = u_1, u_2, \ldots, u_n$, let
$v(i) = u_i, u_{i+1}, \ldots, u_{i+m-1}$ ($1 \le i \le n-m+1$) be a subsequence of u, where the
window length m is a positive integer and m < n. Then for each i, we have
$B_{i,m}(r) = (n-m+1)^{-1}\,\eta(r)$, in which $\eta(r)$ is the number of j such that
$d_m[v(i), v(j)] \le r$ ($1 \le j \le n-m$, $j \ne i$), where the distance metric $d_m$ between two
subsequences v(i) and v(j) is given by $d_m[v(i), v(j)] = \max |u_{i+l} - u_{j+l}|$ for all
$l = 0, 1, \ldots, m-1$. For a higher dimension m + 1, we have $A_{i,m}(r)$. Then the sample entropy
of the time series u is defined by
$$SE = -\ln \frac{A^m(r)}{B^m(r)}, \tag{4.3}$$
where
$$A^m(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} A_{i,m}(r), \tag{4.4}$$
$$B^m(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} B_{i,m}(r). \tag{4.5}$$
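Similarly, the sample entropy measures of the peak and trough sequences Pse and Tse are computed as
$$P_{\mathrm{se}} = -\ln\left[\frac{A^m_{\mathrm{peak}}(r)}{B^m_{\mathrm{peak}}(r)}\right], \tag{4.6}$$
$$T_{\mathrm{se}} = -\ln\left[\frac{A^m_{\mathrm{trough}}(r)}{B^m_{\mathrm{trough}}(r)}\right], \tag{4.7}$$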
in which r is the tolerance that usually takes the value of 0.1-0.25 SD of the peak or the trough
sequence and m takes a value of 1 or 2 for the sequence of length n larger than 100 data points
[171, 250]. In our study, r of 0.20 SD of the sequence and m of 2 were experimentally chosen
to maximize the discriminative power of the two features.
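The sample entropy of a peak or trough sequence can be sketched in Python as below. This is a plain quadratic-time illustration, not the PhysioNet `sampen` tool referred to earlier; the function name `sample_entropy`, the use of the population SD for the tolerance, and returning infinity when no matches are found are assumptions of this sketch. Because the normalization constants cancel in the ratio, only the raw match counts are needed.

```python
import math
import statistics


def sample_entropy(u, m=2, r_factor=0.2):
    """Sample entropy -ln(A/B) with tolerance r = r_factor * SD of the sequence."""
    n = len(u)
    r = r_factor * statistics.pstdev(u)  # population SD is an assumption

    def count_matches(length):
        # Templates start at the first n - m positions for both dimensions m and m + 1.
        templates = [u[i:i + length] for i in range(n - m)]
        count = 0
        for i, a in enumerate(templates):
            for j, b in enumerate(templates):
                # Chebyshev distance between templates, excluding self-matches.
                if i != j and max(abs(x - y) for x, y in zip(a, b)) <= r:
                    count += 1
        return count

    b_count = count_matches(m)
    a_count = count_matches(m + 1)
    if a_count == 0 or b_count == 0:
        return math.inf  # undefined in this case; returning infinity is an assumption
    return -math.log(a_count / b_count)
```

With the settings reported here, Pse would correspond to `sample_entropy(peaks, m=2, r_factor=0.2)` and Tse to the same call on the trough sequence.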
Additionally, the median of the peak-to-trough differences expresses the range of inhale and
exhale depths. It was computed as
$$PT_{\mathrm{diff}} = \mathrm{median}\left[(p_1 - t_1), (p_2 - t_2), \ldots, (p_n - t_n)\right]. \tag{4.8}$$
4.2.4.3 Volume-based features

A total of seven volume-based features were extracted from the respiratory effort signal. They
should reflect certain properties of respiratory volume. The respiratory effort signal (sampled
at 10 Hz) over a window of 25 epochs or 12.5 min centered at the epoch of interest is expressed
as $s = s_1, s_2, \ldots, s_x, \ldots, s_M$ ($x = 1, 2, \ldots, M$), where M is the number of sample
points in this period. Suppose that $\Omega^{\mathrm{br}}_k$ is the kth breathing cycle in the epoch
where there are in total K consecutive breathing cycles ($k = 1, 2, \ldots, K$). Then the
corresponding kth inhalation and exhalation periods are $\Omega^{\mathrm{in}}_k$ and
$\Omega^{\mathrm{ex}}_k$, respectively. As illustrated in Figure 4.2, a breathing cycle is the
period between two consecutive troughs and thereby the inhalation and exhalation periods in
this breathing cycle are separated by the peak in between these two troughs. We first computed
the median respiratory volume (expressed by respiratory effort area) measured during breathing
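cycles (Vbr ), inhalation periods (Vin ), and exhalation periods (Vex ) for each epoch, such that
$$V_{\mathrm{br}} = \mathrm{median}\left(\sum_{s_x \in \Omega^{\mathrm{br}}_1} s_x,\ \sum_{s_x \in \Omega^{\mathrm{br}}_2} s_x,\ \ldots,\ \sum_{s_x \in \Omega^{\mathrm{br}}_K} s_x\right), \tag{4.9}$$
$$V_{\mathrm{in}} = \mathrm{median}\left(\sum_{s_x \in \Omega^{\mathrm{in}}_1} s_x,\ \sum_{s_x \in \Omega^{\mathrm{in}}_2} s_x,\ \ldots,\ \sum_{s_x \in \Omega^{\mathrm{in}}_K} s_x\right), \tag{4.10}$$
$$V_{\mathrm{ex}} = \mathrm{median}\left(\sum_{s_x \in \Omega^{\mathrm{ex}}_1} s_x,\ \sum_{s_x \in \Omega^{\mathrm{ex}}_2} s_x,\ \ldots,\ \sum_{s_x \in \Omega^{\mathrm{ex}}_K} s_x\right). \tag{4.11}$$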
In addition, we computed the median respiratory “flow rate” (expressed by the respiratory
effort area over time) during breathing cycles (FRbr ), inhalation periods (FRin ), and exhalation
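periods (FRex ), such that
$$FR_{\mathrm{br}} = \mathrm{median}\left(\frac{1}{\tau^{\mathrm{br}}_1}\sum_{s_x \in \Omega^{\mathrm{br}}_1} s_x,\ \frac{1}{\tau^{\mathrm{br}}_2}\sum_{s_x \in \Omega^{\mathrm{br}}_2} s_x,\ \ldots,\ \frac{1}{\tau^{\mathrm{br}}_K}\sum_{s_x \in \Omega^{\mathrm{br}}_K} s_x\right), \tag{4.12}$$
$$FR_{\mathrm{in}} = \mathrm{median}\left(\frac{1}{\tau^{\mathrm{in}}_1}\sum_{s_x \in \Omega^{\mathrm{in}}_1} s_x,\ \frac{1}{\tau^{\mathrm{in}}_2}\sum_{s_x \in \Omega^{\mathrm{in}}_2} s_x,\ \ldots,\ \frac{1}{\tau^{\mathrm{in}}_K}\sum_{s_x \in \Omega^{\mathrm{in}}_K} s_x\right), \tag{4.13}$$
$$FR_{\mathrm{ex}} = \mathrm{median}\left(\frac{1}{\tau^{\mathrm{ex}}_1}\sum_{s_x \in \Omega^{\mathrm{ex}}_1} s_x,\ \frac{1}{\tau^{\mathrm{ex}}_2}\sum_{s_x \in \Omega^{\mathrm{ex}}_2} s_x,\ \ldots,\ \frac{1}{\tau^{\mathrm{ex}}_K}\sum_{s_x \in \Omega^{\mathrm{ex}}_K} s_x\right), \tag{4.14}$$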
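in which $\tau^{\mathrm{in}}_k$ and $\tau^{\mathrm{ex}}_k$ are the kth inhalation and exhalation
times (unit: 100 ms)
$$\tau^{\mathrm{in}}_k = \max_{s_x \in \Omega^{\mathrm{in}}_k}(x) - \min_{s_x \in \Omega^{\mathrm{in}}_k}(x), \tag{4.15}$$
$$\tau^{\mathrm{ex}}_k = \max_{s_x \in \Omega^{\mathrm{ex}}_k}(x) - \min_{s_x \in \Omega^{\mathrm{ex}}_k}(x), \tag{4.16}$$
and accordingly the time of the kth breathing cycle is given by
$$\tau^{\mathrm{br}}_k = \tau^{\mathrm{in}}_k + \tau^{\mathrm{ex}}_k. \tag{4.17}$$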
The ratio of the inhalation and the exhalation flow rates FRin and FRex was finally computed as
$$RT_{\mathrm{fr}} = \frac{FR_{\mathrm{in}}}{FR_{\mathrm{ex}}}. \tag{4.18}$$
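The seven volume-based features can be sketched in Python as follows, assuming the sample indices of the troughs and peaks are already available. The function name `volume_features` and its arguments are hypothetical. Two boundary conventions are assumptions of this sketch: the inhalation period runs from a trough to the following peak and the exhalation period from that peak to the next trough (both inclusive), so that the durations, counted in samples of 100 ms at 10 Hz, satisfy Equation 4.17. The sums are signed, following the equations as written.

```python
import statistics


def volume_features(s, troughs, peaks):
    """Median areas and flow rates over breathing cycles delimited by consecutive troughs.

    s       : respiratory effort samples (10 Hz assumed, so one sample = 100 ms)
    troughs : sorted sample indices of K + 1 troughs
    peaks   : sample indices of the K peaks, peaks[k] lying between troughs[k] and troughs[k + 1]
    """
    lists = {name: [] for name in ("Vbr", "Vin", "Vex", "FRbr", "FRin", "FRex")}
    for k in range(len(troughs) - 1):
        start, peak, end = troughs[k], peaks[k], troughs[k + 1]
        if not start < peak < end:
            raise ValueError("each peak must lie strictly between two consecutive troughs")
        area_br = sum(s[start:end + 1])   # whole breathing cycle
        area_in = sum(s[start:peak + 1])  # trough -> peak (boundary convention assumed)
        area_ex = sum(s[peak:end + 1])    # peak -> next trough (boundary convention assumed)
        tau_in = peak - start             # Eq. 4.15, in samples
        tau_ex = end - peak               # Eq. 4.16, in samples
        tau_br = tau_in + tau_ex          # Eq. 4.17
        lists["Vbr"].append(area_br)
        lists["Vin"].append(area_in)
        lists["Vex"].append(area_ex)
        lists["FRbr"].append(area_br / tau_br)
        lists["FRin"].append(area_in / tau_in)
        lists["FRex"].append(area_ex / tau_ex)
    feats = {name: statistics.median(vals) for name, vals in lists.items()}
    feats["RTfr"] = feats["FRin"] / feats["FRex"]  # Eq. 4.18; undefined if FRex is zero
    return feats
```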
[Figure 4.3: two panels plotted against Time (min) over the night: (a) Resp. effort (a.u.) and
(b) DTW measure (a.u.) with the threshold marked.]
Figure 4.3: An example of (a) an overnight respiratory effort signal and (b) the corresponding epoch-
based DTW measure, where the threshold (0.01) for identifying epochs with body movements is indi-
cated.
4.2.4.4 Signal calibration by body movements

As mentioned, the respiratory amplitude features are sensitive to body motion artifacts. We
therefore calibrated the respiratory effort signal before computing these features. Each signal
segment lying between two consecutive epochs detected as containing body movements was
normalized to zero mean and unit variance. As mentioned in Section 4.1, a DTW-based method
measuring the respiratory similarity between each epoch and its adjacent epochs using DTW
distance [37] was applied to estimate the body movements. For the details of computing the
DTW measure we refer to our previous work [180]. Here an epoch was identified as containing
body movements if its DTW measure (expressing body motion artifacts) was larger than
a threshold. A threshold of 0.01 was experimentally found to be adequate for this purpose.
Figure 4.3 compares an overnight preprocessed respiratory effort signal from one subject with the
corresponding epoch-based DTW measure, where the peaks (reflecting body movements) are
well aligned along the time axis.
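The calibration step can be illustrated with the following Python sketch, which assumes that one DTW measure per epoch has already been computed. The function name `calibrate_signal` is hypothetical; leaving the movement epochs themselves unchanged and using the population SD are assumptions, since the text does not specify either.

```python
import statistics


def calibrate_signal(signal, dtw_measure, samples_per_epoch=300, threshold=0.01):
    """Standardize each run of movement-free epochs to zero mean and unit variance.

    dtw_measure holds one value per epoch; epochs above the threshold are treated as
    containing body movements and are left unchanged (an assumption of this sketch).
    """
    out = list(signal)
    n_epochs = len(dtw_measure)
    moving = [d > threshold for d in dtw_measure]
    start = None
    for e in range(n_epochs + 1):
        quiet = e < n_epochs and not moving[e]
        if quiet and start is None:
            start = e  # a new movement-free segment begins
        elif not quiet and start is not None:
            lo, hi = start * samples_per_epoch, e * samples_per_epoch
            segment = signal[lo:hi]
            mu = statistics.fmean(segment)
            sd = statistics.pstdev(segment)
            if sd > 0:
                out[lo:hi] = [(x - mu) / sd for x in segment]
            start = None
    return out
```

The default of 300 samples per epoch corresponds to 30-s epochs sampled at 10 Hz.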
4.2.5 Subject-specific feature normalization
Following the feature extraction procedure as described above, we performed a subject-specific
Z-score normalization for each feature. It was done per subject/recording by subtracting the
mean of the feature values and dividing by their standard deviation. This reduces physiological
and equipment-related variations from subject to subject, thereby enhancing the
discrimination between sleep stages.
4.2.6 Classifier
An LD classifier was used for sleep stage classification in this study. The prior probabilities
of the different classes (i.e., sleep stages) used by LD have been observed to change over time. To
exploit this change, we calculated a time-varying prior probability for each epoch by counting
the relative frequency with which that specific epoch index was labeled as each class [179, 248, 249].
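A minimal Python sketch of such a time-varying prior is given below. The function name `time_varying_priors` and the representation of each training hypnogram as a list of per-epoch labels are hypothetical; no smoothing is applied, so a class never observed at a given epoch index receives a prior of zero, which a practical implementation may wish to avoid.

```python
from collections import Counter


def time_varying_priors(hypnograms, classes):
    """Per-epoch-index class priors estimated as relative label frequencies.

    hypnograms : list of label sequences (one per training recording)
    classes    : iterable of class labels, e.g. ("W", "R", "L", "D")
    """
    max_len = max(len(h) for h in hypnograms)
    priors = []
    for t in range(max_len):
        # Only recordings that are long enough contribute to epoch index t.
        labels = [h[t] for h in hypnograms if t < len(h)]
        counts = Counter(labels)
        priors.append({c: counts[c] / len(labels) for c in classes})
    return priors
```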
4.2.7 Experiments and evaluation
4.2.7.1 Cross validation

A 10-fold cross validation (10-fold CV) was conducted in our experiments. The subjects were
first randomly divided into 10 subsets, yielding 8 subsets with 5 subjects each and 2 subsets with
4 subjects each. During each iteration of the 10-fold CV procedure, data from 9 subsets were
used to train the classifier and the remaining one was used for testing. After CV, the classification
results obtained for each subject in each iteration's testing set were collected and performance
metrics (averaged or pooled over all subjects) were computed to evaluate the classifier.
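The subject-wise partitioning can be sketched in Python as follows. The function names `subject_folds` and `cv_splits`, the fixed random seed, and the round-robin assignment after shuffling are illustrative assumptions; with 48 subjects this scheme yields 8 folds of 5 subjects and 2 folds of 4 subjects, as described above.

```python
import random


def subject_folds(subject_ids, n_folds=10, seed=0):
    """Randomly partition subjects into n_folds disjoint, nearly equal-sized subsets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    # Round-robin assignment after shuffling keeps fold sizes within one of each other.
    return [ids[i::n_folds] for i in range(n_folds)]


def cv_splits(folds):
    """Yield (train_subjects, test_subjects) for each cross-validation iteration."""
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test
```

Splitting by subject rather than by epoch ensures that no recording contributes to both the training and the testing set within an iteration.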
4.2.7.2 Feature evaluation and ranking

We first compared the values of the new respiratory amplitude features in different sleep stages
to see whether they differ significantly between sleep stages. This gives a first indication of
their suitability for detecting sleep stages. For each of them, an unpaired Mann-Whitney test
(two-sided) was applied to examine the significance of difference.
To assess the discriminative power or class separability of each single feature in separat-
ing different classes, the information gain (IG) [239] metric was employed. IG describes the
change in information entropy caused by knowing the feature values. A higher
discriminative power of a feature is reflected by a larger IG value, and vice versa. In this study the
discriminative power of the new features (in separating wake, REM sleep, light sleep, and deep
sleep) with and without calibrating the respiratory effort signal and with and without perform-
ing subject-specific normalization were compared. To examine which sleep stage they are able
to detect best, we compared their IG values (after signal calibration and feature normalization)
in discriminating between each stage and all the other stages as a whole. The new features in
combination with the existing features were ranked by IG which serves to select features.
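For illustration, IG can be computed for a continuous feature as the reduction in class entropy after discretizing the feature values. The Python sketch below uses equal-width binning, which is an assumption (the discretization used for IG in [239] is not restated here); the function names `entropy`, `information_gain` and `rank_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, n_bins=10):
    """IG = H(class) - H(class | binned feature), with equal-width bins (assumed)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant feature: all values fall into one bin
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    h_conditional = 0.0
    for b in set(bins):
        subset = [lab for lab, bb in zip(labels, bins) if bb == b]
        h_conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - h_conditional


def rank_features(feature_table, labels, n_bins=10):
    """Return feature names sorted by IG in descending order."""
    scores = {name: information_gain(vals, labels, n_bins)
              for name, vals in feature_table.items()}
    return sorted(scores, key=scores.get, reverse=True)
```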
During each 10-fold CV iteration, features were first ranked by means of the discriminative
power (measured by IG) in a descending order based on the associated training set. Afterwards,
a certain number of top-ranked features were selected. With this approach, we would get 10
feature subsets for all the 10 iterations. To compare the classification performance using different
numbers of features, we plotted the performance metric versus the number of selected features
and then reported the best results. Note that the feature ranking and thus the selected features
may change during each iteration of the cross validation. We allowed for this during our ex-
periments since we found that the feature rankings in different iterations were similar for the
relatively large-sized training data sets (with 43 or 44 overnight recordings) used in this study.
4.2.7.3 Classification performance evaluation

We evaluated the performance of several sleep stage classification tasks. They are (1) two
multiple-stage classification tasks: WRLD (classification of W, R, L, and D) and WRN (clas-
sification of W, R, and N); and (2) four detection tasks: W, R, D, and N (binary classification
between each of them versus all the other stages).
To evaluate the performance of classifiers, the conventional metric of overall accuracy was
considered. However, the high class imbalance makes this metric less appropriate. For instance,
the wake epochs account for an average of only 12.9% of all the epochs throughout the night
while light sleep constitutes 53.6% of the night. Cohen’s Kappa coefficient of agreement [72],
which has often been used in the area of sleep stage classification, is considered to be
a better criterion for this problem. By factoring out chance agreement, it is not sensitive to class
imbalance. It therefore offers a better understanding of the general performance of the
classifier in correctly identifying different classes. For the binary classification tasks, we chose
the classifier decision-making threshold leading to the maximum pooled Kappa, and
with this threshold the mean and SD of the overall accuracy and Kappa over all subjects were
computed.
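The effect of class imbalance on accuracy versus Kappa can be illustrated with a short Python sketch of Cohen's Kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's Kappa between reference labels and classifier output."""
    n = len(reference)
    p_observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal label frequencies of both label sequences.
    p_expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For example, a classifier that labels every epoch as sleep on a recording with 9 sleep epochs and 1 wake epoch reaches an accuracy of 90% but a Kappa of 0, since all of its agreement is attributable to chance.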
For each classification task, the 10-fold CV using the LD classifier was conducted with
the feature sets comprising the existing pool of 14 respiratory features (set “exist”) and the
combination of the existing features and the new respiratory amplitude features (set “all”). In
addition, we also compared the classification results obtained using features with and without
performing subject-specific (Z-score) normalization. A paired Wilcoxon signed-rank test (two-
sided) was applied to test the significance of difference between classification performances.
4.3 Results
As shown in Figure 4.4, the respiratory amplitude features were found to significantly differ
across sleep stages. This means that the information regarding respiratory depth and volume
estimated from respiratory effort, which are indicators of some properties of respiratory phys-
iology, is not independent of sleep stages and therefore it can be in turn used to classify sleep
stages.
Figure 4.5 compares their discriminative power in separating wake, REM sleep, light sleep
and deep sleep with and without respiratory signal calibration (by means of body motion arti-
facts) and subject-specific feature normalization. Mostly, by calibrating the respiratory effort
and normalizing the features per subject, the IG values of these new features were increased.
The discriminative powers of all the 26 respiratory features for different classification tasks are
presented in Figure 4.6. We note that the respiratory amplitude features rank higher than most
existing features for multiple-stage classifications and NREM sleep detection. Psdm and Tsdm
(reflecting the variability of depth) perform better in detecting deep sleep; Pse and Tse (reflect-
ing the regularity of respiratory depth) have a relatively larger power in distinguishing between
wake and sleep. It can be seen that the volume-based features (with an exception of RTfr ) have
higher discriminative powers in detecting REM sleep.
[Figure 4.4: twelve boxplot panels, one per feature (Psdm, Tsdm, Pse, Tse, PTdiff, Vbr, Vin, Vex,
FRbr, FRin, FRex, RTfr), each showing the feature value (a.u.) for classes W, R, L and D.]
Figure 4.4: Boxplots of values of the 12 respiratory amplitude features (with signal calibration and
subject-specific normalization) in different classes (W, R, L and D). Outliers are not shown in order to
visualize the boxes more clearly. A significant difference was found between every pair of classes
for each feature using an unpaired Mann-Whitney test at p < 0.01.
[Figure 4.5: bar charts of IG per feature (Psdm, Tsdm, Pse, Tse, PTdiff, Vbr, Vin, Vex, FRbr, FRin,
FRex, RTfr), without and with signal calibration; panel (a) without and panel (b) with
subject-specific normalization.]
Figure 4.5: Comparison of discriminative power (as measured by IG) of all the 12 respiratory amplitude
features without and with calibrating the respiratory effort signals for WRLD classification, where the
values (a) without and (b) with subject-specific feature normalization are both presented. IG was
computed by pooling epochs over all subjects.
Figure 4.7 illustrates the average Cohen’s Kappa coefficient versus the number of features
(ranked and selected by IG values) used for different classification tasks. For most tasks the
classification performance obtained using the feature set “all” is always better than that ob-
tained using the feature set “exist” when the number of selected features is larger than a certain
value. The overall accuracy and Kappa coefficient with the number of selected features yield-
[Figure 4.6: bar charts of IG per feature for panels (a) WRLD classification, (b) WRN classification,
(c) W detection, (d) R detection, (e) D detection and (f) N detection, distinguishing existing
respiratory features from respiratory amplitude features. Feature order along the x-axis: Fsd, Tsdm,
Psdm, Pse, FRbr, Vbr, Vex, FRin, Tse, FRex, Rse, Sdfw, Vin, Fr, Sdtw, LF, HF, Cm, LF/HF, Lsd, Lm,
VLF, Fp, Csd, PTdiff, RTfr.]
Figure 4.6: Discriminative power of all the 26 respiratory features (with signal calibration and subject-
specific feature normalization) for (a) WRLD classification, (b) WRN classification, (c) W detection, (d)
R detection, (e) D detection, and (f) N detection. The features were ranked by IG (computed by pooling
epochs over all subjects) for WRLD classification in a descending order.
ing maximum Kappa are summarized in Table 4.2. We see that, on the one hand, normalizing
the features per subject largely increased the sleep stage classification performance for all the
classification tasks. It also shows that, to a certain extent, this method is able to reduce between-
subject variability in respiratory physiology (by comparing their SD). On the other hand, com-
bining the existing and the new respiratory amplitude features resulted in significantly improved
results except for wake detection. In particular, the relatively large improvement in detecting
deep sleep epochs (Kappa of 0.43 ± 0.19 versus 0.33 ± 0.17) indicates that the new features
can benefit the deep sleep detection most.
Table 4.3 compares the performance of our sleep stage classifiers (for multiple stages) with
those reported in literature. For instance, Hedner et al. [127] presented a Kappa of 0.48 and an
[Figure 4.7: Kappa coefficient versus number of selected features (# Features) for tasks W, R, D, N,
WRLD and WRN; panels (a) feature set “exist” without normalization, (b) feature set “all” without
normalization, (c) feature set “exist” with normalization, (d) feature set “all” with normalization.]
Figure 4.7: Kappa coefficient of sleep stage classification versus the number of selected features ranked
by their IG values in a descending order. Results were obtained based on 10-fold CV using feature set (a)
“exist” and (b) “all” without subject-specific feature normalization and using feature set (c) “exist” and
(d) “all” with subject-specific feature normalization. WRLD: classification of wake, REM sleep, light
sleep and deep sleep; WRN: classification of wake, REM sleep and NREM sleep; W: wake detection; R:
REM sleep detection; D: deep sleep detection; N: NREM sleep detection.
overall accuracy of 65.4% in classifying wake, REM sleep, light sleep and deep sleep, which
outperforms our results; however, they used more signal modalities such as peripheral arterial tone,
pulse rate, oxyhemoglobin saturation and actigraphy. With respect to WRN classification,
although Redmond et al. [249] obtained better results compared with our study, they included
more signal modalities including cardiac activity. Besides, our results are slightly better than
those reported in some other studies, e.g., Kappa of 0.42 by Mendez et al. [197] and Kappa of
0.44 by Kortelainen et al. [161], where they considered ballistocardiogram (BCG) that contains
also cardiac information. Furthermore, when only using respiratory activity, Sloboda et al.
[277] achieved an overall accuracy of ∼70% (with 9 respiratory features using a naive Bayes
classifier), which is lower than that presented in this chapter.
4.4 Discussion
The respiratory effort signals were calibrated using the DTW-based method. The DTW measure
has been proven to be in association with body movements [180], where a significant Spear-
man’s rank correlation coefficient (r = 0.32, p < 0.0001) was reported. Further, we obtained
a higher correlation (r = 0.56, p < 0.0001) between the quantified body movements using the
DTW-based method (where the DTW measures lower than 0.01 were set to be zero) and activ-
ity counts computed using actigraphy based on the data set used in that study. We also tested
Table 4.2: Summary of sleep stage classification performance (10-fold CV) using feature set “exist”
and “all” with and without performing subject-specific feature normalization

                         Without normalization                  With normalization
Task         Feat. set   #    Acc. (%)        Kappa             #    Acc. (%)       Kappa
WRLD         Exist       14   58.4 ± 6.8      0.26 ± 0.12       13   61.7 ± 6.9     0.32 ± 0.11
             All         25   59.2 ± 8.6∗     0.29 ± 0.14∗      24   63.8 ± 8.1∗    0.38 ± 0.14∗
WRN          Exist       14   71.7 ± 7.4      0.32 ± 0.14       13   75.0 ± 6.7     0.41 ± 0.13
             All         25   72.3 ± 8.1∗     0.34 ± 0.15∗      23   76.2 ± 7.9∗    0.45 ± 0.15∗
W detection  Exist       6    89.8 ± 6.3      0.49 ± 0.16       10   90.1 ± 4.2     0.50 ± 0.14
             All         9    89.8 ± 6.2∗     0.49 ± 0.16∗      15   90.3 ± 4.1∗    0.51 ± 0.15∗
R detection  Exist       14   79.4 ± 7.8      0.29 ± 0.19       14   82.0 ± 5.6     0.39 ± 0.20
             All         26   79.9 ± 7.6∗     0.31 ± 0.19∗      26   82.7 ± 5.8∗    0.44 ± 0.20∗
D detection  Exist       12   84.6 ± 4.9      0.26 ± 0.19       10   84.9 ± 4.3     0.33 ± 0.17
             All         8    86.1 ± 4.1∗     0.34 ± 0.22∗      5    86.1 ± 4.1∗    0.43 ± 0.19∗
N detection  Exist       13   72.8 ± 10.8     0.40 ± 0.17       14   75.2 ± 8.0     0.44 ± 0.17
             All         23   73.3 ± 11.6∗    0.42 ± 0.19∗      25   76.8 ± 8.7∗    0.48 ± 0.18∗

#: number of selected features. For each feature set, the results obtained using the selected features
leading to maximum Kappa coefficient are reported (see Figure 4.7). Significance of difference between
the results obtained using feature set “exist” and “all” was examined with a paired two-sided Wilcoxon
signed-rank test (∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001, NS: not significant). For all metrics,
significant difference was found between the results obtained with and without subject-specific
feature normalization at p < 0.01 except for wake detection.
the sensitivity to the threshold and found that the discriminative power of the respiratory
amplitude features did not change dramatically when the threshold ranged between ∼0.005
and ∼0.013. To analyze the adequacy of this method for sleep stage classification, we
compared the discriminative power as well as the classification performance of these new features
between using actigraphy [181] and using the DTW-based method to calibrate the respiratory
effort signals. The results are comparable. This suggests that the DTW measure is an adequate
surrogate for actigraphy in identifying body movements and is therefore effective in mitigating
the effect of body motion artifacts on computing the respiratory amplitude features.
As stated in Sections 4.2.4.2 and 4.2.4.3, the respiratory amplitude features were computed with
a window of 25 epochs (12.5 min). This served to capture the changes of respiratory depth and
volume and to provide reliable regularity measures of the peak/trough sequences using
sample entropy with sufficient data points. Additionally, we hypothesized that the respiratory effort
area can accurately represent breathing tidal volume or ventilation when extracting the
respiratory volume-based features. However, this hypothesis does not always hold, in particular
for subjects who change their posture during sleep [307]. In those cases these features might
be inaccurately computed, thus harming classification performance. This challenge should be
further studied.
Table 4.3: Comparison of multiple-stage sleep classification performance between this work and
studies reported in literature

Task   First author/year        Modality   N     # Feat.   Classifier   Acc. (%)   Kappa
WRLD   Hedner 2011 [127]        PAT        227   –         zzzPAT       65.4       0.48
       Isa 2011 [138]           ECG        16    9         RF           60.3       0.26
       This work                RE         48    26        LD           63.8       0.38
WRN    Redmond 2007 [249]       ECG,RE     31    30        LD           76.1       0.45
       Mendez 2010 [198]        BCG†       17    46        KNN          72.0       0.42
       Kortelainen 2010 [161]   BCG‡       18    4         HMM          79.0       0.44
       Sloboda 2011 [277]       RE         16    9         NB           ∼70        –
       Xiao 2013 [312]          ECG        45    41        RF           72.6       0.46
       This work                RE         48    26        LD           76.2       0.45

For signal modalities – PAT includes peripheral arterial tone, pulse rate, oxyhemoglobin saturation, and
actigraphy; RE, respiration; BCG, ballistocardiogram measured with bed sensor († BCG with
cardiorespiratory activity and body movement and ‡ BCG with cardiac activity and body movement).
For classifier – zzzPAT, a sleep staging algorithm developed by Herscovici et al. [131]; RF, random
forest; LD, linear discriminant; HMM, hidden Markov model; NB, naive Bayes; KNN, k-nearest neighbor.
Although the addition of the respiratory amplitude features resulted in enhanced performance
in WRLD and WRN classifications (Table 4.2), the improvements seem relatively modest
in general. One explanation is that these new features are correlated with the existing
features as discussed before and the additional information is limited. Upon a closer look, we
found that the new features contributed more to deep sleep detection than to other detection tasks.
As a result, this would yield relatively lower performance improvements for multiple-stage
classifications since deep sleep only accounts for an average of 14.5% of the entire night.
As shown in Table 4.2, the new features could not help improve wake detection. Actually, the
existing features Sdtw and Sdfw have been shown to be reliable in detecting wake epochs with
body movements in our previous study [180]. In this work, to focus more on the respiratory
depth and volume properties without being influenced by body movements, we excluded the
‘dubious’ peaks and troughs (see Section 4.2), some of which are possibly body motion
artifacts that are often an indication of wake epochs. Therefore, the new features here might not
be able to help detect ‘quiet wake’ (wakefulness without body movements). Nevertheless, the
effect of body movements on the respiratory depth and volume needs to be further studied.
In addition, we observe that the variation of sleep stage classification results between sub-
jects still remains high (see Table 4.2). For instance, the average Kappa values of WRLD and
WRN classifications over all subjects are 0.38 ± 0.14 and 0.45 ± 0.15, respectively. This is
mainly caused by large physiological differences between subjects in the way sleep stages are
expressed in respiratory features, which naturally leads to difficulties in enhancing the
classification performance for some subjects. Therefore, it is still worth investigating methods to
reduce the between-subject variability of the features.
In this work we selected features solely based on their discriminative power measured by
IG. This approach did not take the correlation or relevance between features into account, so
some of them are likely redundant to some extent. On average, the maximum absolute
Spearman’s rank correlation coefficient |r|max between each new feature and the existing
features is 0.35 ± 0.11 (ranging from 0.07 to 0.46 for different new features, p < 0.01). For
instance, the highest correlation (r = 0.46, p < 0.0001) occurs between Fsd and Tsdm ,
indicating that the variation of respiratory frequency is highly correlated with respiratory depth and
its change. Hence, employing feature selectors that aim at reducing feature redundancy merits
further investigation, especially when more features are incorporated.
As presented in Table 4.3, our methods achieved acceptable sleep stage classification results
when using respiratory information alone. Although the results are lower than those of some other
studies, those studies used more signal modalities such as cardiac activity. We therefore anticipate
that the classification performance should be further enhanced when combining respiratory and
cardiac activity, which will be further studied. Moreover, we only used the simple LD
classifier since we focused exclusively on analyzing new features for sleep stage classification.
Nevertheless, more advanced classification algorithms merit investigation in future work.
4.5 Conclusion
In this chapter, respiratory effort amplitude (with respect to breathing depth and volume) was
analyzed and quantified during nighttime sleep, which was found to differ across sleep stages.
Based on this, 12 novel features that characterize different aspects of respiratory effort ampli-
tude were extracted for automated sleep stage classification. To eliminate the effect of body
movements during sleep, respiratory effort signals were calibrated by using a DTW measure
which has been shown to correlate with body motion artifacts. By calibrating the signals and
normalizing the features for each subject, the discriminative power of the features can be in-
creased. When using only respiratory effort signals, combining the new features proposed in
this work with the existing respiratory features (known in literature) can help significantly im-
prove the performance in classifying and identifying different sleep stages with an exception of
wake state detection.
CHAPTER 5
Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep staging
This chapter is adapted from: X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R.
M. Aarts. Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep
staging. Physiological Measurement, 35(12):2529–2542, 2014. © IOP Publishing
Abstract – Polysomnography (PSG) has been extensively studied for sleep staging, where sleep
stages are usually classified as wake, rapid-eye-movement (REM) sleep, or non-REM (NREM)
sleep (including light and deep sleep). Respiratory information has been proven to correlate
with autonomic nervous activity that is related to sleep stages. For example, it is known that the
breathing rate and amplitude during NREM sleep, in particular during deep sleep, are steadier
and more regular compared to periods of wakefulness that can be influenced by body move-
ments, conscious control, or other external factors. However, the respiratory morphology has
not been well investigated across sleep stages. We thus explore the dissimilarity of respira-
tory effort with respect to its signal waveform or morphology. The dissimilarity measure is
computed between two respiratory effort signal segments with the same number of consecu-
tive breaths using a uniform scaling distance. To capture the property of signal morphological
dissimilarity, we propose a novel window-based feature in a framework of sleep staging. Ex-
periments were conducted with a data set of 48 healthy subjects using a linear discriminant
classifier and a 10-fold cross validation. It is revealed that this feature can help discriminate be-
tween sleep stages, but with an exception of separating wake and REM sleep. When combining
the new feature with 26 existing respiratory features, we achieved a Cohen’s Kappa coefficient
of 0.48 for 3-stage classification (wake, REM sleep, and NREM sleep) and of 0.41 for 4-stage
classification (wake, REM sleep, light sleep, and deep sleep), which outperform the results
obtained without using this new feature.
5.1 Introduction
Previous studies have shown that characteristics of human respiratory activity are associated
with sleep stages throughout the entire night [95, 281]. Respiratory effort has been increasingly
used for objective sleep analysis [253] and sleep staging [69, 249] in contrast to traditional
polysomnography (PSG) which is considered the “gold standard” in sleep studies. This is
because respiratory activity is able to be acquired in an easy and unobtrusive manner using,
for example, bed sensors [161, 304], Doppler radar [194], photoplethysmography [174], or a
watch-based device [131]. Sleep consists of wake, rapid-eye-movement (REM) sleep, and four
non-REM (NREM) sleep stages S1-S4 according to the R&K rules [247]. In regard to S3 and
S4, the American Academy of Sleep Medicine (AASM) guidelines [136] and their updated
rules [38] suggest merging them into a single “deep sleep” or slow wave sleep stage. S1 and
S2 often correspond to “light sleep” [51, 276]. With PSG, sleep stages are manually scored
by sleep technicians on 30-s epochs based on multiple electrophysiological signals including
electroencephalography (EEG), electrooculography (EOG), and electromyography (EMG). The
manually scored sleep stages can be visualized in a hypnogram.
It has been reported in earlier studies that some characteristics of respiration differ across
sleep stages such as respiratory frequency [95], respiratory variability [256], different frequency
components of respiratory spectrum [249], etc. However, the dissimilarity of respiratory effort
in terms of signal waveform or morphology for different sleep stages has not been well explored.
In fact, the respiratory pattern (e.g., amplitude and frequency) has been shown to be more stable
and regular during NREM sleep (in particular during deep sleep) than during wake and REM
sleep [67, 129]. Irregular breathing is usually caused by body movements, alteration
of ventilation control, or behavioral factors when awake [230], and it is related to paralysis of
voluntary musculature (muscle atonia) during REM sleep [233]. We may therefore
anticipate that if a sleep stage has a higher regularity in breathing, the respiratory effort segments
within this stage would be less dissimilar to one another. In addition, the respiratory dynamics
have been found to be associated with physiologic states such as sleep stages, which distinctly
correspond to autonomic regulatory mechanisms [226, 267, 292]. We therefore hypothesize
that (1) the respiratory effort is characterized by signal morphology and (2) the dissimilarity
between two respiratory effort periods is influenced by their corresponding sleep stages.
Previous research has focused on investigating respiration changes during sleep [149, 256]. For
instance, some researchers analyzed non-random variability of respiration (e.g., breath-by-breath
intervals) on short- and long-term scales [256], but with much less focus on comparing
respiratory patterns over multiple breaths. Although some parameters including breathing rate,
inspiratory/expiratory volumes, and minute volume were investigated, the respiratory
morphology was less researched.
Many methods have been used to compare two time series, such as cross-correlation, detrended
fluctuation analysis, and cross-approximate entropy. However, they can be limited by
several factors including non-stationary trends in the data, an insufficient number of data points
(for, e.g., polynomial fitting), low relative consistency, and/or unequal lengths of the time series
[31, 133, 250]. The idea here is to use a Euclidean-based distance as a dissimilarity metric
between two respiratory effort signal segments from a subject. When computing the distance,
each signal segment is selected inside its corresponding 30-s epoch to have a certain number of
consecutive breaths, which provides an even comparison of their signal morphology. These
signal segments are usually shorter than 30 s. The lengths (i.e., numbers of data points) of any
two signal segments generally differ, so the segments must be scaled to an equal length before a
Euclidean (sequential) mapping can be performed. To resolve this problem, we propose to use a
uniform scaling method [314] to re-scale the two signal segments
by searching for the minimal Euclidean distance between them. In other words, they are
uniformly ‘stretched’ to reduce, to a certain degree, the effects of varying breathing frequency,
so that the comparison focuses more on signal morphology.
As for automatic sleep staging, it is particularly interesting to know if different sleep stages
can be distinguished by means of respiratory effort data when the PSG-based hypnogram is
absent. This would benefit the applications of home-based sleep staging or sleep stage clas-
sification which has been attracting increasing attention in recent years [89, 182, 248, 264].
Information regarding sleep stages is usually extracted as epoch-based “features” used to per-
form epoch-by-epoch classification. For this purpose, we propose a new feature to describe the
dissimilarity of respiratory effort morphology between different epochs from the same
recording. The discriminative power of this feature in classifying sleep stages will be evaluated,
and the feature is expected to help improve sleep staging performance.

5.2.1 Subjects and protocol
Forty-eight healthy subjects [21 men and 27 women; mean age 41.3 y ranging from 20 to
83, standard deviation (SD) 16.1; mean body mass index 23.6 kg·m−2 ranging from 19.1 to
31.3, SD 2.9] in the SIESTA project [160] are considered. The project was supported by the
European Commission and the subjects were monitored in seven different sleep laboratories
located in five European countries over a period of three years from 1997 to 2000. The subjects
had a Pittsburgh Sleep Quality Index [60] of less than 6 and fulfilled several criteria (e.g., no
depressive symptoms, no reported medical, neurological, mental or cardiovascular disorders,
no history of drug abuse or habituation, no psychoactive medication, no shift work, and a usual
bedtime before midnight). According to the study protocol of the SIESTA project, all subjects
provided informed consent, documented their sleep habits over 14 nights, and spent two
consecutive nights (on days 7 and 8) in the sleep laboratory [19]. More details regarding the
subject information and the study protocol can be found online (http://www.ofai.at/siesta). In
this study, we only include single-night PSG recordings (on day 7) for analysis.
5.2.2 Polysomnographic measurements
Full PSG data, including multiple EEG, EOG, and EMG channels, electrocardiography (ECG),
respiratory effort, oxygen saturation, snoring, etc., were recorded for each subject and the sleep
stages were visually scored by professional sleep technicians as wake, REM, and S1-S4 on
30-s epochs according to the R&K rules. Thoracic breathing movements were measured by
respiratory inductance plethysmography (RIP) in the form of respiratory effort signals at a
sampling rate of 10 Hz. For the problem of sleep staging, we consider deep sleep (merged S3
and S4) as a single stage as suggested by the AASM guidelines. Likewise, S1 and S2
are merged into a single light sleep stage.

Table 5.1: Sleep data from 48 healthy subjects, where mean ± SD and range are given.

Parameter                     Mean ± SD       Range
Total number of epochs (#)    938.3 ± 44.5    796 − 1026
Wake (%)                      12.9 ± 6.1      1.2 − 24.5
REM sleep (%)                 19.0 ± 3.3      15.3 − 26.5
NREM sleep (%)                68.1 ± 4.9      56.1 − 76.3
Light sleep (%)               53.6 ± 5.5      42.7 − 66.7
Deep sleep (%)                14.5 ± 4.8      5.3 − 28.5

Referring to the statistics of normal sleepers across the human lifespan reported previously
[216], the selection of overnight recordings from a larger data set met several criteria including
the sleep efficiency of ≥75%, REM sleep of ≥15%, and deep sleep of ≥5%. The sleep data is
summarized in Table 5.1, in which mean and SD over subjects and range are presented.
5.2.3 Signal processing
The raw respiratory effort signals are first low-pass filtered (10th order Butterworth filter with
a cut-off frequency of 0.6 Hz) in order to eliminate high-frequency noise. Then the baseline is
removed by subtracting the median peak-to-trough amplitude estimated over the entire record-
ing, which serves to compute the respiratory volume-based features. These features will be
described further in Section 5.2.7. The localization of respiratory peaks/troughs is achieved by
detecting the signal turning points based on sign changes of the signal slopes. Afterwards, we
remove the falsely detected peaks/troughs (1) with too short peak-to-trough or trough-to-peak
intervals (where the sum of two successive intervals is less than the median of all intervals over
the entire recording) and (2) with too small amplitudes (where the peak-to-trough difference
is smaller than 0.15 times the median of the entire respiratory signal). These methods were
validated by comparing the automatically detected results with manually annotated peaks and
troughs and an accuracy of ∼98% was achieved.
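As an illustration of the turning-point step only, the following minimal sketch locates peaks and troughs from sign changes of the signal slope. The function name `find_turning_points` and the synthetic sine signal are assumptions made for this example, and the two pruning rules for falsely detected peaks/troughs are deliberately left out.

```python
import math

def find_turning_points(signal):
    """Return (peaks, troughs) as lists of sample indices.

    A turning point is a sample where the slope changes sign:
    positive-to-negative for a peak, negative-to-positive for a trough.
    Flat stretches are skipped by remembering the last non-zero slope.
    """
    peaks, troughs = [], []
    prev_slope = 0.0
    for n in range(1, len(signal)):
        slope = signal[n] - signal[n - 1]
        if slope == 0:
            continue
        if prev_slope > 0 and slope < 0:
            peaks.append(n - 1)
        elif prev_slope < 0 and slope > 0:
            troughs.append(n - 1)
        prev_slope = slope
    return peaks, troughs

# Hypothetical test signal: a 0.25 Hz sine sampled at 10 Hz for one 30-s epoch.
fs = 10
effort = [math.sin(2 * math.pi * 0.25 * n / fs) for n in range(30 * fs)]
peaks, troughs = find_turning_points(effort)
```

On real recordings the detected points would still need the interval-based and amplitude-based pruning described above before breaths are counted.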
5.2.4 Dissimilarity measure with uniform scaling
Given an overnight respiratory effort recording with $L$ epochs from a subject, the $i$th epoch is
expressed as $U_i = \{u_{i,1}, u_{i,2}, \ldots, u_{i,n}\}$ ($i = 1, 2, \ldots, L$) with $n$ data points
(here $n = 300$ at the signal sampling rate of 10 Hz). As explained before, we only choose a signal
segment with a certain number of consecutive breaths $\lambda$ inside this epoch when computing the
dissimilarity score, so the chosen signal segment for this particular epoch $U_i$ is expressed by
$V_i = \{v_{i,1}, v_{i,2}, \ldots, v_{i,m_i}\}$ with $m_i$ data points ($m_i \le n$). The locations of
$v_{i,1}$ and $v_{i,m_i}$ are based on the detected respiratory peaks or troughs within this epoch so
that the segment $V_i$ contains several complete breaths, starting and ending at two different troughs.
The signal segment length $m_i$ depends on $i$ because respiratory frequency usually varies between
signal segments, even if they have the same number of breaths. It also depends on the prescribed
number of breaths $\lambda$.
Let us consider two epochs $U_i$ and $U_j$ ($i, j = 1, 2, \ldots, L$ and $i \neq j$) with $p_i$ and
$q_j$ consecutive breaths, respectively. To compare them on an equal number of breaths, we take
$\lambda = \min\{p_i, q_j\}$. For the epoch with more breaths, only the $\lambda$ breaths in the middle
are selected, yielding a signal segment within this epoch. Then the two signal segments $V_i$ and $V_j$
($i \neq j$) with $\lambda$ breaths each are normalized to zero mean and unit variance (Z-score
normalization). However, the two signal segments may have unequal lengths, in which case the Euclidean
distance between them cannot be computed directly. To tackle this, we utilize uniform scaling, a
Euclidean-based minimization method. For $V_i$ and $V_j$, assuming that $m_i \le m_j$, a uniformly
scaled series of $V_i$ is expressed as $V_i^k = \{v_{i,1}^k, v_{i,2}^k, \ldots, v_{i,k}^k\}$ with
length $k$ ($m_i \le k \le m_j$), where $v_{i,x}^k = v_{i,\lceil x \cdot m_i \cdot k^{-1} \rceil}$ for
$x = 1, 2, \ldots, k$. Hence, the dissimilarity score $d_{\mathrm{score}}$ between $U_i$ and $U_j$ is
the uniform scaling distance $d_{\mathrm{us}}$ between $V_i$ and $V_j$, which can be obtained by
minimizing the Euclidean distance over $k$, subject to $m_i \le m_j$, such that

$$d_{\mathrm{score}}(U_i, U_j) \equiv d_{\mathrm{us}}(V_i, V_j) = \min_{m_i \le k \le m_j} \sqrt{\frac{1}{k} \sum_{x=1}^{k} \left(v_{i,x}^k - v_{j,x}\right)^2}. \qquad (5.1)$$

Since the $k$-space Euclidean distance metric is sensitive to the series length $k$, which takes
different values in Equation 5.1, the distance is normalized by $k$. Figure 5.1
depicts an example of computing the dissimilarity score $d_{\mathrm{score}}$ between two epochs. Note
that $d_{\mathrm{score}}$ is computed within each recording (or subject, for the single-night data) to
avoid the effect of between-subject variability, often caused by physiological differences from
subject to subject.
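The dissimilarity score of Equation 5.1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names (`zscore`, `middle_breaths`, `uniform_scale`, `uniform_scaling_distance`) are hypothetical, the way the middle breaths are centered when the surplus is odd is an assumption, and the synthetic sine segments merely stand in for real respiratory effort.

```python
import math

def zscore(x):
    """Normalize a segment to zero mean and unit variance."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]

def middle_breaths(signal, troughs, lam):
    """Cut the lam middle breaths out of an epoch (breaths run trough to trough).
    Rounding toward the start for an odd surplus is an assumption."""
    n_breaths = len(troughs) - 1
    first = (n_breaths - lam) // 2
    return signal[troughs[first]:troughs[first + lam] + 1]

def uniform_scale(v, k):
    """Stretch v (length m) to length k: element x (1-based) is v[ceil(x*m/k)]."""
    m = len(v)
    return [v[(x * m + k - 1) // k - 1] for x in range(1, k + 1)]

def uniform_scaling_distance(a, b):
    """Length-normalized Euclidean distance minimized over k, as in Eq. 5.1."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = math.inf
    for k in range(len(short), len(long_) + 1):
        stretched = uniform_scale(short, k)
        sq = sum((s - t) ** 2 for s, t in zip(stretched, long_[:k]))
        best = min(best, math.sqrt(sq / k))
    return best

# Hypothetical segments: same shape at different rates (a, b) and a different shape (c).
a = zscore([math.sin(2 * math.pi * n / 40) for n in range(40)])
b = zscore([math.sin(2 * math.pi * n / 50) for n in range(50)])
c = zscore([math.sin(2 * math.pi * 2 * n / 50) for n in range(50)])
d_ab = uniform_scaling_distance(a, b)
d_ac = uniform_scaling_distance(a, c)
```

In this toy case `d_ab` stays small because stretching removes most of the rate difference, whereas `d_ac` remains large because the waveforms differ in shape, which is the behaviour the measure is meant to capture.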
5.2.5 Windowed dissimilarity feature
Figure 5.1: An example of computing the dissimilarity score of respiratory effort between two epochs:
(a) original signals $U_i$ and $U_j$ at 10 Hz within 30-s epochs; (b) selected signal segments $V_i$ and $V_j$ with
5 consecutive breaths, where series lengths are unequal; (c) uniformly scaled series $V_i^k$ and $V_j$, where $k$
equals the length of $V_j$. Note that the signal segments in (a) and (b) are normalized to have zero mean
and unit variance.
It is of interest to extract a feature for each 30-s epoch to capture the dissimilarity property of
respiratory effort morphology. This feature can in turn be used to separate different sleep stages.
To do so, we compute the mean dissimilarity score between each epoch and the other epochs
from the same recording within a window, named the windowed (self-) dissimilarity feature and
denoted as $D_{\mathrm{win}}$ henceforth. We expect that this feature is not independent of sleep
stage and is thus informative for sleep staging. For the $i$th epoch $U_i$ of a given subject, it is
computed as

$$D_{\mathrm{win}}(U_i) = \frac{\sum_j d_{\mathrm{score}}(U_i, U_j)}{\min(w, i-1) + \min(w, L-i)}, \quad \text{for } |j - i| \le w \text{ and } j \neq i, \qquad (5.2)$$

in which $L$ is the total number of epochs for this specific subject and $w = 1, 2, \ldots, L$ is the
(single-side) size of the window centered at $U_i$. This means that $D_{\mathrm{win}}$ is a feature
with a certain time (or window) scale. The window size $w$ is determined by maximizing the feature
discriminative power. Intuitively, the majority of the epochs contained within a small window should
be in the same sleep stage as the given epoch. This can be examined by comparing the percentage
of occurrence for different sleep stages versus the time difference ∆ between epochs. We
also analyze the changes of $d_{\mathrm{score}}$ for ‘self-comparisons’ versus ∆, where
$d_{\mathrm{score}}$ is computed between epochs with the same sleep stage (i.e., wake-wake, REM-REM,
light-light, and deep-deep). To reduce feature-level noise caused by measurement errors or body motion
artifacts, $D_{\mathrm{win}}$ is
smoothed over the entire-night recording using a moving average method (with a 10-min span).
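The windowed feature of Equation 5.2 and the subsequent smoothing can be sketched as follows. This is only an illustration: `windowed_dissimilarity` and `moving_average` are hypothetical names, epochs are indexed from 0 here rather than from 1, and the symmetric window truncated at the recording edges is an assumption, since the exact centering and edge handling of the 10-min moving average are not specified in the text.

```python
def windowed_dissimilarity(d_score, n_epochs, w):
    """Mean dissimilarity between each epoch and its neighbours within +/- w epochs.

    d_score(i, j) is any callable returning the dissimilarity score between
    epochs i and j (0-based). The denominator equals the number of neighbours,
    i.e. min(w, i) + min(w, n_epochs - 1 - i) in 0-based indexing.
    """
    feature = []
    for i in range(n_epochs):
        lo, hi = max(0, i - w), min(n_epochs - 1, i + w)
        neighbours = [j for j in range(lo, hi + 1) if j != i]
        feature.append(sum(d_score(i, j) for j in neighbours) / len(neighbours))
    return feature

def moving_average(values, half_span):
    """Symmetric moving average, truncated at the edges (assumed edge handling)."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_span), min(len(values), i + half_span + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Toy score for illustration only: dissimilarity grows with epoch distance.
toy_score = lambda i, j: abs(i - j)
d_win = windowed_dissimilarity(toy_score, n_epochs=5, w=2)
smoothed = moving_average(d_win, half_span=1)
```

With 30-s epochs, a 10-min span covers 20 epochs, so a `half_span` of about 10 would approximate the smoothing described above.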
5.2.6 Feature analysis
For the windowed dissimilarity feature Dwin , we first compare its mean value and SD over
all subjects between sleep stages. In addition, we compute its discriminative power for
sleep staging using the one-way analysis of variance (ANOVA) F-statistic. A larger F-statistic
indicates a higher discriminative power. The F-statistic of Dwin is then compared
with that of the existing features by ranking it among all the features. Using
Quantile-Quantile (Q-Q) plots, the distributions of Dwin in different sleep stages were found to
be approximately normal.
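For reference, the one-way ANOVA F-statistic is the ratio of the between-group to the within-group mean squares. The sketch below computes it for feature values grouped by sleep stage; the function name `anova_f` and the toy numbers are assumptions for illustration and are not data from this study.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups (lists of feature values)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy feature values for three hypothetical stages.
f_separated = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
f_overlapping = anova_f([[1, 2, 3], [2, 3, 4], [2, 3, 4]])
```

The better-separated groups yield the larger F-statistic, which is why the statistic can be used to rank features.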
5.2.7 Sleep staging
As stated, the new feature Dwin can be incorporated to perform automatic sleep staging when
solely using respiratory effort data. A set of 26 existing respiratory features have been used to
classify sleep stages in previous studies. They comprise features in both time and frequency do-
main [248], respiratory depth- and volume-based features [182], and non-linear features based
on sample entropy [75] and dynamic warping [180]. Table 5.2 lists and describes all the respi-
ratory features. To examine whether Dwin can help achieve an enhanced classification perfor-
mance, we compare the classification results with and without adding it to the existing feature
set. Note that for the purpose of reducing between-subject variability in respiration, all the
features are normalized (Z-score) for each overnight recording.
We adopt a linear discriminant (LD) classifier, which has been widely used for the
task of sleep staging [89, 108, 182, 249]. The 48 entire-night recordings are
randomly divided into 10 subsets (folds) of four or five recordings each, and
sleep staging is then performed iteratively using a 10-fold cross-validation (CV). During each
iteration, the classifier is trained on nine folds and validated on the remaining one in order to
minimize the classifier bias.
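A recording-wise split of this kind can be sketched as below. The function name `split_recordings`, the fixed random seed, and the integer recording identifiers are assumptions for illustration; the point is only that whole recordings, not individual epochs, are assigned to folds.

```python
import random

def split_recordings(recording_ids, n_folds=10, seed=0):
    """Shuffle recordings and deal them into n_folds folds of near-equal size."""
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)
    return [ids[f::n_folds] for f in range(n_folds)]

folds = split_recordings(range(48))

# One cross-validation iteration per fold: train on nine folds, validate on one.
iterations = []
for k, validation in enumerate(folds):
    training = [r for f, fold in enumerate(folds) if f != k for r in fold]
    iterations.append((training, validation))
```

With 48 recordings and 10 folds, this yields eight folds of five recordings and two folds of four.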
To evaluate the classifier, we use Cohen’s Kappa coefficient κ [72] in addition to overall
accuracy because it is more appropriate for analyzing unbalanced data (in our case light sleep
accounts for 53.6% of the epochs, which is much more than any other stage). Because the prior
probabilities of the different sleep stages may change over the night, we provide the LD classifier
with a time-varying prior probability (TVPP) for each epoch, computed by counting the relative frequency of
occurrence of each sleep stage at its corresponding time of the night based on the associated
training data. More details about TVPP can be found elsewhere [249]. Here we present results
for two sleep staging schemes, including 4-stage classification (wake, REM sleep, light sleep,
and deep sleep) and 3-stage classification (wake, REM sleep, and NREM sleep).
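The counting step behind TVPP can be illustrated with a short sketch. It assumes that "time of the night" means the epoch index counted from the start of each recording and that no smoothing is applied; both are assumptions made here, and the details of the actual method are given in [249]. The function name and the toy hypnograms are hypothetical.

```python
from collections import Counter

def time_varying_priors(training_hypnograms, stages):
    """Relative frequency of each stage at each epoch index over the training set.

    training_hypnograms: list of per-recording label sequences (one label per epoch).
    Returns a list where element t maps each stage to its prior at epoch index t.
    """
    max_len = max(len(h) for h in training_hypnograms)
    priors = []
    for t in range(max_len):
        # Only recordings that are long enough contribute at epoch index t.
        counts = Counter(h[t] for h in training_hypnograms if t < len(h))
        total = sum(counts.values())
        priors.append({s: counts[s] / total for s in stages})
    return priors

# Toy training hypnograms (W = wake, L = light, D = deep, R = REM).
toy = [["W", "W", "L"], ["W", "L", "L"], ["W", "L"]]
priors = time_varying_priors(toy, stages=["W", "R", "L", "D"])
```

In a cross-validation setting the priors would be recomputed from the training folds in every iteration, so that no information from the validation recordings is used.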
5.3 Results
A (single-side) window size w of 25 epochs was experimentally found to be an appropriate
value for computing the new feature Dwin , at which its discriminative power in
classifying wake, REM sleep, light sleep, and deep sleep was maximized. Figure 5.2 shows how the
percentage of occurrence of the different sleep stages changes over ∆. The figure indicates that
self-comparisons are more likely if |∆| is smaller than a certain value (e.g., ∼30 epochs
for wake, REM sleep, and deep sleep). It also illustrates that the comparison between each sleep
stage and light sleep dominates if |∆| is larger than that value. These graphs imply that, for our
choice of w = 25 epochs, the feature values of Dwin depend more on the self-comparisons. As
shown in Figure 5.3, in regard to the self-comparisons, we observe that different sleep stages
can be separated by the dissimilarity score within the 25-epoch window, except for wake and
REM sleep, where overlaps occur.
Table 5.2: A list of respiratory features

Feature index   Description
Existing∗
1    Respiratory frequency estimated in the frequency domain
2    Spectral power of respiratory frequency
3    Spectral power in the very low frequency (VLF) band (0.01-0.05 Hz)
4    Spectral power in the low frequency (LF) band (0.05-0.15 Hz)
5    Spectral power in the high frequency (HF) band (0.15-0.5 Hz)
6    Ratio of spectral power between LF and HF bands
7    Standard deviation of respiratory frequency over 150 s
8    Mean breath-by-breath correlation
9    Standard deviation of breath-by-breath correlation
10   Standard deviation of breath length
11   Respiratory frequency estimated in the time domain
12   Respiratory regularity measured by sample entropy
13   Respiratory similarity measured by dynamic time warping
14   Respiratory similarity measured by dynamic frequency warping
15   Standardized median of respiratory peaks
16   Standardized median of respiratory troughs
17   Respiratory peak regularity measured by sample entropy
18   Respiratory trough regularity measured by sample entropy
19   Median respiratory peak-to-trough difference
20   Median respiratory volume during breath cycles
21   Median respiratory volume during inhalations
22   Median respiratory volume during exhalations
23   Median respiratory flow rate during breath cycles
24   Median respiratory flow rate during inhalations
25   Median respiratory flow rate during exhalations
26   Ratio of inhalation and exhalation flow rate
New
27   Respiratory dissimilarity measured by uniform scaling (Dwin )
∗ The references for the features are 1-11 [249, 288], 12 [75], 13 and 14 [180], and 15-26 [182].

Figure 5.4 compares the feature values of Dwin in different sleep stages (mean ± SD and
histogram), in which the separation can be observed between sleep stages, particularly
between deep sleep and the other stages and between REM and NREM sleep. An example of an
overnight hypnogram and the corresponding Dwin values from a 50-year-old female are illus-
trated in Figure 5.5, where the correlation between them can be seen. Table 5.3 presents the
discriminative powers (as measured by ANOVA F-statistic) of Dwin in separating different sleep
stages. For comparison, we also provide its ranking among all features as well as the top-10
ranked features (in a descending order in terms of F-statistic) in the table.
The respiratory effort-based sleep staging results using the feature set with and without Dwin
are compared in Table 5.4, where the overall accuracy and the Cohen’s Kappa coefficient are
reported. Combining Dwin with the existing features resulted in a significantly
increased κ of 0.41 at an overall accuracy of 64.9% when classifying four sleep stages and
of 0.48 at an overall accuracy of 77.1% when classifying three sleep stages (both with TVPP).
The table also shows the results obtained without applying TVPP, indicating that using TVPP
can help achieve significantly better results. Here the significance was checked with a two-
sided Wilcoxon signed-rank test. To understand what aspects of sleep staging the new feature
improves, we present the confusion matrices obtained with and without Dwin in Table 5.5 (for
4-stage classification) and in Table 5.6 (for 3-stage classification), where TVPP was applied.

Figure 5.2: The probability of occurrence of different sleep stages versus time difference ∆ (in 30-s
epochs) for (a) wake, (b) REM, (c) light, and (d) deep sleep epochs. The boundary of the 25-epoch
window for computing Dwin is indicated (dashed line). For all stages, the light sleep percentage is
larger than that of any other stage when |∆| > ∼30 epochs.

Figure 5.3: Mean dissimilarity score dscore (a.u.) versus absolute time difference |∆| (in 30-s epochs)
for self-comparisons wake-wake, REM-REM, light-light, and deep-deep. The boundary of the 25-epoch
window for computing Dwin is indicated (dashed line).

Figure 5.4: Comparison of the windowed dissimilarity feature Dwin (a.u.) in different sleep stages: (a)
mean ± SD and (b) normalized histogram (i.e., percentage, %). Values annotated in panel (a), in the
order wake, REM, light, and deep sleep: 0.99 ± 0.14, 0.98 ± 0.13, 0.89 ± 0.16, and 0.78 ± 0.17.

Figure 5.5: An example of (a) overnight annotation and (b) feature values of Dwin (a.u.) over time
(30-s epochs) from a 50-year-old female, where the unsmoothed (gray) and smoothed (black) feature
values are both shown.

Table 5.3: Discriminative power of Dwin in separating different sleep stages as evaluated
and ranked by ANOVA F-statistic. Results are pooled over all subjects.

Sleep stages            F-statistic   Rank†   Top 10 features‡ (descending order)
Wake/REM                10.7∗∗        25      12, 13, 3, 5, 4, 14, 20, 21, 18, 22
Wake/light              1487.5∗       9       13, 14, 7, 4, 5, 3, 15, 17, 27, 16
Wake/deep               3694.4∗       2       7, 27, 16, 15, 14, 4, 17, 13, 3, 18
REM/light               1679.0∗       2       7, 27, 14, 15, 20, 21, 16, 22, 24, 23
REM/deep                4915.8∗       2       7, 27, 16, 20, 15, 21, 25, 22, 23, 24
Light/deep              1420.9∗       4       16, 15, 7, 27, 17, 14, 10, 4, 8, 18
Wake/REM/light/deep     1912.6∗       6       7, 16, 15, 13, 14, 27, 4, 5, 17, 3
Wake/REM/NREM           2012.8∗       4       7, 13, 14, 27, 5, 4, 15, 16, 3, 12
† Ranking of F-statistic among all respiratory features.
‡ The feature indices refer to Table 5.2; feature 27 is the new feature.
∗ p < 0.0001, ∗∗ p < 0.005.

Table 5.4: Ten-fold CV results of 4-stage (wake, REM sleep, light sleep, and deep
sleep) and 3-stage (wake, REM sleep, and NREM sleep) classification schemes obtained
using the feature set with and without Dwin , where the results obtained with
and without using TVPP are also presented.

                    Without Dwin †                With Dwin ‡
Scheme     TVPP     Accuracy       Kappa (κ)      Accuracy         Kappa (κ)
4 stages   No       53.7 ± 8.3%    0.34 ± 0.12    55.2 ± 8.0%∗     0.37 ± 0.11∗
4 stages   Yes      63.8 ± 8.0%    0.38 ± 0.14    64.9 ± 7.8%∗∗    0.41 ± 0.14∗
3 stages   No       69.2 ± 9.7%    0.43 ± 0.16    70.0 ± 9.3%∗∗    0.45 ± 0.15∗∗
3 stages   Yes      76.1 ± 7.8%    0.45 ± 0.16    77.1 ± 7.6%∗∗    0.48 ± 0.17∗
† 26 existing features.
‡ 27 features (26 existing features and Dwin ).
Significance of difference was found with and without Dwin using a paired Wilcoxon
signed-rank test (two-sided) at ∗ p < 0.001 or ∗∗ p < 0.01.

Table 5.5: Confusion matrix of 4-stage classification (10-fold CV) obtained using the feature
set with and without Dwin , where the results without Dwin are given in parentheses.

PSG ↓ / Classified →   Wake          REM sleep      Light sleep      Deep sleep
Wake                   2608 (2606)   512 (453)      2533 (2622)      56 (28)
REM sleep              269 (288)     4259 (3679)    3992 (4492)      13 (74)
Light sleep            844 (831)     2018 (1839)    19285 (19569)    1883 (1791)
Deep sleep             35 (33)       55 (65)        3532 (3664)      2887 (2747)

Table 5.6: Confusion matrix of 3-stage classification (10-fold CV) obtained using the feature
set with and without Dwin , where the results without Dwin are given in parentheses.

PSG ↓ / Classified →   Wake          REM sleep      NREM sleep
Wake                   2605 (2596)   540 (495)      2564 (2618)
REM sleep              271 (278)     4255 (3909)    4007 (4346)
NREM sleep             851 (861)     2112 (2050)    27576 (27628)
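
To make the relation between a confusion matrix and the reported metrics concrete, the sketch below computes the overall accuracy and Cohen's Kappa from the pooled counts of Table 5.6. The function name `accuracy_and_kappa` is an assumption for illustration. Note that the values in Table 5.4 are averages over subjects, so metrics computed from pooled counts are expected to be close to, but not identical with, the tabulated ones (roughly 0.77 and 0.47 to 0.48 for the matrix with Dwin).

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    cm[i][j] is the number of epochs of true class i classified as class j.
    """
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(len(cm))) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(len(cm))]
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa

# Pooled counts from Table 5.6 (rows: PSG wake, REM, NREM; columns: classified).
with_dwin = [[2605, 540, 2564], [271, 4255, 4007], [851, 2112, 27576]]
without_dwin = [[2596, 495, 2618], [278, 3909, 4346], [861, 2050, 27628]]

acc_with, kappa_with = accuracy_and_kappa(with_dwin)
acc_without, kappa_without = accuracy_and_kappa(without_dwin)
```

The same function applies unchanged to the 4-stage matrix of Table 5.5.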
5.4 Discussion
The use of respiratory effort dissimilarity over several consecutive breaths (as measured
by a uniform scaling distance) to characterize the regulation of breathing within different
sleep stages was investigated. On average, we observe the lowest dissimilarity score between
two deep sleep epochs. This is because respiratory effort during NREM sleep (in particular
during deep sleep) is steadier and more regular compared with that during wake and REM sleep,
as mentioned before. As illustrated in Figure 5.3, the discrimination between wake and REM
sleep in terms of respiratory effort dissimilarity is not consistent over time difference and seems
to be maximized at |∆| beyond 40 epochs. With smaller time differences, overlap can be observed
between the dissimilarity scores for wake-wake and REM-REM comparisons. During wake,
breathing might be somewhat less affected by conscious control, body movements,
or other external influences over a short range (e.g., with a |∆| of less than 10 epochs or 5
minutes). This would decrease the dissimilarity scores of wake-wake comparisons within that
range, making it difficult to distinguish between wake and REM sleep. As a result,
the windowed dissimilarity feature Dwin has a low discriminative power in separating wake and
REM sleep, as shown in Table 5.3. In fact, distinguishing wake from REM sleep can sometimes
be difficult even with PSG-based visual scoring [276].
In this work, we chose the window size w of 25 epochs to compute Dwin by globally maxi-
mizing the feature discriminative power in classifying wake, REM sleep, light sleep, and deep
sleep. However, it might not always be the optimal choice, particularly in separating wake
and REM sleep (see Figure 5.3). The optimal window size might vary when classifying dif-
ferent sleep stages. Therefore, we think that using an adaptive window size to discriminate
between different sleep stages merits further investigation.
Regarding sleep staging, the new feature Dwin helped improve the classification performance
(Table 5.4) and it contributed most to separating REM and deep sleep from the other sleep
stages (Table 5.5). It is therefore suggested that this feature contains additional information
that is not carried by the existing features. We also show that using TVPP can lead to better
classification results, as shown in Table 5.4. With cardiorespiratory activity, a κ of 0.46 and an
overall accuracy of 76.1% were achieved when classifying wake, REM sleep, and NREM sleep
for 31 healthy subjects [249]. We obtained slightly better results with the use of the respiratory
information alone. For 4-stage classification (wake, REM sleep, light sleep, and deep sleep),
a κ of 0.48 and an overall accuracy of 65.4% (re-computed based on the reported confusion
matrix) were achieved by Hedner et al. [127], which outperform our results. However, they
employed more signal modalities including peripheral arterial tone, pulse rate, oxyhemoglobin
saturation, and actigraphy. In a more recent study, Willemen et al. [309] reported a κ of 0.56
(at an accuracy of 69%) for 4-stage classification using cardiorespiratory and body movement
features, although they used an epoch length of 60 s instead of the standard 30 s used in most
sleep staging studies. Nevertheless, we anticipate that combining respiratory
and cardiac activity will result in a performance enhancement on sleep stage classification and
this will be further studied.
The PSG-based sleep stages were manually scored based on the R&K rules in the SIESTA
database. However, it has been reported that the overall inter-scorer agreement using the new
AASM standard is slightly higher than that obtained using the R&K rules [81]. We therefore
suggest applying the AASM standard for PSG-based sleep stage scoring in future work,
which is expected to deliver more reliable annotations of overnight sleep stages for the
task of respiratory-based sleep stage classification.
This study only considered healthy subjects without any reported medical, neurological,
mental, or cardiovascular diseases, as mentioned before. However, for patients with sleep-
disordered breathing (e.g., sleep apnea/hypopnea) or other respiratory abnormalities, abnormal
respiratory events during the night can affect the measurement of the dissimilarity between
respiratory effort signals. Therefore, the approach described in this work needs to be tested
further for these patients. In addition, it has been shown that the respiratory effort is more
sensitive to changes of sleep posture and body movements during sleep in comparison with
measurements by nasal cannulas [307]. In that case, Dwin might be erroneously calculated, thus
harming the classification performance. However, for the dissimilarity measure described in this
work, the effect of sleep posture might be largely eliminated since the measure was computed by
comparing each respiratory signal segment with its adjacent segments, where the same sleep
posture was expected. Moreover, the dissimilarity measure focused on comparing signal
morphology over a certain number of breaths, where the falsely detected peaks and troughs (often
corresponding to body movements) were removed. As a result, the influences of sleep posture and
body movements should be diminished to some extent. Nevertheless, those influences merit
further investigation.
5.5 Conclusion
In this chapter, by analyzing overnight respiratory effort from healthy subjects, we found that
sleep stages can be differentiated using a dissimilarity measure. This measure expresses the
dissimilarity between respiratory effort signals in their morphology. The dissimilarity can be
evoked by autonomic activity, alteration of ventilation control, or other external factors. A
new feature was extracted based on the properties of respiratory effort dissimilarity. Although
it performed worse than an existing feature (standard deviation of respiratory frequency), it can
help improve the performance of sleep staging when combined with all 26 existing respiratory
features (except for separating wake from REM sleep). This indicates that this new feature
contains additional information for sleep staging that is not carried by the existing features.
CHAPTER 6
Modeling cardiorespiratory interaction during sleep with complex networks
This chapter is adapted from: X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling
cardiorespiratory interaction during sleep with complex networks. Applied Physics Letters, 105:203701,
2014. © AIP
Abstract – Human sleep comprises several stages including wake, rapid-eye-movement (REM)
sleep, light sleep, and deep sleep. Cardiorespiratory activity has been shown to correlate with
sleep stages due to regulation by the autonomic nervous system. Here the cardiorespiratory
interaction (CRI) during sleep is analyzed using a visibility graph (VG) method that represents
the CRI time series in complex networks. We demonstrate that the dynamics of the interac-
tion between heartbeats and respiration can be revealed by VG-based networks, whereby sleep
stages can be characterized and differentiated.
6.1 Introduction
Human sleep is considered a complex biological process with its own internal architecture
expressed by sleep stages [63, 281]. Sleep stages can be typically separated based on pat-
terns observed in standard polysomnography (PSG) recordings including electroencephalogra-
phy (EEG), electromyography (EMG), and electrooculography (EOG) [136, 247]. With PSG,
sleep stages are manually scored on continuous and non-overlapping epochs (lasting 30 s each)
as wake, rapid-eye-movement (REM) sleep, and several non-REM (NREM) sleep stages for
adults. This is usually done by trained sleep technicians according to either the recommenda-
tions provided by Rechtschaffen and Kales (R&K) [247] or using the more recent guidelines of
the American Academy of Sleep Medicine (AASM) [136]. NREM sleep can be further divided
into stages S1-S4 based on the R&K rules, or stages N1-N3 based on the AASM guidelines.
S1 and S2 (or N1 and N2) are associated with ‘light sleep’. S3 and S4 (or N3) correspond
to slow-wave sleep or ‘deep sleep’. For normal subjects, sleep usually starts with light sleep
and then deep sleep with REM sleep following [63]. This sequence is called a sleep cycle and
occurs about every 90 minutes, four to six times per night [63, 243].
Cardiorespiratory activity has been shown to exhibit different characteristics across sleep
stages due to the manifestation of autonomic (sympathetic and vagal) nervous activity [13, 281, 292].
Recently, dynamics of heartbeats and respiration during sleep have been extensively described
[54, 179, 182, 222, 225, 226]. In particular, characteristics of cardiorespiratory interaction
(CRI) or coupling during sleep have attracted more and more attention since they can be used
to provide means to clinically diagnose sleep-related disorders or to identify sleep stages for ob-
jective sleep assessment [29, 30, 147, 266]. For example, Bartsch et al. [30] proposed methods
based on Hilbert-Huang transform (HHT) and detrended fluctuation analysis (DFA) to quantify
and analyze cardiorespiratory phase synchronization in different sleep stages.
In recent years, exploration of a time series has been extended to a two-dimensional complex
network with encoded information stored in the time series, aiming at better exploiting its
dynamics or properties [6, 96, 188, 313, 320]. Lacasa et al. [169] proposed a nonlinear visibility
graph (VG) method in order to describe a time series in a graph based on specific geometric
criteria. They found that random, fractal, and periodic time series correspond to networks with
exponential, scale-free, and regular characteristics, respectively, which means that VG is an
adaptive method for investigating different types of time series.
Some studies have analyzed human physiological activity by means of VG-based networks
[144, 272, 321]. For example, heartbeat dynamics in VG-based networks have been investigated
for healthy subjects and patients with congestive heart failure [272] and for subjects with
meditation training [144]. In the field of sleep, recent work has shown that sleep stages can
be identified using parameters extracted from EEG signals with VG-based algorithms
[321, 322]. However, the characteristics of the interaction between cardiac and respiratory
activities in a two-dimensional network during sleep have not been studied. Modeling these
characteristics during sleep will potentially benefit the cardiorespiratory-based classification of sleep
stages. Therefore, we investigate the CRI dynamics across sleep stages in complex networks
using the VG-based method.
[Figure 6.1 plot: (a) ECG (a.u.) with an RR interval marked, (b) Resp. (a.u.), and (c) CRI (a.u.), each versus Time (s) from 330 to 345.]
Figure 6.1: An example of using (a) a 15-s ECG signal and (b) the corresponding respiratory effort
signal to obtain (c) a CRI time series.
6.2 Cardiorespiratory interaction
We consider 330 overnight PSG recordings from 165 healthy subjects (87 males) from the
SIESTA database [160]. Each subject spent two consecutive nights in a sleep laboratory. The
subjects had an average age of 51.8 ± 19.4 y [mean ± standard deviation (SD)] and an average
total recording time of 7.8 ± 0.5 h. According to the SIESTA study protocol, they met several
criteria such as no reported symptoms of neurological, mental, medical, or cardiovascular dis-
orders, no sleep-related disorders, no shift work, and usual bedtime between 22:00 and 24:00.
The PSG recordings were visually scored on 30-s epochs by two independent raters based on
the R&K rules and in case of disagreement, a consensus annotation was obtained.
Here for each 30-s epoch, the location of individual heartbeats is identified by applying
the Hamilton-Tompkins R-peak detector [124] followed by a slope-based QRS localization
method [107] on the ECG signal with a window of 7 epochs (3.5 min) centered on the epoch of
interest. This window serves the purpose of including sufficient data points to capture changes
in heartbeat (or RR) intervals [288]. Afterwards, the corresponding respiratory effort at the
time stamps of the heartbeats is sampled. The resulting CRI time series is then used for VG
analysis. Figure 6.1 illustrates an example of the computation of a CRI time series from its
corresponding ECG signal and respiratory (effort) signal.
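The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `sample_cri` is hypothetical, heartbeat detection is assumed to have been done already, and linear interpolation between neighboring respiratory samples is an assumption (the text only states that the respiratory effort is sampled at the heartbeat time stamps).

```python
from bisect import bisect_right

def sample_cri(resp_times, resp_values, beat_times):
    """Sample a respiratory effort signal at heartbeat time stamps.

    resp_times must be strictly increasing. Values between two
    respiratory samples are linearly interpolated (an assumption).
    Returns a list of (time, value) pairs, i.e. a CRI time series.
    """
    cri = []
    for t in beat_times:
        # Skip heartbeats that fall outside the respiratory recording.
        if t < resp_times[0] or t > resp_times[-1]:
            continue
        j = bisect_right(resp_times, t)
        if j == len(resp_times):
            # t coincides with the last respiratory sample.
            cri.append((t, resp_values[-1]))
            continue
        t0, t1 = resp_times[j - 1], resp_times[j]
        v0, v1 = resp_values[j - 1], resp_values[j]
        w = (t - t0) / (t1 - t0)
        cri.append((t, v0 + w * (v1 - v0)))
    return cri

# Toy example: a triangular "breath" sampled at five beat times.
cri = sample_cri([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [0.5, 1.0, 1.5, 2.0, 3.0])
```

Because heartbeats are not equally spaced, the resulting series is unevenly sampled in time, which is why the VG rule in the next section is written in terms of both the values and their time stamps.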
6.3 Visibility graph network
In this work, we apply the VG method to build complex networks for modeling a CRI time
series and to analyze its dynamics across different sleep stages for healthy subjects. To formalize
the VG method, let us consider a time series with n data points {(xk, tk)}, k = 1, 2, ..., n. Two data points
[Figure 6.2 graphic: (a) a time series segment with data points xi+1, ..., xi+7 at times ti+1, ..., ti+7; (b) the corresponding network.]
Figure 6.2: An example of converting (a) a time series segment with 7 data points into (b) a network
using the VG method, where the respective degrees of nodes from xi+1 to xi+7 are 4, 3, 3, 5, 3, 2, and 4.
(xi, ti) and (xj, tj) are connected as vertices or nodes through an undirected edge if and only if
the following rule [169] is satisfied
$$\forall \ell \in (i, j):\quad x_\ell < x_j - (x_j - x_i)\,\frac{t_j - t_\ell}{t_j - t_i}. \tag{6.1}$$
Intuitively, this means that two data points are connected if they are able to ‘see’ each
other (i.e., the straight line joining them lies strictly above every data point in between).
The time series can therefore be converted into a VG by applying this rule to all pairs of
data points, which yields its associated complex network of nodes linked by edges.
Figure 6.2 illustrates an example of converting a time
series x into a VG-based network. For each node, its degree δ is defined as the number of
edges attached to it, giving a heuristic indication of the network’s complexity. Thus, the degree
distribution of the nodes P(δ) can be used to characterize the time series.
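The visibility rule of Eq. (6.1) translates directly into code. The following is a minimal sketch that checks every pair of points by brute force; the function names `visibility_graph` and `node_degrees` are illustrative choices, and unit time steps are assumed when no time stamps are given. Faster algorithms exist, but the naive version mirrors the definition most closely.

```python
def visibility_graph(x, t=None):
    """Build the (natural) visibility graph of a time series.

    x: sequence of values; t: strictly increasing time stamps
    (defaults to 0, 1, 2, ... as an assumption).
    Returns a list of undirected edges (i, j) with i < j.
    """
    n = len(x)
    if t is None:
        t = list(range(n))
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = True
            for l in range(i + 1, j):
                # Height of the line from (t_i, x_i) to (t_j, x_j) at t_l,
                # written exactly as the right-hand side of Eq. (6.1).
                line = x[j] - (x[j] - x[i]) * (t[j] - t[l]) / (t[j] - t[i])
                if x[l] >= line:
                    visible = False
                    break
            if visible:
                edges.append((i, j))
    return edges

def node_degrees(n, edges):
    """Degree of each of the n nodes, i.e. the number of attached edges."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

# A valley: the two outer points see each other over the low middle point.
valley_edges = visibility_graph([3, 1, 3])
# A peak: the middle point blocks the view between the outer points.
peak_edges = visibility_graph([1, 2, 1])
```

Adjacent points are always connected because there is no intermediate point to block them, so every node in a series of at least two points has degree of at least 1.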
6.4 Network properties of cardiorespiratory interaction

6.4.1 Degree distribution
A total of 310,503 epochs (including 19.2% wake, 15.2% REM sleep, 53.5% light sleep, and
12.1% deep sleep) are analyzed in this work. Figure 6.3 plots the node degree distribution of
CRI, denoted as P(δ ), pooled over all epochs for each sleep stage (wake, REM sleep, light
sleep, and deep sleep). As illustrated, the degree distribution P(δ ) for each sleep stage follows
a power-law topology such that P(δ) ∼ δ^(−λ), in particular when the degree is large (e.g., δ > 4).
The power λ is shown to differ across sleep stages (wake: λ = 3.7, REM sleep: λ = 3.8, light
sleep: λ = 4.1, and deep sleep: λ = 4.2). As reported in the literature, a power-law topology should
correspond to scale-free dynamics [14, 211, 286], suggesting that the CRI time series during
[Figure 6.3 plot: log-log axes with degree δ (10^0 to 10^3) on the horizontal axis and degree distribution P(δ) (10^−8 to 10^0) on the vertical axis; curves for wake, REM sleep, light sleep, and deep sleep, annotated with λ = 3.7 and λ = 4.2.]
Figure 6.3: Log-log plot of degree distribution P(δ) of CRI during wake, REM sleep, light sleep, and
deep sleep. P(δ) follows a power-law topology when δ is larger than 4, such that P(δ) ∼ δ^(−λ) with λ of
3.7 for wake, 3.8 for REM sleep, 4.1 for light sleep, and 4.2 for deep sleep.
[Figure 6.4 plot: log-log axes with mean degree δm (10^0 to 10^2) on the horizontal axis and mean degree distribution P(δm) (10^−6 to 10^0) on the vertical axis; curves for wake, REM sleep, light sleep, and deep sleep, annotated with λ = 6.8 and λ = 8.6.]
Figure 6.4: Log-log plot of mean degree distribution P(δm) of CRI during wake, REM sleep, light sleep,
and deep sleep. P(δm) follows a power-law [P(δm) ∼ δm^(−λ)] when λ ≥ 6 with λ of 6.8 for wake, 7.2 for
REM sleep, 8.0 for light sleep, and 8.6 for deep sleep.
[Figure 6.5 plot: mean δm (vertical axis) for wake, REM sleep, light sleep, and deep sleep.]
Figure 6.5: Mean degree δm of the CRI time series networks (mean and SD) in different sleep stages. A
Mann-Whitney test shows significant differences between all pairs of sleep stages, with p < 0.0001.
a specific sleep stage are non-stationary and fractal [169]. In addition, we observe that the
VG-based networks of CRI for wake epochs have a higher percentage of high-degree nodes (i.e., the
networks have a higher complexity) than those of the other sleep stages, whereas the networks
for deep sleep have the fewest high-degree nodes. A possible explanation is
that the CRI time series is noisier and more irregular during wake (caused by the weaker coupling
between cardiac and respiratory signals) and more regular during deep sleep (due to the stronger
cardiorespiratory coupling) when compared with the other stages [30, 188].
The ‘blur’ in the figure at large values of δ might be due to the presence of outliers in the
CRI time series caused by loose cables during measurement or body motion artifacts.
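One simple way to obtain an exponent such as λ from a pooled list of node degrees is a least-squares line fit in log-log coordinates over the tail δ > 4. The sketch below illustrates that idea only; the text does not state how λ was estimated in this work, so the fitting procedure, the function names, and the `min_degree` parameter are all assumptions.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Empirical distribution P(delta) as a dict {degree: probability}."""
    counts = Counter(degrees)
    total = len(degrees)
    return {d: c / total for d, c in counts.items()}

def fit_power_law_exponent(dist, min_degree=5):
    """Estimate lambda in P(delta) ~ delta**(-lambda).

    Fits a straight line to (log10 delta, log10 P) for delta >= min_degree
    by ordinary least squares and returns the negated slope.
    """
    pts = [(math.log10(d), math.log10(p))
           for d, p in dist.items() if d >= min_degree and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points in the tail")
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    return -sxy / sxx

# Synthetic check: an exact power law with exponent 4 is recovered.
synthetic = {d: d ** -4.0 for d in range(5, 50)}
lam = fit_power_law_exponent(synthetic)
```

A log-log regression is sensitive to the sparsely populated tail (the ‘blur’ mentioned above), so in practice the upper end of the fitting range would also need to be chosen with care.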
Since the degree is different between different sleep stages, it can be used to distinguish them
on an epoch-by-epoch basis. For this purpose, the mean degree δm for each epoch (computed
by averaging the degrees over the nodes with a window of 7 epochs centered on that epoch) can
be used as a quantification of the network ‘complexity’ of the CRI time series in VG for each
epoch. Figure 6.4 shows the distribution of δm for different sleep stages where the separations
between sleep stages can be clearly observed, in particular when the mean degree is smaller
than 3 or larger than 6. These results are similar to those obtained based on the analysis of
EEG signals [321]. In Figure 6.5, the δm values in different sleep stages are compared. Using
a two-tailed Mann-Whitney test, δm is found to be significantly different between each pair of
sleep stages (all with p < 0.0001). This means that, on average, wake epochs have the highest
mean degree in the networks followed by REM sleep epochs, then by light sleep and finally by
deep sleep. Moreover, if we consider the degree variation δsd , computed as the SD of the node
degrees in each epoch, we also find statistically significant differences between sleep stages
(all with p < 0.0001) as illustrated in Figure 6.6. The Spearman’s rank correlation coefficient
r between these two parameters δm and δsd is found to be high [r = 0.733, p < 0.00001; 95%
[Figure 6.6 plot: mean δsd (vertical axis) for wake, REM sleep, light sleep, and deep sleep.]
Figure 6.6: Degree variation δsd of the CRI time series networks (mean and SD) in different sleep stages.
A Mann-Whitney test shows significant differences between all pairs of sleep stages, with p < 0.0001.
confidence interval (CI) 0.730-0.736].
6.4.2 Assortative mixing
Another important property of a network is its assortative mixing [210], which has been widely
used to analyze many real-world networks such as biological [100], neural [86], and social
networks [212]. It describes the preference of a node to connect to other nodes of similar
(high or low) degree. Consider a network with a total of M edges, where the ith
edge connects two nodes with degrees αi and βi at its ends. The assortativity coefficient ζ
of this network [210] is given by
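$$\zeta = \frac{M^{-1}\sum_i \alpha_i \beta_i - \left[M^{-1}\sum_i \tfrac{1}{2}(\alpha_i + \beta_i)\right]^2}{M^{-1}\sum_i \tfrac{1}{2}(\alpha_i^2 + \beta_i^2) - \left[M^{-1}\sum_i \tfrac{1}{2}(\alpha_i + \beta_i)\right]^2}, \tag{6.2}$$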
with ζ ranging between −1 and 1. The network is assortative if ζ > 0, in which case the
high-degree (or low-degree) nodes are more likely to be connected to each other than to the
low-degree (or high-degree) nodes; if ζ = 0, the network is randomly mixed; and if ζ < 0 the
network exhibits disassortativity, in which case the high-degree nodes tend to connect to the
low-degree ones, and vice versa. For the CRI time series in this work, the assortativity
coefficients of the associated VG-based networks in different sleep stages are shown in Figure 6.7.
The CRI networks in all sleep stages are assortative. In comparison with REM and NREM
sleep, the CRI network has a decreased assortativity coefficient during wake, indicating that
the network is more randomly mixed. Deep sleep, on the other hand, has a larger ζ compared
with light sleep, possibly because the CRI time series during deep sleep exhibit a more regular
pattern than light sleep. These findings suggest that sleep stages can also be separated based
on differences between the assortativity coefficients of VG-based CRI networks. It should also
[Figure 6.7 plot: mean ζ (vertical axis, 0 to 0.25) for the different sleep stages.]
Figure 6.7: Assortativity coefficient ζ of the CRI time series networks (mean and SD) for different
sleep stages. A Mann-Whitney test shows significant differences between all pairs of sleep stages, with
p < 0.0001.
be noted that ζ is significantly correlated with δm (r = −0.363, p < 0.0001; 95% CI −0.368 to
−0.358) and δsd (r = −0.526, p < 0.0001; 95% CI −0.531 to −0.522).
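Equation (6.2) can be evaluated directly from an edge list. The sketch below is a minimal illustration; the function name `assortativity` is an illustrative choice, and node degrees are derived from the edge list itself so that the block is self-contained.

```python
from collections import Counter

def assortativity(edges):
    """Assortativity coefficient zeta of an undirected network, Eq. (6.2).

    edges: list of (i, j) node pairs, one entry per edge.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("network has no edges")
    # Degree of each node = number of edge ends attached to it.
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    s_prod = s_mean = s_sq = 0.0
    for i, j in edges:
        a, b = degree[i], degree[j]      # alpha_i and beta_i in Eq. (6.2)
        s_prod += a * b
        s_mean += 0.5 * (a + b)
        s_sq += 0.5 * (a * a + b * b)
    num = s_prod / m - (s_mean / m) ** 2
    den = s_sq / m - (s_mean / m) ** 2
    if den == 0:
        raise ValueError("undefined: all edge ends have the same degree")
    return num / den

# A star (one hub, three leaves) is perfectly disassortative.
zeta_star = assortativity([(0, 1), (0, 2), (0, 3)])   # -1.0
# A path of four nodes mixes degrees 1 and 2.
zeta_path = assortativity([(0, 1), (1, 2), (2, 3)])   # -0.5
```

The coefficient is undefined for regular networks, in which every edge joins nodes of identical degree, because the denominator of Eq. (6.2) then vanishes.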
6.5 Conclusion
In this chapter, we quantified the dynamics of cardiorespiratory interaction
during sleep by converting it into complex networks using the VG method. These dynamics can be
described by some important characteristics of the networks including (mean) degree and its
distribution, degree variation, and assortativity coefficient. These characteristics were shown to behave
differently across sleep stages. However, they were found to be correlated, possibly due to
the presence of mutual information between them. Nevertheless, in practice, they offer
promising features for classifying sleep stages based on cardiorespiratory activity.
Part II: Timing Between Autonomic and Brain Activity
CHAPTER 7
Time delay between cardiac and brain activity during sleep transitions
This chapter is adapted from: X. Long, J. B. Arends, R. M. Aarts, R. Haakma, P. Fonseca, and J. Rolink.
Time delay between changes of cardiac and brain activity during sleep transitions. Applied Physics
Letters, 106:143702, 2015. © AIP
Abstract – Human sleep consists of wake, rapid-eye-movement (REM) sleep, and non-REM
(NREM) sleep that includes light and deep sleep stages. This work investigated the time de-
lay between changes of cardiac and brain activity for sleep transitions. Here the brain activity
was quantified by electroencephalographic (EEG) mean frequency and the cardiac parameters
included heart rate, standard deviation of heartbeat intervals and their low- and high-frequency
spectral powers. Using a cross-correlation analysis, we found that the cardiac variations during
wake-sleep and NREM sleep transitions preceded the EEG changes by 1-3 min but this was not
the case for REM sleep transitions. These findings could be further used to predict the
onset and end of some sleep stages at an early point in time.
7.1 Introduction
In the past decades it has been recognized in many domains that two coupled
sources or systems can exhibit an unsynchronized interaction with a time difference or delay in
between [29, 50, 90, 155, 157, 290]. For instance, neural oscillators exhibit enhanced coupling
under time delay [90]. In particular, such delays may occur during transitions between two physical or
biological states such as chaotic state changes [290], gene switches [50], neutron emission [157],
and cardiorespiratory phase synchronization transitions [29]. Understanding these phenomena
can help, e.g., to explore the coherence of neurons and information transmission of the brain in
neurology [90] and to improve ‘perception-action’ planning with stimulus events from the external
world in cognitive science [236].
In this work we apply the time delay analysis in the area of human sleep. Neurophysiological
mechanisms of sleep are exceptionally important for humans to maintain, for instance, health,
internal homeostasis, memory, and cognitive and behavioral performance [61, 165]. Numerous
studies have reported significant association between heart rate (and heart rate variability, HRV)
and electroencephalographic (EEG) activity during sleep, where they both vary across sleep
states/stages [46, 54, 292]. Previous studies have demonstrated the presence of unsynchronized
changes of HRV and EEG activity in time course over the entire night [146, 217]. However,
the variations of brain activity and autonomic cardiac dynamics are unlikely to be independent of
sleep (state/stage) transitions, during which their coupling might change. We therefore investigated
the time delay in sleep transition profiles between cardiac and EEG activity using a
cross-correlation analysis, which has not been studied before.
It is known that human sleep consists of wake state, rapid-eye-movement (REM) sleep state,
and non-REM (NREM) sleep state including four stages 1, 2, 3, and 4 according to the rules
recommended by Rechtschaffen and Kales (R&K) [247]. With the more recent guidelines of the
American Academy of Sleep Medicine [136], stages 3 and 4 are merged into a single
slow wave sleep or “deep” sleep stage since no essential difference was found between them. Besides,
stages 1 and 2 usually correspond to “light” sleep. According to one of these manuals, sleep
states/stages are scored by sleep clinicians on continuous 30-s epochs by visually inspecting
polysomnographic (PSG) recordings including multi-channel EEG, electrooculography (EOG),
and electromyography (EMG).

7.2.1 Subjects and recordings
A total of 330 overnight PSG recordings in the SIESTA database [160] from 165 normal sub-
jects (88 females) were considered in our analysis, where each subject spent two consecutive
nights for sleep monitoring. The SIESTA data were collected in seven sleep centers located
in five EU countries within a period from 1997 to 2000. The study was approved by the local
ethical committees of the recording partners and all subjects provided their informed consent.
The subjects had an average age of 51.8 ± 19.4 y and the average total recording length was
[Figure 7.1 plot: Cohen’s Kappa coefficient (vertical axis) for the stage pairs REM/deep, Wake/deep, Wake/REM, REM/light, Wake/light, and Light/deep.]
Figure 7.1: Inter-rater agreement as evaluated by Cohen’s Kappa [mean and standard deviation (SD)
over recordings] between different sleep stages. Statistical significance of difference between each two
Kappa values was examined with a t-test, where the Kappa had no significant difference between REM
sleep/deep sleep and wake/deep sleep and between REM sleep/light sleep and wake/light sleep (p <
0.05) but it was significantly different between the others (p < 0.001).
7.8 ± 0.5 h per night. They fulfilled several criteria, such as no reported symptoms of neuro-
logical, mental, medical, or cardiovascular disorders, no history of drug or alcohol abuse, no
psychoactive medication, no shift work, and retirement to bed between 22:00 and 24:00 de-
pending on their habitual bedtime. Sleep states/stages were scored by two independent raters
based on the R&K rules. In case of disagreement, a consensus annotation was obtained.
The inter-rater reliability in separating different sleep stages (measured by Cohen’s Kappa
coefficient of agreement [72], where 0 indicates chance-level agreement and 1 perfect agreement)
is compared in Figure 7.1. It shows that
the Kappa in distinguishing between light and deep sleep was statistically significantly lower
than that for separating other sleep stages. This is likely due to the gradual changes of physiological
behaviors within NREM sleep.
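For reference, Cohen's Kappa compares the observed agreement between two raters with the agreement expected by chance from their label frequencies. The sketch below is a generic illustration with a hypothetical function name `cohens_kappa`; how the epochs were selected for each stage pair in Figure 7.1 is not specified in the text, so restricting the input to epochs scored as one of the two stages of interest would be an additional assumption.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two raters scoring the same epochs."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the raters' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("undefined: both raters used a single identical label")
    return (p_o - p_e) / (1 - p_e)

# Two raters disagree on one of four epochs.
kappa = cohens_kappa(["W", "W", "LS", "LS"], ["W", "LS", "LS", "LS"])  # 0.5
```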
7.2.2 EEG and cardiac activity
The EEG activity was quantified by a parameter fEEG , called EEG mean frequency [217]. To
calculate it, the EEG signals were first band-pass filtered between 0.3 and 35 Hz and then
the power spectral density was computed for each non-overlapping 2-s interval with a discrete
Fourier transform (DFT). Afterwards, the associated peak frequencies between 0.5 and 30 Hz
were detected accordingly and then for each 30-s epoch, they were averaged over a window of
9 epochs (4.5 min) centered on that epoch, yielding the epoch-based estimates of fEEG . The
cardiac parameters, derived from electrocardiography (ECG) signals over a 9-epoch window
centered on each 30-s epoch, included mean heart rate (HR), standard deviation of heartbeat
intervals (SDNN), and the logarithmic spectral powers of heartbeat intervals in low-frequency
(LF, 0.01-0.15 Hz) and high-frequency (HF, 0.15-0.4 Hz) bands. They have been shown to
relate to certain properties of the autonomic nervous system [13, 281]. For instance, HR, SDNN, and
[Figure 7.2 plot: overnight hypnogram (wake, REM sleep, light sleep, deep sleep) above traces of fEEG, HR, SDNN, LF, and HF (each on a scale from −2 to 2) versus Time (h) from 0 to 8.]
Figure 7.2: An example of epoch-based sleep states/stages over night and the normalized (Z-score) EEG
mean frequency fEEG and cardiac parameters HR, SDNN, LF, and HF (in nu).
LF are associated with sympathetic activity and the HF power is a marker of parasympathetic or
vagal activity activated by respiratory-stimulated stretch receptors [24, 281, 288]. Many stud-
ies have shown that autonomic nervous activity is effective in identifying sleep states or stages
when PSG is absent [179, 183, 248]. Here all the parameters were normalized to zero mean
and unit variance (Z-score) for each recording, leading to a normalized unit “nu”. Note that the
window served to include sufficient heartbeats to capture cardiac rhythms and to
reduce signal noise so that the autonomic nervous activity can be reliably expressed; a
window size of about 5 min has been recommended [288].
For analyzing the time delay during sleep transitions, we chose 30 s as the minimum epoch length
because (1) it is the standard resolution for PSG-based manual scoring of sleep stages [247]
and (2) with a shorter length the parameters could be influenced by the subtle changes caused
by the physiological response during arousals [268], which would likely lead to spurious
cross-correlation results. Figure 7.2 illustrates an example of an overnight sleep profile and the
EEG and cardiac parameter values from a healthy subject. It can be seen that these parameters
seem correlated with sleep states/stages to some extent.
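The two generic post-processing steps mentioned in this section, averaging over a window of 9 epochs centered on each epoch and Z-score normalization per recording, can be sketched as follows. The function names are illustrative; shrinking the window at the start and end of the recording and using the population standard deviation are assumptions, since the text does not specify either detail.

```python
import statistics

def centered_moving_average(values, window=9):
    """Average each epoch over a window centered on it.

    Near the edges the window is truncated to the available epochs
    (an assumption; other edge treatments are possible).
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def z_score(values):
    """Normalize one recording to zero mean and unit variance."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)   # population SD (an assumption)
    if sd == 0:
        raise ValueError("constant signal cannot be Z-scored")
    return [(v - mu) / sd for v in values]

# Toy example with a 3-epoch window instead of 9 for readability.
smoothed = centered_moving_average([1, 2, 3, 4, 5], window=3)
normalized = z_score(smoothed)
```

With `window=3` the toy input becomes `[1.5, 2.0, 3.0, 4.0, 4.5]`; the first and last values are averages over two epochs only.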
7.3 Correlation analysis during sleep transitions
To capture the delayed changes of cardiac and EEG activity, we restricted our analysis to
periods of 15 epochs (7.5 min) before and after each transition moment in which only one
transition occurred, located in the middle of the period. We note that only a small portion of
transitions was sampled according to these criteria, which might lead to underrepresentation of
the fragmented sleep transitions, i.e., transitions with other transitions immediately preceding
or following within a short time. The number of these periods was 1077 out of a total of 28359 transitions from
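[Figure 7.3 graphic: (a) transition diagram between wake, REM sleep, light sleep, and deep sleep (light and deep sleep grouped as NREM sleep), with arrows labeled by the transition percentages 0.6%, 25.0%, 1.0%, 3.8%, 20.2%, 0.03%, 8.1%, 14.4%, 11.3%, and 15.5%; (b) bar chart of the distribution (%, 0 to 70) of wake, REM sleep, light sleep, and deep sleep.]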
Figure 7.3: (a) Mean percentages of sleep transitions over recordings. The average number of total
transitions per recording is 85.9. The transitions are indicated with arrows, where the REM–deep sleep
transitions are not shown because they account for less than 0.01% of total transitions. (b) Sleep stage
distribution (mean and SD over recordings).
all 330 recordings. The first and the last 5 epochs of these periods were excluded, yielding
10-min segments used for analyzing time delays. This served to avoid the time-delayed effects
of the previous and the next transitions when analyzing the parameter values for the time delay
of current sleep transition and meanwhile, to include enough data points for computing cross-
correlation coefficients. By these means, we only considered major types of sleep transitions in
three “hierarchical” levels, as shown in Figure 7.3. They are the transitions: (1) between wake
and sleep including W→LS (from wake to light sleep), LS→W (from light sleep to wake), and
RS→W (from REM sleep to wake); (2) between REM and NREM sleep including RS→LS
(from REM to light sleep) and LS→RS (from light to REM sleep); and (3) within NREM sleep
including LS→DS (from light to deep sleep) and DS→LS (from deep to light sleep). These
seven types of transitions predominate among all sleep transitions [154, 159], which
can also be observed in our data (see Figure 7.3). The transitions between REM and deep sleep
and from deep sleep to wake were not included. For each parameter, we calculated the mean
values over all the 10-min segments for each type of transition and then they were Z-score
normalized. Figure 7.4 illustrates the mean parameter values 5 min (or 10 epochs) before and
after sleep transitions.
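The selection rule described in this section (a single transition in the middle of a period of 15 epochs before and 15 epochs after, then trimming 5 epochs at each end) can be sketched as below. The function names and the string stage labels are illustrative assumptions; the hypnogram is taken to be a list with one label per 30-s epoch.

```python
def isolated_transitions(hypnogram, margin=15):
    """Find transitions with no other transition within `margin` epochs.

    A transition at index t means hypnogram[t] differs from hypnogram[t - 1].
    Returns a list of (t, stage_before, stage_after).
    """
    n = len(hypnogram)
    selected = []
    for t in range(1, n):
        if hypnogram[t] == hypnogram[t - 1]:
            continue
        if t - margin < 0 or t + margin > n:
            continue  # not enough epochs around the transition
        before = hypnogram[t - margin:t]
        after = hypnogram[t:t + margin]
        # Exactly one transition in the period: both halves are uniform.
        if len(set(before)) == 1 and len(set(after)) == 1:
            selected.append((t, before[0], after[0]))
    return selected

def analysis_segment(t, margin=15, trim=5):
    """Half-open epoch range kept after trimming both ends of the period."""
    return (t - margin + trim, t + margin - trim)

clean = ["W"] * 15 + ["LS"] * 15
fragmented = ["W"] * 15 + ["LS"] * 10 + ["DS"] * 20
picked = isolated_transitions(clean)          # [(15, "W", "LS")]
rejected = isolated_transitions(fragmented)   # []
segment = analysis_segment(15)                # (5, 25): 20 epochs = 10 min
```

In the fragmented example both transitions are rejected because each has another transition within 15 epochs, which illustrates why only a small share of all transitions meets the criteria.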
The cross-correlation between EEG mean frequency fEEG and each cardiac parameter αc
(HR, SDNN, LF, or HF) for a given time segment with m epochs is expressed by a
cross-correlation function G,
$$G_{f_{\mathrm{EEG}},\alpha_c}(n) \equiv (f_{\mathrm{EEG}} \star \alpha_c)(n) = \frac{1}{m}\sum_{i=1}^{m-n} f_{\mathrm{EEG},i}\cdot\alpha_{c,i+n}, \tag{7.1}$$
where n is the number of time shifts (a.k.a. time lag) of the cross-correlation between fEEG and αc.
Therefore, the delayed time ∆τ can be obtained by searching for the lag leading to maximum
[Figure 7.4 plot: rows fEEG, HR, SDNN, LF, and HF (each on a scale from −2 to 2); columns W→LS, LS→W, RS→W (wake-sleep), RS→LS, LS→RS (REM-NREM), and LS→DS, DS→LS (NREM); horizontal axis from −5 to 5 around the transition.]
Figure 7.4: Mean values of the normalized parameter fEEG , HR, SDNN, LF, and HF (in nu) with 10
epochs (5 min) before and after different sleep transitions (W→LS, LS→W, RS→W, RS→LS, LS→RS,
LS→DS, and DS→LS).
absolute correlation coefficient, such that
$$\Delta\tau = \arg\max_{n}\,\left|G_{f_{\mathrm{EEG}},\alpha_c}(n)\right|. \tag{7.2}$$
The time delay ∆τ can be positive or negative. A positive ∆τ value indicates that fEEG starts
changing earlier than the cardiac parameter αc, and conversely, a negative value indicates that the
variations of αc precede those of fEEG by |∆τ| epochs (|∆τ|/2 min) on average.
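Equations (7.1) and (7.2) amount to a lag search over a cross-correlation function. The sketch below is a minimal illustration with hypothetical function names; it extends the sum in Eq. (7.1) to negative lags by summing over all index pairs that stay inside the segment, and it assumes both inputs are already Z-score normalized so that G behaves like a correlation coefficient.

```python
def cross_correlation(f, a, n):
    """G(n) of Eq. (7.1): (1/m) * sum of f[i] * a[i + n] over valid i."""
    m = len(f)
    if len(a) != m:
        raise ValueError("both series must have the same length")
    start = max(0, -n)       # for negative lags, i + n must stay >= 0
    stop = min(m, m - n)     # for positive lags, i + n must stay < m
    return sum(f[i] * a[i + n] for i in range(start, stop)) / m

def time_delay(f, a, max_lag=20):
    """Eq. (7.2): lag with the largest absolute cross-correlation.

    Returns (lag, G(lag)). A positive lag means f changes before a;
    a negative lag means a changes before f.
    """
    lags = range(-max_lag, max_lag + 1)
    best = max(lags, key=lambda n: abs(cross_correlation(f, a, n)))
    return best, cross_correlation(f, a, best)

# Toy example: `late` is `early` delayed by 2 epochs.
early = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
lag_pos, g_pos = time_delay(early, late, max_lag=4)   # lag_pos == 2
lag_neg, g_neg = time_delay(late, early, max_lag=4)   # lag_neg == -2
```

Passing the EEG parameter as `f` and a cardiac parameter as `a` reproduces the sign convention used in this chapter, where a negative delay means that the cardiac change comes first.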
7.4 Results and discussion

As shown in Table 7.1, the cardiac parameters started changing approximately 1.5 min ahead
of the EEG mean frequency for the entire-night recordings, confirming the findings reported
by Otzenberger et al. [217]. This indicates that the changes of autonomic activity generally
precede the EEG changes. It was also revealed that, on average, HR, SDNN, and LF were
positively correlated with EEG mean frequency while HF was negatively correlated with it
(p < 0.05). In addition, the table provides the time delay analysis results for different types of
sleep transitions, where the time lag ∆τ (in 30-s epochs) and the associated maximum correlation
coefficients r are given. For SDNN, LF, and HF, we found that the time lag was −3 to −1 min
for the transitions between wake and sleep and −2 to −1 min for NREM sleep transitions.
This indicates that the changes of HRV preceded the variations of EEG mean frequency by
1-3 min for these types of transitions.
In general, the relatively constant time delay between cardiac and EEG parameters indicates
the existence of time differences between autonomic and cortical changes during sleep transi-
Table 7.1: Results of time delay analysis between EEG mean frequency fEEG and four cardiac
parameters HR, SDNN, LF, and HF for different sleep transitions

Sleep transition  Samples        HR           SDNN            LF            HF
                                ∆τ      r     ∆τ      r     ∆τ      r     ∆τ      r
◮ Full-night recording
All               N = 330     −2.4   0.22   −2.6   0.24   −2.6   0.19   −3.3  −0.19
◮ Wake-sleep transition
W→LS              N = 159       −1   0.90     −3   0.86     −5   0.71     −4  −0.77
LS→W              N = 84        −1   0.89     −5   0.62     −2   0.74     −2  −0.73
RS→W              N = 29        −2   0.86     −6   0.70     −3   0.79     −3  −0.86
◮ REM-NREM transition
RS→LS             N = 180        0   0.84      0   0.90      0   0.71      0  −0.71
LS→RS             N = 284        1   0.89      0   0.84      1   0.92      2  −0.90
◮ NREM transition
LS→DS             N = 196        0  −0.96     −2   0.70     −2   0.75     −3  −0.64
DS→LS             N = 145        1  −0.60     −4   0.78     −4   0.81     −4  −0.83

Correlation coefficients r were computed for lags from −20 to +20 epochs. For full-night recordings,
the average time delays and correlation coefficients are presented, which were significant (p < 0.05)
for the majority of the recordings (82.7% for HR, 85.2% for SDNN, 76.1% for LF, and 78.4% for
HF). For sleep transitions, the maximum correlations are presented and they were found to be
significant (p < 0.0001). Positive delays mean that EEG changes precede cardiac changes and
negative delays indicate that changes in cardiac activity precede those in EEG activity.
tions. The constant earlier appearance of autonomic variations suggests that cortical changes
are secondary to changes elsewhere in the brain (e.g., brain stem) or central nervous system.
These time differences are sleep state/stage dependent and do not seem to occur for REM sleep
(i.e., REM-NREM transitions). This also suggests that the physiology of these changes during
REM sleep is different from that during wake and NREM sleep. In fact, REM sleep has
different physiological mechanisms compared with NREM sleep, where REM transitions are
‘switch-like’ transitions [187] while the physiological variations within NREM sleep are
gradual [55]. The lack of time delay during REM transitions might also be caused by the fact that
the R&K rules force human raters to merge REM epochs of 30 s into one REM sleep period if
they occur within 3 min [247]. For W→LS transitions, upon a closer look, we found that most
of them occurred at the beginning of the night, indicating the presence of a time delay
between cardiac and brain activity during sleep onset. The time delay from sleep (REM or light
sleep) to wake could be due to the gradual steps of awakening [8, 116]. Additionally, as shown
in the table, the changes of HR always seem to occur later than the HRV changes. We therefore
speculate that, to a certain degree, parasympathetic changes (reflected by HF changes) might appear
slightly earlier than the variations of sympathetic activity (corresponding to HR changes) during
wake-sleep and NREM transitions.
As stated, when computing the parameters, we applied averaging or filtering over a 9-epoch
[Figure 7.5 plot: time delay ∆τ (in 30-s epochs, −6 to 6) and absolute correlation |r| (0.4 to 0.8) versus window size (1, 3, 5, 7, and 9 epochs of 30 s) for HR, SDNN, LF, and HF; one curve per transition (W→LS, LS→W, RS→W, RS→LS, LS→RS, LS→DS, DS→LS).]
Figure 7.5: Time delay ∆τ between cardiac and EEG activity and the associated (maximum) absolute
correlation coefficient |r| versus averaging window size (1-9 epochs, step size 2 epochs) for computing
the epoch-based parameters.
[Figure 7.6 plot: absolute HR change (bpm, 0 to 14) for the transitions LS→W, RS→W, W→LS, LS→RS, RS→LS, DS→LS, and LS→DS, grouped as wake-sleep, REM-NREM, and NREM transitions.]
Figure 7.6: Absolute changes of HR (mean and SD) during sleep transitions, computed based on the
10-min segments.
(4.5-min) window centered on each epoch in order to obtain reliable parameter values.
Figure 7.5 illustrates the time delay and the associated absolute correlation coefficient versus the
averaging window size. The figure shows that our choice was appropriate, since the
correlations generally increased and the time delays ∆τ stabilized as the window size
increased. In fact, when performing cross-correlation analysis between two signals, applying the
same symmetric linear-phase filter with the same window size to both signals does not cause signal
phase distortion [240]. Thus, the averaging here should not affect the lag found when searching
for the time delays.
Figure 7.6 shows the absolute changes of HR (in beats per minute, bpm) during different
sleep state/stage transitions. Large HR changes (4.6-9.1 bpm) occurred during the wake-sleep
transitions, while the NREM transitions had the smallest changes in HR (1.1-2.7 bpm). This
supports the “hierarchical” nature of the various transitions and the validity of the
results.
7.5 Conclusion
In this chapter, we investigated the time delay between cardiac and brain activity for different
sleep transitions using a cross-correlation analysis. The presented results indicate that the au-
tonomic nervous system changes generally precede the EEG changes by 1-3 min during sleep
transitions except for REM-NREM transitions. In practice, the important findings here can be
used in future research to predict sleep state/stage changes based on autonomic nervous activity.
CHAPTER 8
Detection of nocturnal slow wave sleep based on cardiorespiratory activity
This chapter is adapted from: X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Rolink. Detection of
nocturnal slow wave sleep based on cardiorespiratory activity. Submitted.
Abstract – Human slow wave sleep (SWS) during the night is paramount for energy conservation
and memory consolidation. This work aims at automatically detecting SWS from nocturnal sleep
using cardiorespiratory signals that can be acquired with unobtrusive sensors in a home-based
scenario. From the signals, time-dependent features are extracted for continuous 30-s epochs.
To reduce the measurement noise, body motion artifacts, and/or within-subject variability in
physiology conveyed by the features, and thus enhance the detection performance, we propose to
smooth the features over each night using a spline fitting method. In addition, it is found
that the changes in cardiorespiratory activity precede the transitions between SWS and the
other sleep stages (non-SWS). To exploit this, a novel scheme is proposed that performs the
SWS detection for each epoch using the feature values prior to that epoch. Experiments were
conducted with a large data set of 325 overnight polysomnography (PSG) recordings using a
linear discriminant classifier and ten-fold cross-validation. Features were selected with a
correlation-based method. Results show that the performance in classifying SWS and non-SWS
can be significantly improved by smoothing the features and using the feature values from
5 min earlier. When compared with manual PSG scoring, we achieved a Cohen’s Kappa coefficient
of 0.57 (at an accuracy of 88.8%) using only six selected features for 257 recordings with a
minimum of 30 min of overnight SWS, which were considered representative of the subjects’
habitual sleeping pattern at home. A marked drop in Kappa to 0.21 was observed for the other
nights with an SWS time of less than 30 min, which were found to be more likely to occur in
older subjects. This remains a challenge for future cardiorespiratory-based SWS detection.
8.1 Introduction
Nocturnal sleep of humans is comprised of rapid-eye-movement (REM) sleep, stages S1-S4
of non-REM (NREM) sleep, and wake according to the R&K rules [247]. S1 and S2 are
grouped into “light sleep”, where S1 and S2 correspond to stages N1 and N2 respectively
according to the more recent guidelines of the American Academy of Sleep Medicine (AASM)
[136]. S3 and S4 are considered slow wave sleep (SWS), in correspondence to N3 stage in the
AASM guidelines. SWS relates to delta electroencephalographic (EEG) activity with no eye
movements [136]. It represents the most restorative period of sleep for metabolic functioning,
during which brain and body energy are conserved [35] and new memories are consolidated
[285]. SWS associates with maintenance of sleep and sleep quality [45]. Lack of SWS may
result in, e.g., loss of daytime performance [45] and increased risk of diabetes [287]. More
interestingly, growing attention has been paid in the past decade to improving nighttime sleep
(i.e., enhancing memory consolidation) through external stimulation of sleep slow waves in
humans [192, 193, 213]. We were therefore motivated to develop a system that accurately detects
SWS during nocturnal sleep, particularly in a home scenario.
Polysomnography (PSG) is the “gold standard” for objective sleep assessment, from which a
hypnogram can be derived through visual scoring by sleep technicians [136, 247]. A PSG
recording typically consists of various bio-signals such as EEG, electromyography (EMG),
electrooculography (EOG), electrocardiography (ECG), respiratory effort (RE), and blood
oxygen saturation (SaO2). These signals are usually split into continuous 30-s non-overlapping
intervals, called epochs. Although PSG is a standard method for sleep analysis, it has some
disadvantages: it is conducted in a sleep laboratory, leading to high facility costs; it
requires many electrodes to be attached to the body, which disrupts a subject’s normal sleep;
and it requires subjects to stay in the sleep laboratory overnight, which is not compatible
with prolonged sleep monitoring. To overcome these disadvantages, cardiac/respiratory
information has been used for years to assess sleep, provided that it can be acquired with
unobtrusive sensing systems in a home-based environment, such as a wrist-worn watch [127],
a bed sensor [161], a textile bedsheet [264], a web-camera [232], an acoustic device [228],
a Doppler radar [319], and a photoplethysmographic sensor [16]. It has been shown that
cardiorespiratory signals contain relevant physiological information for sleep staging, such
as heart rate variability (HRV) [46] and respiration rhythm [95]. This is because they are
related to the autonomic nervous system, whose activity differs between sleep stages [292].
For example, SWS coincides with decreased sympathetic activity, as conveyed by the
low-frequency power in HRV.
Cardiorespiratory-based sleep stage classification has been increasingly studied in recent
years, where many features (representing certain physiological aspects) have been designed
and extracted from cardiac and/or respiratory signals [151, 182, 197, 249, 309]. However,
rather than SWS detection, those studies investigated either wake–REM–NREM or sleep–wake
classification. Other studies have reported results in classifying wake, REM sleep, light
sleep, and SWS [127], detecting REM sleep [131], or differentiating light sleep and SWS [51],
but they used additional physiological signal modalities such as peripheral arterial tone
and oxyhemoglobin saturation. Shinar et al. [273] developed an HRV-based SWS detector and
obtained an accuracy of about 80%, but they validated it on only a very small portion of data
with a total duration of 100 min (SWS of 50 min) rather than on entire-night recordings.
Therefore, this
chapter addresses the problem of continuously classifying overnight SWS and non-SWS (all
the other stages) with cardiorespiratory signals that can be unobtrusively acquired.
The sleeping pattern of healthy adults usually progresses through several regular cycles
throughout the night [63]. This means that, for each recording, the sleep stage and its
associated physiological activity vary over the night, so that each feature can be considered
an epoch-based time series. After visually comparing some feature values with the PSG-based
annotations over the night, we observed many errors occurring in the middle of a long
SWS/non-SWS period, possibly due to measurement noise, variance in feature computation, or
body motion artifacts. Another cause might be the ‘within-subject variability’ in physiology,
which means that the physiological expression of features was not perfectly discriminative
and thus could not deliver an ideal separation between sleep stages. For these reasons, we
decided to low-pass filter or smooth each feature’s values over time using a spline fitting
method [296]. The main reason for using spline fitting was that, unlike many other low-pass
filters, it is capable of interpolating missing data [84, 296]. This is of particular
importance because sleep is a continuous process and we found that our data had an average of
∼10% missing values.
Several researchers have investigated the temporal relationship between cardiac dynamics
and brain activity [62, 146, 217]. For instance, Otzenberger et al. [217] reported that the
overnight HRV changes generally precede the variations in EEG activity by around 1-2 min.
Jurysta et al. [146] demonstrated that the high-frequency power of heartbeat or RR intervals
precedes the delta-wave power of the EEG spectrum by approximately 7 min (i.e., a negative
time delay). Additionally, the decrease of heart rate in stage S2 was found to anticipate the
onset of SWS by several minutes [62]. These studies indicate that the autonomic changes are
not exactly synchronized with the variations in EEG activity, in particular during the
transitions between SWS and non-SWS; rather, a time difference appears between them. In our
data, we also observed that many features started changing prior to the transition moments
between the annotated SWS and non-SWS epochs. This time delay phenomenon would lead to errors
in classifying SWS and non-SWS epochs. To address this, we propose a novel scheme that uses
the preceding feature values in earlier epochs to further improve the identification of the
sleep state of each epoch (SWS or non-SWS). This can also potentially enable early prediction
of SWS onset, allowing a real-time SWS detection system, as usually required for slow wave
stimulation in practice.
Previous work has shown that a linear discriminant (LD) classifier is appropriate for sleep
stage classification [180, 182, 249]; it was therefore adopted in this work for SWS and
non-SWS classification. Preliminary results of this work have been reported previously
[186].
Table 8.1: Subject demographics and sleep data from normal nights (with a minimal SWS time of 30 min)

Parameter          Mean ± SD      Range
Recordings         N = 257 (145 subjects)
Sex                65 males and 80 females
Age (y)            49.5 ± 19.2    20 − 95
Wake (%)           17.5 ± 11.0    1.1 − 63.0
REM sleep (%)      15.8 ± 5.5     0.0 − 29.0
Light sleep (%)    51.9 ± 8.6     21.1 − 70.4
SWS (%)            14.8 ± 5.1     6.2 − 32.2

Full PSG data (at least 16 channels of bio-signals) from 165 healthy subjects in the SIESTA
project [160] were included; the subjects were monitored in seven different sleep centers
located in five European
countries. In accordance with the SIESTA protocol, the subjects met several criteria such as
no reported symptoms of neurological, mental, medical, or cardiovascular disorders, no history
of drug or alcohol abuse, no shift work, and retirement to bed before midnight depending on
their habitual bedtime [160]. Each subject spent two consecutive nights in a sleep laboratory,
resulting in a total of 330 overnight recordings. For each recording, the scoring of 30-s epoch-
based sleep stages was carried out by sleep technicians based on PSG according to the R&K
rules. For SWS and non-SWS classification, wake, REM sleep, S1, and S2 were merged into
a single non-SWS class; S3 and S4 were labeled as SWS class. The epochs with invalid PSG
scoring (∼3%) were removed.
Five recordings were excluded due to the absence of SWS, yielding an inclusion of 325
recordings in our data set. In addition, this work primarily addressed the ‘normal’ sleep
nights (from lights OFF in the evening till lights ON in the morning), during which the total
SWS time throughout the night was no less than 30 min [216], resulting in a normal group of
257 recordings from 145 subjects. These nights were more representative of the normal
sleeping pattern in terms of SWS [216], as would be expected with home-based sleep
monitoring. The remaining 68 nights (from 51 subjects) with an overnight total SWS time of
less than 30 min (low-SWS group), more from the first nights than the second nights, were
excluded because they might be strongly influenced by the “laboratory effects”, where the
subjects could not sleep as well as they habitually do at home [196]. The subject
demographics and sleep data for the normal group used in this study are summarized in
Table 8.1. Nevertheless, we also tested our approach on the recordings from the low-SWS group.
The thoracic RE signals (sampled at 10 Hz) were acquired with a respiratory inductance
plethysmographic (RIP) chest belt and the cardiac signals (sampled at ≥100 Hz) were recorded
with a modified V1 lead ECG.
8.3 Methods
8.3.1 Signal preprocessing
The RE signal was filtered with a tenth-order Butterworth low-pass filter (with a cut-off
frequency of 0.6 Hz) to eliminate high-frequency noise. Afterwards, the baseline was subtracted
by the median peak-to-trough amplitude over the entire recording [182, 248]. Because we also
extracted respiratory features in the frequency domain, a fast Fourier transform (FFT) with a
Hanning window (used to reduce spectral leakage) was applied to estimate the power spectral
density (PSD) of the resulting signal for each epoch [248].
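The low-pass filtering and the windowed spectral estimate described above can be sketched as follows. This is a minimal Python illustration using SciPy, not the implementation used in this work: the function names `lowpass_re` and `epoch_psd`, the zero-phase (forward-backward) application of the filter, and the synthetic test signal are assumptions. Baseline removal is not included.

```python
import numpy as np
from scipy import signal

FS_RE = 10.0  # Hz, RE sampling rate stated in the text

def lowpass_re(re_signal, fs=FS_RE, cutoff_hz=0.6, order=10):
    """Tenth-order Butterworth low-pass filter applied forward and backward."""
    # Second-order sections keep a high-order filter numerically stable.
    sos = signal.butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    # Assumption: zero-phase filtering; the text does not state how the filter was applied.
    return signal.sosfiltfilt(sos, re_signal)

def epoch_psd(epoch, fs=FS_RE):
    """Periodogram of one epoch with a Hann (Hanning) window."""
    return signal.periodogram(epoch, fs=fs, window="hann")

# Synthetic check: a 0.3 Hz "breathing" component plus 2 Hz noise, 300 s long
t = np.arange(3000) / FS_RE
raw = np.sin(2 * np.pi * 0.3 * t) + 0.5 * np.sin(2 * np.pi * 2.0 * t)
filtered = lowpass_re(raw)
freqs, psd = epoch_psd(filtered[1000:1300])  # one 30-s epoch (300 samples)
peak_hz = freqs[np.argmax(psd)]
```

With a 30-s epoch sampled at 10 Hz, the frequency resolution of such a periodogram is 1/30 Hz, which bounds how precisely a respiratory frequency can be located from a single epoch.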
The ECG signal was high-pass filtered using a Kaiser window (with a cut-off frequency
of 0.8 Hz and a side-lobe attenuation of 30 dB) to remove baseline wander, after which the
resulting signal was zero-meaned. To extract features from RR intervals for each epoch, a
Hamilton-Tompkins R-peak detector [124] combined with a precise QRS localization algorithm
[107] was applied to locate R peaks on the ECG signal within a window of nine epochs centered
at the epoch of interest. This window served to include sufficient data points to capture the
changes in RR intervals; its size (4.5 min) is close to the value of 5 min recommended
in [288]. The resulting RR interval series was then re-sampled via linear interpolation at a
sampling rate of 4 Hz. The PSD of RR intervals was estimated using an autoregressive (AR)
model with adaptive order [42]. The AR model was used instead of a Fourier-based approach
because of the latter’s limitations, such as poor spectral resolution and leakage [44], to
which the PSD estimate of the RR interval series is more sensitive given its lower sampling
rate compared with the RE signal.
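The resampling and AR spectral estimation steps can be illustrated with the Python sketch below. It is an approximation under stated assumptions rather than the method of [42]: the AR coefficients are obtained from the Yule-Walker equations with a fixed order supplied by the caller (no adaptive order selection), each RR interval is time-stamped at its closing beat, and the names `resample_rr` and `ar_psd` are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def resample_rr(r_peak_times, fs=4.0):
    """Linearly interpolate the RR interval series onto a uniform time grid."""
    r_peak_times = np.asarray(r_peak_times, dtype=float)
    rr = np.diff(r_peak_times)      # RR intervals (s)
    rr_times = r_peak_times[1:]     # assumption: stamp each interval at its closing beat
    grid = np.arange(rr_times[0], rr_times[-1], 1.0 / fs)
    return grid, np.interp(grid, rr_times, rr)

def ar_psd(x, order, fs=4.0, n_freqs=512):
    """Yule-Walker AR spectrum with a fixed model order (illustrative only)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates for lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])    # AR coefficients
    noise_var = r[0] - np.dot(a, r[1:])     # variance of the driving noise
    freqs = np.linspace(0.0, fs / 2.0, n_freqs)
    lags = np.arange(1, order + 1)
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, lags) / fs) @ a
    return freqs, noise_var / (fs * np.abs(transfer) ** 2)
```

For example, `ar_psd(rr_resampled, order=16)` returns a smooth spectrum from which band powers (VLF, LF, HF) could be integrated; the order of 16 is an arbitrary illustrative value.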
A total of 70 features were extracted for each 30-s epoch from ECG and thoracic RE signals,
which are briefly described below. Note that the features for a specific epoch were mostly
computed within a certain window centered at that epoch.
The ECG features were obtained from the RR intervals or heart rates over a window of nine
epochs (with around 300 beats during sleep on average). In the time domain, they included
the mean heart rate, mean RR interval (detrended and non-detrended), standard deviation and
range of RR intervals, the percentage of successive RR intervals that differ by more than 50
ms, and the root mean square and standard deviation of successive RR interval differences
[288]. Frequency domain features comprised the logarithm of normalized power in the very low
frequency (VLF, 0.003-0.04 Hz), low frequency (LF, 0.04-0.15 Hz), and high frequency (HF,
0.15-0.4 Hz) spectral bands, the ratio of LF and HF spectral powers [249, 288], and the modulus
and phase of the HF pole [197]. The VLF, LF, and HF power and LF-to-HF ratio with adapted
spectral bands have succeeded in improving sleep/wake detection [178]. The maximum power
in the HF band and its associated frequency (in line with the mean respiratory frequency) were
also calculated [248]. Additionally, non-linear properties of RR intervals were quantified based
on detrended fluctuation analysis (DFA) with parameter α [148] and its short-term (parameter
α1) and long-term (parameter α2) exponents [224], and multi-scale sample entropy (length: 1
and 2 samples, scale: 1-10) [75].
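As an illustration of the time-domain ECG features listed above, the following Python sketch computes them from a series of RR intervals given in milliseconds. The function name `hrv_time_features`, the dictionary keys, and the use of the sample (n − 1) standard deviation are assumptions; the detrended mean RR interval is omitted.

```python
import numpy as np

def hrv_time_features(rr_ms):
    """Time-domain HRV features from RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)  # successive RR interval differences
    return {
        "mean_hr_bpm": np.mean(60000.0 / rr),               # mean heart rate
        "mean_rr_ms": rr.mean(),                            # mean RR interval
        "sdnn_ms": rr.std(ddof=1),                          # standard deviation of RR intervals
        "range_ms": rr.max() - rr.min(),                    # range of RR intervals
        "pnn50_pct": 100.0 * np.mean(np.abs(diff) > 50.0),  # % of successive differences > 50 ms
        "rmssd_ms": np.sqrt(np.mean(diff ** 2)),            # root mean square of successive differences
        "sdsd_ms": diff.std(ddof=1),                        # standard deviation of successive differences
    }

features = hrv_time_features([800, 900, 800, 900])
```

In the setting described here, `rr_ms` would hold the RR intervals found within the nine-epoch window centered on the epoch of interest.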

The RE features included the mean respiratory frequency estimated in the time and the fre-
quency domain, respiratory frequency standard deviation over five epochs, mean and standard
deviation of breath-by-breath correlations, standard deviation of breath lengths, and the spec-
tral power of respiratory frequency [249]. Several features regarding the RE amplitude were
derived: the standardized median and sample entropy of respiratory peaks and troughs (i.e.,
the respiratory upper and lower envelopes indicating the inhalation and exhalation depths, re-
spectively), median peak-to-trough difference, median volumes and flow rates of breath cycles,
inhalations, and exhalations, and the ratio of inhalation and exhalation flow rates [182]. When
computing these amplitude-based features for each epoch, we used a window of thirteen epochs
(with around 120 breath cycles) since the sample entropy measures are less reliable if the num-
ber of samples is less than 100 [250]. Similar to the spectrum analysis of RR intervals, we
found the power in different spectral bands (VLF, LF, and HF) and the LF-to-HF ratio obtained
from respiratory PSD [248]. In addition, we extracted the respiratory regularity quantified with
sample entropy over seven epochs [182] and windowed respiratory dissimilarity measured by
means of uniform scaling [185] and dynamic (time and frequency) warping [180], respectively.
8.3.3 Spline fitting for feature smoothing
As stated, the features are expected to vary cyclically over time with the sleep stages, which
motivated us to consider a recording- or night-specific feature smoothing. Before that, each
feature was normalized for each recording to have zero mean and unit variance (Z-score
normalization). This served to reduce the variability between subjects caused by the
difference between PSG systems used in different sleep laboratories and/or the difference in
physiological expression during sleep. Our previous work [186] has revealed that the Z-score
normalization can help improve SWS detection.
The spline fitting method has been widely used for time series smoothing [84]. Let x represent
a sequence of observations x = {x_1, x_2, ..., x_n} (x_1 < x_2 < ... < x_n) and y their
responses y = {y_1, y_2, ..., y_n}; then a relation between them can be modeled by
\[ y_i = g(x_i) + \varepsilon_i \quad (i = 1, 2, \ldots, n), \tag{8.1} \]
where g is a smoothing (spline) function, εi are independent and identically distributed resid-
uals. The smoothing function can be estimated by minimizing the objective function (i.e.,
penalized sum of squares) such that
\[ \hat{g} = \arg\min_{g} \left[ \sum_{i=1}^{n} \left[ y_i - g(x_i) \right]^2 + \lambda \int_{x_1}^{x_n} g''(x)^2 \, dx \right], \tag{8.2} \]
where λ is a smoothing parameter that controls the trade-off between residual and local vari-
ation. The smoothing function can be expressed by cubic B-splines as basis functions and
determined via least squares approximation (LSA) [84, 296].
Given a feature for a recording, the observations here are the epoch indices
t = {t_1, t_2, ..., t_m} and the responses are their corresponding feature values
v = {v_1, v_2, ..., v_m}, where m is the total number of epochs. To build a spline fitting
model, the entire sequence is divided into k contiguous subsequences with k − 1 boundaries
called knots or breaks, each containing l epochs. The feature values and epoch indices for
this recording are then expressed respectively as
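\[ v = \{ \underbrace{v_{11}, v_{12}, \ldots, v_{1l}}_{1}, \underbrace{v_{21}, v_{22}, \ldots, v_{2l}}_{2}, \ldots, \underbrace{v_{k1}, v_{k2}, \ldots, v_{kl}}_{k} \} \tag{8.3} \]
and
\[ t = \{ \underbrace{t_{11}, t_{12}, \ldots, t_{1l}}_{1}, \underbrace{t_{21}, t_{22}, \ldots, t_{2l}}_{2}, \ldots, \underbrace{t_{k1}, t_{k2}, \ldots, t_{kl}}_{k} \}. \tag{8.4} \]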
Thereafter, each subsequence is modeled by Equations 8.1 and 8.2, yielding a spline fitting
over the entire sequence with multiple knots. Since the total number of epochs differs between
recordings, we preferred to fix the window size of subsequences w = ⌈m/k⌉ instead of using a
fixed number of breaks k. A larger window size (or fewer knots) results in a smoother fitting
curve, while a smaller window size (or more knots) decreases its smoothness. For example,
as depicted in Figure 8.1, the feature values throughout the night appear to match the
PSG-based annotations better after spline smoothing. The figure also shows that the RR
interval and respiratory rate have lower variances during SWS compared with the other stages.
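The per-recording normalization and smoothing can be sketched in Python as follows. This is a simplified illustration, not the implementation of [84, 296]: it fits a least-squares cubic spline with one interior knot every `window` epochs using SciPy and omits the roughness penalty λ of Equation 8.2. The function name `smooth_feature` and the encoding of missing values as NaN are assumptions.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def smooth_feature(values, window=25):
    """Z-score one feature of one recording and smooth it with a cubic spline.

    Missing values (NaN) are ignored in the fit and filled in by the spline.
    """
    values = np.asarray(values, dtype=float)
    epochs = np.arange(values.size, dtype=float)
    valid = ~np.isnan(values)

    # Z-score normalization per recording, ignoring missing values
    z = (values - np.nanmean(values)) / np.nanstd(values)

    # One interior knot every `window` epochs (fixed window size, not a fixed knot count)
    knots = np.arange(window, values.size, window, dtype=float)
    first, last = epochs[valid][0], epochs[valid][-1]
    knots = knots[(knots > first) & (knots < last)]

    # Least-squares cubic spline; SciPy raises ValueError if a knot interval has too few points
    spline = LSQUnivariateSpline(epochs[valid], z[valid], knots, k=3)
    return spline(epochs)
```

A call such as `smooth_feature(sdnn_per_epoch, window=25)` would return a smoothed, gap-free series of the same length; `sdnn_per_epoch` is a hypothetical array holding one feature value per epoch.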
8.3.4 Feature subset selection
An LD classifier is usually sensitive to the presence of redundant and non-discriminative
features, which would degrade classification performance. Hence, we applied a
correlation-based feature selector (CFS) [121] to select features that maximize the
discriminative power. CFS is a supervised algorithm that aims to find an ‘optimal’ feature
subset containing features that are uncorrelated with each other and highly correlated with
the classes. With CFS, the heuristic evaluation criterion, called “merit”, is formulated by
taking the feature-to-class and feature-to-feature correlations into account. Starting with no
features, a forward search was used to add new features one by one until adding further
features no longer increased the merit. More details of CFS can be found elsewhere [121].
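The forward search can be sketched in Python as below. The merit follows the commonly cited CFS form, k·r_cf / sqrt(k + k(k − 1)·r_ff), where r_cf and r_ff are the mean feature-to-class and feature-to-feature correlations of a subset of k features. Using absolute Pearson correlations as the correlation measure is an assumption made for this sketch, as are the function names; the exact measure and search details are those of [121].

```python
import numpy as np

def cfs_merit(r_cf, r_ff, subset):
    """Merit of a feature subset from feature-class and feature-feature correlations."""
    k = len(subset)
    mean_cf = np.mean(r_cf[subset])
    if k == 1:
        return mean_cf
    sub = r_ff[np.ix_(subset, subset)]
    mean_ff = (sub.sum() - np.trace(sub)) / (k * (k - 1))  # mean off-diagonal correlation
    return k * mean_cf / np.sqrt(k + k * (k - 1) * mean_ff)

def cfs_forward(X, y):
    """Greedy forward search: add the best feature until the merit stops increasing."""
    n_feat = X.shape[1]
    # Assumption: absolute Pearson correlation as the correlation measure
    r_cf = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_feat)])
    r_ff = np.abs(np.corrcoef(X, rowvar=False))
    selected, best_merit = [], 0.0
    while len(selected) < n_feat:
        candidates = [j for j in range(n_feat) if j not in selected]
        merits = [cfs_merit(r_cf, r_ff, selected + [j]) for j in candidates]
        i = int(np.argmax(merits))
        if merits[i] <= best_merit:
            break
        selected.append(candidates[i])
        best_merit = merits[i]
    return selected, best_merit
```

The denominator grows with the feature-to-feature correlation, so a feature that is highly correlated with an already selected one adds little merit even if it is itself discriminative.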
8.3.5 Classifier
Here a simple LD classifier was adopted and the classification was performed on each epoch
over the whole recording. The linear discriminant function is given by
\[ G_c(\mathbf{f}) = -\frac{1}{2} (\mathbf{f} - \boldsymbol{\mu}_c)^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{f} - \boldsymbol{\mu}_c) + \ln \Pr(c), \tag{8.5} \]
where µ_c denotes the mean feature vector of class c, Σ the pooled covariance matrix, and Pr(c)
the prior probability of class c [SWS (positive class) or non-SWS (negative class)]. Given a
feature vector f_j, the jth epoch E_j (j = 1, 2, ..., m) of a recording is classified based on
the decision rule
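\[ C(E_j \mid \mathbf{f}_j) = \begin{cases} \text{SWS} & \text{if } G_{\text{SWS}}(\mathbf{f}_j) > G_{\text{non-SWS}}(\mathbf{f}_j) \\ \text{non-SWS} & \text{otherwise} \end{cases}. \tag{8.6} \]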
We observed that the occurrence of each class varied throughout the night. For instance,
the probability of being in SWS at the end of the night should be lower than that in the middle
of the night. This indicates that the prior probabilities are time-varying. Hence, instead of
using a fixed prior probability, we computed a time-varying prior probability for each epoch
by simply counting the relative frequency with which that specific epoch index was annotated
as each class [248].
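A compact Python sketch of the LD classifier with time-varying priors is given below. It follows Equations 8.5 and 8.6 but is not the implementation used here: labels are assumed to be coded as 1 (SWS) and 0 (non-SWS), the function names are hypothetical, and the clipping constant `eps`, which keeps the logarithm of the prior finite, is an added assumption.

```python
import numpy as np

def fit_ld(X, y):
    """Estimate the class means and the inverse of the pooled covariance matrix."""
    classes = np.unique(y)
    means = {int(c): X[y == c].mean(axis=0) for c in classes}
    centered = np.vstack([X[y == c] - means[int(c)] for c in classes])
    pooled_cov = centered.T @ centered / (len(X) - len(classes))
    return means, np.linalg.inv(pooled_cov)

def discriminant(F, mean, cov_inv, prior):
    """Equation 8.5 for every row of F; `prior` may be a scalar or one value per epoch."""
    d = F - mean
    return -0.5 * np.einsum("ij,jk,ik->i", d, cov_inv, d) + np.log(prior)

def time_varying_prior(label_seqs, n_epochs, eps=1e-3):
    """Relative frequency of SWS (label 1) at each epoch index over training recordings."""
    counts, totals = np.zeros(n_epochs), np.zeros(n_epochs)
    for seq in label_seqs:
        seq = np.asarray(seq)[:n_epochs]
        counts[:len(seq)] += (seq == 1)
        totals[:len(seq)] += 1
    prior = np.divide(counts, totals, out=np.full(n_epochs, 0.5), where=totals > 0)
    return np.clip(prior, eps, 1.0 - eps)  # assumption: avoid log(0)

def classify(F, means, cov_inv, prior_sws):
    """Equation 8.6: 1 (SWS) where G_SWS exceeds G_non-SWS, otherwise 0 (non-SWS)."""
    g_sws = discriminant(F, means[1], cov_inv, prior_sws)
    g_non = discriminant(F, means[0], cov_inv, 1.0 - prior_sws)
    return (g_sws > g_non).astype(int)
```

Here `F` holds one feature vector per epoch of a recording and `prior_sws` holds the corresponding entries of the time-varying prior, so the same feature values can lead to different decisions early and late in the night.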
8.3.6 Time delay
As illustrated in Figure 8.1, there seem to be some mismatches between the feature values and
the annotations a few minutes before the transitions between SWS and non-SWS, implying the
presence of a time delay between the changes of cardiorespiratory properties and the PSG-based
annotations. Taking this time delay into account, earlier cardiorespiratory activity can be
utilized to identify the SWS or non-SWS class. Supposing that we want to classify the jth
epoch E_j (j = 1, 2, ..., m), we can use the feature values of the (j+τ)th epoch (with a delay
of τ epochs relative to the target epoch) instead of the feature values from the epoch itself,
such that
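\[ C(E_j \mid \mathbf{f}_{j+\tau}) = \begin{cases} \text{SWS} & \text{if } G_{\text{SWS}}(\mathbf{f}_{j+\tau}) > G_{\text{non-SWS}}(\mathbf{f}_{j+\tau}) \\ \text{non-SWS} & \text{otherwise} \end{cases} \tag{8.7} \]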
in which a negative time delay (i.e., a preceding time) was expected. This means that the
class of the target epoch is determined from the feature values |τ| epochs earlier. To
evaluate this approach, we computed the discriminative power of the features and the
classification results by varying the time delay from −30 to 0 epochs with a step size of one
epoch (a τ of zero corresponds to the absence of time delay).
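In code, the delayed scheme of Equation 8.7 amounts to shifting the feature matrix of a recording along the epoch axis before classification. The Python sketch below shows one way to do this; the function name `delayed_features` and the choice to mark epochs without a source value as NaN are assumptions.

```python
import numpy as np

def delayed_features(F, tau):
    """Return a matrix whose row j holds F[j + tau] (tau in epochs, negative = earlier).

    Rows for which epoch j + tau does not exist are filled with NaN.
    """
    F = np.asarray(F, dtype=float)
    out = np.full_like(F, np.nan)
    m = len(F)
    if abs(tau) >= m:
        return out
    if tau < 0:
        out[-tau:] = F[:m + tau]
    elif tau > 0:
        out[:m - tau] = F[tau:]
    else:
        out[:] = F
    return out

# Five epochs, one feature; classify each epoch with the value from two epochs earlier
shifted = delayed_features(np.arange(5.0).reshape(-1, 1), tau=-2)
```

With 30-s epochs, `tau=-10` corresponds to using the feature values from 5 min before the target epoch.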
8.4 Experiments and evaluation

From a practical point of view, we considered a subject-independent cross-validation: the two
nights’ recordings from the same subject were both included in either the training or the test
data set. To provide an unbiased evaluation of our classifier, a ten-fold cross-validation (CV)
procedure was conducted. The data set was partitioned into ten subsets containing as nearly
equal numbers of recordings as possible. During each iteration of the ten-fold CV, nine
subsets were used to generate feature subsets and then train the classifier, and the remaining
subset was used for testing. The classification results were then obtained on each test set
of the cross-validation; thereafter, the classifier’s performance was evaluated by pooling
(i.e., aggregating) or averaging all results.
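A subject-independent partition of this kind can be sketched in Python as follows. The function name `subject_folds`, the random shuffling, and the seed are assumptions; the sketch only guarantees that all recordings of a subject fall into the same fold.

```python
import numpy as np

def subject_folds(subject_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) over recordings; a subject never appears in both."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    np.random.default_rng(seed).shuffle(subjects)
    for held_out in np.array_split(subjects, n_folds):
        is_test = np.isin(subject_ids, held_out)
        yield np.flatnonzero(~is_test), np.flatnonzero(is_test)

# Example: 25 subjects with two nights each (50 recordings)
recording_subjects = np.repeat(np.arange(25), 2)
folds = list(subject_folds(recording_subjects))
```

Feature selection and classifier training would then be carried out inside the loop on the training indices only, so that no information from the test recordings influences them.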
[Figure 8.1 plot. Rows: PSG annotation (SWS/non-SWS), SDNNRR (a.u.), and SDFRE (a.u.); horizontal axis: time (h), from 0 to 7.]
Figure 8.1: An example of overnight PSG-based annotations of SWS and non-SWS and the values of
two representative features SDNNRR (standard deviation of RR intervals) and SDFRE (standard deviation
of respiratory frequency) from a subject. The unsmoothed (dashed) and smoothed (solid) feature values
are plotted. The window width for spline fitting was 25 epochs. Comparing the annotations with
the two features suggests that classification errors might occur around the transitions between
SWS and non-SWS (e.g., the transition around the fifth hour).
To prevent selecting features upon the whole data set and thus biasing the classifier, CFS
was applied during each iteration of the ten-fold CV, yielding ten ‘optimal’ feature subsets,
one for each training set. In order to assemble a single feature list, only the features
appearing in all feature subsets were selected. This list was then used in all iterations of
the ten-fold CV to test the classifier.
Although the feature selector can automatically choose features that optimally separate the
classes SWS and non-SWS, evaluating the discriminative power of each single feature explores
which physiological aspects help distinguish both classes. It not only allows for the comparison
among features but also indicates to what extent the smoothing and time delay help improve
the features. For these purposes, the absolute standardized mean difference (ASMD) was used
to measure the discriminative power of a single feature. Given a feature f, it is computed as the
absolute mean difference of the feature values between SWS and non-SWS epochs divided by
the standard deviation of the values over all epochs
\[ \mathrm{ASMD} = \frac{\left| \mu^{f}_{\text{SWS}} - \mu^{f}_{\text{non-SWS}} \right|}{\sigma^{f}}, \tag{8.8} \]
where µ^f_SWS and µ^f_non-SWS express the sample means of the feature over the SWS and non-SWS
epochs, respectively, and σ^f is the sample standard deviation. A higher discriminative power
in separating the two classes translates to a larger ASMD value.
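Equation 8.8 translates directly into a few lines of Python. In this sketch the function name `asmd`, the handling of missing values (NaN), and the use of the n − 1 normalization for the sample standard deviation are assumptions.

```python
import numpy as np

def asmd(values, is_sws):
    """Absolute standardized mean difference of one feature (Equation 8.8)."""
    values = np.asarray(values, dtype=float)
    is_sws = np.asarray(is_sws, dtype=bool)
    keep = ~np.isnan(values)  # ignore epochs with missing feature values
    values, is_sws = values[keep], is_sws[keep]
    mean_gap = abs(values[is_sws].mean() - values[~is_sws].mean())
    return mean_gap / values.std(ddof=1)  # standard deviation over all epochs

example = asmd([1.0, 1.0, 0.0, 0.0], [True, True, False, False])
```

Applying the function to the same feature before and after smoothing, or with and without a time delay, gives the kind of comparison between schemes that is described in the text.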
Overall accuracy, precision, sensitivity, and specificity were first considered to evaluate the
classifier. However, they might not be appropriate criteria for the “imbalanced class distribu-
tion” in our data, where the non-SWS epochs account for an average of 87.6% of the night. The
Cohen’s Kappa coefficient of agreement κ [72] offers an indication of the general classification
performance in correctly identifying imbalanced classes by compensating for the probability
of chance agreement. Here the classifier threshold was chosen to optimize the pooled Kappa
based on training data. To have an overview of the classification performance across the entire
solution space, a Precision-Recall (PR) curve was used. It plots precision versus recall (or sen-
sitivity) by varying the classifier threshold used to separate the two classes. When comparing
classifiers, the metric ‘area under the PR curve’ (AUCPR ) was calculated. In general, a larger
AUCPR corresponds to a better classification performance.
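The evaluation metrics named above can be computed from the four cells of the confusion matrix, as in the Python sketch below. The function names are hypothetical, and treating the classifier threshold as a cut-off on a per-epoch score (for example, the difference between the two discriminant values) is an assumption made for illustration.

```python
import numpy as np

def _ratio(a, b):
    """a / b, or NaN when the denominator is zero."""
    return a / b if b else float("nan")

def binary_metrics(y_true, y_pred):
    """Precision, sensitivity, specificity, accuracy, and Cohen's Kappa (SWS = positive)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    n = tp + fn + fp + tn
    p_obs = (tp + tn) / n  # observed agreement (accuracy)
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2  # chance agreement
    return {
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": p_obs,
        "kappa": _ratio(p_obs - p_chance, 1.0 - p_chance),
    }

def best_threshold(scores, y_true, thresholds):
    """Threshold (from a candidate list) that maximizes Kappa on the given data."""
    scores = np.asarray(scores, dtype=float)
    kappas = [binary_metrics(y_true, scores > th)["kappa"] for th in thresholds]
    return thresholds[int(np.nanargmax(kappas))]
```

Sweeping the threshold over its range and recording precision and sensitivity at each value yields the points of a PR curve; pooled results correspond to calling `binary_metrics` on the concatenated epochs of all test recordings.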
In order to evaluate the effectiveness of the feature smoothing and the time delay approaches
in improving SWS and non-SWS classification, we compared four classification schemes by
using features
• A: without smoothing and time delay,
• B: with smoothing but without time delay,
• C: without smoothing but with time delay, and
• D: with smoothing and time delay.
The spline window size and the delayed time were determined to optimize κ based on training
data. Moreover, the classification performance was also compared between using only ECG
and only RE signals and between the normal group and the low-SWS group.
8.5 Results
After the feature selection procedure described before, a total of six features were selected with
CFS when including all cardiorespiratory features. In the same way, we obtained a list of four
features when using ECG alone and four when using solely RE. The selected features using
different signal modalities are listed in Table 8.2.
The averaged discriminative powers of the selected features in different schemes are compared
in Figure 8.2. It indicates that smoothing with spline fitting can improve the feature
discriminative power. Experimentally, it was found that the κ value was maximized at a spline
window of 25 epochs. In addition, using the features with a negative time delay also increased
their discriminative power, as seen by comparing the ASMD values between schemes A and C (or
between schemes B and D). Here, the optimal time delays τ of −2.5 and −5 min were
experimentally found for schemes C and D, respectively.
Figure 8.3 plots the classification performance (κ and AUCPR ) versus time delay (τ ) in
scheme C and D. The figure shows that the highest κ and AUCPR occurred with a negative time
delay of five epochs (2.5 min) for the unsmoothed features and of ten epochs (5 min) for the
smoothed features. This suggests that the optimal time delay depends on the window size
of spline fitting. As expected, it was longer in scheme D (with smoothing) than in scheme
C (without smoothing).
The results of SWS and non-SWS classification obtained with respect to the four schemes
are summarized in Table 8.3. The best result, obtained with smoothing and time delay, cor-
Table 8.2: A list of selected features for SWS detection

Feature description                           Denotation   Signal modality
RR standard deviation∗,†                      SDNNRR       ECG
RR spectrum power LF band∗,†                  LFRR         ECG
RR DFA (parameter α)∗,†                       DFARR        ECG
RR sample entropy (length 2, scale 1)†        SERR         ECG
Respiratory frequency standard deviation∗,‡   SDFRE        RE
Respiratory peak standardized median∗,‡       SDMPRE       RE
Respiratory trough standardized median∗,‡     SDMTRE       RE
Respiratory uniform scaling dissimilarity‡    UNISRE       RE

Selected features for SWS detection ∗ using both ECG and RE signals, † using only
ECG signal, or ‡ using only RE signal.
[Figure 8.2 plot. Eight panels, one per selected feature (SDNNRR, LFRR, DFARR, SERR, SDFRE, SDMPRE, SDMTRE, UNISRE); vertical axis: ASMD; horizontal axis: scheme (A, B, C, D).]
Figure 8.2: Average discriminative power (as measured by ASMD) of the selected features in different
schemes. The ASMD of scheme D was found to be significantly higher than the others for all the selected
features using a paired (two-sided) Wilcoxon signed-rank test (p < 0.001). The time delay τ was −2.5
min for scheme C and −5 min for scheme D.
responds to a pooled κ of 0.57, an overall accuracy of 88.8%, and an AUCPR of 0.68. With
an average κ of 0.56 ± 0.17, an average accuracy of 88.7 ± 4.2%, and an average AUCPR
of 0.69 ± 0.18, this scheme significantly outperforms all others, tested with a Wilcoxon test
(p < 0.0001). The table indicates that smoothing the features per recording resulted in a
significant increase in both κ and AUCPR regardless of whether time delay was considered. The
classification performances of the four schemes are also compared by PR curves in Figure 8.4.
Taking a recording as an example, Figure 8.5 visually compares the PSG-based annotations and
the identified classes, suggesting an enhancement in classification performance when applying
feature smoothing and time delay. The figure also illustrates that feature smoothing can help
[Figure 8.3 plot. Panels (a) and (b); curves: Kappa and AUCPR; vertical axis: value (-); horizontal axis: time delay (30-s epoch), from -30 to 0.]
Figure 8.3: Classification performance using features (a) with smoothing and (b) without smoothing
versus time delay (τ ), in epochs. The minus sign of τ indicates the use of preceding feature values.
Table 8.3: Summary of SWS and non-SWS classification results in different schemes using ten-
Fold CV
Result Prec. (%) Sens. (%) Spec. (%) Acc. (%) Kappa κ AUCPR
◮ Scheme A: without smoothing and without time delay
Pool 53.8 53.9 91.8 86.1 0.45 0.54
Average 53.5 ± 17.7 54.9 ± 18.7 91.8 ± 3.5 86.0 ± 4.3 0.43 ± 0.17 0.55 ± 0.18
◮ Scheme B: with smoothing and without time delay
Pool 56.8 57.2 92.3 87.0 0.49 0.60
Average 56.8 ± 18.3 58.1 ± 18.7 92.4 ± 3.8 87.0 ± 4.3 0.48 ± 0.17 0.61 ± 0.18
◮ Scheme C: without smoothing and with time delay (τ = −2.5 min)
Pool 59.1 61.7 92.5 87.9 0.53 0.62
Average 59.0 ± 17.7 63.2 ± 20.5 92.5 ± 3.5 87.8 ± 4.4 0.52 ± 0.18 0.63 ± 0.18
◮ Scheme D: with smoothing and with time delay (τ = −5 min)
Pool 61.8 65.6 92.9 88.8 0.57 0.68
Average 62.0 ± 17.8 67.2 ± 20.4 93.0 ± 3.7 88.7 ± 4.2 0.56 ± 0.17 0.69 ± 0.18
In total, six features were selected via CFS (see Table 8.2). The classifier threshold was chosen to
maximize κ on the training data. The difference between scheme D and the others was confirmed to be
significant for accuracy, κ, and AUCPR using a paired (two-sided) Wilcoxon signed-rank test (p < 0.0001).
remove spurious detections (lasting only a few epochs) of one class in the middle of a longer period of
the other class. This confirms our expectation that feature smoothing is an adequate way to
handle this type of error.
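As a rough illustration of the two operations that distinguish schemes A-D, the sketch below shifts a per-epoch feature series by a negative delay τ (so that each epoch is described by the feature value observed |τ| epochs earlier) and smooths it. The function names are hypothetical, and a centered moving average is used here only as a simple stand-in for the spline fitting applied in this chapter.

```python
import numpy as np

def apply_time_delay(feature, tau_epochs):
    """Pair each epoch with the feature value |tau_epochs| epochs earlier (tau <= 0).

    Epochs that have no preceding value are padded with the first sample.
    """
    x = np.asarray(feature, dtype=float)
    shift = -int(tau_epochs)  # tau = -10 epochs means "look 10 epochs back"
    if shift < 0:
        raise ValueError("this sketch only handles non-positive delays")
    if shift == 0:
        return x.copy()
    delayed = np.empty_like(x)
    delayed[:shift] = x[0]
    delayed[shift:] = x[:-shift]
    return delayed

def moving_average(feature, window=25):
    """Centered moving average with edge padding (stand-in for spline smoothing)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that the output is centered")
    x = np.asarray(feature, dtype=float)
    padded = np.pad(x, window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

# A delay of -5 min corresponds to -10 epochs of 30 s (scheme D).
feature = np.random.default_rng(0).normal(size=200)  # placeholder feature values
processed = apply_time_delay(moving_average(feature, window=25), tau_epochs=-10)
```

Smoothing before shifting, as done here, is an arbitrary choice for the sketch; both operations are linear shifts or filters, so away from the edges the order does not matter.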
Table 8.4 presents the confusion matrix of our SWS and non-SWS classifier based on car-
diorespiratory features with smoothing and a 5-min negative delay. To analyze the source of
false positives or alarms (i.e., instances where non-SWS epochs were classified as SWS), the
breakdowns of classification results for non-SWS between wake, REM sleep, S1, and S2 are
also given.
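The pooled metrics can be recomputed from the four counts of the confusion matrix. The sketch below does this for the counts printed in Table 8.4 (SWS taken as the positive class); the function name is hypothetical. With these counts, the pooled κ and accuracy agree with the scheme D values in Table 8.3 after rounding.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, sensitivity, specificity, accuracy and Cohen's kappa from counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement from the classifier marginals and the PSG marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1.0 - p_chance),
    }

# Counts as printed in Table 8.4: classified SWS / non-SWS versus PSG SWS / non-SWS.
metrics = binary_metrics(tp=23122, fp=14283, fn=12106, tn=185309)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```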
[Figure 8.4: precision (y-axis) versus recall (x-axis) for scheme A (without smoothing and time delay), scheme B (with smoothing, without time delay), scheme C (without smoothing, with time delay of -2.5 min), and scheme D (with smoothing and time delay of -5 min).]
Figure 8.4: Pooled PR curves of SWS and non-SWS classification in different schemes; scheme D
performed best.
[Figure 8.5: SWS/non-SWS hypnograms over time (0-7 h) for the PSG annotations and for schemes A-D.]
Figure 8.5: An example of overnight PSG-based annotations and the corresponding SWS and non-SWS
classification results in different schemes.
When using one signal modality alone, the classification performance would degrade as
shown in Table 8.5 (average of κ = 0.54 for ECG or of κ = 0.51 for RE). Since the optimal time
delay (−5 min) was found to be the same for either ECG or RE features, it was then used for
comparison. Although the inclusion of ECG and RE signals yielded a better classification per-
Table 8.4: Confusion matrix of SWS and non-SWS classification with indication of false positives (normal group)
                           PSG: non-SWS
Classified   PSG: SWS      Total     S2      S1      REM     Wake
SWS          23122         14283     12841   444     424     574
non-SWS      12106         185309    90693   18172   36844   39060
Table 8.5: Comparison of SWS and non-SWS classification results using different signal modalities (normal group)
Signal modality   #Features∗   Accuracy (%)   Kappa κ   AUCPR
ECG               4            88.2           0.54      0.65
RE                4            87.7           0.51      0.61
The pooled results (in scheme D) are presented.
∗ Features were selected with CFS (see Table 8.2).
Table 8.6: Performance comparison of SWS detection for different subject groups of recordings in terms of SWS time.
Subject group   N     SWS time (min)   Accuracy (%)   Kappa κ   AUCPR
Normal∗         257   ≥30              88.8           0.57      0.68
Low-SWS         68    (0, 30)          92.3           0.21      0.17
All             325   >0               88.9           0.51      0.58
Pooled results (in scheme D) are presented.
∗ The group focused on in this work, in which the recordings had a more representative normal sleeping
pattern, as in a home-based environment where the overnight SWS time was less influenced by laboratory effects.
formance and they can be easily and unobtrusively acquired as mentioned before, our approach
is still applicable to achieve reasonable results when one of them is absent.
We also applied our SWS detection approach for all the 325 recordings and for those in
the low-SWS group (68 recordings from 51 subjects) with the total SWS time of less than 30
min, where the classification results are presented in Table 8.6. The results for the low-SWS
group (κ = 0.21) were much worse than the normal group engaged in this study, due to which
the classification performance for all recordings dropped to κ = 0.51. Figure 8.6 (upper graph)
illustrates the relation between the amount of SWS and age, confirming what is known from
literature [216], i.e., that the amount of SWS decreases with age. Figure 8.6 (middle graph)
illustrates the classification performance versus SWS time, which were (positively) significantly
correlated. Figure 8.6 (lower graph) shows a significant (negative) correlation between κ and
age, indicating that the classification performance was age-dependent.
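Spearman's rank correlation, used for these relations, is the Pearson correlation computed on ranks. The sketch below is a minimal NumPy version with average ranks for ties; the function names and the toy values are illustrative only and are not taken from the recordings.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_r(a, b):
    """Spearman's rank correlation coefficient between two paired samples."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Toy values (made up): per-recording SWS time in minutes and kappa.
sws_time = [12, 35, 48, 60, 75, 90]
kappa = [0.15, 0.40, 0.38, 0.55, 0.61, 0.70]
print(f"Spearman r = {spearman_r(sws_time, kappa):.2f}")
```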
[Figure 8.6 (three panels): upper, SWS time (min) versus age (y), R² = 0.11; middle, kappa versus SWS time (min), R² = 0.28; lower, kappa versus age (y), R² = 0.13.]
Figure 8.6: Relation between overnight SWS time, subject age, and classification performance (κ ) of all
325 recordings (including the normal and the low-SWS recordings). Lines represent the linear equations
fitted for data from samples. Significant Spearman’s rank correlation was found between SWS time and
age (r = −0.35), between κ and SWS time (r = 0.52), and between κ and age (r = −0.32) at p < 0.001.
8.6 Discussion
It is noted that Hedner et al. [127] evaluated a sleep staging system and obtained a SWS and
non-SWS classification with a κ of 0.48 (re-computed in terms of their reported confusion
matrix). They deployed pulse rate, peripheral arterial tone, and actigraphy and their κ value is
smaller than that produced in the current study. To provide a fair comparison, we achieved a
κ of 0.51 for all 325 recordings, which still outperforms their result. A respiratory-based sleep
stager was developed in our previous work [182], reporting a κ of 0.43 in detecting SWS where
a subset of 48 normal sleep nights from the same database were included. It is lower than the
result presented here (κ = 0.51), generated using only four respiratory features.
Although feature smoothing can, in general, benefit SWS detection, it may also introduce
errors when detecting very short SWS periods. This is because some of the high-frequency fea-
ture components are likely not due to measuring noise or to outliers caused by motion artifacts,
but rather reflect some essential characteristics of short SWS periods. In the case of a longer
SWS duration, smoothing the feature values would reduce noise and, in consequence, increase
specificity. However, in the case of a shorter SWS duration (i.e., fragmented SWS), sensitivity
would decrease. This procedure should then be adopted by finding an optimal trade-off be-
tween rejecting noise and keeping useful information. As shown in Table 8.3, the spline fitting
increased all metrics. This was also the case for the low-SWS recordings where we found that
the improvement was mainly contributed by the increase of specificity. The reason might be
that many false positives (misclassified non-SWS epochs) in a long period of non-SWS were
corrected through feature smoothing.

In addition, the “optimal” time delay might be varying over the night possibly influenced by
the sleep stage immediately before or after SWS periods where the transition dynamics between
SWS and different stages in non-SWS usually change over time [159]. Hence, fixing the time
delay on features might not be the most appropriate strategy. An adaptive time delay model
depending on sleeping time should instead be investigated.
In Table 8.4, we notice that the false alarms mostly occurred in stage S2, of which 12.4%
were misclassified as SWS. Evidence has shown that the autonomic activity differs little be-
tween S2 and SWS [292]. Consequently, regardless of whether cardiac or respiratory activity
was used, there might be small differences between these two sleep stages. In fact, even with
PSG, scoring SWS is somewhat difficult due to the gradual changes in physiology between
SWS and light sleep. A relatively low inter-rater agreement, with a Cohen’s Kappa coefficient
of only 0.71, was reported [81], which would lead to the presence of fragmented SWS in the
hypnogram (PSG-based annotations). Nevertheless, how to better discriminate between S2 and
SWS by means of cardiorespiratory information merits further investigation.
As shown in Table 8.6, the markedly lower classification results for the low-SWS group
indicate that our classifier could not adequately handle recordings with a very low number of SWS
epochs. The recordings in the low-SWS group had heavily imbalanced classes, which was
clearly more challenging from a classification point of view. Moreover, it was found that the
recordings with a decreased SWS time along with fragmented SWS appeared more during the
first nights than the second nights, likely caused by the “first-night effect” in a laboratory study
[196]. It can be clearly seen in Figure 8.6 that the total SWS time over night was significantly
correlated with subject age (negatively) and the classification performance (positively). The
figure also illustrates that our SWS detector performed better for young subjects with more SWS
time than elderly. The low-SWS nights should not be neglected and it is therefore suggested
to address SWS detection for subjects with less overnight SWS time (likely at older ages) in
future work.
Finally, as mentioned before, automatic cardiorespiratory-based SWS detection will benefit
the applications of prompting slow waves for enhancing memory consolidation during sleep in
an unobtrusive manner. This usually requires an online system for detecting SWS in real-time.
Although our proposed approach can anticipate the occurrence of SWS 5 min ahead, and the
window sizes (one side) used for computing the features were all shorter than this time interval, the
feature normalization (based on the entire-night recording) and smoothing (with a spline window
of 25 epochs) would still limit the achievement of an online SWS detector. However, these
limitations seem manageable. First, since the feature normalization mainly served to diminish
between-subject variability, this can be alternatively achieved by using the previous night (as a
baseline) to normalize the following nights in real applications where usually multiple nights
are expected. Second, the smoothing window size can be reduced and the smoothing can be
‘time-progressive’ to fit the online requirement. Nevertheless, their influences on the detection
performance should be further studied when targeting an online SWS detection system.
8.7 Conclusion
In this chapter, overnight epoch-by-epoch classification of nocturnal SWS and non-SWS was
achieved based on cardiorespiratory signals which can be acquired unobtrusively. To reduce
classification errors caused by, for example, sensor noise, body motion artifacts, and/or within-
subject variability, a recording-specific feature smoothing using spline fitting was employed.
Besides, we used the features anticipating each target epoch to identify SWS of that epoch as
long as the preceding cardiorespiratory activity (compared with the PSG-based annotations) ap-
peared during the transitions between SWS and non-SWS. With an LD classifier, we revealed
that the use of feature smoothing and time delay profoundly improved the classification per-
formance (κ of 0.57 versus 0.45). Our approach also produced reasonable classification results
when only one of the signal modalities was present. Furthermore, the classifier performed much
better for subjects who had more total SWS time than for subjects with less SWS time.
Part III: Cardiorespiratory-Based Sleep
Stage Classification
CHAPTER 9
Effects of between- and within-subject variability on
autonomic cardiorespiratory activity during sleep and
their limitations on sleep staging: a multilevel analysis
This chapter is adapted from: X. Long, R. Haakma, T.R.M. Leufkens, P. Fonseca, and R.M. Aarts. Ef-
fects of between- and within-subject variability on autonomic cardiorespiratory activity during sleep and
their limitations on sleep staging: a multilevel analysis. Submitted.
Abstract – Autonomic cardiorespiratory activity changes across sleep stages. However, it
remains unknown to what extent it is affected by variability between and within subjects during
sleep. Hypothesizing that the variability is caused by differences in subject demographics
(age, gender, and body mass index), time, and physiology, we quantitatively investigated
these effects and their limitations on achieving reliable cardiorespiratory-based sleep staging.
Polysomnographic recordings from 165 normal sleepers were included. Six representative pa-
rameters (30-s basis) obtained from overnight heartbeats and respiration, such as breathing rate
and heart rate, were analyzed. Multilevel models were used to evaluate the effects evoked by
differences in sleep stages, demographics, time, and physiology between and within subjects.
We also compared the cardiorespiratory-based sleep staging performance with and without
correcting for the associated effects. The between- and within-subject effects were
found to be significant for each parameter. Compared with the within-subject variability, the
between-subject variability had a larger influence on breathing rate and heart rate but a smaller
influence on their variations. When adjusted for sleep stages, the physiological effects between and
within subjects explained more than 80% of the total variance, whereas the other effects explained less. If these effects
were corrected, profound improvements in sleep staging were observed. These results indicate
that the differences in subject demographics, time, and physiology present significant effects
on cardiorespiratory activity during sleep. The primary effects come from the physiological
variability between and within subjects, markedly limiting the sleep staging performance using
cardiorespiratory information. Efforts to diminish these effects will be the main challenge.
9.1 Introduction
Polysomnography (PSG) is the gold standard and common practice for the objective analyses of
sleep architecture (hypnogram) and sleep-related disorders such as insomnia/parasomnia, sleep-
disordered breathing, and rapid-eye-movement (REM) sleep behavior disorder [168]. With
PSG, sleep stages are manually scored on continuous 30-s epochs based on electrophysiological
signals including electroencephalogram (EEG), electromyogram (EMG), and electrooculogram
(EOG) according to the Rechtschaffen and Kales (R&K) rules [247] or the more recent
guidelines of the American Academy of Sleep Medicine (AASM) [136]. PSG recordings are
usually acquired in a sleep laboratory that requires a lot of manual labor for visual scoring.
It is costly and uncomfortable for subjects and therefore not suited for long-term monitoring.
These disadvantages motivated sleep researchers and clinicians to devote more attention to al-
ternatives such as cardiac and respiratory activities, allowing unobtrusive sleep staging with
minimal discomfort to subjects [127, 161, 249, 253, 303].
Cardiorespiratory activity has been shown to be associated with the autonomic sympathetic and
parasympathetic (or vagal) nervous system in humans, which relates to sleep stages [95, 135,
183, 279, 292]. For example, the sympathetic activation of the heart usually translates into an
increased spectral power of heart rate variability (HRV) in the low-frequency (LF) band between
0.04 and 0.15 Hz and the vagal activity (primarily caused by respiratory sinus arrhythmia) is
associated with the spectral power in the high-frequency (HF) band between 0.15 and 0.4 Hz
[288]. During rapid-eye-movement (REM) sleep, the HF spectral power increases while the
LF spectral power decreases, when compared with non-REM (NREM) sleep and wakefulness
[265]. Furthermore, the respiratory volume and frequency are more regular during NREM sleep
than during REM sleep and wakefulness [95]. Irregular respiration patterns occurring during
wakefulness are usually caused by body movements or alternation of ventilation control ma-
nipulated by some external factors; during REM sleep they can be related to muscle atonia or
subcortical structures with a possible involvement of the bizarre content of dreams [230, 233].
In addition to sleep stages, the cardiorespiratory activity can be influenced by between-
subject variability with respect to 1) subject demographics (including body size) such as age,
gender, and body mass index (BMI) [49, 227, 267], and 2) internal physiology such as response
of autonomic regulation, metabolic function, and subcortical arousals [132, 269, 305]. Other
factors, which differ from subject to subject and within subjects, such as conscious breathing
control and external sleep environment (e.g., noise and temperature), can also cause variations
in autonomic response during sleep [55, 56, 64, 206]. Furthermore, the autonomic activity ap-
pears as a function of time and the ratio of NREM and REM sleep in a sleep cycle changes
during the time course of the night [46, 292]. These would also be reflected in changes of
cardiorespiratory activity throughout the night within subjects. Additionally, the daytime activ-
ity and stressful events may change the sleep architecture and, consequently, affect autonomic
control of cardiorespiratory activity during the night [11, 118, 122]. It is, however, not clear to
what extent each of these effects can explain the variations in cardiorespiratory activity during
sleep.
In regard to automatic sleep staging with autonomic cardiorespiratory activity, parameters
Table 9.1: Subject demographics and sleep statistics (n = 165).
Parameter         Mean ± SD       Range
Gender            77 men and 88 women
Age (y)           51.8 ± 19.4     20 − 95
Wake (%)          22.7 ± 13.2     1.2 − 78.6
REM sleep (%)     13.6 ± 5.3      0 − 26.5
Light sleep (%)   52.3 ± 10.0     15.6 − 72.1
Deep sleep (%)    11.4 ± 6.6      0 − 28.5
are usually derived from cardiac and respiratory signals on a 30-s epoch basis [136, 247]. Due
to the existence of between- and within-subject (variability) effects, the correct identification of
sleep stages based on the cardiorespiratory parameters seems challenging, in particular when
a subject-independent model is used (i.e., when a model is derived from a set of subjects, and
used to identify sleep stages for other new subjects).
The aim of this fundamental study was to quantitatively investigate the effects of between-
and within-subject variability on cardiorespiratory activity during sleep and to evaluate the
limitations of these effects on achieving reliable cardiorespiratory-based sleep staging results.

9.2.1 Subjects and protocol
A total of 165 healthy subjects participating in the SIESTA project [160] were included in
this study. The subjects were monitored over a period of three years from 1997 to 2000 in
seven different sleep laboratories located in five European countries. The subject demograph-
ics [mean ± standard deviation (SD)] including age, gender, and BMI are given in Table 9.1.
The protocol was approved by local ethics committees of all sleep laboratories involved and
all subjects provided a written informed consent. The subjects fulfilled the following criteria:
no significant medical disorders, no reported symptoms of neurological, mental, medical or
cardiovascular disorders, no history of drug abuse or habituation (including alcohol), no
psychoactive medication or other drugs (e.g., beta blockers), no shift work, and a habitual
bedtime usually between 22:00 and 24:00 [160].
9.2.2 PSG recordings
For each subject, single-night full PSG recordings were obtained. Each recording consists of
at least 16 channels including EEG (C3-M2, C4-M1, O1-M2, O2-M1, Fp1-M2 and Fp2-M1),
EMG (chin and leg), EOG (2 leads), electrocardiogram (ECG, single-channel, modified V1
lead), nasal airflow, respiratory effort (abdominal and chest wall with respiratory inductance
plethysmography), snoring (microphone), and blood oxygen saturation [160]. Only the ECG
signals, sampled at 100 Hz, 200 Hz, or 256 Hz depending on the equipment setup of each sleep
laboratory, and the respiratory (chest) effort signals, all sampled at 10 Hz were used in this
study.
Each PSG recording was visually annotated in 30-s epochs as nighttime wake, REM sleep,
and one of the NREM sleep stages S1-S4 by two independent raters according to the R&K rules.
In case of disagreement, the consensus annotations between the two raters were obtained. For
the analysis in this study, we considered four stages: wake, REM sleep, light sleep (merging
S1 and S2), and deep sleep or slow wave sleep (merging S3 and S4). Table 9.1 presents some
sleep statistics of the recording nights.
9.2.3 Data preparation
The ECG and respiratory effort signals of all subjects were preprocessed before computing the
parameters used for analyses. The baseline wander of the ECG signal was removed with a
linear-phase high-pass filter using a 1.106-s Kaiser window with a 0.8-Hz cutoff frequency and a
30-dB side-lobe attenuation [297]. The resulting signal was normalized with regard to mean and
amplitude and a low-complexity precise QRS complex localization algorithm [107] was used
to locate the R peaks in the signal. The resulting heartbeat or RR intervals were re-sampled at
4 Hz using a linear interpolator. To compute the cardiac parameters in the frequency domain,
the power spectral density of the re-sampled RR intervals was estimated with an autoregressive
model [42]. Ectopic RR intervals longer than 2 s, shorter than 0.3 s, or shorter than 0.6 times
their previous value were discarded.
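The interval-rejection rules and the 4-Hz re-sampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used in this study: the function names are hypothetical, each RR interval is time-stamped at the R peak that ends it, and "previous value" is taken to be the preceding raw interval.

```python
import numpy as np

def clean_rr(r_peak_times):
    """Return (times, rr) after discarding intervals flagged by the three rules."""
    t = np.asarray(r_peak_times, dtype=float)
    rr = np.diff(t)        # RR intervals in seconds
    rr_times = t[1:]       # assumption: each interval is stamped at its closing R peak
    # The first interval has no predecessor; a previous value of 0 never flags it.
    prev = np.concatenate(([0.0], rr[:-1]))
    discard = (rr > 2.0) | (rr < 0.3) | (rr < 0.6 * prev)
    return rr_times[~discard], rr[~discard]

def resample_rr(rr_times, rr, fs=4.0):
    """Linearly interpolate the RR series onto a uniform grid sampled at fs Hz."""
    grid = np.arange(rr_times[0], rr_times[-1], 1.0 / fs)
    return grid, np.interp(grid, rr_times, rr)

# Hypothetical R-peak times (s) containing a too-short and a too-long interval.
peaks = [0.0, 1.0, 2.0, 2.2, 3.2, 6.0, 7.0]
times, rr = clean_rr(peaks)
grid, rr_4hz = resample_rr(times, rr)
```

Note that, with the rules applied literally, the normal interval following a very long one is also discarded because it is shorter than 0.6 times its predecessor.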
The respiratory effort signal was first low-pass filtered using a 10th-order Butterworth filter
with a cut-off frequency of 0.6 Hz to eliminate high-frequency noise. Afterwards, the signal
baseline was removed by subtracting the median peak-to-trough amplitude estimated over the
entire signal. The respiratory peaks and troughs were detected by locating the signal turning
points based on sign changes of signal slopes. Finally, we excluded incorrectly detected peaks
and troughs 1) in peak-to-trough or trough-to-peak intervals where the sum of two successive
intervals was less than the median of all intervals over the entire recording and 2) with am-
plitudes where the peak-to-trough difference was smaller than 0.15 times the median of the
entire-night respiratory signal [185].
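The turning-point detection described above can be sketched with slope sign changes, as below. The function name is hypothetical, and the two exclusion rules (on interval length and on amplitude) are not implemented here; flat plateaus in the signal would also need extra handling.

```python
import numpy as np

def turning_points(signal):
    """Return sample indices of peaks and troughs found from slope sign changes."""
    x = np.asarray(signal, dtype=float)
    slope_sign = np.sign(np.diff(x))
    change = np.diff(slope_sign)
    peaks = np.where(change < 0)[0] + 1    # slope goes from rising to falling
    troughs = np.where(change > 0)[0] + 1  # slope goes from falling to rising
    return peaks, troughs

# Synthetic breathing-like signal: 0.25 Hz sine sampled at 10 Hz for 20 s.
t = np.arange(0, 20, 0.1)
effort = np.sin(2 * np.pi * 0.25 * t)
peaks, troughs = turning_points(effort)
breath_intervals = np.diff(t[peaks])  # peak-to-peak breath lengths in seconds
```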
9.2.4 Cardiorespiratory parameters
We analyzed six cardiorespiratory (two respiratory and four cardiac) parameters. The respi-
ratory parameters were BR, the mean breathing rate or respiratory frequency, and SDBR, the
standard deviation of breathing rates. For cardiac activity, the time-domain parameters included
HR, the mean heart rate, and SDNN, the standard deviation of heartbeat intervals. The spectral-
domain parameters included LF, the spectral power of heartbeat intervals in the LF band, and
HF, the spectral power in the HF band. Note that LF and HF were normalized by dividing
them by the total spectral power minus the power in the very-low-frequency (VLF, 0.003-0.05
Hz) band [58, 288]. They are therefore expressed in normalized units (nu) instead of
absolute units (ms²). These parameters have been widely used for cardiorespiratory-based
sleep staging [94, 182, 185, 248, 249]. A logarithmic transformation was applied to BR,
SDBR, HR, and SDNN to correct for non-symmetry in the frequency distributions. Measure-
ment units are therefore expressed in natural logarithmic Hz (ln-Hz) for BR and SDBR, natural
logarithmic beats per minute (ln-bpm) for HR, and natural logarithmic millisecond (ln-ms) for
SDNN.
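The normalization of LF and HF can be illustrated as follows. This sketch uses a plain periodogram rather than the autoregressive spectral estimate used in this study, the function names are hypothetical, and the band edges are passed as parameters with the values printed in the text as defaults. Scaling constants of the spectrum cancel in the ratios.

```python
import numpy as np

def band_power(freqs, psd, low, high):
    """Integrate the PSD over the band [low, high) with a rectangle rule."""
    in_band = (freqs >= low) & (freqs < high)
    return float(psd[in_band].sum() * (freqs[1] - freqs[0]))

def normalized_lf_hf(rr, fs=4.0, vlf=(0.003, 0.05), lf=(0.04, 0.15), hf=(0.15, 0.4)):
    """Return (LF, HF) in normalized units for an evenly re-sampled RR series."""
    x = np.asarray(rr, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))  # simple periodogram
    # Denominator: total spectral power minus the power in the VLF band.
    denom = band_power(freqs, psd, 0.0, np.inf) - band_power(freqs, psd, *vlf)
    return band_power(freqs, psd, *lf) / denom, band_power(freqs, psd, *hf) / denom

# Synthetic RR series (s) with a single 0.25-Hz oscillation, i.e., HF power only.
t = np.arange(0, 300, 0.25)
rr_series = 0.8 + 0.05 * np.sin(2 * np.pi * 0.25 * t)
lf_nu, hf_nu = normalized_lf_hf(rr_series)
```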
9.2.5 Descriptive statistics
Values of the cardiorespiratory parameters (mean ± SD) measured from subjects with different
demographics (gender, age, and BMI) and time of night are presented. We considered different
cohort sets including three age groups: young (20-39 y), middle (40-69 y), and elderly (>69
y) and three BMI groups: underweight (<18.5 kg/m²), normal weight (18.5-25 kg/m²), and
overweight (>25 kg/m²). In addition, total sleep time was divided into four periods: 0-90
min, 90-180 min, 180-270 min, and >270 min. Significance of difference between groups was
tested with the analysis of variance (ANOVA) F-test.
9.2.6 Multilevel analysis
Traditional statistical methods such as repeated measures ANOVA (rANOVA) and repeated
measures multivariate ANOVA (rMANOVA) are often used to analyze longitudinal data. How-
ever, they might not be appropriate since they expect uncorrelated and independent observations
[23]. In regard to the nature of multiple dependent variables, a more generalized multilevel re-
gression analysis [134] takes structural variables with fixed and random effects measured at
multiple hierarchical levels into account. Compared with the traditional methods, multilevel
analysis has several advantages [134, 301]. First, it serves to deal with incomplete data while
ANOVA-based methods handle that by simply deleting all subjects with missing measures.
Second, it concerns data with a hierarchical structure and thus allows for meta-analysis of ex-
planatory variables with effects on different levels simultaneously. Third, it is able to quantify
the variability within levels. For these reasons, we applied multilevel models to statistically
evaluate the effects of between- and within-subject variability on the cardiorespiratory parameters.
Under a variety of names used by different authors, multilevel models are also known as, e.g.,
mixed models, random-effects models, and hierarchical linear models [134].
9.2.6.1 Between- and within-subject effects

The between-subject variability effects on cardiorespiratory activity can be linked to physiology
and subject demographics (age, gender, and BMI). On the other hand, cardiorespiratory activity
may change depending on the time of night within subjects [292]. This time effect can also vary
between subjects. Most multilevel models assume homogeneity or equality of variance for each
prediction variable, whereas this might not hold for the time effect. Therefore, it is hypothesized
that the time effect also changes along with subject demographics. This can be evaluated by
‘cross-interactions’ between time and demographic variables. Here we did not take into account
the influences from the differences in sleep environment, daytime energy expenditure, and other
factors or behaviors such as stress, smoking, and personality. These influences, if existent, were
assumed to be conveyed by the physiological variability. Additionally, in our previous work
[184], there were no effects on the cardiac activity found between different laboratories based
on the same data. For this reason, we disregarded the laboratory factor during our modeling
procedure.
To evaluate the between- and within-subject effects, we constructed a multilevel model with
two levels (level two: subject; level one: time or epoch) for a given cardiorespiratory parameter
y. The model predicts/estimates the values of the parameter based on a set of variables including
sleep stages, age, gender, BMI, and time of night. For the parameter value yi j in the ith epoch
of the night (i = 1, 2, . . ., N with a total of N epochs) from subject j ( j = 1, 2, . . ., M where M
is the total number of subjects), the two-level regression model with associated coefficients is
given by
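\[
\begin{aligned}
\text{Model \#1:}\quad y_{ij} ={}& \beta_0 + \sum_{s} (\beta_s + \mu_{sj})\, s_{ij} + (\beta_t + \mu_{tj})\, \mathrm{time}_{ij} + e_{0ij} \\
& + \beta_a\, \mathrm{age}_j + \beta_g\, \mathrm{gender}_j + \beta_b\, \mathrm{BMI}_j \\
& + \beta_{ta} (\mathrm{time} \times \mathrm{age})_{ij} + \beta_{tg} (\mathrm{time} \times \mathrm{gender})_{ij} + \beta_{tb} (\mathrm{time} \times \mathrm{BMI})_{ij}
\end{aligned}
\]
with
\[
\begin{pmatrix} \mu_{0j} \\ \mu_{sj} \\ \mu_{tj} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Omega_0 \\ \Omega_s \\ \Omega_t \end{pmatrix} \right) \quad \text{and} \quad e_{0ij} \sim N(0, \Omega_e), \tag{9.1}
\]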
in which β0 is the fixed intercept, µ0 j is the random effect with variance Ω0 indicating the
between-subject variability in physiology (independent of sleep stages or corrected by sleep
stages), and e0i j is the (random) residual term with variance Ωe quantifying the within-subject
physiological variability (independent of time). si j is a dummy variable (0 or 1) specifying the
sleep stage (s = wake, REM, light, deep) of epoch i from subject j with its fixed effect βs and
random effect µs j , where Ωs reflects the between-subject physiological variability in sleep stage
s. The demographic variables age (y), gender (dummy variable: 0 = man, 1 = woman), and BMI
(kg/m2) respectively correspond to the fixed effects βa , βg , and βb varying between subjects.
The variable timei j (min) expresses the relative time of epoch i (timei j = i/2) from subject j,
βt is the fixed time effect corresponding to linear changes over time within subjects, µt j is the
random time effect with variance Ωt indicating the variability of time effect between subjects,
and βta, βtg, and βtb are the cross-interactions specifying the fixed age-, gender-, and BMI-related
time effects, respectively. Note that the random effects (including residuals)
were assumed to be drawn from a normal distribution with zero mean. Here the normality was
visually checked using a heuristic Quantile-Quantile (Q-Q) plot method since the commonly
used numerical normality tests are not appropriate on large-sized samples [271].
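To make the roles of Ω0 and Ωe concrete, the sketch below simulates a stripped-down version of the model that keeps only the fixed intercept, a random subject intercept, and the residual term, and then recovers the two variances. All numerical values and function names are hypothetical, and simple moment (one-way ANOVA) estimators for balanced data are used instead of the estimation procedure applied in this study.

```python
import numpy as np

def simulate_random_intercept(n_subjects, n_epochs, beta0, omega0, omega_e, seed=0):
    """Simulate y[j, i] = beta0 + mu_0j + e_0ij for subject j and epoch i."""
    rng = np.random.default_rng(seed)
    mu0 = rng.normal(0.0, np.sqrt(omega0), size=(n_subjects, 1))        # subject level
    e = rng.normal(0.0, np.sqrt(omega_e), size=(n_subjects, n_epochs))  # epoch level
    return beta0 + mu0 + e

def variance_components(y):
    """Moment estimates of (between-subject, within-subject) variance, balanced data."""
    n_epochs = y.shape[1]
    within = y.var(axis=1, ddof=1).mean()
    # The variance of the subject means still contains within / n_epochs of noise.
    between = y.mean(axis=1).var(ddof=1) - within / n_epochs
    return between, within

# Hypothetical settings, chosen only for illustration.
y = simulate_random_intercept(200, 400, beta0=4.2, omega0=0.04, omega_e=0.01)
between_hat, within_hat = variance_components(y)
print(f"between-subject: {between_hat:.4f}, within-subject: {within_hat:.4f}")
```

In this simplified setting, the share between_hat / (between_hat + within_hat) gives the fraction of the total variance attributable to differences between subjects.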
9.2.6.2 Centering effect

Intuitively, the mean value of a specific cardiorespiratory parameter over the entire night may
differ from subject to subject, which might be due to the physiological variability between sub-
jects at the general mean level. Cronbach [76] proposed a model that regards an additional
predictor indicating the between-group centering effect in real applications, allowing expres-
sions of parameter values as deviations from the group means. In this study, the model with
centering (physiological) effect for a given parameter can be expressed as
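\[
\begin{aligned}
\text{Model \#2:}\quad y_{ij} ={}& \beta_0 + \sum_{s} (\beta_s + \mu_{sj})\, s_{ij} + (\beta_t + \mu_{tj})\, \mathrm{time}_{ij} + \beta_c\, y_j + e_{0ij} \\
& + \beta_a\, \mathrm{age}_j + \beta_g\, \mathrm{gender}_j + \beta_b\, \mathrm{BMI}_j \\
& + \beta_{ta} (\mathrm{time} \times \mathrm{age})_{ij} + \beta_{tg} (\mathrm{time} \times \mathrm{gender})_{ij} + \beta_{tb} (\mathrm{time} \times \mathrm{BMI})_{ij}
\end{aligned}
\]
with
\[
\begin{pmatrix} \mu_{0j} \\ \mu_{sj} \\ \mu_{tj} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Omega_0 \\ \Omega_s \\ \Omega_t \end{pmatrix} \right) \quad \text{and} \quad e_{0ij} \sim N(0, \Omega_e), \tag{9.2}
\]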
where y j is the variable that gives the within-subject mean value over the entire night for sub-
ject j and its associated fixed slope βc corresponds to the between-subject centering effect.
This effect is meant to reflect the physiological difference between subjects at the (individual)
overnight mean level. Here the estimation of the overnight mean value was assumed to be in-
dependent of sleep stage composition (percentages of sleep stages) over the entire night. To a
certain degree, the demographic effects were expected to be conveyed by the centering effect.
Therefore, the model without the centering term (Model #1) should be used for exploring the
actual demographic effects with a single model.
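The centering predictor is simply the overnight mean of the parameter for the subject to which each epoch belongs. A minimal sketch of how such a predictor, and the corresponding deviations from it, could be computed is given below; the function name and the toy values are hypothetical.

```python
import numpy as np

def subject_means(values, subject_ids):
    """Return, for every epoch, the overnight mean of the subject it belongs to."""
    values = np.asarray(values, dtype=float)
    ids = np.asarray(subject_ids).ravel()
    _, inverse = np.unique(ids, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

# Toy parameter values for two subjects (made up).
values = [1.0, 3.0, 10.0, 14.0, 12.0]
subjects = ["a", "a", "b", "b", "b"]
centering = subject_means(values, subjects)    # predictor for the centering effect
deviations = np.asarray(values) - centering    # within-subject deviations
```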
9.2.6.3 Model estimation and optimization

The multilevel modeling was implemented using the MLwiN software (Centre for Multilevel
Modeling, the University of Bristol, UK), where an iterative generalized least squares (IGLS)
algorithm is used for the model estimation, i.e., the estimates of regression coefficients and
their variances [244]. The model goodness-of-fit can be evaluated by the deviance (measured
by -2·log-likelihood) obtained during the modeling procedure.
A Wald Z-test was used to statistically examine the significance of the effects, testing the
null hypothesis that a coefficient equals zero [134]. For each estimated model coefficient or
variance corresponding to a specific effect, the Wald Z statistic is computed as the square of
the estimated coefficient γ divided by the square of its standard error (SE):
\[
Z = \frac{\gamma^2}{SE^2(\gamma)}. \tag{9.3}
\]
The acceptance or rejection of the null hypothesis can be tested with a Chi-squared (χ²) test
with one degree of freedom (df).
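As a small worked illustration of Equation 9.3, the sketch below computes the Wald statistic and its p-value. For one degree of freedom, the upper tail of the χ² distribution can be written with the complementary error function, so only the standard library is needed. The function name and the example numbers are hypothetical.

```python
import math

def wald_test(coef, se):
    """Return (Z, p) with Z = coef^2 / se^2 referred to chi-squared with 1 df."""
    z = (coef / se) ** 2
    # For 1 df, P(chi2 > z) = P(|N(0, 1)| > sqrt(z)) = erfc(sqrt(z / 2)).
    p = math.erfc(math.sqrt(z / 2.0))
    return z, p

# Hypothetical coefficient and standard error.
z_stat, p_value = wald_test(coef=0.012, se=0.004)
print(f"Z = {z_stat:.2f}, p = {p_value:.4f}")
```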
Table 9.2: Description of the seven explanatory effects (with exclusion of sleep stage effects) on
cardiorespiratory activity considered in this study.
Effect                                  Type     Description
◮ Overall between-subject effect
Demographic effect                      Fixed    Variability in age, gender, BMI between subjects
Centering (physiological) effect        Fixed    Variability in overnight mean level between subjects
Between-subject time effect             Random   Variability in time of night between subjects
Between-subject physiological effect    Random   Variability in physiology between subjects
◮ Overall within-subject effect
Within-subject time effect              Fixed    Variability in time of night within subjects
Within-subject physiological effect     Random   Variability in physiology within subjects
◮ Cross-interaction effect
Demographic-related time effect         Fixed    Demographic-related variability in time of night
The models described in Equations 9.1 and 9.2 are ‘full’ models and need to be optimized
by excluding the effects whose coefficients are not statistically different from zero (tested with
the Wald statistic). Differences between models are assessed by comparing model deviances using
a χ² statistic (i.e., likelihood ratio test) with df = 2. This chapter only presents the results of the
optimized models, which retain the significant effects.
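The deviance comparison can be sketched as follows, assuming df = 2 as stated above. The function name is hypothetical; the example deviances are the BR values of Model #1 and Model #2 reported in Tables 9.4 and 9.5. The closed-form survival function used here is valid only for a χ² distribution with two degrees of freedom.

```python
import math


def likelihood_ratio_test_df2(deviance_reduced: float, deviance_full: float) -> tuple[float, float]:
    """Compare two nested models by their deviances (-2 log-likelihood), df = 2."""
    difference = deviance_reduced - deviance_full
    if difference < 0:
        raise ValueError("the fuller model should not have a larger deviance")
    # For two degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-difference / 2.0)
    return difference, p_value


# Illustrative input: BR deviances of Model #1 (-150487) and Model #2 (-151084).
diff, p = likelihood_ratio_test_df2(-150487, -151084)
print(f"deviance difference = {diff:.0f}, p = {p:.3g}")
```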
9.2.7 Explanations of variance
It is of particular interest to quantify how much of the model variance is explained by different
variables or effects. As described in Table 9.2, a total of seven explanatory effects for each
cardiorespiratory parameter were considered in this study. Raudenbush and Bryk [246] proposed
an approach that uses the squared multiple correlation R² over a sequence of models. Suppose that
the full model under consideration for a given parameter is Model #2, given by Equation 9.2.
A sequence of seven models (Model A-G) can be established in a certain order that serves to
compute the proportion of variance explained (PVE) of each effect. The details are
described in the Appendix.
9.2.8 Between- and within-subject effects in sleep staging
9.2.8.1 Sleep staging algorithm

Linear discriminant (LD) has been shown to be an appropriate algorithm in classifying overnight
sleep stages based on cardiorespiratory activity in many studies [248, 249]. In this work we
adopted an LD classifier to perform automatic sleep staging. Overall accuracy and the Cohen’s
Kappa coefficient of agreement [72] were used to evaluate the classifier’s performance. Addi-
tionally, sleep statistics including the percentages of wake, REM sleep, light sleep, and deep
sleep were calculated. In order to verify the classification performance, the subjects were ran-
domly divided into a training set of 82 subjects used to train the classifier and a testing set of
83 subjects for testing.
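The two evaluation measures can be sketched as below. This is an illustrative implementation of overall accuracy and the unweighted Cohen's Kappa from epoch labels, not the code used in this work; the function name and the toy label sequences are hypothetical.

```python
from collections import Counter


def accuracy_and_kappa(reference, predicted):
    """Return overall accuracy and Cohen's Kappa for two equal-length label sequences."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / n ** 2
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy example with four classes: wake (W), REM (R), light (L), deep (D).
reference = ["W", "W", "L", "L", "L", "L", "R", "D"]
predicted = ["W", "L", "L", "L", "L", "L", "L", "D"]
acc, kappa = accuracy_and_kappa(reference, predicted)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa discounts the agreement expected by chance, which matters here because light sleep dominates the night and a classifier biased toward it can reach a moderate accuracy with a low Kappa.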
9.2.8.2 Comparison of correction schemes

The objective was to examine how much the between- and within-subject effects on the
cardiorespiratory activity would restrict the performance in classifying sleep stages (wake, REM
sleep, light sleep, and deep sleep) and then estimating the sleep statistics. For comparison, we
analyzed three different ‘correction’ schemes (CS) based on the optimized Model #2 with
estimated model coefficients to correct (or predict) the values for each parameter. The corrected
values were then used to perform sleep staging. The sleep staging using originally measured
values without any corrections served as the baseline (BS).
• The first correction scheme CS1 predicts the parameter values with subtraction of all the
fixed effects independent of sleep stages, such that
\[
\text{CS1:}\quad \hat{y}_{ij} = \hat{\mu}_{0j} + \sum_{s} (\hat{\beta}_s + \hat{\mu}_{sj})\, s_{ij} + \hat{\mu}_{tj}\, \mathrm{time}_{ij} + \hat{e}_{0ij}. \tag{9.4}
\]
• The second scheme CS2 corrects the parameter values by subtracting all the (sleep-stage-
independent) fixed effects and all the between-subject random effects, such that
\[
\text{CS2:}\quad \hat{y}_{ij} = \sum_{s} \hat{\beta}_s\, s_{ij} + \hat{e}_{0ij}. \tag{9.5}
\]
• The third scheme CS3 excludes all the (sleep-stage-independent) fixed effects and the
within-subject effect to correct the parameter values, such that
\[
\text{CS3:}\quad \hat{y}_{ij} = \hat{\mu}_{0j} + \sum_{s} (\hat{\beta}_s + \hat{\mu}_{sj})\, s_{ij} + \hat{\mu}_{tj}\, \mathrm{time}_{ij}. \tag{9.6}
\]
Note that, again, the exclusive aim of analyzing these correction schemes in the present study
was to evaluate in what aspect and how far the cardiorespiratory parameters can be improved for
sleep staging instead of really performing sleep staging. In other words, we intended to answer
the question what sleep staging performance can be achieved if we can eliminate the effects
caused by the between- or within-subject variability. Investigating methods of estimating the
fixed coefficients and random variances without knowing sleep stages was not addressed here.
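The three schemes in Equations 9.4-9.6 can be sketched for a single epoch as follows. This is a sketch under stated assumptions: the sleep stage variables s_ij are treated as 0/1 indicators so that the sum reduces to the terms of the scored stage, wake is the baseline with a zero coefficient, and all names and numeric values are hypothetical rather than estimates from this study.

```python
def corrected_values(stage, time_min, beta_s, mu_0j, mu_sj, mu_tj, residual):
    """Return the (CS1, CS2, CS3) values of one epoch from estimated model components.

    stage     -- scored sleep stage of the epoch (assumed known, as in the text)
    time_min  -- time of night of the epoch
    beta_s    -- fixed sleep stage coefficients, keyed by stage
    mu_0j     -- random intercept of subject j
    mu_sj     -- random sleep stage coefficients of subject j, keyed by stage
    mu_tj     -- random time slope of subject j
    residual  -- within-subject residual of the epoch
    """
    stage_fixed = beta_s[stage]
    stage_random = mu_sj[stage]
    # CS1 (Eq. 9.4): stage-independent fixed effects removed.
    cs1 = mu_0j + (stage_fixed + stage_random) + mu_tj * time_min + residual
    # CS2 (Eq. 9.5): between-subject random effects removed as well.
    cs2 = stage_fixed + residual
    # CS3 (Eq. 9.6): as CS1 but without the within-subject residual.
    cs3 = mu_0j + (stage_fixed + stage_random) + mu_tj * time_min
    return cs1, cs2, cs3


# Hypothetical components for one deep-sleep epoch 100 min into the night.
cs1, cs2, cs3 = corrected_values(
    stage="deep",
    time_min=100.0,
    beta_s={"wake": 0.0, "deep": -0.05},
    mu_0j=0.1,
    mu_sj={"wake": 0.0, "deep": 0.02},
    mu_tj=0.001,
    residual=0.01,
)
print(cs1, cs2, cs3)
```

By construction, CS1 and CS3 differ only by the residual term, which is why CS3 isolates the contribution of the within-subject variability.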
9.3 Results
9.3.1 Descriptive results
Figure 9.1 compares the skewness of the parameters with and without being transformed using
natural logarithm. It indicates that the four parameters BR, SDBR, HR, and SDNN need to
be log-transformed since they were of skewed distribution and their skewness values largely
decreased after performing the log-transformation. Table 9.3 shows the values (mean ± SD) of
the six cardiorespiratory parameters BR, SDBR, HR, SDNN, LF, and HF analyzed in this study
for different cohort sets in different gender, age groups, BMI groups, time periods, and sleep
stages. The values significantly differed across different groups for all the cohort sets (ANOVA
F-test, p < 0.001).
[Figure 9.1: bar plot of skewness (y-axis) for each cardiorespiratory parameter (x-axis: BR, SDBR,
HR, SDNN, LF, HF), with one bar for the original values and one for the natural logarithm.]
Figure 9.1: Skewness comparison of cardiorespiratory parameters with and without natural logarithm
transformation, indicating that BR, SDBR, HR, and SDNN should be log-transformed.
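The effect of the log-transformation on skewness can be illustrated with a short sketch. The moment-based estimator below is an assumption, since the exact skewness estimator used for Figure 9.1 is not specified here, and the input values are synthetic rather than measured data.

```python
import math


def skewness(values):
    """Moment-based sample skewness: third central moment over variance^(3/2)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Synthetic right-skewed values; their logarithms are evenly spaced.
raw = [1.0, 2.0, 4.0, 8.0, 16.0]
logged = [math.log(v) for v in raw]
print(f"skewness original = {skewness(raw):.3f}")
print(f"skewness log      = {skewness(logged):.3f}")
```

A parameter whose skewness moves markedly toward zero after the transformation, as in this toy case, is a candidate for log-transformation before modeling.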
9.3.2 Multilevel modeling
In comparison with the F-test, the multilevel regression models enable a more adequate and
thorough statistical analysis. With the multilevel Model #1, the estimated coefficients and vari-
ances for all the parameters are shown in Table 9.4. As a result of removing the insignificant
variables (tested using the Wald Z-test with p > 0.05) except for the constant intercept and
sleep stage variables, the model was optimized. The table indicates that the demographics sig-
nificantly influenced the cardiorespiratory activity from different aspects. Upon a closer look, it
is found that the breathing rate BR for the healthy subjects with a higher BMI was significantly
higher than the subjects with a lower BMI (0.011 ln-Hz per kg/m2 , p < 0.01) at the baseline
of -1.458 ln-Hz, whereas its variation SDBR remained the same. For cardiac activity, the mean
heart rate HR of women was higher than men (0.042 ln-bpm, p < 0.05) at the baseline of 4.221
ln-bpm while its variation SDNN was lower than men (-0.247 ln-ms, p < 0.0001) at the base-
line of 4.823 ln-ms. SDNN was also negatively correlated to subject age (-0.009 ln-ms per y,
p < 0.0001) and BMI (-0.025 ln-ms per kg/m2, p < 0.01). With the spectral analysis of HRV,
men had an LF power higher by 0.045 nu (p < 0.05) but an HF power lower by 0.052 nu
(p < 0.01) compared with women during bedtime sleep. The HF power slightly decreased
along with the increase in age for men (-0.002 nu per y, p < 0.05). These results are consistent
with previous work [49, 101, 257].
Most of the analyzed parameters were found to be time-variant (i.e., they were modulated
by time of night) with an exception of breathing rate (Table 9.4). For instance, the heart rate
HR dropped down gradually along with the time progression over the night (-0.0001 ln-bpm
Table 9.3: Values (mean ± SD) of the six cardiorespiratory parameters in different cohort sets
(n=165)
Cohort set   BR (ln-Hz)   SDBR (ln-Hz)   HR (ln-bpm)   SDNN (ln-ms)   LF (nu)   HF (nu)
◮ Gender
Man -1.20 ± 0.24 -3.67 ± 0.75 4.13 ± 0.15 3.74 ± 0.77 0.42 ± 0.23 0.47 ± 0.23
Woman -1.22 ± 0.23 -3.81 ± 0.76 4.16 ± 0.16 3.49 ± 0.71 0.39 ± 0.22 0.50 ± 0.23
◮ Age
Young -1.24 ± 0.24 -3.85 ± 0.74 4.11 ± 0.16 3.94 ± 0.63 0.36 ± 0.20 0.56 ± 0.22
Middle -1.20 ± 0.24 -3.71 ± 0.78 4.15 ± 0.16 3.52 ± 0.69 0.45 ± 0.23 0.45 ± 0.23
Elderly -1.18 ± 0.20 -3.70 ± 0.71 4.17 ± 0.13 3.39 ± 0.81 0.38 ± 0.24 0.45 ± 0.22
◮ BMI
Underweight -1.24 ± 0.14 -4.00 ± 0.66 4.11 ± 0.12 4.01 ± 0.53 0.36 ± 0.18 0.56 ± 0.19
Normal -1.23 ± 0.23 -3.77 ± 0.74 4.14 ± 0.16 3.72 ± 0.73 0.41 ± 0.22 0.48 ± 0.23
Overweight -1.18 ± 0.24 -3.70 ± 0.77 4.15 ± 0.15 3.46 ± 0.75 0.39 ± 0.23 0.48 ± 0.23
◮ Time of night
0-90 min -1.22 ± 0.22 -3.81 ± 0.80 4.16 ± 0.15 3.52 ± 0.73 0.39 ± 0.22 0.50 ± 0.23
90-180 min -1.21 ± 0.22 -3.85 ± 0.75 4.17 ± 0.15 3.58 ± 0.74 0.42 ± 0.23 0.46 ± 0.23
180-270 min -1.20 ± 0.23 -3.77 ± 0.77 4.15 ± 0.16 3.61 ± 0.77 0.41 ± 0.23 0.48 ± 0.23
>270 min -1.21 ± 0.24 -3.66 ± 0.72 4.12 ± 0.15 3.67 ± 0.75 0.40 ± 0.22 0.49 ± 0.23
◮ Sleep stage
Wake -1.16 ± 0.23 -3.25 ± 0.62 4.19 ± 0.15 3.61 ± 0.78 0.42 ± 0.24 0.44 ± 0.23
REM sleep -1.18 ± 0.22 -3.44 ± 0.52 4.15 ± 0.16 3.64 ± 0.76 0.45 ± 0.23 0.42 ± 0.22
Light sleep -1.23 ± 0.23 -3.89 ± 0.73 4.13 ± 0.15 3.64 ± 0.73 0.40 ± 0.22 0.49 ± 0.23
Deep sleep -1.24 ± 0.23 -4.29 ± 0.71 4.14 ± 0.15 3.45 ± 0.72 0.33 ± 0.21 0.57 ± 0.21
ln, natural logarithm; nu, normalized unit; young, 20-39 y; middle, 40-69 y; elderly, >69 y; underweight,
<18.5 kg/m²; normal weight, 18.5-25 kg/m²; overweight, >25 kg/m²; light sleep, S1 and S2 stages;
deep sleep, S3 and S4 stages. For all the parameters, values differed significantly between the groups
within each cohort set (F-test, p < 0.001), but this may be imprecise since subject demographics, time
of night, and sleep stages were possibly not independent.
per min, p < 0.0001) at the baseline of 4.221 ln-bpm while the variation in heartbeat intervals
SDNN increased (0.001 ln-ms per min, p < 0.0001) at the baseline of 4.823 ln-ms, confirming
the findings reported previously [57]. This time modulation varied from subject to subject,
as indicated by the significant variance Ωt (p < 0.0001) of the random time
effect. The time effect was also modulated by some demographic variables (such as age for SDNN
and BMI for SDBR, LF, and HF). We note in the table that there appeared to be significant
between-subject physiological effects for all parameters (p < 0.0001), measured by the random
variances of sleep stage variables. These variances seemed approximately homogeneous across
sleep stages for BR and HR but were clearly different for their variations SDBR and SDNN.
Figure 9.2 illustrates an example that compares the parameter values (estimated by multilevel
regression based on Model #1) changing along with time between two subjects with different
Table 9.4: Coefficients and their standard errors (SE) of the optimized multilevel model without the
between-subject centering effect (Model #1) for the six cardiorespiratory parameters.
Coef. BR SDBR HR SDNN LF HF

◮ Fixed, coefficient (SE)
β0 -1.458 (0.087) -3.320 (0.032) 4.221 (0.016) 4.823 (0.255) 0.464 (0.014) 0.535 (0.027)
βwake Baseline Baseline Baseline Baseline Baseline Baseline
βREM 0.002 (0.008) -0.205 (0.026) -0.028 (0.004) -0.104 (0.027) 0.030 (0.007) -0.037 (0.007)
βlight -0.035 (0.008) -0.611 (0.026) -0.061 (0.004) -0.052 (0.021) -0.027 (0.006) 0.039 (0.006)
βdeep -0.044 (0.010) -0.997 (0.033) -0.055 (0.004) -0.249 (0.026) -0.096 (0.008) -0.106 (0.008)
βa -0.009 (0.002) -0.002 (0.001)
βg 0.042 (0.021) -0.247 (0.069) -0.045 (0.018) 0.052 (0.017)
βb 0.011 (0.004) -0.025 (0.011)
βt 0.001 (4e-4) -1e-4 (2e-5) 0.001 (2e-4) 4e-4 (1e-4) -4e-4 (1e-4)
βta -1e-5 (3e-6)
βtg
βtb -3e-5 (1e-5) -2e-5 (5e-6) 2e-5 (5e-6)
◮ Random, coefficient (SE)
Ω0
Ωwake 0.030 (0.003) 0.159 (0.018) 0.018 (0.002) 0.224 (0.025) 0.018 (0.002) 0.014 (0.002)
ΩREM 0.029 (0.003) 0.171 (0.018) 0.019 (0.002) 0.280 (0.031) 0.022 (0.002) 0.018 (0.002)
Ωlight 0.030 (0.003) 0.219 (0.018) 0.020 (0.002) 0.256 (0.028) 0.019 (0.002) 0.017 (0.002)
Ωdeep 0.031 (0.003) 0.257 (0.018) 0.020 (0.002) 0.324 (0.036) 0.020 (0.002) 0.017 (0.002)
Ωt 1e-7 (1e-8) 7e-7 (8e-8) 4e-8 (4e-9) 7e-7 (8e-8) 5e-8 (6e-9) 5e-8 (5e-9)
◮ Residual and deviation (dev.)
Ωe 0.019 (1e-4) 0.290 (0.001) 0.003 (1e-5) 0.230 (0.001) 0.033 (1e-4) 0.033 (1e-4)
Dev. -150487 217253 -398075 186380 -75029 -74306
ln, natural logarithm; nu, normalized unit. The statistically significant effects (Wald Z-test, p < 0.05), the fixed
constant intercept β0, and the sleep stage intercepts βs are presented.
demographics. It shows that the fixed time and demographic effects were generally larger than
the differences between sleep stages.
With the addition of the centering variable to Model #1, we have Model #2 and the estimated
regression coefficients after model optimization (Wald Z-test at p < 0.05, for each coefficient)
are shown in Table 9.5. As stated, this model included the between-subject physiological effect
at the overnight mean level (i.e., centering effect), resulting in an obvious reduction of the
random variance in each sleep stage compared with Model #1. This indicates that, regardless of
sleep stages, the between-subject variability in physiology can be reflected, to a certain degree,
by the difference of the mean value over night. Besides, centering the parameter values per
subject slightly influenced the time effect in both fixed and random parts. In comparison with
Model #1, a lower deviance using Model #2 was obtained for all the parameters (p < 0.0001)
as shown in Table 9.4 and 9.5, indicating a better goodness-of-fit on the parameters using the
model with the centering variable.
[Figure 9.2: panels for BR (ln-Hz), SDBR (ln-Hz), HR (ln-bpm), SDNN (ln-ms), LF (nu), and HF (nu)
plotted against time (min, 0-400), each showing the regression for a man and a woman.]
Figure 9.2: An example of multilevel regressions of the six cardiorespiratory parameters for a man (age: 24 y, BMI: 21.3 kg/m2 ) and a woman (age: 70 y,
BMI: 28.6 kg/m2 ) using coefficients estimated through Model #1 excluding the random coefficients and residual term. The regression variables included
age, gender, BMI, time, time × age, time × gender, time × BMI, and sleep stages wake, REM, light, and deep.
Table 9.5: Coefficients and their standard errors (SE) of the optimized multilevel model with the addi-
tional between-subject centering effect (Model #2) for the six cardiorespiratory parameters.
Coef. BR SDBR HR SDNN LF HF

◮ Fixed, coefficient (SE)
β0 -0.098 (0.079) -0.012 (0.017) 0.104 (0.028) -0.060 (0.047) -0.018 (0.034) 0.131 (0.030)
βc 0.973 (0.011) 0.884 (0.020) 0.993 (0.007) 0.979 (0.011) 0.936 (0.012) 0.923 (0.011)
βwake Baseline Baseline Baseline Baseline Baseline Baseline
βREM 0.002 (0.008) -0.199 (0.025) -0.027 (0.004) -0.104 (0.027) 0.030 (0.007) -0.037 (0.007)
βlight -0.035 (0.008) -0.606 (0.026) -0.062 (0.004) -0.052 (0.020) -0.027 (0.005) 0.039 (0.006)
βdeep -0.044 (0.010) -0.992 (0.033) -0.054 (0.004) -0.248 (0.026) -0.096 (0.008) -0.105 (0.008)
βa -0.002 (0.001) -1e-4 (5e-5) 4e-4 (1e-4) 2e-4 (1e-4)
βg -0.024 (0.012)
βb 0.005 (0.001) -0.004 (0.001)
βt 3e-4 (1e-4) -1e-4 (2e-5) 0.001 (1e-4) 4e-4 (1e-4) -4e-4 (1e-4)
βta -1e-5 (1e-6)
βtg 1e-4 (5e-5)
βtb -2e-5 (5e-6) 2e-5 (5e-6)
◮ Random, coefficient (SE)
Ω0
Ωwake 0.012 (0.001) 0.093 (0.011) 0.004 (4e-4) 0.094 (0.011) 0.006 (0.001) 0.005 (0.001)
ΩREM 0.014 (0.002) 0.099 (0.011) 0.003 (3e-4) 0.095 (0.011) 0.007 (0.001) 0.006 (0.001)
Ωlight 0.006 (0.001) 0.061 (0.007) 0.002 (3e-4) 0.044 (0.005) 0.004 (0.001) 0.003 (3e-4)
Ωdeep 0.010 (0.001) 0.131 (0.015) 0.003 (3e-4) 0.087 (0.010) 0.006 (0.001) 0.006 (0.001)
Ωt 1e-7 (1e-8) 7e-7 (8e-8) 4e-8 (4e-9) 7e-7 (8e-8) 5e-8 (6e-9) 4e-8 (5e-9)
◮ Residual and deviation (dev.)
Ωe 0.019 (1e-4) 0.290 (0.001) 0.003 (1e-5) 0.230 (0.001) 0.033 (1e-4) 0.033 (1e-4)
Dev. -151084 216873 -398866 185774 -75617 -74903
ln, natural logarithm; nu, normalized unit. The statistically significant effects (Wald Z-test, p < 0.05), the fixed
constant intercept β0, and the sleep stage intercepts βs are presented.
Normality was examined using the Q-Q plot method for all
models. For example, the Q-Q plots of the residuals (with variance Ωe, in Model #1) for all the
parameters are shown in Figure 9.3, suggesting that the residuals were approximately drawn
from a normal distribution.
9.3.3 Proportion of variance explained
To explore which effects explained the variance and how much each contributed, we
computed for each cardiorespiratory parameter the PVE for each effect by analyzing the
estimated variances of random intercept and residual in a sequence of models (Model A-G in
the Appendix). The variance changes in the models with the inclusion of different effects in
a specific order are shown in Table 9.6, based on which the PVE values were obtained in
Table 9.7. Note that the variances explained by sleep stages were not included in PVE. For BR
[Figure 9.3: six Q-Q panels (BR, SDBR, HR, SDNN, LF, HF); x-axis: standard normal quantiles;
y-axis: quantiles of Ωe.]
Figure 9.3: Q-Q plots of residual variance Ωe of the multilevel models (Model #1) for the six cardiores-
piratory parameters. These plots suggest approximate normal distributions of the residual variances.
and HR, the between-subject centering effects dominated the variances (55.26% for BR and
77.95% for HR), indicating that the subjects behaved differently with respect to their breathing
rate and heart rate at the general mean level throughout the whole night. We also see that the
variations in breathing rate and heart rate had a lower centering difference between subjects
(with PVE of 26.23% for SDBR and of 39.06% for SDNN) compared with the physiological
variability within subjects (with PVE of 61.69% for SDBR and of 40.87% for SDNN). This
was also the case for LF and HF powers in the spectral domain of HRV as shown in Table 9.7.
As a result, compared with the overall within-subject variability, the overall between-subject
variability had a larger influence on breathing rate (PVE of 66.58%) and heart rate (PVE of
86.25%) but a smaller influence on SDBR, LF, and HF (PVE of 37.94%, 34.62%, and 35.13%,
respectively); for SDNN the overall between-subject PVE was 58.66%. In general, the variances
explained by the effects in
physiology between subjects (including the effect at the overnight mean level and random effect) and
within subjects accounted for 83.83-97.16% of the total variance for different cardiorespiratory
parameters.
Specifically, a relatively larger percentage (13.7%) was explained by the demographic effect
for SDNN compared with the other parameters. The PVE of between-subject physiological
variability (in the random part) ranged from 2.27% to 7.62% depending on the parameters.
For the time effect, the PVE in the fixed part (0.01-1.32%) reflecting the linear changes of
parameters over time within subjects was smaller than in the random part (1.58-2.74%) with
the indication of different changes over time between subjects. In general, the time effect
accounted for relatively less of the total variance than most other effects. Finally, although
the cross-interactions existed between time and demographics for BR, SDNN, LF, and HF, the
proportion of variance they explained was very small (<0.20%).
Table 9.6: Variances of a sequence of models (Model A-G in the Appendix) with different effects
for computing their PVE for the six cardiorespiratory parameters.
Model A-G with different effects (Appendix)   BR (ln-Hz)   SDBR (ln-Hz)   HR (ln-bpm)   SDNN (ln-ms)   LF (nu)   HF (nu)
Model A: Ωe 0.0229 0.3306 0.0043 0.2626 0.0354 0.0356
baseline model Ω0 0.0328 0.1389 0.0192 0.2997 0.0151 0.0156
Dev. -125045 232926 -348717 202249 -66487 -65952
Model B: Ωe 0.0228 0.3284 0.0040 0.2600 0.0353 0.0355
+ within-subject time effect Ω0 0.0328 0.1393 0.0191 0.2999 0.0150 0.0155
(fixed) Dev. -125109 232056 -357783 200926 -66724 -66131
Model C: Ωe 0.0228 0.3284 0.0040 0.2600 0.0353 0.0355
+ demographic effect Ω0 0.0308 0.1329 0.0183 0.2230 0.0147 0.0136
(fixed) Dev. -125120 232048 -357790 200877 -66730 -66152
Model D: Ωe 0.0228 0.3284 0.0040 0.2600 0.0353 0.0355
+ centering effect Ω0 0.0001 0.0098 0.0001 0.0033 0.0003 0.0002
(fixed) Dev. -126064 231624 -358718 200200 -67367 -66850
Model E: Ωe 0.0227 0.3284 0.0040 0.2597 0.0352 0.0354
+ demographic-related time Ω0 0.0001 0.0098 0.0001 0.0033 0.0003 0.0002
effect (fixed) Dev. -126393 231624 (Ne) -358718 (Ne) 200027 -67718 -67206
Model F: Ωe 0.0210 0.3157 0.0034 0.2476 0.0343 0.0346
+ between-subject time Ω0 0.0003 0.0097 0.0001 0.0041 0.0003 0.0002
effect (random) Ωt 1.1e-7 7.3e-7 3.6e-8 7.1e-7 5.2e-8 4.5e-8
Dev. -136185 226933 -380964 194316 -70913 -69899
Model G: Ωe 0.0186 0.2896 0.0029 0.2298 0.0328 0.033
+ between-subject Ω0 0 0 0 0 0 0
physiological effect Ωt 1.0e-7 6.7e-7 3.5e-8 7.1e-7 4.8e-8 4.3e-8
(random) Dev. -151084 216874 -398866 185774 -75617 -74903
ln, natural logarithm; nu, normalized unit; Dev., model deviance; Ne, no effect. All the models include
fixed (β0 ) and random (µ0 ) intercepts, and sleep-stage-dependent variables (wake, REM, light, and deep)
with their coefficients. The models were optimized by excluding the effects with their coefficients statis-
tically equal to zero (Wald Z-test, p > 0.05) and the variances presented in the table were all statistically
significant (Wald Z-test, p < 0.01).
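As an illustration of how the Model A variances in Table 9.6 relate to the intra-group correlation coefficient defined in Equation 9.8 of the Appendix, the following sketch computes ρ = Ω0 / (Ωe + Ω0) for each parameter. The function and dictionary names are hypothetical; the numbers are the Model A entries of Table 9.6.

```python
def intra_group_correlation(var_between: float, var_within: float) -> float:
    """Share of the total variance located at the subject level (Equation 9.8)."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Model A variances from Table 9.6: (between-subject, within-subject residual).
model_a = {
    "BR": (0.0328, 0.0229),
    "SDBR": (0.1389, 0.3306),
    "HR": (0.0192, 0.0043),
    "SDNN": (0.2997, 0.2626),
    "LF": (0.0151, 0.0354),
    "HF": (0.0156, 0.0356),
}

icc = {name: intra_group_correlation(*variances) for name, variances in model_a.items()}
for name, value in icc.items():
    print(f"{name}: ICC = {value:.3f}")
```

In this baseline model, BR and HR have most of their variance at the subject level, whereas SDBR, LF, and HF have most of it within subjects, in line with the PVE pattern reported in Table 9.7.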
9.3.4 Sleep staging results
The results of sleep staging are presented in Table 9.8, where different schemes (BS and CS1-
CS3) were compared. We observe that the correction by means of the between- and/or within-
subject effects for the parameters generally enabled performance improvement in sleep staging
(by comparing the results of CS1-CS3 with BS). In particular, correcting the parameters by
the fixed effects (demographics, time, and their cross-interactions) independent of sleep stages
(CS1) resulted in a significantly increased Kappa of 0.29 ± 0.11 and a significantly increased
accuracy of 60.4 ± 8.8% (Wilcoxon test, p < 0.00001) compared with the baseline without any
correction (Kappa of 0.19 ± 0.10 and accuracy of 55.8 ± 9.8%). In addition, if we could man-
Table 9.7: Proportion of variance explained (PVE, %) accounted for by different effects for the six
cardiorespiratory parameters.
Effect                                  BR       SDBR     HR       SDNN     LF       HF
◮ Overall between-subject effect
Demographic effect                      3.55%    1.37%    3.36%    13.69%   0.63%    3.70%
Centering (physiological) effect        55.26%   26.23%   77.95%   39.06%   28.63%   26.41%
Between-subject time effect             2.74%    2.72%    2.67%    2.00%    1.87%    1.58%
Between-subject physiological effect    5.03%    7.62%    2.27%    3.91%    3.49%    3.44%
◮ Overall within-subject effect
Within-subject time effect              0.01%    0.37%    1.32%    0.42%    0.16%    0.14%
Within-subject physiological effect     33.39%   61.69%   12.43%   40.87%   65.04%   64.54%
◮ Cross-interaction effect
Demographic-related time effect         0.02%    Ne       Ne       0.06%    0.18%    0.19%
Ne, no effect. For each cardiorespiratory parameter, the sum of the PVEs from all the
effects is 100%, representing the total variance for that parameter. The centering effect reflected some
between-subject physiological variability (at the overnight mean level) that was assumed to be independent
of sleep stage composition over the entire night.
Table 9.8: Comparison of sleep staging results (wake/REM sleep/light sleep/deep sleep) using
different schemes in correcting the cardiorespiratory parameters.
                     PSG           BS            CS1           CS2           CS3
◮ Overall performance
Accuracy, %          –             55.8 ± 9.8    60.4 ± 8.8    62.9 ± 7.8    83.5 ± 14.4
Kappa coefficient    –             0.19 ± 0.10   0.29 ± 0.11   0.35 ± 0.09   0.72 ± 0.23
◮ Sleep stage composition (percentage)
Wake, %              19.8 ± 12.5   19.9 ± 14.4   18.4 ± 4.9    20.6 ± 6.4    19.7 ± 10.7
REM sleep, %         14.0 ± 5.6    0.7 ± 1.0     2.4 ± 2.0     3.0 ± 1.7     10.5 ± 7.8
Light sleep, %       53.4 ± 10.7   74.7 ± 15.1   73.5 ± 8.1    71.0 ± 8.2    59.9 ± 12.0
Deep sleep, %        12.8 ± 7.2    4.7 ± 5.6     5.7 ± 5.2     5.4 ± 4.0     9.9 ± 7.6
BS, baseline with original parameter values without correction; CS1, with correction by fixed effects;
CS2, with correction by fixed effects and between-subject random effects; CS3, with correction by
fixed effects and within-subject random effect (model residual). For CS2 and CS3, results were
obtained when assuming the sleep stages were known, which was usually not the case in practice. For
accuracy and Kappa coefficient, significance of difference between using each correction scheme
and BS was confirmed with a paired (two-sided) Wilcoxon signed-rank test, all at p < 0.00001.
age to further correct the variability of the parameters evoked by the between-subject random
effects (CS2), the sleep staging results would significantly increase to a Kappa of 0.35 ± 0.09
and an accuracy of 62.9 ± 7.8% (Wilcoxon test, p < 0.00001), where the SD of results over
subjects would be simultaneously reduced. On the other hand, if the within-subject variability
could be corrected (CS3), the sleep staging performance would be markedly improved (at a
Kappa of 0.72 ± 0.23 and an accuracy of 83.5 ± 14.4%) (Wilcoxon test, p < 0.00001), but
meanwhile, the SD would increase because this correction scheme focused on reducing effects
within subjects rather than those between subjects. Similarly, as shown in Table 9.8, correcting
the parameters could help obtain a more accurate estimation of sleep stage composition.
9.4 Discussion
The results of demographic and time of night effects found in this study are consistent with the
findings reported in previous work [49, 57, 101, 257]. It is noted that the model used to
facilitate the interpretation of the demographic effects (Model #1) should not include the
(between-subject) centering variable. This is because the demographic differences usually correspond to
the autonomic changes at the overnight mean level. Despite the inclusion of the centering effect
in Model #2, it came as a surprise that some demographic variables still had significant effects
(see Table 9.5), which contradicts our expectation that their effects on the cardiorespiratory
activity are fully manifested by the parameter mean values. The cause is that the percentages (or
composition) of sleep stages were not exactly the same for all subjects. Therefore, the
demographic differences were only partially explained by the centering variable and the unexplained
part depends on the difference of sleep stage composition between subjects.
It is important to note that, since some effects were correlated with each other, the order in
the procedure of constructing the sequence of models (see the Appendix) must be specifically
determined. This aimed at precisely quantifying the proportion of variance explained by each
effect. In this procedure, the models with fixed effects (e.g., demographic effects) that can be
explained by other effects should be addressed first, and the models with random effects
should be included later [134].
In Table 9.4 and 9.5, it can be seen that the time variable was able to explain variance at the
subject level due to the significance of the random time effect. First, the slope of cardiorespi-
ratory activity changing over time might depend on sleep stages (or their transitions) and thus
might not be with a continuous linear trend. A method of handling the sleep-stage-dependency
is to use a model that contains the cross-interactions between sleep stages and time; but for
the influence of sleep stage transitions, it is suggested to regard the night as different segments
without any sleep stage transitions. Second, the random time effect could likely be due to
the difference in autonomic control or changes in sleep architecture between subjects by other
factors such as daytime activity, work stress, and response to the sleep environments during
sleep. This was not addressed in this study and it merits further investigation. On the other
hand, the cross-interactions between time and demographics (in particular, BMI) explained
some total variance at both subject and epoch levels. Although the amount and proportion of
variance explained by the time-related effects seem much smaller than those of some other effects, as
shown in Table 9.7, they still differ significantly between subjects and are relatively large
compared with the differences between sleep stages for some parameters such as LF and HF,
especially at the end of the night, which can be observed in Figure 9.2.
Regarding the quantified within-subject effects, several factors in addition to internal physiology,
such as body movements, body position, sleep environment, conscious breathing control, and even
daytime activity, may also explain some of the total variance within subjects in cardiorespiratory
activity. However, this work did not determine which of these factors were at play,
and this should be studied in the future.
When evaluating the performance of sleep staging using the cardiorespiratory parameters,
Model #2 should be preferred. For each parameter, although the estimate of
its overnight mean value for each subject was not completely accurate (due to the difference
of sleep stage composition between subjects), correcting it can still result in a reduction of the
physiological variability between subjects to a great extent. As a consequence, the sleep staging
results can be improved. Table 9.6 confirms that the centering effect actually constituted a
large proportion of the total variance. Moreover, Figure 9.2 illustrates that the variations of the
parameters caused by demographic and time effects were somewhat comparable with or even
larger than the differences between sleep stages, leading to difficulty in separating sleep stages.
With respect to the capability of the parameters in classifying sleep stages, Table 9.5 shows
that, for example, SDBR had a larger difference between sleep stages compared with the other
parameters while BR had no difference between REM sleep and wakefulness. This indicates
that the intrinsic separation of sleep stages should vary between the parameters that express
different aspects of the autonomic activity.
Table 9.8 indicates that the variability between and within subjects conveyed by the car-
diorespiratory activity limited the sleep staging performance. To improve it, the correction
scheme CS1 seems potentially applicable from a practical point of view because the fixed ef-
fects are usually prior information that is independent of sleep stages or they can be estimated
from the training data before performing sleep staging. However, realizing CS2 and CS3 requires
either information on sleep stages (which are unknown in practice and need to be
identified) or estimation of random variances (which are hardly predictable for new subjects).
Therefore, the challenge will be on how to diminish the random effects caused by variability
either between or within subjects when sleep stages are unknown. For instance, normalizing
the parameter values based on their variation or distribution throughout the night for each sub-
ject might allow for reduction of between-subject random effect in physiology to some extent.
Incorporating more explanatory variables in the model that are independent of sleep stages and
are able to explain some variance of the model would help better correct the parameters. Com-
pared to the parameters analyzed in this study, exploring new parameters with smaller random
variances (i.e., are less influenced by the between- or within-subject physiological variability)
or additional information in separating sleep stages may improve the sleep staging performance.
Nevertheless, we argue that the performance of cardiorespiratory-based sleep staging will al-
ways be limited unless the between- and/or within-subject random variances are successfully
explained and corrected.
9.5 Conclusion
In this chapter, with a multilevel analysis we statistically modeled and quantified the effects on
autonomic cardiorespiratory activity during sleep caused by differences in subject demographics,
time of night, and physiology within and between subjects. All these effects were found to be
significant. The primary effects were the physiological variability within and between subjects.
They markedly limit the performance of sleep staging when using cardiorespiratory informa-
tion. Therefore, diminution of these effects will be the main challenge to further improve the
cardiorespiratory-based sleep staging.
9.A Appendix
The sequence of models constructed to compute the PVE values for different effects is described
in the following.
• The first model is the model with solely the constant and random intercepts as well as the
fixed sleep-stage-dependent variables. This baseline model can be written as
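\[
\begin{aligned}
\text{Model A:}\quad y_{ij} &= \beta_0^A + \mu_{0j}^A + \sum_s \beta_s^A s_{ij} + e_{0ij}^A, \\
&\text{with } \mu_{0j}^A \sim N(0, \Omega_0^A) \text{ and } e_{0ij}^A \sim N(0, \Omega_e^A),
\end{aligned}
\tag{9.7}
\]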
where s = wake, REM, light, deep, and the total variance Ωtotal consists of variance in
two levels: the between-subject variance ΩA0 at the subject level and the within-subject
(residual) variance ΩAe at the time/epoch level. The percentage of the total variance taken
by ΩA0 , called intra-group correlation coefficient (ICC) ρ (21, 39), is computed by
\[
\rho = \frac{\Omega_0^A}{\Omega_{\text{total}}} = \frac{\Omega_0^A}{\Omega_e^A + \Omega_0^A}. \tag{9.8}
\]
• Let us then consider the model with fixed time effect at the first level
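\[
\begin{aligned}
\text{Model B:}\quad y_{ij} &= \beta_0^B + \mu_{0j}^B + \sum_s \beta_s^B s_{ij} + \beta_t^B \mathrm{time}_{ij} + e_{0ij}^B, \\
&\text{with } \mu_{0j}^B \sim N(0, \Omega_0^B) \text{ and } e_{0ij}^B \sim N(0, \Omega_e^B).
\end{aligned}
\tag{9.9}
\]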
For the variance analysis of the time variable, instead of using the original time stamps
mentioned before (i.e., $\mathrm{time}_{ij} = i/2$), we use shifted (centered) values, computed as the
original time minus the mean over all subjects of the median time. This is because,
in a longitudinal multilevel analysis, time is an occasion-level variable within subjects
for which a linear trend usually suffices, and it can therefore explain part of the
total variance at both levels [134]. In fact, the models with and without shifting of the
occasion measures are equivalent, with exactly the same model coefficients (including
residual) and deviance, except for the variance estimates of the random effects. The variance
estimates obtained with the shifted time values are considered to be more accurate and
realistic [134]. To quantify the PVE constituted by the fixed time effect, we use the
relative variance reductions with respect to the baseline model at the two levels, $R_1^2$ and $R_2^2$, such that
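\[
\begin{aligned}
\mathrm{PVE}_{\text{time fixed}} &= (1 - \rho) R_1^2 + \rho R_2^2 \\
&= (1 - \rho)\,\frac{\Omega_e^A - \Omega_e^B}{\Omega_e^A} + \rho\,\frac{\Omega_0^A - \Omega_0^B}{\Omega_0^A} \\
&= \frac{(\Omega_e^A - \Omega_e^B) + (\Omega_0^A - \Omega_0^B)}{\Omega_{\text{total}}}.
\end{aligned}
\tag{9.10}
\]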
Now we consider the subject-level fixed effects.
• The model including demographic variables is
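\[
\begin{aligned}
\text{Model C:}\quad y_{ij} &= \beta_0^C + \mu_{0j}^C + \sum_s \beta_s^C s_{ij} + \beta_t^C \mathrm{time}_{ij} + e_{0ij}^C \\
&\quad + \beta_a^C \mathrm{age}_j + \beta_g^C \mathrm{gender}_j + \beta_b^C \mathrm{BMI}_j, \\
&\text{with } \mu_{0j}^C \sim N(0, \Omega_0^C) \text{ and } e_{0ij}^C \sim N(0, \Omega_e^C).
\end{aligned}
\tag{9.11}
\]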
Similarly, the PVE explained by the between-subject demographic variables can be com-
puted by
\[
\mathrm{PVE}_{\text{demographic}} = \frac{(\Omega_e^B - \Omega_e^C) + (\Omega_0^B - \Omega_0^C)}{\Omega_{\text{total}}}. \tag{9.12}
\]
The demographic variables only explain the variability between subjects, so the variance
change at the epoch level should be approximately zero ($\Omega_e^B - \Omega_e^C \approx 0$).
• Further, Model D is the model with the inclusion of between-subject centering effect
(expressing the physiological difference between subjects at the overnight mean level),
given by
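\[
\begin{aligned}
\text{Model D:}\quad y_{ij} &= \beta_0^D + \mu_{0j}^D + \sum_s \beta_s^D s_{ij} + \beta_t^D \mathrm{time}_{ij} + \beta_c^D \hat{y}_j + e_{0ij}^D \\
&\quad + \beta_a^D \mathrm{age}_j + \beta_g^D \mathrm{gender}_j + \beta_b^D \mathrm{BMI}_j, \\
&\text{with } \mu_{0j}^D \sim N(0, \Omega_0^D) \text{ and } e_{0ij}^D \sim N(0, \Omega_e^D),
\end{aligned}
\tag{9.13}
\]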
from which the corresponding PVE is computed such that
\[
\mathrm{PVE}_{\text{center}} = \frac{(\Omega_e^C - \Omega_e^D) + (\Omega_0^C - \Omega_0^D)}{\Omega_{\text{total}}}. \tag{9.14}
\]
• For the inclusion with cross-interactions that express the demographic-related time ef-
fects, the model is
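\[
\begin{aligned}
\text{Model E:}\quad y_{ij} &= \beta_0^E + \mu_{0j}^E + \sum_s \beta_s^E s_{ij} + \beta_t^E \mathrm{time}_{ij} + \beta_c^E \hat{y}_j + e_{0ij}^E \\
&\quad + \beta_a^E \mathrm{age}_j + \beta_g^E \mathrm{gender}_j + \beta_b^E \mathrm{BMI}_j \\
&\quad + \beta_{ta}^E (\mathrm{time} \times \mathrm{age})_{ij} + \beta_{tg}^E (\mathrm{time} \times \mathrm{gender})_{ij} + \beta_{tb}^E (\mathrm{time} \times \mathrm{BMI})_{ij}, \\
&\text{with } \mu_{0j}^E \sim N(0, \Omega_0^E) \text{ and } e_{0ij}^E \sim N(0, \Omega_e^E),
\end{aligned}
\tag{9.15}
\]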
and the proportion of cross-interaction variance is
\[
\mathrm{PVE}_{\text{cross}} = \frac{(\Omega_e^D - \Omega_e^E) + (\Omega_0^D - \Omega_0^E)}{\Omega_{\text{total}}}. \tag{9.16}
\]
In addition to the fixed part, we consider the random part of some effects.
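• The model with an additional random time effect is
\[
\begin{aligned}
\text{Model F:}\quad y_{ij} &= \beta_0^F + \mu_{0j}^F + \sum_s \beta_s^F s_{ij} + (\beta_t^F + \mu_{tj}^F)\,\mathrm{time}_{ij} + \beta_c^F \hat{y}_j + e_{0ij}^F \\
&\quad + \beta_a^F \mathrm{age}_j + \beta_g^F \mathrm{gender}_j + \beta_b^F \mathrm{BMI}_j \\
&\quad + \beta_{ta}^F (\mathrm{time} \times \mathrm{age})_{ij} + \beta_{tg}^F (\mathrm{time} \times \mathrm{gender})_{ij} + \beta_{tb}^F (\mathrm{time} \times \mathrm{BMI})_{ij}, \\
&\text{with } \begin{bmatrix} \mu_{0j}^F \\ \mu_{tj}^F \end{bmatrix} \sim N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Omega_0^F & \\ & \Omega_t^F \end{bmatrix} \right) \text{ and } e_{0ij}^F \sim N(0, \Omega_e^F).
\end{aligned}
\tag{9.17}
\]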
The computation of the PVE accounted for by the random time effect can be accordingly
obtained by
\[
\mathrm{PVE}_{\text{time random}} = \frac{(\Omega_e^E - \Omega_e^F) + (\Omega_0^E - \Omega_0^F)}{\Omega_{\text{total}}}. \tag{9.18}
\]
• Afterwards, the model with random effects for different sleep stages (expressing the
between-subject physiological variability associated with each sleep stage in random
part) is then expressed as
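\[
\begin{aligned}
\text{Model G:}\quad y_{ij} &= \beta_0^G + \mu_{0j}^G + \sum_s (\beta_s^G + \mu_{sj}^G) s_{ij} + (\beta_t^G + \mu_{tj}^G)\,\mathrm{time}_{ij} + \beta_c^G \hat{y}_j + e_{0ij}^G \\
&\quad + \beta_a^G \mathrm{age}_j + \beta_g^G \mathrm{gender}_j + \beta_b^G \mathrm{BMI}_j \\
&\quad + \beta_{ta}^G (\mathrm{time} \times \mathrm{age})_{ij} + \beta_{tg}^G (\mathrm{time} \times \mathrm{gender})_{ij} + \beta_{tb}^G (\mathrm{time} \times \mathrm{BMI})_{ij}, \\
&\text{with } \begin{bmatrix} \mu_{0j}^G \\ \mu_{tj}^G \\ \mu_{sj}^G \end{bmatrix} \sim N\!\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Omega_0^G & & \\ & \Omega_t^G & \\ & & \Omega_s^G \end{bmatrix} \right) \text{ and } e_{0ij}^G \sim N(0, \Omega_e^G).
\end{aligned}
\tag{9.19}
\]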
In this model, the random variance $\Omega_s^G$ not only explains variance in $\Omega_0^F$ and $\Omega_e^F$, but
also reflects some variance of the random time effect $\Omega_t^F$. Therefore, the proportion of
variance contained in $\Omega_s^G$ to the total variance is
\[
\mathrm{PVE}_{\text{betw subj random}} = \frac{(\Omega_e^F - \Omega_e^G) + (\Omega_0^F - \Omega_0^G) + (\Omega_t^F - \Omega_t^G)}{\Omega_{\text{total}}}. \tag{9.20}
\]
Then the PVE of the random time effect to the total variance should be corrected to
\[
\mathrm{PVE}_{\text{time random}} = \frac{(\Omega_e^E - \Omega_e^F) + (\Omega_0^E - \Omega_0^F) - (\Omega_t^F - \Omega_t^G)}{\Omega_{\text{total}}}. \tag{9.21}
\]
• Finally, the remaining residual variance is assumed to only associate with the physiolog-
ical variability within subjects and its proportion can be obtained such that
\[
\mathrm{PVE}_{\text{within subj random}} = \frac{\Omega_e^G}{\Omega_{\text{total}}}. \tag{9.22}
\]
Note that all these models are optimized by keeping only the variables whose coefficients differ
significantly from zero.
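As an illustration of how the PVE values follow from the variance estimates of successive nested models, the computation can be sketched in Python as follows. The variance components below are hypothetical and are not values estimated in this study; the helper names `icc` and `pve_step` are introduced only for this sketch.

```python
def icc(omega_0, omega_e):
    """Intra-group correlation: share of the total variance at the subject level (9.8)."""
    return omega_0 / (omega_0 + omega_e)


def pve_step(prev, curr, omega_total):
    """PVE of the effect added between two nested models, as in (9.10)-(9.16).

    prev and curr hold the variance components of the smaller and the larger
    model, with keys "0" (between-subject) and "e" (within-subject residual).
    """
    return ((prev["e"] - curr["e"]) + (prev["0"] - curr["0"])) / omega_total


# Hypothetical variance components, NOT values from this study.
model_a = {"0": 4.0, "e": 6.0}   # baseline model
model_b = {"0": 3.9, "e": 5.6}   # + fixed time effect
model_c = {"0": 3.1, "e": 5.6}   # + demographic variables

omega_total = model_a["0"] + model_a["e"]
rho = icc(model_a["0"], model_a["e"])
pve_time_fixed = pve_step(model_a, model_b, omega_total)
pve_demographic = pve_step(model_b, model_c, omega_total)

# The two-level form (1 - rho) * R1^2 + rho * R2^2 collapses to the same value.
r1_sq = (model_a["e"] - model_b["e"]) / model_a["e"]   # epoch level
r2_sq = (model_a["0"] - model_b["0"]) / model_a["0"]   # subject level
two_level = (1 - rho) * r1_sq + rho * r2_sq
```

In this toy example the demographic step leaves the epoch-level variance unchanged, so its PVE comes entirely from the reduction in between-subject variance.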
CHAPTER 10
Sleep stage classification with ECG and respiratory effort
This chapter is adapted from: P. Fonseca∗ , X. Long∗ , M. Radha, R. Haakma, R. M. Aarts, and J. Rolink.
Sleep stage classification with ECG and respiratory effort. Submitted. (∗ Joint first authorship)
Abstract – Automatic sleep stage classification with cardiorespiratory signals has attracted
increasing attention. In contrast to the traditional manual scoring based on polysomnography
(PSG), these signals can be measured using advanced unobtrusive techniques that are currently
available, which makes them promising for personal and continuous home sleep monitoring. This
chapter describes a methodology for classifying wake, rapid-eye-movement (REM) sleep, and
non-REM (NREM) light and deep sleep on a 30-s epoch basis. A total of 142 features were
extracted from electrocardiogram (ECG) and thoracic respiratory effort measured with respiratory
inductance plethysmography (RIP). To improve the quality of these features, subject-specific
Z-score normalization and spline smoothing were used to reduce between-subject and
within-subject variability. A modified sequential forward search (SFS) feature selection procedure
was applied, yielding 80 features while preventing the introduction of bias in the estimation of
cross-validation performance. Data from 48 healthy adults were used to validate our methods.
Using a linear discriminant classifier and a ten-fold cross-validation, we achieved a Cohen’s
Kappa coefficient of 0.49 and an accuracy of 69% in the classification of wake, REM, light,
and deep sleep. These values increased to Kappa = 0.56 and accuracy = 80% when the classifi-
cation problem was reduced to three classes, wake, REM sleep, and NREM sleep.
10.1 Introduction
Sleep is a state of reversible disconnection from the environment and plays an essential role in
the homeostatic regulation of body and mind. The limited consciousness during sleep makes it
one of the hardest lifestyle patterns to reflect upon. Historically this has not been a problem as
the regulation of sleep is rigorously synchronized through a biological circadian rhythm with
the external environment. Yet, in the modern industrialized society where we spend our lives
in artificial environments where lighting, heat and food are available at any moment, sleep
disturbances and disorders have reached epidemic levels [65]. People experience the symptoms
of disturbed sleep such as fatigue, increased impulsiveness and agitation, without the means to
link these issues to their sleeping patterns.
To ensure fitness of body and mind, individuals must be empowered with the ability to mon-
itor sleep easily in order to identify sleep-related problems and adjust their sleeping habits ac-
cordingly. Yet a problem with traditional sleep monitoring, known as polysomnography (PSG),
is that a wide array of potentially sleep-disturbing sensors must be applied to the body and their
measurements can only be interpreted by highly trained sleep technicians or scientists. The
traditional PSG is therefore rather unsuited for individual untrained use and will only introduce
more sleep disturbances when applied on a daily basis. This scenario makes apparent a need for
unobtrusive methods of sleep monitoring, preferably inexpensive and with no training required
to operate them. Cardiorespiratory monitoring can be unobtrusive and the data can be analyzed
by a computer, which makes this technology a promising candidate for personal, continuous
and unobtrusive sleep monitoring.
Cardiorespiratory sleep staging or sleep stage classification is often based on heart rate vari-
ability (HRV) calculated from electrocardiogram (ECG) and respiratory effort, often from res-
piratory inductance plethysmography (RIP). Usually cardiorespiratory information is combined
with body movements from an accelerometer to more accurately distinguish wake from sleep.
One of the earliest studies that presented a successful machine learning approach to cardiorespi-
ratory sleep stage classification with these modalities was done by Redmond et al. [248]. Using
a set of HRV features to model the autonomic nervous activity and a set of respiratory features
to model the parasympathetic tone, Redmond and colleagues showed the viability of a sleep
stage classifier that can generate a simplified hypnogram for an entire night indicating, for each
30-s segment, a sleep stage, classified as either wake, rapid-eye-movement (REM) sleep, or
non-REM (NREM) with no PSG (wake-REM-NREM or WRN classification for short). More
recent research has shown that it is possible to obtain the same cardiorespiratory information
from other sensors for sleep stage classification, such as from bed-mounted ballistocardiogram
[161, 303] or contactless radio frequency [85]. Although these studies focused on distinction
between wake, REM sleep, and NREM sleep (without separating NREM sleep in other sleep
stages) or between wake and sleep (merging REM and NREM sleep), these attempts promised
that cardiorespiratory methods could one day be completely unobtrusive.
In previous work [182] we proposed methods to simultaneously classify wake, REM sleep,
light sleep (NREM stage S1 and S2), and deep sleep or slow wave sleep (stage S3 and S4) us-
ing respiratory activity in order to estimate an overnight wake-REM-light-deep sleep (WRLD)
hypnogram. In comparison with WRN classification, achieving WRLD classification would
allow a more adequate assessment of sleep since, for example, deep sleep is regarded as an
indicator of brain memory consolidation and energy reservation [35, 285]. In that work, we also
reviewed the state-of-the-art in sleep stage classification with cardiac and/or respiratory activ-
ity. The methods presented there will be used to benchmark the method proposed in this work.
Since then, at least two additional approaches using cardiorespiratory features have been pro-
posed, which will also be compared with our work. For example, Willemen et al. [309] achieved
a significantly improved performance in cardiorespiratory sleep stage classification. However,
that study classified sleep stages on the basis of one-minute epochs while the standardized scoring
of sleep epochs is done on the basis of 30 s [136]. Comparing classification results with a
reference scoring thus involves merging the ground-truth scores of two successive epochs,
for which no official guidelines exist. Nevertheless, the performance reported sets this method
apart from the previous generations of published algorithms. Another cardiorespiratory-based
algorithm with comparable results has been proposed by Domingues et al. [94]. However,
this work only reports results on a three-class task (WRN classification) rather than the more
difficult four-class problem (WRLD classification).
In this chapter, a methodology is described for automatic sleep stage classification based
on machine learned models of the autonomic nervous system during sleep from ECG and RIP
signals. Compared to previous studies, our methodology includes novel features, new feature
post-processing methods, and a refined feature selection method which guarantees that no bias
is introduced in the validation of the algorithm while avoiding the use of a hold-out validation
set, all applied to both the three-class (WRN) and the four-class (WRLD) problem.
10.2 Materials and Methods

10.2.1 Data sets
The data set was the same as used in earlier work [182] and it comprised full single-night
polysomnographic (PSG) recordings of 48 subjects (27 females) acquired in the SIESTA project
[160]. All subjects were healthy sleepers with a Pittsburgh Sleep Quality Index [60] of less than
6 and had no regular sleep complaints nor earlier diagnosis of sleep disorders. The subjects had
an average age of 41.3 (±16.1) y at the time of the recording. Full subject demographics can
be found in our earlier work [182]. Sleep stages were scored by trained sleep technicians in six
classes according to the R&K rules [247]. In the scope of this study, S1 and S2 were merged in
a single L (light sleep) class and S3 and S4 were merged in a single D (deep sleep) class.
Each PSG recording comprised, besides the standard signals required for sleep scoring,
modified lead II ECG, and (thoracic) respiratory effort recorded with respiratory inductance
plethysmography (RIP). QRS complexes were detected and localized from ECG signals using
a combination of a Hamilton-Tompkins detector [123, 124] and a post-processing localization
algorithm [107]. Prior to feature extraction, RIP signals were filtered with a 10th order Butterworth
low-pass filter with a cut-off frequency of 0.6 Hz, after which the baseline was removed by
subtracting the median peak-to-trough amplitude [182].
We extracted a set of 142 features from cardiac and respiratory activity, and from cardiores-
piratory interaction (CRI) using a sliding window centered on each 30-s epoch, guaranteeing
sufficient data to capture the changes in autonomic activity [288].
10.2.2.1 Cardiac features

Considering cardiac activity, 86 cardiac features were computed from the QRS complexes de-
tected in the ECG signal. Time domain features, computed over nine epochs, include mean
heart rate, mean heartbeat interval (detrended and non-detrended), standard deviation (SD) of
heartbeat intervals, difference between maximal and minimal heartbeat intervals, root mean
square and SD of successive heartbeat interval differences, and percentage of successive heart-
beat intervals differing by >50 ms [249, 288]. We also computed the mean absolute difference
and different percentiles (at 10%, 25%, 50%, 75%, and 90%) of detrended and non-detrended
heart rates and heartbeat intervals [309, 315] as well as the mean, median, minimal, and max-
imal likelihood ratios of heart rates [32]. In the frequency domain, the features include the
logarithmic spectral powers in the very low frequency band (VLF) from 0.003 to 0.04 Hz, in
the low frequency band (LF) from 0.04 to 0.15 Hz, in the high frequency band (HF) from
0.15 to 0.4 Hz, and the LF-to-HF ratio [59], where the power spectral densities were estimated
over nine epochs. The spectral boundaries were adapted to the corresponding peak frequency,
yielding their boundary-adapted versions [179]. We also computed the maximum module and
phase of HF pole [197] and the maximal power in the HF band and its associated frequency
representing respiratory rate [249]. Features describing non-linear properties of heartbeat in-
tervals were quantified with detrended fluctuation analysis (DFA) over eleven epochs [148] and
its short-term (α1 ), long-term (α2 ), and all time scaling exponents [139, 224], progressive DFA
with non-overlapping segments of 64 heartbeats [289], windowed DFA over eleven epochs [3],
and multi-scale sample entropy over 17 epochs (length of 1 and 2 samples with scales of 1-10)
[75]. Approximate entropy of the symbolic binary sequence that encodes the increase or de-
crease in successive heartbeat intervals over nine epochs was also calculated [78]. In addition,
we propose new features based on a visibility graph (VG) and a difference VG (DVG) method
to characterize HRV time series in a two-dimensional complex network where samples are con-
nected as nodes in terms of certain criteria [169, 183]. The network-based features, computed
over seven epochs, comprise mean, SD, and slope of node degrees and number of nodes in
VG- and DVG-based networks with a small degree (≤ 3 for VG and ≤ 2 for DVG) and a large
degree (≥ 10 for VG and ≥ 8 for DVG), and assortativity coefficient in the VG-based network
[183, 272, 321].
10.2.2.2 Respiratory features

Concerning respiratory activity, 44 features were derived from RIP signals. In the time do-
main, we estimated the variance of respiratory signal, the respiratory frequency and its SD over
150, 210, and 270 s, the mean and SD of breath-by-breath correlation, and the SD in breath
length [249]. Our previous study [182] introduced respiratory amplitude features for sleep
stage classification, including the standardized mean, standardized median, and sample entropy
of respiratory peaks and troughs (indicating inhalation and exhalation breathing depth, respec-
tively), median peak-to-trough difference, median volume and flow rate for complete breath
cycle, inhalation, and exhalation, and inhalation-to-exhalation flow rate ratio. These features
were adopted in this work. Besides, we also computed the similarity between the peaks and
troughs by means of the envelope morphology using a dynamic time warping (DTW) metric
[37]. From the respiratory spectrum, the respiratory frequency and its power, the logarithm of
the spectral power in VLF (0.01-0.05 Hz), LF (0.05-0.15 Hz), and HF (0.15-0.5 Hz) bands,
and the LF-to-HF ratio were estimated [248]. Respiratory regularity was measured by means
of sample entropy over seven epochs [185, 250] and self-(dis)similarity based on DTW and
dynamic frequency warping (DFW) [180] and uniform scaling [185] were derived. The same
network analysis features as for HRV were also computed for breath-to-breath intervals.
10.2.2.3 Cardiorespiratory interaction features

Numerous studies have shown that the interaction between cardiac and respiratory activity
varies across sleep stages [77, 137, 183]. The power associated with respiratory-modulated
heartbeat intervals was quantified over windows of nine epochs [137]. In addition, we
extracted the VG- and DVG-based features for CRI [183]. These resulted in a total of 12 CRI
features in our feature set.
10.2.2.4 Feature post-processing

In order to reduce the impact of physiological differences and equipment-related variations from
subject to subject, the features of each subject were first Z-score normalized by subtracting
their mean and dividing by their SD. Further, it is known that the sleep pattern of healthy
adults progresses with several cycles throughout the night [63]. For example, REM and NREM
sleep alternate with 4-6 cycles of about 90-110 minutes with deep sleep usually dominating the
NREM periods during the first half of the night. This suggests that the autonomic physiological
response with its associated sleep stage is time-variant across the night for each subject. For
this reason, we were motivated to smooth each feature for each subject by means of a cubic
spline fitting method [84]. This is also expected to help reduce signal measurement noise and
variability within subjects for each sleep stage conveyed by the feature values. The latter can be
caused by body movements, conscious breathing control, internal physiological variations, or
other external factors such as changes in environmental noise and temperature during bedtime
sleep. Instead of other simpler low-pass filters, spline fitting was chosen since it can interpolate
feature values which could not be computed, for example due to motion artifacts (about 10%
observed in our data set), effectively allowing all epochs in each recording to be classified.
Let $v = \{v_1, v_2, \ldots, v_n\}$ represent a sequence of feature values at their corresponding time
(or epoch) indices $t = \{t_1, t_2, \ldots, t_n\}$ (in units of 30 s); then a relation between them can be modeled by
\[
v_i = h(t_i) + \varepsilon_i \quad (i = 1, 2, \ldots, n), \tag{10.1}
\]

where $h$ is a smoothing (spline) function and $\varepsilon_i$ are independent and identically distributed
residuals. The smoothing function can be estimated by minimizing the objective function (i.e.,
the penalized sum of squares) such that
\[
\hat{h} = \arg\min_h \left[ \sum_{i=1}^{n} \left[ v_i - h(t_i) \right]^2 + \lambda \int_{t_1}^{t_n} h''(t)^2 \, dt \right], \tag{10.2}
\]
where λ is a smoothing parameter that controls the trade-off between residual and local vari-
ation. The smoothing function can be expressed by cubic B-splines as basis functions and
determined via least squares approximation [84, 296].
For a specific overnight recording with a total of m epochs, it is divided in s continuous
segments (s = ⌈m/n⌉), designated as smoothing splines. Each segment can then be modeled by
the spline function, yielding a general spline fitting for the epochs over the entire recording. n
represents the smoothing window size where a larger n translates to a smoother fitting curve. In
this work, a window size of nine epochs for modeling splines was experimentally found to be
appropriate for the task of sleep stage classification.
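The two bookkeeping steps of this post-processing, subject-specific Z-score normalization in the presence of missing feature values and the division of a recording into $s = \lceil m/n \rceil$ segments, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: missing values are marked with `None`, the sample SD is used (the text does not specify which SD estimator was applied), the function names and feature values are hypothetical, and the spline fitting itself is not shown.

```python
import math
from statistics import mean, stdev


def zscore_per_subject(values):
    """Z-score one feature of one subject; None marks epochs without a value."""
    valid = [v for v in values if v is not None]
    mu, sd = mean(valid), stdev(valid)   # sample SD: an assumption
    return [None if v is None else (v - mu) / sd for v in values]


def split_into_segments(num_epochs, window):
    """Split epoch indices 0..m-1 into s = ceil(m / n) contiguous segments."""
    count = math.ceil(num_epochs / window)
    return [list(range(k * window, min((k + 1) * window, num_epochs)))
            for k in range(count)]


feature = [61.0, 63.0, None, 59.0, 57.0, 60.0]   # hypothetical values
normalized = zscore_per_subject(feature)          # the None entry is preserved
segments = split_into_segments(num_epochs=20, window=9)
```

Epochs left as `None` after normalization are the ones that the spline fit would later fill in by interpolation.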
10.2.3 Classifier
This work used a multi-class Bayesian linear discriminant with time-varying prior probabilities
[249], similar to that used in previous work [182]. For each epoch, the selected class (D, L, R,
or W) is the class $\omega_i$ that maximizes the posterior probability given a feature vector $\mathbf{x}$ [97],
\[
\omega_i(\mathbf{x}) = \arg\max_i \left[ g_i(\mathbf{x}) \right] \tag{10.3}
\]
with the discriminant function $g_i$ for each class given by
\[
g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i, t) \tag{10.4}
\]
where µi is the average feature vector for class i, Σ is the pooled covariance matrix for all
classes, and P (ωi ,t) is the prior probability for class i at time (since lights off) t. All parameters
are estimated during training.
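The effect of the time-varying prior in (10.4) can be illustrated with a small Python sketch. It assumes that the class means, the inverse of the pooled covariance matrix, and the priors at a given time have already been estimated; all numbers below are hypothetical and the function names are introduced only for this example.

```python
import math


def discriminant(x, mu, sigma_inv, prior):
    """g_i(x) = -1/2 (x - mu_i)^T Sigma^-1 (x - mu_i) + ln P(omega_i, t)."""
    d = [a - b for a, b in zip(x, mu)]
    quad = sum(d[r] * sigma_inv[r][c] * d[c]
               for r in range(len(d)) for c in range(len(d)))
    return -0.5 * quad + math.log(prior)


def classify(x, means, sigma_inv, priors_at_t):
    """Return the class label with the largest discriminant value."""
    return max(means,
               key=lambda c: discriminant(x, means[c], sigma_inv, priors_at_t[c]))


# Hypothetical two-feature example; none of these numbers come from this study.
means = {"W": [1.0, 1.0], "R": [0.5, -0.5], "L": [0.0, 0.0], "D": [-1.0, -1.0]}
sigma_inv = [[1.0, 0.0], [0.0, 1.0]]   # inverse of the pooled covariance matrix
priors_early = {"W": 0.10, "R": 0.05, "L": 0.40, "D": 0.45}
priors_late = {"W": 0.10, "R": 0.35, "L": 0.50, "D": 0.05}

x = [-0.5, -0.5]   # equidistant from the L and D class means
early = classify(x, means, sigma_inv, priors_early)
late = classify(x, means, sigma_inv, priors_late)
```

Because the feature vector is equally far from the L and D means, the decision is made by the prior alone: the same vector is labeled D with the early-night priors and L with the late-night priors.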
10.2.4 Feature selection
To select the final list of features we used a wrapper feature selection method based on sequen-
tial forward selection (SFS) [306] using as criterion the Cohen’s Kappa coefficient of agreement
κ [72] on the training set. This measure of agreement between the classification predictions and
the ground-truth annotations is more adequate than traditional measures of accuracy for this
problem since there is a strong imbalance between classes (L epochs, for example, account for
more than 50% of all epochs in the data set) and this coefficient factors out chance agreement,
compensating for class imbalance.
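To make the argument about class imbalance concrete, the following Python sketch computes Cohen's Kappa with its standard definition, $\kappa = (p_o - p_e)/(1 - p_e)$, for a toy set of labels. The labels are hypothetical and only serve to show that a classifier which always predicts the majority class obtains a non-trivial accuracy but a Kappa of zero.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two label sequences."""
    n = len(reference)
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Chance agreement from the marginal label frequencies of both raters.
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical, imbalanced toy labels in which L dominates.
reference = list("LLLLLLDDRW")
always_light = ["L"] * len(reference)

accuracy = sum(r == p for r, p in zip(reference, always_light)) / len(reference)
kappa = cohen_kappa(reference, always_light)
```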
In many machine learning studies supervised feature selection is often applied on the en-
tire data set, even if the training and validation are kept separate (for example using cross-
validation). This common pitfall is known to introduce a bias in the evaluation of a classifier’s
performance, which will often be overestimated [278]. Although keeping a hold-out set for
validation would solve this problem, the limited size of the data set would either mean that
the model learning would be based on potentially insufficient examples, or that the classi-
fier would be evaluated on a very small sample, potentially unrepresentative of the problem
at hand. Instead, feature selection was carried out in an iterative procedure akin to
cross-validation that strictly separates the training and validation sets, as
follows:
1. Randomly divide all subjects in the data set amongst ten folds of the same size
2. For each fold-iteration i = 1, .., 10,
(a) Hold out fold i as a validation set and combine the subjects in the remaining folds
to form a training set
(b) Perform an iterative SFS procedure for the total number of available features N =
142, i.e., for each SFS-iteration j = 1, .., N,
i. Select which feature $f_{ij}$ should be added to the set of features selected in the
previous iteration (an empty set when j = 1),
\[
f_{ij} = \arg\max_k (\kappa_{ik}) \quad \forall k : f_{ik} \notin F_{i,j-1} \tag{10.5}
\]
where $\kappa_{ik}$ is the Kappa coefficient of agreement obtained after training and
classification on the training set using the set of features
\[
F_{i,j-1} \cup f_{ik} = \{ f_{i1}, \ldots, f_{i,j-1}, f_{ik} \} \tag{10.6}
\]
and $F_{i,j-1}$ is the set of features selected in the previous iteration of SFS.
ii. Store the set of features selected up to this iteration,
\[
F_{ij} = \{ f_{i1}, \ldots, f_{i,j-1}, f_{ij} \} \tag{10.7}
\]
and the Kappa coefficient obtained with that set of features, $\kappa_{ij}$.
After the sets of features for all fold- and SFS-iterations and corresponding Kappa coefficients
are computed, the final consolidated list is obtained:
1. For a varying number of features j = 1, .., N, calculate the average Kappa $\bar{\kappa}$ across the
ten iterations, $\bar{\kappa}_j$,
\[
\bar{\kappa} = \{ \bar{\kappa}_j : \forall j = 1, .., N \} \tag{10.8}
\]
with
\[
\bar{\kappa}_j = \frac{\sum_{i=1}^{10} \kappa_{ij}}{10} \tag{10.9}
\]
2. Calculate the smallest number of features S that yields a certain percentage P of the maximum
average Kappa such that the Kappa values per fold-iteration are not significantly
different from those which give the maximum average Kappa,
\[
S = \{ j \mid \forall k \neq j : \kappa_j \geq P \cdot \max(\bar{\kappa}) \wedge \kappa_k \geq \kappa_j \} \tag{10.10}
\]
3. For each feature l, count the number of iterations in which that feature is selected, $fc_l$, when
limiting the set of features on each iteration to S,
\[
fc_l = \sum_{i=1}^{10} fc_{il} \tag{10.11}
\]
where $fc_{il}$ indicates whether feature l is present in the set of selected features for
fold-iteration i and SFS-iteration S,
\[
fc_{il} = \begin{cases} 1, & f_l \in F_{iS} \\ 0, & \text{otherwise} \end{cases} \tag{10.12}
\]
4. Pick the S features with the largest feature count $fc_l$ to assemble the final set of consolidated
features, $F_S$.
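The selection and consolidation steps above can be sketched in Python as follows. This is a simplified illustration rather than the implementation used in this work: the scoring function is a hypothetical stand-in for "train the classifier and compute Kappa on the training set of fold i", only three folds and four features are used, and the significance test on the per-fold Kappa values in step 2 is omitted, so that only the percentage criterion P is applied.

```python
from collections import Counter


def sequential_forward_selection(candidates, score):
    """Greedy SFS: at each step add the feature that maximizes score(subset).

    Returns the features in selection order and the score after each step.
    """
    selected, scores = [], []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        scores.append(score(selected))
    return selected, scores


def smallest_sufficient_size(avg_scores, fraction):
    """Smallest number of features reaching `fraction` of the best average score."""
    target = fraction * max(avg_scores)
    return next(j + 1 for j, s in enumerate(avg_scores) if s >= target)


def consolidate(orders, size):
    """Keep the `size` features found most often among each fold's first `size` picks."""
    counts = Counter(f for order in orders for f in order[:size])
    return [f for f, _ in counts.most_common(size)]


# Hypothetical per-fold feature weights; the score of a subset is their sum.
fold_weights = [
    {"a": 0.30, "b": 0.20, "c": 0.05, "d": 0.01},
    {"a": 0.28, "b": 0.06, "c": 0.22, "d": 0.01},
    {"a": 0.31, "b": 0.19, "c": 0.04, "d": 0.02},
]
runs = [sequential_forward_selection(w, lambda s, w=w: sum(w[f] for f in s))
        for w in fold_weights]
orders = [order for order, _ in runs]
num_features = len(fold_weights[0])
avg_scores = [sum(scores[j] for _, scores in runs) / len(runs)
              for j in range(num_features)]
size = smallest_sufficient_size(avg_scores, fraction=0.85)
final_features = consolidate(orders, size)
```

Each fold runs an unbounded search over all features, and the truncation to S features is applied only afterwards, which is what allows the feature counts to be compared across folds.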
The discriminative power of selected features was evaluated with the absolute standardized
mean distance (ASMD) between the feature values of two classes, computed as

\[
\mathrm{ASMD} = \frac{\left| \bar{x}_1 - \bar{x}_2 \right|}{\sigma} \tag{10.13}
\]
where x̄1 and x̄2 are the sample means for class 1 and 2 and σ is the pooled sample SD.
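A minimal Python sketch of this measure is given below. It assumes the usual definition of the pooled sample SD, in which the two sample variances are weighted by their degrees of freedom; the feature values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def asmd(class_1, class_2):
    """Absolute standardized mean distance between two classes of feature values."""
    n1, n2 = len(class_1), len(class_2)
    # Pooled sample variance, weighted by degrees of freedom (an assumption).
    pooled_var = (((n1 - 1) * variance(class_1) + (n2 - 1) * variance(class_2))
                  / (n1 + n2 - 2))
    return abs(mean(class_1) - mean(class_2)) / sqrt(pooled_var)


deep = [0.8, 1.0, 1.2, 1.0]     # hypothetical feature values during deep sleep
wake = [-0.2, 0.0, 0.2, 0.0]    # hypothetical feature values during wake
distance = asmd(deep, wake)
```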
10.2.5 Validation and evaluation
After feature selection was performed and the set of features FS was chosen, the classification results
per subject were evaluated with a ten-fold cross-validation procedure, using the same folds as
in the feature selection procedure:
1. For each iteration i = 1, .., 10,
(a) Hold out fold i as a validation set and combine the subjects in the remaining folds
to form a training set
(b) Restrict the feature set in the training set to the set FS
(c) Train an LD classifier with the training data
(d) For each subject in the validation set
i. Use the classifier trained in this iteration to classify each epoch of the current
subject
ii. Calculate the Kappa coefficient of agreement between the classification results
and the ground-truth annotations for this subject.
After computing the Kappa coefficient for all subjects in the data set, the average and pooled
performance was calculated.
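The subject-wise division into folds that underlies both the feature selection and the validation procedure can be sketched in Python as follows. The function name and the random seed are assumptions made for this example; since 48 subjects cannot be divided into ten folds of exactly the same size, the sketch produces folds of four or five subjects.

```python
import random


def subject_folds(subject_ids, num_folds=10, seed=0):
    """Randomly assign whole subjects to folds of (nearly) equal size."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::num_folds] for k in range(num_folds)]


folds = subject_folds(range(48))   # 48 subjects, as in the data set
for held_out in folds:
    training = [s for fold in folds if fold is not held_out for s in fold]
    # No subject may appear on both sides of the split.
    assert not set(held_out) & set(training)
```

Splitting by subject rather than by epoch keeps all epochs of a recording on the same side of the split, so the classifier is always evaluated on subjects it has not seen during training.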
As mentioned, the (Cohen’s) Kappa coefficient κ is an adequate and well-accepted metric
for evaluating the agreement between sleep technician and computer-based classification since
it compensates for the random agreement that can occur due to class imbalance. In addition
to the Kappa coefficient, we also computed the traditional metric overall accuracy, i.e., the
percentage of correctly identified epochs. For these two metrics, the results are computed both
after pooling the predictions over all epochs of all subjects and after averaging the performance
for each subject.
10.3 Results and discussion

10.3.1 Feature selection
Figure 10.1 indicates the Kappa coefficient obtained for each training set, for a varying number
of features. As illustrated, the maximum average training performance is obtained for 105
features, with an average Kappa of 0.58. Also clear in the figure is a plateau in performance
between 70 and 100 features. This suggests that the number of features can be greatly decreased
without affecting the training performance. A small feature set is often desirable to prevent
over-fitting to the training data, as long as it is not so small that the model cannot learn the
characteristics of the problem.
Figure 10.2 illustrates the decrease in average training performance associated with a de-
crease in the number of features when choosing different operating points in Figure 10.1 (ex-
pressed in the scatter plot as percentages of the maximum training performance). As it can be
clearly observed in the figure, by allowing a reduction of 0.5% in the training performance,
the number of features can be reduced by 16.2% to a total number of 88 features without a
statistically significant decrease in performance. Allowing a further decrease of 0.5%, the total
number of features is reduced by 23.8% to a total of 80 features, also without a statistically
significant decrease in performance. From this point on, the performance reduction is signifi-
cant and reducing further the number of features will likely lead to a decrease in classification
performance after cross-validation. Using as criterion the smallest number of features that does
not significantly decrease the training performance, a total of S = 80 features was chosen.
Figure 10.3 illustrates the feature count given by (10.11) for each of the 142 features, using
S = 80. A total of 14 features were selected in all 10 iterations of the selection process, while
95 features were selected in at least 50% of the iterations. This means that after ranking the
features by their feature count and selecting the 80 features with the highest count, all features
in the final list of selected features were selected in at least 5 of the 10 iterations (with a mean
count of 7.67). This illustrates the robustness of the modified SFS method described earlier:
despite their simplicity and computational efficiency, sequential selection algorithms are known
to suffer from a so-called ‘nesting effect’, potentially leading to sub-optimal feature sets [238].
By iteratively performing several unbound SFS searches on different training sets and keeping
only the features that are selected most often, this effect should be reduced, as attested by the
0.7
(105, 0.58)
Training performance (κ)
0.6
0.5
0.4
0.3 Avg. performance across all folds

Maximum performance
0.2
0 50 100 150
Number of features (-)
Figure 10.1: Training performance per fold and average training performance. The maximum maximum
performance is indicated with a marker.
[Figure 10.2: plot of the reduction in training performance (%, 0 to 5) against the reduction in
the number of features (%, 0 to 45). Labeled points, given as the number of features in parentheses
followed by the percentage label and significance: (105) 100.0%, (88) 99.5%, (80) 99.0%,
(75) 98.5%*, (71) 98.0%*, (68) 97.5%*, (66) 97.0%**, (64) 96.5%**, (61) 96.0%**, (60) 95.5%**,
(58) 95.0%**.]
Figure 10.2: Reduction in training performance caused by a reduction in the number of features. For
each point, the number of features (in parentheses) and the corresponding percentage compared to the
total number of features are indicated. Significance of difference between performance with and without
feature reduction was tested with a Wilcoxon signed-rank test (∗ p < 0.05, ∗∗ p < 0.01).
large average number of iterations in which each feature in the final set was selected.
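The counting and ranking step can be sketched as follows. This is an illustration only: the feature count is assumed here to be the number of iterations in which a feature was selected (the exact definition is the one given by (10.11)), and the function and variable names are hypothetical.

```python
from collections import Counter


def rank_by_feature_count(selected_per_iteration, n_keep):
    """Keep the n_keep features that were selected in the most iterations.

    selected_per_iteration is a list with one collection of feature indices
    per iteration of the selection process. Ties in the count are broken by
    the lower feature index, an arbitrary choice made for this sketch.
    """
    counts = Counter()
    for selected in selected_per_iteration:
        counts.update(set(selected))  # count each feature once per iteration
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    kept = ranked[:n_keep]
    return kept, {f: counts[f] for f in kept}


# Toy example with 3 iterations and 5 candidate features (indices 0 to 4).
iterations = [[0, 1, 2], [0, 2, 3], [0, 2, 4]]
kept, kept_counts = rank_by_feature_count(iterations, n_keep=2)
```

In the toy example, features 0 and 2 are selected in all three iterations and are therefore the two that are kept.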
For brevity, only the 14 features selected in all iterations will be discussed further. Table 10.1
indicates the discriminative power of each feature using the pooled ASMD, computed for each
pair of classes after aggregating the feature values for all subjects, together with the 90th
percentile of the ASMD (in parentheses) obtained for each feature across all individual subjects.
Pooled ASMD values below 0.5 and 90th percentile ASMD values below 1 were omitted.
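The two quantities reported in Table 10.1 can be sketched as follows. The exact ASMD definition is given elsewhere in the thesis; the form used here (absolute difference of the class means divided by the root of the average class variance) and the linear-interpolation percentile are assumptions made for illustration, and all names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def asmd(x, y):
    """Absolute standardized mean difference between two samples.

    Assumed form: |mean(x) - mean(y)| / sqrt((var(x) + var(y)) / 2).
    """
    pooled_sd = sqrt((variance(x) + variance(y)) / 2.0)
    return abs(mean(x) - mean(y)) / pooled_sd


def percentile(values, q):
    """q-th percentile (0 to 100), interpolating linearly between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def pooled_and_p90_asmd(per_subject):
    """per_subject holds one (values_class_a, values_class_b) pair per subject."""
    pooled_a = [v for a, _ in per_subject for v in a]
    pooled_b = [v for _, b in per_subject for v in b]
    individual = [asmd(a, b) for a, b in per_subject]
    return asmd(pooled_a, pooled_b), percentile(individual, 90)


# Toy data: the feature separates the classes for subject 1 but not for subject 2.
subjects = [([0.0, 1.0, 2.0], [4.0, 5.0, 6.0]),
            ([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])]
pooled, p90 = pooled_and_p90_asmd(subjects)
```

In this toy case the 90th percentile of the individual values is much larger than the pooled value, which is the pattern discussed for features that are discriminative for only part of the subjects.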
The top features are clearly discriminative for different pairs of classes, which helps explain
the relatively large number of features selected. Additionally, it is interesting to observe that
there is one feature (median likelihood ratio) which does not have a pooled ASMD above 0.5 for
any class pair. However, its 90th percentile ASMD value is larger than 1 for the pairs D/W and
L/W. This is a good example of a feature which is discriminative for only a subset of the subjects
[Figure 10.3: plot of the feature count (0 to 10) against the feature index (1 to 142).]
Figure 10.3: Feature count indicating, per feature, in how many iterations it was selected when the
number of features was limited to S = 80.
(at least 10%) but not for all subjects. The fact that it was selected in every single iteration
using the wrapper method described in Section 10.2.4 suggests that it is complementary to
other chosen features for certain subjects, helping raise the overall training performance.
10.3.2 Cross-validation
Table 10.2 indicates the classification performance obtained after 10-fold cross-validation using
the selected set of 80 features. In addition, it indicates the performance per class, obtained by
considering each class as the positive class and merging the remaining classes into a single
negative class. The highest performance is obtained for R detection, followed by W. The lowest
performance is obtained for L. This is further confirmed by the confusion matrix of Table 10.3,
which shows that the largest proportion of errors occurs when trying to distinguish L from the
other classes. For confusions between any two classes other than L, the percentage of
misclassified epochs (relative to the total number of epochs) is below 1%.
In order to evaluate the performance of the classifier in a three-class task (WRN), classes D
and L were merged into a single N (non-REM) class. Table 10.2 indicates the resulting
performance. The classification performance rises substantially, to a Kappa of 0.56 and an
accuracy of 80%. This was expected since a large number of classification errors occurred
between D and L, and in a WRN task these two classes no longer need to be distinguished.
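The pooled figures can be reproduced from the confusion matrix of Table 10.3. The sketch below computes the overall accuracy and Cohen’s Kappa from a confusion matrix and then merges D and L into N; the function names are hypothetical and the code is an illustration, not the implementation used in this work.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement expected from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1.0 - expected)


def merge_classes(matrix, groups):
    """Merge classes; groups lists the original class indices of each new class."""
    return [[sum(matrix[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]


# Table 10.3 (rows: predicted D, L, R, W; columns: reference D, L, R, W).
wrld = [[3431, 1949, 5, 97],
        [2969, 19165, 2947, 2302],
        [86, 2071, 5383, 404],
        [31, 952, 243, 2996]]
acc4, kappa4 = accuracy_and_kappa(wrld)
wrn = merge_classes(wrld, [[0, 1], [2], [3]])  # N = D + L, followed by R and W
acc3, kappa3 = accuracy_and_kappa(wrn)
```

Rounded to two decimals, this gives an accuracy of 0.69 and a Kappa of 0.49 for WRLD, and an accuracy of 0.80 and a Kappa of 0.56 for WRN, in line with the pooled values of Table 10.2.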
To evaluate whether the procedure used to determine the number of features during feature
selection was adequate, we plotted the average classification performance after cross-validation
obtained when the whole feature selection procedure from (10.11) onwards is used to select
feature sets of different sizes and cross-validation is repeated with the corresponding feature
sets (Figure 10.4).
As can be seen, the maximum cross-validation performance (κ = 0.50) is obtained with 76
features, only 5.3% fewer features than the 80 features chosen by the feature selection procedure,
Table 10.1: Pooled and individual 90th percentile ASMD values for features selected in all
iterations
Feature D/L D/R D/W L/R L/W R/W

◮ Respiratory features:
VLF spectral power 0.56 1.02 0.86 0.68
(1.34) (1.69) (1.52) (1.26)
LF/HF spectral power ratio 0.56 0.85 0.95 0.70
(1.36) (1.62) (1.10) (1.62) (1.30)
Frequency SD over 270 s 0.79 1.46 1.41 0.84 0.97
(1.20) (1.82) (1.87) (1.38) (1.67) (1.09)
Mean breath-by-breath correlation 0.59 1.03 0.82
(1.27) (1.78) (1.76) (1.61) (1.51) (1.46)
Sample entropy regularity 0.71 0.61 0.55 0.86
(1.67) (1.53) (1.41) (1.49) (1.65)
DTW self-dissimilarity 0.59 0.86 0.86 0.56
(1.62) (1.58) (1.68) (1.39)
Standardized mean of troughs 0.82 1.21 0.97 0.56
(1.41) (1.83) (1.85) (1.19) (1.18) (1.34)
DTW peak-to-trough similarity 0.55
(1.06) (1.34) (1.04) (1.38)
Uniform scaling self-dissimilarity 0.92 1.46 1.16 0.85 0.56
(1.47) (1.87) (1.85) (1.50) (1.47) (1.22)
◮ Cardiac (HRV) features:
Mean likelihood ratio 0.86
(1.50) (1.60) (1.09) (1.46) (1.23)
Median likelihood ratio
(1.19) (1.23)
Adapted LF spectral power 0.65 0.88 0.70
(1.47) (1.70) (1.59) (1.09) (1.15) (1.14)
Assortativity coefficient in VG 0.53
(1.34) (1.11) (1.22) (1.44)
Number small-degree nodes in VG 0.59
(1.05) (1.39) (1.32) (1.14) (1.13) (1.24)
The features are described in Section 10.2.4. The pooled ASMD was computed for each pair of
classes after aggregating the feature values for all subjects (values below 0.5 were omitted). The
90th ASMD percentiles (in parentheses) were obtained after computing the ASMD of each feature
for each subject (values below 1 were omitted).
but 38.2% fewer than the 105 features that give the maximum training performance. Furthermore,
the performance obtained with 80 features (κ = 0.49) is actually slightly higher than the
performance obtained with 105 features (κ = 0.48), confirming that the feature reduction procedure
Table 10.2: Cross-validation performance for 3 and 4 classes
Pooled Kappa Pooled Acc. Mean Kappa Mean Acc.

WRLD 0.49 0.69 0.49 ± 0.13 0.69 ± 0.08
D 0.51 0.89 0.50 ± 0.17 0.89 ± 0.04
L 0.40 0.71 0.41 ± 0.14 0.71 ± 0.07
R 0.57 0.87 0.58 ± 0.19 0.87 ± 0.08
W 0.54 0.91 0.51 ± 0.18 0.91 ± 0.04
WRN 0.56 0.80 0.56 ± 0.15 0.80 ± 0.08
The pooled performance was computed after aggregating all epochs of all
subjects. The mean and SD were calculated based on the performance for
each individual subject.
Table 10.3: Confusion matrix after cross-validation
Pred.↓ Ref.→ D L R W
D 3431 (7.6%) 1949 (4.3%) 5 (0.0%) 97 (0.2%)
L 2969 (6.6%) 19165 (42.6%) 2947 (6.5%) 2302 (5.1%)
R 86 (0.2%) 2071 (4.6%) 5383 (12.0%) 404 (0.9%)
W 31 (0.1%) 952 (2.1%) 243 (0.5%) 2996 (6.7%)
[Figure 10.4: plot of cross-validation performance (κ, 0.2 to 0.6) against the number of features
(0 to 140), showing the average performance and its standard deviation over all subjects, with
markers at the maximum performance (76, 0.50), the performance using feature selection (80, 0.49),
and the performance using the features that give maximum training performance (105, 0.48).]
Figure 10.4: Performance after cross-validation for a varying number of features with markers indicating
the maximum performance, and the performance with the number of features resulting from the feature
selection procedure and with the number of features that give the best training performance.
is beneficial to reduce over-fitting.

Figure 10.5 illustrates three examples of predicted hypnograms, as compared with the refer-
ence, for three subjects in the data set: the subject with the worst performance (with κ = 0.17),
with the median performance (with κ = 0.50) and with the best performance (with κ = 0.69).
A possible explanation for the poor performance obtained for the worst subject is that the model
[Figure 10.5: three pairs of hypnograms (PSG reference on top, prediction below; stages W, R, L,
and D against time since lights off) for the subjects with κ = 0.17, κ = 0.50, and κ = 0.68.]
Figure 10.5: Example of sleep stage reference (top) and predictions (bottom) for the subject with the
worst performance (left), with the median performance (middle) and with the best performance (right).
trained with the characteristics of the general sample population does not fully capture this sub-
ject’s cardiac and respiratory expression of different sleep stages. However, despite the low
Kappa coefficient, the predicted hypnogram still exhibits some correct features, namely, most
REM intervals were detected, albeit with the incorrect length, and the two deep sleep periods
were also detected. As the performance improves, we see that the predicted hypnograms better
match the characteristics of the reference hypnogram, and in the best case the most obvious
mistakes are in the missed detection of brief periods of wake during the night while the rest of
the sleep stages are correctly predicted. This is likely caused by the use of spline smoothing
during feature post-processing, which is adequate to capture the slow-changing characteristics
of most sleep stages, but penalizes short, abrupt changes such as brief periods of awakening.
10.3.3 Comparison with state-of-the-art
In the literature, only a few studies have focused on WRLD classification based on cardiac and/or
respiratory signals, and our results are amongst the best performing. The first observation is that
the results (κ = 0.41 and accuracy = 0.65) of our previous work [185], which used only
respiratory features, are worse than those produced in the present work, indicating that combining
cardiac and respiratory activity can lead to an improved classification performance. Isa et al.
[138] presented a Kappa coefficient κ of 0.26 (with an accuracy of 0.60) using only cardiac
features. The study of Hedner et al. [127] achieved similar results (κ = 0.48 and accuracy = 0.66),
but they used more signal modalities, including peripheral arterial tone, actigraphy, and pulse
oximetry. The recent study by Willemen et al. [309] also achieved a good performance, with a
κ of 0.56 and an accuracy of 0.69, although it was validated with a younger sample population
(age 22.1 ± 3.2 y), excluded 12% of the epochs from validation, and used a basis of 60-s epochs
instead of the standard scoring basis of 30 s, which makes the results not directly comparable.
For WRN classification with cardiac and/or respiratory activity, we see that, to the best of
our knowledge, our results also outperform those reported in almost all of the previous studies,
such as κ = 0.32 and accuracy = 0.67 [248], κ = 0.45 and accuracy = 0.76 [249], κ = 0.42
and accuracy = 0.72 [198], κ = 0.44 and accuracy = 0.79 [161], κ = 0.55 and accuracy = 0.77
[200], κ = 0.48 and accuracy = 0.78 [167], κ = 0.46 and accuracy = 0.73 [312], κ = 0.62 and
accuracy = 0.81 [309], and κ = 0.58 and accuracy = 0.78 [94]. In comparison with one of the
best performing studies [94], we obtain a higher accuracy (albeit a slightly smaller Kappa) but
require one fewer modality (actigraphy). Regarding the work of Willemen et al. [309], it is again
important to note that the results in that study were obtained on the basis of 60-s epochs.
10.4 Conclusion
This chapter presents a method to identify overnight sleep stages using cardiorespiratory
features extracted from ECG and RIP signals. These features were post-processed by means of
subject-specific Z-score normalization and spline smoothing, which help reduce the influence
of signal noise and of between-subject and within-subject variability in autonomic physiology.
Eighty features were selected from a set of 142 features using a modified SFS-based feature
selector designed to avoid biasing validation performance. Using a linear discriminant classifier
in a ten-fold cross-validation procedure, the classification results (for both the four-class WRLD
and three-class WRN classification tasks) achieved in this work outperform those of most
previous studies.
CHAPTER 11
General discussion and future perspectives
11.1 Analysis of features

As mentioned in Chapter 1, to improve sleep stage classification, one of the aims of this thesis
was to extract new features that capture cardiorespiratory characteristics in addition to those of
the existing features and that are robust to the variability between or within subjects.
Table 11.1 lists all the 142 features, including 86 cardiac, 44 respiratory, and 12
cardiorespiratory interaction (CRI) features previously used for sleep stage classification. Among
those features, 53 features (15 cardiac, 27 respiratory, and 11 CRI features) were newly proposed
in this thesis. The feasibility of all these new features in enhancing sleep stage classification
has been shown in previous chapters. The spectral features with adaptive boundaries (C72-C75)
were presented in Chapter 2, and two novel self-similarity respiratory features measured by
means of dynamic time and frequency warping (R16 and R17) were presented in Chapter 3.
These features were shown to help identify sleep/wake states, especially when actigraphy
was absent. A different similarity metric, uniform scaling, was used to extract a respiratory
self-dissimilarity feature (R33) as described in Chapter 5, which was beneficial for classifying
sleep stages, in particular for separating deep sleep or slow wave sleep (SWS) from the other
stages. In Chapter 4, a set of respiratory features (R18-R32) was derived from the respiratory
signal envelopes and the area under the curves, expressing the breathing depths and volumes,
respectively. These features were superior in identifying deeper sleep stages. Chapter 6
discussed a visibility graph (VG) model that can potentially be used to extract novel features
regarding CRI properties in complex networks. In Chapter 10, besides the VG-based CRI
features (X2-X12), the model also allowed extracting VG-based features from heartbeat intervals
(C76-C86) and breath-to-breath intervals (R34-R44). Some of them were automatically selected
by the feature selection procedure described in that chapter, contributing to better sleep stage
classification results.
To provide a more detailed comparison between the new features and the existing features,
Figure 11.1 illustrates their discriminative power in separating each pair of classes (sleep stages),
including wake versus REM sleep (W/R), wake versus light sleep (W/L), wake versus deep
sleep (W/D), REM sleep versus light sleep (R/L), REM sleep versus deep sleep (R/D), and
light sleep versus deep sleep (L/D). The discriminative power was measured by the absolute
standardized mean difference (ASMD) metric, computed by pooling over all 30-s epochs. Note that
the feature values were post-processed per night with Z-score normalization and spline smoothing
as described previously. The data set used here was the same as that used in Chapters 4,
5, and 10, comprising single-night polysomnographic (PSG) recordings acquired from 48 healthy
subjects with normal overnight sleep architectures. The figure shows that many new features
proposed in this thesis had a relatively high discriminative power, indicating that they
were effective in helping classify sleep stages. Further, we observe that the feature ranking is
not consistent across different pairs of classes. For example, the cardiorespiratory
interaction VG-based features X2-X12 generally performed better in identifying wake epochs
when compared with the other new features, while the respiratory amplitude features R18-R32
and the dissimilarity feature R33 ranked higher for discriminating deep sleep from the other
sleep stages. This motivates us to investigate stage-specific features for identifying different sleep
Table 11.1: A list of features used in this thesis
Feature index Description References

◮ Existing cardiac features
C1-C3 Mean HR, mean RR, and detrended mean RR [248, 288]
C4-C8 SDNN, RR range, pNN50, RMSSD, and SDSD [288]
C9-C12 RR logarithmic VLF, LF, and HF power and LF-to-HF ratio [59, 288]
C13-C16 RR mean resp frequency and power, max phase and module in HF pole [197]
C17-C36 RR SampEn regularity at length 1 scale 1-10 and length 2 scale 1-10 [75]
C37-C42 RR DFA, its short, long exponents and all scales, PDFA, and WDFA [148, 224, 289]
C43-C46 Mean absolute difference in HR and RR and in detrended HR and RR [308, 315]
C47-C56 RR and HR percentiles (10%, 25%, 50%, 75%, and 90%) [308, 315]
C57-C66 Detrended RR and HR percentiles (10%, 25%, 50%, 75%, and 90%) [308, 315]
C67-C70 Mean, median, minimum, and maximum of RR likelihood ratios [32]
C71 RR SampEn regularity in symbolic binary changes [78]
⊲ New cardiac features
C72-C75 Adaptive logarithmic RR VLF, LF, and HF power and LF-to-HF ratio Chapter 2
C76-C81 Node degree mean, SD, and slope in VG and in DVG of RR Chapter 6, 10
C82-C85 Number of small- and large-degree nodes in VG and in DVG of RR Chapter 6, 10
C86 Assortativity mixing coefficient in VG of RR Chapter 6, 10
◮ Existing respiratory (resp) features
R1-R3 Resp frequency in time and frequency domain and its spectral power [248, 249]
R4-R7 Resp logarithmic VLF, LF, and HF power and LF-to-HF ratio [248]
R8-R10 Resp frequency SD over 150, 210, and 270 s [249]
R11-R13 Mean and SD of breath-by-breath correlations, SD of breath lengths [248]
R14-R15 Resp SampEn regularity and resp variance [249, 250]
⊲ New respiratory features
R16,R17 Resp dynamic time and frequency warping self-(dis)similarity Chapter 3
R18-R21 Standardized mean and median of resp peaks and troughs Chapter 4
R22,R23 SampEn regularity of resp peaks and troughs Chapter 4
R24,R25 Median peak-to-trough difference and dynamic time warping similarity Chapter 4
R26-R31 Median volume and flow rate of breaths, inhalations, and exhalations Chapter 4
R32 Ratio of inhalation-to-exhalation flow rate Chapter 4
R33 Resp uniform scaling self-dissimilarity Chapter 5
R34-R39 Node degree mean, SD, and slope in VG and in DVG of BB Chapter 6, 10
R40-R43 Number of small- and large-degree nodes in VG and in DVG of BB Chapter 6, 10
R44 Assortativity mixing coefficient in VG of BB Chapter 6, 10
◮ Existing cardiorespiratory interaction (CRI) features
X1 Co-power between RR and resp [137]
⊲ New cardiorespiratory interaction (CRI) features
X2-X7 Node degree mean, SD, and slope in VG and in DVG of CRI Chapter 6, 10
X8-X11 Number of small- and large-degree nodes in VG and in DVG of CRI Chapter 6, 10
X12 Assortativity mixing coefficient in VG of CRI Chapter 6, 10
HR, heart rate; RR, heartbeat interval; SD, standard deviation; SDNN, SD of RR; RR range, maximal-
to-minimal RR difference; pNN50, percentage of successive RR differences >50 ms; RMSSD, root mean
square of successive RR differences; SDSD, SD of successive RR differences; VLF, very low frequency;
LF, low frequency; HF, high frequency; SampEn, sample entropy; DFA, detrended fluctuation analysis;
PDFA, progressive DFA; WDFA, windowed DFA; VG, visibility graph; DVG, difference VG; BB, breath-
to-breath interval.
[Figure 11.1: six panels (W/R, W/L, W/D, R/L, R/D, and L/D) showing the ASMD per feature index,
grouped as C1-C71, C72-C86, R1-R15, R16-R44, X1, and X2-X12, and distinguishing existing
features from new features.]
Figure 11.1: Discriminative power as measured by ASMD of all the 142 features with post-processing
(Z-score and/or spline smoothing) in separating each two sleep stages.
stages to further improve the classification performance in future work.

When classifying multiple sleep stages simultaneously [wake, REM sleep, light sleep, and
deep sleep (W/R/L/D), and wake, REM sleep, and NREM sleep (W/R/N)], the feature
discriminative power can be quantified by the one-way analysis of variance (ANOVA) F-statistic
(Figure 11.2). It also indicates that several new features outperformed many existing
features. Chapters 4, 5, 8, and 10 have shown that post-processing with (subject- or
night-specific) Z-score normalization and spline smoothing can improve the features and thereby
enhance the sleep stage classification performance. This is because these methods help
reduce the between- or the within-subject effect to a certain extent. Additionally,
the use of spline smoothing instead of other low-pass filtering methods also helps interpolate
the missing feature values, which constituted an average of about 10% of the total number of
epochs per night for some features. These missing values were possibly caused by, e.g., an
insufficient number of detected heartbeats or respiratory peaks/troughs due to the presence of
body motion artifacts. Figure 11.2 also illustrates the discriminative power (ANOVA F-statistic)
for all features (1) without post-processing, (2) with Z-score normalization, and (3) with both
Z-score normalization and spline smoothing. It shows that both methods can yield a clear increase
in the ANOVA F-statistic for most of the features. However, we also see that, for some features,
the smoothing resulted in a decreased discriminative power. For example, the feature
respiratory dynamic time warping self-(dis)similarity (R16) was regarded as an indicator of body
[Figure 11.2: two panels (W/R/L/D and W/R/N) showing the ANOVA F-statistic per feature index,
grouped as C1-C71, C72-C86, R1-R15, R16-R44, X1, and X2-X12, for three conditions: without
post-processing, with Z-score, and with Z-score and smoothing.]
Figure 11.2: Discriminative power as measured by ANOVA F-statistic of all the 142 features with and
without post-processing (Z-score and/or spline smoothing) for W/R/L/D and W/R/N separation.
movements that has been successfully used to identify wake epochs. Its feature values had a
strongly skewed distribution, where the body movement information was usually reflected by
the high-frequency components in the spectral domain. Smoothing this feature would filter out
the useful body movement information, leading to a deteriorated discriminative power. There-
fore, we think that the post-processing methods should be ‘feature-dependent’. In other words,
it is worthwhile to investigate a criterion that can be used to determine if a feature needs to be
post-processed or not. For example, this criterion can be linked to the distribution of a specific
feature.
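The effect of normalization on the ANOVA F-statistic can be illustrated with a small sketch. The data below are invented toy values for two subjects with different baselines, the Z-score uses the population standard deviation (an assumption), and the function names are hypothetical; spline smoothing is not included.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalization of one feature for one subject (or night)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def anova_f(groups):
    """One-way ANOVA F-statistic; groups holds the feature values of each class."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: per subject, the first two epochs are wake and the last two are sleep.
subject_a = [10.0, 11.0, 12.0, 13.0]
subject_b = [20.0, 21.0, 22.0, 23.0]

# Pooling the raw values mixes the class effect with the subject baselines.
f_raw = anova_f([subject_a[:2] + subject_b[:2], subject_a[2:] + subject_b[2:]])

# Normalizing per subject before pooling removes the baseline difference.
za, zb = zscore(subject_a), zscore(subject_b)
f_z = anova_f([za[:2] + zb[:2], za[2:] + zb[2:]])
```

For these toy values the F-statistic rises from about 0.24 to 24 after subject-specific normalization, because the between-subject offset no longer inflates the within-class variance.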
Features with a high discriminative power are likely to be significantly correlated with one
another, sharing mutual information for sleep stage classification. To give a general view of
feature-to-feature correlations, Figure 11.3 plots the Spearman’s rank correlation coefficients
between all the 142 features. The features respiratory frequency SD over 150, 210, and 270 s (R8-R10)
[Figure 11.3: matrix of correlation coefficients between features, with both axes showing the
feature index grouped as C1-C71, C72-C86, R1-R15, R16-R44, X1, and X2-X12, and a scale from 0.1 to 0.9.]
Figure 11.3: Correlation coefficients (Spearman’s rank) between features.
are typical examples. These three features had the highest discriminative power in general but,
apparently, they are strongly correlated. On the other hand, some lower-ranked features could
still contribute to the classification if they contained additional physiological information that
was not captured by the top-ranked features. Therefore, feature selection should take both the
feature discriminative power and the correlation between features into account. For example,
for the binary-class problem, the correlation-based feature selector (CFS) has been successfully
used for deep sleep detection (Chapter 8), where only six features were selected without
loss in final classification performance. However, CFS was considered inapplicable to
the multiple-class problem, since the changes in different features (reflecting certain aspects of
physiology) across sleep stages were not linear and were not always consistent. A supervised
sequential forward search (SFS) feature selection algorithm was described in Chapter 10,
in which some features with a low discriminative power were nevertheless selected.
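The feature-to-feature correlation discussed above can be computed as sketched below. Spearman’s rank correlation is obtained here as the Pearson correlation of the ranks, with tied values sharing their average rank; the function names are hypothetical and the data are toy values.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # average of positions start..end, 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


# Two toy features that are monotonically related are perfectly rank-correlated.
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0])
```

A coefficient close to 1 in absolute value, as for the toy pair above, indicates that two features carry largely redundant information.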
In Chapter 9, it was demonstrated that the physiological variations within subjects and from
subject to subject are the main barrier to achieving reliable sleep stage classification results,
and a multilevel modeling method was proposed to evaluate features by quantifying
the amount of those variations. As discussed in the associated chapters, the new features
were expected either to carry additional physiological information or to be robust to
between-/within-subject variability. Additionally, the feature post-processing
methods (normalization and smoothing) were assumed to diminish the variations conveyed by
the features. However, what exactly drove the contribution of those new features and
post-processing methods to the improved sleep stage classification results was not thoroughly explored.
[Figure 11.4: scale of agreement levels for Cohen’s Kappa: <0, no agreement; 0 - 0.20, slight
agreement; 0.21 - 0.40, fair agreement; 0.41 - 0.60, moderate agreement; 0.61 - 0.80, substantial
agreement; 0.81 - 1, almost perfect agreement.]
Figure 11.4: Indication of agreement level of the Cohen’s Kappa coefficient [172].
Further research is required at this point using the method proposed in Chapter 9, which will
help understand the new features and the post-processing methods, thus inspiring us to, for ex-
ample, develop adaptive post-processing/feature selection algorithms to optimize each feature
for identifying different sleep stages.
11.2 Sleep stage classification

Nocturnal sleep stage classification with body movements, cardiac, and/or respiratory activity
that can be unobtrusively acquired represents a novel frontier for quantitative sleep assessment.
Since the overall accuracy also depends on the class balance (here the percentages of sleep
stages), which usually differs between data sets, we primarily compare the Cohen’s
Kappa coefficient, which is corrected for chance agreement. The Kappa values can be categorized
into different levels of agreement [172], as illustrated in Figure 11.4.
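The agreement levels of Figure 11.4 can be expressed as a simple lookup. In this sketch, values that fall between two printed ranges (for example 0.205) are assigned to the higher level, which is an assumption; the function name is hypothetical.

```python
def kappa_agreement_level(kappa):
    """Map a Cohen's Kappa value to the agreement levels of Figure 11.4."""
    if kappa < 0:
        return "no agreement"
    # Upper bound of each range, checked in increasing order.
    levels = [(0.20, "slight agreement"),
              (0.40, "fair agreement"),
              (0.60, "moderate agreement"),
              (0.80, "substantial agreement"),
              (1.00, "almost perfect agreement")]
    for upper, label in levels:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1")


level_wrld = kappa_agreement_level(0.49)  # WRLD result of Chapter 10
level_wrn = kappa_agreement_level(0.56)   # WRN result of Chapter 10
```

Both results of Chapter 10 fall in the range of moderate agreement.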
To benchmark the classification methods and results, Table 11.2 compares the best
classification results produced in our work with those reported in the literature. For each study,
the signal modalities (body movements, cardiac, and/or respiratory activity), subjects, number of
recordings, number of features, algorithms, and classification performance (accuracy and Kappa
coefficient) are presented. We included four classification tasks for comparison: four-stage
(WRLD) classification, three-stage (WRN) classification, sleep and wake (SW) classification,
and deep sleep (D) or SWS detection. In order to allow fair comparisons, only the
classification results in a subject-independent scheme (i.e., classifying sleep stages for an ‘unseen’
subject using the model trained on the data from other subjects) are shown in the table. This
scheme is more realistic with regard to home sleep monitoring than a subject-specific scheme
(i.e., classifying sleep stages for a subject using the model trained on the data from the same
subject), which requires at least a one-night stay in a sleep laboratory with PSG-based scoring.
In that case, the reported classification performances (usually much better than those estimated
with a subject-independent scheme [151, 248, 312]) would be over-optimistic.
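A subject-independent scheme requires that all epochs of a subject fall on the same side of the split. The sketch below illustrates one way to form such folds; the round-robin assignment of subjects to folds and the names used are assumptions made for illustration and do not describe the fold assignment used in this thesis.

```python
def subject_independent_folds(subject_ids, n_folds):
    """Yield (train_indices, test_indices) such that no subject is in both.

    subject_ids gives the subject of each epoch. Subjects are assigned to
    folds in round-robin order after sorting, an arbitrary choice.
    """
    subjects = sorted(set(subject_ids))
    for fold in range(n_folds):
        test_subjects = set(subjects[fold::n_folds])
        train = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
        test = [i for i, s in enumerate(subject_ids) if s in test_subjects]
        yield train, test


# Toy example: seven epochs from four subjects, split into two folds.
ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
folds = list(subject_independent_folds(ids, n_folds=2))
```

In a subject-specific scheme, by contrast, the epochs of one subject would be divided between the training and test sets.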
Table 11.2 indicates that the classification performance reported in earlier studies was
mostly poorer than that achieved in this thesis. However, it should be noted that comparing
the classification performance between studies is complicated by the variety of experimental
settings, such as data recording protocols (e.g., recording between lights ON and OFF or not),
number of subjects/recordings, subject group (e.g., healthy/unhealthy or age group), and signal
modalities, and by the variety of classification procedures, such as features, type of classification
algorithm, and validation method. The study by Willemen et al. [309] reported better
classification results than those obtained in this thesis in WRLD and WRN classification tasks, but they
Table 11.2: Studies on sleep stage classification with cardiorespiratory activity (and body movements)
Task First author, year Modalitya Record.b Epoch Algo.c Acc.d Kappad
WRLD Yilmaz, 2010 [315] ECG 8 H 30 s SVM 73%† n.a.
classification Isa, 2011 [138] ECG 16 O 30 s RF 60% 0.26
Hedner, 2011 [127] BM, PAT, PO 227 H/O 30 s zzzPAT 66% 0.48
Willemen, 2014 [309] BM, ECG, RE 85 H 60 s SVM 69% 0.56
This thesis (Chapter 10)∗ ECG 48 H 30 s LD 67% 0.46
RE 48 H 30 s LD 66% 0.44
ECG, RE 48 H 30 s LD 69% 0.49
WRN Redmond, 2006 [248] ECG, RE 37 O 30 s LD 67% 0.32
classification Redmond, 2007 [249] ECG, RE 31 H 30 s LD 76% 0.45
Mendez, 2010 [198] BCG 17 H 30 s KNN 72% 0.42
Kortelainen, 2010 [161] BCG 18 H 30 s HMM 79% 0.44
Kurihara, 2012 [167] BCG 20 H 30 s IR 78% 0.48
Xiao, 2013 [312] ECG 45 H 30 s RF 73% 0.46
Domingues, 2014 [94] BM, ECG, RE 24 H 30 s HMM 78%‡ 0.58‡
This thesis (Chapter 10)∗ ECG 48 H 30 s LD 78% 0.52
RE 48 H 30 s LD 77% 0.50
ECG, RE 48 H 30 s LD 80% 0.56
SW Redmond, 2007 [249] ECG, RE 31 H 30 s LD 89% 0.60
classification Karlen, 2009 [151] ECG, RE 6 H 30 s ANN 85% n.a.
Devot, 2010 [89] BM, ECG, RE 35 H/I 30 s LD 87% 0.62
Jung, 2013 [145] BCG 10 H 30 s TH 97%§ 0.83§
This thesis (Chapter 2) ECG 15 H 30 s LD 93% 0.48
BM, ECG 15 H 30 s LD 96% 0.64
This thesis (Chapter 3) RE 15 H 30 s LD 94% 0.59
BM, RE 15 H 30 s LD 96% 0.66
BM, ECG, RE 15 H 30 s LD 96% 0.67
D (SWS) Shinar, 2001 [273] ECG 34 H 30 s TH 80% n.a.
detection Choi, 2009 [68] BM, ECG 4 H 30 s BACT 92%§ 0.62§
Bsoul, 2010 [53] ECG 16 H/O 30 s SVM 83%¶ n.a.
Hedner, 2011 [127] BM, PAT 227 H/O 30 s zzzPAT 89%♯ 0.49♯
Ebrahimi, 2013 [99] ECG 30 H 30 s LD 80% n.a.
Long, 2014 [186] ECG 15 H 30 s LD 81% 0.42
This thesis (Chapter 8) ECG 257 H 30 s LD 88% 0.54
RE 257 H 30 s LD 88% 0.51
ECG, RE 257 H 30 s LD 89% 0.57
a BM, body movements; ECG, mostly heart rate variability (HRV) was used; RE, respiration; BCG,
ballistocardiogram (including BM, HRV, and/or RE); PAT, peripheral arterial tone; PO, pulse
oximetry (also pulse rate).
b H, healthy subjects; O, subjects with obstructive sleep apnea; I, insomniacs.
c SVM, support vector machine; RF, random forest; zzzPAT, the algorithm described in [131]; LD,
(Bayesian) linear discriminant; KNN, k-nearest neighbour; HMM, hidden Markov model; IR,
incidence ratio; ANN, artificial neural network; TH, thresholding; BACT, the algorithm described
in [68].
d Subject-independent classification results are presented.
∗ Results were either presented in the corresponding chapter or produced using the methods in that chapter.
† Cross-validation was used within each subject.
‡ Ambiguous epochs were rejected.
§ Training and test sets were mixed.
¶ Light sleep was disregarded.
♯ Results were re-computed from the reported confusion matrix.
used 60-s epochs rather than the clinically standard 30-s epochs, which made the classification
easier. Jung et al. [145] (for SW classification) and Choi et al. [68] validated their algorithms
without splitting training and test sets, which biases the reported classification performance.
The findings regarding the time delay between changes in autonomic and brain activity
during some sleep stage transitions have been described in Chapter 7 and utilized to help
detect deep sleep from all the other sleep stages (Chapter 8). Unfortunately, these findings
were not incorporated in the classification of multiple sleep stages in the methodology presented
in Chapter 10, although doing so is promising for further improving the classification
performance. It is important to note that the incorporation of time-delayed methods should depend
on the sleep stages, since some sleep stage transitions show no time delay between autonomic and
brain activity.
In addition to sleep stages, arousals also influence the changes in cardiorespiratory activity
during sleep [293], constraining the sleep stage classification. Hence, correcting for the influence
of arousals promises an improvement in cardiorespiratory-based sleep stage classification.
This merits further study.
To answer the research question of this thesis raised in Chapter 1, Figure 11.5 shows the
progressive increase in our sleep stage classification results achieved in different phases of
the four years of PhD work. It can be seen that more reliable sleep stage classification
(for healthy adults) with body movements, cardiac and/or respiratory activity has been
achieved. It is interesting to compare the performance of sleep stage classification (for wake,
REM sleep, light sleep, and deep sleep) using our cardiorespiratory-based approach with that
of an automatic PSG-based system. Here the agreement between automatic classification
and the standard manual scoring was used for comparison. With the validated Somnolyzer
[20], a Cohen’s Kappa coefficient of 0.80 and an accuracy of 85% were reached for classifying
the four stages (re-computed from the reported confusion matrix). These results are
comparable to the agreement between human raters [81], indicating that, unsurprisingly, the
PSG-based automatic sleep staging system far outperforms the cardiorespiratory-based approach
presented in this thesis. This implies that sleep stage classification with cardiorespiratory
activity is not yet suitable for clinical use. It remains promising, however, for home sleep
monitoring that aims to offer an understandable sleep assessment to healthy users.
Further research is nevertheless encouraged to improve cardiorespiratory-based sleep
classification, particularly in distinguishing between wake and REM sleep and between
light and deep sleep, which seem more difficult to separate than the other sleep stages (see
Chapter 10).
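The re-computation of accuracy and Cohen’s Kappa from a reported confusion matrix, as used
for the comparison above, can be sketched as follows. This is a minimal illustration and not
the evaluation code of this thesis; the function name and the example matrix are hypothetical
and do not reproduce any reported result.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of epochs whose reference stage is i
    and whose automatically assigned stage is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no epochs")

    # Observed agreement: fraction of epochs on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement from the row (reference) and column (predicted) totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical four-class matrix (wake, REM sleep, light sleep, deep sleep).
example_matrix = [
    [80, 5, 15, 0],
    [6, 70, 24, 0],
    [10, 12, 260, 18],
    [0, 0, 25, 75],
]
example_accuracy, example_kappa = accuracy_and_kappa(example_matrix)
```

Because Kappa discounts the agreement expected by chance, it is less sensitive than accuracy
to the dominance of light sleep epochs in a normal night.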
[Figure 11.5 plot: Cohen’s Kappa coefficient (vertical axis, 0.2 to 0.7) against year (horizontal
axis, 2011 to 2015) for SW classification, D detection, WRN classification, and WRLD
classification; points are labeled with the signal modalities used.]
Figure 11.5: Progression of increases in sleep stage classification performance (Cohen’s Kappa
coefficient) achieved in different phases during the PhD work. All increases were found to be
significant (p < 0.05), examined with a Wilcoxon (two-sided) signed-rank test. The signal
modalities included body movements (BM), electrocardiogram (ECG), and/or respiratory effort
(RE). The classification tasks included: sleep and wake (SW) classification, deep sleep (D) or
slow wave sleep (SWS) detection, wake, REM sleep, and NREM sleep (WRN) classification, and wake,
REM sleep, light sleep, and deep sleep (WRLD) classification. The highest Kappa for each
classification task is marked in bold.
11.3 Classifier
Evaluating different classifiers is outside the scope of the present thesis, in which a simple
linear discriminant (LD) classifier was used throughout. Many other classifiers have
been tested over the years, including thresholding (TH), quadratic discriminant (QD), hidden
Markov models (HMM), support vector machines (SVM), random forest (RF), and neural networks
(NN). For many classification tasks, the LD classifier was found to be one of the best
performing algorithms (see Table 11.2). The strength of LD lies in its simple underlying model,
which provides a robust model of the features over the different sleep stages. From a machine
learning point of view, we speculate that the current features express only limited
physiological information for separating sleep stages, so that the classification performance
will not improve markedly unless new features or classifiers that capture additional
physiological information are used. For example, the LD classifier treats each epoch
independently of time, whereas sleep is a structured process (i.e., the state and
characteristics of successive epochs are not independent); temporal classifiers that exploit
this structure are therefore expected to improve the classification. Exploring these types of
classifiers is left for future work.
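To illustrate how a temporal model could exploit the structure of sleep, the sketch below
applies Viterbi decoding to per-epoch stage probabilities produced by a time-independent
classifier. It is only an illustration under stated assumptions, not a method evaluated in
this thesis: the function name, the use of classifier outputs as emission scores, and the
single "stay" probability shared by all stages are all hypothetical choices.

```python
import math


def viterbi_smooth(posteriors, stay_prob=0.9):
    """Return the most likely stage sequence for a night.

    posteriors[t][k] is the probability of stage k at epoch t from a
    time-independent classifier (used here as an emission score; assumption).
    stay_prob is the assumed probability of remaining in the same stage from
    one epoch to the next; the remainder is shared equally by the other stages.
    """
    if not posteriors:
        return []
    n_stages = len(posteriors[0])
    if n_stages < 2 or not 0.0 < stay_prob < 1.0:
        raise ValueError("need at least two stages and 0 < stay_prob < 1")

    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_stages - 1))
    eps = 1e-12  # avoids log(0) for zero probabilities

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Forward pass: best log-score of any path ending in each stage.
    score = [math.log(p + eps) for p in posteriors[0]]
    backpointers = []
    for t in range(1, len(posteriors)):
        new_score, pointers = [], []
        for k in range(n_stages):
            best_prev = max(range(n_stages),
                            key=lambda j: score[j] + transition(j, k))
            new_score.append(score[best_prev] + transition(best_prev, k)
                             + math.log(posteriors[t][k] + eps))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Backward pass: trace the best path from the final epoch.
    state = max(range(n_stages), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a high stay probability, an isolated epoch whose probabilities only weakly favor another
stage is absorbed into the surrounding stage, whereas a sustained change in the probabilities
still produces a transition.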
11.4 Subject/patient groups

For the potential applications of sleep monitoring in a home environment, this thesis primarily
addressed sleep stage classification for healthy adults, who usually had a normal overnight
sleep architecture in which each sleep stage had a certain number of epochs, as suggested by
Ohayon et al. [216]. Therefore, our methods and the corresponding settings (e.g.,
parameters when computing features, selected features, classifier operating thresholds, the use
of normalization, and smoothing window size) were tuned on the data from healthy
subjects and might not be appropriate for other subject groups with prevalent sleep problems.
For example, Figure 11.6 depicts typical examples of overnight sleep architecture (sleep
stages) from a healthy subject, an insomniac, and a patient with severe obstructive sleep
apnea (OSA). The sleep stages (wake, REM sleep, light sleep, and SWS) were obtained through
PSG-based manual scoring by sleep technicians. The figure clearly shows that the sleep
architecture differs between the three subjects throughout the night, because sleep stages
are altered by the pathophysiology of the disordered sleep. For example, in comparison
with healthy subjects, insomniacs experience much longer wake time [215], sometimes along
with SWS deficiency [109, 112]. The sleep apnea syndrome is associated with
sleep fragmentation (with many sleep stage transitions) due to the repeated occurrence of
end-apneic arousals [156, 317], with altered cardiac variability [207], and with dysfunction in
the changes of autonomic nervous activity across sleep stages [119, 120], changes which are the
rationale and hypothesis for autonomic-based sleep stage classification. In addition, the
autonomic function is also influenced by the presence of some sleep problems [22, 195, 280].
As a consequence, cardiorespiratory-based sleep stage classification will be more difficult
for these patient groups.
[Figure 11.6 panels: Healthy subject, Insomniac, OSA patient. Vertical axis: sleep stage (Wake,
REM, Light, Deep). Horizontal axis: Time (h), 0 to 7.]
Figure 11.6: Examples of overnight sleep stages for a healthy subject, an insomniac, and an OSA patient.
Using a classifier trained on data from healthy subjects to classify sleep stages for
sleep-disordered patients would therefore not be appropriate. Even within a specific patient
group, when a classifier derived from some patients is applied to other patients of the same
group, the classification performance would still be worse than that obtained for healthy
subjects. This can be seen in Table 11.2, for example, by comparing the classification results
in [249] for healthy subjects (Kappa = 0.46, accuracy = 76%) and in [248] for OSA patients
(Kappa = 0.32, accuracy = 67%). Moreover, the effectiveness of the features and the
post-processing methods proposed in this thesis is unknown for other subject groups and should
be studied further. For example, the Z-score normalization assumes that the percentages of
sleep stages are similar across subjects, which is not always the case for patients with
disrupted sleep such as insomniacs. Smoothing the feature values per night does not seem
appropriate for OSA patients with a fragmented sleep architecture, since it would not capture
the fast and subtle changes in cardiorespiratory activity caused by frequent arousals.
Thus, the post-processing methods must be re-examined for those patient groups.
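The two post-processing concerns raised above can be made concrete with a small sketch. It
assumes a per-night Z-score and a centered moving average as simplified stand-ins for the
normalization and smoothing steps; the function names, the window size, and the toy feature
values are hypothetical and are not the settings used in this thesis.

```python
from statistics import mean, pstdev


def zscore_per_night(values):
    """Normalize one night's feature values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the start and end."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Toy feature that equals 1.0 during wake and 0.0 during sleep (hypothetical).
night_little_wake = [1.0] * 1 + [0.0] * 9   # 10% wake
night_much_wake = [1.0] * 5 + [0.0] * 5     # 50% wake

# The same raw wake value maps to different normalized values in the two
# nights, because the stage proportions differ.
wake_z_little = zscore_per_night(night_little_wake)[0]
wake_z_much = zscore_per_night(night_much_wake)[0]

# A single-epoch excursion, such as a brief arousal, is flattened by smoothing.
smoothed_spike = moving_average([0.0, 0.0, 3.0, 0.0, 0.0], window=3)
```

In this toy case the wake epoch is normalized to about 3 in the night with little wake and to
1 in the night with much wake, and the single-epoch spike of 3 is reduced to 1 and spread over
three epochs.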
In addition, the sleep stage classification performance was found to depend on age.
Chapter 8 revealed that deep sleep epochs were more easily identified from
cardiorespiratory activity in younger subjects than in elderly people. In fact,
the overnight sleep architecture is age-related, as shown by Ohayon et al. [216], whose
meta-analysis of sleep parameters across the human lifespan (for healthy subjects aged 5 to 85
y) showed that the total sleep time (TST), REM sleep time, and deep sleep time decrease with
increasing age, whereas wake after sleep onset (WASO) increases. Moreover, the multilevel
analysis results presented in Chapter 9 indicate that the autonomic cardiorespiratory activity
during sleep was significantly influenced by age. For these reasons, performing sleep stage
classification separately for different age groups is a promising way to further enhance the
classification performance. This merits further exploration.
11.5 Objective and subjective sleep assessments
Although the focus of this thesis was on objective sleep monitoring, it is important to assess
sleep from a subjective perspective as well, because ‘sleep quality’ is also linked to human
perception. For example, sleep deprivation has been consistently associated with impaired
daytime (cognitive or behavioral) performance, manifested as drowsiness, irritability, or
increased fatigue [47, 98, 263]. Chronic sleep disruption caused by, e.g., sleep fragmentation,
has been shown to relate to worsened mood [255].
Several clinical questionnaires have been published for examining sleep quality, such as the
Pittsburgh Sleep Quality Index (PSQI) [60] and the Self-Assessment of Sleep and Awakening
Quality Scale (SSA) [262]. The relationship between objective sleep variables, derived from a
PSG-based sleep architecture, and subjective sleep quality, obtained from questionnaires, has
been researched thoroughly in the past. Although some inconsistent or even paradoxical
results were found, with studies differing in the extent to which the objective variables were
correlated with subjective sleep quality, some objective variables were consistently related to
the subjective sleep experience. The strongest association was between wake time and
subjective sleep quality, with a correlation of r = −0.59 [9, 21, 251]. A reliable objective
sleep stage classification will enable the construction and analysis of a frontier model that
combines objective and subjective measurements to assess overall sleep quality, in particular
when the classification is done with cardiorespiratory activity (and body movements) that can
be acquired unobtrusively at home.
Krystal and Edinger [164] proposed new methods that analyze PSG data more as a measure of the
nature or depth of sleep, such as indices of the frequency content of electroencephalogram
(EEG) signals obtained during NREM sleep, or that look at particular patterns in NREM sleep and
their sequence, instead of only taking the variables derived from sleep stages. In the context
of sleep monitoring with cardiorespiratory activity, it would be tempting to go a step further
and investigate the associations between autonomic physiological measures and subjective sleep
quality, which have not been explicitly analyzed.
Intuitively, a stable sleep, reflected for example in a low breathing rate variation
overnight, was expected to be indicative of a good sleep quality rating. Taking this
respiratory measure as an example, we analyzed the (Spearman’s rank) correlation between the
total SSA score and the overnight mean standard deviation of breathing rates (SDBR) for males
and females and for three age groups [young: 20-39 y (n = 52), middle aged: 40-69 y (n = 69),
and elderly: ≥70 y (n = 44)]. The SSA consists of 27 questions, divided in four parts: sleep
quality, awakening quality, somatic complaints, and estimates of the sleeping times of the last
nights. A total SSA score can be calculated from the first three parts, or a subscore can be
calculated for each part separately. The total score ranges between 20 and 80, where higher
scores indicate poorer sleep quality. The respiratory measure here was obtained for the whole
night by first taking the mean SDBR for each sleep stage separately and then calculating the
mean of those per-stage means. This was done so that the final mean value was not influenced
by differences in the percentages of the sleep stages, so as to look purely at the
physiological measure without including sleep stage information. The data used here were the
same as in Chapter 9, comprising 165 subjects monitored in sleep laboratories with PSG for two
consecutive nights. Positive correlations were found between the mean SDBR and the total SSA
score [night 1: r = 0.179, p = 0.024; night 2: r = 0.213, p = 0.007]. This means that a higher
variation of breathing rate was associated with worse sleep experience. However, the
correlation coefficients were low, implying only a weak association. A gender effect was
observed in both nights: significant correlations were found between the mean SDBR and the
total SSA score for females [night 1: r = 0.263, p = 0.014, Figure 11.7(a); night 2:
r = 0.300, p = 0.005, Figure 11.7(b)], but not for males. Therefore, a higher mean variation
of the breathing rate was associated with a worse sleep quality in women. The significant
correlations found for females contributed to the (weak) correlations found over all subjects.
An explanation for this gender effect is not clear and needs to be further investigated.
Additionally, a moderate correlation was found between the mean SDBR and the total SSA score
in the first night in the elderly group [r = 0.399, p = 0.008, Figure 11.7(c)]. No significant
correlations were observed for the other age groups, suggesting that a higher breathing rate
variation was associated with worse sleep experience especially for elderly subjects. However,
this result was not present in the second night, meaning that it might be due to the
“first-night effect” present in this data set [196].
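The two computational steps of this analysis, the mean of per-stage means and the Spearman
rank correlation, can be sketched as follows. This is a minimal illustration rather than the
analysis code used here: all function names are hypothetical, ties are handled with average
ranks, and the p-values reported above are not computed.

```python
from statistics import mean


def stage_balanced_mean(values, stages):
    """Mean of per-stage means, so that each stage counts equally."""
    by_stage = {}
    for value, stage in zip(values, stages):
        by_stage.setdefault(stage, []).append(value)
    return mean([mean(stage_values) for stage_values in by_stage.values()])


def average_ranks(xs):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical night: three light-sleep epochs and one deep-sleep epoch.
balanced = stage_balanced_mean([1.0, 1.0, 1.0, 3.0], ["L", "L", "L", "D"])
```

For the hypothetical night above, the stage-balanced mean is 2.0, whereas the plain mean over
all epochs would be 1.5 because light sleep dominates the night.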
This was a preliminary analysis (as an example) and future research with more in-depth
analyses of the PSG data is needed to better understand the relationship between objective
sleep measurements and subjective sleep quality. Moreover, multiple nights are necessary to
assess the night-to-night variability within this relationship.
[Figure 11.7 panels: (a), (b), (c). Vertical axis: Total SSA score (20 to 50). Horizontal axis:
Mean SDBR (Hz), 0 to 0.1.]
Figure 11.7: Scatter plot of the total score on the SSA and the mean SDBR for (a) women of night 1
and (b) women of night 2, and (c) the elderly group of night 1.
11.6 Towards unobtrusive sleep monitoring

This thesis has demonstrated the feasibility of using cardiorespiratory activity (and body
movements) to classify nocturnal sleep stages for healthy adults. Although those signals were
measured in the traditional way, i.e., ECG for the cardiac signal and respiratory inductance
plethysmography (RIP) for the respiratory signal, they are less obtrusive than full PSG
recordings. Moreover, these signal modalities are suitable for long-term home sleep monitoring
since they can be acquired using several advanced unobtrusive measurement techniques, as
mentioned in Section 1.4 (Chapter 1).
Researchers have already attempted to identify sleep stages using cardiorespiratory activity
(and body movements) measured with some unobtrusive techniques, e.g., BCG in the form of
a mattress [161, 303], textile pressure-sensitive sensor arrays in the form of a bedsheet [264],
photoplethysmography (PPG) in the form of a wrist watch/band [260], and peripheral arterial
tone (PAT) in the form of a fingertip sensor [127]. However, as discussed, the performance
reported in these studies is not as good as expected. The challenges (such as the reliability
of the measurements relative to the standard ECG and RIP signals, the robustness against motion
artifacts during measurement, and the influence of sleep postures) vary between the
techniques. These challenges need to be addressed when using those techniques for sleep stage
classification. Towards unobtrusive sleep monitoring, the methods presented in this thesis
need to be further validated once unobtrusively acquired cardiorespiratory (and body movement)
information becomes available.
References
[1] J. Aach and G. M. Church. Aligning gene expression time series with time warping
algorithms. Bioinformatics, 17(6):495-508, 2001.
[2] Adidas miCoach Heart Rate Monitor (retrieved in Jan. 2015). [Online] Available:
http://micoach.adidas.com/heartratemonitor.
[3] M. Adnane, Z. Jiang, and Z. Yan. Sleep-wake stages classification and sleep efficiency
estimation using single-lead electrocardiogram. Expert Syst. Appl., 39(1):1401–1413,
2012.
[4] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo. ECG beat detection using filter
banks. IEEE Trans. Biomed. Eng., 46(2):192–202, 1999.
[5] H. W. Agnew and W. B. Webb. Measurement of sleep onset by EEG criteria. Am. J. EEG
Technol., 12:127-134, 1972.
[6] M. Ahmadlou and H. Adeli. Visibility graph similarity: A new measure of generalized
synchronization in coupled dynamic systems. Physica D, 241(4):326-332, 2012.
[7] J. Alihanka, K. Vaahtoranta, and I. Saarikivi. A new method for long-term monitoring of
the ballistocardiogram, heart rate, and respiration. Am. J. Physiol. Regul. Integr. Comp.
Physiol., 240(5):R384-R392, 1981.
[8] T. Åkerstedt, M. Billiard, M. Bonnet, G. Ficca, L. Garma, M. Mariotti, P. Salzarulo, and
H. Schulz. Awakening from sleep. Sleep Med. Rev., 6(4):267–286, 2002.
[9] T. Åkerstedt, K. Hume, D. Minors, and J. Waterhouse. The meaning of good sleep: a lon-
gitudinal study of polysomnography and subjective sleep quality. J. Sleep Res., 3(3):152–
158, 1994.
[10] T. Åkerstedt, K. Hume, D. Minors, and J. Waterhouse. Good sleep–its timing and physi-
ological sleep characteristics. J. Sleep Res., 6(4):221–229, 1997.
[11] T. Åkerstedt, A. Knutsson, P. Westerholm, T. Theorell, L. Alfredsson, and G. Kecklund.
Sleep disturbances, work stress and work hours: a cross-sectional study. J. Psychosom.
Res., 53(3):741–748, 2002.
[12] M. Ako, T. Kawara, S. Uchida, S. Miyazaki, K. Nishihara, J. Mukai, K. Hirao, J. Ako, and
Y. Okubo. Correlation between electroencephalography and heart rate variability during
sleep. Psychiatry Clin. Neurosci., 57(1):59–65, 2003.
[13] S. Akselrod, D. Gordon, F. A. Ubel, D. C. Shannon, A. C. Berger, and R. J. Cohen.
Power spectrum analysis of heart rate fluctuation: a quantitative probe of beat-to-beat
cardiovascular control. Science, 213(4504):220–222, 1981.
[14] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod.
Phys., 74:47, 2002.
[15] P. Alhola and P. Polo-Kantola. Sleep deprivation: impact on cognitive performance.
Neuropsychiatr. Dis. Treat., 3(5):553-567, 2007.
[16] J. Allen. Photoplethysmography and its application in clinical physiological measurement.
Physiol. Meas., 28(3):R1–R39, 2007.
[17] F. Amzica and M. Steriade. Electrophysiological correlates of sleep delta waves. Elec-
troencephalogr. Clin. Neurophysiol., 107(2):69–83, 1998.
[18] S. Ancoli-Israel, R. Cole, C. Alessi, M. Chambers, W. Moorcroft, and C. P. Pollak. The
role of actigraphy in the study of sleep and circadian rhythms. Sleep, 26(3):342-392,
2003.
[19] P. Anderer, G. Gruber, S. Parapatics, M. Woertz, T. Miazhynskaia, G. Klösch, B. Saletu,
J. Zeitlhofer, M. J. Barbanoj, H. Danker-Hopfe, S. L. Himanen, B. Kemp, T. Penzel, M.
Grözinger, D. Kunz, P. Rappelsberger, A. Schlögl, and G. Dorffner. An E-health solution
for automatic sleep classification according to Rechtschaffen and Kales: validation study
of the Somnolyzer 24×7 utilizing the Siesta database. Neuropsychobiology, 51(3):115–
133, 2005.
[20] P. Anderer, A. Moreau, M. Woertz, M. Ross, G. Gruber, S. Parapatics, E. Loretz, E.
Heller, A. Schmidt, M. Boeck, D. Moser, G. Kloesch, B. Saletu, G. M. Saletu-Zyhlarz,
H. Danker-Hopfe, J. Zeitlhofer, and G. Dorffner. Computer-assisted sleep classification
according to the standard of the American Academy of Sleep Medicine: validation study
of the AASM version of the Somnolyzer 24×7. Neuropsychobiology, 62(4):250–264,
2010.
[21] R. Armitage, M. Trivedi, R. Hoffmann, and A. J. Rush. Relationship between objective
and subjective sleep measures in depressed patients and healthy controls. Depress.
Anxiety, 5(2):97-102, 1997.
[22] M. Aydin, R. Altin, A. Ozeren, L. Kart, M. Bilge, and M. Unalacak. Cardiac autonomic
activity in obstructive sleep apnea: time-dependent and spectral analysis of heart rate
variability using 24-hour Holter electrocardiograms. Tex. Heart Inst. J., 31(2):132–136,
2004.
[23] E. Bagiella, R. P. Sloan, and D. F. Heitjan. Mixed-effects models in psychophysiology.
Psychophysiology, 37(1):13–20, 2000.
[24] A. Baharav, S. Kotagal, V. Gibbons, B. K. Rubin, G. Pratt, J. Karin, and S. Akselrod.
Fluctuations in autonomic nervous activity during sleep displayed by power spectrum
analysis of heart rate variability. Neurology, 45(6):1183–1187, 1995.
[25] R. Bailon, P. Laguna, L. Mainardi, and L. Sornmo. Analysis of heart rate variability
using time-varying frequency bands based on respiratory frequency. In Proc. 29th Ann.
Int. Conf. IEEE Eng. Med. Biol. Soc., pp. 6675–6678, Lyon, France, 2007.
[26] R. Bakeman and J. M. Gottman. Observing Interaction: An Introduction to Sequential
Analysis, 2nd edn., Cambridge University Press, 1986.
[27] S. Banks and D. F. Dinges. Behavioral and physiological consequences of sleep restric-
tion. J. Clin. Sleep Med., 3(5):519–528, 2007.
[28] A. Bar, G. Pillar, I. Dvir, J. Sheffy, R. P. Schnall, and P. Lavie. Evaluation of a portable
device based on peripheral arterial tone for unattended home sleep studies. Chest,
123(3):695–703, 2003.
[29] R. P. Bartsch, J. W. Kantelhardt, T. Penzel, and S. Havlin. Experimental evidence for
phase synchronization transitions in the human cardiorespiratory system. Phys. Rev.
Lett., 98(5):054102, 2007.
[30] R. P. Bartsch, A. Y. Schumann, J. W. Kantelhardt, T. Penzel, and P. Ch. Ivanov. Phase
transitions in physiologic coupling. Proc. Natl. Acad. Sci. U.S.A., 109(26):10181-10186,
2012.
[31] A. Bashan, R. P. Bartsch, J. W. Kantelhardt, and S. Havlin. Comparison of detrending
methods for fluctuation analysis. Physica A Stat. Mech. Appl., 387(21):5080–5090, 2008.
[32] M. Basner, B. Griefahn, U. Müller, G. Plath, and A. Samel. An ECG-based algorithm
for the automatic identification of autonomic activations associated with cortical arousal.
Sleep, 30(10):1349–1361, 2007.
[33] J. Behar, A. Roebuck, J. S. Domingos, E. Gederi, and G. D. Clifford. A review of current
sleep screening applications for smartphones. Physiol. Meas., 34(7):R29–R46, 2013.
[34] J. H. Benington and H. C. Heller. Restoration of brain energy metabolism as the function
of sleep. Prog. Neurobiol., 45(4):347–360, 1995.
[35] R. J. Berger and N. H. Phillips. Energy conservation and sleep. Behav. Brain Res., 69(1-
2):65–73, 1995.
[36] I. I. Berlad, A. Shlitner, S. Ben-Haim, and P. Lavie. Power spectrum analysis and heart
rate variability in stage 4 and REM Sleep: evidence for state-specific changes in auto-
nomic dominance. J. Sleep Res., 2(2):88–90, 1993.
[37] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series.
In Proc. Assoc. Advancement Artif. Intell. Workshop Knowl. Disc. Databases (AAAI-
KDD’94), pp. 359–370, 1994.
[38] R. B. Berry, R. Budhiraja, D. Gottlieb, D. Gozal, C. Iber, V. K. Kapur, C. L. Marcus, R.
Mehra, S. Parthasarathy, S. F. Quan, S. Redline, K. P. Strohl, S. L. Davidson Ward, and
M. M. Tangredi. Rules for scoring respiratory events in sleep: update of the 2007 AASM
manual for the scoring of sleep and associated events. J. Clin. Sleep Med., 8(5):597–619,
2012.
[39] R. B. Berry, R. Brooks, C. E. Gamaldo, S. M. Harding, C. L. Marcus, and B. V. Vaughn.
The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology
and Technical Specifications, Version 2.0. American Academy of Sleep Medicine,
Darien, IL, 2012.
[40] C. Berthomier, X. Drouot, M. Herman-Stoïca, P. Berthomier, J. Prado, D. Bokar-Thire,
O. Benoit, J. Mattout, and M.-P. d’Ortho. Automatic analysis of single-channel sleep
EEG: validation in healthy individuals. Sleep, 30(11):1587–1595, 2007.
[41] H. Bettermann, D. Cysarz, and P. Van Leeuwen. Detecting cardiorespiratory coordination
by respiratory pattern analysis of heart period dynamics – the musical rhythm
approach. Int. J. Bifurcation Chaos Appl. Sci. Eng., 10(10):2349–2360, 2000.
[42] A. M. Bianchi, L. Mainardi, E. Petrucci, M. G. Signorini, M. Mainardi, and S. Cerutti.
Time-variant power spectrum analysis for the detection of transient episodes in HRV
signal. IEEE Trans. Biomed. Eng., 40(2):136–144, 1993.
[43] A. M. Bianchi, L. T. Mainardi, C. Meloni, S. Chierchia, and S. Cerutti. Continuous
monitoring of the sympatho-vagal balance through spectral analysis. IEEE Eng. Med.
Biol. Mag., 16(5):64–73, 1997.
[44] A. Boardman, F. S. Schlindwein, A. P. Rocha, and A. Leite. A study on the optimum
order of autoregressive models for heart rate variability. Physiol. Meas., 23(2):325–336,
2002.
[45] M. H. Bonnet. Effect of sleep disruption on sleep, performance, and mood. Sleep,
8(1):11–19, 1985.
[46] M. H. Bonnet and D. L. Arand. Heart rate variability: sleep stage, time of night, and
arousal influences. Electroenceph. Clin. Neurophysiol., 102(5):390–396, 1997.
[47] M. H. Bonnet and D. L. Arand. Clinical effects of sleep fragmentation versus sleep de-
privation. Sleep Med. Rev., 7(4):297–310, 2003.
[48] A. A. Borbély and P. Achermann. Sleep homeostasis and models of sleep regulation. J.
Biol. Rhythms., 14(6):559-570, 1999.
[49] G. Brandenberger, A. U. Viola, J. Ehrhart, A. Charloux, B. Geny, F. Piquard, and C.
Simon. Age-related changes in cardiac autonomic control during sleep. J. Sleep Res.,
12(3):173-180, 2003.
[50] D. Bratsun, D. Volfson, L. S. Tsimring, and J. Hasty. Delay-induced stochastic oscillations
in gene regulation. Proc. Natl. Acad. Sci. U.S.A., 102(41):14593-14598, 2005.
[51] M. Bresler, K. Sheffy, G. Pillar, M. Preiszler, and S. Herscovici. Differentiating between
light and deep sleep stages using an ambulatory device based on peripheral arterial
tonometry. Physiol. Meas., 29(5):571–584, 2008.
[52] G. de Bruijne, P. Sommen, and R.M. Aarts. Detection of epileptic seizures through audio
classification. In 4th Eur. Congr. Int. Fed. Med. Biol. Eng. (IFMBE’08), pp. 1450–1454,
Antwerpen, Belgium, 2008.
[53] M. Bsoul, H. Minn, M. Nourani, G. Gupta, and L. Tamil. Real-time sleep quality as-
sessment using single-lead ECG and multi-stage SVM classifier. In Proc. 32nd Ann. Int.
Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), pp. 1178–1181, Buenos Aires, Argentina,
2010.
[54] A. Bunde, S. Havlin, J. W. Kantelhardt, T. Penzel, J.-H. Peter, and K. Voigt. Corre-
lated and uncorrelated regions in heart-rate fluctuations during sleep. Phys. Rev. Lett.,
85(17):3736, 2000.
[55] H. J. Burgess, A. L. Holmes, and D. Dawson. The relationship between slow-wave activ-
ity, body temperature, and cardiac activity during nighttime sleep. Sleep, 24(3):343–349,
2001.
[56] H. J. Burgess, T. Sletten, N. Savic, S. S. Gilbert, and D. Dawson. Effects of bright light
and melatonin on sleep propensity, temperature, and cardiac activity at night. J. Appl.
Physiol., 91(3):1214–1222, 2001.
[57] H. J. Burgess, J. Trinder, Y. Kim, and D. Luke. Sleep and circadian influences on cardiac
autonomic nervous system activity. Am. J. Physiol. Heart Circ. Physiol., 273(4):H1761–
H1768, 1997.
[58] R. L. Burr. Interpretation of normalized spectral heart rate variability indices in sleep
research: a critical review. Sleep, 30(7):913–919, 2007.
[59] P. Bušek, J. Vaňková, J. Opavský, J. Salinger, and S. Nevšímalová. Spectral analysis of
the heart rate variability in sleep. Physiol. Res., 54(4):369–376, 2005.
[60] D. J. Buysse, C. F. Reynolds, T. H. Monk, S. R. Berman, and D. J. Kupfer. The Pittsburgh
Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry
Res., 28(2):193–213, 1989.
[61] G. Buzsáki. Memory consolidation during sleep: a neurophysiological perspective. J.
Sleep Res., 7(S1):17–23, 1998.
[62] C. Cajochen, J. Pischke, D. Aeschbach, and A. A. Borbély. Heart rate dynamics during
human sleep. Physiol. Behav., 55(4):767–774, 1994.
[63] M. A. Carskadon and W. C. Dement. Normal human sleep: an overview. In Principles
and Practice of Sleep Medicine, edited by M. H. Kryger, T. Roth, and W. C. Dement,
Chap. 2, pp. 16-26, Elsevier Saunders, St. Louis, 2011.
[64] N. Carter, R. Henderson, S. Lai, M. Hart, S. Booth, and S. Hunyor. Cardiovascular and
autonomic response to environmental noise during sleep in night shift workers. Sleep,
25(4):457-464, 2002.
[65] Centers for Disease Control and Prevention (CDC). Perceived insufficient rest or sleep
among adults – United States, 2008. MMWR Morb. Mortal. Wkly. Rep., 58(42):1175-
1179, 2009.
[66] W. Chen, X. Zhu, T. Nemoto, Y. Kanemitsu, K. Kitamura, and K. Yamakoshi. Unconstrained
detection of respiration rhythm and pulse rate with one under-pillow sensor during
sleep. Med. Biol. Eng. Comput., 43(2):306-312, 2005.
[67] N. S. Cherniack. Respiratory dysrhythmias during sleep. N. Engl. J. Med., 305(6):325–
330, 1981.
[68] B. H. Choi, G. S. Chung, J.-S. Lee, D.-U. Jeong, and K. S. Park. Slow-wave sleep
estimation on a load-cell-installed bed: a non-constrained method. Physiol. Meas.,
30(11):1163–1170, 2009.
[69] G. S. Chung, B. H. Choi, J.-S. Lee, J. S. Lee, D.-U. Jeong, and K. S. Park. REM sleep
estimation only using respiratory dynamics. Physiol. Meas., 30(12):1327–1340, 2009.
[70] G. S. Chung, J. S. Lee, S. H. Hwang, Y. K. Kim, D.-U. Jeong, and K. S. Park. Wake-
fulness estimation only using ballistocardiogram: Nonintrusive method for sleep moni-
toring. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), pp. 2459–
2462, Buenos Aires, Argentina, 2010.
[71] D. Clifford, G. Stone, I. Montoliu, S. Rezzi, F. P. Martin, P. Guy, S. Bruce, and
S. Kochhar. Alignment using variable penalty dynamic time warping. Anal. Chem.,
81(3):1000-1007, 2009.
[72] J. Cohen. A coefficient of agreement for nominal scales. Educ. Psychol. Meas., 20(1):37–
46, 1960.
[73] M. A. Cohn, A. S. Rao, M. Broudy, S. Birch, H. Watson, N. Atkins, B. Davis, F. D.
Stott, and M. A. Sackner. The respiratory inductive plethysmograph: a new non-invasive
monitor of respiration. Bull. Eur. Physiopathol. Respir., 18(4):643–658, 1982.
[74] R. J. Cole, D. F. Kripke, W. Gruen, D. J. Mullaney, and J. C. Gillin. Automatic
sleep/wake identification from wrist activity. Sleep, 15(5):461–469, 1992.
[75] M. Costa, A. L. Goldberger, and C.-K. Peng. Multiscale entropy analysis of biological
signals. Phys. Rev. E, 71(2):021906, 2005.
[76] L. J. Cronbach. Research in classrooms and schools: Formulation of questions, designs
and analysis, Occasional Paper, Stanford Evaluation Consortium, 1976.
[77] D. Cysarz, H. Bettermann, S. Lange, D. Geue, and P. Van Leeuwen. A quantitative com-
parison of different methods to detect cardiorespiratory coordination during night-time
sleep. Biomed. Eng. Online, 3:44, 2004.
[78] D. Cysarz, H. Bettermann, and P. Van Leeuwen. Entropies of short binary sequences in
heart period dynamics. Am. J. Physiol. Heart Circ. Physiol., 278(6):H2163–2172, 2000.
[79] D. Cysarz, R. Zerm, H. Bettermann, M. Frühwirth, M. Moser, and M. Kröz. Comparison
of respiratory rates derived from heart rate variability, ECG amplitude, and nasal/oral
airflow. Ann. Biomed. Eng., 36(12):2085–2094, 2008.
[80] Y. Dagan. Circadian rhythm sleep disorders (CRSD). Sleep Med. Rev., 6(1):45–54, 2002.
[81] H. Danker-Hopfe, P. Anderer, J. Zeitlhofer, M. Boeck, H. Dorn, G. Gruber, E. Heller,
E. Loretz, D. Moser, S. Parapatics, B. Saletu, A. Schmidt, and G. Dorffner. Interrater
reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM
standard. J. Sleep Res., 18(1):74–84, 2009.
[82] H. Davis, P. A. Davis, A. L. Loomis, E. N. Harvey, and G. Hobart. Human brain poten-
tials during the onset of sleep. J. Neurophysiol., 1:24-38, 1938.
[83] J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. In
Proc. 23rd Int. Conf. Machine Learn. (ICML’06), pp. 223–240, Pittsburgh, PA, 2006.
[84] C. De Boor. A Practical Guide to Splines, Springer-Verlag, New York, NY, 2001.
[85] P. De Chazal, N. Fox, E. O’Hare, C. Heneghan, A. Zaffaroni, P. Boyle, S. Smith, C.
O’Connell, and W. T. McNicholas. Sleep/wake measurement using a non-contact biomo-
tion sensor. J. Sleep Res., 20(2):356-366, 2011.
[86] S. De Franciscis, S. Johnson, and J. J. Torres. Enhancing neural-network performance
via assortativity. Phys. Rev. E, 83:036114, 2011.
[87] L. De Souza, A. A. Benedito-Silva, M. L. N. Pires, D. Poyares, S. Tufik, and H. M. Calil.
Further validation of actigraphy for sleep studies. Sleep, 26(1):81–85, 2003.
[88] W. Dement and N. Kleitman. The relation of eye movements during sleep to dream
activity: an objective method for the study of dreaming. J. Exp. Psychol., 53(5):339–
346, 1957.
[89] S. Devot, R. Dratwa, and E. Naujokat. Sleep/wake detection based on cardiorespiratory
signals and actigraphy. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.
(EMBC’10), pp. 5089–5092, Buenos Aires, Argentina, 2010.
[90] M. Dhamala, V. K. Jirsa, and M. Ding. Enhancement of neural synchrony by time delay.
Phys. Rev. Lett., 92(7):074104, 2004.
[91] D. J. Dijk. Slow-wave sleep: characteristics and homeostatic regulation. In Slow-Wave
Sleep: Beyond Insomnia – The Importance of Slow-Wave Sleep for Your Patients, edited
by T. Roth and D. J. Dijk, Wolters Kluwer Health Pharma Solutions, London, UK, 2010.
[92] D. J. Dijk. Slow-wave sleep, diabetes, and the sympathetic nervous system. Proc. Natl.
Acad. Sci. U.S.A., 105(4):1107-1108, 2008.
[93] M. Di Rienzo, F. Rizzo, G. Parati, G. Brambilla, M. Ferratini, and P. Castiglioni. MagIC
system: a new textile-based wearable device for biological signal monitoring. Applicability
in daily life and clinical setting. In Proc. 27th Ann. Int. Conf. IEEE Eng. Med. Biol.
Soc. (EMBC’05), pp. 7167–7169, Shanghai, China, 2005.
[94] A. Domingues, T. Paiva, and J. M. Sanches. Hypnogram and sleep parameter computa-
tion from activity and cardiovascular data. IEEE Trans. Biomed. Eng., 61(6):1711–1719,
2014.
[95] N. J. Douglas, D. P. White, C. K. Pickett, J. V. Weil, and C. W. Zwillich. Respiration
during sleep in normal man. Thorax, 37(11):840–844, 1982.
[96] R. V. Donner, Y. Zou, J. F. Donges, N. Marwan, and J. Kurths. Recurrence networks – a
novel paradigm for nonlinear time series analysis. New J. Phys., 12:033025, 2010.
[97] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd edn., Wiley-
Interscience Press, 2000.
[98] J. S. Durmer and D. F. Dinges. Neurocognitive consequences of sleep deprivation. Semin.
Neurol., 25(1):117–129, 2005.
[99] F. Ebrahimi, S.-K. Setarehdan, J. Ayala-Moyeda, and H. Nazeran. Automatic sleep stag-
ing using empirical mode decomposition, discrete wavelet transform, time-domain, and
nonlinear dynamics features of heart rate variability signals. Comput. Methods Programs
Biomed., 112(1):47–57, 2013.
[100] V. M. Eguíluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, and A. V. Apkarian. Scale-free
brain functional networks. Phys. Rev. Lett., 94(1):018102, 2005.
[101] S. Elsenbruch, M. J. Harnish, and W. C. Orr. Heart rate variability during waking and
sleep in healthy males and females. Sleep, 22(8):1067–1071, 1999.
[102] P. A. Estevez, C. M. Held, C. A. Holzmann, C. A. Perez, J. P. Perez, J. Heiss, M. Garrido,
and P. Peirano. Polysomnographic pattern recognition for automated classification of
sleep-waking states in infants. Med. Biol. Eng. Comput., 40(1):105–113, 2002.
[103] T. Fawcett. ROC graphs: notes and practical considerations for researchers, Tech. Rep.
HP Labs, Palo Alto, CA, 2004.
[104] J. Fell, J. Röschke, K. Mann, and C. Schäffner. Discrimination of sleep stages: a com-
parison between spectral and nonlinear EEG measures. Electroencephalogr. Clin. Neu-
rophysiol., 98(5):401–410, 1996.
[105] Fitbit ONE Wireless Activity and Sleep Tracker (retrieved in Jan. 2015). [Online] Available:
https://www.fitbit.com/one.
[106] M. Folke, L. Cernerud, M. Ekstrom, and B. Hok. Critical review of non-invasive respi-
ratory monitoring in medical care. Med. Biol. Eng. Comput., 41(4):377–383, 2003.
[107] P. Fonseca, R. M. Aarts, J. Foussier, and X. Long. A novel low-complexity post-processing
algorithm for precise QRS localization. SpringerPlus, 3:376, 2014.
[108] J. Foussier, P. Fonseca, X. Long, and S. Leonhardt. Automatic feature selection for
sleep/wake classification with small data sets. In 7th Int. Joint Conf. Biomed. Eng. Syst.
Technol. (BIOSTEC’13), pp. 178–184, Barcelona, Spain, 2013.
[109] B. L. Frankel, R. D. Coursey, R. Buchbinder, and F. Snyder. Recorded and reported sleep
in chronic primary insomnia. Arch. Gen. Psychiatry, 33(5):615–623, 1976.
[110] J. H. Friedman. Regularized discriminant analysis. J. Am. Stat. Assoc., 84(405):165–175,
1989.
[111] P. M. Fuller and C. J. Amlaner (eds.). SRS Basics of Sleep Guide, 2nd ed., Sleep Research
Society, Darien, IL, 2011.
[112] J. M. Gaillard. Chronic primary insomnia: possible physiopathological involvement of
slow wave sleep deficiency. Sleep, 1(2):133–147, 1978.
[113] A. S. Gami, D. E. Howard, E. J. Olson, and V. K. Somers. Day-night pattern of sudden
death in obstructive sleep apnea. N. Engl. J. Med., 352(12):1206–1214, 2005.
[114] G. Garcia-Molina, M. Bellesi, S. Pastoor, S. Pfundtner, B. Riedner, and G. Tononi. Online
single EEG channel based automatic sleep staging. Eng. Psychol. Cogn. Erg. Appl.
Serv. LNCS, 8020:333–342, 2013.
[115] I. Gath and E. Bar-On. Computerized method for scoring of polygraphic sleep record-
ings. Comput. Prog. Biomed., 11(3):217–223, 1980.
[116] D. R. Goodenough, H. B. Lewis, A. Shapiro, L. Jaret, and I. Sleser. Dream reporting
following abrupt and gradual awakenings from different types of sleep. J. Pers. Soc.
Psychol., 2(2):170–179, 1965.
[117] Y. Goren, L. R. Davrath, I. Pinhas, E. Toledo, and S. Akselrod. Individual time-dependent
spectral boundaries for improved accuracy in time-frequency analysis of heart rate vari-
ability. IEEE Trans. Biomed. Eng., 53(1):35–42, 2006.
[118] P. Grossman, F. H. Wilhelm, and M. Spoerle. Respiratory sinus arrhythmia, cardiac vagal
control, and daily activity. Am. J. Physiol. Heart Circ. Physiol., 287(2):H728–H734,
2004.
[119] C. Guilleminault, J. G. Briskin, M. S. Greenfield, and R. Silvestri. The impact of autonomic
nervous system dysfunction on breathing during sleep. Sleep, 4(3):263–278,
1981.
[120] C. Guilleminault, A. Tilkian, K. Lehrman, L. Forno, and W. C. Dement. Sleep apnoea
syndrome: state of sleep and autonomic dysfunction. J. Neurol. Neurosurg. Psychiatry,
40(7):718–725, 1977.
[121] M. A. Hall. Correlation-based feature selection for machine learning. Ph.D. dissertation,
Dept. Computer Science, The Univ. of Waikato, Hamilton, New Zealand, 1999.
[122] M. Hall, R. Vasko, D. Buysse, H. Ombao, Q. Chen, J. D. Cashmere, D. Kupfer, and
J. F. Thayer. Acute stress affects heart rate variability during sleep. Psychosom. Med.,
66(1):56–62, 2004.
[123] P. S. Hamilton. Open Source ECG Analysis. In Computing in Cardiology (CinC),
pp. 101–104, Memphis, TN, 2002.
[124] P. S. Hamilton and W. J. Tompkins. Quantitative investigation of QRS detection rules
using the MIT/BIH arrhythmia database. IEEE Trans. Biomed. Eng., 33(12):1157–1165,
1986.
[125] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng.,
21(9):1263-1284, 2009.
[126] J. Hedner, G. Pillar, S. D. Pittman, D. Zou, L. Grote, and D. P. White. A novel adap-
tive wrist actigraphy algorithm for sleep-wake assessment in sleep apnea patients. Sleep,
27(8):1560-1566, 2004.
[127] J. Hedner, D. P. White, A. Malhotra, S. Herscovici, S. D. Pittman, D. Zou, L. Grote, and
G. Pillar. Sleep staging based on autonomic signals: a multi-center validation study. J.
Clin. Sleep Med., 7(3):301–306, 2011.
[128] A. Heinrich, F. Van Heesch, B. Puvvula, and M. Rocque. Video based actigraphy and
breathing monitoring from the bedside table of shared beds. J. Ambient. Intell. Human
Comput., 6(1):107–120, 2015.
[129] R. C. Heinzer and F. Series. Normal physiology of the upper and lower airways. In Prin-
ciples and practice of sleep medicine, edited by M. H. Kryger, T. Roth, W. C. Dement,
pp. 581–596, Saunders Elsevier, St. Louis, MO, 2011.
[130] E. Hellinger. Neue begründung der theorie quadratischer formen von unendlichvielen
veränderlichen. J. für die Reine und Angew. Math., 136:210–271, 1909.
[131] S. Herscovici, A. Pe’er, S. Papyan, and P. Lavie. Detecting REM sleep from the finger: an
automatic REM sleep algorithm based on peripheral arterial tone (PAT) and actigraphy.
Physiol. Meas., 28(2):129–140, 2007.
[132] R. L. Horner. Autonomic consequences of arousal from sleep: mechanisms and implica-
tions. Sleep, 19(10 Suppl.):S193-195, 1996.
[133] D. Horvatic, H. E. Stanley, and B. Podobnik. Detrended cross-correlation analysis for
non-stationary time series with periodic trends. Europhys. Lett., 94(1):18007, 2011.
[134] J. J. Hox. Multilevel Analysis: Techniques and Applications, 2nd edn., Routledge, 2010.
[135] D. W. Hudgel, R. J. Martin, B. Johnson, and P. Hill. Mechanics of the respiratory system
and breathing pattern during sleep in normal humans. J. Appl. Physiol. Respir. Environ.
Exerc. Physiol., 56(1):133–137, 1984.
[136] C. Iber, S. Ancoli-Israel, A. L. Chesson, and S. F. Quan. The AASM Manual for the Scor-
ing of Sleep and Associated Events: Rules, Terminology and Technical Specifications.
American Academy of Sleep Medicine, Westchester, IL, 2007.
[137] Y. Ichimaru, K. P. Clark, J. Ringler, and W. J. Weiss. Effect of sleep stage on the relation-
ship between respiration and heart rate variability. In Computers in Cardiology (CinC),
pp. 657–660, Chicago, IL, 1990.
[138] S. M. Isa, I. Wasito, and A. M. Arymurthy. Kernel dimensionality reduction on sleep
stage classification using ECG signal. Int. J. Comput. Sci. Iss., 8(4):115–123, 2011.
[139] N. Iyengar, C. K. Peng, R. Morin, A. L. Goldberger, and L. A. Lipsitz. Age-related
alterations in the fractal scaling of cardiac interbeat interval dynamics. Am. J. Physiol.
Regul. Integr. Comp. Physiol., 271(4):R1078–1084, 1996.
[140] B. H. Jansen and K. Shankar. Sleep staging with movement-related signals. Int. J.
Biomed. Comput., 32(3-4):289-297, 1993.
[141] J. J. Liu, W. Xu, M.-C. Huang, N. Alshurafa, M. Sarrafzadeh, N. Raut, and B. Yadegar.
Sleep posture analysis using a dense pressure sensitive bedsheet. Perv. Mobile Comput.,
10(2):34-50, 2014.
[142] S. Jasson, C. Medigue, P. Maison-Blanche, N. Montano, L. Meyer, C. Vermeiren, P.
Mansier, P. Coumel, and A. Malliani. Instant power spectrum analysis of heart rate
variability during orthostatic tilt using a time-/frequency-domain method. Circulation,
96(10):3521–3526, 1997.
[143] Jawbone UP Fitness Trackers (retrieved in Jan. 2015). [Online] Available:
http://www.jawbone.com/up.
[144] S. Jiang, C. H. Bian, X. B. Ning, and Q. D. Y. Ma. Visibility graph analysis on heartbeat
dynamics of meditation training. Appl. Phys. Lett., 102(25):253702, 2013.
[145] D. W. Jung, S. H. Hwang, H. N. Yoon, Y.-J. G. Lee, D.-U. Jeong, and K. S. Park. Noc-
turnal awakening and sleep efficiency estimation using unobtrusively measured ballisto-
cardiogram. IEEE Trans. Biomed. Eng., 61(1):131–138, 2013.
[146] F. Jurysta, P. Van De Borne, P.-F. Migeotte, M. Dumont, J.-P. Lanquart, J.-P. Degaute,
and P. Linkowski. A study of the dynamic interactions between sleep EEG and heart rate
variability in healthy young men. Clin. Neurophysiol., 114(11):2146–2155, 2003.
[147] M. M. Kabir, H. Dimitri, P. Sanders, R. Antic, E. Nalivaiko, D. Abbott, and M. Baumert.
Cardiorespiratory phase-coupling is reduced in patients with obstructive sleep apnea.
PLoS One, 5(5):e10602, 2010.
[148] J. W. Kantelhardt, E. Koscielny-Bunde, H. H. A. Rego, S. Havlin, and A. Bunde. Detecting
long-range correlations with detrended fluctuation analysis. Physica A Stat. Mech.
Appl., 295(3-4):441–454, 2001.
[149] J. W. Kantelhardt, T. Penzel, S. Rostig, H. F. Becker, S. Havlin, and A. Bunde. Breathing
during REM and non-REM sleep: correlated versus uncorrelated behaviour. Physica A
Stat. Mech. Appl., 319:447–457, 2003.
[150] W. Karlen, C. Mattiussi, and D. Floreano. Improving actigraph sleep/wake classification
with cardio-respiratory signals. In Proc. 30th Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.
(EMBC), pp. 5262-5265, Vancouver, Canada, 2008.
[151] W. Karlen, C. Mattiussi, and D. Floreano. Sleep and wake classification with ECG and
respiratory effort signals. IEEE Trans. Biomed. Circuits Syst., 3(2):71–78, 2009.
[152] E. Keogh and M. Pazzani. Scaling up dynamic time warping for datamining applications.
In Proc. 6th Assoc. Comput. Mach. SIG Knowl. Discovery Data Mining (ACM SIGKDD),
pp. 285-289, 2000.
[153] L. Keselbrener and S. Akselrod. Selective discrete Fourier transform algorithm for time-
frequency analysis: method and application on simulated and cardiovascular signals.
IEEE Trans. Biomed. Eng., 43(8):789–802, 1996.
[154] J. W. Kim, J.-S. Lee, P. A. Robinson, and D.-U. Jeong. Markov analysis of sleep dynam-
ics. Phys. Rev. Lett., 102(17):178104, 2009.
[155] S. Kim, S. H. Park, and C. S. Ryu. Multistability in coupled oscillator systems with time
delay. Phys. Rev. Lett., 79(15):2911, 1997.
[156] R. J. Kimoff. Sleep fragmentation in obstructive sleep apnea. Sleep, 19(Suppl.9):S61-S66, 1996.
[157] M. T. Kinlaw and A. W. Hunt. Time dependence of delayed neutron emission for fission-
able isotope identification. Appl. Phys. Lett., 86(25):254104, 2005.
[158] T. Kirjavainen, D. Cooper, O. Polo, and C. E. Sullivan. Respiratory and body movements
as indicators of sleep stage and wakefulness in infants and young children. J. Sleep Res.,
5(3):186-194, 1996.
[159] A. Kishi, Z. R. Struzik, B. H. Natelson, F. Togo, and Y. Yamamoto. Dynamics of sleep
stage transitions in healthy humans and patients with chronic fatigue syndrome. Am. J.
Physiol. Regul. Integr. Comp. Physiol., 294(6):R1980-R1987, 2008.
[160] G. Klösch, B. Kemp, T. Penzel, A. Schlögl, P. Rappelsberger, E. Trenker, G. Gruber,
J. Zeitlhofer, B. Saletu, W. M. Herrmann, S. L. Himanen, D. Kunz, M. J. Barbanoj,
J. Röschke, A. Värri, and G. Dorffner. The SIESTA project polygraphic and clinical
database. IEEE Eng. Med. Biol. Mag., 20(3):51–57, 2001.
[161] J. M. Kortelainen, M. O. Mendez, A. M. Bianchi, M. Matteucci, and S. Cerutti. Sleep
staging based on signals acquired through bed sensor. IEEE Trans. Inf. Technol. Biomed.,
14(3):776-785, 2010.
[162] Z. M. Kovacs-Vajna. A fingerprint verification system based on triangular matching
and dynamic time warping. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1266-1276,
2000.
[163] J. Krieger. Breathing during sleep in normal subjects. Clin. Chest Med., 6(4):577-594,
1985.
[164] A. D. Krystal and J. D. Edinger. Measuring sleep quality. Sleep Med., 9(Suppl.1):S10-
S17, 2008.
[165] J. M. Krueger, D. M. Rector, S. Roy, H. P. Van Dongen, G. Belenky, and J. Panksepp.
Sleep as a fundamental property of neuronal assemblies. Nat. Rev. Neurosci., 9(12):910-
919, 2008.
[166] Y. M. Kuo, J. S. Lee, and P. C. Chung. A visual context-awareness-based sleeping-respiration
measurement system. IEEE Trans. Inf. Technol. Biomed., 14(2):255-265,
2010.
[167] Y. Kurihara and K. Watanabe. Sleep-stage decision algorithm by using heartbeat and
body-movement signals. IEEE Trans. Syst. Man. Cybern. A Syst. Hum., 42(6):1450-
1459, 2012.
[168] C. A. Kushida, M. R. Littner, T. Morgenthaler, C. A. Alessi, D. Bailey, J. Coleman, L.
Friedman, M. Hirshkowitz, S. Kapen, M. Kramer, T. Lee-Chiong, D. L. Loube, J. Owens,
J. P. Pancer, and M. Wise. Practice parameters for the indications for polysomnography
and related procedures: an update for 2005. Sleep, 28(4):499-521, 2005.
[169] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, and J. C. Nuño. From time series to com-
plex networks: the visibility graph. Proc. Natl. Acad. Sci. U.S.A., 105(13):4972-4975,
2008.
[170] D. E. Lake, J. R. Moorman, and H. Cao. Sample entropy estimation using sampen. In
PhysioNet (May 2014), [Online] Available: http://physionet.org/physiotools/sampen.
[171] D. E. Lake, J. S. Richman, M. P. Griffin, and J. R. Moorman. Sample entropy analysis
of neonatal heart rate variability. Am. J. Physiol. Regul. Integr. Comp. Physiol.,
283(3):R789-797, 2002.
[172] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical
data. Biometrics, 33(1):159–174, 1977.
[173] L. E. Larsen and D. O. Walter. On automatic methods of sleep staging by EEG spectra.
Electroencephalogr. Clin. Neurophysiol., 28(5):459-467, 1970.
[174] J. Lázaro, E. Gil, R. Bailón, A. Minchole, and P. Laguna. Deriving respiration from
photoplethysmographic pulse width. Med. Biol. Eng. Comput., 51(1-2):233–242, 2013.
[175] K. L. Lichstein, K. C. Stone, J. Donaldson, S. D. Nau, J. P. Soeffing, D. Murray, K. W.
Lester, and R. N. Aguillard. Actigraphy validation with insomnia. Sleep, 29(2):232–239,
2006.
[176] S. S. Lobodzinski and M. M. Laks. New devices for very long-term ECG monitoring.
Cardiol. J., 19(2):210–214, 2012.
[177] X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Using dynamic time
warping for sleep and wake discrimination. In Proc. IEEE-EMBS Int. Conf. Biomed.
Health Inf. (BHI), pp. 886–889, Hong Kong and Shenzhen, China, 2012.
[178] X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Time-frequency analysis
of heart rate variability for sleep and wake classification. In Proc. 12th IEEE Int. Conf.
BioInform. BioEng. (BIBE), pp. 85–90, Larnaca, Cyprus, 2012.
[179] X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Spectral boundary adap-
tation on heart rate variability for sleep and wake classification. Int. J. Artif. Intell. Tools,
23(3):1460002, 2014.
[180] X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and wake classi-
fication with actigraphy and respiratory effort using dynamic warping. IEEE J. Biomed.
Health Inform., 18(4):1272-1284, 2014.
[181] X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Respiration amplitude
analysis for REM and NREM sleep classification. In Proc. 35th Ann. Int. Conf. IEEE
Eng. Med. Biol. Soc. (EMBC), pp. 5017–5020, Osaka, Japan, 2013.
[182] X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing respiratory ef-
fort amplitude for automated sleep stage classification. Biomed. Signal Process. Control,
14:197–205, 2014.
[183] X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling cardiorespiratory
interaction during sleep with complex networks. Appl. Phys. Lett., 105(20):203701,
2014.
[184] X. Long, R. Haakma, R. M. Aarts, P. Fonseca, and J. Foussier. Between-laboratory and
demographic effects on heart rate and its variability during sleep. In Workshop Smart
Healthcare and Healing Environments (SHHE), Eindhoven, The Netherlands, 2014.
[185] X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R. M. Aarts.
Measuring dissimilarity between respiratory effort signals based on uniform scaling for
sleep staging. Physiol. Meas., 35(12):2529–2542, 2014.
[186] X. Long, P. Fonseca, R. Haakma, J. Foussier, and R. M. Aarts. Automatic detection of
overnight deep sleep based on heart rate variability: a preliminary study. In Proc. 36th
Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), pp. 50–53, Chicago, IL, 2014.
[187] J. Lu, D. Sherman, M. Devor, and C. B. Saper. A putative flip-flop switch for control of
REM sleep. Nature, 441(7093):589–594, 2006.
[188] B. Luque, L. Lacasa, F. Ballesteros, and J. Luque. Horizontal visibility graphs: exact
results for random time series. Phys. Rev. E, 80:046103, 2009.
[189] D. C. Mack, J. T. Patrie, P. M. Suratt, R. A. Felder, and M. A. Alwan. Development
and preliminary validation of heart rate and breathing rate detection using a passive,
ballistocardiography-based sleep monitoring system. IEEE Trans. Inf. Technol. Biomed.,
13(1):111-120, 2009.
[190] A. Malliani, M. Pagani, F. Lombardi, and S. Cerutti. Cardiovascular neural regulation
explored in the frequency domain. Circulation, 84(2):482–492, 1991.
[191] P. Maquet, C. Degueldre, G. Delfiore, J. Aerts, J. M. Peters, A. Luxen, and G. Franck.
Functional neuroanatomy of human slow wave sleep. J. Neurosci., 17(8):2807–2812,
1997.
[192] L. Marshall, H. Helgadóttir, M. Mölle, and J. Born. Boosting slow oscillations during
sleep potentiates memory. Nature, 444(7119):610–613, 2006.
[193] M. Massimini, F. Ferrarelli, S. K. Esser, B. A. Riedner, R. Huber, M. Murphy, M. J. Peterson,
and G. Tononi. Triggering sleep slow waves by transcranial magnetic stimulation.
Proc. Natl. Acad. Sci. U.S.A., 104(20):8496–8501, 2007.
[194] G. Matthews, B. Sudduth, and M. Burrow. A non-contact vital signs monitor. Crit. Rev.
Biomed. Eng., 28(1-2):173-178, 2000.
[195] P. Meerlo, A. Sgoifo, and D. Suchecki. Restricted and disrupted sleep: Effects on auto-
nomic function, neuroendocrine stress systems and stress responsivity. Sleep Med. Rev.,
12(3):197-210, 2008.
[196] J. Mendels and D. R. Hawkins. Sleep laboratory adaptation in normal subjects
and depressed patients (“first night effect”). Electroencephalogr. Clin. Neurophysiol.,
22(6):556-558, 1967.
[197] M. O. Mendez, M. Matteucci, V. Castronovo, L. Ferini-Strambi, S. Cerutti, and A. M.
Bianchi. Sleep staging from heart rate variability: time-varying spectral features and
hidden Markov models. Int. J. Biomed. Eng. Technol., 3(3-4):246–263, 2010.
[198] M. O. Mendez, M. Migliorini, J. M. Kortelainen, D. Nisticò, E. Arce-Santana, S. Cerutti,
and A. M. Bianchi. Evaluation of the sleep quality based on bed sensor signals: Time-
variant analysis. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10),
pp. 3994–3997, Buenos Aires, Argentina, 2010.
[199] N. Meziane, J. G. Webster, M. Attari, and A. J. Nimunkar. Dry electrodes for electrocar-
diography. Physiol. Meas., 34(9):R47–R69, 2013.
[200] M. Migliorini, A. M. Bianchi, D. Nisticò, J. Kortelainen, E. Arce-Santana, S. Cerutti,
and M. O. Mendez. Automatic sleep staging based on ballistocardiographic signals
recorded through bed sensors. In Proc. 32nd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.
(EMBC’10), pp. 3273–3276, Buenos Aires, Argentina, 2010.
[201] Mio Alpha Intensive Heart Rate Monitor (retrieved in Jan. 2015). [Online] Available:
http://www.mioglobal.com.
[202] N. Montano, T. G. Ruscone, A. Porta, F. Lombardi, M. Pagani, and A. Malliani. Power
spectrum analysis of heart rate variability to assess the changes in sympathovagal balance
during graded orthostatic tilt. Circulation, 90:1826–1834, 1994.
[203] G. B. Moody, R. G. Mark, A. Zoccola, and S. Mantero. Derivation of respiratory signals
from multi-lead ECGs. In Computers in Cardiology (CinC), pp. 113–116, Linköping,
Sweden, 1985.
[204] T. Morgenthaler, C. Alessi, L. Friedman, J. Owens, V. Kapur, B. Boehlecke, T. Brown,
A. Chesson, J. Coleman, T. Lee-Chiong, J. Pancer, and T. J. Swick. Practice parameters
for the use of actigraphy in the assessment of sleep and sleep disorders: An update for
2007. Sleep, 30(4):519-529, 2007.
[205] M. Müller. Part 1: Analysis and retrieval techniques for music data – Dynamic time
warping. In Information Retrieval for Music and Motion, Chap. 4, pp. 69-84, Springer-
Verlag, Berlin, Germany, 2007.
[206] A. Muzet. Environmental noise, sleep and health. Sleep Med. Rev., 11(2):135-142, 2007.
[207] K. Narkiewicz, N. Montano, C. Cogliati, P. J. H. Van De Borne, M. E. Dyken, and V.
K. Somers. Altered cardiovascular variability in obstructive sleep apnea. Circulation,
98(11):1071–1077, 1998.
[208] V. Natale, M. Drejak, A. Erbacci, L. Tonetti, M. Fabbri, and M. Martoni. Monitoring
sleep with a smartphone accelerometer. Sleep Biol. Rhythm., 10(4):287–292, 2012.
[209] E. P. Neuburg. Frequency warping by dynamic programming. In Proc. IEEE Int. Conf.
Acoust. Speech Signal Process. (ICASSP), pp. 573-575, New York, NY, 1988.
[210] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, 2002.
[211] M. E. J. Newman. The structure and function of complex networks. SIAM Rev.,
45(2):167-256, 2003.
[212] M. E. J. Newman and J. Park. Why social networks are different from other types of
networks. Phys. Rev. E, 68(3):036122, 2003.
[213] H.-V. V. Ngo, T. Martinetz, J. Born, and M. Mölle. Auditory closed-loop stimulation of
the sleep slow oscillation enhances memory. Neuron, 78(3):545–553, 2013.
[214] A. Noviyanto, S. M. Isa, I. Wasito, and A. M. Arymurthy. Selecting features of single
lead ECG signal for automatic sleep stages classification using correlation-based feature
subset selection. Int. J. Comput. Spec. Iss., 8(5):139–148, 2011.
[215] M. M. Ohayon. Epidemiology of insomnia: what we know and what we still need to
learn. Sleep Med. Rev., 6(2):97–111, 2002.
[216] M. M. Ohayon, M. A. Carskadon, C. Guilleminault, and M. V. Vitiello. Meta-analysis of
quantitative sleep parameters from childhood to old age in healthy individuals: develop-
ing normative sleep values across the human lifespan. Sleep, 27(7):1255–1273, 2004.
[217] H. Otzenberger, C. Simon, C. Gronfier, and G. Brandenberger. Temporal relationship
between dynamic heart rate variability and electroencephalographic activity during sleep
in man. Neurosci. Lett., 229(3):173–176, 1997.
[218] J. Paalasmaa, M. Waris, H. Toivonen, L. Leppakorpi, and M. Partinen. Unobtrusive online
monitoring of sleep at home. In Proc. 34th Ann. Int. Conf. IEEE Eng. Med. Biol.
Soc. (EMBC’12), pp. 3784–3788, San Diego, CA, 2012.
[219] M. Pagani, F. Lombardi, S. Guzzetti, O. Rimoldi, R. Furlan, P. Pizzinelli, G. Sandrone, G.
Malfatto, S. Dell’Orto, and E. Piccaluga. Power spectral analysis of heart rate and arterial
pressure variabilities as a marker of sympatho-vagal interaction in man and conscious
dog. Circ. Res., 59:178–193, 1986.
[220] J. Paquet, A. Kawinska, and J. Carrier. Wake detection capacity of actigraphy during
sleep. Sleep, 30(10):1362–1369, 2007.
[221] R. Paradiso, G. Loriga, and N. Taccini. A wearable health care system based on knitted
integrated sensors. IEEE Trans. Inf. Technol. Biomed., 9(3):337–344, 2005.
[222] C.-K. Peng, J. Mietus, J. M. Hausdorff, S. Havlin, H. E. Stanley, and A. L. Goldberger.
Long-range anticorrelations and non-Gaussian behavior of the heartbeat. Phys. Rev. Lett.,
70(9):1343, 1993.
[223] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Med. Rev.,
4(2):131–148, 2000.
[224] T. Penzel, J. W. Kantelhardt, L. Grote, J. H. Peter, and A. Bunde. Comparison of detrended
fluctuation analysis and spectral analysis for heart rate variability in sleep and
sleep apnea. IEEE Trans. Biomed. Eng., 50(10):1143–1151, 2003.
[225] T. Penzel, J. W. Kantelhardt, C.-C. Lo, K. Voigt, and C. Vogelmeier. Dynamics of heart
rate and sleep stages in normals and patients with sleep apnea. Neuropsychopharmacol-
ogy, 28:S48-S53, 2003.
[226] T. Penzel, N. Wessel, M. Riedl, J. W. Kantelhardt, S. Rostig, M. Glos, A. Suhrbier,
H. Malberg, and I. Fietze. Cardiovascular and respiratory dynamics during normal and
pathological sleep. Chaos, 17(1):015116, 2007.
[227] H. R. Peterson, M. Rothschild, C. R. Weinberg, R. D. Fell, K. R. McLeish, and M. A.
Pfeifer. Body fat and the activity of the autonomic nervous system. N. Engl. J. Med.,
318(17):1077-1083, 1988.
[228] D. Pevernagie, R. M. Aarts, and M. D. Meyer. The acoustics of snoring. Sleep Med. Rev.,
14(2):131–144, 2010.
[229] Philips Respironics Actiwatch, Philips Healthcare (retrieved in Nov. 2012). [Online]
Available: http://www.actiwatch.respironics.com.
[230] E. A. Phillipson. Control of breathing during sleep. Am. Rev. Respir. Dis., 118(5):909–
939, 1978.
[231] G. Pocock, C. D. Richards, and D. A. Richards. Human Physiology, 4th edn., Oxford
University Press, 2013.
[232] M.-Z. Poh, D. J. McDuff, and R. W. Picard. Advancements in noncontact, multiparameter
physiological measurements using a webcam. IEEE Trans. Biomed. Eng., 58(1):7–11,
2011.
[233] M. I. Polkey, M. Green, and J. Moxham. Measurement of respiratory muscle strength.
Thorax, 50(11):1131–1135, 1995.
[234] C. P. Pollak, W. W. Tryon, H. Nagaraja, and R. Dzwonczyk. How accurately does wrist
actigraphy identify the states of sleep and wakefulness? Sleep, 24(8):957-965, 2001.
[235] I. P. Priban and W. F. Fincham. Self-adaptive control and respiratory system. Nature,
208(5008):339–343, 1965.
[236] W. Prinz. Perception and action planning. Eur. J. Cognit. Psychol., 9(2):129–154, 1997.
[237] F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for compar-
ing induction algorithms. In Proc. 15th Int. Conf. Machine Learn. (ICML), pp. 445–453,
Madison, WI, 1998.
[238] P. Pudil and J. Novovičová. Floating search methods in feature selection. Pattern Recogn.
Lett., 15(11):1119–1125, 1994.
[239] J. R. Quinlan. C4.5: programs for machine learning, Morgan Kaufmann Publishers Inc.,
San Francisco, CA, 1993.
[240] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing, Prentice
Hall Press, 1975.
[241] L. Rabiner, A. Rosenberg, and S. Levinson. Considerations in dynamic time warping
algorithms for discrete word recognition. IEEE Trans. Acoust. Speech Signal Process.,
26(6):575-582, 1978.
[242] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria,
and E. Keogh. Searching and mining trillions of time series subsequences under dynamic
time warping. In Proc. Assoc. Comput. Mach. SIG Knowl. Discovery Data Mining (ACM
SIGKDD), pp. 262–270, 2012.
[243] A. N. Rama, S. C. Cho, and C. A. Kushida. Normal human sleep. In Sleep: A Compre-
hensive Handbook, edited by T. Lee-Chiong, Chap. 1, pp. 3-9, Wiley-Liss, New Jersey,
2006.
[244] J. Rasbash, F. Steele, W. J. Browne, and H. Goldstein. A User’s Guide to MLwiN, Centre
for Multilevel Modelling, Univ. of Bristol, Bristol, UK, 2009.
[245] C. A. Ratanamahatana and E. Keogh. Making time-series classification more accurate
using learned constraints. In Proc. SIAM Int. Conf. Data Mining (ICDM), pp. 11-22,
2004.
[246] S. W. Raudenbush and A. S. Bryk. Hierarchical Linear Models, Sage, Thousand Oaks,
CA, 2002.
[247] A. Rechtschaffen and A. Kales. A Manual of Standardized Terminology, Techniques and
Scoring System for Sleep Stages of Human Subjects. National Institutes of Health, Wash-
ington DC, 1968.
[248] S. J. Redmond and C. Heneghan. Cardiorespiratory-based sleep staging in subjects with
obstructive sleep apnea. IEEE Trans. Biomed. Eng., 53(3):485–496, 2006.
[249] S. J. Redmond, P. De Chazal, C. O’Brien, S. Ryan, W. T. McNicholas, and C. Heneghan.
Sleep staging using cardiorespiratory signals. Somnologie, 11(4):245–256, 2007.
[250] J. S. Richman and J. R. Moorman. Physiological time-series analysis using approximate
entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol., 278(6):H2039–2049,
2000.
[251] B. W. Riedel and K. L. Lichstein. Objective sleep measures and subjective sleep satis-
faction: How do older adults with insomnia define a good night’s sleep? Psychol. Aging,
13(1):159–163, 1998.
[252] D. Riemann, M. Berger, and U. Voderholzer. Sleep and depression – results from
psychobiological studies: an overview. Biol. Psychol., 57(1-3):67–103, 2001.
[253] A. Roebuck, V. Monasterio, E. Gederi, M. Osipov, J. Behar, A. Malhotra, T. Penzel, and
G. D. Clifford. A review of signals used in sleep analysis. Physiol. Meas., 35(1):R1–R57,
2014.
[254] R. Robillard, T. J. R. Lambert, and N. L. Rogers. Measuring sleep-wake patterns with
physical activity and energy expenditure monitors. Biol. Rhythm Res., 43(5):555-562,
2012.
[255] I. M. Rosen, P. A. Gimotty, J. A. Shea, and L. M. Bellini. Evolution of sleep quantity,
sleep deprivation, mood disturbances, empathy, and burnout among interns. Acad. Med.,
81(1):82-85, 2006.
[256] S. Rostig, J. W. Kantelhardt, T. Penzel, W. Cassel, J.-H. Peter, C. Vogelmeier, H. F.
Becker, and A. Jerrentrup. Nonrandom variability of respiration during sleep in healthy
humans. Sleep, 28(4):411–417, 2005.
[257] S. M. Ryan, A. L. Goldberger, S. M. Pincus, J. Mietus, and L. A. Lipsitz. Gender- and
age-related differences in heart rate dynamics: are women more complex than men? J.
Am. Coll. Cardiol., 24(7):1700–1707, 1994.
[258] A. Sadeh. The role and validity of actigraphy in sleep medicine: an update. Sleep Med.
Rev., 15(4):259–267, 2011.
[259] A. Sadeh, K. M. Sharkey, and M. A. Carskadon. Activity-based sleep-wake identification:
an empirical test of methodological issues. Sleep, 17(3):201–207, 1994.
[260] C. C. R. Sady, U. S. Freitas, A. Portmann, J.-F. Muir, C. Letellier, and L. A. Aguirre. Au-
tomatic sleep staging from ventilator signals in non-invasive ventilation. Comput. Biol.
Med., 43(7):833–839, 2013.
[261] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word
recognition. IEEE Trans. Acoust., Speech, Signal Process., AASP-26(1):43-49, 1978.
[262] B. Saletu, P. Wessely, P. Grünberger, and M. Schultes. Erste klinische Erfahrungen
mit einem neuen schlafanstoßenden Benzodiacepin Cinolazepam mittels eines
Selbstbeurteilungsbogens für Schlaf- und Aufwachqualität (SSA). Acad. Med.,
66(11):687-693, 1991.
[263] J. S. Samkoff and C. H. Jacques. A review of studies concerning effects of sleep depri-
vation and fatigue on residents’ performance. Acad. Med., 66(11):687-693, 1991.
[264] L. Samy, M.-C. Huang, J. Liu, W. Xu, and M. Sarrafzadeh. Unobtrusive sleep stage iden-
tification using a pressure-sensitive bed sheet. IEEE Sens. J., 14(7):2092–2101, 2014.
[265] J. P. Saul, R. F. Rea, D. L. Eckberg, R. D. Berger, and R. J. Cohen. Heart rate and
muscle sympathetic nerve variability during reflex changes of autonomic activity. Am. J.
Physiol., 258(3):H713–H721, 1990.
[266] C. Schäfer, M. G. Rosenblum, J. Kurths, and H.-H. Abel. Heartbeat synchronized with
ventilation. Nature, 392:239–240, 1998.
[267] A. Y. Schumann, R. P. Bartsch, T. Penzel, P. Ch. Ivanov, and J. W. Kantelhardt. Aging
effects on cardiac and respiratory dynamics in healthy subjects across sleep stages. Sleep,
33(7):943–955, 2010.
[268] E. Sforza, C. Jouny, and V. Ibanez. Cardiac activation during arousal in humans: further
evidence for hierarchy in the arousal response. Clin. Neurophysiol., 111(9):1611–1619,
2000.
[269] A. Sgoifo, C. Coe, S. Parmigiani, and J. Koolhaas. Individual differences in behavior and
physiology: causes and consequences. Neurosci. Biobehav. Rev., 29:1–2, 2005.
[270] K. Shafqat, S. K. Pal, S. Kumari, and P. A. Kyriacou. Time-frequency analysis of HRV
data from locally anesthetized patients. In Proc. 31st Ann. Int. Conf. IEEE Eng. Med.
Biol. Soc. (EMBC), pp. 1824–1827, Minneapolis, MN, 2009.
[271] S. S. Shapiro, M. B. Wilk, and H. J. Chen. A comparative study of various tests for
normality. J. Am. Stat. Assoc., 63(324):1343–1372, 1968.
[272] Z.-G. Shao. Network analysis of human heartbeat dynamics. Appl. Phys. Lett.,
96(7):073703, 2010.
[273] Z. Shinar, A. Baharav, Y. Dagan, and S. Akselrod. Automatic detection of slow-wave-sleep
using heart rate variability. In Computers in Cardiology (CinC), pp. 593–596,
Rotterdam, The Netherlands, 2001.
[274] Z. Shinar, S. Akselrod, Y. Dagan, and A. Baharav. Autonomic changes during wake-
sleep transition: a heart rate variability based approach. Auton. Neurosci., 130(1-2):17–
27, 2006.
[275] T. Shiomi, C. Guilleminault, R. Sasanabe, I. Hirota, M. Maekawa, and T. Kobayashi.
Augmented very low frequency component of heart rate variability during obstructive
sleep apnea. Sleep, 19(5):370–377, 1996.
[276] M. H. Silber, S. Ancoli-Israel, M. H. Bonnet, S. Chokroverty, M. M. Grigg-Damberger,
M. Hirshkowitz, S. Kapen, S. A. Keenan, M. H. Kryger, T. Penzel, M. R. Pressman, and
C. Iber. The visual scoring of sleep in adults. J. Clin. Sleep Med., 3(2):121–131, 2007.
[277] J. Sloboda and M. Das. A simple sleep stage identification technique for incorporation
in inexpensive electronic sleep screening devices. In Proc. IEEE Nat. Aero. Elect. Conf.
(NAECON), pp. 21–24, Dayton, OH, 2011.
[278] P. Smialowski, D. Frishman, and S. Kramer. Pitfalls of supervised feature selection.
Bioinformatics, 26(3):440–443, 2010.
[279] F. Snyder, J. A. Hobson, D. F. Morrison, and F. Goldfrank. Changes in respiration, heart
rate, and systolic blood pressure in human sleep. J. Appl. Physiol., 19(5):417–422, 1964.
[280] V. K. Somers, M. E. Dyken, M. P. Clary, and F. M. Abboud. Sympathetic neural
mechanisms in obstructive sleep apnea. J. Clin. Invest., 96(4):1897–1904, 1995.
[281] V. K. Somers, M. E. Dyken, A. L. Mark, and F. M. Abboud. Sympathetic-nerve activity
during sleep in normal subjects. N. Engl. J. Med., 328(5):303–307, 1993.
[282] K. Spiegel, R. Leproult, and E. Van Cauter. Impact of sleep debt on metabolic and endocrine
function. The Lancet, 354(9188):1435–1439, 1999.
[283] K. Spiegelhalder, L. Fuchs, J. Ladwig, S. D. Kyle, C. Nissen, U. Voderholzer, B. Feige,
and D. Riemann. Heart rate and heart rate variability in subjectively reported insomnia.
J. Sleep Res., 20(1pt2):137–145, 2011.
[284] M. Steriade. The corticothalamic system in sleep. Front. Biosci., 8:878–899, 2003.
[285] R. Stickgold. Sleep-dependent memory consolidation. Nature, 437(7063):1272–1278, 2005.
[286] S. H. Strogatz. Exploring complex networks. Nature, 410(6825):268–276, 2001.
[287] E. Tasali, R. Leproult, D. A. Ehrmann, and E. V. Cauter. Slow-wave sleep and the risk of
type 2 diabetes in humans. Proc. Natl. Acad. Sci. U.S.A., 105(3):1044–1049, 2008.
[288] Task Force of the European Society of Cardiology and the North American Society of
Pacing and Electrophysiology. Heart rate variability: standards of measurement, physio-
logical interpretation and clinical use. Circulation, 93:1043–1065, 1996.
[289] S. Telser, M. Staudacher, Y. Ploner, A. Amann, H. Hinterhuber, and M. Ritsch-Marte.
Can one detect sleep stage transitions for on-line sleep scoring by monitoring the heart
rate variability? Somnologie, 8(2):33–41, 2004.
[290] C. Texier and S. N. Majumdar. Wigner time-delay distribution in chaotic cavities and
freezing transition. Phys. Rev. Lett., 110(25):250602, 2013.
[291] TomTom Runner Cardio Monitor (retrieved in Jan. 2015). [Online] Available:
http://www.tomtom.com/products/your-sports/running.
[292] J. Trinder, J. Kleiman, M. Carrington, S. Smith, S. Breen, N. Tan, and Y. Kim. Auto-
nomic activity during human sleep as a function of time and sleep stage. J. Sleep Res.,
10(4):253–264, 2001.
[293] J. Trinder, M. Padula, D. Berlowitz, J. Kleiman, S. Breen, P. Rochford, C. Worsnop, B.
Thompson, and R. Pierce. Cardiac and respiratory activity at arousal from sleep under
controlled ventilation conditions. J. Appl. Physiol., 90(4):1455–1463, 2001.
[294] J. Trinder, F. Whitworth, A. Kay, and P. Wilkin. Respiratory instability during sleep
onset. J. Appl. Physiol., 73(6):2462–2469, 1992.
[295] W. W. Tryon. Issues of validity in actigraphic sleep assessment. Sleep, 27(1):158-165, 2004.
[296] M. Unser. Splines: a perfect fit for signal and image processing. IEEE Signal Proc. Mag.,
16(6):22–38, 1999.
[297] J. A. Van Alste and T. S. Schilder. Removal of base-line wander and power-line interference
from the ECG by an efficient FIR filter with a reduced number of taps. IEEE Trans.
Biomed. Eng., BME-32(12):1052–1060, 1985.
[298] P. Van De Borne, H. Nguyen, P. Biston, P. Linkowski, and J. P. Degaute. Effects of
wake and sleep stages on the 24-h autonomic control of blood pressure and heart rate in
recumbent men. Am. J. Physiol., 266(2):H548–H554, 1994.
[299] E. Vanoli, P. B. Adamson, L. Ba, G. D. Pinna, R. Lazzara, and W. C. Orr. Heart rate
variability during specific sleep stages: a comparison of healthy subjects with patients
after myocardial infarction. Circulation, 91:1918–1922, 1995.
[300] J. Virkkala, J. Hasan, A. Värri, S.-L. Himanen, and K. Müller. Automatic sleep stage
classification using two-channel electro-oculography. J. Neurosci. Meth., 166(1):109–
115, 2007.
[301] P. E. Wainwright, S. T. Leatherdale, and J. A. Dubin. Advantages of mixed effects models
over traditional ANOVA models in developmental studies: a worked example in a mouse
model of fetal alcohol syndrome. Develop. Psychobiol., 49(1):664–674, 2007.
[302] M. P. Walker and R. Stickgold. Sleep-dependent learning and memory consolidation.
Neuron, 44(1):121–133, 2004.
[303] T. Watanabe and K. Watanabe. Noncontact method for sleep stage estimation. IEEE
Trans. Biomed. Eng., 51(10):1735–1748, 2004.
[304] K. Watanabe, T. Watanabe, H. Watanabe, H. Ando, T. Ishikawa, and K. Kobayashi.
Noninvasive measurement of heartbeat, respiration, snoring and body movements of a
subject in bed via a pneumatic method. IEEE Trans. Biomed. Eng., 52(12):2100–2107,
2005.
[305] D. O. White, J. V. Weil, and C. W. Zwillich. Metabolic rate and breathing during sleep.
J. Appl. Physiol., 59(2):384–391, 1985.
[306] A. W. Whitney. A direct method of nonparametric measurement selection. IEEE Trans.
Comput., C-20(9):1100–1103, 1971.
[307] K. F. Whyte, M. Gugger, G. A. Gould, J. Molloy, P. K. Wraith, and N. J. Douglas.
Accuracy of respiratory inductive plethysmograph in measuring tidal volume during sleep. J.
Appl. Physiol. (1985), 71(5):1866–1871, 1991.
[308] T. Willemen, D. Van Deun, V. Verhaert, S. Pirrera, V. Exadaktylos, J. Verbraecken, B.
Haex, and J. Vander Sloten. Automatic sleep stage classification based on easy to register
signals as a validation tool for ergonomic steering in smart bedding systems. Work: J.
Prev. Ass. Rehabil., 41:1985-1989, 2012.
[309] T. Willemen, D. Van Deun, V. Verhaert, M. Vandekerckhove, V. Exadaktylos, J. Verbraecken,
S. V. Huffel, B. Haex, and J. Vander Sloten. An evaluation of cardio-respiratory
and movement features with respect to sleep stage classification. IEEE J. Biomed. Health
Inform., 18(2):661-669, 2014.
[310] P. Wohlfahrt, J. W. Kantelhardt, M. Zinkhan, A. Y. Schumann, T. Penzel, I. Fietze, F.
Pillmann, and A. Stang. Transitions in effective scaling behavior of accelerometric time
series across sleep and wake. Eur. Phys. Lett., 103(6):68002, 2013.
[311] R. Wolk, A. S. Gami, A. Garcia-Touchard, and V. K. Somers. Sleep and cardiovascular
disease. Curr. Probl. Cardiol., 30:625–662, 2005.
[312] M. Xiao, H. Yan, J. Song, Y. Yang, and X. Yang. Sleep stages classification based on
heart rate variability and random forest. Biomed. Signal Process. Control, 8(6):624–633,
2013.
[313] X. Xu, J. Zhang, and M. Small. Superfamily phenomena and motifs of networks induced
from time series. Proc. Natl. Acad. Sci. U.S.A., 105(50):19601-19605, 2008.
[314] D. Yankov, E. Keogh, J. Medina, B. Chiu, and V. Zordan. Detecting time series mo-
tifs under uniform scaling. In Proc. Assoc. Comput. Mach. SIG Knowl. Discovery Data
Mining (ACM SIGKDD), 2005, pp. 844–853.
[315] B. Yilmaz, M. H. Asyali, E. Arikan, S. Yetkin, and F. Özgen. Sleep stage and
obstructive apneaic epoch classification using single-lead ECG. Biomed. Eng. Online, 9:39,
2010.
[316] J. Yoo, L. Yan, S. Lee, H. Kim, and H.-J. Yoo. A wearable ECG acquisition system with
compact planar-fashionable circuit board-based shirt. IEEE Trans. Inf. Technol. Biomed.,
13(6):897–902, 2009.
[317] T. Young, P. E. Peppard, and D. J. Gottlieb. Epidemiology of obstructive sleep apnea: a
population health perspective. Am. J. Respir. Crit. Care Med., 165(9):1217–1239, 2002.
[318] C. Yu, Z. Liu, T. McKenna, A. T. Reisner, and J. Reifman. A method for automatic
identification of reliable heart rates calculated from ECG and PPG waveforms. J. Am.
Med. Inform. Assoc., 13(3):309–320, 2006.
[319] M. Zakrzewski, H. Raittinen, and J. Vanhala. Comparison of center estimation
algorithms for heart and respiration monitoring with microwave Doppler radar. IEEE Sens.
J., 12(3):627–634, 2012.
[320] J. Zhang and M. Small. Complex network from pseudoperiodic time series: topology
versus dynamics. Phys. Rev. Lett., 96(23):238701, 2006.
[321] G. Zhu, Y. Li, and P. Wen. Analysis and classification of sleep stages based on difference
visibility graphs from a single-channel EEG signal. IEEE J. Biomed. Health Inform.,
18(6):1813–1821, 2014.
[322] G. Zhu, Y. Li, and P. Wen. An efficient visibility graph similarity algorithm and its ap-
plication on sleep stages classification. Brain Inform. LNCS, 7670:185–195, 2012.
List of the author’s publications
Journal articles
1. X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Spectral boundary
adaptation on heart rate variability for sleep and wake classification. International Journal on
Artificial Intelligence Tools, 23(3):1460002, 2014.
2. X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Sleep and wake classi-
fication with actigraphy and respiratory effort using dynamic warping. IEEE Journal of
Biomedical and Health Informatics, 18(4):1272–1284, 2014.
3. P. Fonseca, J. Foussier, R. M. Aarts, and X. Long. A novel low-complexity
post-processing algorithm for precise QRS localization. SpringerPlus, 3:376, 2014.
4. X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Analyzing respiratory
effort amplitude for automated sleep stage classification. Biomedical Signal Processing
and Control, 14:197–205, 2014.
5. X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R. M. Aarts.
Measuring dissimilarity between respiratory effort signals based on uniform scaling for
sleep staging. Physiological Measurement, 35(12):2529–2542, 2014.
6. X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Foussier. Modeling cardiorespiratory
interaction during sleep with complex networks. Applied Physics Letters, 105(20):
203701, 2014.
7. M. S. Goelema, X. Long, and R. Haakma. Correlations between overnight breathing rate
variation and subjective sleep quality scores. Sleep-Wake Research in the Netherlands
(NSWO Jaarboek), 25:60–63, 2015.
8. X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Fonseca, and R. M. Aarts.
Erratum: Measuring dissimilarity between respiratory effort signals based on uniform
scaling for sleep staging (2014 Physiol. Meas. 35 2539). Physiological Measurement,
36(3):625, 2015.
9. X. Long, J. B. Arends, R. M. Aarts, R. Haakma, P. Fonseca, and J. Rolink. Time delay
between cardiac and brain activity during sleep transitions. Applied Physics Letters,
106(14):143702, 2015.
10. J. Rolink, M. Kutz, P. Fonseca, X. Long, B. Misgeld, and S. Leonhardt. Recurrence
quantification analysis across sleep stages. Biomedical Signal Processing and Control,
20:107–116, 2015.
11. X. Long, P. Fonseca, R. M. Aarts, R. Haakma, and J. Rolink. Detection of nocturnal slow
wave sleep based on cardiorespiratory activity. Submitted.
12. X. Long, R. Haakma, T. Leufkens, P. Fonseca, and R. M. Aarts. Effects of between- and
within-subject variability on autonomic cardiorespiratory activity during sleep and their
limitations on sleep staging: a multilevel analysis. Submitted.
13. P. Fonseca∗ , X. Long∗ , M. Radha, R. Haakma, R. M. Aarts, and J. Rolink. Sleep stage
classification with ECG and respiratory effort. Submitted. (∗ Joint first authorship)
14. P. Fonseca, R. M. Aarts, X. Long, and R. Haakma. Estimating actigraphy from motion
artifacts in ECG and respiratory effort signals. Submitted.
15. P. Fonseca, N. Den Teuling, X. Long, J. Rolink, and R. M. Aarts. Cardiorespiratory sleep
stage detection using conditional random fields. Submitted.
16. J. Werth, L. Atallah, P. Andriessen, X. Long, E. Zwartkruis-Pelgrim, and R. M. Aarts.
Unobtrusive sleep state measurements in preterm infants: a review. Submitted.
Conference articles and abstracts
1. X. Long, P. Fonseca, J. Foussier, R. Haakma, and R. M. Aarts. Using dynamic time
warping for sleep and wake discrimination. IEEE-EMBS International Conference on
Biomedical and Health Informatics (BHI’12), pp. 886–889, Hong Kong and Shenzhen,
China, Jan. 2012. (First Runner-up Student Paper Award)
2. X. Long, P. Fonseca, R. Haakma, R. M. Aarts, and J. Foussier. Time-frequency analysis
of heart rate variability for sleep and wake classification. 12th IEEE International
Conference on BioInformatics and BioEngineering (BIBE’12), pp. 85–90, Larnaca, Cyprus,
Nov. 2012. (Best Student Paper Award)
3. J. Foussier, P. Fonseca, X. Long, and S. Leonhardt. Automatic feature selection for
sleep/wake classification with small data sets. International Joint Conference on Biomed-
ical Engineering Systems and Technologies (BIOSTEC’13), pp. 178–184, Barcelona,
Spain, Feb. 2013.
4. X. Long, J. Foussier, P. Fonseca, R. Haakma, and R. M. Aarts. Respiration amplitude
analysis for REM and NREM sleep classification. 35th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society (EMBC’13), pp. 5017–5020,
Osaka, Japan, Jul. 2013.
5. P. Fonseca, X. Long, J. Foussier, and R. M. Aarts. On the impact of arousals on the
performance of sleep and wake classification using actigraphy. 35th Annual Interna-
tional Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’13),
pp. 6760–6763, Osaka, Japan, Jul. 2013.
6. J. Foussier, P. Fonseca, X. Long, B. Misgeld, and S. Leonhardt. Combining HRV
features for automatic arousal detection. Computing in Cardiology (CinC), pp. 1003–1006,
Zaragoza, Spain, Sep. 2013.
7. J. Foussier, X. Long, P. Fonseca, B. Misgeld, and S. Leonhardt. On the relationship of
arousals and artifacts in respiratory effort signals. International Conference on Health
Informatics, pp. 31–34, Vilamoura, Portugal, Nov. 2013. (Finalists Young Investigator
Award)
8. X. Long, R. Haakma, M. Goelema, T. Weysen, P. Fonseca, J. Foussier, and R. M.
Aarts. Self-dissimilarity of respiratory effort across sleep states and time. Sleep, vol. 37
(Abstract Supplement), p. A36, May 2014.
9. X. Long, P. Fonseca, R. Haakma, J. Foussier, and R. M. Aarts. Automatic detection
of overnight deep sleep based on heart rate variability: a preliminary study. 36th An-
nual International Conference of the IEEE Engineering in Medicine and Biology Society
(EMBC’14), pp. 50–53, Chicago, IL, Aug. 2014.
10. X. Long, R. Haakma, R. M. Aarts, P. Fonseca, and J. Foussier. Between-laboratory and
demographic effects on heart rate and its variability during sleep. Workshop on Smart
Healthcare and Healing Environments in conjunction with the European Conference on
Ambient Intelligence (AmI’14), pp. 1–4, Eindhoven, The Netherlands, Nov. 2014.
11. M. S. Goelema, X. Long, and R. Haakma. Gender effect found in the association be-
tween overnight breathing rate variation and reported sleep quality scores. Sleep, vol. 38
(Abstract Supplement), p. A60, Jun. 2015.
12. X. Long, R. Haakma, P. Fonseca, R. M. Aarts, M. S. Goelema, and J. Rolink. What
causes the differences in cardiac activity within and between subjects during sleep? Sleep,
vol. 38 (Abstract Supplement), p. A63, Jun. 2015.
13. X. Long, R. Haakma, J. Rolink, P. Fonseca, and R. M. Aarts. Improving sleep/wake
detection via boundary adaptation for respiratory spectral features. Submitted, 2015.
14. M.-M. Nano, X. Long, J. Werth, R. M. Aarts, and R. Heusdens. Sleep apnea detection
using time-delayed heart rate variability. Submitted, 2015.
Patent application filings
1. X. Long, R. Haakma, P. Fonseca, and R. M. Aarts. System and method for determining
spectral boundaries for sleep stage classification. Pending.
2. X. Long, P. Fonseca, N. den Teuling, R. Haakma, and R. M. Aarts. System and method
for slow wave sleep detection. Pending.
3. P. Fonseca, N. den Teuling, X. Long, R. Haakma, and R. M. Aarts. System and method
for cardiorespiratory sleep stage classification. Pending.
4. P. Fonseca, R. Haakma, R. M. Aarts, and X. Long. Actigraphy methods and apparatuses.
Pending.
Articles out of the thesis’s scope
1. X. Long, B. Yin, and R. M. Aarts. Single-accelerometer-based daily physical activity
classification. 31st Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC’09), pp. 6107–6110, Minneapolis, MN, Sep. 2009.
2. X. Long, S. Pauws, M. Pijl, J. Lacroix, A. Goris, and R. M. Aarts. Analysis and predic-
tion of daily physical activity level data using autoregressive integrated moving average
models. 3rd Workshop on Behaviour Monitoring and Interpretation (BMI’09), pp. 1–15,
Paderborn, Germany, Oct. 2009.
3. X. Long, W. Yin, L. An, H. Ni, L. Huang, Q. Luo, and Y. Chen. Churn analysis of online
social network users using data mining techniques. International MultiConference of
Engineers and Computer Scientists (IMECS’12), pp. 551–556, Hong Kong, Mar. 2012.
4. X. Long, M. Pijl, S. Pauws, J. Lacroix, A. Goris, and R. M. Aarts. Towards tailored phys-
ical activity health intervention: Predicting dropout participants. Health and Technology,
4:273–287, 2014.
5. X. Long, S. Pauws, M. Pijl, J. Lacroix, A. Goris, and R. M. Aarts. Predicting daily
physical activity in a lifestyle intervention program. In Ambient Intelligence and Smart
Environments, Vol. 9: Behaviour Monitoring and Interpretation – BMI, edited by B.
Gottfried and H. Aghajan, Part III, pp. 131-146, IOS Press, Amsterdam, The Netherlands,
2011.
Acknowledgements
I still remember the moment, more than four years ago, when I decided to pursue this PhD
project in Western Europe (Eindhoven, the Netherlands), 11472 km away from my home in the Far
East (Huizhou, China). It was not an easy decision, since it meant that I had to change my career
path and stay on the other side of the earth for the second time after finishing my master’s study
in Eindhoven in 2009, and this time it would be for much longer. Even so, I felt excited and full
of strength at that moment because it seemed that I had found my dream, a dream of dedicating
myself to what I am extraordinarily interested in; and then, after four years, you see this book.
Herewith, I would like to express my heartfelt appreciation to all of you who shared my
experience over the years.
First and foremost, I would like to express my deepest thanks and gratitude to my supervisors,
Prof. Ronald M. Aarts and Dr. Reinder Haakma. Thank you, Ronald, for your sincere advice and
encouragement on the long walks of both professional development and personal life that I have
taken over the past four years, as well as during my master’s period. Thank you, Reinder, for
masterfully and patiently coaching me through the doctoral work and for giving me the freedom
to explore my own ideas. I will never forget the discussions during our regular meetings,
which were always so inspiring and happy. I learned so much from you about how to energize
creative thinking during scientific research. I have always felt lucky to be under your supervision.
A special note of gratitude goes to Prof. Jan Bergmans, the chair of the Signal Processing
Systems (SPS) group at TU/e, who offered me this wonderful opportunity to pursue my doctorate degree.
Enormous thanks must go to my colleague Pedro Fonseca, who worked closely with me.
Your support and knowledge have been of great help in overcoming the obstacles I encountered.
I have benefited greatly from your critical reviews of my articles. Without your help, this
thesis could never have been finished. Particular thanks go to my second promoter, Prof. Johan
Arends, for your advice and discussions regarding the neurophysiological aspects of the work.
Many thanks also go to Jérôme Rolink and Mustafa Radha, who provided inspiring comments on my
manuscripts and made huge contributions to the algorithm framework of the project, and to
Dr. Sandrine Devot and Reimund Dratwa, who initiated the framework. I would also like to thank
Maaike Goelema, Dr. Tim Weysen, Dr. Tim Leufkens, Dr. Roy Raymann, Tine Smits, and Renske de
Bruijn for supporting the work with your expertise in psychology or physiology. My gratitude is
also due to the other former team members Adrienne Heinrich and Dr. Igor Berezhnyy, as well as
the former master students Jie Yang, Niek den Teuling, Xi Yang, Antonio Rebelo, Yuan Lu, and Xi
Zhang, who were involved in the project. Your enthusiastic attitude and hard work during
different phases of the project accelerated the success of my work. I feel fortunate to have had
you in the project. Special thanks go to Timothy A. Nathan, a senior intellectual property
counsel from the IP&S department at Philips United States, for your active responses that
expedited the approval of my work for publication and for helping me file several patent
applications.
I would also like to acknowledge the committee of this thesis, Prof. Panos Markopoulos,
Prof. Sabine Van Huffel (KU Leuven), Prof. Steffen Leonhardt (RWTH Aachen University),
and the chairman Prof. Peter de With, for their insightful comments on the thesis.
Most of the work presented in this thesis was conducted at Philips Research, the Netherlands.
For this reason, thanks must go to Marieke van der Hoeven, the head of the Brain, Body
& Behavior department, in whose group I spent the first two years, and to Dr. Jörg Habetha,
the head of the Personal Health department, where I had the honor to work during the past two
years. It was a pleasant time of my life for which I am grateful, and in which I met so many
knowledgeable and energetic scientists, from whom I learned a lot during coffee breaks,
lunch time, and offsite events that we had together. Dr. Michael Rooijakkers, thank you for
helping me present my work at the EMBC’14 conference in Chicago. Since I have also
been involved in a couple of other projects apart from my PhD work, I would like to thank Prof.
Guofu Zhou, Jan Werth, Dr. Peter Andriessen, Dr. Louis Atallah, Elly Zwartkruis-Pelgrim,
Marina Nano, and Dr. Richard Heusdens for the great discussions we had. Thanks also go
to the Philips and SPS secretaries as well as the other administrative staff who helped me
with many non-technical matters, such as providing instructions and ICT support at the
beginning of the project, arranging business trips and reimbursements, setting up
teleconferences, and preparing support letters for my parents’ and friends’ visits to the
Netherlands. I would also like to express my sincerest gratitude to Dr. Bin Yin and Dr. Steffen
Pauws for guiding my master’s project. You started me down the amazing road of scientific research.
During my life in the Netherlands, I have to admit that I owe a lot of thanks to my Chinese
friends and colleagues, who never let me feel lonely: Anmin, Liya, Yuanjia, Xiaoyin, BoC, Qing,
Tao, Tao, Wei, Jianhua, Bin, Rui, Pu, Wei, Quan, Shaoxiong, Yanan, Yan, Lin, Xin, Xiong and
so many others. In particular, I would like to give special thanks to Le, Wenyao, Xiaomin, Xin,
Dan, Tingyun, Joanne, Chen, Fei, and Anqi for different reasons. Last but not least, I want to
express my sincerest thanks to my parents for giving me life to see, listen, feel, and experience
this wonderful world, and to my relatives and friends in China for your support during the past
11586 days.
Going back 18 years to my childhood at middle school in 1997, I wrote an article for my
writing homework in which I dreamed of being awarded the Nobel Prize in Biomedicine in 2016,
although I had no idea what ‘Biomedicine’ literally meant. Unfortunately, since 2016 is coming
soon, I now realize that there is not even a Nobel Prize in Biomedicine, but who knows if that
will come in the future.
About the author
Xi Long was born on October 9, 1983 in Ganzhou, China, and
moved to Huizhou, China, in 1992. He received the B.Eng.
degree (with honors) in electronic and information engineering
from Zhejiang University, Hangzhou, China, in June 2006 and
the M.Sc. degree in electrical engineering (with a fully-funded
scholarship awarded by NXP) from Eindhoven University of
Technology, Eindhoven, the Netherlands, in August 2009. Dur-
ing the period between May 2008 and August 2009, he was a
research intern at Philips Research Eindhoven and worked on
accelerometer-based activity monitoring, supervised by Prof.
Ronald M. Aarts, Dr. Bin Yin, and Dr. Steffen Pauws. After that, from January 2010 until
June 2011, he worked for Tencent Inc., Shenzhen, China, with responsibilities for user research
and quantitative data analysis of web-based products and services.
From July 2011 to June 2015, he was a Ph.D. candidate in the Signal Processing Systems
group at the Eindhoven University of Technology, the Netherlands, funded by Philips Research
in Eindhoven, the Netherlands, under the supervision of Prof. Ronald M. Aarts and Dr. Rein-
der Haakma. At the same time, he joined the Brain, Body & Behavior group (and later the
Personal Health group) at Philips Research where he investigated autonomic markers and ma-
chine learning algorithms for sleep stage classification and objective sleep assessment. He has
published over thirty papers and reports, and holds four first US patent application filings and
over ten Philips invention disclosures. He was the recipient of the Best Student Paper Award
at the 12th IEEE International Conference on BioInformatics and BioEngineering (BIBE) in
2012 and the First Runner-up Student Paper Award at the IEEE-EMBS International Conference
on Biomedical and Health Informatics (BHI) in 2012. He serves as a reviewer for several
journals in biomedical engineering and healthcare, such as IEEE Journal of Biomedical and
Health Informatics, Biomedical Signal Processing and Control, and Physiological Measure-
ment. His research interests include objective sleep analysis, vital sign monitoring, unobtrusive
and wearable sensing, and biomedical signal processing as well as machine learning, time series
analysis, and computational models for medicine and healthcare.