Driver Drowsiness Detection Systems: Potential of Smart Wearable Devices to
Improve Vehicle Safety

Submitted by
Thomas Kundinger, M.Sc.
Submitted at
Institute for Pervasive Computing
Supervisor and First Evaluator
Prof. Priv.-Doz. Dr. Andreas Riener
Second Evaluator
Univ.-Prof. Dr. Florian Alt
Linz, June 2021
Doctoral Thesis
to obtain the academic degree of
Doktor der technischen Wissenschaften
in the Doctoral Program
Engineering Sciences
JOHANNES KEPLER
UNIVERSITY LINZ
Altenbergerstraße 69
4040 Linz, Österreich
www.jku.at
DVR 0093696
Statutory declaration
I hereby declare that the thesis submitted is my own unaided work, that I
have not used other than the sources indicated, and that all direct and indirect
sources are acknowledged as references. This printed thesis is identical with
the electronic version submitted.
Parts of this thesis have been published as international conference or journal
articles (see [1], [2], [3], [4], [5], [6], [7], [8], [9]). A statement with my
contributions to these publications is provided at the end of this thesis.
........................................................ ........................................................
Place, Date Thomas Kundinger
Abstract
Driver drowsiness is a major cause of fatal traffic accidents. Automated
driving might eventually counteract this problem by taking over more and more
of the driving task and thereby reducing human-made errors. However, at the
lower levels of automation, the driver is still responsible as a fallback
authority. Consequently, systems for reliably monitoring and detecting the
driver's current state, especially regarding the risk factor drowsiness, are
required. Current commercial drowsiness detection systems mainly focus on the
analysis of driving-related parameters. As the driving task becomes
increasingly automated, these parameters can no longer be evaluated to the
usual extent, since the automated system controls the vehicle more and more.
Techniques that include physiological measurements seem to be a particularly
promising alternative. However, in a dynamic environment such as driving, only
non- or minimally intrusive methods are accepted, and vibrations from the
roadbed could degrade sensor performance. Integrating consumer-grade smart
wearables into the vehicle could be a solution to these problems. Moreover,
existing vehicles could quickly be upgraded and retrofitted with this
technology without installing additional sensors. For this reason, and
encouraged by the ongoing progress in the development of smart wearable devices
in recent years, this work investigated the potential of applying their
recorded physiological data in an automotive environment. Experimental results
from three user studies demonstrate the potential and feasibility of driver
drowsiness detection based on physiological data from smart wearables. Several
aspects and open challenges in driver drowsiness detection that need to be
considered in further research are highlighted. The knowledge gained in this
work can thereby serve as a starting point and provide incentives for
researchers and automobile manufacturers to develop novel and intelligent
driver-vehicle interaction concepts for driver state monitoring on the way to
full driving automation. Safety on the roads needs to be further increased by
reducing fatal accidents caused by risk factors such as driver drowsiness.
Kurzfassung
Driver drowsiness is one of the main causes of fatal traffic accidents. In the
long term, automated driving could counteract this problem by taking over more
and more of the driving task and reducing human error. At the lower levels of
automation, however, the driver remains responsible as the fallback level.
Consequently, systems for the reliable monitoring and detection of the driver's
current state are required, particularly with regard to the risk factor
drowsiness. Current commercial drowsiness detection systems concentrate mainly
on the analysis of driving-related parameters. With the advancing automation of
the driving task, however, these parameters cannot be evaluated to the usual
extent, since the automated system increasingly takes over control of the
vehicle. Techniques that include physiological measurements in particular seem
to be a promising alternative. In a dynamic environment such as driving,
however, only non- or minimally intrusive methods are accepted. In addition,
vibrations from the roadbed can degrade sensor performance. One solution to
these problems could be to integrate smart wearables into the vehicle.
Moreover, existing vehicles could quickly be retrofitted with this technology
without additional sensors having to be installed. For this reason, and
encouraged by the continuing progress in the development of smart wearable
devices in recent years, this work investigated the potential of applying their
recorded physiological data in an automotive environment. Experimental results
from three user studies substantiate the feasibility of detecting driver
drowsiness on the basis of physiological data from smart wearables. Various
aspects and open challenges in the field of driver drowsiness detection that
must be addressed in further research are highlighted. The knowledge gained in
this work can serve as a starting point and offer researchers and automobile
manufacturers incentives for novel and intelligent interaction concepts between
driver and vehicle for driver state monitoring on the way to full driving
automation. Road safety must be further increased by reducing fatal accidents
caused by risk factors such as driver drowsiness.
Acknowledgments
This thesis and the corresponding research were completed at Johannes Kepler
University Linz, Austria, in cooperation with AUDI AG, Ingolstadt, Germany, and
Technische Hochschule Ingolstadt, Germany.
First and foremost, I would like to thank Prof. Priv.-Doz. Dr. techn. Andreas
Riener for supervising my research, for his constant support, and for his
valuable input.
Special thanks also go to my supervisors at AUDI AG, Dr. Nikoletta Sofra and
Gordon Großkopf, for their help and support. Moreover, I am grateful to my
colleagues from the User State Server team, who were always available for
advice.
Additionally, I would like to thank Univ.-Prof. Dr. Florian Alt of the
Universität der Bundeswehr München, Germany, for taking on the role of second
evaluator. I would also like to thank Univ.-Prof. Mag. Dr. Gabriele
Anderst-Kotsis for her participation in the examination committee and
Univ.-Prof. Dr. Armin Biere for chairing the defense.
Moreover, I would like to thank my colleagues from the Human-Computer
Interaction Group for their support, their helpful feedback, and the friendly
atmosphere during the doctoral program and my time at Technische Hochschule
Ingolstadt.
Finally, I would like to thank my family and friends for their constant support
and for reminding me that there is life besides the doctoral thesis.
Contents
List of Figures
List of Tables
List of Acronyms
1 Introduction
1.1 Motivation
1.2 Outline
2 Theoretical Background and State-of-the-Art of Drowsiness Detection
2.1 Drowsiness, Sleepiness and Fatigue
2.2 Driver Drowsiness Detection Methods - Advantages and Limitations
2.2.1 Subjective Measures
2.2.2 Behavioral Measures
2.2.3 Vehicle-based Measures
2.2.4 Physiological Measures
2.2.4.1 Non-/Less Intrusive Approaches for Measuring Physiological Signals inside a Vehicle
2.2.4.2 Driver Drowsiness Detection Using Wrist-Worn Wearable Devices
2.3 Problem Statement and Research Approach
2.3.1 Hypothesis and Research Questions
2.4 Previous Drowsiness Studies in Simulated and Realistic Environments
2.4.1 Simulated Environment
2.4.2 Realistic Environment
3 Baseline Studies and Subjective Evaluation
3.1 Study Setting
3.1.1 Participants
3.1.2 Data Collection
3.1.2.1 Subjective Measures
3.1.2.2 Objective Measures
3.1.3 Study Procedure
3.2 Study 1: Driving Simulator
3.2.1 Results
3.2.1.1 Questionnaires
3.2.1.2 Analysis of Self-Ratings
3.2.1.3 Discussion and Limitations
3.2.1.4 Main Findings
3.3 Study 2: Test Track
3.3.1 Results
3.3.1.1 Questionnaires
3.3.1.2 Analysis of Self-Ratings
3.3.1.3 Analysis of Heart Rate Data from Wearable Devices
3.3.1.4 Discussion and Limitations
3.3.1.5 Main Findings
4 Model Development: Driver Drowsiness Detection using Wrist-Worn Wearable Devices
4.1 Wrist-Worn Wearable vs. Medical-Grade Device
4.1.1 Method
4.1.1.1 Ground Truth for Drowsiness
4.1.1.2 Feature Extraction and Data Set Preparation
4.1.1.3 Classification of Driver Drowsiness
4.1.2 Results
4.1.2.1 Selected Features
4.1.2.2 Classification Results
4.1.3 Discussion and Limitations
4.1.4 Main Findings
4.2 Wrist-Worn Wearable vs. Wrist-Worn Wearable
4.2.1 Method
4.2.1.2 Feature Extraction and Data Set Preparation
4.2.1.3 Classification of Driver Drowsiness
4.2.2 Results
4.2.2.1 Selected Features
4.2.4 Main Findings
4.3 Ground Truth for Drowsiness: A Complexity Analysis
4.3.1 Related Work
4.3.1.1 Self-Ratings as Ground Truth
4.3.1.2 Observer Ratings as Ground Truth
4.3.1.3 Hybrid Ground Truth
4.3.2 Method
4.3.3 Results
4.3.5 Main Findings
5 Evaluation: Performance and Acceptance of a Driver Drowsiness Detection System based on Smart Wearables
5.1 Concept
5.2 Implementation
5.2.1 Related Work
5.2.2 Wrist-worn Smart Wearable Device
5.2.3 Application on Mobile Device
5.2.3.1 Backend
5.2.3.2 Frontend
5.3 User Study
5.3.1 Simulator Setup and Driving Simulation
5.3.2 Participants
5.3.3 Data Collection
5.3.3.1 Pre-Questionnaire
5.3.3.3 Post-Questionnaire
5.3.4 Study Procedure
5.4 Results
5.4.1 Pre-Questionnaire
5.4.2 Performance of Machine Learning Model
5.4.2.3 Outlook: Post-processing of Machine Learning Output
5.4.3 TAM - Technology Acceptance
5.4.4 UEQ - User Experience
5.4.5 Further results from post-questionnaire
5.5 Discussion and Limitations
5.6 Main Findings
6 Discussion
6.1 Preconditions for the Adaptation of Driver Drowsiness Detection Systems (RQ1)
6.2 Driver Drowsiness Detection with Vital Data from Smart Wearables (RQ2)
6.3 Acceptance of Drowsiness Detection Systems based on Smart Wearables (RQ3)
6.4 Further Deployment Scenarios for Drowsiness Detection Systems based on Smart Wearables
7 Conclusion
7.1 Summary
7.1.1 Recommendations for the Design and Development of Drowsiness Detection Systems
7.1.1.1 Preconditions for the Adaptation of Driver Drowsiness Detection Systems
7.1.1.2 Model Development for Driver Drowsiness Detection Systems using Vital Data from Smart Wearables
7.1.1.3 Acceptance of Drowsiness Detection Systems based on Smart Wearables
7.2 Limitations and Future Work
7.2.1 Study Settings
7.2.2 Model Development with Data from Wearable Devices
7.3 Contributions
A Publications and Contribution Statement
B German Versions of Study Questionnaires and Scales
B.1 Own Questions
B.2 Epworth Sleepiness Scale (ESS)
B.3 Karolinska Sleepiness Scale (KSS)
B.4 Trust Scale
B.5 User Experience Questionnaire (UEQ)
B.6 Technology Acceptance Model (TAM)
C German Version of Developed Android Application
Bibliography
List of Figures
1.1 Changing Role of Driver in Driving Automation
2.1 Driver Alertness Monitor
2.2 Cardiowheel
2.3 Wearable Device-based Drowsiness Detection Systems
2.4 Research Approach
3.1 Representation of KSS on Tablet
3.2 Wearable Devices on Participant's Wrists
3.3 Study Procedure
3.4 Study 1: Track for Simulator Ride
3.5 Study 1: Driving Simulator Setup
3.6 Study 1: Average KSS Ratings During and After Driving
3.7 Study 1: Average KSS Ratings for Manual and Automated Driving
3.8 Study 1: Average KSS Ratings for Young and Old Age Group
3.9 Study 1: Average KSS Ratings for Different Day Times
3.10 Study 1: Reached KSS Levels for Age Groups and Driving Modes
3.11 Study 2: Study Setup
3.12 Study 2: Test Area with Test Track
3.13 Study 2: Average KSS Ratings During and After Driving
3.14 Study 2: Average KSS Ratings for Manual and Automated Driving
3.15 Study 2: Average KSS Ratings for Young and Old Age Group
3.16 Study 2: Average KSS Ratings for Different Day Times
3.17 Study 2: Reached KSS Levels for Age Groups and Driving Modes
3.18 Study 2: Average Heart Rate for Manual and Automated Driving
3.19 Study 2: Average Heart Rate for Young and Old Age Group
4.1 Methodology for Model Development and Testing
4.2 RR Intervals in ECG Signal
4.3 Localization of Observer Ratings in Time
4.4 Sliding Window Approach for Feature Extraction
4.5 Leave-One-Subject-Out Cross-Validation (LOSOCV)
4.6 Average KSS Ratings for Automated Driving
4.7 Distribution of Self-Ratings across KSS Levels
4.8 Comparison of Average Heart Rate for Different Wearable Devices
4.9 Label Granularity for Drowsiness
4.10 Sample Image of Video File
4.11 Video Ratings for Participants in Manual and Automated Driving
4.12 Approach for Calculation of Correlations
5.1 Concept of Driver Drowsiness Detection System
5.2 Selected Implementations of Mobile Applications
5.3 Average Battery Consumption
5.4 Data Pre-Processing Algorithm
5.5 Screenshots of Developed Android Application
5.6 Study 3: Driving Simulator Setup
5.7 Technology Acceptance Model (TAM)
5.8 Study 3: Study Procedure
5.9 Study 3: Average KSS Ratings During and After Driving
5.10 Evaluation of KSS and Weinbeer Ratings
5.11 Evaluation of TAM Subscales
5.12 Evaluation of UEQ Subscales
5.13 Evaluation of UEQ Dimensions
5.14 UEQ Benchmark Evaluation
List of Tables
2.1 Karolinska Sleepiness Scale (KSS)
2.2 Observer Rating Scale by Weinbeer et al.
2.3 EEG Waves with Frequency Band and Measure
2.4 Advantages and Limitations of Different Drowsiness Measures
3.1 Pre-Questionnaire
3.2 Epworth Sleepiness Scale (ESS)
3.3 Post-Questionnaire
3.4 Trust Scale
3.5 Study 1: Results of Pre-Questionnaire
3.6 Study 1: Results of ESS
3.7 Study 1: KSS Level for Drowsiness Warning
3.11 Study 2: Usage of Wearable Devices
3.12 Study 2: Descriptive Statistics for Trust Scale
3.13 Study 2: Results from Correlation Analysis with Spearman
4.1 Allocation of Micro-Sleep Events to Drowsiness Level
4.2 Adjustment of Observer Ratings with Micro-Sleep Events
4.3 Distribution of Observer Ratings across Drowsiness Levels
4.4 Selected Features in UDT and UIT
4.5 Classification Results Wearable vs. Medical-Grade Device
4.6 Selected Classification Results of Specific Participants
4.7 Selected Features in 10-fold CV
4.8 Classification Results Wearable vs. Wearable Device
4.9 Distribution of Ratings across Drowsiness Levels
4.10 Distribution of Inconsistent Ratings
4.11 Results from Correlation Analysis for Six Levels of Drowsiness
4.12 Results from Correlation Analysis for Three Levels of Drowsiness
4.13 Results from Correlation Analysis for Two Levels of Drowsiness
5.1 Items of Technology Acceptance Model (TAM)
5.2 Items of User Experience Questionnaire (UEQ)
5.3 Post-Questionnaire
5.6 Classification Results
5.7 Confusion Matrix for Self-Ratings (5 min)
5.8 Confusion Matrix for Self-Ratings (1 min)
5.9 Confusion Matrix for Observer Ratings (1 min)
5.10 Performance Comparison Post-Processing Techniques
5.11 Statistical Analysis of TAM
5.12 Statistical Analysis of UEQ
5.14 Study 3: Usage of Wearable Devices
List of Acronyms
A Accuracy
Af After
AAA American Automobile Association
ACC Adaptive Cruise Control
ADAS Advanced Driving Assistance Systems
AIC Akaike Information Criterion
ANOVA Analysis Of Variance
ANS Autonomic Nervous System
ANT Adaptive Network Topology
ApEn Approximate Entropy
API Application Programming Interface
ATT Attitude
BIC Bayesian Information Criterion
BLE Bluetooth Low Energy
BN Bayesian Network
BVP Blood Volume Pulse
CFSS Correlation-Based Feature Subset Selection
CI Condence Interval
CS Compound Symmetry
CV Cross-Validation
D During
DDAW Driver Drowsiness and Attention Warning
DS Decision Stump
DT Decision Tree
ECG Electrocardiography
EDA Electrodermal Activity
EEG Electroencephalography
EMG Electromyography
EOG Electrooculography
ESS Epworth Sleepiness Scale
EU European Union
EuroNCAP European New Car Assessment Program
F F-Measure
FFT Fast Fourier Transform
FN False Negative
FP False Positive
GPS Global Positioning System

GSR Galvanic Skin Response
GUI Graphical User Interface
H Hypothesis
HF High-Frequency
HMM Hidden Markov Model
HR Heart Rate
HRV Heart Rate Variability
IBI Inter-Beat Interval
INT Intention
KNN K-Nearest Neighbor
KSS Karolinska Sleepiness Scale
LED Light-Emitting Diode
LF Low-Frequency
LKA Lane-Keeping Assist
LMM Linear Mixed Model
LOSOCV Leave-One-Subject-Out Cross-Validation
M Mean
Mdn Median
MLP Multilayer Perceptron
MSLT Multiple Sleep Latency Test
MWT Maintenance of Wakefulness Test
NB Naive Bayes
NDRT Non-Driving Related Task
NHTSA National Highway Traffic Safety Administration
NOA Number of Adjustments
NREM Non-Rapid Eye Movement
OR Observer Rating
OTA Over-The-Air
PART Partial Decision Tree
PEOU Perceived Ease Of Use
PERCLOS Percentage of Eyelid Closure
PPG Photoplethysmography
PSD Power Spectral Density
PSQI Pittsburgh Sleep Quality Index
PU Perceived Usefulness
REM Rapid Eye Movement
RF Random Forest
RHDV Right-Hand-Drive Vehicle
RMSE Root-Mean-Square Error
RMSSD Root Mean Sum of Squared Distance
RQ Research Question
RRADS Real Road Autonomous Driving Simulator
RT Random Tree
SAE Society of Automotive Engineers
SD Standard Deviation
SDK Software Development Kit
SDLP Standard Deviation of Lane Position
SMOTE Synthetic Minority Oversampling Technique
SR Self-Rating
SSS Stanford Sleepiness Scale
SVM Support Vector Machine
SVR Support Vector Regression
SWM Steering Wheel Movement
TAM Technology Acceptance Model
TN True Negative
TOR Take-Over Request
TP True Positive
UDT User-Dependent Test
UEQ User Experience Questionnaire
UIT User-Independent Test
UX User Experience
VLF Very Low-Frequency
WFCM Weighted Fuzzy C-Mean
1 Introduction
1.1 Motivation
Drowsiness describes a state of sleepiness and apathy that can lead to falling
asleep [10]. It is defined as a transitional state between wakefulness and
sleep [11] that reduces attention and alertness in all tasks performed [12].
Drowsiness can be caused, e.g., by sleep loss, sleep-inducing medications,
alcohol consumption, or misdiagnosed sleep disorders. In driving, drowsiness
can be hazardous and a cause of fatal traffic accidents. Even healthy people
without sleep problems can fall asleep while driving, e.g., during long trips
with few or no breaks. In a survey by the American Automobile Association (AAA)
Foundation, several drivers even admitted to driving in a state of drowsiness
[13]. According to the Global Status Report on Road Safety from 2018, road
accidents cause approximately 1.35 million deaths each year [14]. 90% of them
are attributed to human error, and a substantial share results from drowsy
driving [15, 16]. In the USA, for example, 16% of fatal crashes can be
attributed to drowsiness [17], and in Europe 20% [18]. A naturalistic driving
study by the National Highway Traffic Safety Administration (NHTSA) showed that
the risk of being involved in a crash or near-crash is almost four times higher
in a drowsy state [15]. Further, it was found that a high number of
drowsy-driving crashes involved a single vehicle with no co-passenger, and the
vehicle ran off the road at very high speed with no evidence of braking [15].
For the reasons mentioned, systems that monitor and detect the driver's current
state, particularly the risk factor drowsiness, can reduce drowsiness-related
crashes. These systems can increase road safety by issuing a warning in time
[16]. In addition to the evident need, the integration of driver monitoring
systems into the vehicle will shortly become mandatory for automobile
manufacturers in the European Union (EU) [19]. To express the importance and
relevance of these systems, international institutions and bodies have
integrated them into their programs. Under the General Safety Regulations of
the EU, a system for driver drowsiness and attention warning (DDAW) will be
compulsory for new vehicle types from July 2022 and for all vehicles to be
registered from July 2024 [19]. Further, in the 2025 Roadmap of the European
New Car Assessment Program (EuroNCAP), which assesses production cars in terms
of safety, driver monitoring is part of the safety assessments in the category
of primary safety [16].
Automated driving is expected to have an even higher impact on road safety. It
should further reduce or even completely avoid human-made errors, e.g., those
arising from the risk factor driver drowsiness. Particularly in the further
automation of the driving task, the automotive industry has seen rapid growth
in innovation in recent years. The capabilities of advanced driver assistance
systems (ADAS) in the vehicle are steadily increasing. These systems, such as
adaptive cruise control (ACC) or lane-keeping assist (LKA), are designed to
assist and take over the driving task in certain situations. The increasing
decoupling of the driver from the actual driving task through ADAS paves the
way for higher driving automation levels. Based on the taxonomy provided by the
Society of Automotive Engineers (SAE), the automation of the driving task is
categorized into different levels, from manual driving (SAE level 0) to full
automation of the driving task (SAE level 5) (see Figure 1.1) [20]. However,
full driving automation is not expected to hit the market before 2030 [21].
Figure 1.1: Changing role of the driver across the SAE levels of driving automation
(SAE J3016 [20]).
Until then, and looking at the different levels of automation in more detail,
the risk factor drowsiness and its reliable detection will still play a crucial
role. Across these levels, the driver's role changes from the sole operator in
manual driving (SAE level 0) to the fallback level (SAE levels 1-3) and finally
to the passenger (SAE levels 4-5) of an entirely automated system. Therefore,
in terms of driver drowsiness, the lower levels of automation, namely SAE level
1 (driving assistance), level 2 (partial automation), and level 3 (conditional
automation), require special attention. In level 1, the driver is supported by
ADAS that take over either the steering task or acceleration/deceleration in
certain situations, but never both at the same time. In level 2, the driver
must continuously monitor the system to intercede and take over control in
adequate time when asked. In level 3, the driver is released from all
monitoring obligations but has to be able to respond to a take-over request
(TOR) at any time. However, due to the decreased active involvement in driving,
the risk of becoming drowsy faster is increased. The duty and monotony of
observing during SAE level 2 driving result in a significant increase in
drowsiness after only 20 minutes compared to manual driving at SAE level 0
[2]¹. Serious accidents [22, 23] with currently available SAE level 2 automated
vehicles have shown how essential the role of the driver is. In these cases,
the human could not fulfill the fallback role. Monitoring complex systems for
extended periods is a challenging task even for highly motivated human beings
(cf. "irony of automation") [24]. Vigilance in such an environment is known to
degrade significantly within half an hour [24]; in other words, monitoring is
tiring. Therefore, in the transition phase from manual driving to the rollout
of full automation, systems that reliably monitor and recognize the driver's
current state, particularly the risk factor drowsiness, will be necessary to
guarantee the driver's ability to take over full control. Increased
implementation of driving automation is possible only if the vehicle and the
driver work together successfully.
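The changing driver role across the SAE levels described above can be summarized as a small lookup. The following Python sketch is purely illustrative and not part of the system developed in this thesis: the names DriverRole, driver_role, and needs_drowsiness_monitoring are hypothetical, and the rule that drowsiness monitoring is relevant whenever the human drives or serves as fallback is an assumption derived from the paragraph above.

```python
from enum import Enum


class DriverRole(Enum):
    """Driver roles named in the text (hypothetical type for illustration)."""
    OPERATOR = "sole operator"
    FALLBACK = "fallback level"
    PASSENGER = "passenger"


def driver_role(sae_level: int) -> DriverRole:
    """Map an SAE automation level (0-5) to the driver's role as described in the text."""
    if not 0 <= sae_level <= 5:
        raise ValueError(f"SAE level must be between 0 and 5, got {sae_level}")
    if sae_level == 0:
        return DriverRole.OPERATOR   # manual driving
    if sae_level <= 3:
        return DriverRole.FALLBACK   # levels 1-3: driver must be able to take over
    return DriverRole.PASSENGER      # levels 4-5: entirely automated system


def needs_drowsiness_monitoring(sae_level: int) -> bool:
    # Assumption for illustration: monitoring matters whenever the human
    # still drives or may have to take over control.
    return driver_role(sae_level) is not DriverRole.PASSENGER
```

For example, driver_role(2) yields the fallback role, which is the case in which the monotony of supervising the system was found to increase drowsiness.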
Many approaches and methods based on different measures have been proposed to
detect drowsiness in an automotive environment. These measures can be
categorized mainly into four groups: vehicle-based, behavioral, physiological,
and subjective measures [10, 25]. Different car manufacturers provide driver
assistance systems to counteract the potential risk of drowsiness, e.g., with a
rest recommendation [26, 27, 28, 29]. These commercial systems currently focus
mainly on vehicle-based measures, i.e., the analysis of parameters that relate
to driving behavior and indicate drowsiness, such as lane position or steering
angle [10]. However, with increasing levels of driving automation, these
parameters will be more difficult to evaluate since the automated system
controls the car. Therefore, alternatives are necessary. These alternatives
must not only guarantee reliable drowsiness detection during different stages
of automated driving; it must also be possible to retrofit existing vehicles
with as little effort as possible, since from July 2024 a system for drowsiness
detection is legally required within the EU for all cars to be registered [19].
A promising alternative to vehicle-based measures seems to be methods that
evaluate physiological data to identify driver drowsiness. This kind of data
can differentiate between wakefulness and sleep and makes it possible to warn
the driver at an early stage [10, 30]. Considering typical
Electroencephalography (EEG) or Electrocardiography (ECG) measurements in
laboratories, complex measuring devices, including electrodes on the head or
upper body, are required to obtain sufficient data quality. However, due to
their intrusiveness, measurements of this type are not accepted inside a
vehicle. Moreover, depending on the driving environment, disruptive factors
such as vibrations from the roadbed can lead to reduced data quality.
Therefore, new non- or less intrusive strategies for recording physiological
signals inside a vehicle are required.

¹ Own publications are highlighted in blue.
In consumer electronics, health monitoring and fitness tracking with wearable
devices have become widespread in recent years. This technology is so firmly
anchored in society that smartwatches and fitness trackers have become
mainstream. The increased number of consumers who monitor their health and
fitness in daily life with information such as the number of steps, heartbeat,
sleep quality, or calories consumed has resulted in a booming market. This
market might grow even further in the near future through slimmer designs,
expanded device connectivity, and, especially, more accurate sensors.
Statistics show that smartwatch shipments increased from 79 million in 2018 to
around 92 million in 2019 and are estimated to reach 131 million devices in
2023 [31]. These devices have greatly improved, particularly in the recording
of physiological signals, such as the heart rate, and provide results similar
to those of more advanced or even medical-grade devices [32, 5]. Along with
the recording of physiological data, most of these devices provide the option
of real-time data transfer via Bluetooth Low Energy (BLE) or other wireless
transmission technologies.
To counteract the limitations of existing drowsiness detection systems, and
encouraged by the ongoing progress in the development of smart wearable devices
in recent years, this Ph.D. thesis aimed to investigate the suitability of
wrist-worn wearable devices and the use of their recorded physiological data
in an automotive environment, in particular in the field of driver drowsiness
detection. In this way, their scope of use could be extended beyond daily
health and fitness monitoring. This kind of driver-vehicle interaction offers
the option of being easily integrated into a vehicle. Existing vehicles can
quickly be upgraded and retrofitted with this technology without having to
install additional sensors. Since these devices can be worn like a watch,
their use inside a vehicle would enable long-term recording of physiological
signals in a non-intrusive and driver-familiar way.
To reach these goals, this work builds on data collected in drowsiness studies.
Different modeling approaches and detection algorithms based on supervised
machine learning are investigated using physiological data from wrist-worn
wearable devices. In this context, it is further examined how well drowsiness
detection systems based on smart wearables are accepted and which
preconditions can be considered to adapt drowsiness detection systems and
improve their performance. With the findings obtained, a contribution is made
to the improvement of drowsiness detection systems and thus vehicle safety on
the way to the full automation of the driving task.
1.2 Outline
The remainder of this Ph.D. thesis is structured as follows: In Chapter 2,
theoretical background information and a state-of-the-art overview from
literature are presented. Based on the related work presented, the research
gap is highlighted, the problem statement for the Ph.D. thesis is derived, and
the research questions and research approach are introduced. In Chapter 3, the
study setting developed in this work and results from the subjective
evaluation of the baseline studies are presented. Chapter 4 comprises the
results from model development for driver drowsiness detection using
wrist-worn wearable devices. Further, this chapter discusses the difficulty of
finding a reliable and uniform ground truth for drowsiness. Chapter 5 presents
the evaluation study, in which a prototype for real-time driver drowsiness
detection is presented and evaluated regarding detection performance,
technology acceptance, and user experience. In Chapter 6, the results obtained
regarding the established research questions are discussed, and
recommendations for further research and the development of drowsiness
detection systems are given. Chapter 7 concludes the thesis by providing an
overview and summary of the presented work, highlighting the main
contributions, addressing limitations, and deriving future work.
2 Theoretical Background and State-of-the-Art of Drowsiness Detection
In this chapter, the term drowsiness and related constructs are explained
first. This is followed by an overview of state-of-the-art drowsiness
detection methods in research and on the market. Based on the related work
presented, the research gap is highlighted, the problem statement for the
Ph.D. thesis is derived, and the research approach and research questions are
presented. The last section presents and discusses drowsiness studies in
different environments from related work and derives implications for the
study setting developed in this work.
2.1 Drowsiness, Sleepiness and Fatigue
In related work on driver drowsiness detection, the terms drowsiness,
sleepiness, and fatigue are often used interchangeably. However, it is
possible and important to distinguish between them [35]. In the following,
relevant definitions are presented to provide a more profound understanding
of these terms.
Drowsiness is defined as "a transitional state between wakefulness and sleep in
which the 'sleep onset process' [...] has already begun, albeit intermittently,
and is likely to proceed to sleep" [11]. Within the automotive context,
Knipling and Wierwille defined drowsiness as follows: "[...] drowsiness is used
here to refer to the state of reduced alertness, usually accompanied by
performance and psychophysiological changes, which may result in loss of
alertness or being asleep at the wheel." [33].
The term sleepiness represents "a physiologic drive toward sleep" [34] and is
defined as a measure of a subject's tendency at a particular time to doze or
fall asleep that refers to a propensity of sleep [11]. The terms drowsiness
and sleepiness are often considered synonyms [34]. Drowsiness is the state
of being sleepy and drowsy [35]. Drowsiness/sleepiness² can change rapidly
within seconds, with periods of missing awareness of reality that can
sometimes lead to a micro-sleep event [35].

² In the following, the terms drowsiness and sleepiness are used synonymously.
In contrast, fatigue can be defined as "a subjective state of weariness, often
with muscle aches or discomfort, emotional irritability and a disinclination to
continue activities" [35]. Others defined it as "a reduced inclination for
activity, due to excessive extension in time or intensity of that activity"
[36] or as "a subjectively experienced disinclination to continue performing
the task at hand" [37]. The longer a mental or physical task or activity lasts
without any rest in between, the worse fatigue gets. In contrast to
drowsiness, which can be decreased by sleep, fatigue can be relieved by rest.
Further, fatigue does not cause a lack of awareness, as is the case with
drowsiness. When drivers are driving for an extended time, they can be
fatigued but do not have to be in a drowsy state. In many cases, drowsiness
and fatigue occur simultaneously, which might be why these constructs are
often used as synonyms [35].
From the definitions presented, it can be seen that drowsiness/sleepiness and
fatigue are distinguishable. However, many people in research, industry, and
other areas dealing with road safety topics lack understanding and knowledge
of the precise definitions and demarcation of these three terms when
applying them in their daily work. From a safety perspective, drowsiness is
the more dangerous and relevant state because of the lack of awareness it
causes. Therefore, early and reliable detection while driving needs to be
ensured.
2.2 Driver Drowsiness Detection Methods - Advantages and Limitations
In the following, current driver drowsiness measures and detection methods
based on subjective, behavioral, vehicle-based, and physiological measures are
presented [10, 25]. Only rarely is a single measure applied; more often, a
combination of different measures in the form of a hybrid system is considered
to increase the performance of the drowsiness detection system. Based on
their advantages and limitations, this work is distinguished from them.
2.2.1 Subjective Measures
Subjective measures include self- and observer ratings. In order to generate a
reference or ground truth of the driver's drowsiness state, e.g., for
performance validation of existing methods, as was also done in the course of
this Ph.D. project, subjective measures are often considered to be the best
method [1, 38, 39]. Self-ratings (SR) involve questionnaires administered at
regular intervals or in certain situations under specific conditions. Some of
the commonly used tests for self-assessment include the Epworth Sleepiness
Scale (ESS) [40], the Multiple Sleep Latency Test (MSLT) [41], the Maintenance
of Wakefulness Test (MWT) [42], the Stanford Sleepiness Scale (SSS) [43], and
the Pittsburgh Sleep Quality Index (PSQI) [44]. The most commonly used
self-rating scale, which is also applied in this work, is the Karolinska
Sleepiness Scale (KSS) (see Table 2.1), a 9-point Likert scale. Several
previous validation studies found that this subjective scale can be related to
different objective measures, which supports its suitability as a valid
indicator of sleepiness [45, 46, 47, 48].
Level Description
1 extremely alert
2 very alert
3 alert
4 rather alert
5 neither alert nor sleepy
6 some signs of sleepiness
7 sleepy; no effort to keep awake
8 sleepy; some effort to keep awake
9 very sleepy; sleep fighting
Table 2.1: Karolinska Sleepiness Scale (KSS) [49].
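For later processing, the scale in Table 2.1 can be held in a simple lookup structure. The following Python sketch is only illustrative: the names `KSS_LEVELS`, `describe_kss`, and `is_drowsy`, and in particular the binarization threshold of 7, are assumptions made for this example and do not describe the labeling scheme used in this work.

```python
# Lookup of the KSS levels from Table 2.1 (wording as in the table).
KSS_LEVELS = {
    1: "extremely alert",
    2: "very alert",
    3: "alert",
    4: "rather alert",
    5: "neither alert nor sleepy",
    6: "some signs of sleepiness",
    7: "sleepy; no effort to keep awake",
    8: "sleepy; some effort to keep awake",
    9: "very sleepy; sleep fighting",
}


def describe_kss(level: int) -> str:
    """Return the verbal anchor for a KSS self-rating (1 to 9)."""
    if level not in KSS_LEVELS:
        raise ValueError(f"KSS level must be an integer from 1 to 9, got {level!r}")
    return KSS_LEVELS[level]


def is_drowsy(level: int, threshold: int = 7) -> bool:
    """Binarize a KSS rating.

    The default threshold is a hypothetical choice for illustration only.
    """
    describe_kss(level)  # raises ValueError for ratings outside the scale
    return level >= threshold
```

With these assumptions, `is_drowsy(8)` evaluates to `True` and `is_drowsy(3)` to `False`; a different threshold can be passed when another split of the scale is preferred.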
For observer ratings (OR) of drowsiness, experts or trained raters observe the
driver either in real time [50] or by watching videos recorded during an
experiment [51]. The driver's state is assessed based on sleep-induced
indicators and behavioral changes in the facial region, such as the eyelid
position, blink frequency, and facial muscle activity [52, 51, 38, 39, 53].
The most commonly used observer rating scale was published by Wierwille and
Ellsworth [52]. In this Ph.D. thesis, the drowsiness scale published by
Weinbeer et al. was applied [50]. This scale categorizes drowsiness into six
levels with drowsiness indicators per level as a reference for the observers
(see Table 2.2). It is based on the scale by Wierwille and Ellsworth [52] and
was modified with findings of Wiegand et al. [54] and Karrer-Gauß [55].
Level  Description        Indicators
1      not drowsy         appearance of alertness present; normal facial tone;
                          normal fast eye blinks; short ordinary glances;
                          occasional body movements/gestures
2      slightly drowsy    still sufficiently alert; less sharp/alert looks;
                          longer glances; slower eye blinks; first mannerisms
                          as: rubbing face/eyes, scratching, facial
                          contortions, moving restlessly in seat
3      moderately drowsy  mannerisms; slower eye lid closures; decreasing
                          facial tone; glassy eyes; staring at fixed position
4      drowsy             eyelid closures (1-2s); eyes rolling sideways; rarer
                          blinks; no proper focused eyes; decreased facial
                          tone; lack of apparent activity; large isolated or
                          punctuating movements
5      very drowsy        eyelid closures (2-3s); eyes rolling upward/sideways;
                          no proper focused eyes; decreased facial tone; lack
                          of apparent activity; large isolated or punctuating
                          movements
6      extremely drowsy   eyelid closures (4s or more); falling asleep; longer
                          periods of lack of activity; movements when
                          transition in and out of dozing
Table 2.2: Observer rating scale by Weinbeer et al. [50].
The advantage of subjective measures is the consideration of personal feelings
and impressions. However, since this type of measurement relies either on the
driver's response or an external observer, the application in real-time and a
real-world driving scenario is impossible. Therefore, these measures are mainly
used as reference metrics or ground truth for drowsiness.
2.2.2 Behavioral Measures
Behavioral drowsiness detection techniques measure driver drowsiness by
evaluating behavioral changes in the driver's face induced by reduced alertness
[56]. This technique has been investigated for a long time [33, 57]. For
this purpose, in most cases, the driver is monitored using cameras mounted
inside the car and directed towards the driver's face. Due to advancements
in camera technology combined with novel approaches in computer vision
and image processing, the evaluation of behavioral measures in the context
of camera-based drowsiness detection has received more and more attention in
recent years [58]. These methods evaluate mainly three parameters:
eye movements (eye blinking, eye closure activity) via eye-tracking, facial
expressions (yawning, jaw drop, brow rise, lip stretch), and head position
(head scaling/nodding/rotation) [25]. One of the most commonly applied and
important behavioral parameters is the percentage of eyelid closure (PERCLOS)
[59, 60, 61, 62]. Drowsiness is assessed by calculating the proportion of time
in a defined time interval during which the eyes were 80% to 100% closed (see
Equation 2.1 [4]).
\[
\mathrm{PERCLOS} = \frac{\text{No. frames of closed eyes}}{\text{interval considered} - \text{blinking time}} \tag{2.1}
\]
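A minimal Python sketch of how Equation 2.1 could be evaluated on frame-wise eyelid closure values is given below. It is an illustration under stated assumptions rather than the implementation used in this work: the input is assumed to be one closure value per video frame between 0 (fully open) and 1 (fully closed), and closed runs of at most `blink_max_frames` frames are assumed to be regular blinks that are excluded from the interval. The function names and the default of three frames are hypothetical.

```python
from typing import List, Sequence, Tuple


def closed_runs(closure: Sequence[float], threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Return (start, length) of each run of frames with closure >= threshold."""
    runs = []
    start = None
    for i, value in enumerate(closure):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(closure) - start))
    return runs


def perclos(closure: Sequence[float], threshold: float = 0.8,
            blink_max_frames: int = 3) -> float:
    """Frames of closed eyes divided by (interval considered - blinking time).

    Assumption: closed runs of up to blink_max_frames frames count as blinks.
    """
    runs = closed_runs(closure, threshold)
    blink_frames = sum(length for _, length in runs if length <= blink_max_frames)
    closed_frames = sum(length for _, length in runs if length > blink_max_frames)
    considered = len(closure) - blink_frames
    if considered <= 0:
        raise ValueError("no frames left after removing blinks")
    return closed_frames / considered
```

For example, a 30-frame window with one two-frame blink and one ten-frame closure yields 10 / (30 - 2). In practice, the blink criterion would be expressed in seconds and converted with the camera frame rate.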
Many studies focused on using machine learning and deep learning-based
approaches [63, 64, 65, 66, 67]. Apart from research, numerous commercial
products are available that rely on behavioral measures for drowsiness
detection. These range from camera-based methods [68, 69] to glasses [70].
A particular advantage of behavioral measures is their non-intrusiveness,
which makes their use inside a vehicle very customer-friendly. However, blink
frequencies and amplitudes that differ from individual to individual can
impact the quality of the monitoring system [71]. Moreover, insufficient
lighting, sunglasses, or partial visibility of the head caused by, for
example, an unusual sitting position can limit the performance of the
monitoring system [72, 25]. Another main limiting factor is privacy. Some
drivers may not want to be filmed continuously and could turn off the camera
in the car menu or even tape it off.
2.2.3 Vehicle-based Measures
For vehicle-based drowsiness detection, the vehicle and driving environment
are monitored, and the driver's driving activities are evaluated using
integrated sensors in the car [25]. Evidence gathered from sleep-related
crashes suggests that they exhibit characteristics such as higher speed
with little or no braking [73], the vehicle leaving the road [74], crashes
occurring on high-speed monotonous roads, the driver not attempting to avoid
the crash [75], or the driver being alone in the vehicle [76]. These
characteristics suggest that a drowsy driver's car reflects specific patterns
that can be measured and applied to predict potential drowsy driving [77]. The
two most commonly used measures are steering wheel movement (SWM) and standard
deviation of lane position (SDLP) [25], which were applied in several previous
works [78, 79, 80, 81, 82, 83]. SWM evaluates unnatural steering behavior
induced by drowsiness using a steering angle sensor or accelerometer to
determine the driver's drowsiness. A drowsy driver makes fewer
micro-corrections than an alert driver [84, 85]. An SDLP system uses a camera
mounted on the car to determine the car's relative position in the driving
lane, i.e., the deviation from the lane's center-line. A drowsy driver might
cross the current driving lane abruptly, causing crashes. If the car is found
to be crossing the lane or approaching the sides of the lane, the driver is
alerted [79].
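The SDLP measure described above can be sketched in a few lines of Python. The sketch assumes that a lane-tracking camera already delivers the lateral offset from the lane's center-line in meters; the function names and the alert limit of 0.3 m are hypothetical values chosen for illustration and are not taken from the cited works.

```python
import statistics
from typing import Sequence


def sdlp(lateral_offsets_m: Sequence[float]) -> float:
    """Sample standard deviation of the lateral offset from the lane center (m)."""
    if len(lateral_offsets_m) < 2:
        raise ValueError("at least two lane position samples are required")
    return statistics.stdev(lateral_offsets_m)


def lane_keeping_alert(lateral_offsets_m: Sequence[float],
                       sdlp_limit_m: float = 0.3) -> bool:
    """Flag a window whose SDLP exceeds a limit (hypothetical default value)."""
    return sdlp(lateral_offsets_m) > sdlp_limit_m
```

A window with small offsets such as `[0.02, -0.01, 0.00, 0.01, -0.02]` stays below the limit, whereas a weaving pattern such as `[0.5, -0.4, 0.6, -0.5, 0.4]` exceeds it. A deployed system would compute the value over a sliding time window.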
Vehicle-based measures are non-intrusive and show high potential in detecting
drowsiness. They represent the current standard measure for driver drowsiness
and have been applied in series production cars of well-known car
manufacturers for several years [26, 27, 28, 29]. However, their reliability
is influenced by driving expertise, geometric and environmental conditions,
and intoxication [10]. During automated driving, these parameters will be more
challenging to evaluate and will largely no longer be available since the
automated system controls the vehicle [86]. Therefore, alternatives are
required when ADAS such as ACC and LKA are activated and take over parts of
the driving task.
2.2.4 Physiological Measures
Since this Ph.D. thesis focuses on using physiological data from consumer-grade
wearable devices for drowsiness detection, drowsiness detection methods
applying physiological measures are described in more detail. In this context,
the research gap in which this Ph.D. thesis is located is highlighted again.
Physiologically based drowsiness detection methods mainly evaluate signals
from four categories: brain, skin/muscle, eye, and heart [87].
In the case of brain signals, the brain's activity is measured with EEG using
several electrodes on the scalp. EEG is often considered the gold standard and
most reliable indicator of drowsiness, especially when it comes to the transition
between wakefulness and sleep. From EEG, information can be obtained from
waves (frequency bands) that are continuously generated by impulses in the
brain. Rechtschaffen and Kales established distinct sleep stages [88] that were
further developed and modified by the American Academy of Sleep Medicine
[89]. Therein, rapid eye movement (REM) sleep is separated from non-rapid
eye movement (NREM) sleep. NREM is further categorized into light sleep
(stages N1 and N2) and deep sleep (N3). These stages can also be derived
from the EEG waves in the frequency range of 8 Hz and below. In terms of
wakefulness, drowsiness, and sleep, the waves presented in Table 2.3 can be
considered.
Wave   Frequency  Measure
BETA   13-30 Hz   alertness
ALPHA  7.5-13 Hz  relaxation
THETA  4-8 Hz     drowsiness
DELTA  0.5-4 Hz   sleep
Table 2.3: EEG waves with frequency band and measure [25, 90].
The delta band is in the frequency range from 0.5 to 4 Hz and provides
information on sleep, whereas the theta band with a frequency range of 4 to
8 Hz reflects drowsiness. The alpha band ranges from 7.5 to 13 Hz and reflects
relaxation, i.e., the onset of sleep and the early stages of drowsiness. The
beta band is associated with wakefulness and alertness and lies in the range
from 13 to 30 Hz [25, 90]. For drowsiness detection, EEG was applied in
numerous studies [90, 91, 92, 93, 94, 95].
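To illustrate how the bands in Table 2.3 relate to a recorded signal, the following Python sketch sums the spectral power of a single-channel signal per band using a naive discrete Fourier transform. This is a didactic assumption, not the processing used in the cited studies, which typically rely on more efficient and robust spectral estimators. The names `BANDS`, `band_powers`, and `dominant_band` are hypothetical.

```python
import cmath
import math
from typing import Dict, Sequence

# Frequency bands from Table 2.3 as (lower, upper) limits in Hz.
# ALPHA and THETA overlap between 7.5 and 8 Hz, as they do in the table,
# so power in that range is counted for both bands.
BANDS = {
    "BETA": (13.0, 30.0),
    "ALPHA": (7.5, 13.0),
    "THETA": (4.0, 8.0),
    "DELTA": (0.5, 4.0),
}


def band_powers(signal: Sequence[float], fs: float) -> Dict[str, float]:
    """Sum the squared DFT magnitudes of `signal` (sampled at fs Hz) per band."""
    n = len(signal)
    if n < 4:
        raise ValueError("signal is too short for a spectral estimate")
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):  # positive frequencies below the Nyquist limit
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        power = abs(coeff) ** 2
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


def dominant_band(signal: Sequence[float], fs: float) -> str:
    """Return the name of the band that carries the most power."""
    powers = band_powers(signal, fs)
    return max(powers, key=powers.get)
```

For a synthetic 6 Hz sine wave sampled at 64 Hz for two seconds, `dominant_band` returns `"THETA"`. Real EEG would first require artifact removal and windowing, which are omitted here.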
Systems based on muscle signals use electromyography (EMG) to measure and
record changes in the electric potential of the skin caused by muscle cells
[96, 97, 98, 99]. Drowsiness detection with EMG assumes an increase in
amplitude and a decrease in mean frequency [100]. In terms of skin signals,
electrodermal activity (EDA) or galvanic skin response (GSR), measured through
skin conductance and resistance, is applied for drowsiness detection. This
phenomenon reflects the changes of sweat on the human skin, which can be
related to the current physical state of a person [101, 102, 103, 104].
Eye-based signals can be gathered using electrooculography (EOG) by attaching
electrodes to the right and left side of the eye to measure its movements
[105, 106, 107, 108, 109]. This signal provides information about the blinking
pattern on the one hand and about eye movements on the other hand.
Specifically, the potential difference between the cornea and retina is
measured [110, 111]. Slower eye-rolling movements represent a transition
between wakefulness and sleep, whereas the saccade speed is an indicator of
vigilance [112]. Vigilance is characterized by fast eye movements, which are
replaced by slower rolling movements during the process of getting sleepy.
Reduced and rare eye blinks indicate drowsiness [113].
To identify driver drowsiness from heart signals, cardiac activity is measured
and analyzed using ECG [114, 115, 116]. One parameter applied in driver
drowsiness detection that can be easily derived from the ECG signal is the
heart rate. This parameter varies significantly between wakefulness and sleep
[10, 30]. Another physiological parameter that is used particularly often in
driver drowsiness detection is heart rate variability (HRV) [117, 118, 119].
Here, the changes in the length of the RR intervals are examined, i.e., the
time elapsed between two successive R waves of the QRS complex on the ECG and
thus between two heartbeats. By performing HRV analysis on ECG signals, the
activity of the autonomic nervous system (ANS) can be assessed [117]. This
activity alters with stress and drowsiness [120]. The two major components of
the ANS are the sympathetic and parasympathetic nervous systems. A decrease in
parasympathetic activity and an increase in sympathetic activity are
associated with a person's vigilance, whereas an increase in parasympathetic
activity and a decrease in sympathetic activity indicate relaxation [120]. The
normalized low-frequency (LF) band power of HRV follows the dominance of
sympathetic activity, whereas high-frequency (HF) power is associated with
parasympathetic activity [121]. Therefore, investigating ANS activity may help
to gain deeper insights into a driver's drowsiness [122].
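As a simple illustration of HRV analysis on RR intervals, the following Python sketch derives RR intervals from R-peak timestamps and computes two widely used time-domain summaries, SDNN and RMSSD. These two metrics are not named in the text above and are chosen here only because they need no spectral estimation; the LF and HF band powers mentioned above would additionally require resampling the RR series and estimating its spectrum. All function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def rr_intervals_ms(r_peak_times_s: Sequence[float]) -> List[float]:
    """Time between successive R peaks, converted from seconds to milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def sdnn(rr_ms: Sequence[float]) -> float:
    """Sample standard deviation of the RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    return statistics.stdev(rr_ms)


def rmssd(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive RR interval differences (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For the RR series `[800, 810, 790, 800]`, the successive differences are 10, -20, and 10 ms, so `rmssd` returns the square root of 200. Ectopic beats and detection errors would have to be removed before such metrics are interpreted.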
Furthermore, photoplethysmography (PPG) is applied to obtain physiological
signals. PPG is a low-cost optical measurement technique for detecting blood
volume changes, mostly using a pulse oximeter at the skin surface. The skin is
illuminated with a light-emitting diode (LED), and changes in light absorption
are measured, i.e., the amount of light reflected to a photodiode [123, 124].
In recent years, this technology has gained popularity through health and
fitness monitoring in consumer electronics with wearable devices, such as
smartwatches and fitness trackers. PPG is mainly applied for measuring
real-time heart rate [125, 126] or, with more advanced devices, RR intervals
for HRV analysis [127]. For driver drowsiness detection, PPG was applied in
several previous works [128, 129, 103, 114].
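How a heart rate can be derived from a PPG waveform is sketched below in Python. The sketch assumes a clean, evenly sampled signal and uses a deliberately simple peak detector (local maxima above the signal mean); consumer devices use proprietary and considerably more robust algorithms, so this is not a description of any particular product. The function names are hypothetical.

```python
from typing import List, Sequence


def detect_peaks(signal: Sequence[float], min_height: float) -> List[int]:
    """Indices of local maxima that exceed min_height."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > min_height and signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def heart_rate_bpm(signal: Sequence[float], fs: float) -> float:
    """Mean heart rate from the intervals between detected pulse peaks.

    Assumption: the signal mean is a usable height threshold for the peaks.
    """
    threshold = sum(signal) / len(signal)
    peaks = detect_peaks(signal, threshold)
    if len(peaks) < 2:
        raise ValueError("fewer than two pulse peaks detected")
    intervals_s = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals_s) / len(intervals_s))
```

Applied to a synthetic 1.2 Hz sine wave sampled at 50 Hz, the function returns a value close to 72 beats per minute. The same peak-to-peak intervals could serve as an approximation of RR intervals for HRV analysis, subject to the accuracy of the peak detection.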
Given the presented advantages and limitations of subjective, behavioral, and
vehicle-based measures (see Table 2.4), techniques that include physiological
measurements in particular seem to be a promising alternative. Physiological
measures are reliable and accurate and show high potential in differentiating
between wakefulness and sleep [10, 30]. They change in the very early stages
of drowsiness, which makes it possible to warn the driver in time [10].
However, a major disadvantage is their intrusiveness, e.g., when adhesive
electrodes are attached to the subject's chest or scalp to measure ECG or EEG.
Moreover, in a dynamic environment such as driving, only non- or minimally
intrusive methods are accepted, and vibrations from the roadbed could degrade
the quality of the sensor data. Therefore, new and less intrusive monitoring
strategies for recording physiological signals inside a vehicle are required.
Measures       Parameters                  Advantages                Limitations
subjective     questionnaire               takes personal feeling    not real-time
                                           into account
behavioral     eye movements, facial       non-intrusive             interpersonal accuracy,
               expressions, head position                            privacy
vehicle-based  SWM, SDLP                   real-time, non-intrusive  not available in
                                                                     automated driving
physiological  EEG, ECG, EOG, EMG          reliable                  intrusive
Table 2.4: Summary of advantages and limitations of different types of drowsiness
measures.
In research and on the market, several non- and less intrusive approaches for
measuring physiological signals inside a vehicle were proposed and will be dis-
cussed in the following section.
2.2.4.1 Non-/Less Intrusive Approaches for Measuring Physiological Signals inside a Vehicle
In research, experiments were conducted to measure heart rate or ECG for HRV
analysis via integrated sensors on the steering wheel [130, 131, 132]. However,
their usage is limited in the context of automated driving since one or even
both hands have to touch the steering wheel for a longer time. Further,
additional sensors need to be integrated into the steering wheel. In another
approach, the driver's breathing rate was captured via real-time image
recognition; results show that the type of clothing can reduce system
performance [133]. As part of a research project funded by the EU, bio-sensors
were built into car seat fabrics and seatbelts to measure heart rate and
respiration [134]. For this, however, each driver's seat would have to be
equipped or retrofitted with the required sensors, which would be associated
with considerable costs. In another project, EEG, EOG, EMG, and EDA were
recorded for micro-sleep detection with a device worn behind both ears.
Limiting factors are noise artifacts due to sweating and hydration. Further,
to ensure optimal contact of the device with the skin, wet electrodes using a
specific gel are required, which cannot be ensured for all customers [135].
Apart from research, commercial systems for driver drowsiness detection with
physiological signals have also been developed. In the driver alertness
monitor by Plessey, an array of capacitive sensors is integrated into the
driver's seatback (see Figure 2.1) for the acquisition of ECG signals [136].
Without direct contact with the human skin, the RR intervals from the ECG
signal can be detected and applied in HRV analysis. As discussed in the
example above, for this purpose, additional sensors need to be installed in
the driver's seat, and the type of clothing could limit the system's
performance.
Figure 2.1: Plessey driver alertness monitor (adapted from [136]).
CardioID Technologies developed Cardiowheel, which is a custom cover for
the steering wheel with conductive elements (see Figure 2.2) [137, 138]. This
steering wheel cover collects electric impulses generated by the heart from the
driver's hands placed on the cover. It issues an alert when drowsiness is
detected. Again, its usage is limited in the context of automated driving since
one or even both hands have to touch the steering wheel for an extended time.
A customized steering wheel cover is required for each car, which is
associated with increased costs and might not be compatible with every car
manufacturer's design premises.
Figure 2.2: Cardiowheel by CardioID Technologies [137].
Further, several wearable devices for driver drowsiness detection were
developed. A bracelet that measures heart rate and EDA was presented by Steer
[139]. Depending on the detected level of drowsiness, the bracelet either
vibrates or, at higher levels of drowsiness, produces a moderate electric
shock. StopSleep developed a double ring for measuring EDA. When initial signs
of drowsiness are detected, the driver is warned with a vibration. At higher
levels of drowsiness, an auditory signal is added [140]. Vigiton, a driver
drowsiness detection system proposed by Neurocom, collects physiological
information by measuring GSR with a wristband and a ring and provides visual
and auditory warnings [141].
The presented solutions based on wearable devices seem to be the most
promising since they can be easily integrated into the vehicle and, depending
on where they are worn, are less intrusive or even non-intrusive. However,
since the area of application of these wearable devices is very limited and
their use is not compulsory, full market penetration or widespread use could
not be achieved, so that the majority of these devices were no longer pursued
or further developed.
This work aims to apply physiological data from consumer-grade and widely
available wrist-worn wearable devices, such as smartwatches and fitness
trackers, for driver drowsiness detection. Several works on this topic can be
found in the literature and are discussed in detail in the upcoming section.
Based on that, the research gap is highlighted, and the research approach and
research questions of this Ph.D. thesis are presented.
2.2.4.2 Driver Drowsiness Detection Using Wrist-Worn Wearable Devices
Lee et al. utilized the built-in motion sensors of a smartwatch for driver
drowsiness detection by evaluating the driving behavior [142]. Twenty subjects
participated in a simulator study with an average duration of 60 minutes. Time,
spectral, and frequency domain features were extracted and mapped to the
subjects' drowsiness self-ratings. A support vector machine (SVM) classifier
reached an accuracy of 98.15%. Lee et al. followed a similar approach, where
accelerometer and gyroscope data were collected during a 2-hour simulator
drive with five participants (see Figure 2.3(a)). An SVM classifier resulted in
an accuracy of 98.80% [143]. Leng et al. developed a wristband connected
to a PPG and GSR sensor on a finger. From data of 20 subjects, five
features were extracted, including HRV and respiratory rate, and labeled with
self-ratings [129]. An SVM classifier resulted in an accuracy of 98.30%. In the
work of Choi et al., a wrist-worn wearable device with sensors for PPG (for
HRV analysis), GSR, temperature, acceleration, and gyroscope was developed
(see Figure 2.3(b)). Twenty-eight people participated in their simulator study,
which consisted of four parts (normal, stress, drowsiness, fatigue) with a total
driving time of 3 hours and 20 minutes. Labels were gathered by analyzing
signs of sleepiness in the participants' facial expressions. With an SVM
classifier, an accuracy of 98.43% was reached [103]. Lee et al. conducted a
simulator study with six participants and a duration of one to a maximum of
two hours. They combined data from a PPG sensor of a Polar smartwatch with ECG
data measured with a chest strap (see Figure 2.3(c)). Labels were assigned by
evaluating videos of the driver's face and driving behavior. Their
classification in the form of recurrence plots resulted in an accuracy of 70%
[114]. The heart
rate measurement of a smartwatch was fused with PERCLOS in the work of Li
et al. [144]. A study in a simulated environment with a duration of 50 minutes
and 10 participants was conducted. An accuracy of 83% was obtained with an
SVM classifier. Lee et al. evaluated steering wheel movements with accelerometer
and gyroscope data from a smartwatch on one wrist and combined them with
physiological data from a PPG sensor placed on a sports wristband on the
other wrist. From the data collected during a 3-hour simulator drive with 12
participants, time, phase space, and spectral-domain features were calculated
and classified with a Weighted Fuzzy C-Mean (WFCM) model. Their system
reached a detection accuracy of 96.50% [145]. Gielen and Aerts collected the
nose temperature from a sensor, the wrist temperature from a smart wearable,
and the heart rate from a chest strap from 19 participants in a simulator study
with a driving duration between 90 and 150 minutes [146]. Classification
with a decision tree model resulted in accuracies of 68.40% (nose temperature),
88.90% (wrist temperature), and 70.60% (heart rate). When combining all parameters,
an accuracy of 89.50% was reached. In the work of Misbhauddin et
al., a wearable-based drowsiness detection system consisting of an Empatica
E4 wristband and a mobile application was proposed. For real-time identification
of drowsiness, HRV and GSR data from the wristband were streamed to
and processed in the mobile application. For training the system, the users are
required to wear the wristband when not driving and give feedback four times
a day regarding their current drowsiness state through the mobile application.
If both values of GSR and HRV are below a certain threshold, a warning is
issued during driving. An accuracy of 80% was reached after testing the pro-
posed system in a simulator study with 10 participants [102]. In the system of
Bi et al., unsafe hand motions (hands off the steering wheel) caused by drowsiness
or distraction are detected with motion data from two smartwatches (see
Figure 2.3(d)). Their data set consisted of 75 real-world driving trips from
six participants. Their self-developed adaptive training algorithm reached over
97% precision and recall [147]. Malathi et al. developed a wrist-worn EDA-based
wearable device for drowsiness detection. Results show intrapersonal
differences in the collected EDA signal when being active, drowsy, or asleep
[101].
2.3 Problem Statement and Research Approach
Figure 2.3: Selected drowsiness detection systems using wrist-worn wearable devices
from related work: (a) Lee et al. [143], (b) Choi et al. [103], (c) Lee et al. [114],
(d) Bi et al. [147].
Previous works achieved promising results by utilizing physiological data from
wrist-worn wearable devices for driver drowsiness detection. However, some
devices were prototypes specially developed for these studies [129, 103, 101].
Devices that were available on the market were
combined with another or more intrusive measurement [114, 144]. Focusing on
future automated driving, sensors like an accelerometer or gyroscope will be
more difficult to apply since movement patterns from the steering cannot be
continuously evaluated for driver state detection [142, 143, 145, 147]. Moreover,
using skin temperature on the wrist is also a promising approach [146].
However, only very few or then correspondingly expensive smartwatches and
fitness trackers have an integrated skin temperature sensor, which could reduce
the usage and acceptance of the system. Furthermore, the system user should
not be obliged to continuously wear and use the detection system and give
feedback several times a day to train the detection model [102].
In contrast to previous work, and encouraged especially by recent advancements
in smart wearables that record physiological signals through more accurate
sensors, this Ph.D. thesis takes a different approach. To further investigate
the application of wrist-worn wearable devices inside the vehicle, the potential
and feasibility of their recorded physiological data as the single data input for the
detection of driver drowsiness is examined (see Figure 2.4). By utilizing
machine learning techniques, different modeling approaches and detection algorithms
based on data from these devices are investigated and compared. Aspects
such as the choice of meaningful parameters and features, the generation
of valid ground truth for drowsiness, the number of drowsiness levels, and the
influence of inter-driver variance on model performance are analyzed. Furthermore,
in this context, it is examined which preconditions can be considered to
adapt drowsiness detection systems for improving their performance and how
well drowsiness detection systems based on smart wearables are accepted.
Figure 2.4: Research approach: Using physiological data from a consumer-grade
wrist-worn wearable device as single data input for a machine learning
model for driver drowsiness detection.
2.3.1 Hypothesis and Research Questions
To achieve these goals and investigate the presented issues, the following hy-
pothesis (H) and research questions (RQ) being investigated in this work are
proposed.
H: Smart wearables can be applied for a reliable detection of driver
drowsiness in an automotive context.
RQ1: What preconditions can be considered to adapt and personalize
driver drowsiness detection systems and to model different groups
of users?
People behave very differently in certain physiological states, such as drowsiness.
Causes and factors influencing this behavior can differ significantly from
individual to individual, and it poses a challenge to find uniform attributes
across individuals [148]. Therefore, for answering this research question, it was
examined to what extent different preconditions and human and external factors,
such as age, time of the day, driving mode, driving time, and trust, influence
the drowsiness state, and how this information can be applied during the development
process and for the parameterization of driver drowsiness detection
systems. This is intended to give researchers and car manufacturers pointers
and framework conditions for developing this kind of system.
RQ2: Can driver drowsiness be derived from vital parameters measured
with wrist-worn smart wearables?
To answer this research question and thus to be able to assess the potential of
consumer-grade wrist-worn wearable devices for the detection of driver drowsiness
in an automotive environment, different investigations were carried out
that provide novel insights researchers can build upon. In these experiments,
wearable devices from different manufacturers were compared on the one hand
with one another, and on the other hand, with a medical-grade device. Various
physiological parameters measured by the wearable device were applied to detect
drowsiness. Different features were used and evaluated concerning quality
and impact on detection performance depending on the physiological parameter.
Several supervised machine learning models were compared and evaluated
using various performance measures in user-dependent and user-independent
tests. In these tests, different ground truths for drowsiness and different
numbers of drowsiness levels were applied. Since the ground truth is a decisive
factor in developing drowsiness detection models based on supervised machine
learning and many different approaches can be found in the literature, this
topic was examined in more detail.
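The distinction between user-dependent and user-independent tests can be illustrated with a minimal sketch. It assumes a hypothetical list of labeled feature windows, each tagged with a participant ID; the function names, the 70% training share, and the toy data are illustrative assumptions and not taken from this work. In the user-independent case, all windows of one participant are held out (leave-one-subject-out), whereas in the user-dependent case, each participant's own windows are split chronologically.

```python
from collections import defaultdict


def leave_one_subject_out(samples):
    """Yield (held_out_id, train, test) with one participant held out per fold.

    `samples` is a list of (participant_id, features, label) tuples.
    """
    participant_ids = sorted({pid for pid, _, _ in samples})
    for held_out in participant_ids:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def user_dependent_split(samples, train_percent=70):
    """Split each participant's windows chronologically into train and test."""
    by_participant = defaultdict(list)
    for sample in samples:
        by_participant[sample[0]].append(sample)
    train, test = [], []
    for windows in by_participant.values():
        cut = len(windows) * train_percent // 100
        train.extend(windows[:cut])
        test.extend(windows[cut:])
    return train, test


# Toy data (hypothetical): 3 participants, 10 windows each, one feature, binary label.
toy = [(f"P{p}", [float(i)], i % 2) for p in range(1, 4) for i in range(10)]
folds = list(leave_one_subject_out(toy))
dep_train, dep_test = user_dependent_split(toy)
```

In the user-independent folds, no data of the test participant is seen during training, which is why this setting is the stricter test of how inter-driver variance affects a model.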
RQ3: Are driver drowsiness detection systems based on smart wearables
accepted, and how can their acceptance and thus their integration in
the vehicle be further enhanced?
In consumer electronics, wearable devices, especially smartwatches and fitness
trackers, have become established in recent years. Therefore, the
automotive industry and research have proceeded with the uptake of wearable
devices, sourcing use cases and inventive ways to implement this technology in
future vehicles to enhance safety and the in-car driving experience on the way to
full driving automation. What has not yet been investigated is whether driver
assistance systems based on wearable devices, in this case a driver drowsiness
detection system, are accepted at all. Can users imagine using these devices
and the data recorded with them for a safety-critical task, such as drowsiness
detection? What has to be done to increase acceptance? Therefore, a
prototype for real-time driver drowsiness detection based on a consumer-grade
wrist-worn wearable device was developed by applying knowledge gained in
previous stages of this Ph.D. project and evaluated in terms of technology
acceptance, user experience, and detection performance in a final user study.
2.4 Previous Drowsiness Studies in Simulated and Realistic Environments
This work builds on data collected in drowsiness studies to examine the issues
addressed. In total, three user studies were conducted in the course of the
Ph.D. project: two baseline studies for database creation and one evaluation
study.
In general, the investigation of driver drowsiness is associated with increased
risk and cannot easily be conducted under realistic conditions. Often, sleep-deprived
participants are recruited to induce drowsiness more quickly,
which would be even riskier in a study setting in real traffic. Therefore, researchers'
first choice for conducting a drowsiness study is a driving simulator,
which brings many benefits compared to a field study [149]. Complex study
settings, e.g., with take-over scenarios in an SAE level 2 or level 3 automated
system, can be realized more easily and quickly and be optimally adapted to the
study requirements. Study conditions are standardized, controllable, and, above all,
easily reproducible. However, it is difficult to transfer the knowledge gained in the
simulator one to one to real-world driving and to figure out how the physical state
and the drowsiness development over time would have been affected. Therefore,
it is essential to examine driver drowsiness in simulated and especially realistic
environments and scenarios.
In the following, previous drowsiness studies from related work in simulated and
realistic environments are presented from which implications for the developed
study setting in this work are derived.
2.4.1 Simulated Environment
A large number of simulator studies on the subject of driver drowsiness can be
found in the literature. The following selection is based on studies that deal
with manual driving and driving automation, preferably up to SAE level 3. Up
to this level, the driver either still has complete control of the vehicle or forms
the system's fallback level and is required to intervene and take back control
after a TOR [20].
Neubauer et al. compared partial automation with non-automation in 35-
minute simulator drives with 184 participants (age: M=20.16, SD=3.13). The
partially automated drives were highly demanding for the driver and illustrated
the increasing risk of drowsiness [150].
Körber et al. performed a driving simulator study with 20 subjects (age:
M=23.30, SD=2.64). In a 42.5-minute partially automated drive, the driver's
only task was to monitor the system. The results showed that drowsiness oc-
curs when not being engaged in an active task while driving and that the duty
of monitoring leads to a decrease of vigilance [151].
Similar results were determined by Miller et al., where 48 participants (age:
M=20.85, SD=1.32) performed a 40-minute partially automated drive in a
driving simulator [152].
In a study with sleep-deprived participants, Vogelpohl et al. compared 60 minutes
of conditionally automated driving with manual driving (60 participants; age:
M=41.30, SD=21.10). The study was conducted between 8 pm and midnight.
Observers found earlier signs of drowsiness during conditionally automated driving [153].
Increasing KSS levels and changes in PERCLOS [72] were observed by Jarosch
et al. in the context of a simulator study where 56 participants (age: M=30.10,
SD=9.00) drove 30 minutes conditionally automated [154].
In another experiment by Jarosch et al., 73 participants (age: M=31.36,
SD=9.86) drove 50 minutes with conditional automation. Due to the monitor-
ing task, increased sleepiness levels were determined, the take-over performance
was impaired, and a higher number of accidents occurred compared to being
engaged in a quiz task while driving [155].
In another simulator experiment conducted by Omae et al., eight of 30 partic-
ipants fell asleep after 60 minutes of automated driving while monitoring the
system [156].
Feldhütter et al. conducted a simulator study with 13 participants. During a
60-minute trip of automated driving, participants had to monitor the driving
task. Three participants fell asleep after 20 minutes of driving. Two of them
closed their eyes for longer than five minutes, and one experienced a micro-sleep
[157].
The results of the presented studies show that automated driving affects the
driver's physiological state and leads to an increase of drowsiness, already after
a short driving duration. The age of the driver was not in the focus of the
presented investigations. The studies were mainly based on younger participants
or covered larger age intervals. However, older people are considered a potential
target group who could benefit from the technology of automated driving
[158]. With increasing age, the number of health problems rises. Vision,
hearing, and the ability to concentrate diminish, resulting in potential hazards
in road traffic. Thus, partially automated driving as one of the enablers of full
automation can benefit older people. It is especially relevant for older people
who are impaired in their ability to drive but still want to be mobile. Therefore,
the specific needs and requirements of older people need to be investigated
and should be considered when developing and parameterizing systems for the
detection of drowsiness.
2.4.2 Realistic Environment
Not many user studies or publications can be found where, in particular, driver
drowsiness was examined in a real-world automated driving context.
Weinbeer et al. investigated the suitability of a right-hand-drive vehicle
(RHDV) as a test method to explore the utility of several methods for han-
dling driver drowsiness during highly automated driving. The drowsiness of
31 participants (age: M=30.61, SD=8.16) was assessed by two investigators
in the back of the car during a 120-minute motorway drive as part of a user
study. The participants were sitting on the car's left side, where an additional
steering wheel was mounted. A person on the right seat controlled the vehicle.
Between the driver and passenger seat, a curtain was placed. Depending on the
current drowsiness level of the participants, TORs were triggered. In terms of
these, no significant influence of drowsiness on take-over reaction times could
be determined. Results further depict that in an RHDV setting combined with
highly automated driving on a highway, high drowsiness levels under safe con-
ditions can be achieved [50].
In the user study of Berghöfer et al. with 34 participants (age: M=54, SD=14),
a Wizard-of-Oz setting was utilized to simulate level-3 automated driving [20]
and to investigate possible influences of behavior and characteristics on the
take-over reaction time. After the driver turned on the level-3 system, the
driver wizard on the passenger seat assumed responsibility for the test vehicle's
longitudinal and lateral control. For this purpose, a second pair of pedals and
special control units on the seat and armrest were installed. Observers rated
sleepiness, but no significant influence on take-over time was determined [159].
The RRADS (Real Road Autonomous Driving Simulator) platform used for
simulating autonomous driving on real roads was presented in the work of Bal-
todano et al. [160]. A partition between the driver and passenger seat was
used to separate the participant from the driving wizard.
The studies show that researchers' choice to simulate automated driving and
ensure the necessary safety while driving was a Wizard-of-Oz approach. How-
ever, in these cases, it is not possible to speak of real drowsy driving since the
vehicle itself is being driven by another person when the auto-pilot is switched
on. Alternative solutions are required for conducting user studies in which automated
and manual driving can be experienced in a realistic environment under conditions
that are reproducible and safe for the participants.
In the following chapter, the baseline studies for database creation and results
from the subjective evaluation are presented.
3 Baseline Studies and Subjective Evaluation
The database of this work was created in the context of two user studies. An
identical study setting was applied for both studies. The only but significant
difference is the study environment. For determining differences regarding
driver drowsiness, baseline study 1 was carried out in a simulator-based
and baseline study 2 in a realistic environment.
In the upcoming sections, the developed study setting is described first. This
is followed by presenting the results from the subjective evaluation of both
studies for answering RQ1 (What preconditions can be considered to
adapt and personalize driver drowsiness detection systems and to
model different groups of users?). Since drivers of future automated vehicles
are mainly typical and average consumers and no experts in this domain,
different expectations, previous knowledge, and behavior must be anticipated
and investigated during the development process of intelligent driver-vehicle
interaction systems. This knowledge can be applied to adapt and personalize
intelligent user interfaces for driver drowsiness detection and model different
user groups.
The main focus in the subjective evaluation and for answering RQ1 will lie on
the following hypotheses:
H1: Driving mode (manual/automated) has a significant effect on drowsiness.
H2: Driving time has a significant effect on drowsiness.
H3: Driver's age (young/old) has a significant effect on drowsiness.
H4: A correlation between drowsiness self-ratings and objective measures
can be found. (only in study 2)
3.1 Study Setting
A study setting was developed for this work based on the presented studies from
related work and the derived implications. This study setting was applied
in both simulator and real-world driving to determine differences between
the two environments and to collect data for model development from both.
This section is based on the following own publications: [2, 6]
3.1.1 Participants
As presented in Section 2.4.1, in previous drowsiness studies, the focus was
not on the driver's age, and mainly younger participants or larger age intervals
were considered. For this reason, the focus in this work is on two specic age
groups.
A report of the Sleep Health Foundation released information about sleep needs
across the lifespan [161]. According to their recommended sleep duration,
adults were divided into two groups: young adults/adults ranging from 18
to 64 years and older adults with ages higher than 64. People in the younger
age group are recommended to sleep seven to nine hours; six hours may be
appropriate, and less than six hours are not recommended. In contrast, people
older than 64 years are advised to sleep seven to eight hours; five to six
hours may still be appropriate, and less than five hours of sleep are not recommended.
It was also presented that the need for sleep decreases with increasing
age [161]. Based on this report, two age groups were selected, whose intra-group
age range is limited to a small interval and whose inter-group ages are
far apart. Therefore, 15 participants aged 20-25 and 15 participants
aged 65-70 were chosen. In study 1, the participants received 35 € and
in study 2 40 € compensation for their efforts since participating in the study
required around three hours without individual travel times. The requirements
for participating in the study included the possession of a valid driving license,
a subjectively rated good health condition, no sleep disorder, and no limitation
in their ability to drive. Furthermore, they were instructed not to consume any
caffeinated drinks within five hours before participation.
3.1.2 Data Collection
Several subjective and objective measures were collected, which will be ex-
plained in the following sections.
3.1.2.1 Subjective Measures
Pre-Questionnaire
The first part of the pre-questionnaire contained some basic demographic questions
and queried details about the participants' sleeping behavior and health,
as presented in Table 3.1.
In the second part of the pre-questionnaire, the items of the Epworth Sleepiness
Scale (ESS) [40] had to be answered for assessing the participants' daytime
sleepiness (see Table 3.2). The ESS queries how likely it is to doze off or fall
asleep in the mentioned situations, in contrast to feeling just tired, by applying
the following scale: 0 (would never doze), 1 (slight chance of dozing), 2
(moderate chance of dozing), 3 (high chance of dozing).
How old are you?
Which gender are you?
When did you get up today?
How many hours did you sleep last night?
How many hours do you usually sleep per night?
How did you sleep last night in general?
Do you currently undergo medical treatment?
Did you ever experience a micro-sleep while driving? (added in study 2)
Table 3.1: Pre-Questionnaire Part 1.
Sitting and reading
Watching TV
Sitting inactive in a public place
Being a passenger in a car for an hour
Lying down in the afternoon
Sitting and talking to someone
Sitting quietly after lunch (no alcohol)
Stopping for a few minutes in traffic while driving
Table 3.2: Pre-Questionnaire Part 2: Epworth Sleepiness Scale (ESS) [40].
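As a small illustration, the ESS answers can be aggregated into a total score. The sketch assumes the standard ESS scoring, in which the eight item ratings (0 to 3) are summed to a total between 0 and 24; the function name and the example answers are hypothetical and do not stem from the study data.

```python
ESS_ITEMS = [
    "Sitting and reading",
    "Watching TV",
    "Sitting inactive in a public place",
    "Being a passenger in a car for an hour",
    "Lying down in the afternoon",
    "Sitting and talking to someone",
    "Sitting quietly after lunch (no alcohol)",
    "Stopping for a few minutes in traffic while driving",
]


def ess_total(ratings):
    """Sum the eight ESS item ratings (each 0-3) into a total score (0-24)."""
    if len(ratings) != len(ESS_ITEMS):
        raise ValueError("expected one rating per ESS item")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(ratings)


# Hypothetical answers of one participant, in the order of Table 3.2.
example_answers = [1, 2, 0, 1, 3, 0, 1, 0]
example_score = ess_total(example_answers)
```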
Drowsiness Self-Ratings
Since this work investigates the potential of using physiological data from smart
wearables in connection with supervised machine learning for driver drowsiness
detection, a ground truth for drowsiness, i.e., labels for the physiological data
recorded with smart wearable devices, is needed. Different types of labels were
applied in this work. One of those was determined via drowsiness self-ratings,
as was done in many previous works [90, 81, 162, 129, 146]. For
this purpose, the most frequently used scale, the Karolinska Sleepiness Scale
(KSS), a nine-point Likert scale (1: extremely alert; 2: very alert; 3: alert;
4: rather alert; 5: neither alert nor sleepy; 6: some signs of sleepiness;
7: sleepy, but no effort to keep awake; 8: sleepy, some effort to keep awake;
9: very sleepy, fighting sleep), was applied (see Table 2.1 in Section 2.2.1)
[49]. This scale was displayed in an Android application on a Google Pixel
C tablet computer placed next to the steering wheel in the center console of
the car. To minimize the impact of the self-rating requests on the drowsiness
development, the application was programmed as follows: After the start of
the drive, the participant was prompted by the tablet every five minutes by
slowly increasing the screen brightness without any auditory hints. After the
self-rating was given, i.e., the current drowsiness level selected and confirmed,
the screen brightness slowly faded away. KSS levels 1-4 were colored in shades
of green, levels 5 and 6 in yellow and orange, and levels 7-9 in shades of red
(see Figure 3.1). The ratings with the corresponding timestamps were stored
in the local memory of the tablet.
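The color coding of the KSS levels and the storage of a confirmed rating described above can be sketched as follows. This is a minimal illustration and not the actual Android application; the function names, the log structure, and the timestamp unit (seconds since the start of the drive) are assumptions.

```python
KSS_LABELS = {
    1: "extremely alert",
    2: "very alert",
    3: "alert",
    4: "rather alert",
    5: "neither alert nor sleepy",
    6: "some signs of sleepiness",
    7: "sleepy, but no effort to keep awake",
    8: "sleepy, some effort to keep awake",
    9: "very sleepy, fighting sleep",
}


def kss_color_group(level):
    """Return the color group used to display a KSS level (1-9)."""
    if level not in KSS_LABELS:
        raise ValueError("KSS level must be an integer from 1 to 9")
    if level <= 4:
        return "green"
    if level <= 6:
        return "yellow/orange"
    return "red"


def store_rating(log, timestamp_s, level):
    """Append a confirmed self-rating with its timestamp to a local log."""
    log.append(
        {"timestamp_s": timestamp_s, "kss": level, "color": kss_color_group(level)}
    )
    return log


# Hypothetical rating: KSS 6 given 300 seconds after the start of the drive.
ratings_log = store_rating([], 300, 6)
```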
Figure 3.1: KSS representation on tablet.
In addition to the self-ratings during the drive, the participants were asked to
draw drowsiness curves after their test drive based on the user experience (UX)
curve method [163]. For this purpose, a sheet of paper with a two-dimensional
coordinate system was handed out. It showed the duration of the simulation in five-minute
intervals on the x-axis and the KSS levels on the y-axis. The first reason was
to investigate how the subjects rate their drowsiness right after the drive and
whether there are differences from the ratings given during the drive. The other and more
practical reason was to have potential backup ratings in case of problems with
the self-ratings on the tablet while driving.
Post-Questionnaire
After finishing the drives, the participants were questioned
about the drowsiness self-ratings, the development of drowsiness concerning
the two driving modes, and wearable devices. The specific questions
are summarized in Table 3.3.
What would be the appropriate KSS level to receive a first warning?
How confident did you feel when rating your own drowsiness?
Did you get drowsier in automated or manual driving?
Do you own a wearable (smartwatch, fitness tracker)? (added in study 2)
Are you going to buy a wearable in the near future? (added in study 2)
Would you wear a wearable to ensure driving safely? (added in study 2)
Table 3.3: Post-Questionnaire.
Trust Questionnaire
For study 2, a questionnaire for assessing trust in automation was added. Apart
from driving in a drowsy state, overtrust in particular is a critical challenge for
the safe use of automated vehicle technology [164]. Recent incidents with automated
vehicles, e.g., as mentioned in the introduction, with Tesla Autopilot
[22] or the Uber self-driving taxi [23], are (at least partly) connected to
overtrust, as drivers failed to monitor and intervene properly. Drivers that
trust the automation more may show greater willingness to fall asleep and vice
versa. It will be evaluated how trust levels change after a single session of real
system exposure and whether a correlation between subjective trust and drowsiness
exists. Drowsiness can be an ideal candidate for evaluating such behavior
since giving way to drowsiness (which can be reformulated as the willingness to
fall asleep) poses a high risk. Investigating if and how drowsiness is affected
by users' trust levels might reveal additional safety risks and be an option to
measure user trust behaviorally in an unobtrusive way.
Lee and See define trust as the attitude that an agent will help achieve an individual's
goals in a situation characterized by uncertainty and vulnerability [165],
but also state that trust is an attitude underlain by beliefs that leads to intentions
and thus results in reliance behavior [165]. Further, to accomplish proper
levels, trust should match the true capabilities of an agent [165]. Overtrust is
thus a situation where subjective trust exceeds a system's capabilities, which
can ultimately lead to misuse of technology [166]. Various situations can lead
to overtrust, including pure performance-based (poor calibration, resolution,
or specificity [165]), but also pre-existing (dispositional, situational or learned
trust [167]) explanations. Trust research often emphasizes the performance-
based component, e.g., recent publications in the automated driving domain
suggest making system capabilities and performance transparent to the user
[168, 169, 170, 171]. Future consumer-oriented users of automated vehicles
may add an important aspect that has not been addressed yet. In contrast to
professional operators, drivers may self-negotiate their trust levels to justify the
engagement in non-driving related tasks (NDRT), which is an often mentioned
advantage of automated vehicles [172].
Therefore, to investigate if and how trust in automation affects drowsiness,
the trust scale by Jian et al. [173] (see Table 3.4), which provides
sub-scales for both trust and distrust, was applied. Subjective trust was assessed before and
after the SAE level 2 automated drive in the realistic environment.
The system is deceptive.
The system behaves in an underhanded manner.
I am suspicious of the system's intent, action or outputs.
I am wary of the system.
The system's actions will have a harmful or injurious outcome.
I am confident in the system.
The system provides security.
The system has integrity.
The system is dependable.
The system is reliable.
I can trust the system.
I am familiar with the system.
Table 3.4: Trust Scale by Jian et al. [173].
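A minimal sketch of how the two sub-scales could be scored from the twelve items is given below. It assumes that the first five items of Table 3.4 form the distrust sub-scale and the remaining seven the trust sub-scale, that items are rated on a 7-point scale, and that sub-scale scores are item means; the rating range, the aggregation by mean, and the example ratings are assumptions that are not stated in this section.

```python
from statistics import mean

DISTRUST_ITEMS = range(0, 5)  # first five items in Table 3.4 (assumed grouping)
TRUST_ITEMS = range(5, 12)    # remaining seven items


def trust_subscales(ratings, scale_min=1, scale_max=7):
    """Return mean distrust and trust scores for one completed questionnaire."""
    if len(ratings) != 12:
        raise ValueError("expected ratings for all 12 items")
    if any(not scale_min <= r <= scale_max for r in ratings):
        raise ValueError("rating outside the assumed scale range")
    return {
        "distrust": mean([ratings[i] for i in DISTRUST_ITEMS]),
        "trust": mean([ratings[i] for i in TRUST_ITEMS]),
    }


def trust_change(pre_ratings, post_ratings):
    """Difference in sub-scale scores (post minus pre) for one participant."""
    pre = trust_subscales(pre_ratings)
    post = trust_subscales(post_ratings)
    return {key: post[key] - pre[key] for key in pre}


# Hypothetical ratings before and after the automated drive.
pre_drive = [3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]
post_drive = [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5]
change = trust_change(pre_drive, post_drive)
```

A positive change on the trust sub-scale together with a negative change on the distrust sub-scale would indicate that trust increased after system exposure.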
3.1.2.2 Objective Measures
Physiological Data
Concerning objective measurements, physiological data for model development
with supervised machine learning were recorded using four wrist-worn wearable
devices. The devices used were Garmin Forerunner 235 [174], Garmin
Vivosmart 3 [175], Polar A370 [176], and Empatica E4 [177]. The first three
are standard consumer-grade fitness trackers with optical heart rate sensors using
PPG. In contrast, the Empatica E4 wristband represents a more advanced,
medical-grade wrist-worn wearable device often used in research applications.
It offers the acquisition of several physiological signals such as blood
volume pulse (BVP) and inter-beat intervals (IBIs) for HRV analysis via PPG and
the measurement of electrodermal activity, acceleration, and skin temperature.
The IBI sequence is derived from the PPG/BVP signal, which is recorded
with a sampling frequency of 64 Hz. McCarthy et al. checked the validity of
the Empatica E4 wristband against clinical-standard equipment in recognizing
anomalies in the heartbeat and found a comparable data quality of 85% between
these devices [178]. Two wearables were worn on each wrist during the study
(see Figure 3.2). The choice of arm was randomized so that the watches were
worn equally often on the left and right wrist across all participants.
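To illustrate how an IBI sequence feeds into HRV analysis, the following sketch computes three common time-domain measures from inter-beat intervals given in milliseconds. The selection of measures (mean heart rate, SDNN, RMSSD), the function name, and the example values are assumptions for illustration and do not represent the feature set used in this work.

```python
import math
from statistics import mean, stdev


def hrv_time_domain(ibis_ms):
    """Compute basic time-domain HRV measures from inter-beat intervals in ms."""
    if len(ibis_ms) < 3:
        raise ValueError("need at least three inter-beat intervals")
    # Differences between successive inter-beat intervals.
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return {
        "mean_hr_bpm": 60000.0 / mean(ibis_ms),
        # SDNN: sample standard deviation of the intervals.
        "sdnn_ms": stdev(ibis_ms),
        # RMSSD: root mean square of successive differences.
        "rmssd_ms": math.sqrt(mean([d * d for d in diffs])),
    }


# Hypothetical IBI sequence (ms), e.g. one short analysis window.
example_ibis = [800, 810, 790, 805, 795]
features = hrv_time_domain(example_ibis)
```

In practice, such measures would be computed over sliding windows of the recording and paired with the drowsiness label of the corresponding time span.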
Apart from the wrist-worn wearable devices, a 3-channel ECG measurement device,
the Faros from Bittium [179], served as a reference measurement device.
ECG data were recorded with the maximum possible sampling frequency of
1000 Hz. The device was attached to the subject's upper body with five adhesive
electrodes. Before attaching the electrodes, the relevant body sites were shaved with
a disposable razor, if necessary, and then cleaned with alcohol swabs.
Figure 3.2: Wearables on participant's wrists: Garmin Forerunner 235 (1), Garmin
Vivosmart 3 (2), Empatica E4 (3), Polar A370 (4).
Video Data for Observer Ratings
Besides physiological signals, videos of the driver's face were recorded. For this
purpose, a camera was mounted either on top of the cockpit or on the windshield and
directed towards the driver's face. After each study, external observers rated
the video data with regard to drowsiness to have a further ground truth for
drowsiness in supervised machine learning in addition to the self-ratings. The
procedure for collecting the ratings (number of external raters, frequency of rat-
ings, etc.) is explained in detail in the following chapters for the corresponding
experiment.
3.1.3 Study Procedure
The complete procedure took about two and a half hours for each participant
(see Figure 3.3). The study was carried out at three different times of the day
(9 am, 1:30 pm, and 5:30 pm) to compare the influence on the development of
drowsiness. First, the participants got an introduction and instruction from the
experimenter. After filling in the pre-questionnaire, the physiological measuring
instruments were attached to the body. This was followed by the central part
of the study, which consisted of 90 minutes of driving. For investigating the
influence of driving mode on the development of drowsiness, the 90 minutes
were split into two successive 45-minute sessions: a manual (SAE level 0) and
a partially automated (SAE level 2) one [20]. In partially automated driving,
the driver is obliged to monitor the driving process all the time and be ready to
take over when requested [20]. In this study, no take-over situations
were triggered, as this would have had an alerting effect on the driver
and negatively affected the progressive development of drowsiness. Instead,
SAE level 2 and the duty of monitoring were used more as a pretext to uphold
the driver's concentration and focus on the driving environment to achieve a
quicker increase of drowsiness. To accustom the participants to the driving
situation and the drowsiness self-ratings via the tablet, 10 minutes of manual
practice driving was carried out. During driving, the participants were asked
not to use the mobile phone, eat or drink, and avoid chewing gum. They were
instructed not to close their eyes for a long time or to fall asleep. Moreover, they
were instructed to avoid talking to the experimenter and not perform any other
secondary activities. The order of the two drives was counterbalanced across
participants: half of the participants started with the manual and the other half with the
partially automated drive to counteract possible influences on the respective
drive's measurement results. As stated in the previous section, the participants
had to rate their drowsiness with the KSS displayed on the tablet every ve
minutes during the drives. In a short break between the two drives and after
the second drive, participants were asked to draw the drowsiness curves. The
study ended with the completion of the post questionnaire.
Figure 3.3: Study procedure.
3.2 Study 1: Driving Simulator
This section is based on the following own publication: [2]
Baseline study 1 was conducted in a high-fidelity driving simulator at
Technische Hochschule Ingolstadt (THI) (see left part of Figure 3.5). The
simulator is a six-axis hexapod system with a motion platform and a remodeled
VW Golf cabin. The virtual test environment was created with IPG CarMaker
(v.5.01), which allows realistic driving scenarios to be recreated together
with the associated surrounding environments [180]. For time synchronization
of self-ratings and driving simulation, the MQTT protocol was used for
communication between the PC running the driving simulation and the tablet
running the self-rating application. The driving simulation was displayed with
four projectors and an LCD screen inside the car cabin. Speakers in the
simulator interior produced the vehicle sound. To accelerate or at least not
interrupt the test person's drowsiness development during the test drive and
to reduce possible alerting effects, the route was designed as a monotonous
loop with a length of around 13 km (see Figure 3.4). The track represented a
three-lane highway with little traffic (see the right part of Figure 3.5). The
speed limit was 120 km/h for the manual drive and 110 km/h for the automated
drive. Since the simulated car traveled with autonomous cruise control (ACC)
during the automated drive, it automatically slowed down and accelerated to a
maximum of 110 km/h, depending on the traffic. The temperature in the
simulator was 25 °C on average. No radio or music was played during the drive.
Through a camera in the simulator, directed at the driver, the experimenter
could observe the participant. In case of health problems, e.g., simulator
sickness, or other irregularities during the study, the participant could
communicate with the experimenter via a microphone in the simulator.
Figure 3.4: CarMaker export of study route (distance information in meters).
Figure 3.5: Left: Hexapod driving simulator at THI; Right: simulator setup with
KSS on tablet.
3.2.1 Results
In the following, results from the subjective evaluation are presented and dis-
cussed based on the postulated hypotheses.
3.2.1.1 Questionnaires
Pre-Questionnaire
The participants were distributed almost equally across the three times of
the day: nine in the morning, 11 in the afternoon, and 10 in the evening. For
the young age group, six women and nine men (age: M=22.87 years, SD=1.81
years), most of them students from Technische Hochschule Ingolstadt (THI),
were selected. For the old age group, eight female and seven male participants
(age: M=67.60 years, SD=1.88 years) were recruited via an advertisement in a
local newspaper. Of all 30 subjects, six older participants were undergoing
medical treatment at the time of the study. On average, the participants slept
around seven hours both in the night before the study (young: M=7.07 hours,
SD=0.80 hours; old: M=6.90 hours, SD=1.27 hours) and in general (young: M=7.08
hours, SD=0.74 hours; old: M=7.07 hours, SD=1.40 hours). In addition, the
participants rated their perceived sleep quality in the night before the
study. One younger and three older participants answered with "very good",
nine younger and nine older with "good", and five younger and three older with
"medium". A summary of the results from the pre-questionnaire can be found in
Table 3.5.
Participants             Young           Old             Overall
female                   6               8               14
male                     9               7               16
age (years)              22.87 (±1.81)   67.60 (±1.88)   45.23 (±22.82)
Medication               Young           Old             Overall
yes                      0               6               6
no                       15              9               24
Sleep duration (hours)   Young           Old             Overall
before study             7.07 (±0.80)    6.90 (±1.27)    6.98 (±1.05)
in general               7.08 (±0.74)    7.07 (±1.40)    7.08 (±1.10)
Sleep quality            Young           Old             Overall
very good                1               3               4
good                     9               9               18
medium                   5               3               8
bad                      0               0               0
Table 3.5: Results of pre-questionnaire.
Apart from the demographic details, the participants were asked to complete
the ESS [40]. For the interpretation of the results, the score range of 0 to
24 points was grouped into five categories (0-4) [181], as presented in Table
3.6. All 15 subjects of the older age group fall into the "lower normal" and
"higher normal" daytime sleepiness categories, which represent the
non-critical range of the ESS score [40]. In comparison, younger participants
cover all five categories.
Daytime Sleepiness                                        Young   Old   Overall
0 (0-5 points): lower normal daytime sleepiness           4       10    14
1 (6-10 points): higher normal daytime sleepiness         6       5     11
2 (11-12 points): mild excessive daytime sleepiness       2       0     2
3 (13-15 points): moderate excessive daytime sleepiness   2       0     2
4 (16-24 points): severe excessive daytime sleepiness     1       0     1
Table 3.6: Results of ESS [181].
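The grouping of ESS scores into the five categories of Table 3.6 can be written down as a simple lookup. The following is a minimal sketch, not the procedure used in the study; the names `ESS_CATEGORIES` and `ess_category` are illustrative assumptions, while the score ranges and labels are taken from Table 3.6.

```python
# Score ranges and labels as listed in Table 3.6 (names are illustrative).
ESS_CATEGORIES = [
    (0, 5, 0, "lower normal daytime sleepiness"),
    (6, 10, 1, "higher normal daytime sleepiness"),
    (11, 12, 2, "mild excessive daytime sleepiness"),
    (13, 15, 3, "moderate excessive daytime sleepiness"),
    (16, 24, 4, "severe excessive daytime sleepiness"),
]


def ess_category(score):
    """Map a total ESS score (0-24) to its category index and label."""
    if not isinstance(score, int) or not 0 <= score <= 24:
        raise ValueError("ESS score must be an integer between 0 and 24")
    for low, high, category, label in ESS_CATEGORIES:
        if low <= score <= high:
            return category, label


# Example: a score of 12 falls into category 2.
example = ess_category(12)
```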
Post-Questionnaire
In the first question, participants were asked about the most suitable level of
the KSS to receive an initial drowsiness warning in the vehicle. The results are
presented in Table 3.7.
KSS Level                              Young   Old   Overall
1 (extremely alert)                    0       0     0
2 (very alert)                         0       0     0
3 (alert)                              0       0     0
4 (rather alert)                       1       6     7
5 (neither alert nor sleepy)           1       1     2
6 (some signs of sleepiness)           10      6     16
7 (sleepy; no effort to keep awake)    2       2     4
8 (sleepy; some effort to keep awake)  1       0     1
9 (very sleepy; sleep fighting)        0       0     0
Table 3.7: Preferred KSS level for a first drowsiness warning.
In both age groups, level 6 ("some signs of sleepiness") was chosen most often
(young: 10; old: 6). It is noteworthy that six older participants would prefer
a warning already at level 4 ("rather alert"). Furthermore, participants were
asked how confident they felt in their overall drowsiness self-assessment. On
a three-level scale (not safe, medium, safe), the younger subjects ranked
themselves as follows: not safe: 0; medium: 4; safe: 11. The results of the
older participants were similar: not safe: 0; medium: 2; safe: 13. In the
third question, they had to indicate whether manual or partially automated
driving was more tiring for them. The answers were identical in both age
groups: automated: 12; manual: 3.
3.2.1.2 Analysis of Self-Ratings
The main focus of the subjective evaluation is on the analysis of the
drowsiness self-ratings. All data sets of KSS ratings from both age groups,
collected during the drive via the tablet and afterward by drawing drowsiness
curves, were available for the evaluation. Four data sets of KSS ratings were
generated per participant, two from the manual and two from the partially
automated drive, resulting in 36 measuring points per participant. In total,
1080 KSS ratings were used in the analysis.
To evaluate the effects of driving mode, driving time, and age on the
development of drowsiness, a three-factorial repeated-measures analysis of
variance (ANOVA) with one within-subject factor (measuring time points) and
two between-subject factors (manual/automated, young/old) was conducted.
This was done separately for the ratings during (D) and after (Af) the drive.
The ratings for both age groups and driving modes are at a similar level, with
slightly higher ratings during the drive (see Figure 3.6). The drowsiness
ratings increased steadily over the entire period of the drive. Except for one
case, the ANOVA yielded identical effects for both sets of ratings.
Figure 3.6: Average KSS ratings with 95% confidence interval (CI) of all
participants during and after the drive.
Results show that driving time (D: F(8,21)=6.43; p<.001; ηp²=.710; Af:
F(8,21)=5.84; p=.001; ηp²=.690) and driving mode (D: F(1,28)=4.46; p=.044;
ηp²=.137; Af: F(1,28)=7.98; p=.009; ηp²=.222) had a significant effect on the
development of drowsiness. Thus, the KSS ratings differ significantly over the
nine measuring time points, with higher drowsiness levels at the end of the
ride and in automated driving. No significant effect of age group was found
(D: F(1,28)=3.96; n.s.; Af: F(1,28)=5.41; n.s.). Further, no significant
interaction effects on drowsiness were found between driving mode and age
group (D: F(1,28)=1.50; n.s.; Af: F(1,28)=1.74; n.s.), driving time and age
group (D: F(8,21)=1.24; n.s.; Af: F(8,21)=1.05; n.s.), or driving mode,
driving time, and age group (D: F(8,21)=.56; n.s.; Af: F(8,21)=2.03; n.s.).
Whereas no significant interaction effect on drowsiness could be identified
between driving mode and driving time for the ratings given during the drive
(D: F(8,21)=1.32; n.s.), such an effect was found for the ratings given
afterward (Af: F(8,21)=3.27; p=.014; ηp²=.555).
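As a plausibility check, the reported effect sizes can be recovered from the F statistics and their degrees of freedom with the standard identity ηp² = (F · df1) / (F · df1 + df2). The sketch below is not part of the original SPSS analysis; the function name `partial_eta_squared` is an illustrative assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported above for the ratings during the drive (D).
driving_time = partial_eta_squared(6.43, 8, 21)
driving_mode = partial_eta_squared(4.46, 1, 28)
```

Rounded to three decimals, both values agree with the effect sizes stated in the text (.710 and .137).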
Apart from the ANOVA results, the characteristics of the obtained KSS ratings
are presented in the form of different charts. The average KSS ratings with
95% confidence interval (CI) are plotted over time, comparing the two driving
modes (see Figure 3.7) and the two age groups (see Figure 3.8). Further, the
average ratings of all participants for both driving modes at the three
different times of the day at which the study was carried out (see Figure 3.9)
are considered. For all these charts, the ratings collected during the drive
were used.
Figure 3.7: Average KSS ratings with 95% CI during drive of all participants for
manual and automated driving.
Considering the ratings given during the ride for automated and manual driving
separately (see Figure 3.7), it is noticeable that the ratings were relatively
similar up to the third measurement point after 15 minutes. From there, they
started to diverge, with higher KSS ratings assigned in automated driving. At
minute 40, the average difference had increased to more than one KSS level
(1.05).
Regarding the development of drowsiness in the two age groups across both
driving modes (see Figure 3.8), a trend similar to that in Figure 3.7 is
apparent. The divergence of the two curves already started at the second
measurement point after ten minutes. It increased over time to an average
difference between young and old participants of up to 1.5 KSS levels, with
higher levels for the young age group. Considering the KSS ratings at
different times of the day (see Figure 3.9), it becomes clear that, in
general, the lowest ratings were given in the evening, followed by the ratings
in the afternoon. In the morning, the highest KSS levels were reached.
Figure 3.8: Average KSS ratings with 95% CI during drive of young and old age
group for both manual and automated driving.
Figure 3.9: Average KSS ratings with 95% CI during drive at different times of
the day for both age groups and driving modes.
In addition to the presented charts, Figure 3.10 compares age groups and
driving modes in terms of the absolute number of participants that reached a
certain drowsiness level. For the young age group, all KSS levels were
covered, both in manual and in automated driving. A decreasing trend for both
driving modes is recognizable throughout all KSS levels. For the older age
group, all levels were covered only in automated driving. In manual driving,
none of the older participants reached KSS levels 8 and 9. In general, for
manual driving in the old age group, the assessed drowsiness levels were low
and mainly ranged from KSS levels 1 to 5.
Figure 3.10: Comparison of age groups and driving modes in terms of number of
participants that chose/reached a certain KSS level.
3.2.1.3 Discussion and Limitations
Regarding the presented hypotheses, it can be summarized that driving mode
and driving time significantly affected drowsiness, whereas the age group did
not. Therefore, H1 and H2 can be accepted, while H3 needs to be rejected.
Concerning the drowsiness development in the two age groups, the results of
the ESS (see Table 3.6) could also be associated with the higher KSS levels of
the younger participants (see Figure 3.8). On average, the younger
participants reached higher ESS scores than the older participants (see Table
3.6). Another possible reason for the different development of drowsiness in
the two age groups is the following: The young subjects have a regular daily
routine throughout the whole week, e.g., at university or at their job, with
various duties and deadlines to be met. Phases such as the participation in a
study and, within the presented study, the partially automated drive may most
likely be seen as a relaxation phase. By contrast, the majority of the older
subjects were retired. In conversation with them, the impression arose that
participating in a simulator study, dealing with a future topic such as
automated driving, and experiencing new technology was fascinating for them.
The experience of new technology may also have influenced the answers to the
question "What would be the most appropriate level for you to receive a
warning in the vehicle?". In both groups, level 6 was chosen most often
(young: 10, old: 6). However, six older subjects appeared to be a little more
cautious and decided for KSS level 4 and, therefore, would wish for a warning
already at a very early stage of drowsiness. This effect needs to be verified
in future research with a much larger number of younger and older subjects. If
the effect that older people wish to be warned at an earlier point in time
persists, this could be incorporated into the design of future drowsiness
warning systems.
The change of KSS levels over time in automated compared to manual driving
(see Figure 3.7) reflects the significant effect of driving mode on the
development of drowsiness, which is also confirmed in the post-questionnaire.
The majority of all participants stated that automated driving had a higher
impact on the increase of drowsiness than manual driving, which might be
caused by the duty and the monotony of monitoring the system during partially
automated driving. Moreover, the small effect of lower drowsiness ratings
after the drive than during the drive raises some questions for further
investigation. Which ratings can be trusted more and are more reliable as a
reference and ground truth for driver drowsiness? In this context, it would
also be interesting to investigate how drowsiness would have developed under
the same conditions during a longer driving period, and whether the
differences or the drowsiness levels would increase. Moreover, a higher number
of participants in general, and in the age range of 26-64 years in particular,
should be considered.
In the current study, optimal conditions were created to induce drowsiness.
These included no communication with the experimenter, no food, no caffeinated
drinks in the five hours before the study, a monotonous driving route with a
low speed limit, and a warm temperature inside the simulator. Considering
these factors and that the study was carried out in a driving simulator,
drowsiness would generally occur later under realistic conditions, since
possible dangers can be neglected in a simulator-based environment compared to
real traffic. However, with these conditions, it was possible to achieve high
drowsiness levels already in a short ride of 45 minutes. This work focused on
comparing two age groups, selected based on their recommended average sleep
requirements. Different tendencies in their drowsiness development under the
given conditions were recognizable. Nevertheless, the validity of the observed
effects needs to be strengthened with a higher number of subjects and a more
realistic driving environment.
3.2.1.4 Main Findings
In the following, the main findings are summarized:
• A significant effect of driving mode on the development of drowsiness was
  found. Therefore, a significant difference between manual and automated
  driving was evident with higher levels in automated driving.
• A significant effect of driving time on the development of drowsiness was
  found with higher levels of drowsiness at the end of the ride.
• No significant effect of age on the development of drowsiness was found.
  However, over time, the change of KSS showed the tendency that the
  younger participants were more prone to drowsiness and got drowsier
  faster compared to the older ones.
• In comparison to younger participants, older participants prefer a
  drowsiness warning at earlier stages of drowsiness.
3.3 Study 2: Test Track
In previous drowsiness studies in realistic environments, researchers applied
a Wizard-of-Oz approach to imitate automated driving and ensure safety while
driving (see Section 2.4.2). This Ph.D. thesis presents an alternative
solution for conducting user studies in which automated and manual driving can
be experienced in a realistic environment under conditions that are
reproducible and safe for the participants. To bridge the gap between
simulator studies and experiments in real traffic, the study was performed on
an outdoor test area with a suitably instrumented vehicle and safety device.
On the one hand, the controlled environment minimizes risk and danger for the
study participants; on the other hand, it comes closer to a realistic scenario
than drowsiness studies in a driving simulator.
Baseline study 2 was conducted on a test track of Technische Hochschule
Ingolstadt (THI) (see right part of Figure 3.11).
Figure 3.11: Left: test vehicle setup with driving robot on steering wheel (1) and
pedals (2) as well as tablet (3) for drowsiness self-ratings in center

console; Right: test track with safety corridor (4) and test vehicle (5).
In trial runs before the actual study, the route on the test track was driven
and recorded for the automated ride. Because of safety restrictions on the one
hand, and because high lateral accelerations in the curves would negatively
influence the drowsiness development on the other hand, the maximum speed was
limited to about 25 km/h. During the partially automated ride, the previously
recorded route and speeds were reproduced using high-precision GPS (Global
Positioning System) positioning. As in the simulated environment, the test
track was designed as a loop to make driving as monotonous as possible, in
order to accelerate or at least not impair the drowsiness development of the
test persons and to reduce possible alerting effects. The design took into
account a safety corridor with a width of approximately five meters between
the test area boundary and the test track, so that the vehicle could be
brought to a standstill in time in the event of emergency braking (see Figure
3.12).
Figure 3.12: Graphic representation of test area with distance information in meters
(m) and recorded study track (red dashed line).
The test vehicle was an Audi A4 Avant (initial registration 2018). To enable
SAE level 2 driving, the car was equipped with a driving robot [182], which
could take over the entire lateral and longitudinal guidance of the vehicle.
The driving robot, which consisted of two parts, was mounted on the pedals and
the steering wheel (see left part of Figure 3.11). It did not restrict the
driver's usual sitting position, since the vehicle also had to be driven
manually in the course of the study. The temperature in the vehicle was set to
23 °C. During the ride, the radio was off, and no other music was played. To
make the drive as safe as possible for the subjects, three safety precautions
were deployed. First, a second pair of pedals was attached on the passenger
side, where the experimenter sat during the study. Second, an emergency stop
was installed in the vehicle's center console, with which either the test
person or the experimenter could immediately bring the vehicle to a
standstill. Third, the ride was monitored by a person outside the test track,
who could stop the car via remote control.
With the provisions made in the test car and on the test track, it was possible to
carry out the study under reproducible and safe conditions for the subjects.
3.3.1 Results
In the following, the results from the subjective evaluation are presented and
discussed based on the postulated hypotheses.
3.3.1.1 Questionnaires
Pre-Questionnaire
As in study 1, 15 participants (seven female and eight male) were selected in
the age range of 20-25 years (age: M=23.73 years, SD=1.49 years), and 15 (five
female and ten male) in the range of 65-70 years (age: M=67.27 years, SD=1.83
years). The same number of participants from each age group was invited for
each of the different times of the day. As part of the pre-questionnaire,
subjects were asked if they had ever had a micro-sleep during a drive. Three
younger and three older participants answered this question with "yes".
Besides, none of the younger but seven older volunteers were undergoing
medical treatment at the time of the study. On average, the younger
participants slept 7.35 (SD=0.91) and the older ones 7.80 (SD=1.25) hours in
the night before the study. The perceived sleep quality in the night before
the study was rated with the choices "very good", "good", "medium", or "bad".
Four young and five older subjects answered with "very good", six young and
seven older with "good", four young and three older with "medium", and only
one young participant with "bad". A summary of the pre-questionnaire results
can be found in Table 3.8.

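Participants             Young           Old             Overall
female                   7               5               12
male                     8               10              18
age (years)              23.73 (±1.49)   67.27 (±1.83)   45.50 (±22.20)
Micro-sleep              Young           Old             Overall
yes                      3               3               6
no                       12              12              24
Medication               Young           Old             Overall
yes                      0               7               7
no                       15              8               23
Sleep duration (hours)   Young           Old             Overall
before study             7.35 (±0.91)    7.80 (±1.25)    7.56 (±1.10)
in general               7.43 (±0.93)    7.56 (±1.05)    7.50 (±0.93)
Sleep quality            Young           Old             Overall
very good                4               5               9
good                     6               7               13
medium                   4               3               7
bad                      1               0               1
Table 3.8: Results of pre-questionnaire.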
Furthermore, they were asked to respond to the questions of the ESS [40]. To
better illustrate the outcomes, the score range of 0 to 24 was separated into
five groups [181], as displayed in Table 3.9. As in study 1, all 15 subjects
of the older age group fall into the "lower normal" and "higher normal"
daytime sleepiness categories, which represent the non-critical range of the
ESS score [40]. In comparison, younger participants cover all categories
except category 3.
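Daytime Sleepiness                                        Young   Old   Overall
0 (0-5 points): lower normal daytime sleepiness           2       11    13
1 (6-10 points): higher normal daytime sleepiness         9       4     13
2 (11-12 points): mild excessive daytime sleepiness       3       0     3
3 (13-15 points): moderate excessive daytime sleepiness   1       0     1
4 (16-24 points): severe excessive daytime sleepiness     0       0     0
Table 3.9: Results of ESS [181].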
Post-Questionnaire
In the post-questionnaire, participants were asked whether they got sleepier
in automated or manual driving. Except for one younger subject, all
participants stated that they had to struggle more with drowsiness during the
partially automated ride. Concerning the self-ratings, the subjects were asked
how confident they felt in giving them. Of the three possible options "not
confident", "medium", and "confident", seven young participants chose "medium"
and eight "confident". The older participants answered similarly: six chose
"medium" and nine "confident". Participants were further asked at which level
they would like to receive a first drowsiness warning. The results are
presented in Table 3.10. In general, a wide range of levels was selected. The
majority of younger subjects wish to get a warning from KSS level 6, four
would even be satisfied with level 7, and three with level 8. Seven older
participants chose level 6, but only two chose level 7, and no one chose level
8 or 9. Levels 4 and 5 were chosen by two and three older participants,
respectively. One older participant even selected level 3.
3 (alert) 0 1 1
Table 3.10: Preferred KSS level for a first drowsiness warning.
Furthermore, the post-study questionnaire of study 2 contained questions about
the usage of wearable devices. More precisely, participants were asked whether
they possess one, intend to purchase one, and would wear one to ensure safety
while driving. The results can be found in Table 3.11. Only a minority of both
the younger and the older subjects own a wearable or plan to buy one shortly.
However, the majority would wear a wearable in the vehicle in order to
maintain or increase safety during driving, which supports the usage of
wearable devices in cars for safety-critical tasks like driver drowsiness
detection.
                                            Young   Old     Overall
possession of a wearable (yes/no)           2/13    3/12    5/25
purchase intention (yes/no)                 4/11    2/13    6/24
wearing for safety while driving (yes/no)   11/4    11/4    22/8
Table 3.11: Results from post-study questionnaire regarding usage of wearable
devices.
Trust Questionnaire
In study 2, it was also investigated whether trust or distrust in the
automated system influences the drowsiness state and whether a correlation
exists. The results are presented in the following paragraphs. This section is
based on the following own publication: [3]
To assess subjective trust, participants completed the trust scale by Jian et
al. [173], which provides sub-scales for both trust and distrust, before the
partially automated drive and again after it, in order to assess the effect of
initial system exposure on their trust levels. In terms of drowsiness, the KSS
ratings given during the drive were utilized. Statistical analysis was
conducted using IBM SPSS V.24, and effects are reported as significant at
p<.05. For the trust scale, scale values for both concepts, trust and
distrust, were calculated, since all scales showed acceptable reliability
(Cronbach's α ≥ 0.846 for all scales, see Table 3.12). Since not all data were
normally distributed, the Wilcoxon signed-rank test was applied to evaluate
within-subjects effects. Participants rated the sub-scale distrust lower after
than before the trip with the automated vehicle; however, the difference is
not statistically significant (p=.130). Ratings for trust, on the other hand,
increased significantly (p=.002) after the ride. Regarding between-subjects
effects of gender or age group, no significant differences were found. Male
drivers rated trust in the vehicle after the ride (M=4.58, SD=0.86) higher
than female drivers (M=3.56, SD=0.15), but the difference was not
statistically significant (p=.068).
To quantify the effect of the 45-minute monitoring task on drowsiness, a
linear regression was performed on the nine consecutive KSS ratings of each
participant, and the slope of the regression line was calculated. This allowed
the increase of drowsiness to be expressed in a single number while omitting
interpersonal differences emerging from the ordinal nature of the scale (e.g.,
an increase from level 1 to 4 yields the same slope as an increase from 4 to
7). Statistical evaluation using Mann-Whitney U tests revealed no significant
differences between the two age groups or genders. Considering a potential
correlation between trust and drowsiness, a significant positive correlation
between the drowsiness increase (KSS-slope) and the trust ratings after the
drive (r=.408; n=30; p=.013) was found.
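The KSS-slope described above can be illustrated with an ordinary least-squares fit over equally spaced rating points. This is a minimal sketch under stated assumptions, not the SPSS procedure used in the study: the function name `kss_slope`, the unit of the time axis (one step per rating), and the two example rating series are all hypothetical.

```python
def kss_slope(ratings, step=1.0):
    """Least-squares slope of KSS ratings over equally spaced time points.

    `step` is the spacing of the time axis; with step=1.0 the slope is
    expressed in KSS levels per rating interval (an assumption here).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are required")
    times = [i * step for i in range(n)]
    mean_t = sum(times) / n
    mean_r = sum(ratings) / n
    covariance = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, ratings))
    variance = sum((t - mean_t) ** 2 for t in times)
    return covariance / variance


# Two hypothetical participants with the same increase at different levels.
low_start = [1, 1, 2, 2, 3, 3, 4, 4, 4]
high_start = [r + 3 for r in low_start]
same_slope = abs(kss_slope(low_start) - kss_slope(high_start)) < 1e-12
```

Because the slope depends only on the change in ratings and not on their absolute level, both hypothetical series yield the same value, which is the property the text relies on.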
                    M      SD     C. α
Distrust (before)   1.85   1.20   0.887
Distrust (after)    1.56   1.11   0.846
Trust (before)      3.70   0.94   0.864
Trust (after)       4.18   1.11   0.951
KSS-slope           0.49   0.05   -
Table 3.12: Descriptive statistics: values for mean (M), standard deviation (SD)
and Cronbach's alpha (C. α) for trust/distrust items (before/after ride)
and KSS-slope.
3.3.1.2 Analysis of Self-Ratings
The statistical evaluation was conducted with IBM SPSS V.24. In total, 1072
drowsiness self-ratings, 540 from the manual and 532 from the partially
automated drive, were used as the data basis. For each participant, four data
sets were available: for both the manual and the automated ride, one with the
ratings given during the drive and one with the ratings given afterward.
Due to technical problems with the driving robot, two subjects had to stop
the partially automated drive after 35 minutes. For these cases, the last two
ratings were missing.
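The stated totals can be reproduced with simple arithmetic. The sketch below assumes that each data set contains nine ratings (one every five minutes over 45 minutes) and that the two missing ratings of the two affected participants are absent from both automated data sets (during and after); the second assumption is inferred from the total of 532 and is not stated explicitly in the text. The function name is illustrative.

```python
def expected_ratings(participants, ratings_per_set, sets_per_mode, missing=0):
    """Number of available KSS ratings for one driving mode."""
    return participants * ratings_per_set * sets_per_mode - missing


# 30 participants, 9 ratings per data set, 2 data sets (during/after) per mode.
manual = expected_ratings(30, 9, 2)
# Assumption: 2 participants x 2 missing ratings x 2 automated data sets.
automated = expected_ratings(30, 9, 2, missing=2 * 2 * 2)
total = manual + automated
```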
A three-factorial repeated-measures ANOVA with one within-subject factor
(measuring time points) and two between-subject factors (manual/automated,
young/old) was applied separately to the ratings during and after the drive to
evaluate the effects of driving mode, age, and driving time on the development
of drowsiness.
Figure 3.13: Average KSS ratings with 95% CI of all participants during and after
the drive.
Concerning the ratings during (D) and after (Af) the ride for both age groups
and driving modes, whose values are very similar for all measurement points
(see Figure 3.13), the same effects were obtained with the ANOVA. For this
reason, the results are presented together. The ANOVA results show a
significant effect of driving time on the development of drowsiness (D:
F(8,18)=8.36; p<.001; ηp²=.788; Af: F(8,18)=6.94; p<.001; ηp²=.755). Thus, the
KSS ratings differ significantly over the nine measuring time points, with
higher drowsiness levels at the end of the ride. Furthermore, the results show
a significant effect of driving mode on the development of drowsiness (D:
F(1,25)=48.84; p<.001; ηp²=.661; Af: F(1,25)=45.59; p<.001; ηp²=.646). Over
the nine measuring time points, a significant difference between manual and
automated driving was evident in the KSS ratings, with higher levels in
automated driving. Moreover, a significant effect of age group was found (D:
F(1,25)=5.11; p=.033; ηp²=.170; Af: F(1,25)=5.39; p=.029; ηp²=.177). Thus, a
significant difference exists between the younger and older subjects in the
KSS ratings over the nine measuring time points, with higher drowsiness levels
for the younger subjects. No significant interaction effects were observed for
driving time and age group (D: F(8,18)=1.482; n.s.; Af: F(8,18)=1.12; n.s.),
driving mode and age group (D: F(1,25)=.05; n.s.; Af: F(1,25)=1.37; n.s.),
driving time and driving mode (D: F(8,18)=.65; n.s.; Af: F(8,18)=2.35; n.s.),
or driving time, driving mode, and age group (D: F(8,18)=.55; n.s.; Af:
F(8,18)=.74; n.s.).
Apart from the ANOVA results, the calculated effects are also apparent in
different charts. The average KSS ratings with 95% CI for the considered cases
are plotted over time for the manual and the partially automated ride. In
addition to the diagrams that compare manual vs. automated (see Figure 3.14)
and young vs. old (see Figure 3.15), the average ratings of all subjects for
both driving modes at the three different times of the day (see Figure 3.16)
were considered. For all these charts, the ratings collected during the drive
were used.
Figure 3.14: Average KSS ratings with 95% CI during drive of all participants for
manual and automated driving.
Figure 3.15: Average KSS ratings with 95% CI during drive of young and older age
group for both manual and automated driving.
Figure 3.16: Average KSS ratings with 95% CI during drive at different times of
the day for both age groups and driving modes.
When comparing manual and partially automated driving (see Figure 3.14)
regardless of age group, it becomes clear that higher self-ratings were given
during automated driving. This reflects the significant difference in KSS
ratings between the two driving modes over the measuring time points. After
just five minutes of driving, the average difference is almost one KSS level,
and it increases to a maximum of 1.73 levels by minute 35. The significant
difference in the development of KSS ratings over time in terms of age for
both driving modes is shown in Figure 3.15. The younger subjects gave
significantly higher ratings as time progressed. After ten minutes, an average
difference of 1.20 KSS levels is apparent. The difference reaches its maximum
of 1.60 KSS levels after 25 minutes. Towards the end of the rides, the two
curves, and thus the KSS levels, converge slightly.
Considering the KSS ratings at different times of the day (see Figure 3.16),
it becomes clear that the lowest ratings were given in the evening. The curves
for morning and afternoon cannot be clearly separated over time; however,
higher KSS levels were reached in the afternoon, especially at the end of the
ride.
To compare the differences between the two age groups and the two driving
modes even more clearly, Figure 3.17 shows how many participants reached a
certain KSS level for each age group and driving mode. In manual driving, six
of the younger participants reached KSS level 7 and two reached level 8,
whereas for the older subjects, the maximum was level 6, chosen by three
subjects. In automated driving, only two older subjects reached level 8 and
none reached level 9. Overall, older subjects generally chose lower levels,
and the differences between the two age groups are more pronounced in manual
driving.
Figure 3.17: Comparison of age groups and driving modes in terms of number of
participants that chose/reached a certain KSS level.
3.3.1.3 Analysis of Heart Rate Data from Wearable Devices
Since heart rate was found to change during drowsiness [10, 30], in study 2,
correlations between drowsiness self-ratings and heart rate data from the
wearable devices were calculated with Spearman's ρ. Spearman's ρ was chosen
because the drowsiness self-ratings are ordinal and discrete and the data do
not follow a bivariate normal distribution. The participants were equipped with
four wearable devices (Empatica E4, Garmin Forerunner 235, Garmin Vivosmart 3,
and Polar A370), two on each wrist. The focus is on the last three, as these
are standard fitness trackers available on the consumer electronics market. In
the following, they are referenced as Wearable1 (Garmin Forerunner 235),
Wearable2 (Garmin Vivosmart 3), and Wearable3 (Polar A370). The self-ratings
are available in five-minute intervals, but the heart rate was measured every
second. This mismatch was resolved in a preprocessing step: the mean value of
the heart rate over the entire five-minute interval was calculated and matched
with the corresponding drowsiness level.
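The preprocessing and correlation step described above can be sketched as follows. This is a minimal illustration in Python, not the analysis code used in the study: the function names (interval_means, average_ranks, spearman_rho) and all example values are assumptions made for illustration, and the p-values reported in Table 3.13 are not computed here.

```python
import math
from statistics import mean


def interval_means(heart_rate_bpm, interval_s=300):
    """Average a 1 Hz heart rate series over consecutive, non-overlapping intervals."""
    return [
        mean(heart_rate_bpm[start:start + interval_s])
        for start in range(0, len(heart_rate_bpm) - interval_s + 1, interval_s)
    ]


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the tie-corrected ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: two five-minute intervals of 1 Hz heart rate data
hr_series = [72] * 300 + [68] * 300
print(interval_means(hr_series))

# Hypothetical self-ratings and matching five-minute mean heart rates
kss = [3, 4, 5, 6, 6, 7]
mean_hr = [74.1, 73.0, 72.4, 71.8, 72.0, 70.5]
print(round(spearman_rho(kss, mean_hr), 3))
```

Ranking with averaged ties matters here because the self-ratings are discrete, so many intervals share the same rating.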
Table 3.13 presents the results of the correlation analysis. Across the
different data sets, only a weak correlation with the drowsiness self-ratings
was found.
Data set              Wearable1      Wearable2      Wearable3
automated driving     ρ = 0.003      ρ = 0.090      ρ = 0.041
                      p = 0.959      p = 0.159      p = 0.521
manual driving        ρ = -0.136*    ρ = -0.166*    ρ = 0.036
                      p = 0.028      p = 0.007      p = 0.551
young participants    ρ = -0.157*    ρ = -0.141*    ρ = -0.131*
                      p = 0.011      p = 0.023      p = 0.030
old participants      ρ = -0.351*    ρ = -0.245*    ρ = -0.231*
                      p = 0.000      p = 0.000      p = 0.000
overall               ρ = -0.137*    ρ = -0.090*    ρ = -0.088*
                      p = 0.002      p = 0.043      p = 0.046
Table 3.13: Results from correlation analysis with Spearman's ρ, tested
separately for automated and manual driving, young and old participants, as
well as the overall data set. Significant results (p < 0.05) are marked
with an asterisk.
The lowest correlations were achieved with Wearable3. In contrast, the highest
correlations were achieved with the data of the older subjects, with Wearable1
(ρ = -0.351). Significant correlations could be obtained with all three
wearable devices.
A closer look at the change of heart rate throughout the drives (see Figures
3.18 and 3.19) in comparison to the self-ratings (see Figures 3.14 and 3.15)
makes the differences and their reasons apparent. Since the highest
correlations were found for Wearable1 on average across all data sets, only
data of this device was considered in these evaluations. Whereas drowsiness
shows a constant increase over time for automated and manual driving as well as
for young and old participants, the heart rate remains, except for smaller
fluctuations, at a relatively constant value throughout the drive. Regarding
the heart rate itself, there are noticeable differences between automated and
manual driving (see Figure 3.18) as well as between the age groups (see Figure
3.19). In manual driving, the average heart rate for all subjects is 5.79 beats
per minute higher than in automated driving, possibly due to the reduced
activity and workload in automated driving. Furthermore, the average heart rate
for young subjects is 2.40 beats per minute higher than for the older ones.
Figure 3.18: Average heart rate (Wearable1) with 95% CI (dashed lines) of all par-
ticipants for manual and automated driving.
Figure 3.19: Average heart rate (Wearable1) with 95% CI (dashed lines) of younger
and older participants for both manual and automated driving.
3.3.1.4 Discussion and Limitations
All presented hypotheses can be accepted. As in the simulator study, a
significant effect of driving mode (H1) and driving time (H2) on the
development of drowsiness was found. Accordingly, the KSS ratings over the nine
measuring time points differed significantly between manual and automated
driving, with higher drowsiness levels in automated driving. Further, the KSS
ratings differed significantly across the nine measuring time points, with
higher levels of drowsiness at the end of the ride. In contrast to study 1, in
study 2, a significant effect of age on the development of drowsiness was found
(H3): the KSS ratings over the nine measuring time points differed
significantly between the younger and older subjects, with higher drowsiness
levels for the younger subjects. The differences between the two selected age
groups were thus more pronounced in a more realistic environment than in the
simulator. This emphasizes the need for studies in realistic environments. The
higher KSS levels for the young age group may again be related to the ESS
questionnaire results. On average and similar to the simulator study, the
younger subjects achieved higher ESS scores. Nine younger subjects, compared to
four older subjects, are located in the level of higher normal and even three
in the area of mild excessive daytime sleepiness. In the group of older
participants, the remaining 11 subjects are represented in the lowest level. In
general, it can be noted that despite the relatively short driving time in
manual and partially automated driving, high levels of drowsiness, with higher
levels in automated driving, could be achieved with the chosen study setting on
a real test area. Thus, even in a production car, the duty of monitoring and
the non-engagement in secondary activities during partially automated driving
affect drowsiness within a short time. This was also confirmed by the
post-questionnaire, where the majority stated that they had to fight drowsiness
more during automated driving. Another possible reason is the connection
between drowsiness and trust in the automated system. Results showed a
significant correlation between increased drowsiness and trust ratings after
the ride with the automated vehicle. It was found that the subjects rated the
distrust items lower after driving than before driving with the automated
vehicle. In return, the trust ratings increased after the trip. Trust in the
automated vehicle was present already after a short initial system exposure.
Drivers who trust the automated vehicle more show stronger signs of drowsiness,
which may negatively impact the monitoring behavior. This result is important
as the attested safety risk of drowsy driving could become even more critical
with automated vehicles that (at SAE level 2) must be permanently monitored by
the driver. On the other hand, this could allow drowsiness measures to be
included (given this assumption holds for physiological measurements) as an
unobtrusive behavioral measure for automation trust, too. Increased signs of
drowsiness could thus be interpreted such that drivers of automated vehicles
accept falling asleep due to high trust in automation. The behavior of drowsy
drivers might help to infer trust in an unobtrusive way. Research on trust in
automation and drowsiness will be necessary to prevent misuse and successfully
implement automated driving technology.
Regarding H4, only a weak correlation with the drowsiness self-ratings was
found. It should be noted that the analysis was performed only at five-minute
intervals, and only the mean was considered in terms of heart rate. Smaller
intervals and other heart rate features could therefore be calculated and
correlated with drowsiness, which might result in higher correlations.
Moreover, the reference for drowsiness in the form of self-ratings could be
called into question because the subjects could have misinterpreted their
current state of drowsiness or given a rating that did not reflect it. A more
objective variant, e.g., observer ratings collected for shorter time intervals,
could represent a more meaningful ground truth for drowsiness. With longer
driving times, more pronounced changes in the heart rate signal and, as a
result, higher correlations with drowsiness could possibly have been detected.
Related studies show that driving with automation tends to result in a decrease
in heart rate in comparison to manual driving [183, 184, 185]. However, not all
studies show consistent results. Regarding time-on-task effects, subjects could
also have become accustomed to the experiment. Concerning the differences in
heart rate between the two age groups, results from literature show that the
maximum heart rate decreases with increasing age [186, 187, 188], which could
also be confirmed with the data from consumer-grade wearable devices in this
work. Further, in another work, it was shown that predicting drowsiness for
older people with models that were trained with data of young people is not
sufficiently reliable. In contrast, models trained with data from young people
could predict drowsiness for young people with higher accuracies [4].
Therefore, with the knowledge gained, intelligent driver-vehicle interfaces
intended to warn the driver in the event of an onset of drowsiness can be
adapted and personalized. For example, individual models for different driving
modes and age groups can be developed.
Regarding the study itself, it has to be noted that the test site on which the
study was carried out is limited in size, resulting in extremely monotonous
and safe driving conditions. In a real-world scenario, drowsiness might have
occurred later because the possible dangers posed by other road users were
absent on the test site. Furthermore, it should be noted that certain
precautions were taken to induce drowsiness more quickly in the subjects, e.g.,
no caffeinated drinks five hours before the study, the monotonous route, the
duty of monitoring, and the low speed limit. However, in this environment,
under controlled, safe, and above all reproducible conditions, it was possible
to investigate the risk factor of driver drowsiness in manual and partially
automated driving in a more realistic environment and a production car.
Nevertheless, the presented effects should also be examined and validated in
other age groups and with a larger number of participants.
3.3.1.5 Main Findings
• A significant effect of driving mode on the development of drowsiness was
  found. Therefore, a significant difference between manual and automated
  driving was evident with higher levels in automated driving.
• A significant effect of driving time on the development of drowsiness was
  found with higher levels of drowsiness at the end of the ride.
• A significant effect of age on the development of drowsiness was found.
  Therefore, a significant difference exists between the younger and older
  subjects with higher drowsiness levels for the younger subjects.
• A significant correlation between drowsiness self-ratings and heart rate
  data from wearable devices was found.
• Noticeable differences in the participants' heart rate regarding driving
  mode and age of the subjects were found.
• In comparison to younger participants, older participants prefer a
  drowsiness warning at earlier stages of drowsiness.
• The presented study setting shows that approaching a more realistic
  scenario under reproducible and above all safe conditions, e.g., when dealing
  with safety-critical issues such as drowsiness, is possible if appropriate
  preparations and precautions are taken.
• A significant correlation between an increase in drowsiness and trust
  ratings after the ride with the automated vehicle was found. Participants
  rated the distrust items lower after driving than before driving with the
  automated vehicle. Trust increased after the drive and was present after
  a short initial system exposure.
• To ensure and increase safety, participants would use a wearable device
  while driving.
4 Model Development: Driver
Drowsiness Detection using
Wrist-Worn Wearable Devices
The previous chapter presented the baseline studies and examined possible
preconditions for the adaptation and personalization of driver drowsiness
detection systems and modeling of different user groups. It was discussed how
the knowledge gained could be incorporated into the development process of
intelligent driver-vehicle interaction concepts for driver drowsiness detection.
This chapter addresses RQ2 (Can driver drowsiness be derived from vital
parameters measured with wrist-worn smart wearables?). The applicability of
wrist-worn wearable devices for driver drowsiness detection in an automotive
environment is examined. The potential and feasibility of using physiological
data from a wrist-worn wearable device, readily available in the consumer
electronics market, as a single data input for a machine learning classifier to
detect driver drowsiness are explored and evaluated. In further steps and based
on the results, the knowledge gained can then be applied to develop multimodal
systems with a sensor fusion approach that merge the data of a wrist-worn
wearable device with other non-intrusive in-vehicle sensors, e.g., a driver
monitoring camera. For now, and within the scope of this thesis, the goal is to
investigate which detection performance can be achieved purely with the
wearable device's data.
Therefore, the methodology presented in Figure 4.1 was established. Several
physiological signals measured with wearable devices were tested. Relevant
features were extracted and labeled with the driver's drowsiness level.
Supervised machine learning classifiers were evaluated and compared in
user-independent and user-dependent tests, focusing on classification accuracy,
the choice of input parameters and features, the choice of ground truth, and
the number of drowsiness levels. Each of these steps is explained in detail in
the upcoming sections. In the following experiments, feature extraction and
data set preparation for machine learning were performed with MATLAB (version
2018a). The machine learning procedure, including oversampling and feature
selection, was performed with the Weka machine learning library implemented in
Java [189].
Figure 4.1: Methodology for model development and testing.
4.1 Wrist-Worn Wearable vs. Medical-Grade Device
As described in Chapter 2, HRV has been applied particularly often to detect
drowsiness. However, the recording is often very intrusive, e.g., by attaching
adhesive electrodes to the driver's upper body during an ECG measurement. For
the present case, HRV is calculated from the recorded physiological data of a
wrist-worn wearable device. The Empatica E4 wristband (further referenced
as wristband) records IBIs, i.e., the time between two successive individual
heartbeats, with a sampling frequency of 64 Hz. The IBIs are used for
calculating the instantaneous heart rate and can be applied for analyzing HRV.
This experiment aimed to investigate and assess the potential and feasibility of
using physiological data, i.e., HRV, from a wrist-worn wearable device, readily
available on the market, as a single data source combined with a common ma-
chine learning model for driver drowsiness detection. To further check accuracy
and feasibility, results are compared with reference data from an intrusive and
medical-grade ECG device (further referenced as ECG), the Faros 3-channel
ECG from Bittium [179]. With a sampling frequency of 1000 Hz, this device
collects RR intervals, i.e., the time elapsed between two successive R waves
of the QRS complex on the ECG (two heartbeats), which can be applied for
HRV analysis. Figure 4.2 shows an excerpt of a study participant's RR intervals
from the ECG signal.
Figure 4.2: RR intervals in QRS complex of ECG signal exemplary taken from a
study participant.
Hence, in the context of RQ2 (Can driver drowsiness be derived from vital
parameters measured with wrist-worn smart wearables?), the following research
questions (RQ) will be investigated in this experiment:
• RQ2.1: Is it possible to reliably detect driver drowsiness by using
  physiological data (HRV) from a wrist-worn wearable device as a single data
  source in combination with a machine learning classifier?
• RQ2.2: Considering the in-vehicle setting, how do the results of the
  consumer device differ from those of a more intrusive medical-grade device?
• RQ2.3: How do the results differ in the case of user-dependent vs.
  user-independent tests?
4.1.1 Method
The methodology, comprising the collection of the ground truth for drowsiness,
feature extraction, data set preparation, and classification of driver
drowsiness, is explained in the following. The data from the SAE level 2 drive
from Study 1 were used for this experiment. At this level, the vehicle takes
over the longitudinal and lateral guidance, but the driver must constantly
monitor the driving situation [20]. Driving-related parameters can no longer be
continuously evaluated to identify drowsiness, so at this level, where the
driver forms the fall-back level for the automated system, alternative methods
for identifying drowsiness are necessary.
4.1.1.1 Ground Truth for Drowsiness
In order to obtain a reliable and valid ground truth of drowsiness for supervised
machine learning, a two-stage process with a combination of observer ratings
and image processing was applied.
Observer Ratings
Observer ratings of the driver's facial expressions and behaviors were collected
after the study by evaluating the video data recorded while driving. The
45-minute partially automated ride was split into nine intervals, each five
minutes in length. From each 5-minute interval, the fourth minute was extracted
to be rated by the observers. Sandberg et al. found that most driver drowsiness
indicators can be observed over intervals of 60 seconds or longer to obtain
reasonable signs of a driver's drowsiness state [190]. The extracted one-minute
segments were randomized per participant to eliminate the single segments'
time dependency. Otherwise, video segments at the end of a participant's drive
would probably be rated higher than those in the beginning. To increase the
reliability of the results, two trained individuals rated all videos
separately. Following that, the segments with inconsistent ratings were
evaluated and discussed by both raters, and a joint rating was set. The
obtained rating represents the entire 5-min interval, assuming that the
drowsiness state does not change abruptly but rather slowly. The six-level
drowsiness scale by Weinbeer et al. (1: not drowsy; 2: slightly drowsy;
3: moderately drowsy; 4: drowsy; 5: very drowsy; 6: extremely drowsy) was
applied for collecting the observer ratings (see Table 2.2 in Section 2.2.1).
Taking into account the 30 subjects and the nine one-minute segments extracted
from each 45-minute partially automated drive, a total of 270 minutes, i.e.,
270 ratings, would have been available for evaluation. However, problems with
the video recording occurred for some subjects, or the face was only partially
visible in the video, e.g., due to an unusual seating position. These segments
were removed so that in the end, 244 min were evaluated. Both raters made the
same decisions in 191 of 244 cases, which corresponds to 78.28%. Inter-rater
reliability in the form of Cohen's Kappa resulted in a value of 0.69, which
represents substantial agreement following the classification of Landis and
Koch [191].
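The two agreement measures mentioned above can be illustrated with a short sketch. It assumes the unweighted form of Cohen's Kappa, which the text does not specify, and the two rating lists are hypothetical examples rather than the study data; only the counts 191 and 244 are taken from the text.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of segments on which both raters chose the same level."""
    agree = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agree / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of the raters' marginal frequencies
    p_expected = sum(count_a[level] * count_b[level] for level in count_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Agreement reported in the text: 191 identical decisions in 244 segments
print(round(100 * 191 / 244, 2))  # 78.28

# Hypothetical ratings on the six-level scale (not the study data)
rater_a = [1, 1, 2, 2, 3, 3]
rater_b = [1, 1, 2, 3, 3, 3]
print(cohens_kappa(rater_a, rater_b))
```

Kappa is lower than the raw agreement because part of the raw agreement would be expected even if both raters chose levels independently.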
Detection of Micro-Sleep Events through Image Processing

In addition to the observer ratings, the recorded video data was evaluated
through image processing. As the raters only assessed one minute in each
5-minute interval of the 45-minute ride, drowsiness indicators in the remaining
four minutes could not be considered for the final rating. Concerning the
drowsiness scale, this would have been particularly critical if a micro-sleep
event had taken place within the non-rated minutes since levels 4 to 6 include
the specific eyelid closure time as a drowsiness indicator. Therefore, all
detected events with an eye closure of at least one second were used, together
with the respective eyelid closure duration, to assign the appropriate level of
drowsiness. With this additional step, the observer ratings could be
cross-checked and enhanced. In total, 201 micro-sleep events were detected for
14 out of 30 subjects. All events detected were manually double-checked based
on the corresponding frame numbers in the video file. As shown in Table 4.1,
the events were split to be assigned directly to drowsiness levels 4 to 6 on
the scale used. As the scale does not consider eyelid closure times between 3
and 4 seconds (level 5: 2–3 seconds and level 6: 4 seconds or more), these
micro-sleep events were added to level 5.
Drowsiness level       Eyelid closure time    Micro-sleep events
4: drowsy              1 ≤ seconds < 2        89
5: very drowsy         2 ≤ seconds < 4        69
6: extremely drowsy    seconds ≥ 4            43
Table 4.1: Categorization and allocation of the 201 detected micro-sleep events to
the corresponding level of drowsiness based on the eyelid closure time.
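The allocation in Table 4.1 amounts to a simple threshold mapping, sketched below. The function name microsleep_level and the example durations are hypothetical; the thresholds are those given in the table, with closures between 3 and 4 seconds assigned to level 5 as described in the text.

```python
def microsleep_level(closure_s):
    """Map an eyelid closure duration in seconds to a drowsiness level (Table 4.1)."""
    if closure_s < 1:
        return None  # closures under one second are not counted as micro-sleep events
    if closure_s < 2:
        return 4     # drowsy
    if closure_s < 4:
        return 5     # very drowsy, including the 3-4 s gap of the scale
    return 6         # extremely drowsy


# Hypothetical closure durations in seconds
for duration in (0.4, 1.0, 2.5, 3.6, 4.0):
    print(duration, microsleep_level(duration))
```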
After receiving observer ratings and micro-sleep events, both measures were
combined. For each subject, the ratings for the 5-min intervals with
micro-sleep events were adjusted if necessary. In total, 23 adjustments were
made: the drowsiness level of 21 of the 244 received ratings was changed
(20 times upwards and once downwards), and two new ratings could be gained,
giving a total of 246 ratings (see Table 4.2).
Observer rating    Adjusted rating    Number of occurrences
1                  4                  1
2                  4                  3
2                  5                  4
2                  6                  1
3                  4                  2
3                  5                  5
4                  5                  3
5                  6                  1
6                  4                  1
n.a.               6                  2
Table 4.2: Adjustment of observer ratings after integration of detected micro-sleep
events with number of occurrences for each case; n.a.: no observer rating
available.
Since the focus is on assessing the potential of wrist-worn wearable devices
for driver drowsiness detection, the case of binary classification of
drowsiness (not drowsy vs. drowsy) is considered. In further steps, the
classification of drowsiness will be extended to a multi-class problem. For the
binary case, the six drowsiness levels of the considered scale were divided
into two groups. The non-drowsy class contains levels 1 to 3 (not drowsy,
slightly drowsy, moderately drowsy), whereas the drowsy class covers levels 4
to 6 (drowsy, very drowsy, extremely drowsy). Table 4.3 shows the distribution
of the number of grouped ratings on the two classes before and after the
adjustment with micro-sleep events. A class imbalance is apparent with a higher
number of instances in the non-drowsy class.
Class                      Observer ratings    Observer ratings + micro-sleep events
non-drowsy (levels 1–3)    212                 196
drowsy (levels 4–6)        32                  50
Table 4.3: Distribution of number of observer ratings in absolute numbers across
grouped drowsiness levels before and after integration of micro-sleep

events.
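The class counts after the adjustment can be derived from the adjustment table and the counts before the adjustment. The following sketch is only a consistency check written for illustration: the adjustment triples are copied from Table 4.2, the starting counts from Table 4.3, and the function names are hypothetical.

```python
# (observer rating, adjusted rating, number of occurrences) from Table 4.2;
# None marks intervals for which no observer rating was available
ADJUSTMENTS = [
    (1, 4, 1), (2, 4, 3), (2, 5, 4), (2, 6, 1), (3, 4, 2),
    (3, 5, 5), (4, 5, 3), (5, 6, 1), (6, 4, 1), (None, 6, 2),
]


def is_drowsy(level):
    """Binary grouping: levels 1-3 are non-drowsy, levels 4-6 are drowsy."""
    return level >= 4


def class_counts_after_adjustment(non_drowsy, drowsy, adjustments):
    """Update the two class counts with the adjustments from micro-sleep events."""
    for observed, adjusted, count in adjustments:
        if observed is None:
            # Newly gained rating: add it to the class of the adjusted level
            if is_drowsy(adjusted):
                drowsy += count
            else:
                non_drowsy += count
        elif is_drowsy(observed) != is_drowsy(adjusted):
            # Rating crosses the class boundary: move it between the classes
            if is_drowsy(adjusted):
                non_drowsy -= count
                drowsy += count
            else:
                drowsy -= count
                non_drowsy += count
    return non_drowsy, drowsy


print(class_counts_after_adjustment(212, 32, ADJUSTMENTS))  # (196, 50)
```

Sixteen ratings move from the non-drowsy to the drowsy class and two new drowsy ratings are added, which reproduces the counts in the right-hand column of Table 4.3.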
In Figure 4.3, the localization in time of the number of ratings for both the non-
drowsy and drowsy class after integrating micro-sleep events is presented.
The number of drowsy ratings increased almost linearly across all subjects up
to a driving time of 30 min. In contrast, the number of non-drowsy ratings
decreased in the same time interval. From minutes 30 to 40, the opposite
is recognizable. From this, it can be deduced that drowsiness increased on
average across all subjects up to minute 30 and decreased from minutes 30 to
40. Towards the end, the level of drowsiness rose again.
Figure 4.3: Localization of number of ratings in time for both the non-drowsy and
drowsy class.
4.1.1.2 Feature Extraction and Data Set Preparation
Before presenting the HRV feature extraction method in this work, related ap-
proaches from previous work will be presented.
Vicente et al. investigated two drowsiness detection methods with various
window sizes. First, they extracted features from the time and frequency domain
utilizing windows with a length of three minutes. Second, every minute was
assessed and labeled either non-drowsy or drowsy. They utilized direct
discriminant analysis for classification and Wilks' lambda minimization
criteria for reducing the number of features. With that approach, they achieved
a sensitivity of 0.59 and a specificity of 0.98 [117]. Zhao et al. investigated
the detection of drowsiness with approximate entropy (ApEn) and power spectral
density (PSD) of an ECG signal. An auto-regressive method was applied to
compute PSD. They found that during drowsiness, the LF PSD diminishes and the
HF PSD and the ApEn of the ECG increase [192]. Jung et al. explored the
utilization of conductive fabric electrodes on the steering wheel to record the
ECG of the driver and analyze it in the time and frequency domain. The PSD was
calculated with a fast Fourier transform (FFT). For driver state
classification, an ANS balance graph built with LF, HF, LF/HF ratio, and RMSSD
(root mean square of successive differences) was utilized [193]. A comparable
technique (FFT for PSD calculation) was introduced by Nambiar et al., who
trained a neural network for classification and reported an accuracy of around
99.99% [194]. Lenis et al. used the Welch method to extract frequency domain
measures from ECG and presumed that during a microsleep, pulse rate decreased
and HRV increased [195].
Based on the presented related work, it can be assumed that an analysis of HRV
with suitable methods and algorithms allows high detection rates concerning
drowsiness.
For the present experiment, the different sampling rates of the two devices
were not adjusted. The aim is to compare the usage of data from a consumer and
a medical-grade device. Since three channels were recorded with the ECG
measuring device, but only one and the same channel was used for all
participants for further analysis, the three channels of all subjects were
visually inspected in terms of data quality and possible artifacts, e.g.,
undetected RR peaks in the ECG pattern, with the Kubios HRV analysis software
[196]. Finally, the RR peaks of ECG channel 1 were applied in further analysis
in raw format. The Empatica E4 wristband uses an algorithm to record the data
and process the PPG/BVP signal. This algorithm filters and removes false peaks
due to noise (e.g., motion artifacts) [127]. For this reason, the raw data of
the wristband was used for the following analyses and is not further filtered
or preprocessed. IBIs from wristband and ECG were processed in the time,
frequency, and non-linear domain for HRV analysis. HRV features were extracted
from 5-min windows of the signal with reference to the Task Force of the
European Society of Cardiology and the North American Society of Pacing and
Electrophysiology, which suggests 2–5-minute windows to analyze
frequency-domain features [121]. A sliding window with a 2-second increment was
used for generating the feature vectors (see Figure 4.4). In the frequency
domain, PSD was computed with the Lomb–Scargle periodogram, a non-parametric
technique that does not require any prior information about the signal
parameters. The advantage of the Lomb–Scargle periodogram over other
non-parametric techniques like Welch is that no interpolation is needed if the
signal is unevenly sampled, as is the case for RR interval series. Features
were extracted from three different frequency bands of the PSD: the very low
frequency (VLF) band (0–0.04 Hz), the LF band (0.04–0.15 Hz) related to the
sympathetic activity, and the HF band (0.15–0.4 Hz) related to the
parasympathetic activity of the ANS [117]. Features in the non-linear domain
were calculated with Poincaré plots.
The following 26 features were extracted (abbreviations in brackets):
• Time-domain features: mean RR interval length (meanRR), maximum RR interval
  length (maxRR), minimum RR interval length (minRR), range of RR interval
  length (rangeRR), standard deviation of RR interval lengths (SDNN), mean of
  5-min standard deviation of RR intervals (SDANNIndex), maximum heart rate
  (maxHR), minimum heart rate (minHR), average heart rate (meanHR), standard
  deviation of heart rate (SDHR), square root of the mean squared difference of
  successive RR intervals (RMSSD), number of differences between successive RR
  intervals greater than 50 ms (NN50), percentage of successive RR intervals
  differing by more than 50 ms (pNN50);
• Frequency-domain features: very low frequency power (VLFpower), low
  frequency power (LFpower), high frequency power (HFpower), total power
  (Totalpower), percentage value of very low frequency power (pVLF),
  percentage value of low frequency power (pLF), percentage value of high
  frequency power (pHF), normalized low frequency power (LFnorm), normalized
  high frequency power (HFnorm), ratio of low and high frequency (LFHF_ratio);
• Non-linear domain features: standard deviation of the instantaneous
  (short-term) beat-to-beat RR interval variability (SD1), standard deviation
  of the long-term RR interval variability (SD2), ratio of SD1 and SD2.
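A few of the time-domain and non-linear features listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation used in this work: the function name hrv_features and the example RR series are hypothetical, sample standard deviations (n - 1) are assumed, meanHR is assumed to be the mean of the instantaneous heart rate, and SD1 and SD2 are taken as the dispersion of the Poincaré plot across and along its line of identity.

```python
import math
from statistics import mean, stdev


def hrv_features(rr_ms):
    """Compute selected HRV features from a list of RR intervals in milliseconds."""
    pairs = list(zip(rr_ms, rr_ms[1:]))          # (RR_n, RR_n+1) points of the Poincare plot
    diffs = [nxt - cur for cur, nxt in pairs]    # successive differences
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    # Dispersion perpendicular to (SD1) and along (SD2) the line of identity
    sd1 = stdev([(nxt - cur) / math.sqrt(2) for cur, nxt in pairs])
    sd2 = stdev([(nxt + cur) / math.sqrt(2) for cur, nxt in pairs])
    return {
        "meanRR": mean(rr_ms),
        "rangeRR": max(rr_ms) - min(rr_ms),
        "SDNN": stdev(rr_ms),
        "meanHR": mean([60000 / rr for rr in rr_ms]),  # beats per minute
        "RMSSD": math.sqrt(mean([d * d for d in diffs])),
        "NN50": nn50,
        "pNN50": 100 * nn50 / len(diffs),
        "SD1": sd1,
        "SD2": sd2,
        "SD1_SD2_ratio": sd1 / sd2,
    }


# Hypothetical RR intervals in milliseconds
features = hrv_features([800, 810, 790, 850, 780])
for name, value in features.items():
    print(name, round(value, 2))
```

The frequency-domain features are omitted here because they require a spectral estimate of the RR series.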
After extraction, features were labeled for supervised machine learning (see
Figure 4.4). In total, data from 27 out of 30 subjects were available for feature
extraction from the wristband. To provide comparability, the missing three
subjects were not considered in the case of the ECG. Overall, the number of
non-drowsy and drowsy instances for the wristband are 14627 and 3987 and
for ECG 24149 and 5845. It becomes clear that ECG contains more instances
due to a more accurate, higher-resolution measurement than the wristband.
Figure 4.4: Sliding window approach for feature extraction and labeling exemplary
for time intervals 1 and 2.
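The sliding window approach can be sketched as follows, assuming an uninterrupted recording and windows that are only used when they fit completely into the recording. The function names and the example beat times are hypothetical, and the labeling of the windows is not reproduced here.

```python
def sliding_windows(duration_s, window_s=300, step_s=2):
    """Yield (start, end) times in seconds of all complete windows."""
    start = 0
    while start + window_s <= duration_s:
        yield start, start + window_s
        start += step_s


def intervals_in_window(beat_times_s, ibis_ms, start, end):
    """Select the inter-beat intervals whose beat time lies in [start, end)."""
    return [ibi for t, ibi in zip(beat_times_s, ibis_ms) if start <= t < end]


# 5-minute windows shifted by 2 seconds over a 45-minute drive
windows = list(sliding_windows(45 * 60))
print(len(windows))  # 1201
print(windows[:2])   # [(0, 300), (2, 302)]
```

Because consecutive windows overlap by 298 seconds, neighboring feature vectors are strongly correlated, which is one reason why the evaluation protocol separates subjects between training and testing.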
4.1.1.3 Classification of Driver Drowsiness
The ability to generalize to new users is crucial for the establishment of
systems for driver activity recognition. Thereby, the problem of inter-driver
variance has to be taken into account because physiological signals can differ
to a great extent between persons, in our case, drivers of an automated vehicle
[197]. We apply a user-independent test (UIT) to deal with this issue. In the
UIT, a leave-one-subject-out cross-validation (LOSOCV) is performed. The data
set of each subject is treated as testing data once. Since data from 27
participants were collected, in each LOSOCV iteration, 26 participants are used
for training and the remaining one for testing. The prediction results are then
averaged over all subjects (see Figure 4.5). In addition to the UIT, a
user-dependent test (UDT) is performed in the form of 10-fold stratified
cross-validation (CV) to obtain the overall classification accuracy and
decrease the effect of inter-driver variance. Stratified cross-validation was
utilized because each fold reflects the class distribution of the original data
set. Given the present class imbalance, this ensures the same proportion of
drowsy and non-drowsy samples in each cross-validation run. Moreover, it
reduces both bias and variance compared to regular k-fold cross-validation,
where the data set is only randomly divided into k folds [198].
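The splitting scheme of the user-independent test can be sketched in a few lines. The sketch only shows how the folds are formed and averaged; the function names, the subject identifiers, and the dummy evaluation function are hypothetical and stand in for the actual training and testing of a classifier.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held-out subject, training indices, testing indices) per fold."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def average_over_subjects(subject_ids, evaluate):
    """Average a per-fold score, where evaluate(train_idx, test_idx) returns a number."""
    scores = [
        evaluate(train_idx, test_idx)
        for _, train_idx, test_idx in leave_one_subject_out(subject_ids)
    ]
    return sum(scores) / len(scores)


# Hypothetical data set: 27 subjects with three instances each
subject_of_instance = [subject for subject in range(27) for _ in range(3)]
folds = list(leave_one_subject_out(subject_of_instance))
print(len(folds))                           # 27 folds, one per subject
print(len(folds[0][1]), len(folds[0][2]))   # instances used for training and testing
```

Splitting by subject rather than by instance ensures that no data of the tested driver is seen during training.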
Class Balancing and Feature Selection

To counteract class imbalance, the number of instances in the two classes has
been adjusted using the synthetic minority oversampling technique (SMOTE)
[199]. During LOSOCV and 10-fold cross-validation, SMOTE was applied to
the training set in each iteration before the classifier was trained (see
Figure 4.5).
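The idea behind SMOTE, creating synthetic minority instances by interpolating between a minority instance and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration and not the Weka filter used in this work: the base instance is drawn at random instead of iterating over all minority instances, balancing to equal class sizes is an assumption, and all names and example values are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic instances on the line between a minority instance and a neighbor."""
    if len(minority) < 2:
        raise ValueError("at least two minority instances are required")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base instance (Euclidean distance)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance_training_set(features, labels, minority_label, k=5, seed=0):
    """Oversample the minority class of a training set until both classes are equal."""
    minority = [f for f, label in zip(features, labels) if label == minority_label]
    deficit = (len(labels) - len(minority)) - len(minority)
    synthetic = smote(minority, max(deficit, 0), k, seed)
    return features + synthetic, labels + [minority_label] * len(synthetic)


# Hypothetical training fold: six non-drowsy and three drowsy feature vectors
train_x = [[5.0, 5.0]] * 6 + [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_y = ["non-drowsy"] * 6 + ["drowsy"] * 3
balanced_x, balanced_y = balance_training_set(train_x, train_y, "drowsy", k=2)
print(balanced_y.count("non-drowsy"), balanced_y.count("drowsy"))
```

Only the training fold is passed to the oversampling step, so the held-out test data contains no synthetic instances.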
To reduce the feature space, feature selection was performed on each training
set in each iteration after oversampling. For this purpose, correlation-based
feature subset selection (CFSS) was applied. CFSS searches for a subset of
features that are highly correlated with the output class yet uncorrelated
with each other. Thereby, feature-to-feature correlation is reduced, and
feature-to-class correlation is increased. The criterion is defined using the
Pearson correlation coefficient. CFSS is not a search method; instead, it
proposes a metric to evaluate a specific feature subset. Best-first was used as
the search algorithm. Each feature's predictive ability is used to evaluate the
worth of a subset, considering the redundancy between all features [200]. In
each iteration, the test data set was then reduced to the features contained in
the subset obtained from the training set (see Figure 4.5).
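The subset criterion described above is commonly written as merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the number of features in the subset, r_cf the mean feature-to-class correlation, and r_ff the mean feature-to-feature correlation. The following Python sketch evaluates this merit with Pearson correlations, following the description in the text; the function names are hypothetical, and the best-first search over subsets is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cfs_merit(features, target):
    """Merit of a feature subset: high class correlation, low redundancy."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


target = [0, 0, 1, 1]
relevant = [0, 0, 1, 1]      # perfectly correlated with the class
redundant = [0, 0, 1, 1]     # duplicates the relevant feature
irrelevant = [0, 1, 0, 1]    # uncorrelated with class and other features

merit_single = cfs_merit([relevant], target)
merit_redundant = cfs_merit([relevant, redundant], target)
merit_irrelevant = cfs_merit([relevant, irrelevant], target)
```

In this toy case, adding a fully redundant feature does not raise the merit, and adding an irrelevant feature lowers it, which is why the selected subsets tend to stay small.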
Figure 4.5: Graphic representation of Leave-One-Subject-Out Cross-Validation
(LOSOCV) for 27 subjects: Oversampling of minority class in training
set in each iteration with Synthetic Minority Oversampling Technique
(SMOTE); selection of best feature subset with Correlation-based Feature
Subset Selection (CFSS) in training set and adjusting of features
in testing set; calculation of average accuracy (A) and F-measure (F)
(separately for both classes) across all iterations.
Performance Measures
Concerning performance measures, accuracy, as one of the traditional measures,
is of limited suitability when dealing with unbalanced data because its focus
is more on the majority classes than on the minority ones [201]. Thus, F-measure
is used additionally. For both UDT and UIT, accuracy and F-measure were
calculated and averaged across all iterations (see Figure 4.5). Concerning the
presented binary classification problem, it is, of course, important to
correctly detect when the driver is in a drowsy state. However, from the
customer's point of view, it is also crucial to correctly detect when the
driver is in a non-drowsy state so as not to irritate the driver with
unnecessary drowsiness warnings. Considering a standard confusion matrix with
the values for True Positive (TP), True Negative (TN), False Positive (FP), and
False Negative (FN), the formula for F-measure (see Equation (4.1)) does not
take the True Negative (TN) values into account.
$$F\text{-measure} = \frac{2\,TP}{2\,TP + FP + FN} \tag{4.1}$$
In the presented case, the correctly classified instances of the negative
class, representing drowsy, would not be considered. For this reason, at each
cross-validation iteration, the value for F-measure is calculated per class.
For the non-drowsy, i.e., positive class, this is further referenced as F1, and
for the drowsy, i.e., negative class, as F2. An average value is then presented
for both F1 and F2 across all subjects. F2 is therefore the crucial measure for
the detection of drowsiness: it reflects how reliably drowsy instances were
classified as drowsy.
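The per-class calculation can be illustrated with a short Python sketch that applies Equation (4.1) once with non-drowsy and once with drowsy treated as the class of interest. The function names and the toy labels are assumptions made for illustration.

```python
def f_measure(tp, fp, fn):
    """F-measure as in Equation (4.1): 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def per_class_f(actual, predicted, labels=("non-drowsy", "drowsy")):
    """Compute the F-measure separately for each class."""
    scores = {}
    for label in labels:
        tp = sum(a == label and p == label for a, p in zip(actual, predicted))
        fp = sum(a != label and p == label for a, p in zip(actual, predicted))
        fn = sum(a == label and p != label for a, p in zip(actual, predicted))
        scores[label] = f_measure(tp, fp, fn)
    return scores


# Toy example: 8 non-drowsy and 2 drowsy instances, one drowsy instance missed.
actual = ["non-drowsy"] * 8 + ["drowsy"] * 2
predicted = ["non-drowsy"] * 8 + ["non-drowsy", "drowsy"]
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
scores = per_class_f(actual, predicted)
```

In this toy example, the accuracy is 90% and the F-measure of the non-drowsy class (F1) is 16/17, while the F-measure of the drowsy class (F2) is only 2/3. A single aggregate value would hide this gap.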
Machine Learning Classifiers

Several different classifiers were applied to the data collected during the
simulator study. The models were not fine-tuned; instead, the default parameter
values preset in the Weka machine learning library [189] were used. Models
from different categories were compared with each other regarding their
performance in the presented classification problem. From the category of tree
classifiers, Random Forest (RF) (100 trees), Random Tree (RT), and Decision
Stump (DS) were chosen. In terms of rule-based classifiers, a Decision Table
(DT) (search algorithm: best-first, evaluation measure: root-mean-square
error (RMSE)) was applied. The K-Nearest Neighbor (KNN) algorithm (no distance
weighting, number of neighbors: 1, search algorithm: brute-force, distance
function: Euclidean) was tested from the group of lazy learners, and a Bayesian
Network (BN) (estimator: simple estimator, search algorithm: K2) and a Naive
Bayes (NB) classifier from the group of Bayesian classifiers. A Support Vector
Machine (SVM) classifier (kernel: polynomial, C: 1) represented the
function-based classifiers. A Multilayer Perceptron (MLP) (batch size: 100,
hidden layers: (number of features + number of classes)/2, learning rate: 0.3,
momentum: 0.2) from the group of neural networks was applied. For each of
the models presented, the same testing procedure was carried out, i.e., a
10-fold stratified CV in the UDT and a LOSOCV in the UIT, as described
above.
4.1.2 Results
In the following, results from feature selection and classification of
drowsiness are presented.
4.1.2.1 Selected Features
The feature selection procedure using CFSS was performed on the training
data. In the next step, the features in the testing data were adjusted
accordingly. This was done before the actual training and testing of the machine
learning model. Table 4.4 presents all selected feature subsets in CFSS for UDT
and UIT for both devices. The total number of available subsets equals the
number of iterations in each cross-validation: 10 for UDT (10-fold stratified
CV) and 27 for UIT (LOSOCV with 27 subjects).
Test  Device     Count  Selected feature subset
UDT   Wristband  8×     meanRR, meanHR
UDT   Wristband  1×     meanRR, meanHR, RMSSD
UDT   Wristband  1×     meanRR, meanHR, RMSSD, NN50
UDT   ECG        3×     maxRR, minRR, maxHR, minHR
UDT   ECG        2×     maxHR, minHR
UDT   ECG        5×     maxRR, minRR
UIT   Wristband  5×     meanRR, meanHR, SD1
UIT   Wristband  1×     meanRR, maxRR, SD1, LFpower
UIT   Wristband  1×     meanRR, maxRR, meanHR
UIT   Wristband  5×     meanRR, meanHRV, RMSSD
UIT   Wristband  9×     meanRR, meanHR
UIT   Wristband  1×     meanRR, meanHR, HFpower
UIT   Wristband  1×     meanRR, meanHR, pNN50, SD1
UIT   Wristband  1×     meanRR, meanHR, RMSSD, LFpower
UIT   Wristband  1×     meanRR, meanHR, SD2
UIT   Wristband  1×     meanRR, meanHR, Totalpower
UIT   Wristband  1×     meanRR, minHR, meanHR, RMSSD, NN50
UIT   ECG        8×     maxRR, minRR, maxHR, minHR
UIT   ECG        5×     maxRR, minRR, rangeRR, maxHR, minHR
UIT   ECG        3×     maxRR, maxHR
UIT   ECG        1×     maxHR, minHR
UIT   ECG        6×     minRR, minHR
UIT   ECG        4×     maxRR, minRR
Table 4.4: Selected Feature Subsets in CFSS for User-Dependent Test (UDT) (10
iterations in 10-fold Cross Validation (CV)) and User-Independent Test
(UIT) (27 iterations in LOSOCV) with number of occurrences for wristband
and ECG.
When looking closer at the selected feature subsets for the UDT, it can be seen
that they consist only of time-domain features. However, the subsets of the two
devices do not contain any identical features. In the case of the wristband, a
total of five different features are selected. Each subset includes meanRR and
meanHR. In the case of ECG, maxRR and minRR appear in eight subsets.
Furthermore, maxHR and minHR were selected. The low number of subsets
can be attributed to the fact that all subjects' data were randomized and evenly
divided into ten folds in a stratified way to counter inter-driver variance
between the subjects.
In comparison to UDT, 11 different feature subsets were selected with data from
the wristband for UIT, which is almost four times as many as for UDT. In
addition to time-domain features, features from the frequency and non-linear
domains were selected in the feature subsets. It becomes clear that inter-driver
variance influences the choice of features even if only a single person is
removed from the data set. This can also be recognized for ECG. Twice as many
feature subsets exist but, as for the UDT, they contain only features from the
time domain and therein min/max values of the RR and HR signals.
In general, for both the UDT and UIT, the majority of selected feature subsets
for the wristband and ECG mainly consist of time-domain features. In the
present case, the features' importance can be ranked in descending order as
follows: time domain, frequency domain, non-linear domain.
4.1.2.2 Classification Results
Table 4.5 shows the results of UDT and UIT for both devices and all models
tested, with the respective values for accuracy and F-measure. Focusing
on the UDT accuracy, it can be seen that ECG data produced better results
for all tested classifiers except NB. These differences are, in some cases, more
pronounced, as for BN, SVM, DS, and DT, but for KNN, RF, and RT, the difference
is only a few percentage points. For the wristband, the highest accuracy
of 92.13% was achieved with KNN, followed by 91.58% with RF and 90.02% with RT.
For ECG, the classifiers RF and RT performed best with an accuracy of 97.37%,
followed by BN with 96.85% and DT with 91.18%. In terms of F-measure, it is
noticeable that the values for F1 and F2 diverge more for the wristband. Looking
at RF and RT, the values for F1 are 0.94 and 0.93, and the values for F2 are
0.82 and 0.79. For ECG, these are 0.98 and 0.94 each. In general and concerning
F1, the values of the wristband are slightly lower but comparable to ECG.
Except for DS and MLP, which have higher values for F2 than for ECG, the
differences between F1 and F2 are larger. For ECG, the values for F1 and
F2 are very high and at a similar level for BN, KNN, RF, RT, and DT. This
speaks for a low number of false positives and false negatives and an equally
satisfying classification in both classes. Correctly assessing drowsy instances
as drowsy proved to be more difficult for the models when working with
data from the wristband, which is reflected in lower values for F2 and
a higher number of false negatives than for ECG.
With regard to UIT and except for SVM with wristband data (65.64% accuracy),
overall lower classification accuracies were achieved when compared to UDT. NB
(66.74%), SVM (65.64%), and MLP (43.48%) yielded better results with data
from the wristband. All other models achieved higher results with HRV from
ECG data. However, the differences between the two devices are not as high
as in the UDT case. With wristband data, DS achieved the highest accuracy with
73.39% and was the only model >70%. Regarding ECG data, the classifiers NB
(41.84%), SVM (40.01%), and MLP (25.84%) did not reach the threshold of 50%. In
terms of accuracy, DS scored the best with ECG at 78.94%.
UDT     Wristband               ECG
Model   A       F1     F2       A       F1     F2
BN      79.59   0.86   0.62     96.85   0.98   0.92
NB      68.39   0.79   0.32     53.88   0.66   0.26
SVM     29.27   0.72   0.32     54.39   0.66   0.28
KNN     92.13   0.95   0.83     97.34   0.98   0.94
RF      91.58   0.94   0.82     97.37   0.98   0.94
RT      90.02   0.93   0.79     97.37   0.98   0.94
DS      77.60   0.86   0.36     81.36   0.89   0.21
DT      80.61   0.87   0.65     97.18   0.98   0.93
MLP     61.94   0.69   0.46     64.43   0.71   0.30

UIT     Wristband               ECG
Model   A       F1     F2       A       F1     F2
BN      57.77   0.69   0.19     70.86   0.77   0.10
NB      66.74   0.80   0.57     41.84   0.63   0.20
SVM     65.64   0.79   0.26     40.01   0.61   0.28
KNN     55.44   0.71   0.12     65.71   0.75   0.14
RF      62.36   0.74   0.20     70.64   0.79   0.13
RT      63.16   0.72   0.19     68.88   0.76   0.21
DS      73.39   0.82   0.65     78.94   0.83   0.17
DT      64.28   0.73   0.15     76.14   0.73   0.37
MLP     43.48   0.57   0.42     25.84   0.49   0.22
Table 4.5: Classification results with performance measures for UDT/UIT,
Wristband/ECG and for all tested classifiers. The abbreviations represent
accuracy (A) in percent, F-measure for non-drowsy class (F1) and F-measure for
drowsy class (F2).
Focusing on the values of F1 and F2, it is noticeable that the F1 values for
both devices are lower compared to the UDT. In the UIT, the maximum
F1 score for both the wristband and ECG was achieved with DS (0.82 and 0.83,
respectively). In contrast, values for F2 are significantly lower. DS achieved a
maximum of 0.65 with data from the wristband and DT 0.37 with ECG data. In
general, the number of false negatives in the drowsy class is considerably
higher than the number of false positives in the non-drowsy class, which
resulted in the low values for F2. To give a better impression of the
classification results, Table 4.6 shows exemplary UIT classification results for
four individual subjects (3, 4, 21, and 27) and five selected classifiers (BN,
KNN, RF, RT, and DT) for wristband and ECG. The values for F2 show again that
the classification of drowsy instances is more challenging than that of the
non-drowsy ones. Generally speaking, it also becomes clear that the performance
depends very much on the individual subject, reflecting the strong influence of
the inter-driver variance. Depending on the model and the type of data, strong
performance fluctuations exist within this small extract from the data set.
                   Participant 3          Participant 4
Model  Device      A      F1    F2        A      F1    F2
BN     Wristband   59.90  0.75  0.00      20.45  0.32  0.04
BN     ECG         51.73  0.59  0.41      36.57  0.54  0.00
KNN    Wristband   58.17  0.73  0.05      22.30  0.37  0.00
KNN    ECG         63.41  0.68  0.58      42.93  0.59  0.07
RF     Wristband   57.37  0.73  0.02      53.16  0.14  0.68
RF     ECG         66.62  0.77  0.36      42.29  0.54  0.23
RT     Wristband   56.74  0.72  0.02      40.90  0.52  0.24
RT     ECG         67.27  0.78  0.36      42.29  0.54  0.23
DT     Wristband   49.60  0.66  0.00      34.20  0.04  0.50
DT     ECG         38.25  0.00  0.55      60.57  0.40  0.71

                   Participant 21         Participant 27
Model  Device      A      F1    F2        A      F1    F2
BN     Wristband   52.30  0.65  0.27      18.02  0.17  0.20
BN     ECG         77.06  0.87  0.00      85.49  0.92  0.51
KNN    Wristband   62.20  0.77  0.00      42.55  0.60  0.01
KNN    ECG         74.56  0.85  0.04      78.65  0.86  0.26
RF     Wristband   63.65  0.78  0.00      47.43  0.64  0.04
RF     ECG         72.40  0.84  0.02      83.49  0.91  0.00
RT     Wristband   63.29  0.76  0.00      47.06  0.63  0.04
RT     ECG         74.81  0.86  0.02      87.49  0.93  0.00
DT     Wristband   58.09  0.74  0.00      33.29  0.43  0.19
DT     ECG         87.49  0.93  0.00      87.48  0.93  0.00
Table 4.6: Exemplary classification results of UIT for selected classifiers and
subjects with performance measures for both wristband and ECG.
4.1.3 Discussion and Limitations
In the following, the presented RQs will be answered, and implications for
further research in the development of novel techniques for driver drowsiness
detection derived.
RQ2.1: Is it possible to reliably detect driver drowsiness by using
physiological data (HRV) from a wrist-worn wearable device as a
single data source in combination with a machine learning classifier?

In general, high accuracies (>90%) and values for F1 (>0.90) and F2 (>0.80)
were achieved in the UDT with data from the wristband. However, as the
results show, a high detection accuracy depends on the type of classifier
applied. Classifiers from the group of lazy learners (KNN) and trees (RF, RT)
seem to be more suitable for this detection task compared to the other
classifier types. Especially KNN, with an accuracy of 92.13% and values of 0.95
and 0.83 for F1 and F2, respectively, should be highlighted. This speaks for the
use of physiological data, i.e., HRV, from a wristband as single data input for
driver drowsiness detection with user-dependent models.
RQ2.2: Considering the in-vehicle setting, how do the results of the
consumer device differ from a more intrusive medical-grade device?
Focusing on the results of the two devices in general, regardless of whether a
UDT or UIT was performed, it is noticeable that slightly better results for most
tested models were achieved with ECG data. However, it should be noted that
with the data of the consumer device, albeit not for all models, the results are
comparable to those of the more intrusive medical-grade device in the in-vehicle
setting. Thus, for the present application, a more complex ECG measurement
would not be required. Instead, the much less intrusive sensor of the
wrist-worn smart wearable would suffice.
RQ2.3: How do the results differ in the case of user-dependent vs.
user-independent tests? Results from the UDT are significantly higher than
in the UIT, which may reflect the considerable influence of inter-driver
variance that could have been better accounted for with a more extensive data
set. The classification of the drowsy class turned out to be particularly
critical in the UIT, which can be recognized by the low values of F2 for both
devices. The maximum achieved score for F2 was 0.65 with wristband data and 0.37
with ECG data. This indicates the challenges that still need to be addressed
before a robust commercial warning system can be developed. In comparison,
both UIT and UDT achieved high F1 values. From the customer's point of view,
this is crucial, as the driver does not want to be irritated by false drowsiness
warnings.
Moreover, data from more realistic environments has to be collected [202].
Vibrations in a real car that could influence the data recording were absent in
the simulator. To counteract the class imbalance, cost-sensitive classification
could also be considered, in which the values of TP, FP, TN, and FN carry
different weightings. Over the course of use, the system could transition from a
user-independent model to a user-dependent one and adapt to the user. In terms
of the ground truth, the question arises whether it is sufficient to use
drowsiness ratings that apply to 5-min intervals or whether ratings at much
shorter intervals are needed. During the transition to the adjacent 5-min
interval, possible changes in the course of drowsiness ratings may not be
reflected by the physiological signal. The applied models were used
off-the-shelf, and no hyper-parameters were tuned. The aim was to identify which
standard machine learning models are suitable for the proposed classification
problem. Fine-tuning of the most promising models could then increase
performance further. Since the focus in this work was on a specific type of
feature selection (CFSS) and class balancing, other methods should be
considered and compared.
4.1.4 Main Findings
• Due to the reduced influence of inter-driver variance, better results were
  achieved in the UDT.
• With a KNN classifier, a maximum accuracy of 92.13% was achieved with
  data from the wristband in the UDT. For ECG, a maximum accuracy of
  97.37% was obtained with an RT and RF.
• Due to the class imbalance in the data set with a higher number of
  non-drowsy samples, in the UIT, especially drowsy instances were more
  challenging to classify.
• In the UIT, the wristband achieved 73.39% accuracy with a DS. The
  maximum for ECG was 78.94%.
• For both UDT and UIT, the obtained results with data from wristband
  and ECG are comparable.
• Driver drowsiness detection using HRV from a standalone wrist-worn
  wearable device readily available on the market in combination with a
  machine learning model is feasible.
4.2 Wrist-Worn Wearable vs. Wrist-Worn Wearable
In the previous section, a wrist-worn wearable device was compared with a
medical-grade ECG device. Since a variety of wrist-worn smart wearables is
available on the consumer electronics market, the next step is to investigate
whether comparable results can be achieved with data from different
devices. The focus in this experiment is on the standard consumer-grade fitness
trackers Garmin Forerunner 235 (further referenced as Wearable1), Garmin
Vivosmart 3 (Wearable2), and Polar A370 (Wearable3). This time, heart rate
is applied as a single physiological parameter in combination with a machine
learning classifier for driver drowsiness detection. Apart from binary
classification, as applied in the previous section for driver drowsiness
detection, the performance of a multi-level classification of driver drowsiness
will also be examined.

Hence, in the context of
vital parameters measured with wrist-worn smart wearables?) three
further questions (RQ) will be investigated in this experiment:
RQ2.4: Is it possible to reliably detect driver drowsiness by using
exclusively physiological data (heart rate) from a wrist-worn wearable device
in combination with a common machine learning classifier?
RQ2.5: How do results of different devices differ?
RQ2.6: How do results for a different number of drowsiness levels differ?
4.2.1 Method
As in the previous experiment, the methodology is explained first, comprising
the ground truth of drowsiness, feature extraction and data set preparation,
and the classification of driver drowsiness. For this experiment, and to
approach a more realistic scenario, data collected during the partially
automated drive in study 2 are applied.
In contrast to the previous experiment, this time, drowsiness self-ratings are
applied as ground truth for driver drowsiness.
Since each of the 30 subjects rated their drowsiness every five minutes during
the 45-minute SAE level 2 automated drive, a total of 270 individual ratings
would be available. However, with two test subjects, the automated drive had
to be ended earlier because of problems with the driving robot. Therefore,
265 ratings were available that could be used as labels in supervised machine
learning. Figure 4.6 shows the average KSS ratings of all 30 subjects throughout
the 45-minute partially automated drive.
Figure 4.6: Average KSS ratings with 95% CI of all participants during SAE level
2 automated driving in baseline study 2.
The drowsiness ratings continuously rose throughout the drive, ranging from an
average KSS level of 3.37 after five minutes to 5.61 after 40 minutes. The
ratings marginally decreased in the last five minutes. To give a better
impression of the drowsiness self-ratings given during partially automated
driving, the distribution of ratings given by all 30 subjects across all nine
levels of the KSS is plotted in Figure 4.7(a). It can be seen that the ratings
are unevenly distributed, from a minimum of one rating at level 1 to a maximum
of 58 ratings at level 3. Level 4 was chosen 47 times. Level 2 with 28, level 5
with 37, level 6 with 33, and level 7 with 32 ratings are on a similar level.
KSS levels 8 and 9 received 14 and 15 ratings, respectively. Additionally, the
nine KSS levels were grouped into three and two levels with regard to the later
3- and 2-level classification of drowsiness in supervised machine learning. The
categorization of the KSS levels was based on the results in the work of Ingre
et al. [79]. In that work, a simulator study was conducted to investigate
sleepiness and accident risk and to assign subject-level relative risks to
various levels of sleepiness recorded with the KSS. Ratings were collected every
five minutes, and events of crashes, accidents, and incidents were recorded.
Their outcomes demonstrated that sleepiness was strongly related to the risk of
an accident. For an average participant, the risk for an accident was 28.2 times
higher at KSS level 8 and 185 times higher at KSS level 9 than at KSS level 5.
The grouping of KSS levels was derived from the predicted probability for the
event of an incident during the drive. Based on this, KSS levels 1-4 represent
the state awake (group 1), levels 5 and 6 the transition state (group 2), and
levels 7-9 the state drowsy (group 3) (see Figure 4.7(b)). For the 2-level case,
group 1 represents the non-drowsy and group 2 the drowsy class (see
Figure 4.7(c)).
Figure 4.7: Distribution of self-ratings across KSS levels (a) and grouped levels for
three-level (b) and two-level (c) classification of drowsiness.
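The three-level grouping described above can be written as a small Python function. The rating counts are taken from the text, reading the unlabeled count of 33 as KSS level 6 (an assumption); the function and variable names are hypothetical. The two-level mapping is not sketched because its KSS boundaries are only given in Figure 4.7(c).

```python
def kss_to_three_levels(kss):
    """Map a KSS rating (1-9) to group 1 (awake), 2 (transition), 3 (drowsy)."""
    if not 1 <= kss <= 9:
        raise ValueError("KSS ratings range from 1 to 9")
    if kss <= 4:
        return 1  # awake: KSS 1-4
    if kss <= 6:
        return 2  # transition: KSS 5-6
    return 3      # drowsy: KSS 7-9


# Number of self-ratings per KSS level as reported in the text
# (33 ratings assigned to level 6 by assumption).
ratings_per_level = {1: 1, 2: 28, 3: 58, 4: 47, 5: 37, 6: 33, 7: 32, 8: 14, 9: 15}

group_sizes = {1: 0, 2: 0, 3: 0}
for level, count in ratings_per_level.items():
    group_sizes[kss_to_three_levels(level)] += count

print(group_sizes)  # {1: 134, 2: 70, 3: 61}
```

Under these counts, the awake group is about twice as large as each of the other two groups, which illustrates the class imbalance addressed later by oversampling.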
4.2.1.2 Feature Extraction and Data Set Preparation
Concerning the recording of heart rate data, Wearable1 and Wearable3 provided
a heart rate value every second, whereas Wearable2 provided one, on average,
every three seconds across all 30 subjects. Figure 4.8 illustrates the average
heart rate of all subjects over the 45-minute SAE level 2 automated drive. It
can be seen that after a few minutes, the data of all three devices show a
similarly sized drop in heart rate. The heart rate of all three devices is
characterized by many fluctuations, some of which are strong, with a minimal
increase over the driving duration. It can also be determined that the values
of Wearable2 and Wearable3 are at a very similar level and only drift apart
towards the end. In contrast, Wearable1 has a similar course per se, but it is
a few beats lower than Wearable2 and Wearable3. The curves converge slightly
towards the end of the drive. For each wearable, data from the same 28 of the
30 participants were available for feature extraction. Again, a sliding
window with a length of five minutes and a 2-second increment was applied
for calculating the feature vectors. The following six features were extracted
(abbreviations in brackets) from the received heart rate data: maximum heart
rate (maxHR), minimum heart rate (minHR), average heart rate (meanHR),
standard deviation of heart rate (stdHR), range of heart rate (rangeHR), and
median of heart rate (medianHR). Feature vectors were then labeled with the
corresponding KSS rating for preparing the data sets, where a rating represents
the entire 5-minute interval.
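The windowing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the extraction code used in this work: timestamps are assumed to be in seconds and sorted, a window covers the half-open interval [start, start + 300 s), and stdHR is computed as the population standard deviation.

```python
import statistics
from bisect import bisect_left


def hr_features(window):
    """Compute the six heart rate features for one window of samples."""
    return {
        "maxHR": max(window),
        "minHR": min(window),
        "meanHR": statistics.fmean(window),
        "stdHR": statistics.pstdev(window),  # population std: an assumption
        "rangeHR": max(window) - min(window),
        "medianHR": statistics.median(window),
    }


def sliding_windows(timestamps, heart_rates, length_s=300, step_s=2):
    """Yield (window_start, features) for each complete sliding window."""
    if not timestamps:
        return
    start = timestamps[0]
    while start + length_s <= timestamps[-1]:
        lo = bisect_left(timestamps, start)
        hi = bisect_left(timestamps, start + length_s)
        window = heart_rates[lo:hi]
        if window:  # skip windows that fall into a recording gap
            yield start, hr_features(window)
        start += step_s


# Toy recording: 10 minutes at one value per second.
timestamps = list(range(600))
heart_rates = [60 + (t % 10) for t in timestamps]
windows = list(sliding_windows(timestamps, heart_rates))
```

Because the function works on timestamps rather than sample counts, the same code handles a device that reports a value only every few seconds; such a window simply contains fewer samples.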
Figure 4.8: Comparison of average heart rate in beats per minute of all participants
measured by the three wearables during the SAE level 2 automated
drive in baseline study 2.
4.2.1.3 Classification of Driver Drowsiness
10-fold stratified cross-validation (CV) was performed to obtain the overall
classification accuracy and decrease inter-driver variance. Stratified CV was
utilized because, in each fold, the class distribution of the original data set is
represented. Further, it decreases variance and bias compared to regular k-fold
CV, where the data set is only randomly split into k folds [198].
Class Balancing and Feature Selection

To counteract class imbalance, the number of samples in the classes was
balanced using SMOTE [199]. To reduce the feature space, CFSS was
applied to each training set in each iteration after oversampling.
Performance Measures
Accuracy (further referenced as A), i.e., the percentage of correctly classified
instances, and F-measure (further referenced as F), i.e., the harmonic
mean of precision and recall (sensitivity), are used as performance
measures, the latter primarily due to the present class imbalance. Besides
the reliable detection of drowsiness and from the customer's point of view, it
is also crucial to recognize the non-drowsy state. The driver does not want
to be irritated by false drowsiness warnings. Therefore, high values for both
recall and precision are needed.
Machine Learning Classifiers

As in the previous experiment, several common and widely used machine learning
models from different categories were evaluated and compared in terms of
performance for the presented classification problem. Again, models were not
fine-tuned or optimized; instead, the parameter values preset in the
Weka library [189] were used. From the group of tree classifiers, Random Tree
(RT) and Random Forest (RF) (100 trees) were selected. Decision Table (DT)
(search algorithm: best-first, evaluation measure: root-mean-square error
(RMSE)) and Partial Decision List (PART) were applied from the group of
rule-based classifiers. K-Nearest Neighbor (KNN) (no distance weighting, number
of neighbors: 1, search algorithm: brute-force, distance function: Euclidean)
from the category of lazy learners as well as Bayesian Network (BN) (estimator:
simple estimator, search algorithm: K2) and Naive Bayes (NB) classifier from
the group of Bayesian classifiers were tested. An SVM classifier (kernel:
polynomial, C: 1) represented the function-based classifiers. A Multilayer
Perceptron (MLP) (batch size: 100, hidden layers: (number of features + number
of classes)/2, learning rate: 0.3, momentum: 0.2) served as neural network.
4.2.2 Results
In the following, results from feature selection and classification of driver
drowsiness are presented.
4.2.2.1 Selected Features
In each iteration of 10-fold stratified CV, feature selection was performed on
the training data set. Table 4.7 shows the selected feature subsets for all three
wearables and the cases of 2-level and 3-level classification of drowsiness. In
the 2-level case, a single subset was selected for each of the three wearables
in all ten iterations of the stratified CV. The subsets of Wearable1 (meanHR,
stdHR, maxHR, medianHR) and Wearable3 (meanHR, stdHR, maxHR, minHR) each
contained four features, that of Wearable2 (maxHR, minHR) only two. In the
3-level case, just one subset was selected for Wearable1 and Wearable2, but two
for Wearable3. The subset of Wearable1 (meanHR, stdHR, maxHR, minHR)
contains four features, the subset of Wearable2 (meanHR, maxHR, minHR) three,
and the two subsets of Wearable3 four (meanHR, stdHR, maxHR, minHR) and
three (meanHR, stdHR, maxHR) features. It can be seen that meanHR is
included in all feature subsets, except for Wearable2 in the 2-level
classification case. In previous work, too, the main effect regarding the
driver's state was found in cardiac parameters at meanHR [30].
Classification  Device     Count  Selected feature subset
2-level         Wearable1  10×    meanHR, stdHR, maxHR, medianHR
2-level         Wearable2  10×    maxHR, minHR
2-level         Wearable3  10×    meanHR, stdHR, maxHR, minHR
3-level         Wearable1  10×    meanHR, stdHR, maxHR, minHR
3-level         Wearable2  10×    meanHR, maxHR, minHR
3-level         Wearable3  8×     meanHR, stdHR, maxHR, minHR
3-level         Wearable3  2×     meanHR, stdHR, maxHR
Table 4.7: Selected feature subsets for each wearable device in 10-fold CV for
2- and 3-level classification of drowsiness.
Table 4.8 shows the 2-level and 3-level classification results using 10-fold
stratified CV with corresponding performance measures (A and F) for all three
wrist-worn wearables used.
Looking at the results for the 2-level classification of driver drowsiness, they
can be divided into two or even three groups in terms of performance. With
the data of all three wearables, KNN, RF, RT, PART, and DT achieved very
high accuracies and F-measures, particularly with Wearable3: KNN (A:
99.42%, F: 0.99), RF (A: 99.34%, F: 0.99), RT (A: 99.13%, F: 0.99), PART (A:
98.53%, F: 0.99), DT (A: 96.03%, F: 0.96). Overall, somewhat poorer results
were achieved with BN, MLP, and SVM. The lowest classification performance
was reached with NB, which is even below the threshold of 50% for Wearable1
(A: 40.60%, F: 0.42) and Wearable2 (A: 36.77%, F: 0.37). With regard to the
devices used, it becomes clear that Wearable2 achieved the lowest accuracies
and F-measures across all classifiers tested.
When considering the 3-level classification results, similar results were
obtained as in the 2-level classification case. The best classification results
were again achieved with KNN, RF, RT, PART, and DT. The data from Wearable1
performed best with KNN (A: 98.59%, F: 0.99) and PART (A: 97.68%, F: 0.98),
the data from Wearable3 with RF (A: 98.56%, F: 0.99), RT (A: 98.01%, F:
0.98), and DT (A: 93.83%, F: 0.94). With regard to the existing strong class
imbalance, the high values for F-measure (0.99) are particularly noteworthy,
as they correspond to high precision and recall and thus a uniformly successful
classification in all classes. This is followed by BN (A: 74.06%, F: 0.75), MLP
(A: 53.22%, F: 0.53), SVM (A: 52.69%, F: 0.53), and NB (A: 49.36%, F: 0.50) in
descending order, for which the best results were achieved with data from
Wearable3. As in the 2-level case, NB is the only classifier below 50% accuracy
for all three devices used and, in particular, with Wearable2 (A: 25.95%, F:
0.23). The differences in the best-performing classifiers within the tested
devices are small, but Wearable2 contains the lowest values for accuracy and
F-measure across all classifiers.
In general, the best results were achieved with a model from the group of lazy
learners (KNN), followed by the tree (RF, RT) and rule-based classifiers (DT,
PART). Neural networks (MLP) and Bayesian classifiers (BN, NB) performed
less successfully.
2-level      Wearable1       Wearable2       Wearable3
Classifier   A       F       A       F       A       F
BN           82.28   0.83    77.88   0.80    83.93   0.85
NB           40.60   0.42    36.77   0.37    59.52   0.63
MLP          74.29   0.74    69.01   0.71    81.69   0.82
SVM          68.21   0.69    64.30   0.67    73.15   0.73
KNN          98.76   0.99    90.11   0.91    99.42   0.99
RF           98.73   0.99    90.08   0.91    99.34   0.99
RT           98.33   0.98    90.10   0.91    99.13   0.99
PART         96.70   0.97    89.65   0.90    98.53   0.99
DT           95.75   0.96    88.08   0.89    96.03   0.96

3-level      Wearable1       Wearable2       Wearable3
Classifier   A       F       A       F       A       F
BN           73.55   0.74    69.74   0.70    74.06   0.75
NB           37.90   0.40    25.95   0.23    49.36   0.50
MLP          53.03   0.49    48.54   0.48    53.22   0.53
SVM          49.15   0.48    49.33   0.50    52.69   0.53
KNN          98.59   0.99    95.43   0.95    98.52   0.99
RF           98.47   0.98    95.39   0.95    98.56   0.99
RT           97.90   0.98    95.05   0.95    98.01   0.98
PART         97.68   0.98    94.93   0.95    97.65   0.98
DT           93.39   0.93    91.75   0.92    93.83   0.94
Table 4.8: Classification results of 2-level and 3-level classification of drowsiness
with performance measures (accuracy (A) in percent and F-measure
(F)) for all three devices.
In the following, the presented RQs will be answered, the results discussed, and
implications for further research derived.
RQ2.4: Is it possible to reliably detect driver drowsiness by using ex-
clusively physiological data (heart rate) from a wrist-worn wearable
device in combination with a common machine learning classier?
In general, this RQ can be answered with yes. High accuracies (up to 99.42% in
2-level and 98.59% in 3-level classification) and F-measures (0.99) were
achieved. However, as can be seen from the results, successful detection
strongly depends on the classifier type. In the present case, classifiers from
the group of lazy learners (KNN) as well as tree-based (RF, RT) and rule-based
(DT, PART) classifiers were more suitable than Bayesian (BN, NB) or
function-based classifiers (SVM) and neural networks (MLP). Further reasons
could be the type of feature extraction, feature selection, and the ground
truth applied. Another important point to address is the way the performance
of the models has been tested. In this work, a 10-fold stratified CV was used.
Cross-validation is a common statistical method to compare and select models
for a given predictive modeling problem, estimate the skill of a model on
unseen data, and get an impression of its prediction capability. However,
before the data set is divided into ten folds, it is shuffled. Therefore, a
subject's data could be in both the training and test data sets. For this
reason, in a future step, the best performing models have to be tested with
entirely new data in order to assess to what extent they are capable of
generalizing to new data. With regard to the results, it should also be noted
that the data set for training and testing the models contained only 30
subjects from two specific age groups. Thus, the results may differ if a more
extensive data set with subjects from other age ranges were available.
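The leakage issue described above (a subject's data appearing in both the
training and the test folds) can be avoided by assigning folds per subject
instead of per sample. The following is a minimal sketch of such a subject-wise
split and not the evaluation procedure used in this work. The function names,
the round-robin fold assignment, and the example of 30 subjects with five
samples each are assumptions made for illustration. Unlike the 10-fold
stratified CV applied here, the sketch does not stratify by class.

```python
def subject_wise_folds(subject_ids, n_folds=10):
    """Return one fold index per sample; all samples of a subject share a fold."""
    subjects = sorted(set(subject_ids))
    # Round-robin assignment of subjects (not samples) to folds.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    return [fold_of_subject[s] for s in subject_ids]


def train_test_indices(folds, test_fold):
    """Split sample indices into training and test indices for one fold."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test


# Hypothetical data set: 30 subjects with 5 feature vectors each.
subject_ids = [s for s in range(30) for _ in range(5)]
folds = subject_wise_folds(subject_ids, n_folds=10)

for k in range(10):
    train, test = train_test_indices(folds, k)
    train_subjects = {subject_ids[i] for i in train}
    test_subjects = {subject_ids[i] for i in test}
    # No subject contributes data to both sets.
    assert train_subjects.isdisjoint(test_subjects)
```

Machine learning libraries usually provide comparable grouped splitters (for
example, GroupKFold in scikit-learn), which would be preferable to a
hand-written split in practice.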
RQ2.5: How do results of different devices differ? Generally speaking,
the three devices used show similar tendencies in all tested classifiers, from
which a similar measurement accuracy can be derived. Therefore, for later
in-vehicle usage, different wearables could be considered. It is noteworthy,
however, that Wearable2 has achieved, in some cases, poorer results compared
to the other two devices. Wearable2 only provides a heart rate value every 3
seconds on average across all subjects, which could be attributed to a different
analysis of the PPG signal or a possibly higher susceptibility to external
influences (vibrations, heavy arm movements). The other two wearables deliver a
heart rate value every second. The reduced number of data points may have
negatively affected the expressiveness of some of the extracted features and
thus the classification accuracy. It can be deduced that better results could
be achieved for the present case by sampling the heart rate with a higher
frequency. However, this finding needs to be investigated with other wrist-worn
wearables and larger data sets.
RQ2.6: How do results for a different number of drowsiness levels
differ? The differences between the results of 2-level and 3-level
classification are relatively small, with slightly better performances in the
2-level case. However, it is noteworthy that in the case of 3-level
classification, the differences between the devices are lower than in the
2-level case. This speaks for the use of wrist-worn wearables in both cases.
However, it depends on the individual use case whether drowsiness should be
classified binary (non-drowsy and drowsy) or whether the transition state from
non-drowsy to drowsy, which includes the onset of drowsiness, is to be
detected as well.
One limiting factor of this experiment might be the choice of ground truth
for drowsiness. As in the previous experiment, the applied models were used
with the default parameters preset in the Weka machine learning library, and
no hyper-parameters were tuned in the current development stage of the
detection models. Further, other types of feature selection and class balancing
should be considered.
4.2.4 Main Findings
• Driver drowsiness detection using exclusively physiological data (heart
  rate) from a standalone consumer-grade wrist-worn wearable device in
  combination with a common machine learning classifier is feasible.
  High values for accuracy (>99%) and F-measure (0.99) were achieved.
• KNN and RF performed best in both 2- and 3-level classification of driver
  drowsiness.
• Results of the three wearables tested are comparable, particularly in the
  case of 3-level classification.
• The shown method can be applied by potentially any consumer device,
  indicating that 2-level and 3-level classification of drowsiness is feasible
  with today's technology on the consumer electronics market and common
  machine learning techniques.
In the two experiments presented, both observer ratings and self-ratings were
used as the ground truth for drowsiness. Several other approaches can be found
in the literature, and no standard exists. Therefore, a further investigation,
presented in the following section, was carried out on this topic. This
investigation is not directly related to
vital parameters measured with wrist-worn smart wearables?). However, since it
is a fundamental and essential topic for obtaining a high detection accuracy
when dealing with supervised machine learning, it is in the same context. It
will be included in the overarching discussion at the end of this Ph.D.
thesis.
4.3 Ground Truth for Drowsiness: A Complexity Analysis
Driver drowsiness detection systems whose implementations are based on machine
learning, mainly supervised machine learning in the form of a classification
problem, as in the previous two experiments, are increasingly being proposed.
These algorithms are data-driven and require a sufficient amount of labeled
training data. The quality of the labels determines the quality of the machine
learning algorithm and thus of the drowsiness detection system. However,
various approaches exist to collect a ground truth for drowsiness. As
described above, in the first experiment, a combination of observer ratings
and image processing served as ground truth. In the second experiment,
self-ratings were applied. The acquisition is, in most cases, tailored to the
respective study. So far, and in terms of comparability of different works with
each other, a uniform process, general guidelines, or recommendations are
missing. In the majority of previous works, self-ratings, observer ratings, or
hybrid solutions are applied. In this experiment, observer ratings, which are
mostly assessed for time intervals of one-minute length, will be analyzed in
terms of necessary complexity and whether a reduction in rating frequency
affects the quality of the ratings. The findings of this experiment can be
applied in optimizing the collection of ground truth for drowsiness. Moreover,
they can serve as a starting point for further research in this area with the
ultimate goal of standardizing the collection of a valid ground truth for
driver drowsiness. In the next section, an overview of related work is given,
and the research questions for this experiment are derived.
4.3.1 Related Work
In this section, an overview of different ground truth types for driver drowsiness
from related work is given, including self-ratings, observer ratings, and hybrid
approaches. This is followed by the research goals of this experiment.
4.3.1.1 Self-Ratings as Ground Truth
In the work of Mehreen et al., a hybrid approach for detecting drowsiness was
applied. EEG was combined with data from the accelerometer and gyroscope.
KSS ratings served as ground truth, which were polled every minute during
12-minute "drowsy" and "fresh" drives [90] by filling in a questionnaire.
Friedrichs et al. evaluated 90 hours of real driving data collected by over
900 drivers. During these trips, the KSS was queried verbally by a co-driver
every 15 minutes. For drowsiness classification, the 9-point Likert scale was
grouped as follows: awake (KSS ≤ 6), questionable (6 < KSS < 8), and drowsy
(8 ≤ KSS). Facial features formed the data basis from recorded video data
classified using an artificial neural network (ANN) [81]. Fu et al. presented a
Hidden Markov Model (HMM) for evaluating ECG, EMG, and respiratory data.
During a 3.5-hour drive in real traffic, the drivers had to report their
drowsiness about every 15 minutes by means of the KSS after each driving
session. Specific levels of this scale were categorized into three levels:
alert (1-3), mild fatigue (3-5), fatigue (5-7) [162]. Leng et al. used data
from PPG and GSR sensors to develop an SVM model for drowsiness detection.
Again, KSS ratings, recorded during the drive based on the number of minutes
and verified by the participants by watching recorded videos, were applied as
ground truth. The 9-level scale was divided into five levels for
classification: level 1 (1-2), level 2 (3-4), level 3 (5-6), level 4 (7-8),
level 5 (9) [129]. Gielen and Aerts conducted a driving simulator study with
26 subjects with a maximum driving time of 150 minutes. For the detection of
drowsiness, heart rate and the temperature of the nose and wrist were
classified utilizing a binary decision tree. SSS ratings served as labels,
which had to be submitted every five minutes while driving. A person outside
the simulator signaled the moment for the rating with a hand gesture. From a
score of five, the participant was labeled as drowsy, below as non-drowsy
[146].
4.3.1.2 Observer Ratings as Ground Truth
In the work of Li et al., the steering wheel angle time series were evaluated
for detecting driver drowsiness. Data from six participants, recorded during a
90-minute monotonous real-world drive, served as a data basis for a binary
decision classifier. For the development of their drowsiness detection model,
three experts rated 1-minute video segments as either "awake" or "drowsy". If
the raters achieved no consensus on a sequence, the considered sample was
skipped [203]. Jacobé de Naurois et al. conducted a driving simulator study
with 21 participants and a driving time between 100 and 110 minutes. Several
different kinds of measures (behavioral, vehicle-based, and physiological) were
recorded that served as input for an artificial neural network (ANN). Two
raters analyzed each minute of the recorded videos independently based on
the methodology published by Wierwille and Ellsworth [52]. This rating scale
ranges from level 0 (alert) to level 4 (extremely drowsy) with steps of length
0.5. The average of both ratings was applied as ground truth for drowsiness.
The threshold between "not drowsy" and "drowsy" was set at 1.5 [204]. A
brain-machine interface was developed in the work of Li et al. [91], which
fuses EEG with head movements to detect drowsiness with a binary SVM
classifier. Six subjects participated in a one-hour monotonous drive in a
simulator. Based on 1-minute video samples, a ground truth for drowsiness was
created by two raters. The labels "alert" and "slightly drowsy" were assigned
every minute, based on Wierwille and Ellsworth [52]. In a hybrid approach,
proposed by Li et al. [205], heart rate data was fused with PERCLOS
(percentage of eye closure) [72] and applied as input for a binary SVM
classifier. Ten subjects participated in a 50-minute experiment in a simulated
environment. After the study was finished, five subjects were selected to rate
the videos minute by minute as "drowsy" or "not drowsy".
4.3.1.3 Hybrid Ground Truth
Wang et al. analyzed EEG features for drowsiness detection. For gathering a
ground truth for drowsiness, the subjects' self-assessments were combined with
observer ratings. After four 10-min monotonous simulated drives, 15 subjects
rated their current state of drowsiness by filling in the SSS. For observer
ratings, the facial expressions for the same periods were evaluated using a
three-level scale: clear-minded (0), tired (1), and fatigue (2). To establish a
connection between the seven-stage SSS and the three-stage video evaluation,
the following technique was applied to filter inconsistent ratings:
clear-minded corresponds to SSS levels 1 and 2, tired to levels 3 and 4, and
fatigue to levels 5 to 7. With an SVM classifier, a drowsiness detection model
was built [92]. In the work of McDonald et al., four kinds of measures were
applied for receiving a ground truth for drowsiness and creating the
evaluation data set for their proposed drowsiness detection algorithm. Two
observers evaluated a 1-minute time window before the occurrence of a
drowsiness-related lane departure. For the ratings, a 5-step scale ranging
from "not drowsy" to "extremely drowsy" and based on Wierwille and Ellsworth
[52] was applied. If ratings were greater than two, they were assigned to the
class "drowsy". To be classified as "awake" by a Random Forest model, three
more tests had to result in "awake": Psychomotor Vigilance Test (PVT) [43],
and self-assessments using the SSS and a retrospective sleepiness scale [83].
Lee et al. conducted a simulator study with six participants and combined data
from a PPG sensor with ECG. Labels were assigned by evaluating 1-minute
segments of videos of the driver's face and driving behavior. Each minute was
labeled as either "drowsy" or "awake". Their binary classification in the form
of recurrence plots resulted in an accuracy of 70% [114]. Twenty subjects
participated in a one-hour driving study in the work of Li et al. [206]. Data
from an EEG headset served as input for an SVM classifier. Labels for "alert"
and "drowsy" data were acquired with a combination of PERCLOS and the number
of adjustments (NOA) while steering [143]. Values of PERCLOS ≥ 12% and NOA ≤ 9
correspond to "true drowsy" and PERCLOS < 8% and NOA > 26 to "true alert". Lee
et al. evaluated steering wheel movements with accelerometer and gyroscope
data from a smartwatch on one wrist and combined it with physiological data
from a PPG sensor placed on a sports wristband on the other wrist. From the
data that was collected during a 3-hour simulator drive with 12 participants,
time, phase space, and spectral-domain features were calculated and classified
with a WFCM model. Self-ratings combined with observer ratings, both collected
every two minutes based on the KSS, served as ground truth. KSS levels 1-5
represent wakefulness, and 6-9 drowsiness [145].
From the presented related work, it can be seen that a wide variety of
approaches were applied for gathering a ground truth for driver drowsiness.
With regard to self-ratings, rating frequencies ranging from one to 15 minutes
can be found, applying scales such as KSS or SSS. For the rating itself,
different types of requests (e.g., sound, visual hint), feedback (e.g., verbally,
entering the rating on a tablet PC, pointing with a finger on a printed scale),
and scales with a different number of drowsiness levels were applied. Although
these kinds of ratings are often applied as ground truth for drowsiness, their
use is also discouraged. One reason is their subjectivity; another reason is
that, depending on the type and frequency of rating requests, they have an
alerting effect on the driver, which can negatively affect the development of
drowsiness. In contrast to self-ratings, observer ratings are a promising
alternative and are increasingly being applied as ground truth. Since external
observers collect them either in real-time or offline, they are not intrusive
and thus do not negatively affect the drivers and their drowsiness state during
driving. In general, they provide a more objective ground truth for
drowsiness. As for self-ratings, different scales and a different number of
drowsiness levels are used for assessing them. However, there is more
consensus in the frequency of the ratings in previous works, mostly one minute.
In this experiment, observer ratings will be analyzed in more detail in terms of
necessary complexity. Based on 1-minute observer ratings, correlation analysis
will be applied to examine if a reduction in rating frequency affects the
quality of the ratings. The development of drowsiness is a long-term process.
Therefore, more extended periods may be sufficient for a rating, giving the
same or even a better impression of the driver's current state. With a high
rating frequency, rating errors can occur more easily, e.g., due to a
misinterpretation of specific drowsiness indicators by the observer, which
might not happen when more extended periods for the rating are considered.
These incorrect ratings of the driver's drowsiness result in incorrect labels
for the training data. Another aspect being investigated is the influence of
the number of drowsiness levels to be predicted, i.e., whether a lower rating
frequency is sufficient for a lower number of drowsiness levels. Observers
often rate drowsiness at a high granularity. However, this label granularity
[207] is mostly reduced when it comes to creating labeled data sets for
training a machine learning classifier. The more granular drowsiness is to be
predicted, the more challenging it is for the classifier to achieve this with a
high level of accuracy. Therefore, for the detection of drowsiness, usually two
(e.g., not drowsy, drowsy) or three (e.g., not drowsy, transition state with
the onset of drowsiness, drowsy) levels are considered (see Figure 4.9).
Figure 4.9: Label granularity for drowsiness based on the drowsiness scale by
Weinbeer et al. [50] and reduction to two and three levels.
The following research questions (RQ) will be examined:
RQ2.7: Does the quality of the ground truth for drowsiness decrease
with decreasing rating frequency?
RQ2.8: If the number of drowsiness levels to be predicted is low, are
lower rating frequencies sufficient?
4.3.2 Method
For this experiment, the recorded videos of the driver's face/upper body from
both manual and SAE level 2 driving from study 1 were evaluated offline by
external raters. In the following, this procedure will be described in detail.
Observer ratings of drowsiness based on indicators in the driver's face were
gathered offline after finishing the study. For this purpose, in a
pre-processing step, all videos were sliced into 1-minute segments. Further,
the order of the extracted 1-minute segments was randomized per participant to
eliminate the time dependency of the single segments. Otherwise, video
segments at the end of a participant's drive would probably be rated higher
than those at the beginning. To increase the reliability of the results, two
trained individuals rated all videos separately. Following that, segments with
a difference of more than one level of drowsiness were evaluated by and
discussed with a third rater, and a final rating was set. To keep the quality
of the ratings high, several rating guidelines were set: The video evaluation
must take place between 8 am and 8 pm. Persons working on the night shift are
not allowed to rate. Each rater may rate for a maximum of one hour at a time
and is then obliged to take a break of at least one hour. Each rater may rate
for a maximum of four hours per day. If, at his or her own discretion, a rater
is no longer able to continue the rating in the usual quality, he or she
should take a break. Furthermore, training material was provided for the
raters to ensure a high quality of the video ratings. This aimed at
calibrating the raters among themselves and each individual rater with him- or
herself. The training proceeded as follows: As an introduction, the raters
were sensitized to the following points: Subjective drowsiness judgments are
not allowed. The assessments have to be based exclusively on the observed
indicators on the drowsiness scale. Outliers in the sense of extreme forms of
individual indicators should not be taken into account in forming the overall
rating. Instead, the average of an indicator over the 1-minute video sequence
has to be formed. For the training, a set of training videos was provided for
each level of drowsiness on the provided scale. The evaluation of these sample
video sequences had to be done individually by each rater during training.
Deviations in the ratings were then discussed with the aim of standardization.
Moreover, all other indicators were discussed regarding their occurrence.
Depending on the progress in the video evaluations, the training was repeated
at regular intervals. Again, the drowsiness scale published by Weinbeer et al.
was applied for the observer ratings [50].
Figure 4.10: Extracted sample image of a recorded video file from study 1.
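The per-participant randomization of the 1-minute segments, and the later
restoration of the chronological order of the ratings, could be sketched as
follows. This is an illustration under assumptions, not the tooling used in
the study: the function names, the fixed seed, and the fictitious ratings are
hypothetical.

```python
import random


def shuffled_rating_order(n_segments, seed):
    """Return the segment indices of one participant in randomized order."""
    order = list(range(n_segments))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    return order


def restore_chronology(order, ratings_in_presented_order):
    """Map ratings given in presentation order back to chronological order."""
    chronological = [None] * len(order)
    for position, segment_index in enumerate(order):
        chronological[segment_index] = ratings_in_presented_order[position]
    return chronological


# Hypothetical 45-minute drive with fictitious ratings on a six-level scale.
true_course = [(minute // 8) + 1 for minute in range(45)]
order = shuffled_rating_order(len(true_course), seed=1)
# The rater sees segment order[0] first, then order[1], and so on.
presented = [true_course[i] for i in order]
restored = restore_chronology(order, presented)
```

Keeping the shuffled index list per participant is sufficient to undo the
randomization once all segments have been rated.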
4.3.3 Results
Considering the 30 participants and the 45-minute SAE level 2 and manual
drives, a total of 60 videos, which corresponds to 2700 minutes or 45 hours,
would have been available for further analysis. However, for some participants,
problems with the video recording occurred, or the face was only partially or
not visible, e.g., due to an unusual seating position. These segments were
removed so that 2465 minutes (around 41 hours) were applicable for video
rating. Since the 1-minute video sequences were randomized for the rating,
the ratings obtained were first rearranged into chronological order. In the
following, no distinctions, comparisons, or separate evaluations were
conducted regarding driving mode (manual, partially automated) or age (young,
old). All available ratings were considered as a single data set. In the event
of a difference of two or more levels of drowsiness, the sequence had to be
discussed with a third rater and an agreement reached. This was not the case
for the existing ratings. Therefore, for further analysis, the 2465 ratings of
both raters were averaged.
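The consolidation rule for the two raters can be expressed compactly. The
sketch below is one possible reading of the procedure described above: ratings
that differ by at most one level are averaged, and larger differences are
flagged for discussion with a third rater. The function name and the use of
None as a flag are assumptions for illustration.

```python
def consolidate(rating_1, rating_2, max_difference=1):
    """Average two observer ratings, or return None if a third rater is needed."""
    if abs(rating_1 - rating_2) > max_difference:
        return None  # difference of two or more levels: discuss with third rater
    return (rating_1 + rating_2) / 2


# Fictitious ratings of one 1-minute segment by rater 1 and rater 2.
print(consolidate(2, 3))  # adjacent levels are averaged
print(consolidate(1, 3))  # flagged for the third rater
```

Averaging adjacent levels yields half-step values (e.g., 2.5), so the
consolidated ground truth is finer than the six-level scale used by the
individual raters.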
In Figure 4.11, the video ratings of all subjects of both manual and automated
driving are plotted separately (blue lines) and on average (black line), including
a trend line (dashed black line). The line charts for the individual subjects are
not intended to show a detailed course of their drowsiness state over time. They
should instead give an impression that a high inter-driver variance exists and
that some participants reached higher levels of drowsiness at an earlier
stage of the drive than others. On average, across all subjects and both driving
modes, a constant, almost linear increase with minor fluctuations in drowsiness
becomes apparent.
Figure 4.11: Video ratings for all participants from manual and automated driving
(blue lines) and averaged (black line) with trend line (black dashed
line).
Drowsiness level   Rater 1          Rater 2
1                  906 (36.75%)     869 (35.25%)
2                  1044 (42.35%)    1083 (43.94%)
3                  308 (12.49%)     305 (12.37%)
4                  98 (3.98%)       98 (3.98%)
5                  41 (1.66%)       44 (1.78%)
6                  68 (2.76%)       66 (2.68%)
Overall            2465 (100%)      2465 (100%)

Table 4.9: Distribution of the 2465 ratings of the two raters on the different
drowsiness levels in absolute and relative (in brackets) values.
A closer look at the ratings (see Table 4.9) shows that, apart from levels 1
and 2, both raters gave an almost identical number of ratings in the other
drowsiness levels. It becomes apparent that the majority of ratings cover levels
1 and 2 (79%), and the remaining 21% is spread over levels 3-6. From that,
it can be deduced that high drowsiness levels were reached only occasionally
by the participants. Among the 2465 ratings, the two raters gave a different
rating in 313 (13%) cases. The distribution of inconsistent ratings between
the two raters for adjacent drowsiness levels and their percentage of the
number of ratings in these two levels per rater is presented in Table 4.10.
Given the absolute values, a noticeably higher number of different ratings
in the lower levels of drowsiness can be determined. Between levels 1 and 2,
199 different ratings were given, and between levels 2 and 3, 82. In
comparison, only 24 different ratings exist between levels 3 and 4, and only
four each between levels 4 and 5 and between levels 5 and 6. In the first
place, this could be attributed to the generally higher number of ratings in
these levels. However, the differences are less pronounced when considering
the percentage of the number of ratings of a rater in the two respective
levels.
Drowsiness   Inconsistent   Percentage of the number   Percentage of the number
levels       ratings        of ratings of rater 1      of ratings of rater 2
1 / 2        199            10.21%                     10.19%
2 / 3        82             6.07%                      5.91%
3 / 4        24             5.91%                      5.96%
4 / 5        4              2.88%                      2.82%
5 / 6        4              3.67%                      3.64%

Table 4.10: Distribution of inconsistent ratings within the two raters for
adjacent drowsiness levels and their percentage of the number of ratings in
these two levels per rater.
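The percentages in Table 4.10 can be reproduced from the counts in Table 4.9.
The short calculation below assumes that each percentage relates the number of
inconsistent ratings of a level pair to the sum of a rater's ratings in those
two levels. This reading is an inference, chosen because it reproduces the
tabulated values; the variable and function names are illustrative.

```python
# Rating counts per drowsiness level (1 to 6), copied from Table 4.9.
rater_1 = [906, 1044, 308, 98, 41, 68]
rater_2 = [869, 1083, 305, 98, 44, 66]
# Inconsistent ratings per adjacent level pair (1/2, 2/3, 3/4, 4/5, 5/6).
inconsistent = [199, 82, 24, 4, 4]


def pair_percentages(counts, inconsistent):
    """Inconsistent ratings as a percentage of a rater's ratings in both levels."""
    result = []
    for i, n_inconsistent in enumerate(inconsistent):
        n_pair = counts[i] + counts[i + 1]  # ratings in the two adjacent levels
        result.append(round(100 * n_inconsistent / n_pair, 2))
    return result


pct_rater_1 = pair_percentages(rater_1, inconsistent)
pct_rater_2 = pair_percentages(rater_2, inconsistent)
# Share of all 2465 segments that received two different ratings.
share_inconsistent = round(100 * sum(inconsistent) / sum(rater_1), 1)
```

For the level pair 1 / 2 and rater 1, for example, this gives
199 / (906 + 1044) = 10.21%.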
For further analysis, correlations were calculated with Spearman's ρ. Spearman
was chosen based on the ordinal and discrete form of the observer rating scale
and the non-existent bivariate normal distribution in the data. A t-test was
applied to check the obtained correlations for significance (p < 0.05). To gain
a better understanding of the obtained video ratings in terms of a possible
reduction of complexity, two different cases were considered in the correlation
analysis. In this process, the minute-by-minute ratings served as a reference.
In the course of the evaluation, rating intervals of two, three, four, and five
minutes were examined in more detail. Figure 4.12 exemplifies both cases for a
rating interval of five minutes. It is assumed that the rating of every fifth
minute corresponds to the driver's drowsiness state over the entire 5-minute
interval, and thus the time since the previous rating. In the case of a
"Minute-by-minute" comparison, the remaining minutes in the interval are filled
with this rating. Then, correlations between the ratings of the single minutes
are calculated. In the "Average" case, the obtained 1-minute video ratings in
the considered interval are averaged. This value is correlated with the rating
of the fifth minute. The procedure was identical for the other considered
rating intervals (two, three, and four minutes). In addition, it was examined
how the correlations change with different numbers of drowsiness levels. Since
the drowsiness scale applied consists of six levels, all of them were
considered first. Given the related work presented, drowsiness was mostly
split into two or three levels. Therefore, the six levels were categorized
into three (12-3-456) and two (12-3456) levels.
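The reduction of the six-level scale to three (12-3-456) and two (12-3456)
levels amounts to a simple lookup. The sketch below applies it to integer
ratings; the numbering of the reduced classes and all names are assumptions,
and how half-step values of averaged ratings were assigned is not specified
here and is therefore left out.

```python
# Grouping 12-3-456: levels 1-2 -> class 1, level 3 -> class 2, levels 4-6 -> class 3.
THREE_LEVELS = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 3}
# Grouping 12-3456: levels 1-2 -> class 1, levels 3-6 -> class 2.
TWO_LEVELS = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}


def reduce_granularity(ratings, mapping):
    """Map six-level ratings to a coarser set of drowsiness classes."""
    return [mapping[rating] for rating in ratings]


fictitious = [1, 2, 2, 3, 4, 6, 5, 3]
print(reduce_granularity(fictitious, THREE_LEVELS))
print(reduce_granularity(fictitious, TWO_LEVELS))
```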
Figure 4.12: Calculation of correlations using minute-to-minute comparisons and
the average, shown by example for a rating interval of five minutes. The
numbers in the boxes represent fictitious ratings.
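The two evaluation cases can be sketched in a few lines. The code below is an
illustration under assumptions rather than the analysis script of this work:
it uses a plain-Python Spearman implementation (Pearson correlation of
tie-averaged ranks), drops incomplete intervals at the end of a drive, and
works on a single fictitious rating sequence. All function names are
hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if one variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def minute_by_minute(ratings, interval):
    """Pair each 1-minute rating with the rating of the last minute of its interval."""
    reference, filled = [], []
    for start in range(0, len(ratings) - interval + 1, interval):
        block = ratings[start:start + interval]
        reference.extend(block)
        filled.extend([block[-1]] * interval)
    return reference, filled


def average_case(ratings, interval):
    """Pair the mean of each interval with the rating of its last minute."""
    means, last = [], []
    for start in range(0, len(ratings) - interval + 1, interval):
        block = ratings[start:start + interval]
        means.append(sum(block) / interval)
        last.append(block[-1])
    return means, last


# Fictitious 1-minute ratings covering two 5-minute intervals.
ratings = [1, 1, 2, 2, 3, 2, 3, 3, 4, 4]
reference, filled = minute_by_minute(ratings, interval=5)
means, last = average_case(ratings, interval=5)
rho_minute_by_minute = spearman(reference, filled)
rho_average = spearman(means, last)
```

With real data, the sequence would contain many more intervals. Whether the
correlations are computed per participant or over the pooled ratings is left
open in this sketch.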
In Table 4.11, the results from correlation analysis with six drowsiness levels
are presented. In the "Minute-by-minute" evaluation, the correlations are
generally strong (≥ 0.77), and only minor differences in the results from
different time intervals exist. The maximum correlation coefficient with a
value of 0.81 was reached at intervals of two and five minutes. By utilizing
the average in calculating the correlations, stronger effects, except for
three minutes (0.71), are noticeable. The most noticeable difference between
the two cases (0.14) is at an interval of five minutes. All correlations
obtained are statistically significant.
Assuming three levels of drowsiness (see Table 4.12), the evaluations resulted
in strong effects (≥ 0.83). The differences between the "Minute-by-minute" and
"Average" evaluations are not as pronounced apart from a time interval of five
minutes. The strongest correlation with a value of 0.95 was achieved in the case
of "Average" at a 5-minute rating interval. Specifically, in this evaluation, the
variations in the correlations of the different time intervals are more extensive
than in the case of "Minute-by-minute". There, the obtained correlations are
at an almost constant level.
In the case of a binary split of drowsiness (see Table 4.13), the results of the
"Minute-by-minute" evaluation can generally be associated with strong
correlations, with a maximum value of 0.80 at a rating interval length of two
minutes. However, they decrease steadily with increasing rating interval size
and reach a minimum of 0.70 at an interval of five minutes. In the case of
"Average", less consistent results were achieved in the correlation analysis.
Again, the maximum value of 0.94 can be found at a rating interval of two
minutes. For a 5-minute rating interval, the analysis resulted in a ρ-value of
0.90, which is at a similar level. Further, when comparing rating intervals of
two and four minutes for both evaluation cases, more substantial differences in
the correlation coefficients become apparent.
Time interval (in minutes)   2           3           4           5
Minute by minute             ρ = 0.81    ρ = 0.77    ρ = 0.80    ρ = 0.81
                             p < 0.001   p < 0.001   p < 0.001   p < 0.001
Average                      ρ = 0.88    ρ = 0.71    ρ = 0.90    ρ = 0.95
                             p < 0.001   p = 0.003   p < 0.001   p < 0.001

Table 4.11: Results from correlation analysis with Spearman for different time
intervals and six levels of drowsiness. Correlations are calculated with
every single minute or the average of the 1-minute ratings in the
corresponding time interval. Significant results (p < 0.05) are printed in
boldface.
Time interval (in minutes)   2           3           4           5
Minute by minute             ρ = 0.88    ρ = 0.89    ρ = 0.84    ρ = 0.84
                             p < 0.001   p < 0.001   p < 0.001   p < 0.001
Average                      ρ = 0.86    ρ = 0.91    ρ = 0.83    ρ = 0.95
                             p < 0.001   p < 0.001   p = 0.002   p < 0.001

Table 4.12: Results from correlation analysis with Spearman for different time
intervals and three levels of drowsiness (12-3-456). Correlations are
calculated with every single minute or the average of the 1-minute ratings in
the corresponding time interval. Significant results (p < 0.05) are printed in
boldface.
Time interval (in minutes)   2           3           4           5
Minute by minute             ρ = 0.80    ρ = 0.78    ρ = 0.76    ρ = 0.70
                             p < 0.001   p < 0.001   p < 0.001   p < 0.001
Average                      ρ = 0.94    ρ = 0.86    ρ = 0.80    ρ = 0.90
                             p < 0.001   p < 0.001   p = 0.003   p = 0.001

Table 4.13: Results from correlation analysis with Spearman for different time
intervals and two levels of drowsiness (12-3456). Correlations are
calculated with every single minute or the average of the 1-minute ratings in
the corresponding time interval. Significant results (p < 0.05) are printed in
boldface.
In the following, the results are discussed, and the presented RQs are answered.
Moreover, implications for further research are derived.
RQ2.7: Does the quality of the ground truth for drowsiness decrease
with decreasing rating frequency? When looking at the results on a
drowsiness scale with six levels (see Table 4.11), the minute-by-minute
comparison resulted in ρ-values ≥ 0.77 with a maximum of 0.81 at rating
intervals of two and five minutes, which corresponds to a strong correlation.
In the case of the "Average" evaluation, this effect is more evident with a
maximum ρ-value of 0.95 at an interval length of five minutes. This is an
almost perfect correlation, which lies above the ρ-values of a 3-minute (0.71)
or 4-minute (0.90) rating interval. Thus, for the present case, a ground truth
of almost comparable quality can be determined if the rating interval is
increased, i.e., the rating frequency is decreased.
RQ2.8: If the number of drowsiness levels to be predicted is low,
are lower rating frequencies sufficient? By reducing the drowsiness levels
from six to three levels (see Table 4.12), the strongest effect (0.95) was again
found in the "Average" evaluation with a rating interval of five minutes. For
this reason, with a reduction to three levels, the quality of the ground truth is
still at an almost identical level with a rating interval of five minutes compared
to one minute. If drowsiness consists of two levels (see Table 4.13), the highest
correlation coefficient was reached at a rating interval of two minutes with
a ρ-value of 0.94. The correlation coefficients for rating intervals with a length
of three and four minutes are slightly lower, with values of 0.86 and 0.80. A
similar correlation coefficient of 0.90 can be found at a 5-minute time
interval. In summary, when the number of drowsiness levels is reduced, a lower
rating frequency has little influence on the quality of the ground truth
(RQ2.8). In particular, it should be noted that this also holds for larger
rating intervals, in the present case up to five minutes.
In general, it can be deduced that drowsiness is a rather slowly changing
state. Therefore, longer rating intervals are sufficient and might even give
a better impression of the driver's drowsiness state. A higher rating frequency
also increases the likelihood of assigning incorrect ratings, which leads to
incorrect labels in the training data. However, it should be noted that the
evaluations in this work are based on data from 45-minute drives in the driving
simulator and that the database mostly contains lower drowsiness levels. The
resulting few changes in the participants' drowsiness state could be one reason
for the high correlation coefficients. Therefore, longer drives,
e.g., over three or four hours and possibly with sleep-deprived participants,
resulting in a less steady course of drowsiness, should be considered for data
collection. This is intended to result in a database with a more evenly
distributed number of ratings across the levels and more frequent changes in the
driver's drowsiness state throughout the drive.
Based on the knowledge gained, deriving a standard procedure for determining
a ground truth for drowsiness, which can be applied directly for the
engineering of drowsiness detection systems, is a difficult undertaking. However,
suggestions can be derived with reference to the presented related work and the
results obtained in this work. In previous works, depending on the scale
used, drowsiness was mostly assessed in full granularity. For the subsequent
development of the drowsiness detection algorithm using supervised machine
learning and the potential usage in a real-world scenario, this granularity was
reduced, and drowsiness was mostly categorized into two or three levels.
Detecting drowsiness at full granularity presents the classifier with a major
challenge and will, in the majority of cases, not be required in a practical use
case. Rather, the reliable detection of the onset of drowsiness and of the drowsy
state itself is relevant. The correct classification of the non-drowsy state is, of
course, also of major concern to avoid unnecessary drowsiness warnings. The
question therefore arises whether the state of drowsiness needs to be
assessed at full granularity at all. When reducing this granularity, a wide variety of
terms were introduced for the levels obtained (alert/drowsy, awake/drowsy, not
drowsy/drowsy, alert/slightly drowsy, clear-minded/tired/fatigue). This work
shows that a ground truth of almost identical quality was obtained by reducing
drowsiness levels to two or three. The categorization of drowsiness into the
foreseen categories could therefore be carried out before data collection, and
generally applicable names for the levels could be established. For a division
of drowsiness into two levels, the terms "not drowsy" and "drowsy" and, in the
case of three levels, "not drowsy", "transition state", and "drowsy" could be
applied, for example.
As a limiting factor, it should further be noted that only rating intervals of up
to five minutes were examined. It should be investigated whether similar effects
can be determined for longer intervals. Further, the collection
of ground truth, the scale, and the division of the drowsiness levels used in
this work are application-dependent. Therefore, the videos should be
evaluated using other observer rating scales, and identical evaluations carried out.
By comparing scales that are frequently used for observer ratings, it might be
possible to derive generalized statements, standard procedures, or rules that
can be applied more universally.
4.3.5 Main Findings
By reducing the rating frequency, and thus the rating complexity, by a
factor of five, an almost comparable ground truth for drowsiness could be
obtained.
By additionally reducing the number of drowsiness levels from six to three and
two, strong correlations could still be determined.
The knowledge gained can be used in future studies in this research area,
in the collection and standardization of a reliable and valid ground truth of
drowsiness, and in improving the process of developing reliable drowsiness
detection systems.
5 Evaluation: Performance and Acceptance of a Driver Drowsiness Detection System based on Smart Wearables
The previous chapter contained different experiments in which the feasibility and
potential of using wrist-worn wearable devices inside the vehicle for driver
drowsiness detection were investigated. Promising results were achieved using
their physiological data as a single data source combined with supervised machine
learning to detect driver drowsiness.
In this chapter, based on the knowledge gained from these experiments,
a prototype for a driver drowsiness detection system based on a wrist-worn
smart wearable device is proposed and evaluated in a user study
in terms of detection performance. Further, even if the in-vehicle usage of
smart wearables for driver drowsiness detection is feasible and high detection
accuracies were obtained, the driver is to a certain extent forced to use a
smart wearable that is connected to the vehicle and continuously streams
vital and health data. Therefore, in addition to the performance evaluation,
this chapter investigates whether the in-vehicle usage of systems based on
wearable devices for safety-critical tasks, in the present case driver drowsiness
detection, is accepted and how this acceptance can be further enhanced.
In this chapter, RQ3 (Are driver drowsiness detection systems based
on smart wearables accepted and how to further enhance their acceptance
and thus integration in the vehicle?) is addressed. The following
hypotheses (H) were derived for this experiment:
H1: A high detection accuracy can be obtained by using the proposed
drowsiness detection system.
H2.1: There is a high user experience after first-time use of the proposed
drowsiness detection system.
H2.2: There is a significant difference between younger (20-25 years) and
older (65-70 years) people regarding the user experience of the proposed
drowsiness detection system.
H3.1: There is a high technology acceptance after first-time use of the
proposed drowsiness detection system.
H3.2: There is a significant difference between younger (20-25 years)
and older (65-70 years) people regarding the technology acceptance of
the proposed drowsiness detection system.
In the following, the concept and implementation of the drowsiness detection
system are presented. After explaining the user study setting, its results are
described and discussed, and the core findings are summarized.
5.1 Concept
The concept of the driver drowsiness detection system is shown in Figure 5.1. A
smart wearable is worn on the driver's wrist and continuously streams real-time
physiological data to an application on a mobile device. In the backend of the
application, the received data are processed: relevant features are extracted
and fed to an already trained machine learning classifier. The classifier's
output, i.e., the predicted drowsiness level, is then presented on the frontend of
the application in the user interface. Furthermore, a user manual is included in
the application, which supports the user in attaching the device to the
wrist/forearm and establishing the connection with it.
This concept aimed to develop a portable and handy system for driver drowsiness
detection based on a wearable device. Since the system only consists of
a wearable and a tablet PC at the current development stage, it can be
integrated into any vehicle. The long-term vision is to replace the tablet with the
human-machine interface (HMI) integrated into the vehicle.
Then, only the smart wearable is required in terms of hardware and
external/additional sensors. The current trend in the automotive industry shows
that well-known manufacturers will replace their own operating systems with
ones used initially on mobile devices, e.g., with Android Automotive [208].
Figure 5.1: Concept of driver drowsiness detection system.
5.2 Implementation
Before the implementation of the drowsiness detection system is described in
detail, a short overview is given of how related work realized drowsiness
detection systems based on mobile applications in
combination with wearable and other sensors.
5.2.1 Related Work
In the work of Lin et al., an Android application was developed that receives and
processes EEG signals from a wireless and wearable headset. A support vector
regression (SVR) model embedded in the app was used to detect the drowsiness
state. On the graphical user interface (GUI), the level of drowsiness and the
different EEG channels were displayed and updated every two seconds [209]. Li
and Chung proposed a drowsiness detection system based on EEG signals and
gyroscope data. The EEG signals were streamed from a wearable EEG headset,
and an SVM model was applied for determining the driver's current drowsiness
state. In the app (see Figure 5.2(a)), the raw EEG data and features, 3-axis
gyroscope data and features, as well as the driving status were displayed [91]. Li
and Chung proposed another Android-based driver drowsiness detection system
using HRV and PERCLOS as input data and an SVM model for classification.
The app displays the PERCLOS measure, raw PPG data, and the detected
drowsiness state (see Figure 5.2(b)). Further, if the driver is classified as
drowsy, nearby coffee shops are suggested [210]. In the work of Jabbar et
al., an Android-based system is presented in which facial images from the mobile
device's camera were evaluated by a trained deep learning algorithm that was
embedded in an application (see architecture in Figure 5.2(d)). If the driver is
drowsy, the application signals the driver with auditory and visual notifications [64]. In
the work of Misbhauddin et al., a wearable-based drowsiness detection system
(a) Li and Chung [91] (b) Li and Chung [210]
(c) Li and Chung [210] (d) Jabbar et al. [64]
Figure 5.2: Selected implementations of mobile applications for driver drowsiness
detection from related work.
consisting of a wristband and an Android application was proposed. For
real-time identification of drowsiness, HRV and GSR data from the wristband were
streamed to and processed in the mobile application. For training, the system
has to be used when not driving to collect feedback from the users through the
mobile application four times a day regarding their current drowsiness state.
During driving, if both values of GSR and HRV are below a certain threshold,
a warning is issued [102].
It became apparent that mobile applications offer a promising platform for
the implementation of drowsiness detection systems. However, the focus was
mainly on studying the feasibility of implementing such a system and not on
evaluating it from the users' point of view, i.e., whether they would use it at all
and what their user experience is while using it. Therefore, a portable system
for driver drowsiness detection based on a wearable device combined with a
mobile application is proposed for investigating these issues. Real-time
physiological data of a wrist-worn wearable device are evaluated by a trained
machine learning classifier embedded in the app. In the course of a user study,
the aim is to examine with the proposed prototype how systems based on wearables
can be operated inside a vehicle and how user experience and acceptance can
be further enhanced.
In the following sections, the implementation of the drowsiness detection system
is described in detail.
5.2.2 Wrist-worn Smart Wearable Device
The Polar OH1+ optical sensor provides real-time physiological data (see right
part of Figure 5.6). This commercial smart wearable device can be easily worn
on the wrist or forearm with a textile strap. It offers real-time streaming of
heart rate data via BLE or Adaptive Network Topology (ANT+) for more than
12 hours without charging [211]. In the work of Hettiarachchi, the heart rate
data of the Polar strap were compared with those of a medical-grade ECG
measurement device, resulting in very high correlations (99%) [212].
In terms of real-time behavior, five test runs for evaluating the battery
consumption of the Polar OH1+ device were carried out before the study, in which
the heart rate data was continuously streamed from the wristband to the tablet
via BLE for six hours. The consumption curves of the individual test runs (blue
scattered lines) are almost identical (see Figure 5.3). On average (blue line), the
battery consumption over the entire test duration is almost linear and reaches
around 50% after six hours. Particularly with regard to longer car journeys,
the proposed drowsiness detection system could thus be used without having to
recharge the wearable device after a short time.
Figure 5.3: Average battery consumption (blue line) of Polar OH1+ for five test
runs (blue scattered lines) during real-time data transfer to app over a
duration of 6 hours.
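As a rough plausibility check of these figures, the following Python sketch extrapolates the total runtime. It assumes that the discharge stays linear beyond the six measured hours and that each test run started at 100% charge; both assumptions and the function name are illustrative and not taken from the test runs themselves.

```python
def estimated_runtime_h(start_pct, end_pct, elapsed_h):
    """Extrapolate the total runtime in hours from a linear discharge (assumption)."""
    if elapsed_h <= 0 or end_pct >= start_pct:
        raise ValueError("need a positive duration and a falling charge level")
    # Average consumption in percentage points per hour, e.g. 50 / 6
    rate_pct_per_h = (start_pct - end_pct) / elapsed_h
    # Hours until a full charge would be used up at this rate
    return start_pct / rate_pct_per_h


# Values from the text: around 50% consumed after six hours of streaming
print(round(estimated_runtime_h(100, 50, 6), 1))
```

Under these assumptions, the estimate is about 12 hours of continuous streaming, which is of the same order as the manufacturer's figure [211].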
5.2.3 Application on Mobile Device
The heart rate data was transmitted in real-time via BLE to a Google Pixel C
Tablet with an Android operating system (8.0.1 Oreo) that served as a
platform for implementing the drowsiness detection application. This Android
application was implemented in Java with Android Studio (Version 3.5) with
software development kit (SDK) tools version 26.1.1 on an Android platform
with application programming interface (API) 29 and Android OS 10.0.
5.2.3.1 Backend
In the backend of the application, the Polar BLE SDK was integrated to handle
the connection to and data transfer from the wearable [213]. Next, the received
heart rate data is pre-processed. Unexpected and longer gaps between
two successive data points can occur due to motion artifacts, e.g., caused by
strong hand movements of the driver [212]. Hence, a data pre-processing
algorithm is implemented to ensure a sufficient amount of data and prepare it for
feature extraction. As in the experiments in the previous chapter,
features are extracted with a sliding window of five minutes and a 2-second
increment. Therefore, at least five minutes of data are needed for calculating the
first set of features. From that time, features are calculated every two seconds.
The flow chart of the data pre-processing algorithm is presented in Figure 5.4.
After receiving new data, the difference (diff_1) between the timestamp of the
last received heart rate value and the first value of the recording is
calculated. If diff_1 is less than the length of the sliding window, i.e., five minutes,
the system continues to wait for new input. The next step is to check whether
the system is still in the initialization state (init_state), i.e., whether features
are being calculated for the first time after starting the application. If this is
the case, the sliding window's content is checked to determine whether a
sufficient amount of data is available and the maximum permitted time difference
between the inputs has not been exceeded. In the present case, this maximum permitted
time difference was set to 330 seconds, i.e., within five minutes (300 seconds), a
maximum of 30 seconds of missing data is allowed. If it is exceeded, the system
continues to wait for new input. If not, the data is transferred to the feature
extraction algorithm. After the features have been calculated for the first time,
the system is no longer in init_state. From this point in time, the difference
(diff_2) between the timestamp of the last received input and the last timestamp
of the 5-minute window previously forwarded to the feature extraction algorithm is
calculated. If diff_2 is greater than or equal to the defined increment of the
sliding window, i.e., two seconds, it is checked whether sufficient data is available.
If this is not the case, the system continues to wait for new input; otherwise,
the data is forwarded to the feature extraction algorithm. In total, six features
from the time domain are required as an input for the machine learning model,
as proposed in Section 4.2: mean, standard deviation, maximum, minimum,
range, and median of heart rate. As a next step, the obtained feature vector
is fed to a Random Forest (100 trees) [214] machine learning model for binary
classification of drowsiness (non-drowsy vs. drowsy). This model was
selected since it provided overall the most promising results (>95% accuracy in
10-fold CV) in the experiments of Sections 4.1 and 4.2. The training of the
model was conducted in the WEKA Machine Learning GUI [189] with data
collected with the Polar A370 device in the SAE level 2 drive of study 1 (see
Section 3.2). This data set was chosen for two reasons: first, Polar also
manufactured the device used in the prototype; second, an SAE level 2 drive is
carried out in a driving simulator to evaluate the prototype. In terms of ground
truth for drowsiness, the same approach as in Section 4.1 was applied, i.e., a
combination of observer ratings and detected micro-sleep events provided the
labels for the extracted feature vectors in the training data set. To use the
trained machine learning model in the app, the WEKA Machine Learning workbench
was integrated as an external Java library (weka.jar). With this library, classifiers
trained in and exported from the WEKA GUI can easily be integrated and
executed in Java code.
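The feature extraction step can be illustrated with a short Python sketch (the app itself was implemented in Java). It computes the six time-domain features named above from one window of heart rate values. The function name and the use of the sample standard deviation are assumptions made for illustration; the classification with the trained Random Forest is not shown.

```python
import statistics


def extract_features(hr_window):
    """Return the six time-domain features for one window of heart rate values (bpm)."""
    if len(hr_window) < 2:
        raise ValueError("at least two heart rate values are needed")
    hr_min, hr_max = min(hr_window), max(hr_window)
    return {
        "mean": statistics.fmean(hr_window),
        "std": statistics.stdev(hr_window),  # sample standard deviation (assumption)
        "max": hr_max,
        "min": hr_min,
        "range": hr_max - hr_min,
        "median": statistics.median(hr_window),
    }


# Toy window with four values; a real window would cover five minutes of data
features = extract_features([60, 62, 64, 66])
print(features)
```

In a setup like the one described, such a feature vector would be recomputed every two seconds once the first five-minute window is available.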
Figure 5.4: Flow chart of data pre-processing algorithm.
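The decision logic of the pre-processing algorithm can be sketched as follows, in Python for brevity rather than in the Java of the app. This is one possible reading of the description, not the implementation used in the prototype: it assumes one heart rate value per second and interprets the data check as "at most 30 seconds of values missing within the five-minute window". The class name `WindowGate` and all constants are hypothetical.

```python
from collections import deque

WINDOW_S = 300       # sliding-window length (five minutes)
INCREMENT_S = 2      # increment of the sliding window
MAX_MISSING_S = 30   # tolerated amount of missing data per window
SAMPLE_RATE_HZ = 1   # ASSUMPTION: one heart rate value per second


class WindowGate:
    """Decides when a five-minute window is forwarded to feature extraction."""

    def __init__(self):
        self.samples = deque()         # (timestamp_s, heart_rate) pairs
        self.first_ts = None           # timestamp of the first received value
        self.init_state = True         # True until the first window is forwarded
        self.last_forwarded_ts = None  # last timestamp of the previous window

    def _enough_data(self):
        expected = WINDOW_S * SAMPLE_RATE_HZ
        return len(self.samples) >= expected - MAX_MISSING_S * SAMPLE_RATE_HZ

    def on_sample(self, ts, hr):
        """Store a new value; return the window's heart rates or None (keep waiting)."""
        if self.first_ts is None:
            self.first_ts = ts
        self.samples.append((ts, hr))
        # Keep only values that lie inside the current five-minute window
        while self.samples[0][0] <= ts - WINDOW_S:
            self.samples.popleft()
        diff_1 = ts - self.first_ts
        if diff_1 < WINDOW_S:
            return None                # less than five minutes recorded so far
        if not self.init_state:
            diff_2 = ts - self.last_forwarded_ts
            if diff_2 < INCREMENT_S:
                return None            # increment not yet reached
        if not self._enough_data():
            return None                # too much data missing in the window
        self.init_state = False
        self.last_forwarded_ts = ts
        return [value for _, value in self.samples]


gate = WindowGate()
forwarded = [t for t in range(306) if gate.on_sample(t, 60) is not None]
print(forwarded)
```

With gap-free input at one value per second, the first window is forwarded once five minutes of data are available and then again every two seconds; a window with more than 30 missing values is held back until enough data has arrived.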
5.2.3.2 Frontend
A total of six different screens were implemented on the frontend, which serves
as an interface for the interaction with the user (see Figure 5.5). After starting
the app, a welcome screen is displayed (see Figure 5.5(a)). Initially, the app
checks internally that Bluetooth on the tablet is switched on. If this is not
the case, the user receives a notification to switch it on. Further, the option
is provided to choose a different language via the globe symbol in the upper
right corner. Currently, English, German, and Kannada are offered, but the app can be
further extended to other languages. As part of the study, the participants had
to enter their ID.
Figure 5.5: Screenshots of developed Android application: (a) Welcome screen; (b)
Screen with instructions for smart wearable usage; (c) Screen while
being classified as non-drowsy; (d) Screen while being classified as
drowsy; (e) Screen after finishing drive; (f) Screen with KSS scale for
drowsiness self-ratings.
After pressing the Continue button, a user manual follows with pictures of
the wearable showing how to put it on the wrist or forearm (see Figure 5.5(b)),
switch it on, and connect it with the application on the tablet. Once the smart
wearable is connected, the user is able to press the Continue button and switch to the
next screen, where the current level of drowsiness is presented. This level is
also stored with a timestamp on the tablet. Figures 5.5(c) and 5.5(d) show
the screens for being classified as non-drowsy or drowsy, respectively. When the
drive is finished, screen 5.5(e) appears, showing the current battery status and
indicating that the wearable should be recharged before the next drive. However,
within the scope of the user study carried out in this work, the screens 5.5(c)
and 5.5(d), i.e., the drowsiness state predicted by the machine learning model,
were not presented to the participants while driving. These two screens were shown
to the participants after the drive in the course of evaluating user experience
and technology acceptance of the whole system. Instead, during the simulator
drive, the subjects had to assess their drowsiness via self-ratings
every five minutes. For this purpose, and in the same way as in user
studies 1 and 2, the KSS was displayed to the participants (see Figure 5.5(f)).
The intention was to compare the output of the machine learning model
with the subjects' self-ratings after the study in order to evaluate the
performance of the machine learning model.
5.3 User Study
For the investigation of the presented hypotheses, a user study in a driving
simulator was carried out.
5.3.1 Simulator Setup and Driving Simulation
As in study 1, the study was carried out in the high-fidelity hexapod driving
simulator at Technische Hochschule Ingolstadt (THI). Except for two
differences, the same study setup and driving simulation were applied (see
description in Section 3.2). First, driving in the simulation was only partially
automated, i.e., SAE level 2 [20], since the differences in the development of
drowsiness between manual and partially automated driving had already been examined.
Second, to potentially obtain higher levels of drowsiness, the driving duration
was extended from 45 to 60 minutes.
5.3.2 Participants
The participants were selected from two age groups since the influence of age on
user experience and technology acceptance was to be examined. To be consistent
with the first two studies, the same age groups were chosen. Therefore, 15
participants between 20-25 years and 15 between 65-70 years were selected
for the study. The participants received €25 in cash for participating in the
study and had to meet the same conditions for participation as in studies 1 and
2: no sleep disorder, subjectively rated good health, valid driving license, no
limitations in their driving ability, and no consumption of caffeinated drinks
in the five hours before the study.
5.3.3 Data Collection
Various data were recorded in the course of the study that will be described in
the following sections.
5.3.3.1 Pre-Questionnaire
Identical to study 2, the pre-questionnaire contained some basic demographic
questions and queried details about the participants' sleeping behavior and
health (see Table 3.1 in Section 2.2.1). Further, the items of the Epworth
Sleepiness Scale (ESS) [40] had to be answered for assessing the participants'
daytime sleepiness (see Table 3.2 in Section 2.2.1).
Since the performance of the driver state detection system is evaluated,
i.e., how capable the integrated machine learning model is of detecting the
non-drowsy and drowsy state of the driver, a ground truth for drowsiness is
needed. In this work, the drowsiness state of the participants was determined
in two different ways: self-ratings and observer ratings.
Self-ratings
In the same way as in studies 1 and 2, the subjects had to assess their
drowsiness via self-ratings during the drive using the KSS. The screen containing the
KSS scale was shown to the participants every five minutes on the tablet in
the vehicle's center console with the same modality as in the previous studies
(see left part of Figure 5.6). The subjects also had to indicate the development
of their drowsiness after the trip using drowsiness curves, based on the UX
curve method [163], as in studies 1 and 2. This type of self-assessment is
used to compare how the participants assess themselves directly after the drive
compared to during the drive. Moreover, backup ratings were thus available in
the event of problems with the self-ratings on the tablet.
Observer ratings
In addition to the subjects' drowsiness self-assessments, their state was assessed
by external raters. With a camera installed in the driving simulator (see left
part of Figure 5.6), the subjects could be observed on a screen outside the
simulator. The observer ratings were collected in real-time during the study
every minute. In the work of Sandberg et al., it was found that for the detection
of reasonable signs of drowsiness, most indicators can be observed for intervals
of 60 seconds or longer [190]. To increase the reliability of the ratings, two
trained individuals worked together and set a rating for every minute after a
short discussion. As in the previous studies and experiments, the drowsiness
scale by Weinbeer et al. was applied for the observer ratings [50].
Figure 5.6: Left: Simulator setup with camera for video ratings (1) and tablet PC
with Android application in center console (2); Right: Wearable on
participant's wrist: Polar OH1+ (3).
5.3.3.3 Post-Questionnaire
Subjects had to fill in a post-questionnaire after the ride in the simulator, which
consisted of three parts.
Technology Acceptance Model (TAM)

For evaluating the acceptance of the proposed drowsiness detection system,
the TAM was applied [215], as it represents a standardized and frequently
used questionnaire. The perceived usefulness (PU) and perceived ease of use (PEOU)
influence the attitude (ATT), which affects the intention (INT) of using a system
(see Figure 5.7). With regard to these categories, items of the TAM (see Table
5.1) were queried on a 7-point semantic differential scale from 1 (negative) to
7 (positive).
Figure 5.7: Technology Acceptance Model (TAM), adapted from [216].
PU1: Using the system would be useful for me.

PU2: The system gives me a feeling of control over my activities.
PU3: The system would improve my performance.
PEOU1: Finding the information I need is easy in this system.
PEOU2: Learning to use the system is easy.
PEOU3: The system is easy to use.
ATT1: I like the idea of this system.
ATT2: Using the system is an enjoyable experience.
ATT3: The system makes sense.
INT: I would want to use the system.
Table 5.1: Items of the Technology Acceptance Model (TAM) [215].
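One common way to summarize such items is a mean score per TAM category. The following Python sketch shows this under the assumption that the items of a category are simply averaged; the aggregation rule, the function name, and the example responses are illustrative and not taken from the study.

```python
from statistics import fmean

# Item identifiers as listed in Table 5.1, grouped by TAM category
TAM_CONSTRUCTS = {
    "PU": ["PU1", "PU2", "PU3"],
    "PEOU": ["PEOU1", "PEOU2", "PEOU3"],
    "ATT": ["ATT1", "ATT2", "ATT3"],
    "INT": ["INT"],
}


def tam_scores(responses):
    """Average the 7-point responses (1 = negative, 7 = positive) per category."""
    for item, value in responses.items():
        if not 1 <= value <= 7:
            raise ValueError(f"{item}: response {value} is outside the 1-7 scale")
    return {
        construct: fmean([responses[item] for item in items])
        for construct, items in TAM_CONSTRUCTS.items()
    }


# Made-up responses of a single hypothetical participant
example = {
    "PU1": 6, "PU2": 5, "PU3": 4,
    "PEOU1": 7, "PEOU2": 7, "PEOU3": 6,
    "ATT1": 6, "ATT2": 5, "ATT3": 7,
    "INT": 6,
}
print(tam_scores(example))
```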
User Experience Questionnaire (UEQ)

The user experience of the drowsiness detection system was evaluated using
the UEQ [217] on a 7-point semantic differential scale from 1 (negative) to 7
(positive). This questionnaire consists of six scales (Attractiveness, Perspicuity,
Efficiency, Dependability, Stimulation, Novelty) with 26 pairs of contrasting
attributes (items) (see Table 5.2).
annoying | enjoyable                               unlikable | pleasing
not understandable | understandable                usual | leading edge
creative | dull                                    unpleasant | pleasant
easy to learn | difficult to learn                 secure | not secure
motivating | demotivating                          valuable | inferior
meets expectations | does not meet expectations    boring | exciting
not interesting | interesting                      inefficient | efficient
unpredictable | predictable                        clear | confusing
fast | slow                                        impractical | practical
inventive | conventional                           organized | cluttered
obstructive | supportive                           attractive | unattractive
good | bad                                         friendly | unfriendly
complicated | easy                                 conservative | innovative
Table 5.2: Items of the User Experience Questionnaire (UEQ) [217].
Whereas Attractiveness represents a valence dimension, Perspicuity, Efficiency,
and Dependability aim at the Pragmatic Quality (goal-directed) and Novelty
and Stimulation at the Hedonic Quality (not goal-directed) of a product. The
Attractiveness scale contains six items; each of the other five scales contains four.
Further Questions
Apart from TAM and UEQ, the post-questionnaire contained the following
questions (see Table 5.3):
What would be the appropriate KSS level to receive a first warning?
How confident did you feel when rating your own drowsiness?
Do you own a wearable (smartwatch, fitness tracker)?
Did you already have any experience with smartwatches or fitness trackers?
Are you going to buy a wearable in the near future?
Would you wear a wearable to ensure safety during a drive?
What position on the body would you prefer to wear the wearable?
Table 5.3: Further questions in post-questionnaire.
Finally, to receive qualitative feedback, the participants were also asked
individually in unstructured interviews about their overall experience while
using the system for the first time and about possible suggestions for improving
the proposed drowsiness detection system. In addition to the interviewer and
the participant, another person was present to take notes on the answers.
5.3.4 Study Procedure
An overview of the study procedure, which took a maximum of two hours for each
subject, is presented in Figure 5.8. The experiments were carried out at 9:00
a.m., 1:30 p.m., and 5:00 p.m. The same number of participants from each age
group was invited for the different points in time. Upon arrival, the subject
was given background information about the study, and the pre-questionnaire
was filled in before entering the simulator. The subjects were then shown the
location of the Polar OH1+ in the vehicle. In the next step, they were asked to
start the developed app on the tablet in the center console of the car and follow
the instructions. They were observed by the experimenter, who documented
any problems while dealing with the app and wearable. The experimenter also
assisted them and ensured that all necessary steps and settings were carried out
correctly before the drive in the simulator was started. A 5-minute partially
automated test drive was initially carried out to accustom the participants to
the rating request and to the simulator situation. Especially for most older
participants, participating in a simulator study
was a completely new experience and could, therefore, have led to an increase
in heart rate due to nervousness and excitement. This could have distorted
the heart rate data at the beginning of the central part of the experiment, since a
subsequent decrease in heart rate could then have been attributable to getting used
to the simulator situation. The central part consisted of a 60-minute partially
automated drive. A total distance of around 100 km was covered. During
the drive, the subjects had to assess their drowsiness every five minutes via
self-ratings on the tablet. Outside the simulator, two external raters gave an
observer rating every minute. During the entire drive, the participants were
not allowed to use a mobile phone, drink, eat, or chew gum. Moreover, they were
asked to talk to the experimenter only in an emergency and not to perform any other
activities. After finishing the drive, subjects were asked to draw drowsiness curves
and to fill in the post-questionnaire.
Figure 5.8: Study procedure.
5.4 Results
In the following, results for the pre-questionnaire, performance evaluation of the
machine learning model, TAM, UEQ, and post-questionnaire are presented.
5.4.1 Pre-Questionnaire
In summary, 15 subjects, eight women and seven men (age: M=22.50 years,
SD=1.88 years), between 20-25 years and 15 subjects, eight women and seven
men (age: M=66.60 years, SD=1.84 years), between 65-70 years were selected
for the study. The majority of the younger subjects were students of Tech-
nische Hochschule Ingolstadt (THI). The older ones were recruited through a
newspaper announcement. Further, participants had to answer whether they
already experienced a micro-sleep while driving, which was answered yes by
nine younger and six older subjects. Furthermore, it was found that none of
the younger but nine older participants currently undergo medical treatment.
To keep the setting realistic, and since not only the detection of the drowsy state
but also of the non-drowsy state is of high importance, the participants were not
sleep-deprived. Younger participants slept on average 6.87 and the older ones
6.40 hours before the study, in both age groups the majority with medium
quality. In general, the younger participants sleep on average 7.07 hours and the
older participants 6.73 hours per night. A summary of the results from the
pre-questionnaire can be found in Table 5.4.

                          Young            Old              Overall
female                    8                8                16
male                      7                7                14
age                       22.50 (±1.88)    66.60 (±1.84)    44.57 (±22.48)

Micro-sleep
yes                       9                6                15
no                        6                9                15

Medical treatment
yes                       0                9                9
no                        15               6                21

Sleep duration (hours)
before study              6.87 (±1.19)     6.40 (±1.18)     6.63 (±1.19)
in general                7.07 (±1.18)     6.73 (±0.88)     6.90 (±0.96)

Sleep quality
very good                 1                0                1
good                      5                6                11
medium                    8                8                16
bad                       1                1                2
Additionally, for determining their daytime sleepiness, the items of the Epworth
Sleepiness Scale (ESS) were queried [40] and again divided into five groups [181],
as shown in Table 5.5. It can be seen that the older subjects generally have
lower daytime sleepiness since all 15 subjects fall into the two lowest
categories. The younger subjects cover all five categories.
Daytime Sleepiness    Young    Old    All
0 (0-5 points)        3        6      9
1 (6-10 points)       7        9      16
2 (11-12 points)      1        0      1
3 (13-15 points)      3        0      3
4 (16-24 points)      1        0      1
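The grouping of ESS sum scores into the five daytime sleepiness categories of the table can be written as a small lookup. The following Python sketch uses the point ranges from the table; the function name and the error handling are illustrative assumptions.

```python
# Point ranges of the five daytime sleepiness groups (inclusive bounds)
ESS_GROUPS = [(0, 5), (6, 10), (11, 12), (13, 15), (16, 24)]


def ess_group(score):
    """Map an ESS sum score (0-24) to its daytime sleepiness group (0-4)."""
    for group, (low, high) in enumerate(ESS_GROUPS):
        if low <= score <= high:
            return group
    raise ValueError(f"ESS score {score} is outside the range 0-24")


print([ess_group(s) for s in (4, 8, 12, 14, 20)])
```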
5.4.2 Performance of Machine Learning Model
In order to validate the performance of the machine learning model, its output
was compared with the obtained self- and observer ratings. In the following,
the self-ratings given during the drive are used since the ones determined via
the drowsiness curves are at an almost identical level (see Figure 5.9). The
difference averaged 0.24 KSS levels over the entire duration of the 60-minute
drive.
Figure 5.9: Average KSS ratings with 95% CI of all participants during and after
the drive.
A total of 360 KSS and 1800 Weinbeer ratings are available for all 30 subjects.
Figures 5.10(a) and 5.10(b) show, for both types of ratings, the average ratings
over the 60-minute drive separately for younger and older subjects. The younger
participants reached higher levels of drowsiness on average than the older
ones, which is especially visible in the course of the self-ratings. Whereas
the increase in drowsiness in the first 20 minutes of the trip is relatively
strong among the younger participants and then tends to remain at an almost
constant level, a relatively moderate but constant increase over the entire
duration was recorded for the older participants. The differences between the
two age groups are evident in the range of 20-35 minutes. With regard to the
observer ratings, the differences are not that substantial. Towards the end of
the drive, the two curves even converge and are at an almost identical level.
Figures 5.10(c) and 5.10(d) show the distribution of the number of ratings
across the levels of the different scales. For both scales, more ratings fall
into the lower drowsiness levels. Level 3 on the KSS (103 ratings) and level 2
on the Weinbeer scale (770 ratings) were frequently selected. Since the trained
machine learning model classifies drowsiness as binary (non-drowsy, drowsy),
but the KSS consists of nine and the Weinbeer scale of six levels, the scales
had to be adjusted first. Based on the work of Ingre et al. [79], the levels of
the KSS were grouped as follows: levels 1-6 (non-drowsy), levels 7-9 (drowsy).
The observer rating scale was grouped in the same way as in Section 4.1: levels
1-3 (non-drowsy), levels 4-6 (drowsy). The class imbalance becomes particularly
clear after grouping both scales into the two levels. For the grouped KSS
ratings, 286 ratings are available for the non-drowsy and 74 for the drowsy
class (see Figure 5.10(e)); for the grouped Weinbeer ratings, 1538 ratings are
available for non-drowsy and 262 for drowsy (see Figure 5.10(f)).
Figure 5.10: Evaluation of KSS and Weinbeer ratings: (a) Average KSS ratings for
younger and older participants with 95% CI; (b) Average Weinbeer
ratings for younger and older participants with 95% CI; (c) Absolute
number of KSS ratings given per level; (d) Absolute number of Weinbeer
ratings given per level; (e) Absolute number of ratings for grouped
KSS levels; (f) Absolute number of ratings for grouped Weinbeer levels.
Furthermore, as in studies 1 and 2, these trends were statistically evaluated
using IBM SPSS v.25, separately for KSS and Weinbeer ratings. Since Weinbeer
ratings are available for 1-minute intervals, but KSS ratings for 5-minute
intervals, the mean of the Weinbeer ratings over the corresponding 5-minute
intervals was considered for evaluation.
KSS ratings: A Linear Mixed Model (LMM) for repeated measures with two
within-subject factors (measuring time points, moment of rating (during/after
drive)) and one between-subject factor (age group (younger/older participants))
was applied to evaluate the effects of driving time, moment of rating, and age
on drowsiness. Several different covariance structures were analyzed (Diagonal,
Compound Symmetry CS, First-Order Autoregressive AR(1), heterogeneous AR(1)).
Based on the Bayesian Information Criterion BIC (2329.66) and Akaike
Information Criterion AIC (2216.91), the model fit was best using a first-order
autoregressive (AR(1)) covariance structure with heterogeneous variances. The
test of fixed effects resulted in a significant effect of driving time
(F(11,216.09)=5.63; p<.001). Thus, a significant difference exists in the KSS
ratings over the 12 measuring time points. Furthermore, a significant effect of
age group (F(1,45.08)=11.63; p=.001) was found, i.e., a significant difference
exists between the KSS ratings of younger and older participants over the 12
measuring time points, with higher drowsiness levels for the younger
participants. Regarding the moment of rating for both age groups over the
60-minute drive, no significant differences could be identified between the KSS
ratings given during and after the drive (F(1,250.59)=.47; n.s.). In terms of
interaction effects, no significant effects were found for age group and
driving time (F(11,216.09)=.79; n.s.), age group and moment of rating
(F(1,250.59)=1.229; n.s.), driving time and moment of rating (F(11,90.75)=1.77;
n.s.), or age group, driving time, and moment of rating (F(11,90.75)=1.33;
n.s.).
Weinbeer ratings: Since Weinbeer ratings, in contrast to KSS ratings, were only
collected during the drive, an LMM for repeated measures with only one
within-subject factor (measuring time points) and one between-subject factor
(age group (younger/older participants)) was applied to evaluate the effects of
driving time and age on the development of drowsiness. Based on BIC (574.35)
and AIC (524.73), the model fit was again best using a first-order
autoregressive (AR(1)) covariance structure with heterogeneous variances. The
test of fixed effects identified a significant effect of driving time
(F(11,138.80)=13.63; p<.001). Thus, a significant difference exists in the
Weinbeer ratings over the 12 measuring time points. No significant effect of
age group (F(1,34.76)=3.02; p=.091) was found, i.e., no significant difference
exists between the Weinbeer ratings of younger and older participants over the
12 measuring time points. In terms of interaction effects, no significant
effects were observed for age group and driving time (F(11,138.80)=1.02; n.s.).
Since the output of the machine learning model is compared not only with the
self- or observer ratings, but the two types of ratings are also compared with
each other in terms of their usage as ground truth for drowsiness, the
available time intervals of the ratings were synchronized. Because self-ratings
represent 5-minute intervals, but observer ratings 1-minute intervals, the
assumption was made that the rating of a complete 5-minute self-rating interval
also applies to every single minute within it. This enabled a minute-to-minute
comparison between the two types of ratings. Nevertheless, the evaluation and
comparison with the machine learning output were performed for all three
variants, i.e., self-ratings over the whole 5-minute interval (further
referenced as SR (5 min)), self-ratings over single minutes in the 5-minute
interval (further referenced as SR (1 min)), as well as observer ratings over
1-minute intervals (further referenced as OR (1 min)).
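The grouping of both rating scales into two classes and the expansion of the
5-minute self-ratings to single minutes can be sketched as follows. This is a
minimal illustration and not the evaluation code used in the study; the function
names and the example ratings are hypothetical.

```python
def kss_to_binary(level):
    """Group KSS levels: 1-6 -> 1 (non-drowsy), 7-9 -> 2 (drowsy)."""
    if not 1 <= level <= 9:
        raise ValueError("KSS level must be between 1 and 9")
    return 1 if level <= 6 else 2


def weinbeer_to_binary(level):
    """Group observer levels: 1-3 -> 1 (non-drowsy), 4-6 -> 2 (drowsy)."""
    if not 1 <= level <= 6:
        raise ValueError("Weinbeer level must be between 1 and 6")
    return 1 if level <= 3 else 2


def expand_to_minutes(interval_ratings, minutes_per_interval=5):
    """Repeat each interval rating for every minute of its interval."""
    return [r for r in interval_ratings for _ in range(minutes_per_interval)]


# Hypothetical KSS self-ratings for five consecutive 5-minute intervals.
kss_5min = [3, 5, 6, 7, 8]
sr_5min = [kss_to_binary(k) for k in kss_5min]   # SR (5 min)
sr_1min = expand_to_minutes(sr_5min)             # SR (1 min)

# Hypothetical observer ratings for three consecutive minutes.
or_1min = [weinbeer_to_binary(w) for w in [2, 3, 4]]  # OR (1 min)

print(sr_5min)
print(len(sr_1min))
print(or_1min)
```

After the expansion, SR (1 min) and OR (1 min) share the same 1-minute grid, so
they can be compared minute by minute.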
The classification results are presented in Table 5.6. Since the machine
learning model outputs a drowsiness level every two seconds, these levels were
first averaged over the considered time interval, i.e., 5-minute and 1-minute
intervals. The corresponding output for non-drowsy is 1, and for drowsy 2. An
average drowsiness level of 1.5 or above in the considered time interval was
rounded up to 2 (drowsy), and an average below this threshold was rounded down
to 1 (non-drowsy). Due to a lack of recorded heart rate data, participants 5,
9, 11, 25, and 30 had to be removed from the evaluation data set. A general
look at the classification accuracy of the individual participants, regardless
of the drowsiness reference, shows a high variance in the participants'
accuracies, ranging from 50% (participant 8) up to 100% (participants 4, 10,
12, 17, 20, 21, 23, 28, 29). Concerning self-ratings, the accuracy is almost
identical for the 5- and 1-minute intervals, except for participant 2. However,
when comparing self- and observer ratings, more considerable differences exist,
but no clear trend can be found. For example, with self-ratings as reference,
the state of participant 3 was correctly classified with an accuracy of 58.33%
(5 min) and 55.36% (1 min), whereas with observer ratings, an accuracy of
91.07% was achieved. For participant 8, 50% was achieved with both self-rating
variants as ground truth and 83.93% with observer ratings. The opposite can be
observed for participant 22: around 66% with self-ratings and 42.86% with
observer ratings. Across all subjects, the differences in accuracy between the
three reference types are marginal, with a maximum of 82.72% for SR (5 min).
Focusing on the two age groups, the classification accuracy is around 14 to 17
percentage points higher among the older subjects, depending on the reference.
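The averaging and thresholding step described above can be sketched as follows.
This is a minimal illustration under the stated coding (1 = non-drowsy,
2 = drowsy, threshold 1.5); the function names and the example values are
hypothetical and do not stem from the study data.

```python
def classify_interval(outputs, threshold=1.5):
    """Average the 2-second model outputs of one interval and binarize."""
    if not outputs:
        raise ValueError("interval contains no model outputs")
    mean_level = sum(outputs) / len(outputs)
    # An average of 1.5 or above counts as drowsy (2), otherwise non-drowsy (1).
    return 2 if mean_level >= threshold else 1


def accuracy(predicted, reference):
    """Share of intervals (in percent) where prediction and rating agree."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# One output every two seconds gives 30 outputs per 1-minute interval.
mostly_alert = [1] * 20 + [2] * 10   # mean 1.33 -> non-drowsy
borderline = [1] * 15 + [2] * 15     # mean 1.50 -> drowsy

print(classify_interval(mostly_alert))
print(classify_interval(borderline))

# Hypothetical predictions and reference ratings for four intervals.
print(accuracy([1, 2, 1, 1], [1, 2, 2, 1]))
```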
Subject           SR (5 min)   SR (1 min)   OR (1 min)
Young
1                 75.00        75.00        60.71
2                 83.33        73.21        35.71
3                 58.33        55.36        91.07
4                 100.00       100.00       37.50
5                 n.a.         n.a.         n.a.
6                 75.00        73.21        75.00
7                 66.67        69.64        75.00
8                 50.00        50.00        83.93
9                 n.a.         n.a.         n.a.
10                100.00       98.21        98.21
11                n.a.         n.a.         n.a.
12                100.00       100.00       100.00
13                66.67        62.50        60.71
14                58.33        53.57        94.64
15                75.00        75.00        67.86
Average           75.69        73.81        73.36
Old
16                91.67        91.07        100.00
17                100.00       100.00       98.21
18                91.67        87.50        96.43
19                91.67        91.07        91.07
20                100.00       100.00       100.00
21                100.00       100.00       100.00
22                66.67        66.07        42.86
23                100.00       100.00       91.07
24                83.33        80.36        98.21
25                n.a.         n.a.         n.a.
26                75.00        76.79        76.79
27                66.67        73.21        87.50
28                100.00       100.00       100.00
29                100.00       100.00       91.07
30                n.a.         n.a.         n.a.
Average           89.74        89.70        90.25
Overall average   82.72        81.76        81.81
Table 5.6: Machine learning classification accuracy (in percent) calculated with
reference to self-ratings with KSS for both 1- and 5-minute intervals and
1-min observer ratings with Weinbeer scale for individual subjects and
on average in each age group. For participants with n.a., not sufficient
data was available for evaluation.
So far, the performance evaluation has focused purely on accuracy as one of
the most traditional measures. However, when dealing with class imbalance, as
in the present case, other performance measures should be considered. Accuracy
itself focuses more on the majority than on the minority classes [201]. Thus,
the F-measure (further referenced as F) is used additionally; it represents the
harmonic mean of precision and recall, with the highest possible value of 1
(perfect precision and recall). Concerning the presented binary classification
problem, it is important to correctly detect when the driver is in a drowsy
state. From the customer's point of view, however, it is also crucial to
correctly detect when the driver is non-drowsy so as not to irritate the driver
with unnecessary drowsiness warnings. Concerning a standard confusion matrix
for binary classification with values for true positive (TP), true negative
(TN), false positive (FP), and false negative (FN), the formula for F does not
take the TN values into account.
For this reason, the F-measure is calculated twice, with each class considered
as the positive class once. Tables 5.7, 5.8, and 5.9 show the confusion
matrices for the three cases described. When calculating F for the displayed
confusion matrices (non-drowsy as positive class), the following results are
obtained: 0.90 for SR (5 min), 0.89 for SR (1 min), and 0.90 for OR (1 min). If
drowsy is considered the positive class, the results are 0.26 for SR (5 min),
0.29 for SR (1 min), and 0.25 for OR (1 min), which are noticeably lower in all
three cases.
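The calculation of F for both classes can be retraced with a short sketch. It
uses the counts from Tables 5.7, 5.8, and 5.9 and the standard definition
F = 2TP / (2TP + FP + FN); the function and variable names are hypothetical.
The results agree with the values reported above up to rounding.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Counts from Tables 5.7-5.9, with non-drowsy as the positive class.
matrices = {
    "SR (5 min)": {"tp": 240, "fp": 35, "fn": 16, "tn": 9},
    "SR (1 min)": {"tp": 1069, "fp": 190, "fn": 86, "tn": 55},
    "OR (1 min)": {"tp": 1108, "fp": 151, "fn": 99, "tn": 42},
}

f_non_drowsy = {}
f_drowsy = {}
for name, m in matrices.items():
    f_non_drowsy[name] = f_measure(m["tp"], m["fp"], m["fn"])
    # With drowsy as the positive class, TN takes the role of TP,
    # and FP and FN swap their roles.
    f_drowsy[name] = f_measure(m["tn"], m["fn"], m["fp"])
    print(f"{name}: F(non-drowsy) = {f_non_drowsy[name]:.3f}, "
          f"F(drowsy) = {f_drowsy[name]:.3f}")
```

Because TN never enters the formula, the many correctly recognized non-drowsy
intervals do not improve F for the drowsy class, which is why this measure
exposes the weak detection of the minority class.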
                          Actual
                          Non-drowsy   Drowsy
Predicted   Non-drowsy    TP: 240      FP: 35
            Drowsy        FN: 16       TN: 9
Table 5.7: Confusion matrix for evaluation of self-ratings (SR) (5 min).
                          Actual
                          Non-drowsy   Drowsy
Predicted   Non-drowsy    TP: 1069     FP: 190
            Drowsy        FN: 86       TN: 55
Table 5.8: Confusion matrix for evaluation of self-ratings (SR) (1 min).
                          Actual
                          Non-drowsy   Drowsy
Predicted   Non-drowsy    TP: 1108     FP: 151
            Drowsy        FN: 99       TN: 42
Table 5.9: Confusion matrix for evaluation of observer ratings (OR) (1 min).
5.4.2.3 Outlook: Post-processing of Machine Learning Output
The machine learning model delivers a new output every two seconds. However,
it is not practical to update and re-inform the driver of his/her state at such
short intervals. In this section, three possible and common methods for
post-processing the machine learning output are compared, along with how each
method influences the detection accuracy. This is intended as an outlook for
possible future work. For this reason, the methods are compared and evaluated
using participant 6 as an example and observer ratings as ground truth for
drowsiness.
Mean: To evaluate the performance of the machine learning model in the previous
section, the mean was calculated over the considered time interval, e.g., for
observer ratings over intervals of one minute, and rounded to the nearest
integer. This could be applied as a simple method to display and update the
driver's state every minute.
Ratio: This method calculates the ratio of the number of non-drowsy or drowsy
outputs to the total number of outputs in the considered time interval. The
driver's condition can be decided based on the ratio by setting a threshold.
For example, if the threshold is set to 0.7 and 70% of the output is non-drowsy
in a 1-minute interval, the driver would be classified as non-drowsy. However,
supposing the output is non-drowsy over the first 45 seconds and drowsy over
the last 15 seconds of the 1-minute interval, the result would still be
non-drowsy. In reality, the driver would probably be moving towards a drowsy
state. Hence, this technique is not very suitable, especially when longer time
intervals, such as five or ten minutes, are considered.
Moving average: To overcome the limitations of the ratio method, a moving
average, e.g., with a 2-minute sliding window and a 1-minute increment, could
be calculated. The obtained average is rounded to the nearest integer. With
this method, the state is still determined for every minute, but it now also
depends on the previous minute.
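The three post-processing methods can be sketched as follows. This is a minimal
illustration with hypothetical function names and hypothetical model outputs
(30 outputs per minute, 1 = non-drowsy, 2 = drowsy). Two details are
assumptions: an average of exactly 1.5 is counted as drowsy, and the first
minute of the moving average uses only its own outputs because no previous
minute exists.

```python
NON_DROWSY, DROWSY = 1, 2


def round_level(mean_level):
    """Round an averaged level to 1 or 2 (1.5 counts as drowsy)."""
    return DROWSY if mean_level >= 1.5 else NON_DROWSY


def mean_method(minutes):
    """State per minute from the mean of that minute's outputs."""
    return [round_level(sum(m) / len(m)) for m in minutes]


def ratio_method(minutes, threshold=0.7):
    """Non-drowsy if the share of non-drowsy outputs reaches the threshold."""
    states = []
    for m in minutes:
        share = m.count(NON_DROWSY) / len(m)
        states.append(NON_DROWSY if share >= threshold else DROWSY)
    return states


def moving_average_method(minutes):
    """State per minute from a 2-minute window with 1-minute increment."""
    states = []
    for i, m in enumerate(minutes):
        window = m if i == 0 else minutes[i - 1] + m
        states.append(round_level(sum(window) / len(window)))
    return states


# Hypothetical outputs for four consecutive minutes.
minutes = [
    [1] * 30,
    [1] * 23 + [2] * 7,
    [1] * 6 + [2] * 24,
    [1] * 20 + [2] * 10,
]

print(mean_method(minutes))
print(ratio_method(minutes))
print(moving_average_method(minutes))
```

In the last minute, Mean returns to non-drowsy immediately, whereas the moving
average still reflects the drowsy outputs of the previous minute.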
Table 5.10 shows how the performance of the machine learning model changed for
participant 6 when different methods were used for post-processing the machine
learning output. The result for Mean (see also Table 5.6) lies between the two
other methods. In the case of Ratio, worse results were achieved for accuracy
and F-measures of both classes. Using Moving average, accuracy increased from
75% to over 78%, and the values for the F-measure also increased, which
suggests that the moving average method is the most promising one.
                           Mean     Ratio    Moving average
Accuracy                   75.00    70.90    78.18
F-measure (non-drowsy)     0.78     0.82     0.84
F-measure (drowsy)         0.58     0.61     0.65
Table 5.10: Performance (accuracy, F-measure for both non-drowsy and drowsy
class) of machine learning model using different techniques for
post-processing its output with reference to observer ratings, exemplary for
participant 6.
5.4.3 TAM - Technology Acceptance
TAM results were collected on a 7-point (1-7) Likert scale. Before evaluating
TAM, the data were first checked for internal consistency using Cronbach's α.
Internal reliability of all multi-item scales could be confirmed (α > 0.6). The
results are shown in Table 5.11. For a more detailed illustration, the results
of the individual TAM subscales for the two age groups are presented as box
plots in Figure 5.11. In general, the median and mean values are higher for the
older participants in all TAM subscales, except for PEOU, where the median is
equal in both groups.
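How Cronbach's α is obtained for a multi-item subscale can be sketched as
follows. This is a minimal illustration using the standard formula
α = k/(k-1) · (1 - Σ item variances / variance of the sum scores); the function
name and the ratings are hypothetical and do not stem from the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of ratings per item, respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum score of each respondent across all items of the subscale.
    totals = [sum(ratings) for ratings in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings of five respondents on a three-item subscale (1-7).
subscale = [
    [5, 6, 4, 7, 3],
    [5, 7, 4, 6, 3],
    [4, 6, 5, 7, 2],
]

alpha = cronbach_alpha(subscale)
print(f"alpha = {alpha:.2f}, reliable (alpha > 0.6): {alpha > 0.6}")
```

A subscale with a single item, such as INT, has no α because the formula
requires at least two items.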
Furthermore, it was checked whether the obtained differences between the two
age groups are significant. Since the collected TAM data did not follow a
normal distribution, and the applied scale is ordinal, a non-parametric test in
the form of the Mann-Whitney U test was applied per TAM variable. The results
are listed in the right column of Table 5.11. Assuming a significance level of
α = 0.05 and a group size of 15 subjects per age group, the critical U value is
64. If the value of U resulting from the Mann-Whitney U test is beneath this
critical U value, a significant difference in the ratings of the age groups
exists. To check the certainty of these outcomes, a two-tailed test is
performed. If the p-value is less than 0.05, a significant difference between
the two groups with a confidence of 1-p is present. Significant results, i.e.,
significant differences between the age groups, are indicated in Table 5.11
with a *. Except for PEOU (U: 107.00, z = -0.21, p = 0.83), the two-tailed
Mann-Whitney U test was significant for PU (U: 45.00, z = -2.78, p = 0.01), ATT
(U: 41.50, z = -2.92, p = 0.00) and INT (U: 40.00, z = -2.99, p = 0.00).
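The relation between the U and z values reported above can be retraced with a
short sketch. The function names and the small rating samples are hypothetical.
The normal approximation with continuity correction and without tie correction
is an assumption; it was chosen here because it reproduces the reported z
values for a group size of 15 per age group.

```python
import math


def mann_whitney_u(x, y):
    """Smaller U statistic of two independent samples (ties count half)."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


def z_from_u(u, n1, n2):
    """Normal approximation for the smaller U with continuity correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u + 0.5 - mean_u) / sd_u


CRITICAL_U = 64  # critical value stated in the text for n1 = n2 = 15

# U values as reported for the four TAM subscales.
reported_u = {"PU": 45.0, "PEOU": 107.0, "ATT": 41.5, "INT": 40.0}
for name, u in reported_u.items():
    z = z_from_u(u, 15, 15)
    print(f"{name}: U = {u:.2f}, z = {z:.2f}, significant: {u < CRITICAL_U}")

# Hypothetical ratings of two very small groups, only to show the U computation.
young = [5, 4, 6]
old = [6, 7, 7]
print(mann_whitney_u(young, old))
```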
TAM    Items   C.'s α   Mdn (M) Young   Mdn (M) Old   Mann-Whitney U (U, z, p)
PU     3       0.75     5 (4.69)        6 (5.82)      45.00, -2.78, 0.01*
PEOU   3       0.78     7 (6.44)        7 (6.47)      107.00, -0.21, 0.83
ATT    3       0.68     6 (5.71)        7 (6.56)      41.50, -2.92, 0.00*
INT    1       -        5 (5.20)        7 (6.53)      40.00, -2.99, 0.00*
Table 5.11: Overview of the deployed subscales of TAM including Cronbach's α
(C.'s α), median (Mdn), and mean (M) for younger and older participants.
Significant differences (p < .05) between the two age groups
from Mann-Whitney U test are indicated by *.
Figure 5.11: Summary of results of TAM subscales for younger and older
participants.
5.4.4 UEQ - User Experience
As for the TAM, the UEQ subscales were checked for consistency by calculating
Cronbach's α (see Table 5.12). Consistency could be confirmed for the subscales
Attractiveness, Perspicuity, Stimulation, and Novelty, but not for Efficiency
and Dependability, since an alpha value lower than 0.6 was obtained. Especially
with a small sample size, as in the present case, the reason is not necessarily
a scale inconsistency. Instead, the misinterpretation of certain items in the
subscales by the participants can be the reason. In such cases, the mean rating
per question within a subscale can be used to identify the misinterpreted items
of the corresponding subscale. By applying this technique, it was found that
several participants misinterpreted specific scale items; their ratings for
these items were thus inconsistent with their ratings for other items of the
same subscale. In the case of one older participant, four such inconsistencies
could be identified. Hence, the ratings of this participant were discarded for
further evaluation and analysis. Moreover, Table 5.12 lists median and mean
values for each subscale, and Figure 5.12 presents the results of the UEQ in
the form of box plots. In general, high values for both median and mean (≥ 5)
were obtained in all subscales. For the mean, the older participants achieved
higher values in all subscales; for the median, this is the case in the
categories Attractiveness, Stimulation, and Novelty. An identical median value
was determined in the remaining categories, i.e., Perspicuity, Efficiency, and
Dependability.
Furthermore, it was checked with the Mann-Whitney U test if the differences
between the two age groups in the different UEQ subscales are significant. The
critical U value is again 64. A two-tailed test is performed to check the
validity of Mann-Whitney U. Significant results are indicated in Table 5.12
with a *. Significant results were obtained for Attractiveness (U: 24.50, z =
-3.49, p = 0.00), Stimulation (U: 49.50, z = -2.40, p = 0.02) and Novelty (U:
27.00, z = -3.38, p = 0.00). For Perspicuity (U: 96.50, z = -0.35, p = 0.73),
Efficiency (U: 81.00, z = -1.03, p = 0.30) and Dependability (U: 83.00, z =
-0.94, p = 0.35), the results are not significant.
UEQ              Items   C.'s α   Mdn (M) Young   Mdn (M) Old   Mann-Whitney U (U, z, p)
Attractiveness   6       0.83     5 (5.40)        6 (6.21)      24.50, -3.49, 0.00*
Perspicuity      4       0.78     7 (6.50)        7 (6.55)      96.50, -0.35, 0.73
Efficiency       4       0.55     6 (5.38)        6 (5.75)      81.00, -1.03, 0.30
Dependability    4       0.51     6 (5.58)        6 (5.73)      83.00, -0.94, 0.35
Stimulation      4       0.69     5 (5.00)        6 (5.77)      49.50, -2.40, 0.02*
Novelty          4       0.76     5 (5.07)        6 (6.18)      27.00, -3.38, 0.00*
Table 5.12: Overview of the deployed subscales of UEQ including Cronbach's α
(C.'s α), median (Mdn), and mean (M) for younger and older participants.
Significant differences (p < .05) between the two age groups
from Mann-Whitney U test are indicated by *.
Figure 5.12: Summary of results of UEQ subscales for younger and older
participants.
In addition to evaluating the six subscales of the UEQ and comparing the two
age groups, the UEQ data were evaluated concerning the three dimensions of the
UEQ, i.e., Attractiveness, Pragmatic Quality, and Hedonic Quality (see Figure
5.13). The median is 6 in all three dimensions. The mean of Attractiveness is
6.21, with a minimum value of 3 and a maximum of 7.
Figure 5.13: Summary of results of UEQ dimensions for all participants.
Since, as in the present work, UEQ measurements are only available for a single
system, it is not easy to judge whether the product meets the quality goals. For
this reason, the results obtained were compared with a UEQ benchmark data
set, which contains data from over 20,000 people from around 450 surveys on
various products (business software, development tools, webshops or services,
social networks, mobile applications, household appliances). In all scales, the
product should be located in the Good category [218]. The results are presented
in Figure 5.14. The presented drowsiness detection system achieved this for all
scales.
Figure 5.14: UEQ benchmark graph for proposed drowsiness detection system.
5.4.5 Further results from post-questionnaire
In addition to TAM and UEQ, the subjects were also asked how confident they
felt when submitting their self-ratings. The majority in both age groups, ten
younger and 11 older participants, stated that they felt confident; four
participants in each age group reported medium confidence, and one younger
subject did not feel confident. Furthermore, they were asked at what level of
drowsiness a warning would be appropriate with reference to the KSS. The
results are shown in Table 5.13. None of the subjects consider levels 1-3
(extremely alert, very alert, alert) or level 9 (very sleepy; fighting sleep)
to be helpful for a warning. One younger and five older subjects chose level 4
(rather alert), two from each age group level 5 (neither sleepy nor alert),
three younger and seven older ones level 6 (some signs of sleepiness), and five
younger and one older level 7 (sleepy; no effort to keep awake). Four of the
younger subjects would be satisfied with a warning from level 8 (sleepy; some
effort to keep awake). It can be seen that KSS level 7, which represents the
onset of drowsiness, is the relevant level for a first warning, and level 9 is
already perceived as too late.
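KSS level                                 Young   Old   Overall
1 (extremely alert)                       0       0     0
2 (very alert)                            0       0     0
3 (alert)                                 0       0     0
4 (rather alert)                          1       5     6
5 (neither sleepy nor alert)              2       2     4
6 (some signs of sleepiness)              3       7     10
7 (sleepy; no effort to keep awake)       5       1     6
8 (sleepy; some effort to keep awake)     4       0     4
9 (very sleepy; fighting sleep)           0       0     0
Table 5.13: Preferred KSS levels for a first drowsiness warning.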
The participants were also asked whether they already had experience with
smartwatches or fitness trackers. Six of the younger subjects were familiar
with their usage; the remaining six had little or no experience. Only two of the
older participants had come into contact with smartwatches or fitness trackers
so far. They were further asked whether they own a smartwatch or fitness
tracker, intend to buy one, or would wear one to ensure safety while driving.
The results are listed in Table 5.14. Only three younger and one of the older
participants possess a wearable. One younger and four older ones intend to buy
one. In order to ensure safety while driving, 12 younger and 14 older subjects
would wear a wearable.
Participants were further asked whether they could imagine positioning the
wearable on other body parts during driving, such as the waist, arm, neck,
finger, or ankle. However, all younger and 12 older participants voted for the
wrist. Two of the three remaining older subjects chose the finger and one the
waist.
                                             Young   Old    Overall
possession of a wearable (Yes/No)            3/12    1/14   4/26
purchase intention (Yes/No)                  1/14    4/11   5/25
wearing for safety while driving (Yes/No)    12/3    14/1   26/4
Table 5.14: Results from post-questionnaire regarding usage of wearable devices.
After filling in the post-questionnaire, the participants shared, in
unstructured interviews, insights about their overall experience with and
attitude towards the system and any suggestions for improvement. All answers
were summarized and analyzed in detail to find common themes. Nearly all
mentioned that they found the idea of the system exciting, useful, and easy to
understand. A participant said: "If you've used it a few times, you've
internalized how to handle it." Three subjects added that they would use the
system, especially during long drives at night. Regarding the wristband, a
participant stated: "Since the measuring device can be easily worn like a watch
on the wrist, this does not bother at all." In relation to this, a participant
suggested that the wristband should be sold directly with the car since no
wearable is used privately. Further, a subject wished for the bracelet to be
chargeable in the vehicle when not driving. The developed Android app was
perceived by almost everyone as easy to understand, intuitive, and
self-explanatory. One older participant mentioned: "I recently bought my first
smartphone but could handle the app well." Another one said: "The robot man
was cool, very intuitive." However, some participants, especially the older
ones, had problems switching the device on: "It should be better explained in
the app how and where to switch on the wristband."
5.5 Discussion and Limitations
In the following, based on the obtained results, the presented hypotheses are
discussed and implications for further research in this area are derived.
H1: A high detection accuracy can be obtained using the proposed
drowsiness detection system. The detection accuracy of the machine learning
model that was integrated into the Android app was computed, considering
both self-ratings and observer ratings separately as ground truths. The
results show that inter-driver variance has a strong influence on model
performance. Therefore, at the current development stage, a single model
cannot be applied to reliably detect drowsiness for all participants. On
average, across all participants, the obtained accuracy for the three presented
cases (SR (5 min), SR (1 min), OR (1 min)) is slightly above 80%. H1 can
therefore only be partly accepted, and only in individual cases. Apart from the
accuracy, the performance of the machine learning model was also calculated
using the F-measure, separately for both classes. The obtained results show
that the prediction of drowsy instances is more challenging than that of
non-drowsy ones. Before training the machine learning model, the drowsy class
was over-sampled by applying SMOTE due to class imbalance. However, the model
did not perform equally well in both classes. This could be addressed by
collecting more real drowsy data or tuning the hyperparameters. Moreover,
cost-sensitive classification should also be considered, where the elements of
the confusion matrix carry different weightings. During use, the system could
also transition from a user-independent model to a user-dependent one and adapt
to the user. When looking at the results from a commercial perspective, it is
essential to correctly detect the non-drowsy state of the driver to avoid false
drowsiness alarms and not irritate or even annoy the user with unnecessary
warnings. Looking back at Table 5.13, in which the chosen levels for a first
warning are presented, the majority of the participants already wanted a
warning at KSS level 6 (some signs of sleepiness) or level 7 (sleepy; but no
effort to keep awake). Given the split of the KSS levels into non-drowsy
(levels 1-6) and drowsy (levels 7-9), these two levels lie exactly on the
threshold for a warning request and mark a kind of transitional state between
being non-drowsy and drowsy. Participants would thus prefer a drowsiness
predictor, which issues a warning at the onset of drowsiness before the driver
is drowsy, rather than a detector. Another noteworthy aspect is that heart rate
data from a commercial fitness tracker and a small number of features served as
input for the machine learning model. However, in the case of drowsiness
detection, a decisive acceptance criterion is the system's performance. In the
present study, the calculated drowsiness level was not displayed to the
participants, as they had to assess themselves via drowsiness self-ratings.
Therefore, the acceptance would have to be examined again with the same
participants in a follow-up study in which the calculated levels are shown: on
the one hand, to rule out possible novelty effects, and on the other hand, to
see how possibly incorrectly predicted drowsiness levels ultimately affect
acceptance.
H2.1: There is a high user experience after first-time use of the
proposed drowsiness detection system. To evaluate the user experience of the
proposed drowsiness detection system, the UEQ was presented to the participants
after the simulator ride. In general, the results showed that the median in all
UEQ scales, i.e., Attractiveness, Perspicuity, Efficiency, Dependability,
Stimulation, and Novelty, lies in a high range for both age groups, with values
between 5 and 7 (see Figure 5.12) and overall higher ratings from the older
subjects. Further, for all UEQ dimensions (see Figure 5.13), the median is 6.
Therefore, it can be concluded that a high user experience was achieved when
the proposed drowsiness detection system is used for the first time. This is
also reflected in the comparison with the benchmark data set (see Figure 5.14).
For this reason, H2.1 can be accepted.

and older (65-70 years) people regarding the user experience of the
proposed drowsiness detection system. To determine whether the differences
between the two age groups are significant for the different UEQ scales, a
Mann-Whitney U test was applied (see results in Table 5.12). For
Attractiveness, Stimulation, and Novelty, a significant difference between the
ratings of younger and older participants was obtained, whereas for
Perspicuity, Efficiency, and Dependability, no significant difference between
the ratings of the two age groups exists. For this reason, H2.2 can only be
partially accepted. The older participants thus found the system more valid and
got an overall better impression of the product. The usage was more exciting
and motivating for them than for the younger participants. Further, the product
was perceived as more innovative and caught the interest of the older
participants more. A possible reason may be that the majority of older people
do not deal with the latest technologies daily and therefore do not have the
same understanding of technology as younger subjects who grew up with it. For
example, an older subject mentioned in the interview that he had recently
bought a smartphone. Since the presented app is a prototype, its use and design
were perceived by young subjects as less exciting, as they probably use a large
number of other and more sophisticated mobile applications every day, which
might be the reason for lower ratings in the hedonic quality. For both age
groups, however, it was easy to learn to use the application and wearable
device and to become familiar with them.
H3.1: There is a high technology acceptance after first-time use of
the proposed drowsiness detection system. To determine the technology
acceptance of the proposed drowsiness detection system, participants completed
the TAM questionnaire after the drive in the simulator. The results showed that
the median values in all TAM scales, i.e., perceived usefulness (PU), perceived
ease of use (PEOU), attitude towards the system (ATT), and intention to use the
system (INT), are in the range of 5-7 and thus at a high level across both age
groups. From this, it can be derived that a high technology acceptance was
obtained after the first-time use of the proposed drowsiness detection system,
which leads to the acceptance of H3.1.
and older (65-70 years) people regarding the technology acceptance
of the proposed drowsiness detection system. On average, in all scales,
the older participants gave higher ratings. A Mann-Whitney U test was conducted
to determine possible significant differences in the ratings of the two age
groups. Except for PEOU, the difference between the age groups is significant
in all scales. Therefore, H3.2 can partially be accepted. As with the
results of the UEQ, a similar effect was obtained for the TAM. Older participants
found the system to be more useful, showed a more positive attitude towards the
system, and had a stronger intention to use it. Both age groups considered the
system easy to use. Therefore, reasons similar to those described for the UEQ, such as a
different understanding of technology or irregular or no use of new technologies,
can be mentioned. Regarding the purpose of the presented system, the detection
of drowsiness while driving, other reasons could also be considered. With
increasing age, various health problems, such as diminished hearing or
vision, can occur that could lead to dangerous situations while driving. For this
reason, automated driving is particularly relevant for older people who want
to stay mobile but are limited in their ability to drive. They are probably
more willing to use this kind of driver assistance system on the way to full
automation and therefore show a higher technology acceptance than younger
people. Another noteworthy finding from the post-questionnaire is that only
a few participants own a smart wearable or want to buy one. Automobile
manufacturers have to take this into account when developing systems of this
type in the future. One option, which several participants also mentioned
after the study, would be to sell the wearable with the vehicle.
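The age-group comparison used for H3.2 can be illustrated with a minimal sketch of a two-sided Mann-Whitney U test. The function name, the normal approximation without tie correction of the variance, and the rating values are assumptions made for illustration; they are not the data or the statistics software used in the study.

```python
from math import erf, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie
    correction of the variance). Returns (U, p)."""
    # Pool both samples and remember the group of every value.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # tied values share the mean rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma                    # z <= 0 because u <= mu
    p = min(1.0, 1 + erf(z / sqrt(2)))      # equals 2 * Phi(z)
    return u, p

# Hypothetical 7-point ratings of one TAM scale per age group.
young = [4, 5, 4, 3, 5, 4, 5, 3]
older = [6, 7, 6, 5, 7, 6, 6, 7]
u, p = mann_whitney_u(young, older)
print(f"U = {u}, p = {p:.4f}")
```

For samples as small as the ones in a typical user study, an exact test or a tie-corrected variance would be preferable to the plain normal approximation shown here.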
Regarding limitations, in the context of the user study, specific precautions
were taken to induce drowsiness faster. These included no caffeinated drinks
five hours before the study, a monotonous driving route with little traffic and
low speeds, no communication with the experimenter, and a warm temperature
inside the simulator. Besides, the study was conducted in a driving simulator.
Therefore, the onset of drowsiness would probably occur later under realistic
conditions since, in a simulator-based environment, drivers are not exposed to
the dangers of real traffic. Moreover, in real-world driving, vibrations
from the roadbed could lead to degraded or different quality of the
collected physiological data and a differing machine learning output. Furthermore,
drives with a duration longer than 60 minutes should be considered.
Finally, two specific age groups were selected. The presented issues should
also be investigated for other age groups and, in general, with a higher number
of subjects.
5.6 Main Findings
• With the proposed prototype in its current development stage, an overall
  detection accuracy of 82.72% was reached across all participants when
  considering self-ratings as ground truth.
• Due to a strong influence of inter-driver variance, reliable detection of
  drowsiness was not possible with the same accuracy in all participants,
  especially in terms of drowsy instances.
• In terms of user experience and technology acceptance, high ratings were
  obtained.
• Both age groups rated the system easy to use and informative.
• Older participants found the system to be more attractive as compared
  to younger participants.
6 Discussion
Based on the overarching hypothesis (H: Smart wearables can be applied
for a reliable detection of driver drowsiness in an automotive context.),
three different aspects were investigated in more detail in the course of this
thesis. First, it was examined what preconditions can be considered to
adapt and personalize driver drowsiness detection systems and model different
groups of users (RQ1). The main focus of the thesis was the investigation of the
potential of using vital data from consumer-grade wrist-worn wearable
devices as a single data source for drowsiness detection (RQ2). Further, in this
context, it was examined whether driver drowsiness detection systems based on smart
wearables are accepted and how their acceptance and thus integration in the
vehicle can be further enhanced (RQ3).
In the following sections, the postulated research questions and the answers
obtained are discussed.
6.1 Preconditions for the Adaptation of Driver Drowsiness Detection Systems (RQ1)
Based on RQ1 (What preconditions can be considered to adapt and
personalize driver drowsiness detection systems and to model different
groups of users?), several hypotheses were set up. It was argued that
age (young, old), driving mode (manual, partially automated), and driving time
have a significant effect on the development of driver drowsiness, i.e., significant
differences between the age groups, between the driving modes, and in the
drowsiness development over time can be determined. To answer this research
question and investigate the postulated hypotheses, two user studies were
conducted. To derive differences between study environments, a study setting was
proposed and carried out both in the simulator and on a test track.
In the evaluation, the main focus was on the subjective drowsiness assessment
via driver self-ratings to capture the drowsiness state directly from the driver's
perspective. The drowsiness self-ratings were collected during and after the
manual and partially automated drive. In both studies, results showed only a
marginal difference in the ratings during and after the drive. Further, the same
statistical effects could be determined for both ratings. This would suggest
that it is sufficient to let the test persons assess their drowsiness only
immediately after the drive and not during driving. Even if the rating request
is designed to be as little disruptive as possible, it certainly has a very brief
alerting effect on the driver, which can interfere with the development of drowsiness.
Focusing on the presented hypotheses, the ones for driving mode and driving
time could be accepted in both studies. A significant difference in the
development of drowsiness between manual and automated driving was evident,
with higher levels in automated driving. Further, drowsiness was significantly
higher at the end of the drive. Despite the short driving time of 45 minutes, high
levels of drowsiness, with higher levels in automated driving, could be achieved
with the chosen study setting. This is particularly noteworthy for study 2.
Even in a production car, the duty of monitoring during partially automated
driving affects drowsiness within a short time. The post-questionnaire also
confirmed this in both studies, where the majority of participants stated
that they had to fight drowsiness more during automated driving. Therefore,
the results obtained reaffirm the need for reliable and early detection of
driver drowsiness, especially in the lower levels of automated driving, where
the driver still forms the fallback level [?]. Since constant involvement in the
driving task no longer exists, the level of drowsiness rises faster with partially
automated driving compared to manual driving. This effect has already been
presented in previous, mainly simulator-based studies [150, 151, 152] and could
be confirmed in this work in a production car in a more realistic scenario within
a short driving time. If, for example, ADAS that enable driving in SAE level 2
or 3 are activated while driving, higher drowsiness levels are likely.
Therefore, the current status of these systems should be applied as input
for the adaptation of the drowsiness detection system, e.g., for adjusting
its sensitivity by issuing warnings earlier.
Concerning the age effect, non-significant differences in the development of
drowsiness (higher drowsiness in younger subjects) were found in the simulator
study. In comparison, in study 2, a significant effect of age on drowsiness
was found: the younger and older subjects differ significantly
in drowsiness development over time, with higher drowsiness
levels for the younger subjects. The differences between the two selected
age groups were thus more pronounced in a more realistic environment than in the
simulator. This emphasizes the need for drowsiness studies in more realistic
environments. Especially in terms of the risk factor driver drowsiness, it is
difficult to transfer knowledge gained in the simulator one to one to real-world
driving and to figure out how the drowsiness development over time would have
been affected. Such studies are therefore of major importance and should be
considered. Even if no real traffic is involved
in the beginning, a study on a test track with specific precautions for
reproducible and safe conditions for a drowsy driver, as performed in this Ph.D.
thesis, can serve as a starting point for bridging the gap between simulator and
the real world.
In both studies, higher levels of drowsiness were found in the younger partic-
ipants. With a focus on the obtained ESS scores, the older subjects achieved
lower ESS scores in studies 1 and 2, which indicates a lower level of daytime
sleepiness and could be a reason for the overall lower levels of drowsiness.
Moreover, in conversation with the participants after the studies, the impression
arose that participating in a user study, dealing with a future topic such
as automated driving, and experiencing and accepting new technology
were more exciting and fascinating for the older participants than for the
younger ones. Further, in
both studies, the older subjects were more cautious and wanted a drowsiness
warning at an earlier time than the younger participants. This effect could be
incorporated into the design of a drowsiness detection system. Depending on
age, the time for a warning could be adjusted, with earlier warnings for older
drivers.
Furthermore, the self-ratings were compared concerning the three different
times of day at which the studies were conducted. In both studies, the lowest
ratings were given in the evening. In study 1, this was followed by the ratings
in the afternoon, and the highest drowsiness levels were reached in the morning.
In study 2, marginally higher levels of drowsiness were obtained in the afternoon. The
time of day thus also has an influence on the development of drowsiness when
driving. Depending on the time of day, the drowsiness detection system could be
adapted, for example, with earlier warnings in the morning.
Since a production car was available in study 2, it was examined whether a
correlation exists between trust in the automated vehicle and the development of
drowsiness while driving. Results showed a significant correlation between increased
drowsiness and trust ratings after the ride with the automated vehicle. It was
found that the subjects rated the distrust items lower after driving than before
driving. In return, the trust ratings increased, already after a short system
exposure of 45 minutes. Participants who put more trust in the automated
vehicle before starting the drive became drowsier during the drive, which could
negatively affect the duty of monitoring in an SAE Level 2 car. Increased signs
of drowsiness could thus be interpreted in a way that drivers of automated
vehicles accept falling asleep due to high trust in automation. Further research
on trust in automation and drowsiness will be necessary to prevent misuse and
successfully implement automated driving technology.
It was further investigated whether a connection exists between self-ratings and
heart rate measured by the wearables and whether it is influenced by age or driving
mode. Weak but significant linear correlations were found for all wearables used.
However, in the comparison between the heart rate in automated and manual
driving and within the age groups, noticeable differences are recognizable.
In manual driving, the average heart rate for all subjects is several beats per
minute higher than in automated driving, possibly due to the reduced activity
and workload in the latter. Furthermore, the average heart rate for young subjects
is higher than for older ones. Previous work also found that it is challenging to apply
detection models that have been trained with data from a particular age group
to another age group [4]. Results from the literature show that the
maximum heart rate decreases with increasing age [186, 187, 188], which could
be confirmed in the context of this Ph.D. thesis in terms of driver drowsiness
with physiological data from low-resolution consumer-grade wearable devices.
The differences in the heart rate between manual and automated driving and
between the two age groups thus reflect the corresponding significant differences
in the self-ratings. If physiological data from wearable devices are applied for
detection, different detection models could be integrated depending on the driving
mode and the driver's age.
Overall, the results indicate that various preconditions can and should be
considered to adapt and personalize driver drowsiness detection systems and model
different groups of users (RQ1). With the knowledge gained, the performance
of intelligent driver-vehicle interfaces, which are intended to warn the driver in
the event of an onset of drowsiness, can be increased to ensure safe driving and
avoid drowsiness-related crashes in the best possible way.
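How such preconditions could feed into a warning strategy can be sketched as follows. The function, the base threshold, and all offsets are hypothetical values chosen only for illustration; the thesis does not specify a concrete adaptation rule. A lower threshold means that a warning is issued earlier.

```python
def warning_threshold(base, automated, older_driver, time_of_day):
    """Return the drowsiness score above which a warning is issued.

    All offsets are hypothetical; they only encode the direction of the
    adaptations discussed above (earlier warnings in automated driving,
    for older drivers, and in the morning).
    """
    threshold = base
    if automated:                  # SAE level 2/3 assistance is active
        threshold -= 0.10
    if older_driver:               # older subjects asked for earlier warnings
        threshold -= 0.05
    if time_of_day == "morning":   # highest self-ratings in study 1
        threshold -= 0.05
    return max(0.05, round(threshold, 2))

print(warning_threshold(0.70, automated=False, older_driver=False,
                        time_of_day="evening"))
print(warning_threshold(0.70, automated=True, older_driver=True,
                        time_of_day="morning"))
```

In a real system, the offsets would have to be derived from data rather than fixed by hand, and the driving mode would be read from the current status of the assistance systems.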
6.2 Driver Drowsiness Detection with Vital Data from Smart Wearables (RQ2)
The goal of RQ2 (Can driver drowsiness be derived from vital parameters
measured with wrist-worn smart wearables?) was to examine the
applicability of wrist-worn wearable devices for driver drowsiness detection in
an automotive environment. In particular, the potential and feasibility of using
vital parameters, i.e., physiological data from consumer-grade wrist-worn
wearable devices, for driver drowsiness detection were investigated. In the course
of two main experiments, in which wearable devices from different manufacturers
were compared with one another on the one hand and with
a medical-grade device on the other hand, different aspects were analyzed.
Further, in terms of ground truth for drowsiness, a complexity analysis using the
example of observer ratings was conducted.
In the first experiment (see Section 4.1), the goal was to investigate the potential
of using HRV as the single data input for a supervised machine learning model
for driver drowsiness detection. Through HRV analysis, the activity of the
ANS can be obtained [117], which helps to get detailed insights into the current
drowsiness state of a person [122]. For this reason, HRV was particularly often
applied in terms of driver drowsiness detection [117, 118, 119]. However, the
recording was often very intrusive, e.g., by attaching adhesive electrodes to the
driver's upper body in an ECG measurement. For this experiment, HRV analysis
was performed with physiological data of a consumer-grade wrist-worn
wearable device. The standard physiological measure of fitness trackers and
smartwatches on the market is still the heart rate. However, newer devices
equipped with more advanced sensors can measure the RR intervals to carry
out HRV analysis, e.g., for stress detection. Therefore, it can be assumed that
the measurement of HRV will become standard in the near future. The results were
compared with reference data from an intrusive, medical-grade ECG device to
further examine feasibility and accuracy. Electrodes had to be attached to the
upper body to record the ECG data.
The automated drive in study 1 served as the database in this experiment.
In user-dependent (UDT) and user-independent (UIT) tests, the two devices
and different machine learning models were compared. Results showed that
with the data of the smart wearable device, albeit not for all models, the results
are at a similar level compared to the more intrusive
medical-grade device in the in-vehicle setting. Regarding the choice of the
supervised machine learning model, with the proposed approach with KNN,
RT, and RF, accuracies >90% could be achieved in the UDT by using exclusively
physiological data (HRV) of the wristband. In the current development stage,
the models were only trained with data from fewer than 30 persons,
and no hyper-parameters were optimized. Thus, for the present application, no
more complex ECG measurement would have to be applied. Instead, the much
less intrusive sensor of the wrist-worn smart wearable would suffice to conduct
HRV analysis and use the calculated features for driver drowsiness detection.
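To make the notion of HRV features computed from RR intervals more concrete, the following minimal sketch derives a few common time-domain measures. The choice of features (mean heart rate, SDNN, RMSSD, pNN50), the function name, and the RR values are assumptions for illustration and do not reproduce the feature set used in the experiment.

```python
from math import sqrt
from statistics import mean, stdev

def hrv_features(rr_ms):
    """Compute simple time-domain HRV features from RR intervals in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]   # successive differences
    return {
        "mean_hr_bpm": 60000 / mean(rr_ms),              # average heart rate
        "sdnn_ms": stdev(rr_ms),                         # overall variability
        "rmssd_ms": sqrt(mean([d * d for d in diffs])),  # short-term variability
        "pnn50": sum(abs(d) > 50 for d in diffs) / len(diffs),
    }

# Hypothetical RR intervals from one analysis window of a wristband.
window = [800, 850, 790, 900, 820]
for name, value in hrv_features(window).items():
    print(f"{name}: {value:.2f}")
```

A feature vector of this kind, computed per window, could then serve as the input of a supervised classifier such as KNN or a random forest.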
If the results of UDT and UIT are compared with one another, the UDT results
are significantly higher. This reflects the considerable influence
of inter-driver variance. Due to the lower number of drowsy instances, the
classification of this class turned out to be particularly critical in the UIT for
both devices. With a considerably more extensive and balanced data set, this could
probably have been better accounted for, since oversampling of the data
set during training could not eliminate this problem either. However, from the
customer's point of view, it is also essential to detect the non-drowsy state
correctly. The
driver does not want to be irritated by false drowsiness warnings and would
probably switch off the drowsiness detection system. Creating a meaningful
database is one of the major challenges that need to be addressed in
future research and industry before a robust commercial warning system can
be developed with a generalized model and a high ability to adapt to new and
unseen data. Since people behave very differently in certain physical states, the
system could also transition from a user-independent model to a user-dependent
one and adapt to the user to increase detection performance. To this end, the
drivers' feedback about their drowsiness state could be requested while driving
and applied for offline training of the drowsiness detection model, e.g., in the
cloud. With over-the-air (OTA) updates, which more and more car manufacturers
use, model performance could be steadily increased. Instead of the request,
a further possibility would be to have the driver either confirm or correct the
detected drowsiness in defined time intervals.
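The difference between the two test types lies in how the data are split. A user-independent test can be sketched as a leave-one-subject-out split, in which all samples of one driver are held out for testing. This is one common way to realize such a test and is assumed here for illustration; the subject identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# One entry per feature window; the value names the driver it came from.
subjects = ["p1", "p1", "p2", "p3", "p3", "p3"]
for held_out, train, test in leave_one_subject_out(subjects):
    print(held_out, "train:", train, "test:", test)
```

Because no window of the held-out driver is seen during training, the resulting accuracy reflects inter-driver variance, which a user-dependent split cannot reveal.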
Due to the increasing demand for smart wearables, more and more manufacturers
are entering this market. Therefore, in the second experiment (see Section
4.2), different wearables were compared with one another. Heart rate was used
as the physiological parameter for the detection of drowsiness. Previous studies
show that heart rate varies significantly between wakefulness and sleep [10, 30].
Heart rate is the standard physiological parameter measured by the majority
of consumer-grade wearable devices. The question arises whether it is also
possible to infer driver drowsiness from it.
The automated drive in study 2 served as the database. Since the focus in
this case is on comparing different wearables and machine learning models for
different levels of drowsiness, the investigations were carried out purely as
UDTs. High accuracies were achieved in
both 2- and 3-level classifications of drowsiness. However, as can be seen from
the results, successful detection strongly depends on the classifier type. As
in the previous experiment, KNN, RF, and RT again resulted in the highest
accuracies. In general, the three devices used show very similar results for all
tested classifiers. Therefore, different wearable devices could be considered
for later in-vehicle usage. However, it is noteworthy that one device, in some
cases, achieved marginally poorer results than the other two devices. This device
provides a heart rate value about every three seconds across all subjects,
whereas the other devices deliver a value every second. This could probably
be attributed to a different analysis of the PPG signal or a higher sensitivity
towards external influences (vibrations, strong movements). The reduced number
of data points may have negatively affected the expressiveness of some of
the extracted features and thus the classification accuracy. It can be deduced
that better results could be achieved for the present case by sampling the heart
rate with a higher frequency. However, this finding needs to be investigated
with other wrist-worn wearables and larger data sets. The
differences between the results of 2-level and 3-level classification are relatively
small, with slightly better performance in the 2-level case. Moreover, in the
3-level classification, the differences between the devices are smaller than in the
2-level case. Depending on the concrete use case, this speaks for the use of
wrist-worn wearables in both cases. In general, by using solely heart rate from
consumer-grade wearables as input for drowsiness detection, promising results
could be achieved. However, in this experiment, the detection performance of
the tested classifiers was evaluated in the form of a user-dependent test applying
10-fold stratified cross-validation. As in the previous experiment, the
best-performing models have to be tested in user-independent tests (UIT) to be able
to assess to what extent they are capable of generalizing to new data.
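The stratified cross-validation mentioned above keeps the ratio of drowsy to non-drowsy instances roughly equal in every fold. A minimal sketch of such a split is given below; the function name, the seed, and the label distribution are assumptions for illustration and not the tooling used in the experiment.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds that preserve the
    class proportions of `labels` as closely as possible."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)      # deal each class round-robin
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Hypothetical imbalanced data set: 80 non-drowsy (0), 20 drowsy (1) windows.
labels = [0] * 80 + [1] * 20
for train, test in stratified_k_fold(labels, k=10):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Since windows of the same driver end up in both the training and the test folds, this scheme corresponds to a user-dependent test and says little about generalization to unseen drivers.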
Another aspect that was investigated in the context of the model development
for driver drowsiness detection, but is not the main focus for answering this
research question, is the ground truth for drowsiness. In the first experiment,
observer ratings combined with detected micro-sleep events served as ground
truth for drowsiness. In the second one, self-ratings were applied. In both
experiments, the ratings represented intervals of five minutes in length. The
question arises whether it is sufficient to use drowsiness ratings that apply to
5-minute intervals, whether ratings at much shorter intervals are needed, or
whether a different ground truth for drowsiness should be considered altogether.
In previous research on driver drowsiness detection, the acquisition of ground
truth is, in most cases, tailored to the respective study. So far, a uniform
process and drowsiness scale, general guidelines, or recommendations that would
make different works comparable with each other are still missing. In the course of this
Ph.D. thesis, results from a complexity analysis (see Section 4.3) show that in
the case of a decreased rating frequency, a ground truth of almost comparable
quality can be determined, independent of a changing number of drowsiness
levels. It can be deduced that drowsiness represents a rather slowly changing
state, and therefore longer rating intervals are sufficient. Deriving
a standard process for collecting a reliable ground truth for drowsiness from
the obtained results alone is a complex undertaking. However, the obtained results
can serve as recommendations for further research on this topic.
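The effect of a decreased rating frequency can be illustrated with a small sketch that keeps only every n-th rating, carries it forward over the skipped intervals, and measures the agreement with the full rating sequence. This is a simplified illustration under assumed data; it does not reproduce the complexity analysis of Section 4.3.

```python
def hold_every_nth(ratings, n):
    """Keep every n-th rating and carry it forward over skipped intervals."""
    return [ratings[(i // n) * n] for i in range(len(ratings))]

def agreement(a, b):
    """Fraction of intervals in which two rating sequences coincide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical drowsiness levels (1-3), one rating per 5-minute interval.
full = [1, 1, 1, 2, 2, 2, 3, 3, 3]
for n in (2, 3):
    reduced = hold_every_nth(full, n)
    print(n, reduced, round(agreement(full, reduced), 2))
```

For a slowly changing state, the reduced sequence deviates from the full one only around the few intervals in which the level changes, which is the intuition behind the statement that longer rating intervals can be sufficient.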
Concerning the approaches in previous works on driver drowsiness detection
using wrist-worn wearable devices, a one-to-one comparison with the methodology
proposed in this work is not possible because of the use of different data sets
and inputs for the detection. However, some general statements can be made.
The involvement of multiple sensors makes the systems more intrusive [114]
and complex [144], whereas the proposed method only uses a wristband. In
other studies, new wrist-worn wearables for recording physiological data were
developed [129, 103, 101], or data were applied that only very few wearable devices
available on the market currently provide [146]. In contrast to the devices used
in this work, these devices are not yet ready for the market, making future
and large-scale use in the vehicle difficult. The use of motion sensors in the
wearables to evaluate the steering behavior will be more challenging in the
future [142, 143, 145, 147] since the degree of automation will steadily increase.
Using vital data from smart wearables as the sole data source, continuously
measured and streamed to the vehicle, could solve this problem.
The proposed methodology was implemented to show and discuss the feasi-
bility of using solely physiological data from a wrist-worn wearable device for
driver drowsiness detection in combination with a supervised machine learning
classier. In general, the results indicate that drowsiness can be derived by
applying vital data (HRV, heart rate) from wrist-worn smart wearable devices
(RQ2), and its detection in an automotive context is feasible. Open challenges
and issues were discussed and highlighted and can serve as a starting point for
further research in this area.
6.3 Acceptance of Drowsiness Detection Systems based on Smart Wearables (RQ3)
With RQ3 (Are driver drowsiness detection systems based on smart
wearables accepted and how to further enhance their acceptance and
thus integration in the vehicle?), it was examined whether the in-vehicle
usage of systems based on wearable devices for safety-critical tasks such as
driver drowsiness detection is accepted and how this acceptance can be further
enhanced. So far, systems in related work were mainly evaluated regarding
detection performance. However, even if high detection accuracies for driver
drowsiness can be achieved with the in-vehicle usage of smart wearables, the
driver is to a certain extent forced to use and wear such a device while driving.
A portable prototype for a driver drowsiness detection system based on a
wrist-worn wearable device was proposed for this purpose. It was argued that a high
user experience and technology acceptance as well as significant differences regarding
the drivers' age could be obtained after first-time use of the system.
The results of the conducted simulator study showed that, in general, a high
level of user experience and technology acceptance could be ascertained after
using the system for the first time, with higher ratings among the older
subjects. Nearly all participants mentioned that they found the idea of the system
exciting, useful, and easy to understand. Thus, the hypotheses on high user
experience and high technology acceptance could be accepted. Both age groups
found the system easy to understand and use. With regard to significant
differences due to age, the hypotheses postulated could only be partially
accepted, as the significance did not apply to
all subscales of the questionnaires used. The older study participants found
the product more innovative, more appealing, and more interesting than
the younger participants. Therefore, to increase acceptance among
younger subjects, special attention must be paid to the user interface design
to make the system more appealing. Moreover, the older participants found
the system more useful and showed a more positive attitude towards the system and
a stronger intention to use it. Older subjects are a potential target group for automated
driving. Despite possible health problems, they want to stay mobile, which
possibly leads them to ascribe higher usability to the system and motivates them to use it.
A non-intrusive system, consisting of an easy-to-use mobile application and a
wrist-worn wearable, does not restrict drivers while driving in a partially
automated vehicle and can also give them a feeling of security and confidence.
Especially the older participants appreciated that the wearable device could
be worn like a watch and thus in a familiar way. During the interviews, the
participants mentioned several other aspects that would enhance acceptance:
The wearable should be sold directly with the vehicle as a complete
package. If the vehicle is retrofitted with this feature, the wearable could be
purchased at that point. Further, it should also be possible to charge the device in the vehicle.
The user manual for operating the wearable and generally using the system
was of great importance, especially for the older participants. They might have
less affinity for new technologies than younger people. Even though smart
wearables are becoming more and more established in society, the results of
the post-questionnaires of studies 2 and 3 showed that, in general, and also among
the young subjects, only a few had used a wearable before or intended to
buy one in the near future. Therefore, it will be decisive how automobile manufacturers
draw attention to themselves in this regard to enhance the acceptance of
potential customers for using this type of system in the vehicle. Different aspects
and questions should be considered in this context, depending on the specific
business case: Will the wearable be sold directly with the vehicle? Can the
device be used privately and outside the vehicle for other activities, or is it
only intended and designed for in-vehicle usage? Is it only used for drowsiness
detection inside the vehicle? Which other states can be detected by using
physiological data from wearable devices inside the vehicle? Which wearables can
generally be used in the vehicle for that purpose? For which manufacturers'
wearable devices is an interface in the vehicle available? If the wearable
is not sold with the car and a more general solution is preferred, how can the
market for wearable devices be covered in the best possible way, i.e., should
interfaces only be available for the manufacturers of the currently most popular
wearables?
With regard to the evaluation of the detection performance of the machine
learning model integrated into the application, similar results and tendencies
emerged as in the experiments discussed above, especially in terms of the UIT.
The results again show the strong influence of inter-driver variance, with
accuracies from 50% to 100%. Therefore, a single model cannot be applied to detect
drowsiness reliably for everyone. The results also show that the prediction of
drowsy instances is more challenging than that of non-drowsy ones. Nevertheless,
it should be emphasized that the Random Forest model used was trained with
data from only 30 test subjects and tested in the context of study 3 with
completely unseen data. A detection accuracy of over 80% could be achieved. As
discussed for RQ2, there are numerous approaches that can be taken into
account in this context in order to increase the accuracy further. Apart from the
general acceptance and user experience of drowsiness detection systems based
on wearables, the performance and the correct detection of drowsiness will
contribute significantly to the acceptance of the whole system and its further
usage.
The knowledge gained and the different aspects discussed can serve as a reference
for other researchers and car manufacturers to start developing their own
applications and systems for drowsiness detection using wrist-worn wearable devices
with high acceptance and user experience.
Overall, the discussed results show that it is feasible to detect drowsiness purely
with physiological data from consumer-grade wrist-worn wearables combined
with supervised machine learning. The focus should not only lie on the pure
development of the models but also on preconditions, such as external or human
factors, which can be used to adapt the systems to increase their performance.
Furthermore, it is essential to pay attention to how these systems are designed
and advertised to make them equally attractive for everyone across the
different age groups. It became evident that there are still many aspects and
open challenges that need to be further researched, mainly related to the
topics examined in this doctoral thesis and topics that relate to research on driver
drowsiness detection in general. These include, e.g., the focus on more realistic
driving studies, the collection of a balanced and sufficiently large database for
model development, and the establishment of a uniform ground truth for
drowsiness. Thereby, the knowledge gained in this Ph.D. thesis can serve as a starting
point for further research.
The requirements that international institutions place on automobile manufacturers
for driver state monitoring have changed and tightened with the ongoing
progress in automated driving. Safety on the roads needs to be increased by
further reducing fatal accidents based on risk factors such as driver drowsiness.
6.4 Further Deployment Scenarios for Drowsiness Detection Systems based on Smart Wearables
The findings of this work primarily contribute to the improvement of driver
drowsiness detection systems and traffic safety on the way to the complete
automation of the driving task. However, beyond (automated) driving, the
automotive industry, and car manufacturers, other areas are conceivable in
which drowsiness represents a major risk factor and intelligent detection
systems based on wrist-worn wearable devices could be applied.
Areas in which systems for recognizing drowsiness and fatigue have been
investigated in recent years are aviation [219, 220] and maritime operations
[221, 222]. Existing aircraft and ships could be retrofitted, like vehicles, to
warn pilots and ship captains when the system detects drowsiness.
Another application area is Industry 4.0, which focuses on the following goals
and design principles: interconnection, information transparency, technical
assistance, and decentralized decisions [223]. For wearable devices and
drowsiness detection, interconnection and technical assistance are particularly
relevant. Interconnection means that people, devices, and sensors are connected
and can communicate, e.g., via the Internet of Things (IoT). Technical
assistance means that humans are supported in difficult or dangerous tasks. By
networking the work environment in Industry 4.0 with intelligent wearables,
breaks can be requested automatically, and machines or processes can be slowed
down or even stopped to protect workers when drowsiness is detected during
dangerous tasks.
The areas of application mentioned are intended to show that wearable devices
and the physiological data they record offer a wide variety of possibilities,
which will grow in the future through improved sensors and connectivity.
7 Conclusion
As the automation of the driving task progresses, new challenges arise for car
manufacturers. Notably, the interaction of the car with the driver, who forms
the fallback for the automated system in the lower levels of automation, is one
of the essential aspects. How can it be ensured that the driver can take over
complete control from the vehicle within an acceptable time frame after a
take-over request (TOR)? To guarantee this, reliable driver state monitoring
and detection systems are moving more and more to the fore. In general, the
framework conditions for driver state monitoring have changed and tightened for
automobile manufacturers. To underline the importance of these systems,
international institutions have integrated them into their programs. Based on
the General Safety Regulation of the European Union (EU), a system for driver
drowsiness detection will be legally binding for new vehicle types from July
2022 and for all vehicles to be registered from July 2024 [19]. Further, in the
2025 Roadmap of the European New Car Assessment Program (EuroNCAP), driver
monitoring is listed in the category of primary safety [16]. To overcome the
limitations of existing drowsiness detection systems, and encouraged by the
advancements in smart wearable devices in consumer electronics in recent years,
this work investigated their suitability in an automotive environment,
particularly in the field of driver drowsiness detection for automated driving.
The final chapter concludes the Ph.D. thesis with a summary and presents
recommendations for the design and development of drowsiness detection systems
in the automotive industry and in general. It highlights the main contributions
and addresses limitations and possible future work.
7.1 Summary
In introducing and motivating the topic, the need for systems to identify the
driver's state, especially the risk factor drowsiness, was emphasized. Since
accidents caused by drowsiness still occur very often, international
institutions now stipulate that automobile manufacturers integrate systems of
this type into future vehicles. Both the future legal obligation and the
ongoing driving automation require systems for reliable driver state
monitoring.
Building on this, theoretical basics and a state-of-the-art overview of current
drowsiness detection methods in research and industry were given. Starting with
a general description of the four common measurements for driver drowsiness and
a comparison of their advantages and limitations, the topic was narrowed down.
The focus was placed on driver drowsiness detection using physiological
measures from consumer-grade wrist-worn wearable devices. After the research
gap was shown, the problem statement and the research approach of this thesis
were presented. It was hypothesized that smart wearables can be applied to
detect drowsiness in an automotive environment. In the context of this
hypothesis, it was examined which preconditions have to be considered when
developing drowsiness detection systems (RQ1), whether it is possible to detect
drowsiness with vital data from wearable devices (RQ2), and how well drowsiness
detection systems based on smart wearables are accepted and how this acceptance
can be further enhanced (RQ3).
Based on previous studies from related work and to examine RQ1 and RQ2, a study
setting was presented and carried out in a driving simulator and on a test
track. The results indicate that various preconditions can be considered to
adapt and personalize driver drowsiness detection systems and to increase their
performance (RQ1). Different configurations of drowsiness detection systems,
e.g., for different times of the day, driving modes, and age groups, can be
developed to ensure safe driving and to avoid crashes caused by driver
drowsiness in the best possible way. Experiments followed in which the
development and potential of drowsiness detection models based on physiological
data from wearable devices were investigated (RQ2). In general, the promising
results indicate that drowsiness can be inferred from vital data from
wrist-worn smart wearable devices and that its detection in an automotive
context is feasible. Open challenges and issues were discussed and highlighted
and can serve as a starting point for further research in this area. Building
on the previous results, a portable prototype for real-time driver drowsiness
detection based on a smart wearable was presented and evaluated in a third user
study. The results showed a high acceptance of the idea of this kind of system
(RQ3). However, it will be important for automobile manufacturers to promote
such systems in the future so that potential customers accept and use them in
the vehicle.
Finally, the results were discussed in light of the postulated hypothesis and
the three research questions, and aspects were presented that need to be
tackled in future research.
7.1.1 Recommendations for the Design and Development of Drowsiness Detection Systems
Based on the results obtained in this work, recommendations for the design
and development of drowsiness detection systems are summarized.
7.1.1.1 Preconditions for the Adaptation of Driver Drowsiness Detection Systems
- Use time on task, i.e., driving time, as input for the drowsiness detection
  system, especially when ADAS are activated.
- Use the system status (activated/deactivated) of ADAS (e.g., ACC or LKA),
  i.e., the driving mode, as input for adapting the drowsiness detection
  system, e.g., by issuing earlier warnings when ADAS are activated.
- Use the age of the driver as input for adapting the drowsiness detection
  system. Depending on age, the time of a warning can be adjusted, with earlier
  warnings for older drivers.
- Use the time of the day as input for adapting the drowsiness detection
  system, e.g., by issuing earlier warnings at times of the day when humans are
  more prone to becoming drowsy.
- Use different detection models for different age groups and driving modes.
- Conduct drowsiness studies in more realistic environments (e.g., test track)
  to better imitate realistic traffic scenarios.
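The preconditions listed above could feed a simple rule that shifts the warning threshold of a detection model. The following Python sketch is only an illustration: the names DrivingContext and warning_threshold, the base threshold, the step size, the age boundary, the time windows, and the time-on-task limit are all assumptions made for this example and are not values derived in this thesis.

```python
from dataclasses import dataclass


@dataclass
class DrivingContext:
    """Preconditions available to the detection system (hypothetical structure)."""
    minutes_on_task: float  # driving time so far
    adas_active: bool       # driving mode, e.g., ACC or LKA activated
    driver_age: int
    hour_of_day: int        # 0-23


def warning_threshold(ctx: DrivingContext, base: float = 0.70, step: float = 0.05) -> float:
    """Return the drowsiness probability above which a warning is issued.

    A lower threshold means an earlier warning. All numeric values are
    placeholders, not results from the studies.
    """
    threshold = base
    if ctx.adas_active:                  # earlier warnings when ADAS are activated
        threshold -= step
    if ctx.driver_age >= 60:             # assumed boundary for the older age group
        threshold -= step
    if ctx.hour_of_day < 6 or 13 <= ctx.hour_of_day < 16:  # assumed drowsiness-prone times
        threshold -= step
    if ctx.minutes_on_task > 30:         # assumed time-on-task limit
        threshold -= step
    return max(threshold, 0.40)          # assumed lower bound to limit false alarms


# Example: a young driver in manual mode vs. an older driver with ADAS after lunch.
manual_morning = DrivingContext(minutes_on_task=10, adas_active=False, driver_age=25, hour_of_day=10)
adas_afternoon = DrivingContext(minutes_on_task=45, adas_active=True, driver_age=70, hour_of_day=14)
```

Keeping the adaptation separate from the classifier is one possible design choice: the same model output can then be interpreted differently per driving mode or age group, as an alternative to training separate models.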
7.1.1.2 Model Development for Driver Drowsiness Detection Systems using Vital Data from Smart Wearables
- Drowsiness detection using heart rate and HRV data from wrist-worn wearable
  devices is feasible.
- To counteract inter-driver variance, a meaningful, i.e., extensive and
  balanced, data set is required to develop robust models that can generalize
  to unseen data.
- Development of user-adaptive systems to increase detection performance:
  transition from user-independent to user-dependent models, e.g., via
  cloud-based training and over-the-air (OTA) updates.
- The focus should lie not only on the detection of drowsiness but also on the
  detection of alertness/wakefulness. Too many false alarms can, in the worst
  case, lead to the user deactivating the system.
- The wearable device should deliver the heart rate at regular and short time
  intervals. Longer or irregular intervals can result in lower detection
  accuracy.
- Drowsiness can be classified in both a 2-level and a 3-level scenario with
  physiological data from wrist-worn smart wearables.
- Ground truth collection: Observer ratings at high frequencies (1-minute
  intervals) are not necessary. Drowsiness is a rather slowly changing state;
  therefore, longer rating intervals (up to five minutes in this work) are
  sufficient and deliver almost identical results.
- Ground truth collection: Drowsiness self-ratings can be recorded directly
  after the drive, since this does not have an alerting effect on the driver
  and the differences from ratings collected during the drive are marginal.
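The recommendation to consider alertness as well as drowsiness can be made concrete by reporting per-class recall instead of overall accuracy alone. The sketch below is an illustration in Python; the function name per_class_recall and the example labels are assumptions and not part of the evaluation carried out in this work.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return, for each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical 2-level example with an unbalanced data set.
y_true = ["alert"] * 8 + ["drowsy"] * 2
y_pred = ["alert"] * 6 + ["drowsy"] * 2 + ["drowsy"] + ["alert"]

recall = per_class_recall(y_true, y_pred)
false_alarm_rate = 1.0 - recall["alert"]  # share of alert samples flagged as drowsy
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this made-up example, the overall accuracy is 0.7 although only half of the drowsy samples are detected and a quarter of the alert samples would trigger a false alarm. The same function also works for a 3-level scenario, since it makes no assumption about the number of classes.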
7.1.1.3 Acceptance of Drowsiness Detection Systems based on Smart Wearables
- Special attention must be paid to the design of the user interface of the
  drowsiness detection system in order to make it appealing for all age groups.
- A charging facility for wearables in the vehicle is recommended.
- A user manual for operating the wearable and for using the system in general
  is of great importance, especially for older people, who might have less
  affinity for new technologies than younger people.
- It is recommended to sell the wearable device directly with the vehicle. A
  universal interface for potentially all devices available on the market is
  challenging to implement. In addition, the quality and type of physiological
  data must meet the requirements of the system, which might not be ensured by
  all wearable devices.
7.2 Limitations and Future Work
In addition to the individual limitations discussed in the presented studies
and experiments, the main limitations are summarized in the following sections,
and recommendations for future work are derived.
7.2.1 Study Settings
Certain precautions were taken to induce drowsiness in all three studies. These
included no communication with the experimenter, no food, no caffeinated drinks
five hours before the study, a monotonous driving route with a low speed limit,
and a warm temperature inside the simulator and test vehicle. In real traffic,
drowsiness would generally occur later than in the simulator and on the test
track, since the possible dangers in these two settings are negligible compared
to real traffic. Therefore, the results obtained may not have absolute validity
and cannot be fully transferred to real-world scenarios. Simulators lack realism
due to potentially inadequate movements or fields of view or a possibly poor
graphical representation of the simulation environment. Especially in the
simulator, the barrier to falling asleep is lower for the participants since
they are not confronted with severe and realistic consequences. Although a more
realistic follow-up study was conducted on a test track in the course of this
thesis, a similar study should also be carried out in real traffic. Moreover,
longer drives, possibly sleep-deprived participants, and night-time drives
should be considered to collect a data set that is more evenly distributed
between drowsy and non-drowsy samples.
Regarding participants, the focus in these studies was on the comparison of two
age groups selected based on their recommended average sleep requirements. The
validity of the observed effects needs to be strengthened with a higher number
of subjects. Further, participants from other age groups should be considered.
Another limiting factor is the selection process of participants. Whereas the
older participants were recruited via an advertisement in a local newspaper,
the younger participants were mainly students from the Technische Hochschule
Ingolstadt (THI), who possibly have a different view on technology or similar
aspects. This should be considered when interpreting the obtained effects, and
future studies should be conducted with participants who better represent the
general population.
7.2.2 Model Development with Data from Wearable Devices
Due to inter-driver variance and an unevenly balanced training data set with
only 30 participants from two age groups, the machine learning models could not
predict drowsiness for all users with the same accuracy and reliability in the
user-independent tests. To develop generalized solutions, a balanced and
sufficiently large database needs to be collected in future research. Further,
different and more objective types of ground truth for drowsiness should be
considered. In this work, heart rate and HRV were applied for driver drowsiness
detection. Other physiological parameters that can be measured with wearable
devices should be applied in the future. The different experiments showed that
it is feasible to apply physiological data from smart wearables to detect
driver drowsiness. However, since only a small number of different devices were
used, the results obtained may not be transferable to all devices available on
the market. Therefore, more devices from different manufacturers need to be
considered and compared in future work.
Further, the applied models were used with the default parameters preset in the
Weka machine learning library; no hyperparameters were tuned. The aim was to
identify which standard machine learning models are suitable for the proposed
classification problem. In future research, fine-tuning the hyperparameters of
the most promising classifiers could increase performance. Since this work
focused on one specific type of feature extraction (sliding window), feature
selection (CFSS), and class balancing (SMOTE), other methods should be
considered and compared. Moreover, rule-based, unsupervised, or deep learning
approaches should be considered in future research.
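To illustrate the sliding-window feature extraction mentioned above, the following Python sketch derives a few common time-domain features from heart rate samples. It is a minimal example under stated assumptions: RR intervals are only approximated from the heart rate, and the window size, step, function names, and chosen features (mean heart rate, SDNN, RMSSD) are illustrative and not necessarily the configuration used in the experiments, which relied on Weka.

```python
import math
from statistics import mean, pstdev


def hr_to_rr_ms(hr_bpm):
    """Approximate RR intervals (ms) from heart rate samples (beats per minute)."""
    return [60000.0 / hr for hr in hr_bpm]


def sliding_windows(samples, size, step):
    """Yield overlapping windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(rr_ms):
    """Compute mean heart rate, SDNN, and RMSSD for one window (at least 2 intervals)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_hr": mean([60000.0 / rr for rr in rr_ms]),
        "sdnn": pstdev(rr_ms),                             # population standard deviation
        "rmssd": math.sqrt(mean([d * d for d in diffs])),  # root mean square of successive differences
    }


# Hypothetical input: one heart rate value per second.
heart_rate = [60, 62, 61, 63, 60, 59, 61, 62, 60, 61]
features = [
    window_features(hr_to_rr_ms(window))
    for window in sliding_windows(heart_rate, size=4, step=2)
]
```

Each resulting feature dictionary corresponds to one window and could then be passed on to feature selection, class balancing, and a classifier. Comparing other window sizes or feature sets would only require changing the arguments or extending window_features.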
7.3 Contributions
In the following, the core findings and individual contributions to the field
of driver drowsiness detection are briefly summarized.
- Provision of an overview and summary of current driver drowsiness detection
  methods and future challenges in the ongoing automation of the driving task
  ([1]).
- To bridge the gap between simulator studies and experiments in real traffic,
  a study setting for drowsiness was presented that approaches a more realistic
  scenario with safe and reproducible conditions that minimize risk and danger
  for the participants ([6]).
- Investigation of preconditions that influence drowsiness and can be applied
  for the adaptation and personalization of driver drowsiness detection systems
  and the modeling of different user groups ([2], [3], [6]).
- The potential and feasibility of applying physiological data from wrist-worn
  smart wearable devices for the detection of driver drowsiness were
  investigated. Novel insights and future challenges were presented that can be
  applied for the development of novel driver-vehicle interaction concepts for
  driver state monitoring ([4], [5], [8]).
- In the course of a complexity analysis, recommendations and possible
  guidelines for collecting a reliable and valid ground truth for driver
  drowsiness were presented ([7]).
- A portable prototype for a driver drowsiness detection system based on a
  wrist-worn wearable device was proposed and evaluated in a user study in
  terms of technology acceptance, user experience, and detection performance
  ([9]).
A Publications and Contribution Statement
The following publications were published in the context of this doctoral
thesis:
[1] Kundinger, T., Riener, A. & Sofra, N. (2017). A Robust Drowsiness
Detection Method based on Vehicle and Driver Vital Data. In Mensch und
Computer 2017 - Workshopband.
My contribution: I conducted the literature review and authored most of the
work.
Contribution of co-authors: Andreas Riener came up with the initial idea,
provided feedback and co-authored the paper. Nikoletta Sofra commented on the
manuscript and provided feedback on the project.
[2] Kundinger, T., Riener, A., Sofra, N. & Weigl, K. (2018). Drowsiness
Detection and Warning in Manual and Automated Driving: Results from Subjective
Evaluation. In Proceedings of the 10th International Conference on Automotive
User Interfaces and Interactive Vehicular Applications (pp. 229-236).
My contribution: I developed the initial idea and the study design for this
work in joint discussions with Andreas Riener. I implemented the driving
scenario and the application on the tablet. Further, I conducted the whole
experiment and carried out all evaluations and analyses except the statistical
analysis of drowsiness self-ratings. I authored most parts of the paper.
Contribution of co-authors: Klemens Weigl performed the statistical analysis
of drowsiness self-ratings and commented on its textual description. Andreas
Riener gave recommendations for the evaluation, commented on the manuscript,
and provided feedback on the project. Nikoletta Sofra commented on the
manuscript and provided feedback.
[3] Kundinger, T., Wintersberger, P. & Riener, A. (2019). (Over)trust in
Automated Driving: The Sleeping Pill of Tomorrow? In Extended Abstracts of the
2019 CHI Conference on Human Factors in Computing Systems (pp. 1-6).
My contribution: This work is based on the user study presented in [6]. The
initial idea came up in joint discussions with Philipp Wintersberger. I
conducted the whole experiment and pre-processed the data for statistical
evaluation. I authored most parts of the paper.
Contribution of co-authors: Philipp Wintersberger performed the statistical
evaluation and co-authored the paper. Andreas Riener commented on the
manuscript and provided feedback on the project.
[4] Kundinger, T., Yalavarthi, P. K., Riener, A., Wintersberger, P. &
Schartmüller, C. (2020). Feasibility of Smart Wearables for Driver Drowsiness
Detection and its Potential Among Different Age Groups. International Journal
of Pervasive Computing and Communications (IJPCC), 16(1), 1-23.
My contribution: This work is partly based on the user study presented in [2].
Phani Krishna Yalavarthi and I performed all evaluations and analyses as well
as most parts of the writing.
Contribution of co-authors: Phani Krishna Yalavarthi conducted the pre-study.
Andreas Riener, Philipp Wintersberger, and Clemens Schartmüller gave
recommendations on the study design, provided feedback for evaluation, and
commented on the manuscript.
[5] Kundinger, T., Sofra, N. & Riener, A. (2020). Assessment of the Potential
of Wrist-worn Wearable Sensors for Driver Drowsiness Detection. Sensors, 20(4).
My contribution: This work is based on the user study presented in [2]. All
evaluations and analyses carried out in this work were performed by me. I
authored most parts of the paper.
Contribution of co-authors: Andreas Riener and Nikoletta Sofra commented on
the manuscript and provided feedback.
[6] Kundinger, T., Riener, A., Sofra, N. & Weigl, K. (2020). Driver Drowsiness
in Automated and Manual Driving: Insights from a Test Track Study. In
Proceedings of the 25th International Conference on Intelligent User Interfaces
(pp. 369-379).
My contribution: I developed the initial idea and the study design for this
work in joint discussions with Andreas Riener and implemented the application
on the tablet. Further, I conducted the whole experiment and carried out all
evaluations and analyses except the statistical analysis of drowsiness
self-ratings. I authored most parts of the paper.
Contribution of co-authors: Klemens Weigl performed the statistical analysis
of drowsiness self-ratings and commented on its textual description. Andreas
Riener gave recommendations for the evaluation, commented on the manuscript,
and provided feedback on the project. Nikoletta Sofra commented on and edited
the manuscript.
[7] Kundinger, T., Mayr, C. & Riener, A. (2020). Towards a Reliable Ground
Truth for Drowsiness: A Complexity Analysis on the Example of Driver Fatigue.
In Proceedings of the ACM on Human-Computer Interaction, 4 (EICS).
My contribution: This work is based on the user study presented in [2]. I
developed the ideas for evaluation and analyses in joint discussions with
Celena Mayr. I authored most of the work.
Contribution of co-authors: Celena Mayr conducted the evaluations and
statistical analyses presented in the paper in her bachelor's thesis supervised
by Andreas Riener and me. Andreas Riener commented on the manuscript and
provided feedback.
[8] Kundinger, T. & Riener, A. (2020). The Potential of Wrist-worn Wearables
for Driver Drowsiness Detection: A Feasibility Analysis. In Proceedings of the
28th ACM Conference on User Modeling, Adaptation and Personalization
(pp. 117-125).
My contribution: This work is based on the user study presented in [6]. I
performed all evaluations and analyses presented in this work. I authored most
of the work.
Contribution of co-authors: Andreas Riener commented on the manuscript and
provided feedback.
[9] Kundinger, T., Bhat, R. & Riener, A. (2021). Performance and Acceptance
Evaluation of a Driver Drowsiness Detection System based on Smart Wearables. In
Proceedings of the 13th International ACM Conference on Automotive User
Interfaces and Interactive Vehicular Applications, under review.
My contribution: I developed the initial idea and the study design for this
work in joint discussions with Ramyashree Bhat and Andreas Riener and
implemented the driving scenario. Since this study was conducted in
Ramyashree's master's thesis, the majority of evaluations was performed by her
after discussing them with me. I authored most of the work.
Contribution of co-authors: In the course of her master's thesis, which I
supervised, Ramyashree Bhat implemented the prototype and performed the
majority of evaluations. She further supported conducting the user study and
co-authored the paper. Andreas Riener gave recommendations for implementing the
prototype and the study setting and commented on the work.
B German Versions of Study Questionnaires and Scales
B.1 Own Questions
- Wie alt sind Sie?
- Welches Geschlecht haben Sie?
- Wann sind Sie heute aufgestanden?
- Wie viele Stunden haben Sie die letzte Nacht geschlafen?
- Wie viele Stunden schlafen Sie gewöhnlich pro Nacht?
- Wie haben Sie allgemein die letzte Nacht geschlafen?
- Sind Sie derzeit in medikamentöser Behandlung?
- Hatten Sie während einer Autofahrt schon mal einen Sekundenschlaf?
- Bei welchem Wert der im Fahrzeug verwendeten Müdigkeitsskala wollen Sie
  eine Warnung erhalten?
- Wie sicher fühlten Sie sich bei der Müdigkeitseinschätzung während der
  Testfahrt?
- Bei welchem Fahrmodus mussten Sie eher mit Müdigkeit kämpfen?
- Besitzen Sie ein Wearable (Smartwatch, Fitnesstracker)?
- Haben Sie bisher schon Erfahrungen mit Smartwatches oder Fitnesstracker
  gesammelt? Wenn nein, haben Sie vor in nächster Zeit ein Wearable
  (Smartwatch, Fitnesstracker) zu kaufen?
- Würden Sie, um die Sicherheit während einer Autofahrt zu erhöhen, vor
  Fahrtbeginn ein Wearable (Fitnesstracker, Smartwatch) anlegen?
- Welche Position am Körper würden Sie zum Tragen des Wearables bevorzugen?
B.2 Epworth Sleepiness Scale (ESS)
- Ich sitze im Stuhl und lese. Die Wahrscheinlichkeit dabei einzuschlafen
  oder einzunicken ist...(null/gering/mittel/hoch).
- Ich schaue Fernsehen. Die Wahrscheinlichkeit dabei einzuschlafen oder
  einzunicken ist...(null/gering/mittel/hoch).
- Ich sitze im Theater oder in einer Versammlung. Die Wahrscheinlichkeit
  dabei einzuschlafen oder einzunicken ist...(null/gering/mittel/hoch).
- Ich bin Mitfahrer in einem Auto, das seit einer Stunde unterwegs ist. Die
  Wahrscheinlichkeit dabei einzuschlafen oder einzunicken
  ist...(null/gering/mittel/hoch).
- Ich lege mich nachmittags zum Ausruhen hin. Die Wahrscheinlichkeit dabei
  einzuschlafen oder einzunicken ist...(null/gering/mittel/hoch).
- Ich unterhalte mich mit jemandem. Die Wahrscheinlichkeit dabei
  einzuschlafen oder einzunicken ist...(null/gering/mittel/hoch).
- Ich sitze nach dem Mittagessen im Sessel. Die Wahrscheinlichkeit dabei
  einzuschlafen oder einzunicken ist...(null/gering/mittel/hoch).
- Ich sitze in einem Auto, das für wenige Minuten an einer Ampel anhält. Die
  Wahrscheinlichkeit dabei einzuschlafen oder einzunicken
  ist...(null/gering/mittel/hoch).
B.3 Karolinska Sleepiness Scale (KSS)
1 | Extrem wach
2 | Sehr wach
3 | Wach
4 | Einigermaßen wach
5 | Weder wach noch müde
6 | Erste Anzeichen von Müdigkeit
7 | Müde, aber keine Probleme wach zu bleiben
8 | Müde, erste Probleme wach zu bleiben
9 | Sehr müde, mit dem Schlaf kämpfend
B.4 Trust Scale
- Das automatisierte Fahrzeug ist irreführend.
- Das automatisierte Fahrzeug verhält sich fehlerhaft.
- Ich bin misstrauisch gegenüber den Absichten, Aktionen oder Ausgaben des
  automatisierten Fahrzeugs.
- Ich bin vorsichtig mit diesem automatisierten Fahrzeug.
- Die Aktionen des automatisierten Fahrzeugs sind schädlich oder gefährlich.
- Ich bin zuversichtlich, was dieses automatisierte Fahrzeug betrifft.
- Das automatisierte Fahrzeug bietet Sicherheit.
- Das automatisierte Fahrzeug zeichnet sich durch Integrität aus.
- Das automatisierte Fahrzeug ist verlässlich.
- Das automatisierte Fahrzeug ist zuverlässig.
- Ich kann dem automatisierten Fahrzeug vertrauen.
- Ich kenne mich mit diesem automatisierten Fahrzeug aus.
B.5 User Experience Questionnaire (UEQ)
unerfreulich | erfreulich
unverständlich | verständlich
kreativ | phantasielos
leicht zu lernen | schwer zu lernen
wertvoll | minderwertig
langweilig | spannend
uninteressant | interessant
unberechenbar | voraussagbar
schnell | langsam
originell | konventionell
behindernd | unterstützend
gut | schlecht
kompliziert | einfach
abstoßend | anziehend
herkömmlich | neuartig
unangenehm | angenehm
sicher | unsicher
aktivierend | einschläfernd
erwartungskonform | nicht erwartungskonform
ineffizient | effizient
übersichtlich | verwirrend
unpragmatisch | pragmatisch
aufgeräumt | überladen
attraktiv | unattraktiv
sympathisch | unsympathisch
konservativ | innovativ
B.6 Technology Acceptance Model (TAM)
- Das System zu benutzen wäre nützlich für mich.
- Das System gibt mir das Gefühl der Kontrolle über meine Aktivitäten.
- Das System würde meine Leistung verbessern.
- Die Information zu finden, die ich brauche, ist einfach in diesem System.
- Zu lernen mit dem System umzugehen, ist einfach.
- Das System ist einfach zu benutzen.
- Mir gefällt die Idee dieses Systems.
- Das System zu benutzen ist ein erfreuliches Erlebnis.
- Das System ist sinnvoll.
- Ich würde das System benutzen wollen.
C German Version of Developed Android Application
[Figure with panels (a) to (f): German version of the developed Android application]
Bibliography
[1] T. Kundinger, A. Riener, and N. Sofra, "A robust drowsiness detection
method based on vehicle and driver vital data," in Mensch und Computer 2017 -
Workshopband, M. Burghardt, R. Wimmer, C. Wolff, and C. Womser-Hacker, Eds.
Regensburg: Gesellschaft für Informatik e.V., 2017.
[2] T. Kundinger, A. Riener, N. Sofra, and K. Weigl, "Drowsiness detection and
warning in manual and automated driving: Results from subjective evaluation,"
in Proceedings of the 10th International Conference on Automotive User
Interfaces and Interactive Vehicular Applications, ser. AutomotiveUI '18. New
York, NY, USA: Association for Computing Machinery, 2018. [Online]. Available:
https://doi.org/10.1145/3239060.3239073
[3] T. Kundinger, P. Wintersberger, and A. Riener, "(Over)trust in automated
driving: The sleeping pill of tomorrow?" in Extended Abstracts of the 2019 CHI
Conference on Human Factors in Computing Systems, ser. CHI EA'19. New York, NY,
USA: Association for Computing Machinery, 2019, pp. 1–6. [Online]. Available:
https://doi.org/10.1145/3290607.3312869
[4] T. Kundinger, P. K. Yalavarthi, A. Riener, P. Wintersberger, and
C. Schartmüller, "Feasibility of smart wearables for driver drowsiness
detection and its potential among different age groups," International Journal
of Pervasive Computing and Communications, vol. 16, no. 1, pp. 1–23, Jan. 2020.
[Online]. Available: https://doi.org/10.1108/IJPCC-03-2019-0017
[5] T. Kundinger, N. Sofra, and A. Riener, "Assessment of the potential of
wrist-worn wearable sensors for driver drowsiness detection," Sensors, vol. 20,
no. 4, 2020. [Online]. Available: https://www.mdpi.com/1424-8220/20/4/1029
[6] T. Kundinger, A. Riener, N. Sofra, and K. Weigl, "Driver drowsiness in
automated and manual driving: Insights from a test track study," in Proceedings
of the 25th International Conference on Intelligent User Interfaces, ser.
IUI'20. New York, NY, USA: Association for Computing Machinery, 2020,
pp. 369–379. [Online]. Available: https://doi.org/10.1145/3377325.3377506
[7] T. Kundinger, C. Mayr, and A. Riener, "Towards a reliable ground truth for
drowsiness: A complexity analysis on the example of driver fatigue," Proc. ACM
Hum.-Comput. Interact., vol. 4, no. EICS, Jun. 2020. [Online]. Available:
https://doi.org/10.1145/3394980
[8] T. Kundinger and A. Riener, "The potential of wrist-worn wearables for
driver drowsiness detection: A feasibility analysis," in Proceedings of the
28th ACM Conference on User Modeling, Adaptation and Personalization, ser.
UMAP'20. New York, NY, USA: Association for Computing Machinery, 2020,
pp. 117–125. [Online]. Available: https://doi.org/10.1145/3340631.3394852
[9] T. Kundinger, R. Bhat, and A. Riener, "Acceptance and performance
evaluation of a driver drowsiness detection system based on smart wearables,"
in Proceedings of the 13th International ACM Conference on Automotive User
Interfaces and Interactive Vehicular Applications, ser. AutomotiveUI'21, 2021,
Under Review.
[10] A. Sahayadhas, K. Sundaraj, and M. Murugappan, "Detecting driver
drowsiness based on sensors: A review," Sensors, vol. 12, no. 12,
pp. 16937–16953, 2012. [Online]. Available:
https://www.mdpi.com/1424-8220/12/12/16937
[11] M. Johns, "Rethinking the assessment of sleepiness," Sleep Medicine
Reviews, vol. 2, no. 1, pp. 3–15, 1998. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1087079298900508
[12] J. D. Slater, "A definition of drowsiness: One purpose for sleep?" Medical
Hypotheses, vol. 71, no. 5, pp. 641–644, 2008. [Online]. Available:
[13] USA Today, "AAA: Drowsy driving plays larger role in accidents than
federal statistics suggest," 2019,
https://eu.usatoday.com/story/news/2018/02/07/aaa-drowsy-driving-plays-larger-role-accidents-than-federal-statistics-suggest/313226002/
(retrieved May 16, 2021).
[14] World Health Organization, "Global Status Report on Road Safety 2018:
Summary," World Health Organization, Tech. Rep. 1, 2018. [Online]. Available:
https://www.who.int/publications/i/item/9789241565684
[15] National Highway Trac Safety Administration and US Department
162
Bibliography
of Transportation, TRAFFIC SAFETY FACTS Critical Reasons

for Crashes Investigated in the National Motor Vehicle Crash
Causation Survey, Tech. Rep., 2015. [Online]. Available: https:
//crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115
[16] EuroNCAP, EuroNCAP 2025 Roadmap, Tech. Rep., 2017. [Online].

Available: https://cdn.euroncap.com/media/30700/euroncap-roadmap-
2025-v4.pdf
[17] B. C. Tefft, Acute Sleep Deprivation and Risk of Motor Vehicle Crash
Involvement, Tech. Rep., 2016, https://aaafoundation.org/wp-content/
uploads/2017/12/AcuteSleepDeprivationCrashRisk.pdf (retrieved May
16, 2021).
[18] T. Åkerstedt, C. Bassetti, F. Cirignotta, D. García-Borreguero,

M. Gonçalves, J. Horne, D. Léger, M. Partinen, T. Penzel, P. Philipp,
and J. C. Verster, Sleepiness at the Wheel. French Motorway
Companies (ASFA) and the National Institute of Sleep and Vigilance
(INSV), 2013. [Online]. Available: https://esrs.eu/wp-content/uploads/
2018/09/Livre_blanc_VA_V4.pdf
[19] European Union, REGULATION (EU) 2019/2144 OF THE EUROPEAN
PARLIAMENT AND OF THE COUNCIL, Tech. Rep., 2019.
[Online]. Available: https://eur-lex.europa.eu/legal-content/EN/TXT/
PDF/?uri=CELEX:32019R2144&from=EN
[20] Society of Automotive Engineers (SAE) International, Taxonomy and

Definitions for Terms Related to Driving Automation Systems for On-Road
Motor Vehicles, 2018. [Online]. Available: https://doi.org/10.4271/J3016_201806
[21] T. Litman, Autonomous Vehicle Implementation Predictions, Tech.

Rep., 2020. [Online]. Available: https://www.vtpi.org/avip.pdf
[22] Jordan Golson, Tesla driver killed in crash with autopilot active, nhtsa
investigating, 2016, https://www.theverge.com/2016/6/30/12072408/
tesla-autopilot-car-crash-death-autonomous-model-s (retrieved May 16,
2021).
[23] Andrew J. Hawkins, Uber's self-driving car showed no signs of slowing

before fatal crash, police say, 2018, https://www.theverge.com/2018/3/
19/17140936/uber-self-driving-crash-death-homeless-arizona (retrieved
May 16, 2021).
[24] L. Bainbridge, Ironies of automation, Automatica, vol. 19, no. 6, pp.
775-779, 1983. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0005109883900468
[25] M. Doudou, A. Bouabdallah, and V. Berge-Cherfaoui, Driver drowsiness

measurement technologies: Current research, market solutions, and
challenges, Int. J. Intell. Transp. Syst. Res., Sep 2019. [Online].
Available: https://doi.org/10.1007/s13177-019-00199-w
[26] Audi AG, Driver assistance systems, 2013, https://www.audi-

mediacenter.com/en/foray-into-the-worlds-largest-market-segment-
the-audi-a3-sedan-and-s3-sedan-3249/driver-assistance-systems-3350
(retrieved May 16, 2021).
[27] Volkswagen AG, Driver Alert System, https://www.volkswagen.co.uk/

technology/car-safety/driver-alert-system (retrieved May 16, 2021).
[28] Daimler AG, ATTENTION ASSIST: Drowsiness-detection system
warns drivers to prevent them falling asleep momentarily,
https://media.daimler.com/marsMediaSite/en/instance/ko.xhtml?oid=9361586
(retrieved May 16, 2021).
[29] Lexus, Lexus Safety System+, https://drivers.lexus.com/lexus-drivers-

theme/pdf/LSS+%20Quick%20Guide%20Link.pdf (retrieved May 16,
2021).
[30] J. Wörle, B. Metz, C. Thiele, and G. Weller, Detecting sleep in

drivers during highly automated driving: the potential of physiological
parameters, IET Intelligent Transport Systems, vol. 13, no. 8, pp. 1241-1248,
2019. [Online]. Available: https://doi.org/10.1049/iet-its.2018.5529
[31] IDC Corporate USA, Worldwide wearables market to top 300 million
units in 2019 and nearly 500 million units in 2023, 2019, https://
www.idc.com/getdoc.jsp?containerId=prUS45737919 (retrieved May 16,
2021).
[32] K. Georgiou, A. V. Larentzakis, N. N. Khamis, G. I. Alsuhaibani,

Y. A. Alaska, and E. J. Giallafos, Can wearable devices accurately
measure heart rate variability? a systematic review, Folia
Med., vol. 60, no. 1, pp. 7-20, 2018. [Online]. Available:
https://doi.org/10.2478/folmed-2018-0012
[33] R. R. Knipling and W. W. Wierwille, Vehicle-based drowsy driver

detection: Current status and future prospects, in Proceedings
of the IVHS AMERICA Conference Moving Toward Deployment, 1994.
[Online]. Available: https://rosap.ntl.bts.gov/view/dot/15347
[34] I. M. Ahmed and M. J. Thorpy, Clinical evaluation of the patient with

excessive sleepiness. Cambridge University Press, 2011, pp. 36-49.
[Online]. Available: https://doi.org/10.1017/CBO9780511762697.006
[35] M. W. Johns, Drowsy Driving and the Law, Law Reform Commission
of Tasmania, no. 12, pp. 2-12, 2007. [Online]. Available:
http://www.mwjohns.com/wp-content/uploads/2017/05/johns_2007_drowsy_driving_and_the_law.pdf
[36] T. Åkerstedt, Shift work disorder and sleepiness. Cambridge University

Press, 2011, pp. 186-203. [Online]. Available:
https://doi.org/10.1017/CBO9780511762697.020
[37] I. D. Brown, Driver fatigue, Human Factors, vol. 36, no. 2,

pp. 298-314, 1994, PMID: 8070794. [Online]. Available:
https://doi.org/10.1177/001872089403600210
[38] M. Poursadeghiyan, A. Mazloumi, G. N. Saraji, A. Niknezhad,

A. Akbarzadeh, and M. H. Ebrahimi, Determination the levels of
subjective and observer rating of drowsiness and their associations with
facial dynamic changes, Iranian J. Public Health, vol. 46, no. 1, pp.
93-102, 2017. [Online]. Available:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5401941/
[39] A. Anund, C. Fors, D. Hallvig, T. Åkerstedt, and G. Kecklund,

Observer rated sleepiness and real road driving: An explorative study,
PLoS ONE, vol. 8, no. 5, pp. 1-8, May 2013. [Online]. Available:
https://doi.org/10.1371/journal.pone.0064782
[40] M. W. Johns, A new method for measuring daytime sleepiness: The

Epworth sleepiness scale, Sleep, vol. 14, no. 6, pp. 540-545, 1991.
[Online]. Available: https://doi.org/10.1093/sleep/14.6.540
[41] M. A. Carskadon, Guidelines for the Multiple Sleep Latency Test

(MSLT): A Standard Measure of Sleepiness, Sleep, vol. 9, no. 4, pp. 519-524,
12 1986. [Online]. Available: https://doi.org/10.1093/sleep/9.4.519
[42] M. M. Mitler, K. S. Gujavarty, and C. P. Browman, Maintenance of

wakefulness test: A polysomnographic technique for evaluating treatment
efficacy in patients with excessive somnolence, Electroencephalography
and Clinical Neurophysiology, vol. 53, no. 6, pp. 658-661, 1982. [Online].
Available: https://doi.org/10.1016/0013-4694(82)90142-0
[43] E. Hoddes, V. Zarcone, H. Smythe, R. Phillips, and W. C. Dement,

Quantification of sleepiness: A new approach, Psychophysiology,
vol. 10, no. 4, pp. 431-436, 1973. [Online]. Available:
https://doi.org/10.1111/j.1469-8986.1973.tb00801.x
[44] D. J. Buysse, C. F. Reynolds, T. H. Monk, S. R. Berman,

and D. J. Kupfer, The Pittsburgh sleep quality index: A
new instrument for psychiatric practice and research, Psychiatry
Research, vol. 28, no. 2, pp. 193-213, 1989. [Online]. Available:
https://doi.org/10.1016/0165-1781(89)90047-4
[45] J. A. Horne and S. D. Baulk, Awareness of sleepiness when

driving, Psychophysiology, vol. 41, no. 1, pp. 161-165, 2004. [Online]. Available:
https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1469-8986.2003.00130.x
[46] K. Kaida, M. Takahashi, T. Åkerstedt, A. Nakata, Y. Otsuka,

T. Haratani, and K. Fukasawa, Validation of the Karolinska sleepiness
scale against performance and EEG variables, Clinical Neurophysiology,
vol. 117, no. 7, pp. 1574-1581, 2006. [Online]. Available:
[47] T. Åkerstedt, A. Anund, J. Axelsson, and G. Kecklund, Subjective

sleepiness is a sensitive indicator of insufficient sleep and impaired
waking function, Journal of Sleep Research, vol. 23, no. 3, pp.
242-254, 2014. [Online]. Available:
https://onlinelibrary.wiley.com/doi/abs/10.1111/jsr.12158
[48] M. Gillberg, G. Kecklund, and T. Åkerstedt, Relations Between

Performance and Subjective Ratings of Sleepiness During a Night
Awake, Sleep, vol. 17, no. 3, pp. 236-241, 05 1994. [Online]. Available:
https://doi.org/10.1093/sleep/17.3.236
[49] T. Åkerstedt and M. Gillberg, Subjective and objective sleepiness in the

active individual, Int. J. Neurosci., vol. 52, pp. 29-37, 1990. [Online].
Available: https://doi.org/10.3109/00207459008994241
[50] V. Weinbeer, T. Muhr, K. Bengler, C. Baur, J. Radlmayr, and

J. Bill, Highly automated driving: How to get the driver drowsy
and how does drowsiness influence various take-over-aspects? in
8. Tagung Fahrerassistenz. Lehrstuhl für Fahrzeugtechnik mit TÜV
SÜD Akademie, 2017. [Online]. Available: https://mediatum.ub.tum.de/
1421309
[51] C. Ahlström, C. Fors, A. Anund, and D. Hallvig, Video-based observer

rated sleepiness versus self-reported subjective sleepiness in real road
driving, Eur. Transp. Res. Rev., vol. 7, no. 4, p. 38, November 2015.
[Online]. Available: https://doi.org/10.1007/s12544-015-0188-y
[52] W. W. Wierwille and L. A. Ellsworth, Evaluation of driver drowsiness

by trained raters, Accident Analysis & Prevention, vol. 26, no. 5, pp.
571-581, 1994. [Online]. Available:
https://doi.org/10.1016/0001-4575(94)90019-1
[53] A. Mashko, Subjective Methods for the Assessment of Driver

Drowsiness, Acta Polytechnica CTU Proceedings, vol. 12, p. 64, dec
2017. [Online]. Available: https://doi.org/10.14311/app.2017.12.0064
[54] D. M. Wiegand, J. McClafferty, S. E. McDonald, and R. J. Hanowski,

Development and Evaluation of a Naturalistic Observer Rating of
Drowsiness Protocol Final Report, The National Surface Transportation
Safety Center for Excellence, 2009. [Online]. Available: https://
scholar.lib.vt.edu/VTTI/reports/ORD_Final_Report_022509.pdf
[55] K. Karrer-Gauß, Prospektive bewertung von systemen zur
müdigkeitserkennung ableitung von gestaltungsempfehlungen zur vermeidung
von risikokompensation aus empirischen untersuchungen, Ph.D.
dissertation, Verkehrs- und Maschinensysteme, Technische Universität
Berlin, Berlin, Germany, 2011. [Online]. Available:
https://depositonce.tu-berlin.de/handle/11303/3482
[56] M. Ramzan, H. U. Khan, S. M. Awan, A. Ismail, M. Ilyas, and

A. Mahmood, A survey on state-of-the-art drowsiness detection
techniques, IEEE Access, vol. 7, pp. 61904-61919, 2019. [Online].
Available: https://doi.org/10.1109/ACCESS.2019.2914373
[57] H. Ueno, M. Kaneda, and M. Tsukino, Development of drowsiness

detection system, in Proceedings of VNIS'94 - 1994 Vehicle Navigation
and Information Systems Conference, Aug 1994, pp. 15-20. [Online].
Available: https://doi.org/10.1109/VNIS.1994.396873
[58] S. Leonhardt, L. Leicht, and D. Teichmann, Unobtrusive vital

sign monitoring in automotive environments-A review, Sensors
(Basel), vol. 18, no. 9, sep 2018. [Online]. Available: https:
//doi.org/10.3390/s18093080
[59] G. Li and W. Y. Chung, Estimation of eye closure degree using EEG

sensors and its application in driver drowsiness detection, Sensors
(Switzerland), vol. 14, no. 9, pp. 17491-17515, 2014. [Online]. Available:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4208235/
[60] D. F. Dinges and R. Grace, PERCLOS: A valid psychophysiological
measure of alertness as assessed by psychomotor vigilance, US Department
of Transportation, Federal Highway Administration, Publication Number
FHWA-MCRT-98-006, 1998. [Online]. Available:
https://rosap.ntl.bts.gov/view/dot/113/dot_113_DS1.pdf
[61] T. A. Dingus, H. Hardee, and W. W. Wierwille, Development of

models for on-board detection of driver impairment, Accident Analysis
& Prevention, vol. 19, no. 4, pp. 271-283, 1987. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0001457587900625
[62] T. P. Nguyen, M. T. Chew, and S. Demidenko, Eye tracking system

to detect driver drowsiness, in 2015 6th International Conference on
Automation, Robotics and Applications (ICARA), 2015, pp. 472-477.
[Online]. Available: https://doi.org/10.1109/ICARA.2015.7081194
[63] T. Vesselenyi, S. Moca, A. Rus, T. Mitran, and B. Tătaru, Driver

drowsiness detection using ANN image processing, IOP Conf. Series:
Materials Science and Engineering, vol. 252, p. 012097, oct 2017.
[Online]. Available: https://doi.org/10.1088/1757-899X/252/1/012097
[64] R. Jabbar, K. Al-Khalifa, M. Kharbeche, W. Alhajyaseen, M. Jafari,

and S. Jiang, Real-time driver drowsiness detection for android
application using deep neural networks techniques, Procedia Computer
Science, vol. 130, pp. 400-407, 2018, the 9th International Conference
on Ambient Systems, Networks and Technologies (ANT 2018) /
The 8th International Conference on Sustainable Energy Information
Technology (SEIT-2018) / Affiliated Workshops. [Online]. Available:
[65] M. F. Shakeel, N. A. Bajwa, A. M. Anwaar, A. Sohail, A. Khan,

and H. ur Rashid, Detecting driver drowsiness in real time through
deep learning based object detection, in Advances in Computational
Intelligence, I. Rojas, G. Joya, and A. Catala, Eds. Cham: Springer
International Publishing, 2019, pp. 283-296. [Online]. Available:
https://doi.org/10.1007/978-3-030-20521-8_24
[66] V. Vijayan and E. Sherly, Real time detection system of driver

drowsiness based on representation learning using deep neural
networks, J. Intell. Fuzzy Syst., vol. 36, pp. 1-9, 01 2019. [Online].
Available: https://doi.org/10.3233/JIFS-169909
[67] A. Bamidele, K. Kamardin, N. Syazarin, S. Mohd, I. Shafi, A. Azizan,

N. Aini, and H. Mad, Non-intrusive driver drowsiness detection based on
face and eye tracking, Int J. Adv. Comput. Sci. Appl., vol. 10, 01 2019.
[Online]. Available: https://doi.org/10.14569/IJACSA.2019.0100775
[68] SmartEye, Driver Monitoring System | Interior Sensing for vehicle in-
tegration, 2019, https://smarteye.se/automotive-solutions/ (retrieved
May 16, 2021).
[69] N. Edenborough, R. Hammoud, A. Harbach, A. Ingold, B. Kisacanin,

P. Malawey, T. Newman, G. Scharenbroch, S. Skiver, M. Smith,
A. Wilhelm, G. Witt, E. Yoder, and H. Zhang, Driver state monitor
from delphi, in 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR), vol. 2, June 2005, pp. 1206-1207
vol. 2. [Online]. Available: https://doi.org/10.1109/CVPR.2005.135
[70] Optalert, Scientifically validated Glasses-Mining, 2019,
https://www.optalert.com/explore-products/scientifically-validated-glasses-mining/
(retrieved May 16, 2021).
[71] W. Zhang, B. Cheng, and Y. Lin, Driver drowsiness recognition based

on computer vision technology, Tsinghua Sci. Technol., vol. 17, no. 3,
pp. 354-362, June 2012. [Online]. Available:
https://doi.org/10.1109/TST.2012.6216768
[72] U. Trutschel, B. Sirois, D. Sommer, M. Golz, and D. Edwards,

PERCLOS: An Alertness Measure of the Past, in Driving Assessment
2011: 6th International Driving Symposium on Human Factors in Driver
Assessment, Training, and Vehicle Design, 2011, pp. 172-179. [Online].
Available: https://doi.org/10.17077/drivingassessment.1394
[73] J. A. Horne and L. A. Reyner, Sleep related vehicle accidents,

BMJ, vol. 310, no. 6979, pp. 565-567, 1995. [Online]. Available:
https://doi.org/10.1136/bmj.310.6979.565
[74] A. I. Pack, A. M. Pack, E. Rodgman, A. Cucchiara, D. F.

Dinges, and C. Schwab, Characteristics of crashes attributed to
the driver having fallen asleep, Accident Analysis & Prevention,
vol. 27, no. 6, pp. 769-775, 1995. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0001457595000348
[75] J.-S. Wang, R. Knipling, and M. Goodman, The role of driver

inattention in crashes: new statistics from the 1995 Crashworthiness Data
System, Annual proceedings of the Association for the Advancement of
Automotive Medicine, vol. 40, pp. 377-392, 1996. [Online]. Available:
https://trid.trb.org/view/476093
[76] A. T. McCartt, S. A. Ribner, A. I. Pack, and M. C. Hammer, The

scope and nature of the drowsy driving problem in New York State,
Accident Analysis & Prevention, vol. 28, no. 4, pp. 511-517, 1996.
[Online]. Available: http://www.sciencedirect.com/science/article/pii/
0001457596000218
[77] A. Čolić, O. Marques, and B. Furht, Driver Drowsiness Detection
and Measurement Methods, in SpringerBriefs in Computer Science.
Springer, 2014, no. 9783319115344, pp. 7-18. [Online]. Available:
https://doi.org/10.1007/978-3-319-11535-1_2
[78] P. M. Forsman, B. J. Vila, R. A. Short, C. G. Mott, and H. P. V.

Dongen, Efficient driver drowsiness detection at moderate levels of
drowsiness, Accident Analysis & Prevention, vol. 50, pp. 341-350,
2013. [Online]. Available: https://doi.org/10.1016/j.aap.2012.05.005
[79] M. Ingre, T. Åkerstedt, B. Peters, A. Anund, G. Kecklund, and

A. Pickles, Subjective sleepiness and accident risk avoiding the
ecological fallacy, J. Sleep Res., vol. 15, no. 2, pp. 142-148, 2006.
[Online]. Available: https://doi.org/10.1111/j.1365-2869.2006.00517.x
[80] D. M. Morris, J. J. Pilcher, and F. S. S. III, Lane heading difference: An

innovative model for drowsy driving detection using retrospective analysis
around curves, Accident Analysis & Prevention, vol. 80, pp. 117-124,
[81] F. Friedrichs and B. Yang, Drowsiness monitoring by steering

and lane data based features under real driving conditions, in
European Signal Processing Conference, 2010, pp. 209-213. [Online].
Available: https://www.iss.uni-stuttgart.de/forschung/publikationen/
friedrichs-eusipco2010.pdf
[82] Z. Li, S. E. Li, R. Li, B. Cheng, and J. Shi, Online detection of

driver fatigue using steering wheel angles for real driving conditions,
Sensors (Basel), vol. 17, no. 3, pp. 1-12, 2017. [Online]. Available:
https://doi.org/10.3390/s17030495
[83] A. D. McDonald, C. Schwarz, J. D. Lee, and T. L. Brown, Real-

time detection of drowsiness related lane departures using steering
wheel angle, Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, vol. 56, no. 1, pp. 2201-2205, 2012. [Online]. Available:
https://doi.org/10.1177/1071181312561464
[84] S. H. Fairclough and R. Graham, Impairment of driving performance

caused by sleep deprivation or alcohol: A comparative study, Human
Factors, vol. 41, no. 1, pp. 118-128, 1999, PMID: 10354808. [Online].
[85] R. Feng, G. Zhang, and B. Cheng, An on-board system for detecting
driver drowsiness based on multi-sensor data fusion using Dempster-Shafer
theory, in 2009 International Conference on Networking, Sensing
and Control, 2009, pp. 897-902. [Online]. Available:
https://doi.org/10.1109/ICNSC.2009.4919399
[86] J. Schmidt, C. Braunagel, W. Stolzmann, and K. Karrer-Gauß, Driver

drowsiness and behavior detection in prolonged conditionally automated
drives, in 2016 IEEE Intelligent Vehicles Symposium (IV), 2016, pp.
400-405. [Online]. Available: https://doi.org/10.1109/IVS.2016.7535417
[87] G. Sikander and S. Anwar, Driver fatigue detection systems:

A review, IEEE Transactions on Intelligent Transportation Systems,
//doi.org/10.1109/TITS.2018.2868499
[88] A. Kales, A. Rechtschaffen, L. A. B. I. S. University of California,
and N. N. I. N. (U.S.), A Manual of Standardized Terminology,
Techniques and Scoring System for Sleep Stages of Human Subjects:
Allan Rechtschaffen and Anthony Kales, Editors, ser. NIH publication.
U. S. National Institute of Neurological Diseases and Blindness,
Neurological Information Network, 1968. [Online]. Available:
https://books.google.de/books?id=wzdRnQEACAAJ
[89] C. Iber, S. Ancoli-Israel, A. L. Chesson, S. F. Quan et al.,

The AASM manual for the scoring of sleep and associated events: rules,
terminology and technical specifications. American Academy of Sleep
Medicine, Westchester, IL, 2007, vol. 1. [Online]. Available:
https://aasm.org/clinical-resources/scoring-manual/
[90] A. Mehreen, S. M. Anwar, M. Haseeb, M. Majid, and M. O. Ullah,

A Hybrid Scheme for Drowsiness Detection Using Wearable Sensors,
IEEE Sensors Journal, vol. 19, no. 13, pp. 5119-5126, 2019. [Online].
Available: https://doi.org/10.1109/JSEN.2019.2904222
[91] G. Li and W.-Y. Chung, A context-aware EEG headset system for
early detection of driver drowsiness, Sensors, vol. 15, no. 8, pp.
20873-20893, 2015. [Online]. Available: https://www.mdpi.com/1424-8220/15/8/20873
[92] F. Wang, S. Wang, X. Wang, Y. Peng, and Y. Yang, Design

of driving fatigue detection system based on hybrid measures using
wavelet-packets transform, in 2014 IEEE International Conference on

Robotics and Automation (ICRA), May 2014, pp. 4037-4042. [Online].
Available: https://doi.org/10.1109/ICRA.2014.6907445
[93] S. Taran and V. Bajaj, Drowsiness detection using adaptive hermite

decomposition and extreme learning machine for electroencephalogram
signals, IEEE Sens. J., vol. 18, no. 21, pp. 8855-8862, Nov 2018.
[Online]. Available: https://doi.org/10.1109/JSEN.2018.2869775
[94] F. Rundo, S. Rinella, S. Massimino, M. Coco, G. Fallica, R. Parenti,

S. Conoci, and V. Perciavalle, An innovative deep learning algorithm
for drowsiness detection from eeg signal, Computation, vol. 7, no. 1,
2019. [Online]. Available: https://doi.org/10.3390/computation7010013
[95] U. Budak, V. Bajaj, Y. Akbulut, O. Atila, and A. Sengur, An

effective hybrid model for EEG-based drowsiness detection, IEEE Sens.
J., vol. 19, no. 17, pp. 7624-7631, September 2019. [Online]. Available:
https://doi.org/10.1109/JSEN.2019.2917850
[96] R. Fu and H. Wang, Detection of driving fatigue by using

noncontact emg and ecg signals measurement system, Int. J. of
Neural Syst., vol. 24, no. 03, p. 1450006, 2014. [Online]. Available:
https://doi.org/10.1142/S0129065714500063
[97] V. Balasubramanian and K. Adalarasu, Emg-based analysis of change

in muscle activity during simulated driving, Journal of Bodywork
and Movement Therapies, vol. 11, no. 2, pp. 151-158, 2007.
S1360859207000034
[98] M. Mahmoodi and A. Nahvi, Driver drowsiness detection based

on classification of surface electromyography features in a driving
simulator, Proceedings of the Institution of Mechanical Engineers, Part
H: Journal of Engineering in Medicine, vol. 233, no. 4, pp. 395-406,
2019. [Online]. Available: https://doi.org/10.1177/0954411919831313
[99] C. D. Katsis, N. E. Ntouvas, C. G. Bafas, and D. I. Fotiadis, Assessment

of muscle fatigue during driving using surface EMG, Tech. Rep., 2004.
[100] I. Hostens and H. Ramon, Assessment of muscle fatigue in low

level monotonous task performance during car driving, Journal of
Electromyography and Kinesiology, vol. 15, no. 3, pp. 266-274, 2005.
S1050641104000823
[101] D. Malathi, J. D. Dorathi Jayaseeli, S. Madhuri, and K. Senthilkumar,

Electrodermal Activity Based Wearable Device for Drowsy Drivers,
Journal of Physics: Conference Series, vol. 1000, no. 1, 2018. [Online].
Available: https://doi.org/10.1088/1742-6596/1000/1/012048
[102] M. Misbhauddin, A. R. AlMutlaq, A. Almithn, N. Alshukr, and

M. Aleesa, Real-time driver drowsiness detection using wearable
technology, in ACM International Conference Proceeding Series, 2019.
[Online]. Available: https://doi.org/10.1145/3368756.3369081
[103] M. Choi, G. Koo, M. Seo, and S. W. Kim, Wearable Device-Based

System to Monitor a Driver's Stress, Fatigue, and Drowsiness, IEEE
Trans. Instrum. Meas., vol. 67, no. 3, pp. 634-645, March 2018. [Online].
Available: https://doi.org/10.1109/TIM.2017.2779329
[104] M. M. Bundele and R. Banerjee, Detection of fatigue of vehicular

driver using skin conductance and oximetry pulse: A neural
network approach, in iiWAS2009 - The 11th International Conference on
Information Integration and Web-based Applications and Services. New
York, New York, USA: ACM Press, 2009, pp. 739-744. [Online].
Available: http://portal.acm.org/citation.cfm?doid=1806338.1806478
[105] W. Zheng, K. Gao, G. Li, W. Liu, C. Liu, J. Liu, G. Wang, and

B. Lu, Vigilance estimation using a wearable eog device in real
driving environment, IEEE Trans. Intell. Transp. Syst., pp. 1-15, 2019.
[Online]. Available: https://doi.org/10.1109/TITS.2018.2889962
[106] S. Barua, M. U. Ahmed, C. Ahlström, and S. Begum, Automatic

driver sleepiness detection using eeg, eog and contextual information,
Expert Syst. Appl., vol. 115, pp. 121-135, 2019. [Online]. Available:
https://doi.org/10.1016/j.eswa.2018.07.054
[107] S. Hu and G. Zheng, Driver drowsiness detection with eyelid

related parameters by support vector machine, Expert Syst. Appl.,
vol. 36, no. 4, pp. 7651-7658, 2009. [Online]. Available:
[108] X. Zhu, W. Zheng, B. Lu, X. Chen, S. Chen, and C. Wang,

Eog-based drowsiness detection using convolutional neural networks,
in 2014 International Joint Conference on Neural Networks (IJCNN),
2014, pp. 128-134. [Online]. Available:
https://doi.org/10.1109/IJCNN.2014.6889642
[109] Y. Zhang, X. Gao, J. Zhu, W. Zheng, and B. Lu, A novel approach to

driving fatigue detection using forehead eog, in 2015 7th International
IEEE/EMBS Conference on Neural Engineering (NER), 2015, pp. 707-710.
[Online]. Available: https://doi.org/10.1109/NER.2015.7146721
[110] L. R. Young and D. Sheena, Eye-movement measurement techniques.

The American Psychologist, vol. 30, no. 3, pp. 315-330, 1975. [Online].
Available: https://doi.org/10.1037/0003-066X.30.3.315
[111] R. Barea, L. Boquete, M. Mazo, and E. Lopez, System for

assisted mobility using eye movements based on electrooculography,
IEEE Transactions on Neural Systems and Rehabilitation Engineering,
//doi.org/10.1109/TNSRE.2002.806829
[112] K. Hyoki, M. Shigeta, N. Tsuno, Y. Kawamuro, and T. Kinoshita,

Quantitative electro-oculography and electroencephalography as indices
of alertness, Electroencephalography and Clinical Neurophysiology, vol.
106, no. 3, pp. 213-219, 1998. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0013469497001284
[113] H. Häkkänen, LicPsych, H. Summala, M. Partinen, M. Tiihonen,

and J. Silvo, Blink Duration as an Indicator of Driver Sleepiness in
Professional Bus Drivers, Sleep, vol. 22, no. 6, pp. 798-802, 09 1999.
[Online]. Available: https://doi.org/10.1093/sleep/22.6.798
[114] H. Lee, J. Lee, and M. Shin, Using wearable ecg/ppg sensors

for driver drowsiness detection based on distinguishable pattern of
recurrence plots, Electronics, vol. 8, no. 2, 2019. [Online]. Available:
https://doi.org/10.3390/electronics8020192
[115] M. Gromer, D. Salb, T. Walzer, N. M. Madrid, and R. Seepold, Ecg

sensor for detection of driver's drowsiness, Procedia Comput. Sci., vol.
159, pp. 1938-1946, 2019, Knowledge-Based and Intelligent Information
& Engineering Systems: Proc. of the 23rd Int. Conf. KES2019. [Online].
Available: https://doi.org/10.1016/j.procs.2019.09.366
[116] M. Babaeian and M. Mozumdar, Driver drowsiness detection

algorithms using electrocardiogram data analysis, in 2019 IEEE
9th Annual Computing and Communication Workshop and Conference
(CCWC), Jan 2019, pp. 0001-0006. [Online]. Available:
https://doi.org/10.1109/CCWC.2019.8666467
[117] J. Vicente, P. Laguna, A. Bartra, and R. Bailón, Drowsiness

detection using heart rate variability, Med. Biol. Eng. & Comput.,
vol. 54, no. 6, pp. 927-937, June 2016. [Online]. Available:
https://doi.org/10.1007/s11517-015-1448-7
[118] M. Patel, S. Lal, D. Kavanagh, and P. Rossiter, Applying neural

network analysis on heart rate variability data to assess driver fatigue,
Expert Syst. Appl., vol. 38, no. 6, pp. 7235-7242, Jun. 2011. [Online].
Available: https://doi.org/10.1016/j.eswa.2010.12.028
[119] G. Yang, Y. Lin, and P. Bhattacharya, A driver fatigue recognition

model based on information fusion and dynamic bayesian network,
Information Sciences, vol. 180, no. 10, pp. 1942-1954, 2010, Special
Issue on Intelligent Distributed Information Systems. [Online]. Available:
[120] E. Michail, A. Kokonozi, I. Chouvarda, and N. Maglaveras, Eeg

and hrv markers of sleepiness and loss of control during car driving,
in 2008 30th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, Aug 2008, pp. 2566-2569. [Online].
Available: https://doi.org/10.1109/IEMBS.2008.4649724
[121] M. Malik, A. Camm, J. Bigger, G. Breithardt, S. Cerutti, R. Cohen,

P. Coumel, E. Fallen, H. Kennedy, R. Kleiger, F. Lombardi, A. Malliani,
A. Moss, J. Rottman, G. Schmidt, P. Schwartz, and D. Singer, Heart
rate variability. standards of measurement, physiological interpretation,
and clinical use, Eur. Heart J., vol. 17, no. 3, pp. 354-381, 1996.
[Online]. Available: https://doi.org/10.1161/01.cir.93.5.1043
[122] I. Lee, P. Lau, E. C.-P. Chua, J. J. Gooley, W.-Q. Tan, S.-C. Yeo,
K. Puvanendran, and I. H. Mien, Heart Rate Variability Can Be Used
to Estimate Sleepiness-related Decrements in Psychomotor Vigilance
during Total Sleep Deprivation, Sleep, vol. 35, no. 3, pp. 325-334,
March 2012. [Online]. Available: https://doi.org/10.5665/sleep.1688
[123] J. Allen, Photoplethysmography and its application in clinical

physiological measurement, Physiological Measurement, vol. 28, no. 3,
pp. R1-R39, Feb 2007. [Online]. Available:
https://doi.org/10.1088%2F0967-3334%2F28%2F3%2Fr01
[124] E. A. Pelaez and E. R. Villegas, LED power reduction trade-offs

for ambulatory pulse oximetry, in 2007 29th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society,
IEMBS.2007.4352784
[125] Watch Ranker, How do smartwatches & fitness trackers measure your
heart rate (HR)? 2020,
https://watchranker.com/how-do-smartwatches-fitness-trackers-measure-heart-rate/.
[126] Tom's Guide, Who has the most accurate heart rate monitor? 2018,
https://www.tomsguide.com/us/heart-rate-monitor,review-2885.html.
[127] Empatica Support, E4 data - ibi expected signal, 2020,

https://support.empatica.com/hc/en-us/articles/360030058011-E4-
data-IBI-expected-signal (retrieved May 16, 2021).
[128] Using Photoplethysmography Based Features As Indicators of Drowsiness:

Preliminary Results, ser. Frontiers in Biomedical Devices, vol. 2019
Design of Medical Devices Conference, 04 2019, v001T09A008. [Online].
Available: https://doi.org/10.1115/DMD2019-3236
[129] L. B. Leng, L. B. Giin, and W. Chung, Wearable driver

drowsiness detection system based on biomedical and motion sensors,
in IEEE SENSORS, 2015, pp. 1-4. [Online]. Available:
https://doi.org/10.1109/ICSENS.2015.7370355
[130] M. V. Ramesh, A. K. Nair, and A. T. Kunnathu, Real-time automated

multiplexed sensor system for driver drowsiness detection, in 2011
7th International Conference on Wireless Communications, Networking
and Mobile Computing, Sep. 2011, pp. 1-4. [Online]. Available:
https://doi.org/10.1109/wicom.2011.6040613
[131] H. Rahim, A. Dalimi, and H. Jaafar, Detecting drowsy driver

using pulse sensor, J. Teknol., vol. 73, 03 2015. [Online]. Available:
https://doi.org/10.11113/jt.v73.4238
[132] S. Jung, H. Shin, and W. Chung, Driver fatigue and drowsiness

monitoring system with embedded electrocardiogram sensor on steering
wheel, IET Intell. Transp. Syst., vol. 8, no. 1, pp. 43-50, Feb 2014.
[Online]. Available: https://doi.org/10.1049/iet-its.2012.0032
[133] J. Solaz, J. Laparra-Hernández, D. Bande, N. Rodríguez, S. Veleff,
J. Gerpe, and E. Medina, Drowsiness detection based on the
analysis of breathing rate obtained from real-time image recognition,
Transportation Research Procedia, vol. 14, pp. 3867-3876, 2016,
Transport Research Arena TRA2016. [Online]. Available:
https://doi.org/10.1016/j.trpro.2016.05.472
[134] Final Report Summary - HARKEN (Heart and respiration in-car embedded
nonintrusive sensors) | Report Summary | HARKEN | FP7 | CORDIS
| European Commission, 2014,
https://cordis.europa.eu/project/rcn/103870/reporting/en (retrieved May 16, 2021).
[135] N. Pham, T. Dinh, Z. Raghebi, T. Kim, N. Bui, P. Nguyen,
H. Truong, F. Banaei-Kashani, A. Halbower, T. Dinh, and T. Vu,

Wake: A behind-the-ear wearable system for microsleep detection,
in Proceedings of the 18th International Conference on Mobile Systems,
Applications, and Services, ser. MobiSys'20. New York, NY, USA:
Association for Computing Machinery, 2020, pp. 404-418. [Online].
Available: https://doi.org/10.1145/3386901.3389032
[136] Astute Electronics Ltd., Plessey warden driver alertness monitor, 2020,
https://www.astute.global/products/plessey-warden-driver-alertness-monitor/.
[137] CardioID, Cardiowheel, 2020, https://www.cardio-id.com/cardiowheel.
[138] A. Lourenço, A. P. Alves, C. Carreiras, R. P. Duarte, and A. Fred,
Cardiowheel: ECG biometrics on the steering wheel, in Machine
Learning and Knowledge Discovery in Databases, A. Bifet, M. May,
B. Zadrozny, R. Gavalda, D. Pedreschi, F. Bonchi, J. Cardoso, and
M. Spiliopoulou, Eds. Cham: Springer International Publishing,
2015, pp. 267–270. [Online]. Available:
https://www.it.pt/Publications/PaperConference/16310
[139] Creative Mode, STEER: Wearable device that will not let you
fall asleep, 2019,
https://www.kickstarter.com/projects/creativemode/steer-you-will-never-fall-asleep-while-driving?lang=en
(retrieved May 16, 2021).
[140] StopSleep, Anti-sleep alarm, 2019, https://www.stopsleep.co.uk/
(retrieved May 16, 2021).
[141] Neurocom, Driver vigilance telemetric control system - VIGITON, 2019,
http://www.neurocom.ru/en2/product/vigiton.html (retrieved May 16, 2021).
[142] B. Lee, B. Lee, and W. Chung, Standalone wearable driver
drowsiness detection system in a smartwatch, IEEE Sens. J.,
vol. 16, no. 13, pp. 5444–5451, July 2016. [Online]. Available:
https://doi.org/10.1109/JSEN.2016.2566667
[143] B.-l. Lee, B.-g. Lee, G. Li, and W.-Y. Chung, Wearable Driver Drowsiness
Detection System Based on Smartwatch, in Korea Institute of Signal
Processing and Systems, vol. 15, 2014, pp. 134–146.
[144] Q. Li, J. Wu, S.-D. Kim, and C.-G. Kim, Hybrid driver fatigue detection
system based on data fusion with wearable sensor devices, 2014.
[145] B. Lee, B. Lee, and W. Chung, Wristband-type driver vigilance
monitoring system using smartwatch, IEEE Sensors Journal, vol. 15,
no. 10, pp. 5624–5633, 2015. [Online]. Available:
https://doi.org/10.1109/JSEN.2015.2447012
[146] J. Gielen and J.-M. Aerts, Feature extraction and evaluation for driver
drowsiness detection based on thermoregulation, Applied Sciences,
vol. 9, no. 17, 2019. [Online]. Available:
https://doi.org/10.3390/app9173555
[147] C. Bi, J. Huang, G. Xing, L. Jiang, X. Liu, and M. Chen,
Safewatch: A wearable hand motion tracking system for improving
driving safety, in 2017 IEEE/ACM Second International Conference on
Internet-of-Things Design and Implementation (IoTDI), 2017, pp. 223–232.
[Online]. Available: https://ieeexplore.ieee.org/document/7946880
[148] A. A. Putilov, O. G. Donskaya, and E. G. Verevkin, Quantification
of sleepiness through principal component analysis
of the electroencephalographic spectrum, Chronobiology International,
vol. 29, no. 4, pp. 509–522, 2012. [Online]. Available:
https://doi.org/10.1016/j.clinph.2013.01.018
[149] J. De Winter and P. Happee, Advantages and disadvantages of driving
simulators: a discussion, in Proceedings of Measuring Behavior, 2012,
pp. 47–50. [Online]. Available:
https://measuringbehavior.org/mb2012/files/2012/ProceedingsPDF(website)/Special%20Sessions/Measuring%20Driver%20and%20Pilot%20Behavior/de_Winter_et_al_MB2012.pdf
[150] C. Neubauer, G. Matthews, L. Langheim, and D. Saxby, Fatigue
and voluntary utilization of automation in simulated driving, Human
Factors, vol. 54, no. 5, pp. 734–746, 2012. [Online]. Available:
https://doi.org/10.1177/0018720811423261
[151] M. Körber, A. Cingel, M. Zimmermann, and K. Bengler, Vigilance
Decrement and Passive Fatigue Caused by Monotony in Automated
Driving, Procedia Manufact, vol. 3, pp. 2403–2409, 2015. [Online].
Available: https://doi.org/10.1016/j.promfg.2015.07.499
[152] D. Miller, A. Sun, M. Johns, H. Ive, D. Sirkin, S. Aich, and W. Ju,
Distraction becomes engagement in automated driving, Proceedings
of the Human Factors and Ergonomics Society Annual Meeting, vol. 59,
no. 1, pp. 1676–1680, 2015. [Online]. Available:
https://doi.org/10.1177/1541931215591362
[153] T. Vogelpohl, M. Vollrath, M. Kühn, T. Hummel, and T. Gehlert,
Übergabe von hochautomatisiertem Fahren zu manueller Steuerung, 2016,
no. 39. [Online]. Available: https://trid.trb.org/view/1435665
[154] O. Jarosch, M. Kuhnt, S. Paradies, and K. Bengler, It's Out of
Our Hands Now! Effects of Non-Driving Related Tasks During
Highly Automated Driving on Drivers' Fatigue, ser. Proceedings of
the 9th International Driving Symposium on Human Factors in Driver
Assessment, Training, and Vehicle Design, 2017, pp. 319–325. [Online].
Available: https://doi.org/10.17077/drivingassessment.1653
[155] O. Jarosch, H. Bellem, and K. Bengler, Effects of task-induced fatigue
in prolonged conditional automated driving, Human Factors, vol. 61,
no. 7, pp. 1186–1199, 2019, PMID: 30657711. [Online]. Available:
https://doi.org/10.1177/0018720818816226
[156] M. Omae, T. Fujioka, N. Hashimoto, and H. Shimizu, The
application of RTK-GPS and steer-by-wire technology to the automatic
driving of vehicles and an evaluation of driver behavior, IATSS
Research, vol. 30, no. 2, pp. 29–38, 2006. [Online]. Available:
[157] A. Feldhütter, T. Hecht, and K. Bengler, Fahrerspezifische
Aspekte beim hochautomatisierten Fahren, Tech. Rep., 2017. [Online]. Available:
https://bast.opus.hbz-nrw.de/opus45-bast/frontdoor/deliver/index/docId/1890/file/FE_82.0628_Schlussbericht_Fahrerspezifische_Aspekte_HAF_final.pdf
[158] J. Anderson, N. Kalra, K. Stanley, P. Sorensen,
C. Samaras, and O. Oluwatola, Autonomous Vehicle Technology:
A Guide for Policymakers. RAND Corporation, 2016. [Online]. Available:
https://www.rand.org/pubs/research_reports/RR443-2.html. Also available in print form.
[159] F. Berghöfer, C. Purucker, F. Naujoks, K. Wiedemann, and C. Marberger,
Prediction of Take-Over Time Demand in Highly Automated
Driving. Results of a Naturalistic Driving Study Prediction of take-over
time demand in conditionally automated driving - results of a real world
driving study, ser. Proceedings of the Human Factors and Ergonomics
Society Europe Chapter 2018 Annual Conference, 2019.
[160] S. Baltodano, S. Sibi, N. Martelaro, N. Gowda, and W. Ju,
The RRADS platform: A real road autonomous driving simulator,
in Proceedings of the 7th International Conference on Automotive User
Interfaces and Interactive Vehicular Applications, ser. AutomotiveUI '15.
New York, NY, USA: ACM, 2015, pp. 281–288. [Online]. Available:
https://doi.org/10.1145/2799250.2799288
[161] Sleep Health Foundation, Sleep needs across the lifespan, 2015,
http://www.sleephealthfoundation.org.au/files/pdfs/Sleep-Needs-Across-Lifespan.pdf
(retrieved May 16, 2021).
[162] R. Fu, H. Wang, and W. Zhao, Dynamic driver fatigue detection
using hidden Markov model in real driving condition, Expert Systems
with Applications, vol. 63, pp. 397–411, 2016. [Online]. Available:
[163] S. Kujala, V. Roto, K. Väänänen-Vainio-Mattila, E. Karapanos, and
A. Sinnelä, UX curve: A method for evaluating long-term user
experience, Interacting with Computers, vol. 23, no. 5, pp. 473–483,
2011, Feminism and HCI: New Perspectives. [Online]. Available:
[164] A. R. Wagner, J. Borenstein, and A. Howard, Overtrust in the robotic
age, Commun. ACM, vol. 61, no. 9, pp. 22–24, Aug. 2018. [Online].
[165] J. D. Lee and K. A. See, Trust in automation: Designing for
appropriate reliance, Human Factors, vol. 46, no. 1, pp. 50–80,
2004, PMID: 15151155. [Online]. Available:
https://doi.org/10.1518/hfes.46.1.50_30392
[166] R. Parasuraman and V. Riley, Humans and automation: Use, misuse,
disuse, abuse, Human Factors, vol. 39, no. 2, pp. 230–253, 1997.
[Online]. Available: https://doi.org/10.1518/001872097778543886
[167] K. A. Hoff and M. Bashir, Trust in automation: Integrating
empirical evidence on factors that influence trust, Human Factors,
vol. 57, no. 3, pp. 407–434, 2015, PMID: 25875432. [Online]. Available:
https://doi.org/10.1177/0018720814547570
[168] P. Wintersberger, T. von Sawitzky, A.-K. Frison, and A. Riener,
Traffic augmentation as a means to increase trust in automated
driving systems, in Proceedings of the 12th Biannual Conference on
Italian SIGCHI Chapter, ser. CHItaly'17. New York, NY, USA: ACM,
2017, pp. 17:1–17:7. [Online]. Available:
http://doi.acm.org/10.1145/3125571.3125600
[169] J. Koo, J. Kwac, W. Ju, M. Steinert, L. Leifer, and C. Nass, Why did my
car just do that? Explaining semi-autonomous driving actions to improve
driver understanding, trust, and performance, International Journal on
Interactive Design and Manufacturing, vol. 9, no. 4, pp. 269–275, Nov.
2015. [Online]. Available: https://doi.org/10.1007/s12008-014-0227-2
[170] B. E. Noah, P. Wintersberger, A. G. Mirnig, S. Thakkar, F. Yan, T. M.
Gable, J. Kraus, and R. McCall, First workshop on trust in the age of
automated driving, in Proceedings of the 9th International Conference
on Automotive User Interfaces and Interactive Vehicular Applications
Adjunct, ser. AutomotiveUI'17. New York, NY, USA: ACM, 2017, pp.
15–21. [Online]. Available: http://doi.acm.org/10.1145/3131726.3131733
[171] A. Kunze, S. J. Summerskill, R. Marshall, and A. J. Filtness, Enhancing
driving safety and user experience through unobtrusive and
function-specific feedback, in Proceedings of the 9th International Conference
on Automotive User Interfaces and Interactive Vehicular Applications
Adjunct, ser. AutomotiveUI'17. New York, NY, USA: Association
for Computing Machinery, 2017. [Online]. Available:
https://doi.org/10.1145/3131726.3131762
[172] B. Pfleging, M. Rang, and N. Broy, Investigating user needs for
non-driving-related activities during automated driving, in Proceedings of
the 15th International Conference on Mobile and Ubiquitous Multimedia,
ser. MUM'16. New York, NY, USA: Association for Computing
Machinery, 2016, pp. 91–99. [Online]. Available:
https://doi.org/10.1145/3012709.3012735
[173] J.-Y. Jian, A. M. Bisantz, and C. G. Drury, Foundations for an
empirically determined scale of trust in automated systems, International
Journal of Cognitive Ergonomics, vol. 4, no. 1, pp. 53–71, 2000. [Online].
Available: https://doi.org/10.1207/S15327566IJCE0401_04
[174] Garmin Ltd., Forerunner 235, 2020,
https://buy.garmin.com/en-US/US/p/529988/pn/010-03717-48 (retrieved May 16, 2021).
[175] Garmin Ltd., Vivosmart 3, 2020,
https://buy.garmin.com/en-US/US/p/567813/pn/010-01755-10 (retrieved May 16, 2021).
[176] Polar Electro Oy, Polar A370, 2020,
https://www.polar.com/en/products/sport/A370-fitness-tracker (retrieved May 16, 2021).
[177] Empatica Support, Recent publications citing the E4 wristband, 2018,
https://support.empatica.com/hc/en-us/articles/115002540543-Recent-Publications-citing-the-E4-wristband-
(retrieved May 16, 2021).
[178] C. McCarthy, N. Pradhan, C. Redpath, and A. Adler, Validation
of the Empatica E4 wristband, in 2016 IEEE EMBS International
Student Conference (ISC), May 2016, pp. 1–4. [Online]. Available:
https://doi.org/10.1109/EMBSISC.2016.7508621
[179] Bittium Corporation, Bittium Faros waterproof ECG devices, 2019,
https://www.bittium.com/medical/bittium-faros (retrieved May 16, 2021).
[180] IPG Automotive GmbH, Carmaker: Virtual testing of automobiles
and light-duty vehicles, 2020,
https://ipg-automotive.com/products-services/simulation-software/carmaker/.
[181] The Epworth Sleepiness Scale, About the ESS, 2020,
http://epworthsleepinessscale.com/about-the-ess/.
[182] Stähle GmbH, Automated driving system sfphybrid for cars, 2020,
https://www.staehle-robots.com/english-1/products/proving-ground-driving-systems/
(retrieved May 16, 2021).
[183] D. de Waard, M. van der Hulst, M. Hoedemaeker, and K. A. Brookhuis,
Driver Behavior in an Emergency Situation in the Automated Highway
System, Transportation Human Factors, vol. 1, no. 1, pp. 87–89,
Jan. 1999. [Online]. Available:
http://www.tandfonline.com/doi/abs/10.1207/sthf0101_7
[184] O. Carsten, F. C. Lai, Y. Barnard, A. H. Jamson, and N. Merat, Control
task substitution in semiautomated driving: Does it matter what aspects
are automated? Human Factors, vol. 54, no. 5, pp. 747–761, Oct. 2012.
[Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/23156620
[185] M. Mark Vollrath, S. Briest, and K. Oeltze, Auswirkungen
des Fahrens mit Tempomat und ACC auf das Fahrerverhalten,
Tech. Rep., 2010,
https://bast.opus.hbz-nrw.de/opus45-bast/frontdoor/deliver/index/docId/249/file/F74.pdf
(retrieved May 16, 2021).
[186] J. B. Kostis, A. E. Moreyra, M. T. Amendo, J. Di Pietro, N. Cosgrove,
and P. T. Kuo, The effect of age on heart rate in subjects free of
heart disease. Studies by ambulatory electrocardiography and maximal
exercise stress test, Circulation, vol. 65, no. 1 I, pp. 141–145, Jan. 1982.
[Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/7198013
[187] E. D. Larson, J. R. Clair, W. A. Sumner, R. A. Bannister, and
C. Proenza, Depressed pacemaker activity of sinoatrial node myocytes
contributes to the age-dependent decline in maximum heart rate,
Proceedings of the National Academy of Sciences of the United States of
America, vol. 110, no. 44, pp. 18 011–18 016, Oct. 2013. [Online].
Available: http://www.ncbi.nlm.nih.gov/pubmed/24128759
[188] J. M. Hagberg, W. K. Allen, D. R. Seals, B. F. Hurley,
A. A. Ehsani, and J. O. Holloszy, A hemodynamic comparison
of young and older endurance athletes during exercise, Journal of
Applied Physiology, vol. 58, no. 6, pp. 2041–2046, Jun. 1985. [Online].
Available: http://www.ncbi.nlm.nih.gov/pubmed/4008419
https://www.physiology.org/doi/10.1152/jappl.1985.58.6.2041
[189] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and
I. H. Witten, The WEKA data mining software: An update,
SIGKDD Explor., vol. 11, no. 1, pp. 10–18, 2009. [Online]. Available:
https://doi.org/10.1145/1656274.1656278
[190] D. Sandberg, The performance of driver sleepiness indicators as a
function of interval length, in 2011 14th International IEEE Conference
on Intelligent Transportation Systems (ITSC), 2011, pp. 1735–1740.
[Online]. Available: https://doi.org/10.1109/ITSC.2011.6082939
[191] J. R. Landis and G. G. Koch, The Measurement of Observer
Agreement for Categorical Data, Tech. Rep. 1, 1977. [Online]. Available:
https://pubmed.ncbi.nlm.nih.gov/843571/
[192] C. Zhao, M. Zhao, J. Liu, and C. Zheng, Electroencephalogram
and electrocardiograph assessment of mental fatigue in a driving
simulator, Accident Analysis & Prevention, vol. 45, pp. 83–90, 2012.
S0001457511003241
[193] S.-J. Jung, H.-S. Shin, and W.-Y. Chung, Driver fatigue and drowsiness
monitoring system with embedded electrocardiogram sensor on steering
wheel, IET Intell. Transp. Syst., vol. 8, no. 1, pp. 43–50, 2014. [Online].
Available: https://doi.org/10.1049/iet-its.2012.0032
[194] V. P. Nambiar, M. Khalil-Hani, C. W. Sia, and M. N. Marsono, Evolvable
block-based neural networks for classification of driver drowsiness
based on heart rate variability, in 2012 IEEE International Conference
on Circuits and Systems (ICCAS), Oct 2012, pp. 156–161. [Online].
Available: https://doi.org/10.1109/ICCircuitsAndSystems.2012.6408316
[195] G. Lenis, P. Reichensperger, D. Sommer, C. Heinze, M. Golz,
and O. Dössel, Detection of microsleep events in a car driving
simulation study using electrocardiographic features, Current Directions
in Biomedical Engineering, vol. 2, no. 1, pp. 283–287, 2016. [Online].
Available:
https://www.degruyter.com/view/j/cdbme.2016.2.issue-1/cdbme-2016-0063/cdbme-2016-0063.xml
[196] M. P. Tarvainen, J.-P. Niskanen, J. A. Lipponen, P. O. Ranta-aho,
and P. A. Karjalainen, Kubios HRV - heart rate variability
analysis software, Computer Methods and Programs in Biomedicine,
vol. 113, no. 1, pp. 210–220, 2014. [Online]. Available:
https://doi.org/10.1016/j.cmpb.2013.07.024
[197] S. Shirmohammadi, K. Barbe, D. Grimaldi, S. Rapuano, and
S. Grassini, Instrumentation and measurement in medical, biomedical,
and healthcare systems, IEEE Instrum. Meas. Mag., vol. 19, no. 5,
pp. 6–12, October 2016. [Online]. Available:
https://doi.org/10.1109/MIM.2016.7579063
[198] R. Kohavi, A study of cross-validation and bootstrap for accuracy
estimation and model selection, in Proceedings of the 14th International
Joint Conference on Artificial Intelligence - Volume 2, ser. IJCAI'95.
San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1995,
pp. 1137–1143. [Online]. Available:
https://dl.acm.org/doi/10.5555/1643031.1643047
[199] N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, SMOTE: Synthetic
minority over-sampling technique, J. Artif. Intell. Res., vol. 16, pp.
321–357, 01 2002. [Online]. Available: https://doi.org/10.1613/jair.953
[200] M. A. Hall, Correlation-based feature subset selection for machine
learning, Ph.D. dissertation, University of Waikato, Hamilton, New
Zealand, 1998. [Online]. Available:
https://www.cs.waikato.ac.nz/~mhall/thesis.pdf
[201] P. Branco, L. Torgo, and R. P. Ribeiro, A survey of predictive
modelling under imbalanced distributions, CoRR, vol. abs/1505.01658,
2015. [Online]. Available: http://arxiv.org/abs/1505.01658
[202] A. Persson, H. Jonasson, I. Fredriksson, U. Wiklund, and C. Ahlström,
Heart rate variability for driver sleepiness classification in real road
driving conditions*, in 2019 41st Annual International Conference of
the IEEE Engineering in Medicine and Biology Society (EMBC), July
EMBC.2019.8857229
[203] Z. Li, S. E. Li, R. Li, B. Cheng, and J. Shi, Online detection
of driver fatigue using steering wheel angles for real driving
conditions, Sensors, vol. 17, no. 3, 2017. [Online]. Available:
https://doi.org/10.3390/s17030495
[204] C. J. de Naurois, C. Bourdin, A. Stratulat, E. Diaz, and J.-L. Vercher,
Detection and prediction of driver drowsiness using artificial neural
network models, Accident Analysis & Prevention, vol. 126, pp. 95–104,
[205] Q. Li, J. Wu, S.-D. Kim, and C.-G. Kim, Hybrid driver fatigue detection
system based on data fusion with wearable sensor devices, 2015.
[206] G. Li, B. L. Lee, and W. Y. Chung, Smartwatch-Based Wearable
EEG System for Driver Drowsiness Detection, IEEE Sensors Journal,
https://doi.org/10.1109/JSEN.2015.2473679
[207] Z. Chen, R. Ding, T. W. Chin, and D. Marculescu, Understanding
the impact of label granularity on CNN-based image classification,
in IEEE International Conference on Data Mining Workshops, ICDMW,
vol. 2018-Novem, 2019, pp. 895–904. [Online]. Available:
https://doi.org/10.1109/ICDMW.2018.00131
[208] Android, Android automotive, 2020,
https://source.android.com/devices/automotive (retrieved May 16, 2021).
[209] C. Lin, C. Chuang, C. Huang, S. Tsai, S. Lu, Y. Chen, and L. Ko,
Wireless and wearable EEG system for evaluating driver vigilance, IEEE
Transactions on Biomedical Circuits and Systems, vol. 8, no. 2,
pp. 165–176, 2014.
[210] G. Li and W.-Y. Chung, Detection of driver drowsiness using wavelet
analysis of heart rate variability and a support vector machine classifier,
Sensors, vol. 13, no. 12, pp. 16 494–16 511, 2013. [Online]. Available:
https://www.mdpi.com/1424-8220/13/12/16494
[211] Polar, Polar OH1 - optical heart rate sensor, 2020,
https://www.polar.com/us-en/products/accessories/oh1-optical-heart-rate-sensor.
[212] I. T. Hettiarachchi, S. Hanoun, D. Nahavandi, and S. Nahavandi,
Validation of Polar OH1 optical heart rate sensor for moderate and high
intensity physical activities, PLOS ONE, vol. 14, no. 5, pp. 1–13, 05
2019. [Online]. Available: https://doi.org/10.1371/journal.pone.0217288
[213] Polar Electro Oy, Polar SDK, 2020,
https://www.polar.com/en/developers/sdk.
[214] L. Breiman, Random forests, Machine Learning, vol. 45, no. 1, pp. 5–32,
Oct. 2001. [Online]. Available: https://doi.org/10.1023/A:1010933404324
[215] F. D. Davis, Perceived usefulness, perceived ease of use, and user
acceptance of information technology, MIS Quarterly, vol. 13, no. 3, pp.
319–340, 1989. [Online]. Available: http://www.jstor.org/stable/249008
[216] P. Wintersberger, A.-K. Frison, A. Riener, and T. v. Sawitzky,
Fostering User Acceptance and Trust in Fully Automated Vehicles:
Evaluating the Potential of Augmented Reality, PRESENCE: Virtual
and Augmented Reality, vol. 27, no. 1, pp. 46–62, 03 2019. [Online].
Available: https://doi.org/10.1162/pres_a_00320
[217] B. Laugwitz, T. Held, and M. Schrepp, Construction and evaluation of
a user experience questionnaire, in HCI and Usability for Education and
Work, A. Holzinger, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg,
2008, pp. 63–76.
[218] M. Schrepp, A. Hinderks, and J. Thomaschewski, Construction
of a benchmark for the user experience questionnaire
(UEQ), International Journal of Interactive Multimedia and Artificial
Intelligence, vol. 4, no. 4, pp. 40–44, 06/2017 2017. [Online]. Available:
http://www.ijimai.org/journal/sites/default/files/files/2016/12/ijimai20174_4_5_pdf_94297.pdf
[219] X. Hu and G. Lodewijks, Detecting fatigue in car drivers and aircraft
pilots by using non-invasive measures: The value of differentiation
of sleepiness and mental fatigue, Journal of Safety Research, 2020.
[Online]. Available: https://doi.org/10.1016/j.jsr.2019.12.015
[220] M. A. Corbett, A drowsiness detection system for pilots: Optalert,
Aviation, Space, and Environmental Medicine, vol. 80, no. 2, pp. 149–149,
2009. [Online]. Available: https://doi.org/10.3357/ASEM.21001.2009
[221] M. Sant'Ana, G. Li, and H. Zhang, A decentralized sensor fusion
approach to human fatigue monitoring in maritime operations, in
2019 IEEE 15th International Conference on Control and Automation
(ICCA), July 2019, pp. 1569–1574. [Online]. Available:
https://doi.org/10.1109/ICCA.2019.8899708
[222] G. Li, R. Mao, H. Hildre, and H. Zhang, Visual attention assessment
for expert-in-the-loop training in a maritime operation simulator,
IEEE Transactions on Industrial Informatics, vol. PP, pp. 1–1, 10 2019.
[Online]. Available: https://doi.org/10.1109/TII.2019.2945361
[223] M. Hermann, T. Pentek, and B. Otto, Design principles for
Industrie 4.0 scenarios, in 49th Hawaii International Conference on
System Sciences (HICSS), 2016, pp. 3928–3937. [Online]. Available:
https://doi.org/10.1109/HICSS.2016.488