article info

Article history:
17 October 2013

Keywords:
Human-computer interaction
Cognitive rehabilitation
tracking
3D computer vision
Kinect device

abstract

In this paper, a 3D computer vision system for cognitive assessment and rehabilitation based on the Kinect device is presented. It is intended for individuals with body scheme dysfunctions and left-right confusion. The system processes depth information to overcome the shortcomings of a previously presented 2D vision system for the same application. It achieves left- and right-hand tracking, and face and facial feature (eyes, nose, and ears) detection. The system is easily implemented with a consumer-grade computer and an affordable Kinect device and is robust to drastic background and illumination changes. The system was tested and achieved a successful monitoring percentage of 96.28%. The automation of the monitoring of human body parts' motion, its analysis in relation to the psychomotor exercise indicated to the patient, and the storage of the result of the realization of a set of exercises free the rehabilitation experts from doing such demanding tasks.
1. Introduction
Generally, physical and cognitive rehabilitation is a complex and long-term process that requires expert clinicians and appropriate tools. Although some clinical decision-support systems are based on computer vision to help health professionals [4], few medical centers with rehabilitation professionals and programs currently utilize such systems. Therefore, a system that can successfully support the rehabilitation of individuals with a disability would be of great interest.
Zhou and Hu conducted a review of human motion tracking systems for rehabilitation. Six issues were considered to assess systems: cost, size, weight, function, operation, and automation. Marker-free visual systems were highlighted.
Computer Methods and Programs in Biomedicine 113 (2014) 620-631
In this paper, we present a 3D vision-based marker-free system to monitor the motion of human body parts for the assessment and rehabilitation of body scheme dysfunctions and left-right confusion using the Kinect device. The monitoring aims to extract the achievement level, to calculate temporal parameters (reaction time, fulfillment time, or failure time), and to track the trajectories of human body parts in the psychomotor exercises. In these exercises, the patient has to move one hand to touch the head or one facial feature. The proposed system advances the task monitoring of the previous system [21] by leveraging the Kinect's 3D skeleton tracking. The 3D system has several advantages over the previous system that motivated its development. It achieves a more robust and satisfactory monitoring of the cognitive rehabilitation exercises than the 2D system. The 3D data processing overcomes the limitation of the 2D system, which cannot discriminate whether a 2D occlusion between a hand and a facial feature involves their contact. With the use of the depth information, the 3D system can discriminate unambiguously whether a hand touches a facial feature. This way, the 3D system avoids the unlikely but possible erroneous processing of the 2D system, which could interpret, for instance, that a hand touched an eye or an ear when the hand only occluded this facial feature in its motion toward the other eye or ear, respectively. Thus, the 2D system could erroneously decide that the user confused the right eye with the left eye or the right ear with the left ear. The 3D system prevents this situation from happening. Besides, in the 2D system, when a hand and the face are in occlusion, both body parts are tracked jointly, so there is no precise information about the hand position in relation to the face in this occlusion state. In the 3D system, the hand position is always known. Moreover, the 3D system records the 3D positions of each human body part in the video sequence, unlike the 2D system, which records only the in-plane positions. The 3D positions are more detailed and more useful to the rehabilitation experts in further offline analysis. Lastly, although the 2D system is robust to different illumination conditions, it requires a certain level of illumination to perform color-based tracking of the hands and face. The Kinect device does not need any illumination to extract the depth information and only needs some amount of illumination for the AdaBoost face detector. Such an implementation requires substantially less illumination than the color-based tracking of the previous system.
The 3D system was conceived to be included in the GRADIOR system [22], a cognitive rehabilitation platform developed by the INTRAS Foundation. INTRAS is a nonprofit organization whose aim is to develop and promote activities concerning assistance, research, evaluation, and dissemination in the social and health scope [22]. GRADIOR was supported by the Spanish Ministries of Education and of Science and Innovation, includes more than 15,000 rehabilitation exercises for different cognitive functions such as attention, perception, memory, calculation, and language, and was assessed satisfactorily by rehabilitation experts and users [23]. It is currently used by more than 450 medical centers all over the world.
The rest of the paper is organized as follows. In Section 2, the Kinect device and its applications to different areas, e.g. human activity recognition, are presented. The proposed system is described in Section 3, the experimental results are reported in Section 4, and the conclusions are drawn in Section 5.
2.

3. Proposed system

3.1. Clinical motivation
3.2.
To carry out the monitoring of the psychomotor exercises, the system needs to achieve face and facial feature (nose, eyes, and ears) detection as well as left- and right-hand detection and tracking. The system proposed in [21] works with images captured by a web camera through 2D processing. Because it does not process 3D information, that system has the following limitation: if a hand were between the camera and a facial feature, it could occlude the facial feature in a video sequence even though the hand had not touched the face. The 2D system can be considered to have an acceptable performance, as the most significant information of a psychomotor exercise lies in the fact that the user moves the corresponding hand to the facial feature, regardless of whether the touch between the hand and the facial feature actually takes place. Nevertheless, the system presented in this paper works with 3D information and thus overcomes this limitation and its possible consequences explained in the introduction. The system achieves real-time performance in an environment with a certain minimum level of illumination.
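The depth-based disambiguation between a real touch and a mere 2D occlusion can be sketched as follows. The coordinates, the millimetre units, and the 15 cm touch threshold are illustrative assumptions, not values from the paper.

```python
import math

def classify_overlap(hand_xyz, feature_xyz, touch_mm=150.0):
    """Decide whether a 2D overlap between a hand and a facial feature is
    a real touch or a mere occlusion, using the depth (z) coordinate.

    hand_xyz, feature_xyz: (x, y, z) positions in millimetres.
    touch_mm: assumed maximum 3D distance that still counts as a touch.
    """
    dist = math.dist(hand_xyz, feature_xyz)  # full 3D Euclidean distance
    return "touch" if dist <= touch_mm else "occlusion"

# Hand directly in front of the eye but 40 cm closer to the camera:
# in the image plane they overlap, yet no touch occurs.
print(classify_overlap((0, 0, 600), (0, 0, 1000)))  # occlusion
print(classify_overlap((5, 5, 990), (0, 0, 1000)))  # touch
```

A 2D system sees only the in-plane overlap in both cases; the depth coordinate is what separates them.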
As shown in Fig. 1, the system comprises two stages: a skeleton detection stage and an exercises monitoring stage.
3.2.1.
3.2.2.
Y = (Ymax - Ymin)(X - Xmin)/(Xmax - Xmin) + Ymin   (2)

where Y is the new value of a pixel, Ymax and Ymin the new range of depth values, X the depth value before the rescaling, and Xmax and Xmin the initial range of depth values before the rescaling.
To decrease the irregularities of the surface depicted in the depth image, a first smoothing is applied through a median filter, with which the depth value of a pixel is replaced by the median of the neighboring pixels.
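The rescaling of Eq. (2) and the median smoothing can be sketched as below. NumPy and a square 3x3 neighborhood are assumptions; the paper does not specify the window size.

```python
import numpy as np

def rescale_depth(depth, y_min=0.0, y_max=255.0):
    """Linear rescaling of Eq. (2): map [Xmin, Xmax] onto [Ymin, Ymax]."""
    x_min, x_max = depth.min(), depth.max()
    return (y_max - y_min) * (depth - x_min) / (x_max - x_min) + y_min

def median_smooth(depth, radius=1):
    """Replace each pixel by the median of its (2*radius+1)^2 neighborhood."""
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")
    out = np.empty_like(depth)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + 2 * radius + 1,
                                         j:j + 2 * radius + 1])
    return out

img = np.array([[800, 800, 800],
                [800, 4000, 800],   # an isolated spike (sensor artifact)
                [800, 800, 800]])
print(median_smooth(img)[1, 1])  # 800: the spike is removed
```

The median filter removes isolated depth spikes without blurring edges, which is why it is preferred over a mean filter for this first smoothing.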
The mean curvature H and the Gaussian curvature K of the surface are then computed as

H(x, y) = ((1 + fy^2) fxx - 2 fx fy fxy + (1 + fx^2) fyy) / (2 (1 + fx^2 + fy^2)^(3/2))   (3)

K(x, y) = (fxx fyy - fxy^2) / (1 + fx^2 + fy^2)^2   (4)
To calculate the derivatives, the surface around each pixel is fitted to a paraboloid. To achieve this, the following biquadratic polynomial is used:

gij(x, y) = aij + bij(x - xi) + cij(y - yj) + dij(x - xi)(y - yj) + eij(x - xi)^2 + fij(y - yj)^2   (5)

where the coefficients aij, bij, cij, dij, eij, fij are obtained through the least squares fitting of the pixels in a neighborhood of (xi, yj). The derivatives of f in (xi, yj) are then estimated by the derivatives of gij, which are:

fx(xi, yj) = bij   (6)

fy(xi, yj) = cij   (7)

fxy(xi, yj) = dij   (8)

fxx(xi, yj) = 2 eij   (9)

fyy(xi, yj) = 2 fij   (10)
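The least-squares fit of Eq. (5) and the curvature computation of Eqs. (3)-(10) can be sketched as follows. The 5x5 neighborhood and the unit pixel spacing are assumptions.

```python
import numpy as np

def fit_paraboloid(patch):
    """Least-squares fit of the biquadratic of Eq. (5) to a square depth
    patch centered at (0, 0); returns the coefficients a, b, c, d, e, f."""
    r = patch.shape[0] // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coef, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
    return coef

def curvatures(patch):
    """Mean (H) and Gaussian (K) curvature at the patch center, using the
    derivative estimates fx=b, fy=c, fxy=d, fxx=2e, fyy=2f of Eqs. (6)-(10)."""
    a, b, c, d, e, f = fit_paraboloid(patch)
    fx, fy, fxy, fxx, fyy = b, c, d, 2 * e, 2 * f
    H = ((1 + fy**2) * fxx - 2 * fx * fy * fxy + (1 + fx**2) * fyy) \
        / (2 * (1 + fx**2 + fy**2) ** 1.5)
    K = (fxx * fyy - fxy**2) / (1 + fx**2 + fy**2) ** 2
    return H, K

# A perfect bowl z = x^2 + y^2: fx = fy = 0, fxx = fyy = 2, fxy = 0,
# so Eq. (3) gives H = 2 and Eq. (4) gives K = 4.
ys, xs = np.mgrid[-2:3, -2:3]
bowl = (xs**2 + ys**2).astype(float)
H, K = curvatures(bowl)
print(round(H, 6), round(K, 6))  # 2.0 4.0
```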
After computing the mean and Gaussian curvatures, an HK classification [43] of the surface pixels is applied to obtain a concise description of the local behavior of the surface.
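A direct implementation of the HK classification of Table 1 might look as follows; the tolerance eps below which a curvature is treated as zero is an assumption.

```python
def hk_class(H, K, eps=1e-3):
    """HK classification of a surface pixel (Table 1).

    eps: assumed tolerance under which a curvature is considered zero.
    """
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {
        (-1, -1): "Hyperbolic concave",  (-1, 0): "Cylindrical concave",
        (-1, 1): "Elliptical concave",
        (0, -1): "Hyperbolic symmetric", (0, 0): "Planar",
        (0, 1): "Impossible",
        (1, -1): "Hyperbolic convex",    (1, 0): "Cylindrical convex",
        (1, 1): "Elliptical convex",
    }
    return table[(h, k)]

print(hk_class(2.0, 4.0))   # Elliptical convex
print(hk_class(0.0, 0.0))   # Planar
```

Elliptical convex and elliptical concave pixels are the classes of interest here, since the nose tip and the eye sockets behave locally as a pit and as bowls in the depth surface.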
Fig. 3 - (a) Facial image with the elliptical convex regions in green and the elliptical concave regions in red; (b) thresholded elliptical convex and elliptical concave regions. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 1 - HK classification of the local surface in a pixel as a function of the values of the mean and Gaussian curvatures.

         K < 0                  K = 0                  K > 0
H < 0    Hyperbolic concave     Cylindrical concave    Elliptical concave
H = 0    Hyperbolic symmetric   Planar                 Impossible
H > 0    Hyperbolic convex      Cylindrical convex     Elliptical convex

3.2.3. 3D ear detection
Fig. 5 - (a) Example of a depth image obtained from the Kinect sensor; (b) depth image after applying the depth filter; (c) depth image after applying the rescaling.
3.3.
Fig. 6 - Face orientation and rectangular region used to detect the ears in two frames.
Fig. 7 - (a) Depth values of the rectangular region used to obtain the ears; (b) regions with the lowest non-null value in the two halves of the rectangular region.
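The ear search suggested by Fig. 7 (the region with the lowest non-null depth value in each half of the rectangular region beside the face) might be sketched as follows; the sample region contents and the zero-means-invalid convention are illustrative assumptions.

```python
import numpy as np

def locate_ears(region):
    """In each half of a rectangular depth region, find the pixel with the
    lowest non-null depth (closest to the sensor) as a candidate ear
    position. Zeros are treated as invalid Kinect readings."""
    h, w = region.shape
    halves = {"left": region[:, :w // 2], "right": region[:, w // 2:]}
    ears = {}
    for side, half in halves.items():
        valid = np.where(half > 0, half, np.inf)  # mask out null depths
        i, j = np.unravel_index(np.argmin(valid), valid.shape)
        if side == "right":
            j += w // 2                           # back to full-region coords
        ears[side] = (int(i), int(j))
    return ears

region = np.array([[0, 900, 0, 950],
                   [880, 920, 930, 870],
                   [0, 910, 940, 0]])   # depths in mm, 0 = missing reading
print(locate_ears(region))  # {'left': (1, 0), 'right': (1, 3)}
```

As noted in the evaluation, this minimum-depth assumption is also the approach's weak point: when the depth around the ears is not actually the lowest in the region, the detection fails.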
4.
Fig. 8 - Monitoring of four psychomotor exercises. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)
Exercise processing times: 16 ms, 35 ms, and 28 ms.
4.1. Performance evaluation
Table 3 - Performance results of the system.

Exercise   # of monitorings   # of successful monitorings   # of erroneous monitorings
#1         75                 73                            2
#2         75                 71                            4
#3         75                 69                            6
#4         75                 73                            2
#5         75                 74                            1
#6         75                 73                            2
#7         75                 72                            3
#8         75                 67                            8
#9         75                 69                            6
#10        75                 71                            4
#11        75                 75                            0
#12        75                 74                            1
#13        75                 75                            0
#14        75                 75                            0
Total      1050               1011 (96.28%)                 39 (3.71%)
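As a quick arithmetic check on the per-exercise counts of Table 3 (the numbers are from the paper):

```python
# Per-exercise success/error counts transcribed from Table 3.
successes = [73, 71, 69, 73, 74, 73, 72, 67, 69, 71, 75, 74, 75, 75]
errors = [2, 4, 6, 2, 1, 2, 3, 8, 6, 4, 0, 1, 0, 0]

# Each of the 14 exercises was monitored 75 times (15 users x 5 tests).
assert all(s + e == 75 for s, e in zip(successes, errors))

total_ok = sum(successes)          # 1011 successful monitorings
rate = 100 * total_ok / (14 * 75)  # 96.285..., reported as 96.28%
print(total_ok, rate)
```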
head joint given by the skeleton tracking. Thus the face detection time is reduced to 35 ms and the system processes 12 fps.
The system was tested in the research laboratory with an unconstrained background and the amount of illumination necessary for face detection. Fifteen individuals tested it: 10 healthy, 2 with frontal lobe injury, and 3 with mild dementia, in order to examine differences among them when testing the system. The individuals were informed about the procedure of the tests. Each test would be formed by a series of exercises in which the right or the left hand would have to touch a facial feature. The particular exercise would be detailed by an acoustic message or a pop-up window. After being informed about an exercise, the participants would have to touch the facial feature implied in the exercise with the corresponding hand and remain in the final position of the exercise until a pop-up window indicated the correct or incorrect realization of the exercise or the expiration of the time to fulfill it. The participants did not have to maintain a constrained posture during the exercises, although the face had to be in a posture such that the facial features were visible. The participants performed 5 tests, each with the 14 psychomotor exercises, on three different days. Each participant had 30 s to fulfill an exercise after being informed about it. Table 3 shows the performance results of the system. A monitoring is considered successful if the correct or incorrect realization of the exercise is obtained.
Failures in exercises involving the eyes and nose were caused by the tendency to turn the head while the hand was moving toward the facial feature; thus, in the occlusion state, the face was far from a frontal position and the facial feature detection failed.
The larger number of errors in the monitoring of exercises involving the ears was due to the fact that the depth values in the pixels around the ears were not as low as the applied approach expected. Sometimes, the Kinect device failed to obtain an accurate depth value in some image regions.
The overall successful monitoring percentage was 96.28%.
The system is considered adequate for the rehabilitation
experts at INTRAS Foundation to monitor the psychomotor
exercises.
At the end of the exercises, the participants were asked about the usability, enjoyment, and motivation they experienced while doing the tests monitored by the 3D system.
5. Conclusions
In this paper, a real-time 3D computer vision-aided system applied to the monitoring of psychomotor exercises is proposed. The system is intended for the assessment and evaluation of body scheme dysfunctions and left-right confusion. Other computer vision-based systems presented in the literature do not focus on these disabilities. Monitoring is achieved through human body joint tracking (head and hands) and face and facial feature (eyes, nose, and ears) detection using the depth information provided by the Kinect device. Thus, the system overcomes the shortcomings of 2D computer vision-based approaches and is robust to drastic changes in the working environment and illumination. The Kinect device proved to be an affordable off-the-shelf product, easy to use and performing well, that can be very useful for developing applications in many fields such as rehabilitation.
The proposed system is capable of monitoring human limbs, analyzing their motion in relation to the psychomotor exercise indicated to the user, and storing the result of the realization of a set of exercises. The automation of these tasks frees the rehabilitation experts from doing them. The system was evaluated with 15 users, achieving a successful monitoring percentage of 96.28%. This performance is suitable for the integration of the system in a multimodal cognitive rehabilitation platform.
Acknowledgements

This work was partially supported by the Spanish Ministry of Science and Innovation under project TIN2010-20529 and by the Regional Government of Castilla y León (Spain) under project VA171A11-2. We would like to thank the people at the INTRAS Foundation for their contribution and advice regarding the clinical focus and requirements that make the computer vision system applicable in cognitive rehabilitation and integrable in the GRADIOR platform.
References