article info

Article history:
17 October 2013

Keywords:
Human-computer interaction
Cognitive rehabilitation
tracking
3D computer vision
Kinect device

abstract

In this paper, a 3D computer vision system for cognitive assessment and rehabilitation based on the Kinect device is presented. It is intended for individuals with body scheme dysfunctions and left-right confusion. The system processes depth information to overcome the shortcomings of a previously presented 2D vision system for the same application. It achieves left- and right-hand tracking, and face and facial feature (eyes, nose, and ears) detection. The system is easily implemented with a consumer-grade computer and an affordable Kinect device and is robust to drastic background and illumination changes. The system was tested and achieved a successful monitoring percentage of 96.28%. The automation of the monitoring of human body parts' motion, its analysis in relation to the psychomotor exercise indicated to the patient, and the storage of the result of the realization of a set of exercises free the rehabilitation experts from doing such demanding tasks.
1. Introduction
Generally, physical and cognitive rehabilitation is a complex and long-term process that requires expert clinicians and appropriate tools. Although some clinical decision-support systems are based on computer vision to help health professionals [4], few medical centers with rehabilitation professionals and programs currently utilize such systems. Therefore, a system that can successfully support the rehabilitation of individuals with a disability would be of great interest.
Zhou and Hu conducted a review of human motion tracking systems for rehabilitation. Six issues were considered to assess systems: cost, size, weight, function, operation, and automation. Marker-free visual systems were highlighted.
Computer Methods and Programs in Biomedicine 113 (2014) 620-631
In this paper, we present a 3D vision-based marker-free system to monitor the motion of human body parts for the assessment and rehabilitation of body scheme dysfunctions and left-right confusion using the Kinect device. The monitoring aims to extract the achievement level, to calculate temporal parameters (reaction time, fulfillment time, or failure time), and to track the trajectories of human body parts in the psychomotor exercises. In these exercises, the patient has to move one hand to touch the head or one facial feature. The proposed system advances the task monitoring of the previous system [21] by leveraging the Kinect's 3D skeleton tracking. The 3D system has several advantages over the previous system that motivated its development. It achieves a more robust and satisfactory monitoring of the cognitive rehabilitation exercises than the 2D system. The 3D data processing overcomes the limitation of the 2D system, which cannot discriminate whether a 2D occlusion between a hand and a facial feature involves their contact. With the use of the depth information, the 3D system can discriminate unambiguously whether a hand touches a facial feature. This way, the 3D system avoids the unlikely but possible erroneous processing of the 2D system, which could interpret, for instance, that a hand touched an eye or an ear when the hand only occluded this facial feature in its motion toward the other eye or ear, respectively. Thus, the 2D system could erroneously decide that the user confused the right eye with the left eye or the right ear with the left ear. The 3D system prevents this situation from happening. Besides, in the 2D system, when a hand and the face are in occlusion, both body parts are tracked jointly, so there is no precise information about the hand position in relation to the face in this occlusion state. In the 3D system, the hand position is always known. Moreover, the 3D system records the 3D positions of each human body part in the video sequence, unlike the 2D system, which records only the in-plane positions. The 3D positions are more detailed and more useful to the rehabilitation experts in further offline analysis. Lastly, although the 2D system is robust to different illumination conditions, it requires a certain level of illumination to perform color-based tracking of the hands and face. The Kinect device does not need any illumination to extract the depth information and only needs some amount of illumination for the AdaBoost face detector. Such an implementation requires substantially less illumination than the color-based tracking of the previous system.
The 3D system was conceived to be included in the GRADIOR system [22], a cognitive rehabilitation platform developed by the INTRAS Foundation. INTRAS is a nonprofit organization whose aim is to develop and promote activities concerning assistance, research, evaluation, and dissemination in the social and health scope [22]. GRADIOR was supported by the Spanish Ministries of Education and of Science and Innovation, includes more than 15,000 rehabilitation exercises for different cognitive functions such as attention, perception, memory, calculation, and language, and was assessed satisfactorily by rehabilitation experts and users [23]. It is currently used by more than 450 medical centers all over the world.
The rest of the paper is organized as follows. In Section 2, the Kinect device and its applications to different areas, e.g. human activity recognition, are presented. The proposed system is described in Section 3, the experimental results are reported in Section 4, and the conclusions are drawn in Section 5.
2.

3. Proposed system

3.1. Clinical motivation
3.2.
To carry out the monitoring of the psychomotor exercises, the system needs to achieve face and facial feature (nose, eyes, and ears) detection as well as left- and right-hand detection and tracking. The system proposed in [21] works with images captured by a web camera through 2D processing. Because it does not process 3D information, that system has the following limitation: if a hand were between the camera and a facial feature, it could occlude the facial feature in a video sequence even though the hand had not touched the face. The 2D system can be considered to have an acceptable performance, as the most significant information of a psychomotor exercise lies in the fact that the user moves the corresponding hand to the facial feature, regardless of whether the touch between the hand and the facial feature actually takes place. Nevertheless, the system presented in this paper works with 3D information and thus overcomes this limitation and its possible consequences explained in the introduction. The system achieves real-time performance in an environment with a certain minimum level of illumination.
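The depth-based disambiguation between a real touch and a mere 2D occlusion can be sketched as follows. The coordinates, the millimetre units, and the 15 cm touch threshold are illustrative assumptions, not values from the paper.

```python
import math

def classify_overlap(hand_xyz, feature_xyz, touch_mm=150.0):
    """Decide whether a 2D overlap between a hand and a facial feature is
    a real touch or a mere occlusion, using the depth (z) coordinate.

    hand_xyz, feature_xyz: (x, y, z) positions in millimetres.
    touch_mm: assumed maximum 3D distance that still counts as a touch.
    """
    dist = math.dist(hand_xyz, feature_xyz)  # full 3D Euclidean distance
    return "touch" if dist <= touch_mm else "occlusion"

# Hand directly in front of the eye but 40 cm closer to the camera:
# in the image plane they overlap, yet no touch occurs.
print(classify_overlap((0, 0, 600), (0, 0, 1000)))  # occlusion
print(classify_overlap((5, 5, 990), (0, 0, 1000)))  # touch
```

A 2D system sees only the in-plane overlap in both cases; the depth coordinate is what separates them.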
As shown in Fig. 1, the system comprises two stages: a skeleton detection stage and an exercises monitoring stage.
3.2.1.
3.2.2.
Y = (Ymax - Ymin)(X - Xmin)/(Xmax - Xmin) + Ymin   (2)

where Y is the new value of a pixel, Ymax and Ymin the new range of depth values, X the depth value before the rescaling, and Xmax and Xmin the initial range of depth values before the rescaling.
To decrease the irregularities of the surface depicted in the depth image, a first smoothing is applied through a median filter, with which the depth value of a pixel is replaced by the median of the neighboring pixels.
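The rescaling of Eq. (2) and the median smoothing can be sketched as below. NumPy and a square 3x3 neighborhood are assumptions; the paper does not specify the window size.

```python
import numpy as np

def rescale_depth(depth, y_min=0.0, y_max=255.0):
    """Linear rescaling of Eq. (2): map [Xmin, Xmax] onto [Ymin, Ymax]."""
    x_min, x_max = depth.min(), depth.max()
    return (y_max - y_min) * (depth - x_min) / (x_max - x_min) + y_min

def median_smooth(depth, radius=1):
    """Replace each pixel by the median of its (2*radius+1)^2 neighborhood."""
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")
    out = np.empty_like(depth)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + 2 * radius + 1,
                                         j:j + 2 * radius + 1])
    return out

img = np.array([[800, 800, 800],
                [800, 4000, 800],   # an isolated spike (sensor artifact)
                [800, 800, 800]])
print(median_smooth(img)[1, 1])  # 800: the spike is removed
```

The median filter removes isolated depth spikes without blurring edges, which is why it is preferred over a mean filter for this first smoothing.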
The mean curvature H and the Gaussian curvature K of the surface are then computed as

H(x, y) = ((1 + fy^2) fxx - 2 fx fy fxy + (1 + fx^2) fyy) / (2 (1 + fx^2 + fy^2)^(3/2))   (3)

K(x, y) = (fxx fyy - fxy^2) / (1 + fx^2 + fy^2)^2   (4)
To calculate the derivatives, the surface around each pixel is fitted to a paraboloid. To achieve this, the following biquadratic polynomial is used:

gij(x, y) = aij + bij(x - xi) + cij(y - yj) + dij(x - xi)(y - yj) + eij(x - xi)^2 + fij(y - yj)^2   (5)

where the coefficients aij, bij, cij, dij, eij, fij are obtained through the least squares fitting of the pixels in a neighborhood of (xi, yj). The derivatives of f in (xi, yj) are then estimated by the derivatives of gij, which are:

fx(xi, yj) = bij   (6)

fy(xi, yj) = cij   (7)

fxy(xi, yj) = dij   (8)

fxx(xi, yj) = 2 eij   (9)

fyy(xi, yj) = 2 fij   (10)
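The least-squares fit of Eq. (5) and the curvature computation of Eqs. (3)-(10) can be sketched as follows. The 5x5 neighborhood and the unit pixel spacing are assumptions.

```python
import numpy as np

def fit_paraboloid(patch):
    """Least-squares fit of the biquadratic of Eq. (5) to a square depth
    patch centered at (0, 0); returns the coefficients a, b, c, d, e, f."""
    r = patch.shape[0] // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coef, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
    return coef

def curvatures(patch):
    """Mean (H) and Gaussian (K) curvature at the patch center, using the
    derivative estimates fx=b, fy=c, fxy=d, fxx=2e, fyy=2f of Eqs. (6)-(10)."""
    a, b, c, d, e, f = fit_paraboloid(patch)
    fx, fy, fxy, fxx, fyy = b, c, d, 2 * e, 2 * f
    H = ((1 + fy**2) * fxx - 2 * fx * fy * fxy + (1 + fx**2) * fyy) \
        / (2 * (1 + fx**2 + fy**2) ** 1.5)
    K = (fxx * fyy - fxy**2) / (1 + fx**2 + fy**2) ** 2
    return H, K

# A perfect bowl z = x^2 + y^2: fx = fy = 0, fxx = fyy = 2, fxy = 0,
# so Eq. (3) gives H = 2 and Eq. (4) gives K = 4.
ys, xs = np.mgrid[-2:3, -2:3]
bowl = (xs**2 + ys**2).astype(float)
H, K = curvatures(bowl)
print(round(H, 6), round(K, 6))  # 2.0 4.0
```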
After computing the mean and Gaussian curvatures, an HK classification [43] of the surface pixels is applied to obtain a concise description of the local behavior of the surface.
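A direct implementation of the HK classification of Table 1 might look as follows; the tolerance eps below which a curvature is treated as zero is an assumption.

```python
def hk_class(H, K, eps=1e-3):
    """HK classification of a surface pixel (Table 1).

    eps: assumed tolerance under which a curvature is considered zero.
    """
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {
        (-1, -1): "Hyperbolic concave",  (-1, 0): "Cylindrical concave",
        (-1, 1): "Elliptical concave",
        (0, -1): "Hyperbolic symmetric", (0, 0): "Planar",
        (0, 1): "Impossible",
        (1, -1): "Hyperbolic convex",    (1, 0): "Cylindrical convex",
        (1, 1): "Elliptical convex",
    }
    return table[(h, k)]

print(hk_class(2.0, 4.0))   # Elliptical convex
print(hk_class(0.0, 0.0))   # Planar
```

Elliptical convex and elliptical concave pixels are the classes of interest here, since the nose tip and the eye sockets behave locally as a pit and as bowls in the depth surface.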
Fig. 3 - (a) Facial image with the elliptical convex regions in green and the elliptical concave regions in red; (b) thresholded elliptical convex and elliptical concave regions. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 1 - HK classification of the local surface in a pixel as a function of the values of the mean and Gaussian curvatures.

         K < 0                  K = 0                  K > 0
H < 0    Hyperbolic concave     Cylindrical concave    Elliptical concave
H = 0    Hyperbolic symmetric   Planar                 Impossible
H > 0    Hyperbolic convex      Cylindrical convex     Elliptical convex

3.2.3. 3D ear detection
Fig. 5 - (a) Example of a depth image obtained from the Kinect sensor; (b) depth image after applying the depth filter; (c) depth image after applying the rescaling.
3.3.
Fig. 6 - Face orientation and rectangular region used to detect the ears in two frames.
Fig. 7 - (a) Depth values of the rectangular region used to obtain the ears; (b) regions with the lowest non-null value in the two halves of the rectangular region.
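The ear search suggested by Fig. 7 (the region with the lowest non-null depth value in each half of the rectangular region beside the face) might be sketched as follows; the sample region contents and the zero-means-invalid convention are illustrative assumptions.

```python
import numpy as np

def locate_ears(region):
    """In each half of a rectangular depth region, find the pixel with the
    lowest non-null depth (closest to the sensor) as a candidate ear
    position. Zeros are treated as invalid Kinect readings."""
    h, w = region.shape
    halves = {"left": region[:, :w // 2], "right": region[:, w // 2:]}
    ears = {}
    for side, half in halves.items():
        valid = np.where(half > 0, half, np.inf)  # mask out null depths
        i, j = np.unravel_index(np.argmin(valid), valid.shape)
        if side == "right":
            j += w // 2                           # back to full-region coords
        ears[side] = (int(i), int(j))
    return ears

region = np.array([[0, 900, 0, 950],
                   [880, 920, 930, 870],
                   [0, 910, 940, 0]])   # depths in mm, 0 = missing reading
print(locate_ears(region))  # {'left': (1, 0), 'right': (1, 3)}
```

As noted in the evaluation, this minimum-depth assumption is also the approach's weak point: when the depth around the ears is not actually the lowest in the region, the detection fails.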
4.
Fig. 8 - Monitoring of four psychomotor exercises. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)
Exercise processing times: 16 ms, 35 ms, and 28 ms.
4.1. Performance evaluation
Table 3 - Performance results of the system.

Exercise   # of monitorings   # of successful monitorings   # of erroneous monitorings
#1         75                 73                            2
#2         75                 71                            4
#3         75                 69                            6
#4         75                 73                            2
#5         75                 74                            1
#6         75                 73                            2
#7         75                 72                            3
#8         75                 67                            8
#9         75                 69                            6
#10        75                 71                            4
#11        75                 75                            0
#12        75                 74                            1
#13        75                 75                            0
#14        75                 75                            0
Total      1050               1011 (96.28%)                 39 (3.71%)
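As a quick arithmetic check on the per-exercise counts of Table 3 (the numbers are from the paper):

```python
# Per-exercise success/error counts transcribed from Table 3.
successes = [73, 71, 69, 73, 74, 73, 72, 67, 69, 71, 75, 74, 75, 75]
errors = [2, 4, 6, 2, 1, 2, 3, 8, 6, 4, 0, 1, 0, 0]

# Each of the 14 exercises was monitored 75 times (15 users x 5 tests).
assert all(s + e == 75 for s, e in zip(successes, errors))

total_ok = sum(successes)          # 1011 successful monitorings
rate = 100 * total_ok / (14 * 75)  # 96.285..., reported as 96.28%
print(total_ok, rate)
```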
head joint given by the skeleton tracking. Thus the face detection time is reduced to 35 ms and the system processes 12 fps.
The system was tested in the research laboratory with an unconstrained background and the amount of illumination necessary for face detection. Fifteen individuals tested it: 10 healthy, 2 with frontal lobe injury, and 3 with mild dementia, in order to examine differences among them when testing the system. The individuals were informed about the procedure of the tests. Each test would be formed by a series of exercises in which the right or the left hand would have to touch a facial feature. The particular exercise would be detailed by an acoustic message or a pop-up window. After being informed about an exercise, the participants would have to touch the facial feature implied in the exercise with the corresponding hand and remain in the final position of the exercise until a pop-up window indicated the correct or incorrect realization of the exercise or the expiration of the time to fulfill it. The participants did not have to maintain a constrained posture during the exercises, although the face had to be in a posture such that the facial features were visible. The participants performed 5 tests, each with the 14 psychomotor exercises, on three different days. Each participant had 30 s to fulfill an exercise after being informed about it. Table 3 shows the performance results of the system. A monitoring is considered successful if the correct or incorrect realization of the exercise is obtained.
Failures in exercises involving the eyes and nose were caused by the tendency to turn the head while the hand was moving toward the facial feature; thus, in the occlusion state, the face was far from a frontal position and the facial feature detection failed.
The larger number of errors in the monitoring of exercises involving the ears was due to the fact that the depth values in the pixels around the ears were not as low as the applied approach expected. Sometimes, the Kinect device failed to obtain an accurate depth value in some image regions.
The overall successful monitoring percentage was 96.28%.
The system is considered adequate for the rehabilitation
experts at INTRAS Foundation to monitor the psychomotor
exercises.
At the end of the exercises, the participants were asked about the usability, enjoyment, and motivation they experienced while doing the tests monitored by the 3D system.
5. Conclusions
In this paper, a real-time 3D computer vision-aided system applied to the monitoring of psychomotor exercises is proposed. The system is intended for the assessment and evaluation of body scheme dysfunctions and left-right confusion. Other computer vision-based systems presented in the literature do not focus on these disabilities. Monitoring is achieved through human body joint tracking (head and hands) and face and facial feature (eyes, nose, and ears) detection using the depth information provided by the Kinect device. Thus, the system overcomes the shortcomings of 2D computer vision-based approaches and is robust to drastic changes in the working environment and illumination. The Kinect device proved to be an affordable off-the-shelf product, easy to use and performing well, that can be very useful for developing applications in many fields such as rehabilitation.
The proposed system is capable of monitoring human limbs, analyzing their motion in relation to the psychomotor exercise indicated to the user, and storing the result of the realization of a set of exercises. The automation of these tasks frees the rehabilitation experts from doing them. The system was evaluated with 15 users, achieving a successful monitoring percentage of 96.28%. This performance is suitable for the integration of the system in a multimodal cognitive rehabilitation platform.
Acknowledgements

This work was partially supported by the Spanish Ministry of Science and Innovation under project TIN2010-20529 and by the Regional Government of Castilla y León (Spain) under project VA171A11-2. We would like to thank the people at the INTRAS Foundation for their contribution and advice regarding the clinical focus and requirements that make the computer vision system applicable in cognitive rehabilitation and integrable in the GRADIOR platform.
References