
2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel

A Single Camera Based Floating Virtual Keyboard with Improved Touch Detection
Erez Posner
Department of Electrical Engineering,
Afeka, Tel Aviv College of Engineering
218 Bney Efraim Rd., Tel Aviv 69107, Israel

Nick Starzicki
Department of Electrical Engineering,
Afeka, Tel Aviv College of Engineering
218 Bney Efraim Rd., Tel Aviv 69107, Israel

Eyal Katz
Department of Electrical Engineering,
Afeka, Tel Aviv College of Engineering
218 Bney Efraim Rd., Tel Aviv 69107, Israel

Abstract: A virtual keyboard enables the user to type on any surface, including plain paper or a desk. Some virtual keyboards give vibration feedback, some are projected onto the typing surface, and others give a different kind of visual feedback, such as showing the keyboard on a smartphone's screen. The user presses the virtual keys, thus typing the desired input. In this work we have implemented a virtual keyboard based on a single standard camera by improving shadow-based touch detection. The proposed solution is applicable to any surface. The system has been implemented on an Android phone and operates in real time (15 fps), with a touch accuracy of approximately 95%.

This paper describes an improved method for implementing a virtual keyboard using the single integrated standard two-dimensional (2D) camera of a smartphone.
Virtual keyboards have been proposed based on different methods [1]-[13]. Camera-based virtual keyboards can be implemented using a single camera or multiple cameras. One of the major challenges is determining whether the finger touches the surface. Touch detection based on a real three-dimensional (3D) model built by a stereoscopic camera system is more accurate than single-camera solutions. However, since stereoscopic cameras are not common in mobile phones, this method is less applicable to mobile solutions. The challenge of accurate touch detection is even greater when using a single camera and almost any surface. Floating virtual keyboards, which are portable and allow the camera to be directed at any surface, are even more challenging to implement.
The proposed solution is based on a single standard mobile phone camera, implements a floating keyboard, and presents an improved touch detection method based on shadow analysis. This enables working on any surface, as long as both the finger and its shadow are visible to the camera.

The rest of the paper is organized as follows: Section II reviews related work on existing 2D and 3D touch detection methods. Section III describes the proposed system, the shadow-based touch detection improvement, and the conditions under which the proposed solution applies. Section IV presents comparison results, and Section V concludes the paper.

A virtual keyboard can be implemented using several different techniques. In [1], a virtual keyboard (VKB) based on true-3D optical ranging is presented. It is accurate and robust; however, it requires a 3D optical imaging system. Similarly, the systems in [2] and [3] provide special features such as hand gestures and multi-touch, but [2] requires multiple cameras and [3] requires unique hardware. In [4], the shadow of a finger is detected, and when the shadow is occluded by the finger, a touch is assumed. The corresponding touch detection system of [5] detects a touch from the ratio of white to black pixels: as we understand [5], small regions around the fingertips are searched and the number of white pixels is compared to the number of black ones, where black pixels represent the shadow. If the ratio of white to black pixels exceeds a certain threshold, a touch is assumed to have occurred. However, these methods are sensitive to the direction of the lighting, and in many cases only a thin portion of the shadow is captured by the camera. Therefore, a finger that appears to be touching the surface may still be far away from it, since the pixel difference is small. In [6] and [7], a high-speed camera is used, and a special in-air movement must be made.
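The white/black pixel-ratio test described above for [5] can be sketched as follows, for comparison with the distance measure proposed later. This is a minimal illustration under our own assumptions, not the code of [5]: the binary image is a 2-D list of 0/255 values in which black (0) marks shadow, and the window half-width `half` and the ratio `threshold` are hypothetical parameters.

```python
def pixel_ratio_touch(mask, tip, half=2, threshold=1.0):
    """Sketch of a white/black pixel-ratio touch test in the spirit of [5].

    mask      -- 2-D list of 0/255 values; 0 (black) marks shadow pixels.
    tip       -- (x, y) fingertip coordinate.
    half      -- hypothetical half-width of the square search window.
    threshold -- hypothetical ratio threshold (tuned empirically in practice).
    Returns (touch_detected, ratio).
    """
    x0, y0 = tip
    height, width = len(mask), len(mask[0])
    white = black = 0
    # Count white and black pixels in the window, clipped to the image.
    for y in range(max(0, y0 - half), min(height, y0 + half + 1)):
        for x in range(max(0, x0 - half), min(width, x0 + half + 1)):
            if mask[y][x] == 255:
                white += 1
            else:
                black += 1
    # No shadow pixels at all: treat the ratio as infinite.
    ratio = white / black if black else float("inf")
    return ratio > threshold, ratio
```

Because the decision hinges on a fixed `threshold`, a thin shadow caused by unfavorable lighting can push the ratio over it while the finger is still above the surface, which is the weakness noted above.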
This section describes in detail the implementation of the proposed solution shown in Figure 1. The solution is primarily based on [5]; the added modifications are marked in red in Figure 1.

978-1-4673-4681-8/12/$31.00 2012 IEEE

The system is divided into two parts: captured-image analysis and shadow analysis. Initially, the captured image is pre-filtered to reduce the noise level. Then, the captured image is subtracted from an initial reference image, and the user's hand is detected using color segmentation, resulting in the hand Region of Interest (ROI). Morphological operations are used to enhance the preliminary filter results, thus removing artifacts and undesirable small blobs. This ROI is then further processed by edge detection, which is eventually used to identify the fingertips. For fingertip detection we used the algorithm described in [8].
The touch detection challenge is addressed by using the shadow as a measure of the finger's distance to the surface. The shadow is therefore extracted by subtracting the captured image from an initial reference image. The subtracted image is processed in the same way as the captured image in order to find the shadow's contour, which leads to the shadow's tip. The proximity of the fingertip and the shadow's tip is then evaluated, under specific conditions and constraints, to detect a touch. Feedback is given to the user on the Android phone's screen: the displayed virtual keyboard turns red and the selected letter is indicated.
One of the main constraints is the implementation on an Android phone, which is not as powerful as a PC. Specific steps were therefore taken to reduce the complexity of our system.

Figure 1: Block diagram of the proposed virtual keyboard. The red outline marks the improved algorithm.

1. Filtering, Image Subtraction, Color Segmentation and Morphological Operations
The first phase separates pixels that may be hand pixels from those that are not. The processed frame is blurred using a 3x3 Gaussian filter to reduce image noise and detail. Then, the image is subtracted from an initial reference image, as presented in [5]. To improve hand detection and ignore foreign objects, skin segmentation is used. Due to hardware constraints, the captured image is coded in the YUV color space, so a transformation to the HSV color space is performed. Since the dominant color of the hand is red, the HSV color space allows the entire red color region to be observed, so that the hand colors of different users are not missed. The detected hand region that passes a certain threshold turns white, while the rest of the image is made black. Ideally, after these operations only the hand should remain. However, small bumps remained, so we applied a median filter and a morphological close operation. The median filter is applied using a 3x3 rectangular element as

$$ M_{3\times3}(x,y) = \operatorname*{median}_{i,j = -1,\dots,+1} src(x+i,\, y+j) \qquad (1) $$

$$ dst(x,y) = \begin{cases} 255, & M_{3\times3}(x,y) \geq 255 \\ 0, & M_{3\times3}(x,y) < 255 \end{cases} \qquad (2) $$

where src is the binary hand image.
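The skin segmentation and median-filter steps of this phase can be sketched in plain Python as below. This is a minimal illustration, not the authors' Android code. We assume each camera pixel is available as full-range BT.601 (JFIF) YUV values; the conversion constants are the standard JFIF ones, while the hue tolerance `hue_tol` and minimum saturation `min_sat` are hypothetical placeholders rather than values from the paper. `median3x3` applies the 3x3 median followed by the 0/255 threshold described above.

```python
import colorsys
from statistics import median


def _clamp(c):
    return min(255.0, max(0.0, c))


def yuv_to_rgb(y, u, v):
    """Full-range BT.601 (JFIF) YUV -> RGB; the camera encoding is an assumption."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return _clamp(r), _clamp(g), _clamp(b)


def is_red_hue(y, u, v, hue_tol=0.08, min_sat=0.2):
    """Hypothetical skin test: accept saturated pixels whose hue is near red."""
    r, g, b = yuv_to_rgb(y, u, v)
    h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Hue is in [0, 1) and red wraps around 0, so test both ends of the range.
    return s >= min_sat and (h <= hue_tol or h >= 1 - hue_tol)


def segment(yuv_image):
    """Map a 2-D list of (y, u, v) tuples to a 0/255 hand mask."""
    return [[255 if is_red_hue(*px) else 0 for px in row] for row in yuv_image]


def median3x3(mask):
    """3x3 median filter on a 0/255 mask, followed by the >= 255 threshold.

    Windows are clipped at the image border; an even-sized border window whose
    two middle values differ has median 127.5, which the threshold maps to 0.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            out[y][x] = 255 if median(window) >= 255 else 0
    return out
```

On a binary mask the median keeps a pixel white only when most of its neighbourhood is white, so isolated specks left by segmentation disappear.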



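The morphological close operation consists of dilating and then eroding the ROI using a 3x3 rectangular element:

$$ dst = \operatorname{erode}(\operatorname{dilate}(src)) \qquad (3) $$

where dilate is

$$ \operatorname{dilate}(x,y) = \max_{(x',y') \in \text{element}} src(x+x',\, y+y') \qquad (4) $$

and erode is

$$ \operatorname{erode}(x,y) = \min_{(x',y') \in \text{element}} src(x+x',\, y+y') \qquad (5) $$

where src is the input image of each operation and (x', y') ranges over the structuring element.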
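The close operation can be sketched directly from the max/min definitions above, on a 0/255 mask stored as a 2-D list. This is an illustration, not the authors' implementation; ignoring neighbours that fall outside the image is our assumption about border handling (libraries such as OpenCV offer several border modes).

```python
def _morph(mask, pick):
    """Apply pick (max or min) over each pixel's 3x3 neighbourhood.

    Neighbours outside the image are ignored (an assumed border policy).
    """
    height, width = len(mask), len(mask[0])
    return [[pick(mask[y + dy][x + dx]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if 0 <= y + dy < height and 0 <= x + dx < width)
             for x in range(width)]
            for y in range(height)]


def dilate(mask):
    """Maximum over the 3x3 rectangular structuring element."""
    return _morph(mask, max)


def erode(mask):
    """Minimum over the 3x3 rectangular structuring element."""
    return _morph(mask, min)


def morph_close(mask):
    """Morphological close: dilate, then erode the dilated image."""
    return erode(dilate(mask))
```

Closing fills gaps narrower than the structuring element while largely preserving the outline of larger regions, which is why it suits the small holes left after segmentation.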
In some cases, small objects other than the hand remained and had to be removed. They are removed after edge detection, together with any small bumps that may have remained.

Figure 3: Finger Edge Detection. (a) The continuous finger curve is shown in white; (b) the discrete finger curve is shown in yellow.

Figure 2: Finger Separation. The finger region is shown in white.

3. Tip Detection
In this stage the tip of the object is found using the approach presented in [8]. The finger's discrete outline is converted into a list of consecutive coordinates C(j) representing the finger's contour. For each index j, the three contour points [C(j-k), C(j), C(j+k)] form the vertices of a triangle, and the head angle is calculated under the assumption that the middle point C(j) is the head vertex. The angle is calculated using the law of cosines:

$$ \theta_j = \arccos\left( \frac{a^2 + b^2 - c^2}{2ab} \right) $$

where $\theta_j$ is the angle at the contour point examined as a potential peak, and a, b, c are the triangle edges calculated as distances between [C(j-k), C(j), C(j+k)]: a and b are the edges meeting at C(j), and c is the edge opposite it.
Once all angles are acquired, the smallest angle represents the fingertip. This assumes that the surroundings of the fingertip are the only area that produces the smallest angle. For this method to work, the finger's contour must be discrete; otherwise the coordinates are too close to each other, and even the angle at the fingertip will be large. In addition, the discrete method also reduces the complexity of our system.
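The tip search can be sketched as below. It follows the description above: the head angle at C(j) comes from the law of cosines, and the point with the smallest angle is taken as the fingertip. The offset `k`, treating the contour as closed (indices wrap around), and the helper names are our assumptions, not details given in [8].

```python
import math


def head_angle(p_prev, p_mid, p_next):
    """Angle (radians) at p_mid of the triangle p_prev, p_mid, p_next.

    Law of cosines: a and b are the edges meeting at p_mid, c is opposite it.
    """
    a = math.dist(p_mid, p_prev)
    b = math.dist(p_mid, p_next)
    c = math.dist(p_prev, p_next)
    if a == 0 or b == 0:
        return math.pi  # degenerate triangle: treat as a flat (180 degree) angle
    cos_theta = (a * a + b * b - c * c) / (2 * a * b)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def find_tip(contour, k=1):
    """Return (j, C(j)) with the smallest head angle on a closed contour.

    Indices wrap around, treating the contour as closed (an assumption).
    """
    n = len(contour)
    j_best = min(range(n), key=lambda j: head_angle(
        contour[(j - k) % n], contour[j], contour[(j + k) % n]))
    return j_best, contour[j_best]
```

For example, on the coarse outline `[(0, 0), (2, 0), (4, 0), (4, 4), (2, 10), (0, 4)]` the sharpest vertex, and therefore the detected tip, is `(2, 10)`.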

2. Edge Detection
The edge detection phase is essential for finding the hand's contour and, from it, the fingertip. To extract the contour we use the Canny edge detector. Because the acquired contour is not continuous, additional processing is needed: we use 8-connected component analysis, which produces a counter-clockwise sequence of perimeter coordinates tracing the outline of the hand,

$$ I(j) = \{ (x_j, y_j) \} $$

This enables a complete traversal of the hand's edge, which is used in step 3.
To remove the remaining bumps, we threshold the image under the assumption that there should be only one large object in the frame: the hand. The area of each remaining object is calculated, and the largest one is taken to be the hand. To reduce complexity, the continuous contour is then transformed into a discrete one.
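The largest-object selection and the contour discretization described above can be sketched as below; Canny edge detection and counter-clockwise boundary tracing are not reproduced. The function names and the subsampling `step` are hypothetical, and discretizing by keeping every `step`-th contour point is one simple choice, not necessarily the one used by the authors.

```python
from collections import deque


def largest_component(mask):
    """Keep only the largest 8-connected white (255) object in a 0/255 mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for sy in range(height):
        for sx in range(width):
            if mask[sy][sx] != 255 or seen[sy][sx]:
                continue
            # Breadth-first flood fill over the 8-neighbourhood.
            component = []
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                component.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx] and mask[ny][nx] == 255):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # The object's area is its pixel count; keep the largest.
            if len(component) > len(best):
                best = component
    out = [[0] * width for _ in range(height)]
    for x, y in best:
        out[y][x] = 255
    return out


def discretize(contour, step=5):
    """Keep every step-th point of an ordered contour I(j) = [(x_j, y_j), ...]."""
    return contour[::step]
```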



Figure 4: Finger-tip Detection. The detected fingertip is shown as a red dot on the hand's contour.

4. Shadow Extraction
Most smartphones, in particular the Samsung Galaxy s9000I, offer many camera features, among them a configurable ISO setting. ISO determines how sensitive the image sensor is to light. A fixed setting of ISO 100 was found effective in disabling dynamic (automatic) adjustments that could affect shadow isolation.
The image obtained from step 1 and the captured image are subtracted from the initial reference image, leaving only the shadow in the resulting image (after it is transformed into a binary image). The image containing the shadow is then processed as if a hand were being detected, through phases 1 to 3 (without skin segmentation). The outcome is the shadow's tip, found as described in step 3.
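Under one reading of the subtraction above, pixels that differ noticeably from the reference frame belong either to the hand or to its shadow, so removing the hand mask obtained in step 1 leaves the shadow. The sketch below encodes that reading; the grayscale input and `diff_threshold` are assumptions, and the shadow is marked white (255) so that it can be passed through the same processing as the hand mask.

```python
def extract_shadow(reference, captured, hand_mask, diff_threshold=30):
    """Binary shadow mask from reference and captured grayscale frames.

    reference, captured -- 2-D lists of grayscale values (0-255).
    hand_mask           -- 0/255 hand mask from the first phase.
    diff_threshold      -- hypothetical binarization threshold.
    A pixel is marked as shadow (255) if it changed relative to the
    reference frame and is not part of the hand.
    """
    height, width = len(reference), len(reference[0])
    shadow = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            changed = abs(reference[y][x] - captured[y][x]) > diff_threshold
            if changed and hand_mask[y][x] != 255:
                shadow[y][x] = 255
    return shadow
```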









Figure 5: Shadow Processing. (a) Shadow extraction from the captured image; (b) the shadow's contour, consisting of the main shadow and an unnecessary small shadow; (c) the shadow's continuous curve; (d) the detected shadow tip, shown as a green dot on the shadow's contour.

5. Touch Detection
The fingertip detected in step 3 and the corresponding shadow tip found in step 4 estimate the finger and shadow locations in the image. The fingertip $(x_{SF}, y_{SF})$ is marked on the hand's curve in red in Figure 6.a and Figure 6.b. The corresponding point on the shadow's curve, $(x_S, y_S)$, is marked in green in Figure 6.a (this assumes that the corresponding point on the shadow curve is always visible). The distance between $(x_{SF}, y_{SF})$ and $(x_S, y_S)$ is then calculated as

$$ d = \left\| (x_{SF} - x_S,\; y_{SF} - y_S) \right\| = \sqrt{(x_{SF} - x_S)^2 + (y_{SF} - y_S)^2} \qquad (8) $$

as opposed to the pixel ratio measured in [5]:

$$ r = \frac{N_{white}}{N_{black}} \qquad (9) $$

where $N_{white}$ and $N_{black}$ are the numbers of white and black (shadow) pixels in a small region around the fingertip.

As seen in Figure 6.a, when the finger is distant from the surface, the distance d is large and there is no touch; when they are close, as shown in Figure 6.b, the distance is small and a touch has occurred. Note that when d is small enough to represent a touch, both the fingertip and the shadow tip must lie in the same region, representing a certain letter, for the touch to be valid. Special cases are handled or flagged by the system.
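The decision rule above can be sketched as follows, taking d of Eq. (8) to be the Euclidean distance. The threshold `max_dist` and the `key_at` callback that maps an image point to a key label are hypothetical; the paper does not give the threshold value or describe how special cases are handled.

```python
import math


def detect_touch(finger_tip, shadow_tip, key_at, max_dist=3.0):
    """Return (key or None, d) for one frame.

    finger_tip -- (x_SF, y_SF); shadow_tip -- (x_S, y_S).
    key_at     -- hypothetical callback mapping a point to a key label or None.
    max_dist   -- hypothetical touch threshold in pixels.
    """
    d = math.dist(finger_tip, shadow_tip)  # Euclidean distance, Eq. (8)
    if d > max_dist:
        return None, d  # finger still above the surface
    key_f, key_s = key_at(finger_tip), key_at(shadow_tip)
    # Both tips must fall in the same letter region for a valid touch.
    if key_f is None or key_f != key_s:
        return None, d
    return key_f, d
```

A touch whose fingertip and shadow tip straddle two keys is rejected rather than guessed, mirroring the same-region requirement in the text.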



Figure 6: The shadow curve is shown in blue and the finger curve in white. (c) A typical no-touch case: the distance between the red and green dots is large. (d) A typical touch case: the distance is small. (e) and (f) The virtual keyboard as seen on the phone's screen: (e) the floating keyboard; (f) a touch detected on the letter "I".

6. Mapping
The bottom half of the phone's screen is divided into 30 buttons. The button coordinates are known, which makes the mapping simple: once a touch is detected, its coordinate is known and is compared to the button ranges to obtain the requested key and provide user feedback. The keyboard was implemented in both English and Hebrew.
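A possible mapping for the 30 buttons in the bottom half of the screen is sketched below. The 3 x 10 grid, the screen size in the example, and the placeholder labels are assumptions for illustration; the paper does not specify its layout. The returned `key_at` function maps a point to a label, or to None outside the keyboard area.

```python
def make_key_mapper(width, height, rows=3, cols=10, labels=None):
    """Return key_at((x, y)) for a button grid covering the bottom half.

    rows, cols and labels are hypothetical; 3 x 10 gives 30 buttons.
    """
    labels = labels or [str(i) for i in range(rows * cols)]
    top = height // 2           # keyboard occupies the bottom half
    key_w = width / cols
    key_h = (height - top) / rows

    def key_at(point):
        x, y = point
        if not (0 <= x < width and top <= y < height):
            return None         # outside the keyboard area
        row = int((y - top) // key_h)
        col = int(x // key_w)
        return labels[row * cols + col]

    return key_at
```

For an assumed 800 x 480 screen, `make_key_mapper(800, 480)` maps `(0, 240)` to the first button and `(799, 479)` to the last one.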
Examining the ratio r of Eq. (9) [5], it is apparent that the shadow area changes little or not at all between "almost touch" and actual touch, and is not reduced to zero. In contrast, the proposed measure is the distance between the fingertip and the shadow tip, and the distance d of Eq. (8) reduces to zero upon actual touch.
Furthermore, consider the possibility of false touch detection. The existing method of [5] needs a threshold that depends on the lighting conditions and on the texture of the virtual keyboard surface, so this threshold acts on a variable and noisy input. Therefore, it appears that the method of [5] may report several points as touches, or a false touch that the user did not intend: there are several places in the image where the ratio favors the white pixels, but this does not necessarily mean that the finger is touching the desired key. The proposed solution, by contrast, always searches for a fingertip-to-shadow-tip distance that ideally reduces to one pixel, depending on the position of the light source.
[Table 1: columns "Existing Measure: Pixel Ratio [5]" and "Distance (d)"; rows "No touch" and "Almost Touch"]

For future work, we believe it is possible to reduce the system's complexity even further and to add functionality such as more languages, font changes, multi-touch typing and more. Although there is always room for improvement, we believe we have provided a robust virtual keyboard implementation that addresses challenges met in previous work and may serve as a basis for future work.








Table 1: Comparison of distance measurement and pixel ratio using a 3x3 rectangular element. Note the differences in values between almost touch and touch.

The presented touch detection method addresses these issues by adding another layer of accuracy. Analyzing the finger and the fingertip, and calculating their distance from the shadow tip, yields accurate coordinates for the location at which a touch has occurred. As mentioned in the previous section, a touch is valid only if both the fingertip and the shadow tip are in the same letter region, which further adds to the accuracy of the system. Special cases in which the region difference is minimal are handled by the system. Our touch detection implementation runs in real time at 15 fps and achieves a touch accuracy of approximately 95%, with minimal false touch detection.
A floating virtual keyboard based on a single camera has been presented; it is implemented on an Android smartphone and runs in real time. Since this implementation runs on a mobile phone, no extra hardware is required.
Touch detection is performed using improved shadow-based touch detection and runs at 15 fps. This detection is based on measuring the distance between corresponding points on the shadow and the finger, rather than measuring the ratio of finger and shadow pixels, and is therefore more robust to various illumination conditions. In addition, hand detection is based on the HSV color space, which, although heuristic, still provides better results than the RGB color space. Our touch accuracy is approximately 95%, and false touch detection is minimal.
The virtual keyboard implementation took system complexity into consideration, allowing the final product to run on a less powerful machine than a PC. This was done by rewriting many functions and minimizing their time complexity.
Within our project we implemented the option of choosing more than one language, and many more can be added. This is possible because our virtual keyboard is shown on a smartphone's screen rather than on the physical desk surface.

[1] H. Du, T. Oggier, F. Lustenburger, and E. Charbon, "A virtual keyboard based on true-3D optical ranging," Proc. British Machine Vision Conference (BMVC), Oxford, pp. 220-229, Sept. 2005.
[2] I. Katz, K. Gabayan, and H. Aghajan, "A Multi-Touch Surface Using Multiple Cameras," Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305, 2007.
[3] F. Echtler, M. Huber, and G. Klinker, "Shadow Tracking on Multi-Touch Tables," Technische Universität München, Institut für Informatik, Boltzmannstr. 3, D-85747 Garching, Germany, AVI '08, 28-30 May 2008, Napoli, Italy.
[4] A. D. Wilson, "PlayAnywhere: A compact interactive tabletop projection-vision system," Proc. 18th Annual ACM Symposium on User Interface Software and Technology, October 23-26, 2005, Seattle, WA, USA.
[5] Y. Adajania, J. Gosalia, A. Kanade, H. Mehta, and N. Shekokar, "Virtual Keyboard Using Shadow Analysis," IEEE Conference on Emerging Trends in Engineering and Technology (ICETET),
[6] K. Yamamoto, S. Ikeda, T. Tsuji, and I. Ishii, "A Real-time Finger-tapping Interface Using High-speed Vision System," 2006 IEEE International Conference on Systems, Man, and Cybernetics, October 8-11, 2006, Taipei, Taiwan.
[7] Y. Hirobe, T. Niikura, Y. Watanabe, T. Komuro, and M. Ishikawa, "Vision-based Input Interface for Mobile Devices with High-speed Fingertip Tracking," UIST '09, October 4-7, 2009, Victoria, BC,
[8] S. Malik, "Real-time Hand Tracking and Finger Tracking for Interaction," CSC2503F Project Report, December 18, 2003.
[9] T. Niikura, Y. Hirobe, A. Cassinelli, Y. Watanabe, T. Komuro, and M. Ishikawa, "In-air Typing Interface for Mobile Devices with Vibration Feedback," SIGGRAPH 2010.
[10] J. Mantyjarvi, J. Koivumaki, and P. Vuori,
[11] H. A. Habib and M. Mufti, "Real Time Mono Vision Gesture Based Virtual Keyboard System," IEEE Transactions on Consumer Electronics, vol. 52, no. 4, November 2006.
[12] S. Zhai, M. Hunter, and B. A. Smith, "The Metropolis Keyboard: An Exploration of Quantitative Techniques for Virtual Keyboard Design," Proc. ACM Symposium on User Interface Software and Technology (UIST 2000), November 5-8, 2000, San Diego, California, pp. 119-128.
[13] M. Kölsch and M. Turk, "Keyboards without Keyboards: A Survey of Virtual Keyboards," UCSB Technical Report 2002-21, July 12, 2002.