
Crisanto Da Cunha


Toolkit Support for the Analysis


of Finger Movement in a Video Stream
B.Sc. (Hons) Computer Science
23rd March 2018

Statement of Originality

I certify that the material contained in this dissertation is my own work and does not contain
unreferenced or unacknowledged material. I also warrant that the above statement applies to
the implementation of the project and all associated documentation.

Regarding the electronically submitted version of this submitted work, I consent to this being
stored electronically and copied for assessment purposes, including the School’s use of
plagiarism detection systems in order to check the integrity of assessed work.

I agree to my dissertation being placed in the public domain, with my name explicitly
included as the author of the work.

(Word Count: 11,723)

Date:

Signed:

Working Documents:

Student Wellbeing Services


Disability Service
Student Coursework Coversheet
Student is responsible for attaching/submitting this coversheet to coursework

Student ID: 33972974


Course/Module for which coversheet is submitted:
SCC. 300 – Final Year Project
(To be completed by student on submission)

By submitting this coursework coversheet, this student has requested that the academic marker of this work
takes into consideration the marking guidelines for students with Specific Learning Disabilities (SpLDs) when
marking this work, where appropriate (i.e. for issues outside of defined competence standards). The Inclusive
Learning and Support Plan for this student confirms their eligibility to submit this coursework coversheet.

If you have any further questions about this student’s needs, please either discuss your concerns with the
student, who should be able to answer most questions, or contact the Disability Service (x92111 or email:
disability@lancaster.ac.uk). For further guidance on teaching students with SpLDs, please see our website:
http://www.lancaster.ac.uk/current-staff/disability/inclusive-teaching/information-for-teaching-
staff/assessment/marking-for-students-with-spld/.

Marking guidelines for students with SpLDs


• Mark work primarily for content, ideas and critical thinking without penalising for weaknesses
of expression, spelling and grammatical errors unless the latter are defined as competence
standards; the work itself must be effective and fulfil the assessment criteria that have been set
within academic standards.
• To maintain academic standards, weaknesses of expression, spelling and grammatical errors
should not be disregarded where written expression is so poor that coherence and
intelligibility are an issue.
• Any work that cannot be marked without penalising for specific weaknesses of expression,
spelling or grammatical errors (e.g. externally accredited programme or programmes with
‘fitness to practice’ considerations) should implement these guidelines only as and when
appropriate.
• Students should be aware of the focus of marking. If work is marked for content and ideas
alone please make this clear to prevent any misunderstandings with respect to weaknesses of
expression, spelling or grammatical errors.
• Do not penalise poor handwriting.
• Be sensitive toward individuals and their work. Constructive criticism that is sympathetic to the
students’ difficulties can help individuals to progress.
• Clear explanations of what is expected in student’s work, how their work compares with these
expectations and how it can be changed to match expectations will be most helpful.
• The use of examples to show how to achieve expectations will be helpful.
• Be aware that the use of short sentences, repeated sentence construction or simple words and
terminology does not necessarily indicate poor understanding or unsophisticated concept
development but may reflect difficulties with word retrieval and sentence construction
characteristic of people with dyslexia.
Abstract

This project aims to develop a research toolkit that supports the analysis of finger movement
in a video stream. The toolkit will consist of both hardware and software; hardware includes
a camera and an input device such as an iPad. In addition, a program was written to analyse
the video and inputs.

The toolkit will be beneficial to research as it helps to better understand users' movement
habits. The aim of this project is to record the finger's path, 'hit' time, 'hit' location and 'hit'
angle ('hit' being any interaction with the iPad).
The computer vision library OpenCV will be used along with image processing techniques
such as segmentation, contouring, grayscale conversion and binarization.

In summary, this project was able to contribute ideas on how to simplify hand tracking. In
addition, several image processing techniques were discussed and used in this project to
gain accurate results.

Acknowledgements

Thanks to Jason Alexander, my project supervisor, for providing me with feedback and
guidance.

Contents Page

Abstract --------------------------------------------------------------------- 4
Acknowledgements ------------------------------------------------------- 4

Chapter 1: Introduction --------------------------------------------------------- 8

1.1 Chapter Overview ------------------------------- 8


1.2 Introduction -------------------------------------- 8
1.3 Aims and Objectives ---------------------------- 10
1.4 Report Overview -------------------------------- 11

Chapter 2: Background -------------------------------------------------------- 12

2.1 OpenCV Introduction ------------------------ 12


2.2 Background/Foreground Segmentation ---- 12
2.3 Haar Cascade ---------------------------------- 14
2.4 Chapter Summary ---------------------------- 14

Chapter 3: Design ----------------------------------------------------------------- 15


3.1 Chapter Overview ---------------------------- 15
3.2 Interview with PhD Student ----------------- 15
3.3 Software Resources -------------------------- 17
3.4 Video Clip Capture --------------------------- 17
3.5 Camera Capture Types ---------------------- 20
3.6 System Constraints --------------------------- 20
3.7 Summary Desired Output and System
Run-Through ----------------------------------- 21
3.8 Chapter Summary ---------------------------- 22

Chapter 4: Implementation -------------------------------------------------------- 23

4.1 Chapter Overview ---------------------------- 23


4.2 Video Capture -------------------------------- 23
4.3 Grayscale Conversion ----------------------- 24
4.4 Thresholding ---------------------------------- 25
4.5 Binarization ----------------------------------- 26
4.6 Morphological Transformations ------------ 27
4.7 Contour Features ----------------------------- 29
4.8 Haar Cascade ---------------------------------- 30
4.9 Finger Path ------------------------------------ 32
4.10 Chapter Review ----------------------------- 32

Chapter 5: System in Operation --------------------------------------------------- 33

5.1 Chapter Overview ---------------------------- 33


5.2 Process Description --------------------------- 33
5.3 Chapter Review ------------------------------- 34

Chapter 6: Testing and Evaluation ----------------------------------------------- 35

6.1 Chapter Overview ----------------------------- 35


6.2 Accuracy --------------------------------------- 35
6.2.1 Time ----------------- 35
6.2.2 Angle ---------------- 37
6.2.3 Location ------------ 39
6.3 Set Up and Toolkit Run Through by
Another Person --------------------------------- 41
6.4 Tracking Different Object Movements ----- 43
6.5 Chapter Review -------------------------------- 45

Chapter 7: Conclusion --------------------------------------------------------------- 47

7.1 Conclusion of Aims --------------------------- 47


7.2 Project Revision ------------------------------ 49
7.3 Further Work --------------------------------- 50
7.3.1 Medical Use -------------- 50
7.3.2 Sign Language
Translation ---------------- 51
7.3.3 Input Device Research –- 51
7.3.4 Website Development -- 52
7.3.5 Robotic Arm ------------- 52
7.4 Reflection --------------------------------------- 52
7.5 Closing Statement ----------------------------- 54

References ----------------------------------------------------------------------------- 55

Appendix ------------------------------------------------------------------------------- 57

A – Proposal ------------------------------------ 58
B – User Manual ------------------------------- 66
C – 6.3 User Questionnaire ------------------- 70

Chapter 1

Introduction

1.1 Chapter Overview

In this chapter, we will be looking at the proposed aims and motivations behind this project.
In addition, there will be an explanation of the paper structure and how everything has come
together for this project.

1.2 Introduction
Today there are almost 2.53 billion smartphone users in the world, and the market for
touchscreen devices has grown by 8.3% since 2017 [1]. For a long time,
research on human-computer interaction has been restricted to techniques based on the use of
a graphics display, a keyboard and a mouse [2]. Modern users want to get as close to the
software as possible, and the way users interact with their phones and other electronic devices
has changed drastically since the 2007 iPhone release [1]. In line with these changes, it is
only natural that the way user interaction is recorded and analysed should also adapt; this is
what this project hopes to achieve.

With growing use of touchscreens, multi-touch interaction techniques have become widely
available [3]. This project is relevant to the evolving technology market, in particular with
system user interactions. There is a need to analyse how users approach new technologies.
In addition, further research is necessary into how devices can be adapted to
provide additional ease of use, with human-centred requirements such as low system latency [4].

The main challenge faced during this project and other similar projects is the ability to
accurately analyse the movement of a finger while also documenting any interactions the
hand or finger may have with a touch device. Producing a final quantitative result of these
user input evaluations can be challenging. This paper will break down the steps needed for
the development of a research toolkit that aids in the analysis of finger movements captured
as a video. The background research covered analysis of a range of hand and finger
movements, from pianists' hands to simple bare-hand finger tracking, as well as some
open-source toolkits that are currently available [1, 5, 6].

A research toolkit such as this would be useful to a researcher for three main reasons. Firstly,
the toolkit includes both the software and hardware, thus making it convenient for the
researcher to conduct their tests. In addition, the research toolkit should be adaptable to meet
the intended needs of the researcher, with a modular structure. Lastly, the hardware should be
easy to construct and portable while the software should be user-friendly and interactive.

The aim of this research toolkit is to have the functionality to detect and process hand
positions without prior knowledge of the existence of the hand [3]. The difficulty with this
task is determining the background and foreground of the video to make the data
intake easier, while also making sure the lighting and environment do not hinder the
results. Distinguishing the background and foreground (the foreground being the object in
question and the background everything else in a frame) enables the progression of data
analysis from the video, in a process called segmentation [6].

Furthermore, to detect a person's finger on the hand and track its movement, a
high-resolution RGB camera is needed to observe the path. The
intended goal is to produce useful information that would be required by a researcher. Areas
of analysis will include ‘hit’ locations, ‘hit’ times and ‘hit’ angle (‘hit’ being any interactions
with the iPad).

Finally, a suitable output of data in various forms, graphs and charts would be available for
the researchers to choose from and the toolkit would produce an appropriate data sheet. The
toolkit itself will be assessed on how easy it is to use by conducting usability testing. The key
areas to look at during the usability study are the number of errors encountered during the
set-up phase with the user manual and whether the user is able to successfully conduct their
experiment.

1.3 Aims and Objectives

The aim of this project is to develop a simple and easy-to-use toolkit for finger movement analysis.
The following bullet points break down the aims that will be worked towards during this
project.

My aims for this project are:

• To build a toolkit, which includes hardware that should be quick and easy to
assemble, with minimal equipment.
• Record a library of 20 - 40 video streams of participants undergoing various tasks
using the RGB camera.
• To be able to extrapolate information/data from the library gathered, including ‘hit’
time, ‘hit’ angle, ‘hit’ locations, time taken for tasks to complete etc.
• To evaluate the toolkit and whether it meets the researchers' needs and wants (this may
include 'hit' area, 'hit' time, 'hit' angle etc.), and to make sure it is
capable of doing everything that was intended of it for this project.
• For the user to be able to carry out a Fitts's law study while finger movement is
analysed by the toolkit.

1.4 Report Overview

The rest of this report will be structured as follows, and will outline the key areas of
research and design.

• Chapter 2 – Background: The initial research and findings from past works were
discussed and analysed. The results of similar projects were looked at and the
implementations of those designs were taken into account in the design of this project.
• Chapter 3 – Design: After condensing the background research, the design chapter
looks at a high-level construct of the project. In this chapter there are examples of
equipment that could be used, along with a structure of how the project should be
conducted.
• Chapter 4 – Implementation: An overview of the program written, with examples
from the code itself and how the implementation of the program is structured.
• Chapter 5 – System in Operation and Process Description: A breakdown of the processes that
take place behind the code, and the steps taken during this project.
• Chapter 6 – Testing and Evaluation: A critical review of the program with rigorous
testing, examples of results and an evaluation of whether the aims were achieved.
• Chapter 7 – Conclusions: A full review of the project, with reflections on the aims
and objectives chapter. In addition, the further work that can be carried out from the
work done in this project will be discussed.

Chapter 2

Background

This section will highlight findings and research prior to the design and implementation
phase. It will look at areas of research such as segmentation techniques, contouring and
OpenCV.

2.1 OpenCV Introduction

In this project, computer vision libraries will be needed to help break down and analyse the
multiple video streams collected. Therefore use of OpenCV, a library of programming
functions mainly aimed at real-time computer vision, is vital for this project.

OpenCV (Open Source Computer Vision), as stated before, is a library of programming
functions mainly aimed at computer vision; these functions can be used for image or
video processing. The library has more than 2500 optimized algorithms, including a
comprehensive set of computer vision and machine learning algorithms. These algorithms can be
used to detect and recognise faces, identify objects, classify human actions in videos, track
camera movements, track moving objects, extract 3D models of objects, produce 3D point
clouds from stereo cameras, etc. [7]

OpenCV is a cornerstone in computer vision analysis and has been used for countless
projects over the years. Yeo et al. [8] exclusively used features from the OpenCV library
along with an Xbox Kinect to create a hand and finger tracker and a gesture recognition
system using low-cost hardware. On the other hand, Godbehere et al. [9] developed a
tracking algorithm within their own computer vision system that is said to demonstrate
significant performance improvement over existing methods in OpenCV 2.1. For this project,
the use of the functions provided in the OpenCV library is of great aid to deconstruct a video
stream.

2.2 Background/Foreground Segmentation

Once the video has been captured, it is necessary for the software to break down the video to
help with analysis. One technique that can be used to break down the video is
background/foreground segmentation to help the software distinguish between the subject
that is being analysed and everything else in the background that is irrelevant.

The basic problem lies in the extraction of information from vast amounts of data. The most
important part of the project for the researchers is the results and how and why they are
relevant. The goal of segmentation is to decrease the amount of image information by
selecting areas of interest. Typically hand segmentation techniques are based on stereo
information, colour, contour detection, connected component analysis and image differencing
[2].

Similarly, Moscheni et al. [10] highlight the intimate relationship between background and
foreground segmentation and the computation of the global motion. When breaking down the
global motion (background), we can assume that any local motion (foreground) is
performed by the object (finger or hand); thus the foreground can be taken to be the
displacement of the object in a scene. This approach of tracking or clustering pixels works
well at determining the foreground and background. This paper was helpful for approaching
the project in a layered structure, by building on existing concepts and adapting and adding to
them to gain more accurate results. The disadvantages of background/foreground
segmentation found by Von Hardenberg and Berard included "unrealisable
cluttered backgrounds, sensitive to changes in the overall illumination, sensitive to shadows,
prone to segmentation errors caused by objects with similar colours" [2]. Letessier and
Bérard developed new techniques to overcome these disadvantages: Image Differencing
Segmentation (IDS) is insensitive to shadows and the Fast Rejection Filter (FRF) is
insensitive to finger orientation [4].
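To illustrate the general technique, the following is a minimal sketch of background/foreground segmentation using OpenCV's built-in MOG2 subtractor. It is not the segmentation approach used by this toolkit (that is described in Chapter 4), and the video file name is a placeholder.

import cv2

# Minimal sketch: OpenCV's built-in MOG2 background subtractor applied to a clip.
# The file name is a placeholder; parameter values are the library defaults.
cap = cv2.VideoCapture("hand_clip.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Moving pixels (the hand) come back white, static background black, shadows grey.
    fg_mask = subtractor.apply(frame)
    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()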

2.3 Haar Cascade

Haar Cascades are used in computer vision software such as OpenCV to help with detection
[11]. In the OpenCV Python environment we are able to train a cascade by providing the
program with thousands of positive and negative images. The program takes each
positive image and matches it against the negative images; when the process
is complete the cascade should be able to distinguish between a positive object and
its negative environment.

This concept is well demonstrated in Padilla et al.'s face detection paper [12]. Faces,
much like fingers and hands, come in various shapes and sizes. The four steps
suggested in this paper are similar to the suggestions made in the Python programming
tutorial [11]; these consist of face detection, image pre-processing, feature extraction and
matching. This, along with varying the image colour, lighting, rotation and the number of
training images, gives a more precise detection of the frame. Ultimately, to gain an accurate
representation of the subject of interest it is vital to do everything possible to make the
object (the finger in this case) as easy as possible for the system to detect, with the highest
precision possible.

2.4 Chapter Summary

Through the research gathered, it was found that there are several ways to conduct this
project. The next step will be to try each method and evaluate which achieves the
precision required while fitting within the timeframe and resources available
for this project. Ultimately, the background research that has been conducted will determine
the shape and structure of the project's design.

Chapter 3

Design

Figure 1: Screenshot of a finger being tracked by the toolkit.

3.1 Chapter Overview


Through this chapter, the various high level system design decisions that were made will be
discussed prior to the final implementation of the toolkit. This chapter will include an in-
depth conversation with an HCI PhD student and their desired requirements from a toolkit
such as this. In addition, we will look at the software resources, video clip capture, various
camera types, system constraints and the desired system output.

3.2 Interview with PhD Student


As the intended target users for this research toolkit are academics such as PhD
students, to aid them in their work, a logical start was to find out what researchers are
actually looking for in a product such as this and how they would expect it to be designed.

The conversations with PhD students were helpful as they brought a different perspective to this
project. The PhD student looked at this project as a stepping stone to further study, considering
how this toolkit could be adapted and evolved to become more useful in other areas of
work such as medicine and psychology. Additionally, it allowed the toolkit to be visualised
in a broader picture, as something that could possibly be used in multiple other ways as a
small part of a much larger project. This added significance to the results that would
be achieved from this toolkit. The questioning opened with ‘What analytical information
would you like to gain from tracking hand/finger movement in a video stream?’. The higher-level
and general answer given by most students was that requirements vary between
projects, so a useful toolkit should accommodate multiple results but also be
open to adaptation for the user's desired output. This indicated the need to publish the source
code so that other users are able to modify and adapt it for their specific needs.
As a default, the most important results that should be produced by the toolkit include time to
track and analyse the finger through the video stream, as well as the ‘pause’ time taken by the
individual between the tasks they carry out. ‘An additional add-on that could be added to the
toolkit would be the ability to calculate errors during the experiment’.

Following on from the initial talk, the next question posed to the PhD student was
‘Would a toolkit which analyses finger movement in a video stream be useful to your work?’.
As previously addressed, the feedback from the students depended on their work. Most HCI
(Human-Computer Interaction) students had said that part of their work would benefit from
this toolkit. One particular student stated that she was working on a medical based project
and that this kind of toolkit would be extremely useful in that line of work. The toolkit could
help track and analyse a surgeon’s hand movements during surgery and then use this
information to work towards getting robotic arms to replicate the work done by the surgeon.
With the use of movement tracking software, if robots are able to replicate precise human
tasks, this could lead to safer medical operations that would eliminate human error and
increase precision.

The final questions expanded on the idea of how a movement tracking toolkit could be
adapted and used in the future for more extensive research. For a wider range of uses, the
PhD student advised looking outside of computer science to find some common ground with
departments such as psychology. Finger and hand movement tracking can also be expanded
to analyse facial features and detect facial reactions. This would be very useful for website
builders/content producers, as they can use psychological reactions to produce desired
content. This would include gaining information on hot spots on interfaces, how to achieve
an action most efficiently, and the ability to gauge emotional reactions.

In summary, there are three key areas that the HCI PhD students are looking for. Firstly,
adaptability: as projects differ and the results needed are different, it is important that a
research toolkit is adaptable to the researcher's needs. This could include
things such as being able to code specific reading requirements that are needed from the
software. Secondly, a clear representation of results is extremely useful for future analysis;
this includes producing the output data in a CSV or Excel format. Lastly, the whole toolkit,
which includes the hardware and software, should be easy and quick to use and set up.

3.3 Software Resources

As discussed in prior research and in the project brief, the OpenCV library provides a vast set of
tools capable of gathering all the information required for this toolkit. Furthermore, the functions
available in the library aid in analysing this information. This includes being able to convert
video files to grayscale and using that information to further analyse the video clips.
To accompany the OpenCV library, Python was the language of choice for this toolkit. The
simplicity of the Python language, along with its consistent syntax and large standard library,
complements OpenCV's functions in a way that enables them to work fluently together. These
were the tools available when undertaking this project; to gain a high-level overview of
the design, interviews were carried out with PhD students to ask what results would be
helpful for them in their line of work.

3.4 Video Clip Capture

A large part of this project is to be able to capture video from tests. The videos captured need
to be clear with minimal shaking so that the finger movement can be tracked accurately.

For any analysis work to take place, a large quantity of raw data needs to be available to the
toolkit software. As stated in the proposal (in Appendix A) the video capture of this project
needs to be simple, easy to assemble and portable. The camera for capturing the footage may
vary in choice but the video should be of good quality with a minimum of 1080p. This is
because it will produce a resolution of 1,920x1,080 pixels hence allowing the finger outline
to be precisely defined in the video. This makes it easier for the finger to be detected by the
software and produces a more accurate result. In addition, 30 fps is standard for most
webcams and cameras, so that will be sufficient for the recording. Anything less than
30 fps, such as 24 fps, could result in fast movements being missed between frames, and the
footage will not be as smooth as that collected from a camera recording at 30 fps or higher.
This is especially important when there are fast-moving fingers in a video stream.

An iPad will be used as an input device for this project. The camera will be connected to a
tripod and will be placed directly above the iPad (bird's eye view) and will capture videos of
participants while they attempt to do the tests for the experiment. The iPad (or any equivalent
device that is used) must be portable and easy to set up; it should be able to load any tasks
and should be simple to use. An iPad was ideal for this project because of its portability and
ease of use, but this can be altered depending on the researcher's preference.

The iPad should be placed on a flat surface with the brightness of the screen adjusted for the
exposure of the camera. The use of the camera is intended to capture the iPad screen and to
minimise recording of the surroundings. Therefore, the video should be framed on the
iPad screen with the brightness adjusted so that any writing on the iPad can be clearly read
while balancing the exposure of the screen. The camera should be connected to a laptop or
any other portable device that is able to host and execute the program. This is the final setup
to take in data [figure 2].

Once everything is in place, the experiment will begin and the participant will be asked to
undertake various tasks; these could involve typing a sentence, navigating a website, doing
a Fitts's law test etc. The video stream is then uploaded onto the laptop and saved. Following
this, the video file is run through the finger movement program, which analyses the time,
position and angle of the subject's finger movement in the video stream. The frame
information, such as the time, the angle and the location of the finger, will be presented in a
spreadsheet or equivalent document that the researcher can then use further in their work.

Figure 2: Images of the set-up

Figure 3: Birds-eye view of the hardware set-up for this project.

3.5 Camera Capture Types

Figure 4: Samsung S9 [29] Figure 5: Logitech C922 [28] Figure 6: iPhone X [30]

Figures 4, 5 and 6 show different cameras and smartphones that are able to capture the intended
video input. Ultimately, the chosen device should be able to capture the video clips precisely,
staying in focus and clear, and should allow easy file transfer to take place. A webcam works
well, as the captured video can be saved directly onto a laptop and conveniently fed into the
program. On the other hand, a smartphone does not need to be connected to a laptop straight
away; multiple video streams can be recorded and saved, then transferred to a laptop at a
later date for analysis. The choice of camera is at the researcher's discretion.

3.6 System Constraints

As discussed by Yu et al. [13], the problems of background subtraction include the resulting
image being filled with noise, thus requiring edge-preserving filtering in order
to remove this noise, which makes the system more computationally expensive. This is a very
costly process and would not be suitable for this project. Instead, as explained in the
implementation chapter, other techniques such as thresholding and contouring are used to
produce an accurate result in this project.

Furthermore, foreground constraints include achieving accurate detection in the presence of
colour noise during colour tracking. One solution would be to track a specific colour range,
similar to the approach used by Ghorkar et al. [14]. Another solution would be to
grayscale the images, eliminating the need for colour tracking altogether.
When contouring is used to analyse the shape of the finger, the program goes through the
frames and images to get the most accurate shape of the finger, but this function only gives
an approximation of the shape and not the actual shape [15], which ultimately affects
the accuracy of the results.

Another option for obtaining the shape of the finger would be to use Song et al.'s
finger shape tracking [16], which is similar to implementing Haar Cascades with a predefined
image of a shape. However, this system requires a very complicated process of checking certain
colour ranges, and extraction of fingertips that do not include the full hand (if the hand is out
of frame) will cause an issue with the system.

3.7 Summary Desired Output and System Run-Through

The overall desired output for this system is for a program to be able to accurately track the
finger movements within a video stream and produce the coordinates on the screen at which
the finger is located at each time frame. The result is represented in a useful CSV file,
including time, angle and location, which can be loaded into Excel if necessary. The system
should be able to output various results depending on the researcher's preference. This can be
applied in HCI analysis of various input devices. The system should be adaptable, with
functional add-ons that can be coded by any user.

The steps involved to achieve the desired output:

1. Set up the hardware, with camera and input device.
2. Take video recordings of the hand movement and upload them onto a computer.
3. Run the video files through the software on the computer.
4. Review the result file that is produced with the details of the experiment.

3.8 Chapter Summary

In this chapter the high-level design features of this project have been established.
Additionally, the desired run-through of the project has also been discussed. The ideas talked
about in this design chapter will be broken down further in the next chapter, Implementation
(Chapter 4).

Chapter 4

Implementation

4.1 Chapter Overview

In this chapter, the concepts that are involved in the project will be looked at further to gain a
better understanding of what is occurring in the background of the toolkit’s software. These
concepts include video capture, grayscale conversion, binarization, contouring, polygon
approximation, morphology and the cosine rule. The toolkit takes .mp4 and .mov video files as
input, and requires Python to be available to run the toolkit's software code.

Processing pipeline: Segmentation → Contouring → Finger Position → Angle

4.2 Video Capture

A simple and user-friendly command for running the toolkit facilitates the speed and efficiency
required for research.

The software is initially run using the command “python finger_tracking.py
<full_path_of_video_file>”. The “cv2.VideoCapture” function sets the reader to read from the
video file input. This “VideoCapture” instance captures the video frame by frame
until it is released.
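A minimal sketch of how this frame-by-frame loop might look is shown below; the argument handling and the per-frame processing hook are assumptions rather than the toolkit's exact code.

import sys
import cv2

# Minimal sketch of the frame-by-frame reading loop, assuming the video path is
# passed on the command line as "python finger_tracking.py <full_path_of_video_file>".
video_path = sys.argv[1]
cap = cv2.VideoCapture(video_path)

frame_count = 0
while cap.isOpened():
    ok, frame = cap.read()   # read() returns (success flag, frame)
    if not ok:               # no more frames: end of the clip
        break
    frame_count += 1
    # ... each frame would be passed on to the grayscale/thresholding stages ...

cap.release()                # release the VideoCapture instance when finished
print("Processed", frame_count, "frames")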

4.3 Grayscale Conversion

After the video stream is read into the program frame-by-frame, the images are converted
into grayscale. Grayscale is simply reducing complexity: from a 3D pixel value (R, G, B) to a
1D value. Grayscale is extremely important in image processing for several reasons. Firstly,
it allows the elimination of noise present in coloured images. Due to the changes in pixel
values, hue and the variety of colours, it is difficult to identify edges, and the extra colour
information is considered to be noise.

Secondly, colour is complex: unlike humans, who can perceive and identify colour with ease,
systems need a lot more processing power due to an increase in the
complexity of the code. Grayscale, on the other hand, is fairly easy to conceptualise because
we can think of two spatial dimensions and one brightness level. Lastly, the speed of
processing is a major factor, as coloured images take a lot longer to process than grayscale
images. Since hundreds of frames are analysed, this difference adds up;
consequently, grayscale images are used for the subsequent stages of processing.

In the following images grayscale has been implemented along with foreground/background
subtraction. With this technique, the object that is moving (foreground) is white and the
objects that are still in the frame (i.e. the desk) are in black.

Figure 6: Images of background subtraction taking place, the white parts show the finger in motion
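As a small illustration of the conversion itself, the sketch below converts a single frame to grayscale with cv2.cvtColor; the randomly generated frame simply stands in for one read from the video stream.

import cv2
import numpy as np

# Minimal sketch of the grayscale step applied to every captured frame.
# A dummy BGR frame stands in for one read from cv2.VideoCapture.
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # 3-channel colour -> 1-channel intensity
print(frame.shape, "->", gray.shape)             # (1080, 1920, 3) -> (1080, 1920)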

4.4 Thresholding (Extracting the finger from the video)

The need to separate the subject, in this case the individual’s finger in a video, is highly
important. It is a priority to establish the difference between the background/ irrelevant noise
and the specific finger that will be used for analysis. Once the finger is established, the initial
stages of analysis on the finger will be easier to accomplish. There are many forms of image
segmentation; these include clustering, compression, edge detection, region-growing, graph
partitioning, watershed etc. The most basic type, and the one looked at in this project, is
thresholding.

Firstly, the screen area of the video stream is segmented; this is done by thresholding and
looking for bright rectangles/squares. Thresholding works as follows: if a pixel value is
greater than a threshold value, it is assigned one value (for example white), else it is assigned an
alternative value (for example black). The function used is cv.threshold [17].

By looking at the signature of the thresholding function it can be seen that the first
parameter is the source image (src), or the image that we want to perform thresholding on;
this image should be grayscale. The second parameter, thresh, is the threshold value which is
used to classify the pixel intensities in the grayscale image. The third parameter, maxval, is
the pixel value used if any given pixel in the image passes the thresh test [18]. Finally, the
fourth parameter is the thresholding method to be used; these methods include the following
(a short illustrative sketch follows the list):

• cv2.THRESH_BINARY – Pixels greater than the threshold are set to maxval; all others are set to 0.
• cv2.THRESH_BINARY_INV – Inverts the colours of cv2.THRESH_BINARY.
• cv2.THRESH_TRUNC – Pixels greater than the threshold are truncated to the threshold value; if
the source pixel is not greater than the supplied threshold the pixels are left as they are.
• cv2.THRESH_TOZERO – Pixels not greater than the threshold are set to 0; the rest are left as they are.
• cv2.THRESH_TOZERO_INV – Inverts the behaviour of cv2.THRESH_TOZERO.
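The sketch below shows cv2.threshold applied to a grayscale frame; the threshold value of 127 is illustrative and not necessarily the value used by the toolkit.

import cv2
import numpy as np

# Minimal sketch of cv2.threshold on a grayscale frame (here a random stand-in).
gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
ret, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# 'ret' echoes the threshold used; 'binary' contains only 0 or 255:
# pixels above 127 become 255 (white), the rest become 0 (black).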

4.5 Binarization

In this project, after the video stream is converted to grayscale it is binarized using
“cv2.THRESH_BINARY”: if a source pixel is greater than the supplied threshold it is set to the
maximum value (white); otherwise it is set to zero (black).

Figure 7:"Neutrophils" by Dr Graham Beards [19]

Figure 8: Threshold Types [18]

The binarization method converts the grayscale image (0 up to 255 grey levels) into a black
and white image (0 or 1). Simply put, binarization is the process of converting a pixel image to a
binary image [19]. A high-quality binarized image can give more accuracy in character
recognition than the original image, due to the noise removed from the original
image [20].

4.6 Morphological Transformations


Contours are identified in the binary image. Once contours are found, polygon approximation is
used to get a rectangle from the contour. This is done to get the location of the screen so that
only hand motions within the screen are observed. To obtain the hand contour, a couple of
steps need to be completed first; these include looking at darker objects in the frames, then
applying the blurring, thresholding and other morphology operations considered below. This is
all done to obtain a more accurate hand contour.

In linguistics, morphology is the study of the internal structure of words; in computer
vision, morphological transformations are simple operations based on the image
shape. The transformation is normally performed on binary images (here gained after
binarization). The function requires two inputs: one is the original image; the second is
called the structuring element, or kernel, which decides the nature of the operation. Two basic
morphological operators are erosion and dilation [22].

Erosion works by sliding a small window, called the 'kernel', over the image. In this project
“MORPH_RECT” was used as the kernel; this is a rectangle with dimensions of 7 pixels by 7
pixels. Additionally, there are three shapes that can be used for the kernel:

§ Rectangular box: MORPH_RECT


§ Cross: MORPH_CROSS
§ Ellipse: MORPH_ELLIPSE

Figure 9: MORPH_RECT, is shown to be used in the code to obtain a rectangle.

The kernel slides through the image, and a pixel is kept as white (1) only if all the pixels under
the kernel are white; if any of them is black (0), the pixel is removed (eroded). This is useful for
removing small white noise and for detaching two connected objects.

Dilation is the opposite of erosion: a pixel becomes white if at least one pixel under the kernel
is white, so the object is pushed outwards. Normally, in cases like noise removal, erosion is
followed by dilation, because while erosion removes white noise it also shrinks the object (as
seen in the images below [23]). The image is then dilated; as a result the noise stays removed
but the object area increases back towards its original size. This technique is also useful in
accurately determining the whereabouts of the finger/hand in the area of the screen.

Figure 10: Original image Figure 11: Erosion Figure 12: Dilation
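A minimal sketch of these operations with a 7 × 7 MORPH_RECT kernel is given below; the blob image and iteration counts are illustrative rather than taken from the toolkit.

import cv2
import numpy as np

# Minimal sketch of erosion and dilation with a 7x7 rectangular kernel.
binary = np.zeros((480, 640), dtype=np.uint8)
cv2.circle(binary, (320, 240), 60, 255, -1)         # a white blob to operate on

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
eroded = cv2.erode(binary, kernel, iterations=1)    # strips away small white noise
cleaned = cv2.dilate(eroded, kernel, iterations=1)  # grows the object back towards its size

# The erode-then-dilate pair is equivalent to a morphological "opening":
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)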

4.7 Contour Features

After the parts of the screen where hand movement occurs have been identified, the next step
is to use contour techniques to determine the outline and position of the finger itself to
start tracking. There are several contour features, such as moments, contour area, contour
perimeter, contour approximation, convex hull etc. The prerequisites for contouring are, first, to
use a binary image and, secondly, to apply thresholding or Canny edge detection; both are done
here to ensure an accurate hand contour.

A contour can be explained simply as a curve joining all the continuous points (along a
boundary) having the same colour or intensity [24]. Contours are a useful tool for shape
analysis and object detection and recognition. This makes them a perfect feature to use for the
finger analysis portion of the software.

Figure 13: Image shows the start of using contouring to find the finger

The “cv2.findContours()” function (in the picture above) allows an object in an image to be
detected. The “findContours” function takes three arguments: the first is the source image, the
second is the contour retrieval mode and the third is the contour approximation method; the
function returns the detected contours and their hierarchy. The contour retrieval mode
“RETR_TREE” retrieves all the contours and creates a full family hierarchy list; this means it is
able to detect objects that are in different locations, or shapes inside other shapes, while still
relating the contours to each other.

In image processing and computer vision, moments are a quantitative measure of the
distribution of image pixel intensities. Moments help to calculate features such as the total mass
of an object, the centre of mass of the object, the area of the object etc. From these moments,
useful data such as the area and the centroid can be extracted. The centroid (cx, cy) is given by
the relations cx = M10/M00 and cy = M01/M00 [25]:

Figure 14: Equation used to calculate the centre of the finger.

This calculation is used to determine the centre of the finger so that the tracking can
follow a fixed point. The centre of the contour is calculated to aid the angle detection
further on in the analysis. The “contourArea” function is used to measure all
the contours within the screen area so that the one with the maximum area can be picked.

Once the hand contour is obtained, the finger is found by analysing the position of
the hand and looking for extreme points, which would correspond to pointed fingers. This
allows the “calculate_fingertip” function to be used to calculate the location of the fingertip.
It does this by looking at all the contour points and finding the points on the extreme left and
extreme right. Next, the top-left, top-right, bottom-left and bottom-right corners are searched
for the presence of a hand. Based on where the hand is present, the extreme-left or
extreme-right point is picked as the fingertip; if the hand is present in the top left, then it is
expected that the point on the extreme left will be the fingertip. This identifies the fingertip,
which can now be tracked.
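The sketch below strings these steps together: largest contour, centroid from moments, and extreme points as fingertip candidates. It assumes OpenCV 4.x, where findContours returns two values, and uses a synthetic blob in place of a real hand mask; it is not the toolkit's own calculate_fingertip code.

import cv2
import numpy as np

# Minimal sketch: largest contour, centroid from moments, and extreme points.
# 'binary' is a synthetic stand-in for the cleaned, thresholded frame.
binary = np.zeros((480, 640), dtype=np.uint8)
cv2.ellipse(binary, (320, 240), (50, 120), 30, 0, 360, 255, -1)

contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)        # pick the contour with the maximum area

M = cv2.moments(hand)
cx = int(M["m10"] / M["m00"])                    # centroid x = M10 / M00
cy = int(M["m01"] / M["m00"])                    # centroid y = M01 / M00

# Extreme left/right points of the contour, candidates for the fingertip.
leftmost = tuple(hand[hand[:, :, 0].argmin()][0])
rightmost = tuple(hand[hand[:, :, 0].argmax()][0])
print("centroid:", (cx, cy), "left:", leftmost, "right:", rightmost)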

4.8 Haar Cascade


This section will look at an object detection technique called Haar Cascades. This technique
was implemented during an initial iteration of the toolkit but was not used in the final version,
because the Haar Cascade was not as accurate as required by this project and was very
time-consuming to develop, making it an inefficient technique for
this project.

Haar Cascades are a machine learning based approach where a cascade function is trained
from a large number of positive and negative images and is then used to detect objects in other
images. For this project, four thousand positive images of hands and fingers and four thousand
images that did not include hands or fingers were used to train a hand detection Haar
Cascade. This proved to take a long time, as training the Haar Cascade took a couple of hours,
sometimes running overnight. Furthermore, the detection accuracy of this technique was poor
for fingers, although it worked well for faces and eyes.

Haar Cascades are beneficial when tracking a large object that will not necessarily
change shape or move rapidly. They are able to detect eyes and a head, but can also be used to
detect an object such as a ball or a pen. However, trying to detect the movement of something
as small as a finger is challenging to achieve with a Haar Cascade. The pictures below
demonstrate a trained Haar Cascade detecting a face, eyes and a hand. As demonstrated, it is
not accurate and is therefore unsuitable for this project.

Figure 15: Images of Haar Cascade being implemented, the squares represent the detection of a face, eyes and a hand
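For reference, the sketch below shows how a trained cascade file could be run on a frame with cv2.CascadeClassifier; the cascade file name is a placeholder and the scaleFactor/minNeighbors values are illustrative, not the settings used in this trial.

import cv2

# Minimal sketch of running a trained cascade on a single webcam frame.
# "hand_cascade.xml" is a placeholder for the custom-trained cascade file.
cascade = cv2.CascadeClassifier("hand_cascade.xml")

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors trade detection rate against false positives.
    detections = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in detections:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()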

4.9 Finger Path
To visualise the finger path being tracked, it is useful for researchers to have a
video clip of the actual tracking taking place. The pixels on the screen are used to obtain
distances around the finger, from which the three lengths a, b and c are calculated.
The numpy and math libraries in Python aided in evaluating the mathematical
equations used. Next, the cosine rule is used to find the angle of the finger. The cosine rule is
given as:

cos(C) = (a² + b² − c²) / (2ab)

In the code this same formula is replicated, and the three lengths that were previously found
are used to find the angle of the finger relative to the iPad.

Figure 16: The points obtained and the use of the cosine rule to work out the angle of the finger.
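As a worked illustration of the formula, the sketch below computes the angle from three side lengths with the math library; the lengths are made-up pixel values, not measurements from the project.

import math

# Minimal sketch of the cosine-rule calculation from three side lengths
# a, b and c measured in screen pixels (values here are illustrative).
a, b, c = 120.0, 95.0, 80.0
angle_c = math.degrees(math.acos((a**2 + b**2 - c**2) / (2 * a * b)))
print(round(angle_c, 2))   # the angle opposite side c, in degrees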

The centre of the hand contour is found, and the angle between the centre and the
index finger is measured to obtain the angle of the finger.

The ‘self.draw’ method runs through the points that were calculated and uses the .circle and
.line functions to draw the path of the finger over the video stream. The results are stored as
the x and y coordinates of the finger on each frame of the video stream. Finally, the
‘write_output’ function publishes the results in a .csv file. The file can be used for
further analysis, for example by creating graphs of the diverse experiments that can be run with
this research toolkit. The full analysis of the toolkit will be presented in the next section.
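In the spirit of the ‘write_output’ function, the sketch below writes per-frame results to a .csv file with Python's csv module; the column names and example values are assumptions, not the toolkit's actual output format.

import csv

# Minimal sketch of writing per-frame results to a .csv file.
# Column names and values are illustrative only.
results = [
    (0.033, 612, 344, 101.2),   # (time in seconds, x, y, angle in degrees)
    (0.067, 614, 341, 100.8),
]

with open("finger_tracking_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_s", "x_px", "y_px", "angle_deg"])
    writer.writerows(results)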

4.10 Chapter Review


In this chapter, a clear understanding of the stages required to accomplish this project's
aims has been established. The techniques mentioned in the implementation chapter are able
to track the path of a finger whilst providing the locations of the finger with respect to
the screen as well as its angle; this meets the requirements brought forward in the aims.

Chapter 5
System in Operation and Process Description

5.1 Chapter Overview


This section will look at the process of conducting an experiment using this research toolkit.
The section will be broken down into the software handling and hardware handling, as well as a
general user guide.

5.2 Process Description

This following section is a high-level run-through of the manual in Appendix B.

Figure 17: The desired setup: camera with a bird's-eye view of the iPad.

When using this particular finger tracking research toolkit, the hardware will need to be
installed first to gather data to analyse later. As described in the design chapter, the set-up will
include an overhead camera with a bird's-eye view of an input device (iPad).

The goal of the example experiment described here is to determine whether an input delay
exists between the video and the results produced by the toolkit. The experiment begins with
the particular test (such as the Fitts's law test, figure 35, or any finger tracking test) uploaded
onto the iPad, and the camera recording is started. The candidate being evaluated sits down and
begins the tests on the iPad using their index finger. The videos should then be uploaded to the
computer that is able to run the software portion of the toolkit. By default, the toolkit tracks the
finger movement by giving the x and y coordinates of the finger at various points (frames) in the
video stream, along with the time of the experiment and the 'hit' angle of the finger. The video
should be fed into the software, and a .CSV file (shown in figure 18 below) will be produced
with the details of the experiment.

Figure 18: An example of the output result in a .CSV file.

5.3 Chapter Review

This chapter has looked at the process followed during an experiment to achieve an accurate
and precise results file for a tracked finger. Furthermore, a visual representation of the final
output file and of the setup has been given.

Chapter 6

Testing and Evaluation

6.1 Chapter Overview


The final toolkit will be assessed and evaluated through several different testing methods.
The accuracy of the toolkit will be examined and will involve accurately determining the
time, location and angle of a finger as it is traced in a video stream.

6.2 Accuracy
The main criterion researchers look for in a research toolkit is how accurate the results
are. This evaluation will be broken down into three categories and will reflect the aims of this
project. The key areas of accuracy that will be looked at are the ‘hit’ time, ‘hit’ angle and the
‘hit’ location.

6.2.1 Time

The reason for conducting this investigation is to determine whether there is a time delay
between the raw footage and the results that are produced after the analysis. If there is a
delay, a quantitative result should be obtained to determine the time offset present in the
toolkit.

To evaluate the ‘hit’ time, this experiment will consist of recording a video stream where an
individual is asked to move their hand from one square to another on the iPad. The squares
will be 15cm apart and will be at the same horizontal level. The individual will be asked to
press a square (for example square A) and hold their finger there for two seconds, then they
will move their finger onto the next square (square B) and hold for two seconds, finally they
will move their finger back to the original square (square A). This experiment will be
repeated with changes in the duration of the press (there will be a timer running beside the
iPad).

After the video is recorded, the times and frames of the video will be analysed. The time
when the finger initially presses down on a square will be recorded, as well as the time
when the finger is released; the number of frames between the press and the release will also
be counted and noted. These observed times and frame counts will then be
analysed further.

Next, the video stream will be run through the toolkit's software and the results will be
evaluated. In the results, consecutive repeated values of the x and y coordinates will be looked
for; if the x and y coordinates are the same for multiple frames/seconds, the finger
has not moved in the 2D plane and can be concluded to be "pressing the square". The time of
the first repeated coordinates will be noted along with their duration.

Finally, the results gained from the software will be compared to the results recorded when
observing the video. The accuracy of the toolkit can be determined by whether the results
match the timing from the observation, or whether there is a delay or offset in the time
reported by the software.
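As a sketch of how this check could be automated, the function below scans an output .csv for runs of near-constant coordinates and reports their start and end times; the column names, the 2-pixel tolerance and the minimum run length are assumptions, not part of the toolkit.

import csv

# Sketch: find runs of near-constant coordinates in the output .csv and
# treat them as presses. Column names and thresholds are assumptions.
def find_presses(path, tolerance=2, min_frames=10):
    with open(path, newline="") as f:
        rows = [(float(r["time_s"]), int(float(r["x_px"])), int(float(r["y_px"])))
                for r in csv.DictReader(f)]
    presses, start = [], 0
    for i in range(1, len(rows) + 1):
        moved = (i == len(rows) or
                 abs(rows[i][1] - rows[start][1]) > tolerance or
                 abs(rows[i][2] - rows[start][2]) > tolerance)
        if moved:
            if i - start >= min_frames:            # long enough to count as a press
                presses.append((rows[start][0], rows[i - 1][0]))
            start = i
    return presses                                  # list of (start_time, end_time) pairs

# Example: print(find_presses("finger_tracking_results.csv"))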

The following explains the results from the experiment:

Figure 19: Screenshot of the initial touch at A at 0:02

Figure 20: Results screen for the first 3 seconds of the video

As shown in figure 19, the initial press onto A was made at 2 seconds; however, in figure 20 the
results from the toolkit show the finger was stationary at about 2.5 seconds. This is determined
by the x coordinate varying by only 1 from 2.375 to 3.08 seconds and the y coordinate varying
by 2 between 2.5 seconds and 3 seconds.

We can conclude that the offset (delay) of this toolkit is around 0.5 seconds between the raw
video file and the result sheet produced, which is a very successful result. This experiment was
carried out several times, along with a 5-second press, producing similar results; all the raw
results and data are present in the working document.

6.2.2 Angle
To accurately measure the angle, the next experiment involved an individual initially placing
their finger lying flat on the iPad. They lifted their finger up to 90 degrees then placed it back
down on the other side (as shown in the pictures below).

Figure 21: Side view of the experiment; the actual recording will be taken from a bird's-eye view.

This helped evaluate the accuracy of the toolkit. The desired output is for the program to
produce a data set that goes from 0 degrees (at the start of the video) up to 90 degrees and
then to 180 degrees (at the end of the video); the accuracy of this experiment will be
determined by whether the system can accurately record these changes in angle consistently.

The following experiment was carried out multiple times, with all the raw data collected
available on the working document.

From the data below (figure 22), it is clear that at the beginning of the video the angle
recorded is 98.40 degrees, which is not the desired output (the desired output being 0 degrees).
At the midway point of the video, around 3.65 seconds (with the 0.5 seconds added for the
delay from the previous section), the angle at 3.67 seconds is 145.46 degrees and at 4.12
seconds is 152.65 degrees; these values are incorrect, as the desired output should be 90
degrees. Finally, at the end of the video the angle at 7.33 seconds is 100.40 degrees; however, it
should be 180 degrees.

Figure 22: Results sheet from the Angles experiment

In conclusion, this toolkit is unable to accurately evaluate the angle of the finger while it is
being tracked. The results showed that the recorded angle stays between 98 degrees and 150
degrees throughout the video, so this is something that would need to be refined in further
work on this toolkit.

6.2.3 Location

For this portion of the evaluation, the accuracy of the finger location was tested. To conduct
this experiment, an iPad was used along with two rulers, one on either side of the screen, to
record the dimensions of the iPad screen. To determine where the individual has touched, PPI
(pixels per inch) was used to calculate where on the iPad a 'hit' has occurred. This is done by
measuring the location using the ruler (in inches) and, knowing that the iPad used has a PPI of
264, the screen pixel coordinates can be calculated [36]. The result sheet data will be checked
to locate the finger at specific times and to compare the x/y coordinates for the same location.
The accuracy will be judged according to how precise the output results are; the desired output
would be to get the same coordinates.

Figure 23: Ruler used to calculate the dimensions of the iPad screen

The following experiment was carried out and the results are evaluated below; the full data
and links to the video of the experiments being run are available in the working document.


Figure 24: Screenshot of the video when the experiment was being carried out.

The marks from A to E were made at 0:11, 0:15, 0:19, 0:23 and 0:28 in the video
respectively; these are therefore also the locations of the finger at those specific times. To
work out the location of the finger using PPI, the measurements of the marks were taken; for
example, A measured in at (1.74 inches, 5.52 inches). Knowing the iPad has a PPI of 264, the
location of the finger at A would be 1.74 × 264 = 459.36 px and 5.52 × 264 = 1457.28 px.

In conclusion, the x/y coordinates for location A are (459.36 px, 1457.28 px). This result
was then compared to the output from the toolkit software, and a large difference was
found. Taking into account the 0.5-second delay that was found in section 6.2.1, on line 37 of
the results page (figure 25) the coordinates are noted as (610 px, 347 px), showing that
these results are not accurate. Mark B was calculated as (459.30 px, 810.4 px), but as recorded
in figure 26 the coordinates were (600 px, 128 px). This experiment was done with all five
marks and no correlation between the data was found; therefore it can be concluded that there
was no consistent offset for the location.

Figure 25: Results for B at 15.5 seconds Figure 26: Results for A at 11.5 seconds

6.3 Set Up and Toolkit Run Through by Another Person

For this evaluation, another person was given the equipment, the software and a user manual
for the toolkit. They were asked to set up the toolkit, record a video, upload it onto
their computer, open the software and run the video stream through it to produce
an output file. At the end of the experiment, the individual was asked some questions on how
they found the experience, what they thought worked well, and what could be changed or
done better.

Figure 27: User after they have completed their experiment

Figure 28: Screenshots of the experiments the user ran in the toolkit

The full questionnaire for this experiment is detailed in Appendix C. The user in this experiment
had no prior knowledge of coding or of this project. The experiment was intended to evaluate the
ease of use of the toolkit, so it was appropriate to recruit someone with no technical
knowledge. The user said “it was easy and took a short time to do” when referring to the
hardware setup, as they used the pictures in the user manual to guide them. In addition, even
with no knowledge of the code, they were able to understand the output of the toolkit and make
appropriate modifications (changing the threshold on line 74, as instructed in the user manual) to
gain their desired output. They did notice some glitches with the tracking, as the marker would
occasionally register something other than their fingertip, but overall they were happy with
their experiment.
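
The threshold the participant adjusted lives in the toolkit's source (line 74 of the script in their copy), so its exact name is not reproduced here. Purely as an illustration of the kind of value they were changing, a binary threshold in OpenCV exposes a single sensitivity number; the snippet below is a hypothetical stand-in, not the toolkit's own code.

    import cv2

    THRESHOLD = 127  # hypothetical stand-in for the value on line 74: raise it to
                     # suppress more background noise, lower it to keep fainter detail

    frame = cv2.imread("frame.png")              # placeholder: one frame from the video
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, THRESHOLD, 255, cv2.THRESH_BINARY)
    cv2.imwrite("mask.png", mask)

Raising the value removes more of the scene from the mask, which is why the participant re-ran the video with different numbers until the tracking looked right.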

6.4 Tracking Different Object Movement


The final experiment consisted of using different objects in the video stream and evaluating
whether the software would recognise anything with a shape similar to a finger. The experiments
involved using a pen to mimic the movement of a finger on a keyboard, as well as using a
different finger (the little finger) and wearing a glove while typing (this last experiment is
not included here, but the footage and results can be found in the working document).

The first experiment used a pen to replicate the movements of a finger typing.

Figure 29: The start of the video as the pen enters the frame.

Figure 30: Middle of the video with the pen 'typing'
Figure 31: Final outcome after the pen has exited the frame.

As shown in figures 29, 30 and 31, the system was able to track the pen at certain points
but was unable to track its path as fluidly as it would a finger. The desired outcome of this
experiment would have been for the software not to recognise the pen at all; it can be
concluded that the software partially tracked the pen, but not with great accuracy.

The next experiment used the little finger to replicate what had previously been done with
an index finger.

Figure 32: Initial set-up when starting the video

Figure 33: Mid video at 00:09 seconds
Figure 34: End of the video at 00:18 seconds

From this test run the conclusion can be made that the toolkit does track the little finger.
From initial observations of the output video with the traced routes, the software appears to
track the little finger with the same accuracy as it would an index finger. In contrast with
the pen, this shows that the system is better at identifying fingers than other objects.

6.5 Chapter Review

This section has covered the evaluation of the toolkit. It can be concluded that the toolkit
needs more work to refine it and make the system more accurate. However, it does meet the aim
of tracking the path of a finger in a video stream and produces results with a 0.5 second
system delay.

Chapter 7

Conclusion
In this chapter the overall project is discussed. The successes and failures of the project
will be evaluated, and the aims will be reflected upon, as well as the knowledge and
experience gained through the project.

7.1 Conclusion of Aims

Aim 1: To build a toolkit, which includes hardware that should be quick and easy to
assemble, with minimal equipment.

This aim was successfully completed. In section 6.3, during the evaluation, the individual who
was given the user manual along with the toolkit was able to set up the equipment efficiently
and quickly and conduct their desired experiment. In their feedback they said it was quick and
easy to run and required a minimum of three pieces of equipment, which was good. However, they
would have benefited from a more user-friendly interface for the software. In addition, the
hardware for the toolkit was adaptable, allowing the researcher to use whatever device they
had available to them.

Aim 2: Record a library of 20 - 40 video streams of participants undergoing various tasks
using the RGB camera.

Through various experiments and investigations, 20 video streams was a good target to hit.
During this project the target was met and exceeded, with nearly 40 video streams collected
and evaluated. The evidence for this can be located in the working document, which contains
all the raw footage that was recorded along with all the outputs produced.

Aim 3: To be able to extrapolate information/data from the library gathered, including ‘hit’
time, ‘hit’ angle, ‘hit’ locations, the time taken for tasks to complete etc.

The ‘hit’ time could be calculated, as shown in 6.1, where the toolkit was able to track the
finger movement efficiently but did experience a 0.5 second input delay. However, the toolkit
was not able to evaluate at what stages in the video stream the subject actually pressed the
input device; this could only be inferred from repeated x and y values (coordinates), as the
finger would have been stationary at those stages.
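
One way to close that gap in future work would be to post-process the output log and infer ‘hits’ automatically from runs of near-identical coordinates. The sketch below assumes the log has already been parsed into (time, x, y) tuples; the tolerance and minimum run length are illustrative values, not ones validated in this project.

    def infer_hits(samples, tol=3, min_frames=5):
        """Return start times of runs where the finger stays within `tol` pixels
        for at least `min_frames` consecutive samples (a likely 'hit')."""
        hits, run_start, run_len = [], 0, 1
        for i in range(1, len(samples)):
            _, px, py = samples[i - 1]
            _, x, y = samples[i]
            if abs(x - px) <= tol and abs(y - py) <= tol:
                run_len += 1
            else:
                if run_len >= min_frames:
                    hits.append(samples[run_start][0])
                run_start, run_len = i, 1
        if run_len >= min_frames:
            hits.append(samples[run_start][0])
        return hits

    # Fabricated samples: one stationary run starting at t = 0.9 s
    data = [(0.9, 600, 120), (1.0, 601, 121), (1.1, 600, 121),
            (1.2, 601, 120), (1.3, 600, 120), (1.4, 650, 200)]
    print(infer_hits(data))  # [0.9]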

As evaluated in sections 6.2.2 and 6.2.3, the toolkit was not fully accurate when recording the
location and angle, because the values calculated by hand did not match the results produced
by the toolkit. Therefore, the techniques used to calculate the angle and location need further
testing and refinement.

Aim 4: To evaluate the toolkit and whether it meets the researcher's needs and wants (this may
include ‘hit’ area, ‘hit’ time, ‘hit’ angle etc.). Evaluate the toolkit to make sure it is
capable of doing everything that was intended of it for this project.

This aim depends on the situation the toolkit is used in. For this project the three main
needs were ‘hit’ time, angle and location. The toolkit was able to accomplish one of these
needs but was not able to accurately determine all three. Therefore, currently, this toolkit
is not capable of doing everything that was intended by the project.

Aim 5: For the user to be able to carry out Fitts’s law study while finger movement is
analysed by the toolkit.

The Fitts's law study was used as one of the testing methods when gathering video streams for
the evaluation of the toolkit. The images below show the Fitts's law task in action; these were
taken during the early stages of development of the toolkit, which is why the tracking is
sporadic. This helped to gauge the threshold level required for the toolkit to provide smooth
and accurate tracking.
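
For reference, the quantity a Fitts's law study normally extracts from recordings like these is the index of difficulty of each target, which is then compared with the measured movement times. The snippet below uses the widely used Shannon formulation; the distances and widths are placeholder values rather than measurements from this project.

    import math

    def index_of_difficulty(distance, width):
        """Shannon formulation of Fitts's law: ID = log2(D / W + 1), in bits."""
        return math.log2(distance / width + 1)

    # Placeholder targets: (distance to target, target width), both in pixels
    for d, w in [(400, 50), (200, 100), (600, 25)]:
        print(f"D={d}px W={w}px ID={index_of_difficulty(d, w):.2f} bits")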

Figure 35: These images were taken during the Fitts' law experiment

7.2 Project Revision


Although most of the aims were met and the toolkit was able to answer several questions, there
is still room for improvement. Below are a few ideas and improvements that would be added if
this project were done again or if extra time for further development were available.

Firstly, the intention was for the toolkit to be able to record the ‘hit’ of the finger onto an
input device, that is, to record the actual input of the finger. For example, if a finger is
typing on a keyboard the toolkit should be able to recognise the letters that are being typed.
This does not work when the toolkit is used for the Fitts's law experiment or for browsing the
web, because the actual input is not reflected by a finite letter combination.

Secondly, the user interface of this toolkit is poor. It was developed for HCI researchers and
it was assumed that they had prior knowledge of compiling and running Python programs. In
reality, this toolkit may be used by researchers in other departments, for example psychology;
if they are unable to use the toolkit or follow the manual, they will not be able to use the
system efficiently.

Lastly, more research regarding hand sensors should have been done. Ponraj and Ren's [31]
work on tracking fingertips using distinct sensors and combining the data using sensor
fusion techniques proved to be extremely accurate. Using accurate techniques such as this,
while gradually minimising the equipment needed to track finger and hand movements, is how
work in this field needs to progress.

7.3 Further Work

The most important section of this project is the further work. This section looks at the
concluded project and the ideas proposed, and considers the further research and work needed
to make this toolkit bigger and better in the future. It will look at how the idea of finger
tracking in a video stream could be applied in the future to a variety of uses in different
sectors. These sectors may include medicine, sign language translation, input device research
and website development.

It is clear that further work is required in determining the angle and location of the finger
within the video stream. In addition, the delay time of 0.5 seconds needs to be reduced as
much as possible, with the goal of no delay at all. As noted in 6.3, the user would also have
benefited from a UI that was accessible and easy to use.

Finally, if a camera could be housed in a case along with a small computer running the toolkit
software, this would create the ideal portable movement tracking toolkit (as illustrated in the
image below). The output could be sent to the researcher wirelessly via Wi-Fi, Bluetooth or NFC.

7.3.1 Medical Use


As previously outlined in the interview with the PhD student, a great use for tracking fingers
in a video stream would be in medicine, more specifically during surgery, especially surgeries
that are routine and mostly repetitive. A video would record the surgeon's hand movements
during a surgery and output the analysed information, including the x and y coordinates of
their hands and the angles of their fingers. This information could then be uploaded into a
robotic pair of hands that would be able to replicate the surgery; this could increase accuracy
during the surgery and would also free up the surgeons' time for more complicated work.

This approach works better than the current approach to hand/finger monitoring, which uses
sensors that are clipped onto the individual's hands. The major challenges facing this
further work include the three-dimensional aspects of surgery; this would probably require a
minimum of two cameras along with a more powerful software toolkit able to provide more
accurate results. In addition, the contrast between the surgeon and the patient might be an
issue, but this could be solved with brightly coloured gloves creating a greater contrast.
There are more obstacles to overcome before finger tracking can be used in the health-care
industry but, hopefully, this project provides some ideas for getting started.

7.3.2 Sign Language Translation


Another idea for further work that would benefit from the ideas proposed in this project is a
sign language translation system. The basis of the idea is to capitalise on a unique input
method: by recording a video stream of a pair of hands signing a sentence, computer vision
techniques could be used to analyse the hand/finger movements and translate them into words.

After a library has been built and the software has been taught how to translate sign language
into words, individuals could use sign language to type: the video stream is analysed, the
finger movements are translated into words, and those words are added to a document, email or
message. This would be a unique input device that could challenge the traditional keyboard, and
it could also encourage people to learn sign language, thus helping communication with
individuals with hearing impairments.

7.3.3 Input Device Research


One of the goals of HCI (human-computer interaction) is to make the communication between the
human and the computer as seamless and effortless as possible. This starts with the input
device; as mentioned before, the sign language translation is a unique take on using the ideas
of this project to create an input device. Another use of this project would be to monitor the
approach taken by individuals when using a keyboard and other input devices. By analysing the
travel time between keys, this toolkit could gather information that would help with creating
a more efficient and easier-to-use keyboard. Such keyboards are already on the market, but by
gathering information from individuals we can determine exactly what they are looking for,
potentially recording individuals for a certain amount of time and personalising keyboards
specifically for their typing habits.
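
As a hint of how the toolkit's output could feed such a study, the sketch below computes the travel time between consecutive hits. It assumes the hit timestamps have already been extracted from the results file; the values shown are invented purely for illustration.

    def travel_times(hit_times):
        """Time spent travelling between consecutive hits, in seconds."""
        return [round(b - a, 3) for a, b in zip(hit_times, hit_times[1:])]

    # Invented hit timestamps (in seconds) for a short typing burst
    print(travel_times([0.9, 1.6, 2.1, 3.0]))  # [0.7, 0.5, 0.9]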

7.3.4 Website Development
Just as the toolkit can be used to gain information about user input habits and the devices
they use, this information can also be applied to website usage on iPads, smartphones and
other touchscreen devices. The toolkit could be used to track users' habits when using
websites and how they navigate through them; over time the website developers can determine
which buttons and pathways are the most commonly used and can change the layout to be more
user-friendly and efficient. This would help the development of websites and would also
improve the user experience, because users would not be frustrated navigating a website (and
everyone has been there).

7.3.5 Robotic Arm


Finally, an alternative idea for further research follows Kadir et al.'s proposal that 'the
movement of the robot arm can be controlled by a computer via the internet' [27]; this project
could be used to help turn such ideas into reality. To use a game of chess as an example, an
individual's hand movements could be traced and transferred over the internet (possibly across
the world) to a robotic arm programmed to replicate those movements. This replication would
allow physical games of chess to be played with opponents across the world, using equipment
as simple as a webcam.

7.4 Reflection
During this project, I believe I have learned a huge amount about computer vision, something
that I previously had little to no knowledge of. I have been introduced to the world of
OpenCV, with its massive library of functions capable of various operations. Initially, when
the project started, I did not have a firm idea of the direction I wanted to take, so I was
able to experiment and learn how to use different concepts such as Haar cascades and
background subtraction. I came to understand the advantages and disadvantages of various
techniques for achieving my goal. This was all before understanding the steps I needed to
take to ensure I could create an efficient and accurate toolkit and overall project.

Going into detail about how images can be manipulated to obtain certain information really
fascinated me during my research. I really enjoyed learning how binarisation works and why it
works, and how we can manipulate individual pixels and alter an image or video to analyse and
extract specific information, such as the shape of a finger. Furthermore, I came to understand
the intricate relationship between thresholding, contouring and blurring, how they work
together, and how the order in which these processes are applied is very important depending
on what information you would like to obtain and how you want to use it.
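
To make the point about ordering concrete, the snippet below shows one common sequence (blur, then threshold, then contour extraction) using standard OpenCV calls. It is a generic illustration of the kind of pipeline described here, with a placeholder input image, not a copy of the toolkit's own code.

    import cv2

    frame = cv2.imread("frame.png")                   # placeholder input frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Blurring first suppresses pixel-level noise so the threshold is cleaner
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Binarisation: everything brighter than 127 becomes foreground
    _, binary = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)

    # Contours are extracted from the binary mask, so the two steps above must
    # come first; findContours returns different tuples in OpenCV 3 and 4, and
    # [-2] is the contour list in both versions
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    print(len(contours), "contours found")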

I enjoyed talking to the HCI PhD students along with my peers. As this toolkit is intended to
be used by others, it was important to me that anyone looking at the user manual or given the
instructions is able to understand where I am coming from and why this is of interest. In
particular, I enjoyed talking to the PhD student about future innovations involving HCI, such
as robotic arms conducting their own surgeries, or recording your own hand movements to play
chess with someone across the world because the toolkit can analyse the movement of your hand
and replicate it with a robotic arm, all without expensive sensors or high-tech equipment.

Finally, I was able to overcome challenges such as my limited knowledge of Python and OpenCV,
and I did this through research, watching tutorials and asking peers. Another major challenge
I faced was not knowing how to evaluate my toolkit efficiently; I was able to look at past
work and talk to PhD students to learn about the research techniques I could use. Thanks to
this project I have been able to learn a wide range of skills. This experience has taught me
how to put together a large report along with the research behind it, particularly with regard
to structuring the project, meeting personal deadlines and working largely independently. It
has improved my communication skills, because I needed to be clear and concise to get my
points and ideas across to someone unfamiliar with computer science altogether. In addition,
doing the project in Python has aided me immensely, as I am now more confident in my coding.
Python is in high demand, and this has allowed me to be more confident during assessment days
and job interviews when talking about Python and computer vision. Finally, I believe I have
improved my ability to undertake a large project and execute it efficiently on my own.

7.5 Closing statement
In conclusion, I believe this project was a great success, as it has helped me develop in
various ways, including communication, time management, structure and work management. While
I was not able to complete all my desired objectives to the fullest, such as accurately
calculating the angle, I believe I put my best effort in and I am proud of the outcome. I was
able to pick an area of computer science I was interested in (HCI) and I was given the
opportunity to learn so much more, including computer vision, the OpenCV library and
techniques for altering video frames and images. Furthermore, I have acquired many invaluable
skills which I can now apply to any number of fields. Finally, even with a project as
challenging as this one, with scary hurdles and stressful times, I am ultimately proud of the
work that I was able to achieve.

References:

[1] Facts, S. (2018). Topic: Smartphones. [online] www.statista.com. Available at:


https://www.statista.com/topics/840/smartphones/ [Accessed 7 Oct. 2017].
[2] Von Hardenberg, Christian, and François Bérard. "Bare-hand human-computer
interaction." Proceedings of the 2001 workshop on Perceptive user interfaces. ACM,
2001 [Accessed Oct 2017]
[3] Hackenberg, G., Mccall, R. & Broll, W., 2011. Lightweight palm and finger tracking for
real-time 3D gesture control. Virtual Reality Conference (VR), 2011 IEEE, pp.19–26.
[Accessed Oct 2017]
[4] Letessier, J. & Bérard, F., 2004. Visual tracking of bare fingers for interactive surfaces.
Proceedings of the 17th annual ACM symposium on user interface software and
technology, pp.119–122. [Accessed Oct 2017]
[5] Gorodnichy, D.O. & Yogeswaran, A., 2006. Detection and tracking of pianist hands and
fingers. Computer and Robot Vision, 2006. The 3rd Canadian Conference on, p.63.
[Accessed Oct 2017]
[6] Rico J., Crossan A., Brewster S. (2011) Gesture-Based Interfaces: Practical
Applications of Gestures in Real World Mobile Settings. In: England D. (eds) Whole
Body Interaction. Human-Computer Interaction Series. Springer, London [Accessed
Oct 2017]
[7] Opencv.org. (2018). About - OpenCV library. [online] Available at:
https://opencv.org/about.html [Accessed 18 Oct. 2017].
[8] Yeo, H.-S., Lee, B.-G. & Lim, H., 2015. Hand tracking and gesture recognition system
for human-computer interaction using low-cost hardware, 74(8), pp.2687–2715.
[9] Godbehere, A.B., Matsukawa, A. & Goldberg, K., 2012. Visual tracking of human
visitors under variable-lighting conditions for a responsive audio art installation.
American Control Conference (ACC), 2012, pp.4305–4312.
[10] Moscheni, Dufaux & Kunt, 1995. A new two-stage global/local motion estimation
based on a background/foreground segmentation. Acoustics, Speech, and Signal
Processing, 1995. ICASSP-95., 1995 International Conference on, 4, pp.2261–2264.
[11] Sa-cybernetics.github.io. (2018). Hand Tracking And Recognition with OpenCV.
[online] Available at: http://sa-cybernetics.github.io/blog/2013/08/12/hand-tracking-and-
recognition-with-opencv/ [Accessed 7 Mar. 2018].

[12] Padilla, Rafael & Filho, Cicero & Costa, Marly. (2012). Evaluation of Haar Cascade
Classifiers for Face Detection.
[13] Yu, T., Zhang, C., Cohen, M., Rui, Y., and Wu, Y. “Monocular video
foreground/background segmentation by tracking spatial-color gaussian mixture
models.". In Motion and Video Computing, 2007. 23 Feb. 2007.
[14] Ghotkar, A. S., and Kharate, G. K. "Hand segmentation techniques to hand gesture
recognition for natural human computer interaction." International Journal of Human,
University of Pune, India, 2012.
[15] Docs.opencv.org. (2018). OpenCV: Contour Features. [online] Available at:
https://docs.opencv.org/3.4.0/dd/d49/tutorial_py_contour_features.html [Accessed 18
Mar. 2018].
[16] Song, P., Yu, H., and Winkler, S. "Vision-based 3D finger interactions for mixed
reality games with physics simulation." In Proceedings of the 7th ACM SIGGRAPH
International Conference on Virtual-Reality Continuum and Its Applications in Industry.
ACM, 2008, Singapore, December 08, 2008.
[17] Docs.opencv.org. (2018). OpenCV: Image Thresholding. [online] Available at:
https://docs.opencv.org/3.4.0/d7/d4d/tutorial_py_thresholding.html [Accessed 7 Mar.
2018].
[18] Rosebrock, A. (2018). Thresholding: Simple Image Segmentation using OpenCV
PyImageSearch. [online] PyImageSearch. Available at:
https://www.pyimagesearch.com/2014/09/08/thresholding-simple-image-segmentation-
using-opencv/ [Accessed 7 Mar. 2018].
[19] Felixniklas.com. (2018). Image Processing - Binarization. [online] Available at:
http://felixniklas.com/imageprocessing/binarization [Accessed 7 Mar. 2018].
[20] M. Sezgin, B. Sankur, “Survey over image thresholding techniques and quantitative
performance evaluation”, Journal of Electronic Imaging 13 (1) (2004) 146–168.
[21] Docs.opencv.org. (2018). Miscellaneous Image Transformations — OpenCV 2.4.13.6
documentation. [online] Available at:
https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html
[Accessed 21 Mar. 2018].
[22] Docs.opencv.org. (2018). OpenCV: Morphological Transformations. [online]
Available at: https://docs.opencv.org/trunk/d9/d61/tutorial_py_morphological_ops.html
[Accessed 8 Mar. 2018].

[23] Docs.opencv.org. (2018). Eroding and Dilating — OpenCV 2.4.13.6 documentation.
[online] Available at:
https://docs.opencv.org/2.4/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.ht
ml [Accessed 8 Mar. 2018].
[24] Docs.opencv.org. (2018). OpenCV: Contours : Getting Started. [online] Available at:
https://docs.opencv.org/3.3.1/d4/d73/tutorial_py_contours_begin.html [Accessed 8 Mar.
2018].
[25] Docs.opencv.org. (2018). OpenCV: Contour Features. [online] Available at:
https://docs.opencv.org/3.4.0/dd/d49/tutorial_py_contour_features.html [Accessed 18
Mar. 2018].
[26] Ghotkar, A. S., and Kharate, G. K. "Hand segmentation techniques to hand gesture
recognition for natural human computer interaction." International Journal of Human,
University of Pune, India, 2012.
[27] Kadir, Samin & Ibrahim, 2012. Internet Controlled Robotic Arm. Procedia
Engineering, 41, pp.1065–1071.
[28] Webcam, C. and Webcam, C. (2018). Logitech C922 Pro Stream 1080P Webcam for
Game Streaming. [online] Logitech.com. Available at: https://www.logitech.com/en-
gb/product/c922-pro-stream-webcam [Accessed 17 Mar. 2018].
[29] Papitawholesale. (2018). Samsung Galaxy S9 Plus Dual Sim - 64GB, 6GB Ram, 4G
LTE- Grey.[online] Available at: https://papitawholesale.com/products/samsung-galaxy-
s9-plus-dual-sim-64gb-6gb-ram-4g-lte-grey?variant=11185661804587 [Accessed 17
Mar. 2018].
[30] D3nevzfk7ii3be.cloudfront.net. (2018). [online]Available
at:https://d3nevzfk7ii3be.cloudfront.net/igi/BS5F1LW5IZiGLEBy.huge [Accessed 17
Mar. 2018].
[31] Ponraj, G. & Ren, H., 2018. Sensor Fusion of Leap Motion Controller and Flex
Sensors Using Kalman Filter for Human Finger Tracking. Sensors Journal, IEEE, 18(5),
pp.2042–2049.

Appendix A – Project Proposal

Toolkit Support for the Analysis of Finger Movements in Video Streams
Crisanto Da Cunha - Proposal

Abstract

The project proposed in this paper aims to develop a toolkit that aids the analysis of finger
movement in a video and helps us learn how users move when using particular devices. This
toolkit will be beneficial as it will allow us to better understand the movement habits of our
intended target audience.

This project will begin with us familiarising ourselves with computer vision programs (e.g.
OpenCV). We will need to develop a reference test-bed to record suitable video streams for
analysis; this allows us to gauge reference points with users' hands and also gives us an
opportunity to change anything we may need to before we start to analyse a large volume of
video streams.

When the test-bed is complete and we have gained appropriate reference points, we are ready
to build the bulk of our library. To build the library we will need to record users'
interactions with multiple different devices, to demonstrate the robustness of the toolkit.
The toolkit can then start processing the video streams, detecting fingers and recording to a
log file. Finally, a method of evaluating the data collected will produce charts and graphs
with the appropriate information required by the researcher.

1. Introduction

The market for touchscreen devices is growing, with almost 2.53 billion smartphone users
in the world [9]. For a long time, research on human-computer interaction has been restricted
to techniques based on the use of a graphics display, a keyboard and a mouse [2]. Nowadays
users want to get as close to the software as possible, and the way we use our phones and
other electronic devices has changed drastically since the 2007 iPhone release [9]. The way
we research and analyse user interaction should also adapt.

With this growing use of touchscreens, multi-touch interaction techniques have become
widely available recently [3]. This project is relevant to the current growth of how users
interact with technology, and with this continued growth, we need to analyse how users are
approaching new technology and how we can adapt devices to be easier to use, with human-
centred requirements i.e. low system latency [1].

In this paper, I will break down the steps for the development of my toolkit and how and why
I have chosen to create it with certain, specific features. I have included my research that
looks at a range of hand and finger movement analysis from pianists to simple bare-hand
finger tracking analysis and some open-source toolkits that are available for use now [4, 1, 6].

In this project, I aim for my toolkit to have the functionality to detect hand positions
without prior knowledge of their existence [3]. The difficulty with this task is
distinguishing the background from the foreground to make the data intake easier, while also
making sure the lighting and environment do not hinder the results. Background and foreground
constraints are generally applied to simplify the segmentation process [5].

Furthermore, to detect the individual's fingers from the hand and track those fingers as they
move, we will be using a high-resolution RGB camera to observe the path. From this, we can
start building our library of recorded finger movements from different users while they are
using different devices. We can then analyse the footage and start to produce the useful
information that researchers would require. Areas of analysis will include ‘hit’ locations,
‘hit’ times and ‘hit’ angles.

Finally, a suitable output of the data in various forms (graphs, charts) will be available for
the researchers to choose from, and the toolkit will produce an appropriate data sheet. The
toolkit itself will be evaluated on how easy it is to use, how responsive the program is and,
most importantly, how intuitive the whole sequence of gathering a library of data, analysing
it and publishing a final report is.

2. Background (implementation)

From the papers I have researched, there have been several recurring concepts that I should
consider for my project. For my toolkit to be portable and easy to set up, I would need
off-the-shelf, affordable equipment, along with a transportable and compact PC setup [1]. At
the same time my setup should still have low latency and static stability, and be robust to
the setup conditions. A common theme that I have come across is a layered approach to the
investigations.

The first step is always to reference the movement subject: detection determines the presence
and position of a class of objects in the scene [2], whether that be a whole hand or just the
fingers. Several image cues such as colour, texture and shape have been used to approach the
problem of hand detection [3]. Of the detection/reference-finding methods, the background
method is the one that would work for my project; it works by isolating the object from a
background and recognising the object's dominant features [4]. After this stage is complete
the next step is usually the tracking stage.
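
Before moving on to tracking, as a pointer to how this background isolation step could be prototyped, OpenCV ships ready-made background subtractors. The sketch below is a minimal example using MOG2 on a placeholder video file; it illustrates the general approach rather than the implementation this project will settle on.

    import cv2

    cap = cv2.VideoCapture("sample.mp4")     # placeholder recording of a hand over a desk
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Pixels that differ from the learned background model become foreground
        foreground = subtractor.apply(frame)
        cv2.imshow("foreground mask", foreground)
        if cv2.waitKey(30) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()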

Tracking of hands is relevant because it allows natural gesturing [5]. This is beneficial for
researchers as it allows the data collected from the users to be as natural as possible, with
no external hindrance. As we are analysing the movement of objects, the objects will not rest
in the same position over time, which leads to various problems including motion blurring [2].
As proposed in [8], trackers that rely on a single feature are likely to lose the target when
that feature becomes unreliable; the use of multiple features, e.g. foreground object, colour,
shape and proportionality, leads to an increase in robustness. Once we have a firm grip on the
object's movements, we are able to collect data and start to analyse the interactions.

From my research, the following step is usually segmentation [1, 2, 4, 5, 7]. The basic
problem lies in the extraction of information from vast amounts of data [2]. Once the data has
been gathered, the most important part of the project for the researchers is the results and
how/why they are relevant. The goal of segmentation is to decrease the amount of image
information by selecting areas of interest [2]. Typically, hand segmentation techniques are
based on stereo information, colour, contour detection, connected component analysis and
image differencing [2]. The disadvantages found in [2] included being unreliable in cluttered
backgrounds, sensitive to changes in the overall illumination, sensitive to shadows, and prone
to segmentation errors caused by objects with similar colours. In [1], new techniques were
developed to overcome these disadvantages: Image Differencing Segmentation (IDS) is
insensitive to shadows and the Fast Rejection Filter (FRF) is insensitive to finger
orientation. Finally, for my project the data sheet produced depends on the researcher's
needs. The data gathered will consist of ‘hit’ location, ‘hit’ time, ‘hit’ angle and the path
the finger followed.
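
Image differencing itself is straightforward to prototype: subtract the current frame from a stored background frame and threshold the absolute difference. The sketch below is a generic illustration of that idea with placeholder images; it is not the IDS technique from [1] itself, which adds robustness to shadows.

    import cv2

    background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)  # empty scene
    current = cv2.imread("current.png", cv2.IMREAD_GRAYSCALE)        # scene with a hand

    # Per-pixel absolute difference highlights anything that has changed
    diff = cv2.absdiff(background, current)

    # Keep only strong changes as foreground
    _, moving = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    cv2.imwrite("segmented.png", moving)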

3. The Project Proposal

3.1 Aims and Objectives


The aim of my project is to produce a simple, easy-to-use toolkit that analyses the movement
path and hit detection, the time between hits, the angle and errors for a user and their
particular device. The toolkit should be able to analyse a large library of video data
recorded on a high-resolution RGB camera and should be simple to use while presenting the
data collected in a manner useful to the researcher.

My Aims for this project are:

• To build a toolkit that is quick and easy to assemble, with minimal equipment, but that is
still able to gather the required information.
• Record a library of 30 - 50 video streams of participants undergoing various tasks using
the RGB camera.
• Be able to extrapolate information/data from the library gathered, including ‘hit’ time,
‘hit’ angle, ‘hit’ locations, time taken for tasks to complete etc.

• To evaluate the toolkit and whether it meets the researcher's needs and wants (the
researcher's criteria). Evaluate the toolkit to make sure it is capable of doing everything
that was intended of it for this project.
• The user is able to carry out a Fitts's law study while finger movement is analysed by the
toolkit.

3.2 Methodology

When building my toolkit, I will need to gather all the information regarding OpenCV and
Python and use them together to meet the needs of this toolkit. With regard to those needs, I
will first approach PhD students from Lancaster who have projects in interactive systems and
get their feedback on what requirements and information they would need.

To determine the ease of use of my toolkit I will be conducting two studies. I will first ask
my users to fill out a questionnaire to get their feedback on how easy and intuitive the
toolkit is to use. Furthermore, I will observe the users and write a qualitative report on how
they interacted with the toolkit: whether it was the way I intended, whether they developed
their own use, and whether any mistakes were made. These reports will be conducted at each
stage of the toolkit development.

To record my library of data, I will book a lab in the engineering building and I will time
myself from the start until the end of the study, noting down the time taken for the initial
set-up. The average setup time will be calculated and presented in my final report.
Participants will be given time slots in which they will need to come and undertake different
movement-related activities while the RGB camera records their movement. They will fill out a
form at the end of their session with feedback and answer a few questions. This will be
repeated until I have gathered between 30 and 50 video streams.

While the user study is being carried out, I will compile a data collection sheet. Here I will
record whether the toolkit was able to detect the hand/finger, whether it was able to
distinguish the foreground from the background, whether it was able to track the
object/subject, the time taken for the user to complete tasks, and the correctness of the data
collected, as well as the ‘hit’ time, ‘hit’ angle and ‘hit’ locations. The tasks will involve
the user typing a sentence on an iPhone keypad; this will be repeated on an Android phone and
a full-sized laptop keyboard.

After the library of videos has been collected, the analysis of the videos will produce data
that should be useful for researchers. I will go back to the PhD students I interviewed at the
start of my project and check whether the results from my project were useful for them and
whether this was an improvement on the toolkits/approaches they would have used to carry out a
similar experiment. This will lead to my final evaluation of the toolkit, where I can evaluate
whether the toolkit has met my aims and satisfied the researchers' needs and my own.

4. Programme of Work
The project will begin in October 2017 and will conclude in March 2018, and it will be
broken up into the following stages;

• Introduction (Project Aims): By the end of week 3, we will have a final proposal with
clearly defined project aims. This proposal will consist of an introduction to the project,
the background research conducted that led me to choose my project, the defined aims and a
breakdown of how to achieve each aim.
• Background Research: From week 4 to week 6 we will start researching computer vision
programs and developing the toolkit with the aid of a high-resolution RGB camera. For OpenCV,
we will need to watch tutorials and learn to use its features, allowing us to create our
intended goal.
• Software Development: During week 7 to week 10 the program development stage will begin.
This will include using OpenCV along with the RGB camera to try to record, recognise and
analyse a hand and its movements.
• Software Development part 2: Over the Christmas holidays to week 11, I will continue to
work on the program, but will concentrate on developing the toolkit to record, recognise and
analyse each finger and its movements.
• Beta testing: In weeks 12 to 13 I will start carrying out initial testing with participants,
attempting to detect their hands and fingers to start the tracking phase. The beta tests will
be carried out by a small number of users (fewer than 10). I will check what worked during the
tests and get feedback from the users and my own visual study. This will contribute to the
30 - 50 video streams I intend to collect in my aims.

• Improvements: In week 14 any improvements required as a result of the feedback received
from my beta tests will be carried out. This will complete the build aim of my toolkit and
will make sure the toolkit is compact and easy to use.
• Main Study: Between week 15 and week 16 the main study that will allow me to build my
library of data will be undertaken, which will also help me complete my project aims. The
toolkit will be a more complete and refined version than in the pilot study. This will
contribute the bulk of the 30 - 50 video streams I intend to collect. The Fitts's law study
that I aim to carry out will also be one of the tasks the participants are asked to complete.
• Data Analysis: After completing the study, with all the data collected, I will evaluate my
toolkit, its accuracy, its ease of use and how it benefits researchers with their specific
needs and requirements; a draft of my evaluation will be finished by week 17. I will also
evaluate the study as a whole and suggest improvements for next time. My final project
write-up should be concluded by week 18, allowing me time to gain feedback and make any
necessary changes for my final hand-in in week 20.

*Week 10 includes 4 weeks of Christmas holidays (week 10 = week 10 + 4 weeks of Christmas
holidays).

5. References

1. Visual Tracking of Bare Fingers for Interactive Surfaces – Julien Letessier, Francois
Berard (Accessed Oct 2017)
https://doi.org/10.1145/1029632.1029652
2. Bare-Hand Human-Computer Interaction – Christian Von Hardenberg, Francois Berard
(Accessed Oct 2017)
https://doi.org/10.1145/971478.971513
3. Lightweight Palm and Finger Tracking for Real-Time 3D Gesture Control – Georg
Hackenberg, Rod McCall, Wolfgang Broll (Accessed Oct 2017)
https://doi.org/10.1109/VR.2011.5759431
4. Detection and Tracking of pianist hands and fingers – Dmitry O. Gorodnichy and Arjun
Yogeswaran (Accessed Oct 2017)
https://doi.org/10.1109/CRV.2006.26
5. Finger tracking for interaction in augmented environments – Klaus Dorfmüller-Ulhaas,
Dieter Schmalstieg (Accessed Oct 2017)
https://doi.org/10.1109/ISAR.2001.970515
6. Gesture-Based Interfaces: Practical Applications of Gestures in Real World Mobile
Settings – Julie Rico, Andrew Crossan, and Stephen Brewster (Accessed Oct 2017)
https://link.springer.com/content/pdf/10.1007%2F978-0-85729-433-3.pdf
7. Motmot, an open-source toolkit for real-time video acquisition and analysis – (Accessed
Oct 2017)
https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-4-5
8. Real-Time Finger Tracking with improved Performance in Video Sequences with
Motion Blur – Daniel Popa, Vasile Gui and Marius Otesteanu (Accessed Oct 2017)
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7296467
9. Smartphones industries: Statistics and Facts – (Accessed Oct 2017)
https://www.statista.com/topics/840/smartphones/
10. An Interactive Visualisation of Fitts's Law with JavaScript and D3 – Simon Wallner,
Otilia Danet, Trine Eilersen, and Jesper Tved. (Accessed Oct 2017)
http://simonwallner.at/ext/fitts/

Appendix B – User Manual

Crisanto Da Cunha
Toolkit support for the analysis of finger movement in a video stream

Finger Movement Toolkit


User Manual

Product Description

This toolkit takes recordings and runs them through software that analyses the finger
movements within the video stream. The threshold (how sensitive the software needs to be in
order to eliminate background noise) can be altered to achieve the best results when analysing
a video sample.

System Requirements

The video camera required to capture the footage for analysis needs to be of high quality,
preferably 1080p at 60 fps or higher; this is to ensure the highest accuracy possible when the
footage is run through the software, and to avoid missed frames.

Hardware setup
The hardware set up will consist of three main components.
• Webcam on a tripod
• Input device (iPad)
• Computer (to be able to run the software and produce results)
Connect the Webcam to the laptop to be able to record and save the video files.

Download and Install the toolkit software

To begin, download the code that is linked below and copy and paste it into any code editor,
preferably PyCharm.

https://drive.google.com/open?id=1AFNVi8OOf9IO5GRqzBYBD2zHeNywzi6_

Open Toolkit

Windows: open the toolkit using the following steps:

• Create a folder called ‘finger movement toolkit’
• Download the code into a code editor and save it in the folder.
• Place any video file that you want to run through the toolkit into the same folder.
• Follow the link and instructions to download Python for Windows:
https://www.python.org/downloads/windows/
• Go into the folder, right click and select the option “open command window here”.
• While in the command prompt, enter the following line: python finger_tracking.py
<full_path_of_video_file>

Mac: open the toolkit using the following steps:

• Create a folder called ‘finger movement toolkit’
• Download the code into a code editor and save it in the folder.
• Place any videos that are going to be used into the same folder.
• Open “terminal” and go into the folder that contains your project.
• While in the folder, use the following command to run the code and analyse your video file:
python finger_tracking.py <full_path_of_video_file>

Features

• Lines 46, 72 and 73 can be varied to change the threshold area for hand detection as well
as screen detection.
• Additional modules (methods) can be added to the toolkit to detect/analyse other things.
• On line 229 you can alter the output that is written to the results file.

Appendix C – 6.3 Questionnaire

1.) Are you from a technical background (i.e. computer science, engineering, physics etc.)?
No I am not.

2.) Do you have any experience coding?


No, I may have done some coding in ICT in school but I don’t really remember.
Even if I did it might have been excel or very basic.

3.) How did you find the set up process?


So I had the user manual but I mainly looked at the pictures and just tried to
replicate that, it wasn’t too difficult the hardest part was probably putting the
webcam onto the tripod. Other than that it was simple and when I plugged the
webcam into the computer it worked straight away. So it was very easy and simple
and took a short time to do.

4.) What experiments did you run?


I wanted to start slow to understand what I was doing so I just started by typing my
name into notes and then I tried to draw a smiley face in notes.

5.) Did you understand the code/ what was going on?
No I didn’t, but I knew what the actual program did and I knew I had to change
one number on line 74 to change how accurate I want the tracking to be, so I took
different videos and changed the number on each one and played the code to see
how it changes.

6.) How accurate is the tracking?


From what I can see the tracking is very accurate, I was very surprised at how
accurate it actually is. However, it does bug out at times and picks up other parts
of my body like my knuckle and will get confused, but normally goes back to
tracking my fingertips.

7.) What changes would you make to this toolkit?


As someone who has no experience with coding and the technical side of this
toolkit, it is daunting looking at this code so I think a friendly user interface would
be beneficial for anyone using it.

8.) Can you see this toolkit being used?


Personally, for me, I would never use it, but for someone who's doing research work in this
field I can see it being an easy-to-assemble tool to use to get quick results. Especially if
they know coding and are able to understand what's going on.

