
Design of a Laser Controlled Mouse Using OpenCV

Final Project Report


Building Tools for Creative Practices

Christopher Armenio
Table of Contents
Project Overview
    Summary
    Project Goals
    Project Features
Background Research
    Literature Study
        Michael S. Brown, William K. H. Wong
        Carsten Kirstein, Heinrich Muller
        Jean-Francois Lapointe, Guy Godin
    Community Study
        Existing Methods of Interaction
        User Interface Observations
Resulting Pattern Language
    Domain Problem
    Applicable Pattern Domains + Supported Creative Practices
    Overall Language Patterns
        Virtual Environments
        Standard UI
    Human-Computer Interaction Patterns
    Incorporation of Pattern Language
System Overview
    Laser Pointer Tracking
        Coordinate Rectification
    Mouse Coordinate Transmission
Laser Pointer Tracking
    Camera
    Tracking Algorithm
        Configuration File
        Lens Calibration Routine
        Extrinsic Calibration Routine
        Tracking Algorithm
        Coordinate Rectification
Mouse Coordinate Transmission
    ioBroker
    Overall Architecture
    VncViewer
Additional Documentation
    Calibration Client Interface

Project Overview
Summary
Since the desktop computer first became popular decades ago, the mouse/keyboard combination
has become the de facto standard for human/computer interfacing. Both the mouse and keyboard
are, by design, single-user devices, and as such have worked quite well for the standard single-
user machine. However, as large display devices such as LCD projectors become increasingly
popular, the desire for a multi-user collaborative environment has grown accordingly. In such an
environment, the standard mouse/keyboard interface becomes an impediment to true multi-user
group collaboration.

The goal of this project is to create a method of interfacing with a large projected computer
screen (or set of screens) that is both intuitive and expandable to the multi-user environment. To
accomplish this task, the usage patterns of large, projection-based screens are studied, covering
not only the Human-Computer Interaction patterns of such screens, but also the Software
Engineering patterns that emerged from the design of the target interface.

Project Goals
• Provide a method for interacting with large, multi-screen displays
• Should be easy to learn and natural to use
• Should require a minimum of user interaction to calibrate
• Should have a relatively fast response time
• Should be cross-platform compatible

Project Features
• Capable of using laser pointers, LED pens, or anything else producing a bright point of light
• Supports the following auto-calibration procedures:
o Camera lens calibration
o Screen bounds and rotation calibration
o Exposure calibration
• Uses VNC for cross-platform capability
• Provides an interface for direct interaction with X-Windows
• Refresh rate of ~10 frames/second

Background Research
Literature Study
Michael S. Brown, William K. H. Wong
Laser Pointer Interaction for Camera-Registered Multi-Projector Displays
Michael S. Brown and William K. H. Wong
Department of Computer Science
Hong Kong University of Science and Technology

In this system, a single camera is used to survey the entire display surface. While this approach
does minimize the overhead of capturing/processing multiple images, it does introduce problems
when the displays are not physically viewable by a single camera. The detection algorithm
employed here makes use of a red filter on the camera and exposure time adjustment to obtain
the input image. With the correct calibration, laser pointers are located using a hard-coded
threshold value; this algorithm is also capable of locating multiple laser pointers at the same
time. This algorithm does not relay mouse coordinates to the underlying operating system.

Carsten Kirstein, Heinrich Muller


Interaction with a Projection Screen Using a Camera-tracked Laser Pointer
Carsten Kirstein, Heinrich Muller
Informatik VII (Computer Graphics)
University of Dortmund, D-44221

The interesting part of this approach is that a motion detector filter is first used to only consider
portions of the screen that have changed over the past few frames. Once this filter has been
applied, they take a representative mask of the laser pointer and apply it with different
permutations to the areas identified by the motion detection filter. The downside to this approach
is twofold: first, screens with dynamic backgrounds may cause false triggering; second, laser
pointers of different color, size, or intensity may cause erratic operation, since the dot on the
screen is a direct result of all three properties.

Jean-Francois Lapointe, Guy Godin


On-Screen Laser Spot Detection for Large Display Interaction
Jean-Francois Lapointe, Guy Godin
Institute for Information Technology
National Research Council of Canada

This approach is very similar to the approach used in this project. The image is binarized by
applying a threshold determined by the maximum-brightness red, green, and blue values the
projector can project; it is then assumed that the laser pointer is brighter than this threshold. The
algorithm then looks for blobs in the resulting image, assuming that the laser pointer will be a
group of pixels in very close proximity. The problem with this approach is that it does not allow
for multiple laser pointers as our approach does.

Community Study
The origins of this project can be traced back to the Collaboritorium Project at RIT, “A dedicated
IT-enabled collaboration space for stimulating multidisciplinary research, pedagogy and
collaboration.”

It only makes sense that a study of this community will provide a better understanding of the
requirements of the interface, both from a technical and a human perspective.

Existing Methods of Interaction


The Collaboritorium Project at RIT is well over a year old, which means that between the start of
the Collaboritorium and the completion of this project, there must have been some method of
interaction between users and the multi-screen computer system. It turns out that most of these
methods were commercially available products that simply emulate the standard mouse
interaction pattern:

Wireless Gyro-Mouse

This is a handheld device which controls the movement of the mouse cursor by sensing rotation
of the device about an internal axis.

Figure - Gyromouse

This device, while providing freedom from both wires and a flat mousing surface, still tethers the
user, in somewhat cumbersome fashion, to the concept of manipulating a mouse pointer.

Wireless Trackball
This is a handheld device which controls the movement of the mouse cursor by sensing the
rotation of a small ball located in the mouse, which is moved by the user.

Figure – Trackball mouse

Much like the previous device, this mouse provides the same freedoms and limitations, simply
exchanging the action of physical rotation with the action of rotating a tiny ball.

Wireless Mouse
Perhaps the most common method of interaction employed in the Collaboritorium is the use of a
standard wireless mouse. Since there are no proper desks in the Collaboritorium, the user often
has to find novel mousing surfaces such as chairs, books, pant legs, and the floor. Again, this
method still manipulates a mouse pointer, an artificial concept introduced simply for the
technical ease of development it represents.

User Interface Observations


• Each of the four screens is larger than is physically accessible by an average human
 Therefore a touchscreen interface cannot be applied to the entire screen
• The screens are sufficiently spread out that walking from screen to screen is
cumbersome
 Therefore a touchscreen interface across all four screens would also be
cumbersome
• When large distances are involved, the concept of a mouse pointer becomes unnatural
 Therefore the new method of interaction should not rely on the mouse pointer

Resulting Pattern Language
Domain Problem
Problem
One or more users need to interact with a large, multi-screen computer display

Context
Large displays encourage multi-user interaction
Large displays may or may not border other displays
Mouse/Keyboard combinations are inherently single-user
Mouse/Keyboard combinations are non-intuitive on large screens
Mouse/Keyboard combinations typically require a desk or other flat surface
Mice become cumbersome when not correctly oriented with the screen
Humans naturally point to objects or touch them

Therefore
A method of interaction for users of large, multi-screen displays must be developed. It must
support multi-user collaboration, and must be easy to learn and intuitive.

Details
Conventional methods of interacting with computer displays are typically limited to a standard
computer mouse or a touch screen interface. For standard computer monitors, or even multiple
monitor display setups, such devices work well and have even become a natural method of
interaction; however, for large displays, these input methods become increasingly problematic.

To begin with, in order to use a standard computer mouse on a multiple-screen display, the
screens must be set up in a geometrically significant pattern; that is, each display must be set up
to logically border another display. In setups where displays are physically separated by
distance, there exists a discontinuity between mouse movement on the screen and physical
mouse movement in the room (as the mouse will jump from the edge of one screen immediately
to the edge of the next, bridging potentially huge gaps in space). Additionally, standard
computer mice usually require a desk or other flat surface to operate, and become extremely
cumbersome when the mouse and user aren't oriented correctly with the screens in question.

Although there are other types of devices that emulate mouse movement (gyromice, Wii-type
mice), they all take some getting used to, and they all still control the mouse pointer, which
presents the same problems outlined above. A more natural method of interacting with large-
scale displays would be similar to interacting with real objects, through the use of touch (like a
touch screen or multi-touch). The problem here is that on large screens it is often physically
impossible to touch the entire screen (unless you are 7-8 feet tall or carry a ladder); additionally,
in setups where there are multiple physically separated screens, it becomes tiresome to walk back
and forth between them.

Applicable Pattern Domains + Supported Creative Practices


• Presentations

o Multiple screens provide more information
o Information can be logically partitioned
• Drawing/photo editing
o Large screens show more detail
o Replicate feeling of drawing on canvas
• Virtual Environments
o Large screens are more immersive
o Multiple screens provide a larger field of view
• Internet Browsing
o Large screens show more detail
• Gaming
o Large screens are more immersive
o Multiple screens provide a larger field of view
• Computer Assisted Design
o Large screens show more detail
o Multiple screens provide multiple angles of view

Overall Language Patterns


The following patterns describe the overall interaction of a user with a computer. The goal of
these patterns is to be as abstract as possible, detailing the action itself while remaining
completely technologically independent.

Figure – Overall Interaction Patterns

As can be seen from the above figure, there are two primary forms of graphical interaction with a
computer: Virtual Environments and Standard UI.

Virtual Environments
This type of graphical interaction is more typical of modern-day games or collaborative
environments such as Second Life or Croquet.

Problem
A user has a set of goals that must be accomplished through interaction with a virtual
environment

Context
The virtual environment is presented on the screen of the computer
The user has some method of controlling the actions of the virtual avatar
The user has some set of goals in interacting with the virtual environment
The said goals can be achieved through interaction with the virtual environment

Therefore
The user must interact with the environment in a manner determined by the constraints of the
environment.

Details
When interacting with the virtual environment, the user is typically constrained to a set of
primitive actions; although not a complete list, these often include interacting with others,
navigation, and manipulation of objects. Interaction with others is usually accomplished through
some method of pointing out detail or drawing; navigation is usually accomplished by moving
the virtual avatar through the environment; object manipulation is usually highly dependent on
the environment.

Standard UI
This type of graphical interaction is perhaps the most common, as most modern-day operating
systems replicate this pattern.

Problem
A user has a set of goals that must be accomplished through interaction with the user interface

Context
The interface is presented on the screen of the computer
The user has some set of goals in interacting with the interface
The said goals can be achieved through interaction with the interface

Therefore
The user must interact with the interface in a manner determined by the constraints of the
interface.

Details
When interacting with the interface, the user is typically constrained to a set of primitive actions;
although not a complete list, these often include interacting with others, navigation, and
manipulation of objects. Interaction with others is usually accomplished through some method of
pointing out detail or drawing; navigation is usually accomplished by proceeding through
sequential sets of screens in a manner consistent with reaching the said goals; object
manipulation is usually accomplished by moving conceptual objects on the screen to achieve the
specified goal.

Human-Computer Interaction Patterns


The following patterns describe the interaction of the user with the computer itself. For these
patterns, the description of how the action is actually accomplished, from the user's point of
view, is most important.

Figure – HCI Interaction Patterns

As can be seen from the above figure, the actions of centering a viewport on an area of interest
and moving a pointer to an area of interest (the primary results of the overall pattern language)
are broken down into their constituent components. Not surprisingly, the constituent components
of an interface designed for modern-day computers are the mouse and the keyboard (not pictured
here).

In this instance the mouse is the single weakest link in the chain of interaction: it is at this point
that the abstract movement or rotation of your hand, or the movement of a trackball, results in
movement of a cursor on the screen. The mouse is essentially translating physical movement in
one plane (the desk, the surface of the trackball, etc.) into physical movement in another plane
(the screen). It is this translation that, when applied to sufficiently large screens, becomes
unnatural and cumbersome.

Incorporation of Pattern Language


Unlike projects completed by other students in this class, the goal is not to combine and
implement discovered patterns in a novel way, but rather to allow the discovered patterns to be
performed in a different way; the discovered patterns of navigation, object manipulation, etc.,
instead of ultimately being performed by the movement of a mouse, will now be performed
simply by pointing to a section of the screen.

System Overview
In general, the system can be broken down into two main parts:
1. Laser Pointer Tracking
2. Mouse Coordinate Transmission

The first part is encapsulated in a single executable called ‘pointerFinder’, which handles a
single screen and communicates with a second executable, ‘vncviewer’, using a simple message
passing interface. The advantage of this design is that multiple instances of the ‘pointerFinder’
program can be run at once, allowing multiple screens to be observed at the same time, with each
instance communicating with a single instance of ‘vncviewer’.

Laser Pointer Tracking


It has already been established that we are projecting a standard computer interface onto a large
projection screen, and using a laser pointer directed at said screen to control the mouse
movement. We now need a mechanism for detecting where on the screen the laser pointer is
pointing, so we can eventually extract mouse coordinates. The somewhat obvious solution is to
point a camera at said screen and use a computer vision algorithm to determine the laser location.
Such an algorithm would locate the dots produced by the laser pointer on the screen and return
their location, relative to the camera’s field of vision.

Figure – Physical System Configuration

Note: In the above figure, we are utilizing both a rear projector and a rear camera; this maximizes
the space available for user movement. Such a system could easily be adapted to a front-
projection setup as well.

Coordinate Rectification
Now that the tracking algorithm has returned the location of the laser pointer in the camera’s
field of vision (FOV), the coordinates must be rectified to an actual screen location. The reason
for this step is threefold:
Camera Lens Distortion
When low-quality cameras or wide-angle lenses are used, it cannot be expected that the image
plane is perfectly Euclidean; that is, a picture of a rectangular grid may not necessarily yield a
rectangular grid. An example follows:

Figure – Barrel Distortion (source: www.dcresource.com)

As can be seen in the above figure, the picture is of a regular rectangular grid; however, if you
look closely at the corners, you will see that the grid is in fact warped slightly, producing curved
lines. Such distortion needs to be accounted for.
Camera Perspective Distortion
If the camera is not physically located at the center of the screen, an effect known as keystoning
occurs; an example follows:

Figure – Keystone (source: www.aboutprojectors.com)

As can be seen in the above figure, the outline of the visible screen is clearly not rectangular;
add to this any camera rotation that may be present, and there is a great deal of distortion that
must be accounted for.

Camera FOV Size
In order for such a system to work, the camera must be able to see the entire projected screen;
because of this, the projected screen size is necessarily smaller than the image size. Once the
previous rectification algorithms have been applied, the edges of the image must be disregarded
and a new coordinate system established; such a coordinate system should then be aligned
properly with the true mouse coordinates of the target computer.

Mouse Coordinate Transmission


Since one of the requirements of the system is that it must not impede the normal operation of
the machine, it would not be wise to put such processor-intensive algorithms on the target
computer; instead, a secondary computer (the host computer) is introduced that performs all of
the necessary calculations and generates the mouse coordinates. These coordinates must then be
relayed to the target machine in some manner; to reduce development time, utilizing a standard
protocol such as VNC seemed like the best option. The VNC protocol is used to remotely control
a target computer over a network connection, which includes relaying both mouse and keyboard
commands. By simply encapsulating our mouse coordinates into the VNC protocol, we can
securely and reliably send them to the target computer.

Laser Pointer Tracking
To begin with, the host computer (the computer on which the tracking algorithms run) is a
1.5 GHz Pentium running Ubuntu 7.10. Linux was chosen as the operating system because
developing C/C++ programs is generally easier than in a comparable Windows environment.

Camera
For testing and demonstration purposes, a Logitech Orbit MP webcam was used to capture the
input video; the camera is a standard USB 2.0 webcam with the added capabilities of pan and tilt.
Once the camera was plugged into the host computer, Ubuntu automatically configured the
necessary USB Video Class (UVC) drivers and created an entry in the /dev file system for
access to the camera. OpenCV was used for both image capture and processing, since it is one of
the more robust image processing libraries freely available.
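As an illustration, a minimal frame-capture sketch using the OpenCV 1.x C API follows; the
device index and error handling here are illustrative rather than taken from the actual
pointerFinder source.

#include <stdio.h>
#include "cv.h"
#include "highgui.h"

int main(void)
{
    /* index 0 corresponds to /dev/video0 (the camNum entry below) */
    CvCapture *capture = cvCaptureFromCAM(0);
    if (capture == NULL) {
        fprintf(stderr, "unable to open camera\n");
        return 1;
    }

    /* grab one frame; the buffer is owned by the capture structure */
    IplImage *frame = cvQueryFrame(capture);
    if (frame != NULL)
        printf("captured a %dx%d frame\n", frame->width, frame->height);

    cvReleaseCapture(&capture);
    return 0;
}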

Tracking Algorithm
Configuration File
Each instance of the algorithm reads the necessary configuration information from a separate
plain-text configuration file. The configuration file has the following entries:

Entry Name       Description
IOBServerPath    Path to the location of the ioBroker socket

CameraConfig
  camNum         Each camera has an entry under /dev (video0, video1, video2, ...);
                 this number specifies which device to use.
  intrinsicFile  Location of the file which describes the intrinsic properties
                 of the camera
  rearCamera     ‘1’ if the camera is located behind the screen, ‘0’ if the
                 camera is located in front of the screen
  threshAdj      Sets the background-subtraction threshold; higher values result
                 in more of the background being subtracted.

MouseConfig
  screenName     Name of the screen; must be unique
  screenWidth    Width of the target screen in pixels (horizontal resolution)
  screenHeight   Height of the target screen in pixels (vertical resolution)
  xoffset        Horizontal offset of the target screen in pixels (see diagram)
  yoffset        Vertical offset of the target screen in pixels (see diagram)
Table - Configuration File Entries


Figure - Description of height, width, offset entries
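As an illustration, a configuration file for a single 1024x768 screen might look like the
following. The report does not specify the exact file syntax, so the key/value layout shown here
is an assumption; only the entry names come from the table above.

# pointerFinder configuration (hypothetical syntax)
IOBServerPath = /tmp/ioBroker.sock

# CameraConfig
camNum        = 0                 # use /dev/video0
intrinsicFile = intrinsics.xml    # written by the lens calibration routine
rearCamera    = 1                 # camera is behind the screen
threshAdj     = 10

# MouseConfig
screenName    = leftScreen        # must be unique per pointerFinder instance
screenWidth   = 1024
screenHeight  = 768
xoffset       = 0
yoffset       = 0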

Lens Calibration Routine


Note: This calibration is required ONLY when new cameras are used.

This calibration phase requires an 8 (columns) x 10 (rows) checkerboard pattern to be displayed
on the current target display. Using the OpenCV ‘cvCalibrateCamera2’ function, the following
values are calculated:
intrinsicMatrix – camera geometry (focal lengths and principal point), used for undistortion
distortionCoefficients – lens distortion
rotationVectors – camera rotation and perspective compensation
translationVectors – camera/screen offset

As indicated above, the first two values are used with OpenCV’s ‘cvUndistort2’ function to
correct any lens distortion that may be present. To complete this calibration, 10 views of the
calibration pattern will need to be acquired; for each view, it is recommended that the camera be
in a different physical location so that all regions of the lens can be studied.

Once this calibration phase is complete, the parameters are written to the intrinsic file specified
in the configuration file. From this point on, the calibration parameters will be read from this
file.
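A sketch of this calibration phase follows, using the OpenCV 1.x C API the report names
(cvCalibrateCamera2). Note that an 8x10 checkerboard of squares exposes 7x9 inner corners,
which is what cvFindChessboardCorners detects; gathering the ten views, sub-pixel refinement,
and writing the intrinsic file are omitted, and the helper names are illustrative.

#include "cv.h"

#define CORNER_COLS 7                     /* inner corners of an 8x10 board */
#define CORNER_ROWS 9
#define N_CORNERS   (CORNER_COLS * CORNER_ROWS)
#define N_VIEWS     10                    /* views required by the routine  */

/* detect the checkerboard's inner corners in one grayscale view */
int find_corners(IplImage *gray, CvPoint2D32f corners[N_CORNERS])
{
    int count = N_CORNERS;
    return cvFindChessboardCorners(gray, cvSize(CORNER_COLS, CORNER_ROWS),
                                   corners, &count,
                                   CV_CALIB_CB_ADAPTIVE_THRESH);
}

/* stack the corners of all views and solve for the intrinsic parameters */
void calibrate(CvPoint2D32f corners[N_VIEWS][N_CORNERS], CvSize image_size,
               CvMat *intrinsic /* 3x3 */, CvMat *distortion /* 1x4 */)
{
    CvMat *object_points = cvCreateMat(N_VIEWS * N_CORNERS, 3, CV_32FC1);
    CvMat *image_points  = cvCreateMat(N_VIEWS * N_CORNERS, 2, CV_32FC1);
    CvMat *point_counts  = cvCreateMat(N_VIEWS, 1, CV_32SC1);

    for (int v = 0; v < N_VIEWS; v++) {
        CV_MAT_ELEM(*point_counts, int, v, 0) = N_CORNERS;
        for (int i = 0; i < N_CORNERS; i++) {
            int r = v * N_CORNERS + i;
            /* model the board as a planar unit grid (z = 0) */
            CV_MAT_ELEM(*object_points, float, r, 0) = (float)(i % CORNER_COLS);
            CV_MAT_ELEM(*object_points, float, r, 1) = (float)(i / CORNER_COLS);
            CV_MAT_ELEM(*object_points, float, r, 2) = 0.0f;
            CV_MAT_ELEM(*image_points,  float, r, 0) = corners[v][i].x;
            CV_MAT_ELEM(*image_points,  float, r, 1) = corners[v][i].y;
        }
    }

    /* the rotation/translation outputs may be requested here as well;
       they are not needed for cvUndistort2, so NULL is passed for brevity */
    cvCalibrateCamera2(object_points, image_points, point_counts,
                       image_size, intrinsic, distortion, NULL, NULL, 0);

    cvReleaseMat(&object_points);
    cvReleaseMat(&image_points);
    cvReleaseMat(&point_counts);
}

Each subsequent frame can then be corrected with
cvUndistort2(frame, undistorted, intrinsic, distortion).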

Extrinsic Calibration Routine


This routine is executed each time the pointer-finder process is started. There are two distinct
phases to the calibration process: screen-bounds calculation and white-value calibration.

During the first phase, a calibration pattern (shown below) is projected onto the target screen:

Figure – Screen Orientation Calibration Pattern

This calibration pattern allows the calibration routine to determine the location of the four screen
corners, as well as the orientation of the screen, resulting in the determination of the top-right,
top-left, bottom-right, and bottom-left corners of the screen. During the second phase, the
maximum red, green, and blue values produced by the projector are recorded; these maxima
serve as the brightness threshold used by the tracking algorithm described below.

Tracking Algorithm
Once the calibration routine is completed, the tracking algorithm begins execution. To begin
with, a single frame is captured roughly every 1/15 of a second, and each frame is undistorted
using the ‘cvUndistort2’ function. From here, a grayscale image is created by comparing each
pixel value with the maximum red, green, and blue values obtained during the white-value phase
of the calibration sequence. Since the laser pointer is extremely bright compared with the light
created by the projector, only pixels that are brighter than the maximum threshold retain their
original value.

Once the threshold image is created, it is passed to a function that locates anything that appears
to be a dot produced by the laser. This algorithm works by searching iteratively through the
image pixels, starting at the top-left and proceeding to the bottom-right. If a pixel is activated,
the light at that location is brighter than the calibration threshold and must be of interest. The
algorithm then takes a snapshot of a window around the current pixel and finds the minimum
enclosing circle (that is, the smallest circle that will enclose all activated pixels).

Figure – Tracking Algorithm

From here, the center and overall brightness of each circle are determined and stored in an array.
All points inside the circle are then removed from further consideration. Once the entire frame
has been processed, the brightest n points are returned for rectification.
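The following sketch illustrates this dot-location step with the OpenCV 1.x C API. The window
size, image layout, and helper name are assumptions made for illustration; only the
minimum-enclosing-circle call (cvMinEnclosingCircle) is the documented OpenCV function.

#include "cv.h"

#define WIN 15   /* half-size of the search window; illustrative value */

/* given an activated pixel at (x, y) in the single-channel threshold
   image, gather nearby activated pixels and fit the smallest circle
   enclosing them; returns nonzero on success and fills 'center' */
int fit_dot(IplImage *thresh, int x, int y, CvPoint2D32f *center)
{
    CvPoint2D32f pts[(2 * WIN + 1) * (2 * WIN + 1)];
    int n = 0;

    for (int j = y - WIN; j <= y + WIN; j++) {
        if (j < 0 || j >= thresh->height) continue;
        for (int i = x - WIN; i <= x + WIN; i++) {
            if (i < 0 || i >= thresh->width) continue;
            if (CV_IMAGE_ELEM(thresh, uchar, j, i) > 0) {
                pts[n].x = (float)i;
                pts[n].y = (float)j;
                n++;
            }
        }
    }
    if (n == 0)
        return 0;

    /* the smallest circle that encloses all activated pixels */
    CvMat pt_mat = cvMat(1, n, CV_32FC2, pts);
    float radius;
    return cvMinEnclosingCircle(&pt_mat, center, &radius);
}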

Coordinate Rectification
Now that the coordinates of the laser pointer(s) in the camera image are known, the coordinate(s)
must be rectified into actual screen coordinates for transmission to the target computer; this is
done through the use of a homography matrix to correct for perspective distortion.

Figure – Homography Matrix (source: http://plus.maths.org/issue23/features/criminisi/index.html)

The idea behind homography is that a camera is essentially a 2D plane which represents a 3D
object through the use of projection. In this case, our 3D object is simply assumed to be a 2D
plane rotated in three dimensions, which significantly reduces the complexity of the calculations.
The homography matrix is used to relate points in the 2D world plane (the screen image) with
points in the camera plane. The homography matrix is assembled from the following point
correspondences:

Screen Points             Camera Points
X        Y                X                       Y
0        0                Top-left bound.x        Top-left bound.y
width    0                Top-right bound.x       Top-right bound.y
0        height           Bottom-left bound.x     Bottom-left bound.y
width    height           Bottom-right bound.x    Bottom-right bound.y
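Written out, with H denoting the resulting 3x3 homography and s a scale factor, a point in the
camera plane maps to a point in the screen plane as:

s \begin{bmatrix} x_{screen} \\ y_{screen} \\ 1 \end{bmatrix}
  = H \begin{bmatrix} x_{camera} \\ y_{camera} \\ 1 \end{bmatrix},
  \qquad H \in \mathbb{R}^{3 \times 3}

After the multiplication, the first two components are divided by the third (the scale s) to
recover pixel coordinates.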

20
Fortunately, the homography matrix takes care of not only the perspective transformation, but
also the distortion introduced by rotation of the camera; thus, by simply multiplying a camera
point by the homography matrix, the mouse screen coordinates can easily be determined.
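A sketch of this rectification step follows, using OpenCV's cvGetPerspectiveTransform to
assemble the camera-to-screen homography from the four corner correspondences above; the
corner ordering and variable names are illustrative.

#include "cv.h"

/* map a detected dot from camera coordinates to screen coordinates;
   corner ordering matches the table above (TL, TR, BL, BR) */
void camera_to_screen(CvPoint2D32f cam_corners[4],
                      float width, float height,
                      CvPoint2D32f dot, CvPoint2D32f *mouse)
{
    CvPoint2D32f screen_corners[4] = {
        { 0.0f,  0.0f   }, { width, 0.0f   },
        { 0.0f,  height }, { width, height }
    };

    /* solve for the 3x3 camera->screen homography */
    double h[9];
    CvMat H = cvMat(3, 3, CV_64FC1, h);
    cvGetPerspectiveTransform(cam_corners, screen_corners, &H);

    /* multiply [x y 1]^T by H and divide out the scale factor */
    double w = h[6] * dot.x + h[7] * dot.y + h[8];
    mouse->x = (float)((h[0] * dot.x + h[1] * dot.y + h[2]) / w);
    mouse->y = (float)((h[3] * dot.x + h[4] * dot.y + h[5]) / w);
}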

Mouse Coordinate Transmission
Mouse coordinate transmission is accomplished through the use of a message passing interface
known as ioBroker.

ioBroker
ioBroker is a message passing interface developed by Greg Rowe, a colleague at Impact
Technologies. The interface consists of a single server known as the ‘ioBroker’; this server is a
standalone process that listens for messages from ‘iobClients’, which are passed through either
Unix sockets or TCP/IP sockets. The ioBroker model is a post/subscribe model, meaning that
each client can either subscribe to messages of a particular class type, or post a message with a
particular class type. The class type of a message is simply a string which specifies what the
message contains; a single client can subscribe to or post multiple message classes.
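Since the ioBroker API itself is not documented in this report, the following sketch only
illustrates the post/subscribe flow; every iob_* name and the message layout are hypothetical,
invented for illustration.

/* HYPOTHETICAL client API; these prototypes are invented for
   illustration and are NOT the real ioBroker interface */
void *iob_connect(const char *socket_path);
void  iob_post(void *client, const char *class_type,
               const void *payload, int length);

typedef struct {
    int x;   /* rectified mouse x coordinate */
    int y;   /* rectified mouse y coordinate */
} MousePosition;

/* a pointerFinder instance posts each rectified position; the
   vncviewer, subscribed to "mouse-position", receives it */
void post_position(void *client, int x, int y)
{
    MousePosition msg = { x, y };
    iob_post(client, "mouse-position", &msg, (int)sizeof msg);
}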

Overall Architecture

Figure - Overall communication architecture

In this implementation there are three main components: the ‘pointerFinder’ program, which is
an ‘iobClient’; the ‘ioBroker’; and the ‘vncviewer’, which is also an ‘iobClient’. The
pointerFinder programs (one for each screen) post messages with a classID of “mouse-position”;
the vncviewer, which is subscribed to these messages, receives them and encapsulates them into
the VNC protocol, which is then transmitted to the target computer.
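For reference, the VNC (RFB) protocol carries mouse state in a 6-byte PointerEvent message; a
sketch of how a rectified coordinate could be packed into one follows. The function name is
illustrative, but the byte layout is that of the RFB specification.

#include <stdint.h>

/* pack a rectified mouse position into a 6-byte RFB PointerEvent;
   returns the number of bytes written into buf */
int encode_pointer_event(uint8_t buf[6], uint8_t button_mask,
                         uint16_t x, uint16_t y)
{
    buf[0] = 5;                  /* message type 5: PointerEvent      */
    buf[1] = button_mask;        /* bit 0 set = left button pressed   */
    buf[2] = (uint8_t)(x >> 8);  /* x position, big-endian (network)  */
    buf[3] = (uint8_t)(x & 0xff);
    buf[4] = (uint8_t)(y >> 8);  /* y position, big-endian            */
    buf[5] = (uint8_t)(y & 0xff);
    return 6;
}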

VncViewer
This program is essentially the VNC client provided with the TightVNC distribution. Since the
program is open source, the user interface was removed and replaced with an ‘iobClient’
backend; in this manner, no new code was needed to conform to the VNC protocol.

Additional Documentation
Calibration Client Interface

Figure – Calibration Client Interface

Number Description
1 Specify address of server computer
2 Specify port number server is listening on
3 Press to connect to server
4 Press to start calibration process
5 Press to start relaying mouse coordinates

